151 changes: 151 additions & 0 deletions .github/workflows/brev.yml
@@ -0,0 +1,151 @@
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# SPDX-License-Identifier: Apache-2.0

name: Brev Launchable

on:
  # schedule:
  #   - cron: '0 9 * * 1'  # Every Monday at 9:00 AM UTC — disabled until Brev supports API tokens
  workflow_dispatch:
    inputs:
      model:
        description: 'Claude model for compatibility test'
        default: 'aws/anthropic/claude-opus-4-5'

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  provider-compat:
    runs-on: ubuntu-latest
    timeout-minutes: 360
    environment: brev-compat
Review comment (Collaborator):

Do we plan to restrict this environment so that the action can only be deployed from main? If not, someone could open a PR with a modified prompt.md and abuse our Claude API key.
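One in-workflow mitigation for the concern raised in this comment can be sketched as a first step that refuses to run unless the ref is `main`. This is a sketch only: the function name `require_main_ref` is hypothetical and not part of the PR. It relies on `GITHUB_REF`, which GitHub Actions sets for every run. A check like this is defense in depth at best, since anyone who can push a branch can also delete the check there. The restriction that actually protects the secret is a deployment branch rule on the `brev-compat` environment, which is configured in the repository settings rather than in this file.

```shell
#!/usr/bin/env sh
# Hypothetical guard (not part of the PR): fail unless the given ref is main.
# Intended to be called with "$GITHUB_REF" before any step that reads secrets.
require_main_ref() {
  ref="${1:-}"
  if [ "$ref" != "refs/heads/main" ]; then
    # ::error:: is the GitHub Actions workflow command for an error annotation.
    echo "::error::This workflow may only run from main (got: ${ref:-unset})" >&2
    return 1
  fi
  return 0
}

# Example call inside a run: block
#   require_main_ref "$GITHUB_REF"
```

Returning a status instead of calling `exit` keeps the function usable from a larger script, which can then decide how to fail.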

    permissions:
      contents: write

    steps:
      - name: Checkout
        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
        with:
          token: ${{ secrets.SVC_OSMO_CI_TOKEN }}
          fetch-depth: 0

      # ── Secret masking ──────────────────────────────────────────────────────
      - name: Mask secrets
        run: |
          echo "::add-mask::${{ secrets.NGC_SERVICE_KEY }}"
          echo "::add-mask::${{ secrets.BREV_API_TOKEN }}"

      # ── Tool setup ──────────────────────────────────────────────────────────
      - name: Setup Node.js
        uses: actions/setup-node@6044e13b5dc448c55e2357c09f80417699197238 # v6.2.0
        with:
          node-version: 20

      - name: Install Brev CLI
        run: |
          curl -sfL https://raw.githubusercontent.com/brevdev/brev-cli/main/bin/install-brev.sh | bash
          echo "$HOME/.brev/bin" >> "$GITHUB_PATH"

      - name: Login to Brev
        env:
          BREV_API_TOKEN: ${{ secrets.BREV_API_TOKEN }}
        run: brev login --token "$BREV_API_TOKEN"

      - name: Install Claude Code skills
        run: |
          brev agent-skill install  # installs /brev-cli — always latest
          mkdir -p ~/.claude/skills
          cp -r skills/osmo-agent ~/.claude/skills/  # /osmo-agent — from repo

      - name: Configure git
        run: |
          git config user.name "brev-compat[bot]"
          git config user.email "brev-compat[bot]@users.noreply.github.com"

      # ── Build and run prompt ─────────────────────────────────────────────────
      - name: Render prompt
        run: |
          sed \
            -e "s/{{GITHUB_RUN_ID}}/${{ github.run_id }}/g" \
            -e "s/{{GITHUB_SHA}}/${{ github.sha }}/g" \
            deployments/brev/prompt.md > "$RUNNER_TEMP/prompt.md"

      - name: Run compatibility matrix
        env:
          ANTHROPIC_API_KEY: ${{ secrets.NVIDIA_NIM_KEY }}
          ANTHROPIC_BASE_URL: https://inference-api.nvidia.com
          ANTHROPIC_MODEL: ${{ inputs.model || 'aws/anthropic/claude-opus-4-5' }}
          DISABLE_PROMPT_CACHING: "1"
          BREV_API_TOKEN: ${{ secrets.BREV_API_TOKEN }}
          NGC_SERVICE_KEY: ${{ secrets.NGC_SERVICE_KEY }}
        run: |
          npx @anthropic-ai/claude-code@2.1.91 --print \
            --model "$ANTHROPIC_MODEL" \
            --allowedTools "Skill,Bash(brev *),Read,Write,Edit,Glob,Grep" \
            --max-turns 200 \
            "$(cat "$RUNNER_TEMP/prompt.md")"

      # ── Guardrail ────────────────────────────────────────────────────────────
      - name: Guardrail — README only
        run: |
          CHANGED=$(git diff --name-only HEAD)
          ALLOWED="deployments/brev/README.md"
          UNEXPECTED=$(echo "$CHANGED" | grep -v "^$ALLOWED$" || true)
          if [ -n "$UNEXPECTED" ]; then
            echo "::error::Claude modified files outside README.md:"
            echo "$UNEXPECTED"
            git checkout -- .
            exit 1
          fi

      # ── Commit results ───────────────────────────────────────────────────────
      - name: Commit README
        run: |
          if git diff --quiet HEAD -- deployments/brev/README.md; then
            echo "No README changes to commit"
          else
            git add deployments/brev/README.md
            git commit -m "chore: update brev compatibility matrix [skip ci]"
            git push origin HEAD:main
          fi

      # ── Fail if any instance regressed ──────────────────────────────────────
      - name: Check compatibility result
        run: |
          if [ ! -f compat-result.txt ]; then
            echo "::error::compat-result.txt not written — Claude may have failed"
            exit 1
          fi
          RESULT=$(cat compat-result.txt)
          echo "Compatibility result: $RESULT"
          if [ "$RESULT" = "FAIL" ]; then
            echo "::error::One or more providers failed OSMO compatibility — see README matrix"
            exit 1
          fi

      # ── Cleanup (always) ─────────────────────────────────────────────────────
      - name: Delete brev instances
        if: always()
        env:
          BREV_API_TOKEN: ${{ secrets.BREV_API_TOKEN }}
        run: |
          brev login --token "$BREV_API_TOKEN" 2>/dev/null || true
          brev ls 2>/dev/null \
            | grep "osmo-compat-${{ github.run_id }}" \
            | awk '{print $1}' \
            | xargs -r -I{} sh -c 'brev delete {} || true'
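A note on the "Guardrail — README only" step in this file: `git diff --name-only HEAD` reports tracked files only, so a newly created file would pass unnoticed, and the dots in the `grep -v "^$ALLOWED$"` pattern act as regex wildcards. A stricter variant can be sketched as below. It is a sketch, not the PR's code: the helper name `unexpected_changes` is hypothetical, and treating `compat-result.txt` as an expected untracked file is an assumption based on the later "Check compatibility result" step.

```shell
#!/usr/bin/env sh
# Hypothetical helper (not part of the PR): read changed paths on stdin, one per
# line, and print every path that is not in the allow-list given as arguments.
# Paths are compared as literal strings, so "." never acts as a wildcard.
unexpected_changes() {
  while IFS= read -r path; do
    # Skip blank lines, e.g. when nothing changed at all.
    [ -n "$path" ] || continue
    is_allowed=0
    for allowed in "$@"; do
      if [ "$path" = "$allowed" ]; then
        is_allowed=1
        break
      fi
    done
    [ "$is_allowed" -eq 1 ] || printf '%s\n' "$path"
  done
}

# Possible use in the guardrail step (tracked changes plus untracked files):
#   { git diff --name-only HEAD; git ls-files --others --exclude-standard; } \
#     | unexpected_changes deployments/brev/README.md compat-result.txt
```

An empty result means only allow-listed paths changed. The step can then fail on any non-empty output, as the existing `if [ -n "$UNEXPECTED" ]` check already does.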
25 changes: 25 additions & 0 deletions .github/workflows/pr-checks.yaml
@@ -36,6 +36,7 @@ jobs:
      ci: ${{ steps.filter.outputs.ci }}
      docs: ${{ steps.filter.outputs.docs }}
      ui: ${{ steps.filter.outputs.ui }}
      brev: ${{ steps.filter.outputs.brev }}
    steps:
      - name: Checkout
        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
@@ -63,6 +64,10 @@ jobs:
            ui:
              - '.github/workflows/pr-checks.yaml'
              - 'src/ui/**'
            brev:
              - '.github/workflows/pr-checks.yaml'
              - 'deployments/brev/**'
              - 'MODULE.bazel'

  #######################
  # CI Tests #
@@ -255,6 +260,26 @@ jobs:
          echo "Host Docker disk:"
          docker system df 2>/dev/null || true

  #######################
  # Brev Checks #
  #######################
  brev:
    needs: [check-paths]
    if: needs.check-paths.outputs.brev == 'true' || github.event_name == 'workflow_dispatch'
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1

      - name: Setup Bazel
        uses: bazel-contrib/setup-bazel@4fd964a13a440a8aeb0be47350db2fc640f19ca8
        with:
          bazelisk-cache: true
          bazelisk-version: 1.27.0

      - name: Run brev tests
        run: bazel test --test_output=errors //deployments/brev/...

  #######################
  # Docs Build #
  #######################
22 changes: 22 additions & 0 deletions MODULE.bazel
@@ -64,6 +64,28 @@ osmo_constants(
    image_tag = IMAGE_TAG,
)

################
# Shellcheck #
################

# Hermetic shellcheck binaries for sh_test rules in //deployments/brev/...
http_archive(
    name = "shellcheck_linux_x86_64",
    build_file_content = 'exports_files(["shellcheck"])',
    sha256 = "6c881ab0698e4e6ea235245f22832860544f17ba386442fe7e9d629f8cbedf87",
    strip_prefix = "shellcheck-v0.10.0",
    url = "https://github.com/koalaman/shellcheck/releases/download/v0.10.0/shellcheck-v0.10.0.linux.x86_64.tar.xz",
)

http_archive(
    name = "shellcheck_darwin_arm64",
    build_file_content = 'exports_files(["shellcheck"])',
    sha256 = "bbd2f14826328eee7679da7221f2bc3afb011f6a928b848c80c321f6046ddf81",
    strip_prefix = "shellcheck-v0.10.0",
    url = "https://github.com/koalaman/shellcheck/releases/download/v0.10.0/shellcheck-v0.10.0.darwin.aarch64.tar.xz",
)


################
# Common #
################
54 changes: 54 additions & 0 deletions deployments/brev/BUILD
@@ -0,0 +1,54 @@
"""
SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

SPDX-License-Identifier: Apache-2.0
"""

load("@rules_shell//shell:sh_test.bzl", "sh_test")

config_setting(
    name = "linux_x86_64",
    constraint_values = [
        "@platforms//os:linux",
        "@platforms//cpu:x86_64",
    ],
)

config_setting(
    name = "macos_arm64",
    constraint_values = [
        "@platforms//os:macos",
        "@platforms//cpu:arm64",
    ],
)

sh_test(
    name = "shellcheck",
    srcs = ["shellcheck_test.sh"],
    data = ["setup.sh"] + select({
        ":linux_x86_64": ["@shellcheck_linux_x86_64//:shellcheck"],
        ":macos_arm64": ["@shellcheck_darwin_arm64//:shellcheck"],
    }),
    env = select({
        ":linux_x86_64": {
            "SHELLCHECK": "$(location @shellcheck_linux_x86_64//:shellcheck)",
            "SETUP_SH": "$(location setup.sh)",
        },
        ":macos_arm64": {
            "SHELLCHECK": "$(location @shellcheck_darwin_arm64//:shellcheck)",
            "SETUP_SH": "$(location setup.sh)",
        },
    }),
)
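
The `shellcheck_test.sh` script referenced by `srcs` is not shown in this part of the diff. As a rough sketch of how a test could consume the `SHELLCHECK` and `SETUP_SH` variables that the rule's `env` attribute sets, the following assumes the test only needs to run the binary against the script. The function name `run_shellcheck` is hypothetical, and the real script may differ.

```shell
#!/usr/bin/env sh
# Hypothetical sketch (the real shellcheck_test.sh is not shown in the diff).
# Reads SHELLCHECK and SETUP_SH from the environment and runs the linter.
run_shellcheck() {
  if [ -z "${SHELLCHECK:-}" ] || [ -z "${SETUP_SH:-}" ]; then
    echo "SHELLCHECK and SETUP_SH must both be set" >&2
    return 2
  fi
  if [ ! -f "$SETUP_SH" ]; then
    echo "script under test not found: $SETUP_SH" >&2
    return 2
  fi
  # shellcheck exits non-zero when it reports findings, which fails the test.
  "$SHELLCHECK" "$SETUP_SH"
}

# A test script would end with a single call:
#   run_shellcheck
```

Checking the variables up front gives a clear message if the target is built on a platform that neither `select()` branch covers.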
32 changes: 32 additions & 0 deletions deployments/brev/README.md
@@ -30,6 +30,38 @@ The OSMO Brev deployment provides a pre-configured OSMO instance running in the
- NVIDIA Container Toolkit (>=1.18.1)
- NVIDIA Driver Version (>=575)

### Compatibility Matrix

<!-- COMPAT_MATRIX_START -->

Last updated: 2026-04-09

| Provider | Instance Type | GPU | Hello World | Disk Fill | GPU Workload | Notes |
|----------|---------------|-----|-------------|-----------|--------------|-------|
| massedcompute | massedcompute_L40S | L40S 1× | ✅ | ✅ | ✅ | |
| massedcompute | massedcompute_L40 | L40 1× | ✅ | ✅ | ✅ | |
| hyperstack | hyperstack_L40 | L40 1× | ✅ | ✅ | ✅ | Driver <575 min |
| verda | verda_L40S | L40S 1× | ✅ | ✅ | ✅ | |
| scaleway | scaleway_L40S | L40S 1× | ✅ | ✅ | ✅ | Driver <575 min |
| crusoe | l40s-48gb.1x | L40S 1× | ✅ | ✅ | ❌ | nvidia-cdi-refresh failed; GPU not exposed |
| nebius | gpu-l40s-a.1gpu-8vcpu-32gb | L40S 1× | ❌ | ❌ | ❌ | Docker not pre-installed |
| aws | g6e.xlarge | L40S 1× | ❌ | ❌ | ❌ | brev SSH failure |

**Test definitions:**
- **Hello World** — `ubuntu:22.04`, 1 CPU / 1Gi memory / 0 GPU
- **Disk Fill** — `nvcr.io/nvidia/nemo:24.12` (~40 GB); validates Docker data-root relocation
- **GPU Workload** — verifies GPU is exposed in the default pool, then runs MNIST CNN on `nvcr.io/nvidia/pytorch:24.03-py3`

**Status codes:** ✅ · ❌ · `—` (not applicable)

<!-- COMPAT_MATRIX_END -->

<!-- To update manually:
export NGC_SERVICE_KEY=nvapi-...
claude "$(sed -e "s/{{GITHUB_RUN_ID}}/local-$(date +%Y%m%d%H%M%S)/g" -e "s/{{GITHUB_SHA}}/$(git rev-parse HEAD)/g" deployments/brev/prompt.md)

Note: run all brev commands with dangerouslyDisableSandbox: true" -->

## Accessing the Brev Deployment

### Web UI Access
47 changes: 47 additions & 0 deletions deployments/brev/disk-fill-test.yaml
@@ -0,0 +1,47 @@
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# SPDX-License-Identifier: Apache-2.0

# CI validation workflow for the Brev launchable (deployments/brev/).
#
# Purpose:
#   Validates that the Docker data-root relocation in setup.sh correctly moves image
#   storage off the root partition. Pulls nvcr.io/nvidia/nemo:24.12 (~40 GB), which is
#   large enough to exhaust the root filesystem on a Brev instance if the fix is absent.
#   A successful run confirms that image layers are written to the larger mounted disk.
#
# Used by: .github/workflows/brev.yml (weekly E2E job, "Test: large image" step)
#
# Manual use:
#   Prerequisites — register your NGC API key once after OSMO setup:
#     osmo credential set my-ngc-cred \
#       --type REGISTRY \
#       --payload registry=nvcr.io \
#       username='$oauthtoken' \
#       auth=<your_ngc_api_key>
#   Then: osmo workflow submit disk-fill-test.yaml

workflow:
  name: disk-fill-test
  resources:
    default:
      cpu: 1
      memory: 2Gi
      storage: 1Gi
  tasks:
  - name: large-image
    image: nvcr.io/nvidia/nemo:24.12
    command: ["python3"]
    args: ["-c", "import nemo; print(f'NeMo {nemo.__version__} running on OSMO — disk fix verified')"]
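
This workflow tests the data-root relocation indirectly, by pulling an image too large for the root partition. A more direct check on the instance can be sketched as follows, assuming the `docker` CLI is available there. The helper names `mount_point_of` and `on_root_fs` are hypothetical and not part of `setup.sh`. The sketch also assumes mount points contain no spaces.

```shell
#!/usr/bin/env sh
# Hypothetical helpers (not part of setup.sh) to see which filesystem holds a path.

# Print the mount point of the filesystem that contains the given path.
mount_point_of() {
  # df -P prints one POSIX-format line per filesystem after the header;
  # the mount point is the last field of that line.
  df -P "$1" | awk 'NR == 2 { print $NF }'
}

# Succeed if the given path lives on the root filesystem.
on_root_fs() {
  [ "$(mount_point_of "$1")" = "/" ]
}

# Possible use on an instance:
#   data_root=$(docker info --format '{{.DockerRootDir}}')
#   if on_root_fs "$data_root"; then
#     echo "Docker data-root is still on the root partition: $data_root" >&2
#   fi
```

If `on_root_fs` succeeds for the Docker root directory, image layers are still being written to the root partition, which is the condition the large image pull is meant to expose.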