ci: add brev launchable shellcheck and weekly e2e workflow by elookpotts-nvidia · Pull Request #829 · NVIDIA/OSMO

elookpotts-nvidia · 2026-04-07T21:59:22Z

Description

Adds CI for the Brev launchable (deployments/brev/).

Shellcheck runs on every PR touching deployments/brev/**, catching shell syntax and style issues in setup.sh.

Weekly E2E runs every Monday (and on manual dispatch) against a real Brev GPU instance provisioned from the launchable. It:

Verifies the installed osmo/quick-start Helm chart version matches the latest published release (fails the job if not)
Runs three workflow smoke tests: hello world (CPU), Isaac Sim SDG (GPU, from cookbook), and a large-image pull test using nvcr.io/nvidia/nemo:24.12 (~40 GB, validating the Docker data-root fix)
Always tears down the Brev instance, even on failure

The large-image test workflow (disk-fill-test.yaml) lives in deployments/brev/ rather than the cookbook since it exists to validate the deployment itself, not as a user-facing example.

Requires a brev-e2e GitHub environment with BREV_API_TOKEN and NGC_API_KEY secrets.

Issue #None

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

Summary by CodeRabbit

New Features
- Added automated testing infrastructure for deployment validation and workflow execution.
- Introduced scheduled and manual trigger options for continuous integration testing.
- Enabled multi-workflow testing with configurable timeouts and automated instance cleanup.

coderabbitai · 2026-04-07T21:59:28Z

📝 Walkthrough

Walkthrough

Adds a new GitHub Actions workflow for automated end-to-end testing on Brev instances, triggered on schedule, manual dispatch, or PR changes to deployments/brev/. Includes provisioning, SSH/Kubernetes readiness validation, Helm version comparison, NGC credential configuration, and submission of multiple test workflows. Also adds a disk-fill-test workflow definition.

Changes

Cohort / File(s)	Summary
GitHub Actions Workflow `.github/workflows/brev.yml`	New Brev testing workflow with E2E provisioning, SSH/K8s readiness checks, Helm version validation, NGC credential setup, and automated workflow submission with polling and cleanup logic.
Test Workflow Definition `deployments/brev/disk-fill-test.yaml`	New workflow definition for disk-fill-test task that imports NeMo module and exercises container image layer writing with default resource requests.

Sequence Diagram(s)

sequenceDiagram
    actor GitHub as GitHub Actions
    participant Brev as Brev<br/>Service
    participant K8s as Kubernetes<br/>Cluster
    participant NGC as NGC<br/>Registry
    participant Osmo as Osmo<br/>Workflows

    GitHub->>Brev: Provision instance with launchable ID
    Brev-->>GitHub: Instance created
    GitHub->>Brev: Poll for IP/DNS via 'brev ls'
    Brev-->>GitHub: IP/DNS ready
    GitHub->>Brev: Validate SSH connection
    Brev-->>GitHub: SSH ready
    GitHub->>K8s: Poll pod status in osmo namespace
    K8s-->>GitHub: All pods Running/Completed
    GitHub->>Osmo: Compare Helm chart versions
    Osmo-->>GitHub: Version mismatch (record result)
    GitHub->>NGC: Configure credential via 'osmo credential set'
    NGC-->>GitHub: Credential configured
    GitHub->>Osmo: Submit hello_world workflow
    Osmo-->>GitHub: Workflow queued
    GitHub->>Osmo: Poll workflow status
    Osmo-->>GitHub: Workflow completed
    GitHub->>Osmo: Submit isaac_sim_sdg & disk-fill-test
    Osmo-->>GitHub: Workflows submitted & validated
    GitHub->>Brev: Delete instance (cleanup)
    Brev-->>GitHub: Instance deleted

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A Brev workflow hops to life,
With checks and balances, no strife!
From provisioning to pods so green,
The finest testing I've ever seen—
Workflows dance, then cleanup springs,
What infrastructure magic brings! ✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title accurately summarizes the main changes: adding a ShellCheck job and a weekly e2e workflow for the Brev launchable, which are the primary objectives of the changeset.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch elookpotts/brev-test

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/brev.yml:
- Around line 69-72: The workflow step named "Install Brev CLI" currently pipes
a script from the mutable main branch into bash (curl | bash); replace that with
downloading the published GitHub Release tarball for the desired Brev version,
verify the release checksum/signature, extract the binary into $HOME/.brev/bin
(create the directory if needed), and then append $HOME/.brev/bin to
GITHUB_PATH; ensure the run block references a fixed release tag/version rather
than main and includes explicit checksum verification before extraction.
- Around line 111-116: The readiness check assigns NOT_READY from a pipeline
using grep -vcE which can exit non-zero when there are no matches, causing the
current "|| echo \"error\"" fallback to inject "error" and break the equality
check against "0". Change the pipeline that sets NOT_READY (the
ssh/kubectl/awk/grep pipeline) so it always returns a numeric count on success
or fallback to "0" on failure — for example replace the grep-count approach with
a robust alternative (e.g., pipe through wc -l or append "|| true"/"|| echo 0"
to the pipeline) so NOT_READY is reliably a number and the if [ "$NOT_READY" =
"0" ] check (and the readiness loop) works correctly.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 435ac92f-5aed-4b15-a42e-8e405805a0b7

📥 Commits

Reviewing files that changed from the base of the PR and between df4bc8d and 0791e78.

📒 Files selected for processing (2)

.github/workflows/brev.yml
deployments/brev/disk-fill-test.yaml

coderabbitai · 2026-04-08T14:57:23Z

.github/workflows/brev.yml

+      - name: Install Brev CLI
+        run: |
+          curl -sfL https://raw.githubusercontent.com/brevdev/brev-cli/main/bin/install-brev.sh | bash
+          echo "$HOME/.brev/bin" >> "$GITHUB_PATH"


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

What is the recommended CI installation method for brevdev/brev-cli (without pipe-to-shell), and does the project publish release checksums/signatures for verification?

💡 Result:

Recommended CI install (no pipe-to-shell)

For CI, use the published GitHub Release tarball (pin a version), extract brev, and place it on PATH—i.e., do what the repo’s install-latest.sh does, but as explicit CI steps (so you’re not executing a remote script). The upstream installer script simply downloads the matching *_linux_{amd64,arm64}.tar.gz from GitHub Releases and installs the brev binary. [2]

(For non-CI / dev machines, the README’s “official” installs are Homebrew on macOS and the install-latest.sh approach on Linux/WSL; conda-forge is also mentioned.) [1]

Release verification (checksums/signatures)

Checksums: Yes. Releases include a brev-cli_<version>_checksums.txt asset. [3]

Signatures: I do not see any detached signature assets (e.g., .asc, .sig) alongside the release artifacts—only the tarballs and the checksums file are published. [3]

Sources: [1] README install section, [2] bin/install-latest.sh, [3] GitHub Releases API for latest release assets.

Use published GitHub Release tarball instead of pipe-to-shell from main branch.

Installing Brev CLI via curl | bash from the mutable main branch is a supply-chain risk, particularly in a CI job with environment secrets. For CI, use the published GitHub Release tarball with checksum verification instead.

Recommended approach

- curl -sfL https://raw.githubusercontent.com/brevdev/brev-cli/main/bin/install-brev.sh | bash + BREV_VERSION="v0.x.y" # Pin to a release tag + BREV_RELEASE_URL="https://github.com/brevdev/brev-cli/releases/download/${BREV_VERSION}/brev-cli_${BREV_VERSION#v}_linux_amd64.tar.gz" + curl -fsSL "${BREV_RELEASE_URL}" -o /tmp/brev-cli.tar.gz + curl -fsSL "https://github.com/brevdev/brev-cli/releases/download/${BREV_VERSION}/brev-cli_${BREV_VERSION#v}_checksums.txt" -o /tmp/checksums.txt + sha256sum --check --ignore-missing /tmp/checksums.txt + tar -xzf /tmp/brev-cli.tar.gz -C "$HOME/.brev/bin" --strip-components=1 echo "$HOME/.brev/bin" >> "$GITHUB_PATH"

This avoids script execution entirely and verifies checksums against published release artifacts.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.github/workflows/brev.yml around lines 69 - 72, The workflow step named "Install Brev CLI" currently pipes a script from the mutable main branch into bash (curl | bash); replace that with downloading the published GitHub Release tarball for the desired Brev version, verify the release checksum/signature, extract the binary into $HOME/.brev/bin (create the directory if needed), and then append $HOME/.brev/bin to GITHUB_PATH; ensure the run block references a fixed release tag/version rather than main and includes explicit checksum verification before extraction.

coderabbitai · 2026-04-08T14:57:23Z

.github/workflows/brev.yml

+            NOT_READY=$(ssh -i "$SSH_KEY" -o StrictHostKeyChecking=no \
+              "$SSH_USER@$INSTANCE_IP" \
+              "kubectl get pods -n osmo --no-headers 2>/dev/null \
+               | awk '{print \$3}' | grep -vcE '^(Running|Completed)$'" 2>/dev/null || echo "error")
+            if [ "$NOT_READY" = "0" ]; then
+              echo "OSMO is ready"


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail # Demonstrates current behavior: count=0 but fallback branch still runs due grep exit code. NOT_READY=$(bash -lc 'printf "p1 Running\np2 Completed\n" | awk "{print \$2}" | grep -vcE "^(Running|Completed)$"' || echo "error") printf 'Observed NOT_READY value: [%s]\n' "$NOT_READY" # Expected problematic output includes "error" despite zero non-ready pods.

Repository: NVIDIA/OSMO

Length of output: 206

🏁 Script executed:

cat -n .github/workflows/brev.yml | sed -n '105,120p'

Repository: NVIDIA/OSMO

Length of output: 838

🏁 Script executed:

# Test 1: Verify grep -vcE exit code behavior # When count is 0 (no non-matching lines), what is the exit code? bash -c 'echo -e "Running\nCompleted" | grep -vcE "^(Running|Completed)$"; echo "Exit code: $?"'

Repository: NVIDIA/OSMO

Length of output: 69

🏁 Script executed:

# Test 2: Verify the exact behavior with ssh-like command substitution # Simulate the pattern in the review comment bash -c 'result=$(echo -e "Running\nCompleted" | grep -vcE "^(Running|Completed)$" || echo "error"); echo "Result: [$result]"'

Repository: NVIDIA/OSMO

Length of output: 72

Readiness check can misclassify ready state and timeout indefinitely.

When all pods are Running/Completed, grep -vcE outputs 0 but exits non-zero (standard grep behavior with count mode and no matches). The || echo "error" fallback then executes, causing NOT_READY to contain both 0 and error as separate outputs. This causes the if [ "$NOT_READY" = "0" ] check on line 115 to fail indefinitely, resulting in the loop exhausting all 60 retries before timing out.

Suggested fix

- NOT_READY=$(ssh -i "$SSH_KEY" -o StrictHostKeyChecking=no \ - "$SSH_USER@$INSTANCE_IP" \ - "kubectl get pods -n osmo --no-headers 2>/dev/null \ - | awk '{print \$3}' | grep -vcE '^(Running|Completed)$'" 2>/dev/null || echo "error") + NOT_READY=$(ssh -i "$SSH_KEY" -o StrictHostKeyChecking=no \ + "$SSH_USER@$INSTANCE_IP" \ + "kubectl get pods -n osmo --no-headers 2>/dev/null \ + | awk '\$3 !~ /^(Running|Completed)$/ {c++} END {print c+0}'" 2>/dev/null || echo "error")

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.github/workflows/brev.yml around lines 111 - 116, The readiness check assigns NOT_READY from a pipeline using grep -vcE which can exit non-zero when there are no matches, causing the current "|| echo \"error\"" fallback to inject "error" and break the equality check against "0". Change the pipeline that sets NOT_READY (the ssh/kubectl/awk/grep pipeline) so it always returns a numeric count on success or fallback to "0" on failure — for example replace the grep-count approach with a robust alternative (e.g., pipe through wc -l or append "|| true"/"|| echo 0" to the pipeline) so NOT_READY is reliably a number and the if [ "$NOT_READY" = "0" ] check (and the readiness loop) works correctly.

codecov · 2026-04-08T14:59:09Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 42.56%. Comparing base (df4bc8d) to head (0791e78).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #829      +/-   ##
==========================================
- Coverage   42.61%   42.56%   -0.06%     
==========================================
  Files         203      203              
  Lines       27025    27025              
  Branches     7652     7652              
==========================================
- Hits        11517    11502      -15     
- Misses      15397    15411      +14     
- Partials      111      112       +1

Flag	Coverage Δ
backend	`44.78% <ø> (-0.07%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.
see 4 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

elookpotts-nvidia requested a review from a team as a code owner April 7, 2026 21:59

Base automatically changed from elookpotts/brev-mount to main April 7, 2026 23:14

elookpotts-nvidia added 2 commits April 8, 2026 10:45

ci: add brev launchable shellcheck and weekly e2e workflow

2ab6508

docs: add CI purpose docstring to disk-fill-test.yaml

0791e78

elookpotts-nvidia force-pushed the elookpotts/brev-test branch from 8234011 to 0791e78 Compare April 8, 2026 14:52

elookpotts-nvidia temporarily deployed to internal-ci April 8, 2026 14:52 — with GitHub Actions Inactive

coderabbitai bot reviewed Apr 8, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ci: add brev launchable shellcheck and weekly e2e workflow#829

ci: add brev launchable shellcheck and weekly e2e workflow#829
elookpotts-nvidia wants to merge 2 commits intomainfrom
elookpotts/brev-test

elookpotts-nvidia commented Apr 7, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Apr 7, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Apr 8, 2026

Uh oh!

coderabbitai bot Apr 8, 2026

Uh oh!

codecov bot commented Apr 8, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

elookpotts-nvidia commented Apr 7, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Checklist

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 8, 2026

Choose a reason for hiding this comment

Recommended CI install (no pipe-to-shell)

Release verification (checksums/signatures)

Uh oh!

coderabbitai bot Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

codecov bot commented Apr 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

elookpotts-nvidia commented Apr 7, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Apr 7, 2026 •

edited

Loading

codecov bot commented Apr 8, 2026 •

edited

Loading