Skip to content

ci: add brev launchable shellcheck and weekly e2e workflow#829

Open
elookpotts-nvidia wants to merge 2 commits intomainfrom
elookpotts/brev-test
Open

ci: add brev launchable shellcheck and weekly e2e workflow#829
elookpotts-nvidia wants to merge 2 commits intomainfrom
elookpotts/brev-test

Conversation

@elookpotts-nvidia
Copy link
Copy Markdown
Contributor

@elookpotts-nvidia elookpotts-nvidia commented Apr 7, 2026

Description

Adds CI for the Brev launchable (deployments/brev/).

Shellcheck runs on every PR touching deployments/brev/**, catching shell syntax and style issues in setup.sh.

Weekly E2E runs every Monday (and on manual dispatch) against a real Brev GPU instance provisioned from the launchable. It:

  • Verifies the installed osmo/quick-start Helm chart version matches the latest published release (fails the job if not)
  • Runs three workflow smoke tests: hello world (CPU), Isaac Sim SDG (GPU, from cookbook), and a large-image pull test using nvcr.io/nvidia/nemo:24.12 (~40 GB, validating the Docker data-root fix)
  • Always tears down the Brev instance, even on failure

The large-image test workflow (disk-fill-test.yaml) lives in deployments/brev/ rather than the cookbook since it exists to validate the deployment itself, not as a user-facing example.

Requires a brev-e2e GitHub environment with BREV_API_TOKEN and NGC_API_KEY secrets.

Issue #None

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Summary by CodeRabbit

  • New Features
    • Added automated testing infrastructure for deployment validation and workflow execution.
    • Introduced scheduled and manual trigger options for continuous integration testing.
    • Enabled multi-workflow testing with configurable timeouts and automated instance cleanup.

@elookpotts-nvidia elookpotts-nvidia requested a review from a team as a code owner April 7, 2026 21:59
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 7, 2026

📝 Walkthrough

Walkthrough

Adds a new GitHub Actions workflow for automated end-to-end testing on Brev instances, triggered on schedule, manual dispatch, or PR changes to deployments/brev/. Includes provisioning, SSH/Kubernetes readiness validation, Helm version comparison, NGC credential configuration, and submission of multiple test workflows. Also adds a disk-fill-test workflow definition.

Changes

Cohort / File(s) Summary
GitHub Actions Workflow
.github/workflows/brev.yml
New Brev testing workflow with E2E provisioning, SSH/K8s readiness checks, Helm version validation, NGC credential setup, and automated workflow submission with polling and cleanup logic.
Test Workflow Definition
deployments/brev/disk-fill-test.yaml
New workflow definition for disk-fill-test task that imports NeMo module and exercises container image layer writing with default resource requests.

Sequence Diagram(s)

sequenceDiagram
    actor GitHub as GitHub Actions
    participant Brev as Brev<br/>Service
    participant K8s as Kubernetes<br/>Cluster
    participant NGC as NGC<br/>Registry
    participant Osmo as Osmo<br/>Workflows

    GitHub->>Brev: Provision instance with launchable ID
    Brev-->>GitHub: Instance created
    GitHub->>Brev: Poll for IP/DNS via 'brev ls'
    Brev-->>GitHub: IP/DNS ready
    GitHub->>Brev: Validate SSH connection
    Brev-->>GitHub: SSH ready
    GitHub->>K8s: Poll pod status in osmo namespace
    K8s-->>GitHub: All pods Running/Completed
    GitHub->>Osmo: Compare Helm chart versions
    Osmo-->>GitHub: Version mismatch (record result)
    GitHub->>NGC: Configure credential via 'osmo credential set'
    NGC-->>GitHub: Credential configured
    GitHub->>Osmo: Submit hello_world workflow
    Osmo-->>GitHub: Workflow queued
    GitHub->>Osmo: Poll workflow status
    Osmo-->>GitHub: Workflow completed
    GitHub->>Osmo: Submit isaac_sim_sdg & disk-fill-test
    Osmo-->>GitHub: Workflows submitted & validated
    GitHub->>Brev: Delete instance (cleanup)
    Brev-->>GitHub: Instance deleted
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A Brev workflow hops to life,
With checks and balances, no strife!
From provisioning to pods so green,
The finest testing I've ever seen—
Workflows dance, then cleanup springs,
What infrastructure magic brings! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately summarizes the main changes: adding a ShellCheck job and a weekly e2e workflow for the Brev launchable, which are the primary objectives of the changeset.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch elookpotts/brev-test

Comment @coderabbitai help to get the list of available commands and usage tips.

Base automatically changed from elookpotts/brev-mount to main April 7, 2026 23:14
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/brev.yml:
- Around line 69-72: The workflow step named "Install Brev CLI" currently pipes
a script from the mutable main branch into bash (curl | bash); replace that with
downloading the published GitHub Release tarball for the desired Brev version,
verify the release checksum/signature, extract the binary into $HOME/.brev/bin
(create the directory if needed), and then append $HOME/.brev/bin to
GITHUB_PATH; ensure the run block references a fixed release tag/version rather
than main and includes explicit checksum verification before extraction.
- Around line 111-116: The readiness check assigns NOT_READY from a pipeline
using grep -vcE which can exit non-zero when there are no matches, causing the
current "|| echo \"error\"" fallback to inject "error" and break the equality
check against "0". Change the pipeline that sets NOT_READY (the
ssh/kubectl/awk/grep pipeline) so it always returns a numeric count on success
or fallback to "0" on failure — for example replace the grep-count approach with
a robust alternative (e.g., pipe through wc -l or append "|| true"/"|| echo 0"
to the pipeline) so NOT_READY is reliably a number and the if [ "$NOT_READY" =
"0" ] check (and the readiness loop) works correctly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 435ac92f-5aed-4b15-a42e-8e405805a0b7

📥 Commits

Reviewing files that changed from the base of the PR and between df4bc8d and 0791e78.

📒 Files selected for processing (2)
  • .github/workflows/brev.yml
  • deployments/brev/disk-fill-test.yaml

Comment on lines +69 to +72
- name: Install Brev CLI
run: |
curl -sfL https://raw.githubusercontent.com/brevdev/brev-cli/main/bin/install-brev.sh | bash
echo "$HOME/.brev/bin" >> "$GITHUB_PATH"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

What is the recommended CI installation method for brevdev/brev-cli (without pipe-to-shell), and does the project publish release checksums/signatures for verification?

💡 Result:

Recommended CI install (no pipe-to-shell)

For CI, use the published GitHub Release tarball (pin a version), extract brev, and place it on PATH—i.e., do what the repo’s install-latest.sh does, but as explicit CI steps (so you’re not executing a remote script). The upstream installer script simply downloads the matching *_linux_{amd64,arm64}.tar.gz from GitHub Releases and installs the brev binary. [2]

(For non-CI / dev machines, the README’s “official” installs are Homebrew on macOS and the install-latest.sh approach on Linux/WSL; conda-forge is also mentioned.) [1]

Release verification (checksums/signatures)

  • Checksums: Yes. Releases include a brev-cli_<version>_checksums.txt asset. [3]
  • Signatures: I do not see any detached signature assets (e.g., .asc, .sig) alongside the release artifacts—only the tarballs and the checksums file are published. [3]

Sources: [1] README install section, [2] bin/install-latest.sh, [3] GitHub Releases API for latest release assets.


Use published GitHub Release tarball instead of pipe-to-shell from main branch.

Installing Brev CLI via curl | bash from the mutable main branch is a supply-chain risk, particularly in a CI job with environment secrets. For CI, use the published GitHub Release tarball with checksum verification instead.

Recommended approach
-          curl -sfL https://raw.githubusercontent.com/brevdev/brev-cli/main/bin/install-brev.sh | bash
+          BREV_VERSION="v0.x.y"  # Pin to a release tag
+          BREV_RELEASE_URL="https://github.com/brevdev/brev-cli/releases/download/${BREV_VERSION}/brev-cli_${BREV_VERSION#v}_linux_amd64.tar.gz"
+          curl -fsSL "${BREV_RELEASE_URL}" -o /tmp/brev-cli.tar.gz
+          curl -fsSL "https://github.com/brevdev/brev-cli/releases/download/${BREV_VERSION}/brev-cli_${BREV_VERSION#v}_checksums.txt" -o /tmp/checksums.txt
+          sha256sum --check --ignore-missing /tmp/checksums.txt
+          tar -xzf /tmp/brev-cli.tar.gz -C "$HOME/.brev/bin" --strip-components=1
           echo "$HOME/.brev/bin" >> "$GITHUB_PATH"

This avoids script execution entirely and verifies checksums against published release artifacts.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/brev.yml around lines 69 - 72, The workflow step named
"Install Brev CLI" currently pipes a script from the mutable main branch into
bash (curl | bash); replace that with downloading the published GitHub Release
tarball for the desired Brev version, verify the release checksum/signature,
extract the binary into $HOME/.brev/bin (create the directory if needed), and
then append $HOME/.brev/bin to GITHUB_PATH; ensure the run block references a
fixed release tag/version rather than main and includes explicit checksum
verification before extraction.

Comment on lines +111 to +116
NOT_READY=$(ssh -i "$SSH_KEY" -o StrictHostKeyChecking=no \
"$SSH_USER@$INSTANCE_IP" \
"kubectl get pods -n osmo --no-headers 2>/dev/null \
| awk '{print \$3}' | grep -vcE '^(Running|Completed)$'" 2>/dev/null || echo "error")
if [ "$NOT_READY" = "0" ]; then
echo "OSMO is ready"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Demonstrates current behavior: count=0 but fallback branch still runs due grep exit code.
NOT_READY=$(bash -lc 'printf "p1 Running\np2 Completed\n" | awk "{print \$2}" | grep -vcE "^(Running|Completed)$"' || echo "error")
printf 'Observed NOT_READY value: [%s]\n' "$NOT_READY"
# Expected problematic output includes "error" despite zero non-ready pods.

Repository: NVIDIA/OSMO

Length of output: 206


🏁 Script executed:

cat -n .github/workflows/brev.yml | sed -n '105,120p'

Repository: NVIDIA/OSMO

Length of output: 838


🏁 Script executed:

# Test 1: Verify grep -vcE exit code behavior
# When count is 0 (no non-matching lines), what is the exit code?
bash -c 'echo -e "Running\nCompleted" | grep -vcE "^(Running|Completed)$"; echo "Exit code: $?"'

Repository: NVIDIA/OSMO

Length of output: 69


🏁 Script executed:

# Test 2: Verify the exact behavior with ssh-like command substitution
# Simulate the pattern in the review comment
bash -c 'result=$(echo -e "Running\nCompleted" | grep -vcE "^(Running|Completed)$" || echo "error"); echo "Result: [$result]"'

Repository: NVIDIA/OSMO

Length of output: 72


Readiness check can misclassify ready state and timeout indefinitely.

When all pods are Running/Completed, grep -vcE outputs 0 but exits non-zero (standard grep behavior with count mode and no matches). The || echo "error" fallback then executes, causing NOT_READY to contain both 0 and error as separate outputs. This causes the if [ "$NOT_READY" = "0" ] check on line 115 to fail indefinitely, resulting in the loop exhausting all 60 retries before timing out.

Suggested fix
-            NOT_READY=$(ssh -i "$SSH_KEY" -o StrictHostKeyChecking=no \
-              "$SSH_USER@$INSTANCE_IP" \
-              "kubectl get pods -n osmo --no-headers 2>/dev/null \
-               | awk '{print \$3}' | grep -vcE '^(Running|Completed)$'" 2>/dev/null || echo "error")
+            NOT_READY=$(ssh -i "$SSH_KEY" -o StrictHostKeyChecking=no \
+              "$SSH_USER@$INSTANCE_IP" \
+              "kubectl get pods -n osmo --no-headers 2>/dev/null \
+               | awk '\$3 !~ /^(Running|Completed)$/ {c++} END {print c+0}'" 2>/dev/null || echo "error")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/brev.yml around lines 111 - 116, The readiness check
assigns NOT_READY from a pipeline using grep -vcE which can exit non-zero when
there are no matches, causing the current "|| echo \"error\"" fallback to inject
"error" and break the equality check against "0". Change the pipeline that sets
NOT_READY (the ssh/kubectl/awk/grep pipeline) so it always returns a numeric
count on success or fallback to "0" on failure — for example replace the
grep-count approach with a robust alternative (e.g., pipe through wc -l or
append "|| true"/"|| echo 0" to the pipeline) so NOT_READY is reliably a number
and the if [ "$NOT_READY" = "0" ] check (and the readiness loop) works
correctly.

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 8, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 42.56%. Comparing base (df4bc8d) to head (0791e78).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #829      +/-   ##
==========================================
- Coverage   42.61%   42.56%   -0.06%     
==========================================
  Files         203      203              
  Lines       27025    27025              
  Branches     7652     7652              
==========================================
- Hits        11517    11502      -15     
- Misses      15397    15411      +14     
- Partials      111      112       +1     
Flag Coverage Δ
backend 44.78% <ø> (-0.07%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.
see 4 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant