ci: add brev launchable shellcheck and weekly e2e workflow#829
ci: add brev launchable shellcheck and weekly e2e workflow#829elookpotts-nvidia wants to merge 2 commits intomainfrom
Conversation
📝 WalkthroughWalkthroughAdds a new GitHub Actions workflow for automated end-to-end testing on Brev instances, triggered on schedule, manual dispatch, or PR changes to Changes
Sequence Diagram(s)sequenceDiagram
actor GitHub as GitHub Actions
participant Brev as Brev<br/>Service
participant K8s as Kubernetes<br/>Cluster
participant NGC as NGC<br/>Registry
participant Osmo as Osmo<br/>Workflows
GitHub->>Brev: Provision instance with launchable ID
Brev-->>GitHub: Instance created
GitHub->>Brev: Poll for IP/DNS via 'brev ls'
Brev-->>GitHub: IP/DNS ready
GitHub->>Brev: Validate SSH connection
Brev-->>GitHub: SSH ready
GitHub->>K8s: Poll pod status in osmo namespace
K8s-->>GitHub: All pods Running/Completed
GitHub->>Osmo: Compare Helm chart versions
Osmo-->>GitHub: Version mismatch (record result)
GitHub->>NGC: Configure credential via 'osmo credential set'
NGC-->>GitHub: Credential configured
GitHub->>Osmo: Submit hello_world workflow
Osmo-->>GitHub: Workflow queued
GitHub->>Osmo: Poll workflow status
Osmo-->>GitHub: Workflow completed
GitHub->>Osmo: Submit isaac_sim_sdg & disk-fill-test
Osmo-->>GitHub: Workflows submitted & validated
GitHub->>Brev: Delete instance (cleanup)
Brev-->>GitHub: Instance deleted
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
8234011 to
0791e78
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/brev.yml:
- Around line 69-72: The workflow step named "Install Brev CLI" currently pipes
a script from the mutable main branch into bash (curl | bash); replace that with
downloading the published GitHub Release tarball for the desired Brev version,
verify the release checksum/signature, extract the binary into $HOME/.brev/bin
(create the directory if needed), and then append $HOME/.brev/bin to
GITHUB_PATH; ensure the run block references a fixed release tag/version rather
than main and includes explicit checksum verification before extraction.
- Around line 111-116: The readiness check assigns NOT_READY from a pipeline
using grep -vcE which can exit non-zero when there are no matches, causing the
current "|| echo \"error\"" fallback to inject "error" and break the equality
check against "0". Change the pipeline that sets NOT_READY (the
ssh/kubectl/awk/grep pipeline) so it always returns a numeric count on success
or fallback to "0" on failure — for example replace the grep-count approach with
a robust alternative (e.g., pipe through wc -l or append "|| true"/"|| echo 0"
to the pipeline) so NOT_READY is reliably a number and the if [ "$NOT_READY" =
"0" ] check (and the readiness loop) works correctly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 435ac92f-5aed-4b15-a42e-8e405805a0b7
📒 Files selected for processing (2)
.github/workflows/brev.ymldeployments/brev/disk-fill-test.yaml
| - name: Install Brev CLI | ||
| run: | | ||
| curl -sfL https://raw.githubusercontent.com/brevdev/brev-cli/main/bin/install-brev.sh | bash | ||
| echo "$HOME/.brev/bin" >> "$GITHUB_PATH" |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
What is the recommended CI installation method for brevdev/brev-cli (without pipe-to-shell), and does the project publish release checksums/signatures for verification?
💡 Result:
Recommended CI install (no pipe-to-shell)
For CI, use the published GitHub Release tarball (pin a version), extract brev, and place it on PATH—i.e., do what the repo’s install-latest.sh does, but as explicit CI steps (so you’re not executing a remote script). The upstream installer script simply downloads the matching *_linux_{amd64,arm64}.tar.gz from GitHub Releases and installs the brev binary. [2]
(For non-CI / dev machines, the README’s “official” installs are Homebrew on macOS and the install-latest.sh approach on Linux/WSL; conda-forge is also mentioned.) [1]
Release verification (checksums/signatures)
- Checksums: Yes. Releases include a
brev-cli_<version>_checksums.txtasset. [3] - Signatures: I do not see any detached signature assets (e.g.,
.asc,.sig) alongside the release artifacts—only the tarballs and the checksums file are published. [3]
Sources: [1] README install section, [2] bin/install-latest.sh, [3] GitHub Releases API for latest release assets.
Use published GitHub Release tarball instead of pipe-to-shell from main branch.
Installing Brev CLI via curl | bash from the mutable main branch is a supply-chain risk, particularly in a CI job with environment secrets. For CI, use the published GitHub Release tarball with checksum verification instead.
Recommended approach
- curl -sfL https://raw.githubusercontent.com/brevdev/brev-cli/main/bin/install-brev.sh | bash
+ BREV_VERSION="v0.x.y" # Pin to a release tag
+ BREV_RELEASE_URL="https://github.com/brevdev/brev-cli/releases/download/${BREV_VERSION}/brev-cli_${BREV_VERSION#v}_linux_amd64.tar.gz"
+ curl -fsSL "${BREV_RELEASE_URL}" -o /tmp/brev-cli.tar.gz
+ curl -fsSL "https://github.com/brevdev/brev-cli/releases/download/${BREV_VERSION}/brev-cli_${BREV_VERSION#v}_checksums.txt" -o /tmp/checksums.txt
+ sha256sum --check --ignore-missing /tmp/checksums.txt
+ tar -xzf /tmp/brev-cli.tar.gz -C "$HOME/.brev/bin" --strip-components=1
echo "$HOME/.brev/bin" >> "$GITHUB_PATH"This avoids script execution entirely and verifies checksums against published release artifacts.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/brev.yml around lines 69 - 72, The workflow step named
"Install Brev CLI" currently pipes a script from the mutable main branch into
bash (curl | bash); replace that with downloading the published GitHub Release
tarball for the desired Brev version, verify the release checksum/signature,
extract the binary into $HOME/.brev/bin (create the directory if needed), and
then append $HOME/.brev/bin to GITHUB_PATH; ensure the run block references a
fixed release tag/version rather than main and includes explicit checksum
verification before extraction.
| NOT_READY=$(ssh -i "$SSH_KEY" -o StrictHostKeyChecking=no \ | ||
| "$SSH_USER@$INSTANCE_IP" \ | ||
| "kubectl get pods -n osmo --no-headers 2>/dev/null \ | ||
| | awk '{print \$3}' | grep -vcE '^(Running|Completed)$'" 2>/dev/null || echo "error") | ||
| if [ "$NOT_READY" = "0" ]; then | ||
| echo "OSMO is ready" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Demonstrates current behavior: count=0 but fallback branch still runs due grep exit code.
NOT_READY=$(bash -lc 'printf "p1 Running\np2 Completed\n" | awk "{print \$2}" | grep -vcE "^(Running|Completed)$"' || echo "error")
printf 'Observed NOT_READY value: [%s]\n' "$NOT_READY"
# Expected problematic output includes "error" despite zero non-ready pods.Repository: NVIDIA/OSMO
Length of output: 206
🏁 Script executed:
cat -n .github/workflows/brev.yml | sed -n '105,120p'Repository: NVIDIA/OSMO
Length of output: 838
🏁 Script executed:
# Test 1: Verify grep -vcE exit code behavior
# When count is 0 (no non-matching lines), what is the exit code?
bash -c 'echo -e "Running\nCompleted" | grep -vcE "^(Running|Completed)$"; echo "Exit code: $?"'Repository: NVIDIA/OSMO
Length of output: 69
🏁 Script executed:
# Test 2: Verify the exact behavior with ssh-like command substitution
# Simulate the pattern in the review comment
bash -c 'result=$(echo -e "Running\nCompleted" | grep -vcE "^(Running|Completed)$" || echo "error"); echo "Result: [$result]"'Repository: NVIDIA/OSMO
Length of output: 72
Readiness check can misclassify ready state and timeout indefinitely.
When all pods are Running/Completed, grep -vcE outputs 0 but exits non-zero (standard grep behavior with count mode and no matches). The || echo "error" fallback then executes, causing NOT_READY to contain both 0 and error as separate outputs. This causes the if [ "$NOT_READY" = "0" ] check on line 115 to fail indefinitely, resulting in the loop exhausting all 60 retries before timing out.
Suggested fix
- NOT_READY=$(ssh -i "$SSH_KEY" -o StrictHostKeyChecking=no \
- "$SSH_USER@$INSTANCE_IP" \
- "kubectl get pods -n osmo --no-headers 2>/dev/null \
- | awk '{print \$3}' | grep -vcE '^(Running|Completed)$'" 2>/dev/null || echo "error")
+ NOT_READY=$(ssh -i "$SSH_KEY" -o StrictHostKeyChecking=no \
+ "$SSH_USER@$INSTANCE_IP" \
+ "kubectl get pods -n osmo --no-headers 2>/dev/null \
+ | awk '\$3 !~ /^(Running|Completed)$/ {c++} END {print c+0}'" 2>/dev/null || echo "error")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/brev.yml around lines 111 - 116, The readiness check
assigns NOT_READY from a pipeline using grep -vcE which can exit non-zero when
there are no matches, causing the current "|| echo \"error\"" fallback to inject
"error" and break the equality check against "0". Change the pipeline that sets
NOT_READY (the ssh/kubectl/awk/grep pipeline) so it always returns a numeric
count on success or fallback to "0" on failure — for example replace the
grep-count approach with a robust alternative (e.g., pipe through wc -l or
append "|| true"/"|| echo 0" to the pipeline) so NOT_READY is reliably a number
and the if [ "$NOT_READY" = "0" ] check (and the readiness loop) works
correctly.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #829 +/- ##
==========================================
- Coverage 42.61% 42.56% -0.06%
==========================================
Files 203 203
Lines 27025 27025
Branches 7652 7652
==========================================
- Hits 11517 11502 -15
- Misses 15397 15411 +14
- Partials 111 112 +1
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
Description
Adds CI for the Brev launchable (
deployments/brev/).Shellcheck runs on every PR touching
deployments/brev/**, catching shell syntax and style issues insetup.sh.Weekly E2E runs every Monday (and on manual dispatch) against a real Brev GPU instance provisioned from the launchable. It:
osmo/quick-startHelm chart version matches the latest published release (fails the job if not)nvcr.io/nvidia/nemo:24.12(~40 GB, validating the Docker data-root fix)The large-image test workflow (
disk-fill-test.yaml) lives indeployments/brev/rather than the cookbook since it exists to validate the deployment itself, not as a user-facing example.Requires a
brev-e2eGitHub environment withBREV_API_TOKENandNGC_API_KEYsecrets.Issue #None
Checklist
Summary by CodeRabbit