fix(onboard): upgrade OpenShell to 0.0.24 for sandbox survival across restarts#1587
fix(onboard): upgrade OpenShell to 0.0.24 for sandbox survival across restarts#1587
Conversation
… restarts Bump minimum OpenShell version to 0.0.24 which includes gateway resume with persistent SSH handshake secrets (OpenShell#488) and sandbox state persistence across stop/start cycles (OpenShell#739). - onboard.js: check installed OpenShell version and upgrade if below minimum (previously only installed when missing entirely) - install-openshell.sh: MIN_VERSION 0.0.22 → 0.0.24 - brev-launchable-ci-cpu.sh: default OPENSHELL_VERSION v0.0.20 → v0.0.24 - test-sandbox-survival.sh: rewrite E2E to use install.sh flow, verify workspace files, agent data, SSH connectivity, and live inference all survive a gateway stop/start cycle Validated fixes for #486, #888, #859, #1086. Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
The "interactive mode auto-recreates" test mocked childProcess.spawn but streamSandboxCreate imports spawn at load time from its own "node:child_process" reference, bypassing the mock. This caused a real bash process to try to hit a live gateway, hanging for 87s+ until vitest's timeout killed it. Fix by using streamSandboxCreate's spawnImpl option to inject the fake spawn, preventing any real subprocess from being created. Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
…d tests - Add 10s timeout to smoke-macos-install pipe test to prevent hang - Mock dashboard readiness curl check in 4 onboard createSandbox tests that were sleeping 28s each through unmocked polling loop Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
📝 WalkthroughWalkthroughOpenShell minimum version requirement bumped to 0.0.24 across installation and onboarding paths, with version-checking logic added to onboard.js and CI scripts. The sandbox survival e2e test expanded to verify persistence across gateway and machine restarts, including SSH connectivity validation and state persistence markers. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
test/e2e/test-sandbox-survival.sh (1)
472-476:⚠️ Potential issue | 🟠 MajorDon't let
NotReadysatisfy the ready-state check.Line 475 extracts
running|readyas a substring, soNotReadystill yieldsReady. In that case this E2E can report restart survival even though the sandbox never recovered.🐛 Safer readiness check
- sandbox_phase=$(openshell sandbox list 2>&1 | grep "$SANDBOX_NAME" | grep -oiE 'running|ready' | head -1) + sandbox_phase=$( + openshell sandbox list 2>&1 | + awk -v name="$SANDBOX_NAME" '$1 == name && ($2 == "Running" || $2 == "Ready") { print $2; exit }' + )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/test-sandbox-survival.sh` around lines 472 - 476, The readiness check currently greps "running|ready" which matches substrings like "NotReady"; change the extraction in the loop that assigns sandbox_phase (the openshell sandbox list pipeline referencing SANDBOX_NAME and sandbox_phase) to only match whole words for the states (e.g. use a word-boundary regex like '\b(running|ready)\b' or the grep word-boundary syntax '\<(running|ready)\>' with the existing -i flag) so "NotReady" will not satisfy the check; keep the rest of the loop logic (head -1, non-empty test) intact.
🧹 Nitpick comments (2)
test/onboard.test.js (1)
1689-1689: Keep the curl stub scoped to the dashboard readiness probe.Lines 1689, 2104, 2300, and 2418 now return
"ok"for anysandbox exec ... curl ...command. That fixes the hang, but it also lets these tests stay green if the readiness probe regresses to the wrong URL or port. Matching the expectedlocalhost:18789probe keeps the sleep fix without weakening the assertion.Also applies to: 2104-2104, 2300-2300, 2418-2418
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/onboard.test.js` at line 1689, The current stub in test/onboard.test.js that returns "ok" for any command where command.includes("sandbox exec") && command.includes("curl") is too broad; narrow the condition to only match the dashboard readiness probe by also requiring the expected probe host/port (e.g., check that command includes "localhost:18789" or the exact probe URL used in tests) so only the readiness curl is short-circuited and other curl invocations still exercise assertions; update the same conditional at the other occurrences referenced (the stubs at the other locations) to the same scoped check.test/e2e/test-sandbox-survival.sh (1)
199-213: Retain the installer log when setup fails.Lines 199-213 remove
INSTALL_LOGbeforeinstall_exitis checked. Ifinstall.shdies early and the backgroundtail -fmissed the beginning, the only full artifact is gone right when the failure needs debugging.🪵 Small tweak
wait $install_pid install_exit=$? kill $tail_pid 2>/dev/null || true wait $tail_pid 2>/dev/null || true -rm -f "$INSTALL_LOG" # Source shell profile to pick up nvm/PATH changes from install.sh if [ -f "$HOME/.bashrc" ]; then # shellcheck source=/dev/null source "$HOME/.bashrc" 2>/dev/null || true @@ if [ $install_exit -eq 0 ]; then + rm -f "$INSTALL_LOG" pass "install.sh completed (exit 0)" else + info "install.sh log retained at $INSTALL_LOG" fail "install.sh failed (exit $install_exit)" exit 1 fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/test-sandbox-survival.sh` around lines 199 - 213, The script currently deletes INSTALL_LOG unconditionally before checking install_exit, losing the log needed when install.sh fails; update the teardown so rm -f "$INSTALL_LOG" is performed only after you inspect install_exit (e.g., move the rm step into a conditional that runs when install_exit == 0 or otherwise retain the file on non-zero exit), keeping the existing background tail/kill/wait logic for install_pid and tail_pid and using those same symbols (INSTALL_LOG, install_pid, tail_pid, install_exit) to implement the conditional removal.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bin/lib/onboard.js`:
- Around line 1563-1585: The code currently skips upgrade when
getInstalledOpenshellVersion() returns null; change the logic in the block that
evaluates currentVersion so that an unreadable/null version is treated as "needs
upgrade": if currentVersion is falsy, call installOpenshell() (same flow as when
needsUpgrade is true), assign to openshellInstall, and handle failure (log
errors and process.exit(1)) the same way as the existing needsUpgrade branch;
update the conditional around needsUpgrade or introduce an explicit check (using
getInstalledOpenshellVersion, needsUpgrade, installOpenshell, and
openshellInstall) so unknown/oddly-formatted versions trigger reinstall.
In `@scripts/brev-launchable-ci-cpu.sh`:
- Line 43: The script sets OPENSHELL_VERSION but doesn't ensure the installed
OpenShell CLI matches it; update the OpenShell installation logic to detect the
currently installed version (e.g., by running the installed binary with a
version flag), compare that to OPENSHELL_VERSION, and if they differ replace the
installed binary with the pinned one (pull/install OPENSHELL_VERSION and restart
or re-link as the script already does for fresh installs). Modify the OpenShell
block to perform: check-version -> if version != OPENSHELL_VERSION then
download/install the pinned OPENSHELL_VERSION (and make executable/replace
existing binary) -> otherwise skip, ensuring the requested restart-survival fix
is applied when mismatched.
In `@test/e2e/test-sandbox-survival.sh`:
- Around line 264-265: The current grep-based check using REGISTRY and
SANDBOX_NAME is brittle because it matches substrings like defaultSandbox;
instead parse the registry JSON and assert the sandboxes array contains an entry
with the exact name SANDBOX_NAME. Replace the grep check around REGISTRY and
SANDBOX_NAME with a JSON-aware test (e.g., using jq or a small python -c) that
verifies .sandboxes[] | select(.name == "$SANDBOX_NAME") exists, and update the
identical checks later in the file (the ones around the second occurrence) so
both assertions explicitly verify the sandbox entry inside the sandboxes array
rather than relying on raw string matching.
In `@test/smoke-macos-install.test.js`:
- Line 10: Remove the broad describe.skip around "macOS smoke install script
guardrails" and instead enable the suite by using describe(...). Then gate only
the FIFO/named-pipe spec by conditionally skipping it on Darwin — e.g., detect
process.platform === 'darwin' and use (isDarwin ? it.skip : it) or test.skip for
the spec whose title includes "FIFO" or "named-pipe" so the other
help/env/runtime validation tests still run.
---
Outside diff comments:
In `@test/e2e/test-sandbox-survival.sh`:
- Around line 472-476: The readiness check currently greps "running|ready" which
matches substrings like "NotReady"; change the extraction in the loop that
assigns sandbox_phase (the openshell sandbox list pipeline referencing
SANDBOX_NAME and sandbox_phase) to only match whole words for the states (e.g.
use a word-boundary regex like '\b(running|ready)\b' or the grep word-boundary
syntax '\<(running|ready)\>' with the existing -i flag) so "NotReady" will not
satisfy the check; keep the rest of the loop logic (head -1, non-empty test)
intact.
---
Nitpick comments:
In `@test/e2e/test-sandbox-survival.sh`:
- Around line 199-213: The script currently deletes INSTALL_LOG unconditionally
before checking install_exit, losing the log needed when install.sh fails;
update the teardown so rm -f "$INSTALL_LOG" is performed only after you inspect
install_exit (e.g., move the rm step into a conditional that runs when
install_exit == 0 or otherwise retain the file on non-zero exit), keeping the
existing background tail/kill/wait logic for install_pid and tail_pid and using
those same symbols (INSTALL_LOG, install_pid, tail_pid, install_exit) to
implement the conditional removal.
In `@test/onboard.test.js`:
- Line 1689: The current stub in test/onboard.test.js that returns "ok" for any
command where command.includes("sandbox exec") && command.includes("curl") is
too broad; narrow the condition to only match the dashboard readiness probe by
also requiring the expected probe host/port (e.g., check that command includes
"localhost:18789" or the exact probe URL used in tests) so only the readiness
curl is short-circuited and other curl invocations still exercise assertions;
update the same conditional at the other occurrences referenced (the stubs at
the other locations) to the same scoped check.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a0c014e4-6a94-418c-9098-ec51aa9f905f
📒 Files selected for processing (6)
bin/lib/onboard.jsscripts/brev-launchable-ci-cpu.shscripts/install-openshell.shtest/e2e/test-sandbox-survival.shtest/onboard.test.jstest/smoke-macos-install.test.js
| if [ -f "$REGISTRY" ] && grep -Fq "\"${SANDBOX_NAME}\"" "$REGISTRY"; then | ||
| pass "NemoClaw registry contains '$SANDBOX_NAME'" |
There was a problem hiding this comment.
Parse the registry JSON instead of grepping the raw sandbox name.
Lines 264-265 and 490-491 also match defaultSandbox, so these checks can pass even when the actual entry under sandboxes is missing. That weakens the “registry retains sandbox entry” assertion, and it makes the later cleanup assertion ambiguous for the same reason.
Also applies to: 490-491
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/e2e/test-sandbox-survival.sh` around lines 264 - 265, The current
grep-based check using REGISTRY and SANDBOX_NAME is brittle because it matches
substrings like defaultSandbox; instead parse the registry JSON and assert the
sandboxes array contains an entry with the exact name SANDBOX_NAME. Replace the
grep check around REGISTRY and SANDBOX_NAME with a JSON-aware test (e.g., using
jq or a small python -c) that verifies .sandboxes[] | select(.name ==
"$SANDBOX_NAME") exists, and update the identical checks later in the file (the
ones around the second occurrence) so both assertions explicitly verify the
sandbox entry inside the sandboxes array rather than relying on raw string
matching.
## Summary - Patch `streamSandboxCreate` spawn injection in onboard interactive test to prevent 87s+ hang from real subprocess hitting a live gateway - Mock dashboard readiness curl check in 4 onboard `createSandbox` tests that were sleeping 28s each through unmocked polling loop - Add 10s timeout to `smoke-macos-install` pipe test to prevent hang Cherry-picked from #1587 (test-only commits). ## Test plan - [x] `vitest run --project cli` passes - [x] Pre-push hooks pass <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Release Notes * **Tests** * Enhanced test mocking infrastructure for sandbox execution health checks and child process handling to improve test reliability and coverage. * Adjusted macOS smoke test suite configuration. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
…nstalls Treat unreadable openshell version as too-old (reinstall instead of silently skipping the minimum-version gate). Also compare installed version against the pin in brev-launchable-ci-cpu.sh so reused launchables get upgraded when OPENSHELL_VERSION changes. Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
| // Ensure the installed version meets the minimum required by install-openshell.sh. | ||
| // The script itself is idempotent — it exits early if the version is already sufficient. | ||
| const currentVersion = getInstalledOpenshellVersion(); | ||
| if (!currentVersion) { |
There was a problem hiding this comment.
Seems redundant with previous if, maybe it can be regrouped together to reduce branching?
| } | ||
| } else { | ||
| const parts = currentVersion.split(".").map(Number); | ||
| const minParts = [0, 0, 24]; // must match MIN_VERSION in scripts/install-openshell.sh |
There was a problem hiding this comment.
I think something that important might be better placed on some constant part of the code, on the first lines preferably if not in a separate config file
Even more so now that I see it's reused in other places
cv
left a comment
There was a problem hiding this comment.
LGTM — security review WARNING (non-blocking).
Approved with one note:
- MEDIUM:
brev-launchable-ci-cpu.shdownloads OpenShell without SHA-256 checksum verification (unlikeinstall-openshell.shwhich does verify). Pre-existing gap but this PR adds a second download-without-verify path. Consider reusing the checksum logic frominstall-openshell.shin a follow-up.
Otherwise clean:
- Version gate null-handling is correct
- Version comparison logic is sound (regex guarantees numeric segments)
- No credential changes
- All CI checks pass
- E2E test coverage is significantly expanded
## Summary - Document `nemoclaw credentials list` and `nemoclaw credentials reset` commands in commands reference (#1597) - Add `--dry-run` flag documentation for `policy-add` (#1276) - Update policy presets table: remove `docker` (#1647), add `brave` and `brew`, update HuggingFace endpoint (#1540) - Document `NEMOCLAW_LOCAL_INFERENCE_TIMEOUT` env var for local providers (#1620) - Document `NEMOCLAW_PROXY_HOST`/`NEMOCLAW_PROXY_PORT` env vars (#1563) - Add troubleshooting entries for Docker group permissions (#1614), sandbox survival after gateway restart (#1587), and proxy configuration - Regenerate `nemoclaw-user-*` skills from updated docs ## Test plan - [x] `make docs` builds without warnings - [x] All pre-commit and pre-push hooks pass - [ ] Verify rendered pages in docs site preview 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added `nemoclaw credentials list` command to display stored credential names * Added `nemoclaw credentials reset <KEY>` command with `--yes` flag to remove credentials * Added `--dry-run` flag for policy-add to preview endpoint changes * New policy presets: `brave` and `brew` * New configuration options: `NEMOCLAW_LOCAL_INFERENCE_TIMEOUT`, `NEMOCLAW_PROXY_HOST`, and `NEMOCLAW_PROXY_PORT` * **Documentation** * Expanded troubleshooting guides for Docker permissions, sandbox connectivity, local inference timeouts, and proxy configuration <!-- end of auto-generated comment: release notes by coderabbit.ai --> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Users with OpenShell older than 0.0.22 hit sandbox survival failures after gateway restarts — sandboxes appear listed but are unreachable because the gateway regenerated its SSH secrets, breaking the handshake with existing sandboxes.
OpenShell 0.0.22 fixed this by persisting SSH secrets and sandbox state across gateway stop/start cycles. The install script (
install-openshell.sh) already required 0.0.22 as its minimum version — but that gate only ran when openshell was missing. If any version was already installed,nemoclaw onboardskipped the install step entirely and never checked whether the installed version was sufficient. Users stuck on 0.0.20 or earlier were never upgraded.This PR closes that gap by making onboard check the installed version and upgrade when it's too old.
What changed
bin/lib/onboard.js— onboard preflight now checks the installed OpenShell version against the minimum. If it's too old, it runs the install script to upgrade. If the version can't be determined, it reinstalls rather than silently continuing.scripts/brev-launchable-ci-cpu.sh— same problem: reused Brev launchables that already had an older openshell skipped the install entirely. Now compares the installed version against the pin and reinstalls when they don't match.scripts/install-openshell.sh— minimum bumped from 0.0.22 to 0.0.24 (latest release).Test plan
vitest run --project cli)