fix(onboard): upgrade OpenShell to 0.0.24 for sandbox survival across restarts by ericksoa · Pull Request #1587 · NVIDIA/NemoClaw

ericksoa · 2026-04-07T22:27:58Z

Summary

Users with OpenShell older than 0.0.22 hit sandbox survival failures after gateway restarts — sandboxes appear listed but are unreachable because the gateway regenerated its SSH secrets, breaking the handshake with existing sandboxes.

OpenShell 0.0.22 fixed this by persisting SSH secrets and sandbox state across gateway stop/start cycles. The install script (install-openshell.sh) already required 0.0.22 as its minimum version — but that gate only ran when openshell was missing. If any version was already installed, nemoclaw onboard skipped the install step entirely and never checked whether the installed version was sufficient. Users stuck on 0.0.20 or earlier were never upgraded.

This PR closes that gap by making onboard check the installed version and upgrade when it's too old.

What changed

bin/lib/onboard.js — onboard preflight now checks the installed OpenShell version against the minimum. If it's too old, it runs the install script to upgrade. If the version can't be determined, it reinstalls rather than silently continuing.

scripts/brev-launchable-ci-cpu.sh — same problem: reused Brev launchables that already had an older openshell skipped the install entirely. Now compares the installed version against the pin and reinstalls when they don't match.

scripts/install-openshell.sh — minimum bumped from 0.0.22 to 0.0.24 (latest release).

Test plan

E2E sandbox survival: 36/36 passed locally
Unit tests pass (vitest run --project cli)
Pre-push hooks pass
Nightly E2E triggered on this branch

… restarts Bump minimum OpenShell version to 0.0.24 which includes gateway resume with persistent SSH handshake secrets (OpenShell#488) and sandbox state persistence across stop/start cycles (OpenShell#739). - onboard.js: check installed OpenShell version and upgrade if below minimum (previously only installed when missing entirely) - install-openshell.sh: MIN_VERSION 0.0.22 → 0.0.24 - brev-launchable-ci-cpu.sh: default OPENSHELL_VERSION v0.0.20 → v0.0.24 - test-sandbox-survival.sh: rewrite E2E to use install.sh flow, verify workspace files, agent data, SSH connectivity, and live inference all survive a gateway stop/start cycle Validated fixes for #486, #888, #859, #1086. Signed-off-by: Aaron Erickson <aerickson@nvidia.com>

The "interactive mode auto-recreates" test mocked childProcess.spawn but streamSandboxCreate imports spawn at load time from its own "node:child_process" reference, bypassing the mock. This caused a real bash process to try to hit a live gateway, hanging for 87s+ until vitest's timeout killed it. Fix by using streamSandboxCreate's spawnImpl option to inject the fake spawn, preventing any real subprocess from being created. Signed-off-by: Aaron Erickson <aerickson@nvidia.com>

…d tests - Add 10s timeout to smoke-macos-install pipe test to prevent hang - Mock dashboard readiness curl check in 4 onboard createSandbox tests that were sleeping 28s each through unmocked polling loop Signed-off-by: Aaron Erickson <aerickson@nvidia.com>

coderabbitai · 2026-04-07T22:28:13Z

📝 Walkthrough

Walkthrough

OpenShell minimum version requirement bumped to 0.0.24 across installation and onboarding paths, with version-checking logic added to onboard.js and CI scripts. The sandbox survival e2e test expanded to verify persistence across gateway and machine restarts, including SSH connectivity validation and state persistence markers.

Changes

Cohort / File(s)	Summary
OpenShell Version Requirements `bin/lib/onboard.js`, `scripts/install-openshell.sh`, `scripts/brev-launchable-ci-cpu.sh`	Updated minimum version threshold to 0.0.24. Enhanced onboarding preflight to parse installed semantic version, compare against minimum, and trigger upgrade when needed. Added version-checking logic to CI script to reinstall if pinned version differs from installed version.
E2E Test Refactoring `test/e2e/test-sandbox-survival.sh`	Comprehensive rewrite of sandbox survival test to validate persistence after restart. Added NemoClaw installer orchestration, SSH connectivity helpers using `openshell sandbox ssh-config`, persistence markers, and multi-phase verification including post-restart state validation and inference execution.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Version bumps hop from point two-oh to two-four,
Sandboxes now survive the restart encore—
SSH paths checked, persistence marks stay true,
The gateway sleeps and wakes, still serving you! 🔄

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarizes the main change: upgrading OpenShell to 0.0.24 to enable sandbox survival across restarts, which directly aligns with the core objective in the PR description and linked issue `#486`.
Linked Issues check	✅ Passed	The PR implements the core requirement from issue `#486` by upgrading OpenShell to 0.0.24 across multiple files and updating the E2E test to verify sandbox persistence across gateway restarts, addressing the sandbox discovery and connectivity loss after machine restart.
Out of Scope Changes check	✅ Passed	All changes directly support the objective of sandbox restart resilience: version bumps to 0.0.24 in installation/onboarding flows, updated minimum version checks, and E2E test refactoring to verify persistence across restarts with SSH connectivity helpers.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch test/sandbox-survival-openshell-0.0.24

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

test/e2e/test-sandbox-survival.sh (1)

472-476: ⚠️ Potential issue | 🟠 Major

Don't let NotReady satisfy the ready-state check.

Line 475 extracts running|ready as a substring, so NotReady still yields Ready. In that case this E2E can report restart survival even though the sandbox never recovered.

🐛 Safer readiness check

-  sandbox_phase=$(openshell sandbox list 2>&1 | grep "$SANDBOX_NAME" | grep -oiE 'running|ready' | head -1)
+  sandbox_phase=$(
+    openshell sandbox list 2>&1 |
+      awk -v name="$SANDBOX_NAME" '$1 == name && ($2 == "Running" || $2 == "Ready") { print $2; exit }'
+  )

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-sandbox-survival.sh` around lines 472 - 476, The readiness
check currently greps "running|ready" which matches substrings like "NotReady";
change the extraction in the loop that assigns sandbox_phase (the openshell
sandbox list pipeline referencing SANDBOX_NAME and sandbox_phase) to only match
whole words for the states (e.g. use a word-boundary regex like
'\b(running|ready)\b' or the grep word-boundary syntax '\<(running|ready)\>'
with the existing -i flag) so "NotReady" will not satisfy the check; keep the
rest of the loop logic (head -1, non-empty test) intact.

🧹 Nitpick comments (2)

test/onboard.test.js (1)

1689-1689: Keep the curl stub scoped to the dashboard readiness probe.

Lines 1689, 2104, 2300, and 2418 now return "ok" for any sandbox exec ... curl ... command. That fixes the hang, but it also lets these tests stay green if the readiness probe regresses to the wrong URL or port. Matching the expected localhost:18789 probe keeps the sleep fix without weakening the assertion.

Also applies to: 2104-2104, 2300-2300, 2418-2418
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/onboard.test.js` at line 1689, The current stub in test/onboard.test.js
that returns "ok" for any command where command.includes("sandbox exec") &&
command.includes("curl") is too broad; narrow the condition to only match the
dashboard readiness probe by also requiring the expected probe host/port (e.g.,
check that command includes "localhost:18789" or the exact probe URL used in
tests) so only the readiness curl is short-circuited and other curl invocations
still exercise assertions; update the same conditional at the other occurrences
referenced (the stubs at the other locations) to the same scoped check.

test/e2e/test-sandbox-survival.sh (1)

199-213: Retain the installer log when setup fails.

Lines 199-213 remove INSTALL_LOG before install_exit is checked. If install.sh dies early and the background tail -f missed the beginning, the only full artifact is gone right when the failure needs debugging.

🪵 Small tweak

 wait $install_pid
 install_exit=$?
 kill $tail_pid 2>/dev/null || true
 wait $tail_pid 2>/dev/null || true
-rm -f "$INSTALL_LOG"
 
 # Source shell profile to pick up nvm/PATH changes from install.sh
 if [ -f "$HOME/.bashrc" ]; then
   # shellcheck source=/dev/null
   source "$HOME/.bashrc" 2>/dev/null || true
@@
 if [ $install_exit -eq 0 ]; then
+  rm -f "$INSTALL_LOG"
   pass "install.sh completed (exit 0)"
 else
+  info "install.sh log retained at $INSTALL_LOG"
   fail "install.sh failed (exit $install_exit)"
   exit 1
 fi

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-sandbox-survival.sh` around lines 199 - 213, The script
currently deletes INSTALL_LOG unconditionally before checking install_exit,
losing the log needed when install.sh fails; update the teardown so rm -f
"$INSTALL_LOG" is performed only after you inspect install_exit (e.g., move the
rm step into a conditional that runs when install_exit == 0 or otherwise retain
the file on non-zero exit), keeping the existing background tail/kill/wait logic
for install_pid and tail_pid and using those same symbols (INSTALL_LOG,
install_pid, tail_pid, install_exit) to implement the conditional removal.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bin/lib/onboard.js`:
- Around line 1563-1585: The code currently skips upgrade when
getInstalledOpenshellVersion() returns null; change the logic in the block that
evaluates currentVersion so that an unreadable/null version is treated as "needs
upgrade": if currentVersion is falsy, call installOpenshell() (same flow as when
needsUpgrade is true), assign to openshellInstall, and handle failure (log
errors and process.exit(1)) the same way as the existing needsUpgrade branch;
update the conditional around needsUpgrade or introduce an explicit check (using
getInstalledOpenshellVersion, needsUpgrade, installOpenshell, and
openshellInstall) so unknown/oddly-formatted versions trigger reinstall.

In `@scripts/brev-launchable-ci-cpu.sh`:
- Line 43: The script sets OPENSHELL_VERSION but doesn't ensure the installed
OpenShell CLI matches it; update the OpenShell installation logic to detect the
currently installed version (e.g., by running the installed binary with a
version flag), compare that to OPENSHELL_VERSION, and if they differ replace the
installed binary with the pinned one (pull/install OPENSHELL_VERSION and restart
or re-link as the script already does for fresh installs). Modify the OpenShell
block to perform: check-version -> if version != OPENSHELL_VERSION then
download/install the pinned OPENSHELL_VERSION (and make executable/replace
existing binary) -> otherwise skip, ensuring the requested restart-survival fix
is applied when mismatched.

In `@test/e2e/test-sandbox-survival.sh`:
- Around line 264-265: The current grep-based check using REGISTRY and
SANDBOX_NAME is brittle because it matches substrings like defaultSandbox;
instead parse the registry JSON and assert the sandboxes array contains an entry
with the exact name SANDBOX_NAME. Replace the grep check around REGISTRY and
SANDBOX_NAME with a JSON-aware test (e.g., using jq or a small python -c) that
verifies .sandboxes[] | select(.name == "$SANDBOX_NAME") exists, and update the
identical checks later in the file (the ones around the second occurrence) so
both assertions explicitly verify the sandbox entry inside the sandboxes array
rather than relying on raw string matching.

In `@test/smoke-macos-install.test.js`:
- Line 10: Remove the broad describe.skip around "macOS smoke install script
guardrails" and instead enable the suite by using describe(...). Then gate only
the FIFO/named-pipe spec by conditionally skipping it on Darwin — e.g., detect
process.platform === 'darwin' and use (isDarwin ? it.skip : it) or test.skip for
the spec whose title includes "FIFO" or "named-pipe" so the other
help/env/runtime validation tests still run.

---

Outside diff comments:
In `@test/e2e/test-sandbox-survival.sh`:
- Around line 472-476: The readiness check currently greps "running|ready" which
matches substrings like "NotReady"; change the extraction in the loop that
assigns sandbox_phase (the openshell sandbox list pipeline referencing
SANDBOX_NAME and sandbox_phase) to only match whole words for the states (e.g.
use a word-boundary regex like '\b(running|ready)\b' or the grep word-boundary
syntax '\<(running|ready)\>' with the existing -i flag) so "NotReady" will not
satisfy the check; keep the rest of the loop logic (head -1, non-empty test)
intact.

---

Nitpick comments:
In `@test/e2e/test-sandbox-survival.sh`:
- Around line 199-213: The script currently deletes INSTALL_LOG unconditionally
before checking install_exit, losing the log needed when install.sh fails;
update the teardown so rm -f "$INSTALL_LOG" is performed only after you inspect
install_exit (e.g., move the rm step into a conditional that runs when
install_exit == 0 or otherwise retain the file on non-zero exit), keeping the
existing background tail/kill/wait logic for install_pid and tail_pid and using
those same symbols (INSTALL_LOG, install_pid, tail_pid, install_exit) to
implement the conditional removal.

In `@test/onboard.test.js`:
- Line 1689: The current stub in test/onboard.test.js that returns "ok" for any
command where command.includes("sandbox exec") && command.includes("curl") is
too broad; narrow the condition to only match the dashboard readiness probe by
also requiring the expected probe host/port (e.g., check that command includes
"localhost:18789" or the exact probe URL used in tests) so only the readiness
curl is short-circuited and other curl invocations still exercise assertions;
update the same conditional at the other occurrences referenced (the stubs at
the other locations) to the same scoped check.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a0c014e4-6a94-418c-9098-ec51aa9f905f

📥 Commits

Reviewing files that changed from the base of the PR and between 84d0533 and 93f9e85.

📒 Files selected for processing (6)

bin/lib/onboard.js
scripts/brev-launchable-ci-cpu.sh
scripts/install-openshell.sh
test/e2e/test-sandbox-survival.sh
test/onboard.test.js
test/smoke-macos-install.test.js

bin/lib/onboard.js

scripts/brev-launchable-ci-cpu.sh

coderabbitai · 2026-04-07T22:40:55Z

test/e2e/test-sandbox-survival.sh

+if [ -f "$REGISTRY" ] && grep -Fq "\"${SANDBOX_NAME}\"" "$REGISTRY"; then
+  pass "NemoClaw registry contains '$SANDBOX_NAME'"


⚠️ Potential issue | 🟡 Minor

Parse the registry JSON instead of grepping the raw sandbox name.

Lines 264-265 and 490-491 also match defaultSandbox, so these checks can pass even when the actual entry under sandboxes is missing. That weakens the “registry retains sandbox entry” assertion, and it makes the later cleanup assertion ambiguous for the same reason.

Also applies to: 490-491

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@test/e2e/test-sandbox-survival.sh` around lines 264 - 265, The current grep-based check using REGISTRY and SANDBOX_NAME is brittle because it matches substrings like defaultSandbox; instead parse the registry JSON and assert the sandboxes array contains an entry with the exact name SANDBOX_NAME. Replace the grep check around REGISTRY and SANDBOX_NAME with a JSON-aware test (e.g., using jq or a small python -c) that verifies .sandboxes[] | select(.name == "$SANDBOX_NAME") exists, and update the identical checks later in the file (the ones around the second occurrence) so both assertions explicitly verify the sandbox entry inside the sandboxes array rather than relying on raw string matching.

test/smoke-macos-install.test.js

## Summary - Patch `streamSandboxCreate` spawn injection in onboard interactive test to prevent 87s+ hang from real subprocess hitting a live gateway - Mock dashboard readiness curl check in 4 onboard `createSandbox` tests that were sleeping 28s each through unmocked polling loop - Add 10s timeout to `smoke-macos-install` pipe test to prevent hang Cherry-picked from #1587 (test-only commits). ## Test plan - [x] `vitest run --project cli` passes - [x] Pre-push hooks pass  ## Summary by CodeRabbit ## Release Notes * **Tests** * Enhanced test mocking infrastructure for sandbox execution health checks and child process handling to improve test reliability and coverage. * Adjusted macOS smoke test suite configuration.  --------- Signed-off-by: Aaron Erickson <aerickson@nvidia.com>

…nstalls Treat unreadable openshell version as too-old (reinstall instead of silently skipping the minimum-version gate). Also compare installed version against the pin in brev-launchable-ci-cpu.sh so reused launchables get upgraded when OPENSHELL_VERSION changes. Signed-off-by: Aaron Erickson <aerickson@nvidia.com>

HagegeR · 2026-04-08T19:18:55Z

bin/lib/onboard.js

+    // Ensure the installed version meets the minimum required by install-openshell.sh.
+    // The script itself is idempotent — it exits early if the version is already sufficient.
+    const currentVersion = getInstalledOpenshellVersion();
+    if (!currentVersion) {


Seems redundant with previous if, maybe it can be regrouped together to reduce branching?

HagegeR · 2026-04-08T19:21:05Z

bin/lib/onboard.js

+      }
+    } else {
+      const parts = currentVersion.split(".").map(Number);
+      const minParts = [0, 0, 24]; // must match MIN_VERSION in scripts/install-openshell.sh


I think something that important might be better placed on some constant part of the code, on the first lines preferably if not in a separate config file
Even more so now that I see it's reused in other places

cv

LGTM — security review WARNING (non-blocking).

Approved with one note:

MEDIUM: brev-launchable-ci-cpu.sh downloads OpenShell without SHA-256 checksum verification (unlike install-openshell.sh which does verify). Pre-existing gap but this PR adds a second download-without-verify path. Consider reusing the checksum logic from install-openshell.sh in a follow-up.

Otherwise clean:

Version gate null-handling is correct
Version comparison logic is sound (regex guarantees numeric segments)
No credential changes
All CI checks pass
E2E test coverage is significantly expanded

## Summary - Document `nemoclaw credentials list` and `nemoclaw credentials reset` commands in commands reference (#1597) - Add `--dry-run` flag documentation for `policy-add` (#1276) - Update policy presets table: remove `docker` (#1647), add `brave` and `brew`, update HuggingFace endpoint (#1540) - Document `NEMOCLAW_LOCAL_INFERENCE_TIMEOUT` env var for local providers (#1620) - Document `NEMOCLAW_PROXY_HOST`/`NEMOCLAW_PROXY_PORT` env vars (#1563) - Add troubleshooting entries for Docker group permissions (#1614), sandbox survival after gateway restart (#1587), and proxy configuration - Regenerate `nemoclaw-user-*` skills from updated docs ## Test plan - [x] `make docs` builds without warnings - [x] All pre-commit and pre-push hooks pass - [ ] Verify rendered pages in docs site preview 🤖 Generated with [Claude Code](https://claude.com/claude-code)  ## Summary by CodeRabbit * **New Features** * Added `nemoclaw credentials list` command to display stored credential names * Added `nemoclaw credentials reset <KEY>` command with `--yes` flag to remove credentials * Added `--dry-run` flag for policy-add to preview endpoint changes * New policy presets: `brave` and `brew` * New configuration options: `NEMOCLAW_LOCAL_INFERENCE_TIMEOUT`, `NEMOCLAW_PROXY_HOST`, and `NEMOCLAW_PROXY_PORT` * **Documentation** * Expanded troubleshooting guides for Docker permissions, sandbox connectivity, local inference timeouts, and proxy configuration  Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

ericksoa added 3 commits April 7, 2026 13:41

Merge branch 'main' into test/sandbox-survival-openshell-0.0.24

93f9e85

ericksoa mentioned this pull request Apr 7, 2026

fix(test): eliminate slow test hangs in onboard and smoke suites #1589

Merged

2 tasks

coderabbitai bot reviewed Apr 7, 2026

View reviewed changes

ericksoa added 2 commits April 7, 2026 16:09

Merge branch 'main' into test/sandbox-survival-openshell-0.0.24

5cb0c07

wscurran added Getting Started Use this label to identify setup, installation, or onboarding issues. priority: high Important issue that should be resolved in the next release OpenShell Support for OpenShell, a safe, private runtime for autonomous AI agents fix labels Apr 8, 2026

HagegeR reviewed Apr 8, 2026

View reviewed changes

cv added the v0.0.10 Release target label Apr 8, 2026

cv self-assigned this Apr 8, 2026

cv approved these changes Apr 8, 2026

View reviewed changes

cv merged commit 1aa8427 into main Apr 8, 2026
25 of 27 checks passed

miyoungc mentioned this pull request Apr 9, 2026

docs: catch up documentation for v0.0.10 changes #1654

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(onboard): upgrade OpenShell to 0.0.24 for sandbox survival across restarts#1587

fix(onboard): upgrade OpenShell to 0.0.24 for sandbox survival across restarts#1587
cv merged 6 commits intomainfrom
test/sandbox-survival-openshell-0.0.24

ericksoa commented Apr 7, 2026 •

edited

Loading

Uh oh!

coderabbitai bot commented Apr 7, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot Apr 7, 2026

Uh oh!

Uh oh!

HagegeR Apr 8, 2026

Uh oh!

HagegeR Apr 8, 2026 •

edited

Loading

Uh oh!

cv left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

		if [ -f "$REGISTRY" ] && grep -Fq "\"${SANDBOX_NAME}\"" "$REGISTRY"; then
		pass "NemoClaw registry contains '$SANDBOX_NAME'"

Conversation

ericksoa commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What changed

Test plan

Uh oh!

coderabbitai bot commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

HagegeR Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

HagegeR Apr 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cv left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

ericksoa commented Apr 7, 2026 •

edited

Loading

coderabbitai bot commented Apr 7, 2026 •

edited

Loading

HagegeR Apr 8, 2026 •

edited

Loading