Skip to content

fix(onboard): upgrade OpenShell to 0.0.24 for sandbox survival across restarts#1587

Merged
cv merged 6 commits intomainfrom
test/sandbox-survival-openshell-0.0.24
Apr 8, 2026
Merged

fix(onboard): upgrade OpenShell to 0.0.24 for sandbox survival across restarts#1587
cv merged 6 commits intomainfrom
test/sandbox-survival-openshell-0.0.24

Conversation

@ericksoa
Copy link
Copy Markdown
Contributor

@ericksoa ericksoa commented Apr 7, 2026

Summary

Users with OpenShell older than 0.0.22 hit sandbox survival failures after gateway restarts — sandboxes appear listed but are unreachable because the gateway regenerated its SSH secrets, breaking the handshake with existing sandboxes.

OpenShell 0.0.22 fixed this by persisting SSH secrets and sandbox state across gateway stop/start cycles. The install script (install-openshell.sh) already required 0.0.22 as its minimum version — but that gate only ran when openshell was missing. If any version was already installed, nemoclaw onboard skipped the install step entirely and never checked whether the installed version was sufficient. Users stuck on 0.0.20 or earlier were never upgraded.

This PR closes that gap by making onboard check the installed version and upgrade when it's too old.

What changed

bin/lib/onboard.js — onboard preflight now checks the installed OpenShell version against the minimum. If it's too old, it runs the install script to upgrade. If the version can't be determined, it reinstalls rather than silently continuing.

scripts/brev-launchable-ci-cpu.sh — same problem: reused Brev launchables that already had an older openshell skipped the install entirely. Now compares the installed version against the pin and reinstalls when they don't match.

scripts/install-openshell.sh — minimum bumped from 0.0.22 to 0.0.24 (latest release).

Test plan

  • E2E sandbox survival: 36/36 passed locally
  • Unit tests pass (vitest run --project cli)
  • Pre-push hooks pass
  • Nightly E2E triggered on this branch

ericksoa added 3 commits April 7, 2026 13:41
… restarts

Bump minimum OpenShell version to 0.0.24 which includes gateway resume
with persistent SSH handshake secrets (OpenShell#488) and sandbox state
persistence across stop/start cycles (OpenShell#739).

- onboard.js: check installed OpenShell version and upgrade if below
  minimum (previously only installed when missing entirely)
- install-openshell.sh: MIN_VERSION 0.0.22 → 0.0.24
- brev-launchable-ci-cpu.sh: default OPENSHELL_VERSION v0.0.20 → v0.0.24
- test-sandbox-survival.sh: rewrite E2E to use install.sh flow, verify
  workspace files, agent data, SSH connectivity, and live inference all
  survive a gateway stop/start cycle

Validated fixes for #486, #888, #859, #1086.

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
The "interactive mode auto-recreates" test mocked childProcess.spawn
but streamSandboxCreate imports spawn at load time from its own
"node:child_process" reference, bypassing the mock. This caused a real
bash process to try to hit a live gateway, hanging for 87s+ until
vitest's timeout killed it.

Fix by using streamSandboxCreate's spawnImpl option to inject the fake
spawn, preventing any real subprocess from being created.

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
…d tests

- Add 10s timeout to smoke-macos-install pipe test to prevent hang
- Mock dashboard readiness curl check in 4 onboard createSandbox tests
  that were sleeping 28s each through unmocked polling loop

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 7, 2026

📝 Walkthrough

Walkthrough

OpenShell minimum version requirement bumped to 0.0.24 across installation and onboarding paths, with version-checking logic added to onboard.js and CI scripts. The sandbox survival e2e test expanded to verify persistence across gateway and machine restarts, including SSH connectivity validation and state persistence markers.

Changes

Cohort / File(s) Summary
OpenShell Version Requirements
bin/lib/onboard.js, scripts/install-openshell.sh, scripts/brev-launchable-ci-cpu.sh
Updated minimum version threshold to 0.0.24. Enhanced onboarding preflight to parse installed semantic version, compare against minimum, and trigger upgrade when needed. Added version-checking logic to CI script to reinstall if pinned version differs from installed version.
E2E Test Refactoring
test/e2e/test-sandbox-survival.sh
Comprehensive rewrite of sandbox survival test to validate persistence after restart. Added NemoClaw installer orchestration, SSH connectivity helpers using openshell sandbox ssh-config, persistence markers, and multi-phase verification including post-restart state validation and inference execution.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Version bumps hop from point two-oh to two-four,
Sandboxes now survive the restart encore—
SSH paths checked, persistence marks stay true,
The gateway sleeps and wakes, still serving you! 🔄

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: upgrading OpenShell to 0.0.24 to enable sandbox survival across restarts, which directly aligns with the core objective in the PR description and linked issue #486.
Linked Issues check ✅ Passed The PR implements the core requirement from issue #486 by upgrading OpenShell to 0.0.24 across multiple files and updating the E2E test to verify sandbox persistence across gateway restarts, addressing the sandbox discovery and connectivity loss after machine restart.
Out of Scope Changes check ✅ Passed All changes directly support the objective of sandbox restart resilience: version bumps to 0.0.24 in installation/onboarding flows, updated minimum version checks, and E2E test refactoring to verify persistence across restarts with SSH connectivity helpers.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch test/sandbox-survival-openshell-0.0.24

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
test/e2e/test-sandbox-survival.sh (1)

472-476: ⚠️ Potential issue | 🟠 Major

Don't let NotReady satisfy the ready-state check.

Line 475 extracts running|ready as a substring, so NotReady still yields Ready. In that case this E2E can report restart survival even though the sandbox never recovered.

🐛 Safer readiness check
-  sandbox_phase=$(openshell sandbox list 2>&1 | grep "$SANDBOX_NAME" | grep -oiE 'running|ready' | head -1)
+  sandbox_phase=$(
+    openshell sandbox list 2>&1 |
+      awk -v name="$SANDBOX_NAME" '$1 == name && ($2 == "Running" || $2 == "Ready") { print $2; exit }'
+  )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-sandbox-survival.sh` around lines 472 - 476, The readiness
check currently greps "running|ready" which matches substrings like "NotReady";
change the extraction in the loop that assigns sandbox_phase (the openshell
sandbox list pipeline referencing SANDBOX_NAME and sandbox_phase) to only match
whole words for the states (e.g. use a word-boundary regex like
'\b(running|ready)\b' or the grep word-boundary syntax '\<(running|ready)\>'
with the existing -i flag) so "NotReady" will not satisfy the check; keep the
rest of the loop logic (head -1, non-empty test) intact.
🧹 Nitpick comments (2)
test/onboard.test.js (1)

1689-1689: Keep the curl stub scoped to the dashboard readiness probe.

Lines 1689, 2104, 2300, and 2418 now return "ok" for any sandbox exec ... curl ... command. That fixes the hang, but it also lets these tests stay green if the readiness probe regresses to the wrong URL or port. Matching the expected localhost:18789 probe keeps the sleep fix without weakening the assertion.

Also applies to: 2104-2104, 2300-2300, 2418-2418

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/onboard.test.js` at line 1689, The current stub in test/onboard.test.js
that returns "ok" for any command where command.includes("sandbox exec") &&
command.includes("curl") is too broad; narrow the condition to only match the
dashboard readiness probe by also requiring the expected probe host/port (e.g.,
check that command includes "localhost:18789" or the exact probe URL used in
tests) so only the readiness curl is short-circuited and other curl invocations
still exercise assertions; update the same conditional at the other occurrences
referenced (the stubs at the other locations) to the same scoped check.
test/e2e/test-sandbox-survival.sh (1)

199-213: Retain the installer log when setup fails.

Lines 199-213 remove INSTALL_LOG before install_exit is checked. If install.sh dies early and the background tail -f missed the beginning, the only full artifact is gone right when the failure needs debugging.

🪵 Small tweak
 wait $install_pid
 install_exit=$?
 kill $tail_pid 2>/dev/null || true
 wait $tail_pid 2>/dev/null || true
-rm -f "$INSTALL_LOG"
 
 # Source shell profile to pick up nvm/PATH changes from install.sh
 if [ -f "$HOME/.bashrc" ]; then
   # shellcheck source=/dev/null
   source "$HOME/.bashrc" 2>/dev/null || true
@@
 if [ $install_exit -eq 0 ]; then
+  rm -f "$INSTALL_LOG"
   pass "install.sh completed (exit 0)"
 else
+  info "install.sh log retained at $INSTALL_LOG"
   fail "install.sh failed (exit $install_exit)"
   exit 1
 fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-sandbox-survival.sh` around lines 199 - 213, The script
currently deletes INSTALL_LOG unconditionally before checking install_exit,
losing the log needed when install.sh fails; update the teardown so rm -f
"$INSTALL_LOG" is performed only after you inspect install_exit (e.g., move the
rm step into a conditional that runs when install_exit == 0 or otherwise retain
the file on non-zero exit), keeping the existing background tail/kill/wait logic
for install_pid and tail_pid and using those same symbols (INSTALL_LOG,
install_pid, tail_pid, install_exit) to implement the conditional removal.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bin/lib/onboard.js`:
- Around line 1563-1585: The code currently skips upgrade when
getInstalledOpenshellVersion() returns null; change the logic in the block that
evaluates currentVersion so that an unreadable/null version is treated as "needs
upgrade": if currentVersion is falsy, call installOpenshell() (same flow as when
needsUpgrade is true), assign to openshellInstall, and handle failure (log
errors and process.exit(1)) the same way as the existing needsUpgrade branch;
update the conditional around needsUpgrade or introduce an explicit check (using
getInstalledOpenshellVersion, needsUpgrade, installOpenshell, and
openshellInstall) so unknown/oddly-formatted versions trigger reinstall.

In `@scripts/brev-launchable-ci-cpu.sh`:
- Line 43: The script sets OPENSHELL_VERSION but doesn't ensure the installed
OpenShell CLI matches it; update the OpenShell installation logic to detect the
currently installed version (e.g., by running the installed binary with a
version flag), compare that to OPENSHELL_VERSION, and if they differ replace the
installed binary with the pinned one (pull/install OPENSHELL_VERSION and restart
or re-link as the script already does for fresh installs). Modify the OpenShell
block to perform: check-version -> if version != OPENSHELL_VERSION then
download/install the pinned OPENSHELL_VERSION (and make executable/replace
existing binary) -> otherwise skip, ensuring the requested restart-survival fix
is applied when mismatched.

In `@test/e2e/test-sandbox-survival.sh`:
- Around line 264-265: The current grep-based check using REGISTRY and
SANDBOX_NAME is brittle because it matches substrings like defaultSandbox;
instead parse the registry JSON and assert the sandboxes array contains an entry
with the exact name SANDBOX_NAME. Replace the grep check around REGISTRY and
SANDBOX_NAME with a JSON-aware test (e.g., using jq or a small python -c) that
verifies .sandboxes[] | select(.name == "$SANDBOX_NAME") exists, and update the
identical checks later in the file (the ones around the second occurrence) so
both assertions explicitly verify the sandbox entry inside the sandboxes array
rather than relying on raw string matching.

In `@test/smoke-macos-install.test.js`:
- Line 10: Remove the broad describe.skip around "macOS smoke install script
guardrails" and instead enable the suite by using describe(...). Then gate only
the FIFO/named-pipe spec by conditionally skipping it on Darwin — e.g., detect
process.platform === 'darwin' and use (isDarwin ? it.skip : it) or test.skip for
the spec whose title includes "FIFO" or "named-pipe" so the other
help/env/runtime validation tests still run.

---

Outside diff comments:
In `@test/e2e/test-sandbox-survival.sh`:
- Around line 472-476: The readiness check currently greps "running|ready" which
matches substrings like "NotReady"; change the extraction in the loop that
assigns sandbox_phase (the openshell sandbox list pipeline referencing
SANDBOX_NAME and sandbox_phase) to only match whole words for the states (e.g.
use a word-boundary regex like '\b(running|ready)\b' or the grep word-boundary
syntax '\<(running|ready)\>' with the existing -i flag) so "NotReady" will not
satisfy the check; keep the rest of the loop logic (head -1, non-empty test)
intact.

---

Nitpick comments:
In `@test/e2e/test-sandbox-survival.sh`:
- Around line 199-213: The script currently deletes INSTALL_LOG unconditionally
before checking install_exit, losing the log needed when install.sh fails;
update the teardown so rm -f "$INSTALL_LOG" is performed only after you inspect
install_exit (e.g., move the rm step into a conditional that runs when
install_exit == 0 or otherwise retain the file on non-zero exit), keeping the
existing background tail/kill/wait logic for install_pid and tail_pid and using
those same symbols (INSTALL_LOG, install_pid, tail_pid, install_exit) to
implement the conditional removal.

In `@test/onboard.test.js`:
- Line 1689: The current stub in test/onboard.test.js that returns "ok" for any
command where command.includes("sandbox exec") && command.includes("curl") is
too broad; narrow the condition to only match the dashboard readiness probe by
also requiring the expected probe host/port (e.g., check that command includes
"localhost:18789" or the exact probe URL used in tests) so only the readiness
curl is short-circuited and other curl invocations still exercise assertions;
update the same conditional at the other occurrences referenced (the stubs at
the other locations) to the same scoped check.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a0c014e4-6a94-418c-9098-ec51aa9f905f

📥 Commits

Reviewing files that changed from the base of the PR and between 84d0533 and 93f9e85.

📒 Files selected for processing (6)
  • bin/lib/onboard.js
  • scripts/brev-launchable-ci-cpu.sh
  • scripts/install-openshell.sh
  • test/e2e/test-sandbox-survival.sh
  • test/onboard.test.js
  • test/smoke-macos-install.test.js

Comment on lines +264 to +265
if [ -f "$REGISTRY" ] && grep -Fq "\"${SANDBOX_NAME}\"" "$REGISTRY"; then
pass "NemoClaw registry contains '$SANDBOX_NAME'"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Parse the registry JSON instead of grepping the raw sandbox name.

Lines 264-265 and 490-491 also match defaultSandbox, so these checks can pass even when the actual entry under sandboxes is missing. That weakens the “registry retains sandbox entry” assertion, and it makes the later cleanup assertion ambiguous for the same reason.

Also applies to: 490-491

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-sandbox-survival.sh` around lines 264 - 265, The current
grep-based check using REGISTRY and SANDBOX_NAME is brittle because it matches
substrings like defaultSandbox; instead parse the registry JSON and assert the
sandboxes array contains an entry with the exact name SANDBOX_NAME. Replace the
grep check around REGISTRY and SANDBOX_NAME with a JSON-aware test (e.g., using
jq or a small python -c) that verifies .sandboxes[] | select(.name ==
"$SANDBOX_NAME") exists, and update the identical checks later in the file (the
ones around the second occurrence) so both assertions explicitly verify the
sandbox entry inside the sandboxes array rather than relying on raw string
matching.

ericksoa added a commit that referenced this pull request Apr 7, 2026
## Summary
- Patch `streamSandboxCreate` spawn injection in onboard interactive
test to prevent 87s+ hang from real subprocess hitting a live gateway
- Mock dashboard readiness curl check in 4 onboard `createSandbox` tests
that were sleeping 28s each through unmocked polling loop
- Add 10s timeout to `smoke-macos-install` pipe test to prevent hang

Cherry-picked from #1587 (test-only commits).

## Test plan
- [x] `vitest run --project cli` passes
- [x] Pre-push hooks pass

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes

* **Tests**
* Enhanced test mocking infrastructure for sandbox execution health
checks and child process handling to improve test reliability and
coverage.
  * Adjusted macOS smoke test suite configuration.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
ericksoa added 2 commits April 7, 2026 16:09
…nstalls

Treat unreadable openshell version as too-old (reinstall instead of
silently skipping the minimum-version gate). Also compare installed
version against the pin in brev-launchable-ci-cpu.sh so reused
launchables get upgraded when OPENSHELL_VERSION changes.

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
@wscurran wscurran added Getting Started Use this label to identify setup, installation, or onboarding issues. priority: high Important issue that should be resolved in the next release OpenShell Support for OpenShell, a safe, private runtime for autonomous AI agents fix labels Apr 8, 2026
// Ensure the installed version meets the minimum required by install-openshell.sh.
// The script itself is idempotent — it exits early if the version is already sufficient.
const currentVersion = getInstalledOpenshellVersion();
if (!currentVersion) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems redundant with previous if, maybe it can be regrouped together to reduce branching?

}
} else {
const parts = currentVersion.split(".").map(Number);
const minParts = [0, 0, 24]; // must match MIN_VERSION in scripts/install-openshell.sh
Copy link
Copy Markdown
Contributor

@HagegeR HagegeR Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think something that important might be better placed on some constant part of the code, on the first lines preferably if not in a separate config file
Even more so now that I see it's reused in other places

@cv cv added the v0.0.10 Release target label Apr 8, 2026
@cv cv self-assigned this Apr 8, 2026
Copy link
Copy Markdown
Contributor

@cv cv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — security review WARNING (non-blocking).

Approved with one note:

  • MEDIUM: brev-launchable-ci-cpu.sh downloads OpenShell without SHA-256 checksum verification (unlike install-openshell.sh which does verify). Pre-existing gap but this PR adds a second download-without-verify path. Consider reusing the checksum logic from install-openshell.sh in a follow-up.

Otherwise clean:

  • Version gate null-handling is correct
  • Version comparison logic is sound (regex guarantees numeric segments)
  • No credential changes
  • All CI checks pass
  • E2E test coverage is significantly expanded

@cv cv merged commit 1aa8427 into main Apr 8, 2026
25 of 27 checks passed
miyoungc added a commit that referenced this pull request Apr 9, 2026
## Summary
- Document `nemoclaw credentials list` and `nemoclaw credentials reset`
commands in commands reference (#1597)
- Add `--dry-run` flag documentation for `policy-add` (#1276)
- Update policy presets table: remove `docker` (#1647), add `brave` and
`brew`, update HuggingFace endpoint (#1540)
- Document `NEMOCLAW_LOCAL_INFERENCE_TIMEOUT` env var for local
providers (#1620)
- Document `NEMOCLAW_PROXY_HOST`/`NEMOCLAW_PROXY_PORT` env vars (#1563)
- Add troubleshooting entries for Docker group permissions (#1614),
sandbox survival after gateway restart (#1587), and proxy configuration
- Regenerate `nemoclaw-user-*` skills from updated docs

## Test plan
- [x] `make docs` builds without warnings
- [x] All pre-commit and pre-push hooks pass
- [ ] Verify rendered pages in docs site preview

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
* Added `nemoclaw credentials list` command to display stored credential
names
* Added `nemoclaw credentials reset <KEY>` command with `--yes` flag to
remove credentials
  * Added `--dry-run` flag for policy-add to preview endpoint changes
  * New policy presets: `brave` and `brew`
* New configuration options: `NEMOCLAW_LOCAL_INFERENCE_TIMEOUT`,
`NEMOCLAW_PROXY_HOST`, and `NEMOCLAW_PROXY_PORT`

* **Documentation**
* Expanded troubleshooting guides for Docker permissions, sandbox
connectivity, local inference timeouts, and proxy configuration

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

fix Getting Started Use this label to identify setup, installation, or onboarding issues. OpenShell Support for OpenShell, a safe, private runtime for autonomous AI agents priority: high Important issue that should be resolved in the next release v0.0.10 Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants