fix(preflight): distinguish docker socket permission from daemon-down #1599
latenighthackathon wants to merge 1 commit into NVIDIA:main
When 'docker info' fails but 'systemctl is-active docker' reports the service is active, the daemon is running and the real problem is that the current user cannot access /var/run/docker.sock. The previous remediation always suggested 'sudo systemctl start docker', which is misleading and frustrating because the daemon is already running.

This was reported on DGX Spark in NVIDIA#1574: the user verified docker.service was active, then ran 'nemoclaw onboard' and got the wrong remediation.

When the assessment shows docker is installed, unreachable, AND systemd reports the service is active, emit a new 'fix_docker_socket_permission' remediation that points users at the docker group + newgrp/relogin workflow instead.

Adds a regression test asserting:
- The new action id is returned for the active-but-unreachable case
- The misleading 'sudo systemctl start docker' command is NOT present
- The reason mentions the socket so users understand the root cause

Closes NVIDIA#1574

Signed-off-by: latenighthackathon <latenighthackathon@users.noreply.github.com>
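The distinction the commit message draws between the two failure modes can be sketched as a small classifier. This is a minimal illustration, not the project's code: `RunCommand`, `DockerState`, and `classifyDockerState` are hypothetical names, and the command runner is injected so the logic can be exercised without a real Docker host. It assumes `docker info` exits non-zero whenever the daemon cannot be reached and that `systemctl is-active docker` exits 0 only when the unit is active.

```typescript
// Hypothetical sketch: these names are illustrative, not NemoClaw's actual API.

/** Runs a command and returns its exit code (0 = success). Injected for testability. */
type RunCommand = (command: string, args: string[]) => number;

type DockerState = "reachable" | "socket_permission" | "daemon_down";

function classifyDockerState(run: RunCommand): DockerState {
  // `docker info` has to talk to the daemon, so it fails both when the daemon
  // is down and when the current user cannot open the socket.
  if (run("docker", ["info"]) === 0) return "reachable";

  // `systemctl is-active docker` exits 0 only when the unit is active.
  // Active but unreachable is treated here as a socket permission problem,
  // which is the case described in the commit message (an assumption: other
  // causes are possible on unusual setups).
  const serviceActive = run("systemctl", ["is-active", "docker"]) === 0;
  return serviceActive ? "socket_permission" : "daemon_down";
}

// Fake runners standing in for real process execution.
const activeButUnreachable: RunCommand = (cmd) => (cmd === "docker" ? 1 : 0);
const daemonDown: RunCommand = (cmd) => (cmd === "docker" ? 1 : 3);

console.log(classifyDockerState(activeButUnreachable)); // socket_permission
console.log(classifyDockerState(daemonDown)); // daemon_down
```

In a real preflight the runner would wrap actual process execution; keeping it as a parameter lets the active-but-unreachable case be reproduced in a unit test on any machine.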
No actionable comments were generated in the recent review.

Walkthrough
Updated the Docker preflight remediation logic to conditionally handle scenarios where Docker is installed and its service is active on Linux but unreachable to the current user. Instead of suggesting to start Docker, it now recommends fixing socket permissions by adding the user to the docker group. Added a test case validating this new remediation path.

Estimated code review effort: 2 (Simple), ~10 minutes
Pre-merge checks: 4 passed, 1 failed (1 warning)
✨ Thanks for submitting this fix, which proposes a way to distinguish Docker socket permission errors from daemon-down states during preflight checks. This improves the onboarding experience on DGX Spark and other Linux hosts where users may not be in the docker group.
@ericksoa landed essentially the same fix in #1614 yesterday (closing #50): the [...]

Thanks @ericksoa for the cleaner shape: the [...]

For @zNeill on the original report (#1574): the fix is already on main and will land in the next release. The new behavior on Linux when systemd reports docker.service active but [...]

Cheers!
Summary
- `fix_docker_socket_permission` remediation
- `start_docker` remediation for the case where the daemon is actually down

Problem
On DGX Spark (and any Linux host where the user isn't in the `docker` group), `nemoclaw onboard` reports that Docker is not reachable and suggests `sudo systemctl start docker`, even when `docker.service` is already active and running. The current preflight already collects `dockerServiceActive` via `systemctl is-active docker` but never uses it in the remediation logic, so a socket permission error and a daemon-down error get the same misleading message.

The reporter verified that `systemctl status docker` returned `active (running)` and was still told to start docker. This sends users down a dead end because re-running `systemctl start docker` does nothing when it's already up.

Fix
In `planHostRemediation`, when `dockerInstalled && !dockerReachable && dockerServiceActive === true && platform === "linux"`, emit a new remediation:
- `fix_docker_socket_permission`
- `sudo usermod -aG docker $USER` → `newgrp docker` (or relogin) → `docker info` → `nemoclaw onboard`

The original `start_docker` remediation is preserved for the daemon-actually-down case.

Test plan
- `sudo usermod -aG docker $USER` is in the commands
- `sudo systemctl start docker` command is NOT present
- `npx vitest run src/lib/preflight.test.ts`
- `start_docker` test still passes; the daemon-down branch is unchanged

Closes #1574
Signed-off-by: latenighthackathon <latenighthackathon@users.noreply.github.com>
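For readers without the diff at hand, the branch described in the PR description might look roughly like the sketch below. It is an approximation under stated assumptions: `planDockerRemediation`, the `HostAssessment` and `Remediation` shapes, and the `reason` strings are hypothetical; only the field names `dockerInstalled`, `dockerReachable`, `dockerServiceActive`, and `platform`, the two action ids, and the listed commands come from the description.

```typescript
// Illustrative shapes: field names follow the PR description, the rest is assumed.
interface HostAssessment {
  platform: string;
  dockerInstalled: boolean;
  dockerReachable: boolean;
  dockerServiceActive: boolean | null; // null when systemd could not be queried
}

interface Remediation {
  id: string;
  reason: string;
  commands: string[];
}

// Hypothetical stand-in for the Docker branch of the real planHostRemediation.
function planDockerRemediation(a: HostAssessment): Remediation | null {
  // Nothing to remediate here if Docker is missing (a different remediation
  // would apply) or already reachable.
  if (!a.dockerInstalled || a.dockerReachable) return null;

  // Daemon is up but this user cannot talk to it: fix group membership
  // instead of restarting a service that is already running.
  if (a.platform === "linux" && a.dockerServiceActive === true) {
    return {
      id: "fix_docker_socket_permission",
      reason: "docker.service is active, but the current user cannot access the Docker socket (/var/run/docker.sock).",
      commands: [
        "sudo usermod -aG docker $USER",
        "newgrp docker",
        "docker info",
        "nemoclaw onboard",
      ],
    };
  }

  // Daemon-down branch, unchanged in spirit from the previous behavior.
  return {
    id: "start_docker",
    reason: "Docker is installed but the daemon is not running.",
    commands: ["sudo systemctl start docker"],
  };
}

const plan = planDockerRemediation({
  platform: "linux",
  dockerInstalled: true,
  dockerReachable: false,
  dockerServiceActive: true,
});
console.log(plan ? plan.id : "none"); // fix_docker_socket_permission
```

The strict `=== true` comparison matters in this sketch: an unknown service state (`null`) falls through to `start_docker` rather than being treated as a permission problem.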
Summary by CodeRabbit
Bug Fixes
Tests