
fix: show host CPU/RAM in System Overview instead of Arcane container limits (#1110) #2343

Open

GiulioSavini wants to merge 2 commits into getarcaneapp:main from GiulioSavini:fix/system-overview-use-host-stats

Conversation

@GiulioSavini (Contributor) commented Apr 11, 2026

Summary

Fixes #1110 (originally reported as #1052). The System Overview dashboard reports CPU cores and memory totals clamped to the Arcane container's own cgroup limits. When Arcane runs in a container with e.g. cpus: 2 and memory: 512MB, the dashboard shows readings like "Memory Usage 4.69 GB / 512 MB" (memUsed is read from the host while memTotal has been overwritten with the container limit) and "2 CPUs" even when the host has many more. The referenced earlier fix (80ccc9b...) did not remove the offending override on either of the two code paths that feed the dashboard, so the regression persisted.

Fix

Two surgical deletions, one per path, both leaving the surrounding infrastructure intact.

  1. backend/internal/api/ws_handler.go — drop the applyCgroupLimits call from collectSystemStats and delete the now-orphaned helper. collectSystemStats now returns what gopsutil reports for the host (cpu.Counts(true), mem.VirtualMemory()) unchanged.
    • The cgroup-limits caching infrastructure (cgroupLimitsDetector, getCachedCgroupLimitsInternal, the static cache sampler) stays put — still exercised by TestWebSocketHandler_GetCachedCgroupLimitsInternal_DeduplicatesRefresh. If someone wants to add a future "Arcane container stats" view that intentionally shows the container's own limits, the infra is there to consume; the regression just isn't wired into the host dashboard any more.
  2. backend/internal/huma/handlers/system.go — drop the DetectCgroupLimits block in GetDockerInfo and its now-unused dockerutil import. info.NCPU and info.MemTotal from the Docker daemon (which already reflect the host the daemon is running on) are passed through unchanged.

A short comment explains the intent in GetDockerInfo so a future refactor doesn't re-introduce the clamp.

Why this is safe

  • applyCgroupLimits had one caller (collectSystemStats) — confirmed by repo-wide grep. The helper is deleted cleanly, no dangling references.
  • The docker "github.com/getarcaneapp/arcane/backend/pkg/dockerutil" alias in system.go was only used inside the removed cgroup block — confirmed by the same grep. Removing it keeps imports tidy.
  • The dockerutil.DetectCgroupLimits function itself and the whole backend/pkg/dockerutil package are untouched.
  • No frontend changes. The dashboard will simply start rendering the host values it was always fetching on that path.

Tests

  • go vet ./internal/api/... ./internal/huma/handlers/... — clean.
  • go test ./internal/api/... -count=1: ok in ~3.4s. All existing ws_handler tests pass, including the one that exercises getCachedCgroupLimitsInternal (which is still around and still reachable).
  • go test ./internal/huma/handlers/... -count=1: ok in ~0.2s.

Out of scope

  • #995 (container update behavior) — intentionally not touched here, that one is a design discussion and kmendell's compose-restart / updater-restart experimental images already explore it.
  • Any future "show me my Arcane container's own resource budget" view — this would be a separate, additive PR that consumes the still-intact cgroup infrastructure.

Fixes #1110


Greptile Summary

This PR fixes a regression where the System Overview dashboard showed CPU/RAM clamped to the Arcane container's cgroup limits instead of the actual host values. The fix is two surgical deletions: removing the applyCgroupLimits call and its method from ws_handler.go, and removing the DetectCgroupLimits override block (plus its now-unused import) from system.go.

Confidence Score: 5/5

Safe to merge — changes are pure deletions of an incorrect override with no regressions introduced.

Both changes are straightforward removals of a faulty cgroup-clamping path; no new logic is introduced, the remaining infrastructure is untouched, all existing tests pass, and the fix directly targets the reported bug.

No files require special attention.


@kmendell (Member) commented Apr 11, 2026

Snyk checks have passed. No issues have been found so far.

Scan Engine: Open Source Security | Critical: 0 | High: 0 | Medium: 0 | Low: 0 | Total: 0 issues


@kmendell (Member)

The issue with this is that LXC containers need the cgroup logic in order to detect the correct limits.

…er (getarcaneapp#1110)

The System Overview dashboard clamped CPU cores and memory totals to
the Arcane container's own cgroup limits (set by the operator via
--cpus / --memory in docker-compose). This caused nonsense readings
like "Memory Usage 4.69 GB / 512 MB" and "2 CPUs" on a 16-core host.

Root cause: applyCgroupLimits() and the GetDockerInfo cgroup block
applied the container's artificial scheduling limits as if they were
hardware facts, overwriting the correct host values gopsutil and the
Docker daemon already return.

The fix must preserve cgroup-limit application for LXC containers.
In LXC, gopsutil reads the physical host's /proc (which may show 128 GB
RAM) while the LXC guest was only allocated 8 GB — the cgroup limits
are the correct values to display. Docker is the opposite: its limits
are operator-defined throttles, not hardware reality.

Detection: Docker always creates /.dockerenv; LXC does not. A secondary
check on /proc/self/cgroup patterns covers edge cases. This is exposed
as docker.IsDockerContainer() in cgroup_utils.go and used in two places:

- ws_handler.go applyCgroupLimits(): returns early (no-op) when in Docker,
  allowing gopsutil host values through unchanged. Applies cgroup limits
  for LXC and other cgroup-managed environments as before.
- system.go GetDockerInfo(): same guard around the DetectCgroupLimits
  block — Docker daemon NCPU/MemTotal are already host values in Docker;
  in LXC the daemon may report the full physical machine so the LXC
  cgroup budget must still be applied.

Fixes getarcaneapp#1110
@GiulioSavini force-pushed the fix/system-overview-use-host-stats branch from 6ceb719 to 4fc6400 on April 12, 2026 at 16:50
@GiulioSavini (Contributor, Author)

Good catch, @kmendell. The previous version removed applyCgroupLimits entirely, which broke LXC deployments where the cgroup limits represent the real hardware budget assigned to the guest (gopsutil reads the host's /proc there and would show the physical machine's full RAM/CPU).

I've reworked the fix to be environment-aware:

  • Added IsDockerContainer() to cgroup_utils.go — checks for /.dockerenv (always created by Docker, never by LXC) with a /proc/self/cgroup pattern fallback.
  • applyCgroupLimits() returns early (no-op) when inside Docker, so gopsutil's already-correct host values are passed through unchanged.
  • For LXC and other cgroup-managed environments the limits are still applied as before.
  • Same guard in GetDockerInfo — Docker daemon's NCPU/MemTotal are already host values in Docker; in LXC the daemon may report the physical machine so the LXC cgroup budget must be applied.

Both go test ./internal/api/... and ./internal/huma/handlers/... pass.
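The environment detection described above can be approximated with a stand-alone, stdlib-only sketch. The real helper is docker.IsDockerContainer() in cgroup_utils.go; this version is illustrative and assumes only the two signals named in the PR (the /.dockerenv sentinel file and Docker-style paths in /proc/self/cgroup).

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// isDockerContainer reports whether the process appears to run inside a
// Docker container. Docker always creates /.dockerenv at the container
// root; LXC does not. As a fallback for edge cases, /proc/self/cgroup is
// scanned for Docker-style cgroup paths.
func isDockerContainer() bool {
	if _, err := os.Stat("/.dockerenv"); err == nil {
		return true
	}
	if data, err := os.ReadFile("/proc/self/cgroup"); err == nil {
		return strings.Contains(string(data), "docker")
	}
	return false
}

func main() {
	fmt.Println("in Docker:", isDockerContainer())
}
```

In the fix, applyCgroupLimits returns early when this check is true, so gopsutil's host values pass through untouched in Docker, while LXC and other cgroup-managed environments still get their limits applied.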

@GiulioSavini (Contributor, Author)

Two things I'd like your eyes on before this merges:

1. GetDockerInfo + LXC: cgroup limits might be double-wrong

If Arcane and Docker Engine are both running inside the same LXC guest (the common case), info.NCPU/info.MemTotal from the daemon already reflect the LXC-limited view — applying cgroup limits on top could produce incorrect results a second time. The guard !IsDockerContainer() correctly skips this in Docker, but for LXC-hosted-Docker the block may be redundant or harmful.

The original PR removed this block entirely for GetDockerInfo and I think that's actually correct: the Docker daemon always reports what its own environment sees, which is already the right thing to display. I can drop the cgroup block from GetDockerInfo unconditionally and only keep the applyCgroupLimits guard (with the !IsDockerContainer() check) in ws_handler.go where gopsutil reads raw /proc. Would you prefer that?

2. /.dockerenv + Podman docker-compat mode

Podman can create /.dockerenv when running in Docker-compatibility mode. In that case IsDockerContainer() would return true even though Podman's cgroup limits behave more like Docker's (artificial throttles), so the behavior would still be correct — but worth flagging in case you have Podman users who rely on the LXC path.

@kmendell (Member)

I will have to test on a LXC container later today

@GiulioSavini (Contributor, Author)

Alright :)



Development

Successfully merging this pull request may close these issues.

🐞 Bug: System Overview has wrong reference for CPU Cores and RAM (#1052)
