
fix: detect Intel Arc on multi-card hosts and collect real memory stats #2358

Open
GiulioSavini wants to merge 3 commits into getarcaneapp:main from GiulioSavini:fix/intel-gpu-multi-card-detection

Conversation


@GiulioSavini GiulioSavini commented Apr 12, 2026

Fixes #999

Problem

On Proxmox (and similar setups where /dev/dri has more than one card) the
Intel Arc GPU was invisible in the dashboard. Three independent root causes:

  1. Wrong card selected: intel_gpu_top without -d defaults to the
    first DRI device (card0), which on a passthrough VM is typically the
    VirtIO display adapter. The actual Arc GPU sits on card1 (or higher) and
    was never queried.

  2. Stub implementation: getIntelStats() was a placeholder that returned
    a single hardcoded "Intel GPU" entry with zero memory figures and never
    called intel_gpu_top at all.

  3. No sysfs fallback: if intel_gpu_top was not in $PATH (ARM builds,
    minimal containers) the GPU didn't appear in the dashboard at all.

Changes

| What | How |
| --- | --- |
| `findIntelDRICards()` | Scans `/sys/class/drm/card*/device/vendor` for PCI vendor `0x8086`; returns only real Intel cards, ignoring VirtIO or other non-Intel adapters regardless of card index |
| `hasIntelGPUInternal()` | Thin wrapper used by `detectGPUs()` as a sysfs fallback when `intel_gpu_top` is absent |
| `detectGPUs()` | After the `intel_gpu_top` probe fails, tries sysfs detection so the GPU still appears in the dashboard |
| `getIntelStats()` | Iterates over every Intel DRI card; runs `intel_gpu_top -d drm:<path> -J -s 100 -c 1` per card to get discrete VRAM stats (total/used) on Arc/Xe; gracefully falls back to zero memory when the tool is unavailable or the GPU has no local memory (iGPU) |
| `intelGPUName()` | Reads `device/label` from sysfs for a human-readable name; falls back to "Intel GPU (cardN)" |

Testing

Tested with go vet and existing unit tests (go test ./backend/internal/api/...).
Full hardware validation requires a multi-card Proxmox host with an Intel Arc
GPU — the fix directly addresses the setup described in #999.


Greptile Summary

This PR fixes Intel Arc GPU invisibility on multi-card hosts (e.g. Proxmox) by replacing the placeholder getIntelStats with a real implementation that enumerates Intel DRI cards via sysfs vendor IDs, targets each card individually with intel_gpu_top -d drm:<path>, and adds a sysfs-only fallback in detectGPUs when the tool is absent. The core logic is sound. Remaining findings are all P2: findIntelDRICards violates the project-wide Internal suffix convention for unexported functions, fmt.Errorf("...: %w", nil) produces a confusing <nil> in the error message when the JSON array is empty, and the unknown-unit branch defaults silently to bytes without a log warning.

Confidence Score: 5/5

Safe to merge — all three findings are P2 style/quality suggestions that don't affect correctness or runtime behaviour.

The implementation logic is correct: sysfs vendor-ID scanning, per-card intel_gpu_top invocation, JSON parsing with both array and object formats, and nil-memory guard for iGPUs all look sound. The three open comments are purely about naming convention, error-message clarity, and a missing warn log — none blocks the fix from working correctly in production.
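
The "both array and object formats" handling and the nil-memory guard mentioned above can be illustrated with a short sketch. The `sample` struct and its JSON field names are assumptions made for this example and are not taken from the PR or from the intel_gpu_top documentation; check real tool output before relying on them.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// sample mirrors only the fields this sketch needs. The field names are
// hypothetical. Memory is a pointer so that a missing block decodes to nil.
type sample struct {
	Memory *struct {
		Unit  string  `json:"unit"`
		Total float64 `json:"total"`
		Used  float64 `json:"used"`
	} `json:"memory"`
}

// decodeSample accepts either one JSON object or an array of objects and
// returns the last sample it finds.
func decodeSample(data []byte) (sample, error) {
	data = bytes.TrimSpace(data)
	if len(data) == 0 {
		return sample{}, errors.New("empty output")
	}
	if data[0] == '[' {
		var arr []sample
		if err := json.Unmarshal(data, &arr); err != nil {
			return sample{}, fmt.Errorf("parse array: %w", err)
		}
		if len(arr) == 0 {
			return sample{}, errors.New("parse array: no samples")
		}
		return arr[len(arr)-1], nil
	}
	var s sample
	if err := json.Unmarshal(data, &s); err != nil {
		return sample{}, fmt.Errorf("parse object: %w", err)
	}
	return s, nil
}
```

A caller would check `s.Memory == nil` before reading totals, which is the case an iGPU without local memory is expected to hit.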

backend/internal/api/ws_handler.go — three minor P2 issues in the new Intel GPU functions


Prompt To Fix All With AI
This is a comment left during a code review.
Path: backend/internal/api/ws_handler.go
Line: 1891

Comment:
**Unexported function missing `Internal` suffix**

Per the project convention, all unexported standalone functions must use the `Internal` suffix. `findIntelDRICards` is a package-private function but does not follow this pattern — unlike the sibling `hasIntelGPUInternal` and `readSysfsValueInternal` in the same file.

```suggestion
func findIntelDRICardsInternal() []string {
```

`hasIntelGPUInternal` would then call `findIntelDRICardsInternal()`.

**Rule Used:** What: All unexported functions must have the "Inte... ([source](https://app.greptile.com/review/custom-context?memory=306fc233-4d2f-4ac4-bdf7-8059588e8a43))

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: backend/internal/api/ws_handler.go
Line: 1985-1986

Comment:
**Nil error wrapped by `%w` produces a confusing message**

When `json.Unmarshal` succeeds but the slice is empty (`err == nil && len(arr) == 0`), `fmt.Errorf("parse intel_gpu_top array: %w", nil)` emits the message `"parse intel_gpu_top array: <nil>"`. The error is still non-nil so callers behave correctly, but the diagnostic message is misleading. Split the two conditions:

```suggestion
			if err := json.Unmarshal(data, &arr); err != nil {
				return intelMemStats{}, fmt.Errorf("parse intel_gpu_top array: %w", err)
			}
			if len(arr) == 0 {
				return intelMemStats{}, fmt.Errorf("parse intel_gpu_top array: empty output")
			}
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: backend/internal/api/ws_handler.go
Line: 2007-2008

Comment:
**Silent fallback to raw bytes for unrecognised memory unit**

If `intel_gpu_top` ever emits an unexpected unit string (e.g. `"MiB "` with trailing whitespace, `"mb"`, or a future `"KiB"`), `scale` silently becomes `1` and the stored bytes value is off by ~six orders of magnitude. Adding a warning log would make such a mismatch far easier to diagnose:

```suggestion
	default:
		slog.WarnContext(ctx, "Unknown intel_gpu_top memory unit, treating as bytes", "unit", unit)
		scale = 1 // assume bytes
```

How can I resolve this? If you propose a fix, please make it concise.
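
One way to make the unknown-unit case visible to the caller is to report whether the unit was recognised and let the caller decide how to log it. This is a sketch under assumptions: the helper names are hypothetical, and the set of units (`B`, `KiB`, `MiB`, `GiB`) is chosen for illustration and is not confirmed by the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// unitScale returns the byte multiplier for a unit string and whether the
// unit was recognised. Surrounding whitespace is ignored. The unit set is
// an assumption made for this sketch.
func unitScale(unit string) (scale uint64, known bool) {
	switch strings.TrimSpace(unit) {
	case "B":
		return 1, true
	case "KiB":
		return 1 << 10, true
	case "MiB":
		return 1 << 20, true
	case "GiB":
		return 1 << 30, true
	default:
		return 1, false // assume bytes, but tell the caller
	}
}

// toBytes converts value to bytes and returns a warning string when the
// unit was not recognised, so the caller can log it.
func toBytes(value float64, unit string) (uint64, string) {
	scale, known := unitScale(unit)
	warning := ""
	if !known {
		warning = fmt.Sprintf("unknown memory unit %q, treating value as bytes", unit)
	}
	return uint64(value * float64(scale)), warning
}
```

Returning the warning instead of logging inside the helper keeps the conversion free of a context or logger dependency; the caller can pass the string to `slog.WarnContext` as in the suggestion above.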

Reviews (1): Last reviewed commit: "fix: Intel Arc not showing on multi-card..."

Greptile also left 3 inline comments on this PR.

Context used:

  • Rule used - What: All unexported functions must have the "Inte... (source)


kmendell commented Apr 12, 2026

Snyk checks have passed. No issues have been found so far.

| Status | Scan Engine | Critical | High | Medium | Low | Total (0) |
| --- | --- | --- | --- | --- | --- | --- |
| | Open Source Security | 0 | 0 | 0 | 0 | 0 issues |


card0 on a passthrough VM is usually a VirtIO adapter, not the Arc GPU.
intel_gpu_top without -d always grabbed the first card, so the real GPU
was never queried.

Also the stats function was a stub that never actually called intel_gpu_top.
Now it targets each Intel card by PCI vendor ID and parses the JSON output
for real VRAM figures.

Added a sysfs fallback so the card shows up even when intel_gpu_top isn't
installed (ARM, minimal images).

Fixes getarcaneapp#999
@GiulioSavini GiulioSavini force-pushed the fix/intel-gpu-multi-card-detection branch from 70a3d1b to 1551f59 Compare April 12, 2026 19:35

Development

Successfully merging this pull request may close these issues.

🐞 Bug: Intel Arc GPU not showing in dashboard
