feat: add component contributor test harness by ArangoGutierrez · Pull Request #508 · NVIDIA/aicr

ArangoGutierrez · 2026-04-08T18:41:29Z

Summary

Validate AICR components end-to-end with a single command — no GPU hardware required for most components.

make component-test COMPONENT=cert-manager

Three test tiers (auto-detected from registry.yaml): scheduling (KWOK redirect), deploy (Kind + bundle + health check), gpu-aware (Kind + nvml-mock + deploy + health check)
nvml-mock integration using ghcr.io/nvidia/nvml-mock:0.1.0 for GPU simulation in Kind clusters (arm64 + amd64, includes nvidia-smi)
Bundler bugfix: deploy.sh template now conditionally includes --version flag — fixes broken helm commands for components without defaultVersion in registry (e.g., gpu-operator)
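
The tier-to-stage mapping described above and in the test plan can be sketched as follows. This is only an illustration: the harness itself is a set of shell scripts, and the function name `stepsForTier` and the stage labels are hypothetical, taken from the stage sequences listed in this PR rather than from the actual code.

```go
package main

import "fmt"

// stepsForTier returns the ordered stages a tier runs through.
// Hypothetical helper: stage names mirror the sequences in the PR's test plan,
// not identifiers from the real tools/component-test scripts.
func stepsForTier(tier string) ([]string, error) {
	switch tier {
	case "scheduling":
		// No Kind cluster is created; the run is handed off to KWOK.
		return []string{"redirect to KWOK"}, nil
	case "deploy":
		return []string{"build", "deploy", "health check", "cleanup"}, nil
	case "gpu-aware":
		// Same as deploy, with the nvml-mock setup before the component is deployed.
		return []string{"build", "nvml-mock", "deploy", "health check", "cleanup"}, nil
	default:
		return nil, fmt.Errorf("unknown tier %q", tier)
	}
}

func main() {
	for _, tier := range []string{"scheduling", "deploy", "gpu-aware"} {
		steps, err := stepsForTier(tier)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%-10s %v\n", tier, steps)
	}
}
```

The only difference between deploy and gpu-aware in this sketch is the extra nvml-mock stage, which matches how the two tiers are described in the summary.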

New files

tools/component-test/: 6 scripts (detect-tier, ensure-cluster, setup-gpu-mock, deploy-component, run-health-check, cleanup), Kind config, nvml-mock manifest, README
Makefile targets: component-test, component-detect, component-cluster, component-deploy, component-health, component-cleanup
Documentation updates in DEVELOPMENT.md and CONTRIBUTING.md

Test Plan

make test — all unit tests pass (72.1% coverage)
make component-test COMPONENT=cert-manager — deploy tier end-to-end (build → deploy → health check → cleanup)
make component-test COMPONENT=gpu-operator TIER=gpu-aware — gpu-aware tier end-to-end (build → nvml-mock → deploy → health check → cleanup)
make component-test COMPONENT=cert-manager TIER=scheduling — scheduling tier redirects to KWOK
New tests: TestGenerateDeployScript_EmptyVersionOmitsFlag, TestGenerateDeployScript_WithVersionIncludesFlag

kannon92 · 2026-04-08T18:50:02Z

So rather than go with mock GPUs is there a way we could have a CPU flavor?

I like that pattern for llama.cpp or vllm.

ArangoGutierrez · 2026-04-08T18:55:34Z

> So rather than go with mock GPUs is there a way we could have a CPU flavor?
>
> I like that pattern for llama.cpp or vllm.

Good question — the harness actually already has a GPU-free path. The deploy tier validates components in plain Kind without any GPU mock (cert-manager, kai-scheduler, etc. use this today).

The nvml-mock layer is specifically for components that gate on GPU presence during init (gpu-operator, nvidia-device-plugin, DRA driver): they won't even start their reconciliation loop unless they detect NVML libraries and device nodes on the host. There's no CPU flavor of those because their entire purpose is managing GPU hardware.

For inference workloads like llama.cpp or vLLM, a CPU flavor would make sense as a complementary pattern: deploy the serving stack with a CPU backend and validate the end-to-end request path. That's a higher-level integration test than what this harness targets (component deployment + health check), but it could be built on top of it.

So both patterns have a place:

nvml-mock: GPU infrastructure components that check for hardware at init
CPU flavors: inference/serving workloads that can run with CPU backends

Validate AICR components end-to-end with a single command:

  make component-test COMPONENT=cert-manager

Three test tiers, auto-detected from registry.yaml:
- scheduling: redirects to existing KWOK infrastructure
- deploy: Kind cluster + aicr bundle + chainsaw health check
- gpu-aware: Kind + nvml-mock DaemonSet + deploy + health check

New files:
- tools/component-test/{detect-tier,ensure-cluster,setup-gpu-mock,deploy-component,run-health-check,cleanup}.sh
- tools/component-test/{kind-config.yaml,manifests/nvml-mock.yaml,README.md}

Makefile targets: component-test, component-detect, component-cluster, component-deploy, component-health, component-cleanup.

Uses ghcr.io/nvidia/nvml-mock:0.1.0 for GPU simulation in Kind clusters (arm64+amd64, includes nvidia-smi).

Tested end-to-end:
- deploy tier: cert-manager (build → deploy → health check → cleanup)
- gpu-aware tier: gpu-operator (build → nvml-mock → deploy → health check → cleanup)

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>

The deploy.sh template unconditionally included '--version {{ .Version }}' which produced a broken helm command when Version was empty (e.g., gpu-operator has no defaultVersion in registry.yaml). Helm 4 treats the empty --version as a missing required argument.

The template now conditionally includes --version only when Version is non-empty, allowing components without pinned versions to install the latest chart from the repository.

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
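
The conditional described in this commit can be sketched as below, assuming the bundler renders deploy.sh with Go's text/template (the `{{ .Version }}` syntax suggests this, but it is an assumption). The one-line template, the `component` struct, `renderHelmCommand`, and the chart names and version are hypothetical placeholders, not the actual AICR template or registry values.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// deployTmpl is a simplified, hypothetical stand-in for the deploy.sh template.
// The --version flag is emitted only when .Version is a non-empty string.
const deployTmpl = `helm upgrade --install {{ .Name }} {{ .Chart }}{{ if .Version }} --version {{ .Version }}{{ end }}`

// component holds the illustrative fields the template needs.
type component struct {
	Name    string
	Chart   string
	Version string // empty when the registry has no defaultVersion
}

// renderHelmCommand executes the template for a single component.
func renderHelmCommand(c component) (string, error) {
	t, err := template.New("deploy").Parse(deployTmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, c); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// Placeholder chart references and version, for illustration only.
	comps := []component{
		{Name: "cert-manager", Chart: "example-repo/cert-manager", Version: "v1.0.0"},
		{Name: "gpu-operator", Chart: "example-repo/gpu-operator"},
	}
	for _, c := range comps {
		cmd, err := renderHelmCommand(c)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(cmd)
	}
}
```

With an empty Version the rendered command has no trailing `--version`, so helm resolves the latest chart; with a pinned Version the flag and its value are emitted together.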

kannon92

Thanks for this! This should help me a lot with Kueue work.

ArangoGutierrez · 2026-04-08T20:08:04Z

CI is passing, ready for review @yuanchen8911 / @mchmarny

ArangoGutierrez requested review from a team as code owners April 8, 2026 18:41

github-actions bot added area/recipes area/docs area/bundler size/XL labels Apr 8, 2026

ArangoGutierrez mentioned this pull request Apr 8, 2026

add kueue components as an option #490

Draft

25 tasks

mchmarny assigned ArangoGutierrez Apr 8, 2026

ArangoGutierrez added 2 commits April 8, 2026 21:04

ArangoGutierrez force-pushed the feature/component-test-harness branch from d84bc0a to 45ddbbe Compare April 8, 2026 19:08

kannon92 reviewed Apr 8, 2026

Merge branch 'main' into feature/component-test-harness

2844c50

ArangoGutierrez wants to merge 3 commits into NVIDIA:main from ArangoGutierrez:feature/component-test-harness