
feat(recipe): add mixin composition for OS and platform fragments #501

Merged
mchmarny merged 1 commit into NVIDIA:main from yuanchen8911:feat/recipe-mixins
Apr 8, 2026

@yuanchen8911
Contributor

@yuanchen8911 yuanchen8911 commented Apr 7, 2026

Summary

Add RecipeMixin kind and spec.mixins field for composing orthogonal fragments (OS constraints, platform components) into leaf overlays without duplicating content across files. Update all recipe-related documentation to reflect the new mixin composition system.

Motivation / Context

Ubuntu OS constraints (3 lines) are duplicated in 12 leaf overlays. The kubeflow-trainer component definition (9 lines) is duplicated in 4 leaf overlays. Inference gateway components (kgateway-crds + kgateway, 20 lines) are duplicated in 5 service-inference overlays. When the Ubuntu version, kubeflow chart version, or kgateway version changes, every copy must be updated, which increases review burden, merge-conflict surface, and drift risk.

This is Phase 3 of the revised ADR-005 (#439): add composition abstractions after correctness fixes (Phase 1: #492, #493) and candidate selection (Phase 2: #496) are stable.

Fixes: N/A
Related: #439, #305, #492, #493, #496

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Refactoring (no functional changes)
  • Build/CI/tooling

Component(s) Affected

  • CLI (cmd/aicr, pkg/cli)
  • API server (cmd/aicrd, pkg/api, pkg/server)
  • Recipe engine / data (pkg/recipe)
  • Bundlers (pkg/bundler, pkg/component/*)
  • Collectors / snapshotter (pkg/collector, pkg/snapshotter)
  • Validator (pkg/validator)
  • Core libraries (pkg/errors, pkg/k8s)
  • Docs/examples (docs/, examples/)
  • Other: ____________

Implementation Notes

Mixin system

  • RecipeMixin kind with constraints and componentRefs only — no criteria, base, mixins, or validation
  • Mixins []string field on RecipeMetadataSpec — accumulated during Merge(), stripped from materialized output
  • mergeMixins() helper called from both BuildRecipeResult and BuildRecipeResultWithEvaluator after mergeOverlayChains
  • Conflict detection: duplicate constraint or component names between a mixin and the inheritance chain produce an error at merge time
  • Duplicate mixin name detection: if two mixin files declare the same metadata.name, loading fails with a clear error
  • Loader: mixin files in recipes/mixins/ are loaded separately from overlays — distinct kind: RecipeMixin prevents them from being treated as matchable overlays
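A minimal Go sketch of the merge step described above, assuming simplified stand-in types (Constraint, ComponentRef, RecipeMixin, Spec) rather than the real pkg/recipe definitions. mergeMixins here only mirrors the stated behavior: after the inheritance chain is merged, append each referenced mixin's constraints and componentRefs, fail on a name collision, and drop the mixins list from the result. Treating an unknown mixin name as an error, and rejecting overlaps between two mixins, are assumptions not spelled out in the PR.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the real recipe types.
type Constraint struct {
	Name  string
	Value string
}

type ComponentRef struct {
	Name string
}

// RecipeMixin holds only constraints and componentRefs.
type RecipeMixin struct {
	Name          string
	Constraints   []Constraint
	ComponentRefs []ComponentRef
}

// Spec is the merged inheritance chain plus the accumulated mixin names.
type Spec struct {
	Mixins        []string
	Constraints   []Constraint
	ComponentRefs []ComponentRef
}

// mergeMixins appends each referenced mixin's content to spec. A name already
// present is reported as a conflict. Names from earlier mixins join the same
// sets, so overlaps between two mixins are also rejected (assumption). On
// success Mixins is cleared so it never reaches materialized output; on error
// spec is left unchanged.
func mergeMixins(spec *Spec, mixins map[string]RecipeMixin) error {
	seenC := make(map[string]bool)
	for _, c := range spec.Constraints {
		seenC[c.Name] = true
	}
	seenR := make(map[string]bool)
	for _, r := range spec.ComponentRefs {
		seenR[r.Name] = true
	}
	outC := append([]Constraint(nil), spec.Constraints...)
	outR := append([]ComponentRef(nil), spec.ComponentRefs...)

	for _, name := range spec.Mixins {
		m, ok := mixins[name]
		if !ok {
			return fmt.Errorf("unknown mixin %q", name) // assumption
		}
		for _, c := range m.Constraints {
			if seenC[c.Name] {
				return fmt.Errorf("mixin %q: constraint %q conflicts with an existing constraint", name, c.Name)
			}
			seenC[c.Name] = true
			outC = append(outC, c)
		}
		for _, r := range m.ComponentRefs {
			if seenR[r.Name] {
				return fmt.Errorf("mixin %q: component %q conflicts with an existing component", name, r.Name)
			}
			seenR[r.Name] = true
			outR = append(outR, r)
		}
	}
	spec.Constraints, spec.ComponentRefs, spec.Mixins = outC, outR, nil
	return nil
}

func main() {
	mixins := map[string]RecipeMixin{
		"os-ubuntu": {
			Name: "os-ubuntu",
			Constraints: []Constraint{
				{Name: "OS.release.ID", Value: "ubuntu"},
				{Name: "OS.release.VERSION_ID", Value: "24.04"},
			},
		},
	}
	spec := Spec{
		Mixins:      []string{"os-ubuntu"},
		Constraints: []Constraint{{Name: "K8s.server.version", Value: ">= 1.32.4"}},
	}
	if err := mergeMixins(&spec, mixins); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(spec.Constraints), spec.Mixins == nil) // 3 true
}
```

Building into copies and committing only at the end keeps a failed merge from leaving a half-composed spec behind.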

Mixin files

| Mixin | Content | Shared by |
|---|---|---|
| os-ubuntu.yaml | OS.release.ID, OS.release.VERSION_ID, OS.sysctl./proc/sys/kernel/osrelease constraints | 12 Ubuntu leaf overlays (EKS, AKS, OKE) |
| platform-inference.yaml | kgateway-crds + kgateway components with CRD manifests and inference gateway | 5 service-inference overlays (EKS, AKS, GKE, Kind, OKE) |
| platform-kubeflow.yaml | kubeflow-trainer component with deps and manifests | 4 kubeflow leaf overlays (EKS, AKS, OKE) |
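The loader rules from the mixin system notes (kind: RecipeMixin, only constraints and componentRefs, unique metadata.name) can be sketched as a validation pass over already-parsed mixin files. YAML parsing is omitted; mixinDoc and indexMixins are hypothetical names, and the spec keys assigned to the three files in main are illustrative, not read from the actual files.

```go
package main

import (
	"fmt"
	"sort"
)

// mixinDoc is a hypothetical, already-parsed view of one file under
// recipes/mixins/: its kind, metadata.name, and top-level keys under spec.
type mixinDoc struct {
	File     string
	Kind     string
	Name     string
	SpecKeys []string
}

// A RecipeMixin may carry only these spec fields.
var allowedMixinKeys = map[string]bool{"constraints": true, "componentRefs": true}

// indexMixins validates mixin documents and indexes them by metadata.name,
// failing on a wrong kind, a disallowed spec field, or a duplicate name.
func indexMixins(docs []mixinDoc) (map[string]mixinDoc, error) {
	index := make(map[string]mixinDoc, len(docs))
	for _, d := range docs {
		if d.Kind != "RecipeMixin" {
			return nil, fmt.Errorf("%s: kind %q, want RecipeMixin", d.File, d.Kind)
		}
		for _, k := range d.SpecKeys {
			if !allowedMixinKeys[k] {
				return nil, fmt.Errorf("%s: spec.%s is not allowed in a mixin", d.File, k)
			}
		}
		if prev, dup := index[d.Name]; dup {
			return nil, fmt.Errorf("duplicate mixin name %q in %s and %s", d.Name, prev.File, d.File)
		}
		index[d.Name] = d
	}
	return index, nil
}

func main() {
	// Illustrative inputs; spec keys per file are assumptions.
	docs := []mixinDoc{
		{File: "os-ubuntu.yaml", Kind: "RecipeMixin", Name: "os-ubuntu", SpecKeys: []string{"constraints"}},
		{File: "platform-inference.yaml", Kind: "RecipeMixin", Name: "platform-inference", SpecKeys: []string{"componentRefs"}},
		{File: "platform-kubeflow.yaml", Kind: "RecipeMixin", Name: "platform-kubeflow", SpecKeys: []string{"componentRefs"}},
	}
	index, err := indexMixins(docs)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	names := make([]string, 0, len(index))
	for n := range index {
		names = append(names, n)
	}
	sort.Strings(names)
	fmt.Println(names) // [os-ubuntu platform-inference platform-kubeflow]
}
```

Keeping mixins in their own index, keyed by name, is what keeps them out of the pool of matchable overlays.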

Overlay conversions

Ubuntu leaf overlays — inline OS constraints → mixins: [os-ubuntu]:

  • h100-eks-ubuntu-training, h100-eks-ubuntu-inference, h100-aks-ubuntu-training, h100-aks-ubuntu-inference
  • gb200-eks-ubuntu-training, gb200-eks-ubuntu-inference
  • gb200-oke-ubuntu-training, gb200-oke-ubuntu-inference (new OKE overlays from #497, "Add GB200 overlays for OKE")

Kubeflow leaf overlays — inline OS constraints + kubeflow-trainer → mixins: [os-ubuntu, platform-kubeflow]:

  • h100-eks-ubuntu-training-kubeflow, h100-aks-ubuntu-training-kubeflow
  • gb200-eks-ubuntu-training-kubeflow
  • gb200-oke-ubuntu-training-kubeflow (new OKE overlay from #497, "Add GB200 overlays for OKE")

Service-inference overlays — inline kgateway-crds + kgateway → mixins: [platform-inference]:

Unchanged:

What a converted overlay looks like

```yaml
# Before: 45 lines (inline OS constraints + kubeflow component)
spec:
  constraints:
    - name: K8s.server.version
      value: ">= 1.32.4"
    - name: OS.release.ID          # duplicated in 12 files
      value: ubuntu
    - name: OS.release.VERSION_ID  # duplicated in 12 files
      value: "24.04"
    - name: OS.sysctl./proc/sys/kernel/osrelease  # duplicated in 12 files
      value: ">= 6.8"
  componentRefs:
    - name: kubeflow-trainer       # duplicated in 4 files
      # ... 9 lines ...
---
# After: 20 lines (mixin references + K8s constraint only)
spec:
  mixins:
    - os-ubuntu
    - platform-kubeflow
  constraints:
    - name: K8s.server.version
      value: ">= 1.32.4"
  componentRefs: []
```

Documentation updates

All recipe-related docs updated for mixin consistency:

| File | Changes |
|---|---|
| docs/contributor/data.md | Added spec.mixins to schema table/YAML example, new Mixin Composition section, Step 5 (Apply Mixins) in recipe generation process |
| docs/integrator/recipe-development.md | Added mixins to overview, leaf-with-mixins example, mixin naming conventions, best practices, external data sources |
| docs/integrator/data-flow.md | Added mixin application step to Stage 2 recipe generation diagram |
| DEVELOPMENT.md | Added recipes/mixins/*.yaml to Recipe Engine data source |
| site/docs/getting-started/index.md | Added Mixin glossary entry |
| recipes/README.md | Added mixins directory and description |
| .claude/CLAUDE.md | Added mixin usage example and key files entry |

Maintenance impact

| Change | Before | After |
|---|---|---|
| Ubuntu version bump | Edit 12 files | Edit 1 mixin file |
| Kubeflow chart upgrade | Edit 4 files | Edit 1 mixin file |
| kgateway version bump | Edit 5 files | Edit 1 mixin file |
| New Ubuntu leaf overlay | Copy-paste 3 constraints | mixins: [os-ubuntu] |
| New inference service overlay | Copy-paste 20 lines | mixins: [platform-inference] |

Testing

go test -race ./pkg/recipe/...
  • All existing recipe tests pass
  • Binary comparison: all 17 recipe criteria combinations produce byte-identical YAML output between main and mixin branch binaries
  • spec.mixins does not appear in materialized recipe output
  • TestBothBuildPathsProduceIdenticalContent verifies both build paths produce identical results across all 16 leaf overlays
  • OKE overlays from #497 ("Add GB200 overlays for OKE") are included in verification

Risk Assessment

  • Medium — Touches multiple components or has broader impact

Rollout notes: Backward compatible. To revert, remove spec.mixins from leaf overlays and inline the mixin content back. The RecipeMixin loader can remain dormant: if no overlay references a mixin, the mixin code path is not exercised. No recipe output format changes.

Checklist

  • Tests pass locally (make test with -race)
  • Linter passes (make lint)
  • I did not skip/disable tests to make CI green
  • I added/updated tests for new functionality
  • I updated docs if user-facing behavior changed
  • Changes follow existing patterns in the codebase
  • Commits are cryptographically signed (git commit -S)

Post-compose constraint evaluation

Mixin-contributed constraints (e.g., kernel >= 6.8 from os-ubuntu) are evaluated against the snapshot in BuildRecipeResultWithEvaluator after mergeMixins. Per-overlay constraints are evaluated before merge (existing behavior); mixin constraints get their first evaluation post-compose.

Fallback behavior: If any mixin constraint fails, the entire composed candidate is reset to base-only output — all applied overlays from the inheritance chain are excluded, not just the mixin. This is a conservative choice: a failing mixin constraint means the composed recipe is not valid for the target environment, so partial results (chain content without mixin content) would be inconsistent. This is tested in TestMixinConstraintFailureExcludesCandidate.
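The evaluation-and-fallback rule can be sketched as follows. The constraint evaluator is a simplified assumption (exact string match, or a numeric >= comparison on the leading dotted version), and Recipe, satisfied, and applyMixinConstraints are hypothetical names. A constraint whose key is absent from the snapshot is treated as failing, which is also an assumption. The kernel strings in main are illustrative.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

type Constraint struct {
	Name  string
	Value string
}

// Recipe is a hypothetical result; only the applied overlays matter here.
type Recipe struct {
	AppliedOverlays []string
}

// versionParts extracts leading dot-separated integers:
// "6.8.0-1015-aws" -> [6 8 0].
func versionParts(s string) []int {
	var parts []int
	for _, field := range strings.Split(s, ".") {
		end := 0
		for end < len(field) && field[end] >= '0' && field[end] <= '9' {
			end++
		}
		if end == 0 {
			break
		}
		n, _ := strconv.Atoi(field[:end])
		parts = append(parts, n)
		if end < len(field) {
			break // stop at a suffix such as "-1015-aws"
		}
	}
	return parts
}

// compareVersions returns -1, 0, or 1; missing components count as 0.
func compareVersions(a, b []int) int {
	for i := 0; i < len(a) || i < len(b); i++ {
		var x, y int
		if i < len(a) {
			x = a[i]
		}
		if i < len(b) {
			y = b[i]
		}
		if x != y {
			if x < y {
				return -1
			}
			return 1
		}
	}
	return 0
}

// satisfied is a simplified evaluator (assumption): ">= X" compares versions,
// anything else must match exactly. A missing snapshot key fails.
func satisfied(c Constraint, snapshot map[string]string) bool {
	actual, ok := snapshot[c.Name]
	if !ok {
		return false
	}
	if want, isMin := strings.CutPrefix(c.Value, ">="); isMin {
		return compareVersions(versionParts(actual), versionParts(strings.TrimSpace(want))) >= 0
	}
	return actual == c.Value
}

// applyMixinConstraints evaluates mixin constraints after composition. If any
// fails, the base-only recipe is returned, discarding every applied overlay,
// not just the mixin content.
func applyMixinConstraints(base, composed Recipe, mixinConstraints []Constraint, snapshot map[string]string) Recipe {
	for _, c := range mixinConstraints {
		if !satisfied(c, snapshot) {
			return base
		}
	}
	return composed
}

func main() {
	const key = "OS.sysctl./proc/sys/kernel/osrelease"
	base := Recipe{}
	composed := Recipe{AppliedOverlays: []string{"h100-eks-ubuntu-training"}}
	mixin := []Constraint{{Name: key, Value: ">= 6.8"}}

	oldKernel := map[string]string{key: "6.5.0-1024-aws"}
	newKernel := map[string]string{key: "6.8.0-1015-aws"}
	fmt.Println(len(applyMixinConstraints(base, composed, mixin, oldKernel).AppliedOverlays)) // 0
	fmt.Println(len(applyMixinConstraints(base, composed, mixin, newKernel).AppliedOverlays)) // 1
}
```

Returning the whole base recipe, rather than stripping only the mixin's contributions, matches the all-or-nothing choice described above.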

Bundle verification

All recipe combinations are byte-identical. All platforms (EKS, GKE, AKS, Kind, OKE), intents (training, inference), accelerators (H100, GB200), OS variants (Ubuntu, COS), and platform overlays (kubeflow, dynamo, NIM) produce exactly the same recipe output before and after the mixin changes. Verified via binary comparison of aicr recipe output between main and PR branch.

GPU test dependency

Note: GPU CI tests (inference, conformance, training) fail on this PR due to a pre-existing imagePullPolicy regression in #444 (merged April 2) that is unrelated to the mixin changes. PR #505 fixes this by restoring the ko.localPullNever image pull policy for side-loaded validator images. Once #505 merges, rebasing this PR will resolve the GPU test failures. All non-GPU tests (unit tests, lint, KWOK, E2E, CLI E2E) pass.

@yuanchen8911 yuanchen8911 added enhancement New feature or request area/recipes labels Apr 7, 2026
@yuanchen8911 yuanchen8911 requested review from a team as code owners April 7, 2026 18:09
@github-actions github-actions bot added the size/L label Apr 7, 2026
@yuanchen8911 yuanchen8911 changed the title feat(recipe): add mixin composition for OS and platform fragments WIP: feat(recipe): add mixin composition for OS and platform fragments Apr 7, 2026
@yuanchen8911 yuanchen8911 marked this pull request as draft April 7, 2026 18:12
@yuanchen8911 yuanchen8911 force-pushed the feat/recipe-mixins branch 2 times, most recently from 76c8674 to 64855b2 Compare April 7, 2026 18:37
@github-actions github-actions bot added size/XL and removed size/L labels Apr 7, 2026
@yuanchen8911 yuanchen8911 force-pushed the feat/recipe-mixins branch 2 times, most recently from 678b9a4 to 94b3e09 Compare April 7, 2026 19:29
@yuanchen8911 yuanchen8911 force-pushed the feat/recipe-mixins branch 6 times, most recently from 8a7504a to f8a47c8 Compare April 7, 2026 22:10
@yuanchen8911 yuanchen8911 marked this pull request as ready for review April 8, 2026 13:14
@yuanchen8911 yuanchen8911 changed the title WIP: feat(recipe): add mixin composition for OS and platform fragments feat(recipe): add mixin composition for OS and platform fragments Apr 8, 2026
Contributor

@ArangoGutierrez ArangoGutierrez left a comment


Nice work on the mixin system: the byte-identical output verification is great and the conflict detection / fallback-to-base logic is solid. A couple of things to fix before merging.

@github-actions

github-actions bot commented Apr 8, 2026

@yuanchen8911 this PR now has merge conflicts with main. Please rebase to resolve them.

@yuanchen8911 yuanchen8911 force-pushed the feat/recipe-mixins branch 2 times, most recently from 1fc135b to f7848e2 Compare April 8, 2026 19:17
Introduce RecipeMixin kind and spec.mixins field on overlays to enable
composable, shared recipe fragments for cross-cutting concerns like OS
constraints and platform components. This eliminates duplication across
leaf overlays that share the same OS or platform content.

- Add recipes/mixins/ directory with os-ubuntu and platform-kubeflow
- Wire mixin loading, conflict detection, and constraint evaluation
  into the metadata store build pipeline
- Add mixin-aware tests for composition, conflicts, and constraint
  evaluation fallback
- Update all recipe documentation for mixin consistency

Signed-off-by: Yuan Chen <yuanchen97@gmail.com>
@github-actions github-actions bot removed the area/ci label Apr 8, 2026
@yuanchen8911 yuanchen8911 requested review from ArangoGutierrez and dims and removed request for ArangoGutierrez April 8, 2026 20:45
@mchmarny mchmarny enabled auto-merge (squash) April 8, 2026 23:01
@mchmarny mchmarny merged commit f47e95f into NVIDIA:main Apr 8, 2026
68 checks passed