Merged
22 changes: 22 additions & 0 deletions .claude/CLAUDE.md
@@ -247,6 +247,27 @@ slog.Error("operation failed", "error", err, "component", "gpu-collector")

**Note:** A component must have either `helm` OR `kustomize` configuration, not both.

**Using mixins for shared OS/platform content:**
```yaml
# Leaf overlay referencing mixins instead of duplicating content
spec:
  base: h100-eks-ubuntu-training
  mixins:
    - os-ubuntu # Ubuntu constraints (defined once in recipes/mixins/)
    - platform-kubeflow # kubeflow-trainer component (defined once in recipes/mixins/)
  criteria:
    service: eks
    accelerator: h100
    os: ubuntu
    intent: training
    platform: kubeflow
  constraints:
    - name: K8s.server.version
      value: ">= 1.32.4"
```

Mixins carry only `constraints` and `componentRefs` — no `criteria`, `base`, `mixins`, or `validation`. They live in `recipes/mixins/` with `kind: RecipeMixin`.

## Error Wrapping Rules

**Never return bare errors.** Every `return err` must wrap with context:
@@ -467,6 +488,7 @@ ${AICR_BIN} validate -r recipe.yaml -s snapshot.yaml --no-cluster
| `.settings.yaml` | Project settings: tool versions, quality thresholds, build/test config (single source of truth) |
| `recipes/registry.yaml` | Declarative component configuration |
| `recipes/overlays/*.yaml` | Recipe overlay definitions |
| `recipes/mixins/*.yaml` | Composable mixin fragments (OS constraints, platform components) |
| `recipes/components/*/values.yaml` | Component Helm values |
| `api/aicr/v1/server.yaml` | OpenAPI spec |
| `.goreleaser.yaml` | Release configuration |
2 changes: 1 addition & 1 deletion DEVELOPMENT.md
@@ -209,7 +209,7 @@ aicr/
- **Snapshot Mode**: Extract query from snapshot → Build recipe → Return recommendations
- **Input**: OS, OS version, kernel, K8s service/version, GPU type, workload intent
- **Output**: Recipe with matched rules and configuration measurements
- **Data Source**: Embedded YAML configuration (`recipes/overlays/*.yaml` including `base.yaml`)
- **Data Source**: Embedded YAML configuration (`recipes/overlays/*.yaml` including `base.yaml`, `recipes/mixins/*.yaml`)
- **Query Extraction**: Parses K8s, OS, GPU measurements from snapshots to construct recipe queries

#### Snapshotter
73 changes: 69 additions & 4 deletions docs/contributor/data.md
@@ -36,6 +36,10 @@ recipes/
│ ├── eks-training.yaml # EKS + training workloads (inherits from eks)
│ ├── gb200-eks-ubuntu-training.yaml # GB200/EKS/Ubuntu/training (inherits from eks-training)
│ └── h100-ubuntu-inference.yaml # H100/Ubuntu/inference
├── mixins/ # Composable mixin fragments (kind: RecipeMixin)
│ ├── os-ubuntu.yaml # Ubuntu OS constraints (shared by leaf overlays)
│ ├── platform-inference.yaml # Inference gateway components (shared by service-inference overlays)
│ └── platform-kubeflow.yaml # Kubeflow trainer component (shared by leaf overlays)
└── components/ # Component values files
├── cert-manager/
│ └── values.yaml
@@ -88,6 +92,9 @@ metadata:

spec:
  base: <parent-recipe> # Optional - inherits from another recipe
  mixins: # Optional - composable mixin fragments
    - os-ubuntu # OS constraints (from recipes/mixins/)
    - platform-kubeflow # Platform components (from recipes/mixins/)

  criteria: # When this recipe/overlay applies
    service: eks # Kubernetes platform
@@ -118,6 +125,7 @@ spec:
| `apiVersion` | Always `aicr.nvidia.com/v1alpha1` |
| `metadata.name` | Unique recipe identifier |
| `spec.base` | Parent recipe to inherit from (empty = inherits from `overlays/base.yaml`) |
| `spec.mixins` | List of mixin names to compose (e.g., `["os-ubuntu", "platform-kubeflow"]`) |
| `spec.criteria` | Query parameters that select this recipe |
| `spec.constraints` | Pre-flight validation rules |
| `spec.componentRefs` | List of components to deploy |
@@ -389,6 +397,52 @@ spec:
| **Flexible Extension** | Add new leaf recipes without duplicating parent configs |
| **Testable** | Each level can be validated independently |

### Mixin Composition

Inheritance is single-parent (`spec.base`), so cross-cutting concerns such as OS constraints or platform components would otherwise have to be duplicated across leaf overlays. **Mixins** solve this with composable fragments that overlays reference via `spec.mixins`. Mixins declared by an intermediate overlay are accumulated into all of its descendants during inheritance chain merging.

Mixin files live in `recipes/mixins/` and use `kind: RecipeMixin`:

```yaml
# recipes/mixins/os-ubuntu.yaml
kind: RecipeMixin
apiVersion: aicr.nvidia.com/v1alpha1
metadata:
  name: os-ubuntu

spec:
  constraints:
    - name: OS.release.ID
      value: ubuntu
    - name: OS.release.VERSION_ID
      value: "24.04"
    - name: OS.sysctl./proc/sys/kernel/osrelease
      value: ">= 6.8"
```

Leaf overlays compose mixins alongside inheritance:

```yaml
# recipes/overlays/h100-eks-ubuntu-training-kubeflow.yaml
spec:
  base: h100-eks-training
  mixins:
    - os-ubuntu # Ubuntu constraints
    - platform-kubeflow # Kubeflow trainer component
  criteria:
    service: eks
    accelerator: h100
    os: ubuntu
    intent: training
    platform: kubeflow
```

**Mixin rules:**
- Mixins carry only `constraints` and `componentRefs` — no `criteria`, `base`, `mixins`, or `validation`
- Mixins are applied after inheritance chain merging but before constraint evaluation
- Conflict detection: a mixin constraint or component that conflicts with the inheritance chain or a previously applied mixin produces an error
- When a snapshot is provided, mixin constraints are evaluated against it after merging; if any fail, the entire composed candidate is invalid and falls back to base-only output. In plain query mode (no snapshot), mixin constraints are merged but not evaluated
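As an illustration of the conflict rule, here is a minimal Go sketch. The types are simplified stand-ins, and `applyMixins` is a hypothetical helper rather than the package's `mergeMixins`. It assumes a conflict means a repeated constraint or component name, appends mixin entries in declaration order, and leaves the spec untouched if any conflict is found.

```go
package main

import "fmt"

// Constraint and ComponentRef are simplified stand-ins for the pkg/recipe types.
type Constraint struct {
	Name  string
	Value string
}

type ComponentRef struct {
	Name string
}

// Spec is the result of merging the inheritance chain.
type Spec struct {
	Constraints   []Constraint
	ComponentRefs []ComponentRef
}

// Mixin is a hypothetical in-memory form of a RecipeMixin file.
type Mixin struct {
	Name          string
	Constraints   []Constraint
	ComponentRefs []ComponentRef
}

// applyMixins appends each mixin's constraints and componentRefs to spec in
// order. A name already present in the merged spec or in an earlier mixin is
// treated as a conflict. On error, spec is left unchanged. It returns the
// names of the constraints contributed by mixins.
func applyMixins(spec *Spec, mixins []Mixin) ([]string, error) {
	constraints := append([]Constraint(nil), spec.Constraints...)
	components := append([]ComponentRef(nil), spec.ComponentRefs...)

	// Record where each name came from so errors can name the source.
	constraintSrc := make(map[string]string, len(constraints))
	for _, c := range constraints {
		constraintSrc[c.Name] = "merged spec"
	}
	componentSrc := make(map[string]string, len(components))
	for _, r := range components {
		componentSrc[r.Name] = "merged spec"
	}

	var added []string
	for _, m := range mixins {
		for _, c := range m.Constraints {
			if src, ok := constraintSrc[c.Name]; ok {
				return nil, fmt.Errorf("mixin %q: constraint %q conflicts with %s", m.Name, c.Name, src)
			}
			constraintSrc[c.Name] = "mixin " + m.Name
			constraints = append(constraints, c)
			added = append(added, c.Name)
		}
		for _, r := range m.ComponentRefs {
			if src, ok := componentSrc[r.Name]; ok {
				return nil, fmt.Errorf("mixin %q: component %q conflicts with %s", m.Name, r.Name, src)
			}
			componentSrc[r.Name] = "mixin " + m.Name
			components = append(components, r)
		}
	}

	spec.Constraints, spec.ComponentRefs = constraints, components
	return added, nil
}

func main() {
	spec := &Spec{
		Constraints:   []Constraint{{Name: "K8s.server.version", Value: ">= 1.32.4"}},
		ComponentRefs: []ComponentRef{{Name: "gpu-operator"}},
	}
	osUbuntu := Mixin{Name: "os-ubuntu", Constraints: []Constraint{{Name: "OS.release.ID", Value: "ubuntu"}}}
	kubeflow := Mixin{Name: "platform-kubeflow", ComponentRefs: []ComponentRef{{Name: "kubeflow-trainer"}}}

	added, err := applyMixins(spec, []Mixin{osUbuntu, kubeflow})
	fmt.Println(added, err) // [OS.release.ID] <nil>

	// A later mixin repeating the same constraint name is rejected.
	dup := Mixin{Name: "os-ubuntu-copy", Constraints: []Constraint{{Name: "OS.release.ID", Value: "ubuntu"}}}
	_, err = applyMixins(spec, []Mixin{dup})
	fmt.Println(err)
}
```

Working on copies and assigning only at the end keeps a rejected composition from leaving a half-merged spec behind.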

### Cycle Detection

The system detects circular inheritance to prevent infinite loops:
@@ -624,7 +678,7 @@ store, err := loadMetadataStore(ctx)

- Embedded YAML files are parsed into Go structs
- Cached in memory on first access (singleton pattern with `sync.Once`)
- Contains base recipe, all overlays, and component values files
- Contains base recipe, all overlays, mixins, and component values files

### Step 2: Find Matching Overlays

@@ -679,7 +733,18 @@ func mergeComponentRef(base, overlay ComponentRef) ComponentRef {
}
```

### Step 5: Validate Dependencies
### Step 5: Apply Mixins

```go
mixinConstraintNames, err := store.mergeMixins(mergedSpec)
```

- If the merged spec lists any `spec.mixins` (declared on the leaf overlay or accumulated from intermediate overlays in its inheritance chain), each named mixin is loaded from `recipes/mixins/`
- Mixin constraints and componentRefs are appended to the merged spec
- Conflict detection prevents duplicates between the inheritance chain, previously applied mixins, and the current mixin
- When a snapshot evaluator is provided, mixin constraints are evaluated against it after merging; failure invalidates the entire composed candidate. In plain query mode (no snapshot), mixin constraints are merged but not evaluated
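A minimal Go sketch of the evaluation step, under stated assumptions: `Evaluator` is a hypothetical stand-in for snapshot-based checking (a nil evaluator models plain query mode), the toy evaluator only does exact string matches where the real one handles expressions such as `>= 6.8`, and falling back to base-only output is left to the caller.

```go
package main

import "fmt"

type Constraint struct {
	Name  string
	Value string
}

// Evaluator reports whether one constraint holds against a snapshot.
// Hypothetical: a nil Evaluator means no snapshot was provided.
type Evaluator func(c Constraint) (ok bool, reason string)

// checkMixinConstraints evaluates only the constraints contributed by mixins
// (named in mixinConstraintNames). Without an evaluator, constraints are kept
// but not checked. Any failure invalidates the whole composed candidate.
func checkMixinConstraints(constraints []Constraint, mixinConstraintNames []string, eval Evaluator) (bool, []string) {
	if eval == nil {
		return true, nil
	}
	fromMixin := make(map[string]bool, len(mixinConstraintNames))
	for _, n := range mixinConstraintNames {
		fromMixin[n] = true
	}
	var failures []string
	for _, c := range constraints {
		if !fromMixin[c.Name] {
			continue
		}
		if ok, reason := eval(c); !ok {
			failures = append(failures, c.Name+": "+reason)
		}
	}
	return len(failures) == 0, failures
}

func main() {
	snapshot := map[string]string{"OS.release.ID": "ubuntu", "OS.release.VERSION_ID": "22.04"}
	// Toy evaluator: exact string match only.
	exact := func(c Constraint) (bool, string) {
		got := snapshot[c.Name]
		if got != c.Value {
			return false, fmt.Sprintf("expected %s, got %s", c.Value, got)
		}
		return true, ""
	}
	constraints := []Constraint{
		{Name: "K8s.server.version", Value: ">= 1.32.4"}, // not from a mixin, so skipped here
		{Name: "OS.release.ID", Value: "ubuntu"},
		{Name: "OS.release.VERSION_ID", Value: "24.04"},
	}
	mixinConstraintNames := []string{"OS.release.ID", "OS.release.VERSION_ID"}

	fmt.Println(checkMixinConstraints(constraints, mixinConstraintNames, exact))
	// false [OS.release.VERSION_ID: expected 24.04, got 22.04]
	fmt.Println(checkMixinConstraints(constraints, mixinConstraintNames, nil))
	// true []
}
```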

### Step 6: Validate Dependencies

```go
if err := mergedSpec.ValidateDependencies(); err != nil {
@@ -690,7 +755,7 @@ if err := mergedSpec.ValidateDependencies(); err != nil {
- Verify all `dependencyRefs` reference existing components
- Detect circular dependencies
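Both checks can be sketched in Go. `ComponentRef` here is a trimmed stand-in carrying only `Name` and `DependencyRefs`, and `validateDependencies` is illustrative rather than the actual `ValidateDependencies` method. Cycle detection uses a depth-first search that marks components as in progress while they are on the current path and reports a cycle when it reaches one of them again.

```go
package main

import (
	"fmt"
	"sort"
)

// ComponentRef is a trimmed stand-in with only the fields used here.
type ComponentRef struct {
	Name           string
	DependencyRefs []string
}

// validateDependencies checks that every dependencyRef names a known
// component and that the dependency graph has no cycles.
func validateDependencies(refs []ComponentRef) error {
	deps := make(map[string][]string, len(refs))
	for _, r := range refs {
		deps[r.Name] = r.DependencyRefs
	}
	for _, r := range refs {
		for _, d := range r.DependencyRefs {
			if _, ok := deps[d]; !ok {
				return fmt.Errorf("component %q depends on unknown component %q", r.Name, d)
			}
		}
	}

	// Depth-first search with three states: unvisited (zero value),
	// in progress (on the current path) and done.
	const (
		inProgress = 1
		done       = 2
	)
	state := make(map[string]int, len(deps))
	var visit func(name string, path []string) error
	visit = func(name string, path []string) error {
		switch state[name] {
		case inProgress:
			return fmt.Errorf("circular dependency: %v", append(path, name))
		case done:
			return nil
		}
		state[name] = inProgress
		for _, d := range deps[name] {
			if err := visit(d, append(path, name)); err != nil {
				return err
			}
		}
		state[name] = done
		return nil
	}

	// Visit in sorted order so error messages are deterministic.
	names := make([]string, 0, len(deps))
	for n := range deps {
		names = append(names, n)
	}
	sort.Strings(names)
	for _, n := range names {
		if err := visit(n, nil); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	fmt.Println(validateDependencies([]ComponentRef{
		{Name: "cert-manager"},
		{Name: "gpu-operator", DependencyRefs: []string{"cert-manager"}},
	})) // <nil>
	fmt.Println(validateDependencies([]ComponentRef{
		{Name: "a", DependencyRefs: []string{"b"}},
		{Name: "b", DependencyRefs: []string{"a"}},
	})) // circular dependency: [a b a]
}
```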

### Step 6: Compute Deployment Order
### Step 7: Compute Deployment Order

```go
deployOrder, err := mergedSpec.TopologicalSort()
@@ -699,7 +764,7 @@ deployOrder, err := mergedSpec.TopologicalSort()
- Topologically sort components based on `dependencyRefs`
- Ensures dependencies are deployed before dependents
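One way to compute such an order is Kahn's algorithm, sketched below in Go. `topologicalSort` is a hypothetical free function, not the `TopologicalSort` method itself; breaking ties alphabetically for deterministic output is an assumption, and the component names are illustrative.

```go
package main

import (
	"fmt"
	"sort"
)

// ComponentRef is a trimmed stand-in with only the fields used here.
type ComponentRef struct {
	Name           string
	DependencyRefs []string
}

// topologicalSort orders components so each appears after all of its
// dependencyRefs (Kahn's algorithm). Ready components are taken in
// alphabetical order so the output is deterministic. It assumes references
// were already validated and returns an error if a cycle remains.
func topologicalSort(refs []ComponentRef) ([]string, error) {
	inDegree := make(map[string]int, len(refs))
	dependents := make(map[string][]string)
	for _, r := range refs {
		if _, ok := inDegree[r.Name]; !ok {
			inDegree[r.Name] = 0
		}
		for _, d := range r.DependencyRefs {
			inDegree[r.Name]++
			dependents[d] = append(dependents[d], r.Name)
		}
	}

	var ready []string
	for name, deg := range inDegree {
		if deg == 0 {
			ready = append(ready, name)
		}
	}

	var order []string
	for len(ready) > 0 {
		sort.Strings(ready)
		next := ready[0]
		ready = ready[1:]
		order = append(order, next)
		for _, dep := range dependents[next] {
			inDegree[dep]--
			if inDegree[dep] == 0 {
				ready = append(ready, dep)
			}
		}
	}

	if len(order) != len(inDegree) {
		return nil, fmt.Errorf("dependency cycle: %d components could not be ordered", len(inDegree)-len(order))
	}
	return order, nil
}

func main() {
	order, err := topologicalSort([]ComponentRef{
		{Name: "kubeflow-trainer", DependencyRefs: []string{"gpu-operator"}},
		{Name: "gpu-operator", DependencyRefs: []string{"cert-manager"}},
		{Name: "cert-manager"},
	})
	fmt.Println(order, err) // [cert-manager gpu-operator kubeflow-trainer] <nil>
}
```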

### Step 7: Build RecipeResult
### Step 8: Build RecipeResult

```go
return &RecipeResult{
11 changes: 8 additions & 3 deletions docs/integrator/data-flow.md
@@ -251,10 +251,15 @@ When a query matches a leaf recipe that has a `spec.base` reference, the system
│ ├─ + gb200-eks-training (GB200 overrides) │
│ └─ + gb200-eks-ubuntu-training (Ubuntu specifics) │
│ │
│ 4. Strip context (if !context) │
│ 4. Apply mixins (if spec.mixins declared) │
│ ├─ Load mixin files from recipes/mixins/ │
│ ├─ Append mixin constraints and componentRefs │
│ └─ If snapshot provided, evaluate mixin constraints│
│ │
│ 5. Strip context (if !context) │
│ └─ Remove context maps from all subtypes │
│ │
5. Return recipe │
6. Return recipe │
│ │
└────────────────────────────────────────────────────────┘
```
@@ -812,7 +817,7 @@ X-RateLimit-Reset: 1735650000
### Embedded Data

**Recipe Data:**
- Location: `recipes/overlays/*.yaml` (including `base.yaml`)
- Location: `recipes/overlays/*.yaml` (including `base.yaml`), `recipes/mixins/*.yaml`
- Embedded at compile time via `//go:embed` directives
- Loaded once per process, cached in memory
- TTL: 5 minutes (in-memory cache)
37 changes: 32 additions & 5 deletions docs/integrator/recipe-development.md
@@ -74,11 +74,12 @@ make qualify # Includes end to end tests before submitting

## Overview

Recipe metadata files define component configurations for GPU-accelerated Kubernetes deployments using a **base-plus-overlay architecture** with **multi-level inheritance**:
Recipe metadata files define component configurations for GPU-accelerated Kubernetes deployments using a **base-plus-overlay architecture** with **multi-level inheritance** and **mixin composition**:

- **Base values** (`overlays/base.yaml`) - universal defaults
- **Intermediate recipes** (`eks.yaml`, `eks-training.yaml`) - shared configurations for categories
- **Leaf recipes** (`gb200-eks-ubuntu-training.yaml`) - hardware/workload-specific overrides
- **Mixins** (`mixins/*.yaml`) - composable fragments (OS constraints, platform components) that leaf overlays reference via `spec.mixins` instead of duplicating content
- **Inline overrides** - per-recipe customization without new files

Recipe files in `recipes/` are embedded at compile time. Integrators can extend or override using the `--data` flag (see [Advanced Topics](#advanced-topics)).
@@ -125,12 +126,31 @@ spec:
version: "580.82.07" # Hardware-specific override
```

**Merge order:** `base.yaml` (lowest) → intermediate → leaf (highest)
**Leaf recipes with mixins** compose shared fragments:
```yaml
# h100-eks-ubuntu-training-kubeflow.yaml
spec:
  base: h100-eks-ubuntu-training
  mixins:
    - os-ubuntu # Shared Ubuntu constraints (from recipes/mixins/)
    - platform-kubeflow # Kubeflow trainer component (from recipes/mixins/)
  criteria:
    service: eks
    accelerator: h100
    os: ubuntu
    intent: training
    platform: kubeflow
```

Mixins use `kind: RecipeMixin` and carry only `constraints` and `componentRefs`. They live in `recipes/mixins/` and are applied after inheritance chain merging. See [Data Architecture](../contributor/data.md#mixin-composition) for details.

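**Merge order:** `base.yaml` (lowest) → intermediate → leaf (highest); mixins are then applied on top of the merged result. Mixins only add entries and never override: a conflicting constraint or component is an error.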

**Merge rules:**
- Constraints: same-named overridden, new added
- ComponentRefs: same-named merged field-by-field, new added
- Criteria: not inherited (each recipe defines its own)
- Mixin constraints/components must not conflict with the inheritance chain or other mixins
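The constraint rule for the inheritance chain (same-named overridden, new added) can be sketched in Go. `mergeConstraints` is a hypothetical helper, and keeping an overridden constraint at its parent's position is an assumption about ordering. Contrast this with mixins, where a repeated name is an error rather than an override.

```go
package main

import "fmt"

type Constraint struct {
	Name  string
	Value string
}

// mergeConstraints applies the inheritance rule for constraints: a child
// constraint with the same name replaces the parent's (keeping the parent's
// position), and new names are appended in child order. Inputs are not modified.
func mergeConstraints(parent, child []Constraint) []Constraint {
	out := append([]Constraint(nil), parent...)
	index := make(map[string]int, len(out))
	for i, c := range out {
		index[c.Name] = i
	}
	for _, c := range child {
		if i, ok := index[c.Name]; ok {
			out[i] = c // same name: child overrides parent
			continue
		}
		index[c.Name] = len(out)
		out = append(out, c)
	}
	return out
}

func main() {
	base := []Constraint{{Name: "K8s.server.version", Value: ">= 1.30"}}
	leaf := []Constraint{
		{Name: "K8s.server.version", Value: ">= 1.32.4"},
		{Name: "OS.release.ID", Value: "ubuntu"},
	}
	fmt.Println(mergeConstraints(base, leaf))
	// [{K8s.server.version >= 1.32.4} {OS.release.ID ubuntu}]
}
```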

### Component Types

@@ -219,6 +239,8 @@ File names are for human readability—matching uses `spec.criteria`, not file n
| Service + intent | `{service}-{intent}.yaml` | `eks-training.yaml` |
| Full criteria | `{accel}-{service}-{os}-{intent}.yaml` | `gb200-eks-ubuntu-training.yaml` |
| + platform | `{accel}-{service}-{os}-{intent}-{platform}.yaml` | `gb200-eks-ubuntu-training-kubeflow.yaml` |
| Mixin (OS) | `os-{os}.yaml` | `os-ubuntu.yaml` |
| Mixin (platform) | `platform-{platform}.yaml` | `platform-kubeflow.yaml` |
| Component values | `values-{service}-{intent}.yaml` | `values-eks-training.yaml` |

## Constraints and Validation
@@ -298,9 +320,10 @@ go test -v ./pkg/recipe/... -run TestConstraintPathsUseValidMeasurementTypes

**Steps:**
1. Create overlay in `recipes/overlays/` with criteria and componentRefs
2. Create component values files if using `valuesFile`
3. Run tests: `make test`
4. Test generation: `aicr recipe --service eks --accelerator gb200 --format yaml`
2. If the recipe shares OS constraints or platform components with other overlays, reference existing mixins via `spec.mixins` instead of duplicating that content (or create new mixins in `recipes/mixins/`)
3. Create component values files if using `valuesFile`
4. Run tests: `make test`
5. Test generation: `aicr recipe --service eks --accelerator gb200 --format yaml`

**Example:**
```yaml
@@ -348,6 +371,7 @@ componentRefs:
**Do:**
- Use minimum criteria fields needed for matching
- Keep base recipe universal and conservative
- Use mixins for shared OS constraints or platform components instead of duplicating across leaf overlays
- Always explain why settings exist (1-2 sentences)
- Follow naming conventions (`{accel}-{service}-{os}-{intent}-{platform}`)
- Run `make test` before committing
@@ -357,6 +381,7 @@ componentRefs:
- Add environment-specific settings to base
- Over-specify criteria (too narrow = fewer matches)
- Create duplicate criteria combinations
- Duplicate OS or platform content across leaf overlays (use mixins instead)
- Skip validation tests
- Forget to update context when values change

@@ -406,6 +431,8 @@ Integrators can extend or override embedded recipe data using the `--data` flag
├── registry.yaml # Extends/overrides component registry
├── overlays/
│ └── custom-recipe.yaml # New or override existing recipe
├── mixins/
│ └── os-custom.yaml # Custom mixin fragments
└── components/
└── my-operator/
└── values.yaml # Component values
8 changes: 3 additions & 5 deletions pkg/recipe/builder_test.go
@@ -278,11 +278,9 @@ func TestGetEmbeddedFS(t *testing.T) {

// TestConstraintWarning tests the ConstraintWarning struct.
func TestConstraintWarning(t *testing.T) {
	const k8sVersionConstraint = "K8s.server.version"

	warning := ConstraintWarning{
		Overlay:    "h100-eks-ubuntu-training-kubeflow",
		Constraint: k8sVersionConstraint,
		Constraint: testK8sVersionConstant,
		Expected:   ">= 1.32.4",
		Actual:     "1.30.0",
		Reason:     "expected >= 1.32.4, got 1.30.0",
@@ -291,8 +289,8 @@ func TestConstraintWarning(t *testing.T) {
	if warning.Overlay != "h100-eks-ubuntu-training-kubeflow" {
		t.Errorf("expected overlay h100-eks-ubuntu-training-kubeflow, got %q", warning.Overlay)
	}
	if warning.Constraint != k8sVersionConstraint {
		t.Errorf("expected constraint %s, got %q", k8sVersionConstraint, warning.Constraint)
	if warning.Constraint != testK8sVersionConstant {
		t.Errorf("expected constraint %s, got %q", testK8sVersionConstant, warning.Constraint)
	}
	if warning.Expected != ">= 1.32.4" {
		t.Errorf("expected expression >= 1.32.4, got %q", warning.Expected)
2 changes: 1 addition & 1 deletion pkg/recipe/conformance_test.go
@@ -261,7 +261,7 @@ func TestConformanceRecipeInvariants(t *testing.T) {
if tt.wantDRAConstraint {
var hasDRAConstraint bool
for _, c := range result.Constraints {
if c.Name == "K8s.server.version" && strings.Contains(c.Value, "1.34") {
if c.Name == testK8sVersionConstant && strings.Contains(c.Value, "1.34") {
hasDRAConstraint = true
break
}
41 changes: 41 additions & 0 deletions pkg/recipe/metadata.go
@@ -259,6 +259,12 @@ type RecipeMetadataSpec struct {
	// Only present in overlay files, not in base.
	Criteria *Criteria `json:"criteria,omitempty" yaml:"criteria,omitempty"`

	// Mixins is a list of mixin names to compose into this overlay.
	// Mixins are loaded from recipes/mixins/ and carry only constraints
	// and componentRefs. This field is loader metadata and is stripped
	// from the materialized recipe result.
	Mixins []string `json:"mixins,omitempty" yaml:"mixins,omitempty"`

	// Constraints are deployment assumptions/requirements.
	Constraints []Constraint `json:"constraints,omitempty" yaml:"constraints,omitempty"`

@@ -270,6 +276,24 @@ type RecipeMetadataSpec struct {
	Validation *ValidationConfig `json:"validation,omitempty" yaml:"validation,omitempty"`
}

// RecipeMixinKind is the kind value for mixin files.
const RecipeMixinKind = "RecipeMixin"

// RecipeMixin represents a composable fragment that carries only constraints
// and componentRefs. Mixins live in recipes/mixins/ and are referenced by
// overlay spec.mixins fields.
type RecipeMixin struct {
	Kind       string `json:"kind" yaml:"kind"`
	APIVersion string `json:"apiVersion" yaml:"apiVersion"`
	Metadata   struct {
		Name string `json:"name" yaml:"name"`
	} `json:"metadata" yaml:"metadata"`
	Spec struct {
		Constraints   []Constraint   `json:"constraints,omitempty" yaml:"constraints,omitempty"`
		ComponentRefs []ComponentRef `json:"componentRefs,omitempty" yaml:"componentRefs,omitempty"`
	} `json:"spec" yaml:"spec"`
}
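The struct by itself does not stop a mixin file from containing `criteria`, `base`, or `validation`; a permissive decoder would silently drop them. One way to enforce the constraints-and-componentRefs-only rule is strict decoding. The Go sketch below uses `encoding/json` with `DisallowUnknownFields` only because it is in the standard library (mixin files are YAML, and whether the real loader decodes strictly is not shown in this diff); `decodeMixin` is hypothetical and `ComponentRef` is trimmed to its name.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

const RecipeMixinKind = "RecipeMixin"

type Constraint struct {
	Name  string `json:"name"`
	Value string `json:"value"`
}

// ComponentRef is trimmed to its name for this sketch.
type ComponentRef struct {
	Name string `json:"name"`
}

// RecipeMixin mirrors the struct above (YAML tags omitted).
type RecipeMixin struct {
	Kind       string `json:"kind"`
	APIVersion string `json:"apiVersion"`
	Metadata   struct {
		Name string `json:"name"`
	} `json:"metadata"`
	Spec struct {
		Constraints   []Constraint   `json:"constraints,omitempty"`
		ComponentRefs []ComponentRef `json:"componentRefs,omitempty"`
	} `json:"spec"`
}

// decodeMixin parses a mixin strictly: unknown fields such as spec.base or
// spec.criteria are rejected instead of dropped, and kind and name are checked.
func decodeMixin(data []byte) (*RecipeMixin, error) {
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	var m RecipeMixin
	if err := dec.Decode(&m); err != nil {
		return nil, fmt.Errorf("decode mixin: %w", err)
	}
	if m.Kind != RecipeMixinKind {
		return nil, fmt.Errorf("unexpected kind %q, want %q", m.Kind, RecipeMixinKind)
	}
	if m.Metadata.Name == "" {
		return nil, errors.New("mixin is missing metadata.name")
	}
	return &m, nil
}

func main() {
	m, err := decodeMixin([]byte(`{"kind":"RecipeMixin","apiVersion":"aicr.nvidia.com/v1alpha1",
		"metadata":{"name":"os-ubuntu"},
		"spec":{"constraints":[{"name":"OS.release.ID","value":"ubuntu"}]}}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(m.Metadata.Name, len(m.Spec.Constraints)) // os-ubuntu 1

	// A mixin that tries to set spec.base is rejected.
	_, err = decodeMixin([]byte(`{"kind":"RecipeMixin","metadata":{"name":"bad"},"spec":{"base":"eks"}}`))
	fmt.Println(err != nil) // true
}
```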

// RecipeMetadataHeader contains the Kubernetes-style header fields.
type RecipeMetadataHeader struct {
// Kind is always "RecipeMetadata".
@@ -422,6 +446,23 @@ func (s *RecipeMetadataSpec) Merge(other *RecipeMetadataSpec) {
}
}
}

	// Accumulate mixins (deduplicated, preserving order).
	// Both leaf and intermediate overlays can declare mixins. When an
	// intermediate overlay (e.g., eks-inference) declares a mixin, it is
	// accumulated into all descendants during inheritance chain merging.
	if len(other.Mixins) > 0 {
		seen := make(map[string]bool)
		for _, m := range s.Mixins {
			seen[m] = true
		}
		for _, m := range other.Mixins {
			if !seen[m] {
				s.Mixins = append(s.Mixins, m)
				seen[m] = true
			}
		}
	}
}
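The accumulation step can be exercised on its own. `accumulateMixins` below is a standalone copy of that loop on plain string slices (a hypothetical helper, not part of the package), applied level by level the way the inheritance chain is merged from base to leaf; the chain contents are illustrative.

```go
package main

import "fmt"

// accumulateMixins is a standalone copy of the accumulation loop in Merge:
// names from other are appended to current unless already present.
func accumulateMixins(current, other []string) []string {
	seen := make(map[string]bool, len(current))
	for _, m := range current {
		seen[m] = true
	}
	for _, m := range other {
		if !seen[m] {
			current = append(current, m)
			seen[m] = true
		}
	}
	return current
}

func main() {
	// Illustrative chain, merged from base to leaf.
	chain := [][]string{
		nil,                                 // base.yaml: no mixins
		{"platform-inference"},              // intermediate overlay such as eks-inference
		{"os-ubuntu", "platform-inference"}, // leaf repeats the intermediate's mixin
	}
	var mixins []string
	for _, level := range chain {
		mixins = accumulateMixins(mixins, level)
	}
	fmt.Println(mixins) // [platform-inference os-ubuntu]
}
```

Because duplicates are dropped, a leaf that redeclares a mixin already inherited from an intermediate overlay does not apply it twice.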

// mergeComponentRef merges overlay into base, with overlay taking precedence