378 changes: 378 additions & 0 deletions .github/copilot-instructions.md
# DynamicData — AI Instructions

## What is DynamicData?

DynamicData is a reactive collections library for .NET, built on top of [Reactive Extensions (Rx)](https://github.com/dotnet/reactive). It provides `SourceCache<TObject, TKey>` and `SourceList<TObject>` — observable data collections that emit **changesets** when modified. These changesets flow through operator pipelines (Sort, Filter, Transform, Group, Join, etc.) that maintain live, incrementally-updated views of the data.

DynamicData is used in production by thousands of applications. It is the reactive data layer for [ReactiveUI](https://reactiveui.net/), making it foundational infrastructure for the .NET reactive ecosystem.

## Cache vs List — Two Collection Types

DynamicData provides two parallel collection types. **Choose the right one — they are not interchangeable.**

| | **Cache** (`SourceCache<T, TKey>`) | **List** (`SourceList<T>`) |
|---|---|---|
| **Identity** | Items identified by unique key | Items identified by index position |
| **Duplicates** | Not allowed (key must be unique) | Allowed (same item at multiple positions) |
| **Ordering** | Unordered by default (Sort adds ordering) | Inherently ordered (like `List<T>`) |
| **Best for** | Entities with IDs, lookup by key | Ordered sequences, duplicates OK |
| **Change types** | Add, Update, Remove, Refresh, Moved | Add, AddRange, Replace, Remove, RemoveRange, Moved, Refresh, Clear |
| **Changeset** | `IChangeSet<TObject, TKey>` | `IChangeSet<T>` |

**Rule of thumb:** If your items have a natural unique key (ID, name, etc.), use **Cache**. If order matters and/or duplicates are possible, use **List**. Cache is used far more often in practice.
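
The difference in identity semantics can be seen in a minimal sketch. It assumes a hypothetical `Contact` record that is not part of DynamicData or its test domain, and it only uses `SourceCache`, `SourceList`, `AddOrUpdate`, `Add`, and `Count`:

```csharp
using System;
using DynamicData;

// Hypothetical item type, for illustration only.
public sealed record Contact(int Id, string Name);

public static class CacheVersusListSketch
{
    // Cache: the key identifies the item, so a second write with the same key replaces the first.
    public static int CountAfterDuplicateKeyInCache()
    {
        using var cache = new SourceCache<Contact, int>(c => c.Id);
        cache.AddOrUpdate(new Contact(1, "Ada"));
        cache.AddOrUpdate(new Contact(1, "Ada Lovelace")); // same key: treated as an update
        return cache.Count;
    }

    // List: the position identifies the item, so the same value can be stored twice.
    public static int CountAfterDuplicateItemInList()
    {
        using var list = new SourceList<Contact>();
        var ada = new Contact(1, "Ada");
        list.Add(ada);
        list.Add(ada); // allowed: the item now occupies a second index
        return list.Count;
    }
}
```

With these definitions the cache ends up holding one item and the list holds two.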

See `.github/instructions/dynamicdata-cache.instructions.md` for the complete cache operator reference.

See `.github/instructions/dynamicdata-list.instructions.md` for the complete list operator reference.

## Why Performance Matters

Every item flowing through a DynamicData pipeline passes through multiple operators. Each operator processes changesets — not individual items — so a single cache edit with 1000 items creates a changeset that flows through every operator in the chain. At library scale:

- **Per-item overhead compounds**: 1 allocation × 10 operators × 1000 items × 100 pipelines = 1M allocations per batch
- **Lock contention is the bottleneck**: operators serialize access to shared state. Minimizing lock hold time is a core design goal.
- **Prefer value types and stack allocation**: use structs, `ref struct`, `Span<T>`, and avoid closures in hot paths where possible

When optimizing, measure allocation rates and lock contention, not just wall-clock time.
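
One rough way to check allocation cost is to sample the runtime's per-thread allocation counter around the code under test. The helper below is a sketch with a hypothetical name (`AllocationProbe`); it is not part of the repository and it only counts allocations made on the calling thread:

```csharp
using System;

public static class AllocationProbe
{
    // Returns the number of bytes allocated on the calling thread while `action` runs.
    public static long MeasureAllocatedBytes(Action action)
    {
        var before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }
}
```

Work that an operator hands to other threads is not captured by this counter, so treat the result as a lower bound.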

## Why Rx Contract Compliance is Critical

DynamicData operators compose — the output of one is the input of the next. If any operator violates the Rx contract (e.g., concurrent `OnNext` calls, calls after `OnCompleted`), every downstream operator can corrupt its internal state. This is not a crash — it's silent data corruption that manifests as wrong results, missing items, or phantom entries. In a reactive UI, this means the user sees stale or incorrect data with no error message.

See `.github/instructions/rx.instructions.md` for comprehensive Rx contract rules, scheduler usage, disposable patterns, and a complete standard Rx operator reference.

## Breaking Changes

DynamicData follows [Semantic Versioning (SemVer)](https://semver.org/). Breaking changes **are possible** in major version bumps, but they are never done lightly. This library has thousands of downstream consumers — every breaking change has a blast radius.

**Rules:**
- Breaking changes require a major version bump. **You MUST explicitly call out any potentially breaking change to the user** before making it — even if you think it's minor. Let the maintainers decide.
- Prefer non-breaking alternatives first: new overloads, new methods, optional parameters with safe defaults.
- When a breaking change is justified, mark the old API with `[Obsolete("Use XYZ instead. This will be removed in vN+1.")]` in the current version and remove it in the next major.
- Behavioral changes (different ordering, different filtering semantics, different error propagation) are breaking even if the signature is unchanged. Call these out.
- Internal types (`internal` visibility) can change freely — they are not part of the public contract.
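
As an illustration of the "new overload plus `[Obsolete]`" approach, here is a minimal sketch. `PriceFormatter` and both `Format` methods are hypothetical names invented for this example, not DynamicData APIs:

```csharp
using System;
using System.Globalization;

public static class PriceFormatter
{
    // Old API: kept for now, but every call site gets a compiler warning pointing at the replacement.
    [Obsolete("Use Format(decimal, string) instead. This will be removed in the next major version.")]
    public static string Format(decimal amount) => Format(amount, "USD");

    // New overload: added alongside the old one, so existing callers keep compiling.
    public static string Format(decimal amount, string currencyCode) =>
        amount.ToString("0.00", CultureInfo.InvariantCulture) + " " + currencyCode;
}
```

Here `PriceFormatter.Format(1.5m, "EUR")` returns `"1.50 EUR"`, while the old single-argument call still works and simply forwards to the new overload.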

**What counts as breaking:**
- Changing the signature of a public extension method (parameters, return type, generic constraints)
- Changing observable behavior (emission order, filtering semantics, error/completion propagation)
- Removing or renaming public types, methods, or properties
- Adding required parameters to existing methods
- Changing the default behavior of an existing overload

## Maintaining These Instructions

These instruction files are living documentation. **They must be kept in sync with the code.**

- When a **new operator** is added, it **MUST** be added to the appropriate instruction file (`dynamicdata-cache.instructions.md` or `dynamicdata-list.instructions.md`) with its change reason handling table.
- When an **operator's behavior changes**, update its table in the instruction file.
- When a **new test utility** is added, update the test utilities reference in the main instructions and the appropriate `testing-*.instructions.md`.
- When a **new domain type** is added to `Tests/Domain/`, add it to the Domain Types section.

## Repository Structure

```
src/
├── DynamicData/ # The library
│ ├── Cache/ # Cache (keyed collection) operators
│ │ ├── Internal/ # Operator implementations (internal)
│ │ ├── ObservableCache.cs # Core observable cache implementation
│ │ └── ObservableCacheEx.cs # Public API: extension methods for cache operators
│ ├── List/ # List (ordered collection) operators
│ │ ├── Internal/ # Operator implementations (internal)
│ │ └── ObservableListEx.cs # Public API: extension methods for list operators
│ ├── Binding/ # UI binding operators (SortAndBind, etc.)
│ ├── Internal/ # Shared internal infrastructure
│ └── Kernel/ # Low-level types (Optional<T>, Error<T>, etc.)
├── DynamicData.Tests/ # Tests (xUnit + FluentAssertions)
│ ├── Cache/ # Cache operator tests
│ ├── List/ # List operator tests
│ └── Domain/ # Test domain types using Bogus fakers
```

## Operator Architecture Pattern

Most operators follow the same two-part pattern:

```csharp
// 1. Public API: extension method in ObservableCacheEx.cs (thin wrapper)
public static IObservable<IChangeSet<TDest, TKey>> Transform<TSource, TKey, TDest>(
    this IObservable<IChangeSet<TSource, TKey>> source,
    Func<TSource, TDest> transformFactory)
{
    return new Transform<TDest, TSource, TKey>(source, transformFactory).Run();
}

// 2. Internal: sealed class in Cache/Internal/ with a Run() method
internal sealed class Transform<TDest, TSource, TKey>
{
    public IObservable<IChangeSet<TDest, TKey>> Run() =>
        Observable.Create<IChangeSet<TDest, TKey>>(observer =>
        {
            // Subscribe to source, process changesets, emit results
            // Use ChangeAwareCache<T,K> for incremental state
            // Call CaptureChanges() to produce the output changeset
        });
}
```

**Key points:**
- The extension method is the **public API surface** — keep it thin
- The internal class holds constructor parameters and implements `Run()`
- `Run()` returns the observable built by `Observable.Create<T>`, so no work happens until a subscriber arrives (cold observable)
- Inside `Create`, operators subscribe to sources and wire up changeset processing
- `ChangeAwareCache<T,K>` tracks incremental changes; `CaptureChanges()` returns the pending changes as a changeset and resets the tracker for the next batch
- Operators must handle all change reasons: `Add`, `Update`, `Remove`, `Refresh`
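
The change-reason handling described above can be sketched as a single function. It is a simplified illustration, not the library's `Transform` implementation: the `int` to `string` projection and the `ChangeReasonSketch` name are hypothetical, and real operators also deal with errors, completion, and disposal.

```csharp
using System;
using DynamicData;

public static class ChangeReasonSketch
{
    // Applies one upstream changeset to the operator's state and
    // returns the changeset that should be emitted downstream.
    public static IChangeSet<string, int> Apply(
        ChangeAwareCache<string, int> state,
        IChangeSet<int, int> changes)
    {
        foreach (var change in changes)
        {
            switch (change.Reason)
            {
                case ChangeReason.Add:
                case ChangeReason.Update:
                    // Hypothetical projection: store the item's text form under the same key.
                    state.AddOrUpdate(change.Current.ToString(), change.Key);
                    break;
                case ChangeReason.Remove:
                    state.Remove(change.Key);
                    break;
                case ChangeReason.Refresh:
                    state.Refresh(change.Key);
                    break;
            }
        }

        // Hands back everything recorded since the previous capture.
        return state.CaptureChanges();
    }
}
```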

## Thread Safety in Operators

When an operator has multiple input sources that share mutable state:
- Serialize all sources through one shared lock by applying `Synchronize(gate)` to each source with the same `gate` object
- Keep lock hold times as short as practical

When operators use `Synchronize(gate)` from Rx:
- The lock is held during the **entire** downstream delivery chain
- This ensures serialized delivery across multiple sources sharing a lock
- Always use a private lock object — never expose it to external consumers
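
A minimal sketch of the shared-gate pattern, using only standard Rx operators. The `SharedGateSketch` class and its running-total logic are hypothetical and exist only to show two sources updating one piece of state:

```csharp
using System;
using System.Reactive.Linq;

public static class SharedGateSketch
{
    // Combines two sources whose notifications update the same mutable state.
    public static IObservable<int> RunningTotal(IObservable<int> left, IObservable<int> right) =>
        Observable.Create<int>(observer =>
        {
            var gate = new object(); // private to this subscription, never exposed to callers
            var total = 0;           // shared mutable state, only touched while the gate is held

            return left.Synchronize(gate)
                .Merge(right.Synchronize(gate))
                .Select(x => total += x) // runs under the gate because it is downstream of Synchronize
                .Subscribe(observer);
        });
}
```

Because the gate and the total are created inside `Observable.Create`, each subscriber gets its own lock and its own state.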

## Testing

**All new code MUST come with unit tests that prove 100% correctness. All bug fixes MUST include a regression test that reproduces the bug before verifying the fix.** No exceptions. Untested code is broken code — you just don't know it yet.

### Frameworks and Tools

- **xUnit** — test framework (`[Fact]`, `[Theory]`, `[InlineData]`)
- **FluentAssertions** — via the `AwesomeAssertions` NuGet package (`.Should().Be()`, `.Should().BeEquivalentTo()`, etc.)
- **Bogus** — fake data generation via `Faker<T>` in `DynamicData.Tests/Domain/Fakers.cs`
- **TestSourceCache<T,K>** and **TestSourceList<T>** — enhanced source collections in `Tests/Utilities/` that support `.Complete()` and `.SetError()` for testing terminal Rx events

### Test File Naming and Organization

Tests live in `src/DynamicData.Tests/` mirroring the library structure:
- `Cache/` — cache operator tests (one fixture class per operator, e.g., `TransformFixture.cs`)
- `List/` — list operator tests
- `Domain/` — shared domain types (`Person`, `Animal`, `AnimalOwner`, `Market`, etc.) and Bogus fakers
- `Utilities/` — test infrastructure (aggregators, validators, recording observers, stress helpers)

Naming convention: `{OperatorName}Fixture.cs`. For operators with multiple overloads, use partial classes: `FilterFixture.Static.cs`, `FilterFixture.DynamicPredicate.IntegrationTests.cs`, etc.

For cache-specific testing patterns (observation patterns, changeset assertions, stub pattern), see `.github/instructions/testing-cache.instructions.md`.

For list-specific testing patterns, see `.github/instructions/testing-list.instructions.md`.

### Testing the Rx Contract

`.ValidateSynchronization()` detects Rx contract violations. It tracks in-flight notifications with `Interlocked.Exchange` — if two threads enter `OnNext` simultaneously, it throws `UnsynchronizedNotificationException`. It uses raw observer/observable types to bypass Rx's built-in safety guards so violations are surfaced, not masked.

```csharp
source.Connect()
    .Transform(x => new ViewModel(x))
    .ValidateSynchronization() // THROWS if concurrent delivery detected
    .RecordCacheItems(out var results);
```
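
The detection idea can be sketched independently of Rx. The class below is a simplified illustration with a hypothetical name (`OverlapDetector`); the actual validator in `Tests/Utilities/` may be implemented differently, and this sketch throws `InvalidOperationException` rather than `UnsynchronizedNotificationException`:

```csharp
using System;
using System.Threading;

public sealed class OverlapDetector
{
    private int _inFlight; // 0 = idle, 1 = a notification is currently being delivered

    // Wraps one notification; throws if another one is still in flight.
    public void Deliver(Action notification)
    {
        // Exchange returns the previous value, so 1 means a delivery was already running.
        if (Interlocked.Exchange(ref _inFlight, 1) == 1)
        {
            throw new InvalidOperationException("Overlapping notification detected.");
        }

        try
        {
            notification();
        }
        finally
        {
            Interlocked.Exchange(ref _inFlight, 0);
        }
    }
}
```

Note that this sketch also flags re-entrant delivery on the same thread, since it only tracks whether a delivery is in progress, not which thread started it.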

### Writing Regression Tests for Bug Fixes

Every bug fix **must** include a test that:
1. **Reproduces the bug** — the test fails without the fix
2. **Verifies the fix** — the test passes with the fix
3. **Is named descriptively** — describes the scenario, not the bug ID

```csharp
// GOOD: describes the scenario that was broken
[Fact]
public void RemoveThenReAddWithSameKey_ShouldNotDuplicate()

// BAD: meaningless to future readers
[Fact]
public void FixBug1234()
```

### Waiting for Pipeline Completion

**Never use `Task.Delay` to wait for pipelines to settle.** Use the `Publish` + `LastOrDefaultAsync` + `ToTask` pattern for deterministic completion:

```csharp
var published = source.Connect()
    .Filter(x => x.IsActive)
    .Sort(comparer)
    .Publish();

var hasCompleted = published.LastOrDefaultAsync().ToTask();
using var results = published.AsAggregator();
using var connect = published.Connect();

// Do stuff — write to source
source.AddOrUpdate(items);

// Signal completion — cascades OnCompleted through the chain
source.Dispose();

await hasCompleted; // deterministic wait — no timeout, no polling

// Assert exact results
results.Data.Items.Should().BeEquivalentTo(expectedItems);
```

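**Why this works:** `LastOrDefaultAsync()` produces its result only when the upstream sequence completes, and `ToTask()` subscribes immediately and exposes that result as an awaitable. Because the task subscribes before `published.Connect()` is called, it cannot miss the completion. Disposing the source cascades `OnCompleted` through the entire chain.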

### Chaining Intermediate Caches

Use `AsObservableCache()` to materialize pipeline stages into queryable caches. Each cache's `Connect()` feeds the next stage:

```csharp
// Stage 1: complex pipeline → materialized cache
using var cache1 = source.Connect()
    .AutoRefresh(x => x.Rating)
    .Filter(x => x.Rating > 3.0)
    .Transform(x => new ViewModel(x))
    .AsObservableCache();

// Stage 2: chain from cache1 → another cache
using var cache2 = cache1.Connect()
    .Group(x => x.Category) // cache grouping operator; GroupOn is the list equivalent
    .MergeManyChangeSets(g => g.Cache.Connect())
    .AsObservableCache();

// Stage 3: final pipeline with completion tracking
var published = cache2.Connect()
    .Sort(comparer)
    .Publish();

var hasCompleted = published.LastOrDefaultAsync().ToTask();
using var finalResults = published.AsAggregator();
using var connect = published.Connect();

source.Dispose(); // cascades through cache1 → cache2 → finalResults
await hasCompleted;

// Verify exact contents at every stage
cache1.Items.Should().BeEquivalentTo(expectedStage1);
cache2.Items.Should().BeEquivalentTo(expectedStage2);
finalResults.Data.Items.Should().BeEquivalentTo(expectedFinal);
```

### Stress Test Pattern

The complete pattern for stress testing DynamicData operators:

1. **Generate deterministic test data** from a `Bogus.Randomizer` with a fixed seed. All quantities (item counts, property values, thread counts) come from the seed — nothing hardcoded except the seed itself.

2. **Wire up chained pipelines** with intermediate `AsObservableCache()` stages. Use long fluent operator chains (10+ operators per chain). Each operator should appear in at least 2 different chains.

3. **Track completion** with `Publish()` + `LastOrDefaultAsync().ToTask()` on terminal chains.

4. **Feed data from multiple threads** using `Barrier` for simultaneous start — maximum contention. Interleave adds, updates, removes, and property mutations.

5. **Pre-compute expected results** by simulating each chain's logic in plain LINQ on the same generated data — before the Rx pipelines run.

6. **Dispose sources** → `await hasCompleted` on all terminal chains.

7. **Verify exact contents** of ALL intermediate and final caches — not just counts, but the actual items via `BeEquivalentTo`.

```csharp
// Pre-compute expected results via LINQ
var expectedFiltered = generatedItems.Where(x => x.IsActive).ToList();
var expectedTransformed = expectedFiltered.Select(x => Transform(x)).ToList();

// After pipelines complete
cache1.Items.Should().BeEquivalentTo(expectedFiltered);
cache2.Items.Should().BeEquivalentTo(expectedTransformed);
```

### Stress Test Principles

- Use `Barrier` for simultaneous start — maximizes contention. Include the main thread in participant count.
- Use deterministic data so failures are reproducible. Use `Bogus.Randomizer` with a fixed seed — **never `System.Random`**.
- Assert the **exact final state**, not just "count > 0".
- Use `Task.WhenAny(completed, Task.Delay(timeout))` to detect deadlocks with a meaningful timeout.
- Include mixed operations: adds, updates, removes, property mutations, dynamic parameter changes.
- Use the `StressAddRemove` extension methods in `Tests/Utilities/` for standard add/remove patterns with timed removal.
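
The `Barrier` and timeout principles above can be combined in a small helper. This is a sketch using only the .NET base class library; `ContentionSketch.RunAsync` is a hypothetical name and is not one of the helpers in `Tests/Utilities/`:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ContentionSketch
{
    // Starts `writers` threads that all begin `work` at the same moment.
    // Returns false if they have not all finished within `timeout`.
    public static async Task<bool> RunAsync(int writers, Action<int> work, TimeSpan timeout)
    {
        using var barrier = new Barrier(writers + 1); // +1: the calling thread is a participant

        var tasks = Enumerable.Range(0, writers)
            .Select(i => Task.Factory.StartNew(
                () =>
                {
                    barrier.SignalAndWait(); // wait until every participant has arrived
                    work(i);
                },
                CancellationToken.None,
                TaskCreationOptions.LongRunning, // dedicated threads, so the barrier cannot starve the pool
                TaskScheduler.Default))
            .ToArray();

        barrier.SignalAndWait(); // the calling thread arrives last and releases all writers together

        var completed = Task.WhenAll(tasks);
        var winner = await Task.WhenAny(completed, Task.Delay(timeout));
        if (winner != completed)
        {
            return false; // timed out: a likely deadlock
        }

        await completed; // rethrows any exception raised by a writer
        return true;
    }
}
```

A test would typically assert that the returned value is `true` before checking the final state, so that a deadlock fails with a clear message instead of hanging the test run.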

### Test Utilities Reference

The `Tests/Utilities/` directory provides powerful helpers — **use them** instead of reinventing:

| Utility | Purpose |
|---------|---------|
| `ValidateSynchronization()` | Detects concurrent `OnNext` — Rx contract violation |
| `ValidateChangeSets(keySelector)` | Validates structural integrity of every changeset |
| `RecordCacheItems(out results)` | Cache recording observer with keyed + sorted tracking |
| `RecordListItems(out results)` | List recording observer |
| `TestSourceCache<T,K>` | SourceCache with `.Complete()` and `.SetError()` support |
| `TestSourceList<T>` | SourceList with `.Complete()` and `.SetError()` support |
| `StressAddRemove` extensions | Add/remove stress patterns with timed automatic removal |
| `ForceFail(count, exception)` | Forces an observable to error after N emissions |
| `Parallelize(count, parallel)` | Creates parallel subscriptions for stress testing |
| `ObservableSpy` | Diagnostic logging for pipeline debugging |
| `FakeScheduler` | Controlled scheduler for time-dependent tests |
| `Fakers.*` | Bogus fakers for `Person`, `Animal`, `AnimalOwner`, `Market` |

### Domain Types

Shared domain types in `Tests/Domain/`:

- **`Person`** — `Name` (key), `Age`, `Gender`, `FavoriteColor`, `PetType`. Implements `INotifyPropertyChanged`.
- **`Animal`** — `Name` (key), `Type`, `Family` (enum: Mammal, Reptile, Fish, Amphibian, Bird)
- **`AnimalOwner`** — `Name` (key), `Animals` (ObservableCollection). Ideal for `TransformMany`/`MergeManyChangeSets` tests.
- **`Market`** / **`MarketPrice`** — financial-style streaming data tests
- **`PersonWithGender`**, **`PersonWithChildren`**, etc. — transform output types

Generate test data with Bogus:
```csharp
var people = Fakers.Person.Generate(100);
var animals = Fakers.Animal.Generate(50);
var owners = Fakers.AnimalOwnerWithAnimals.Generate(10); // pre-populated with animals
```

### Test Anti-Patterns

**❌ Testing implementation details instead of behavior:**
```csharp
// BAD: message count is an implementation detail — fragile
results.Messages.Count.Should().Be(3);
// GOOD: test the observable behavior and final state
results.Data.Count.Should().Be(expectedCount);
results.Data.Items.Should().BeEquivalentTo(expectedItems);
```

**❌ Using `Task.Delay` to wait for pipelines to settle:**
```csharp
// BAD: flaky, slow, non-deterministic
await Task.Delay(2000); // "wait for async deliveries to settle"
results.Data.Count.Should().Be(expected);
// GOOD: deterministic completion via Publish + LastOrDefaultAsync
var published = pipeline.Publish();
var hasCompleted = published.LastOrDefaultAsync().ToTask();
using var results = published.AsAggregator();
using var connect = published.Connect();
source.Dispose();
await hasCompleted;
results.Data.Items.Should().BeEquivalentTo(expectedItems);
```

**❌ Using `Thread.Sleep` for timing:**
```csharp
// BAD: flaky and slow
Thread.Sleep(1000);
// GOOD: use test schedulers or deterministic waiting
var scheduler = new TestScheduler();
scheduler.AdvanceBy(TimeSpan.FromSeconds(1).Ticks);
```

**❌ Ignoring disposal:**
```csharp
// BAD: leaks subscriptions, masks errors
var results = source.Connect().Filter(p => true).AsAggregator();
// GOOD: using ensures cleanup even if assertion throws
using var results = source.Connect().Filter(p => true).AsAggregator();
```

**❌ Non-deterministic data without seeds:**
```csharp
// BAD: failures aren't reproducible across runs
var random = new Random();
// GOOD: use Bogus Randomizer with a fixed seed
var randomizer = new Randomizer(42);
var people = Fakers.Person.UseSeed(randomizer.Int()).Generate(100);
```