feat: Kornia GPU augmentation backend for detection training#874
Conversation
- Add `augmentation_backend` field to `TrainConfig` (cpu/auto/gpu); cpu is the default
- New `src/rfdetr/datasets/kornia_transforms.py`: registry of 8 transform factories, `build_kornia_pipeline`, `build_normalize`, and `collate_boxes`/`unpack_boxes` box utilities
- Wire a `gpu_postprocess` flag through `coco.py` and `yolo.py` so CPU Albumentations augmentation and normalization are skipped when the GPU path is active
- Add `_setup_kornia_pipeline` + `on_after_batch_transfer` to `RFDETRDataModule`; segmentation models skip GPU augmentation (phase 2) with a one-time warning
- Add `kornia>=0.7,<1` optional dependency group in `pyproject.toml`
- 12 new tests across `test_module_data.py` and `test_kornia_transforms.py`

---
Co-authored-by: Claude Code <noreply@anthropic.com>
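The `collate_boxes`/`unpack_boxes` pair suggests a pad-then-trim pattern for batching variable-length box lists. The actual rfdetr signatures are not shown in this PR, so the sketch below is a hypothetical illustration using plain lists instead of tensors:

```python
def collate_boxes(boxes_per_image):
    """Pad variable-length per-image box lists into one rectangular batch.

    Hypothetical sketch: the real rfdetr utilities likely operate on
    tensors, but the pad-and-record-counts idea is the same.
    """
    counts = [len(b) for b in boxes_per_image]
    max_n = max(counts) if counts else 0
    padded = [b + [[0.0, 0.0, 0.0, 0.0]] * (max_n - len(b)) for b in boxes_per_image]
    return padded, counts


def unpack_boxes(padded, counts):
    """Recover the original per-image box lists by trimming the padding."""
    return [row[:n] for row, n in zip(padded, counts)]


batch, counts = collate_boxes([[[0.1, 0.1, 0.5, 0.5]], []])
assert unpack_boxes(batch, counts) == [[[0.1, 0.1, 0.5, 0.5]], []]
```

Padding is what lets a whole batch of boxes ride through a single batched Kornia call; the counts make the round-trip lossless even for images with zero boxes.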
Pull request overview
Adds an opt-in GPU-side augmentation path for detection training by introducing an augmentation_backend switch and routing normalization/augmentation to run after the batch is transferred to device (via RFDETRDataModule.on_after_batch_transfer), while keeping the existing CPU Albumentations pipeline as the default.
Changes:
- Add `TrainConfig.augmentation_backend` (`"cpu" | "auto" | "gpu"`) and a new optional dependency group `kornia`.
- Thread a `gpu_postprocess` flag through the COCO/YOLO dataset builders so CPU Albumentations + Normalize can be skipped when GPU postprocessing is active.
- Add DataModule logic + tests for backend resolution and the `on_after_batch_transfer` hook.
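Lightning invokes `on_after_batch_transfer` once the batch is already on the target device, which is what makes GPU-side augmentation possible here. A minimal sketch of the hook pattern (the `_kornia_pipeline` name comes from the PR summary; the body is illustrative, not the actual implementation, and the class stands in for a `LightningDataModule`):

```python
class GpuAugDataModule:
    """Minimal stand-in for a LightningDataModule with GPU postprocessing.

    In real code this would subclass lightning.LightningDataModule and
    _kornia_pipeline would be a Kornia augmentation container.
    """

    def __init__(self, pipeline=None):
        # None means the CPU backend is active and the hook is a no-op.
        self._kornia_pipeline = pipeline

    def on_after_batch_transfer(self, batch, dataloader_idx):
        # Called by Lightning after the batch has been moved to the device.
        if self._kornia_pipeline is None:
            return batch
        batch["img"] = self._kornia_pipeline(batch["img"])
        return batch


dm = GpuAugDataModule(pipeline=lambda img: [x * 2 for x in img])
out = dm.on_after_batch_transfer({"img": [1, 2]}, 0)
assert out["img"] == [2, 4]
```

The key property is that the default (CPU) path leaves batches untouched, so opting out of the GPU backend costs nothing.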
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| `src/rfdetr/training/module_data.py` | Adds Kornia pipeline setup and the `on_after_batch_transfer` GPU postprocessing hook. |
| `src/rfdetr/datasets/coco.py` | Adds a `gpu_postprocess` option to training transforms and wires the backend flag through dataset builders. |
| `src/rfdetr/datasets/yolo.py` | Wires the backend flag through the Roboflow-from-YOLO builder. |
| `src/rfdetr/config.py` | Introduces `augmentation_backend` on `TrainConfig`. |
| `src/rfdetr/datasets/aug_config.py` | Documents the Kornia GPU backend and Phase 1 limitations. |
| `pyproject.toml` | Adds the optional dependency group `kornia`. |
| `tests/training/test_module_data.py` | Adds tests for backend resolution and `on_after_batch_transfer`. |
| `tests/training/conftest.py` | Adds an autouse fixture to restore the `RFDETRDataModule.trainer` property after tests. |
| `CHANGELOG.md` | Documents the new `augmentation_backend` feature. |
Comments suppressed due to low confidence (1)
src/rfdetr/datasets/coco.py:356
- The `make_coco_transforms()` docstring and Args list no longer match behavior now that `gpu_postprocess` can skip Albumentations and `Normalize()` for the train split. Please document the new `gpu_postprocess` parameter and clarify that normalization is deferred to the DataModule GPU path when it's enabled.
"""Build the standard COCO transform pipeline for a given dataset split.
Returns a composed transform that resizes images to the target ``resolution``
(with optional multi-scale jitter), applies Albumentations-based augmentations
during training, and normalises pixel values with ImageNet statistics.
For the ``"train"`` split the pipeline uses a two-branch ``OneOf`` between a
direct resize and a resize → random-crop → resize sequence (built via
:func:`_build_train_resize_config`), followed by the augmentation stack and
normalisation. For ``"val"``, ``"test"``, and ``"val_speed"`` only resize and
normalisation are applied — no augmentation.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
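The two-branch `OneOf` structure described in the docstring can be sketched as a branch selection: either a direct resize, or an upsize, random crop, and final resize. This is a hypothetical illustration of the branch logic only; the real pipeline composes Albumentations transforms via `_build_train_resize_config`, and the `crop_frac` parameter is invented for the sketch:

```python
import random


def resize(img, size):
    # Stand-in for the real Resize transform.
    return ("resized", size)


def random_crop(img, size):
    # Stand-in for the real RandomCrop transform.
    return ("cropped", size)


def train_resize(img, resolution, crop_frac=0.9):
    """Two-branch OneOf: direct resize, or resize -> random-crop -> resize.

    Hypothetical sketch of the branch structure; both branches end at the
    target resolution, which is why downstream code sees a fixed size.
    """
    if random.random() < 0.5:
        return resize(img, resolution)              # branch 1: direct resize
    img = resize(img, int(resolution / crop_frac))  # branch 2: upsize,
    img = random_crop(img, resolution)              # random-crop,
    return resize(img, resolution)                  # then final resize


assert train_resize(None, 640) == ("resized", 640)
```

Whichever branch fires, the output size is fixed, so the rest of the pipeline never has to care which branch was taken.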
…rmalize

- Import get_logger and add a module-level logger to o365.py
- Detect augmentation_backend from args; emit a WARNING when non-cpu (Phase 1 limitation: no aug_config support for O365)
- Compute the gpu_postprocess flag and pass it to both make_coco_transforms / make_coco_transforms_square_div_64 calls

Addresses review comment — HIGH blocking: double normalize for O365 users with augmentation_backend != 'cpu' (PR #874)

---
Co-authored-by: Claude Code <noreply@anthropic.com>
…RNING
- Add _kornia_setup_done: bool = False in __init__ to prevent _setup_kornia_pipeline re-running on every setup('fit') call when the auto+no-CUDA/no-kornia fallback leaves _kornia_pipeline as None
- Switch the auto+no-CUDA fallback from logger.info to logger.warning (consistent with auto+no-kornia WARNING)
Addresses review comments — MEDIUM: setup guard re-runs in fallback path; inconsistent log levels (PR #874)
---
Co-authored-by: Claude Code <noreply@anthropic.com>
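The `_kornia_setup_done` sentinel is a standard run-once guard: because the auto fallback intentionally leaves `_kornia_pipeline` as `None`, the pipeline attribute alone cannot distinguish "fell back to CPU" from "never ran", so `setup('fit')` would retry every call. A sketch of the idea (class and attribute bodies are illustrative, not the actual implementation):

```python
class DataModuleSketch:
    """Run-once guard for a setup step whose result may legitimately be None."""

    def __init__(self):
        self._kornia_pipeline = None
        self._kornia_setup_done = False  # separates "fallback" from "never ran"
        self.setup_calls = 0

    def _setup_kornia_pipeline(self):
        self.setup_calls += 1
        # auto + no-CUDA / no-kornia fallback: pipeline stays None on purpose
        self._kornia_pipeline = None

    def setup(self, stage):
        if stage == "fit" and not self._kornia_setup_done:
            self._setup_kornia_pipeline()
            self._kornia_setup_done = True  # set even when the pipeline is None


dm = DataModuleSketch()
dm.setup("fit")
dm.setup("fit")
assert dm.setup_calls == 1
```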
- _make_gaussian_blur: enforce blur_limit >= 3 after odd-rounding (Kornia requires kernel_size >= 3)
- make_coco_transforms, make_coco_transforms_square_div_64: add gpu_postprocess to the Args docstring
- unpack_boxes: correct docstring claiming in-place mutation (the function returns shallow copies)
- conftest.py: fix docstring wording LightningModule → LightningDataModule

Addresses review comments from @Copilot and @review on PR #874

---
Co-authored-by: Claude Code <noreply@anthropic.com>
…pipeline forward pass
- TestGaussianBlurMinKernel: parametrized test for blur_limit=1,2 producing valid kernel_size >= 3
- TestKorniaPipelineForwardPass: shape/dtype check and empty-bbox batch through built pipeline (kornia skip guard)
- TestBuildO365RawGpuBackend: warning emitted for non-cpu backend; gpu_postprocess wired correctly; square-resize delegate
- TestKorniaSetupDoneSentinel: sentinel starts False, set after fit, _setup_kornia_pipeline called exactly once across repeated setup('fit') calls
Closes review test-coverage gaps from PR #874
---
Co-authored-by: Claude Code <noreply@anthropic.com>
When augmentation_backend != 'cpu' and aug_config is not explicitly set,
build_kornia_pipeline was receiving {} (empty dict) while the CPU path
correctly fell back to AUG_CONFIG. This caused GPU training to have zero
augmentation by default — a silent regression.
- Import AUG_CONFIG in module_data.py
- Use `train_config.aug_config if ... is not None else AUG_CONFIG` in _setup_kornia_pipeline
- Add test_gpu_path_uses_aug_config_fallback to TestBackendResolution
Addresses QA finding B1 (blocking) from post-commit review of PR #874
---
Co-authored-by: Claude Code <noreply@anthropic.com>
Codecov Report

❌ Your patch check has failed because the patch coverage (77%) is below the target coverage (95%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

@@ Coverage Diff @@
## develop #874 +/- ##
=======================================
Coverage 79% 79%
=======================================
Files 97 98 +1
Lines 7829 8070 +241
=======================================
+ Hits 6179 6370 +191
- Misses 1650 1700 +50
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…auto backend
- _make_gaussian_blur: kernel_size=(blur_limit, blur_limit) instead of (3, blur_limit) — square kernel per Albumentations semantics (Copilot #2991808605)
- build_normalize: pass plain Python tuples instead of torch.tensor() so Kornia handles device placement (Copilot #2991808573)
- on_after_batch_transfer: call .to(img.device) on pipeline and normalize before use to prevent CPU/GPU device mismatch (Copilot #2991808540)
- setup("fit"): resolve 'auto' backend via _resolve_augmentation_backend() before dataset build so gpu_postprocess matches actual runtime behavior — fixes silent CPU-normalize stripping on machines without CUDA/kornia (Copilot #2991808618, #2991808669)
- 4 new tests covering _resolve_augmentation_backend and namespace pre-resolution
---
Co-authored-by: Claude Code <noreply@anthropic.com>
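Resolving `'auto'` once, before the datasets are built, is what keeps the CPU transform pipeline and the GPU hook consistent: both sides must agree on whether normalization happens on CPU or GPU. A hypothetical sketch of the ordering (resolver internals simplified; availability flags are passed in explicitly here instead of probed):

```python
def resolve_augmentation_backend(backend, has_cuda, has_kornia):
    """Collapse 'auto' to a concrete backend before any dataset is built."""
    if backend == "auto":
        return "gpu" if (has_cuda and has_kornia) else "cpu"
    return backend


def setup_fit(args, has_cuda, has_kornia):
    # Resolve FIRST so gpu_postprocess matches actual runtime behavior;
    # resolving after dataset build is the bug this commit fixes.
    resolved = resolve_augmentation_backend(
        args["augmentation_backend"], has_cuda, has_kornia
    )
    gpu_postprocess = resolved != "cpu"
    # Datasets built after this point see the already-resolved flag.
    return resolved, gpu_postprocess


assert setup_fit({"augmentation_backend": "auto"}, False, True) == ("cpu", False)
assert setup_fit({"augmentation_backend": "auto"}, True, True) == ("gpu", True)
```

Without the pre-resolution step, a machine without CUDA could strip CPU normalization for a GPU path that never activates, producing unnormalized inputs.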
The try block unconditionally set has_kornia=True with no import, making the except unreachable; on machines with CUDA but without kornia, auto would incorrectly resolve to gpu — causing ImportError or unnormalized training inputs (review HIGH finding). Also update test_auto_backend_emits_warning and test_gpu_postprocess_true_for_auto_backend to mock CUDA+kornia availability so the GPU path is actually exercised; add complementary no-CUDA tests. --- Co-authored-by: Claude Code <noreply@anthropic.com>
…kornia

- Annotated kornia imports with `# type: ignore[import-not-found]` to suppress import errors in type-checking.
- Updated `pyproject.toml` to include kornia in `ignore_missing_imports` for mypy.
- Removed a redundant `num_workers` assignment in `module_data.py`.
…cross modules and tests

- Introduced a `_has_cuda_device` helper in `module_data.py` for fork-safe CUDA checks.
- Updated all instances of `torch.cuda.is_available` in datasets, training modules, and tests to use `_has_cuda_device`.
- Refactored runtime backend resolution for consistent error handling and modularity.
- Enhanced various tests to mock `_has_cuda_device` for deterministic behavior.
- Standardized normalization and backend-resolution behavior across the GPU and CPU paths.
…rt usage

- Replaced ambiguous variables (`H`, `W`, `B`) with descriptive names (`image_height`, `image_width`, `batch
- Add `_has_cuda_device()` and `resolve_augmentation_backend()` to `kornia_transforms.py` as single canonical implementations
- `_has_cuda_device` uses `rfdetr.config.DEVICE` (fork-safe) instead of `torch.cuda.is_available()` — mirrors module_data.py
- `_resolve_runtime_augmentation_backend` in coco.py now delegates to the shared resolver; the yolo.py import is unchanged
- `build_o365_raw` inline resolution replaced with a single `resolve_augmentation_backend()` call, removing duplicate logic and two direct `torch.cuda.is_available()` calls

[resolve #1] /review finding by sw-engineer (report: .temp/output-review-aug-kornia-2026-04-09.md): "Duplicated backend-resolution logic across three modules"
[resolve #2] /review finding by linting-expert (report: .temp/output-review-aug-kornia-2026-04-09.md): "Inconsistent CUDA detection: fork-safe vs direct"

---
Co-authored-by: Claude Code <noreply@anthropic.com>
The `setattr(args, "augmentation_backend", "cpu")` calls mutated the shared namespace when resolving the "auto" or segmentation-forced-cpu paths. The local `resolved_augmentation_backend` variable already controls `gpu_postprocess` correctly; the mutations were redundant side effects that could surprise the DataModule's own `_setup_kornia_pipeline` resolution path.

[resolve #3] /review finding by sw-engineer (report: .temp/output-review-aug-kornia-2026-04-09.md): "Mutable namespace mutation in build_coco()"

---
Co-authored-by: Claude Code <noreply@anthropic.com>
…ata.py

The bare `import kornia.augmentation` statements in `_setup_kornia_pipeline` lacked the `# type: ignore[import-not-found]` annotation used consistently elsewhere in the PR (coco.py, o365.py, kornia_transforms.py).

[resolve #4] /review finding by linting-expert (report: .temp/output-review-aug-kornia-2026-04-09.md): "kornia imports without type: ignore in module_data.py"

---
Co-authored-by: Claude Code <noreply@anthropic.com>
…strings

Both `make_coco_transforms` and `make_coco_transforms_square_div_64` described the train pipeline as always applying augmentation + normalization. Added a paragraph explaining that `gpu_postprocess=True` omits both, deferring them to `RFDETRDataModule.on_after_batch_transfer`.

[resolve #5] /review finding by doc-scribe (report: .temp/output-review-aug-kornia-2026-04-09.md): "make_coco_transforms() docstring not updated for gpu_postprocess"

---
Co-authored-by: Claude Code <noreply@anthropic.com>
- Upgrade typing to built-in generics (Dict→dict, List→list, Tuple→tuple, Optional→X|None) across kornia_transforms.py and coco.py
- Remove unused `typing` imports; use `collections.abc.Callable`
- Add `# noqa: F401` to all unused kornia guard imports in module_data.py and kornia_transforms.py
- Restore `# noqa: N806` on PATHS dicts in coco.py and o365.py (removed by a linter pass, re-flagged by ruff)
- Minor: `super(Cls, self)` → `super()`, `setattr(self.coco, …)` → direct attribute assignment, f-strings instead of `.format()`

---
Co-authored-by: Claude Code <noreply@anthropic.com>
`_has_cuda_device` now reads `rfdetr.config.DEVICE` (set once at import time) instead of calling `torch.cuda.is_available`. Patching `torch.cuda.is_available` had no effect, causing three TestBuildO365RawGpuBackend tests to fail on non-CUDA machines and `TestBuildRoboflowFromCocoBackendResolution.test_auto_no_cuda_keeps_cpu_normalize` to be fragile on CUDA hosts. Updated all six affected patch calls to target `rfdetr.datasets.kornia_transforms._has_cuda_device` directly. --- Co-authored-by: Claude Code <noreply@anthropic.com>
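This is the classic "patch where the name is looked up" rule: since backend resolution reads `_has_cuda_device` from `rfdetr.datasets.kornia_transforms`, patching `torch.cuda.is_available` is a no-op. Illustrated with a self-contained stand-in module (the `helpers`/`resolve` names are hypothetical):

```python
import types
from unittest import mock

# Stand-in for the module under test, which looks its helper up at call time.
helpers = types.SimpleNamespace(_has_cuda_device=lambda: False)


def resolve():
    # Reads the helper through `helpers`, so that is the only valid patch target;
    # patching some other module's function would never be observed here.
    return "gpu" if helpers._has_cuda_device() else "cpu"


with mock.patch.object(helpers, "_has_cuda_device", return_value=True):
    assert resolve() == "gpu"   # patching the looked-up name takes effect
assert resolve() == "cpu"       # and is reverted outside the context
```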
…entation_backend Previously build_coco() only forwarded 'auto' to the resolver, so an explicit augmentation_backend='gpu' with no CUDA would not fail at dataset-build time (the check only happened later in on_after_batch_transfer). Changed the condition from `== "auto"` to `!= "cpu"` so both 'auto' and 'gpu' pass through the shared resolver — matching the behaviour of build_roboflow_from_coco. --- Co-authored-by: Claude Code <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- resolve_augmentation_backend() now raises ValueError for unrecognised backend strings instead of silently returning the raw value (which would leave gpu_postprocess=True while no Kornia pipeline is built, producing unnormalised training inputs)
- _make_affine() converts Albumentations translate_percent=(min, max) to Kornia RandomAffine(translate=(tx, ty)) by taking the max absolute value of the range, matching the intended symmetric translation bound

[resolve #5] Review comment by @Copilot (PR #874): "resolve_augmentation_backend() falls through to `return backend` for any unexpected string..."
[resolve #7] Review comment by @Copilot (PR #874): "_make_affine() forwards Albumentations-style translate_percent directly to Kornia RandomAffine..."

---
Co-authored-by: Claude Code <noreply@anthropic.com>
The test was patching torch.cuda.is_available, which has no effect because backend resolution uses rfdetr.datasets.kornia_transforms._has_cuda_device (which reads the fork-safe DEVICE constant). Also the args namespace was missing augmentation_backend='auto', so the assertion passed trivially against the 'cpu' default rather than the auto path. [resolve #8] Review comment by @Copilot (PR #874): "This test claims to validate augmentation_backend='auto' no-CUDA path, but the constructed args never sets augmentation_backend..." --- Co-authored-by: Claude Code <noreply@anthropic.com>
What does this PR do?
- Add `augmentation_backend` field to `TrainConfig` (cpu/auto/gpu); cpu is the default
- New `src/rfdetr/datasets/kornia_transforms.py`: registry of 8 transform factories, `build_kornia_pipeline`, `build_normalize`, `collate_boxes`/`unpack_boxes` box utilities
- Wire a `gpu_postprocess` flag through `coco.py` and `yolo.py` so CPU Albumentations augmentation and normalize are skipped when the GPU path is active
- Add `_setup_kornia_pipeline` + `on_after_batch_transfer` to `RFDETRDataModule`; segmentation models skip GPU aug (phase 2) with a one-time warning
- Add `kornia>=0.7,<1` optional dep group in `pyproject.toml`
- New tests in `test_module_data.py` and `test_kornia_transforms.py`

Closes #862
Type of Change
Testing
Additional Context