feat: Kornia GPU augmentation backend for detection training#874
Conversation
- Add `augmentation_backend` field to `TrainConfig` (cpu/auto/gpu); cpu is the default
- New `src/rfdetr/datasets/kornia_transforms.py`: registry of 8 transform factories, `build_kornia_pipeline`, `build_normalize`, and the `collate_boxes`/`unpack_boxes` box utilities
- Wire a `gpu_postprocess` flag through `coco.py` and `yolo.py` so CPU Albumentations augmentation and normalization are skipped when the GPU path is active
- Add `_setup_kornia_pipeline` + `on_after_batch_transfer` to `RFDETRDataModule`; segmentation models skip GPU aug (phase 2) with a one-time warning
- Add a `kornia>=0.7,<1` optional dependency group in `pyproject.toml`
- 12 new tests across `test_module_data.py` and `test_kornia_transforms.py`

---

Co-authored-by: Claude Code <noreply@anthropic.com>
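A transform-factory registry along the lines described above can be sketched as follows. This is a minimal illustration of the pattern only: the names `TRANSFORM_FACTORIES`, `register`, and `build_pipeline` are hypothetical stand-ins, not the actual identifiers in `kornia_transforms.py`, and the factory returns a no-op callable where the real code would return a Kornia augmentation.

```python
from typing import Any, Callable, Dict, List

# Hypothetical registry mapping aug_config keys to factory callables.
# Each factory receives that key's config dict and returns a transform,
# mirroring the per-transform factories the PR describes.
TRANSFORM_FACTORIES: Dict[str, Callable[[dict], Callable]] = {}

def register(name: str):
    """Decorator that adds a factory to the registry under ``name``."""
    def wrap(fn: Callable[[dict], Callable]) -> Callable[[dict], Callable]:
        TRANSFORM_FACTORIES[name] = fn
        return fn
    return wrap

@register("horizontal_flip")
def _make_hflip(cfg: dict) -> Callable:
    # In the real pipeline this would build e.g. a Kornia RandomHorizontalFlip
    # with probability cfg["p"]; here a pass-through keeps the sketch runnable.
    return lambda batch: batch

def build_pipeline(aug_config: dict) -> List[Callable]:
    """Instantiate a transform for every config key with a registered factory;
    unknown keys are skipped rather than raising."""
    return [TRANSFORM_FACTORIES[name](cfg)
            for name, cfg in aug_config.items()
            if name in TRANSFORM_FACTORIES]
```

A registry like this keeps the config-to-transform mapping data-driven, so adding a ninth transform is one decorated factory rather than another branch in the builder.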
Pull request overview
Adds an opt-in GPU-side augmentation path for detection training by introducing an augmentation_backend switch and routing normalization/augmentation to run after the batch is transferred to device (via RFDETRDataModule.on_after_batch_transfer), while keeping the existing CPU Albumentations pipeline as the default.
Changes:
- Add `TrainConfig.augmentation_backend` (`"cpu" | "auto" | "gpu"`) and a new optional dependency group `kornia`.
- Thread a `gpu_postprocess` flag through the COCO/YOLO dataset builders so CPU Albumentations + Normalize can be skipped when GPU postprocessing is active.
- Add DataModule logic + tests for backend resolution and the `on_after_batch_transfer` hook.
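The shape of the DataModule logic above can be sketched without Lightning installed. This is a hedged, framework-free stand-in: the class name, the sentinel, and the pass-through behavior are illustrative, not the actual `RFDETRDataModule` code, and `on_after_batch_transfer` merely mimics the signature of Lightning's hook of the same name.

```python
class GPUAugDataModule:
    """Minimal stand-in for the DataModule behavior described above: the CPU
    dataloader yields raw un-normalized batches, and augmentation runs in the
    ``on_after_batch_transfer`` hook once the batch is on device."""

    def __init__(self, backend: str = "cpu"):
        self.backend = backend
        self._pipeline = None           # set by _setup_kornia_pipeline
        self._kornia_setup_done = False  # guard against repeated setup('fit')

    def _setup_kornia_pipeline(self, pipeline):
        # The sentinel prevents rebuilding (and re-warning) on every
        # setup('fit') call, even when the fallback leaves _pipeline as None.
        if self._kornia_setup_done:
            return
        self._pipeline = pipeline if self.backend != "cpu" else None
        self._kornia_setup_done = True

    def on_after_batch_transfer(self, batch, dataloader_idx: int):
        # No GPU pipeline configured: the batch passes through untouched.
        if self._pipeline is None:
            return batch
        return self._pipeline(batch)
```

In real Lightning code the hook receives the batch already moved to the training device, which is what makes this the right place to run device-resident Kornia augmentation.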
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| `src/rfdetr/training/module_data.py` | Adds Kornia pipeline setup and the `on_after_batch_transfer` GPU postprocessing hook. |
| `src/rfdetr/datasets/coco.py` | Adds a `gpu_postprocess` option to training transforms and wires the backend flag through dataset builders. |
| `src/rfdetr/datasets/yolo.py` | Wires the backend flag through the Roboflow-from-YOLO builder. |
| `src/rfdetr/config.py` | Introduces `augmentation_backend` on `TrainConfig`. |
| `src/rfdetr/datasets/aug_config.py` | Documents the Kornia GPU backend and Phase 1 limitations. |
| `pyproject.toml` | Adds the optional dependency group `kornia`. |
| `tests/training/test_module_data.py` | Adds tests for backend resolution and `on_after_batch_transfer`. |
| `tests/training/conftest.py` | Adds an autouse fixture to restore the `RFDETRDataModule.trainer` property after tests. |
| `CHANGELOG.md` | Documents the new `augmentation_backend` feature. |
Comments suppressed due to low confidence (1)
src/rfdetr/datasets/coco.py:356
- The `make_coco_transforms()` docstring and Args list no longer match behavior now that `gpu_postprocess` can skip Albumentations and `Normalize()` for the train split. Please document the new `gpu_postprocess` parameter and clarify that normalization is deferred to the DataModule GPU path when it is enabled.
"""Build the standard COCO transform pipeline for a given dataset split.
Returns a composed transform that resizes images to the target ``resolution``
(with optional multi-scale jitter), applies Albumentations-based augmentations
during training, and normalises pixel values with ImageNet statistics.
For the ``"train"`` split the pipeline uses a two-branch ``OneOf`` between a
direct resize and a resize → random-crop → resize sequence (built via
:func:`_build_train_resize_config`), followed by the augmentation stack and
normalisation. For ``"val"``, ``"test"``, and ``"val_speed"`` only resize and
normalisation are applied — no augmentation.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…rmalize

- Import `get_logger` and add a module-level logger to `o365.py`
- Detect `augmentation_backend` from args; emit a WARNING when non-cpu (Phase 1 limitation: no `aug_config` support for O365)
- Compute the `gpu_postprocess` flag and pass it to both `make_coco_transforms` / `make_coco_transforms_square_div_64` calls

Addresses review comment — HIGH blocking: double normalize for O365 users with `augmentation_backend != 'cpu'` (PR #874)

---

Co-authored-by: Claude Code <noreply@anthropic.com>
…RNING
- Add `_kornia_setup_done: bool = False` in `__init__` to prevent `_setup_kornia_pipeline` re-running on every `setup('fit')` call when the auto+no-CUDA/no-kornia fallback leaves `_kornia_pipeline` as `None`
- Switch the auto+no-CUDA fallback from `logger.info` to `logger.warning` (consistent with the auto+no-kornia WARNING)
Addresses review comments — MEDIUM: setup guard re-runs in fallback path; inconsistent log levels (PR #874)
---
Co-authored-by: Claude Code <noreply@anthropic.com>
- `_make_gaussian_blur`: enforce `blur_limit >= 3` after odd-rounding (Kornia requires `kernel_size >= 3`)
- `make_coco_transforms`, `make_coco_transforms_square_div_64`: add `gpu_postprocess` to the Args docstring
- `unpack_boxes`: correct docstring claiming in-place mutation (the function returns shallow copies)
- `conftest.py`: fix docstring wording LightningModule → LightningDataModule

Addresses review comments from @Copilot and @review on PR #874

---

Co-authored-by: Claude Code <noreply@anthropic.com>
…pipeline forward pass
- `TestGaussianBlurMinKernel`: parametrized test for `blur_limit=1,2` producing a valid `kernel_size >= 3`
- `TestKorniaPipelineForwardPass`: shape/dtype check and an empty-bbox batch through the built pipeline (kornia skip guard)
- `TestBuildO365RawGpuBackend`: warning emitted for a non-cpu backend; `gpu_postprocess` wired correctly; square-resize delegate
- `TestKorniaSetupDoneSentinel`: sentinel starts False, set after fit; `_setup_kornia_pipeline` called exactly once across repeated `setup('fit')` calls
Closes review test-coverage gaps from PR #874
---
Co-authored-by: Claude Code <noreply@anthropic.com>
When `augmentation_backend != 'cpu'` and `aug_config` is not explicitly set, `build_kornia_pipeline` was receiving `{}` (an empty dict) while the CPU path correctly fell back to `AUG_CONFIG`. This caused GPU training to have zero augmentation by default — a silent regression.
- Import AUG_CONFIG in module_data.py
- Use `train_config.aug_config if ... is not None else AUG_CONFIG` in _setup_kornia_pipeline
- Add test_gpu_path_uses_aug_config_fallback to TestBackendResolution
Addresses QA finding B1 (blocking) from post-commit review of PR #874
---
Co-authored-by: Claude Code <noreply@anthropic.com>
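The fix above hinges on distinguishing "never set" (`None`) from an explicitly empty config. A minimal sketch of the pattern, with a hypothetical `resolve_aug_config` helper and a stand-in default (the real code inlines the expression in `_setup_kornia_pipeline` and uses the imported `AUG_CONFIG`):

```python
DEFAULT_AUG_CONFIG = {"horizontal_flip": {"p": 0.5}}  # stand-in for AUG_CONFIG

def resolve_aug_config(aug_config):
    """Fall back to the default only when aug_config was never set.

    A truthiness check (``aug_config or DEFAULT_AUG_CONFIG``) would fail in
    the other direction by silently overriding an explicitly empty config,
    while the original bug passed {} straight through to the pipeline
    builder, disabling all GPU augmentation by default.
    """
    return aug_config if aug_config is not None else DEFAULT_AUG_CONFIG
```

The explicit `is not None` test is what keeps both behaviors intact: an untouched config gets the defaults, and a deliberately empty config still means "no augmentation".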
Codecov report: ❌ the patch check failed because the patch coverage (73%) is below the target coverage (95%). You can increase the patch coverage or adjust the target coverage.

@@ Coverage Diff @@
## develop    #874   +/- ##
=======================================
  Coverage     77%     77%
=======================================
  Files         97      98    +1
  Lines       7593    7817  +224
=======================================
+ Hits        5856    6029  +173
- Misses      1737    1788   +51
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…auto backend
- _make_gaussian_blur: kernel_size=(blur_limit, blur_limit) instead of (3, blur_limit) — square kernel per Albumentations semantics (Copilot #2991808605)
- build_normalize: pass plain Python tuples instead of torch.tensor() so Kornia handles device placement (Copilot #2991808573)
- on_after_batch_transfer: call .to(img.device) on pipeline and normalize before use to prevent CPU/GPU device mismatch (Copilot #2991808540)
- setup("fit"): resolve 'auto' backend via _resolve_augmentation_backend() before dataset build so gpu_postprocess matches actual runtime behavior — fixes silent CPU-normalize stripping on machines without CUDA/kornia (Copilot #2991808618, #2991808669)
- 4 new tests covering _resolve_augmentation_backend and namespace pre-resolution
---
Co-authored-by: Claude Code <noreply@anthropic.com>
The try block unconditionally set `has_kornia = True` without performing the import, making the `except` unreachable; on machines with CUDA but without kornia, `auto` would incorrectly resolve to `gpu`, causing an ImportError or unnormalized training inputs (review HIGH finding). Also update `test_auto_backend_emits_warning` and `test_gpu_postprocess_true_for_auto_backend` to mock CUDA+kornia availability so the GPU path is actually exercised, and add complementary no-CUDA tests.

---

Co-authored-by: Claude Code <noreply@anthropic.com>
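The availability probe and resolution logic the fix describes can be sketched as below. The function names and the injected availability flags are illustrative (the real `_resolve_augmentation_backend` presumably checks `torch.cuda.is_available()` internally); passing the flags in as parameters is just what makes the decision table testable without CUDA or kornia present.

```python
import importlib.util

def kornia_available() -> bool:
    """True if kornia can be imported. The buggy version set this flag
    without attempting the import at all, so its except branch was dead
    code and 'auto' resolved to 'gpu' on kornia-less machines."""
    return importlib.util.find_spec("kornia") is not None

def resolve_backend(backend: str, cuda_ok: bool, kornia_ok: bool) -> str:
    """Resolve 'auto' to 'gpu' only when BOTH CUDA and kornia are available;
    explicit 'cpu'/'gpu' requests pass through unchanged."""
    if backend != "auto":
        return backend
    return "gpu" if (cuda_ok and kornia_ok) else "cpu"
```

The CUDA-but-no-kornia row of the decision table is exactly the case the review flagged: it must resolve to `"cpu"`, not `"gpu"`.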
What does this PR do?
- Add `augmentation_backend` field to `TrainConfig` (cpu/auto/gpu); cpu is the default
- New `src/rfdetr/datasets/kornia_transforms.py`: registry of 8 transform factories, `build_kornia_pipeline`, `build_normalize`, `collate_boxes`/`unpack_boxes` box utilities
- Wire a `gpu_postprocess` flag through `coco.py` and `yolo.py` so CPU Albumentations augmentation and normalize are skipped when the GPU path is active
- Add `_setup_kornia_pipeline` + `on_after_batch_transfer` to `RFDETRDataModule`; segmentation models skip GPU aug (phase 2) with a one-time warning
- Add `kornia>=0.7,<1` optional dep group in `pyproject.toml`
- 12 new tests across `test_module_data.py` and `test_kornia_transforms.py`

Closes #862
Type of Change
Testing
Additional Context