Fix ONNX export for dynamic batch dimensions #871
Merged
Borda merged 10 commits into roboflow:develop on Mar 25, 2026
Pull request overview
Fixes RF-DETR’s ONNX export to better support dynamic batch sizes by removing shape-dependent Python constructs that get constant-folded into the exported graph.
Changes:
- Replace shape-derived Python constructs with tensor/attribute-based equivalents to avoid baking batch/spatial dims into ONNX graphs.
- Add `dynamic_batch` option to `RFDETR.export()` and the export CLI path to emit `dynamic_axes` for batch dimension.
- Update projector `LayerNorm` to use `self.normalized_shape` instead of `x.size(3)` to prevent tracing fixed constants.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/rfdetr/models/transformer.py | Reworks spatial_shapes construction and proposal generation to be more ONNX-trace-friendly. |
| src/rfdetr/models/backbone/projector.py | Prevents LayerNorm from embedding export-time channel size as a constant. |
| src/rfdetr/export/main.py | Adds CLI support for dynamic batch via dynamic_axes. |
| src/rfdetr/detr.py | Adds dynamic_batch parameter and wires dynamic_axes through high-level export API. |
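
As a rough illustration of the `dynamic_axes` wiring described in the overview, the sketch below builds a mapping that marks dimension 0 as a named dynamic axis for every input and output. The helper name `build_dynamic_axes` and the I/O names (`input`, `dets`, `labels`) are assumptions made for illustration and are not taken from the RF-DETR source; `dynamic_axes` itself is the keyword argument accepted by `torch.onnx.export`.

```python
from typing import Dict, List, Optional


def build_dynamic_axes(
    input_names: List[str],
    output_names: List[str],
    dynamic_batch: bool,
) -> Optional[Dict[str, Dict[int, str]]]:
    """Hypothetical helper: mark axis 0 as dynamic for every I/O name."""
    if not dynamic_batch:
        # Static export: no dynamic axes are requested.
        return None
    return {name: {0: "batch"} for name in [*input_names, *output_names]}


# Assumed I/O names, for illustration only.
axes = build_dynamic_axes(["input"], ["dets", "labels"], dynamic_batch=True)
static_axes = build_dynamic_axes(["input"], ["dets", "labels"], dynamic_batch=False)
```

The resulting dictionary would then be passed as `dynamic_axes=axes` to the export call, while `None` keeps every dimension fixed at its export-time size.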
Codecov Report
✅ All modified and coverable lines are covered by tests.

| Coverage Diff | develop | #871 | +/- |
|---|---|---|---|
| Coverage | 77% | 77% | |
| Files | 97 | 97 | |
| Lines | 7530 | 7538 | +8 |
| Hits | 5793 | 5801 | +8 |
| Misses | 1737 | 1737 | |
The PR switched spatial_shapes from a Python list to a tensor for ONNX tracing. But gen_encoder_output_proposals iterates over it and uses H_/W_ as slice indices and torch.linspace steps, both of which require Python ints, not scalar tensors, so the change caused a TypeError at runtime.

Reintroduce spatial_shapes_hw as a list[tuple[int, int]] built alongside the tensor during the loop (h/w come from src.shape, so they are already Python ints). Pass spatial_shapes_hw to gen_encoder_output_proposals while the tensor form is still used for MSDeformAttn and level_start_index.

Addresses review comment by @Copilot (PR roboflow#871)
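
The two-representation approach described in this commit can be sketched roughly as follows. This is not the RF-DETR implementation: the function name `build_spatial_shapes` and the toy feature-map sizes are assumptions, and only the general pattern (a tensor filled by index assignment plus a parallel list of int pairs) follows the commit message.

```python
import torch


def build_spatial_shapes(srcs):
    """Hypothetical sketch: keep a tensor and a list of int pairs side by side."""
    spatial_shapes = torch.empty((len(srcs), 2), dtype=torch.long)
    spatial_shapes_hw = []
    for lvl, src in enumerate(srcs):
        h, w = src.shape[-2:]  # plain Python ints in eager mode
        spatial_shapes[lvl, 0] = h
        spatial_shapes[lvl, 1] = w
        spatial_shapes_hw.append((h, w))
    # Start offset of each level in the flattened token sequence.
    level_start_index = torch.cat(
        (spatial_shapes.new_zeros((1,)), spatial_shapes.prod(1).cumsum(0)[:-1])
    )
    return spatial_shapes, spatial_shapes_hw, level_start_index


# Two toy feature levels: 8x8 and 4x4, batch size 2, 4 channels.
srcs = [torch.zeros(2, 4, 8, 8), torch.zeros(2, 4, 4, 4)]
spatial_shapes, spatial_shapes_hw, level_start_index = build_spatial_shapes(srcs)

# Python ints work as linspace step counts and as slice bounds.
H_, W_ = spatial_shapes_hw[0]
grid_y = torch.linspace(0, H_ - 1, H_)
flat_memory = torch.zeros(2, 80, 4)
level0 = flat_memory[:, : H_ * W_]
```

The tensor form serves consumers that index it with tensor operations, while the int pairs serve code that needs ordinary Python integers.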
Add two test suites recommended by Copilot review:
1. TestCliExportMain.test_dynamic_batch_forwards_dynamic_axes — verifies
that CLI main() passes dynamic_axes={name: {0: 'batch'}} to export_onnx
for every I/O name when --dynamic_batch=True, and None when False.
Also updates _make_args() to accept dynamic_batch and fake_export_onnx
to capture dynamic_axes.
2. test_rfdetr_export_dynamic_batch_forwards_dynamic_axes — verifies that
RFDETR.export(..., dynamic_batch=True) forwards a correctly keyed
dynamic_axes dict into export_onnx, covering detection and segmentation
model configs, plus static (False) baseline.
Addresses review comments by @Copilot (PR roboflow#871)
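
The capture pattern these tests rely on might look like the sketch below. Everything here is a stand-in: `run_export` is a hypothetical substitute for the code under test, the I/O names are assumed, and `MagicMock` plays the role the commit message assigns to `fake_export_onnx`.

```python
from unittest.mock import MagicMock


def run_export(export_onnx, input_names, output_names, dynamic_batch=False):
    """Hypothetical stand-in for the export entry point under test."""
    dynamic_axes = None
    if dynamic_batch:
        dynamic_axes = {name: {0: "batch"} for name in input_names + output_names}
    export_onnx(
        input_names=input_names,
        output_names=output_names,
        dynamic_axes=dynamic_axes,
    )


def captured_dynamic_axes(dynamic_batch):
    """Run the export with a fake exporter and return what it received."""
    fake_export_onnx = MagicMock()
    run_export(fake_export_onnx, ["input"], ["dets", "labels"], dynamic_batch=dynamic_batch)
    return fake_export_onnx.call_args.kwargs["dynamic_axes"]


dynamic_result = captured_dynamic_axes(True)
static_result = captured_dynamic_axes(False)
```

Asserting on the captured keyword argument checks the forwarding logic without running a real ONNX export.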
H_.expand(N_) assumed H_ is a tensor, but gen_encoder_output_proposals now receives spatial_shapes as list[tuple[int, int]] from Transformer.forward(). Python ints have no .expand() method, causing AttributeError on the export path (masks=None, two_stage=True). Replace with torch.full((N_,), H_, ...) which accepts Python ints.

Also add regression test with list[tuple[int, int]] + masks=None to lock in the int-tuple path and catch any future regressions.
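
A minimal sketch of the replacement described above, assuming toy values for the batch size and spatial dimensions. The helper name `valid_extent` and the choice of `float32` are illustrative assumptions; only the use of `torch.full` with a Python int fill value comes from the commit message.

```python
import torch


def valid_extent(batch_size: int, extent: int) -> torch.Tensor:
    """Hypothetical helper: one copy of `extent` per batch element."""
    # torch.full accepts a plain Python number as the fill value,
    # so it works when `extent` is an int rather than a tensor.
    return torch.full((batch_size,), extent, dtype=torch.float32)


N_, H_, W_ = 3, 8, 12  # toy values
valid_H = valid_extent(N_, H_)
valid_W = valid_extent(N_, W_)

# A Python int has no .expand(), which is why the earlier code failed.
int_has_expand = hasattr(H_, "expand")
```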
Borda reviewed Mar 25, 2026
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Borda approved these changes Mar 25, 2026
Borda added a commit that referenced this pull request Mar 27, 2026
* Fix ONNX export for dynamic batch dimensions
* fix: restore Python int pairs for gen_encoder_output_proposals
* test: add dynamic_batch coverage for CLI and RFDETR.export()
* fix: replace H_.expand(N_) with torch.full for Python int spatial dims
* Apply suggestions from code review

---------
Co-authored-by: jirka <6035284+Borda@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
What does this PR do?
ONNX export silently baked spatial dimensions and batch size into the graph as fixed constants. When users tried to run the exported model with a different batch size at inference time, it would fail because the ONNX graph contained hardcoded values derived from the export-time batch size.
Three root causes:
1. `LayerNorm.forward` used `x.size(3)` instead of `self.normalized_shape`. The ONNX tracer captured the concrete integer from the export-time tensor shape and embedded it as a constant node. Switching to `self.normalized_shape` (already stored on the module) gives the tracer a static attribute it can reference symbolically.
2. `spatial_shapes` was built as a Python list, then converted with `torch.as_tensor()`. The tracer never saw the `h, w` values flow through tensor operations, so it treated them as constants. Building `spatial_shapes` directly as a tensor with index assignment (`spatial_shapes[lvl, 0] = h`) lets the tracer track the symbolic relationship between the input spatial dims and downstream uses.
3. `gen_encoder_output_proposals` created `valid_H`/`valid_W` via Python list comprehensions. `torch.tensor([H_ for _ in range(N_)])` bakes the batch-dependent value `N_` as a fixed list length. Replacing this with `H_.expand(N_)` keeps the batch dimension symbolic in the graph.

Additionally, a `dynamic_batch` parameter is added to `RFDETR.export_onnx()` and the standalone export CLI. When enabled, it marks the batch axis (dim 0) as dynamic on all input and output names, allowing the exported model to accept variable batch sizes at runtime.

Related Issue(s): #376, #79
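
The first root cause can be illustrated with a small channels-first layer norm that reads its normalized shape from a module attribute instead of from the input tensor. This is a sketch under assumptions: the class name `ChannelsFirstLayerNorm` and its constructor arguments are hypothetical and may differ from the projector code in RF-DETR; `F.layer_norm` is the standard PyTorch functional call.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelsFirstLayerNorm(nn.Module):
    """Hypothetical sketch of a LayerNorm over the channel axis of NCHW input."""

    def __init__(self, num_channels: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(num_channels))
        self.bias = nn.Parameter(torch.zeros(num_channels))
        self.eps = eps
        # Stored once at construction, so forward() never reads x.size(3).
        self.normalized_shape = (num_channels,)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x.permute(0, 2, 3, 1)  # NCHW -> NHWC
        x = F.layer_norm(x, self.normalized_shape, self.weight, self.bias, self.eps)
        return x.permute(0, 3, 1, 2)  # NHWC -> NCHW


norm = ChannelsFirstLayerNorm(4)
out_small = norm(torch.randn(1, 4, 5, 6))
out_large = norm(torch.randn(3, 4, 5, 6))
```

The same module instance handles both batch sizes because nothing in `forward` depends on a value read from the input's shape.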
Type of Change
Testing