fix: slice numpy array values in custom_data per row in CSVSink #2199

farukalamai wants to merge 1 commit into roboflow:develop

Conversation
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```
@@           Coverage Diff            @@
##           develop    #2199   +/-   ##
=======================================
- Coverage       77%      77%      -0%
=======================================
  Files           62       62
  Lines         7640     7650      +10
=======================================
+ Hits          5919     5926       +7
- Misses        1721     1724       +3
```
Pull request overview
Fixes incorrect serialization of per-detection custom_data in CSVSink/JSONSink when users pass numpy arrays (previously the full array was written on every row), aligning output with expected “one value per detection row” behavior.
Changes:
- Update `CSVSink.parse_detection_data()` to slice `custom_data` numpy arrays per detection row.
- Update `JSONSink.parse_detection_data()` to slice `custom_data` numpy arrays per detection row.
- Add a unit test ensuring `CSVSink` slices numpy-array `custom_data` per row.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `src/supervision/detection/tools/csv_sink.py` | Slice numpy-array `custom_data` per detection row when producing CSV rows. |
| `src/supervision/detection/tools/json_sink.py` | Apply analogous per-row slicing for numpy-array `custom_data` when producing JSON rows. |
| `tests/detection/test_csv.py` | Add regression test covering numpy-array `custom_data` in `CSVSink`. |
```python
if custom_data:
    row.update(custom_data)
    for key, value in custom_data.items():
        if isinstance(value, np.ndarray) and value.ndim == 0:
            row[key] = str(value)
        elif isinstance(value, np.ndarray):
            row[key] = str(value[i])
        else:
            row[key] = value
```
`custom_data` numpy arrays are serialized using `str(...)`, which turns numeric values into JSON strings (while other built-in fields like `confidence` are numbers). Consider converting numpy values to native Python scalars (e.g., via `.item()` for 0-d arrays and elements) so JSON output preserves numeric types and remains consistently typed.
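A minimal sketch of the conversion this comment suggests, using a hypothetical `to_native` helper (the name is illustrative, not part of the PR):

```python
import numpy as np

def to_native(value, i):
    # Hypothetical helper sketching the suggestion above: return native
    # Python scalars via .item() instead of stringifying with str(...).
    if isinstance(value, np.ndarray) and value.ndim == 0:
        return value.item()       # 0-d array -> native scalar
    if isinstance(value, np.ndarray):
        return value[i].item()    # element for detection row i -> native scalar
    return value                  # plain Python values pass through unchanged
```

With this, `json.dumps({"area": to_native(np.array([1.5, 2.5]), 0)})` emits `{"area": 1.5}` rather than a quoted string.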
```python
for key, value in custom_data.items():
    if isinstance(value, np.ndarray) and value.ndim == 0:
        row[key] = str(value)
    elif isinstance(value, np.ndarray):
        row[key] = str(value[i])
    else:
```
Indexing `custom_data` numpy arrays with `value[i]` will raise `IndexError` if the array length doesn't match the number of detections (including the common case of a 1-element array intended as a constant). It would be safer to validate lengths and either broadcast length-1 arrays or raise a clear `ValueError` explaining the expected shape.
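One way that validation could look, sketched with a hypothetical `slice_custom_value` helper that broadcasts length-1 arrays and fails loudly on a mismatch:

```python
import numpy as np

def slice_custom_value(value, i, n_detections):
    # Hypothetical helper illustrating the length check suggested above.
    if not isinstance(value, np.ndarray) or value.ndim == 0:
        return value
    if len(value) == 1:
        return value[0]  # broadcast a single value to every detection row
    if len(value) != n_detections:
        raise ValueError(
            f"custom_data array of length {len(value)} does not match "
            f"{n_detections} detections; expected length 1 or {n_detections}"
        )
    return value[i]
```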
```python
if custom_data:
    row.update(custom_data)
    for key, value in custom_data.items():
        if isinstance(value, np.ndarray) and value.ndim == 0:
            row[key] = str(value)
        elif isinstance(value, np.ndarray):
            row[key] = str(value[i])
        else:
            row[key] = value
parsed_rows.append(row)
```
The PR changes `JSONSink` behavior, but there's no unit test covering `custom_data` passed as a numpy array (similar to the new `CSVSink` test). Adding a test that asserts per-row slicing and JSON-serializable output would prevent regressions and confirm the fix end-to-end.
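A sketch of what such a test could pin down. To stay self-contained it checks the row-building logic through a stand-in `build_rows` function (a hypothetical name, not the library's API); a real regression test would go through `JSONSink` itself, as the new `CSVSink` test does:

```python
import json
import numpy as np

def build_rows(n_detections, custom_data):
    # Stand-in for the per-detection row construction: each row receives
    # its own element of any numpy-array custom_data value.
    rows = []
    for i in range(n_detections):
        row = {"detection_index": i}
        for key, value in custom_data.items():
            if isinstance(value, np.ndarray) and value.ndim == 0:
                row[key] = value.item()
            elif isinstance(value, np.ndarray):
                row[key] = value[i].item()
            else:
                row[key] = value
        rows.append(row)
    return rows

rows = build_rows(2, {"area": np.array([100.0, 200.0]), "frame": 7})
assert rows[0]["area"] == 100.0 and rows[1]["area"] == 200.0  # per-row slicing
json.dumps(rows)  # output stays JSON-serializable
```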
```python
elif isinstance(value, np.ndarray):
    row[key] = value[i]
else:
    row[key] = value
```
`custom_data` slicing currently only handles `np.ndarray`. If a caller passes a per-detection Python sequence (e.g., list/tuple) it will still be written as the full sequence on every row. Consider mirroring the `detections.data` logic here (slice values that are indexable and match detection length) or explicitly documenting that only numpy arrays are supported for per-row custom values.
Suggested change:

```diff
-row[key] = value
+row[key] = value[i] if hasattr(value, "__getitem__") else value
```
```python
for key, value in custom_data.items():
    if isinstance(value, np.ndarray) and value.ndim == 0:
        row[key] = value
    elif isinstance(value, np.ndarray):
        row[key] = value[i]
    else:
```
Indexing `custom_data` numpy arrays with `value[i]` can raise `IndexError` when the provided array length doesn't match the number of detections (including a 1-element array intended to broadcast). Consider validating lengths and either broadcasting or raising a clear `ValueError` describing the expected shape to make failures easier to debug.
Before submitting
Description

Fixes a bug in `CSVSink` and `JSONSink` where passing a numpy array as a `custom_data` value wrote the entire array on every row instead of the per-detection scalar value.
Type of Change
Motivation and Context
When users pass computed per-detection values like `detections.area` via `custom_data`, each row should receive its own scalar, not the whole array. The root cause was `row.update(custom_data)` inside the per-detection loop, which blindly wrote the whole value. The fix applies the same per-index slicing logic that `detections.data` already uses correctly.

Closes #1397
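To illustrate the before/after behavior described above (the array values are illustrative, not taken from the PR):

```python
import numpy as np

areas = np.array([1200.0, 850.0, 430.0])  # e.g. detections.area

# Before the fix: row.update(custom_data) inside the per-detection loop
# copied the whole array into every row.
buggy_row = {"class_name": "person"}
buggy_row.update({"area": areas})
# buggy_row["area"] is the full 3-element array on every row

# After the fix: each row receives the element for its detection index i.
i = 1
fixed_row = {"class_name": "person", "area": areas[i]}
# fixed_row["area"] == 850.0
```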
Changes Made

- `src/supervision/detection/tools/csv_sink.py`: slice numpy array values in `custom_data` per detection row
- `src/supervision/detection/tools/json_sink.py`: same fix
- `tests/detection/test_csv.py`: added test case for numpy array in `custom_data`

Testing
Google Colab (optional)
Colab link:
Screenshots/Videos (optional)
Additional Notes
The fix is backward compatible: scalar values in `custom_data` (e.g. `{"frame_number": 42}`) continue to work as before, written as-is on every row.