
fix: slice numpy array values in custom_data per row in CSVSink#2199

Open
farukalamai wants to merge 1 commit into roboflow:develop from
farukalamai:fix/csv-json-sink-custom-data-array-slicing

Conversation

@farukalamai

Before submitting
  • Self-reviewed the code
  • Updated documentation, following the Google style guide
  • Added docs entry for autogeneration (if new functions/classes)
  • Added/updated tests
  • All tests pass locally

Description

Fixes a bug in CSVSink and JSONSink where passing a numpy array as a
custom_data value wrote the entire array on every row instead of the
per-detection scalar value.

Type of Change

  • 🐛 Bug fix (non-breaking change which fixes an issue)

Motivation and Context

When users pass computed per-detection values like detections.area via
custom_data, each row should receive its own scalar — not the whole array.

# Before (broken): every row got the full array
with sv.CSVSink("out.csv") as sink:
    sink.append(detections, custom_data={"area": detections.area})
# area column: [400.0, 400.0] on every row ❌

# After (fixed): each row gets its own value
with sv.CSVSink("out.csv") as sink:
    sink.append(detections, custom_data={"area": detections.area})
# area column: 400.0, 400.0 ✅

The root cause was row.update(custom_data) inside the per-detection loop,
which blindly wrote the whole value. The fix applies the same per-index
slicing logic that detections.data already uses correctly.
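The per-index rule can be sketched in isolation (a minimal illustration of the fix's logic; `slice_custom_value` is a hypothetical helper name, not the library's API):

```python
import numpy as np

def slice_custom_value(value, i):
    """Return the custom_data entry for detection row i."""
    if isinstance(value, np.ndarray) and value.ndim == 0:
        return value.item()  # 0-d array: same scalar on every row
    if isinstance(value, np.ndarray):
        return value[i]      # 1-d array: one element per row
    return value             # plain scalar: repeated as-is

areas = np.array([400.0, 900.0])
rows = [{"area": slice_custom_value(areas, i)} for i in range(len(areas))]
```

With this rule each row receives its own scalar (400.0, then 900.0) instead of two copies of the full array.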

Closes #1397

Changes Made

  • src/supervision/detection/tools/csv_sink.py — slice numpy array values in custom_data per detection row
  • src/supervision/detection/tools/json_sink.py — same fix
  • tests/detection/test_csv.py — added test case for numpy array in custom_data

Testing

  • I have tested this code locally
  • I have added unit tests that prove my fix is effective or that my feature works
  • All new and existing tests pass

Google Colab (optional)

Colab link:

Screenshots/Videos (optional)

Additional Notes

The fix is backward compatible — scalar values in custom_data (e.g.
{"frame_number": 42}) continue to work as before, written as-is on every
row.
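As a sanity check of that claim, a pure-Python sketch mixing a scalar and an array in custom_data (the loop mirrors the fix's logic; the variable names are illustrative):

```python
import numpy as np

custom_data = {"frame_number": 42, "area": np.array([400.0, 900.0])}
n_detections = 2

rows = []
for i in range(n_detections):
    row = {}
    for key, value in custom_data.items():
        if isinstance(value, np.ndarray) and value.ndim > 0:
            row[key] = value[i]  # per-detection array: sliced per row
        else:
            row[key] = value     # scalar: written as-is on every row
    rows.append(row)
```

Here frame_number repeats on both rows while area is sliced per row.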

@farukalamai farukalamai requested a review from SkalskiP as a code owner April 3, 2026 20:38
@Borda Borda requested a review from Copilot April 8, 2026 12:27
@Borda Borda changed the title from "fix: slice numpy array values in custom_data per row in CSVSink and J…" to "fix: slice numpy array values in custom_data per row in CSVSink" Apr 8, 2026
@codecov

codecov bot commented Apr 8, 2026

Codecov Report

❌ Patch coverage is 75.00000% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 77%. Comparing base (48035c1) to head (628f295).
⚠️ Report is 1 commit behind head on develop.

Additional details and impacted files
@@           Coverage Diff           @@
##           develop   #2199   +/-   ##
=======================================
- Coverage       77%     77%   -0%     
=======================================
  Files           62      62           
  Lines         7640    7650   +10     
=======================================
+ Hits          5919    5926    +7     
- Misses        1721    1724    +3     

Contributor

Copilot AI left a comment


Pull request overview

Fixes incorrect serialization of per-detection custom_data in CSVSink/JSONSink when users pass numpy arrays (previously the full array was written on every row), aligning output with expected “one value per detection row” behavior.

Changes:

  • Update CSVSink.parse_detection_data() to slice custom_data numpy arrays per detection row.
  • Update JSONSink.parse_detection_data() to slice custom_data numpy arrays per detection row.
  • Add a unit test ensuring CSVSink slices numpy-array custom_data per row.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File Description
src/supervision/detection/tools/csv_sink.py Slice numpy-array custom_data per detection row when producing CSV rows.
src/supervision/detection/tools/json_sink.py Apply analogous per-row slicing for numpy-array custom_data when producing JSON rows.
tests/detection/test_csv.py Add regression test covering numpy-array custom_data in CSVSink.

Comment on lines 120 to +127
  if custom_data:
-     row.update(custom_data)
+     for key, value in custom_data.items():
+         if isinstance(value, np.ndarray) and value.ndim == 0:
+             row[key] = str(value)
+         elif isinstance(value, np.ndarray):
+             row[key] = str(value[i])
+         else:
+             row[key] = value

Copilot AI Apr 8, 2026


custom_data numpy arrays are serialized using str(...), which turns numeric values into JSON strings (while other built-in fields like confidence are numbers). Consider converting numpy values to native Python scalars (e.g., via .item() for 0-d arrays and elements) so JSON output preserves numeric types and remains consistently typed.
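A sketch of that suggestion, converting numpy values to native Python scalars at slice time so JSON keeps numeric types (`to_native` is a hypothetical helper, not the merged code):

```python
import json
import numpy as np

def to_native(value, i):
    """Convert a custom_data entry into a native Python scalar for row i."""
    if isinstance(value, np.ndarray) and value.ndim == 0:
        return value.item()     # 0-d array -> float/int, not str
    if isinstance(value, np.ndarray):
        return value[i].item()  # array element -> float/int, not str
    return value

row = {"area": to_native(np.array([400.0, 900.0]), 0), "confidence": 0.9}
print(json.dumps(row))  # {"area": 400.0, "confidence": 0.9} -- numbers stay numbers
```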

Comment on lines +121 to +126
for key, value in custom_data.items():
if isinstance(value, np.ndarray) and value.ndim == 0:
row[key] = str(value)
elif isinstance(value, np.ndarray):
row[key] = str(value[i])
else:

Copilot AI Apr 8, 2026


Indexing custom_data numpy arrays with value[i] will raise IndexError if the array length doesn't match the number of detections (including the common case of a 1-element array intended as a constant). It would be safer to validate lengths and either broadcast length-1 arrays or raise a clear ValueError explaining the expected shape.
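One way to implement that validation (a sketch under the reviewer's assumptions; `validated_slice` and the error message wording are hypothetical):

```python
import numpy as np

def validated_slice(key, value, i, n_detections):
    """Slice a custom_data array per row, failing loudly on a shape mismatch."""
    if isinstance(value, np.ndarray) and value.ndim > 0:
        if len(value) == 1:
            return value[0]  # broadcast a 1-element array as a constant
        if len(value) != n_detections:
            raise ValueError(
                f"custom_data[{key!r}] has length {len(value)}, "
                f"expected {n_detections} (one value per detection)"
            )
        return value[i]
    return value
```

This turns a cryptic IndexError into an actionable message and makes the 1-element broadcast case explicit.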

Comment on lines 120 to 128
  if custom_data:
-     row.update(custom_data)
+     for key, value in custom_data.items():
+         if isinstance(value, np.ndarray) and value.ndim == 0:
+             row[key] = str(value)
+         elif isinstance(value, np.ndarray):
+             row[key] = str(value[i])
+         else:
+             row[key] = value
  parsed_rows.append(row)

Copilot AI Apr 8, 2026


The PR changes JSONSink behavior but there’s no unit test covering custom_data passed as a numpy array (similar to the new CSVSink test). Adding a test that asserts per-row slicing and JSON-serializable output would prevent regressions and confirm the fix end-to-end.

elif isinstance(value, np.ndarray):
    row[key] = value[i]
else:
    row[key] = value

Copilot AI Apr 8, 2026


custom_data slicing currently only handles np.ndarray. If a caller passes a per-detection Python sequence (e.g., list/tuple) it will still be written as the full sequence on every row. Consider mirroring the detections.data logic here (slice values that are indexable and match detection length) or explicitly documenting that only numpy arrays are supported for per-row custom values.

Suggested change
- row[key] = value
+ row[key] = value[i] if hasattr(value, "__getitem__") else value
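Note that a bare `hasattr(value, "__getitem__")` check would also index strings and dicts. A safer variant that handles lists and tuples explicitly (an illustrative helper, not part of the PR):

```python
import numpy as np

def slice_sequence(value, i, n_detections):
    """Slice lists, tuples, and 1-d arrays that match the detection count."""
    if isinstance(value, np.ndarray):
        if value.ndim > 0 and len(value) == n_detections:
            return value[i]
        return value  # 0-d or mismatched arrays pass through unchanged
    if isinstance(value, (list, tuple)) and len(value) == n_detections:
        return value[i]
    return value      # strings, dicts, and scalars are never indexed
```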

Comment on lines +147 to +152
+ for key, value in custom_data.items():
+     if isinstance(value, np.ndarray) and value.ndim == 0:
+         row[key] = value
+     elif isinstance(value, np.ndarray):
+         row[key] = value[i]
+     else:

Copilot AI Apr 8, 2026


Indexing custom_data numpy arrays with value[i] can raise IndexError when the provided array length doesn't match the number of detections (including a 1-element array intended to broadcast). Consider validating lengths and either broadcasting or raising a clear ValueError describing the expected shape to make failures easier to debug.

@Borda Borda added waiting for author bug Something isn't working labels Apr 8, 2026

Labels

bug Something isn't working waiting for author

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Save detection area with CSVSink

3 participants