
fix: is_empty() returns False for empty tracker arrays (issue #2195)#2203

Open
Zeesejo wants to merge 1 commit into roboflow:develop from Zeesejo:fix/is-empty-tracker-id

Conversation


@Zeesejo Zeesejo commented Apr 7, 2026

Problem

sv.Detections.is_empty() returned False when tracker_id was set to an empty array np.array([]) instead of None. This happened because the previous implementation compared self == Detections.empty(), and Detections.empty() sets tracker_id=None — so the equality check failed for any instance where tracker_id=[].

Minimal repro (before fix):

import numpy as np
import supervision as sv

detections = sv.Detections(
    xyxy=np.empty((0, 4), dtype=np.float32),
    tracker_id=np.array([])  # empty array, not None
)
print(detections.is_empty())  # ❌ returned False

Fix

Replaced the equality-based check with a direct length check:

def is_empty(self) -> bool:
    return len(self) == 0

This is robust to any optional field (tracker_id, confidence, class_id, etc.) being an empty array rather than None, since __len__ is based solely on the number of bounding boxes (len(self.xyxy)).
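To make the before/after behavior concrete, here is a self-contained sketch using a hypothetical `MiniDetections` stand-in (not the real `sv.Detections` class) that mirrors the essence of both emptiness checks:

```python
import numpy as np


# Hypothetical stand-in for sv.Detections, for illustration only.
class MiniDetections:
    def __init__(self, xyxy, tracker_id=None):
        self.xyxy = xyxy
        self.tracker_id = tracker_id

    def __len__(self):
        # Length is based solely on the number of bounding boxes.
        return len(self.xyxy)

    def is_empty_old(self):
        # Old approach (sketched): compare against an "empty" template whose
        # tracker_id is None. An empty array fails the None comparison.
        empty = MiniDetections(np.empty((0, 4)), tracker_id=None)
        tracker_matches = (self.tracker_id is None) == (empty.tracker_id is None)
        return len(self) == 0 and tracker_matches

    def is_empty_new(self):
        # New approach: only the box count matters.
        return len(self) == 0


d = MiniDetections(np.empty((0, 4), dtype=np.float32), tracker_id=np.array([]))
print(d.is_empty_old())  # False: tracker_id=np.array([]) is not None
print(d.is_empty_new())  # True: zero bounding boxes
```

The stand-in condenses the old field-by-field `__eq__` comparison into the single tracker-id check that caused the bug; the real implementation compared all fields.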

Fixes #2195

Previously, is_empty() used equality comparison against Detections.empty()
which sets tracker_id=None. When tracker_id was np.array([]) instead of None
(e.g., after filtering a Detections object that had a tracker_id), the __eq__
check failed even though the detection set is genuinely empty.

Fix: check len(self) == 0 directly, preserving data/metadata neutrality.

Fixes roboflow#2195
@Zeesejo Zeesejo requested a review from SkalskiP as a code owner April 7, 2026 13:12
Copilot AI review requested due to automatic review settings April 7, 2026 13:12
@CLAassistant

CLAassistant commented Apr 7, 2026

CLA assistant check
All committers have signed the CLA.

Contributor

Copilot AI left a comment


Pull request overview

Fixes sv.Detections.is_empty() returning False for empty detections when optional fields (e.g. tracker_id) are present as zero-length arrays instead of None.

Changes:

  • Reimplemented Detections.is_empty() to return len(self) == 0 (based on xyxy length only).
  • Expanded the is_empty() docstring to clarify the new behavior.
  • Removed substantial docstring example blocks from from_lmm() / from_vlm() and adjusted the __getitem__ docstring example.

Comment on lines 1266 to +1273
```diff
 def is_empty(self) -> bool:
     """
-    Returns `True` if the `Detections` object is considered empty.
+    Returns `True` if the `Detections` object is considered empty,
+    i.e. contains no detections. This check is based solely on the
+    number of bounding boxes, making it robust to optional fields
+    (such as `tracker_id`) being empty arrays rather than `None`.
     """
-    empty_detections = Detections.empty()
-    empty_detections.data = self.data
-    empty_detections.metadata = self.metadata
-    return bool(self == empty_detections)
+    return len(self) == 0
```

Copilot AI Apr 7, 2026


Add regression tests for the updated Detections.is_empty() behavior (e.g., xyxy empty with tracker_id=np.array([]) and/or other optional fields as empty arrays) to ensure the original issue (#2195) is covered and doesn't regress.

Copilot generated this review using guidance from repository custom instructions.
Comment on lines 1461 to 1466
````diff
 Example:
 ```python
 import supervision as sv

-detections = sv.Detections()
+detections = sv.Detections(...)
````

Copilot AI Apr 7, 2026


The __getitem__ docstring example uses sv.Detections(...), which isn’t runnable and is inconsistent with other docstring examples in this module that provide concrete NumPy inputs. Consider replacing it with a minimal valid construction (e.g., a small xyxy array) so the example can be executed as documentation.

Comment on lines 965 to 999
def from_lmm(
    cls, lmm: LMM | str, result: str | dict[str, Any], **kwargs: Any
) -> Detections:
"""
!!! deprecated "Deprecated"
`Detections.from_lmm` is **deprecated** and will be removed in `supervision-0.31.0`.
Please use `Detections.from_vlm` instead.

Creates a Detections object from the given result string based on the specified
Large Multimodal Model (LMM).

| Name | Enum (sv.LMM) | Tasks | Required parameters | Optional parameters |
|---------------------|----------------------|-------------------------|-----------------------------|---------------------|
| PaliGemma | `PALIGEMMA` | detection | `resolution_wh` | `classes` |
| PaliGemma 2 | `PALIGEMMA` | detection | `resolution_wh` | `classes` |
| Qwen2.5-VL | `QWEN_2_5_VL` | detection | `resolution_wh`, `input_wh` | `classes` |
| Google Gemini 2.0 | `GOOGLE_GEMINI_2_0` | detection | `resolution_wh` | `classes` |
| Google Gemini 2.5 | `GOOGLE_GEMINI_2_5` | detection, segmentation | `resolution_wh` | `classes` |
| Moondream | `MOONDREAM` | detection | `resolution_wh` | |
| DeepSeek-VL2 | `DEEPSEEK_VL_2` | detection | `resolution_wh` | `classes` |

Args:
lmm: The type of LMM (Large Multimodal Model) to use.
result: The result string containing the detection data.
**kwargs: Additional keyword arguments required by the specified LMM.

Returns:
A new Detections object.

Raises:
ValueError: If the LMM is invalid, required arguments are missing, or
disallowed arguments are provided.
ValueError: If the specified LMM is not supported.

!!! example "PaliGemma"
```python

import supervision as sv

paligemma_result = "<loc0256><loc0256><loc0768><loc0768> cat"
detections = sv.Detections.from_lmm(
    sv.LMM.PALIGEMMA,
    paligemma_result,
    resolution_wh=(1000, 1000),
    classes=['cat', 'dog']
)
detections.xyxy
# array([[250., 250., 750., 750.]])

detections.class_id
# array([0])

detections.data
# {'class_name': array(['cat'], dtype='<U10')}
```

!!! example "Qwen2.5-VL"

??? tip "Prompt engineering"

To get the best results from Qwen2.5-VL, use clear and descriptive prompts
that specify exactly what you want to detect.

**For general object detection, use this comprehensive prompt:**

```
Detect all objects in the image and return their locations and labels.
```

**For specific object detection with detailed descriptions:**

```
Detect the red object that is leading in this image and return its location and label.
```

**For simple, targeted detection:**

```
leading blue truck
```

**Additional effective prompts:**

```
Find all people and vehicles in this scene
```

```
Locate all animals in the image
```

```
Identify traffic signs and their positions
```

**Tips for better results:**

- Use descriptive language that clearly specifies what to look for
- Include color, size, or position descriptors when targeting specific objects
- Be specific about the type of objects you want to detect
- The model responds well to both detailed instructions and concise phrases
- Results are returned in JSON format with `bbox_2d` coordinates and `label` fields


```python
import supervision as sv

qwen_2_5_vl_result = \"\"\"```json
[
    {"bbox_2d": [139, 768, 315, 954], "label": "cat"},
    {"bbox_2d": [366, 679, 536, 849], "label": "dog"}
]
```\"\"\"
detections = sv.Detections.from_lmm(
    sv.LMM.QWEN_2_5_VL,
    qwen_2_5_vl_result,
    input_wh=(1000, 1000),
    resolution_wh=(1000, 1000),
    classes=['cat', 'dog'],
)
detections.xyxy
# array([[139., 768., 315., 954.], [366., 679., 536., 849.]])

detections.class_id
# array([0, 1])

detections.data
# {'class_name': array(['cat', 'dog'], dtype='<U10')}

```

!!! example "Qwen3-VL"

```python
import supervision as sv

qwen_3_vl_result = \"\"\"```json
[
    {"bbox_2d": [139, 768, 315, 954], "label": "cat"},
    {"bbox_2d": [366, 679, 536, 849], "label": "dog"}
]
```\"\"\"
detections = sv.Detections.from_lmm(
    sv.LMM.QWEN_3_VL,
    qwen_3_vl_result,
    resolution_wh=(1000, 1000),
    classes=['cat', 'dog'],
)
detections.xyxy
# array([[139., 768., 315., 954.], [366., 679., 536., 849.]])

detections.class_id
# array([0, 1])

detections.data
# {'class_name': array(['cat', 'dog'], dtype='<U10')}

```

!!! example "Gemini 2.0"

??? tip "Prompt engineering"

From Gemini 2.0 onwards, models are further trained to detect objects in
an image and get their bounding box coordinates. The coordinates,
relative to image dimensions, scale to [0, 1000]. You need to convert
these normalized coordinates back to pixel coordinates using your
original image size.

According to the Gemini API documentation on image prompts (see
https://ai.google.dev/gemini-api/docs/vision#image-input), when using a
single image with text, the recommended approach is to place the text
prompt after the image part in the contents array. This ordering has
been shown to produce significantly better results in practice.

For example, when calling the Gemini API directly, you can structure
the request like this, with the image part first and the text prompt
second in the `parts` list:

```json
{
  "model": "models/gemini-2.0-flash",
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "inline_data": {
            "mime_type": "image/png",
            "data": "<BASE64_IMAGE_BYTES>"
          }
        },
        {
          "text": "Detect all the cats and dogs in the image..."
        }
      ]
    }
  ]
}
```
To get the best results from Google Gemini 2.0, use the following prompt.

```
Detect all the cats and dogs in the image. The box_2d should be
[ymin, xmin, ymax, xmax] normalized to 0-1000.
```

```python
import supervision as sv

gemini_response_text = \"\"\"```json
[
    {"box_2d": [543, 40, 728, 200], "label": "cat", "id": 1},
    {"box_2d": [653, 352, 820, 522], "label": "dog", "id": 2}
]
```\"\"\"

detections = sv.Detections.from_lmm(
    sv.LMM.GOOGLE_GEMINI_2_0,
    gemini_response_text,
    resolution_wh=(1000, 1000),
    classes=['cat', 'dog'],
)

detections.xyxy
# array([[543., 40., 728., 200.], [653., 352., 820., 522.]])

detections.data
# {'class_name': array(['cat', 'dog'], dtype='<U26')}

detections.class_id
# array([0, 1])
```

!!! example "Gemini 2.5"

??? tip "Prompt engineering"

To get the best results from Google Gemini 2.5, use the following prompt.

This prompt is designed to detect all visible objects in the image,
including small, distant, or partially visible ones, and to return
tight bounding boxes.

According to the Gemini API documentation on image prompts, when using
a single image with text, the recommended approach is to place the text
prompt after the image part in the `contents` array. See the official
Gemini vision docs for details:
https://ai.google.dev/gemini-api/docs/vision#multi-part-input

For example, using the `google-generativeai` client:

```python
from google.generativeai import types

response = model.generate_content(
    contents=[
        types.Part.from_image(image_bytes),
        "Carefully examine this image and detect ALL visible objects, including "
        "small, distant, or partially visible ones.",
    ],
    generation_config=generation_config,
    safety_settings=safety_settings,
)
```

This ordering (image first, then text) has been shown to produce
significantly better results in practice.

```
Carefully examine this image and detect ALL visible objects, including
small, distant, or partially visible ones.

IMPORTANT: Focus on finding as many objects as possible, even if you are
only moderately confident.

Make sure each bounding box is as tight as possible.

Valid object classes: {class_list}

For each detected object, provide:
- "label": the exact class name from the list above
- "confidence": your certainty (between 0.0 and 1.0)
- "box_2d": the bounding box [ymin, xmin, ymax, xmax] normalized to 0-1000
- "mask": the binary mask of the object as a base64-encoded string

Detect everything that matches the valid classes. Do not be
conservative; include objects even with moderate confidence.

Return a JSON array, for example:
[
  {
    "label": "person",
    "confidence": 0.95,
    "box_2d": [100, 200, 300, 400],
    "mask": "..."
  },
  {
    "label": "kite",
    "confidence": 0.80,
    "box_2d": [50, 150, 250, 350],
    "mask": "..."
  }
]
```

When using the google-genai library, it is recommended to set
thinking_budget=0 in thinking_config for more direct and faster responses.

```python
from google.generativeai import types

model.generate_content(
    ...,
    generation_config=generation_config,
    safety_settings=safety_settings,
    thinking_config=types.ThinkingConfig(
        thinking_budget=0
    )
)
```

For a shorter prompt focused only on segmentation masks, you can use:

```
Return a JSON list of segmentation masks. Each entry should include the
2D bounding box in the "box_2d" key, the segmentation mask in the "mask"
key, and the text label in the "label" key. Use descriptive labels.
```

```python
import supervision as sv

gemini_response_text = \"\"\"```json
[
    {"box_2d": [543, 40, 728, 200], "label": "cat", "id": 1},
    {"box_2d": [653, 352, 820, 522], "label": "dog", "id": 2}
]
```\"\"\"

detections = sv.Detections.from_lmm(
    sv.LMM.GOOGLE_GEMINI_2_5,
    gemini_response_text,
    resolution_wh=(1000, 1000),
    classes=['cat', 'dog'],
)

detections.xyxy
# array([[543., 40., 728., 200.], [653., 352., 820., 522.]])

detections.data
# {'class_name': array(['cat', 'dog'], dtype='<U26')}

detections.class_id
# array([0, 1])
```

!!! example "Moondream"


??? tip "Prompt engineering"

To get the best results from Moondream, use optimized prompts that leverage
its object detection capabilities effectively.

**For general object detection, use this simple prompt:**

```
objects
```

This single-word prompt instructs Moondream to detect all visible objects
and return them in the proper JSON format with normalized coordinates.


```python
import supervision as sv

moondream_result = {
    'objects': [
        {
            'x_min': 0.5704046934843063,
            'y_min': 0.20069346576929092,
            'x_max': 0.7049859315156937,
            'y_max': 0.3012596592307091
        },
        {
            'x_min': 0.6210969910025597,
            'y_min': 0.3300672620534897,
            'x_max': 0.8417936339974403,
            'y_max': 0.4961046129465103
        }
    ]
}

detections = sv.Detections.from_lmm(
    sv.LMM.MOONDREAM,
    moondream_result,
    resolution_wh=(1000, 1000),
)

detections.xyxy
# array([[1752.28, 818.82, 2165.72, 1229.14],
# [1908.01, 1346.67, 2585.99, 2024.11]])
```

!!! example "DeepSeek-VL2"


??? tip "Prompt engineering"

To get the best results from DeepSeek-VL2, use optimized prompts that leverage
its object detection and visual grounding capabilities effectively.

**For general object detection, use the following user prompt:**

```
<image>\\n<|ref|>The giraffe at the front<|/ref|>
```

**For visual grounding, use the following user prompt:**

```
<image>\\n<|grounding|>Detect the giraffes
```

```python
from PIL import Image
import supervision as sv

deepseek_vl2_result = "<|ref|>The giraffe at the back<|/ref|><|det|>[[580, 270, 999, 904]]<|/det|><|ref|>The giraffe at the front<|/ref|><|det|>[[26, 31, 632, 998]]<|/det|><|end▁of▁sentence|>"

detections = sv.Detections.from_vlm(
    vlm=sv.VLM.DEEPSEEK_VL_2, result=deepseek_vl2_result, resolution_wh=image.size
)

detections.xyxy
# array([[ 420, 293, 724, 982],
# [ 18, 33, 458, 1084]])

detections.class_id
# array([0, 1])

detections.data
# {'class_name': array(['The giraffe at the back', 'The giraffe at the front'], dtype='<U24')}
```
""" # noqa: E501


Copilot AI Apr 7, 2026


This PR removes large docstring example blocks from from_lmm/from_vlm, but the PR description only discusses the is_empty() behavior change. If the documentation removal is intentional, it should be mentioned in the PR description (or split into a separate docs-focused PR) to avoid surprising downstream docs consumers.

Member

@Borda Borda left a comment


This PR went too wild with removing a large portion of the docs; please focus only on the described change and add relevant tests.

@Borda Borda added bug Something isn't working waiting for author labels Apr 8, 2026

Labels

bug Something isn't working waiting for author

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Inconsistent behavior of sv.Detections.is_empty() if tracker_id is not None

4 participants