
fix(one shot load): adding metadata loading when reading from a dataset after writing #3305

Open
CarolinePascal wants to merge 1 commit into main from fix/one-shot-load

Conversation

Collaborator

@CarolinePascal CarolinePascal commented Apr 7, 2026

Type / Scope

  • Type: Bug
  • Scope: LeRobotDataset

Summary / Motivation

This PR adds metadata loading when reading from a dataset right after writing it (i.e. without re-instantiating a LeRobotDataset from the existing on-disk dataset).

Related issues

  • Fixes / Closes: # (if any)
  • Related: # (if any)

What changed

Add self._meta._load_metadata() in the DatasetReader.load_and_activate() method.
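To illustrate why this one-line change matters, here is a minimal self-contained toy (NOT the lerobot implementation; all class and attribute names below are hypothetical): a reader builds its index mapping from metadata, so if the metadata is not reloaded after a write, the index stays empty and indexing fails.

```python
# Illustrative toy of the read-after-write pattern fixed by this PR.
# ToyMetadata / ToyReader and the dict-based "store" are stand-ins for
# LeRobotDatasetMetadata, DatasetReader, and the on-disk metadata files.

class ToyMetadata:
    def __init__(self, store):
        self._store = store      # stands in for the metadata files on disk
        self.total_frames = 0    # stale until load() is called

    def load(self):
        # Re-read metadata from the backing store (disk, in the real dataset).
        self.total_frames = self._store["total_frames"]


class ToyReader:
    def __init__(self, store):
        self._meta = ToyMetadata(store)
        self._index = []

    def load_and_activate(self):
        # The fix: refresh metadata BEFORE building the index mapping,
        # so frames written since the last load become visible.
        self._meta.load()
        self._index = list(range(self._meta.total_frames))

    def __getitem__(self, i):
        return self._index[i]


store = {"total_frames": 0}
reader = ToyReader(store)
store["total_frames"] = 40   # simulate writing 2 episodes x 20 frames
reader.load_and_activate()
print(reader[0])             # works only because metadata was reloaded first
```

Without the `self._meta.load()` call inside `load_and_activate()`, `total_frames` would still be 0 and `reader[0]` would raise IndexError, which mirrors the failure described below.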

How was this tested (or how to run locally)

Here's a minimal reproducible example:

import shutil
import tempfile
from pathlib import Path
import numpy as np

from lerobot.datasets.lerobot_dataset import LeRobotDataset

def main():
    root = Path(tempfile.mkdtemp()) / "dummy_dataset"

    repo_id = "examples/dummy_dataset"
    fps = 10
    features = {
        "observation.image": {
            "dtype": "video",
            "shape": (64, 64, 3),
            "names": ["height", "width", "channels"],
        },
    }

    dataset = LeRobotDataset.create(
        repo_id=repo_id,
        fps=fps,
        features=features,
        root=root,
    )

    num_episodes = 2
    frames_per_episode = 20
    for ep in range(num_episodes):
        for step in range(frames_per_episode):
            frame = {
                "observation.image": np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8),
                "task": "reach the red block",
            }
            dataset.add_frame(frame)

        dataset.save_episode()
    dataset.finalize()

    try:
        # This line will fail without the suggested fix!
        first_frame = dataset[0]
    finally:
        shutil.rmtree(root.parent)

if __name__ == "__main__":
    main()

Checklist (required before merge)

  • Linting/formatting run (pre-commit run -a)
  • All tests pass locally (pytest)
  • Documentation updated
  • CI is green

Reviewer notes

  • Anything the reviewer should focus on (performance, edge-cases, specific files) or general notes.
  • Anyone in the community is free to review the PR.

Copilot AI review requested due to automatic review settings April 7, 2026 13:19
@github-actions github-actions bot added the dataset Issues regarding data inputs, processing, or datasets label Apr 7, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR fixes a read-after-write bug in the LeRobotDataset read path by ensuring dataset metadata is loaded from disk before activating the Hugging Face dataset and building the index mapping—particularly important when reading immediately after recording/finalizing without re-instantiating the dataset.

Changes:

  • Reload metadata from disk in DatasetReader.load_and_activate() before loading the HF dataset.
  • Ensures episode/task metadata needed by __getitem__ (especially for video-backed datasets) is available in one-shot read-after-write flows.


Comment on lines 99 to +101
def load_and_activate(self) -> None:
    """Load HF dataset from disk and build index mapping. Call after data is on disk."""
    self._meta._load_metadata()

Copilot AI Apr 7, 2026


load_and_activate() now relies on LeRobotDatasetMetadata._load_metadata(), which is a private method on another class. To avoid coupling to a private API (and potential future breakage), consider adding/using a public metadata refresh method (e.g., reload() / load_from_disk()) on LeRobotDatasetMetadata and calling that here instead.

Suggested change

Before:

def load_and_activate(self) -> None:
    """Load HF dataset from disk and build index mapping. Call after data is on disk."""
    self._meta._load_metadata()

After:

def _refresh_metadata(self) -> None:
    """Refresh dataset metadata using a public API exposed by the metadata object."""
    for method_name in ("reload", "load_from_disk"):
        refresh = getattr(self._meta, method_name, None)
        if callable(refresh):
            refresh()
            return
    raise AttributeError(
        "LeRobotDatasetMetadata must expose a public metadata refresh method "
        "(for example, 'reload()' or 'load_from_disk()') before "
        "DatasetReader.load_and_activate() can refresh metadata."
    )

def load_and_activate(self) -> None:
    """Load HF dataset from disk and build index mapping. Call after data is on disk."""
    self._refresh_metadata()

Copilot uses AI. Check for mistakes.
Comment on lines 99 to 103
def load_and_activate(self) -> None:
    """Load HF dataset from disk and build index mapping. Call after data is on disk."""
    self._meta._load_metadata()
    self.hf_dataset = self._load_hf_dataset()
    self._build_index_mapping()

Copilot AI Apr 7, 2026


This change fixes a read-after-write path (metadata reloaded before building the index), but there’s no regression test covering the failing scenario described in the PR (create dataset with video features, write episodes, finalize, then access dataset[0] without re-instantiating). Adding a test in the existing DatasetReader/LeRobotDataset test suite would prevent this from regressing again.


Labels

dataset Issues regarding data inputs, processing, or datasets

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants