feat(rewards): add VITA reward model integration with adaptation and outer-loop training by 0xPraedico · Pull Request #3269 · huggingface/lerobot

0xPraedico · 2026-04-02T17:53:43Z

Title

feat(rewards): add VITA reward model integration with adaptation and outer-loop training

Type / Scope

Type: Feature
Scope: rewards / vita

Summary / Motivation

This PR adds VITA as a first-class reward model under the new reward-model foundation in src/lerobot/rewards/.

The implementation follows the core ideas of the VITA paper (arXiv:2506.10085).

It implements the paper-aligned core (multimodal latent, P_K/P_V/P_Q, fast-weight adaptation, sequential test-time updates, and a reward head), and also introduces an initial meta-learning outer-loop training path while keeping the backbone pluggable and lightweight.
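To make the adaptation mechanism concrete, here is a minimal, dependency-free sketch of sequential fast-weight updates. It is not the code in src/lerobot/rewards/vita/. It assumes that P_K, P_V, and P_Q are linear projections of the latent, that the inner loss is the squared error between W k and v, and that the reward head is linear. All function names are hypothetical.

```python
# Illustrative sketch only. The linear projections, the squared-error inner
# loss, and the linear reward head are assumptions, not the PR's actual code.


def matvec(m, x):
    """Multiply matrix m (a list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def inner_loss(w, k, v):
    """Self-supervised inner loss: squared error between W k and v."""
    return sum((p - t) ** 2 for p, t in zip(matvec(w, k), v))


def adapt_step(w, k, v, lr):
    """One gradient step on the fast weights W for the loss ||W k - v||^2."""
    err = [p - t for p, t in zip(matvec(w, k), v)]
    # dL/dW = 2 * err * k^T, applied row by row.
    return [
        [w_ij - lr * 2.0 * e * k_j for w_ij, k_j in zip(row, k)]
        for row, e in zip(w, err)
    ]


def run_episode(latents, p_k, p_v, p_q, head, lr=0.1):
    """Adapt W step by step over one episode and emit a reward per step."""
    dim = len(p_k)
    w = [[0.0] * dim for _ in range(dim)]  # fast weights start fresh per episode
    rewards, losses = [], []
    for z in latents:
        k, v, q = matvec(p_k, z), matvec(p_v, z), matvec(p_q, z)
        losses.append(inner_loss(w, k, v))  # loss before this step's update
        w = adapt_step(w, k, v, lr)         # sequential test-time update
        h = matvec(w, q)                    # read out with the adapted weights
        rewards.append(sum(a * b for a, b in zip(head, h)))  # linear reward head
    return rewards, losses


# Toy episode: the same 2-D latent observed three times.
p_k = [[1.0, 0.0], [0.0, 1.0]]
p_v = [[0.0, 1.0], [1.0, 0.0]]
p_q = [[1.0, 0.0], [0.0, 1.0]]
head = [1.0, 1.0]
rewards, losses = run_episode([[1.0, 0.0]] * 3, p_k, p_v, p_q, head)
print("rewards:", rewards)
print("inner losses:", losses)
```

Because the toy episode repeats one latent, the inner loss shrinks at every step, which is the behavior the test-time update is meant to produce. In this sketch the fast weights are reset at the start of each episode, and only the projections and the head would be trained by an outer loop.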

The goal is to provide a clean, mergeable baseline integration consistent with LeRobot’s reward architecture.

Related issues

#3143

What changed

Added new VITA module files:
- src/lerobot/rewards/vita/configuration_vita.py
- src/lerobot/rewards/vita/adaptation.py
- src/lerobot/rewards/vita/modeling_vita.py
- src/lerobot/rewards/vita/processor_vita.py
- src/lerobot/rewards/vita/__init__.py
Registered VITA in reward registry/factory:
- src/lerobot/rewards/__init__.py
- src/lerobot/rewards/factory.py
Added tests:
- tests/rewards/test_vita_reward.py
- updated tests/rewards/test_reward_model_base.py
Added docs:
- docs/source/vita.mdx
- updated docs/source/_toctree.yml

How was this tested

Ran:

PYTHONPATH=src .venv/bin/pytest -q tests/rewards/test_vita_reward.py tests/rewards/test_reward_model_base.py

Next steps (future work)

To get closer to a full paper-level VITA implementation, the following are planned:

Meta-learning refinement
- Extend the current outer loop into a full episodic meta-training setup (support/query protocol aligned with the paper).
- Add first-order vs higher-order gradient options and evaluate stability/performance trade-offs.
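The first-order versus higher-order distinction can be illustrated on a scalar toy problem. The sketch below is an assumption-laden example with hypothetical names, not the planned implementation. It uses one inner step on a support pair (k, v) and a squared-error query loss on (q, r). The full meta-gradient differentiates through the inner step, and the first-order variant treats the adapted weight as if it did not depend on the initial weight.

```python
# Scalar toy problem. All names are hypothetical and not taken from the PR.


def inner_update(w, k, v, lr):
    """One inner step on L_in(w) = 0.5 * (w * k - v) ** 2."""
    return w - lr * (w * k - v) * k


def outer_loss(w, k, v, q, r, lr):
    """Query loss evaluated after adapting on the support pair (k, v)."""
    w_adapted = inner_update(w, k, v, lr)
    return 0.5 * (w_adapted * q - r) ** 2


def meta_grad(w, k, v, q, r, lr, first_order):
    """Gradient of outer_loss with respect to the initial weight w."""
    w_adapted = inner_update(w, k, v, lr)
    grad_at_adapted = (w_adapted * q - r) * q
    # Full gradient: chain rule through the inner step,
    # d(w_adapted)/dw = 1 - lr * k**2 (this carries the inner-loss Hessian).
    # First-order: approximate that Jacobian by 1 and skip the Hessian term.
    jacobian = 1.0 if first_order else 1.0 - lr * k * k
    return grad_at_adapted * jacobian


args = dict(k=2.0, v=3.0, q=1.5, r=3.0, lr=0.1)
full = meta_grad(0.5, first_order=False, **args)
approx = meta_grad(0.5, first_order=True, **args)
print("full meta-gradient:", full)
print("first-order approximation:", approx)
```

The two gradients differ by the factor 1 - lr * k**2. The first-order variant is cheaper because it never needs second derivatives of the inner loss, but its direction is only approximate, which is the stability/performance trade-off mentioned in the bullet above.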
Native VLM backbone integration
- Integrate a real pretrained vision-language backbone (instead of pre-encoded feature inputs).
- Add processor wiring for image/text tokenization and feature extraction in the VITA pipeline.
- Add explicit freezing/unfreezing strategies for backbone components.
Paper-aligned sampling strategy
- Implement the trajectory/episode sampling strategy described in the paper.
- Add configurable sampling knobs in VitaConfig and dedicated tests for sampling behavior.
Evaluation and benchmarks
- Reproduce key paper metrics on relevant tasks/datasets.
- Add benchmark scripts and reporting to compare against existing reward models in LeRobot.

Add VITA reward model integration with test-time adaptation and meta-learning outer loop

820aa79

0xPraedico mentioned this pull request Apr 2, 2026

Reward Models: call for contributions #3143

Open

imstevenpmwork self-assigned this Apr 3, 2026

s1lent4gnt self-assigned this Apr 8, 2026


0xPraedico wants to merge 1 commit into huggingface:refactor/reward-models from 0xPraedico:feat/vita-reward-model


3 participants
