
feat(rewards): add VITA reward model integration with adaptation and outer-loop training#3269

Open
0xPraedico wants to merge 1 commit into huggingface:refactor/reward-models from 0xPraedico:feat/vita-reward-model

Conversation

@0xPraedico
Contributor

Title

feat(rewards): add VITA reward model integration with adaptation and outer-loop training

Type / Scope

  • Type: Feature
  • Scope: rewards / vita

Summary / Motivation

This PR adds VITA as a first-class reward model under the new reward-model foundation in src/lerobot/rewards/.

The implementation follows the core ideas of the VITA paper: arXiv:2506.10085.

It implements the paper-aligned core (multimodal latent, P_K/P_V/P_Q projections, fast-weight adaptation, sequential test-time updates, and a reward head) and introduces an initial meta-learning outer-loop training path, while keeping the backbone pluggable and lightweight.
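The fast-weight core described above can be pictured with a minimal NumPy sketch: project a latent into key/value/query spaces, write into a fast-weight matrix at test time, and read it out through a reward head. All names, shapes, and the delta-style update rule below are illustrative assumptions, not the actual code in this PR.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # hypothetical latent dimension

# Slow (meta-learned) parameters: projections and a reward head.
P_K, P_V, P_Q = (rng.standard_normal((d, d)) for _ in range(3))
w_head = rng.standard_normal(d)

# Fast weights, adapted sequentially at test time.
W = np.zeros((d, d))

def adapt(W, z, lr=0.1):
    """One fast-weight update from a latent z: write the value at its key."""
    k, v = P_K @ z, P_V @ z
    return W + lr * np.outer(v, k)

def reward(W, z):
    """Read the fast weights with a query and score with the reward head."""
    q = P_Q @ z
    return float(w_head @ (W @ q))

z = rng.standard_normal(d)  # stand-in for a multimodal latent
for _ in range(3):          # sequential test-time updates
    W = adapt(W, z)
r = reward(W, z)
```

This is only a sketch of the mechanism; the real module wires these pieces through a pluggable backbone and config.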

The goal is to provide a clean, mergeable baseline integration consistent with LeRobot’s reward architecture.

Related issues

#3143

What changed

  • Added new VITA module files:
    • src/lerobot/rewards/vita/configuration_vita.py
    • src/lerobot/rewards/vita/adaptation.py
    • src/lerobot/rewards/vita/modeling_vita.py
    • src/lerobot/rewards/vita/processor_vita.py
    • src/lerobot/rewards/vita/__init__.py
  • Registered VITA in reward registry/factory:
    • src/lerobot/rewards/__init__.py
    • src/lerobot/rewards/factory.py
  • Added tests:
    • tests/rewards/test_vita_reward.py
    • updated tests/rewards/test_reward_model_base.py
  • Added docs:
    • docs/source/vita.mdx
    • updated docs/source/_toctree.yml

How was this tested

Ran:

PYTHONPATH=src .venv/bin/pytest -q tests/rewards/test_vita_reward.py tests/rewards/test_reward_model_base.py

Next steps (future work)

To get closer to a full paper-level VITA implementation, the following are planned:

  1. Meta-learning refinement

    • Extend the current outer-loop into a full episodic meta-training setup (support/query protocol aligned with the paper).
    • Add first-order vs higher-order gradient options and evaluate stability/performance trade-offs.
  2. Native VLM backbone integration

    • Integrate a real pretrained vision-language backbone (instead of pre-encoded feature inputs).
    • Add processor wiring for image/text tokenization and feature extraction in the VITA pipeline.
    • Add explicit freezing/unfreezing strategies for backbone components.
  3. Paper-aligned sampling strategy

    • Implement the trajectory/episode sampling strategy described in the paper.
    • Add configurable sampling knobs in VitaConfig and dedicated tests for sampling behavior.
  4. Evaluation and benchmarks

    • Reproduce key paper metrics on relevant tasks/datasets.
    • Add benchmark scripts and reporting to compare against existing reward models in LeRobot.
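As a rough illustration of the first-order option mentioned in item 1, a Reptile-style outer step adapts on a task with plain gradient steps and then moves the meta-parameters toward the adapted ones, avoiding differentiation through the inner loop. The toy quadratic objective and step sizes below are purely illustrative assumptions.

```python
import numpy as np

def inner_adapt(theta, grad_fn, lr=0.1, steps=5):
    """Plain gradient descent on one task (the inner loop)."""
    for _ in range(steps):
        theta = theta - lr * grad_fn(theta)
    return theta

# Toy task: minimize ||theta - target||^2.
target = np.array([0.5, 0.5])
grad = lambda th: 2.0 * (th - target)

theta = np.array([2.0, -1.0])           # meta-parameters
before = np.linalg.norm(theta - target)

adapted = inner_adapt(theta, grad)
theta = theta + 0.5 * (adapted - theta)  # first-order outer step (Reptile-style)

after = np.linalg.norm(theta - target)
```

A higher-order variant would instead backpropagate through `inner_adapt`, which is more faithful to episodic meta-training but costlier and sometimes less stable; that trade-off is what item 1 proposes to evaluate.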

