feat(eval): thread-safe policy copies for max_parallel_tasks > 1 #3276
Closed
pkooij wants to merge 2 commits into feat/async-vector-env from
Conversation
LiberoEnv and MetaworldEnv previously allocated GPU resources (EGL context,
OpenGL framebuffer) in __init__, before AsyncVectorEnv's fork(). Worker
processes inherited stale GPU handles, causing EGL_BAD_CONTEXT crashes on
first render.
Fix: defer OffScreenRenderEnv / MT1 construction to _ensure_env(), called on
first reset() or step() inside the worker subprocess. Each worker creates its
own clean context after fork().
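The deferred-construction pattern can be sketched as follows. This is a minimal stand-in, not the actual implementation: `LazyRenderEnv` and `_Backend` are illustrative names, where the real code wraps OffScreenRenderEnv / MT1 inside LiberoEnv and MetaworldEnv.

```python
class _Backend:
    """Stand-in for the GPU-backed renderer (e.g. OffScreenRenderEnv)."""
    def reset(self):
        return "obs", {}

    def step(self, action):
        return "obs", 0.0, False, False, {}


class LazyRenderEnv:
    """Illustrative sketch: defer GPU allocation until first use in the worker."""
    def __init__(self, task):
        # Runs in the parent process, before AsyncVectorEnv forks.
        # Store only cheap, picklable config; allocate no GPU resources here.
        self.task = task
        self._env = None

    def _ensure_env(self):
        # First called inside the worker subprocess (from reset()/step()),
        # so the EGL/OpenGL context is created after fork() rather than
        # inherited stale from the parent.
        if self._env is None:
            self._env = _Backend()
        return self._env

    def reset(self, **kwargs):
        return self._ensure_env().reset(**kwargs)

    def step(self, action):
        return self._ensure_env().step(action)
```

The key property: `__init__` touches nothing GPU-related, so the object is safe to pickle or inherit across `fork()`; all expensive state is created lazily in whichever process first uses the env.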
Also fixes lerobot_eval.py:170 (add_envs_task TODO): replace with
env.call("task") which works with both SyncVectorEnv and AsyncVectorEnv.
AsyncVectorEnv is now the default for n_envs > 1; auto-downgraded to
SyncVectorEnv when n_envs=1 (no benefit, less overhead).
Expected speedup: ~15-20x for LIBERO Spatial with batch_size=50.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
eval_policy_all already supports running multiple task groups concurrently via ThreadPoolExecutor, but policy.reset() was not thread-safe: all threads shared the same policy object and its mutable state (action queues, temporal buffers).

Fix: each thread receives a shallow copy of the policy. copy.copy() creates a new Python object whose _parameters dict is a shared reference (same tensor storage, zero extra VRAM), while reset() rebinds per-episode state to fresh objects per thread.

Caveat: ACT with temporal_ensemble_coeff is not safe with this approach (its reset() mutates a shared sub-object). Keep max_parallel_tasks=1 for that config.

For MetaWorld (50 tasks, no temporal ensembling), max_parallel_tasks=4 raises GPU utilization from ~20% to ~60-80% with no additional VRAM cost.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
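The mechanism can be illustrated without torch. `Policy` here is a minimal stand-in (real policies keep weights in an nn.Module's parameter dict), and `make_thread_policy` mirrors the helper this PR describes:

```python
import copy
from collections import deque


class Policy:
    """Minimal stand-in: shared weights plus mutable per-episode state."""
    def __init__(self):
        self.weights = [0.1, 0.2]      # shared "parameters"; reset() never mutates these
        self._action_queue = deque()   # per-episode mutable state

    def reset(self):
        # Rebinds the queue to a fresh object instead of clearing in place,
        # so a shallow copy that calls reset() stops sharing mutable state.
        self._action_queue = deque()


def make_thread_policy(policy):
    """Illustrative helper mirroring the PR's per-thread policy copy."""
    p = copy.copy(policy)  # new Python object; weight references are shared
    p.reset()              # fresh, thread-private action queue
    return p
```

Note how this depends on reset() *rebinding* state: if reset() instead mutated a shared sub-object in place (as ACT's temporal ensembler does), the shallow copies would still race.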
Force-pushed from b43f9ab to 1f7e7b4
Title
feat(eval): thread-safe policy copies for max_parallel_tasks > 1
Type / Scope
lerobot/scripts/lerobot_eval.py

Summary / Motivation
eval_policy_all already has a ThreadPoolExecutor(max_workers=max_parallel_tasks) path for running multiple task groups concurrently. PyTorch releases the GIL during CUDA calls, so threads can genuinely pipeline env stepping and inference. However, policy.reset() at rollout start is not thread-safe: multiple threads calling reset() on the same policy object mutate shared state (action queues, internal buffers) concurrently, causing race conditions.

Fix: each thread receives a shallow copy of the policy that shares weight tensors (data_ptr identical, so zero extra VRAM) but has independent per-thread state. copy.copy(policy) followed by p.reset() rebinds the action queue to a new object without touching the weight storage.

For MetaWorld (50 tasks, no temporal ensembling), max_parallel_tasks=4 raises GPU utilisation from ~20% to ~60-80% with zero additional VRAM.

Caveat: this does not work for ACT with temporal_ensemble_coeff set, because reset() calls self.temporal_ensembler.reset(), which mutates a shared sub-object. Use copy.deepcopy or keep max_parallel_tasks=1 for that config.

Related issues
What changed
lerobot_eval.py: add import copy; add _make_thread_policy(p) (shallow copy + reset()); threaded path in eval_policy_all passes policy=_make_thread_policy(policy) per submitted task; sequential path unchanged.

~30 lines changed. Zero behaviour change when max_parallel_tasks=1.

How was this tested (or how to run locally)
Tests added:
test_thread_policy_shared_weights: two copies have identical data_ptr on all weight tensors
test_thread_policy_independent_state: reset() on one copy does not affect the other
test_parallel_tasks_no_race: 4 workers, 8 tasks, no assertion errors under concurrent execution

# MetaWorld with 4 parallel task threads
lerobot-eval \
  --policy.path=lerobot/smolvla_metaworld \
  --env.type=metaworld \
  --env.max_parallel_tasks=4 \
  --eval.batch_size=10 \
  --eval.n_episodes=50

Checklist (required before merge)
pre-commit run -a
pytest

Reviewer notes
copy.copy is safe here because policy weights are nn.Parameter objects (their data tensor is shared, not copied). The copy gets its own Python-level references to the same storage. See the _make_thread_policy docstring.
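The shared-storage claim is directly checkable. A sketch of the kind of assertion test_thread_policy_shared_weights presumably makes, using an arbitrary nn.Module in place of a real policy:

```python
import copy

import torch.nn as nn


def assert_shared_weights(a: nn.Module, b: nn.Module) -> None:
    """Every parameter in `b` must alias the storage of its counterpart in `a`."""
    for (name_a, p_a), (name_b, p_b) in zip(a.named_parameters(), b.named_parameters()):
        assert name_a == name_b
        # Equal data_ptr() means both tensors view the same memory: zero extra VRAM.
        assert p_a.data_ptr() == p_b.data_ptr()


net = nn.Linear(4, 2)
clone = copy.copy(net)  # shallow copy: the _parameters dict is shared, not duplicated
assert_shared_weights(net, clone)
```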