fix(envs): handle async vector env compatibility for Libero#3321
fix(envs): handle async vector env compatibility for Libero#3321reeceomahoney wants to merge 5 commits intohuggingface:mainfrom
Conversation
Without persistent_workers, the dataloader cycle() tears down and respawns workers each epoch, leaking threads until the OS limit is hit. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ustion" This reverts commit c0ce2e9.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR fixes compatibility issues when using AsyncVectorEnv with Libero by avoiding fork-related crashes and removing direct reliance on env.envs (which isn’t available for async vector envs).
Changes:
- Create async vector envs with
context="spawn"to avoid fork-safety issues (MuJoCo/OpenGL). - Add a helper to probe sub-env capabilities in both sync and async vector envs.
- Adjust env-utility checks and task extraction to work when sub-envs live in subprocesses.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/lerobot/envs/utils.py | Adds async-safe attribute probing and updates task/type utility logic accordingly |
| src/lerobot/envs/factory.py | Switches async env construction to use multiprocessing spawn context |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def _has_env_attr(env: gym.vector.VectorEnv, attr: str) -> bool: | ||
| """Check if sub-environments have an attribute, compatible with sync and async vector envs.""" | ||
| if hasattr(env, "envs"): | ||
| return hasattr(env.envs[0], attr) | ||
| try: | ||
| env.call(attr) | ||
| return True | ||
| except Exception: | ||
| return False |
There was a problem hiding this comment.
_has_env_attr() swallows all exceptions from env.call(attr). This can mask real failures inside an existing method (e.g., task_description() raising), incorrectly treating it as 'missing'. Prefer catching only the 'missing attribute/method' error (typically AttributeError propagated from workers, or the specific vector-env error type) and re-raising other exceptions so actual bugs aren’t hidden.
| if _has_env_attr(env, "task_description"): | ||
| task_result = env.call("task_description") |
There was a problem hiding this comment.
In the async case, _has_env_attr() calls env.call(...) and then add_envs_task() calls env.call(...) again, doubling cross-process IPC and potentially invoking the method twice (side effects / non-determinism). Consider restructuring to attempt env.call('task_description') once (handling only the 'method missing' case), otherwise fall back to env.call('task').
| elif _has_env_attr(env, "task"): | ||
| task_result = env.call("task") |
There was a problem hiding this comment.
In the async case, _has_env_attr() calls env.call(...) and then add_envs_task() calls env.call(...) again, doubling cross-process IPC and potentially invoking the method twice (side effects / non-determinism). Consider restructuring to attempt env.call('task_description') once (handling only the 'method missing' case), otherwise fall back to env.call('task').
| if not (_has_env_attr(env, "task_description") or _has_env_attr(env, "task")): | ||
| warnings.warn( | ||
| "The environment does not have 'task_description' and 'task'. Some policies require these features.", | ||
| UserWarning, |
There was a problem hiding this comment.
This changes the warning condition from 'warn unless both are present' (previous not (A and B)) to 'warn only if neither is present' (not (A or B)). If the intent is to preserve prior behavior, this should remain an and check (or the warning message should be updated to match the new semantics).
| env_cls = ( | ||
| partial(gym.vector.AsyncVectorEnv, context="spawn") if use_async_envs else gym.vector.SyncVectorEnv | ||
| ) |
There was a problem hiding this comment.
Using partial(...) means env_cls is no longer a class when use_async_envs=True. If any downstream code expects class-like behavior (e.g., isinstance(..., env_cls), env_cls.__name__, type comparisons), this will break. A more robust approach is to keep env_cls as the class and pass context='spawn' at the construction site (or branch on async vs sync for instantiation), so the variable’s type stays consistent.
| def _has_env_attr(env: gym.vector.VectorEnv, attr: str) -> bool: | ||
| """Check if sub-environments have an attribute, compatible with sync and async vector envs.""" |
There was a problem hiding this comment.
The helper name/docstring says 'attr', but the async path effectively validates a no-arg callable via env.call(attr). Renaming to something like _has_env_method() (or clarifying in the docstring that this is specifically for env.call-invoked methods) would make expectations clearer and reduce misuse.
|
Closed by #3274 |
Type / Scope
Summary / Motivation
When using
AsyncVectorEnvwith Libero environments, the defaultforkcontext causes Mujoco to throw the following errormujoco.FatalError: Offscreen framebuffer is not complete, error 0x8cddWhat changed
src/lerobot/envs/factory.py: Usecontext="spawn"when creatingAsyncVectorEnvviafunctools.partial.src/lerobot/envs/utils.py:_has_env_attr()helper that checks sub-environment attributes usingenv.envsfor sync envs andenv.call()for async envs.are_all_envs_same_type()to returnTruefor async envs (sub-env types can't be inspected across processes).check_env_attributes_and_types()andadd_envs_task()to use the new helper.Related issues
None
How was this tested
use_async_envs=trueChecklist (required before merge)
pre-commit run -a)pytest)Reviewer notes
are_all_envs_same_typefunction assumesTruefor async envs since we can't inspect types across process boundaries. This is a reasonable default since the env factory creates all sub-envs from the same constructor.