fix: enable multi-GPU DDP training in Jupyter notebooks #928
Merged: Borda merged 35 commits into roboflow:develop from mfazrinizar:fix/ddp-notebook-cuda-init on Apr 8, 2026
Commits
2f814fd  fix: defer CUDA init to enable DDP training in notebooks (mfazrinizar)
9814557  fix: skip CUDA bf16 probe for ddp_notebook strategy (mfazrinizar)
9104d43  fix: eliminate all CUDA driver context leaks before DDP fork (mfazrinizar)
34ab21b  fix: use overridden num_workers in all dataloaders for ddp_notebook (mfazrinizar)
6ce6ad0  fix: possible thread-state corruption from fork() (mfazrinizar)
ed99190  revert: remove torch.set_num_threads that crashes forked DDP children (mfazrinizar)
a31dcb2  fix: use spawn-based DDP for ddp_notebook to avoid OpenMP SIGABRT (mfazrinizar)
728c1e5  fix: adding logger for ddp_notebook strategy (mfazrinizar)
a464cf2  fix: use spawn-based DDP for ddp_notebook to avoid OpenMP SIGABRT (mfazrinizar)
08af3c5  fix: remove unnecessary num_workers=0 override for ddp_notebook (mfazrinizar)
bcdfd0a  fix: use standard precision probing for DDP and guard auto-batch (mfazrinizar)
c4c88f2  Merge branch 'develop' into fix/ddp-notebook-cuda-init (mfazrinizar)
e7a84d0  fix(pre-commit): 🎨 auto format pre-commit hooks (pre-commit-ci[bot])
e67ac24  style: fix ruff E402 imports and codespell in DDP tests (mfazrinizar)
1927ca5  fix: handle None from torch.accelerator on CPU-only environments (mfazrinizar)
d798aed  fix: guard torch.accelerator access before current_accelerator check (Borda)
ea8eddf  fix: replace assert with RuntimeError in _NotebookSpawnDDPStrategy (Borda)
ef80e40  Apply suggestions from code review (Borda)
28582bf  fix(pre-commit): 🎨 auto format pre-commit hooks (pre-commit-ci[bot])
680b308  Merge branch 'fix/ddp-notebook-cuda-init' of https://github.com/mfazr… (Borda)
10b35fc  docs: note private PTL launcher API risk in trainer.py (Borda)
9711ab9  docs: update _build_model_context docstring for lazy device placement (Borda)
8602d95  fix: add Any type annotation to _ensure_model_on_device parameter (Borda)
3359d21  docs: explain why CUDA calls in _resolve_precision are safe with spaw… (Borda)
a16ba34  docs: consolidate duplicated OMP fork explanation in trainer.py (Borda)
d309dbc  test: add coverage for _ensure_model_on_device auto-batch path + _det… (Borda)
16023ab  lint: fix import ordering in test_config.py (I001) (Borda)
31a6fb7  Merge branch 'fix/ddp-notebook-cuda-init' of https://github.com/mfazr… (Borda)
c74e181  Apply suggestions from code review (Borda)
42fd15c  fix(pre-commit): 🎨 auto format pre-commit hooks (pre-commit-ci[bot])
908dfef  refactor(tests): convert mocking to @patch decorator style (Borda)
903e962  refactor(tests): convert test_amp_true_ddp_notebook_probes_bf16_norma… (Borda)
73a073d  Apply suggestions from code review (Borda)
3307018  fix: address PR #928 unresolved reviews and CPU CI failure (Borda)
b4e82e4  Merge branch 'develop' into fix/ddp-notebook-cuda-init (Borda)