Make FlyDSL LDS checks architecture-aware and reduce tuner failure noise#2732
Open
Use shared-memory-per-block queries to keep FlyDSL LDS checks architecture-aware, and surface candidate failures as concise runtime warnings so tuning can continue without noisy tracebacks. Made-with: Cursor
Avoid mutating the shared topk value while post-processing one shape so later shape groups keep the intended candidate limit. Made-with: Cursor
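The `topk` fix described in the commit above can be sketched as follows. This is a minimal illustration, not the code in `aiter/utility/base_tuner.py`: the function name `select_top_candidates`, the `results_by_shape` layout, and the `"us"` timing key are all hypothetical. The point is only that each shape derives its own `effective_topk` instead of reassigning the shared `topk`.

```python
def select_top_candidates(results_by_shape, topk):
    """Pick the fastest `topk` candidates per shape (hypothetical helper)."""
    selected = {}
    for shape, results in results_by_shape.items():
        # Derive a per-shape limit; the shared `topk` is never reassigned,
        # so a shape with few results cannot shrink the limit for later shapes.
        effective_topk = min(topk, len(results))
        ranked = sorted(results, key=lambda r: r["us"])
        selected[shape] = ranked[:effective_topk]
    return selected


# Illustrative data: the first shape has a single result, the second has four.
results_by_shape = {
    (16, 4096, 4096): [{"kernel": "a", "us": 12.0}],
    (128, 4096, 4096): [
        {"kernel": "a", "us": 30.0},
        {"kernel": "b", "us": 25.0},
        {"kernel": "c", "us": 28.0},
        {"kernel": "d", "us": 40.0},
    ],
}
selected = select_top_candidates(results_by_shape, topk=3)
```

Had the loop written the reduced value back into `topk`, the second shape would have been limited to one candidate as well.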
Pull request overview
This PR makes FlyDSL tuning and kernel validation architecture-aware by using device-specific LDS/shared-memory limits, improves error handling to reduce noisy failures during tuning, and ensures topk selection doesn’t leak across shape groups.
Changes:
- Add a utility to query per-device shared memory/LDS limits (with arch-based fallback) and use it in FlyDSL LDS validation.
- Improve FlyDSL HGEMM kernel compilation error messages and make tuner workers bail out early on invalid results.
- Keep `topk` selection local per shape group during post-processing.
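A device-aware LDS check with an arch fallback might look roughly like the sketch below. It is not the helper added in `aiter/ops/flydsl/utils.py`. The names `lds_limit_bytes` and `check_lds` are hypothetical, the fallback table holds illustrative values that should be verified against hardware documentation, and the properties object is assumed to expose `shared_memory_per_block` and `gcnArchName` the way `torch.cuda.get_device_properties` does on recent ROCm builds of PyTorch. Plain `SimpleNamespace` objects stand in for real device properties so the example runs without a GPU.

```python
from types import SimpleNamespace

# Illustrative per-arch LDS sizes in bytes (assumed values, not taken from the PR).
_ARCH_LDS_FALLBACK = {"gfx942": 64 * 1024, "gfx950": 160 * 1024}
_DEFAULT_LDS_BYTES = 64 * 1024  # conservative default for unknown archs


def lds_limit_bytes(props):
    """Return the LDS limit, preferring the device-reported value."""
    limit = getattr(props, "shared_memory_per_block", None)
    if isinstance(limit, int) and limit > 0:
        return limit
    # Fall back to the arch name, e.g. "gfx942:sramecc+:xnack-" -> "gfx942".
    arch = getattr(props, "gcnArchName", "").split(":")[0]
    return _ARCH_LDS_FALLBACK.get(arch, _DEFAULT_LDS_BYTES)


def check_lds(required_bytes, props):
    """Raise a concise error when a kernel config exceeds the LDS limit."""
    limit = lds_limit_bytes(props)
    if required_bytes > limit:
        raise ValueError(
            f"kernel needs {required_bytes} B of LDS, device limit is {limit} B"
        )


# Stand-ins for device property objects.
reports_limit = SimpleNamespace(
    shared_memory_per_block=65536, gcnArchName="gfx942:sramecc+:xnack-"
)
arch_only = SimpleNamespace(gcnArchName="gfx950:sramecc+:xnack-")
```

With these stand-ins, a 70000-byte config is rejected for `reports_limit` and accepted for `arch_only`, which is the behavior a single fixed cap cannot express.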
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| aiter/utility/mp_tuner.py | Adds helper error classifiers; broadens handled exceptions and returns early for failed runs to keep tuning moving. |
| aiter/utility/base_tuner.py | Uses an effective_topk per shape group to prevent cross-group mutation of the global topk. |
| aiter/ops/flydsl/utils.py | Introduces shared-memory/LDS limit helpers using device properties with an arch fallback. |
| aiter/ops/flydsl/kernels/splitk_hgemm.py | Replaces fixed LDS assert with device-aware limit checks and wraps compile failures with concise contextual errors. |
| aiter/ops/flydsl/gemm_tune/flydsl_gemm_a8w8_bpreshuffle_common.py | Switches tuning LDS cap to device-aware helper and de-duplicates arch mapping logic. |
| aiter/ops/flydsl/gemm_kernels.py | Replaces the fixed MAX_LDS_BYTES cap with device-aware LDS limit validation. |
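The "concise warning instead of traceback" behavior summarized for the tuner can be illustrated with a small wrapper. This is a sketch under assumptions: `run_candidate` is a hypothetical name, and the set of caught exception types is chosen for illustration rather than copied from `aiter/utility/mp_tuner.py`.

```python
import warnings


def run_candidate(fn, *args, label="candidate"):
    """Run one tuning candidate; on failure, warn briefly and return None."""
    try:
        return fn(*args)
    except (RuntimeError, ValueError, AssertionError) as exc:
        # Keep only the first line of the message so logs stay readable.
        lines = str(exc).strip().splitlines()
        summary = lines[0] if lines else "no message"
        warnings.warn(
            f"{label} failed: {type(exc).__name__}: {summary}",
            RuntimeWarning,
            stacklevel=2,
        )
        return None
```

A caller can then skip `None` results and move on to the next candidate, so one invalid config does not abort the sweep or flood the log.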
Avoid repeated device property lookups while validating FlyDSL kernel configs by caching the default device selection and shared-memory-per-block queries. Made-with: Cursor
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Motivation
Technical Details
- Keep `topk` selection local to each shape to avoid leaking a reduced candidate limit into later shape groups.
Test Plan
Test Result
Submission Checklist