Make FlyDSL LDS checks architecture-aware and reduce tuner failure noise by yzhou103 · Pull Request #2732 · ROCm/aiter

yzhou103 · 2026-04-14T07:13:28Z

Motivation

Technical Details

Use device shared-memory/LDS limits for FlyDSL GEMM filtering and split-k kernel validation instead of relying on a fixed LDS cap.
Surface FlyDSL candidate compile/LDS failures as concise runtime warnings so tuning can continue without noisy tracebacks.
Keep tuner topk selection local to each shape to avoid leaking a reduced candidate limit into later shape groups.
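A minimal sketch of the device-aware LDS check described above. The function names, the fallback table, and the byte values in it are illustrative assumptions, not the helpers this PR adds; in practice the `queried` value would come from a shared-memory-per-block device query.

```python
from typing import Optional

# Hypothetical arch fallback table in bytes; values are illustrative assumptions.
_FALLBACK_LDS_BYTES = {"gfx942": 64 * 1024, "gfx950": 160 * 1024}
_DEFAULT_LDS_BYTES = 64 * 1024  # conservative default for unknown architectures


def lds_limit_bytes(arch: str, queried: Optional[int] = None) -> int:
    """Prefer the limit reported by the device; fall back to the arch table."""
    if queried is not None and queried > 0:
        return queried
    return _FALLBACK_LDS_BYTES.get(arch, _DEFAULT_LDS_BYTES)


def fits_in_lds(required_bytes: int, arch: str, queried: Optional[int] = None) -> bool:
    """Return True when a kernel config's LDS footprint fits on this device."""
    return required_bytes <= lds_limit_bytes(arch, queried)
```

With a check of this shape, a 96 KiB tile configuration can be accepted on one architecture and filtered out on another instead of being compared against a single fixed cap.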

Test Plan

Verified failing candidates now log a single concise warning with kernel/context information and no extra traceback noise.
Verified tuning still completes and produces a valid best kernel result.
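The behaviour verified here can be sketched as a loop that converts each candidate failure into one warning and moves on. `run_candidates` and `run_one` are hypothetical names for illustration, not the tuner's actual API.

```python
import warnings


def run_candidates(candidates, run_one):
    """Run every (name, config) candidate; failures become one-line warnings."""
    results = []
    for name, config in candidates:
        try:
            results.append((name, run_one(config)))
        except Exception as exc:  # broad on purpose: a bad candidate is non-fatal
            # One concise message with kernel context, no traceback.
            warnings.warn(
                f"candidate {name} skipped: {type(exc).__name__}: {exc}",
                RuntimeWarning,
                stacklevel=2,
            )
    return results
```

The surviving entries in `results` are still enough to pick a best kernel, which matches the second check in the test plan.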

Test Result

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.


github-actions · 2026-04-14T07:13:48Z

🏷️ CI Guide

Runs automatically on every PR:

✅ Pre-checks (submodule verification, code formatting)
✅ Aiter op tests (gfx942 + gfx950)
✅ Triton tests (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

Label	Tests
`ci:triton-355`	Run Triton tests on MI355 in addition to MI325
`ci:sglang`	SGLang integration tests
`ci:atom`	ATOM benchmark (DeepSeek-R1 + GPT-OSS)
`ci:vllm`	vLLM benchmark
`ci:all`	All of the above

Add labels via the sidebar or `gh pr edit 2732 --add-label <label>`.

Copilot

Pull request overview

This PR makes FlyDSL tuning and kernel validation architecture-aware by using device-specific LDS/shared-memory limits, improves error handling to reduce noisy failures during tuning, and ensures topk selection doesn’t leak across shape groups.

Changes:

Add a utility to query per-device shared memory/LDS limits (with arch-based fallback) and use it in FlyDSL LDS validation.
Improve FlyDSL HGEMM kernel compilation error messages and make tuner workers bail out early on invalid results.
Keep topk selection local per shape group during post-processing.
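The last point can be illustrated with a small sketch. It assumes the reduced limit comes from clamping `topk` to the number of timed candidates for a shape; the function name and data layout are hypothetical.

```python
def select_top_candidates(groups, topk):
    """groups maps a shape to a list of (kernel_name, time_us) pairs."""
    best = {}
    for shape, timings in groups.items():
        # Local value: shrinking it for one shape must not affect the next one.
        effective_topk = min(topk, len(timings))
        ranked = sorted(timings, key=lambda item: item[1])
        best[shape] = ranked[:effective_topk]
    return best
```

Had the loop reassigned `topk` itself, a shape with a single valid candidate would have capped every later shape at one result.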

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
`aiter/utility/mp_tuner.py`	Adds helper error classifiers; broadens handled exceptions and returns early for failed runs to keep tuning moving.
`aiter/utility/base_tuner.py`	Uses an `effective_topk` per shape group to prevent cross-group mutation of the global `topk`.
`aiter/ops/flydsl/utils.py`	Introduces shared-memory/LDS limit helpers using device properties with an arch fallback.
`aiter/ops/flydsl/kernels/splitk_hgemm.py`	Replaces fixed LDS assert with device-aware limit checks and wraps compile failures with concise contextual errors.
`aiter/ops/flydsl/gemm_tune/flydsl_gemm_a8w8_bpreshuffle_common.py`	Switches tuning LDS cap to device-aware helper and de-duplicates arch mapping logic.
`aiter/ops/flydsl/gemm_kernels.py`	Replaces the fixed MAX_LDS_BYTES cap with device-aware LDS limit validation.
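As a rough illustration of wrapping compile failures with concise contextual errors, the sketch below re-raises a short `RuntimeError` that names the kernel and its parameters. `compile_with_context` and `compile_fn` are hypothetical names, not code from `splitk_hgemm.py`.

```python
def compile_with_context(compile_fn, kernel_name, **params):
    """Call compile_fn(**params); on failure raise a short, contextual error."""
    try:
        return compile_fn(**params)
    except Exception as exc:
        text = str(exc)
        # Keep only the first line of the underlying message.
        detail = text.splitlines()[0] if text else type(exc).__name__
        context = ", ".join(f"{key}={value}" for key, value in params.items())
        # `from None` suppresses the chained traceback of the original error.
        raise RuntimeError(
            f"failed to compile {kernel_name} ({context}): {detail}"
        ) from None
```

The caller then sees a single line such as `failed to compile <kernel> (tile_m=256): ...` rather than a nested traceback.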


yzhou103 added 2 commits April 14, 2026 15:07

Handle FlyDSL LDS limits and candidate failures

e6e7e74

Use shared-memory-per-block queries to keep FlyDSL LDS checks architecture-aware, and surface candidate failures as concise runtime warnings so tuning can continue without noisy tracebacks. Made-with: Cursor

Keep tuner topk local per shape

7bd33d6

Avoid mutating the shared topk value while post-processing one shape so later shape groups keep the intended candidate limit. Made-with: Cursor

yzhou103 requested review from a team and Copilot April 14, 2026 07:13

Copilot started reviewing on behalf of yzhou103 April 14, 2026 07:14

fix lint

1291236

Copilot AI reviewed Apr 14, 2026

aiter/ops/flydsl/gemm_tune/flydsl_gemm_a8w8_bpreshuffle_common.py (outdated, resolved)

aiter/ops/flydsl/gemm_kernels.py (resolved)

yzhou103 and others added 3 commits April 14, 2026 15:35

Cache FlyDSL shared memory queries

4705b43

Avoid repeated device property lookups while validating FlyDSL kernel configs by caching the default device selection and shared-memory-per-block queries. Made-with: Cursor
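One common way to get this effect is memoization with `functools.lru_cache`, sketched below. `_device_query` is a stand-in for a real device-properties lookup, and its return value is a placeholder; whether the commit uses `lru_cache` or another caching scheme is not stated here.

```python
from functools import lru_cache


def _device_query(device_index: int) -> int:
    """Stand-in for a real device-properties lookup (hypothetical)."""
    return 64 * 1024  # placeholder value, not a measured limit


@lru_cache(maxsize=None)
def shared_memory_per_block(device_index: int) -> int:
    """Cached per-device limit: the underlying query runs once per device."""
    return _device_query(device_index)
```

Validating many kernel configs against the same device then pays for the lookup only on the first call.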

fix lint

09a9c2e

Update aiter/ops/flydsl/gemm_tune/flydsl_gemm_a8w8_bpreshuffle_common.py

690ef94

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

coderfeli requested a review from solinzby1 April 14, 2026 09:35

Merge branch 'main' into opt_flysdl_bf16_tune

ea51bf5

yzhou103 wants to merge 7 commits into main from opt_flysdl_bf16_tune

2 participants
