
Make FlyDSL LDS checks architecture-aware and reduce tuner failure noise #2732

Open
yzhou103 wants to merge 7 commits into main from opt_flysdl_bf16_tune

@yzhou103 (Contributor)

Motivation

Technical Details

  • Use device shared-memory/LDS limits for FlyDSL GEMM filtering and split-k kernel validation instead of relying on a fixed LDS cap.
  • Surface FlyDSL candidate compile/LDS failures as concise runtime warnings so tuning can continue without noisy tracebacks.
  • Keep tuner topk selection local to each shape to avoid leaking a reduced candidate limit into later shape groups.
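The first bullet replaces a fixed LDS cap with a per-device limit. A minimal sketch of that idea follows, assuming a hypothetical `DeviceInfo` object in place of whatever device-properties source the tuner really uses: prefer the queried shared-memory-per-block value and fall back to a per-arch table only when the query reports nothing. The fallback numbers (64 KiB for gfx942, 160 KiB for gfx950) are assumptions to verify against your hardware documentation, and `lds_limit_bytes`/`filter_by_lds` are illustrative names, not the PR's helpers.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed fallback LDS capacities in bytes, used only when the device query
# returns nothing. Illustrative values; verify them for your hardware.
FALLBACK_LDS_BYTES = {
    "gfx942": 64 * 1024,   # assumed: MI300-series (CDNA3)
    "gfx950": 160 * 1024,  # assumed: MI350-series (CDNA4)
}
DEFAULT_LDS_BYTES = 64 * 1024  # conservative default for unknown archs


@dataclass(frozen=True)
class DeviceInfo:
    """Hypothetical stand-in for a device-properties object."""
    arch: str
    shared_memory_per_block: Optional[int] = None


def lds_limit_bytes(dev: DeviceInfo) -> int:
    """Prefer the queried per-block limit; otherwise fall back by arch."""
    if dev.shared_memory_per_block:
        return dev.shared_memory_per_block
    # Arch names can carry feature suffixes, e.g. "gfx942:sramecc+:xnack-".
    base_arch = dev.arch.split(":")[0]
    return FALLBACK_LDS_BYTES.get(base_arch, DEFAULT_LDS_BYTES)


def filter_by_lds(candidates, dev: DeviceInfo):
    """Keep only candidates whose estimated LDS usage fits on this device."""
    limit = lds_limit_bytes(dev)
    return [c for c in candidates if c["lds_bytes"] <= limit]


cands = [
    {"name": "small_tile", "lds_bytes": 32 * 1024},
    {"name": "large_tile", "lds_bytes": 96 * 1024},
]
print([c["name"] for c in filter_by_lds(cands, DeviceInfo("gfx942"))])
# ['small_tile']
print([c["name"] for c in filter_by_lds(cands, DeviceInfo("gfx950"))])
# ['small_tile', 'large_tile']
```

With a single fixed cap, either the larger tile would be rejected on every device or accepted on devices that cannot run it; deriving the limit per device lets the same candidate list adapt.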

Test Plan

  • Verified failing candidates now log a single concise warning with kernel/context information and no extra traceback noise.
  • Verified tuning still completes and produces a valid best kernel result.

Test Result

Submission Checklist

Use shared-memory-per-block queries to keep FlyDSL LDS checks architecture-aware, and surface candidate failures as concise runtime warnings so tuning can continue without noisy tracebacks.

Made-with: Cursor
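The commit above turns candidate failures into concise runtime warnings instead of tracebacks. One way to do that in plain Python is sketched below; `try_build` and the message format are hypothetical, and the sketch assumes that any exception from a single candidate is safe to skip.

```python
import warnings


def try_build(build_fn, kernel_name, context):
    """Hypothetical helper: build one tuning candidate.

    On failure, emit a single concise RuntimeWarning (exception type plus the
    first line of its message, no traceback) and return None so the caller
    can move on to the next candidate.
    """
    try:
        return build_fn()
    except Exception as exc:  # any candidate failure is treated as skippable
        lines = str(exc).splitlines()
        summary = lines[0] if lines else ""
        warnings.warn(
            f"skipping {kernel_name} ({context}): {type(exc).__name__}: {summary}",
            RuntimeWarning,
            stacklevel=2,
        )
        return None


def failing_build():
    raise ValueError("LDS usage 98304 B exceeds limit 65536 B\n<long compiler log>")


kernel = try_build(failing_build, "hgemm_256x128", "M=4096 N=4096 K=8192")
print(kernel)  # None, after one RuntimeWarning
```

Keeping only the first message line trades detail for readability; a tuner could still log the full text at a debug level if needed.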
Avoid mutating the shared topk value while post-processing one shape so later shape groups keep the intended candidate limit.

Made-with: Cursor
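The topk fix can be illustrated with a small sketch, assuming each shape group maps to a list of `(name, latency_us)` pairs with `None` for failed candidates. `select_topk` is a hypothetical name; `effective_topk` mirrors the name used in the file summary for `base_tuner.py`.

```python
def select_topk(shape_groups, topk):
    """Return up to `topk` fastest candidate names per shape group.

    Each entry in a group is (name, latency_us), with latency None for a
    failed candidate. The per-group limit is a local `effective_topk`, so
    shrinking it for one group never changes `topk` for later groups.
    """
    results = {}
    for shape, candidates in shape_groups.items():
        valid = [(name, t) for name, t in candidates if t is not None]
        effective_topk = min(topk, len(valid))  # local; `topk` is untouched
        ranked = sorted(valid, key=lambda item: item[1])
        results[shape] = [name for name, _ in ranked[:effective_topk]]
    return results


groups = {
    (1, 1024, 1024): [("a", 5.0), ("b", None)],  # only one valid candidate
    (64, 4096, 4096): [("c", 3.0), ("d", 2.0), ("e", 4.0)],
}
print(select_topk(groups, topk=2))
# {(1, 1024, 1024): ['a'], (64, 4096, 4096): ['d', 'c']}
```

Had the loop reassigned `topk = min(topk, len(valid))` instead, the first group would have shrunk the limit to 1 and the second group would have returned only `['d']`.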
yzhou103 requested review from a team and Copilot on April 14, 2026 07:13
@github-actions (Contributor)

🏷️ CI Guide

Runs automatically on every PR:

  • ✅ Pre-checks (submodule verification, code formatting)
  • ✅ Aiter op tests (gfx942 + gfx950)
  • ✅ Triton tests (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

  • ci:triton-355: Run Triton tests on MI355 in addition to MI325
  • ci:sglang: SGLang integration tests
  • ci:atom: ATOM benchmark (DeepSeek-R1 + GPT-OSS)
  • ci:vllm: vLLM benchmark
  • ci:all: All of the above

Add labels via the sidebar or with: gh pr edit 2732 --add-label <label>

Copilot AI left a comment

Pull request overview

This PR makes FlyDSL tuning and kernel validation architecture-aware by using device-specific LDS/shared-memory limits, improves error handling to reduce noisy failures during tuning, and ensures topk selection doesn’t leak across shape groups.

Changes:

  • Add a utility to query per-device shared memory/LDS limits (with arch-based fallback) and use it in FlyDSL LDS validation.
  • Improve FlyDSL HGEMM kernel compilation error messages and make tuner workers bail out early on invalid results.
  • Keep topk selection local per shape group during post-processing.
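The second bullet mentions tuner workers bailing out early on invalid results. A minimal sketch of such a worker follows, assuming failures can be bucketed by message text; `classify_error`, `run_candidate`, the category names, and the `inf` sentinel are all hypothetical illustrations rather than the helpers in `mp_tuner.py`.

```python
INVALID_LATENCY = float("inf")  # sentinel for a failed candidate


def classify_error(exc: BaseException) -> str:
    """Hypothetical classifier: bucket a candidate failure by its message."""
    msg = str(exc).lower()
    if "lds" in msg or "shared memory" in msg:
        return "lds_limit"
    if "compile" in msg:
        return "compile_error"
    return "runtime_error"


def run_candidate(candidate, run_fn, n_iters=3):
    """Benchmark one candidate, returning (candidate, latency, error_kind).

    The loop stops at the first exception or None/non-positive timing
    instead of spending the remaining iterations on a broken candidate.
    """
    timings = []
    for _ in range(n_iters):
        try:
            t = run_fn(candidate)
        except Exception as exc:
            return candidate, INVALID_LATENCY, classify_error(exc)
        if t is None or t <= 0:
            return candidate, INVALID_LATENCY, "invalid_result"
        timings.append(t)
    return candidate, min(timings), None


calls = []


def broken_run(candidate):
    calls.append(candidate)
    return None  # simulate an invalid measurement


print(run_candidate("tile_64", broken_run))  # ('tile_64', inf, 'invalid_result')
print(len(calls))  # 1
```

Returning an infinite latency keeps failed candidates sortable alongside valid ones, so the ranking step needs no special case.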

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Summary per file:

  • aiter/utility/mp_tuner.py: Adds helper error classifiers; broadens handled exceptions and returns early for failed runs to keep tuning moving.
  • aiter/utility/base_tuner.py: Uses an effective_topk per shape group to prevent cross-group mutation of the global topk.
  • aiter/ops/flydsl/utils.py: Introduces shared-memory/LDS limit helpers using device properties with an arch fallback.
  • aiter/ops/flydsl/kernels/splitk_hgemm.py: Replaces the fixed LDS assert with device-aware limit checks and wraps compile failures with concise contextual errors.
  • aiter/ops/flydsl/gemm_tune/flydsl_gemm_a8w8_bpreshuffle_common.py: Switches the tuning LDS cap to the device-aware helper and de-duplicates arch mapping logic.
  • aiter/ops/flydsl/gemm_kernels.py: Replaces the fixed MAX_LDS_BYTES cap with device-aware LDS limit validation.
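For the `splitk_hgemm.py` change (an explicit limit check instead of an assert, plus concise contextual compile errors), a minimal sketch in plain Python follows. `KernelBuildError`, `check_lds`, and `build_kernel` are hypothetical names; the sketch only shows the general pattern of raising with kernel context and using `raise ... from None` to hide the chained low-level traceback.

```python
class KernelBuildError(RuntimeError):
    """Hypothetical error type that carries kernel context."""


def check_lds(kernel_name: str, lds_bytes: int, limit_bytes: int) -> None:
    # An explicit check rather than `assert`: asserts are stripped under
    # `python -O`, and a bare assert gives no kernel context.
    if lds_bytes > limit_bytes:
        raise KernelBuildError(
            f"{kernel_name}: LDS usage {lds_bytes} B exceeds device limit {limit_bytes} B"
        )


def build_kernel(kernel_name, lds_bytes, limit_bytes, compile_fn):
    check_lds(kernel_name, lds_bytes, limit_bytes)
    try:
        return compile_fn()
    except Exception as exc:
        # `from None` hides the chained low-level traceback; the message keeps
        # the kernel name and the original exception type and text.
        raise KernelBuildError(
            f"{kernel_name}: compile failed: {type(exc).__name__}: {exc}"
        ) from None
```

A caller such as a tuner can then catch the single `KernelBuildError` type and log its one-line message.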

yzhou103 and others added 3 commits April 14, 2026 15:35
Avoid repeated device property lookups while validating FlyDSL kernel configs by caching the default device selection and shared-memory-per-block queries.

Made-with: Cursor
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
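The caching commit can be sketched with `functools.lru_cache`, assuming the expensive step is a per-device property query. `_query_shared_mem_per_block`, `default_device_index`, and `lds_fits` are hypothetical stand-ins, and the lookup counter exists only to show that repeated validations hit the cache.

```python
import functools

LOOKUPS = {"count": 0}  # counts the slow queries, for illustration only


def _query_shared_mem_per_block(device_index: int) -> int:
    """Hypothetical stand-in for a slow driver/runtime property query."""
    LOOKUPS["count"] += 1
    return 64 * 1024


@functools.lru_cache(maxsize=None)
def default_device_index() -> int:
    # Hypothetical: a real version would ask the runtime for the current device.
    return 0


@functools.lru_cache(maxsize=None)
def shared_mem_per_block(device_index: int) -> int:
    return _query_shared_mem_per_block(device_index)


def lds_fits(lds_bytes: int, device_index=None) -> bool:
    idx = default_device_index() if device_index is None else device_index
    return lds_bytes <= shared_mem_per_block(idx)


fits = [lds_fits(n * 1024) for n in range(0, 200, 8)]  # 25 config checks
print(LOOKUPS["count"])  # 1
```

Caching the default device assumes the process does not switch devices after the first lookup; if it might, calling `default_device_index.cache_clear()` after a switch resets it.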
coderfeli requested a review from solinzby1 on April 14, 2026 09:35

Labels: None yet
Projects: None yet
2 participants