sonic-moe: Add sonic-moe kernels #531

Open
adarshxs wants to merge 12 commits into huggingface:main from adarshxs:kernel/sonic-moe
Conversation

@adarshxs
Contributor

@adarshxs adarshxs commented Apr 7, 2026

Adds SonicMoE as a Python-only kernel for accelerated Mixture-of-Experts on Hopper and Blackwell GPUs. SonicMoE uses CuTe-DSL grouped GEMMs and Triton routing kernels to deliver state-of-the-art MoE throughput with IO-aware tile scheduling (see arXiv:2512.14080). The kernel vendors QuACK (v0.2.5) for its GEMM infrastructure and declares nvidia-cutlass-dsl as a CUDA Python dependency. Built and verified with nix run .#build-and-copy, and tested end-to-end on H100 via get_kernel("kernels-community/sonic-moe", version=1) with forward/backward correctness checks against the torch baseline.
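The correctness checks mentioned above compare the kernel against a plain-torch baseline. A minimal sketch of what such a dense reference MoE forward might look like — all names here (moe_ref, the single-matrix expert layout) are hypothetical illustrations, not the actual test code from this PR:

```python
import torch

def moe_ref(x, gate_w, expert_w, top_k=2):
    """Dense reference MoE forward (hypothetical baseline sketch).

    x:        [tokens, d_model]
    gate_w:   [d_model, n_experts]      router weights
    expert_w: [n_experts, d_model, d_ff] one matrix per expert (simplified)
    """
    # Route each token: softmax over experts, keep the top_k.
    probs = torch.softmax(x @ gate_w, dim=-1)
    weights, idx = probs.topk(top_k, dim=-1)
    # Renormalize the selected routing weights so they sum to 1 per token.
    weights = weights / weights.sum(dim=-1, keepdim=True)

    out = torch.zeros(x.shape[0], expert_w.shape[-1])
    # Naive loop over (slot, expert) pairs; a real kernel would instead
    # group tokens per expert and run grouped GEMMs.
    for k in range(top_k):
        for e in range(expert_w.shape[0]):
            mask = idx[:, k] == e
            if mask.any():
                out[mask] += weights[mask, k, None] * (x[mask] @ expert_w[e])
    return out

# Tiny smoke test: 8 tokens, d_model=16, 4 experts, d_ff=32.
x = torch.randn(8, 16, requires_grad=True)
gate_w = torch.randn(16, 4)
expert_w = torch.randn(4, 16, 32)
out = moe_ref(x, gate_w, expert_w)
```

An end-to-end check would run the same inputs through the loaded sonic-moe kernel and assert closeness of outputs (and, after a backward pass, of gradients) to this reference.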

@adarshxs adarshxs requested review from danieldk and drbh as code owners April 7, 2026 16:33
