Revert max_size from 1GB to 128MB to fix KV cache regression by AMD-yanfeiwang · Pull Request #2737 · ROCm/aiter

AMD-yanfeiwang · 2026-04-14T09:44:13Z

The `max_size` change in 8cfe5e2 (`1024*1024*1024` = 1GB) causes:

1. Prefill allreduce dispatch change: QuickReduce INT8 -> custom_all_reduce
2. GPU memory increase: +3.6GB/GPU for allreduce buffers
3. KV cache reduction: -75K tokens (-1.68%), -2.47GB

This causes an MTP accept-rate regression of 8-10% at concurrency C>=2 and a throughput regression of 4-10%. Only C=1 improves slightly.

Reverting to `8192*1024*8*2` = 128MB restores the old dispatch behavior and recovers the full KV cache capacity.
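The size arithmetic behind the two `max_size` values can be checked directly. The sketch below also models the dispatch change as a simple size threshold: custom all-reduce is picked only when the input fits under `max_size`, otherwise QuickReduce INT8 is used. That rule, the function name `pick_allreduce_backend`, and the 200 MiB prefill size are assumptions for illustration, not aiter's actual dispatch code.

```python
# Byte sizes of the two max_size values from this PR, plus a
# HYPOTHETICAL size-threshold dispatch rule (not aiter's actual code).
MiB = 1024 ** 2
GiB = 1024 ** 3

OLD_MAX_SIZE = 8192 * 1024 * 8 * 2  # value restored by this PR: 128 MiB
NEW_MAX_SIZE = 1024 * 1024 * 1024   # value introduced in 8cfe5e2: 1 GiB


def pick_allreduce_backend(nbytes: int, max_size: int) -> str:
    """Hypothetical rule: use custom all-reduce only if the input fits in its buffer."""
    return "custom_all_reduce" if nbytes < max_size else "quick_reduce_int8"


if __name__ == "__main__":
    print(OLD_MAX_SIZE // MiB, NEW_MAX_SIZE // GiB)  # 128 1
    prefill_nbytes = 200 * MiB  # illustrative prefill activation size (assumed)
    for max_size in (OLD_MAX_SIZE, NEW_MAX_SIZE):
        print(max_size // MiB, "MiB ->", pick_allreduce_backend(prefill_nbytes, max_size))
```

Under this assumed rule, raising the threshold from 128 MiB to 1 GiB (8x larger) moves large prefill tensors from QuickReduce INT8 onto the custom all-reduce path, which matches the dispatch change described above.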

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.


github-actions · 2026-04-14T09:45:10Z

🏷️ CI Guide

Runs automatically on every PR:

✅ Pre-checks (submodule verification, code formatting)
✅ Aiter op tests (gfx942 + gfx950)
✅ Triton tests (only when `aiter/ops/triton/**` or related paths are changed)

Extended tests (opt-in via labels):

| Label | Tests |
|---|---|
| `ci:triton-355` | Run Triton tests on MI355 in addition to MI325 |
| `ci:sglang` | SGLang integration tests |
| `ci:atom` | ATOM benchmark (DeepSeek-R1 + GPT-OSS) |
| `ci:vllm` | vLLM benchmark |
| `ci:all` | All of the above |

Add labels via the sidebar or with `gh pr edit 2737 --add-label <label>`.

coderfeli force-pushed the main branch from 8047487 to 303a583 on April 14, 2026 09:51

AMD-yanfeiwang wants to merge 1 commit into ROCm:main from AMD-yanfeiwang:fix/revert-max-size-128mb