Releases · ROCm/ATOM

What's Changed

move from private repo to ROCm by @valarLip in #1
Update logo by @carlushuang in #2
update the link to the repo by @sunway513 in #3
fix the example code cmd by @sunway513 in #5
Update README.md and pyproject.toml by @andyluo7 in #4
update readme for deepseek by @valarLip in #6
support gpt oss by @junhaha666 in #7
gpt_oss update: add fused_qk_rope_reshape_and_cache by @junhaha666 in #9
Fix server startup message to show after model is loaded by @indianspeedster in #10
fix gpt_oss accuracy drop by @junhaha666 in #12
deepseek fp4 by @junhaha666 in #8
gpt_oss: fix moe pad && use unified attention 3d for full attention decode by @junhaha666 in #15
move preprocess into threadpool avoid serial process by @HaonanWang98 in #19
[perf] add qknorm_quant fusion for DS by @gbyu-amd in #18
engine max_model_len : default is set to hf_config.max_position_embed… by @junhaha666 in #21
[perf] add qknorm_quant and ar_rmsnorm fusion for DS by @gbyu-amd in #17
reduce data for ScheduledBatch by @valarLip in #23
fix port by @valarLip in #24
Mla cache update by @junhaha666 in #20
Add license and copyright headers by @ppalaniappan-amd in #30
add ATOM_PROFILER by @amd-ruitang3 in #33
use aiter hip fused_qk_rope_concat_and_cache_mla by @junhaha666 in #31
add qwen3 moe model support by @gbyu-amd in #22
support mtp stage 1: support draft model load by @jiayyu in #39
update benchmark to inferencemax version by @HaonanWang98 in #38
update server by @valarLip in #41
refactor prepare_kv_indices by @valarLip in #43
CI: Initial ATOM CI by @gyohuangxin in #40
limit max_split_per_batch to 16 by @valarLip in #47
support block size convert by @junhaha666 in #51
fix num_kvcache_blocks error by @junhaha666 in #52
Refactor arg_utils.py by @HaonanWang98 in #53
Adapt lm-eval chat completion request. by @HaonanWang98 in #58
CI: Add Dockerfile and nightly docker release pipeline by @gyohuangxin in #46
remove global dict for request id in stream mode by @HaonanWang98 in #60
CI: Add timeout for Nightly docker release by @gyohuangxin in #62
CI: Add ROCm 7.2 preview nightly image by @gyohuangxin in #66
update readme by @valarLip in #67
update readme by @valarLip in #71
CI: Add deepseek in ATOM tests by @gyohuangxin in #64
Making BMM use fp4 weights by @omuhamma in #57
llfp4 weight scale shuffle fix by @amirumoAMD in #74
ds3.2: add one param for top_k_per_row_prefill ops by @PerryZhang01 in #77
support aiter.gemm_a4w4 api changes by @junhaha666 in #70
remove timeout for inter token latency by @HaonanWang98 in #79
Enable INT4 QR for LLFP4 by @amirumoAMD in #76
update utility by @valarLip in #81
update_server by @valarLip in #82
Update Dockerfile by @valarLip in #83
Update Dockerfile by @valarLip in #87
Graph: add param check for cuda graph capture by @PerryZhang01 in #85
CI: Temporarily split gfx942 and gfx950 in nightly docker release by @gyohuangxin in #89
CI: Temporarily split gfx942 and gfx950 in nightly docker release pushing by @gyohuangxin in #90
CI: Increase timeout when building nightly docker image by @gyohuangxin in #91
CI: Update dockerfile to use PREBUILD_KERNELS=1 by @gyohuangxin in #92
remove async eng by @HaonanWang98 in #86
Perf: save performance info in beautiful format by @PerryZhang01 in #80
CI: Update base image to rocm/pytorch:latest in ATOM tests by @gyohuangxin in #88
CI: Fix issues in nightly build pipeline by @gyohuangxin in #93
CI: skip tests when building gfx942 nightly docker image by @gyohuangxin in #96
MLA: update aiter mqa kernel by @PerryZhang01 in #95
CI: Fix node issues and use pre-download to accelerate tests by @gyohuangxin in #94
clear schedule redundant variables by @inkcherry in #100
Gpt oss triton moe by @junhaha666 in #98
CI: Fix output issues and add gsm8k accuracy tests in CI by @gyohuangxin in #73
CI: Fix issues by @gyohuangxin in #102
CI: Add gpt-oss model by @gyohuangxin in #103
feat: support pa_decode_gluon and refactor attention ops by @PerryZhang01 in #42
CI: Fix CI issues by @gyohuangxin in #104
PA: add ATOM_GPT_OSS_MODEL env for prefill attention by @PerryZhang01 in #105
CI: Add MAX_JOBS when building the nightly image by @gyohuangxin in #106
Update docker-release.yaml by @gyohuangxin in #107
[Perf][Qwen3] Enable qknorm_rope_cache_quant fusion by @gbyu-amd in #65
[fix] fix assert for Qwen3 by @gbyu-amd in #108
DeepSeek v3.2: add sparse prefill mla and fix indexer rope by @junhaha666 in #109
[CI] add Qwen3-235B-A22B-Instruct-2507-FP8 to CI by @gbyu-amd in #110
Update Dockerfile to put aiter/atom under dir /app by @valarLip in #112
adapt for optimized ps_gluon_pa by @Bernard-Liu in #117
fuse rmsnorm + quant for llama fp8 by @scxiao in #56
code cleanup by @valarLip in #120
fix deepseek accuracy when ENABLE_DS_QKNORM_QUANT_FUSION=1 by @junhaha666 in #121
Update Dockerfile to install latest RCCL by @valarLip in #123
Update atom_test.sh by @valarLip in #122
CI: Enhance the docker release pipeline by @gyohuangxin in #125
llfp4 fail temporary workaround by @amirumoAMD in #75
CI: Fix the docker release pipeline by @gyohuangxin in #131
Fix torch 2.9 rlock error in torch compile by @ZhangLirong-amd in #114
CI: Fix CI issues by @gyohuangxin in #135
adapt for upstream gluon pa by @Bernard-Liu in #137
[fix] fix gluon pa with bf16 kv by @gbyu-amd in #124
CI: Speed up CI by using a nightly image instead of rebuilding each time by @gyohuangxin in #136
[recipe] Add qwen3 235b recipe by @gbyu-amd in #111
Fix defer output for conc>max_num_seqs by @valarLip in #134
CI: Collect Accuracy tests summary by @gyohuangxin in #132
[Triton] DS FP4/FP8 Triton fusion and GEMM optimization by @k50112113 in #119
Fix DP issues in benchmark and support Mori in Moe by @ZhangLirong-amd in #72
re-enable ATOM_ENABLE_DS_QKNORM_QUANT_FUSION regardless of ATOM_USE_T… by @k50112113 in #139
fuse rmsnorm/quant and act_mul/quant for mxfp4 llama70B by @scxiao in #129
use ck mha instead of triton unified_attention for sink and window by @junhaha666 in #118
Fix attention mha logic error by @ZhangLirong-amd in #141
CI: Add gpt-oss-120b 2 GPUs test by @gyohuangxin in #143
shuffle_weights_update by @valarLip in #144
Add the external facing doc draft for review by @ChuanLi1101 in #99
CI: Re-enable dual-arch builds in the Docker nightly releases by @gyohuangxin...
