Releases: ROCm/ATOM
v0.1.2
What's Changed
- move from private repo to ROCm by @valarLip in #1
- Update logo by @carlushuang in #2
- update the link to the repo by @sunway513 in #3
- fix the example code cmd by @sunway513 in #5
- Update README.md and pyproject.toml by @andyluo7 in #4
- update readme for deepseek by @valarLip in #6
- support gpt oss by @junhaha666 in #7
- gpt_oss update: add fused_qk_rope_reshape_and_cache by @junhaha666 in #9
- Fix server startup message to show after model is loaded by @indianspeedster in #10
- fix gpt_oss accuracy drop by @junhaha666 in #12
- deepseek fp4 by @junhaha666 in #8
- gpt_oss: fix moe pad && use unified attention 3d for full attention decode by @junhaha666 in #15
- move preprocess into threadpool to avoid serial processing by @HaonanWang98 in #19
- [perf] add qknorm_quant fusion for DS by @gbyu-amd in #18
- engine max_model_len : default is set to hf_config.max_position_embed… by @junhaha666 in #21
- [perf] add qknorm_quant and ar_rmsnorm fusion for DS by @gbyu-amd in #17
- reduce data for ScheduledBatch by @valarLip in #23
- fix port by @valarLip in #24
- MLA cache update by @junhaha666 in #20
- Add license and copyright headers by @ppalaniappan-amd in #30
- add ATOM_PROFILER by @amd-ruitang3 in #33
- use aiter hip fused_qk_rope_concat_and_cache_mla by @junhaha666 in #31
- add qwen3 moe model support by @gbyu-amd in #22
- support mtp stage 1: support draft model load by @jiayyu in #39
- update benchmark to inferencemax version by @HaonanWang98 in #38
- update server by @valarLip in #41
- refactor prepare_kv_indices by @valarLip in #43
- CI: Initial ATOM CI by @gyohuangxin in #40
- limit max_split_per_batch to 16 by @valarLip in #47
- support block size convert by @junhaha666 in #51
- fix num_kvcache_blocks error by @junhaha666 in #52
- Refactor arg_utils.py by @HaonanWang98 in #53
- Adapt lm-eval chat completion request. by @HaonanWang98 in #58
- CI: Add Dockerfile and nightly docker release pipeline by @gyohuangxin in #46
- remove global dict for request id in stream mode by @HaonanWang98 in #60
- CI: Add timeout for Nightly docker release by @gyohuangxin in #62
- CI: Add ROCm 7.2 preview nightly image by @gyohuangxin in #66
- update readme by @valarLip in #67
- update readme by @valarLip in #71
- CI: Add deepseek in ATOM tests by @gyohuangxin in #64
- Making BMM use fp4 weights by @omuhamma in #57
- llfp4 weight scale shuffle fix by @amirumoAMD in #74
- ds3.2: add one param for top_k_per_row_prefill ops by @PerryZhang01 in #77
- support aiter.gemm_a4w4 api changes by @junhaha666 in #70
- remove timeout for inter token latency by @HaonanWang98 in #79
- Enable INT4 QR for LLFP4 by @amirumoAMD in #76
- update utility by @valarLip in #81
- update_server by @valarLip in #82
- Update Dockerfile by @valarLip in #83
- Update Dockerfile by @valarLip in #87
- Graph: add param check for cuda graph capture by @PerryZhang01 in #85
- CI: Temporarily split gfx942 and gfx950 in nightly docker release by @gyohuangxin in #89
- CI: Temporarily split gfx942 and gfx950 in nightly docker release pushing by @gyohuangxin in #90
- CI: Increase timeout when building nightly docker image by @gyohuangxin in #91
- CI: Update dockerfile to use PREBUILD_KERNELS=1 by @gyohuangxin in #92
- remove async eng by @HaonanWang98 in #86
- Perf: save performance info in a readable format by @PerryZhang01 in #80
- CI: Update base image to rocm/pytorch:latest in ATOM tests by @gyohuangxin in #88
- CI: Fix issues in nightly build pipeline by @gyohuangxin in #93
- CI: skip tests when building gfx942 nightly docker image by @gyohuangxin in #96
- MLA: update aiter mqa kernel by @PerryZhang01 in #95
- CI: Fix node issues and use pre-download to accelerate tests by @gyohuangxin in #94
- clear schedule redundant variables by @inkcherry in #100
- Gpt oss triton moe by @junhaha666 in #98
- CI: Fix output issues and add gsm8k accuracy tests in CI by @gyohuangxin in #73
- CI: Fix issues by @gyohuangxin in #102
- CI: Add gpt-oss model by @gyohuangxin in #103
- feat: support pa_decode_gluon and refactor attention ops by @PerryZhang01 in #42
- CI: Fix CI issues by @gyohuangxin in #104
- PA: add ATOM_GPT_OSS_MODEL env for prefill attention by @PerryZhang01 in #105
- CI: Add MAX_JOBS when building the nightly image by @gyohuangxin in #106
- Update docker-release.yaml by @gyohuangxin in #107
- [Perf][Qwen3] Enable qknorm_rope_cache_quant fusion by @gbyu-amd in #65
- [fix] fix assert for Qwen3 by @gbyu-amd in #108
- DeepSeek v3.2: add sparse prefill mla and fix indexer rope by @junhaha666 in #109
- [CI] add Qwen3-235B-A22B-Instruct-2507-FP8 to CI by @gbyu-amd in #110
- Update Dockerfile to put aiter/atom under dir /app by @valarLip in #112
- adapt for optimized ps_gluon_pa by @Bernard-Liu in #117
- fuse rmsnorm + quant for llama fp8 by @scxiao in #56
- code cleanup by @valarLip in #120
- fix deepseek accuracy when ENABLE_DS_QKNORM_QUANT_FUSION=1 by @junhaha666 in #121
- Update Dockerfile to install latest RCCL by @valarLip in #123
- Update atom_test.sh by @valarLip in #122
- CI: Enhance the docker release pipeline by @gyohuangxin in #125
- llfp4 fail temporary workaround by @amirumoAMD in #75
- CI: Fix the docker release pipeline by @gyohuangxin in #131
- Fix torch 2.9 rlock error in torch compile by @ZhangLirong-amd in #114
- CI: Fix CI issues by @gyohuangxin in #135
- adapt for upstream gluon pa by @Bernard-Liu in #137
- [fix] fix gluon pa with bf16 kv by @gbyu-amd in #124
- CI: Speed up CI by using a nightly image instead of rebuilding each time by @gyohuangxin in #136
- [recipe] Add qwen3 235b recipe by @gbyu-amd in #111
- Fix defer output for conc>max_num_seqs by @valarLip in #134
- CI: Collect Accuracy tests summary by @gyohuangxin in #132
- [Triton] DS FP4/FP8 Triton fusion and GEMM optimization by @k50112113 in #119
- Fix DP issues in benchmark and support Mori in Moe by @ZhangLirong-amd in #72
- re-enable ATOM_ENABLE_DS_QKNORM_QUANT_FUSION regardless of ATOM_USE_T… by @k50112113 in #139
- fuse rmsnorm/quant and act_mul/quant for mxfp4 llama70B by @scxiao in #129
- use ck mha instead of triton unified_attention for sink and window by @junhaha666 in #118
- Fix attention mha logic error by @ZhangLirong-amd in #141
- CI: Add gpt-oss-120b 2 GPUs test by @gyohuangxin in #143
- shuffle_weights_update by @valarLip in #144
- Add the external facing doc draft for review by @ChuanLi1101 in #99
- CI: Re-enable dual-arch builds in the Docker nightly releases by @gyohuangxin...