[minimax_m3] gate AR+RMSNorm fusion on ATOM_ENABLE_ALLREDUCE_RMSNORM_… by zejunchen-zejun · Pull Request #1344 · ROCm/ATOM

zejunchen-zejun · 2026-06-24T13:40:18Z

…FUSION

Previously, M3's target-model fusion was hardcoded on: fused_allreduce_gemma_rms_norm fused all-reduce + residual-add + Gemma RMSNorm whenever TP>1, ignoring the ATOM_ENABLE_ALLREDUCE_RMSNORM_FUSION env that the draft / deepseek_v2 / glm4_moe already honor. As a result, the env couldn't disable M3's fusion for A/B testing.

Make both halves of the fusion move together under the flag (default on):

- layernorm.py fused_allreduce_gemma_rms_norm: also require envs.ATOM_ENABLE_ALLREDUCE_RMSNORM_FUSION; when off it returns a plain residual-add + Gemma RMSNorm (expects an already-all-reduced input).
- minimax_m3.py: the RowParallel o_proj (dense + sparse attn), MoE experts, shared_experts and dense MLP down_proj now use reduce_results=not ENABLE_ALLREDUCE_RMSNORM_FUSION, so when the env is off they do their own all-reduce and the norm runs unfused.

Default (=1) is byte-equivalent to before. Set ATOM_ENABLE_ALLREDUCE_RMSNORM_FUSION=0 to disable M3 target fusion (linears all-reduce, norm unfused).
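
The reason both halves have to flip together can be shown with a small pure-Python sketch. It is not the ATOM implementation: all_reduce_sum, row_parallel_output, residual_norm and layer_step are hypothetical stand-ins, TP ranks are simulated as a list of per-rank partial outputs, and the Gemma RMSNorm is written in its usual x / rms(x) * (1 + weight) form. The point it illustrates is that exactly one side performs the all-reduce in either mode, so both settings of the flag should produce the same values.

```python
import math


def all_reduce_sum(partials):
    """Stand-in for a TP all-reduce: element-wise sum over per-rank partials."""
    return [sum(vals) for vals in zip(*partials)]


def gemma_rms_norm(x, weight, eps=1e-6):
    """Gemma-style RMSNorm: x / rms(x) * (1 + weight)."""
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * inv_rms * (1.0 + w) for v, w in zip(x, weight)]


def row_parallel_output(partials, reduce_results):
    """Hypothetical RowParallel epilogue: reduce here, or pass partials on."""
    return all_reduce_sum(partials) if reduce_results else partials


def residual_norm(x, residual, weight, fusion_enabled):
    """Toy gated norm: owns the all-reduce only when fusion is enabled."""
    if fusion_enabled:
        # Fused mode: input is still a list of per-rank partials.
        x = all_reduce_sum(x)
    # Unfused mode: input is expected to be already all-reduced.
    new_residual = [a + b for a, b in zip(x, residual)]
    return gemma_rms_norm(new_residual, weight), new_residual


def layer_step(partials, residual, weight, fusion_enabled):
    """Both halves read the same flag, so the reduce happens exactly once."""
    out = row_parallel_output(partials, reduce_results=not fusion_enabled)
    return residual_norm(out, residual, weight, fusion_enabled)


# Two simulated TP ranks, hidden size 2 (made-up numbers).
partials = [[1.0, 2.0], [3.0, 4.0]]
residual = [0.5, -0.5]
weight = [0.0, 0.1]

fused = layer_step(partials, residual, weight, fusion_enabled=True)
unfused = layer_step(partials, residual, weight, fusion_enabled=False)
```

If only one half were gated, the toy would either skip the reduce entirely (linears with reduce_results=False feeding an unfused norm) or apply it twice, which is why the PR ties reduce_results to the same flag the norm checks.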

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

zufayu requested review from JiaoliangYu and ZhangLirong-amd and removed request for ZhangLirong-amd June 26, 2026 06:12


zejunchen-zejun wants to merge 1 commit into wuhuikx/atom-m3-bf16-to-main from zejun/disable_rmsnorm_ar_fusion
