Enable MiniMax-M3 vLLM plugin path by lirui927 · Pull Request #1342 · ROCm/ATOM

lirui927 · 2026-06-24T09:49:34Z

Motivation

Enable MiniMax-M3 vLLM plugin path.

Technical Details

Add MiniMax-M3 sparse MHA support in the vLLM plugin path.
Align sparse MHA KV cache handling with the page-shuffled layout expected by ATOM kernels.
Reuse vLLM-provided output buffers in sparse prefill/decode to avoid extra allocation and copy.
Handle actual-token slicing for padded vLLM batches and zero padded output tails.
Route mixed decode+prefill sparse batches through the prefill path so per-token sparse block tables are built correctly.
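The last two points above can be sketched in a few lines. This is an illustration only, not the code from this PR: `select_sparse_path` and `write_actual_tokens` are hypothetical names, and plain Python lists stand in for the tensors the real adapter presumably operates on.

```python
from typing import List


def select_sparse_path(num_decode_tokens: int, num_prefill_tokens: int) -> str:
    """Pick the attention path for one batch (hypothetical helper)."""
    if num_prefill_tokens > 0:
        # Pure prefill and mixed decode+prefill batches both take the prefill
        # path, where sparse block tables are built per token.
        return "prefill"
    return "decode"


def write_actual_tokens(
    output: List[List[float]],
    result: List[List[float]],
    num_actual_tokens: int,
) -> List[List[float]]:
    """Fill the caller-provided buffer in place and zero the padded tail.

    `output` has one row per padded token slot; only the first
    `num_actual_tokens` rows correspond to real tokens.
    """
    hidden = len(output[0]) if output else 0
    for i in range(len(output)):
        if i < num_actual_tokens:
            output[i][:] = result[i]  # reuse the existing row, no new buffer
        else:
            output[i][:] = [0.0] * hidden  # padded slot: clear stale values
    return output
```

Writing into the provided buffer rather than returning a fresh one is what avoids the extra allocation and copy; zeroing the tail keeps stale values in padded slots from leaking downstream.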

Test Plan

Start the vLLM service and run the full GSM8K benchmark.
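The PR does not say which evaluation harness was used. As a rough sketch of what a GSM8K score measures, the snippet below computes exact-match accuracy on final numeric answers. It relies on the fact that GSM8K reference solutions end with `#### <answer>`; the function names and the "last number in the completion" heuristic are assumptions made for illustration.

```python
import re
from typing import List, Optional


def extract_final_number(text: str) -> Optional[float]:
    """Return the last number in a model completion, or None if there is none."""
    matches = re.findall(r"-?\d[\d,]*(?:\.\d+)?", text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def reference_answer(solution: str) -> float:
    """Parse the value after the '####' marker in a GSM8K reference solution."""
    return float(solution.split("####")[-1].strip().replace(",", ""))


def exact_match(predictions: List[str], references: List[str]) -> float:
    """Fraction of completions whose final number equals the reference answer."""
    correct = 0
    for pred, ref in zip(predictions, references):
        if extract_final_number(pred) == reference_answer(ref):
            correct += 1
    return correct / len(references) if references else 0.0
```

Comparing the atom native and vllm-atom runs with the same scoring rule is what makes the two results comparable.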

Test Result

atom native

vllm-atom

## Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.


XiaobingSuper force-pushed the lirui/vllm_atom_m3_0624 branch 3 times, most recently from f9d842b to 35ec283, on June 24, 2026 17:52

zejunchen-zejun reviewed Jun 25, 2026


Comment thread atom/models/minimax_m3.py Outdated

XiaobingSuper force-pushed the lirui/vllm_atom_m3_0624 branch 3 times, most recently from 0644b27 to bc8c975, on June 25, 2026 09:03

ganyi1996ppo reviewed Jun 25, 2026


Comment thread atom/plugin/vllm/attention/layer_mha.py Outdated

lirui927 and others added 2 commits June 25, 2026 13:15

[MiniMax-M3] support sparse MHA serving in vLLM

6688e00

Fix PTPC FP8 MoE loading to preserve offline checkpoint bits and wire MiniMax-M3 sparse MHA metadata/backend support for vLLM serving. Co-authored-by: Cursor <cursoragent@cursor.com>

[MiniMax-M3] optimize sparse MHA vLLM output path

39ebb75

Reuse vLLM-provided output buffers in sparse MHA prefill/decode and align the adapter with the page-shuffled KV cache layout used by MiniMax-M3 serving. Co-authored-by: Cursor <cursoragent@cursor.com>

XiaobingSuper force-pushed the lirui/vllm_atom_m3_0624 branch from f81cf00 to 39ebb75 on June 25, 2026 13:21

XiaobingSuper and others added 3 commits June 25, 2026 09:22

[MiniMax-M3] initialize sparse MHA vLLM cache state

4d4c143

Co-authored-by: Cursor <cursoragent@cursor.com>

[MiniMax-M3] register ATOM MXFP8 quant config

510a01f

Co-authored-by: Cursor <cursoragent@cursor.com>

[MiniMax-M3] fix sparse MHA fp8 metadata alignment

905922d

Keep mixed decode/prefill/extend batches phase-local and separate index-cache top-k metadata from main KV-cache sparse block emission to prevent cross-request fp8 accuracy drift.
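One way to read "phase-local" is that each request in a mixed batch is assigned to exactly one phase group before any metadata is built, so metadata computed for one phase is never applied to another. The sketch below is an assumption about that idea, not the PR's code: the `Request` fields and the classification rule (no cached tokens means prefill, one new token on top of cached tokens means decode, anything else means extend) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    req_id: str
    num_computed_tokens: int   # tokens already in the KV cache
    num_scheduled_tokens: int  # new tokens processed in this step


def classify_phase(req: Request) -> str:
    """Assign a request to one phase (assumed rule, for illustration)."""
    if req.num_computed_tokens == 0:
        return "prefill"
    if req.num_scheduled_tokens == 1:
        return "decode"
    return "extend"


def partition_by_phase(batch: List[Request]) -> Dict[str, List[str]]:
    """Group request ids by phase so each group gets its own metadata."""
    groups: Dict[str, List[str]] = {"decode": [], "prefill": [], "extend": []}
    for req in batch:
        groups[classify_phase(req)].append(req.req_id)
    return groups
```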

XiaobingSuper marked this pull request as draft June 26, 2026 02:47

zufayu requested review from ZhangLirong-amd and valarLip and removed request for ZhangLirong-amd June 26, 2026 06:12

lirui927 wants to merge 5 commits into main from lirui/vllm_atom_m3_0624


4 participants
