Enable MiniMax-M3 vLLM plugin path #1342
Draft
lirui927 wants to merge 5 commits into
f9d842b to 35ec283
0644b27 to bc8c975
Fix PTPC FP8 MoE loading to preserve offline checkpoint bits and wire MiniMax-M3 sparse MHA metadata/backend support for vLLM serving. Co-authored-by: Cursor <cursoragent@cursor.com>
Reuse vLLM-provided output buffers in sparse MHA prefill/decode and align the adapter with the page-shuffled KV cache layout used by MiniMax-M3 serving. Co-authored-by: Cursor <cursoragent@cursor.com>
f81cf00 to 39ebb75
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Keep mixed decode/prefill/extend batches phase-local and separate index-cache top-k metadata from main KV-cache sparse block emission to prevent cross-request fp8 accuracy drift.
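One possible reading of "phase-local" in the commit message above is that each request in a mixed batch is assigned to exactly one phase, and per-phase metadata is then built only from that phase's own requests. The sketch below illustrates that idea only; `Request`, `classify_phase`, and `split_by_phase` are hypothetical names and are not taken from this PR or from vLLM, and the classification rule is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical request record; field names are assumptions for illustration.
    request_id: str
    num_computed_tokens: int  # tokens already present in the KV cache
    num_new_tokens: int       # tokens scheduled for this step


def classify_phase(req: Request) -> str:
    """Assign a request to exactly one phase (assumed rule, not the PR's)."""
    if req.num_computed_tokens == 0:
        return "prefill"   # nothing cached yet
    if req.num_new_tokens == 1:
        return "decode"    # one new token on top of cached context
    return "extend"        # several new tokens on top of cached context


def split_by_phase(batch: list[Request]) -> dict[str, list[int]]:
    """Return batch indices grouped by phase, so later metadata
    construction never mixes requests from different phases."""
    groups: dict[str, list[int]] = {"decode": [], "prefill": [], "extend": []}
    for idx, req in enumerate(batch):
        groups[classify_phase(req)].append(idx)
    return groups


batch = [
    Request("a", num_computed_tokens=128, num_new_tokens=1),
    Request("b", num_computed_tokens=0, num_new_tokens=64),
    Request("c", num_computed_tokens=32, num_new_tokens=16),
    Request("d", num_computed_tokens=7, num_new_tokens=1),
]
groups = split_by_phase(batch)
```

With the groups kept separate, any top-k or sparse-block metadata for one phase can be computed from that phase's indices alone, which is the kind of isolation the commit message describes.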
Motivation
Enable MiniMax-M3 vLLM plugin path.
Technical Details
Test Plan
Test Result
atom native
vllm-atom