Bbq support #5

Open
akhauriyash wants to merge 4 commits into prod from bbq_support
@akhauriyash
Collaborator

Summary

Adds an opt-in Megatron-LM RoPE path for xLLM checkpoints that use partial RoPE with the HF/SGLang head-dimension layout.

Why

For xLLM 375B, rope_head_dim=64 and head_dim=128, so only half of each head is rotated. Megatron’s standard rotary_percent=0.5 path rotates the first contiguous 64 head dims, while the xLLM HF/SGLang implementation applies RoPE after converting through the xLLM head layout. Because the two paths rotate different head dimensions, the trainer and the rollout engine disagree on the same weights, which can produce large trainer-vs-rollout logprob differences.
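To make the mismatch concrete, here is a minimal pure-Python sketch of partial RoPE on a single 128-dim head with 64 rotated dims. `rotate_contiguous_half` follows the usual non-interleaved convention of pairing dim i with dim i + 32 inside the first 64 dims. `rotate_interleaved_pairs` is a purely hypothetical alternative pairing used only for contrast; this description does not spell out the actual xLLM head layout, so it should not be read as the xLLM implementation. The function names and the `base=10000.0` default are assumptions.

```python
import math


def rope_angles(position, rot_dim, base=10000.0):
    # One angle per rotated pair: theta_i = position * base^(-2i / rot_dim).
    return [position * base ** (-2.0 * i / rot_dim) for i in range(rot_dim // 2)]


def rotate_contiguous_half(x, position, rot_dim):
    """Rotate the first rot_dim dims, pairing dim i with dim i + rot_dim/2.

    Dims at index >= rot_dim pass through unchanged.
    """
    half = rot_dim // 2
    out = list(x)
    for i, theta in enumerate(rope_angles(position, rot_dim)):
        a, b = x[i], x[i + half]
        out[i] = a * math.cos(theta) - b * math.sin(theta)
        out[i + half] = b * math.cos(theta) + a * math.sin(theta)
    return out


def rotate_interleaved_pairs(x, position, rot_dim):
    """HYPOTHETICAL contrast layout: pair adjacent dims (2i, 2i+1).

    This is an illustrative assumption, not the xLLM layout.
    """
    out = list(x)
    for i, theta in enumerate(rope_angles(position, rot_dim)):
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * math.cos(theta) - b * math.sin(theta)
        out[2 * i + 1] = b * math.cos(theta) + a * math.sin(theta)
    return out


head_dim, rot_dim, position = 128, 64, 5
x = [j / head_dim for j in range(head_dim)]  # toy query vector

contiguous = rotate_contiguous_half(x, position, rot_dim)
interleaved = rotate_interleaved_pairs(x, position, rot_dim)

# Same weights, same position, different pairing: the rotated vectors differ.
max_diff = max(abs(p - q) for p, q in zip(contiguous, interleaved))
print(f"max abs difference between layouts: {max_diff:.4f}")
```

Both functions are norm-preserving rotations that leave the last 64 dims untouched, so the discrepancy comes entirely from which dims are paired. Any such disagreement between trainer and rollout feeds into the attention scores and from there into the logprobs.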

Changes

  • Add --xllm-partial-rope-layout
  • Add TransformerConfig.xllm_partial_rope_layout
  • Route unfused RoPE through an xLLM-specific partial-RoPE layout path when enabled
  • Disable fused RoPE for this opt-in path
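The wiring described in the list above could look roughly like the following sketch. Only the flag name `--xllm-partial-rope-layout` and the field name `xllm_partial_rope_layout` come from this PR; `SketchTransformerConfig`, the `fused_rope` field, `config_from_args`, `select_rope_path`, and the returned path labels are hypothetical stand-ins, not the Megatron-LM code.

```python
import argparse
from dataclasses import dataclass


@dataclass
class SketchTransformerConfig:
    """Hypothetical stand-in for TransformerConfig; models two fields only."""
    xllm_partial_rope_layout: bool = False  # field name taken from the PR
    fused_rope: bool = False                # hypothetical fused-RoPE toggle


def build_parser():
    parser = argparse.ArgumentParser()
    # Opt-in flag: absent means existing RoPE behavior is unchanged.
    parser.add_argument("--xllm-partial-rope-layout", action="store_true")
    return parser


def config_from_args(args, fused_rope=True):
    config = SketchTransformerConfig(
        xllm_partial_rope_layout=args.xllm_partial_rope_layout,
        fused_rope=fused_rope,
    )
    # The opt-in layout only exists on the unfused path, so force fusion off.
    if config.xllm_partial_rope_layout:
        config.fused_rope = False
    return config


def select_rope_path(config):
    # Returns a label for the path that would be taken (labels are hypothetical).
    if config.xllm_partial_rope_layout:
        return "xllm_partial_unfused"
    return "fused" if config.fused_rope else "standard_unfused"


default_cfg = config_from_args(build_parser().parse_args([]))
xllm_cfg = config_from_args(build_parser().parse_args(["--xllm-partial-rope-layout"]))
print(select_rope_path(default_cfg), select_rope_path(xllm_cfg))
```

Forcing the fused toggle off at config-construction time, rather than at each call site, keeps the opt-in path from silently falling back to a fused kernel that does not know about the layout.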

Scope

The new behavior is disabled by default. Existing Megatron RoPE behavior is unchanged unless --xllm-partial-rope-layout is passed.

Validation

  • python3 -m py_compile megatron/core/models/common/embeddings/rope_utils.py megatron/core/transformer/transformer_config.py megatron/training/arguments.py
  • git diff --check

…ansformer_config.py: adds layernorm_num_groups, exposed as --layernorm-num-groups.

  - Megatron-LM/megatron/core/transformer/torch_norm.py: adds native GroupRMSNorm.
  - Megatron-LM/megatron/core/extensions/transformer_engine.py: makes TENorm return GroupRMSNorm when groups > 1.
  - Megatron-LM/megatron/core/extensions/transformer_engine_spec_provider.py and Megatron-LM/megatron/core/models/gpt/gpt_layer_specs.py: disable TE fused LN+linear when grouped RMSNorm is used and add the checkpoint key mapping for unfused norms.
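For readers unfamiliar with grouped RMSNorm, the following is a minimal pure-Python sketch of one plausible reading: the hidden dimension is split into equal groups and each group is normalized by its own root-mean-square. The function name `group_rms_norm`, the `eps` default, and the exact semantics are assumptions; the GroupRMSNorm in torch_norm.py may differ.

```python
import math


def group_rms_norm(x, weight, num_groups, eps=1e-6):
    """Normalize each contiguous group of x by that group's RMS (assumed semantics)."""
    hidden = len(x)
    if hidden % num_groups != 0:
        raise ValueError("hidden size must be divisible by num_groups")
    group_size = hidden // num_groups
    out = []
    for g in range(num_groups):
        chunk = x[g * group_size:(g + 1) * group_size]
        rms = math.sqrt(sum(v * v for v in chunk) / group_size + eps)
        out.extend(v / rms for v in chunk)
    # Elementwise learned scale, as in plain RMSNorm.
    return [o * w for o, w in zip(out, weight)]


x = [1.0, 1.0, 1.0, 1.0, 10.0, 10.0, 10.0, 10.0]
ones = [1.0] * len(x)

grouped = group_rms_norm(x, ones, num_groups=2)  # each half normalized separately
plain = group_rms_norm(x, ones, num_groups=1)    # reduces to ordinary RMSNorm
print(grouped)
print(plain)
```

With two groups, both halves end up at roughly unit scale, whereas a single group lets the large half dominate the statistic. Under this reading the norm statistic is no longer a single reduction over the hidden dimension, which may be why the fused LN+linear path is disabled when groups > 1.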