
add the configs for qwen3-vl-8b-instruct model #542

Open
sunny-infra wants to merge 1 commit into sgl-project:main from sunny-infra:main

Conversation

@sunny-infra

Motivation

Add EAGLE3 draft model configuration for the Qwen3-VL-8B-Instruct model, enabling speculative decoding training support for this newly released vision-language model.

Currently, the project supports Qwen2.5-VL series VLMs (7B/32B) for EAGLE3 training, but lacks support for the Qwen3-VL series. This PR fills that gap by providing the necessary draft model config.

Modifications

Added configs/qwen3-vl-8b-instruct-eagle3.json: EAGLE3 draft model configuration for Qwen3-VL-8B-Instruct, with the following key parameters:

  • target_model_type: "qwen3_vl" (identifies this as a Qwen3-VL target model)
  • hidden_size: 4096, intermediate_size: 12288, num_attention_heads: 32, num_key_value_heads: 8
  • rope_scaling: mRoPE with mrope_interleaved: true and mrope_section: [24, 20, 20] (differs from Qwen2.5-VL's [16, 24, 24])
  • rope_theta: 5000000 (differs from Qwen2.5-VL's 1000000)
  • vocab_size: 151936, image_token_id: 151655, video_token_id: 151656
  • VLM-specific token IDs: vision_start_token_id, vision_end_token_id
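As a quick sanity check on the parameters listed above, the following sketch verifies that they are mutually consistent. It uses only values stated in this description (plus rope_type from the diff context in the review); check_draft_config is a hypothetical helper, not part of the project, and it assumes head_dim equals hidden_size / num_attention_heads.

```python
# Values copied from the PR description; fields not listed there are omitted.
draft_config = {
    "target_model_type": "qwen3_vl",
    "hidden_size": 4096,
    "intermediate_size": 12288,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "rope_theta": 5000000,
    "rope_scaling": {
        "rope_type": "mrope",
        "mrope_interleaved": True,
        "mrope_section": [24, 20, 20],
    },
    "vocab_size": 151936,
    "image_token_id": 151655,
    "video_token_id": 151656,
}


def check_draft_config(cfg):
    """Hypothetical helper: return (head_dim, rotary_pairs) if cfg is consistent.

    Assumption: head_dim is hidden_size // num_attention_heads.
    """
    heads = cfg["num_attention_heads"]
    kv_heads = cfg["num_key_value_heads"]
    if cfg["hidden_size"] % heads:
        raise ValueError("hidden_size must be divisible by num_attention_heads")
    if heads % kv_heads:
        raise ValueError("num_attention_heads must be divisible by num_key_value_heads")
    head_dim = cfg["hidden_size"] // heads
    # Each mrope_section entry counts frequency pairs, so the sections
    # together must cover half of the head dimension.
    rotary_pairs = sum(cfg["rope_scaling"]["mrope_section"])
    if rotary_pairs * 2 != head_dim:
        raise ValueError("mrope_section must sum to head_dim // 2")
    return head_dim, rotary_pairs


print(check_draft_config(draft_config))  # (128, 64)
```

Both [24, 20, 20] and Qwen2.5-VL's [16, 24, 24] sum to 64, so the two splits cover the same number of frequency pairs and differ only in how those pairs are divided among the temporal, height, and width axes.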

Related Issues

None.

Accuracy Test

This PR only adds a configuration file and does not modify model-side code (kernels, architecture). No accuracy impact expected.

Benchmark & Profiling

No performance impact: this PR only adds a config file. Full support for training this model will be added in a follow-up update.



@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a new configuration file for the qwen3-vl-8b-instruct-eagle3 draft model. Several critical issues were identified regarding the compatibility of this configuration with the existing codebase: the target_model_type is not yet supported in the model mapping, the rope_theta value is ignored by the draft model's rotary embedding initialization, and the mrope_interleaved parameter is not handled by the current implementation. These issues will likely result in runtime failures or significant embedding mismatches.

],
"image_token_id": 151655,
"model_type": "llama",
"target_model_type": "qwen3_vl",

high

The target_model_type set to "qwen3_vl" is not currently supported in the AutoDistributedTargetModel._model_mapping within specforge/modeling/auto.py. Without adding this mapping (e.g., mapping to a Qwen3VLForCausalLM class), the target model will fail to instantiate during training or inference.
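The failure mode described here can be illustrated with a small registry lookup. This is only a sketch: example_mapping, its class-name strings, and resolve_target are hypothetical stand-ins, and the real structure of AutoDistributedTargetModel._model_mapping may differ.

```python
# Hypothetical stand-in for a model_type -> class-name registry.
# The entries below are illustrative, not the project's actual mapping.
example_mapping = {
    "llama": "ExampleLlamaTarget",
    "qwen2_5_vl": "ExampleQwen25VLTarget",
}


def resolve_target(model_type, mapping):
    """Look up a target model class name, failing with a readable message."""
    try:
        return mapping[model_type]
    except KeyError:
        supported = ", ".join(sorted(mapping))
        raise ValueError(
            f"unsupported target_model_type {model_type!r}; supported: {supported}"
        ) from None


try:
    resolve_target("qwen3_vl", example_mapping)
except ValueError as err:
    print(err)

# Registering the new type (class name is a placeholder) makes the lookup succeed.
example_mapping["qwen3_vl"] = "ExampleQwen3VLTarget"
print(resolve_target("qwen3_vl", example_mapping))
```

A config file alone does not change such a lookup; the new model type has to be registered in code before the config can be used.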

],
"rope_type": "mrope"
},
"rope_theta": 5000000,

high

The rope_theta value of 5,000,000 will be ignored for the draft model. In specforge/modeling/draft/llama3_eagle.py, the LlamaAttention._init_rope method instantiates LlamaMutiRotaryEmbedding (used for mrope) without passing the base (theta) parameter, causing it to default to 10,000. This will lead to a significant mismatch between the draft and target model embeddings.
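To give a sense of the size of this mismatch, the sketch below compares the standard RoPE inverse frequencies, base^(-2i/d), for the default base of 10,000 and the configured 5,000,000. It assumes a head dimension of 128 (hidden_size 4096 / 32 heads) and is not taken from the project's code; rope_inv_freq is a hypothetical helper.

```python
def rope_inv_freq(base, head_dim):
    """Standard RoPE inverse frequencies: base ** (-2i / head_dim) for each pair i."""
    return [base ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


HEAD_DIM = 128  # assumption: hidden_size 4096 / num_attention_heads 32

default_freqs = rope_inv_freq(10_000.0, HEAD_DIM)         # fallback base
configured_freqs = rope_inv_freq(5_000_000.0, HEAD_DIM)   # rope_theta from the config

# The first pair always rotates at frequency 1.0 regardless of base;
# the gap grows with the pair index and is largest for the last pair.
ratio_last = default_freqs[-1] / configured_freqs[-1]
print(default_freqs[0], configured_freqs[0])
print(f"slowest-pair frequency ratio: {ratio_last:.1f}")
```

With the fallback base, the slowest-rotating pairs turn several hundred times faster than the target model expects, so positions far apart would be encoded very differently in the draft and target models.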

"num_key_value_heads": 8,
"rms_norm_eps": 1e-06,
"rope_scaling": {
"mrope_interleaved": true,

medium

The mrope_interleaved parameter is not currently handled by the apply_multimodal_rotary_pos_emb function in specforge/modeling/draft/llama3_eagle.py. The existing implementation assumes contiguous chunks for the temporal, height, and width sections based on mrope_section. If Qwen3-VL uses an interleaved layout, the draft model's rotary embeddings will be incorrect.
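The difference between the two layouts can be sketched by labeling each of the 64 frequency pairs with the axis (T, H, or W) that supplies its position. The interleaved variant below reflects one reading of how the Hugging Face Qwen3-VL implementation assigns pairs, and should be verified against the target model code; both functions are hypothetical helpers, not part of the project.

```python
def contiguous_layout(section):
    """Contiguous chunks: all T pairs, then all H pairs, then all W pairs."""
    layout = []
    for axis, count in zip("THW", section):
        layout.extend([axis] * count)
    return layout


def interleaved_layout(section):
    """Interleaved sketch (assumption): T H W T H W ..., leftover pairs stay T."""
    layout = ["T"] * sum(section)
    for axis_index, axis in ((1, "H"), (2, "W")):
        # Axis k takes every third slot starting at offset k,
        # up to 3 * section[k] slots into the sequence.
        for i in range(axis_index, section[axis_index] * 3, 3):
            layout[i] = axis
    return layout


section = [24, 20, 20]  # mrope_section from the config
contiguous = contiguous_layout(section)
interleaved = interleaved_layout(section)

print("".join(contiguous))
print("".join(interleaved))
```

Both layouts give each axis the same number of pairs, but they assign different frequency indices to each axis, so a draft model that applies the contiguous layout to an interleaved target would rotate most pairs by the wrong position component.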
