add the configs for qwen3-vl-8b-instruct model by sunny-infra · Pull Request #542 · sgl-project/SpecForge

sunny-infra · 2026-04-23T14:15:08Z

Motivation

Add EAGLE3 draft model configuration for the Qwen3-VL-8B-Instruct model, enabling speculative decoding training support for this newly released vision-language model.

Currently, the project supports Qwen2.5-VL series VLMs (7B/32B) for EAGLE3 training, but lacks support for the Qwen3-VL series. This PR fills that gap by providing the necessary draft model config.

Modifications

Added configs/qwen3-vl-8b-instruct-eagle3.json: EAGLE3 draft model configuration for Qwen3-VL-8B-Instruct, with the following key parameters:

target_model_type: "qwen3_vl" (identifies this as a Qwen3-VL target model)
hidden_size: 4096, intermediate_size: 12288, num_attention_heads: 32, num_key_value_heads: 8
rope_scaling: mRoPE with mrope_interleaved: true and mrope_section: [24, 20, 20] (differs from Qwen2.5-VL's [16, 24, 24])
rope_theta: 5000000 (differs from Qwen2.5-VL's 1000000)
vocab_size: 151936, image_token_id: 151655, video_token_id: 151656
VLM-specific token IDs: vision_start_token_id, vision_end_token_id
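
The listed values can be cross-checked with a little arithmetic. The sketch below is not the full config file; it only copies values quoted above. It assumes, as is usual for rotary embeddings, that head_dim equals hidden_size / num_attention_heads and that the mrope_section entries should sum to head_dim / 2.

```python
# Partial config: only the fields quoted in the PR description.
config = {
    "hidden_size": 4096,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "rope_theta": 5000000,
    "rope_scaling": {"mrope_interleaved": True, "mrope_section": [24, 20, 20]},
}

# Assumption: head_dim is derived from hidden_size and the head count.
head_dim = config["hidden_size"] // config["num_attention_heads"]

# Rotary embeddings rotate pairs of channels, so there are head_dim / 2 frequencies.
rotary_pairs = head_dim // 2

# The three mRoPE sections (temporal, height, width) should cover every frequency.
section_total = sum(config["rope_scaling"]["mrope_section"])

# Number of query heads sharing each key/value head (grouped-query attention).
gqa_group = config["num_attention_heads"] // config["num_key_value_heads"]

print(head_dim, rotary_pairs, section_total, gqa_group)
```

Under these assumptions the sections [24, 20, 20] add up to the 64 rotary frequency slots of a 128-wide head, so the section sizes are at least internally consistent with the other fields.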

Related Issues

None.

Accuracy Test

This PR only adds a configuration file and does not modify model-side code (kernels, architecture). No accuracy impact expected.

Benchmark & Profiling

No performance impact: this PR only adds a config file. Full model training support will follow in a later update.

Checklist

Format your code according to the Code Formatting with Pre-Commit.
Add unit tests as outlined in the Running Unit Tests.
Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.
Please feel free to join our Slack channel at https://sgl-fru7574.slack.com/archives/C09784E3EN6 to discuss your PR.

gemini-code-assist

Code Review

This pull request introduces a new configuration file for the qwen3-vl-8b-instruct-eagle3 draft model. Several critical issues were identified regarding the compatibility of this configuration with the existing codebase: the target_model_type is not yet supported in the model mapping, the rope_theta value is ignored by the draft model's rotary embedding initialization, and the mrope_interleaved parameter is not handled by the current implementation. These issues will likely result in runtime failures or significant embedding mismatches.

gemini-code-assist · 2026-04-23T14:16:47Z

+  ],
+  "image_token_id": 151655,
+  "model_type": "llama",
+  "target_model_type": "qwen3_vl",


The target_model_type set to "qwen3_vl" is not currently supported in the AutoDistributedTargetModel._model_mapping within specforge/modeling/auto.py. Without adding this mapping (e.g., mapping to a Qwen3VLForCausalLM class), the target model will fail to instantiate during training or inference.
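
To illustrate the failure mode described here, the following is a minimal sketch of a type-to-loader registry. It does not reproduce the actual structure of AutoDistributedTargetModel._model_mapping; the registry keys, the loader names, and the resolve_target helper are all hypothetical.

```python
# Hypothetical registry keyed by model_type string. The values are placeholder
# names, not real SpecForge or Transformers classes.
SUPPORTED_TARGETS = {
    "qwen2_5_vl": "Qwen25VLTargetLoader",
}


def resolve_target(model_type, registry):
    """Return the loader registered for model_type, or fail with a clear error."""
    try:
        return registry[model_type]
    except KeyError:
        supported = ", ".join(sorted(registry))
        raise ValueError(
            f"Unsupported target_model_type {model_type!r}; supported: {supported}"
        ) from None


# Without a registered entry, the lookup for the new type fails.
try:
    resolve_target("qwen3_vl", SUPPORTED_TARGETS)
    lookup_failed = False
except ValueError:
    lookup_failed = True

# Adding an entry (placeholder loader name) makes the lookup succeed.
extended = dict(SUPPORTED_TARGETS)
extended["qwen3_vl"] = "Qwen3VLTargetLoader"
resolved = resolve_target("qwen3_vl", extended)
```

The point of the sketch is only that a config naming an unregistered type cannot be resolved until a matching entry exists, whatever form the real mapping takes.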

gemini-code-assist · 2026-04-23T14:16:47Z

+    ],
+    "rope_type": "mrope"
+  },
+  "rope_theta": 5000000,


The rope_theta value of 5,000,000 will be ignored for the draft model. In specforge/modeling/draft/llama3_eagle.py, the LlamaAttention._init_rope method instantiates LlamaMutiRotaryEmbedding (used for mrope) without passing the base (theta) parameter, causing it to default to 10,000. This will lead to a significant mismatch between the draft and target model embeddings.
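
The size of this mismatch can be estimated with the standard RoPE inverse-frequency formula, inv_freq[i] = base^(-2i / head_dim). The sketch below is plain Python rather than the SpecForge rotary embedding class, and it assumes head_dim = 128 (4096 / 32 from the config).

```python
def inv_freq(base, head_dim):
    """Standard RoPE inverse frequencies: base ** (-2i / head_dim) for each channel pair."""
    return [base ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


HEAD_DIM = 128  # assumption: hidden_size 4096 / num_attention_heads 32

default_freqs = inv_freq(10_000.0, HEAD_DIM)        # base used when theta is not passed
configured_freqs = inv_freq(5_000_000.0, HEAD_DIM)  # rope_theta from the config

# Rotation angle at a given position is position * inv_freq; compare the two bases.
position = 1000
angle_gaps = [
    abs(position * (a - b)) for a, b in zip(default_freqs, configured_freqs)
]

print("largest angle gap (radians):", max(angle_gaps))
print("slowest-frequency ratio:", default_freqs[-1] / configured_freqs[-1])
```

Only the first channel pair agrees between the two bases (both have frequency 1.0); every other pair rotates at a different rate, which is why the draft and target positional encodings would diverge.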

gemini-code-assist · 2026-04-23T14:16:47Z

+  "num_key_value_heads": 8,
+  "rms_norm_eps": 1e-06,
+  "rope_scaling": {
+    "mrope_interleaved": true,


The mrope_interleaved parameter is not currently handled by the apply_multimodal_rotary_pos_emb function in specforge/modeling/draft/llama3_eagle.py. The existing implementation assumes contiguous chunks for the temporal, height, and width sections based on mrope_section. If Qwen3-VL uses an interleaved layout, the draft model's rotary embeddings will be incorrect.
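
The difference between the two layouts can be made concrete by labelling which axis (T, H, or W) owns each rotary frequency slot. The contiguous layout follows the description above. The interleaved layout is an assumption based on the pattern used for Qwen3-VL in Hugging Face Transformers as I understand it (H and W take every third slot within the first 3 * section length slots, and T keeps the rest); it should be verified against the target model before relying on it.

```python
def contiguous_layout(section):
    """Chunked layout: all temporal slots, then all height slots, then all width slots."""
    t, h, w = section
    return ["T"] * t + ["H"] * h + ["W"] * w


def interleaved_layout(section):
    """Assumed interleaved layout: T H W T H W ... with leftover slots staying T."""
    layout = ["T"] * sum(section)
    for axis, label in ((1, "H"), (2, "W")):
        # Axis `axis` overwrites every third slot, starting at its own offset.
        for i in range(axis, section[axis] * 3, 3):
            layout[i] = label
    return layout


section = [24, 20, 20]  # mrope_section from the config
chunked = contiguous_layout(section)
interleaved = interleaved_layout(section)

# Count the slots that would read positions from the wrong axis.
mismatched = sum(1 for a, b in zip(chunked, interleaved) if a != b)

print("".join(chunked))
print("".join(interleaved))
print("mismatched slots:", mismatched)
```

Both layouts give each axis the same number of slots (24, 20, 20), so the section sizes alone do not reveal the problem; the mismatch lies in which slots each axis occupies.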

add the configs for qwen3-vl-8b-instruct model

41ba68b

sunny-infra requested review from FlamingoPg and FrankLeeeee as code owners April 23, 2026 14:15

gemini-code-assist Bot reviewed Apr 23, 2026

sunny-infra wants to merge 1 commit into sgl-project:main from sunny-infra:main
