add the configs for qwen3-vl-8b-instruct model #542
Code Review
This pull request introduces a new configuration file for the qwen3-vl-8b-instruct-eagle3 draft model. Several critical issues were identified regarding the compatibility of this configuration with the existing codebase: the target_model_type is not yet supported in the model mapping, the rope_theta value is ignored by the draft model's rotary embedding initialization, and the mrope_interleaved parameter is not handled by the current implementation. These issues will likely result in runtime failures or significant embedding mismatches.
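The three issues above could be caught before training starts. The sketch below is a hypothetical pre-flight validator and not part of the project: `SUPPORTED_TARGET_TYPES` and `check_draft_config` are invented names, and the set's contents are an assumption based on the Qwen2.5-VL support mentioned in this PR.

```python
# Hypothetical stand-in for the project's real model mapping.
SUPPORTED_TARGET_TYPES = {"qwen2_5_vl"}


def check_draft_config(config: dict) -> list[str]:
    """Return human-readable warnings for the issues raised in this review."""
    warnings = []

    # Issue 1: the target model type must be known to the model mapping.
    target = config.get("target_model_type")
    if target is not None and target not in SUPPORTED_TARGET_TYPES:
        warnings.append(f"target_model_type {target!r} is not in the model mapping")

    # Issue 2: a non-default rope_theta is dropped on the mrope path.
    rope_scaling = config.get("rope_scaling") or {}
    if rope_scaling.get("rope_type") == "mrope" and config.get("rope_theta", 10_000) != 10_000:
        warnings.append("rope_theta is set but the mrope path falls back to base 10,000")

    # Issue 3: only the contiguous mrope layout is implemented.
    if rope_scaling.get("mrope_interleaved"):
        warnings.append("mrope_interleaved is set but only the contiguous layout is implemented")

    return warnings


# Only the fields quoted in the review comments; the rest of the config is omitted.
config = {
    "model_type": "llama",
    "target_model_type": "qwen3_vl",
    "rope_theta": 5000000,
    "rope_scaling": {"mrope_interleaved": True, "rope_type": "mrope"},
}

for warning in check_draft_config(config):
    print("WARNING:", warning)
```

Returning a list of warnings rather than raising on the first problem lets a reviewer see every incompatibility in one pass.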
      ],
      "image_token_id": 151655,
      "model_type": "llama",
      "target_model_type": "qwen3_vl",
        ],
        "rope_type": "mrope"
      },
      "rope_theta": 5000000,
The rope_theta value of 5,000,000 will be ignored for the draft model. In specforge/modeling/draft/llama3_eagle.py, the LlamaAttention._init_rope method instantiates LlamaMutiRotaryEmbedding (used for mrope) without passing the base (theta) parameter, causing it to default to 10,000. This will lead to a significant mismatch between the draft and target model embeddings.
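To illustrate the size of this mismatch, the sketch below compares the rotary inverse frequencies produced by the two bases. It uses the standard RoPE formula `inv_freq[i] = base ** (-2i / head_dim)`. The head dimension of 128 is an assumption for illustration and is not taken from the config in this PR, and `rope_inv_freq` is a hypothetical helper, not a function from the codebase.

```python
def rope_inv_freq(base: float, head_dim: int) -> list[float]:
    """Standard RoPE inverse frequencies: base ** (-2i / head_dim)."""
    return [base ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


HEAD_DIM = 128  # assumed for illustration; not taken from the PR's config

default_freqs = rope_inv_freq(10_000.0, HEAD_DIM)    # fallback base
target_freqs = rope_inv_freq(5_000_000.0, HEAD_DIM)  # rope_theta from the config

# Index 0 is identical for any base; the gap widens as the index grows.
for i in (0, 16, 32, 63):
    ratio = default_freqs[i] / target_freqs[i]
    print(
        f"i={i:2d}  default={default_freqs[i]:.3e}  "
        f"target={target_freqs[i]:.3e}  ratio={ratio:.1f}"
    )
```

Because the ratio between the two sets of frequencies grows with the index, the low-frequency components that encode long-range position diverge the most.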
      "num_key_value_heads": 8,
      "rms_norm_eps": 1e-06,
      "rope_scaling": {
        "mrope_interleaved": true,
The mrope_interleaved parameter is not currently handled by the apply_multimodal_rotary_pos_emb function in specforge/modeling/draft/llama3_eagle.py. The existing implementation assumes contiguous chunks for the temporal, height, and width sections based on mrope_section. If Qwen3-VL uses an interleaved layout, the draft model's rotary embeddings will be incorrect.
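The difference between the two layouts can be sketched by labeling which axis (temporal, height, or width) drives each rotary frequency index. The interleaved pattern here assumes the scheme used in the Hugging Face transformers Qwen3-VL implementation, where height and width take every third index and temporal keeps the rest. The `mrope_section` value of `[24, 20, 20]` is an assumption for illustration, and both function names are hypothetical.

```python
def chunked_layout(mrope_section: list[int]) -> list[str]:
    """Contiguous layout: all T indices, then all H, then all W."""
    t, h, w = mrope_section
    return ["T"] * t + ["H"] * h + ["W"] * w


def interleaved_layout(mrope_section: list[int]) -> list[str]:
    """Interleaved layout (assumed): start with T everywhere, then H and W
    overwrite every third index starting at offsets 1 and 2."""
    layout = ["T"] * sum(mrope_section)
    for axis_index, (axis, offset) in enumerate((("H", 1), ("W", 2)), start=1):
        length = mrope_section[axis_index] * 3
        for i in range(offset, length, 3):
            layout[i] = axis
    return layout


section = [24, 20, 20]  # assumed for illustration
chunked = chunked_layout(section)
interleaved = interleaved_layout(section)

print("".join(chunked))
print("".join(interleaved))

mismatched = sum(a != b for a, b in zip(chunked, interleaved))
print(f"{mismatched} of {len(chunked)} frequency indices use a different axis")
```

Both layouts give each axis the same number of frequency indices. Only the assignment of indices to axes changes, which is why a draft model using the contiguous layout would load without error yet rotate most dimensions by the wrong position component.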
Motivation
Add EAGLE3 draft model configuration for the Qwen3-VL-8B-Instruct model, enabling speculative decoding training support for this newly released vision-language model.
Currently, the project supports Qwen2.5-VL series VLMs (7B/32B) for EAGLE3 training, but lacks support for the Qwen3-VL series. This PR fills that gap by providing the necessary draft model config.
Modifications
Added configs/qwen3-vl-8b-instruct-eagle3.json: EAGLE3 draft model configuration for Qwen3-VL-8B-Instruct, with the following key parameters:
Related Issues
None.
Accuracy Test
This PR only adds a configuration file and does not modify model-side code (kernels, architecture). No accuracy impact expected.
Benchmark & Profiling
No performance impact: this PR only adds a config file. Full support for training this model will be added in a follow-up.
Checklist