Add P-EAGLE training support by thyways · Pull Request #575 · sgl-project/SpecForge

thyways · 2026-06-04T16:09:58Z

Motivation

Closes #541.

P-EAGLE achieves speedups comparable to DFlash while also drafting multiple tokens in parallel, so SpecForge should support training P-EAGLE draft models.

This PR adds P-EAGLE training support to SpecForge, following the algorithmic direction from:

Paper: P-EAGLE: Parallel-Drafting EAGLE with Scalable Training
Code reference: vllm-project/speculators#480
RFC reference: vllm-project/speculators#292

The implementation adapts the P-EAGLE idea to SpecForge's existing online EAGLE3 training pipeline.

Demo model:

https://huggingface.co/thyways/qwen3_8b_peagle_demo

Modifications

This PR adds P-EAGLE training support.

Main changes:

Add PEagleDraftModel, including the P-EAGLE multi-layer draft architecture.
Add OnlinePEagleModel with COD parallel sampling, P-EAGLE attention masking, loss, and per-depth accuracy metrics.
Add scripts/train_peagle.py for online P-EAGLE training with FSDP, TP/DP support, checkpoint save/resume, mask-token resolution, and wandb logging.
Add Qwen3-8B P-EAGLE config and example training script.
Register P-EAGLE in model/core exports and draft config loading.
Improve conversation normalization for ShareGPT-style from/value messages.
Add minimum_valid_tokens filtering to drop samples without trainable tokens.
Fix SGLang target data padding for loss_mask.
Add regression tests for P-EAGLE metrics, COD sampling, attention masking, parser normalization, and trainable-token filtering.
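To illustrate the data-side changes listed above (ShareGPT normalization, `minimum_valid_tokens` filtering, and `loss_mask` padding), here is a minimal sketch in plain Python. Only the name `minimum_valid_tokens` comes from this PR; the function names, the role mapping, and the right-padding convention are assumptions for illustration, not SpecForge's actual code.

```python
from typing import Dict, List

# Hypothetical mapping from ShareGPT "from" values to chat-template roles.
ROLE_MAP = {
    "human": "user",
    "user": "user",
    "gpt": "assistant",
    "assistant": "assistant",
    "system": "system",
}


def normalize_conversation(messages: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Convert ShareGPT-style {"from", "value"} turns to {"role", "content"}."""
    normalized = []
    for msg in messages:
        # Messages already in role/content form pass through unchanged.
        if "role" in msg and "content" in msg:
            normalized.append({"role": msg["role"], "content": msg["content"]})
            continue
        role = ROLE_MAP.get(msg["from"].lower())
        if role is None:
            raise ValueError(f"unknown speaker: {msg['from']!r}")
        normalized.append({"role": role, "content": msg["value"]})
    return normalized


def has_enough_trainable_tokens(loss_mask: List[int], minimum_valid_tokens: int = 1) -> bool:
    """Keep a sample only if its loss mask marks enough tokens as trainable."""
    return sum(loss_mask) >= minimum_valid_tokens


def pad_loss_mask(loss_mask: List[int], target_len: int) -> List[int]:
    """Right-pad with zeros so padding positions never contribute to the loss."""
    if len(loss_mask) > target_len:
        raise ValueError("loss_mask is longer than target_len")
    return loss_mask + [0] * (target_len - len(loss_mask))
```

Padding the mask with zeros (rather than ones) is what keeps padded positions out of the loss; a sample whose mask sums to zero would contribute nothing to training, which is why filtering it early is cheaper than carrying it through the target forward pass.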

Implementation notes:

P-EAGLE inherits the EAGLE3-style draft model but performs parallel multi-token prediction.
COD sampling creates sampled prediction depths with geometric downsampling.
A learnable mask_hidden parameter is used for positions that do not have target hidden states in parallel prediction depths.
Draft embeddings are trainable by default.
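The notes above describe COD sampling only at a high level. The sketch below shows one plausible reading of "geometric downsampling", assuming that prediction depth d keeps a fraction ratio**d of the anchor positions, floored at a minimum ratio (the commit history mentions `down-sample-ratio` and `down-sample-ratio-min` options). The function names are hypothetical, positions are sampled independently per depth, and the learnable `mask_hidden` inputs and attention masking are not modeled; the PR's actual scheme may differ.

```python
import random
from typing import Dict, List


def geometric_keep_ratios(num_depths: int, ratio: float, min_ratio: float) -> List[float]:
    """Fraction of anchor positions kept at each prediction depth (hypothetical).

    Depth 0 keeps every position (ratio**0 == 1); deeper depths decay
    geometrically but never drop below min_ratio.
    """
    return [max(ratio ** d, min_ratio) for d in range(num_depths)]


def sample_depth_positions(
    seq_len: int,
    num_depths: int,
    ratio: float,
    min_ratio: float,
    seed: int = 0,
) -> Dict[int, List[int]]:
    """Choose which anchor positions contribute a loss term at each depth."""
    rng = random.Random(seed)
    positions = list(range(seq_len))
    sampled: Dict[int, List[int]] = {}
    for depth, keep in enumerate(geometric_keep_ratios(num_depths, ratio, min_ratio)):
        # Keep at least one position so every depth still gets a training signal.
        k = min(seq_len, max(1, round(keep * seq_len)))
        # Sorting keeps the retained positions in sequence order.
        sampled[depth] = sorted(rng.sample(positions, k))
    return sampled


if __name__ == "__main__":
    plan = sample_depth_positions(seq_len=16, num_depths=5, ratio=0.5, min_ratio=0.1)
    for depth, kept in plan.items():
        print(f"depth {depth}: {len(kept)} positions")
```

Presumably the point of a decaying ratio is to keep the number of extra parallel-depth tokens per sequence from growing linearly with the number of depths, while the floor ensures the deepest predictions still receive some gradient.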

Related Issues

Closes #541.

Related references:

Accuracy Test

The Qwen3-8B P-EAGLE draft model was trained on 8x A100 GPUs.

Training setup:

Target model: Qwen/Qwen3-8B
Training dataset: jihwan1205/perfectblend-qwen3-8b-regen
Training length: 110k steps
Draft layers: 4
Prediction depths: 5
Max length: 4096
Learning rate: 1e-4

Training loss: [figure: training loss curve]

The trained demo checkpoint is available at:

https://huggingface.co/thyways/qwen3_8b_peagle_demo

SGLang does not yet support P-EAGLE inference, so inference accuracy was not evaluated with SGLang in this PR. For the current validation, I adapted the exported config for testing with vLLM:

 {
   "architectures": [
     "Eagle3LlamaForCausalLM"
   ],
   "ptd_token_id": 151669
 }
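The config adaptation above amounts to overwriting two fields. A minimal sketch of doing this programmatically, assuming the exported checkpoint has a standard JSON `config.json`; the helper names are hypothetical, and only the two values (`Eagle3LlamaForCausalLM` and `151669`) come from the config shown. Whether vLLM needs any other fields changed is not covered here.

```python
import json
from pathlib import Path

# Values taken from the config shown above; all other keys are left untouched.
VLLM_ARCHITECTURE = "Eagle3LlamaForCausalLM"
PTD_TOKEN_ID = 151669


def adapt_config_for_vllm(config: dict, ptd_token_id: int = PTD_TOKEN_ID) -> dict:
    """Return a copy of an exported draft config with the vLLM-facing fields set."""
    patched = dict(config)
    patched["architectures"] = [VLLM_ARCHITECTURE]
    patched["ptd_token_id"] = ptd_token_id
    return patched


def patch_config_file(path: Path, ptd_token_id: int = PTD_TOKEN_ID) -> None:
    """Rewrite an exported config.json in place with the patched fields."""
    config = json.loads(path.read_text())
    path.write_text(json.dumps(adapt_config_for_vllm(config, ptd_token_id), indent=2) + "\n")
```

Call `patch_config_file` on the exported checkpoint's `config.json` before pointing vLLM at it; returning a copy from `adapt_config_for_vllm` keeps the original dict intact, which makes it easy to diff the two.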

Benchmark & Profiling

Evaluation setup:

Hardware: 1x B200
Dataset: MT-Bench
EAGLE3 baseline: RedHatAI/Qwen3-8B-speculator.eagle3, k=3
DFlash baseline: z-lab/Qwen3-8B-DFlash-b16, k=15
P-EAGLE: thyways/qwen3_8b_peagle_demo, k=5

Due to limited training resources, this checkpoint is not yet fully trained. Even so, the current results indicate that the implementation works as intended, and, consistent with the trend reported in the P-EAGLE paper, longer training is expected to further improve the acceptance rate and speedup.

Checklist

Format your code according to the Code Formatting with Pre-Commit.
Add unit tests as outlined in the Running Unit Tests.
Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.
Please feel free to join our Slack channel at https://sgl-fru7574.slack.com/archives/C09784E3EN6 to discuss your PR.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>


thyways and others added 27 commits May 28, 2026 03:26

fb58b49 init
60294a7 init
4ea5d58 init
10c9f7d init
71e3857 init
7e75735 init
3065cae init
a6a07bc init
cb37889 init
4ce5ac8 init
6982dae init
60b2f0f init
eb8a67f init
257e0d9 init
092ce7b init
d9570c8 style: align run_qwen3_8b_peagle_online.sh with eagle3/dflash examples
  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
a1b7996 set --num-epochs 20
87cd838 script init
2ae7dfc fix bug
6ccbb28 delete down-sample-ratio min
86b202c init
cfb3c67 draft vocab 32000
dc3374d add down-sample-ratio-min
d0faadb fix bug
81fac60 Merge branch 'sgl-project:main' into P-Eagle
d07afc6 Update Qwen3 P-EAGLE training settings
82e87c9 Add P-EAGLE regression tests

thyways requested review from FlamingoPg, shuaills and sleepcoo as code owners June 4, 2026 16:10

thyways requested review from FrankLeeeee and zyksir as code owners June 4, 2026 16:10

thyways wants to merge 27 commits into sgl-project:main from thyways:P-Eagle