Add unified DeepSpeed finetune demo with MMLU/GSM8K benchmarks #1001
Merged
delock merged 1 commit on May 20, 2026
…AutoEP configs
- Add unified finetune script (finetune_llama.py) with DATASET_REGISTRY supporting Alpaca, CodeAlpaca, Magicoder, MetaMathQA, MMLU, MBPP datasets
- Add sample_rate mechanism for dataset downsampling (MetaMathQA: 0.1)
- Add MMLU and GSM8K evaluation pipelines (vllm-based generation + scoring)
- Add Moonlight AutoEP ZeRO-2 configs (AdamW and Muon)
- Add end-to-end run_and_evaluate.sh supporting MBPP/MMLU/GSM8K benchmarks
- Add DeepSpeed checkpoint to HF model conversion with AutoEP/MoE support
- Update README with dataset registry details, benchmark usage, and configs
Signed-off-by: Guokai Ma <guokai.ma@gmail.com>
a0ae7bc to 1171f89
Summary
This PR adds a new standalone finetuning example under training/deepspeed_finetune_demo/ that demonstrates DeepSpeed's philosophy: switch between training features through config files, with no code changes needed. The example is extracted and extended from DeepSpeed-ZenFlow/finetuning.
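To illustrate the config-driven idea, here is a minimal sketch that writes two DeepSpeed-style JSON configs differing only in ZeRO stage and loads them by path. The file names, the `make_config` and `load_ds_config` helpers, and all hyperparameter values are hypothetical and are not taken from this PR. The keys themselves (`train_micro_batch_size_per_gpu`, `gradient_accumulation_steps`, `bf16`, `optimizer`, `zero_optimization`) are standard DeepSpeed config fields.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical base settings; the values are placeholders, not the PR's configs.
BASE = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 8,
    "bf16": {"enabled": True},
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-5}},
}


def make_config(zero_stage: int) -> dict:
    """Return a deep copy of BASE with the requested ZeRO stage set."""
    cfg = json.loads(json.dumps(BASE))  # deep copy via a JSON round trip
    cfg["zero_optimization"] = {"stage": zero_stage}
    return cfg


def load_ds_config(path: str) -> dict:
    """Read a JSON config file; the training code only ever sees this dict."""
    with open(path) as f:
        return json.load(f)


# Write two config files that differ only in ZeRO stage.
workdir = Path(tempfile.mkdtemp())
for stage in (2, 3):
    path = workdir / f"zero{stage}.json"
    path.write_text(json.dumps(make_config(stage), indent=2))

cfg_z2 = load_ds_config(str(workdir / "zero2.json"))
cfg_z3 = load_ds_config(str(workdir / "zero3.json"))

# Only the config differs between runs; the script that consumes it is unchanged.
changed_keys = sorted(k for k in cfg_z3 if cfg_z2.get(k) != cfg_z3[k])
print(changed_keys)
```

In an actual run, the loaded dict (or the file path) would typically be passed to `deepspeed.initialize` through its `config` argument, so switching features means switching the file given on the command line.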
Key Features
- sample_rate support for downsampling large datasets
- Checkpoint conversion (convert_ds_to_hf.py) from DeepSpeed format to HuggingFace, with AutoEP (expert parallelism) support
Tested Configurations
Verified on Qwen2.5-0.5B with 2x RTX 4090 (AutoDL):
Full pipeline validated: train → convert checkpoint → vLLM eval.
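For the evaluation stage, the following is a rough sketch of how GSM8K generations are commonly scored. It is not the scorer shipped in this PR. It assumes the usual GSM8K convention that each reference answer ends with `#### <number>` and takes the last number in the model output as the prediction. The helper names and the three sample pairs are invented for illustration.

```python
import re

# Matches integers or decimals, allowing thousands separators such as 1,250.
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def last_number(text):
    """Return the last number in text as a string without commas, or None."""
    matches = NUMBER.findall(text)
    if not matches:
        return None
    return matches[-1].replace(",", "")


def gsm8k_reference(answer):
    """Extract the final answer that follows the '####' marker."""
    return answer.split("####")[-1].strip().replace(",", "")


def is_correct(generation, reference_answer):
    """Compare prediction and reference numerically."""
    pred = last_number(generation)
    if pred is None:
        return False
    return float(pred) == float(gsm8k_reference(reference_answer))


# Made-up (generation, reference) pairs, for illustration only.
samples = [
    ("She sells 9 eggs at $2 each, so she makes 18 dollars.", "9 * 2 = 18\n#### 18"),
    ("The total is 1,250.", "#### 1250"),
    ("I think the answer is 7.", "#### 8"),
]
accuracy = sum(is_correct(g, r) for g, r in samples) / len(samples)
print(f"accuracy = {accuracy:.3f}")
```

Taking the last number is a simple heuristic; a stricter scorer might require an explicit answer marker in the generation.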
File Structure
Dataset Support
- tatsu-lab/alpaca
- sahil2801/CodeAlpaca-20k
- meta-math/MetaMathQA
- cais/mmlu
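One way a dataset registry with per-dataset `sample_rate` could look is sketched below. The dictionary layout, the short keys, and the `downsample` helper are assumptions; the actual DATASET_REGISTRY in finetune_llama.py may be structured differently. Only the dataset IDs and the MetaMathQA rate of 0.1 come from the PR description; the 1.0 rates are placeholders.

```python
import random

# Hypothetical registry shape; the real DATASET_REGISTRY may differ.
DATASET_REGISTRY = {
    "alpaca": {"hf_id": "tatsu-lab/alpaca", "sample_rate": 1.0},
    "codealpaca": {"hf_id": "sahil2801/CodeAlpaca-20k", "sample_rate": 1.0},
    "metamathqa": {"hf_id": "meta-math/MetaMathQA", "sample_rate": 0.1},
    "mmlu": {"hf_id": "cais/mmlu", "sample_rate": 1.0},
}


def downsample(examples, sample_rate, seed=0):
    """Keep a reproducible random fraction of examples, preserving order."""
    if not 0.0 < sample_rate <= 1.0:
        raise ValueError("sample_rate must be in (0, 1]")
    if sample_rate == 1.0 or not examples:
        return list(examples)
    keep = max(1, int(len(examples) * sample_rate))
    rng = random.Random(seed)  # fixed seed so reruns select the same rows
    indices = sorted(rng.sample(range(len(examples)), keep))
    return [examples[i] for i in indices]


# Stand-in rows; a real run would load the dataset named by "hf_id".
fake_rows = [{"id": i} for i in range(1000)]
entry = DATASET_REGISTRY["metamathqa"]
subset = downsample(fake_rows, entry["sample_rate"])
print(entry["hf_id"], len(subset))
```

Seeding the sampler keeps the selected subset stable across runs, which makes benchmark numbers from different configs easier to compare.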