write dequantization scripts for DeepSeek V4 FP4/FP8 weights #3873
Open
snehalv2002 wants to merge 1 commit into
Conversation
Codecov Report ✅ All modified and coverable lines are covered by tests.
Force-pushed 9647437 to b8b85f2 (Compare)
Force-pushed b8b85f2 to 8f8ce91 (Compare)
parambole reviewed May 13, 2026
Comment on lines +183 to +186
parser.add_argument("--input-path", "--input-fp8-hf-path", type=str, required=True,
                    help="Path to DeepSeek FP8/FP4 Hugging Face folder")
parser.add_argument("--output-path", "--output-bf16-hf-path", type=str, required=True,
                    help="Directory to save output BF16 weights")
Collaborator
nit: We could add a check that the input and output paths are not the same, to prevent overwriting the input checkpoint.
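A minimal sketch of such a guard (the helper name ensure_distinct_paths is hypothetical; os.path.realpath normalizes trailing slashes and symlinks so equivalent paths compare equal):

```python
import os

def ensure_distinct_paths(input_path: str, output_path: str) -> None:
    """Raise if the two paths resolve to the same location (prevents overwriting the input)."""
    # realpath() canonicalizes "ckpt/fp8" vs "ckpt/fp8/" and resolves symlinks.
    if os.path.realpath(input_path) == os.path.realpath(output_path):
        raise ValueError("input and output paths must differ to avoid overwriting the source checkpoint")

# Hypothetical usage after parser.parse_args():
ensure_distinct_paths("ckpt/fp8", "ckpt/bf16")  # distinct -> ok
try:
    ensure_distinct_paths("ckpt/fp8", "ckpt/fp8/")  # same directory -> rejected
except ValueError as e:
    print("rejected:", e)
```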
parambole reviewed May 13, 2026
def weight_dequant_cpu(x: torch.Tensor, s: torch.Tensor, block_size: int = 128) -> torch.Tensor:
Collaborator
Rename it to dequantize_fp8?
parambole reviewed May 13, 2026
Comment on lines +34 to +35
for i in range(0, M, block_size):
    for j in range(0, N, block_size):
Collaborator
nit: Can we vectorize this operation?
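One way to vectorize the per-block scaling is to expand the scale grid to the full weight shape and do a single elementwise multiply. A NumPy sketch of the idea (illustrative only; the script itself uses torch, where torch.repeat_interleave or a kron-style expansion works the same way, and the real scales are FP8 E8M0 rather than floats):

```python
import numpy as np

def dequant_blocked_loop(x, s, block_size=128):
    # Reference implementation: per-block nested loops, as in the diff above.
    M, N = x.shape
    out = np.empty_like(x, dtype=np.float32)
    for i in range(0, M, block_size):
        for j in range(0, N, block_size):
            out[i:i + block_size, j:j + block_size] = (
                x[i:i + block_size, j:j + block_size] * s[i // block_size, j // block_size]
            )
    return out

def dequant_blocked_vec(x, s, block_size=128):
    # Vectorized: replicate each scale over its block, then one multiply.
    scale_full = np.kron(s, np.ones((block_size, block_size), dtype=s.dtype))
    return (x * scale_full[:x.shape[0], :x.shape[1]]).astype(np.float32)

rng = np.random.default_rng(0)
x = rng.standard_normal((256, 256)).astype(np.float32)
s = rng.standard_normal((2, 2)).astype(np.float32)
assert np.allclose(dequant_blocked_loop(x, s), dequant_blocked_vec(x, s))
```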
parambole reviewed May 13, 2026
Collaborator
parambole left a comment
Thank you for adding this script. I have left a few comments, PTAL.
Description
Adds scripts for dequantizing the DeepSeek V4 weights to bf16. DeepSeek provides the MoE weights in INT8 (actually two FP4 values packed into one byte, since torch has no FP4 dtype) and the attention weights in F8_E4M3. All scaling factors are in F8_E8M0. The implementation heavily references the Hugging Face dequantization script here.
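For illustration, unpacking two FP4 values from one byte comes down to shifts, masks, and a 16-entry lookup table. This is a hedged sketch: the E2M1 value set and the low-nibble-first ordering are assumptions for the example, not confirmed details of the DeepSeek layout:

```python
import numpy as np

# E2M1 FP4 code -> value table (1 sign bit, 2 exponent bits, 1 mantissa bit).
FP4_E2M1 = np.array(
    [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
     -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0],
    dtype=np.float32,
)

def unpack_fp4(packed: np.ndarray) -> np.ndarray:
    """Unpack uint8 bytes into pairs of FP4 values (low nibble first -- an assumption)."""
    lo = packed & 0x0F
    hi = (packed >> 4) & 0x0F
    # Interleave so element order matches the packed order.
    out = np.empty(packed.size * 2, dtype=np.float32)
    out[0::2] = FP4_E2M1[lo]
    out[1::2] = FP4_E2M1[hi]
    return out

print(unpack_fp4(np.array([0x21], dtype=np.uint8)))  # low nibble 0x1 -> 0.5, high 0x2 -> 1.0
```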
FIXES: b/510020740
Tests
The test script runs inference on DeepSeek V4 flash through the transformers library, loading weights from both the original DeepSeek checkpoints and the checkpoints dequantized by our script, then compares KL divergence, token outputs, etc. Test results can be seen here.
Checklist
Before submitting this PR, please make sure (put X in square brackets):
gemini-review label.