DeepSeek V4 Integration #3867
Draft
parambole wants to merge 1 commit into
…ation stack

Implement full model architecture, decoder integration layers, and execution configurations for DeepSeek-V4 integration into MaxText:

- deepseek_v4.py: Model architecture definition supporting cyclical layer stacking and hyper-connections.
- decoders.py & nnx_decoders.py: Integration of DeepSeekV4DecoderLayer, supporting get_attention_type routing and scanned vs unrolled compilation parity.
- mhc.py & engram.py: Integration of multi-head hyper-connections (mHC) and engram memory management.
- Configuration: Register model configs (deepseek_v4-flash.yml, deepseek_v4-tiny.yml) and hyperparameter definitions in base.yml and types.py.
- Parity verification: Comprehensive unit test suite (deepseek_v4_vs_reference_test.py) validating end-to-end decoder block parity against PyTorch reference implementations at atol=1e-5, rtol=1e-5.
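For reviewers unfamiliar with the atol/rtol convention named in the parity item, here is a minimal sketch of the elementwise criterion that NumPy-style closeness checks apply: |actual - desired| <= atol + rtol * |desired|. This is not code from the PR. The helper names (`within_tolerance`, `max_abs_error`) and the sample values are hypothetical, and the actual test in deepseek_v4_vs_reference_test.py may compare tensors with a different utility.

```python
import math


def within_tolerance(actual, desired, atol=1e-5, rtol=1e-5):
    """Return True if every pair satisfies |a - d| <= atol + rtol * |d|.

    Hypothetical helper: mirrors the NumPy-style closeness criterion on
    flat Python sequences, with `desired` playing the reference role.
    """
    if len(actual) != len(desired):
        return False
    return all(
        math.fabs(a - d) <= atol + rtol * math.fabs(d)
        for a, d in zip(actual, desired)
    )


def max_abs_error(actual, desired):
    """Largest absolute elementwise difference, handy when a check fails."""
    return max(math.fabs(a - d) for a, d in zip(actual, desired))


# Made-up values standing in for a reference output and a ported output.
reference_out = [0.125, -1.5, 3.0]
candidate_out = [0.125004, -1.500009, 3.00002]
drifted_out = [0.125, -1.5, 3.001]

print(within_tolerance(candidate_out, reference_out))  # small drift passes
print(within_tolerance(drifted_out, reference_out))    # 1e-3 drift fails
print(max_abs_error(drifted_out, reference_out))
```

Note that the relative term scales with the magnitude of the reference value, so larger activations are allowed proportionally larger absolute differences than values near zero, where only atol applies.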
Description
Start with a short description of what the PR does and how this is a change from
the past.
The rest of the description includes relevant details and context, examples:
If the change fixes a bug or a GitHub issue, please include a link, e.g.:
FIXES: b/123456
FIXES: #123456
Notice 1: Once all tests pass, the "pull ready" label will automatically be assigned.
This label is used for administrative purposes. Please do not add it manually.
Notice 2: For external contributions, our settings currently require an approval from a MaxText maintainer to trigger CI tests.
Tests
Please describe how you tested this change, and include any instructions and/or
commands to reproduce.
Checklist
Before submitting this PR, please make sure (put X in square brackets):
gemini-review label.