mHC optimization for expansion rate 4 (flag controlled) #3890

Open
dandragona-dev wants to merge 1 commit into AI-Hypercomputer:main from dandragona-dev:dandragona/mhc_k4_shortcut

dandragona-dev (Collaborator) commented May 12, 2026

Description

When the expansion rate is 4, we don't need to run Sinkhorn: we can generate the doubly stochastic matrix directly by taking a random convex combination of the permutation matrices (there are 4! = 24 of them at expansion rate 4). Any convex combination of permutation matrices is doubly stochastic, so no iterative normalization is required.

Based on https://arxiv.org/pdf/2601.05732.
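The idea above can be illustrated with a minimal, standard-library-only sketch. It is not the code in this PR: the helper names (`permutation_matrices`, `doubly_stochastic_from_logits`) are hypothetical, and turning 24 logits into convex weights with a softmax is an assumption made for illustration. The sketch only shows that a convex combination of the 24 permutation matrices has rows and columns that each sum to 1.

```python
import itertools
import math
import random


def permutation_matrices(n):
  """Return all n! permutation matrices of size n x n as nested lists."""
  mats = []
  for perm in itertools.permutations(range(n)):
    # Row i has a single 1 in column perm[i]; every other entry is 0.
    mats.append([[1.0 if perm[i] == j else 0.0 for j in range(n)] for i in range(n)])
  return mats


def softmax(logits):
  """Numerically stable softmax: non-negative weights that sum to 1."""
  largest = max(logits)
  exps = [math.exp(x - largest) for x in logits]
  total = sum(exps)
  return [e / total for e in exps]


def doubly_stochastic_from_logits(logits, n=4):
  """Hypothetical helper: convex combination of the n! permutation matrices."""
  perms = permutation_matrices(n)
  if len(logits) != len(perms):
    raise ValueError(f"expected {len(perms)} logits, got {len(logits)}")
  weights = softmax(logits)  # assumption: softmax supplies the convex weights
  out = [[0.0] * n for _ in range(n)]
  for weight, perm in zip(weights, perms):
    for i in range(n):
      for j in range(n):
        out[i][j] += weight * perm[i][j]
  return out


# Random logits stand in for whatever parameters the layer would produce.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(24)]
matrix = doubly_stochastic_from_logits(logits)

row_sums = [sum(row) for row in matrix]
col_sums = [sum(matrix[i][j] for i in range(4)) for j in range(4)]
print("row sums:", row_sums)
print("col sums:", col_sums)
```

Every row and column sum should come out as 1 up to floating-point rounding, without any Sinkhorn iterations.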

  • Flag-controlled for now, but once we establish a DeepSeek v4 baseline we can use it by default if it shows a performance improvement.
  • This could potentially be extended to expansion rates higher than 4, up to the point where N! becomes too large (maybe it will increase training stability?).
  • Did some refactoring on the mhc_test.py code.
  • pylint refactors on some other files.
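To make the "before N! becomes too large" point concrete, the short sketch below compares the number of permutation matrices (N!) with the number of entries in an N x N matrix for a few expansion rates. The function name `permutation_matrix_count` is hypothetical and is used only for this illustration.

```python
import math


def permutation_matrix_count(n):
  """Number of n x n permutation matrices, i.e. n!."""
  return math.factorial(n)


# Compare N! (terms in the convex combination) with N*N (matrix entries).
for n in range(2, 9):
  print(f"N={n}: N! = {permutation_matrix_count(n)}, N*N = {n * n}")
```

At N=4 the combination has 24 terms; at N=8 it would need 40320, which is why the shortcut is only practical for small expansion rates.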

Tests

Tested with unit tests only so far; adding the gemini-review label now.

Performance testing will follow once we have baseline DeepSeek v4 numbers.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

codecov Bot commented May 12, 2026

Codecov Report

❌ Patch coverage is 57.69231% with 11 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
src/maxtext/layers/mhc.py | 57.69% | 9 Missing and 2 partials ⚠️

github-actions Bot mentioned this pull request May 13, 2026
dandragona-dev force-pushed the dandragona/mhc_k4_shortcut branch 5 times, most recently from 51c1161 to 1799cc1 (May 13, 2026 19:57)
dandragona-dev force-pushed the dandragona/mhc_k4_shortcut branch 7 times, most recently from 75ceffb to fa7f997 (May 14, 2026 18:25)
…tions and add enable_mhc_k4_shortcut feature gate
dandragona-dev force-pushed the dandragona/mhc_k4_shortcut branch from fa7f997 to b3a8bb4 (May 14, 2026 18:31)