feat: implement offline channel calibration for outlier strategy #94

Open: andrea-gentilini wants to merge 1 commit into TheTom:main from andrea-gentilini:feature/dynamic-outlier-calibration

@andrea-gentilini

This PR implements offline channel calibration for the OutlierTurboQuant strategy as described in Section 4.3 of the paper (references [63, 51]).

I noticed the comment in outlier.py stating:

"In practice, you'd pick channels with highest activation magnitude. For data-oblivious quantization, we use fixed split."

This PR implements exactly that practical approach without breaking the data-oblivious runtime constraint.

Changes:

  1. Added a calibration.py utility that ranks channels by mean absolute magnitude and returns the indices of the top-ranked ones.
  2. Made outlier_idx an optional parameter of OutlierTurboQuant. If it is left as None, the quantizer falls back to the original np.arange behavior.
  3. Updated the real model benchmark to run the offline calibration step on the KV cache before quantizing.
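As a rough illustration of the calibration step in item 1, the ranking could look like the sketch below. This is not the code from calibration.py: the function name `calibrate_outlier_channels`, its signature, and the assumption that channels live on the last axis are all hypothetical.

```python
import numpy as np


def calibrate_outlier_channels(activations: np.ndarray, n_outliers: int) -> np.ndarray:
    """Hypothetical helper: indices of the n_outliers channels with the
    largest mean absolute magnitude. Assumes channels are the last axis."""
    if activations.ndim < 1:
        raise ValueError("activations must have at least one axis")
    d = activations.shape[-1]
    if not 0 <= n_outliers <= d:
        raise ValueError(f"n_outliers must be in [0, {d}], got {n_outliers}")

    # Collapse every leading axis (batch, heads, tokens, ...) into one.
    flat = activations.reshape(-1, d)

    # Per-channel score: mean of |x| over all calibration samples.
    scores = np.abs(flat).mean(axis=0)

    # Stable sort on negated scores: descending order, ties go to the lower index.
    order = np.argsort(-scores, kind="stable")

    # Return the selected indices in ascending order.
    return np.sort(order[:n_outliers])
```

Called once offline on a sample of the KV cache, the result is a plain integer array that can be stored and reused, so no statistics need to be computed at compression time.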

Calibration runs once, offline, and the resulting indices are passed to the quantizer. The quantizer itself remains data-oblivious during compression: the index set is fixed before any runtime data is seen, and applying it is an O(1) index mapping. The difference is that it now targets the real "Attention Sinks" instead of assuming they live in the first $N$ channels.
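A minimal sketch of how an optional index set with an np.arange fallback might be resolved and applied is shown here. It does not reproduce OutlierTurboQuant; `resolve_outlier_split` and `split_channels` are hypothetical names, and the validation rules are assumptions.

```python
import numpy as np


def resolve_outlier_split(d, n_outliers, outlier_idx=None):
    """Hypothetical helper: return (outlier_idx, normal_idx) for d channels."""
    if outlier_idx is None:
        # Fallback: the fixed split that treats the first n_outliers channels as outliers.
        outlier_idx = np.arange(n_outliers)
    else:
        outlier_idx = np.asarray(outlier_idx, dtype=np.intp)
        if outlier_idx.shape != (n_outliers,):
            raise ValueError("outlier_idx must hold exactly n_outliers indices")
        if np.unique(outlier_idx).size != n_outliers:
            raise ValueError("outlier_idx must not contain duplicates")
        if n_outliers and (outlier_idx.min() < 0 or outlier_idx.max() >= d):
            raise ValueError("outlier_idx out of range")

    # Every channel that is not an outlier goes to the regular group.
    mask = np.ones(d, dtype=bool)
    mask[outlier_idx] = False
    return outlier_idx, np.flatnonzero(mask)


def split_channels(x, outlier_idx, normal_idx):
    """Pure indexing with precomputed indices; nothing depends on the values in x."""
    return x[..., outlier_idx], x[..., normal_idx]
```

Both index arrays are built once at construction time. At compression time only `split_channels` runs, and it never inspects the data to decide which channels go where.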
