feat: implement offline channel calibration for outlier strategy by andrea-gentilini · Pull Request #94 · TheTom/turboquant_plus

andrea-gentilini · 2026-06-03T14:09:34Z

This PR implements offline channel calibration for the OutlierTurboQuant strategy as described in Section 4.3 of the paper (references [63, 51]).

I noticed the comment in outlier.py stating:

"In practice, you'd pick channels with highest activation magnitude. For data-oblivious quantization, we use fixed split."

This PR implements exactly that practical approach without breaking the data-oblivious runtime constraint.

Changes:

- Added a calibration.py utility that ranks channels by mean absolute magnitude and returns the indices of the top ones.
- Made outlier_idx an optional parameter in OutlierTurboQuant. If left as None, it falls back to the original np.arange behavior.
- Updated the real model benchmark to run the offline calibration step on the KV cache before quantizing.
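
For readers who have not opened the diff, the calibration step can be sketched roughly as follows. This is a minimal illustration, not the code in calibration.py: the function name `calibrate_outlier_channels`, its parameters, and the assumption that channels live on the last axis are all hypothetical.

```python
import numpy as np

def calibrate_outlier_channels(activations, n_outliers):
    """Pick the n_outliers channels with the largest mean |activation|.

    Hypothetical helper: `activations` is assumed to have shape (..., d),
    with the channel dimension last (e.g. a flattened KV cache sample).
    """
    acts = np.asarray(activations, dtype=np.float64)
    # Collapse every leading axis so each row is one vector of d channels.
    flat = acts.reshape(-1, acts.shape[-1])
    # Per-channel mean absolute magnitude over the calibration sample.
    mean_abs = np.abs(flat).mean(axis=0)
    # Stable sort on the negated scores: largest first, ties to lower index.
    order = np.argsort(-mean_abs, kind="stable")
    # Return the selected indices in ascending order for a predictable layout.
    return np.sort(order[:n_outliers])
```

The returned array is computed once and stored, so nothing in the compression path needs to look at activation statistics.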

Calibration runs once, offline, and the resulting indices are passed to the quantizer. The quantizer itself remains data-oblivious during the actual compression (the channel split is a fixed O(1) index mapping), but it now targets the channels where the real "Attention Sinks" appear instead of assuming they live in the first $N$ channels.
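
The runtime side of this can be sketched as a plain index split with the np.arange fallback. The helpers `split_channels` and `merge_channels` below are hypothetical and are not the OutlierTurboQuant API; they only illustrate that the split depends on precomputed indices, never on the values being compressed.

```python
import numpy as np

def split_channels(x, n_outliers, outlier_idx=None):
    """Split the last axis of x into outlier and normal channels.

    Hypothetical sketch: when outlier_idx is None, fall back to the
    fixed split over the first n_outliers channels.
    """
    d = x.shape[-1]
    if outlier_idx is None:
        outlier_idx = np.arange(n_outliers)
    outlier_idx = np.asarray(outlier_idx, dtype=np.intp)
    # Every channel that is not an outlier goes to the normal group.
    mask = np.ones(d, dtype=bool)
    mask[outlier_idx] = False
    normal_idx = np.flatnonzero(mask)
    return x[..., outlier_idx], x[..., normal_idx], outlier_idx, normal_idx

def merge_channels(outliers, normals, outlier_idx, normal_idx):
    """Inverse of split_channels: scatter both groups back into place."""
    d = len(outlier_idx) + len(normal_idx)
    dtype = np.result_type(outliers, normals)
    out = np.empty(outliers.shape[:-1] + (d,), dtype=dtype)
    out[..., outlier_idx] = outliers
    out[..., normal_idx] = normals
    return out
```

In this sketch each group would be handed to its own quantizer between the split and the merge; passing calibrated indices changes only which channels land in the outlier group.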

feat: implement offline channel calibration for outlier strategy

9cde5fc

andrea-gentilini wants to merge 1 commit into TheTom:main from andrea-gentilini:feature/dynamic-outlier-calibration
