Quantized SDPA by barronalex · Pull Request #1515 · ml-explore/mlx

barronalex · 2024-10-23T03:38:08Z

First pass at adapting @angeloskath's flash attention to support quantized keys and values.

Still needs some optimization work: it's currently faster to write out the quantized_matmuls explicitly than to use this fused version.

E.g., 4-bit on M2 Ultra with L=32768:

Timing sdpa ... 2.51938 msec
Timing quant_sdpa ... 0.97137 msec
Timing attention ... 1.31419 msec
Timing quant_attention ... 0.92342 msec
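For readers new to the idea, the arithmetic behind attention over quantized keys and values can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Metal kernel from this PR. It assumes an affine per-group scheme (value ≈ scale · code + bias), similar in spirit to the per-group scales and biases produced by MLX's `mx.quantize`. All function names (`quantize_row`, `qdot`, `dequantize_row`, `attention`, `quantized_attention`) are hypothetical, and the sketch handles a single query vector with no tiling.

```python
import math
import random


def quantize_row(row, bits=4, group_size=32):
    """Affine-quantize a row in contiguous groups: value ~= scale * code + bias.
    group_size=32 is an arbitrary illustrative choice."""
    levels = 2 ** bits - 1
    groups = []
    for start in range(0, len(row), group_size):
        chunk = row[start:start + group_size]
        lo, hi = min(chunk), max(chunk)
        scale = (hi - lo) / levels if hi > lo else 1.0
        codes = [round((v - lo) / scale) for v in chunk]  # integers in [0, levels]
        groups.append((codes, scale, lo))  # the group minimum serves as the bias
    return groups


def dequantize_row(groups):
    """Reconstruct approximate floats from (codes, scale, bias) groups."""
    return [scale * c + bias for codes, scale, bias in groups for c in codes]


def qdot(x, groups):
    """Dot product of float vector x with a quantized row, without dequantizing it.
    Per group: sum(x_i * (s * c_i + b)) = s * sum(x_i * c_i) + b * sum(x_i)."""
    total, offset = 0.0, 0
    for codes, scale, bias in groups:
        xs = x[offset:offset + len(codes)]
        total += scale * sum(a * c for a, c in zip(xs, codes)) + bias * sum(xs)
        offset += len(codes)
    return total


def softmax_weights(scores):
    m = max(scores)  # subtract the max for numerical stability
    w = [math.exp(s - m) for s in scores]
    total = sum(w)
    return [x / total for x in w]


def attention(q, keys, values):
    """Reference single-query attention with full-precision keys and values."""
    scale = 1.0 / math.sqrt(len(q))
    weights = softmax_weights([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]


def quantized_attention(q, qkeys, qvalues):
    """Same computation, reading keys and values only in quantized form."""
    scale = 1.0 / math.sqrt(len(q))
    weights = softmax_weights([scale * qdot(q, k) for k in qkeys])
    dim = sum(len(codes) for codes, _, _ in qvalues[0])
    out = [0.0] * dim
    for w, groups in zip(weights, qvalues):
        for j, x in enumerate(dequantize_row(groups)):  # one value row at a time
            out[j] += w * x
    return out


if __name__ == "__main__":
    random.seed(0)
    D, L = 64, 16
    q = [random.gauss(0, 1) for _ in range(D)]
    keys = [[random.gauss(0, 1) for _ in range(D)] for _ in range(L)]
    values = [[random.gauss(0, 1) for _ in range(D)] for _ in range(L)]
    for bits in (8, 4):
        qk = [quantize_row(k, bits=bits) for k in keys]
        qv = [quantize_row(v, bits=bits) for v in values]
        err = max(abs(a - b) for a, b in zip(attention(q, keys, values),
                                              quantized_attention(q, qk, qv)))
        print(f"{bits}-bit max abs error: {err:.5f}")
```

The `qdot` helper scores a query against a quantized key without materializing the key in full precision, by folding the bias into a per-group sum of the query. Only the arithmetic is illustrated; the sketch says nothing about how a fused GPU kernel schedules this work or why the unfused path is currently faster.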

bghira · 2025-09-18T04:59:05Z

JFYI, I have working int8 and int4 quantised attention, MIT licensed.

zcbenz · 2026-06-24T00:18:28Z

I'm closing this in favor of #3026.

barronalex force-pushed the q-sdpa branch from 42a638f to 1e0a199 on December 5, 2024 19:10

Alex Barron added 2 commits December 6, 2024 00:21

working qsdpa (12a4d89)
add test (3507c10)

barronalex force-pushed the q-sdpa branch from 1e0a199 to 3507c10 on December 6, 2024 08:45

Alex Barron added 3 commits December 6, 2024 01:09

add checks (c89ddf6)
cpu fallback (7697046)
fix test (82a956c)

awni mentioned this pull request Apr 28, 2025: Missing f8 dtypes #1670 (Open)

CC-Yeh mentioned this pull request Jan 20, 2026: Quantized SDPA #3026 (Open, 7 tasks)

zcbenz closed this Jun 24, 2026

zcbenz deleted the q-sdpa branch June 24, 2026 00:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Quantized SDPA#1515

Quantized SDPA#1515
barronalex wants to merge 5 commits into main from q-sdpa
3 participants