
Add decode (flash-decoding) attention kernels to contributed/ #129

Open
varuntej07 wants to merge 2 commits into aws-neuron:main from varuntej07:contributed/decode-attention-gqa

add GQA decode attention with KV tiling and online softmax

4f57504
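
For readers unfamiliar with the technique named in the commit title, here is a minimal pure-Python sketch of one decode step of grouped-query attention (GQA) that walks the KV cache in tiles and keeps an online softmax. It is not the kernel from this PR: the function names, the list-based layout, and the `tile` parameter are assumptions made for illustration only.

```python
import math

def gqa_decode_attention(q, k, v, tile=4):
    """One decode step. q: [n_q_heads][d]; k, v: [n_kv_heads][seq][d].

    Hypothetical reference sketch, not the contributed kernel.
    """
    n_q, n_kv, d = len(q), len(k), len(q[0])
    assert n_q % n_kv == 0, "query heads must divide evenly across KV heads"
    group = n_q // n_kv                  # query heads that share one KV head
    scale = 1.0 / math.sqrt(d)
    out = []
    for h in range(n_q):
        kh, vh = k[h // group], v[h // group]
        # Online-softmax state: running max, running denominator, running numerator.
        m, l, acc = -math.inf, 0.0, [0.0] * d
        for start in range(0, len(kh), tile):
            k_tile = kh[start:start + tile]
            v_tile = vh[start:start + tile]
            scores = [scale * sum(a * b for a, b in zip(q[h], row)) for row in k_tile]
            m_new = max(m, max(scores))
            alpha = math.exp(m - m_new)  # rescales earlier tiles; 0.0 on the first tile
            p = [math.exp(s - m_new) for s in scores]
            l = alpha * l + sum(p)
            acc = [alpha * a + sum(pi * row[j] for pi, row in zip(p, v_tile))
                   for j, a in enumerate(acc)]
            m = m_new
        out.append([a / l for a in acc])
    return out

def reference_attention(q, k, v):
    """Untiled softmax attention with the same head grouping, for comparison."""
    group = len(q) // len(k)
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for h, qh in enumerate(q):
        kh, vh = k[h // group], v[h // group]
        scores = [scale * sum(a * b for a, b in zip(qh, row)) for row in kh]
        top = max(scores)
        w = [math.exp(s - top) for s in scores]
        z = sum(w)
        out.append([sum(wi * row[j] for wi, row in zip(w, vh)) / z
                    for j in range(len(qh))])
    return out
```

Because each tile only updates the running max, denominator, and numerator, the full score vector never has to be held at once. The tiled result should match `reference_attention` up to floating-point rounding for any `tile` size.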

Workflow runs completed with no jobs