Fix a potential buffer size misaligning issue in TMA description of partition attention #3

Closed
yuxiaoguo wants to merge 1 commit into HazyResearch:main from yuxiaoguo:fix_sm_partition_cacheline

@yuxiaoguo

The TMA descriptor for attn_lse_intermediates is initialized from the hardware's original SM count during make_globals (in latency/scheduler.py). However, the buffer's actual allocated size is later rounded up to a multiple of 16 based on that SM count (in demos/low-latency-llama/attention_reduction.cu). The two sizes disagree whenever the SM count is not already a multiple of 16, and this discrepancy causes TMA descriptor creation to fail.
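To illustrate the mismatch, here is a minimal Python sketch of the rounding arithmetic. It is not taken from the repository: `round_up`, `sizes_agree`, and the sample SM counts are hypothetical names and values chosen only for illustration, and the alignment of 16 comes from the description above.

```python
def round_up(value: int, multiple: int = 16) -> int:
    """Round `value` up to the nearest multiple of `multiple`."""
    return ((value + multiple - 1) // multiple) * multiple


def sizes_agree(num_sms: int, multiple: int = 16) -> bool:
    """True when the raw SM count already equals its rounded-up size."""
    return num_sms == round_up(num_sms, multiple)


if __name__ == "__main__":
    # Hypothetical SM counts, used only to show when the two sizes diverge.
    for num_sms in (128, 132, 148):
        print(num_sms, round_up(num_sms), sizes_agree(num_sms))
```

With an SM count of 128 the two sizes match, while 132 rounds up to 144, so a descriptor built from the raw count would describe a different extent than the rounded-up one. One plausible way to avoid this is to apply the same rounding on both sides, although the sketch does not claim that is what this PR's commit does.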

@yuxiaoguo yuxiaoguo closed this by deleting the head repository May 28, 2026