Feat/low bit ep #1336

Draft
JiaoliangYu wants to merge 13 commits into ROCm:jly/low-bit-ep from JiaoliangYu:feat/mori-fp8-dispatch-clean
@JiaoliangYu

Contributor

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

JiaoliangYu and others added 12 commits June 17, 2026 15:33
Pin each worker to a contiguous core range keyed on its rank at the very
top of AsyncIOProc.__init__, before any large allocation, so Linux
first-touch also places memory on the local NUMA node. Gated by
ATOM_CPU_AFFINITY so baseline vs pinned A/B needs no code change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
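
The pinning described in this commit could look roughly like the sketch below. It is not the code from the PR: `core_range_for_rank` and `maybe_pin_worker` are hypothetical names, and treating `ATOM_CPU_AFFINITY=1` as "enabled" is an assumption about how the variable is read. The idea is to compute a contiguous slice of the cores the process is already allowed to use, keyed on rank, and apply it with `os.sched_setaffinity` before any large allocation happens.

```python
import os
from typing import List, Optional


def core_range_for_rank(rank: int, cores_per_worker: int, available: List[int]) -> List[int]:
    """Return the contiguous slice of `available` core IDs assigned to `rank`."""
    start = rank * cores_per_worker
    end = start + cores_per_worker
    if rank < 0 or cores_per_worker <= 0 or end > len(available):
        raise ValueError("rank/core range does not fit in the available cores")
    return available[start:end]


def maybe_pin_worker(rank: int, cores_per_worker: int) -> Optional[List[int]]:
    """Pin the calling process to its rank's core range when the gate is on.

    Assumption: the value "1" enables pinning; the PR may parse the
    variable differently.
    """
    if os.environ.get("ATOM_CPU_AFFINITY", "0") != "1":
        return None  # baseline run: leave scheduling to the kernel
    if not hasattr(os, "sched_setaffinity"):
        return None  # affinity calls are Linux-only
    # Only choose from cores this process may already run on (cgroup/cpuset safe).
    available = sorted(os.sched_getaffinity(0))
    cores = core_range_for_rank(rank, cores_per_worker, available)
    os.sched_setaffinity(0, cores)  # 0 means "the calling process"
    return cores
```

Calling `maybe_pin_worker(rank, cores_per_worker)` as the first statement of the worker constructor keeps later allocations on the pinned cores, which is what lets first-touch place pages on the local NUMA node.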
Add amd-smi performance-determinism lock before the benchmark/server runs
and an always() unlock to AUTO before container teardown, in both the main
benchmark job and the regression-rerun job. The lock is driver-level and
persists across jobs on the bare-metal runner, so the unlock must run even
on failure.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
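
The "unlock must run even on failure" requirement is the same guarantee a `try`/`finally` gives in a script. A minimal sketch of that pattern follows; `determinism_lock` is a hypothetical helper, and the lock and unlock actions are passed in as callables because the exact `amd-smi` invocations are not quoted in this commit message.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def determinism_lock(lock: Callable[[], None], unlock: Callable[[], None]) -> Iterator[None]:
    """Run `lock` before the body and `unlock` afterwards, whatever happens.

    `lock` sits inside the try block so that `unlock` also runs when the
    lock step itself fails, mirroring an always() cleanup step in CI.
    """
    try:
        lock()
        yield
    finally:
        unlock()
```

Wrapping the benchmark in `with determinism_lock(lock, unlock): ...` restores the setting even when the benchmark raises, so a failed run does not leave the bare-metal runner locked for the next job.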
@JiaoliangYu JiaoliangYu force-pushed the feat/mori-fp8-dispatch-clean branch from e4c971c to 2a57892 Compare June 24, 2026 06:35
@JiaoliangYu JiaoliangYu force-pushed the feat/mori-fp8-dispatch-clean branch from 2a57892 to 11a717b Compare June 24, 2026 06:37
@zufayu zufayu requested review from ZhangLirong-amd and removed request for ZhangLirong-amd June 26, 2026 06:12
Comment thread atom/model_ops/moe.py
Returns the (possibly padded) tensor and the original row count so the
caller can unpad after reduce-scatter.
"""
def pad_for_all_gather(x: torch.Tensor):
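
For readers following the thread, the pad/unpad contract in the docstring above can be sketched without any framework. This is not the PR's `pad_for_all_gather`: the helper names are hypothetical, rows are plain Python lists rather than a `torch.Tensor`, and padding up to a multiple of the group size is an assumption about why padding is needed.

```python
from typing import List, Tuple

Row = List[float]


def pad_rows_to_multiple(rows: List[Row], multiple: int) -> Tuple[List[Row], int]:
    """Pad `rows` with zero rows until its length is a multiple of `multiple`.

    Returns the (possibly padded) rows and the original row count.
    """
    if multiple <= 0:
        raise ValueError("multiple must be positive")
    original = len(rows)
    remainder = original % multiple
    if remainder == 0:
        return rows, original  # already evenly divisible, nothing to pad
    width = len(rows[0])  # safe: remainder != 0 implies at least one row
    padding = [[0.0] * width for _ in range(multiple - remainder)]
    return rows + padding, original


def unpad_rows(rows: List[Row], original: int) -> List[Row]:
    """Drop the padding rows added by pad_rows_to_multiple."""
    return rows[:original]
```

Returning the original count alongside the padded data means the caller does not need to recompute it after the collective, which is the design the docstring describes.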

Collaborator


Why do we need to change the existing logic in all_gather and reduce-scatter? I think quantized dispatch/combine should not affect that logic.

@zufayu zufayu requested review from ZhangLirong-amd and removed request for ZhangLirong-amd June 26, 2026 07:50

2 participants