Add fused chunk-major LMCache staging #2

Open
yhl-amd wants to merge 1 commit into feature/lmcache-compatible-connector from feature/lmcache-compatible-fused-staging

yhl-amd (Owner) commented Jun 3, 2026

Summary

  • add native HIP chunk-major pack/unpack staging for the LMCache-compatible ATOM connector
  • route LMCache store/load through fastpath=fused_chunk when OFFLOAD_NATIVE_KV_STAGING=1
  • keep the existing fastpath=chunk path (request-level, segment-major) as the fallback
  • log connector fastpath in OFFLOAD load/save profile output
  • add chunk-major layout, duplicate block id, and fused connector fastpath tests
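For reviewers new to the layout, here is a minimal pure-Python reference sketch of chunk-major pack/unpack. It is not the HIP kernel or the connector API from this PR: it assumes a toy paged cache indexed as `kv[layer][block_id][token_in_block]` and an assumed `[layer][token]` order inside each chunk buffer, and `pack_chunk_major` / `unpack_chunk_major` are hypothetical names. It also shows one plausible duplicate-block-id behavior: a repeated block is read twice on pack and written twice on unpack (last write wins).

```python
from typing import List

# Hypothetical toy model (not the connector's real tensors):
# kv[layer][block_id][token_in_block] holds one value per cached token.
KVCache = List[List[List[int]]]


def pack_chunk_major(kv: KVCache, block_ids: List[int],
                     block_size: int, chunk_tokens: int) -> List[List[int]]:
    """Gather a request's paged KV into one contiguous buffer per chunk.

    Assumed order inside each chunk buffer: all tokens of layer 0, then layer 1, ...
    """
    num_layers = len(kv)
    num_tokens = len(block_ids) * block_size
    chunks: List[List[int]] = []
    for start in range(0, num_tokens, chunk_tokens):
        end = min(start + chunk_tokens, num_tokens)
        buf: List[int] = []
        for layer in range(num_layers):
            for t in range(start, end):
                # Map the request-relative token index to (block id, offset in block).
                buf.append(kv[layer][block_ids[t // block_size]][t % block_size])
        chunks.append(buf)
    return chunks


def unpack_chunk_major(chunks: List[List[int]], kv: KVCache, block_ids: List[int],
                       block_size: int, chunk_tokens: int) -> None:
    """Scatter chunk buffers back into the paged cache (inverse of the pack above)."""
    num_layers = len(kv)
    for c, buf in enumerate(chunks):
        start = c * chunk_tokens
        n = len(buf) // num_layers  # tokens carried by this (possibly short, last) chunk
        i = 0
        for layer in range(num_layers):
            for t in range(start, start + n):
                # A block id listed twice is simply written twice; the last write wins.
                kv[layer][block_ids[t // block_size]][t % block_size] = buf[i]
                i += 1


if __name__ == "__main__":
    NUM_LAYERS, NUM_BLOCKS, BLOCK_SIZE = 2, 3, 4
    src = [[[layer * 100 + b * 10 + t for t in range(BLOCK_SIZE)]
            for b in range(NUM_BLOCKS)] for layer in range(NUM_LAYERS)]
    chunks = pack_chunk_major(src, [2, 0, 1], BLOCK_SIZE, chunk_tokens=8)
    print(chunks[1])  # [10, 11, 12, 13, 110, 111, 112, 113]
    dst = [[[0] * BLOCK_SIZE for _ in range(NUM_BLOCKS)] for _ in range(NUM_LAYERS)]
    unpack_chunk_major(chunks, dst, [2, 0, 1], BLOCK_SIZE, chunk_tokens=8)
    print(dst == src)  # True
```

The point of the chunk-major order, under these assumptions, is that each LMCache chunk already sits in one contiguous buffer, so a store or load can move it with a single copy instead of stitching per-layer segments of the whole request.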

Validation

  • Host: python3 -m py_compile atom/kv_transfer/offload/*.py
  • Host: python3 -m pytest tests/test_lmcache_offload_connector.py -q -> 19 passed, 22 skipped
  • Docker yhl_kvoff_009: python3 -m py_compile atom/kv_transfer/offload/*.py && python3 -m pytest tests/test_lmcache_offload_connector.py -q -> 41 passed
  • Docker native HIP smoke with OFFLOAD_NATIVE_KV_STAGING=1 OFFLOAD_CODEC_LAYOUT=segment_indexed -> native_kv_staging smoke ok
  • git diff --check

Bench

Config: MiniMax-M2.5, TP=2, prefix cache on, chunked prefill on, MAXBATCH=16384, OFFLOAD_REQUEST_FASTPATH=0, OFFLOAD_NATIVE_STITCH=1, OFFLOAD_NATIVE_KV_STAGING=1, OFFLOAD_MIN_LOAD_TOKENS=1024.
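The fastpath routing described in the Summary can be sketched against this bench config as follows. This is an illustration, not the connector's code: `select_fastpath` and `meets_min_load` are hypothetical helpers, the check for the literal string `"1"` is assumed, and reading OFFLOAD_MIN_LOAD_TOKENS as a minimum cache-hit length for a load is an assumption based only on the variable's name.

```python
import os
from typing import Mapping


def select_fastpath(env: Mapping[str, str]) -> str:
    """Return the fastpath label the connector would report (hypothetical helper)."""
    # Per the PR summary: native KV staging on -> fused chunk-major path;
    # otherwise fall back to the existing request-level segment-major path.
    if env.get("OFFLOAD_NATIVE_KV_STAGING", "0") == "1":
        return "fused_chunk"
    return "chunk"


def meets_min_load(num_tokens: int, env: Mapping[str, str]) -> bool:
    # ASSUMPTION: OFFLOAD_MIN_LOAD_TOKENS is treated as the minimum number of
    # hit tokens worth loading from the offload cache; the default of 0 is also assumed.
    return num_tokens >= int(env.get("OFFLOAD_MIN_LOAD_TOKENS", "0"))


# Flags copied from the bench config line above.
BENCH_ENV = {
    "OFFLOAD_REQUEST_FASTPATH": "0",
    "OFFLOAD_NATIVE_STITCH": "1",
    "OFFLOAD_NATIVE_KV_STAGING": "1",
    "OFFLOAD_MIN_LOAD_TOKENS": "1024",
}

if __name__ == "__main__":
    print(select_fastpath(BENCH_ENV))   # fused_chunk
    print(select_fastpath(os.environ))  # depends on the current shell
```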

  • 64K c2/s4 fixed-source: avg TTFT 2.2243s, follow avg 1.5089s, load total avg 132.50ms, fastpath=fused_chunk=80, no failures
  • 128K c2/s2 fixed-source: avg TTFT 7.5709s, follow avg 1.3424s, load total avg 144.56ms, fastpath=fused_chunk=86, no failures
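To put the load numbers in scale, the snippet below divides each run's load total avg by its avg TTFT. It assumes both averages are taken over comparable per-request events, which the profile output may not guarantee, so read the ratio only as a rough order of magnitude.

```python
# Averages copied from the two bench lines above (TTFT in seconds, load in ms).
RUNS = {
    "64K c2/s4": {"ttft_s": 2.2243, "load_ms": 132.50},
    "128K c2/s2": {"ttft_s": 7.5709, "load_ms": 144.56},
}


def load_share(ttft_s: float, load_ms: float) -> float:
    """Fraction of the average TTFT accounted for by the average load total."""
    return (load_ms / 1000.0) / ttft_s


for name, r in RUNS.items():
    print(f"{name}: load ~ {load_share(r['ttft_s'], r['load_ms']):.1%} of avg TTFT")
```

This prints roughly 6.0% for the 64K run and 1.9% for the 128K run. The load total grows by only about 9% (132.50 ms to 144.56 ms) while the context doubles.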

Detailed results are recorded in /shared/amdgpu/home/hyi_qle/yhl/project/009-kv-off-llmcache/18_SUMMARY_009_kv_offload.md and code-review/lmcache-compatible-fused-staging-plan.md.
