Add LMCache-compatible offload connector #1

Open
yhl-amd wants to merge 1 commit into feature/lmcache-offload-scheme-a from feature/lmcache-compatible-connector

@yhl-amd yhl-amd commented Jun 3, 2026

Owner

Summary

Replace the current LMCache offload worker save/load path with LMCache CacheEngine.store() / CacheEngine.retrieve() while keeping ATOM's AITER KV layout as opaque raw bytes.

Key pieces:

  • add ATOMRawBytesLMCacheMetadata and ATOMLMCacheGPUConnector;
  • pass the ATOM GPU connector into LMCacheEngineBuilder.get_or_create();
  • add device staging APIs in ATOMKVByteCodec;
  • keep scheduler-side load/save/deferred-free semantics;
  • fix save-frontier accounting so LMCache/HBM-hit prefixes are not re-saved.
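To illustrate what "opaque raw bytes" can mean in practice, here is a minimal sketch in which the cache layer sees only byte payloads plus size metadata and never interprets the KV layout. Everything in it is an assumption made for illustration: `RawChunkMeta`, `split_into_chunks`, `join_chunks`, and the chunk and token sizes are hypothetical and are not the ATOM or LMCache APIs named above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RawChunkMeta:
    """Hypothetical per-chunk metadata: sizes only, no tensor layout."""
    num_tokens: int
    bytes_per_token: int

    @property
    def nbytes(self) -> int:
        return self.num_tokens * self.bytes_per_token


def split_into_chunks(
    kv_bytes: bytes, bytes_per_token: int, chunk_tokens: int
) -> List[Tuple[RawChunkMeta, bytes]]:
    """Split a raw KV buffer into (meta, payload) pairs of at most chunk_tokens tokens."""
    if bytes_per_token <= 0 or chunk_tokens <= 0:
        raise ValueError("bytes_per_token and chunk_tokens must be positive")
    if len(kv_bytes) % bytes_per_token != 0:
        raise ValueError("buffer length is not a whole number of tokens")
    view = memoryview(kv_bytes)
    step = chunk_tokens * bytes_per_token
    chunks = []
    for start in range(0, len(view), step):
        payload = bytes(view[start:start + step])
        meta = RawChunkMeta(len(payload) // bytes_per_token, bytes_per_token)
        chunks.append((meta, payload))
    return chunks


def join_chunks(chunks: List[Tuple[RawChunkMeta, bytes]]) -> bytes:
    """Reassemble the original buffer, checking each payload against its metadata."""
    out = bytearray()
    for meta, payload in chunks:
        if len(payload) != meta.nbytes:
            raise ValueError("payload size does not match metadata")
        out += payload
    return bytes(out)
```

Because the payload is never decoded, the same round trip works for any KV layout as long as the byte count per token is known on both the save and load sides.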

Why

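The first LMCache-compatible version had a regression: a prefix that had already hit in LMCache/HBM could still be saved again with skip=0. In one bad 128K case, the load was skipped because HBM already covered the LMCache hit, yet all 129792 tokens were stored again: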

[OFFLOAD-LOAD-SKIP] seq=5 hbm_cached=129856 lmc_cached=129792 need=-64 reason=hbm_satisfies_after_alloc
[OFFLOAD-SAVE-PROF] req=5 toks=129792 skip=0 store_ms=1442/1463ms

This PR tracks the save frontier for LMCache hits and rolls it back only when a load fails, so warm requests save only newly computed suffix chunks.
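The frontier accounting can be sketched roughly as follows. This is not the code in this PR: `SaveFrontier`, its method names, and the chunk-aligned, full-chunks-only save policy are assumptions chosen to show the idea. The sketch advances the frontier whenever a hit is reported, including when the load itself is skipped because HBM already covers the prefix.

```python
from typing import Dict, Tuple


class SaveFrontier:
    """Hypothetical per-request tracker of how many prefix tokens are already stored."""

    def __init__(self) -> None:
        self._frontier: Dict[int, int] = {}
        self._before_hit: Dict[int, int] = {}

    def on_cache_hit(self, req_id: int, hit_tokens: int) -> None:
        """Record a hit: the first hit_tokens tokens do not need to be saved again."""
        prev = self._frontier.get(req_id, 0)
        self._before_hit[req_id] = prev
        self._frontier[req_id] = max(prev, hit_tokens)

    def on_load_failed(self, req_id: int) -> None:
        """Roll back to the pre-hit frontier; the prefix must be recomputed and re-saved."""
        if req_id in self._before_hit:
            self._frontier[req_id] = self._before_hit.pop(req_id)

    def plan_save(self, req_id: int, total_tokens: int, chunk_tokens: int) -> Tuple[int, int]:
        """Return (skip, save_end): only tokens in [skip, save_end) are stored."""
        skip = min(self._frontier.get(req_id, 0), total_tokens)
        skip -= skip % chunk_tokens  # a partially covered chunk is saved again
        save_end = total_tokens - total_tokens % chunk_tokens  # full chunks only (assumption)
        save_end = max(save_end, skip)
        self._frontier[req_id] = save_end
        return skip, save_end


frontier = SaveFrontier()
# Warm request: the 129792-token prefix from the log above already hit.
frontier.on_cache_hit(req_id=5, hit_tokens=129792)
print(frontier.plan_save(req_id=5, total_tokens=130304, chunk_tokens=256))
```

With a 256-token chunk size (an assumed value), the warm request above skips the 129792-token prefix and stores only the two new suffix chunks, instead of starting from skip=0.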

Validation

Host:

python3 -m py_compile atom/kv_transfer/offload/*.py
python3 -m pytest tests/test_lmcache_offload_connector.py -q
# 19 passed, 18 skipped
git diff --check

Docker yhl_kvoff_009:

cd /host_009/ATOM
python3 -m py_compile atom/kv_transfer/offload/*.py
python3 -m pytest tests/test_lmcache_offload_connector.py -q
# 37 passed

Bench Notes

Compared against the previous no-fastpath segment_indexed CPU offload:

  • 128K c2/s2 follow avg TTFT: 2.6139s -> 1.9139s; load avg 675.43ms -> 459.30ms.
  • 64K c2/s4 follow avg TTFT: 2.3633s -> 2.2717s; load avg 478.18ms -> 457.24ms.
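For readers who want the relative changes, the figures above can be converted to percentage reductions with a small helper. The function name `relative_reduction` and the dictionary keys are illustrative only; the numbers are copied from the two bullets above.

```python
def relative_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is lower than `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# (before, after) pairs taken from the bench notes above.
results = {
    "128K c2/s2 follow avg TTFT (s)": (2.6139, 1.9139),
    "128K c2/s2 load avg (ms)": (675.43, 459.30),
    "64K c2/s4 follow avg TTFT (s)": (2.3633, 2.2717),
    "64K c2/s4 load avg (ms)": (478.18, 457.24),
}

for name, (before, after) in results.items():
    print(f"{name}: {relative_reduction(before, after):.2f}% lower")
```

This works out to roughly a 27% TTFT reduction and a 32% load-time reduction in the 128K case, and about 4% for both metrics in the 64K case.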

Server logs confirm enable_prefix_caching=True and enable_chunked_prefill=True; no failed load markers were observed.

@yhl-amd yhl-amd changed the base branch from feature/lmcache-offload-scheme-a to main June 3, 2026 08:14
@yhl-amd yhl-amd changed the base branch from main to feature/lmcache-offload-scheme-a June 3, 2026 08:22