Optimize memory usage for KV cache #838
Open
hnwyllmm wants to merge 4 commits into
The corresponding Dima issue is described in detail in the Optimization Analysis.
Core Testing
Branch updated from 082d9f5 to cbd7570.
Core test
Task Description
In scenarios with small memory specifications (e.g., 2GB mini_mode) or cold starts, the original KV Cache design, which pre-allocated large static arrays and issued many small memory requests, caused severe memory fragmentation and a poor Used/Hold memory ratio, with Hold memory inflated well above actual usage.
This MR integrates two key optimization commits, refactoring the allocation model for three core memory labels of the KV Cache:
CACHE_MAP_NODE, CACHE_MAP_BKT, and CACHE_MB_HANDLE. After optimization, the basic metadata startup overhead of the KV Cache in a 2GB memory specification dropped sharply from nearly 20MB to ~4MB. The Used/Hold ratio improved significantly, virtually eliminating memory wasted to fragmentation.
Solution Description
2. Core Optimization Design & Implementation (Modifications & Design)
2.1 Shared Map Node Allocator (Target: CACHE_MAP_NODE)
Originally, each cache instance (ObKVCacheInst, up to MAX_CACHE_NUM of them) held its own independent lock-free FIFO allocator, ObLfFIFOAllocator node_allocator_. Because each allocator instance pre-allocates its own underlying buffers (blocks/chunks), static memory waste grew with the number of registered cache instances.
The node allocator was refactored into a single global shared allocator: all ObKVCacheInst instances now share one node_allocator_ owned by ObKVCacheMap.
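A minimal C++ sketch of why sharing helps, assuming (as described above) that every allocator instance reserves backing memory as soon as it exists. FifoAllocatorModel, CacheInstOld, CacheInstNew, CacheMapNew and the chunk sizes are hypothetical stand-ins, not the OceanBase types; the lock-free behavior of ObLfFIFOAllocator is not modeled, and a real shared allocator must be safe for concurrent use by all instances.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a FIFO allocator: the only behavior modeled is that
// each instance reserves a fixed backing chunk as soon as it is constructed.
class FifoAllocatorModel {
 public:
  explicit FifoAllocatorModel(std::size_t chunk_size) : chunk_(chunk_size) {}
  std::size_t reserved_bytes() const { return chunk_.size(); }

 private:
  std::vector<unsigned char> chunk_;  // pre-allocated backing buffer
};

// Before: every cache instance owns a private allocator.
struct CacheInstOld {
  explicit CacheInstOld(std::size_t chunk) : node_allocator(chunk) {}
  FifoAllocatorModel node_allocator;
};

// After: the map owns one allocator; instances only hold a pointer to it.
struct CacheMapNew {
  explicit CacheMapNew(std::size_t chunk) : node_allocator(chunk) {}
  FifoAllocatorModel node_allocator;
};

struct CacheInstNew {
  FifoAllocatorModel* node_allocator = nullptr;
};

// Static footprint when n instances each own an allocator: grows with n.
std::size_t reserved_per_instance(std::size_t n, std::size_t chunk) {
  std::vector<CacheInstOld> insts;
  insts.reserve(n);
  std::size_t total = 0;
  for (std::size_t i = 0; i < n; ++i) {
    insts.emplace_back(chunk);
    total += insts.back().node_allocator.reserved_bytes();
  }
  return total;
}

// Static footprint when n instances share the map's allocator: one chunk.
std::size_t reserved_shared(std::size_t n, std::size_t chunk) {
  CacheMapNew map(chunk);
  std::vector<CacheInstNew> insts(n);
  for (auto& inst : insts) inst.node_allocator = &map.node_allocator;
  return map.node_allocator.reserved_bytes();
}
```

In this model, reserved_per_instance(n, chunk) grows as n * chunk, while reserved_shared(n, chunk) stays at a single chunk no matter how many instances are registered.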
2.2 Contiguous Allocation of Hash Bucket Pointers (Target: CACHE_MAP_BKT)
In the old code, the bucket_cnt hash buckets in ObKVCacheMap were allocated via bucket_allocator_ through many small allocations: a loop of bucket_cnt iterations, each allocating only sizeof(Node*) * bucket_size_ bytes. Every small allocation carries allocator management metadata, so the overhead was large and the Used/Hold ratio extremely low (in a 2GB spec, actual usage was ~1.6MB while the allocator held ~8MB of physical memory).
This was changed to a single large allocation split by offset. During init, one contiguous block of sizeof(Node*) * bucket_num_ bytes is allocated via bucket_allocator_, and each bucket's starting pointer is derived from it by pointer offset. Deallocation requires only a single free of the head pointer. CACHE_MAP_BKT dropped sharply from ~8MB to ~2MB.
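The single-allocation layout can be sketched as follows. This is a hypothetical illustration, not the ObKVCacheMap code: std::malloc/std::free stand in for bucket_allocator_, BucketTable and its methods are invented names, the small table of segment start pointers is kept in a std::vector for brevity, and bucket_num is assumed to equal bucket_cnt * bucket_size so the segments tile the block exactly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

struct Node;  // hash-chain node; only pointers to it are stored here

// Hypothetical sketch: all bucket slots live in one contiguous allocation that
// is carved into bucket_cnt segments of bucket_size pointers each.
class BucketTable {
 public:
  BucketTable() = default;
  BucketTable(const BucketTable&) = delete;
  BucketTable& operator=(const BucketTable&) = delete;
  ~BucketTable() { destroy(); }

  bool init(std::size_t bucket_cnt, std::size_t bucket_size) {
    destroy();  // allow re-init without leaking
    const std::size_t bucket_num = bucket_cnt * bucket_size;
    // One allocation replaces bucket_cnt small ones.
    head_ = static_cast<Node**>(std::malloc(sizeof(Node*) * bucket_num));
    if (head_ == nullptr) return false;
    for (std::size_t i = 0; i < bucket_num; ++i) head_[i] = nullptr;  // empty chains
    // Split by pointer offset: segment i starts i * bucket_size slots in.
    segments_.resize(bucket_cnt);
    for (std::size_t i = 0; i < bucket_cnt; ++i) segments_[i] = head_ + i * bucket_size;
    bucket_size_ = bucket_size;
    return true;
  }

  Node** segment(std::size_t i) const { return segments_[i]; }

  Node*& slot(std::size_t seg, std::size_t pos) {
    assert(seg < segments_.size() && pos < bucket_size_);
    return segments_[seg][pos];
  }

  void destroy() {
    std::free(head_);  // a single free releases every segment
    head_ = nullptr;
    segments_.clear();
    bucket_size_ = 0;
  }

 private:
  Node** head_ = nullptr;
  std::vector<Node**> segments_;
  std::size_t bucket_size_ = 0;
};
```

Because segment(i) is simply head + i * bucket_size, no per-bucket allocation header is paid, and tearing the table down is one free call.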
2.3 On-Demand Allocation with a Two-Dimensional Segmented Array (Target: CACHE_MB_HANDLE)
Previously, ObKVCacheStore pre-allocated a flat, one-dimensional array, mb_handles_, at startup, sized from max_cache_size to hold all memory block handles. In small memory specs, constraints such as the Hazard Pointer retirement limit and thread-count constants kept the computed max_mb_num large, which forced a ~8MB pre-allocation on every cold start, even with no cached data. A purely dynamic pool allocation was not a good alternative: it would slow the global traversals performed by high-frequency background tasks such as refresh_score and wash, and would introduce concurrency risks.
The flat array was replaced with ObKVMBHandleArray, a two-dimensional segmented array whose BLOCKs are allocated on demand.
OB_MALLOC_NORMAL_BLOCK_SIZE (8KB) is used as the physical allocation unit (BLOCK). Handles are stored contiguously within each BLOCK, maximizing memory utilization.
When try_supply_mb is triggered to supply a memory block, ensure_blocks is called to expand the array and initialize the corresponding BLOCK on demand.
Elements are addressed in O(1) through (idx / HANDLE_BLOCK_SIZE, idx % HANDLE_BLOCK_SIZE), preserving the efficiency of the original traversal logic.
Concurrent BLOCK expansion by multiple threads is made safe with ATOMIC_LOAD and ATOMIC_BCAS.
mb_handles_pool_ is now initialized through an allocator, removing its dependency on the physical contiguity of the original one-dimensional array.
As a result, the startup overhead of CACHE_MB_HANDLE is reduced to nearly zero; only a small number of active BLOCKs are allocated on demand.
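A rough sketch of the segmented-array idea, assuming a fixed top-level table of BLOCK pointers. MBHandle, MBHandleArray, ensure_block, kMaxBlocks and the exact 8KB block size are hypothetical; std::atomic load and compare_exchange_strong play the role that ATOMIC_LOAD and ATOMIC_BCAS play in the description, and new[]/delete[] stand in for the OceanBase allocator.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a memory-block handle.
struct MBHandle {
  std::uint64_t payload = 0;
};

// Hypothetical sizing: one BLOCK is 8KB of contiguous handles.
constexpr std::size_t kBlockBytes = 8 * 1024;
constexpr std::size_t kHandlesPerBlock = kBlockBytes / sizeof(MBHandle);
constexpr std::size_t kMaxBlocks = 64;  // fixed top-level table of BLOCK pointers

class MBHandleArray {
 public:
  MBHandleArray() = default;
  MBHandleArray(const MBHandleArray&) = delete;
  MBHandleArray& operator=(const MBHandleArray&) = delete;
  ~MBHandleArray() {
    for (auto& b : blocks_) delete[] b.load(std::memory_order_relaxed);
  }

  // Ensures the BLOCK holding idx exists; safe to call from many threads.
  bool ensure_block(std::size_t idx) {
    const std::size_t b = idx / kHandlesPerBlock;
    if (b >= kMaxBlocks) return false;
    if (blocks_[b].load(std::memory_order_acquire) != nullptr) return true;
    MBHandle* fresh = new MBHandle[kHandlesPerBlock]();  // throws std::bad_alloc on failure
    MBHandle* expected = nullptr;
    // Exactly one thread installs its BLOCK; a losing thread frees its copy.
    if (!blocks_[b].compare_exchange_strong(expected, fresh, std::memory_order_acq_rel,
                                            std::memory_order_acquire)) {
      delete[] fresh;
    }
    return true;
  }

  // Two-level O(1) addressing; ensure_block(idx) must have succeeded first.
  MBHandle& at(std::size_t idx) {
    assert(idx / kHandlesPerBlock < kMaxBlocks);
    MBHandle* block = blocks_[idx / kHandlesPerBlock].load(std::memory_order_acquire);
    assert(block != nullptr);
    return block[idx % kHandlesPerBlock];
  }

  // Full traversal (as a background task would do) that skips absent BLOCKs.
  template <typename Fn>
  void for_each(Fn fn) {
    for (std::size_t b = 0; b < kMaxBlocks; ++b) {
      MBHandle* block = blocks_[b].load(std::memory_order_acquire);
      if (block == nullptr) continue;
      for (std::size_t i = 0; i < kHandlesPerBlock; ++i) fn(b * kHandlesPerBlock + i, block[i]);
    }
  }

  std::size_t allocated_blocks() const {
    std::size_t n = 0;
    for (const auto& b : blocks_) {
      if (b.load(std::memory_order_acquire) != nullptr) ++n;
    }
    return n;
  }

 private:
  std::atomic<MBHandle*> blocks_[kMaxBlocks]{};  // all null until first use
};
```

If two threads race to create the same BLOCK, exactly one compare-exchange succeeds; the loser frees its own allocation and uses the winner's BLOCK. Traversal touches only BLOCKs that exist, so in this sketch an idle cache pays only for the top-level pointer table.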