Optimize memory usage for KV cache by hnwyllmm · Pull Request #838 · oceanbase/seekdb

hnwyllmm · 2026-06-02T03:18:57Z

Task Description

In small-memory configurations (e.g., 2GB mini_mode) and during cold starts, the original KV Cache design pre-allocated large static arrays and issued many small allocation requests. This caused severe memory fragmentation and a poor Used/Hold memory ratio, with Hold memory artificially high.

This PR combines two key optimization commits that refactor the allocation model for three core KV Cache memory labels: CACHE_MAP_NODE, CACHE_MAP_BKT, and CACHE_MB_HANDLE. After optimization, the basic metadata startup overhead of the KV Cache in a 2GB memory specification drops from nearly 20MB to ~4MB. The Used/Hold ratio improves significantly, and fragmentation waste is virtually eliminated.

┌─────────────────────────┬──────────────────────────┬────────────────────────┬─────────────────────────────────────────────────┐
│ Memory Label            │ Pre-optimization Hold    │ Post-optimization Hold │ Optimization Effect / Savings                   │
├─────────────────────────┼──────────────────────────┼────────────────────────┼─────────────────────────────────────────────────┤
│ CACHE_MAP_NODE          │ ~832 KB (multi-instance) │ Shared global          │ Saves at least 832 KB base overhead             │
│ CACHE_MAP_BKT           │ ~8.0 MB                  │ ~2.0 MB                │ Reduced by 75%, Used/Hold ratio ~100%           │
│ CACHE_MB_HANDLE         │ ~8.0 MB (static full)    │ ~0 MB (on-demand)      │ Reduced ~100% (cold start), 8 KB increments     │
│ KvstCachWashStr         │ ~1.1 MB                  │ ~8000 B                │ Changed to temp variable, allocated when needed │
│ Total Metadata Overhead │ ~17 MB                   │ ~2.8 MB (cold start)   │ Cold start overhead reduced by 80%+             │
└─────────────────────────┴──────────────────────────┴────────────────────────┴─────────────────────────────────────────────────┘

Solution Description

2. Core Optimization Design & Implementation (Modifications & Design)

2.1 Shared Map Node Allocator (Target: CACHE_MAP_NODE)

Background & Problem:
Originally, each cache instance (ObKVCacheInst, up to MAX_CACHE_NUM) held its own independent lock-free FIFO allocator ObLfFIFOAllocator node_allocator_. Since each allocator instance pre-allocated underlying buffers (blocks/chunks), this caused significant static memory waste as the number of registered cache instances increased.
Refactoring Solution:
Refactored to use a global shared allocator. All ObKVCacheInst instances now share a single global node_allocator_ belonging to ObKVCacheMap.
Optimization Effect:
- Directly saves at least 832KB of pre-allocated memory per tenant.
- Prevents linear expansion of the allocator with increasing cache instance count.
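
The difference between per-instance and shared allocators can be illustrated with a minimal sketch. It is not the seekdb code: ReservingAllocator, OwnedInst, SharedInst, hold_owned and hold_shared are hypothetical names, and the allocator is a stand-in that simply reserves a buffer at construction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical allocator that reserves its backing buffer at construction,
// standing in for the pre-allocating behaviour described above.
class ReservingAllocator {
public:
  explicit ReservingAllocator(std::size_t reserve_bytes) : buffer_(reserve_bytes) {}
  std::size_t hold_bytes() const { return buffer_.size(); }

private:
  std::vector<unsigned char> buffer_;
};

// Before: each cache instance owns its own allocator.
struct OwnedInst {
  explicit OwnedInst(std::size_t reserve_bytes) : node_allocator(reserve_bytes) {}
  ReservingAllocator node_allocator;
};

// After: each cache instance borrows one allocator owned elsewhere.
struct SharedInst {
  explicit SharedInst(ReservingAllocator &shared) : node_allocator(&shared) {}
  ReservingAllocator *node_allocator;
};

// Total bytes held when every instance reserves its own buffer.
std::size_t hold_owned(std::size_t inst_count, std::size_t reserve_bytes) {
  std::vector<OwnedInst> insts;
  insts.reserve(inst_count);
  for (std::size_t i = 0; i < inst_count; ++i) {
    insts.emplace_back(reserve_bytes);
  }
  std::size_t total = 0;
  for (const OwnedInst &inst : insts) {
    total += inst.node_allocator.hold_bytes();
  }
  return total;
}

// Total bytes held when all instances share a single allocator.
std::size_t hold_shared(std::size_t inst_count, std::size_t reserve_bytes) {
  ReservingAllocator shared(reserve_bytes);
  std::vector<SharedInst> insts;
  for (std::size_t i = 0; i < inst_count; ++i) {
    insts.emplace_back(shared);
  }
  return shared.hold_bytes();  // does not depend on inst_count
}
```

In this sketch the owned layout holds inst_count times the reserve size, while the shared layout holds the reserve size once regardless of how many instances are registered.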

────────────────────────────────────────

2.2 Hash Bucket Pointer Contiguous Large Memory Allocation (Target: CACHE_MAP_BKT)

Background & Problem:
In the old code, the bucket_cnt hash buckets in ObKVCacheMap were allocated via bucket_allocator_ through many small memory allocations (a loop of bucket_cnt times, each allocating only sizeof(Node*) * bucket_size_). Frequent small allocations came with huge allocator management metadata overhead, resulting in an extremely low Used/Hold ratio (in a 2GB spec, actual usage was ~1.6MB, but the allocator held ~8MB of physical memory).
Refactoring Solution:
Changed to a single large allocation that is split by offset. During init, one contiguous block (sizeof(Node*) * bucket_num_) is allocated via bucket_allocator_, and each bucket is assigned its slice of that block by pointer offset. Deallocation requires only a single free of the head pointer.
Optimization Effect:
- Dramatic memory reduction: In a 2GB configuration, the Hold memory for CACHE_MAP_BKT dropped sharply from 8MB to ~2MB.
- Fragmentation eliminated: With hundreds of small memory fragment requests removed, the Used/Hold ratio is now nearly 100%.
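
The single-allocation layout can be sketched as follows. This is a simplified illustration under assumptions, not the ObKVCacheMap implementation: BucketTable and its members are hypothetical, and std::calloc/std::free stand in for bucket_allocator_.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

struct Node;  // opaque hash node; its layout is irrelevant here

// Hypothetical table of `bucket_cnt` buckets with `bucket_size` slots each,
// backed by ONE allocation instead of `bucket_cnt` small ones.
class BucketTable {
public:
  BucketTable() = default;
  BucketTable(const BucketTable &) = delete;
  BucketTable &operator=(const BucketTable &) = delete;
  ~BucketTable() { destroy(); }

  bool init(std::size_t bucket_cnt, std::size_t bucket_size) {
    if (slots_ != nullptr || bucket_cnt == 0 || bucket_size == 0) return false;
    if (bucket_cnt > SIZE_MAX / bucket_size) return false;  // overflow guard
    const std::size_t total = bucket_cnt * bucket_size;
    Node **base = static_cast<Node **>(std::calloc(total, sizeof(Node *)));
    if (base == nullptr) return false;
    // Explicitly mark every slot as an empty chain.
    std::fill_n(base, total, static_cast<Node *>(nullptr));
    slots_ = base;
    bucket_cnt_ = bucket_cnt;
    bucket_size_ = bucket_size;
    return true;
  }

  // Bucket `i` is a view into the shared block at offset i * bucket_size.
  Node **bucket(std::size_t i) { return slots_ + i * bucket_size_; }

  // One free of the head pointer releases every bucket.
  void destroy() {
    std::free(slots_);
    slots_ = nullptr;
    bucket_cnt_ = 0;
    bucket_size_ = 0;
  }

private:
  Node **slots_ = nullptr;
  std::size_t bucket_cnt_ = 0;
  std::size_t bucket_size_ = 0;
};
```

Because the allocator sees one request rather than one per bucket, its per-allocation bookkeeping is paid once, which is the effect the Used/Hold improvement above relies on.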

────────────────────────────────────────

2.3 Two-Dimensional Segmented Array Dynamic On-Demand Allocation (Target: CACHE_MB_HANDLE)

Background & Problem:
Previously, ObKVCacheStore would pre-allocate a flat, one-dimensional large array mb_handles_ at startup based on max_cache_size to hold all memory block handles. In small memory specs, due to constraints like Hazard Pointer retirement limits and thread count constants, the calculated max_mb_num remained large. This forced a mandatory pre-allocation of ~8MB memory during a cold start, even with no data. Switching to a purely dynamic pool allocation would impact the performance of global traversal in high-frequency background tasks like refresh_score and wash, and introduce concurrency risks.
Refactoring Solution:
Introduced a 2D segmented array ObKVMBHandleArray implementing a dynamically expandable Block mechanism.
1. Segmentation by Block: Uses the system's standard large block size OB_MALLOC_NORMAL_BLOCK_SIZE (8KB) as the physical allocation unit (BLOCK), replacing the flat large array. Handles are stored contiguously within a BLOCK, maximizing memory utilization.
2. Zero Pre-allocation at Cold Start & On-Demand Growth: No BLOCK physical memory is allocated at startup. When try_supply_mb is triggered to supply a block, ensure_blocks is called on-demand to expand and initialize the corresponding BLOCK.
3. Traversal Performance & Lock-Free Safety: Handle location uses an extremely lightweight 2D array index calculation (idx / HANDLE_BLOCK_SIZE and idx % HANDLE_BLOCK_SIZE), preserving the efficiency of the original traversal logic. Safety during concurrent BLOCK expansion by multiple threads is ensured via ATOMIC_LOAD and ATOMIC_BCAS.
4. Decoupled Pool Pre-allocation: The mb_handles_pool_ was modified to be initialized via an allocator, removing its dependency on the physical memory continuity of the original one-dimensional array.
Optimization Effect:
- The memory overhead for CACHE_MB_HANDLE during startup is reduced to nearly zero (only a small number of active blocks are allocated on demand).
- Completely eliminates the 8MB memory waste during cold starts, achieving smooth, stepwise growth of handle memory as data volume increases (in 8KB increments).
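
A minimal sketch of the segmented-array idea follows. It assumes std::atomic in place of the ATOMIC_LOAD and ATOMIC_BCAS macros, operator new in place of the real allocator, and a placeholder Handle type; SegmentedHandleArray and its template parameters are hypothetical names, and only ensure_blocks echoes a name from the description.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <new>

// Placeholder handle; the real handle layout is not shown in the description.
struct Handle {
  int value = 0;
};

// Two-level array: a fixed table of block pointers, blocks created lazily.
template <std::size_t kHandlesPerBlock, std::size_t kMaxBlocks>
class SegmentedHandleArray {
  static_assert(kHandlesPerBlock > 0 && kMaxBlocks > 0, "sizes must be non-zero");

public:
  SegmentedHandleArray() = default;
  SegmentedHandleArray(const SegmentedHandleArray &) = delete;
  SegmentedHandleArray &operator=(const SegmentedHandleArray &) = delete;
  ~SegmentedHandleArray() {
    for (auto &b : blocks_) delete[] b.load(std::memory_order_relaxed);
  }

  // Make sure blocks [0, block_cnt) exist. Callable from several threads.
  bool ensure_blocks(std::size_t block_cnt) {
    if (block_cnt > kMaxBlocks) return false;
    for (std::size_t i = 0; i < block_cnt; ++i) {
      if (blocks_[i].load(std::memory_order_acquire) != nullptr) continue;
      Handle *fresh = new (std::nothrow) Handle[kHandlesPerBlock];
      if (fresh == nullptr) return false;
      Handle *expected = nullptr;
      // Publish with compare-and-swap; a thread that loses the race
      // discards its own block and uses the winner's.
      if (!blocks_[i].compare_exchange_strong(expected, fresh,
                                              std::memory_order_acq_rel,
                                              std::memory_order_acquire)) {
        delete[] fresh;
      }
    }
    return true;
  }

  // Returns nullptr if idx is out of range or its block is not allocated yet.
  Handle *at(std::size_t idx) {
    const std::size_t block = idx / kHandlesPerBlock;
    if (block >= kMaxBlocks) return nullptr;
    Handle *base = blocks_[block].load(std::memory_order_acquire);
    return base == nullptr ? nullptr : base + idx % kHandlesPerBlock;
  }

  std::size_t allocated_blocks() const {
    std::size_t n = 0;
    for (const auto &b : blocks_) {
      if (b.load(std::memory_order_acquire) != nullptr) ++n;
    }
    return n;
  }

private:
  std::atomic<Handle *> blocks_[kMaxBlocks]{};  // all null at construction
};
```

A traversal over indexes 0..N stays a plain loop over at(idx), and nothing is allocated until ensure_blocks is first called, which mirrors the zero pre-allocation behaviour described above.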

Passed Regressions

Upgrade Compatibility

Other Information

Related links:
- DIMA-2026051100116012737

Release Note

hnwyllmm · 2026-06-02T03:19:04Z

The corresponding DIMA issue is described in the Optimization Analysis.

hnwyllmm · 2026-06-02T03:19:14Z

Core Testing
Execution result: [FAILED] Task execution failed. GID: 4758000268 Details

hnwyllmm · 2026-06-02T05:54:31Z

Core test
Execution process: [FAILED] Task run failed. GID: 4758000268 Details

hnwyllmm · 2026-06-02T05:54:34Z

Core test
Execution process: [SUCCESS] Task ran successfully. GID: 4758000442 Details

hnwyllmm added 4 commits May 28, 2026 14:35

sharing kvcache map node allocator: 832K

e257f29

optmize memory usage about kvcache

062867e

optimize KvstCachWashStr memory

46d0631

fix unittest bug

cbd7570

hnwyllmm force-pushed the task/2026051100116012737 branch from 082d9f5 to cbd7570 Compare June 2, 2026 05:54

hnwyllmm changed the title from "Optimize memory usage for KV Cache" to "Optimize memory usage for KV cache" Jun 2, 2026

hnwyllmm wants to merge 4 commits into master from task/2026051100116012737
