Reduce runtime allocation churn #348

Open
KKould wants to merge 3 commits into main from optimize-runtime-allocations

@KKould KKould commented Jun 5, 2026

What problem does this PR solve?

Reduce allocator churn in binder/planner/optimizer/storage hot paths observed in TPCC LMDB heaptrack runs.

Issue link:

What is changed and how it works?

This PR reduces short-lived allocation pressure by:

  • reusing column-pruning outcome buffers and required-column state
  • avoiding repeated metadata/container clones in DML execution and table scan planning
  • caching histogram bound comparators
  • reusing HEP optimizer local-rule state across batches
  • merging primary-key column inclusion into storage deserializer construction
  • avoiding unnecessary lowercase string allocations when identifiers are already lowercase
  • adding a tpcc-lmdb-heaptrack Makefile target for repeatable profiling
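As an illustration of the lowercase-identifier item above, here is a minimal sketch of the general technique, not the code from this PR. The function name `lowercase_ident` is hypothetical, and the sketch assumes ASCII-only case folding, which may differ from what the binder actually does.

```rust
use std::borrow::Cow;

/// Hypothetical helper: returns the identifier in lowercase.
/// When the input has no ASCII uppercase bytes, the input is borrowed
/// and no new String is allocated.
/// Assumption: only ASCII letters are case-folded.
fn lowercase_ident(ident: &str) -> Cow<'_, str> {
    if ident.bytes().any(|b| b.is_ascii_uppercase()) {
        // Slow path: at least one uppercase byte, so allocate once.
        Cow::Owned(ident.to_ascii_lowercase())
    } else {
        // Fast path: already lowercase, hand back the original slice.
        Cow::Borrowed(ident)
    }
}
```

Callers that only need a `&str` can dereference the `Cow` directly, so the fast path stays allocation-free end to end.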

heaptrack_print comparison, fresh main vs this branch with columns_len deserializer capacity:

| metric | main | this branch | diff |
| --- | --- | --- | --- |
| allocation calls | 918,551,655 | 801,203,429 | -117,348,226 (-12.78%) |
| temporary allocations | 185,423,184 | 135,002,184 | -50,421,000 (-27.19%) |
| peak heap | 578.98M | 579.19M | +217.53K (+0.036%) |
| runtime | 337.55s | 332.01s | -5.54s |

Notable stack changes from the same reports:

  • Transaction::create_deserializers -> RawVec::grow_one: 1,007,139 allocation calls on main; no longer appears in this branch's report after using table.columns_len() capacity.
  • HepOptimizer::apply_local_rules: -52,118,907 allocation calls, -46,689,484 temporary allocations.
  • BTreeMap::clone_subtree / TableCatalog::clone: -51,039,510 allocation calls.
  • TableScanOperator::build: 39,275,715 -> 22,915,483 allocation calls (-41.65%).
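The disappearance of the `RawVec::grow_one` stack is what one would expect when a vector is sized up front. The sketch below shows that pattern under stated assumptions: `Table`, `Deserializer`, and the body of `create_deserializers` are hypothetical stand-ins, and only the idea of passing `columns_len()` as the capacity comes from the description above.

```rust
/// Hypothetical stand-in for a per-column deserializer.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Deserializer {
    Int,
    Text,
}

/// Hypothetical stand-in for table metadata.
struct Table {
    column_types: Vec<Deserializer>,
}

impl Table {
    /// Number of columns in the table.
    fn columns_len(&self) -> usize {
        self.column_types.len()
    }
}

/// Builds one deserializer per column. Reserving `columns_len()` slots
/// up front means the pushes below never trigger a reallocation.
fn create_deserializers(table: &Table) -> Vec<Deserializer> {
    let mut out = Vec::with_capacity(table.columns_len());
    for ty in &table.column_types {
        out.push(*ty);
    }
    out
}
```

Without the capacity hint, a vector that starts empty grows geometrically as elements are pushed, and each growth step is a separate allocation call.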

Code changes

  • Has Rust code change
  • Has CI related scripts change

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Manual test / profiling:

cargo fmt --check
cargo test --lib storage::
heaptrack_print -f /tmp/tpcc_lmdb_heaptrack_main_fresh.zst > /tmp/tpcc_lmdb_heaptrack_main_fresh.report.txt
heaptrack_print -f /tmp/tpcc_lmdb_heaptrack_current_columns_len.zst > /tmp/tpcc_lmdb_heaptrack_current_columns_len.report.txt
heaptrack_print -f /tmp/tpcc_lmdb_heaptrack_current_columns_len.zst --diff /tmp/tpcc_lmdb_heaptrack_main_fresh.zst > /tmp/tpcc_lmdb_heaptrack_current_columns_len.diff_main.txt

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Note for reviewer

The optimization mainly reduces short-lived allocations. Peak heap is essentially unchanged, which matches the shape of the changes.
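To show why buffer reuse lowers allocation counts without moving peak heap, here is a minimal sketch of the pattern. `RuleScratch` and `collect_matches` are hypothetical names, and the "even value" match condition is a placeholder; this is not the HEP optimizer's actual state.

```rust
/// Hypothetical scratch state kept alive across optimizer batches.
struct RuleScratch {
    matched: Vec<usize>,
}

impl RuleScratch {
    fn new() -> Self {
        Self { matched: Vec::new() }
    }

    /// Collects the indices of nodes that match a placeholder condition
    /// (even values), reusing the same buffer on every call.
    fn collect_matches(&mut self, nodes: &[u32]) -> &[usize] {
        // `clear` drops the elements but keeps the allocated capacity,
        // so later calls usually do not allocate at all.
        self.matched.clear();
        self.matched.extend(
            nodes
                .iter()
                .enumerate()
                .filter(|(_, n)| **n % 2 == 0)
                .map(|(i, _)| i),
        );
        &self.matched
    }
}
```

The buffer is about as large as the temporary it replaces, so the high-water mark stays roughly the same while the number of allocate/free pairs drops.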

@KKould KKould self-assigned this Jun 5, 2026
@KKould KKould added the perf label Jun 5, 2026

KKould commented Jun 5, 2026

TPCC full-run comparison (--measure-time 720 --num-ware 1).

Note: with the default --max-retry 5, several runs stopped early because Payment hit a duplicate primary key on the history insert path, so the comparable full-run table below uses --max-retry 100 for both branches and both backends.

| Backend | Branch | TpmC | New-Order success / late / failure | Payment success / late / failure | real |
| --- | --- | --- | --- | --- | --- |
| LMDB | main | 63,143 | 757,715 / 0 / 7,818 | 757,691 / 0 / 85,349 | 729.00s |
| LMDB | this PR | 67,797 | 813,567 / 0 / 8,367 | 813,545 / 0 / 98,395 | 728.89s |
| RocksDB | main | 27,334 | 328,011 / 2 / 3,405 | 327,985 / 0 / 17,287 | 731.10s |
| RocksDB | this PR | 27,128 | 325,538 / 5 / 3,342 | 325,520 / 0 / 17,729 | 730.76s |

Delta:

  • LMDB: 63,143 -> 67,797, +4,654 TpmC, +7.37%
  • RocksDB: 27,334 -> 27,128, -206 TpmC, -0.75%

Observations with the default --max-retry 5:

  • PR LMDB: failed twice with Payment duplicate primary key, no TpmC.
  • main LMDB: failed with Payment duplicate primary key, no TpmC.
  • PR RocksDB: failed with Payment duplicate primary key at real 63.99s, no TpmC.
  • main RocksDB: completed with 29,423 TpmC at real 731.18s.
