Skip to content

refactor(hot): delaminate sharding from HistoryRead/HistoryWrite#68

Open
prestwich wants to merge 21 commits into
mainfrom
prestwich/eng-2287-history-delamination
Open

refactor(hot): delaminate sharding from HistoryRead/HistoryWrite#68
prestwich wants to merge 21 commits into
mainfrom
prestwich/eng-2287-history-delamination

Conversation

@prestwich
Copy link
Copy Markdown
Member

Closes ENG-2287. Subsumes #66 (size-based shard splitter) — the splitter is included here as a backend-private detail of the MDBX impl.

Summary

Reshapes HistoryRead / HistoryWrite in signet-hot so that sharding is a backend-internal concern. The trait surface now exposes only logical history operations; ShardedKey<U256> no longer appears in any public method signature or return type.

Trait layout:

  • crate::db::history::HistoryRead — blanket-impled on HotKvRead, default-impl-only. Backends cannot override (orphan rules + blanket impl structurally enforce this).
  • crate::db::history::HistoryWrite — required per-backend, four methods: append_account_history, append_storage_history, truncate_account_history_above, truncate_storage_history_above. Plus default impls for bulk operations (update_history_indices, append_blocks, unwind_above, load_genesis, etc.).

Backend impls:

  • MemKv writes a single dup entry per (addr) at u64::MAX (or per (addr, slot) at ShardedKey(slot, u64::MAX)). No splitter.
  • signet-hot-mdbx uses the new signet_storage_types::merge_and_split with MAX_SHARD_BYTES = 1500 (constant private to the crate). Load-merge-rewrite for appends; backwards-walk delete-and-re-key-tail for truncates. The u64::MAX-tail subkey convention is preserved structurally.

Splitter (signet_storage_types::merge_and_split):

  • Pure function: takes ownership of (existing, additions, max_bytes), returns (first, second) with second = Some(tail) iff a split occurred.
  • Zero allocations on the no-split fast path beyond roaring container growth; exactly one IntegerList allocation on overflow.
  • Worst-case roaring sizes pinned with property tests in signet-storage-types (dense pack of 650 contiguous values, sparse 100 distinct 16-bit containers, and round-trip through merge_and_split at the 1500 B budget).

Deleted:

  • LegacyHistoryRead, LegacyUnsafeHistoryWrite, LegacyConsistentHistoryWrite.
  • append_to_sharded_history free function in crates/hot/src/db/inconsistent.rs.
  • ShardedKey::SHARD_COUNT constant (count-based splitting is obsolete).
  • crates/hot/src/db/consistent.rs (contents moved as default impls onto HistoryWrite).

Design + plan docs: docs/superpowers/specs/2026-05-22-history-trait-delamination-design.md and docs/superpowers/plans/2026-05-22-history-trait-delamination.md (both gitignored; available locally on the branch).

Why now

PR #66 surfaced two layering concerns that this refactor resolves:

  1. The size-based splitter in append_to_sharded_history referenced MAX_SHARD_BYTES (an MDBX-specific cap) from the abstract crate. The constant now lives in signet-hot-mdbx and the splitter is a pure helper in signet-storage-types.
  2. ShardedKey<U256> appeared in public return types (last_storage_history), forcing every backend — even ones with no DUPSORT value cap — to mimic MDBX's sharded layout. The new logical interface lets each backend choose its strategy.

Test plan

  • cargo +nightly fmt -- --check clean
  • cargo clippy --workspace --all-targets --all-features -- -D warnings clean
  • cargo clippy --workspace --all-targets --no-default-features -- -D warnings clean
  • RUSTDOCFLAGS=\"-D warnings\" cargo doc --workspace --no-deps --all-features clean
  • cargo t --workspace — 196 tests pass across all crates
  • signet-hot conformance suite runs against both MemKv and MDBX backends (6 history conformance entrypoints now wired into conformance())
  • signet-hot-mdbx structural test (history_sharding.rs) confirms MDBX splits oversized input into ≥2 dups and preserves the u64::MAX-tail subkey
  • signet-hot integration tests (load_genesis for local, mainnet, parmigiana, pecorino genesis artifacts) pass
  • signet-storage::unified end-to-end tests (append + drain_above + unwind) pass
  • signet-hot-mdbx unwind conformance: byte-level comparison after unwind against a fresh-build replica
  • Worst-case roaring size tests in signet-storage-types pin encoded bytes vs the 1500 B budget

Commit list (21 commits)

5b24c64 chore(hot): final cleanup from delamination review
2599ed9 test(hot-mdbx): structural assertion that MDBX splits oversized history
14d67de chore(storage-types): drop obsolete ShardedKey::SHARD_COUNT
28aedf8 refactor(hot): delete legacy history surface
02e96a0 test(hot): switch load_genesis integration test to logical history reads
5ba3949 test(hot): rewrite range conformance tests in logical terms
04739c7 test(hot): rewrite history conformance tests in logical terms
c6af236 refactor(hot): move non-shard-aware history helpers onto new HistoryWrite
d14f793 fix(storage): un-gate read-only UnifiedStorage methods from HistoryWrite
c186c8f refactor(hot): move consistent history ops onto new HistoryWrite
6558eec refactor(hot): flip revm test setup to logical HistoryWrite
bac8871 feat(hot-mdbx): implement HistoryWrite using merge_and_split
900a567 feat(hot): implement HistoryWrite for MemKv
aaa27f8 fixup(hot): panic on impossible-state in blocks_changed_* default impls
acc37bd feat(hot): introduce logical HistoryRead / HistoryWrite trait module
0a0bd2d refactor(hot): rename legacy history traits in preparation for delamination
0b54bf4 test(storage-types): pin BlockNumberList worst-case sizes against MDBX budget
e77c2e6 fixup(storage-types): restore debug_assert on merge_and_split precondition
a27e25a feat(storage-types): add merge_and_split for size-bounded history shards
f40f7a4 feat(storage-types): add BlockNumberList::pop_max

Note: commits are unsigned in this draft; they will be re-signed via git rebase --exec 'git commit --amend --no-edit -S' before merge.

🤖 Generated with Claude Code

prestwich and others added 20 commits May 22, 2026 12:11
Used by the upcoming merge_and_split helper to un-push the overflow block
when splitting a shard.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pure function: takes ownership of (existing, additions), merges them, and
splits off a tail iff the merged list exceeds max_bytes after roaring
encoding. Single linear pass; allocates the tail shard only when a split
is required.

This is the backend-agnostic primitive that signet-hot-mdbx's HistoryWrite
impl will use to maintain its size-bounded DUPSORT shards.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ition

The spec mandates a debug-only check that additions fit within max_bytes.
Test 4 (merge_and_split_split_preserves_strict_ordering) was updated to use
sparse input data (one element per 16-bit roaring container) so encoded sizes
grow proportionally to count, keeping additions within budget while still
provoking a split.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…X budget

Two property-style tests assert roaring's encoded size stays under 1500 B
for both the dense-pack (650 contiguous) and sparse-distinct-container
(100 blocks across 100 16-bit chunks) worst cases, plus a round-trip
test through merge_and_split at the realistic budget. These pin roaring's
encoding behavior so a future crate update can't silently break MDBX's
DUPSORT value cap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nation

HistoryRead -> LegacyHistoryRead
UnsafeHistoryWrite -> LegacyUnsafeHistoryWrite
HistoryWrite -> LegacyConsistentHistoryWrite

Pure mechanical rename. The unprefixed names are reserved for the new
logical traits introduced in the next commit, which will live in
crate::db::history and expose only logical (no shard-leaking) operations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New module crate::db::history defines:
- HistoryRead: blanket-impled on HotKvRead, default-impl-only.
- HistoryWrite: required per-backend, with default impls for bulk ops.

No callers yet; this is the trait scaffolding. Per-backend HistoryWrite
impls land in subsequent commits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace .ok() with .expect("history blocks strictly increasing") on the
two BlockNumberList::append calls inside blocks_changed_account and
blocks_changed_storage. The error case is only reachable on DB
corruption; silently suppressing it would convert corruption into
data loss in the read path. Loud panic is correct.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Single shard per (addr) at subkey u64::MAX, or per (addr, slot) at
ShardedKey(slot, u64::MAX). MemKv has no per-value size budget, so no
splitting is needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MDBX backend's HistoryWrite uses signet_storage_types::merge_and_split with
MAX_SHARD_BYTES = 1500, derived from MDBX's DUPSORT value cap minus key2
and per-node overhead. Load-merge-rewrite for appends; backwards-walk
delete-and-re-key-tail for truncates.

The u64::MAX-tail subkey convention is preserved structurally: appends
always write the tail at u64::MAX; truncates re-key the surviving prefix
to u64::MAX if the original tail was deleted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
setup_history_kv now uses append_account_history / append_storage_history
instead of the shard-aware write_*_history methods. The at-height tests
exercise the new HistoryRead default-impl path through get_account_at_height
/ get_storage_at_height.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
unwind_above's two shard-surgery blocks become single calls to
truncate_account_history_above / truncate_storage_history_above.
load_genesis uses append_account_history / append_storage_history.
validate_chain_extension and append_blocks move to HistoryWrite as
default impls.

The LegacyConsistentHistoryWrite trait and crates/hot/src/db/consistent.rs
are deleted; ADDRESS_MAX moves alongside the new default impls.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous commit accidentally moved hot(), cold(), reader(),
revm_reader(), etc. into the impl block bounded by H::RwTx: HistoryWrite.
These methods don't open a write transaction; the bound is unnecessary
and tightens the public API for no reason. Split into two impl blocks
matching the pre-refactor structure.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rite

Bulk helpers (write_plain_revert_sorted, write_state_changes,
append_block_inconsistent, etc.) move as default impls onto
crate::db::history::HistoryWrite. The legacy shard-aware methods
(write_*_history, last_*_history, append_*_history_index,
update_history_indices_inconsistent) stay on LegacyUnsafeHistoryWrite
for now; they are deleted in a subsequent commit.

Removes the temporary LegacyUnsafeHistoryWrite supertrait on HistoryWrite
introduced in the previous commit — the moved helpers now provide
everything the consistent-state default impls need.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
All shard-shaped assertions (last_account_history returning specific shard
keys, write_account_history with explicit subkeys) replaced with logical
equivalents using HistoryRead::blocks_changed_account /
HistoryWrite::append_account_history / truncate_account_history_above.
Shard-structure assertions migrate to an MDBX-only structural test in
a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Multi-shard scenarios that previously paired write_account_history calls
with explicit subkeys collapse into single append_account_history calls
with unioned block lists. The multi-shard layout is now an MDBX-internal
detail; conformance tests assert the logical union, not the layout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LegacyHistoryRead, LegacyUnsafeHistoryWrite, and the
append_to_sharded_history free fn are all replaced by the logical
HistoryRead / HistoryWrite traits in crate::db::history. Non-shard-aware
header/range/check methods migrate as default impls onto the new
HistoryRead.

No more ShardedKey<_> in public return types. No more shard subkeys in
method signatures. The splitter logic now lives in
signet_storage_types::merge_and_split and is called only from the MDBX
backend.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Count-based shard splitting was replaced by merge_and_split's size-based
logic. The constant has no remaining users.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
200 sparse blocks (200 distinct roaring containers, written as two batches
of 100 to satisfy the per-additions shard precondition) feed the MDBX
HistoryWrite impl. Tests assert that at least 2 dup entries exist for the
addr in AccountsHistory (and analogously for StorageHistory), and that the
tail shard's subkey is u64::MAX. This is the only place ShardedKey<_>
appears outside the abstraction crate by design.

Also adds [[test]] required-features = ["test-utils"] to Cargo.toml so the
integration test is skipped under --no-default-features.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Wire the six new history conformance entrypoints into conformance()
  so they run against both MemKv and MDBX. Closes coverage gap on
  multi-batch appends + standalone truncates. The two update_history_indices
  tests use non-overlapping block windows (1-5 vs 1001-1005) to avoid
  re-indexing shared change-set entries on the shared conformance store.
- Add empirical-size comment to worst_case_dense_pack_fits_in_dupsort_budget
  explaining the 650 vs 750 adjustment.
- Drop a "shard key" terminology leak from a comment in db/history.rs.
- Update setup_history_kv docstring in revm.rs to describe logical
  state rather than shard layout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread crates/hot/src/db/history.rs Outdated
@@ -0,0 +1,1002 @@
//! Logical history reads and writes.
//!
//! These traits replace the shard-leaking surface in `db::read` and
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rustdoc should reference present behavior only, not past behavior

Comment thread crates/types/src/int_list.rs Outdated
/// `IntegerList::push` already does for roaring container growth. One
/// `IntegerList` allocation when a split occurs.
pub fn merge_and_split(
mut existing: IntegerList,
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be a function on IntegerList.
and the additions should not be materialized

let cold = ColdStorage::new(cold_backend, cancel_token);
Self::new(hot, cold)
}
}
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why was this file changed?

Comment thread crates/hot/src/db/history.rs Outdated

/// Logical reads against history + changeset tables.
///
/// Default-impl-only. Backends cannot override — the blanket impl below
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this section of rustdoc. item-level rustdoc describes ONLY the behavior and how the downstream user interacts with it. archietecture goes in module or lib level

- merge_and_split moves from a free fn to IntegerList::merge_and_split,
  taking an iterator for additions (no materialization). Drops the .clone()
  at the MDBX call site.
- HotKv::RwTx now bounds HistoryWrite, so UnifiedStorage and the
  conformance generics no longer need explicit `where H::RwTx: HistoryWrite`
  clauses. unified.rs reverts to a single impl block.
- Module-level rustdoc in db/history.rs drops the historical "replaces"
  framing; trait-level rustdoc on HistoryRead drops the architecture
  paragraph (override prevention is a property of the blanket impl, not
  user-visible behavior).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant