Skip to content

refactor: drop redundant Arc<Bytecode> indirection#144

Merged
flyq merged 4 commits into
mainfrom
liquan/remove_redundant_Arc_Bytecode
Jun 15, 2026
Merged

refactor: drop redundant Arc<Bytecode> indirection#144
flyq merged 4 commits into
mainfrom
liquan/remove_redundant_Arc_Bytecode

Conversation

@flyq

@flyq flyq commented Jun 13, 2026

Copy link
Copy Markdown
Member

Summary

Bytecode (revm::state::Bytecode, from revm-bytecode 6.2.2) is already internally reference-counted, so wrapping it in Arc<Bytecode> was a redundant second layer of indirection. This replaces Arc<Bytecode> with plain Bytecode everywhere it was stored or exchanged (contract cache, ContractStore trait, RPC fetch, witness DB, trace data), dropping one heap allocation and one pointer hop per contract while keeping clones cheap.

Why

Bytecode is an enum whose every field is either Copy or Arc/Bytes-backed:

pub enum Bytecode {                        // derives Clone
    Eip7702(Eip7702Bytecode),              // { Address, u8, raw: Bytes }
    LegacyAnalyzed(LegacyAnalyzedBytecode) // { bytecode: Bytes, original_len, jump_table: JumpTable }
}
// Bytes     -> Arc-backed buffer (bytes 1.x)
// JumpTable -> Arc<BitVec<u8>>  ("immutable, cheap to clone")

So Bytecode::clone() is already an O(1) refcount bump that shares the same allocation — not a deep copy. The outer Arc only saved copying ~7 words of enum metadata, at the cost of an extra heap allocation per contract and an indirection on every access. It also gave zero benefit on the hottest path: revm::DatabaseRef requires an owned Bytecode, so WitnessDatabase already did (**arc).clone() at the read boundary, cloning the inner Bytecode regardless. The previous doc comments justified the Arc as avoiding a "deep-clone on every cache hit," but that premise was false.

What changed

Type substitution Arc<Bytecode> -> Bytecode (plus removal of the Arc::new(..) / (**arc).clone() / .as_ref() adapters and rewritten rationale comments), across: core contract in stateless-core (ContractLookup + ContractStore trait in db.rs, WitnessDatabase.contracts in evm_database.rs, validate_block in executor.rs); cache / persistence in stateless-db (ContractCache weighter/map/get/insert in cache.rs, redb read/write in helpers.rs); RPC / fixtures (RpcClient::get_codes* in rpc_client.rs, TestFixtures in fixtures.rs); and the binaries (validator_db.rs, server_db.rs, data_provider.rs, tracing_executor.rs, both chain_sync.rs). The cache test that asserted Arc::ptr_eq now compares the underlying buffer pointer, demonstrating the shared-allocation property directly. Net diff: 14 files, +100 / -122.

Performance

A wash-to-slight-improvement with no deep copies introduced anywhere. Cache hit goes from an Arc clone (1 atomic inc) to a Bytecode clone (2 atomic incs), offset by removing the Arc::new heap box and a pointer hop; EVM code_by_hash_ref / basic_ref do identical work minus the (**arc) deref; and BlockData clone (trace single-flight) still shares every underlying buffer.

Breaking change

This changes the public ContractStore trait (ContractLookup and add_contracts) from Arc<Bytecode> to Bytecode. The out-of-tree mega-reth FullNode implements ChainStore: ContractStore and needs the same one-line type change in its impl — a coordination point for the EL compat matrix.

Testing

  • cargo check --workspace --all-targets — clean
  • cargo clippy --workspace --all-targets --all-features — no warnings
  • cargo fmt --all --check — clean
  • cargo test --workspace — 209 passed, 0 failed

@codecov

codecov Bot commented Jun 13, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.5%. Comparing base (7c6ee1f) to head (84eefc8).

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@claude

claude Bot commented Jun 13, 2026

Copy link
Copy Markdown

LGTM. The reasoning is sound — Bytecode is internally reference-counted through Bytes (Arc-backed buffer) and JumpTable (Arc<BitVec>), so the outer Arc added a redundant heap allocation and pointer hop with no benefit. The type substitution is mechanical and consistent across all 14 files.

The replacement of Arc::ptr_eq with bytes_slice().as_ptr() in the cache test correctly captures what actually matters (no deep copy of the buffer), and is a stronger test since it survives any future refactoring of the wrapper type.

One coordination note already called out in the description: the ContractStore trait change needs a matching one-liner update in the mega-reth FullNode impl.

The Arc<Bytecode> removal rewrote the `.cloned()` in `code_by_hash_ref`,
but revm reads contract code inline via `basic_ref`'s `AccountInfo.code`,
so the by-hash path was never exercised and codecov flagged it as the one
uncovered line in the patch. Add a focused unit test covering the
empty-code short-circuit, present-hash hit, and absent-hash error.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@claude

claude Bot commented Jun 13, 2026

Copy link
Copy Markdown

LGTM. The premise is correct: Bytecode is already internally reference-counted (Bytes + Arc<BitVec>), so the outer Arc was redundant — it saved copying ~7 words of enum metadata but added a heap allocation and a pointer hop, and gave zero benefit on the hottest read path (code_by_hash_ref) which was doing (**arc).clone() anyway.

Changes look correct throughout: the pointer check in cache.rs (bytes_slice().as_ptr() instead of Arc::ptr_eq) is a valid substitute for LegacyAnalyzed bytecodes constructed with new_raw, and the new code_by_hash_ref_serves_contracts_map test covers a previously untested path. The bincode serialization change (encode_to_vec(bytecode) vs encode_to_vec(bytecode.as_ref())) is a no-op since bytecode is already &Bytecode after the type change.

The breaking change to ContractStore (mega-reth coordination) is already called out in the description — nothing to add there.

@flyq

flyq commented Jun 13, 2026

Copy link
Copy Markdown
Member Author

之前 review 这个 PR 时发现 bytecode code 内部已经优化 clone(还的是 Fable 5):https://github.com/megaeth-labs/mega-reth/pull/1867

@claude

claude Bot commented Jun 13, 2026

Copy link
Copy Markdown

Clean, well-justified refactor. The premise is correct: Bytecode's Bytes buffer and JumpTable are both Arc-backed, so the outer Arc was genuinely redundant — it saved copying ~7 words of enum metadata at the cost of an extra heap allocation and a pointer hop on every access, with zero benefit on the revm hot path (basic_ref/code_by_hash_ref) which needed an owned Bytecode anyway.

The change is mechanical across all 14 files, tests pass, and the new code_by_hash_ref_serves_contracts_map test covers the previously-uncovered by-hash path.

One nit on the new test's assertion:

// pointer equality implies the bytes match, so it also confirms the right code came back.
assert_eq!(got.bytes_slice().as_ptr(), code.bytes_slice().as_ptr());

The comment mixes two concerns: shared allocation (what the pointer check proves) and identity (which bytecode came back). Pointer equality is sufficient evidence for both here, but if someone inserts a second entry into the map with the same bytes at a different hash, the pointer check could theoretically pass while returning the wrong entry. Separating the assertions would make the intent clearer: one assert_eq!(got.bytes_slice(), code.bytes_slice()) for correctness and one for the allocation-sharing property. Minor though — not a blocker.

LGTM.

@claude

claude Bot commented Jun 14, 2026

Copy link
Copy Markdown

LGTM. The premise is correct — Bytecode's Bytes buffer and JumpTable are both Arc-backed, so the outer Arc was genuinely redundant indirection. The refactoring is mechanical and consistent, the updated comments are accurate, and the new code_by_hash_ref_serves_contracts_map test adds previously-missing coverage. The breaking change to ContractStore is correctly called out.

@vincent-k2026 vincent-k2026 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@flyq flyq merged commit e8ae193 into main Jun 15, 2026
27 of 28 checks passed
@flyq flyq deleted the liquan/remove_redundant_Arc_Bytecode branch June 15, 2026 06:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants