block DB interface #3445

Draft
wen-coding wants to merge 7 commits into main from wen/data_store_interface
Contributor

@wen-coding wen-coding commented May 15, 2026

Summary

Interface only: no implementation and no consumer rewiring yet. This is shared with the colleague implementing the backend so that the contract is pinned down before any code changes hands.

The data.Store interface captures what data.State needs from a durable backing store to replace the current DataWAL (separate file WALs for blocks and FullCommitQCs). It also adds a by-hash block index, which the WAL lacked and which the /block_by_hash RPC path needs.

Surface

type Store interface {
    // Writes — two-phase (write, then Flush)
    WriteBlock(ctx, n GlobalBlockNumber, *types.Block) error
    WriteQC(ctx, *types.FullCommitQC) error           // QC carries its GlobalRange
    PruneBefore(ctx, n GlobalBlockNumber) error
    Flush(ctx) error

    // Reads
    ReadAll(ctx) (*Loaded, error)                     // startup replay
    ReadBlockByNumber(ctx, n)    (utils.Option[*types.Block], error)
    ReadBlockByHash(ctx, hash)   (utils.Option[*types.Block], error)
    ReadQCByBlockNumber(ctx, n)  (utils.Option[*types.FullCommitQC], error)  // QC covering block n

    // Lifecycle
    Close(ctx) error
}

Contract highlights (full docs in the file)

  • Concurrency: all methods safe for concurrent use.
  • Durability is two-phase: WriteBlock / WriteQC return without guaranteeing the record is on disk; Flush blocks until everything previously written is durable. Reasoning: a synchronous fsync per Write at chain throughput consumes real disk bandwidth; at dozens of blocks/sec it crowds out useful I/O. The expected pattern is "drain queue → write batch → Flush once," not "Flush after every Write." Implementations should still write eagerly even without Flush; Flush means "wait until durable," not "start writing."
  • What Store does NOT persist: AppHashes (recovered from app.Info().LastBlockAppHash + re-execution; data.State.inner.appProposals is in-memory only), and per-tx execution results / logs / events (those live on the receipt store per the Giga Transaction Query proposal). GlobalBlock.FinalAppState is extracted from qc.Proposal().App() — derivable from the persisted QC, no separate AppHash record needed.
  • GlobalRange convention: half-open interval [GlobalRange.First(), GlobalRange.Next()). The QC covers GlobalBlockNumbers First, First+1, …, Next-1, and Next is the First of the next contiguous QC.
  • QC contiguity: the caller (data.State.runPersist) guarantees each QC's First equals the previous one's Next; implementations may enforce this but are not required to.
  • Idempotency: duplicate writes (same n + same hash for blocks, same GlobalRange.First for QCs) are silent no-ops.
  • n on WriteBlock: required because *types.Block does NOT carry its GlobalBlockNumber — block.Header().BlockNumber() is the per-lane number (different typedef). The lane→global mapping lives in the QC's GlobalRange.
  • No separate hash arg on WriteBlock: derivable via block.Header().Hash(); implementation indexes it automatically.
  • Reads return utils.Option: "not yet written" / "pruned" surface as utils.None. Reads are non-blocking; wait semantics live above the interface in data.State.
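
The two-phase durability pattern from the highlights above ("drain queue → write batch → Flush once") can be sketched roughly as follows. This is an illustration only, not code from the PR: `blockStore`, `memStore`, `persistBatch`, and `demo` are hypothetical names, `uint64` and `string` stand in for `GlobalBlockNumber` and `*types.Block`, and the fake models "durable" as nothing more than a second map.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// blockStore is a hypothetical, cut-down stand-in for the write half of data.Store.
type blockStore interface {
	WriteBlock(ctx context.Context, n uint64, block string) error
	Flush(ctx context.Context) error
}

// memStore is an illustrative in-memory fake: pending holds records that were
// written but not flushed, durable holds records a Flush has covered.
type memStore struct {
	mu      sync.Mutex
	pending map[uint64]string
	durable map[uint64]string
}

func newMemStore() *memStore {
	return &memStore{pending: map[uint64]string{}, durable: map[uint64]string{}}
}

// WriteBlock returns without any durability guarantee.
func (s *memStore) WriteBlock(_ context.Context, n uint64, block string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.durable[n]; ok {
		// Simplification: a repeated n is treated as a duplicate and ignored.
		// A real store would also compare block hashes.
		return nil
	}
	s.pending[n] = block
	return nil
}

// Flush returns only once everything previously written is "durable".
func (s *memStore) Flush(_ context.Context) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	for n, b := range s.pending {
		s.durable[n] = b
		delete(s.pending, n)
	}
	return nil
}

func (s *memStore) durableCount() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.durable)
}

// persistBatch writes every queued block, then waits for durability once.
func persistBatch(ctx context.Context, st blockStore, first uint64, blocks []string) error {
	for i, b := range blocks {
		if err := st.WriteBlock(ctx, first+uint64(i), b); err != nil {
			return err
		}
	}
	return st.Flush(ctx)
}

// demo reports the durable count after an unflushed write and after a flushed batch.
func demo() (beforeFlush, afterFlush int) {
	ctx := context.Background()
	st := newMemStore()
	_ = st.WriteBlock(ctx, 10, "block-10")
	beforeFlush = st.durableCount()
	_ = persistBatch(ctx, st, 10, []string{"block-10", "block-11", "block-12"})
	afterFlush = st.durableCount()
	return beforeFlush, afterFlush
}

func main() {
	before, after := demo()
	fmt.Println(before, after) // prints "0 3"
}
```

In this sketch the caller would advance its persistence cursor only after `persistBatch` returns, since that is the first point at which the whole batch is known to be durable.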

What's not in this PR

  • No implementation. The in-memory and persistent implementations are both follow-ups.
  • No consumer rewiring. data.State.NewState / runPersist / runPruning still talk to DataWAL; that swap happens once an implementation exists.

Test plan

  • go build ./sei-tendermint/internal/autobahn/data/... — clean
  • gofmt -s -l — clean
  • go vet — clean
  • No tests intentionally — interface only, nothing to exercise yet.

🤖 Generated with Claude Code

…N-272)

Interface only — no implementation, no consumer rewiring. Captures the
contract we want from a database that will replace the file-WAL-based
DataWAL.Blocks + DataWAL.CommitQCs and add a by-hash block index.

Surface:
  - WriteBlock(n, *types.Block)
  - WriteQC(*types.FullCommitQC)            // qc carries its GlobalRange
  - PruneBefore(n)
  - Flush
  - ReadAll() → Loaded{Blocks, QCs}
  - ReadBlockByNumber(n)
  - ReadBlockByHash(hash)
  - ReadQCByBlockNumber(n)                  // QC covering block n
  - Close

Godocs spell out:
  - Concurrency: all methods safe for concurrent use
  - Crash safety: each Write is atomic; cross-write atomicity is the
    caller's problem (DataWAL.reconcile-style)
  - Read-your-writes within a session
  - Contiguity guarantee for QC writes (caller-guaranteed, not
    implementation-enforced)
  - Why n is required on WriteBlock (Block does not carry
    GlobalBlockNumber; only per-lane BlockNumber)
  - Why no separate hash arg on WriteBlock (derivable via
    block.Header().Hash())
  - Read methods are non-blocking; "not yet written" reports as
    (nil, false, nil) — wait semantics live above the interface

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions Bot commented May 15, 2026

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ passed | ✅ passed | ✅ passed | ✅ passed | May 19, 2026, 6:30 PM |


codecov Bot commented May 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 59.29%. Comparing base (823a78d) to head (f295435).
⚠️ Report is 17 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #3445   +/-   ##
=======================================
  Coverage   59.29%   59.29%           
=======================================
  Files        2125     2125           
  Lines      175629   175629           
=======================================
  Hits       104144   104144           
  Misses      62404    62404           
  Partials     9081     9081           
| Flag | Coverage | Δ |
| --- | --- | --- |
| sei-chain-pr | 75.54% <ø> | (?) |
| sei-db | 70.41% <ø> | (ø) |


wen-coding and others added 2 commits May 15, 2026 14:29
…N-272)

Removing Flush from the Store interface. The existing FullCommitQCPersister
and GlobalBlockPersister don't have Flush either; data.State.runPersist
already relies on PersistQC/PersistBlock returning only after the write
is on disk in order to advance nextBlockToPersist (which gates
PushAppHash → AppVote durability).

Codify that directly: WriteBlock and WriteQC return only after the
record is durable. Implementations that want to batch fsyncs internally
can do so, but the individual Write call still blocks until the batch
covering it has been committed.

Smaller interface, no semantic change vs. the WAL it replaces. If a
future implementation needs an async fast-path with a separate Flush,
we can add it then with a real use case to design against.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
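
The synchronous contract described in this commit message (each Write blocks until the batch covering it is committed, while the implementation batches fsyncs internally) resembles a group-commit loop. The sketch below is one hypothetical way to get that behavior, not the PR's implementation: `groupCommitStore`, `writeReq`, and `demo` are invented names, the "fsync" is simulated by a counter, and shutdown handling is omitted.

```go
package main

import (
	"fmt"
	"sync"
)

// writeReq is one pending record plus the channel its writer waits on.
type writeReq struct {
	record string
	done   chan error
}

// groupCommitStore is a hypothetical sketch of synchronous writes with
// internally batched (simulated) fsyncs.
type groupCommitStore struct {
	reqs chan writeReq

	mu      sync.Mutex
	durable []string // records covered by a simulated fsync
	syncs   int      // number of simulated fsyncs issued
}

func newGroupCommitStore() *groupCommitStore {
	s := &groupCommitStore{reqs: make(chan writeReq, 128)}
	go s.commitLoop()
	return s
}

// commitLoop gathers whatever is queued into one batch, "fsyncs" once,
// then releases every writer in that batch.
func (s *groupCommitStore) commitLoop() {
	for first := range s.reqs {
		batch := []writeReq{first}
	gather:
		for {
			select {
			case r := <-s.reqs:
				batch = append(batch, r)
			default:
				break gather
			}
		}
		s.mu.Lock()
		for _, r := range batch {
			s.durable = append(s.durable, r.record)
		}
		s.syncs++ // one simulated fsync covers the whole batch
		s.mu.Unlock()
		for _, r := range batch {
			r.done <- nil
		}
	}
}

// Write blocks until the batch containing the record has been committed.
func (s *groupCommitStore) Write(record string) error {
	req := writeReq{record: record, done: make(chan error, 1)}
	s.reqs <- req
	return <-req.done
}

// demo issues concurrent writes and reports how many records became durable
// and how many simulated fsyncs that took.
func demo(writers int) (durable, syncs int) {
	s := newGroupCommitStore()
	var wg sync.WaitGroup
	for i := 0; i < writers; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			_ = s.Write(fmt.Sprintf("record-%d", i))
		}(i)
	}
	wg.Wait()
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.durable), s.syncs
}

func main() {
	durable, syncs := demo(50)
	fmt.Printf("durable=%d syncs=%d\n", durable, syncs)
}
```

Every `Write` still returns only after its record is covered by a commit, but the number of simulated fsyncs can be smaller than the number of writes when writers arrive concurrently.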
Drop the "No separate Flush method is exposed; if an implementation
wants to..." sentence from the type comment. The synchronous-durability
contract above already covers the substance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
wen-coding changed the title from "data: Store interface for persistent block + FullCommitQC backing (CON-272)" to "block DB interface" on May 15, 2026
wen-coding and others added 3 commits May 15, 2026 14:34
Reframe the "Ordering" section as "Ordering and the GlobalRange
convention". Calls out that GlobalRange is [First(), Next()) — First
inclusive, Next exclusive — so the QC covers First, First+1, ...,
Next-1, and Next is the First of the next contiguous QC.

The convention was implicit before (used in ReadQCByBlockNumber's
"GlobalRange().First ≤ n < GlobalRange().Next" and Loaded.QCs's
"[First, Next)" but never stated outright). Implementers shouldn't have
to reverse-engineer it from those scattered references.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
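
The half-open [First, Next) convention and the contiguity rule described in this commit message can be illustrated with a small sketch. The `globalRange` struct, its plain `uint64` fields, and the helper functions are hypothetical stand-ins; the real GlobalRange exposes First() and Next() as methods and its concrete types are not shown in this PR.

```go
package main

import "fmt"

// globalRange is a hypothetical stand-in for a QC's GlobalRange: [First, Next).
type globalRange struct {
	First uint64 // inclusive
	Next  uint64 // exclusive
}

// covers reports whether block n falls inside the half-open interval,
// i.e. First <= n < Next.
func (r globalRange) covers(n uint64) bool {
	return r.First <= n && n < r.Next
}

// contiguous reports whether each range starts exactly where the previous one ended.
func contiguous(ranges []globalRange) bool {
	for i := 1; i < len(ranges); i++ {
		if ranges[i].First != ranges[i-1].Next {
			return false
		}
	}
	return true
}

// findCovering returns the index of the range covering block n, or -1 if none does.
func findCovering(ranges []globalRange, n uint64) int {
	for i, r := range ranges {
		if r.covers(n) {
			return i
		}
	}
	return -1
}

func main() {
	qcs := []globalRange{{First: 0, Next: 3}, {First: 3, Next: 7}}
	fmt.Println(contiguous(qcs))                                                  // prints "true"
	fmt.Println(findCovering(qcs, 2), findCovering(qcs, 3), findCovering(qcs, 7)) // prints "0 1 -1"
}
```

Block 3 belongs to the second range and block 7 to neither, which is the boundary behavior a ReadQCByBlockNumber-style lookup would have to reproduce.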
Adds a "What this does NOT store" section to the Store type doc. Heads
off the natural question — "don't we need to persist execution
results?" — by walking through why AppHash recovery works without
storing it:

  - App.Info().LastBlockAppHash on restart gives us the AppHash for
    the last committed height (lives in the app's CMS, not in
    data.State or DataWAL)
  - Heights above that are re-executed from replayed blocks +
    re-derived AppHashes
  - GlobalBlock.FinalAppState comes from qc.Proposal().App(),
    extracted from the persisted QC — no separate record needed

Per-tx execution results / logs / events live on the receipt store
(canonical txHash → execution result, per the Giga Transaction Query
proposal). Store stays scoped to blocks + QCs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… error)

Per reviewer preference for utils.Option over Go's (value, bool)
tuple pattern. The autobahn data package can freely import
sei-tendermint/libs/utils, so no layering concern (unlike sei-db,
where Transaction.Result() stayed on (bytes, bool) because sei-db
can't see the Option type).

Changed:
  - ReadBlockByNumber(ctx, n) → (utils.Option[*types.Block], error)
  - ReadBlockByHash(ctx, hash) → (utils.Option[*types.Block], error)
  - ReadQCByBlockNumber(ctx, n) → (utils.Option[*types.FullCommitQC], error)

Updated doc comments to refer to utils.None where they previously
said (nil, false, nil).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
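
The Option-returning read shape described in this commit message might look roughly like the sketch below from a caller's point of view. The `option` type here is a hypothetical local stand-in, since the actual API of utils.Option is not shown in this PR and may differ; `blockIndex` and `lookup` are likewise invented for illustration.

```go
package main

import "fmt"

// option is a hypothetical stand-in for utils.Option; the real type's API may differ.
type option[T any] struct {
	value   T
	present bool
}

func some[T any](v T) option[T] { return option[T]{value: v, present: true} }
func none[T any]() option[T]    { return option[T]{} }

// get unpacks the option into the familiar (value, ok) pair.
func (o option[T]) get() (T, bool) { return o.value, o.present }

// blockIndex is an illustrative in-memory by-number index.
type blockIndex map[uint64]string

// readBlockByNumber never blocks: a block that is not yet written or already
// pruned comes back as none with a nil error.
func (idx blockIndex) readBlockByNumber(n uint64) (option[string], error) {
	if b, ok := idx[n]; ok {
		return some(b), nil
	}
	return none[string](), nil
}

// lookup shows how a caller separates "error" from "absent" from "present".
func lookup(idx blockIndex, n uint64) string {
	opt, err := idx.readBlockByNumber(n)
	if err != nil {
		return "error: " + err.Error()
	}
	if b, ok := opt.get(); ok {
		return b
	}
	return "not available"
}

func main() {
	idx := blockIndex{5: "block-5"}
	fmt.Println(lookup(idx, 5)) // prints "block-5"
	fmt.Println(lookup(idx, 6)) // prints "not available"
}
```

Keeping absence out of the error channel means a caller that wants to wait for a block can poll or subscribe at a higher layer without treating "not yet written" as a failure.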
Contributor

pompon0 commented May 18, 2026

LGTM, couple of remarks:

  • we will not be able to afford ReadAll in prod (perhaps it will still be useful in tests). Instead we will need a method to fetch the available range of blocks/commitqc [first,next).
  • imo (weak preference) the type-aware layer should also be responsible for maintaining consistency between qcs and blocks.
  • I think we shouldn't focus on optimizing db layout rn. Afaik there is a huge chance that from the evm pov we will be aggregating multiple lane blocks into single execution block (sth like, 1 evm block per 1 commit qc) in which case the evm queries will need to respond with the whole evm block. Still, this large evm block (as of today a single commit qc can sequence even 400 lane blocks) is not a reasonable unit of network traffic (400 lane blocks x 2000 txs/block x 1kB/tx ~ 0.8GB), so block syncing will operate on lane blocks (currently equivalent to global blocks) anyway.

…ON-272)

Reviewer note (LittDB perspective): synchronous-fsync-per-Write is real
disk bandwidth. At dozens of blocks/sec the per-record fsync starts
crowding out useful I/O, regardless of which DB sits underneath. Pattern
that works better is two-phase:

  - WriteBlock / WriteQC return without a durability guarantee
  - Caller batches what it wants made durable together
  - Flush once at the batch boundary

Implementation is still free to start writing as records arrive — so
this batches better than the alternative of buffering until a batch is
"closed."

Reworked the Store doc to spell out:
  - The two-phase write/flush contract
  - The expected runPersist pattern (drain queue, write, flush, then
    advance nextBlockToPersist → gates AppVote)
  - Cross-write atomicity is still the caller's problem (Flush gives
    "everything before this is durable," not "everything before this
    is atomic")
  - Read-your-writes within a session is independent of Flush
  - Implementations should still write eagerly without a Flush —
    Flush is "wait until durable," not "tell the impl to start
    writing"

WriteBlock and WriteQC docs now point at the two-phase contract and
explain the runPersist → nextBlockToPersist → AppVote durability
chain. Flush itself has a doc that walks through the same pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>