Skip to content

docs(full-history): split getTransaction design into its own doc; reorganize design-docs#787

Open
tamirms wants to merge 17 commits into
feature/full-historyfrom
streaming-design
Open

docs(full-history): split getTransaction design into its own doc; reorganize design-docs#787
tamirms wants to merge 17 commits into
feature/full-historyfrom
streaming-design

Conversation

@tamirms

@tamirms tamirms commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

What this delivers

Net of this branch against feature/full-history: three new design docs under
design-docs/, plus removal of the stale numbered-doc README — written in a
plain, concept-first voice that leans on the pseudocode rather than
re-narrating it in prose.

  • full-history-streaming-workflow.md — the streaming daemon design:
    geometry, the catalog and the one write protocol, backfill
    (postcondition-driven planning → primitives → execution model), hot-DB
    ingestion, the lifecycle run (freeze → rebuild → discard → prune), the
    reader contract, and the correctness invariants (INV-1…4) with their audits.
  • gettransaction-full-history-design.md — the transaction-by-hash
    subsystem end to end: the hot txhash CF, the .bin/.idx streamhash
    formats, the rolling window-index rebuild, the getTransaction query path,
    and capacity numbers. Scoped to storage / rebuild / query — the crash-safety
    machinery lives in the streaming doc. The streaming doc references it for
    tx-index depth, mirroring how the events subsystem already has its own doc.
  • full-history-design-explorer.html — a self-contained interactive
    companion to both docs (no external dependencies; open in a browser), with a
    widget per concept: geometry, the artifact lifecycles and one write protocol,
    a crash simulator, the derived last-committed-ledger, the rolling index, a
    chunk boundary end to end, the resolver and executePlan, the startup
    backfill loop, the reader contract, and the invariants.

The full-history design docs are consolidated into the top-level design-docs/
alongside the existing events and packfile docs, with the numeric prefixes
dropped. full-history/design-docs/03-backfill-workflow.md is unchanged vs.
this base (retained as the historical standalone-backfill design; the streaming
doc records that it is subsumed).

Docs only — no code changes.

Notes for review

  • Easiest to review by reading the rendered markdown / opening the HTML
    explorer rather than the raw diff.

🤖 Generated with Claude Code

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7640b9a6b1

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".


`txhash/index/{window:08d}/{lo:08d}-{hi:08d}.idx`, meta-store key `index:{window:08d}:{lo:08d}:{hi:08d}`. One streamhash minimal-perfect-hash file per **coverage**, built by streamhash's `SortedBuilder` over the k-way merge of `.bin[lo..hi]`, with the cold-txhash option set:

- **Payload: 3 bytes** — the ledger seq stored as an offset from `MinLedger`, where `MinLedger = chunkFirstLedger(lo)` is derived from the build range and embedded in the file as user metadata. 3 bytes spans 16.7M ledgers, comfortably over a 10M-ledger window. No sidecar metadata.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Enforce the 24-bit payload limit

The surrounding docs make chunks_per_txhash_index configurable, but this .idx format stores only a 24-bit ledger offset. If an operator pins chunks_per_txhash_index >= 1678 (more than 16.77M ledgers), offsets from MinLedger no longer fit in 3 bytes, so the generated index cannot represent the tail of the window and cold getTransaction lookups will wrap or fail. Please either cap validation at 1677 chunks or make the payload width depend on the configured window size.

Useful? React with 👍 / 👎.

// (produced locally via catchupSource's hot branch) — the bulk
// backend is never asked for them.
rangeEnd := lastCompleteChunkAt(anchor)
watermarkMidChunk := lastCommitted != chunkLastLedger(chunkID(lastCommitted))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Guard the genesis watermark before chunkID

On an empty/genesis start, deriveWatermark is documented to return earliest_ledger - 1 (ledger 1 when the floor is genesis), which is outside the valid domain for chunkID(seq) = (seq - 2) / 10000. For a network younger than one chunk, withinOneChunkOfTip is true and the later rangeEnd = chunkID(lastCommitted) - 1 turns the intended empty range into a huge wrapped chunk id, defeating the rangeEnd < rangeStart guard and scheduling bogus backfill work. Guard this sentinel before calling chunkID or use a signed/no-chunk representation.

Useful? React with 👍 / 👎.

Add the full-history RPC design docs under design-docs/:

- full-history-streaming-workflow.md — the streaming daemon: geometry, the
  meta store and the one write protocol, catch-up, hot-DB ingestion, the
  lifecycle tick (freeze -> rebuild -> discard -> prune), the reader
  retention contract, and the correctness invariants (INV-1..4) with audits.
- gettransaction-full-history-design.md — the transaction-by-hash subsystem:
  the hot txhash CF, the .bin/.idx streamhash formats, the rolling
  window-index build protocol (pseudocode, crash matrix, rewriting-safety
  argument), the getTransaction read path, and capacity numbers.
- full-history-design-explorer.html — a self-contained interactive companion
  to both docs.

Docs live in the top-level design-docs/ alongside the events and packfile
designs; the standalone backfill workflow doc is retained as historical and
is subsumed by the streaming doc. Docs only — no code changes.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@tamirms tamirms force-pushed the streaming-design branch from 7640b9a to 6ce1159 Compare June 15, 2026 17:26

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6ce11597d6

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

INV-1 holds at every point the daemon is serving reads — transient states are never externally visible, because readers resolve `"frozen"` keys exclusively and the retention check masks everything else. INV-2, INV-3, and INV-4 hold at every quiescence reached after the events below; startup's first quiescence arrives when the first tick completes, shortly after reads open.

1. **Steady-state operation.** Hot DB ingestion advances `last_committed_ledger`; the lifecycle goroutine freezes complete chunks within retention and prunes anything past it. All four invariants hold by induction on `last_committed_ledger`.
2. **Operator state changes** — retention widening or shortening (`retention_chunks`), `earliest_ledger` raised. Both reduce to "`effectiveRetentionFloor` recomputes; the next startup converges to the new state." Catch-up's per-window resolver rule re-derives and rebuilds any window whose desired coverage now exceeds its stored coverage; the prune stage removes anything below a raised floor. The "next startup" is load-bearing for widening, enforced by the floor's two-role split: a lowered floor takes effect immediately in its *retention* role (pruning simply stops sooner), but the tick's *production* range still starts at existing storage — only the next catch-up, behind `validateBackendCovers`, materializes the new bottom.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Remove unsupported earliest_ledger raise path

This scenario says raising earliest_ledger converges on the next startup, but the startup validation above stores config:earliest_ledger on first start and fatals on any later non-now mismatch, while the configuration section also says changing it currently requires wiping the data directory or a future admin command. An operator following this documented state-change path will hit a startup abort rather than convergence, so either drop earliest_ledger from the supported operator changes here or describe the future admin workflow explicitly.

Useful? React with 👍 / 👎.

…nesis-corner fixes

Follow-up to the design-doc PR addressing the automated review.

gettransaction (#1): size the .idx ledger-seq payload to the window —
payloadWidth = ceil(log2(chunks_per_txhash_index * 10_000) / 8) — instead of
a fixed 3 bytes, so the format never caps chunks_per_txhash_index (3 bytes up
to 1677 chunks, 4 bytes beyond). Capacity references qualified as
default-window figures.

streaming (#2): fix the young-genesis corner case at the root. Add the
chunk-id convention to Geometry — signed ids, chunkID floor-divides so the
earliest_ledger-1 watermark sentinel maps to chunk -1 with
chunkLastLedger(-1) = 1, and the -1 sentinel is never serialized. With it,
deriveCompleteThrough's cold term (highestDurableChunk returns -1 on a fresh
start) and positional term (maxChunk-1 = -1 at chunk 0) degrade to the
pre-genesis sentinel instead of a spurious chunk-0 bound that would resume a
young network past its tip; the startup mid-chunk test reads the sentinel as
a chunk boundary, no special-case guard needed.

Codex finding #3 (earliest_ledger raise path) was a false positive: the doc
consistently frames earliest_ledger movement as a deferred operation gated
behind a future admin command.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d18a26b00a

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

### 8.2 Cold lookup

```
resolve the window's unique "frozen" key

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Specify how a hash lookup chooses its index window

When the retention span contains more than one tx-hash window, this cold lookup has no deterministic window to resolve: the RPC request only supplies a hash, while each .idx is scoped to its own [lo, hi] range and stores only a window-local ledger offset. Probing a single “the window” misses transactions stored in every other window, but probing all windows contradicts the stated one-index-probe read path and needs different false-positive handling, so the design needs either a top-level hash→window locator or an explicit multi-window scan algorithm.

Useful? React with 👍 / 👎.

…up, scope/wording cleanup

Follow-up to the design-doc review.

Genesis corner (streaming): replace the per-site sentinel guard with one
chunk-id convention in Geometry — chunk ids are signed and chunkID floor-
divides, so the earliest_ledger-1 watermark sentinel maps to chunk -1 with
chunkLastLedger(-1) = 1, never serialized. deriveCompleteThrough's cold and
positional terms are then correct by construction on a young genesis network
(highestDurableChunk returns -1; maxChunk-1 = -1 at chunk 0), instead of a
spurious chunk-0 bound that would resume past the tip.

getTransaction read path: a cold lookup probes EVERY in-retention window's
.idx (a hash carries no window hint); correctness is the streamhash
fingerprint screen plus the mandatory fetch-and-verify, which confirms at
most one window. payloadWidth/fpWidth are streamhash config knobs.

Scope: how reads stay correct while sweeps and pruning unlink files
concurrently (the ENOENT races, coverage re-resolution) is the query-routing
design's concern, per the doc's own deferral — trim tx 8.4 and the streaming
reader-retention-contract section to the in-scope contract plus that deferral.

Wording: drop vacuous/overselling phrasing across both docs (the "one home"
framing, "bounded by RAM", "structurally unreachable", filler qualifiers).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1c6c1adc90

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +757 to +758
if stored, ok := cat.Get("config:chunks_per_txhash_index"); !ok {
cat.Put("config:chunks_per_txhash_index", itoa(cfg.ChunksPerTxhashIndex))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reject a zero tx-hash window size

When chunks_per_txhash_index is set to 0 on a first start, this path records it as the immutable layout value before any range validation. The rest of the design divides by this value for indexID(c) / window ranges and uses it to size payloads, so a bad first config can either panic at runtime or pin an unusable data directory; validate it is non-zero before storing the config pin.

Useful? React with 👍 / 👎.

A zero value would divide-by-zero in indexID / window math (and break
payloadWidth) and, pinned on first start, permanently brick the data dir.
Validate non-zero before storing the config pin.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 76d5d231a0

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

}

func executePlan(ctx context.Context, plan Plan, cfg Config) error {
slots := make(chan struct{}, cfg.Workers) // the ONLY concurrency knob: one pool, all work kinds

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reject non-positive worker counts

workers is user-configurable as an int, but the config validation shown above only rejects a zero tx-hash window size. When [catch_up].workers = 0, this channel is unbuffered and every chunk/index task blocks on slots <- before doing any work, while the only receive is deferred until after work completes, so any non-empty catch-up or lifecycle plan hangs forever; negative values panic in make. Validate or coerce cfg.Workers > 0 before executing the plan.

Useful? React with 👍 / 👎.

entry × count 20 bytes each: [key: 16][seq: 4 LE]
```

- `key` is the **first 16 bytes of the transaction hash**.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Handle duplicate cold-index prefixes

Because the cold .bin/MPHF key stores only the first 16 bytes, two in-retention transaction hashes with the same prefix become duplicate index keys with different ledger payloads. Streamhash can only map a key to one payload (or fail the build), so the later full-hash verification prevents returning the wrong transaction but turns the other real transaction into a false not-found, violating the doc's completeness/no-false-negatives guarantee for that collision case. Store a disambiguating full hash or collision list if prefix collisions are meant to be harmless.

Useful? React with 👍 / 👎.

tamirms and others added 2 commits June 16, 2026 10:13
The explorer had drifted from the design docs:
- Read-path explorer showed single-window resolution ("resolve the window's
  unique frozen key") — the exact bug the md was corrected away from. Rewrote
  it for probe-all: every in-retention window probed, fingerprint screen +
  fetch-and-verify, at most one confirms. Scenarios are now Found / Fingerprint
  false positive / Not found.
- Dropped the four ENOENT race scenarios; that read-vs-pruning handling is
  deferred to the query-routing design (matches the md's 8.4 pullback).
- chunkID shown as floor((seq-2)/10_000); payload referenced as payloadWidth
  (was a fixed 3 bytes).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…sion note

- validateConfig rejects workers < 1: a zero pool makes executePlan's
  slots channel unbuffered, so every task blocks on acquire forever
  (deadlock); negative panics in make.
- Correct the §8.2 16-byte-prefix-collision sentence. It claimed the
  collision "returns not-found", which is wrong for the cross-window case:
  probe-all + continue-past-verify-mismatch finds each transaction in its
  own window, and an intra-window collision is detected by streamhash at
  build time. The verify still guarantees the wrong transaction is never
  served; the residual collision (~10^-20/window) is accepted as negligible.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 535aa8810b

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +783 to +785
if desired > backendNetworkTip(cfg) {
fatalf("earliest_ledger (%d) is past the current tip; reject.", desired)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Gate the earliest tip check to first start

When the config uses a numeric earliest_ledger on an existing data dir, this check still compares that value to the currently sampled backend tip before considering the stored pin. If the bulk LedgerBackend lags below the pinned floor (a case the startup loop later explicitly handles with max(tip, lastCommitted)), restart fatals even though local data may already be ahead; limit this desired > tip rejection to first start or compare after loading the stored value so backend lag does not brick numeric-frontfill deployments.

Useful? React with 👍 / 👎.

validateConfig sampled the backend tip on every start — the numeric
floor-past-tip rejection and the "now" re-resolution both. On a restart of a
numeric-frontfill deployment, a lagging bulk backend (tip below the pinned
floor — the case the startup loop's max(tip, lastCommitted) is built for)
would spuriously fatal, even though local data is already ahead.

Restructure around the pin: on restart, trust the stored pin and only check
the config didn't change (no backend call); sample the tip only on first
start, where the floor-past-tip rejection is meaningful. Fixes the numeric
restart fatal and the redundant "now" resampling, and removes the backend
dependency from the restart config path.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e54a73259d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

fatalf("workers must be > 0 (got %d) — a zero pool deadlocks executePlan.", cfg.Workers)
}
if stored, ok := cat.Get("config:chunks_per_txhash_index"); !ok {
cat.Put("config:chunks_per_txhash_index", itoa(cfg.ChunksPerTxhashIndex))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Pin layout only after first-start validation

On a first start this writes config:chunks_per_txhash_index before earliest_ledger has been parsed, alignment-checked, or compared with the backend tip. If that same startup then fatals on an invalid/past-tip earliest_ledger (or crashes before line 810), the data dir is left with only the tx-hash layout pin; retrying with a corrected chunks_per_txhash_index now aborts as a layout mismatch even though no artifacts were created. Resolve and validate all first-start pins first, then commit them together.

Useful? React with 👍 / 👎.

…te max_retries

- validateConfig pinned chunks_per_txhash_index before earliest_ledger was
  validated, so a first start that fataled on a bad earliest_ledger left a
  lone chunks pin — a retry correcting chunks_per_txhash_index then aborted
  as a layout mismatch though no artifacts existed. Validate both layout
  pins first, then commit them together in one atomic batch; treat "both
  pins present" as the committed/immutable state.

- Proactively: validate max_retries >= 0 in the same stateless block (a
  negative value is meaningless and, depending on withRetries, could skip a
  catch-up task silently). Same class as the chunks=0 / workers<1 guards.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d597a77a9d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

// Every in-range chunk must be producible from SOME source: durable
// artifacts (self-skips), a complete ready hot DB, the local .pack, or
// the bulk backend — fail before any work otherwise.
if err := validateBackendCovers(cfg, rangeStart, rangeEnd); err != nil {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Validate only chunks that require the backend

When the bulk backend lags local state, startup intentionally computes rangeEnd from max(tip, lastCommitted), so the range can include chunks above the backend tip that are already complete locally. This unconditional validateBackendCovers(cfg, rangeStart, rangeEnd) runs before resolve/catchupSource can self-skip or choose the hot DB/pack source, and the config section describes this validator as checking the configured bulk backend; in that restart it aborts even though no backend read is needed for those chunks. Make the validation mirror the source preference, or validate only chunks that would actually fall through to the backend.

Useful? React with 👍 / 👎.

tamirms and others added 3 commits June 16, 2026 14:52
The numeric earliest_ledger path called atoi(cfg.EarliestLedger) with no
parse check, so a malformed value panicked or fell through to a confusing
"must be chunk-aligned (0)". Validate the form once in the stateless block
("genesis" / "now" / a ledger number), which also makes the restart-branch
atoi safe.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The name read as "the bulk backend covers [rangeStart, rangeEnd]", but the
startup loop deliberately sets rangeEnd above a lagging bulk tip (anchored on
max(tip, lastCommitted)), and those chunks are produced locally. Rename and
restate the contract: validate every in-range chunk is producible from SOME
source (mirroring catchupSource) — the backend need cover only the chunks
that fall through to it, not the whole range. Prevents a spurious abort on a
lagging-backend restart.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ive core) + config bounds

The audit showed the core design (concurrency, atomicity, arithmetic,
consistency) is sound; the gaps were all at external-dependency edges and a
couple of missing config bounds. One principle: untrusted external state must
never poison durable state or silently stop the daemon.

- networkTip(): single hardened tip sampler — retry-with-backoff, and reject a
  tip < genesis ("backend not ready"). Routes all three call sites. "now" (no
  local substitute) fatals on failure; the catch-up loop falls back to
  lastCommitted (serve local, skip catch-up); the numeric past-tip check is
  best-effort. Fixes: first-start "now"/below-genesis tip pinning a garbage
  immutable floor; a transient backend outage bricking a local-only restart.

- earliest_ledger static-form validated once (parse + >= genesis + aligned),
  so no path feeds a sub-genesis value into chunkID.

- chunks_per_txhash_index upper-bounded (MaxChunksPerTxhashIndex) so the
  window span fits the index offset and can't overflow uint32.

- lastCompleteChunkAt: cast before subtract — total over uint32 (was a latent
  underflow at ledger 0).

- runIngestionLoop: distinguish ctx-cancel (clean, return nil) from an
  unexpected captive-core stream close (restartable error → non-zero exit →
  supervisor restarts), instead of silently exiting 0 when core dies.

- tx §3 cold-tier cell: "in [lo, hi]" (was "at or below hi", dropping the lo bound).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b8cf6b909d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +730 to +733
tip, err := networkTip(cfg)
if err != nil {
tip = lastCommitted // backend unreachable: serve local, skip catch-up this pass
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Abort first-start catch-up when the tip is unavailable

When this fallback runs on a first start with earliest_ledger="genesis" or a numeric floor, lastCommitted is only the earliest_ledger - 1 sentinel, so substituting it for an unreachable backend tip makes rangeEnd < rangeStart and skips backfill entirely before serveReads() starts. That violates the startup contract that the daemon catches up before serving and can expose an empty/incomplete history; only degrade to the local watermark once the catalog already contains a complete local retention window, otherwise fail startup until a real tip is available.

Useful? React with 👍 / 👎.

The select-based loop returned an error on the stream-close case without
checking ctx, so a graceful shutdown (ctx cancelled AND core closes its
stream) could exit non-zero if select picked the channel branch — making the
supervisor restart a daemon that was asked to stop. Guard the close case with
ctx.Err(): a close while ctx is done is a clean shutdown (return nil); only an
unexpected close is the restartable error.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5631e4ce0d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

- The executor runs each `IndexBuild` via `buildThenSweep` (defined with `resolve` above), which lands the commit batch (terminal for complete windows) and then runs the eager `"pruning"` sweep (rule 4). The sweep is window-local — this window's demoted inputs and superseded coverages, not a store-wide scan — so concurrent windows' sweeps touch disjoint keys, and `fsyncDir` on a bucket dir shared with another window's in-flight `.bin` writes is safe (a dir fsync with concurrent creates just makes more entries durable).
- Done-channels broadcast *completion*, not success: a chunk build that exhausts its retries still closes its channel (the `defer`), so a dependent index build can win the race against context cancellation and start — whereupon it fails `buildTxhashIndex`'s loud `.bin` precondition check before writing any key, landing on the same abort-and-restart path as the original failure. The precondition check is load-bearing here.
- A task that exhausts its retries aborts the daemon, per the [error policy](#lifecycle); restart re-resolves from durable keys, and completed work never repeats.
- **Single-process enforcement:** the meta store holds a kernel `flock` on a `LOCK` file; a second daemon opening the **same meta-store path** fails immediately, and the lock releases on any process exit (including `kill -9`). Because `[meta_store]` and each `[immutable_storage.*]` path are independently configurable, the meta-store lock alone cannot stop two daemons with *different* meta stores from sharing one artifact tree — the daemon therefore also takes a `flock` in each configured storage root.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Include hot storage in the lock set

When two daemons use different meta-store paths but share [streaming.hot_storage].path, this locking scheme still lets both processes operate under the same hot/{chunk} tree, even though hot DBs are independently created, opened, and deleted by ingestion/discard. Please include the hot-storage root in the per-root flock set (or explicitly make it non-shareable), otherwise this allowed configuration can corrupt or remove the only hot copy of recent ledgers despite the immutable roots being protected.

Useful? React with 👍 / 👎.

…ge; lock hot storage

- The networkTip fallback (tip = lastCommitted on backend outage) was wrong on
  a FIRST start: lastCommitted is only the earliest-1 sentinel, so the catch-up
  range goes empty and serveReads() starts on empty/incomplete history. Degrade
  to the local watermark only when there is committed local progress
  (lastCommitted >= earliest); otherwise fatal until a real tip is available.

- Single-process enforcement: include [streaming.hot_storage] in the per-root
  flock set. Two daemons with different meta stores but a shared hot_storage
  path could both write the same hot/{chunk} tree — the only copy of recent
  ledgers — despite the immutable roots being locked.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4ff11b83d0

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +871 to +872
if tip, err := networkTip(cfg); err == nil && earliest > tip {
fatalf("earliest_ledger (%d) is past the current tip (%d); reject.", earliest, tip)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reject unchecked future earliest ledgers

When this first-start tip check is skipped because networkTip is temporarily unavailable, the code still pins the numeric earliest_ledger immediately afterward. If that configured floor is actually ahead of the network and the later catch-up-loop tip sample succeeds, lastCommitted is earliest-1, so anchor=max(tip,lastCommitted) makes the backfill range empty and startup proceeds from a future ledger with the bad floor now immutable. Require a successful earliest <= tip check before committing a numeric first-start floor, or defer pinning until the check can be made.

Useful? React with 👍 / 👎.

tamirms and others added 2 commits June 16, 2026 22:07
…dress review

Watermark derivation now counts only "ready" hot keys (not value-blind). That
single change lets surgical recovery uniformly demote hot keys to "transient"
instead of the old demote-vs-remove split — collapsing recovery cases 3/4,
dropping the watermark-inflation reasoning and the dirs-first-keys-last operator
footgun. deriveWatermark's existing refinement read recovers the chunk-level
frontier after a boundary crash (the one spot the ready-only count leans on
opening a hot DB rather than key existence).

Also:
- catchupSource: bounded backend-coverage wait instead of abort/restart churn
  when a re-derived chunk sits above a lagging bulk tip; validateRangeProducible
  no longer aborts on coverage timing.
- Folded the no-hot-key-corner caveat out of recovery case 3 into the
  Monotonic-progress assumption where it's referenced.

Review fixes carried in the same pass:
- Numeric first-start earliest_ledger validated against the tip before pinning.
- earliest_ledger immutability scenario no longer claims a live config edit converges.
- Cold-index 16-byte prefix collision: same-window is a fail-stop ErrDuplicateKey
  build failure; cross-window is harmless (verify rejects it). payloadWidth/cpi cap
  cross-referenced; payloadWidth lives in the streamhash header, not user metadata.
- runIngestionLoop takes ctx; INV-3/INV-2 made consistent and the INV-2 audit scoped
  to [floor, completeThrough]; "readers resolve a ready hot DB or a frozen cold
  artifact, never a transient key"; read-only hot-DB handles closed before reopen/discard.

Docs only; no code changes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Rewrite the full-history design docs in the plain, concept-first voice:
lead with the plain concept and rely on the pseudocode instead of
re-narrating it, drop defensive asides about alternatives that were never
pursued, and define terms before first use.

Rename for consistency throughout: meta store -> catalog, catch-up ->
backfill, lifecycle tick -> lifecycle run, quiescent -> settled. Reorder
the streaming doc's Backfill section top-down (postcondition-driven
planning -> primitives -> execution model) and tighten Correctness; the
gettransaction doc is cut to storage/rebuild/query only, with the
crash-safety machinery left to the streaming doc.

Re-sync the interactive explorer and add two widgets: an executePlan
dependency graph (worker semaphore + done-channel barrier, with a
failure-cascade toggle) and a Startup backfill-loop walkthrough that runs
the real startStreaming arithmetic. Fix dead cross-doc anchors left by the
gettransaction reorganization.

Docs only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@tamirms tamirms requested a review from a team June 18, 2026 12:53

@chowbao chowbao left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I don't think these need to go in the design but I had a few comments/questions

  • Verification process (will probably defer to a separate issue/PR)
    • The design assumes that the data backfilled/ingested is complete and correct but there aren't any verification steps in place checking if the LCM contents are correctly ingested or not. For example, checking if for a given LCM the number of transactions ingested == the number of transactions marked as within the LCM
  • There are ledger within the Stellar network that have zero transactions (very early ledgers around genesis and when the network was live but not very active). Zero transaction ledgers would also exist now/in the future if core keeps running but maybe there is no way for people to submit transactions to the network. We should check to make sure when backfilling/frontfilling that the system knows the difference between missing and empty data like txhash rolling index is okay with empty .bin files
  • One clarification with backfill and frontfill on first startup

Backfill has a contract — given a range, ensure every artifact derived from every ledger in it is durable and servable — and resolves what's missing before scheduling anything

skipping the tip chunk that captive core is actively ingesting. Covers first-ever start, downtime gaps, and retention widening.

What happens in the case that while backfilling the live chunk moves to the next chunk. Is it possible to have multiple hot chunks ready to freeze?
and resolves what's missing before scheduling anything does this mean RPC won't serve/run until the backfill is complete? In that case doesn't it make sense to just start captive core after the backfill range is complete instead of skipping the tip/live chunk?

@tamirms

tamirms commented Jun 22, 2026

Copy link
Copy Markdown
Contributor Author

@chowbao

I agree with you on the verification process. I think we'll need a separate design doc (can be more lightweight like a github issue) to scope out the requirements there.

I also looked into the 0 transaction case. If there is an entire 10 million consecutive ledger range where there are 0 transactions (could be the case in a new stand alone network), this would create a problem because streamhash does not support indexes where there are 0 keys. But, we can fix this easily in streamhash by allowing a 0 key index which would return not found on every query.

What happens in the case that while backfilling the live chunk moves to the next chunk

The backfill operation is run in a loop which should handle the case of the tip advancing to the next chunk during the backfill operation:

	earliest := cat.EarliestLedger()
	lastCommitted := lastCommittedLedger(cat)

	// Step 1: backfill from the floor up to the last complete chunk at the tip,
	// leaving the partial tip chunk to ingestion. Re-pass while the tip moves.
	backfilledThrough := int64(-1)
	for {
		tip, err := networkTip(cfg)
		if err != nil {
			if lastCommitted < earliest {
				fatalf("network tip unavailable and no local history to serve: %v", err)
			}
			tip = lastCommitted // backend down, but local data exists: serve it
		}
		anchor := max(tip, lastCommitted)
		rangeEnd := lastCompleteChunkAt(anchor)
		rangeStart := retentionFloorChunk(rangeEnd, cfg.RetentionChunks, earliest)
		midChunk := lastCommitted != chunkLastLedger(chunkID(lastCommitted))
		nearTip := int64(tip)-int64(lastCommitted) < LedgersPerChunk
		if nearTip && midChunk {
			rangeEnd = chunkID(lastCommitted) - 1 // leave the partial resume chunk to ingestion
		}
		if rangeEnd < rangeStart || rangeEnd <= backfilledThrough {
			break
		}
		if err := executePlan(ctx, cfg, resolve(cfg, rangeStart, rangeEnd)); err != nil {
			return err
		}
		lastCommitted = max(lastCommitted, chunkLastLedger(rangeEnd))
		backfilledThrough = rangeEnd
	}
	resumeLedger := lastCommitted + 1

Is it possible to have multiple hot chunks ready to freeze?

During the live ingestion phase (after we finished backfilling), it is possible to have hot chunks which are ready to freeze but it is very unlikely. Each freeze would probably complete in a matter of minutes and it takes 14 hours for a hot chunk to reach its capacity. For there to be backlog of hot chunks it means the system is in a bad / degraded state and would need investigation.

does this mean RPC won't serve/run until the backfill is complete? In that case doesn't it make sense to just start captive core after the backfill range is complete instead of skipping the tip/live chunk?

yes, RPC won't serve any requests until the backfill loop is complete. But, I don't understand the second question. After the backfill step is complete, we do actually start up captive core to do the live ingestion phase.

@chowbao

chowbao commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

does this mean RPC won't serve/run until the backfill is complete? In that case doesn't it make sense to just start captive core after the backfill range is complete instead of skipping the tip/live chunk?

yes, RPC won't serve any requests until the backfill loop is complete. But, I don't understand the second question. After the backfill step is complete, we do actually start up captive core to do the live ingestion phase.

I should worded this better. If I'm backfilling from genesis to now I would assume that the actual current ledger would be at least a chunk or two ahead of the "now" (the tip chunk that captive core is actively ingesting). I was just suggesting that we can backfill as close to current as possible instead of spinning up captive core for the old current ledger tip.

In other words

  • I want to spin up a new rpc v2 from genesis to now. "now" in this example can be like 1,000,000
  • Backfilling from 2 to 1,000,000 takes some arbitrary amount of time backfilling with BSB
  • By the time backfill BSB gets to 1,000,000 the current ledger might actually be 1,050,000
  • The way the design is worded right now makes me think that captive core would have been started at 1,000,000 which means that captive core is ingesting an extra 5 chunks slowly instead of just letting BSB do those 5 chunks

This is a small optimization. This probably doesn't matter in the long run because rpc v2 wouldn't start serving until the backfill is complete and caught up anyways. But I just wanted to call it out

Edit: Oh actually the code in your comment might have handled this scenario cause that code would run until it is in the latest chunk which at that point we would start up captive core

…able)

The tx-hash window is now a hardcoded 1,000-chunk (10M-ledger) constant rather than a config key. Drops the [backfill] config row, the config:chunks_per_txhash_index catalog pin and its validateConfig mismatch path, the MaxChunksPerTxhashIndex cap, and the variable payload-width machinery (payload is a flat 3 bytes). Mirrors the change in the companion HTML explorer.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01GtG9MzaQ7h3LqvhQCkWhrg

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 21acdb03b7

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

lifecycleCh := make(chan ChunkID, lifecycleQueueDepth)
lifecycleCh <- lastCompleteChunkAt(resumeLedger - 1) // seed the first run
go lifecycleLoop(ctx, cfg, lifecycleCh)
serveReads()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Do not serve before replaying the partial chunk

When startup stops backfill at lastCompleteChunkAt(tip), a first start or a restart whose current tip chunk was not already in a ready hot DB opens an empty resume hot DB and reaches serveReads() before runIngestionLoop has replayed resumeLedger..tip. During that catch-up window, requests for ledgers/transactions that were already at the sampled network tip can return not-found even though the startup text says the partial tip chunk is already served from hot storage; either replay to the sampled tip before serving or make this syncing state explicit to readers.

Useful? React with 👍 / 👎.

…short retention

The .bin lifetime descriptions stated it is removed at window finalization, which only holds when retention is at least a window wide. Under retention narrower than a window the window never finalizes; retention pruning removes each .bin as its chunk drops below the floor. Clarified across the streaming workflow doc, the getTransaction design, and the HTML explorer, consistent with the docs' existing invariant and crash-safety text.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01GtG9MzaQ7h3LqvhQCkWhrg
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants