docs(full-history): split getTransaction design into its own doc; reorganize design-docs#787
docs(full-history): split getTransaction design into its own doc; reorganize design-docs#787tamirms wants to merge 17 commits into
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7640b9a6b1
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
|
||
| `txhash/index/{window:08d}/{lo:08d}-{hi:08d}.idx`, meta-store key `index:{window:08d}:{lo:08d}:{hi:08d}`. One streamhash minimal-perfect-hash file per **coverage**, built by streamhash's `SortedBuilder` over the k-way merge of `.bin[lo..hi]`, with the cold-txhash option set: | ||
|
|
||
| - **Payload: 3 bytes** — the ledger seq stored as an offset from `MinLedger`, where `MinLedger = chunkFirstLedger(lo)` is derived from the build range and embedded in the file as user metadata. 3 bytes spans 16.7M ledgers, comfortably over a 10M-ledger window. No sidecar metadata. |
There was a problem hiding this comment.
Enforce the 24-bit payload limit
The surrounding docs make chunks_per_txhash_index configurable, but this .idx format stores only a 24-bit ledger offset. If an operator pins chunks_per_txhash_index >= 1678 (more than 16.77M ledgers), offsets from MinLedger no longer fit in 3 bytes, so the generated index cannot represent the tail of the window and cold getTransaction lookups will wrap or fail. Please either cap validation at 1677 chunks or make the payload width depend on the configured window size.
Useful? React with 👍 / 👎.
| // (produced locally via catchupSource's hot branch) — the bulk | ||
| // backend is never asked for them. | ||
| rangeEnd := lastCompleteChunkAt(anchor) | ||
| watermarkMidChunk := lastCommitted != chunkLastLedger(chunkID(lastCommitted)) |
There was a problem hiding this comment.
Guard the genesis watermark before chunkID
On an empty/genesis start, deriveWatermark is documented to return earliest_ledger - 1 (ledger 1 when the floor is genesis), which is outside the valid domain for chunkID(seq) = (seq - 2) / 10000. For a network younger than one chunk, withinOneChunkOfTip is true and the later rangeEnd = chunkID(lastCommitted) - 1 turns the intended empty range into a huge wrapped chunk id, defeating the rangeEnd < rangeStart guard and scheduling bogus backfill work. Guard this sentinel before calling chunkID or use a signed/no-chunk representation.
Useful? React with 👍 / 👎.
Add the full-history RPC design docs under design-docs/: - full-history-streaming-workflow.md — the streaming daemon: geometry, the meta store and the one write protocol, catch-up, hot-DB ingestion, the lifecycle tick (freeze -> rebuild -> discard -> prune), the reader retention contract, and the correctness invariants (INV-1..4) with audits. - gettransaction-full-history-design.md — the transaction-by-hash subsystem: the hot txhash CF, the .bin/.idx streamhash formats, the rolling window-index build protocol (pseudocode, crash matrix, rewriting-safety argument), the getTransaction read path, and capacity numbers. - full-history-design-explorer.html — a self-contained interactive companion to both docs. Docs live in the top-level design-docs/ alongside the events and packfile designs; the standalone backfill workflow doc is retained as historical and is subsumed by the streaming doc. Docs only — no code changes. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
7640b9a to
6ce1159
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6ce11597d6
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| INV-1 holds at every point the daemon is serving reads — transient states are never externally visible, because readers resolve `"frozen"` keys exclusively and the retention check masks everything else. INV-2, INV-3, and INV-4 hold at every quiescence reached after the events below; startup's first quiescence arrives when the first tick completes, shortly after reads open. | ||
|
|
||
| 1. **Steady-state operation.** Hot DB ingestion advances `last_committed_ledger`; the lifecycle goroutine freezes complete chunks within retention and prunes anything past it. All four invariants hold by induction on `last_committed_ledger`. | ||
| 2. **Operator state changes** — retention widening or shortening (`retention_chunks`), `earliest_ledger` raised. Both reduce to "`effectiveRetentionFloor` recomputes; the next startup converges to the new state." Catch-up's per-window resolver rule re-derives and rebuilds any window whose desired coverage now exceeds its stored coverage; the prune stage removes anything below a raised floor. The "next startup" is load-bearing for widening, enforced by the floor's two-role split: a lowered floor takes effect immediately in its *retention* role (pruning simply stops sooner), but the tick's *production* range still starts at existing storage — only the next catch-up, behind `validateBackendCovers`, materializes the new bottom. |
There was a problem hiding this comment.
Remove unsupported earliest_ledger raise path
This scenario says raising earliest_ledger converges on the next startup, but the startup validation above stores config:earliest_ledger on first start and fatals on any later non-now mismatch, while the configuration section also says changing it currently requires wiping the data directory or a future admin command. An operator following this documented state-change path will hit a startup abort rather than convergence, so either drop earliest_ledger from the supported operator changes here or describe the future admin workflow explicitly.
Useful? React with 👍 / 👎.
…nesis-corner fixes Follow-up to the design-doc PR addressing the automated review. gettransaction (#1): size the .idx ledger-seq payload to the window — payloadWidth = ceil(log2(chunks_per_txhash_index * 10_000) / 8) — instead of a fixed 3 bytes, so the format never caps chunks_per_txhash_index (3 bytes up to 1677 chunks, 4 bytes beyond). Capacity references qualified as default-window figures. streaming (#2): fix the young-genesis corner case at the root. Add the chunk-id convention to Geometry — signed ids, chunkID floor-divides so the earliest_ledger-1 watermark sentinel maps to chunk -1 with chunkLastLedger(-1) = 1, and the -1 sentinel is never serialized. With it, deriveCompleteThrough's cold term (highestDurableChunk returns -1 on a fresh start) and positional term (maxChunk-1 = -1 at chunk 0) degrade to the pre-genesis sentinel instead of a spurious chunk-0 bound that would resume a young network past its tip; the startup mid-chunk test reads the sentinel as a chunk boundary, no special-case guard needed. Codex finding #3 (earliest_ledger raise path) was a false positive: the doc consistently frames earliest_ledger movement as a deferred operation gated behind a future admin command. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d18a26b00a
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| ### 8.2 Cold lookup | ||
|
|
||
| ``` | ||
| resolve the window's unique "frozen" key |
There was a problem hiding this comment.
Specify how a hash lookup chooses its index window
When the retention span contains more than one tx-hash window, this cold lookup has no deterministic window to resolve: the RPC request only supplies a hash, while each .idx is scoped to its own [lo, hi] range and stores only a window-local ledger offset. Probing a single “the window” misses transactions stored in every other window, but probing all windows contradicts the stated one-index-probe read path and needs different false-positive handling, so the design needs either a top-level hash→window locator or an explicit multi-window scan algorithm.
Useful? React with 👍 / 👎.
…up, scope/wording cleanup Follow-up to the design-doc review. Genesis corner (streaming): replace the per-site sentinel guard with one chunk-id convention in Geometry — chunk ids are signed and chunkID floor- divides, so the earliest_ledger-1 watermark sentinel maps to chunk -1 with chunkLastLedger(-1) = 1, never serialized. deriveCompleteThrough's cold and positional terms are then correct by construction on a young genesis network (highestDurableChunk returns -1; maxChunk-1 = -1 at chunk 0), instead of a spurious chunk-0 bound that would resume past the tip. getTransaction read path: a cold lookup probes EVERY in-retention window's .idx (a hash carries no window hint); correctness is the streamhash fingerprint screen plus the mandatory fetch-and-verify, which confirms at most one window. payloadWidth/fpWidth are streamhash config knobs. Scope: how reads stay correct while sweeps and pruning unlink files concurrently (the ENOENT races, coverage re-resolution) is the query-routing design's concern, per the doc's own deferral — trim tx 8.4 and the streaming reader-retention-contract section to the in-scope contract plus that deferral. Wording: drop vacuous/overselling phrasing across both docs (the "one home" framing, "bounded by RAM", "structurally unreachable", filler qualifiers). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1c6c1adc90
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if stored, ok := cat.Get("config:chunks_per_txhash_index"); !ok { | ||
| cat.Put("config:chunks_per_txhash_index", itoa(cfg.ChunksPerTxhashIndex)) |
There was a problem hiding this comment.
Reject a zero tx-hash window size
When chunks_per_txhash_index is set to 0 on a first start, this path records it as the immutable layout value before any range validation. The rest of the design divides by this value for indexID(c) / window ranges and uses it to size payloads, so a bad first config can either panic at runtime or pin an unusable data directory; validate it is non-zero before storing the config pin.
Useful? React with 👍 / 👎.
A zero value would divide-by-zero in indexID / window math (and break payloadWidth) and, pinned on first start, permanently brick the data dir. Validate non-zero before storing the config pin. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 76d5d231a0
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| } | ||
|
|
||
| func executePlan(ctx context.Context, plan Plan, cfg Config) error { | ||
| slots := make(chan struct{}, cfg.Workers) // the ONLY concurrency knob: one pool, all work kinds |
There was a problem hiding this comment.
Reject non-positive worker counts
workers is user-configurable as an int, but the config validation shown above only rejects a zero tx-hash window size. When [catch_up].workers = 0, this channel is unbuffered and every chunk/index task blocks on slots <- before doing any work, while the only receive is deferred until after work completes, so any non-empty catch-up or lifecycle plan hangs forever; negative values panic in make. Validate or coerce cfg.Workers > 0 before executing the plan.
Useful? React with 👍 / 👎.
| entry × count 20 bytes each: [key: 16][seq: 4 LE] | ||
| ``` | ||
|
|
||
| - `key` is the **first 16 bytes of the transaction hash**. |
There was a problem hiding this comment.
Handle duplicate cold-index prefixes
Because the cold .bin/MPHF key stores only the first 16 bytes, two in-retention transaction hashes with the same prefix become duplicate index keys with different ledger payloads. Streamhash can only map a key to one payload (or fail the build), so the later full-hash verification prevents returning the wrong transaction but turns the other real transaction into a false not-found, violating the doc's completeness/no-false-negatives guarantee for that collision case. Store a disambiguating full hash or collision list if prefix collisions are meant to be harmless.
Useful? React with 👍 / 👎.
The explorer had drifted from the design docs:
- Read-path explorer showed single-window resolution ("resolve the window's
unique frozen key") — the exact bug the md was corrected away from. Rewrote
it for probe-all: every in-retention window probed, fingerprint screen +
fetch-and-verify, at most one confirms. Scenarios are now Found / Fingerprint
false positive / Not found.
- Dropped the four ENOENT race scenarios; that read-vs-pruning handling is
deferred to the query-routing design (matches the md's 8.4 pullback).
- chunkID shown as floor((seq-2)/10_000); payload referenced as payloadWidth
(was a fixed 3 bytes).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…sion note - validateConfig rejects workers < 1: a zero pool makes executePlan's slots channel unbuffered, so every task blocks on acquire forever (deadlock); negative panics in make. - Correct the §8.2 16-byte-prefix-collision sentence. It claimed the collision "returns not-found", which is wrong for the cross-window case: probe-all + continue-past-verify-mismatch finds each transaction in its own window, and an intra-window collision is detected by streamhash at build time. The verify still guarantees the wrong transaction is never served; the residual collision (~10^-20/window) is accepted as negligible. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 535aa8810b
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if desired > backendNetworkTip(cfg) { | ||
| fatalf("earliest_ledger (%d) is past the current tip; reject.", desired) | ||
| } |
There was a problem hiding this comment.
Gate the earliest tip check to first start
When the config uses a numeric earliest_ledger on an existing data dir, this check still compares that value to the currently sampled backend tip before considering the stored pin. If the bulk LedgerBackend lags below the pinned floor (a case the startup loop later explicitly handles with max(tip, lastCommitted)), restart fatals even though local data may already be ahead; limit this desired > tip rejection to first start or compare after loading the stored value so backend lag does not brick numeric-frontfill deployments.
Useful? React with 👍 / 👎.
validateConfig sampled the backend tip on every start — the numeric floor-past-tip rejection and the "now" re-resolution both. On a restart of a numeric-frontfill deployment, a lagging bulk backend (tip below the pinned floor — the case the startup loop's max(tip, lastCommitted) is built for) would spuriously fatal, even though local data is already ahead. Restructure around the pin: on restart, trust the stored pin and only check the config didn't change (no backend call); sample the tip only on first start, where the floor-past-tip rejection is meaningful. Fixes the numeric restart fatal and the redundant "now" resampling, and removes the backend dependency from the restart config path. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e54a73259d
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| fatalf("workers must be > 0 (got %d) — a zero pool deadlocks executePlan.", cfg.Workers) | ||
| } | ||
| if stored, ok := cat.Get("config:chunks_per_txhash_index"); !ok { | ||
| cat.Put("config:chunks_per_txhash_index", itoa(cfg.ChunksPerTxhashIndex)) |
There was a problem hiding this comment.
Pin layout only after first-start validation
On a first start this writes config:chunks_per_txhash_index before earliest_ledger has been parsed, alignment-checked, or compared with the backend tip. If that same startup then fatals on an invalid/past-tip earliest_ledger (or crashes before line 810), the data dir is left with only the tx-hash layout pin; retrying with a corrected chunks_per_txhash_index now aborts as a layout mismatch even though no artifacts were created. Resolve and validate all first-start pins first, then commit them together.
Useful? React with 👍 / 👎.
…te max_retries - validateConfig pinned chunks_per_txhash_index before earliest_ledger was validated, so a first start that fataled on a bad earliest_ledger left a lone chunks pin — a retry correcting chunks_per_txhash_index then aborted as a layout mismatch though no artifacts existed. Validate both layout pins first, then commit them together in one atomic batch; treat "both pins present" as the committed/immutable state. - Proactively: validate max_retries >= 0 in the same stateless block (a negative value is meaningless and, depending on withRetries, could skip a catch-up task silently). Same class as the chunks=0 / workers<1 guards. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d597a77a9d
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| // Every in-range chunk must be producible from SOME source: durable | ||
| // artifacts (self-skips), a complete ready hot DB, the local .pack, or | ||
| // the bulk backend — fail before any work otherwise. | ||
| if err := validateBackendCovers(cfg, rangeStart, rangeEnd); err != nil { |
There was a problem hiding this comment.
Validate only chunks that require the backend
When the bulk backend lags local state, startup intentionally computes rangeEnd from max(tip, lastCommitted), so the range can include chunks above the backend tip that are already complete locally. This unconditional validateBackendCovers(cfg, rangeStart, rangeEnd) runs before resolve/catchupSource can self-skip or choose the hot DB/pack source, and the config section describes this validator as checking the configured bulk backend; in that restart it aborts even though no backend read is needed for those chunks. Make the validation mirror the source preference, or validate only chunks that would actually fall through to the backend.
Useful? React with 👍 / 👎.
The numeric earliest_ledger path called atoi(cfg.EarliestLedger) with no
parse check, so a malformed value panicked or fell through to a confusing
"must be chunk-aligned (0)". Validate the form once in the stateless block
("genesis" / "now" / a ledger number), which also makes the restart-branch
atoi safe.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The name read as "the bulk backend covers [rangeStart, rangeEnd]", but the startup loop deliberately sets rangeEnd above a lagging bulk tip (anchored on max(tip, lastCommitted)), and those chunks are produced locally. Rename and restate the contract: validate every in-range chunk is producible from SOME source (mirroring catchupSource) — the backend need cover only the chunks that fall through to it, not the whole range. Prevents a spurious abort on a lagging-backend restart. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ive core) + config bounds
The audit showed the core design (concurrency, atomicity, arithmetic,
consistency) is sound; the gaps were all at external-dependency edges and a
couple of missing config bounds. One principle: untrusted external state must
never poison durable state or silently stop the daemon.
- networkTip(): single hardened tip sampler — retry-with-backoff, and reject a
tip < genesis ("backend not ready"). Routes all three call sites. "now" (no
local substitute) fatals on failure; the catch-up loop falls back to
lastCommitted (serve local, skip catch-up); the numeric past-tip check is
best-effort. Fixes: first-start "now"/below-genesis tip pinning a garbage
immutable floor; a transient backend outage bricking a local-only restart.
- earliest_ledger static-form validated once (parse + >= genesis + aligned),
so no path feeds a sub-genesis value into chunkID.
- chunks_per_txhash_index upper-bounded (MaxChunksPerTxhashIndex) so the
window span fits the index offset and can't overflow uint32.
- lastCompleteChunkAt: cast before subtract — total over uint32 (was a latent
underflow at ledger 0).
- runIngestionLoop: distinguish ctx-cancel (clean, return nil) from an
unexpected captive-core stream close (restartable error → non-zero exit →
supervisor restarts), instead of silently exiting 0 when core dies.
- tx §3 cold-tier cell: "in [lo, hi]" (was "at or below hi", dropping the lo bound).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b8cf6b909d
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| tip, err := networkTip(cfg) | ||
| if err != nil { | ||
| tip = lastCommitted // backend unreachable: serve local, skip catch-up this pass | ||
| } |
There was a problem hiding this comment.
Abort first-start catch-up when the tip is unavailable
When this fallback runs on a first start with earliest_ledger="genesis" or a numeric floor, lastCommitted is only the earliest_ledger - 1 sentinel, so substituting it for an unreachable backend tip makes rangeEnd < rangeStart and skips backfill entirely before serveReads() starts. That violates the startup contract that the daemon catches up before serving and can expose an empty/incomplete history; only degrade to the local watermark once the catalog already contains a complete local retention window, otherwise fail startup until a real tip is available.
Useful? React with 👍 / 👎.
The select-based loop returned an error on the stream-close case without checking ctx, so a graceful shutdown (ctx cancelled AND core closes its stream) could exit non-zero if select picked the channel branch — making the supervisor restart a daemon that was asked to stop. Guard the close case with ctx.Err(): a close while ctx is done is a clean shutdown (return nil); only an unexpected close is the restartable error. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5631e4ce0d
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| - The executor runs each `IndexBuild` via `buildThenSweep` (defined with `resolve` above), which lands the commit batch (terminal for complete windows) and then runs the eager `"pruning"` sweep (rule 4). The sweep is window-local — this window's demoted inputs and superseded coverages, not a store-wide scan — so concurrent windows' sweeps touch disjoint keys, and `fsyncDir` on a bucket dir shared with another window's in-flight `.bin` writes is safe (a dir fsync with concurrent creates just makes more entries durable). | ||
| - Done-channels broadcast *completion*, not success: a chunk build that exhausts its retries still closes its channel (the `defer`), so a dependent index build can win the race against context cancellation and start — whereupon it fails `buildTxhashIndex`'s loud `.bin` precondition check before writing any key, landing on the same abort-and-restart path as the original failure. The precondition check is load-bearing here. | ||
| - A task that exhausts its retries aborts the daemon, per the [error policy](#lifecycle); restart re-resolves from durable keys, and completed work never repeats. | ||
| - **Single-process enforcement:** the meta store holds a kernel `flock` on a `LOCK` file; a second daemon opening the **same meta-store path** fails immediately, and the lock releases on any process exit (including `kill -9`). Because `[meta_store]` and each `[immutable_storage.*]` path are independently configurable, the meta-store lock alone cannot stop two daemons with *different* meta stores from sharing one artifact tree — the daemon therefore also takes a `flock` in each configured storage root. |
There was a problem hiding this comment.
Include hot storage in the lock set
When two daemons use different meta-store paths but share [streaming.hot_storage].path, this locking scheme still lets both processes operate under the same hot/{chunk} tree, even though hot DBs are independently created, opened, and deleted by ingestion/discard. Please include the hot-storage root in the per-root flock set (or explicitly make it non-shareable), otherwise this allowed configuration can corrupt or remove the only hot copy of recent ledgers despite the immutable roots being protected.
Useful? React with 👍 / 👎.
…ge; lock hot storage
- The networkTip fallback (tip = lastCommitted on backend outage) was wrong on
a FIRST start: lastCommitted is only the earliest-1 sentinel, so the catch-up
range goes empty and serveReads() starts on empty/incomplete history. Degrade
to the local watermark only when there is committed local progress
(lastCommitted >= earliest); otherwise fatal until a real tip is available.
- Single-process enforcement: include [streaming.hot_storage] in the per-root
flock set. Two daemons with different meta stores but a shared hot_storage
path could both write the same hot/{chunk} tree — the only copy of recent
ledgers — despite the immutable roots being locked.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4ff11b83d0
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if tip, err := networkTip(cfg); err == nil && earliest > tip { | ||
| fatalf("earliest_ledger (%d) is past the current tip (%d); reject.", earliest, tip) |
There was a problem hiding this comment.
Reject unchecked future earliest ledgers
When this first-start tip check is skipped because networkTip is temporarily unavailable, the code still pins the numeric earliest_ledger immediately afterward. If that configured floor is actually ahead of the network and the later catch-up-loop tip sample succeeds, lastCommitted is earliest-1, so anchor=max(tip,lastCommitted) makes the backfill range empty and startup proceeds from a future ledger with the bad floor now immutable. Require a successful earliest <= tip check before committing a numeric first-start floor, or defer pinning until the check can be made.
Useful? React with 👍 / 👎.
…dress review Watermark derivation now counts only "ready" hot keys (not value-blind). That single change lets surgical recovery uniformly demote hot keys to "transient" instead of the old demote-vs-remove split — collapsing recovery cases 3/4, dropping the watermark-inflation reasoning and the dirs-first-keys-last operator footgun. deriveWatermark's existing refinement read recovers the chunk-level frontier after a boundary crash (the one spot the ready-only count leans on opening a hot DB rather than key existence). Also: - catchupSource: bounded backend-coverage wait instead of abort/restart churn when a re-derived chunk sits above a lagging bulk tip; validateRangeProducible no longer aborts on coverage timing. - Folded the no-hot-key-corner caveat out of recovery case 3 into the Monotonic-progress assumption where it's referenced. Review fixes carried in the same pass: - Numeric first-start earliest_ledger validated against the tip before pinning. - earliest_ledger immutability scenario no longer claims a live config edit converges. - Cold-index 16-byte prefix collision: same-window is a fail-stop ErrDuplicateKey build failure; cross-window is harmless (verify rejects it). payloadWidth/cpi cap cross-referenced; payloadWidth lives in the streamhash header, not user metadata. - runIngestionLoop takes ctx; INV-3/INV-2 made consistent and the INV-2 audit scoped to [floor, completeThrough]; "readers resolve a ready hot DB or a frozen cold artifact, never a transient key"; read-only hot-DB handles closed before reopen/discard. Docs only; no code changes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Rewrite the full-history design docs in the plain, concept-first voice: lead with the plain concept and rely on the pseudocode instead of re-narrating it, drop defensive asides about alternatives that were never pursued, and define terms before first use. Rename for consistency throughout: meta store -> catalog, catch-up -> backfill, lifecycle tick -> lifecycle run, quiescent -> settled. Reorder the streaming doc's Backfill section top-down (postcondition-driven planning -> primitives -> execution model) and tighten Correctness; the gettransaction doc is cut to storage/rebuild/query only, with the crash-safety machinery left to the streaming doc. Re-sync the interactive explorer and add two widgets: an executePlan dependency graph (worker semaphore + done-channel barrier, with a failure-cascade toggle) and a Startup backfill-loop walkthrough that runs the real startStreaming arithmetic. Fix dead cross-doc anchors left by the gettransaction reorganization. Docs only. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
chowbao
left a comment
There was a problem hiding this comment.
LGTM
I don't think these need to go in the design but I had a few comments/questions
- Verification process (will probably defer to a separate issue/PR)
- The design assumes that the data backfilled/ingested is complete and correct but there aren't any verification steps in place checking if the LCM contents are correctly ingested or not. For example, checking if for a given LCM the number of transactions ingested == the number of transactions marked as within the LCM
- There are ledger within the Stellar network that have zero transactions (very early ledgers around genesis and when the network was live but not very active). Zero transaction ledgers would also exist now/in the future if core keeps running but maybe there is no way for people to submit transactions to the network. We should check to make sure when backfilling/frontfilling that the system knows the difference between missing and empty data like txhash rolling index is okay with empty .bin files
- One clarification with backfill and frontfill on first startup
Backfill has a contract — given a range, ensure every artifact derived from every ledger in it is durable and servable — and resolves what's missing before scheduling anything
skipping the tip chunk that captive core is actively ingesting. Covers first-ever start, downtime gaps, and retention widening.
What happens in the case that while backfilling the live chunk moves to the next chunk. Is it possible to have multiple hot chunks ready to freeze?
and resolves what's missing before scheduling anything does this mean RPC won't serve/run until the backfill is complete? In that case doesn't it make sense to just start captive core after the backfill range is complete instead of skipping the tip/live chunk?
|
I agree with you on the verification process. I think we'll need a separate design doc (can be more lightweight like a github issue) to scope out the requirements there. I also looked into the 0 transaction case. If there is an entire 10 million consecutive ledger range where there are 0 transactions (could be the case in a new stand alone network), this would create a problem because streamhash does not support indexes where there are 0 keys. But, we can fix this easily in streamhash by allowing a 0 key index which would return not found on every query.
The backfill operation is run in a loop which should handle the case of the tip advancing to the next chunk during the backfill operation: earliest := cat.EarliestLedger()
lastCommitted := lastCommittedLedger(cat)
// Step 1: backfill from the floor up to the last complete chunk at the tip,
// leaving the partial tip chunk to ingestion. Re-pass while the tip moves.
backfilledThrough := int64(-1)
for {
tip, err := networkTip(cfg)
if err != nil {
if lastCommitted < earliest {
fatalf("network tip unavailable and no local history to serve: %v", err)
}
tip = lastCommitted // backend down, but local data exists: serve it
}
anchor := max(tip, lastCommitted)
rangeEnd := lastCompleteChunkAt(anchor)
rangeStart := retentionFloorChunk(rangeEnd, cfg.RetentionChunks, earliest)
midChunk := lastCommitted != chunkLastLedger(chunkID(lastCommitted))
nearTip := int64(tip)-int64(lastCommitted) < LedgersPerChunk
if nearTip && midChunk {
rangeEnd = chunkID(lastCommitted) - 1 // leave the partial resume chunk to ingestion
}
if rangeEnd < rangeStart || rangeEnd <= backfilledThrough {
break
}
if err := executePlan(ctx, cfg, resolve(cfg, rangeStart, rangeEnd)); err != nil {
return err
}
lastCommitted = max(lastCommitted, chunkLastLedger(rangeEnd))
backfilledThrough = rangeEnd
}
resumeLedger := lastCommitted + 1
During the live ingestion phase (after we finished backfilling), it is possible to have hot chunks which are ready to freeze but it is very unlikely. Each freeze would probably complete in a matter of minutes and it takes 14 hours for a hot chunk to reach its capacity. For there to be backlog of hot chunks it means the system is in a bad / degraded state and would need investigation.
yes, RPC won't serve any requests until the backfill loop is complete. But, I don't understand the second question. After the backfill step is complete, we do actually start up captive core to do the live ingestion phase. |
I should worded this better. If I'm backfilling from genesis to now I would assume that the actual current ledger would be at least a chunk or two ahead of the "now" ( In other words
This is a small optimization. This probably doesn't matter in the long run because rpc v2 wouldn't start serving until the backfill is complete and caught up anyways. But I just wanted to call it out Edit: Oh actually the code in your comment might have handled this scenario cause that code would run until it is in the latest chunk which at that point we would start up captive core |
…able) The tx-hash window is now a hardcoded 1,000-chunk (10M-ledger) constant rather than a config key. Drops the [backfill] config row, the config:chunks_per_txhash_index catalog pin and its validateConfig mismatch path, the MaxChunksPerTxhashIndex cap, and the variable payload-width machinery (payload is a flat 3 bytes). Mirrors the change in the companion HTML explorer. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01GtG9MzaQ7h3LqvhQCkWhrg
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 21acdb03b7
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| lifecycleCh := make(chan ChunkID, lifecycleQueueDepth) | ||
| lifecycleCh <- lastCompleteChunkAt(resumeLedger - 1) // seed the first run | ||
| go lifecycleLoop(ctx, cfg, lifecycleCh) | ||
| serveReads() |
There was a problem hiding this comment.
Do not serve before replaying the partial chunk
When startup stops backfill at lastCompleteChunkAt(tip), a first start or a restart whose current tip chunk was not already in a ready hot DB opens an empty resume hot DB and reaches serveReads() before runIngestionLoop has replayed resumeLedger..tip. During that catch-up window, requests for ledgers/transactions that were already at the sampled network tip can return not-found even though the startup text says the partial tip chunk is already served from hot storage; either replay to the sampled tip before serving or make this syncing state explicit to readers.
Useful? React with 👍 / 👎.
…short retention The .bin lifetime descriptions stated it is removed at window finalization, which only holds when retention is at least a window wide. Under retention narrower than a window the window never finalizes; retention pruning removes each .bin as its chunk drops below the floor. Clarified across the streaming workflow doc, the getTransaction design, and the HTML explorer, consistent with the docs' existing invariant and crash-safety text. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01GtG9MzaQ7h3LqvhQCkWhrg
What this delivers
Net of this branch against
feature/full-history: three new design docs underdesign-docs/, plus removal of the stale numbered-doc README — written in aplain, concept-first voice that leans on the pseudocode rather than
re-narrating it in prose.
full-history-streaming-workflow.md— the streaming daemon design:geometry, the catalog and the one write protocol, backfill
(postcondition-driven planning → primitives → execution model), hot-DB
ingestion, the lifecycle run (freeze → rebuild → discard → prune), the
reader contract, and the correctness invariants (INV-1…4) with their audits.
gettransaction-full-history-design.md— the transaction-by-hashsubsystem end to end: the hot
txhashCF, the.bin/.idxstreamhashformats, the rolling window-index rebuild, the
getTransactionquery path,and capacity numbers. Scoped to storage / rebuild / query — the crash-safety
machinery lives in the streaming doc. The streaming doc references it for
tx-index depth, mirroring how the events subsystem already has its own doc.
full-history-design-explorer.html— a self-contained interactivecompanion to both docs (no external dependencies; open in a browser), with a
widget per concept: geometry, the artifact lifecycles and one write protocol,
a crash simulator, the derived last-committed-ledger, the rolling index, a
chunk boundary end to end, the resolver and
executePlan, the startupbackfill loop, the reader contract, and the invariants.
The full-history design docs are consolidated into the top-level
design-docs/alongside the existing events and packfile docs, with the numeric prefixes
dropped.
full-history/design-docs/03-backfill-workflow.mdis unchanged vs.this base (retained as the historical standalone-backfill design; the streaming
doc records that it is subsumed).
Docs only — no code changes.
Notes for review
explorer rather than the raw diff.
🤖 Generated with Claude Code