Streaming daemon: Phase 2 layer 1 — hot store + lifecycle (#816)#820
Draft
chowbao wants to merge 8 commits into
This was referenced Jun 24, 2026
chowbao added a commit that referenced this pull request Jun 25, 2026
…hanges
Rebased onto the updated #820 and propagated #817's API changes into the Phase 2 live-ingestion/daemon layer:
- window -> tx-hash index rename + key prefix index: -> txhash_index: (TxHashIndexCoverage.Index, Catalog.txhashIndex), Catalog.Get/Has -> get/has, config sections regrouped (cfg.Retention/Layout/Storage/Ingestion), pins via PinLayout.
- daemon.go merge: kept #821's live-ingestion wiring (LifecycleConfig + Core) and deduped the HotProbe line (#821's Phase-2 wiring already set it, so #820's HotProbe fix is redundant here).
- removed the #819 cold-only catch-up E2E (TestRunDaemon_CatchUpMaterializes...) + its someTxBackend/oneTxLCMBytes helpers: #821's daemon now requires Boundaries.Core and runs a continuous live loop, so a cold-only "catch up then return" test can't fit, and TestE2E_DaemonLifecycle covers it end to end.
Mechanical propagation only; build/vet/test -short green (the heavy lifecycle E2E stays -short-gated).
Rebased onto #817's foundations after the round-2 review reworked them: the #824 geometry+catalog subpackage split, Windows support dropped, and the crash-test hooks removed.
The primitives spine (processChunk, buildTxhashIndex, buildThenSweep + the cold backfill source order) stays in package streaming and now imports the new subpackages:
- *Catalog -> *catalog.Catalog. The two former *Catalog helpers (txhashBinInputs, windowDemotedTxhashRefs) become free functions in streaming over the catalog's exported API, since the type now lives in another package and methods can't be added across the boundary.
- Key/state/layout/index types and the fsync barrier (BarrierNewFile) resolve through geometry.*; the ArtifactSet -> ingest.Config translation stays in streaming (ingestConfigFor) so catalog keeps its one-way dependency on geometry alone (the #824 split invariant).
Crash-test hooks dropped to match #817. The in-method ordering observations (afterMarkFreezing / afterBarrier / afterIndexMark / afterCommitBeforeSweep) are gone; the §7.6 crash matrix is reconstructed hook-free through the public protocol and the buildTxhashIndex(commit) / buildThenSweep(commit+sweep) seam, asserting recovery convergence on the durable states a crash leaves behind. The pure mid-method ordering checks are deferred to the fault-injection harness (#823), mirroring catalog's TestCrashSafety_FileWrittenKeyNotFlipped.
go build, go vet, gofmt and go test (streaming + catalog + geometry + ingest, RocksDB 10.9.1 cgo toolchain) are all green.
Part of #815.
… service wiring (closes #815)
Derived progress/watermark (recomputed from durable keys), the resolve catalog diff -> Plan, executePlan (one bounded worker pool; index builds wait on their in-coverage chunk builds; withRetries with exponential backoff), and the cold-only startStreaming (networkTip-bounded catch-up loop -> serveReads handoff; no hot tier / live loop / lifecycle goroutine in Phase 1).
Wires the daemon entrypoint (LoadConfig -> validateConfig -> locks -> supervised loop), the CLI full-history-streaming subcommand, and the folded-in cold metrics: the daemon builds a Prometheus registry and drives the cold tier through ingest.ColdService + NewPrometheusSink (ProcessConfig.Sink).
Closes #815.
Comment-only reviewability pass over the orchestration + daemon layer
(no code changes — verified via comment-stripped diff). Keeps each
canonical explanation once, shrinks per-function docs to what is unique,
and collapses multi-line inline re-explanations; invariants, the "why"
behind non-obvious choices, design-doc citations, and all directive
comments are preserved. ~25% fewer comment words across the eight files.
Also corrects stale references that drifted from earlier layers:
- resolve.go / execute.go: IndexBuild and BuildConfig live in txindex.go,
not the never-named "build.go".
- startup.go: validateConfig is wired now (RunDaemon calls it before
startStreaming); drop the "Phase D / not done here" note.
- doc.go: the file map now lists the files this cold-only package
actually has (the hot tier, lifecycle, recovery, and audit files are
Phase 2), instead of files that do not exist yet.
Foundations (#817) renamed Paths.LockRoots() -> RootsToLock() to disambiguate from the package-level LockRoots() that acquires the flocks. This adapts the daemon's lock-acquisition call after rebasing onto those foundations; the package func call LockRoots(...) is unchanged. Surfaced by the rebase as a build break (paths.LockRoots undefined); no behavior change. build/vet/test -short green.
… exit
Review follow-ups to the Phase 1 orchestration + daemon layer:
- Executor test coverage:
- Cancel while an INDEX build is parked in its dependency wait (holding no
slot) now has a direct test: it must unblock via <-gctx.Done() and never
run on a chunk that never froze. The existing ContextCancelAborts only
covered a parked CHUNK build.
- The Rebuild metric emitted from the real executePlan index path is now
asserted end-to-end (one Rebuild per IndexBuild, chunks == Hi-Lo+1); it was
previously only exercised by a direct unit call on the sink.
Both run green under -race.
- startup.go: log a clean-shutdown line after ServeReads returns. In cold-only
Phase 1 the production ServeReads is a no-op (reads stay on the v1 SQLite
daemon until the #772 cutover), so the daemon exits immediately after
catch-up — the new line makes that an explicit, expected event rather than
looking like a misconfiguration. The "serving reads" log is reworded to
"handing off to the read server" to match.
- observability.go: note on the Metrics interface that LastCommitted /
ChunkBoundary / Freeze / Prune / ColdTierBytes are Phase-2-wired and have no
caller in this cold-only layer, so a reader doesn't hunt for one.
gofmt clean; go test -race green on the streaming package (RocksDB 10.9.1 cgo).
Closes #815.
Closes the one #815 acceptance criterion previously proven only at the primitive level: that the daemon, booted from one TOML, catches up to the tip and materializes all three cold data types PLUS the window index through the REAL entrypoint (RunDaemon -> validateConfig -> catchUp -> executePlan -> processChunk -> buildTxhashIndex -> buildThenSweep), then serves.
The existing daemon happy-path test sits the tip inside chunk 0, so its catch-up is a deliberate no-op. The injected backend serves a complete chunk 0 of mostly zero-tx ledgers with a sparse few carrying one transaction, so the chunk's txhash .bin has keys to index (a wholly zero-tx chunk cannot build an index).
cpi=1 makes the single-chunk window index terminal, so the test asserts the full txhash lifecycle: ledgers + events frozen on disk, one frozen index coverage [0,0] with its .idx present, and the per-chunk .bin key demoted + swept. workers=1 also exercises the index-waits-on-its-chunk path through the daemon. Cheap ledger bodies keep the full ~10k-ledger catch-up under ~0.3s, so it stays in -short.
Test-only change; no production code touched. Part of #815.
Rebased the orchestration/daemon layer onto the reorganized #818 (geometry + catalog subpackages) and propagated #817's API changes:
- qualify moved symbols: catalog.{Catalog,ArtifactSet,NewArtifactSet,AllArtifacts,NewCatalog}; geometry.{Kind*,State*,Layout,NewLayout,TxHashIndexID,TxHashIndexCoverage,TxHashIndexLayout,NewTxHashIndexLayout,MaxChunksPerTxhashIndex,LastCompleteChunkAt}
- window -> tx-hash-index rename (Windows->TxHashIndexLayout, FrozenCoverage->FrozenTxHashIndex, AllIndexKeys->AllTxHashIndexKeys, IndexBuild.Window->.Index)
- config regroup: cfg.Backfill.ChunksPerTxhashIndex->cfg.Layout.*, cfg.Streaming.*->cfg.Retention.*/cfg.Storage.*; daemon_test config literals updated
- tests reach the index layout via the public cat.TxHashIndexLayout() (txhashIndex is now an unexported field of the catalog package)
build + vet + go test -short green on ./cmd/stellar-rpc/internal/fullhistory/...
Restacked on the split/no-hooks #819 and ported the hot tier across the new package boundary:
- hot key schema -> geometry (HotState/HotReady/HotTransient, exported HotChunkKey/ParseHotChunkKey/HotChunkPrefix); hot catalog methods -> catalog (HotState, PutHotTransient, FlipHotReady, DeleteHotKey, {Ready,}HotChunkKeys)
- processChunk hot-source branch + progress hot refinement (lastCommittedLedger(cat, probe), highestReadyChunkSigned, refineWithHotDB)
- new files: pkg/stores/hotchunk, streaming/{eligibility,hotsource,ingest,lifecycle}
- daemon wires the cold-only catch-up's HotProbe (NewRocksHotProbe)
- crash-hooks REMOVED to match #817/#818 (the split makes cat.hooks unreachable from streaming); the one beforeHotTransient hook test is dropped, the rest are the structural crash tests #817/#818 established
- propagated renames: window->tx-hash-index, RetentionGate->RetentionFloor, cat.Has->public HotState, cat.layout->Layout()
build + vet + go test -short green on ./cmd/stellar-rpc/internal/fullhistory/...
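
The transient/ready hot key states named above can be pictured with a small in-memory sketch. Everything here is assumed for illustration: the map-backed catalog, the lower-case method names, and the rule that a key may only flip to ready from transient. The real PutHotTransient, FlipHotReady, and DeleteHotKey operate on the durable catalog, and their exact semantics are not shown on this page.

```go
package main

import (
	"errors"
	"fmt"
)

// hotState is a hypothetical mirror of the hot key states in the PR.
type hotState int

const (
	hotAbsent    hotState = iota // no hot key recorded for the chunk
	hotTransient                 // hot DB exists but is not yet trusted
	hotReady                     // hot DB may serve reads and freezes
)

// hotCatalog is an in-memory stand-in for the durable catalog.
type hotCatalog struct{ states map[uint32]hotState }

func newHotCatalog() *hotCatalog {
	return &hotCatalog{states: map[uint32]hotState{}}
}

// putTransient records a new hot key; it refuses to overwrite an existing one.
func (c *hotCatalog) putTransient(chunk uint32) error {
	if c.states[chunk] != hotAbsent {
		return errors.New("hot key already present")
	}
	c.states[chunk] = hotTransient
	return nil
}

// flipReady promotes a transient key; any other starting state is an error.
func (c *hotCatalog) flipReady(chunk uint32) error {
	if c.states[chunk] != hotTransient {
		return errors.New("flip requires transient state")
	}
	c.states[chunk] = hotReady
	return nil
}

// deleteKey drops the hot key regardless of its current state.
func (c *hotCatalog) deleteKey(chunk uint32) { delete(c.states, chunk) }

func main() {
	cat := newHotCatalog()
	fmt.Println(cat.putTransient(7), cat.flipReady(7)) // prints "<nil> <nil>"
	fmt.Println(cat.states[7] == hotReady)             // prints "true"
	fmt.Println(cat.flipReady(8))                      // prints "flip requires transient state"
}
```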
chowbao added a commit that referenced this pull request Jun 25, 2026
Rebased the live-ingestion capstone onto the reorganized #820 and propagated:
- qualify moved symbols (geometry./catalog.) in daemon.go, startup.go, e2e_test.go
- window->tx-hash-index + RetentionGate->RetentionFloor renames; cat.layout->Layout(), cat.Has->public HotState shim, .IndexFilePath->.TxHashIndexFilePath
- config regroup: cfg.Streaming.CaptiveCoreConfig -> cfg.Ingestion.CaptiveCoreConfig
- restored #821's daemon_test.go (drops the cold-only catch-up test the full daemon supersedes; adds the supervise/backend-tip/boundaries tests) + the HotProbe/Core wiring
- avoided the txhash_txhash_index find-replace corruption (was only in the dropped restack)
build + vet + go test -short green EXCEPT the lifecycle E2E, whose generated TOML still uses the pre-regroup [streaming]/[backfill] schema (follow-up; per maintainer the stack will be re-rebased).
chowbao added a commit that referenced this pull request Jun 26, 2026
Relocate the one-write protocol ordering helper (mark -> create -> barrier -> flip) from backfill/process.go to catalog_protocol.go, where the protocol's states and mark/flip steps already live, and export it as catalog.OneWrite. processChunk and buildTxhashIndex now call it across the package boundary. It is a zero-dependency pure function and catalog never imports backfill, so there is no import cycle; #820's hot-tier openHotTierForChunk adopts it as the third caller by import alone, with no later relocation. Addresses the #818 review thread that asked to establish the shared helper here rather than deferring the move to #820.
Part of #816: Phase 2 (Live ingestion + lifecycle), layer 1 of 2. Stacked on #PR1C (streaming-phase1-daemon, Phase 1).

The hot tier + lifecycle machinery:
- hotchunk DB (multi-CF: ledgers + events + tx-hash; transient/ready state machine; read-only freeze view)
- hotsource (the ready-hot-DB read source for backfillSource + freeze)
- … MaxCommittedSeq read)
- ingest.HotService RunHot/RunCold stream-drain orchestration + its ingest_test cases (verified zero production callers)

The machinery is tested standalone here (seeded hot DBs + direct ticks); the daemon wiring lands in the capstone layer.

Verification: go build + go vet + go test -short green on ./cmd/stellar-rpc/internal/fullhistory/... (cgo RocksDB toolchain). Note: the full cmd/stellar-rpc binary link requires the Rust libpreflight/libxdr2json (CI make build-libs); go vet ./cmd/stellar-rpc/ type-checks the entrypoint locally. golangci-lint runs in CI.

Stack: streaming-phase2-lifecycle → streaming-phase1-daemon