bench(fullhistory): make apply-load synthetic LCM usable by the read benches at target TPS by chowbao · Pull Request #760 · stellar/stellar-rpc

chowbao · 2026-06-04T18:40:54Z

📊 Results report (addresses #762)

Benchmark report committed at cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-09-synthetic-apply-load.md — the three synthetic datasets (sac 10k / token 9k / soroswap 2.5k TPS @ 600 ms) run through the full read + ingest suite on c6id.8xlarge, with a comparison to the pubnet chunk-5860 baseline.

Datasets + configs + CSVs + machine-readable RESULTS.md are in GCS at gs://rpc-full-history/synthetic-ledgers/2026-06-04-apply-load-20k/.

Summary

Addresses #762 — a controllable synthetic-ledger dataset whose transaction profile we set deliberately, feeding the existing cold-* / hot-* ingest + query benches. Builds on the apply-load driver from the parent branch (a8c82958).

It delivers the issue's acceptance criteria via the --source=lcm path: apply-load-gen.sh runs stellar-core apply-load to emit a framed LedgerCloseMeta stream, and cold-ingest --source=lcm turns it into the cold packfiles (+ hot store) the query benches read — unchanged. This is used instead of the proposed --source=synthetic + ingest/loadtest.ApplyLoad wiring because the pinned go-stellar-sdk doesn't yet include ingest/loadtest (stellar/go-stellar-sdk#5940), so --source=lcm reaches the same goal with no dependency bump. Swapping in loadtest.ApplyLoad later is a drop-in producer change; the ingest/query path is unaffected.

apply-load config

apply-load-gen.sh writes the upstream docs/apply-load-benchmark-*.cfg shape plus meta output, parameterized per profile:

- APPLY_LOAD_MODE="benchmark", APPLY_LOAD_MODEL_TX=sac|custom_token|soroswap
- APPLY_LOAD_MAX_SOROBAN_TX_COUNT = txs-per-ledger (the density knob)
- APPLY_LOAD_LEDGER_MAX_DEPENDENT_TX_CLUSTERS = CLUSTERS (default 8): parallel apply threads, a generation-speed knob only; capped at 8 (known multi-threaded-apply perf issues above that)
- APPLY_LOAD_NUM_LEDGERS = ledgers to close after setup (NUM_LEDGERS)
- METADATA_OUTPUT_STREAM + DISABLE_TX_META_FOR_TESTING=false, BL pre-gen disabled, GENESIS_TEST_ACCOUNT_COUNT ≥ 2× TPL (txs per ledger), quorum/history boilerplate.

Target load shapes (10k SAC / 9k OZ / 2.5k Soroswap @ 600 ms blocks)

TPS is taken at the network's 600 ms block time (CLOSE_TIME_MS), so per-ledger tx count = TPS × 0.6. (The ledger header closeTime is whole seconds in XDR, so the sub-second cadence is modeled by density, not timestamps.)

| profile | model tx | target | txs/ledger @600ms |
|---|---|---|---|
| sac | `sac` | 10 000 TPS | 6 000 |
| token/OZ | `custom_token` | 9 000 TPS | 5 400 |
| soroswap | `soroswap` | 2 500 TPS | 1 500 |

A small NUM_LEDGERS is enough — TPS is set by density, not ledger count.
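The density arithmetic above can be sketched in shell. This is a minimal illustration, not the logic inside apply-load-gen.sh: the function names `txs_per_ledger` and `min_genesis_accounts` are hypothetical, and the 2× factor is taken from the GENESIS_TEST_ACCOUNT_COUNT ≥ 2× TPL note in the config list.

```shell
#!/bin/sh
# Hypothetical helper: per-ledger tx count for a target TPS at a given block time.
# txs/ledger = TPS * CLOSE_TIME_MS / 1000 (integer arithmetic).
txs_per_ledger() {
  tps="$1"
  close_ms="${2:-600}"
  echo $(( tps * close_ms / 1000 ))
}

# Hypothetical helper: lower bound for GENESIS_TEST_ACCOUNT_COUNT (2x txs/ledger).
min_genesis_accounts() {
  echo $(( $1 * 2 ))
}

# Print the density for each profile target from the table.
for target in "sac 10000" "token 9000" "soroswap 2500"; do
  set -- $target
  tpl="$(txs_per_ledger "$2")"
  echo "$1: $tpl txs/ledger, >= $(min_genesis_accounts "$tpl") genesis accounts"
done
```

Because density is the only input, changing CLOSE_TIME_MS rescales every profile without touching NUM_LEDGERS.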

What it took

- apply-load-gen.sh path + pubnet passphrase: absolutize OUT_ROOT; default NETWORK_PASSPHRASE to pubnet (the bench binary hardcodes it).
- Make the synthetic LCM consumable (lcm_fixup.go): apply-load's streamed meta holds the same txs in the tx-set and TxProcessing but in a different order, and the stored result hash matches no envelope under any passphrase, so the SDK ingest reader (which pairs by hash) rejected it with "unknown tx hash in LedgerCloseMeta", breaking roundtrip cold-txpage/cold-txhash. The repair: each result's unique fee-charged account identifies its envelope, so we stamp the real hash (a correct pairing). Default-on (--lcm-fix-tx-hashes). Also adds partial-final-chunk support (--lcm-allow-partial) and cold-ledgers/cold-txhash cursor clamping, so runs below a full 10k-ledger chunk work, plus the NUM_LEDGERS knob.
- Single-contract events corpus (corpus.go): relaxed the ≥3-contract floor to "enough filterable terms (anchors + topic values ≥ max K)"; a single SAC contract's transfer events over many accounts give the needed 15 terms. Pubnet behaviour unchanged.
- sac BATCH_SAC=1: BATCH_SAC>1 folds transfers into one tx and only that tx is streamed, so the pack carried ~1 tx/ledger; =1 makes each transfer its own tx so density equals the TPS target.
- 600 ms block model (CLOSE_TIME_MS, default 600) and CLUSTERS default 8 for generation speed.

Verification — generated all 3 profiles (100 ledgers each, CLUSTERS=8) and decoded the packs

| profile | tx/ledger (decoded) | TPS @600ms | tx-hash fixup | cold-ledgers | cold-txpage | cold-txhash | cold-events |
|---|---|---|---|---|---|---|---|
| sac | 6 000 | 10 000 | 600022 / 600022 | ✅ | ✅ | ✅ miss-rate 0 | ✅ |
| token/OZ | 5 400 | 9 000 | 540028 / 540028 | ✅ | ✅ | ✅ miss-rate 0 | n/a |
| soroswap | 1 500 | 2 500 | 164445 / 164445 | ✅ | ✅ | ✅ miss-rate 0 | ✅ |

Fixup paired 100% (0 skipped) on all three; read benches run 0-errors / miss-rate 0.

Known limitation

cold-events is not supported for token/custom_token — its events are not 4-topic (a workload property). Use sac or soroswap for event benches.

Base note: targeted at rpc-hack, which contains neither a8c82958 nor the 2026-06-03 bench-report commits yet, so the diff also carries those preceding bench(fullhistory) commits plus a merge of rpc-hack (conflicts were stale report docs and a txpage helper, all resolved to rpc-hack; the apply-load files merged cleanly). The substantive new work is the apply-load commits above.

🤖 Generated with Claude Code

Markdown report covering the cross-machine bench run captured under gs://rpc-full-history/benchmarks/{c6id.2xlarge,c6id.4xlarge,c6id.8xlarge,im4gn.4xlarge}-2026-05-21*. Tables + Mermaid xychart-beta blocks for: peak read throughput, worker scaling (cold and hot n=1), tx-page page-size sweep, xdr-views vs round-trip on tx-hash + events-ingest, per-ledger ingest, bulk ingest, cold-vs-hot speedup, and x86 vs Graviton2 at matched vCPU. Source per-iter CSVs and the summary CSVs that back every table here live at gs://rpc-full-history/benchmarks/_summary/. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… report New section 11 transposes the cross-machine tables: one consolidated table per machine (c6id.2xlarge, c6id.4xlarge, c6id.8xlarge, im4gn.4xlarge) listing every bench result — full ledger grid sweep, tx-page, tx-hash (hit/miss × xdrviews/roundtrip), per-ledger ingest, and bulk ingest — with p50/p90/p99 and throughput. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

New Section 2 ("Internal vs production RPC providers") includes the prior black-box benchmark across 4–6 production RPC providers and juxtaposes their p50s with the internal hot/cold tiers. Adds a Mermaid bar/line chart of the per-workload speedups. Remaining sections renumbered 3–12. Headline: hot/cold full-history is 10×–1773× faster than the average production RPC across ledger-point, ledger-range, tx-page, tx-hash, and the four event-filter scenarios. Note: 'onfinality' and 'sorobanrpc' are absent from tx-hash and events workloads (n=4 instead of 6). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Colons in x-axis labels (ev:nofilt, ev:contract, ev:topic, ev:both) break Mermaid's xychart-beta parser. Replaced with hyphens. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

New-data-only report over the 2026-06-03 runs (4 machines) on the rewritten rpc-hack bench harness. Notes methodology changes vs 2026-05-21: ops/s is no longer comparable across runs (only single-in-flight p50 latency is), the sweep axis is now query-concurrency 1-16, and ledger/tx-page/tx-hash read coverage narrowed while events query + ingest stage detail broadened. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Condensed two-table view (typical p50 latency + peak throughput) with a full glossary defining every row, column, tier, and variable (n, page, c, p50/p99, ops/s). Links back to the full cross-machine report. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Adds Table 3 (ingest throughput: hot-ingest ledgers/s, build-txhash-index keys/s) and Table 4 (per-stage ingest cost), plus glossary entries for the ingest workloads and ledgers/s, keys/s, and stage terms. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

cold-ingest ledgers/s computed as sum(chunk_wall) / chunk-workers (upper-bound estimate, since the harness records summed per-chunk wall, not true end-to-end wall). Flagged as an estimate; scales with --chunk-workers. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
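One reading of that estimate, sketched in shell: treat sum(chunk_wall) / chunk-workers as the approximate end-to-end wall time and divide the ledger count by it. The function name `cold_ledgers_per_sec` and the sample inputs are illustrative assumptions, not values from the harness or the report.

```shell
#!/bin/sh
# Hypothetical helper. Args: total ledgers, summed per-chunk wall seconds, chunk workers.
# Estimated wall = sum(chunk_wall) / workers; ledgers/s = ledgers / estimated wall.
cold_ledgers_per_sec() {
  awk -v ledgers="$1" -v sum_wall="$2" -v workers="$3" \
    'BEGIN { printf "%.1f\n", ledgers / (sum_wall / workers) }'
}

# Made-up inputs: 10000 ledgers, 800 s of summed chunk wall, 8 workers.
cold_ledgers_per_sec 10000 800 8
```

Doubling the worker count doubles the result for the same summed wall time, which is why the figure is an upper bound that scales with --chunk-workers.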

…1 report Source/summary CSV paths were missing the dated prefix (data lives under .../benchmarks/2026-05-21/, the undated paths don't exist). Also dates the title and forward-links the 2026-06-03 run, noting the harness changed and ops/s is not comparable across runs. Historical 5/21 numbers are unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Drives the full read + ingest bench suite in bench-fullhistory: builds the binary once, then runs cold+hot ledgers/txpage/txhash/events read benches (each a 1,4,8,16 query-concurrency sweep) plus the hot-ingest, cold-ingest, and build-txhash-index ingest benches. By default the reads use prebuilt fixtures and ingest writes to scratch (independent measurements). INGEST_FIRST=1 instead ingests first and repoints every read bench at the freshly-ingested stores, so the suite is self-contained from a single raw-ledger packfile seed and usable on a fresh machine with no prebuilt data. Paths/sizing knobs are env-overridable for running across different machines. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

PR #750 review (tamirms) flagged two harness gaps and several execution issues.

Code fixes:
- txpage (hot+cold) previously only touched TransactionHash + ResultPair. It never fetched the page contents, so it measured a tx *count*, not a getTransactions response. New walkPageMaterialize (tx_page_helpers.go) builds a full db.Transaction per tx in the page (envelope, result, meta, events, hash, application order, ledger info).
- txpage (hot+cold) had no --xdr-views flag, so it only measured the slow full-decode path. Added --xdr-views with a single-pass view materializer, mirroring the txhash bench. CSVs suffix -roundtrip / -xdrviews; detail column scan_ns -> materialize_ns (decode_ns stays 0 under views).

Execution (run-all-benches.sh):
- Run the decode-heavy query benches (txpage/txhash/events) once per mode (QUERY_VIEW_MODES = roundtrip + xdrviews) so the report can compare with/without XDR views. Previously every query ran views-off (slow path).
- Events use the worst-case query (EVENTS_BUCKETS=15, max filters/request).
- Ingest runs with --parallel; hot-ingest runs both xdr-views on and off (the views run feeds the reads, the parsed run is kept for its CSVs).

Smoke-tested: 0 errors, pages fully materialized; views 4-8x faster than round-trip (decode_ns=0 confirms the path dispatch).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

)

Re-ran c6id.8xlarge with the corrected harness and rewrote the report to address the PR #750 review:
- New "c6id.8xlarge — corrected" section: query latency split into hot/cold tables with roundtrip vs xdr-views columns and P50+P99; events use worst-case K=15; ingest shown hot (parsed vs view, --parallel) and cold with the per-stage phase breakdown + per-ledger driver total.
- The other three machines (2xlarge/4xlarge/im4gn) are marked STALE (old harness: tx-page-as-count, views-off) pending a re-run.
- Dropped the per-machine raw-cell dump (§12); the CSVs are on GCS.
- Summary table: same treatment (banner, corrected c6id.8xlarge rows, stale markers on the rest).

Headline corrected numbers: xdr-views cuts tx-page/tx-hash p50 4-9x (hot tx-hash 10.6->1.2ms) and lifts peak throughput 5-10x (hot tx-hash 706->7253 ops/s); events is decode-insensitive (1.1-1.4x). Hot ingest with views is ~2.1x faster than parsed (skips the 8.4ms/ledger UnmarshalBinary).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ply-load

Adds an `lcm` ledger source and an apply-load-gen.sh driver so the bench-fullhistory suite can run on fully synthetic, density-controlled data instead of real pubnet chunks.
- sources.go: new --source=lcm reader over apply-load's framed-XDR METADATA_OUTPUT_STREAM. Skips setup ledgers (<= --lcm-checkpoint) and decode-free frame-skips to each chunk's 10k-ledger block; reuses the entire cold-ingest/hot-ingest/build-txhash-index pipeline. Wired --lcm-file/--lcm-checkpoint flags into both ingest commands.
- apply-load-gen.sh: drives stellar-core new-db/new-hist/apply-load -> meta.xdr -> cold-ingest --source=lcm -> packfiles -> build-txhash-index. Profiles map to apply-load model txs + target TPS: sac (~10k), token/oz (~9k custom_token), soroswap (~2.5k). Uses the installed core's protocol.
- lcm_source_test.go: unit-tests setup-skip, chunk-block mapping, short-read.
- README: documents the lcm source, the driver, profiles, BUILD_TESTS requirement, and the real cost of full 10k-ledger chunks.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- absolutize OUT_ROOT so the config/meta paths survive the cd into the per-profile work dir (core was erroring "No config file ... found")
- default NETWORK_PASSPHRASE to pubnet to match the bench binary's hardcoded pubnetPassphrase: the ingest reader recomputes each tx hash under this passphrase and matches it against the result entries, so a mismatch broke the roundtrip txpage/txhash read paths with "unknown tx hash in LedgerCloseMeta".

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…benches

apply-load streams a LedgerCloseMeta whose tx-set and TxProcessing are the same transactions in different order, but whose stored result hash does not equal any envelope's real hash under the network passphrase (confirmed against core 26.1.1: 0/N result hashes matched an envelope under pubnet/testnet/standalone, while every envelope's source account was fee-charged in exactly one TxProcessing entry, a clean bijection). The go-stellar-sdk ingest LedgerTransactionReader pairs envelope↔result BY HASH, so it rejected the meta with "unknown tx hash in LedgerCloseMeta", breaking the roundtrip tx-page and tx-hash read benches. (The xdr-views path, which pairs positionally, was unaffected.)
- lcm_fixup.go: for each result, find the fee-charged account, map it back to the unique envelope with that source, and stamp the envelope's real tx hash. This is a correct pairing, not merely self-consistent. cold-ingest --source=lcm applies it by default (--lcm-fix-tx-hashes); logs fixed/skipped per chunk.
- sources.go: lcmStream applies the fixup and tolerates a short final chunk (--lcm-allow-partial) so runs sized below a full 10k-ledger chunk work.
- cold-ledgers / cold-txhash: clamp sampling + start cursors to each chunk's actual ledger range (FirstSeq/LastSeq) so partial chunks don't short-read.
- apply-load-gen.sh: NUM_LEDGERS knob for quick runs. TPS is set by per-ledger density, not ledger count, so a few hundred ledgers hit the profile target.
- README: document the fixup, partial chunks, NUM_LEDGERS, and that cold-events is unsupported on apply-load data (single-contract; corpus needs >=3).

Validated end-to-end: cold-ledgers / cold-txpage / cold-txhash all run with 0 errors on both a 308-ledger SAC store and an 892-ledger / 7.65M-tx token store (fixup paired 7650042/7650042, skipped 0).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The cold-events corpus builder hard-required ≥3 distinct contracts emitting 4-topic events (termsPerCategory anchors), which excluded apply-load's single-contract synthetic workloads. But the real requirement is enough unique FILTERABLE TERMS to fill the K-bucket sweep (a contract anchor plus topic values), not a minimum contract count. A single contract with topic diversity (e.g. a SAC's `transfer` events varying from/to over thousands of accounts) provides them.
- scanForTopTerms: accept ≥1 contract (anchors = min(3, nContracts)); fill the rest of the 15-term budget from topic values. Only fail when NO contract emits 4-topic events.
- newCorpus: validate total terms ≥ max(buckets) (the actual sweep requirement), with a message that points at topic diversity / --buckets, not contracts.

Validated: cold-events now runs the full K=2..15 sweep on a synthetic SAC store (1 contract + 14 topic terms = 15) and a soroswap store (2 contracts + 13). token/custom_token still yields nothing; its events are not 4-topic (a workload property). Existing pubnet-shaped corpus behaviour is unchanged (still picks 3 contract anchors when ≥3 are present).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

With APPLY_LOAD_BATCH_SAC_COUNT=100 the sac profile folded 100 transfers into a single InvokeHostFunction tx, and core's benchmark mode closed/streamed just one such tx per ledger, so the usable pack carried ~100 transfers/ledger (~100 TPS), 100x below the 10k target (verified by decoding the pack: 1 tx, 1 op, ~97 events per ledger). Setting BATCH_SAC=1 makes every transfer its own tx, so the closed ledger carries the full count.

Verified by decoding the regenerated packs (tail/benchmark ledgers):
- sac: 10000 tx / 10000 ops / 10000 events per ledger -> 10000 TPS
- soroswap: 2500 tx / 2500 ops / 12500 events per ledger -> 2500 TPS
- token: 9000 tx / 9000 ops / 9000 events per ledger -> 9000 TPS (unchanged)

All four read benches (cold-ledgers/txpage/txhash/events) run with 0 errors and miss-rate=0 on the sac and soroswap stores.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b75236a8a4

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-06-04T18:45:10Z

+		applyIdx := start + k
+		envelopeRaw, envType, eerr := envAt(applyIdx)


Pair page envelopes by hash instead of index

For --source=lcm apply-load data, lcm_fixup.go documents that the generalized tx set and TxProcessing contain the same transactions in different orders. In the new --xdr-views txpage path, parts[k] comes from TxProcessing[start+k] but the envelope is fetched from the tx set at the same index, so V1/V2 synthetic pages can return a db.Transaction whose envelope does not match its hash/result/meta. This affects the apply-load read benches the commit adds; the roundtrip path avoids it by pairing through the SDK's hash map after fixup.


Resolves 3 conflicts, all from the bench-report lineage (NOT the apply-load work, which merged cleanly):
- results/2026-06-03-cross-machine.md, results/2026-06-03-summary-table.md: took rpc-hack's versions (stale local copies superseded there).
- tx_page_helpers.go: took rpc-hack's version (its newer single-pass envelope collection; supporting helpers come in via rpc-hack's other files).

Package builds and tests pass; the apply-load SAC store still reads cleanly (cold-txpage errors=0, cold-txhash miss-rate=0, cold-events ok).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…rgets

Per issue #762, the synthetic datasets target specific load shapes. The network target is 600ms blocks, so TPS is taken at a 600ms block time: per-ledger tx count = TPS * 0.6. Replaces the 1s assumption (CLOSE_TIME_S) with CLOSE_TIME_MS (default 600). The ledger header closeTime is whole seconds in XDR, so the sub-second cadence can't be a timestamp; it is modeled purely by density.

Resulting per-ledger densities (BATCH_SAC=1):
- sac: 10,000 TPS -> 6,000 txs/ledger
- token/OZ: 9,000 TPS -> 5,400 txs/ledger
- soroswap: 2,500 TPS -> 1,500 txs/ledger

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Reviewed commit: b02f0eb5cc

chatgpt-codex-connector · 2026-06-04T19:15:03Z

+					// End of the framed file. For the final/only chunk this is
+					// expected when the synthetic run was sized below a full
+					// chunk: yield what we have (if allowed) rather than error.
+					if p.opts.allowPartial && isEnd(rerr) {


Treat truncated LCM frames as ingest errors

When --lcm-allow-partial is enabled by default, this branch also accepts io.ErrUnexpectedEOF, which readFrame returns if the file ends in the middle of a length prefix or payload. In an interrupted apply-load run or a partially copied meta.xdr, cold-ingest will silently stop at the last complete ledger and commit a shortened pack instead of surfacing that the input is corrupt; allowing a short final chunk should only accept a clean EOF before the next frame.


… speed

APPLY_LOAD_LEDGER_MAX_DEPENDENT_TX_CLUSTERS sets the number of parallel apply threads: purely a generation-speed knob, it doesn't change the workload. Per upstream guidance, default it to 8 (was per-profile 1/2) and cap there: stellar-core's multi-threaded apply has known perf issues above 8 even on bigger boxes. Promoted to a top-level CLUSTERS env knob; removed from the per-profile table.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
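A minimal sketch of the default-and-cap behaviour for the CLUSTERS knob. The function name `effective_clusters` is hypothetical, and whether the real script clamps or rejects values above 8 is not stated here, so the clamp is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: default to 8 clusters and never exceed the cap of 8.
effective_clusters() {
  requested="${1:-8}"
  cap=8
  if [ "$requested" -gt "$cap" ]; then
    echo "$cap"
  else
    echo "$requested"
  fi
}

# An unset CLUSTERS falls back to the default.
effective_clusters "${CLUSTERS:-}"
```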

chatgpt-codex-connector

💡 Codex Review

Reviewed commit: f2e7fd326d

chatgpt-codex-connector · 2026-06-04T20:17:35Z

+		v1 := lcm.MustV1()
+		for i := range v1.TxProcessing {
+			if h, ok := stamp(v1.TxProcessing[i].FeeProcessing); ok {
+				v1.TxProcessing[i].Result.TransactionHash = h


Pair envelopes after the LCM hash fixup

When --source=lcm ingests V1/V2 apply-load metadata, this only stamps each TxProcessing result with the envelope's real hash but leaves TxProcessing in apply-load order while the generalized tx set remains in its different envelope order (as documented at the top of this file). This is a separate affected path from the existing tx-page note: the xdr-view txhash bench finds the stamped hash in TxProcessing and then materializeViews fetches the envelope by the same apply index, so synthetic LCM packs can return a db.Transaction whose envelope belongs to a different transaction; the fixup needs to make the view materializers pair by hash or otherwise align the two orders.


…parallel stellar-core binds its HTTP server (default port 11626); running multiple apply-load generations concurrently failed the 2nd/3rd with "bind: address already in use". apply-load doesn't need the HTTP endpoint, so default HTTP_PORT=0 (disabled), env-overridable. Lets all profiles generate in parallel — on a 32-vCPU box that cuts a 3-profile 20k run from ~99h sequential to ~the slowest profile (~42h). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
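The parallel generation described above can be sketched as a background-job loop that waits on every PID, so one failed profile fails the whole run. `fake_generate` is a hypothetical stand-in for a real apply-load generation, and `generate_parallel` is an illustrative name, not a function from the scripts.

```shell
#!/bin/sh
# Hypothetical stand-in for one apply-load generation; fails for profile "bad".
fake_generate() {
  [ "$1" != "bad" ]
}

# Launch every profile in the background, then wait on each PID so that a
# single failed generation makes the whole run fail.
generate_parallel() {
  pids=""
  for profile in "$@"; do
    # HTTP_PORT=0 mirrors the default described above (HTTP endpoint disabled).
    ( HTTP_PORT=0; export HTTP_PORT; fake_generate "$profile" ) &
    pids="$pids $!"
  done
  failed=0
  for pid in $pids; do
    wait "$pid" || failed=1
  done
  return "$failed"
}

if generate_parallel sac token soroswap; then
  echo "all profiles generated"
fi
```

Waiting on each PID individually matters: a bare `wait` with no arguments returns 0 regardless of how the children exited.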

chatgpt-codex-connector

💡 Codex Review

Reviewed commit: 2d00293ff1

chatgpt-codex-connector · 2026-06-04T23:27:32Z

+				}
+				payload = raw
+			}
+			if !yield(p.applyFixup(payload, &fx), nil) {


Rewrite LCM header sequences before storing synthetic chunks

When --source=lcm is used with apply-load output whose benchmark ledger numbers do not already match the target chunk range (the script maps chunk 1 to ledger 10002+), this yields the raw LedgerCloseMeta unchanged while the driver indexes it under the synthetic l.Seq. Later read paths derive response metadata from the raw header (ledgerInfoFromHeader for tx pages and events.LCMToPayloads* for event payloads), so getTransaction/getTransactions/getEvents can return ledger sequences from the apply-load file rather than the pack/index sequence that was queried. The stream should rewrite the LCM header sequence to the positional chunk sequence before yielding/storing it.


Make the synthetic-ledger generation reproducible on another machine. The per-profile generator (apply-load-gen.sh) and the meta fixup were already committed; this adds the orchestration + docs that were previously ad-hoc:
- synthetic-run.sh: loop profiles -> apply-load-gen.sh (generate) -> bench-suite.sh (read benches) -> optional GCS upload. Sequential by default; PARALLEL=1 opt-in. Auto-builds the bench binary if BENCH_BIN unset.
- bench-suite.sh: cold-* and hot-* read suite per profile (both decode modes, concurrency sweep); skips events for non-4-topic profiles (token).
- SYNTHETIC-LEDGERS.md: host prereqs (~buildtests core, RocksDB cgo, Go), the TPS/600ms model, run commands, outputs, and the RAM ceiling.

RAM is the real limit: dense apply-load accumulates in-memory soroban state (~8.5 MB/ledger at 6000 SAC tx/ledger), so a full 10k-ledger 10k-TPS SAC chunk needs ~96-128 GB; on a 61 GB box cap sac/token near ~6000 ledgers. Documented with a per-box sizing table so the run can target a larger machine.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Reviewed commit: 3ea7cc40c7

chatgpt-codex-connector · 2026-06-06T17:04:48Z

+gen_one(){
+  local P="$1"
+  log "generate $P (num_ledgers=$NUM_LEDGERS clusters=$CLUSTERS close_ms=$CLOSE_TIME_MS)"
+  PROFILE="$P" "$SCRIPT_DIR/apply-load-gen.sh" > "$OUT_ROOT/$P.gen.log" 2>&1


Propagate generation failures before benchmarking

When apply-load-gen.sh fails for a profile (for example because stellar-core OOMs or no meta.xdr is produced), this command returns non-zero but the function immediately runs log, so gen_one itself succeeds; with set -uo pipefail the sequential loop continues and the parallel wait path can also miss failures. The orchestrator can then run benches or upload using missing/stale cold stores and still print DONE with exit 0 instead of stopping on the failed generation.
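One way to address this, sketched under assumptions: capture the generator's exit status before logging and return it, so the sequential loop can stop. `fake_gen` is a hypothetical stand-in for apply-load-gen.sh, and `run_all` is an illustrative loop, not the orchestrator's actual code.

```shell
#!/bin/sh
# Hypothetical stand-in for apply-load-gen.sh; fails for the "token" profile.
fake_gen() {
  [ "$1" != "token" ]
}

# Capture the generator's exit status before logging, and return it.
gen_one() {
  profile="$1"
  rc=0
  fake_gen "$profile" || rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "generate $profile FAILED (exit $rc)" >&2
  else
    echo "generate $profile ok"
  fi
  return "$rc"
}

# Sequential loop: stop before benchmarking if any generation fails.
run_all() {
  for profile in "$@"; do
    gen_one "$profile" || return 1
  done
  echo "DONE"
}

run_all sac soroswap
```

With this shape, DONE is only printed when every profile generated successfully; a failure short-circuits the loop with a non-zero status.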


… terms

The events corpus hard-required exactly-4-topic events, so apply-load's custom_token profile (whose transfer events carry 3 topics) produced zero terms and couldn't run the events bench at all.
- EVENTS_TOPIC_COUNT env (default 4) sets the required topic count; extractors and the scan loop use it instead of a literal 4. sac/soroswap (4-topic) unchanged; token runs with EVENTS_TOPIC_COUNT=3.
- newCorpus: instead of erroring when the workload can't reach max(buckets), CAP the K-bucket sweep to the terms available (dedup), logging the cap. Lets low-diversity workloads run at the largest K they support.

Verified: token (3-topic, 1 contract) now builds the full 15-term universe (contract + transfer symbol + 13 from-addresses) and runs the K=1..15 sweep; cold-events 235ms@c=1 / 111 ops peak, hot-events 14ms / 1140 ops.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Reviewed commit: 2fd1966903

chatgpt-codex-connector · 2026-06-09T05:00:35Z

+  # ---- COLD read benches (auto-discover chunk range) ----
+  for n in $LEDGER_NS; do
+    "$BENCH_BIN" cold-ledgers --cold-dir="$COLD/ledgers" --n="$n" --iters="$LEDGERS_ITERS" \
+       --query-concurrency="$QC" --out="$O" > "$O/cold-ledgers-n$n.log" 2>&1 || echo "  cold-ledgers n=$n FAILED"


Return failure when a benchmark command fails

When any bench invocation fails (for example a missing txhash.idx or a cold-txpage error), the || echo ... FAILED branch consumes the non-zero status, and this script has no failure accumulator or final non-zero exit. As a result, direct bench-suite.sh runs and callers such as synthetic-run.sh can report/upload an incomplete result set as successful; if continuing through the remaining benches is desired, record the failure and exit 1 at the end.
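The "record the failure and exit 1 at the end" suggestion could look roughly like this. The names `run_bench` and `suite_status` are hypothetical, and `true` / `false` stand in for real bench invocations.

```shell
#!/bin/sh
failures=0

# Run one bench command; on failure, record it and keep going.
run_bench() {
  name="$1"
  shift
  if ! "$@"; then
    echo "  $name FAILED" >&2
    failures=$((failures + 1))
  fi
}

# Exit status for the whole suite: non-zero if any bench failed.
suite_status() {
  [ "$failures" -eq 0 ]
}

# `true` and `false` stand in for real bench invocations here.
run_bench "cold-ledgers n=1" true
run_bench "cold-txpage" false
run_bench "cold-txhash" true
suite_status || echo "$failures bench(es) failed" >&2
```

In a real script the last line would be `suite_status || exit 1`, so callers such as synthetic-run.sh see the failure while the remaining benches still run.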


chatgpt-codex-connector · 2026-06-09T05:00:35Z

+CHECKPOINT="$(grep -oE 'Published final checkpoint before benchmark: ledger [0-9]+' "$APPLY_LOG" \
+  | grep -oE '[0-9]+$' | tail -1 || true)"
+CHECKPOINT="${CHECKPOINT:-0}"


Fail when the apply-load checkpoint cannot be found

If stellar-core does not emit this exact log message (or the wording changes), CHECKPOINT silently falls back to 0, so cold-ingest --source=lcm skips no pre-benchmark setup frames. The generated cold store then includes setup/account-creation ledgers before the dense benchmark ledgers, which corrupts the target TPS/density assumptions for the synthetic read benches; this should error unless the checkpoint was actually found or explicitly provided.
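A sketch of the stricter behaviour, assuming an explicitly set CHECKPOINT should take precedence over the log. The function name `resolve_checkpoint` and the error text are hypothetical; the grep pattern is the one from the snippet above.

```shell
#!/bin/sh
# Print the checkpoint ledger: an explicit CHECKPOINT wins, otherwise parse the
# apply-load log. Fail (non-zero) instead of silently defaulting to 0.
resolve_checkpoint() {
  log_file="$1"
  if [ -n "${CHECKPOINT:-}" ]; then
    echo "$CHECKPOINT"
    return 0
  fi
  found="$(grep -oE 'Published final checkpoint before benchmark: ledger [0-9]+' "$log_file" \
    | grep -oE '[0-9]+$' | tail -1 || true)"
  if [ -z "$found" ]; then
    echo "error: checkpoint line not found in $log_file; set CHECKPOINT explicitly" >&2
    return 1
  fi
  echo "$found"
}

# Demo against a throwaway log containing the expected line.
demo_log="$(mktemp)"
echo "Published final checkpoint before benchmark: ledger 63" > "$demo_log"
resolve_checkpoint "$demo_log"
rm -f "$demo_log"
```

A caller would then write something like `CHECKPOINT="$(resolve_checkpoint "$APPLY_LOG")" || exit 1`, so a changed log message stops the run instead of ingesting setup ledgers.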


Results report for the three synthetic apply-load datasets (sac/token/soroswap, 10k SAC / 9k OZ / 2.5k Soroswap TPS @ 600ms blocks) run through the full read + ingest suite on c6id.8xlarge. Datasets, configs, CSVs, and RESULTS.md are in GCS at gs://rpc-full-history/synthetic-ledgers/2026-06-04-apply-load-20k/. Covers query latency (both decode paths, p50/p99), peak throughput, per-stage ingest, and a comparison to the pubnet chunk-5860 baseline (synthetic is ~5-9x slower per query due to per-ledger density; ingest is item-bound). Implements #762's acceptance criteria via --source=lcm (no SDK loadtest dependency). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…d results (#762)

Self-contained, offline-capable HTML explorer (same tool as #758) for the three synthetic datasets. The "profile" dimension (sac/token/soroswap) maps onto the explorer's per-subdir axis; UI labels relabeled machine->Profile.
- adds make_explorer.py (from #758) + the generated 2026-06-09-synthetic-apply-load-explorer.html (all CSVs embedded; no deps).
- fixes the cold-throughput calc: divide chunk_wall by the actual chunk-workers (= chunk count for these runs; override via COLD_CHUNK_WORKERS) instead of a hardcoded 8, so cold ledgers/s reads correctly (sac 20, token 38, soroswap 108).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ce8d9c272d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you:

- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-06-09T15:07:28Z

+HOT="${HOT:-1}"                    # 1 = also build a hot store per profile and run hot-* benches
+# Profiles whose events are NOT 4-topic (skip cold/hot-events). apply-load's
+# custom_token emits non-4-topic events; sac/soroswap are fine.
+NO_EVENTS="${NO_EVENTS:-token}"


Skip events for the oz profile alias

When PROFILES uses the supported oz alias for custom_token (accepted by apply-load-gen.sh), this default does not skip event benches even though the same non-4-topic event shape applies. The suite then runs cold-events/hot-events for oz and hits the corpus error path instead of producing a clean read-bench run; include oz in the default skip list or normalize the profile name before this check.
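
The "normalize the profile name before this check" option could look roughly like the sketch below. Both helpers are hypothetical, and two things are assumptions rather than facts from the PR: that `oz`, `custom_token`, and `token` all name the same profile, and that `NO_EVENTS` is a space-separated list.

```shell
#!/usr/bin/env bash
# Sketch: treat profile aliases as one canonical name before the skip check.
# normalize_profile and skips_events are hypothetical helpers.
normalize_profile() {
  case "$1" in
    # Assumed alias set for the custom_token profile.
    oz|custom_token|token) echo "token" ;;
    *) echo "$1" ;;
  esac
}

# Returns 0 when the normalized profile appears in NO_EVENTS (assumed space-separated).
skips_events() {
  local profile skip
  profile="$(normalize_profile "$1")"
  for skip in ${NO_EVENTS:-token}; do
    if [ "$(normalize_profile "$skip")" = "$profile" ]; then
      return 0
    fi
  done
  return 1
}
```

Normalizing both sides of the comparison means the default `NO_EVENTS=token` covers `oz` and `custom_token` without anyone having to list every alias.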


…er + report (#762)

Include the real pubnet (non-synthetic) chunk-5860 baseline alongside the three synthetic datasets, on the same c6id.8xlarge, so the comparison is interactive and in the tables, not just prose.

- explorer: replace the synthetic-only HTML with 2026-06-09-synthetic-vs-pubnet-explorer.html: 4 datasets (pubnet + sac/token/soroswap), 200 query rows. pubnet contributes its query sweeps (the headline comparison); the ingest tab stays synthetic-only (pubnet used 8 chunk-workers vs synthetic 1-2, so a single throughput divisor would misreport it; its ingest lives in results/2026-06-03-cross-machine.md). Dimension relabeled Profile -> Dataset.
- report: add a pubnet column to the query-latency and throughput tables, and link the explorer.

Pubnet query sweeps pulled from gs://.../benchmarks/2026-06-03/c6id.8xlarge-... -corrected (same harness/CSV layout as the synthetic run).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 23bee2f951


chatgpt-codex-connector · 2026-06-09T15:21:06Z

+    "$BENCH_BIN" cold-ledgers --cold-dir="$COLD/ledgers" --n="$n" --iters="$LEDGERS_ITERS" \
+       --query-concurrency="$QC" --out="$O" > "$O/cold-ledgers-n$n.log" 2>&1 || echo "  cold-ledgers n=$n FAILED"


Preserve each ledger-size sweep CSV

With the default LEDGER_NS="1 10 20", every iteration writes to the same --out=$O directory while cold-ledgers uses fixed output names (cold-ledgers.csv/cold-ledgers-sweep.csv via createCSV, which truncates with os.Create). The per-n logs are distinct, but the CSV data for n=1 and n=10 is overwritten by the later runs, so the result directory and explorer only retain the last ledger-size sweep; the hot-ledgers loop below has the same pattern.
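
A minimal sketch of one possible fix is to give every ledger-size run its own output directory, so the fixed CSV names can no longer collide. `sweep_out_dir` and the `<bench>-n<n>` directory layout are hypothetical; the explorer and any CSV collection step would need to read the new layout as well.

```shell
#!/usr/bin/env bash
# Sketch: derive a per-n output directory for a ledger-size sweep.
# sweep_out_dir is a hypothetical helper; the naming scheme is an assumption.
sweep_out_dir() {
  local base="$1" bench="$2" n="$3"
  local dir="$base/$bench-n$n"
  # Create the directory up front so the bench can write its CSVs there.
  mkdir -p "$dir"
  printf '%s\n' "$dir"
}
```

Inside the existing loop, the call from the diff would keep its flags and only change `--out="$O"` to `--out="$(sweep_out_dir "$O" cold-ledgers "$n")"`. The hot-ledgers loop could use the same helper with `hot-ledgers` as the bench name.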


tamirms · 2026-06-10T08:19:16Z

@@ -0,0 +1,168 @@
+package main


this is no longer necessary now that stellar/stellar-core#5319 has been merged. Even without that fix in stellar-core, the better workaround is to just configure the network passphrase in the benchmark commands to be "Apply Load"

tamirms · 2026-06-10T08:22:50Z

 const (
 	sourcePack = "pack"
 	sourceBSB  = "bsb"
+	sourceLCM  = "lcm"


instead of introducing a new source, I think it would be better to have a post-processing step after apply-load which takes the generated ledgers and converts them into the pack format. then we can run the benchmarks with the source set to pack, pointing at the synthetic ledgers packfile

…t p99 tail (#762)

results/rocksdb-config.md: per-CF key knobs (events/ledgers/txhash) extracted from the OPTIONS files RocksDB wrote, plus the full verbatim events-CF OPTIONS.

Reveals the p99 ingest-tail cause: events & ledgers CFs run on RocksDB defaults (auto-compaction on, max_background_jobs=2, L0 slowdown@20/stop@36), while txhash is tuned write-once (disable_auto_compactions, L0 triggers 999, 8 bg jobs). The events CF's default L0 throttling under dense writes is what produces the ~8x p99/p50 on events hot_write. Linked from the synthetic report.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chowbao and others added 19 commits May 21, 2026 19:37

bench(fullhistory): fix Mermaid parse error in Section 2 RPC chart

ef9351b

Colons in x-axis labels (ev:nofilt, ev:contract, ev:topic, ev:both) break Mermaid's xychart-beta parser. Replaced with hyphens. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Merge branch 'rpc-hack' into bench/cross-machine-report-2026-05-21

02c4122

bench(fullhistory): add per-machine RAM to summary-table glossary

e542ee9

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector Bot reviewed Jun 4, 2026


Simon Chow and others added 2 commits June 4, 2026 18:45

chatgpt-codex-connector Bot reviewed Jun 4, 2026


chatgpt-codex-connector Bot reviewed Jun 6, 2026


chatgpt-codex-connector Bot reviewed Jun 9, 2026


Simon Chow and others added 2 commits June 9, 2026 14:49

chatgpt-codex-connector Bot reviewed Jun 9, 2026


chowbao mentioned this pull request Jun 9, 2026

bench(fullhistory): synthetic-ledger datasets via stellar-core apply-load as an ingest/query bench source #762

Closed

chowbao added this to Platform Scrum Jun 9, 2026

github-project-automation Bot moved this to To Do in Platform Scrum Jun 9, 2026

chowbao moved this from To Do to Needs Review in Platform Scrum Jun 9, 2026

chowbao added this to the platform sprint 72 milestone Jun 9, 2026

tamirms reviewed Jun 10, 2026


chowbao removed this from the platform sprint 72 milestone Jun 16, 2026

tamirms mentioned this pull request Jun 23, 2026

Investigate and fix the issues surfaced by the O3 ingestion load test #800

Open

		applyIdx := start + k
		envelopeRaw, envType, eerr := envAt(applyIdx)

