From ad7410de16ba526e0ce2bc6bab5bec92d4278a4a Mon Sep 17 00:00:00 2001 From: Simon Chow Date: Thu, 21 May 2026 19:37:44 +0000 Subject: [PATCH 01/27] bench(fullhistory): add 2026-05-21 cross-machine results report Markdown report covering the cross-machine bench run captured under gs://rpc-full-history/benchmarks/{c6id.2xlarge,c6id.4xlarge,c6id.8xlarge,im4gn.4xlarge}-2026-05-21*. Tables + Mermaid xychart-beta blocks for: peak read throughput, worker scaling (cold and hot n=1), tx-page page-size sweep, xdr-views vs round-trip on tx-hash + events-ingest, per-ledger ingest, bulk ingest, cold-vs-hot speedup, and x86 vs Graviton2 at matched vCPU. Source per-iter CSVs and the summary CSVs that back every table here live at gs://rpc-full-history/benchmarks/_summary/. Co-Authored-By: Claude Opus 4.7 (1M context) --- .../results/2026-05-21-cross-machine.md | 213 ++++++++++++++++++ 1 file changed, 213 insertions(+) create mode 100644 cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-05-21-cross-machine.md diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-05-21-cross-machine.md b/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-05-21-cross-machine.md new file mode 100644 index 000000000..ec3b1450f --- /dev/null +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-05-21-cross-machine.md @@ -0,0 +1,213 @@ +# stellar-rpc full-history bench comparison + +Cross-machine summary of `cmd/stellar-rpc/scripts/bench-fullhistory` runs. +Source per-iter CSVs live at `gs://rpc-full-history/benchmarks//`; +the summary CSVs that back every table here are at `gs://rpc-full-history/benchmarks/_summary/`. + +## 1. 
Test machines + +| Instance | Arch | vCPUs | RAM | Local disk | CPU | +|---|---|---|---|---|---| +| c6id.2xlarge | x86_64 | 8 | 16 GB | 441 GB NVMe | Intel Xeon Platinum 8375C @ 2.90GHz | +| c6id.4xlarge | x86_64 | 16 | 30 GB | 884 GB NVMe | Intel Xeon Platinum 8375C @ 2.90GHz | +| c6id.8xlarge | x86_64 | 32 | 62 GB | 1900 GB NVMe | Intel Xeon Platinum 8375C @ 2.90GHz | +| im4gn.4xlarge | aarch64 | 16 | 61 GB | 6929 GB NVMe | AWS Graviton2 (Neoverse-N1) | + +All four ran the same bench binary (Go 1.26.3, RocksDB 10.9.1, zstd 1.5.7) +on identical data (chunks 5859–5999 cold, chunk 5000 hot, chunk 5999 for ingest). +Data lives on a local NVMe instance store on every machine, not EBS. + +## 2. Read performance: peak throughput + +Best ops/sec the machine reaches across the worker sweep (1–32 workers) for +each tier × ledgers-per-read. Cold = page-cache-evict + fresh open per iter; +hot = shared RocksDB handle + 100-iter warmup. + +| Machine | Cold n=1 | Cold n=10 | Cold n=20 | Hot n=1 | Hot n=10 | Hot n=20 | +|---|---|---|---|---|---|---| +| c6id.2xlarge | 3,075 | 451 | 221 | 1,855 | 208 | 106 | +| c6id.4xlarge | 5,458 | 868 | 460 | 3,542 | 390 | 195 | +| c6id.8xlarge | 5,680 | 1,390 | 742 | 5,608 | 608 | 299 | +| im4gn.4xlarge | 4,790 | 865 | 404 | 2,987 | 353 | 176 | + +```mermaid +xychart-beta + title "Peak ops/sec — single ledger read (n=1)" + x-axis [c6id.2xlarge, c6id.4xlarge, c6id.8xlarge, im4gn.4xlarge] + y-axis "ops/sec" 0 --> 7000 + bar [3075, 5458, 5680, 4790] + line [1855, 3542, 5608, 2987] +``` +*Bar = cold tier peak, line = hot tier peak. Cold beats hot at peak across every machine — +cold random-chunk reads parallelize across 141 different packfiles, while hot reads contend on a single RocksDB handle.* + +## 3. Worker scaling (cold n=1) + +How throughput scales with worker count on each machine. Cold n=1 is the most +I/O-bound workload, so >cores often still pays off (evict + reopen per iter). 
+ +| Machine | 1w | 4w | 16w | 32w | +|---|---|---|---|---| +| c6id.2xlarge | 67 | 522 | 3,075 | 1,552 | +| c6id.4xlarge | 38 | 353 | 4,301 | 5,458 | +| c6id.8xlarge | 357 | 964 | 5,006 | 5,680 | +| im4gn.4xlarge | 34 | 325 | 3,544 | 4,790 | + +```mermaid +xychart-beta + title "Cold n=1: ops/sec vs workers" + x-axis "workers" [1, 2, 4, 8, 16, 32] + y-axis "ops/sec" 0 --> 6000 + line [67, 237, 522, 1374, 3075, 1552] + line [38, 127, 353, 1180, 4301, 5458] + line [357, 803, 964, 2880, 5006, 5680] + line [34, 114, 325, 1256, 3544, 4790] +``` +*Series order: c6id.2xlarge, c6id.4xlarge, c6id.8xlarge, im4gn.4xlarge. Mermaid `xychart-beta` +doesn't support per-line legends inline — colors map to the series order above.* + +```mermaid +xychart-beta + title "Hot n=1: ops/sec vs workers" + x-axis "workers" [1, 2, 4, 8, 16, 32] + y-axis "ops/sec" 0 --> 6000 + line [436, 798, 1349, 1732, 1814, 1855] + line [433, 831, 1538, 2585, 3400, 3542] + line [418, 776, 1524, 2521, 4070, 5608] + line [201, 403, 825, 1586, 2579, 2987] +``` +*Same series order. Hot single-ledger reads are RocksDB-block-cache hits — CPU-bound.* + +## 4. tx-page: latency vs page size + +Single-worker bench, p50 latency for a page of N transactions. +Hot (RocksDB, warmup) is roughly 2× faster than cold (packfile, fresh open). + +| Machine | Cold p=20 | Cold p=100 | Cold p=200 | Hot p=20 | Hot p=100 | Hot p=200 | +|---|---|---|---|---|---|---| +| c6id.2xlarge | 12.7 ms | 13.6 ms | 22.5 ms | 6.9 ms | 7.8 ms | 12.2 ms | +| c6id.4xlarge | 13.4 ms | 14.8 ms | 23.3 ms | 6.9 ms | 7.8 ms | 12.3 ms | +| c6id.8xlarge | 13.6 ms | 15.1 ms | 24.0 ms | 6.8 ms | 7.6 ms | 11.8 ms | +| im4gn.4xlarge | 23.7 ms | 24.8 ms | 42.0 ms | 13.4 ms | 14.4 ms | 22.3 ms | + +## 5. tx-hash: xdr-views vs round-trip path + +`getTransaction(hash)` end-to-end. p50 latency for hash hits. +xdr-views slices the result/meta straight from the raw LCM; round-trip +unmarshals the entire LCM and re-serializes each field — much more CPU. 
+ +| Machine | Cold xdrviews | Cold roundtrip | Cold speedup | Hot xdrviews | Hot roundtrip | Hot speedup | +|---|---|---|---|---|---|---| +| c6id.2xlarge | 13.2 ms | 22.9 ms | 1.73× | 1.5 ms | 13.5 ms | 8.80× | +| c6id.4xlarge | 13.1 ms | 23.0 ms | 1.75× | 1.7 ms | 13.0 ms | 7.80× | +| c6id.8xlarge | 12.3 ms | 23.1 ms | 1.88× | 1.8 ms | 13.8 ms | 7.64× | +| im4gn.4xlarge | 18.9 ms | 37.8 ms | 2.00× | 2.6 ms | 23.5 ms | 9.04× | + +Hot speedup is huge (~8×) because hot fetches finish in microseconds, leaving +the path's CPU cost dominant. Cold is fetch-bound (~2 ms packfile open + zstd +decode), so the materialize path saves a smaller fraction of total latency. + +```mermaid +xychart-beta + title "tx-hash speedup (roundtrip / xdrviews)" + x-axis [c6id.2xlarge, c6id.4xlarge, c6id.8xlarge, im4gn.4xlarge] + y-axis "speedup ×" 0 --> 10 + bar [1.73, 1.75, 1.88, 2.00] + line [8.80, 7.80, 7.64, 9.04] +``` +*Bar = cold, line = hot. Speedup is consistently ~1.7–2× cold and ~8× hot across machines.* + +## 6. Per-ledger ingest throughput + +Synchronous single-stream ingestion: each `Add` call WAL-fsyncs before +returning. p50 / ops-per-second from 10,000-ledger streams. + +| Machine | hot-ledgers | hot-txhash (xdrviews) | hot-events (xdrviews) | hot-events (roundtrip) | +|---|---|---|---|---| +| c6id.2xlarge | 299 ops/s | 610 ops/s | 100 ops/s | 44 ops/s | +| c6id.4xlarge | 310 ops/s | 612 ops/s | 102 ops/s | 47 ops/s | +| c6id.8xlarge | 317 ops/s | 658 ops/s | 111 ops/s | 49 ops/s | +| im4gn.4xlarge | 192 ops/s | 456 ops/s | 65 ops/s | 28 ops/s | + +Hot-ledgers and hot-txhash are tighter across machines because +RocksDB WAL-fsync latency on local NVMe dominates. Events ingest is CPU-bound, +so the spread widens — roundtrip path on Graviton2 is ~2× slower than on Ice Lake. + +## 7. Bulk / one-shot ingest + +Per-chunk or single-shot ingest benches. 
+ +| Machine | cold-events xdrviews (events/s) | cold-events roundtrip (events/s) | ingest-raw-txhash (entries/s) | build-txhash-index (keys/s) | cold-ledgers-ingest (ledgers/s) | +|---|---|---|---|---|---| +| c6id.2xlarge | 194,681 | 53,451 | 105,429 | 20,519,559 | 289 | +| c6id.4xlarge | 206,959 | 58,674 | 103,428 | 36,756,938 | 314 | +| c6id.8xlarge | 226,486 | 58,657 | 100,512 | 42,998,821 | 302 | +| im4gn.4xlarge | 142,329 | 33,167 | 107,704 | 39,403,849 | 291 | + +`build-txhash-index` is the CPU-bound phase 2 of the cold txhash MPHF build +(streamhash with 8 parallel block-build workers). `cold-ledgers-ingest` is +network-bound — pulls from `sdf-ledger-close-meta` via GCS+ADC. + +```mermaid +xychart-beta + title "ingest-raw-txhash aggregate throughput (141 chunks, entries/sec)" + x-axis [c6id.2xlarge, c6id.4xlarge, c6id.8xlarge, im4gn.4xlarge] + y-axis "entries/sec" 0 --> 120000 + bar [105429, 103428, 100512, 107704] +``` + +## 8. Cold vs Hot speedup + +How much faster the hot tier is for matching workloads (workers=1). + +| Machine | Ledger 1@1w | Ledger 1@1w speedup | tx-page p=20 | tx-page speedup | tx-hash xdrviews hit | tx-hash speedup | +|---|---|---|---|---|---|---| +| c6id.2xlarge | 2.2 / 0.8 ms | 2.9× | 12.7 / 6.9 ms | 1.8× | 13.2 / 1.5 ms | 8.6× | +| c6id.4xlarge | 2.4 / 0.8 ms | 3.0× | 13.4 / 6.9 ms | 1.9× | 13.1 / 1.7 ms | 7.9× | +| c6id.8xlarge | 2.4 / 0.8 ms | 2.9× | 13.6 / 6.8 ms | 2.0× | 12.3 / 1.8 ms | 6.8× | +| im4gn.4xlarge | 3.0 / 1.8 ms | 1.6× | 23.7 / 13.4 ms | 1.8× | 18.9 / 2.6 ms | 7.3× | + +*Format: cold_p50 / hot_p50.* + +## 9. Architecture: x86 vs ARM (same vCPU count) + +c6id.4xlarge (Intel Ice Lake, 16 vCPU) vs im4gn.4xlarge (AWS Graviton2, 16 vCPU). +Both 16 vCPU, both local NVMe — direct apples-to-apples on the ISA.
+ +| Workload | x86 (c6id.4xlarge) | arm (im4gn.4xlarge) | arm / x86 | +|---|---|---|---| +| cold n=1 ops/s @ 1w | 38 ops/s | 34 ops/s | 0.89× | +| hot n=1 ops/s @ 1w | 433 ops/s | 201 ops/s | 0.46× | +| cold n=1 peak ops/s | 5,458 ops/s | 4,790 ops/s | 0.88× | +| hot n=1 peak ops/s | 3,542 ops/s | 2,987 ops/s | 0.84× | +| cold-tx-hash xdrviews p50 | 13.1 ms | 18.9 ms | 1.44× | +| hot-tx-hash xdrviews p50 | 1.7 ms | 2.6 ms | 1.57× | +| hot-ledgers-ingest p50 | 3.10 ms | 5.07 ms | 1.63× | +| hot-txhash-ingest xdrviews p50 | 1.57 ms | 2.11 ms | 1.34× | +| hot-events-ingest xdrviews p50 | 9.73 ms | 15.47 ms | 1.59× | + +For throughput (ops/s) higher is better → arm/x86 < 1 means arm is slower. +For latency higher is worse → arm/x86 > 1 means arm is slower. +On this workload mix, Graviton2 trails Ice Lake by ~10–60% per-operation, +but the gap narrows on RocksDB-fsync-bound benches (hot ingest paths). + +## 10. Caveats + +- All 21 CSVs present on every machine — full parity. +Events xdrviews+roundtrip ran on all four. + +- **c6id.2xlarge tx-page CSVs** use the older schema (`iteration_ns` only, no per-phase + breakdown). Total latencies are still comparable; per-phase fetch/decode/scan + columns are blank for that machine. + +- **Chunk selection**: cold-* benches use chunk 5999 (most recent local pack), + hot-ledgers/hot-tx-page use chunk 5000 (existing hot store on disk). They + cover different ledger ranges — cold vs hot per-iter numbers are still + comparable as relative tier costs but not as same-data comparisons. + +- **hot-tx-hash** uses freshly-ingested chunk-5999 hot stores (because the existing + hot-5000 store has no matching cold pack to sample hashes from). + +- **Worker sweep**: all machines ran workers=1,2,4,8,16,32. Machines with fewer + vCPUs (c6id.2xlarge = 8) oversubscribe at workers > vCPUs; their 32-worker numbers + test how the scheduler handles oversubscription, not raw scaling. 
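Reviewer note (not part of the patch): the x86 vs ARM table mixes throughput rows, where a ratio below 1 means arm is slower, with latency rows, where a ratio above 1 means arm is slower. The sketch below recomputes the `arm / x86` column from the rounded table values and folds both kinds of row into a single "times slower" figure. The function name and row layout are hypothetical and do not come from the bench tooling. Because the inputs are the rounded figures shown in the table, a recomputed ratio can differ from the published one in the last digit.

```python
# Values copied from the x86 vs ARM table in the report.
# kind is "throughput" (ops/s, higher is better) or "latency" (ms, lower is better).
ROWS = [
    ("cold n=1 ops/s @ 1w", "throughput", 38, 34),
    ("hot n=1 ops/s @ 1w", "throughput", 433, 201),
    ("cold n=1 peak ops/s", "throughput", 5458, 4790),
    ("hot n=1 peak ops/s", "throughput", 3542, 2987),
    ("cold-tx-hash xdrviews p50", "latency", 13.1, 18.9),
    ("hot-tx-hash xdrviews p50", "latency", 1.7, 2.6),
    ("hot-ledgers-ingest p50", "latency", 3.10, 5.07),
    ("hot-txhash-ingest xdrviews p50", "latency", 1.57, 2.11),
    ("hot-events-ingest xdrviews p50", "latency", 9.73, 15.47),
]


def arm_slowdown(kind, x86, arm):
    """Return how many times slower arm is than x86 (values above 1 mean arm is slower)."""
    ratio = arm / x86
    # For throughput a smaller arm value is worse, so invert the ratio.
    return 1 / ratio if kind == "throughput" else ratio


if __name__ == "__main__":
    for name, kind, x86, arm in ROWS:
        print(f"{name:32s} arm/x86={arm / x86:.2f}  slowdown={arm_slowdown(kind, x86, arm):.2f}x")
```

Normalizing to a single direction makes it easier to rank the rows: on these inputs the largest arm slowdown is the single-worker hot read, and the smallest is the peak cold read.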
From bd1c1c0e87ff1aadf761e13d241591ebd354b188 Mon Sep 17 00:00:00 2001 From: Simon Chow Date: Thu, 21 May 2026 19:46:13 +0000 Subject: [PATCH 02/27] bench(fullhistory): add per-machine raw-results section to 2026-05-21 report MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit New section 11 transposes the cross-machine tables: one consolidated table per machine (c6id.2xlarge, c6id.4xlarge, c6id.8xlarge, im4gn.4xlarge) listing every bench result — full ledger grid sweep, tx-page, tx-hash (hit/miss × xdrviews/roundtrip), per-ledger ingest, and bulk ingest — with p50/p90/p99 and throughput. Co-Authored-By: Claude Opus 4.7 (1M context) --- .../results/2026-05-21-cross-machine.md | 262 ++++++++++++++++++ 1 file changed, 262 insertions(+) diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-05-21-cross-machine.md b/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-05-21-cross-machine.md index ec3b1450f..d7ed1723b 100644 --- a/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-05-21-cross-machine.md +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-05-21-cross-machine.md @@ -211,3 +211,265 @@ Events xdrviews+roundtrip ran on all four. - **Worker sweep**: all machines ran workers=1,2,4,8,16,32. Machines with fewer vCPUs (c6id.2xlarge = 8) oversubscribe at workers > vCPUs; their 32-worker numbers test how the scheduler handles oversubscription, not raw scaling. + +## 11. Per-machine raw results + +Every bench result for each machine, in one place. Same numbers as the +cross-machine tables above, transposed so you can see one machine's whole +performance profile at a glance. 
+ +### c6id.2xlarge — 8 vCPU x86_64, 16 GB RAM, 441 GB NVMe + +| Bench | Config | p50 (ms) | p90 (ms) | p99 (ms) | Throughput | +|---|---|---|---|---|---| +| cold-ledgers | n=1 w=1 | 2.216 | 3.941 | 5.343 | 67 ops/s | +| cold-ledgers | n=1 w=2 | 2.118 | 3.451 | 5.338 | 237 ops/s | +| cold-ledgers | n=1 w=4 | 2.181 | 3.519 | 4.924 | 522 ops/s | +| cold-ledgers | n=1 w=8 | 2.602 | 4.455 | 7.064 | 1,374 ops/s | +| cold-ledgers | n=1 w=16 | 3.918 | 6.889 | 14.444 | 3,075 ops/s | +| cold-ledgers | n=1 w=32 | 22.381 | 31.793 | 33.169 | 1,552 ops/s | +| cold-ledgers | n=10 w=1 | 11.231 | 15.656 | 19.012 | 85 ops/s | +| cold-ledgers | n=10 w=2 | 12.722 | 16.455 | 22.134 | 149 ops/s | +| cold-ledgers | n=10 w=4 | 13.363 | 17.751 | 23.990 | 279 ops/s | +| cold-ledgers | n=10 w=8 | 16.366 | 22.095 | 30.112 | 451 ops/s | +| cold-ledgers | n=10 w=16 | 39.351 | 57.651 | 72.360 | 386 ops/s | +| cold-ledgers | n=10 w=32 | 99.329 | 118.305 | 147.150 | 317 ops/s | +| cold-ledgers | n=20 w=1 | 22.539 | 30.775 | 40.214 | 42 ops/s | +| cold-ledgers | n=20 w=2 | 25.046 | 30.803 | 39.164 | 78 ops/s | +| cold-ledgers | n=20 w=4 | 25.752 | 34.166 | 40.014 | 143 ops/s | +| cold-ledgers | n=20 w=8 | 33.383 | 43.370 | 52.029 | 221 ops/s | +| cold-ledgers | n=20 w=16 | 79.351 | 105.758 | 138.159 | 197 ops/s | +| cold-ledgers | n=20 w=32 | 173.738 | 212.313 | 261.079 | 179 ops/s | +| hot-ledgers | n=1 w=1 | 0.771 | 1.207 | 1.747 | 436 ops/s | +| hot-ledgers | n=1 w=2 | 0.859 | 1.244 | 1.595 | 798 ops/s | +| hot-ledgers | n=1 w=4 | 1.050 | 1.669 | 2.410 | 1,349 ops/s | +| hot-ledgers | n=1 w=8 | 1.513 | 2.683 | 3.632 | 1,732 ops/s | +| hot-ledgers | n=1 w=16 | 1.804 | 7.380 | 12.916 | 1,814 ops/s | +| hot-ledgers | n=1 w=32 | 2.019 | 15.404 | 33.587 | 1,855 ops/s | +| hot-ledgers | n=10 w=1 | 8.684 | 10.181 | 11.167 | 44 ops/s | +| hot-ledgers | n=10 w=2 | 9.180 | 11.085 | 13.884 | 79 ops/s | +| hot-ledgers | n=10 w=4 | 11.540 | 13.939 | 15.520 | 127 ops/s | +| hot-ledgers | n=10 w=8 | 16.788 | 
20.779 | 24.816 | 173 ops/s | +| hot-ledgers | n=10 w=16 | 29.309 | 40.737 | 53.578 | 195 ops/s | +| hot-ledgers | n=10 w=32 | 49.822 | 89.770 | 134.417 | 208 ops/s | +| hot-ledgers | n=20 w=1 | 16.397 | 18.408 | 20.576 | 23 ops/s | +| hot-ledgers | n=20 w=2 | 18.930 | 21.934 | 24.315 | 40 ops/s | +| hot-ledgers | n=20 w=4 | 21.747 | 26.121 | 28.682 | 67 ops/s | +| hot-ledgers | n=20 w=8 | 34.304 | 39.800 | 44.161 | 86 ops/s | +| hot-ledgers | n=20 w=16 | 57.443 | 72.719 | 88.136 | 101 ops/s | +| hot-ledgers | n=20 w=32 | 107.561 | 161.127 | 204.395 | 106 ops/s | +| cold-tx-page | page=20 | 12.693 | 15.587 | 26.996 | 77 ops/s | +| cold-tx-page | page=100 | 13.607 | 25.673 | 29.203 | 62 ops/s | +| cold-tx-page | page=200 | 22.504 | 27.376 | 31.539 | 48 ops/s | +| hot-tx-page | page=20 | 6.916 | 9.168 | 16.665 | 139 ops/s | +| hot-tx-page | page=100 | 7.836 | 15.083 | 18.545 | 108 ops/s | +| hot-tx-page | page=200 | 12.231 | 16.938 | 23.633 | 83 ops/s | +| cold-tx-hash | xdrviews hit | 13.210 | 16.587 | 18.512 | 74 ops/s | +| cold-tx-hash | xdrviews miss | 10.016 | 13.391 | 14.567 | 95 ops/s | +| cold-tx-hash | roundtrip hit | 22.909 | 26.372 | 29.432 | 44 ops/s | +| cold-tx-hash | roundtrip miss | 9.290 | 13.223 | 13.911 | 101 ops/s | +| hot-tx-hash | xdrviews hit | 1.532 | 2.241 | 3.105 | 624 ops/s | +| hot-tx-hash | xdrviews miss | 0.027 | 0.040 | 0.046 | 38,724 ops/s | +| hot-tx-hash | roundtrip hit | 13.477 | 16.457 | 18.023 | 75 ops/s | +| hot-tx-hash | roundtrip miss | 0.063 | 0.074 | 0.096 | 17,664 ops/s | +| hot-ledgers-ingest | (per ledger) | 3.178 | 4.183 | 5.834 | 299 ops/s | +| hot-txhash-ingest | xdrviews | 1.569 | 2.143 | 2.835 | 610 ops/s (2,819,743 tx total) | +| hot-events-ingest | xdrviews | 9.740 | 12.305 | 22.520 | 100 ops/s (9,901,325 events total) | +| hot-events-ingest | roundtrip | 21.371 | 31.303 | 54.417 | 44 ops/s (9,901,325 events total) | +| cold-events-ingest | xdrviews | 50859.1 | — | — | 194,681 events/s | +| cold-events-ingest | 
roundtrip | 185242.9 | — | — | 53,451 events/s | +| ingest-raw-txhash | xdrviews | 25814.0 | — | — | 105,429 entries/s aggregate | +| build-txhash-index | run | 18647.8 | — | — | 20,519,559 keys/s | +| cold-ledgers-ingest | packfile | 34556.5 | — | — | 289 ledgers/s | + +### c6id.4xlarge — 16 vCPU x86_64, 30 GB RAM, 884 GB NVMe + +| Bench | Config | p50 (ms) | p90 (ms) | p99 (ms) | Throughput | +|---|---|---|---|---|---| +| cold-ledgers | n=1 w=1 | 2.422 | 4.122 | 6.164 | 38 ops/s | +| cold-ledgers | n=1 w=2 | 2.071 | 2.956 | 4.891 | 127 ops/s | +| cold-ledgers | n=1 w=4 | 2.097 | 3.227 | 4.518 | 353 ops/s | +| cold-ledgers | n=1 w=8 | 2.029 | 3.305 | 5.862 | 1,180 ops/s | +| cold-ledgers | n=1 w=16 | 2.854 | 4.263 | 8.324 | 4,301 ops/s | +| cold-ledgers | n=1 w=32 | 5.185 | 7.225 | 10.706 | 5,458 ops/s | +| cold-ledgers | n=10 w=1 | 10.425 | 15.230 | 18.614 | 88 ops/s | +| cold-ledgers | n=10 w=2 | 11.545 | 14.993 | 19.754 | 164 ops/s | +| cold-ledgers | n=10 w=4 | 11.904 | 16.522 | 21.973 | 304 ops/s | +| cold-ledgers | n=10 w=8 | 12.771 | 17.380 | 25.150 | 572 ops/s | +| cold-ledgers | n=10 w=16 | 16.887 | 22.940 | 34.106 | 868 ops/s | +| cold-ledgers | n=10 w=32 | 39.187 | 56.184 | 69.842 | 779 ops/s | +| cold-ledgers | n=20 w=1 | 21.734 | 26.585 | 31.943 | 46 ops/s | +| cold-ledgers | n=20 w=2 | 23.130 | 28.666 | 34.088 | 86 ops/s | +| cold-ledgers | n=20 w=4 | 22.704 | 28.890 | 35.298 | 160 ops/s | +| cold-ledgers | n=20 w=8 | 24.779 | 31.803 | 43.649 | 304 ops/s | +| cold-ledgers | n=20 w=16 | 32.502 | 41.420 | 56.249 | 460 ops/s | +| cold-ledgers | n=20 w=32 | 81.472 | 103.987 | 134.890 | 388 ops/s | +| hot-ledgers | n=1 w=1 | 0.800 | 1.375 | 1.903 | 433 ops/s | +| hot-ledgers | n=1 w=2 | 0.752 | 1.231 | 1.648 | 831 ops/s | +| hot-ledgers | n=1 w=4 | 0.903 | 1.371 | 2.157 | 1,538 ops/s | +| hot-ledgers | n=1 w=8 | 1.069 | 1.737 | 2.232 | 2,585 ops/s | +| hot-ledgers | n=1 w=16 | 1.524 | 2.578 | 3.704 | 3,400 ops/s | +| hot-ledgers | n=1 w=32 | 2.035 | 6.968 
| 12.391 | 3,542 ops/s | +| hot-ledgers | n=10 w=1 | 7.540 | 8.905 | 9.325 | 50 ops/s | +| hot-ledgers | n=10 w=2 | 8.513 | 10.408 | 12.378 | 87 ops/s | +| hot-ledgers | n=10 w=4 | 9.274 | 11.099 | 13.137 | 154 ops/s | +| hot-ledgers | n=10 w=8 | 11.039 | 13.645 | 15.345 | 266 ops/s | +| hot-ledgers | n=10 w=16 | 17.154 | 20.682 | 23.161 | 346 ops/s | +| hot-ledgers | n=10 w=32 | 29.263 | 40.988 | 53.428 | 390 ops/s | +| hot-ledgers | n=20 w=1 | 16.007 | 18.391 | 19.510 | 23 ops/s | +| hot-ledgers | n=20 w=2 | 17.543 | 20.105 | 22.710 | 42 ops/s | +| hot-ledgers | n=20 w=4 | 20.082 | 23.537 | 26.246 | 74 ops/s | +| hot-ledgers | n=20 w=8 | 25.833 | 29.948 | 33.011 | 115 ops/s | +| hot-ledgers | n=20 w=16 | 34.589 | 40.418 | 44.119 | 171 ops/s | +| hot-ledgers | n=20 w=32 | 59.888 | 76.332 | 92.407 | 195 ops/s | +| cold-tx-page | page=20 | 13.384 | 16.646 | 28.065 | 72 ops/s | +| cold-tx-page | page=100 | 14.790 | 26.951 | 29.764 | 58 ops/s | +| cold-tx-page | page=200 | 23.341 | 28.220 | 30.881 | 46 ops/s | +| hot-tx-page | page=20 | 6.943 | 8.709 | 15.445 | 140 ops/s | +| hot-tx-page | page=100 | 7.797 | 14.589 | 17.496 | 110 ops/s | +| hot-tx-page | page=200 | 12.286 | 16.440 | 21.595 | 84 ops/s | +| cold-tx-hash | xdrviews hit | 13.142 | 16.072 | 17.752 | 75 ops/s | +| cold-tx-hash | xdrviews miss | 9.582 | 11.960 | 12.839 | 102 ops/s | +| cold-tx-hash | roundtrip hit | 23.005 | 25.779 | 28.219 | 44 ops/s | +| cold-tx-hash | roundtrip miss | 9.770 | 10.485 | 11.631 | 103 ops/s | +| hot-tx-hash | xdrviews hit | 1.660 | 2.538 | 3.307 | 579 ops/s | +| hot-tx-hash | xdrviews miss | 0.027 | 0.035 | 0.040 | 40,502 ops/s | +| hot-tx-hash | roundtrip hit | 12.952 | 15.609 | 18.004 | 78 ops/s | +| hot-tx-hash | roundtrip miss | 0.059 | 0.076 | 0.085 | 17,923 ops/s | +| hot-ledgers-ingest | (per ledger) | 3.102 | 4.046 | 5.330 | 310 ops/s | +| hot-txhash-ingest | xdrviews | 1.572 | 2.108 | 2.784 | 612 ops/s (2,819,743 tx total) | +| hot-events-ingest | xdrviews | 9.735 | 
12.000 | 17.770 | 102 ops/s (9,901,325 events total) | +| hot-events-ingest | roundtrip | 20.985 | 26.309 | 40.925 | 47 ops/s (9,901,325 events total) | +| cold-events-ingest | xdrviews | 47841.9 | — | — | 206,959 events/s | +| cold-events-ingest | roundtrip | 168752.7 | — | — | 58,674 events/s | +| ingest-raw-txhash | xdrviews | 26215.2 | — | — | 103,428 entries/s aggregate | +| build-txhash-index | run | 10410.1 | — | — | 36,756,938 keys/s | +| cold-ledgers-ingest | packfile | 31885.3 | — | — | 314 ledgers/s | + +### c6id.8xlarge — 32 vCPU x86_64, 62 GB RAM, 1900 GB NVMe + +| Bench | Config | p50 (ms) | p90 (ms) | p99 (ms) | Throughput | +|---|---|---|---|---|---| +| cold-ledgers | n=1 w=1 | 2.373 | 3.410 | 5.443 | 357 ops/s | +| cold-ledgers | n=1 w=2 | 2.080 | 3.136 | 5.408 | 803 ops/s | +| cold-ledgers | n=1 w=4 | 2.037 | 2.930 | 4.135 | 964 ops/s | +| cold-ledgers | n=1 w=8 | 2.336 | 3.802 | 6.350 | 2,880 ops/s | +| cold-ledgers | n=1 w=16 | 2.837 | 3.796 | 5.752 | 5,006 ops/s | +| cold-ledgers | n=1 w=32 | 5.157 | 6.625 | 9.689 | 5,680 ops/s | +| cold-ledgers | n=10 w=1 | 12.104 | 17.412 | 21.553 | 78 ops/s | +| cold-ledgers | n=10 w=2 | 13.052 | 18.245 | 23.176 | 143 ops/s | +| cold-ledgers | n=10 w=4 | 12.390 | 17.349 | 24.214 | 290 ops/s | +| cold-ledgers | n=10 w=8 | 12.525 | 17.164 | 23.688 | 585 ops/s | +| cold-ledgers | n=10 w=16 | 14.028 | 19.553 | 31.471 | 1,017 ops/s | +| cold-ledgers | n=10 w=32 | 20.829 | 29.226 | 42.885 | 1,390 ops/s | +| cold-ledgers | n=20 w=1 | 22.317 | 30.372 | 36.050 | 42 ops/s | +| cold-ledgers | n=20 w=2 | 23.868 | 30.873 | 37.992 | 80 ops/s | +| cold-ledgers | n=20 w=4 | 23.362 | 30.890 | 38.558 | 158 ops/s | +| cold-ledgers | n=20 w=8 | 23.286 | 30.952 | 42.207 | 307 ops/s | +| cold-ledgers | n=20 w=16 | 27.135 | 34.862 | 50.226 | 540 ops/s | +| cold-ledgers | n=20 w=32 | 39.920 | 52.227 | 71.984 | 742 ops/s | +| hot-ledgers | n=1 w=1 | 0.820 | 1.191 | 1.950 | 418 ops/s | +| hot-ledgers | n=1 w=2 | 0.827 | 1.541 | 1.756 
| 776 ops/s | +| hot-ledgers | n=1 w=4 | 0.844 | 1.301 | 1.713 | 1,524 ops/s | +| hot-ledgers | n=1 w=8 | 0.996 | 1.694 | 2.413 | 2,521 ops/s | +| hot-ledgers | n=1 w=16 | 1.363 | 2.211 | 3.036 | 4,070 ops/s | +| hot-ledgers | n=1 w=32 | 1.855 | 3.104 | 4.271 | 5,608 ops/s | +| hot-ledgers | n=10 w=1 | 8.887 | 10.087 | 11.562 | 42 ops/s | +| hot-ledgers | n=10 w=2 | 9.811 | 11.786 | 13.404 | 78 ops/s | +| hot-ledgers | n=10 w=4 | 10.556 | 12.455 | 14.559 | 141 ops/s | +| hot-ledgers | n=10 w=8 | 12.905 | 15.943 | 18.152 | 227 ops/s | +| hot-ledgers | n=10 w=16 | 13.487 | 16.221 | 18.701 | 393 ops/s | +| hot-ledgers | n=10 w=32 | 19.894 | 24.372 | 28.441 | 608 ops/s | +| hot-ledgers | n=20 w=1 | 18.150 | 20.290 | 22.031 | 21 ops/s | +| hot-ledgers | n=20 w=2 | 18.479 | 21.945 | 24.218 | 40 ops/s | +| hot-ledgers | n=20 w=4 | 19.473 | 23.271 | 26.352 | 76 ops/s | +| hot-ledgers | n=20 w=8 | 26.476 | 30.741 | 35.107 | 112 ops/s | +| hot-ledgers | n=20 w=16 | 29.700 | 36.025 | 39.600 | 208 ops/s | +| hot-ledgers | n=20 w=32 | 39.625 | 47.434 | 52.407 | 299 ops/s | +| cold-tx-page | page=20 | 13.608 | 16.479 | 29.243 | 72 ops/s | +| cold-tx-page | page=100 | 15.112 | 26.879 | 31.854 | 58 ops/s | +| cold-tx-page | page=200 | 23.968 | 28.766 | 32.156 | 45 ops/s | +| hot-tx-page | page=20 | 6.820 | 8.848 | 16.264 | 142 ops/s | +| hot-tx-page | page=100 | 7.607 | 14.136 | 16.487 | 113 ops/s | +| hot-tx-page | page=200 | 11.844 | 16.408 | 22.577 | 86 ops/s | +| cold-tx-hash | xdrviews hit | 12.283 | 13.630 | 15.247 | 81 ops/s | +| cold-tx-hash | xdrviews miss | 8.754 | 9.584 | 10.955 | 113 ops/s | +| cold-tx-hash | roundtrip hit | 23.127 | 26.421 | 28.433 | 43 ops/s | +| cold-tx-hash | roundtrip miss | 9.938 | 12.796 | 13.789 | 97 ops/s | +| hot-tx-hash | xdrviews hit | 1.800 | 2.612 | 3.714 | 540 ops/s | +| hot-tx-hash | xdrviews miss | 0.031 | 0.041 | 0.060 | 33,527 ops/s | +| hot-tx-hash | roundtrip hit | 13.750 | 16.380 | 18.897 | 74 ops/s | +| hot-tx-hash | roundtrip 
miss | 0.077 | 0.089 | 0.121 | 14,093 ops/s | +| hot-ledgers-ingest | (per ledger) | 2.996 | 3.905 | 5.255 | 317 ops/s | +| hot-txhash-ingest | xdrviews | 1.464 | 1.945 | 2.552 | 658 ops/s (2,819,743 tx total) | +| hot-events-ingest | xdrviews | 9.039 | 11.042 | 15.951 | 111 ops/s (9,901,325 events total) | +| hot-events-ingest | roundtrip | 20.373 | 24.632 | 33.506 | 49 ops/s (9,901,325 events total) | +| cold-events-ingest | xdrviews | 43717.1 | — | — | 226,486 events/s | +| cold-events-ingest | roundtrip | 168799.6 | — | — | 58,657 events/s | +| ingest-raw-txhash | xdrviews | 27084.4 | — | — | 100,512 entries/s aggregate | +| build-txhash-index | run | 8898.9 | — | — | 42,998,821 keys/s | +| cold-ledgers-ingest | packfile | 33158.8 | — | — | 302 ledgers/s | + +### im4gn.4xlarge — 16 vCPU aarch64, 61 GB RAM, 6929 GB NVMe + +| Bench | Config | p50 (ms) | p90 (ms) | p99 (ms) | Throughput | +|---|---|---|---|---|---| +| cold-ledgers | n=1 w=1 | 2.974 | 4.698 | 6.852 | 34 ops/s | +| cold-ledgers | n=1 w=2 | 2.781 | 3.840 | 5.529 | 114 ops/s | +| cold-ledgers | n=1 w=4 | 2.839 | 4.115 | 5.710 | 325 ops/s | +| cold-ledgers | n=1 w=8 | 2.919 | 4.241 | 7.127 | 1,256 ops/s | +| cold-ledgers | n=1 w=16 | 3.601 | 4.799 | 6.993 | 3,544 ops/s | +| cold-ledgers | n=1 w=32 | 6.005 | 8.033 | 12.968 | 4,790 ops/s | +| cold-ledgers | n=10 w=1 | 17.841 | 23.707 | 29.562 | 54 ops/s | +| cold-ledgers | n=10 w=2 | 18.234 | 22.710 | 29.252 | 105 ops/s | +| cold-ledgers | n=10 w=4 | 17.432 | 22.389 | 28.361 | 210 ops/s | +| cold-ledgers | n=10 w=8 | 17.442 | 22.063 | 29.317 | 420 ops/s | +| cold-ledgers | n=10 w=16 | 19.762 | 26.004 | 34.860 | 748 ops/s | +| cold-ledgers | n=10 w=32 | 33.336 | 49.485 | 67.468 | 865 ops/s | +| cold-ledgers | n=20 w=1 | 33.456 | 42.961 | 52.199 | 29 ops/s | +| cold-ledgers | n=20 w=2 | 32.977 | 41.841 | 49.494 | 58 ops/s | +| cold-ledgers | n=20 w=4 | 33.653 | 41.595 | 48.814 | 110 ops/s | +| cold-ledgers | n=20 w=8 | 33.680 | 41.694 | 54.017 | 215 ops/s 
| +| cold-ledgers | n=20 w=16 | 41.344 | 53.195 | 68.686 | 354 ops/s | +| cold-ledgers | n=20 w=32 | 73.711 | 103.633 | 131.510 | 404 ops/s | +| hot-ledgers | n=1 w=1 | 1.809 | 2.439 | 3.613 | 201 ops/s | +| hot-ledgers | n=1 w=2 | 1.646 | 2.246 | 2.559 | 403 ops/s | +| hot-ledgers | n=1 w=4 | 1.502 | 2.245 | 2.808 | 825 ops/s | +| hot-ledgers | n=1 w=8 | 1.506 | 2.247 | 2.918 | 1,586 ops/s | +| hot-ledgers | n=1 w=16 | 1.782 | 2.773 | 9.279 | 2,579 ops/s | +| hot-ledgers | n=1 w=32 | 2.409 | 5.987 | 10.198 | 2,987 ops/s | +| hot-ledgers | n=10 w=1 | 16.805 | 19.143 | 20.698 | 23 ops/s | +| hot-ledgers | n=10 w=2 | 14.748 | 17.911 | 21.106 | 46 ops/s | +| hot-ledgers | n=10 w=4 | 14.396 | 17.530 | 20.204 | 95 ops/s | +| hot-ledgers | n=10 w=8 | 14.243 | 16.904 | 20.106 | 198 ops/s | +| hot-ledgers | n=10 w=16 | 19.724 | 24.202 | 27.606 | 293 ops/s | +| hot-ledgers | n=10 w=32 | 32.419 | 45.457 | 56.877 | 353 ops/s | +| hot-ledgers | n=20 w=1 | 26.810 | 30.213 | 31.765 | 14 ops/s | +| hot-ledgers | n=20 w=2 | 27.582 | 31.121 | 33.665 | 27 ops/s | +| hot-ledgers | n=20 w=4 | 28.406 | 32.568 | 34.973 | 53 ops/s | +| hot-ledgers | n=20 w=8 | 29.934 | 35.737 | 39.845 | 95 ops/s | +| hot-ledgers | n=20 w=16 | 40.659 | 46.355 | 50.502 | 147 ops/s | +| hot-ledgers | n=20 w=32 | 65.794 | 83.498 | 100.038 | 176 ops/s | +| cold-tx-page | page=20 | 23.653 | 27.490 | 48.412 | 42 ops/s | +| cold-tx-page | page=100 | 24.770 | 46.358 | 54.055 | 34 ops/s | +| cold-tx-page | page=200 | 42.033 | 48.715 | 54.122 | 26 ops/s | +| hot-tx-page | page=20 | 13.392 | 15.985 | 29.933 | 74 ops/s | +| hot-tx-page | page=100 | 14.388 | 27.451 | 31.126 | 59 ops/s | +| hot-tx-page | page=200 | 22.283 | 30.863 | 41.145 | 46 ops/s | +| cold-tx-hash | xdrviews hit | 18.944 | 22.186 | 23.582 | 52 ops/s | +| cold-tx-hash | xdrviews miss | 14.492 | 18.097 | 19.147 | 66 ops/s | +| cold-tx-hash | roundtrip hit | 37.831 | 41.851 | 45.253 | 27 ops/s | +| cold-tx-hash | roundtrip miss | 13.886 | 14.874 | 
18.973 | 70 ops/s | +| hot-tx-hash | xdrviews hit | 2.605 | 3.701 | 4.626 | 374 ops/s | +| hot-tx-hash | xdrviews miss | 0.056 | 0.076 | 0.097 | 18,601 ops/s | +| hot-tx-hash | roundtrip hit | 23.546 | 26.612 | 30.802 | 43 ops/s | +| hot-tx-hash | roundtrip miss | 0.076 | 0.094 | 0.237 | 13,497 ops/s | +| hot-ledgers-ingest | (per ledger) | 5.070 | 6.370 | 8.171 | 192 ops/s | +| hot-txhash-ingest | xdrviews | 2.114 | 2.797 | 3.634 | 456 ops/s (2,819,743 tx total) | +| hot-events-ingest | xdrviews | 15.468 | 18.717 | 22.595 | 65 ops/s (9,901,325 events total) | +| hot-events-ingest | roundtrip | 36.445 | 42.651 | 50.146 | 28 ops/s (9,901,325 events total) | +| cold-events-ingest | xdrviews | 69566.7 | — | — | 142,329 events/s | +| cold-events-ingest | roundtrip | 298530.1 | — | — | 33,167 events/s | +| ingest-raw-txhash | xdrviews | 25174.0 | — | — | 107,704 entries/s aggregate | +| build-txhash-index | run | 9710.8 | — | — | 39,403,849 keys/s | +| cold-ledgers-ingest | packfile | 34371.8 | — | — | 291 ledgers/s | From d1389001db92c6169c67521e42a9d1efe5de153f Mon Sep 17 00:00:00 2001 From: Simon Chow Date: Thu, 21 May 2026 19:52:08 +0000 Subject: [PATCH 03/27] bench(fullhistory): add production-RPC-providers baseline to report MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit New Section 2 ("Internal vs production RPC providers") includes the prior black-box benchmark across 4–6 production RPC providers and juxtaposes their p50s with the internal hot/cold tiers. Adds a Mermaid bar/line chart of the per-workload speedups. Remaining sections renumbered 3–12. Headline: hot/cold full-history is 10×–1773× faster than the average production RPC across ledger-point, ledger-range, tx-page, tx-hash, and the four event-filter scenarios. Note: 'onfinality' and 'sorobanrpc' are absent from tx-hash and events workloads (n=4 instead of 6). 
Co-Authored-By: Claude Opus 4.7 (1M context) --- .../results/2026-05-21-cross-machine.md | 53 +++++++++++++++---- 1 file changed, 43 insertions(+), 10 deletions(-) diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-05-21-cross-machine.md b/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-05-21-cross-machine.md index d7ed1723b..75ad3de91 100644 --- a/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-05-21-cross-machine.md +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-05-21-cross-machine.md @@ -17,7 +17,40 @@ All four ran the same bench binary (Go 1.26.3, RocksDB 10.9.1, zstd 1.5.7) on identical data (chunks 5859–5999 cold, chunk 5000 hot, chunk 5999 for ingest). Data lives on a local NVMe instance store on every machine, not EBS. -## 2. Read performance: peak throughput +## 2. Internal vs production RPC providers (p50) + +External-RPC-provider baseline from a prior black-box benchmark, juxtaposed with +the internal full-history hot/cold p50 for the same workload. `rpc avg/min/max` +aggregate over `n` providers (the `missing` column lists providers absent from a +workload). Internal hot/cold here are from an earlier bench run snapshot — the +current per-machine dataset is in Section 12. 
+ +| Scenario | Workload | Hot | Cold | RPC avg | RPC min | RPC max | n | vs hot | vs cold | Missing providers | +|---|---|---|---|---|---|---|---|---|---|---| +| ledger-point | | 1.4 ms | 1.3 ms | 439.4 ms | 227.0 ms | 693.3 ms | 6 | 309× | 341× | | +| ledger-range | n=10 | 14.4 ms | 13.4 ms | 3.70 s | 2.35 s | 7.10 s | 6 | 256× | 275× | | +| tx-page | page=20 | 11.9 ms | 11.9 ms | 211.5 ms | 111.3 ms | 299.2 ms | 6 | 18× | 18× | | +| tx-hash | | 11.9 ms | 11.8 ms | 122.7 ms | 47.9 ms | 181.1 ms | 4 | 10× | 10× | onfinality, sorobanrpc | +| events | no-filter | 5.3 ms | 2.3 ms | 155.1 ms | 84.6 ms | 206.0 ms | 4 | 29× | 68× | onfinality, sorobanrpc | +| events | contract | 0.2 ms | 0.3 ms | 109.1 ms | 39.3 ms | 158.3 ms | 4 | 510× | 371× | onfinality, sorobanrpc | +| events | topic | 4.7 ms | 8.3 ms | 193.9 ms | 29.5 ms | 293.0 ms | 4 | 41× | 23× | onfinality, sorobanrpc | +| events | both | 0.1 ms | 0.1 ms | 118.8 ms | 44.4 ms | 164.0 ms | 4 | 1773× | 1467× | onfinality, sorobanrpc | + +```mermaid +xychart-beta + title "Internal hot/cold p50 speedup vs production RPCs (log scale needed; values clipped at 2000)" + x-axis [ledger-pt, ledger-rng, tx-page, tx-hash, ev:nofilt, ev:contract, ev:topic, ev:both] + y-axis "× faster than RPC avg" 0 --> 2000 + bar [309, 256, 18, 10, 29, 510, 41, 1773] + line [341, 275, 18, 10, 68, 371, 23, 1467] +``` +*Bar = hot tier, line = cold tier. The `events both` workload (filter on both +contract and topic) is the most lopsided — internal lookup is essentially free +(MPHF + bitmap intersect) while RPCs scan-and-filter. `tx-hash` is the tightest +ratio (~10×) because all RPCs index transactions by hash too — the gap is RPC +overhead, not algorithmic.* + +## 3. Read performance: peak throughput Best ops/sec the machine reaches across the worker sweep (1–32 workers) for each tier × ledgers-per-read. Cold = page-cache-evict + fresh open per iter; @@ -41,7 +74,7 @@ xychart-beta *Bar = cold tier peak, line = hot tier peak. 
Cold beats hot at peak across every machine — cold random-chunk reads parallelize across 141 different packfiles, while hot reads contend on a single RocksDB handle.* -## 3. Worker scaling (cold n=1) +## 4. Worker scaling (cold n=1) How throughput scales with worker count on each machine. Cold n=1 is the most I/O-bound workload, so >cores often still pays off (evict + reopen per iter). @@ -78,7 +111,7 @@ xychart-beta ``` *Same series order. Hot single-ledger reads are RocksDB-block-cache hits — CPU-bound.* -## 4. tx-page: latency vs page size +## 5. tx-page: latency vs page size Single-worker bench, p50 latency for a page of N transactions. Hot (RocksDB, warmup) is roughly 2× faster than cold (packfile, fresh open). @@ -90,7 +123,7 @@ Hot (RocksDB, warmup) is roughly 2× faster than cold (packfile, fresh open). | c6id.8xlarge | 13.6 ms | 15.1 ms | 24.0 ms | 6.8 ms | 7.6 ms | 11.8 ms | | im4gn.4xlarge | 23.7 ms | 24.8 ms | 42.0 ms | 13.4 ms | 14.4 ms | 22.3 ms | -## 5. tx-hash: xdr-views vs round-trip path +## 6. tx-hash: xdr-views vs round-trip path `getTransaction(hash)` end-to-end. p50 latency for hash hits. xdr-views slices the result/meta straight from the raw LCM; round-trip @@ -117,7 +150,7 @@ xychart-beta ``` *Bar = cold, line = hot. Speedup is consistently ~1.7–2× cold and ~8× hot across machines.* -## 6. Per-ledger ingest throughput +## 7. Per-ledger ingest throughput Synchronous single-stream ingestion: each `Add` call WAL-fsyncs before returning. p50 / ops-per-second from 10,000-ledger streams. @@ -133,7 +166,7 @@ Hot-ledgers and hot-txhash are tighter across machines because RocksDB WAL-fsync latency on local NVMe dominates. Events ingest is CPU-bound, so the spread widens — roundtrip path on Graviton2 is ~2× slower than on Ice Lake. -## 7. Bulk / one-shot ingest +## 8. Bulk / one-shot ingest Per-chunk or single-shot ingest benches. @@ -156,7 +189,7 @@ xychart-beta bar [105429, 103428, 100512, 107704] ``` -## 8. Cold vs Hot speedup +## 9. 
Cold vs Hot speedup How much faster the hot tier is for matching workloads (workers=1). @@ -169,7 +202,7 @@ How much faster the hot tier is for matching workloads (workers=1). *Format: cold_p50 / hot_p50.* -## 9. Architecture: x86 vs ARM (same vCPU count) +## 10. Architecture: x86 vs ARM (same vCPU count) c6id.4xlarge (Intel Ice Lake, 16 vCPU) vs im4gn.4xlarge (AWS Graviton2, 16 vCPU). Both 16 vCPU, both local NVMe — direct apples-to-apples on the ISA. @@ -191,7 +224,7 @@ For latency higher is worse → arm/x86 > 1 means arm is slower. On this workload mix, Graviton2 trails Ice Lake by ~10–60% per-operation, but the gap narrows on RocksDB-fsync-bound benches (hot ingest paths). -## 10. Caveats +## 11. Caveats - All 21 CSVs present on every machine — full parity. Events xdrviews+roundtrip ran on all four. @@ -212,7 +245,7 @@ Events xdrviews+roundtrip ran on all four. vCPUs (c6id.2xlarge = 8) oversubscribe at workers > vCPUs; their 32-worker numbers test how the scheduler handles oversubscription, not raw scaling. -## 11. Per-machine raw results +## 12. Per-machine raw results Every bench result for each machine, in one place. Same numbers as the cross-machine tables above, transposed so you can see one machine's whole From ef9351b63fc9427f3e1f3ffa7c53d263a7371b63 Mon Sep 17 00:00:00 2001 From: Simon Chow Date: Thu, 21 May 2026 19:54:32 +0000 Subject: [PATCH 04/27] bench(fullhistory): fix Mermaid parse error in Section 2 RPC chart Colons in x-axis labels (ev:nofilt, ev:contract, ev:topic, ev:both) break Mermaid's xychart-beta parser. Replaced with hyphens. 
Co-Authored-By: Claude Opus 4.7 (1M context) --- .../bench-fullhistory/results/2026-05-21-cross-machine.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-05-21-cross-machine.md b/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-05-21-cross-machine.md index 75ad3de91..3f74fad50 100644 --- a/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-05-21-cross-machine.md +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-05-21-cross-machine.md @@ -39,7 +39,7 @@ current per-machine dataset is in Section 12. ```mermaid xychart-beta title "Internal hot/cold p50 speedup vs production RPCs (log scale needed; values clipped at 2000)" - x-axis [ledger-pt, ledger-rng, tx-page, tx-hash, ev:nofilt, ev:contract, ev:topic, ev:both] + x-axis [ledger-pt, ledger-rng, tx-page, tx-hash, ev-nofilt, ev-contract, ev-topic, ev-both] y-axis "× faster than RPC avg" 0 --> 2000 bar [309, 256, 18, 10, 29, 510, 41, 1773] line [341, 275, 18, 10, 68, 371, 23, 1467] From 96d560f5b18d385fbd0bc1050f5e88cf4b184f9b Mon Sep 17 00:00:00 2001 From: Simon Chow Date: Wed, 3 Jun 2026 18:19:42 +0000 Subject: [PATCH 05/27] bench(fullhistory): add 2026-06-03 cross-machine results report New-data-only report over the 2026-06-03 runs (4 machines) on the rewritten rpc-hack bench harness. Notes methodology changes vs 2026-05-21: ops/s is no longer comparable across runs (only single-in-flight p50 latency is), the sweep axis is now query-concurrency 1-16, and ledger/tx-page/tx-hash read coverage narrowed while events query + ingest stage detail broadened. 
Co-Authored-By: Claude Opus 4.8 (1M context) --- .../results/2026-06-03-cross-machine.md | 441 ++++++++++++++++++ 1 file changed, 441 insertions(+) create mode 100644 cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-03-cross-machine.md diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-03-cross-machine.md b/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-03-cross-machine.md new file mode 100644 index 000000000..23246e83a --- /dev/null +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-03-cross-machine.md @@ -0,0 +1,441 @@ +# stellar-rpc full-history bench comparison — 2026-06-03 + +Cross-machine summary of `cmd/stellar-rpc/scripts/bench-fullhistory` runs from +2026-06-03. Source per-iter and per-sweep CSVs live at +`gs://rpc-full-history/benchmarks/2026-06-03//`. Every number here +is recomputed directly from those CSVs. + +This run uses the **rewritten bench harness** (`rpc-hack`, commit `a16dfcc6`), +which is not the same harness as the 2026-05-21 report. See +[§9 Methodology changes since 2026-05-21](#9-methodology-changes-since-2026-05-21) +before comparing the two — several axes are no longer 1:1. + +## 1. Test machines + +| Instance | Arch | vCPUs | RAM | Local disk | CPU | Commit | +|---|---|---|---|---|---|---| +| c6id.2xlarge | x86_64 | 8 | 15 GB | 441 GB NVMe | Intel Xeon Platinum 8375C @ 2.90GHz | a16dfcc6 | +| c6id.4xlarge | x86_64 | 16 | 31 GB | 870 GB NVMe | Intel Xeon Platinum 8375C @ 2.90GHz | a16dfcc6 | +| c6id.8xlarge | x86_64 | 32 | 62 GB | 1700 GB NVMe | Intel Xeon Platinum 8375C @ 2.90GHz | **ed4b7ced** | +| im4gn.4xlarge | aarch64 | 16 | 62 GB | 6800 GB NVMe | AWS Graviton2 (Neoverse-N1) | a16dfcc6 | + +All ran the same toolchain (Go 1.26.3, RocksDB 10.9.1, zstd 1.5.7) on a local +NVMe instance store, driven by `run-all-benches.sh` with `INGEST_FIRST=1` (each +box ingests its own hot/cold/txhash stores, then reads from them). 
+ +> **Heads-up:** c6id.8xlarge ran a *different* commit (`ed4b7ced`) than the +> other three (`a16dfcc6`). `ed4b7ced` is not present in this branch's history, +> so its exact delta is unknown — treat 8xlarge as "approximately the same +> build" and don't over-read small 8xlarge-only differences. + +### Data layout + +- **Reads** (`cold-ledgers`, `cold-txpage`, `cold-txhash`, `cold-events`) run + against freshly-ingested stores from this run. Cold ledger/txpage reads use + the prebuilt 141-chunk seed (chunks 5859–5999) in `cold/`; cold-txhash uses a + freshly-built MPHF index, and cold-events points at the bucketed events dir. +- **hot** reads use chunk 5860; **cold** ledger reads sample randomly across the + full 141-chunk seed (page cache evicted per iter). +- **Ingest** ran 16 cold chunks (5860–5875) on the c6id boxes and **140 chunks** + on im4gn — so absolute ingest *wall* and total-key counts differ on im4gn; + per-item rates remain comparable. + +## 2. Read latency at single in-flight (p50) + +`--query-concurrency=1`, p50 milliseconds. This is the cleanest cross-run number +(one request at a time, no queueing). `cold/hot/×` = cold p50 / hot p50 / ratio. + +| Machine | ledgers n=20 | tx-page p=20 | tx-hash roundtrip | events query | +|---|---|---|---|---| +| c6id.2xlarge | 14.3 / 13.6 / 1.1× | 12.3 / 10.3 / 1.2× | 12.2 / 11.5 / 1.1× | 15.8 / 5.5 / 2.8× | +| c6id.4xlarge | 15.2 / 12.9 / 1.2× | 11.9 / 10.4 / 1.1× | 12.2 / 11.0 / 1.1× | 15.5 / 5.3 / 2.9× | +| c6id.8xlarge | 14.8 / 13.2 / 1.1× | 11.5 / 9.8 / 1.2× | 11.7 / 10.6 / 1.1× | 16.0 / 5.2 / 3.1× | +| im4gn.4xlarge | 27.5 / 24.8 / 1.1× | 20.3 / 18.5 / 1.1× | 21.7 / 20.1 / 1.1× | 20.0 / 9.1 / 2.2× | + +*For point reads, cold and hot are now nearly identical (~1.1×): the +decode/materialize CPU cost dominates and a cold packfile open adds only ~1 ms +on warm NVMe. 
**events** is the exception — cold evicts and re-opens packs per +query, so hot is 2–3× faster.* + +```mermaid +xychart-beta + title "Read p50 @ concurrency=1 (cold tier, ms)" + x-axis [ledgers, tx-page, tx-hash, events] + y-axis "p50 ms" 0 --> 30 + bar [14.3, 12.3, 12.2, 15.8] + bar [15.2, 11.9, 12.2, 15.5] + bar [14.8, 11.5, 11.7, 16.0] + bar [27.5, 20.3, 21.7, 20.0] +``` +*Series order: c6id.2xlarge, c6id.4xlarge, c6id.8xlarge, im4gn.4xlarge. The +three x86 boxes are within noise of each other; Graviton2 trails by ~1.3–1.8×.* + +## 3. Concurrency scaling (1 → 16 in-flight queries) + +`--query-concurrency` sweep. Cells are `p50 ms | ops/s`. `ops/s` is wall-clock +throughput (successful iters ÷ sweep wall) — it scales up with concurrency until +the box saturates, while p50 latency climbs as queries queue. + +### ledgers (n=20) + +| Machine | tier | c=1 | c=4 | c=8 | c=16 | peak ops/s | +|---|---|---|---|---|---|---| +| c6id.2xlarge | cold | 14.3 \| 67 | 15.7 \| 230 | 26.3 \| 250 | 88.1 \| 178 | 250 | +| c6id.2xlarge | hot | 13.6 \| 72 | 16.2 \| 235 | 22.0 \| 344 | 42.0 \| 363 | 363 | +| c6id.4xlarge | cold | 15.2 \| 64 | 15.0 \| 246 | 16.0 \| 428 | 25.8 \| 501 | 501 | +| c6id.4xlarge | hot | 12.9 \| 75 | 14.3 \| 258 | 16.4 \| 448 | 21.6 \| 702 | 702 | +| c6id.8xlarge | cold | 14.8 \| 67 | 14.6 \| 258 | 14.9 \| 483 | 17.9 \| 775 | 775 | +| c6id.8xlarge | hot | 13.2 \| 75 | 13.3 \| 280 | 14.1 \| 538 | 16.5 \| 913 | 913 | +| im4gn.4xlarge | cold | 27.5 \| 36 | 28.1 \| 138 | 27.7 \| 271 | 30.3 \| 483 | 483 | +| im4gn.4xlarge | hot | 24.8 \| 39 | 24.4 \| 160 | 24.4 \| 311 | 25.8 \| 587 | 587 | + +### tx-page (page=20) + +| Machine | tier | c=1 | c=4 | c=8 | c=16 | peak ops/s | +|---|---|---|---|---|---|---| +| c6id.2xlarge | cold | 12.3 \| 60 | 17.0 \| 218 | 26.0 \| 281 | 44.4 \| 289 | 289 | +| c6id.2xlarge | hot | 10.3 \| 95 | 15.6 \| 240 | 24.9 \| 294 | 38.7 \| 302 | 302 | +| c6id.4xlarge | cold | 11.9 \| 47 | 14.7 \| 256 | 17.8 \| 422 | 29.5 \| 509 | 509 | +| 
c6id.4xlarge | hot | 10.4 \| 94 | 13.6 \| 278 | 16.7 \| 447 | 28.2 \| 520 | 520 | +| c6id.8xlarge | cold | 11.5 \| 50 | 13.6 \| 276 | 15.7 \| 478 | 20.7 \| 717 | 717 | +| c6id.8xlarge | hot | 9.8 \| 100 | 12.1 \| 320 | 14.3 \| 518 | 20.0 \| 745 | 745 | +| im4gn.4xlarge | cold | 20.3 \| 27 | 22.0 \| 173 | 23.3 \| 329 | 31.8 \| 459 | 459 | +| im4gn.4xlarge | hot | 18.5 \| 54 | 20.4 \| 188 | 21.8 \| 348 | 29.8 \| 493 | 493 | + +### tx-hash (roundtrip path) + +| Machine | tier | c=1 | c=4 | c=8 | c=16 | peak ops/s | +|---|---|---|---|---|---|---| +| c6id.2xlarge | cold | 12.2 \| 81 | 17.4 \| 219 | 29.2 \| 263 | 50.3 \| 263 | 263 | +| c6id.2xlarge | hot | 11.5 \| 89 | 17.2 \| 231 | 27.7 \| 276 | 42.0 \| 285 | 285 | +| c6id.4xlarge | cold | 12.2 \| 81 | 15.1 \| 254 | 19.1 \| 404 | 32.6 \| 479 | 479 | +| c6id.4xlarge | hot | 11.0 \| 92 | 14.5 \| 278 | 18.4 \| 431 | 30.6 \| 508 | 508 | +| c6id.8xlarge | cold | 11.7 \| 84 | 14.1 \| 275 | 16.8 \| 462 | 22.6 \| 689 | 689 | +| c6id.8xlarge | hot | 10.6 \| 96 | 13.4 \| 302 | 16.1 \| 494 | 22.9 \| 700 | 700 | +| im4gn.4xlarge | cold | 21.7 \| 42 | 22.3 \| 178 | 23.4 \| 339 | 30.9 \| 506 | 506 | +| im4gn.4xlarge | hot | 20.1 \| 51 | 22.7 \| 181 | 24.1 \| 339 | 32.9 \| 471 | 471 | + +### events (random K-filter query) + +| Machine | tier | c=1 | c=4 | c=8 | c=16 | peak ops/s | +|---|---|---|---|---|---|---| +| c6id.2xlarge | cold | 15.8 \| 54 | 31.8 \| 104 | 63.4 \| 105 | 106.6 \| 118 | 118 | +| c6id.2xlarge | hot | 5.5 \| 199 | 6.0 \| 375 | 10.3 \| 409 | 16.9 \| 430 | 430 | +| c6id.4xlarge | cold | 15.5 \| 54 | 15.8 \| 211 | 32.0 \| 210 | 53.9 \| 239 | 239 | +| c6id.4xlarge | hot | 5.3 \| 219 | 5.8 \| 600 | 6.5 \| 764 | 12.0 \| 828 | 828 | +| c6id.8xlarge | cold | 16.0 \| 50 | 15.0 \| 228 | 17.1 \| 408 | 26.1 \| 504 | 504 | +| c6id.8xlarge | hot | 5.2 \| 237 | 5.5 \| 720 | 6.0 \| 1147 | 8.6 \| 1453 | 1453 | +| im4gn.4xlarge | cold | 20.0 \| 29 | 22.5 \| 163 | 25.8 \| 272 | 39.1 \| 327 | 327 | +| im4gn.4xlarge | hot | 9.1 \| 144 | 
9.5 \| 460 | 9.7 \| 638 | 12.8 \| 738 | 738 | + +```mermaid +xychart-beta + title "Hot read throughput vs concurrency (c6id.8xlarge, ops/s)" + x-axis "query-concurrency" [1, 4, 8, 16] + y-axis "ops/sec" 0 --> 1500 + line [75, 280, 538, 913] + line [100, 320, 518, 745] + line [96, 302, 494, 700] + line [237, 720, 1147, 1453] +``` +*Series order: ledgers, tx-page, tx-hash, events. On the 32-vCPU box every +workload keeps gaining throughput through 16 in-flight queries (roughly 6–12× the c=1 rate); events reaches the highest absolute rate +because hot event queries are pure in-memory bitmap intersects.* + +*On the 8-vCPU c6id.2xlarge, latency balloons past c=8 (e.g. cold-ledgers +14→88 ms at c=16) — that's query oversubscription on 8 cores, not a storage +limit. The 16- and 32-vCPU boxes hold latency roughly flat through c=16.* + +## 4. Cold vs hot + +At a single in-flight query the two tiers are within ~10–20% for ledger, tx-page, +and tx-hash reads (§2) — the read is decode-bound and a warm-NVMe packfile open +is cheap. The tiers diverge under two conditions: + +- **events**: hot is 2–3× faster at c=1 and the gap widens under load (cold + re-opens + evicts packs per query; hot keeps an in-memory term index). +- **concurrency on small boxes**: cold's per-iter page-cache eviction makes it + more sensitive to oversubscription than hot (compare cold vs hot c=16 on + c6id.2xlarge across every workload). + +## 5. tx-hash roundtrip breakdown + +`getTransaction(hash)` over the full round-trip path (MPHF lookup → packfile +fetch → decode → re-serialize each field). All sampled hashes were **hits** +(the bench samples only present hashes this run — no miss cohort). Per-iter +columns in `cold-txhash-roundtrip.csv` decompose total into +`lookup → pack_open → fetch → scan → materialize`; scan (zstd decode of the +LCM) dominates, with materialize (field re-serialization) second. + +cold and hot land within ~1.1× at c=1 (§2) because, once the packfile is open on +warm NVMe, both tiers pay the same decode + materialize CPU.
The MPHF lookup +itself is ~5–20 µs. + +## 6. events query workload (new) + +`cold-events-query` / `hot-events-query` issue randomized event-filter queries. +Each iter draws K filters (K sampled from `1,2,3,5,8,12,15`) partitioned from a +per-chunk 15-term universe (3 highest-volume contracts + 12 highest-volume +topics). Columns: `n_filters`, `n_unique_terms`, `query_ns`, `n_events`. + +- **hot** is a CPU-bound bitmap intersect over an in-memory term index — p50 + ~5 ms on x86, ~9 ms on Graviton2, scaling to >1,400 ops/s on the 32-vCPU box. +- **cold** must open + evict packs and read the on-disk term index per query, so + p50 is ~2–3× higher and the p99 tail is heavy (85–844 ms across the sweep in §12) — cold event queries + are the most tail-sensitive workload in the suite. + +This is a *new* workload (the 2026-05-21 run measured event *ingest*, not event +*query*), so there is no like-for-like prior number. + +## 7. xdr-view extraction & ingest stage costs + +Ingest now runs as unified `hot-ingest` / `cold-ingest` commands that emit +per-stage timing breakdowns (`*-view.csv`). p50 in ms per item (per ledger for +`write`/`extract`; per event-batch for event stages): + +| Machine | hot ledger write | hot tx extract | hot ev extract | hot ev write | cold ev extract | cold ev term-index | cold ev append | +|---|---|---|---|---|---|---|---| +| c6id.2xlarge | 2.59 | 0.47 | 1.39 | 7.24 | 2.68 | 0.76 | 0.12 | +| c6id.4xlarge | 2.47 | 0.46 | 1.37 | 6.63 | 2.67 | 0.82 | 0.12 | +| c6id.8xlarge | 2.47 | 0.46 | 1.37 | 6.51 | 1.75 | 0.70 | 0.10 | +| im4gn.4xlarge | 4.63 | 0.71 | 2.26 | 10.24 | 2.40 | 0.83 | 0.15 | + +*Hot event `write` (RocksDB put + WAL) is the single most expensive ingest +stage (~6.5–10 ms/batch). xdr-view extraction is cheap (~0.5 ms/ledger for +tx-hash, ~1.4 ms for events). Graviton2 is ~1.5–1.9× slower on the hot +extract/write stages.* + +## 8.
build-txhash-index & ingest driver + +`build-txhash-index` is the CPU-bound phase-2 MPHF build (k-way streamhash merge ++ index construction): + +| Machine | keys | feed s | finish s | keys/s | idx MB | +|---|---|---|---|---|---| +| c6id.2xlarge | 46,153,867 | 1.79 | 0.07 | 24,809,654 | 199 | +| c6id.4xlarge | 46,153,867 | 1.17 | 0.07 | 37,113,017 | 199 | +| c6id.8xlarge | 46,153,867 | 1.10 | 0.11 | 38,311,284 | 199 | +| im4gn.4xlarge | 380,286,251 | 8.86 | 0.91 | 38,906,747 | 1,638 | + +*im4gn built over 140 chunks (380 M keys) vs 16 chunks (46 M keys) on the c6id +boxes — the absolute key count and index size differ, but the per-key rate +(~38 M keys/s) is comparable to the 16-/32-vCPU x86 boxes. The 8-vCPU c6id.2xlarge +is the outlier (~25 M keys/s) — fewer parallel block-build workers.* + +Hot ingest sustained ~74–80 ledgers/s on the c6id boxes and ~49 ledgers/s on +im4gn (`hot-driver-view` total-per-ledger p50 of 12.5–13.5 ms x86, 20.5 ms ARM). + +## 9. Methodology changes since 2026-05-21 + +The harness was rewritten on `rpc-hack`. Before comparing to the 2026-05-21 +report, note: + +- **Sweep axis renamed and re-scoped.** The old "workers" sweep (1,2,4,8,16,32) + is now "query-concurrency" (1,4,8,16). **There is no 32-worker data**, and the + peak ops/s ceiling is therefore lower by construction. +- **`ops/s` is computed differently and is *not* comparable across runs.** In + this run `ops/s ≈ concurrency ÷ p50` (clean wall-clock). In the 2026-05-21 run + the throughput wall absorbed large per-iter overhead (e.g. old cold n=1 w=1 + showed 2.2 ms p50 but only 66 ops/s). **Only single-in-flight p50 latency is a + valid cross-run comparison.** +- **Ledger reads: only `n=20` survived.** The harness writes a fixed + `cold-ledgers.csv` / `hot-ledgers.csv` regardless of `--n`, so the + `--n=1`/`--n=10` invocations were overwritten by the final `--n=20` run. The + 2026-05-21 report's n=1 and n=10 ledger numbers have no counterpart here. 
+- **tx-page: only `page=20`** (old run also had 100 and 200). +- **tx-hash: roundtrip only, hits only.** The xdr-views read-latency variant and + the miss cohort were not captured as separate read benches this run (xdr-view + cost now appears as the `extract` ingest stage in §7). +- **events: query, not ingest.** §6 is a brand-new workload. +- **Different chunks/data.** Hot reads use chunk 5860 (was 5000); cold txhash/ + events use freshly-built stores. Absolute latencies reflect different ledger + data than the prior run. + +## 10. Architecture: x86 vs ARM (same vCPU count) + +c6id.4xlarge (Ice Lake, 16 vCPU) vs im4gn.4xlarge (Graviton2, 16 vCPU), p50 at +c=1. arm/x86 > 1 means ARM is slower. + +| Workload | tier | x86 | arm | arm/x86 | +|---|---|---|---|---| +| ledgers n=20 | cold | 15.2 ms | 27.5 ms | 1.81× | +| ledgers n=20 | hot | 12.9 ms | 24.8 ms | 1.92× | +| tx-page p=20 | cold | 11.9 ms | 20.3 ms | 1.70× | +| tx-page p=20 | hot | 10.4 ms | 18.5 ms | 1.77× | +| tx-hash roundtrip | cold | 12.2 ms | 21.7 ms | 1.78× | +| tx-hash roundtrip | hot | 11.0 ms | 20.1 ms | 1.83× | +| events query | cold | 15.5 ms | 20.0 ms | 1.28× | +| events query | hot | 5.3 ms | 9.1 ms | 1.72× | + +*Graviton2 trails Ice Lake by ~1.7–1.9× on the decode-bound read paths this run — +a wider per-operation gap than the 2026-05-21 run reported (~1.4–1.6×). The +read paths here are dominated by single-threaded zstd decode + XDR work, where +the 8375C's higher per-core throughput shows most. Cold event query is the +tightest (1.28×) because it is more I/O- than CPU-bound.* + +## 11. Caveats + +- **c6id.8xlarge ran commit `ed4b7ced`**, the other three ran `a16dfcc6`. + `ed4b7ced` is not in this branch — its delta is unverified. +- **Throughput (`ops/s`) is not comparable to 2026-05-21** (see §9). Cross-run + comparisons are restricted to single-in-flight p50 latency.
+- **Only one `n` / page-size / path survived per workload** (§9) — the 6/03 + dataset is narrower than 5/21 on the read side, broader on events (query) and + ingest stage detail. +- **Oversubscription**: on the 8-vCPU c6id.2xlarge, c=16 cells measure scheduler + behavior under 2× oversubscription, not raw scaling. +- **im4gn ingested 140 cold chunks** vs 16 on the c6id boxes; ingest wall and + absolute key/index sizes differ accordingly (per-item rates are comparable). +- All sampled tx-hash lookups were hits; there is no miss-latency cohort. + +## 12. Per-machine raw results + +Every sweep cell per machine. `ops/s` is wall-clock throughput for that cell. + +### c6id.2xlarge — 8 vCPU x86_64, 15 GB RAM, 441 GB NVMe + +| Bench | c | p50 ms | p90 ms | p99 ms | ops/s | +|---|---|---|---|---|---| +| cold-ledgers (n=20) | 1 | 14.33 | 17.64 | 22.16 | 67 | +| cold-ledgers (n=20) | 4 | 15.73 | 21.89 | 29.50 | 230 | +| cold-ledgers (n=20) | 8 | 26.25 | 47.83 | 62.50 | 250 | +| cold-ledgers (n=20) | 16 | 88.05 | 104.07 | 131.09 | 178 | +| hot-ledgers (n=20) | 1 | 13.59 | 16.61 | 18.00 | 72 | +| hot-ledgers (n=20) | 4 | 16.16 | 22.33 | 25.86 | 235 | +| hot-ledgers (n=20) | 8 | 21.99 | 27.58 | 34.07 | 344 | +| hot-ledgers (n=20) | 16 | 41.98 | 58.17 | 77.74 | 363 | +| cold-txpage (p=20) | 1 | 12.32 | 16.33 | 30.23 | 60 | +| cold-txpage (p=20) | 4 | 16.96 | 25.53 | 39.68 | 218 | +| cold-txpage (p=20) | 8 | 26.04 | 39.41 | 64.43 | 281 | +| cold-txpage (p=20) | 16 | 44.43 | 96.26 | 161.38 | 289 | +| hot-txpage (p=20) | 1 | 10.32 | 15.13 | 21.79 | 95 | +| hot-txpage (p=20) | 4 | 15.59 | 23.06 | 42.11 | 240 | +| hot-txpage (p=20) | 8 | 24.90 | 37.90 | 63.77 | 294 | +| hot-txpage (p=20) | 16 | 38.68 | 96.32 | 184.61 | 302 | +| cold-txhash (roundtrip) | 1 | 12.20 | 16.38 | 21.63 | 81 | +| cold-txhash (roundtrip) | 4 | 17.41 | 26.07 | 33.86 | 219 | +| cold-txhash (roundtrip) | 8 | 29.24 | 41.34 | 52.94 | 263 | +| cold-txhash (roundtrip) | 16 | 50.27 | 104.41 | 170.11 | 263 | +| 
hot-txhash (roundtrip) | 1 | 11.49 | 14.63 | 18.44 | 89 | +| hot-txhash (roundtrip) | 4 | 17.23 | 23.82 | 30.08 | 231 | +| hot-txhash (roundtrip) | 8 | 27.65 | 39.60 | 50.13 | 276 | +| hot-txhash (roundtrip) | 16 | 42.00 | 102.24 | 176.65 | 285 | +| cold-events (query) | 1 | 15.77 | 22.47 | 87.55 | 54 | +| cold-events (query) | 4 | 31.81 | 51.86 | 236.44 | 104 | +| cold-events (query) | 8 | 63.41 | 92.74 | 465.74 | 105 | +| cold-events (query) | 16 | 106.61 | 160.96 | 843.57 | 118 | +| hot-events (query) | 1 | 5.55 | 11.97 | 16.08 | 199 | +| hot-events (query) | 4 | 6.00 | 34.03 | 45.96 | 375 | +| hot-events (query) | 8 | 10.27 | 65.78 | 107.08 | 409 | +| hot-events (query) | 16 | 16.92 | 113.25 | 182.61 | 430 | + +### c6id.4xlarge — 16 vCPU x86_64, 31 GB RAM, 870 GB NVMe + +| Bench | c | p50 ms | p90 ms | p99 ms | ops/s | +|---|---|---|---|---|---| +| cold-ledgers (n=20) | 1 | 15.23 | 19.42 | 27.57 | 64 | +| cold-ledgers (n=20) | 4 | 15.04 | 18.63 | 25.62 | 246 | +| cold-ledgers (n=20) | 8 | 16.01 | 22.25 | 29.08 | 428 | +| cold-ledgers (n=20) | 16 | 25.81 | 46.80 | 59.89 | 501 | +| hot-ledgers (n=20) | 1 | 12.93 | 16.71 | 19.49 | 75 | +| hot-ledgers (n=20) | 4 | 14.28 | 19.63 | 24.53 | 258 | +| hot-ledgers (n=20) | 8 | 16.41 | 21.67 | 25.63 | 448 | +| hot-ledgers (n=20) | 16 | 21.64 | 27.62 | 33.69 | 702 | +| cold-txpage (p=20) | 1 | 11.91 | 15.97 | 27.99 | 47 | +| cold-txpage (p=20) | 4 | 14.67 | 21.56 | 33.23 | 256 | +| cold-txpage (p=20) | 8 | 17.85 | 26.57 | 43.70 | 422 | +| cold-txpage (p=20) | 16 | 29.55 | 43.42 | 72.16 | 509 | +| hot-txpage (p=20) | 1 | 10.43 | 14.74 | 25.85 | 94 | +| hot-txpage (p=20) | 4 | 13.61 | 20.15 | 32.70 | 278 | +| hot-txpage (p=20) | 8 | 16.67 | 25.34 | 41.81 | 447 | +| hot-txpage (p=20) | 16 | 28.17 | 45.10 | 74.09 | 520 | +| cold-txhash (roundtrip) | 1 | 12.21 | 16.63 | 20.86 | 81 | +| cold-txhash (roundtrip) | 4 | 15.08 | 21.87 | 28.21 | 254 | +| cold-txhash (roundtrip) | 8 | 19.09 | 27.79 | 35.06 | 404 | +| cold-txhash 
(roundtrip) | 16 | 32.62 | 46.34 | 59.33 | 479 | +| hot-txhash (roundtrip) | 1 | 11.03 | 13.98 | 17.65 | 92 | +| hot-txhash (roundtrip) | 4 | 14.45 | 19.43 | 24.45 | 278 | +| hot-txhash (roundtrip) | 8 | 18.40 | 25.50 | 32.21 | 431 | +| hot-txhash (roundtrip) | 16 | 30.59 | 44.43 | 58.53 | 508 | +| cold-events (query) | 1 | 15.55 | 21.90 | 85.02 | 54 | +| cold-events (query) | 4 | 15.79 | 25.29 | 112.38 | 211 | +| cold-events (query) | 8 | 32.03 | 48.95 | 215.62 | 210 | +| cold-events (query) | 16 | 53.87 | 80.82 | 399.12 | 239 | +| hot-events (query) | 1 | 5.30 | 8.07 | 14.43 | 219 | +| hot-events (query) | 4 | 5.75 | 16.16 | 26.34 | 600 | +| hot-events (query) | 8 | 6.46 | 30.32 | 41.46 | 764 | +| hot-events (query) | 16 | 11.97 | 57.30 | 85.80 | 828 | + +### c6id.8xlarge — 32 vCPU x86_64, 62 GB RAM, 1700 GB NVMe (commit ed4b7ced) + +| Bench | c | p50 ms | p90 ms | p99 ms | ops/s | +|---|---|---|---|---|---| +| cold-ledgers (n=20) | 1 | 14.81 | 18.56 | 26.60 | 67 | +| cold-ledgers (n=20) | 4 | 14.63 | 17.93 | 21.66 | 258 | +| cold-ledgers (n=20) | 8 | 14.94 | 19.19 | 26.79 | 483 | +| cold-ledgers (n=20) | 16 | 17.91 | 25.08 | 32.28 | 775 | +| hot-ledgers (n=20) | 1 | 13.18 | 16.07 | 17.24 | 75 | +| hot-ledgers (n=20) | 4 | 13.29 | 16.88 | 20.62 | 280 | +| hot-ledgers (n=20) | 8 | 14.07 | 18.34 | 22.85 | 538 | +| hot-ledgers (n=20) | 16 | 16.51 | 21.47 | 25.91 | 913 | +| cold-txpage (p=20) | 1 | 11.48 | 15.60 | 27.68 | 50 | +| cold-txpage (p=20) | 4 | 13.63 | 19.17 | 29.98 | 276 | +| cold-txpage (p=20) | 8 | 15.68 | 22.77 | 37.21 | 478 | +| cold-txpage (p=20) | 16 | 20.67 | 31.30 | 49.72 | 717 | +| hot-txpage (p=20) | 1 | 9.85 | 13.79 | 22.03 | 100 | +| hot-txpage (p=20) | 4 | 12.09 | 17.27 | 29.26 | 320 | +| hot-txpage (p=20) | 8 | 14.34 | 21.18 | 34.78 | 518 | +| hot-txpage (p=20) | 16 | 20.05 | 30.17 | 49.32 | 745 | +| cold-txhash (roundtrip) | 1 | 11.66 | 15.87 | 20.96 | 84 | +| cold-txhash (roundtrip) | 4 | 14.12 | 19.58 | 25.86 | 275 | +| cold-txhash 
(roundtrip) | 8 | 16.84 | 23.56 | 30.49 | 462 | +| cold-txhash (roundtrip) | 16 | 22.57 | 32.23 | 41.31 | 689 | +| hot-txhash (roundtrip) | 1 | 10.59 | 13.63 | 16.98 | 96 | +| hot-txhash (roundtrip) | 4 | 13.39 | 17.57 | 22.32 | 302 | +| hot-txhash (roundtrip) | 8 | 16.08 | 21.98 | 27.43 | 494 | +| hot-txhash (roundtrip) | 16 | 22.86 | 31.39 | 39.15 | 700 | +| cold-events (query) | 1 | 15.99 | 21.83 | 85.55 | 50 | +| cold-events (query) | 4 | 14.97 | 22.06 | 108.68 | 228 | +| cold-events (query) | 8 | 17.09 | 25.16 | 87.11 | 408 | +| cold-events (query) | 16 | 26.14 | 41.10 | 162.38 | 504 | +| hot-events (query) | 1 | 5.17 | 7.13 | 13.96 | 237 | +| hot-events (query) | 4 | 5.52 | 12.70 | 19.35 | 720 | +| hot-events (query) | 8 | 5.96 | 15.37 | 25.25 | 1147 | +| hot-events (query) | 16 | 8.61 | 27.05 | 37.25 | 1453 | + +### im4gn.4xlarge — 16 vCPU aarch64, 62 GB RAM, 6800 GB NVMe + +| Bench | c | p50 ms | p90 ms | p99 ms | ops/s | +|---|---|---|---|---|---| +| cold-ledgers (n=20) | 1 | 27.50 | 32.42 | 35.49 | 36 | +| cold-ledgers (n=20) | 4 | 28.09 | 33.24 | 37.69 | 138 | +| cold-ledgers (n=20) | 8 | 27.69 | 34.44 | 43.33 | 271 | +| cold-ledgers (n=20) | 16 | 30.27 | 37.74 | 45.95 | 483 | +| hot-ledgers (n=20) | 1 | 24.79 | 30.46 | 33.49 | 39 | +| hot-ledgers (n=20) | 4 | 24.42 | 29.28 | 35.02 | 160 | +| hot-ledgers (n=20) | 8 | 24.41 | 29.09 | 34.28 | 311 | +| hot-ledgers (n=20) | 16 | 25.75 | 31.70 | 39.83 | 587 | +| cold-txpage (p=20) | 1 | 20.27 | 26.30 | 49.91 | 27 | +| cold-txpage (p=20) | 4 | 21.95 | 29.11 | 49.96 | 173 | +| cold-txpage (p=20) | 8 | 23.26 | 31.16 | 53.99 | 329 | +| cold-txpage (p=20) | 16 | 31.77 | 48.03 | 76.74 | 459 | +| hot-txpage (p=20) | 1 | 18.46 | 25.01 | 37.72 | 54 | +| hot-txpage (p=20) | 4 | 20.36 | 28.13 | 47.28 | 188 | +| hot-txpage (p=20) | 8 | 21.84 | 29.63 | 50.37 | 348 | +| hot-txpage (p=20) | 16 | 29.77 | 46.85 | 73.54 | 493 | +| cold-txhash (roundtrip) | 1 | 21.74 | 26.73 | 32.45 | 42 | +| cold-txhash (roundtrip) | 4 | 22.34 
| 27.70 | 33.46 | 178 | +| cold-txhash (roundtrip) | 8 | 23.38 | 29.33 | 34.91 | 339 | +| cold-txhash (roundtrip) | 16 | 30.93 | 41.84 | 51.14 | 506 | +| hot-txhash (roundtrip) | 1 | 20.14 | 24.96 | 31.13 | 51 | +| hot-txhash (roundtrip) | 4 | 22.70 | 27.80 | 34.41 | 181 | +| hot-txhash (roundtrip) | 8 | 24.10 | 29.56 | 36.42 | 339 | +| hot-txhash (roundtrip) | 16 | 32.93 | 47.49 | 62.26 | 471 | +| cold-events (query) | 1 | 19.97 | 30.07 | 144.87 | 29 | +| cold-events (query) | 4 | 22.50 | 31.33 | 108.36 | 163 | +| cold-events (query) | 8 | 25.79 | 36.69 | 152.50 | 272 | +| cold-events (query) | 16 | 39.12 | 60.28 | 303.35 | 327 | +| hot-events (query) | 1 | 9.09 | 11.66 | 13.55 | 144 | +| hot-events (query) | 4 | 9.54 | 16.94 | 23.12 | 460 | +| hot-events (query) | 8 | 9.73 | 33.73 | 46.26 | 638 | +| hot-events (query) | 16 | 12.78 | 64.18 | 97.76 | 738 | From 4bc73ae021e681e55374650010c8ceb3f7ccce37 Mon Sep 17 00:00:00 2001 From: Simon Chow Date: Wed, 3 Jun 2026 18:22:06 +0000 Subject: [PATCH 06/27] bench(fullhistory): add streamlined 2026-06-03 summary table Condensed two-table view (typical p50 latency + peak throughput) with a full glossary defining every row, column, tier, and variable (n, page, c, p50/p99, ops/s). Links back to the full cross-machine report. 
Co-Authored-By: Claude Opus 4.8 (1M context) --- .../results/2026-06-03-summary-table.md | 72 +++++++++++++++++++ 1 file changed, 72 insertions(+) create mode 100644 cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-03-summary-table.md diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-03-summary-table.md b/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-03-summary-table.md new file mode 100644 index 000000000..cc662e471 --- /dev/null +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-03-summary-table.md @@ -0,0 +1,72 @@ +# stellar-rpc full-history bench — 2026-06-03 streamlined summary + +A condensed, self-contained view of the 2026-06-03 cross-machine run. Full +report with concurrency sweeps, ingest stages, and per-machine raw cells: +[`2026-06-03-cross-machine.md`](./2026-06-03-cross-machine.md). All numbers are +recomputed from `gs://rpc-full-history/benchmarks/2026-06-03/`. + +## Glossary — what every term means + +**Machines (rows):** AWS EC2 instances. `c6id` = Intel Ice Lake x86; `im4gn` = +AWS Graviton2 ARM. The size suffix scales vCPUs: `2xlarge`=8, `4xlarge`=16, +`8xlarge`=32 vCPUs. + +**Tier:** +- **cold** = read from on-disk packfiles, OS page cache evicted + a fresh file + opened on every query (worst case, "data not in memory"). +- **hot** = read from the live RocksDB store with a warm cache (best case, + "recently-served data"). + +**Workloads (columns):** + +| Name | What it does | Fixed param | +|---|---|---| +| **ledgers** | Read a run of consecutive ledgers | `n=20` ledgers per read | +| **tx-page** | Fetch one page of transactions | `page=20` txns per page | +| **tx-hash** | `getTransaction(hash)` full round-trip (lookup → fetch → decode → re-serialize) | hits only | +| **events** | Event-filter query (random mix of contract/topic filters) | — | + +**Variables:** +- **n** = ledgers read per `ledgers` query (here always 20). +- **page** = transactions per `tx-page` query (here always 20). 
+- **c** (query-concurrency) = queries in flight at once. `c=1` = one at a time + (pure latency); higher `c` = load test. +- **p50 / p90 / p99** = latency percentiles in milliseconds. p50 = median + (typical request); p99 = near-worst-case tail (1 in 100 requests is slower). +- **ops/s** = throughput (successful queries per second) at that concurrency. + +## Table 1 — Typical latency (p50 ms, single query, `c=1`) + +Cleanest "how fast is one request" view. Lower = faster. Each cell is +`cold / hot`. + +| Machine (vCPU / arch) | ledgers n=20 | tx-page p=20 | tx-hash | events | +|---|---|---|---|---| +| c6id.2xlarge (8, x86) | 14.3 / 13.6 | 12.3 / 10.3 | 12.2 / 11.5 | 15.8 / 5.5 | +| c6id.4xlarge (16, x86) | 15.2 / 12.9 | 11.9 / 10.4 | 12.2 / 11.0 | 15.5 / 5.3 | +| c6id.8xlarge (32, x86) | 14.8 / 13.2 | 11.5 / 9.8 | 11.7 / 10.6 | 16.0 / 5.2 | +| im4gn.4xlarge (16, ARM) | 27.5 / 24.8 | 20.3 / 18.5 | 21.7 / 20.1 | 20.0 / 9.1 | + +*For point reads, cold ≈ hot (~1.1×) — decode cost dominates, a warm-NVMe file +open is cheap. Only **events** shows a big cold/hot gap (cold ~2–3× slower). The +ARM box is ~1.7–1.9× slower per request than same-vCPU x86.* + +## Table 2 — Peak throughput (ops/s, best across c=1→16) + +How many queries/sec each box sustains under load. Higher = better. Each cell is +`cold / hot`. + +| Machine (vCPU / arch) | ledgers n=20 | tx-page p=20 | tx-hash | events | +|---|---|---|---|---| +| c6id.2xlarge (8, x86) | 250 / 363 | 289 / 302 | 263 / 285 | 118 / 430 | +| c6id.4xlarge (16, x86) | 501 / 702 | 509 / 520 | 479 / 508 | 239 / 828 | +| c6id.8xlarge (32, x86) | 775 / 913 | 717 / 745 | 689 / 700 | 504 / 1453 | +| im4gn.4xlarge (16, ARM) | 483 / 587 | 459 / 493 | 506 / 471 | 327 / 738 | + +*Throughput scales with vCPU count, sub-linearly at the top end (4xl ≈ 1.7–2× the 2xl; 8xl ≈ 1.3–2.1× the 4xl). Hot **events** reaches the highest +rate (pure in-memory bitmap intersect → 1,453 ops/s on the 32-vCPU box).* + +> ⚠️ These `ops/s` figures are **only comparable within this 6/03 run**.
The +> metric was computed differently in the 2026-05-21 run, so do not compare +> throughput across the two reports — single-query p50 latency is the only valid +> cross-run comparison. From e542ee988198dedab6439382ed614bf7f809d9f4 Mon Sep 17 00:00:00 2001 From: Simon Chow Date: Wed, 3 Jun 2026 18:29:51 +0000 Subject: [PATCH 07/27] bench(fullhistory): add per-machine RAM to summary-table glossary Co-Authored-By: Claude Opus 4.8 (1M context) --- .../results/2026-06-03-summary-table.md | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-03-summary-table.md b/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-03-summary-table.md index cc662e471..bec040456 100644 --- a/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-03-summary-table.md +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-03-summary-table.md @@ -7,9 +7,16 @@ recomputed from `gs://rpc-full-history/benchmarks/2026-06-03/`. ## Glossary — what every term means -**Machines (rows):** AWS EC2 instances. `c6id` = Intel Ice Lake x86; `im4gn` = -AWS Graviton2 ARM. The size suffix scales vCPUs: `2xlarge`=8, `4xlarge`=16, -`8xlarge`=32 vCPUs. +**Machines (rows):** AWS EC2 instances. `c6id` = Intel Ice Lake x86 (compute- +optimized, ~2 GB RAM/vCPU); `im4gn` = AWS Graviton2 ARM (storage-optimized, +~4 GB RAM/vCPU).
The size suffix scales vCPUs and RAM: + +| Instance | vCPUs | RAM | +|---|---|---| +| c6id.2xlarge | 8 | 15 GB | +| c6id.4xlarge | 16 | 31 GB | +| c6id.8xlarge | 32 | 62 GB | +| im4gn.4xlarge | 16 | 62 GB | **Tier:** - **cold** = read from on-disk packfiles, OS page cache evicted + a fresh file From 44abac4b5e63625f8098e9add3f555e05492c478 Mon Sep 17 00:00:00 2001 From: Simon Chow Date: Wed, 3 Jun 2026 18:34:43 +0000 Subject: [PATCH 08/27] bench(fullhistory): add ingest benchmarks to streamlined summary table Adds Table 3 (ingest throughput: hot-ingest ledgers/s, build-txhash-index keys/s) and Table 4 (per-stage ingest cost), plus glossary entries for the ingest workloads and ledgers/s, keys/s, and stage terms. Co-Authored-By: Claude Opus 4.8 (1M context) --- .../results/2026-06-03-summary-table.md | 51 ++++++++++++++++++- 1 file changed, 50 insertions(+), 1 deletion(-) diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-03-summary-table.md b/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-03-summary-table.md index bec040456..46d11e911 100644 --- a/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-03-summary-table.md +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-03-summary-table.md @@ -24,7 +24,7 @@ optimized, ~2 GB RAM/vCPU); `im4gn` = AWS Graviton2 ARM (memory-optimized, - **hot** = read from the live RocksDB store with a warm cache (best case, "recently-served data"). -**Workloads (columns):** +**Read workloads (columns in Tables 1–2):** | Name | What it does | Fixed param | |---|---|---| @@ -33,6 +33,14 @@ optimized, ~2 GB RAM/vCPU); `im4gn` = AWS Graviton2 ARM (memory-optimized, | **tx-hash** | `getTransaction(hash)` full round-trip (lookup → fetch → decode → re-serialize) | hits only | | **events** | Event-filter query (random mix of contract/topic filters) | — | +**Ingest workloads (Tables 3–4):** writing data into the stores, not reading it. 
+ +| Name | What it does | +|---|---| +| **hot-ingest** | Single-stream synchronous ingest into the live hot store (each ledger's ledgers/txhash/events written + WAL-fsynced before the next) | +| **cold-ingest** | Bulk ingest of packfile chunks into cold storage (parallel chunk workers) | +| **build-txhash-index** | Phase-2 of the cold tx-hash index: k-way merge of per-chunk hash streams + MPHF (minimal-perfect-hash) construction | + **Variables:** - **n** = ledgers read per `ledgers` query (here always 20). - **page** = transactions per `tx-page` query (here always 20). @@ -41,6 +49,12 @@ optimized, ~2 GB RAM/vCPU); `im4gn` = AWS Graviton2 ARM (memory-optimized, - **p50 / p90 / p99** = latency percentiles in milliseconds. p50 = median (typical request); p99 = near-worst-case tail (1 in 100 requests is slower). - **ops/s** = throughput (successful queries per second) at that concurrency. +- **ledgers/s** = ingest throughput (ledgers written per second), single stream. +- **keys/s** = tx-hash index build rate (hashes indexed per second). +- **stage** = one phase of the ingest pipeline (e.g. `extract` = pull the + xdr-views out of the raw ledger; `write` = persist to RocksDB + WAL; + `term-index` = build the event term→ledger bitmaps; `append` = write to the + cold packfile). Stage timings are per item (per ledger, or per event-batch). ## Table 1 — Typical latency (p50 ms, single query, `c=1`) @@ -73,6 +87,41 @@ How many queries/sec each box sustains under load. Higher = better. Each cell is *Throughput scales with vCPU count (4xl ≈ 1.7–2× the 2xl; 8xl ≈ 1.3–2.1× the 4xl). Hot **events** scales best (pure in-memory bitmap intersect → 1,453 ops/s on the 32-vCPU box).* +## Table 3 — Ingest throughput + +How fast each box writes data in. Higher = better.
+ +| Machine (vCPU / arch) | hot-ingest (ledgers/s) | build-txhash-index (keys/s) | +|---|---|---| +| c6id.2xlarge (8, x86) | 74 | 24.8 M | +| c6id.4xlarge (16, x86) | 79 | 37.1 M | +| c6id.8xlarge (32, x86) | 80 | 38.3 M | +| im4gn.4xlarge (16, ARM) | 49 | 38.9 M | + +*hot-ingest is single-stream and **WAL-fsync-bound**, so it barely scales with +vCPUs (~80 ledgers/s ceiling on x86); the ARM box is ~1.6× slower on the +fsync + encode path. build-txhash-index is CPU-bound and scales with cores up to +~38 M keys/s (the 8-vCPU box is the outlier at ~25 M). Note: im4gn built its +index over 140 chunks (380 M keys, 1.6 GB) vs 16 chunks (46 M keys, 199 MB) on +the c6id boxes — the per-key **rate** is comparable, the absolute size is not.* + +## Table 4 — Ingest per-stage cost (p50 ms per item) + +Where the ingest time goes, broken out by pipeline stage. Lower = faster. Ledger +and `extract`/`write` stages are per ledger; event stages are per event-batch. + +| Machine (vCPU / arch) | hot: ledger write | hot: tx extract | hot: event extract | hot: event write | cold: event extract | cold: event term-index | cold: event append | +|---|---|---|---|---|---|---|---| +| c6id.2xlarge (8, x86) | 2.59 | 0.47 | 1.39 | 7.24 | 2.68 | 0.76 | 0.12 | +| c6id.4xlarge (16, x86) | 2.47 | 0.46 | 1.37 | 6.63 | 2.67 | 0.82 | 0.12 | +| c6id.8xlarge (32, x86) | 2.47 | 0.46 | 1.37 | 6.51 | 1.75 | 0.70 | 0.10 | +| im4gn.4xlarge (16, ARM) | 4.63 | 0.71 | 2.26 | 10.24 | 2.40 | 0.83 | 0.15 | + +*Hot **event write** (RocksDB put + WAL) is the single most expensive stage +(~6.5–10 ms/batch) and dominates hot-ingest cost. xdr-view **extract** is cheap +(~0.5 ms/ledger for tx-hash, ~1.4 ms for events). Graviton2 is ~1.5–1.8× slower +on the CPU-bound extract/write stages.* + > ⚠️ These `ops/s` figures are **only comparable within this 6/03 run**. 
The > metric was computed differently in the 2026-05-21 run, so do not compare > throughput across the two reports — single-query p50 latency is the only valid From 69428cb907b4658bb20b1d426e88df491306fb2c Mon Sep 17 00:00:00 2001 From: Simon Chow Date: Wed, 3 Jun 2026 18:40:02 +0000 Subject: [PATCH 09/27] bench(fullhistory): add estimated cold-ingest rate to summary Table 3 cold-ingest ledgers/s computed as total ledgers / (sum(chunk_wall) / chunk-workers) (upper-bound estimate, since the harness records summed per-chunk wall, not true end-to-end wall). Flagged as an estimate; scales with --chunk-workers. Co-Authored-By: Claude Opus 4.8 (1M context) --- .../results/2026-06-03-summary-table.md | 30 ++++++++++++------- 1 file changed, 20 insertions(+), 10 deletions(-) diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-03-summary-table.md b/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-03-summary-table.md index 46d11e911..154de3af5 100644 --- a/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-03-summary-table.md +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-03-summary-table.md @@ -91,19 +91,29 @@ best (pure in-memory bitmap intersect → 1,453 ops/s on the 32-vCPU box).* How fast each box writes data in. Higher = better. -| Machine (vCPU / arch) | hot-ingest (ledgers/s) | build-txhash-index (keys/s) | -|---|---|---| -| c6id.2xlarge (8, x86) | 74 | 24.8 M | -| c6id.4xlarge (16, x86) | 79 | 37.1 M | -| c6id.8xlarge (32, x86) | 80 | 38.3 M | -| im4gn.4xlarge (16, ARM) | 49 | 38.9 M | +| Machine (vCPU / arch) | hot-ingest (ledgers/s) | cold-ingest (ledgers/s, est.)
| build-txhash-index (keys/s) | +|---|---|---|---| +| c6id.2xlarge (8, x86) | 74 | ~560 | 24.8 M | +| c6id.4xlarge (16, x86) | 79 | ~1,110 | 37.1 M | +| c6id.8xlarge (32, x86) | 80 | ~1,450 | 38.3 M | +| im4gn.4xlarge (16, ARM) | 49 | ~1,080 | 38.9 M | *hot-ingest is single-stream and **WAL-fsync-bound**, so it barely scales with vCPUs (~80 ledgers/s ceiling on x86); the ARM box is ~1.6× slower on the -fsync + encode path. build-txhash-index is CPU-bound and scales with cores up to -~38 M keys/s (the 8-vCPU box is the outlier at ~25 M). Note: im4gn built its -index over 140 chunks (380 M keys, 1.6 GB) vs 16 chunks (46 M keys, 199 MB) on -the c6id boxes — the per-key **rate** is comparable, the absolute size is not.* +fsync + encode path. **cold-ingest** is batched (no per-ledger fsync) and runs +chunks in parallel, so it is ~7–22× faster than hot and scales with +`--chunk-workers` — that's why the 4-worker c6id.2xlarge (~560) trails the +8-worker boxes (~1,100–1,450). build-txhash-index is CPU-bound and scales with +cores up to ~38 M keys/s (the 8-vCPU box is the outlier at ~25 M). Note: im4gn +ingested 140 chunks (1.4 M ledgers; 380 M index keys, 1.6 GB) vs 16 chunks +(160 K ledgers; 46 M keys, 199 MB) on the c6id boxes — per-item **rates** are +comparable, absolute totals are not.* + +> The cold-ingest rate is an **estimate**: the harness records the summed +> per-chunk wall time, not the true end-to-end wall, so the wall time is taken as +> `sum(chunk_wall) ÷ chunk-workers` and the rate is total ledgers ÷ that value. This +> assumes the chunk workers stay fully busy, so the rate is an upper bound. +> hot-ingest and build-txhash-index are measured directly.
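
The cold-ingest estimate described in the note above can be sketched as a small shell helper. This is a minimal illustration and not part of the bench harness: the function name `estimate_cold_ingest_rate` and the sample inputs are hypothetical, and it assumes the total ledger count and the summed per-chunk wall time (in seconds) have already been extracted from the cold-ingest CSVs.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of bench-fullhistory): estimate cold-ingest
# throughput from the summed per-chunk wall time.
#   $1 = total ledgers ingested
#   $2 = sum of per-chunk wall times, in seconds
#   $3 = the --chunk-workers value used for the run
# The estimated wall time is sum(chunk_wall) / workers, which assumes every
# worker stays busy for the whole run, so the printed rate is an upper bound.
estimate_cold_ingest_rate() {
  local ledgers="$1" sum_chunk_wall_s="$2" workers="$3"
  awk -v n="${ledgers}" -v s="${sum_chunk_wall_s}" -v w="${workers}" \
    'BEGIN {
       # Reject inputs that would divide by zero or go negative.
       if (s <= 0 || w <= 0) exit 1
       printf "%.0f\n", n / (s / w)
     }'
}

# Illustrative inputs only (not measured values): 160000 ledgers,
# 1152 s of summed chunk wall time, 8 chunk workers.
estimate_cold_ingest_rate 160000 1152 8
```

Because dividing by `--chunk-workers` assumes perfect overlap between workers, any idle worker time makes the true end-to-end rate lower than the figure this prints.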
## Table 4 — Ingest per-stage cost (p50 ms per item) From efdd7ac9f65705cb20285b43401baaf30aae5490 Mon Sep 17 00:00:00 2001 From: Simon Chow Date: Wed, 3 Jun 2026 18:41:14 +0000 Subject: [PATCH 10/27] bench(fullhistory): fix stale GCS paths + link newer run in 2026-05-21 report Source/summary CSV paths were missing the dated prefix (data lives under .../benchmarks/2026-05-21/, the undated paths don't exist). Also dates the title and forward-links the 2026-06-03 run, noting the harness changed and ops/s is not comparable across runs. Historical 5/21 numbers are unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) --- .../results/2026-05-21-cross-machine.md | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-05-21-cross-machine.md b/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-05-21-cross-machine.md index 3f74fad50..3820ac203 100644 --- a/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-05-21-cross-machine.md +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-05-21-cross-machine.md @@ -1,8 +1,15 @@ -# stellar-rpc full-history bench comparison +# stellar-rpc full-history bench comparison — 2026-05-21 Cross-machine summary of `cmd/stellar-rpc/scripts/bench-fullhistory` runs. -Source per-iter CSVs live at `gs://rpc-full-history/benchmarks//`; -the summary CSVs that back every table here are at `gs://rpc-full-history/benchmarks/_summary/`. +Source per-iter CSVs live at `gs://rpc-full-history/benchmarks/2026-05-21//`; +the summary CSVs that back every table here are at `gs://rpc-full-history/benchmarks/2026-05-21/_summary/`. + +> **Newer run available:** a later run on the rewritten bench harness is at +> [`2026-06-03-cross-machine.md`](./2026-06-03-cross-machine.md) (condensed view: +> [`2026-06-03-summary-table.md`](./2026-06-03-summary-table.md)). 
The harness +> changed between these runs — several benchmarks are no longer 1:1, and `ops/s` +> is **not** comparable across the two; see the 2026-06-03 report's methodology +> section before comparing. ## 1. Test machines From 9e83d5174e4f014ce7a223423bbd7959f3e08d79 Mon Sep 17 00:00:00 2001 From: Simon Chow Date: Tue, 2 Jun 2026 22:38:42 +0000 Subject: [PATCH 11/27] bench(fullhistory): add run-all-benches.sh suite driver MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Drives the full read + ingest bench suite in bench-fullhistory: builds the binary once, then runs cold+hot ledgers/txpage/txhash/events read benches (each a 1,4,8,16 query-concurrency sweep) plus the hot-ingest, cold-ingest, and build-txhash-index ingest benches. By default the reads use prebuilt fixtures and ingest writes to scratch (independent measurements). INGEST_FIRST=1 instead ingests first and repoints every read bench at the freshly-ingested stores, so the suite is self-contained from a single raw-ledger packfile seed — usable on a fresh machine with no prebuilt data. Paths/sizing knobs are env- overridable for running across different machines. Co-Authored-By: Claude Opus 4.8 (1M context) --- .../bench-fullhistory/run-all-benches.sh | 280 ++++++++++++++++++ 1 file changed, 280 insertions(+) create mode 100755 cmd/stellar-rpc/scripts/bench-fullhistory/run-all-benches.sh diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/run-all-benches.sh b/cmd/stellar-rpc/scripts/bench-fullhistory/run-all-benches.sh new file mode 100755 index 000000000..f5a42236c --- /dev/null +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/run-all-benches.sh @@ -0,0 +1,280 @@ +#!/usr/bin/env bash +# +# run-all-benches.sh — drive the full read + ingest bench suite in +# bench-fullhistory. 
+# +# Builds the bench binary once and runs: +# +# Read benches — cold + hot variants of: +# - ledgers (sweeps --n=1,10,20: single-ledger fetch, mid-page, full page) +# - txpage (ledger-range transaction page lookup) +# - txhash (single-hash getTransaction lookup) +# - events (eventstore.Query) +# Each read bench does a 1,4,8,16 --query-concurrency sweep. +# +# Ingest benches (skip the whole section with RUN_INGEST=0): +# - hot-ingest (single chunk -> fresh RocksDB hot store) +# - cold-ingest (multi-chunk -> fresh cold packfiles) +# - build-txhash-index (phase 2: cold .bin files -> queryable .idx) +# +# By default ingest writes to scratch (INGEST_OUT_ROOT) and the reads use the +# prebuilt fixtures, so the two halves are independent. Set INGEST_FIRST=1 to +# instead ingest FIRST and point every read bench at the freshly-ingested +# stores — a self-contained build+read run needing no prebuilt data, only +# INGEST_SOURCE_COLD_DIR (the raw-ledger packfile seed). +# +# All write CSVs + logs under $OUT_DIR. Adjust the paths/values at the top +# before running. + +set -euo pipefail + +# --------------------------------------------------------------------------- +# config — edit these for your machine. +# --------------------------------------------------------------------------- + +# Where bench data lives. +COLD_LEDGERS_DIR="/mnt/nvme/ledgers/cold" +COLD_TXHASH_MPHF="/mnt/nvme/ledgers/txhash-cold/00005000-00005999.idx" +# NB: cold-events wants the *bucket* dir (.../events/<5-digit bucket>), not +# the events root — it globs *-events.pack directly in this dir. +COLD_EVENTS_DIR="/mnt/nvme/bench-ingest/sweep-cold-2026-05-27/events/00005" + +# Hot stores are tied to a single chunk; the sweep-hot dir is consistent +# across all three data types (ledgers/txhash/events) for chunk 5860. 
+HOT_CHUNK=5860 +HOT_LEDGERS_DIR="/mnt/nvme/bench-ingest/sweep-hot-2026-05-27/ledgers" +HOT_TXHASH_DIR="/mnt/nvme/bench-ingest/sweep-hot-2026-05-27/txhash" +HOT_EVENTS_DIR="/mnt/nvme/bench-ingest/sweep-hot-2026-05-27/events/hot" + +# Sweep + sizing knobs. +QUERY_CONCURRENCY="1,4,8,16" +LEDGER_NS=(1 10 20) # --n values for {cold,hot}-ledgers +LEDGERS_ITERS=60 +TXPAGE_ITERS=200 +TXHASH_ITERS=1000 +EVENTS_ITERS=500 +PAGE_SIZE=20 +SEED=1 + +# Ingest knobs. Ingest re-reads raw ledgers from a cold packfile *source* +# (INGEST_SOURCE_COLD_DIR) and writes fresh hot/cold stores under +# INGEST_OUT_ROOT/{hot,cold} (wiped each run). +# +# RUN_INGEST=0 skip the ingest benches entirely. +# INGEST_FIRST=1 ingest BEFORE the reads and point every hot/cold read +# bench at the freshly-ingested stores instead of the +# prebuilt fixtures above. Self-contained: the only input is +# INGEST_SOURCE_COLD_DIR (raw-ledger packfile seed). Implies +# the ingest section runs. +RUN_INGEST="${RUN_INGEST:-1}" +INGEST_FIRST="${INGEST_FIRST:-0}" +# Seed packfile for ingest. Captured from COLD_LEDGERS_DIR here, BEFORE the +# bootstrap block below may repoint COLD_LEDGERS_DIR at the ingest output. +INGEST_SOURCE_COLD_DIR="${INGEST_SOURCE_COLD_DIR:-${COLD_LEDGERS_DIR}}" +INGEST_TYPES="ledgers,txhash,events" +INGEST_CHUNK="${HOT_CHUNK}" # first chunk to ingest +COLD_INGEST_NUM_CHUNKS=16 # cold-ingest consecutive chunks +COLD_INGEST_CHUNK_WORKERS=8 # cold-ingest chunk concurrency +INGEST_OUT_ROOT="${INGEST_OUT_ROOT:-/mnt/nvme/bench-ingest-out}" +HOT_INGEST_OUT="${INGEST_OUT_ROOT}/hot" # hot-ingest --hot-dir +COLD_INGEST_OUT="${INGEST_OUT_ROOT}/cold" # cold-ingest --cold-out-dir +COLD_INGEST_IDX="${COLD_INGEST_OUT}/txhash/index.idx" # build-txhash-index output + +# Bootstrap mode: the reads consume what we're about to ingest, so repoint the +# read-bench dirs at the ingest output. 
hot-ingest/cold-ingest create per-type +# subdirs {ledgers,txhash,events}; the hot events store sits at /events +# and the read bench opens it at that same level (no extra /hot segment). +if [[ "${INGEST_FIRST}" == "1" ]]; then + RUN_INGEST=1 + HOT_LEDGERS_DIR="${HOT_INGEST_OUT}/ledgers" + HOT_TXHASH_DIR="${HOT_INGEST_OUT}/txhash" + HOT_EVENTS_DIR="${HOT_INGEST_OUT}/events" + COLD_LEDGERS_DIR="${COLD_INGEST_OUT}/ledgers" + COLD_TXHASH_MPHF="${COLD_INGEST_IDX}" + # cold-events is bucket-scoped: --cold-events-dir must be the bucket dir + # (/events/<5-digit bucket>), NOT the events root — the bench globs + # *-events.pack directly in that dir and opens it as the bucketDir. All + # chunks in one cold-ingest run share a bucket (BucketID = chunk/1000) as + # long as the range doesn't cross a 1000-chunk boundary. + COLD_EVENTS_DIR="${COLD_INGEST_OUT}/events/$(printf '%05d' $((INGEST_CHUNK / 1000)))" +fi + +# Output. Timestamped subdir so reruns don't clobber prior CSVs. +OUT_ROOT="${OUT_ROOT:-bench-out}" +OUT_DIR="${OUT_ROOT}/run-$(date -u +%Y%m%dT%H%M%SZ)" +LOG_DIR="${OUT_DIR}/logs" + +# Go binary (override with GO=...). Falls back to `go` on PATH. +GO="${GO:-$(command -v go || echo /home/simon/go-toolchain/go/bin/go)}" + +# grocksdb cgo deps. The system librocksdb (8.9) is too old for grocksdb +# v1.10.x; this points at a user-local v10.x build. Same story for zstd: +# packfile zstd codec needs >=1.5.7 at runtime but system has 1.5.5. +# Override either prefix if yours lives elsewhere. 
+ROCKSDB_PREFIX="${ROCKSDB_PREFIX:-/home/simon/.rocksdb}" +ZSTD_PREFIX="${ZSTD_PREFIX:-/home/simon/.zstd}" +export CGO_CFLAGS="${CGO_CFLAGS:-} -I${ROCKSDB_PREFIX}/include" +export CGO_LDFLAGS="${CGO_LDFLAGS:-} -L${ROCKSDB_PREFIX}/lib -lrocksdb" +export LD_LIBRARY_PATH="${ZSTD_PREFIX}/lib:${ROCKSDB_PREFIX}/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" + +# --------------------------------------------------------------------------- +# build + setup +# --------------------------------------------------------------------------- + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +BIN="${SCRIPT_DIR}/bench-fullhistory" + +mkdir -p "${LOG_DIR}" + +echo "==> building bench-fullhistory binary (using ${GO})" +(cd "${SCRIPT_DIR}" && "${GO}" build -o "${BIN}" .) + +echo "==> output dir: ${OUT_DIR}" + +# Run one bench: $1 = sub-command, remaining args = flags. Tees stdout to a +# per-run log; non-zero exit prints a marker but does not abort the suite, +# so a single missing data dir doesn't cancel the rest. +run_bench() { + local cmd="$1"; shift + local label="$1"; shift + local log="${LOG_DIR}/${label}.log" + echo + echo "==============================================================" + echo "==> ${label}: bench-fullhistory ${cmd} $*" + echo " log: ${log}" + echo "==============================================================" + if ! "${BIN}" "${cmd}" "$@" --out="${OUT_DIR}" --seed="${SEED}" \ + --query-concurrency="${QUERY_CONCURRENCY}" 2>&1 | tee "${log}"; then + echo "!! ${label} FAILED (continuing)" + fi +} + +# Run one ingest bench: same teeing/continue-on-failure behavior as +# run_bench, but ingest commands take neither --seed nor --query-concurrency. 
+run_ingest() { + local cmd="$1"; shift + local label="$1"; shift + local log="${LOG_DIR}/${label}.log" + echo + echo "==============================================================" + echo "==> ${label}: bench-fullhistory ${cmd} $*" + echo " log: ${log}" + echo "==============================================================" + if ! "${BIN}" "${cmd}" "$@" --out="${OUT_DIR}" 2>&1 | tee "${log}"; then + echo "!! ${label} FAILED (continuing)" + fi +} + +# Build fresh hot + cold stores from the cold packfile seed. Output dirs must +# be empty, so each is wiped first. In INGEST_FIRST mode this runs before the +# reads (feeding them); otherwise after, as an independent measurement. +do_ingest() { + # hot tier — single chunk -> fresh RocksDB store. + rm -rf "${HOT_INGEST_OUT}" + run_ingest hot-ingest "hot-ingest" \ + --types="${INGEST_TYPES}" --source=pack \ + --cold-dir="${INGEST_SOURCE_COLD_DIR}" \ + --chunk="${INGEST_CHUNK}" --hot-dir="${HOT_INGEST_OUT}" \ + --xdr-views + + # cold tier — multi-chunk -> fresh packfiles. + rm -rf "${COLD_INGEST_OUT}" + run_ingest cold-ingest "cold-ingest" \ + --types="${INGEST_TYPES}" --source=pack \ + --cold-dir="${INGEST_SOURCE_COLD_DIR}" \ + --chunk="${INGEST_CHUNK}" --num-chunks="${COLD_INGEST_NUM_CHUNKS}" \ + --chunk-workers="${COLD_INGEST_CHUNK_WORKERS}" \ + --cold-out-dir="${COLD_INGEST_OUT}" --xdr-views + + # cold txhash phase 2 — merge the .bin files cold-ingest wrote into a + # single queryable .idx (only runs if txhash was one of INGEST_TYPES). + if [[ "${INGEST_TYPES}" == *txhash* ]]; then + run_ingest build-txhash-index "build-txhash-index" \ + --in-dir="${COLD_INGEST_OUT}/txhash" \ + --idx-out="${COLD_INGEST_IDX}" + fi +} + +# --------------------------------------------------------------------------- +# ingest-first: build the stores the reads below will consume. 
+# --------------------------------------------------------------------------- + +if [[ "${INGEST_FIRST}" == "1" ]]; then + echo + echo "==> INGEST_FIRST=1: ingesting into ${INGEST_OUT_ROOT} before reads" + do_ingest +fi + +# --------------------------------------------------------------------------- +# ledger reads — sweep --n across {1, 10, 20} +# --------------------------------------------------------------------------- + +for n in "${LEDGER_NS[@]}"; do + run_bench cold-ledgers "cold-ledgers-n${n}" \ + --cold-dir="${COLD_LEDGERS_DIR}" \ + --n="${n}" --iters="${LEDGERS_ITERS}" + + run_bench hot-ledgers "hot-ledgers-n${n}" \ + --hot-dir="${HOT_LEDGERS_DIR}" --chunk="${HOT_CHUNK}" \ + --n="${n}" --iters="${LEDGERS_ITERS}" +done + +# --------------------------------------------------------------------------- +# transaction-page reads (ledger-range tx lookup) +# --------------------------------------------------------------------------- + +run_bench cold-txpage "cold-txpage" \ + --cold-dir="${COLD_LEDGERS_DIR}" \ + --page-size="${PAGE_SIZE}" --iters="${TXPAGE_ITERS}" + +run_bench hot-txpage "hot-txpage" \ + --hot-dir="${HOT_LEDGERS_DIR}" --chunk="${HOT_CHUNK}" \ + --page-size="${PAGE_SIZE}" --iters="${TXPAGE_ITERS}" + +# --------------------------------------------------------------------------- +# single-hash getTransaction lookup +# --------------------------------------------------------------------------- + +run_bench cold-txhash "cold-txhash" \ + --cold-dir="${COLD_LEDGERS_DIR}" \ + --txhash-cold-mphf="${COLD_TXHASH_MPHF}" \ + --iters="${TXHASH_ITERS}" + +run_bench hot-txhash "hot-txhash" \ + --hot-dir="${HOT_LEDGERS_DIR}" \ + --txhash-hot="${HOT_TXHASH_DIR}" \ + --cold-dir="${COLD_LEDGERS_DIR}" \ + --chunk="${HOT_CHUNK}" \ + --iters="${TXHASH_ITERS}" + +# --------------------------------------------------------------------------- +# eventstore.Query +# --------------------------------------------------------------------------- + +run_bench cold-events 
"cold-events" \ + --cold-events-dir="${COLD_EVENTS_DIR}" \ + --iters="${EVENTS_ITERS}" + +run_bench hot-events "hot-events" \ + --hot-dir="${HOT_EVENTS_DIR}" --chunk="${HOT_CHUNK}" \ + --iters="${EVENTS_ITERS}" + +# --------------------------------------------------------------------------- +# ingest — in INGEST_FIRST mode it already ran above (feeding the reads); +# otherwise run it now as an independent measurement (RUN_INGEST=0 skips). +# --------------------------------------------------------------------------- + +if [[ "${INGEST_FIRST}" == "1" ]]; then + echo + echo "==> ingest already ran before the reads (INGEST_FIRST=1)" +elif [[ "${RUN_INGEST}" != "0" ]]; then + do_ingest +else + echo + echo "==> RUN_INGEST=0, skipping ingest benches" +fi + +echo +echo "==============================================================" +echo "==> done. CSVs + logs under ${OUT_DIR}" +echo "==============================================================" From b712b861d5338b60320641aa0d9521f845db4196 Mon Sep 17 00:00:00 2001 From: Simon Chow Date: Wed, 3 Jun 2026 21:10:40 +0000 Subject: [PATCH 12/27] =?UTF-8?q?bench(fullhistory):=20address=20PR=20#750?= =?UTF-8?q?=20=E2=80=94=20txpage=20materialization=20+=20xdr-views?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit PR #750 review (tamirms) flagged two harness gaps and several execution issues. Code fixes: - txpage (hot+cold) previously only touched TransactionHash + ResultPair — it never fetched the page contents, so it measured a tx *count*, not a getTransactions response. New walkPageMaterialize (tx_page_helpers.go) builds a full db.Transaction per tx in the page (envelope, result, meta, events, hash, application order, ledger info). - txpage (hot+cold) had no --xdr-views flag, so it only measured the slow full-decode path. Added --xdr-views with a single-pass view materializer, mirroring the txhash bench. 
CSVs suffix -roundtrip / -xdrviews; detail column scan_ns -> materialize_ns (decode_ns stays 0 under views). Execution (run-all-benches.sh): - Run the decode-heavy query benches (txpage/txhash/events) once per mode (QUERY_VIEW_MODES = roundtrip + xdrviews) so the report can compare with/ without XDR views. Previously every query ran views-off (slow path). - Events use the worst-case query (EVENTS_BUCKETS=15, max filters/request). - Ingest runs with --parallel; hot-ingest runs both xdr-views on and off (the views run feeds the reads, the parsed run is kept for its CSVs). Smoke-tested: 0 errors, pages fully materialized; views 4-8x faster than round-trip (decode_ns=0 confirms the path dispatch). Co-Authored-By: Claude Opus 4.8 (1M context) --- .../bench-fullhistory/bench_cold_txpage.go | 85 ++--- .../bench-fullhistory/bench_hot_txpage.go | 34 +- .../bench-fullhistory/run-all-benches.sh | 92 +++-- .../bench-fullhistory/tx_page_helpers.go | 332 ++++++++++++++++++ 4 files changed, 449 insertions(+), 94 deletions(-) create mode 100644 cmd/stellar-rpc/scripts/bench-fullhistory/tx_page_helpers.go diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/bench_cold_txpage.go b/cmd/stellar-rpc/scripts/bench-fullhistory/bench_cold_txpage.go index 77f98d5f3..0095d6df3 100644 --- a/cmd/stellar-rpc/scripts/bench-fullhistory/bench_cold_txpage.go +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/bench_cold_txpage.go @@ -46,6 +46,9 @@ func cmdColdTxPage() { iters := fs.Int("iters", 200, "number of timed pages per worker") workersCSV := fs.String("query-concurrency", "1", "concurrent in-flight queries; comma-list sweep (e.g. 1,4,16)") seed := fs.Int64("seed", 1, "RNG seed") + xdrViews := fs.Bool("xdr-views", false, + "materialize the page via zero-copy XDR views (no UnmarshalBinary + ParseTransaction round-trip). 
"+ + "false = production path (lcm.UnmarshalBinary + ingest reader + db.ParseTransaction).") outDir := fs.String("out", "bench-out", "CSV output dir") _ = fs.Parse(os.Args[1:]) @@ -78,10 +81,16 @@ func cmdColdTxPage() { for _, cp := range preflights { totalTx += cp.totalTx } - logger.Infof("cold-txpage chunks=[%d,%d] usable=%d page=%d iters=%d workers=%v totalTx=%d", - chunkLo, chunkHi, len(preflights), *page, *iters, workersList, totalTx) + logger.Infof("cold-txpage chunks=[%d,%d] usable=%d page=%d iters=%d workers=%v totalTx=%d xdr-views=%v", + chunkLo, chunkHi, len(preflights), *page, *iters, workersList, totalTx, *xdrViews) - detailPath := filepath.Join(*outDir, fmt.Sprintf("cold-txpage-%d.csv", *page)) + // CSV filename gets a "-xdrviews"/"-roundtrip" suffix so the two + // materialization modes don't overwrite each other (mirrors txhash). + suffix := "-roundtrip" + if *xdrViews { + suffix = "-xdrviews" + } + detailPath := filepath.Join(*outDir, fmt.Sprintf("cold-txpage-%d%s.csv", *page, suffix)) if err := os.MkdirAll(*outDir, 0o755); err != nil { fatal(logger, "mkdir %s: %v", *outDir, err) } @@ -91,11 +100,11 @@ func cmdColdTxPage() { } defer detailF.Close() if _, err := fmt.Fprintln(detailF, - "query_concurrency,chunk,cursor_seq,cursor_tx,n_ledgers,open_ns,fetch_ns,decode_ns,scan_ns,total_ns"); err != nil { + "query_concurrency,chunk,cursor_seq,cursor_tx,n_ledgers,open_ns,fetch_ns,decode_ns,materialize_ns,total_ns"); err != nil { fatal(logger, "write CSV header: %v", err) } - summaryF, summaryPath, err := createCSV(*outDir, fmt.Sprintf("cold-txpage-%d-sweep", *page), sweepCSVHeader) + summaryF, summaryPath, err := createCSV(*outDir, fmt.Sprintf("cold-txpage-%d%s-sweep", *page, suffix), sweepCSVHeader) if err != nil { fatal(logger, "%v", err) } @@ -106,7 +115,7 @@ func cmdColdTxPage() { var csvMu sync.Mutex results := make([]concurrentResult, 0, len(workersList)) for _, w := range workersList { - op := coldTxPageOp(*coldDir, preflights, *page, w, detailF, 
&csvMu) + op := coldTxPageOp(*coldDir, preflights, *page, *xdrViews, w, detailF, &csvMu) res := runConcurrentSweep(w, *iters, *seed, op) printSweepRow(w, res, summaryF) results = append(results, res) @@ -150,7 +159,9 @@ func preflightAllChunks(logger *supportlog.Entry, coldDir string, lo, hi uint32, func coldTxPageOp( coldDir string, preflights []chunkPreflight, - page, workers int, + page int, + xdrViews bool, + workers int, detailF *os.File, csvMu *sync.Mutex, ) iterOp { @@ -171,7 +182,8 @@ func coldTxPageOp( defer cr.Close() li, ti := pickCursor(cp.infos, page, rng) - fetchNs, decodeNs, scanNs, nLedgers, got, walkErr := walkPagePhased(cr.GetLedgerRaw, cp.infos, li, ti, page) + fetchNs, decodeNs, materializeNs, nLedgers, got, walkErr := walkPageMaterialize( + cr.GetLedgerRaw, cp.infos, li, ti, page, xdrViews, pubnetPassphrase) if walkErr != nil { return 0, fmt.Errorf("walk: %w", walkErr) } @@ -179,7 +191,7 @@ func coldTxPageOp( return 0, fmt.Errorf("short read: got %d, want %d", got, page) } - totalNs := openNs + fetchNs + decodeNs + scanNs + totalNs := openNs + fetchNs + decodeNs + materializeNs if measured { csvMu.Lock() @@ -187,7 +199,7 @@ func coldTxPageOp( workers, cp.chunkID, cp.infos[li].seq, ti, nLedgers, openNs.Nanoseconds(), fetchNs.Nanoseconds(), - decodeNs.Nanoseconds(), scanNs.Nanoseconds(), + decodeNs.Nanoseconds(), materializeNs.Nanoseconds(), totalNs.Nanoseconds()) csvMu.Unlock() if werr != nil { @@ -254,54 +266,5 @@ func pickCursor(infos []ledgerTxCount, page int, rng *rand.Rand) (int, int) { } } -// walkPagePhased reads ledgers starting from (ledgerIdx, txIdx) and -// emits one tx at a time, tracking per-phase totals across the walk: -// -// fetch = sum of getLedger() call times -// decode = sum of UnmarshalBinary times -// scan = sum of per-tx Hash + ResultPair touches -// nLed = number of distinct ledgers touched -// got = txs emitted -// -// Stops when `wanted` txs have been emitted or the chunk is exhausted. 
-func walkPagePhased( - getLedger func(uint32) ([]byte, error), - infos []ledgerTxCount, - ledgerIdx, txIdx, wanted int, -) (fetch, decode, scan time.Duration, nLed, got int, err error) { - remaining := wanted - for i := ledgerIdx; i < len(infos) && remaining > 0; i++ { - t0 := time.Now() - raw, gerr := getLedger(infos[i].seq) - fetch += time.Since(t0) - if gerr != nil { - err = gerr - return - } - nLed++ - - t1 := time.Now() - var lcm goxdr.LedgerCloseMeta - if uerr := lcm.UnmarshalBinary(raw); uerr != nil { - decode += time.Since(t1) - err = uerr - return - } - decode += time.Since(t1) - - t2 := time.Now() - nTx := lcm.CountTransactions() - startIdx := 0 - if i == ledgerIdx { - startIdx = txIdx - } - for j := startIdx; j < nTx && remaining > 0; j++ { - _ = lcm.TransactionHash(j) - _ = lcm.TransactionResultPair(j) - got++ - remaining-- - } - scan += time.Since(t2) - } - return -} +// Page materialization (walkPageMaterialize + the view/round-trip helpers) +// lives in tx_page_helpers.go and is shared by the cold and hot benches. diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/bench_hot_txpage.go b/cmd/stellar-rpc/scripts/bench-fullhistory/bench_hot_txpage.go index 3467ea9dc..1ee899ffa 100644 --- a/cmd/stellar-rpc/scripts/bench-fullhistory/bench_hot_txpage.go +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/bench_hot_txpage.go @@ -35,6 +35,9 @@ func cmdHotTxPage() { workersCSV := fs.String("query-concurrency", "1", "concurrent in-flight queries; comma-list sweep (e.g. 1,4,16)") warmup := fs.Int("warmup", hotWarmupSharedIters, "warm-up pages per worker (RocksDB block-cache priming; not counted)") seed := fs.Int64("seed", 1, "RNG seed") + xdrViews := fs.Bool("xdr-views", false, + "materialize the page via zero-copy XDR views (no UnmarshalBinary + ParseTransaction round-trip). 
"+ + "false = production path (lcm.UnmarshalBinary + ingest reader + db.ParseTransaction).") outDir := fs.String("out", "bench-out", "CSV output dir") _ = fs.Parse(os.Args[1:]) @@ -64,12 +67,18 @@ func cmdHotTxPage() { if totalTx < *page { fatal(logger, "hot store has only %d txs but page-size=%d", totalTx, *page) } - logger.Infof("hot-txpage chunk=%d page=%d iters=%d workers=%v warmup=%d (preflight: %d ledgers, %d total tx, avg %.1f/ledger)", - chunkID, *page, *iters, workersList, *warmup, len(infos), totalTx, float64(totalTx)/float64(len(infos))) - + logger.Infof("hot-txpage chunk=%d page=%d iters=%d workers=%v warmup=%d xdr-views=%v (preflight: %d ledgers, %d total tx, avg %.1f/ledger)", + chunkID, *page, *iters, workersList, *warmup, *xdrViews, len(infos), totalTx, float64(totalTx)/float64(len(infos))) + + // CSV filename gets a "-xdrviews"/"-roundtrip" suffix so the two + // materialization modes don't overwrite each other (mirrors txhash). + suffix := "-roundtrip" + if *xdrViews { + suffix = "-xdrviews" + } // Per-iter detail CSV. Workers column lets a post-processor split // by worker count. - detailPath := filepath.Join(*outDir, fmt.Sprintf("hot-txpage-%d.csv", *page)) + detailPath := filepath.Join(*outDir, fmt.Sprintf("hot-txpage-%d%s.csv", *page, suffix)) if err := os.MkdirAll(*outDir, 0o755); err != nil { fatal(logger, "mkdir %s: %v", *outDir, err) } @@ -78,12 +87,12 @@ func cmdHotTxPage() { fatal(logger, "create CSV %s: %v", detailPath, err) } defer detailF.Close() - if _, err := fmt.Fprintln(detailF, "query_concurrency,cursor_seq,cursor_tx,n_ledgers,fetch_ns,decode_ns,scan_ns,total_ns"); err != nil { + if _, err := fmt.Fprintln(detailF, "query_concurrency,cursor_seq,cursor_tx,n_ledgers,fetch_ns,decode_ns,materialize_ns,total_ns"); err != nil { fatal(logger, "write CSV header: %v", err) } // Summary CSV (one row per worker count). 
- summaryF, summaryPath, err := createCSV(*outDir, fmt.Sprintf("hot-txpage-%d-sweep", *page), sweepCSVHeader) + summaryF, summaryPath, err := createCSV(*outDir, fmt.Sprintf("hot-txpage-%d%s-sweep", *page, suffix), sweepCSVHeader) if err != nil { fatal(logger, "%v", err) } @@ -94,7 +103,7 @@ func cmdHotTxPage() { var csvMu sync.Mutex results := make([]concurrentResult, 0, len(workersList)) for _, w := range workersList { - op := hotTxPageOp(h, infos, *page, w, detailF, &csvMu) + op := hotTxPageOp(h, infos, *page, *xdrViews, w, detailF, &csvMu) res := runConcurrentSweepWithWarmup(w, *warmup, *iters, *seed, op) printSweepRow(w, res, summaryF) results = append(results, res) @@ -111,20 +120,23 @@ func cmdHotTxPage() { func hotTxPageOp( h *ledger.HotStore, infos []ledgerTxCount, - page, workers int, + page int, + xdrViews bool, + workers int, detailF *os.File, csvMu *sync.Mutex, ) iterOp { return func(rng *rand.Rand, measured bool) (time.Duration, error) { li, ti := pickCursor(infos, page, rng) - fetchNs, decodeNs, scanNs, nLedgers, got, walkErr := walkPagePhased(h.GetLedgerRaw, infos, li, ti, page) + fetchNs, decodeNs, materializeNs, nLedgers, got, walkErr := walkPageMaterialize( + h.GetLedgerRaw, infos, li, ti, page, xdrViews, pubnetPassphrase) if walkErr != nil { return 0, walkErr } if got != page { return 0, fmt.Errorf("short read: got %d, want %d", got, page) } - totalNs := fetchNs + decodeNs + scanNs + totalNs := fetchNs + decodeNs + materializeNs if measured { csvMu.Lock() @@ -132,7 +144,7 @@ func hotTxPageOp( workers, infos[li].seq, ti, nLedgers, fetchNs.Nanoseconds(), decodeNs.Nanoseconds(), - scanNs.Nanoseconds(), totalNs.Nanoseconds()) + materializeNs.Nanoseconds(), totalNs.Nanoseconds()) csvMu.Unlock() if err != nil { return totalNs, err diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/run-all-benches.sh b/cmd/stellar-rpc/scripts/bench-fullhistory/run-all-benches.sh index f5a42236c..dd35a6713 100755 --- 
a/cmd/stellar-rpc/scripts/bench-fullhistory/run-all-benches.sh +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/run-all-benches.sh @@ -7,14 +7,18 @@ # # Read benches — cold + hot variants of: # - ledgers (sweeps --n=1,10,20: single-ledger fetch, mid-page, full page) -# - txpage (ledger-range transaction page lookup) +# - txpage (page of N transactions, fully materialized to responses) # - txhash (single-hash getTransaction lookup) -# - events (eventstore.Query) -# Each read bench does a 1,4,8,16 --query-concurrency sweep. +# - events (eventstore.Query, worst-case --buckets=EVENTS_BUCKETS) +# Each read bench does a 1,4,8,16 --query-concurrency sweep. The decode-heavy +# ones (txpage/txhash/events) are run once per QUERY_VIEW_MODES entry +# (roundtrip + xdrviews) so the report can compare with/without XDR views. # -# Ingest benches (skip the whole section with RUN_INGEST=0): -# - hot-ingest (single chunk -> fresh RocksDB hot store) -# - cold-ingest (multi-chunk -> fresh cold packfiles) +# Ingest benches (skip the whole section with RUN_INGEST=0; run with +# --parallel): +# - hot-ingest (single chunk -> RocksDB hot store; run both xdr-views +# on and off — the views run feeds the reads) +# - cold-ingest (multi-chunk -> cold packfiles; xdr-views run) # - build-txhash-index (phase 2: cold .bin files -> queryable .idx) # # By default ingest writes to scratch (INGEST_OUT_ROOT) and the reads use the @@ -56,6 +60,18 @@ EVENTS_ITERS=500 PAGE_SIZE=20 SEED=1 +# XDR-views sweep for the decode-heavy query benches (txpage/txhash/events). +# Each runs once per mode so the report can compare with/without views: +# roundtrip = production path (UnmarshalBinary + ParseTransaction) +# xdrviews = zero-copy XDR views (the fast path real servers use) +# Trim to a single mode to ~halve query-bench runtime. Ledgers are raw bytes +# (no decode) so they are not swept. The bench suffixes its CSVs per mode. 
+QUERY_VIEW_MODES=(roundtrip xdrviews) + +# Events worst-case query: filters-per-request (K) maxed out. PR #750 asks +# the events tables to report the worst case (K=15). +EVENTS_BUCKETS="${EVENTS_BUCKETS:-15}" + # Ingest knobs. Ingest re-reads raw ledgers from a cold packfile *source* # (INGEST_SOURCE_COLD_DIR) and writes fresh hot/cold stores under # INGEST_OUT_ROOT/{hot,cold} (wiped each run). @@ -149,6 +165,23 @@ run_bench() { fi } +# Run a decode-heavy query bench (txpage/txhash/events) once per +# QUERY_VIEW_MODES entry: the "xdrviews" mode adds --xdr-views, "roundtrip" +# omits it. The label is suffixed per mode so logs don't collide (the bench +# itself suffixes its CSVs -roundtrip/-xdrviews). +run_query_views() { + local cmd="$1"; shift + local base="$1"; shift + local mode + for mode in "${QUERY_VIEW_MODES[@]}"; do + if [[ "${mode}" == "xdrviews" ]]; then + run_bench "${cmd}" "${base}-xdrviews" "$@" --xdr-views + else + run_bench "${cmd}" "${base}-roundtrip" "$@" + fi + done +} + # Run one ingest bench: same teeing/continue-on-failure behavior as # run_bench, but ingest commands take neither --seed nor --query-concurrency. run_ingest() { @@ -169,22 +202,34 @@ run_ingest() { # be empty, so each is wiped first. In INGEST_FIRST mode this runs before the # reads (feeding them); otherwise after, as an independent measurement. do_ingest() { - # hot tier — single chunk -> fresh RocksDB store. - rm -rf "${HOT_INGEST_OUT}" - run_ingest hot-ingest "hot-ingest" \ + # Ingest runs with --parallel (events/txhash/ledgers ingested concurrently + # per ledger). PR #750 asks hot ingest to be measured both with and without + # xdr-views; the bench suffixes its per-stage CSVs -view / -parsed so they + # don't collide. The xdr-views run produces the store the reads consume; + # the parsed run writes to a throwaway dir (kept only for its CSVs). + + # hot tier — single chunk. Parsed (comparison) then view (feeds the reads). 
+ rm -rf "${HOT_INGEST_OUT}" "${HOT_INGEST_OUT}-parsed" + run_ingest hot-ingest "hot-ingest-parsed" \ + --types="${INGEST_TYPES}" --source=pack \ + --cold-dir="${INGEST_SOURCE_COLD_DIR}" \ + --chunk="${INGEST_CHUNK}" --hot-dir="${HOT_INGEST_OUT}-parsed" \ + --parallel + run_ingest hot-ingest "hot-ingest-view" \ --types="${INGEST_TYPES}" --source=pack \ --cold-dir="${INGEST_SOURCE_COLD_DIR}" \ --chunk="${INGEST_CHUNK}" --hot-dir="${HOT_INGEST_OUT}" \ - --xdr-views + --parallel --xdr-views - # cold tier — multi-chunk -> fresh packfiles. + # cold tier — multi-chunk. View run feeds the cold reads (one mode is + # enough for cold; PR #750's both-modes ask was hot-specific). rm -rf "${COLD_INGEST_OUT}" - run_ingest cold-ingest "cold-ingest" \ + run_ingest cold-ingest "cold-ingest-view" \ --types="${INGEST_TYPES}" --source=pack \ --cold-dir="${INGEST_SOURCE_COLD_DIR}" \ --chunk="${INGEST_CHUNK}" --num-chunks="${COLD_INGEST_NUM_CHUNKS}" \ --chunk-workers="${COLD_INGEST_CHUNK_WORKERS}" \ - --cold-out-dir="${COLD_INGEST_OUT}" --xdr-views + --cold-out-dir="${COLD_INGEST_OUT}" --parallel --xdr-views # cold txhash phase 2 — merge the .bin files cold-ingest wrote into a # single queryable .idx (only runs if txhash was one of INGEST_TYPES). @@ -193,6 +238,9 @@ do_ingest() { --in-dir="${COLD_INGEST_OUT}/txhash" \ --idx-out="${COLD_INGEST_IDX}" fi + + # Drop the throwaway parsed hot store (we kept only its CSVs). 
+ rm -rf "${HOT_INGEST_OUT}-parsed" } # --------------------------------------------------------------------------- @@ -223,11 +271,11 @@ done # transaction-page reads (ledger-range tx lookup) # --------------------------------------------------------------------------- -run_bench cold-txpage "cold-txpage" \ +run_query_views cold-txpage "cold-txpage" \ --cold-dir="${COLD_LEDGERS_DIR}" \ --page-size="${PAGE_SIZE}" --iters="${TXPAGE_ITERS}" -run_bench hot-txpage "hot-txpage" \ +run_query_views hot-txpage "hot-txpage" \ --hot-dir="${HOT_LEDGERS_DIR}" --chunk="${HOT_CHUNK}" \ --page-size="${PAGE_SIZE}" --iters="${TXPAGE_ITERS}" @@ -235,12 +283,12 @@ run_bench hot-txpage "hot-txpage" \ # single-hash getTransaction lookup # --------------------------------------------------------------------------- -run_bench cold-txhash "cold-txhash" \ +run_query_views cold-txhash "cold-txhash" \ --cold-dir="${COLD_LEDGERS_DIR}" \ --txhash-cold-mphf="${COLD_TXHASH_MPHF}" \ --iters="${TXHASH_ITERS}" -run_bench hot-txhash "hot-txhash" \ +run_query_views hot-txhash "hot-txhash" \ --hot-dir="${HOT_LEDGERS_DIR}" \ --txhash-hot="${HOT_TXHASH_DIR}" \ --cold-dir="${COLD_LEDGERS_DIR}" \ @@ -248,16 +296,16 @@ run_bench hot-txhash "hot-txhash" \ --iters="${TXHASH_ITERS}" # --------------------------------------------------------------------------- -# eventstore.Query +# eventstore.Query — worst-case K (--buckets), both materialization modes # --------------------------------------------------------------------------- -run_bench cold-events "cold-events" \ +run_query_views cold-events "cold-events" \ --cold-events-dir="${COLD_EVENTS_DIR}" \ - --iters="${EVENTS_ITERS}" + --buckets="${EVENTS_BUCKETS}" --iters="${EVENTS_ITERS}" -run_bench hot-events "hot-events" \ +run_query_views hot-events "hot-events" \ --hot-dir="${HOT_EVENTS_DIR}" --chunk="${HOT_CHUNK}" \ - --iters="${EVENTS_ITERS}" + --buckets="${EVENTS_BUCKETS}" --iters="${EVENTS_ITERS}" # 
--------------------------------------------------------------------------- # ingest — in INGEST_FIRST mode it already ran above (feeding the reads); diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/tx_page_helpers.go b/cmd/stellar-rpc/scripts/bench-fullhistory/tx_page_helpers.go new file mode 100644 index 000000000..6790cc86e --- /dev/null +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/tx_page_helpers.go @@ -0,0 +1,332 @@ +package main + +import ( + "encoding/hex" + "fmt" + "iter" + "time" + + "github.com/stellar/go-stellar-sdk/ingest" + goxdr "github.com/stellar/go-stellar-sdk/xdr" + + "github.com/stellar/stellar-rpc/cmd/stellar-rpc/internal/db" + "github.com/stellar/stellar-rpc/cmd/stellar-rpc/internal/ledgerbucketwindow" +) + +// walkPageMaterialize reads ledgers starting from (ledgerIdx, txIdx) and +// builds a full db.Transaction response for each of `wanted` transactions — +// the same shape getTransactions returns (envelope, result, meta, events, +// hash, application order, ledger info). This is the corrected txpage path: +// the previous walkPagePhased only touched TransactionHash + ResultPair and +// never materialized the page contents. +// +// Two materialization modes, mirroring the txhash bench: +// - xdrViews=true: zero-copy XDR views, no lcm.UnmarshalBinary, no +// ParseTransaction round-trip (decode stays 0). Upper-bound headroom. +// - xdrViews=false: production path — lcm.UnmarshalBinary (decode) then +// ingest reader + db.ParseTransaction (MarshalBinary each field). +// +// Phase totals returned: fetch (GetLedgerRaw), decode (UnmarshalBinary; +// 0 under views), materialize (building the page of responses). 
+func walkPageMaterialize( + getLedger func(uint32) ([]byte, error), + infos []ledgerTxCount, + ledgerIdx, txIdx, wanted int, + xdrViews bool, + passphrase string, +) (fetch, decode, materialize time.Duration, nLed, got int, err error) { + remaining := wanted + for i := ledgerIdx; i < len(infos) && remaining > 0; i++ { + t0 := time.Now() + raw, gerr := getLedger(infos[i].seq) + fetch += time.Since(t0) + if gerr != nil { + err = gerr + return + } + nLed++ + + start := 0 + if i == ledgerIdx { + start = txIdx + } + avail := infos[i].txCount - start + if avail <= 0 { + continue + } + take := avail + if take > remaining { + take = remaining + } + + var page []db.Transaction + if xdrViews { + tm := time.Now() + page, err = materializePageRangeView(raw, start, take) + materialize += time.Since(tm) + if err != nil { + err = fmt.Errorf("view materialize seq=%d [%d,%d): %w", infos[i].seq, start, start+take, err) + return + } + } else { + td := time.Now() + var lcm goxdr.LedgerCloseMeta + if uerr := lcm.UnmarshalBinary(raw); uerr != nil { + decode += time.Since(td) + err = fmt.Errorf("UnmarshalBinary seq=%d: %w", infos[i].seq, uerr) + return + } + decode += time.Since(td) + + tm := time.Now() + page, err = materializePageRangeRoundtrip(lcm, start, take, passphrase) + materialize += time.Since(tm) + if err != nil { + err = fmt.Errorf("roundtrip materialize seq=%d [%d,%d): %w", infos[i].seq, start, start+take, err) + return + } + } + // Retain the page so the materialization work isn't elided; the + // count is the headline correctness check (got must reach wanted). + got += len(page) + remaining -= len(page) + } + return +} + +// materializePageRangeRoundtrip is the production-shape path: drive the +// ingest reader sequentially and run db.ParseTransaction for each tx in +// [start, start+count). Caller has already UnmarshalBinary'd the LCM and +// timed it as the decode phase. 
+func materializePageRangeRoundtrip(lcm goxdr.LedgerCloseMeta, start, count int, passphrase string) ([]db.Transaction, error) { + reader, err := ingest.NewLedgerTransactionReaderFromLedgerCloseMeta(passphrase, lcm) + if err != nil { + return nil, fmt.Errorf("ingest reader: %w", err) + } + n := lcm.CountTransactions() + end := start + count + out := make([]db.Transaction, 0, count) + for i := 0; i < n && len(out) < count; i++ { + ingestTx, rerr := reader.Read() + if rerr != nil { + return nil, fmt.Errorf("ingest read i=%d: %w", i, rerr) + } + if i < start { + continue // skip txs before the page cursor + } + if i >= end { + break + } + tx, perr := db.ParseTransaction(lcm, ingestTx) + if perr != nil { + return nil, fmt.Errorf("ParseTransaction i=%d: %w", i, perr) + } + out = append(out, tx) + } + if len(out) != count { + return nil, fmt.Errorf("got %d of %d txs (ledger has %d)", len(out), count, n) + } + return out, nil +} + +// txViewParts holds the per-tx fields gathered from one single pass over a +// TxProcessing view (everything except the envelope, which lives in the +// TxSet and is fetched separately by apply index). +type txViewParts struct { + resultRaw []byte + metaRaw []byte + txHash [32]byte + successful bool + diagRaws [][]byte + txEventRaws [][]byte + opEventRaws [][][]byte +} + +// materializePageRangeView builds db.Transactions for apply indices +// [start, start+count) entirely via XDR views. TxProcessing (result + meta +// + events — the heavy part) is gathered in a single pass; the envelope for +// each index is fetched via the existing single-index TxSet helpers +// (cheap navigation, reuses the tested apply-order walk). Mirrors +// materializeViews but for a contiguous page rather than one tx. 
+func materializePageRangeView(raw []byte, start, count int) ([]db.Transaction, error) { + v := goxdr.LedgerCloseMetaView(raw) + dv, err := v.V() + if err != nil { + return nil, err + } + disc, err := dv.Value() + if err != nil { + return nil, err + } + + var ( + ledgerInfo ledgerbucketwindow.LedgerInfo + parts []txViewParts + // envAt returns the envelope bytes + type for a given apply index. + envAt func(idx int) ([]byte, goxdr.EnvelopeType, error) + ) + + switch disc { + case 0: + v0, err := v.V0() + if err != nil { + return nil, err + } + if ledgerInfo, err = ledgerInfoFromHeader(v0.LedgerHeader()); err != nil { + return nil, err + } + tp, err := v0.TxProcessing() + if err != nil { + return nil, err + } + if parts, err = collectTxProcessingRange(tp.Iter(), start, count); err != nil { + return nil, err + } + txSet, err := v0.TxSet() + if err != nil { + return nil, err + } + envAt = func(idx int) ([]byte, goxdr.EnvelopeType, error) { + return envelopeRawAtFromV0TxSet(txSet, idx) + } + case 1: + v1, err := v.V1() + if err != nil { + return nil, err + } + if ledgerInfo, err = ledgerInfoFromHeader(v1.LedgerHeader()); err != nil { + return nil, err + } + tp, err := v1.TxProcessing() + if err != nil { + return nil, err + } + if parts, err = collectTxProcessingRange(tp.Iter(), start, count); err != nil { + return nil, err + } + txSet, err := v1.TxSet() + if err != nil { + return nil, err + } + envAt = func(idx int) ([]byte, goxdr.EnvelopeType, error) { + return envelopeRawAtFromGeneralized(txSet, idx) + } + case 2: + v2, err := v.V2() + if err != nil { + return nil, err + } + if ledgerInfo, err = ledgerInfoFromHeader(v2.LedgerHeader()); err != nil { + return nil, err + } + tp, err := v2.TxProcessing() + if err != nil { + return nil, err + } + if parts, err = collectTxProcessingRange(tp.Iter(), start, count); err != nil { + return nil, err + } + txSet, err := v2.TxSet() + if err != nil { + return nil, err + } + envAt = func(idx int) ([]byte, goxdr.EnvelopeType, error) { 
+ return envelopeRawAtFromGeneralized(txSet, idx) + } + default: + return nil, fmt.Errorf("unknown LedgerCloseMeta V=%d", disc) + } + + out := make([]db.Transaction, len(parts)) + for k := range parts { + applyIdx := start + k + envelopeRaw, envType, eerr := envAt(applyIdx) + if eerr != nil { + return nil, fmt.Errorf("envelope at %d: %w", applyIdx, eerr) + } + out[k] = db.Transaction{ + TransactionHash: hex.EncodeToString(parts[k].txHash[:]), + Result: parts[k].resultRaw, + Meta: parts[k].metaRaw, + Envelope: envelopeRaw, + Events: parts[k].diagRaws, + TransactionEvents: parts[k].txEventRaws, + ContractEvents: parts[k].opEventRaws, + FeeBump: envType == goxdr.EnvelopeTypeEnvelopeTypeTxFeeBump, + ApplicationOrder: int32(applyIdx) + 1, + Successful: parts[k].successful, + Ledger: ledgerInfo, + } + } + return out, nil +} + +// collectTxProcessingRange walks a TxProcessing view iterator once and +// gathers the per-tx fields (result, meta, hash, successful, events) for +// apply indices [start, start+count). Errors if the range isn't fully +// present. 
+func collectTxProcessingRange[T txResultMeta](src iter.Seq2[T, error], start, count int) ([]txViewParts, error) { + end := start + count + out := make([]txViewParts, 0, count) + idx := 0 + for tx, iterErr := range src { + if iterErr != nil { + return nil, iterErr + } + if idx >= end { + break + } + if idx >= start { + rp, err := tx.Result() + if err != nil { + return nil, err + } + hv, err := rp.TransactionHash() + if err != nil { + return nil, err + } + hb, err := hv.Value() + if err != nil { + return nil, err + } + rv, err := rp.Result() + if err != nil { + return nil, err + } + resultRaw, err := rv.Raw() + if err != nil { + return nil, err + } + successful, err := isResultSuccessful(rv) + if err != nil { + return nil, err + } + metaView, err := tx.TxApplyProcessing() + if err != nil { + return nil, err + } + metaRaw, err := metaView.Raw() + if err != nil { + return nil, err + } + diagRaws, txEventRaws, opEventRaws, err := extractEventRawsFromMeta(metaView) + if err != nil { + return nil, err + } + var p txViewParts + copy(p.txHash[:], hb) + p.resultRaw = resultRaw + p.metaRaw = metaRaw + p.successful = successful + p.diagRaws = diagRaws + p.txEventRaws = txEventRaws + p.opEventRaws = opEventRaws + out = append(out, p) + } + idx++ + } + if len(out) != count { + return nil, fmt.Errorf("txprocessing range [%d,%d) not fully present (got %d)", start, end, len(out)) + } + return out, nil +} From 4921174f069268446c5eb195148def9f9497b4d0 Mon Sep 17 00:00:00 2001 From: Simon Chow Date: Wed, 3 Jun 2026 23:03:41 +0000 Subject: [PATCH 13/27] bench(fullhistory): rewrite 2026-06-03 report for fixed harness (PR #750) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Re-ran c6id.8xlarge with the corrected harness and rewrote the report to address the PR #750 review: - New "c6id.8xlarge — corrected" section: query latency split into hot/cold tables with roundtrip vs xdr-views columns and P50+P99; events use worst-case K=15; ingest shown 
hot (parsed vs view, --parallel) and cold with the per-stage phase breakdown + per-ledger driver total.
- The other three machines (2xlarge/4xlarge/im4gn) are marked STALE (old
  harness: tx-page-as-count, views-off) pending a re-run.
- Dropped the per-machine raw-cell dump (§12) — the CSVs are on GCS.
- Summary table: same treatment (banner, corrected c6id.8xlarge rows, stale
  markers on the rest).

Headline corrected numbers: xdr-views cuts tx-page/tx-hash p50 4-9x (hot
tx-hash 10.6->1.2ms) and lifts peak throughput 5-10x (hot tx-hash 706->7253
ops/s); events is decode-insensitive (1.1-1.4x). Hot ingest with views is
~2.1x faster than parsed (skips the 8.4ms/ledger UnmarshalBinary).

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 .../results/2026-06-03-cross-machine.md | 576 ++++++------------
 .../results/2026-06-03-summary-table.md | 79 ++-
 2 files changed, 253 insertions(+), 402 deletions(-)

diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-03-cross-machine.md b/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-03-cross-machine.md
index 23246e83a..745b10188 100644
--- a/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-03-cross-machine.md
+++ b/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-03-cross-machine.md
@@ -2,440 +2,264 @@
 Cross-machine summary of `cmd/stellar-rpc/scripts/bench-fullhistory` runs from
 2026-06-03. Source per-iter and per-sweep CSVs live at
-`gs://rpc-full-history/benchmarks/2026-06-03//`. Every number here
-is recomputed directly from those CSVs.
-
-This run uses the **rewritten bench harness** (`rpc-hack`, commit `a16dfcc6`),
-which is not the same harness as the 2026-05-21 report. See
-[§9 Methodology changes since 2026-05-21](#9-methodology-changes-since-2026-05-21)
-before comparing the two — several axes are no longer 1:1.
+`gs://rpc-full-history/benchmarks/2026-06-03//`.
+
+> ## ⚠️ Harness corrected per PR #750 — read this first
+>
+> PR #750 review (tamirms) found two harness bugs that invalidated the original
+> query numbers, plus several execution gaps. **They are now fixed** (see
+> [§5 What changed](#5-what-changed-pr-750)), and **c6id.8xlarge has been re-run
+> with the fixed harness** (commit on branch `bench/cross-machine-report-2026-05-21`).
+>
+> - **tx-page** previously only touched the tx hash + result pair — it measured a
+>   transaction *count*, not a `getTransactions` response. It now materializes a
+>   full page of responses.
+> - **xdr-views** (the zero-copy decode path real servers use) was disabled on
+>   every query bench, so all numbers were the slow `UnmarshalBinary` +
+>   `ParseTransaction` path. Query benches now run **both** modes.
+> - **events** now uses the **worst-case** query (K=15 filters); **ingest** runs
+>   `--parallel`, both views on and off.
+>
+> **The other three machines (c6id.2xlarge, c6id.4xlarge, im4gn.4xlarge) have NOT
+> been re-run** — their numbers below are from the old harness (views-off,
+> tx-page-as-count) and are marked **🟥 STALE — pending re-run**. Only
+> [§2 c6id.8xlarge (corrected)](#2-c6id8xlarge--corrected-fixed-harness) reflects
+> the fix.
 ## 1.
Test machines
-| Instance | Arch | vCPUs | RAM | Local disk | CPU | Commit |
+| Instance | Arch | vCPUs | RAM | Local disk | CPU | Harness |
 |---|---|---|---|---|---|---|
-| c6id.2xlarge | x86_64 | 8 | 15 GB | 441 GB NVMe | Intel Xeon Platinum 8375C @ 2.90GHz | a16dfcc6 |
-| c6id.4xlarge | x86_64 | 16 | 31 GB | 870 GB NVMe | Intel Xeon Platinum 8375C @ 2.90GHz | a16dfcc6 |
-| c6id.8xlarge | x86_64 | 32 | 62 GB | 1700 GB NVMe | Intel Xeon Platinum 8375C @ 2.90GHz | **ed4b7ced** |
-| im4gn.4xlarge | aarch64 | 16 | 62 GB | 6800 GB NVMe | AWS Graviton2 (Neoverse-N1) | a16dfcc6 |
+| c6id.2xlarge | x86_64 | 8 | 15 GB | 441 GB NVMe | Intel Xeon Platinum 8375C @ 2.90GHz | 🟥 old (a16dfcc6) |
+| c6id.4xlarge | x86_64 | 16 | 31 GB | 870 GB NVMe | Intel Xeon Platinum 8375C @ 2.90GHz | 🟥 old (a16dfcc6) |
+| **c6id.8xlarge** | x86_64 | 32 | 62 GB | 1700 GB NVMe | Intel Xeon Platinum 8375C @ 2.90GHz | **✅ fixed (PR #750)** |
+| im4gn.4xlarge | aarch64 | 16 | 62 GB | 6800 GB NVMe | AWS Graviton2 (Neoverse-N1) | 🟥 old (a16dfcc6) |
 All ran the same toolchain (Go 1.26.3, RocksDB 10.9.1, zstd 1.5.7) on a local
 NVMe instance store, driven by `run-all-benches.sh` with `INGEST_FIRST=1` (each
 box ingests its own hot/cold/txhash stores, then reads from them).
-> **Heads-up:** c6id.8xlarge ran a *different* commit (`ed4b7ced`) than the
-> other three (`a16dfcc6`). `ed4b7ced` is not present in this branch's history,
-> so its exact delta is unknown — treat 8xlarge as "approximately the same
-> build" and don't over-read small 8xlarge-only differences.
-
 ### Data layout
-- **Reads** (`cold-ledgers`, `cold-txpage`, `cold-txhash`, `cold-events`) run
-  against freshly-ingested stores from this run. Cold ledger/txpage reads use
-  the prebuilt 141-chunk seed (chunks 5859–5999) in `cold/`; cold-txhash uses a
-  freshly-built MPHF index, and cold-events points at the bucketed events dir.
-- **hot** reads use chunk 5860; **cold** ledger reads sample randomly across the
-  full 141-chunk seed (page cache evicted per iter).
-- **Ingest** ran 16 cold chunks (5860–5875) on the c6id boxes and **140 chunks**
-  on im4gn — so absolute ingest *wall* and total-key counts differ on im4gn;
-  per-item rates remain comparable.
+- **Reads** run against the freshly-ingested stores from this run. In
+  `INGEST_FIRST` mode the cold reads consume the **16-chunk** re-ingested store
+  (chunks 5860–5875), **not** the 141-chunk seed — so cold-ledgers samples a
+  16-chunk working set here. (The stale machines' original report read cold
+  ledgers across the full 141-chunk seed; that axis is *not* 1:1 with the
+  c6id.8xlarge row.)
+- **hot** reads use chunk 5860; **cold** evicts the packfile from page cache per
+  iter.
+- **Ingest** ran 16 cold chunks (5860–5875) on the c6id boxes.
+
+---
+
+## 2. c6id.8xlarge — corrected (fixed harness)
-## 2. Read latency at single in-flight (p50)
+32 vCPU x86_64, 62 GB RAM. All numbers recomputed from
+`gs://rpc-full-history/benchmarks/2026-06-03/c6id.8xlarge-/`.
-`--query-concurrency=1`, p50 milliseconds. This is the cleanest cross-run number
-(one request at a time, no queueing). `cold/hot/×` = cold p50 / hot p50 / ratio.
+### 2.1 Query latency at single in-flight (`c=1`), roundtrip vs xdr-views
-| Machine | ledgers n=20 | tx-page p=20 | tx-hash roundtrip | events query |
+`roundtrip` = production `UnmarshalBinary` + `ParseTransaction` (re-serialize
+each field). `xdrviews` = zero-copy XDR views (what a tuned server uses).
+ms, p50 / p99.
+
+| Workload | tier | roundtrip p50 / p99 | xdrviews p50 / p99 | views speedup (p50) |
 |---|---|---|---|---|
-| c6id.2xlarge | 14.3 / 13.6 / 1.1× | 12.3 / 10.3 / 1.2× | 12.2 / 11.5 / 1.1× | 15.8 / 5.5 / 2.8× |
-| c6id.4xlarge | 15.2 / 12.9 / 1.2× | 11.9 / 10.4 / 1.1× | 12.2 / 11.0 / 1.1× | 15.5 / 5.3 / 2.9× |
-| c6id.8xlarge | 14.8 / 13.2 / 1.1× | 11.5 / 9.8 / 1.2× | 11.7 / 10.6 / 1.1× | 16.0 / 5.2 / 3.1× |
-| im4gn.4xlarge | 27.5 / 24.8 / 1.1× | 20.3 / 18.5 / 1.1× | 21.7 / 20.1 / 1.1× | 20.0 / 9.1 / 2.2× |
+| tx-page (p=20) | cold | 13.22 / 29.84 | **2.99** / 6.36 | 4.4× |
+| tx-page (p=20) | hot | 11.10 / 24.63 | **1.52** / 5.02 | 7.3× |
+| tx-hash | cold | 11.86 / 20.31 | **2.18** / 4.23 | 5.4× |
+| tx-hash | hot | 10.55 / 16.93 | **1.19** / 2.71 | 8.9× |
+| events (K=15) | cold | 15.44 / 48.52 | 14.38 / 45.96 | 1.07× |
+| events (K=15) | hot | 6.05 / 13.75 | **4.44** / 7.66 | 1.36× |
+
+*tx-page and tx-hash are dominated by XDR decode + field re-serialization, so
+views cut p50 by 4–9×. **events** barely moves — the query is a bitmap intersect
+(hot) / on-disk term-index read + pack eviction (cold), and the per-event
+post-filter decode that views accelerate is a small fraction of the total.*
-*For point reads, cold and hot are now nearly identical (~1.1×): the
-decode/materialize CPU cost dominates and a cold packfile open adds only ~1 ms
-on warm NVMe. **events** is the exception — cold evicts and re-opens packs per
-query, so hot is 2–3× faster.*
+> ledgers reads serve raw bytes (no XDR decode), so there is no views variant —
+> see §2.2 for their scaling.
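
The views speedup column in the table above is roundtrip p50 divided by xdr-views p50. As a quick cross-check, here is a minimal Go sketch that recomputes those ratios from the table's p50 cells. It is not part of the bench harness; the `latencyRow` type and the `speedup` helper are illustrative names introduced here.

```go
package main

import "fmt"

// latencyRow holds the c=1 p50 latencies (milliseconds) for one
// workload/tier pair, copied by hand from the table above.
// Hypothetical type: it does not exist in the bench harness.
type latencyRow struct {
	name      string
	roundtrip float64 // UnmarshalBinary + ParseTransaction path
	xdrviews  float64 // zero-copy XDR views path
}

// speedup reports how many times lower the views p50 is than the
// roundtrip p50 for the same workload.
func speedup(r latencyRow) float64 {
	return r.roundtrip / r.xdrviews
}

func main() {
	rows := []latencyRow{
		{"tx-page cold", 13.22, 2.99},
		{"tx-page hot", 11.10, 1.52},
		{"tx-hash cold", 11.86, 2.18},
		{"tx-hash hot", 10.55, 1.19},
		{"events cold", 15.44, 14.38},
		{"events hot", 6.05, 4.44},
	}
	for _, r := range rows {
		fmt.Printf("%-13s %.2fx\n", r.name, speedup(r))
	}
}
```

Rounded to the table's precision, the printed ratios should match the published column; a mismatch would more likely indicate a transcription slip than a harness problem.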
 ```mermaid
 xychart-beta
-    title "Read p50 @ concurrency=1 (cold tier, ms)"
-    x-axis [ledgers, tx-page, tx-hash, events]
-    y-axis "p50 ms" 0 --> 30
-    bar [14.3, 12.3, 12.2, 15.8]
-    bar [15.2, 11.9, 12.2, 15.5]
-    bar [14.8, 11.5, 11.7, 16.0]
-    bar [27.5, 20.3, 21.7, 20.0]
+    title "c6id.8xlarge p50 @ c=1: roundtrip vs xdr-views (ms)"
+    x-axis [txpage-cold, txpage-hot, txhash-cold, txhash-hot, events-cold, events-hot]
+    y-axis "p50 ms" 0 --> 16
+    bar [13.22, 11.10, 11.86, 10.55, 15.44, 6.05]
+    bar [2.99, 1.52, 2.18, 1.19, 14.38, 4.44]
 ```
-*Series order: c6id.2xlarge, c6id.4xlarge, c6id.8xlarge, im4gn.4xlarge. The
-three x86 boxes are within noise of each other; Graviton2 trails by ~1.3–1.8×.*
+*Series: roundtrip, xdr-views.*
-## 3. Concurrency scaling (1 → 16 in-flight queries)
+### 2.2 Concurrency scaling (`--query-concurrency` 1→16)
-`--query-concurrency` sweep. Cells are `p50 ms | ops/s`. `ops/s` is wall-clock
-throughput (successful iters ÷ sweep wall) — it scales up with concurrency until
-the box saturates, while p50 latency climbs as queries queue.
+Cells are `p50 ms | p99 ms | ops/s`. **xdr-views path** (the realistic server
+config) unless noted; ledgers (n=20) have no views variant.
-### ledgers (n=20) +**Cold tier** -| Machine | tier | c=1 | c=4 | c=8 | c=16 | peak ops/s | -|---|---|---|---|---|---|---| -| c6id.2xlarge | cold | 14.3 \| 67 | 15.7 \| 230 | 26.3 \| 250 | 88.1 \| 178 | 250 | -| c6id.2xlarge | hot | 13.6 \| 72 | 16.2 \| 235 | 22.0 \| 344 | 42.0 \| 363 | 363 | -| c6id.4xlarge | cold | 15.2 \| 64 | 15.0 \| 246 | 16.0 \| 428 | 25.8 \| 501 | 501 | -| c6id.4xlarge | hot | 12.9 \| 75 | 14.3 \| 258 | 16.4 \| 448 | 21.6 \| 702 | 702 | -| c6id.8xlarge | cold | 14.8 \| 67 | 14.6 \| 258 | 14.9 \| 483 | 17.9 \| 775 | 775 | -| c6id.8xlarge | hot | 13.2 \| 75 | 13.3 \| 280 | 14.1 \| 538 | 16.5 \| 913 | 913 | -| im4gn.4xlarge | cold | 27.5 \| 36 | 28.1 \| 138 | 27.7 \| 271 | 30.3 \| 483 | 483 | -| im4gn.4xlarge | hot | 24.8 \| 39 | 24.4 \| 160 | 24.4 \| 311 | 25.8 \| 587 | 587 | - -### tx-page (page=20) - -| Machine | tier | c=1 | c=4 | c=8 | c=16 | peak ops/s | -|---|---|---|---|---|---|---| -| c6id.2xlarge | cold | 12.3 \| 60 | 17.0 \| 218 | 26.0 \| 281 | 44.4 \| 289 | 289 | -| c6id.2xlarge | hot | 10.3 \| 95 | 15.6 \| 240 | 24.9 \| 294 | 38.7 \| 302 | 302 | -| c6id.4xlarge | cold | 11.9 \| 47 | 14.7 \| 256 | 17.8 \| 422 | 29.5 \| 509 | 509 | -| c6id.4xlarge | hot | 10.4 \| 94 | 13.6 \| 278 | 16.7 \| 447 | 28.2 \| 520 | 520 | -| c6id.8xlarge | cold | 11.5 \| 50 | 13.6 \| 276 | 15.7 \| 478 | 20.7 \| 717 | 717 | -| c6id.8xlarge | hot | 9.8 \| 100 | 12.1 \| 320 | 14.3 \| 518 | 20.0 \| 745 | 745 | -| im4gn.4xlarge | cold | 20.3 \| 27 | 22.0 \| 173 | 23.3 \| 329 | 31.8 \| 459 | 459 | -| im4gn.4xlarge | hot | 18.5 \| 54 | 20.4 \| 188 | 21.8 \| 348 | 29.8 \| 493 | 493 | - -### tx-hash (roundtrip path) - -| Machine | tier | c=1 | c=4 | c=8 | c=16 | peak ops/s | -|---|---|---|---|---|---|---| -| c6id.2xlarge | cold | 12.2 \| 81 | 17.4 \| 219 | 29.2 \| 263 | 50.3 \| 263 | 263 | -| c6id.2xlarge | hot | 11.5 \| 89 | 17.2 \| 231 | 27.7 \| 276 | 42.0 \| 285 | 285 | -| c6id.4xlarge | cold | 12.2 \| 81 | 15.1 \| 254 | 19.1 \| 404 | 32.6 \| 479 | 479 | -| 
c6id.4xlarge | hot | 11.0 \| 92 | 14.5 \| 278 | 18.4 \| 431 | 30.6 \| 508 | 508 | -| c6id.8xlarge | cold | 11.7 \| 84 | 14.1 \| 275 | 16.8 \| 462 | 22.6 \| 689 | 689 | -| c6id.8xlarge | hot | 10.6 \| 96 | 13.4 \| 302 | 16.1 \| 494 | 22.9 \| 700 | 700 | -| im4gn.4xlarge | cold | 21.7 \| 42 | 22.3 \| 178 | 23.4 \| 339 | 30.9 \| 506 | 506 | -| im4gn.4xlarge | hot | 20.1 \| 51 | 22.7 \| 181 | 24.1 \| 339 | 32.9 \| 471 | 471 | - -### events (random K-filter query) - -| Machine | tier | c=1 | c=4 | c=8 | c=16 | peak ops/s | -|---|---|---|---|---|---|---| -| c6id.2xlarge | cold | 15.8 \| 54 | 31.8 \| 104 | 63.4 \| 105 | 106.6 \| 118 | 118 | -| c6id.2xlarge | hot | 5.5 \| 199 | 6.0 \| 375 | 10.3 \| 409 | 16.9 \| 430 | 430 | -| c6id.4xlarge | cold | 15.5 \| 54 | 15.8 \| 211 | 32.0 \| 210 | 53.9 \| 239 | 239 | -| c6id.4xlarge | hot | 5.3 \| 219 | 5.8 \| 600 | 6.5 \| 764 | 12.0 \| 828 | 828 | -| c6id.8xlarge | cold | 16.0 \| 50 | 15.0 \| 228 | 17.1 \| 408 | 26.1 \| 504 | 504 | -| c6id.8xlarge | hot | 5.2 \| 237 | 5.5 \| 720 | 6.0 \| 1147 | 8.6 \| 1453 | 1453 | -| im4gn.4xlarge | cold | 20.0 \| 29 | 22.5 \| 163 | 25.8 \| 272 | 39.1 \| 327 | 327 | -| im4gn.4xlarge | hot | 9.1 \| 144 | 9.5 \| 460 | 9.7 \| 638 | 12.8 \| 738 | 738 | +| Workload | c=1 | c=4 | c=8 | c=16 | peak ops/s | +|---|---|---|---|---|---| +| ledgers (n=20) | 14.8 \| 26.6 \| 67 | 14.6 \| 21.7 \| 258 | 14.9 \| 26.8 \| 483 | 17.9 \| 32.3 \| 775 | 775 | +| tx-page (views) | 2.99 \| 6.4 \| 85 | 2.90 \| 6.8 \| 1150 | 3.14 \| 7.5 \| 2080 | 3.79 \| 9.3 \| 3456 | **3456** | +| tx-hash (views) | 2.18 \| 4.2 \| 415 | 2.37 \| 5.4 \| 1477 | 2.57 \| 5.7 \| 2652 | 3.19 \| 6.6 \| 4170 | **4170** | +| events (views, K=15) | 14.4 \| 46.0 \| 58 | 14.3 \| 53.2 \| 243 | 16.8 \| 57.2 \| 415 | 28.5 \| 81.3 \| 512 | 512 | + +**Hot tier** + +| Workload | c=1 | c=4 | c=8 | c=16 | peak ops/s | +|---|---|---|---|---|---| +| ledgers (n=20) | 13.2 \| 17.2 \| 75 | 13.3 \| 20.6 \| 280 | 14.1 \| 22.9 \| 538 | 16.5 \| 25.9 \| 913 | 913 | +| 
tx-page (views) | 1.52 \| 5.0 \| 571 | 1.83 \| 5.2 \| 1903 | 2.17 \| 6.4 \| 3135 | 2.69 \| 8.3 \| 4830 | **4830** | +| tx-hash (views) | 1.19 \| 2.7 \| 775 | 1.36 \| 3.0 \| 2720 | 1.55 \| 3.8 \| 4700 | 1.98 \| 4.9 \| 7253 | **7253** | +| events (views, K=15) | 4.44 \| 7.7 \| 210 | 4.85 \| 10.3 \| 744 | 5.25 \| 19.4 \| 1257 | 6.95 \| 20.3 \| 1843 | 1843 | + +*With views, tx-page and tx-hash sustain **4.8k–7.3k ops/s** at c=16 on the +32-vCPU box — 5–8× the roundtrip ceiling. The roundtrip equivalents (on GCS) +peak at 621 / 680 ops/s for cold tx-page / tx-hash.* ```mermaid xychart-beta - title "Hot read throughput vs concurrency (c6id.8xlarge, ops/s)" + title "c6id.8xlarge hot throughput, xdr-views (ops/s)" x-axis "query-concurrency" [1, 4, 8, 16] - y-axis "ops/sec" 0 --> 1500 + y-axis "ops/sec" 0 --> 7500 line [75, 280, 538, 913] - line [100, 320, 518, 745] - line [96, 302, 494, 700] - line [237, 720, 1147, 1453] + line [571, 1903, 3135, 4830] + line [775, 2720, 4700, 7253] + line [210, 744, 1257, 1843] ``` -*Series order: ledgers, tx-page, tx-hash, events. On the 32-vCPU box every -workload scales near-linearly to 16 in-flight queries; events scales best -because hot event queries are pure in-memory bitmap intersects.* +*Series: ledgers, tx-page, tx-hash, events.* -*On the 8-vCPU c6id.2xlarge, latency balloons past c=8 (e.g. cold-ledgers -14→88 ms at c=16) — that's query oversubscription on 8 cores, not a storage -limit. The 16- and 32-vCPU boxes hold latency roughly flat through c=16.* +### 2.3 Ingest — hot vs cold, with vs without xdr-views -## 4. Cold vs hot +Ingest runs `--parallel` (ledgers/txhash/events ingested concurrently per +ledger). Hot ingest measured both ways; the per-ledger driver total is the +headline, decomposed into stages below. p50, ms. -At a single in-flight query the two tiers are within ~10% for ledger, tx-page, -and tx-hash reads (§2) — the read is decode-bound and a warm-NVMe packfile open -is cheap. 
The tiers diverge under two conditions: +**Hot ingest (single chunk, RocksDB), parsed vs view** -- **events**: hot is 2–3× faster at c=1 and the gap widens under load (cold - re-opens + evicts packs per query; hot keeps an in-memory term index). -- **concurrency on small boxes**: cold's per-iter page-cache eviction makes it - more sensitive to oversubscription than hot (compare cold vs hot c=16 on - c6id.2xlarge across every workload). +| Stage | parsed p50 | view p50 | note | +|---|---|---|---| +| `driver.total_per_ledger` | **18.27** | **8.54** | end-to-end per ledger | +| `driver.lcm_decode` | 8.39 | — | UnmarshalBinary; **views skip this entirely** | +| `driver.fan_out_per_ledger` | 9.47 | 7.95 | slowest enabled ingester (events) | +| `driver.read_blocked` | 0.58 | 0.57 | waiting on next raw ledger (NVMe source) | +| `ledgers.write` | 2.60 | 2.54 | RocksDB put (mode-independent) | +| `txhash.extract` | 0.02 | 0.48 | parsed reads the shared decoded struct; views walk | +| `txhash.hot_write` | 1.08 | 0.97 | | +| `events.extract` | 1.19 | 1.41 | | +| `events.hot_write` | **8.24** | **6.46** | single most expensive stage (put + WAL) | -## 5. tx-hash roundtrip breakdown +→ **Hot ingest throughput: 52 ledgers/s parsed (3m13s wall) vs 112 ledgers/s +view (1m29s wall) — views are ~2.1× faster**, entirely from skipping the 8.4 ms +`lcm_decode`. Events `hot_write` (RocksDB put + WAL) is the dominant remaining +cost. -`getTransaction(hash)` over the full round-trip path (MPHF lookup → packfile -fetch → decode → re-serialize each field). All sampled hashes were **hits** -(the bench samples only present hashes this run — no miss cohort). Per-iter -columns in `cold-txhash-roundtrip.csv` decompose total into -`lookup → pack_open → fetch → scan → materialize`; scan (zstd decode of the -LCM) dominates, with materialize (field re-serialization) second. 
+**Cold ingest (16 chunks, packfiles, view, `--parallel`, 8 chunk-workers)** -cold and hot land within ~1.1× at c=1 (§2) because, once the packfile is open on -warm NVMe, both tiers pay the same decode + materialize CPU. The MPHF lookup -itself is ~5–20 µs. +| Stage | p50 ms | +|---|---| +| `ledgers.write` | 0.41 | +| `txhash.extract` | 0.71 | +| `events.extract` | 2.07 | +| `events.term_index` | 0.73 | +| `events.cold_append` | 0.12 | +| `driver.fan_out_per_ledger` | 3.04 | -## 6. events query workload (new) +→ Per-chunk wall p50 ≈ 48.4 s; 16 chunks across 8 workers → **1m52s** total +(1,431 ledgers/s, 413k tx-hashes/s, 1.2M events/s end-to-end). -`cold-events-query` / `hot-events-query` issue randomized event-filter queries. -Each iter draws K filters (K sampled from `1,2,3,5,8,12,15`) partitioned from a -per-chunk 15-term universe (3 highest-volume contracts + 12 highest-volume -topics). Columns: `n_filters`, `n_unique_terms`, `query_ns`, `n_events`. +### 2.4 build-txhash-index -- **hot** is a CPU-bound bitmap intersect over an in-memory term index — p50 - ~5 ms on x86, ~9 ms on Graviton2, scaling to >1,400 ops/s on the 32-vCPU box. -- **cold** must open + evict packs and read the on-disk term index per query, so - p50 is 3× higher and the p99 tail is heavy (87–845 ms) — cold event queries - are the most tail-sensitive workload in the suite. +Phase-2 MPHF build (k-way streamhash merge + index construction): -This is a *new* workload (the 2026-05-21 run measured event *ingest*, not event -*query*), so there is no like-for-like prior number. +| keys | feed s | finish s | keys/s | idx MB | +|---|---|---|---|---| +| 46,153,867 | 1.09 | 0.10 | **42.2 M** | 199 | -## 7. xdr-view extraction & ingest stage costs +### Takeaways (c6id.8xlarge) -Ingest now runs as unified `hot-ingest` / `cold-ingest` commands that emit -per-stage timing breakdowns (`*-view.csv`). 
p50 per item (per ledger for -`write`/`extract`; per event-batch for event stages): +- **xdr-views is the headline for point/page reads**: 4–9× lower p50 and 5–8× + higher throughput on tx-page and tx-hash. A production server should use the + view path. +- **events is decode-insensitive** — views give only 1.1–1.4×; its cost is the + bitmap/term-index work, not XDR. +- **Ingest**: views ~2.1× faster (skip `lcm_decode`); the events RocksDB write + is the dominant hot-ingest stage. -| Machine | hot ledger write | hot tx extract | hot ev extract | hot ev write | cold ev extract | cold ev term-index | cold ev append | -|---|---|---|---|---|---|---|---| -| c6id.2xlarge | 2.59 | 0.47 | 1.39 | 7.24 | 2.68 | 0.76 | 0.12 | -| c6id.4xlarge | 2.47 | 0.46 | 1.37 | 6.63 | 2.67 | 0.82 | 0.12 | -| c6id.8xlarge | 2.47 | 0.46 | 1.37 | 6.51 | 1.75 | 0.70 | 0.10 | -| im4gn.4xlarge | 4.63 | 0.71 | 2.26 | 10.24 | 2.40 | 0.83 | 0.15 | +--- -*Hot event `write` (RocksDB put + WAL) is the single most expensive ingest -stage (~6.5–10 ms/batch). xdr-view extraction is cheap (~0.5 ms/ledger for -tx-hash, ~1.4 ms for events). Graviton2 is ~1.5–1.8× slower on the CPU-bound -extract/write stages.* +## 3. Cross-machine comparison 🟥 STALE (old harness — pending re-run) -## 8. build-txhash-index & ingest driver +> These tables are from the **old** harness on c6id.2xlarge / c6id.4xlarge / +> im4gn.4xlarge: **views-off** and **tx-page measured a tx count, not a page**. +> They are kept only for the cross-machine/architecture shape and **must be +> re-run** with the fixed harness before they're trusted. The c6id.8xlarge row +> is superseded by §2. 
-`build-txhash-index` is the CPU-bound phase-2 MPHF build (k-way streamhash merge -+ index construction): +### 3.1 Read p50 at `c=1` (ms) — stale -| Machine | keys | feed s | finish s | keys/s | idx MB | -|---|---|---|---|---|---| -| c6id.2xlarge | 46,153,867 | 1.79 | 0.07 | 24,809,654 | 199 | -| c6id.4xlarge | 46,153,867 | 1.17 | 0.07 | 37,113,017 | 199 | -| c6id.8xlarge | 46,153,867 | 1.10 | 0.11 | 38,311,284 | 199 | -| im4gn.4xlarge | 380,286,251 | 8.86 | 0.91 | 38,906,747 | 1,638 | - -*im4gn built over 140 chunks (380 M keys) vs 16 chunks (46 M keys) on the c6id -boxes — the absolute key count and index size differ, but the per-key rate -(~38 M keys/s) is comparable to the 16-/32-vCPU x86 boxes. The 8-vCPU c6id.2xlarge -is the outlier (~25 M keys/s) — fewer parallel block-build workers.* - -Hot ingest sustained ~74–80 ledgers/s on the c6id boxes and ~49 ledgers/s on -im4gn (`hot-driver-view` total-per-ledger p50 of 12.5–13.5 ms x86, 20.5 ms ARM). - -## 9. Methodology changes since 2026-05-21 - -The harness was rewritten on `rpc-hack`. Before comparing to the 2026-05-21 -report, note: - -- **Sweep axis renamed and re-scoped.** The old "workers" sweep (1,2,4,8,16,32) - is now "query-concurrency" (1,4,8,16). **There is no 32-worker data**, and the - peak ops/s ceiling is therefore lower by construction. -- **`ops/s` is computed differently and is *not* comparable across runs.** In - this run `ops/s ≈ concurrency ÷ p50` (clean wall-clock). In the 2026-05-21 run - the throughput wall absorbed large per-iter overhead (e.g. old cold n=1 w=1 - showed 2.2 ms p50 but only 66 ops/s). **Only single-in-flight p50 latency is a - valid cross-run comparison.** -- **Ledger reads: only `n=20` survived.** The harness writes a fixed - `cold-ledgers.csv` / `hot-ledgers.csv` regardless of `--n`, so the - `--n=1`/`--n=10` invocations were overwritten by the final `--n=20` run. The - 2026-05-21 report's n=1 and n=10 ledger numbers have no counterpart here. 
-- **tx-page: only `page=20`** (old run also had 100 and 200). -- **tx-hash: roundtrip only, hits only.** The xdr-views read-latency variant and - the miss cohort were not captured as separate read benches this run (xdr-view - cost now appears as the `extract` ingest stage in §7). -- **events: query, not ingest.** §6 is a brand-new workload. -- **Different chunks/data.** Hot reads use chunk 5860 (was 5000); cold txhash/ - events use freshly-built stores. Absolute latencies reflect different ledger - data than the prior run. - -## 10. Architecture: x86 vs ARM (same vCPU count) - -c6id.4xlarge (Ice Lake, 16 vCPU) vs im4gn.4xlarge (Graviton2, 16 vCPU), p50 at -c=1. >1 means ARM is slower. +| Machine | ledgers n=20 | tx-page p=20¹ | tx-hash² | events query³ | +|---|---|---|---|---| +| c6id.2xlarge 🟥 | 14.3 / 13.6 | 12.3 / 10.3 | 12.2 / 11.5 | 15.8 / 5.5 | +| c6id.4xlarge 🟥 | 15.2 / 12.9 | 11.9 / 10.4 | 12.2 / 11.0 | 15.5 / 5.3 | +| im4gn.4xlarge 🟥 | 27.5 / 24.8 | 20.3 / 18.5 | 21.7 / 20.1 | 20.0 / 9.1 | +| **c6id.8xlarge ✅** | **14.8 / 13.2** | **see §2.1** | **see §2.1** | **see §2.1** | + +`cold / hot`. ¹ stale tx-page = count-only, no page materialization. +² stale tx-hash = roundtrip, views-off. ³ stale events = random K (not worst-case). + +### 3.2 Architecture: x86 vs ARM (same vCPU) — stale + +c6id.4xlarge (Ice Lake, 16 vCPU) vs im4gn.4xlarge (Graviton2, 16 vCPU), old-harness +p50 at c=1. >1 means ARM slower. 
| Workload | tier | x86 | arm | arm/x86 | |---|---|---|---|---| | ledgers n=20 | cold | 15.2 ms | 27.5 ms | 1.81× | -| ledgers n=20 | hot | 12.9 ms | 24.8 ms | 1.92× | | tx-page p=20 | cold | 11.9 ms | 20.3 ms | 1.70× | -| tx-page p=20 | hot | 10.4 ms | 18.5 ms | 1.77× | -| tx-hash roundtrip | cold | 12.2 ms | 21.7 ms | 1.78× | -| tx-hash roundtrip | hot | 11.0 ms | 20.1 ms | 1.83× | -| events query | cold | 15.5 ms | 20.0 ms | 1.28× | +| tx-hash | cold | 12.2 ms | 21.7 ms | 1.78× | | events query | hot | 5.3 ms | 9.1 ms | 1.72× | -*Graviton2 trails Ice Lake by ~1.7–1.9× on the decode-bound read paths this run — -a wider per-operation gap than the 2026-05-21 run reported (~1.4–1.6×). The -read paths here are dominated by single-threaded zstd decode + XDR work, where -the 8375C's higher per-core throughput shows most. Cold event query is the -tightest (1.28×) because it is more I/O- than CPU-bound.* +*Graviton2 trailed Ice Lake by ~1.7–1.9× on the decode-bound read paths. A +fixed-harness re-run is needed to confirm the gap under the views path (where +decode cost — the 8375C's strength — is much smaller, so the ARM gap may narrow).* -## 11. Caveats +Full old-harness per-machine sweeps are on GCS (`/*.csv`); the raw +per-cell dump that was in this report has been dropped (it duplicated those CSVs). -- **c6id.8xlarge ran commit `ed4b7ced`**, the other three ran `a16dfcc6`. - `ed4b7ced` is not in this branch — its delta is unverified. -- **Throughput (`ops/s`) is not comparable to 2026-05-21** (see §9). Cross-run - comparisons in chat are restricted to single-in-flight p50 latency. -- **Only one `n` / page-size / path survived per workload** (§9) — the 6/03 - dataset is narrower than 5/21 on the read side, broader on events (query) and - ingest stage detail. -- **Oversubscription**: on the 8-vCPU c6id.2xlarge, c=16 cells measure scheduler - behavior under 2× oversubscription, not raw scaling. 
-- **im4gn ingested 140 cold chunks** vs 16 on the c6id boxes; ingest wall and - absolute key/index sizes differ accordingly (per-item rates are comparable). -- All sampled tx-hash lookups were hits; there is no miss-latency cohort. +--- -## 12. Per-machine raw results +## 4. Caveats -Every sweep cell per machine. `ops/s` is wall-clock throughput for that cell. +- **Only c6id.8xlarge ran the fixed harness.** The other three rows (§3) are + stale and not comparable to §2. +- **c6id.8xlarge cold reads use the 16-chunk re-ingested store** (5860–5875), not + the 141-chunk seed — a narrower cold working set than the stale machines' + original cold-ledgers methodology. +- **`ops/s` is wall-clock throughput** (successful iters ÷ sweep wall) and is not + comparable to the 2026-05-21 report (different formula; see that report). +- **events uses worst-case K=15** here (§2); the stale rows used random K from + `{1,2,3,5,8,12,15}`. +- All sampled tx-hash lookups were hits; no miss cohort. -### c6id.2xlarge — 8 vCPU x86_64, 15 GB RAM, 441 GB NVMe +--- -| Bench | c | p50 ms | p90 ms | p99 ms | ops/s | -|---|---|---|---|---|---| -| cold-ledgers (n=20) | 1 | 14.33 | 17.64 | 22.16 | 67 | -| cold-ledgers (n=20) | 4 | 15.73 | 21.89 | 29.50 | 230 | -| cold-ledgers (n=20) | 8 | 26.25 | 47.83 | 62.50 | 250 | -| cold-ledgers (n=20) | 16 | 88.05 | 104.07 | 131.09 | 178 | -| hot-ledgers (n=20) | 1 | 13.59 | 16.61 | 18.00 | 72 | -| hot-ledgers (n=20) | 4 | 16.16 | 22.33 | 25.86 | 235 | -| hot-ledgers (n=20) | 8 | 21.99 | 27.58 | 34.07 | 344 | -| hot-ledgers (n=20) | 16 | 41.98 | 58.17 | 77.74 | 363 | -| cold-txpage (p=20) | 1 | 12.32 | 16.33 | 30.23 | 60 | -| cold-txpage (p=20) | 4 | 16.96 | 25.53 | 39.68 | 218 | -| cold-txpage (p=20) | 8 | 26.04 | 39.41 | 64.43 | 281 | -| cold-txpage (p=20) | 16 | 44.43 | 96.26 | 161.38 | 289 | -| hot-txpage (p=20) | 1 | 10.32 | 15.13 | 21.79 | 95 | -| hot-txpage (p=20) | 4 | 15.59 | 23.06 | 42.11 | 240 | -| hot-txpage (p=20) | 8 | 24.90 | 37.90 | 63.77 | 
294 | -| hot-txpage (p=20) | 16 | 38.68 | 96.32 | 184.61 | 302 | -| cold-txhash (roundtrip) | 1 | 12.20 | 16.38 | 21.63 | 81 | -| cold-txhash (roundtrip) | 4 | 17.41 | 26.07 | 33.86 | 219 | -| cold-txhash (roundtrip) | 8 | 29.24 | 41.34 | 52.94 | 263 | -| cold-txhash (roundtrip) | 16 | 50.27 | 104.41 | 170.11 | 263 | -| hot-txhash (roundtrip) | 1 | 11.49 | 14.63 | 18.44 | 89 | -| hot-txhash (roundtrip) | 4 | 17.23 | 23.82 | 30.08 | 231 | -| hot-txhash (roundtrip) | 8 | 27.65 | 39.60 | 50.13 | 276 | -| hot-txhash (roundtrip) | 16 | 42.00 | 102.24 | 176.65 | 285 | -| cold-events (query) | 1 | 15.77 | 22.47 | 87.55 | 54 | -| cold-events (query) | 4 | 31.81 | 51.86 | 236.44 | 104 | -| cold-events (query) | 8 | 63.41 | 92.74 | 465.74 | 105 | -| cold-events (query) | 16 | 106.61 | 160.96 | 843.57 | 118 | -| hot-events (query) | 1 | 5.55 | 11.97 | 16.08 | 199 | -| hot-events (query) | 4 | 6.00 | 34.03 | 45.96 | 375 | -| hot-events (query) | 8 | 10.27 | 65.78 | 107.08 | 409 | -| hot-events (query) | 16 | 16.92 | 113.25 | 182.61 | 430 | - -### c6id.4xlarge — 16 vCPU x86_64, 31 GB RAM, 870 GB NVMe - -| Bench | c | p50 ms | p90 ms | p99 ms | ops/s | -|---|---|---|---|---|---| -| cold-ledgers (n=20) | 1 | 15.23 | 19.42 | 27.57 | 64 | -| cold-ledgers (n=20) | 4 | 15.04 | 18.63 | 25.62 | 246 | -| cold-ledgers (n=20) | 8 | 16.01 | 22.25 | 29.08 | 428 | -| cold-ledgers (n=20) | 16 | 25.81 | 46.80 | 59.89 | 501 | -| hot-ledgers (n=20) | 1 | 12.93 | 16.71 | 19.49 | 75 | -| hot-ledgers (n=20) | 4 | 14.28 | 19.63 | 24.53 | 258 | -| hot-ledgers (n=20) | 8 | 16.41 | 21.67 | 25.63 | 448 | -| hot-ledgers (n=20) | 16 | 21.64 | 27.62 | 33.69 | 702 | -| cold-txpage (p=20) | 1 | 11.91 | 15.97 | 27.99 | 47 | -| cold-txpage (p=20) | 4 | 14.67 | 21.56 | 33.23 | 256 | -| cold-txpage (p=20) | 8 | 17.85 | 26.57 | 43.70 | 422 | -| cold-txpage (p=20) | 16 | 29.55 | 43.42 | 72.16 | 509 | -| hot-txpage (p=20) | 1 | 10.43 | 14.74 | 25.85 | 94 | -| hot-txpage (p=20) | 4 | 13.61 | 20.15 | 32.70 | 278 | -| 
hot-txpage (p=20) | 8 | 16.67 | 25.34 | 41.81 | 447 | -| hot-txpage (p=20) | 16 | 28.17 | 45.10 | 74.09 | 520 | -| cold-txhash (roundtrip) | 1 | 12.21 | 16.63 | 20.86 | 81 | -| cold-txhash (roundtrip) | 4 | 15.08 | 21.87 | 28.21 | 254 | -| cold-txhash (roundtrip) | 8 | 19.09 | 27.79 | 35.06 | 404 | -| cold-txhash (roundtrip) | 16 | 32.62 | 46.34 | 59.33 | 479 | -| hot-txhash (roundtrip) | 1 | 11.03 | 13.98 | 17.65 | 92 | -| hot-txhash (roundtrip) | 4 | 14.45 | 19.43 | 24.45 | 278 | -| hot-txhash (roundtrip) | 8 | 18.40 | 25.50 | 32.21 | 431 | -| hot-txhash (roundtrip) | 16 | 30.59 | 44.43 | 58.53 | 508 | -| cold-events (query) | 1 | 15.55 | 21.90 | 85.02 | 54 | -| cold-events (query) | 4 | 15.79 | 25.29 | 112.38 | 211 | -| cold-events (query) | 8 | 32.03 | 48.95 | 215.62 | 210 | -| cold-events (query) | 16 | 53.87 | 80.82 | 399.12 | 239 | -| hot-events (query) | 1 | 5.30 | 8.07 | 14.43 | 219 | -| hot-events (query) | 4 | 5.75 | 16.16 | 26.34 | 600 | -| hot-events (query) | 8 | 6.46 | 30.32 | 41.46 | 764 | -| hot-events (query) | 16 | 11.97 | 57.30 | 85.80 | 828 | - -### c6id.8xlarge — 32 vCPU x86_64, 62 GB RAM, 1700 GB NVMe (commit ed4b7ced) - -| Bench | c | p50 ms | p90 ms | p99 ms | ops/s | -|---|---|---|---|---|---| -| cold-ledgers (n=20) | 1 | 14.81 | 18.56 | 26.60 | 67 | -| cold-ledgers (n=20) | 4 | 14.63 | 17.93 | 21.66 | 258 | -| cold-ledgers (n=20) | 8 | 14.94 | 19.19 | 26.79 | 483 | -| cold-ledgers (n=20) | 16 | 17.91 | 25.08 | 32.28 | 775 | -| hot-ledgers (n=20) | 1 | 13.18 | 16.07 | 17.24 | 75 | -| hot-ledgers (n=20) | 4 | 13.29 | 16.88 | 20.62 | 280 | -| hot-ledgers (n=20) | 8 | 14.07 | 18.34 | 22.85 | 538 | -| hot-ledgers (n=20) | 16 | 16.51 | 21.47 | 25.91 | 913 | -| cold-txpage (p=20) | 1 | 11.48 | 15.60 | 27.68 | 50 | -| cold-txpage (p=20) | 4 | 13.63 | 19.17 | 29.98 | 276 | -| cold-txpage (p=20) | 8 | 15.68 | 22.77 | 37.21 | 478 | -| cold-txpage (p=20) | 16 | 20.67 | 31.30 | 49.72 | 717 | -| hot-txpage (p=20) | 1 | 9.85 | 13.79 | 22.03 | 100 | -| 
hot-txpage (p=20) | 4 | 12.09 | 17.27 | 29.26 | 320 | -| hot-txpage (p=20) | 8 | 14.34 | 21.18 | 34.78 | 518 | -| hot-txpage (p=20) | 16 | 20.05 | 30.17 | 49.32 | 745 | -| cold-txhash (roundtrip) | 1 | 11.66 | 15.87 | 20.96 | 84 | -| cold-txhash (roundtrip) | 4 | 14.12 | 19.58 | 25.86 | 275 | -| cold-txhash (roundtrip) | 8 | 16.84 | 23.56 | 30.49 | 462 | -| cold-txhash (roundtrip) | 16 | 22.57 | 32.23 | 41.31 | 689 | -| hot-txhash (roundtrip) | 1 | 10.59 | 13.63 | 16.98 | 96 | -| hot-txhash (roundtrip) | 4 | 13.39 | 17.57 | 22.32 | 302 | -| hot-txhash (roundtrip) | 8 | 16.08 | 21.98 | 27.43 | 494 | -| hot-txhash (roundtrip) | 16 | 22.86 | 31.39 | 39.15 | 700 | -| cold-events (query) | 1 | 15.99 | 21.83 | 85.55 | 50 | -| cold-events (query) | 4 | 14.97 | 22.06 | 108.68 | 228 | -| cold-events (query) | 8 | 17.09 | 25.16 | 87.11 | 408 | -| cold-events (query) | 16 | 26.14 | 41.10 | 162.38 | 504 | -| hot-events (query) | 1 | 5.17 | 7.13 | 13.96 | 237 | -| hot-events (query) | 4 | 5.52 | 12.70 | 19.35 | 720 | -| hot-events (query) | 8 | 5.96 | 15.37 | 25.25 | 1147 | -| hot-events (query) | 16 | 8.61 | 27.05 | 37.25 | 1453 | - -### im4gn.4xlarge — 16 vCPU aarch64, 62 GB RAM, 6800 GB NVMe - -| Bench | c | p50 ms | p90 ms | p99 ms | ops/s | -|---|---|---|---|---|---| -| cold-ledgers (n=20) | 1 | 27.50 | 32.42 | 35.49 | 36 | -| cold-ledgers (n=20) | 4 | 28.09 | 33.24 | 37.69 | 138 | -| cold-ledgers (n=20) | 8 | 27.69 | 34.44 | 43.33 | 271 | -| cold-ledgers (n=20) | 16 | 30.27 | 37.74 | 45.95 | 483 | -| hot-ledgers (n=20) | 1 | 24.79 | 30.46 | 33.49 | 39 | -| hot-ledgers (n=20) | 4 | 24.42 | 29.28 | 35.02 | 160 | -| hot-ledgers (n=20) | 8 | 24.41 | 29.09 | 34.28 | 311 | -| hot-ledgers (n=20) | 16 | 25.75 | 31.70 | 39.83 | 587 | -| cold-txpage (p=20) | 1 | 20.27 | 26.30 | 49.91 | 27 | -| cold-txpage (p=20) | 4 | 21.95 | 29.11 | 49.96 | 173 | -| cold-txpage (p=20) | 8 | 23.26 | 31.16 | 53.99 | 329 | -| cold-txpage (p=20) | 16 | 31.77 | 48.03 | 76.74 | 459 | -| hot-txpage 
(p=20) | 1 | 18.46 | 25.01 | 37.72 | 54 | -| hot-txpage (p=20) | 4 | 20.36 | 28.13 | 47.28 | 188 | -| hot-txpage (p=20) | 8 | 21.84 | 29.63 | 50.37 | 348 | -| hot-txpage (p=20) | 16 | 29.77 | 46.85 | 73.54 | 493 | -| cold-txhash (roundtrip) | 1 | 21.74 | 26.73 | 32.45 | 42 | -| cold-txhash (roundtrip) | 4 | 22.34 | 27.70 | 33.46 | 178 | -| cold-txhash (roundtrip) | 8 | 23.38 | 29.33 | 34.91 | 339 | -| cold-txhash (roundtrip) | 16 | 30.93 | 41.84 | 51.14 | 506 | -| hot-txhash (roundtrip) | 1 | 20.14 | 24.96 | 31.13 | 51 | -| hot-txhash (roundtrip) | 4 | 22.70 | 27.80 | 34.41 | 181 | -| hot-txhash (roundtrip) | 8 | 24.10 | 29.56 | 36.42 | 339 | -| hot-txhash (roundtrip) | 16 | 32.93 | 47.49 | 62.26 | 471 | -| cold-events (query) | 1 | 19.97 | 30.07 | 144.87 | 29 | -| cold-events (query) | 4 | 22.50 | 31.33 | 108.36 | 163 | -| cold-events (query) | 8 | 25.79 | 36.69 | 152.50 | 272 | -| cold-events (query) | 16 | 39.12 | 60.28 | 303.35 | 327 | -| hot-events (query) | 1 | 9.09 | 11.66 | 13.55 | 144 | -| hot-events (query) | 4 | 9.54 | 16.94 | 23.12 | 460 | -| hot-events (query) | 8 | 9.73 | 33.73 | 46.26 | 638 | -| hot-events (query) | 16 | 12.78 | 64.18 | 97.76 | 738 | +## 5. What changed (PR #750) + +Harness + execution fixes, all on this branch: + +1. **tx-page now materializes a page of responses** (`walkPageMaterialize` in + `tx_page_helpers.go`) — builds a full `db.Transaction` per tx (envelope, + result, meta, events, hash, application order, ledger info) instead of only + touching `TransactionHash` + `ResultPair`. +2. **tx-page gained `--xdr-views`** (single-pass view materializer mirroring + tx-hash); CSVs split `-roundtrip` / `-xdrviews`. +3. **Query benches run both view modes** (`QUERY_VIEW_MODES`), so every + decode-heavy workload reports with- and without-views. +4. **events runs worst-case K=15** (`--buckets=15`). +5. **Ingest runs `--parallel`; hot ingest measured both views-on and views-off** + (the §2.3 comparison). 
+ +Verification: all benches passed (0 errors); tx-page `got==page` enforced; +`decode_ns=0` confirms the views path skips `UnmarshalBinary`. diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-03-summary-table.md b/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-03-summary-table.md index 154de3af5..963fd0dd5 100644 --- a/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-03-summary-table.md +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-03-summary-table.md @@ -5,6 +5,16 @@ report with concurrency sweeps, ingest stages, and per-machine raw cells: [`2026-06-03-cross-machine.md`](./2026-06-03-cross-machine.md). All numbers are recomputed from `gs://rpc-full-history/benchmarks/2026-06-03/`. +> ## ⚠️ Harness corrected (PR #750) — only c6id.8xlarge re-run +> +> The original query benches were invalid (tx-page measured a tx *count*, not a +> page; xdr-views was disabled everywhere). The harness is fixed and +> **c6id.8xlarge has been re-run**; the other three machines are **🟥 STALE — +> pending re-run**. The c6id.8xlarge rows below show the corrected **roundtrip** +> path for like-for-like comparison with the stale rows; with **xdr-views** (the +> realistic server path) tx-page/tx-hash are a further 4–9× faster — see +> [`2026-06-03-cross-machine.md` §2](./2026-06-03-cross-machine.md#2-c6id8xlarge--corrected-fixed-harness). + ## Glossary — what every term means **Machines (rows):** AWS EC2 instances. `c6id` = Intel Ice Lake x86 (compute- @@ -63,14 +73,19 @@ Cleanest "how fast is one request" view. Lower = faster. 
Each cell is | Machine (vCPU / arch) | ledgers n=20 | tx-page p=20 | tx-hash | events | |---|---|---|---|---| -| c6id.2xlarge (8, x86) | 14.3 / 13.6 | 12.3 / 10.3 | 12.2 / 11.5 | 15.8 / 5.5 | -| c6id.4xlarge (16, x86) | 15.2 / 12.9 | 11.9 / 10.4 | 12.2 / 11.0 | 15.5 / 5.3 | -| c6id.8xlarge (32, x86) | 14.8 / 13.2 | 11.5 / 9.8 | 11.7 / 10.6 | 16.0 / 5.2 | -| im4gn.4xlarge (16, ARM) | 27.5 / 24.8 | 20.3 / 18.5 | 21.7 / 20.1 | 20.0 / 9.1 | +| 🟥 c6id.2xlarge (8, x86) | 14.3 / 13.6 | 12.3 / 10.3¹ | 12.2 / 11.5 | 15.8 / 5.5 | +| 🟥 c6id.4xlarge (16, x86) | 15.2 / 12.9 | 11.9 / 10.4¹ | 12.2 / 11.0 | 15.5 / 5.3 | +| ✅ c6id.8xlarge (32, x86) | 14.8 / 13.2 | **13.2 / 11.1** | **11.9 / 10.6** | **15.4 / 6.1**² | +| 🟥 im4gn.4xlarge (16, ARM) | 27.5 / 24.8 | 20.3 / 18.5¹ | 21.7 / 20.1 | 20.0 / 9.1 | + +🟥 = old harness (stale). ✅ c6id.8xlarge = fixed harness, **roundtrip** path. +¹ stale tx-page = count-only (not a real page). ² events = worst-case K=15. +**With xdr-views the c6id.8xlarge tx-page = 2.99 / 1.52 and tx-hash = 2.18 / 1.19** +(4–9× faster) — see the main report §2.1. -*For point reads, cold ≈ hot (~1.1×) — decode cost dominates, a warm-NVMe file -open is cheap. Only **events** shows a big cold/hot gap (cold ~3× slower). The -ARM box is ~1.7–1.9× slower per request than same-vCPU x86.* +*For point reads on the roundtrip path, cold ≈ hot (~1.1×) — decode cost +dominates, a warm-NVMe file open is cheap. Only **events** shows a big cold/hot +gap (cold ~2.5× slower). The xdr-views path collapses tx-page/tx-hash to 1–3 ms.* ## Table 2 — Peak throughput (ops/s, best across c=1→16) @@ -79,28 +94,36 @@ How many queries/sec each box sustains under load. Higher = better. 
Each cell is | Machine (vCPU / arch) | ledgers n=20 | tx-page p=20 | tx-hash | events | |---|---|---|---|---| -| c6id.2xlarge (8, x86) | 250 / 363 | 289 / 302 | 263 / 285 | 118 / 430 | -| c6id.4xlarge (16, x86) | 501 / 702 | 509 / 520 | 479 / 508 | 239 / 828 | -| c6id.8xlarge (32, x86) | 775 / 913 | 717 / 745 | 689 / 700 | 504 / 1453 | -| im4gn.4xlarge (16, ARM) | 483 / 587 | 459 / 493 | 506 / 471 | 327 / 738 | +| 🟥 c6id.2xlarge (8, x86) | 250 / 363 | 289 / 302¹ | 263 / 285 | 118 / 430 | +| 🟥 c6id.4xlarge (16, x86) | 501 / 702 | 509 / 520¹ | 479 / 508 | 239 / 828 | +| ✅ c6id.8xlarge (32, x86) | 775 / 913 | **621 / 637** | **680 / 706** | **500 / 1081**² | +| 🟥 im4gn.4xlarge (16, ARM) | 483 / 587 | 459 / 493¹ | 506 / 471 | 327 / 738 | + +🟥 stale (old harness); ✅ c6id.8xlarge = fixed harness, **roundtrip** peaks. +¹ stale tx-page = count-only. ² events K=15. +**With xdr-views the c6id.8xlarge peaks jump to tx-page 3,456 / 4,830, +tx-hash 4,170 / 7,253, events 512 / 1,843** (5–10× the roundtrip ceiling). -*Throughput scales with vCPU count (8xl ≈ 2× the 4xl). Hot **events** scales -best (pure in-memory bitmap intersect → 1,453 ops/s on the 32-vCPU box).* +*Throughput scales with vCPU count. The fixed-harness c6id.8xlarge shows the +real story: the **xdr-views** path sustains 4.8k–7.3k ops/s on tx-page/tx-hash — +the roundtrip path (and the stale rows) are decode-bound and cap far lower.* ## Table 3 — Ingest throughput How fast each box writes data in. Higher = better. -| Machine (vCPU / arch) | hot-ingest (ledgers/s) | cold-ingest (ledgers/s, est.) 
| build-txhash-index (keys/s) | +| Machine (vCPU / arch) | hot-ingest (ledgers/s) | cold-ingest (ledgers/s) | build-txhash-index (keys/s) | |---|---|---|---| -| c6id.2xlarge (8, x86) | 74 | ~560 | 24.8 M | -| c6id.4xlarge (16, x86) | 79 | ~1,110 | 37.1 M | -| c6id.8xlarge (32, x86) | 80 | ~1,450 | 38.3 M | -| im4gn.4xlarge (16, ARM) | 49 | ~1,080 | 38.9 M | - -*hot-ingest is single-stream and **WAL-fsync-bound**, so it barely scales with -vCPUs (~80 ledgers/s ceiling on x86); the ARM box is ~1.6× slower on the -fsync + encode path. **cold-ingest** is batched (no per-ledger fsync) and runs +| 🟥 c6id.2xlarge (8, x86) | 74 | ~560 (est.) | 24.8 M | +| 🟥 c6id.4xlarge (16, x86) | 79 | ~1,110 (est.) | 37.1 M | +| ✅ c6id.8xlarge (32, x86) | **52 parsed / 112 view** | **1,431** | **42.2 M** | +| 🟥 im4gn.4xlarge (16, ARM) | 49 | ~1,080 (est.) | 38.9 M | + +*hot-ingest is **WAL-fsync-bound**. The stale rows ran it single-stream +(serial), ~80 ledgers/s on x86. The fixed c6id.8xlarge ran it **`--parallel`, +both modes**: **views = 112 ledgers/s vs parsed = 52** — views ~2.1× faster by +skipping the 8.4 ms per-ledger `UnmarshalBinary`. The ARM box is ~1.6× slower on +the fsync + encode path. **cold-ingest** is batched (no per-ledger fsync) and runs chunks in parallel, so it is ~7–18× faster than hot and scales with `--chunk-workers` — that's why the 4-worker c6id.2xlarge (~560) trails the 8-worker boxes (~1,100–1,450). build-txhash-index is CPU-bound and scales with @@ -122,10 +145,14 @@ and `extract`/`write` stages are per ledger; event stages are per event-batch. 
| Machine (vCPU / arch) | hot: ledger write | hot: tx extract | hot: event extract | hot: event write | cold: event extract | cold: event term-index | cold: event append | |---|---|---|---|---|---|---|---| -| c6id.2xlarge (8, x86) | 2.59 | 0.47 | 1.39 | 7.24 | 2.68 | 0.76 | 0.12 | -| c6id.4xlarge (16, x86) | 2.47 | 0.46 | 1.37 | 6.63 | 2.67 | 0.82 | 0.12 | -| c6id.8xlarge (32, x86) | 2.47 | 0.46 | 1.37 | 6.51 | 1.75 | 0.70 | 0.10 | -| im4gn.4xlarge (16, ARM) | 4.63 | 0.71 | 2.26 | 10.24 | 2.40 | 0.83 | 0.15 | +| 🟥 c6id.2xlarge (8, x86) | 2.59 | 0.47 | 1.39 | 7.24 | 2.68 | 0.76 | 0.12 | +| 🟥 c6id.4xlarge (16, x86) | 2.47 | 0.46 | 1.37 | 6.63 | 2.67 | 0.82 | 0.12 | +| ✅ c6id.8xlarge (32, x86) | 2.54 | 0.48 | 1.41 | 6.46 | 2.07 | 0.73 | 0.12 | +| 🟥 im4gn.4xlarge (16, ARM) | 4.63 | 0.71 | 2.26 | 10.24 | 2.40 | 0.83 | 0.15 | + +✅ c6id.8xlarge = fixed run, **view** (xdr-views) mode. In **parsed** mode the +driver additionally pays ~8.4 ms/ledger `lcm_decode` (UnmarshalBinary) that view +mode skips — the dominant per-ledger difference (full breakdown: main report §2.3). *Hot **event write** (RocksDB put + WAL) is the single most expensive stage (~6.5–10 ms/batch) and dominates hot-ingest cost. xdr-view **extract** is cheap From a8c829584bb5082ad12bcf101573ee82dc138ca7 Mon Sep 17 00:00:00 2001 From: Simon Chow Date: Wed, 3 Jun 2026 23:52:57 +0000 Subject: [PATCH 14/27] bench(fullhistory): generate synthetic packfiles from stellar-core apply-load Adds an `lcm` ledger source and an apply-load-gen.sh driver so the bench-fullhistory suite can run on fully synthetic, density-controlled data instead of real pubnet chunks. - sources.go: new --source=lcm reader over apply-load's framed-XDR METADATA_OUTPUT_STREAM. Skips setup ledgers (<= --lcm-checkpoint) and decode-free frame-skips to each chunk's 10k-ledger block; reuses the entire cold-ingest/hot-ingest/build-txhash-index pipeline. Wired --lcm-file/ --lcm-checkpoint flags into both ingest commands. 
- apply-load-gen.sh: drives stellar-core new-db/new-hist/apply-load -> meta.xdr -> cold-ingest --source=lcm -> packfiles -> build-txhash-index. Profiles map to apply-load model txs + target TPS: sac (~10k), token/oz (~9k custom_token), soroswap (~2.5k). Uses the installed core's protocol. - lcm_source_test.go: unit-tests setup-skip, chunk-block mapping, short-read. - README: documents the lcm source, the driver, profiles, BUILD_TESTS requirement, and the real cost of full 10k-ledger chunks. Co-Authored-By: Claude Opus 4.8 (1M context) --- .../scripts/bench-fullhistory/README.md | 55 ++++- .../bench-fullhistory/apply-load-gen.sh | 212 ++++++++++++++++++ .../bench-fullhistory/bench_cold_ingest.go | 14 +- .../bench-fullhistory/bench_hot_ingest.go | 10 +- .../bench-fullhistory/lcm_source_test.go | 122 ++++++++++ .../scripts/bench-fullhistory/sources.go | 181 ++++++++++++++- 6 files changed, 587 insertions(+), 7 deletions(-) create mode 100755 cmd/stellar-rpc/scripts/bench-fullhistory/apply-load-gen.sh create mode 100644 cmd/stellar-rpc/scripts/bench-fullhistory/lcm_source_test.go diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/README.md b/cmd/stellar-rpc/scripts/bench-fullhistory/README.md index c11f86a0e..7cba3c67d 100644 --- a/cmd/stellar-rpc/scripts/bench-fullhistory/README.md +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/README.md @@ -102,9 +102,11 @@ Shared flags: | flag | meaning | |---|---| | `--types=ledgers,txhash,events` | which data types to ingest (any subset; required) | -| `--source=pack\|bsb` | `pack` reads a local cold packfile; `bsb` reads from a GCS `BufferedStorageBackend` | +| `--source=pack\|bsb\|lcm` | `pack` reads a local cold packfile; `bsb` reads from a GCS `BufferedStorageBackend`; `lcm` reads a framed `LedgerCloseMeta` file from stellar-core `apply-load` (see [Synthetic ledgers](#synthetic-ledgers-via-apply-load)) | | `--cold-dir=DIR` | source cold-store dir (required for `--source=pack`) | | `--bucket-path=...` | GCS 
`destination_bucket_path` (for `--source=bsb`); ADC credentials required | +| `--lcm-file=FILE` | apply-load `meta.xdr` (required for `--source=lcm`) | +| `--lcm-checkpoint=N` | skip leading ledgers with seq ≤ N (apply-load setup ledgers; for `--source=lcm`) | | `--bsb-buffer-size`, `--bsb-num-workers` | BSB prefetch tuning | | `--chunk=N` | first chunk ID to ingest (required) | | `--xdr-views` | extract via zero-copy XDR views instead of `UnmarshalBinary` + struct walk | @@ -173,6 +175,54 @@ bench-fullhistory cold-ingest --types=txhash --source=pack \ bench-fullhistory build-txhash-index --in-dir=/path/to/out/cold/txhash ``` +## Synthetic ledgers via `apply-load` + +When you don't have (or don't want) real pubnet chunks, you can generate +**fully synthetic, density-controlled** packfiles with stellar-core's +`apply-load` command. `apply-load-gen.sh` drives the whole pipeline: + +``` +apply-load → meta.xdr (framed LedgerCloseMeta) → cold-ingest --source=lcm → packfiles → build-txhash-index +``` + +```sh +# 1 chunk of SAC load first (validate the pipeline), then scale up +CORE_BIN=/path/to/stellar-core CHUNKS=1 PROFILE=sac \ + ./apply-load-gen.sh +``` + +**Workload profiles** (`PROFILE=`) map to apply-load's model transactions and +target throughputs (TPS = txs-per-ledger ÷ ledger-close-time; defaults assume +`CLOSE_TIME_S=1`): + +| `PROFILE` | model tx (`APPLY_LOAD_MODEL_TX`) | target | +|---|---|---| +| `sac` | `sac` (Stellar Asset Contract transfer) | ~10k SAC TPS | +| `token` (`oz`) | `custom_token` (OpenZeppelin-style token) | ~9k OZ TPS | +| `soroswap` | `soroswap` (AMM swap, real mainnet wasm) | ~2.5k TPS | + +Key env knobs: `CHUNKS` (10k-ledger chunks to fill, default 16), `CLOSE_TIME_S`, +`TXS_PER_LEDGER` (override the derived density), `TYPES`, `CHUNK_WORKERS`, +`OUT_ROOT`, `KEEP_META`, `BENCH_BIN`. 
+ +**Requirements & caveats:** + +- Needs a stellar-core built with **`BUILD_TESTS`** (the CI build tagged + `…~buildtests`) — `apply-load` + `ARTIFICIALLY_GENERATE_LOAD_FOR_TESTING` + are test-only. +- **Cost is real.** Each chunk = 10,000 closed ledgers; at 9k txs/ledger that + is 90M tx applications per chunk. Dense multi-chunk runs take hours+ and tens + of GB. Start with `CHUNKS=1`. +- The driver enables meta in **benchmark mode** + (`METADATA_OUTPUT_STREAM`, `DISABLE_TX_META_FOR_TESTING=false`). If your core + does not emit meta there, fall back to a `docs/apply-load-for-meta.cfg`-style + config (`APPLY_LOAD_MODE=ledger-limits`, the proven meta path) and run + `cold-ingest --source=lcm` against its `meta.xdr` yourself. +- The `lcm` source assigns ledger sequences **positionally** per chunk (chunk 1 + → seqs 2…10001, etc.), skipping apply-load setup ledgers (`--lcm-checkpoint`, + auto-parsed from the apply-load log). Each chunk must be a full 10,000 + ledgers, so generate at least `CHUNKS × 10,000` benchmark ledgers. + ## Interpreting ingest output - **`total wall`** — end-to-end wall time. For multi-chunk cold runs it is @@ -191,6 +241,7 @@ bench-fullhistory build-txhash-index --in-dir=/path/to/out/cold/txhash - `bench_concurrent_runner.go`, `bench_grid.go` — the `--query-concurrency` sweep scaffolding. - `bench_{hot,cold}_ingest.go` — ingest drivers. - `ingest_{ledgers,txhash,events}.go` — per-type ingesters + collectors. -- `ingester.go`, `ledger.go`, `extract_{views,parsed}.go`, `sources.go` — ingest plumbing. +- `ingester.go`, `ledger.go`, `extract_{views,parsed}.go`, `sources.go` — ingest plumbing (`sources.go` has the `pack`/`bsb`/`lcm` ledger sources). +- `apply-load-gen.sh` — synthetic-ledger driver: stellar-core `apply-load` → `meta.xdr` → packfiles. - `bench_build_txhash_index.go`, `streamhash_merge.go` — phase-2 index build. - `corpus.go`, `cache*.go`, `tx_hash_helpers.go`, `metrics_helpers.go` — shared helpers. 
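Reviewer note on the profile densities: the README hunk above gives TPS as txs-per-ledger ÷ ledger-close-time, and the driver script later in this patch rounds the per-ledger count up after dividing by the SAC batch size. A minimal Go sketch of that arithmetic follows. The function names are hypothetical and exist only for illustration; the constants (10,000 ledgers per chunk, the 21,000-account floor, the per-profile targets and batch sizes) are copied from `apply-load-gen.sh` as it appears in this patch.

```go
package main

import "fmt"

// ledgersPerChunk mirrors LEDGERS_PER_CHUNK in apply-load-gen.sh.
const ledgersPerChunk = 10000

// txsPerLedger returns ceil(targetTPS*closeTimeS / batch): how many
// transactions each ledger must carry so that, with every transaction
// batching `batch` invocations, the ledger reaches the target rate.
// (Hypothetical helper; the script does this inline in shell arithmetic.)
func txsPerLedger(targetTPS, closeTimeS, batch int) int {
	return (targetTPS*closeTimeS + batch - 1) / batch
}

// genesisAccounts mirrors the script's sizing rule: twice the per-ledger
// transaction count, but never fewer than 21000 accounts.
func genesisAccounts(txs int) int {
	if n := txs * 2; n > 21000 {
		return n
	}
	return 21000
}

func main() {
	profiles := []struct {
		name      string
		targetTPS int
		batch     int
	}{
		{"sac", 10000, 100},
		{"token", 9000, 1},
		{"soroswap", 2500, 1},
	}
	for _, p := range profiles {
		txs := txsPerLedger(p.targetTPS, 1, p.batch) // CLOSE_TIME_S=1
		perChunk := txs * p.batch * ledgersPerChunk  // invocations applied per full chunk
		fmt.Printf("%-8s txs/ledger=%d genesis=%d invocations/chunk=%d\n",
			p.name, txs, genesisAccounts(txs), perChunk)
	}
}
```

For the `token` profile this gives 9,000 txs/ledger and 90,000,000 applications per full chunk, which matches the 90M figure quoted in the cost caveat.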
diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/apply-load-gen.sh b/cmd/stellar-rpc/scripts/bench-fullhistory/apply-load-gen.sh new file mode 100755 index 000000000..3e9a907ef --- /dev/null +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/apply-load-gen.sh @@ -0,0 +1,212 @@ +#!/usr/bin/env bash +# +# apply-load-gen.sh — generate synthetic full-history packfiles for the +# bench-fullhistory suite using stellar-core's `apply-load`. +# +# Pipeline: +# 1. stellar-core new-db + new-hist + apply-load -> meta.xdr (framed +# LedgerCloseMeta stream of dense synthetic ledgers) +# 2. bench-fullhistory cold-ingest --source=lcm -> cold packfiles +# (ledgers/, txhash/, events/) in the layout the read benches expect +# 3. build-txhash-index -> cold tx-hash MPHF +# +# The cold-* read benches then point --cold-dir at /cold/ledgers (etc). +# +# WORKLOAD PROFILES (model transaction + density): +# sac Stellar Asset Contract transfers (target ~10k SAC TPS) +# token custom (OpenZeppelin-style) token (target ~9k OZ TPS) +# soroswap Soroswap AMM swaps (target ~2.5k TPS) +# +# TPS is interpreted as txs-per-ledger / ledger-close-time. With the default +# CLOSE_TIME_S=1 the per-ledger transaction counts below hit the targets; for +# SAC the count is divided by APPLY_LOAD_BATCH_SAC_COUNT (each tx batches that +# many SAC invocations, and TPS counts each invocation). +# +# REQUIREMENTS +# * stellar-core built with BUILD_TESTS (apply-load + ARTIFICIALLY_GENERATE_ +# LOAD_FOR_TESTING). The public CI build is tagged e.g. +# `26.x.y-NNNN..noble~buildtests`. +# * The bench-fullhistory binary (this script builds it if --bench-bin unset). +# +# COST WARNING (real full chunks): each chunk is 10,000 ledgers, and a chunk +# at 9k txs/ledger applies 90M transactions. 16 chunks of dense Soroban load is +# many hours to days of apply time and tens of GB of meta. Start with +# CHUNKS=1 to validate the pipeline before scaling up. 
+# +set -euo pipefail + +# ---- knobs (env-overridable) ----------------------------------------------- +PROFILE="${PROFILE:-sac}" # sac | token | soroswap +CHUNKS="${CHUNKS:-16}" # number of 10k-ledger chunks to fill +CLOSE_TIME_S="${CLOSE_TIME_S:-1}" # assumed ledger close time for TPS math +TXS_PER_LEDGER="${TXS_PER_LEDGER:-}" # override the profile's per-ledger tx count +CORE_BIN="${CORE_BIN:-$(command -v stellar-core || true)}" +BENCH_BIN="${BENCH_BIN:-}" # prebuilt bench-fullhistory; built if empty +OUT_ROOT="${OUT_ROOT:-./apply-load-out}" # work + output root +TYPES="${TYPES:-ledgers,txhash,events}" # cold-ingest types +CHUNK_WORKERS="${CHUNK_WORKERS:-4}" # cold-ingest chunk concurrency +KEEP_META="${KEEP_META:-0}" # 1 = keep meta.xdr after ingest +NETWORK_PASSPHRASE="${NETWORK_PASSPHRASE:-Standalone Network ; February 2017}" + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +LEDGERS_PER_CHUNK=10000 + +log() { printf '\033[1;34m[apply-load-gen]\033[0m %s\n' "$*" >&2; } +die() { printf '\033[1;31m[apply-load-gen] ERROR:\033[0m %s\n' "$*" >&2; exit 1; } + +# ---- resolve core + verify BUILD_TESTS ------------------------------------- +[ -n "$CORE_BIN" ] || die "stellar-core not found; set CORE_BIN=/path/to/stellar-core (must be a BUILD_TESTS build)" +[ -x "$CORE_BIN" ] || die "CORE_BIN=$CORE_BIN is not executable" +CORE_VER="$("$CORE_BIN" version 2>/dev/null | head -1 || true)" +log "stellar-core: $CORE_BIN ($CORE_VER)" +if ! 
"$CORE_BIN" apply-load --help >/dev/null 2>&1; then + die "this stellar-core lacks the apply-load command — you need a BUILD_TESTS build (…~buildtests)" +fi + +# ---- per-profile density --------------------------------------------------- +# model_tx, dependent_tx_clusters, batch_sac_count, target_tps +case "$PROFILE" in + sac) MODEL_TX="sac"; CLUSTERS=1; BATCH_SAC=100; TARGET_TPS=10000 ;; + token|oz) MODEL_TX="custom_token"; CLUSTERS=2; BATCH_SAC=1; TARGET_TPS=9000 ;; + soroswap) MODEL_TX="soroswap"; CLUSTERS=1; BATCH_SAC=1; TARGET_TPS=2500 ;; + *) die "unknown PROFILE=$PROFILE (expected sac|token|soroswap)" ;; +esac + +# txs-per-ledger so that (txs * batch) / close_time == target_tps +if [ -z "$TXS_PER_LEDGER" ]; then + TXS_PER_LEDGER=$(( (TARGET_TPS * CLOSE_TIME_S + BATCH_SAC - 1) / BATCH_SAC )) +fi +NUM_LEDGERS=$(( CHUNKS * LEDGERS_PER_CHUNK )) +GENESIS_ACCOUNTS=$(( TXS_PER_LEDGER * 2 )) +[ "$GENESIS_ACCOUNTS" -lt 21000 ] && GENESIS_ACCOUNTS=21000 + +log "profile=$PROFILE model_tx=$MODEL_TX txs/ledger=$TXS_PER_LEDGER batch_sac=$BATCH_SAC -> ~$(( TXS_PER_LEDGER * BATCH_SAC / CLOSE_TIME_S )) TPS @ ${CLOSE_TIME_S}s close" +log "chunks=$CHUNKS num_ledgers=$NUM_LEDGERS (this is the slow part — apply-load closes every ledger)" + +# ---- workspace + config ---------------------------------------------------- +WORK_DIR="$OUT_ROOT/$PROFILE/work" +COLD_OUT="$OUT_ROOT/$PROFILE/cold" +mkdir -p "$WORK_DIR" "$COLD_OUT" +CONF="$WORK_DIR/apply-load.cfg" +META="$WORK_DIR/meta.xdr" + +cat > "$CONF" </dev/null +log "stellar-core new-hist local…" +( cd "$WORK_DIR" && "$CORE_BIN" new-hist local --conf "$CONF" ) >/dev/null +log "stellar-core apply-load… (this is the long-running step)" +APPLY_LOG="$WORK_DIR/apply-load.log" +( cd "$WORK_DIR" && "$CORE_BIN" apply-load --conf "$CONF" ) 2>&1 | tee "$APPLY_LOG" + +[ -s "$META" ] || die "apply-load produced no meta at $META. + Your core may not emit metadata in benchmark mode. 
Workaround: base the config + on docs/apply-load-for-meta.cfg (APPLY_LOAD_MODE=ledger-limits), which is the + proven meta-emitting path, and re-run with --source=lcm." + +# Pre-benchmark checkpoint: ledgers with seq <= this are setup, not benchmark. +CHECKPOINT="$(grep -oE 'Published final checkpoint before benchmark: ledger [0-9]+' "$APPLY_LOG" \ + | grep -oE '[0-9]+$' | tail -1 || true)" +CHECKPOINT="${CHECKPOINT:-0}" +log "pre-benchmark checkpoint = $CHECKPOINT (skipping ledgers with seq <= $CHECKPOINT)" +log "meta.xdr size: $(du -h "$META" | cut -f1)" + +# ---- build bench binary if needed ------------------------------------------ +if [ -z "$BENCH_BIN" ]; then + BENCH_BIN="$WORK_DIR/bench-fullhistory" + log "building bench-fullhistory -> $BENCH_BIN" + ( cd "$SCRIPT_DIR" && go build -o "$BENCH_BIN" . ) +fi + +# ---- cold-ingest via the lcm source ---------------------------------------- +# --chunk=1 (chunk 0 is reserved/“unset”); baseChunk=1 maps chunk 1 -> block 0. +log "cold-ingest --source=lcm (types=$TYPES, chunks=$CHUNKS)…" +"$BENCH_BIN" cold-ingest \ + --source=lcm \ + --lcm-file="$META" \ + --lcm-checkpoint="$CHECKPOINT" \ + --types="$TYPES" \ + --chunk=1 \ + --num-chunks="$CHUNKS" \ + --chunk-workers="$CHUNK_WORKERS" \ + --cold-out-dir="$COLD_OUT" \ + --xdr-views \ + --out="$WORK_DIR/bench-out" + +# ---- build the cold tx-hash index ------------------------------------------ +if [[ ",$TYPES," == *",txhash,"* ]]; then + log "build-txhash-index…" + "$BENCH_BIN" build-txhash-index \ + --in-dir="$COLD_OUT/txhash" \ + --idx-out="$COLD_OUT/txhash.idx" \ + --out="$WORK_DIR/bench-out" +fi + +[ "$KEEP_META" = "1" ] || { log "removing meta.xdr (set KEEP_META=1 to keep)"; rm -f "$META"; } + +log "DONE. Cold packfiles: $COLD_OUT/{ledgers,txhash,events}" +cat >&2 < checkpoint), then frame-skip the blocks before this one. 
+ first, firstPayload, ferr := p.seekFirstBenchmark(f) + if ferr != nil { + yield(nil, ferr) + return + } + // firstPayload holds benchmark index 0. Skip to block*want. + toSkip := int(block) * want + var buf []byte + switch { + case toSkip == 0: + buf = firstPayload // index 0 is the first ledger we yield + default: + // Discard index 0's payload; skip the remaining toSkip-1 + // frames decode-free, leaving the file at index toSkip. + _ = firstPayload + if serr := skipFrames(f, toSkip-1); serr != nil { + yield(nil, p.shortErr(serr, block)) + return + } + } + + for i := 0; i < want; i++ { + if i == 0 && buf != nil { + if !yield(buf, nil) { + return + } + continue + } + payload, rerr := readFrame(f, &buf) + if rerr != nil { + yield(nil, p.shortErr(rerr, block)) + return + } + if !yield(payload, nil) { + return + } + } + _ = first + } +} + +// seekFirstBenchmark advances f past the setup ledgers (seq <= checkpoint) +// and returns the first benchmark ledger's sequence and its (already-read) +// payload. Only the setup region plus the first benchmark frame are decoded. +func (p *lcmStream) seekFirstBenchmark(f *os.File) (uint32, []byte, error) { + var buf []byte + for { + payload, err := readFrame(f, &buf) + if err != nil { + return 0, nil, fmt.Errorf("lcm %s: reached end before any benchmark ledger (checkpoint=%d): %w", + p.opts.file, p.opts.checkpoint, err) + } + var lcm xdr.LedgerCloseMeta + if uerr := lcm.UnmarshalBinary(payload); uerr != nil { + return 0, nil, fmt.Errorf("lcm %s: decode ledger header: %w", p.opts.file, uerr) + } + seq := lcm.LedgerSequence() + if seq > p.opts.checkpoint { + // First benchmark ledger. Copy the payload since buf is reused. 
+ out := make([]byte, len(payload)) + copy(out, payload) + return seq, out, nil + } + } +} + +func (p *lcmStream) shortErr(err error, block uint32) error { + if errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF) { + return fmt.Errorf("lcm %s: not enough benchmark ledgers for chunk %d (block %d): each chunk needs %d full ledgers; "+ + "generate more apply-load ledgers (raise APPLY_LOAD_NUM_LEDGERS) or ingest fewer chunks: %w", + p.opts.file, uint32(p.chunkID), block, chunkPkg.LedgersPerChunk, err) + } + return fmt.Errorf("lcm %s chunk %d: %w", p.opts.file, uint32(p.chunkID), err) +} + +// readFrame reads one framed-XDR record (4-byte length prefix + payload) and +// returns the payload, reusing *bufp across calls. The returned slice is valid +// until the next readFrame call. +func readFrame(f *os.File, bufp *[]byte) ([]byte, error) { + n, err := xdr.ReadFrameLength(f) + if err != nil { + return nil, err + } + if cap(*bufp) < int(n) { + *bufp = make([]byte, n) + } + buf := (*bufp)[:n] + if _, err := io.ReadFull(f, buf); err != nil { + if errors.Is(err, io.EOF) { + err = io.ErrUnexpectedEOF + } + return nil, err + } + return buf, nil +} + +// skipFrames advances f past k framed-XDR records without decoding payloads +// (read the length prefix, seek past the payload). +func skipFrames(f *os.File, k int) error { + for range k { + n, err := xdr.ReadFrameLength(f) + if err != nil { + return err + } + if _, err := f.Seek(int64(n), io.SeekCurrent); err != nil { + return err + } + } + return nil +} + // BSBOpts is the per-stream BufferedStorageStream tuning, shared by the hot // driver (one stream) and each cold chunk worker (one stream per chunk). type BSBOpts struct { @@ -91,8 +257,19 @@ type BSBOpts struct { // buffered-storage stream opens/closes its datastore + backend per iteration. // Each call yields an INDEPENDENT stream, so concurrent chunk workers run fully // in parallel (independent ColdReaders / GCS prefetch pipelines). 
-func openChunkStream(source, coldDir, bucketPath string, opts BSBOpts, chunkID chunkPkg.ID) (ledgerbackend.LedgerStream, error) { +func openChunkStream(source, coldDir, bucketPath string, opts BSBOpts, lcm lcmOpts, chunkID chunkPkg.ID) (ledgerbackend.LedgerStream, error) { switch source { + case sourceLCM: + if lcm.file == "" { + return nil, errors.New("--lcm-file is required when --source=lcm") + } + if uint32(chunkID) < uint32(lcm.baseChunk) { + return nil, fmt.Errorf("--source=lcm: chunk %d is below base chunk %d", uint32(chunkID), uint32(lcm.baseChunk)) + } + if _, err := os.Stat(lcm.file); err != nil { + return nil, fmt.Errorf("lcm file missing: %s: %w", lcm.file, err) + } + return &lcmStream{opts: lcm, chunkID: chunkID}, nil case sourcePack: if coldDir == "" { return nil, errors.New("--cold-dir is required when --source=pack") @@ -118,6 +295,6 @@ func openChunkStream(source, coldDir, bucketPath string, opts BSBOpts, chunkID c } return ledgerbackend.NewBufferedStorageStream(cfg, dsConfig, nil), nil default: - return nil, fmt.Errorf("--source=%s; expected pack|bsb", source) + return nil, fmt.Errorf("--source=%s; expected pack|bsb|lcm", source) } } From c3e8875be61762dc92c651f75b6b637507ac5bb3 Mon Sep 17 00:00:00 2001 From: Simon Chow Date: Thu, 4 Jun 2026 05:35:42 +0000 Subject: [PATCH 15/27] bench(fullhistory): fix apply-load-gen path + passphrase defaults - absolutize OUT_ROOT so the config/meta paths survive the cd into the per-profile work dir (core was erroring "No config file ... found") - default NETWORK_PASSPHRASE to pubnet to match the bench binary's hardcoded pubnetPassphrase: the ingest reader recomputes each tx hash under this passphrase and matches it against the result entries, so a mismatch broke the roundtrip txpage/txhash read paths with "unknown tx hash in LedgerCloseMeta". 
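Background for the passphrase default, as a hedged sketch rather than the bench binary's actual code: in the Stellar protocol a transaction hash is the SHA-256 of a signature payload that begins with the network ID, and the network ID is itself the SHA-256 of the network passphrase. The same transaction bytes therefore hash differently under the pubnet and standalone passphrases. The helper names below are hypothetical, and `txXDR` is a stand-in byte slice, not a real XDR-encoded `Transaction`; real code would go through the Go SDK's hashing helpers instead.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// envelopeTypeTx is the XDR discriminant ENVELOPE_TYPE_TX.
const envelopeTypeTx = 2

// networkID is SHA-256 of the network passphrase.
func networkID(passphrase string) [32]byte {
	return sha256.Sum256([]byte(passphrase))
}

// txHash hashes networkID || envelope type (big-endian uint32) || txXDR.
// Illustrative only: txXDR is assumed to be the XDR encoding of a
// Transaction, which this sketch does not construct or validate.
func txHash(passphrase string, txXDR []byte) [32]byte {
	id := networkID(passphrase)
	payload := make([]byte, 0, len(id)+4+len(txXDR))
	payload = append(payload, id[:]...)
	payload = binary.BigEndian.AppendUint32(payload, envelopeTypeTx)
	payload = append(payload, txXDR...)
	return sha256.Sum256(payload)
}

func main() {
	tx := []byte{0x01, 0x02, 0x03} // placeholder bytes, not a real transaction
	pub := txHash("Public Global Stellar Network ; September 2015", tx)
	standalone := txHash("Standalone Network ; February 2017", tx)
	fmt.Println("hashes match across passphrases:", pub == standalone)
}
```

A reader that recomputes hashes under one passphrase cannot look up entries that were keyed under another, which is why the generator default has to agree with the passphrase hardcoded in the bench binary.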
Co-Authored-By: Claude Opus 4.8 (1M context) --- .../scripts/bench-fullhistory/apply-load-gen.sh | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/apply-load-gen.sh b/cmd/stellar-rpc/scripts/bench-fullhistory/apply-load-gen.sh index 3e9a907ef..f105dd868 100755 --- a/cmd/stellar-rpc/scripts/bench-fullhistory/apply-load-gen.sh +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/apply-load-gen.sh @@ -46,7 +46,11 @@ OUT_ROOT="${OUT_ROOT:-./apply-load-out}" # work + output root TYPES="${TYPES:-ledgers,txhash,events}" # cold-ingest types CHUNK_WORKERS="${CHUNK_WORKERS:-4}" # cold-ingest chunk concurrency KEEP_META="${KEEP_META:-0}" # 1 = keep meta.xdr after ingest -NETWORK_PASSPHRASE="${NETWORK_PASSPHRASE:-Standalone Network ; February 2017}" +# Must match the passphrase the bench binary hardcodes (main.go: pubnetPassphrase). +# The ingest reader recomputes each tx hash from its envelope under this +# passphrase and matches it against the result entries; a mismatch makes the +# roundtrip txpage/txhash paths fail with "unknown tx hash in LedgerCloseMeta". +NETWORK_PASSPHRASE="${NETWORK_PASSPHRASE:-Public Global Stellar Network ; September 2015}" SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" LEDGERS_PER_CHUNK=10000 @@ -84,6 +88,11 @@ log "profile=$PROFILE model_tx=$MODEL_TX txs/ledger=$TXS_PER_LEDGER batch_sac=$B log "chunks=$CHUNKS num_ledgers=$NUM_LEDGERS (this is the slow part — apply-load closes every ledger)" # ---- workspace + config ---------------------------------------------------- +# Absolutize OUT_ROOT: the steps below `cd` into $WORK_DIR before invoking core +# and the bench binary, so any WORK_DIR-relative path (CONF, META, …) would no +# longer resolve from inside it. 
+mkdir -p "$OUT_ROOT" +OUT_ROOT="$(cd "$OUT_ROOT" && pwd)" WORK_DIR="$OUT_ROOT/$PROFILE/work" COLD_OUT="$OUT_ROOT/$PROFILE/cold" mkdir -p "$WORK_DIR" "$COLD_OUT" From 9468bba66835da22de64418d4374d5749e56b754 Mon Sep 17 00:00:00 2001 From: Simon Chow Date: Thu, 4 Jun 2026 06:10:37 +0000 Subject: [PATCH 16/27] bench(fullhistory): make apply-load synthetic LCM consumable by read benches MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit apply-load streams a LedgerCloseMeta whose tx-set and TxProcessing are the same transactions in different order, but whose stored result hash does not equal any envelope's real hash under the network passphrase (confirmed against core 26.1.1: 0/N result hashes matched an envelope under pubnet/testnet/standalone, while every envelope's source account was fee-charged in exactly one TxProcessing entry — a clean bijection). The go-stellar-sdk ingest LedgerTransactionReader pairs envelope↔result BY HASH, so it rejected the meta with "unknown tx hash in LedgerCloseMeta", breaking the roundtrip tx-page and tx-hash read benches. (The xdr-views path, which pairs positionally, was unaffected.) - lcm_fixup.go: for each result, find the fee-charged account, map it back to the unique envelope with that source, and stamp the envelope's real tx hash. This is a correct pairing, not merely self-consistent. cold-ingest --source=lcm applies it by default (--lcm-fix-tx-hashes); logs fixed/skipped per chunk. - sources.go: lcmStream applies the fixup and tolerates a short final chunk (--lcm-allow-partial) so runs sized below a full 10k-ledger chunk work. - cold-ledgers / cold-txhash: clamp sampling + start cursors to each chunk's actual ledger range (FirstSeq/LastSeq) so partial chunks don't short-read. - apply-load-gen.sh: NUM_LEDGERS knob for quick runs — TPS is set by per-ledger density, not ledger count, so a few hundred ledgers hit the profile target. 
- README: document the fixup, partial chunks, NUM_LEDGERS, and that cold-events is unsupported on apply-load data (single-contract; corpus needs >=3). Validated end-to-end: cold-ledgers / cold-txpage / cold-txhash all run with 0 errors on both a 308-ledger SAC store and an 892-ledger / 7.65M-tx token store (fixup paired 7650042/7650042, skipped 0). Co-Authored-By: Claude Opus 4.8 (1M context) --- .../scripts/bench-fullhistory/README.md | 45 +++-- .../bench-fullhistory/apply-load-gen.sh | 16 +- .../bench-fullhistory/bench_cold_ingest.go | 13 +- .../bench-fullhistory/bench_cold_ledgers.go | 19 +- .../bench-fullhistory/bench_hot_ingest.go | 7 +- .../scripts/bench-fullhistory/lcm_fixup.go | 168 ++++++++++++++++++ .../scripts/bench-fullhistory/sources.go | 73 +++++++- .../bench-fullhistory/tx_hash_helpers.go | 14 ++ 8 files changed, 325 insertions(+), 30 deletions(-) create mode 100644 cmd/stellar-rpc/scripts/bench-fullhistory/lcm_fixup.go diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/README.md b/cmd/stellar-rpc/scripts/bench-fullhistory/README.md index 7cba3c67d..e380975a3 100644 --- a/cmd/stellar-rpc/scripts/bench-fullhistory/README.md +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/README.md @@ -107,6 +107,8 @@ Shared flags: | `--bucket-path=...` | GCS `destination_bucket_path` (for `--source=bsb`); ADC credentials required | | `--lcm-file=FILE` | apply-load `meta.xdr` (required for `--source=lcm`) | | `--lcm-checkpoint=N` | skip leading ledgers with seq ≤ N (apply-load setup ledgers; for `--source=lcm`) | +| `--lcm-fix-tx-hashes` | repair apply-load's tx-hash/envelope mismatch so the roundtrip reader can consume the meta (default `true`; `--source=lcm`) | +| `--lcm-allow-partial` | allow a short final chunk when the run was sized below 10k ledgers (default `true`; `--source=lcm`) | | `--bsb-buffer-size`, `--bsb-num-workers` | BSB prefetch tuning | | `--chunk=N` | first chunk ID to ingest (required) | | `--xdr-views` | extract via zero-copy XDR views instead 
of `UnmarshalBinary` + struct walk | @@ -186,8 +188,10 @@ apply-load → meta.xdr (framed LedgerCloseMeta) → cold-ingest --source=lc ``` ```sh -# 1 chunk of SAC load first (validate the pipeline), then scale up -CORE_BIN=/path/to/stellar-core CHUNKS=1 PROFILE=sac \ +# A small SAC run is enough to exercise the read benches: TPS is set by +# per-ledger DENSITY, not ledger count, so a few hundred ledgers already hit +# the profile's target throughput. +CORE_BIN=/path/to/stellar-core PROFILE=sac NUM_LEDGERS=300 \ ./apply-load-gen.sh ``` @@ -201,7 +205,9 @@ target throughputs (TPS = txs-per-ledger ÷ ledger-close-time; defaults assume | `token` (`oz`) | `custom_token` (OpenZeppelin-style token) | ~9k OZ TPS | | `soroswap` | `soroswap` (AMM swap, real mainnet wasm) | ~2.5k TPS | -Key env knobs: `CHUNKS` (10k-ledger chunks to fill, default 16), `CLOSE_TIME_S`, +Key env knobs: `NUM_LEDGERS` (total ledgers to generate; **prefer this for a +quick run** — the final chunk may be partial), `CHUNKS` (10k-ledger chunks to +fill, default 16; ignored when `NUM_LEDGERS` is set), `CLOSE_TIME_S`, `TXS_PER_LEDGER` (override the derived density), `TYPES`, `CHUNK_WORKERS`, `OUT_ROOT`, `KEEP_META`, `BENCH_BIN`. @@ -210,18 +216,29 @@ Key env knobs: `CHUNKS` (10k-ledger chunks to fill, default 16), `CLOSE_TIME_S`, - Needs a stellar-core built with **`BUILD_TESTS`** (the CI build tagged `…~buildtests`) — `apply-load` + `ARTIFICIALLY_GENERATE_LOAD_FOR_TESTING` are test-only. -- **Cost is real.** Each chunk = 10,000 closed ledgers; at 9k txs/ledger that - is 90M tx applications per chunk. Dense multi-chunk runs take hours+ and tens - of GB. Start with `CHUNKS=1`. -- The driver enables meta in **benchmark mode** - (`METADATA_OUTPUT_STREAM`, `DISABLE_TX_META_FOR_TESTING=false`). 
If your core - does not emit meta there, fall back to a `docs/apply-load-for-meta.cfg`-style - config (`APPLY_LOAD_MODE=ledger-limits`, the proven meta path) and run - `cold-ingest --source=lcm` against its `meta.xdr` yourself. +- **Cost scales with density, not just count.** apply-load close time grows with + txs/ledger and accumulated state: `sac` (1 fat batched tx/ledger) runs at + ~0.1 s/ledger, but `token`/`soroswap` apply ~9k txs/ledger at ~9 s/ledger and + rising. A full 10k-ledger chunk of dense Soroban load is **hours to days** — + so size dense profiles with a small `NUM_LEDGERS` (a few hundred), which still + meets the TPS target. +- **apply-load tx-hash fixup (automatic).** `apply-load`'s streamed meta records + the same transactions in the tx-set and in `TxProcessing`, but the stored + result hash does **not** equal the envelope's real hash, so the go-stellar-sdk + ingest `LedgerTransactionReader` (which pairs envelope↔result by hash) rejects + it with *"unknown tx hash in LedgerCloseMeta"* — breaking the roundtrip + tx-page / tx-hash benches. `cold-ingest --source=lcm` repairs this by default + (`--lcm-fix-tx-hashes`): it pairs each result to its envelope via the + fee-charged account and stamps the correct hash. See `lcm_fixup.go`. - The `lcm` source assigns ledger sequences **positionally** per chunk (chunk 1 - → seqs 2…10001, etc.), skipping apply-load setup ledgers (`--lcm-checkpoint`, - auto-parsed from the apply-load log). Each chunk must be a full 10,000 - ledgers, so generate at least `CHUNKS × 10,000` benchmark ledgers. + → seqs 10002…20001, etc.), skipping apply-load setup ledgers + (`--lcm-checkpoint`). The final chunk may be **partial** when the run was + sized below a full chunk (`--lcm-allow-partial`, on by default); the read + benches clamp their cursors to each chunk's actual ledger range. 
+- **`cold-events` is not supported on apply-load data.** Its corpus builder needs + ≥3 distinct contracts emitting 4-topic events, but every apply-load profile + drives a single contract. Use real pubnet chunks (`--source=bsb`/`pack`) for + event benches. `cold-ledgers`, `cold-txpage`, and `cold-txhash` all work. ## Interpreting ingest output diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/apply-load-gen.sh b/cmd/stellar-rpc/scripts/bench-fullhistory/apply-load-gen.sh index f105dd868..b41c2419e 100755 --- a/cmd/stellar-rpc/scripts/bench-fullhistory/apply-load-gen.sh +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/apply-load-gen.sh @@ -38,6 +38,12 @@ set -euo pipefail # ---- knobs (env-overridable) ----------------------------------------------- PROFILE="${PROFILE:-sac}" # sac | token | soroswap CHUNKS="${CHUNKS:-16}" # number of 10k-ledger chunks to fill +NUM_LEDGERS="${NUM_LEDGERS:-}" # override total ledgers (else CHUNKS*10k). + # Use a small value for a quick run that + # still hits the profile's TPS (TPS is set + # by density, not ledger count). The final + # chunk is then partial (cold-ingest's + # --lcm-allow-partial handles it). CLOSE_TIME_S="${CLOSE_TIME_S:-1}" # assumed ledger close time for TPS math TXS_PER_LEDGER="${TXS_PER_LEDGER:-}" # override the profile's per-ledger tx count CORE_BIN="${CORE_BIN:-$(command -v stellar-core || true)}" @@ -80,7 +86,15 @@ esac if [ -z "$TXS_PER_LEDGER" ]; then TXS_PER_LEDGER=$(( (TARGET_TPS * CLOSE_TIME_S + BATCH_SAC - 1) / BATCH_SAC )) fi -NUM_LEDGERS=$(( CHUNKS * LEDGERS_PER_CHUNK )) +# NUM_LEDGERS override: when set, it drives generation directly and CHUNKS is +# derived as the number of (10k-ledger) chunks needed to cover it (the last is +# partial). Otherwise NUM_LEDGERS = CHUNKS full chunks. 
+if [ -n "$NUM_LEDGERS" ]; then + CHUNKS=$(( (NUM_LEDGERS + LEDGERS_PER_CHUNK - 1) / LEDGERS_PER_CHUNK )) + [ "$CHUNKS" -lt 1 ] && CHUNKS=1 +else + NUM_LEDGERS=$(( CHUNKS * LEDGERS_PER_CHUNK )) +fi GENESIS_ACCOUNTS=$(( TXS_PER_LEDGER * 2 )) [ "$GENESIS_ACCOUNTS" -lt 21000 ] && GENESIS_ACCOUNTS=21000 diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/bench_cold_ingest.go b/cmd/stellar-rpc/scripts/bench-fullhistory/bench_cold_ingest.go index 0317f2054..857e2f040 100644 --- a/cmd/stellar-rpc/scripts/bench-fullhistory/bench_cold_ingest.go +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/bench_cold_ingest.go @@ -50,6 +50,7 @@ func cmdColdIngest() int { // worker too (a pack reader or a BSB session), so nothing here is shared // across workers — BSB's single sequential cursor cannot be. type coldDeps struct { + logger *supportlog.Entry source string coldDir string bucketPath string @@ -82,6 +83,10 @@ func buildColdDeps(logger *supportlog.Entry) (context.Context, coldDeps, func(), "framed-XDR LedgerCloseMeta file from apply-load (required iff --source=lcm)") lcmCheckpoint := fs.Uint("lcm-checkpoint", 0, "apply-load pre-benchmark checkpoint: skip leading ledgers with seq <= this (used iff --source=lcm)") + lcmFixTxHashes := fs.Bool("lcm-fix-tx-hashes", true, + "repair apply-load's tx-hash/envelope mismatch so the roundtrip ingest reader can consume the meta (used iff --source=lcm)") + lcmAllowPartial := fs.Bool("lcm-allow-partial", true, + "allow the final chunk to be shorter than a full chunk when the apply-load run was sized below 10k ledgers (used iff --source=lcm)") bsbBufferSize := fs.Uint("bsb-buffer-size", 5000, "BSB prefetch buffer depth, PER chunk worker (total buffered ledgers ≈ this × --chunk-workers)") bsbNumWorkers := fs.Uint("bsb-num-workers", 50, @@ -184,8 +189,12 @@ func buildColdDeps(logger *supportlog.Entry) (context.Context, coldDeps, func(), ctx, cancel := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM) deps := coldDeps{ + 
logger: logger, source: *source, coldDir: *coldDir, bucketPath: *bucketPath, - lcm: lcmOpts{file: *lcmFile, checkpoint: uint32(*lcmCheckpoint), baseChunk: startChunk}, + lcm: lcmOpts{ + file: *lcmFile, checkpoint: uint32(*lcmCheckpoint), baseChunk: startChunk, + fixTxHashes: *lcmFixTxHashes, passphrase: pubnetPassphrase, allowPartial: *lcmAllowPartial, + }, startChunk: startChunk, numChunks: *numChunks, chunkWorkers: *chunkWorkers, outRoot: *coldOutDir, subdirs: subdirs, enabled: enabled, xdrViews: *xdrViews, parallel: *parallel, mode: mode, @@ -271,7 +280,7 @@ func runOneChunkCold(ctx context.Context, d coldDeps, chunkID chunk.ID) (_ *chun // Acquire this chunk's ledger stream. Each chunk gets its own INDEPENDENT // stream so chunk workers run fully in parallel, and the stream owns its own // setup + teardown (no separate prepare/close to manage here). - stream, oerr := openChunkStream(d.source, d.coldDir, d.bucketPath, d.bsbOpts, d.lcm, chunkID) + stream, oerr := openChunkStream(d.logger, d.source, d.coldDir, d.bucketPath, d.bsbOpts, d.lcm, chunkID) if oerr != nil { return nil, oerr } diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/bench_cold_ledgers.go b/cmd/stellar-rpc/scripts/bench-fullhistory/bench_cold_ledgers.go index 84cb8cdb4..af3441fae 100644 --- a/cmd/stellar-rpc/scripts/bench-fullhistory/bench_cold_ledgers.go +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/bench_cold_ledgers.go @@ -96,7 +96,6 @@ func coldRangeOp( chunkLo, chunkSpan uint32, n int, ) iterOp { - startSpan := ledgersPerChunk - uint32(n) + 1 return func(rng *rand.Rand, _ bool) (time.Duration, error) { c := chunkLo + rng.Uint32N(chunkSpan) path := packPath(coldDir, c) @@ -113,7 +112,23 @@ func coldRangeOp( } defer r.Close() - start := chunkFirstLedger(c) + rng.Uint32N(startSpan) + // Clamp the start-cursor span to the chunk's ACTUAL ledger range. 
A + // chunk from a synthetic run sized below LedgersPerChunk is partial, so + // its ledgers occupy only the start of the nominal range; using the full + // nominal span would pick start seqs past the end and short-read. + firstSeq := chunkFirstLedger(c) + if fs, ferr := r.FirstSeq(); ferr == nil && fs > firstSeq { + firstSeq = fs + } + lastSeq := chunkLastLedger(c) + if ls, lerr := r.LastSeq(); lerr == nil && ls < lastSeq { + lastSeq = ls + } + avail := int(lastSeq) - int(firstSeq) + 1 + if avail < n { + return 0, fmt.Errorf("chunk %d has %d ledgers, fewer than n=%d", c, avail, n) + } + start := firstSeq + rng.Uint32N(uint32(avail-n+1)) end := start + uint32(n) - 1 seen := 0 for entry, ierr := range r.IterateLedgers(start, end) { diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/bench_hot_ingest.go b/cmd/stellar-rpc/scripts/bench-fullhistory/bench_hot_ingest.go index 37435db0e..6a729a713 100644 --- a/cmd/stellar-rpc/scripts/bench-fullhistory/bench_hot_ingest.go +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/bench_hot_ingest.go @@ -133,9 +133,12 @@ func buildHotDeps(logger *supportlog.Entry) (context.Context, hotDeps, func(), e // Open the single-chunk ledger stream. Hot ingest is single-chunk; the // stream owns its own setup + teardown, so there is nothing to close here // beyond the ingesters. 
- stream, err := openChunkStream(*source, *coldDir, *bucketPath, + stream, err := openChunkStream(logger, *source, *coldDir, *bucketPath, BSBOpts{BufferSize: *bsbBufferSize, NumWorkers: *bsbNumWorkers, RetryLimit: *retryLimit, RetryWait: *retryWait}, - lcmOpts{file: *lcmFile, checkpoint: uint32(*lcmCheckpoint), baseChunk: chunkID}, + lcmOpts{ + file: *lcmFile, checkpoint: uint32(*lcmCheckpoint), baseChunk: chunkID, + fixTxHashes: true, passphrase: pubnetPassphrase, allowPartial: true, + }, chunkID) if err != nil { cancel() diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/lcm_fixup.go b/cmd/stellar-rpc/scripts/bench-fullhistory/lcm_fixup.go new file mode 100644 index 000000000..fe797b6e2 --- /dev/null +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/lcm_fixup.go @@ -0,0 +1,168 @@ +package main + +// apply-load tx-hash fixup. +// +// stellar-core's `apply-load` streams a LedgerCloseMeta whose transaction SET +// (the generalized tx set's parallel-soroban phase) and transaction RESULTS +// (TxProcessing) are the same transactions but in different order, and whose +// stored result hash (TxProcessing[i].Result.TransactionHash) does NOT equal +// the hash of any envelope under the network passphrase. (Confirmed empirically +// against core 26.1.1: for a dense ledger, 0/N result hashes matched an +// envelope hash under pubnet/testnet/standalone, yet every envelope's source +// account was charged a fee in exactly one TxProcessing entry — i.e. a clean +// bijection via the fee-charged account.) +// +// The go-stellar-sdk ingest LedgerTransactionReader pairs envelopes to results +// BY HASH (it hashes each envelope under the passphrase and looks the stored +// result hash up in that map), so it fails with "unknown tx hash in +// LedgerCloseMeta" on raw apply-load meta. That breaks the roundtrip +// tx-page / tx-hash read benches (the xdr-views path, which pairs positionally, +// is unaffected). 
+// +// fixupModelTxHashes repairs the meta so the standard reader can consume it: +// for each TxProcessing[i] it finds the fee-charged account, maps it back to +// the unique envelope with that source account, and stamps +// TxProcessing[i].Result.TransactionHash with that envelope's real hash. This +// is a CORRECT pairing (not merely self-consistent): the result/meta stays +// attached to the transaction it actually belongs to. + +import ( + "github.com/stellar/go-stellar-sdk/network" + "github.com/stellar/go-stellar-sdk/xdr" +) + +// fixupStats is returned for logging/validation. +type fixupStats struct { + ledgers int + txs int + fixed int + skipped int // txs that could not be uniquely paired (left untouched) +} + +func (s *fixupStats) add(o fixupStats) { + s.ledgers += o.ledgers + s.txs += o.txs + s.fixed += o.fixed + s.skipped += o.skipped +} + +// feeChargedAccount returns the single account whose entry appears in a tx's +// fee-processing changes (the source account that paid the fee), or "" if the +// changes reference zero or more than one distinct account. +func feeChargedAccount(changes xdr.LedgerEntryChanges) string { + acct := "" + for _, ch := range changes { + var le *xdr.LedgerEntry + switch ch.Type { + case xdr.LedgerEntryChangeTypeLedgerEntryState: + if s, ok := ch.GetState(); ok { + le = &s + } + case xdr.LedgerEntryChangeTypeLedgerEntryUpdated: + if s, ok := ch.GetUpdated(); ok { + le = &s + } + case xdr.LedgerEntryChangeTypeLedgerEntryCreated: + if s, ok := ch.GetCreated(); ok { + le = &s + } + } + if le == nil { + continue + } + ae, ok := le.Data.GetAccount() + if !ok { + continue + } + a := ae.AccountId.Address() + if acct != "" && acct != a { + return "" // more than one distinct account — ambiguous + } + acct = a + } + return acct +} + +// fixupModelTxHashes rewrites a raw framed LedgerCloseMeta payload so the +// ingest LedgerTransactionReader can pair envelopes to results by hash. 
It +// returns the (possibly rewritten) payload and per-ledger stats. On a decode, +// hashing, or re-encode error it returns a nil payload and the error. +func fixupModelTxHashes(raw []byte, passphrase string) ([]byte, fixupStats, error) { + var lcm xdr.LedgerCloseMeta + if err := lcm.UnmarshalBinary(raw); err != nil { + return nil, fixupStats{}, err + } + n := lcm.CountTransactions() + if n == 0 { + return raw, fixupStats{ledgers: 1}, nil + } + + // source account -> envelope hash, tracking duplicates so we never pair + // ambiguously (an account submitting >1 tx in the ledger). + type ent struct { + h xdr.Hash + count int + } + bySource := make(map[string]*ent, n) + for _, e := range lcm.TransactionEnvelopes() { + h, err := network.HashTransactionInEnvelope(e, passphrase) + if err != nil { + return nil, fixupStats{}, err + } + src := e.SourceAccount().ToAccountId().Address() + if x, ok := bySource[src]; ok { + x.count++ + } else { + bySource[src] = &ent{h: xdr.Hash(h)} + bySource[src].count = 1 + } + } + + st := fixupStats{ledgers: 1, txs: n} + stamp := func(fee xdr.LedgerEntryChanges) (xdr.Hash, bool) { + acct := feeChargedAccount(fee) + if acct == "" { + return xdr.Hash{}, false + } + x, ok := bySource[acct] + if !ok || x.count != 1 { + return xdr.Hash{}, false + } + return x.h, true + } + + switch lcm.V { + case 1: + v1 := lcm.MustV1() + for i := range v1.TxProcessing { + if h, ok := stamp(v1.TxProcessing[i].FeeProcessing); ok { + v1.TxProcessing[i].Result.TransactionHash = h + st.fixed++ + } else { + st.skipped++ + } + } + lcm.V1 = &v1 + case 2: + v2 := lcm.MustV2() + for i := range v2.TxProcessing { + if h, ok := stamp(v2.TxProcessing[i].FeeProcessing); ok { + v2.TxProcessing[i].Result.TransactionHash = h + st.fixed++ + } else { + st.skipped++ + } + } + lcm.V2 = &v2 + default: + // V0 (non-generalized tx set): the SDK reader handles these directly; + // nothing to fix.
+ return raw, st, nil + } + + out, err := lcm.MarshalBinary() + if err != nil { + return nil, fixupStats{}, err + } + return out, st, nil +} diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/sources.go b/cmd/stellar-rpc/scripts/bench-fullhistory/sources.go index fd69fd17b..985f05c02 100644 --- a/cmd/stellar-rpc/scripts/bench-fullhistory/sources.go +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/sources.go @@ -18,6 +18,7 @@ import ( "github.com/stellar/go-stellar-sdk/ingest/ledgerbackend" "github.com/stellar/go-stellar-sdk/support/datastore" + supportlog "github.com/stellar/go-stellar-sdk/support/log" "github.com/stellar/go-stellar-sdk/xdr" chunkPkg "github.com/stellar/stellar-rpc/cmd/stellar-rpc/internal/fullhistory/pkg/chunk" @@ -47,6 +48,17 @@ type lcmOpts struct { file string checkpoint uint32 baseChunk chunkPkg.ID + // fixTxHashes repairs apply-load's tx-hash/envelope mismatch so the + // roundtrip ingest reader can consume the meta (see lcm_fixup.go). + fixTxHashes bool + // passphrase is the network passphrase used to recompute tx hashes during + // the fixup; it must match the bench reader's passphrase. + passphrase string + // allowPartial lets the final chunk be short (fewer than LedgersPerChunk): + // when the framed file ends before `want` ledgers, stop cleanly instead of + // erroring. This supports small synthetic runs sized to a TPS target rather + // than a full 10k-ledger chunk. + allowPartial bool } // packStream is a ledgerbackend.LedgerStream backed by a single cold packfile. @@ -110,6 +122,25 @@ func (p *packStream) RawLedgers(_ context.Context, r ledgerbackend.Range) iter.S type lcmStream struct { opts lcmOpts chunkID chunkPkg.ID + logger *supportlog.Entry +} + +// applyFixup runs the apply-load tx-hash fixup on one raw payload when enabled, +// accumulating stats into st. On any decode/encode error it returns the input +// unchanged (the ingester will surface the underlying problem downstream). 
+func (p *lcmStream) applyFixup(raw []byte, st *fixupStats) []byte { + if !p.opts.fixTxHashes { + return raw + } + out, s, err := fixupModelTxHashes(raw, p.opts.passphrase) + if err != nil { + if p.logger != nil { + p.logger.Warnf("lcm fixup decode failed (passing through): %v", err) + } + return raw + } + st.add(s) + return out } var _ ledgerbackend.LedgerStream = (*lcmStream)(nil) @@ -152,26 +183,50 @@ func (p *lcmStream) RawLedgers(_ context.Context, r ledgerbackend.Range) iter.Se } } + var fx fixupStats + yielded := 0 for i := 0; i < want; i++ { + var payload []byte if i == 0 && buf != nil { - if !yield(buf, nil) { + payload = buf + } else { + raw, rerr := readFrame(f, &buf) + if rerr != nil { + // End of the framed file. For the final/only chunk this is + // expected when the synthetic run was sized below a full + // chunk: yield what we have (if allowed) rather than error. + if p.opts.allowPartial && isEnd(rerr) { + break + } + yield(nil, p.shortErr(rerr, block)) return } - continue + payload = raw } - payload, rerr := readFrame(f, &buf) - if rerr != nil { - yield(nil, p.shortErr(rerr, block)) + if !yield(p.applyFixup(payload, &fx), nil) { return } - if !yield(payload, nil) { - return + yielded++ + } + if p.logger != nil { + if yielded < want { + p.logger.Infof("lcm chunk %d: short chunk — yielded %d of %d ledgers (file ended; sized below a full chunk)", + uint32(p.chunkID), yielded, want) + } + if p.opts.fixTxHashes { + p.logger.Infof("lcm chunk %d: tx-hash fixup — ledgers=%d txs=%d fixed=%d skipped=%d", + uint32(p.chunkID), fx.ledgers, fx.txs, fx.fixed, fx.skipped) } } _ = first } } +// isEnd reports whether err signals a clean end of the framed-XDR file. +func isEnd(err error) bool { + return errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF) +} + // seekFirstBenchmark advances f past the setup ledgers (seq <= checkpoint) // and returns the first benchmark ledger's sequence and its (already-read) // payload. 
Only the setup region plus the first benchmark frame are decoded. @@ -257,7 +312,7 @@ type BSBOpts struct { // buffered-storage stream opens/closes its datastore + backend per iteration. // Each call yields an INDEPENDENT stream, so concurrent chunk workers run fully // in parallel (independent ColdReaders / GCS prefetch pipelines). -func openChunkStream(source, coldDir, bucketPath string, opts BSBOpts, lcm lcmOpts, chunkID chunkPkg.ID) (ledgerbackend.LedgerStream, error) { +func openChunkStream(logger *supportlog.Entry, source, coldDir, bucketPath string, opts BSBOpts, lcm lcmOpts, chunkID chunkPkg.ID) (ledgerbackend.LedgerStream, error) { switch source { case sourceLCM: if lcm.file == "" { @@ -269,7 +324,7 @@ func openChunkStream(source, coldDir, bucketPath string, opts BSBOpts, lcm lcmOp if _, err := os.Stat(lcm.file); err != nil { return nil, fmt.Errorf("lcm file missing: %s: %w", lcm.file, err) } - return &lcmStream{opts: lcm, chunkID: chunkID}, nil + return &lcmStream{opts: lcm, chunkID: chunkID, logger: logger}, nil case sourcePack: if coldDir == "" { return nil, errors.New("--cold-dir is required when --source=pack") diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/tx_hash_helpers.go b/cmd/stellar-rpc/scripts/bench-fullhistory/tx_hash_helpers.go index 8a7d33135..0aaf0d5d5 100644 --- a/cmd/stellar-rpc/scripts/bench-fullhistory/tx_hash_helpers.go +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/tx_hash_helpers.go @@ -44,7 +44,21 @@ func sampleHashesFromCold( } defer r.Close() + // Clamp the requested [first,last] to what the pack actually holds. A chunk + // generated from a synthetic run sized below LedgersPerChunk is a partial + // chunk: its real ledgers occupy only the start of the nominal chunk range, + // so sampling a random seq across the full nominal span would hit holes. 
+ if fs, ferr := r.FirstSeq(); ferr == nil && fs > first { + first = fs + } + if ls, lerr := r.LastSeq(); lerr == nil && ls < last { + last = ls + } + span := int(last - first + 1) + if last < first { + return nil, nil + } if nLedgers > span { nLedgers = span } From b92f0cb483c175909ad361026384a318fbcba72e Mon Sep 17 00:00:00 2001 From: Simon Chow Date: Thu, 4 Jun 2026 15:50:15 +0000 Subject: [PATCH 17/27] bench(fullhistory): let single-contract workloads feed the events corpus MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The cold-events corpus builder hard-required ≥3 distinct contracts emitting 4-topic events (termsPerCategory anchors), which excluded apply-load's single-contract synthetic workloads. But the real requirement is enough unique FILTERABLE TERMS to fill the K-bucket sweep — a contract anchor plus topic values — not a minimum contract count. A single contract with topic diversity (e.g. a SAC's `transfer` events varying from/to over thousands of accounts) provides them. - scanForTopTerms: accept ≥1 contract (anchors = min(3, nContracts)); fill the rest of the 15-term budget from topic values. Only fail when NO contract emits 4-topic events. - newCorpus: validate total terms ≥ max(buckets) — the actual sweep requirement — with a message that points at topic diversity / --buckets, not contracts. Validated: cold-events now runs the full K=2..15 sweep on a synthetic SAC store (1 contract + 14 topic terms = 15) and a soroswap store (2 contracts + 13). token/custom_token still yields nothing — its events are not 4-topic (a workload property). Existing pubnet-shaped corpus behaviour is unchanged (still picks 3 contract anchors when ≥3 are present).
Co-Authored-By: Claude Opus 4.8 (1M context) --- .../scripts/bench-fullhistory/README.md | 13 ++++-- .../scripts/bench-fullhistory/corpus.go | 46 +++++++++++++------ 2 files changed, 42 insertions(+), 17 deletions(-) diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/README.md b/cmd/stellar-rpc/scripts/bench-fullhistory/README.md index e380975a3..1e28094f8 100644 --- a/cmd/stellar-rpc/scripts/bench-fullhistory/README.md +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/README.md @@ -235,10 +235,15 @@ fill, default 16; ignored when `NUM_LEDGERS` is set), `CLOSE_TIME_S`, (`--lcm-checkpoint`). The final chunk may be **partial** when the run was sized below a full chunk (`--lcm-allow-partial`, on by default); the read benches clamp their cursors to each chunk's actual ledger range. -- **`cold-events` is not supported on apply-load data.** Its corpus builder needs - ≥3 distinct contracts emitting 4-topic events, but every apply-load profile - drives a single contract. Use real pubnet chunks (`--source=bsb`/`pack`) for - event benches. `cold-ledgers`, `cold-txpage`, and `cold-txhash` all work. +- **`cold-events` works for `sac` and `soroswap`, not `token`.** The corpus + builder needs enough unique *terms* (contract anchors + topic values) to fill + the K-bucket sweep (≥ max K, default 15) — it does **not** require 3 distinct + contracts. `sac` (one SAC contract whose `transfer` events vary `from`/`to` + over thousands of accounts) and `soroswap` (router + pair contracts) both + reach 15 terms from a single/few contracts. `token` (`custom_token`) emits + events that are not 4-topic, so it yields no usable terms — use `sac` or + `soroswap` for event benches. `cold-ledgers`/`cold-txpage`/`cold-txhash` work + for all profiles. 
## Interpreting ingest output diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/corpus.go b/cmd/stellar-rpc/scripts/bench-fullhistory/corpus.go index 2c0cb0fff..6dce4f6e4 100644 --- a/cmd/stellar-rpc/scripts/bench-fullhistory/corpus.go +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/corpus.go @@ -116,6 +116,22 @@ func newCorpus( if err != nil { return nil, fmt.Errorf("corpus: scan: %w", err) } + // The K-filter sweep needs at least maxK distinct terms (contract anchors + + // topic values) so the largest bucket can place one term per filter. This + // is the real requirement — NOT a minimum contract count: a single contract + // with enough topic diversity (e.g. a SAC's `transfer` over many accounts) + // satisfies it. + maxK := buckets[0] + for _, k := range buckets { + if k > maxK { + maxK = k + } + } + if len(terms) < maxK { + return nil, fmt.Errorf("corpus: only %d filterable terms (contract anchors + topic values); "+ + "the K-bucket sweep needs ≥%d. The workload lacks topic diversity — "+ + "use a richer workload or lower --buckets (max K)", len(terms), maxK) + } return &corpus{ terms: terms, buckets: append([]int(nil), buckets...), @@ -286,11 +302,15 @@ func scanForTopTerms( } } - // Anchors: top termsPerCategory contracts by 4-topic event count. - // Each anchor lets a K=3 partition place one contract-constraint - // per filter, ensuring filters AND a specific contract bitmap - // against their topic bitmaps (otherwise filters would only - // constrain topics and the cardinality model degenerates). + // Anchors: up to termsPerCategory contracts by 4-topic event count. + // Each anchor lets a partition place one contract-constraint per filter, + // so filters AND a specific contract bitmap against their topic bitmaps. + // Real pubnet chunks have many contracts; synthetic apply-load workloads + // drive a SINGLE contract, so we accept as few as one anchor and make up + // the term budget from that contract's topic-value diversity (e.g. 
a SAC's + // `transfer` events vary `from`/`to` over thousands of accounts). The total + // usable-term count — not the contract count — is what the K-filter sweep + // needs (validated against the bucket set in newCorpus). ranked := make([]*contractInfo, 0, len(stats)) for _, ci := range stats { if ci.events4Topic > 0 { @@ -300,11 +320,11 @@ func scanForTopTerms( sort.Slice(ranked, func(i, j int) bool { return ranked[i].events4Topic > ranked[j].events4Topic }) - if len(ranked) < termsPerCategory { - return nil, fmt.Errorf("corpus: only %d contracts emit 4-topic events; need ≥%d", - len(ranked), termsPerCategory) + if len(ranked) == 0 { + return nil, fmt.Errorf("corpus: no contracts emit 4-topic events") } - picked := ranked[:termsPerCategory] + nAnchors := min(termsPerCategory, len(ranked)) + picked := ranked[:nAnchors] // Topic budget: remaining-budget (position, value) pairs aggregated // over the picked contracts, ranked by frequency across positions. @@ -328,9 +348,9 @@ func scanForTopTerms( } } sort.Slice(allValues, func(i, j int) bool { return allValues[i].count > allValues[j].count }) - topicBudget := min(totalTerms-termsPerCategory, len(allValues)) + topicBudget := min(totalTerms-nAnchors, len(allValues)) - terms := make([]termSpec, 0, termsPerCategory+topicBudget) + terms := make([]termSpec, 0, nAnchors+topicBudget) for _, ci := range picked { cid := ci.id terms = append(terms, termSpec{category: 0, value: append([]byte(nil), cid[:]...)}) @@ -341,8 +361,8 @@ func scanForTopTerms( terms = append(terms, termSpec{category: v.pos + 1, value: []byte(v.value)}) posCount[v.pos]++ } - logger.Infof("corpus: picker emitted %d contracts + topic positions [%d,%d,%d,%d] (%d terms total)", - termsPerCategory, posCount[0], posCount[1], posCount[2], posCount[3], len(terms)) + logger.Infof("corpus: picker emitted %d contract(s) + topic positions [%d,%d,%d,%d] (%d terms total)", + nAnchors, posCount[0], posCount[1], posCount[2], posCount[3], len(terms)) return terms, nil } 
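The anchor-and-budget rule introduced by this patch can be sketched in isolation. This is a minimal illustration rather than the code in corpus.go: `planTerms`, its parameters and the two constants are hypothetical names chosen for the example, and the value 3 for the anchor cap is assumed from the commit message ("anchors = min(3, nContracts)"). Only the rule itself comes from the patch: at least one contract, the remainder of a 15-term budget filled from topic values, and the total checked against the largest K.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	maxAnchors = 3  // assumed to mirror termsPerCategory in the patch
	totalTerms = 15 // term budget named in the commit message
)

// planTerms (hypothetical helper) returns how many contract anchors and topic
// terms would be picked for a workload with the given number of contracts
// emitting 4-topic events and the given number of distinct topic values.
func planTerms(contracts, topicValues int, buckets []int) (anchors, topics int, err error) {
	if contracts < 1 {
		return 0, 0, errors.New("no contracts emit 4-topic events")
	}
	if len(buckets) == 0 {
		return 0, 0, errors.New("empty bucket set")
	}
	anchors = min(maxAnchors, contracts)
	topics = min(totalTerms-anchors, topicValues)

	// The sweep needs at least max(buckets) terms in total.
	maxK := buckets[0]
	for _, k := range buckets {
		if k > maxK {
			maxK = k
		}
	}
	if anchors+topics < maxK {
		return anchors, topics, fmt.Errorf("only %d terms; the sweep needs at least %d", anchors+topics, maxK)
	}
	return anchors, topics, nil
}

func main() {
	// A SAC-shaped workload: one contract, thousands of topic values.
	fmt.Println(planTerms(1, 5000, []int{2, 5, 10, 15}))
}
```

For one contract with ample topic values the sketch yields 1 anchor and 14 topic terms, which matches the "1 contract + 14 topic terms = 15" validation recorded in the commit message; two contracts give 2 + 13.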
From b75236a8a46f430108cd62c374682b223eef914a Mon Sep 17 00:00:00 2001 From: Simon Chow Date: Thu, 4 Jun 2026 16:44:06 +0000 Subject: [PATCH 18/27] bench(fullhistory): sac profile BATCH_SAC=1 so the pack hits 10k TPS MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit With APPLY_LOAD_BATCH_SAC_COUNT=100 the sac profile folded 100 transfers into a single InvokeHostFunction tx, and core's benchmark mode closed/streamed just one such tx per ledger — so the usable pack carried ~100 transfers/ledger (~100 TPS), 100x below the 10k target (verified by decoding the pack: 1 tx, 1 op, ~97 events per ledger). Setting BATCH_SAC=1 makes every transfer its own tx, so the closed ledger carries the full count. Verified by decoding the regenerated packs (tail/benchmark ledgers): sac : 10000 tx / 10000 ops / 10000 events per ledger -> 10000 TPS soroswap : 2500 tx / 2500 ops / 12500 events per ledger -> 2500 TPS token : 9000 tx / 9000 ops / 9000 events per ledger -> 9000 TPS (unchanged) All four read benches (cold-ledgers/txpage/txhash/events) run with 0 errors and miss-rate=0 on the sac and soroswap stores. Co-Authored-By: Claude Opus 4.8 (1M context) --- .../scripts/bench-fullhistory/apply-load-gen.sh | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/apply-load-gen.sh b/cmd/stellar-rpc/scripts/bench-fullhistory/apply-load-gen.sh index b41c2419e..16d6b7aea 100755 --- a/cmd/stellar-rpc/scripts/bench-fullhistory/apply-load-gen.sh +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/apply-load-gen.sh @@ -18,9 +18,14 @@ # soroswap Soroswap AMM swaps (target ~2.5k TPS) # # TPS is interpreted as txs-per-ledger / ledger-close-time. With the default -# CLOSE_TIME_S=1 the per-ledger transaction counts below hit the targets; for -# SAC the count is divided by APPLY_LOAD_BATCH_SAC_COUNT (each tx batches that -# many SAC invocations, and TPS counts each invocation). 
+# CLOSE_TIME_S=1 the per-ledger transaction counts below hit the targets. +# +# NOTE on batching: APPLY_LOAD_BATCH_SAC_COUNT>1 folds N SAC transfers into a +# single InvokeHostFunction tx, so the CLOSED/streamed ledger ends up with +# one tx per batch of N transfers, not one tx per transfer: the transfers +# reach the meta batched, so the usable pack carries far fewer txs than the TPS +# target (verified: BATCH_SAC=100 streamed 1 tx/ledger). Keep BATCH_SAC=1 so +# every transfer is its own tx and the pack's tx density equals the TPS target. # # REQUIREMENTS # * stellar-core built with BUILD_TESTS (apply-load + ARTIFICIALLY_GENERATE_ @@ -76,7 +81,7 @@ fi # ---- per-profile density --------------------------------------------------- # model_tx, dependent_tx_clusters, batch_sac_count, target_tps case "$PROFILE" in - sac) MODEL_TX="sac"; CLUSTERS=1; BATCH_SAC=100; TARGET_TPS=10000 ;; + sac) MODEL_TX="sac"; CLUSTERS=1; BATCH_SAC=1; TARGET_TPS=10000 ;; token|oz) MODEL_TX="custom_token"; CLUSTERS=2; BATCH_SAC=1; TARGET_TPS=9000 ;; soroswap) MODEL_TX="soroswap"; CLUSTERS=1; BATCH_SAC=1; TARGET_TPS=2500 ;; *) die "unknown PROFILE=$PROFILE (expected sac|token|soroswap)" ;; From b02f0eb5cc8f43f6bda7b0db6427b07292cff7c5 Mon Sep 17 00:00:00 2001 From: Simon Chow Date: Thu, 4 Jun 2026 19:10:21 +0000 Subject: [PATCH 19/27] bench(fullhistory): model 600ms block time (CLOSE_TIME_MS) for TPS targets MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Per issue #762, the synthetic datasets target specific load shapes. The network target is 600ms blocks, so TPS is taken at a 600ms block time: per-ledger tx count = TPS * 0.6. Replaces the 1s assumption (CLOSE_TIME_S) with CLOSE_TIME_MS (default 600). The ledger header closeTime is whole seconds in XDR, so the sub-second cadence can't be a timestamp — it's modeled purely by density.
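As a rough check of the density arithmetic, here is a small Go sketch of the per-ledger count. The function name `txsPerLedger` is hypothetical; the ceiling division is meant to mirror the shell expression this patch adds to apply-load-gen.sh, `(TARGET_TPS * CLOSE_TIME_MS + 1000 * BATCH_SAC - 1) / (1000 * BATCH_SAC)`, and it assumes all inputs are positive.

```go
package main

import "fmt"

// txsPerLedger (hypothetical helper) returns the number of transactions one
// ledger must carry so that txs * batch / (closeTimeMS / 1000) reaches
// targetTPS, rounding up. All arguments are assumed to be positive.
func txsPerLedger(targetTPS, closeTimeMS, batch int) int {
	denom := 1000 * batch
	return (targetTPS*closeTimeMS + denom - 1) / denom
}

func main() {
	profiles := []struct {
		name string
		tps  int
	}{
		{"sac", 10000},
		{"token", 9000},
		{"soroswap", 2500},
	}
	for _, p := range profiles {
		// 600 ms block time, one transfer per tx (BATCH_SAC=1).
		fmt.Printf("%-9s %5d TPS -> %d txs/ledger\n", p.name, p.tps, txsPerLedger(p.tps, 600, 1))
	}
}
```

Rounding up rather than truncating keeps the generated density at or above the target when the product is not a multiple of 1000.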
Resulting per-ledger densities (BATCH_SAC=1): sac 10,000 TPS -> 6,000 txs/ledger token/OZ 9,000 TPS -> 5,400 txs/ledger soroswap 2,500 TPS -> 1,500 txs/ledger Co-Authored-By: Claude Opus 4.8 (1M context) --- .../scripts/bench-fullhistory/README.md | 24 ++++++++------- .../bench-fullhistory/apply-load-gen.sh | 30 ++++++++++++------- 2 files changed, 34 insertions(+), 20 deletions(-) diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/README.md b/cmd/stellar-rpc/scripts/bench-fullhistory/README.md index ea3afb64a..c9897c4aa 100644 --- a/cmd/stellar-rpc/scripts/bench-fullhistory/README.md +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/README.md @@ -215,20 +215,24 @@ CORE_BIN=/path/to/stellar-core PROFILE=sac NUM_LEDGERS=300 \ ``` **Workload profiles** (`PROFILE=`) map to apply-load's model transactions and -target throughputs (TPS = txs-per-ledger ÷ ledger-close-time; defaults assume -`CLOSE_TIME_S=1`): +target throughputs. TPS = txs-per-ledger ÷ block-time, and the target is taken at +the network's **600 ms block time** (`CLOSE_TIME_MS` default), so the per-ledger +tx count = `TPS × 0.6`: -| `PROFILE` | model tx (`APPLY_LOAD_MODEL_TX`) | target | -|---|---|---| -| `sac` | `sac` (Stellar Asset Contract transfer) | ~10k SAC TPS | -| `token` (`oz`) | `custom_token` (OpenZeppelin-style token) | ~9k OZ TPS | -| `soroswap` | `soroswap` (AMM swap, real mainnet wasm) | ~2.5k TPS | +| `PROFILE` | model tx (`APPLY_LOAD_MODEL_TX`) | target | txs/ledger @600ms | +|---|---|---|---| +| `sac` | `sac` (Stellar Asset Contract transfer) | ~10k SAC TPS | 6,000 | +| `token` (`oz`) | `custom_token` (OpenZeppelin-style token) | ~9k OZ TPS | 5,400 | +| `soroswap` | `soroswap` (AMM swap, real mainnet wasm) | ~2.5k TPS | 1,500 | + +> The ledger header `closeTime` is whole **seconds** in XDR, so a 600 ms block +> cadence can't be a timestamp — it's modeled purely by per-ledger density. 
Key env knobs: `NUM_LEDGERS` (total ledgers to generate; **prefer this for a quick run** — the final chunk may be partial), `CHUNKS` (10k-ledger chunks to -fill, default 16; ignored when `NUM_LEDGERS` is set), `CLOSE_TIME_S`, -`TXS_PER_LEDGER` (override the derived density), `TYPES`, `CHUNK_WORKERS`, -`OUT_ROOT`, `KEEP_META`, `BENCH_BIN`. +fill, default 16; ignored when `NUM_LEDGERS` is set), `CLOSE_TIME_MS` (block +time for the TPS math, default 600), `TXS_PER_LEDGER` (override the derived +density), `TYPES`, `CHUNK_WORKERS`, `OUT_ROOT`, `KEEP_META`, `BENCH_BIN`. **Requirements & caveats:** diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/apply-load-gen.sh b/cmd/stellar-rpc/scripts/bench-fullhistory/apply-load-gen.sh index 16d6b7aea..2f387f75e 100755 --- a/cmd/stellar-rpc/scripts/bench-fullhistory/apply-load-gen.sh +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/apply-load-gen.sh @@ -12,13 +12,17 @@ # # The cold-* read benches then point --cold-dir at /cold/ledgers (etc). # -# WORKLOAD PROFILES (model transaction + density): -# sac Stellar Asset Contract transfers (target ~10k SAC TPS) -# token custom (OpenZeppelin-style) token (target ~9k OZ TPS) -# soroswap Soroswap AMM swaps (target ~2.5k TPS) +# WORKLOAD PROFILES (model transaction + density). Targets are interpreted at +# the network's 600ms block time (CLOSE_TIME_MS default), so the per-ledger tx +# count = TPS * 0.6: +# profile model tx target TPS txs/ledger @600ms +# sac sac 10,000 6,000 +# token (oz) custom_token 9,000 5,400 +# soroswap soroswap 2,500 1,500 # -# TPS is interpreted as txs-per-ledger / ledger-close-time. With the default -# CLOSE_TIME_S=1 the per-ledger transaction counts below hit the targets. +# TPS = txs-per-ledger / block-time. Block time is CLOSE_TIME_MS (default 600). +# The ledger header closeTime is whole SECONDS in XDR, so 600ms blocks cannot be +# represented as timestamps — the cadence is modeled by per-ledger DENSITY only. 
# # NOTE on batching: APPLY_LOAD_BATCH_SAC_COUNT>1 folds N SAC transfers into a # single InvokeHostFunction tx, so the CLOSED/streamed ledger ends up with @@ -49,7 +53,12 @@ NUM_LEDGERS="${NUM_LEDGERS:-}" # override total ledgers (else CHUNKS # by density, not ledger count). The final # chunk is then partial (cold-ingest's # --lcm-allow-partial handles it). -CLOSE_TIME_S="${CLOSE_TIME_S:-1}" # assumed ledger close time for TPS math +CLOSE_TIME_MS="${CLOSE_TIME_MS:-600}" # modeled block time in ms for TPS math. + # Default 600ms — the network target. The + # ledger header closeTime is whole SECONDS + # in XDR, so sub-second cadence cannot be a + # timestamp; it is modeled purely as + # per-ledger density = TPS * CLOSE_TIME_MS/1000. TXS_PER_LEDGER="${TXS_PER_LEDGER:-}" # override the profile's per-ledger tx count CORE_BIN="${CORE_BIN:-$(command -v stellar-core || true)}" BENCH_BIN="${BENCH_BIN:-}" # prebuilt bench-fullhistory; built if empty @@ -87,9 +96,10 @@ case "$PROFILE" in *) die "unknown PROFILE=$PROFILE (expected sac|token|soroswap)" ;; esac -# txs-per-ledger so that (txs * batch) / close_time == target_tps +# txs-per-ledger so that (txs * batch) / (close_time_ms/1000) == target_tps, +# i.e. txs = target_tps * close_time_ms / 1000 / batch (ceil division). 
+if [ -z "$TXS_PER_LEDGER" ]; then - TXS_PER_LEDGER=$(( (TARGET_TPS * CLOSE_TIME_S + BATCH_SAC - 1) / BATCH_SAC )) + TXS_PER_LEDGER=$(( (TARGET_TPS * CLOSE_TIME_MS + 1000 * BATCH_SAC - 1) / (1000 * BATCH_SAC) )) fi # NUM_LEDGERS override: when set, it drives generation directly and CHUNKS is # derived as the number of (10k-ledger) chunks needed to cover it (the last is @@ -103,7 +113,7 @@ fi GENESIS_ACCOUNTS=$(( TXS_PER_LEDGER * 2 )) [ "$GENESIS_ACCOUNTS" -lt 21000 ] && GENESIS_ACCOUNTS=21000 -log "profile=$PROFILE model_tx=$MODEL_TX txs/ledger=$TXS_PER_LEDGER batch_sac=$BATCH_SAC -> ~$(( TXS_PER_LEDGER * BATCH_SAC / CLOSE_TIME_S )) TPS @ ${CLOSE_TIME_S}s close" +log "profile=$PROFILE model_tx=$MODEL_TX txs/ledger=$TXS_PER_LEDGER batch_sac=$BATCH_SAC -> ~$(( TXS_PER_LEDGER * BATCH_SAC * 1000 / CLOSE_TIME_MS )) TPS @ ${CLOSE_TIME_MS}ms blocks" log "chunks=$CHUNKS num_ledgers=$NUM_LEDGERS (this is the slow part — apply-load closes every ledger)" # ---- workspace + config ---------------------------------------------------- From f2e7fd326da61689febc91a218bde8a7ef6396c2 Mon Sep 17 00:00:00 2001 From: Simon Chow Date: Thu, 4 Jun 2026 19:42:44 +0000 Subject: [PATCH 20/27] bench(fullhistory): default dependent-tx clusters to 8 for generation speed MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit APPLY_LOAD_LEDGER_MAX_DEPENDENT_TX_CLUSTERS sets the number of parallel apply threads; it is purely a generation-speed knob and doesn't change the workload. Per upstream guidance, default it to 8 (was per-profile 1/2) and cap there: stellar-core's multi-threaded apply has known perf issues above 8 even on bigger boxes. Promoted to a top-level CLUSTERS env knob; removed from the per-profile table.
Co-Authored-By: Claude Opus 4.8 (1M context) --- .../scripts/bench-fullhistory/README.md | 4 +++- .../scripts/bench-fullhistory/apply-load-gen.sh | 14 ++++++++++---- 2 files changed, 13 insertions(+), 5 deletions(-) diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/README.md b/cmd/stellar-rpc/scripts/bench-fullhistory/README.md index c9897c4aa..a4c8024c3 100644 --- a/cmd/stellar-rpc/scripts/bench-fullhistory/README.md +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/README.md @@ -232,7 +232,9 @@ Key env knobs: `NUM_LEDGERS` (total ledgers to generate; **prefer this for a quick run** — the final chunk may be partial), `CHUNKS` (10k-ledger chunks to fill, default 16; ignored when `NUM_LEDGERS` is set), `CLOSE_TIME_MS` (block time for the TPS math, default 600), `TXS_PER_LEDGER` (override the derived -density), `TYPES`, `CHUNK_WORKERS`, `OUT_ROOT`, `KEEP_META`, `BENCH_BIN`. +density), `CLUSTERS` (`APPLY_LOAD_LEDGER_MAX_DEPENDENT_TX_CLUSTERS` — parallel +apply threads; **generation-speed only**, default 8, don't exceed 8), +`TYPES`, `CHUNK_WORKERS`, `OUT_ROOT`, `KEEP_META`, `BENCH_BIN`. **Requirements & caveats:** diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/apply-load-gen.sh b/cmd/stellar-rpc/scripts/bench-fullhistory/apply-load-gen.sh index 2f387f75e..13a09b5f1 100755 --- a/cmd/stellar-rpc/scripts/bench-fullhistory/apply-load-gen.sh +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/apply-load-gen.sh @@ -65,6 +65,11 @@ BENCH_BIN="${BENCH_BIN:-}" # prebuilt bench-fullhistory; built i OUT_ROOT="${OUT_ROOT:-./apply-load-out}" # work + output root TYPES="${TYPES:-ledgers,txhash,events}" # cold-ingest types CHUNK_WORKERS="${CHUNK_WORKERS:-4}" # cold-ingest chunk concurrency +CLUSTERS="${CLUSTERS:-8}" # APPLY_LOAD_LEDGER_MAX_DEPENDENT_TX_CLUSTERS: + # parallel apply threads — a GENERATION-SPEED + # knob only (does not change the workload). Cap + # at 8; multi-threaded apply has known perf + # issues above that even on bigger machines. 
KEEP_META="${KEEP_META:-0}" # 1 = keep meta.xdr after ingest # Must match the passphrase the bench binary hardcodes (main.go: pubnetPassphrase). # The ingest reader recomputes each tx hash from its envelope under this @@ -88,11 +93,12 @@ if ! "$CORE_BIN" apply-load --help >/dev/null 2>&1; then fi # ---- per-profile density --------------------------------------------------- -# model_tx, dependent_tx_clusters, batch_sac_count, target_tps +# model_tx, batch_sac_count, target_tps. (Dependent-tx clusters is a generation- +# speed knob, not per-profile — see CLUSTERS above.) case "$PROFILE" in - sac) MODEL_TX="sac"; CLUSTERS=1; BATCH_SAC=1; TARGET_TPS=10000 ;; - token|oz) MODEL_TX="custom_token"; CLUSTERS=2; BATCH_SAC=1; TARGET_TPS=9000 ;; - soroswap) MODEL_TX="soroswap"; CLUSTERS=1; BATCH_SAC=1; TARGET_TPS=2500 ;; + sac) MODEL_TX="sac"; BATCH_SAC=1; TARGET_TPS=10000 ;; + token|oz) MODEL_TX="custom_token"; BATCH_SAC=1; TARGET_TPS=9000 ;; + soroswap) MODEL_TX="soroswap"; BATCH_SAC=1; TARGET_TPS=2500 ;; *) die "unknown PROFILE=$PROFILE (expected sac|token|soroswap)" ;; esac From 2d00293ff160de93423d0110f6a0f6c523ad0ef5 Mon Sep 17 00:00:00 2001 From: Simon Chow Date: Thu, 4 Jun 2026 23:24:14 +0000 Subject: [PATCH 21/27] bench(fullhistory): HTTP_PORT knob (default 0) so generations run in parallel MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit stellar-core binds its HTTP server (default port 11626); running multiple apply-load generations concurrently failed the 2nd/3rd with "bind: address already in use". apply-load doesn't need the HTTP endpoint, so default HTTP_PORT=0 (disabled), env-overridable. Lets all profiles generate in parallel — on a 32-vCPU box that cuts a 3-profile 20k run from ~99h sequential to ~the slowest profile (~42h). 
Co-Authored-By: Claude Opus 4.8 (1M context) --- cmd/stellar-rpc/scripts/bench-fullhistory/apply-load-gen.sh | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/apply-load-gen.sh b/cmd/stellar-rpc/scripts/bench-fullhistory/apply-load-gen.sh index 13a09b5f1..008149697 100755 --- a/cmd/stellar-rpc/scripts/bench-fullhistory/apply-load-gen.sh +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/apply-load-gen.sh @@ -65,6 +65,10 @@ BENCH_BIN="${BENCH_BIN:-}" # prebuilt bench-fullhistory; built i OUT_ROOT="${OUT_ROOT:-./apply-load-out}" # work + output root TYPES="${TYPES:-ledgers,txhash,events}" # cold-ingest types CHUNK_WORKERS="${CHUNK_WORKERS:-4}" # cold-ingest chunk concurrency +HTTP_PORT="${HTTP_PORT:-0}" # stellar-core HTTP port. 0 = disabled, which + # apply-load doesn't need and which lets many + # generations run in PARALLEL without colliding + # on the default 11626 (bind: address in use). CLUSTERS="${CLUSTERS:-8}" # APPLY_LOAD_LEDGER_MAX_DEPENDENT_TX_CLUSTERS: # parallel apply threads — a GENERATION-SPEED # knob only (does not change the workload). Cap @@ -171,6 +175,7 @@ APPLY_LOAD_TIME_WRITES=true RUN_STANDALONE=true NODE_IS_VALIDATOR=true UNSAFE_QUORUM=true +HTTP_PORT=$HTTP_PORT NETWORK_PASSPHRASE="$NETWORK_PASSPHRASE" NODE_SEED="SDQVDISRYN2JXBS7ICL7QJAEKB3HWBJFP2QECXG7GZICAHBK4UNJCWK2 self" LOG_FILE_PATH="" From 3ea7cc40c755f68f123327d4788f801c11765532 Mon Sep 17 00:00:00 2001 From: Simon Chow Date: Sat, 6 Jun 2026 17:02:57 +0000 Subject: [PATCH 22/27] bench(fullhistory): synthetic-run orchestrator + bench-suite + runbook Make the synthetic-ledger generation reproducible on another machine. The per-profile generator (apply-load-gen.sh) and the meta fixup were already committed; this adds the orchestration + docs that were previously ad-hoc: - synthetic-run.sh: loop profiles -> apply-load-gen.sh (generate) -> bench-suite.sh (read benches) -> optional GCS upload. Sequential by default; PARALLEL=1 opt-in. 
Auto-builds the bench binary if BENCH_BIN unset. - bench-suite.sh: cold-* and hot-* read suite per profile (both decode modes, concurrency sweep); skips events for non-4-topic profiles (token). - SYNTHETIC-LEDGERS.md: host prereqs (~buildtests core, RocksDB cgo, Go), the TPS/600ms model, run commands, outputs, and the RAM ceiling. RAM is the real limit: dense apply-load accumulates in-memory soroban state (~8.5 MB/ledger at 6000 SAC tx/ledger), so a full 10k-ledger 10k-TPS SAC chunk needs ~96-128 GB; on a 61 GB box cap sac/token near ~6000 ledgers. Documented with a per-box sizing table so the run can target a larger machine. Co-Authored-By: Claude Opus 4.8 (1M context) --- .../bench-fullhistory/SYNTHETIC-LEDGERS.md | 129 ++++++++++++++++++ .../scripts/bench-fullhistory/bench-suite.sh | 88 ++++++++++++ .../bench-fullhistory/synthetic-run.sh | 97 +++++++++++++ 3 files changed, 314 insertions(+) create mode 100644 cmd/stellar-rpc/scripts/bench-fullhistory/SYNTHETIC-LEDGERS.md create mode 100755 cmd/stellar-rpc/scripts/bench-fullhistory/bench-suite.sh create mode 100755 cmd/stellar-rpc/scripts/bench-fullhistory/synthetic-run.sh diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/SYNTHETIC-LEDGERS.md b/cmd/stellar-rpc/scripts/bench-fullhistory/SYNTHETIC-LEDGERS.md new file mode 100644 index 000000000..cfaf30619 --- /dev/null +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/SYNTHETIC-LEDGERS.md @@ -0,0 +1,129 @@ +# Synthetic-ledger generation + benchmarking — runbook + +End-to-end recipe to generate controllable synthetic full-history datasets with +stellar-core `apply-load` and run the read bench suite on them. This is the +hands-off path: prepare the host once, then `synthetic-run.sh` does +generate → bench → (optional) upload for every profile. 
+
+Scripts (all in this directory):
+- `apply-load-gen.sh` — generate ONE profile (apply-load → meta.xdr → cold packfiles + tx-hash index)
+- `bench-suite.sh` — run the cold/hot read benches against generated stores
+- `synthetic-run.sh` — orchestrator: loop profiles → generate → bench → optional GCS upload
+
+## 1. Host prerequisites
+
+### a) stellar-core with `apply-load` (BUILD_TESTS)
+Released/Docker cores **strip** `apply-load`. Install the `~buildtests` build from
+SDF's unstable apt channel (Ubuntu 24.04 / noble shown):
+
+```sh
+sudo wget -qO /etc/apt/keyrings/SDF.asc https://apt.stellar.org/SDF.asc
+echo "deb [signed-by=/etc/apt/keyrings/SDF.asc] https://apt.stellar.org noble unstable" \
+  | sudo tee /etc/apt/sources.list.d/SDF-unstable.list
+sudo apt-get update
+apt-cache madison stellar-core | grep buildtests # pick newest, protocol you want
+sudo apt-get install -y stellar-core= # pin: it sorts below stable
+stellar-core apply-load --help # must succeed
+```
+
+### b) Go + RocksDB (to build the `bench-fullhistory` binary)
+The bench binary uses cgo against RocksDB v10 (grocksdb v1.10.x). The system
+`librocksdb` (8.x) is too old.
+
+```sh
+# Go: match go.mod's toolchain (1.26 at time of writing) — e.g. /usr/local/go
+# RocksDB v10.9.1 (shared lib + headers):
+PREFIX=$HOME/.rocksdb ./scripts/install-rocksdb.sh # repo root script
+
+export CGO_CFLAGS="-I$HOME/.rocksdb/include"
+export CGO_LDFLAGS="-L$HOME/.rocksdb/lib -lrocksdb"
+export LD_LIBRARY_PATH="$HOME/.rocksdb/lib" # needed at RUN time too
+```
+
+### c) Disk + RAM — the two real constraints
+- **Disk:** use a fast **local** volume (NVMe instance store, not network EBS) for
+  `OUT_ROOT`. The transient `meta.xdr` is large (a 10k-ledger SAC chunk ≈ ~100+ GB
+  before it's deleted post-ingest). Budget hundreds of GB free.
+- **RAM — this caps how many ledgers you can generate.** Each dense apply-load holds
+  in-memory soroban state that **grows with ledger count**. Measured: SAC at
+  6000 tx/ledger ≈ **8.5 MB/ledger** → ~32 GB at 3,760 ledgers, ~85 GB at 10,000.
+
+  | box RAM | sac/token (6000/5400 tx/ledger) | soroswap (1500 tx/ledger) |
+  |---|---|---|
+  | 61 GB (c6id.8xlarge) | ~6,000 ledgers | ~20,000 (2 chunks) |
+  | 128 GB | ~14,000 | full chunks easily |
+  | 256 GB | ~28,000 (≈3 chunks) | many chunks |
+
+  **A full 10k-ledger chunk of 10k-TPS SAC needs ~96–128 GB RAM.** If a run exceeds
+  RAM the kernel OOM-kills apply-load mid-generation. Size `NUM_LEDGERS` to the box.
+
+## 2. Profiles and the TPS model
+
+`MODEL_TX` + per-ledger density define the workload. TPS is taken at a **600 ms**
+block time (`CLOSE_TIME_MS`), so per-ledger tx count = `TPS × 0.6`:
+
+| PROFILE | model tx | target | tx/ledger @600ms |
+|---|---|---|---|
+| `sac` | SAC transfer | 10,000 TPS | 6,000 |
+| `token` (`oz`) | custom_token | 9,000 TPS | 5,400 |
+| `soroswap` | AMM swap | 2,500 TPS | 1,500 |
+
+Notes baked into the scripts:
+- `BATCH_SAC=1` so each transfer is its own tx (pack tx-density == TPS target).
+- `CLUSTERS=8` (`APPLY_LOAD_LEDGER_MAX_DEPENDENT_TX_CLUSTERS`) — generation-speed
+  only; don't exceed 8 (known multi-threaded-apply perf issues above that).
+- `HTTP_PORT=0` so parallel generations don't collide on core's HTTP port.
+- The streamed meta needs a **tx-hash fixup** (cold-ingest does it by default,
+  `--lcm-fix-tx-hashes`) or the roundtrip txpage/txhash benches reject it; the
+  passphrase is pubnet to match the bench binary. (Details in this dir's README.)
+- `cold-events`/`hot-events` work for `sac` and `soroswap`; **not** `token`
+  (custom_token events aren't 4-topic).
+
+## 3. Run it
+
+```sh
+cd cmd/stellar-rpc/scripts/bench-fullhistory
+
+# env from §1b (CGO_*, LD_LIBRARY_PATH) must be exported in this shell.
+# NUM_LEDGERS: size to your RAM (see §1c). PARALLEL=0 is sequential (safe); set 1
+# only if combined RSS fits RAM. GCS_DEST is optional (upload).
+CORE_BIN=/usr/bin/stellar-core OUT_ROOT=/mnt/nvme/synth \
+PROFILES="sac token soroswap" \
+NUM_LEDGERS=6000 PARALLEL=0 \
+GCS_DEST=gs://rpc-full-history/synthetic-ledgers/ \
+  ./synthetic-run.sh
+```
+
+For a long unattended run, detach it:
+```sh
+setsid nohup env CORE_BIN=… OUT_ROOT=… NUM_LEDGERS=… ./synthetic-run.sh > run.out 2>&1 < /dev/null &
+```
+
+`soroswap` reaches full 10k chunks on modest RAM, so a common split is:
+`NUM_LEDGERS=20000 PROFILES=soroswap` (2 chunks) plus
+`NUM_LEDGERS= PROFILES="sac token"`.
+
+## 4. Outputs
+
+```
+$OUT_ROOT//cold/{ledgers/00000/*.pack, txhash.idx, events/00000/*}
+$OUT_ROOT//work/apply-load.cfg # exact config (reproducibility input)
+$OUT_ROOT/bench-results/run-//*.csv # latency/throughput sweeps
+```
+
+Point the read benches at a cold store directly, e.g.:
+```sh
+LD_LIBRARY_PATH=$HOME/.rocksdb/lib ./bench-fullhistory cold-txpage \
+  --cold-dir=$OUT_ROOT/sac/cold/ledgers --page-size=20 --iters=200 \
+  --query-concurrency=1,4,8,16 --xdr-views --out=results-sac
+```
+
+## 5. Reproducibility caveat
+
+The **config + genesis are deterministic** (same root account each run), but the
+**transactions are not byte-reproducible**: stellar-core seeds its RNG from
+wall-clock time and `apply-load` exposes no seed. Runs match in *shape*
+(profile/density/op-mix), not bytes. To pin an exact dataset, keep the generated
+cold packs (and their SHA256s) — that's the canonical artifact. Uploading to GCS
+(`GCS_DEST`) is also how you make NVMe-instance-store output durable (it's wiped
+on instance stop/terminate).
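Reviewer sketch, not part of the patch: the sizing arithmetic in §1c and §2 of the runbook can be checked with a few lines of bash. The helper names `txs_per_ledger` and `estimate_peak_rss_mb` are hypothetical, and the 1400 bytes/tx default is an assumption taken from the ~1.4 KB/tx figure quoted in `synthetic-run.sh` (measured on SAC only). The ceil division mirrors the formula in `apply-load-gen.sh`.

```sh
#!/usr/bin/env bash
# Sizing sketch for the synthetic-ledger runbook. Helper names and the
# 1400 bytes/tx constant are illustrative assumptions; re-measure on your host.

# tx/ledger needed to hit a target TPS at a given block time (ceil division),
# same shape as the TXS_PER_LEDGER formula in apply-load-gen.sh.
txs_per_ledger() {
  local tps="$1" close_ms="$2" batch="${3:-1}"
  echo $(( (tps * close_ms + 1000 * batch - 1) / (1000 * batch) ))
}

# Rough peak RSS in MB for one generation run:
# density (tx/ledger) x ledger count x live-state bytes per tx.
estimate_peak_rss_mb() {
  local txs="$1" ledgers="$2" bytes_per_tx="${3:-1400}"
  echo $(( txs * ledgers * bytes_per_tx / 1000000 ))
}

# Print the per-profile density at 600 ms blocks and a RAM estimate
# for a 6000-ledger run.
for spec in "sac 10000" "token 9000" "soroswap 2500"; do
  read -r name tps <<< "$spec"
  txs=$(txs_per_ledger "$tps" 600)
  printf '%-9s %5d tx/ledger  ~%d MB at 6000 ledgers\n' \
    "$name" "$txs" "$(estimate_peak_rss_mb "$txs" 6000)"
done
```

The intended use is to compare the estimate with the free memory on the host before picking `NUM_LEDGERS`. Treat the result as a first-order guide: other profiles may hold more or less state per tx than SAC.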
diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/bench-suite.sh b/cmd/stellar-rpc/scripts/bench-fullhistory/bench-suite.sh new file mode 100755 index 000000000..4853df250 --- /dev/null +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/bench-suite.sh @@ -0,0 +1,88 @@ +#!/usr/bin/env bash +# +# bench-suite.sh — run the bench-fullhistory READ suite against one or more +# synthetic cold stores produced by apply-load-gen.sh (and a hot store built +# from each cold pack). Writes per-profile CSVs + logs under RESULTS. +# +# Layout expected (created by apply-load-gen.sh / synthetic-run.sh): +# $ROOT//cold/{ledgers,txhash.idx,events/00000} +# +# Iter counts are env-overridable so the same script does a quick smoke +# (small *_ITERS) or a full run (defaults below). cold-events is skipped for a +# profile whose events are not 4-topic (apply-load's custom_token); the bench +# itself errors out cleanly otherwise. +# +# Required env: BENCH_BIN (path to the bench-fullhistory binary). On Linux the +# caller must also export LD_LIBRARY_PATH to the RocksDB v10 .so dir. +set -uo pipefail + +BENCH_BIN="${BENCH_BIN:?set BENCH_BIN=/path/to/bench-fullhistory}" +ROOT="${ROOT:?set ROOT=/cold>}" +RESULTS="${RESULTS:-$ROOT/bench-results/run-$(date -u +%Y%m%dT%H%M%SZ)}" +PROFILES="${PROFILES:-sac token soroswap}" +QC="${QC:-1,4,8,16}" +LEDGER_NS="${LEDGER_NS:-1 10 20}" +LEDGERS_ITERS="${LEDGERS_ITERS:-60}" +TXPAGE_ITERS="${TXPAGE_ITERS:-200}" +TXHASH_ITERS="${TXHASH_ITERS:-1000}" +EVENTS_ITERS="${EVENTS_ITERS:-500}" +PAGE_SIZE="${PAGE_SIZE:-20}" +HOT="${HOT:-1}" # 1 = also build a hot store per profile and run hot-* benches +# Profiles whose events are NOT 4-topic (skip cold/hot-events). apply-load's +# custom_token emits non-4-topic events; sac/soroswap are fine. 
+NO_EVENTS="${NO_EVENTS:-token}" + +mkdir -p "$RESULTS" +echo "bench-suite -> $RESULTS (profiles: $PROFILES)" + +skip_events() { case " $NO_EVENTS " in *" $1 "*) return 0;; *) return 1;; esac; } + +for P in $PROFILES; do + COLD="$ROOT/$P/cold" + if [ ! -d "$COLD/ledgers" ]; then echo "skip $P (no cold store at $COLD)"; continue; fi + O="$RESULTS/$P"; mkdir -p "$O" + echo "================= $P =================" + + # ---- COLD read benches (auto-discover chunk range) ---- + for n in $LEDGER_NS; do + "$BENCH_BIN" cold-ledgers --cold-dir="$COLD/ledgers" --n="$n" --iters="$LEDGERS_ITERS" \ + --query-concurrency="$QC" --out="$O" > "$O/cold-ledgers-n$n.log" 2>&1 || echo " cold-ledgers n=$n FAILED" + done + for mode in "" "--xdr-views"; do + tag=$([ -z "$mode" ] && echo roundtrip || echo xdrviews) + "$BENCH_BIN" cold-txpage --cold-dir="$COLD/ledgers" --page-size="$PAGE_SIZE" --iters="$TXPAGE_ITERS" \ + --query-concurrency="$QC" $mode --out="$O" > "$O/cold-txpage-$tag.log" 2>&1 || echo " cold-txpage $tag FAILED" + "$BENCH_BIN" cold-txhash --cold-dir="$COLD/ledgers" --txhash-cold-mphf="$COLD/txhash.idx" --iters="$TXHASH_ITERS" \ + --query-concurrency="$QC" $mode --out="$O" > "$O/cold-txhash-$tag.log" 2>&1 || echo " cold-txhash $tag FAILED" + done + if ! 
skip_events "$P"; then + "$BENCH_BIN" cold-events --cold-events-dir="$COLD/events/00000" --iters="$EVENTS_ITERS" \ + --query-concurrency="$QC" --out="$O" > "$O/cold-events.log" 2>&1 || echo " cold-events FAILED" + fi + + # ---- HOT: build a hot store from the cold pack (chunk 1), then hot reads ---- + if [ "$HOT" = "1" ]; then + H="$O/hot" + "$BENCH_BIN" hot-ingest --types=ledgers,txhash,events --source=pack --cold-dir="$COLD/ledgers" \ + --chunk=1 --hot-dir="$H" --out="$O" > "$O/hot-ingest.log" 2>&1 || echo " hot-ingest FAILED (skipping hot reads)" + if [ -d "$H/ledgers" ]; then + for n in $LEDGER_NS; do + "$BENCH_BIN" hot-ledgers --hot-dir="$H/ledgers" --chunk=1 --n="$n" --iters="$LEDGERS_ITERS" \ + --query-concurrency="$QC" --out="$O" > "$O/hot-ledgers-n$n.log" 2>&1 || echo " hot-ledgers n=$n FAILED" + done + for mode in "" "--xdr-views"; do + tag=$([ -z "$mode" ] && echo roundtrip || echo xdrviews) + "$BENCH_BIN" hot-txpage --hot-dir="$H/ledgers" --chunk=1 --page-size="$PAGE_SIZE" --iters="$TXPAGE_ITERS" \ + --query-concurrency="$QC" $mode --out="$O" > "$O/hot-txpage-$tag.log" 2>&1 || echo " hot-txpage $tag FAILED" + "$BENCH_BIN" hot-txhash --hot-dir="$H/ledgers" --txhash-hot="$H/txhash" --cold-dir="$COLD/ledgers" --chunk=1 \ + --iters="$TXHASH_ITERS" --query-concurrency="$QC" $mode --out="$O" > "$O/hot-txhash-$tag.log" 2>&1 || echo " hot-txhash $tag FAILED" + done + if ! 
skip_events "$P"; then + "$BENCH_BIN" hot-events --hot-dir="$H/events" --chunk=1 --iters="$EVENTS_ITERS" \ + --query-concurrency="$QC" --out="$O" > "$O/hot-events.log" 2>&1 || echo " hot-events FAILED" + fi + fi + fi + echo " $P done -> $O" +done +echo "ALL BENCHES DONE -> $RESULTS" diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/synthetic-run.sh b/cmd/stellar-rpc/scripts/bench-fullhistory/synthetic-run.sh new file mode 100755 index 000000000..a7681fa97 --- /dev/null +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/synthetic-run.sh @@ -0,0 +1,97 @@ +#!/usr/bin/env bash +# +# synthetic-run.sh — end-to-end driver: generate synthetic cold stores for one +# or more apply-load profiles, then run the read bench suite on them. +# +# for each PROFILE: apply-load-gen.sh (apply-load -> meta -> cold packfiles) +# then: bench-suite.sh (cold-* / hot-* read benches -> CSVs) +# +# Profiles run SEQUENTIALLY by default. This matters: each dense apply-load holds +# in-memory soroban state that GROWS with ledger count (~8.5 MB/ledger at +# 6000 SAC tx/ledger), so running profiles in parallel multiplies peak RAM and +# can OOM. See MEMORY below. +# +# ---- MEMORY (read this before picking NUM_LEDGERS) -------------------------- +# Peak RSS ≈ density(tx/ledger) × NUM_LEDGERS × ~1.4 KB/tx of live state. +# Measured on c6id.8xlarge (61 GB): sac @6000 tx/ledger hit ~32 GB at 3,760 +# ledgers and projected ~85 GB at 10,000 -> OOM. Rough guidance per profile: +# RAM 61 GB -> sac/token ~6,000 ledgers; soroswap ~20,000 (it's 1,500 tx/ledger) +# RAM 128 GB -> sac/token ~14,000; soroswap full chunks easily +# RAM 256 GB -> sac/token ~28,000 (3 chunks); etc. +# A full 10k-ledger chunk of 10k-TPS SAC needs ~96-128 GB RAM. Size NUM_LEDGERS +# to your box, or run PARALLEL=1 only when total peak RSS fits in RAM. 
+# +# ---- USAGE ----------------------------------------------------------------- +# CORE_BIN=/usr/bin/stellar-core \ # a BUILD_TESTS (~buildtests) core +# OUT_ROOT=/mnt/nvme/synth \ # work + cold output (use fast local disk) +# PROFILES="sac token soroswap" \ +# NUM_LEDGERS=6000 \ # per profile; size to RAM (see above) +# PARALLEL=0 \ # 1 = all profiles at once (watch RAM!) +# GCS_DEST=gs://bucket/path \ # optional: upload cold stores + results +# ./synthetic-run.sh +# +# BENCH_BIN is auto-built from this dir if unset (needs Go + the RocksDB cgo deps; +# see SYNTHETIC-LEDGERS.md). On Linux, export LD_LIBRARY_PATH to the RocksDB .so. +set -uo pipefail + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +CORE_BIN="${CORE_BIN:-$(command -v stellar-core || true)}" +OUT_ROOT="${OUT_ROOT:-./synthetic-out}" +PROFILES="${PROFILES:-sac token soroswap}" +NUM_LEDGERS="${NUM_LEDGERS:-6000}" +CLUSTERS="${CLUSTERS:-8}" +PARALLEL="${PARALLEL:-0}" +RUN_BENCH="${RUN_BENCH:-1}" +GCS_DEST="${GCS_DEST:-}" +export CLOSE_TIME_MS="${CLOSE_TIME_MS:-600}" +export KEEP_META="${KEEP_META:-0}" + +log(){ printf '\033[1;36m[synthetic-run]\033[0m %s\n' "$*" >&2; } +die(){ printf '\033[1;31m[synthetic-run] ERROR:\033[0m %s\n' "$*" >&2; exit 1; } + +[ -n "$CORE_BIN" ] || die "stellar-core not found; set CORE_BIN (must be a ~buildtests build with apply-load)" +mkdir -p "$OUT_ROOT"; OUT_ROOT="$(cd "$OUT_ROOT" && pwd)" + +# Build the bench binary once if not provided. +if [ -z "${BENCH_BIN:-}" ]; then + BENCH_BIN="$SCRIPT_DIR/bench-fullhistory" + log "building bench-fullhistory -> $BENCH_BIN" + ( cd "$SCRIPT_DIR" && go build -o "$BENCH_BIN" . 
) || die "go build failed (see SYNTHETIC-LEDGERS.md for cgo/RocksDB setup)" +fi +export CORE_BIN BENCH_BIN OUT_ROOT CLUSTERS NUM_LEDGERS + +gen_one(){ + local P="$1" + log "generate $P (num_ledgers=$NUM_LEDGERS clusters=$CLUSTERS close_ms=$CLOSE_TIME_MS)" + PROFILE="$P" "$SCRIPT_DIR/apply-load-gen.sh" > "$OUT_ROOT/$P.gen.log" 2>&1 + log "$P generate exit=$? (log: $OUT_ROOT/$P.gen.log)" +} + +log "START $(date -u +%FT%TZ) profiles='$PROFILES' parallel=$PARALLEL out=$OUT_ROOT mem-free=$(free -g 2>/dev/null|awk '/Mem/{print $4}')G" +if [ "$PARALLEL" = "1" ]; then + log "PARALLEL=1: ensure combined peak RSS fits in RAM (see MEMORY note)" + pids=(); for P in $PROFILES; do gen_one "$P" & pids+=($!); done + wait "${pids[@]}" +else + for P in $PROFILES; do gen_one "$P"; done +fi +log "generation done $(date -u +%FT%TZ)" + +if [ "$RUN_BENCH" = "1" ]; then + RESULTS="${RESULTS:-$OUT_ROOT/bench-results/run-$(date -u +%Y%m%dT%H%M%SZ)}" + log "bench suite -> $RESULTS" + ROOT="$OUT_ROOT" RESULTS="$RESULTS" PROFILES="$PROFILES" BENCH_BIN="$BENCH_BIN" \ + bash "$SCRIPT_DIR/bench-suite.sh" || log "bench-suite returned nonzero" +fi + +if [ -n "$GCS_DEST" ]; then + command -v gsutil >/dev/null || die "GCS_DEST set but gsutil not found" + log "uploading cold stores + results to $GCS_DEST" + for P in $PROFILES; do + [ -d "$OUT_ROOT/$P/cold" ] && gsutil -m cp -r "$OUT_ROOT/$P/cold" "$GCS_DEST/$P/cold" + [ -f "$OUT_ROOT/$P/work/apply-load.cfg" ] && gsutil cp "$OUT_ROOT/$P/work/apply-load.cfg" "$GCS_DEST/$P/apply-load.cfg" + done + [ -d "${RESULTS:-}" ] && gsutil -m cp -r "$RESULTS" "$GCS_DEST/bench-results" + log "upload done -> $GCS_DEST" +fi +log "DONE $(date -u +%FT%TZ)" From 2fd196690312251f9a01c7a4982b822006daab76 Mon Sep 17 00:00:00 2001 From: Simon Chow Date: Tue, 9 Jun 2026 04:57:51 +0000 Subject: [PATCH 23/27] bench(fullhistory): support variable topic count + cap K to available terms The events corpus hard-required exactly-4-topic events, so apply-load's custom_token profile 
(whose transfer events carry 3 topics) produced zero terms and couldn't run the events bench at all. - EVENTS_TOPIC_COUNT env (default 4) sets the required topic count; extractors and the scan loop use it instead of a literal 4. sac/soroswap (4-topic) unchanged; token runs with EVENTS_TOPIC_COUNT=3. - newCorpus: instead of erroring when the workload can't reach max(buckets), CAP the K-bucket sweep to the terms available (dedup), logging the cap. Lets low-diversity workloads run at the largest K they support. Verified: token (3-topic, 1 contract) now builds the full 15-term universe (contract + transfer symbol + 13 from-addresses) and runs the K=1..15 sweep; cold-events 235ms@c=1 / 111 ops peak, hot-events 14ms / 1140 ops. Co-Authored-By: Claude Opus 4.8 (1M context) --- .../scripts/bench-fullhistory/corpus.go | 87 ++++++++++++++----- 1 file changed, 67 insertions(+), 20 deletions(-) diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/corpus.go b/cmd/stellar-rpc/scripts/bench-fullhistory/corpus.go index 6dce4f6e4..4b27e8aee 100644 --- a/cmd/stellar-rpc/scripts/bench-fullhistory/corpus.go +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/corpus.go @@ -22,7 +22,9 @@ import ( "errors" "fmt" "math/rand/v2" + "os" "sort" + "strconv" supportlog "github.com/stellar/go-stellar-sdk/support/log" "github.com/stellar/go-stellar-sdk/xdr" @@ -36,6 +38,31 @@ import ( // one contract-per-filter without forcing a collision. const termsPerCategory = 3 +// maxTopics is Soroban's max contract-event topic count. +const maxTopics = 4 + +// wantTopics is how many topic positions a contract event must have to qualify +// for the corpus. Default 4 (SAC / soroswap transfer-style events). Override +// via EVENTS_TOPIC_COUNT — e.g. 3 for apply-load's custom_token, whose transfer +// events carry 3 topics. Clamped to [1, maxTopics]. 
+var wantTopics = topicCountFromEnv() + +func topicCountFromEnv() int { + n := maxTopics + if v := os.Getenv("EVENTS_TOPIC_COUNT"); v != "" { + if p, err := strconv.Atoi(v); err == nil { + n = p + } + } + if n < 1 { + n = 1 + } + if n > maxTopics { + n = maxTopics + } + return n +} + // totalTerms is the budget of the picked term universe; matches // eventstore.Query's documented ≤15-unique-term caller ceiling. const totalTerms = 15 @@ -116,29 +143,49 @@ func newCorpus( if err != nil { return nil, fmt.Errorf("corpus: scan: %w", err) } - // The K-filter sweep needs at least maxK distinct terms (contract anchors + - // topic values) so the largest bucket can place one term per filter. This - // is the real requirement — NOT a minimum contract count: a single contract - // with enough topic diversity (e.g. a SAC's `transfer` over many accounts) - // satisfies it. - maxK := buckets[0] + if len(terms) == 0 { + return nil, errors.New("corpus: 0 filterable terms — no contract events with topics found") + } + // The K-filter sweep needs at least K distinct terms (contract anchors + + // topic values) to place one term per filter. Rather than fail when the + // workload can't reach max(buckets), CAP the sweep to the terms available: + // keep buckets ≤ len(terms) and clamp the rest down to len(terms) (dedup). + // This lets low-diversity workloads (e.g. custom_token's 3-topic events) + // still run at the largest K they can support. The actual per-iter unique + // term count is recorded in nUniqueTerms regardless. + capK := len(terms) + kept := make([]int, 0, len(buckets)) + seen := map[int]bool{} for _, k := range buckets { - if k > maxK { - maxK = k + if k > capK { + k = capK + } + if !seen[k] { + seen[k] = true + kept = append(kept, k) } } - if len(terms) < maxK { - return nil, fmt.Errorf("corpus: only %d filterable terms (contract anchors + topic values); "+ - "the K-bucket sweep needs ≥%d. 
The workload lacks topic diversity — "+ - "use a richer workload or lower --buckets (max K)", len(terms), maxK) + if capK < buckets[len(buckets)-1] || capK < maxOf(buckets) { + logger.Warnf("corpus: only %d filterable terms; capping K-bucket sweep to ≤%d (requested up to %d)", + len(terms), capK, maxOf(buckets)) } return &corpus{ terms: terms, - buckets: append([]int(nil), buckets...), + buckets: kept, maxEvents: maxEvents, }, nil } +func maxOf(xs []int) int { + m := xs[0] + for _, x := range xs { + if x > m { + m = x + } + } + return m +} + // Next produces the next request via round-robin partition with // collision-recovery search. Each call advances the RNG; the // sequence is deterministic given the seed. @@ -297,7 +344,7 @@ func scanForTopTerms( stats[cid] = ci } ci.events4Topic++ - for d := range 4 { + for d := range wantTopics { ci.posCounts[d][raws[d]]++ } } @@ -336,7 +383,7 @@ func scanForTopTerms( count int } allValues := make([]posValue, 0, 64) - for d := range 4 { + for d := range wantTopics { agg := map[string]int{} for _, ci := range picked { for v, c := range ci.posCounts[d] { @@ -406,10 +453,10 @@ func extractContract4TopicsStruct(ev *xdr.ContractEvent) ([32]byte, [4]string, b return zero, raws, false } topics := ev.Body.V0.Topics - if len(topics) != 4 { + if len(topics) != wantTopics { return zero, raws, false } - for d := range 4 { + for d := range wantTopics { b, err := topics[d].MarshalBinary() if err != nil { return zero, raws, false @@ -458,7 +505,7 @@ func extractContract4TopicsView(raw []byte) ([32]byte, [4]string, bool) { return zero, raws, false } count, err := topicsArr.Count() - if err != nil || count != 4 { + if err != nil || int(count) != wantTopics { return zero, raws, false } i := 0 @@ -466,7 +513,7 @@ func extractContract4TopicsView(raw []byte) ([32]byte, [4]string, bool) { if ierr != nil { return zero, raws, false } - if i >= 4 { + if i >= wantTopics { break } topicRaw, rerr := topic.Raw() @@ -482,7 +529,7 @@ func 
extractContract4TopicsView(raw []byte) ([32]byte, [4]string, bool) { raws[i] = string(topicRaw) i++ } - if i != 4 { + if i != wantTopics { return zero, raws, false } var cid [32]byte From 76bb84b9388ebc31e510028df39eca82b49cb675 Mon Sep 17 00:00:00 2001 From: Simon Chow Date: Tue, 9 Jun 2026 14:49:59 +0000 Subject: [PATCH 24/27] bench(fullhistory): synthetic apply-load benchmark report (#762) Results report for the three synthetic apply-load datasets (sac/token/soroswap, 10k SAC / 9k OZ / 2.5k Soroswap TPS @ 600ms blocks) run through the full read + ingest suite on c6id.8xlarge. Datasets, configs, CSVs, and RESULTS.md are in GCS at gs://rpc-full-history/synthetic-ledgers/2026-06-04-apply-load-20k/. Covers query latency (both decode paths, p50/p99), peak throughput, per-stage ingest, and a comparison to the pubnet chunk-5860 baseline (synthetic is ~5-9x slower per query due to per-ledger density; ingest is item-bound). Implements #762's acceptance criteria via --source=lcm (no SDK loadtest dependency). Co-Authored-By: Claude Opus 4.8 (1M context) --- .../2026-06-09-synthetic-apply-load.md | 126 ++++++++++++++++++ 1 file changed, 126 insertions(+) create mode 100644 cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-09-synthetic-apply-load.md diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-09-synthetic-apply-load.md b/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-09-synthetic-apply-load.md new file mode 100644 index 000000000..d897f21a1 --- /dev/null +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-09-synthetic-apply-load.md @@ -0,0 +1,126 @@ +# stellar-rpc full-history bench — synthetic apply-load datasets (2026-06-09) + +Addresses **[#762](https://github.com/stellar/stellar-rpc/issues/762)** — a +*controllable* synthetic dataset whose transaction profile we set deliberately, +so we can characterize ingest/query behavior under specific load shapes instead +of only whatever pubnet happened to produce. 
+
+Three synthetic datasets were generated with stellar-core `apply-load`
+(`apply-load-gen.sh`) and run through the full `bench-fullhistory` read + ingest
+suite. Datasets, configs, per-iter/per-sweep CSVs, and the machine-readable
+`RESULTS.md` live at
+`gs://rpc-full-history/synthetic-ledgers/2026-06-04-apply-load-20k/`; every
+number here is recomputed from those CSVs.
+
+> **Relation to #762.** The issue proposed a `--source=synthetic` bench source
+> backed by `ingest/loadtest.ApplyLoad`. The pinned `go-stellar-sdk` doesn't yet
+> include `ingest/loadtest`, so this uses the equivalent **`--source=lcm`** path
+> (read the framed `LedgerCloseMeta` `apply-load` streams) — same acceptance
+> criteria (generate N-chunk datasets → `cold-ingest`/`hot-ingest` →
+> `cold-*`/`hot-*` query benches), no dependency bump. Swapping in
+> `loadtest.ApplyLoad` later is a drop-in producer change.
+
+## Setup
+
+- **machine:** AWS c6id.8xlarge — 32 vCPU (Intel Ice Lake), 61 GB RAM, local NVMe instance store
+- **core:** `stellar-core 26.1.1-3289.51ecab1e7.noble~buildtests` (commit `51ecab1e7`, ledger protocol 27)
+- **gen:** `APPLY_LOAD_MODE=benchmark`, `BATCH_SAC=1`, `CLUSTERS=8`, pubnet passphrase, tx-hash fixup applied at ingest
+- **block model:** 600 ms (per-ledger tx count = TPS × 0.6)
+- **bench:** concurrency sweep `1,4,8,16`; iters — ledgers 60, txpage 200, txhash 1000, events 500; both decode paths (roundtrip + xdr-views)
+
+### Datasets (single machine; all reads run on the c6id.8xlarge above)
+
+| profile | model tx | target | tx/ledger | ledgers | chunks | txs | cold size |
+|---|---|---|---|---|---|---|---|
+| **sac** | SAC transfer | 10,000 TPS | 6,000 | 10,000 | 1 | 59.87 M | 28 GB |
+| **token** (oz) | custom_token | 9,000 TPS | 5,400 | 10,000 | 1 | 53.85 M | 23 GB |
+| **soroswap** | AMM swap | 2,500 TPS | 1,500 | 20,000 | 2 | 29.92 M | 16 GB |
+
+A full 10k-ledger chunk of 10k-TPS SAC needs ~96–128 GB RAM (in-memory soroban
+state grows ~8.5 MB/ledger); on the 61 GB box sac/token were generated at 10k
+ledgers (1 chunk) and soroswap — far lighter at 1,500 tx/ledger — at 20k (2
+chunks). See `SYNTHETIC-LEDGERS.md` for the per-RAM sizing table.
+
+## Table 1 — Query latency, p50 / p99 @ c=1 (ms). `cold / hot`
+
+**xdr-views** path (the realistic server path):
+
+| workload | sac | token | soroswap |
+|---|---|---|---|
+| tx-page | 26.2 / 21.2 | 25.6 / 21.3 | 14.1 / 11.5 |
+| tx-hash | 16.8 / 14.3 | 17.6 / 14.0 | 9.1 / 7.1 |
+| events | 229 / 14.0 | 235 / 13.6 | 210 / 32.5 |
+| ledgers (n=20) | 101 / 91 | 92 / 84 | 53 / 51 |
+
+**roundtrip** path (production `UnmarshalBinary` + `ParseTransaction`):
+
+| workload | sac | token | soroswap |
+|---|---|---|---|
+| tx-page | 129 / 121 | 135 / 128 | 89 / 84 |
+| tx-hash | 117 / 120 | 132 / 114 | 82 / 81 |
+
+## Table 2 — Peak query throughput, ops/s (best across c=1→16, xdr-views). `cold / hot`
+
+| workload | sac | token | soroswap |
+|---|---|---|---|
+| tx-page | 411 / 491 | 412 / 498 | 732 / 888 |
+| tx-hash | 575 / 734 | 610 / 789 | 992 / 1,277 |
+| events | 120 / 740 | 112 / 1,140 | 38 / 286 |
+
+events note: `custom_token` emits **3-topic** events (others 4-topic), so token's
+corpus is built with `EVENTS_TOPIC_COUNT=3`; it still reaches the full K=15
+universe. Events latency falls sharply with K (token cold 250 ms @ K=1 → 47 ms @
+K=15) as more filters select fewer events.
+
+## Table 3 — Ingest throughput (cold-ingest from pack, `--parallel --xdr-views`)
+
+| profile | total wall | ledgers/s (e2e) | per-ledger p50 / p99 | txhash items/s | events items/s | build-txhash-index |
+|---|---|---|---|---|---|---|
+| sac | 8m28s | 20 | 24 / 54 ms | 701 k | 142 k | 59.9 M keys @ 23.3 M/s |
+| token | 4m22s | 38 | 18 / 33 ms | 707 k | 291 k | 53.8 M keys @ 11.8 M/s |
+| soroswap | 3m06s | 108 | 14 / 24 ms | 371 k | 506 k | 29.9 M keys @ 17.1 M/s |
+
+cold-ingest end-to-end is **events-stage-bound** (term-index + cold append);
+per-ledger cost scales with events/ledger. Per-ledger cold-ingest stays **under
+~55 ms through p99** even on 6k-tx ledgers (rare max-tail spikes to ~0.4–1 s on
+packfile flush).
+
+## How these compare to production (pubnet) chunks
+
+Same machine + harness, vs the pubnet chunk-5860 run
+(`results/2026-06-03-cross-machine.md`):
+
+- **Per-query latency is ~5–9× higher** and **throughput ~5–8× lower** than the
+  pubnet chunk. Cause is **per-ledger density**: every query touches a whole
+  `LedgerCloseMeta`, and a 1.5k–6k-tx synthetic LCM is ~50–300× larger than a
+  sparse pubnet ledger. The clean gradient *within* the synthetic set —
+  soroswap (1.5k) ~2× faster than sac/token (6k) on the identical code path —
+  confirms density, not "synthetic-ness", is the driver.
+- **Ingest is item-bound, not ledger-bound:** synthetic ledgers/s is ~15–80×
+  lower, but per-item rates (keys/s, items/s) match pubnet's order. Same work
+  per tx, packed into fewer, fatter ledgers.
+- **Qualitative findings match pubnet** across all four workloads: xdr-views is
+  4–9× faster than roundtrip; hot ≈ cold for point reads but hot wins big on
+  events; throughput scales with concurrency. All benches ran with **0 errors**
+  and tx-hash **miss-rate 0**.
+
+These datasets are a **density stress test** — they exercise sustained 2.5k–10k
+TPS regimes pubnet rarely produces, so absolute latencies are higher than
+typical pubnet serving. They are not a substitute for the pubnet baseline; they
+characterize the read/ingest path under deliberate high-TPS load shapes.
+
+## Reproducing
+
+See [`SYNTHETIC-LEDGERS.md`](../SYNTHETIC-LEDGERS.md).
End to end: + +```sh +CORE_BIN=/usr/bin/stellar-core OUT_ROOT=/mnt/nvme/synth \ +PROFILES="sac token soroswap" NUM_LEDGERS= \ +GCS_DEST=gs://rpc-full-history/synthetic-ledgers/ \ + ./synthetic-run.sh +``` + +Generation is reproducible from `(config + profile)`; exact transactions are +**not** byte-reproducible (apply-load seeds its RNG from wall-clock and exposes +no seed), so the generated cold packs are the canonical pinned artifact — kept +in GCS at the path above. From ce8d9c272d3f435be2b3f0ad5546624bc65fae3f Mon Sep 17 00:00:00 2001 From: Simon Chow Date: Tue, 9 Jun 2026 15:04:48 +0000 Subject: [PATCH 25/27] bench(fullhistory): interactive HTML explorer for synthetic apply-load results (#762) Self-contained, offline-capable HTML explorer (same tool as #758) for the three synthetic datasets. The "profile" dimension (sac/token/soroswap) maps onto the explorer's per-subdir axis; UI labels relabeled machine->Profile. - adds make_explorer.py (from #758) + the generated 2026-06-09-synthetic-apply-load-explorer.html (all CSVs embedded; no deps). - fixes the cold-throughput calc: divide chunk_wall by the actual chunk-workers (= chunk count for these runs; override via COLD_CHUNK_WORKERS) instead of a hardcoded 8, so cold ledgers/s reads correctly (sac 20, token 38, soroswap 108). 
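
The corrected cold-throughput arithmetic described above can be sketched as follows. This is a minimal illustration, not the code in `make_explorer.py`: the function name is hypothetical, and the wall-time inputs are stand-ins derived from the "total wall" column of Table 3 (assuming each soroswap chunk ran for the whole 3m06s), not the actual `chunk_wall` values from the CSVs.

```python
# Hypothetical sketch of the cold ledgers/s calculation; inputs are illustrative.
LEDGERS_PER_CHUNK = 10_000  # chunk size used by these runs


def cold_ledgers_per_s(chunk_wall_total_ns, n_chunks, workers=None):
    """chunk_wall_total_ns is the SUM of per-chunk walls, so the effective
    end-to-end wall is roughly that sum divided by the number of chunk workers."""
    workers = workers or n_chunks or 1
    secs = (chunk_wall_total_ns / 1e9) / workers
    return round(n_chunks * LEDGERS_PER_CHUNK / secs)


if __name__ == "__main__":
    print(cold_ledgers_per_s(508_000_000_000, 1))             # sac, 8m28s      -> 20
    print(cold_ledgers_per_s(262_000_000_000, 1))             # token, 4m22s    -> 38
    print(cold_ledgers_per_s(372_000_000_000, 2))             # soroswap, 2x3m06s -> 108
    print(cold_ledgers_per_s(508_000_000_000, 1, workers=8))  # old hardcoded 8 -> 157
```

The last call shows why the hardcoded divisor mattered: dividing a single-chunk wall by 8 workers overstates sac throughput by roughly 8x.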
Co-Authored-By: Claude Opus 4.8 (1M context) --- ...6-06-09-synthetic-apply-load-explorer.html | 271 ++++++++++++ .../results/make_explorer.py | 398 ++++++++++++++++++ 2 files changed, 669 insertions(+) create mode 100644 cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-09-synthetic-apply-load-explorer.html create mode 100644 cmd/stellar-rpc/scripts/bench-fullhistory/results/make_explorer.py diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-09-synthetic-apply-load-explorer.html b/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-09-synthetic-apply-load-explorer.html new file mode 100644 index 000000000..f26e0deb5 --- /dev/null +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-09-synthetic-apply-load-explorer.html @@ -0,0 +1,271 @@ + + + +stellar-rpc synthetic apply-load bench (sac/token/soroswap, 2026-06-09) + + +

stellar-rpc synthetic apply-load bench (sac/token/soroswap, 2026-06-09)

gs://rpc-full-history/synthetic-ledgers/2026-06-04-apply-load-20k/ · interactive explorer · all data embedded (offline-capable). Toggle tier / decode-path / percentiles / machines / concurrency; click a column header to sort.
+
Queries
Ingest
Ingest totals
+
+
+
+
+ chart metric:
+
+

Line graph — one line per series (profile · tier · workload · path), x-axis = concurrency. Filter to one workload/path/tier to overlay profiles; click a legend entry to hide/show a line.

+

Bar chart — one bar per visible row.

+
+
+ + +
+ + \ No newline at end of file diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/results/make_explorer.py b/cmd/stellar-rpc/scripts/bench-fullhistory/results/make_explorer.py new file mode 100644 index 000000000..5035548fc --- /dev/null +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/results/make_explorer.py @@ -0,0 +1,398 @@ +#!/usr/bin/env python3 +"""Build a self-contained interactive HTML explorer from bench-fullhistory runs. + +Usage: + python3 make_explorer.py [--title "..."] [--source "gs://..."] + + contains one subdir per machine (the per-machine result dirs as +uploaded to gs://rpc-full-history/benchmarks//). The output is a single +HTML file with all data embedded — no server, no build step, works offline. +""" +import csv, json, os, sys, argparse + +# machine dir prefix -> short label +def machine_label(dirname): + return dirname.split("-2026")[0].split("-2025")[0] + +def rc(path): + if not os.path.exists(path): + return [] + with open(path) as f: + return list(csv.DictReader(f)) + +def sweep_rows(d, fn, machine, tier, workload, path): + out = [] + for r in rc(os.path.join(d, fn)): + out.append({ + "machine": machine, "tier": tier, "workload": workload, "path": path, + "c": int(r["query_concurrency"]), + "p50": float(r["p50_ms"]), "p90": float(r["p90_ms"]), + "p99": float(r["p99_ms"]), "max": float(r["max_ms"]), + "ops": float(r["ops_per_sec"]), + }) + return out + +def stage_rows(d, fn, machine, tier, mode, mapping): + """mapping: {csv_stage: display_stage}""" + out = [] + for r in rc(os.path.join(d, fn)): + st = r["stage"] + if st not in mapping: + continue + out.append({ + "machine": machine, "tier": tier, "mode": mode, "stage": mapping[st], + "p50": float(r["p50_ns"]) / 1e6, "p90": float(r["p90_ns"]) / 1e6, + "p99": float(r["p99_ns"]) / 1e6, "max": float(r["max_ns"]) / 1e6, + }) + return out + +def total_ns(d, fn, stage="total_per_ledger"): + for r in rc(os.path.join(d, fn)): + if r["stage"] == stage: + return int(r["total_ns"]) + return None 
+ +def build(run_root): + machines, queries, ingest, throughput, buildidx = [], [], [], [], [] + dirs = sorted(x for x in os.listdir(run_root) if os.path.isdir(os.path.join(run_root, x))) + # stable machine ordering: 2x,4x,8x,arm + order = {"c6id.2xlarge": 0, "c6id.4xlarge": 1, "c6id.8xlarge": 2, "im4gn.4xlarge": 3} + dirs.sort(key=lambda x: order.get(machine_label(x), 99)) + for dn in dirs: + d = os.path.join(run_root, dn) + m = machine_label(dn) + machines.append(m) + for tier in ("cold", "hot"): + queries += sweep_rows(d, f"{tier}-ledgers.csv", m, tier, "ledgers", "raw") + queries += sweep_rows(d, f"{tier}-txpage-20-roundtrip-sweep.csv", m, tier, "tx-page", "roundtrip") + queries += sweep_rows(d, f"{tier}-txpage-20-xdrviews-sweep.csv", m, tier, "tx-page", "xdr-views") + queries += sweep_rows(d, f"{tier}-txhash-roundtrip-sweep.csv", m, tier, "tx-hash", "roundtrip") + queries += sweep_rows(d, f"{tier}-txhash-xdrviews-sweep.csv", m, tier, "tx-hash", "xdr-views") + queries += sweep_rows(d, f"{tier}-events-query-sweep.csv", m, tier, "events", "roundtrip") + queries += sweep_rows(d, f"{tier}-events-query-xdrviews-sweep.csv", m, tier, "events", "xdr-views") + # hot ingest: view + parsed + for mode in ("view", "parsed"): + ingest += stage_rows(d, f"hot-ledgers-{mode}.csv", m, "hot", mode, {"write": "ledgers.write"}) + ingest += stage_rows(d, f"hot-txhash-{mode}.csv", m, "hot", mode, {"extract": "txhash.extract", "hot_write": "txhash.write"}) + ingest += stage_rows(d, f"hot-events-{mode}.csv", m, "hot", mode, {"extract": "events.extract", "hot_write": "events.write"}) + ingest += stage_rows(d, f"hot-driver-{mode}.csv", m, "hot", mode, { + "read_blocked": "driver.read_blocked", "fan_out_per_ledger": "driver.fan_out", + "lcm_decode": "driver.lcm_decode", "total_per_ledger": "driver.total_per_ledger"}) + tn = total_ns(d, f"hot-driver-{mode}.csv") + if tn: + throughput.append({"machine": m, "tier": "hot", "mode": mode, "ledgers_per_s": round(10000 / (tn / 1e9), 1)}) + # cold 
ingest: view only + ingest += stage_rows(d, "cold-ledgers-view.csv", m, "cold", "view", {"write": "ledgers.write"}) + ingest += stage_rows(d, "cold-txhash-view.csv", m, "cold", "view", {"extract": "txhash.extract"}) + ingest += stage_rows(d, "cold-events-view.csv", m, "cold", "view", { + "extract": "events.extract", "term_index": "events.term_index", "cold_append": "events.cold_append"}) + ingest += stage_rows(d, "cold-driver-view.csv", m, "cold", "view", { + "read_blocked": "driver.read_blocked", "fan_out_per_ledger": "driver.fan_out"}) + cw = next((r for r in rc(os.path.join(d, "cold-driver-view.csv")) if r["stage"] == "chunk_wall"), None) + if cw: + # chunk_wall total_ns sums the per-chunk walls (count = n chunks). + # Effective e2e wall ≈ that sum / chunk-workers. These synthetic runs + # used chunk-workers == chunk-count (n), so divide by n; override with + # COLD_CHUNK_WORKERS for a run that used a different worker count. + n = int(cw["n"]) + workers = int(os.environ.get("COLD_CHUNK_WORKERS", n)) or 1 + secs = (int(cw["total_ns"]) / 1e9) / workers + throughput.append({"machine": m, "tier": "cold", "mode": "view", "ledgers_per_s": round((n * 10000) / secs, 0)}) + bi = rc(os.path.join(d, "build-txhash-index.csv")) + if bi: + r = bi[0] + keys = int(r["total_keys"]); secs = (int(r["feed_ns"]) + int(r["finish_ns"])) / 1e9 + buildidx.append({"machine": m, "keys_per_s": round(keys / secs), "idx_mb": round(int(r["index_bytes"]) / 1e6)}) + return {"machines": machines, "queries": queries, "ingest": ingest, + "throughput": throughput, "build_index": buildidx} + +HTML = r""" + + +__TITLE__ + + +

__TITLE__

__SOURCE__ · interactive explorer · all data embedded (offline-capable). Toggle tier / decode-path / percentiles / machines / concurrency; click a column header to sort.
+
Queries
Ingest
Ingest totals
+
+
+
+
+ chart metric:
+
+

Line graph — one line per series (profile · tier · workload · path), x-axis = concurrency. Filter to one workload/path/tier to overlay profiles; click a legend entry to hide/show a line.

+

Bar chart — one bar per visible row.

+
+
+ + +
+ +""" + +def main(): + ap = argparse.ArgumentParser() + ap.add_argument("run_root") + ap.add_argument("out") + ap.add_argument("--title", default="stellar-rpc full-history bench explorer") + ap.add_argument("--source", default="") + a = ap.parse_args() + data = build(a.run_root) + html = (HTML.replace("__TITLE__", a.title) + .replace("__SOURCE__", a.source or a.run_root) + .replace("__DATA__", json.dumps(data, separators=(",", ":")))) + with open(a.out, "w") as f: + f.write(html) + print(f"wrote {a.out}: {len(data['queries'])} query rows, {len(data['ingest'])} ingest rows, " + f"{len(data['machines'])} machines, {len(html)} bytes") + +if __name__ == "__main__": + main() From 23bee2f951a9e9fbf0d0ffbfa81e38031e53d25d Mon Sep 17 00:00:00 2001 From: Simon Chow Date: Tue, 9 Jun 2026 15:17:50 +0000 Subject: [PATCH 26/27] bench(fullhistory): combine pubnet baseline into the synthetic explorer + report (#762) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Include the real pubnet (non-synthetic) chunk-5860 baseline alongside the three synthetic datasets, on the same c6id.8xlarge, so the comparison is interactive and in the tables — not just prose. - explorer: replace the synthetic-only HTML with 2026-06-09-synthetic-vs-pubnet-explorer.html — 4 datasets (pubnet + sac/token/ soroswap), 200 query rows. pubnet contributes its query sweeps (the headline comparison); the ingest tab stays synthetic-only (pubnet used 8 chunk-workers vs synthetic 1-2, so a single throughput divisor would misreport it — its ingest lives in results/2026-06-03-cross-machine.md). Dimension relabeled Profile -> Dataset. - report: add a pubnet column to the query-latency and throughput tables, and link the explorer. Pubnet query sweeps pulled from gs://.../benchmarks/2026-06-03/c6id.8xlarge-... -corrected (same harness/CSV layout as the synthetic run). 
Co-Authored-By: Claude Opus 4.8 (1M context) --- ...6-06-09-synthetic-apply-load-explorer.html | 271 ------------------ .../2026-06-09-synthetic-apply-load.md | 39 ++- ...26-06-09-synthetic-vs-pubnet-explorer.html | 271 ++++++++++++++++++ .../results/make_explorer.py | 18 +- 4 files changed, 304 insertions(+), 295 deletions(-) delete mode 100644 cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-09-synthetic-apply-load-explorer.html create mode 100644 cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-09-synthetic-vs-pubnet-explorer.html diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-09-synthetic-apply-load-explorer.html b/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-09-synthetic-apply-load-explorer.html deleted file mode 100644 index f26e0deb5..000000000 --- a/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-09-synthetic-apply-load-explorer.html +++ /dev/null @@ -1,271 +0,0 @@ - - - -stellar-rpc synthetic apply-load bench (sac/token/soroswap, 2026-06-09) - - -

stellar-rpc synthetic apply-load bench (sac/token/soroswap, 2026-06-09)

gs://rpc-full-history/synthetic-ledgers/2026-06-04-apply-load-20k/ · interactive explorer · all data embedded (offline-capable). Toggle tier / decode-path / percentiles / machines / concurrency; click a column header to sort.
-
Queries
Ingest
Ingest totals
-
-
-
-
- chart metric:
-
-

Line graph — one line per series (profile · tier · workload · path), x-axis = concurrency. Filter to one workload/path/tier to overlay profiles; click a legend entry to hide/show a line.

-

Bar chart — one bar per visible row.

-
-
- - -
- - \ No newline at end of file diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-09-synthetic-apply-load.md b/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-09-synthetic-apply-load.md index d897f21a1..b36c2aec9 100644 --- a/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-09-synthetic-apply-load.md +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-09-synthetic-apply-load.md @@ -20,6 +20,12 @@ number here is recomputed from those CSVs. > `cold-*`/`hot-*` query benches), no dependency bump. Swapping in > `loadtest.ApplyLoad` later is a drop-in producer change. +**Interactive explorer:** [`2026-06-09-synthetic-vs-pubnet-explorer.html`](./2026-06-09-synthetic-vs-pubnet-explorer.html) +— a self-contained (offline) HTML with all sweep data embedded for the three +synthetic datasets **plus the pubnet baseline**; toggle tier / decode-path / +percentiles / dataset / concurrency, sort, and overlay series. Also at +`gs://rpc-full-history/synthetic-ledgers/2026-06-04-apply-load-20k/explorer.html`. + ## Setup - **machine:** AWS c6id.8xlarge — 32 vCPU (Intel Ice Lake), 61 GB RAM, local NVMe instance store @@ -41,31 +47,34 @@ state grows ~8.5 MB/ledger); on the 61 GB box sac/token were generated at 10k ledgers (1 chunk) and soroswap — far lighter at 1,500 tx/ledger — at 20k (2 chunks). See `SYNTHETIC-LEDGERS.md` for the per-RAM sizing table. +**`pubnet`** = real pubnet chunk 5860 on the same c6id.8xlarge (corrected harness, +`results/2026-06-03-cross-machine.md`) — the non-synthetic baseline, for contrast. + ## Table 1 — Query latency, p50 / p99 @ c=1 (ms). 
`cold / hot` **xdr-views** path (the realistic server path): -| workload | sac | token | soroswap | -|---|---|---|---| -| tx-page | 26.2 / 21.2 | 25.6 / 21.3 | 14.1 / 11.5 | -| tx-hash | 16.8 / 14.3 | 17.6 / 14.0 | 9.1 / 7.1 | -| events | 229 / 14.0 | 235 / 13.6 | 210 / 32.5 | -| ledgers (n=20) | 101 / 91 | 92 / 84 | 53 / 51 | +| workload | pubnet (baseline) | sac | token | soroswap | +|---|---|---|---|---| +| tx-page | 3.0 / 1.5 | 26.2 / 21.2 | 25.6 / 21.3 | 14.1 / 11.5 | +| tx-hash | 2.2 / 1.2 | 16.8 / 14.3 | 17.6 / 14.0 | 9.1 / 7.1 | +| events | 14.4 / 4.4 | 229 / 14.0 | 235 / 13.6 | 210 / 32.5 | +| ledgers (n=20) | 15.1 / 13.3 | 101 / 91 | 92 / 84 | 53 / 51 | **roundtrip** path (production `UnmarshalBinary` + `ParseTransaction`): -| workload | sac | token | soroswap | -|---|---|---|---| -| tx-page | 129 / 121 | 135 / 128 | 89 / 84 | -| tx-hash | 117 / 120 | 132 / 114 | 82 / 81 | +| workload | pubnet (baseline) | sac | token | soroswap | +|---|---|---|---|---| +| tx-page | 13.2 / 11.1 | 129 / 121 | 135 / 128 | 89 / 84 | +| tx-hash | 11.9 / 10.6 | 117 / 120 | 132 / 114 | 82 / 81 | ## Table 2 — Peak query throughput, ops/s (best across c=1→16, xdr-views). 
`cold / hot` -| workload | sac | token | soroswap | -|---|---|---|---| -| tx-page | 411 / 491 | 412 / 498 | 732 / 888 | -| tx-hash | 575 / 734 | 610 / 789 | 992 / 1,277 | -| events | 120 / 740 | 112 / 1,140 | 38 / 286 | +| workload | pubnet (baseline) | sac | token | soroswap | +|---|---|---|---|---| +| tx-page | 3,456 / 4,830 | 411 / 491 | 412 / 498 | 732 / 888 | +| tx-hash | 4,170 / 7,253 | 575 / 734 | 610 / 789 | 992 / 1,277 | +| events | 512 / 1,843 | 120 / 740 | 112 / 1,140 | 38 / 286 | events note: `custom_token` emits **3-topic** events (others 4-topic), so token's corpus is built with `EVENTS_TOPIC_COUNT=3`; it still reaches the full K=15 diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-09-synthetic-vs-pubnet-explorer.html b/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-09-synthetic-vs-pubnet-explorer.html new file mode 100644 index 000000000..4dc2729f5 --- /dev/null +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-09-synthetic-vs-pubnet-explorer.html @@ -0,0 +1,271 @@ + + + +stellar-rpc full-history bench — synthetic (sac/token/soroswap) vs pubnet, c6id.8xlarge (2026-06-09) + + +

stellar-rpc full-history bench — synthetic (sac/token/soroswap) vs pubnet, c6id.8xlarge (2026-06-09)

synthetic: gs://rpc-full-history/synthetic-ledgers/2026-06-04-apply-load-20k/ · pubnet: gs://rpc-full-history/benchmarks/2026-06-03/ · interactive explorer · all data embedded (offline-capable). Toggle tier / decode-path / percentiles / machines / concurrency; click a column header to sort.
+
Queries
Ingest
Ingest totals
+
+
+
+
+ chart metric:
+
+

Line graph — one line per series (dataset · tier · workload · path), x-axis = concurrency. Filter to one workload/path/tier to overlay datasets; click a legend entry to hide/show a line.

+

Bar chart — one bar per visible row.

+
+
+ + +
+ + \ No newline at end of file diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/results/make_explorer.py b/cmd/stellar-rpc/scripts/bench-fullhistory/results/make_explorer.py index 5035548fc..f25588234 100644 --- a/cmd/stellar-rpc/scripts/bench-fullhistory/results/make_explorer.py +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/results/make_explorer.py @@ -170,7 +170,7 @@ def build(run_root):
chart metric:
-

Line graph — one line per series (profile · tier · workload · path), x-axis = concurrency. Filter to one workload/path/tier to overlay profiles; click a legend entry to hide/show a line.

+

Line graph — one line per series (dataset · tier · workload · path), x-axis = concurrency. Filter to one workload/path/tier to overlay datasets; click a legend entry to hide/show a line.

Bar chart — one bar per visible row.

@@ -180,7 +180,7 @@ def build(run_root): chart metric:
Per-ledger stage latencies (ms). Hot ingest ran --parallel in view (xdr-views) and parsed modes; cold ran view only. driver.lcm_decode exists only in parsed mode.
-

Line graph — one line per series (profile · tier/mode), x-axis = pipeline stage. Filter to one tier/mode to overlay profiles; click a legend entry to hide/show a line.

+

Line graph — one line per series (dataset · tier/mode), x-axis = pipeline stage. Filter to one tier/mode to overlay datasets; click a legend entry to hide/show a line.

Bar chart — one bar per visible row.

Throughput

@@ -320,23 +320,23 @@ def build(run_root): makeView({ key:'q', rows:DATA.queries, filters:'#qfilters', table:'#qtable', bars:'#qbars', chart:'#qchart', count:'#qcount', - facets:[{key:'machine',label:'Profile'},{key:'tier',label:'Tier',cls:v=>v},{key:'workload',label:'Workload'}, + facets:[{key:'machine',label:'Dataset'},{key:'tier',label:'Tier',cls:v=>v},{key:'workload',label:'Workload'}, {key:'path',label:'Decode path',cls:pathClass},{key:'c',label:'Concurrency'}], dims:['machine','tier','workload','path','c'], metrics:['p50','p99','p90','max','ops'], mlabel:{p50:'p50 (ms)',p90:'p90 (ms)',p99:'p99 (ms)',max:'max (ms)',ops:'ops/sec'}, - clabel:{machine:'Profile',tier:'Tier',workload:'Workload',path:'Path',c:'conc',p50:'p50 ms',p90:'p90 ms',p99:'p99 ms',max:'max ms',ops:'ops/s'}, + clabel:{machine:'Dataset',tier:'Tier',workload:'Workload',path:'Path',c:'conc',p50:'p50 ms',p90:'p90 ms',p99:'p99 ms',max:'max ms',ops:'ops/s'}, barLabel:r=>`${r.machine} · ${r.tier} · ${r.workload} · ${r.path} · c=${r.c}`, line:'#qline', legend:'#qlegend', xDim:'c', xLabel:'query-concurrency', xPrefix:'c=', seriesLabel:r=>`${r.machine} · ${r.tier} · ${r.workload} · ${r.path}`, }); makeView({ key:'i', rows:DATA.ingest, filters:'#ifilters', table:'#itable', bars:'#ibars', chart:'#ichart', count:'#icount', - facets:[{key:'machine',label:'Profile'},{key:'tier',label:'Tier',cls:v=>v},{key:'mode',label:'Mode'},{key:'stage',label:'Stage'}], + facets:[{key:'machine',label:'Dataset'},{key:'tier',label:'Tier',cls:v=>v},{key:'mode',label:'Mode'},{key:'stage',label:'Stage'}], dims:['machine','tier','mode','stage'], metrics:['p50','p99','p90','max'], mlabel:{p50:'p50 (ms)',p90:'p90 (ms)',p99:'p99 (ms)',max:'max (ms)'}, - clabel:{machine:'Profile',tier:'Tier',mode:'Mode',stage:'Stage',p50:'p50 ms',p90:'p90 ms',p99:'p99 ms',max:'max ms'}, + clabel:{machine:'Dataset',tier:'Tier',mode:'Mode',stage:'Stage',p50:'p50 ms',p90:'p90 ms',p99:'p99 ms',max:'max ms'}, barLabel:r=>`${r.machine} · 
${r.tier}/${r.mode} · ${r.stage}`, line:'#iline', legend:'#ilegend', xDim:'stage', xLabel:'stage', xRotate:true, xOrder:['ledgers.write','txhash.extract','txhash.write','events.extract','events.term_index','events.cold_append','events.write','driver.read_blocked','driver.fan_out','driver.lcm_decode','driver.total_per_ledger'], @@ -359,11 +359,11 @@ def build(run_root): })(); makeView({ key:'t', rows:ingestTotals, filters:'#totfilters', table:'#tottable', bars:'#totbars', chart:'#totchart', count:'#totcount', - facets:[{key:'machine',label:'Profile'},{key:'tier',label:'Tier',cls:v=>v},{key:'mode',label:'Mode'}], + facets:[{key:'machine',label:'Dataset'},{key:'tier',label:'Tier',cls:v=>v},{key:'mode',label:'Mode'}], dims:['machine','tier','mode'], metrics:['p50','p99','p90','max','lps'], mlabel:{p50:'total p50 (ms/ledger)',p90:'total p90 (ms/ledger)',p99:'total p99 (ms/ledger)',max:'total max (ms/ledger)',lps:'ledgers/sec'}, - clabel:{machine:'Profile',tier:'Tier',mode:'Mode',p50:'p50 ms',p90:'p90 ms',p99:'p99 ms',max:'max ms',lps:'ledgers/s'}, + clabel:{machine:'Dataset',tier:'Tier',mode:'Mode',p50:'p50 ms',p90:'p90 ms',p99:'p99 ms',max:'max ms',lps:'ledgers/s'}, barLabel:r=>`${r.machine} · ${r.tier}/${r.mode}`, line:'#totline', legend:'#totlegend', xDim:'machine', xLabel:'machine', xRotate:true, xOrder:DATA.machines, seriesLabel:r=>`${r.tier}/${r.mode}`, @@ -372,7 +372,7 @@ def build(run_root): (function(){ const t=DATA.throughput.slice().sort((a,b)=>a.machine.localeCompare(b.machine)||a.tier.localeCompare(b.tier)||a.mode.localeCompare(b.mode)); const bi=Object.fromEntries(DATA.build_index.map(r=>[r.machine,r])); - let h='ProfileTierModeledgers/sbuild-txhash keys/sidx MB'; + let h='DatasetTierModeledgers/sbuild-txhash keys/sidx MB'; t.forEach(r=>{h+=`${r.machine}${r.tier}${r.mode}${fmt(r.ledgers_per_s)}${bi[r.machine]?fmt(bi[r.machine].keys_per_s):''}${bi[r.machine]?bi[r.machine].idx_mb:''}`;}); $('#ttable').innerHTML=h+''; })(); From 
389933723c717976dfa124776e2500e5668bb029 Mon Sep 17 00:00:00 2001 From: Simon Chow Date: Wed, 10 Jun 2026 20:34:55 +0000 Subject: [PATCH 27/27] bench(fullhistory): dump RocksDB effective config + explain the ingest p99 tail (#762) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit results/rocksdb-config.md — per-CF key knobs (events/ledgers/txhash) extracted from the OPTIONS files RocksDB wrote, plus the full verbatim events-CF OPTIONS. Reveals the p99 ingest-tail cause: events & ledgers CFs run on RocksDB defaults (auto-compaction on, max_background_jobs=2, L0 slowdown@20/stop@36), while txhash is tuned write-once (disable_auto_compactions, L0 triggers 999, 8 bg jobs). The events CF's default L0 throttling under dense writes is what produces the ~8x p99/p50 on events hot_write. Linked from the synthetic report. Co-Authored-By: Claude Opus 4.8 (1M context) --- .../2026-06-09-synthetic-apply-load.md | 2 + .../results/rocksdb-config.md | 637 ++++++++++++++++++ 2 files changed, 639 insertions(+) create mode 100644 cmd/stellar-rpc/scripts/bench-fullhistory/results/rocksdb-config.md diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-09-synthetic-apply-load.md b/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-09-synthetic-apply-load.md index b36c2aec9..c5dd5b5c9 100644 --- a/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-09-synthetic-apply-load.md +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-09-synthetic-apply-load.md @@ -89,6 +89,8 @@ K=15) as more filters select fewer events. | token | 4m22s | 38 | 18 / 33 ms | 707 k | 291 k | 53.8 M keys @ 11.8 M/s | | soroswap | 3m06s | 108 | 14 / 24 ms | 371 k | 506 k | 29.9 M keys @ 17.1 M/s | +RocksDB effective config for the hot stores: [`rocksdb-config.md`](./rocksdb-config.md). 
Note that the **events** and **ledgers** CFs run on RocksDB defaults (auto-compaction on, `max_background_jobs=2`, L0 slowdown@20/stop@36), while **txhash** is tuned write-once (compaction off). Default compaction throttling on the events CF is what produces the events hot-ingest p99 tail (~8× p50). + cold-ingest end-to-end is **events-stage-bound** (term-index + cold append); per-ledger cost scales with events/ledger. Per-ledger cold-ingest stays **under ~55 ms through p99** even on 6k-tx ledgers (rare max-tail spikes to ~0.4–1 s on diff --git a/cmd/stellar-rpc/scripts/bench-fullhistory/results/rocksdb-config.md b/cmd/stellar-rpc/scripts/bench-fullhistory/results/rocksdb-config.md new file mode 100644 index 000000000..02c5cfde2 --- /dev/null +++ b/cmd/stellar-rpc/scripts/bench-fullhistory/results/rocksdb-config.md @@ -0,0 +1,637 @@ +# RocksDB config (full-history hot stores) + +Effective config from the `OPTIONS` files RocksDB writes at open (fully resolved). +Set in `internal/fullhistory/pkg/rocksdb/{rocksdb.go,tuning.go}`; grocksdb v1.10.7 → RocksDB 10.9.1. +Store: sac hot store, `bench-results/all3-20260608T181106Z` (c6id.8xlarge).
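
Since the numbers in this file come from RocksDB `OPTIONS` files, a short sketch of how such a file could be reduced to a handful of knobs may help when repeating the exercise on another store. This is an assumption-laden illustration, not the tool used to build this report: the function names, the `KNOBS` selection, and the embedded sample are hypothetical, and the sample values are copied from the events-CF figures quoted in this report.

```python
# Hypothetical helper: reduce OPTIONS-style ini text to a few knobs per section.
KNOBS = (
    "max_background_jobs",
    "write_buffer_size",
    "level0_slowdown_writes_trigger",
    "level0_stop_writes_trigger",
    "disable_auto_compactions",
)


def parse_options(text):
    """Return {section_name: {key: value}}; values are kept as raw strings."""
    sections, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
        elif current is not None and "=" in line:
            # Split on the first '=' only; values such as compression_opts
            # contain further '=' characters of their own.
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
    return sections


def key_knobs(sections, wanted=KNOBS):
    """Keep only the wanted keys, dropping sections that have none of them."""
    out = {}
    for name, opts in sections.items():
        picked = {k: opts[k] for k in wanted if k in opts}
        if picked:
            out[name] = picked
    return out


# Small excerpt in the OPTIONS layout, using values quoted in this report.
SAMPLE = """\
# This is a RocksDB option file.
[Version]
  rocksdb_version=10.9.1

[DBOptions]
  max_background_jobs=2
  max_open_files=-1

[CFOptions "default"]
  write_buffer_size=67108864
  level0_slowdown_writes_trigger=20
  level0_stop_writes_trigger=36
  disable_auto_compactions=false
"""

if __name__ == "__main__":
    for section, knobs in key_knobs(parse_options(SAMPLE)).items():
        print(section, knobs)
```

A hand-rolled parser is used here rather than a generic ini library so that quoted section names and `=`-bearing values are handled in an obvious way. Running the same function over the `OPTIONS` file of each store (events, ledgers, txhash) would give one column of a per-CF comparison.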
+ +## Key knobs per column family + +| knob | events | ledgers | txhash | +|---|---|---|---| +| write_buffer_size | 67108864 | 67108864 | 67108864 | +| max_write_buffer_number | 2 | 2 | 2 | +| min_write_buffer_number_to_merge | 1 | 1 | 1 | +| level0_file_num_compaction_trigger | 4 | 4 | 999 | +| level0_slowdown_writes_trigger | 20 | 20 | 999 | +| level0_stop_writes_trigger | 36 | 36 | 999 | +| target_file_size_base | 67108864 | 67108864 | 67108864 | +| max_bytes_for_level_base | 268435456 | 268435456 | 268435456 | +| max_bytes_for_level_multiplier | 10.000000 | 10.000000 | 10.000000 | +| num_levels | 7 | 7 | 7 | +| disable_auto_compactions | false | false | true | +| compression | kNoCompression | kNoCompression | kNoCompression | +| max_background_jobs (DB) | 2 | 2 | 8 | +| max_open_files (DB) | -1 | -1 | 10000 | + +_bytes: 67108864 = 64 MiB, 268435456 = 256 MiB._ + +## Why this explains the p99 ingest tail + +- **`max_background_jobs=2`**: only 2 threads for all flushes + compactions per store. At ~6k events/ledger the events CF fills its 64 MiB memtable fast; 2 background threads can't always keep up with flush + L0→L1 compaction. +- **`level0_slowdown_writes_trigger=20`, `level0_stop_writes_trigger=36`**: at 20 L0 files writes are throttled, at 36 they stall. Compaction debt on the events CF periodically hits these thresholds, producing the p99/max spikes while p50 stays low. +- **64 MiB memtable × `max_write_buffer_number=2`**: once both buffers are full, writers are back-pressured until a flush drains one. +- Mostly RocksDB defaults: the wrapper leaves a knob at default unless a non-zero `Tuning` value is passed; **not** tuned for the synthetic dense-write worst case. + +## Full OPTIONS: events CF (verbatim, the tail driver) +```ini +# This is a RocksDB option file.
+# +# For detailed file format spec, please refer to the example file +# in examples/rocksdb_option_file_example.ini +# + +[Version] + rocksdb_version=10.9.1 + options_file_version=1.1 + +[DBOptions] + max_manifest_space_amp_pct=500 + manifest_preallocation_size=4194304 + max_manifest_file_size=1073741824 + compaction_readahead_size=2097152 + strict_bytes_per_sync=false + bytes_per_sync=0 + max_background_jobs=2 + avoid_flush_during_shutdown=false + max_background_flushes=-1 + delayed_write_rate=16777216 + max_open_files=-1 + max_subcompactions=1 + writable_file_max_buffer_size=1048576 + wal_bytes_per_sync=0 + max_background_compactions=-1 + max_total_wal_size=0 + delete_obsolete_files_period_micros=21600000000 + stats_dump_period_sec=600 + stats_history_buffer_size=1048576 + stats_persist_period_sec=600 + follower_refresh_catchup_period_ms=10000 + enforce_single_del_contracts=true + lowest_used_cache_tier=kNonVolatileBlockTier + bgerror_resume_retry_interval=1000000 + metadata_write_temperature=kUnknown + best_efforts_recovery=false + log_readahead_size=0 + write_identity_file=true + write_dbid_to_manifest=true + prefix_seek_opt_in_only=false + wal_compression=kNoCompression + manual_wal_flush=false + db_host_id=__hostname__ + two_write_queues=false + skip_checking_sst_file_sizes_on_db_open=false + flush_verify_memtable_count=true + atomic_flush=false + verify_sst_unique_id_in_manifest=true + skip_stats_update_on_db_open=false + track_and_verify_wals=false + track_and_verify_wals_in_manifest=false + compaction_verify_record_count=true + paranoid_checks=true + create_if_missing=true + max_write_batch_group_size_bytes=1048576 + follower_catchup_retry_count=10 + avoid_flush_during_recovery=false + file_checksum_gen_factory=nullptr + enable_thread_tracking=false + allow_fallocate=true + allow_data_in_errors=false + error_if_exists=false + use_direct_io_for_flush_and_compaction=false + background_close_inactive_wals=false + create_missing_column_families=true + 
WAL_size_limit_MB=0 + use_direct_reads=false + persist_stats_to_disk=false + allow_2pc=false + max_log_file_size=0 + is_fd_close_on_exec=true + avoid_unnecessary_blocking_io=false + max_file_opening_threads=16 + wal_filter=nullptr + wal_write_temperature=kUnknown + follower_catchup_retry_wait_ms=100 + allow_mmap_reads=false + allow_mmap_writes=false + use_adaptive_mutex=false + use_fsync=false + table_cache_numshardbits=6 + dump_malloc_stats=false + db_write_buffer_size=0 + allow_ingest_behind=false + keep_log_file_num=1000 + max_bgerror_resume_count=2147483647 + allow_concurrent_memtable_write=true + recycle_log_file_num=0 + log_file_time_to_roll=0 + WAL_ttl_seconds=0 + enable_pipelined_write=false + write_thread_slow_yield_usec=3 + unordered_write=false + wal_recovery_mode=kPointInTimeRecovery + enable_write_thread_adaptive_yield=true + write_thread_max_yield_usec=100 + advise_random_on_open=true + info_log_level=INFO_LEVEL + + +[CFOptions "default"] + memtable_max_range_deletions=0 + compression_manager=nullptr + compression_opts={checksum=false;max_dict_buffer_bytes=0;enabled=false;max_dict_bytes=0;max_compressed_bytes_per_kb=896;parallel_threads=1;zstd_max_train_bytes=0;level=32767;use_zstd_dict_trainer=true;strategy=0;window_bits=-14;} + paranoid_memory_checks=false + memtable_avg_op_scan_flush_trigger=0 + block_protection_bytes_per_key=0 + uncache_aggressiveness=0 + bottommost_file_compaction_delay=0 + memtable_protection_bytes_per_key=0 + bottommost_compression=kDisableCompressionOption + sample_for_compression=0 + prepopulate_blob_cache=kDisable + blob_file_starting_level=0 + blob_compaction_readahead_size=0 + blob_garbage_collection_force_threshold=1.000000 + blob_garbage_collection_age_cutoff=0.250000 + table_factory=BlockBasedTable + max_successive_merges=0 + max_write_buffer_number=2 + prefix_extractor=nullptr + memtable_huge_page_size=0 + write_buffer_size=67108864 + strict_max_successive_merges=false + arena_block_size=1048576 + 
memtable_op_scan_flush_trigger=0 + level0_file_num_compaction_trigger=4 + report_bg_io_stats=false + inplace_update_num_locks=10000 + memtable_prefix_bloom_size_ratio=0.000000 + level0_stop_writes_trigger=36 + blob_compression_type=kNoCompression + level0_slowdown_writes_trigger=20 + hard_pending_compaction_bytes_limit=274877906944 + target_file_size_multiplier=1 + paranoid_file_checks=false + min_blob_size=0 + max_compaction_bytes=1677721600 + disable_auto_compactions=false + experimental_mempurge_threshold=0.000000 + verify_output_flags=0 + last_level_temperature=kUnknown + preserve_internal_time_seconds=0 + memtable_veirfy_per_key_checksum_on_seek=false + soft_pending_compaction_bytes_limit=68719476736 + target_file_size_base=67108864 + enable_blob_files=false + bottommost_compression_opts={checksum=false;max_dict_buffer_bytes=0;enabled=false;max_dict_bytes=0;max_compressed_bytes_per_kb=896;parallel_threads=1;zstd_max_train_bytes=0;level=32767;use_zstd_dict_trainer=true;strategy=0;window_bits=-14;} + memtable_whole_key_filtering=false + target_file_size_is_upper_bound=false + max_bytes_for_level_base=268435456 + compaction_options_fifo={trivial_copy_buffer_size=4096;allow_trivial_copy_when_change_temperature=false;file_temperature_age_thresholds=;allow_compaction=false;age_for_warm=0;max_table_files_size=1073741824;} + max_bytes_for_level_multiplier=10.000000 + max_bytes_for_level_multiplier_additional=1:1:1:1:1:1:1 + max_sequential_skip_in_iterations=8 + compression=kNoCompression + default_write_temperature=kUnknown + compaction_options_universal={reduce_file_locking=false;incremental=false;compression_size_percent=-1;allow_trivial_move=false;max_size_amplification_percent=200;max_merge_width=4294967295;stop_style=kCompactionStopStyleTotalSize;min_merge_width=2;max_read_amp=-1;size_ratio=1;} + ttl=2592000 + periodic_compaction_seconds=0 + preclude_last_level_data_seconds=0 + blob_file_size=268435456 + enable_blob_garbage_collection=false + 
cf_allow_ingest_behind=false + min_write_buffer_number_to_merge=1 + sst_partitioner_factory=nullptr + num_levels=7 + disallow_memtable_writes=false + force_consistency_checks=true + memtable_insert_with_hint_prefix_extractor=nullptr + memtable_factory=SkipListFactory + optimize_filters_for_hits=false + level_compaction_dynamic_level_bytes=true + compaction_style=kCompactionStyleLevel + compaction_filter=nullptr + default_temperature=kUnknown + inplace_update_support=false + merge_operator=nullptr + bloom_locality=0 + comparator=leveldb.BytewiseComparator + compaction_filter_factory=nullptr + max_write_buffer_size_to_maintain=0 + compaction_pri=kMinOverlappingRatio + persist_user_defined_timestamps=true + +[TableOptions/BlockBasedTable "default"] + fail_if_no_udi_on_open=false + initial_auto_readahead_size=8192 + max_auto_readahead_size=262144 + metadata_cache_options={unpartitioned_pinning=kFallback;partition_pinning=kFallback;top_level_index_pinning=kFallback;} + block_align=false + read_amp_bytes_per_bit=0 + verify_compression=false + detect_filter_construct_corruption=false + whole_key_filtering=true + user_defined_index_factory=nullptr + filter_policy=nullptr + super_block_alignment_space_overhead_ratio=128 + use_delta_encoding=true + optimize_filters_for_memory=true + partition_filters=false + prepopulate_block_cache=kDisable + pin_top_level_index_and_filter=true + index_block_restart_interval=1 + block_size_deviation=10 + num_file_reads_for_auto_readahead=2 + format_version=6 + decouple_partitioned_filters=true + checksum=kXXH3 + block_size=4096 + data_block_hash_table_util_ratio=0.750000 + index_shortening=kShortenSeparators + block_restart_interval=16 + data_block_index_type=kDataBlockBinarySearch + index_type=kBinarySearch + super_block_alignment_size=0 + metadata_block_size=4096 + pin_l0_filter_and_index_blocks_in_cache=false + no_block_cache=false + cache_index_and_filter_blocks_with_high_priority=true + cache_index_and_filter_blocks=false + 
enable_index_compression=true + flush_block_policy_factory=FlushBlockBySizePolicyFactory + + +[CFOptions "events_data"] + memtable_max_range_deletions=0 + compression_manager=nullptr + compression_opts={checksum=false;max_dict_buffer_bytes=0;enabled=false;max_dict_bytes=0;max_compressed_bytes_per_kb=896;parallel_threads=1;zstd_max_train_bytes=0;level=32767;use_zstd_dict_trainer=true;strategy=0;window_bits=-14;} + paranoid_memory_checks=false + memtable_avg_op_scan_flush_trigger=0 + block_protection_bytes_per_key=0 + uncache_aggressiveness=0 + bottommost_file_compaction_delay=0 + memtable_protection_bytes_per_key=0 + bottommost_compression=kDisableCompressionOption + sample_for_compression=0 + prepopulate_blob_cache=kDisable + blob_file_starting_level=0 + blob_compaction_readahead_size=0 + blob_garbage_collection_force_threshold=1.000000 + blob_garbage_collection_age_cutoff=0.250000 + table_factory=BlockBasedTable + max_successive_merges=0 + max_write_buffer_number=2 + prefix_extractor=nullptr + memtable_huge_page_size=0 + write_buffer_size=67108864 + strict_max_successive_merges=false + arena_block_size=1048576 + memtable_op_scan_flush_trigger=0 + level0_file_num_compaction_trigger=4 + report_bg_io_stats=false + inplace_update_num_locks=10000 + memtable_prefix_bloom_size_ratio=0.000000 + level0_stop_writes_trigger=36 + blob_compression_type=kNoCompression + level0_slowdown_writes_trigger=20 + hard_pending_compaction_bytes_limit=274877906944 + target_file_size_multiplier=1 + paranoid_file_checks=false + min_blob_size=0 + max_compaction_bytes=1677721600 + disable_auto_compactions=false + experimental_mempurge_threshold=0.000000 + verify_output_flags=0 + last_level_temperature=kUnknown + preserve_internal_time_seconds=0 + memtable_veirfy_per_key_checksum_on_seek=false + soft_pending_compaction_bytes_limit=68719476736 + target_file_size_base=67108864 + enable_blob_files=false + 
bottommost_compression_opts={checksum=false;max_dict_buffer_bytes=0;enabled=false;max_dict_bytes=0;max_compressed_bytes_per_kb=896;parallel_threads=1;zstd_max_train_bytes=0;level=32767;use_zstd_dict_trainer=true;strategy=0;window_bits=-14;} + memtable_whole_key_filtering=false + target_file_size_is_upper_bound=false + max_bytes_for_level_base=268435456 + compaction_options_fifo={trivial_copy_buffer_size=4096;allow_trivial_copy_when_change_temperature=false;file_temperature_age_thresholds=;allow_compaction=false;age_for_warm=0;max_table_files_size=1073741824;} + max_bytes_for_level_multiplier=10.000000 + max_bytes_for_level_multiplier_additional=1:1:1:1:1:1:1 + max_sequential_skip_in_iterations=8 + compression=kZSTD + default_write_temperature=kUnknown + compaction_options_universal={reduce_file_locking=false;incremental=false;compression_size_percent=-1;allow_trivial_move=false;max_size_amplification_percent=200;max_merge_width=4294967295;stop_style=kCompactionStopStyleTotalSize;min_merge_width=2;max_read_amp=-1;size_ratio=1;} + ttl=2592000 + periodic_compaction_seconds=0 + preclude_last_level_data_seconds=0 + blob_file_size=268435456 + enable_blob_garbage_collection=false + cf_allow_ingest_behind=false + min_write_buffer_number_to_merge=1 + sst_partitioner_factory=nullptr + num_levels=7 + disallow_memtable_writes=false + force_consistency_checks=true + memtable_insert_with_hint_prefix_extractor=nullptr + memtable_factory=SkipListFactory + optimize_filters_for_hits=false + level_compaction_dynamic_level_bytes=true + compaction_style=kCompactionStyleLevel + compaction_filter=nullptr + default_temperature=kUnknown + inplace_update_support=false + merge_operator=nullptr + bloom_locality=0 + comparator=leveldb.BytewiseComparator + compaction_filter_factory=nullptr + max_write_buffer_size_to_maintain=0 + compaction_pri=kMinOverlappingRatio + persist_user_defined_timestamps=true + +[TableOptions/BlockBasedTable "events_data"] + fail_if_no_udi_on_open=false + 
initial_auto_readahead_size=8192 + max_auto_readahead_size=262144 + metadata_cache_options={unpartitioned_pinning=kFallback;partition_pinning=kFallback;top_level_index_pinning=kFallback;} + block_align=false + read_amp_bytes_per_bit=0 + verify_compression=false + detect_filter_construct_corruption=false + whole_key_filtering=true + user_defined_index_factory=nullptr + filter_policy=nullptr + super_block_alignment_space_overhead_ratio=128 + use_delta_encoding=true + optimize_filters_for_memory=true + partition_filters=false + prepopulate_block_cache=kDisable + pin_top_level_index_and_filter=true + index_block_restart_interval=1 + block_size_deviation=10 + num_file_reads_for_auto_readahead=2 + format_version=6 + decouple_partitioned_filters=true + checksum=kXXH3 + block_size=32768 + data_block_hash_table_util_ratio=0.750000 + index_shortening=kShortenSeparators + block_restart_interval=16 + data_block_index_type=kDataBlockBinarySearch + index_type=kBinarySearch + super_block_alignment_size=0 + metadata_block_size=4096 + pin_l0_filter_and_index_blocks_in_cache=false + no_block_cache=false + cache_index_and_filter_blocks_with_high_priority=true + cache_index_and_filter_blocks=false + enable_index_compression=true + flush_block_policy_factory=FlushBlockBySizePolicyFactory + + +[CFOptions "events_index"] + memtable_max_range_deletions=0 + compression_manager=nullptr + compression_opts={checksum=false;max_dict_buffer_bytes=0;enabled=false;max_dict_bytes=0;max_compressed_bytes_per_kb=896;parallel_threads=1;zstd_max_train_bytes=0;level=32767;use_zstd_dict_trainer=true;strategy=0;window_bits=-14;} + paranoid_memory_checks=false + memtable_avg_op_scan_flush_trigger=0 + block_protection_bytes_per_key=0 + uncache_aggressiveness=0 + bottommost_file_compaction_delay=0 + memtable_protection_bytes_per_key=0 + bottommost_compression=kDisableCompressionOption + sample_for_compression=0 + prepopulate_blob_cache=kDisable + blob_file_starting_level=0 + blob_compaction_readahead_size=0 + 
blob_garbage_collection_force_threshold=1.000000 + blob_garbage_collection_age_cutoff=0.250000 + table_factory=BlockBasedTable + max_successive_merges=0 + max_write_buffer_number=2 + prefix_extractor=nullptr + memtable_huge_page_size=0 + write_buffer_size=67108864 + strict_max_successive_merges=false + arena_block_size=1048576 + memtable_op_scan_flush_trigger=0 + level0_file_num_compaction_trigger=4 + report_bg_io_stats=false + inplace_update_num_locks=10000 + memtable_prefix_bloom_size_ratio=0.000000 + level0_stop_writes_trigger=36 + blob_compression_type=kNoCompression + level0_slowdown_writes_trigger=20 + hard_pending_compaction_bytes_limit=274877906944 + target_file_size_multiplier=1 + paranoid_file_checks=false + min_blob_size=0 + max_compaction_bytes=1677721600 + disable_auto_compactions=false + experimental_mempurge_threshold=0.000000 + verify_output_flags=0 + last_level_temperature=kUnknown + preserve_internal_time_seconds=0 + memtable_veirfy_per_key_checksum_on_seek=false + soft_pending_compaction_bytes_limit=68719476736 + target_file_size_base=67108864 + enable_blob_files=false + bottommost_compression_opts={checksum=false;max_dict_buffer_bytes=0;enabled=false;max_dict_bytes=0;max_compressed_bytes_per_kb=896;parallel_threads=1;zstd_max_train_bytes=0;level=32767;use_zstd_dict_trainer=true;strategy=0;window_bits=-14;} + memtable_whole_key_filtering=false + target_file_size_is_upper_bound=false + max_bytes_for_level_base=268435456 + compaction_options_fifo={trivial_copy_buffer_size=4096;allow_trivial_copy_when_change_temperature=false;file_temperature_age_thresholds=;allow_compaction=false;age_for_warm=0;max_table_files_size=1073741824;} + max_bytes_for_level_multiplier=10.000000 + max_bytes_for_level_multiplier_additional=1:1:1:1:1:1:1 + max_sequential_skip_in_iterations=8 + compression=kNoCompression + default_write_temperature=kUnknown + 
compaction_options_universal={reduce_file_locking=false;incremental=false;compression_size_percent=-1;allow_trivial_move=false;max_size_amplification_percent=200;max_merge_width=4294967295;stop_style=kCompactionStopStyleTotalSize;min_merge_width=2;max_read_amp=-1;size_ratio=1;} + ttl=2592000 + periodic_compaction_seconds=0 + preclude_last_level_data_seconds=0 + blob_file_size=268435456 + enable_blob_garbage_collection=false + cf_allow_ingest_behind=false + min_write_buffer_number_to_merge=1 + sst_partitioner_factory=nullptr + num_levels=7 + disallow_memtable_writes=false + force_consistency_checks=true + memtable_insert_with_hint_prefix_extractor=nullptr + memtable_factory=SkipListFactory + optimize_filters_for_hits=false + level_compaction_dynamic_level_bytes=true + compaction_style=kCompactionStyleLevel + compaction_filter=nullptr + default_temperature=kUnknown + inplace_update_support=false + merge_operator=nullptr + bloom_locality=0 + comparator=leveldb.BytewiseComparator + compaction_filter_factory=nullptr + max_write_buffer_size_to_maintain=0 + compaction_pri=kMinOverlappingRatio + persist_user_defined_timestamps=true + +[TableOptions/BlockBasedTable "events_index"] + fail_if_no_udi_on_open=false + initial_auto_readahead_size=8192 + max_auto_readahead_size=262144 + metadata_cache_options={unpartitioned_pinning=kFallback;partition_pinning=kFallback;top_level_index_pinning=kFallback;} + block_align=false + read_amp_bytes_per_bit=0 + verify_compression=false + detect_filter_construct_corruption=false + whole_key_filtering=true + user_defined_index_factory=nullptr + filter_policy=nullptr + super_block_alignment_space_overhead_ratio=128 + use_delta_encoding=true + optimize_filters_for_memory=true + partition_filters=false + prepopulate_block_cache=kDisable + pin_top_level_index_and_filter=true + index_block_restart_interval=1 + block_size_deviation=10 + num_file_reads_for_auto_readahead=2 + format_version=6 + decouple_partitioned_filters=true + checksum=kXXH3 + 
block_size=4096 + data_block_hash_table_util_ratio=0.750000 + index_shortening=kShortenSeparators + block_restart_interval=16 + data_block_index_type=kDataBlockBinarySearch + index_type=kBinarySearch + super_block_alignment_size=0 + metadata_block_size=4096 + pin_l0_filter_and_index_blocks_in_cache=false + no_block_cache=false + cache_index_and_filter_blocks_with_high_priority=true + cache_index_and_filter_blocks=false + enable_index_compression=true + flush_block_policy_factory=FlushBlockBySizePolicyFactory + + +[CFOptions "events_offsets"] + memtable_max_range_deletions=0 + compression_manager=nullptr + compression_opts={checksum=false;max_dict_buffer_bytes=0;enabled=false;max_dict_bytes=0;max_compressed_bytes_per_kb=896;parallel_threads=1;zstd_max_train_bytes=0;level=32767;use_zstd_dict_trainer=true;strategy=0;window_bits=-14;} + paranoid_memory_checks=false + memtable_avg_op_scan_flush_trigger=0 + block_protection_bytes_per_key=0 + uncache_aggressiveness=0 + bottommost_file_compaction_delay=0 + memtable_protection_bytes_per_key=0 + bottommost_compression=kDisableCompressionOption + sample_for_compression=0 + prepopulate_blob_cache=kDisable + blob_file_starting_level=0 + blob_compaction_readahead_size=0 + blob_garbage_collection_force_threshold=1.000000 + blob_garbage_collection_age_cutoff=0.250000 + table_factory=BlockBasedTable + max_successive_merges=0 + max_write_buffer_number=2 + prefix_extractor=nullptr + memtable_huge_page_size=0 + write_buffer_size=67108864 + strict_max_successive_merges=false + arena_block_size=1048576 + memtable_op_scan_flush_trigger=0 + level0_file_num_compaction_trigger=4 + report_bg_io_stats=false + inplace_update_num_locks=10000 + memtable_prefix_bloom_size_ratio=0.000000 + level0_stop_writes_trigger=36 + blob_compression_type=kNoCompression + level0_slowdown_writes_trigger=20 + hard_pending_compaction_bytes_limit=274877906944 + target_file_size_multiplier=1 + paranoid_file_checks=false + min_blob_size=0 + 
max_compaction_bytes=1677721600 + disable_auto_compactions=false + experimental_mempurge_threshold=0.000000 + verify_output_flags=0 + last_level_temperature=kUnknown + preserve_internal_time_seconds=0 + memtable_veirfy_per_key_checksum_on_seek=false + soft_pending_compaction_bytes_limit=68719476736 + target_file_size_base=67108864 + enable_blob_files=false + bottommost_compression_opts={checksum=false;max_dict_buffer_bytes=0;enabled=false;max_dict_bytes=0;max_compressed_bytes_per_kb=896;parallel_threads=1;zstd_max_train_bytes=0;level=32767;use_zstd_dict_trainer=true;strategy=0;window_bits=-14;} + memtable_whole_key_filtering=false + target_file_size_is_upper_bound=false + max_bytes_for_level_base=268435456 + compaction_options_fifo={trivial_copy_buffer_size=4096;allow_trivial_copy_when_change_temperature=false;file_temperature_age_thresholds=;allow_compaction=false;age_for_warm=0;max_table_files_size=1073741824;} + max_bytes_for_level_multiplier=10.000000 + max_bytes_for_level_multiplier_additional=1:1:1:1:1:1:1 + max_sequential_skip_in_iterations=8 + compression=kNoCompression + default_write_temperature=kUnknown + compaction_options_universal={reduce_file_locking=false;incremental=false;compression_size_percent=-1;allow_trivial_move=false;max_size_amplification_percent=200;max_merge_width=4294967295;stop_style=kCompactionStopStyleTotalSize;min_merge_width=2;max_read_amp=-1;size_ratio=1;} + ttl=2592000 + periodic_compaction_seconds=0 + preclude_last_level_data_seconds=0 + blob_file_size=268435456 + enable_blob_garbage_collection=false + cf_allow_ingest_behind=false + min_write_buffer_number_to_merge=1 + sst_partitioner_factory=nullptr + num_levels=7 + disallow_memtable_writes=false + force_consistency_checks=true + memtable_insert_with_hint_prefix_extractor=nullptr + memtable_factory=SkipListFactory + optimize_filters_for_hits=false + level_compaction_dynamic_level_bytes=true + compaction_style=kCompactionStyleLevel + compaction_filter=nullptr + 
default_temperature=kUnknown + inplace_update_support=false + merge_operator=nullptr + bloom_locality=0 + comparator=leveldb.BytewiseComparator + compaction_filter_factory=nullptr + max_write_buffer_size_to_maintain=0 + compaction_pri=kMinOverlappingRatio + persist_user_defined_timestamps=true + +[TableOptions/BlockBasedTable "events_offsets"] + fail_if_no_udi_on_open=false + initial_auto_readahead_size=8192 + max_auto_readahead_size=262144 + metadata_cache_options={unpartitioned_pinning=kFallback;partition_pinning=kFallback;top_level_index_pinning=kFallback;} + block_align=false + read_amp_bytes_per_bit=0 + verify_compression=false + detect_filter_construct_corruption=false + whole_key_filtering=true + user_defined_index_factory=nullptr + filter_policy=nullptr + super_block_alignment_space_overhead_ratio=128 + use_delta_encoding=true + optimize_filters_for_memory=true + partition_filters=false + prepopulate_block_cache=kDisable + pin_top_level_index_and_filter=true + index_block_restart_interval=1 + block_size_deviation=10 + num_file_reads_for_auto_readahead=2 + format_version=6 + decouple_partitioned_filters=true + checksum=kXXH3 + block_size=4096 + data_block_hash_table_util_ratio=0.750000 + index_shortening=kShortenSeparators + block_restart_interval=16 + data_block_index_type=kDataBlockBinarySearch + index_type=kBinarySearch + super_block_alignment_size=0 + metadata_block_size=4096 + pin_l0_filter_and_index_blocks_in_cache=false + no_block_cache=false + cache_index_and_filter_blocks_with_high_priority=true + cache_index_and_filter_blocks=false + enable_index_compression=true + flush_block_policy_factory=FlushBlockBySizePolicyFactory + +```
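A knob summary like the one in "Key knobs per column family" can be cross-checked against an OPTIONS dump such as the one above. The following is a minimal sketch, not the tooling used for this report: `parseOptions`, `memtableBudgetMiB`, and the `sample` excerpt are hypothetical names introduced here for illustration, and the parser assumes the simple `[Section]` / `key=value` layout seen in the dump. It reports the per-CF memtable budget (`write_buffer_size` × `max_write_buffer_number`), the L0 write-stall thresholds, and the compression setting.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// sample is a short excerpt of the events-store OPTIONS dump (values copied from it).
const sample = `
[DBOptions]
  max_background_jobs=2

[CFOptions "events_data"]
  max_write_buffer_number=2
  write_buffer_size=67108864
  level0_slowdown_writes_trigger=20
  level0_stop_writes_trigger=36
  compression=kZSTD

[CFOptions "events_index"]
  max_write_buffer_number=2
  write_buffer_size=67108864
  level0_slowdown_writes_trigger=20
  level0_stop_writes_trigger=36
  compression=kNoCompression
`

// parseOptions (hypothetical helper) maps section name -> key -> value.
// Section names are kept verbatim, e.g. `CFOptions "events_data"`.
func parseOptions(text string) map[string]map[string]string {
	out := map[string]map[string]string{}
	section := ""
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case line == "" || strings.HasPrefix(line, "#"):
			// Skip blank lines and comments.
		case strings.HasPrefix(line, "[") && strings.HasSuffix(line, "]"):
			section = line[1 : len(line)-1]
			out[section] = map[string]string{}
		case section != "":
			// Split on the first '=' only; struct-valued options contain more.
			if key, val, ok := strings.Cut(line, "="); ok {
				out[section][strings.TrimSpace(key)] = strings.TrimSpace(val)
			}
		}
	}
	return out
}

// memtableBudgetMiB (hypothetical helper) returns
// write_buffer_size * max_write_buffer_number in MiB for one CF.
func memtableBudgetMiB(cf map[string]string) (int64, error) {
	size, err := strconv.ParseInt(cf["write_buffer_size"], 10, 64)
	if err != nil {
		return 0, err
	}
	n, err := strconv.ParseInt(cf["max_write_buffer_number"], 10, 64)
	if err != nil {
		return 0, err
	}
	return size * n / (1 << 20), nil
}

func main() {
	opts := parseOptions(sample)

	// Collect CF sections and sort them for stable output.
	names := make([]string, 0, len(opts))
	for name := range opts {
		if strings.HasPrefix(name, "CFOptions ") {
			names = append(names, name)
		}
	}
	sort.Strings(names)

	fmt.Println("max_background_jobs:", opts["DBOptions"]["max_background_jobs"])
	for _, name := range names {
		cf := opts[name]
		mib, err := memtableBudgetMiB(cf)
		if err != nil {
			fmt.Println(name, "error:", err)
			continue
		}
		fmt.Printf("%s: memtables=%d MiB, L0 slowdown/stop=%s/%s, compression=%s\n",
			name, mib, cf["level0_slowdown_writes_trigger"],
			cf["level0_stop_writes_trigger"], cf["compression"])
	}
}
```

Feeding the full dump instead of `sample` should surface per-CF differences that a per-store summary row can hide, such as the `events_data` compression setting.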