restore: increase MTU from 64K to 256K#10173

Open: ripatel-fd wants to merge 1 commit into main from ripatel/snapin-depth

@ripatel-fd

Use tsorig field for frag sizes instead of sz field
Copilot AI review requested due to automatic review settings June 10, 2026 22:09
@github-actions

Performance Measurements ⏳

| Suite | Baseline | New | Change |
| --- | --- | --- | --- |
| backtest mainnet-424669000-perf per slot | 0.046451 s | 0.046539 s | 0.189% |
| backtest mainnet-424669000-perf snapshot load | 1.801 s | 1.828 s | 1.499% |
| backtest mainnet-424669000-perf total elapsed | 60.200072 s | 60.315135 s | 0.191% |
| firedancer mem usage with mainnet.toml | 504.41 GiB | 504.41 GiB | 0.000% |

Copilot AI left a comment

Pull request overview

This PR updates the snapshot restore pipeline to support a larger per-fragment MTU (64 KiB → 256 KiB) by carrying fragment sizes in the tsorig metadata field (since fd_frag_meta_t.sz is only a ushort). It also updates the default/dev topologies to provision the larger MTU on the relevant snapshot links.
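To see why a 16-bit size field cannot describe a 256 KiB fragment, here is a minimal sketch. The `demo_meta_t` struct and `demo_meta_set` helper are hypothetical stand-ins made up for illustration; the real `fd_frag_meta_t` has more fields and a different layout.

```c
#include <stdint.h>

/* Hypothetical stand-in for a frag metadata record (an assumption,
   not the real fd_frag_meta_t layout). */
typedef struct {
  uint16_t sz;     /* 16-bit size field: holds at most 65535 */
  uint32_t tsorig; /* 32-bit field, repurposed here to carry the size */
} demo_meta_t;

/* Store a fragment size in both fields so the truncation is visible. */
static demo_meta_t
demo_meta_set( uint32_t frag_sz ) {
  demo_meta_t m;
  m.sz     = (uint16_t)frag_sz; /* wraps modulo 65536 */
  m.tsorig = frag_sz;           /* keeps the full value */
  return m;
}
```

In this model, a 256 KiB size (`1U<<18`) wraps to 0 in `sz` but survives intact in `tsorig`, which is the motivation the overview gives for moving the size.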

Changes:

  • Switch snapshot fragment size propagation from sz to tsorig across the snap* restore tiles (publish + consume paths).
  • Update snapshot link MTUs to 1UL<<18 (256 KiB) and reduce link depths to 4096 in multiple topology builders.
  • Adjust dcache chunk advancement to use the actual fragment size being forwarded (where applicable).
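The last bullet, advancing by the forwarded fragment size rather than a fixed amount, can be sketched with a simplified ring cursor. Everything here is an assumption for illustration: `DEMO_CHUNK_SZ`, `demo_next_chunk`, and the wrap rule are a toy model, not the dcache helper the tiles actually call.

```c
#include <stdint.h>

#define DEMO_CHUNK_SZ 64UL /* assumed chunk granularity in bytes */

/* Advance a ring cursor past a frag of sz bytes, rounding up to whole
   chunks. Wrap back to chunk0 once the cursor moves past wmark. */
static uint64_t
demo_next_chunk( uint64_t chunk, uint64_t sz, uint64_t chunk0, uint64_t wmark ) {
  uint64_t used = ( sz + DEMO_CHUNK_SZ - 1UL ) / DEMO_CHUNK_SZ;
  chunk += used;
  return chunk > wmark ? chunk0 : chunk;
}
```

Under this model, a 100-byte control message consumes 2 chunks, whereas advancing by a full 256 KiB MTU would consume 4096 chunks for the same message.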

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/discof/restore/fd_snapwr_tile.c | Consume snapshot data frags using tsorig as the fragment size. |
| src/discof/restore/fd_snapld_tile.c | Publish snapshot META/DATA/control frags with size carried in tsorig (and sz unused). |
| src/discof/restore/fd_snapin_tile.c | Consume snapshot data frags using tsorig as the fragment size. |
| src/discof/restore/fd_snapdc_tile.c | Forward/decompress snapshot stream while treating tsorig as fragment size. |
| src/discof/restore/fd_snapct_tile.c | Treat incoming snapld stream fragment sizes as tsorig when processing. |
| src/app/firedancer/topology.c | Increase snapshot link MTU to 256 KiB and reduce depths for snapld_dc/snapdc_in. |
| src/app/firedancer-dev/commands/snapshot_load.c | Match snapshot-load dev topology MTU/depth updates for snapshot links. |
| src/app/firedancer-dev/commands/forktest/forktest.c | Match forktest topology MTU/depth updates for snapshot links. |
| src/app/firedancer-dev/commands/backtest.c | Match backtest topology MTU/depth updates for snapshot links. |

     }
   } else {
-    fd_stem_publish( stem, 0UL, FD_SNAPSHOT_MSG_DATA, ctx->out_dc.chunk, (ulong)result, 0UL, 0UL, 0UL );
+    fd_stem_publish( stem, 0UL, FD_SNAPSHOT_MSG_DATA, ctx->out_dc.chunk, 0UL, 0UL, (ulong)result, 0UL );
@@ -501,8 +501,8 @@ fd_topo_initialize( config_t * config ) {
   if( FD_LIKELY( snapshots_enabled ) ) {
     /* TODO: Revisit the depths of all the snapshot links */
@greptile-jt

greptile-jt Bot commented Jun 10, 2026

Greptile Summary

This PR increases the maximum fragment size (MTU) for snapshot restore links (snapld_dc and snapdc_in) from 64KB (USHORT_MAX) to 256KB (1UL<<18), enabling larger data chunks during snapshot loading. Since the mcache sz field is a ushort (16-bit, max 65535), the PR works around this by repurposing the tsorig field (uint, 32-bit) to carry the fragment size instead. The link depth is correspondingly reduced from 16384 to 4096 to maintain roughly the same total dcache memory footprint (~1GB).

  • All fd_stem_publish calls in the snapshot pipeline now place the fragment size in the tsorig parameter (7th arg) instead of sz (5th arg), which is set to 0.
  • All consumer returnable_frag callbacks updated to read size from tsorig instead of sz, with sz marked FD_PARAM_UNUSED.
  • Control message forwarding in fd_snapdc_tile.c and fd_snapld_tile.c now advances the dcache chunk pointer by actual message size instead of the full MTU, which reduces dcache waste.
  • All four topology files (topology.c, backtest.c, forktest.c, snapshot_load.c) are updated consistently.
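The producer/consumer convention described in the first two bullets can be sketched as follows. This is a toy model under stated assumptions: `demo_frag_t`, `demo_publish`, and `demo_frag_size` are hypothetical names, and the struct is not the real mcache metadata layout.

```c
#include <stdint.h>

/* Hypothetical metadata record (not the real fd_frag_meta_t). */
typedef struct {
  uint64_t sig;    /* message type or signature */
  uint32_t chunk;  /* where the payload starts in the data region */
  uint16_t sz;     /* left at 0 under the new convention */
  uint32_t tsorig; /* carries the payload size in bytes */
} demo_frag_t;

/* Producer side: sz stays 0, the byte count rides in tsorig. */
static demo_frag_t
demo_publish( uint64_t sig, uint32_t chunk, uint32_t frag_sz ) {
  demo_frag_t f = { .sig = sig, .chunk = chunk, .sz = 0, .tsorig = frag_sz };
  return f;
}

/* Consumer side: ignore sz and read the byte count from tsorig. */
static uint32_t
demo_frag_size( demo_frag_t const * f ) {
  return f->tsorig;
}
```

The point of the sketch is that both sides must switch together: a consumer that still read `sz` would see 0 for every fragment.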

Confidence Score: 5/5

This PR is safe to merge — all producers and consumers in the snapshot pipeline are consistently updated.

The change is mechanically straightforward: move size from the 16-bit sz mcache field to the 32-bit tsorig field across all snapshot tiles. All four topology files are updated identically. All five tile files are consistently updated on both producer and consumer sides. The tsorig field (uint, max about 4.29 billion) easily accommodates the new 256K MTU. Depth reduction from 16384 to 4096 keeps the total dcache footprint roughly the same. No during_frag callbacks exist in these tiles, so no additional consumers were missed. Verified that both snapin and snapwr consume from snapdc_in, and both are updated.
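The footprint claim can be checked with back-of-the-envelope arithmetic. The helper below is a hypothetical approximation that assumes one MTU-sized slot per depth entry; the real dcache sizing may add headroom that this ignores.

```c
#include <stdint.h>

/* Approximate data footprint of a link in bytes: depth slots of mtu
   bytes each. An assumption for illustration, not the real sizing. */
static uint64_t
demo_link_footprint( uint64_t depth, uint64_t mtu ) {
  return depth * mtu;
}
```

With the old parameters (depth 16384, MTU USHORT_MAX = 65535) this gives 1 GiB minus 16 KiB; with the new ones (depth 4096, MTU `1UL<<18`) it gives exactly 1 GiB, so the two footprints differ by only 16 KiB.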

No files require special attention — all changes are consistent and correct.

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/app/firedancer/topology.c | Updated snapld_dc and snapdc_in link definitions: depth 16384→4096, MTU USHORT_MAX→1UL<<18. Consistent with other topology files. |
| src/discof/restore/fd_snapld_tile.c | Producer: publishes size via tsorig in fd_stem_publish. Consumer: reads size from tsorig in returnable_frag. Also fixes dcache advance for control messages from MTU to actual size. |
| src/discof/restore/fd_snapdc_tile.c | Producer/consumer: all fd_stem_publish calls move size from sz to tsorig. returnable_frag reads tsorig. Control message dcache advance changed from ctx->out.mtu to actual size. |
| src/discof/restore/fd_snapin_tile.c | Consumer: returnable_frag updated to pass tsorig to handle_data_frag instead of sz. sz marked FD_PARAM_UNUSED. |
| src/discof/restore/fd_snapwr_tile.c | Consumer: returnable_frag updated to pass tsorig to handle_data_frag instead of sz. sz marked FD_PARAM_UNUSED. |
| src/discof/restore/fd_snapct_tile.c | Producer: publishes init message size via tsorig. Consumer: returnable_frag passes tsorig to snapld_frag instead of sz. |
| src/app/firedancer-dev/commands/backtest.c | Same topology link parameter changes as topology.c: depth and MTU updated for snapshot links. |
| src/app/firedancer-dev/commands/forktest/forktest.c | Same topology link parameter changes as topology.c: depth and MTU updated for snapshot links. |
| src/app/firedancer-dev/commands/snapshot_load.c | Same topology link parameter changes as topology.c: depth and MTU updated for snapshot links. |

Sequence Diagram

sequenceDiagram
    participant snapct as snapct_tile
    participant snapld as snapld_tile
    participant snapdc as snapdc_tile
    participant snapin as snapin_tile
    participant snapwr as snapwr_tile

    snapct->>snapld: "snapct_ld: init msg via tsorig"
    snapld->>snapdc: "snapld_dc: data/ctrl via tsorig (MTU=256K)"
    snapld->>snapdc: "snapld_dc: meta msg via tsorig"
    snapdc->>snapin: "snapdc_in: decompressed data via tsorig (MTU=256K)"
    snapdc->>snapwr: "snapdc_in: decompressed data via tsorig (MTU=256K)"
    Note over snapct,snapwr: sz field always 0. Actual size carried in tsorig (uint 32-bit)

Reviews (1): Last reviewed commit: "restore: increase MTU from 64K to 256K"

3 participants