Add more control over loadgen/latencies #394
I think using the delays from the network survey is probably better than using the artificial latency calculated from geolocations. Before this PR, nodes whose location we don't know were simply assigned one of the locations of nodes we do know about, which may be overly optimistic. The network survey RTTs do include processing delay, but if the processing delay is substantial enough to shift the RTT, it might also be worth simulating.
We can't do this. The simulation adds its own processing delay, so you'd end up double counting latency. We should use geolocations, because going forward we'll also want to experiment with manually assigned geos to measure the impact on latency. This feels a bit like going down a rabbit hole. Stepping back a bit, can we please start with a one-pager proposal on what the e2e setup is? Specifically, we need clarity on:
Let me know if we need to hash out the requirements more.
TL;DR
Time from
For the baseline measurement, a trimmed form of the most recent pubnet survey. (particular parameters not listed here since the survey is semi-sensitive)
For the baseline measurement, the nodes corresponding to Horizon instances. Open question as to whether there are other nodes we should pick/how.
For the baseline measurement, we can sidestep the issue by using some modified form of the network survey delays. Otherwise, we can use a topology that only includes the nodes we have geo data for and use the same existing delay model. It's probably worth checking both delay models to see how much they differ.
As an initial pass, we can just do a manual max tps and/or min block run and look at the metric. Open question: is this sufficient, or should we, e.g., have min block report the P50/P75/P99/etc. latencies?

Proposal for e2e latency measurement

At a high level, I would like to set up supercluster/core so that we can easily tune the realism of the simulation of the e2e latency without having to change supercluster/core, just the graph file. In particular, I'd like to have the pubnet graph file include

Core measurement

It's a little artificial, but it is easiest to measure same-node e2e latency. It's pretty simple to have the nodes that do load generation have a metric that measures the time from

Realistic baseline

To get the baseline number, using a topology based on the network survey seems reasonable to me.

Tier1 Nodes

We can just set the tier1 nodes based on the current tier1.

Load Generators

Unfortunately, as far as I can tell, neither RPC nor Horizon list the public keys for the stellar-core nodes they use. A brief search also didn't show any of the Horizon/RPC providers listing their core public keys. So, as a first pass, we can identify the SDF horizon nodes (by looking at the logs and matching the public keys to the graph file) and use those as load generators. We could add a few more load generators by identifying nodes that are connected in a similar way.

Latency Model

I think the most realistic number would come from using the survey latencies. It's true that the survey latencies include processing delay, but I think it makes sense to reflect this in the simulation: if a node was slow (e.g., because it was underpowered), then having the simulated latency be long makes sense. It's true that there will be some added latency from the simulation itself, but this also happens with geo locations, and we can subtract from the per-edge latencies the average if necessary. I believe the ping time is based on messages that are at a lower priority than SCP, so the delay may be a little artificially bigger. However, I do think these numbers are closer to representing the true network baseline than the geolocation-based numbers. We also get the benefit that we don't have to assign a fake geolocation to each node.

It's probably worth comparing this latency model to the geolocation-based one (we can prune the graph to only include nodes that we have the geolocations for); having the per-edge latencies allows us to swap between the models by just preprocessing the graph differently.

Pruning

The recent survey was much bigger than previous surveys, so we can do the same first pass tuning that we've done in internal. We can also prune to only include survey respondents (and optionally prune edges/nodes with delays that are excessively large (> 1000ms)).

Experiments

Having the realistic baseline will let us evaluate various models for latency (e.g., we can compare to the current geo loc 2x slowdown model). For non-topology dissemination changes, we can either re-use the same graph as for the baseline or (especially for small tests) we can prune it further (e.g., include only the tier1 nodes and the nodes along the shortest path(s) between the load generators and tier1). For topology changes, we can do a before/after that doesn't (necessarily) use the survey graph: having the graph specify the latencies, tier1s, and load generators allows easy ad-hoc changes to the modeling assumptions (e.g., for latency).

Test Harness

To get a baseline, just running max tps/min block time at some fixed value and looking at a dashboard seems reasonable.

Min Block Time

To make this more general, we can add it to the results that min block time prints. I don't think it makes sense to use it as one of the criteria for min block time (since the latency may not be monotonically increasing with block time). Doing it this way (one number at the end instead of looking at a dashboard) may change how we want to store the e2e latency, though. The current core change just uses a Medida timer, but since the histogram percentiles are reset every 30 seconds, it's harder to get a sense of the overall P50/75/99/etc.
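One way to get a single overall P50/P75/P99 for a run, rather than per-window values, is to keep the raw latency samples and compute the percentiles once at the end. The following is a minimal sketch under that assumption; the function names and sample values are hypothetical, and it does not reflect how the Medida timer in core stores its data.

```python
def percentile(sorted_samples, p):
    """Nearest-rank percentile of a sorted, non-empty list; p is an integer in 1..100."""
    n = len(sorted_samples)
    # Integer ceiling of p * n / 100 avoids floating-point rounding surprises.
    rank = (p * n + 99) // 100
    return sorted_samples[max(rank, 1) - 1]


def summarize(latencies_ms, percentiles=(50, 75, 99)):
    """Return {percentile: latency_ms} computed over every sample in the run."""
    ordered = sorted(latencies_ms)
    return {p: percentile(ordered, p) for p in percentiles}


# Made-up e2e latencies in milliseconds, for illustration only.
samples = [120, 95, 130, 110, 400, 105, 98, 102, 115, 125]
print(summarize(samples))
```

Keeping raw samples costs memory proportional to the number of transactions, so a long run might prefer a fixed-bucket histogram that is never reset; that trade-off is not decided here.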
Full e2e until data is available to downstream clients, like RPC. So this means the ledger is applied and meta is emitted. I think this also means we should start measuring from the /tx endpoint, which is what RPC interacts with.
Are you sure? With geolocations, we compute networking latency only. The ping RTT, by contrast, includes that synthetic latency plus time inside core (scheduling, response time, etc.), so I don't see how this is the same. By using survey latency, we'll be double-counting processing time, so you'll get overly pessimistic results (I also don't think we need to replicate super slow nodes that would just further skew the simulation).
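To make the double-counting concern concrete, here is a rough sketch that assumes a surveyed RTT decomposes into two one-way network delays plus a processing term. `sim_overhead_ms` is a hypothetical estimate of the processing time the simulation adds again on its own; the function name, edge names, and numbers are illustrative and do not come from supercluster.

```python
def one_way_delay_ms(survey_rtt_ms, sim_overhead_ms=0):
    """Estimate a one-way network delay from a surveyed round-trip time.

    Assumes RTT = 2 * one_way + processing. Subtracting the (assumed)
    processing estimate first keeps it from being counted twice once the
    simulation adds its own processing delay. The result is clamped at 0.
    """
    network_rtt = max(survey_rtt_ms - sim_overhead_ms, 0)
    return network_rtt // 2


# Made-up survey RTTs in milliseconds, keyed by (node, peer).
survey_rtts = {("A", "B"): 80, ("B", "C"): 12}
adjusted = {
    edge: one_way_delay_ms(rtt, sim_overhead_ms=20)
    for edge, rtt in survey_rtts.items()
}
print(adjusted)
```

With `sim_overhead_ms=0` the function reduces to halving the RTT, which is the uncorrected survey-based model; comparing the two outputs on the same graph gives a quick sense of how much the correction matters.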
Watchers don't have public keys (they're rotated randomly on restart), so I think this will be tricky. What we can do is add a parameter "txSubHopCount" that assigns loadgen nodes depending on how far they are from Tier1. This will also tell us how much hop count impacts e2e latency.
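The hop-count idea can be sketched as a multi-source breadth-first search from the Tier1 set. This assumes the topology is available as an undirected adjacency map; `hops_from_tier1`, `pick_load_generators`, and the toy graph are hypothetical names and data, and `hop_count` merely stands in for the suggested "txSubHopCount" parameter.

```python
from collections import deque


def hops_from_tier1(adjacency, tier1):
    """Return {node: minimum hop count to any Tier1 node} via multi-source BFS."""
    dist = {node: 0 for node in tier1}
    queue = deque(tier1)
    while queue:
        node = queue.popleft()
        for peer in adjacency.get(node, ()):
            if peer not in dist:
                dist[peer] = dist[node] + 1
                queue.append(peer)
    return dist


def pick_load_generators(adjacency, tier1, hop_count):
    """Select the nodes that sit exactly hop_count hops away from Tier1."""
    dist = hops_from_tier1(adjacency, tier1)
    return sorted(node for node, d in dist.items() if d == hop_count)


# Toy undirected topology: t1 and t2 are Tier1, the rest are watchers.
graph = {
    "t1": ["t2", "a"],
    "t2": ["t1", "c"],
    "a": ["t1", "b"],
    "c": ["t2", "b"],
    "b": ["a", "c", "d"],
    "d": ["b"],
}
print(pick_load_generators(graph, ["t1", "t2"], hop_count=2))
```

Running the same experiment with `hop_count` set to 1, 2, 3, and so on would show how distance from Tier1 affects e2e latency without needing to know any watcher's public key.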
I recommend a minimal amount of pruning, so we can actually end up with a realistic view of the network. So removing suspected "fake nodes" on the outer rim should be good enough.
cb5b9d7 to fed3666
fed3666 to 015d9e5
Pull request overview
This PR extends supercluster’s pubnet simulation and load-generation controls to better support experimentation with end-to-end transaction latency, including a new pubnet data format that can carry per-edge one-way delays and explicit “generates load” node selection, plus new mission/config knobs to simulate large network delays and gate loadgen latency metrics to only the relevant nodes.
Changes:
- Add a new `--pubnet-data-delay` mode that loads per-edge one-way delays and per-node `generatesLoad` from pubnet data, and plumbs `edgeDelays`/`generatesLoad` through CoreSetOptions.
- Add mission/config support for `PEER_AUTHENTICATION_TIMEOUT` and gated `LOADGEN_MEASURE_TX_LATENCY_FOR_TESTING`.
- Update MaxTPS/MinBlockTime to optionally select load-generator nodes explicitly via `generatesLoad`.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/FSLibrary/StellarNetworkDelays.fs | Refactors delay command generation to accept precomputed per-peer delays; adds pubnet-delay-mode script path. |
| src/FSLibrary/StellarNetworkData.fs | Introduces delay-format pubnet JSON parsing, edge-delay extraction/validation, and propagates edgeDelays + generatesLoad into CoreSet options. |
| src/FSLibrary/StellarMissionContext.fs | Adds mission context fields for delay-format pubnet data, e2e latency flag, and peer auth timeout. |
| src/FSLibrary/StellarCoreSet.fs | Extends CoreSetOptions with edgeDelays and generatesLoad. |
| src/FSLibrary/StellarCoreCfg.fs | Adds gating for loadgen e2e latency metrics and optional PEER_AUTHENTICATION_TIMEOUT. |
| src/FSLibrary/MinBlockTimeTest.fs | Uses explicit generatesLoad selection when present, otherwise preserves old behavior. |
| src/FSLibrary/MaxTPSTest.fs | Uses explicit generatesLoad selection when present, otherwise preserves old behavior. |
| src/FSLibrary/json-type-samples/sample-network-data-delay.json | Adds a sample delay-format pubnet data file for type inference. |
| src/FSLibrary.Tests/Tests.fs | Updates test MissionContext defaults and adjusts tc-command test for new delay API. |
| src/App/Program.fs | Adds CLI options and validation for delay-format pubnet data, e2e metrics flag, and peer auth timeout. |
let getNetworkDelayCommands (loc1: GeoLoc) (locsAndNames: (GeoLoc * PeerDnsName) array) (delay: int option) : ShCmd =
let getPeerDelays (loc1: GeoLoc) (locsAndNames: (GeoLoc * PeerDnsName) array) : (int * PeerDnsName) array =
    // Get the one way delays from loc1 to the locations in locsAndNames
| None ->
    if self.missionContext.flatNetworkDelay.IsNone then
        failwith
            "Failed to construct network delay script: no preferred peers map or flat network delay"
    else
        [||]
let (allPubnetNodes: PubnetNode array, edgeDelays: Map<string * string, int> option) =
    if context.pubnetDataDelay then
        if newNodes.Length > 0 then
            failwith "--pubnet-data-delay cannot be used with --tier1-orgs-to-add or --non-tier1-nodes-to-add"

        let nodes = PubnetNodeDelayJSON.Load context.pubnetData.Value
To support experimentation with measuring tx e2e latency, this PR does the following:
PEER_AUTHENTICATION_TIMEOUT (so we can simulate large network delays)