Add probing service#815
Conversation
👋 Thanks for assigning @tnull as a reviewer!

🔔 1st Reminder Hey @tnull! This PR has been waiting for your review.
Hi @randomlogin, thanks for the work on this! I've reviewed the first two commits:
I've left a bunch of inline comments addressing configuration and public API, commit hygiene, testing infrastructure, and test flakiness.
In summary:
- A couple of items are exposed publicly that seem like they should be scoped to probing or gated for tests only (see `scoring_fee_params` in `Config` and `scorer_channel_liquidity` on `Node`).
- The probing tests duplicate existing test helpers (`setup_node`, `MockLogFacadeLogger`). Reusing and extending what's already in `tests/common/` would reduce duplication and keep the test file focused on the tests themselves.
- `test_probe_budget_blocks_when_node_offline` has a race condition where the prober dispatches probes before the baseline capacity is measured, causing the assertion between the baseline and stuck capacities to fail. Details in the inline comment.
- A few nits about commit hygiene, import structure, and suggestions for renaming.
Also needs to be rebased.
@enigbe, thanks for the review, the updates are incoming soon.
Force-pushed 436e4a3 to 07dfde4.
Force-pushed ff741c2 to c31f1ce.
tnull left a comment
Thanks for taking this on and excuse the delay here!
Did a first review pass and this already looks great! Here are some relatively minor comments, mostly concerning the API design.
🔔 4th Reminder Hey @enigbe! This PR has been waiting for your review.
tnull left a comment
Seems tests are failing right now:
```
thread 'exhausted_probe_budget_blocks_new_probes' (167312) panicked at tests/probing_tests.rs:381:5:
no probe dispatched within 15 s

failures:
    exhausted_probe_budget_blocks_new_probes
    probe_budget_increments_and_decrements
```
Force-pushed f99786b to 1e73e6e.
Add integration tests that verify the probing service fires probes on the configured interval and respects the locked-msat budget cap. Shared helpers in tests/common are extended with probing-aware setup. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Force-pushed b11cea0 to 1a8f945.
Changed the exhaust test to be statistical (the locked amount never exceeds the cap) instead of trying to turn the intermediary routing node offline when it hasn't yet forwarded the probe HTLC.
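The statistical shape of such a test might look roughly like the following sketch. This is illustrative only: `assert_locked_never_exceeds_cap` and the `AtomicU64` counter are hypothetical stand-ins for the prober's accounting, not the PR's actual test code.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

// Hypothetical sketch: instead of forcing a node offline, repeatedly sample
// the locked amount while the prober runs and assert it never exceeds the
// configured cap.
fn assert_locked_never_exceeds_cap(locked_msat: &AtomicU64, cap_msat: u64, duration: Duration) {
    let start = Instant::now();
    while start.elapsed() < duration {
        let locked = locked_msat.load(Ordering::Relaxed);
        assert!(locked <= cap_msat, "locked {locked} exceeded cap {cap_msat}");
        thread::sleep(Duration::from_millis(10));
    }
}

fn main() {
    let locked = Arc::new(AtomicU64::new(0));
    let cap = 100_000;
    // Simulate a prober locking and releasing amounts that stay below the cap.
    let l = Arc::clone(&locked);
    let writer = thread::spawn(move || {
        for i in 0..20u64 {
            l.store((i * 4_000) % cap, Ordering::Relaxed);
            thread::sleep(Duration::from_millis(5));
        }
    });
    assert_locked_never_exceeds_cap(&locked, cap, Duration::from_millis(150));
    writer.join().unwrap();
    println!("ok");
}
```

A sampling loop like this avoids depending on the exact moment a node goes offline, which was the source of the earlier race.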
Force-pushed 346e002 to 2431f88.
Removed usage of unwrap(), reworked the exhaust test not to rely on a node-stopping race condition.
tnull left a comment
Excuse the delay here. Looks pretty good, but there are some minor things we might want to address before this can land.
Addressed all the issues mentioned; each has its respective commit. If they are fine, I'm going to squash them.
Force-pushed a956232 to f2706d2.
Previously, when calculating the currently locked amount, we didn't account for preflight probes sent on a payment, which could result in an incorrect value of the probes' locked_msat. Now Prober saves the PaymentId of the probes it sent and tracks them on release, ignoring the user-sent ones.
Force-pushed f2706d2 to 8acd55d.
```rust
/// Configures background probing.
///
/// See [`ProbingConfig`] for details.
```
Please make sure the docs on the ArcedNodeBuilder mirror the ones on the regular NodeBuilder.
I think it does so already. The builder methods do have (short) docs on them for both normal and arced builders.
```rust
LdkEvent::ProbeFailed { .. } => {},
LdkEvent::ProbeSuccessful { path, payment_id, .. } => {
    if let Some(prober) = &self.prober {
        if let Some(amount) =
```
Maybe just move this check into handle_background_probe_successful/handle_background_probe_failed? Then you could keep inflight_probes visibility at the module level, too.
```rust
pub interval: Duration,
/// Maximum total millisatoshis that may be locked in in-flight probes at any time.
pub max_locked_msat: u64,
pub(crate) locked_msat: Arc<AtomicU64>,
```
Isn't this redundant to the sum over all inflight_probes now? It's probably better to keep one piece of state for accounting purposes, to avoid the possibility of them getting out of sync.
Yeah, if we have all the probe details now, we can just calculate it when needed (with a tiny performance cost, but I guess it's fine).
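A minimal sketch of deriving the locked total on demand from the inflight map, assuming per-probe amounts are tracked; the key and value types here are illustrative stand-ins, not the PR's actual types:

```rust
use std::collections::HashMap;

// Hypothetical sketch: with per-probe amounts kept in an inflight map
// (keyed by some probe/payment identifier), the locked total can be
// derived on demand instead of maintained as a separate atomic counter
// that could drift out of sync.
fn locked_msat(inflight_probes: &HashMap<u64, u64>) -> u64 {
    inflight_probes.values().sum()
}

fn main() {
    let mut inflight: HashMap<u64, u64> = HashMap::new();
    inflight.insert(1, 50_000);
    inflight.insert(2, 25_000);
    assert_eq!(locked_msat(&inflight), 75_000);
    // Releasing a probe removes its entry, shrinking the derived total.
    inflight.remove(&1);
    assert_eq!(locked_msat(&inflight), 25_000);
    println!("ok");
}
```

Keeping a single source of truth trades an O(n) sum per budget check for the guarantee that the counter and the map can never disagree.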
```rust
interval: probing_cfg.interval,
max_locked_msat: probing_cfg.max_locked_msat,
locked_msat: Arc::new(AtomicU64::new(0)),
inflight_probes: Mutex::new(HashMap::new()),
```
Hmm, it seems accounting will still be off if we send out probes and then restart, as we'll re-init with an empty map that forgot about the previously-sent probes. Do we think that's acceptable or do we need to somehow persist this map? Seems like persistence would add some considerable complication on top? Thoughts?
Initially I wanted probing to be without persistence for the sake of simplicity, but now it seems really important: if the node really goes over budget, it will likely happen after a restart, and eventually it could drain the whole balance.
The corner case is the situation when a probing node:
- has a lot of unresolved probes (or small probing budget)
- restarts frequently
In that case the locked amount would falsely appear near-zero, whereas in fact some probes were sent and are still unresolved.
If we restart the node less often than the time needed for locked in-flight probes to release, we're more or less fine. So if a node restarts less often than once in two hours it should be okay (it would not drain the balance completely, but there could be time periods where the amount actually locked is around twice as big as the budget allows).
There is an option to add this to the public API: https://github.com/lightningdevkit/rust-lightning/blob/03ab73da95af34f3ae3d28d7e83383cfa0b04fc1/lightning/src/ln/channelmanager.rs#L6143
Then we could iterate through recent payments on restart and populate the `inflight_probes` with `max_locked_msat` without persisting anything new.
Also some calculations on when the budget limit hits: assume probing is done every 500 ms, or equivalently it has a frequency of
Let's assume the probability of a probe getting locked for
Let's assume the node has
That would mean that during time
With the figures from above, if the failure rate of probes is bigger than 0.0007 (0.07%), then we would hit the budget bound if the node restarts. And actually that's not that big (1 out of 1440 probes locked for 2 hours).
All in all, I think it's worth adding some way of persistence, because for 'ordinary' nodes the cap seems easily reached if the settings (frequency, amount) are not tuned properly.
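The back-of-the-envelope numbers above can be checked with a small steady-state calculation. The figures (one probe every 500 ms, a 2-hour lock time for a stuck probe, a ~0.07% stuck rate) come from this thread; the function and names are illustrative assumptions, not code from the PR:

```rust
// Hypothetical steady-state estimate: stuck probes accumulate at rate
// `probes_per_sec * stuck_rate` for `lock_secs` before the oldest release,
// so at equilibrium roughly this many probes are locked at once.
fn steady_state_locked_probes(probes_per_sec: f64, lock_secs: f64, stuck_rate: f64) -> f64 {
    probes_per_sec * stuck_rate * lock_secs
}

fn main() {
    let probes_per_sec = 2.0; // one probe every 500 ms
    let lock_secs = 2.0 * 3600.0; // ~2 hours until a stuck HTLC resolves
    let stuck_rate = 1.0 / 1440.0; // ~0.07%, as discussed in the thread

    let locked = steady_state_locked_probes(probes_per_sec, lock_secs, stuck_rate);
    // 2 * (1/1440) * 7200 = 10 probes locked at any given time.
    assert!((locked - 10.0).abs() < 1e-9);
    println!("{locked}");
}
```

This matches the "1 out of 1440" intuition above: at that stuck rate the budget needs headroom for about 10 concurrently locked probes, so a cap sized below that (times the per-probe amount) would be hit in normal operation.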
Previously, `ChannelManager::list_recent_payments` didn't give us the means to discern 'real' payments from inflight probes. In lightningdevkit/ldk-node#815 we found that we need a way to re-derive which probes are still pending so our accounting of inflight probing amounts is still correct after restart. To this end, we here let callers distinguish liquidity probes while they are pending or abandoned.

Co-Authored-By: HAL 9000
Signed-off-by: Elias Rohrer <dev@tnull.de>
Added a probing service which is used to send probes to estimate channels' capacities.
Related issue: #765.
Probing is intended to be used in two ways:
For probing, a new abstraction `Prober` is defined, which is (optionally) created during node building. `Prober` periodically sends probes to feed the data to the scorer.
`Prober` sends probes using a `ProbingStrategy`.
The `ProbingStrategy` trait has only one method: `fn next_probe(&self) -> Option<Probe>;`. Every tick it generates a probe, where `Probe` represents how to send a probe. To accommodate the two different ways probing is used, we either construct a probing route manually (`Probe::PrebuiltRoute`) or rely on the router/scorer (`Probe::Destination`). `Prober` tracks how much liquidity is locked in-flight in probes, and prevents new probes from firing if the cap is reached.
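The `ProbingStrategy`/`Probe` shape described above might be sketched roughly like this; the `Route` and `NodeId` types here are simplified stand-ins for illustration (the real ones come from rust-lightning), and `FixedDestination` is a toy strategy, not one of the PR's implementations:

```rust
// Simplified stand-ins for the real routing types.
struct Route(Vec<&'static str>);
struct NodeId(&'static str);

// A probe is either a fully prebuilt route (random strategy) or a
// destination the router/scorer should find a path to (high-degree strategy).
enum Probe {
    PrebuiltRoute(Route),
    Destination(NodeId),
}

// The single-method trait described in the PR: every tick the prober asks
// the strategy for the next probe to send, if any.
trait ProbingStrategy {
    fn next_probe(&self) -> Option<Probe>;
}

// A toy strategy that always probes towards the same destination.
struct FixedDestination(&'static str);

impl ProbingStrategy for FixedDestination {
    fn next_probe(&self) -> Option<Probe> {
        Some(Probe::Destination(NodeId(self.0)))
    }
}

fn main() {
    let strategy = FixedDestination("03abc...");
    match strategy.next_probe() {
        Some(Probe::Destination(NodeId(id))) => println!("probe towards {id}"),
        Some(Probe::PrebuiltRoute(Route(hops))) => println!("probe along {hops:?}"),
        None => println!("no probe this tick"),
    }
}
```

The single-method trait keeps the prober's dispatch loop decoupled from how routes or destinations are chosen, which is what lets the two strategies below share the same budget accounting.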
There are two probing strategies implemented:
- Random probing strategy: picks a random route from the current node; the route is probed via `send_probe`, thus ignoring scoring parameters (what hops to pick). It also ignores `liquidity_limit_multiplier`, which prohibits taking a hop if its capacity is too small. It is a truly random route.
- High-degree probing strategy: examines the graph, finds the nodes with the biggest number of (public) channels, and probes routes to them using `send_spontaneous_preflight_probes`, which uses the current router/scorer.
The former is meant to be used on payment nodes, while the latter on probing nodes. For the `HighDegreeStrategy` to work, it is recommended to set `probing_diversity_penalty_msat` to some nonzero value to prevent route reuse; however, it may fail to find any available routes.
There are three tests added:
Example output (runs for ~1 minute, needs the `--nocapture` flag):
For performance testing I had to expose the scoring data (`scorer_channel_liquidity`).
Also exposed `scoring_fee_params: ProbabilisticScoringFeeParameters` to `Config`.
TODOs: