
Fix Lightning channel flapping and force-closes; add reconcile RPC #3

Open
refined-element wants to merge 7 commits into main from fix/ln-flapping-and-coop-close-fee

Summary

Three related fixes found while diagnosing why two open channels with an lnd peer kept dropping, and why four channels with that same peer had force-closed.

  • Stop force-closes from coop-close fee deadlock: ChannelCloseMinimum was derived from the 90th-percentile mempool fee with a 4 sat/vB floor (~722 sat for a close tx). When lnd proposed a reasonable ~1 sat/vB close on a quiet mempool, LDK refused and force-closed. It now samples the 10th percentile with a 1 sat/vB floor so economical coop closes succeed.
  • Stop peer flapping from over-frequent LDK timer ticks: tick() (run ≈1×/s) called peer_manager.timer_tick_occurred() and channel_manager.timer_tick_occurred() every time. LDK expects ~10s and ~60s respectively; driving the peer timer 10× too fast made ping/pong keepalive deadlines fire prematurely and disconnected healthy peers (~11‑min flap cadence, no logged reason). Both calls are now gated to the documented cadence via a tick counter.
  • Add reconcilechannels RPC: compares LDK's local channel view against on-chain reality via a configurable block explorer (default mempool.space), emitting a per-channel verdict. Needed because libbitcoinkernel 0.2 exposes no UTXO lookup and the node prunes, so it can't verify funding-output spend status locally.

Background

A litd/lnd v0.7.1 counterparty reported the two live channels with this LDK node were flapping (flap_count ~1,800, ~92.5% uptime) and that four prior channels had force-closed. Cross-checking three sources (lnd listchannels, LDK is_channel_ready, and on-chain funding-output status) confirmed the two surviving channels are genuinely open: the problem was the peer link and the close-fee logic, not channel fate.

Details

1. crates/wolfe-lightning/src/fee_estimator.rs

The force-close trace was explicit in the LDK log:

ERROR ldk: Closed channel … due to close-required error: Unable to come to
consensus about closing feerate, remote wants something (166 sat) lower than
our min fee (722 sat)

ChannelCloseMinimum is the lowest coop-close feerate we'll accept. Setting it high turns a routine low-fee coop close into a force-close. Lowered the sampled percentile from 0.9 to 0.1 and the floor from 1000 sat/kw (4 sat/vB) to 253 sat/kw (about 1 sat/vB).
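
The floor arithmetic above can be sketched in plain Rust. This is a minimal illustration, not the code in fee_estimator.rs: the function names, the nearest-rank percentile, and the use of `max` to apply the floor are all assumptions made for the example. Only the 0.1 percentile and the 253 sat/kw floor come from the PR.

```rust
/// Floor from the PR: 253 sat per 1000 weight units (about 1 sat/vB).
const FLOOR_SAT_PER_KW: u32 = 253;

/// 1 vB = 4 weight units, so 1 sat/vB = 250 sat/kw.
fn sat_per_vb_to_sat_per_kw(sat_per_vb: f64) -> u32 {
    (sat_per_vb * 250.0).round() as u32
}

/// Nearest-rank percentile over an ascending slice (hypothetical helper).
fn percentile(sorted: &[f64], p: f64) -> Option<f64> {
    if sorted.is_empty() {
        return None;
    }
    let idx = ((sorted.len() - 1) as f64 * p).round() as usize;
    Some(sorted[idx])
}

/// Lowest coop-close feerate we would accept, in sat/kw.
/// Assumption: the floor is applied with `max` and also serves as the
/// fallback when no mempool samples are available.
fn channel_close_minimum(mempool_sat_per_vb: &[f64]) -> u32 {
    let mut sorted = mempool_sat_per_vb.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    percentile(&sorted, 0.1)
        .map(sat_per_vb_to_sat_per_kw)
        .unwrap_or(FLOOR_SAT_PER_KW)
        .max(FLOOR_SAT_PER_KW)
}
```

On a quiet mempool where the low end of the samples sits near 1 sat/vB, this returns the 253 sat/kw floor, so a peer proposing roughly 1 sat/vB is no longer below our minimum.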

2. crates/wolfe-lightning/src/lib.rs

Added an AtomicU64 tick counter; peer_manager.timer_tick_occurred() now fires every 10th tick (~10s) and channel_manager.timer_tick_occurred() every 60th (~60s). Fast event-processing (process_pending_events, process_pending_htlc_forwards, process_events) still runs every tick.
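
A minimal sketch of the counter-based gating described above, assuming a ~1s tick. `TickGate`, `TickPlan`, and `on_tick` are hypothetical names for illustration; the real change lives in lib.rs and calls the LDK timer methods directly rather than returning a plan.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// With tick() running about once per second, these give ~10s and ~60s.
const PEER_TICK_EVERY: u64 = 10;
const CHANNEL_TICK_EVERY: u64 = 60;

/// Which slow timers should fire on this tick.
#[derive(Debug, PartialEq, Eq)]
struct TickPlan {
    peer_timer: bool,
    channel_timer: bool,
}

struct TickGate {
    count: AtomicU64,
}

impl TickGate {
    fn new() -> Self {
        Self { count: AtomicU64::new(0) }
    }

    /// Called once per tick; fast event processing would run unconditionally
    /// around this call, while the two timers run only when flagged.
    fn on_tick(&self) -> TickPlan {
        // fetch_add returns the previous value, so add 1 to number ticks from 1.
        let n = self.count.fetch_add(1, Ordering::Relaxed) + 1;
        TickPlan {
            peer_timer: n % PEER_TICK_EVERY == 0,
            channel_timer: n % CHANNEL_TICK_EVERY == 0,
        }
    }
}
```

Over 60 ticks the peer timer flag is set 6 times and the channel timer flag once, which matches the ~10s and ~60s cadence the PR describes.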

3. crates/wolfe-rpc/src/handlers.rs

reconcilechannels (optional param [explorer_base_url], default https://mempool.space) returns per-channel OPEN_CONFIRMED / CLOSED_ONCHAIN / NO_FUNDING_TX / UNKNOWN plus a summary; degrades to UNKNOWN with a manual explorer URL if the explorer is unreachable. Adds reqwest (rustls, no default features).
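
The verdict mapping can be sketched without any HTTP dependency by passing the explorer lookup in as a closure. `Lookup` and `classify` are hypothetical names for this example. One deliberate difference from the handler quoted in the review below: this sketch maps a response with no usable `spent` field to UNKNOWN, whereas the PR code defaults a missing field to unspent.

```rust
/// Per-channel verdicts named in the PR description.
#[derive(Debug, PartialEq, Eq)]
enum Verdict {
    OpenConfirmed,
    ClosedOnchain,
    NoFundingTx,
    Unknown,
}

impl Verdict {
    fn as_str(&self) -> &'static str {
        match self {
            Verdict::OpenConfirmed => "OPEN_CONFIRMED",
            Verdict::ClosedOnchain => "CLOSED_ONCHAIN",
            Verdict::NoFundingTx => "NO_FUNDING_TX",
            Verdict::Unknown => "UNKNOWN",
        }
    }
}

/// Hypothetical outcome of an explorer outspend query.
enum Lookup {
    Spent,
    Unspent,
    /// Response arrived but had no usable `spent` field.
    Malformed,
    /// Explorer unreachable or returned an error status.
    Failed(String),
}

/// Map a channel's funding outpoint and explorer answer to a verdict.
/// The lookup closure is only invoked when a funding outpoint exists.
fn classify<F>(funding: Option<(&str, u16)>, lookup: F) -> Verdict
where
    F: FnOnce(&str, u16) -> Lookup,
{
    let Some((txid, vout)) = funding else {
        return Verdict::NoFundingTx;
    };
    match lookup(txid, vout) {
        Lookup::Spent => Verdict::ClosedOnchain,
        Lookup::Unspent => Verdict::OpenConfirmed,
        Lookup::Malformed | Lookup::Failed(_) => Verdict::Unknown,
    }
}
```

Keeping the lookup behind a closure makes the mapping testable offline and leaves the choice of explorer (mempool.space or a private esplora) to the caller.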

Test plan

  • cargo build --release -p wolfe-node — clean
  • cargo clippy --release -p wolfe-lightning -p wolfe-rpc -- -D warnings — clean
  • cargo fmt — clean
  • Live: restarted node with all three fixes; both channels reconnect and report num_active_channels: 2; reconcilechannels returns OPEN_CONFIRMED for the funded channels.
  • Soak: monitor flap_count on the lnd side over 24h to confirm the cadence fix reduces disconnects.

🤖 Generated with Claude Code

Three related fixes for Lightning channel stability with lnd peers,
found while diagnosing why two open channels kept dropping and why
four channels with the same peer had force-closed.

1. Force-closes from cooperative-close fee deadlock
   (crates/wolfe-lightning/src/fee_estimator.rs)

   ChannelCloseMinimum — the *lowest* coop-close feerate we'll accept
   from a peer — was sampling the 90th mempool percentile with a 4 sat/vB
   floor (~722 sat for a close tx). When lnd proposed a reasonable ~1
   sat/vB (166 sat) close on a quiet mempool, LDK refused and escalated
   to a force-close. Lower the sample to the 10th percentile and the
   floor to 1 sat/vB so economical coop closes are accepted instead of
   burning the channel on-chain.

2. Peer connection flapping from over-frequent LDK timer ticks
   (crates/wolfe-lightning/src/lib.rs)

   tick() called peer_manager.timer_tick_occurred() and
   channel_manager.timer_tick_occurred() on every invocation, but tick()
   runs ~1×/second. PeerManager expects ~10s (it drives ping/pong
   keepalive) and ChannelManager expects ~60s. Driving them 10–60× too
   fast made ping/pong deadlines fire prematurely and disconnected
   healthy peers (~11-min flap cadence with no logged reason). Gate both
   timer ticks behind a tick counter so they fire at the documented
   cadence; the fast event-processing calls still run every tick.

3. reconcilechannels RPC
   (crates/wolfe-rpc/src/handlers.rs)

   New JSON-RPC method that compares LDK's local channel view against
   on-chain reality. The node can't answer this alone (libbitcoinkernel
   0.2 has no UTXO lookup and the node prunes), so it queries a
   configurable block explorer (default mempool.space; pass your own/Tor
   esplora as param 0) for each channel's funding-output spend status and
   emits a per-channel verdict: OPEN_CONFIRMED, CLOSED_ONCHAIN,
   NO_FUNDING_TX, or UNKNOWN. Degrades gracefully to UNKNOWN with a
   manual explorer URL when the explorer is unreachable. Adds reqwest
   (rustls, no default features) for the lookup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 26, 2026 01:38

Copilot AI left a comment


Pull request overview

This PR targets Lightning reliability issues observed with an lnd peer (channel flapping and unexpected force-closes) and adds a diagnostic RPC to compare LDK’s channel view with on-chain funding-output status via an esplora-compatible block explorer.

Changes:

  • Adjusts LDK fee estimation for ChannelCloseMinimum to accept lower coop-close feerates.
  • Gates LDK PeerManager and ChannelManager timer ticks to their documented cadence instead of every 1s tick().
  • Adds a new reconcilechannels JSON-RPC method (and introduces reqwest) to query funding output spend status from an explorer.

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| crates/wolfe-lightning/src/fee_estimator.rs | Changes coop-close minimum fee sampling and lowers the floor. |
| crates/wolfe-lightning/src/lib.rs | Adds a tick counter to gate LDK timer ticks to ~10s/~60s cadence. |
| crates/wolfe-rpc/src/handlers.rs | Adds reconcilechannels RPC and an esplora outspend query helper. |
| crates/wolfe-rpc/Cargo.toml | Adds reqwest from workspace dependencies. |
| Cargo.toml | Adds workspace reqwest dependency (rustls + json). |
| Cargo.lock | Locks new dependency graph additions from reqwest. |


Comment on lines +71 to +72
// Sample a low percentile so we accept economical closes.
ConfirmationTarget::ChannelCloseMinimum => self.sample_fee_rate_sat_per_vb(0.1),
Comment on lines +94 to 96
// 1 sat/vB: accept low coop-close fees rather than force-closing.
ConfirmationTarget::ChannelCloseMinimum => 253, // 1 sat/vB
_ => 253, // 1 sat/vB minimum
Comment on lines +625 to +633
let base = get_param_str(params, 0)
    .unwrap_or("https://mempool.space")
    .trim_end_matches('/')
    .to_string();

let client = reqwest::Client::builder()
    .timeout(std::time::Duration::from_secs(15))
    .build()
    .map_err(|e| RpcError::Internal(format!("http client: {e}")))?;
Comment on lines +619 to +626
// Params: [explorer_base_url?] (default https://mempool.space)
"reconcilechannels" => {
    let ln = state
        .lightning()
        .ok_or_else(|| RpcError::Lightning("lightning not enabled".to_string()))?;

    let base = get_param_str(params, 0)
        .unwrap_or("https://mempool.space")
Comment on lines +635 to +703
let channels = ln.channel_manager().list_channels();
let mut results = Vec::with_capacity(channels.len());
let (mut open, mut closed, mut unknown, mut no_funding) = (0u32, 0u32, 0u32, 0u32);

for c in &channels {
    let funding = c.funding_txo.map(|op| (op.txid.to_string(), op.index));

    let (verdict, detail, onchain) = match &funding {
        None => {
            no_funding += 1;
            (
                "NO_FUNDING_TX",
                "channel has no funding outpoint yet (still negotiating)".to_string(),
                json!({ "checked": false }),
            )
        }
        Some((txid, vout)) => {
            match explorer_outspend(&client, &base, txid, *vout).await {
                Err(e) => {
                    unknown += 1;
                    (
                        "UNKNOWN",
                        format!("explorer lookup failed: {e}"),
                        json!({
                            "checked": false,
                            "explorer_url": format!("{base}/tx/{txid}"),
                        }),
                    )
                }
                Ok(v) => {
                    let spent =
                        v.get("spent").and_then(|x| x.as_bool()).unwrap_or(false);
                    if spent {
                        closed += 1;
                        let stx = v.get("txid").and_then(|x| x.as_str());
                        let height =
                            v.pointer("/status/block_height").and_then(|x| x.as_u64());
                        let detail = format!(
                            "funding output SPENT on-chain{} — channel closed; LDK still reports is_channel_ready={} (state divergence)",
                            height.map(|h| format!(" at height {h}")).unwrap_or_default(),
                            c.is_channel_ready
                        );
                        (
                            "CLOSED_ONCHAIN",
                            detail,
                            json!({
                                "checked": true,
                                "funding_spent": true,
                                "spending_txid": stx,
                                "spent_block_height": height,
                                "explorer_url": format!("{base}/tx/{txid}"),
                            }),
                        )
                    } else {
                        open += 1;
                        (
                            "OPEN_CONFIRMED",
                            "funding output unspent on-chain — channel genuinely open"
                                .to_string(),
                            json!({
                                "checked": true,
                                "funding_spent": false,
                                "explorer_url": format!("{base}/tx/{txid}"),
                            }),
                        )
                    }
                }
            }
        }
Comment on lines +1146 to +1154
async fn explorer_outspend(
    client: &reqwest::Client,
    base: &str,
    txid: &str,
    vout: u16,
) -> Result<Value, String> {
    let url = format!("{base}/api/tx/{txid}/outspend/{vout}");
    let resp = client.get(&url).send().await.map_err(|e| e.to_string())?;
    if !resp.status().is_success() {
refined-element and others added 6 commits May 26, 2026 23:44
`block_connected` skipped per-block notifications during IBD whenever
`channel_manager.list_channels()` returned empty, as an optimization for
fresh nodes with no channels. But `list_channels()` reflects only OPEN
channels: once channels close, the list goes empty even though the
`chain_monitor` still holds ChannelMonitors that need block updates to
detect CSV maturity, HTLC timeouts, and breach justice on force-closes.

Symptom that surfaced this: after both channels were closed (one coop,
one force-close), the force-close to_self output sat past its CSV
maturity height for hours without LDK emitting SpendableOutputs.
ChainMonitor was being starved of block_connected calls and never
reconciled the matured output. With the fix, monitor-aware gating allows
chain notifications to keep flowing for as long as any monitor exists,
and the SpendableOutputs event fired immediately on the next restart.
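
The gating change can be illustrated with two small predicates. This is a sketch only: the function names and the plain counts passed in are hypothetical stand-ins for the node's IBD flag, `list_channels()`, and the ChainMonitor's monitor set.

```rust
/// Old behavior: during IBD, skip block notifications when no channel is open.
fn should_notify_old(in_ibd: bool, open_channels: usize) -> bool {
    !in_ibd || open_channels > 0
}

/// New behavior: also keep notifying while any ChannelMonitor still exists,
/// since closed channels may have outputs that mature at a later height.
fn should_notify_new(in_ibd: bool, open_channels: usize, monitors: usize) -> bool {
    !in_ibd || open_channels > 0 || monitors > 0
}
```

With zero open channels and two remaining monitors during IBD, the old predicate returns false (the starvation case described above) and the new one returns true.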

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
After LDK sweeps SpendableOutputs (coop-close to_remote or force-close
to_self), the resulting UTXOs sit at LDK's KeysManager-derived P2WPKH
destination address. BDK doesn't know about that address, so existing
wallet RPCs can't reach the funds.

This adds:

  ln_sweep_to_address <dest_addr> [<fee_rate_sat_per_vb>]
                      [<explorer_base_url>] [<include_unconfirmed>]

The RPC queries an explorer (mempool.space by default) for UTXOs at the
KeysManager destination address, synthesizes StaticOutput descriptors,
and lets KeysManager sign + broadcast a single sweep transaction to the
user-supplied address. Synthetic StaticOutput works because LDK's
StaticOutput signing path keys off output.script_pubkey rather than
channel-specific state — any UTXO at our destination_script is signable
regardless of how it got there.

Two new LightningManager methods support this: keys_destination_address
(for explorer queries) and sweep_outpoints (the actual sign-and-
broadcast path). sweep_fee_rate_sat_per_kw is also exposed for callers
that want the default sweep feerate, and network() so callers can
validate destination network matches.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Extend ln_sweep_to_address so a single tx can sweep both:

  (a) Plain UTXOs at the KeysManager destination script (already
      supported — these are sweep results from prior SpendableOutputs
      events, signed via synthetic StaticOutput descriptors).

  (b) Pending close-related claims discovered from each ChannelMonitor.
      For every monitor whose funding outpoint is spent on-chain, we
      fetch the close tx (via explorer), pass it to
      ChannelMonitor::get_spendable_outputs, and bundle the returned
      descriptors. This captures CSV-locked to_self outputs from
      force-closes — including ones whose automatic SpendableOutputs
      sweep failed to broadcast (e.g. dust output, network policy).

The combined-input flow avoids dust-output failures that plague tiny
individual force-close to_self sweeps: bundling them with a destination
UTXO from a coop-close result raises the input total enough to clear
P2WPKH dust on the output side.
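
A rough sketch of why bundling helps, assuming the 294 sat P2WPKH dust threshold that applies under Bitcoin Core's default dust relay fee. The function name, input values, and fee figures are illustrative, not taken from the node.

```rust
/// P2WPKH dust threshold under Bitcoin Core's default dust relay fee.
const P2WPKH_DUST_LIMIT_SAT: u64 = 294;

/// Value of the single sweep output, or None if the inputs cannot cover the
/// fee or the remaining output would be dust.
fn sweep_output_value(input_values_sat: &[u64], fee_sat: u64) -> Option<u64> {
    let total: u64 = input_values_sat.iter().sum();
    let out = total.checked_sub(fee_sat)?;
    (out >= P2WPKH_DUST_LIMIT_SAT).then_some(out)
}
```

A lone 400 sat to_self input paying a 200 sat fee leaves 200 sat, which is below the threshold, so the sweep is rejected. Adding a 50,000 sat destination UTXO and paying a 300 sat fee leaves 50,100 sat, which clears it.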

LDK's get_spendable_outputs replays descriptors from the close tx
regardless of subsequent on-chain spends, so we filter each
monitor-derived descriptor by querying the explorer's outspend endpoint
on its outpoint. Already-spent outpoints are reported under
descriptors_skipped_already_spent in the per-channel summary.

Three new LightningManager methods (list_channel_monitor_ids,
monitor_funding_outpoint, monitor_spendable_outputs) plus a generic
sweep_descriptors that takes pre-built SpendableOutputDescriptor list.
sweep_outpoints becomes a thin convenience wrapper.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The first PR commit lowered the ChannelCloseMinimum floor from 1000
sat/kw (4 sat/vB) to 253 sat/kw (1 sat/vB) to accept economical
coop-close fees rather than force-closing. The corresponding unit test
was still asserting the old floor value.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Same root cause as the prior fee_estimator_tests update — the original
test asserted >=1000 sat/kw, but we intentionally lowered the floor to
253 (1 sat/vB) to avoid forcing force-closes when a peer proposes an
economical coop-close feerate.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>