
fix: buffer blocks in finalizer when EL is syncing without blocking main loop #202

Open
matthias-wright wants to merge 4 commits into audit-may-2026 from m/fix-deadlock



@matthias-wright
Collaborator

Building on #167 and #192.

This addresses #197.

Changes:

  • Removes the unbounded retry loop for block execution when the EL is syncing.
  • Introduces buffers for notarized and finalized blocks. If the EL returns SYNCING, the block is buffered and the main loop of the finalizer actor continues. The buffers are drained periodically, at an interval of FINALIZER_DRAIN_INTERVAL (currently 5 seconds).
  • The buffers are unbounded. Once either of them reaches FINALIZER_BUFFERED_BLOCKS_WARN_THRESHOLD, warnings are emitted through logs and metrics.
  • Regression tests are added to verify the correct behavior.
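
The buffering behaviour described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Block`, `ExecStatus`, `BlockBuffer`, `submit`, and `drain` are hypothetical names, the EL call is modelled as a plain closure, and queuing new blocks behind already buffered ones (to preserve order) is an assumption. The threshold value of 100 is taken from the review discussion.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the EL's answer to a block execution request.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ExecStatus {
    Valid,
    Syncing,
}

/// Hypothetical block type; only the height is tracked in this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Block {
    height: u64,
}

/// Stand-in for FINALIZER_BUFFERED_BLOCKS_WARN_THRESHOLD.
const WARN_THRESHOLD: usize = 100;

/// Unbounded buffer for blocks that could not be executed yet.
struct BlockBuffer {
    pending: VecDeque<Block>,
}

impl BlockBuffer {
    fn new() -> Self {
        Self { pending: VecDeque::new() }
    }

    /// Tries to execute `block` once. Returns `true` if it was executed,
    /// `false` if it was buffered. Never loops, so the caller's main loop
    /// can move on to the next message either way.
    fn submit<F: FnMut(&Block) -> ExecStatus>(&mut self, block: Block, mut execute: F) -> bool {
        // Assumption: if older blocks are still buffered, queue behind them
        // instead of executing out of order.
        if !self.pending.is_empty() || execute(&block) == ExecStatus::Syncing {
            self.pending.push_back(block);
            if self.pending.len() >= WARN_THRESHOLD {
                // The PR reports this through logs and metrics; stderr stands in here.
                eprintln!("warning: {} blocks buffered", self.pending.len());
            }
            return false;
        }
        true
    }

    /// Intended to run on every drain tick. Executes buffered blocks from
    /// the front and stops at the first SYNCING response. Returns the
    /// number of blocks executed.
    fn drain<F: FnMut(&Block) -> ExecStatus>(&mut self, mut execute: F) -> usize {
        let mut executed = 0;
        while let Some(block) = self.pending.front() {
            if execute(block) == ExecStatus::Syncing {
                break;
            }
            self.pending.pop_front();
            executed += 1;
        }
        executed
    }
}
```

In an actor, `drain` would be called from a timer branch of the main loop that fires every FINALIZER_DRAIN_INTERVAL, next to the branches that handle incoming messages.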

@evonide

evonide commented May 21, 2026

The deadlock fix looks right: when Reth returns SYNCING, the finalizer no longer waits inside the actor loop and can keep answering other messages.

There is one open question about the new retry queue, though. For finalized blocks it seems naturally limited, because the syncer waits for acknowledgements before sending too many. Notarized blocks, however, do not seem to have the same backpressure, and 100 looks like a warning threshold rather than a hard cap.

So does this change the failure mode from “finalizer gets stuck” to “finalizer stays alive, but may keep accumulating notarized blocks while Reth is syncing”? If so, should pending notarized blocks be deduped/bounded, or is there another reason this queue cannot grow into a resource issue during prolonged EL syncing?

@matthias-wright
Collaborator Author

Agreed: if Reth keeps returning SYNCING, for whatever reason, the pending notarized queue would grow indefinitely.
I added dedup for the pending notarized queue and introduced a hard cap on it, finalizer_pending_notarized_max. When the cap is reached, a graceful shutdown is initiated and an error is logged. The cap can be set via a CLI flag and defaults to 1000.
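
A rough sketch of what a deduplicated, hard-capped pending queue could look like. The names `PendingNotarized`, `PushOutcome`, and `Digest` are hypothetical, and keying the dedup on a 32-byte block digest is an assumption; the actual implementation in this PR may differ.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical block identifier used as the dedup key.
type Digest = [u8; 32];

/// Result of trying to enqueue a pending notarized block.
#[derive(Debug, PartialEq, Eq)]
enum PushOutcome {
    /// The block was added to the queue.
    Queued,
    /// The block is already pending and was ignored.
    Duplicate,
    /// The hard cap is reached; the caller is expected to log an error
    /// and start a graceful shutdown.
    CapReached,
}

/// FIFO queue of pending notarized blocks with dedup and a hard cap.
struct PendingNotarized {
    /// Stand-in for finalizer_pending_notarized_max (defaults to 1000 in the PR).
    max: usize,
    order: VecDeque<Digest>,
    seen: HashSet<Digest>,
}

impl PendingNotarized {
    fn new(max: usize) -> Self {
        Self { max, order: VecDeque::new(), seen: HashSet::new() }
    }

    fn push(&mut self, digest: Digest) -> PushOutcome {
        // Dedup first, so re-sent blocks never count against the cap.
        if self.seen.contains(&digest) {
            return PushOutcome::Duplicate;
        }
        if self.order.len() >= self.max {
            return PushOutcome::CapReached;
        }
        self.seen.insert(digest);
        self.order.push_back(digest);
        PushOutcome::Queued
    }

    /// Removes the oldest pending block, e.g. after it was executed.
    fn pop(&mut self) -> Option<Digest> {
        let digest = self.order.pop_front()?;
        self.seen.remove(&digest);
        Some(digest)
    }
}
```

Checking for duplicates before checking the cap means a full queue still accepts repeated notifications for blocks it already holds, and only a genuinely new block triggers the shutdown path.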

