fix: wait for pubsub listeners before reconnect #1253
* master: fix: various correctness related enhancements (#1248)
@coderabbitai review
✅ Actions performed
Review triggered.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs`:
- Around line 224-233: The calls that use .expect(...) when locking
subscriptions and program_subs (the code that initializes account_subs and
program_subs by calling subscriptions.lock() and program_subs.lock()) must not
panic on PoisonError; replace those .expect(...) usages with recoverable
handling that maps the PoisonError into a RemoteAccountProviderError and then
propagate or send that error from the actor instead of panicking. Concretely,
change subscriptions.lock().expect(...) and program_subs.lock().expect(...) to
subscriptions.lock().map_err(|e|
RemoteAccountProviderError::MutexPoisoned(format!("{}", e)))? (or similar) and
return or send that RemoteAccountProviderError to the caller/actor mailbox; make
the same change for the other occurrences around the 418-423 region so all mutex
poison paths propagate RemoteAccountProviderError rather than calling expect.
- Around line 235-253: The current loops serially call and await
Self::cancel_and_wait_for_stream_drop for entries in account_subs and
program_subs, causing N×timeout reconnect latency; change the logic to first
invoke cancellation for all subs without awaiting (collecting the returned
futures) and then await them concurrently (e.g. via futures::future::join_all or
FuturesUnordered) while still capturing the first error into first_error;
operate on the same identifiers (account_subs, program_subs, client_id,
cancel_and_wait_for_stream_drop, first_error) so cancellations run in parallel
and overall wait is bounded by a single timeout rather than multiplied by N.
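The first finding can be illustrated with a minimal, standard-library-only sketch. The `RemoteAccountProviderError` enum, the `Subs` alias, and the `snapshot_keys` and `poison` helpers below are hypothetical stand-ins, not the crate's real definitions; only the `lock().map_err(...)?` pattern is taken from the comment above.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Hypothetical stand-in for the crate's error type. The review suggests a
// `MutexPoisoned` variant; its real shape is an assumption here.
#[derive(Debug, PartialEq)]
enum RemoteAccountProviderError {
    MutexPoisoned(String),
}

// Hypothetical subscription map: subscription key -> listener id.
type Subs = Arc<Mutex<HashMap<String, u64>>>;

/// Snapshot the subscription keys. A poisoned mutex becomes an `Err`
/// that the caller can propagate, instead of a panic via `.expect(...)`.
fn snapshot_keys(subs: &Subs) -> Result<Vec<String>, RemoteAccountProviderError> {
    let guard = subs
        .lock()
        .map_err(|e| RemoteAccountProviderError::MutexPoisoned(e.to_string()))?;
    let mut keys: Vec<String> = guard.keys().cloned().collect();
    keys.sort();
    Ok(keys)
}

/// Poison the mutex by panicking in another thread while holding the lock.
fn poison(subs: &Subs) {
    let subs = Arc::clone(subs);
    let _ = std::thread::spawn(move || {
        let _guard = subs.lock().unwrap();
        panic!("simulated listener panic");
    })
    .join();
}
```

Calling `snapshot_keys` on a healthy map returns the sorted keys; after `poison` has run, the same call returns `Err(RemoteAccountProviderError::MutexPoisoned(_))`, which an actor could send back to its caller rather than aborting.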
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: e6267bb9-b27d-46e7-acf4-559ddef8ddc6
📒 Files selected for processing (3)
magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs
magicblock-chainlink/src/remote_account_provider/pubsub_common.rs
magicblock-chainlink/src/remote_account_provider/pubsub_connection_pool.rs
* master: fix: disable Chainlink for replicas (#1238)
GabrielePicco left a comment
LGTM overall. Leaving two reconnect edge cases to tighten.
Amp-Thread-ID: https://ampcode.com/threads/T-019e9138-db56-779f-adf3-fed4d386eb09 Co-authored-by: Amp <amp@ampcode.com>
* master:
  fix: reject short account responses (#1290)
  fix: wait for pubsub listeners before reconnect (#1253)
  fix: retry failed program subscriptions (#1268)
  release: 0.12.0 (#1299)
  Ignore compute unit price in processor fees (#1298)
  Recover recent pending intents on restart (#1296)
  chore: adjust log level (#1297)
  release: v0.11.4 (#1292)
  fix(scheduler): remove block subscription in the scheduler (#1293)
  fix(committor): race-condition on cleanup (#1291)
  Handle ALT extend invalid instruction data (#1287)
  fix: use provided compute limits instead of defaults (#1289)
  feat: snapshot accountsdb even in the replica mode (#1282)
  feat: added vrf ephemeral test queue, delegation record and metadata for mb-test-validator (#1281)
  fix: use wire size (1232), not encoded size (1644), for tx fit checks! (#1285)
  chore: simplify, rename out_of_order_slot and add a comment (#1284)
  fix: execute post-delegation actions after clone (#1278)
  Handle oversized single-stage committor transactions (#1277)
  Reduce committor RPC confirmation calls (#1271)
  fix: preserve streams on optimize failure (#1273)
* master:
  fix: reject short account responses (#1290)
  fix: wait for pubsub listeners before reconnect (#1253)
  fix: retry failed program subscriptions (#1268)
  release: 0.12.0 (#1299)
  Ignore compute unit price in processor fees (#1298)
  Recover recent pending intents on restart (#1296)
  chore: adjust log level (#1297)
  release: v0.11.4 (#1292)
  fix(scheduler): remove block subscription in the scheduler (#1293)
  fix(committor): race-condition on cleanup (#1291)
  Handle ALT extend invalid instruction data (#1287)
  fix: use provided compute limits instead of defaults (#1289)
  feat: snapshot accountsdb even in the replica mode (#1282)
  feat: added vrf ephemeral test queue, delegation record and metadata for mb-test-validator (#1281)
  fix: use wire size (1232), not encoded size (1644), for tx fit checks! (#1285)
Summary
Fix pubsub reconnect lifetime safety by ensuring reconnect waits for subscription listener tasks to stop and drop their streams before pooled pubsub clients are replaced.
Details
magicblock-chainlink
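This changes the reconnect path from best-effort cancellation to a two-phase drain: subscription listeners are cancelled first, and reconnect then waits for them to stop and drop their streams.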
The pubsub pool reconnect docs now call out the required precondition that old listener streams must be finished before pooled clients are dropped.
Additional tests cover successful account/program listener drain, completion timeout behavior, and the fast explicit-unsubscribe path preserving the map entry for reconnect drain.
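The drain described above can be sketched with standard-library threads and channels. This is an illustration under stated assumptions, not the PR's implementation: the real code is async and lives in `chain_pubsub_actor.rs`, whereas the `Listener` struct and the `spawn_listener` and `drain` functions here are hypothetical names, and OS threads stand in for listener tasks. The sketch cancels every listener first and then waits for all of them against one shared deadline, so the total wait does not grow with the number of listeners.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{self, Receiver};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical handle for one listener: a cancel flag plus a channel that
/// reports when the listener has dropped its stream.
struct Listener {
    cancel: Arc<AtomicBool>,
    done: Receiver<()>,
}

/// Spawn a listener thread that owns a stand-in "stream", polls the cancel
/// flag, and takes `shutdown_delay` to wind down once cancelled.
fn spawn_listener(shutdown_delay: Duration) -> Listener {
    let cancel = Arc::new(AtomicBool::new(false));
    let (done_tx, done) = mpsc::channel::<()>();
    let flag = Arc::clone(&cancel);
    thread::spawn(move || {
        let stream = String::from("stream"); // stand-in for a pubsub stream
        while !flag.load(Ordering::Acquire) {
            thread::sleep(Duration::from_millis(1));
        }
        thread::sleep(shutdown_delay);
        drop(stream); // the stream is gone before completion is signalled
        let _ = done_tx.send(());
    });
    Listener { cancel, done }
}

/// Phase 1: cancel every listener without waiting.
/// Phase 2: wait for each one against a single shared deadline.
/// Returns how many listeners had not finished when the deadline passed.
fn drain(listeners: &[Listener], timeout: Duration) -> usize {
    for listener in listeners {
        listener.cancel.store(true, Ordering::Release);
    }
    let deadline = Instant::now() + timeout;
    let mut timed_out = 0;
    for listener in listeners {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match listener.done.recv_timeout(remaining) {
            Err(mpsc::RecvTimeoutError::Timeout) => timed_out += 1,
            // A message or a closed channel both mean the listener ended.
            _ => {}
        }
    }
    timed_out
}
```

A caller would replace the pooled clients only when `drain` returns `0`; a non-zero result corresponds to the completion-timeout case that the added tests exercise.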