frank, set-identity: mitigate switch-while-leader bugs#10144
frank, set-identity: mitigate switch-while-leader bugs#10144jherrera-jump wants to merge 1 commit into
Conversation
c8442f5 to
8bd15a6
Compare
There was a problem hiding this comment.
Pull request overview
Mitigates identity switching hazards around leadership transitions in the Frankendancer pipeline by deferring key switches until sufficiently far from both the previous and next leader slots, and making the GUI more resilient to slot event reordering/duplicates caused by set-identity rewinds.
Changes:
- PoH tile: tighten
set-identityapplication conditions with a leader-slot guard window and simplifymaybe_change_identitycall sites. - Replay tile: add a similar leader-slot guard window before applying pending identity switches.
- GUI: handle duplicate/reordered/missed
SLOT_END/SLOT_STARTevents by finalizing or ignoring appropriately (instead of hard error).
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/discoh/pohh/fd_pohh_tile.c | Defers identity switching until safely away from leadership transitions; updates call sites accordingly. |
| src/discof/replay/fd_replay_tile.c | Adds leader-slot guard window before applying a pending identity switch. |
| src/disco/gui/fd_gui.c | Makes slot bookkeeping tolerant to stale/duplicate/reordered slot start/end events caused by identity rewind scenarios. |
8bd15a6 to
b3cfb0e
Compare
b3cfb0e to
d598d4d
Compare
d598d4d to
94b023b
Compare
94b023b to
e52588f
Compare
e52588f to
65fc59a
Compare
Greptile SummaryThis PR hardens the identity-switch (
Confidence Score: 4/5This PR is a defensive hardening change that replaces crashes with graceful recovery and adds safety windows around leader rotations; safe to merge. All three files make logically sound changes. The GUI gracefully handles slot_start/slot_end mismatches instead of crashing. The leader proximity checks in replay and pohh tiles correctly inspect both the current and pending identity's leader schedule, with proper NULL/ULONG_MAX sentinel handling. The recursive call in fd_gui_handle_slot_end is bounded to depth 2. The only concerns are stylistic (code duplication, long lines). The byte offsets for reading the pending identity public key are correct for each tile's keyswitch layout (replay gets pubkey at offset 0 from set_identity.c:224; pohh gets full keypair at offset 0 from set_identityh.c:173, with pubkey at offset 32). No files require special attention; the most complex change is in fd_gui.c with the recursive slot_end handling, which has been verified to be bounded and correct. Important Files Changed
Sequence DiagramsequenceDiagram
participant Op as Operator (set-identity)
participant KS as Keyswitch
participant Replay as Replay Tile
participant PoH as PoH Tile
participant GUI as GUI
Op->>KS: Write keypair, set SWITCH_PENDING
Note over Replay: maybe_switch_identity()
Replay->>KS: Query state == SWITCH_PENDING?
Replay->>Replay: Check epoch boundary safety
Replay->>Replay: keyswitch_near_leader(current_id)?
Replay->>Replay: keyswitch_near_leader(pending_id)?
alt Near leader or epoch boundary
Replay-->>Replay: Defer switch (return early)
else Clear of leader rotation
Replay->>KS: Copy pubkey, set COMPLETED
end
Note over PoH: maybe_change_identity()
PoH->>KS: Query state == SWITCH_PENDING?
PoH->>PoH: Check epoch boundary safety
PoH->>PoH: keyswitch_near_leader(current_id)?
PoH->>PoH: keyswitch_near_leader(pending_id)?
alt Near leader or epoch boundary
PoH-->>PoH: Defer switch (return 0)
else Clear of leader rotation
PoH->>PoH: fd_ext_admin_rpc_set_identity()
PoH->>PoH: Rewind PoH to reset_slot
PoH->>KS: Set COMPLETED
end
Note over GUI: handle_slot_start/end relaxed
GUI->>GUI: slot_start with stale open slot?
GUI->>GUI: Finalize prior slot (call slot_end)
GUI->>GUI: slot_end mismatch?
GUI->>GUI: LOG_WARNING (not ERR/crash)
Reviews (1): Last reviewed commit: "frank, set-identity: mitigate switch-whi..." | Re-trigger Greptile |
| /* keyswitch_near_leader returns 1 if identity has any leader slots | ||
| within FD_EPOCH_SLOTS_PER_ROTATION slots of cur_slot. */ | ||
| static inline int | ||
| keyswitch_near_leader( fd_multi_epoch_leaders_t const * mleaders, | ||
| ulong cur_slot, | ||
| fd_pubkey_t const * identity ) { | ||
| ulong lo = fd_ulong_if( cur_slot>=FD_EPOCH_SLOTS_PER_ROTATION, cur_slot-FD_EPOCH_SLOTS_PER_ROTATION, 0UL ); | ||
| ulong next = fd_multi_epoch_leaders_get_next_slot( mleaders, lo, identity ); | ||
| return next!=ULONG_MAX && next<=cur_slot+FD_EPOCH_SLOTS_PER_ROTATION; | ||
| } |
There was a problem hiding this comment.
Duplicated helper across tiles
keyswitch_near_leader is defined identically in both fd_replay_tile.c and fd_pohh_tile.c (line 1252). Consider extracting it into a shared header (e.g., alongside the fd_multi_epoch_leaders API in fd_multi_epoch_leaders.h) to avoid drift if the logic is ever updated in only one place.
| gives them a slightly wider window to safely switch. */ | ||
| fd_pubkey_t const * pending_identity = fd_type_pun_const( ctx->keyswitch->bytes ); | ||
| fd_epoch_leaders_t const * cur_lsched = fd_multi_epoch_leaders_get_lsched_for_slot( ctx->mleaders, ctx->reset_slot ); | ||
| if( FD_UNLIKELY( cur_lsched && ( ctx->reset_slot < cur_lsched->slot0 + FD_EPOCH_SLOTS_PER_ROTATION || ctx->reset_slot >= cur_lsched->slot0 + fd_ulong_sat_sub( cur_lsched->slot_cnt, FD_EPOCH_SLOTS_PER_ROTATION ) ) ) ) return; |
There was a problem hiding this comment.
65fc59a to
eb83b99
Compare
Uh oh!
There was an error while loading. Please reload this page.