fix(keystone): recover task state after axon reconnect #84
Merged
Pull Request Checklist
Please ensure your PR meets the following requirements:
Summary
This PR fixes Axon recorder/transfer reconnect handling after stale WebSocket connections and reconciles Keystone task state when the edge process reconnects with its current recorder state.
It improves recovery for sudden process stops, robot power loss, and ping-timeout reconnect windows without introducing the larger durable upload-intent model.
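The reconciliation described above can be sketched as a small pure function. This is an illustration under stated assumptions, not the code in this PR: the type names, constants, and the `reconcile` helper are hypothetical, and the `pending` to `ready` transition on a `ready` recorder is an assumed mapping. Only the move of `pending` or `ready` tasks into `in_progress` is taken from the change list.

```go
package main

import "fmt"

// TaskStatus and RecorderState are hypothetical names used only for this sketch.
type TaskStatus string
type RecorderState string

const (
	TaskPending    TaskStatus = "pending"
	TaskReady      TaskStatus = "ready"
	TaskInProgress TaskStatus = "in_progress"

	RecorderReady     RecorderState = "ready"
	RecorderRecording RecorderState = "recording"
)

// reconcile returns the task status implied by the recorder state that the
// edge process reports after reconnecting, and whether the status changed.
// Tasks never move backwards: an in_progress task is left untouched.
func reconcile(current TaskStatus, reported RecorderState) (TaskStatus, bool) {
	switch reported {
	case RecorderReady:
		// Assumed mapping: a ready recorder promotes a pending task to ready.
		if current == TaskPending {
			return TaskReady, true
		}
	case RecorderRecording:
		// A recording recorder means the task has effectively started.
		if current == TaskPending || current == TaskReady {
			return TaskInProgress, true
		}
	}
	return current, false
}

func main() {
	next, changed := reconcile(TaskPending, RecorderRecording)
	fmt.Println(next, changed)
}
```

Keeping the decision in a pure function makes each transition easy to cover in table-driven tests, independent of the WebSocket handlers.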
Motivation
- `kill -STOP` can leave Keystone holding a stale recorder or transfer WebSocket.
- A task could remain `pending`; when Axon later reconnected in `ready` or `recording`, Keystone did not always reconcile the task back to the expected state.
Changes
Modified Files
- [internal/api/handlers/axon_rpc.go](internal/api/handlers/axon_rpc.go) - Adds recorder stale-connection takeover, ping timeout close handling, cleaner WebSocket-close logging, and task state reconciliation from recorder `state_update` events.
- [internal/api/handlers/task.go](internal/api/handlers/task.go) - Allows recording start callbacks to reconcile `pending` or `ready` tasks into `in_progress`.
- [internal/api/handlers/transfer.go](internal/api/handlers/transfer.go) - Adds transfer stale-connection takeover, ping timeout close handling, cleaner WebSocket-close logging, and upload ACK completion from `pending`, `ready`, or `in_progress`.
- [internal/config/config.go](internal/config/config.go) - Adds recorder and transfer ping timeout and stale-threshold configuration defaults.
- [internal/services/hub.go](internal/services/hub.go) - Adds generic stale connection replacement based on connection `LastSeenAt`.
- [internal/services/recorder_hub.go](internal/services/recorder_hub.go) - Exposes recorder connection `LastSeenAt` and stale-threshold connect support.
- [internal/services/transfer_hub.go](internal/services/transfer_hub.go) - Exposes transfer connection `LastSeenAt` and stale-threshold connect support.
Added Files
- [internal/api/handlers/task_state_recovery_test.go](internal/api/handlers/task_state_recovery_test.go) - Covers task reconciliation from recorder ready/recording/start callback paths.
- [internal/api/handlers/websocket_log.go](internal/api/handlers/websocket_log.go) - Centralizes filtering of expected WebSocket close errors.
- [internal/services/hub_test.go](internal/services/hub_test.go) - Covers stale-connection rejection/replacement behavior and stale handler disconnect safety.
Deleted Files
None.
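For reviewers who want a mental model of the stale-connection takeover, here is a minimal sketch of replacement based on `LastSeenAt`. It is not the code in `internal/services/hub.go`: the `Hub`, `conn`, `Connect`, `Disconnect`, and `at` names are hypothetical, and the 30-second threshold is an assumed value rather than the configured default.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// defaultStaleThreshold is an assumed value for this sketch only.
const defaultStaleThreshold = 30 * time.Second

var ErrAlreadyConnected = errors.New("device already connected")

// conn stands in for a WebSocket connection (hypothetical type).
type conn struct {
	id         string
	lastSeenAt time.Time
}

// Hub tracks at most one live connection per device.
type Hub struct {
	mu             sync.Mutex
	conns          map[string]*conn
	staleThreshold time.Duration
}

func NewHub(staleThreshold time.Duration) *Hub {
	return &Hub{conns: make(map[string]*conn), staleThreshold: staleThreshold}
}

// Connect registers c for deviceID. An existing connection is replaced only
// if it has been silent for at least staleThreshold; otherwise the new
// connection is rejected. The replaced connection, if any, is returned so
// the caller can close it.
func (h *Hub) Connect(deviceID string, c *conn, now time.Time) (*conn, error) {
	h.mu.Lock()
	defer h.mu.Unlock()
	var replaced *conn
	if old, ok := h.conns[deviceID]; ok {
		if now.Sub(old.lastSeenAt) < h.staleThreshold {
			return nil, ErrAlreadyConnected
		}
		replaced = old
	}
	c.lastSeenAt = now
	h.conns[deviceID] = c
	return replaced, nil
}

// Disconnect removes c only if it is still the registered connection, so a
// stale handler that exits late cannot evict its replacement.
func (h *Hub) Disconnect(deviceID string, c *conn) bool {
	h.mu.Lock()
	defer h.mu.Unlock()
	if h.conns[deviceID] != c {
		return false
	}
	delete(h.conns, deviceID)
	return true
}

// at returns a fixed instant sec seconds after the Unix epoch, which keeps
// the demo deterministic.
func at(sec int64) time.Time { return time.Unix(sec, 0) }

func main() {
	h := NewHub(defaultStaleThreshold)
	a, b := &conn{id: "a"}, &conn{id: "b"}
	h.Connect("robot-1", a, at(0))
	_, err := h.Connect("robot-1", b, at(10)) // a was seen recently: rejected
	fmt.Println("early reconnect:", err)
	old, err := h.Connect("robot-1", b, at(60)) // a is stale: b takes over
	fmt.Println("late reconnect replaced:", old.id, "err:", err)
	fmt.Println("stale handler removed entry:", h.Disconnect("robot-1", a))
}
```

The identity check in `Disconnect` is the part that corresponds to the "stale handler disconnect safety" case: the old handler's cleanup becomes a no-op once a replacement has been registered.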
Type of Change
Impact Analysis
Breaking Changes
None.
Backward Compatibility
Fully backward compatible. Existing recorder and transfer WebSocket protocols are unchanged.
Testing
Test Environment
Local Keystone worktree with `GOCACHE=/tmp/keystone-go-build-cache`.
Test Cases
Manual Testing Steps
Test Coverage
Commands run:
Screenshots / Recordings
Not applicable.
Performance Impact
Documentation
Related Issues
Additional Notes
Reviewers
Notes for Reviewers
- [internal/services/hub.go](internal/services/hub.go)
- [internal/api/handlers/axon_rpc.go](internal/api/handlers/axon_rpc.go)
- [internal/api/handlers/transfer.go](internal/api/handlers/transfer.go)
Checklist for Reviewers