
Harden recorder state sync and cloud sync scheduling #85

Merged
shark0F0497 merged 12 commits into main from dev/ws-connection-takeover on May 29, 2026
Conversation

@shark0F0497
Collaborator

Pull Request Checklist

Please ensure your PR meets the following requirements:

  • Code follows the style guidelines
  • Tests pass locally
  • Code is formatted
  • Documentation updated if needed
  • Commit messages follow conventional commits
  • PR description is complete and clear

Summary

This PR hardens recorder/transfer state coordination around reconnects, timeouts, and connection takeover, adds device state streaming support, improves timeout logging, and introduces an opt-in cloud sync auto-scan toggle.


Motivation

  • Prevent unsafe task configuration while recorder state is stale, syncing, or owned by a replaced connection.
  • Improve recovery when recorder or transfer connections reconnect after transient failures.
  • Make state changes observable to clients through a device state stream.
  • Keep newly approved cloud-sync-eligible episodes local by default while preserving explicit manual sync and retry recovery.
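
The stale-connection protection described above can be sketched with a per-device generation counter. This is a minimal illustration, not the code in recorder_hub.go or transfer_hub.go: the Hub type, its methods, and the "device-1" ID are all hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// Hub is a hypothetical registry that keeps at most one live connection per
// device. Each connection is identified by a generation number.
type Hub struct {
	mu    sync.Mutex
	conns map[string]uint64 // device ID -> generation of the current owner
	next  uint64
}

func NewHub() *Hub { return &Hub{conns: make(map[string]uint64)} }

// Register installs a new connection for the device and returns its
// generation. Any earlier connection for the same device is replaced.
func (h *Hub) Register(deviceID string) uint64 {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.next++
	h.conns[deviceID] = h.next
	return h.next
}

// IsCurrent reports whether gen still owns the device. Handlers can call it
// before acting on a message so that replaced connections are ignored.
func (h *Hub) IsCurrent(deviceID string, gen uint64) bool {
	h.mu.Lock()
	defer h.mu.Unlock()
	cur, ok := h.conns[deviceID]
	return ok && cur == gen
}

// Unregister removes the device only if gen is still the owner, so a late
// cleanup from a replaced connection cannot evict its successor.
func (h *Hub) Unregister(deviceID string, gen uint64) bool {
	h.mu.Lock()
	defer h.mu.Unlock()
	if h.conns[deviceID] != gen {
		return false
	}
	delete(h.conns, deviceID)
	return true
}

func main() {
	hub := NewHub()
	old := hub.Register("device-1")
	cur := hub.Register("device-1") // reconnect takes over the device

	fmt.Println(hub.IsCurrent("device-1", old))  // false
	fmt.Println(hub.IsCurrent("device-1", cur))  // true
	fmt.Println(hub.Unregister("device-1", old)) // false
}
```

Comparing generations rather than connection pointers keeps the check cheap and makes the "replaced" case explicit in tests.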

Changes

Modified Files

  • internal/api/handlers/axon_rpc.go - Harden recorder RPC handling, state confidence, reconnect reconciliation, and stale state protections.
  • internal/api/handlers/task.go - Improve task callback and recorder/transfer coordination behavior.
  • internal/api/handlers/transfer.go - Handle transfer connection takeover and replaced messages more safely.
  • internal/services/recorder_hub.go - Track recorder state and connection lifecycle more defensively.
  • internal/services/transfer_hub.go - Add transfer hub support for connection takeover and stale connection handling.
  • internal/services/device_state_broker.go - Add device state broadcast infrastructure.
  • internal/api/handlers/device_state_stream.go - Expose device state stream handling.
  • internal/cloud/* - Add timeout logging helpers and improve cloud client timeout visibility.
  • internal/config/config.go - Add KEYSTONE_SYNC_AUTO_SCAN_ENABLED, defaulting to false.
  • internal/services/sync_worker.go - Gate only newly eligible episode auto-discovery behind the new auto-scan setting while keeping pending/retry processing active.
  • internal/api/handlers/sync.go - Return auto_scan_enabled from /api/v1/sync/config.
  • cmd/keystone-edge/main.go - Pass the auto-scan setting into the sync worker and include it in startup logs.
  • README.md and ARCHITECTURE.md - Document manual-first cloud sync scheduling.
  • docs/designs/* - Document recorder interaction tests and cloud sync scheduling behavior.
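
For readers unfamiliar with the broadcast pattern behind a device state stream, the following is a rough sketch of a fan-out broker. It is not the contents of device_state_broker.go: the DeviceState fields, the Broker API, and the drop-on-full policy are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// DeviceState is a hypothetical snapshot; the real fields are not listed in
// this PR description.
type DeviceState struct {
	DeviceID string
	Recorder string
}

// Broker fans each published state out to every subscriber.
type Broker struct {
	mu   sync.Mutex
	subs map[int]chan DeviceState
	next int
}

func NewBroker() *Broker { return &Broker{subs: make(map[int]chan DeviceState)} }

// Subscribe returns a buffered channel and a cancel function that removes the
// subscription and closes the channel. Calling cancel twice is safe.
func (b *Broker) Subscribe(buf int) (<-chan DeviceState, func()) {
	b.mu.Lock()
	defer b.mu.Unlock()
	id := b.next
	b.next++
	ch := make(chan DeviceState, buf)
	b.subs[id] = ch
	cancel := func() {
		b.mu.Lock()
		defer b.mu.Unlock()
		if c, ok := b.subs[id]; ok {
			delete(b.subs, id)
			close(c)
		}
	}
	return ch, cancel
}

// Publish delivers s without blocking and returns how many subscribers
// accepted it. A subscriber whose buffer is full misses the update
// (an assumed policy, chosen so one slow client cannot stall the others).
func (b *Broker) Publish(s DeviceState) int {
	b.mu.Lock()
	defer b.mu.Unlock()
	delivered := 0
	for _, ch := range b.subs {
		select {
		case ch <- s:
			delivered++
		default:
		}
	}
	return delivered
}

func main() {
	b := NewBroker()
	ch, cancel := b.Subscribe(1)
	defer cancel()

	b.Publish(DeviceState{DeviceID: "device-1", Recorder: "recording"})
	fmt.Println(<-ch) // {device-1 recording}
}
```

Sends and closes both happen under the same mutex, so Publish never writes to a channel that cancel has already closed.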

Added Files

  • internal/api/handlers/recorder_axon_interaction_test.go - Adds recorder/Axon interaction coverage.
  • internal/api/handlers/task_state_recovery_test.go - Adds task state recovery coverage.
  • internal/api/handlers/transfer_connection_takeover_test.go - Adds transfer connection takeover coverage.
  • internal/services/device_state_broker_test.go - Adds device state broker tests.
  • internal/cloud/timeout_log.go - Adds shared timeout logging helpers.
  • docs/designs/recorder-axon-interaction-test-plan.md - Documents recorder interaction test coverage.
  • docs/designs/recorder-axon-interaction-tests.html - Adds human-readable recorder interaction test documentation.
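
As an illustration of what a shared timeout logging helper might look like, here is a small sketch built only on the standard library. The function names, the log line format, and the "upload" operation label are hypothetical; the actual helpers in internal/cloud/timeout_log.go are not shown in this description.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"time"
)

// isTimeout reports whether err looks like a timeout: either a context
// deadline or a net.Error whose Timeout method returns true.
func isTimeout(err error) bool {
	if err == nil {
		return false
	}
	if errors.Is(err, context.DeadlineExceeded) {
		return true
	}
	var ne net.Error
	return errors.As(err, &ne) && ne.Timeout()
}

// describeTimeout formats a log line for timeout errors and returns an empty
// string for anything else, so callers can log only when it is non-empty.
func describeTimeout(op string, elapsed time.Duration, err error) string {
	if !isTimeout(err) {
		return ""
	}
	return fmt.Sprintf("cloud timeout: op=%s elapsed=%s err=%v", op, elapsed, err)
}

// sampleTimeoutLine forces a context deadline and formats the resulting error.
func sampleTimeoutLine() string {
	start := time.Now()
	ctx, cancel := context.WithTimeout(context.Background(), time.Millisecond)
	defer cancel()
	<-ctx.Done() // wait for the deadline to expire
	return describeTimeout("upload", time.Since(start), ctx.Err())
}

func main() {
	fmt.Println(sampleTimeoutLine())
}
```

Checking both errors.Is and net.Error covers timeouts raised by a request context as well as those raised by the underlying socket.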

Deleted Files

None.


Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update (documentation changes only)
  • Refactoring (code improvement without functional changes)
  • Performance improvement (code changes that improve performance)
  • Test changes (adding, modifying, or removing tests)

Impact Analysis

Breaking Changes

None.

Backward Compatibility

No API or configuration changes are required. Cloud sync remains available when KEYSTONE_SYNC_ENABLED=true; the one behavioral change is that automatic discovery of newly eligible episodes is now opt-in through KEYSTONE_SYNC_AUTO_SCAN_ENABLED=true.
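
A minimal sketch of how the two flags could be read, assuming both are parsed as booleans from the environment. The SyncConfig struct, the envBool helper, and the false default for KEYSTONE_SYNC_ENABLED are assumptions for this example; only the false default for KEYSTONE_SYNC_AUTO_SCAN_ENABLED is stated in this PR.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// SyncConfig is a hypothetical holder for the two sync flags.
type SyncConfig struct {
	Enabled         bool
	AutoScanEnabled bool
}

// envBool reads key through lookup and falls back to def when the variable is
// unset or cannot be parsed as a boolean.
func envBool(lookup func(string) (string, bool), key string, def bool) bool {
	raw, ok := lookup(key)
	if !ok {
		return def
	}
	v, err := strconv.ParseBool(raw)
	if err != nil {
		return def
	}
	return v
}

// loadSyncConfig takes the lookup function as a parameter so tests can supply
// a fake environment instead of mutating the real one.
func loadSyncConfig(lookup func(string) (string, bool)) SyncConfig {
	return SyncConfig{
		// Assumed default; the PR does not state it.
		Enabled: envBool(lookup, "KEYSTONE_SYNC_ENABLED", false),
		// Stated in the PR: auto-scan is off unless explicitly enabled.
		AutoScanEnabled: envBool(lookup, "KEYSTONE_SYNC_AUTO_SCAN_ENABLED", false),
	}
}

func main() {
	cfg := loadSyncConfig(os.LookupEnv)
	fmt.Printf("sync enabled=%t auto_scan_enabled=%t\n", cfg.Enabled, cfg.AutoScanEnabled)
}
```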


Testing

Test Environment

  • Ubuntu 24.04
  • Go toolchain at /usr/local/go/bin/go

Test Cases

  • Unit tests pass locally
  • Integration tests pass locally
  • E2E tests pass (if applicable)
  • Manual testing completed

Manual Testing Steps

  • Ran /usr/local/go/bin/go fmt ./...
  • Ran /usr/local/go/bin/go test ./...
  • Verified git diff --check passes.

Test Coverage

  • New tests added
  • Existing tests updated
  • Coverage maintained or improved

Screenshots / Recordings

Not applicable.


Performance Impact

  • Memory usage: No expected material change
  • CPU usage: No expected material change
  • Throughput: No expected material change
  • Lock contention: No expected material change

Documentation


Related Issues

  • Fixes #
  • Related to #
  • Refers to #

Additional Notes

  • KEYSTONE_SYNC_AUTO_SCAN_ENABLED defaults to false, so newly approved unsynced episodes are no longer auto-discovered unless explicitly enabled.
  • Pending sync rows and due retries are still processed even when auto-scan is disabled.
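
The split described in these notes can be pictured as follows. This is only a sketch of the gating idea, not the logic in sync_worker.go: the Store interface, planTick, memStore, and the episode IDs are hypothetical.

```go
package main

import "fmt"

// Store is a hypothetical view of the sync tables.
type Store interface {
	PendingAndDueRetries() []string // episodes already queued or due for retry
	NewlyEligible() []string        // approved episodes with no sync row yet
}

// planTick returns the episode IDs one poll would process. Pending rows and
// due retries are always included; newly eligible episodes are added only
// when auto-scan is enabled.
func planTick(s Store, autoScan bool) []string {
	work := append([]string(nil), s.PendingAndDueRetries()...)
	if autoScan {
		work = append(work, s.NewlyEligible()...)
	}
	return work
}

// memStore is an in-memory Store used for the demonstration.
type memStore struct {
	pending  []string
	eligible []string
}

func (m memStore) PendingAndDueRetries() []string { return m.pending }
func (m memStore) NewlyEligible() []string        { return m.eligible }

func main() {
	s := memStore{pending: []string{"ep-1"}, eligible: []string{"ep-2"}}
	fmt.Println(planTick(s, false)) // [ep-1]
	fmt.Println(planTick(s, true))  // [ep-1 ep-2]
}
```

Under this shape, a manual sync request only needs to create a pending row; the worker picks it up on the next poll regardless of the auto-scan setting.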

Reviewers

@


Notes for Reviewers

  • Please review recorder state confidence and reconnect behavior in internal/api/handlers/axon_rpc.go and internal/services/recorder_hub.go.
  • Please review the sync worker polling split in internal/services/sync_worker.go to confirm manual sync and retry recovery remain active while auto-scan is disabled.

Checklist for Reviewers

  • Code changes are correct and well-implemented
  • Tests are adequate and pass
  • Documentation is updated and accurate
  • No unintended side effects
  • Performance impact is acceptable
  • Backward compatibility maintained (if applicable)

@shark0F0497 shark0F0497 merged commit 1e67573 into main May 29, 2026
5 checks passed
@shark0F0497 shark0F0497 deleted the dev/ws-connection-takeover branch May 29, 2026 06:13
