feat(contrail): add refresh phase to contrail:sync #584

Merged
tompscanlan merged 1 commit into main from feat/contrail-sync-refresh
May 18, 2026

@tompscanlan

Summary

Extends npm run contrail:sync from discover → backfill to discover → backfill → refresh. The third phase re-walks every known DID's PDS, compares records against our local DB, and applies anything missing or stale.
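The phase ordering can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the script in this PR: `SyncClient`, `RefreshStats`, the `discover`/`backfill` return values, and the Discovery/Backfill log wording are hypothetical stand-ins. Only the `refresh` method name and the "Refreshed: ..." summary format are taken from this description, and the lib's real signatures may differ.

```typescript
// Hypothetical result shape; the lib's actual Refresh* types may differ.
interface RefreshStats {
  missing: number;
  stale: number;
  usersScanned: number;
}

// Hypothetical client surface standing in for the Contrail instance.
interface SyncClient {
  discover(): Promise<number>; // assumed: count of DIDs discovered
  backfill(): Promise<number>; // assumed: count of records backfilled
  refresh(): Promise<RefreshStats>;
}

// Runs the three phases strictly in order and logs one summary line per phase.
async function runSync(
  client: SyncClient,
  log: (line: string) => void = console.log,
): Promise<RefreshStats> {
  const discovered = await client.discover();
  log(`Discovery: ${discovered} DIDs`);

  const backfilled = await client.backfill();
  log(`Backfill: ${backfilled} records`);

  // Refresh runs last so it sees every DID found by the earlier phases.
  const stats = await client.refresh();
  log(
    `Refreshed: ${stats.missing} missing + ${stats.stale} stale (${stats.usersScanned} users scanned)`,
  );
  return stats;
}
```

Awaiting each phase before starting the next keeps the log order deterministic, which is what the test plan below checks for.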

Why now: dev confirmed today that the live-ingest path catches new writes but has no mechanism to repair gaps from Jetstream cursor expiry (multi-day outage scenarios) or transient ingest failures. The contrail library already exposes Contrail.refresh() for exactly this — we just weren't calling it.

Cost: with ~1,200 known DIDs × 2 collections at the lib's default concurrency=50, an end-to-end run takes single-digit minutes. Records inside ignoreWindowMs (default 60s) are skipped, so the bulk of recent writes does not cause repeat PDS reads.
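For illustration, the per-record decision might look like the sketch below. It assumes records are compared by CID and that the ignore window is measured against a per-record write timestamp. Neither detail is confirmed here, and `classify`, `RemoteRecord`, and `writtenAtMs` are hypothetical names rather than the lib's API.

```typescript
type Verdict = "skip" | "missing" | "stale" | "current";

// Hypothetical view of a record as listed from a PDS.
interface RemoteRecord {
  uri: string;
  cid: string;
  writtenAtMs: number; // assumed timestamp used for the ignore window
}

// localCids maps record URI -> CID currently stored in the local DB.
function classify(
  remote: RemoteRecord,
  localCids: Map<string, string>,
  nowMs: number,
  ignoreWindowMs: number = 60000, // 60s, matching the default quoted above
): Verdict {
  // Very recent writes are left to the live-ingest path.
  if (nowMs - remote.writtenAtMs < ignoreWindowMs) return "skip";

  const localCid = localCids.get(remote.uri);
  if (localCid === undefined) return "missing"; // never ingested locally
  return localCid === remote.cid ? "current" : "stale"; // CID mismatch means an outdated copy
}
```

Only "missing" and "stale" verdicts would lead to a write, which is why a second run over unchanged data should report zero of each.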

Test plan

Functional verification will land via the infra PR that:

  1. Bumps the dev image pin to a build containing this commit
  2. Un-suspends the contrail-sync CronJob in the dev overlay

After that:

  • Manually trigger one job in dev: kubectl create job --from=cronjob/contrail-sync contrail-sync-verify-$(date +%s) -n dev
  • Verify three phases log in order (Discovery, Backfill, Refresh) with summary stats
  • Confirm the "Refreshed: N missing + M stale (X users scanned)" line appears
  • Re-run job → expect 0 missing, 0 stale (idempotency confirmation)
  • Manually delete a record from contrail.records_event, re-run → expect 1 missing, record restored
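The idempotency and restore checks above could be automated against captured job logs with a small helper like this sketch. The regular expression assumes the summary line matches the "Refreshed: N missing + M stale (X users scanned)" wording exactly; the real output may differ in spacing or punctuation. `parseRefreshSummary` and `isCleanRerun` are hypothetical helper names.

```typescript
interface RefreshSummary {
  missing: number;
  stale: number;
  usersScanned: number;
}

// Extracts the first refresh summary found in the log text, or null if absent.
function parseRefreshSummary(logText: string): RefreshSummary | null {
  const m = /Refreshed: (\d+) missing \+ (\d+) stale \((\d+) users scanned\)/.exec(logText);
  if (m === null) return null;
  return {
    missing: Number(m[1]),
    stale: Number(m[2]),
    usersScanned: Number(m[3]),
  };
}

// A re-run is "clean" when the summary exists and reports nothing to repair.
function isCleanRerun(logText: string): boolean {
  const summary = parseRefreshSummary(logText);
  return summary !== null && summary.missing === 0 && summary.stale === 0;
}
```

For the deleted-record check, the same parser would be expected to return `missing: 1` on the first run after the delete and a clean result on the run after that.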

Prod opt-in is gated behind separate Track B work (monitoring + image pin + CONTRAIL_DATABASE_URL secret).

What this PR does NOT do

  • No infra changes (infra PR follows)
  • No new tests — Contrail.refresh() is exercised by the lib's own test suite; this PR is glue code around the existing public API
  • Local typecheck is bypassed via .openmeet-skip-typecheck because the activity-feed/atproto-identity spec baseline is on a parked cleanup list; production src/ typechecks clean for this change

Adds a third phase to `npm run contrail:sync`: after discover + backfill,
walk every known DID's PDS and apply records we're missing or have stale
locally. Closes drift caused by Jetstream cursor expiry (multi-day
outages) or transient ingest failures the live-ingest pod didn't recover
from. Records inside the lib's ignore window (default 60s) are skipped
so this stays cheap to run frequently.

Intended to run as a daily CronJob alongside the live-ingest Deployment
that handles the continuous case. Re-running is idempotent and bounded
by the relay's reverse-index for our NSIDs.

Also adds the missing Refresh* type declarations to contrail.d.ts so the
local Contrail class typing matches the actual lib.
@tompscanlan tompscanlan merged commit ed24d9d into main May 18, 2026
4 checks passed
@tompscanlan tompscanlan deleted the feat/contrail-sync-refresh branch May 18, 2026 18:03