fix(cd): poll for GH profile mismatch instead of fixed 12s sleep#77

Merged: cafca merged 1 commit into main from deploy/wipe-poll-loop on May 7, 2026
@cafca commented May 7, 2026

Summary

Replace the post-deploy fixed-sleep wipe detection with a 30×3s poll that exits early on either terminal signal (hash mismatch in logs, or /info answering on port 8989). Fixes the race that bit prod on #75's first deploy.

What happened

CD ran the new GH image build/push + docker compose pull && up -d cleanly. The 12s sleep then expired before the cold-boot JVM had reached GraphHopper.checkProfilesConsistency, so the grep "does not match" against docker logs --tail 80 saw nothing and the conditional wipe never fired. CD reported success, then prod GH crash-looped (Profile 'bike' does not match. Stored: 1861193009 / Configured: -828139490) until I wiped the cache by hand.

Fix

Poll up to 90s (30 × 3s) and break out on the first of:

  • Profile '...' does not match appears → wipe + reimport (existing path).
  • /info returns 200 → GH booted clean → leave cache alone.

Common-path latency unchanged: clean boots exit the loop as soon as /info answers (typically ~6–10s warm, ~15–25s cold). Mismatched boots get caught reliably regardless of how slow the JVM is.
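
For readers who want the shape of the loop without opening the diff, here is a minimal bash sketch of the polling logic described above. It is not the script from this PR: the container name `graphhopper`, the `localhost` URL, the helper function names, and the handling of the 90s timeout case are all assumptions made for illustration. Only the `docker logs --tail 80` grep, the `/info` check on port 8989, and the 30 × 3s budget come from the description.

```shell
#!/usr/bin/env bash
# Sketch only: names and defaults below are assumptions, not the PR's actual script.

GH_CONTAINER="${GH_CONTAINER:-graphhopper}"               # hypothetical container name
GH_INFO_URL="${GH_INFO_URL:-http://localhost:8989/info}"  # host is an assumption
POLL_ATTEMPTS="${POLL_ATTEMPTS:-30}"                      # 30 attempts ...
POLL_INTERVAL="${POLL_INTERVAL:-3}"                       # ... 3s apart = 90s ceiling

# Succeeds if the recent container logs contain the profile-mismatch line.
logs_show_mismatch() {
  docker logs --tail 80 "$GH_CONTAINER" 2>&1 | grep -q "does not match"
}

# Succeeds if /info answers without an HTTP error status.
info_is_up() {
  curl -fsS -o /dev/null --max-time 2 "$GH_INFO_URL"
}

# Prints exactly one of: mismatch | healthy | timeout
poll_gh_boot() {
  local i
  for ((i = 1; i <= POLL_ATTEMPTS; i++)); do
    if logs_show_mismatch; then
      echo mismatch   # terminal signal 1: caller should wipe + reimport
      return 0
    fi
    if info_is_up; then
      echo healthy    # terminal signal 2: GH booted clean, leave cache alone
      return 0
    fi
    sleep "$POLL_INTERVAL"
  done
  echo timeout        # neither signal within the budget; handling is an assumption
  return 1
}
```

A caller could branch on the printed value, for example `case "$(poll_gh_boot)" in mismatch) ... ;; healthy) ... ;; esac`. The mismatch check runs before the `/info` check on each iteration so that a crash-looping container is never mistaken for a clean boot.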

Test plan

  • Validated manually on helena: after wiping the cache, GH came back clean and /api/route returned a valid LineString (4044m / 861s for the central-Berlin smoke pair).
  • Next CD run on a backend-only change should poll, see /info succeed, and skip the wipe path.
  • When a future custom_models edit lands, CD should poll, see "does not match", trigger the wipe, and end with a healthy GH.

The fixed-sleep wipe-detection in #75 raced on the first real deploy:
cold-boot JVM took longer than 12s to reach checkProfilesConsistency,
so the grep against `docker logs --tail 80` ran before the
"Profile 'bike' does not match" line had been written. CD reported
success while prod GH crash-looped, requiring a manual cache wipe.

Replace the fixed sleep with a 30×3s poll that exits early on either
terminal signal: a hash mismatch in the logs, or `/info` answering on
port 8989 (i.e. GH booted clean). Total ceiling 90s, common-path
latency unchanged because successful boots break out as soon as `/info`
responds.
@cafca cafca merged commit 1ddfd7a into main May 7, 2026
5 checks passed