fix(scenarios): retry RPC query in compute-target-height (chain-RPC bind race) #295

Merged: bdchatham merged 1 commit into main from fix/compute-target-height-rpc-retry on May 19, 2026

@bdchatham
Collaborator

Live manual run on harbor: `seictl nd watch --until=Ready` returns when SeiNode pods report Running, but seid's Tendermint RPC server takes a few more seconds AFTER that to bind port 26657. The single-shot curl in `compute-target-height` loses that race:

```
curl: (7) Failed to connect to <snd>-internal.nightly.svc:26657 after 10 ms:
Could not connect to server
failed to parse latest_block_height from .../status
```

Manual curl from a fresh pod 90s later returns HTTP 200 with `height: 286` — the chain IS up, just not at the instant `nd watch` returned.

Fix

Wrap the curl in a 30-attempt retry loop with 3s sleep (90s window) + 3s `--connect-timeout` per attempt. Matches the retry pattern `resolve-proposal-id` already uses (different step, same shape: chain-side query that tolerates brief warmup).
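
A minimal sketch of the retry shape described above, not the actual step from this PR. The function name `fetch_latest_height`, the `MAX_ATTEMPTS` and `SLEEP_SECS` overrides, and the `sed`-based parsing are assumptions made for illustration. It also assumes the `/status` response carries the height as a quoted string under `latest_block_height`.

```shell
#!/usr/bin/env bash
# Sketch only: fetch_latest_height, MAX_ATTEMPTS and SLEEP_SECS are
# hypothetical names, not taken from the real compute-target-height step.

fetch_latest_height() {
  local rpc_url="$1"
  local max_attempts="${MAX_ATTEMPTS:-30}"  # 30 attempts x 3s sleep = 90s window
  local sleep_secs="${SLEEP_SECS:-3}"
  local attempt=1 body height

  while [ "$attempt" -le "$max_attempts" ]; do
    # --connect-timeout bounds each attempt while port 26657 is still unbound.
    if body="$(curl --silent --fail --connect-timeout 3 "${rpc_url}/status")"; then
      # Assumption: the height appears as "latest_block_height":"<digits>".
      height="$(printf '%s\n' "$body" |
        sed -n 's/.*"latest_block_height"[[:space:]]*:[[:space:]]*"\([0-9][0-9]*\)".*/\1/p' |
        head -n 1)"
      if [ -n "$height" ]; then
        printf '%s\n' "$height"
        return 0
      fi
    fi
    echo "attempt ${attempt}/${max_attempts}: RPC not ready, sleeping ${sleep_secs}s" >&2
    sleep "$sleep_secs"
    attempt=$((attempt + 1))
  done

  echo "failed to parse latest_block_height from ${rpc_url}/status" >&2
  return 1
}
```

A caller would capture the result with something like `height="$(fetch_latest_height "$RPC_URL")" || exit 1`, where `RPC_URL` is whatever base URL the step already targets. A successful HTTP response with no parseable height is treated as "not ready yet" as well, so a node that answers before it has a block still gets retried.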

Symptom chain

  • compute-target-height exits 1 → workflow-vars ConfigMap not created
  • Downstream `submit-upgrade-proposal` seitask-runner pod stuck in `CreateContainerConfigError` because its `envFrom: configMapRef` can't resolve

Single fix at the source resolves the whole cascade.

Should `seictl nd watch --until=Ready` itself wait for RPC?

Probably. The SeiNode controller's Ready signal currently means "pods are scheduled and plan is complete." A stricter "RPC bound" condition would move this retry logic into the controller and let scenarios stay single-shot. Not in scope here; file as a separate follow-up.

Bug #9 in the major-upgrade-runs-end-to-end debugging chain. Same shape as several earlier ones: a timing/readiness assumption that doesn't hold under load.

🤖 Generated with Claude Code

…ind race)

Live manual run on harbor surfaced this: `seictl nd watch --until=Ready`
returns when SeiNode pods report Running (status.phase=Running, all
plan tasks Complete) — but seid's Tendermint RPC server takes a few
more seconds AFTER that to actually bind port 26657. The
compute-target-height bash step's single-shot curl loses that race:

  curl: (7) Failed to connect to <snd>-internal.nightly.svc:26657 after
  10 ms: Could not connect to server
  failed to parse latest_block_height from .../status

Manual curl from a fresh pod 90s later returns HTTP 200 with height
286 — the chain IS up, just not at the instant `nd watch` returned.

Wrap the curl in a 30-attempt retry loop with 3s sleep (90s window)
and a 3s --connect-timeout. Matches the retry pattern
resolve-proposal-id already uses (different step, same shape:
chain-side query that needs to tolerate a brief warmup).

Symptom chain on the live run: compute-target-height exits 1 →
workflow-vars ConfigMap not created → downstream submit-upgrade-proposal
seitask-runner pod stuck in CreateContainerConfigError because its
envFrom configMapRef can't resolve. Single fix at the source resolves
the whole cascade.

Bug #9 in the major-upgrade-runs-end-to-end debugging chain. Same
shape as several earlier ones: an assumption about timing/readiness
that doesn't survive contact with the cluster.
@bdchatham bdchatham merged commit ea245c8 into main May 19, 2026
2 checks passed
@bdchatham bdchatham deleted the fix/compute-target-height-rpc-retry branch May 19, 2026 21:58