fix(scenarios): retry RPC query in compute-target-height (chain-RPC bind race) by bdchatham · Pull Request #295 · sei-protocol/sei-k8s-controller

bdchatham · 2026-05-19T21:57:26Z

Live manual run on harbor: `seictl nd watch --until=Ready` returns when SeiNode pods report Running, but seid's Tendermint RPC server takes a few more seconds AFTER that to bind port 26657. The single-shot curl in `compute-target-height` loses that race:

```
curl: (7) Failed to connect to <snd>-internal.nightly.svc:26657 after 10 ms:
Could not connect to server
failed to parse latest_block_height from .../status
```

Manual curl from a fresh pod 90s later returns HTTP 200 with `height: 286` — the chain IS up, just not at the instant `nd watch` returned.

Fix

Wrap the curl in a 30-attempt retry loop with a 3s sleep between attempts (90s window) and a 3s `--connect-timeout` per attempt. This matches the retry pattern `resolve-proposal-id` already uses (a different step with the same shape: a chain-side query that must tolerate a brief warmup).
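The retry shape described above can be sketched roughly as follows. This is a minimal illustration, not the step's actual script: the function names (`fetch_status`, `parse_height`, `query_height_with_retry`) and the argument layout are assumptions made for the example, and the `sed` parse assumes `latest_block_height` appears as a (usually quoted) integer in the `/status` JSON.

```shell
#!/usr/bin/env bash
# Sketch only: function and variable names are illustrative, not taken from
# the real compute-target-height step.

# fetch_status URL: one attempt against <URL>/status, bounded by a 3s
# connect timeout. --fail makes curl return non-zero on HTTP errors.
fetch_status() {
  curl --silent --show-error --fail --connect-timeout 3 "$1/status"
}

# parse_height: read status JSON on stdin, print latest_block_height.
# Assumes the value is an integer, with or without surrounding quotes.
parse_height() {
  sed -n 's/.*"latest_block_height"[[:space:]]*:[[:space:]]*"\{0,1\}\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# query_height_with_retry URL [ATTEMPTS] [SLEEP_SECONDS]
# Retries until a height can be parsed; defaults give 30 x 3s = 90s of sleep.
query_height_with_retry() {
  local url="$1"
  local attempts="${2:-30}"
  local pause="${3:-3}"
  local i=1
  local body height
  while [ "$i" -le "$attempts" ]; do
    if body="$(fetch_status "$url")"; then
      height="$(printf '%s' "$body" | parse_height)"
      if [ -n "$height" ]; then
        printf '%s\n' "$height"
        return 0
      fi
    fi
    echo "attempt $i/$attempts: RPC not ready, retrying in ${pause}s" >&2
    sleep "$pause"
    i=$((i + 1))
  done
  echo "failed to parse latest_block_height from $url/status" >&2
  return 1
}
```

Calling `query_height_with_retry "$RPC_URL"` prints the height on stdout and keeps progress messages on stderr, so the caller can capture the value with command substitution. Treating an unparseable response the same as a connection failure means a half-started RPC server is also retried rather than failing the step.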

Symptom chain

1. `compute-target-height` exits 1 → `workflow-vars` ConfigMap not created
2. Downstream `submit-upgrade-proposal` seitask-runner pod stuck in `CreateContainerConfigError` because its `envFrom: configMapRef` can't resolve

Single fix at the source resolves the whole cascade.

Should `seictl nd watch --until=Ready` itself wait for RPC?

Probably. The SeiNode controller's Ready signal currently means "pods are scheduled and plan is complete." A stricter "RPC bound" condition would move this retry logic into the controller and let scenarios stay single-shot. That is out of scope here and should be filed as a separate follow-up.
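For illustration only, an "RPC bound" check could be as small as a TCP connect probe against the RPC port. The sketch below is hypothetical: `rpc_port_bound` and `wait_for_rpc` are made-up names, not seictl or controller APIs, and it relies on bash's `/dev/tcp` redirection, which is not available in every shell.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: these helpers are illustrative and do not exist in
# seictl or the SeiNode controller.

# rpc_port_bound HOST PORT: succeed only if a TCP connection can be opened.
# The subshell ensures the descriptor is closed as soon as the check ends.
rpc_port_bound() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# wait_for_rpc HOST PORT [ATTEMPTS] [SLEEP_SECONDS]: poll until the port
# accepts connections, or give up after ATTEMPTS tries.
wait_for_rpc() {
  local host="$1"
  local port="$2"
  local attempts="${3:-30}"
  local pause="${4:-3}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if rpc_port_bound "$host" "$port"; then
      return 0
    fi
    sleep "$pause"
    i=$((i + 1))
  done
  return 1
}
```

A port accepting connections is a weaker signal than a successful `/status` response, and this probe has no connect timeout of its own, so a real readiness condition would likely need both a bounded connect and an application-level check.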

Bug #9 in the major-upgrade-runs-end-to-end debugging chain. Same shape as several earlier ones: a timing/readiness assumption that doesn't hold under load.

🤖 Generated with Claude Code

…ind race)

Live manual run on harbor surfaced this: `seictl nd watch --until=Ready` returns when SeiNode pods report Running (status.phase=Running, all plan tasks Complete), but seid's Tendermint RPC server takes a few more seconds AFTER that to actually bind port 26657. The compute-target-height bash step's single-shot curl loses that race:

    curl: (7) Failed to connect to <snd>-internal.nightly.svc:26657 after 10 ms: Could not connect to server
    failed to parse latest_block_height from .../status

Manual curl from a fresh pod 90s later returns HTTP 200 with height 286: the chain IS up, just not at the instant `nd watch` returned.

Wrap the curl in a 30-attempt retry loop with 3s sleep (90s window) and a 3s --connect-timeout. Matches the retry pattern resolve-proposal-id already uses (different step, same shape: chain-side query that needs to tolerate a brief warmup).

Symptom chain on the live run: compute-target-height exits 1 → workflow-vars ConfigMap not created → downstream submit-upgrade-proposal seitask-runner pod stuck in CreateContainerConfigError because its envFrom configMapRef can't resolve. Single fix at the source resolves the whole cascade.

Bug #9 in the major-upgrade-runs-end-to-end debugging chain. Same shape as several earlier ones: an assumption about timing/readiness that doesn't survive contact with the cluster.


bdchatham merged commit ea245c8 into main May 19, 2026
2 checks passed

bdchatham deleted the fix/compute-target-height-rpc-retry branch May 19, 2026 21:58
