Sidecar config-apply should populate chain-id in client.toml #246

@bdchatham

Problem

When a SeiNode is first deployed against a freshly-mounted BYOV data volume that doesn't already contain a config/client.toml, the cosmos-sdk seid init (run from the seid-init initContainer) writes a client.toml with chain-id = '' (empty).

On the next boot of the main seid container, the cosmos-sdk startup invariant compares genesis.json's chain_id against client.toml's chain-id and panics if they differ:

panic: genesis file chain-id=pacific-1 does not equal config.toml chain-id=

The sidecar's config-apply task currently rewrites config.toml and app.toml to match the desired controller-side config, but does not populate chain-id in client.toml from spec.chainId — so first-boot ends in a guaranteed crashloop until an operator intervenes.

Impact

  • Crashloop blocks every fresh archive node deployment. Hit during the pacific-1/archive-1 redeploy (PR sei-protocol/platform#519): the pod sat in CrashLoopBackOff until it was live-patched via a kubectl debug ephemeral container.
  • The archive-2 redeploy needed a workaround: client.toml was pre-seeded directly on the EBS volume before mounting (PR sei-protocol/platform#540), which bypasses the controller's responsibility.
  • Future BYOV deployments will hit this on every first boot unless someone remembers the workaround, defeating the controller's "give me a chain + a volume and I'll run it" contract.

Relevant experts

  • @bdchatham (controller contributor, hit the bug twice in the field)
  • kubernetes-specialist (sidecar / seictl ownership)

Proposed approach

In the seictl sidecar's config-apply task, after rewriting config.toml/app.toml, also patch /sei/config/client.toml:

chain-id = "<spec.chainId>"

Use the same seictl config patch --target client machinery the sidecar already has. The chain ID is already in the bootstrap context (SeiNode.spec.chainId), so no new spec fields needed.
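To illustrate the patch itself, here is a minimal sketch of the text transformation, independent of how seictl config patch is actually implemented. It assumes chain-id is a top-level key and works line by line to leave comments and other keys untouched. The name set_chain_id is hypothetical, and the line-based scan does not handle multi-line TOML strings:

```python
import re

# A top-level `chain-id = ...` assignment; commented-out lines do not match.
_CHAIN_ID_LINE = re.compile(r"^\s*chain-id\s*=")
# The first table header ends the top-level key section.
_TABLE_HEADER = re.compile(r"^\s*\[")


def set_chain_id(toml_text: str, chain_id: str) -> str:
    """Return toml_text with the top-level chain-id set to chain_id."""
    if not chain_id or any(c in chain_id for c in '"\\\n'):
        raise ValueError("chain_id must be non-empty and need no escaping")
    new_line = f'chain-id = "{chain_id}"'
    lines = toml_text.splitlines()
    for i, line in enumerate(lines):
        if _TABLE_HEADER.match(line):
            break  # past the top-level keys; the key is absent
        if _CHAIN_ID_LINE.match(line):
            lines[i] = new_line
            return "\n".join(lines) + "\n"
    # Key absent: prepend so it stays in the top-level section.
    return new_line + "\n" + toml_text


skeleton = '# client config\nchain-id = ""\nkeyring-backend = "os"\n'
patched = set_chain_id(skeleton, "pacific-1")
print(patched)
```

Applying set_chain_id to its own output returns the same text, which is the property the idempotency requirement relies on.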

Verification: e2e test that boots a SeiNode against an empty data volume and asserts seid reaches committed state (not panic) without manual intervention.

Acceptance criteria

  • Sidecar's config-apply task writes chain-id into client.toml from spec.chainId (idempotent — re-running config-apply with the same chainId is a no-op)
  • If client.toml doesn't exist on disk yet (first boot before cosmos-sdk has run), config-apply either creates it with reasonable defaults including chain-id, or defers and re-runs after cosmos-sdk creates the skeleton
  • Unit test for the new patch behavior (chainId set, empty, conflicting value)
  • No regression on existing archive nodes that already have a working client.toml
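The file-level behavior in these criteria could look roughly like the sketch below. It takes the "defer" option when client.toml is missing, so no default values have to be invented, and it overwrites a conflicting value, which is an assumption since the criteria do not say how conflicts should resolve. The name apply_chain_id and its return values are hypothetical, and the regex is a simplification that ignores TOML table scoping:

```python
import os
import re
import tempfile
from pathlib import Path


def apply_chain_id(client_toml: Path, chain_id: str) -> str:
    """Return 'deferred', 'unchanged', or 'updated' (hypothetical statuses)."""
    if not client_toml.exists():
        return "deferred"  # let the caller retry once the skeleton exists
    current = client_toml.read_text()
    new_line = f'chain-id = "{chain_id}"'
    desired, hits = re.subn(
        r"(?m)^chain-id[ \t]*=.*$", lambda _m: new_line, current, count=1
    )
    if hits == 0:
        desired = new_line + "\n" + current
    if desired == current:
        return "unchanged"  # idempotent: no write when already correct
    # Write to a temp file in the same directory, then rename over the target,
    # so a crash never leaves a half-written client.toml. A real task would
    # also preserve the file mode and ownership.
    fd, tmp = tempfile.mkstemp(dir=client_toml.parent, prefix=".client.toml.")
    with os.fdopen(fd, "w") as f:
        f.write(desired)
    os.replace(tmp, client_toml)
    return "updated"


with tempfile.TemporaryDirectory() as d:
    path = Path(d) / "client.toml"
    results = [apply_chain_id(path, "pacific-1")]       # no file yet
    path.write_text('chain-id = ""\n')                  # empty value
    results.append(apply_chain_id(path, "pacific-1"))
    results.append(apply_chain_id(path, "pacific-1"))   # re-run, same chainId
    path.write_text('chain-id = "other-chain"\n')       # conflicting value
    results.append(apply_chain_id(path, "pacific-1"))
    final_text = path.read_text()
print(results)
```

The four calls correspond to the missing-file, empty, re-run, and conflicting cases that the unit test in the criteria would need to cover.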

Out of scope

  • Other unpopulated client.toml fields (keyring-backend, node, etc.) — these have sensible cosmos-sdk defaults; only chain-id is the panic-trigger
  • Live migration / patching of already-deployed nodes — they have working client.toml already from the manual workaround; this is for future deployments

References

  • sei-protocol/platform#519 — archive-1 redeploy where this was first hit
  • sei-protocol/platform#540 — archive-2 redeploy where it was pre-empted via volume-level seeding
  • Cosmos-SDK startup panic: sei-cosmos/server/start.go:200 (StartCmd.func2 — genesis-vs-config chain-id check)
  • Sidecar code likely at: internal/task/config-apply (in seictl)
