
ci(e2e): bump pinned runtime version from 1.17.3 to 1.17.7#1645

Merged
cicoyle merged 1 commit into master from fix/e2e-bump-pinned-runtime-1.17.7 on May 22, 2026

@nelson-parente (Contributor)

Summary

The E2E - KinD non-HA matrix has been red for 10 days (since 2026-05-11). Every scheduled run fails with panic: test timed out after 25m0s in TestRenewCertificateMTLSEnabled/Renew_certificate_which_expires_in_less_than_30_days.

This bumps the pinned runtime version used by the E2E workflows from 1.17.3 to 1.17.7 to restore compatibility with the Ed25519 certificate keys the CLI now generates.

Root cause

#1629 bumped github.com/dapr/dapr from stable v1.17.3 to pseudo-version v1.17.0-rc.1.0.20260425162356-f8d0f6142987 to pull in the workflow proto renames. That commit (2026-04-25) is newer than dapr/dapr#9598 ("Sentry: Use Ed25519 for X.509 certificate key generation", merged 2026-03-18), so it includes that change. Through this transitive bump, bundle.GenerateX509(), which pkg/kubernetes/renew_certificate.go calls, now produces Ed25519 root and issuer keys.
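The ordering argument rests on the commit timestamp embedded in the Go pseudo-version. A minimal sketch, assuming the standard `vX.Y.Z-pre.0.yyyymmddhhmmss-abcdefabcdef` layout, that extracts the timestamp and revision and compares them with the 2026-03-18 merge date of dapr/dapr#9598. `parsePseudoVersion` is a hypothetical helper, not part of the CLI; real code would more likely use `PseudoVersionTime` and `PseudoVersionRev` from `golang.org/x/mod/module`.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// parsePseudoVersion extracts the commit timestamp and the 12-character
// revision from a Go pseudo-version such as
// v1.17.0-rc.1.0.20260425162356-f8d0f6142987.
// Hypothetical helper: it only checks the trailing "<timestamp>-<rev>" shape.
func parsePseudoVersion(v string) (time.Time, string, error) {
	dash := strings.LastIndex(v, "-")
	if dash < 0 {
		return time.Time{}, "", errors.New("not a pseudo-version: missing revision")
	}
	rev, rest := v[dash+1:], v[:dash]
	if len(rev) != 12 {
		return time.Time{}, "", fmt.Errorf("revision %q: want 12 characters", rev)
	}
	// The timestamp is the last "."- or "-"-separated segment before the revision.
	sep := strings.LastIndexAny(rest, ".-")
	if sep < 0 {
		return time.Time{}, "", errors.New("not a pseudo-version: missing timestamp")
	}
	ts, err := time.Parse("20060102150405", rest[sep+1:])
	if err != nil {
		return time.Time{}, "", fmt.Errorf("timestamp: %w", err)
	}
	return ts, rev, nil
}

func main() {
	ts, rev, err := parsePseudoVersion("v1.17.0-rc.1.0.20260425162356-f8d0f6142987")
	if err != nil {
		panic(err)
	}
	// Merge date of dapr/dapr#9598 (Ed25519 key generation in Sentry).
	ed25519Merged := time.Date(2026, time.March, 18, 0, 0, 0, 0, time.UTC)
	fmt.Println(rev, ts.Format(time.RFC3339))                // f8d0f6142987 2026-04-25T16:23:56Z
	fmt.Println("after #9598 merge:", ts.After(ed25519Merged)) // after #9598 merge: true
}
```

A later timestamp only shows the commit is newer; that it actually contains #9598 is confirmed separately in the commit message.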

The E2E workflows hardcode DAPR_RUNTIME_PINNED_VERSION: 1.17.3 for scheduled runs. Sentry 1.17.3 cannot parse Ed25519 trust-bundle keys; it crashes on startup with:

unsupported key type ed25519.PrivateKey

This is the documented incompatibility called out in the 1.18 release notes as the "rollback floor is 1.17.7" rule.

After cert renewal, sentry crash-loops, and the --restart post-step in cmd/renew_certificate.go calls kubectl rollout status with no timeout, so the command blocks until the Go test timeout fires at 25 minutes.
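The Sentry crash can be reproduced in miniature. The sketch below is a hypothetical decoder, not Sentry's actual code: it accepts only ECDSA and RSA PKCS#8 keys unless `allowEd25519` is set, and formats the rejection with `%T`, which yields the same `unsupported key type ed25519.PrivateKey` text seen in the crash. `decodeSigner`, `allowEd25519`, and `newEd25519KeyPEM` are illustrative names.

```go
package main

import (
	"crypto"
	"crypto/ecdsa"
	"crypto/ed25519"
	"crypto/rand"
	"crypto/rsa"
	"crypto/x509"
	"encoding/pem"
	"errors"
	"fmt"
)

// decodeSigner is a hypothetical trust-bundle key decoder. With
// allowEd25519=false it behaves like a decoder that predates Ed25519
// support; with true it models one that has the Ed25519 case added.
func decodeSigner(keyPEM []byte, allowEd25519 bool) (crypto.Signer, error) {
	block, _ := pem.Decode(keyPEM)
	if block == nil {
		return nil, errors.New("no PEM block found")
	}
	key, err := x509.ParsePKCS8PrivateKey(block.Bytes)
	if err != nil {
		return nil, err
	}
	switch k := key.(type) {
	case *ecdsa.PrivateKey:
		return k, nil
	case *rsa.PrivateKey:
		return k, nil
	case ed25519.PrivateKey:
		if allowEd25519 {
			return k, nil
		}
	}
	// %T prints the dynamic type, e.g. "ed25519.PrivateKey".
	return nil, fmt.Errorf("unsupported key type %T", key)
}

// newEd25519KeyPEM generates an Ed25519 key and PEM-encodes it as PKCS#8,
// standing in for the kind of key the CLI now writes into the trust bundle.
func newEd25519KeyPEM() ([]byte, error) {
	_, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		return nil, err
	}
	der, err := x509.MarshalPKCS8PrivateKey(priv)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "PRIVATE KEY", Bytes: der}), nil
}

func main() {
	keyPEM, err := newEd25519KeyPEM()
	if err != nil {
		panic(err)
	}
	if _, err := decodeSigner(keyPEM, false); err != nil {
		fmt.Println("old decoder:", err) // old decoder: unsupported key type ed25519.PrivateKey
	}
	if _, err := decodeSigner(keyPEM, true); err == nil {
		fmt.Println("patched decoder: ok")
	}
}
```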

Why only non-HA?

TestRenewCertificateMTLSEnabled has if common.ShouldSkipTest(common.DaprModeNonHA) { t.Skip(...) } at the top — it only runs in non-HA mode. HA matrix legs skip it and stay green.
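The guard amounts to a mode comparison. A minimal sketch, assuming the suite knows which mode the current matrix leg runs in; `DaprMode`, its string values, and `shouldSkipTest` are hypothetical stand-ins for the `common` helpers, whose exact implementation is not shown in this PR (the real `ShouldSkipTest` takes only the required mode).

```go
package main

import "fmt"

// DaprMode is a hypothetical stand-in for the mode type in the e2e common package.
type DaprMode string

const (
	DaprModeHA    DaprMode = "ha"
	DaprModeNonHA DaprMode = "non-ha"
)

// shouldSkipTest reports whether a test that only applies to `required`
// must be skipped when the matrix leg runs in `current` mode.
func shouldSkipTest(required, current DaprMode) bool {
	return required != current
}

func main() {
	for _, leg := range []DaprMode{DaprModeNonHA, DaprModeHA} {
		runs := !shouldSkipTest(DaprModeNonHA, leg)
		fmt.Printf("%s leg runs TestRenewCertificateMTLSEnabled: %v\n", leg, runs)
	}
}
```

This is why a regression confined to the renewal test can only surface in the non-HA legs.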

Fix

Bump the pinned runtime to 1.17.7 — the first stable release that includes the Ed25519 PEM-decoder backport (dapr/dapr#9904) and is officially compatible with a 1.18-cycle CLI.

Applied to both:

  • .github/workflows/kind_e2e.yaml
  • .github/workflows/self_hosted_e2e.yaml
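A check like the following could guard future pins against the same floor. It is a hedged sketch and not part of either workflow: `meetsFloor` and `ed25519Floor` are hypothetical names, the comparison handles only plain `MAJOR.MINOR.PATCH` strings (pre-release suffixes are rejected rather than ordered), and the `1.17.7` value comes from the "rollback floor is 1.17.7" note in the 1.18 release notes.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ed25519Floor is the minimum runtime whose Sentry can load Ed25519
// trust-bundle keys, per the "rollback floor is 1.17.7" note.
const ed25519Floor = "1.17.7"

// parseVersion splits a plain "MAJOR.MINOR.PATCH" (optionally "v"-prefixed)
// string into integers. Pre-release suffixes such as "-rc.1" are rejected.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("version %q: want MAJOR.MINOR.PATCH", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return out, fmt.Errorf("version %q: bad component %q", v, p)
		}
		out[i] = n
	}
	return out, nil
}

// meetsFloor reports whether version >= floor, comparing component by component.
func meetsFloor(version, floor string) (bool, error) {
	v, err := parseVersion(version)
	if err != nil {
		return false, err
	}
	f, err := parseVersion(floor)
	if err != nil {
		return false, err
	}
	for i := 0; i < 3; i++ {
		if v[i] != f[i] {
			return v[i] > f[i], nil
		}
	}
	return true, nil
}

func main() {
	for _, pinned := range []string{"1.17.3", "1.17.7"} {
		ok, err := meetsFloor(pinned, ed25519Floor)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s meets %s floor: %v\n", pinned, ed25519Floor, ok)
	}
	// 1.17.3 meets 1.17.7 floor: false
	// 1.17.7 meets 1.17.7 floor: true
}
```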

Follow-up (separate PR)

The kubectl rollout status call in restartControlPlaneService (cmd/renew_certificate.go) has no timeout, which is what turned this incompatibility into a 25-minute silent hang instead of a clean failure. Adding --timeout=Xs there would be worthwhile defense in depth.
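One possible shape for that follow-up, sketched under assumptions: `rolloutStatusArgs` and `waitForRollout` are hypothetical helpers, not the CLI's code, and the `dapr-sentry` Deployment in `dapr-system` serves only as an example target. kubectl's `rollout status --timeout` defaults to `0s`, which means wait indefinitely, so passing an explicit value turns a crash loop into a prompt non-zero exit; the context deadline is a second guard in case kubectl itself stalls.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"time"
)

// rolloutStatusArgs builds `kubectl rollout status` arguments with an
// explicit --timeout expressed in whole seconds (the --timeout=Xs form).
func rolloutStatusArgs(deployment, namespace string, timeout time.Duration) []string {
	return []string{
		"rollout", "status", "deployment/" + deployment,
		"-n", namespace,
		fmt.Sprintf("--timeout=%ds", int(timeout.Seconds())),
	}
}

// waitForRollout runs kubectl with its own --timeout and also bounds the
// child process with a context deadline slightly longer than that timeout.
func waitForRollout(ctx context.Context, deployment, namespace string, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(ctx, timeout+30*time.Second)
	defer cancel()
	cmd := exec.CommandContext(ctx, "kubectl", rolloutStatusArgs(deployment, namespace, timeout)...)
	if out, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("rollout status %s/%s: %w\n%s", namespace, deployment, err, out)
	}
	return nil
}

func main() {
	// Print the arguments instead of running kubectl, so no cluster is needed.
	fmt.Println(rolloutStatusArgs("dapr-sentry", "dapr-system", 5*time.Minute))
	// [rollout status deployment/dapr-sentry -n dapr-system --timeout=300s]
}
```

Against a live cluster, `waitForRollout(context.Background(), "dapr-sentry", "dapr-system", 5*time.Minute)` would fail after roughly five minutes instead of blocking until the test suite's 25-minute alarm.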

Test plan

  • E2E - KinD non-HA matrix (v1.32.9, v1.33.5, v1.34.1) goes green
  • E2E - KinD HA matrix stays green (was already green)
  • E2E - Self-hosted stays/becomes green

References

The E2E - KinD non-HA matrix has been red for 10 days (since 2026-05-11).
TestRenewCertificateMTLSEnabled hangs at the 25-minute suite timeout in
the "Renew certificate which expires in less than 30 days" subtest.

Root cause: PR #1629 bumped github.com/dapr/dapr from v1.17.3 to a
pseudo-version (commit f8d0f6142987, 2026-04-25) to pull in the workflow
proto renames. That commit also includes #9598 (Sentry: Ed25519 for
X.509 cert key generation, merged 2026-03-18). The CLI now generates
Ed25519 root/issuer certs via bundle.GenerateX509().

The E2E pinned runtime was 1.17.3, whose sentry cannot parse Ed25519
keys -- it crashes on startup with "unsupported key type
ed25519.PrivateKey" (a known issue documented in the 1.18 release notes
as the "rollback floor is 1.17.7" caveat). After the cert renewal step,
sentry crash-loops and the post-renewal `kubectl rollout status` call
(which has no timeout) blocks until the 25-minute test alarm fires.

Bumping the pinned version to 1.17.7 -- the first stable release that
includes the Ed25519 PEM-decoder fix (dapr/dapr#9904) -- aligns the
E2E runtime with the minimum compatible sentry for a 1.18 CLI.

Only non-HA matrix legs were affected because TestRenewCertificateMTLSEnabled
is skipped in HA mode (DaprModeNonHA guard at the top of the test).

Note: the unbounded `kubectl rollout status` in `restartControlPlaneService`
is a separate latent bug that turned this incompatibility into a 25-min
hang instead of a clean failure. Worth fixing in a follow-up.

Signed-off-by: Nelson Parente <nelson_parente@live.com.pt>
nelson-parente requested review from a team as code owners on May 21, 2026 11:21
cicoyle merged commit 1aacd09 into master on May 22, 2026 (35 of 37 checks passed)