Opt 12 v1beta1 CRDs into storage-version migration + CI guard#5391
Merged
Conversation
Adds the kubebuilder marker
+kubebuilder:metadata:labels=toolhive.stacklok.dev/auto-migrate-storage-version=true
to every v1beta1 root type and regenerates the CRD manifests. The label
is the opt-in signal for an upcoming StorageVersionMigrator controller
that will reconcile each CRD's status.storedVersions down to the current
storage version. List types are intentionally not labelled — the label
applies to the CRD itself, which is keyed on the root type.
Part of #4969.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Guards against the main footgun of the opt-in-label design for storage-version migration: if a new CRD root type is added to cmd/thv-operator/api/v1beta1/ without the migrate-marker, the StorageVersionMigrator controller silently excludes it. The problem surfaces only when a future release tries to drop a deprecated version — at which point it is far too late to fix. The test scans every root type (marker block contains both +kubebuilder:object:root=true and +kubebuilder:storageversion) and fails unless the block also contains either the migrate marker or an explicit +thv:storage-version-migrator:exclude sibling marker. Part of #4969. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #5391 +/- ##
=======================================
Coverage 68.81% 68.81%
=======================================
Files 628 628
Lines 63900 63900
=======================================
+ Hits 43971 43974 +3
+ Misses 16669 16667 -2
+ Partials 3260 3259 -1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
The marker-coverage test was only scanning cmd/thv-operator/api/v1beta1/ and filtering for files matching *_types.go. This had two gaps: 1. Future graduations (v1beta1 → v1beta2) would silently break the test: when +kubebuilder:storageversion moves to v1beta2/, the v1beta1 root types lose the storage marker and become invisible to the scanner. The test would either pass vacuously (other version dirs unscanned) or hard-fail on the "no roots found" guard with no clear path forward. 2. MCPWebhookConfig lives in api/v1alpha1/types.go (not _types.go) and carries +kubebuilder:storageversion as the only never-graduated CRD. The original PR-B commit skipped it entirely; the migrator could not act on its CRs even when explicitly opted in. Changes: - Walk cmd/thv-operator/api/v*/ instead of hardcoding v1beta1. - File filter accepts both exact "types.go" and "*_types.go" suffix. - Rename TestV1beta1TypesMarkerCoverage → TestStorageVersionRootMarkerCoverage. - Update failure message + docstring to reflect the broader scope. - Add the migrate marker to MCPWebhookConfig in v1alpha1/types.go and regenerate its two CRD YAML files under deploy/charts/operator-crds/. Future-proof: when v1beta2 lands and +kubebuilder:storageversion moves to its root types, the scanner picks them up without code changes here. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous test accepted either the migrate marker or an explicit +thv:storage-version-migrator:exclude sibling. That permissive contract opens a graduation-time hole: a single-version CRD created with exclude passes CI, but at v1→v2 graduation time the developer has to remember to swap exclude→migrate. Forgetting the swap leaves the CRD silently un-migrated — exactly the failure mode this test is meant to prevent. No CRD in the codebase uses the exclude marker today. Removing it costs nothing and closes the hole. If a legitimate exclusion ever comes up, handle it as a one-off PR conversation and a hardcoded allowlist in this test — not a self-serve marker that future contributors can misuse. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two defensive additions identified during stress-testing the opt-in-label design: 1. Strict marker matching. containsMarker previously used strings.Contains, which would accept "=truee" (or any extra character) as a substring match for "=true". The CI test would pass, the generated CRD YAML would carry label value "truee", and the controller's runtime check (labels[key] == "true") would silently exclude the CRD. The bug only surfaces at version-drop time months later. containsMarker now requires the marker to be followed by end-of-line or whitespace. TestContainsMarkerStrictBoundary defends the contract against future regressions to substring semantics. 2. excludedRootTypes allowlist stub. The previous commit removed the self-serve +thv:storage-version-migrator:exclude marker. This adds the documented alternative — a hardcoded allowlist in the test file — as an empty map with an example entry and a comment explaining the deliberate, reviewed-decision policy for adding entries. If a future contributor needs to exclude a CRD, the path is "add an entry, justify it in PR review"; they don't have to invent the mechanism from scratch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- v1alpha1/types.go MCPWebhookConfig: move the migrate marker to AFTER +kubebuilder:subresource:status, matching the v1beta1 convention. controller-gen ignores order so behaviour is unchanged; this is purely consistency for greppability and future readers. - marker_coverage_test.go: drop `tc := tc` shadowing in TestContainsMarkerStrictBoundary. go.mod is on go 1.26; loop variables have been per-iteration since 1.22, so the shadow is dead code from older-Go muscle memory. go/ast or controller-tools/pkg/markers refactor (raised separately) deliberately deferred: the bufio scanner is correct for every CRD in the codebase today; switching to AST is a ~40-line restructure better landed as a follow-up if/when we actually need generics or grouped type decl support in a CRD. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rdimitrov
approved these changes
May 29, 2026
10 tasks
ChrisJBurns
added a commit
that referenced
this pull request
Jun 1, 2026
Adds an opt-in chart value (`operator.features.storageVersionMigrator`, default `false`) and wires `TOOLHIVE_ENABLE_STORAGE_VERSION_MIGRATOR` into the operator deployment. helm-docs README regenerated. The controller (PR-A, #5362) and the opt-in labels + CI guard (PR-B, #5391) are on main. This PR is the chart-side surface that lets operators opt in via `helm install --set` or values.yaml. Reference and upgrade-guide docs are intentionally separate — they land in a follow-up PR. Closes #4969. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds the opt-in label that the StorageVersionMigrator controller (shipped in #5362) requires to act on a CRD, and a CI guard that prevents future CRD additions from silently skipping migration.
This is PR-B of the three-PR sequence for #4969. PR-A (#5362) shipped the controller behind a default-off env-var flag. PR-B (this one) labels every storage-version root type and adds the marker-coverage CI guard. PR-C (next) flips the default to on and ships the helm chart surface + user docs.
Part of #4969. Depends on #5362 (merged).
Medium level
+kubebuilder:metadata:labels=toolhive.stacklok.dev/auto-migrate-storage-version=trueto every storage-version root type undercmd/thv-operator/api/. That's the 12 root types underapi/v1beta1/plusMCPWebhookConfigunderapi/v1alpha1/types.go(the only CRD whose storage version is still v1alpha1). Regenerated CRD YAML so the label appears inmetadata.labelson every CRD the chart ships.TestStorageVersionRootMarkerCoverageincmd/thv-operator/controllers/marker_coverage_test.go. Walks everycmd/thv-operator/api/v*/directory and scanstypes.goand*_types.go. For every root type carrying both+kubebuilder:object:root=trueAND+kubebuilder:storageversion, the migrate marker is required. No escape hatch — if a CRD legitimately should not be auto-migrated, an entry is added to a hardcoded allowlist (excludedRootTypesin the same file, currently empty) via PR review, not via a self-serve marker.=trueewould pass a substring match but fail the controller's runtime check (labels[key] == "true"), silently excluding the CRD. The strict match catches it at PR time.TestContainsMarkerStrictBoundarydefends the boundary contract against future regressions.Low level
cmd/thv-operator/api/v1beta1/*_types.go(×12)+kubebuilder:metadata:labels=...auto-migrate-storage-version=truemarker on each root typecmd/thv-operator/api/v1alpha1/types.goMCPWebhookConfig(the only never-graduated storage-version root)cmd/thv-operator/controllers/marker_coverage_test.goapi/v*/directories, requires migrate marker on every storage-version root, supports anexcludedRootTypesallowlist for documented exclusionsdeploy/charts/operator-crds/files/crds/*.yaml(×13)metadata.labelsdeploy/charts/operator-crds/templates/*.yaml(×13)Type of change
Test plan
TestStorageVersionRootMarkerCoveragepasses (scans every root type across all API version dirs, asserts marker presence)TestContainsMarkerStrictBoundarypasses (7 subtests covering exact match, whitespace boundary, value-typo rejection — defends against future regression to permissive substring matching)task operator-manifestsproduces no diff after the markers — labels land in CRD YAML cleanlygo test -race ./cmd/thv-operator/...task lint-fix— known to be blocked locally by a pre-existing toolchain mismatch (golangci-lintbuilt against go1.25 vs repo's go1.26 target); not introduced by this PRWhy the CI guard matters
Without
TestStorageVersionRootMarkerCoverage:task operator-manifestsregenerates without the opt-in label.isManagedCRDcheck returns false for that new CRD; CRs of that kind never get re-encoded.spec.versions— at which point storedVersions still contains the deprecated version and the apiserver rejects the CRD update.With the test, the PR adding the new CRD fails CI with a clear pointer to the missing marker.
The test is forward-compatible: when a future v1beta2 graduation moves
+kubebuilder:storageversionfrom v1beta1 root types to v1beta2 root types, the scanner picks up the new directory automatically without code changes here. The previous version's root types lose the storage marker and are naturally filtered out.Design choice: no escape hatch
An earlier iteration of this test accepted either the migrate marker or an explicit
+thv:storage-version-migrator:excludesibling. That permissive contract opens a graduation-time hole: a single-version CRD created withexcludepasses CI, but at v1→v2 graduation time the developer has to remember to swap exclude→migrate. Forgetting the swap leaves the CRD silently un-migrated — exactly the failure mode the test is meant to prevent.Replaced with: every storage-version root must have the migrate marker, full stop. If a CRD legitimately needs to be excluded, the path is to add its
<version>/<file>.<TypeName>key to theexcludedRootTypesmap in the test file as a deliberate, reviewed decision. The allowlist is currently empty and has an example comment showing the format.Why every storage-version root (not just multi-version CRDs)
Single-version CRDs today have nothing to migrate —
storedVersions == [storageVersion], the controller'sisMigrationNeededcheck returns false, no-op forever. But when that CRD eventually graduates to a second version,storedVersionsaccumulates the old entry and the migrator suddenly needs to know whether to act. Requiring the label from day one means graduation PRs don't need to remember to add it later. The cost of the label on a single-version CRD is zero (a string inmetadata.labels); the cost of forgetting at graduation is silent data orphaning.Special notes for reviewers
toolhive.stacklok.dev/auto-migrate-storage-versionis a stable API surface once shipped — users may write Kyverno rules or GitOps filters against it, so renaming becomes a breaking change.task operator-manifests.=trueewas a real silent-failure class — controller runtime checks for exact string"true", but the CI's substringContainswould accept=trueeas a match. The newcontainsMarkerrequires the marker to be followed by end-of-line or whitespace, catching the typo at PR time.TestContainsMarkerStrictBoundaryis the regression guard if anyone "improves" it back to substring semantics.TOOLHIVE_ENABLE_STORAGE_VERSION_MIGRATOR=trueAND (b) they have CRs whose etcd-stored apiVersion differs from the current storage version. Today no toolhive cluster has the former and the latter is rare.Generated with Claude Code