Skip to content

CNTRLPLANE-3524: feat: add etcd data re-encryption after encryption key rotation#8219

Open
muraee wants to merge 10 commits into
openshift:mainfrom
muraee:etcd-reencryption-on-key-rotation
Open

CNTRLPLANE-3524: feat: add etcd data re-encryption after encryption key rotation#8219
muraee wants to merge 10 commits into
openshift:mainfrom
muraee:etcd-reencryption-on-key-rotation

Conversation

@muraee

@muraee muraee commented Apr 13, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Add a re-encryption controller in the HCCO that detects encryption key rotations and triggers StorageVersionMigration CRs to re-encrypt all existing etcd data with the new active key
  • Track progress through a new EtcdDataEncryptionUpToDate condition on HCP/HostedCluster
  • Vendor library-go's KubeStorageVersionMigrator and kube-storage-version-migrator informer/lister packages

Details

Problem

HyperShift supports encryption key rotation (setting a new active key + backup key), but has no mechanism to re-encrypt existing etcd data with the new key. Old data remains encrypted with the previous key indefinitely, violating compliance requirements (OCPSTRAT-2527, OCPSTRAT-2540).

Solution

Key change detection is performed entirely within the HCCO re-encryption controller — no CPO changes needed. The controller computes a SHA-256 fingerprint of the active key identity from the HCP spec on each reconcile and compares it against a hypershift.openshift.io/encryption-migrated-key-hash annotation on the kas-secret-encryption-config secret. When the hashes differ, re-encryption is triggered.

API (api/hypershift/v1beta1/hostedcluster_conditions.go):

  • EtcdDataEncryptionUpToDate condition type
  • Reasons: ReEncryptionInProgress, ReEncryptionCompleted, ReEncryptionFailed, WaitingForKASConvergence

HCCO (control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/):

  • New reencryption controller using library-go's KubeStorageVersionMigrator
  • computeActiveKeyFingerprint() — SHA-256 hash of active key identity (supports AWS KMS, Azure KMS, IBM Cloud KMS, AESCBC)
  • On first deployment, initializes migrated hash to active hash (assumes data is already encrypted with current key)
  • Watches HCP, encryption config secret, and KAS deployment on management cluster
  • Waits for KAS deployment convergence before starting migrations
  • Creates StorageVersionMigration CRs in the guest cluster for all encrypted resources (secrets, configmaps, routes, oauth tokens)
  • Retries failed migrations after 5-minute interval via PruneMigration
  • Sets EtcdDataEncryptionUpToDate condition on HCP with detailed progress messages

HyperShift Operator (hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go):

  • Bubbles up EtcdDataEncryptionUpToDate condition from HCP to HostedCluster (only when encryption is configured and condition exists)

Vendoring

  • library-go/pkg/operator/encryption/controllers/migrators/KubeStorageVersionMigrator, Migrator interface
  • kube-storage-version-migrator/pkg/clients/informer/ and lister/ — SVM informer/lister

Ref: OCPSTRAT-2527, OCPSTRAT-2540
Enhancement: openshift/enhancements#1969

Test plan

  • Unit tests for computeActiveKeyFingerprint() — all provider types, determinism, different keys produce different hashes, unsupported type returns error
  • Unit tests for re-encryption controller reconcile — no encryption, first deploy init, hashes match, KAS not converged, migrations in progress, all succeeded, failed with retry/prune, cache not synced
  • go build ./... passes
  • go test -race passes for all modified packages
  • E2E: AWS KMS key rotation with re-encryption verification
  • E2E: Azure KMS key rotation with re-encryption verification
  • E2E: AESCBC key rotation with re-encryption verification

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added etcd data encryption status tracking and re-encryption progress monitoring to hosted clusters.
    • Extended encryption key status fields across hosted control planes and multiple KMS providers.
    • Introduced encryption migration history tracking and lifecycle state management.
  • Chores

    • Updated storage version migrator dependency classification.

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@coderabbitai

coderabbitai Bot commented Apr 13, 2026

Copy link
Copy Markdown
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR adds etcd data encryption status tracking and key rotation lifecycle management to HyperShift. It introduces a new EtcdDataEncryptionUpToDate condition type with five associated re-encryption lifecycle reasons to report encryption state. The API layer extends HostedControlPlaneStatus and HostedClusterStatus with a SecretEncryption status field tracking active key fingerprints, provider identity, migration state, and rotation history. Provider-specific key status types are added for AWS, Azure, and IBM Cloud KMS along with AESCBC encryption, and deprecated backupKey fields are annotated. A new re-encryption controller reconciles HostedControlPlane objects to manage etcd data re-encryption when encryption keys rotate.

🚥 Pre-merge checks | ✅ 10 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality ❓ Inconclusive Custom check specifies Ginkgo patterns; PR uses Go standard table-driven tests. Check's applicability unclear given framework mismatch. Assertion messages are weak (5%). Clarify scope: does check apply to Go standard tests? If yes: criterion 4 fails (assertion messages missing on 74/78 assertions). Add descriptive messages: g.Expect(err).ToNot(HaveOccurred(), "expected successful reconcile")
✅ Passed checks (10 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed All test names in reencryption_test.go are stable and deterministic. No dynamic content like generated IDs, timestamps, or variable interpolation. Tests use static BDD-style descriptions.
Microshift Test Compatibility ✅ Passed No Ginkgo e2e tests were added. The new test file (reencryption_test.go) uses standard Go unit testing, not Ginkgo framework. No e2e test files were modified. Check is not applicable.
Single Node Openshift (Sno) Test Compatibility ✅ Passed No Ginkgo e2e tests added. Only new test file is reencryption_test.go with standard Go unit tests, not e2e tests. SNO check not applicable.
Topology-Aware Scheduling Compatibility ✅ Passed No scheduling constraints, affinity rules, or topology assumptions introduced. All changes are API types, controller logic, and dependencies. Existing operator deployment has no constraints.
Ote Binary Stdout Contract ✅ Passed No stdout violations found. All logging uses logr.Logger. No fmt.Print or direct klog calls in process-level code. Tests use standard Go testing, not Ginkgo suite setup.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed No Ginkgo e2e tests were added. Only unit tests in reencryption_test.go using Go testing package. Check is not applicable to this PR.
Title check ✅ Passed The title clearly and specifically describes the main change: adding etcd data re-encryption functionality triggered by encryption key rotation, which aligns with the substantial changes across the codebase.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot added area/api Indicates the PR includes changes for the API area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release and removed do-not-merge/needs-area labels Apr 13, 2026
@openshift-ci openshift-ci Bot requested review from cblecker and enxebre April 13, 2026 13:43
@muraee muraee force-pushed the etcd-reencryption-on-key-rotation branch from d5e5097 to 0d54a40 Compare April 13, 2026 13:46

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.go`:
- Around line 98-111: The code is writing the active key fingerprint and
unconditionally setting rekey-needed when the annotation was previously absent;
change the logic in the block that uses computeActiveKeyFingerprint,
EncryptionActiveKeyHashAnnotation and EncryptionRekeyNeededAnnotation so that
you always persist the fingerprint into secret.Annotations when fingerprint !=
"", but only set secret.Annotations[EncryptionRekeyNeededAnnotation] = "true" if
an existingHash was present (existingHash != "") and different from the new
fingerprint; do not mark rekey-needed when writing the hash for the first time.

In
`@control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go`:
- Around line 245-252: Reorder the finalization so the HCP status is updated to
show re-encryption completed before removing the rekey-needed annotation: call
setReEncryptionCondition(hcp, metav1.ConditionTrue,
hyperv1.ReEncryptionCompletedReason, ...) and persist that status update (as
done in the parent Reconcile flow) first, and only after the status update
succeeds call r.removeRekeyAnnotation(ctx, encryptionSecret); this ensures that
if the status update fails the reconcile will retry (via Reconcile's rekeyNeeded
logic) and that a later annotation-removal failure won't leave the condition
stuck in InProgress.

In `@hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go`:
- Around line 897-904: The current logic only sets EtcdDataEncryptionUpToDate on
hcluster when the HCP condition exists, which leaves a stale condition when
encryption is disabled, hcp is nil, or the HCP removes the condition; update the
block around hcp and hcluster.Spec.SecretEncryption to handle the
missing/disabled case by removing the bubbled condition instead of doing
nothing: check if hcp is nil or hcluster.Spec.SecretEncryption is nil or
meta.FindStatusCondition(hcp.Status.Conditions,
string(hyperv1.EtcdDataEncryptionUpToDate)) returns nil, and in those cases call
meta.RemoveStatusCondition(&hcluster.Status.Conditions,
string(hyperv1.EtcdDataEncryptionUpToDate)); otherwise (when cond exists)
continue to set cond.ObservedGeneration = hcluster.Generation and
meta.SetStatusCondition as currently done.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: 3f62b72e-be23-40c7-8620-e6a59a833dde

📥 Commits

Reviewing files that changed from the base of the PR and between 72647a4 and d5e5097.

⛔ Files ignored due to path filters (18)
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_conditions.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/library-go/pkg/operator/encryption/controllers/migrators/errors.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/library-go/pkg/operator/encryption/controllers/migrators/inprocess.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/library-go/pkg/operator/encryption/controllers/migrators/inprocess_processor.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/library-go/pkg/operator/encryption/controllers/migrators/kubestorageversionmigrator.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/library-go/pkg/operator/encryption/controllers/migrators/metrics.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/library-go/pkg/operator/encryption/controllers/migrators/types.go is excluded by !vendor/**, !**/vendor/**
  • vendor/modules.txt is excluded by !vendor/**, !**/vendor/**
  • vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/informer/factory.go is excluded by !vendor/**, !**/vendor/**
  • vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/informer/generic.go is excluded by !vendor/**, !**/vendor/**
  • vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/informer/internalinterfaces/factory_interfaces.go is excluded by !vendor/**, !**/vendor/**
  • vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/informer/migration/interface.go is excluded by !vendor/**, !**/vendor/**
  • vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/informer/migration/v1alpha1/interface.go is excluded by !vendor/**, !**/vendor/**
  • vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/informer/migration/v1alpha1/storagestate.go is excluded by !vendor/**, !**/vendor/**
  • vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/informer/migration/v1alpha1/storageversionmigration.go is excluded by !vendor/**, !**/vendor/**
  • vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/lister/migration/v1alpha1/expansion_generated.go is excluded by !vendor/**, !**/vendor/**
  • vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/lister/migration/v1alpha1/storagestate.go is excluded by !vendor/**, !**/vendor/**
  • vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/lister/migration/v1alpha1/storageversionmigration.go is excluded by !vendor/**, !**/vendor/**
📒 Files selected for processing (8)
  • api/hypershift/v1beta1/hostedcluster_conditions.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption_test.go
  • control-plane-operator/hostedclusterconfigoperator/api/scheme.go
  • control-plane-operator/hostedclusterconfigoperator/cmd.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption_test.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go

Comment thread control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.go Outdated
Comment thread hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.go (1)

103-111: ⚠️ Potential issue | 🟠 Major

Don't mark re-encryption needed on the first hash write.

When EncryptionActiveKeyHashAnnotation is absent (empty string), the condition existingHash != fingerprint evaluates to true, which sets encryption-rekey-needed=true. This will spuriously trigger re-encryption migrations for new clusters and for existing clusters upgrading to this version.

The hash should always be persisted, but rekey-needed should only be set when a previous hash existed and differs from the new one.

Suggested fix
 	if fingerprint != "" {
 		if secret.Annotations == nil {
 			secret.Annotations = map[string]string{}
 		}
 		existingHash := secret.Annotations[EncryptionActiveKeyHashAnnotation]
-		if existingHash != fingerprint {
-			secret.Annotations[EncryptionActiveKeyHashAnnotation] = fingerprint
-			secret.Annotations[EncryptionRekeyNeededAnnotation] = "true"
+		secret.Annotations[EncryptionActiveKeyHashAnnotation] = fingerprint
+		if existingHash != "" && existingHash != fingerprint {
+			secret.Annotations[EncryptionRekeyNeededAnnotation] = "true"
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.go`
around lines 103 - 111, The current logic in secrtencryption.go sets
EncryptionRekeyNeededAnnotation when existingHash != fingerprint even if there
was no previous hash, causing spurious re-encryption; update the block around
secret.Annotations and EncryptionActiveKeyHashAnnotation so you always persist
the new fingerprint but only set EncryptionRekeyNeededAnnotation="true" if an
existing non-empty hash existed and is different (i.e., existingHash != "" &&
existingHash != fingerprint). Keep the nil-check for secret.Annotations and
ensure the new fingerprint is written to EncryptionActiveKeyHashAnnotation in
all cases.
control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go (1)

245-252: ⚠️ Potential issue | 🟡 Minor

Consider ordering: update status before removing annotation.

If the annotation removal at line 247 succeeds but the subsequent HCP status update in the parent Reconcile (line 138) fails, the next reconcile will find rekeyNeeded != "true" (line 167-168) and return early. This leaves the condition potentially stuck in its previous state (e.g., InProgress) rather than Completed.

To ensure consistency, consider splitting this into two reconcile cycles or restructuring to ensure status is persisted before annotation removal is committed.

Alternative approach: set condition before annotation removal
 	// All migrations succeeded — remove the rekey-needed annotation and set condition True.
 	log.Info("All re-encryption migrations completed successfully")
+	setReEncryptionCondition(hcp, metav1.ConditionTrue, hyperv1.ReEncryptionCompletedReason,
+		"All etcd data has been re-encrypted with the active encryption key")
+
+	// Note: If annotation removal fails after status update succeeds in parent Reconcile,
+	// the next reconcile will re-process and retry annotation removal (migrations already done).
 	if err := r.removeRekeyAnnotation(ctx, encryptionSecret); err != nil {
 		return reconcile.Result{}, fmt.Errorf("failed to remove rekey-needed annotation: %w", err)
 	}
-	setReEncryptionCondition(hcp, metav1.ConditionTrue, hyperv1.ReEncryptionCompletedReason,
-		"All etcd data has been re-encrypted with the active encryption key")
 	return reconcile.Result{}, nil
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go`
around lines 245 - 252, Reorder the success path so the HCP status is updated to
ReEncryptionCompleted before removing the rekey-needed annotation: call
setReEncryptionCondition(hcp, metav1.ConditionTrue,
hyperv1.ReEncryptionCompletedReason, ...) and ensure the status update is
persisted (via the same status update code used in Reconcile) prior to calling
removeRekeyAnnotation(ctx, encryptionSecret); if persisting status can fail,
return the error and only remove the annotation after status update succeeds,
ensuring the Reconcile loop sees the completed condition even if annotation
removal later fails; references: setReEncryptionCondition,
removeRekeyAnnotation, Reconcile, rekeyNeeded.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In
`@control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.go`:
- Around line 103-111: The current logic in secrtencryption.go sets
EncryptionRekeyNeededAnnotation when existingHash != fingerprint even if there
was no previous hash, causing spurious re-encryption; update the block around
secret.Annotations and EncryptionActiveKeyHashAnnotation so you always persist
the new fingerprint but only set EncryptionRekeyNeededAnnotation="true" if an
existing non-empty hash existed and is different (i.e., existingHash != "" &&
existingHash != fingerprint). Keep the nil-check for secret.Annotations and
ensure the new fingerprint is written to EncryptionActiveKeyHashAnnotation in
all cases.

In
`@control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go`:
- Around line 245-252: Reorder the success path so the HCP status is updated to
ReEncryptionCompleted before removing the rekey-needed annotation: call
setReEncryptionCondition(hcp, metav1.ConditionTrue,
hyperv1.ReEncryptionCompletedReason, ...) and ensure the status update is
persisted (via the same status update code used in Reconcile) prior to calling
removeRekeyAnnotation(ctx, encryptionSecret); if persisting status can fail,
return the error and only remove the annotation after status update succeeds,
ensuring the Reconcile loop sees the completed condition even if annotation
removal later fails; references: setReEncryptionCondition,
removeRekeyAnnotation, Reconcile, rekeyNeeded.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: 969834bc-33b1-4893-9f05-17b72afb37c2

📥 Commits

Reviewing files that changed from the base of the PR and between d5e5097 and 0d54a40.

⛔ Files ignored due to path filters (18)
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_conditions.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/library-go/pkg/operator/encryption/controllers/migrators/errors.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/library-go/pkg/operator/encryption/controllers/migrators/inprocess.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/library-go/pkg/operator/encryption/controllers/migrators/inprocess_processor.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/library-go/pkg/operator/encryption/controllers/migrators/kubestorageversionmigrator.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/library-go/pkg/operator/encryption/controllers/migrators/metrics.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/library-go/pkg/operator/encryption/controllers/migrators/types.go is excluded by !vendor/**, !**/vendor/**
  • vendor/modules.txt is excluded by !vendor/**, !**/vendor/**
  • vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/informer/factory.go is excluded by !vendor/**, !**/vendor/**
  • vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/informer/generic.go is excluded by !vendor/**, !**/vendor/**
  • vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/informer/internalinterfaces/factory_interfaces.go is excluded by !vendor/**, !**/vendor/**
  • vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/informer/migration/interface.go is excluded by !vendor/**, !**/vendor/**
  • vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/informer/migration/v1alpha1/interface.go is excluded by !vendor/**, !**/vendor/**
  • vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/informer/migration/v1alpha1/storagestate.go is excluded by !vendor/**, !**/vendor/**
  • vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/informer/migration/v1alpha1/storageversionmigration.go is excluded by !vendor/**, !**/vendor/**
  • vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/lister/migration/v1alpha1/expansion_generated.go is excluded by !vendor/**, !**/vendor/**
  • vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/lister/migration/v1alpha1/storagestate.go is excluded by !vendor/**, !**/vendor/**
  • vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/lister/migration/v1alpha1/storageversionmigration.go is excluded by !vendor/**, !**/vendor/**
📒 Files selected for processing (8)
  • api/hypershift/v1beta1/hostedcluster_conditions.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption_test.go
  • control-plane-operator/hostedclusterconfigoperator/api/scheme.go
  • control-plane-operator/hostedclusterconfigoperator/cmd.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption_test.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
✅ Files skipped from review due to trivial changes (1)
  • control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption_test.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • control-plane-operator/hostedclusterconfigoperator/cmd.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption_test.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
  • api/hypershift/v1beta1/hostedcluster_conditions.go

@muraee muraee force-pushed the etcd-reencryption-on-key-rotation branch from 0d54a40 to e29501b Compare April 13, 2026 16:35
@openshift-ci openshift-ci Bot added the area/documentation Indicates the PR includes changes for documentation label Apr 13, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go (1)

245-251: ⚠️ Potential issue | 🟠 Major

Persist ReEncryptionCompleted before clearing encryption-rekey-needed.

removeRekeyAnnotation() runs before the outer Reconcile() call writes hcp.Status. If the secret update succeeds but that later status update fails, the next pass sees rekey-needed absent and exits early, leaving EtcdDataEncryptionUpToDate stuck at its previous value. Move annotation removal to a second phase after the completed condition has been successfully persisted.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go`
around lines 245 - 251, The code currently removes the encryption-rekey-needed
annotation (removeRekeyAnnotation) before the controller persists the updated
HCP status (setReEncryptionCondition), which can cause the requeue logic to skip
remaining work if the status update later fails; change the order so you first
set and persist the ReEncryptionCompleted condition on the HostedControlPlane
(call setReEncryptionCondition and ensure the status update is saved via the
status writer used in Reconcile), verify that the status update succeeded, and
only then call removeRekeyAnnotation to delete the encryption-rekey-needed
annotation; use the existing functions removeRekeyAnnotation,
setReEncryptionCondition and the same status update flow in Reconcile to
implement this two-phase commit.
hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go (1)

897-904: ⚠️ Potential issue | 🟠 Major

Remove the bubbled condition when HCP no longer reports it.

This only copies EtcdDataEncryptionUpToDate when the HCP currently has it. If encryption is disabled, the HCP is gone, or the reencryption controller removes the HCP condition, the old HostedCluster value sticks around and becomes stale.

Suggested fix
-	// Bubble up the EtcdDataEncryptionUpToDate condition from HCP when encryption is configured.
-	if hcp != nil && hcluster.Spec.SecretEncryption != nil {
-		cond := meta.FindStatusCondition(hcp.Status.Conditions, string(hyperv1.EtcdDataEncryptionUpToDate))
-		if cond != nil {
-			cond.ObservedGeneration = hcluster.Generation
-			meta.SetStatusCondition(&hcluster.Status.Conditions, *cond)
-		}
-	}
+	// Bubble up the EtcdDataEncryptionUpToDate condition from HCP when encryption is configured.
+	if hcluster.Spec.SecretEncryption == nil || hcp == nil {
+		meta.RemoveStatusCondition(&hcluster.Status.Conditions, string(hyperv1.EtcdDataEncryptionUpToDate))
+	} else if cond := meta.FindStatusCondition(hcp.Status.Conditions, string(hyperv1.EtcdDataEncryptionUpToDate)); cond != nil {
+		cond.ObservedGeneration = hcluster.Generation
+		meta.SetStatusCondition(&hcluster.Status.Conditions, *cond)
+	} else {
+		meta.RemoveStatusCondition(&hcluster.Status.Conditions, string(hyperv1.EtcdDataEncryptionUpToDate))
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go`
around lines 897 - 904, The current bubble-up only sets
EtcdDataEncryptionUpToDate when HCP has the condition, but never removes it when
HCP no longer reports it; update the block in hostedcluster_controller.go that
references hcp, hcluster, meta.FindStatusCondition and meta.SetStatusCondition
to also handle the nil case: if hcp is nil, hcluster.Spec.SecretEncryption is
nil, or meta.FindStatusCondition(...) returns nil, call
meta.RemoveStatusCondition(&hcluster.Status.Conditions,
string(hyperv1.EtcdDataEncryptionUpToDate)) to delete the stale condition;
otherwise keep the existing behavior of updating ObservedGeneration and calling
meta.SetStatusCondition.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go`:
- Around line 127-130: The Get calls that fetch the HostedControlPlane (hcp :=
resourcemanifests.HostedControlPlane(...) followed by r.cpClient.Get(...)) and
the similar Gets for the encryption config Secret and KAS Deployment should
treat apierrors.IsNotFound(err) as a transient condition instead of returning a
hard error; update the error handling in the r.cpClient.Get(...) blocks
(including the HCP fetch and the other Get calls around 158-176) to check
apierrors.IsNotFound(err) and return a non-error reconcile result (e.g.,
reconcile.Result{} or a short requeue with nil error) so the controller
idles/publishes waiting state instead of entering error backoff.
- Around line 165-166: The code uses
encryptionSecret.Annotations[kas.EncryptionActiveKeyHashAnnotation] (and the
same pattern at the other occurrence around lines ~200–201) as the writeKey for
EnsureMigration which is a deterministic key fingerprint and can collide when
rotating back to a previously used key; change this to a monotonic rotation
token annotation (e.g., kas.EncryptionRotationTokenAnnotation) that is
incremented (or replaced with a monotonic nonce) each time a rotation is
initiated and persisted on encryptionSecret, and update the code that reads the
annotation (the places referencing
encryptionSecret.Annotations[kas.EncryptionActiveKeyHashAnnotation]) to read the
rotation token instead and pass that token as the writeKey into EnsureMigration
so each rotation event is unique even if the underlying key value repeats.

---

Duplicate comments:
In
`@control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go`:
- Around line 245-251: The code currently removes the encryption-rekey-needed
annotation (removeRekeyAnnotation) before the controller persists the updated
HCP status (setReEncryptionCondition), which can cause the requeue logic to skip
remaining work if the status update later fails; change the order so you first
set and persist the ReEncryptionCompleted condition on the HostedControlPlane
(call setReEncryptionCondition and ensure the status update is saved via the
status writer used in Reconcile), verify that the status update succeeded, and
only then call removeRekeyAnnotation to delete the encryption-rekey-needed
annotation; use the existing functions removeRekeyAnnotation,
setReEncryptionCondition and the same status update flow in Reconcile to
implement this two-phase commit.

In `@hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go`:
- Around line 897-904: The current bubble-up only sets
EtcdDataEncryptionUpToDate when HCP has the condition, but never removes it when
HCP no longer reports it; update the block in hostedcluster_controller.go that
references hcp, hcluster, meta.FindStatusCondition and meta.SetStatusCondition
to also handle the nil case: if hcp is nil, hcluster.Spec.SecretEncryption is
nil, or meta.FindStatusCondition(...) returns nil, call
meta.RemoveStatusCondition(&hcluster.Status.Conditions,
string(hyperv1.EtcdDataEncryptionUpToDate)) to delete the stale condition;
otherwise keep the existing behavior of updating ObservedGeneration and calling
meta.SetStatusCondition.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: fe17a0ce-bc5e-4d0c-87ad-e9027725f712

📥 Commits

Reviewing files that changed from the base of the PR and between 0d54a40 and e29501b.

⛔ Files ignored due to path filters (4)
  • control-plane-operator/controllers/hostedcontrolplane/testdata/kube-apiserver/AROSwift/zz_fixture_TestControlPlaneComponents_kas_secret_encryption_config_secret.yaml is excluded by !**/testdata/**
  • docs/content/reference/aggregated-docs.md is excluded by !docs/content/reference/aggregated-docs.md
  • docs/content/reference/api.md is excluded by !docs/content/reference/api.md
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_conditions.go is excluded by !vendor/**, !**/vendor/**
📒 Files selected for processing (9)
  • api/hypershift/v1beta1/hostedcluster_conditions.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption_test.go
  • control-plane-operator/hostedclusterconfigoperator/api/scheme.go
  • control-plane-operator/hostedclusterconfigoperator/cmd.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption_test.go
  • go.mod
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
✅ Files skipped from review due to trivial changes (1)
  • control-plane-operator/hostedclusterconfigoperator/api/scheme.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • control-plane-operator/hostedclusterconfigoperator/cmd.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption_test.go
  • api/hypershift/v1beta1/hostedcluster_conditions.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.go

@codecov

codecov Bot commented Apr 13, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 54.60123% with 74 lines in your changes missing coverage. Please review.
✅ Project coverage is 41.74%. Comparing base (4755e9c) to head (58a38b5).

Files with missing lines Patch % Lines
...rator/controllers/hostedcontrolplane/v2/kas/kms.go 52.57% 33 Missing and 13 partials ⚠️
...ontrollers/hostedcontrolplane/v2/kas/deployment.go 0.00% 15 Missing ⚠️
...or/controllers/hostedcontrolplane/v2/kas/aescbc.go 66.66% 2 Missing and 2 partials ⚠️
...r/controllers/hostedcontrolplane/v2/kas/kms/aws.go 81.25% 2 Missing and 1 partial ⚠️
...controllers/hostedcontrolplane/v2/kas/kms/azure.go 82.35% 2 Missing and 1 partial ⚠️
...trollers/hostedcontrolplane/v2/kas/kms/ibmcloud.go 0.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8219      +/-   ##
==========================================
+ Coverage   41.54%   41.74%   +0.20%     
==========================================
  Files         758      765       +7     
  Lines       93838    94723     +885     
==========================================
+ Hits        38986    39545     +559     
- Misses      52107    52374     +267     
- Partials     2745     2804      +59     
Files with missing lines Coverage Δ
...ostedcontrolplane/hostedcontrolplane_controller.go 45.73% <100.00%> (+0.02%) ⬆️
...ontrollers/hostedcontrolplane/v2/cvo/deployment.go 41.17% <100.00%> (+0.58%) ⬆️
...r/controllers/hostedcontrolplane/v2/kas/kms/kms.go 100.00% <ø> (ø)
...lers/hostedcontrolplane/v2/kas/secretencryption.go 53.60% <ø> (+21.04%) ⬆️
...lane/v2/kube_storage_version_migrator/component.go 0.00% <ø> (ø)
...-plane-operator/hostedclusterconfigoperator/cmd.go 0.00% <ø> (ø)
...configoperator/controllers/reencryption/metrics.go 88.46% <ø> (ø)
...goperator/controllers/reencryption/reencryption.go 68.12% <ø> (ø)
...erconfigoperator/controllers/reencryption/setup.go 0.00% <ø> (ø)
...trollers/hostedcluster/hostedcluster_controller.go 45.91% <ø> (+0.02%) ⬆️
... and 11 more
Flag Coverage Δ
cmd-support 35.22% <ø> (+0.26%) ⬆️
cpo-hostedcontrolplane 43.92% <54.60%> (+0.32%) ⬆️
cpo-other 43.80% <ø> (+0.63%) ⬆️
hypershift-operator 51.49% <ø> (-0.14%) ⬇️
other 31.56% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@muraee muraee force-pushed the etcd-reencryption-on-key-rotation branch from e29501b to e1b396f Compare April 13, 2026 19:07

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go (2)

254-263: ⚠️ Potential issue | 🟡 Minor

Completed status can still be lost on the failure path.

Line 261 persists annotation removal before the caller persists the updated HCP status. If removeRekeyAnnotation() succeeds and the later Status().Update() fails, the next reconcile hits rekeyNeeded != "true" and exits early, leaving EtcdDataEncryptionUpToDate stale instead of ReEncryptionCompleted.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go`
around lines 254 - 263, The code removes the rekey-needed annotation before the
updated HCP status is persisted, risking loss of the Completed condition if the
subsequent status update fails; change the order so you persist the updated HCP
status with the ReEncryptionCompleted condition (call the HCP status update via
r.Client.Status().Update or the existing status update helper) first, check that
it succeeded, and only then call removeRekeyAnnotation(encryptionSecret);
reference setReEncryptionCondition, the HCP Status().Update call, and
removeRekeyAnnotation to locate and reorder the operations.

169-171: ⚠️ Potential issue | 🟠 Major

Use a rotation token here, not the active-key fingerprint.

activeKeyHash identifies the current key, not the rotation event. If a cluster rotates A → B → A, this passes the same writeKey back into EnsureMigration, so an older completed migration for key A can satisfy the check even though the latest rotation still needs objects rewritten from B to A.

Also applies to: 208-210

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go`
around lines 169 - 171, The code is using the active-key fingerprint
(encryptionSecret.Annotations[kas.EncryptionActiveKeyHashAnnotation]) to drive
EnsureMigration, which breaks for A→B→A rotations; instead read and use the
rotation token annotation (e.g.,
encryptionSecret.Annotations[kas.EncryptionRotationTokenAnnotation]) as the
writeKey passed to EnsureMigration so each rotation event is unique; update the
usage in the block that reads rekeyNeeded and activeKeyHash and the duplicate
occurrence around the EnsureMigration call (the lines referencing
kas.EncryptionActiveKeyHashAnnotation and the EnsureMigration/writeKey variable)
to use the rotation-token annotation and pass that token through.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.go`:
- Around line 188-199: The fingerprinting currently includes the Secret name
(secretEncryption.AESCBC.ActiveKey.Name) which causes renames/copies to change
the fingerprint; change the input construction in the AESCBC case to only
fingerprint the key bytes: compute the SHA256 of
activeKeySecret.Data[hyperv1.AESCBCKeySecretKey] and set input to include only
the algorithm and the hash (e.g. "aescbc/<sha256>"), removing any use of
ActiveKey.Name; ensure you reference activeKeySecret and
hyperv1.AESCBCKeySecretKey when locating the key bytes and preserve existing
error handling around cpContext.Client.Get.

---

Duplicate comments:
In
`@control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go`:
- Around line 254-263: The code removes the rekey-needed annotation before the
updated HCP status is persisted, risking loss of the Completed condition if the
subsequent status update fails; change the order so you persist the updated HCP
status with the ReEncryptionCompleted condition (call the HCP status update via
r.Client.Status().Update or the existing status update helper) first, check that
it succeeded, and only then call removeRekeyAnnotation(encryptionSecret);
reference setReEncryptionCondition, the HCP Status().Update call, and
removeRekeyAnnotation to locate and reorder the operations.
- Around line 169-171: The code is using the active-key fingerprint
(encryptionSecret.Annotations[kas.EncryptionActiveKeyHashAnnotation]) to drive
EnsureMigration, which breaks for A→B→A rotations; instead read and use the
rotation token annotation (e.g.,
encryptionSecret.Annotations[kas.EncryptionRotationTokenAnnotation]) as the
writeKey passed to EnsureMigration so each rotation event is unique; update the
usage in the block that reads rekeyNeeded and activeKeyHash and the duplicate
occurrence around the EnsureMigration call (the lines referencing
kas.EncryptionActiveKeyHashAnnotation and the EnsureMigration/writeKey variable)
to use the rotation-token annotation and pass that token through.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: b08bffff-7931-4567-990b-d445f0cabf23

📥 Commits

Reviewing files that changed from the base of the PR and between e29501b and e1b396f.

⛔ Files ignored due to path filters (4)
  • control-plane-operator/controllers/hostedcontrolplane/testdata/kube-apiserver/AROSwift/zz_fixture_TestControlPlaneComponents_kas_secret_encryption_config_secret.yaml is excluded by !**/testdata/**
  • docs/content/reference/aggregated-docs.md is excluded by !docs/content/reference/aggregated-docs.md
  • docs/content/reference/api.md is excluded by !docs/content/reference/api.md
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_conditions.go is excluded by !vendor/**, !**/vendor/**
📒 Files selected for processing (9)
  • api/hypershift/v1beta1/hostedcluster_conditions.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption_test.go
  • control-plane-operator/hostedclusterconfigoperator/api/scheme.go
  • control-plane-operator/hostedclusterconfigoperator/cmd.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption_test.go
  • go.mod
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
✅ Files skipped from review due to trivial changes (3)
  • control-plane-operator/hostedclusterconfigoperator/api/scheme.go
  • go.mod
  • control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption_test.go
  • api/hypershift/v1beta1/hostedcluster_conditions.go

Comment thread control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.go Outdated
@muraee muraee force-pushed the etcd-reencryption-on-key-rotation branch from e1b396f to d9f1f6a Compare April 14, 2026 10:04
@devguyio

devguyio commented Apr 14, 2026

Copy link
Copy Markdown
Contributor

I now have all the information needed for a complete root cause analysis. Here is the report:


Test Failure Analysis Complete

Job Information

Test Failure Analysis

Error

Commit fab20778d3:
1: CT1 Title does not start with one of fix, feat, chore, docs, style, refactor, perf, test, revert, ci, build:
   "vendor: add library-go migrators and kube-storage-version-migrator informer/lister"

Root Cause

The first commit in the PR (`fab20778d3`) has a title that **violates the Conventional Commits specification** enforced by the repository's `.gitlint` configuration.

The commit title is:

vendor: add library-go migrators and kube-storage-version-migrator informer/lister

The .gitlint configuration at the repository root requires titles to follow Conventional Commits format, with the allowed type prefixes being:

types = fix,feat,chore,docs,style,refactor,perf,test,revert,ci,build

The prefix vendor is not in the allowed list. The gitlint contrib-title-conventional-commits rule (CT1) rejects any commit title that does not begin with one of these types followed by a colon.

The second commit (d5e509743d) passes validation — its title feat: add etcd data re-encryption after encryption key rotation correctly uses the feat prefix.

Recommendations

These are some recommendations based on the failure analysis 1. **Reword the first commit's title** to use an allowed Conventional Commits prefix. The most appropriate prefix for a vendoring commit is `chore` or `build`: ``` chore: vendor library-go migrators and kube-storage-version-migrator informer/lister ``` or: ``` build: vendor library-go migrators and kube-storage-version-migrator informer/lister ```
  1. Interactive rebase to amend the commit message:
    git rebase -i HEAD~2
    # Change 'pick' to 'reword' for the first commit
    # Update the title to use chore: or build: prefix
    git push --force-with-lease

Evidence

This is some collected evidence that helped reach the mentioned root cause
Evidence Detail
Failing commit fab20778d318dbac81809b5f18434bbfacd3716e
Commit title vendor: add library-go migrators and kube-storage-version-migrator informer/lister
Gitlint rule violated CT1 (contrib-title-conventional-commits)
Invalid prefix used vendor
Allowed prefixes fix, feat, chore, docs, style, refactor, perf, test, revert, ci, build
Passing commit d5e509743d — title correctly uses feat: prefix

@muraee muraee force-pushed the etcd-reencryption-on-key-rotation branch from d9f1f6a to 968dac4 Compare April 14, 2026 10:14

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go (1)

135-136: ⚠️ Potential issue | 🟡 Minor

Handle deleted HostedControlPlane as a no-op.

If the watched HCP is already gone, this returns an error and the controller backs off during normal teardown. Treat apierrors.IsNotFound(err) here the same way as the Secret/Deployment lookups and just exit cleanly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go`
around lines 135 - 136, When fetching the HostedControlPlane (hcp) with
r.cpClient.Get in the Reconcile handler, treat a not-found error as a benign
delete and exit cleanly instead of returning an error; update the error handling
around r.cpClient.Get(ctx, client.ObjectKeyFromObject(hcp), hcp) to check
apierrors.IsNotFound(err) and return reconcile.Result{}, nil for that case,
while preserving the existing fmt.Errorf path for other errors.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go`:
- Around line 185-200: The code treats a missing
EncryptionMigratedKeyHashAnnotation as "first deployment" and initializes it to
activeKeyHash (via setMigratedKeyHash), which incorrectly skips re-encryption
when kas-secret-encryption-config is recreated during a pending rotation; modify
the logic in the reconcile path that reads EncryptionMigratedKeyHashAnnotation
and compares it to activeKeyHash so that both the "initialize migrated key hash"
branch and the migratedKeyHash == activeKeyHash fast-path are only taken when
the rekey-needed signal (hypershift.openshift.io/encryption-rekey-needed) is not
present/true; i.e., check the hypershift.openshift.io/encryption-rekey-needed
annotation before calling setMigratedKeyHash or returning early on equal hashes
so pending rekey flows continue to perform re-encryption.
- Around line 363-367: The code uses
activeKeySecret.Data[hyperv1.AESCBCKeySecretKey] without validating it and then
hashes potentially empty bytes; update the logic around the c.Get(...) result in
reencryption.go to check that activeKeySecret.Data contains a non-empty entry
for hyperv1.AESCBCKeySecretKey before computing input (the fingerprint), and if
the key is missing or empty return a clear error (or handle it upstream) instead
of generating a fingerprint for empty bytes; adjust any callers that rely on
input/EncryptionMigratedKeyHashAnnotation to expect this error path.

---

Duplicate comments:
In
`@control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go`:
- Around line 135-136: When fetching the HostedControlPlane (hcp) with
r.cpClient.Get in the Reconcile handler, treat a not-found error as a benign
delete and exit cleanly instead of returning an error; update the error handling
around r.cpClient.Get(ctx, client.ObjectKeyFromObject(hcp), hcp) to check
apierrors.IsNotFound(err) and return reconcile.Result{}, nil for that case,
while preserving the existing fmt.Errorf path for other errors.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: 68ffe0d4-c58d-4278-b70b-957f7c8b917b

📥 Commits

Reviewing files that changed from the base of the PR and between e1b396f and d9f1f6a.

⛔ Files ignored due to path filters (4)
  • control-plane-operator/controllers/hostedcontrolplane/testdata/kube-apiserver/AROSwift/zz_fixture_TestControlPlaneComponents_kas_secret_encryption_config_secret.yaml is excluded by !**/testdata/**
  • docs/content/reference/aggregated-docs.md is excluded by !docs/content/reference/aggregated-docs.md
  • docs/content/reference/api.md is excluded by !docs/content/reference/api.md
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_conditions.go is excluded by !vendor/**, !**/vendor/**
📒 Files selected for processing (8)
  • api/hypershift/v1beta1/hostedcluster_conditions.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.go
  • control-plane-operator/hostedclusterconfigoperator/api/scheme.go
  • control-plane-operator/hostedclusterconfigoperator/cmd.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption_test.go
  • go.mod
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
✅ Files skipped from review due to trivial changes (3)
  • control-plane-operator/hostedclusterconfigoperator/api/scheme.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • go.mod

@muraee

muraee commented Apr 14, 2026

Copy link
Copy Markdown
Contributor Author

Re: @coderabbitai's comment on AESCBC key validation:

Done. Added validation to return empty fingerprint when key data is missing or empty.


AI-assisted response via Claude Code

@muraee

muraee commented Apr 14, 2026

Copy link
Copy Markdown
Contributor Author

Re: @coderabbitai's comment on missing migrated-hash during rotation:

The encryption-rekey-needed annotation was intentionally removed in this revision. The controller now detects re-encryption need by comparing the active key fingerprint (computed from HCP spec) against encryption-migrated-key-hash on the secret — no boolean signal involved.

The scenario described (secret recreated during pending rotation) requires manual deletion of an operator-managed secret, which would also disrupt KAS itself. In that edge case, the controller initializes the migrated hash to the current active key, which is correct — the CPO will recreate the secret with the current encryption config, and KAS will use the current key going forward. If re-encryption for the previous key was incomplete, the next key rotation will trigger a full re-encryption.


AI-assisted response via Claude Code

@muraee muraee force-pushed the etcd-reencryption-on-key-rotation branch from 968dac4 to 2b0d02c Compare April 14, 2026 10:42
@coderabbitai

coderabbitai Bot commented Apr 14, 2026

Copy link
Copy Markdown
Contributor

Tip

For best results, initiate chat on the files or code changes.

@muraee: Thanks for addressing this! Adding the guard to return an empty fingerprint when the AESCBC key data is missing or empty is the right approach — it prevents computing a potentially incorrect fingerprint from incomplete data and avoids triggering unnecessary re-encryption cycles.

@muraee muraee force-pushed the etcd-reencryption-on-key-rotation branch 2 times, most recently from aa68582 to faf7b6b Compare June 1, 2026 13:47
@muraee

muraee commented Jun 1, 2026

Copy link
Copy Markdown
Contributor Author

/label tide/merge-method-squash

@openshift-ci openshift-ci Bot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Jun 1, 2026
@github-actions github-actions Bot temporarily deployed to docs-preview/pr-8219 June 1, 2026 15:02 Inactive
@muraee muraee force-pushed the etcd-reencryption-on-key-rotation branch from c498eea to 8224a68 Compare June 1, 2026 15:49
@github-actions github-actions Bot temporarily deployed to docs-preview/pr-8219 June 1, 2026 15:54 Inactive
@muraee muraee force-pushed the etcd-reencryption-on-key-rotation branch 2 times, most recently from 381fe06 to 53edcca Compare June 1, 2026 16:19
@github-actions github-actions Bot temporarily deployed to docs-preview/pr-8219 June 1, 2026 16:31 Inactive
@muraee muraee force-pushed the etcd-reencryption-on-key-rotation branch from 53edcca to 375c307 Compare June 3, 2026 12:08
@github-actions github-actions Bot temporarily deployed to docs-preview/pr-8219 June 3, 2026 12:14 Inactive
@muraee muraee force-pushed the etcd-reencryption-on-key-rotation branch from 375c307 to 4f1ea83 Compare June 3, 2026 12:20
@github-actions github-actions Bot temporarily deployed to docs-preview/pr-8219 June 3, 2026 12:24 Inactive
@muraee muraee force-pushed the etcd-reencryption-on-key-rotation branch from 4f1ea83 to 16bf30b Compare June 5, 2026 12:44
@github-actions github-actions Bot temporarily deployed to docs-preview/pr-8219 June 5, 2026 12:48 Inactive
muraee and others added 10 commits June 10, 2026 16:29
…oDate condition

Add API types for tracking etcd data re-encryption after key rotation:
- SecretEncryptionStatus with ActiveKey, TargetKey, and History fields
- SecretEncryptionKeyStatus union type (Azure, AWS, IBMCloud, AESCBC)
- Per-provider key status types (AzureKMSKeyStatus, AWSKMSKeyStatus, etc.)
- EncryptionMigrationHistory and EncryptionMigrationState types
- EtcdDataEncryptionUpToDate condition with 7 reason constants
- SecretEncryption field added to HostedClusterStatus and HCPStatus
- Deprecated backupKey fields on AWSKMSSpec, AzureKMSSpec, AESCBCSpec

Signed-off-by: Mulham Raee <mulham.raee@gmail.com>
Commit-Message-Assisted-by: Claude (via Claude Code)
Auto-generated output from make update after API type changes.
Includes vendored library-go migrators and kube-storage-version-migrator
informer/lister packages required by the HCCO re-encryption controller.

Signed-off-by: Mulham Raee <mulham.raee@gmail.com>
Commit-Message-Assisted-by: Claude (via Claude Code)
…-up, and rotation guard VAP

- Add support/secretencryption/ package with fingerprint computation,
  key status construction, and EncryptionConfiguration parsing
- Copy EtcdDataEncryptionUpToDate condition from HCP to HostedCluster
  using the copy-if-present pattern
- Copy SecretEncryption status from HCP to HostedCluster
- Deploy ValidatingAdmissionPolicy that blocks active key changes while
  re-encryption is in progress (EtcdDataEncryptionUpToDate=False)

Signed-off-by: Mulham Raee <mulham.raee@gmail.com>
Commit-Message-Assisted-by: Claude (via Claude Code)
…out and deploy migrator

Two-stage KAS encryption key rollout:
- Rewrite adaptSecretEncryptionConfig to derive write/read key assignment
  from HCP status and live EncryptionConfiguration
- ReadOnlyDeploy: old key writes, new key read-only
- WritePromote: new key writes once in config and KAS converged
- Extract AWSKMSProviderName, AzureKMSProviderName, AESCBCKeyName as
  shared functions for provider name computation
- Add --encryption-provider-config-automatic-reload=false for Azure KMS

Control plane kube-storage-version-migrator:
- New CPOv2 component with platform predicate (skip IBM/PowerVS)
- Add data-plane migrator deployments to CVO resourcesToRemove

Signed-off-by: Mulham Raee <mulham.raee@gmail.com>
Commit-Message-Assisted-by: Claude (via Claude Code)
Add re-encryption controller to the HCCO that drives the key rotation
lifecycle by deriving the current phase from observable state:

- Detect key changes by comparing spec vs status fingerprints
- Derive phase from EncryptionConfiguration contents and KAS convergence
- Create StorageVersionMigration CRs via KubeStorageVersionMigrator
- Manage targetKey, activeKey, and history in HCP status
- Set EtcdDataEncryptionUpToDate condition with phase-specific reasons
- Expose Prometheus metrics for rotation state, duration, and failures

Signed-off-by: Mulham Raee <mulham.raee@gmail.com>
Commit-Message-Assisted-by: Claude (via Claude Code)
Signed-off-by: Mulham Raee <mulham.raee@gmail.com>
Commit-Message-Assisted-by: Claude (via Claude Code)
Signed-off-by: Mulham Raee <mulham.raee@gmail.com>
Commit-Message-Assisted-by: Claude (via Claude Code)
Replace the duplicated *KeyStatus types (AWSKMSKeyStatus,
AzureKMSKeyStatus, IBMCloudKMSKeyStatus) with the existing spec types
(AWSKMSKeyEntry, AzureKMSKey, IBMCloudKMSKeyEntry) in
SecretEncryptionKeyStatus. This follows the Kubernetes convention of
reusing spec types in status when the data shape is the same.

Also adds omitempty to the inner fields of the three spec types,
fixing a pre-existing API convention violation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…omitzero

Use metav1.Time (non-pointer) with omitzero instead of *metav1.Time
with omitempty. The IsZero() method on metav1.Time makes omitzero
correctly omit the field when unset, removing the need for a pointer.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix duplicate Azure KMS token minter container (copy-paste bug)
- Revert omitempty on IBMCloudKMSKeyEntry.KeyVersion (0 is valid)
- Add MinLength=1 to existing KMS key spec types alongside omitempty
- Use omitzero for AESCBCKeyStatus.Secret and StartedTime struct fields
- Remove unnecessary +k8s:deepcopy-gen=true markers on value-only types
- Remove unused ReEncryptionPersistentFailureReason constant
- Handle IsNotFound in HCCO isKASConverged and deriveTargetKeyRole
- Filter watches in setup.go by KAS deployment and encryption config secret
- Extract requeue interval constants
- Remove unused context parameter from handleMigratingPhase
- Gate recordMigrationDuration on state transition to avoid histogram inflation
- Add unit tests for DecodeEncryptionConfiguration, FingerprintFromKeyStatus
  (IBMCloud, AESCBC, empty-field branches), and KeyStatusFromSpec (AWS, IBMCloud)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@muraee muraee force-pushed the etcd-reencryption-on-key-rotation branch from 16bf30b to 58a38b5 Compare June 10, 2026 14:30
@hypershift-jira-solve-ci

hypershift-jira-solve-ci Bot commented Jun 10, 2026

Copy link
Copy Markdown

Now I have the full picture. Let me compile the final report.

Test Failure Analysis Complete

Job Information

Test Failure Analysis

Error

codecov/project: 41.51% (-0.04%) compared to 4755e9c — FAILURE
Project coverage dropped from 41.54% (base) to 41.51% (head).
Hits: -24 | Misses: +48 | Partials: -1 | Lines: +23

Summary

This was a transient Codecov race condition, not a real test or code failure. Codecov posted two codecov/project check runs against the same commit (58a38b5) within the same check suite — the first at 14:38:33 UTC reported 41.51% (−0.04%, failure) based on an incomplete/partial upload, and the second at 14:39:01 UTC reported 41.58% (+0.03%, success) once all coverage flag uploads (cpo-hostedcontrolplane, cpo-other, hypershift-operator, cmd-support, other) had completed. GitHub surfaced the first (failing) run at the link provided, but gh pr checks confirms the PR's current codecov/project status is passing. The failure is already auto-resolved with no action needed.

Root Cause

The codecov/project check run that failed (ID 80586133101) was a premature coverage report posted by Codecov before all coverage upload flags had finished processing. The openshift/hypershift repository uses 5 separate coverage flags with carryforward: true (cpo-hostedcontrolplane, cpo-other, hypershift-operator, cmd-support, other), and the codecov.yml has notify.wait_for_ci: false, which tells Codecov to post status checks as soon as partial results are available rather than waiting for all CI jobs to complete.

The sequence was:

  1. 14:38:33 UTC — Codecov posted the first codecov/project check with only partial flag uploads processed. This computed coverage as 41.51% (−0.04% vs base), resulting in a failure conclusion because overall project coverage decreased.
  2. 14:39:01 UTC — Codecov posted the second (final) codecov/project check after all flag uploads completed. This computed coverage as 41.58% (+0.03% vs base), resulting in a success conclusion.

Both check runs belong to the same check suite (73337402442) on the same commit. GitHub's check run API shows the later (passing) run as the authoritative result, and gh pr checks confirms codecov/project is now pass.

The −0.04% initial drop was an artifact of incomplete flag data — the carryforward: true setting means missing flags carry forward stale data from previous commits, which can cause temporary coverage fluctuations until all current uploads arrive.

Recommendations
  1. No action required — The failure is already auto-resolved. The PR's current codecov/project status is passing (41.58%, +0.03%).
  2. Optional: Set wait_for_ci: true — To prevent future transient failures from premature Codecov reports, consider changing codecov.yml from wait_for_ci: false to wait_for_ci: true. This ensures Codecov waits for all CI coverage uploads before posting the status check, eliminating the race window that caused this false failure.
  3. Ignore and merge — This codecov failure does not indicate any test failure or coverage regression. All modified and coverable lines are covered by tests per the Codecov report itself ("All modified and coverable lines are covered by tests").
Evidence
Evidence Detail
Failing check run ID 80586133101 — conclusion: failure, reported 41.51% (−0.04%), posted at 14:38:33 UTC
Passing check run ID 80586237364 — conclusion: success, reported 41.58% (+0.03%), posted at 14:39:01 UTC
Same check suite Both runs belong to check suite 73337402442 on commit 58a38b5
Current PR status gh pr checks shows codecov/project as pass
Codecov config wait_for_ci: false in codecov.yml allows premature reports
Coverage flags 5 flags with carryforward: true — partial uploads cause temporary fluctuations
Patch coverage 54.60% (well above the 41.54% target) — all changed lines are covered
Failing report note Even the failing report states: "✅ All modified and coverable lines are covered by tests"

@openshift-ci

openshift-ci Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

@muraee: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/api Indicates the PR includes changes for the API area/cli Indicates the PR includes changes for CLI area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/documentation Indicates the PR includes changes for documentation area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release area/platform/aws PR/issue for AWS (AWSPlatform) platform area/platform/azure PR/issue for Azure (AzurePlatform) platform area/platform/ibmcloud PR/issue for IBMCloud (IBMCloudPlatform) platform jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants