Skip to content

ARO-26764: narrow HostedCluster primary watch to actionable metadata updates#8424

Open
tuxerrante wants to merge 11 commits into
openshift:mainfrom
tuxerrante:tuxerrante/hostedcluster-generation-sync
Open

ARO-26764: narrow HostedCluster primary watch to actionable metadata updates#8424
tuxerrante wants to merge 11 commits into
openshift:mainfrom
tuxerrante:tuxerrante/hostedcluster-generation-sync

Conversation

@tuxerrante

@tuxerrante tuxerrante commented May 5, 2026

Copy link
Copy Markdown

What this PR does / why we need it:

The HostedCluster controller currently receives reconcile requests from its primary watch for updates that do not represent new desired state, especially status-only churn. That makes the controller noisier than it needs to be and makes it harder to reason about which reconciliations are actually driven by user intent versus controller-written status.

This change narrows only the primary HostedCluster watch so that it continues to reconcile on meaningful inputs, while avoiding self-induced reconciles from status-only updates.

The key design point is that this cannot be implemented as a pure generation-only filter. For HostedCluster, some metadata-only updates are already real control inputs for reconciliation and must continue to enqueue even when the object generation does not change. Examples include specific annotations consumed by reconciliation, mirrored api.openshift.com/* labels, scope transitions, and deletion start.

To keep the behavior correct while reducing noise, this PR introduces a dedicated primary predicate that allows:

  • generation-changing updates
  • deletion timestamp transitions
  • actionable annotation changes
  • actionable mirrored label changes

and filters out status-only updates from the primary watch.

The change is intentionally narrow:

  • child-resource watches are left unchanged
  • the API / CRD surface is unchanged
  • the existing ObservedGeneration status model remains in place
  • the reconcile loop still re-reads the current HostedCluster, so stale queued requests continue to act on the latest available object state

Alongside the watch change, this PR fixes two same-area behaviors that became important once metadata-only updates were treated more explicitly:

  1. ValidReleaseImage is re-evaluated when SkipReleaseImageValidation is added or removed without a generation bump, so a previously cached True condition cannot remain stale.
  2. mirrored api.openshift.com/* labels are actively removed from HostedControlPlane when they are removed from the HostedCluster, instead of only being copied on add/update.

Overall, the goal is to reduce unnecessary reconciliations without weakening any existing metadata-driven control path.

Which issue(s) this PR fixes:

Fixes ARO-26764

Notes:

This PR is easier to review if you read it as a queueing-boundary change rather than as a reconcile algorithm rewrite.

The new predicate does not try to guarantee one reconcile per generation. Instead, it makes the enqueue boundary stricter and more explicit. The "latest object wins" behavior still comes from the existing reconcile entrypoint, which fetches the current HostedCluster by NamespacedName when the request is processed. That means older queued requests still reconcile against the latest stored object state, not the old event payload.

Because of that, the review focus should be:

  • whether the allowlist of actionable metadata inputs is complete
  • whether status-only updates are now correctly filtered out from the primary watch
  • whether same-generation metadata changes still propagate correctly
  • whether the blast radius is appropriately contained to the shared HostedCluster primary watch and adjacent regression fixes

Expected impact:

  • fewer self-induced HostedCluster reconciliations from status-only updates
  • no intended behavior change for spec updates
  • no intended behavior change for metadata-only inputs that already affect reconciliation
  • HCP-wide effect because this path is shared across Hosted Control Plane distributions, not specific to one platform

Test plan:

  • make verify
  • predicate unit tests cover generation changes, status-only updates, actionable annotations, actionable labels, deletion timestamp changes, and scope transitions
  • falsification tests verify that stale queued requests still reconcile the latest HostedCluster generation and latest actionable metadata
  • regression tests cover same-generation SkipReleaseImageValidation toggles
  • regression tests cover removal of mirrored api.openshift.com/* labels from HostedControlPlane

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Summary by CodeRabbit

  • New Features

    • Added an annotation-based option to skip release image validation (records a clear skipped reason).
    • Labels with the OCM prefix are recognized and mirrored to the control plane.
  • Bug Fixes

    • Improved annotation and label synchronization to remove stale entries and only sync actionable prefixes.
    • Reconciliation conditions now correctly reflect generation-aware and skip-aware release-image validation outcomes.
  • Tests

    • Added extensive unit tests for predicates, label/annotation syncing, reconciliation conditions, and release-image validation logic.

tuxerrante and others added 3 commits May 5, 2026 17:13
Reduce self-induced HostedCluster reconciles by filtering the primary watch to
spec changes and explicit metadata triggers that already affect reconciliation.

Preserve mirrored annotation and label behavior with focused falsification tests
so stale queued requests still reconcile the latest HostedCluster state.

Signed-off-by: Alessandro Affinito <aaffinit@redhat.com>
Commit-Message-Assisted-by: Claude (via Claude Code)
Co-authored-by: Cursor <cursoragent@cursor.com>
Track when release image validation was skipped so same-generation annotation
changes can invalidate a cached True ValidReleaseImage condition.

Add a focused regression test matrix covering skip annotation add and remove
transitions without a generation bump.

Signed-off-by: Alessandro Affinito <aaffinit@redhat.com>
Commit-Message-Assisted-by: Claude (via Claude Code)
Co-authored-by: Cursor <cursoragent@cursor.com>
Update the new HostedCluster predicate code to use the k8sutil-scoped
annotation constants introduced on current main so the branch verifies cleanly
after rebasing.

Co-authored-by: Cursor <cursoragent@cursor.com>
@openshift-merge-bot

Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 5, 2026
@openshift-ci-robot

openshift-ci-robot commented May 5, 2026

Copy link
Copy Markdown

@tuxerrante: This pull request references ARO-26764 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the enhancement to target either version "5.0." or "openshift-5.0.", but it targets "ARO-Installer-4.21" instead.

Details

In response to this:

Summary

  • narrow the shared HostedCluster primary watch so HCP stops reconciling on status-only churn while still enqueueing on explicit metadata-only control inputs
  • keep child-resource watches and API/CRD surfaces unchanged, while preserving actionable annotations, mirrored api.openshift.com/* labels, delete-start handling, and release-image validation semantics
  • add falsification and regression coverage for latest-object-wins behavior, same-generation actionable metadata changes, mirrored label removal, and SkipReleaseImageValidation toggles

Test plan

  • make verify
  • Focused HostedCluster predicate, reconciliation, and release-image validation tests covered through the package test suite exercised by make verify
  • Optional follow-up: manually confirm a draft PR title/body and Jira linkage rendering as expected

Made with Cursor

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 5, 2026
@openshift-ci

openshift-ci Bot commented May 5, 2026

Copy link
Copy Markdown
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai

coderabbitai Bot commented May 5, 2026

Copy link
Copy Markdown
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR adds prefix-based HostedCluster predicates that trigger on generation, deletionTimestamp, actionable annotation/prefix-family, and actionable label-prefix changes. The controller now uses hostedClusterPrimaryPredicate, mirrors actionable HCP annotations/labels by prefix (removing stale prefixed entries), and refactors release-image validation to be generation- and skip-aware via shouldValidateReleaseImage and hasSkipReleaseImageValidationAnnotation, marking ValidReleaseImage as skipped when annotated.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant API as API Server
    participant Controller as HostedCluster Controller
    participant HC as HostedCluster
    participant HCP as HostedControlPlane
    participant Validator as ReleaseImageValidator

    API->>Controller: HostedCluster update event
    Controller->>Controller: hostedClusterPrimaryPredicate evaluates
    alt Predicate allows update
        Controller->>API: Get HostedCluster
        API-->>Controller: HostedCluster (annotations, labels, generation, deletionTimestamp)
        Controller->>API: Get HostedControlPlane
        API-->>Controller: HostedControlPlane (annotations, labels)
        Controller->>Controller: Compute actionable annotation/label diffs (prefix-aware)
        Controller->>HCP: Remove stale prefixed annotations/labels
        Controller->>HCP: Copy actionable prefixed annotations/labels from HC
        Controller->>Controller: shouldValidateReleaseImage?
        alt Skip annotation present
            Controller->>Controller: Set ValidReleaseImage True (ReleaseImageValidationSkipped)
        else Validate
            Controller->>Validator: validateReleaseImage(releaseImageRef, secrets...)
            alt Validation succeeds
                Validator-->>Controller: success
                Controller->>Controller: Set ValidReleaseImage True (AsExpected)
            else Validation fails
                Validator-->>Controller: error
                Controller->>Controller: Set ValidReleaseImage False (reason)
            end
        end
        Controller->>API: Update HCP and HC status/conditions
        API-->>Controller: persisted
    else Predicate blocks update
        Controller-->>API: no reconcile triggered
    end
Loading

Suggested reviewers

  • xiuwang
  • sdminonne
  • jparrill
🚥 Pre-merge checks | ✅ 10 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 4.35% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality ⚠️ Warning Tests use standard Go testing, not Ginkgo. Several assertions lack diagnostic messages to help diagnose failures (e.g., t.Fatalf without context on what condition failed). Add descriptive failure messages to assertions: t.Fatalf("expected condition %s found, got %v", type, actual) to help diagnose test failures.
✅ Passed checks (10 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the primary change: narrowing the HostedCluster primary watch to only enqueue on actionable metadata updates rather than all status changes.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed All test names across four test files are stable, deterministic, and use static descriptive strings without dynamic content, generated IDs, timestamps, or pod/node/namespace names.
Microshift Test Compatibility ✅ Passed PR adds no Ginkgo e2e tests—only standard Go unit tests in controllers directory using testing.T interface, not Ginkgo framework.
Single Node Openshift (Sno) Test Compatibility ✅ Passed PR adds only Go unit tests (testing.T) with no Ginkgo e2e tests. Custom check applies only to Ginkgo e2e tests, so not applicable here.
Topology-Aware Scheduling Compatibility ✅ Passed PR modifies only controller logic (predicates, label sync, validation). No scheduling constraints, pod affinity, topology spread, node selectors, or PDBs are introduced.
Ote Binary Stdout Contract ✅ Passed No OTE Binary Stdout Contract violations found. PR adds only function definitions, constants, and test code with no stdout-writing side effects in process-level code.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed All test files added are standard Go unit tests with in-memory fake clients, not Ginkgo e2e tests. No IPv4 assumptions or external connectivity requirements detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot added area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release and removed do-not-merge/needs-area labels May 5, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go (1)

2399-2407: ⚡ Quick win

Avoid hardcoded actionable-label prefix in HCP label reconciliation.

Line 2400 and Line 2405 duplicate "api.openshift.com" instead of reusing hostedClusterActionableLabelPrefixes. Keeping one source of truth prevents future drift between watch triggering and label mirroring behavior.

♻️ Suggested refactor
-	for key := range hcp.Labels {
-		if strings.HasPrefix(key, "api.openshift.com") {
-			delete(hcp.Labels, key)
-		}
-	}
+	for key := range hcp.Labels {
+		for _, prefix := range hostedClusterActionableLabelPrefixes {
+			if strings.HasPrefix(key, prefix) {
+				delete(hcp.Labels, key)
+				break
+			}
+		}
+	}
 	for key, val := range hcluster.Labels {
-		if strings.HasPrefix(key, "api.openshift.com") {
-			hcp.Labels[key] = val
-		}
+		for _, prefix := range hostedClusterActionableLabelPrefixes {
+			if strings.HasPrefix(key, prefix) {
+				hcp.Labels[key] = val
+				break
+			}
+		}
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go`
around lines 2399 - 2407, Replace the two hardcoded "api.openshift.com" checks
with the shared constant hostedClusterActionableLabelPrefixes: iterate the slice
hostedClusterActionableLabelPrefixes when pruning hcp.Labels and when copying
labels from hcluster to hcp (the code blocks that currently loop over hcp.Labels
and hcluster.Labels), check each label key against all prefixes in
hostedClusterActionableLabelPrefixes (e.g., using strings.HasPrefix in an inner
loop or a small helper like hasActionablePrefix(key)), and use that single
source of truth so both the deletion and assignment logic reference
hostedClusterActionableLabelPrefixes instead of the literal string.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go`:
- Around line 2399-2407: Replace the two hardcoded "api.openshift.com" checks
with the shared constant hostedClusterActionableLabelPrefixes: iterate the slice
hostedClusterActionableLabelPrefixes when pruning hcp.Labels and when copying
labels from hcluster to hcp (the code blocks that currently loop over hcp.Labels
and hcluster.Labels), check each label key against all prefixes in
hostedClusterActionableLabelPrefixes (e.g., using strings.HasPrefix in an inner
loop or a small helper like hasActionablePrefix(key)), and use that single
source of truth so both the deletion and assignment logic reference
hostedClusterActionableLabelPrefixes instead of the literal string.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 186ff352-7783-4ad9-9175-cd058d919415

📥 Commits

Reviewing files that changed from the base of the PR and between e09cc2d and 553451f.

📒 Files selected for processing (6)
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_hcp_labels_test.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_predicates.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_predicates_test.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_reconciliation_condition_test.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_release_image_validation_test.go

@codecov

codecov Bot commented May 5, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 83.92857% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 40.42%. Comparing base (e09cc2d) to head (b1377f9).
⚠️ Report is 415 commits behind head on main.

Files with missing lines Patch % Lines
...trollers/hostedcluster/hostedcluster_controller.go 74.35% 9 Missing and 1 partial ⚠️
...trollers/hostedcluster/hostedcluster_predicates.go 89.04% 6 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8424      +/-   ##
==========================================
+ Coverage   37.39%   40.42%   +3.02%     
==========================================
  Files         751      756       +5     
  Lines       91806    93277    +1471     
==========================================
+ Hits        34333    37708    +3375     
+ Misses      54838    52866    -1972     
- Partials     2635     2703      +68     
Files with missing lines Coverage Δ
...trollers/hostedcluster/hostedcluster_predicates.go 89.04% <89.04%> (ø)
...trollers/hostedcluster/hostedcluster_controller.go 43.12% <74.35%> (-0.12%) ⬇️

... and 73 files with indirect coverage changes

Flag Coverage Δ
cmd-support 34.44% <ø> (+1.87%) ⬆️
cpo-hostedcontrolplane 41.76% <ø> (+5.28%) ⬆️
cpo-other 40.31% <ø> (+2.58%) ⬆️
hypershift-operator 50.76% <83.92%> (+2.90%) ⬆️
other 31.58% <ø> (+3.80%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Keep HostedControlPlane label mirroring aligned with the shared actionable label
prefix list so the watch predicate and reconciliation logic stay in sync.

Co-authored-by: Cursor <cursoragent@cursor.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go (1)

1191-1218: 💤 Low value

Optional: gate the validateReleaseImage call on skipReleaseImageValidation for clarity.

When the skip annotation is present, r.validateReleaseImage is still invoked but its result is discarded by the case skipReleaseImageValidation: branch. The function happens to short-circuit internally (line 3708), so this is functionally correct, but the caller reads as if validation always runs. Guarding the call makes intent explicit and avoids relying on the callee's internal guard surviving future edits.

♻️ Suggested refactor
 		if shouldValidateReleaseImage(hcluster, condition) {
 			condition := metav1.Condition{
 				Type:               string(hyperv1.ValidReleaseImage),
 				ObservedGeneration: hcluster.Generation,
 			}
 			skipReleaseImageValidation := hasSkipReleaseImageValidationAnnotation(hcluster)
-			err := r.validateReleaseImage(ctx, hcluster, releaseProvider)
+			var err error
+			if !skipReleaseImageValidation {
+				err = r.validateReleaseImage(ctx, hcluster, releaseProvider)
+			}
 			switch {
 			case skipReleaseImageValidation:
 				condition.Status = metav1.ConditionTrue
 				condition.Message = "Release image validation is skipped by annotation"
 				condition.Reason = releaseImageValidationSkippedReason
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go`
around lines 1191 - 1218, The call to r.validateReleaseImage should be avoided
when the skip annotation is present to make the intent explicit: check
hasSkipReleaseImageValidationAnnotation(hcluster) (skipReleaseImageValidation)
before invoking r.validateReleaseImage and, if true, set the metav1.Condition
(Type string(hyperv1.ValidReleaseImage), Status True, Message "Release image
validation is skipped by annotation", Reason
releaseImageValidationSkippedReason) and call meta.SetStatusCondition without
calling validateReleaseImage; otherwise call r.validateReleaseImage and handle
err as currently done (setting Status/Message/Reason or AsExpectedReason) so the
skip path no longer relies on validateReleaseImage's internal short‑circuit.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go`:
- Around line 1191-1218: The call to r.validateReleaseImage should be avoided
when the skip annotation is present to make the intent explicit: check
hasSkipReleaseImageValidationAnnotation(hcluster) (skipReleaseImageValidation)
before invoking r.validateReleaseImage and, if true, set the metav1.Condition
(Type string(hyperv1.ValidReleaseImage), Status True, Message "Release image
validation is skipped by annotation", Reason
releaseImageValidationSkippedReason) and call meta.SetStatusCondition without
calling validateReleaseImage; otherwise call r.validateReleaseImage and handle
err as currently done (setting Status/Message/Reason or AsExpectedReason) so the
skip path no longer relies on validateReleaseImage's internal short‑circuit.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: f30f4f99-4b2a-4808-a4c7-d870c58c50b9

📥 Commits

Reviewing files that changed from the base of the PR and between 553451f and 1280cf1.

📒 Files selected for processing (1)
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go

@openshift-ci-robot

openshift-ci-robot commented May 6, 2026

Copy link
Copy Markdown

@tuxerrante: This pull request references ARO-26764 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the enhancement to target the "5.0.0" version, but no target version was set.

Details

In response to this:

What this PR does / why we need it:

The HostedCluster controller currently receives reconcile requests from its primary watch for updates that do not represent new desired state, especially status-only churn. That makes the controller noisier than it needs to be and makes it harder to reason about which reconciliations are actually driven by user intent versus controller-written status.

This change narrows only the primary HostedCluster watch so that it continues to reconcile on meaningful inputs, while avoiding self-induced reconciles from status-only updates.

The key design point is that this cannot be implemented as a pure generation-only filter. For HostedCluster, some metadata-only updates are already real control inputs for reconciliation and must continue to enqueue even when the object generation does not change. Examples include specific annotations consumed by reconciliation, mirrored api.openshift.com/* labels, scope transitions, and deletion start.

To keep the behavior correct while reducing noise, this PR introduces a dedicated primary predicate that allows:

  • generation-changing updates
  • deletion timestamp transitions
  • actionable annotation changes
  • actionable mirrored label changes

and filters out status-only updates from the primary watch.

The change is intentionally narrow:

  • child-resource watches are left unchanged
  • the API / CRD surface is unchanged
  • the existing ObservedGeneration status model remains in place
  • the reconcile loop still re-reads the current HostedCluster, so stale queued requests continue to act on the latest available object state

Alongside the watch change, this PR fixes two same-area behaviors that became important once metadata-only updates were treated more explicitly:

  1. ValidReleaseImage is re-evaluated when SkipReleaseImageValidation is added or removed without a generation bump, so a previously cached True condition cannot remain stale.
  2. mirrored api.openshift.com/* labels are actively removed from HostedControlPlane when they are removed from the HostedCluster, instead of only being copied on add/update.

Overall, the goal is to reduce unnecessary reconciliations without weakening any existing metadata-driven control path.

Which issue(s) this PR fixes:

Fixes ARO-26764

Special notes for your reviewer:

This PR is easier to review if you read it as a queueing-boundary change rather than as a reconcile algorithm rewrite.

The new predicate does not try to guarantee one reconcile per generation. Instead, it makes the enqueue boundary stricter and more explicit. The "latest object wins" behavior still comes from the existing reconcile entrypoint, which fetches the current HostedCluster by NamespacedName when the request is processed. That means older queued requests still reconcile against the latest stored object state, not the old event payload.

Because of that, the review focus should be:

  • whether the allowlist of actionable metadata inputs is complete
  • whether status-only updates are now correctly filtered out from the primary watch
  • whether same-generation metadata changes still propagate correctly
  • whether the blast radius is appropriately contained to the shared HostedCluster primary watch and adjacent regression fixes

Expected impact:

  • fewer self-induced HostedCluster reconciliations from status-only updates
  • no intended behavior change for spec updates
  • no intended behavior change for metadata-only inputs that already affect reconciliation
  • HCP-wide effect because this path is shared across Hosted Control Plane distributions, not specific to one platform

Test plan:

  • make verify
  • predicate unit tests cover generation changes, status-only updates, actionable annotations, actionable labels, deletion timestamp changes, and scope transitions
  • falsification tests verify that stale queued requests still reconcile the latest HostedCluster generation and latest actionable metadata
  • regression tests cover same-generation SkipReleaseImageValidation toggles
  • regression tests cover removal of mirrored api.openshift.com/* labels from HostedControlPlane

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Replace the dynamic platform override subtest titles with explicit static names
so the HostedCluster predicate tests satisfy deterministic naming checks
without changing the existing assertion style used in this package.

Signed-off-by: Alessandro Affinito <aaffinit@redhat.com>
Commit-Message-Assisted-by: Claude (via Claude Code)
Co-authored-by: Cursor <cursoragent@cursor.com>
@tuxerrante tuxerrante marked this pull request as ready for review May 6, 2026 09:18
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 6, 2026
@openshift-ci openshift-ci Bot requested review from Nirshal and muraee May 6, 2026 09:19

@jparrill jparrill left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped some comments. Thanks

Comment thread hypershift-operator/controllers/hostedcluster/hostedcluster_predicates.go Outdated
Comment thread hypershift-operator/controllers/hostedcluster/hostedcluster_predicates.go Outdated
Comment thread hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go Outdated
Comment thread hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go Outdated
Comment thread hypershift-operator/controllers/hostedcluster/hostedcluster_predicates_test.go Outdated
Comment thread hypershift-operator/controllers/hostedcluster/hostedcluster_predicates_test.go Outdated
tuxerrante and others added 4 commits May 22, 2026 11:31
…neric names

annotationPrefixChanged and prefixedAnnotations are used for both
annotations and labels. Rename to prefixedKeyChanged and prefixedEntries
with generic parameter names to avoid reader confusion when the
functions are called with label maps.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…on is present

Guard the validateReleaseImage call so it is not invoked when the skip
annotation is set. Previously the call ran but its result was discarded
by the switch branch, relying on an internal short-circuit. Making the
guard explicit avoids unnecessary work and makes intent clearer.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the local ptrToTime helper with ptr.To from k8s.io/utils/ptr,
which is already a dependency in this package. Rename the predicate
local variable to pred so it does not shadow the controller-runtime
predicate package import.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@hypershift-operator/controllers/hostedcluster/hostedcluster_predicates.go`:
- Around line 184-185: The predicate uses prefixedKeyChanged(oldLabels,
newLabels, prefix) with a loose prefix (e.g., "api.openshift.com") which matches
keys like "api.openshift.comfoo"; update the call/sites where prefix is defined
so the prefix includes a delimiter (use "api.openshift.com/" or similar) and
ensure prefixedKeyChanged compares with this delimiter-qualified prefix so only
keys in the api.openshift.com/* label family match; modify the prefix variable
passed to prefixedKeyChanged and any related construction logic in
hostedcluster_predicates.go to use the slash-terminated prefix.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 3b8d370c-7d3f-4346-8bfe-046cfe70cde6

📥 Commits

Reviewing files that changed from the base of the PR and between 037e346 and 18d2ac0.

📒 Files selected for processing (3)
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_predicates.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_predicates_test.go

@openshift-ci openshift-ci Bot added the area/api Indicates the PR includes changes for the API label May 22, 2026
tuxerrante and others added 2 commits May 22, 2026 12:29
…tant

Move the hardcoded "api.openshift.com/" label prefix to the API package
as OCMLabelPrefix so consumers outside the controller (ARO-HCP, OCM)
have a single source of truth for the prefix used to mirror labels from
HostedCluster to HostedControlPlane.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…to API package

Move releaseImageValidationSkippedReason from a controller-private
constant to the API package as ReleaseImageValidationSkippedReason,
alongside the existing AsExpectedReason and InvalidImageReason. This
allows external consumers (ARO-HCP, OCM) to inspect the ValidReleaseImage
condition reason programmatically without hardcoding the string.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@tuxerrante tuxerrante force-pushed the tuxerrante/hostedcluster-generation-sync branch from e0e1fc8 to b1377f9 Compare May 22, 2026 10:29
@openshift-ci

openshift-ci Bot commented May 22, 2026

Copy link
Copy Markdown
Contributor

@tuxerrante: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@JoelSpeed

Copy link
Copy Markdown
Contributor

/approve for API

@openshift-ci

openshift-ci Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: JoelSpeed, tuxerrante
Once this PR has been reviewed and has the lgtm label, please assign sjenning for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 11, 2026
@openshift-ci

openshift-ci Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@hypershift-jira-solve-ci

hypershift-jira-solve-ci Bot commented Jun 11, 2026

Copy link
Copy Markdown

Now I have complete evidence. Here's the comprehensive analysis:

Test Failure Analysis Complete

Job Information

  • Prow Job: Red Hat Konflux / hypershift-operator-main-enterprise-contract / hypershift-operator-main
  • Build ID: hypershift-operator-main-enterprise-contract-flx8s (Pipeline Run)
  • Second Job: Red Hat Konflux / hypershift-operator-enterprise-contract / hypershift-operator-main
  • Second Build ID: hypershift-operator-enterprise-contract-r9hwm
  • PR: ARO-26764: narrow HostedCluster primary watch to actionable metadata updates #8424 (ARO-26764: narrow HostedCluster primary watch to actionable metadata updates)
  • Head SHA: b1377f9
  • Snapshot: hypershift-operator-20260522-103006-000
  • Check Results: 254 successes, 24 warnings, 2 failures (both jobs identical)

Test Failure Analysis

Error

Enterprise Contract verify task FAILURE:
  ✅ 254 success(es)
  ⚠️ 24 warning(s)
  ❌ 2 failure(s)

Two EC policy violations detected in the Tekton pipeline provenance
used to build the hypershift-operator-main container image.

Summary

Both Konflux enterprise-contract checks fail because PR #8424's branch was forked from main at commit e09cc2d2 (2026-05-05), which predates two critical .tekton/ pipeline fixes merged on 2026-05-20/21. The PR's pipeline definitions are missing (1) the enable-package-registry-proxy: "true" parameter in the prefetch-dependencies-oci-ta task and (2) the build-image-index 0.2 → 0.3 migration (which removed deprecated COMMIT_SHA and IMAGE_EXPIRES_AFTER parameters). Because Konflux Pipelines-as-Code fetches pipeline definitions from the PR branch, not main, the image was built using outdated pipeline tasks that the Enterprise Contract validation correctly rejects. The PR's functional code changes (Go source files in api/ and hypershift-operator/controllers/) are not the cause — this is a stale branch issue. Rebasing the PR onto current main will resolve both failures.

Root Cause

The 2 enterprise-contract failures are caused by outdated Tekton pipeline task configurations on the PR branch, not by the PR's code changes.

How Konflux Pipelines-as-Code works: When a PR triggers a Konflux build, the pipeline definition is fetched from the PR's HEAD commit, not from the main branch. This means if the PR branch has stale .tekton/ files, the build runs with outdated pipeline tasks, and the subsequent Enterprise Contract (EC) verification detects the outdated/misconfigured tasks as policy violations.

PR branch fork point: The PR branch was forked from main at commit e09cc2d208a469 (2026-05-05T14:06:42Z — merge of PR #8414). The PR has 11 commits, all authored by the PR author, and was last pushed on 2026-05-22. Despite new commits being added, the branch was never rebased onto current main, so the .tekton/ directory remains at the 2026-05-05 state.

Two critical .tekton/ fixes merged after the fork point:

  1. Missing enable-package-registry-proxy parameter (commit b4241295, merged ~2026-05-20):

    • Added enable-package-registry-proxy: "true" to the prefetch-dependencies-oci-ta task in both .tekton/pipelines/common-operator-build.yaml and .tekton/hypershift-operator-main-tag.yaml
    • The EC policy requires this parameter for the prefetch-dependencies task to properly use the package registry proxy
    • PR branch: missing this parameter entirely
    • Main branch: has enable-package-registry-proxy: "true"
  2. Outdated build-image-index task (v0.2 → v0.3 migration) (PR NO-JIRA: Update Konflux Tekton task bundles #8557, merged 2026-05-20):

    • build-image-index was bumped from 0.2 to 0.3
    • v0.3 removed deprecated COMMIT_SHA and IMAGE_EXPIRES_AFTER parameters
    • PR branch: uses v0.2 (sha256:c7b0f7e1...) with deprecated params COMMIT_SHA and IMAGE_EXPIRES_AFTER
    • Main branch: uses v0.3 (sha256:b33bfa8d...) without those params, and adds ALWAYS_BUILD_INDEX: "true"

Additionally, all 18 task bundle digests in .tekton/pipelines/common-operator-build.yaml differ between the PR branch and main, including version bumps for clamav-scan (0.3 → 0.3.1) and rpms-signature-scan (0.2 → 0.2.1).

Confirmation via comparison with recent PRs:

The arithmetic confirms: 254 + 2 = 256 successes — the 2 failures on #8424 correspond exactly to the 2 additional successes seen on up-to-date PRs.

Recommendations
  1. Rebase PR ARO-26764: narrow HostedCluster primary watch to actionable metadata updates #8424 onto latest main to pick up the .tekton/ pipeline fixes:

    git fetch upstream
    git rebase upstream/main
    # Resolve merge conflicts (PR has CONFLICTING merge state)
    git push --force-with-lease
  2. No functional code changes needed — the PR's Go source changes (hostedcluster controller, predicates, API types) are unrelated to the EC failures.

  3. Expected outcome after rebase: The enterprise-contract checks should change from FAILURE (2 failures) to NEUTRAL (0 failures, warnings only) — matching the behavior of all recently updated PRs (NO-JIRA: Remove apply client workaround for controller-runtime < 0.22 #8693OCPBUGS-13553: Trigger NodePool rollout on in-place pull secret changes #8700).

  4. Note: The PR already has the needs-rebase label and is in CONFLICTING merge state, so a rebase is required for merge regardless.

Evidence
Evidence Detail
PR fork point Commit e09cc2d2 on main (2026-05-05), before .tekton/ fixes
Missing config #1 enable-package-registry-proxy: "true" not present in PR branch's prefetch-dependencies-oci-ta task (added by commit b4241295 on 2026-05-20)
Missing config #2 build-image-index task at v0.2 with deprecated COMMIT_SHA/IMAGE_EXPIRES_AFTER params (main has v0.3 without them, via PR #8557 merged 2026-05-20)
Task bundle divergence All 18 task bundle digests in common-operator-build.yaml differ between PR branch and main
EC result (PR #8424) 254 successes, 24 warnings, 2 failures → conclusion: FAILURE
EC result (PR #8700, up-to-date) 256 successes, 22 warnings, 0 failures → conclusion: NEUTRAL
Both checks identical hypershift-operator-main-enterprise-contract and hypershift-operator-enterprise-contract show same 2 failures
PR merge state CONFLICTING / DIRTY — already requires rebase
PR labels needs-rebase, jira/valid-reference, area/hypershift-operator, area/api
Commits behind main 418 commits behind as of 2026-06-11
PR does NOT modify .tekton/ Changed files are only in api/, hypershift-operator/controllers/, and vendor/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/api Indicates the PR includes changes for the API area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants