CNTRLPLANE-3145: refactor(hostedcluster): segregate reconcile loop into error-collecting blocks by muraee · Pull Request #7908 · openshift/hypershift

muraee · 2026-03-10T16:10:27Z

Summary

Refactors reconcile() in the HostedCluster controller to use categorized error handling with critical/non-critical operations instead of sequential short-circuiting. Previously, any single failure among ~50 operations would block all subsequent work — e.g., a missing SSH key secret prevented CPO deployment and HCP creation.
Introduces a reconcileReport struct (reconcile_report.go) that classifies operations as critical (blocks downstream Phase 8) or nonCritical (errors collected, never blocks). When critical operations fail, Phase 8 components are automatically skipped with clear reporting of what failed and what was blocked.
Extracts inline code blocks into named methods and introduces wrapper methods (reconcileOperatorDeployments, reconcileRBACAndPolicies, reconcileKubeconfigAndPasswordSync, reconcileAuxiliary, reconcilePlatformSpecific) that collect errors independently.
The ReconciliationSucceeded condition now reflects the structured report: when critical failures exist, the condition message surfaces which operations failed and which were blocked (e.g., critical failures: [PullSecretSync]; blocked operations: [OperatorDeployments, RBACAndPolicies, ...]).

Key changes

Error categorization

Category	Behavior	Operations
critical	Failures block Phase 8 components	PlatformCredentials, PullSecretSync, SecretEncryptionSync, CoreHCPChain
nonCritical	Errors collected, never blocks	SSHKeySync, AuditWebhookSync, AdditionalTrustBundle, all Phase 8 groups

Phase structure

Phase	Behavior	Operations
0–5	Short-circuit (prerequisites)	HCP get, deletion, platform defaults, status, finalizers, namespace, platform
6a	Critical sync (error-collecting)	PlatformCredentials, PullSecretSync, SecretEncryptionSync
6b	Non-critical sync (error-collecting, never blocked)	RestoredFromBackup, AuditWebhookSync, SSHKeySync, AdditionalTrustBundle, SA signing key, etcd MTLS, ETCDMemberRecovery, GlobalConfigSync
7	Core HCP chain (always runs regardless of 6a)	HCP object → CAPI InfraCR → CAPI Cluster
8	Components — blocked if any critical failure	KubeconfigAndPasswordSync, OperatorDeployments, RBACAndPolicies, PlatformOIDCAndCSI, MonitoringAndCLISecrets

Condition reporting

The ReconciliationSucceeded condition now reflects the structured error report:

When critical failures exist, the condition message includes which operations failed and which were blocked
When only non-critical failures exist, the condition reports the aggregate error as before
Example condition message: critical failures: [PullSecretSync]; blocked operations: [KubeconfigAndPasswordSync, OperatorDeployments, RBACAndPolicies, PlatformOIDCAndCSI, MonitoringAndCLISecrets]

Structured error aggregation

When critical failures exist, aggregate() returns only critical errors with blocked operation list — non-critical errors are suppressed since the user should fix the critical issue first:

critical error: failed to get pull secret...; blocked operations: [KubeconfigAndPasswordSync, OperatorDeployments, RBACAndPolicies, PlatformOIDCAndCSI, MonitoringAndCLISecrets]

When no critical failures exist, all errors are returned as-is.

reconcileReport API

Two main methods on the report:

execute(name, category, func() error) — always runs the operation and records the result
executeOrBlock(name, func() error) — automatically checks hasCriticalFailure() and either runs the operation or records it as blocked

Analysis

See docs/design/hostedcluster-reconcile-segregation-analysis.md for the full design.

Test plan

All existing unit tests pass (go test -count=1 -race ./hypershift-operator/controllers/hostedcluster/)
make lint passes with 0 issues
New unit tests for reconcileReport methods (TestReconcileReport, TestConditionMessage, TestAggregate, TestExecuteOrBlock)
New unit tests for wrapper method isolation (TestReconcileKubeconfigAndPasswordSync_*, TestReconcileRBACAndPolicies_*)
New integration tests verifying blocking behavior:
- Phase 6a critical failure → Phase 8 blocked, Phase 7 still runs
- Phase 7 HCP creation failure → Phase 8 blocked
- Phase 6b non-critical failure → nothing blocked

openshift-ci-robot · 2026-03-10T16:10:32Z

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: LGTM mode

coderabbitai · 2026-03-10T16:11:04Z

📝 Walkthrough

This pull request introduces a phased, modular refactoring of the HostedCluster reconciliation loop. It adds a design document analyzing reconciliation segregation, restructures the controller into nine sequential phases with independent error aggregation, and introduces multiple helper functions to isolate functionality. A signature change removes the defaultIngressDomain parameter from reconcileControlPlaneOperator, and new tests validate partial progress when operations fail.

Changes

Cohort / File(s)	Summary
Design Documentation `docs/design/hostedcluster-reconcile-segregation-analysis.md`	New design document detailing reconciliation segregation analysis, including operation map splits (Pre-requisite, Part One, Part Two), dependency graphs, identified blocking issues, and impact assessment showing partial progress scenarios.
Controller Refactoring `hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go`	Major restructuring into 9 phases: initialization, pre-deletion propagation, deletion handling, conversion fixes, status updates, prerequisites, and three independent phase blocks. Introduces 15+ new modular helper functions (e.g., `reconcileCoreHCPChain`, `reconcileOperatorDeployments`, `reconcilePlatformCredentialsWithStatus`), changes `reconcileControlPlaneOperator` signature (removes `defaultIngressDomain` parameter), and implements aggregated error collection across independent syncs.
Test Coverage `hypershift-operator/controllers/hostedcluster/hostedcluster_controller_test.go`	Adds three new tests validating resilient reconciliation: kubeconfig sync failure with continued kubeadmin-password sync, RBAC failure with continued Prometheus RBAC creation, and Phase 6 SSH key failure with continued Phase 7–8 completion. Includes rbacv1 import for RBAC assertions.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality	⚠️ Warning	Three new tests lack context timeouts, have incomplete coverage for failure scenarios, and include inconsistent assertion messages.	Add context timeouts using context.WithTimeout(), mock dependencies to force failures in the PKI RBAC test, and add meaningful messages to all assertions.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Stable And Deterministic Test Names	✅ Passed	Pull request uses Go's standard testing package (func TestXxx) rather than Ginkgo, so the Ginkgo test title stability check does not apply. Test function names are descriptive and static with no dynamic values.
Title check	✅ Passed	The title accurately describes the primary refactoring—segregating the reconcile loop into error-collecting blocks with clearer phase separation.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

muraee · 2026-03-10T16:13:23Z

/test e2e-aws

coderabbitai

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/design/hostedcluster-reconcile-segregation-analysis.md`:
- Around line 112-190: The markdown fenced block containing the ASCII diagram
(the block that begins with
"+-----------------------------------------------------+" and includes "CRITICAL
PREREQUISITES (must succeed first)") needs a language label to satisfy
markdownlint: change the opening fence from ``` to ```text so the diagram is
fenced as a text block; update the single fenced block in the file (the ASCII
diagram between the backticks) accordingly.

In `@hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go`:
- Around line 1726-1728: The code currently checks and then removes the
HostedClusterRestoredFromBackupAnnotation from hcluster before writing the
durable status (ReconciliationSucceeded/HostedClusterRestoredFromBackup
condition), which can lose the trigger if the status update fails; change the
flow so you do not consume/remove HostedClusterRestoredFromBackupAnnotation
until after the status write is confirmed: first set the
HostedClusterRestoredFromBackup condition on the HostedCluster status and
perform the status update (updateStatus on hcluster), retrying on conflict as
needed, and only after the status update succeeds remove the
HostedClusterRestoredFromBackupAnnotation (or perform the annotation removal in
a separate patch/update with proper conflict handling) so the reconcile will
retry if the status write failed.
- Around line 1456-1483: If reconcileCoreHCPChain failed and hcp is nil, phase‑8
helpers will dereference hcp and panic; guard the entire phase‑8 block by
checking if hcp == nil and, if so, append a recoverable error to componentErrs
(e.g. fmt.Errorf("skipping phase 8: HostedControlPlane is nil due to earlier
error")) and skip calling reconcileKubeconfigAndPasswordSync,
reconcileOperatorDeployments, reconcileRBACAndPolicies,
reconcilePlatformSpecific, and reconcileAuxiliary; otherwise run the existing
calls as before.
- Around line 2161-2174: The status fields holding secret references
(hcluster.Status.CustomKubeconfig and hcluster.Status.KubeadminPassword) are
only being cleared in memory; after deleting the Secrets you must also persist
those changes to the API by clearing the fields on the HostedCluster status and
calling the Status().Update (or Client.Status().Update) to save them. Modify the
branch that deletes the custom kubeconfig (and the other branch mentioned around
the KubeadminPassword) to set hcluster.Status.CustomKubeconfig = nil and/or
hcluster.Status.KubeadminPassword = nil as appropriate and then call
r.Status().Update(ctx, hcluster) (handling and returning any error) so the API
no longer holds dangling secret refs; use the existing DeleteIfNeeded flow and
ensure both branches behave the same way.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 33f6ef88-9d24-48cc-8609-666d2cca5d82

📥 Commits

Reviewing files that changed from the base of the PR and between cc479bc and 04ede4b.

📒 Files selected for processing (3)

docs/design/hostedcluster-reconcile-segregation-analysis.md
hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
hypershift-operator/controllers/hostedcluster/hostedcluster_controller_test.go

openshift-ci · 2026-03-10T16:27:46Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: muraee

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [muraee]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

enxebre · 2026-03-12T12:27:49Z

Was there a Jira bug we can reference that reports the scenario where this was problematic for managed?

openshift-merge-robot · 2026-03-15T10:29:17Z

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

codecov · 2026-03-31T12:00:28Z

Codecov Report

❌ Patch coverage is 51.36187% with 375 lines in your changes missing coverage. Please review.
✅ Project coverage is 41.86%. Comparing base (4755e9c) to head (e2adc8a).

Files with missing lines	Patch %	Lines
...trollers/hostedcluster/hostedcluster_controller.go	45.50%	326 Missing and 44 partials ⚠️
hypershift-operator/main.go	0.00%	2 Missing ⚠️
support/util/util.go	60.00%	1 Missing and 1 partial ⚠️
...rollers/hostedcluster/internal/platform/aws/aws.go	0.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #7908      +/-   ##
==========================================
+ Coverage   41.54%   41.86%   +0.32%     
==========================================
  Files         758      759       +1     
  Lines       93838    94022     +184     
==========================================
+ Hits        38986    39365     +379     
+ Misses      52107    51924     -183     
+ Partials     2745     2733      -12

Files with missing lines	Coverage Δ
...ator/controllers/hostedcluster/reconcile_report.go	`100.00% <100.00%> (ø)`
...rollers/hostedcluster/internal/platform/aws/aws.go	`14.09% <0.00%> (ø)`
hypershift-operator/main.go	`0.00% <0.00%> (ø)`
support/util/util.go	`39.71% <60.00%> (+0.16%)`	⬆️
...trollers/hostedcluster/hostedcluster_controller.go	`51.68% <45.50%> (+5.79%)`	⬆️

... and 1 file with indirect coverage changes

Flag	Coverage Δ
cmd-support	`34.96% <60.00%> (+<0.01%)`	⬆️
cpo-hostedcontrolplane	`43.59% <ø> (ø)`
cpo-other	`43.17% <ø> (ø)`
hypershift-operator	`52.76% <51.30%> (+1.14%)`	⬆️
other	`31.56% <ø> (ø)`

csrwng

Ship observability first, then change behavior

The refactoring approach is sound — categorizing operations as critical/non-critical and collecting errors instead of short-circuiting is a clear improvement. However, this PR changes the control flow of the most critical reconciler in the system, and the new failure modes (parallel error paths, reordered operations) are exactly the kind that don't surface in unit tests or standard e2e — they manifest under partial failures in production.

Step 0: Observability before behavior change

Before changing any error handling or ordering, ship a PR that adds structured logging or metrics to the existing sequential reconcile loop that records:

Which operations fail and how often
What downstream operations would have continued running under the new model
Whether the current ordering assumptions actually matter in practice

This "dry-run" data from a production release cycle would validate the critical vs. non-critical categorization with real failure data, rather than guessing which operations are safe to unblock. It also gives us a baseline to compare against after the behavior change lands.

Then: incremental rollout

This PR bundles three distinct changes into one shot, which creates a large blast radius:

Extracting inline blocks into named methods (pure refactor)
Introducing reconcileReport and wiring it up (new framework)
Reclassifying operations as non-critical and reordering them (behavior change)

Suggested split:

PR 1 — Extract methods (zero behavior change). Move inline code blocks into named methods, keeping the exact same sequential short-circuit order. This is safe to review, easy to verify (identical behavior), and reduces the diff for subsequent PRs.

PR 2 — Introduce reconcileReport, classify everything as critical. Wire up the report framework but keep all operations as critical so behavior is identical to today — every error still blocks downstream work. This validates the framework without changing semantics.

PR 3+ — Reclassify operations as nonCritical one group at a time. Move SSH key sync, audit webhook, etc. to nonCritical incrementally, with production validation between each change. Each PR is small, reviewable, and independently revertable.

Other notes

Ordering changes need per-operation justification: The PR reorders several operations (e.g., CLI Secrets moved from first to last, RestoredFromBackup shifted relative to pull secret sync). Each change should have a brief rationale explaining why nothing downstream depends on the old position.
Feature gate: Consider a flag (env var or annotation) to switch between old and new reconcile paths during the rollout period, so issues can be mitigated without a rollback.

muraee · 2026-06-09T16:32:28Z

/test e2e-aws

…ng blocks

The reconcile() method executes ~50 sequential operations where every error causes an early return, short-circuiting all remaining work. An unrelated failure (e.g., missing SSH key secret) prevents critical operations like deploying the CPO or reconciling the HCP object.

This refactoring:
- Extracts 12 inline blocks into named methods
- Groups operations into phased error-collecting blocks
- Aggregates all errors with utilerrors.NewAggregate at the end
- Introduce reconcileReport struct that classifies reconcile operations as critical (blocks Phase 8) or non-critical (error-collecting, never blocks).

Replace the sequential error chain where any failure short-circuits the entire loop with structured error collection and blocking rules.

After this change, failures in one phase no longer block unrelated phases. For example, a missing SSH key no longer prevents CPO deployment or HCP object creation.

Includes the analysis document and integration tests that verify non-blocking behavior across phases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…rror-collection framework

Replace the ad-hoc early-return pull-secret recovery path (PR openshift#8352) with the error-collection framework. Instead of inline HCP reconciliation when GetPullSecretBytes fails, the reconciliation now flows through the framework where PullSecretSync captures the error as critical and CoreHCPChain reconciles the HCP with full cert resolution.

Key changes:
- Move GetPullSecretBytes, CPO image/label resolution, and namespace reconciliation into a single report.execute("CPOImageAndNamespace") block. This prevents namespace PSA label downgrades when CPO labels are unavailable.
- Make DetermineHostedClusterPayloadArch and lookupReleaseImage non-fatal so reconciliation continues to the framework.
- Make cpoSupportsKASCustomKubeconfig status check unconditional; all supported CPO versions expose custom kubeconfig.
- Wrap releaseImageVersion parsing in report.execute(critical) to block OperatorDeployments and RBACAndPolicies on failure instead of hard-returning.
- Extract reconcileControlPlaneNamespace into its own method.
- Update pull-secret-missing tests with valid fixtures (NonePlatform, Route, valid UUID) so reconciliation reaches the framework.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

muraee · 2026-06-10T10:46:13Z

/test e2e-aws

hypershift-jira-solve-ci · 2026-06-10T12:38:44Z

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-aws | Build: 2064660417933742080 | Cost: $1.59 | Failed step: hypershift-aws-run-e2e-nested

View full analysis report

Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

muraee · 2026-06-10T14:17:04Z

/test e2e-aws

muraee · 2026-06-11T09:21:19Z

/test e2e-aws

muraee · 2026-06-11T10:35:47Z

/test e2e-aws

openshift-ci · 2026-06-11T11:00:21Z

@muraee: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/docs-preview	`6805539`	link	false	`/test docs-preview`
ci/prow/verify-workflows	`6805539`	link	true	`/test verify-workflows`
ci/prow/e2e-aws	`e2adc8a`	link	true	`/test e2e-aws`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

hypershift-jira-solve-ci · 2026-06-11T11:03:36Z

Test Failure Analysis Complete

Job Information

Prow Job: pull-ci-openshift-hypershift-main-e2e-aws
Build ID: 2065020173840027648
Target: e2e-aws
PR: #7908 — CNTRLPLANE-3145: refactor(hostedcluster): segregate reconcile loop into error-collecting blocks
Failure Reason: executing_graph:step_failed:importing_release

Test Failure Analysis

Error

could not run steps: step [release:n3minor] failed: failed to import release
4.20.0-0.ci-2026-06-07-204143 to tag release:n3minor: failed to reimport the tag
ci-op-nk6yp9sw/stable-n3minor:hypershift: unable to import tag ... with message
Internal error occurred: dockerimage.image.openshift.io
"quay.io/openshift/ci@sha256:ef1b3047fb8915cf4bfd3e7a08ed90a1312b8399a152d8acf69767055a2a446d"
not found ... timed out waiting for the condition

(plus 2 additional release import failures: n2minor and n4minor)

Summary

This is a CI infrastructure failure unrelated to PR #7908. The e2e-aws job never reached any test execution; it failed during the ci-operator release payload import phase. Three of the five release payloads (n2minor/4.21, n3minor/4.20, n4minor/4.19) could not be imported because specific container images within those payloads were no longer available in the quay.io/openshift/ci and quay-proxy.ci.openshift.org/openshift/ci registries. The affected payloads were 3–8 days old at the time of the job run and their images had likely been garbage-collected. No code from the PR was tested. Retrying (/retest) should resolve this by picking up fresher release payloads.

Root Cause

The ci-operator for this HyperShift e2e-aws job resolves multiple "N-minor" release payloads (n1minor through n4minor) representing previous OCP minor versions, which are used for cross-version upgrade testing. These payloads are snapshots of CI-built images stored in quay.io/openshift/ci.

Three of the five release payloads contained image references that were no longer available:

n2minor (4.21.0-0.ci-2026-06-08-003009, 3 days old): agent-installer-ui image sha256:abea17a3... not found
n3minor (4.20.0-0.ci-2026-06-07-204143, 4 days old): hypershift image sha256:ef1b3047... not found
n4minor (4.19.0-0.ci-2026-06-03-210413, 8 days old): machine-config-operator image sha256:90428a82... not found

The ci-operator attempted 6 reimport retries for each failing tag before timing out. Since these are required dependency steps in the execution graph, the entire job was aborted before any multi-stage test steps (pre/test/post phases) could begin.

The two successfully imported payloads — initial (5.0.0, 1 day old) and n1minor (4.22.0, 4 days old) — had all their images still available, supporting the hypothesis that the failures are due to image garbage collection of stale CI payload images on the registry side.

This is a transient infrastructure issue and is completely unrelated to the PR code changes (refactoring the hostedcluster reconcile loop).

Recommendations

Retest the job — Run /retest or /test e2e-aws on the PR. The ci-operator will resolve fresh latest payloads for each N-minor stream, picking up newer images that are still available in the registry.
No code changes needed — This failure is entirely a CI infrastructure issue (stale release payload image references). The PR code was never tested.
If retest fails again — Check the OpenShift CI status page and #forum-ocp-crt / #announce-testplatform Slack channels for known registry or image-mirroring issues. File a bug against DPTP if the problem persists across multiple retests.

Evidence

Evidence	Detail
Failure stage	Release payload import (pre-test infrastructure), not test execution
Failure reason	`executing_graph:step_failed:importing_release`
Failed step: `[release:n2minor]`	Image `agent-installer-ui` (`sha256:abea17a3...`) not found in quay.io/openshift/ci — payload 4.21.0-0.ci-2026-06-08-003009 (3 days old)
Failed step: `[release:n3minor]`	Image `hypershift` (`sha256:ef1b3047...`) not found in quay.io/openshift/ci — payload 4.20.0-0.ci-2026-06-07-204143 (4 days old)
Failed step: `[release:n4minor]`	Image `machine-config-operator` (`sha256:90428a82...`) not found in quay.io/openshift/ci — payload 4.19.0-0.ci-2026-06-03-210413 (8 days old)
Passed step: `[release:initial]`	5.0.0-0.ci-2026-06-10-000905 (1 day old) — imported successfully
Passed step: `[release:n1minor]`	4.22.0-0.ci-2026-06-07-214855 (4 days old) — imported successfully
Test steps executed	None — job aborted before any e2e-aws multi-stage test steps ran
JUnit XML	`junit_operator.xml`: 25 tests, 3 failures — all 3 are release import steps
PR code relevance	None — failure is in CI infrastructure, not in PR code

openshift-ci Bot added the do-not-merge/needs-area label Mar 10, 2026

coderabbitai Bot reviewed Mar 10, 2026

openshift-ci Bot requested review from enxebre and sjenning March 10, 2026 16:26

openshift-ci Bot added the area/documentation label (Indicates the PR includes changes for documentation) Mar 10, 2026

openshift-ci Bot added the area/hypershift-operator (Indicates the PR includes changes for the hypershift operator and API - outside an OCP release) and approved (Indicates a PR has been approved by an approver from all required OWNERS files) labels and removed the do-not-merge/needs-area label Mar 10, 2026

enxebre reviewed Mar 12, 2026

Comment thread docs/design/hostedcluster-reconcile-segregation-analysis.md Outdated

enxebre reviewed Mar 12, 2026

Comment thread docs/design/hostedcluster-reconcile-segregation-analysis.md Outdated

enxebre reviewed Mar 12, 2026

Comment thread docs/design/hostedcluster-reconcile-segregation-analysis.md Outdated

openshift-merge-robot added the needs-rebase label (Indicates a PR cannot be merged because it has merge conflicts with HEAD) Mar 15, 2026

muraee force-pushed the refactor/hostedcluster-reconcile-error-collecting branch from 04ede4b to 168e67a Compare March 31, 2026 11:16

openshift-ci Bot removed the needs-rebase label (Indicates a PR cannot be merged because it has merge conflicts with HEAD) Mar 31, 2026

muraee temporarily deployed to docs-preview March 31, 2026 11:22 — with GitHub Actions Inactive

muraee force-pushed the refactor/hostedcluster-reconcile-error-collecting branch from 168e67a to 15e6a0c Compare March 31, 2026 11:23

muraee temporarily deployed to docs-preview March 31, 2026 11:25 — with GitHub Actions Inactive

muraee force-pushed the refactor/hostedcluster-reconcile-error-collecting branch from 15e6a0c to e7bc83c Compare March 31, 2026 11:27

muraee temporarily deployed to docs-preview March 31, 2026 11:29 — with GitHub Actions Inactive

muraee force-pushed the refactor/hostedcluster-reconcile-error-collecting branch from e7bc83c to 5bc8ca5 Compare March 31, 2026 11:40

muraee temporarily deployed to docs-preview March 31, 2026 11:42 — with GitHub Actions Inactive

muraee force-pushed the refactor/hostedcluster-reconcile-error-collecting branch from 5bc8ca5 to 922ef75 Compare March 31, 2026 11:47

muraee temporarily deployed to docs-preview March 31, 2026 11:49 — with GitHub Actions Inactive

muraee force-pushed the refactor/hostedcluster-reconcile-error-collecting branch 2 times, most recently from 2d2292e to 4a89726 Compare May 5, 2026 16:41

csrwng reviewed May 11, 2026

enxebre mentioned this pull request May 28, 2026

OCPBUGS-77268: reconcile HCP when pull secret is unavailable #8352

Merged

4 tasks

muraee force-pushed the refactor/hostedcluster-reconcile-error-collecting branch from 4a89726 to 857c6b5 Compare May 28, 2026 16:22

openshift-ci Bot added the area/control-plane-operator label (Indicates the PR includes changes for the control plane operator - in an OCP release) May 28, 2026

github-actions Bot temporarily deployed to docs-preview/pr-7908 May 28, 2026 16:32 Inactive

enxebre reviewed May 29, 2026

Comment thread support/config/constants.go Outdated

muraee force-pushed the refactor/hostedcluster-reconcile-error-collecting branch from 857c6b5 to 087c0cf Compare May 29, 2026 12:51

github-actions Bot temporarily deployed to docs-preview/pr-7908 May 29, 2026 12:53 Inactive

muraee force-pushed the refactor/hostedcluster-reconcile-error-collecting branch from 087c0cf to ed33ab2 Compare June 8, 2026 16:05

github-actions Bot temporarily deployed to docs-preview/pr-7908 June 8, 2026 16:07 Inactive

github-actions Bot temporarily deployed to docs-preview/pr-7908 June 9, 2026 16:38 Inactive

muraee force-pushed the refactor/hostedcluster-reconcile-error-collecting branch 2 times, most recently from e03d791 to 6328533 Compare June 10, 2026 10:24

openshift-ci Bot added the needs-rebase label (Indicates a PR cannot be merged because it has merge conflicts with HEAD) Jun 10, 2026

muraee force-pushed the refactor/hostedcluster-reconcile-error-collecting branch from 6328533 to db0fee8 Compare June 10, 2026 10:37

muraee and others added 2 commits June 10, 2026 12:43

muraee force-pushed the refactor/hostedcluster-reconcile-error-collecting branch from db0fee8 to e2adc8a Compare June 10, 2026 10:45

openshift-ci Bot removed the needs-rebase label (Indicates a PR cannot be merged because it has merge conflicts with HEAD) Jun 10, 2026

github-actions Bot deployed to docs-preview/pr-7908 June 10, 2026 10:48 View deployment
