Skip to content

OCPBUGS-87028: Fix KAS ReconcileServiceStatus to check LoadBalancer.Hostname before LB provisioning#8663

Open
sdminonne wants to merge 3 commits into
openshift:mainfrom
sdminonne:OCPBUGS-87028/cpo-kas-lb-hostname-check
Open

OCPBUGS-87028: Fix KAS ReconcileServiceStatus to check LoadBalancer.Hostname before LB provisioning#8663
sdminonne wants to merge 3 commits into
openshift:mainfrom
sdminonne:OCPBUGS-87028/cpo-kas-lb-hostname-check

Conversation

@sdminonne

@sdminonne sdminonne commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Add a RouteHost helper in support/netutil/route.go that resolves route hostnames consistently, checking RoutePublishingStrategy.Hostname first, then falling back to status.Conditions with the route.openshift.io/host key, and finally the legacy BaseDomain-based default.
  • Fix kas.ReconcileServiceStatus (and oauth.ReconcileServiceStatus) to check LoadBalancer.Hostname before calling CollectLBMessageIfNotProvisioned, so that clusters with a pre-configured LB hostname skip the cloud LB provisioning wait entirely.
  • Update infra.ReconcileInfrastructure to use the RouteHost helper for OAuth and Ignition route hostname resolution, ensuring consistent behavior across Route and LoadBalancer service types.

Test plan

  • Unit tests: kas/service_test.go, oauth/service_test.go, infra/infra_test.go, netutil/route_test.go
  • Verify make build && make test pass
  • E2e: existing break-glass credential tests should continue to pass on 4.23+ payloads

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Improved hostname resolution for API server, Konnectivity, and OAuth across publishing strategies (prefer explicit route/spec host, fallback to route ingress host; ignore router canonical hostname)
    • Ensured preconfigured LoadBalancer hostnames take immediate precedence and avoid incorrect shared-ingress routing when KAS uses LoadBalancer
  • Improvements

    • Infrastructure status logging now includes richer status details
  • Tests

    • Expanded test coverage for reconciliation and service hostname/port behaviors

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 4, 2026
@openshift-ci

openshift-ci Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci-robot openshift-ci-robot added jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jun 4, 2026
@openshift-ci-robot

Copy link
Copy Markdown

@sdminonne: This pull request references Jira Issue OCPBUGS-87028, which is invalid:

  • expected the bug to target only the "5.0.0" version, but multiple target versions were set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

Summary

  • Add a RouteHost helper in support/netutil/route.go that resolves route hostnames consistently, checking RoutePublishingStrategy.Hostname first, then falling back to status.Conditions with the route.openshift.io/host key, and finally the legacy BaseDomain-based default.
  • Fix kas.ReconcileServiceStatus (and oauth.ReconcileServiceStatus) to check LoadBalancer.Hostname before calling CollectLBMessageIfNotProvisioned, so that clusters with a pre-configured LB hostname skip the cloud LB provisioning wait entirely.
  • Update infra.ReconcileInfrastructure to use the RouteHost helper for OAuth and Ignition route hostname resolution, ensuring consistent behavior across Route and LoadBalancer service types.

Test plan

  • Unit tests: kas/service_test.go, oauth/service_test.go, infra/infra_test.go, netutil/route_test.go
  • Verify make build && make test pass
  • E2e: existing break-glass credential tests should continue to pass on 4.23+ payloads

🤖 Generated with Claude Code

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai

coderabbitai Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: a3bb936f-a552-430e-9b53-3e29c44e0704

📥 Commits

Reviewing files that changed from the base of the PR and between efc01ec and a164ff6.

📒 Files selected for processing (3)
  • control-plane-operator/controllers/hostedcontrolplane/infra/infra.go
  • control-plane-operator/controllers/hostedcontrolplane/kas/service.go
  • control-plane-operator/controllers/hostedcontrolplane/kas/service_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • control-plane-operator/controllers/hostedcontrolplane/kas/service_test.go

📝 Walkthrough

Walkthrough

Adds netutil.RouteHost(route) to compute an effective Route hostname (Spec.Host then Status.Ingress[0].Host), updates KAS and OAuth service reconciliation to prefer configured LoadBalancer hostnames and to use RouteHost for Route-based host resolution, tightens a shared-ingress special-case to Route strategies, expands unit tests for these behaviors, and adds two small structured-log improvements in the hosted control plane controller.

Sequence Diagram(s)

sequenceDiagram
  participant HCPCtrl as HostedControlPlaneController
  participant ServiceRecon as ReconcileServiceStatus (KAS/OAuth)
  participant Route as routev1.Route / Service.Status.LoadBalancer
  participant util as netutil.RouteHost
  participant Infra as InfrastructureStatus

  HCPCtrl->>ServiceRecon: trigger reconciliation
  ServiceRecon->>Route: read Route or Service status
  ServiceRecon->>util: RouteHost(route)
  util-->>ServiceRecon: hostname or ""
  ServiceRecon->>Infra: assemble host/port/message into InfrastructureStatus
  HCPCtrl->>Infra: log infraStatus (structured)
Loading

Suggested reviewers

  • cblecker
  • muraee
🚥 Pre-merge checks | ✅ 10 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Test Structure And Quality ⚠️ Warning New test assertions lack meaningful failure messages. Unlike existing infra_test.go, new tests use plain Expect() without diagnostic messages to help diagnose failures. Add meaningful failure messages to Expect() calls in new tests, e.g., g.Expect(host).To(Equal(expected), "host mismatch").
✅ Passed checks (10 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the primary change: fixing KAS ReconcileServiceStatus to check LoadBalancer.Hostname before LB provisioning, which is the main focus of the PR and directly supported by the changes across multiple files.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed All test names in the PR are stable and deterministic. No tests contain dynamic content like UUIDs, timestamps, IPs, or generated identifiers. Test names use clear, static descriptions.
Topology-Aware Scheduling Compatibility ✅ Passed PR modifies service/infrastructure status reconciliation and route hostname resolution logic only; no scheduling constraints, affinity rules, or topology spreads introduced.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed This PR adds only standard Go unit tests (func TestXxx(t *testing.T)), not Ginkgo e2e tests. The custom check applies only to Ginkgo e2e tests, so it is not applicable.
No-Weak-Crypto ✅ Passed No weak cryptography (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB), custom crypto, or non-constant-time secret comparisons detected. Comparisons involve only non-sensitive infrastructure data.
Container-Privileges ✅ Passed PR contains only Go source code changes to control plane operator logic; no Kubernetes manifests or container privilege settings are present or modified.
No-Sensitive-Data-In-Logs ✅ Passed The PR logs infrastructure status (hostnames, ports, and status messages) which are non-sensitive operational data. No passwords, tokens, API keys, PII, session IDs, or customer data are exposed.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot added the area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release label Jun 4, 2026
@openshift-ci

openshift-ci Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sdminonne
Once this PR has been reviewed and has the lgtm label, please assign enxebre for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release and removed do-not-merge/needs-area labels Jun 4, 2026
@codecov

codecov Bot commented Jun 4, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 78.78788% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 41.51%. Comparing base (6a6d8c1) to head (a164ff6).
⚠️ Report is 184 commits behind head on main.

Files with missing lines Patch % Lines
...ator/controllers/hostedcontrolplane/infra/infra.go 50.00% 7 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8663      +/-   ##
==========================================
+ Coverage   40.68%   41.51%   +0.82%     
==========================================
  Files         755      756       +1     
  Lines       93368    93673     +305     
==========================================
+ Hits        37985    38884     +899     
+ Misses      52649    52067     -582     
+ Partials     2734     2722      -12     
Files with missing lines Coverage Δ
...ostedcontrolplane/hostedcontrolplane_controller.go 45.70% <100.00%> (+0.67%) ⬆️
...ator/controllers/hostedcontrolplane/kas/service.go 52.43% <100.00%> (+12.07%) ⬆️
...or/controllers/hostedcontrolplane/oauth/service.go 82.00% <100.00%> (+7.25%) ⬆️
support/netutil/route.go 71.07% <100.00%> (+2.89%) ⬆️
...ator/controllers/hostedcontrolplane/infra/infra.go 53.43% <50.00%> (+2.89%) ⬆️

... and 58 files with indirect coverage changes

Flag Coverage Δ
cmd-support 34.90% <100.00%> (+0.19%) ⬆️
cpo-hostedcontrolplane 43.89% <68.18%> (+2.08%) ⬆️
cpo-other 42.74% <ø> (+1.35%) ⬆️
hypershift-operator 51.57% <ø> (+0.75%) ⬆️
other 31.64% <ø> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

sdminonne and others added 2 commits June 4, 2026 11:29
…ostname resolution

Add a RouteHost helper in support/netutil that resolves route hostnames
consistently across service types. The helper checks
RoutePublishingStrategy.Hostname first, then falls back to
status.Conditions with the route.openshift.io/host key, and finally
the legacy BaseDomain-based default.

Signed-off-by: Salvatore Dario Minonne <sminonne@redhat.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ovisioning in KAS service status

- Update kas.ReconcileServiceStatus and oauth.ReconcileServiceStatus to
  check LoadBalancer.Hostname before calling
  CollectLBMessageIfNotProvisioned, so clusters with a pre-configured LB
  hostname skip the cloud LB provisioning wait entirely.
- Update infra.ReconcileInfrastructure to use the RouteHost helper for
  OAuth and Ignition route hostname resolution, ensuring consistent
  behavior across Route and LoadBalancer service types.

Signed-off-by: Salvatore Dario Minonne <sminonne@redhat.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@sdminonne sdminonne force-pushed the OCPBUGS-87028/cpo-kas-lb-hostname-check branch from b2f9af1 to efc01ec Compare June 4, 2026 09:40

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@control-plane-operator/controllers/hostedcontrolplane/infra/infra.go`:
- Around line 55-62: InfrastructureStatus.String() currently prints raw
hostnames and message (APIHost, KonnectivityHost, InternalHCPRouterHost,
ExternalHCPRouterHost, Message, etc.), which can leak sensitive topology/runtime
details; change the formatter to avoid emitting raw values by replacing host
fields with presence flags or redacted placeholders (e.g., "set" / "<redacted>")
and truncate or redact Message (or replace with a generic summary), keeping
boolean flags like NeedInternalRouter/NeedExternalRouter and ports if
non-sensitive; update the InfrastructureStatus.String() implementation to
reference those fields (APIHost, APIPort, KonnectivityHost, KonnectivityPort,
OAuthEnabled, OAuthHost, OAuthPort, NeedInternalRouter, InternalHCPRouterHost,
NeedExternalRouter, ExternalHCPRouterHost, Message) but only emit non-sensitive
indicators instead of the verbatim hostnames/messages.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 5e35afe5-9ecf-4d62-9913-1b0a35a473e5

📥 Commits

Reviewing files that changed from the base of the PR and between b2f9af1 and efc01ec.

⛔ Files ignored due to path filters (2)
  • control-plane-operator/controllers/hostedcontrolplane/infra/testdata/zz_fixture_TestReconcileInfrastructure_When_ARO_shared_ingress_is_enabled_with_KAS_LoadBalancer_strategy__it_should_use_LoadBalancer_host_not_shared_ingress.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/infra/testdata/zz_fixture_TestReconcileInfrastructure_When_NonePlatform_cluster_has_KAS_LoadBalancer_with_configured_hostname__it_should_use_hostname_without_waiting_for_LB.yaml is excluded by !**/testdata/**
📒 Files selected for processing (9)
  • control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go
  • control-plane-operator/controllers/hostedcontrolplane/infra/infra.go
  • control-plane-operator/controllers/hostedcontrolplane/infra/infra_test.go
  • control-plane-operator/controllers/hostedcontrolplane/kas/service.go
  • control-plane-operator/controllers/hostedcontrolplane/kas/service_test.go
  • control-plane-operator/controllers/hostedcontrolplane/oauth/service.go
  • control-plane-operator/controllers/hostedcontrolplane/oauth/service_test.go
  • support/netutil/route.go
  • support/netutil/route_test.go
✅ Files skipped from review due to trivial changes (1)
  • control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go
🚧 Files skipped from review as they are similar to previous changes (7)
  • control-plane-operator/controllers/hostedcontrolplane/oauth/service_test.go
  • support/netutil/route.go
  • control-plane-operator/controllers/hostedcontrolplane/oauth/service.go
  • control-plane-operator/controllers/hostedcontrolplane/kas/service.go
  • control-plane-operator/controllers/hostedcontrolplane/kas/service_test.go
  • support/netutil/route_test.go
  • control-plane-operator/controllers/hostedcontrolplane/infra/infra_test.go

Comment on lines +55 to +62
func (s InfrastructureStatus) String() string {
return fmt.Sprintf("apiHost=%s apiPort=%d konnectivityHost=%s konnectivityPort=%d oauthEnabled=%t oauthHost=%s oauthPort=%d needInternalRouter=%t internalRouterHost=%s needExternalRouter=%t externalRouterHost=%s message=%s",
s.APIHost, s.APIPort, s.KonnectivityHost, s.KonnectivityPort,
s.OAuthEnabled, s.OAuthHost, s.OAuthPort,
s.NeedInternalRouter, s.InternalHCPRouterHost,
s.NeedExternalRouter, s.ExternalHCPRouterHost,
s.Message)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid serializing hostnames/message verbatim in InfrastructureStatus.String().

Line 56 currently includes internal/external hostnames and raw message, which can leak sensitive topology/runtime details when this struct is logged. Prefer presence flags (or redacted values) instead of raw host/message content.

Proposed safer formatter
 func (s InfrastructureStatus) String() string {
-	return fmt.Sprintf("apiHost=%s apiPort=%d konnectivityHost=%s konnectivityPort=%d oauthEnabled=%t oauthHost=%s oauthPort=%d needInternalRouter=%t internalRouterHost=%s needExternalRouter=%t externalRouterHost=%s message=%s",
-		s.APIHost, s.APIPort, s.KonnectivityHost, s.KonnectivityPort,
-		s.OAuthEnabled, s.OAuthHost, s.OAuthPort,
-		s.NeedInternalRouter, s.InternalHCPRouterHost,
-		s.NeedExternalRouter, s.ExternalHCPRouterHost,
-		s.Message)
+	return fmt.Sprintf("apiPort=%d konnectivityPort=%d oauthEnabled=%t oauthPort=%d needInternalRouter=%t needExternalRouter=%t apiHostSet=%t konnectivityHostSet=%t oauthHostSet=%t internalRouterHostSet=%t externalRouterHostSet=%t hasMessage=%t",
+		s.APIPort, s.KonnectivityPort, s.OAuthEnabled, s.OAuthPort,
+		s.NeedInternalRouter, s.NeedExternalRouter,
+		s.APIHost != "", s.KonnectivityHost != "", s.OAuthHost != "",
+		s.InternalHCPRouterHost != "", s.ExternalHCPRouterHost != "",
+		s.Message != "")
 }

As per coding guidelines: "Flag logging that may expose passwords, tokens, API keys, PII ... internal hostnames, or customer data."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (s InfrastructureStatus) String() string {
return fmt.Sprintf("apiHost=%s apiPort=%d konnectivityHost=%s konnectivityPort=%d oauthEnabled=%t oauthHost=%s oauthPort=%d needInternalRouter=%t internalRouterHost=%s needExternalRouter=%t externalRouterHost=%s message=%s",
s.APIHost, s.APIPort, s.KonnectivityHost, s.KonnectivityPort,
s.OAuthEnabled, s.OAuthHost, s.OAuthPort,
s.NeedInternalRouter, s.InternalHCPRouterHost,
s.NeedExternalRouter, s.ExternalHCPRouterHost,
s.Message)
}
func (s InfrastructureStatus) String() string {
return fmt.Sprintf("apiPort=%d konnectivityPort=%d oauthEnabled=%t oauthPort=%d needInternalRouter=%t needExternalRouter=%t apiHostSet=%t konnectivityHostSet=%t oauthHostSet=%t internalRouterHostSet=%t externalRouterHostSet=%t hasMessage=%t",
s.APIPort, s.KonnectivityPort, s.OAuthEnabled, s.OAuthPort,
s.NeedInternalRouter, s.NeedExternalRouter,
s.APIHost != "", s.KonnectivityHost != "", s.OAuthHost != "",
s.InternalHCPRouterHost != "", s.ExternalHCPRouterHost != "",
s.Message != "")
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@control-plane-operator/controllers/hostedcontrolplane/infra/infra.go` around
lines 55 - 62, InfrastructureStatus.String() currently prints raw hostnames and
message (APIHost, KonnectivityHost, InternalHCPRouterHost,
ExternalHCPRouterHost, Message, etc.), which can leak sensitive topology/runtime
details; change the formatter to avoid emitting raw values by replacing host
fields with presence flags or redacted placeholders (e.g., "set" / "<redacted>")
and truncate or redact Message (or replace with a generic summary), keeping
boolean flags like NeedInternalRouter/NeedExternalRouter and ports if
non-sensitive; update the InfrastructureStatus.String() implementation to
reference those fields (APIHost, APIPort, KonnectivityHost, KonnectivityPort,
OAuthEnabled, OAuthHost, OAuthPort, NeedInternalRouter, InternalHCPRouterHost,
NeedExternalRouter, ExternalHCPRouterHost, Message) but only emit non-sensitive
indicators instead of the verbatim hostnames/messages.

@sdminonne

Copy link
Copy Markdown
Contributor Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jun 4, 2026
@openshift-ci-robot

Copy link
Copy Markdown

@sdminonne: This pull request references Jira Issue OCPBUGS-87028, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)
Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@sdminonne sdminonne marked this pull request as ready for review June 4, 2026 10:13
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 4, 2026
@openshift-ci openshift-ci Bot requested review from enxebre and sjenning June 4, 2026 10:14
@sdminonne

Copy link
Copy Markdown
Contributor Author

/pipeline-required

func ReconcileServiceStatus(svc *corev1.Service, strategy *hyperv1.ServicePublishingStrategy, apiServerPort int, messageCollector events.MessageCollector) (host string, port int32, message string, err error) {
switch strategy.Type {
case hyperv1.LoadBalancer:
if strategy.LoadBalancer != nil && strategy.LoadBalancer.Hostname != "" {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to think more carefully about the implications of this. Currently we block provisioning if the LoadBalancer has not provisioned. By returning the configured hostname, we're skipping that and proceeding with the control plane rollout. It's not a bad change, but it's definitely a change in behavior and we need to check whether it might negatively impact existing deployments.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it make sense. I've moved the LB-bypass out of ReconcileServiceStatus (now reverted to original logic) and only for NonePlatform in reconcileAPIServerServiceStatus. I think this should preserve all the other platform behaviors. At the end of the day this is only to test NonePlatform avoiding different behaviours between AWS and AKS (at least for the moment).

… only

Revert ReconcileServiceStatus to the original behavior where the LB
provisioning check always runs before returning a host. The LB-bypass
for a configured LoadBalancer.Hostname is now handled at the call site
in reconcileAPIServerServiceStatus, scoped to NonePlatform only.

On NonePlatform there is no cloud provider to provision a real load
balancer, so the configured hostname is authoritative and waiting for
LB ingress status would block indefinitely. For all other platforms
the existing behavior is preserved: the LB must be provisioned before
the control plane rollout proceeds.

This follows the same pattern used for shared ingress and IBMCloud
platform-specific early returns in the same function.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@sdminonne

Copy link
Copy Markdown
Contributor Author

@csrwng PTAL
Commited on top for clarity. Happy to squash in case the new approach looks more viable.
TY!

@openshift-ci-robot

Copy link
Copy Markdown

@sdminonne: This pull request references Jira Issue OCPBUGS-87028, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)
Details

In response to this:

Summary

  • Add a RouteHost helper in support/netutil/route.go that resolves route hostnames consistently, checking RoutePublishingStrategy.Hostname first, then falling back to status.Conditions with the route.openshift.io/host key, and finally the legacy BaseDomain-based default.
  • Fix kas.ReconcileServiceStatus (and oauth.ReconcileServiceStatus) to check LoadBalancer.Hostname before calling CollectLBMessageIfNotProvisioned, so that clusters with a pre-configured LB hostname skip the cloud LB provisioning wait entirely.
  • Update infra.ReconcileInfrastructure to use the RouteHost helper for OAuth and Ignition route hostname resolution, ensuring consistent behavior across Route and LoadBalancer service types.

Test plan

  • Unit tests: kas/service_test.go, oauth/service_test.go, infra/infra_test.go, netutil/route_test.go
  • Verify make build && make test pass
  • E2e: existing break-glass credential tests should continue to pass on 4.23+ payloads

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

  • Improved hostname resolution for API server, Konnectivity, and OAuth across publishing strategies (prefer explicit route/spec host, fallback to route ingress host; ignore router canonical hostname)

  • Ensured preconfigured LoadBalancer hostnames take immediate precedence and avoid incorrect shared-ingress routing when KAS uses LoadBalancer

  • Improvements

  • Infrastructure status logging now includes richer status details

  • Tests

  • Expanded test coverage for reconciliation and service hostname/port behaviors

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci

openshift-ci Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

@sdminonne: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 10, 2026
@openshift-ci

openshift-ci Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@hypershift-jira-solve-ci

hypershift-jira-solve-ci Bot commented Jun 10, 2026

Copy link
Copy Markdown

Now I have the complete picture. Here is the analysis:

Test Failure Analysis Complete

Job Information

  • Prow Job: tide (merge automation)
  • Build ID: N/A (tide is a merge controller, not a test job)
  • PR: #8663 — OCPBUGS-87028: Fix KAS ReconcileServiceStatus to check LoadBalancer.Hostname before LB provisioning
  • Target: openshift/hypershift main branch
  • Tide Status: ERROR since 2026-06-10T13:57:53Z

Test Failure Analysis

Error

tide: "Not mergeable. PR has a merge conflict."

Summary

This is not a test failure — it is a merge conflict preventing tide from merging the PR. PR #8537 ("CNTRLPLANE-3250,CNTRLPLANE-430: API-driven Azure topology and private connectivity (Phase 1)") was merged to main on 2026-06-10 at 10:24 UTC, five days after PR #8663's last commit (2026-06-05). Both PRs modify the same four files in the control-plane-operator, creating Git merge conflicts. Additionally, the PR has never received lgtm, approved, or ok-to-test labels, so the 8 e2e Prow jobs (e2e-aws, e2e-aks, e2e-v2-aws, etc.) were never triggered — they remain in PENDING state with no target URL since June 5. No CI test has actually run or failed on this PR.

Root Cause

The tide error state is caused by a Git merge conflict between PR #8663 and the main branch, triggered by the merge of PR #8537 on June 10, 2026.

Conflicting PR: #8537 — "CNTRLPLANE-3250,CNTRLPLANE-430: API-driven Azure topology and private connectivity (Phase 1)" — merged 2026-06-10T10:24:42Z.

Conflicting files (modified by both PRs):

  1. control-plane-operator/controllers/hostedcontrolplane/kas/service.go — both PRs modify KAS service logic
  2. control-plane-operator/controllers/hostedcontrolplane/infra/infra.go — both PRs modify infrastructure reconciliation
  3. control-plane-operator/controllers/hostedcontrolplane/infra/infra_test.go — both PRs modify infra tests
  4. control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go — both PRs modify the HCP controller

Timeline:

Additional blockers: The PR also lacks the lgtm, approved, and ok-to-test labels. Without ok-to-test, the 8 required e2e presubmit jobs were never triggered (all show PENDING with empty target URLs since June 5). Even after resolving the rebase, the PR would still need these labels before e2e tests will run and tide can merge.

Recommendations
  1. Rebase PR OCPBUGS-87028: Fix KAS ReconcileServiceStatus to check LoadBalancer.Hostname before LB provisioning #8663 onto current main — resolve the merge conflicts with PR CNTRLPLANE-3250,CNTRLPLANE-430: API-driven Azure topology and private connectivity (Phase 1) #8537's changes in the 4 overlapping files (kas/service.go, infra/infra.go, infra/infra_test.go, hostedcontrolplane_controller.go). Pay special attention to kas/service.go since both PRs modify KAS service reconciliation logic and CNTRLPLANE-3250,CNTRLPLANE-430: API-driven Azure topology and private connectivity (Phase 1) #8537 introduces Azure-specific topology changes.

  2. Request review and approval — the PR has only COMMENTED reviews (from @csrwng and @coderabbitai), but no lgtm or approved labels. After rebasing, request a formal review to obtain /lgtm and /approve.

  3. Trigger e2e tests — after rebase and obtaining ok-to-test, the 8 presubmit e2e jobs will run for the first time. Verify they pass before expecting tide to merge.

  4. Verify test compatibility — PR CNTRLPLANE-3250,CNTRLPLANE-430: API-driven Azure topology and private connectivity (Phase 1) #8537 was a large change (80+ files) that significantly refactored Azure handling and KAS service logic. After rebasing, ensure the KAS LoadBalancer hostname check logic in this PR is still correct given the new Azure private connectivity patterns introduced by CNTRLPLANE-3250,CNTRLPLANE-430: API-driven Azure topology and private connectivity (Phase 1) #8537.

Evidence
Evidence Detail
Tide status description "Not mergeable. PR has a merge conflict." posted 2026-06-10T13:57:53Z
GitHub mergeable state CONFLICTING / DIRTY
needs-rebase label Added by openshift-ci[bot] at 2026-06-10T13:34:36Z
Conflicting PR #8537 merged 2026-06-10T10:24:42Z — "API-driven Azure topology and private connectivity (Phase 1)"
Overlapping file 1 kas/service.go — both PRs modify KAS service reconciliation
Overlapping file 2 infra/infra.go — both PRs modify infrastructure reconciliation
Overlapping file 3 infra/infra_test.go — both PRs modify infra tests
Overlapping file 4 hostedcontrolplane_controller.go — both PRs modify HCP controller
Missing labels lgtm, approved, ok-to-test — none present
E2e jobs status All 8 jobs PENDING since 2026-06-05 with no target URL (never triggered)
PR last updated 2026-06-05T07:41:44Z (5 days before conflict was introduced)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants