Skip to content

CNTRLPLANE-3271: add External OIDC e2e tests for v2 framework#8674

Open
bryan-cox wants to merge 1 commit into
openshift:mainfrom
bryan-cox:CNTRLPLANE-3271
Open

CNTRLPLANE-3271: add External OIDC e2e tests for v2 framework#8674
bryan-cox wants to merge 1 commit into
openshift:mainfrom
bryan-cox:CNTRLPLANE-3271

Conversation

@bryan-cox

@bryan-cox bryan-cox commented Jun 4, 2026

Copy link
Copy Markdown
Member

What this PR does / why we need it:

Adds automated e2e testing for External OIDC support on self-managed Azure HCP using the v2 test framework. The test deploys Keycloak on the management cluster as a platform-agnostic OIDC provider, creates a new external-oidc cluster variant, and validates the resulting control plane state.

Test areas (11 assertions across 4 contexts):

  • Cluster OIDC Configuration — HC has authentication.type: OIDC and is Available
  • OAuth Server Not Deployed — No oauth-openshift or openshift-oauth-apiserver deployments in the control plane namespace
  • KAS Authentication Configurationauth-config ConfigMap has JWT authenticator matching HC spec; no OAuth webhook
  • Keycloak Authentication & Claims — Token via password grant authenticates to KAS; SelfSubjectReview returns correct username, groups (with prefix), and UID claim mappings

Platform-agnostic design: Keycloak deployment utility and test assertions are reusable by any platform. Only the Azure lifecycle integration is platform-specific. Other platforms can add an external-oidc variant to their lifecycle config with zero test code changes.

Azure AD excluded from automation: Device code flow requires interactive browser sign-in. Manually validated in CNTRLPLANE-3462. Both providers exercise the same KAS JWT authenticator and CPO reconciliation code paths.

Which issue(s) this PR fixes:

Fixes https://issues.redhat.com/browse/CNTRLPLANE-3271

Special notes for your reviewer:

  • All tests follow v2 AGENTS.md conventions (18 rules verified)
  • The *testing.Ttesting.TB signature widening in test/e2e/util/external_oidc.go is backward compatible — existing v1 callers are unaffected
  • Tests skip automatically on non-OIDC clusters via BeforeEach + Skip() on authentication type
  • The Keycloak deployment in PostCreate() is adapted from the CI step registry idp/external-oidc/keycloak/server/

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Support for External OIDC authentication in hosted clusters.
    • Keycloak deployment and management utilities for OIDC testing.
  • Tests

    • New E2E v2 test suite for External OIDC hosted clusters (validation of OIDC config, OAuth components absence, KAS auth settings, and Keycloak token auth).
    • Extended Azure E2E lifecycle to include an External OIDC cluster variant.
  • Chores

    • Registered env vars for External OIDC CA bundle and test users.
    • Broadened E2E helper interfaces to accept a more general testing context.
    • Registered additional API types for test tooling.

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 4, 2026
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 4, 2026
@openshift-ci-robot

openshift-ci-robot commented Jun 4, 2026

Copy link
Copy Markdown

@bryan-cox: This pull request references CNTRLPLANE-3271 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

What this PR does / why we need it:

Adds automated e2e testing for External OIDC support on self-managed Azure HCP using the v2 test framework. The test deploys Keycloak on the management cluster as a platform-agnostic OIDC provider, creates a new external-oidc cluster variant, and validates the resulting control plane state.

Test areas (11 assertions across 4 contexts):

  • Cluster OIDC Configuration — HC has authentication.type: OIDC and is Available
  • OAuth Server Not Deployed — No oauth-openshift or openshift-oauth-apiserver deployments in the control plane namespace
  • KAS Authentication Configurationauth-config ConfigMap has JWT authenticator matching HC spec; no OAuth webhook
  • Keycloak Authentication & Claims — Token via password grant authenticates to KAS; SelfSubjectReview returns correct username, groups (with prefix), and UID claim mappings

Platform-agnostic design: Keycloak deployment utility and test assertions are reusable by any platform. Only the Azure lifecycle integration is platform-specific. Other platforms can add an external-oidc variant to their lifecycle config with zero test code changes.

Azure AD excluded from automation: Device code flow requires interactive browser sign-in. Manually validated in CNTRLPLANE-3462. Both providers exercise the same KAS JWT authenticator and CPO reconciliation code paths.

Which issue(s) this PR fixes:

Fixes https://issues.redhat.com/browse/CNTRLPLANE-3271

Special notes for your reviewer:

  • All tests follow v2 AGENTS.md conventions (18 rules verified)
  • The *testing.Ttesting.TB signature widening in test/e2e/util/external_oidc.go is backward compatible — existing v1 callers are unaffected
  • Tests skip automatically on non-OIDC clusters via BeforeEach + Skip() on authentication type
  • The Keycloak deployment in PostCreate() is adapted from the CI step registry idp/external-oidc/keycloak/server/

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

🤖 Generated with Claude Code

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci

openshift-ci Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai

coderabbitai Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds end-to-end External OIDC E2E support: broadens four exported test helper signatures to testing.TB, registers two new env vars, adds Keycloak deploy/cleanup utilities (CA extraction and OIDC discovery verification), wires Azure lifecycle to deploy Keycloak and patch HostedCluster auth, and adds a new e2ev2 test suite validating KAS/auth plumbing and Keycloak-based authentication.

Sequence Diagram(s)

sequenceDiagram
  participant TestRunner
  participant AzureLifecycle
  participant KubernetesAPI
  participant Keycloak
  participant HostedClusterAPI

  TestRunner->>AzureLifecycle: request create external-oidc cluster
  AzureLifecycle->>KubernetesAPI: create Keycloak namespace + resources (ConfigMap, Service, StatefulSet, Route)
  KubernetesAPI->>Keycloak: start StatefulSet pods
  Keycloak->>AzureLifecycle: ready (StatefulSet ready)
  AzureLifecycle->>KubernetesAPI: create/update CA ConfigMap and console Secret
  AzureLifecycle->>HostedClusterAPI: patch HostedCluster authentication with ExtOIDCConfig
  TestRunner->>Keycloak: obtain test token (via issuer URL)
  TestRunner->>HostedClusterAPI: SelfSubjectReview using Keycloak token
  HostedClusterAPI->>TestRunner: authentication response (username, groups, uid)
Loading

Possibly related PRs

Suggested reviewers

  • enxebre
  • devguyio
  • clebs
🚥 Pre-merge checks | ✅ 9 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Test Structure And Quality ⚠️ Warning Cluster-scoped Keycloak resources created in PostCreate without cleanup; ~48% of assertions lack meaningful failure messages; some tests validate multiple unrelated behaviors. Add CleanupKeycloak call in PostCreate/teardown; add failure messages to all Expect statements matching patterns like "expected OAuth deployment to not exist in %s".
Ipv6 And Disconnected Network Test Compatibility ⚠️ Warning Test pulls Keycloak image from quay.io/keycloak/keycloak:26.0 and verifies OIDC endpoint via external HTTPS, incompatible with disconnected environments. Add [Skipped:Disconnected] label to tests or mirror Keycloak image to internal registry for disconnected/IPv6-only CI environments.
✅ Passed checks (9 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely describes the main objective: adding External OIDC end-to-end tests for the v2 testing framework, with a specific Jira ticket reference.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed All 11 test names in hosted_cluster_external_oidc_test.go are static and deterministic. No dynamic values, timestamps, UUIDs, or pod/namespace names appear in any test titles.
Topology-Aware Scheduling Compatibility ✅ Passed PR only modifies E2E test code. Keycloak deployment has 1 replica, no affinity/topology constraints, no nodeSelectors. Compatible with all OpenShift topologies (SNO, TNF, TNA, HA, HyperShift).
No-Weak-Crypto ✅ Passed No weak cryptographic algorithms, custom crypto implementations, or insecure comparisons found. Code properly uses crypto/rand, x509 certificate pools, and TLS 1.2+ with verification enabled.
Container-Privileges ✅ Passed No privileged containers, host namespace access, SYS_ADMIN capabilities, or allowPrivilegeEscalation flags found in any container or Kubernetes manifest definitions across all modified files.
No-Sensitive-Data-In-Logs ✅ Passed PR contains no logging of sensitive data (passwords, tokens, API keys, PII, session IDs, credentials, or customer data). Only public OIDC endpoints, cluster names, and test usernames are logged.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci

openshift-ci Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bryan-cox

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added approved Indicates a PR has been approved by an approver from all required OWNERS files. area/platform/azure PR/issue for Azure (AzurePlatform) platform area/testing Indicates the PR includes changes for e2e testing and removed do-not-merge/needs-area labels Jun 4, 2026
@codecov

codecov Bot commented Jun 4, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 41.54%. Comparing base (38f7b17) to head (ffdcc2c).
⚠️ Report is 43 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8674      +/-   ##
==========================================
+ Coverage   41.43%   41.54%   +0.11%     
==========================================
  Files         756      758       +2     
  Lines       93647    93838     +191     
==========================================
+ Hits        38802    38986     +184     
+ Misses      52124    52107      -17     
- Partials     2721     2745      +24     

see 18 files with indirect coverage changes

Flag Coverage Δ
cmd-support 34.96% <ø> (+0.08%) ⬆️
cpo-hostedcontrolplane 43.59% <ø> (+0.09%) ⬆️
cpo-other 43.17% <ø> (+0.42%) ⬆️
hypershift-operator 51.62% <ø> (+0.05%) ⬆️
other 31.56% <ø> (-0.08%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/v2/internal/env_vars.go`:
- Around line 222-225: The env var E2E_EXTERNAL_OIDC_TEST_USERS registered via
RegisterEnvVar currently holds raw user:password pairs and can be printed by
PrintEnvVarHelp()/SetupTestEnv() when E2E_SHOW_ENV_HELP is enabled; change this
to avoid exposing secrets by either (A) converting the registration to accept a
file path (e.g., "path to file containing user:password lines") and update any
consumers that read E2E_EXTERNAL_OIDC_TEST_USERS to read that file, or (B) mark
the variable as sensitive in the masking logic used by
PrintEnvVarHelp()/RegisterEnvVar so it is redacted (add
"external_oidc_test_users" or the exact var name to the
secret/password/token/key detection or a dedicated sensitive flag on
RegisterEnvVar) and ensure SetupTestEnv exports it masked. Reference
RegisterEnvVar, E2E_EXTERNAL_OIDC_TEST_USERS, PrintEnvVarHelp, and SetupTestEnv
when making the change.

In `@test/e2e/v2/lifecycle/azure.go`:
- Around line 377-384: The env setup silently ignores os.Setenv errors for the
external OIDC variables; in the SetupTestEnv flow check the return values of
os.Setenv when setting "E2E_EXTERNAL_OIDC_CA_BUNDLE_FILE" and
"E2E_EXTERNAL_OIDC_TEST_USERS" and fail fast (return an error or call t.Fatalf /
log.Fatalf as appropriate) with contextual messages including the filepath and
the original error; update the code around the caPath/file read blocks to handle
and propagate Setenv errors instead of ignoring them.

In `@test/e2e/v2/tests/hosted_cluster_external_oidc_test.go`:
- Around line 60-69: The skipIfNotOIDC helper only checks Authentication.Type
but doesn't guard that Authentication.OIDCProviders exists or has at least one
element, which causes later OIDCProviders[0] accesses to panic; update
skipIfNotOIDC (in test/e2e/v2/tests/hosted_cluster_external_oidc_test.go) to
also check hc.Spec.Configuration.Authentication.OIDCProviders is non-nil and
len(...) > 0 and, if the slice is empty, call Fail (or Failf) with a clear
diagnostic like "External OIDC tests require at least one OIDC provider
configured" instead of Skip so tests fail with a helpful message rather than
panicking. Ensure you reference the Authentication and OIDCProviders fields when
adding the guard.

In `@test/e2e/v2/util/keycloak.go`:
- Line 45: The keycloakImage constant is pinned to
"quay.io/keycloak/keycloak:latest", which makes e2e runs non-reproducible;
update the keycloakImage variable to a specific, tested Keycloak image tag
(e.g., a stable release like 21.x or the exact version you validated), replace
":latest" with that pinned tag in the keycloakImage declaration, and add a short
comment noting why it’s pinned so future changes are explicit (ensure any
related startup script expectations in setup.sh and the readiness probe against
/realms/master are compatible with the chosen version).
- Around line 460-475: The verifyOIDCEndpoint function currently disables TLS
verification by setting tls.Config{InsecureSkipVerify: true}; instead build a
tls.Config that validates the issuer certificate chain using the provided
caBundle: create a x509.NewCertPool(), call AppendCertsFromPEM(caBundle) and set
tls.Config.RootCAs to that pool (remove InsecureSkipVerify), then use that
tls.Config in the http.Transport; keep a fallback only if AppendCertsFromPEM
fails but prefer failing the check and surfacing the error so the OIDC discovery
validates against the router CA.
- Around line 417-440: The getKeycloakIssuerURL function reads the Route once
and can return before spec.host or status.ingress[].host is populated; change it
to poll until a host is available (or a timeout/context deadline is reached). In
getKeycloakIssuerURL use a retry loop (e.g., wait.PollImmediate or simple
time.Ticker) that repeatedly calls client.Get for the Route
(keycloakRouteName/keycloakNamespace), each iteration checking route.Spec.Host
and falling back to route.Status.Ingress[*].Host, and only return the formatted
issuer URL once a non-empty host is found; if the loop times out return a clear
error indicating the host was never assigned. Ensure you reference
getKeycloakIssuerURL, keycloakRouteName, keycloakNamespace and defaultRealm in
the change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 1439d0d5-16ee-429f-b78d-8f4b49691f60

📥 Commits

Reviewing files that changed from the base of the PR and between 38f7b17 and ecf40c1.

📒 Files selected for processing (5)
  • test/e2e/util/external_oidc.go
  • test/e2e/v2/internal/env_vars.go
  • test/e2e/v2/lifecycle/azure.go
  • test/e2e/v2/tests/hosted_cluster_external_oidc_test.go
  • test/e2e/v2/util/keycloak.go

Comment thread test/e2e/v2/internal/env_vars.go
Comment thread test/e2e/v2/lifecycle/azure.go
Comment thread test/e2e/v2/tests/hosted_cluster_external_oidc_test.go
Comment thread test/e2e/v2/util/keycloak.go Outdated
Comment thread test/e2e/v2/util/keycloak.go Outdated
Comment thread test/e2e/v2/util/keycloak.go

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
test/e2e/v2/util/keycloak.go (1)

157-177: 💤 Low value

Consider using json.Marshal for JSON construction.

The JSON strings are built via string concatenation, which could break if consoleRedirectURI contains characters requiring JSON escaping (e.g., quotes, backslashes). While the current caller passes a controlled URL, using json.Marshal would be more robust.

Example approach
type clientConfig struct {
    ClientID    string   `json:"clientId"`
    Enabled     bool     `json:"enabled"`
    // ... other fields
}
cfg := clientConfig{ClientID: defaultConsoleClientID, ...}
jsonBytes, _ := json.Marshal(cfg)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/v2/util/keycloak.go` around lines 157 - 177, In buildSetupConfigMap
replace the hand-assembled JSON string concatenation for cliClientJSON and
consoleClientJSON with strongly-typed structs (e.g., clientConfig with fields
ClientID, Enabled, PublicClient, Secret, RedirectUris,
DirectAccessGrantsEnabled, DefaultClientScopes, Attributes) and use json.Marshal
to produce the JSON strings so values like consoleSecret and consoleRedirectURI
are properly escaped; populate the Attributes map for "access.token.lifespan"
and set RedirectUris as a []string, marshal to []byte and convert to string to
assign to the ConfigMap data entries.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@test/e2e/v2/util/keycloak.go`:
- Around line 157-177: In buildSetupConfigMap replace the hand-assembled JSON
string concatenation for cliClientJSON and consoleClientJSON with strongly-typed
structs (e.g., clientConfig with fields ClientID, Enabled, PublicClient, Secret,
RedirectUris, DirectAccessGrantsEnabled, DefaultClientScopes, Attributes) and
use json.Marshal to produce the JSON strings so values like consoleSecret and
consoleRedirectURI are properly escaped; populate the Attributes map for
"access.token.lifespan" and set RedirectUris as a []string, marshal to []byte
and convert to string to assign to the ConfigMap data entries.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 64c1f89f-ce0b-4d5e-9a08-23951ba1c475

📥 Commits

Reviewing files that changed from the base of the PR and between ecf40c1 and 136b816.

📒 Files selected for processing (5)
  • test/e2e/util/external_oidc.go
  • test/e2e/v2/internal/env_vars.go
  • test/e2e/v2/lifecycle/azure.go
  • test/e2e/v2/tests/hosted_cluster_external_oidc_test.go
  • test/e2e/v2/util/keycloak.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • test/e2e/v2/internal/env_vars.go
  • test/e2e/util/external_oidc.go
  • test/e2e/v2/tests/hosted_cluster_external_oidc_test.go
  • test/e2e/v2/lifecycle/azure.go

@bryan-cox

Copy link
Copy Markdown
Member Author

/test e2e-azure-v2-self-managed

@bryan-cox

Copy link
Copy Markdown
Member Author

/test e2e-azure-v2-self-managed

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/v2/lifecycle/azure.go`:
- Around line 237-260: The ConfigMap and Secret names (caCM, consoleSecret) are
hardcoded and must be scoped to the HostedCluster: derive their Name using the
test's `name` (e.g. fmt.Sprintf("%s-oidc-ca", name) and "%s-console-secret") and
set those names on the corev1.ConfigMap (caCM) and corev1.Secret
(consoleSecret), pass those derived names into `ExtOIDCConfig`, and add an
OwnerReference to `hc` on both resources before calling
`createOrUpdateResource(ctx, cl, ...)` so they are unique per HostedCluster and
are garbage-collected with the cluster; apply the same changes to the analogous
resources created around lines 264-275.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: d2f76272-3893-40c9-8531-ec2b9495ee63

📥 Commits

Reviewing files that changed from the base of the PR and between 4eaad18 and f1e32ba.

📒 Files selected for processing (6)
  • test/e2e/util/external_oidc.go
  • test/e2e/v2/cmd/create-guests/main.go
  • test/e2e/v2/internal/env_vars.go
  • test/e2e/v2/lifecycle/azure.go
  • test/e2e/v2/tests/hosted_cluster_external_oidc_test.go
  • test/e2e/v2/util/keycloak.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • test/e2e/v2/internal/env_vars.go
  • test/e2e/util/external_oidc.go
  • test/e2e/v2/util/keycloak.go
  • test/e2e/v2/tests/hosted_cluster_external_oidc_test.go

Comment thread test/e2e/v2/lifecycle/azure.go Outdated
Comment on lines +237 to +260
caCM := &corev1.ConfigMap{
ObjectMeta: metav1.ObjectMeta{
Name: "oidc-ca",
Namespace: namespace,
},
Data: map[string]string{
"ca-bundle.crt": string(kcConfig.CABundle),
},
}
if err := createOrUpdateResource(ctx, cl, caCM); err != nil {
return fmt.Errorf("creating OIDC CA configmap: %w", err)
}

consoleSecret := &corev1.Secret{
ObjectMeta: metav1.ObjectMeta{
Name: "console-secret",
Namespace: namespace,
},
Type: corev1.SecretTypeOpaque,
StringData: map[string]string{
"clientSecret": kcConfig.ConsoleClientSecret,
},
}
if err := createOrUpdateResource(ctx, cl, consoleSecret); err != nil {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Scope the OIDC ConfigMap/Secret to the HostedCluster.

These names are hardcoded inside the shared HostedCluster namespace, so another external-oidc run can upsert the same oidc-ca / console-secret objects and break an already-running cluster's auth wiring. Please derive the names from name and feed those names into ExtOIDCConfig; adding an owner reference to hc would also let destroy clean them up automatically.

Suggested change
+	caConfigMapName := fmt.Sprintf("%s-oidc-ca", name)
+	consoleSecretName := fmt.Sprintf("%s-console-secret", name)
+
 	caCM := &corev1.ConfigMap{
 		ObjectMeta: metav1.ObjectMeta{
-			Name:      "oidc-ca",
+			Name:      caConfigMapName,
 			Namespace: namespace,
 		},
 		Data: map[string]string{
 			"ca-bundle.crt": string(kcConfig.CABundle),
 		},
 	}
@@
 	consoleSecret := &corev1.Secret{
 		ObjectMeta: metav1.ObjectMeta{
-			Name:      "console-secret",
+			Name:      consoleSecretName,
 			Namespace: namespace,
 		},
@@
 	extOIDCConfig := &e2eutil.ExtOIDCConfig{
@@
-		ConsoleClientSecretName:  "console-secret",
+		ConsoleClientSecretName:  consoleSecretName,
@@
-		IssuerCAConfigmapName:    "oidc-ca",
+		IssuerCAConfigmapName:    caConfigMapName,
 		TestUsers:                kcConfig.TestUsers,
 	}

Also applies to: 264-275

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/v2/lifecycle/azure.go` around lines 237 - 260, The ConfigMap and
Secret names (caCM, consoleSecret) are hardcoded and must be scoped to the
HostedCluster: derive their Name using the test's `name` (e.g.
fmt.Sprintf("%s-oidc-ca", name) and "%s-console-secret") and set those names on
the corev1.ConfigMap (caCM) and corev1.Secret (consoleSecret), pass those
derived names into `ExtOIDCConfig`, and add an OwnerReference to `hc` on both
resources before calling `createOrUpdateResource(ctx, cl, ...)` so they are
unique per HostedCluster and are garbage-collected with the cluster; apply the
same changes to the analogous resources created around lines 264-275.

@bryan-cox

Copy link
Copy Markdown
Member Author

/pj-rehearse pull-ci-openshift-hypershift-main-e2e-azure-v2-self-managed

@bryan-cox

Copy link
Copy Markdown
Member Author

/test e2e-azure-v2-self-managed

@bryan-cox

Copy link
Copy Markdown
Member Author

/test e2e-azure-v2-self-managed

@bryan-cox bryan-cox force-pushed the CNTRLPLANE-3271 branch 2 times, most recently from dd7dc57 to 7db41d2 Compare June 5, 2026 11:09
@bryan-cox

Copy link
Copy Markdown
Member Author

/test e2e-azure-v2-self-managed

@bryan-cox

Copy link
Copy Markdown
Member Author

/test e2e-azure-v2-self-managed

@bryan-cox

Copy link
Copy Markdown
Member Author

/test e2e-azure-v2-self-managed

@bryan-cox

Copy link
Copy Markdown
Member Author

/test pull-ci-openshift-hypershift-main-e2e-azure-v2-self-managed

@bryan-cox

Copy link
Copy Markdown
Member Author

/test e2e-azure-v2-self-managed

2 similar comments
@bryan-cox

Copy link
Copy Markdown
Member Author

/test e2e-azure-v2-self-managed

@bryan-cox

Copy link
Copy Markdown
Member Author

/test e2e-azure-v2-self-managed

@bryan-cox

Copy link
Copy Markdown
Member Author

/retest

@bryan-cox

Copy link
Copy Markdown
Member Author

/test e2e-azure-v2-self-managed

3 similar comments
@bryan-cox

Copy link
Copy Markdown
Member Author

/test e2e-azure-v2-self-managed

@bryan-cox

Copy link
Copy Markdown
Member Author

/test e2e-azure-v2-self-managed

@bryan-cox

Copy link
Copy Markdown
Member Author

/test e2e-azure-v2-self-managed

@bryan-cox

Copy link
Copy Markdown
Member Author

/test images

@bryan-cox

Copy link
Copy Markdown
Member Author

/test e2e-azure-v2-self-managed

1 similar comment
@bryan-cox

Copy link
Copy Markdown
Member Author

/test e2e-azure-v2-self-managed

@hypershift-jira-solve-ci

hypershift-jira-solve-ci Bot commented Jun 11, 2026

Copy link
Copy Markdown

I now have all the evidence needed. Let me produce the final report:

Test Failure Analysis Complete

Job Information

Test Failure Analysis

Error

Failed to destroy cluster: failed to get hostedcluster: rpc error: code = Unknown desc = malformed header: missing HTTP content-type

Summary

All 416 e2e tests passed successfully (0 failures). The job failure is caused exclusively by a transient infrastructure issue in the post-phase destroy-management-cluster teardown step. The hypershift destroy cluster azure command failed because the IPI base cluster's kube-apiserver returned a malformed HTTP/2 response (missing content-type header) when the destroy code attempted to look up the HostedCluster resource fbef5a5f6a-mgmt. The base cluster API server was already experiencing degraded connectivity during the preceding destroy-guests and dump-management-cluster steps, as evidenced by server timeouts and broken pipe errors. This failure is unrelated to the PR's code changes, which only add External OIDC e2e test files.

Root Cause

The failure occurs in cmd/cluster/core/destroy.go at the GetCluster() function (line 94), which calls c.Get() to retrieve the HostedCluster CR from the IPI base cluster's kube-apiserver. The gRPC transport layer in google.golang.org/grpc/internal/transport/http2_client.go rejects the response because the HTTP/2 headers lack a content-type field, producing the error "malformed header: missing HTTP content-type".

Why the API server returned a malformed response:

The IPI base cluster's API server was under stress/degraded by the time destroy-management-cluster ran. The evidence chain:

  1. 10:06:46–10:10:58 UTC: Multiple 502 Bad Gateway errors from the CI lease service during destroy-guests phase
  2. 10:14:44 UTC: dump-management-cluster received "the server was unable to return a response in the time allotted" when running oc adm inspect for nodes — a Timeout error from the base cluster API
  3. 10:16:43 UTC: Broken pipe error in port forwarding during dump collection — TCP connection was reset
  4. 10:22:42 UTC: destroy-management-cluster received the gRPC malformed header error when querying the API server

The malformed header: missing HTTP content-type error specifically occurs when an HTTP/2 response from a non-gRPC intermediary (such as an Azure load balancer returning an error page, or the kube-apiserver returning a partial/degraded response) is received instead of a proper gRPC response. This is a transient network/infrastructure condition.

This is NOT a code bug. The GetCluster() function correctly wraps the error at line 99. The issue is that the base cluster API server was returning degraded responses, likely due to load or network issues in the CI infrastructure during the extended teardown sequence.

Recommendations
  1. Retest / Rerun the job — This is a transient infrastructure failure in post-phase teardown. The actual tests all passed (416/416). A /retest should succeed.

  2. No code changes needed in PR CNTRLPLANE-3271: add External OIDC e2e tests for v2 framework #8674 — The PR only modifies test files (test/e2e/v2/tests/hosted_cluster_external_oidc_test.go, test/e2e/v2/lifecycle/azure.go, etc.) and is completely unrelated to the destroy-management-cluster infrastructure code.

  3. Consider adding retry logic to GetCluster() (long-term improvement) — The core.GetCluster() function in cmd/cluster/core/destroy.go does not retry transient API errors. Adding retry.OnError() with a backoff for transient gRPC/network errors would make the destroy flow more resilient to brief API server disruptions.

  4. Consider marking post-phase teardown failures as non-blocking — When all test-phase steps pass but only cleanup/teardown steps fail, the job could be reported as "passed with cleanup warnings" rather than a hard failure, since the test results are valid.

Evidence
Evidence Detail
Test Results 416 Passed, 0 Failed — all e2e tests succeeded (tests/finished.json: "passed":true,"result":"SUCCESS")
Failed Step destroy-management-cluster (post phase only, 42s duration)
Error Location cmd/cluster/core/destroy.go:99GetCluster()c.Get() returns gRPC transport error
Error Message rpc error: code = Unknown desc = malformed header: missing HTTP content-type
API Server Timeout dump-management-cluster log: "the server was unable to return a response in the time allotted, but may still be processing the request (get nodes)" at 10:14:44 UTC
Broken Pipe Port forward error at 10:16:43 UTC: "error copying from remote stream to local connection: readfrom tcp6 ... write: broken pipe"
CI Lease 502s 5 × 502 Bad Gateway lease update failures between 10:06:46–10:10:58 UTC
PR Changes Only test files modified: test/e2e/v2/tests/hosted_cluster_external_oidc_test.go, test/e2e/v2/lifecycle/azure.go, test/e2e/v2/util/keycloak.go, etc. — no destroy/infrastructure code
Management Cluster Self-managed HostedCluster fbef5a5f6a-mgmt on IPI base cluster — successful creation and all guest clusters destroyed successfully

@bryan-cox

Copy link
Copy Markdown
Member Author

/test e2e-azure-v2-self-managed

Add platform-agnostic External OIDC testing infrastructure and tests
for the v2 e2e framework, using Keycloak as the identity provider.

Changes:
- Widen OIDC utility function signatures from *testing.T to testing.TB
  for Ginkgo v2 compatibility (backward compatible)
- Add Keycloak deployment utility (test/e2e/v2/util/keycloak.go) that
  deploys Keycloak on any management cluster with OIDC clients, test
  users, group mappings, and CA bundle extraction
- Add 'external-oidc' cluster variant to the Azure v2 lifecycle that
  deploys Keycloak in PostCreate and patches the HostedCluster with
  OIDC authentication config
- Add External OIDC validation tests covering: cluster OIDC config,
  OAuth server not deployed, KAS JWT authenticator configuration, and
  Keycloak token authentication with claim mapping verification
- Register E2E_EXTERNAL_OIDC_CA_BUNDLE_FILE and
  E2E_EXTERNAL_OIDC_TEST_USERS environment variables

Tests are platform-agnostic and skip based on authentication type,
not platform. Other platforms can reuse the Keycloak utility and test
file by adding an external-oidc variant to their lifecycle config.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@bryan-cox

Copy link
Copy Markdown
Member Author

/test e2e-azure-v2-self-managed

1 similar comment
@bryan-cox

Copy link
Copy Markdown
Member Author

/test e2e-azure-v2-self-managed

@bryan-cox bryan-cox marked this pull request as ready for review June 11, 2026 20:33
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 11, 2026
@openshift-ci openshift-ci Bot requested review from Nirshal and cblecker June 11, 2026 20:33
@openshift-ci

openshift-ci Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

@bryan-cox: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/platform/azure PR/issue for Azure (AzurePlatform) platform area/testing Indicates the PR includes changes for e2e testing jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants