GCP-688: Port remaining v1 TestCreateCluster gaps to v2 e2e by cblecker · Pull Request #8590 · openshift/hypershift

cblecker · 2026-05-26T18:47:28Z

Summary

Ports the remaining 6 v1 TestCreateCluster validation gaps to v2 e2e tests, following up on PR #8511 which ported 14 validations.

Non-lifecycle tests (read-only verification, no cluster mutation):

EnsureNodeCommunication (hosted_cluster_node_communication_test.go): Verifies konnectivity-agent pods exist in the hosted cluster and that pod logs are retrievable, confirming the konnectivity tunnel works
ValidateIngressOperatorConfiguration (hosted_cluster_ingress_test.go): Platform-agnostic test that validates EndpointPublishingStrategy configured on the HostedCluster is reflected in the hosted cluster's IngressController (replaces Azure-only version with platform-agnostic coverage, requires 4.21+)
VerifyCPOOverrideImage (hosted_cluster_cpo_test.go): Checks that when a CPO override image is configured for the platform/version, the running control-plane-operator pod uses that image (self-skips when no override is configured)

Lifecycle tests (mutate cluster state with cleanup):

EnsureDefaultSecurityGroupTags (hosted_cluster_aws_test.go): AWS-only, adds a day-2 tag to the HostedCluster spec and verifies it's applied to the default worker security group via AWS API (requires 4.20+)
EnsureAWSCCMWithCustomizations (hosted_cluster_aws_test.go): AWS-only, validates NLBSecurityGroupMode=Managed in aws-cloud-config ConfigMap and creates a LoadBalancer service to verify managed security groups are attached to the NLB (requires 4.23+)
EnsureGlobalPullSecret (hosted_cluster_pull_secret_test.go): Full pull secret lifecycle — creates additional pull secret, verifies propagation through GlobalPullSecret pipeline, validates DaemonSet readiness, verifies in-place management-cluster pull secret propagation without rollout, and confirms cleanup on deletion (AWS/Azure, requires 4.19+)

Test plan

make e2ev2 compiles successfully
make lint passes with 0 issues
Pre-existing unit test failures are unrelated to these changes
Run v2 suite on GCP — non-lifecycle tests pass, AWS-only tests skip
Run v2 suite on AWS — lifecycle and non-lifecycle AWS tests pass
Run v2 suite on Azure — global pull secret lifecycle test passes

Summary by CodeRabbit

Tests
- Added end-to-end tests for hosted clusters: AWS security‑group tagging and load‑balancer validation, control‑plane operator image override checks, ingress operator endpoint behavior, node communication/log access verification, and global pull‑secret lifecycle scenarios.
Documentation
- Updated e2e framework guidelines: package organization, mandatory cleanup patterns, idempotent resource handling, and improved assertion and testing best practices.
Chores
- Added optional environment variable to configure an additional pull-secret file for e2e runs.

openshift-merge-bot · 2026-05-26T18:47:32Z

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

openshift-ci-robot · 2026-05-26T18:47:33Z

@cblecker: This pull request explicitly references no jira issue.

Details

In response to this:

Summary

Ports the remaining 6 v1 TestCreateCluster validation gaps to v2 e2e tests, following up on PR #8511 which ported 14 validations.

Non-lifecycle tests (read-only verification, no cluster mutation):

EnsureNodeCommunication (hosted_cluster_node_communication_test.go): Verifies konnectivity-agent pods exist in the hosted cluster and that pod logs are retrievable, confirming the konnectivity tunnel works

ValidateIngressOperatorConfiguration (hosted_cluster_ingress_test.go): Platform-agnostic test that validates EndpointPublishingStrategy configured on the HostedCluster is reflected in the hosted cluster's IngressController (replaces Azure-only version with platform-agnostic coverage, requires 4.21+)

VerifyCPOOverrideImage (hosted_cluster_cpo_test.go): Checks that when a CPO override image is configured for the platform/version, the running control-plane-operator pod uses that image (self-skips when no override is configured)

Lifecycle tests (mutate cluster state with cleanup):

EnsureDefaultSecurityGroupTags (hosted_cluster_aws_test.go): AWS-only, adds a day-2 tag to the HostedCluster spec and verifies it's applied to the default worker security group via AWS API (requires 4.20+)

EnsureAWSCCMWithCustomizations (hosted_cluster_aws_test.go): AWS-only, validates NLBSecurityGroupMode=Managed in aws-cloud-config ConfigMap and creates a LoadBalancer service to verify managed security groups are attached to the NLB (requires 4.23+)

EnsureGlobalPullSecret (hosted_cluster_pull_secret_test.go): Full pull secret lifecycle — creates additional pull secret, verifies propagation through GlobalPullSecret pipeline, validates DaemonSet readiness, verifies in-place management-cluster pull secret propagation without rollout, and confirms cleanup on deletion (AWS/Azure, requires 4.19+)

Test plan

make e2ev2 compiles successfully

make lint passes with 0 issues

Pre-existing unit test failures are unrelated to these changes

Run v2 suite on GCP — non-lifecycle tests pass, AWS-only tests skip

Run v2 suite on AWS — lifecycle and non-lifecycle AWS tests pass

Run v2 suite on Azure — global pull secret lifecycle test passes

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci · 2026-05-26T18:47:36Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

coderabbitai · 2026-05-26T18:47:46Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds multiple e2ev2 Ginkgo test suites for HostedCluster: CPO override image, ingress operator configuration, konnectivity-agent node communication, AWS-specific tests (default worker security group tagging and CCM/NLB security-group checks), and Global Pull Secret lifecycle tests. Updates AGENTS.md with v2 framework guidance and registers a new optional env var E2E_ADDITIONAL_PULL_SECRET_FILE.

Sequence Diagram(s)

sequenceDiagram
  participant TestRunner
  participant KubernetesAPI
  participant ELBV2
  participant EC2
  TestRunner->>KubernetesAPI: create NLB Service + annotations
  KubernetesAPI->>ELBV2: controller provisions load balancer (hostname observed)
  TestRunner->>ELBV2: describe load balancer by name
  ELBV2->>EC2: return attached security groups
  TestRunner->>EC2: verify non-empty security groups

Suggested reviewers

devguyio
sdminonne

Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error)

Check name	Status	Explanation	Resolution
No-Weak-Crypto	❌ Error	hosted_cluster_pull_secret_test.go line 147 uses non-constant-time bytes.Equal() to compare sensitive pull secret (Docker config JSON) data.	Replace bytes.Equal() with crypto/subtle.ConstantTimeCompare() for secret comparison at line 147.

✅ Passed checks (10 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly references the PR objective of porting remaining v1 TestCreateCluster gaps to v2 e2e tests, matching the primary goal and changeset focus.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names	✅ Passed	All test names are stable and deterministic with no dynamic information like generated identifiers, timestamps, or random suffixes. Only clear, descriptive static strings are used.
Test Structure And Quality	✅ Passed	High-quality Ginkgo test code: single responsibility per test, DeferCleanup for all resources, explicit timeouts, meaningful messages, vacuous pass prevention, idempotent creation.
Topology-Aware Scheduling Compatibility	✅ Passed	Custom check targets deployment manifests, operator code, or controllers. All PR changes are E2E test code validating existing behavior, not production workload definitions.
Ipv6 And Disconnected Network Test Compatibility	✅ Passed	Tests contain no IPv4 assumptions, hardcoded IPs, or external connectivity requirements. Uses only internal Kubernetes APIs and local AWS credentials.
Container-Privileges	✅ Passed	PR adds e2e test files, not container manifests. No privileged settings (privileged:true, hostPID, hostNetwork, hostIPC, SYS_ADMIN, allowPrivilegeEscalation) found in any new code.
No-Sensitive-Data-In-Logs	✅ Passed	No logging of sensitive data found. Tests use only test/example data (e2e-dummy-token); env_vars.go properly masks sensitive env var values; no hardcoded credentials.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

cblecker · 2026-05-26T18:48:09Z

/test e2e-v2-aws e2e-v2-gke

openshift-ci · 2026-05-26T18:49:03Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cblecker

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [cblecker]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

codecov · 2026-05-26T18:56:28Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 41.27%. Comparing base (972ae23) to head (4a88e92).
⚠️ Report is 107 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #8590      +/-   ##
==========================================
+ Coverage   40.53%   41.27%   +0.73%     
==========================================
  Files         755      755              
  Lines       93235    93446     +211     
==========================================
+ Hits        37794    38566     +772     
+ Misses      52740    52148     -592     
- Partials     2701     2732      +31

see 50 files with indirect coverage changes

Flag	Coverage Δ
cmd-support	`34.86% <ø> (+0.41%)`	⬆️
cpo-hostedcontrolplane	`43.50% <ø> (+1.73%)`	⬆️
cpo-other	`42.79% <ø> (+1.55%)`	⬆️
hypershift-operator	`51.00% <ø> (+0.27%)`	⬆️
other	`31.64% <ø> (+0.05%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

coderabbitai

Actionable comments posted: 9

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/v2/tests/hosted_cluster_aws_test.go`:
- Around line 48-49: The two E2E specs that call external AWS APIs (the It block
with description "should apply the tag to the default worker security group via
AWS API" and the other spec around lines 147-148) must be marked as requiring
external connectivity; add the [Skipped:Disconnected] label to those tests
(e.g., modify the Label("AWS") call or add Label("Skipped:Disconnected")
alongside it) so the test harness will skip them when disconnected, ensuring
tests that call IAM/EC2/ELBv2 are properly flagged.
- Around line 94-100: The test appends a day-2 AWS tag to
hc.Spec.Platform.AWS.ResourceTags but never restores the original slice, leaking
cluster-scoped state; capture the original tags before calling
e2eutil.UpdateObject (e.g. origTags := append([]hyperv1.AWSResourceTag(nil),
hc.Spec.Platform.AWS.ResourceTags...)) and add a cleanup that uses
e2eutil.UpdateObject in an AfterEach/Defer to set
hc.Spec.Platform.AWS.ResourceTags = origTags (or remove the specific day2 tag)
so the HostedCluster resource is restored; reference the HostedCluster variable
(hc), the field Spec.Platform.AWS.ResourceTags, and the e2eutil.UpdateObject
call when making the change.

In `@test/e2e/v2/tests/hosted_cluster_cpo_test.go`:
- Around line 48-50: The test currently calls Skip when version == "" which
hides parsing regressions; change the behavior to fail the test instead by
replacing the Skip call with a test failure (e.g., use t.Fatalf or FailNow with
a clear message) so that when the parsed version is empty we fail with context
including releaseImage and the unexpected empty version; locate the block that
checks version == "" (the one referencing version and releaseImage) and replace
Skip(fmt.Sprintf(...)) with t.Fatalf("could not extract version from release
image %s: parsed version is empty", releaseImage) or an equivalent failure
helper used in the test suite.

In `@test/e2e/v2/tests/hosted_cluster_ingress_test.go`:
- Around line 53-58: The test currently asserts hardcoded values
(operatorv1.LoadBalancerServiceStrategyType and operatorv1.InternalLoadBalancer)
instead of validating the IngressController mirrors the HostedCluster strategy;
update the assertions to compare
IngressController.Spec.EndpointPublishingStrategy to the expectedStrategy object
used earlier (e.g., replace checks against expectedStrategy.Type and
expectedStrategy.LoadBalancer.Scope with a direct equality/assertion between
IngressController.Spec.EndpointPublishingStrategy and expectedStrategy or
compare their Type and LoadBalancer fields to expectedStrategy.Type and
expectedStrategy.LoadBalancer to ensure the controller reflects the configured
strategy); apply the same change to the later block that repeats these hardcoded
assertions.

In `@test/e2e/v2/tests/hosted_cluster_node_communication_test.go`:
- Around line 47-58: The test currently always reads logs from podList.Items[0],
which is flaky because pod ordering is not stable; change the selection logic
inside the Eventually block (where podList is retrieved and GetLogs is called)
to iterate over podList.Items and pick a healthy konnectivity-agent pod (e.g.,
check pod.Status.Conditions for Ready==True or attempt GetLogs in a loop) and
only call GetLogs on the first pod that succeeds; update the GetLogs usage (the
call around GetLogs(...).DoRaw(tc.Context)) to try each pod name from the range
and assert success with g.Expect once a pod returns logs, failing only if none
succeed within the Eventually timeout.

In `@test/e2e/v2/tests/hosted_cluster_pull_secret_test.go`:
- Around line 99-100: Register the cleanup immediately after creating the secret
so it is removed even if the test fails: right after the
createAdditionalPullSecret(tc, hcClient, dummyPullSecretData) call, call
Ginkgo's DeferCleanup (or your test-framework cleanup helper) to delete the
secret (e.g., DeferCleanup(func() { deleteAdditionalPullSecret(tc, hcClient,
"additional-pull-secret") })); keep or remove the later cleanup at lines 139-145
as appropriate, but ensure the immediate DeferCleanup is present so the created
secret is always removed on test exit.
- Around line 232-238: The loop over nodePool.Status.Conditions must first
assert that a NodePoolUpdatingConfigConditionType entry exists before checking
its Status; modify the code that iterates nodePool.Status.Conditions (the block
referencing NodePoolUpdatingConfigConditionType) to record whether the condition
was found (e.g., found := false), set found = true when cond.Type ==
NodePoolUpdatingConfigConditionType, then after the loop add an
Expect(found).To(BeTrue(), "expected NodePoolUpdatingConfigConditionType to be
present") and only then validate Expect(string(cond.Status)).To(Equal(...)) for
the matching condition. Ensure you reference the same condition variable used in
the loop when asserting its Status.
- Around line 82-87: The test currently picks the first NodePool from
npList.Items which is nondeterministic; instead, iterate npList.Items to select
a deterministic NodePool that meets the test preconditions (e.g., Spec.Replicas
!= nil and Spec.Management.UpgradeType != hyperv1.UpgradeTypeInPlace), then if
multiple candidates exist sort them by a stable key like NodePool.Name and pick
the first; update references to np (currently used after selecting
npList.Items[0]) to use the chosen candidate. Ensure you check np.Spec.Replicas
and np.Spec.Management.UpgradeType on the selected NodePool and call Skip only
if no valid candidate is found.
- Around line 275-277: The test currently returns early when
hcClient.Create(tc.Context, secret) yields apierrors.IsAlreadyExists(err), which
leaves stale secret data; instead, on AlreadyExists you should fetch the
existing Secret (use hcClient.Get with the secret's NamespacedName), compare the
relevant fields (data/annotations/labels as used in the test), and if they
differ, update the existing Secret (set fields and use hcClient.Update or a
strategic patch ensuring ResourceVersion is preserved) so the secret content is
reconciled to the expected `secret` value; keep using tc.Context and the
original `secret` object for the desired state and handle other errors normally.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 03ff844d-17df-4497-9441-3b77b338c5b2

📥 Commits

Reviewing files that changed from the base of the PR and between 962cdbb and 8947114.

📒 Files selected for processing (5)

test/e2e/v2/tests/hosted_cluster_aws_test.go
test/e2e/v2/tests/hosted_cluster_cpo_test.go
test/e2e/v2/tests/hosted_cluster_ingress_test.go
test/e2e/v2/tests/hosted_cluster_node_communication_test.go
test/e2e/v2/tests/hosted_cluster_pull_secret_test.go

openshift-ci-robot · 2026-05-26T20:52:04Z

@cblecker: This pull request references GCP-688 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Summary

Ports the remaining 6 v1 TestCreateCluster validation gaps to v2 e2e tests, following up on PR #8511 which ported 14 validations.

Non-lifecycle tests (read-only verification, no cluster mutation):

EnsureNodeCommunication (hosted_cluster_node_communication_test.go): Verifies konnectivity-agent pods exist in the hosted cluster and that pod logs are retrievable, confirming the konnectivity tunnel works

ValidateIngressOperatorConfiguration (hosted_cluster_ingress_test.go): Platform-agnostic test that validates EndpointPublishingStrategy configured on the HostedCluster is reflected in the hosted cluster's IngressController (replaces Azure-only version with platform-agnostic coverage, requires 4.21+)

VerifyCPOOverrideImage (hosted_cluster_cpo_test.go): Checks that when a CPO override image is configured for the platform/version, the running control-plane-operator pod uses that image (self-skips when no override is configured)

Lifecycle tests (mutate cluster state with cleanup):

EnsureDefaultSecurityGroupTags (hosted_cluster_aws_test.go): AWS-only, adds a day-2 tag to the HostedCluster spec and verifies it's applied to the default worker security group via AWS API (requires 4.20+)

EnsureAWSCCMWithCustomizations (hosted_cluster_aws_test.go): AWS-only, validates NLBSecurityGroupMode=Managed in aws-cloud-config ConfigMap and creates a LoadBalancer service to verify managed security groups are attached to the NLB (requires 4.23+)

EnsureGlobalPullSecret (hosted_cluster_pull_secret_test.go): Full pull secret lifecycle — creates additional pull secret, verifies propagation through GlobalPullSecret pipeline, validates DaemonSet readiness, verifies in-place management-cluster pull secret propagation without rollout, and confirms cleanup on deletion (AWS/Azure, requires 4.19+)

Test plan

make e2ev2 compiles successfully

make lint passes with 0 issues

Pre-existing unit test failures are unrelated to these changes

Run v2 suite on GCP — non-lifecycle tests pass, AWS-only tests skip

Run v2 suite on AWS — lifecycle and non-lifecycle AWS tests pass

Run v2 suite on Azure — global pull secret lifecycle test passes

Summary by CodeRabbit

Tests

Added comprehensive end-to-end test coverage for hosted clusters, including:

AWS resource tagging and cloud controller manager customizations

Control plane operator image override validation

Ingress operator configuration verification

Node communication and connectivity checks

Global pull secret lifecycle management

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/v2/tests/hosted_cluster_aws_test.go`:
- Around line 101-111: DeferCleanup currently filters out tags by key and
ignores errors, which can drop pre-existing tags and mask failures; before
mutating obj.Spec.Platform.AWS.ResourceTags capture the original slice (e.g.
origTags := make([]hyperv1.AWSResourceTag, len(...)) copy(...)) and in the
DeferCleanup use e2eutil.UpdateObject to restore the exact original slice back
to obj.Spec.Platform.AWS.ResourceTags; check the returned error and treat
apierrors.IsNotFound(err) as acceptable but fail the test (return or call
GinkgoTB().Fatalf/Fail) for any other non-nil error so cleanup failures are
surfaced; reference functions/types: DeferCleanup, e2eutil.UpdateObject,
hyperv1.HostedCluster, ResourceTags, day2TagKey, and apierrors.IsNotFound.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 2bda1482-1b2e-4a72-936e-e54b68f465d3

📥 Commits

Reviewing files that changed from the base of the PR and between 8947114 and 92fcb28.

📒 Files selected for processing (5)

test/e2e/v2/AGENTS.md
test/e2e/v2/tests/hosted_cluster_aws_test.go
test/e2e/v2/tests/hosted_cluster_ingress_test.go
test/e2e/v2/tests/hosted_cluster_node_communication_test.go
test/e2e/v2/tests/hosted_cluster_pull_secret_test.go

✅ Files skipped from review due to trivial changes (1)

test/e2e/v2/AGENTS.md

Add 6 new v2 e2e tests covering functionality previously only tested in v1 TestCreateCluster: - EnsureNodeCommunication: verifies konnectivity-agent pods exist and logs are retrievable (non-lifecycle) - ValidateIngressOperatorConfiguration: validates EndpointPublishingStrategy propagation to hosted cluster IngressController (non-lifecycle) - VerifyCPOOverrideImage: checks CPO pod runs with expected override image for the platform and version (non-lifecycle) - EnsureDefaultSecurityGroupTags: applies day-2 resource tags to default worker security group via AWS API (lifecycle, AWS) - EnsureAWSCCMWithCustomizations: validates NLBSecurityGroupMode=Managed config and managed security groups on NLB (lifecycle, AWS) - EnsureGlobalPullSecret: full pull secret lifecycle including propagation, DaemonSet readiness, and cleanup (lifecycle) Assisted-by: Claude:claude-opus-4-6

coderabbitai

Actionable comments posted: 5

♻️ Duplicate comments (3)

test/e2e/v2/tests/hosted_cluster_ingress_test.go (1)

52-58: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Hardcoded strategy assertions limit test generality.

Lines 72-79 were updated to compare against expectedStrategy, but these assertions still hardcode LoadBalancerServiceStrategyType and InternalLoadBalancer. The test's contract is to verify that the IngressController mirrors the HostedCluster configuration, yet these lines artificially restrict the test to one strategy type.

As per coding guidelines, when extracting a value from one resource to validate against another, use the extracted variable in assertions — not hardcoded enum constants.
♻️ Recommended change
-			expectedStrategy := hc.Spec.OperatorConfiguration.IngressOperator.EndpointPublishingStrategy
-			Expect(expectedStrategy.Type).To(Equal(operatorv1.LoadBalancerServiceStrategyType),
-				"EndpointPublishingStrategy type should be LoadBalancerService")
-			Expect(expectedStrategy.LoadBalancer).NotTo(BeNil(),
-				"LoadBalancer configuration should be set")
-			Expect(expectedStrategy.LoadBalancer.Scope).To(Equal(operatorv1.InternalLoadBalancer),
-				"LoadBalancer scope should be Internal")
+			expectedStrategy := hc.Spec.OperatorConfiguration.IngressOperator.EndpointPublishingStrategy
+			Expect(expectedStrategy).NotTo(BeNil(),
+				"EndpointPublishingStrategy should be configured on HostedCluster")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/v2/tests/hosted_cluster_ingress_test.go` around lines 52 - 58, The
assertions currently hardcode LoadBalancerServiceStrategyType and
InternalLoadBalancer instead of using the extracted expectedStrategy; update the
three Expect checks to compare the IngressController's observed strategy fields
against expectedStrategy (the value extracted from
hc.Spec.OperatorConfiguration.IngressOperator.EndpointPublishingStrategy) so the
test validates mirroring behavior generically rather than enforcing a specific
enum constant.

test/e2e/v2/tests/hosted_cluster_aws_test.go (2)

94-112: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Restore AWS ResourceTags exactly instead of filtering by key.

Current cleanup removes by key only, which can drop pre-existing tags sharing that key. Capture the original slice before mutation and restore it verbatim; allow IsNotFound during cleanup.

Proposed fix

 import (
 	"fmt"
 	"strings"
 	"time"
@@
 	corev1 "k8s.io/api/core/v1"
+	apierrors "k8s.io/apimachinery/pkg/api/errors"
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
@@
 			day2TagKey := "test-day2-tag"
 			day2TagValue := "test-day2-value"
+			originalTags := append([]hyperv1.AWSResourceTag(nil), hc.Spec.Platform.AWS.ResourceTags...)

 			err = e2eutil.UpdateObject(GinkgoTB(), tc.Context, tc.MgmtClient, hc, func(obj *hyperv1.HostedCluster) {
 				obj.Spec.Platform.AWS.ResourceTags = append(obj.Spec.Platform.AWS.ResourceTags, hyperv1.AWSResourceTag{
@@
 			DeferCleanup(func() {
 				err := e2eutil.UpdateObject(GinkgoTB(), tc.Context, tc.MgmtClient, hc, func(obj *hyperv1.HostedCluster) {
-					filtered := make([]hyperv1.AWSResourceTag, 0, len(obj.Spec.Platform.AWS.ResourceTags))
-					for _, tag := range obj.Spec.Platform.AWS.ResourceTags {
-						if tag.Key != day2TagKey {
-							filtered = append(filtered, tag)
-						}
-					}
-					obj.Spec.Platform.AWS.ResourceTags = filtered
+					obj.Spec.Platform.AWS.ResourceTags = append([]hyperv1.AWSResourceTag(nil), originalTags...)
 				})
-				Expect(err).NotTo(HaveOccurred(), "cleanup: failed to remove day-2 tag from HostedCluster")
+				if err != nil && !apierrors.IsNotFound(err) {
+					Expect(err).NotTo(HaveOccurred(), "cleanup: failed to restore HostedCluster AWS resource tags")
+				}
 			})

As per coding guidelines: "When a test mutates cluster state, capture the original state before mutation and defer restoration... In cleanup functions, check `apierrors.IsNotFound()` to handle cases where the resource was already deleted."

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/v2/tests/hosted_cluster_aws_test.go` around lines 94 - 112, The
cleanup currently filters ResourceTags by key which can drop pre-existing tags;
before calling e2eutil.UpdateObject to append the day2 tag in the HostedCluster
(obj.Spec.Platform.AWS.ResourceTags), capture the original slice (e.g., origTags
:= append([]hyperv1.AWSResourceTag(nil), hc.Spec.Platform.AWS.ResourceTags...))
and in DeferCleanup restore that exact slice verbatim via e2eutil.UpdateObject
(setting obj.Spec.Platform.AWS.ResourceTags = origTags); also make the cleanup
tolerant to the resource being deleted by allowing apierrors.IsNotFound() when
checking the returned error from UpdateObject.

48-49: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Mark external AWS API specs as disconnected-incompatible.

Both specs call external AWS APIs and should include Label("Skipped:Disconnected") so disconnected jobs can filter them.

Proposed fix

-		It("should apply the tag to the default worker security group via AWS API", Label("AWS"), func() {
+		It("should apply the tag to the default worker security group via AWS API", Label("AWS", "Skipped:Disconnected"), func() {
...
-			It("should attach managed security groups to the NLB", func() {
+			It("should attach managed security groups to the NLB", Label("AWS", "CCM", "Skipped:Disconnected"), func() {

As per coding guidelines: "Flag tests requiring external connectivity ... Add [Skipped:Disconnected] label if external connectivity is required and cannot be adapted."

Also applies to: 159-160

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/v2/tests/hosted_cluster_aws_test.go` around lines 48 - 49, The test
It block that currently uses Label("AWS") (the "should apply the tag to the
default worker security group via AWS API" test) invokes external AWS APIs and
must be marked as disconnected-incompatible; add Label("Skipped:Disconnected")
alongside Label("AWS") on that It call to allow CI to skip when offline, and do
the same for the other It block referenced around the later section (the other
AWS API test). Locate the It(...) calls (the test name strings and the
Label("AWS") usages) and add Label("Skipped:Disconnected") to each.

🧹 Nitpick comments (4)

test/e2e/v2/tests/hosted_cluster_node_communication_test.go (2)

32-36: ⚡ Quick win

Add exported function comments with fail-loud behavior details.

RegisterNodeCommunicationTests and EnsureNodeCommunicationTest are exported but undocumented. In this package, comments should describe actual behavior and mention panic/fail-loud expectations.

Suggested patch

+// RegisterNodeCommunicationTests registers hosted-cluster node-communication e2e specs.
 func RegisterNodeCommunicationTests(getTestCtx internal.TestContextGetter) {
 	EnsureNodeCommunicationTest(getTestCtx)
 }
 
+// EnsureNodeCommunicationTest verifies konnectivity-agent pod presence and log retrievability.
+// It fails loudly via Gomega expectations when hosted-cluster client setup or log retrieval fails.
 func EnsureNodeCommunicationTest(getTestCtx internal.TestContextGetter) {
 	When("hosted cluster has konnectivity tunnel configured", func() {

As per coding guidelines: “Comments on exported functions must describe actual behavior including panic conditions, not just intended behavior.”

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/v2/tests/hosted_cluster_node_communication_test.go` around lines 32
- 36, Add Go doc comments for the exported functions
RegisterNodeCommunicationTests and EnsureNodeCommunicationTest describing their
actual behavior and explicitly stating fail‑loud/panic conditions (e.g., when a
nil TestContextGetter is passed or when underlying asserts/fatal checks will
call t.Fatal/panic), so callers know these functions may terminate the test run;
place the comments immediately above each function declaration and follow godoc
style (brief summary sentence starting with the function name, followed by
details about behavior and panic/fail‑loud expectations).

38-38: ⚡ Quick win

Label the It block for test filtering consistency.

The top-level Describe is labeled, but this It is not. Add a stable label on It as well to keep filtering behavior consistent with v2 guidance.

Suggested patch

-		It("should have konnectivity-agent pods with retrievable logs", func() {
+		It("should have konnectivity-agent pods with retrievable logs", Label("hosted-cluster-node-communication"), func() {

As per coding guidelines: “Apply labels to Describe and It blocks for test filtering.”

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/v2/tests/hosted_cluster_node_communication_test.go` at line 38, The
It block "should have konnectivity-agent pods with retrievable logs" is missing
a Ginkgo label; update the It invocation to include a stable Label(...) argument
(e.g. It("should have konnectivity-agent pods with retrievable logs",
Label("hosted-cluster-node-communication"), func() { ... })) so the test can be
filtered consistently with the top-level Describe; modify the It call in
hosted_cluster_node_communication_test.go to add the Label(...) option next to
the test description.

test/e2e/v2/tests/hosted_cluster_cpo_test.go (2)

34-38: ⚡ Quick win

Add doc comments for exported test registration functions.

Lines 34 and 38 export functions without comments describing actual behavior (including skip/failure behavior in the registered flow).

Suggested update

+// RegisterHostedClusterCPOTests registers Hosted Cluster CPO verification tests.
 func RegisterHostedClusterCPOTests(getTestCtx internal.TestContextGetter) {
 	VerifyCPOOverrideImageTest(getTestCtx)
 }
 
+// VerifyCPOOverrideImageTest verifies that a configured CPO override image
+// is used by a running control-plane-operator pod. The test skips when no
+// applicable override is configured for the platform/version.
 func VerifyCPOOverrideImageTest(getTestCtx internal.TestContextGetter) {

As per coding guidelines: test/e2e/v2/**/*.go: “Comments on exported functions must describe actual behavior including panic conditions, not just intended behavior.”

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/v2/tests/hosted_cluster_cpo_test.go` around lines 34 - 38, Add
descriptive doc comments above the exported functions
RegisterHostedClusterCPOTests and VerifyCPOOverrideImageTest that explain their
actual runtime behavior: what tests they register, any side effects, conditions
that cause the test to be skipped, expected failures or panic conditions, and
how the registered flow behaves on error (e.g., whether it calls t.Skip or
t.Fatal). Mention any inputs (getTestCtx) and what state or prerequisites are
required for successful execution so the comments satisfy the guideline that
exported test functions describe actual behavior including skip/failure
behavior.

40-40: ⚡ Quick win

Add a label on the It block for filterability.

Line 40 should include a stable Label(...) so this case can be targeted consistently in filtered runs.

Suggested update

-		It("should run the control-plane-operator pod with the expected override image", func() {
+		It("should run the control-plane-operator pod with the expected override image", Label("hosted-cluster-cpo"), func() {

As per coding guidelines: test/e2e/v2/**/*_test.go: “Apply labels to Describe and It blocks for test filtering.”

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/v2/tests/hosted_cluster_cpo_test.go` at line 40, Add a stable Label
to the Ginkgo It block that reads "should run the control-plane-operator pod
with the expected override image" so the test can be filtered reliably; update
the It declaration for that test case to include a Label argument (e.g.,
"control-plane-operator-override-image" or similar stable identifier) before the
final func() closure so the test is selectable via label-based filtering.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/v2/tests/hosted_cluster_aws_test.go`:
- Line 140: This test It("should have NLBSecurityGroupMode=Managed in the
aws-cloud-config ConfigMap") is missing a Ginkgo label used by the suite's
filter; add the same Label(...) used by other tests in this file (e.g.,
Label("aws") or the suite-specific label) to the It declaration so it becomes
It("...", Label("..."), func() { ... }), ensuring consistent label-based
filtering with the rest of the suite.
- Around line 173-176: After creating the namespace with
hcClient.Create(tc.Context, testNS), replace the anonymous defer with an
immediate tc.DeferCleanup registration that calls hcClient.Delete(tc.Context,
testNS) and checks the returned error; if the delete error is non-nil and not
apierrors.IsNotFound(err), surface it (e.g., via Expect/Fail) instead of
silently ignoring it. Reference the create call (hcClient.Create), the cleanup
call (hcClient.Delete), the test context (tc.Context), and use
apierrors.IsNotFound to ignore already-deleted resources when implementing the
cleanup.

In `@test/e2e/v2/tests/hosted_cluster_cpo_test.go`:
- Around line 68-81: The current loop picks the first runningPod and then
asserts on container index 0, which is nondeterministic; change the search to
set a found boolean while scanning podList.Items for pods with Status.Phase ==
corev1.PodRunning, assert found is true, and then instead of inspecting
runningPod.Spec.Containers[0] only, check all containers of each running pod to
ensure at least one running control-plane-operator pod contains a container
whose Image equals expectedImage (use pod.Name and expectedImage in failure
messages); update the assertion logic accordingly so it tolerates
rollouts/sidecars and follows the required found-flag pattern.

In `@test/e2e/v2/tests/hosted_cluster_pull_secret_test.go`:
- Around line 200-210: The cleanup currently swallows errors: in the
DeferCleanup closure, replace the blind return on Get error with explicit
handling using apierrors.IsNotFound(err) to return only when the secret truly
doesn't exist, and for any other Get error fail loudly (e.g., tc.T.Fatalf or
equivalent). Likewise, check the error returned by tc.MgmtClient.Update and fail
the test immediately on update failure instead of ignoring it. Target the
DeferCleanup closure and the calls to tc.MgmtClient.Get and tc.MgmtClient.Update
and use apierrors.IsNotFound to distinguish not-found from real errors.
- Around line 82-93: The current selection of a NodePool only checks for
Spec.Replicas != nil but allows zero replicas; update the selection logic to
require a positive replica count so tests exercise node-level behavior: when
iterating npList.Items in the hosted_cluster_pull_secret_test, change the
candidate filter to require candidate.Spec.Replicas != nil &&
*candidate.Spec.Replicas > 0 (and then set nodeCount = *np.Spec.Replicas); also
add an upfront assertion that npList.Items is non-empty before the loop and keep
the Skip with a clear message if no suitable NodePool (non-InPlace upgrade and
replicas > 0) is found.

---

Duplicate comments:
In `@test/e2e/v2/tests/hosted_cluster_aws_test.go`:
- Around line 94-112: The cleanup currently filters ResourceTags by key which
can drop pre-existing tags; before calling e2eutil.UpdateObject to append the
day2 tag in the HostedCluster (obj.Spec.Platform.AWS.ResourceTags), capture the
original slice (e.g., origTags := append([]hyperv1.AWSResourceTag(nil),
hc.Spec.Platform.AWS.ResourceTags...)) and in DeferCleanup restore that exact
slice verbatim via e2eutil.UpdateObject (setting
obj.Spec.Platform.AWS.ResourceTags = origTags); also make the cleanup tolerant
to the resource being deleted by allowing apierrors.IsNotFound() when checking
the returned error from UpdateObject.
- Around line 48-49: The test It block that currently uses Label("AWS") (the
"should apply the tag to the default worker security group via AWS API" test)
invokes external AWS APIs and must be marked as disconnected-incompatible; add
Label("Skipped:Disconnected") alongside Label("AWS") on that It call to allow CI
to skip when offline, and do the same for the other It block referenced around
the later section (the other AWS API test). Locate the It(...) calls (the test
name strings and the Label("AWS") usages) and add Label("Skipped:Disconnected")
to each.

In `@test/e2e/v2/tests/hosted_cluster_ingress_test.go`:
- Around line 52-58: The assertions currently hardcode
LoadBalancerServiceStrategyType and InternalLoadBalancer instead of using the
extracted expectedStrategy; update the three Expect checks to compare the
IngressController's observed strategy fields against expectedStrategy (the value
extracted from
hc.Spec.OperatorConfiguration.IngressOperator.EndpointPublishingStrategy) so the
test validates mirroring behavior generically rather than enforcing a specific
enum constant.

---

Nitpick comments:
In `@test/e2e/v2/tests/hosted_cluster_cpo_test.go`:
- Around line 34-38: Add descriptive doc comments above the exported functions
RegisterHostedClusterCPOTests and VerifyCPOOverrideImageTest that explain their
actual runtime behavior: what tests they register, any side effects, conditions
that cause the test to be skipped, expected failures or panic conditions, and
how the registered flow behaves on error (e.g., whether it calls t.Skip or
t.Fatal). Mention any inputs (getTestCtx) and what state or prerequisites are
required for successful execution so the comments satisfy the guideline that
exported test functions describe actual behavior including skip/failure
behavior.
- Line 40: Add a stable Label to the Ginkgo It block that reads "should run the
control-plane-operator pod with the expected override image" so the test can be
filtered reliably; update the It declaration for that test case to include a
Label argument (e.g., "control-plane-operator-override-image" or similar stable
identifier) before the final func() closure so the test is selectable via
label-based filtering.

In `@test/e2e/v2/tests/hosted_cluster_node_communication_test.go`:
- Around line 32-36: Add Go doc comments for the exported functions
RegisterNodeCommunicationTests and EnsureNodeCommunicationTest describing their
actual behavior and explicitly stating fail‑loud/panic conditions (e.g., when a
nil TestContextGetter is passed or when underlying asserts/fatal checks will
call t.Fatal/panic), so callers know these functions may terminate the test run;
place the comments immediately above each function declaration and follow godoc
style (brief summary sentence starting with the function name, followed by
details about behavior and panic/fail‑loud expectations).
- Line 38: The It block "should have konnectivity-agent pods with retrievable
logs" is missing a Ginkgo label; update the It invocation to include a stable
Label(...) argument (e.g. It("should have konnectivity-agent pods with
retrievable logs", Label("hosted-cluster-node-communication"), func() { ... }))
so the test can be filtered consistently with the top-level Describe; modify the
It call in hosted_cluster_node_communication_test.go to add the Label(...)
option next to the test description.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: b2ae6600-9319-46e5-b48a-c0ec45220efc

📥 Commits

Reviewing files that changed from the base of the PR and between 92fcb28 and 2173eee.

📒 Files selected for processing (6)

test/e2e/v2/AGENTS.md
test/e2e/v2/tests/hosted_cluster_aws_test.go
test/e2e/v2/tests/hosted_cluster_cpo_test.go
test/e2e/v2/tests/hosted_cluster_ingress_test.go
test/e2e/v2/tests/hosted_cluster_node_communication_test.go
test/e2e/v2/tests/hosted_cluster_pull_secret_test.go

✅ Files skipped from review due to trivial changes (1)

test/e2e/v2/AGENTS.md

- Add DeferCleanup to remove day-2 tag after AWS security group test, capturing original tags before mutation and restoring them exactly - Use dynamic expectedStrategy values in ingress assertions - Iterate all konnectivity-agent pods for log retrieval - Select suitable NodePool by upgrade type with replicas > 0 - Add DeferCleanup for additional-pull-secret creation - Add vacuous pass guard for UpdatingConfig condition search - Reconcile existing secret on AlreadyExists instead of silent return - Search CPO pods by container name instead of nondeterministic index - Fail on unparseable release image version instead of skipping - Use DeferCleanup with IsNotFound guard for namespace cleanup - Surface pull-secret restore failures in cleanup instead of swallowing - Add Standards 17-18, amend Standard 9 and 16 in AGENTS.md Assisted-by: Claude:claude-opus-4-6

cblecker · 2026-05-27T20:56:13Z

/pipeline required

openshift-merge-bot · 2026-05-27T20:56:16Z

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks-4-22
/test e2e-aws-4-22
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws
/test e2e-v2-gke

bryan-cox · 2026-05-29T17:51:36Z

+				g.Expect(ds.Status.DesiredNumberScheduled).To(BeNumerically(">=", nodeCount))
+				g.Expect(ds.Status.NumberReady).To(Equal(ds.Status.DesiredNumberScheduled))
+			}, 20*time.Minute, 10*time.Second).Should(Succeed())
+		})


question: The v1 EnsureGlobalPullSecret includes three validation steps that are not ported here:

VerifyKubeletConfigWithDaemonSet (called twice in v1) - deploys a verifier DaemonSet that mounts the host filesystem and compares on-disk /var/lib/kubelet/config.json against the cluster pull secret.

Negative pod pull test - creates a pod with unauthorized creds and asserts image pull fails.

Positive pod pull test - creates a pod after applying valid creds and asserts image pull succeeds.

Together these prove the global pull secret pipeline works at the node level, not just at the Kubernetes object level. Was the omission intentional, or should these be added as a follow-up?

The omission was intentional. The v2 test validates the full Kubernetes API-level pipeline: management-cluster secret propagation, GlobalPullSecret creation/update/deletion, and DaemonSet readiness (ovnkube-node, global-pull-secret-syncer, konnectivity-agent). The DaemonSet readiness checks serve as implicit proof that images can be pulled and pods scheduled successfully.

The three omitted v1 steps — VerifyKubeletConfigWithDaemonSet (on-disk kubelet config.json verification), negative pod pull test, and positive pod pull test — are defense-in-depth at the node/filesystem and container-runtime levels. They're valuable but represent a separate concern from the controller pipeline being validated here.

Move E2E_ADDITIONAL_PULL_SECRET_FILE registration to the central env var catalog (Standard openshift#8) and replace defer with DeferCleanup for Ginkgo abort/timeout safety. Assisted-by: Claude:claude-sonnet-4-6

bryan-cox · 2026-05-30T00:18:58Z

/retest

cblecker · 2026-05-30T00:57:34Z

/pipeline required
/verified by e2e

openshift-merge-bot · 2026-05-30T00:57:36Z

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks-4-22
/test e2e-aws-4-22
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws
/test e2e-v2-gke

openshift-ci-robot · 2026-05-30T00:57:44Z

@cblecker: This PR has been marked as verified by e2e.

Details

In response to this:

/pipeline required
/verified by e2e

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

cblecker · 2026-05-30T03:42:21Z

/retest

cblecker · 2026-05-30T18:20:17Z

/retest

hypershift-jira-solve-ci · 2026-05-30T21:50:56Z

I now have all the evidence needed. Here is the final analysis:

Test Failure Analysis Complete

Job Information

Prow Job: pull-ci-openshift-hypershift-main-e2e-kubevirt-aws-ovn-reduced
Build ID: 2060788424960380928
Target: e2e-kubevirt-aws-ovn-reduced
Build Cluster: build03
Start Time: 2026-05-30T18:20:22Z
Duration: ~18 minutes
PR: GCP-688: Port remaining v1 TestCreateCluster gaps to v2 e2e #8590 (GCP-688: Port remaining v1 TestCreateCluster gaps to v2 e2e)

Test Failure Analysis

Error

could not run steps: step [release-inputs:latest] failed: could not resolve source
imagestream ocp/5.0 for release latest: imagestreams.image.openshift.io "5.0" not found

Summary

The e2e-kubevirt-aws-ovn-reduced test never executed. The job failed during the ci-operator graph setup phase — specifically when the [release-inputs:latest] step attempted to snapshot the ocp/5.0 ImageStream on the build03 CI cluster to assemble the latest release for testing. The ImageStream resource "5.0" in namespace ocp was not found on build03 at the time of execution, even though individual image references (e.g., ocp_5.0_cli) resolved successfully via quay-proxy. This is a transient CI infrastructure issue — the same job has completed successfully on other runs of PR #8590 both before and after this failure, confirming the ocp/5.0 ImageStream was temporarily unavailable on build03. The failure is not related to the PR's code changes.

Root Cause

The root cause is a transient unavailability of the ocp/5.0 ImageStream on the build03 CI build cluster.

The ci-operator configuration for openshift/hypershift on the main branch defines a releases.latest.integration entry pointing to name: "5.0", namespace: ocp. This was introduced during the OCP 5.0 branching (April 13, 2026, commit b9deb5c67c).

When the job ran on build03 at 18:20:36Z, the ci-operator:

Successfully loaded the integrated stream metadata from config.ci.openshift.org/integratedStream?name=5.0&namespace=ocp
Successfully resolved individual image references (e.g., ocp/5.0:cli → quay-proxy.ci.openshift.org/openshift/ci:ocp_5.0_cli)
Successfully resolved the initial release from 5.0.0-0.ci release stream
Failed in 10 milliseconds when attempting to snapshot the ocp/5.0 ImageStream resource directly on the build03 cluster — the Kubernetes ImageStream object with name "5.0" did not exist in namespace "ocp" on that cluster at that moment

This means the ImageStream mirrors on build03 were temporarily out of sync. The ImageStream must be present on the build cluster for the snapshot operation, which copies all tags from the source imagestream into the job's stable stream. This is a known class of transient CI infrastructure issue — the ImageStream syncing between the central registry and the build clusters can have intermittent gaps.

Evidence that this is transient: other runs of the same job on PR #8590, occurring both before and after this run, completed successfully (multiple success states visible in the job history for the same PR around the same time window).

Recommendations

Retest the job — This is a transient CI infrastructure failure unrelated to PR GCP-688: Port remaining v1 TestCreateCluster gaps to v2 e2e #8590's code changes. Simply re-trigger the e2e-kubevirt-aws-ovn-reduced job with /retest or /test e2e-kubevirt-aws-ovn-reduced.
No code changes needed — The PR's changes (porting v1 TestCreateCluster gaps to v2 e2e) did not contribute to this failure. The test step never ran; the failure occurred entirely within ci-operator's release image assembly phase.
If the failure recurs persistently, file an issue against the CI infrastructure team (Test Platform / DPTP) regarding ocp/5.0 ImageStream availability on build03. The relevant error reason reported by ci-operator is executing_graph:step_failed:creating_release_images.

Evidence

Evidence	Detail
Error message	`imagestreams.image.openshift.io "5.0" not found`
Failed step	`[release-inputs:latest]` — "Find all of the input images from ocp/5.0 and tag them into the stable stream"
Step duration	0.010 seconds (immediate failure)
Build cluster	`build03` (`console-openshift-console.apps.build03.ci.devcluster.openshift.com`)
Test step status	`e2e-kubevirt-aws-ovn-reduced`: never started
ci-operator reason	`executing_graph:step_failed:creating_release_images`
ImageStream source	`ocp/5.0` (395 tags), configured via `releases.latest.integration` in ci-operator config
Individual images resolved	`ocp/5.0:cli`, `ocp/5.0:manila-csi-driver-operator-test`, etc. all resolved via quay-proxy successfully
Config origin	`openshift/release` repo, `ci-operator/config/openshift/hypershift/openshift-hypershift-main.yaml`
5.0 introduced	April 13, 2026 during OCP 5.0 branching (commit `b9deb5c67c`)
Transient proof	Same job on PR #8590 shows `success` runs at 21:44–21:45Z (builds `2060839756312350720`, `2060840007391776768`, etc.)
All image builds succeeded	`src`, `hypershift-operator`, `hypershift`, `hypershift-tests`, `hypershift-cli` all built successfully despite the release-inputs failure

bryan-cox · 2026-06-02T11:50:57Z

/retest

csrwng

Just one comment, otherwise lgtm

csrwng · 2026-06-03T03:40:11Z

+			}
+
+			expectedStrategy := hc.Spec.OperatorConfiguration.IngressOperator.EndpointPublishingStrategy
+			Expect(expectedStrategy.Type).To(Equal(operatorv1.LoadBalancerServiceStrategyType),


I would consider these pre-conditions and not necessarily test failures. If the strategy is not LoadBalancer or LoadBalancer does not have the internal scope, then skip the test.

Good catch. Removed them entirely rather than converting to Skip — these were artifacts of the v1 lifecycle model where the test itself injected that exact configuration via a BeforeApply hook. In v2 the test runs against a pre-existing cluster it doesn't control, so they were both redundant and unnecessarily restrictive. The Eventually block already validates propagation dynamically using expectedStrategy, so the test now works for any configured strategy type.

The v2 ingress test runs against a pre-existing cluster it does not control, unlike the v1 test which explicitly set LoadBalancerService/ Internal scope via a BeforeApply hook. The hardcoded Expect assertions for strategy type and LoadBalancer scope were artifacts of that v1 lifecycle model and unnecessarily restricted test applicability. The Eventually block already validates propagation dynamically using the extracted expectedStrategy values, so the assertions were redundant in addition to being restrictive. Assisted-by: Claude:claude-opus-4-6

cblecker · 2026-06-03T05:25:33Z

/pipeline required
/verified by e2e

openshift-merge-bot · 2026-06-03T05:25:36Z

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks-4-22
/test e2e-aws-4-22
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws
/test e2e-v2-gke

openshift-ci-robot · 2026-06-03T05:25:44Z

@cblecker: This PR has been marked as verified by e2e.

Details

In response to this:

/pipeline required
/verified by e2e

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

csrwng · 2026-06-03T13:12:19Z

/lgtm

openshift-merge-bot · 2026-06-03T13:12:38Z

Tests from second stage were triggered manually. Pipeline can be controlled only manually, until HEAD changes. Use command to trigger second stage.

openshift-ci · 2026-06-03T13:42:00Z

@cblecker: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 26, 2026

openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 26, 2026

openshift-ci Bot added the do-not-merge/needs-area label May 26, 2026

openshift-ci Bot added area/platform/aws PR/issue for AWS (AWSPlatform) platform area/testing Indicates the PR includes changes for e2e testing and removed do-not-merge/needs-area labels May 26, 2026

openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 26, 2026

coderabbitai Bot reviewed May 26, 2026

View reviewed changes

cblecker changed the title ~~NO-JIRA: Port remaining v1 TestCreateCluster gaps to v2 e2e~~ GCP-688: Port remaining v1 TestCreateCluster gaps to v2 e2e May 26, 2026

cblecker marked this pull request as ready for review May 26, 2026 20:52

openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 26, 2026

openshift-ci Bot requested review from devguyio and sdminonne May 26, 2026 20:53

coderabbitai Bot reviewed May 26, 2026

View reviewed changes

Comment thread test/e2e/v2/tests/hosted_cluster_aws_test.go

cblecker force-pushed the feat/v2-e2e-remaining-create-cluster-gaps branch from 92fcb28 to 2173eee Compare May 26, 2026 22:24

coderabbitai Bot reviewed May 26, 2026

View reviewed changes

hypershift-jira-solve-ci Bot mentioned this pull request May 27, 2026

CNTRLPLANE-3515: test(e2e): add day-2 label/taint no-rollout verification #8595

Open

4 tasks

cblecker force-pushed the feat/v2-e2e-remaining-create-cluster-gaps branch from 2173eee to 29c70cb Compare May 27, 2026 20:39

bryan-cox reviewed May 29, 2026

View reviewed changes

Comment thread test/e2e/v2/tests/hosted_cluster_ingress_test.go

bryan-cox reviewed May 29, 2026

View reviewed changes

Comment thread test/e2e/v2/tests/hosted_cluster_aws_test.go Outdated

fix(e2e/v2): address PR review feedback

6a6d8c1

Move E2E_ADDITIONAL_PULL_SECRET_FILE registration to the central env var catalog (Standard openshift#8) and replace defer with DeferCleanup for Ginkgo abort/timeout safety. Assisted-by: Claude:claude-sonnet-4-6

openshift-ci-robot removed the verified Signifies that the PR passed pre-merge verification criteria label May 29, 2026

openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label May 30, 2026

hypershift-jira-solve-ci Bot mentioned this pull request Jun 1, 2026

OCPBUGS-86306: Stop controllers fighting over HCP status #8562

Merged

4 tasks

csrwng reviewed Jun 3, 2026

View reviewed changes

openshift-ci-robot removed the verified Signifies that the PR passed pre-merge verification criteria label Jun 3, 2026

openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Jun 3, 2026

openshift-ci Bot assigned csrwng Jun 3, 2026

openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jun 3, 2026

openshift-merge-bot Bot merged commit 6902fe3 into openshift:main Jun 3, 2026
41 checks passed

cblecker deleted the feat/v2-e2e-remaining-create-cluster-gaps branch June 3, 2026 13:50

coderabbitai Bot mentioned this pull request Jun 3, 2026

NO-JIRA: test(e2e/v2): sweep AGENTS.md conformance violations #8659

Merged

8 tasks

Conversation

cblecker commented May 26, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Summary by CodeRabbit

Uh oh!

openshift-merge-bot Bot commented May 26, 2026

Uh oh!

openshift-ci-robot commented May 26, 2026

Summary

Test plan

Uh oh!

openshift-ci Bot commented May 26, 2026

Uh oh!

coderabbitai Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Sequence Diagram(s)

Suggested reviewers

Pre-merge checks failed

❌ Failed checks (1 error)

Uh oh!

cblecker commented May 26, 2026

Uh oh!

openshift-ci Bot commented May 26, 2026

Uh oh!

codecov Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

openshift-ci-robot commented May 26, 2026 • edited by openshift-ci Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cblecker commented May 27, 2026

Uh oh!

openshift-merge-bot Bot commented May 27, 2026

Uh oh!

bryan-cox May 29, 2026

Choose a reason for hiding this comment

Uh oh!

cblecker May 29, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

bryan-cox commented May 30, 2026

Uh oh!

cblecker commented May 30, 2026

Uh oh!

openshift-merge-bot Bot commented May 30, 2026

Uh oh!

cblecker commented May 26, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 26, 2026 •

edited

Loading

codecov Bot commented May 26, 2026 •

edited

Loading

openshift-ci-robot commented May 26, 2026 •

edited by openshift-ci Bot

Loading

hypershift-jira-solve-ci Bot commented May 30, 2026 •

edited by openshift-ci Bot

Loading