Skip to content

GCP-688: Port remaining v1 TestCreateCluster gaps to v2 e2e#8590

Merged
openshift-merge-bot[bot] merged 4 commits into
openshift:mainfrom
cblecker:feat/v2-e2e-remaining-create-cluster-gaps
Jun 3, 2026
Merged

GCP-688: Port remaining v1 TestCreateCluster gaps to v2 e2e#8590
openshift-merge-bot[bot] merged 4 commits into
openshift:mainfrom
cblecker:feat/v2-e2e-remaining-create-cluster-gaps

Conversation

@cblecker

@cblecker cblecker commented May 26, 2026

Copy link
Copy Markdown
Member

Summary

Ports the remaining 6 v1 TestCreateCluster validation gaps to v2 e2e tests, following up on PR #8511 which ported 14 validations.

Non-lifecycle tests (read-only verification, no cluster mutation):

  • EnsureNodeCommunication (hosted_cluster_node_communication_test.go): Verifies konnectivity-agent pods exist in the hosted cluster and that pod logs are retrievable, confirming the konnectivity tunnel works
  • ValidateIngressOperatorConfiguration (hosted_cluster_ingress_test.go): Platform-agnostic test that validates EndpointPublishingStrategy configured on the HostedCluster is reflected in the hosted cluster's IngressController (replaces Azure-only version with platform-agnostic coverage, requires 4.21+)
  • VerifyCPOOverrideImage (hosted_cluster_cpo_test.go): Checks that when a CPO override image is configured for the platform/version, the running control-plane-operator pod uses that image (self-skips when no override is configured)

Lifecycle tests (mutate cluster state with cleanup):

  • EnsureDefaultSecurityGroupTags (hosted_cluster_aws_test.go): AWS-only, adds a day-2 tag to the HostedCluster spec and verifies it's applied to the default worker security group via AWS API (requires 4.20+)
  • EnsureAWSCCMWithCustomizations (hosted_cluster_aws_test.go): AWS-only, validates NLBSecurityGroupMode=Managed in aws-cloud-config ConfigMap and creates a LoadBalancer service to verify managed security groups are attached to the NLB (requires 4.23+)
  • EnsureGlobalPullSecret (hosted_cluster_pull_secret_test.go): Full pull secret lifecycle — creates additional pull secret, verifies propagation through GlobalPullSecret pipeline, validates DaemonSet readiness, verifies in-place management-cluster pull secret propagation without rollout, and confirms cleanup on deletion (AWS/Azure, requires 4.19+)

Test plan

  • make e2ev2 compiles successfully
  • make lint passes with 0 issues
  • Pre-existing unit test failures are unrelated to these changes
  • Run v2 suite on GCP — non-lifecycle tests pass, AWS-only tests skip
  • Run v2 suite on AWS — lifecycle and non-lifecycle AWS tests pass
  • Run v2 suite on Azure — global pull secret lifecycle test passes

Summary by CodeRabbit

  • Tests
    • Added end-to-end tests for hosted clusters: AWS security‑group tagging and load‑balancer validation, control‑plane operator image override checks, ingress operator endpoint behavior, node communication/log access verification, and global pull‑secret lifecycle scenarios.
  • Documentation
    • Updated e2e framework guidelines: package organization, mandatory cleanup patterns, idempotent resource handling, and improved assertion and testing best practices.
  • Chores
    • Added optional environment variable to configure an additional pull-secret file for e2e runs.

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot

Copy link
Copy Markdown

@cblecker: This pull request explicitly references no jira issue.

Details

In response to this:

Summary

Ports the remaining 6 v1 TestCreateCluster validation gaps to v2 e2e tests, following up on PR #8511 which ported 14 validations.

Non-lifecycle tests (read-only verification, no cluster mutation):

  • EnsureNodeCommunication (hosted_cluster_node_communication_test.go): Verifies konnectivity-agent pods exist in the hosted cluster and that pod logs are retrievable, confirming the konnectivity tunnel works
  • ValidateIngressOperatorConfiguration (hosted_cluster_ingress_test.go): Platform-agnostic test that validates EndpointPublishingStrategy configured on the HostedCluster is reflected in the hosted cluster's IngressController (replaces Azure-only version with platform-agnostic coverage, requires 4.21+)
  • VerifyCPOOverrideImage (hosted_cluster_cpo_test.go): Checks that when a CPO override image is configured for the platform/version, the running control-plane-operator pod uses that image (self-skips when no override is configured)

Lifecycle tests (mutate cluster state with cleanup):

  • EnsureDefaultSecurityGroupTags (hosted_cluster_aws_test.go): AWS-only, adds a day-2 tag to the HostedCluster spec and verifies it's applied to the default worker security group via AWS API (requires 4.20+)
  • EnsureAWSCCMWithCustomizations (hosted_cluster_aws_test.go): AWS-only, validates NLBSecurityGroupMode=Managed in aws-cloud-config ConfigMap and creates a LoadBalancer service to verify managed security groups are attached to the NLB (requires 4.23+)
  • EnsureGlobalPullSecret (hosted_cluster_pull_secret_test.go): Full pull secret lifecycle — creates additional pull secret, verifies propagation through GlobalPullSecret pipeline, validates DaemonSet readiness, verifies in-place management-cluster pull secret propagation without rollout, and confirms cleanup on deletion (AWS/Azure, requires 4.19+)

Test plan

  • make e2ev2 compiles successfully
  • make lint passes with 0 issues
  • Pre-existing unit test failures are unrelated to these changes
  • Run v2 suite on GCP — non-lifecycle tests pass, AWS-only tests skip
  • Run v2 suite on AWS — lifecycle and non-lifecycle AWS tests pass
  • Run v2 suite on Azure — global pull secret lifecycle test passes

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 26, 2026
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 26, 2026
@openshift-ci

openshift-ci Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai

coderabbitai Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds multiple e2ev2 Ginkgo test suites for HostedCluster: CPO override image, ingress operator configuration, konnectivity-agent node communication, AWS-specific tests (default worker security group tagging and CCM/NLB security-group checks), and Global Pull Secret lifecycle tests. Updates AGENTS.md with v2 framework guidance and registers a new optional env var E2E_ADDITIONAL_PULL_SECRET_FILE.

Sequence Diagram(s)

sequenceDiagram
  participant TestRunner
  participant KubernetesAPI
  participant ELBV2
  participant EC2
  TestRunner->>KubernetesAPI: create NLB Service + annotations
  KubernetesAPI->>ELBV2: controller provisions load balancer (hostname observed)
  TestRunner->>ELBV2: describe load balancer by name
  ELBV2->>EC2: return attached security groups
  TestRunner->>EC2: verify non-empty security groups
Loading

Suggested reviewers

  • devguyio
  • sdminonne

Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error)

Check name Status Explanation Resolution
No-Weak-Crypto ❌ Error hosted_cluster_pull_secret_test.go line 147 uses non-constant-time bytes.Equal() to compare sensitive pull secret (Docker config JSON) data. Replace bytes.Equal() with crypto/subtle.ConstantTimeCompare() for secret comparison at line 147.
✅ Passed checks (10 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly references the PR objective of porting remaining v1 TestCreateCluster gaps to v2 e2e tests, matching the primary goal and changeset focus.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed All test names are stable and deterministic with no dynamic information like generated identifiers, timestamps, or random suffixes. Only clear, descriptive static strings are used.
Test Structure And Quality ✅ Passed High-quality Ginkgo test code: single responsibility per test, DeferCleanup for all resources, explicit timeouts, meaningful messages, vacuous pass prevention, idempotent creation.
Topology-Aware Scheduling Compatibility ✅ Passed Custom check targets deployment manifests, operator code, or controllers. All PR changes are E2E test code validating existing behavior, not production workload definitions.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed Tests contain no IPv4 assumptions, hardcoded IPs, or external connectivity requirements. Uses only internal Kubernetes APIs and local AWS credentials.
Container-Privileges ✅ Passed PR adds e2e test files, not container manifests. No privileged settings (privileged:true, hostPID, hostNetwork, hostIPC, SYS_ADMIN, allowPrivilegeEscalation) found in any new code.
No-Sensitive-Data-In-Logs ✅ Passed No logging of sensitive data found. Tests use only test/example data (e2e-dummy-token); env_vars.go properly masks sensitive env var values; no hardcoded credentials.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@cblecker

Copy link
Copy Markdown
Member Author

/test e2e-v2-aws e2e-v2-gke

@openshift-ci openshift-ci Bot added area/platform/aws PR/issue for AWS (AWSPlatform) platform area/testing Indicates the PR includes changes for e2e testing and removed do-not-merge/needs-area labels May 26, 2026
@openshift-ci

openshift-ci Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cblecker

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 26, 2026
@codecov

codecov Bot commented May 26, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 41.27%. Comparing base (972ae23) to head (4a88e92).
⚠️ Report is 107 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8590      +/-   ##
==========================================
+ Coverage   40.53%   41.27%   +0.73%     
==========================================
  Files         755      755              
  Lines       93235    93446     +211     
==========================================
+ Hits        37794    38566     +772     
+ Misses      52740    52148     -592     
- Partials     2701     2732      +31     

see 50 files with indirect coverage changes

Flag Coverage Δ
cmd-support 34.86% <ø> (+0.41%) ⬆️
cpo-hostedcontrolplane 43.50% <ø> (+1.73%) ⬆️
cpo-other 42.79% <ø> (+1.55%) ⬆️
hypershift-operator 51.00% <ø> (+0.27%) ⬆️
other 31.64% <ø> (+0.05%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/v2/tests/hosted_cluster_aws_test.go`:
- Around line 48-49: The two E2E specs that call external AWS APIs (the It block
with description "should apply the tag to the default worker security group via
AWS API" and the other spec around lines 147-148) must be marked as requiring
external connectivity; add the [Skipped:Disconnected] label to those tests
(e.g., modify the Label("AWS") call or add Label("Skipped:Disconnected")
alongside it) so the test harness will skip them when disconnected, ensuring
tests that call IAM/EC2/ELBv2 are properly flagged.
- Around line 94-100: The test appends a day-2 AWS tag to
hc.Spec.Platform.AWS.ResourceTags but never restores the original slice, leaking
cluster-scoped state; capture the original tags before calling
e2eutil.UpdateObject (e.g. origTags := append([]hyperv1.AWSResourceTag(nil),
hc.Spec.Platform.AWS.ResourceTags...)) and add a cleanup that uses
e2eutil.UpdateObject in an AfterEach/Defer to set
hc.Spec.Platform.AWS.ResourceTags = origTags (or remove the specific day2 tag)
so the HostedCluster resource is restored; reference the HostedCluster variable
(hc), the field Spec.Platform.AWS.ResourceTags, and the e2eutil.UpdateObject
call when making the change.

In `@test/e2e/v2/tests/hosted_cluster_cpo_test.go`:
- Around line 48-50: The test currently calls Skip when version == "" which
hides parsing regressions; change the behavior to fail the test instead by
replacing the Skip call with a test failure (e.g., use t.Fatalf or FailNow with
a clear message) so that when the parsed version is empty we fail with context
including releaseImage and the unexpected empty version; locate the block that
checks version == "" (the one referencing version and releaseImage) and replace
Skip(fmt.Sprintf(...)) with t.Fatalf("could not extract version from release
image %s: parsed version is empty", releaseImage) or an equivalent failure
helper used in the test suite.

In `@test/e2e/v2/tests/hosted_cluster_ingress_test.go`:
- Around line 53-58: The test currently asserts hardcoded values
(operatorv1.LoadBalancerServiceStrategyType and operatorv1.InternalLoadBalancer)
instead of validating the IngressController mirrors the HostedCluster strategy;
update the assertions to compare
IngressController.Spec.EndpointPublishingStrategy to the expectedStrategy object
used earlier (e.g., replace checks against expectedStrategy.Type and
expectedStrategy.LoadBalancer.Scope with a direct equality/assertion between
IngressController.Spec.EndpointPublishingStrategy and expectedStrategy or
compare their Type and LoadBalancer fields to expectedStrategy.Type and
expectedStrategy.LoadBalancer to ensure the controller reflects the configured
strategy); apply the same change to the later block that repeats these hardcoded
assertions.

In `@test/e2e/v2/tests/hosted_cluster_node_communication_test.go`:
- Around line 47-58: The test currently always reads logs from podList.Items[0],
which is flaky because pod ordering is not stable; change the selection logic
inside the Eventually block (where podList is retrieved and GetLogs is called)
to iterate over podList.Items and pick a healthy konnectivity-agent pod (e.g.,
check pod.Status.Conditions for Ready==True or attempt GetLogs in a loop) and
only call GetLogs on the first pod that succeeds; update the GetLogs usage (the
call around GetLogs(...).DoRaw(tc.Context)) to try each pod name from the range
and assert success with g.Expect once a pod returns logs, failing only if none
succeed within the Eventually timeout.

In `@test/e2e/v2/tests/hosted_cluster_pull_secret_test.go`:
- Around line 99-100: Register the cleanup immediately after creating the secret
so it is removed even if the test fails: right after the
createAdditionalPullSecret(tc, hcClient, dummyPullSecretData) call, call
Ginkgo's DeferCleanup (or your test-framework cleanup helper) to delete the
secret (e.g., DeferCleanup(func() { deleteAdditionalPullSecret(tc, hcClient,
"additional-pull-secret") })); keep or remove the later cleanup at lines 139-145
as appropriate, but ensure the immediate DeferCleanup is present so the created
secret is always removed on test exit.
- Around line 232-238: The loop over nodePool.Status.Conditions must first
assert that a NodePoolUpdatingConfigConditionType entry exists before checking
its Status; modify the code that iterates nodePool.Status.Conditions (the block
referencing NodePoolUpdatingConfigConditionType) to record whether the condition
was found (e.g., found := false), set found = true when cond.Type ==
NodePoolUpdatingConfigConditionType, then after the loop add an
Expect(found).To(BeTrue(), "expected NodePoolUpdatingConfigConditionType to be
present") and only then validate Expect(string(cond.Status)).To(Equal(...)) for
the matching condition. Ensure you reference the same condition variable used in
the loop when asserting its Status.
- Around line 82-87: The test currently picks the first NodePool from
npList.Items which is nondeterministic; instead, iterate npList.Items to select
a deterministic NodePool that meets the test preconditions (e.g., Spec.Replicas
!= nil and Spec.Management.UpgradeType != hyperv1.UpgradeTypeInPlace), then if
multiple candidates exist sort them by a stable key like NodePool.Name and pick
the first; update references to np (currently used after selecting
npList.Items[0]) to use the chosen candidate. Ensure you check np.Spec.Replicas
and np.Spec.Management.UpgradeType on the selected NodePool and call Skip only
if no valid candidate is found.
- Around line 275-277: The test currently returns early when
hcClient.Create(tc.Context, secret) yields apierrors.IsAlreadyExists(err), which
leaves stale secret data; instead, on AlreadyExists you should fetch the
existing Secret (use hcClient.Get with the secret's NamespacedName), compare the
relevant fields (data/annotations/labels as used in the test), and if they
differ, update the existing Secret (set fields and use hcClient.Update or a
strategic patch ensuring ResourceVersion is preserved) so the secret content is
reconciled to the expected `secret` value; keep using tc.Context and the
original `secret` object for the desired state and handle other errors normally.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 03ff844d-17df-4497-9441-3b77b338c5b2

📥 Commits

Reviewing files that changed from the base of the PR and between 962cdbb and 8947114.

📒 Files selected for processing (5)
  • test/e2e/v2/tests/hosted_cluster_aws_test.go
  • test/e2e/v2/tests/hosted_cluster_cpo_test.go
  • test/e2e/v2/tests/hosted_cluster_ingress_test.go
  • test/e2e/v2/tests/hosted_cluster_node_communication_test.go
  • test/e2e/v2/tests/hosted_cluster_pull_secret_test.go

Comment thread test/e2e/v2/tests/hosted_cluster_aws_test.go
Comment thread test/e2e/v2/tests/hosted_cluster_aws_test.go
Comment thread test/e2e/v2/tests/hosted_cluster_cpo_test.go Outdated
Comment thread test/e2e/v2/tests/hosted_cluster_ingress_test.go Outdated
Comment thread test/e2e/v2/tests/hosted_cluster_node_communication_test.go
Comment thread test/e2e/v2/tests/hosted_cluster_pull_secret_test.go Outdated
Comment thread test/e2e/v2/tests/hosted_cluster_pull_secret_test.go
Comment thread test/e2e/v2/tests/hosted_cluster_pull_secret_test.go
Comment thread test/e2e/v2/tests/hosted_cluster_pull_secret_test.go
@cblecker cblecker changed the title NO-JIRA: Port remaining v1 TestCreateCluster gaps to v2 e2e GCP-688: Port remaining v1 TestCreateCluster gaps to v2 e2e May 26, 2026
@openshift-ci-robot

openshift-ci-robot commented May 26, 2026

Copy link
Copy Markdown

@cblecker: This pull request references GCP-688 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Summary

Ports the remaining 6 v1 TestCreateCluster validation gaps to v2 e2e tests, following up on PR #8511 which ported 14 validations.

Non-lifecycle tests (read-only verification, no cluster mutation):

  • EnsureNodeCommunication (hosted_cluster_node_communication_test.go): Verifies konnectivity-agent pods exist in the hosted cluster and that pod logs are retrievable, confirming the konnectivity tunnel works
  • ValidateIngressOperatorConfiguration (hosted_cluster_ingress_test.go): Platform-agnostic test that validates EndpointPublishingStrategy configured on the HostedCluster is reflected in the hosted cluster's IngressController (replaces Azure-only version with platform-agnostic coverage, requires 4.21+)
  • VerifyCPOOverrideImage (hosted_cluster_cpo_test.go): Checks that when a CPO override image is configured for the platform/version, the running control-plane-operator pod uses that image (self-skips when no override is configured)

Lifecycle tests (mutate cluster state with cleanup):

  • EnsureDefaultSecurityGroupTags (hosted_cluster_aws_test.go): AWS-only, adds a day-2 tag to the HostedCluster spec and verifies it's applied to the default worker security group via AWS API (requires 4.20+)
  • EnsureAWSCCMWithCustomizations (hosted_cluster_aws_test.go): AWS-only, validates NLBSecurityGroupMode=Managed in aws-cloud-config ConfigMap and creates a LoadBalancer service to verify managed security groups are attached to the NLB (requires 4.23+)
  • EnsureGlobalPullSecret (hosted_cluster_pull_secret_test.go): Full pull secret lifecycle — creates additional pull secret, verifies propagation through GlobalPullSecret pipeline, validates DaemonSet readiness, verifies in-place management-cluster pull secret propagation without rollout, and confirms cleanup on deletion (AWS/Azure, requires 4.19+)

Test plan

  • make e2ev2 compiles successfully
  • make lint passes with 0 issues
  • Pre-existing unit test failures are unrelated to these changes
  • Run v2 suite on GCP — non-lifecycle tests pass, AWS-only tests skip
  • Run v2 suite on AWS — lifecycle and non-lifecycle AWS tests pass
  • Run v2 suite on Azure — global pull secret lifecycle test passes

Summary by CodeRabbit

  • Tests
  • Added comprehensive end-to-end test coverage for hosted clusters, including:
    • AWS resource tagging and cloud controller manager customizations
    • Control plane operator image override validation
    • Ingress operator configuration verification
    • Node communication and connectivity checks
    • Global pull secret lifecycle management

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@cblecker cblecker marked this pull request as ready for review May 26, 2026 20:52
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 26, 2026
@openshift-ci openshift-ci Bot requested review from devguyio and sdminonne May 26, 2026 20:53

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/v2/tests/hosted_cluster_aws_test.go`:
- Around line 101-111: DeferCleanup currently filters out tags by key and
ignores errors, which can drop pre-existing tags and mask failures; before
mutating obj.Spec.Platform.AWS.ResourceTags capture the original slice (e.g.
origTags := make([]hyperv1.AWSResourceTag, len(...)) copy(...)) and in the
DeferCleanup use e2eutil.UpdateObject to restore the exact original slice back
to obj.Spec.Platform.AWS.ResourceTags; check the returned error and treat
apierrors.IsNotFound(err) as acceptable but fail the test (return or call
GinkgoTB().Fatalf/Fail) for any other non-nil error so cleanup failures are
surfaced; reference functions/types: DeferCleanup, e2eutil.UpdateObject,
hyperv1.HostedCluster, ResourceTags, day2TagKey, and apierrors.IsNotFound.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 2bda1482-1b2e-4a72-936e-e54b68f465d3

📥 Commits

Reviewing files that changed from the base of the PR and between 8947114 and 92fcb28.

📒 Files selected for processing (5)
  • test/e2e/v2/AGENTS.md
  • test/e2e/v2/tests/hosted_cluster_aws_test.go
  • test/e2e/v2/tests/hosted_cluster_ingress_test.go
  • test/e2e/v2/tests/hosted_cluster_node_communication_test.go
  • test/e2e/v2/tests/hosted_cluster_pull_secret_test.go
✅ Files skipped from review due to trivial changes (1)
  • test/e2e/v2/AGENTS.md

Comment thread test/e2e/v2/tests/hosted_cluster_aws_test.go
Add 6 new v2 e2e tests covering functionality previously only tested
in v1 TestCreateCluster:

- EnsureNodeCommunication: verifies konnectivity-agent pods exist and
  logs are retrievable (non-lifecycle)
- ValidateIngressOperatorConfiguration: validates EndpointPublishingStrategy
  propagation to hosted cluster IngressController (non-lifecycle)
- VerifyCPOOverrideImage: checks CPO pod runs with expected override
  image for the platform and version (non-lifecycle)
- EnsureDefaultSecurityGroupTags: applies day-2 resource tags to
  default worker security group via AWS API (lifecycle, AWS)
- EnsureAWSCCMWithCustomizations: validates NLBSecurityGroupMode=Managed
  config and managed security groups on NLB (lifecycle, AWS)
- EnsureGlobalPullSecret: full pull secret lifecycle including
  propagation, DaemonSet readiness, and cleanup (lifecycle)

Assisted-by: Claude:claude-opus-4-6
@cblecker cblecker force-pushed the feat/v2-e2e-remaining-create-cluster-gaps branch from 92fcb28 to 2173eee Compare May 26, 2026 22:24

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (3)
test/e2e/v2/tests/hosted_cluster_ingress_test.go (1)

52-58: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Hardcoded strategy assertions limit test generality.

Lines 72-79 were updated to compare against expectedStrategy, but these assertions still hardcode LoadBalancerServiceStrategyType and InternalLoadBalancer. The test's contract is to verify that the IngressController mirrors the HostedCluster configuration, yet these lines artificially restrict the test to one strategy type.

As per coding guidelines, when extracting a value from one resource to validate against another, use the extracted variable in assertions — not hardcoded enum constants.

♻️ Recommended change
-			expectedStrategy := hc.Spec.OperatorConfiguration.IngressOperator.EndpointPublishingStrategy
-			Expect(expectedStrategy.Type).To(Equal(operatorv1.LoadBalancerServiceStrategyType),
-				"EndpointPublishingStrategy type should be LoadBalancerService")
-			Expect(expectedStrategy.LoadBalancer).NotTo(BeNil(),
-				"LoadBalancer configuration should be set")
-			Expect(expectedStrategy.LoadBalancer.Scope).To(Equal(operatorv1.InternalLoadBalancer),
-				"LoadBalancer scope should be Internal")
+			expectedStrategy := hc.Spec.OperatorConfiguration.IngressOperator.EndpointPublishingStrategy
+			Expect(expectedStrategy).NotTo(BeNil(),
+				"EndpointPublishingStrategy should be configured on HostedCluster")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/v2/tests/hosted_cluster_ingress_test.go` around lines 52 - 58, The
assertions currently hardcode LoadBalancerServiceStrategyType and
InternalLoadBalancer instead of using the extracted expectedStrategy; update the
three Expect checks to compare the IngressController's observed strategy fields
against expectedStrategy (the value extracted from
hc.Spec.OperatorConfiguration.IngressOperator.EndpointPublishingStrategy) so the
test validates mirroring behavior generically rather than enforcing a specific
enum constant.
test/e2e/v2/tests/hosted_cluster_aws_test.go (2)

94-112: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Restore AWS ResourceTags exactly instead of filtering by key.

Current cleanup removes by key only, which can drop pre-existing tags sharing that key. Capture the original slice before mutation and restore it verbatim; allow IsNotFound during cleanup.

Proposed fix
 import (
 	"fmt"
 	"strings"
 	"time"
@@
 	corev1 "k8s.io/api/core/v1"
+	apierrors "k8s.io/apimachinery/pkg/api/errors"
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
@@
 			day2TagKey := "test-day2-tag"
 			day2TagValue := "test-day2-value"
+			originalTags := append([]hyperv1.AWSResourceTag(nil), hc.Spec.Platform.AWS.ResourceTags...)

 			err = e2eutil.UpdateObject(GinkgoTB(), tc.Context, tc.MgmtClient, hc, func(obj *hyperv1.HostedCluster) {
 				obj.Spec.Platform.AWS.ResourceTags = append(obj.Spec.Platform.AWS.ResourceTags, hyperv1.AWSResourceTag{
@@
 			DeferCleanup(func() {
 				err := e2eutil.UpdateObject(GinkgoTB(), tc.Context, tc.MgmtClient, hc, func(obj *hyperv1.HostedCluster) {
-					filtered := make([]hyperv1.AWSResourceTag, 0, len(obj.Spec.Platform.AWS.ResourceTags))
-					for _, tag := range obj.Spec.Platform.AWS.ResourceTags {
-						if tag.Key != day2TagKey {
-							filtered = append(filtered, tag)
-						}
-					}
-					obj.Spec.Platform.AWS.ResourceTags = filtered
+					obj.Spec.Platform.AWS.ResourceTags = append([]hyperv1.AWSResourceTag(nil), originalTags...)
 				})
-				Expect(err).NotTo(HaveOccurred(), "cleanup: failed to remove day-2 tag from HostedCluster")
+				if err != nil && !apierrors.IsNotFound(err) {
+					Expect(err).NotTo(HaveOccurred(), "cleanup: failed to restore HostedCluster AWS resource tags")
+				}
 			})
As per coding guidelines: "When a test mutates cluster state, capture the original state before mutation and defer restoration... In cleanup functions, check `apierrors.IsNotFound()` to handle cases where the resource was already deleted."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/v2/tests/hosted_cluster_aws_test.go` around lines 94 - 112, The
cleanup currently filters ResourceTags by key which can drop pre-existing tags;
before calling e2eutil.UpdateObject to append the day2 tag in the HostedCluster
(obj.Spec.Platform.AWS.ResourceTags), capture the original slice (e.g., origTags
:= append([]hyperv1.AWSResourceTag(nil), hc.Spec.Platform.AWS.ResourceTags...))
and in DeferCleanup restore that exact slice verbatim via e2eutil.UpdateObject
(setting obj.Spec.Platform.AWS.ResourceTags = origTags); also make the cleanup
tolerant to the resource being deleted by allowing apierrors.IsNotFound() when
checking the returned error from UpdateObject.

48-49: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Mark external AWS API specs as disconnected-incompatible.

Both specs call external AWS APIs and should include Label("Skipped:Disconnected") so disconnected jobs can filter them.

Proposed fix
-		It("should apply the tag to the default worker security group via AWS API", Label("AWS"), func() {
+		It("should apply the tag to the default worker security group via AWS API", Label("AWS", "Skipped:Disconnected"), func() {
...
-			It("should attach managed security groups to the NLB", func() {
+			It("should attach managed security groups to the NLB", Label("AWS", "CCM", "Skipped:Disconnected"), func() {
As per coding guidelines: "Flag tests requiring external connectivity ... Add [Skipped:Disconnected] label if external connectivity is required and cannot be adapted."

Also applies to: 159-160

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/v2/tests/hosted_cluster_aws_test.go` around lines 48 - 49, The test
It block that currently uses Label("AWS") (the "should apply the tag to the
default worker security group via AWS API" test) invokes external AWS APIs and
must be marked as disconnected-incompatible; add Label("Skipped:Disconnected")
alongside Label("AWS") on that It call to allow CI to skip when offline, and do
the same for the other It block referenced around the later section (the other
AWS API test). Locate the It(...) calls (the test name strings and the
Label("AWS") usages) and add Label("Skipped:Disconnected") to each.
🧹 Nitpick comments (4)
test/e2e/v2/tests/hosted_cluster_node_communication_test.go (2)

32-36: ⚡ Quick win

Add exported function comments with fail-loud behavior details.

RegisterNodeCommunicationTests and EnsureNodeCommunicationTest are exported but undocumented. In this package, comments should describe actual behavior and mention panic/fail-loud expectations.

Suggested patch
+// RegisterNodeCommunicationTests registers hosted-cluster node-communication e2e specs.
 func RegisterNodeCommunicationTests(getTestCtx internal.TestContextGetter) {
 	EnsureNodeCommunicationTest(getTestCtx)
 }
 
+// EnsureNodeCommunicationTest verifies konnectivity-agent pod presence and log retrievability.
+// It fails loudly via Gomega expectations when hosted-cluster client setup or log retrieval fails.
 func EnsureNodeCommunicationTest(getTestCtx internal.TestContextGetter) {
 	When("hosted cluster has konnectivity tunnel configured", func() {

As per coding guidelines: “Comments on exported functions must describe actual behavior including panic conditions, not just intended behavior.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/v2/tests/hosted_cluster_node_communication_test.go` around lines 32
- 36, Add Go doc comments for the exported functions
RegisterNodeCommunicationTests and EnsureNodeCommunicationTest describing their
actual behavior and explicitly stating fail‑loud/panic conditions (e.g., when a
nil TestContextGetter is passed or when underlying asserts/fatal checks will
call t.Fatal/panic), so callers know these functions may terminate the test run;
place the comments immediately above each function declaration and follow godoc
style (brief summary sentence starting with the function name, followed by
details about behavior and panic/fail‑loud expectations).

38-38: ⚡ Quick win

Label the It block for test filtering consistency.

The top-level Describe is labeled, but this It is not. Add a stable label on It as well to keep filtering behavior consistent with v2 guidance.

Suggested patch
-		It("should have konnectivity-agent pods with retrievable logs", func() {
+		It("should have konnectivity-agent pods with retrievable logs", Label("hosted-cluster-node-communication"), func() {

As per coding guidelines: “Apply labels to Describe and It blocks for test filtering.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/v2/tests/hosted_cluster_node_communication_test.go` at line 38, The
It block "should have konnectivity-agent pods with retrievable logs" is missing
a Ginkgo label; update the It invocation to include a stable Label(...) argument
(e.g. It("should have konnectivity-agent pods with retrievable logs",
Label("hosted-cluster-node-communication"), func() { ... })) so the test can be
filtered consistently with the top-level Describe; modify the It call in
hosted_cluster_node_communication_test.go to add the Label(...) option next to
the test description.
test/e2e/v2/tests/hosted_cluster_cpo_test.go (2)

34-38: ⚡ Quick win

Add doc comments for exported test registration functions.

Lines 34 and 38 export functions without comments describing actual behavior (including skip/failure behavior in the registered flow).

Suggested update
+// RegisterHostedClusterCPOTests registers Hosted Cluster CPO verification tests.
 func RegisterHostedClusterCPOTests(getTestCtx internal.TestContextGetter) {
 	VerifyCPOOverrideImageTest(getTestCtx)
 }
 
+// VerifyCPOOverrideImageTest verifies that a configured CPO override image
+// is used by a running control-plane-operator pod. The test skips when no
+// applicable override is configured for the platform/version.
 func VerifyCPOOverrideImageTest(getTestCtx internal.TestContextGetter) {

As per coding guidelines: test/e2e/v2/**/*.go: “Comments on exported functions must describe actual behavior including panic conditions, not just intended behavior.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/v2/tests/hosted_cluster_cpo_test.go` around lines 34 - 38, Add
descriptive doc comments above the exported functions
RegisterHostedClusterCPOTests and VerifyCPOOverrideImageTest that explain their
actual runtime behavior: what tests they register, any side effects, conditions
that cause the test to be skipped, expected failures or panic conditions, and
how the registered flow behaves on error (e.g., whether it calls t.Skip or
t.Fatal). Mention any inputs (getTestCtx) and what state or prerequisites are
required for successful execution so the comments satisfy the guideline that
exported test functions describe actual behavior including skip/failure
behavior.

40-40: ⚡ Quick win

Add a label on the It block for filterability.

Line 40 should include a stable Label(...) so this case can be targeted consistently in filtered runs.

Suggested update
-		It("should run the control-plane-operator pod with the expected override image", func() {
+		It("should run the control-plane-operator pod with the expected override image", Label("hosted-cluster-cpo"), func() {

As per coding guidelines: test/e2e/v2/**/*_test.go: “Apply labels to Describe and It blocks for test filtering.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/v2/tests/hosted_cluster_cpo_test.go` at line 40, Add a stable Label
to the Ginkgo It block that reads "should run the control-plane-operator pod
with the expected override image" so the test can be filtered reliably; update
the It declaration for that test case to include a Label argument (e.g.,
"control-plane-operator-override-image" or similar stable identifier) before the
final func() closure so the test is selectable via label-based filtering.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/v2/tests/hosted_cluster_aws_test.go`:
- Line 140: This test It("should have NLBSecurityGroupMode=Managed in the
aws-cloud-config ConfigMap") is missing a Ginkgo label used by the suite's
filter; add the same Label(...) used by other tests in this file (e.g.,
Label("aws") or the suite-specific label) to the It declaration so it becomes
It("...", Label("..."), func() { ... }), ensuring consistent label-based
filtering with the rest of the suite.
- Around line 173-176: After creating the namespace with
hcClient.Create(tc.Context, testNS), replace the anonymous defer with an
immediate tc.DeferCleanup registration that calls hcClient.Delete(tc.Context,
testNS) and checks the returned error; if the delete error is non-nil and not
apierrors.IsNotFound(err), surface it (e.g., via Expect/Fail) instead of
silently ignoring it. Reference the create call (hcClient.Create), the cleanup
call (hcClient.Delete), the test context (tc.Context), and use
apierrors.IsNotFound to ignore already-deleted resources when implementing the
cleanup.

In `@test/e2e/v2/tests/hosted_cluster_cpo_test.go`:
- Around line 68-81: The current loop picks the first runningPod and then
asserts on container index 0, which is nondeterministic; change the search to
set a found boolean while scanning podList.Items for pods with Status.Phase ==
corev1.PodRunning, assert found is true, and then instead of inspecting
runningPod.Spec.Containers[0] only, check all containers of each running pod to
ensure at least one running control-plane-operator pod contains a container
whose Image equals expectedImage (use pod.Name and expectedImage in failure
messages); update the assertion logic accordingly so it tolerates
rollouts/sidecars and follows the required found-flag pattern.

In `@test/e2e/v2/tests/hosted_cluster_pull_secret_test.go`:
- Around line 200-210: The cleanup currently swallows errors: in the
DeferCleanup closure, replace the blind return on Get error with explicit
handling using apierrors.IsNotFound(err) to return only when the secret truly
doesn't exist, and for any other Get error fail loudly (e.g., tc.T.Fatalf or
equivalent). Likewise, check the error returned by tc.MgmtClient.Update and fail
the test immediately on update failure instead of ignoring it. Target the
DeferCleanup closure and the calls to tc.MgmtClient.Get and tc.MgmtClient.Update
and use apierrors.IsNotFound to distinguish not-found from real errors.
- Around line 82-93: The current selection of a NodePool only checks for
Spec.Replicas != nil but allows zero replicas; update the selection logic to
require a positive replica count so tests exercise node-level behavior: when
iterating npList.Items in the hosted_cluster_pull_secret_test, change the
candidate filter to require candidate.Spec.Replicas != nil &&
*candidate.Spec.Replicas > 0 (and then set nodeCount = *np.Spec.Replicas); also
add an upfront assertion that npList.Items is non-empty before the loop and keep
the Skip with a clear message if no suitable NodePool (non-InPlace upgrade and
replicas > 0) is found.

---

Duplicate comments:
In `@test/e2e/v2/tests/hosted_cluster_aws_test.go`:
- Around line 94-112: The cleanup currently filters ResourceTags by key which
can drop pre-existing tags; before calling e2eutil.UpdateObject to append the
day2 tag in the HostedCluster (obj.Spec.Platform.AWS.ResourceTags), capture the
original slice (e.g., origTags := append([]hyperv1.AWSResourceTag(nil),
hc.Spec.Platform.AWS.ResourceTags...)) and in DeferCleanup restore that exact
slice verbatim via e2eutil.UpdateObject (setting
obj.Spec.Platform.AWS.ResourceTags = origTags); also make the cleanup tolerant
to the resource being deleted by allowing apierrors.IsNotFound() when checking
the returned error from UpdateObject.
- Around line 48-49: The test It block that currently uses Label("AWS") (the
"should apply the tag to the default worker security group via AWS API" test)
invokes external AWS APIs and must be marked as disconnected-incompatible; add
Label("Skipped:Disconnected") alongside Label("AWS") on that It call to allow CI
to skip when offline, and do the same for the other It block referenced around
the later section (the other AWS API test). Locate the It(...) calls (the test
name strings and the Label("AWS") usages) and add Label("Skipped:Disconnected")
to each.

In `@test/e2e/v2/tests/hosted_cluster_ingress_test.go`:
- Around line 52-58: The assertions currently hardcode
LoadBalancerServiceStrategyType and InternalLoadBalancer instead of using the
extracted expectedStrategy; update the three Expect checks to compare the
IngressController's observed strategy fields against expectedStrategy (the value
extracted from
hc.Spec.OperatorConfiguration.IngressOperator.EndpointPublishingStrategy) so the
test validates mirroring behavior generically rather than enforcing a specific
enum constant.

---

Nitpick comments:
In `@test/e2e/v2/tests/hosted_cluster_cpo_test.go`:
- Around line 34-38: Add descriptive doc comments above the exported functions
RegisterHostedClusterCPOTests and VerifyCPOOverrideImageTest that explain their
actual runtime behavior: what tests they register, any side effects, conditions
that cause the test to be skipped, expected failures or panic conditions, and
how the registered flow behaves on error (e.g., whether it calls t.Skip or
t.Fatal). Mention any inputs (getTestCtx) and what state or prerequisites are
required for successful execution so the comments satisfy the guideline that
exported test functions describe actual behavior including skip/failure
behavior.
- Line 40: Add a stable Label to the Ginkgo It block that reads "should run the
control-plane-operator pod with the expected override image" so the test can be
filtered reliably; update the It declaration for that test case to include a
Label argument (e.g., "control-plane-operator-override-image" or similar stable
identifier) before the final func() closure so the test is selectable via
label-based filtering.

In `@test/e2e/v2/tests/hosted_cluster_node_communication_test.go`:
- Around line 32-36: Add Go doc comments for the exported functions
RegisterNodeCommunicationTests and EnsureNodeCommunicationTest describing their
actual behavior and explicitly stating fail‑loud/panic conditions (e.g., when a
nil TestContextGetter is passed or when underlying asserts/fatal checks will
call t.Fatal/panic), so callers know these functions may terminate the test run;
place the comments immediately above each function declaration and follow godoc
style (brief summary sentence starting with the function name, followed by
details about behavior and panic/fail‑loud expectations).
- Line 38: The It block "should have konnectivity-agent pods with retrievable
logs" is missing a Ginkgo label; update the It invocation to include a stable
Label(...) argument (e.g. It("should have konnectivity-agent pods with
retrievable logs", Label("hosted-cluster-node-communication"), func() { ... }))
so the test can be filtered consistently with the top-level Describe; modify the
It call in hosted_cluster_node_communication_test.go to add the Label(...)
option next to the test description.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: b2ae6600-9319-46e5-b48a-c0ec45220efc

📥 Commits

Reviewing files that changed from the base of the PR and between 92fcb28 and 2173eee.

📒 Files selected for processing (6)
  • test/e2e/v2/AGENTS.md
  • test/e2e/v2/tests/hosted_cluster_aws_test.go
  • test/e2e/v2/tests/hosted_cluster_cpo_test.go
  • test/e2e/v2/tests/hosted_cluster_ingress_test.go
  • test/e2e/v2/tests/hosted_cluster_node_communication_test.go
  • test/e2e/v2/tests/hosted_cluster_pull_secret_test.go
✅ Files skipped from review due to trivial changes (1)
  • test/e2e/v2/AGENTS.md

Comment thread test/e2e/v2/tests/hosted_cluster_aws_test.go
Comment thread test/e2e/v2/tests/hosted_cluster_aws_test.go Outdated
Comment thread test/e2e/v2/tests/hosted_cluster_cpo_test.go Outdated
Comment thread test/e2e/v2/tests/hosted_cluster_pull_secret_test.go
Comment thread test/e2e/v2/tests/hosted_cluster_pull_secret_test.go
- Add DeferCleanup to remove day-2 tag after AWS security group test,
  capturing original tags before mutation and restoring them exactly
- Use dynamic expectedStrategy values in ingress assertions
- Iterate all konnectivity-agent pods for log retrieval
- Select suitable NodePool by upgrade type with replicas > 0
- Add DeferCleanup for additional-pull-secret creation
- Add vacuous pass guard for UpdatingConfig condition search
- Reconcile existing secret on AlreadyExists instead of silent return
- Search CPO pods by container name instead of nondeterministic index
- Fail on unparseable release image version instead of skipping
- Use DeferCleanup with IsNotFound guard for namespace cleanup
- Surface pull-secret restore failures in cleanup instead of swallowing
- Add Standards 17-18, amend Standard 9 and 16 in AGENTS.md

Assisted-by: Claude:claude-opus-4-6
@cblecker cblecker force-pushed the feat/v2-e2e-remaining-create-cluster-gaps branch from 2173eee to 29c70cb Compare May 27, 2026 20:39
@cblecker

Copy link
Copy Markdown
Member Author

/pipeline required

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks-4-22
/test e2e-aws-4-22
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws
/test e2e-v2-gke

g.Expect(ds.Status.DesiredNumberScheduled).To(BeNumerically(">=", nodeCount))
g.Expect(ds.Status.NumberReady).To(Equal(ds.Status.DesiredNumberScheduled))
}, 20*time.Minute, 10*time.Second).Should(Succeed())
})

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: The v1 EnsureGlobalPullSecret includes three validation steps that are not ported here:

  1. VerifyKubeletConfigWithDaemonSet (called twice in v1) - deploys a verifier DaemonSet that mounts the host filesystem and compares on-disk /var/lib/kubelet/config.json against the cluster pull secret.
  2. Negative pod pull test - creates a pod with unauthorized creds and asserts image pull fails.
  3. Positive pod pull test - creates a pod after applying valid creds and asserts image pull succeeds.

Together these prove the global pull secret pipeline works at the node level, not just at the Kubernetes object level. Was the omission intentional, or should these be added as a follow-up?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The omission was intentional. The v2 test validates the full Kubernetes API-level pipeline: management-cluster secret propagation, GlobalPullSecret creation/update/deletion, and DaemonSet readiness (ovnkube-node, global-pull-secret-syncer, konnectivity-agent). The DaemonSet readiness checks serve as implicit proof that images can be pulled and pods scheduled successfully.

The three omitted v1 steps — VerifyKubeletConfigWithDaemonSet (on-disk kubelet config.json verification), negative pod pull test, and positive pod pull test — are defense-in-depth at the node/filesystem and container-runtime levels. They're valuable but represent a separate concern from the controller pipeline being validated here.

Comment thread test/e2e/v2/tests/hosted_cluster_ingress_test.go
Comment thread test/e2e/v2/tests/hosted_cluster_aws_test.go Outdated
Move E2E_ADDITIONAL_PULL_SECRET_FILE registration to the
central env var catalog (Standard openshift#8) and replace defer with
DeferCleanup for Ginkgo abort/timeout safety.

Assisted-by: Claude:claude-sonnet-4-6
@openshift-ci-robot openshift-ci-robot removed the verified Signifies that the PR passed pre-merge verification criteria label May 29, 2026
@bryan-cox

Copy link
Copy Markdown
Member

/retest

@cblecker

Copy link
Copy Markdown
Member Author

/pipeline required
/verified by e2e

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks-4-22
/test e2e-aws-4-22
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws
/test e2e-v2-gke

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label May 30, 2026
@openshift-ci-robot

Copy link
Copy Markdown

@cblecker: This PR has been marked as verified by e2e.

Details

In response to this:

/pipeline required
/verified by e2e

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@cblecker

Copy link
Copy Markdown
Member Author

/retest

1 similar comment
@cblecker

Copy link
Copy Markdown
Member Author

/retest

@hypershift-jira-solve-ci

hypershift-jira-solve-ci Bot commented May 30, 2026

Copy link
Copy Markdown

I now have all the evidence needed. Here is the final analysis:

Test Failure Analysis Complete

Job Information

Test Failure Analysis

Error

could not run steps: step [release-inputs:latest] failed: could not resolve source
imagestream ocp/5.0 for release latest: imagestreams.image.openshift.io "5.0" not found

Summary

The e2e-kubevirt-aws-ovn-reduced test never executed. The job failed during the ci-operator graph setup phase — specifically when the [release-inputs:latest] step attempted to snapshot the ocp/5.0 ImageStream on the build03 CI cluster to assemble the latest release for testing. The ImageStream resource "5.0" in namespace ocp was not found on build03 at the time of execution, even though individual image references (e.g., ocp_5.0_cli) resolved successfully via quay-proxy. This is a transient CI infrastructure issue — the same job has completed successfully on other runs of PR #8590 both before and after this failure, confirming the ocp/5.0 ImageStream was temporarily unavailable on build03. The failure is not related to the PR's code changes.

Root Cause

The root cause is a transient unavailability of the ocp/5.0 ImageStream on the build03 CI build cluster.

The ci-operator configuration for openshift/hypershift on the main branch defines a releases.latest.integration entry pointing to name: "5.0", namespace: ocp. This was introduced during the OCP 5.0 branching (April 13, 2026, commit b9deb5c67c).

When the job ran on build03 at 18:20:36Z, the ci-operator:

  1. Successfully loaded the integrated stream metadata from config.ci.openshift.org/integratedStream?name=5.0&namespace=ocp
  2. Successfully resolved individual image references (e.g., ocp/5.0:cliquay-proxy.ci.openshift.org/openshift/ci:ocp_5.0_cli)
  3. Successfully resolved the initial release from 5.0.0-0.ci release stream
  4. Failed in 10 milliseconds when attempting to snapshot the ocp/5.0 ImageStream resource directly on the build03 cluster — the Kubernetes ImageStream object with name "5.0" did not exist in namespace "ocp" on that cluster at that moment

This means the ImageStream mirrors on build03 were temporarily out of sync. The ImageStream must be present on the build cluster for the snapshot operation, which copies all tags from the source imagestream into the job's stable stream. This is a known class of transient CI infrastructure issue — the ImageStream syncing between the central registry and the build clusters can have intermittent gaps.

Evidence that this is transient: other runs of the same job on PR #8590, occurring both before and after this run, completed successfully (multiple success states visible in the job history for the same PR around the same time window).

Recommendations
  1. Retest the job — This is a transient CI infrastructure failure unrelated to PR GCP-688: Port remaining v1 TestCreateCluster gaps to v2 e2e #8590's code changes. Simply re-trigger the e2e-kubevirt-aws-ovn-reduced job with /retest or /test e2e-kubevirt-aws-ovn-reduced.

  2. No code changes needed — The PR's changes (porting v1 TestCreateCluster gaps to v2 e2e) did not contribute to this failure. The test step never ran; the failure occurred entirely within ci-operator's release image assembly phase.

  3. If the failure recurs persistently, file an issue against the CI infrastructure team (Test Platform / DPTP) regarding ocp/5.0 ImageStream availability on build03. The relevant error reason reported by ci-operator is executing_graph:step_failed:creating_release_images.

Evidence
Evidence Detail
Error message imagestreams.image.openshift.io "5.0" not found
Failed step [release-inputs:latest] — "Find all of the input images from ocp/5.0 and tag them into the stable stream"
Step duration 0.010 seconds (immediate failure)
Build cluster build03 (console-openshift-console.apps.build03.ci.devcluster.openshift.com)
Test step status e2e-kubevirt-aws-ovn-reduced: never started
ci-operator reason executing_graph:step_failed:creating_release_images
ImageStream source ocp/5.0 (395 tags), configured via releases.latest.integration in ci-operator config
Individual images resolved ocp/5.0:cli, ocp/5.0:manila-csi-driver-operator-test, etc. all resolved via quay-proxy successfully
Config origin openshift/release repo, ci-operator/config/openshift/hypershift/openshift-hypershift-main.yaml
5.0 introduced April 13, 2026 during OCP 5.0 branching (commit b9deb5c67c)
Transient proof Same job on PR #8590 shows success runs at 21:44–21:45Z (builds 2060839756312350720, 2060840007391776768, etc.)
All image builds succeeded src, hypershift-operator, hypershift, hypershift-tests, hypershift-cli all built successfully despite the release-inputs failure

@bryan-cox

Copy link
Copy Markdown
Member

/retest

@csrwng csrwng left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one comment, otherwise lgtm

}

expectedStrategy := hc.Spec.OperatorConfiguration.IngressOperator.EndpointPublishingStrategy
Expect(expectedStrategy.Type).To(Equal(operatorv1.LoadBalancerServiceStrategyType),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would consider these pre-conditions and not necessarily test failures. If the strategy is not LoadBalancer or LoadBalancer does not have the internal scope, then skip the test.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Removed them entirely rather than converting to Skip — these were artifacts of the v1 lifecycle model where the test itself injected that exact configuration via a BeforeApply hook. In v2 the test runs against a pre-existing cluster it doesn't control, so they were both redundant and unnecessarily restrictive. The Eventually block already validates propagation dynamically using expectedStrategy, so the test now works for any configured strategy type.

The v2 ingress test runs against a pre-existing cluster it does not
control, unlike the v1 test which explicitly set LoadBalancerService/
Internal scope via a BeforeApply hook. The hardcoded Expect assertions
for strategy type and LoadBalancer scope were artifacts of that v1
lifecycle model and unnecessarily restricted test applicability.

The Eventually block already validates propagation dynamically using
the extracted expectedStrategy values, so the assertions were
redundant in addition to being restrictive.

Assisted-by: Claude:claude-opus-4-6
@openshift-ci-robot openshift-ci-robot removed the verified Signifies that the PR passed pre-merge verification criteria label Jun 3, 2026
@cblecker

cblecker commented Jun 3, 2026

Copy link
Copy Markdown
Member Author

/pipeline required
/verified by e2e

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks-4-22
/test e2e-aws-4-22
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws
/test e2e-v2-gke

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Jun 3, 2026
@openshift-ci-robot

Copy link
Copy Markdown

@cblecker: This PR has been marked as verified by e2e.

Details

In response to this:

/pipeline required
/verified by e2e

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@csrwng

csrwng commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jun 3, 2026
@openshift-merge-bot

Copy link
Copy Markdown
Contributor

Tests from second stage were triggered manually. Pipeline can be controlled only manually, until HEAD changes. Use command to trigger second stage.

@openshift-ci

openshift-ci Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

@cblecker: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot Bot merged commit 6902fe3 into openshift:main Jun 3, 2026
41 checks passed
@cblecker cblecker deleted the feat/v2-e2e-remaining-create-cluster-gaps branch June 3, 2026 13:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/platform/aws PR/issue for AWS (AWSPlatform) platform area/testing Indicates the PR includes changes for e2e testing jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. verified Signifies that the PR passed pre-merge verification criteria

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants