GCP-688: Port remaining v1 TestCreateCluster gaps to v2 e2e#8590
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
@cblecker: This pull request explicitly references no jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Skipping CI for Draft Pull Request. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds multiple e2ev2 Ginkgo test suites for HostedCluster: CPO override image, ingress operator configuration, konnectivity-agent node communication, AWS-specific tests (default worker security group tagging and CCM/NLB security-group checks), and Global Pull Secret lifecycle tests. Updates AGENTS.md with v2 framework guidance and registers a new optional env var E2E_ADDITIONAL_PULL_SECRET_FILE. Sequence Diagram(s)sequenceDiagram
participant TestRunner
participant KubernetesAPI
participant ELBV2
participant EC2
TestRunner->>KubernetesAPI: create NLB Service + annotations
KubernetesAPI->>ELBV2: controller provisions load balancer (hostname observed)
TestRunner->>ELBV2: describe load balancer by name
ELBV2->>EC2: return attached security groups
TestRunner->>EC2: verify non-empty security groups
Suggested reviewers
Important Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional. ❌ Failed checks (1 error)
✅ Passed checks (10 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
/test e2e-v2-aws e2e-v2-gke |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cblecker The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #8590 +/- ##
==========================================
+ Coverage 40.53% 41.27% +0.73%
==========================================
Files 755 755
Lines 93235 93446 +211
==========================================
+ Hits 37794 38566 +772
+ Misses 52740 52148 -592
- Partials 2701 2732 +31 see 50 files with indirect coverage changes
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 9
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/e2e/v2/tests/hosted_cluster_aws_test.go`:
- Around line 48-49: The two E2E specs that call external AWS APIs (the It block
with description "should apply the tag to the default worker security group via
AWS API" and the other spec around lines 147-148) must be marked as requiring
external connectivity; add the [Skipped:Disconnected] label to those tests
(e.g., modify the Label("AWS") call or add Label("Skipped:Disconnected")
alongside it) so the test harness will skip them when disconnected, ensuring
tests that call IAM/EC2/ELBv2 are properly flagged.
- Around line 94-100: The test appends a day-2 AWS tag to
hc.Spec.Platform.AWS.ResourceTags but never restores the original slice, leaking
cluster-scoped state; capture the original tags before calling
e2eutil.UpdateObject (e.g. origTags := append([]hyperv1.AWSResourceTag(nil),
hc.Spec.Platform.AWS.ResourceTags...)) and add a cleanup that uses
e2eutil.UpdateObject in an AfterEach/Defer to set
hc.Spec.Platform.AWS.ResourceTags = origTags (or remove the specific day2 tag)
so the HostedCluster resource is restored; reference the HostedCluster variable
(hc), the field Spec.Platform.AWS.ResourceTags, and the e2eutil.UpdateObject
call when making the change.
In `@test/e2e/v2/tests/hosted_cluster_cpo_test.go`:
- Around line 48-50: The test currently calls Skip when version == "" which
hides parsing regressions; change the behavior to fail the test instead by
replacing the Skip call with a test failure (e.g., use t.Fatalf or FailNow with
a clear message) so that when the parsed version is empty we fail with context
including releaseImage and the unexpected empty version; locate the block that
checks version == "" (the one referencing version and releaseImage) and replace
Skip(fmt.Sprintf(...)) with t.Fatalf("could not extract version from release
image %s: parsed version is empty", releaseImage) or an equivalent failure
helper used in the test suite.
In `@test/e2e/v2/tests/hosted_cluster_ingress_test.go`:
- Around line 53-58: The test currently asserts hardcoded values
(operatorv1.LoadBalancerServiceStrategyType and operatorv1.InternalLoadBalancer)
instead of validating the IngressController mirrors the HostedCluster strategy;
update the assertions to compare
IngressController.Spec.EndpointPublishingStrategy to the expectedStrategy object
used earlier (e.g., replace checks against expectedStrategy.Type and
expectedStrategy.LoadBalancer.Scope with a direct equality/assertion between
IngressController.Spec.EndpointPublishingStrategy and expectedStrategy or
compare their Type and LoadBalancer fields to expectedStrategy.Type and
expectedStrategy.LoadBalancer to ensure the controller reflects the configured
strategy); apply the same change to the later block that repeats these hardcoded
assertions.
In `@test/e2e/v2/tests/hosted_cluster_node_communication_test.go`:
- Around line 47-58: The test currently always reads logs from podList.Items[0],
which is flaky because pod ordering is not stable; change the selection logic
inside the Eventually block (where podList is retrieved and GetLogs is called)
to iterate over podList.Items and pick a healthy konnectivity-agent pod (e.g.,
check pod.Status.Conditions for Ready==True or attempt GetLogs in a loop) and
only call GetLogs on the first pod that succeeds; update the GetLogs usage (the
call around GetLogs(...).DoRaw(tc.Context)) to try each pod name from the range
and assert success with g.Expect once a pod returns logs, failing only if none
succeed within the Eventually timeout.
In `@test/e2e/v2/tests/hosted_cluster_pull_secret_test.go`:
- Around line 99-100: Register the cleanup immediately after creating the secret
so it is removed even if the test fails: right after the
createAdditionalPullSecret(tc, hcClient, dummyPullSecretData) call, call
Ginkgo's DeferCleanup (or your test-framework cleanup helper) to delete the
secret (e.g., DeferCleanup(func() { deleteAdditionalPullSecret(tc, hcClient,
"additional-pull-secret") })); keep or remove the later cleanup at lines 139-145
as appropriate, but ensure the immediate DeferCleanup is present so the created
secret is always removed on test exit.
- Around line 232-238: The loop over nodePool.Status.Conditions must first
assert that a NodePoolUpdatingConfigConditionType entry exists before checking
its Status; modify the code that iterates nodePool.Status.Conditions (the block
referencing NodePoolUpdatingConfigConditionType) to record whether the condition
was found (e.g., found := false), set found = true when cond.Type ==
NodePoolUpdatingConfigConditionType, then after the loop add an
Expect(found).To(BeTrue(), "expected NodePoolUpdatingConfigConditionType to be
present") and only then validate Expect(string(cond.Status)).To(Equal(...)) for
the matching condition. Ensure you reference the same condition variable used in
the loop when asserting its Status.
- Around line 82-87: The test currently picks the first NodePool from
npList.Items which is nondeterministic; instead, iterate npList.Items to select
a deterministic NodePool that meets the test preconditions (e.g., Spec.Replicas
!= nil and Spec.Management.UpgradeType != hyperv1.UpgradeTypeInPlace), then if
multiple candidates exist sort them by a stable key like NodePool.Name and pick
the first; update references to np (currently used after selecting
npList.Items[0]) to use the chosen candidate. Ensure you check np.Spec.Replicas
and np.Spec.Management.UpgradeType on the selected NodePool and call Skip only
if no valid candidate is found.
- Around line 275-277: The test currently returns early when
hcClient.Create(tc.Context, secret) yields apierrors.IsAlreadyExists(err), which
leaves stale secret data; instead, on AlreadyExists you should fetch the
existing Secret (use hcClient.Get with the secret's NamespacedName), compare the
relevant fields (data/annotations/labels as used in the test), and if they
differ, update the existing Secret (set fields and use hcClient.Update or a
strategic patch ensuring ResourceVersion is preserved) so the secret content is
reconciled to the expected `secret` value; keep using tc.Context and the
original `secret` object for the desired state and handle other errors normally.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 03ff844d-17df-4497-9441-3b77b338c5b2
📒 Files selected for processing (5)
test/e2e/v2/tests/hosted_cluster_aws_test.gotest/e2e/v2/tests/hosted_cluster_cpo_test.gotest/e2e/v2/tests/hosted_cluster_ingress_test.gotest/e2e/v2/tests/hosted_cluster_node_communication_test.gotest/e2e/v2/tests/hosted_cluster_pull_secret_test.go
|
@cblecker: This pull request references GCP-688 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/e2e/v2/tests/hosted_cluster_aws_test.go`:
- Around line 101-111: DeferCleanup currently filters out tags by key and
ignores errors, which can drop pre-existing tags and mask failures; before
mutating obj.Spec.Platform.AWS.ResourceTags capture the original slice (e.g.
origTags := make([]hyperv1.AWSResourceTag, len(...)) copy(...)) and in the
DeferCleanup use e2eutil.UpdateObject to restore the exact original slice back
to obj.Spec.Platform.AWS.ResourceTags; check the returned error and treat
apierrors.IsNotFound(err) as acceptable but fail the test (return or call
GinkgoTB().Fatalf/Fail) for any other non-nil error so cleanup failures are
surfaced; reference functions/types: DeferCleanup, e2eutil.UpdateObject,
hyperv1.HostedCluster, ResourceTags, day2TagKey, and apierrors.IsNotFound.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 2bda1482-1b2e-4a72-936e-e54b68f465d3
📒 Files selected for processing (5)
test/e2e/v2/AGENTS.mdtest/e2e/v2/tests/hosted_cluster_aws_test.gotest/e2e/v2/tests/hosted_cluster_ingress_test.gotest/e2e/v2/tests/hosted_cluster_node_communication_test.gotest/e2e/v2/tests/hosted_cluster_pull_secret_test.go
✅ Files skipped from review due to trivial changes (1)
- test/e2e/v2/AGENTS.md
Add 6 new v2 e2e tests covering functionality previously only tested in v1 TestCreateCluster: - EnsureNodeCommunication: verifies konnectivity-agent pods exist and logs are retrievable (non-lifecycle) - ValidateIngressOperatorConfiguration: validates EndpointPublishingStrategy propagation to hosted cluster IngressController (non-lifecycle) - VerifyCPOOverrideImage: checks CPO pod runs with expected override image for the platform and version (non-lifecycle) - EnsureDefaultSecurityGroupTags: applies day-2 resource tags to default worker security group via AWS API (lifecycle, AWS) - EnsureAWSCCMWithCustomizations: validates NLBSecurityGroupMode=Managed config and managed security groups on NLB (lifecycle, AWS) - EnsureGlobalPullSecret: full pull secret lifecycle including propagation, DaemonSet readiness, and cleanup (lifecycle) Assisted-by: Claude:claude-opus-4-6
92fcb28 to
2173eee
Compare
There was a problem hiding this comment.
Actionable comments posted: 5
♻️ Duplicate comments (3)
test/e2e/v2/tests/hosted_cluster_ingress_test.go (1)
52-58: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winHardcoded strategy assertions limit test generality.
Lines 72-79 were updated to compare against
expectedStrategy, but these assertions still hardcodeLoadBalancerServiceStrategyTypeandInternalLoadBalancer. The test's contract is to verify that the IngressController mirrors the HostedCluster configuration, yet these lines artificially restrict the test to one strategy type.As per coding guidelines, when extracting a value from one resource to validate against another, use the extracted variable in assertions — not hardcoded enum constants.
♻️ Recommended change
- expectedStrategy := hc.Spec.OperatorConfiguration.IngressOperator.EndpointPublishingStrategy - Expect(expectedStrategy.Type).To(Equal(operatorv1.LoadBalancerServiceStrategyType), - "EndpointPublishingStrategy type should be LoadBalancerService") - Expect(expectedStrategy.LoadBalancer).NotTo(BeNil(), - "LoadBalancer configuration should be set") - Expect(expectedStrategy.LoadBalancer.Scope).To(Equal(operatorv1.InternalLoadBalancer), - "LoadBalancer scope should be Internal") + expectedStrategy := hc.Spec.OperatorConfiguration.IngressOperator.EndpointPublishingStrategy + Expect(expectedStrategy).NotTo(BeNil(), + "EndpointPublishingStrategy should be configured on HostedCluster")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/v2/tests/hosted_cluster_ingress_test.go` around lines 52 - 58, The assertions currently hardcode LoadBalancerServiceStrategyType and InternalLoadBalancer instead of using the extracted expectedStrategy; update the three Expect checks to compare the IngressController's observed strategy fields against expectedStrategy (the value extracted from hc.Spec.OperatorConfiguration.IngressOperator.EndpointPublishingStrategy) so the test validates mirroring behavior generically rather than enforcing a specific enum constant.test/e2e/v2/tests/hosted_cluster_aws_test.go (2)
94-112:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRestore AWS
ResourceTagsexactly instead of filtering by key.Current cleanup removes by key only, which can drop pre-existing tags sharing that key. Capture the original slice before mutation and restore it verbatim; allow
IsNotFoundduring cleanup.As per coding guidelines: "When a test mutates cluster state, capture the original state before mutation and defer restoration... In cleanup functions, check `apierrors.IsNotFound()` to handle cases where the resource was already deleted."Proposed fix
import ( "fmt" "strings" "time" @@ corev1 "k8s.io/api/core/v1" + apierrors "k8s.io/apimachinery/pkg/api/errors" metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" @@ day2TagKey := "test-day2-tag" day2TagValue := "test-day2-value" + originalTags := append([]hyperv1.AWSResourceTag(nil), hc.Spec.Platform.AWS.ResourceTags...) err = e2eutil.UpdateObject(GinkgoTB(), tc.Context, tc.MgmtClient, hc, func(obj *hyperv1.HostedCluster) { obj.Spec.Platform.AWS.ResourceTags = append(obj.Spec.Platform.AWS.ResourceTags, hyperv1.AWSResourceTag{ @@ DeferCleanup(func() { err := e2eutil.UpdateObject(GinkgoTB(), tc.Context, tc.MgmtClient, hc, func(obj *hyperv1.HostedCluster) { - filtered := make([]hyperv1.AWSResourceTag, 0, len(obj.Spec.Platform.AWS.ResourceTags)) - for _, tag := range obj.Spec.Platform.AWS.ResourceTags { - if tag.Key != day2TagKey { - filtered = append(filtered, tag) - } - } - obj.Spec.Platform.AWS.ResourceTags = filtered + obj.Spec.Platform.AWS.ResourceTags = append([]hyperv1.AWSResourceTag(nil), originalTags...) }) - Expect(err).NotTo(HaveOccurred(), "cleanup: failed to remove day-2 tag from HostedCluster") + if err != nil && !apierrors.IsNotFound(err) { + Expect(err).NotTo(HaveOccurred(), "cleanup: failed to restore HostedCluster AWS resource tags") + } })🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/v2/tests/hosted_cluster_aws_test.go` around lines 94 - 112, The cleanup currently filters ResourceTags by key which can drop pre-existing tags; before calling e2eutil.UpdateObject to append the day2 tag in the HostedCluster (obj.Spec.Platform.AWS.ResourceTags), capture the original slice (e.g., origTags := append([]hyperv1.AWSResourceTag(nil), hc.Spec.Platform.AWS.ResourceTags...)) and in DeferCleanup restore that exact slice verbatim via e2eutil.UpdateObject (setting obj.Spec.Platform.AWS.ResourceTags = origTags); also make the cleanup tolerant to the resource being deleted by allowing apierrors.IsNotFound() when checking the returned error from UpdateObject.
48-49:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMark external AWS API specs as disconnected-incompatible.
Both specs call external AWS APIs and should include
Label("Skipped:Disconnected")so disconnected jobs can filter them.As per coding guidelines: "Flag tests requiring external connectivity ... Add [Skipped:Disconnected] label if external connectivity is required and cannot be adapted."Proposed fix
- It("should apply the tag to the default worker security group via AWS API", Label("AWS"), func() { + It("should apply the tag to the default worker security group via AWS API", Label("AWS", "Skipped:Disconnected"), func() { ... - It("should attach managed security groups to the NLB", func() { + It("should attach managed security groups to the NLB", Label("AWS", "CCM", "Skipped:Disconnected"), func() {Also applies to: 159-160
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/v2/tests/hosted_cluster_aws_test.go` around lines 48 - 49, The test It block that currently uses Label("AWS") (the "should apply the tag to the default worker security group via AWS API" test) invokes external AWS APIs and must be marked as disconnected-incompatible; add Label("Skipped:Disconnected") alongside Label("AWS") on that It call to allow CI to skip when offline, and do the same for the other It block referenced around the later section (the other AWS API test). Locate the It(...) calls (the test name strings and the Label("AWS") usages) and add Label("Skipped:Disconnected") to each.
🧹 Nitpick comments (4)
test/e2e/v2/tests/hosted_cluster_node_communication_test.go (2)
32-36: ⚡ Quick winAdd exported function comments with fail-loud behavior details.
RegisterNodeCommunicationTestsandEnsureNodeCommunicationTestare exported but undocumented. In this package, comments should describe actual behavior and mention panic/fail-loud expectations.Suggested patch
+// RegisterNodeCommunicationTests registers hosted-cluster node-communication e2e specs. func RegisterNodeCommunicationTests(getTestCtx internal.TestContextGetter) { EnsureNodeCommunicationTest(getTestCtx) } +// EnsureNodeCommunicationTest verifies konnectivity-agent pod presence and log retrievability. +// It fails loudly via Gomega expectations when hosted-cluster client setup or log retrieval fails. func EnsureNodeCommunicationTest(getTestCtx internal.TestContextGetter) { When("hosted cluster has konnectivity tunnel configured", func() {As per coding guidelines: “Comments on exported functions must describe actual behavior including panic conditions, not just intended behavior.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/v2/tests/hosted_cluster_node_communication_test.go` around lines 32 - 36, Add Go doc comments for the exported functions RegisterNodeCommunicationTests and EnsureNodeCommunicationTest describing their actual behavior and explicitly stating fail‑loud/panic conditions (e.g., when a nil TestContextGetter is passed or when underlying asserts/fatal checks will call t.Fatal/panic), so callers know these functions may terminate the test run; place the comments immediately above each function declaration and follow godoc style (brief summary sentence starting with the function name, followed by details about behavior and panic/fail‑loud expectations).
38-38: ⚡ Quick winLabel the
Itblock for test filtering consistency.The top-level
Describeis labeled, but thisItis not. Add a stable label onItas well to keep filtering behavior consistent with v2 guidance.Suggested patch
- It("should have konnectivity-agent pods with retrievable logs", func() { + It("should have konnectivity-agent pods with retrievable logs", Label("hosted-cluster-node-communication"), func() {As per coding guidelines: “Apply labels to
DescribeandItblocks for test filtering.”🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/v2/tests/hosted_cluster_node_communication_test.go` at line 38, The It block "should have konnectivity-agent pods with retrievable logs" is missing a Ginkgo label; update the It invocation to include a stable Label(...) argument (e.g. It("should have konnectivity-agent pods with retrievable logs", Label("hosted-cluster-node-communication"), func() { ... })) so the test can be filtered consistently with the top-level Describe; modify the It call in hosted_cluster_node_communication_test.go to add the Label(...) option next to the test description.test/e2e/v2/tests/hosted_cluster_cpo_test.go (2)
34-38: ⚡ Quick winAdd doc comments for exported test registration functions.
Lines 34 and 38 export functions without comments describing actual behavior (including skip/failure behavior in the registered flow).
Suggested update
+// RegisterHostedClusterCPOTests registers Hosted Cluster CPO verification tests. func RegisterHostedClusterCPOTests(getTestCtx internal.TestContextGetter) { VerifyCPOOverrideImageTest(getTestCtx) } +// VerifyCPOOverrideImageTest verifies that a configured CPO override image +// is used by a running control-plane-operator pod. The test skips when no +// applicable override is configured for the platform/version. func VerifyCPOOverrideImageTest(getTestCtx internal.TestContextGetter) {As per coding guidelines:
test/e2e/v2/**/*.go: “Comments on exported functions must describe actual behavior including panic conditions, not just intended behavior.”🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/v2/tests/hosted_cluster_cpo_test.go` around lines 34 - 38, Add descriptive doc comments above the exported functions RegisterHostedClusterCPOTests and VerifyCPOOverrideImageTest that explain their actual runtime behavior: what tests they register, any side effects, conditions that cause the test to be skipped, expected failures or panic conditions, and how the registered flow behaves on error (e.g., whether it calls t.Skip or t.Fatal). Mention any inputs (getTestCtx) and what state or prerequisites are required for successful execution so the comments satisfy the guideline that exported test functions describe actual behavior including skip/failure behavior.
40-40: ⚡ Quick winAdd a label on the
Itblock for filterability.Line 40 should include a stable
Label(...)so this case can be targeted consistently in filtered runs.Suggested update
- It("should run the control-plane-operator pod with the expected override image", func() { + It("should run the control-plane-operator pod with the expected override image", Label("hosted-cluster-cpo"), func() {As per coding guidelines:
test/e2e/v2/**/*_test.go: “Apply labels toDescribeandItblocks for test filtering.”🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/v2/tests/hosted_cluster_cpo_test.go` at line 40, Add a stable Label to the Ginkgo It block that reads "should run the control-plane-operator pod with the expected override image" so the test can be filtered reliably; update the It declaration for that test case to include a Label argument (e.g., "control-plane-operator-override-image" or similar stable identifier) before the final func() closure so the test is selectable via label-based filtering.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/e2e/v2/tests/hosted_cluster_aws_test.go`:
- Line 140: This test It("should have NLBSecurityGroupMode=Managed in the
aws-cloud-config ConfigMap") is missing a Ginkgo label used by the suite's
filter; add the same Label(...) used by other tests in this file (e.g.,
Label("aws") or the suite-specific label) to the It declaration so it becomes
It("...", Label("..."), func() { ... }), ensuring consistent label-based
filtering with the rest of the suite.
- Around line 173-176: After creating the namespace with
hcClient.Create(tc.Context, testNS), replace the anonymous defer with an
immediate tc.DeferCleanup registration that calls hcClient.Delete(tc.Context,
testNS) and checks the returned error; if the delete error is non-nil and not
apierrors.IsNotFound(err), surface it (e.g., via Expect/Fail) instead of
silently ignoring it. Reference the create call (hcClient.Create), the cleanup
call (hcClient.Delete), the test context (tc.Context), and use
apierrors.IsNotFound to ignore already-deleted resources when implementing the
cleanup.
In `@test/e2e/v2/tests/hosted_cluster_cpo_test.go`:
- Around line 68-81: The current loop picks the first runningPod and then
asserts on container index 0, which is nondeterministic; change the search to
set a found boolean while scanning podList.Items for pods with Status.Phase ==
corev1.PodRunning, assert found is true, and then instead of inspecting
runningPod.Spec.Containers[0] only, check all containers of each running pod to
ensure at least one running control-plane-operator pod contains a container
whose Image equals expectedImage (use pod.Name and expectedImage in failure
messages); update the assertion logic accordingly so it tolerates
rollouts/sidecars and follows the required found-flag pattern.
In `@test/e2e/v2/tests/hosted_cluster_pull_secret_test.go`:
- Around line 200-210: The cleanup currently swallows errors: in the
DeferCleanup closure, replace the blind return on Get error with explicit
handling using apierrors.IsNotFound(err) to return only when the secret truly
doesn't exist, and for any other Get error fail loudly (e.g., tc.T.Fatalf or
equivalent). Likewise, check the error returned by tc.MgmtClient.Update and fail
the test immediately on update failure instead of ignoring it. Target the
DeferCleanup closure and the calls to tc.MgmtClient.Get and tc.MgmtClient.Update
and use apierrors.IsNotFound to distinguish not-found from real errors.
- Around line 82-93: The current selection of a NodePool only checks for
Spec.Replicas != nil but allows zero replicas; update the selection logic to
require a positive replica count so tests exercise node-level behavior: when
iterating npList.Items in the hosted_cluster_pull_secret_test, change the
candidate filter to require candidate.Spec.Replicas != nil &&
*candidate.Spec.Replicas > 0 (and then set nodeCount = *np.Spec.Replicas); also
add an upfront assertion that npList.Items is non-empty before the loop and keep
the Skip with a clear message if no suitable NodePool (non-InPlace upgrade and
replicas > 0) is found.
---
Duplicate comments:
In `@test/e2e/v2/tests/hosted_cluster_aws_test.go`:
- Around line 94-112: The cleanup currently filters ResourceTags by key which
can drop pre-existing tags; before calling e2eutil.UpdateObject to append the
day2 tag in the HostedCluster (obj.Spec.Platform.AWS.ResourceTags), capture the
original slice (e.g., origTags := append([]hyperv1.AWSResourceTag(nil),
hc.Spec.Platform.AWS.ResourceTags...)) and in DeferCleanup restore that exact
slice verbatim via e2eutil.UpdateObject (setting
obj.Spec.Platform.AWS.ResourceTags = origTags); also make the cleanup tolerant
to the resource being deleted by allowing apierrors.IsNotFound() when checking
the returned error from UpdateObject.
- Around line 48-49: The test It block that currently uses Label("AWS") (the
"should apply the tag to the default worker security group via AWS API" test)
invokes external AWS APIs and must be marked as disconnected-incompatible; add
Label("Skipped:Disconnected") alongside Label("AWS") on that It call to allow CI
to skip when offline, and do the same for the other It block referenced around
the later section (the other AWS API test). Locate the It(...) calls (the test
name strings and the Label("AWS") usages) and add Label("Skipped:Disconnected")
to each.
In `@test/e2e/v2/tests/hosted_cluster_ingress_test.go`:
- Around line 52-58: The assertions currently hardcode
LoadBalancerServiceStrategyType and InternalLoadBalancer instead of using the
extracted expectedStrategy; update the three Expect checks to compare the
IngressController's observed strategy fields against expectedStrategy (the value
extracted from
hc.Spec.OperatorConfiguration.IngressOperator.EndpointPublishingStrategy) so the
test validates mirroring behavior generically rather than enforcing a specific
enum constant.
---
Nitpick comments:
In `@test/e2e/v2/tests/hosted_cluster_cpo_test.go`:
- Around line 34-38: Add descriptive doc comments above the exported functions
RegisterHostedClusterCPOTests and VerifyCPOOverrideImageTest that explain their
actual runtime behavior: what tests they register, any side effects, conditions
that cause the test to be skipped, expected failures or panic conditions, and
how the registered flow behaves on error (e.g., whether it calls t.Skip or
t.Fatal). Mention any inputs (getTestCtx) and what state or prerequisites are
required for successful execution so the comments satisfy the guideline that
exported test functions describe actual behavior including skip/failure
behavior.
- Line 40: Add a stable Label to the Ginkgo It block that reads "should run the
control-plane-operator pod with the expected override image" so the test can be
filtered reliably; update the It declaration for that test case to include a
Label argument (e.g., "control-plane-operator-override-image" or similar stable
identifier) before the final func() closure so the test is selectable via
label-based filtering.
In `@test/e2e/v2/tests/hosted_cluster_node_communication_test.go`:
- Around line 32-36: Add Go doc comments for the exported functions
RegisterNodeCommunicationTests and EnsureNodeCommunicationTest describing their
actual behavior and explicitly stating fail‑loud/panic conditions (e.g., when a
nil TestContextGetter is passed or when underlying asserts/fatal checks will
call t.Fatal/panic), so callers know these functions may terminate the test run;
place the comments immediately above each function declaration and follow godoc
style (brief summary sentence starting with the function name, followed by
details about behavior and panic/fail‑loud expectations).
- Line 38: The It block "should have konnectivity-agent pods with retrievable
logs" is missing a Ginkgo label; update the It invocation to include a stable
Label(...) argument (e.g. It("should have konnectivity-agent pods with
retrievable logs", Label("hosted-cluster-node-communication"), func() { ... }))
so the test can be filtered consistently with the top-level Describe; modify the
It call in hosted_cluster_node_communication_test.go to add the Label(...)
option next to the test description.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: b2ae6600-9319-46e5-b48a-c0ec45220efc
📒 Files selected for processing (6)
test/e2e/v2/AGENTS.mdtest/e2e/v2/tests/hosted_cluster_aws_test.gotest/e2e/v2/tests/hosted_cluster_cpo_test.gotest/e2e/v2/tests/hosted_cluster_ingress_test.gotest/e2e/v2/tests/hosted_cluster_node_communication_test.gotest/e2e/v2/tests/hosted_cluster_pull_secret_test.go
✅ Files skipped from review due to trivial changes (1)
- test/e2e/v2/AGENTS.md
- Add DeferCleanup to remove day-2 tag after AWS security group test, capturing original tags before mutation and restoring them exactly - Use dynamic expectedStrategy values in ingress assertions - Iterate all konnectivity-agent pods for log retrieval - Select suitable NodePool by upgrade type with replicas > 0 - Add DeferCleanup for additional-pull-secret creation - Add vacuous pass guard for UpdatingConfig condition search - Reconcile existing secret on AlreadyExists instead of silent return - Search CPO pods by container name instead of nondeterministic index - Fail on unparseable release image version instead of skipping - Use DeferCleanup with IsNotFound guard for namespace cleanup - Surface pull-secret restore failures in cleanup instead of swallowing - Add Standards 17-18, amend Standard 9 and 16 in AGENTS.md Assisted-by: Claude:claude-opus-4-6
2173eee to
29c70cb
Compare
|
/pipeline required |
|
Scheduling tests matching the |
| g.Expect(ds.Status.DesiredNumberScheduled).To(BeNumerically(">=", nodeCount)) | ||
| g.Expect(ds.Status.NumberReady).To(Equal(ds.Status.DesiredNumberScheduled)) | ||
| }, 20*time.Minute, 10*time.Second).Should(Succeed()) | ||
| }) |
There was a problem hiding this comment.
question: The v1 EnsureGlobalPullSecret includes three validation steps that are not ported here:
VerifyKubeletConfigWithDaemonSet(called twice in v1) - deploys a verifier DaemonSet that mounts the host filesystem and compares on-disk/var/lib/kubelet/config.jsonagainst the cluster pull secret.- Negative pod pull test - creates a pod with unauthorized creds and asserts image pull fails.
- Positive pod pull test - creates a pod after applying valid creds and asserts image pull succeeds.
Together these prove the global pull secret pipeline works at the node level, not just at the Kubernetes object level. Was the omission intentional, or should these be added as a follow-up?
There was a problem hiding this comment.
The omission was intentional. The v2 test validates the full Kubernetes API-level pipeline: management-cluster secret propagation, GlobalPullSecret creation/update/deletion, and DaemonSet readiness (ovnkube-node, global-pull-secret-syncer, konnectivity-agent). The DaemonSet readiness checks serve as implicit proof that images can be pulled and pods scheduled successfully.
The three omitted v1 steps — VerifyKubeletConfigWithDaemonSet (on-disk kubelet config.json verification), negative pod pull test, and positive pod pull test — are defense-in-depth at the node/filesystem and container-runtime levels. They're valuable but represent a separate concern from the controller pipeline being validated here.
Move E2E_ADDITIONAL_PULL_SECRET_FILE registration to the central env var catalog (Standard openshift#8) and replace defer with DeferCleanup for Ginkgo abort/timeout safety. Assisted-by: Claude:claude-sonnet-4-6
|
/retest |
|
/pipeline required |
|
Scheduling tests matching the |
|
@cblecker: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/retest |
1 similar comment
|
/retest |
|
I now have all the evidence needed. Here is the final analysis: Test Failure Analysis CompleteJob Information
Test Failure AnalysisErrorSummaryThe Root CauseThe root cause is a transient unavailability of the The ci-operator configuration for When the job ran on
This means the ImageStream mirrors on Evidence that this is transient: other runs of the same job on PR #8590, occurring both before and after this run, completed successfully (multiple Recommendations
Evidence
|
|
/retest |
csrwng
left a comment
There was a problem hiding this comment.
Just one comment, otherwise lgtm
| } | ||
|
|
||
| expectedStrategy := hc.Spec.OperatorConfiguration.IngressOperator.EndpointPublishingStrategy | ||
| Expect(expectedStrategy.Type).To(Equal(operatorv1.LoadBalancerServiceStrategyType), |
There was a problem hiding this comment.
I would consider these pre-conditions and not necessarily test failures. If the strategy is not LoadBalancer or LoadBalancer does not have the internal scope, then skip the test.
There was a problem hiding this comment.
Good catch. Removed them entirely rather than converting to Skip — these were artifacts of the v1 lifecycle model where the test itself injected that exact configuration via a BeforeApply hook. In v2 the test runs against a pre-existing cluster it doesn't control, so they were both redundant and unnecessarily restrictive. The Eventually block already validates propagation dynamically using expectedStrategy, so the test now works for any configured strategy type.
The v2 ingress test runs against a pre-existing cluster it does not control, unlike the v1 test which explicitly set LoadBalancerService/ Internal scope via a BeforeApply hook. The hardcoded Expect assertions for strategy type and LoadBalancer scope were artifacts of that v1 lifecycle model and unnecessarily restricted test applicability. The Eventually block already validates propagation dynamically using the extracted expectedStrategy values, so the assertions were redundant in addition to being restrictive. Assisted-by: Claude:claude-opus-4-6
|
/pipeline required |
|
Scheduling tests matching the |
|
@cblecker: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/lgtm |
|
Tests from second stage were triggered manually. Pipeline can be controlled only manually, until HEAD changes. Use command to trigger second stage. |
|
@cblecker: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Summary
Ports the remaining 6 v1
TestCreateClustervalidation gaps to v2 e2e tests, following up on PR #8511 which ported 14 validations.Non-lifecycle tests (read-only verification, no cluster mutation):
hosted_cluster_node_communication_test.go): Verifies konnectivity-agent pods exist in the hosted cluster and that pod logs are retrievable, confirming the konnectivity tunnel workshosted_cluster_ingress_test.go): Platform-agnostic test that validatesEndpointPublishingStrategyconfigured on the HostedCluster is reflected in the hosted cluster's IngressController (replaces Azure-only version with platform-agnostic coverage, requires 4.21+)hosted_cluster_cpo_test.go): Checks that when a CPO override image is configured for the platform/version, the running control-plane-operator pod uses that image (self-skips when no override is configured)Lifecycle tests (mutate cluster state with cleanup):
hosted_cluster_aws_test.go): AWS-only, adds a day-2 tag to the HostedCluster spec and verifies it's applied to the default worker security group via AWS API (requires 4.20+)hosted_cluster_aws_test.go): AWS-only, validates NLBSecurityGroupMode=Managed in aws-cloud-config ConfigMap and creates a LoadBalancer service to verify managed security groups are attached to the NLB (requires 4.23+)hosted_cluster_pull_secret_test.go): Full pull secret lifecycle — creates additional pull secret, verifies propagation through GlobalPullSecret pipeline, validates DaemonSet readiness, verifies in-place management-cluster pull secret propagation without rollout, and confirms cleanup on deletion (AWS/Azure, requires 4.19+)Test plan
make e2ev2compiles successfullymake lintpasses with 0 issuesSummary by CodeRabbit