NO-JIRA: Extend timeout for CRD removal during integration tests #8366

Merged
enxebre merged 1 commit into openshift:main from JoelSpeed:extend-crd-remove-timeout
May 4, 2026

Conversation

@JoelSpeed

@JoelSpeed JoelSpeed commented Apr 29, 2026

Contributor

What this PR does / why we need it:

After adjusting the output and running this in CI, I could see that the failures were because the deletionTimestamp wasn't set. I've updated the logic to make this more robust: if it sees that the CRD isn't gone, it will attempt to delete it again.

Which issue(s) this PR fixes:

Fixes

Special notes for your reviewer:

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Summary by CodeRabbit

  • Tests
    • Improved CRD uninstallation verification: missing resources are treated as successful removal, polling now retries deletions until resources are confirmed removed, polling is more responsive (shorter interval), and failure messages are clearer and more descriptive.

@openshift-merge-bot

Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@coderabbitai

coderabbitai Bot commented Apr 29, 2026

Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Walkthrough

test/envtest/generator.go: GenerateCRDInstallTest now uninstalls CRDs with an explicit per-CRD loop calling k8sClient.Delete (treating NotFound as success) instead of envtest.UninstallCRDs. After initiating deletions, the verification uses an Eventually callback that returns (bool, error): it performs a Get for each CRD, treats apierrors.IsNotFound(err) as success, retries Delete if the CRD is still present, and only reports completion when the CRD is observed as NotFound. The polling step was changed from "1s" to "200ms"; the assertion message remains CRD %s should be fully removed.
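
The retry behaviour described in this walkthrough can be sketched without a cluster. The following is a minimal, stdlib-only illustration and not the code from test/envtest/generator.go: fakeStore, errNotFound, and waitRemoved are hypothetical stand-ins for the Kubernetes client, a NotFound API error, and the polling done by Eventually. It assumes a store that silently drops the first delete request, which imitates a delete that does not stick.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNotFound is a hypothetical stand-in for a Kubernetes NotFound API error.
var errNotFound = errors.New("not found")

// fakeStore is a hypothetical in-memory substitute for the API server.
// It ignores the first dropDeletes delete requests to imitate a delete
// that is accepted but has no effect.
type fakeStore struct {
	objects     map[string]bool
	dropDeletes int
	deleteCalls int
}

// Get returns errNotFound once the object is no longer stored.
func (s *fakeStore) Get(name string) error {
	if !s.objects[name] {
		return errNotFound
	}
	return nil
}

// Delete removes the object unless the request is one of the dropped ones.
func (s *fakeStore) Delete(name string) error {
	s.deleteCalls++
	if !s.objects[name] {
		return errNotFound
	}
	if s.dropDeletes > 0 {
		s.dropDeletes--
		return nil
	}
	delete(s.objects, name)
	return nil
}

// waitRemoved polls until Get reports not-found, re-issuing Delete
// whenever the object is still present. NotFound counts as success.
func waitRemoved(s *fakeStore, name string, timeout, interval time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		err := s.Get(name)
		if errors.Is(err, errNotFound) {
			return nil
		}
		if err != nil {
			return err
		}
		if derr := s.Delete(name); derr != nil && !errors.Is(derr, errNotFound) {
			return derr
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("object %s should be fully removed", name)
		}
		time.Sleep(interval)
	}
}

func main() {
	s := &fakeStore{objects: map[string]bool{"widgets.example.com": true}, dropDeletes: 1}
	_ = s.Delete("widgets.example.com") // the initial delete is dropped
	err := waitRemoved(s, "widgets.example.com", 2*time.Second, 10*time.Millisecond)
	fmt.Println("removed:", err == nil, "delete calls:", s.deleteCalls)
}
```

A longer timeout alone would not help in this scenario, because the object is never marked for deletion; re-issuing the delete inside the poll is what lets the wait converge.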


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 2 warnings, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Stable And Deterministic Test Names | ❌ Error | Test definitions at lines 230 and 175 include dynamic parameter values (featureSet, filename, gvr) formatted into test names using fmt.Sprintf(), violating stable and deterministic Ginkgo test title requirements. | Replace dynamic test titles with static descriptive names: change It(fmt.Sprintf("should install all CRDs for feature set %q", featureSet), ...) to It("should install all CRDs successfully", ...) and use static Describe names instead of suiteName containing dynamic values. |
| Title check | ⚠️ Warning | The title claims to extend timeout for CRD removal, but the actual changes replace the removal method entirely and modify polling logic, not just extend a timeout value. | Update title to reflect the main change: something like 'Replace CRD uninstall logic with explicit deletion loop' or 'Refactor CRD removal with per-CRD deletion and polling' |
| Test Structure And Quality | ⚠️ Warning | Test has critical compilation errors: unused variables on lines 265 and 270, variable redeclaration on line 276, and missing failure messages on assertions. | Remove unused key variables from lines 265 and 270, change line 276 to use assignment instead of redeclaration, add meaningful messages to WaitForCRDs and Delete assertions. |
| Ote Binary Stdout Contract | ❓ Inconclusive | The specific file test/envtest/generator.go cannot be located in the current repository state to verify OTE Binary Stdout Contract compliance. | Ensure the repository reflects the PR changes or provide the actual modified test/envtest/generator.go file content for stdout violation verification. |
✅ Passed checks (8 passed)
| Check name | Status | Explanation |
|---|---|---|
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Microshift Test Compatibility | ✅ Passed | No new Ginkgo e2e tests are added; only test infrastructure code for CRD setup/teardown is modified. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | No new Ginkgo e2e tests are introduced in this pull request. Changes are limited to utility function modifications in test/envtest/generator.go. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | Pull request modifies only test infrastructure code in test/envtest/generator.go for CRD cleanup; no topology-aware scheduling constraints introduced. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | Pull request modifies test infrastructure code (GenerateCRDInstallTest function) without introducing new Ginkgo e2e tests, making IPv6 and disconnected network compatibility check not applicable. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot requested review from devguyio and jparrill April 29, 2026 09:55
@openshift-ci openshift-ci Bot added the area/testing Indicates the PR includes changes for e2e testing label Apr 29, 2026
@openshift-ci

openshift-ci Bot commented Apr 29, 2026

Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: JoelSpeed
Once this PR has been reviewed and has the lgtm label, please assign muraee for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@codecov

codecov Bot commented Apr 29, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 36.71%. Comparing base (6e734a9) to head (7727f69).
⚠️ Report is 18 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #8366   +/-   ##
=======================================
  Coverage   36.71%   36.71%           
=======================================
  Files         768      768           
  Lines       93396    93396           
=======================================
  Hits        34286    34286           
  Misses      56426    56426           
  Partials     2684     2684           
Flag Coverage Δ
cmd-support 30.45% <ø> (ø)
cpo-hostedcontrolplane 37.19% <ø> (ø)
cpo-other 37.73% <ø> (ø)
hypershift-operator 47.84% <ø> (ø)
other 27.77% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@JoelSpeed JoelSpeed changed the title Extend timeout for CRD removal during integration tests NO-JIRA: Extend timeout for CRD removal during integration tests Apr 29, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 29, 2026
@openshift-ci-robot


@JoelSpeed: This pull request explicitly references no jira issue.

Details

In response to this:

What this PR does / why we need it:

This is regularly failing on integration tests in the GH actions runners. Extend to 60s

Which issue(s) this PR fixes:

Fixes

Special notes for your reviewer:

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Summary by CodeRabbit

  • Tests
  • Improved test reliability by increasing the timeout duration for Custom Resource Definition cleanup operations.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@enxebre

enxebre commented Apr 29, 2026

Member

I've hit timeout with 60s in the past, do we want to set 120?

@JoelSpeed

Contributor Author

Ack, extended to 120s

@jparrill jparrill left a comment

Contributor

Dropped some comments. Thanks! EnvTests are failing too :)

Comment thread test/envtest/generator.go Outdated
@enxebre

enxebre commented Apr 30, 2026

Member

failed again, 180?

@JoelSpeed

Contributor Author

Something tells me that the "delete" isn't sticking - trying to repro locally with some more output when it does fail

@JoelSpeed

Contributor Author

No luck reproducing locally, let's see if I can dump the CRD on failure and see what it says

@JoelSpeed JoelSpeed force-pushed the extend-crd-remove-timeout branch from d777ae5 to 42850bd Compare April 30, 2026 15:48

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/envtest/generator.go`:
- Around line 259-265: The CRD removal wait uses Eventually(func...) with a
hardcoded timeout "30s" which is too short; update the timeout argument in that
Eventually call (the one calling k8sClient.Get on
&apiextensionsv1.CustomResourceDefinition{} and referencing crd.Name,
crd.DeletionTimestamp, crd.Finalizers) to a longer value (e.g., "2m" or a shared
timeout constant) so the CRD cleanup wait path uses the extended timeout to
reduce CI flakes.
- Around line 260-264: The code calls k8sClient.Get(ctx, key,
&apiextensionsv1.CustomResourceDefinition{}) but discards the fetched object and
then logs stale fields from the original crd variable; also unexpected Get
errors are swallowed. Change the Get to populate a local variable (e.g., live :=
apiextensionsv1.CustomResourceDefinition{}; err := k8sClient.Get(ctx, key,
&live)), check if apierrors.IsNotFound(err) then return true,nil, otherwise if
err != nil return false, err, and when constructing the error message reference
live.DeletionTimestamp and live.Finalizers instead of
crd.DeletionTimestamp/crd.Finalizers so the diagnostic uses the live cluster
state.
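
The second finding above (read into a live object, treat NotFound as success, surface other errors, and report the live fields) can be sketched as follows. This is a stdlib-only illustration under stated assumptions, not the project's code: resource, getter, mapGetter, and checkRemoved are hypothetical names standing in for the CRD type, the client, and the polled callback.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical stand-in for a Kubernetes NotFound API error.
var errNotFound = errors.New("not found")

// resource is a hypothetical, trimmed-down stand-in for a CRD object.
type resource struct {
	Name              string
	DeletionTimestamp string // empty when deletion has not been requested
	Finalizers        []string
}

// getter is a hypothetical read interface; Get copies the stored state into out.
type getter interface {
	Get(name string, out *resource) error
}

// mapGetter is an in-memory getter used only for this illustration.
type mapGetter map[string]resource

func (m mapGetter) Get(name string, out *resource) error {
	r, ok := m[name]
	if !ok {
		return errNotFound
	}
	*out = r
	return nil
}

// checkRemoved returns (true, nil) once the object is gone, propagates
// unexpected read errors, and otherwise describes the live state that was
// just fetched rather than a stale copy held by the caller.
func checkRemoved(g getter, name string) (bool, error) {
	var live resource
	err := g.Get(name, &live)
	if errors.Is(err, errNotFound) {
		return true, nil
	}
	if err != nil {
		return false, err
	}
	return false, fmt.Errorf("CRD %s should be fully removed: DeletionTimestamp=%q, Finalizers=%v",
		live.Name, live.DeletionTimestamp, live.Finalizers)
}

func main() {
	g := mapGetter{
		"widgets.example.com": {Name: "widgets.example.com", Finalizers: []string{"example.com/cleanup"}},
	}
	_, err := checkRemoved(g, "widgets.example.com")
	fmt.Println(err)
	done, err := checkRemoved(g, "gone.example.com")
	fmt.Println(done, err)
}
```

Reporting the freshly fetched object is what makes the failure message useful for diagnosis: an empty deletion timestamp in the output indicates that the delete request never took effect.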
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: e938bd30-f4a3-4aa1-9750-51ca1277ffe9

📥 Commits

Reviewing files that changed from the base of the PR and between d777ae5 and 42850bd.

📒 Files selected for processing (1)
  • test/envtest/generator.go

Comment thread test/envtest/generator.go Outdated
Comment on lines 259 to 265
Eventually(func() (bool, error) {
err := k8sClient.Get(ctx, key, &apiextensionsv1.CustomResourceDefinition{})
return apierrors.IsNotFound(err)
if apierrors.IsNotFound(err) {
return true, nil
}
return false, fmt.Errorf("CRD %s should be fully removed: DeletionTimestamp=%v, Finalizers=%v", crd.Name, crd.DeletionTimestamp, crd.Finalizers)
}, "30s", "1s").Should(BeTrue(), fmt.Sprintf("CRD %s should be fully removed", crd.Name))

Contributor

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Timeout is still 30s in the CRD removal wait path.

Line 265 still uses "30s", so this path does not apply the timeout extension intended to reduce CI flakes during CRD cleanup.

Proposed fix
-			}, "30s", "1s").Should(BeTrue(), fmt.Sprintf("CRD %s should be fully removed", crd.Name))
+			}, "120s", "1s").Should(BeTrue(), fmt.Sprintf("CRD %s should be fully removed", crd.Name))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-Eventually(func() (bool, error) {
-	err := k8sClient.Get(ctx, key, &apiextensionsv1.CustomResourceDefinition{})
-	return apierrors.IsNotFound(err)
-	if apierrors.IsNotFound(err) {
-		return true, nil
-	}
-	return false, fmt.Errorf("CRD %s should be fully removed: DeletionTimestamp=%v, Finalizers=%v", crd.Name, crd.DeletionTimestamp, crd.Finalizers)
-}, "30s", "1s").Should(BeTrue(), fmt.Sprintf("CRD %s should be fully removed", crd.Name))
+Eventually(func() (bool, error) {
+	err := k8sClient.Get(ctx, key, &apiextensionsv1.CustomResourceDefinition{})
+	if apierrors.IsNotFound(err) {
+		return true, nil
+	}
+	return false, fmt.Errorf("CRD %s should be fully removed: DeletionTimestamp=%v, Finalizers=%v", crd.Name, crd.DeletionTimestamp, crd.Finalizers)
+}, "120s", "1s").Should(BeTrue(), fmt.Sprintf("CRD %s should be fully removed", crd.Name))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/envtest/generator.go` around lines 259 - 265, The CRD removal wait uses
Eventually(func...) with a hardcoded timeout "30s" which is too short; update
the timeout argument in that Eventually call (the one calling k8sClient.Get on
&apiextensionsv1.CustomResourceDefinition{} and referencing crd.Name,
crd.DeletionTimestamp, crd.Finalizers) to a longer value (e.g., "2m" or a shared
timeout constant) so the CRD cleanup wait path uses the extended timeout to
reduce CI flakes.

Comment thread test/envtest/generator.go Outdated
@JoelSpeed JoelSpeed force-pushed the extend-crd-remove-timeout branch from 42850bd to c715f3f Compare April 30, 2026 16:57

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/envtest/generator.go`:
- Around line 261-270: The issue is a redeclaration of err in the Eventually
closure: err is declared with err := k8sClient.Get(ctx, key, ...) and then
redeclared with err := k8sClient.Delete(ctx, crd), which causes a "no new
variables" compile error; change the second short declaration to an assignment
(err = k8sClient.Delete(ctx, crd)) so both k8sClient.Get and k8sClient.Delete
use the same err variable within the closure (referencing ctx, key, crd,
k8sClient, and apiextensionsv1.CustomResourceDefinition to locate the code).
- Around line 254-257: Remove the unused key variable and fix the err
redeclaration: in the first loop that iterates crds (where
client.ObjectKeyFromObject is called and key is assigned), delete the
unnecessary key assignment so the variable is not declared if it's not used; in
the later anonymous function loop where err is first declared with := and then
redeclared, change the second declaration to assignment (use = instead of :=) or
rename the second variable so you don't use := on an existing identifier—adjust
the statements around k8sClient.Delete, Expect(...).To(SatisfyAny(...)), and any
calls that set err accordingly to keep the scope correct.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: b1960f57-0425-4ed9-9558-5c0fdebf9be2

📥 Commits

Reviewing files that changed from the base of the PR and between 42850bd and c715f3f.

📒 Files selected for processing (1)
  • test/envtest/generator.go

Comment thread test/envtest/generator.go
Comment thread test/envtest/generator.go Outdated
@hypershift-jira-solve-ci


I now have all the evidence needed. Here is the complete analysis:

Test Failure Analysis Complete

Job Information

  • Prow Job: pull-ci-openshift-hypershift-main-verify-workflows
  • Build ID: 2049895918852902912
  • PR: #8366 — NO-JIRA: Extend timeout for CRD removal during integration tests
  • Changed file: test/envtest/generator.go (+19, -5)
  • All 13 failing jobs share the same root cause (Go compilation errors in test/envtest/generator.go)

Test Failure Analysis

Error

test/envtest/generator.go:255:4: declared and not used: key
test/envtest/generator.go:269:9: no new variables on left side of :=
FAIL    github.com/openshift/hypershift/test/envtest [build failed]

Summary

All 13 jobs (6 Envtest OCP, 5 Envtest Vanilla Kube, and 2 Conclusion aggregators) fail because the PR introduces two Go compilation errors in test/envtest/generator.go. The first for loop (lines 254-256) declares a key variable via client.ObjectKeyFromObject(crd) that is never used — Go treats unused variables as a compile error. The second bug is inside the Eventually closure (line 269) where err is first declared with := on line 263 via k8sClient.Get(...), and then a second err := short variable declaration is attempted on line 269 via k8sClient.Delete(...) — Go rejects this because err already exists in that scope and := requires at least one new variable on the left side. The verify-workflows Prow job fails for a separate reason: the branch is outdated relative to main (stale workflow YAML files).

Root Cause

The PR modifies test/envtest/generator.go in the GenerateCRDInstallTest function and introduces two Go compilation errors:

Error 1 — Unused variable key (line 255):
The new first for loop at lines 254-256 declares key := client.ObjectKeyFromObject(crd) but never references key. Go enforces that all declared variables must be used, so this is a compile error. The key variable is only needed in the second for loop (which already declares its own key), so the declaration in the first loop is extraneous.

Error 2 — Duplicate short variable declaration err := (line 269):
Inside the Eventually closure in the second for loop, err is first declared at line 263 with err := k8sClient.Get(...). Then at line 269, err := k8sClient.Delete(...) attempts another short declaration of the same variable within the same scope. Go's := operator requires at least one new variable on the left-hand side, but err already exists — so the compiler rejects this. The fix is to use = (assignment) instead of := on line 269.

Verify-workflows failure (separate cause):
The verify-workflows Prow job fails because the branch has diverged from main and three workflow YAML files (.github/workflows/envtest-kube.yaml, envtest-ocp.yaml, test.yaml) are outdated. The verify script detects this and exits with: "Rebase your branch on main".

Recommendations
  1. Fix the unused variable: Remove the key := client.ObjectKeyFromObject(crd) line from the first for loop (line 255). It is not referenced in that loop body.

  2. Fix the duplicate declaration: On line 269, change err := k8sClient.Delete(ctx, crd) to err = k8sClient.Delete(ctx, crd) (use = instead of :=) since err was already declared on line 263.

  3. Rebase on main: Run git fetch upstream main && git rebase upstream/main to pick up the updated workflow YAML files and resolve the verify-workflows failure.

  4. Consider the timeout: The PR title mentions extending the timeout to 120s, but the code still uses "30s". If the original intent was a longer timeout, update the Eventually timeout parameter accordingly.
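
The two compiler errors and the fix from recommendation 2 can be reproduced in isolation. The sketch below is illustrative only: get, del, and cleanup are hypothetical helpers, not functions from test/envtest/generator.go. It shows the compiling form, with comments marking where the rejected statements would sit.

```go
package main

import (
	"errors"
	"fmt"
)

// get and del are hypothetical stand-ins for k8sClient.Get and k8sClient.Delete.
func get() error { return errors.New("still present") }
func del() error { return nil }

// cleanup mirrors the shape of the closure: err is declared once with :=
// and then reused with plain assignment.
func cleanup() error {
	// key := "widgets.example.com"
	// Declaring key here without using it would fail with
	// "declared and not used: key".

	err := get()
	if err == nil {
		return nil
	}

	// err := del()
	// The line above would fail with "no new variables on left side of :="
	// because err already exists in this scope. Assignment compiles:
	err = del()
	return err
}

func main() {
	fmt.Println(cleanup())
}
```

Note that := is still accepted when at least one name on the left-hand side is new in the current scope, or when the statement opens a new scope such as an if initializer; the error only arises when every name on the left already exists in the same scope.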

Evidence
| Evidence | Detail |
|---|---|
| Compiler error 1 | test/envtest/generator.go:255:4: declared and not used: key — identical across all 11 envtest jobs |
| Compiler error 2 | test/envtest/generator.go:269:9: no new variables on left side of := — identical across all 11 envtest jobs |
| Build result | FAIL github.com/openshift/hypershift/test/envtest [build failed] — code never compiles, no tests run |
| Affected jobs (OCP) | K8s 1.30.3, 1.31.2, 1.32.1, 1.33.2, 1.34.1, 1.35.1 — all fail with identical errors |
| Affected jobs (Kube) | K8s 1.31.0, 1.32.0, 1.33.0, 1.34.0, 1.35.0 — all fail with identical errors |
| Conclusion jobs | Both OCP and Kube Conclusion jobs fail because they aggregate envtest results: "Envtest jobs failed: failure" |
| verify-workflows | Fails with OUTDATED: .github/workflows/envtest-kube.yaml has been updated on main since this branch diverged (plus 2 more workflow files) |
| PR diff | Single file changed: test/envtest/generator.go (+19, -5) — both errors are in newly added code |
| PR commit | c715f3f31567da6162ec64e6197001aa702e82cd |

@JoelSpeed JoelSpeed force-pushed the extend-crd-remove-timeout branch 2 times, most recently from 6e58283 to 87e367b Compare April 30, 2026 21:48
@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 1, 2026
The existing solution appeared to not successfully be deleting every CRD.
This time, check again and retry the delete if the object hasn't gone.
@JoelSpeed JoelSpeed force-pushed the extend-crd-remove-timeout branch from 87e367b to 7727f69 Compare May 1, 2026 10:33
@openshift-ci openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 1, 2026
@openshift-ci

openshift-ci Bot commented May 1, 2026

Contributor

@JoelSpeed: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/verify-workflows | 87e367b | link | true | /test verify-workflows |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@enxebre enxebre merged commit 29053b7 into openshift:main May 4, 2026
39 of 40 checks passed

Labels

area/testing Indicates the PR includes changes for e2e testing jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

4 participants