NO-JIRA: Extend timeout for CRD removal during integration tests#8366
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughtest/envtest/generator.go: GenerateCRDInstallTest now uninstalls CRDs with an explicit per-CRD loop calling k8sClient.Delete (treating NotFound as success) instead of envtest.UninstallCRDs. After initiating deletions, the verification uses an Eventually callback that returns (bool, error): it performs a Get for each CRD, treats apierrors.IsNotFound(err) as success, retries Delete if the CRD is still present, and only reports completion when the CRD is observed as NotFound. The polling step was changed from "1s" to "200ms"; the assertion message remains Important Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional. ❌ Failed checks (1 error, 2 warnings, 1 inconclusive)
✅ Passed checks (8 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: JoelSpeed The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #8366 +/- ##
=======================================
Coverage 36.71% 36.71%
=======================================
Files 768 768
Lines 93396 93396
=======================================
Hits 34286 34286
Misses 56426 56426
Partials 2684 2684
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
|
@JoelSpeed: This pull request explicitly references no jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
I've hit timeout with 60s in the past, do we want to set 120? |
2caf3db to
4e36e26
Compare
|
Ack, extended to 120s |
jparrill
left a comment
There was a problem hiding this comment.
Dropped some comments . Thanks! EnvTests are failing too :)
|
failed again, 180? |
|
Something tells me that the "delete" isn't sticking - trying to repro locally with some more output when it does fail |
|
No luck reproducing locally, lets see if I can dump the CRD on failure and see what it says |
d777ae5 to
42850bd
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/envtest/generator.go`:
- Around line 259-265: The CRD removal wait uses Eventually(func...) with a
hardcoded timeout "30s" which is too short; update the timeout argument in that
Eventually call (the one calling k8sClient.Get on
&apiextensionsv1.CustomResourceDefinition{} and referencing crd.Name,
crd.DeletionTimestamp, crd.Finalizers) to a longer value (e.g., "2m" or a shared
timeout constant) so the CRD cleanup wait path uses the extended timeout to
reduce CI flakes.
- Around line 260-264: The code calls k8sClient.Get(ctx, key,
&apiextensionsv1.CustomResourceDefinition{}) but discards the fetched object and
then logs stale fields from the original crd variable; also unexpected Get
errors are swallowed. Change the Get to populate a local variable (e.g., live :=
apiextensionsv1.CustomResourceDefinition{}; err := k8sClient.Get(ctx, key,
&live)), check if apierrors.IsNotFound(err) then return true,nil, otherwise if
err != nil return false, err, and when constructing the error message reference
live.DeletionTimestamp and live.Finalizers instead of
crd.DeletionTimestamp/crd.Finalizers so the diagnostic uses the live cluster
state.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: e938bd30-f4a3-4aa1-9750-51ca1277ffe9
📒 Files selected for processing (1)
test/envtest/generator.go
| Eventually(func() (bool, error) { | ||
| err := k8sClient.Get(ctx, key, &apiextensionsv1.CustomResourceDefinition{}) | ||
| return apierrors.IsNotFound(err) | ||
| if apierrors.IsNotFound(err) { | ||
| return true, nil | ||
| } | ||
| return false, fmt.Errorf("CRD %s should be fully removed: DeletionTimestamp=%v, Finalizers=%v", crd.Name, crd.DeletionTimestamp, crd.Finalizers) | ||
| }, "30s", "1s").Should(BeTrue(), fmt.Sprintf("CRD %s should be fully removed", crd.Name)) |
There was a problem hiding this comment.
Timeout is still 30s in the CRD removal wait path.
Line 265 still uses "30s", so this path does not apply the timeout extension intended to reduce CI flakes during CRD cleanup.
Proposed fix
- }, "30s", "1s").Should(BeTrue(), fmt.Sprintf("CRD %s should be fully removed", crd.Name))
+ }, "120s", "1s").Should(BeTrue(), fmt.Sprintf("CRD %s should be fully removed", crd.Name))📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| Eventually(func() (bool, error) { | |
| err := k8sClient.Get(ctx, key, &apiextensionsv1.CustomResourceDefinition{}) | |
| return apierrors.IsNotFound(err) | |
| if apierrors.IsNotFound(err) { | |
| return true, nil | |
| } | |
| return false, fmt.Errorf("CRD %s should be fully removed: DeletionTimestamp=%v, Finalizers=%v", crd.Name, crd.DeletionTimestamp, crd.Finalizers) | |
| }, "30s", "1s").Should(BeTrue(), fmt.Sprintf("CRD %s should be fully removed", crd.Name)) | |
| Eventually(func() (bool, error) { | |
| err := k8sClient.Get(ctx, key, &apiextensionsv1.CustomResourceDefinition{}) | |
| if apierrors.IsNotFound(err) { | |
| return true, nil | |
| } | |
| return false, fmt.Errorf("CRD %s should be fully removed: DeletionTimestamp=%v, Finalizers=%v", crd.Name, crd.DeletionTimestamp, crd.Finalizers) | |
| }, "120s", "1s").Should(BeTrue(), fmt.Sprintf("CRD %s should be fully removed", crd.Name)) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/envtest/generator.go` around lines 259 - 265, The CRD removal wait uses
Eventually(func...) with a hardcoded timeout "30s" which is too short; update
the timeout argument in that Eventually call (the one calling k8sClient.Get on
&apiextensionsv1.CustomResourceDefinition{} and referencing crd.Name,
crd.DeletionTimestamp, crd.Finalizers) to a longer value (e.g., "2m" or a shared
timeout constant) so the CRD cleanup wait path uses the extended timeout to
reduce CI flakes.
42850bd to
c715f3f
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/envtest/generator.go`:
- Around line 261-270: The issue is a redeclaration of err in the Eventually
closure: err is declared with err := k8sClient.Get(ctx, key, ...) and then
redeclared with err := k8sClient.Delete(ctx, crd), which causes a "no new
variables" compile error; change the second short declaration to an assignment
(err = k8sClient.Delete(ctx, crd)) so both k8sClient.Get and k8sClient.Delete
use the same err variable within the closure (referencing ctx, key, crd,
k8sClient, and apiextensionsv1.CustomResourceDefinition to locate the code).
- Around line 254-257: Remove the unused key variable and fix the err
redeclaration: in the first loop that iterates crds (where
client.ObjectKeyFromObject is called and key is assigned), delete the
unnecessary key assignment so the variable is not declared if it's not used; in
the later anonymous function loop where err is first declared with := and then
redeclared, change the second declaration to assignment (use = instead of :=) or
rename the second variable so you don't use := on an existing identifier—adjust
the statements around k8sClient.Delete, Expect(...).To(SatisfyAny(...)), and any
calls that set err accordingly to keep the scope correct.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: b1960f57-0425-4ed9-9558-5c0fdebf9be2
📒 Files selected for processing (1)
test/envtest/generator.go
|
I now have all the evidence needed. Here is the complete analysis: Test Failure Analysis CompleteJob Information
Test Failure AnalysisErrorSummaryAll 13 jobs (6 Envtest OCP, 5 Envtest Vanilla Kube, and 2 Conclusion aggregators) fail because the PR introduces two Go compilation errors in Root CauseThe PR modifies Error 1 — Unused variable Error 2 — Duplicate short variable declaration Verify-workflows failure (separate cause): Recommendations
Evidence
|
6e58283 to
87e367b
Compare
The existing solution appeared to not successfully be deleting every CRD. This time, check again and retry the delete if the object hasn't gone.
87e367b to
7727f69
Compare
|
@JoelSpeed: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
What this PR does / why we need it:
After adjusting the output and running this in CI, I could see that the failures were because the deletiontimestamp wasn't set. I've updated the logic to make this more robust - if it sees that the CRD isn't gone, it will attempt to delete it again.
Which issue(s) this PR fixes:
Fixes
Special notes for your reviewer:
Checklist:
Summary by CodeRabbit