OCPBUGS-85677: avoid Progressing during node reboot pod recreation#3018
OCPBUGS-85677: avoid Progressing during node reboot pod recreation#3018jluhrsen wants to merge 1 commit into
Conversation
|
@jluhrsen: This pull request references Jira Issue OCPBUGS-85677, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository: openshift/coderabbit/.coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
WalkthroughThe ChangesDaemonSet Rollout Status Detection
Estimated code review effort🎯 2 (Simple) | ⏱️ ~5 minutes 🚥 Pre-merge checks | ✅ 15✅ Passed checks (15 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.12.2)level=error msg="Running error: context loading failed: failed to load packages: failed to load packages: failed to load with go/packages: err: exit status 1: stderr: go: inconsistent vendoring in :\n\tgithub.com/Masterminds/semver@v1.5.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/Masterminds/sprig/v3@v3.2.3: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/containernetworking/cni@v0.8.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/ghodss/yaml@v1.0.1-0.20190212211648-25d852aebe32: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/go-bindata/go-bindata@v3.1.2+incompatible: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/onsi/gomega@v1.39.1: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/ope ... [truncated 17357 characters] ... red in go.mod, but not marked as explicit in vendor/modules.txt\n\tk8s.io/gengo/v2@v2.0.0-20251215205346-5ee0d033ba5b: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tk8s.io/kms@v0.35.2: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tk8s.io/kube-aggregator@v0.35.1: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tsigs.k8s.io/randfill@v1.0.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tsigs.k8s.io/structured-merge-diff/v6@v6.3.2: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\n\tTo ignore the vendor directory, use -mod=readonly or -mod=mod.\n\tTo sync the vendor directory, run:\n\t\tgo mod vendor\n" Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: jluhrsen The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
/payload-aggregate periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-upgrade-ipsec 10 |
|
@jluhrsen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/9c86cb10-5dfb-11f1-81a6-9b75c8f028d7-0 |
|
/jira refresh |
|
@jluhrsen: This pull request references Jira Issue OCPBUGS-85677, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/controller/statusmanager/pod_status.go (1)
97-105:⚠️ Potential issue | 🟠 Major | ⚡ Quick winGuard is not applied to the main DaemonSet update-progress condition.
The new
Generation > ObservedGenerationcheck is only folded intodsRolloutActive, but Line 104 still marks progressing onUpdatedNumberScheduled < CurrentNumberScheduledunconditionally. That preserves the node-reboot false-positive path described in this PR.Suggested fix
- } else if ds.Status.UpdatedNumberScheduled < ds.Status.CurrentNumberScheduled { + } else if ds.Status.UpdatedNumberScheduled < ds.Status.CurrentNumberScheduled && ds.Generation > ds.Status.ObservedGeneration { progressing = append(progressing, fmt.Sprintf("DaemonSet %q update is rolling out (%d out of %d updated)", dsName.String(), ds.Status.UpdatedNumberScheduled, ds.Status.CurrentNumberScheduled)) dsProgressing = true } else if ds.Status.NumberUnavailable > 0 {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/controller/statusmanager/pod_status.go` around lines 97 - 105, The progressing branch incorrectly uses ds.Status.UpdatedNumberScheduled < ds.Status.CurrentNumberScheduled without respecting the rollout guard; instead use the previously computed boolean dsRolloutActive (which already includes generation > ObservedGeneration) when deciding whether to append the rolling-update message and set dsProgressing. Update the else-if condition that currently checks UpdatedNumberScheduled < CurrentNumberScheduled to require dsRolloutActive (e.g., dsRolloutActive && ds.Status.UpdatedNumberScheduled < ds.Status.CurrentNumberScheduled) so the generation/observedGeneration guard prevents false-positive node-reboot progress for DaemonSet updates.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@pkg/controller/statusmanager/pod_status.go`:
- Around line 97-105: The progressing branch incorrectly uses
ds.Status.UpdatedNumberScheduled < ds.Status.CurrentNumberScheduled without
respecting the rollout guard; instead use the previously computed boolean
dsRolloutActive (which already includes generation > ObservedGeneration) when
deciding whether to append the rolling-update message and set dsProgressing.
Update the else-if condition that currently checks UpdatedNumberScheduled <
CurrentNumberScheduled to require dsRolloutActive (e.g., dsRolloutActive &&
ds.Status.UpdatedNumberScheduled < ds.Status.CurrentNumberScheduled) so the
generation/observedGeneration guard prevents false-positive node-reboot progress
for DaemonSet updates.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: b3ae22f7-9d4a-4c06-a369-336eec3e1a73
📒 Files selected for processing (1)
pkg/controller/statusmanager/pod_status.go
The counter check (UpdatedNumberScheduled < CurrentNumberScheduled) correctly detects CNO-initiated rollouts but also triggers during node reboots when pods are recreated without any DaemonSet spec change. Require generation > observedGeneration to confirm CNO recently updated the DaemonSet, filtering out transient pod churn from node lifecycle events. Fixes false positives during MCO node reboots with ipsec machine configs. Signed-off-by Jamo Luhrsen <jluhrsen@gmail.com> Co-authored-by Claude Code <anthropic@noreply.com>
|
@jluhrsen: This pull request references Jira Issue OCPBUGS-85677, which is valid. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/payload-aggregate periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-upgrade-ipsec 10 |
|
@jluhrsen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/e577e9f0-5e03-11f1-9d4c-ab89f3f0ae9e-0 |
|
/test 4.22-upgrade-from-stable-4.21-e2e-aws-ovn-upgrade |
|
@jluhrsen: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
The counter check (UpdatedNumberScheduled < CurrentNumberScheduled) correctly detects CNO-initiated rollouts but also triggers during node reboots when pods are recreated without any DaemonSet spec change.
Require generation > observedGeneration to confirm CNO recently updated the DaemonSet, filtering out transient pod churn from node lifecycle events.
Fixes false positives during MCO node reboots with ipsec machine configs.
Summary by CodeRabbit