KEP-5598: Extend opportunistic batching with rescoring by romanbaron · Pull Request #6039 · kubernetes/enhancements

romanbaron · 2026-04-29T08:45:06Z

One-line PR description:
Add rescoring to handle multi-pod-per-node workloads: when the last chosen node remains feasible, rescore it in-place and continue batching rather than flushing the cache.

Issue link: [Scheduling] OpportunisticBatching: redundant synchronous RunFilterPlugins call in batchStateCompatible for low-resource pods kubernetes#137707

Other comments:
AI tooling was used to assist in preparing this PR. All changes have been reviewed and verified by the author.

k8s-ci-robot · 2026-04-29T08:45:17Z

Hi @romanbaron. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Tip

We noticed you've done this a few times! Consider joining the org to skip this step and gain /lgtm and other bot rights. We recommend asking approvers on your previous PRs to sponsor you.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

singh1203

Hey @romanbaron, thanks for including me. I went through the full diff carefully, and the rescoring design is clean and well written. I had just one question; thanks for answering it. 🙇
Overall, it looks good to me.

singh1203 · 2026-04-29T18:03:20Z

+#### NormalizeScore on a Subset
+
+To support rescoring, `batchState` also caches the raw score for each node in the cached list. When rescoring, only the rescored node's raw score is updated; the rest remain valid under the node-local scoring assumption.
+
+`NormalizeScore` is applied to the cached node subset rather than all feasible nodes. When the cache is fresh, meaning the cached list contains exactly the nodes a fresh full pipeline would filter to, rescoring node A and normalizing all cached nodes produces identical results to a full pipeline run. The raw scores of all other nodes are unchanged (node-local scoring), node A's raw score is updated by the explicit `Score` call, and `NormalizeScore` sees the same set with the same scores.
+


I'm curious about this part: the raw scores for the other nodes stay unchanged, but the normalized scores can still shift when `lastChosenNode` is rescored, because `NormalizeScore` is applied across the cached node list. Is the correctness argument that this still matches the fresh, full-pipeline result as long as the cached set is complete and fresh?

romanbaron · Apr 29, 2026

Yes, `NormalizeScore` scales everything relative to the min/max values, so if the raw score of `lastChosenNode` changes, the normalized scores of the other nodes can shift too. That is fine as long as the cache is complete and fresh.
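To make the argument in this thread concrete, here is a minimal Go sketch, not the scheduler's actual code. It assumes a simple min/max normalization to the range 0..100; real plugins each define their own `NormalizeScore`, so this is only one possible shape. The names `normalizeMinMax` and `rescoreAndNormalize`, the node names, and the raw scores are all hypothetical and chosen for illustration.

```go
package main

import "fmt"

// normalizeMinMax is a hypothetical stand-in for a plugin's NormalizeScore.
// It rescales raw scores to 0..100 relative to the min and max of the set
// it is given (an assumption; real plugins may normalize differently).
func normalizeMinMax(raw map[string]int64) map[string]int64 {
	out := make(map[string]int64, len(raw))
	first := true
	var lo, hi int64
	for _, s := range raw {
		if first || s < lo {
			lo = s
		}
		if first || s > hi {
			hi = s
		}
		first = false
	}
	for node, s := range raw {
		if hi == lo {
			out[node] = 100 // all nodes tie; arbitrary choice for this sketch
			continue
		}
		out[node] = (s - lo) * 100 / (hi - lo)
	}
	return out
}

// rescoreAndNormalize updates only the rescored node's cached raw score,
// then normalizes across the whole cached set.
func rescoreAndNormalize(cachedRaw map[string]int64, node string, newRaw int64) map[string]int64 {
	cachedRaw[node] = newRaw
	return normalizeMinMax(cachedRaw)
}

func main() {
	// Cached raw scores from the previous scheduling cycle (made-up values).
	cachedRaw := map[string]int64{"node-a": 80, "node-b": 50, "node-c": 20}
	fmt.Println("before rescoring:", normalizeMinMax(cachedRaw))

	// What a fresh full pipeline would compute after a pod landed on node-a,
	// assuming node-local scoring (only node-a's raw score changes).
	fresh := map[string]int64{"node-a": 30, "node-b": 50, "node-c": 20}

	fmt.Println("rescored cache:  ", rescoreAndNormalize(cachedRaw, "node-a", 30))
	fmt.Println("full pipeline:   ", normalizeMinMax(fresh))
}
```

In this sketch the rescored cache and the full pipeline agree (node-a 33, node-b 100, node-c 0), while node-b's normalized score moves from 50 to 100 even though its raw score never changed. The agreement depends on the cached set containing exactly the nodes the full pipeline would keep; dropping or adding a node changes the min/max and therefore the normalized values.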

k8s-ci-robot · 2026-04-29T18:08:30Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: romanbaron, singh1203
Once this PR has been reviewed and has the lgtm label, please assign macsko for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

keps/sig-scheduling/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

KEP-5598: Rescoring

afbce01

k8s-ci-robot added the cncf-cla: yes label (Indicates the PR's author has signed the CNCF CLA.) Apr 29, 2026

k8s-ci-robot requested a review from dom4ha April 29, 2026 08:45

k8s-ci-robot added the kind/kep label (Categorizes KEP tracking issues and PRs modifying the KEP directory) Apr 29, 2026

k8s-ci-robot requested a review from macsko April 29, 2026 08:45

k8s-ci-robot added the sig/scheduling label (Categorizes an issue or PR as relevant to SIG Scheduling.) Apr 29, 2026

github-project-automation Bot added this to SIG Scheduling Apr 29, 2026

k8s-ci-robot added the needs-ok-to-test label (Indicates a PR that requires an org member to verify it is safe to test.) Apr 29, 2026

github-project-automation Bot moved this to Needs Triage in SIG Scheduling Apr 29, 2026

k8s-ci-robot added the size/L label (Denotes a PR that changes 100-499 lines, ignoring generated files.) Apr 29, 2026

singh1203 approved these changes Apr 29, 2026

romanbaron wants to merge 1 commit into kubernetes:master from romanbaron:opportunistic-batching-rescore
