KEP-5598: Extend opportunistic batching with rescoring#6039
romanbaron wants to merge 1 commit into kubernetes:master
Hi @romanbaron. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test.
singh1203 left a comment
Hey @romanbaron, thanks for including me. I went through the full diff carefully. Overall, the rescoring design is clean and well written. I just had one question, and thanks for answering it. 🙇
Overall, it looks good to me.
#### NormalizeScore on a Subset

To support rescoring, `batchState` also caches the raw score for each node in the cached list. When rescoring, only the rescored node's raw score is updated; the rest remain valid under the node-local scoring assumption.

`NormalizeScore` is applied to the cached node subset rather than all feasible nodes. When the cache is fresh, meaning the cached list contains exactly the nodes a fresh full pipeline would filter to, rescoring node A and normalizing all cached nodes produces identical results to a full pipeline run. The raw scores of all other nodes are unchanged (node-local scoring), node A's raw score is updated by the explicit `Score` call, and `NormalizeScore` sees the same set with the same scores.
I'm curious about this part: the raw score for the other nodes stays unchanged, but the normalized scores can still shift when lastChosenNode is rescored because NormalizeScore is applied across the cached node list. Is the correctness argument that this still matches the fresh, full-pipeline result as long as the cached set is complete and fresh?
Yes, NormalizeScore scales everything relative to the min/max values, so if lastChosenNode's raw score changes, normalized scores can shift. It is fine as long as the cache is complete and fresh.
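To make the shift concrete, here is a minimal Go sketch, assuming a plain min/max normalizer. `minMaxNormalize`, the node names, and the raw scores are hypothetical and only illustrate the argument; they are not the scheduler's actual `NormalizeScore` implementation or the `batchState` type from the KEP.

```go
package main

import "fmt"

// minMaxNormalize is a hypothetical stand-in for a plugin's NormalizeScore.
// It rescales raw scores linearly so that the lowest raw score maps to 0
// and the highest maps to maxScore.
func minMaxNormalize(raw map[string]int64, maxScore int64) map[string]int64 {
	out := make(map[string]int64, len(raw))
	if len(raw) == 0 {
		return out
	}

	// Find the minimum and maximum raw score in the set.
	first := true
	var lo, hi int64
	for _, s := range raw {
		if first {
			lo, hi = s, s
			first = false
			continue
		}
		if s < lo {
			lo = s
		}
		if s > hi {
			hi = s
		}
	}

	for node, s := range raw {
		if hi == lo {
			// All raw scores are equal; give every node the same value.
			out[node] = maxScore
			continue
		}
		out[node] = (s - lo) * maxScore / (hi - lo)
	}
	return out
}

func main() {
	// Cached raw scores from the previous scheduling cycle (made-up values).
	cached := map[string]int64{"node-a": 90, "node-b": 50, "node-c": 10}
	fmt.Println("before rescoring:", minMaxNormalize(cached, 100))

	// Rescore only the last chosen node; the other raw scores are reused.
	cached["node-a"] = 30
	fmt.Println("after rescoring: ", minMaxNormalize(cached, 100))
}
```

In this sketch node-b's raw score stays at 50, but its normalized score moves from 50 to 100 once node-a's raw score drops from 90 to 30. A full pipeline run over the same three nodes would feed the same raw scores into the same function, so it would produce the same normalized values; that is the sense in which the result matches when the cached set is complete and fresh.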
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: romanbaron, singh1203
Add rescoring to handle multi-pod-per-node workloads: when the last chosen node remains feasible, rescore it in-place and continue batching rather than flushing the cache.
AI tooling was used to assist in preparing this PR. All changes have been reviewed and verified by the author.