NVIDIA-882: DPU host: add configurable mgmt-port-resource-count #3024
tsorya wants to merge 1 commit into
No actionable comments were generated in the recent review.
Walkthrough
Reads an optional mgmt-port-resource-count from hardware-offload-config (default 1 when a resource name exists), adds MgmtPortResourceCount to bootstrap/render data, updates the managed and self-hosted ovnkube-node templates to use that count for MgmtPortResourceName when OVN_NODE_MODE is dpu-host or smart-nic, and adds tests to verify the rendered DaemonSet resources.
Changes
Management Port Resource Count Configuration
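The parsing and defaulting behavior described in the walkthrough can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the type `mgmtPortConfig`, the function `parseMgmtPortConfig`, and the resource name `example.com/mgmt-vf` are hypothetical, and the ConfigMap is modeled as a plain `map[string]string`. Only the two key names and the error wording come from the review.

```go
package main

import (
	"fmt"
	"strconv"
)

// mgmtPortConfig is a hypothetical stand-in for the fields carried in the
// bootstrap/render data (MgmtPortResourceName, MgmtPortResourceCount).
type mgmtPortConfig struct {
	ResourceName  string
	ResourceCount int64
}

// parseMgmtPortConfig reads both keys from ConfigMap-style data.
// The count defaults to 1 when a resource name is set, and an explicit
// mgmt-port-resource-count must parse as an integer greater than zero.
func parseMgmtPortConfig(data map[string]string) (mgmtPortConfig, error) {
	cfg := mgmtPortConfig{ResourceName: data["mgmt-port-resource-name"]}
	if cfg.ResourceName != "" {
		cfg.ResourceCount = 1 // default when only the name is configured
	}
	raw, ok := data["mgmt-port-resource-count"]
	if !ok {
		return cfg, nil
	}
	count, err := strconv.ParseInt(raw, 10, 64)
	if err != nil {
		return mgmtPortConfig{}, fmt.Errorf("invalid mgmt-port-resource-count value %q: %w", raw, err)
	}
	if count <= 0 {
		return mgmtPortConfig{}, fmt.Errorf("invalid mgmt-port-resource-count value %q: must be > 0", raw)
	}
	cfg.ResourceCount = count
	return cfg, nil
}

func main() {
	cfg, err := parseMgmtPortConfig(map[string]string{
		"mgmt-port-resource-name":  "example.com/mgmt-vf", // hypothetical name
		"mgmt-port-resource-count": "8",
	})
	fmt.Println(cfg.ResourceName, cfg.ResourceCount, err)
}
```

Returning the zero value on error keeps a caller from accidentally rendering a half-validated configuration.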
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.
🔧 golangci-lint (2.12.2)
level=error msg="Running error: context loading failed: failed to load packages: failed to load packages: failed to load with go/packages: err: exit status 1: stderr: go: inconsistent vendoring in :
    github.com/Masterminds/semver@v1.5.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt
    github.com/Masterminds/sprig/v3@v3.2.3: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt
    github.com/containernetworking/cni@v0.8.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt
    github.com/ghodss/yaml@v1.0.1-0.20190212211648-25d852aebe32: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt
    github.com/go-bindata/go-bindata@v3.1.2+incompatible: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt
    github.com/onsi/gomega@v1.39.1: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt
    github.com/ope ... [truncated 17357 characters] ... red in go.mod, but not marked as explicit in vendor/modules.txt
    k8s.io/gengo/v2@v2.0.0-20251215205346-5ee0d033ba5b: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt
    k8s.io/kms@v0.35.2: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt
    k8s.io/kube-aggregator@v0.35.1: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt
    sigs.k8s.io/randfill@v1.0.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt
    sigs.k8s.io/structured-merge-diff/v6@v6.3.2: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt

    To ignore the vendor directory, use -mod=readonly or -mod=mod.
    To sync the vendor directory, run:
        go mod vendor
"
@tsorya: This pull request references NVIDIA-396 which is a valid jira issue.
Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "5.0.0" version, but no target version was set.
🧹 Nitpick comments (1)
pkg/network/ovn_kubernetes.go (1)
1074-1084: ⚡ Quick win
Consider logging a warning when count is set without resource name.
The validation correctly rejects invalid count values, but if mgmt-port-resource-count is configured without mgmt-port-resource-name, the count is validated but never used (templates check MgmtPortResourceName first). Adding a warning log would help admins debug misconfigurations, consistent with similar warnings elsewhere in this function (e.g., lines 1051, 1058, 1065).
📝 Optional: add warning when count is orphaned
 mgmtPortResourceCount, exists := cm.Data["mgmt-port-resource-count"]
 if exists {
+    if ovnConfigResult.MgmtPortResourceName == "" {
+        klog.Warningf("mgmt-port-resource-count is set but mgmt-port-resource-name is not; count will be ignored")
+    }
     count, err := strconv.ParseInt(mgmtPortResourceCount, 10, 64)
     if err != nil {
         return nil, fmt.Errorf("invalid mgmt-port-resource-count value %q: %w", mgmtPortResourceCount, err)
     }
     if count <= 0 {
         return nil, fmt.Errorf("invalid mgmt-port-resource-count value %q: must be > 0", mgmtPortResourceCount)
     }
     ovnConfigResult.MgmtPortResourceCount = count
 }
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@pkg/network/ovn_kubernetes.go`:
- Around line 1074-1084: The code currently parses mgmt-port-resource-count into
ovnConfigResult.MgmtPortResourceCount but doesn't warn if
mgmt-port-resource-name is missing, causing the count to be ignored by template
logic that checks MgmtPortResourceName first; update the same parsing block that
sets ovnConfigResult.MgmtPortResourceCount (referencing
mgmt-port-resource-count, mgmt-port-resource-name and ovnConfigResult) to emit a
warning via the existing logger (same style as the warnings at the earlier
checks around lines handling mgmt-port-resource-name) when
mgmt-port-resource-count is present but the corresponding
mgmt-port-resource-name is empty or unset so admins are alerted to the orphaned
count.
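The orphaned-count check requested in this nitpick could be isolated as a small predicate. The sketch below is illustrative only: `orphanedCountWarning` is a hypothetical helper, the ConfigMap is modeled as a plain `map[string]string`, and the message is returned as a string so the caller can pass it to whatever logger it uses (the review suggests `klog.Warningf`). The message text is taken from the suggested diff.

```go
package main

import "fmt"

// orphanedCountWarning returns a non-empty message when
// mgmt-port-resource-count is present but mgmt-port-resource-name is empty
// or unset, which is the case where the count is ignored by the templates.
func orphanedCountWarning(data map[string]string) string {
	_, hasCount := data["mgmt-port-resource-count"]
	if hasCount && data["mgmt-port-resource-name"] == "" {
		return "mgmt-port-resource-count is set but mgmt-port-resource-name is not; count will be ignored"
	}
	return ""
}

func main() {
	// Count configured without a name: the helper reports the misconfiguration.
	if msg := orphanedCountWarning(map[string]string{"mgmt-port-resource-count": "8"}); msg != "" {
		fmt.Println("WARNING:", msg)
	}
}
```

Keeping the check separate from parsing means the count is still validated even when it is orphaned, which matches the behavior the review describes.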
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 0663a776-bcb8-44c8-9ea1-a629841b0cc8
📒 Files selected for processing (5)
- bindata/network/ovn-kubernetes/managed/ovnkube-node.yaml
- bindata/network/ovn-kubernetes/self-hosted/ovnkube-node.yaml
- pkg/bootstrap/types.go
- pkg/network/ovn_kubernetes.go
- pkg/network/ovn_kubernetes_dpu_host_test.go
@tsorya: This pull request references NVIDIA-882 which is a valid jira issue.
Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/network/ovn_kubernetes_dpu_host_test.go`:
- Around line 554-558: In the negative branch of the test (the else for
expectResource false) add an assertion to also verify that
ovnkubeController.Resources.Limits does not contain the resource key when
tc.mgmtPortResourceName != ""; locate the block that currently checks
ovnkubeController.Resources.Requests and mirror that check for Resources.Limits
(e.g., look up resourceName in ovnkubeController.Resources.Limits and
g.Expect(found).To(BeFalse(), "resource limit should not be set")) so the
template contract asserts both Requests and Limits are unset.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: be40fc2c-3836-4044-bfe5-959ee9c45cff
📒 Files selected for processing (5)
- bindata/network/ovn-kubernetes/managed/ovnkube-node.yaml
- bindata/network/ovn-kubernetes/self-hosted/ovnkube-node.yaml
- pkg/bootstrap/types.go
- pkg/network/ovn_kubernetes.go
- pkg/network/ovn_kubernetes_dpu_host_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/bootstrap/types.go
- pkg/network/ovn_kubernetes.go
} else {
    if tc.mgmtPortResourceName != "" {
        _, found := ovnkubeController.Resources.Requests[resourceName]
        g.Expect(found).To(BeFalse(), "resource request should not be set")
    }
Assert limits absence in the negative path too.
When expectResource is false, the test only verifies Resources.Requests is unset; it should also verify Resources.Limits is unset to fully lock the template contract.
Suggested patch
 } else {
     if tc.mgmtPortResourceName != "" {
         _, found := ovnkubeController.Resources.Requests[resourceName]
         g.Expect(found).To(BeFalse(), "resource request should not be set")
+        _, found = ovnkubeController.Resources.Limits[resourceName]
+        g.Expect(found).To(BeFalse(), "resource limit should not be set")
     }
 }
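The point of the comment above is that the negative path should check both maps, not just Requests. A self-contained sketch of that check follows. It deliberately avoids the Kubernetes and Gomega packages the real test uses: `resourceList` and `resourceRequirements` are simplified, hypothetical stand-ins for `corev1.ResourceList` and `corev1.ResourceRequirements`, `resourceUnset` is a hypothetical helper, and `example.com/mgmt-vf` is a made-up resource name.

```go
package main

import "fmt"

// resourceList is a simplified stand-in for corev1.ResourceList.
type resourceList map[string]string

// resourceRequirements is a simplified stand-in for
// corev1.ResourceRequirements, keeping only the two maps the test inspects.
type resourceRequirements struct {
	Requests resourceList
	Limits   resourceList
}

// resourceUnset reports whether name is absent from BOTH Requests and Limits.
// Looking up a key in a nil map is safe in Go and simply reports "not found".
func resourceUnset(r resourceRequirements, name string) bool {
	_, inRequests := r.Requests[name]
	_, inLimits := r.Limits[name]
	return !inRequests && !inLimits
}

func main() {
	const name = "example.com/mgmt-vf" // hypothetical resource name

	// A container that leaks only a limit would pass a Requests-only check.
	onlyLimit := resourceRequirements{Limits: resourceList{name: "1"}}
	fmt.Println(resourceUnset(onlyLimit, name)) // prints "false"

	// Nothing set in either map.
	fmt.Println(resourceUnset(resourceRequirements{}, name)) // prints "true"
}
```

The `onlyLimit` case is the one a Requests-only assertion would miss, which is why the suggested patch mirrors the lookup for Limits.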
…uery
Add a configurable mgmt-port-resource-count key to the hardware-offload-config ConfigMap for both dpu-host and smart-nic modes. This allows OVNK to claim the full management port SR-IOV resource pool, which is required for UDN support on DPU host nodes.
The resource count defaults to 1 when mgmt-port-resource-name is set, and can be overridden via the mgmt-port-resource-count ConfigMap key.
Signed-off-by: Igal Tsoiref <itsoiref@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@tsorya: The following tests failed, say
LGTM. We also do this in dpu-simulator to support network segmentation (UDN). |
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: tsorya, wizhaoredhat
Summary
- Add a configurable mgmt-port-resource-count for dpu-host and smart-nic modes. Previously both modes hardcoded the management port resource request/limit to '1'. Now both use a configurable count (defaulting to 1) read from the hardware-offload-config ConfigMap.
- The resource count defaults to 1 when mgmt-port-resource-name is set, and can be overridden via the mgmt-port-resource-count ConfigMap key.

Jira: https://redhat.atlassian.net/browse/NVIDIA-882
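To illustrate how a template might move from a hardcoded '1' to a configurable count, here is a small sketch using Go's text/template. It is not the actual ovnkube-node.yaml: the template text, the `cpu: 10m` placeholder, the `renderData` type, the `render` function, and the resource name `example.com/mgmt-vf` are all assumptions made for the example. Only the field names MgmtPortResourceName and MgmtPortResourceCount and the two mode names come from the PR description.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// resourcesTmpl is a hypothetical fragment. The management port resource is
// emitted only when a name is set and the mode is dpu-host or smart-nic.
const resourcesTmpl = `resources:
  requests:
    cpu: 10m
{{- if and .MgmtPortResourceName (or (eq .Mode "dpu-host") (eq .Mode "smart-nic")) }}
    {{ .MgmtPortResourceName }}: '{{ .MgmtPortResourceCount }}'
  limits:
    {{ .MgmtPortResourceName }}: '{{ .MgmtPortResourceCount }}'
{{- end }}
`

// renderData is a hypothetical subset of the render data.
type renderData struct {
	Mode                  string
	MgmtPortResourceName  string
	MgmtPortResourceCount int64
}

// render executes the template and returns the resulting YAML text.
func render(d renderData) (string, error) {
	t, err := template.New("resources").Parse(resourcesTmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, d); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := render(renderData{
		Mode:                  "dpu-host",
		MgmtPortResourceName:  "example.com/mgmt-vf",
		MgmtPortResourceCount: 8,
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Rendering with the same count in both requests and limits reflects the summary's wording that the request/limit pair was previously hardcoded to '1'.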
Test plan
- TestDpuHostModeResourceCount tests pass (validates template rendering with counts 8, 16, and default for both dpu-host and smart-nic)
- TestOVNKubernetesNodeModeTemplates tests pass
- go vet clean
- … mgmt-port-resource-count is set in the ConfigMap
- … mgmt-port-resource-name is set without explicit count