net, sriov: memory hotplug tests#5349
Conversation
|
Warning Review limit reached
More reviews will be available in 23 minutes and 51 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (6)
📝 WalkthroughWalkthroughAdds guest-memory support to VM specs and live VM patching, introduces an SR-IOV VM builder, and adds fixtures, helpers, and tests that validate memory hot-plug and connectivity. ChangesSR-IOV Memory Hotplug Feature
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Report bugs in Issues Welcome! 🎉This pull request will be automatically processed with the following features: 🔄 Automatic Actions
📋 Available CommandsPR Status Management
Review & Approval
Testing & Validation
Container Operations
Cherry-pick Operations
Branch Management
Label Management
✅ Merge RequirementsThis PR will be automatically approved when the following conditions are met:
📊 Review ProcessApprovers and ReviewersApprovers:
Reviewers:
Available Labels
AI Features
Security Checks
💡 Tips
For more information, please refer to the project documentation or contact the maintainers. |
c80a6d2 to
46a3996
Compare
|
Change: drop unused affinity param |
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/network/sriov/libsriov.py`:
- Around line 74-77: The EthernetDevice constructor call for VM_SRIOV_GUEST_DEV
is missing the set_name parameter, which causes cloud-init to match the
interface by MAC address but fail to rename it to the expected name. Add
set_name=VM_SRIOV_GUEST_DEV as a parameter to the EthernetDevice constructor
(alongside addresses, match parameters) to ensure the SR-IOV interface is
renamed to the predictable name that downstream tests expect to bind to (eth1).
This aligns the implementation with the docstring promise about cloud-init
set-name identification.
In `@tests/network/sriov/memory_hotplug/conftest.py`:
- Around line 40-57: The `sriov_vm_with_memory_hotplug` fixture creates a VM
using `base_sriov_vm` but does not perform memory hot-plugging; the hotplug
operation only happens in `test_vmi_status`, making `test_connectivity` depend
on that test's side effects without explicit dependency declaration. To fix this
violation of test independence, either: (1) add a call to
`hotplug_memory_and_wait()` within the fixture before yielding the vm to ensure
the fixture provides a pre-hotplugged VM suitable for `test_connectivity`, or
(2) add an explicit
`@pytest.mark.dependency(depends=["TestSriovMemoryHotplug::test_vmi_status"])`
decorator to `test_connectivity` with a comment explaining why the dependency
exists per coding guidelines. Option 1 is preferred for better test isolation.
In `@tests/network/sriov/memory_hotplug/lib_helpers.py`:
- Around line 24-27: The `@retry` decorator on the
_wait_for_guest_memory_in_vmi_status function has an empty exceptions_dict
parameter, which prevents any exceptions from being retried. This causes
transient AttributeError exceptions that occur when
vm.vmi.instance.status.memory.guestCurrent is temporarily unavailable during
live migration to fail immediately instead of being retried. Update the
exceptions_dict parameter to include AttributeError so that these transient
exceptions are caught and retried within the timeout window.
In `@tests/network/sriov/memory_hotplug/test_sriov_memory_hotplug.py`:
- Around line 88-96: The for loop iterating over
filter_link_local_addresses(ip_addresses=iface.ipAddresses) can silently skip
all iterations if no addresses are returned, causing the test to pass without
executing any poll_tcp_connectivity checks. Add an explicit assertion right
before the for loop to verify that filter_link_local_addresses returns a
non-empty list of addresses, ensuring at least one connectivity check is always
performed.
- Around line 65-86: The test_connectivity method currently relies on memory
being hot-plugged by a previous test (test_vmi_status), creating a hidden
dependency that breaks test independence. To fix this, the test_connectivity
method should establish its own preconditions by performing the memory hot-plug
operation directly within the test before attempting to verify SR-IOV
connectivity. This way the test can run independently without depending on
another test's side effects, making it reliable whether run in isolation or as
part of a test suite.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: ba054e86-04b2-4f85-a98f-058d80a6fe0c
📒 Files selected for processing (7)
libs/vm/spec.pylibs/vm/vm.pytests/network/sriov/libsriov.pytests/network/sriov/memory_hotplug/__init__.pytests/network/sriov/memory_hotplug/conftest.pytests/network/sriov/memory_hotplug/lib_helpers.pytests/network/sriov/memory_hotplug/test_sriov_memory_hotplug.py
| @dataclass | ||
| class Memory: | ||
| guest: str | ||
| maxGuest: str | None = None # noqa: N815 |
There was a problem hiding this comment.
The commit message describes the code instead of why this test is needed.
The reason that it is needed is because of a bug in the memory calculation that we used to have.
In short: virt-handler miscalculated guest memory in mem hotplug and libvirt wanted more. In turn it tried to lock additional memory pages, but since it's running in an unprivileged pod, it failed to do so.
The symptom was that during migration (in SRIOV we hot-unplug in the source and then hotplug in the target) hotplug failed. For the user it manifested into loss of network connectivity.
CNV-69778 — After workload migration the SR-IOV interface failed to attach (ON_QA)
CNV-73148 — SRIOV NIC reattachment fails after memory HP migration [v4.20] (Closed)
EDIT: And so we need to verify that we can perform memory hotplug in SRIOV without connectivity loss (and also verify the memory is increased as the user expects).
There was a problem hiding this comment.
Edited the commit message, thanks
|
|
||
| Updates the in-memory spec first so the object stays consistent with | ||
| the cluster without requiring a re-fetch after the patch. On a running | ||
| VM this triggers an automatic live migration. |
There was a problem hiding this comment.
in-memory spec first so the object stays consistent with
the cluster without requiring a re-fetch after the patch.
Isn't this too technical for consumers? They're just trying to change the VM guest memory. Please consider simplifying the text.
The migration comment is good though.
There was a problem hiding this comment.
It's a similar docstring to the existing set_networks
|
|
||
|
|
||
| @pytest.fixture(scope="class") | ||
| def sriov_vm_for_memory_hotplug( |
There was a problem hiding this comment.
Do we really care what this VM is for?
From the code below I see that all it does is add an IP to base_sriov_vm, hence please consider something like sriov_vm_with_ip.
There was a problem hiding this comment.
Renamed to sriov_vm_with_max_guest to better describe the VM config
|
|
||
|
|
||
| @pytest.fixture(scope="class") | ||
| def sriov_ref_vm( |
There was a problem hiding this comment.
If the only difference between this fixture and the previous are the memory parameters passed to base_sriov_vm then we don't need 2 fixtures, we can have one fixture with parametrized memory values (with defaults).
There was a problem hiding this comment.
I remember that we agreed we won't parametrize VM fixtures so each used VM fixture has a clear and readable config
@servolkov WDYT?
| @retry(wait_timeout=TIMEOUT_10MIN, sleep=TIMEOUT_5SEC) | ||
| def _wait_for_guest_memory_in_vmi_status(vm: BaseVirtualMachine, memory_guest: str) -> bool: | ||
| current = vm.vmi.instance.status.memory.guestCurrent | ||
| return bool(current) and parse_quantity(str(current)) == parse_quantity(memory_guest) |
There was a problem hiding this comment.
Just verifying bool(current) will be false if current differs from None?
There was a problem hiding this comment.
True when current differs from None (i.e. the field is set), and False when current is None (field not yet available)
| from utilities.constants import TIMEOUT_5SEC, TIMEOUT_10MIN | ||
|
|
||
|
|
||
| def hotplug_memory_and_wait(vm: BaseVirtualMachine, memory_guest: str) -> None: |
There was a problem hiding this comment.
At first I thought that this should be a VM method but since it must be used on a running VM perhaps it shouldn't useless we already have other "hot" methods there.
Perhaps we can split this function into 2:
One a VM method that sets memory, and a separate one in this test that just waits for guest momery.
There was a problem hiding this comment.
I followed the similar pattern from update_nad_references helper
https://github.com/RedHatQE/openshift-virtualization-tests/blob/main/libs/net/vmspec.py#L309
We change VM spec and then we wait for it to be applied
| lookup_iface_status( | ||
| vm=sriov_vm_for_memory_hotplug, | ||
| iface_name=sriov_network.name, | ||
| predicate=lambda iface: "guest-agent" in iface["infoSource"], |
There was a problem hiding this comment.
We need to validate infoSource domain because it indicates that the hot plug took place. See upstream [1].
There was a problem hiding this comment.
Done
Also added to the STD docstring
| spec = base_vmspec() | ||
| spec.template.spec.domain.memory = Memory(guest=memory_guest, maxGuest=memory_max_guest) | ||
| spec.template.spec.domain.devices.interfaces = [ # type: ignore[union-attr] | ||
| Interface(name="default", masquerade={}), |
There was a problem hiding this comment.
Why do we need the masquerade interface here? We're only checking SR-IOV and another interface can accidentally lead to false positive in connectivity test.
ef7da7a to
51445dc
Compare
|
Change: reabse upon main |
During SR-IOV live migration, the NIC is hot-unplugged on the source and re-attached on the target. A virt-handler memory miscalculation caused re-attachment to fail, which for the user manifested as loss of network connectivity. Tests verify memory hot-plug completes without connectivity loss and that guest memory is increased as expected. Signed-off-by: Asia Khromov <azhivovk@redhat.com> Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
51445dc to
7b7c8b2
Compare
|
Change: apply changes after CR
|
|
/build-and-push-container |
|
New container for quay.io/openshift-cnv/openshift-virtualization-tests:pr-5349 published |
|
/retest all |
What this PR does / why we need it:
Add SR-IOV VM memory hot-plug tests
During SR-IOV live migration, the NIC is hot-unplugged on
the source and re-attached on the target. A virt-handler
memory miscalculation caused re-attachment to fail, which
for the user manifested as loss of network connectivity.
Tests verify memory hot-plug completes without connectivity
loss and that guest memory is increased as expected.
Which issue(s) this PR fixes:
The scenario is motivated by observed NIC loss after live migration triggered by memory hot-plug on SR-IOV VMs
https://redhat.atlassian.net/browse/CNV-69778
https://redhat.atlassian.net/browse/CNV-69974
Special notes for reviewer:
Please review only the second commit of test implementation - will rebase upon main when STD is merged
jira-ticket: https://redhat.atlassian.net/browse/CNV-78566