Skip to content

net, sriov: memory hotplug tests#5349

Open
azhivovk wants to merge 1 commit into
RedHatQE:mainfrom
azhivovk:sriov-memory-hotplug
Open

net, sriov: memory hotplug tests#5349
azhivovk wants to merge 1 commit into
RedHatQE:mainfrom
azhivovk:sriov-memory-hotplug

Conversation

@azhivovk

@azhivovk azhivovk commented Jun 23, 2026

Copy link
Copy Markdown
Contributor
What this PR does / why we need it:

Add SR-IOV VM memory hot-plug tests
During SR-IOV live migration, the NIC is hot-unplugged on
the source and re-attached on the target. A virt-handler
memory miscalculation caused re-attachment to fail, which
for the user manifested as loss of network connectivity.

Tests verify memory hot-plug completes without connectivity
loss and that guest memory is increased as expected.

Which issue(s) this PR fixes:

The scenario is motivated by observed NIC loss after live migration triggered by memory hot-plug on SR-IOV VMs
https://redhat.atlassian.net/browse/CNV-69778
https://redhat.atlassian.net/browse/CNV-69974

Special notes for reviewer:

Please review only the second commit of test implementation - will rebase upon main when STD is merged

jira-ticket: https://redhat.atlassian.net/browse/CNV-78566

@coderabbitai

coderabbitai Bot commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Warning

Review limit reached

@azhivovk, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 23 minutes and 51 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9e6c4d55-97d1-495b-953b-bedae2780e50

📥 Commits

Reviewing files that changed from the base of the PR and between 51445dc and 7b7c8b2.

📒 Files selected for processing (6)
  • libs/vm/spec.py
  • libs/vm/vm.py
  • tests/network/sriov/libsriov.py
  • tests/network/sriov/memory_hotplug/conftest.py
  • tests/network/sriov/memory_hotplug/lib_helpers.py
  • tests/network/sriov/memory_hotplug/test_sriov_memory_hotplug.py
📝 Walkthrough

Walkthrough

Adds guest-memory support to VM specs and live VM patching, introduces an SR-IOV VM builder, and adds fixtures, helpers, and tests that validate memory hot-plug and connectivity.

Changes

SR-IOV Memory Hotplug Feature

Layer / File(s) Summary
Memory spec and patch method
libs/vm/spec.py, libs/vm/vm.py
Memory gains optional maxGuest, and BaseVirtualMachine.set_guest_memory() updates the in-memory memory object before patching the live VM spec.
SR-IOV VM builder
tests/network/sriov/libsriov.py
Adds the SR-IOV VM construction helper that builds a Fedora VM with configurable guest memory, SR-IOV networking, cloud-init network data, and an attached no-cloud disk.
Fixtures and memory wait helper
tests/network/sriov/memory_hotplug/conftest.py, tests/network/sriov/memory_hotplug/lib_helpers.py
Adds SR-IOV VM fixtures and a helper that hot-plugs guest memory then polls VMI status until guestCurrent reaches the requested amount.
SR-IOV memory hot-plug tests
tests/network/sriov/memory_hotplug/test_sriov_memory_hotplug.py
Adds tests that hot-plug guest memory on the target VM, check VMI interface status, and verify TCP connectivity over the SR-IOV device.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

commented-coderabbitai[bot]

Suggested reviewers

  • nirdothan
  • yossisegev
  • orelmisan
  • EdDev
  • Anatw
  • frenzyfriday
  • rnetser
  • dshchedr
  • RoniKishner
  • vsibirsk
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 58.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately names the main change: SR-IOV memory hotplug tests.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The description includes all required template sections, explains the change, links issues, gives reviewer notes, and provides a Jira ticket.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@openshift-virtualization-qe-bot

Copy link
Copy Markdown

Report bugs in Issues

Welcome! 🎉

This pull request will be automatically processed with the following features:

🔄 Automatic Actions

  • Reviewer Assignment: Reviewers are automatically assigned based on the OWNERS file in the repository root
  • Size Labeling: PR size labels (XS, S, M, L, XL, XXL) are automatically applied based on changes
  • Issue Creation: A tracking issue is created for this PR and will be closed when the PR is merged or closed
  • Branch Labeling: Branch-specific labels are applied to track the target branch
  • Auto-verification: Auto-verified users have their PRs automatically marked as verified
  • Labels: Enabled categories: branch, can-be-merged, cherry-pick, has-conflicts, hold, needs-rebase, size, verified, wip

📋 Available Commands

PR Status Management

  • /wip - Mark PR as work in progress (adds WIP: prefix to title)
  • /wip cancel - Remove work in progress status
  • /hold - Block PR merging (approvers only)
  • /hold cancel - Unblock PR merging
  • /verified - Mark PR as verified
  • /verified cancel - Remove verification status
  • /reprocess - Trigger complete PR workflow reprocessing (useful if webhook failed or configuration changed)
  • /regenerate-welcome - Regenerate this welcome message
  • /security-override - Set security check runs to pass (maintainers only)
  • /security-override cancel - Re-run security checks

Review & Approval

  • /lgtm - Approve changes (looks good to me)
  • /approve - Approve PR (approvers only)
  • /assign-reviewers - Assign reviewers based on OWNERS file
  • /assign-reviewer @username - Assign specific reviewer
  • /check-can-merge - Check if PR meets merge requirements

Testing & Validation

  • /retest tox - Run Python test suite with tox
  • /retest build-container - Rebuild and test container image
  • /retest verify-bugs-are-open - verify-bugs-are-open
  • /retest all - Run all available tests

Container Operations

  • /build-and-push-container - Build and push container image (tagged with PR number)
    • Supports additional build arguments: /build-and-push-container --build-arg KEY=value

Cherry-pick Operations

  • /cherry-pick <branch> - Schedule cherry-pick to target branch when PR is merged
    • Multiple branches: /cherry-pick branch1 branch2 branch3
  • /cherry-pick-retry <branch> - Retry a failed cherry-pick (merged PRs only)

Branch Management

  • /rebase - Rebase this PR branch onto its base branch

Label Management

  • /<label-name> - Add a label to the PR
  • /<label-name> cancel - Remove a label from the PR

✅ Merge Requirements

This PR will be automatically approved when the following conditions are met:

  1. Approval: /approve from at least one approver
  2. LGTM Count: Minimum 2 /lgtm from reviewers
  3. Status Checks: All required status checks must pass
  4. No Blockers: No wip, hold, has-conflicts labels and PR must be mergeable (no conflicts)
  5. Verified: PR must be marked as verified

📊 Review Process

Approvers and Reviewers

Approvers:

  • EdDev
  • dshchedr
  • myakove
  • rnetser
  • vsibirsk

Reviewers:

  • Anatw
  • EdDev
  • RoniKishner
  • azhivovk
  • dshchedr
  • frenzyfriday
  • nirdothan
  • orelmisan
  • rnetser
  • servolkov
  • vsibirsk
  • yossisegev
Available Labels
  • hold
  • verified
  • wip
  • lgtm
  • approve
AI Features
  • Cherry-Pick Conflict Resolution: Enabled (claude/claude-opus-4-6-1m)
Security Checks
  • Suspicious Path Detection: Monitors paths: .claude/, .vscode/, .cursor/, .devcontainer/, .pi/, .github/workflows/, .github/actions/
  • Committer Identity Check: Verifies last committer matches PR author
  • Mandatory: Security checks block merge (use /security-override to bypass — maintainers only)

💡 Tips

  • WIP Status: Use /wip when your PR is not ready for review
  • Verification: The verified label is removed on new commits unless the push is detected as a clean rebase
  • Cherry-picking: Cherry-pick labels are processed when the PR is merged
  • Container Builds: Container images are automatically tagged with the PR number
  • Permission Levels: Some commands require approver permissions
  • Auto-verified Users: Certain users have automatic verification and merge privileges

For more information, please refer to the project documentation or contact the maintainers.

@azhivovk

Copy link
Copy Markdown
Contributor Author

Change: drop unused affinity param

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/network/sriov/libsriov.py`:
- Around line 74-77: The EthernetDevice constructor call for VM_SRIOV_GUEST_DEV
is missing the set_name parameter, which causes cloud-init to match the
interface by MAC address but fail to rename it to the expected name. Add
set_name=VM_SRIOV_GUEST_DEV as a parameter to the EthernetDevice constructor
(alongside addresses, match parameters) to ensure the SR-IOV interface is
renamed to the predictable name that downstream tests expect to bind to (eth1).
This aligns the implementation with the docstring promise about cloud-init
set-name identification.

In `@tests/network/sriov/memory_hotplug/conftest.py`:
- Around line 40-57: The `sriov_vm_with_memory_hotplug` fixture creates a VM
using `base_sriov_vm` but does not perform memory hot-plugging; the hotplug
operation only happens in `test_vmi_status`, making `test_connectivity` depend
on that test's side effects without explicit dependency declaration. To fix this
violation of test independence, either: (1) add a call to
`hotplug_memory_and_wait()` within the fixture before yielding the vm to ensure
the fixture provides a pre-hotplugged VM suitable for `test_connectivity`, or
(2) add an explicit
`@pytest.mark.dependency(depends=["TestSriovMemoryHotplug::test_vmi_status"])`
decorator to `test_connectivity` with a comment explaining why the dependency
exists per coding guidelines. Option 1 is preferred for better test isolation.

In `@tests/network/sriov/memory_hotplug/lib_helpers.py`:
- Around line 24-27: The `@retry` decorator on the
_wait_for_guest_memory_in_vmi_status function has an empty exceptions_dict
parameter, which prevents any exceptions from being retried. This causes
transient AttributeError exceptions that occur when
vm.vmi.instance.status.memory.guestCurrent is temporarily unavailable during
live migration to fail immediately instead of being retried. Update the
exceptions_dict parameter to include AttributeError so that these transient
exceptions are caught and retried within the timeout window.

In `@tests/network/sriov/memory_hotplug/test_sriov_memory_hotplug.py`:
- Around line 88-96: The for loop iterating over
filter_link_local_addresses(ip_addresses=iface.ipAddresses) can silently skip
all iterations if no addresses are returned, causing the test to pass without
executing any poll_tcp_connectivity checks. Add an explicit assertion right
before the for loop to verify that filter_link_local_addresses returns a
non-empty list of addresses, ensuring at least one connectivity check is always
performed.
- Around line 65-86: The test_connectivity method currently relies on memory
being hot-plugged by a previous test (test_vmi_status), creating a hidden
dependency that breaks test independence. To fix this, the test_connectivity
method should establish its own preconditions by performing the memory hot-plug
operation directly within the test before attempting to verify SR-IOV
connectivity. This way the test can run independently without depending on
another test's side effects, making it reliable whether run in isolation or as
part of a test suite.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ba054e86-04b2-4f85-a98f-058d80a6fe0c

📥 Commits

Reviewing files that changed from the base of the PR and between 5343f11 and c80a6d2.

📒 Files selected for processing (7)
  • libs/vm/spec.py
  • libs/vm/vm.py
  • tests/network/sriov/libsriov.py
  • tests/network/sriov/memory_hotplug/__init__.py
  • tests/network/sriov/memory_hotplug/conftest.py
  • tests/network/sriov/memory_hotplug/lib_helpers.py
  • tests/network/sriov/memory_hotplug/test_sriov_memory_hotplug.py

Comment thread tests/network/sriov/libsriov.py Outdated
Comment thread tests/network/sriov/memory_hotplug/conftest.py Outdated
Comment thread tests/network/sriov/memory_hotplug/lib_helpers.py Outdated
Comment thread tests/network/sriov/memory_hotplug/test_sriov_memory_hotplug.py
Comment thread tests/network/sriov/memory_hotplug/test_sriov_memory_hotplug.py Outdated
Comment thread libs/vm/spec.py
@dataclass
class Memory:
guest: str
maxGuest: str | None = None # noqa: N815

@nirdothan nirdothan Jun 25, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commit message describes the code instead of why this test is needed.
The reason that it is needed is because of a bug in the memory calculation that we used to have.
In short: virt-handler miscalculated guest memory in mem hotplug and libvirt wanted more. In turn it tried to lock additional memory pages, but since it's running in an unprivileged pod, it failed to do so.
The symptom was that during migration (in SRIOV we hot-unplug in the source and then hotplug in the target) hotplug failed. For the user it manifested into loss of network connectivity.

CNV-69778 — After workload migration the SR-IOV interface failed to attach (ON_QA)
CNV-73148 — SRIOV NIC reattachment fails after memory HP migration [v4.20] (Closed)

EDIT: And so we need to verify that we can perform memory hotplug in SRIOV without connectivity loss (and also verify the memory is increased as the user expects).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edited the commit message, thanks

Comment thread libs/vm/vm.py

Updates the in-memory spec first so the object stays consistent with
the cluster without requiring a re-fetch after the patch. On a running
VM this triggers an automatic live migration.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in-memory spec first so the object stays consistent with
the cluster without requiring a re-fetch after the patch.

Isn't this too technical for consumers? They're just trying to change the VM guest memory. Please consider simplifying the text.
The migration comment is good though.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a similar docstring to the existing set_networks



@pytest.fixture(scope="class")
def sriov_vm_for_memory_hotplug(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really care what this VM is for?
From the code below I see that all it does is add an IP to base_sriov_vm, hence please consider something like sriov_vm_with_ip.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to sriov_vm_with_max_guest to better describe the VM config



@pytest.fixture(scope="class")
def sriov_ref_vm(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the only difference between this fixture and the previous are the memory parameters passed to base_sriov_vm then we don't need 2 fixtures, we can have one fixture with parametrized memory values (with defaults).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember that we agreed we won't parametrize VM fixtures so each used VM fixture has a clear and readable config
@servolkov WDYT?

@retry(wait_timeout=TIMEOUT_10MIN, sleep=TIMEOUT_5SEC)
def _wait_for_guest_memory_in_vmi_status(vm: BaseVirtualMachine, memory_guest: str) -> bool:
current = vm.vmi.instance.status.memory.guestCurrent
return bool(current) and parse_quantity(str(current)) == parse_quantity(memory_guest)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just verifying bool(current) will be false if current differs from None?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True when current differs from None (i.e. the field is set), and False when current is None (field not yet available)

from utilities.constants import TIMEOUT_5SEC, TIMEOUT_10MIN


def hotplug_memory_and_wait(vm: BaseVirtualMachine, memory_guest: str) -> None:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first I thought that this should be a VM method but since it must be used on a running VM perhaps it shouldn't useless we already have other "hot" methods there.
Perhaps we can split this function into 2:
One a VM method that sets memory, and a separate one in this test that just waits for guest momery.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I followed the similar pattern from update_nad_references helper
https://github.com/RedHatQE/openshift-virtualization-tests/blob/main/libs/net/vmspec.py#L309
We change VM spec and then we wait for it to be applied

lookup_iface_status(
vm=sriov_vm_for_memory_hotplug,
iface_name=sriov_network.name,
predicate=lambda iface: "guest-agent" in iface["infoSource"],

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to validate infoSource domain because it indicates that the hot plug took place. See upstream [1].

[1] https://github.com/kubevirt/kubevirt/blob/a3a194cc10b919052fd20105122ab09cfc9ed0ed/tests/network/sriov.go#L403

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done
Also added to the STD docstring

spec = base_vmspec()
spec.template.spec.domain.memory = Memory(guest=memory_guest, maxGuest=memory_max_guest)
spec.template.spec.domain.devices.interfaces = [ # type: ignore[union-attr]
Interface(name="default", masquerade={}),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need the masquerade interface here? We're only checking SR-IOV and another interface can accidentally lead to false positive in connectivity test.

@azhivovk

Copy link
Copy Markdown
Contributor Author

Change: reabse upon main

During SR-IOV live migration, the NIC is hot-unplugged on
the source and re-attached on the target. A virt-handler
memory miscalculation caused re-attachment to fail, which
for the user manifested as loss of network connectivity.

Tests verify memory hot-plug completes without connectivity
loss and that guest memory is increased as expected.

Signed-off-by: Asia Khromov <azhivovk@redhat.com>
Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
@azhivovk azhivovk force-pushed the sriov-memory-hotplug branch from 51445dc to 7b7c8b2 Compare June 25, 2026 15:41
@azhivovk

Copy link
Copy Markdown
Contributor Author

Change: apply changes after CR

  • Rename VM fixture
  • Add domain to predicate post HP
  • Edit commit message

@azhivovk

Copy link
Copy Markdown
Contributor Author

/build-and-push-container

@openshift-virtualization-qe-bot-6

Copy link
Copy Markdown

New container for quay.io/openshift-cnv/openshift-virtualization-tests:pr-5349 published

@azhivovk

Copy link
Copy Markdown
Contributor Author

/retest all

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants