Add Windows FIO VM workload support#1229
Conversation
|
Warning Review limit reached
More reviews will be available in 51 minutes and 34 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (46)
📝 WalkthroughWalkthroughAdds a Windows FIO workload: PowerShell runner and parser, KubeVirt VM and CDI templates, a WinfioVM Python orchestrator that runs benchmarks and uploads results, Jenkins/workload registration and packaging updates, plus golden-file tests for multiple CI scenarios. ChangesWindows FIO Workload Implementation
Test Golden Files and Validation
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_winfio_vm_ODF_PVC_True/windows_dv.yaml (1)
1-20:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winDifferentiate
ODF_PVC=TruefromODF_PVC=Falsein this fixture set.Lines 1-20 are effectively the same as the false variant, so this func_ci golden does not exercise an ODF-specific edge path for DataVolume rendering.
As per coding guidelines, "Ensure test coverage for edge cases. Verify mocks and fixtures are properly scoped."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_winfio_vm_ODF_PVC_True/windows_dv.yaml` around lines 1 - 20, The fixture duplicates the non-ODF variant and doesn't exercise ODF-specific rendering; update the DataVolume fixture (DataVolume kind with metadata.name "windows-clone-dv" and spec.pvc fields) to reflect ODF_PVC=True by changing the pvc.storageClassName to an ODF storage class used in our tests, adjust pvc accessModes/volumeMode to the ODF-expected values if different (e.g., ensure ReadWriteOnce/Filesystem or the cluster's ODF mode), and make the metadata.name unique (e.g., append "-odf") or add an ODF-specific annotation so the test vector triggers the ODF code path during rendering.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@benchmark_runner/common/template_operations/templates/winfio/internal_data/winfio_vm_template.yaml`:
- Around line 69-70: The tpm field is misplaced inside an interface object (near
networkInterfaceMultiqueue) and must be moved to the domain devices section;
remove the tpm: { } entry from the interfaces[] block and add it as
spec.template.spec.domain.devices.tpm (i.e., ensure tpm is a sibling of
interfaces under devices), keeping any empty/object value intact so the VM
manifest validates against KubeVirt's VirtualMachine schema.
In `@benchmark_runner/common/virtctl/virtctl.py`:
- Around line 128-136: The sleep before checking SSH is currently hardcoded to
OC.DELAY inside _wait_for_virtctl_ssh which forces the same boot delay for all
callers; change the method signature of _wait_for_virtctl_ssh to accept a
configurable delay parameter (e.g., initial_delay or delay, defaulting to
OC.DELAY) and replace the unconditional time.sleep(OC.DELAY) and loop increments
to use that parameter; update callers that need the Windows-specific extra wait
to pass the larger delay while leaving others to use the default, and keep
references to vm_name, namespace, key_path, and username unchanged so callers
can locate and update call sites easily.
In `@benchmark_runner/workloads/winfio_vm.py`:
- Around line 57-63: The helper _create_vm currently calls save_error_logs()
inside its try/except which runs in child worker processes; remove the
save_error_logs() call from _create_vm’s except block and any other phase helper
invoked by _run_parallel_phases (the other occurrences near the same helpers),
and let the top-level run() exception handler perform centralized error-log
collection instead; keep the except re-raising the original exception (raise
err) so the main process can catch it and call save_error_logs() there.
- Around line 65-83: The _prepare_vm and _run_fio methods currently swallow all
exceptions (catch Exception and only logger.warning), which allows the pipeline
to continue with missing data; modify these methods so failures are propagated:
in _prepare_vm (method name _prepare_vm and calls to _get_vm_name,
_virtctl.start_vm_sync, _virtctl._wait_for_virtctl_ssh, and
_scp_to_vm_with_verify) and in _run_fio replace the broad except Exception
handlers with either (a) catch specific exceptions you expect, log an error with
context, and re-raise the exception, or (b) log the error and raise a new
RuntimeError including the original exception as the cause, or (c) return a
clear failure value (e.g., False) and ensure the caller checks it and fails the
workload; ensure any logging uses logger.error and includes the vm identifier
and exception details so the pipeline fails fast instead of proceeding silently.
In `@jenkins/PerfCI/03_PerfCI_Workloads_Deployment/Jenkinsfile`:
- Around line 159-161: The mapping for the "winfio_vm" workload sets WINDOWS_URL
to WINDOWS_SERVER_MSSQL_2025_URL, causing the wrong Windows image to be used;
update the conditional in the Jenkinsfile so that when workload == "winfio_vm"
it assigns WINDOWS_URL = WINDOWS_SERVER_2025_URL instead of
WINDOWS_SERVER_MSSQL_2025_URL (keep the if (workload == "winfio_vm") block and
only change the assigned constant name).
In
`@tests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_winfio_vm_ODF_PVC_False/winfio_vm.yaml`:
- Around line 47-53: Move the misplaced tpm entry out of the network interface
object: remove the tpm: { } key under the interfaces list (where name: nic-0,
model: virtio, masquerade etc. are defined) and add it under the VM spec
devices.tpm block (create or update devices.tpm) so TPM is defined at top-level
devices.tpm rather than inside interfaces[]. Ensure the final YAML uses devices:
{ tpm: ... } and no tpm key remains inside interfaces.
In
`@tests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_winfio_vm_ODF_PVC_False/winfio_vm.yaml`:
- Around line 47-53: The TPM key is incorrectly nested under the network
interface entry (devices.interfaces[] — e.g., the interface named "nic-0"); move
the tpm mapping out of the interface object into the device-level location
spec.template.spec.domain.devices.tpm so the TPM is defined under devices.tpm
(not devices.interfaces[].tpm), preserving its contents and removing the tpm key
from the interface.
In
`@tests/unittest/benchmark_runner/common/template_operations/golden_files/release_winfio_vm_ODF_PVC_False/winfio_vm.yaml`:
- Around line 47-52: The YAML places device-level fields under an interface
item; move networkInterfaceMultiqueue and tpm out of the interface entry and
into the domain devices section (spec.template.spec.domain.devices) instead of
under the interfaces list entry (the nic-0 interface). Ensure
networkInterfaceMultiqueue is defined at the same level as other device configs
under domain.devices and add a tpm entry under domain.devices (not
per-interface) so KubeVirt validation recognizes them.
In
`@tests/unittest/benchmark_runner/common/template_operations/golden_files/test_ci_winfio_vm_ODF_PVC_False/winfio_vm.yaml`:
- Around line 47-53: The fields networkInterfaceMultiqueue and tpm are
incorrectly nested under interfaces[]; move them up to
spec.template.spec.domain.devices so they live alongside other device settings.
Edit the resource template where interfaces: - name: nic-0 ... currently
contains networkInterfaceMultiqueue: true and tpm: { } — remove those two keys
from the interfaces entry and add them under spec.template.spec.domain.devices
(e.g., devices: { networkInterfaceMultiqueue: true, tpm: {}}) so the KubeVirt
schema is satisfied.
---
Outside diff comments:
In
`@tests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_winfio_vm_ODF_PVC_True/windows_dv.yaml`:
- Around line 1-20: The fixture duplicates the non-ODF variant and doesn't
exercise ODF-specific rendering; update the DataVolume fixture (DataVolume kind
with metadata.name "windows-clone-dv" and spec.pvc fields) to reflect
ODF_PVC=True by changing the pvc.storageClassName to an ODF storage class used
in our tests, adjust pvc accessModes/volumeMode to the ODF-expected values if
different (e.g., ensure ReadWriteOnce/Filesystem or the cluster's ODF mode), and
make the metadata.name unique (e.g., append "-odf") or add an ODF-specific
annotation so the test vector triggers the ODF code path during rendering.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: e5558b97-5b22-460a-9dab-b6a6797b2173
📒 Files selected for processing (43)
benchmark_runner/common/template_operations/templates/winfio/__init__.pybenchmark_runner/common/template_operations/templates/winfio/internal_data/__init__.pybenchmark_runner/common/template_operations/templates/winfio/internal_data/parse_fio_results_template.ps1benchmark_runner/common/template_operations/templates/winfio/internal_data/run_fio_benchmark_template.ps1benchmark_runner/common/template_operations/templates/winfio/internal_data/windows_dv_template.yamlbenchmark_runner/common/template_operations/templates/winfio/internal_data/winfio_vm_template.yamlbenchmark_runner/common/template_operations/templates/winfio/winfio_data_template.yamlbenchmark_runner/common/virtctl/virtctl.pybenchmark_runner/main/environment_variables.pybenchmark_runner/workloads/winfio_vm.pyjenkins/PerfCI/03_PerfCI_Workloads_Deployment/Jenkinsfilepyproject.tomltests/unittest/benchmark_runner/common/template_operations/golden_files.pytests/unittest/benchmark_runner/common/template_operations/golden_files/chaos_ci_winfio_vm_ODF_PVC_False/namespace.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/chaos_ci_winfio_vm_ODF_PVC_False/windows_dv.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/chaos_ci_winfio_vm_ODF_PVC_False/winfio_vm.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/chaos_ci_winfio_vm_ODF_PVC_True/namespace.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/chaos_ci_winfio_vm_ODF_PVC_True/windows_dv.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/chaos_ci_winfio_vm_ODF_PVC_True/winfio_vm.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_winfio_vm_ODF_PVC_False/namespace.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_winfio_vm_ODF_PVC_False/windows_dv.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_winfio_vm_ODF_PVC_False/winfio_vm.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_winfio_vm_ODF_PVC_True/namespace.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_winfio_vm_ODF_PVC_True/windows_dv.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_winfio_vm_ODF_PVC_True/winfio_vm.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_winfio_vm_ODF_PVC_False/namespace.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_winfio_vm_ODF_PVC_False/windows_dv.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_winfio_vm_ODF_PVC_False/winfio_vm.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_winfio_vm_ODF_PVC_True/namespace.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_winfio_vm_ODF_PVC_True/windows_dv.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_winfio_vm_ODF_PVC_True/winfio_vm.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/release_winfio_vm_ODF_PVC_False/namespace.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/release_winfio_vm_ODF_PVC_False/windows_dv.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/release_winfio_vm_ODF_PVC_False/winfio_vm.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/release_winfio_vm_ODF_PVC_True/namespace.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/release_winfio_vm_ODF_PVC_True/windows_dv.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/release_winfio_vm_ODF_PVC_True/winfio_vm.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/test_ci_winfio_vm_ODF_PVC_False/namespace.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/test_ci_winfio_vm_ODF_PVC_False/windows_dv.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/test_ci_winfio_vm_ODF_PVC_False/winfio_vm.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/test_ci_winfio_vm_ODF_PVC_True/namespace.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/test_ci_winfio_vm_ODF_PVC_True/windows_dv.yamltests/unittest/benchmark_runner/common/template_operations/golden_files/test_ci_winfio_vm_ODF_PVC_True/winfio_vm.yaml
| networkInterfaceMultiqueue: true | ||
| tpm: { } |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In KubeVirt VirtualMachine schema, where should TPM be defined: under spec.template.spec.domain.devices.tpm or inside spec.template.spec.domain.devices.interfaces[]?
💡 Result:
TPM in the KubeVirt VirtualMachine should be defined at spec.template.spec.domain.devices.tpm (not inside spec.template.spec.domain.devices.interfaces[]). For example, KubeVirt’s vTPM device configuration is shown under domain.devices.tpm in the VM template spec, where spec.template.spec.domain.devices.tpm specifies the vTPM device to add to the VM; optional fields like spec.template.spec.domain.devices.tpm.persistent control TPM state persistence after shutdown.[1]
Citations:
🏁 Script executed:
find . -path "*benchmark_runner/common/template_operations/templates/winfio/internal_data/winfio_vm_template.yaml"Repository: redhat-performance/benchmark-runner
Length of output: 180
🏁 Script executed:
cat -n ./benchmark_runner/common/template_operations/templates/winfio/internal_data/winfio_vm_template.yamlRepository: redhat-performance/benchmark-runner
Length of output: 5799
tpm is incorrectly nested under interfaces[]; it must be a direct child of devices.
According to the KubeVirt VirtualMachine schema, tpm should be defined at spec.template.spec.domain.devices.tpm, not inside an interface object. This placement will cause VM manifest validation to fail.
Suggested fix
devices:
blockMultiQueue: false
+ tpm: {}
disks:
@@
interfaces:
- masquerade: {}
model: virtio
name: nic-0
networkInterfaceMultiqueue: true
- tpm: { }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@benchmark_runner/common/template_operations/templates/winfio/internal_data/winfio_vm_template.yaml`
around lines 69 - 70, The tpm field is misplaced inside an interface object
(near networkInterfaceMultiqueue) and must be moved to the domain devices
section; remove the tpm: { } entry from the interfaces[] block and add it as
spec.template.spec.domain.devices.tpm (i.e., ensure tpm is a sibling of
interfaces under devices), keeping any empty/object value intact so the VM
manifest validates against KubeVirt's VirtualMachine schema.
| # wait before checking for SSH to avoid boot issues | ||
| time.sleep(OC.DELAY) | ||
| for i in range(0, timeout, OC.DELAY): | ||
| if self._ssh_ready(vm_name=vm_name, namespace=namespace, key_path=key_path, username=username): | ||
| logger.debug(f'SSH ready on {vm_name} after {i}s') | ||
| return True | ||
| if i > 0 and i % OC.DELAY == 0: | ||
| logger.debug(f'Waiting for SSH on {vm_name}... ({i}s)') | ||
| time.sleep(self.SLEEP_TIME) | ||
| time.sleep(OC.DELAY) |
There was a problem hiding this comment.
The new boot delay is hardcoded for all workloads instead of being caller-configurable.
This adds unconditional latency to every _wait_for_virtctl_ssh call and doesn’t match the goal of adding a Windows-specific delay knob.
💡 Suggested fix
- def _wait_for_virtctl_ssh(self, vm_name: str, namespace: str = '', key_path: str = '', username: str = '', timeout: int = OC.SHORT_TIMEOUT) -> bool:
+ def _wait_for_virtctl_ssh(self, vm_name: str, namespace: str = '', key_path: str = '', username: str = '',
+ timeout: int = OC.SHORT_TIMEOUT, delay: int = 0) -> bool:
"""Wait for SSH to be accessible on a VM via virtctl."""
- # wait before checking for SSH to avoid boot issues
- time.sleep(OC.DELAY)
+ if delay > 0:
+ time.sleep(delay)
for i in range(0, timeout, OC.DELAY):🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@benchmark_runner/common/virtctl/virtctl.py` around lines 128 - 136, The sleep
before checking SSH is currently hardcoded to OC.DELAY inside
_wait_for_virtctl_ssh which forces the same boot delay for all callers; change
the method signature of _wait_for_virtctl_ssh to accept a configurable delay
parameter (e.g., initial_delay or delay, defaulting to OC.DELAY) and replace the
unconditional time.sleep(OC.DELAY) and loop increments to use that parameter;
update callers that need the Windows-specific extra wait to pass the larger
delay while leaving others to use the default, and keep references to vm_name,
namespace, key_path, and username unchanged so callers can locate and update
call sites easily.
There was a problem hiding this comment.
[Claude Code] The initial time.sleep(OC.DELAY) before the loop is required — without it, Windows VMs get stuck because UEFI/OVMF boot takes time before SSH is reachable. Removing it causes the SSH poll to start too early and hang. The loop interval using OC.DELAY may be relaxed to self.SLEEP_TIME in a follow-up after testing.
There was a problem hiding this comment.
@ebattat, understood — the initial time.sleep(OC.DELAY) is intentional to give Windows UEFI/OVMF time to complete its boot sequence before SSH probing begins, and removing it causes early polls that stall. That makes sense for the winfio_vm use case.
The remaining concern is that this delay now applies to all callers of _wait_for_virtctl_ssh, including non-Windows workloads that don't need the extra wait. A delay: int = 0 default parameter would let Windows callers opt-in while keeping existing callers unaffected — but I understand if you'd prefer to keep it simple for now and revisit the signature later.
Regarding the loop interval, noted that relaxing back to self.SLEEP_TIME is on your radar for a follow-up after testing. Would you like me to open a GitHub issue to track that?
(◕‿◕)っ🐇
✏️ Learnings added
Learnt from: ebattat
Repo: redhat-performance/benchmark-runner PR: 1229
File: benchmark_runner/common/virtctl/virtctl.py:128-136
Timestamp: 2026-05-19T15:01:02.688Z
Learning: In `benchmark_runner/common/virtctl/virtctl.py`, the initial `time.sleep(OC.DELAY)` added before the SSH polling loop in `_wait_for_virtctl_ssh` is intentional and required for Windows VM workloads (`winfio_vm`). Windows VMs using UEFI/OVMF take additional time to boot before SSH becomes reachable; removing this sleep causes the SSH poll to start too early and hang. The loop interval change from `self.SLEEP_TIME` to `OC.DELAY` may be relaxed back to `self.SLEEP_TIME` in a future follow-up after testing.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
There was a problem hiding this comment.
You right, I will fix it
There was a problem hiding this comment.
@arpsharm,
I have tested it several times with time.sleep(self.SLEEP_TIME), but it got stuck when running scale Win FIO VMs. Therefore, I prefer to keep the longer sleep interval of 30 seconds instead of 3 seconds.
Currently, we are not using this method for bootstorm workload, so the sleep time is not an issue.
There was a problem hiding this comment.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
| def _create_vm(self, vm_num: str): | ||
| try: | ||
| self._oc.create_async(yaml=self._get_vm_yaml(vm_num)) | ||
| self._oc.wait_for_vm_status(vm_name=self._get_vm_name(vm_num), status=VMStatus.Stopped) | ||
| except Exception as err: | ||
| self.save_error_logs() | ||
| raise err |
There was a problem hiding this comment.
Avoid save_error_logs() inside phase helpers that run in parallel worker processes.
These helper methods are executed by _run_parallel_phases; calling save_error_logs() there risks the known multiprocessing SSL crash path. Let the top-level run() exception handler do centralized error-log handling.
💡 Suggested fix
def _create_vm(self, vm_num: str):
try:
self._oc.create_async(yaml=self._get_vm_yaml(vm_num))
self._oc.wait_for_vm_status(vm_name=self._get_vm_name(vm_num), status=VMStatus.Stopped)
- except Exception as err:
- self.save_error_logs()
- raise err
+ except Exception:
+ raise
@@
- except Exception as err:
- self.save_error_logs()
- raise err
+ except Exception:
+ raise
@@
- except Exception as err:
- self.save_error_logs()
- raise err
+ except Exception:
+ raiseBased on learnings: "avoid calling _save_error_logs() from within multiprocessing child processes ... move/limit _save_error_logs() so it runs only in the main process."
Also applies to: 160-162, 168-170
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@benchmark_runner/workloads/winfio_vm.py` around lines 57 - 63, The helper
_create_vm currently calls save_error_logs() inside its try/except which runs in
child worker processes; remove the save_error_logs() call from _create_vm’s
except block and any other phase helper invoked by _run_parallel_phases (the
other occurrences near the same helpers), and let the top-level run() exception
handler perform centralized error-log collection instead; keep the except
re-raising the original exception (raise err) so the main process can catch it
and call save_error_logs() there.
| def _prepare_vm(self, vm_num: str): | ||
| try: | ||
| vm_name = self._get_vm_name(vm_num) | ||
| self._virtctl.start_vm_sync(vm_name=vm_name) | ||
| if not self._virtctl._wait_for_virtctl_ssh(vm_name=vm_name, namespace=self.__namespace, | ||
| key_path=self.__ssh_key_path, username=self.__username, | ||
| timeout=OC.SHORT_TIMEOUT): | ||
| logger.warning(f'SSH never became ready on VM {vm_name}, skipping') | ||
| return | ||
|
|
||
| scripts = ['run_fio_benchmark.ps1', 'parse_fio_results.ps1'] | ||
| for script in scripts: | ||
| self._scp_to_vm_with_verify(vm_name=vm_name, | ||
| local_path=os.path.join(self._run_artifacts_path, script), | ||
| remote_path=f'{self.__remote_dir}/{script}') | ||
| except Exception as err: | ||
| logger.warning(f'Failed to prepare VM {self._get_vm_name(vm_num)}: {err}') | ||
|
|
||
|
|
There was a problem hiding this comment.
_prepare_vm / _run_fio swallow hard failures, so runs can silently succeed with missing data.
Both methods catch exceptions and only log warnings; the pipeline then proceeds to collect/upload, which can end in partial/no results without failing the workload.
💡 Suggested fix
def _prepare_vm(self, vm_num: str):
try:
@@
- except Exception as err:
- logger.warning(f'Failed to prepare VM {self._get_vm_name(vm_num)}: {err}')
+ except Exception as err:
+ logger.warning(f'Failed to prepare VM {self._get_vm_name(vm_num)}: {err}')
+ raise
@@
def _run_fio(self, vm_num: str):
try:
@@
- except Exception as err:
- logger.warning(f'FIO failed on VM {self._get_vm_name(vm_num)}: {err}')
+ except Exception as err:
+ logger.warning(f'FIO failed on VM {self._get_vm_name(vm_num)}: {err}')
+ raiseAlso applies to: 84-94
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@benchmark_runner/workloads/winfio_vm.py` around lines 65 - 83, The
_prepare_vm and _run_fio methods currently swallow all exceptions (catch
Exception and only logger.warning), which allows the pipeline to continue with
missing data; modify these methods so failures are propagated: in _prepare_vm
(method name _prepare_vm and calls to _get_vm_name, _virtctl.start_vm_sync,
_virtctl._wait_for_virtctl_ssh, and _scp_to_vm_with_verify) and in _run_fio
replace the broad except Exception handlers with either (a) catch specific
exceptions you expect, log an error with context, and re-raise the exception, or
(b) log the error and raise a new RuntimeError including the original exception
as the cause, or (c) return a clear failure value (e.g., False) and ensure the
caller checks it and fails the workload; ensure any logging uses logger.error
and includes the vm identifier and exception details so the pipeline fails fast
instead of proceeding silently.
| if (workload == "winfio_vm") { | ||
| WINDOWS_URL = WINDOWS_SERVER_MSSQL_2025_URL | ||
| } |
There was a problem hiding this comment.
Use the non-MSSQL Windows image URL for winfio_vm
On Line 160, winfio_vm is mapped to WINDOWS_SERVER_MSSQL_2025_URL. For this workload, it should use WINDOWS_SERVER_2025_URL; otherwise scale/perf runs may boot the wrong base image.
Proposed fix
if (workload == "winfio_vm") {
- WINDOWS_URL = WINDOWS_SERVER_MSSQL_2025_URL
+ WINDOWS_URL = WINDOWS_SERVER_2025_URL
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (workload == "winfio_vm") { | |
| WINDOWS_URL = WINDOWS_SERVER_MSSQL_2025_URL | |
| } | |
| if (workload == "winfio_vm") { | |
| WINDOWS_URL = WINDOWS_SERVER_2025_URL | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@jenkins/PerfCI/03_PerfCI_Workloads_Deployment/Jenkinsfile` around lines 159 -
161, The mapping for the "winfio_vm" workload sets WINDOWS_URL to
WINDOWS_SERVER_MSSQL_2025_URL, causing the wrong Windows image to be used;
update the conditional in the Jenkinsfile so that when workload == "winfio_vm"
it assigns WINDOWS_URL = WINDOWS_SERVER_2025_URL instead of
WINDOWS_SERVER_MSSQL_2025_URL (keep the if (workload == "winfio_vm") block and
only change the assigned constant name).
| interfaces: | ||
| - masquerade: {} | ||
| model: virtio | ||
| name: nic-0 | ||
| networkInterfaceMultiqueue: true | ||
| tpm: { } | ||
| features: |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In KubeVirt VirtualMachine API (kubevirt.io/v1), is spec.template.spec.domain.devices.interfaces[].tpmvalid, or must TPM be defined atspec.template.spec.domain.devices.tpm?
💡 Result:
In the KubeVirt VirtualMachine API (kubevirt.io/v1), TPM must be defined at spec.template.spec.domain.devices.tpm (not at spec.template.spec.domain.devices.interfaces[].tpm). The devices.interfaces field is for network interface configuration, while TPM is a top-level device under devices.tpm. KubeVirt user docs show a vTPM example using spec.template.spec.domain.devices.tpm (and nested properties like persistent) under domain.devices, not under domain.devices.interfaces[]. For example, OKD’s documentation explicitly states that spec.template.spec.domain.devices.tpm specifies the vTPM device to add to the VM [1] and provides an example manifest with domain.devices.tpm [2]. Separately, KubeVirt’s interface/network docs describe network interfaces as configured in spec.domain.devices.interfaces and list network-interface-related fields, with no mention of TPM belonging there [3]. Additionally, KubeVirt’s own discussion of the TPM API refers to spec.domain.devices.tpm as the VirtualMachineInstanceSpec attribute that can be implicitly disabled via nil (and later improved with an Enabled attribute), again indicating TPM is modeled under devices.tpm rather than inside interfaces[] [4][5]. So spec.template.spec.domain.devices.interfaces[].tpm is not a valid KubeVirt v1 API path; the correct field is spec.template.spec.domain.devices.tpm. Answer: must use spec.template.spec.domain.devices.tpm.
Citations:
- 1: No way of overriding the preference value of
PreferredTPMfrom within aVirtualMachinekubevirt/kubevirt#14315 - 2: https://kubevirt.io/api-reference/v0.5.0/definitions.html
- 3: https://kubevirt.io/user-guide/compute/persistent_tpm_and_uefi_state/
- 4: https://kubevirt.io/user-guide/network/interfaces_and_networks/
- 5: https://kubevirt.io/user-guide/user_workloads/templates/
🏁 Script executed:
fd "winfio_vm.yaml" --type fRepository: redhat-performance/benchmark-runner
Length of output: 1266
🏁 Script executed:
cat -n tests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_winfio_vm_ODF_PVC_False/winfio_vm.yaml | sed -n '40,60p'Repository: redhat-performance/benchmark-runner
Length of output: 774
🏁 Script executed:
cat -n tests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_winfio_vm_ODF_PVC_False/winfio_vm.yaml | sed -n '30,65p'Repository: redhat-performance/benchmark-runner
Length of output: 1229
Move TPM out of the network interface object.
Line 52 places tpm under interfaces[], which is not a valid interface field in KubeVirt VM spec and will cause rendered manifests to fail validation. TPM must be defined at the top-level devices.tpm, not within the interfaces array.
Proposed fix
devices:
blockMultiQueue: false
+ tpm: {}
disks:
@@
- masquerade: {}
model: virtio
name: nic-0
networkInterfaceMultiqueue: true
- tpm: { }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@tests/unittest/benchmark_runner/common/template_operations/golden_files/func_ci_winfio_vm_ODF_PVC_False/winfio_vm.yaml`
around lines 47 - 53, Move the misplaced tpm entry out of the network interface
object: remove the tpm: { } key under the interfaces list (where name: nic-0,
model: virtio, masquerade etc. are defined) and add it under the VM spec
devices.tpm block (create or update devices.tpm) so TPM is defined at top-level
devices.tpm rather than inside interfaces[]. Ensure the final YAML uses devices:
{ tpm: ... } and no tpm key remains inside interfaces.
| interfaces: | ||
| - masquerade: {} | ||
| model: virtio | ||
| name: nic-0 | ||
| networkInterfaceMultiqueue: true | ||
| tpm: { } | ||
| features: |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In KubeVirt kubevirt.io/v1 VirtualMachine schema, is spec.template.spec.domain.devices.interfaces[].tpmvalid, or must TPM be set atspec.template.spec.domain.devices.tpm?
💡 Result:
In KubeVirt’s kubevirt.io/v1 VirtualMachine API, TPM is configured under spec.template.spec.domain.devices.tpm (as a single tpm device), not under spec.template.spec.domain.devices.interfaces[].tpm. Authoritative docs show the vTPM device placed at spec.template.spec.domain.devices.tpm (including fields like persistent) in example VM manifests [1][2]. Separately, KubeVirt’s TPM API discussion refers to spec.domain.devices.tpm specifically (and how it can’t previously be disabled by user-provided nil) [3]. So, spec.template.spec.domain.devices.interfaces[].tpm is not the valid schema location for TPM; TPM must be set at spec.template.spec.domain.devices.tpm.
Citations:
- 1: https://docs.okd.io/4.14/virt/virtual_machines/virt-using-vtpm-devices.html
- 2: https://docs.okd.io/4.21/virt/managing_vms/virt-using-vtpm-devices.html
- 3: No way of overriding the preference value of
PreferredTPMfrom within aVirtualMachinekubevirt/kubevirt#14315
Move tpm to the device level, not inside the NIC interface.
spec.template.spec.domain.devices.tpm is a device-level field in the KubeVirt kubevirt.io/v1 schema. At line 52 it is nested under an interface entry, which violates the valid schema placement. TPM must be configured at spec.template.spec.domain.devices.tpm, not under devices.interfaces[].tpm.
Proposed fix
devices:
blockMultiQueue: false
disks:
@@
interfaces:
- masquerade: {}
model: virtio
name: nic-0
networkInterfaceMultiqueue: true
- tpm: { }
+ tpm: {}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@tests/unittest/benchmark_runner/common/template_operations/golden_files/perf_ci_winfio_vm_ODF_PVC_False/winfio_vm.yaml`
around lines 47 - 53, The TPM key is incorrectly nested under the network
interface entry (devices.interfaces[] — e.g., the interface named "nic-0"); move
the tpm mapping out of the interface object into the device-level location
spec.template.spec.domain.devices.tpm so the TPM is defined under devices.tpm
(not devices.interfaces[].tpm), preserving its contents and removing the tpm key
from the interface.
| interfaces: | ||
| - masquerade: {} | ||
| model: virtio | ||
| name: nic-0 | ||
| networkInterfaceMultiqueue: true | ||
| tpm: { } |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In KubeVirt kubevirt.io/v1 VirtualMachine schema, are networkInterfaceMultiqueueandtpmdefined underspec.template.spec.domain.devices(not per-interface underinterfaces[])?
💡 Result:
Yes, both networkInterfaceMultiqueue and tpm are defined under spec.template.spec.domain.devices in the KubeVirt VirtualMachine schema, rather than being configured per-interface within the interfaces[] list [1][2]. 1. networkInterfaceMultiqueue: This is a boolean field located directly under spec.template.spec.domain.devices. When set to true, it enables multi-queue functionality for all compatible virtio network interfaces [1][3]. It is not an interface-specific setting [1]. 2. tpm: This is a device object defined under spec.template.spec.domain.devices.tpm [2]. It is a global device configuration for the virtual machine, not a property of individual network interfaces [2].
Citations:
- 1: https://kubevirt.io/user-guide/network/interfaces_and_networks/
- 2: https://docs.okd.io/4.21/virt/managing_vms/virt-using-vtpm-devices.html
- 3: https://kubevirt.io/user-guide/user_workloads/templates/
Move device-level fields out of the interface entry.
Lines 51–52 (networkInterfaceMultiqueue and tpm) are placed under an interface item, but in KubeVirt both are device-level fields that belong under spec.template.spec.domain.devices, not per-interface. networkInterfaceMultiqueue applies globally to all compatible interfaces, and tpm is a global device configuration. This schema placement will cause validation failures or silently ignored fields.
Proposed fix
devices:
blockMultiQueue: false
+ networkInterfaceMultiqueue: true
+ tpm: {}
disks:
- disk:
bus: virtio
name: winfio-vm-root-disk-deadbeef
dedicatedIOThread: false
- disk:
bus: virtio
name: cloudinitdisk
interfaces:
- masquerade: {}
model: virtio
name: nic-0
- networkInterfaceMultiqueue: true
- tpm: { }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@tests/unittest/benchmark_runner/common/template_operations/golden_files/release_winfio_vm_ODF_PVC_False/winfio_vm.yaml`
around lines 47 - 52, The YAML places device-level fields under an interface
item; move networkInterfaceMultiqueue and tpm out of the interface entry and
into the domain devices section (spec.template.spec.domain.devices) instead of
under the interfaces list entry (the nic-0 interface). Ensure
networkInterfaceMultiqueue is defined at the same level as other device configs
under domain.devices and add a tpm entry under domain.devices (not
per-interface) so KubeVirt validation recognizes them.
| interfaces: | ||
| - masquerade: {} | ||
| model: virtio | ||
| name: nic-0 | ||
| networkInterfaceMultiqueue: true | ||
| tpm: { } | ||
| features: |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In KubeVirt v1 VirtualMachine schema, are networkInterfaceMultiqueueandtpmvalid underspec.template.spec.domain.devices.interfaces[], or must they be under spec.template.spec.domain.devices?
💡 Result:
Yes, both are valid under spec.template.spec.domain.devices (not under domain.devices.interfaces[]): 1) spec.template.spec.domain.devices.networkInterfaceMultiqueue The KubeVirt user guide shows networkInterfaceMultiqueue configured under spec.template.spec.domain.devices (alongside other “devices” knobs), e.g. spec: template: spec: domain: devices: networkInterfaceMultiqueue: true. [1] Also, the same documentation explains that multi-queue affects “virtio” interfaces configured by name in spec.domain.devices.interfaces, which are separate from the devices.networkInterfaceMultiqueue toggle itself. [2] So networkInterfaceMultiqueue is not meant to be an element field inside spec.template.spec.domain.devices.interfaces[]. [1][2] 2) spec.template.spec.domain.devices.tpm KubeVirt’s vTPM docs show tpm under spec.template.spec.domain.devices: domain: devices: tpm: persistent: true. [3] So tpm is not an entry in spec.template.spec.domain.devices.interfaces[]; it belongs directly under spec.template.spec.domain.devices. [3] Conclusion: networkInterfaceMultiqueue and tpm must be under spec.template.spec.domain.devices; they are not under spec.template.spec.domain.devices.interfaces[].
Citations:
- 1: https://fabiand.github.io/user-guide/docs/latest/creating-virtual-machines/interfaces-and-networks.html
- 2: https://kubevirt.io/user-guide/network/interfaces_and_networks/
- 3: https://docs.okd.io/4.21/virt/managing_vms/virt-using-vtpm-devices.html
🏁 Script executed:
# Read the specific file to verify the issue at lines 47-53
cat -n "tests/unittest/benchmark_runner/common/template_operations/golden_files/test_ci_winfio_vm_ODF_PVC_False/winfio_vm.yaml" | sed -n '40,60p'Repository: redhat-performance/benchmark-runner
Length of output: 774
Move networkInterfaceMultiqueue and tpm to spec.template.spec.domain.devices level.
Lines 51–52 place these device-level fields inside interfaces[]; KubeVirt schema requires them at the devices level. This can cause schema validation failure or silently drop intended configuration.
Proposed fix
devices:
blockMultiQueue: false
+ networkInterfaceMultiqueue: true
+ tpm: {}
disks:
- disk:
bus: virtio
name: winfio-vm-root-disk-deadbeef
dedicatedIOThread: false
- disk:
bus: virtio
name: cloudinitdisk
interfaces:
- masquerade: {}
model: virtio
name: nic-0
- networkInterfaceMultiqueue: true
- tpm: { }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@tests/unittest/benchmark_runner/common/template_operations/golden_files/test_ci_winfio_vm_ODF_PVC_False/winfio_vm.yaml`
around lines 47 - 53, The fields networkInterfaceMultiqueue and tpm are
incorrectly nested under interfaces[]; move them up to
spec.template.spec.domain.devices so they live alongside other device settings.
Edit the resource template where interfaces: - name: nic-0 ... currently
contains networkInterfaceMultiqueue: true and tpm: { } — remove those two keys
from the interfaces entry and add them under spec.template.spec.domain.devices
(e.g., devices: { networkInterfaceMultiqueue: true, tpm: {}}) so the KubeVirt
schema is satisfied.
There was a problem hiding this comment.
SSH wait loop uses OC.DELAY (30s) step instead of SLEEP_TIME (3s). If SSH is ready at 5s it won't be checked until 30s. _wait_for_virtctl_ssh_password uses SLEEP_TIME — should be consistent.
|
/approve |
|
/test ? |
|
/test test-step-sysbench-pod |
|
/test test-step-uperf-vm |
There was a problem hiding this comment.
♻️ Duplicate comments (1)
jenkins/PerfCI/03_PerfCI_Workloads_Deployment/Jenkinsfile (1)
134-137:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMap
winfio_vm/winfio_vm_scaleto the non-MSSQL Windows 2025 image.Both branches currently set
WINDOWS_URLtoWINDOWS_SERVER_MSSQL_2025_URL, which selects the wrong base image for winfio workloads.Proposed fix
if (workload == "winfio_vm_scale") { SCALE = WINFIO_SCALE - WINDOWS_URL = WINDOWS_SERVER_MSSQL_2025_URL + WINDOWS_URL = WINDOWS_SERVER_2025_URL } @@ if (workload == "winfio_vm") { - WINDOWS_URL = WINDOWS_SERVER_MSSQL_2025_URL + WINDOWS_URL = WINDOWS_SERVER_2025_URL }Also applies to: 168-170
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@jenkins/PerfCI/03_PerfCI_Workloads_Deployment/Jenkinsfile` around lines 134 - 137, The winfio workload branches incorrectly assign WINDOWS_URL to WINDOWS_SERVER_MSSQL_2025_URL; update both branches where workload == "winfio_vm_scale" and workload == "winfio_vm" (also the identical block around lines 168-170) to set WINDOWS_URL to the non-MSSQL image constant WINDOWS_SERVER_2025_URL while keeping SCALE = WINFIO_SCALE (or appropriate SCALE assignment) unchanged so winfio uses the correct Windows 2025 image.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@jenkins/PerfCI/03_PerfCI_Workloads_Deployment/Jenkinsfile`:
- Around line 134-137: The winfio workload branches incorrectly assign
WINDOWS_URL to WINDOWS_SERVER_MSSQL_2025_URL; update both branches where
workload == "winfio_vm_scale" and workload == "winfio_vm" (also the identical
block around lines 168-170) to set WINDOWS_URL to the non-MSSQL image constant
WINDOWS_SERVER_2025_URL while keeping SCALE = WINFIO_SCALE (or appropriate SCALE
assignment) unchanged so winfio uses the correct Windows 2025 image.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: a87adeca-4776-4e3c-9cab-4dc5a140281e
📒 Files selected for processing (5)
benchmark_runner/grafana/perf/jsonnet/main.libsonnetbenchmark_runner/main/environment_variables.pyjenkins/PerfCI/03_PerfCI_Workloads_Deployment/Jenkinsfilepyproject.tomltests/unittest/benchmark_runner/common/template_operations/golden_files.py
✅ Files skipped from review due to trivial changes (1)
- pyproject.toml
|
/test ? |
|
/test test-step-fio-pod |
Assisted-by: Claude Code
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: arpsharm, ebattat The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Summary
winfio_vmworkload: Windows FIO storage benchmark running on Windows Server 2025 VMs via KubeVirtC:/tools/fiodelayparameter to_wait_for_virtctl_sshin virtctl.py to handle Windows UEFI boot timingwinfio_vmto Jenkins PerfCI pipeline withWINDOWS_SERVER_2025_URLTest plan
🤖 Assisted-by: Claude Code
Summary by CodeRabbit
New Features
Tests
Chores