MCO-2257: ensure MCO tests pass in RHEL10#6055
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
@cheesesashimi: This pull request references MCO-2257 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
WalkthroughThis PR adds RHEL 10 support to on-cluster layering E2E tests by introducing cluster-aware Cowsay Dockerfiles and runtime OS detection. Tests migrate from static Dockerfile constants to dynamically selected variants based on worker node OS, with supporting infrastructure for node operations and SSH path handling. ChangesRHEL 10 On-Cluster Layering Test Support
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 9 | ❌ 3❌ Failed checks (3 warnings)
✅ Passed checks (9 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Skipping CI for Draft Pull Request. |
|
/test e2e-gcp-op-ocl-part-1 e2e-gcp-op-ocl-part-2 e2e-gcp-op-part-1 e2e-gcp-op-part-2 |
|
/payload-job periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-aws-mco-disruptive-techpreview-3of3 |
|
@cheesesashimi: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/5d8391c0-52c3-11f1-876f-98743cec004f-0 |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cheesesashimi The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/test e2e-gcp-op-ocl-part1 |
18b8a32 to
d68e21b
Compare
|
/test e2e-gcp-op-ocl-part1 |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/e2e-1of2/mcd_test.go`:
- Around line 1180-1181: The test currently hardcodes the EPEL9 RPM URL in the
chroot curl command; instead detect the node OS major version at runtime (reuse
the same version-detection logic/pattern used elsewhere in this test suite), set
an epelURL variable to either the epel-release-latest-9 or
epel-release-latest-10 RPM accordingly, and replace the literal
"https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm" in the
chroot curl invocation with that epelURL; locate the chroot/curl invocation in
this test (the array containing "chroot", "/rootfs", "curl", "-L", "...", "-o",
"/tmp/epel-release-latest-9.noarch.rpm") and update surrounding test setup to
compute epelURL from the node OS version before using it.
In `@test/extended-priv/mco_ocb.go`:
- Around line 174-181: The test currently reads RHEL from an arbitrary worker
node via
NewNodeList(oc.AsAdmin()).GetAllWorkerNodesOrFail()[0].GetRHELVersion(), which
can fail on workerless layouts and drift from the MCP under test; instead call
the existing GetCompactCompatiblePool to get the target pool name, then pick a
node that belongs to that pool (e.g., filter nodes from
NewNodeList(oc.AsAdmin()) by node.MachineConfigPool or equivalent property) and
call GetRHELVersion() on that node; replace the direct
GetAllWorkerNodesOrFail()[0] usage with logic that finds a node in the MCP
returned by GetCompactCompatiblePool so OS detection matches the pool under
test.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 7d0a66be-e519-470a-a0d6-5753084e37c6
📒 Files selected for processing (10)
test/e2e-1of2/mcd_test.gotest/e2e-ocl-1of2/onclusterlayering_test.gotest/e2e-ocl-2of2/layered_image_scaleup_test.gotest/e2e-ocl-2of2/onclusterlayering_test.gotest/e2e-ocl-shared/Containerfile.cowsay-rhel10test/e2e-ocl-shared/Containerfile.cowsay-rhel9test/e2e-ocl-shared/containerfiles.gotest/extended-priv/mco_ocb.gotest/extended-priv/node.gotest/helpers/utils.go
| // TODO: Switch this to use epel-release-latest-10 in the future. | ||
| "chroot", "/rootfs", "curl", "-L", "https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm", "-o", "/tmp/epel-release-latest-9.noarch.rpm", |
There was a problem hiding this comment.
Avoid hardcoding EPEL9 RPM in a dual RHEL9/RHEL10 test path.
This keeps the test pinned to EL9 while the PR target is EL9+EL10 compatibility. Select the EPEL release RPM by node OS version at runtime (same pattern used elsewhere in this stack) instead of a fixed latest-9 URL/path.
Suggested direction
- // TODO: Switch this to use epel-release-latest-10 in the future.
- "chroot", "/rootfs", "curl", "-L", "https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm", "-o", "/tmp/epel-release-latest-9.noarch.rpm",
+ nodeOS := helpers.GetOSReleaseForNode(t, cs, node).OS
+ epelMajor := "9"
+ if strings.Contains(nodeOS, "10") {
+ epelMajor = "10"
+ }
+ epelRPM := fmt.Sprintf("epel-release-latest-%s.noarch.rpm", epelMajor)
+ epelURL := fmt.Sprintf("https://dl.fedoraproject.org/pub/epel/%s", epelRPM)
+ "chroot", "/rootfs", "curl", "-L", epelURL, "-o", filepath.Join("/tmp", epelRPM),🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/e2e-1of2/mcd_test.go` around lines 1180 - 1181, The test currently
hardcodes the EPEL9 RPM URL in the chroot curl command; instead detect the node
OS major version at runtime (reuse the same version-detection logic/pattern used
elsewhere in this test suite), set an epelURL variable to either the
epel-release-latest-9 or epel-release-latest-10 RPM accordingly, and replace the
literal "https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm"
in the chroot curl invocation with that epelURL; locate the chroot/curl
invocation in this test (the array containing "chroot", "/rootfs", "curl", "-L",
"...", "-o", "/tmp/epel-release-latest-9.noarch.rpm") and update surrounding
test setup to compute epelURL from the node OS version before using it.
| // Get a worker node and determine which OS it is running so that the correct Containerfile can be used. | ||
| workerNode := NewNodeList(oc.AsAdmin()).GetAllWorkerNodesOrFail()[0] | ||
| rhelVersion, err := workerNode.GetRHELVersion() | ||
| o.Expect(err).NotTo(o.HaveOccurred()) | ||
|
|
||
| if !strings.Contains(rhelVersion, "10.") { | ||
| containerFileContent = rhel9ContainerFileContent | ||
| } |
There was a problem hiding this comment.
Detect RHEL version from the selected MCP, not from an arbitrary worker node.
This test chooses a topology-compatible pool via GetCompactCompatiblePool, but OS detection is taken from the first worker node. That can fail on workerless layouts and can also drift from the pool actually under test.
Suggested fix
- workerNode := NewNodeList(oc.AsAdmin()).GetAllWorkerNodesOrFail()[0]
- rhelVersion, err := workerNode.GetRHELVersion()
+ targetNode := mcp.GetSortedNodesOrFail()[0]
+ rhelVersion, err := targetNode.GetRHELVersion()
o.Expect(err).NotTo(o.HaveOccurred())As per coding guidelines, tests should avoid assumptions about specific node roles/topologies (e.g., separate worker nodes) unless explicitly protected.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Get a worker node and determine which OS it is running so that the correct Containerfile can be used. | |
| workerNode := NewNodeList(oc.AsAdmin()).GetAllWorkerNodesOrFail()[0] | |
| rhelVersion, err := workerNode.GetRHELVersion() | |
| o.Expect(err).NotTo(o.HaveOccurred()) | |
| if !strings.Contains(rhelVersion, "10.") { | |
| containerFileContent = rhel9ContainerFileContent | |
| } | |
| // Get a worker node and determine which OS it is running so that the correct Containerfile can be used. | |
| targetNode := mcp.GetSortedNodesOrFail()[0] | |
| rhelVersion, err := targetNode.GetRHELVersion() | |
| o.Expect(err).NotTo(o.HaveOccurred()) | |
| if !strings.Contains(rhelVersion, "10.") { | |
| containerFileContent = rhel9ContainerFileContent | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/extended-priv/mco_ocb.go` around lines 174 - 181, The test currently
reads RHEL from an arbitrary worker node via
NewNodeList(oc.AsAdmin()).GetAllWorkerNodesOrFail()[0].GetRHELVersion(), which
can fail on workerless layouts and drift from the MCP under test; instead call
the existing GetCompactCompatiblePool to get the target pool name, then pick a
node that belongs to that pool (e.g., filter nodes from
NewNodeList(oc.AsAdmin()) by node.MachineConfigPool or equivalent property) and
call GetRHELVersion() on that node; replace the direct
GetAllWorkerNodesOrFail()[0] usage with logic that finds a node in the MCP
returned by GetCompactCompatiblePool so OS detection matches the pool under
test.
|
@cheesesashimi: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/payload-job periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-aws-mco-disruptive-techpreview-3of3 |
|
@cheesesashimi: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/89f61b60-548f-11f1-93d6-13081f380bb4-0 |
- What I did
This PR ensures that MCO E2E tests pass in both RHEL9 and RHEL10. This is the PR that should eventually be merged since it does not contain the RHEL10 switchover work.
- How to verify it
This primarily targets the MCO E2E tests, so running the CI jobs should be sufficient.
- Description for the changelog
Ensure that MCO E2E tests pass in RHEL9 and RHEL10
Summary by CodeRabbit
Tests