Skip to content

MCO-2257: ensure MCO tests pass in RHEL10#6055

Draft
cheesesashimi wants to merge 2 commits into
openshift:mainfrom
cheesesashimi:zzlotnik/ensure-tests-pass-in-rhel9
Draft

MCO-2257: ensure MCO tests pass in RHEL10#6055
cheesesashimi wants to merge 2 commits into
openshift:mainfrom
cheesesashimi:zzlotnik/ensure-tests-pass-in-rhel9

Conversation

@cheesesashimi
Copy link
Copy Markdown
Member

@cheesesashimi cheesesashimi commented May 18, 2026

- What I did

This PR ensures that MCO E2E tests pass in both RHEL9 and RHEL10. This is the PR that should eventually be merged since it does not contain the RHEL10 switchover work.

- How to verify it

This primarily targets the MCO E2E tests, so running the CI jobs should be sufficient.

- Description for the changelog
Ensure that MCO E2E tests pass in RHEL9 and RHEL10

Summary by CodeRabbit

Tests

  • Enhanced test suite with support for RHEL 10 cluster environments
  • Improved test infrastructure to dynamically detect and use OS-specific configurations
  • Added cluster-aware resource selection for better cross-version compatibility
  • Expanded test coverage across EL9 and EL10 environments

@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 18, 2026
@openshift-ci-robot
Copy link
Copy Markdown
Contributor

openshift-ci-robot commented May 18, 2026

@cheesesashimi: This pull request references MCO-2257 which is a valid jira issue.

Details

In response to this:

- What I did

This PR ensures that MCO E2E tests pass in both RHEL9 and RHEL10. This is the PR that should eventually be merged since it does not contain the RHEL10 switchover work.

- How to verify it

This primarily targets the MCO E2E tests, so running the CI jobs should be sufficient.

- Description for the changelog
Ensure that MCO E2E tests pass in RHEL9 and RHEL10

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 18, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 18, 2026

Walkthrough

This PR adds RHEL 10 support to on-cluster layering E2E tests by introducing cluster-aware Cowsay Dockerfiles and runtime OS detection. Tests migrate from static Dockerfile constants to dynamically selected variants based on worker node OS, with supporting infrastructure for node operations and SSH path handling.

Changes

RHEL 10 On-Cluster Layering Test Support

Layer / File(s) Summary
Core Dockerfile infrastructure and OS detection
test/e2e-ocl-shared/Containerfile.cowsay-rhel10, test/e2e-ocl-shared/containerfiles.go
New RHEL 10 Containerfile alongside RHEL 9; GetCowsayDockerfileForCluster detects worker node OS and returns the matching embedded containerfile variant.
Test support infrastructure for node operations and SSH paths
test/extended-priv/node.go, test/helpers/utils.go
NodeList.GetAllWorkerNodesOrFail helper for error-checked worker node retrieval; SSH path selection extended to treat EL10 same as EL9/SCOS/FCOS.
Extended privilege test RHEL 10 support
test/extended-priv/mco_ocb.go
Splits MachineOSConfig containerfile test content into RHEL 9 and RHEL 10 variants with runtime detection to select RHEL 9 when needed.
E2E on-cluster layering tests batch 1 (1of2)
test/e2e-ocl-1of2/onclusterlayering_test.go
Six tests updated to fetch cluster-specific Cowsay Dockerfile; client set creation reordered for reuse; cleanup flags grouped into single var block.
E2E on-cluster layering tests batch 2 (2of2)
test/e2e-ocl-2of2/onclusterlayering_test.go
Six tests updated to fetch cluster-specific Dockerfile; containerfile recovery and modification paths updated to use fetched variant instead of static constant; cleanup flags grouped.
Scale-up test cluster-specific Dockerfile
test/e2e-ocl-2of2/layered_image_scaleup_test.go
Selects existing worker node, fetches cluster-specific Cowsay Dockerfile with error handling, and uses returned variant in layered image build.
EPEL release RPM download update
test/e2e-1of2/mcd_test.go
Removes curl -K flag from EPEL release RPM download command; TODO comment added for future epel-release-latest-10 usage.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • pablintino
🚥 Pre-merge checks | ✅ 9 | ❌ 3

❌ Failed checks (3 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 70.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality ⚠️ Warning Test quality issues: GetAllWorkerNodesOrFail missing error message (violates #4), EPEL9 hardcoded in mcd_test without OS selection, require.NoError calls lack failure messages. Add failure messages to assertions per requirement #4. Implement dynamic EPEL selection in mcd_test like containerfiles.go. Fix OS detection from targeted MCP nodes, not arbitrary workers.
Ipv6 And Disconnected Network Test Compatibility ⚠️ Warning PR adds external connectivity requirements incompatible with IPv6 disconnected: hardcoded EPEL9 RPM URL, quay.io container image pulls, IPv4 localhost (127.0.0.1) without IPv6. Detect RHEL version and select EPEL RPM (9 vs 10) at runtime. Ensure containerfiles work offline or skip on disconnected. Support IPv6 localhost (::1) for metrics, not just IPv4.
✅ Passed checks (9 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and accurately summarizes the main objective: ensuring MCO tests pass in RHEL10, which aligns with the changeset's core purpose of adding RHEL10 support across multiple test files.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed All Ginkgo test names are static and deterministic. No dynamic information (node names, timestamps, UUIDs, etc.) present in test titles. Test modifications only affect test bodies, not names.
Microshift Test Compatibility ✅ Passed Custom check does not apply. The PR modifies existing tests and adds helper functions/utilities but does NOT add any new Ginkgo e2e tests (It, Describe, Context, When).
Single Node Openshift (Sno) Test Compatibility ✅ Passed No new Ginkgo test blocks are added. The PR only modifies existing tests and helper functions for RHEL9/RHEL10 compatibility. The SNO check applies to newly added tests only.
Topology-Aware Scheduling Compatibility ✅ Passed PR modifies only test files and test helpers (test/e2e-*, test/extended-priv/, test/helpers/). No deployment manifests, operator code, or controllers are changed. Custom check scope does not apply.
Ote Binary Stdout Contract ✅ Passed No violations detected. No fmt.Print, log.Print, or klog calls at process level in modified test files. Only safe fmt.Errorf in a function.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 18, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@cheesesashimi
Copy link
Copy Markdown
Member Author

/test e2e-gcp-op-ocl-part-1 e2e-gcp-op-ocl-part-2 e2e-gcp-op-part-1 e2e-gcp-op-part-2

@cheesesashimi
Copy link
Copy Markdown
Member Author

/payload-job periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-aws-mco-disruptive-techpreview-3of3

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 18, 2026

@cheesesashimi: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-aws-mco-disruptive-techpreview-3of3

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/5d8391c0-52c3-11f1-876f-98743cec004f-0

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 18, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cheesesashimi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 18, 2026
@cheesesashimi
Copy link
Copy Markdown
Member Author

/test e2e-gcp-op-ocl-part1
/test e2e-gcp-op-ocl-part2
/test e2e-gcp-op-part1
/test e2e-gcp-op-part2

@cheesesashimi cheesesashimi force-pushed the zzlotnik/ensure-tests-pass-in-rhel9 branch from 18b8a32 to d68e21b Compare May 19, 2026 14:06
@cheesesashimi
Copy link
Copy Markdown
Member Author

/test e2e-gcp-op-ocl-part1
/test e2e-gcp-op-ocl-part2
/test e2e-gcp-op-part1
/test e2e-gcp-op-part2

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e-1of2/mcd_test.go`:
- Around line 1180-1181: The test currently hardcodes the EPEL9 RPM URL in the
chroot curl command; instead detect the node OS major version at runtime (reuse
the same version-detection logic/pattern used elsewhere in this test suite), set
an epelURL variable to either the epel-release-latest-9 or
epel-release-latest-10 RPM accordingly, and replace the literal
"https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm" in the
chroot curl invocation with that epelURL; locate the chroot/curl invocation in
this test (the array containing "chroot", "/rootfs", "curl", "-L", "...", "-o",
"/tmp/epel-release-latest-9.noarch.rpm") and update surrounding test setup to
compute epelURL from the node OS version before using it.

In `@test/extended-priv/mco_ocb.go`:
- Around line 174-181: The test currently reads RHEL from an arbitrary worker
node via
NewNodeList(oc.AsAdmin()).GetAllWorkerNodesOrFail()[0].GetRHELVersion(), which
can fail on workerless layouts and drift from the MCP under test; instead call
the existing GetCompactCompatiblePool to get the target pool name, then pick a
node that belongs to that pool (e.g., filter nodes from
NewNodeList(oc.AsAdmin()) by node.MachineConfigPool or equivalent property) and
call GetRHELVersion() on that node; replace the direct
GetAllWorkerNodesOrFail()[0] usage with logic that finds a node in the MCP
returned by GetCompactCompatiblePool so OS detection matches the pool under
test.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7d0a66be-e519-470a-a0d6-5753084e37c6

📥 Commits

Reviewing files that changed from the base of the PR and between 206bc51 and d68e21b.

📒 Files selected for processing (10)
  • test/e2e-1of2/mcd_test.go
  • test/e2e-ocl-1of2/onclusterlayering_test.go
  • test/e2e-ocl-2of2/layered_image_scaleup_test.go
  • test/e2e-ocl-2of2/onclusterlayering_test.go
  • test/e2e-ocl-shared/Containerfile.cowsay-rhel10
  • test/e2e-ocl-shared/Containerfile.cowsay-rhel9
  • test/e2e-ocl-shared/containerfiles.go
  • test/extended-priv/mco_ocb.go
  • test/extended-priv/node.go
  • test/helpers/utils.go

Comment thread test/e2e-1of2/mcd_test.go
Comment on lines +1180 to +1181
// TODO: Switch this to use epel-release-latest-10 in the future.
"chroot", "/rootfs", "curl", "-L", "https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm", "-o", "/tmp/epel-release-latest-9.noarch.rpm",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid hardcoding EPEL9 RPM in a dual RHEL9/RHEL10 test path.

This keeps the test pinned to EL9 while the PR target is EL9+EL10 compatibility. Select the EPEL release RPM by node OS version at runtime (same pattern used elsewhere in this stack) instead of a fixed latest-9 URL/path.

Suggested direction
-	// TODO: Switch this to use epel-release-latest-10 in the future.
-	"chroot", "/rootfs", "curl", "-L", "https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm", "-o", "/tmp/epel-release-latest-9.noarch.rpm",
+	nodeOS := helpers.GetOSReleaseForNode(t, cs, node).OS
+	epelMajor := "9"
+	if strings.Contains(nodeOS, "10") {
+		epelMajor = "10"
+	}
+	epelRPM := fmt.Sprintf("epel-release-latest-%s.noarch.rpm", epelMajor)
+	epelURL := fmt.Sprintf("https://dl.fedoraproject.org/pub/epel/%s", epelRPM)
+	"chroot", "/rootfs", "curl", "-L", epelURL, "-o", filepath.Join("/tmp", epelRPM),
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e-1of2/mcd_test.go` around lines 1180 - 1181, The test currently
hardcodes the EPEL9 RPM URL in the chroot curl command; instead detect the node
OS major version at runtime (reuse the same version-detection logic/pattern used
elsewhere in this test suite), set an epelURL variable to either the
epel-release-latest-9 or epel-release-latest-10 RPM accordingly, and replace the
literal "https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm"
in the chroot curl invocation with that epelURL; locate the chroot/curl
invocation in this test (the array containing "chroot", "/rootfs", "curl", "-L",
"...", "-o", "/tmp/epel-release-latest-9.noarch.rpm") and update surrounding
test setup to compute epelURL from the node OS version before using it.

Comment on lines +174 to +181
// Get a worker node and determine which OS it is running so that the correct Containerfile can be used.
workerNode := NewNodeList(oc.AsAdmin()).GetAllWorkerNodesOrFail()[0]
rhelVersion, err := workerNode.GetRHELVersion()
o.Expect(err).NotTo(o.HaveOccurred())

if !strings.Contains(rhelVersion, "10.") {
containerFileContent = rhel9ContainerFileContent
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Detect RHEL version from the selected MCP, not from an arbitrary worker node.

This test chooses a topology-compatible pool via GetCompactCompatiblePool, but OS detection is taken from the first worker node. That can fail on workerless layouts and can also drift from the pool actually under test.

Suggested fix
-		workerNode := NewNodeList(oc.AsAdmin()).GetAllWorkerNodesOrFail()[0]
-		rhelVersion, err := workerNode.GetRHELVersion()
+		targetNode := mcp.GetSortedNodesOrFail()[0]
+		rhelVersion, err := targetNode.GetRHELVersion()
 		o.Expect(err).NotTo(o.HaveOccurred())

As per coding guidelines, tests should avoid assumptions about specific node roles/topologies (e.g., separate worker nodes) unless explicitly protected.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Get a worker node and determine which OS it is running so that the correct Containerfile can be used.
workerNode := NewNodeList(oc.AsAdmin()).GetAllWorkerNodesOrFail()[0]
rhelVersion, err := workerNode.GetRHELVersion()
o.Expect(err).NotTo(o.HaveOccurred())
if !strings.Contains(rhelVersion, "10.") {
containerFileContent = rhel9ContainerFileContent
}
// Get a worker node and determine which OS it is running so that the correct Containerfile can be used.
targetNode := mcp.GetSortedNodesOrFail()[0]
rhelVersion, err := targetNode.GetRHELVersion()
o.Expect(err).NotTo(o.HaveOccurred())
if !strings.Contains(rhelVersion, "10.") {
containerFileContent = rhel9ContainerFileContent
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/extended-priv/mco_ocb.go` around lines 174 - 181, The test currently
reads RHEL from an arbitrary worker node via
NewNodeList(oc.AsAdmin()).GetAllWorkerNodesOrFail()[0].GetRHELVersion(), which
can fail on workerless layouts and drift from the MCP under test; instead call
the existing GetCompactCompatiblePool to get the target pool name, then pick a
node that belongs to that pool (e.g., filter nodes from
NewNodeList(oc.AsAdmin()) by node.MachineConfigPool or equivalent property) and
call GetRHELVersion() on that node; replace the direct
GetAllWorkerNodesOrFail()[0] usage with logic that finds a node in the MCP
returned by GetCompactCompatiblePool so OS detection matches the pool under
test.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 19, 2026

@cheesesashimi: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@cheesesashimi
Copy link
Copy Markdown
Member Author

/payload-job periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-aws-mco-disruptive-techpreview-3of3

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 20, 2026

@cheesesashimi: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-machine-config-operator-release-5.0-periodics-e2e-aws-mco-disruptive-techpreview-3of3

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/89f61b60-548f-11f1-93d6-13081f380bb4-0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants