Skip to content

DNM: CNTRLPLANE-3612: Add dual-stream RHEL 9/10 NodePool support#8714

Open
hypershift-jira-solve-ci[bot] wants to merge 5 commits into
openshift:mainfrom
hypershift-community:fix-CNTRLPLANE-3612
Open

DNM: CNTRLPLANE-3612: Add dual-stream RHEL 9/10 NodePool support#8714
hypershift-jira-solve-ci[bot] wants to merge 5 commits into
openshift:mainfrom
hypershift-community:fix-CNTRLPLANE-3612

Conversation

@hypershift-jira-solve-ci

@hypershift-jira-solve-ci hypershift-jira-solve-ci Bot commented Jun 10, 2026

Copy link
Copy Markdown

What this PR does / why we need it:

Implements dual-stream RHEL 9/10 NodePool support for OpenShift 5.0+ payloads.

Starting with OpenShift 5.0, release payloads carry both RHEL 9 and RHEL 10 boot images. This PR threads the OS stream selection end-to-end: from the NodePool spec through the controller, token secret, ignition server, and into MCC bootstrap manifests.

Key changes:

  • Boot image metadata parsing (support/releaseinfo): DeserializeImageMetadata now parses both the legacy single-stream "stream" key and the new multi-stream "streams" key from the boot image ConfigMap. A StreamMetadataForStream helper resolves stream-specific metadata with fallback to legacy.

  • RHEL stream resolution (nodepool/rhel_stream.go): Pure function getRHELStream() resolves the target RHEL stream based on spec.osImageStream.name, release version, and runc usage. RHEL 10 is the default for >= 5.0; runc forces RHEL 9 (RHEL 10 does not ship runc). Pre-5.0 releases use legacy single-stream behavior.

  • Controller plumbing (nodepool/): The resolved stream flows into rolloutConfig for hash computation (only explicit stream changes trigger rollouts), into defaultNodePoolAMI/defaultNodePoolGCPImage for stream-aware boot image selection, and into the token secret as os-stream for the ignition server.

  • Ignition server (ignition-server/): GetPayload() accepts the OS stream and writes an OSImageStream CR (99_osimagestream.yaml) to the MCC manifest directory, telling MCC bootstrap which RHEL stream to use when rendering MachineConfigs.

  • Status reporting (nodepool/version.go): status.osImageStream is populated by detecting the RHEL stream from CAPI Machine NodeInfo.OSImage (RHCOS 4xx = RHEL 9, 5xx = RHEL 10).

  • Validation conditions: rhel-10 on release < 5.0 and rhel-10 + runc are rejected with clear condition messages.

Which issue(s) this PR fixes:

Fixes https://issues.redhat.com/browse/CNTRLPLANE-3612

Special notes for your reviewer:

  • The rhelStream field is included in rolloutConfig.Hash() but only populated from the explicit spec.osImageStream.name (not the resolved default). This avoids fleet-wide rollouts when upgrading to 5.0 — existing NodePools silently pick up the default stream without a rollout.
  • RHEL 10 does not ship runc. When a NodePool uses ContainerRuntimeConfig with defaultRuntime=runc, the controller automatically falls back to RHEL 9 even on >= 5.0 releases.
  • Boot image ConfigMap parsing sorts map keys deterministically when the legacy "stream" key is absent, ensuring stable behavior.

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Always review AI generated responses prior to use.
Generated with Claude Code via /jira:solve [CNTRLPLANE-3612](https://redhat.atlassian.net/browse/CNTRLPLANE-3612)

Summary by CodeRabbit

  • New Features

    • Support for selecting and validating RHEL 9 vs RHEL 10 OS image streams, propagated into payloads and tokens
    • Release metadata handling extended to multi-stream formats and fixtures
  • Bug Fixes / Improvements

    • More robust image/AMI selection and clearer errors when stream metadata is missing
    • Rollout/hash computation now accounts for chosen OS image stream for deterministic updates
  • Tests

    • Added unit tests for RHEL stream resolution, detection, manifest writing, and hashing impacts

Update DeserializeImageMetadata to parse both the legacy single-stream
"stream" key and the new multi-stream "streams" key from the boot image
ConfigMap. The "streams" key is a JSON map keyed by stream name (e.g.,
"rhel-9", "rhel-10") introduced in OpenShift 5.0 payloads.

Add StreamsMetadata field to ReleaseImage and a StreamMetadataForStream
helper method that looks up stream-specific metadata, falling back to
the legacy single-stream metadata when the requested stream is empty or
not found. This enables per-NodePool RHEL stream selection for boot
image resolution.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@openshift-merge-bot

Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 10, 2026
@openshift-ci-robot

openshift-ci-robot commented Jun 10, 2026

Copy link
Copy Markdown

@hypershift-jira-solve-ci[bot]: This pull request references CNTRLPLANE-3612 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

What this PR does / why we need it:

Implements dual-stream RHEL 9/10 NodePool support for OpenShift 5.0+ payloads.

Starting with OpenShift 5.0, release payloads carry both RHEL 9 and RHEL 10 boot images. This PR threads the OS stream selection end-to-end: from the NodePool spec through the controller, token secret, ignition server, and into MCC bootstrap manifests.

Key changes:

  • Boot image metadata parsing (support/releaseinfo): DeserializeImageMetadata now parses both the legacy single-stream "stream" key and the new multi-stream "streams" key from the boot image ConfigMap. A StreamMetadataForStream helper resolves stream-specific metadata with fallback to legacy.

  • RHEL stream resolution (nodepool/rhel_stream.go): Pure function getRHELStream() resolves the target RHEL stream based on spec.osImageStream.name, release version, and runc usage. RHEL 10 is the default for >= 5.0; runc forces RHEL 9 (RHEL 10 does not ship runc). Pre-5.0 releases use legacy single-stream behavior.

  • Controller plumbing (nodepool/): The resolved stream flows into rolloutConfig for hash computation (only explicit stream changes trigger rollouts), into defaultNodePoolAMI/defaultNodePoolGCPImage for stream-aware boot image selection, and into the token secret as os-stream for the ignition server.

  • Ignition server (ignition-server/): GetPayload() accepts the OS stream and writes an OSImageStream CR (99_osimagestream.yaml) to the MCC manifest directory, telling MCC bootstrap which RHEL stream to use when rendering MachineConfigs.

  • Status reporting (nodepool/version.go): status.osImageStream is populated by detecting the RHEL stream from CAPI Machine NodeInfo.OSImage (RHCOS 4xx = RHEL 9, 5xx = RHEL 10).

  • Validation conditions: rhel-10 on release < 5.0 and rhel-10 + runc are rejected with clear condition messages.

Which issue(s) this PR fixes:

Fixes https://issues.redhat.com/browse/CNTRLPLANE-3612

Special notes for your reviewer:

  • The rhelStream field is included in rolloutConfig.Hash() but only populated from the explicit spec.osImageStream.name (not the resolved default). This avoids fleet-wide rollouts when upgrading to 5.0 — existing NodePools silently pick up the default stream without a rollout.
  • RHEL 10 does not ship runc. When a NodePool uses ContainerRuntimeConfig with defaultRuntime=runc, the controller automatically falls back to RHEL 9 even on >= 5.0 releases.
  • Boot image ConfigMap parsing sorts map keys deterministically when the legacy "stream" key is absent, ensuring stable behavior.

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Always review AI generated responses prior to use.
Generated with Claude Code via /jira:solve [CNTRLPLANE-3612](https://redhat.atlassian.net/browse/CNTRLPLANE-3612)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 63717372-7172-47fd-8c45-95b343c63967

📥 Commits

Reviewing files that changed from the base of the PR and between 8a6804f and 1868128.

📒 Files selected for processing (11)
  • hypershift-operator/controllers/nodepool/config.go
  • hypershift-operator/controllers/nodepool/config_test.go
  • hypershift-operator/controllers/nodepool/kubevirt/kubevirt.go
  • hypershift-operator/controllers/nodepool/openstack/openstack.go
  • hypershift-operator/controllers/nodepool/powervs.go
  • hypershift-operator/controllers/nodepool/rhel_stream.go
  • hypershift-operator/controllers/nodepool/rhel_stream_test.go
  • hypershift-operator/controllers/nodepool/version.go
  • support/releaseinfo/fixtures/5.0-dual-stream-bootimages.yaml
  • support/releaseinfo/fixtures/fixtures.go
  • support/releaseinfo/releaseinfo_test.go
✅ Files skipped from review due to trivial changes (1)
  • support/releaseinfo/fixtures/5.0-dual-stream-bootimages.yaml
🚧 Files skipped from review as they are similar to previous changes (3)
  • hypershift-operator/controllers/nodepool/rhel_stream.go
  • hypershift-operator/controllers/nodepool/version.go
  • hypershift-operator/controllers/nodepool/config.go

📝 Walkthrough

Walkthrough

This PR adds multi-stream CoreOS metadata support, a getRHELStream resolver with version/runtime gating, and threads NodePool.Spec.OSImageStream through ConfigGenerator, token creation, and ignition payload generation. Release metadata parsing, provider wiring, AMI/image lookups, rollout hash inputs, token secret keys, ignition manifest writing, machine-based OS stream inference, and tests/fixtures were updated accordingly.

Possibly related PRs

  • openshift/hypershift#8675: Adds the osImageStream field to NodePoolSpec and introduces the OSStreams feature gate, which this PR consumes and threads through AMI selection, token generation, and ignition payloads.

Suggested reviewers

  • devguyio
  • muraee
🚥 Pre-merge checks | ✅ 10 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Test Structure And Quality ⚠️ Warning Test assertions lack meaningful failure messages. Criterion 4 requires failure messages (e.g., Expect(err).NotTo(HaveOccurred(), "context message")), but version_test.go and local_ignitionprovide... Add meaningful diagnostic messages to all Expect assertions. Example: change g.Expect(result).To(Equal(tt.expected)) to `g.Expect(result).To(Equal(tt.expected), "detectRHELStreamFromOSImage(%q) should return %q", tt.osImage, tt.expecte...
✅ Passed checks (10 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'DNM: CNTRLPLANE-3612: Add dual-stream RHEL 9/10 NodePool support' clearly and specifically describes the main change (adding dual-stream RHEL 9/10 support), references the Jira ticket, and uses 'DNM' (Do Not Merge) to indicate hold status.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed The PR adds test code using the standard Go testing library (func TestXXX), not Ginkgo. The check targets Ginkgo tests (It(), Describe(), etc.), so it is not applicable here.
Topology-Aware Scheduling Compatibility ✅ Passed PR modifies NodePool RHEL stream handling (metadata parsing, controller logic, token/ignition plumbing, status reporting) but introduces no scheduling constraints, pod affinity, topology spread rul...
Ipv6 And Disconnected Network Test Compatibility ✅ Passed No Ginkgo e2e tests were added in this PR. All new/modified tests are standard Go unit tests (using testing.T) in controller packages, not e2e tests. Custom check does not apply.
No-Weak-Crypto ✅ Passed PR introduces no weak cryptography. Hash functions use FNV-1a (not MD5/SHA1), no custom crypto implementations added, no unsafe secret comparisons. All cryptographic usage is consistent with existi...
Container-Privileges ✅ Passed No container privilege escalation configurations found. PR contains Go source code and tests for RHEL stream support; the only manifest created is an OSImageStream CR with no security context or pr...
No-Sensitive-Data-In-Logs ✅ Passed All new logging statements in the PR log only non-sensitive information: RHEL stream names (rhel-9, rhel-10), RHEL version numbers, architecture identifiers, and controller operation results. No pa...

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

Review ran into problems

🔥 Problems

Stopped waiting for pipeline failures after 30000ms. One of your pipelines takes longer than our 30000ms fetch window to run, so review may not consider pipeline-failure results for inline comments if any failures occurred after the fetch window. Increase the timeout if you want to wait longer or run a @coderabbit review after the pipeline has finished.


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot requested review from devguyio and muraee June 10, 2026 12:25
@openshift-ci openshift-ci Bot added area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release area/platform/aws PR/issue for AWS (AWSPlatform) platform and removed do-not-merge/needs-area labels Jun 10, 2026
@codecov

codecov Bot commented Jun 10, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 57.84314% with 86 lines in your changes missing coverage. Please review.
✅ Project coverage is 41.57%. Comparing base (4755e9c) to head (1868128).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
...rshift-operator/controllers/nodepool/conditions.go 0.00% 26 Missing ⚠️
support/releaseinfo/deserialize.go 26.66% 10 Missing and 1 partial ⚠️
hypershift-operator/controllers/nodepool/token.go 40.00% 6 Missing and 3 partials ⚠️
...ypershift-operator/controllers/nodepool/version.go 84.74% 6 Missing and 3 partials ⚠️
...erator/controllers/nodepool/openstack/openstack.go 0.00% 4 Missing and 2 partials ⚠️
...ition-server/controllers/local_ignitionprovider.go 62.50% 6 Missing ⚠️
...erator/controllers/nodepool/nodepool_controller.go 75.00% 2 Missing and 2 partials ⚠️
support/releaseinfo/registryclient_provider.go 0.00% 4 Missing ⚠️
hypershift-operator/controllers/nodepool/config.go 57.14% 2 Missing and 1 partial ⚠️
...operator/controllers/nodepool/kubevirt/kubevirt.go 0.00% 2 Missing and 1 partial ⚠️
... and 3 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8714      +/-   ##
==========================================
+ Coverage   41.54%   41.57%   +0.03%     
==========================================
  Files         758      759       +1     
  Lines       93838    94016     +178     
==========================================
+ Hits        38986    39090     +104     
- Misses      52107    52169      +62     
- Partials     2745     2757      +12     
Files with missing lines Coverage Δ
...shift-operator/controllers/nodepool/rhel_stream.go 100.00% <100.00%> (ø)
...ition-server/controllers/tokensecret_controller.go 59.18% <100.00%> (+0.16%) ⬆️
support/releaseinfo/registry_mirror_provider.go 45.83% <100.00%> (+2.35%) ⬆️
support/releaseinfo/releaseinfo.go 47.59% <100.00%> (+1.73%) ⬆️
hypershift-operator/controllers/nodepool/aws.go 69.85% <50.00%> (ø)
ignition-server/cmd/run_local_ignitionprovider.go 0.00% <0.00%> (ø)
hypershift-operator/controllers/nodepool/config.go 84.51% <57.14%> (-1.01%) ⬇️
...operator/controllers/nodepool/kubevirt/kubevirt.go 68.70% <0.00%> (-0.53%) ⬇️
...ypershift-operator/controllers/nodepool/powervs.go 0.00% <0.00%> (ø)
...erator/controllers/nodepool/nodepool_controller.go 43.37% <75.00%> (+0.30%) ⬆️
... and 7 more
Flag Coverage Δ
cmd-support 34.97% <46.42%> (+<0.01%) ⬆️
cpo-hostedcontrolplane 43.59% <ø> (ø)
cpo-other 43.17% <ø> (ø)
hypershift-operator 51.66% <59.23%> (+0.03%) ⬆️
other 31.62% <63.15%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
hypershift-operator/controllers/nodepool/version_test.go (1)

184-304: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add test coverage for OSImageStream status field in setNodesInfoStatus tests.

The new OSImageStream status field (set at version.go:103-108) is not verified by any of the three existing test cases. The test at lines 215-251 creates a Machine with NodeInfo, but expectedNodesInfo at line 246 only checks NodeVersions. Similarly, the other test cases don't verify the OSImageStream field.

Add OSImageStream expectations to the test cases to ensure the integration path correctly populates the status field from Machine OSImage data.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@hypershift-operator/controllers/nodepool/version_test.go` around lines 184 -
304, The tests for setNodesInfoStatus don't assert the new OSImageStream status
field: update each test case (the ones constructing machines and
expectedNodesInfo in TestSetNodesInfoStatus) to include
Machine.Status.NodeInfo.OSImage (add OSImage value on the Machine with NodeInfo
where appropriate) and add the corresponding expected OSImageStream entries to
the expectedNodesInfo for each case (for the "machines exist" case set the
expected OSImageStream entry matching the Machine's OSImage; for the cases that
should clear status set expectedNodesInfo.OSImageStreams to empty). Ensure you
update the expectedNodesInfo structure used in the g.Expect comparison so
setNodesInfoStatus (the function under test) is validated for OSImageStream
population.
🧹 Nitpick comments (2)
hypershift-operator/controllers/nodepool/nodepool_controller.go (2)

764-786: ⚡ Quick win

Include requested stream in error message.

When streamMeta is nil (line 773), the error message mentions the architecture but not the requested stream. For consistency with the recommendation on defaultNodePoolAMI and to aid debugging, include the stream name when it's non-empty.

📝 Proposed fix
 	streamMeta := releaseImage.StreamMetadataForStream(s)
 	if streamMeta == nil {
-		return "", fmt.Errorf("release image stream metadata is nil, cannot determine GCP image for architecture %q", specifiedArch)
+		if s != "" {
+			return "", fmt.Errorf("release image has no metadata for stream %q, cannot determine GCP image for architecture %q", s, specifiedArch)
+		}
+		return "", fmt.Errorf("release image stream metadata is nil, cannot determine GCP image for architecture %q", specifiedArch)
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@hypershift-operator/controllers/nodepool/nodepool_controller.go` around lines
764 - 786, The error returned when streamMeta is nil in defaultNodePoolGCPImage
doesn't include the requested stream; update the error to include the stream
(variable s) when non-empty so callers can see which stream was requested (e.g.,
change the fmt.Errorf call that currently reports only the architecture to also
include the stream name), mirroring the behaviour in defaultNodePoolAMI and
keeping the existing architecture message.

739-761: ⚡ Quick win

Include requested stream in error message.

When streamMeta is nil (line 745), the error message doesn't mention which stream was requested. When debugging missing stream metadata (e.g., a payload lacking rhel-10 images), knowing the requested stream would speed diagnosis.

📝 Proposed fix
 	streamMeta := releaseImage.StreamMetadataForStream(s)
 	if streamMeta == nil {
-		return "", fmt.Errorf("release image stream metadata is nil")
+		if s != "" {
+			return "", fmt.Errorf("release image has no metadata for stream %q", s)
+		}
+		return "", fmt.Errorf("release image stream metadata is nil")
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@hypershift-operator/controllers/nodepool/nodepool_controller.go` around lines
739 - 761, The error returned in defaultNodePoolAMI when streamMeta is nil
doesn't include which stream was requested; update the error to include the
requested stream value (the local variable s / the first variadic stream
argument) so the message reads something like "release image stream metadata is
nil for stream %q" — modify the nil-check that currently returns
fmt.Errorf("release image stream metadata is nil") to include s for better
diagnostics.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@hypershift-operator/controllers/nodepool/version.go`:
- Around line 103-108: The OSImageStream status is only updated when
osImageStreamFromMachines returns a non-empty string, leaving stale data
otherwise; modify the block around osImageStreamFromMachines(machines) so that
nodePool.Status.OSImageStream is always assigned: if stream != "" set
nodePool.Status.OSImageStream = hyperv1.OSImageStreamReference{Name: stream}
else clear it by assigning an empty/zero value (matching the pattern used for
NodesInfo at lines 99-101) so stale values are removed when no stream is
detected.

---

Outside diff comments:
In `@hypershift-operator/controllers/nodepool/version_test.go`:
- Around line 184-304: The tests for setNodesInfoStatus don't assert the new
OSImageStream status field: update each test case (the ones constructing
machines and expectedNodesInfo in TestSetNodesInfoStatus) to include
Machine.Status.NodeInfo.OSImage (add OSImage value on the Machine with NodeInfo
where appropriate) and add the corresponding expected OSImageStream entries to
the expectedNodesInfo for each case (for the "machines exist" case set the
expected OSImageStream entry matching the Machine's OSImage; for the cases that
should clear status set expectedNodesInfo.OSImageStreams to empty). Ensure you
update the expectedNodesInfo structure used in the g.Expect comparison so
setNodesInfoStatus (the function under test) is validated for OSImageStream
population.

---

Nitpick comments:
In `@hypershift-operator/controllers/nodepool/nodepool_controller.go`:
- Around line 764-786: The error returned when streamMeta is nil in
defaultNodePoolGCPImage doesn't include the requested stream; update the error
to include the stream (variable s) when non-empty so callers can see which
stream was requested (e.g., change the fmt.Errorf call that currently reports
only the architecture to also include the stream name), mirroring the behaviour
in defaultNodePoolAMI and keeping the existing architecture message.
- Around line 739-761: The error returned in defaultNodePoolAMI when streamMeta
is nil doesn't include which stream was requested; update the error to include
the requested stream value (the local variable s / the first variadic stream
argument) so the message reads something like "release image stream metadata is
nil for stream %q" — modify the nil-check that currently returns
fmt.Errorf("release image stream metadata is nil") to include s for better
diagnostics.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 56af0f53-9b8b-4ec2-8d7a-2a7a7c424a1a

📥 Commits

Reviewing files that changed from the base of the PR and between 4755e9c and 6c3e7d1.

📒 Files selected for processing (20)
  • hypershift-operator/controllers/nodepool/aws.go
  • hypershift-operator/controllers/nodepool/conditions.go
  • hypershift-operator/controllers/nodepool/config.go
  • hypershift-operator/controllers/nodepool/nodepool_controller.go
  • hypershift-operator/controllers/nodepool/rhel_stream.go
  • hypershift-operator/controllers/nodepool/rhel_stream_test.go
  • hypershift-operator/controllers/nodepool/token.go
  • hypershift-operator/controllers/nodepool/version.go
  • hypershift-operator/controllers/nodepool/version_test.go
  • ignition-server/cmd/run_local_ignitionprovider.go
  • ignition-server/controllers/local_ignitionprovider.go
  • ignition-server/controllers/local_ignitionprovider_test.go
  • ignition-server/controllers/tokensecret_controller.go
  • ignition-server/controllers/tokensecret_controller_test.go
  • support/releaseinfo/deserialize.go
  • support/releaseinfo/deserialize_test.go
  • support/releaseinfo/registry_mirror_provider.go
  • support/releaseinfo/registryclient_provider.go
  • support/releaseinfo/releaseinfo.go
  • support/releaseinfo/releaseinfo_test.go

Comment thread hypershift-operator/controllers/nodepool/version.go
@enxebre

enxebre commented Jun 10, 2026

Copy link
Copy Markdown
Member

/hold

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 10, 2026
@enxebre enxebre changed the title CNTRLPLANE-3612: Add dual-stream RHEL 9/10 NodePool support DNM: CNTRLPLANE-3612: Add dual-stream RHEL 9/10 NodePool support Jun 10, 2026
OpenShift CI Bot and others added 3 commits June 10, 2026 14:04
Implement the controller-level plumbing for dual-stream RHEL 9/10
NodePool support, threading the OS stream selection from the NodePool
spec through the token secret to the ignition server.

Key changes:

- Add getRHELStream() pure function that resolves the RHEL stream for a
  NodePool based on spec.osImageStream.name, release version, and runc
  usage. Returns "rhel-9"/"rhel-10" or "" for legacy single-stream.

- Add usesRunc detection to ConfigGenerator by inspecting
  ContainerRuntimeConfig CRs for defaultRuntime=runc during config
  parsing. RHEL 10 does not ship runc, so runc usage forces RHEL 9.

- Add rhelStream to rolloutConfig and include in Hash()/HashWithoutVersion()
  to trigger rollouts when the explicit stream changes. Only populated
  from spec.osImageStream.name (not the resolved default) to avoid
  fleet-wide rollouts on upgrade.

- Write os-stream to token secret data for the ignition server to read.

- Extend validMachineConfigCondition with RHEL stream validation:
  rhel-10 on < 5.0 and rhel-10+runc are rejected with clear messages.

- Update defaultNodePoolAMI and defaultNodePoolGCPImage to accept an
  optional stream parameter for stream-aware boot image resolution.

- Report status.osImageStream by detecting the RHEL stream from CAPI
  Machine NodeInfo.OSImage (RHCOS version 4xx=RHEL 9, 5xx=RHEL 10).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Thread the OS stream selection through the ignition server pipeline:

- Add osStream parameter to the IgnitionProvider.GetPayload() interface.
  The TokenSecretReconciler reads the os-stream key from the token
  secret and passes it to GetPayload().

- In LocalIgnitionProvider.GetPayload(), after copyMCOOutputToMCC() and
  before runMCC(), write 99_osimagestream.yaml to the MCC manifest
  directory. This OSImageStream CR tells the MCC bootstrap which RHEL
  stream to select when rendering MachineConfigs. When osStream is
  empty (pre-5.0 or feature gate disabled), no CR is written and the
  MCC uses the default BaseOSContainerImage from ControllerConfig.

This completes the end-to-end flow: NodePool spec -> token secret ->
ignition server -> MCC bootstrap -> correct osImageURL in MachineConfig.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add comprehensive unit tests covering all new dual-stream functions:

- TestGetRHELStream: 11 test cases covering all decision branches
  (explicit/implicit streams, runc guard, version thresholds, pre-release
  metadata stripping).

- TestDetectRHELStreamFromOSImage: 7 test cases for RHCOS version
  string parsing (RHEL 9 4xx, RHEL 10 5xx, edge cases).

- TestOsImageStreamFromMachines: 7 test cases for majority-based stream
  detection from CAPI Machines (unanimous, split, nil NodeInfo).

- TestWriteOSImageStreamManifest: 2 test cases verifying correct YAML
  generation of the OSImageStream CR for both rhel-9 and rhel-10.

All test names follow the HyperShift "When...it should..." convention.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
// discover constituent component image information.
type ReleaseImage struct {
*imageapi.ImageStream `json:",inline"`
StreamMetadata *CoreOSStreamMetadata `json:"streamMetadata"`

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we instead use the upstream types for github.com/coreos/stream-metadata-go

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion. I checked the upstream github.com/coreos/stream-metadata-go/stream types and they cover most of what we need: StreamCoreOSStreamMetadata, plus AwsImage, GcpImage, ContainerImage (KubeVirt), ReplicatedObject (PowerVS/IBMCloud) etc.

However, our custom types include Azure-specific extensions (CoreRHCOSImage, CoreAzureDisk, CoreMarketplace, CoreAzureMarketplaceImage) that don't exist in the upstream package — Azure marketplace data is accessed via rhcos.Extensions which has a different structure than what we parse today. Adopting upstream types would require:

  1. Adding the module as a dependency
  2. Updating all ~15 consumers to use upstream type names and field accessors
  3. Reworking the Azure marketplace path through rhcos.Extensions
  4. Verifying JSON tag compatibility (some differ: sha256 vs SHA256, etc.)

I'd prefer to do this as a follow-up PR to keep this one focused on the dual-stream plumbing. I'll create a JIRA for it. Sound reasonable?


AI-assisted response via Claude Code

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. The upstream github.com/coreos/stream-metadata-go package provides types like stream.Stream and stream.Arch that map to our custom CoreOSStreamMetadata and CoreOSArchitecture.

However, adopting the upstream types requires careful evaluation:

  1. The upstream types may include fields we don't need, increasing API surface
  2. Our types have custom JSON tags and HyperShift-specific nested structs (e.g., CoreOSGCPImage, CoreAzureMarketplace) that don't exist upstream
  3. The upstream types would introduce a new dependency that needs to be tracked for compatibility

I'd propose handling this as a follow-up PR to keep this one focused on the dual-stream logic. We can evaluate which upstream types map cleanly and adopt them incrementally. Would that work for you?


AI-assisted response via Claude Code

// TestReleaseInfoKubeVirt tests validates the presence of the kubevirt images
func TestReleaseInfoKubeVirt(t *testing.T) {
metadata, err := DeserializeImageMetadata(fixtures.CoreOSBootImagesYAML_4_10)
metadata, _, err := DeserializeImageMetadata(fixtures.CoreOSBootImagesYAML_4_10)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we please have unit tests with a stub for dualstream

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have dual-stream unit tests using a 5.0-dual-stream-bootimages.yaml fixture (embedded via fixtures.CoreOSDualStreamBootImagesYAML_5_0):

  • TestDeserializeDualStreamImageMetadata — verifies legacy single-stream parsing, multi-stream parsing (rhel-9/rhel-10), correct stream names, and per-stream AWS AMI differentiation
  • TestStreamMetadataForStream — verifies stream lookup for empty, known (rhel-10), and unknown stream names with correct fallback behavior

These are in support/releaseinfo/releaseinfo_test.go (lines 165–262). Would you like additional coverage for specific scenarios beyond what these already test?


AI-assisted response via Claude Code

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dual-stream deserialization is already covered by tests in this file:

  • TestDeserializeDualStreamImageMetadata (line 165): parses the dual-stream fixture, validates both rhel-9 and rhel-10 streams exist, verifies stream names, and checks per-stream AWS AMIs are different
  • TestStreamMetadataForStream (line 229): validates StreamMetadataForStream() returns the correct metadata for empty, known, and unknown stream names

For the TestReleaseInfoKubeVirt test at this line, the change is minimal — it only updates the call to DeserializeImageMetadata to accept the new 3-value return (adding _ for streamsMeta). The test still validates KubeVirt images from the 4.10 single-stream fixture.

Would you like additional tests specifically for KubeVirt/PowerVS with dual-stream data? The current dual-stream fixture (CoreOSDualStreamBootImagesYAML_5_0) does include per-stream data for AWS/GCP, but we could extend it to include KubeVirt and PowerVS entries as well.


AI-assisted response via Claude Code

}
// Default behavior for Linux/RHCOS AMIs
ami, err := defaultNodePoolAMI(region, arch, releaseImage)
ami, err := defaultNodePoolAMI(region, arch, releaseImage, nodePool.Spec.OSImageStream.Name)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there other platforms like azure that need to update how they pick default?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good catch. Currently only AWS's defaultNodePoolAMI and defaultNodePoolGCPImage accept the stream parameter and use StreamMetadataForStream(). The following platforms still access releaseImage.StreamMetadata directly and would need updating:

  • Azure (azure.go:124) — getAzureMarketplaceMetadata uses releaseImage.StreamMetadata
  • GCP (gcp.go:171) — resolveGCPImage calls defaultNodePoolGCPImage without passing the stream name
  • KubeVirt (kubevirt/kubevirt.go:47) — uses releaseImage.StreamMetadata directly
  • PowerVS (powervs.go:112) — uses releaseImage.StreamMetadata directly
  • OpenStack (openstack/openstack.go:135,161) — uses releaseImage.StreamMetadata directly

For pre-5.0 releases this is fine (single stream = legacy StreamMetadata). For 5.0+ where StreamMetadata falls back to the first stream in the map, these platforms will still get a valid image — but they won't respect spec.osImageStream.name. I'll update them to use StreamMetadataForStream() consistently.


AI-assisted response via Claude Code

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good catch. Currently the following platforms need updating for dual-stream support:

  • GCP (gcp.go:171): defaultNodePoolGCPImage already accepts a stream ...string variadic, but resolveGCPImage doesn't pass nodePool.Spec.OSImageStream.Name yet
  • Azure (azure.go:39): defaultAzureNodePoolImage accesses releaseImage.StreamMetadata directly without going through StreamMetadataForStream()
  • PowerVS (powervs.go:112): getPowerVSImage accesses releaseImage.StreamMetadata.Architectures["ppc64le"] directly — though ppc64le may not have RHEL 10 images in the near term
  • KubeVirt (kubevirt.go:47): defaultImage accesses releaseImage.StreamMetadata directly

For this initial PR, I focused on AWS as the primary platform for RHEL 10 dual-stream. GCP and Azure should be updated to use StreamMetadataForStream() — I can add that in a follow-up or extend this PR if you'd prefer. PowerVS and KubeVirt are lower priority since ppc64le/s390x may not have RHEL 10 images initially.


AI-assisted response via Claude Code

return &ctrl.Result{}, fmt.Errorf("invalid RHEL stream configuration: %w", streamErr)
}
// Informational message when stream falls back to rhel-9 due to runc.
if nodePool.Spec.OSImageStream.Name == "" && stream == rhelStreamRHEL9 {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do we know it was due to runc here? couldn't it be the payload is < 5?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. When specStream == "", getRHELStream() returns exactly these values:

Release usesRunc Returns
< 5.0 false "" (empty — legacy)
< 5.0 true "" (empty — legacy)
>= 5.0 false "rhel-10" (default)
>= 5.0 true "rhel-9" (runc fallback)

So the condition specStream == "" && stream == rhelStreamRHEL9 can only be true when rv >= 5.0 && usesRunc — pre-5.0 payloads return empty string, not "rhel-9". The runc attribution in the message is correct.

That said, the code doesn't make this obvious at the call site. Would it be clearer to also check configGenerator.usesRunc explicitly in the condition?


AI-assisted response via Claude Code

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. Looking at getRHELStream() in rhel_stream.go, when specStream == "" (no explicit stream set), the possible return values are:

  1. rv >= 5.0 && usesRunc → returns "rhel-9" (this is the only path that returns "rhel-9" with empty specStream)
  2. rv >= 5.0 && !usesRunc → returns "rhel-10"
  3. rv < 5.0 → returns "" (legacy single-stream)

So the condition nodePool.Spec.OSImageStream.Name == "" && stream == rhelStreamRHEL9 can only be true via case 1 (runc fallback). Pre-5.0 payloads return "", not "rhel-9".

That said, the message could be clearer. I could add a comment explaining this reasoning, or make the condition more explicit by also checking configGenerator.usesRunc directly. Would that help?


AI-assisted response via Claude Code

cg.rolloutConfig.additionalTrustBundleName = hostedCluster.Spec.AdditionalTrustBundle.Name
}

// rhelStream is populated directly from spec.osImageStream.name (empty when unset).

@enxebre enxebre Jun 11, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we error here if userunC and rhelStream=10?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already handled downstream. In conditions.go:394, after the config is generated, we call:

```go
stream, streamErr := getRHELStream(nodePool.Spec.OSImageStream.Name, releaseVersion, configGenerator.usesRunc)
```

getRHELStream returns an error for rhelStream=rhel-10 + usesRunc:
```go
case specStream == rhelStreamRHEL10 && usesRunc:
return "", fmt.Errorf("OS stream %s is incompatible with default_runtime=runc; RHEL 10 does not ship runc", rhelStreamRHEL10)
```

The config generator itself populates `rhelStream` from the raw spec value at line 104 — it's just storing the user's input. The validation happens in validMachineConfigCondition which runs getRHELStream() to catch the incompatible combination and sets a clear condition message. So the validation is there, just in a different layer (conditions vs config).


AI-assisted response via Claude Code

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already handled by getRHELStream() in rhel_stream.go:36-37:

case specStream == rhelStreamRHEL10 && usesRunc:
    return "", fmt.Errorf("OS stream %s is incompatible with default_runtime=runc; RHEL 10 does not ship runc", rhelStreamRHEL10)

The validation happens in validMachineConfigCondition() (conditions.go:394) which calls getRHELStream() and sets NodePoolValidMachineConfigCondition to False when this error is returned. So the error is surfaced as a condition before we ever reach NewConfigGenerator.

The code at this line (config.go:101) runs during NewConfigGenerator, which is called before the getRHELStream() validation in conditions.go. However, cg.rhelStream here is only used for the hash — the actual incompatibility validation is handled separately. The usesRunc flag is set later during generateMCORawConfig (line 106) when parsing ContainerRuntimeConfig manifests.


AI-assisted response via Claude Code

// This is kept like this for now to contain the scope of the refactor and avoid backward compatibility issues.
func (cg *ConfigGenerator) Hash() string {
return supportutil.HashSimple(cg.mcoRawConfig + cg.releaseImage.Version() + cg.pullSecretName + cg.additionalTrustBundleName + cg.globalConfig)
return supportutil.HashSimple(cg.mcoRawConfig + cg.releaseImage.Version() + cg.pullSecretName + cg.additionalTrustBundleName + cg.globalConfig + cg.rhelStream)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have coverage to validate this doesn't change the current hash when rhelStream is empty? otherwise that would trigger a rollout

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When `rhelStream` is empty (the default), the hash input is:

```
mcoRawConfig + version + pullSecretName + additionalTrustBundleName + globalConfig + ""
```

Since `"abc" + ""` == `"abc"` in Go string concatenation, appending an empty string produces the same hash as before. No rollout is triggered.

That said, an explicit test proving this invariant would be valuable — I'll add a test case that computes the hash with `rhelStream=""` and verifies it matches the hash computed without the `rhelStream` field present at all.


AI-assisted response via Claude Code

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes — there's explicit test coverage for this in config_test.go:

TestHash (line 450):

{
    name:     "When rhelStream is empty it should not change the hash",
    ...
    rhelStream: "",
    expected:   baseCaseHash,  // identical to base case without rhelStream field
},

This test verifies that an empty rhelStream (the default when spec.osImageStream is unset) produces the exact same hash as the base case. Since rhelStream defaults to "" and string concatenation with "" is a no-op, the hash input is identical to pre-PR behavior.


AI-assisted response via Claude Code

// This is kept like this for now to contain the scope of the refactor and avoid backward compatibility issues.
func (cg *ConfigGenerator) HashWithoutVersion() string {
return supportutil.HashSimple(cg.mcoRawConfig + cg.pullSecretName + cg.additionalTrustBundleName)
return supportutil.HashSimple(cg.mcoRawConfig + cg.pullSecretName + cg.additionalTrustBundleName + cg.rhelStream)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have coverage to validate this doesn't change the current hash when rhelStream is empty? otherwise that would trigger a rollout

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same answer as above — when rhelStream is empty (the default/unset case), appending "" to the hash input string is a no-op: "abc" + "" == "abc". The hash is identical to the old code. I'll make the test cover both Hash() and HashWithoutVersion() to validate this invariant explicitly.


AI-assisted response via Claude Code

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above — TestHashWithoutVersion (line 576) has the same coverage:

{
    name:     "When rhelStream is empty it should not change the hash",
    ...
    rhelStream: "",
    expected:   baseCaseHash,  // "85234650" — identical to base case
},

Both Hash() and HashWithoutVersion() are covered. Empty rhelStream → no hash change → no rollout.


AI-assisted response via Claude Code

}
case *mcfgv1.ContainerRuntimeConfig:
// Detect runc usage for dual-stream RHEL compatibility
if obj.Spec.ContainerRuntimeConfig != nil &&

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this how the MCO validates this? please let this be a unit tested func

@enxebre

enxebre commented Jun 11, 2026

Copy link
Copy Markdown
Member

Code Review: PR #8714 — Dual-stream RHEL 9/10 NodePool Support

Reviewer: AI-assisted review (Claude Code with HyperShift SME agents)
Language: Go | Profile: HyperShift


Overall Verdict: FAIL (3 blocking issues)

The end-to-end architecture (spec → controller → token → ignition → MCC) is sound, and the hash stability design (explicit stream only in hash to avoid fleet-wide rollouts on 5.0 upgrade) is well-considered. Pure functions (getRHELStream, detectRHELStreamFromOSImage, osImageStreamFromMachines) are well-tested. However, there are a build failure, a data loss bug in disconnected environments, and a nil-safety issue that block merge.


Required Actions (Blocking)

1. Build failure — run_local_ignitionprovider.go:114

ignition-server/cmd/run_local_ignitionprovider.go:114:70: not enough arguments in call to p.GetPayload

The GetPayload interface gained a 7th osStream parameter, but RunLocalIgnitionProviderOptions.Run still passes only 6 arguments. Needs "" as the last argument.

2. RegistryMirrorProviderDecorator.Lookup drops StreamsMetadataregistry_mirror_provider.go:42

The Lookup method constructs a new ReleaseImage but does not propagate StreamsMetadata:

return &ReleaseImage{
    ImageStream:    imageStream,
    StreamMetadata: releaseImage.StreamMetadata,
    // StreamsMetadata: missing!
}

Any cluster using registry mirror overrides (common in disconnected/air-gapped environments) will silently lose multi-stream metadata. resolveStreamMetadata() will return an error when the user sets spec.osImageStream.name in these environments. Fix: add StreamsMetadata: releaseImage.StreamsMetadata.

3. PowerVS nil-safety — powervs.go:112

getPowerVSImage() accesses releaseImage.StreamMetadata.Architectures["ppc64le"] directly without going through resolveStreamMetadata(). If a future payload uses only the streams key (no legacy stream key), StreamMetadata will be nil, causing a panic. At minimum, add a nil check. Ideally, use the shared resolveStreamMetadata() helper.


Recommended Improvements (Non-blocking)

Architecture & Correctness

4. Inconsistent stream resolution across platforms — AWS/GCP use resolveStreamMetadata() which returns an error for unknown streams. Azure (azure.go:123-129) and KubeVirt (kubevirt/kubevirt.go:49-53) implement inline fallback that silently ignores unknown streams. All should use the shared helper for consistent error behavior.

5. Karpenter AMI path misses streamtoken.go:431 calls defaultNodePoolAMI(region, arch, releaseImage) without passing the stream name. Karpenter-managed NodePools will always get the legacy/default stream AMI, ignoring OSImageStream.

6. Non-deterministic map iteration in DeserializeImageMetadata fallback — When the stream key is absent but streams is present, the code iterates the streamsMeta map and returns the first entry. Map iteration in Go is non-deterministic. Should pick a deterministic default (e.g., sorted first key).

7. OpenStack nil-safetyOpenstackDefaultImage() and OpenStackReleaseImage() (openstack.go:135,158) access releaseImage.StreamMetadata.Architectures["x86_64"] directly. Same nil-safety risk as PowerVS.

Code Quality

8. Variadic stream ...string is unidiomatic (nodepool_controller.go) — defaultNodePoolAMI and defaultNodePoolGCPImage use variadic to make the parameter optional, but all callers pass 0 or 1 arguments. Use a plain string parameter where "" means "use default" — the resolveStreamMetadata helper already uses this pattern correctly.

9. detectRHELStreamFromOSImage uses raw string literals (version.go) — Returns "rhel-10" and "rhel-9" instead of the constants RHELStream10/RHELStream9 defined in the same package.

10. GetPayload has 7 positional string parameters — All string typed with no compile-time distinction. Consider refactoring to a PayloadRequest struct. At minimum, add a // TODO noting this debt.

11. Raw YAML for OSImageStream CR (local_ignitionprovider.go) — Uses fmt.Sprintf with %q to template YAML and hardcodes machineconfiguration.openshift.io/v1alpha1. No compile-time contract with the MCO type. The HyperShift API comment references v1, creating a version mismatch. Consider using a typed Go struct with proper serialization.

Test Gaps

12. Missing tests for key integration points:

Function File Gap
resolveStreamMetadata() nodepool_controller.go Zero tests — core dispatch logic for multi-stream
DeserializeMultiStreamImageMetadata() deserialize.go Zero tests — parses new multi-stream ConfigMap format
usesRunc detection config.go No test with defaultRuntime=runc verifying cg.usesRunc=true
defaultNodePoolAMI with stream nodepool_controller.go Variadic stream param never exercised
rhelStream in Hash config.go No test verifying hash changes when rhelStream changes
os-stream in token secret token.go No test verifying the key is written

Documentation

13. No forced migration path for existing RHEL 9 nodes — The implicit default switch from RHEL 9 to RHEL 10 on 5.0+ happens at the ignition payload level without changing the rollout hash. Existing nodes stay on RHEL 9; only new nodes get RHEL 10. This is likely intentional but should be documented explicitly in the enhancement.


Platform Coverage Summary

Platform Stream-aware? Status
AWS Correct via resolveStreamMetadata()
GCP Correct via resolveStreamMetadata()
Azure ⚠️ Inline resolution — should use shared helper
KubeVirt ⚠️ Inline resolution — should use shared helper
PowerVS Nil-safety risk (blocking)
OpenStack Nil-safety risk
Karpenter Stream argument not passed

What's Good

  • Hash stability design — Using the raw spec value (not resolved default) for the rollout hash avoids fleet-wide rollouts when upgrading to 5.0. Well-commented.
  • getRHELStream() is clean — Pure function with clear switch-case logic and comprehensive tests (10 cases).
  • Token secret flow — The os-stream key absence for pre-5.0 is handled correctly (nil → empty string → no manifest).
  • OSImageStream CR timing — Written after MCO but before MCC, which is the correct placement in the ignition pipeline.
  • usesRunc detection — Correctly re-initialized per reconcile via fresh ConfigGenerator construction.

Generated with Claude Code using HyperShift SME agents (control-plane, data-plane, cloud-provider, API, architect)


// rhelStream is populated directly from spec.osImageStream.name (empty when unset).
// Using the resolved value from getRHELStream() would inject implicit defaults into
// the hash, triggering fleet-wide rollouts on upgrade.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should only let this change the hash and trigger a rollout if the value is different from the one in status

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting suggestion. Let me walk through the current behavior and the tradeoff:

Current design: rhelStream in the hash comes directly from spec.osImageStream.name (empty when unset). This means:

  • Empty → empty (upgrade to 5.0, no explicit stream set): hash unchanged, no rollout ✅
  • Empty → "rhel-10" (user explicitly sets stream): hash changes, rollout triggered ✅
  • "rhel-9" → "rhel-10" (user changes stream): hash changes, rollout triggered ✅

Your suggestion (compare with status): If spec == status, skip from hash. This would prevent a rollout when the user explicitly sets the stream to what's already running (e.g., setting "rhel-10" on a node already running rhel-10).

The tradeoff: this would require passing nodePool.Status.OSImageStream.Name into the ConfigGenerator, adding a coupling between the config hash computation and status. Currently the config generator is deliberately a pure function of spec inputs.

Also, status.osImageStream is populated from CAPI Machine NodeInfo (detected from running nodes), not from the controller's intent. There could be a window where status lags — e.g., the user sets rhel-10 but nodes haven't been updated yet, so status still shows rhel-9.

For now, the only case that would cause a "spurious" rollout is when a user explicitly sets the stream to match what they're already running — which seems rare and arguably intentional (explicit declarative intent). Would you still like this changed, or is the current behavior acceptable?


AI-assisted response via Claude Code

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. The current design uses spec.osImageStream.name directly in the hash, which means:

  • Empty → empty string → hash unchanged → no rollout ✅
  • User sets "rhel-10" → hash changes → rollout ✅ (intended)
  • User sets "rhel-9" on a cluster already running rhel-9 → hash changes → rollout ⚠️ (unnecessary)

Your suggestion to compare spec vs status would avoid the unnecessary rollout in case 3. Implementation would look something like:

// Only include rhelStream in hash if it differs from the observed stream in status,
// preventing unnecessary rollouts when explicitly setting the current stream.
if nodePool.Spec.OSImageStream.Name != nodePool.Status.OSImageStream.Name {
    cg.rolloutConfig.rhelStream = nodePool.Spec.OSImageStream.Name
}

However, there's a subtle concern: this creates a dependency between the hash calculation and the status, which is normally populated asynchronously from observed machine state. If status hasn't been populated yet (e.g., new NodePool), we'd need to handle that edge case.

Would you like me to implement this approach? The tradeoff is: simpler code (current) vs. avoiding unnecessary rollouts when the user sets spec.osImageStream to match the already-running stream.


AI-assisted response via Claude Code

@enxebre

enxebre commented Jun 11, 2026

Copy link
Copy Markdown
Member

/address-review-comments

@github-actions

Copy link
Copy Markdown

🤖 Addressing review comments: workflow run

- Add nil-safety checks for StreamMetadata in PowerVS, OpenStack, and KubeVirt
- Extract runc detection to standalone containerRuntimeUsesRunc() function
- Use rhelStream constants in detectRHELStreamFromOSImage instead of raw strings
- Add hash stability tests for Hash() and HashWithoutVersion() with empty rhelStream
- Add unit tests for containerRuntimeUsesRunc with runc, crun, nil, and empty cases
- Add dual-stream fixture and tests for DeserializeImageMetadata and StreamMetadataForStream

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@openshift-ci openshift-ci Bot added area/platform/kubevirt PR/issue for KubeVirt (KubevirtPlatform) platform area/platform/openstack PR/issue for OpenStack (OpenStackPlatform) platform labels Jun 11, 2026
@openshift-ci

openshift-ci Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: hypershift-jira-solve-ci[bot]
Once this PR has been reviewed and has the lgtm label, please assign bryan-cox for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the area/platform/powervs PR/issue for PowerVS (PowerVSPlatform) platform label Jun 11, 2026
@hypershift-jira-solve-ci

hypershift-jira-solve-ci Bot commented Jun 11, 2026

Copy link
Copy Markdown
Author

Test Failure Analysis Complete

Job Information

Test Failure Analysis

Error

hypershift-operator/controllers/nodepool/rhel_stream.go:6:1: File is not properly formatted (gci)
hypershift-operator/controllers/nodepool/rhel_stream_test.go:7:1: File is not properly formatted (gci)

make: *** [Makefile:120: lint] Error 1

Summary

The lint job failed because two newly-added files (rhel_stream.go and rhel_stream_test.go) have their Go import statements in the wrong group order. The project uses the gci (Go Comment Import) linter with a strict custom group ordering defined in .golangci.yml. Both files place github.com/blang/semver (a third-party default group import) in the same import block as github.com/openshift/api/machineconfiguration/v1 (a prefix(github.com/openshift) group import). Per the project's gci configuration, these belong in separate groups separated by blank lines, with the github.com/openshift prefix group appearing before the default (third-party) group.

Root Cause

The project's .golangci.yml enforces a strict import group ordering via the gci linter with custom-order: true:

  1. standard — stdlib packages (fmt, strings, testing, etc.)
  2. dot — dot imports
  3. prefix(github.com/openshift/hypershift) — project-internal imports
  4. prefix(github.com/openshift) — OpenShift ecosystem imports
  5. prefix(github.com/aws) — AWS SDK imports
  6. prefix(github.com/Azure) — Azure SDK imports
  7. prefix(k8s.io) — Kubernetes imports
  8. prefix(sigs.k8s.io) — SIG imports
  9. default — all other third-party imports

Both new files group github.com/blang/semver (which falls into group 9, default) alongside github.com/openshift/api/machineconfiguration/v1 (which falls into group 4, prefix(github.com/openshift)) in a single import block. The gci linter requires these to be in separate blocks separated by a blank line, with the github.com/openshift import appearing first.

rhel_stream.go current (incorrect):

import (
	"fmt"

	"github.com/blang/semver"
	mcfgv1 "github.com/openshift/api/machineconfiguration/v1"
)

rhel_stream.go expected (correct):

import (
	"fmt"

	mcfgv1 "github.com/openshift/api/machineconfiguration/v1"

	"github.com/blang/semver"
)

The same pattern applies to rhel_stream_test.go.

Recommendations
  1. Fix the import ordering in both files by separating github.com/openshift/api/... and github.com/blang/semver into distinct import groups, with the openshift group first:

    rhel_stream.go — reorder to:

    import (
    	"fmt"
    
    	mcfgv1 "github.com/openshift/api/machineconfiguration/v1"
    
    	"github.com/blang/semver"
    )

    rhel_stream_test.go — reorder to:

    import (
    	"strings"
    	"testing"
    
    	mcfgv1 "github.com/openshift/api/machineconfiguration/v1"
    
    	"github.com/blang/semver"
    )
  2. Alternatively, auto-fix by running from the repo root:

    make lint-fix

    This invokes golangci-lint run --fix which will rewrite the import groups automatically.

Evidence
Evidence Detail
Lint errors gci reported 2 issues — both at the start of the import block (line 6 in rhel_stream.go, line 7 in rhel_stream_test.go)
gci config .golangci.yml specifies custom-order: true with prefix(github.com/openshift) (group 4) before default (group 9)
Offending pattern github.com/blang/semver (default group) and github.com/openshift/api/... (prefix(github.com/openshift) group) are in the same import block without a blank-line separator
Reference file nodepool_controller.go in the same package correctly separates github.com/openshift/hypershift/..., github.com/openshift/api/..., k8s.io/..., and third-party imports into distinct blank-line-separated groups
Exit code make lint exited with code 1 (Error 1), propagated as GitHub Actions exit code 2

@enxebre

enxebre commented Jun 11, 2026

Copy link
Copy Markdown
Member

/address-review-comments

@github-actions

Copy link
Copy Markdown

🤖 Addressing review comments: workflow run

@openshift-ci

openshift-ci Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

@hypershift-jira-solve-ci[bot]: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@enxebre

enxebre commented Jun 11, 2026

Copy link
Copy Markdown
Member

/test address-review-comments

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release area/platform/aws PR/issue for AWS (AWSPlatform) platform area/platform/kubevirt PR/issue for KubeVirt (KubevirtPlatform) platform area/platform/openstack PR/issue for OpenStack (OpenStackPlatform) platform area/platform/powervs PR/issue for PowerVS (PowerVSPlatform) platform do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants