DNM: CNTRLPLANE-3612: Add dual-stream RHEL 9/10 NodePool support#8714
DNM: CNTRLPLANE-3612: Add dual-stream RHEL 9/10 NodePool support#8714hypershift-jira-solve-ci[bot] wants to merge 5 commits into
Conversation
Update DeserializeImageMetadata to parse both the legacy single-stream "stream" key and the new multi-stream "streams" key from the boot image ConfigMap. The "streams" key is a JSON map keyed by stream name (e.g., "rhel-9", "rhel-10") introduced in OpenShift 5.0 payloads. Add StreamsMetadata field to ReleaseImage and a StreamMetadataForStream helper method that looks up stream-specific metadata, falling back to the legacy single-stream metadata when the requested stream is empty or not found. This enables per-NodePool RHEL stream selection for boot image resolution. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
@hypershift-jira-solve-ci[bot]: This pull request references CNTRLPLANE-3612 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository YAML (base), Central YAML (inherited) Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (11)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (3)
📝 WalkthroughWalkthroughThis PR adds multi-stream CoreOS metadata support, a getRHELStream resolver with version/runtime gating, and threads NodePool.Spec.OSImageStream through ConfigGenerator, token creation, and ignition payload generation. Release metadata parsing, provider wiring, AMI/image lookups, rollout hash inputs, token secret keys, ignition manifest writing, machine-based OS stream inference, and tests/fixtures were updated accordingly. Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 10 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (10 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning Review ran into problems🔥 ProblemsStopped waiting for pipeline failures after 30000ms. One of your pipelines takes longer than our 30000ms fetch window to run, so review may not consider pipeline-failure results for inline comments if any failures occurred after the fetch window. Increase the timeout if you want to wait longer or run a Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
hypershift-operator/controllers/nodepool/version_test.go (1)
184-304:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAdd test coverage for OSImageStream status field in setNodesInfoStatus tests.
The new OSImageStream status field (set at version.go:103-108) is not verified by any of the three existing test cases. The test at lines 215-251 creates a Machine with NodeInfo, but
expectedNodesInfoat line 246 only checksNodeVersions. Similarly, the other test cases don't verify the OSImageStream field.Add OSImageStream expectations to the test cases to ensure the integration path correctly populates the status field from Machine OSImage data.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@hypershift-operator/controllers/nodepool/version_test.go` around lines 184 - 304, The tests for setNodesInfoStatus don't assert the new OSImageStream status field: update each test case (the ones constructing machines and expectedNodesInfo in TestSetNodesInfoStatus) to include Machine.Status.NodeInfo.OSImage (add OSImage value on the Machine with NodeInfo where appropriate) and add the corresponding expected OSImageStream entries to the expectedNodesInfo for each case (for the "machines exist" case set the expected OSImageStream entry matching the Machine's OSImage; for the cases that should clear status set expectedNodesInfo.OSImageStreams to empty). Ensure you update the expectedNodesInfo structure used in the g.Expect comparison so setNodesInfoStatus (the function under test) is validated for OSImageStream population.
🧹 Nitpick comments (2)
hypershift-operator/controllers/nodepool/nodepool_controller.go (2)
764-786: ⚡ Quick winInclude requested stream in error message.
When
streamMetaisnil(line 773), the error message mentions the architecture but not the requested stream. For consistency with the recommendation ondefaultNodePoolAMIand to aid debugging, include the stream name when it's non-empty.📝 Proposed fix
streamMeta := releaseImage.StreamMetadataForStream(s) if streamMeta == nil { - return "", fmt.Errorf("release image stream metadata is nil, cannot determine GCP image for architecture %q", specifiedArch) + if s != "" { + return "", fmt.Errorf("release image has no metadata for stream %q, cannot determine GCP image for architecture %q", s, specifiedArch) + } + return "", fmt.Errorf("release image stream metadata is nil, cannot determine GCP image for architecture %q", specifiedArch) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@hypershift-operator/controllers/nodepool/nodepool_controller.go` around lines 764 - 786, The error returned when streamMeta is nil in defaultNodePoolGCPImage doesn't include the requested stream; update the error to include the stream (variable s) when non-empty so callers can see which stream was requested (e.g., change the fmt.Errorf call that currently reports only the architecture to also include the stream name), mirroring the behaviour in defaultNodePoolAMI and keeping the existing architecture message.
739-761: ⚡ Quick winInclude requested stream in error message.
When
streamMetaisnil(line 745), the error message doesn't mention which stream was requested. When debugging missing stream metadata (e.g., a payload lackingrhel-10images), knowing the requested stream would speed diagnosis.📝 Proposed fix
streamMeta := releaseImage.StreamMetadataForStream(s) if streamMeta == nil { - return "", fmt.Errorf("release image stream metadata is nil") + if s != "" { + return "", fmt.Errorf("release image has no metadata for stream %q", s) + } + return "", fmt.Errorf("release image stream metadata is nil") }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@hypershift-operator/controllers/nodepool/nodepool_controller.go` around lines 739 - 761, The error returned in defaultNodePoolAMI when streamMeta is nil doesn't include which stream was requested; update the error to include the requested stream value (the local variable s / the first variadic stream argument) so the message reads something like "release image stream metadata is nil for stream %q" — modify the nil-check that currently returns fmt.Errorf("release image stream metadata is nil") to include s for better diagnostics.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@hypershift-operator/controllers/nodepool/version.go`:
- Around line 103-108: The OSImageStream status is only updated when
osImageStreamFromMachines returns a non-empty string, leaving stale data
otherwise; modify the block around osImageStreamFromMachines(machines) so that
nodePool.Status.OSImageStream is always assigned: if stream != "" set
nodePool.Status.OSImageStream = hyperv1.OSImageStreamReference{Name: stream}
else clear it by assigning an empty/zero value (matching the pattern used for
NodesInfo at lines 99-101) so stale values are removed when no stream is
detected.
---
Outside diff comments:
In `@hypershift-operator/controllers/nodepool/version_test.go`:
- Around line 184-304: The tests for setNodesInfoStatus don't assert the new
OSImageStream status field: update each test case (the ones constructing
machines and expectedNodesInfo in TestSetNodesInfoStatus) to include
Machine.Status.NodeInfo.OSImage (add OSImage value on the Machine with NodeInfo
where appropriate) and add the corresponding expected OSImageStream entries to
the expectedNodesInfo for each case (for the "machines exist" case set the
expected OSImageStream entry matching the Machine's OSImage; for the cases that
should clear status set expectedNodesInfo.OSImageStreams to empty). Ensure you
update the expectedNodesInfo structure used in the g.Expect comparison so
setNodesInfoStatus (the function under test) is validated for OSImageStream
population.
---
Nitpick comments:
In `@hypershift-operator/controllers/nodepool/nodepool_controller.go`:
- Around line 764-786: The error returned when streamMeta is nil in
defaultNodePoolGCPImage doesn't include the requested stream; update the error
to include the stream (variable s) when non-empty so callers can see which
stream was requested (e.g., change the fmt.Errorf call that currently reports
only the architecture to also include the stream name), mirroring the behaviour
in defaultNodePoolAMI and keeping the existing architecture message.
- Around line 739-761: The error returned in defaultNodePoolAMI when streamMeta
is nil doesn't include which stream was requested; update the error to include
the requested stream value (the local variable s / the first variadic stream
argument) so the message reads something like "release image stream metadata is
nil for stream %q" — modify the nil-check that currently returns
fmt.Errorf("release image stream metadata is nil") to include s for better
diagnostics.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 56af0f53-9b8b-4ec2-8d7a-2a7a7c424a1a
📒 Files selected for processing (20)
hypershift-operator/controllers/nodepool/aws.gohypershift-operator/controllers/nodepool/conditions.gohypershift-operator/controllers/nodepool/config.gohypershift-operator/controllers/nodepool/nodepool_controller.gohypershift-operator/controllers/nodepool/rhel_stream.gohypershift-operator/controllers/nodepool/rhel_stream_test.gohypershift-operator/controllers/nodepool/token.gohypershift-operator/controllers/nodepool/version.gohypershift-operator/controllers/nodepool/version_test.goignition-server/cmd/run_local_ignitionprovider.goignition-server/controllers/local_ignitionprovider.goignition-server/controllers/local_ignitionprovider_test.goignition-server/controllers/tokensecret_controller.goignition-server/controllers/tokensecret_controller_test.gosupport/releaseinfo/deserialize.gosupport/releaseinfo/deserialize_test.gosupport/releaseinfo/registry_mirror_provider.gosupport/releaseinfo/registryclient_provider.gosupport/releaseinfo/releaseinfo.gosupport/releaseinfo/releaseinfo_test.go
|
/hold |
Implement the controller-level plumbing for dual-stream RHEL 9/10 NodePool support, threading the OS stream selection from the NodePool spec through the token secret to the ignition server. Key changes: - Add getRHELStream() pure function that resolves the RHEL stream for a NodePool based on spec.osImageStream.name, release version, and runc usage. Returns "rhel-9"/"rhel-10" or "" for legacy single-stream. - Add usesRunc detection to ConfigGenerator by inspecting ContainerRuntimeConfig CRs for defaultRuntime=runc during config parsing. RHEL 10 does not ship runc, so runc usage forces RHEL 9. - Add rhelStream to rolloutConfig and include in Hash()/HashWithoutVersion() to trigger rollouts when the explicit stream changes. Only populated from spec.osImageStream.name (not the resolved default) to avoid fleet-wide rollouts on upgrade. - Write os-stream to token secret data for the ignition server to read. - Extend validMachineConfigCondition with RHEL stream validation: rhel-10 on < 5.0 and rhel-10+runc are rejected with clear messages. - Update defaultNodePoolAMI and defaultNodePoolGCPImage to accept an optional stream parameter for stream-aware boot image resolution. - Report status.osImageStream by detecting the RHEL stream from CAPI Machine NodeInfo.OSImage (RHCOS version 4xx=RHEL 9, 5xx=RHEL 10). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Thread the OS stream selection through the ignition server pipeline: - Add osStream parameter to the IgnitionProvider.GetPayload() interface. The TokenSecretReconciler reads the os-stream key from the token secret and passes it to GetPayload(). - In LocalIgnitionProvider.GetPayload(), after copyMCOOutputToMCC() and before runMCC(), write 99_osimagestream.yaml to the MCC manifest directory. This OSImageStream CR tells the MCC bootstrap which RHEL stream to select when rendering MachineConfigs. When osStream is empty (pre-5.0 or feature gate disabled), no CR is written and the MCC uses the default BaseOSContainerImage from ControllerConfig. This completes the end-to-end flow: NodePool spec -> token secret -> ignition server -> MCC bootstrap -> correct osImageURL in MachineConfig. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add comprehensive unit tests covering all new dual-stream functions: - TestGetRHELStream: 11 test cases covering all decision branches (explicit/implicit streams, runc guard, version thresholds, pre-release metadata stripping). - TestDetectRHELStreamFromOSImage: 7 test cases for RHCOS version string parsing (RHEL 9 4xx, RHEL 10 5xx, edge cases). - TestOsImageStreamFromMachines: 7 test cases for majority-based stream detection from CAPI Machines (unanimous, split, nil NodeInfo). - TestWriteOSImageStreamManifest: 2 test cases verifying correct YAML generation of the OSImageStream CR for both rhel-9 and rhel-10. All test names follow the HyperShift "When...it should..." convention. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
6c3e7d1 to
8a6804f
Compare
| // discover constituent component image information. | ||
| type ReleaseImage struct { | ||
| *imageapi.ImageStream `json:",inline"` | ||
| StreamMetadata *CoreOSStreamMetadata `json:"streamMetadata"` |
There was a problem hiding this comment.
can we instead use the upstream types for github.com/coreos/stream-metadata-go
There was a problem hiding this comment.
Good suggestion. I checked the upstream github.com/coreos/stream-metadata-go/stream types and they cover most of what we need: Stream ≈ CoreOSStreamMetadata, plus AwsImage, GcpImage, ContainerImage (KubeVirt), ReplicatedObject (PowerVS/IBMCloud) etc.
However, our custom types include Azure-specific extensions (CoreRHCOSImage, CoreAzureDisk, CoreMarketplace, CoreAzureMarketplaceImage) that don't exist in the upstream package — Azure marketplace data is accessed via rhcos.Extensions which has a different structure than what we parse today. Adopting upstream types would require:
- Adding the module as a dependency
- Updating all ~15 consumers to use upstream type names and field accessors
- Reworking the Azure marketplace path through
rhcos.Extensions - Verifying JSON tag compatibility (some differ:
sha256vsSHA256, etc.)
I'd prefer to do this as a follow-up PR to keep this one focused on the dual-stream plumbing. I'll create a JIRA for it. Sound reasonable?
AI-assisted response via Claude Code
There was a problem hiding this comment.
Good point. The upstream github.com/coreos/stream-metadata-go package provides types like stream.Stream and stream.Arch that map to our custom CoreOSStreamMetadata and CoreOSArchitecture.
However, adopting the upstream types requires careful evaluation:
- The upstream types may include fields we don't need, increasing API surface
- Our types have custom JSON tags and HyperShift-specific nested structs (e.g.,
CoreOSGCPImage,CoreAzureMarketplace) that don't exist upstream - The upstream types would introduce a new dependency that needs to be tracked for compatibility
I'd propose handling this as a follow-up PR to keep this one focused on the dual-stream logic. We can evaluate which upstream types map cleanly and adopt them incrementally. Would that work for you?
AI-assisted response via Claude Code
| // TestReleaseInfoKubeVirt tests validates the presence of the kubevirt images | ||
| func TestReleaseInfoKubeVirt(t *testing.T) { | ||
| metadata, err := DeserializeImageMetadata(fixtures.CoreOSBootImagesYAML_4_10) | ||
| metadata, _, err := DeserializeImageMetadata(fixtures.CoreOSBootImagesYAML_4_10) |
There was a problem hiding this comment.
can we please have unit tests with a stub for dualstream
There was a problem hiding this comment.
We already have dual-stream unit tests using a 5.0-dual-stream-bootimages.yaml fixture (embedded via fixtures.CoreOSDualStreamBootImagesYAML_5_0):
TestDeserializeDualStreamImageMetadata— verifies legacy single-stream parsing, multi-stream parsing (rhel-9/rhel-10), correct stream names, and per-stream AWS AMI differentiationTestStreamMetadataForStream— verifies stream lookup for empty, known (rhel-10), and unknown stream names with correct fallback behavior
These are in support/releaseinfo/releaseinfo_test.go (lines 165–262). Would you like additional coverage for specific scenarios beyond what these already test?
AI-assisted response via Claude Code
There was a problem hiding this comment.
The dual-stream deserialization is already covered by tests in this file:
TestDeserializeDualStreamImageMetadata(line 165): parses the dual-stream fixture, validates bothrhel-9andrhel-10streams exist, verifies stream names, and checks per-stream AWS AMIs are differentTestStreamMetadataForStream(line 229): validatesStreamMetadataForStream()returns the correct metadata for empty, known, and unknown stream names
For the TestReleaseInfoKubeVirt test at this line, the change is minimal — it only updates the call to DeserializeImageMetadata to accept the new 3-value return (adding _ for streamsMeta). The test still validates KubeVirt images from the 4.10 single-stream fixture.
Would you like additional tests specifically for KubeVirt/PowerVS with dual-stream data? The current dual-stream fixture (CoreOSDualStreamBootImagesYAML_5_0) does include per-stream data for AWS/GCP, but we could extend it to include KubeVirt and PowerVS entries as well.
AI-assisted response via Claude Code
| } | ||
| // Default behavior for Linux/RHCOS AMIs | ||
| ami, err := defaultNodePoolAMI(region, arch, releaseImage) | ||
| ami, err := defaultNodePoolAMI(region, arch, releaseImage, nodePool.Spec.OSImageStream.Name) |
There was a problem hiding this comment.
Is there other platforms like azure that need to update how they pick default?
There was a problem hiding this comment.
Yes, good catch. Currently only AWS's defaultNodePoolAMI and defaultNodePoolGCPImage accept the stream parameter and use StreamMetadataForStream(). The following platforms still access releaseImage.StreamMetadata directly and would need updating:
- Azure (
azure.go:124) —getAzureMarketplaceMetadatausesreleaseImage.StreamMetadata - GCP (
gcp.go:171) —resolveGCPImagecallsdefaultNodePoolGCPImagewithout passing the stream name - KubeVirt (
kubevirt/kubevirt.go:47) — usesreleaseImage.StreamMetadatadirectly - PowerVS (
powervs.go:112) — usesreleaseImage.StreamMetadatadirectly - OpenStack (
openstack/openstack.go:135,161) — usesreleaseImage.StreamMetadatadirectly
For pre-5.0 releases this is fine (single stream = legacy StreamMetadata). For 5.0+ where StreamMetadata falls back to the first stream in the map, these platforms will still get a valid image — but they won't respect spec.osImageStream.name. I'll update them to use StreamMetadataForStream() consistently.
AI-assisted response via Claude Code
There was a problem hiding this comment.
Yes, good catch. Currently the following platforms need updating for dual-stream support:
- GCP (
gcp.go:171):defaultNodePoolGCPImagealready accepts astream ...stringvariadic, butresolveGCPImagedoesn't passnodePool.Spec.OSImageStream.Nameyet - Azure (
azure.go:39):defaultAzureNodePoolImageaccessesreleaseImage.StreamMetadatadirectly without going throughStreamMetadataForStream() - PowerVS (
powervs.go:112):getPowerVSImageaccessesreleaseImage.StreamMetadata.Architectures["ppc64le"]directly — though ppc64le may not have RHEL 10 images in the near term - KubeVirt (
kubevirt.go:47):defaultImageaccessesreleaseImage.StreamMetadatadirectly
For this initial PR, I focused on AWS as the primary platform for RHEL 10 dual-stream. GCP and Azure should be updated to use StreamMetadataForStream() — I can add that in a follow-up or extend this PR if you'd prefer. PowerVS and KubeVirt are lower priority since ppc64le/s390x may not have RHEL 10 images initially.
AI-assisted response via Claude Code
| return &ctrl.Result{}, fmt.Errorf("invalid RHEL stream configuration: %w", streamErr) | ||
| } | ||
| // Informational message when stream falls back to rhel-9 due to runc. | ||
| if nodePool.Spec.OSImageStream.Name == "" && stream == rhelStreamRHEL9 { |
There was a problem hiding this comment.
how do we know it was due to runc here? couldn't it be the payload is < 5?
There was a problem hiding this comment.
Good question. When specStream == "", getRHELStream() returns exactly these values:
| Release | usesRunc | Returns |
|---|---|---|
| < 5.0 | false | "" (empty — legacy) |
| < 5.0 | true | "" (empty — legacy) |
| >= 5.0 | false | "rhel-10" (default) |
| >= 5.0 | true | "rhel-9" (runc fallback) |
So the condition specStream == "" && stream == rhelStreamRHEL9 can only be true when rv >= 5.0 && usesRunc — pre-5.0 payloads return empty string, not "rhel-9". The runc attribution in the message is correct.
That said, the code doesn't make this obvious at the call site. Would it be clearer to also check configGenerator.usesRunc explicitly in the condition?
AI-assisted response via Claude Code
There was a problem hiding this comment.
Good question. Looking at getRHELStream() in rhel_stream.go, when specStream == "" (no explicit stream set), the possible return values are:
rv >= 5.0 && usesRunc→ returns"rhel-9"(this is the only path that returns"rhel-9"with empty specStream)rv >= 5.0 && !usesRunc→ returns"rhel-10"rv < 5.0→ returns""(legacy single-stream)
So the condition nodePool.Spec.OSImageStream.Name == "" && stream == rhelStreamRHEL9 can only be true via case 1 (runc fallback). Pre-5.0 payloads return "", not "rhel-9".
That said, the message could be clearer. I could add a comment explaining this reasoning, or make the condition more explicit by also checking configGenerator.usesRunc directly. Would that help?
AI-assisted response via Claude Code
| cg.rolloutConfig.additionalTrustBundleName = hostedCluster.Spec.AdditionalTrustBundle.Name | ||
| } | ||
|
|
||
| // rhelStream is populated directly from spec.osImageStream.name (empty when unset). |
There was a problem hiding this comment.
should we error here if userunC and rhelStream=10?
There was a problem hiding this comment.
This is already handled downstream. In conditions.go:394, after the config is generated, we call:
```go
stream, streamErr := getRHELStream(nodePool.Spec.OSImageStream.Name, releaseVersion, configGenerator.usesRunc)
```
getRHELStream returns an error for rhelStream=rhel-10 + usesRunc:
```go
case specStream == rhelStreamRHEL10 && usesRunc:
return "", fmt.Errorf("OS stream %s is incompatible with default_runtime=runc; RHEL 10 does not ship runc", rhelStreamRHEL10)
```
The config generator itself populates `rhelStream` from the raw spec value at line 104 — it's just storing the user's input. The validation happens in validMachineConfigCondition which runs getRHELStream() to catch the incompatible combination and sets a clear condition message. So the validation is there, just in a different layer (conditions vs config).
AI-assisted response via Claude Code
There was a problem hiding this comment.
This is already handled by getRHELStream() in rhel_stream.go:36-37:
case specStream == rhelStreamRHEL10 && usesRunc:
return "", fmt.Errorf("OS stream %s is incompatible with default_runtime=runc; RHEL 10 does not ship runc", rhelStreamRHEL10)The validation happens in validMachineConfigCondition() (conditions.go:394) which calls getRHELStream() and sets NodePoolValidMachineConfigCondition to False when this error is returned. So the error is surfaced as a condition before we ever reach NewConfigGenerator.
The code at this line (config.go:101) runs during NewConfigGenerator, which is called before the getRHELStream() validation in conditions.go. However, cg.rhelStream here is only used for the hash — the actual incompatibility validation is handled separately. The usesRunc flag is set later during generateMCORawConfig (line 106) when parsing ContainerRuntimeConfig manifests.
AI-assisted response via Claude Code
| // This is kept like this for now to contain the scope of the refactor and avoid backward compatibility issues. | ||
| func (cg *ConfigGenerator) Hash() string { | ||
| return supportutil.HashSimple(cg.mcoRawConfig + cg.releaseImage.Version() + cg.pullSecretName + cg.additionalTrustBundleName + cg.globalConfig) | ||
| return supportutil.HashSimple(cg.mcoRawConfig + cg.releaseImage.Version() + cg.pullSecretName + cg.additionalTrustBundleName + cg.globalConfig + cg.rhelStream) |
There was a problem hiding this comment.
do we have coverage to validate this doesn't change the current hash when rhelStream is empty? otherwise that would trigger a rollout
There was a problem hiding this comment.
When `rhelStream` is empty (the default), the hash input is:
```
mcoRawConfig + version + pullSecretName + additionalTrustBundleName + globalConfig + ""
```
Since `"abc" + ""` == `"abc"` in Go string concatenation, appending an empty string produces the same hash as before. No rollout is triggered.
That said, an explicit test proving this invariant would be valuable — I'll add a test case that computes the hash with `rhelStream=""` and verifies it matches the hash computed without the `rhelStream` field present at all.
AI-assisted response via Claude Code
There was a problem hiding this comment.
Yes — there's explicit test coverage for this in config_test.go:
TestHash (line 450):
{
name: "When rhelStream is empty it should not change the hash",
...
rhelStream: "",
expected: baseCaseHash, // identical to base case without rhelStream field
},This test verifies that an empty rhelStream (the default when spec.osImageStream is unset) produces the exact same hash as the base case. Since rhelStream defaults to "" and string concatenation with "" is a no-op, the hash input is identical to pre-PR behavior.
AI-assisted response via Claude Code
| // This is kept like this for now to contain the scope of the refactor and avoid backward compatibility issues. | ||
| func (cg *ConfigGenerator) HashWithoutVersion() string { | ||
| return supportutil.HashSimple(cg.mcoRawConfig + cg.pullSecretName + cg.additionalTrustBundleName) | ||
| return supportutil.HashSimple(cg.mcoRawConfig + cg.pullSecretName + cg.additionalTrustBundleName + cg.rhelStream) |
There was a problem hiding this comment.
do we have coverage to validate this doesn't change the current hash when rhelStream is empty? otherwise that would trigger a rollout
There was a problem hiding this comment.
Same answer as above — when rhelStream is empty (the default/unset case), appending "" to the hash input string is a no-op: "abc" + "" == "abc". The hash is identical to the old code. I'll make the test cover both Hash() and HashWithoutVersion() to validate this invariant explicitly.
AI-assisted response via Claude Code
There was a problem hiding this comment.
Same as above — TestHashWithoutVersion (line 576) has the same coverage:
{
name: "When rhelStream is empty it should not change the hash",
...
rhelStream: "",
expected: baseCaseHash, // "85234650" — identical to base case
},Both Hash() and HashWithoutVersion() are covered. Empty rhelStream → no hash change → no rollout.
AI-assisted response via Claude Code
| } | ||
| case *mcfgv1.ContainerRuntimeConfig: | ||
| // Detect runc usage for dual-stream RHEL compatibility | ||
| if obj.Spec.ContainerRuntimeConfig != nil && |
There was a problem hiding this comment.
is this how the MCO validates this? please let this be a unit tested func
Code Review: PR #8714 — Dual-stream RHEL 9/10 NodePool SupportReviewer: AI-assisted review (Claude Code with HyperShift SME agents) Overall Verdict: FAIL (3 blocking issues)The end-to-end architecture (spec → controller → token → ignition → MCC) is sound, and the hash stability design (explicit stream only in hash to avoid fleet-wide rollouts on 5.0 upgrade) is well-considered. Pure functions ( Required Actions (Blocking)1. Build failure —
|
| Function | File | Gap |
|---|---|---|
resolveStreamMetadata() |
nodepool_controller.go |
Zero tests — core dispatch logic for multi-stream |
DeserializeMultiStreamImageMetadata() |
deserialize.go |
Zero tests — parses new multi-stream ConfigMap format |
usesRunc detection |
config.go |
No test with defaultRuntime=runc verifying cg.usesRunc=true |
defaultNodePoolAMI with stream |
nodepool_controller.go |
Variadic stream param never exercised |
rhelStream in Hash |
config.go |
No test verifying hash changes when rhelStream changes |
os-stream in token secret |
token.go |
No test verifying the key is written |
Documentation
13. No forced migration path for existing RHEL 9 nodes — The implicit default switch from RHEL 9 to RHEL 10 on 5.0+ happens at the ignition payload level without changing the rollout hash. Existing nodes stay on RHEL 9; only new nodes get RHEL 10. This is likely intentional but should be documented explicitly in the enhancement.
Platform Coverage Summary
| Platform | Stream-aware? | Status |
|---|---|---|
| AWS | ✅ | Correct via resolveStreamMetadata() |
| GCP | ✅ | Correct via resolveStreamMetadata() |
| Azure | Inline resolution — should use shared helper | |
| KubeVirt | Inline resolution — should use shared helper | |
| PowerVS | ❌ | Nil-safety risk (blocking) |
| OpenStack | ❌ | Nil-safety risk |
| Karpenter | ❌ | Stream argument not passed |
What's Good
- Hash stability design — Using the raw spec value (not resolved default) for the rollout hash avoids fleet-wide rollouts when upgrading to 5.0. Well-commented.
getRHELStream()is clean — Pure function with clear switch-case logic and comprehensive tests (10 cases).- Token secret flow — The
os-streamkey absence for pre-5.0 is handled correctly (nil → empty string → no manifest). - OSImageStream CR timing — Written after MCO but before MCC, which is the correct placement in the ignition pipeline.
usesRuncdetection — Correctly re-initialized per reconcile via freshConfigGeneratorconstruction.
Generated with Claude Code using HyperShift SME agents (control-plane, data-plane, cloud-provider, API, architect)
|
|
||
| // rhelStream is populated directly from spec.osImageStream.name (empty when unset). | ||
| // Using the resolved value from getRHELStream() would inject implicit defaults into | ||
| // the hash, triggering fleet-wide rollouts on upgrade. |
There was a problem hiding this comment.
I think we should only let this change the hash and trigger a rollout if the value is different from the one in status
There was a problem hiding this comment.
Interesting suggestion. Let me walk through the current behavior and the tradeoff:
Current design: rhelStream in the hash comes directly from spec.osImageStream.name (empty when unset). This means:
- Empty → empty (upgrade to 5.0, no explicit stream set): hash unchanged, no rollout ✅
- Empty → "rhel-10" (user explicitly sets stream): hash changes, rollout triggered ✅
- "rhel-9" → "rhel-10" (user changes stream): hash changes, rollout triggered ✅
Your suggestion (compare with status): If spec == status, skip from hash. This would prevent a rollout when the user explicitly sets the stream to what's already running (e.g., setting "rhel-10" on a node already running rhel-10).
The tradeoff: this would require passing nodePool.Status.OSImageStream.Name into the ConfigGenerator, adding a coupling between the config hash computation and status. Currently the config generator is deliberately a pure function of spec inputs.
Also, status.osImageStream is populated from CAPI Machine NodeInfo (detected from running nodes), not from the controller's intent. There could be a window where status lags — e.g., the user sets rhel-10 but nodes haven't been updated yet, so status still shows rhel-9.
For now, the only case that would cause a "spurious" rollout is when a user explicitly sets the stream to match what they're already running — which seems rare and arguably intentional (explicit declarative intent). Would you still like this changed, or is the current behavior acceptable?
AI-assisted response via Claude Code
There was a problem hiding this comment.
Good point. The current design uses spec.osImageStream.name directly in the hash, which means:
- Empty → empty string → hash unchanged → no rollout ✅
- User sets "rhel-10" → hash changes → rollout ✅ (intended)
- User sets "rhel-9" on a cluster already running rhel-9 → hash changes → rollout
⚠️ (unnecessary)
Your suggestion to compare spec vs status would avoid the unnecessary rollout in case 3. Implementation would look something like:
// Only include rhelStream in hash if it differs from the observed stream in status,
// preventing unnecessary rollouts when explicitly setting the current stream.
if nodePool.Spec.OSImageStream.Name != nodePool.Status.OSImageStream.Name {
cg.rolloutConfig.rhelStream = nodePool.Spec.OSImageStream.Name
}However, there's a subtle concern: this creates a dependency between the hash calculation and the status, which is normally populated asynchronously from observed machine state. If status hasn't been populated yet (e.g., new NodePool), we'd need to handle that edge case.
Would you like me to implement this approach? The tradeoff is: simpler code (current) vs. avoiding unnecessary rollouts when the user sets spec.osImageStream to match the already-running stream.
AI-assisted response via Claude Code
|
/address-review-comments |
|
🤖 Addressing review comments: workflow run |
- Add nil-safety checks for StreamMetadata in PowerVS, OpenStack, and KubeVirt - Extract runc detection to standalone containerRuntimeUsesRunc() function - Use rhelStream constants in detectRHELStreamFromOSImage instead of raw strings - Add hash stability tests for Hash() and HashWithoutVersion() with empty rhelStream - Add unit tests for containerRuntimeUsesRunc with runc, crun, nil, and empty cases - Add dual-stream fixture and tests for DeserializeImageMetadata and StreamMetadataForStream Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: hypershift-jira-solve-ci[bot] The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Test Failure Analysis CompleteJob Information
Test Failure AnalysisErrorSummaryThe lint job failed because two newly-added files ( Root CauseThe project's
Both new files group rhel_stream.go current (incorrect): import (
"fmt"
"github.com/blang/semver"
mcfgv1 "github.com/openshift/api/machineconfiguration/v1"
)rhel_stream.go expected (correct): import (
"fmt"
mcfgv1 "github.com/openshift/api/machineconfiguration/v1"
"github.com/blang/semver"
)The same pattern applies to Recommendations
Evidence
|
|
/address-review-comments |
|
🤖 Addressing review comments: workflow run |
|
@hypershift-jira-solve-ci[bot]: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/test address-review-comments |
What this PR does / why we need it:
Implements dual-stream RHEL 9/10 NodePool support for OpenShift 5.0+ payloads.
Starting with OpenShift 5.0, release payloads carry both RHEL 9 and RHEL 10 boot images. This PR threads the OS stream selection end-to-end: from the NodePool spec through the controller, token secret, ignition server, and into MCC bootstrap manifests.
Key changes:
Boot image metadata parsing (
support/releaseinfo):DeserializeImageMetadatanow parses both the legacy single-stream"stream"key and the new multi-stream"streams"key from the boot image ConfigMap. AStreamMetadataForStreamhelper resolves stream-specific metadata with fallback to legacy.RHEL stream resolution (
nodepool/rhel_stream.go): Pure functiongetRHELStream()resolves the target RHEL stream based onspec.osImageStream.name, release version, and runc usage. RHEL 10 is the default for >= 5.0; runc forces RHEL 9 (RHEL 10 does not ship runc). Pre-5.0 releases use legacy single-stream behavior.Controller plumbing (
nodepool/): The resolved stream flows intorolloutConfigfor hash computation (only explicit stream changes trigger rollouts), intodefaultNodePoolAMI/defaultNodePoolGCPImagefor stream-aware boot image selection, and into the token secret asos-streamfor the ignition server.Ignition server (
ignition-server/):GetPayload()accepts the OS stream and writes anOSImageStreamCR (99_osimagestream.yaml) to the MCC manifest directory, telling MCC bootstrap which RHEL stream to use when rendering MachineConfigs.Status reporting (
nodepool/version.go):status.osImageStreamis populated by detecting the RHEL stream from CAPI MachineNodeInfo.OSImage(RHCOS 4xx = RHEL 9, 5xx = RHEL 10).Validation conditions:
rhel-10on release < 5.0 andrhel-10+ runc are rejected with clear condition messages.Which issue(s) this PR fixes:
Fixes https://issues.redhat.com/browse/CNTRLPLANE-3612
Special notes for your reviewer:
rhelStreamfield is included inrolloutConfig.Hash()but only populated from the explicitspec.osImageStream.name(not the resolved default). This avoids fleet-wide rollouts when upgrading to 5.0 — existing NodePools silently pick up the default stream without a rollout.ContainerRuntimeConfigwithdefaultRuntime=runc, the controller automatically falls back to RHEL 9 even on >= 5.0 releases."stream"key is absent, ensuring stable behavior.Checklist:
Always review AI generated responses prior to use.
Generated with Claude Code via
/jira:solve [CNTRLPLANE-3612](https://redhat.atlassian.net/browse/CNTRLPLANE-3612)Summary by CodeRabbit
New Features
Bug Fixes / Improvements
Tests