Skip to content

fix(hostedclustersizing): add override fall back#7554

Draft
SudoBrendan wants to merge 1 commit into
openshift:mainfrom
SudoBrendan:sudobrendan/fix-size-override-fallback
Draft

fix(hostedclustersizing): add override fall back#7554
SudoBrendan wants to merge 1 commit into
openshift:mainfrom
SudoBrendan:sudobrendan/fix-size-override-fallback

Conversation

@SudoBrendan

@SudoBrendan SudoBrendan commented Jan 20, 2026

Copy link
Copy Markdown

What this PR does / why we need it:

Today, if you set the size override annotation, but that size isn't an exact string match of what's in the config, your cluster will remain unsized rather than falling back to the standard logic (autoscale / node size). The only indication this has happened is a log message in the controller.

NOTE: I think this seems odd - do others agree this is a bug, not a feature? Is there a reason to prefer an unsized cluster when the override is set incorrectly that I'm unaware of?

Which issue(s) this PR fixes:

Adjacent work I happened to discover while trying to refine what CNTRLPLANE-2581 would be

Special notes for your reviewer:

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

When ClusterSizeOverrideAnnotation is set to a value not found in the
ClusterSizingConfiguration, the controller now gracefully falls back to
autoscaling (if enabled) or node count sizing, rather than skipping
sizing entirely.

Signed-off-by: Brendan Bergen <bbergen@redhat.com>
Assisted-by: Claude Opus 4.5 (via Cursor)
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 20, 2026
@openshift-ci

openshift-ci Bot commented Jan 20, 2026

Copy link
Copy Markdown
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai

coderabbitai Bot commented Jan 20, 2026

Copy link
Copy Markdown
Contributor

Important

Review skipped

Auto reviews are limited based on label configuration.

🚫 Review skipped — only excluded labels are configured. (1)
  • do-not-merge/work-in-progress

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot added do-not-merge/needs-area area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release labels Jan 20, 2026
@openshift-ci

openshift-ci Bot commented Jan 20, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: SudoBrendan
Once this PR has been reviewed and has the lgtm label, please assign devguyio for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci

openshift-ci Bot commented May 11, 2026

Copy link
Copy Markdown
Contributor

@SudoBrendan: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 11, 2026
@openshift-ci

openshift-ci Bot commented May 11, 2026

Copy link
Copy Markdown
Contributor

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci

openshift-ci Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Stale PRs are closed after 21d of inactivity.

If this PR is still relevant, comment to refresh it or remove the stale label.
Mark the PR as fresh by commenting /remove-lifecycle stale.

If this PR is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci Bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 11, 2026
@hypershift-jira-solve-ci

Copy link
Copy Markdown

Test Failure Analysis Complete

Job Information

  • Prow Job: tide, Red Hat Konflux enterprise-contract checks
  • Build ID: N/A (Konflux pipeline runs: hypershift-operator-main-enterprise-contract-l4jgm, hypershift-operator-enterprise-contract-4pg6w, enterprise-contract-mce-211-kzvv2)
  • PR: #7554fix(hostedclustersizing): add override fall back
  • PR Created: 2026-01-20 (last commit: 2026-01-20, ~5 months stale)
  • PR Status: 1,676 commits behind main, merge state: CONFLICTING

Test Failure Analysis

Error

tide: "Not mergeable. PR has a merge conflict." (state: error)

Red Hat Konflux / hypershift-operator-main-enterprise-contract: verify task FAILURE
  — 222 successes, 30 warnings, 10 failures

Red Hat Konflux / hypershift-operator-enterprise-contract: verify task FAILURE
  — 222 successes, 30 warnings, 10 failures

Red Hat Konflux / enterprise-contract-mce-211: verify task FAILURE
  — 240 successes, 62 warnings, 2 failures

Summary

None of these failures are caused by the PR's code changes. The PR has been stale for ~5 months (last commit January 20, 2026) and is now 1,676 commits behind main. The tide failure is caused by a merge conflict: the exact code the PR modifies (reconcile() method in hostedclustersizing_controller.go) was completely restructured by a gocyclo refactoring commit on May 7, which extracted the size-determination logic into a new determineSizeClass() function — making the PR's if/else-if/elseif/if/if refactor unmergeable. The Enterprise Contract (EC) failures are a systemic issue affecting multiple PRs across the repository (confirmed on PR #8709 from June 10, 2026) and are not related to this PR's code.

Root Cause

1. Tide — Merge Conflict (Primary Blocker)

The PR modifies the size-class selection logic in hostedclustersizing_controller.go (lines ~208–265), converting an if/else-if/else chain into sequential if blocks with sizeClass == nil guards. However, three commits merged to main after the PR was created have fundamentally restructured this code:

  • aee93fe (Apr 16): fix(lint): resolve all unparam violations — changed function signatures in the same file.
  • 615bc5c (May 7): refactor(hypershift-operator): reduce cyclomatic complexity and enable gocyclo linterthis is the primary conflict source. It extracted the entire size-determination logic from reconcile() into a new standalone function determineSizeClass(). The old if overrideSize {...} else if autoScaling {...} else {...} chain that PR fix(hostedclustersizing): add override fall back #7554 modifies was completely rewritten with early returns (return &config.Spec.Sizes[i], nil) instead of variable assignment. The ignoreError type was also changed from a type alias to a struct.
  • 7b92a2f (May 12): docs: restore non-obvious comments after gocyclo refactor — further modified the same file.

The refactored code on main already returns immediately when a valid override is found. When the override is invalid, it now logs an error and returns nil — which means the PR's intended "fallback" behavior needs to be re-implemented against the new determineSizeClass() function structure.

2. Enterprise Contract Failures (Systemic, Not PR-specific)

The three Konflux EC check failures (hypershift-operator-main-enterprise-contract, hypershift-operator-enterprise-contract, enterprise-contract-mce-211) are not caused by this PR. Evidence:

Recommendations
  1. Rebase the PR onto current main: The PR must be rebased to resolve the merge conflict. The core logic needs to be re-applied against the new determineSizeClass() function structure introduced by commit 615bc5c. Specifically:

    • In the new determineSizeClass(), modify the override branch to fall back instead of returning nil when the override size is not found.
    • Preserve the PR's intended behavior: when ClusterSizeOverrideAnnotation specifies an invalid size, fall through to autoscaling (if enabled) or node-count sizing, rather than silently skipping sizing.
  2. Do not investigate the EC failures in the context of this PR: The Enterprise Contract failures are a known systemic issue affecting the hypershift-operator component across multiple PRs. After rebasing, the EC checks will re-run against the fresh build and may resolve (or will show the same systemic issue that the team is already aware of).

  3. Re-run checks after rebase: All four checks (tide + 3 EC) should be re-evaluated after a rebase. The Prow CI checks (e2e-azure-self-managed, security) already pass for this PR.

  4. Consider whether this PR is still needed: The gocyclo refactor may have already addressed the behavior this PR targets (the refactored code returns nil for invalid overrides). Verify whether the current main behavior matches the intended fix before rebasing.

Evidence
Evidence Detail
PR staleness Created 2026-01-20, last commit 2026-01-20, now 1,676 commits behind main
Merge state mergeStateStatus: DIRTY, mergeable: CONFLICTING
Tide error "Not mergeable. PR has a merge conflict."
Conflicting commit 615bc5c (May 7) — refactored reconcile() into determineSizeClass(), restructuring the exact code the PR modifies (+224/-146 lines in the same file)
Additional conflicting commits aee93fe (Apr 16, lint fixes), 7b92a2f (May 12, comment restoration)
EC systemic evidence PR #8709 (June 10, unrelated PR) has identical hypershift-operator-main-enterprise-contract FAILURE
EC systemic evidence PR #8711 shows same checks as neutral; control-plane-operator-enterprise-contract passes on same PRs
EC check details (PR #7554) hypershift-operator-main-enterprise-contract: 222 pass, 30 warn, 10 fail
EC check details (PR #7554) hypershift-operator-enterprise-contract: 222 pass, 30 warn, 10 fail
EC check details (PR #7554) enterprise-contract-mce-211: 240 pass, 62 warn, 2 fail
Prow CI status e2e-azure-self-managed: ✅ PASS, security: ✅ PASS

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant