-
Notifications
You must be signed in to change notification settings - Fork 96
CNTRLPLANE-3743: Add Agentic SDLC context files #440
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
sanchezl
wants to merge
4
commits into
openshift:master
Choose a base branch
from
sanchezl:contextify
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+303
−43
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,71 @@ | ||
| # AI Agent Instructions for cluster-openshift-controller-manager-operator | ||
|
|
||
| > Also read [ARCHITECTURE.md](ARCHITECTURE.md) for design decisions and [CONTRIBUTING.md](CONTRIBUTING.md) for workflow. | ||
|
|
||
| ## What This Repo Is | ||
|
|
||
| This is the **operator** for the openshift-controller-manager and route-controller-manager on OpenShift. It reconciles the `openshiftcontrollermanagers.operator.openshift.io/cluster` CR and manages two operand Deployments, their ConfigMaps, RBAC, network policies, and CA bundles. It reports status via the `openshift-controller-manager` ClusterOperator object. | ||
|
|
||
| The **operand** (the controller-manager binary itself) lives in [openshift/openshift-controller-manager](https://github.com/openshift/openshift-controller-manager). This repo only manages how that binary is deployed and configured. | ||
|
|
||
| ## Repository Layout | ||
|
|
||
| ```text | ||
| cmd/ # Entry points | ||
| cluster-openshift-controller-manager-operator/ # Operator binary | ||
| cluster-openshift-controller-manager-operator-tests-ext/ # OTE test binary | ||
| pkg/ | ||
| operator/ # Core operator logic | ||
| starter.go # RunOperator — wires all controllers | ||
| operator.go # Main reconciler struct and work queue | ||
| sync_openshiftcontrollermanager_v311_00.go # Sync loop for both operands | ||
| configobservation/ # Config observers (builds, images, network, controllers) | ||
| usercaobservation/ # Proxy CA bundle sync controller | ||
| internalimageregistry/ # Image pull secret cleanup controller | ||
| util/consts.go # Namespace constants | ||
| bindata/assets/ # Static manifests applied by the operator | ||
| manifests/ # CVO-managed manifests (install/upgrade) | ||
| test/e2e/ # E2E tests (Ginkgo) | ||
| ``` | ||
|
|
||
| ## Build and Test Commands | ||
|
|
||
| ```bash | ||
| make build # Build operator and OTE test binaries | ||
| make test # Unit tests | ||
| make verify # Lint and vet | ||
| make test-e2e # E2E tests (requires KUBECONFIG) | ||
| ``` | ||
|
|
||
| ## Critical Rules | ||
|
|
||
| 1. **Network policy ordering matters.** In `starter.go`, allow-rules must appear before default-deny rules in the StaticResourceController asset list. Reversing this order causes traffic interruption during reconciliation. | ||
|
|
||
| 2. **Never remove the "always managed" and "not removable" flags.** The operator calls `management.SetOperatorAlwaysManaged()` and `management.SetOperatorNotRemovable()` because the openshift-controller-manager is a core platform component. | ||
|
|
||
| 3. **Config observers must return previous config on error.** If an observer encounters an error, it must return `previousConfig` (not empty config) to prevent flapping the operand configuration. | ||
|
|
||
| 4. **All informer caches must sync before controllers start.** The operator explicitly calls `WaitForCacheSync` for every informer group and returns an error if any fail. Skipping this check causes transiently incorrect operations from partially-synced caches (see OCPBUGS-81472). | ||
|
|
||
| 5. **Vendor is committed.** After dependency changes, run `go mod tidy && go mod vendor` and commit the vendor update separately from code changes. | ||
|
|
||
| ## Key Patterns | ||
|
|
||
| - **library-go operator framework**: Uses `StaticResourceController`, `ConfigObserver`, `ClusterOperatorStatusController`, `ResourceSyncController`, and `LogLevelController` from `openshift/library-go`. | ||
| - **Dual-operand sync**: One sync function manages both the openshift-controller-manager and route-controller-manager Deployments. Errors and status conditions are tracked separately with the `RouteControllerManager` prefix for the route-controller-manager. | ||
| - **Capability-gated controllers**: Controllers are disabled by prepending `-` to their name in the operand ConfigMap (e.g., `-openshift.io/build`). The mapping from controllers to capabilities is in `controllerCapabilities` in `sync_openshiftcontrollermanager_v311_00.go`. | ||
| - **Input hash-driven rollouts**: ConfigMap resource versions and content hashes are embedded as annotations on the Deployment spec template, triggering a rolling update when upstream inputs change. | ||
| - **TLS security profile propagation**: The cluster-wide TLS profile from `apiservers.config.openshift.io` is observed and propagated into the operand config. TLS config changes trigger an operator restart. | ||
|
|
||
| ## What NOT to Do | ||
|
|
||
| - Do not add new operand controllers here — controllers belong in `openshift/openshift-controller-manager`. This repo only manages deployment and configuration. | ||
| - Do not modify `manifests/` without coordinating with the release team — these are CVO-managed and affect install/upgrade. | ||
| - Do not change the rate limiter parameters in `operator.go` without understanding the impact on API server load during rapid config changes. | ||
| - Do not add bindata assets after default-deny network policies — always add allow rules first. | ||
|
|
||
| ## Test Suites | ||
|
|
||
| - **Unit tests**: Colocated `_test.go` files in `pkg/`. Focus on config observer logic and deployment generation. | ||
| - **E2E tests**: `test/e2e/` using Ginkgo. Validate network policy enforcement and operator availability on a live cluster. | ||
| - **OTE**: The test extension binary (`cluster-openshift-controller-manager-operator-tests-ext`) is built alongside the operator and included in the production image. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,113 @@ | ||
| # Architecture: cluster-openshift-controller-manager-operator | ||
|
|
||
| ## Scope | ||
|
|
||
| This operator manages two operand Deployments: | ||
|
|
||
| - **openshift-controller-manager** — runs OpenShift-specific controllers (builds, deployments, image triggers, service accounts, templates, unidling) | ||
| - **route-controller-manager** — runs the ingress-to-route and ingress IP controllers (split from the controller-manager but still sharing the same operator CR) | ||
|
|
||
| The operator does **not** manage the upstream kube-controller-manager (that is `cluster-kube-controller-manager-operator`). | ||
|
|
||
| ## Namespace Map | ||
|
|
||
| | Namespace | Purpose | | ||
| |-----------|---------| | ||
| | `openshift-controller-manager-operator` | Operator Deployment runs here | | ||
| | `openshift-controller-manager` | Operand: openshift-controller-manager Deployment, ConfigMaps, ServiceAccount | | ||
| | `openshift-route-controller-manager` | Operand: route-controller-manager Deployment, ConfigMaps, ServiceAccount | | ||
| | `openshift-config` | Source for user-specified global configuration (proxy CA, build defaults) | | ||
| | `openshift-config-managed` | Source for machine-managed global configuration | | ||
|
|
||
| ## Component Overview | ||
|
|
||
| The operator binary starts a set of library-go controllers that collectively reconcile the desired state: | ||
|
|
||
| ``` | ||
| RunOperator() | ||
| ├── OpenShiftControllerManagerOperator — main sync loop for both operand Deployments | ||
| ├── StaticResourceController — reconciles RBAC, Namespaces, Services, NetworkPolicies from bindata/ | ||
| ├── ConfigObserver — watches cluster config and transforms it into operand configuration | ||
| ├── UserCAObservationController — watches proxy config and syncs trusted CA bundles | ||
| ├── ResourceSyncController — copies Secrets/ConfigMaps between namespaces | ||
| ├── ClusterOperatorStatusController — aggregates conditions into the ClusterOperator object | ||
| ├── LogLevelController — reconciles operator log verbosity from CR spec | ||
| └── ImagePullSecretCleanupController — cleans up internal-registry pull secrets when registry is disabled | ||
| ``` | ||
|
|
||
| ## Reconciliation Flow | ||
|
|
||
| The main sync loop (`syncOpenShiftControllerManager_v311_00_to_latest`) runs on every change to the operator CR, operand namespaces, or watched cluster config: | ||
|
|
||
| 1. **ConfigMaps** — merge default config with observed config and operator overrides, compute input hashes for rollout detection | ||
| 2. **Client CA** — sync the kube-apiserver client CA bundle into each operand namespace | ||
| 3. **Trust bundles** — ensure service-ca and global-ca ConfigMaps exist with injection annotations | ||
| 4. **Deployments** — apply the operand Deployment with correct image, replica count (one per master node), log level, proxy env vars, and spec annotations | ||
| 5. **Status** — set Available/Progressing/Degraded/Upgradeable conditions on the operator CR, then let the ClusterOperatorStatusController aggregate them | ||
|
|
||
| ## Capabilities Integration | ||
|
|
||
| The operator dynamically disables operand controllers when cluster capabilities are not enabled: | ||
|
|
||
| | Capability | Disabled Controllers | | ||
| |-----------|---------------------| | ||
| | `Build` | build, build-config-change, builder-sa, builder-rolebindings | | ||
| | `DeploymentConfig` | deployer, deployer-sa, deployment-config, deployer-rolebindings | | ||
| | `ImageRegistry` | service-account-pull-secrets, image-puller-rolebindings | | ||
|
|
||
| When the Build capability is initially disabled, the operator polls every 5 minutes and exits (triggering a restart) if the capability becomes enabled — ensuring a full re-initialization with build controllers active. | ||
|
|
||
| ## Config Observation | ||
|
|
||
| Config observers watch cluster-level resources and merge their state into the operand ConfigMap: | ||
|
|
||
| | Observer | Watches | Configures | | ||
| |---------|---------|-----------| | ||
| | `builds` | `builds.config.openshift.io/cluster` | Build defaults and overrides (only when Build capability is enabled) | | ||
| | `images` | `images.config.openshift.io/cluster` | Internal/external registry hostnames | | ||
| | `network` | `networks.config.openshift.io/cluster` | Cluster network CIDR for egress/ingress | | ||
| | `deployimages` | Operator env vars | Deployer and builder image pull specs | | ||
| | `controllers` | `clusterversions.config.openshift.io/version` | Controller enable/disable list based on capabilities | | ||
| | `featuregates` | `featuregates.config.openshift.io/cluster` | Feature flags (e.g., `BuildCSIVolumes`) | | ||
| | `apiserver` (TLS) | `apiservers.config.openshift.io/cluster` | TLS security profile (cipher suites, min TLS version) propagated to operand | | ||
|
|
||
| ## CRDs and API Types | ||
|
|
||
| - **Owned:** `openshiftcontrollermanagers.operator.openshift.io` — the operator's configuration CR (singleton named `cluster`) | ||
| - **Consumed:** `clusteroperators.config.openshift.io`, `clusterversions.config.openshift.io`, `proxies.config.openshift.io`, `builds.config.openshift.io`, `images.config.openshift.io`, `networks.config.openshift.io`, `featuregates.config.openshift.io` | ||
|
|
||
| ## Manifest and Resource Management | ||
|
|
||
| - **CVO-managed** (`manifests/`): Operator Deployment, Namespace, ServiceAccount, RBAC, ServiceMonitors, ClusterOperator, NetworkPolicies — applied by the Cluster Version Operator during install/upgrade | ||
| - **Operator-applied** (`bindata/assets/`): Operand Namespaces, Deployments, ConfigMaps, RBAC, Services, NetworkPolicies — reconciled by the StaticResourceController and the main sync loop | ||
|
|
||
| ## Dependencies | ||
|
|
||
| | Dependency | Usage | | ||
| |-----------|-------| | ||
| | `openshift/library-go` | Operator framework: static resource controller, config observer, status aggregation, resource apply, log level controller | | ||
| | `openshift/api` | CRD types (`operator/v1`, `config/v1`, `openshiftcontrolplane/v1`) | | ||
| | `openshift/client-go` | Typed clients and informers for OpenShift API groups | | ||
| | `k8s.io/client-go` | Kubernetes clients, informers, work queues | | ||
|
|
||
| ## Testing Strategy | ||
|
|
||
| - **Unit tests** (`pkg/...`): config observer logic, deployment generation, controller enable/disable based on capabilities | ||
| - **E2E tests** (`test/e2e/`): cluster-level validation using Ginkgo — network policy enforcement, operator availability | ||
| - **OTE** (`.openshift-tests-extension/`): test extension binary included in the production image for CI/CD integration | ||
|
|
||
| ## Design Decisions | ||
|
|
||
| 1. **Two operands, one operator**: The route-controller-manager was split from the openshift-controller-manager but both share the same operator CR and operator binary. This avoids a separate operator for a small set of controllers, but means the sync loop manages two independent Deployments with separate status condition prefixes. | ||
|
|
||
| 2. **Capability-aware controller disable**: Rather than separate operand images per capability, the operator passes a controller enable/disable list via the operand ConfigMap. The `-controllerName` convention (prefixing with `-`) disables individual controllers without changing the binary. | ||
|
|
||
| 3. **Rate-limited sync loop**: The main operator uses a token bucket rate limiter (0.05 tokens/sec, burst 4) to avoid hot-looping on rapid config changes. This is in addition to the workqueue's exponential backoff. | ||
|
|
||
| 4. **Always managed, not removable**: The operator opts out of library-go's "unmanaged" and "removable" states via `management.SetOperatorAlwaysManaged()` and `management.SetOperatorNotRemovable()`. The openshift-controller-manager is a core platform component that must always run. | ||
|
|
||
| 5. **Informer cache sync before controller start**: All informer caches must be fully synced before any controller starts making decisions. A bug (OCPBUGS-81472) caused transiently incorrect operations when controllers acted on partially-synced caches. The fix adds explicit `WaitForCacheSync` checks with error returns for each informer group (operator, kube, config) before starting any controllers. | ||
|
|
||
| 6. **TLS security profile propagation**: The operator observes the cluster-wide TLS security profile from `apiservers.config.openshift.io/cluster` and propagates it to the operand configuration via the config observer. When TLS config changes, the operator restarts to pick up the new settings (CNTRLPLANE-2620). | ||
|
|
||
| 7. **Capability poll-and-restart**: When the Build capability is disabled at startup, the operator skips the build config informer (avoiding a deadlock in the ConfigObserver — OCPBUGS-22956) and polls ClusterVersion every 5 minutes. If the capability becomes enabled, the operator exits to trigger a pod restart with full re-initialization. This is intentional: capabilities can only be enabled post-install (never disabled), and the newly-enabled informers require a clean startup. The [component-selection enhancement](https://github.com/openshift/enhancements/blob/master/enhancements/installer/component-selection.md) keeps capability awareness at the CVO layer and does not prescribe an operator-level reactive pattern. | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| AGENTS.md |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,71 @@ | ||
| # Contributing to cluster-openshift-controller-manager-operator | ||
|
|
||
| ## Development Workflow | ||
|
|
||
| 1. Fork the repo and clone your fork | ||
| 2. Create a feature branch from `master` | ||
| 3. Make your changes, add or update tests | ||
| 4. Run verification locally: | ||
| ```bash | ||
| make build verify test | ||
| ``` | ||
| 5. If you changed dependencies: `go mod tidy && go mod vendor`, commit separately from code changes | ||
| 6. Push your branch and open a PR | ||
|
|
||
| ## Pull Request Guidelines | ||
|
|
||
| - Keep PRs focused — one logical change per PR | ||
| - Reference JIRA tickets in PR title: `OCPBUGS-XXXXX: description` or `CNTRLPLANE-XXXXX: description` | ||
| - Include tests for new functionality | ||
| - PRs require `/lgtm` from a reviewer and `/approve` from an approver (see OWNERS) | ||
| - All PRs require `/verified` before merge (see OpenShift verification workflow) | ||
|
|
||
| ## Testing | ||
|
|
||
| | Command | What It Runs | | ||
| |---------|-------------| | ||
| | `make build` | Builds the operator and OTE test binaries | | ||
| | `make test` | Unit tests in `pkg/...` and `cmd/...` | | ||
| | `make verify` | Linting, formatting, vet checks | | ||
| | `make test-e2e` | E2E tests against a live cluster (requires `KUBECONFIG`) | | ||
|
|
||
| ### E2E Tests | ||
|
|
||
| E2E tests use Ginkgo and require a running OpenShift cluster. They validate network policy enforcement and operator health. | ||
|
|
||
| ```bash | ||
| make test-e2e | ||
| ``` | ||
|
|
||
| ## Code Conventions | ||
|
|
||
| - **Error handling**: Use `fmt.Errorf` with `%w` for wrapping. Accumulate errors in slices when syncing multiple resources, then report all at once. | ||
| - **Status conditions**: Use library-go `v1helpers.SetOperatorCondition` helpers. Prefix condition types with `RouteControllerManager` for route-controller-manager conditions to distinguish from openshift-controller-manager conditions. | ||
| - **Config observation**: Each observer returns `(observedConfig, errors)`. On error, return the previous config to avoid flapping. | ||
| - **Static resources**: RBAC and namespace manifests go in `bindata/assets/`. Network policy allow rules must be listed before default-deny rules in the StaticResourceController to avoid traffic interruption during reconciliation. | ||
|
|
||
| ## Areas Requiring Extra Care | ||
|
|
||
| - **`bindata/assets/`**: Changes to static manifests affect what the operator reconciles. Network policy ordering matters (allow before deny). | ||
| - **`manifests/`**: CVO-managed resources. Changes here affect install/upgrade — coordinate with the release team. | ||
| - **Capability integration**: Adding or removing controllers from the capability-disable lists affects which controllers run on clusters with reduced capability sets. Test with capabilities both enabled and disabled. | ||
| - **Vendor directory**: Committed. Run `go mod tidy && go mod vendor` and commit separately from code changes. | ||
|
|
||
| ## CI Pipeline | ||
|
|
||
| CI is managed by Prow via [ci-operator](https://docs.ci.openshift.org/docs/architecture/ci-operator/). The build root image is configured in `.ci-operator.yaml`. Required checks include unit tests, build verification, and e2e tests on a live cluster. | ||
|
|
||
| ## Review and Approval | ||
|
|
||
| Reviews and approvals follow the [OWNERS](OWNERS) file. The primary approver groups are `control-plane-approvers` and build-related maintainers. See [OWNERS_ALIASES](OWNERS_ALIASES) for the full list of control-plane approvers. | ||
|
|
||
| ## Rebase Checklist | ||
|
|
||
| When rebasing to a new Kubernetes release: | ||
|
|
||
| - [ ] Reference the target [kubernetes release branch](https://github.com/kubernetes/kubernetes/branches) `go.mod` and `CHANGELOG` | ||
| - [ ] Bump Go version, all `k8s.io/`, `github.com/openshift/`, and other relevant dependencies | ||
| - [ ] Run `go mod vendor && go mod tidy`, commit separately | ||
| - [ ] Bump image versions (Dockerfile, `.ci-operator.yaml`) if needed | ||
| - [ ] Run `make build verify test` | ||
| - [ ] Make code changes as needed until the above pass |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Label the diagram fence.
This block will trip markdownlint as written. Use an explicit language tag such as
text.Suggested fix
📝 Committable suggestion
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 26-26: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Source: Linters/SAST tools