diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 000000000..7f3088ee3 --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,80 @@ +# AI Agent Instructions for openshift-controller-manager + +> Also read [ARCHITECTURE.md](ARCHITECTURE.md) for design decisions and +> [CONTRIBUTING.md](CONTRIBUTING.md) for workflow. + +## What This Repo Is + +The OpenShift Controller Manager (OCM) runs 19 controllers that reconcile OpenShift-specific API +resources: Builds, DeploymentConfigs, ImageStreams, TemplateInstances, Projects, plus cross-cutting +controllers for authorization, image pull secrets, and service unidling. It is deployed by the +[cluster-openshift-controller-manager-operator](https://github.com/openshift/cluster-openshift-controller-manager-operator). + +## Repository Layout + +```text +cmd/openshift-controller-manager/ # Main binary entrypoint +cmd/openshift-controller-manager-tests-ext/ # OTE test binary +pkg/cmd/controller/ # Controller registry and init functions (start here) +pkg/cmd/openshift-controller-manager/ # Server bootstrap and config +pkg/apps/ # DeploymentConfig controllers (DEPRECATED) +pkg/build/ # Build controllers +pkg/image/ # ImageStream controllers and triggers +pkg/template/ # TemplateInstance controllers +pkg/project/ # Namespace finalizer controller +pkg/unidling/ # Service unidling controller +pkg/authorization/ # Default role binding controllers +pkg/internalregistry/ # Internal registry pull secret controllers (6 sub-controllers) +vendor/ # Vendored dependencies +``` + +## Build and Test Commands + +```bash +make build # build binaries +make test-unit # run unit tests +make verify # gofmt, govet, version checks +``` + +## Critical Rules + +1. **Never import `controller-runtime`** — this repo uses raw client-go. Mixing frameworks + breaks informer sharing and creates subtle cache inconsistencies. +2. **Never modify `vendor/` in the same commit as code changes** — always commit vendor updates + separately for reviewable diffs. +3. 
**DeploymentConfig is deprecated** — only critical and security fixes in `pkg/apps/`. Do not + add features or refactor beyond what is required for the fix. +4. **Pull secret controllers are channel-coordinated** — the 6 sub-controllers in + `pkg/internalregistry/` communicate via Go channels with blocking startup semantics. + Changes to one controller may break the startup ordering of others. + +## Key Patterns + +- **Controller registration:** All controllers are registered in `pkg/cmd/controller/config.go` + in the `ControllerInitializers` map. Each controller has an `InitFunc` in a corresponding file + under `pkg/cmd/controller/`. +- **Per-controller service accounts:** Each controller runs with its own SA and scoped RBAC. + SA names are constants in `pkg/cmd/controller/config.go`. +- **Capabilities integration:** Build and DC controllers can be disabled. Check + `IsControllerEnabled()` and conditional informer startup in `StartInformers()`. +- **Error classification (apps):** `fatalError` (never retried), `actionableError` (retried with + warning), regular errors (retried silently). Retry limits range from 5 to 15 by controller. +- **SSA in internalregistry:** Pull secret controllers use Server-Side Apply with distinct field + manager strings to avoid conflicts. 
+ +## What NOT to Do + +- Do not add new controllers without a corresponding entry in `ControllerInitializers` and a + dedicated service account +- Do not start `AppsInformers` or `BuildInformers` unconditionally — they are gated on controller + enablement to avoid unnecessary API watches +- Do not use status subresources for DeploymentConfig state — the annotation-driven state machine + on ReplicationControllers is the API contract +- Do not delete or modify the `openshift.io/legacy-token` finalizer logic without understanding + the rollback controller interaction + +## Test Suites + +- **Unit tests:** `make test-unit` — colocated `_test.go` files throughout `pkg/` +- **OTE:** `./openshift-controller-manager-tests-ext run-suite openshift/openshift-controller-manager/conformance/parallel` +- **E2E:** Run via `openshift/origin` against a live cluster (not in this repo) diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md new file mode 100644 index 000000000..e23b61c56 --- /dev/null +++ b/ARCHITECTURE.md @@ -0,0 +1,157 @@ +# Architecture: openshift-controller-manager + +## Scope + +The OpenShift Controller Manager (OCM) runs controllers that reconcile OpenShift-specific API +resources. It manages the lifecycle of Builds, DeploymentConfigs, ImageStreams, TemplateInstances, +and Projects, and provides cross-cutting controllers for authorization, image pull secrets, and +service unidling. + +**Does not manage:** Routes (moved to [route-controller-manager](https://github.com/openshift/route-controller-manager) in 4.12), +resource quotas, or security context constraints (moved out during the 2019 split from openshift/origin). 
+ +## Namespace Map + +| Namespace | Purpose | +|-----------|---------| +| `openshift-controller-manager` | Operand pods, serving-cert secrets, leader election Lease | +| `openshift-infra` | Service accounts used by controllers (e.g., `build-controller`, `deployer-controller`) | +| `openshift-config` | Cluster-wide CA bundles and proxy config consumed by the build controller | +| `openshift-image-registry` | Registry Service watched by the registry URL observation controller | +| `openshift-kube-apiserver` | Bound SA signing key secret watched by the key ID observation controller | + +## Component Overview + +The binary starts via `library-go`'s `NewControllerCommandConfig`, reads an +`OpenShiftControllerManagerConfig`, waits for a healthy API server, then starts each enabled +controller from a central registry of 19 controllers plus 1 rollback controller. Each controller +gets its own service account and rate-limited client. Leader election ensures single-active-instance +semantics. + +## Controllers + +| Controller | API Group | Watches | Reconciles | +|-----------|-----------|---------|------------| +| `openshift.io/deployer` | apps | RC, Pod | Creates deployer pods, tracks deployment lifecycle | +| `openshift.io/deploymentconfig` | apps | DC, RC | Triggers rollouts, manages RC replicas, cleans up old revisions | +| `openshift.io/build` | build | Build, Pod, IS, ConfigMaps, Secrets, cluster configs | Runs build lifecycle: resolves images, creates build pods, tracks completion | +| `openshift.io/build-config-change` | build | BuildConfig | Instantiates first build on ConfigChange trigger | +| `openshift.io/image-import` | image | ImageStream | One-shot imports for new/updated ImageStreams | +| `openshift.io/image-trigger` | image | IS, DC, BC, Deployment, DaemonSet, StatefulSet, CronJob | Propagates IS tag changes to dependent resources | +| `openshift.io/image-signature-import` | image | Image | Downloads container image signatures from registries | +| 
`openshift.io/templateinstance` | template | TemplateInstance | Processes templates, creates objects, polls readiness | +| `openshift.io/templateinstancefinalizer` | template | TemplateInstance | Deletes template-created objects on TI deletion | +| `openshift.io/origin-namespace` | project | Namespace | Removes `openshift.io/origin` finalizer during namespace deletion | +| `openshift.io/unidling` | — | Event (NeedPods) | Scales idled services back up on traffic | +| `openshift.io/serviceaccount` | — | SA | Provisions `builder` and `deployer` service accounts | +| `openshift.io/serviceaccount-pull-secrets` | — | SA, Secret, Service | Manages bound-token image pull secrets for the internal registry | +| `openshift.io/default-rolebindings` | — | Namespace, RoleBinding | Ensures `image-pullers`, `image-builders`, `deployers` bindings | +| `openshift.io/builder-serviceaccount` | — | SA | Provisions `builder` SA (capability-scoped) | +| `openshift.io/deployer-serviceaccount` | — | SA | Provisions `deployer` SA (capability-scoped) | +| `openshift.io/builder-rolebindings` | — | Namespace, RoleBinding | `system:image-builders` binding (capability-scoped) | +| `openshift.io/deployer-rolebindings` | — | Namespace, RoleBinding | `system:deployers` binding (capability-scoped) | +| `openshift.io/image-puller-rolebindings` | — | Namespace, RoleBinding | `system:image-pullers` binding (capability-scoped) | + +The pull-secrets controller (`openshift.io/serviceaccount-pull-secrets`) is internally composed of +6 sub-controllers coordinated via Go channels: `ServiceAccountController`, +`ImagePullSecretController`, `RegistryURLObservationController`, `KeyIDObservationController`, +`LegacyImagePullSecretController`, and `LegacyTokenSecretController`. + +**Rollback controllers:** The `RollbackControllers` map in `config.go` registers cleanup/rollback +logic that runs *instead of* a disabled controller. 
When `startControllers()` skips a disabled +controller, `startRollbackControllers()` checks this map and starts the corresponding rollback +init function. This pattern is extensible — any controller that needs cleanup behavior when +disabled can register a rollback entry. Currently only `serviceaccount-pull-secrets` has one +(the `legacyImagePullSecretRollbackController`). + +## Capabilities Integration + +Since OpenShift 4.14, Build and DeploymentConfig APIs can be disabled at install time via the +Capabilities API. Controllers were refactored into independently-disablable units: + +- The monolithic service account controller was split into `builder-serviceaccount` and + `deployer-serviceaccount`. +- The role bindings controller was split into `builder-rolebindings`, `deployer-rolebindings`, + and `image-puller-rolebindings`. +- `AppsInformers` and `BuildInformers` only start when their controllers are enabled. +- The image trigger controller conditionally registers DC and BuildConfig sources. + +## Manifest and Resource Management + +OCM is deployed by the +[cluster-openshift-controller-manager-operator](https://github.com/openshift/cluster-openshift-controller-manager-operator), +not directly by the CVO. The operator manages the Deployment, ServiceAccount, ConfigMap +(controller config), and RBAC resources. OCM itself does not own CRDs — it consumes types defined +in [openshift/api](https://github.com/openshift/api). + +## Dependencies + +| Dependency | Role | +|-----------|------| +| `k8s.io/*` (v1.35) | Client-go, informers, workqueues, API machinery | +| `github.com/openshift/api` | OpenShift API type definitions | +| `github.com/openshift/client-go` | Typed OpenShift clients and informers | +| `github.com/openshift/library-go` | `controllercmd`, leader election, unidling client | +| `github.com/containers/image` | Image signature downloads (signature import controller) | + +## Testing Strategy + +- **Unit tests:** Colocated `_test.go` files in each package. 
Table-driven tests for build + strategies, readiness checks, deployer pod creation, and pull secret controllers. +- **OTE (OpenShift Tests Extension):** Binary `openshift-controller-manager-tests-ext` shipped in + the container image. Single-module architecture under `.openshift-tests-extension/`. +- **E2E:** Run via the `openshift/origin` test suite against a live cluster. No in-repo e2e tests. + +## Design Decisions + +1. **Split from openshift/origin (2018–2019):** OCM was extracted from the monolithic origin repo + to enable independent release cycles. The git history carries the full origin lineage back to + 2014. Quota and SCC controllers were removed during this split. + +2. **No controller-runtime:** OCM uses raw client-go informers and workqueues throughout. While + this predates controller-runtime, it is also an active choice for performance and memory + consumption. Raw informers allow precise control over which events trigger handlers — for + example, the pull secret controllers filter by secret type and annotation at the handler level, + avoiding unnecessary queue churn in clusters with large numbers of secrets and service accounts. + +3. **Annotation-driven deployment state machine:** DeploymentConfig stores deployment status in RC + annotations rather than a status subresource — a legacy pattern predating Kubernetes status + conventions. This is load-bearing and cannot be changed without breaking the DC API contract. + +4. **Per-controller service accounts:** Each controller runs with its own SA and scoped RBAC, + following least-privilege. Client QPS is divided across controllers (original/10+1 per client), + with high-rate controllers (pull secrets) getting a dedicated 100+ QPS path. + +5. **Bucketed image import scheduler:** The `ScheduledImageStreamController` uses a custom bucketed + scheduler instead of a standard workqueue, distributing import load evenly across time windows + to avoid thundering-herd effects on external registries. + +6. 
**Producer-consumer channel coordination for pull secrets:** The internal registry controllers + use Go channels (not informers) to coordinate registry URL and signing key discovery. The pull + secret controller blocks startup until both producers have emitted, preventing secret creation + with incomplete registry information. + +7. **DeploymentConfig is deprecated but permanent:** Deprecated since OCP 4.14 (OCPSTRAT-1465). + Only critical and security fixes accepted. Depends on the upstream ReplicationController API. + Removal version set to `v4.10000` — effectively never within the 4.x line. + +8. **Route controllers extracted (4.12):** Moved to a standalone + [route-controller-manager](https://github.com/openshift/route-controller-manager) process, + partly driven by HyperShift's need for a separate deployment topology. Source code was fully + removed from OCM on master (4.13). The Route API itself remains in openshift-apiserver for + standard OCP, though MicroShift already serves routes as CRDs and shared validation is being + consolidated in library-go. + +9. **Capability-scoped controller splits (4.14–4.16):** Service account and role binding controllers + were split from monolithic instances into per-capability units so Build and DeploymentConfig + functionality can be individually disabled without affecting core image-puller bindings. + +10. **Pull secret cleanup lives in the operator (tech debt):** When the internal image registry is + disabled, the operator's `ImagePullSecretCleanupController` deletes managed pull secrets while + simultaneously reconfiguring OCM to disable the creation controller. Because these are separate + processes with different lifecycles, there is a race: OCM may still be running with old config + and recreating secrets that the cleanup controller is deleting. A 10-minute grace period + (OCPBUGS-34054) papers over this but is explicitly a stop-gap. 
The proper fix is to move + cleanup into OCM as a rollback controller — the same pattern established for the + `legacyImagePullSecretRollbackController` (PR #380, OCPBUGS-52193) — so that creation and + deletion are mutually exclusive within the same process. diff --git a/CLAUDE.md b/CLAUDE.md new file mode 120000 index 000000000..47dc3e3d8 --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1 @@ +AGENTS.md \ No newline at end of file diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md new file mode 100644 index 000000000..c348a097d --- /dev/null +++ b/CONTRIBUTING.md @@ -0,0 +1,80 @@ +# Contributing to openshift-controller-manager + +## Prerequisites + +- Go 1.25+ +- An OpenShift cluster (for e2e testing via openshift/origin) + +## Development Workflow + +1. Fork the repo and clone your fork +2. Create a feature branch from `master` +3. Make your changes, add or update tests +4. Run verification locally: + ```bash + make build verify test-unit + ``` +5. If you changed dependencies: `go mod tidy && go mod vendor` (commit vendor separately) +6. Push your branch and open a PR + +## Pull Request Guidelines + +- Keep PRs focused — one logical change per PR +- Reference JIRA tickets in the PR title: `OCPBUGS-XXXXX: description` or `CNTRLPLANE-XXXX: description` +- Include unit tests for new functionality +- PRs require `/lgtm` from a reviewer and `/approve` from an approver (see OWNERS files) +- All PRs require `/verified` before merge + +## Building + +| Command | What It Does | +|---------|-------------| +| `make build` | Builds the `openshift-controller-manager` and OTE test binaries | +| `make test-unit` | Runs unit tests in `./pkg/...` and `./cmd/...` | +| `make verify` | Runs `gofmt`, `govet`, and Go version checks | +| `make update-gofmt` | Auto-formats Go source files | +| `make image-openshift-controller-manager` | Builds the container image | +| `make vulncheck` | Runs govulncheck | + +## Testing + +Unit tests are colocated with source files. 
Run `make test-unit` to execute them. + +E2E tests run via the [openshift/origin](https://github.com/openshift/origin) test suite against a +live cluster — there are no in-repo e2e tests. + +The repo also ships an OTE binary (`openshift-controller-manager-tests-ext`) in the container +image. See the [README](README.md) for OTE usage. + +## Code Conventions + +- No `controller-runtime` — use raw `client-go` informers and workqueues +- Each controller gets its own service account and init function in `pkg/cmd/controller/` +- Retry limits range from 5 to 15 depending on the controller, with exponential backoff +- Use Server-Side Apply (SSA) with distinct field manager strings for new controllers +- Build and deployer pod specs live in `pkg/build/controller/strategy/` and + `pkg/apps/deployer/` respectively + +## Areas Requiring Extra Care + +- **`vendor/`** — always commit separately from code changes; run `go mod tidy && go mod vendor` +- **Capabilities integration** — changes to controller registration must account for Build and + DeploymentConfig being optional (disabled via Capabilities API) +- **Pull secret controllers** (`pkg/internalregistry/`) — coordinated via Go channels; + changes to one sub-controller may affect startup ordering +- **DeploymentConfig** (`pkg/apps/`) — deprecated since OCP 4.14; only critical and security + fixes accepted +- **Build pod security** — build pods run with specific security contexts; changes require + careful review for privilege escalation + +## CI Pipeline + +CI runs via Prow and ci-operator. The build root image is configured in `.ci-operator.yaml`. +Full job definitions live in the [openshift/release](https://github.com/openshift/release) +repository. + +## Review and Approval + +The repo uses Prow's OWNERS-based review system. See the `OWNERS` and `OWNERS_ALIASES` files at +the repo root and in subdirectories for the current reviewer and approver lists. 
PRs touching +multiple areas may need approvers from each relevant OWNERS file. diff --git a/README.md b/README.md index 9c7ea66d8..9252b634f 100644 --- a/README.md +++ b/README.md @@ -1,68 +1,73 @@ # OpenShift Controller Manager -The OpenShift Controller Manager (OCM) is comprised of multiple controllers, many of which -correspond to a top-level OpenShift API object, watching for changes and acting accordingly. -The controllers are generally organized by API group: +The OpenShift Controller Manager (OCM) runs controllers that reconcile OpenShift-specific API +resources. It is a core control-plane component deployed by the +[cluster-openshift-controller-manager-operator](https://github.com/openshift/cluster-openshift-controller-manager-operator). -- `apps.openshift.io` - OpenShift-specific workloads, like `DeploymentConfig`. -- `build.openshift.io` - OpenShift `Builds` and `BuildConfigs`. -- `image.openshift.io` - `ImageStreams` and `Images`. -- `project.openshift.io` - Projects, OpenShift's wrapper for `Namespaces`. -- `template.openshift.io` - OpenShift `Templates` - a simple way to deploy applications. +Controllers are organized by API group: -There are additional controllers which add OpenShift-specific capabilities to the cluster: +- `apps.openshift.io` — DeploymentConfig lifecycle and deployer pods +- `build.openshift.io` — Build and BuildConfig reconciliation +- `image.openshift.io` — ImageStream imports, triggers, and signature verification +- `project.openshift.io` — Project/namespace finalizer +- `template.openshift.io` — TemplateInstance processing and cleanup -- `authorization` - provides default service account role bindings for OpenShift projects. -- `serviceaccounts` - manages secrets that allow images to be pulled and pushed from the - [OpenShift image registry](https://github.com/openshift/image-registry). -- `unidling` - manages unidling of applications when inbound network traffic is detected. 
See the - [OpenShift docs](https://docs.openshift.com/container-platform/latest/applications/idling-applications.html#idle-unidling-applications_idling-applications) - for more information. +Additional cross-cutting controllers handle default role bindings, image pull secret management +for the internal registry, and service unidling. -## Metrics - -Many of the controllers expose metrics which are visible in the default OpenShift monitoring system -(Prometheus). See [metrics](docs/metrics.md) for a detailed list of exposed metrics for each API -group. +## Quick Start -## Rebase -Follow this checklist and copy into the PR: +### Prerequisites -- [ ] Select the desired [kubernetes release branch](https://github.com/kubernetes/kubernetes/branches), and use its `go.mod` and `CHANGELOG` as references for the rest of the work. -- [ ] Bump go version if needed. -- [ ] Bump `require`s and `replace`s for `k8s.io/`, `github.com/openshift/`, and relevant deps. -- [ ] Run `go mod vendor && go mod tidy`, commit `vendor` folder separately from all other changes. -- [ ] Bump image versions (Dockerfile, ci...) if needed. -- [ ] Run `make build verify test`. -- [ ] Make code changes as needed until the above pass. -- [ ] Any other minor update, like documentation. -## Tests +- Go 1.25+ +- An OpenShift cluster (OCM cannot run standalone) -This repository is compatible with the "OpenShift Tests Extension (OTE)" framework. 
+### Building -### Building the test binary ```bash -make build +make build # builds the openshift-controller-manager and OTE test binaries +make verify # runs gofmt, govet, and Go version checks ``` -### Running test suites and tests -```bash -# Run a specific test suite or test -./openshift-controller-manager-tests-ext run-suite openshift/openshift-controller-manager/all -./openshift-controller-manager-tests-ext run-test "test-name" +### Running Tests -# Run with JUnit output -./openshift-controller-manager-tests-ext run-suite openshift/openshift-controller-manager/all --junit-path=/tmp/junit-results/junit.xml -./openshift-controller-manager-tests-ext run-test "test-name" --junit-path=/tmp/junit-results/junit.xml +```bash +make test-unit # runs unit tests in ./pkg/... and ./cmd/... ``` -### Listing available tests and suites +### OTE (OpenShift Tests Extension) + ```bash -# List all test suites +make build ./openshift-controller-manager-tests-ext list-suites - -# List tests in a specific suite -./openshift-controller-manager-tests-ext list-tests openshift/openshift-controller-manager/all +./openshift-controller-manager-tests-ext run-suite openshift/openshift-controller-manager/conformance/parallel ``` -The test extension binary is included in the production image for CI/CD integration. +## Rebase Checklist + +- [ ] Check the target [kubernetes release branch](https://github.com/kubernetes/kubernetes/branches) `go.mod` and `CHANGELOG` +- [ ] Bump Go version if needed +- [ ] Bump `k8s.io/`, `github.com/openshift/`, and relevant deps in `go.mod` +- [ ] Run `go mod tidy && go mod vendor`, commit vendor separately +- [ ] Bump image versions in Dockerfile and `.ci-operator.yaml` if needed +- [ ] Run `make build verify test-unit` +- [ ] Fix any compilation or test failures from upstream API changes + +## Metrics + +Controllers expose Prometheus metrics visible in the default OpenShift monitoring stack. +See [docs/metrics.md](docs/metrics.md) for the full list.
+ +## Documentation + +- [ARCHITECTURE.md](ARCHITECTURE.md) — Design decisions and component architecture +- [CONTRIBUTING.md](CONTRIBUTING.md) — How to submit changes +- [AGENTS.md](AGENTS.md) — AI agent instructions + +## Related Repositories + +- [cluster-openshift-controller-manager-operator](https://github.com/openshift/cluster-openshift-controller-manager-operator) — Operator that deploys OCM +- [openshift/api](https://github.com/openshift/api) — API type definitions +- [openshift/library-go](https://github.com/openshift/library-go) — Shared controller libraries +- [route-controller-manager](https://github.com/openshift/route-controller-manager) — Route controllers (extracted from OCM in 4.12) +- [openshift/origin](https://github.com/openshift/origin) — E2E test suite