80 changes: 80 additions & 0 deletions AGENTS.md
@@ -0,0 +1,80 @@
# AI Agent Instructions for openshift-controller-manager

> Also read [ARCHITECTURE.md](ARCHITECTURE.md) for design decisions and
> [CONTRIBUTING.md](CONTRIBUTING.md) for workflow.

## What This Repo Is

The OpenShift Controller Manager (OCM) runs 19 controllers that reconcile OpenShift-specific API
resources: Builds, DeploymentConfigs, ImageStreams, TemplateInstances, Projects, plus cross-cutting
controllers for authorization, image pull secrets, and service unidling. It is deployed by the
[cluster-openshift-controller-manager-operator](https://github.com/openshift/cluster-openshift-controller-manager-operator).

## Repository Layout

```text
cmd/openshift-controller-manager/ # Main binary entrypoint
cmd/openshift-controller-manager-tests-ext/ # OTE test binary
pkg/cmd/controller/ # Controller registry and init functions (start here)
pkg/cmd/openshift-controller-manager/ # Server bootstrap and config
pkg/apps/ # DeploymentConfig controllers (DEPRECATED)
pkg/build/ # Build controllers
pkg/image/ # ImageStream controllers and triggers
pkg/template/ # TemplateInstance controllers
pkg/project/ # Namespace finalizer controller
pkg/unidling/ # Service unidling controller
pkg/authorization/ # Default role binding controllers
pkg/internalregistry/ # Internal registry pull secret controllers (6 sub-controllers)
vendor/ # Vendored dependencies
```

## Build and Test Commands

```bash
make build # build binaries
make test-unit # run unit tests
make verify # gofmt, govet, version checks
```

## Critical Rules

1. **Never import `controller-runtime`** — this repo uses raw client-go. Mixing frameworks
breaks informer sharing and creates subtle cache inconsistencies.
2. **Never modify `vendor/` in the same commit as code changes** — always commit vendor updates
separately for reviewable diffs.
3. **DeploymentConfig is deprecated** — only critical and security fixes in `pkg/apps/`. Do not
add features or refactor beyond what is required for the fix.
4. **Pull secret controllers are channel-coordinated** — the 6 sub-controllers in
`pkg/internalregistry/` communicate via Go channels with blocking startup semantics.
Changes to one controller may break the startup ordering of others.

## Key Patterns

- **Controller registration:** All controllers are registered in `pkg/cmd/controller/config.go`
in the `ControllerInitializers` map. Each controller has an `InitFunc` in a corresponding file
under `pkg/cmd/controller/`.
- **Per-controller service accounts:** Each controller runs with its own SA and scoped RBAC.
SA names are constants in `pkg/cmd/controller/config.go`.
- **Capabilities integration:** Build and DC controllers can be disabled. Check
`IsControllerEnabled()` and conditional informer startup in `StartInformers()`.
- **Error classification (apps):** `fatalError` (never retried), `actionableError` (retried with
warning), regular errors (retried silently). Retry limits range from 5 to 15 by controller.
- **SSA in internalregistry:** Pull secret controllers use Server-Side Apply with distinct field
manager strings to avoid conflicts.

## What NOT to Do

- Do not add new controllers without a corresponding entry in `ControllerInitializers` and a
dedicated service account
- Do not start `AppsInformers` or `BuildInformers` unconditionally — they are gated on controller
enablement to avoid unnecessary API watches
- Do not use status subresources for DeploymentConfig state — the annotation-driven state machine
on ReplicationControllers is the API contract
- Do not delete or modify the `openshift.io/legacy-token` finalizer logic without understanding
the rollback controller interaction

## Test Suites

- **Unit tests:** `make test-unit` — colocated `_test.go` files throughout `pkg/`
- **OTE:** `./openshift-controller-manager-tests-ext run-suite openshift/openshift-controller-manager/conformance/parallel`
- **E2E:** Run via `openshift/origin` against a live cluster (not in this repo)
157 changes: 157 additions & 0 deletions ARCHITECTURE.md
@@ -0,0 +1,157 @@
# Architecture: openshift-controller-manager

## Scope

The OpenShift Controller Manager (OCM) runs controllers that reconcile OpenShift-specific API
resources. It manages the lifecycle of Builds, DeploymentConfigs, ImageStreams, TemplateInstances,
and Projects, and provides cross-cutting controllers for authorization, image pull secrets, and
service unidling.

**Does not manage:** Routes (moved to [route-controller-manager](https://github.com/openshift/route-controller-manager) in 4.12),
resource quotas, or security context constraints (moved out during the 2019 split from openshift/origin).

## Namespace Map

| Namespace | Purpose |
|-----------|---------|
| `openshift-controller-manager` | Operand pods, serving-cert secrets, leader election Lease |
| `openshift-infra` | Service accounts used by controllers (e.g., `build-controller`, `deployer-controller`) |
| `openshift-config` | Cluster-wide CA bundles and proxy config consumed by the build controller |
| `openshift-image-registry` | Registry Service watched by the registry URL observation controller |
| `openshift-kube-apiserver` | Bound SA signing key secret watched by the key ID observation controller |

## Component Overview

The binary starts via `library-go`'s `NewControllerCommandConfig`, reads an
`OpenShiftControllerManagerConfig`, waits for a healthy API server, then starts each enabled
controller from a central registry of 19 controllers plus 1 rollback controller. Each controller
gets its own service account and rate-limited client. Leader election ensures single-active-instance
semantics.
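
Design decision 4 below gives the per-client rate as `original/10+1`. A small worked example of that arithmetic, assuming it means "configured QPS divided by 10, plus 1"; the function name and the `float32` type are assumptions made for this sketch, not identifiers from the repo:

```go
package main

import "fmt"

// perControllerQPS applies the "original/10+1" rule quoted in the design
// decisions. The name and signature are hypothetical.
func perControllerQPS(original float32) float32 {
	return original/10 + 1
}

func main() {
	for _, qps := range []float32{20, 50, 150} {
		fmt.Printf("configured=%v per-controller=%v\n", qps, perControllerQPS(qps))
	}
}
```

Under this reading, the `+1` term keeps every client above zero even when the configured rate is small.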

## Controllers

| Controller | API Group | Watches | Reconciles |
|-----------|-----------|---------|------------|
| `openshift.io/deployer` | apps | RC, Pod | Creates deployer pods, tracks deployment lifecycle |
| `openshift.io/deploymentconfig` | apps | DC, RC | Triggers rollouts, manages RC replicas, cleans up old revisions |
| `openshift.io/build` | build | Build, Pod, IS, ConfigMaps, Secrets, cluster configs | Runs build lifecycle: resolves images, creates build pods, tracks completion |
| `openshift.io/build-config-change` | build | BuildConfig | Instantiates first build on ConfigChange trigger |
| `openshift.io/image-import` | image | ImageStream | One-shot imports for new/updated ImageStreams |
| `openshift.io/image-trigger` | image | IS, DC, BC, Deployment, DaemonSet, StatefulSet, CronJob | Propagates IS tag changes to dependent resources |
| `openshift.io/image-signature-import` | image | Image | Downloads container image signatures from registries |
| `openshift.io/templateinstance` | template | TemplateInstance | Processes templates, creates objects, polls readiness |
| `openshift.io/templateinstancefinalizer` | template | TemplateInstance | Deletes template-created objects on TI deletion |
| `openshift.io/origin-namespace` | project | Namespace | Removes `openshift.io/origin` finalizer during namespace deletion |
| `openshift.io/unidling` | — | Event (NeedPods) | Scales idled services back up on traffic |
| `openshift.io/serviceaccount` | — | SA | Provisions `builder` and `deployer` service accounts |
| `openshift.io/serviceaccount-pull-secrets` | — | SA, Secret, Service | Manages bound-token image pull secrets for the internal registry |
| `openshift.io/default-rolebindings` | — | Namespace, RoleBinding | Ensures `image-pullers`, `image-builders`, `deployers` bindings |
| `openshift.io/builder-serviceaccount` | — | SA | Provisions `builder` SA (capability-scoped) |
| `openshift.io/deployer-serviceaccount` | — | SA | Provisions `deployer` SA (capability-scoped) |
| `openshift.io/builder-rolebindings` | — | Namespace, RoleBinding | `system:image-builders` binding (capability-scoped) |
| `openshift.io/deployer-rolebindings` | — | Namespace, RoleBinding | `system:deployers` binding (capability-scoped) |
| `openshift.io/image-puller-rolebindings` | — | Namespace, RoleBinding | `system:image-pullers` binding (capability-scoped) |

The pull-secrets controller (`openshift.io/serviceaccount-pull-secrets`) is internally composed of
6 sub-controllers coordinated via Go channels: `ServiceAccountController`,
`ImagePullSecretController`, `RegistryURLObservationController`, `KeyIDObservationController`,
`LegacyImagePullSecretController`, and `LegacyTokenSecretController`.

**Rollback controllers:** The `RollbackControllers` map in `config.go` registers cleanup/rollback
logic that runs *instead of* a disabled controller. When `startControllers()` skips a disabled
controller, `startRollbackControllers()` checks this map and starts the corresponding rollback
init function. This pattern is extensible — any controller that needs cleanup behavior when
disabled can register a rollback entry. Currently only `serviceaccount-pull-secrets` has one
(the `legacyImagePullSecretRollbackController`).
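
The rollback pattern can be illustrated with a simplified sketch. It collapses `startControllers()` and `startRollbackControllers()` into a single hypothetical `startAll` function, and the `InitFunc` signature here is an assumption made to keep the example self-contained; the real init functions take a controller context.

```go
package main

import (
	"fmt"
	"sort"
)

// InitFunc starts one controller. This hypothetical signature only returns
// a label recording what was started.
type InitFunc func() string

// startAll starts every enabled controller and, for each disabled one, the
// rollback init function registered under the same name, if any.
func startAll(initializers, rollbacks map[string]InitFunc, enabled func(string) bool) []string {
	names := make([]string, 0, len(initializers))
	for name := range initializers {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic order, for the example only

	var started []string
	for _, name := range names {
		if enabled(name) {
			started = append(started, initializers[name]())
			continue
		}
		if rollback, ok := rollbacks[name]; ok {
			started = append(started, rollback())
		}
	}
	return started
}

func main() {
	initializers := map[string]InitFunc{
		"openshift.io/build":                       func() string { return "build" },
		"openshift.io/serviceaccount-pull-secrets": func() string { return "pull-secrets" },
	}
	rollbacks := map[string]InitFunc{
		"openshift.io/serviceaccount-pull-secrets": func() string { return "pull-secrets-rollback" },
	}
	// Pretend only the pull-secrets controller is disabled.
	enabled := func(name string) bool {
		return name != "openshift.io/serviceaccount-pull-secrets"
	}
	fmt.Println(startAll(initializers, rollbacks, enabled))
}
```

Because the controller and its rollback share a key and the same `enabled` check, at most one of them runs in a given process.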

## Capabilities Integration

Since OpenShift 4.14, the Build and DeploymentConfig APIs can be disabled at install time via the
Capabilities API. Controllers were refactored into units that can be disabled independently:

- The monolithic service account controller was split into `builder-serviceaccount` and
`deployer-serviceaccount`.
- The role bindings controller was split into `builder-rolebindings`, `deployer-rolebindings`,
and `image-puller-rolebindings`.
- `AppsInformers` and `BuildInformers` only start when their controllers are enabled.
- The image trigger controller conditionally registers DC and BuildConfig sources.

## Manifest and Resource Management

OCM is deployed by the
[cluster-openshift-controller-manager-operator](https://github.com/openshift/cluster-openshift-controller-manager-operator),
not directly by the CVO. The operator manages the Deployment, ServiceAccount, ConfigMap
(controller config), and RBAC resources. OCM itself does not own CRDs — it consumes types defined
in [openshift/api](https://github.com/openshift/api).

## Dependencies

| Dependency | Role |
|-----------|------|
| `k8s.io/*` (v1.35) | Client-go, informers, workqueues, API machinery |
| `github.com/openshift/api` | OpenShift API type definitions |
| `github.com/openshift/client-go` | Typed OpenShift clients and informers |
| `github.com/openshift/library-go` | `controllercmd`, leader election, unidling client |
| `github.com/containers/image` | Image signature downloads (signature import controller) |

## Testing Strategy

- **Unit tests:** Colocated `_test.go` files in each package. Table-driven tests for build
strategies, readiness checks, deployer pod creation, and pull secret controllers.
- **OTE (OpenShift Tests Extension):** Binary `openshift-controller-manager-tests-ext` shipped in
the container image. Single-module architecture under `.openshift-tests-extension/`.
- **E2E:** Run via the `openshift/origin` test suite against a live cluster. No in-repo e2e tests.

## Design Decisions

1. **Split from openshift/origin (2018–2019):** OCM was extracted from the monolithic origin repo
to enable independent release cycles. The git history carries the full origin lineage back to
2014. Quota and SCC controllers were removed during this split.

2. **No controller-runtime:** OCM uses raw client-go informers and workqueues throughout. The
design predates controller-runtime, but it is also an active choice made for performance and
lower memory consumption. Raw informers allow precise control over which events trigger
handlers. For example, the pull secret controllers filter by secret type and annotation at the
handler level, avoiding unnecessary queue churn in clusters with large numbers of secrets and
service accounts.

3. **Annotation-driven deployment state machine:** DeploymentConfig stores deployment status in RC
annotations rather than a status subresource — a legacy pattern predating Kubernetes status
conventions. This is load-bearing and cannot be changed without breaking the DC API contract.

4. **Per-controller service accounts:** Each controller runs with its own SA and scoped RBAC,
following least-privilege. Client QPS is divided across controllers: each client gets the
configured QPS divided by 10, plus 1 (`original/10 + 1`), while high-rate controllers (pull
secrets) get a dedicated 100+ QPS path.

5. **Bucketed image import scheduler:** The `ScheduledImageStreamController` uses a custom bucketed
scheduler instead of a standard workqueue, distributing import load evenly across time windows
to avoid thundering-herd effects on external registries.

6. **Producer-consumer channel coordination for pull secrets:** The internal registry controllers
use Go channels (not informers) to coordinate registry URL and signing key discovery. The pull
secret controller blocks startup until both producers have emitted, preventing secret creation
with incomplete registry information.

7. **DeploymentConfig is deprecated but permanent:** Deprecated since OCP 4.14 (OCPSTRAT-1465).
Only critical and security fixes accepted. Depends on the upstream ReplicationController API.
Removal version set to `v4.10000` — effectively never within the 4.x line.

8. **Route controllers extracted (4.12):** Moved to a standalone
[route-controller-manager](https://github.com/openshift/route-controller-manager) process,
partly driven by HyperShift's need for a separate deployment topology. Source code was fully
removed from OCM on master (4.13). The Route API itself remains in openshift-apiserver for
standard OCP, though MicroShift already serves routes as CRDs and shared validation is being
consolidated in library-go.

9. **Capability-scoped controller splits (4.14–4.16):** Service account and role binding controllers
were split from monolithic instances into per-capability units so Build and DeploymentConfig
functionality can be individually disabled without affecting core image-puller bindings.

10. **Pull secret cleanup lives in the operator (tech debt):** When the internal image registry is
disabled, the operator's `ImagePullSecretCleanupController` deletes managed pull secrets while
simultaneously reconfiguring OCM to disable the creation controller. Because these are separate
processes with different lifecycles, there is a race: OCM may still be running with old config
and recreating secrets that the cleanup controller is deleting. A 10-minute grace period
(OCPBUGS-34054) papers over this but is explicitly a stop-gap. The proper fix is to move
cleanup into OCM as a rollback controller — the same pattern established for the
`legacyImagePullSecretRollbackController` (PR #380, OCPBUGS-52193) — so that creation and
deletion are mutually exclusive within the same process.
1 change: 1 addition & 0 deletions CLAUDE.md
80 changes: 80 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,80 @@
# Contributing to openshift-controller-manager

## Prerequisites

- Go 1.25+
- An OpenShift cluster (for e2e testing via openshift/origin)

## Development Workflow

1. Fork the repo and clone your fork
2. Create a feature branch from `master`
3. Make your changes, add or update tests
4. Run verification locally:
```bash
make build verify test-unit
```
5. If you changed dependencies: `go mod tidy && go mod vendor` (commit vendor separately)
6. Push your branch and open a PR

## Pull Request Guidelines

- Keep PRs focused — one logical change per PR
- Reference JIRA tickets in the PR title: `OCPBUGS-XXXXX: description` or `CNTRLPLANE-XXXX: description`
- Include unit tests for new functionality
- PRs require `/lgtm` from a reviewer and `/approve` from an approver (see OWNERS files)
- All PRs require `/verified` before merge

## Building

| Command | What It Does |
|---------|-------------|
| `make build` | Builds the `openshift-controller-manager` and OTE test binaries |
| `make test-unit` | Runs unit tests in `./pkg/...` and `./cmd/...` |
| `make verify` | Runs `gofmt`, `govet`, and Go version checks |
| `make update-gofmt` | Auto-formats Go source files |
| `make image-openshift-controller-manager` | Builds the container image |
| `make vulncheck` | Runs govulncheck |

## Testing

Unit tests are colocated with source files. Run `make test-unit` to execute them.

E2E tests run via the [openshift/origin](https://github.com/openshift/origin) test suite against a
live cluster — there are no in-repo e2e tests.

The repo also ships an OTE binary (`openshift-controller-manager-tests-ext`) in the container
image. See the [README](README.md) for OTE usage.

## Code Conventions

- No `controller-runtime` — use raw `client-go` informers and workqueues
- Each controller gets its own service account and init function in `pkg/cmd/controller/`
- Retry limits range from 5 to 15 depending on the controller, with exponential backoff
- Use Server-Side Apply (SSA) with distinct field manager strings for new controllers
- Build and deployer pod specs live in `pkg/build/controller/strategy/` and
`pkg/apps/deployer/` respectively

## Areas Requiring Extra Care

- **`vendor/`** — always commit separately from code changes; run `go mod tidy && go mod vendor`
- **Capabilities integration** — changes to controller registration must account for Build and
DeploymentConfig being optional (disabled via Capabilities API)
- **Pull secret controllers** (`pkg/internalregistry/`) — coordinated via Go channels;
changes to one sub-controller may affect startup ordering
- **DeploymentConfig** (`pkg/apps/`) — deprecated since OCP 4.14; only critical and security
fixes accepted
- **Build pod security** — build pods run with specific security contexts; changes require
careful review for privilege escalation

## CI Pipeline

CI runs via Prow and ci-operator. The build root image is configured in `.ci-operator.yaml`.
Full job definitions live in the [openshift/release](https://github.com/openshift/release)
repository.

## Review and Approval

The repo uses Prow's OWNERS-based review system. See the `OWNERS` and `OWNERS_ALIASES` files at
the repo root and in subdirectories for the current reviewer and approver lists. PRs touching
multiple areas may need approvers from each relevant OWNERS file.