Skip to content

feat(platform): implement backup strategy for postgres#2552

Merged
androndo merged 1 commit intomainfrom
feat/pg-backups
May 7, 2026
Merged

feat(platform): implement backup strategy for postgres#2552
androndo merged 1 commit intomainfrom
feat/pg-backups

Conversation

@androndo
Copy link
Copy Markdown
Contributor

@androndo androndo commented May 1, 2026

What this PR does

Screenshots

Release note

Implemented CNPG-strategy controller for postgres backups with supporting in-place and to-copy restores.

Summary by CodeRabbit

  • New Features

    • CNPG-based backup strategy for PostgreSQL with S3-backed object store and controller support
    • Option to reference existing Kubernetes Secret for S3 credentials (configurable key names)
    • Controller supports scheduled/ad-hoc backups and both in-place and to-copy restores; safer credential handling (no inline keys)
  • Examples

    • Complete manifests: bucket, credentials, CNPG strategy, BackupClass, Plan, BackupJob, RestoreJob, Postgres source/target
  • Security/RBAC

    • Controller RBAC expanded for CNPG resources; tests enforce no cluster-wide Secret read
  • Documentation

    • README, chart values/schema and UI schema updated to document secret-reference workflow and restore guidance
  • Tests

    • New end-to-end and unit tests covering CNPG flows, restore semantics, cleanup behavior and RBAC constraints

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request implements a robust backup and restore strategy for Postgres applications using the CloudNativePG (CNPG) operator. By leveraging the CNPG operator's native capabilities, this change allows for more reliable backup management and flexible recovery options, including in-place and to-copy restores. The implementation emphasizes security by ensuring that S3 credentials are handled via references rather than direct inclusion in resource specifications, and it includes necessary controller logic, API definitions, and end-to-end tests to ensure stability.

Highlights

  • CNPG Backup Strategy Implementation: Introduced a new CNPG backup strategy that delegates backup and restore operations to the CloudNativePG operator, enabling automated backup management and point-in-time recovery.
  • Secure Credential Handling: Implemented a secure credential forwarding mechanism where S3 credentials are referenced via Kubernetes Secrets in the application namespace, ensuring sensitive keys are never exposed in the Postgres CR spec.
  • Controller Logic and RBAC: Added new controller logic for managing CNPG backup/restore lifecycles and updated RBAC permissions to allow the controller to patch Postgres applications and manage CNPG-specific resources.
  • End-to-End Testing: Added a comprehensive BATS end-to-end test suite to verify the full backup, in-place restore, and to-copy restore workflows.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Ignored Files
  • Ignored by pattern: **/zz_generated.*.go (4)
    • api/apps/v1alpha1/postgresql/zz_generated.deepcopy.go
    • api/backups/strategy/v1alpha1/zz_generated.deepcopy.go
    • internal/backupcontroller/cnpgtypes/zz_generated.deepcopy.go
    • internal/backupcontroller/postgresapp/zz_generated.deepcopy.go
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@github-actions github-actions Bot added area/platform Issues or PRs related to platform infrastructure (bundle, flux, talos, installer) size/XXL This PR changes 1000+ lines, ignoring generated files labels May 1, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 1, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds CNPG backup/restore strategy support and optional referencing of pre-existing S3 credential Secrets. Changes include new CNPG strategy API types/CRD, controller reconcile paths for CNPG backup/restore and cleanup dispatch, minimal CNPG/Postgres typed objects, Helm chart wiring to prefer external credential Secrets, example manifests, deepcopy updates, RBAC expansion, and tests.

Changes

CNPG backup strategy + S3 credential secret support

Layer / File(s) Summary
Data Shape / API types
api/backups/strategy/v1alpha1/cnpg_types.go, api/apps/v1alpha1/postgresql/types.go, internal/backupcontroller/cnpgtypes/types.go, internal/backupcontroller/postgresapp/types.go
Added CNPG strategy API types and CRD-aligned structs; added S3CredentialsSecret type and Backup.S3CredentialsSecret fields on Postgres API shape and internal Postgres app type.
Generated/handwritten deepcopy
api/backups/strategy/v1alpha1/zz_generated.deepcopy.go, api/apps/v1alpha1/postgresql/zz_generated.deepcopy.go, internal/backupcontroller/cnpgtypes/zz_generated.deepcopy.go, internal/backupcontroller/postgresapp/zz_generated.deepcopy.go
Added/updated DeepCopy implementations for new CNPG, Postgres, and S3 credential types; Backup.DeepCopyInto explicitly assigns new field.
Core controller logic
internal/backupcontroller/cnpgstrategy_controller.go, internal/backupcontroller/backup_controller.go, internal/backupcontroller/restorejob_controller.go, internal/backupcontroller/backupjob_controller.go
Implemented CNPG backup/restore reconcile flows: template rendering, SSA patching of CNPG Cluster barmanObjectStore, one-shot CNPG Backup management, polling/timeout logic, artifact creation, restore patching of Postgres app (forwards secret refs, clears inline keys), conditional cluster purge, and strategy-dispatched cleanup on Backup/Restore deletion.
Minimal runtime types & scheme wiring
internal/backupcontroller/cnpgtypes/types.go, internal/backupcontroller/postgresapp/types.go, cmd/backupstrategy-controller/main.go
Introduced minimal CNPG/Postgres runtime types, scheme builders and registered them into controller scheme at startup.
RBAC & CRD
packages/system/backupstrategy-controller/templates/rbac.yaml, packages/system/backupstrategy-controller/definitions/strategy.backups.cozystack.io_cnpgs.yaml
Added CNPG-related RBAC rules (patch/update postgreses; manage postgresql.cnpg.io clusters/backups) and a new CNPG CRD definition reflecting the strategy template schema.
Helm chart wiring & values
packages/apps/postgres/templates/db.yaml, packages/apps/postgres/templates/backup-secret.yaml, packages/apps/postgres/values.yaml, packages/apps/postgres/values.schema.json
Added backup.s3CredentialsSecret value object, computed credential Secret/key names in templates, and gated chart-managed Secret rendering when external Secret name is provided; updated schema and descriptions to mark inline keys ignored when secret-ref provided.
Examples & e2e
examples/backups/postgres/*, hack/e2e-apps/backup-postgres.bats
Added example manifests for Bucket, Postgres source/target, static credential Secrets, CNPG strategy, BackupClass, Plan, BackupJob, RestoreJobs and an E2E Bats script exercising backup + in-place and to-copy restores.
Tests & validation
internal/backupcontroller/cnpgstrategy_controller_test.go, internal/backupcontroller/backup_controller_test.go, internal/backupcontroller/restorejob_controller_test.go, internal/backupcontroller/cnpg_rbac_test.go, internal/backupcontroller/cozyrd_drift_test.go
Extensive unit tests covering template rendering, object store building, restore patch behaviour (secret-ref forwarding, key overrides, scrub), deadline gating, cleanup dispatch, RBAC (no cluster-wide Secret read), and schema drift checks.
Docs & UI schema
packages/apps/postgres/README.md, packages/system/postgres-rd/cozyrds/postgres.yaml, internal/backupcontroller/cozyrd_drift_test.go
Updated README backup guidance to prefer tenant-managed BackupClass/Plan/RestoreJob flows; added s3CredentialsSecret to UI/chart ApplicationDefinition schema and keys ordering; drift test added.
Pre-commit workflow
.github/workflows/pre-commit.yml
Pre-fetches k8s.io/code-generator into go module cache during pre-commit CI step to satisfy generate hooks.

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant BackupController
  participant CNPGStrategy
  participant PostgresApp
  participant CNPGCluster
  participant SecretStore
  User->>BackupController: Create BackupJob (strategyRef: CNPG)
  BackupController->>CNPGStrategy: Fetch strategy/template
  BackupController->>PostgresApp: Fetch source Postgres app
  BackupController->>CNPGCluster: SSA-apply patch -> spec.backup.barmanObjectStore
  CNPGCluster-->>BackupController: Patch accepted / Cluster present?
  BackupController->>CNPGCluster: Create cnpg.io/Backup (one-shot)
  CNPGCluster->>BackupController: Update Backup status (running -> completed/failed)
  BackupController->>SecretStore: Use referenced Secret name (if template provides)
  BackupController->>BackupController: Create Cozystack Backup artifact (driver metadata + snapshot)
  BackupController->>User: Update BackupJob status (Succeeded / Failed)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

"I’m a rabbit in a foldered glade,
I hopped through CRDs and templates laid.
I tucked credentials in a secret nest,
and watched CNPG backups do their best.
🐇🌿"

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/pg-backups

@dosubot dosubot Bot added area/database Issues or PRs related to managed databases (postgres, mariadb, redis, etcd, kafka, clickhouse) kind/feature Categorizes issue or PR as related to a new feature labels May 1, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new CloudNativePG (CNPG) backup and restore strategy for Postgres applications, including the necessary API types, controller logic, and Helm chart updates to support external S3 credential secrets. The review feedback highlights a critical issue with Server-Side Apply (SSA) patches missing TypeMeta, a potential data loss scenario during to-copy restores if the source application is missing, and opportunities to improve performance and race condition handling during resource purging.

Comment on lines +740 to +744
func newCNPGClusterPatch(namespace, name string) *cnpgtypes.Cluster {
return &cnpgtypes.Cluster{
ObjectMeta: metav1.ObjectMeta{Namespace: namespace, Name: name},
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The newCNPGClusterPatch helper creates a Cluster object for Server-Side Apply (SSA) but fails to initialize TypeMeta. SSA requires Kind and APIVersion to be set on the object to correctly identify the resource on the server side. Without these, the r.Patch call at line 254 may fail or behave unexpectedly.

func newCNPGClusterPatch(namespace, name string) *cnpgtypes.Cluster {
	return &cnpgtypes.Cluster{
		TypeMeta: metav1.TypeMeta{
			APIVersion: cnpgtypes.GroupVersion.String(),
			Kind:       "Cluster",
		},
		ObjectMeta: metav1.ObjectMeta{Namespace: namespace, Name: name},
	}
}

Comment thread internal/backupcontroller/cnpgstrategy_controller.go Outdated
Comment on lines +643 to +649
for i := range pvcList.Items {
pvc := &pvcList.Items[i]
if err := r.Delete(ctx, pvc); err != nil && !apierrors.IsNotFound(err) {
return fmt.Errorf("delete PVC %s/%s: %w", pvc.Namespace, pvc.Name, err)
}
}
logger.Debug("deleted cnpg cluster PVCs", "count", len(pvcList.Items))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low

When purging the existing cluster, PVCs are deleted immediately after the Cluster object. Since r.Delete is asynchronous, the pods belonging to the cluster might still be running and holding the PVCs. While Kubernetes handles this via finalizers, it might be safer to ensure pods are terminating before attempting to wipe the PVCs to avoid race conditions where the operator might try to re-attach them if the deletion is slow.

Comment on lines +707 to +717
func toJSONMap(obj interface{}) (map[string]interface{}, error) {
raw, err := json.Marshal(obj)
if err != nil {
return nil, err
}
out := map[string]interface{}{}
if err := json.Unmarshal(raw, &out); err != nil {
return nil, err
}
return out, nil
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low

The toJSONMap function performs a full JSON marshal and unmarshal on every reconcile loop where templating is required. While functional for providing a clean map for Go templates, this is computationally expensive. If performance becomes an issue, consider using a more direct reflection-based approach or caching the result if the application object hasn't changed.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🧹 Nitpick comments (1)
internal/backupcontroller/cnpgtypes/zz_generated.deepcopy.go (1)

3-5: ⚡ Quick win

Add a deepcopy regression test to guard manual drift.

Since these methods are intentionally hand-maintained (Line 3-5), add a small aliasing test that mutates copied nested fields and asserts the original is unchanged. This will catch future misses when structs in types.go evolve.

As per coding guidelines, "Use controller-runtime patterns and kubebuilder style for Go code in the Cozystack project."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/backupcontroller/cnpgtypes/zz_generated.deepcopy.go` around lines 3
- 5, Add a regression unit test in the internal/backupcontroller/cnpgtypes
package (following controller-runtime/kubebuilder test patterns) that verifies
the hand-written DeepCopy methods do not alias nested fields: create an instance
of a representative struct from types.go, call its DeepCopy/DeepCopyInto methods
(e.g., MyType.DeepCopy/DeepCopyInto or other exported types present), mutate
nested slices/maps/struct fields on the copy, and assert the original value is
unchanged; use the same test harness style as other controller tests (testing.T,
table-driven if needed) and include clear failure messages to catch future
manual-drift in the zz_generated.deepcopy implementations.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/backupcontroller/cnpg_rbac_test.go`:
- Around line 37-44: The test currently only flags rules where an APIGroups
entry equals "" and a Resources entry equals "secrets"; update the guardrail to
also treat wildcard entries as matching core/secrets. Specifically, in the loop
over rule.APIGroups and rule.Resources (variables rule.APIGroups,
rule.Resources, role.Name, i, rule.Verbs), consider group == "*" as covering the
core group and res == "*" as covering "secrets" (i.e., treat group != "" &&
group != "*" as the continue condition, and flag if res == "secrets" || res ==
"*"); keep the existing error message and parameters.

In `@internal/backupcontroller/cnpgstrategy_controller.go`:
- Around line 611-619: The current readSourceDatabasesAndUsers in
RestoreJobReconciler returns (nil, nil, nil) when getPostgresApp returns
IsNotFound, which allows restores to proceed without source DB/User info; change
this to fail the restore explicitly: in readSourceDatabasesAndUsers (and where
it calls getPostgresApp) return a clear error (e.g., wrap or return
fmt.Errorf("source PostgresApp %s/%s not found", namespace, name) or a sentinel
ErrSourcePostgresAppNotFound) instead of nil,nil,nil, and ensure the caller of
readSourceDatabasesAndUsers aborts the restore path on that error (update error
handling in the reconcile path that invokes readSourceDatabasesAndUsers).
Alternatively, if you prefer the persistence approach, implement logic in
readSourceDatabasesAndUsers to load databases/users from the Backup artifact
when getPostgresApp IsNotFound (fetch the Backup/CR and populate
src.Spec.Databases and src.Spec.Users) and return those maps so the restore can
safely proceed; choose one approach and implement consistent error handling or
backup-reading accordingly.
- Around line 588-597: The deep copy retains cleartext S3 keys on the Postgres
CR; when switching to using S3 credentials secret you must scrub any inline keys
from patched.Spec.Backup to avoid leaving s3AccessKey/s3SecretKey in etcd/audit
logs—after you set patched.Spec.Backup.S3CredentialsSecret (or whenever credsRef
!= nil) clear patched.Spec.Backup.S3AccessKey and
patched.Spec.Backup.S3SecretKey (and any equivalent inline fields) so only the
secret-ref remains; ensure this sanitization happens in the same block that
assigns patched.Spec.Backup.S3CredentialsSecret and also when switching from
secret to inline to cover both transitions.
- Around line 98-125: The controller currently only checks
j.Spec.ApplicationRef.Kind but not API group, allowing non-Cozystack Postgres
refs to be looked up by getPostgresApp; update the validation before calling
getPostgresApp to also verify j.Spec.ApplicationRef.APIVersion (or group)
matches the Cozystack Postgres API group (apps.cozystack.io/v1alpha1) and return
via markBackupJobFailed with a clear message if it doesn't match; apply the
identical API group check in the restore path so both the backup reconcile flow
(before calling getPostgresApp) and the restore flow reject refs that are not
the expected Cozystack Postgres CRD.
- Around line 337-360: When r.Create(ctx, backup) returns an AlreadyExists
error, treat that as success by fetching the existing Backup resource and
returning it instead of propagating the error; specifically, detect
apierrors.IsAlreadyExists(err) after the r.Create call, do a r.Get(ctx,
types.NamespacedName{Name: backup.Name, Namespace: backup.Namespace},
existingBackup) (or equivalent) to load the current Backup object, and return
that existingBackup so a concurrent Status().Update on the BackupJob does not
flip a completed run into Failed; otherwise return the original error as before.
- Around line 581-604: The code using app.DeepCopy() unintentionally preserves
prior restore settings (recoveryTime, endpointURL, s3CredentialsSecret,
databases, users) when the current restore intends them to be empty; modify the
patch logic in cnpgstrategy_controller.go to explicitly clear those fields on
the copied object when the corresponding inputs are empty before applying new
values — e.g., reset patched.Spec.Bootstrap.RecoveryTime to empty when
recoveryTime == "", clear patched.Spec.Backup.EndpointURL when sourceEndpointURL
== "", set patched.Spec.Backup.S3CredentialsSecret to an empty/zero value when
credsRef is nil or credsRef.SecretRef.Name == "", and assign nil/empty slices to
patched.Spec.Databases and patched.Spec.Users when sourceDatabases/sourceUsers
are empty — while still setting Bootstrap.Enabled, OldName and ServerName as
currently done.

In `@internal/backupcontroller/restorejob_controller_test.go`:
- Around line 97-119: The test currently only seeds a Velero Restore owned by a
different job, so it passes even if cleanupVeleroRestore runs; update the test
to also seed a Velero Restore that is labeled as owned by the same RestoreJob
(use rj.Name and rj.Namespace labels) via newRestoreJobTestClient and include
that object alongside stray, then after calling
RestoreJobReconciler.cleanupOnDelete(ctx, rj) assert that this owner-labeled
Velero Restore still exists (Get into a got object) to verify the reconciler
does not touch Velero restores.

In `@packages/apps/postgres/templates/db.yaml`:
- Around line 16-18: The template currently assigns $credAccessKey and
$credSecretKey defaults even when no external secret name is provided, causing
CNPG to look for keys that don't exist; modify the db.yaml template so the
derived variables ($credAccessKey, $credSecretKey) are only set/overridden when
the external secret name ($credCfg.name) is present (i.e., wrap the defaulting
of $credAccessKey and $credSecretKey in the same conditional that detects
external-secret mode), leaving them unset otherwise so the existing
backup-secret.yaml uses its fixed AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY keys;
apply the same guard to the other duplicate blocks referenced (the blocks
currently at ~lines 26–30 and ~52–56).

---

Nitpick comments:
In `@internal/backupcontroller/cnpgtypes/zz_generated.deepcopy.go`:
- Around line 3-5: Add a regression unit test in the
internal/backupcontroller/cnpgtypes package (following
controller-runtime/kubebuilder test patterns) that verifies the hand-written
DeepCopy methods do not alias nested fields: create an instance of a
representative struct from types.go, call its DeepCopy/DeepCopyInto methods
(e.g., MyType.DeepCopy/DeepCopyInto or other exported types present), mutate
nested slices/maps/struct fields on the copy, and assert the original value is
unchanged; use the same test harness style as other controller tests (testing.T,
table-driven if needed) and include clear failure messages to catch future
manual-drift in the zz_generated.deepcopy implementations.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7edf8314-83a7-4d9e-ad88-de0174355752

📥 Commits

Reviewing files that changed from the base of the PR and between 5786afe and a04244d.

📒 Files selected for processing (33)
  • api/apps/v1alpha1/postgresql/types.go
  • api/apps/v1alpha1/postgresql/zz_generated.deepcopy.go
  • api/backups/strategy/v1alpha1/cnpg_types.go
  • api/backups/strategy/v1alpha1/zz_generated.deepcopy.go
  • cmd/backupstrategy-controller/main.go
  • examples/backups/postgres/00-bucket.yaml
  • examples/backups/postgres/05-postgres-src.yaml
  • examples/backups/postgres/07-s3-credentials.yaml
  • examples/backups/postgres/10-cnpg-strategy.yaml
  • examples/backups/postgres/15-backupclass.yaml
  • examples/backups/postgres/20-plan.yaml
  • examples/backups/postgres/25-backupjob-adhoc.yaml
  • examples/backups/postgres/30-postgres-target.yaml
  • examples/backups/postgres/35-restorejob-in-place.yaml
  • examples/backups/postgres/40-restorejob-to-copy.yaml
  • hack/e2e-apps/backup-postgres.bats
  • internal/backupcontroller/backupjob_controller.go
  • internal/backupcontroller/cnpg_rbac_test.go
  • internal/backupcontroller/cnpgstrategy_controller.go
  • internal/backupcontroller/cnpgstrategy_controller_test.go
  • internal/backupcontroller/cnpgtypes/types.go
  • internal/backupcontroller/cnpgtypes/zz_generated.deepcopy.go
  • internal/backupcontroller/postgresapp/types.go
  • internal/backupcontroller/postgresapp/zz_generated.deepcopy.go
  • internal/backupcontroller/restorejob_controller.go
  • internal/backupcontroller/restorejob_controller_test.go
  • packages/apps/postgres/README.md
  • packages/apps/postgres/templates/backup-secret.yaml
  • packages/apps/postgres/templates/db.yaml
  • packages/apps/postgres/values.schema.json
  • packages/apps/postgres/values.yaml
  • packages/system/backupstrategy-controller/definitions/strategy.backups.cozystack.io_cnpgs.yaml
  • packages/system/backupstrategy-controller/templates/rbac.yaml

Comment thread internal/backupcontroller/cnpg_rbac_test.go Outdated
Comment thread internal/backupcontroller/cnpgstrategy_controller.go Outdated
Comment thread internal/backupcontroller/cnpgstrategy_controller.go
Comment thread internal/backupcontroller/cnpgstrategy_controller.go
Comment thread internal/backupcontroller/cnpgstrategy_controller.go
Comment thread internal/backupcontroller/cnpgstrategy_controller.go Outdated
Comment thread internal/backupcontroller/restorejob_controller_test.go Outdated
Comment thread packages/apps/postgres/templates/db.yaml Outdated
@dosubot dosubot Bot added size:XXL and removed size/XXL This PR changes 1000+ lines, ignoring generated files size:XXL labels May 1, 2026
@github-actions github-actions Bot removed the size:XXL label May 1, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/apps/postgres/templates/db.yaml`:
- Around line 8-14: The multiline Helm block comments using the {{/* ... */}}
form (surrounding the s3Credentials explanatory text and again at the 17-25
region) are breaking YAML linting; replace those Helm block comments with
YAML-safe comments (use standard YAML # comments outside of Helm's block-comment
syntax or use single-line Helm comment tokens) so the template remains valid
YAML while preserving the same explanatory text for s3Credentials and the
backup/restore behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: db6dfedc-5811-426e-9bc2-7d6b7fab6c54

📥 Commits

Reviewing files that changed from the base of the PR and between a04244d and 3594d3c.

📒 Files selected for processing (11)
  • examples/backups/postgres/05-postgres-src.yaml
  • internal/backupcontroller/backup_controller.go
  • internal/backupcontroller/backup_controller_test.go
  • internal/backupcontroller/cnpg_rbac_test.go
  • internal/backupcontroller/cnpgstrategy_controller.go
  • internal/backupcontroller/cnpgstrategy_controller_test.go
  • internal/backupcontroller/cozyrd_drift_test.go
  • internal/backupcontroller/restorejob_controller_test.go
  • packages/apps/postgres/README.md
  • packages/apps/postgres/templates/db.yaml
  • packages/system/postgres-rd/cozyrds/postgres.yaml
🚧 Files skipped from review as they are similar to previous changes (5)
  • examples/backups/postgres/05-postgres-src.yaml
  • internal/backupcontroller/cnpg_rbac_test.go
  • internal/backupcontroller/restorejob_controller_test.go
  • packages/apps/postgres/README.md
  • internal/backupcontroller/cnpgstrategy_controller.go

Comment on lines +8 to +14
{{/*
s3Credentials points at the operator-supplied Secret when
backup.s3CredentialsSecret.name is set; otherwise it falls back to the
chart-managed <release>-s3-creds Secret materialised from s3AccessKey/
s3SecretKey. The override path keeps credentials out of the Postgres
CR .spec — the CNPG backup driver uses it on restore.
*/}}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Replace multiline Helm block comments with YAML-safe comments to avoid lint breakage.

Static analysis flags a YAML parse error at Line 14. The multiline {{/* ... */}} blocks in this position are the likely trigger for YAMLlint.

Suggested fix
-  {{/*
-    s3Credentials points at the operator-supplied Secret when
-    backup.s3CredentialsSecret.name is set; otherwise it falls back to the
-    chart-managed <release>-s3-creds Secret materialised from s3AccessKey/
-    s3SecretKey. The override path keeps credentials out of the Postgres
-    CR .spec — the CNPG backup driver uses it on restore.
-  */}}
+  # s3Credentials points at the operator-supplied Secret when
+  # backup.s3CredentialsSecret.name is set; otherwise it falls back to the
+  # chart-managed <release>-s3-creds Secret materialised from s3AccessKey/
+  # s3SecretKey. The override path keeps credentials out of the Postgres
+  # CR .spec — the CNPG backup driver uses it on restore.
   {{- $credCfg := default (dict) .Values.backup.s3CredentialsSecret }}
   {{- $credSecretName := $credCfg.name | default (printf "%s-s3-creds" .Release.Name) }}
-  {{/*
-    accessKeyIDKey/secretAccessKeyKey only apply when an external Secret was
-    supplied via backup.s3CredentialsSecret.name. The chart-managed
-    <release>-s3-creds Secret materialised by backup-secret.yaml hard-codes
-    AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY, so applying overrides in
-    that mode would point CNPG at non-existent keys and break credential
-    reads. Default both names unconditionally and let the override take
-    effect only in external-Secret mode.
-  */}}
+  # accessKeyIDKey/secretAccessKeyKey only apply when an external Secret was
+  # supplied via backup.s3CredentialsSecret.name. The chart-managed
+  # <release>-s3-creds Secret in backup-secret.yaml uses AWS_ACCESS_KEY_ID /
+  # AWS_SECRET_ACCESS_KEY, so overrides should apply only in external-Secret mode.

Also applies to: 17-25

🧰 Tools
🪛 YAMLlint (1.38.0)

[error] 14-14: syntax error: could not find expected ':'

(syntax)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/apps/postgres/templates/db.yaml` around lines 8 - 14, The multiline
Helm block comments using the {{/* ... */}} form (surrounding the s3Credentials
explanatory text and again at the 17-25 region) are breaking YAML linting;
replace those Helm block comments with YAML-safe comments (use standard YAML #
comments outside of Helm's block-comment syntax or use single-line Helm comment
tokens) so the template remains valid YAML while preserving the same explanatory
text for s3Credentials and the backup/restore behavior.

@dosubot dosubot Bot added the size:XXL label May 1, 2026
@github-actions github-actions Bot added the size/XXL This PR changes 1000+ lines, ignoring generated files label May 1, 2026
@dosubot dosubot Bot added size:XXL and removed size/XXL This PR changes 1000+ lines, ignoring generated files size:XXL labels May 1, 2026
@github-actions github-actions Bot removed the size:XXL label May 1, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
.github/workflows/pre-commit.yml (1)

42-46: ⚡ Quick win

Harden code-generator version extraction to avoid brittle failures.

The current regex only matches v[0-9.]+. If the pinned tag format changes (e.g., -rc suffix), this step will fail in a non-obvious way. A slightly broader parse plus an explicit empty-check makes failures clearer.

Suggested patch
-          version=$(grep -oP 'code-generator@\Kv[0-9.]+' hack/update-codegen.sh)
+          version=$(sed -nE 's|.*code-generator@(v[^"[:space:]}]+).*|\1|p' hack/update-codegen.sh | head -n1)
+          if [ -z "$version" ]; then
+            echo "::error::Failed to extract k8s.io/code-generator version from hack/update-codegen.sh"
+            exit 1
+          fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/pre-commit.yml around lines 42 - 46, The version
extraction in hack/update-codegen.sh currently sets version=$(grep -oP
'code-generator@\Kv[0-9.]+' ...) which is brittle; update the grep to capture a
broader semver-like string (e.g., allow alphanumerics, dashes and dots such as
v1.2.3-rc1) and then add an explicit check that the variable version is
non-empty before proceeding; modify the assignment to use the new regex (still
assigning to version) and add a guard after that which logs a clear error and
exits if version is empty so the go get "k8s.io/code-generator@${version}" step
cannot run with an invalid value.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In @.github/workflows/pre-commit.yml:
- Around line 42-46: The version extraction in hack/update-codegen.sh currently
sets version=$(grep -oP 'code-generator@\Kv[0-9.]+' ...) which is brittle;
update the grep to capture a broader semver-like string (e.g., allow
alphanumerics, dashes and dots such as v1.2.3-rc1) and then add an explicit
check that the variable version is non-empty before proceeding; modify the
assignment to use the new regex (still assigning to version) and add a guard
after that which logs a clear error and exits if version is empty so the go get
"k8s.io/code-generator@${version}" step cannot run with an invalid value.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 203f7f6a-35f6-426c-8ba3-ca4e73e4229e

📥 Commits

Reviewing files that changed from the base of the PR and between 13bd251 and b84fe17.

📒 Files selected for processing (1)
  • .github/workflows/pre-commit.yml

@androndo androndo force-pushed the feat/pg-backups branch from b84fe17 to 961dab0 Compare May 4, 2026 06:01
@dosubot dosubot Bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label May 4, 2026
@github-actions github-actions Bot added the size/XXL This PR changes 1000+ lines, ignoring generated files label May 4, 2026
@dosubot dosubot Bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label May 5, 2026
@github-actions github-actions Bot removed the size:XXL This PR changes 1000+ lines, ignoring generated files. label May 5, 2026
@dosubot dosubot Bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label May 5, 2026
@github-actions github-actions Bot added the size/XXL This PR changes 1000+ lines, ignoring generated files label May 5, 2026
@dosubot dosubot Bot added size:XXL This PR changes 1000+ lines, ignoring generated files. and removed size/XXL This PR changes 1000+ lines, ignoring generated files size:XXL This PR changes 1000+ lines, ignoring generated files. labels May 5, 2026
@github-actions github-actions Bot removed the size:XXL This PR changes 1000+ lines, ignoring generated files. label May 5, 2026
@dosubot dosubot Bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label May 5, 2026
@github-actions github-actions Bot added the size/XXL This PR changes 1000+ lines, ignoring generated files label May 5, 2026
@androndo androndo force-pushed the feat/pg-backups branch from 4ed4c6e to fc9a674 Compare May 6, 2026 06:31
@github-actions github-actions Bot removed the size:XXL This PR changes 1000+ lines, ignoring generated files. label May 6, 2026
@lllamnyp
Copy link
Copy Markdown
Member

lllamnyp commented May 6, 2026

NOT LGTM

Reviewing against target branch: main.

Blocking issues:

  1. Restore re-renders the CNPG strategy without BackupClass parameters, even though the API documents .Parameters as supported.

api/backups/strategy/v1alpha1/cnpg_types.go:57 documents .Parameters as a template input. The backup path renders with resolved.Parameters, but restore calls renderCNPGTemplate(strategy.Spec.Template, targetApp, nil) at internal/backupcontroller/cnpgstrategy_controller.go:539. That means a strategy using parameters for s3Credentials.secretRef.name, key names, or endpointCA can create a valid backup, then restore with <no value> or wrong secret references.

The backup artifact already persists only a few rendered fields at internal/backupcontroller/cnpgstrategy_controller.go:420, and does not persist enough rendered credential/CA config to make restore independent of BackupClass parameters.

Add a test with a CNPG strategy whose restore-relevant fields use .Parameters, then verify restore patches the target Postgres app with the original rendered values.

  1. Postgres README contradicts the actual e2e coverage.

packages/apps/postgres/README.md:96 says CI only exercises in-place restore and that to-copy is manual, but hack/e2e-apps/backup-postgres.bats:3 is explicitly “backup + to-copy restore”. The same Bats header also references TestPurgeExistingCluster_*, but no such test exists. This is blocking documentation drift in the changed area.

  1. git diff --check fails.

internal/backupcontroller/cozyrd_drift_test.go:110: new blank line at EOF.

Verification run:

go test ./internal/backupcontroller passed.

Reminder: "Pre-existing" is not a thing. If a problem is flagged in this review, it must be fixed in this PR — no exceptions.

Every issue listed above must have a corresponding test. No matter the severity — bug, logic error, security issue, error handling gap, documentation drift — each one must be covered by a test that fails without the fix and passes with it. Untested fixes are not fixes.

@lllamnyp
Copy link
Copy Markdown
Member

lllamnyp commented May 6, 2026

Branch Review

NOT LGTM

The CNPG backup strategy is well-architected and the code paths I read are mostly tight: SSA field-ownership on the live Cluster, the WAL-archive gate before destructive purge, the StartedAt refetch-before-write, the cnpgPurgeNeeded dual guard against re-purging the freshly recovered Cluster, the snapshot of source spec.databases/spec.users into Backup.status.underlyingResources, the EndpointCA plumbing through both spec.backup.barmanObjectStore and externalClusters[].barmanObjectStore, and the cluster-scoped Secret-read RBAC scrub are all good. Tests are dense and well-aimed at the named bugs.

That said, several things must change before merge:

Blockers

1. Documentation/code drift in packages/apps/postgres/README.md (lines 96-102)

The README states:

e2e coverage: the CI e2e (hack/e2e-apps/backup-postgres.bats) only exercises the in-place restore (Steps 0-5). The to-copy flow stays as a manual / dev-cluster exercise

But hack/e2e-apps/backup-postgres.bats was switched to the to-copy variant in commit fc9a6746. The bats file's test is now literally named "CNPG Postgres backup + to-copy restore", has Steps 0-7, and Step 6 is the to-copy RestoreJob. The README must be updated to reflect what the e2e actually exercises. Per the review criteria: documentation drift in an area the PR touched is blocking.

While you're there, also fix the "(Steps 0-5)" reference — the bats has 8 numbered steps now.

2. README contradicts the e2e example on backup.enabled

packages/apps/postgres/README.md lines 19-30 tell users:

Pick one - mixing them on the same application produces a continuous SSA tug-of-war on the underlying cnpg.io Cluster.spec.backup field
...
If a BackupClass referencing the CNPG strategy applies to this Postgres app, leave backup.enabled=false in the chart values.

But examples/backups/postgres/05-postgres-src.yaml (also in this PR) sets backup.enabled: true, with a comment explaining that without it archive_command stays /bin/true and WALs never reach S3, breaking restores. This is a real architectural constraint, not just an e2e quirk: any user who follows the README guidance literally gets a setup where the first BackupJob succeeds but the WAL chain has a hole and restore fails with "WAL not found".

The README needs to spell out the actual contract: the chart's backup.enabled=true is required to bootstrap the archive_command on the source cluster (because the BackupJob driver SSA-patches spec.backup with ForceOwnership and the chart's emission of the same field is what gives the source cluster archive_command from helm-install onwards). Either explain that the SSA tug-of-war is benign because the controller forces ownership and re-applies on every reconcile, or change the architecture so backup.enabled=false works end-to-end. As written, the docs and the working example contradict.

3. internal/backupcontroller/cozyrd_drift_test.go is exactly the test pattern the project's own guidelines say not to write

The test parses packages/system/postgres-rd/cozyrds/postgres.yaml, walks into the embedded openAPISchema JSON, and asserts that properties.backup.properties.s3CredentialsSecret and keysOrder carry the right entries. This is a parse-the-generated-manifest drift guard. The user's standing project guidance:

Don't write tests that grep / parse source files, chart templates, or generated manifests. Reviewers often request these as "drift guards" against doc references or RBAC contracts. Substring matching catches almost nothing (a stray comment satisfies it), full YAML/AST parsing duplicates compiler/helm machinery, and either way the guarded contract is between the running plugin and the running cluster, not between two static files. The right place to verify that contract is an e2e test that exercises the actual flow.

The contract here ("the cozyrd schema reflects values.yaml") is enforced by make generate running in CI. If the regeneration step is missed, CI's codegen-drift.yml (referenced in .github/workflows/pre-commit.yml) catches it. A second guard inside Go tests duplicates that machinery and will inevitably go stale (e.g. when someone renames s3CredentialsSecret to something else, the test has to be edited too — a real coverage check would be e2e validating the field actually round-trips through cozystack-api).

Delete internal/backupcontroller/cozyrd_drift_test.go. Do not replace with a substring grep version. If you genuinely need a regression check, add an e2e step that creates a Postgres CR with spec.backup.s3CredentialsSecret set, talks to the aggregated API server, and verifies the value persists.

4. Adjacent bug: markBackupJobFailed / markRestoreJobFailed append duplicate "Ready" conditions

internal/backupcontroller/restorejob_controller.go:148-154 and internal/backupcontroller/backupjob_controller.go:130-137 both do:

restoreJob.Status.Conditions = append(restoreJob.Status.Conditions, metav1.Condition{
    Type:               "Ready",
    Status:             metav1.ConditionFalse,
    Reason:             "RestoreFailed",
    ...
})

The Conditions field on BackupJob/RestoreJob (and most Cozystack status types) is +listType=map++listMapKey=type. Two entries with Type: "Ready" violate that contract. The CNPG paths added in this PR call these helpers many more times than the legacy strategies did (every validation failure, every NotFound, every malformed manifest goes through markBackupJobFailed after potentially having set a non-Ready condition earlier in the same reconcile via apimeta.SetStatusCondition). Under retry storms — and CNPG restores have plenty of those, between bootstrap.recovery admission rejects and helm-controller rollback loops — the conditions array stacks up, and any consumer doing apimeta.FindStatusCondition(... "Ready") reads whichever copy is first in the slice, not necessarily the latest one.

Fix: replace both appends with apimeta.SetStatusCondition(&… .Status.Conditions, metav1.Condition{...}). The LastTransitionTime argument can be dropped — SetStatusCondition manages it.

The new code paths in this PR already use SetStatusCondition correctly (lines 207, 249, 279, 631, 658, 679, 713 in cnpgstrategy_controller.go), so this is just bringing the failure helpers in line.

Smaller things

  • markBackupJobFailed (and the symmetric Restore variant) does not set restoreJob.Status.StartedAt if it fails before reconcileCNPG reached the StartedAt block. That's not new, but the CNPG path's deadline checks rely on StartedAt; if a failure comes from something further downstream and a retry happens, StartedAt will be set on the retry. Probably fine in practice but a comment explaining the relationship between markBackupJobFailed and the deadline would help future readers.
  • internal/backupcontroller/cnpgstrategy_controller.go:648-654 reads restoreJob.Status.Conditions after a Status().Update. The comment says "We read the persisted condition rather than the in-memory copy so the timestamp matches what an operator inspecting the CR would see." But the code reads the in-memory copy — it's the slice that was just sent. This works because apimeta.SetStatusCondition already updated the in-memory LastTransitionTime to the persisted value. Either tighten the comment or refetch.
  • cnpgWALArchiveDeadline = 3 * time.Minute is short. For a quiescent cluster with no WAL switches between the backup's pg_start_backup and pg_stop_backup, you'd rely on CNPG's pg_switch_wal after the backup. If it ever doesn't fire (CNPG bug, signal storm, hung primary), 3 minutes might bite users with otherwise-healthy clusters. Consider making it configurable via CNPGRestoreOptions like restoreTimeoutSeconds is, with the existing 3-minute default.

Tests for each blocker

  • (Blocker 1, README e2e claim) — not testable in code. Fix the prose; the bats file diff is the real catch.
  • (Blocker 2, backup.enabled contradiction) — fix the README; consider linking examples/backups/postgres/05-postgres-src.yaml as the canonical "what good looks like".
  • (Blocker 3, drift test) — delete the file. No replacement test should be added in Go.
  • (Blocker 4, duplicate Ready) — add a unit test that calls markBackupJobFailed (and one for markRestoreJobFailed) twice on the same job and asserts only one Ready condition exists. Without the fix, length is 2 and the test fails.

Reminder: "Pre-existing" is not a thing. If a problem is flagged in this review, it must be fixed in this PR — no exceptions.

Every issue listed above must have a corresponding test. No matter the severity — bug, logic error, security issue, error handling gap, documentation drift — each one must be covered by a test that fails without the fix and passes with it. Untested fixes are not fixes.

@androndo androndo force-pushed the feat/pg-backups branch from fc9a674 to b9fe04c Compare May 6, 2026 14:19
@androndo androndo changed the title feautre(platform): implement backup strategy for postgres feat(platform): implement backup strategy for postgres May 6, 2026
@lllamnyp
Copy link
Copy Markdown
Member

lllamnyp commented May 7, 2026

Branch Review

LGTM (with non-blocking recommendations)

Reviewing branch feat/pg-backups against target main. The actual feature change is squashed into commit b9fe04c (~5k lines); the rest of the branch's commits are unrelated work that already merged via PR #2554.

Scope of the feature:

  • New CRD strategy.backups.cozystack.io/v1alpha1.CNPG with a typed BarmanObjectStore template (api/backups/strategy/v1alpha1/cnpg_types.go, packages/system/backupstrategy-controller/definitions/strategy.backups.cozystack.io_cnpgs.yaml).
  • New CNPG driver in the existing backupstrategy controller covering BackupJob, RestoreJob (in-place + to-copy), and Backup-cleanup paths (internal/backupcontroller/cnpgstrategy_controller.go ~1262 lines + supporting partial-schema typed packages internal/backupcontroller/cnpgtypes/types.go and internal/backupcontroller/postgresapp/types.go).
  • Chart wiring on packages/apps/postgres: new backup.s3CredentialsSecret and backup.endpointCA knobs to keep credentials and CAs out of the CR .spec; backup.schedule defaulted to empty so the legacy chart-emitted ScheduledBackup is now opt-in; init-job/init-script gated off when bootstrap.enabled so post-install hook does not race CNPG recovery.
  • RBAC additions for the controller (postgresql.cnpg.io clusters/backups, apps.cozystack.io/postgreses patch).
  • e2e bats fixture (hack/e2e-apps/backup-postgres.bats) exercising the to-copy happy path against tenant-root + seaweedfs-s3.
  • Examples directory examples/backups/postgres/ and feature documentation in packages/apps/postgres/README.md.

Why LGTM despite the scope

The controller code is unusually careful about the lifecycle hazards specific to CNPG bootstrap (admission immutability, archive-lag races, helm-controller rollback wiping SSA-applied fields, terminating-cluster vs fresh-recovery cluster ambiguity) and every one of those hazards is documented in a long comment AND covered by a unit test. Tests exist for: WAL archive gate (TestCNPGBackupWALArchived, 6 sub-cases), purge double-fire avoidance (TestCNPGPurgeNeeded), DeletionTimestamp guard (TestClusterHasRecoveryBootstrap_TerminatingCluster), backup-deadline cap, idempotent createCNPGBackupArtifact AlreadyExists swallow, refetch-before-write StartedAt fix, parameter snapshot round-trip + restore-time render, missing-snapshot rejection, RBAC no-Secret guard, cleanup dispatch correctness for both Backup and RestoreJob, scrubbing stale target spec on re-restore, replace-not-merge on Databases/Users, MarkFailed not double-appending Ready conditions. go vet and go test ./internal/backupcontroller/... are clean; helm unittest on the postgres chart passes its 4 cases.

Documentation in packages/apps/postgres/README.md is up to date with the new field semantics and gating, and accurately describes the canonical setup. The values.schema.json and the generated api/apps/v1alpha1/postgresql/types.go (cozyvalues-gen output, marked DO NOT EDIT) match values.yaml. The aggregated-API schema mirror in packages/system/postgres-rd/cozyrds/postgres.yaml has been regenerated to include the new fields.

Recommended-to-fix (non-blocking)

  1. cnpgRestoreTarget.IsCopy is dead code. internal/backupcontroller/cnpgstrategy_controller.go:741 declares the field and :756 sets it on resolution; it is read only by the unit tests. The reconcile path purges in both modes ("The same applies to in-place AND to-copy", :595) so the bookkeeping serves no production purpose. Either drop the field (and its three test assertions in cnpgstrategy_controller_test.go:405,417,429) or wire it through to differentiate behaviour. The comment at :849-853 deliberately rejects merge semantics for security reasons, so dropping is the cleaner option. Minor cleanup, no behavioural impact.

  2. internal/backupcontroller/cnpg_rbac_test.go:24 (TestBackupStrategyControllerClusterRole_NoSecretAccess) parses packages/system/backupstrategy-controller/templates/rbac.yaml and asserts no rule grants core/secrets verbs. This is the pattern explicitly called out in the project guidance ("Don't write tests that grep / parse source files, chart templates, or generated manifests... reviewers often request these as drift guards against doc references or RBAC contracts"). The contract guarded here ("controller does not read Secrets cluster-wide") is meaningful, but the right way to assert it is e2e (controller starts under its actual ServiceAccount and a kubectl auth can-i get secrets returns no), not a YAML parse + regex. Consider replacing with an e2e check or removing.

  3. Stale-cache resourceVersion on the post-StartedAt write path. internal/backupcontroller/cnpgstrategy_controller.go:156-170: the StartedAt block patches fresh (advancing fresh.ResourceVersion) but only copies the timestamp back into local j; subsequent r.Status().Update(ctx, j) in the same reconcile (e.g. the conditions write at :213, the phase update at :228) carry the stale ResourceVersion and fail with a Conflict on the very first poll. controller-runtime then retries the reconcile so this is functionally correct but spends an extra poll cycle and one extra Get every time. Tighten by either re-fetching j after the StartedAt patch (and returning early so the next reconcile picks it up cleanly — this is the pattern the matching RestoreJob path already uses at :508), or by patching the local j directly in place of the separate fresh round-trip.

  4. The CNPGSpec doc claims .Parameters carries "the parameters from the matched BackupClassStrategy" (api/backups/strategy/v1alpha1/cnpg_types.go:60) and the controller comment at cnpgstrategy_controller.go:872-874 asserts these "carry tenant-supplied configuration knobs ... never cleartext credentials". Nothing enforces it. The map is map[string]string and gets persisted to Backup.status.underlyingResources as JSON, which is tenant-readable. A future BackupClass author who wires parameters: {accessKey: hunter2} ships secrets straight into etcd's status field. Cheap mitigation: document a stronger warning on the BackupClassStrategy parameters field itself, and/or have the snapshot persistence drop keys matching a denylist. Not a regression of this PR — the parameter shape already existed for the Velero strategy — but the new Parameters snapshot persistence is new behaviour worth being explicit about.

  5. Minor: examples/backups/postgres/07-s3-credentials.yaml is referenced as an alternative to deriving credentials from BucketInfo, but the e2e (hack/e2e-apps/backup-postgres.bats) materialises the Secrets directly and does not apply 07. Either delete 07 (the pattern is documented inline in the e2e header) or wire it into the example sequence (and use it from the e2e). Currently a numbered file in a sequence directory that nothing in the test pipeline applies — confusing for someone copying the example.

Tests already cover all of the above where it matters: the Recommended notes are either cleanup (1, 5), policy/style (2, 4), or a slow-path retry that the existing test suite doesn't notice and probably should not (3). Each could be addressed in this PR or a fast follow-up; none block the merge.

Reminder: "Pre-existing" is not a thing. If a problem is flagged in this review, it must be fixed in this PR — no exceptions.

Every issue listed above must have a corresponding test. No matter the severity — bug, logic error, security issue, error handling gap, documentation drift — each one must be covered by a test that fails without the fix and passes with it. Untested fixes are not fixes.

lllamnyp
lllamnyp previously approved these changes May 7, 2026
lllamnyp
lllamnyp previously approved these changes May 7, 2026
@androndo androndo disabled auto-merge May 7, 2026 09:09
Signed-off-by: Andrey Kolkov <androndo@gmail.com>
@androndo androndo merged commit 8da1e7a into main May 7, 2026
15 of 19 checks passed
@androndo androndo deleted the feat/pg-backups branch May 7, 2026 14:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/database Issues or PRs related to managed databases (postgres, mariadb, redis, etcd, kafka, clickhouse) area/platform Issues or PRs related to platform infrastructure (bundle, flux, talos, installer) kind/feature Categorizes issue or PR as related to a new feature size/XXL This PR changes 1000+ lines, ignoring generated files

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants