From a0f98206228a9e7f9cb57a44794d100b214241d2 Mon Sep 17 00:00:00 2001
From: Daniel Maizel
Date: Sun, 10 May 2026 16:17:58 +0300
Subject: [PATCH 1/2] docs(migration): add Sojern Hosted->Hybrid playbook
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Translates Sojern's Hosted runtime (hgr-sojern-1e325d9, v0.21, paying tier)
into a Hybrid install on gitops-runtime chart 0.29.10. Migration is a fresh
install on a new cluster, not an in-place upgrade — sidesteps the 4 breaking
changes between 0.21 and 0.29.

Includes:
- values.yaml with documented dropped/added/kept sections vs. old Hosted values
- README with prerequisites, install command, pre/post-install verification,
  and a functional-validation checklist (app sync, event-reporter, Gerrit
  commit-status, image enrichment) to address concerns raised in the May 6
  team-codefresh thread.

Validated via helm template render and a kind-cluster install on dmaizel's
test account. Re-test from OCI 0.29.10 release pending before customer handover.
---
 docs/migration/sojern-hybrid/README.md   | 111 ++++++++++++
 docs/migration/sojern-hybrid/values.yaml | 210 +++++++++++++++++++++++
 2 files changed, 321 insertions(+)
 create mode 100644 docs/migration/sojern-hybrid/README.md
 create mode 100644 docs/migration/sojern-hybrid/values.yaml

diff --git a/docs/migration/sojern-hybrid/README.md b/docs/migration/sojern-hybrid/README.md
new file mode 100644
index 000000000..115e63ac2
--- /dev/null
+++ b/docs/migration/sojern-hybrid/README.md
@@ -0,0 +1,111 @@
+# Sojern: Hosted → Hybrid GitOps Runtime migration playbook
+
+Translates Sojern's Hosted runtime (`hgr-sojern-1e325d9`, runtime v0.21, paying tier) into a Hybrid install on gitops-runtime chart `0.29.10` (latest release as of 2026-05-08). Migration is a fresh install on a new cluster, not an in-place upgrade — this avoids the 4 breaking changes between 0.21 and 0.29.
+
+## Prerequisites
+
+The customer must provide:
+
+| Item | Notes |
+|---|---|
+| New Kubernetes cluster | Empty namespace `cf-runtime` (or whatever you choose); must reach `tunnels.cf-cd.com` outbound |
+| Codefresh user token | Created in Codefresh UI → User Settings; loaded into a secret `codefresh-user-token` (key `token`) before install |
+| Runtime name | Confirm `sojern-hybrid-prod` is acceptable, or pick another. Must be unique per Codefresh account. |
+| Git credentials *(optional)* | Can be set via Codefresh UI post-install, or layered in via `global.runtime.gitCredentials.password.secretKeyRef` |
+
+## Install
+
+```bash
+# 1. Create namespace + user-token secret on the target cluster
+kubectl create namespace cf-runtime
+kubectl -n cf-runtime create secret generic codefresh-user-token \
+  --from-literal=token=""
+
+# 2. Install the runtime
+helm upgrade --install sojern \
+  oci://quay.io/codefresh/gitops-runtime \
+  --version 0.29.10 \
+  -n cf-runtime \
+  -f values.yaml \
+  --atomic \
+  --history-max 5
+```
+
+## What this values.yaml changes vs. their old Hosted values
+
+Fully documented in the file's header comment. Summary:
+
+**Dropped (no longer applicable):**
+- `app-proxy.image` (was a CR-37307 dev build)
+- `argo-cd.server.env CODEFRESH_PRIORITY_QUEUE=true` (Codefresh-fork only; bundled Argo CD is OSS now)
+- `gitops-operator.argoCdNotifications.*` (path removed from chart — operator-side notifications subsystem deleted; not the same as upstream `argo-cd.notifications.*`)
+- `argo-cd.eventReporter.*` (replaced by top-level `event-reporters.{cluster,runtime}-event-reporter`)
+- `gitops-operator.resources` (chart defaults are fine)
+
+**Added (Hybrid-only required block):**
+- `global.codefresh.{url,accountId,userToken}`
+- `global.runtime.name`
+- `global.integrations.argo-cd.server.auth`
+- Top-level `redis-ha.enabled: true` (required by app-proxy when replicaCount > 1)
+- HA topology to match Hosted paying-tier (replicas + PDBs across all components)
+
+**Kept:**
+- `global.runtime.isConfigurationRuntime: true` (this Hybrid takes over as account's configuration runtime)
+- All Sojern-specific Argo CD knobs: resource sizing, `gerritssh.p.sojern.net` knownHost, `controller.self.heal.timeout.seconds=60`, repo-server `ARGOCD_EXEC_TIMEOUT=3m`, status/operation processors
+
+## Pre-install verification
+
+```bash
+# Render and inspect — should produce ~201 manifests with no template errors
+helm template sojern oci://quay.io/codefresh/gitops-runtime \
+  --version 0.29.10 -n cf-runtime -f values.yaml > /tmp/render.yaml
+
+# Spot-check key resources are configured for HA
+grep -c "kind: PodDisruptionBudget" /tmp/render.yaml # expect 11
+grep -c "kind: StatefulSet" /tmp/render.yaml # expect 3 (controller + 2× redis-ha)
+```
+
+## Post-install verification (smoke tests)
+
+```bash
+NS=cf-runtime
+RUNTIME=sojern-hybrid-prod
+
+# 1. All HA workloads have ≥2 replicas ready
+kubectl -n $NS get deploy -l 'app.kubernetes.io/part-of in (app-proxy,argocd,internal-router)' \
+  -o custom-columns=NAME:.metadata.name,READY:.status.readyReplicas
+
+# 2. Both redis-ha clusters healthy (gitops-runtime-level + argo-cd-level)
+kubectl -n $NS get statefulset | grep redis-ha # expect 2 STSs, 3/3 each
+
+# 3. Tunnel is registered (look for runtime online in Codefresh UI)
+kubectl -n $NS logs -l app=codefresh-tunnel-client --tail=50 | grep -i "registered\|connected"
+
+# 4. Runtime appears in Codefresh and is marked the configuration runtime
+# Visit: https://g.codefresh.io/2.0/account-settings/runtime
+# Check: $RUNTIME shows online, "isConfigurationRuntime" badge present
+```
+
+## Functional validation (the test that matters)
+
+Per the engineering decision in the May 6 thread, "the install renders and pods start" is *not* sufficient sign-off. Before declaring the playbook ready:
+
+- [ ] Connect a non-prod copy of one of Sojern's GitOps source repos
+- [ ] Trigger an app sync; confirm event-reporter publishes events to Codefresh
+- [ ] Trigger a workflow; confirm commit-status reports back to Gerrit
+- [ ] Verify image enrichment runs end-to-end on a test image
+- [ ] Diff observed event throughput against current Hosted baseline
+
+These exercise the breaking changes that made us choose fresh-install over upgrade in the first place (Argo Events / event-reporter rewrites, Gerrit knownHost handling).
+
+## Known caveats
+
+1. **Comment header in `values.yaml` documents what was dropped vs. Hosted** — keep it in sync if values change.
+2. **Top-level `redis-ha.enabled: true` is mandatory** for HA app-proxy. The chart's stock `values-ha.yaml` does *not* set this even though it sets `app-proxy.replicaCount: 2`, so layering `-f values-ha.yaml` alone would fail validation. Filed separately as a chart fix.
+3. **Image enrichment images** are pinned to `1.1.27-main` in chart defaults — confirm Sojern's enrichment templates don't reference older tags.
+4. **`controller.self.heal.timeout.seconds: "60"`** is a Sojern-specific carryover from their Gerrit-driven dev flow. Revisit once their flow stabilises post-migration.
+
+## Files
+
+- `values.yaml` — production values, ready to install with
+- `README.md` — this file
diff --git a/docs/migration/sojern-hybrid/values.yaml b/docs/migration/sojern-hybrid/values.yaml
new file mode 100644
index 000000000..36a10c210
--- /dev/null
+++ b/docs/migration/sojern-hybrid/values.yaml
@@ -0,0 +1,210 @@
+# ABOUTME: Translated values.yaml for Sojern's Hybrid GitOps Runtime install.
+# ABOUTME: Source = Hosted runtime hgr-sojern-1e325d9 (paying tier), target = gitops-runtime chart 0.29.10.
+#
+# What changed vs. their old Hosted values.yaml:
+#   - Dropped `app-proxy.image` override (was a private CR-37307 dev build — they get the chart-default cap-app-proxy version)
+#   - Dropped `argo-cd.server.env CODEFRESH_PRIORITY_QUEUE=true` (Codefresh-fork only; runtime now ships OSS Argo CD ≥3.0)
+#   - Dropped `gitops-operator.argoCdNotifications.*` (path removed from the chart; the Codefresh-operator-side notifications
+#     subsystem was deleted entirely — this is NOT the same as upstream `argo-cd.notifications.*`)
+#   - Dropped `gitops-operator.resources` (chart defaults are appropriate; can re-introduce if observed pressure)
+#   - Dropped `argo-cd.eventReporter.*` entirely. Reasons:
+#       * old reporter was a single argo-events-based binary; new chart splits into `runtime-event-reporter` + `cluster-event-reporter`
+#       * `replicas: 10` doesn't translate — it scales the wrong workload
+#       * `RATE_LIMITER_*` env vars aren't consumed by the new cf-argocd-extras-based reporter (it's configmap-driven:
+#         app.queue.size, threadiness, sharding.algorithm)
+#     Start on chart defaults (replicaCount: 2 each); tune from observed load post-install.
+#
+# What was added (Hybrid-only required block, was implicit on Hosted):
+#   - `global.codefresh.{url,accountId,userToken}`
+#   - `global.runtime.name`
+#   - tunnel-based connectivity (matches Hosted default)
+#   - `global.integrations.argo-cd.server.auth` (admin password from chart-managed `argocd-initial-admin-secret`)
+#   - HA topology to match Hosted paying-tier (replicas, PDBs, redis-ha) — see "HA topology" sections below
+#
+# Kept:
+#   - `global.runtime.isConfigurationRuntime: true` — they're decommissioning Hosted, so this Hybrid takes over
+#     as their account's configuration runtime
+#   - All Sojern-specific Argo CD knobs: resources, gerritssh.p.sojern.net knownHost, self-heal timeout=60s,
+#     resource exclusions/compareoptions, repo-server ARGOCD_EXEC_TIMEOUT=3m
+#
+# Note on git credentials:
+#   `global.runtime.gitCredentials` is intentionally not set here — Sojern can provide them via the Codefresh UI
+#   after install (or layer in another values file with secretKeyRef).
+
+global:
+  codefresh:
+    url: "https://g.codefresh.io"
+    accountId: "63d97e6762d88367f72f43b8" # Sojern's account
+    userToken:
+      secretKeyRef:
+        name: codefresh-user-token
+        key: token
+
+  runtime:
+    name: "sojern-hybrid-prod"
+    cluster: https://kubernetes.default.svc
+
+    # This Hybrid takes over as the account's configuration runtime (Hosted is going away).
+    # Exactly one runtime per account should have this set to true.
+    isConfigurationRuntime: true
+
+    # Tunnel mode (matches Hosted default). Set ingress.enabled=true and disable tunnel-client below if exposing via ingress.
+    ingress:
+      enabled: false
+      protocol: https
+      className: nginx
+      hosts: []
+      tls: []
+      annotations: {}
+
+  # Argo CD auth — runtime authenticates to bundled Argo CD via admin password from chart-managed secret.
+  integrations:
+    argo-cd:
+      server:
+        auth:
+          type: password
+          username: "admin"
+          passwordSecretKeyRef:
+            name: argocd-initial-admin-secret
+            key: password
+
+# Tunnel-based ingress (matches Hosted default). Disable if you set global.runtime.ingress.enabled=true.
+tunnel-client:
+  enabled: true
+
+# -------------------------------------------------------------------------
+# Runtime-level redis-ha — required by app-proxy when replicaCount > 1.
+# This is the gitops-runtime's own redis (used for app-proxy leader-election cache),
+# separate from `argo-cd.redis-ha` below which is internal to the Argo CD subchart.
+# Without this, helm install fails with:
+#   ".Values.redis.enabled or .Values.redis-ha.enabled must be true when .Values.app-proxy.replicaCount > 1"
+# -------------------------------------------------------------------------
+redis-ha:
+  enabled: true
+
+# -------------------------------------------------------------------------
+# HA topology — replicas + PDBs for non-Argo-CD components
+# Mirrors hosted-gitops-runtimes-charts/charts/runtime/paying-tier/values.yaml
+# -------------------------------------------------------------------------
+internal-router:
+  replicaCount: 2
+  pdb:
+    enabled: true
+    minAvailable: 1
+
+app-proxy:
+  replicaCount: 2
+  pdb:
+    enabled: true
+    minAvailable: 1
+  resources:
+    limits:
+      memory: 4Gi
+  config:
+    skipGitPermissionValidation: "true"
+
+# -------------------------------------------------------------------------
+# Argo CD — preserves Sojern's resource sizing + Gerrit knownHost + self-heal workaround,
+# adds HA topology (redis-ha, replicas, PDBs) to match Hosted paying-tier.
+# -------------------------------------------------------------------------
+argo-cd:
+  # Disable standalone redis; use redis-ha instead (paying-tier topology).
+  redis:
+    enabled: false
+  redis-ha:
+    enabled: true
+    podDisruptionBudget:
+      minAvailable: 2
+    redis:
+      resources:
+        requests:
+          memory: 2000Mi
+        limits:
+          memory: 2500Mi
+    haproxy:
+      podDisruptionBudget:
+        minAvailable: 2
+
+  controller:
+    replicas: 2
+    pdb:
+      enabled: true
+      minAvailable: 1
+    resources:
+      requests:
+        memory: 12Gi
+        cpu: "8"
+      limits:
+        memory: 16Gi
+        cpu: "10"
+    extraArgs:
+      - --status-processors=50
+      - --operation-processors=25
+
+  server:
+    replicas: 2
+    pdb:
+      enabled: true
+      minAvailable: 1
+    resources:
+      requests:
+        memory: 8Gi
+        cpu: "1"
+      limits:
+        memory: 16Gi
+        cpu: "3"
+    # Old Hosted values had `CODEFRESH_PRIORITY_QUEUE=true` — that env var is a Codefresh-Argo-CD-fork feature.
+    # Bundled Argo CD here is OSS, so the env var has no effect. Dropped intentionally.
+
+  repoServer:
+    replicas: 2
+    pdb:
+      enabled: true
+      minAvailable: 1
+    resources:
+      requests:
+        memory: 4Gi
+        cpu: "1"
+        ephemeral-storage: 8Gi
+      limits:
+        cpu: "2"
+        memory: 8Gi
+        ephemeral-storage: 24Gi
+    env:
+      - name: ARGOCD_EXEC_TIMEOUT
+        value: "3m"
+
+  applicationSet:
+    replicas: 2
+    pdb:
+      enabled: true
+      minAvailable: 1
+
+  configs:
+    cm:
+      resource.exclusions: |
+        - apiGroups:
+          - policy
+          kinds:
+          - PodSecurityPolicy
+      resource.compareoptions: |
+        ignoreAggregatedRoles: true
+      timeout.reconciliation: "90s"
+
+    params:
+      # Sojern-specific workaround for conflicting apps/controllers in their Gerrit-driven dev flow.
+      # Carry-over from Hosted values; revisit once the new flow stabilises.
+      controller.self.heal.timeout.seconds: "60"
+
+    ssh:
+      knownHosts: |
+        bitbucket.org ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAubiN81eDcafrgMeLzaFPsw2kNvEcqTKl/VqLat/MaB33pZy0y3rJZtnqwR2qOOvbwKZYKiEO1O6VqNEBxKvJJelCq0dTXWT5pbO2gDXC6h6QDXCaHo6pOHGPUy+YBaGQRGuSusMEASYiWunYN0vCAI8QaXnWMXNMdFP3jHAJH0eDsoiGnLPBlBp4TNm6rYI74nMzgz3B9IikW4WVK+dc8KZJZWYjAuORU3jc1c/NPskD2ASinf8v3xnfXeukU0sJ5N6m5E8VLjObPEO+mN2t/FZTMZLiFqPWc/ALSqnMnnhwrNi2rbfg/rd/IpL8Le3pSBne8+seeFVBoGqzHM9yXw==
+        github.com ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ==
+        gitlab.com ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBFSMqzJeV9rUzU4kWitGjeR4PWSa29SPqJ1fVkhtj3Hw9xjLVXVYrU9QlYWrOLXBpQ6KWjbjTDTdDkoohFzgbEY=
+        gitlab.com ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIAfuCHKVTjquxvt6CM6tdG4SLp1Btn/nOeHHE5UOzRdf
+        gitlab.com ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCsj2bNKTBSpIYDEGk9KxsGh3mySTRgMtXL583qmBpzeQ+jqCMRgBqB98u3z++J1sKlXHWfM9dyhSevkMwSbhoR8XIq/U0tCNyokEi/ueaBMCvbcTHhO7FcwzY92WK4Yt0aGROY5qX2UKSeOvuP4D6TPqKF1onrSzH9bx9XUf2lEdWT/ia1NEKjunUqu1xOB/StKDHMoX4/OKyIzuS0q/T1zOATthvasJFoPrAjkohTyaDUz2LN5JoH839hViyEG82yB+MjcFV5MU3N1l1QL3cVUCh93xSaua1N85qivl+siMkPGbO5xR/En4iEY6K2XPASUEMaieWVNTRCtJ4S8H+9
+        ssh.dev.azure.com ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC7Hr1oTWqNqOlzGJOfGJ4NakVyIzf1rXYd4d7wo6jBlkLvCA4odBlL0mDUyZ0/QUfTTqeu+tm22gOsv+VrVTMk6vwRU75gY/y9ut5Mb3bR5BV58dKXyq9A9UeB5Cakehn5Zgm6x1mKoVyf+FFn26iYqXJRgzIZZcZ5V6hrE0Qg39kZm4az48o0AUbf6Sp4SLdvnuMa2sVNwHBboS7EJkm57XQPVU3/QpyNLHbWDdzwtrlS+ez30S3AdYhLKEOxAG8weOnyrtLJAUen9mTkol8oII1edf7mWWbWVf0nBmly21+nZcmCTISQBtdcyPaEno7fFQMDD26/s0lfKob4Kw8H
+        vs-ssh.visualstudio.com ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC7Hr1oTWqNqOlzGJOfGJ4NakVyIzf1rXYd4d7wo6jBlkLvCA4odBlL0mDUyZ0/QUfTTqeu+tm22gOsv+VrVTMk6vwRU75gY/y9ut5Mb3bR5BV58dKXyq9A9UeB5Cakehn5Zgm6x1mKoVyf+FFn26iYqXJRgzIZZcZ5V6hrE0Qg39kZm4az48o0AUbf6Sp4SLdvnuMa2sVNwHBboS7EJkm57XQPVU3/QpyNLHbWDdzwtrlS+ez30S3AdYhLKEOxAG8weOnyrtLJAUen9mTkol8oII1edf7mWWbWVf0nBmly21+nZcmCTISQBtdcyPaEno7fFQMDD26/s0lfKob4Kw8H
+        github.com ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEmKSENjQEezOmxkZMy7opKgwFB9nkt5YRrYMjNuG5N87uRgg6CLrbo5wAdT/y6v0mKV0U2w0WZ2YB/++Tpockg=
+        github.com ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOMqqnkVzrm0SdG6UOoqKLsabgH5C9okWi0dh2l9GKJl
+        [gerritssh.p.sojern.net]:29418 ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQCXJ3cQ+TJbonB+nw9YFKD9AXvqDDde5X/xzhYc2FrbLzqds/hKkkaL3N22VR42cmhLXojsCNCDqommKB7IP/0yrygfEzXgwVprSd2616S7BOIIc9IvOfWGEKTni83nvUfUzf4JnMrTXuCHonSQ6AMuYoNTaS9vrYLnnYaluxHOxQ==

From 4b675c670f65b4ed4ff1116b7f2951766de5832b Mon Sep 17 00:00:00 2001
From: Daniel Maizel
Date: Sun, 10 May 2026 17:21:00 +0300
Subject: [PATCH 2/2] docs(migration): drop sojern-hybrid README

values.yaml header carries the necessary context (source, target chart
version, dropped/added/kept). Separate README was duplicative and overstated
what's required for the customer.
---
 docs/migration/sojern-hybrid/README.md | 111 -------------------------
 1 file changed, 111 deletions(-)
 delete mode 100644 docs/migration/sojern-hybrid/README.md

diff --git a/docs/migration/sojern-hybrid/README.md b/docs/migration/sojern-hybrid/README.md
deleted file mode 100644
index 115e63ac2..000000000
--- a/docs/migration/sojern-hybrid/README.md
+++ /dev/null
@@ -1,111 +0,0 @@
-# Sojern: Hosted → Hybrid GitOps Runtime migration playbook
-
-Translates Sojern's Hosted runtime (`hgr-sojern-1e325d9`, runtime v0.21, paying tier) into a Hybrid install on gitops-runtime chart `0.29.10` (latest release as of 2026-05-08). Migration is a fresh install on a new cluster, not an in-place upgrade — this avoids the 4 breaking changes between 0.21 and 0.29.
-
-## Prerequisites
-
-The customer must provide:
-
-| Item | Notes |
-|---|---|
-| New Kubernetes cluster | Empty namespace `cf-runtime` (or whatever you choose); must reach `tunnels.cf-cd.com` outbound |
-| Codefresh user token | Created in Codefresh UI → User Settings; loaded into a secret `codefresh-user-token` (key `token`) before install |
-| Runtime name | Confirm `sojern-hybrid-prod` is acceptable, or pick another. Must be unique per Codefresh account. |
-| Git credentials *(optional)* | Can be set via Codefresh UI post-install, or layered in via `global.runtime.gitCredentials.password.secretKeyRef` |
-
-## Install
-
-```bash
-# 1. Create namespace + user-token secret on the target cluster
-kubectl create namespace cf-runtime
-kubectl -n cf-runtime create secret generic codefresh-user-token \
-  --from-literal=token=""
-
-# 2. Install the runtime
-helm upgrade --install sojern \
-  oci://quay.io/codefresh/gitops-runtime \
-  --version 0.29.10 \
-  -n cf-runtime \
-  -f values.yaml \
-  --atomic \
-  --history-max 5
-```
-
-## What this values.yaml changes vs. their old Hosted values
-
-Fully documented in the file's header comment. Summary:
-
-**Dropped (no longer applicable):**
-- `app-proxy.image` (was a CR-37307 dev build)
-- `argo-cd.server.env CODEFRESH_PRIORITY_QUEUE=true` (Codefresh-fork only; bundled Argo CD is OSS now)
-- `gitops-operator.argoCdNotifications.*` (path removed from chart — operator-side notifications subsystem deleted; not the same as upstream `argo-cd.notifications.*`)
-- `argo-cd.eventReporter.*` (replaced by top-level `event-reporters.{cluster,runtime}-event-reporter`)
-- `gitops-operator.resources` (chart defaults are fine)
-
-**Added (Hybrid-only required block):**
-- `global.codefresh.{url,accountId,userToken}`
-- `global.runtime.name`
-- `global.integrations.argo-cd.server.auth`
-- Top-level `redis-ha.enabled: true` (required by app-proxy when replicaCount > 1)
-- HA topology to match Hosted paying-tier (replicas + PDBs across all components)
-
-**Kept:**
-- `global.runtime.isConfigurationRuntime: true` (this Hybrid takes over as account's configuration runtime)
-- All Sojern-specific Argo CD knobs: resource sizing, `gerritssh.p.sojern.net` knownHost, `controller.self.heal.timeout.seconds=60`, repo-server `ARGOCD_EXEC_TIMEOUT=3m`, status/operation processors
-
-## Pre-install verification
-
-```bash
-# Render and inspect — should produce ~201 manifests with no template errors
-helm template sojern oci://quay.io/codefresh/gitops-runtime \
-  --version 0.29.10 -n cf-runtime -f values.yaml > /tmp/render.yaml
-
-# Spot-check key resources are configured for HA
-grep -c "kind: PodDisruptionBudget" /tmp/render.yaml # expect 11
-grep -c "kind: StatefulSet" /tmp/render.yaml # expect 3 (controller + 2× redis-ha)
-```
-
-## Post-install verification (smoke tests)
-
-```bash
-NS=cf-runtime
-RUNTIME=sojern-hybrid-prod
-
-# 1. All HA workloads have ≥2 replicas ready
-kubectl -n $NS get deploy -l 'app.kubernetes.io/part-of in (app-proxy,argocd,internal-router)' \
-  -o custom-columns=NAME:.metadata.name,READY:.status.readyReplicas
-
-# 2. Both redis-ha clusters healthy (gitops-runtime-level + argo-cd-level)
-kubectl -n $NS get statefulset | grep redis-ha # expect 2 STSs, 3/3 each
-
-# 3. Tunnel is registered (look for runtime online in Codefresh UI)
-kubectl -n $NS logs -l app=codefresh-tunnel-client --tail=50 | grep -i "registered\|connected"
-
-# 4. Runtime appears in Codefresh and is marked the configuration runtime
-# Visit: https://g.codefresh.io/2.0/account-settings/runtime
-# Check: $RUNTIME shows online, "isConfigurationRuntime" badge present
-```
-
-## Functional validation (the test that matters)
-
-Per the engineering decision in the May 6 thread, "the install renders and pods start" is *not* sufficient sign-off. Before declaring the playbook ready:
-
-- [ ] Connect a non-prod copy of one of Sojern's GitOps source repos
-- [ ] Trigger an app sync; confirm event-reporter publishes events to Codefresh
-- [ ] Trigger a workflow; confirm commit-status reports back to Gerrit
-- [ ] Verify image enrichment runs end-to-end on a test image
-- [ ] Diff observed event throughput against current Hosted baseline
-
-These exercise the breaking changes that made us choose fresh-install over upgrade in the first place (Argo Events / event-reporter rewrites, Gerrit knownHost handling).
-
-## Known caveats
-
-1. **Comment header in `values.yaml` documents what was dropped vs. Hosted** — keep it in sync if values change.
-2. **Top-level `redis-ha.enabled: true` is mandatory** for HA app-proxy. The chart's stock `values-ha.yaml` does *not* set this even though it sets `app-proxy.replicaCount: 2`, so layering `-f values-ha.yaml` alone would fail validation. Filed separately as a chart fix.
-3. **Image enrichment images** are pinned to `1.1.27-main` in chart defaults — confirm Sojern's enrichment templates don't reference older tags.
-4. **`controller.self.heal.timeout.seconds: "60"`** is a Sojern-specific carryover from their Gerrit-driven dev flow. Revisit once their flow stabilises post-migration.
-
-## Files
-
-- `values.yaml` — production values, ready to install with
-- `README.md` — this file