From b269c35b3663378f92eed22cda39f60d20c0ac39 Mon Sep 17 00:00:00 2001
From: bdchatham
Date: Tue, 19 May 2026 12:31:05 -0700
Subject: [PATCH] feat(scenarios): ConfigMap ownerReference + Workflow name templating (PR 7)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two related improvements to the major-upgrade scenario from PR 6:

1. The Workflow CR's metadata.name now carries $SEI_WORKFLOW_RUN_ID so
   concurrent applies don't collide on the CR. With the previous
   hardcoded `name: major-upgrade`, two parallel `kubectl apply`
   invocations patched the same CR instead of creating distinct runs,
   even when the per-run ConfigMap names were distinct.

2. The per-run workflow-vars ConfigMap now carries an ownerReference
   pointing at the parent Workflow CR. Deleting the Workflow cascades
   garbage-collection of the ConfigMap automatically via
   kube-controller-manager, so there is no operator-managed cleanup
   and no leftover ConfigMaps accumulating across runs.

Changes:

- scenarios/major-upgrade.yaml:
  - Workflow.metadata.name = `major-upgrade-$SEI_WORKFLOW_RUN_ID`
  - compute-target-height looks up the Workflow's UID via kubectl and
    patches the rendered ConfigMap with ownerReferences before apply.
    It uses `kubectl patch --local --type=merge` to inject the field
    at the YAML pipeline stage, so there is no need to re-Get after
    apply. blockOwnerDeletion is false so the ConfigMap never blocks
    Workflow deletion.
- runner/rbac.yaml: add `chaos-mesh.org/workflows: get` so
  compute-target-height (running under the seitask-runner SA) can
  resolve the parent Workflow's UID.
- scenarios/README.md: update the Cleanup and Known Limitations
  sections to reflect the cascade-delete behavior and remove the
  deferred-cleanup entry.

Runner binary unchanged. No new CRDs, no new templates, no Go code.
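
The merge patch described above can be sketched in isolation. This is a minimal illustration only, not code from this patch: the helper name `owner_ref_patch` and the example run ID and UID are hypothetical. The JSON shape mirrors the literal passed to `kubectl patch --local --type=merge` in compute-target-height.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not part of this patch): prints the
# merge patch that stamps an ownerReference onto the rendered ConfigMap.
#   $1 = run ID (the value of SEI_WORKFLOW_RUN_ID)
#   $2 = UID of the parent Workflow CR
owner_ref_patch() {
  # controller=false: the Workflow owns the ConfigMap but does not manage it.
  # blockOwnerDeletion=false: the ConfigMap never holds up Workflow deletion.
  printf '{"metadata":{"ownerReferences":[{"apiVersion":"chaos-mesh.org/v1alpha1","kind":"Workflow","name":"major-upgrade-%s","uid":"%s","controller":false,"blockOwnerDeletion":false}]}}' "$1" "$2"
}

# Placeholder values, for illustration only.
owner_ref_patch "run-42" "00000000-0000-0000-0000-000000000000"
```

In the scenario the same JSON is built inline; a helper like this would be passed as `--patch "$(owner_ref_patch "$SEI_WORKFLOW_RUN_ID" "$WORKFLOW_UID")"`. After the apply, `kubectl get configmap "workflow-vars-${SEI_WORKFLOW_RUN_ID}" -o jsonpath='{.metadata.ownerReferences[0].uid}'` should print the Workflow's UID if the reference was stamped as intended.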
---
 runner/rbac.yaml             |  7 +++++++
 scenarios/README.md          | 20 +++++++-------------
 scenarios/major-upgrade.yaml | 20 +++++++++++++++++++-
 3 files changed, 33 insertions(+), 14 deletions(-)

diff --git a/runner/rbac.yaml b/runner/rbac.yaml
index 4198c06..f96f33d 100644
--- a/runner/rbac.yaml
+++ b/runner/rbac.yaml
@@ -45,6 +45,13 @@ rules:
 - apiGroups: [""]
   resources: ["configmaps"]
   verbs: ["get", "list", "watch", "create", "patch", "update"]
+# Chaos Mesh Workflow: read-only. compute-target-height looks up the
+# parent Workflow CR's UID so it can stamp the workflow-vars ConfigMap
+# with an ownerReference. Deletion of the Workflow then cascades
+# garbage-collection of the ConfigMap automatically.
+- apiGroups: ["chaos-mesh.org"]
+  resources: ["workflows"]
+  verbs: ["get"]
 ---
 apiVersion: rbac.authorization.k8s.io/v1
 kind: RoleBinding
diff --git a/scenarios/README.md b/scenarios/README.md
index 2ac914d..d8d7856 100644
--- a/scenarios/README.md
+++ b/scenarios/README.md
@@ -207,11 +207,11 @@ namespace as the Workflow:
   `$SEI_UPGRADE_NAME` per concurrent run, or treat the chain as serially
   owned by one scenario at a time.
 
-- **Cleanup:** ConfigMaps are not garbage-collected by the Workflow.
-  Operators clear them via the `sei.io/workflow-run` label (see Cleanup
-  above). A future enhancement is to set an `ownerReference` on the
-  ConfigMap pointing at the Workflow CR so it cascades on Workflow
-  deletion.
+- **Cleanup:** the ConfigMap carries an `ownerReference` pointing at the
+  parent Workflow CR (`major-upgrade-$SEI_WORKFLOW_RUN_ID`). Deleting the
+  Workflow cascades garbage-collection of the ConfigMap automatically
+  via kube-controller-manager. Operators can still clean up by label
+  (`-l sei.io/workflow-run`) if multiple Workflows are torn down at once.
 
 ## Known limitations / deferred capability
 
@@ -244,19 +244,13 @@ namespace as the Workflow:
 4. **The runner image is not yet auto-published.** Add a `runner` step to
    `.github/workflows/ecr.yml` once this scenario is wired into a CI job.
 
-5. **ConfigMap is not owner-referenced to the Workflow.** Cleanup is
-   manual today. A follow-up that adds an `ownerReferences` entry to
-   the ConfigMap (pointing at the Workflow CR) would make Workflow
-   deletion cascade. Punt until the runner manages the ConfigMap
-   lifecycle natively.
-
-6. **Argo Workflows migration is still on the long-term roadmap.** The
+5. **Argo Workflows migration is still on the long-term roadmap.** The
    ConfigMap bridge is the MVP. Argo's `outputs.parameters` /
    `inputs.parameters` is more ergonomic and avoids the per-run ConfigMap
    garbage. Plan that migration once we have more than one scenario
    worth porting.
 
-7. **No fan-out from a single step.** The 4-vote step is hard-coded to
+6. **No fan-out from a single step.** The 4-vote step is hard-coded to
    4 children rather than `--per-node-selector=role=validator`. We could
    collapse the four `vote-node-*` templates into one fan-out runner if
    the SeiNodes carry a consistent label, but the explicit per-node form
diff --git a/scenarios/major-upgrade.yaml b/scenarios/major-upgrade.yaml
index 7b9e8b2..a0aa08d 100644
--- a/scenarios/major-upgrade.yaml
+++ b/scenarios/major-upgrade.yaml
@@ -76,7 +76,11 @@
 apiVersion: chaos-mesh.org/v1alpha1
 kind: Workflow
 metadata:
-  name: major-upgrade
+  # Workflow CR name carries the run ID so two concurrent applies don't
+  # collide on the same CR. The workflow-vars ConfigMap (see
+  # compute-target-height) sets ownerReferences to this CR so a Workflow
+  # deletion cascades to the ConfigMap.
+  name: major-upgrade-$SEI_WORKFLOW_RUN_ID
   labels:
     sei.io/scenario: major-upgrade
     sei.io/workflow-run: "$SEI_WORKFLOW_RUN_ID"
@@ -146,6 +150,17 @@ spec:
               POST=$((TARGET + 10))
               PANIC_BOUNDARY=$((TARGET - 1))
               echo "current=${CUR} target=${TARGET} post=${POST} panic_boundary=${PANIC_BOUNDARY}"
+              # Look up the parent Workflow's UID so we can stamp an
+              # ownerReference on the ConfigMap. When the Workflow CR is
+              # deleted, kube-controller-manager garbage-collects the
+              # ConfigMap automatically — no operator-managed cleanup.
+              WORKFLOW_UID=$(kubectl get workflow.chaos-mesh.org \
+                "major-upgrade-${SEI_WORKFLOW_RUN_ID}" \
+                -o jsonpath='{.metadata.uid}')
+              if [ -z "${WORKFLOW_UID}" ]; then
+                echo "failed to resolve Workflow UID for major-upgrade-${SEI_WORKFLOW_RUN_ID}" >&2
+                exit 1
+              fi
               kubectl create configmap "workflow-vars-${SEI_WORKFLOW_RUN_ID}" \
                 --from-literal=TARGET_HEIGHT="${TARGET}" \
                 --from-literal=UPGRADE_HEIGHT="${TARGET}" \
@@ -155,6 +170,9 @@ spec:
                 | kubectl label -f - --local -o yaml \
                     sei.io/workflow-run="${SEI_WORKFLOW_RUN_ID}" \
                     sei.io/scenario=major-upgrade \
+                | kubectl patch -f - --local --type=merge --patch \
+                    "{\"metadata\":{\"ownerReferences\":[{\"apiVersion\":\"chaos-mesh.org/v1alpha1\",\"kind\":\"Workflow\",\"name\":\"major-upgrade-${SEI_WORKFLOW_RUN_ID}\",\"uid\":\"${WORKFLOW_UID}\",\"controller\":false,\"blockOwnerDeletion\":false}]}}" \
+                    -o yaml \
                 | kubectl apply -f -
           env:
             - {name: SEI_DEPLOYMENT, value: "$SEI_DEPLOYMENT"}