Add control plane recovery complex task by jessie1111101 · Pull Request #63 · gke-labs/devops-bench

jessie1111101 · 2026-06-04T21:39:30Z

No description provided.

pradeepvrd · 2026-06-07T23:51:09Z

+  template:
+    metadata:
+      labels:
+        job-name: setup-and-corrupt


We should avoid job names that make it obvious this is a test. The agent is likely to recognize the setup from the name alone, before inspecting anything.
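One way to catch names like `setup-and-corrupt` before they ship is a small lint over the rendered manifests. The following is only a sketch: the word list, the field names it checks, and the function name `find_giveaway_names` are assumptions for illustration, not something that exists in this repository.

```python
import re

# Hypothetical word list: terms that would reveal the scenario to the agent.
GIVEAWAY_WORDS = ("corrupt", "chaos", "fault", "inject", "break", "test")

# Matches simple "key: value" lines for a few name-like keys (an assumed set).
NAME_FIELD = re.compile(r"^\s*(?:-\s*)?(name|job-name|app)\s*:\s*(\S+)")


def find_giveaway_names(manifest_text, words=GIVEAWAY_WORDS):
    """Return (line_number, stripped_line) pairs whose name-like value contains a giveaway word."""
    hits = []
    for number, line in enumerate(manifest_text.splitlines(), start=1):
        # Tolerate diff-style input by dropping a leading '+' before matching.
        match = NAME_FIELD.match(line.lstrip("+"))
        if match and any(word in match.group(2).lower() for word in words):
            hits.append((number, line.strip()))
    return hits
```

This is a plain text scan rather than a YAML parse, so it only sees single-line `key: value` pairs; that keeps it dependency-free at the cost of missing multi-line or templated values.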

Ideally, we should find a way to trigger the corruption from outside the cluster; otherwise we risk the agent inspecting the job and easily identifying the exact trigger.
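As a rough illustration of an out-of-cluster trigger, the harness could run the corrupting command through `kubectl exec` instead of a Job, so no Job object is left in the namespace for the agent to read. This is a sketch under assumptions: `build_external_trigger` is a hypothetical helper, `etcd-0` is assumed to be a pod name, and the shell command is a placeholder rather than the task's real trigger.

```python
import shlex


def build_external_trigger(namespace, pod, command, kubeconfig=None):
    """Build a `kubectl exec` argv for a harness running outside the cluster.

    `command` is whatever shell snippet the harness wants to run in the pod;
    it is supplied by the caller and is not defined by this sketch.
    """
    argv = ["kubectl"]
    if kubeconfig:
        # Point kubectl at a specific kubeconfig file when one is given.
        argv += ["--kubeconfig", kubeconfig]
    argv += ["exec", "-n", namespace, pod, "--", "sh", "-c", command]
    return argv


if __name__ == "__main__":
    # Placeholder values; print the command instead of executing it.
    print(shlex.join(build_external_trigger("demo-ns", "etcd-0", "echo placeholder")))
```

Returning an argv list rather than a shell string lets the harness pass it straight to `subprocess.run` without extra quoting.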

pradeepvrd · 2026-06-07T23:55:45Z

+  stack: "prebuilt/cp-recovery"
+  teardown: true
+prompt: |
+  You are a Site Reliability Engineer managing a Kubernetes control plane (simulated by a custom API server 'mock-apiserver' and a 3-member etcd cluster 'etcd-0', 'etcd-1', 'etcd-2') in the namespace '{{NAMESPACE}}' on the cluster '{{GKE_CLUSTER_NAME}}' in project '{{GCP_PROJECT_ID}}'.


Honestly, this feels like a task better suited to kind than GKE. With kind we would not need to simulate etcd and the API server, and we could even trigger the corruption from outside the cluster. This is also a fair amount of detail for the task, and likely not what was intended.
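To make the kind suggestion concrete, here is a minimal sketch of the commands a harness might assemble. It assumes a default single-node kind cluster, where the control-plane node runs as a Docker container named `<cluster-name>-control-plane`; the helper name `kind_commands` and the node command are hypothetical.

```python
def kind_commands(cluster_name, node_command):
    """Return argv lists for creating a kind cluster, running a command on its node, and deleting it."""
    # Assumption: default single-node kind config, so the node container is "<name>-control-plane".
    node = f"{cluster_name}-control-plane"
    return {
        "create": ["kind", "create", "cluster", "--name", cluster_name],
        # Runs from the host against the node container, so nothing is scheduled inside the cluster.
        "trigger": ["docker", "exec", node, "sh", "-c", node_command],
        "delete": ["kind", "delete", "cluster", "--name", cluster_name],
    }
```

Because the trigger runs through `docker exec` on the host, it never appears as a Kubernetes object that the agent could list.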

pradeepvrd · 2026-06-07T23:56:52Z

+
+  The control plane is currently experiencing a critical failure.
+  Restore the control plane to a healthy, consistent state from the latest verified backup in the GCS bucket 'cpr-{{GCP_PROJECT_ID}}-{{NAMESPACE}}'.
+  To prevent data drift or further corruption during recovery, ensure mutating requests are blocked and the system status is updated accordingly.


Let's remove all of this. The agent should be able to work it out on its own; if it cannot, we can assume the agent context is misconfigured.

jessie1111101 added 26 commits May 27, 2026 21:16

update diagnosis accuracy criteria to be more pragmatic

3b2d2c8

Merge branch 'main' into complextask

30e82c6

add secret rotation complex task

84f4744

undo diagnosis change

10ec9a5

draft for adding terraform provisioner support

f681863

Resolve merge conflict in deployers/factory.py by keeping main struct…

6699548

…ure and updating gcp variables resolver

update with terraform provisioner

8bdc30c

Merge branch 'main' into complextask

40ad9a1

merge main and migrate secret-rotation task to OpenTofu (tofu)

5af936e

refactor: revert to working local-exec flow while preserving dynamic …

07c2bc1

…namespace secret ID fix

chore: ignore generated rendered yaml manifests and local tofu binary

3771bf2

refactor: clear GCE VM session folder at the start of agent run to pr…

b724ced

…event context window bloat

doc: update README reference to OpenTofu and document GKE cluster reu…

e104049

…se tip

fix: add namespace and file hash triggers to kubernetes_manifests nul…

523f982

…l_resource to ensure GKE workloads are re-applied on namespace changes

add control plane recovery task

a710a28

use kubernetes manifest resource

641f320

clean up

5eeaa7a

clean up

3fc5d29

small fix

66ce0e4

update readme for secret rotation

a2afb17

Merge branch 'complextask' into addcomplextask

3790583

cp recovery task refactoring

04d0b7e

move firewall and open claw permissions

59e20ec

Merge branch 'complextask' into addcomplextask

31d6ee3

update permissions

b12c0f9

Merge branch 'main' into addcomplextask

41f7327

jessie1111101 requested review from itssimrank and pradeepvrd June 6, 2026 01:54

pradeepvrd requested changes Jun 7, 2026

jessie1111101 wants to merge 26 commits into main from addcomplextask

2 participants
