Add control plane recovery complex task #63

Open
jessie1111101 wants to merge 26 commits into main from addcomplextask

@jessie1111101
Collaborator

No description provided.

…l_resource to ensure GKE workloads are re-applied on namespace changes
template:
  metadata:
    labels:
      job-name: setup-and-corrupt

Collaborator

We might not want to use job names that make it obvious this is a test. The agent is likely to figure that out even before inspecting the job.

Ideally, we should look for a way to trigger the corruption from outside the cluster; otherwise we risk the agent inspecting the job and easily identifying the exact trigger.
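
To illustrate the naming point, a Job with a neutral name might look roughly like the sketch below. The name `cluster-init`, the `busybox:1.36` image, and the command are placeholders chosen for illustration; none of them come from this PR.

```yaml
# Sketch only: the name, image, and command are placeholders, not taken from the PR.
apiVersion: batch/v1
kind: Job
metadata:
  name: cluster-init            # hypothetical neutral name
spec:
  template:
    metadata:
      labels:
        job-name: cluster-init  # kept equal to the Job name, mirroring the snippet under review
    spec:
      restartPolicy: Never
      containers:
        - name: init
          image: busybox:1.36
          command: ["sh", "-c", "echo 'setup steps go here'"]
```

Renaming only hides the intent at the name level. The Job spec stays inspectable from inside the cluster, which is why an out-of-cluster trigger would still be preferable.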

stack: "prebuilt/cp-recovery"
teardown: true
prompt: |
  You are a Site Reliability Engineer managing a Kubernetes control plane (simulated by a custom API server 'mock-apiserver' and a 3-member etcd cluster 'etcd-0', 'etcd-1', 'etcd-2') in the namespace '{{NAMESPACE}}' on the cluster '{{GKE_CLUSTER_NAME}}' in project '{{GCP_PROJECT_ID}}'.

Collaborator

To be honest, this feels like a task that is better suited to kind than GKE. We wouldn't need to simulate etcd and the API server, and we could even trigger the corruption from outside the cluster. This is a fair amount of detail for the task and is likely not what was intended.
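
For reference, a kind cluster with a real multi-member control plane could be described with a config along these lines. This is a minimal sketch, not something from the PR: the three-node layout is an assumption chosen to mirror the 3-member etcd cluster in the prompt, and the file name in the usage note is hypothetical.

```yaml
# Sketch only: node count is an assumption mirroring the 3-member etcd in the prompt.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  # Each control-plane node runs its own API server and a stacked etcd member.
  - role: control-plane
  - role: control-plane
  - role: control-plane
```

Something like `kind create cluster --config cp-recovery.yaml` would bring this up. Because kind nodes are containers on the host, the corruption step could then be driven from the host rather than from a Job inside the cluster.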


  The control plane is currently experiencing a critical failure.
  Restore the control plane to a healthy, consistent state from the latest verified backup in the GCS bucket 'cpr-{{GCP_PROJECT_ID}}-{{NAMESPACE}}'.
  To prevent data drift or further corruption during recovery, ensure mutating requests are blocked and the system status is updated accordingly.

Collaborator

Let's remove all of this. The agent should be able to do it, and if not, the agent context can be assumed to be misconfigured.

2 participants