This repository bootstraps a multi-cluster Kubernetes environment with Argo CD, using a metal cluster that hosts vClusters for operations and workloads.
- Secure Application Publishing: Utilizes Cloudflare Tunnel (provisioned via OpenTofu) to safely expose services and manage DNS for private clusters.
- Automated Multi-Cluster Bootstrapping: Leverages Argo CD ApplicationSets templates to provision multiple clusters and workloads from a single source of truth.
- Centralized Secret Management: Integrates HashiCorp Vault and ExternalSecrets to securely manage and distribute secrets across multiple clusters.
- Multi-Tenant Virtualization: Deploys vcluster alongside OpenTofu to provide isolated virtual clusters for multi-tenant workloads.
- Secure Private Networking: Embeds ZeroTier and Tailscale support to establish secure remote access to private services when required.
flowchart LR
subgraph Operator[Operator workstation]
Tools["vcluster + kubectl + helm + OpenTofu"]
end
Tools --> CloudInit["Cloud init (OpenTofu)"]
CloudInit --> Cloudflare["Cloudflare DNS + Tunnel"]
CloudInit --> RepoAccess["GitHub tokens/OAuth"]
CloudInit --> Metal["Metal Kubernetes cluster"]
Metal --> Argo["Argo CD"]
Argo -->|ApplicationSets| MetalApps["Metal apps"]
Argo -->|ApplicationSets| OpsApps["Operations apps"]
Argo -->|ApplicationSets| WorkApps["Workloads apps"]
Metal --> OpsV["vCluster: operations"]
Metal --> WorkV["vCluster: workloads"]
Vault["Vault"] -->|ExternalSecrets| OpsApps
Vault -->|ExternalSecrets| WorkApps
VPN["ZeroTier/Tailscale"] --> Metal
Cloudflare --> Metal
- root/: bootstrap charts, configs, secrets, cloud-init, and vault-init
- metal/: base cluster infrastructure (ingress, security, observability, kube-infra)
- operations/: operations vCluster workloads (ingress, security, observability)
- workloads/: application workloads (media, idp, services, observability)
Tools:
- vcluster
- kubectl
- helm
- OpenTofu (tofu)
Accounts and access:
- Cloudflare account with API key and zone/account ID
- GitHub PAT and OAuth app credentials for Argo CD SSO
- Vault token with permissions to configure Kubernetes auth
- ZeroTier or Tailscale (optional, for private routes)
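
Before starting, it may help to confirm that the command-line tools used in the steps of this guide (vcluster, kubectl, helm, and tofu) are on your PATH. The following is a minimal sketch; `check_tools` is a hypothetical helper name and is not part of this repository.

```shell
# Hypothetical helper: report which of the given CLI tools are missing from PATH.
# Prints one "missing: <tool>" line per absent tool and returns non-zero
# if at least one tool is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_tools vcluster kubectl helm tofu
```

The function only checks that each binary can be found; it does not verify versions.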
- Create OpenTofu inputs for cloud-init:
  cp root/cloud-init/terraform.tfvars.example root/cloud-init/terraform.tfvars
- Update values in root/cloud-init/terraform.tfvars, including:
  - Cloudflare zone, account ID, email, and API key
  - GitHub token and OAuth client secret
  - github_repo_url / github_username if you are using a fork
  - kube_context and private_ip
  - ZeroTier/Tailscale toggles and tokens if enabled
- Update Argo CD settings in root/argocd/values.yaml (domain, GitHub org, OAuth client ID).
- Update repository settings in root/bootstrap/values.yaml (repo URL and target revision).
- Prepare Vault inputs for root/vault-init:
  - Export TF_VAR_vault_token and optionally TF_VAR_vault_address, or
  - Create root/vault-init/terraform.tfvars with vault_token (and vault_address if needed).
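
Before running OpenTofu in root/vault-init, you may want to confirm that one of the two Vault input options is actually in place. The sketch below is an illustration only: `vault_inputs_ready` is a hypothetical helper, and it assumes the tfvars file assigns `vault_token` on a line of its own.

```shell
# Hypothetical helper: return 0 if a Vault token is available to OpenTofu,
# either from the TF_VAR_vault_token environment variable or from a tfvars file.
# The tfvars path defaults to the location named in this guide.
vault_inputs_ready() {
  tfvars_file="${1:-root/vault-init/terraform.tfvars}"
  if [ -n "${TF_VAR_vault_token:-}" ]; then
    echo "using TF_VAR_vault_token from the environment"
    return 0
  fi
  # Assumption: the file contains a line of the form: vault_token = "..."
  if [ -f "$tfvars_file" ] && grep -Eq '^[[:space:]]*vault_token[[:space:]]*=' "$tfvars_file"; then
    echo "using vault_token from $tfvars_file"
    return 0
  fi
  echo "no Vault token found" >&2
  return 1
}

# Usage (from the repository root): vault_inputs_ready
```

The check only looks for the presence of a token, not whether Vault accepts it.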
Provision Cloudflare DNS, Cloudflare Tunnel, Argo CD OAuth secret, and the repo token.
cd root/cloud-init
tofu init
tofu apply

cd root/argocd
./apply.sh

The bootstrap step creates ApplicationSets for root/, metal/, operations/, and workloads/.
cd root/bootstrap
helm template \
--namespace argocd \
--values ./values.yaml \
bootstrap . \
| kubectl apply -n argocd -f -

Expect errors early on because the operations and workloads clusters cannot access Vault yet. The following applications are expected to fail temporarily:
- Operations vCluster:
  - external-dns (cannot retrieve token from Vault)
  - cert-manager (cannot retrieve token from Vault)
  - Mimir, Loki, Grafana, Tempo, Grafana Agent (cannot retrieve token from Vault)
- Workloads vCluster:
  - external-dns (cannot retrieve token from Vault)
  - cert-manager (cannot retrieve token from Vault)
  - Kubecost (cannot connect to Mimir)
The metal cluster should be healthy because it does not require Vault to retrieve tokens.
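
To see which applications are still failing at this stage, one option is to list the Argo CD Application resources and filter out the healthy ones. This is a sketch under assumptions: `unhealthy_apps` is a hypothetical helper, and the usage line assumes your current kubectl context points at the metal cluster where Argo CD runs in the argocd namespace.

```shell
# Hypothetical helper: read "NAME HEALTH" pairs on stdin and print the names
# of applications whose health status is anything other than Healthy.
unhealthy_apps() {
  awk '$2 != "Healthy" { print $1 }'
}

# Usage against a live cluster:
# kubectl get applications.argoproj.io -n argocd --no-headers \
#   -o custom-columns=NAME:.metadata.name,HEALTH:.status.health.status \
#   | unhealthy_apps
```

Once Vault access is configured for the operations and workloads clusters, the list should shrink as the applications recover.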
cd root/vault-init
vcluster connect metal-workloads \
-n vcluster \
--update-current=false \
--server=https://workloads-vcluster.internal.systemrestartengineering.cloud \
--kube-config ./workloads-kubeconfig.yaml
vcluster connect metal-operations \
-n vcluster \
--update-current=false \
--server=https://operations-vcluster.internal.systemrestartengineering.cloud \
--kube-config ./operations-kubeconfig.yaml

cd root/vault-init
tofu init
tofu apply

ZeroTier is used to reach private routes. After OpenTofu creates the ZeroTier network defined in root/cloud-init/zerotier.tf, install the ZeroTier client and join the network using the generated network ID. If you enable Tailscale instead, provide the auth key in root/cloud-init/terraform.tfvars.
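
Joining the ZeroTier network from a workstation could look like the sketch below. Both function names are hypothetical. The sketch assumes the ZeroTier client is installed, that `zerotier-cli` needs elevated privileges on your system, and that you have copied the generated network ID by hand, since this guide does not say how the ID is exposed.

```shell
# Hypothetical helper: a ZeroTier network ID is 16 hexadecimal characters.
is_zerotier_network_id() {
  printf '%s' "$1" | grep -Eq '^[0-9a-fA-F]{16}$'
}

# Hypothetical helper: join the network only if the ID looks well-formed.
join_zerotier() {
  if is_zerotier_network_id "$1"; then
    sudo zerotier-cli join "$1"
  else
    echo "not a ZeroTier network ID: $1" >&2
    return 1
  fi
}

# Usage: join_zerotier <network-id>
```

Validating the ID first avoids a confusing client error when the value was copied incompletely.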
Retrieve the kubeconfig for the workloads cluster:
vcluster connect metal-workloads \
-n vcluster \
--update-current=false \
--server=https://workloads-vcluster.internal.systemrestartengineering.cloud \
--kube-config ./workloads-kubeconfig.yaml

Retrieve the kubeconfig for the operations cluster:
vcluster connect metal-operations \
-n vcluster \
--update-current=false \
--server=https://operations-vcluster.internal.systemrestartengineering.cloud \
--kube-config ./operations-kubeconfig.yaml

The operations and workloads clusters share the same DNS zone (*.internal.systemrestartengineering.cloud). If external-dns runs in both clusters, they will conflict and neither will be able to update records.
