61 changes: 36 additions & 25 deletions .github/workflows/consul-postgres-ha-publish.yml
@@ -1,42 +1,49 @@
name: Publish consul-postgres-ha images

# Builds and publishes the six container images the consul-postgres-ha
# example needs (mesh-conn, bootstrap-secrets, signaling, webdemo,
# sidecar, patroni). On push to main, images are
# tagged with the commit SHA *and* `latest`, pushed to GHCR, and
# attested with Sigstore-backed GitHub Build Provenance so consumers
# can verify "this image came from this commit of this repo" without
# us managing any keys. PRs build to verify but do not push or attest.
# Builds and publishes the four container images the consul-postgres-ha
# example needs (mesh-sidecar, patroni, webdemo, signaling). On push
# to main, images are tagged with the commit SHA *and* `latest`,
# pushed to GHCR, and attested with Sigstore-backed GitHub Build
# Provenance so consumers can verify "this image came from this
# commit of this repo" without us managing any keys. PRs build to
# verify but do not push or attest.
#
# Why six images on one workflow: the example needs all of them in
# lockstep — bumping mesh-conn alone but leaving the rest stale leads
# to mixed-version clusters that are hard to reason about. One workflow
# means one set of tags moves together.
# Why one workflow for all four: the example needs them in lockstep —
# bumping one but leaving the rest stale leads to mixed-version
# clusters that are hard to reason about. One workflow means one set
# of tags moves together.
#
# `mesh-sidecar` is the consolidated platform-plumbing image (formerly
# four images: bootstrap-secrets, mesh-conn, the legacy keepalive, and
# the old envoy-only sidecar). Its build context is the parent
# consul-postgres-ha/ directory so its Dockerfile can pull the Go
# sources from sibling subdirs. The other three images build from
# their own subdirs.
#
# Verifying a published image (consumer side):
#
#   gh attestation verify \
#     oci://ghcr.io/dstack-tee/dstack-examples/consul-postgres-ha-mesh-conn:latest \
#     oci://ghcr.io/dstack-tee/dstack-examples/consul-postgres-ha-mesh-sidecar:latest \
#     --repo Dstack-TEE/dstack-examples

on:
  push:
    branches: [main]
    paths:
      - 'consul-postgres-ha/mesh-conn/**'
      - 'consul-postgres-ha/bootstrap-secrets/**'
      - 'consul-postgres-ha/mesh-conn/**'
      - 'consul-postgres-ha/mesh-sidecar/**'
      - 'consul-postgres-ha/patroni/**'
      - 'consul-postgres-ha/webdemo/**'
      - 'consul-postgres-ha/sidecar/**'
      - 'consul-postgres-ha/signaling/**'
      - '.github/workflows/consul-postgres-ha-publish.yml'
  pull_request:
    paths:
      - 'consul-postgres-ha/mesh-conn/**'
      - 'consul-postgres-ha/bootstrap-secrets/**'
      - 'consul-postgres-ha/mesh-conn/**'
      - 'consul-postgres-ha/mesh-sidecar/**'
      - 'consul-postgres-ha/patroni/**'
      - 'consul-postgres-ha/webdemo/**'
      - 'consul-postgres-ha/sidecar/**'
      - 'consul-postgres-ha/signaling/**'
      - '.github/workflows/consul-postgres-ha-publish.yml'
  workflow_dispatch:
@@ -59,18 +66,18 @@ jobs:
      fail-fast: false
      matrix:
        include:
          - name: mesh-conn
            context: consul-postgres-ha/mesh-conn
          - name: bootstrap-secrets
            context: consul-postgres-ha/bootstrap-secrets
          # `mesh-sidecar` builds with the parent dir as context so
          # its Dockerfile can pull bootstrap-secrets/ and mesh-conn/
          # Go sources from siblings.
          - name: mesh-sidecar
            context: consul-postgres-ha
            dockerfile: consul-postgres-ha/mesh-sidecar/Dockerfile
          - name: patroni
            context: consul-postgres-ha/patroni
          - name: signaling
            context: consul-postgres-ha/signaling
          - name: webdemo
            context: consul-postgres-ha/webdemo
          - name: sidecar
            context: consul-postgres-ha/sidecar
          - name: signaling
            context: consul-postgres-ha/signaling

    steps:
      - uses: actions/checkout@v4
@@ -90,7 +97,7 @@ jobs:
        id: meta
        uses: docker/metadata-action@v5
        with:
          # Image namespace lives one level under the repo so all six
          # Image namespace lives one level under the repo so all four
          # images sit side-by-side: ghcr.io/<owner>/<repo>/consul-postgres-ha-<name>
          images: ${{ env.REGISTRY }}/${{ github.repository }}/consul-postgres-ha-${{ matrix.name }}
          tags: |
@@ -103,6 +110,10 @@ jobs:
        uses: docker/build-push-action@v6
        with:
          context: ${{ matrix.context }}
          # Most images use the default Dockerfile in the context.
          # `mesh-sidecar` overrides this to point at
          # mesh-sidecar/Dockerfile while keeping the parent context.
          file: ${{ matrix.dockerfile || format('{0}/Dockerfile', matrix.context) }}
          platforms: linux/amd64
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
70 changes: 40 additions & 30 deletions consul-postgres-ha/FAILOVER.md
@@ -31,7 +31,7 @@ PW=$(ssh ... root@${W1}-22.${GW} "cat /tmp/dstack-runtime/secrets/patroni-superu

```bash
ssh ... root@${W1}-22.${GW} \
"docker exec dstack-tester-1 sh -c 'curl -s http://127.0.0.1:18803/cluster' | jq"
"docker exec dstack-sidecar-1 sh -c 'curl -s http://127.0.0.1:18803/cluster' | jq"

ssh ... root@${W1}-22.${GW} "PGPASSWORD='$PW' docker exec -e PGPASSWORD dstack-patroni-1 \
psql -h 127.0.0.1 -p 18703 -U postgres -d postgres \
@@ -86,19 +86,23 @@ consistent recovery state reached at 0/...
started streaming WAL from primary at 0/... on timeline 16
```

## Measured timeline (run from 2026-05-03)
## Measured timeline (run from 2026-05-04, single-sidecar layout)

```
T_kill 05:02:28.028 docker stop dstack-patroni-1 on worker-3
T_new_leader 05:02:49.994 worker-4 promoted (timeline 15 → 16) +22s
T_first_write 05:02:52.313 INSERT succeeds on worker-4 +24s ← RTO
T_restart_W3 05:03:39.704 docker start dstack-patroni-1
T_W3_rejoined 05:04:10.377 worker-3 streaming, lag=0 +31s
T_kill 17:31:26 docker stop dstack-patroni-1 on worker-5 (leader)
T_new_leader 17:31:57 worker-4 promoted (timeline 2 → 3) +31s
T_first_write 17:31:59 INSERT succeeds on worker-4 +33s ← RTO
```

**RTO (Recovery Time Objective): ~24 seconds.** That's the wall time
**RTO (Recovery Time Objective): ~33 seconds.** That's the wall time
from leader process death to first successful write on the new leader,
sitting comfortably inside the default Patroni `ttl=30`.
sitting just past the default Patroni `ttl=30` (TTL expiry plus
promotion overhead). The 2026-05-03 multi-container baseline measured
24s on a different cluster; the single-sidecar result is within
typical run-to-run variance for the `ttl=30 + promote-overhead`
window. Cheap rejoin was confirmed in a prior round of this same run:
a previously killed leader (worker-3) came back as a streaming replica
on the new timeline with lag=0 within ~60s of
`docker start dstack-patroni-1`.
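
The deltas in the timeline are plain differences between wall-clock timestamps. A minimal sketch of that arithmetic, assuming both timestamps fall on the same day (the helper names `to_seconds` and `elapsed` are illustrative, not part of this repo):

```bash
#!/usr/bin/env bash
# Convert HH:MM:SS to seconds since midnight.
to_seconds() {
  local h m s
  IFS=: read -r h m s <<< "$1"
  # 10# forces base 10 so fields like "08" are not parsed as octal.
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Elapsed seconds from the first timestamp to the second (same day assumed).
elapsed() {
  echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

elapsed 17:31:26 17:31:57   # T_kill -> T_new_leader, prints 31
elapsed 17:31:26 17:31:59   # T_kill -> T_first_write (RTO), prints 33
```

The same helper applies to the hard-kill and wipe timelines later in this file.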

## Tunables for the RTO/availability tradeoff

@@ -124,8 +128,9 @@ the leader at once:
ssh ... root@${LEADER}-22.${GW} "docker stop -t 0 \$(docker ps -q)"
```

This kills patroni, postgres, mesh-conn, consul, sidecar, webdemo, and
the keepalive — everything that produces signal for the rest of the
This kills patroni, postgres, webdemo, and the consolidated sidecar
(which itself runs bootstrap-secrets, mesh-conn, consul, and envoy
inside it) — everything that produces signal for the rest of the
cluster. Bring the host back via:

```bash
@@ -135,23 +140,29 @@ ssh ... root@${LEADER}-22.${GW} \
```

`docker compose up -d` respects the dependency order
(bootstrap-secrets → mesh-conn → consul → patroni).
(sidecar's `service_healthy` gate fires once bootstrap-secrets has
written `/run/instance/info.json`, then patroni and webdemo start).
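
When scripting the restart, it can help to block until the sidecar reports healthy before timing the rejoin. A minimal sketch, assuming the sidecar container is named `dstack-sidecar-1` and defines a Docker healthcheck (the `wait_healthy` helper is hypothetical, not part of this repo):

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll a container's Docker health status until it is
# "healthy" or the timeout (in seconds) elapses.
wait_healthy() {
  local container=$1 timeout=${2:-120} status
  local deadline=$(( SECONDS + timeout ))
  while (( SECONDS < deadline )); do
    # Fails quietly if the container does not exist yet or has no healthcheck.
    status=$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null)
    [[ $status == healthy ]] && return 0
    sleep 2
  done
  echo "timed out waiting for $container (last status: ${status:-unknown})" >&2
  return 1
}

# Example, run on the CVM after `docker compose up -d`:
#   wait_healthy dstack-sidecar-1 180 && docker ps
```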

### Measured timeline (run from 2026-05-03)
### Measured timeline (run from 2026-05-04, single-sidecar layout)

```
T_kill 07:26:42 docker stop -t 0 ALL 7 containers on worker-4
T_new_leader 07:27:13 worker-3 promoted (timeline 16 → 17) +31s
T_first_write 07:27:15 INSERT succeeds on worker-3 +33s ← RTO
T_restart_W4 07:27:46 docker compose up -d on worker-4
T_W4_rejoined 07:28:34 worker-4 streaming, lag=0 +48s after restart
T_kill 17:33:29 docker stop -t 0 ALL containers on worker-4 (leader)
T_new_leader 17:34:00 worker-3 promoted (timeline 3 → 4) +31s
T_first_write 17:34:02 INSERT succeeds on worker-3 +33s ← RTO
T_restart_W4 17:34:02 docker compose up -d on worker-4
```

**Hard-kill RTO ≈ 33 seconds**, ~9 seconds longer than the soft-kill
above. That extra cost is Consul gossip-failure detection: with
soft-kill only the Patroni leader-key TTL expires, while with hard-kill
the entire Consul agent is gone, so the surviving peers see *both*
signals.
**Hard-kill RTO ≈ 33 seconds**, identical to both the soft-kill above
and the 2026-05-03 multi-container hard-kill baseline. Consul
gossip-failure detection (which sees worker-4's whole agent disappear,
not just the Patroni lock) lines up with the Patroni leader-key TTL on
this run, so neither signal extends the RTO.

The post-restart rejoin path on dstack-worker pairs is occasionally
flaky. The documented `MESH_CONN_RELAY_ONLY=1` escape hatch in
`compose/worker.yaml` exists for exactly this case: turn it on if your
deployment hits a wedged ICE re-handshake. The single-sidecar
consolidation does not change the mesh-conn binary's behavior.

### Things confirmed by the hard-kill that the soft-kill didn't exercise

@@ -184,17 +195,16 @@ rm -rf /var/lib/docker/volumes/dstack_patroni-pgdata/_data/*
docker start dstack-patroni-1
```

### Measured timeline (run from 2026-05-03)
### Measured timeline (run from 2026-05-04, single-sidecar layout)

```
T_wipe 21:13:41 docker stop + rm -rf pgdata on worker-5
T_restart 21:13:42 docker start
T_basebackup 21:13:47 "trying to bootstrap from leader 'worker-4'"
T_complete 21:13:54 "replica has been created using basebackup" +7s
T_streaming 21:13:58 service registered, streaming WAL +16s total
T_wipe 17:34:21 docker stop + rm -rf pgdata on worker-5
T_restart 17:34:25 docker start
T_complete 17:34:43 "replica has been created using basebackup" +18s
T_streaming 17:35:43 streaming WAL on timeline 4, lag=0 +82s total
```

5.2 MB pgdata transferred in ~7 seconds end-to-end. Note the dataset
A few MB of pgdata transferred in ~18 seconds end-to-end. The dataset
is small enough that handshake/startup overhead dominates — for a
realistic throughput number, see the soft-kill section's pg_basebackup
trace at ~25 MB/s sustained on the QUIC path.
50 changes: 31 additions & 19 deletions consul-postgres-ha/PUBLISHING.md
@@ -1,10 +1,15 @@
# Stage 4 — image publishing & verification

The stage-4 example needs six container images deployed in lockstep:
`mesh-conn`, `bootstrap-secrets`, `signaling`, `webdemo`, `sidecar`,
`patroni`. CI publishes them to GHCR with Sigstore-backed GitHub Build
Provenance; consumers pin by tag (or, better, by digest) and verify
provenance with `gh attestation verify`.
The stage-4 example needs four container images deployed in lockstep:
`mesh-sidecar`, `patroni`, `webdemo`, `signaling`. CI publishes them to
GHCR with Sigstore-backed GitHub Build Provenance; consumers pin by
tag (or, better, by digest) and verify provenance with
`gh attestation verify`.

`mesh-sidecar` is the consolidated platform-plumbing image — a single
container that runs bootstrap-secrets, mesh-conn, consul, and (on
workers) envoy. It's the heaviest by a wide margin because it
inherits from envoyproxy/envoy and bundles three more binaries on top.

This doc covers the three paths you'll actually use:

@@ -15,10 +20,14 @@ This doc covers the three paths you'll actually use:
## 1. CI publish — the steady-state

`.github/workflows/consul-postgres-ha-publish.yml` runs on push to `main`
when any of the six image build contexts (or the workflow itself)
when any of the four image build contexts (or the workflow itself)
change, and on PRs touching the same paths. Each run:

- Builds all six images via a matrix job.
- Builds all four images via a matrix job. The `mesh-sidecar` build
uses `consul-postgres-ha/` as its docker context (instead of
`consul-postgres-ha/mesh-sidecar/`) so its Dockerfile can pull
`bootstrap-secrets/` and `mesh-conn/` Go sources from sibling
directories.
- On `main`, pushes to `ghcr.io/dstack-tee/dstack-examples/consul-postgres-ha-<name>` with two tags: the long-form commit SHA (`sha-<40-hex>`) and `latest`.
- Generates a GitHub Build Provenance attestation per image via
`actions/attest-build-provenance@v2`. The attestation is signed by
@@ -34,12 +43,12 @@ change, and on PRs touching the same paths. Each run:
```bash
# By tag (lower assurance — `latest` floats):
gh attestation verify \
oci://ghcr.io/dstack-tee/dstack-examples/consul-postgres-ha-mesh-conn:latest \
oci://ghcr.io/dstack-tee/dstack-examples/consul-postgres-ha-mesh-sidecar:latest \
--repo Dstack-TEE/dstack-examples

# By digest (preferred — pinned, won't drift):
gh attestation verify \
oci://ghcr.io/dstack-tee/dstack-examples/consul-postgres-ha-mesh-conn@sha256:<digest> \
oci://ghcr.io/dstack-tee/dstack-examples/consul-postgres-ha-mesh-sidecar@sha256:<digest> \
--repo Dstack-TEE/dstack-examples
```
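
To pin by digest you first need the digest. One way to get it, sketched under the assumption that the image has already been pulled locally, is to read the image's `RepoDigests` entry and turn it into an `oci://` reference (the `pinned_ref` helper is illustrative, not part of this repo):

```bash
#!/usr/bin/env bash
# Illustrative helper: turn a "repo@sha256:<hex>" string into the oci://
# reference form used by `gh attestation verify`, rejecting floating tags.
pinned_ref() {
  local repo_digest=$1
  # A sha256 digest is exactly 64 lowercase hex characters.
  if [[ ! $repo_digest =~ ^[^@]+@sha256:[0-9a-f]{64}$ ]]; then
    echo "not a digest-pinned reference: $repo_digest" >&2
    return 1
  fi
  echo "oci://$repo_digest"
}

# Usage (needs network access and an authenticated gh CLI):
#   img=ghcr.io/dstack-tee/dstack-examples/consul-postgres-ha-mesh-sidecar
#   docker pull "$img:latest"
#   ref=$(pinned_ref "$(docker image inspect --format '{{index .RepoDigests 0}}' "$img:latest")")
#   gh attestation verify "$ref" --repo Dstack-TEE/dstack-examples
```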

@@ -54,20 +63,23 @@ of `latest` doesn't silently swap your cluster's bits.

## 2. Manual one-off publish — dev iteration

When iterating fast on `mesh-conn` (or any other component) you don't
want to round-trip through CI for every byte. Two equivalent shortcuts:
When iterating fast on the mesh-sidecar (or any other component) you
don't want to round-trip through CI for every byte. There are two
equivalent shortcuts, described in the subsections that follow. Note
that `mesh-sidecar` builds from the `consul-postgres-ha/` parent dir
(it pulls Go sources from sibling subdirs); the other images build
from their own subdirs.
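
The context rule can be captured in a small wrapper so you don't have to remember which image needs `-f`. This is a sketch only: `build_cmd` is a hypothetical helper that prints the `docker build` command rather than running it, the tags are placeholders, and it assumes you run it from the repository root.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the `docker build` command for one image.
build_cmd() {
  local name=$1 tag=$2 root=consul-postgres-ha
  case $name in
    mesh-sidecar)
      # Parent directory as context, Dockerfile selected explicitly.
      echo "docker build -t $tag -f $root/mesh-sidecar/Dockerfile $root" ;;
    patroni|webdemo|signaling)
      # Own subdirectory as context, default Dockerfile.
      echo "docker build -t $tag $root/$name" ;;
    *)
      echo "unknown image: $name" >&2
      return 1 ;;
  esac
}

build_cmd mesh-sidecar ttl.sh/example-mesh-sidecar:24h
build_cmd patroni ttl.sh/example-patroni:24h
```

Pipe the output to `sh` (or swap `echo` for direct execution) once the printed command looks right.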

### a) `ttl.sh` (24h-disposable, no auth)

```bash
TS=$(date +%s)
TAG=ttl.sh/dstack-mesh-conn-${TS}:24h
docker build -t $TAG consul-postgres-ha/mesh-conn
TAG=ttl.sh/dstack-mesh-sidecar-${TS}:24h
docker build -t $TAG -f consul-postgres-ha/mesh-sidecar/Dockerfile consul-postgres-ha
docker push $TAG
```

Then point the running cluster at it via `terraform.tfvars`'s
`mesh_conn_image = ...` (and `terraform apply`), or hot-patch the
`mesh_sidecar_image = ...` (and `terraform apply`), or hot-patch the
running CVM (see §3). `ttl.sh` images expire 24h after push.

### b) Personal GHCR namespace (persistent, requires PAT)
@@ -76,8 +88,8 @@ If you want a longer-lived dev image without going through main:

```bash
echo "$GITHUB_TOKEN" | docker login ghcr.io -u <your-user> --password-stdin
TAG=ghcr.io/<your-user>/consul-postgres-ha-mesh-conn:dev-$(date +%s)
docker build -t $TAG consul-postgres-ha/mesh-conn
TAG=ghcr.io/<your-user>/consul-postgres-ha-mesh-sidecar:dev-$(date +%s)
docker build -t $TAG -f consul-postgres-ha/mesh-sidecar/Dockerfile consul-postgres-ha
docker push $TAG
```

@@ -99,17 +111,17 @@ Phala-Network/terraform-provider-phala#8).
```bash
GW=dstack-pha-prod5.phala.network
APP_ID=<cvm-app-id>
NEW=ttl.sh/dstack-mesh-conn-<ts>:24h
NEW=ttl.sh/dstack-mesh-sidecar-<ts>:24h
OLD=$(ssh ... root@${APP_ID}-22.${GW} \
"docker inspect dstack-mesh-conn-1 --format '{{.Config.Image}}'")
"docker inspect dstack-sidecar-1 --format '{{.Config.Image}}'")

ssh ... root@${APP_ID}-22.${GW} "
docker pull $NEW
docker tag $NEW $OLD
cd /tapp && docker compose \
--env-file /dstack/.host-shared/.decrypted-env \
-p dstack -f /tapp/docker-compose.yaml \
up -d --force-recreate mesh-conn
up -d --force-recreate sidecar
"
```
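
After the recreate it is worth confirming that the running container actually picked up the new bits, since the retag trick fails silently if the pull did not land. A minimal sketch to run on the CVM, comparing the container's image ID with the ID of the freshly pulled image (the `uses_image` helper is hypothetical, not part of this repo):

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the container was created from the
# same image ID that the given image reference currently resolves to.
uses_image() {
  local container=$1 image=$2 running wanted
  running=$(docker inspect --format '{{.Image}}' "$container") || return 1
  wanted=$(docker image inspect --format '{{.Id}}' "$image") || return 1
  [[ $running == "$wanted" ]]
}

# Example, on the CVM, with NEW set as in the hot-patch block:
#   uses_image dstack-sidecar-1 "$NEW" && echo "sidecar is running $NEW"
```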

15 changes: 7 additions & 8 deletions consul-postgres-ha/README.md
@@ -36,9 +36,10 @@ Prerequisites:

- A Phala Cloud account with API credentials at `~/.phala-cloud/credentials.json`.
- A Linux box with a public IP for the external coordinator (coturn + signaling).
- The six container images either already published to GHCR (via the
CI workflow on this repo's main branch) or pushed by you to a
registry of your choice. See [`PUBLISHING.md`](PUBLISHING.md).
- The four container images (`mesh-sidecar`, `patroni`, `webdemo`,
`signaling`) either already published to GHCR (via the CI workflow
on this repo's main branch) or pushed by you to a registry of your
choice. See [`PUBLISHING.md`](PUBLISHING.md).

```bash
cd consul-postgres-ha/cluster-example
@@ -72,11 +73,11 @@ consul-postgres-ha/
├── compose/ coordinator.yaml + worker.yaml templates
├── coordinator/ docker-compose for the external coordinator (coturn + signaling)
├── mesh-conn/ QUIC-over-pion/ICE overlay (~600 LoC Go)
├── bootstrap-secrets/ init container — TEE-derives per-CVM secrets
├── mesh-sidecar/ consolidated platform sidecar image (bootstrap-secrets + mesh-conn + consul + envoy)
├── bootstrap-secrets/ Go source — TEE-derives per-CVM secrets (built into sidecar)
├── mesh-conn/ Go source — QUIC-over-pion/ICE overlay (built into sidecar)
├── patroni/ Patroni + Postgres image
├── webdemo/ example workload sitting on the mesh
├── sidecar/ Envoy bootstrapper for Consul Connect mTLS
├── signaling/ HTTP /publish + /poll broker for ICE auth/candidate exchange
└── quic-on-ice/ standalone smoke test for the QUIC-over-ICE transport
```
@@ -113,8 +114,6 @@ and the Terraform structure as-is.
in parallel hits
[`phala-cloud#247`](https://github.com/Phala-Network/phala-cloud/issues/247)
— use `-parallelism=1` for now (~5 min × N to bring-up).
* Six container images per CVM is more platform plumbing than ideal.
A consolidation pass to a single sidecar container is planned.
* The mesh-conn admission story is **shared-secret based today**
(TURN HMAC), not attestation-based. Adding TEE attestation as the
admission credential is the next architectural step.
11 changes: 0 additions & 11 deletions consul-postgres-ha/bootstrap-secrets/Dockerfile

This file was deleted.
