feat(dbos): pod dispatch-role split (api/worker) via listenQueues by pedrofrxncx · Pull Request #3937 · decocms/studio

pedrofrxncx · 2026-06-16T00:16:53Z

What

Pod dispatch-role split so request-serving and agent-loop execution scale as independent pods off one image, one Helm release, one Argo app.

App (MESH_DISPATCH_ROLE): all (default, unchanged) · worker (dequeue only the thread-gate/automations run queues) · api (dequeue nothing; serve HTTP + enqueue). The role is wired through DBOSConfig.listenQueues; the all role omits the field, so behavior is identical to today.
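The role-to-queue mapping described above could look roughly like the sketch below. It is an illustration, not the PR's code: the queue name strings, the helper names parseDispatchRole and listenQueuesFor, the choice to throw on an unknown role, and the use of an empty list for api are all assumptions. The real DBOSConfig.listenQueues field may also expect queue objects rather than names.

```typescript
// Hypothetical queue names: the PR only says the worker role listens on the
// thread-gate and automations run queues, not what they are called.
const THREAD_GATE_QUEUE = "thread-gate";
const AUTOMATIONS_QUEUE = "automations";

type DispatchRole = "all" | "worker" | "api";

// Parse the raw MESH_DISPATCH_ROLE value. Unset or empty falls back to "all",
// matching the "default, unchanged" behavior described in the PR.
// Throwing on an unknown value is an assumption made for this sketch.
function parseDispatchRole(raw: string | undefined): DispatchRole {
  const value = (raw ?? "").trim();
  if (value === "") return "all";
  if (value === "all" || value === "worker" || value === "api") return value;
  throw new Error(`Unknown MESH_DISPATCH_ROLE: ${value}`);
}

// Returns the queues a pod should listen on.
// undefined means "omit listenQueues entirely" (the all role).
// An empty list for api is an assumption standing in for "dequeue nothing".
function listenQueuesFor(role: DispatchRole): string[] | undefined {
  switch (role) {
    case "all":
      return undefined;
    case "worker":
      return [THREAD_GATE_QUEUE, AUTOMATIONS_QUEUE];
    case "api":
      return [];
  }
}
```

Keeping the all case as "field absent" rather than "every queue listed" is what makes the default path identical to the pre-PR configuration.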

Chart (worker.enabled): renders a second <fullname>-worker Deployment + CPU HPA, same image/DB/auth. dispatchRole sets the main deployment's role.

How the LB can't hit a busy worker

Worker pods carry a distinct app.kubernetes.io/name (<name>-worker), so neither the main Deployment selector nor the Service selector matches them. Workers are never Service endpoints → the LB has no path to them; they receive work only by pulling the DBOS queue. /stream is served by the api pods (which tail NATS for the chunks that workers publish). Kubelet health probes hit pods directly, so workers stay live despite being off-Service.
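The selector argument can be checked with a small model of Kubernetes equality-based label matching: a selector matches a pod only if every key/value pair in the selector appears on the pod. The sketch below assumes a chart name of "studio" and a release name of "prod" purely for illustration; neither value comes from the PR.

```typescript
type Labels = Record<string, string>;

// Equality-based (matchLabels-style) selection: every selector pair must be
// present on the pod with the same value.
function selectorMatches(selector: Labels, podLabels: Labels): boolean {
  return Object.entries(selector).every(([key, value]) => podLabels[key] === value);
}

const NAME = "app.kubernetes.io/name";
const INSTANCE = "app.kubernetes.io/instance";

// Hypothetical values: chart name "studio", release "prod".
const serviceSelector: Labels = { [NAME]: "studio", [INSTANCE]: "prod" };
const workerSelector: Labels = { [NAME]: "studio-worker", [INSTANCE]: "prod" };

const mainPod: Labels = { [NAME]: "studio", [INSTANCE]: "prod" };
const workerPod: Labels = { [NAME]: "studio-worker", [INSTANCE]: "prod" };

// The Service (and main Deployment) select main pods only.
const serviceSeesMain = selectorMatches(serviceSelector, mainPod);
const serviceSeesWorker = selectorMatches(serviceSelector, workerPod);

// The worker Deployment selects worker pods only.
const workerSeesWorker = selectorMatches(workerSelector, workerPod);
const workerSeesMain = selectorMatches(workerSelector, mainPod);
```

Because the name label differs, the two selectors are disjoint even though both pods share the same instance label, which is why workers never become Service endpoints.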

Safety

worker.enabled=false (default) renders byte-identical to the current chart (verified with helm template diff) — zero impact until you opt in.
Distinct -worker name means no change to the main Deployment's immutable selector and no Service change → no recreate.
dispatchRole is an env-only change (rolling, no recreate).
Requires at least one worker (or all) pod; otherwise runs never dispatch.

Rollout

worker.enabled=true → workers come up; main still all, nothing breaks.
Confirm workers dequeue + a real decopilot run executes.
dispatchRole=api → main pods stop running the agent loop.
Rollback = unset both.
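The rollout order above preserves one invariant: at every stage at least one pod is dequeuing runs. A minimal sketch of that check follows; the Stage shape, the canDispatch helper, and the replica counts are hypothetical and exist only to make the invariant concrete.

```typescript
type DispatchRole = "all" | "worker" | "api";

// Hypothetical model of one rollout stage; field names are illustrative.
interface Stage {
  name: string;
  mainRole: DispatchRole;
  mainReplicas: number;
  workerEnabled: boolean;
  workerReplicas: number;
}

// Runs dispatch only if some pod dequeues: a main pod whose role is not api,
// or a worker pod when the worker Deployment is enabled.
function canDispatch(stage: Stage): boolean {
  const mainDequeues = stage.mainRole !== "api" && stage.mainReplicas > 0;
  const workersDequeue = stage.workerEnabled && stage.workerReplicas > 0;
  return mainDequeues || workersDequeue;
}

// Replica counts are made up for the example.
const rollout: Stage[] = [
  { name: "baseline", mainRole: "all", mainReplicas: 3, workerEnabled: false, workerReplicas: 0 },
  { name: "worker.enabled=true", mainRole: "all", mainReplicas: 3, workerEnabled: true, workerReplicas: 2 },
  { name: "dispatchRole=api", mainRole: "api", mainReplicas: 3, workerEnabled: true, workerReplicas: 2 },
];

// The misconfiguration the Safety section warns about: api-only, no workers.
const broken: Stage = {
  name: "api without workers",
  mainRole: "api",
  mainReplicas: 3,
  workerEnabled: false,
  workerReplicas: 0,
};

const everyStageDispatches = rollout.every(canDispatch);
const brokenDispatches = canDispatch(broken);
```

Enabling workers before switching the main deployment to api is what keeps every intermediate stage on the safe side of this check.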

Validation

tsc clean · bun test dispatch-queue automations 31/0 · helm lint clean · helm template worker-off == baseline (byte-identical) · worker-on renders main=api / worker=worker with disjoint selectors + worker HPA.

Why

Profiling showed that studio's CPU under load goes to streaming throughput (NATS/socket I/O + AI-SDK parsing), not to GC, idle polling, or Ajv. Scaling studio replicas to absorb that load also multiplies the heavy per-pod footprint (DB pool, DBOS executor + queue polling). This change lets the agent-loop workers scale on CPU independently of the request tier.

…eues

Lets one image run as api-only or worker-only so request-serving and agent-loop execution scale as independent Deployments off the same DB/auth.

- MESH_DISPATCH_ROLE=all (default, unchanged) | worker (dequeue only the agent/automation run queues) | api (dequeue nothing; serve HTTP + enqueue).
- Wires DBOSConfig.listenQueues from the role. 'all' omits it → identical to today, so it's opt-in and safe.
- Queue names moved to a side-effect-free dispatch-queue/queue-names module so index.ts can read them before DBOS.setConfig (which must precede workflow registration); existing consumers keep their import paths via re-export.

Scheduled (cron) workflows + enqueueing run on every pod and stay exactly-once via DBOS's row-locked schedule, so an api pod can fire a cron a worker runs. Requires >=1 worker/all pod or runs never dispatch.

One release / one Argo app: set worker.enabled=true to render a second Deployment (<fullname>-worker, MESH_DISPATCH_ROLE=worker) that runs only the agent/automation run queues, with its own CPU HPA. Workers carry a distinct app.kubernetes.io/name (<name>-worker), so the main Deployment selector AND the Service selector don't match them — workers stay OFF the load balancer (no HTTP ever routed to a busy worker) and receive work only by pulling the DBOS queue. /stream is served by the api pods (NATS tail); kubelet health probes hit pods directly, so workers stay live off-Service. dispatchRole sets the MAIN deployment's role (default ""=all, unchanged; set "api" once workers exist). Verified: worker.enabled=false renders byte-identical to before (zero impact); helm lint clean.

pedrofrxncx added 2 commits June 15, 2026 21:16

pedrofrxncx wants to merge 2 commits into main from feat/dbos-queue-role-split
