Worka is a security-first orchestration system for AI agents.
The point of Worka is not that one agent can write GitHub issues, summarize Slack threads, or call a tool. Those are examples of work units. The real value is the control plane around them: Worka turns a user request into a governed, observable, policy-gated workflow where agents run as short-lived containers, every important state transition is recorded, and privileged side effects pass through a central Gateway / Policy Layer.
Worka runs agent orchestration as a governed workflow:
- A user request enters through Slack.
- The request is converted into durable run state.
- A planning agent proposes a structured plan.
- The Gateway / Policy Layer validates and stores the plan.
- Step-scoped agents run in isolated containers.
- Tool access is mediated through policy-gated Gateway routes.
- Results, gates, failures, and audit events are recorded in Supabase.
- Memory and eval layers feed improvement without becoming hidden authority.
The architecture is designed so agents can become more capable without becoming more trusted.
```mermaid
flowchart LR
User["User"]
Slack["Slack ingress"]
Gateway["Gateway / Policy Layer"]
Job["K8s agent job container"]
Return[" "]
User -->|"input"| Slack
Slack -->|"new run + context envelope"| Gateway
Gateway -->|"approved work"| Job
Job -->|"agent output"| Gateway
Gateway --> Return
Return -->|"cleaned, validated output"| User
style Return fill:transparent,stroke:transparent,color:transparent;
```
Worka keeps authority centralized and execution isolated. The most important rule is that agents are workers, not trusted decision makers. They can propose work, request tools, and return structured outputs, but they do not decide what is allowed, where data is written, or which external side effects happen.
The Gateway / Policy Layer is the trusted control plane. It validates JWTs, checks policy, owns state transitions, dispatches containers, mediates tool calls, posts Slack messages, logs policy decisions, and records failures. Every agent request that matters goes through Gateway.
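Owning state transitions means every status change is checked against rules that agents cannot edit. A minimal sketch of that idea, assuming a hypothetical set of run statuses (the actual `runs` status values are not listed in this document):

```typescript
// Hypothetical run statuses; the real values in the runs table may differ.
type RunStatus =
  | "received"
  | "planning"
  | "planned"
  | "executing"
  | "completed"
  | "failed";

// Allowed transitions live in Gateway code. Agents never write status directly.
const allowedTransitions: Record<RunStatus, readonly RunStatus[]> = {
  received: ["planning", "failed"],
  planning: ["planned", "failed"],
  planned: ["executing", "failed"],
  executing: ["completed", "failed"],
  completed: [],
  failed: [],
};

interface TransitionResult {
  ok: boolean;
  from: RunStatus;
  to: RunStatus;
  reason?: string;
}

// Returns a decision instead of throwing, so the caller can record a denial.
function transitionRun(from: RunStatus, to: RunStatus): TransitionResult {
  if (allowedTransitions[from].includes(to)) {
    return { ok: true, from, to };
  }
  return { ok: false, from, to, reason: `illegal transition ${from} -> ${to}` };
}
```

Terminal states have no outgoing transitions, so a late or replayed agent result cannot reopen a finished run; a denied transition could be recorded like any other policy decision.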
Agents never touch the policy file or their permission model. policyMiddleware loads the Gateway-owned policy manifest, checks the authenticated caller against agent_permissions, verifies skill allowlists when a tool route is involved, and writes allow/deny decisions to gates. Agents cannot grant themselves tools, widen their scope, change policy, or bypass Gateway by claiming a different identity in a request body.
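The decision described above can be sketched as a pure function. This is an illustration, not `policyMiddleware` itself: the manifest shape, the `AuthenticatedCaller` type, the gate row fields, and the skill names are assumptions. The property it shows is that the caller's identity arrives from the verified token, so nothing in a request body can change which permissions apply.

```typescript
// Hypothetical manifest shape: agent_permissions maps agent name -> routes and skills.
interface AgentPermission {
  routes: readonly string[];
  skills: readonly string[];
}
interface PolicyManifest {
  version: string;
  agent_permissions: Record<string, AgentPermission>;
}

// Built from verified token claims by Gateway, never from the request body.
interface AuthenticatedCaller {
  agentName: string;
  runId: string;
  stepId?: string;
}

interface GateRow {
  action: string;
  status: "allow" | "deny";
  subject: string;
  step_id: string | null;
  reason: string;
}

function decide(
  manifest: PolicyManifest,
  caller: AuthenticatedCaller,
  route: string,
  skill: string | undefined,
  gates: GateRow[],
): boolean {
  // Own-property lookup so names like "constructor" cannot hit Object.prototype.
  const perms = Object.prototype.hasOwnProperty.call(manifest.agent_permissions, caller.agentName)
    ? manifest.agent_permissions[caller.agentName]
    : undefined;

  let status: "allow" | "deny" = "deny";
  let reason: string;
  if (perms === undefined) {
    reason = "unknown agent";
  } else if (!perms.routes.includes(route)) {
    reason = `route ${route} not permitted`;
  } else if (skill !== undefined && !perms.skills.includes(skill)) {
    reason = `skill ${skill} not in allowlist`;
  } else {
    status = "allow";
    reason = `allowed by policy ${manifest.version}`;
  }

  // Every decision, allow or deny, is appended to the audit log.
  gates.push({
    action: skill !== undefined ? `${route}:${skill}` : route,
    status,
    subject: caller.agentName,
    step_id: caller.stepId ?? null,
    reason,
  });
  return status === "allow";
}
```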
Plans are treated as proposals until Gateway accepts them. Hari, the planning agent, writes a proposed plan and sends it back to Gateway. Gateway validates that plan against deterministic schema and normalization rules, checks the caller's identity, rewrites temporary step_ref dependencies into real persisted step IDs, and only then stores trusted plan and step records in Supabase. A malformed or unauthorized plan is rejected instead of becoming executable state.
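The step_ref rewrite can be sketched as two passes over the proposed steps. This assumes hypothetical `ProposedStep` / `PersistedStep` shapes and an injected `newId` function standing in for database-generated IDs. The rule that a step may only depend on earlier steps is an assumed normalization rule; it keeps the plan acyclic without a separate cycle check.

```typescript
// Hypothetical shape of a step as proposed by the planning agent.
interface ProposedStep {
  step_ref: string; // temporary reference chosen by the agent
  agent_name: string;
  instruction: string;
  depends_on: string[]; // other step_refs
}

// Hypothetical shape of the trusted record Gateway persists.
interface PersistedStep {
  id: string; // real ID assigned by Gateway
  sequence: number;
  agent_name: string;
  instruction: string;
  depends_on: string[]; // real step IDs
}

function acceptPlan(
  steps: readonly ProposedStep[],
  knownAgents: ReadonlySet<string>,
  newId: () => string,
): PersistedStep[] {
  if (steps.length === 0) throw new Error("plan has no steps");

  // First pass: validate refs and agents, and assign a real ID to every step_ref.
  const byRef = new Map<string, { id: string; index: number }>();
  steps.forEach((s, index) => {
    if (byRef.has(s.step_ref)) throw new Error(`duplicate step_ref ${s.step_ref}`);
    if (!knownAgents.has(s.agent_name)) throw new Error(`unknown agent ${s.agent_name}`);
    byRef.set(s.step_ref, { id: newId(), index });
  });

  // Second pass: rewrite dependencies from step_refs to real IDs.
  return steps.map((s, index) => ({
    id: byRef.get(s.step_ref)!.id,
    sequence: index,
    agent_name: s.agent_name,
    instruction: s.instruction,
    depends_on: s.depends_on.map((ref) => {
      const dep = byRef.get(ref);
      if (dep === undefined) throw new Error(`unknown dependency ${ref}`);
      if (dep.index >= index) throw new Error(`step ${s.step_ref} depends on non-earlier step ${ref}`);
      return dep.id;
    }),
  }));
}
```

Any thrown error means nothing is stored, which matches the rule that a malformed plan never becomes executable state.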
Agents also have no direct access to external APIs or the user's machine. Ernest may need GitHub, Daneel may need Slack, and future agents may need other tools, but the security model is mediation: the agent asks Gateway, Gateway checks policy and state, and then Gateway performs or proxies the allowed action. Slack delivery, GitHub MCP calls, Supabase writes, memory persistence, and failure reporting are all controlled outside the agent container.
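Mediation can be pictured as Gateway holding the only handle to each external action. In this hypothetical sketch, a tool request names a step and a skill; Gateway checks the step's persisted allowed skills and status rather than anything the agent claims, runs a Gateway-side handler, and appends a `skill_invocations`-style record either way. `mediate`, `SkillHandler`, and the record shape are illustrative, and lock status is omitted.

```typescript
// What an agent sends: a step and a skill name. No credentials, no target URL.
interface ToolRequest {
  stepId: string;
  skill: string;
  input: Record<string, unknown>;
}

// Trusted step record as persisted by Gateway (hypothetical subset of columns).
interface StepRecord {
  id: string;
  allowed_skills: readonly string[];
  status: "running" | "completed" | "failed";
}

interface SkillInvocation {
  step_id: string;
  skill: string;
  policy_status: "allow" | "deny";
  input: Record<string, unknown>;
  output: unknown;
}

// Handlers run inside Gateway and hold the real API credentials.
type SkillHandler = (input: Record<string, unknown>) => unknown;

function mediate(
  req: ToolRequest,
  steps: ReadonlyMap<string, StepRecord>,
  handlers: ReadonlyMap<string, SkillHandler>,
  log: SkillInvocation[],
): unknown {
  const step = steps.get(req.stepId);
  const handler = handlers.get(req.skill);

  let output: unknown;
  let policy: "allow" | "deny";
  if (step && step.status === "running" && step.allowed_skills.includes(req.skill) && handler) {
    output = handler(req.input);
    policy = "allow";
  } else {
    output = { error: `skill ${req.skill} denied for step ${req.stepId}` };
    policy = "deny";
  }

  // Allowed and denied calls are both recorded.
  log.push({ step_id: req.stepId, skill: req.skill, policy_status: policy, input: req.input, output });
  return output;
}
```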
Hari, Ernest, Daneel, and future agents run as short-lived Kubernetes Jobs. Each container receives only the run or step context it needs, does one bounded piece of work, returns structured output, and exits.
Containerization gives Worka practical safety properties:
- Ephemeral execution: there is no long-lived agent process accumulating authority or state.
- Smaller blast radius: a compromised or broken agent is confined to one run or one step.
- Narrow context: step containers receive only their assigned run_id, step_id, objective, token, and allowed secrets, not broad system access.
- No direct user-machine access: agents run in the cluster/container runtime rather than on the operator's laptop.
- Cleaner cleanup: the runtime can monitor, fail, and delete Kubernetes Jobs.
- Auditable boundaries: container start, completion, failure, and timeout are visible control-plane events.
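One way to give a step container only its assigned context is to build the Job manifest from the persisted step record. The sketch below produces a `batch/v1` Job object. `backoffLimit`, `activeDeadlineSeconds`, `ttlSecondsAfterFinished`, `restartPolicy`, `automountServiceAccountToken`, and `secretKeyRef` are real Kubernetes fields, but the env var names, label keys, Secret layout, and values are assumptions, not Worka's actual job template.

```typescript
// Hypothetical inputs for one step container.
interface StepJobInput {
  runId: string;
  stepId: string;
  agentName: string;
  image: string;
  objective: string;
  stepTokenSecret: string; // name of a Secret holding the step-scoped token (assumption)
  timeoutSeconds: number;
}

function buildStepJob(input: StepJobInput) {
  return {
    apiVersion: "batch/v1",
    kind: "Job",
    metadata: {
      name: `agent-${input.agentName}-${input.stepId}`.toLowerCase(),
      labels: { "worka/run-id": input.runId, "worka/step-id": input.stepId },
    },
    spec: {
      backoffLimit: 0, // a failed step is reported, not silently retried
      activeDeadlineSeconds: input.timeoutSeconds, // cluster-enforced wall-clock limit
      ttlSecondsAfterFinished: 300, // lets the cluster clean up finished Jobs
      template: {
        spec: {
          restartPolicy: "Never",
          automountServiceAccountToken: false, // no Kubernetes API credentials in the agent
          containers: [
            {
              name: "agent",
              image: input.image,
              // Only the assigned step context is injected; nothing else.
              env: [
                { name: "RUN_ID", value: input.runId },
                { name: "STEP_ID", value: input.stepId },
                { name: "OBJECTIVE", value: input.objective },
                {
                  name: "STEP_TOKEN",
                  valueFrom: { secretKeyRef: { name: input.stepTokenSecret, key: "token" } },
                },
              ],
            },
          ],
        },
      },
    },
  };
}
```

With `backoffLimit: 0`, a crash becomes one visible failure event instead of a series of retries, and `activeDeadlineSeconds` puts a hard upper bound on how long an agent can run.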
The container is where untrusted agent reasoning happens. The Gateway / Policy Layer is where trusted decisions happen.
Supabase is the durable source of truth and audit trail. Worka records the workflow as relational state:
- runs: one row per user request. Tracks lifecycle status, source, requested text, requester, Slack channel/user metadata, timestamps, and delivery context.
- plans: Hari's proposed plan after Gateway validation. Stores created_by, status, plan_hash, raw plan JSON, and timestamps.
- steps: executable units produced from an accepted plan. Stores the assigned agent_name, sequence, instruction JSON, allowed skills, dependencies, status, results, artifacts, and start/completion timestamps.
- gates: policy audit log. Each allow or deny decision records the action, status, subject, step context, and reason.
- burns: failure records for container/job failures or other burn-worthy events, including run, step, agent, and reason.
- evals: evaluation results for runs or skills, used to compare behavior against deterministic expectations.
- skill_invocations: audit records for privileged tool calls routed through Gateway, including policy and lock status plus input/output payloads.
- policy_manifests: immutable compiled policy records for versioned governance.
- artifacts/worka_artifacts: outputs and evidence linked to a run or skill call.
This gives operators a clear answer to what happened, who did it, which policy allowed it, what failed, and what state changed.
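As an illustration of answering those questions from the tables, the sketch below merges one run's `gates` and `burns` rows into a single time-ordered view. The row types, and in particular the `run_id` and `created_at` columns on them, are assumptions.

```typescript
// Hypothetical row shapes for the gates and burns tables.
interface Gate {
  run_id: string;
  step_id: string | null;
  action: string;
  status: "allow" | "deny";
  subject: string;
  reason: string;
  created_at: string; // ISO-8601 UTC, fixed format
}
interface Burn {
  run_id: string;
  step_id: string | null;
  agent_name: string;
  reason: string;
  created_at: string;
}

interface TimelineEntry {
  at: string;
  kind: "gate" | "burn";
  who: string;
  what: string;
}

function runTimeline(runId: string, gates: readonly Gate[], burns: readonly Burn[]): TimelineEntry[] {
  const entries: TimelineEntry[] = [
    ...gates
      .filter((g) => g.run_id === runId)
      .map((g) => ({ at: g.created_at, kind: "gate" as const, who: g.subject, what: `${g.status} ${g.action}: ${g.reason}` })),
    ...burns
      .filter((b) => b.run_id === runId)
      .map((b) => ({ at: b.created_at, kind: "burn" as const, who: b.agent_name, what: `burn: ${b.reason}` })),
  ];
  // Fixed-format ISO-8601 UTC timestamps sort chronologically as strings.
  return entries.sort((x, y) => (x.at < y.at ? -1 : x.at > y.at ? 1 : 0));
}
```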
Worka separates improvement signals from control authority.
mem0 stores best-effort per-agent episodic memory after accepted useful work. Agents do not call mem0 directly; the Gateway writes memory into agent-scoped namespaces, and memory failures do not block a run.
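The best-effort rule can be sketched as a write that reports errors but never rethrows. `MemoryStore` is a hypothetical stand-in, not the mem0 SDK, and the `agent:<name>` namespace format is an assumption.

```typescript
// Hypothetical minimal memory backend interface (NOT the mem0 SDK).
interface MemoryStore {
  add(namespace: string, text: string): Promise<void>;
}

// Agent-scoped namespace, so one agent's memory never mixes with another's.
function memoryNamespace(agentName: string): string {
  if (!/^[a-z][a-z0-9_-]*$/.test(agentName)) throw new Error(`invalid agent name ${agentName}`);
  return `agent:${agentName}`;
}

// Called by Gateway after a step's output has been accepted.
// Any failure is reported through onError and swallowed, so memory cannot block a run.
async function rememberAcceptedWork(
  store: MemoryStore,
  agentName: string,
  summary: string,
  onError: (err: unknown) => void,
): Promise<boolean> {
  try {
    await store.add(memoryNamespace(agentName), summary);
    return true;
  } catch (err) {
    onError(err);
    return false;
  }
}
```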
The eval layer checks that orchestration behavior remains deterministic: policy decisions, schema validation, step completion, Slack delivery, job monitoring, and agent output normalization.
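A deterministic eval can be as simple as a fixed table of inputs with exact expected outputs. The sketch below assumes a hypothetical `normalizeStatus` step that fails closed: any agent-reported status other than `completed` (after trimming and lowercasing) is treated as `failed`. `runEvals` and both functions are illustrative, not Worka's eval harness.

```typescript
// A deterministic eval case: fixed input, exact expected output.
interface EvalCase<I, O> {
  name: string;
  input: I;
  expected: O;
}
interface EvalResult {
  name: string;
  passed: boolean;
  actual: string;
}

// Runs every case and compares serialized outputs, so results repeat exactly run to run.
function runEvals<I, O>(cases: readonly EvalCase<I, O>[], subject: (input: I) => O): EvalResult[] {
  return cases.map((c) => {
    const actual = JSON.stringify(subject(c.input));
    return { name: c.name, passed: actual === JSON.stringify(c.expected), actual };
  });
}

// Hypothetical normalizer under test. Unknown statuses fail closed.
type StepStatus = "completed" | "failed";
function normalizeStatus(raw: string): StepStatus {
  return raw.trim().toLowerCase() === "completed" ? "completed" : "failed";
}
```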
Langfuse provides tracing for agent containers. services/agents/tracing.ts initializes OpenTelemetry with the Langfuse span processor, instruments the Anthropic SDK, and flushes immediately so short-lived containers export traces before they exit. Hari, Ernest, and Daneel start active Langfuse observations with run_id / step_id metadata, while recordAgentInput and recordAgentOutput attach sanitized inputs and outputs to the active observation.
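This document does not define what "sanitized" covers. Assuming it means redacting credential-like keys and truncating oversized strings before anything is attached to an observation, a helper like the hypothetical `sanitizeForTrace` below could be applied to inputs and outputs before they are recorded. It does not use any Langfuse API, and the key pattern and length limit are placeholders.

```typescript
// Keys whose values should never appear in a trace (assumed list).
const SENSITIVE_KEY = /token|secret|password|authorization|api[_-]?key/i;
const MAX_STRING = 2000;

// Recursively copies a value, redacting sensitive keys and truncating long strings.
function sanitizeForTrace(value: unknown): unknown {
  if (typeof value === "string") {
    return value.length > MAX_STRING ? `${value.slice(0, MAX_STRING)}...[truncated]` : value;
  }
  if (Array.isArray(value)) return value.map((item) => sanitizeForTrace(item));
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, v] of Object.entries(value)) {
      out[key] = SENSITIVE_KEY.test(key) ? "[redacted]" : sanitizeForTrace(v);
    }
    return out;
  }
  // Numbers, booleans, null, and undefined pass through unchanged.
  return value;
}
```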
Together, Supabase logs, evals, mem0, and Langfuse form the feedback loop: observe runs, evaluate behavior, preserve useful agent context, and tighten policy/prompts/schemas without giving agents hidden authority.
- Architecture - detailed system flow, diagrams, legend, security posture, evals, and Langfuse notes
- Security Model - auth, policy, secrets, and failure boundaries
- Local Development - local setup and useful commands
- Evals - deterministic test and evaluation expectations
- Memory - mem0 namespaces and persistence rules
- Demo - operator-facing walkthrough