For long-running agentic workflows where "please follow this process" is not enough: define the path once, start the run, and let enforced gates keep the agent on track.
Agents are now capable enough for long, multi-step work, but the main failure mode has shifted from task ability to process drift: skipping approval, forgetting recovery rules, claiming evidence that was not produced, or continuing from stale context.
Prompts and skills can describe the process, but they cannot enforce it. aharness turns the process into a runtime: states define what Codex may do next, typed submissions prove what happened, and transitions only occur through validated exits.
aharness plugs into the Codex setup you already use. It runs with your
AGENTS.md, skills, MCP servers, permissions, and local tools instead of asking
you to adopt a separate agent ecosystem.
A finite state machine (FSM) is a graph of named states and allowed transitions. For Codex workflows, FSMs sit between two bad extremes: raw code is flexible but hard to constrain, while YAML or JSON is easy to validate but too rigid for real workflows. aharness defines FSMs in TypeScript so workflows stay enforceable while still using typed data, guards, reducers, effects, the npm package ecosystem, and ordinary code-level composition.
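To make the "graph of named states and allowed transitions" idea concrete, here is a minimal hand-rolled sketch. It is not the @aharness/core API; the state names and the `transitions`, `canTransition`, and `step` helpers are hypothetical and only illustrate how a transition table can reject a skipped step.

```typescript
// Illustrative only: a hand-rolled FSM, not the @aharness/core API.
type State = 'plan' | 'approval' | 'implement' | 'done';

// Each state lists the states it is allowed to move to.
const transitions: Record<State, readonly State[]> = {
  plan: ['approval'],
  approval: ['plan', 'implement'], // approval can send the plan back for rework
  implement: ['done'],
  done: [], // terminal state: no exits
};

function canTransition(from: State, to: State): boolean {
  return transitions[from].includes(to);
}

// Move to the next state, or throw if the edge is not in the graph.
function step(current: State, next: State): State {
  if (!canTransition(current, next)) {
    throw new Error(`Transition ${current} -> ${next} is not allowed`);
  }
  return next;
}
```

In this sketch, `step('plan', 'implement')` throws because the only exit from `plan` leads to `approval`, which is the kind of skipped gate an enforced workflow is meant to prevent.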
The bet is that useful workflows are reusable software. They should be authored, reviewed, versioned, composed from smaller FSMs, and published as npm packages instead of copied around as prompts.
| Vanilla coding agents | With aharness |
|---|---|
| Put the process in instructions or a skill and hope it sticks | Encode the process as states the agent cannot skip |
| "Don't continue until I approve" | Approval is a real workflow gate |
| "Tell me what you did when you're done" | Evidence is typed, validated, and saved into workflow data |
| "If this fails, try to fix it, but don't loop forever" | Repair paths, retry limits, and failure outcomes are explicit |
| Long runs survive by compaction, summaries, or subagents | State, context clearing, and subprocess boundaries are deliberate |
| "Use the stronger model for the hard part" | Model and effort are selected per state |
| "Can I inspect what happened?" | Runs write state history, events, artifacts, and browser views |
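As one illustration of the "repair paths, retry limits, and failure outcomes are explicit" row, the sketch below shows a bounded repair loop as plain TypeScript. The limit of three attempts, the `RepairOutcome` type, and `nextAfterFailedCheck` are assumptions made for the example and are not taken from aharness.

```typescript
// Illustrative only: an explicit retry limit with a named failure outcome.
type RepairOutcome =
  | { kind: 'retry'; attempt: number }
  | { kind: 'failed'; reason: string };

// Hypothetical limit chosen for this example.
const MAX_REPAIR_ATTEMPTS = 3;

// Decide what happens after a verification check fails.
// attemptsSoFar counts repair attempts that have already been made.
function nextAfterFailedCheck(attemptsSoFar: number): RepairOutcome {
  const attempt = attemptsSoFar + 1;
  if (attempt > MAX_REPAIR_ATTEMPTS) {
    return {
      kind: 'failed',
      reason: `gave up after ${MAX_REPAIR_ATTEMPTS} repair attempts`,
    };
  }
  return { kind: 'retry', attempt };
}
```

The point of the sketch is that "don't loop forever" becomes a value the workflow can branch on instead of an instruction the agent has to remember.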
Prerequisites:
- Node.js >=20
- Codex CLI on PATH; see packages/core/SUPPORTED_CODEX.md for the current compatibility gate
npm install -g @aharness/core
The global install puts the aharness command on PATH. Scaffolded projects
still get local authoring dependencies so editors and tsc can typecheck FSM
source.
Start with Codeflow to turn a large implementation roadmap into reviewed, verified, committed slices. It is the packaged aharness workflow for changes that are too broad or risky for one implementation plan.
Install the @aharness/codeflow
workflow package through aharness:
aharness install @aharness/codeflow
Then run its recipe-driven development command against an implementation roadmap in your repository:
aharness run recipe-driven-development --roadmap-path docs/plans/my-roadmap.md
The Codeflow package also ships process skills for preparing the roadmap:
writing-ideas, grill-me, writing-specs, reviewing-specs, and
writing-implementation-roadmaps. See the
Alfredvc/codeflow repository for docs
and more information.
- Writing Workflows
- Installing FSM Packages
- Try The Demo
- How It Works
- Common Commands
- Packages
- Documentation
- FAQ
Author workflows with the bundled aharness FSM authoring skill, not from a
blank TypeScript file. The skill guides Codex through state design, typed exits,
owner choices, recovery paths, verification, and current @aharness/core API
rules.
Install the authoring skill with npx skills:
npx skills add Alfredvc/aharness
Then ask Codex to use it:
Use $aharness-fsm-authoring to design and author an aharness FSM for this workflow.
The skill lives at
skills/aharness-fsm-authoring. Under
the hood, generated workflows are TypeScript files built with createFsm,
fsm.state, fsm.submit, fsm.choice, and fsm.final; see
docs/authoring.md when you need the API details.
A small FSM looks like this:
import { createFsm } from '@aharness/core';
interface Data {
plan: string | null;
}
const fsm = createFsm<Data>();
export default fsm.machine({
id: 'tiny-approval-workflow',
initial: 'plan',
data: () => ({ plan: null }),
states: {
plan: fsm.state({
prompt:
'Inspect the requested work, write a short plan, ' +
'then submit it as { "plan": "..." }. Do not edit files yet.',
on: {
submitPlan: fsm.submit<{ plan: string }>({
to: 'ownerApproval',
reduce: (draft, payload) => {
draft.plan = payload.plan;
},
}),
},
}),
ownerApproval: fsm.choice({
question: (data) => `Approve this plan before continuing?\n\n${data.plan}`,
options: [{ label: 'Approve', to: 'done' }],
}),
done: fsm.final({ outcome: 'success' }),
},
});
Published workflows are normal npm packages with aharness command metadata. Install them through the global CLI:
aharness install workflow-package
aharness list
aharness verify build
aharness run build --project ./app
aharness install <source> accepts any package spec that npm accepts: registry
packages, versions or dist-tags, GitHub repos, git URLs, local directories, and
tarballs.
aharness install workflow-package@latest
aharness install github:owner/workflows
aharness install git+https://github.com/owner/workflows.git
aharness install ../workflows
aharness install ./workflows-1.0.0.tgz
During install, aharness lets npm materialize the package in its managed npm project, then validates package command metadata, package-relative assets, bundled skill declarations, and every declared FSM before writing trusted command records. Installs may run npm lifecycle scripts, so install packages from sources you trust. Unverified commands are not runnable.
Installed commands can be run or verified by fully qualified command identity, or by bare command name when there is no collision. Package names by themselves are not accepted verification targets:
aharness run workflow-package/build
aharness run build
aharness verify workflow-package/build
aharness verify build
Remove a package by package identity, not by command name:
aharness uninstall workflow-package
Re-run aharness install <same-source> to refresh a package after a new npm
version, Git ref, tarball, or local snapshot is available.
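The command resolution rule described above (fully qualified identity always works, a bare name works only when it is unambiguous, and a package name alone is rejected) can be sketched as follows. This is an illustration of the rule, not aharness's actual resolver; `InstalledCommand`, `Resolution`, and `resolveCommand` are hypothetical names.

```typescript
// Illustrative only: resolving a run/verify target against installed commands.
interface InstalledCommand {
  packageName: string;
  commandName: string;
}

type Resolution =
  | { ok: true; command: InstalledCommand }
  | { ok: false; error: string };

function resolveCommand(
  target: string,
  installed: readonly InstalledCommand[],
): Resolution {
  // 1. A fully qualified "package/command" identity matches exactly.
  const qualified = installed.find(
    (c) => `${c.packageName}/${c.commandName}` === target,
  );
  if (qualified) return { ok: true, command: qualified };

  // 2. A bare command name is accepted only when exactly one package declares it.
  const [first, ...rest] = installed.filter((c) => c.commandName === target);
  if (first === undefined) {
    // Package names by themselves land here: they are not command names.
    return { ok: false, error: `no installed command matches "${target}"` };
  }
  if (rest.length > 0) {
    return {
      ok: false,
      error: `"${target}" is ambiguous; use a fully qualified name`,
    };
  }
  return { ok: true, command: first };
}
```

Under this sketch, `build` resolves when only `workflow-package` declares it, and stops resolving as soon as a second installed package also declares a `build` command.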
After installing the global CLI, clone this repository so the demo FSM and fixture files are available:
git clone https://github.com/Alfredvc/aharness.git
cd aharness
aharness verify examples/coding-smoke.fsm.ts
aharness run examples/coding-smoke.fsm.ts
The demo files are:
- examples/coding-smoke.fsm.ts - the FSM.
- examples/coding-smoke/fixture - the tiny broken TypeScript fixture the agent repairs.
- examples/coding-smoke/README.md - what to watch during the run.
After that, use examples/DEMOS.md as a catalog of focused
mechanism demos for awaits, approvals, hooks, composition, skills, branching,
and final artifacts.
flowchart LR
Codex["Codex CLI<br/>agent worker"]
Aharness["aharness CLI<br/>FSM actor + verifier"]
Browser["Loopback browser UI<br/>input + approvals + graph"]
Runs[".aharness/runs/<runId><br/>events.jsonl + reports + artifacts"]
Aharness <--> Codex
Aharness <--> Browser
Aharness --> Runs
An aharness run has three jobs:
- Verify the workflow before Codex starts. Invalid FSMs fail early, before the model can begin work.
- Keep Codex inside the active state. aharness tells Codex the current state, valid exits, and required submit schema. Codex does the work; aharness validates submitted evidence and decides the next state.
- Record the run. Every run writes canonical artifacts under
.aharness/runs/<runId>/, including the event log, state history, final artifacts, and data used by the browser view.
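The second job in the list above, validating submitted evidence before deciding the next state, can be sketched in plain TypeScript. The hand-written type guard stands in for whatever schema validation aharness actually performs; `isPlanSubmission`, `StepResult`, and `handleSubmit` are hypothetical, and the state names are borrowed from the small FSM shown earlier in this README.

```typescript
// Illustrative only: validate a submission, then choose the next state.
interface PlanSubmission {
  plan: string;
}

// Stand-in for schema validation: accepts only { plan: <non-empty string> }.
function isPlanSubmission(value: unknown): value is PlanSubmission {
  if (typeof value !== 'object' || value === null) return false;
  const plan = (value as { plan?: unknown }).plan;
  return typeof plan === 'string' && plan.trim().length > 0;
}

type StepResult =
  | { accepted: true; nextState: 'ownerApproval'; plan: string }
  | { accepted: false; stayIn: 'plan'; error: string };

// The agent proposes a payload; the harness, not the agent, picks the next state.
function handleSubmit(payload: unknown): StepResult {
  if (!isPlanSubmission(payload)) {
    return {
      accepted: false,
      stayIn: 'plan',
      error: 'expected { "plan": "<non-empty string>" }',
    };
  }
  return { accepted: true, nextState: 'ownerApproval', plan: payload.plan };
}
```

An invalid payload leaves the workflow in the `plan` state with an error to report back, so a claim of "done" without the required evidence does not advance the run.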
The browser UI is the live operator surface. It shows the current state, graph,
compact transcript, approvals, and owner-input controls. Use --no-open when
you want aharness to serve and print the URL without opening a browser window.
Recorded inspection uses aharness view [run-id]. It reopens a completed run
from .aharness/runs without starting Codex or resuming the workflow. Omit the
run id to inspect the newest recorded run.
Run directories are sensitive. They can contain raw owner input, browser
replies, tool arguments and results, command output, file diffs, approvals,
token usage, and workflow context snapshots. Treat .aharness/runs as private
runtime evidence, not as a sanitized transcript.
aharness init --dir <path>
aharness verify <file.fsm.ts|command>
aharness visualize <file.fsm.ts|command>
aharness run <file.fsm.ts|command> --help
aharness run [--ask|--yolo] [--no-open] <file.fsm.ts|command> [--<input-flag> <value>]...
aharness view [run-id]
aharness doctor
aharness install <source>
When the standard CI environment variable is set to a truthy value,
aharness verify skips Codex-backed model catalog validation so structural FSM
verification can run in environments without a Codex app-server. All other
static verifier checks still run.
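The CI behavior above amounts to a check on one environment variable. A minimal sketch follows; the README does not define which values count as truthy, so the rule used here (non-empty and not `0` or `false`) is an assumption, and `isTruthyEnv` and `shouldSkipModelCatalogValidation` are hypothetical names.

```typescript
// Illustrative only: deciding whether to skip Codex-backed model catalog validation.
// Assumption: "truthy" means set, non-empty, and not "0" or "false" (case-insensitive).
function isTruthyEnv(value: string | undefined): boolean {
  if (value === undefined) return false;
  const normalized = value.trim().toLowerCase();
  return normalized !== '' && normalized !== '0' && normalized !== 'false';
}

// Takes the environment as a parameter so the check is easy to test;
// a CLI would typically pass process.env here.
function shouldSkipModelCatalogValidation(
  env: Record<string, string | undefined>,
): boolean {
  return isTruthyEnv(env['CI']);
}
```

Only the model catalog check is gated in this sketch; every other static verifier check would run regardless of the result.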
See docs/reference.md for the full CLI, authoring API,
state options, hooks, installable package commands, completions, default Codex
auto-review behavior, --ask, --yolo, and --no-open. See
docs/advanced-runtime-surfaces.md for
programmatic live runs and Codex sidecar threads.
- @aharness/core provides the SDK, the aharness CLI binary, and the aharness-completion shell-completion helper binary.
- @aharness/test-support provides integration-test fixtures for aharness runs.
- packages/web-ui is the private React/Vite browser UI bundled into the core CLI build.
- docs/authoring.md teaches the workflow authoring mental model.
- docs/fsm-packages.md explains how to publish, install, run, and compose reusable FSM packages.
- docs/reference.md documents the public SDK and CLI.
- docs/advanced-runtime-surfaces.md documents programmatic live runs and Codex sidecar threads.
- docs/architecture.md explains the Codex/aharness runtime boundary.
- docs/troubleshooting.md covers prerequisite and runtime failures.
- packages/core/SUPPORTED_CODEX.md documents the Codex CLI compatibility gate.
- CONTRIBUTING.md, CHANGELOG.md, and SECURITY.md cover project maintenance, release notes, and vulnerability reporting.
- How is this different from Claude Code Dynamic Workflows? Both try to solve the same issue: agents lack determinism. The approach is different. Dynamic workflows are generated on the fly by Claude Code itself. aharness FSMs are long-lived workflows that are iterated on and improved over time. aharness also supports single-use FSMs, but that is not the main use case.
- Why Codex? This project was originally based on Claude Code, but Claude Code is closed source and changes often. That made it difficult to develop aharness while keeping up with upstream changes. Codex is open source, and its app-server split makes building on top of it much easier.
- When should I use aharness instead of a normal Codex session? Use aharness when process drift matters: ordered phases, approvals, typed evidence, recovery paths, or terminal outcomes should be enforced instead of remembered. For tiny one-shot edits or fully owner-steered sessions, a normal Codex session is usually simpler.
- Does aharness replace Codex? No. Codex still does the language, code, and tool work. aharness owns the workflow boundary around that work: active states, valid exits, schema validation, owner choices, approval routing, hooks, transitions, and durable run evidence.
- Will you ever support Claude Code or PI? It depends on traction. This is currently an experiment, and it is already useful to me in its current form.
- Can I run many FSMs simultaneously from a single UI? Not yet. This also depends on traction. The long-term idea is to support aharness submit X together with a daemon that executes FSMs in the background. All UI <-> aharness communication is HTTP-based, so a local daemon could talk to a remote UI, or vice versa.
- Can I share workflows with a team? Yes. Workflows can be shipped as npm packages with aharness command metadata, bundled skills, and package-relative assets. Install packages only from sources you trust, because npm lifecycle scripts may run during aharness install.
- Do I have to hand-write FSMs? No. The intended authoring path is to use the bundled aharness FSM authoring skill with Codex, then use the docs as API reference when you need exact details.
Apache-2.0. See LICENSE.
