Harness for YAML-defined workflows that enables stepping through Claude sessions and bash commands.
Built for personal use by Coston. Public for sharing the approach. Use at your own risk.
```bash
npm install -g executant
```

Requirements:

- Node.js 18+
- At least one coding-agent CLI on `PATH`:
  - Claude Code: `npm install -g @anthropic-ai/claude-code` (default)
  - OpenCode: `npm install -g opencode-ai` (local/alternative models)

That's it. Executant has no other system dependencies. It runs on macOS and Linux.
For local LLM inference via llama.cpp (Apple Silicon Metal GPU), see docs/local-models.md.
Run npm run setup to verify all dependencies are installed and configured.
```yaml
# workflow.yaml
goal: "Review and test my changes"
steps:
  - name: test
    type: script
    command: npm test
  - name: review
    prompt: |
      Review the changes in git diff and summarise any concerns.
```

```bash
executant workflow.yaml
```

A workflow is a YAML file with a goal and a list of steps. Each step is one of four kinds: a prompt (Claude runs it with full tool access), a script (bash runs it directly), a log (progress marker), or a forEach (iterates over a list). Steps run in order; the TUI shows live output and elapsed time for each.
```bash
executant plan "convert all CoffeeScript files to TypeScript and run tests"
```

Generates a workflow YAML in your project's task directory using a three-pass Claude pipeline (research → decompose → validate). Also accepts `-f file` or stdin.
For self-contained requests (repetition patterns, forEach loops, or anything that doesn't need codebase exploration), the research pass is skipped automatically and planning goes straight to decompose + validate. Use `-q` / `--fast` to force-skip research for any request:

```bash
executant plan -q "repeat the following prompt 20 times: review src/ for issues"
executant plan --fast "for each file in the list, run the linter"
```

Use `vars` to define shared values substituted as `{{var_name}}` in any prompt or command. Pair with `context` to inject file contents directly into a prompt at runtime, and `output` to pipe a script step's stdout into a file for downstream steps to read.
```yaml
vars:
  spec: docs/spec.md
  report: /tmp/report.txt
steps:
  - name: implement
    context: [spec]   # prepends docs/spec.md contents to the prompt
    prompt: Implement the feature described in the spec above.
  - name: audit
    type: script
    command: npm run audit
    output: report    # captures stdout to /tmp/report.txt
  - name: summarise
    prompt: Summarise the audit findings in {{report}}.
```

Use `forEach` to repeat a step over a list or shell command output; `{{item}}` is substituted per iteration:
```yaml
steps:
  - name: lint {{item}}
    forEach: "git diff --name-only HEAD~1"   # or an inline list: [a.ts, b.ts]
    type: script
    command: npx eslint src/{{item}}
```

Use `steps:` inside a `forEach` or `repeat` to run multiple child steps per iteration:
```yaml
steps:
  - name: verify each package
    forEach: [packages/api, packages/web, packages/shared]
    steps:
      - name: lint {{item}}
        type: script
        command: npm run lint --workspace={{item}}
      - name: test {{item}}
        type: script
        command: npm test --workspace={{item}}
      - name: build {{item}}
        type: script
        command: npm run build --workspace={{item}}
```

Use `repeat: N` as shorthand when there is no meaningful list, just a count. `{{item}}` is the 1-based iteration number:
```yaml
steps:
  - name: iterative audit
    repeat: 5
    prompt: |
      This is pass {{item}} of 5. Review src/runner.ts for untested edge cases.
```

Pass `--var KEY=VALUE` on the command line to override or supply workflow vars without editing the YAML:
```bash
executant --var env=staging --var region=eu-west-1 deploy.yaml
```

CLI vars override any same-named vars in the workflow's `vars:` section. Multiple `--var` flags are accepted.
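The override and substitution rules described above can be sketched in a few lines of TypeScript. This is a minimal illustration, not Executant's actual code: the function names `parseCliVars`, `mergeVars`, and `substitute` are hypothetical, and leaving unknown placeholders untouched is an assumption (the real tool may instead report them as a validation error).

```typescript
// Hypothetical sketch: not Executant's actual implementation.
type Vars = Record<string, string>;

// Parse repeated `--var KEY=VALUE` arguments into a map.
// Splits on the first "=" only, so values may themselves contain "=".
function parseCliVars(pairs: string[]): Vars {
  const out: Vars = {};
  for (const pair of pairs) {
    const eq = pair.indexOf("=");
    if (eq <= 0) {
      throw new Error(`expected KEY=VALUE, got "${pair}"`);
    }
    out[pair.slice(0, eq)] = pair.slice(eq + 1);
  }
  return out;
}

// CLI values win over same-named entries from the workflow's `vars:` section.
function mergeVars(workflowVars: Vars, cliVars: Vars): Vars {
  return { ...workflowVars, ...cliVars };
}

// Replace each {{name}} placeholder with its value.
// Assumption: unknown names are left as-is rather than raising an error.
function substitute(template: string, vars: Vars): string {
  return template.replace(
    /\{\{\s*([A-Za-z0-9_]+)\s*\}\}/g,
    (match: string, key: string): string => {
      const value = vars[key];
      return value !== undefined ? value : match;
    }
  );
}

const merged = mergeVars(
  { env: "dev", region: "us-east-1" },
  parseCliVars(["env=staging", "region=eu-west-1"])
);
console.log(substitute("deploy to {{env}} in {{region}}", merged));
```

Merging before substituting keeps the precedence rule in one place: every prompt and command sees the same final set of values.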
Executant supports multiple coding-agent CLI backends. Claude is the default; OpenCode is a first-class alternative that supports a wide range of open models.
```bash
# Use OpenCode for all prompt steps
export EXECUTANT_PROVIDER=opencode
export EXECUTANT_MODEL=llama-qwen7b/qwen2.5-coder-7b
export EXECUTANT_AGENT=build
executant workflow.yaml
```

```yaml
# Or choose the backend per step
goal: "Review and implement changes"
steps:
  - name: implement
    provider: opencode
    model: llama-qwen7b/qwen2.5-coder-7b
    agent: build
    prompt: |
      Implement the requested change and run tests.
  - name: review
    provider: claude
    model: sonnet
    prompt: |
      Review the git diff and summarise risks.
```

| Variable | Description | Default |
|---|---|---|
| `EXECUTANT_PROVIDER` | Agent backend: `claude` or `opencode` | `claude` |
| `EXECUTANT_MODEL` | Model name. Claude: `sonnet`/`opus`. OpenCode: `llama-qwen7b/qwen2.5-coder-7b` etc. | per-provider default |
| `EXECUTANT_AGENT` | OpenCode `--agent` name (ignored by Claude) | (none) |

Step-level provider, model, and agent fields take priority over env vars.
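The precedence rule above (step field, then environment variable, then default) can be illustrated with a small TypeScript sketch. The names `StepBackend`, `ResolvedBackend`, and `resolveBackend` are hypothetical, and the only default assumed here is the documented `claude` provider; model and agent are left unset when neither source supplies them.

```typescript
// Hypothetical sketch of the precedence rule; names are illustrative only.
interface StepBackend {
  provider?: string;
  model?: string;
  agent?: string;
}

interface ResolvedBackend {
  provider: string;
  model: string | undefined;
  agent: string | undefined;
}

type EnvMap = Record<string, string | undefined>;

// Step-level fields win, then EXECUTANT_* environment variables,
// then the documented default provider ("claude").
// model and agent stay undefined when neither source sets them,
// which leaves the per-provider default to the backend.
function resolveBackend(step: StepBackend, envMap: EnvMap): ResolvedBackend {
  return {
    provider: step.provider ?? envMap["EXECUTANT_PROVIDER"] ?? "claude",
    model: step.model ?? envMap["EXECUTANT_MODEL"],
    agent: step.agent ?? envMap["EXECUTANT_AGENT"],
  };
}

const sampleEnv: EnvMap = {
  EXECUTANT_PROVIDER: "opencode",
  EXECUTANT_MODEL: "llama-qwen7b/qwen2.5-coder-7b",
};

// The step pins Claude, so the environment values are ignored for these fields.
console.log(resolveBackend({ provider: "claude", model: "sonnet" }, sampleEnv));
// The step sets nothing, so the environment values apply.
console.log(resolveBackend({}, sampleEnv));
```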
- `llm_as_judge: true`: after a step completes, Claude evaluates the output; retries with feedback on FAIL, up to 5×
- `self_healing: true`: on script failure, Claude diagnoses and repairs the command, then re-runs it, up to 5×
- `timeout_seconds: N`: kill the step after N seconds and fail with exit code 3. Works for both script and prompt steps.
- `allowed_tools`: restrict which tools a prompt step can use:
  - Omit entirely → all tools available (default)
  - `allowed_tools: []` → text-only mode, no tools
  - `allowed_tools: [Bash, Read, Write]` → only those tools; names are case-insensitive

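The three `allowed_tools` cases listed above differ only in how a missing list, an empty list, and a populated list are treated. A minimal TypeScript sketch of that check follows; the function name `isToolAllowed` is hypothetical and this is not Executant's actual code.

```typescript
// Hypothetical sketch of the allowed_tools semantics.
//   undefined  -> field omitted: no restriction, every tool is available
//   []         -> text-only mode: no tool is available
//   [names...] -> only the listed tools, compared case-insensitively
function isToolAllowed(allowedTools: string[] | undefined, tool: string): boolean {
  if (allowedTools === undefined) {
    return true;
  }
  const wanted = tool.toLowerCase();
  return allowedTools.some((entry) => entry.toLowerCase() === wanted);
}

console.log(isToolAllowed(undefined, "Bash"));               // field omitted
console.log(isToolAllowed([], "Read"));                      // text-only mode
console.log(isToolAllowed(["Read", "Glob", "Grep"], "grep")); // case-insensitive match
```

Keeping "omitted" and "empty" as distinct inputs matters here: collapsing both to an empty array would silently turn the default into text-only mode.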
```yaml
steps:
  - name: analyse
    prompt: Review the architecture and list concerns.
    allowed_tools: [Read, Glob, Grep]   # read-only: no edits or bash
  - name: summarise
    prompt: Write a one-paragraph summary.
    allowed_tools: []                   # no tools: pure text generation
```

```yaml
steps:
  - name: install
    command: npm ci
    timeout_seconds: 120    # fail if install takes longer than 2 min
  - name: implement
    prompt: Implement the feature described above.
    timeout_seconds: 1800   # 30 min ceiling for the Claude step
```

Write a `.executant-cancel` file in the same directory as the workflow YAML to stop the workflow cleanly between steps:
```bash
executant long-workflow.yaml &
touch .executant-cancel   # workflow stops at the next step boundary; exits 4
```

The file is deleted automatically. This is a cooperative, process-safe alternative to SIGTERM that avoids mid-step git state corruption. The cancel file is always resolved relative to the workflow file, so its location is predictable regardless of which directory you invoked executant from.
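The cancel-file behaviour described above (resolve next to the workflow file, check between steps, delete on detection) might look roughly like the following TypeScript sketch. The helper names `cancelFilePath` and `consumeCancelRequest` are hypothetical; only the `.executant-cancel` file name and its location come from this README.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical sketch: the cancel file lives next to the workflow YAML,
// not in the current working directory.
function cancelFilePath(workflowFile: string): string {
  return path.join(path.dirname(path.resolve(workflowFile)), ".executant-cancel");
}

// Intended to be called at each step boundary.
// Returns true (and deletes the file) if a cancel was requested.
// A runner would then stop and exit with code 4.
function consumeCancelRequest(workflowFile: string): boolean {
  const cancelFile = cancelFilePath(workflowFile);
  if (!fs.existsSync(cancelFile)) {
    return false;
  }
  fs.rmSync(cancelFile, { force: true });
  return true;
}

// Usage: simulate `touch .executant-cancel` beside a workflow in a temp directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "executant-demo-"));
const demoWorkflow = path.join(demoDir, "long-workflow.yaml");
console.log(consumeCancelRequest(demoWorkflow)); // no cancel file yet
fs.writeFileSync(cancelFilePath(demoWorkflow), "");
console.log(consumeCancelRequest(demoWorkflow)); // cancel file found and removed
```

Checking only at step boundaries is what makes the mechanism cooperative: a step that has already started is allowed to finish.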
While a workflow is running, press i to open a text input at the bottom of the TUI. Type a correction and press Enter to send it; Esc cancels.
The message is queued and prepended to the next Claude step's prompt as [User correction from a previous step]. Claude sees your note before it starts and incorporates it into its work. If you interject while a script step is running, the correction waits for the next Claude step in the workflow.
press i → ▷ don't delete that file, use git revert▌ esc to cancel
What it's good for: steering the next Claude step while watching the current one run — leaving a note for the step that's about to start.
What it can't do: interrupt a Claude step mid-execution. The Claude CLI processes each invocation as a complete unit; there's no mechanism to inject a message partway through. To abort a runaway step immediately, press q.
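The queue-and-prepend behaviour described above can be sketched as a small TypeScript class. This is an illustration under assumptions, not the actual TUI code: the class name `CorrectionQueue` is hypothetical, and the exact layout of the prepended text (label, then note, then a blank line) is a guess; only the `[User correction from a previous step]` label comes from this README.

```typescript
// Hypothetical sketch of the queue-and-prepend behaviour.
type StepKind = "prompt" | "script";

class CorrectionQueue {
  private pending: string[] = [];

  // Called when the user presses Enter in the TUI input; blank input is ignored.
  add(message: string): void {
    const trimmed = message.trim();
    if (trimmed.length > 0) {
      this.pending.push(trimmed);
    }
  }

  // Script steps pass through untouched and leave the queue intact.
  // The next prompt step receives every queued note, and the queue is emptied.
  applyTo(kind: StepKind, body: string): string {
    if (kind !== "prompt" || this.pending.length === 0) {
      return body;
    }
    const notes = this.pending
      .map((note) => `[User correction from a previous step] ${note}`)
      .join("\n");
    this.pending = [];
    return `${notes}\n\n${body}`;
  }
}

const corrections = new CorrectionQueue();
corrections.add("don't delete that file, use git revert");
console.log(corrections.applyTo("script", "npm test"));
console.log(corrections.applyTo("prompt", "Clean up the stale fixtures."));
```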
| File | Demonstrates |
|---|---|
| `hello-world.yaml` | Simple prompt steps |
| `mixed-workflow.yaml` | Script + prompt steps together |
| `foreach-demo.yaml` | Inline lists and shell command iteration |
| `nested-steps-demo.yaml` | Multiple child steps per forEach / repeat iteration |
| `vars-demo.yaml` | Variable substitution |
| `judge-demo.yaml` | LLM-as-judge retry loop |
| `logging-demo.yaml` | Log steps, self-healing, judge |
| `git-status-summary.yaml` | Real-world git workflow |
| `repeat-demo.yaml` | Running a step N times with repeat |
| `file-demo.yaml` | File operations |
| `from-step-test.yaml` | Using `--from-step` to resume mid-workflow |

See the examples/ directory.
```bash
executant plan "description"                    # generate a workflow YAML (auto-detects fast path)
executant plan -q "description"                 # skip research pass (fast path)
executant refine workflow.yaml "instructions"   # refine an existing workflow YAML
executant workflow.yaml                         # run a workflow
executant --ci workflow.yaml                    # headless, NDJSON to stdout
executant --step <name|n> wf.yaml               # run one step by name or index
executant --from-step <n> wf.yaml               # resume from step n
executant --var KEY=VALUE wf.yaml               # override a workflow var at runtime
executant update                                # upgrade to latest version
```

| Code | Meaning |
|---|---|
| 0 | All steps completed successfully |
| 1 | A step failed at runtime |
| 2 | YAML or variable validation error |
| 3 | A step timed out (`timeout_seconds` exceeded) |
| 4 | Cancelled via `.executant-cancel` file |

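A wrapper script can branch on the exit codes in the table above. The following TypeScript sketch shows one way to do that; the names `EXIT_MEANINGS`, `describeExit`, and `shouldRetry` are hypothetical, and the retry policy (retry only on timeout) is an example choice, not something Executant prescribes.

```typescript
// Exit codes as listed in the table above; everything else here is illustrative.
const EXIT_MEANINGS: Record<number, string> = {
  0: "all steps completed successfully",
  1: "a step failed at runtime",
  2: "YAML or variable validation error",
  3: "a step timed out",
  4: "cancelled via .executant-cancel file",
};

function describeExit(code: number): string {
  const meaning = EXIT_MEANINGS[code];
  return meaning !== undefined ? meaning : `unknown exit code ${code}`;
}

// Example policy for a CI wrapper: a timeout may be transient, so retry it;
// validation errors and cancellations will not improve on a second run.
function shouldRetry(code: number): boolean {
  return code === 3;
}

for (const code of [0, 2, 3]) {
  console.log(`${code}: ${describeExit(code)} (retry: ${shouldRetry(code)})`);
}
```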
```bash
npm test                                                                    # run tests
npm run eval -- evals/plan-decompose.eval.yaml                              # score a prompt template
npm run eval -- --refine evals/plan-decompose.eval.yaml                     # refine until all cases pass
npm run eval -- --cases simple-feature,1-3 evals/plan-decompose.eval.yaml   # run a subset of cases
```

The eval system tests and iteratively refines the prompt templates in `src/prompts/`. Eval definitions live in `evals/*.eval.yaml`; see AGENTS.md for the full format.
Pass --output-csv results/out.csv to any eval run to save results. Re-running with the same path resumes from where it left off — already-scored cases are skipped.
```bash
# Run all evals × all configured models and generate a benchmark report
npm run eval:compare
npm run eval:compare:report   # regenerate report from existing CSVs

# Compare specific models on a single eval
npm run eval -- \
  --models claude/sonnet,opencode/llama-qwen7b/qwen2.5-coder-7b \
  --output-csv results/comparison.csv \
  evals/judge-evaluation.eval.yaml

# Run multiple eval files in one command
npm run eval -- evals/plan-decompose.eval.yaml evals/judge-evaluation.eval.yaml
```

The `--output-csv` file is denormalized (one row per criterion judgment per model), so it is ready for pivot tables and charts. See docs/eval-comparison.md for column definitions and interpretation guidance.
Workflow evals test models on complete coding tasks — the full development lifecycle — rather than just prompt quality. Each task runs in an isolated git worktree:
explore → plan → implement → npm test → commit
After the model finishes, Claude (always Claude, never the model being tested) reviews the git diff and judges it against the task criteria.
```bash
npm run eval:workflow -- --models claude/sonnet path/to/task.yaml

npm run eval:workflow -- \
  --models claude/sonnet,opencode/llama-qwen7b/qwen2.5-coder-7b \
  --output-csv results/workflow-comparison.csv \
  path/to/task.yaml
```

Task files are valid executant workflow YAMLs with an extra top-level `eval_criteria` field that the harness reads for post-run judging.