Executant

Harness for YAML-defined workflows that enables stepping through Claude sessions and bash commands.

Advisory

Built for personal use by Coston. Public for sharing the approach. Use at your own risk.

Install

npm install -g executant

Requirements:

Node.js 18+
At least one coding-agent CLI on PATH:
- Claude Code — npm install -g @anthropic-ai/claude-code (default)
- OpenCode — npm install -g opencode-ai (local/alternative models)

That's it. Executant has no other system dependencies. It runs on macOS and Linux.

For local LLM inference via llama.cpp (Apple Silicon Metal GPU), see docs/local-models.md.

Run npm run setup to verify all dependencies are installed and configured.
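
The requirements above can also be checked by hand. The following is a minimal sketch, not a description of what npm run setup does internally: the helper names node_major and has_agent_cli are hypothetical, and it assumes the agent CLIs are installed under the binary names claude and opencode.

```shell
# Extract the major version from a `node --version` string such as "v18.19.0".
node_major() {
  version=${1#v}          # drop the leading "v"
  echo "${version%%.*}"   # keep everything before the first dot
}

# Succeeds when at least one supported agent CLI is found on PATH.
# Assumption: the binaries are named `claude` and `opencode`.
has_agent_cli() {
  command -v claude >/dev/null 2>&1 || command -v opencode >/dev/null 2>&1
}

# Usage:
#   [ "$(node_major "$(node --version)")" -ge 18 ] || echo "Node.js 18+ required"
#   has_agent_cli || echo "Install Claude Code or OpenCode first"
```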

Quick Start

# workflow.yaml
goal: "Review and test my changes"

steps:
  - name: test
    type: script
    command: npm test

  - name: review
    prompt: |
      Review the changes in git diff and summarise any concerns.

executant workflow.yaml

How It Works

A workflow is a YAML file with a goal and a list of steps. Each step is one of: a prompt (run by the coding agent, Claude by default, with full tool access), a script (run directly by bash), a log (a progress marker), or a forEach (iterates over a list or a shell command's output). Steps run in order; the TUI shows live output and elapsed time for each.

Generating Workflows

executant plan "convert all CoffeeScript files to TypeScript and run tests"

Generates a workflow YAML in your project's task directory using a three-pass Claude pipeline (research → decompose → validate). Also accepts -f file or stdin.

For self-contained requests (repetition patterns, forEach loops, or anything that doesn't need codebase exploration), the research pass is skipped automatically — going straight to decompose + validate. Use -q / --fast to force-skip research for any request:

executant plan -q "repeat the following prompt 20 times: review src/ for issues"
executant plan --fast "for each file in the list, run the linter"

Context & Variables

Use vars to define shared values; each is substituted as {{var_name}} in any prompt or command. Use context to prepend a file's contents to a prompt at runtime, and output to capture a script step's stdout to a file that downstream steps can read. In the example below, both context and output refer to vars by name.

vars:
  spec: docs/spec.md
  report: /tmp/report.txt

steps:
  - name: implement
    context: [spec]           # prepends docs/spec.md contents to the prompt
    prompt: Implement the feature described in the spec above.

  - name: audit
    type: script
    command: npm run audit
    output: report            # captures stdout to /tmp/report.txt

  - name: summarise
    prompt: Summarise the audit findings in {{report}}.

Use forEach to repeat a step over a list or shell command output — {{item}} is substituted per iteration:

steps:
  - name: lint {{item}}
    forEach: "git diff --name-only HEAD~1"   # or an inline list: [src/a.ts, src/b.ts]
    type: script
    command: npx eslint {{item}}             # git diff paths are already relative to the repo root

Use steps: inside a forEach or repeat to run multiple child steps per iteration:

steps:
  - name: verify each package
    forEach: [packages/api, packages/web, packages/shared]
    steps:
      - name: lint {{item}}
        type: script
        command: npm run lint --workspace={{item}}
      - name: test {{item}}
        type: script
        command: npm test --workspace={{item}}
      - name: build {{item}}
        type: script
        command: npm run build --workspace={{item}}

Use repeat: N as shorthand when there is no meaningful list — just a count. {{item}} is the 1-based iteration number:

steps:
  - name: iterative audit
    repeat: 5
    prompt: |
      This is pass {{item}} of 5. Review src/runner.ts for untested edge cases.

Variables at Runtime

Pass --var KEY=VALUE on the command line to override or supply workflow vars without editing the YAML:

executant --var env=staging --var region=eu-west-1 deploy.yaml

CLI vars override any same-named vars in the workflow's vars: section. Multiple --var flags are accepted.

Provider & Model Selection

Executant supports multiple coding-agent CLI backends. Claude is the default; OpenCode is a first-class alternative that supports a wide range of open models.

Global defaults via env vars

# Use OpenCode for all prompt steps
export EXECUTANT_PROVIDER=opencode
export EXECUTANT_MODEL=llama-qwen7b/qwen2.5-coder-7b
export EXECUTANT_AGENT=build

executant workflow.yaml

Per-step in YAML

goal: "Review and implement changes"

steps:
  - name: implement
    provider: opencode
    model: llama-qwen7b/qwen2.5-coder-7b
    agent: build
    prompt: |
      Implement the requested change and run tests.

  - name: review
    provider: claude
    model: sonnet
    prompt: |
      Review the git diff and summarise risks.

Env vars reference

Variable	Description	Default
`EXECUTANT_PROVIDER`	Agent backend: `claude` or `opencode`	`claude`
`EXECUTANT_MODEL`	Model name. Claude: `sonnet`/`opus`. OpenCode: `llama-qwen7b/qwen2.5-coder-7b` etc.	per-provider default
`EXECUTANT_AGENT`	OpenCode `--agent` name (ignored by Claude)	—

Step-level provider, model, and agent fields take priority over env vars.
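
The precedence rule can be illustrated with a short shell sketch. This is not Executant's actual code: resolve_provider is a hypothetical helper that only mirrors the documented order (step-level field first, then the env var, then the claude default).

```shell
# Hypothetical helper: pick the provider for a single step.
# $1 is the step-level `provider` value, or empty if the step sets none.
resolve_provider() {
  # Step value wins; otherwise fall back to the env var; otherwise "claude".
  echo "${1:-${EXECUTANT_PROVIDER:-claude}}"
}

# Usage:
#   EXECUTANT_PROVIDER=opencode
#   resolve_provider ""        # prints "opencode" (env var applies)
#   resolve_provider claude    # prints "claude" (step-level value wins)
```

The same order applies to model and agent, except that agent has no default.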

Quality Controls

llm_as_judge: true — after a step completes, Claude evaluates the output; retries with feedback on FAIL, up to 5×
self_healing: true — on script failure, Claude diagnoses and repairs the command, then re-runs it, up to 5×
timeout_seconds: N — kill the step after N seconds and fail with exit code 3. Works for both script and prompt steps.
allowed_tools — restrict which tools a prompt step can use:
- Omit entirely → all tools available (default)
- allowed_tools: [] → text-only mode, no tools
- allowed_tools: [Bash, Read, Write] → only those tools; names are case-insensitive

steps:
  - name: analyse
    prompt: Review the architecture and list concerns.
    allowed_tools: [Read, Glob, Grep]   # read-only: no edits or bash

  - name: summarise
    prompt: Write a one-paragraph summary.
    allowed_tools: []                   # no tools — pure text generation

steps:
  - name: install
    type: script
    command: npm ci
    timeout_seconds: 120   # fail if install takes longer than 2 min

  - name: implement
    prompt: Implement the feature described above.
    timeout_seconds: 1800  # 30 min ceiling for the Claude step

Cancellation

Write a .executant-cancel file in the same directory as the workflow YAML to stop the workflow cleanly between steps:

executant long-workflow.yaml &
touch .executant-cancel   # workflow stops at the next step boundary; exits 4

The file is deleted automatically. This is a cooperative, process-safe alternative to SIGTERM — no mid-step git state corruption. The cancel file is always resolved relative to the workflow file, so the location is predictable regardless of which directory you invoked executant from.
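
Because the cancel file lives next to the workflow file, a small wrapper can create it from any working directory. A minimal sketch, where cancel_workflow is a hypothetical helper name and not part of the executant CLI:

```shell
# Hypothetical helper: request cancellation of a running workflow.
# Creates .executant-cancel beside the given workflow file,
# regardless of the directory this is called from.
cancel_workflow() {
  workflow_file=$1
  touch "$(dirname "$workflow_file")/.executant-cancel"
}

# Usage:
#   cancel_workflow ~/projects/app/long-workflow.yaml
```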

Interjection

While a workflow is running, press i to open a text input at the bottom of the TUI. Type a correction and press Enter to send it; Esc cancels.

The message is queued and prepended to the next Claude step's prompt as [User correction from a previous step]. Claude sees your note before it starts and incorporates it into its work. If you interject while a script step is running, the correction waits for the next Claude step in the workflow.

press i  →  ▷ don't delete that file, use git revert▌  esc to cancel

What it's good for: steering the next Claude step while watching the current one run — leaving a note for the step that's about to start.

What it can't do: interrupt a Claude step mid-execution. The Claude CLI processes each invocation as a complete unit; there's no mechanism to inject a message partway through. To abort a runaway step immediately, press q.

Examples

File	Demonstrates
`hello-world.yaml`	Simple prompt steps
`mixed-workflow.yaml`	Script + prompt steps together
`foreach-demo.yaml`	Inline lists and shell command iteration
`nested-steps-demo.yaml`	Multiple child steps per forEach / repeat iteration
`vars-demo.yaml`	Variable substitution
`judge-demo.yaml`	LLM-as-judge retry loop
`logging-demo.yaml`	Log steps, self-healing, judge
`git-status-summary.yaml`	Real-world git workflow
`repeat-demo.yaml`	Running a step N times with `repeat`
`file-demo.yaml`	File operations
`from-step-test.yaml`	Using `--from-step` to resume mid-workflow

See the examples/ directory.

CLI

executant plan "description"                    # generate a workflow YAML (auto-detects fast path)
executant plan -q "description"                 # skip research pass (fast path)
executant refine workflow.yaml "instructions"   # refine an existing workflow YAML
executant workflow.yaml                         # run a workflow
executant --ci workflow.yaml                    # headless, NDJSON to stdout
executant --step <name|n> wf.yaml               # run one step by name or index
executant --from-step <n> wf.yaml               # resume from step n
executant --var KEY=VALUE wf.yaml               # override a workflow var at runtime
executant update                                # upgrade to latest version

Exit codes

Code	Meaning
`0`	All steps completed successfully
`1`	A step failed at runtime
`2`	YAML or variable validation error
`3`	A step timed out (`timeout_seconds` exceeded)
`4`	Cancelled via `.executant-cancel` file
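
In a CI script, these codes can be turned into readable status messages. The sketch below is one possible approach; describe_exit is a hypothetical helper and run.ndjson is an arbitrary output file name.

```shell
# Hypothetical helper: translate an Executant exit code into a short label,
# following the exit code table above.
describe_exit() {
  case "$1" in
    0) echo "success" ;;
    1) echo "step failed" ;;
    2) echo "validation error" ;;
    3) echo "timed out" ;;
    4) echo "cancelled" ;;
    *) echo "unknown exit code $1" ;;
  esac
}

# Usage (assumes executant is on PATH):
#   executant --ci workflow.yaml > run.ndjson
#   code=$?
#   echo "workflow result: $(describe_exit "$code")"
```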

Development

npm test                                                     # run tests
npm run eval -- evals/plan-decompose.eval.yaml               # score a prompt template
npm run eval -- --refine evals/plan-decompose.eval.yaml      # refine until all cases pass
npm run eval -- --cases simple-feature,1-3 evals/plan-decompose.eval.yaml  # run a subset of cases

The eval system tests and iteratively refines the prompt templates in src/prompts/. Eval definitions live in evals/*.eval.yaml; see AGENTS.md for the full format.

Pass --output-csv results/out.csv to any eval run to save results. Re-running with the same path resumes from where it left off — already-scored cases are skipped.

Multi-model comparison

# Run all evals × all configured models and generate a benchmark report
npm run eval:compare
npm run eval:compare:report   # regenerate report from existing CSVs

# Compare specific models on a single eval
npm run eval -- \
  --models claude/sonnet,opencode/llama-qwen7b/qwen2.5-coder-7b \
  --output-csv results/comparison.csv \
  evals/judge-evaluation.eval.yaml

# Run multiple eval files in one command
npm run eval -- evals/plan-decompose.eval.yaml evals/judge-evaluation.eval.yaml

The --output-csv file is denormalized (one row per criterion judgment per model) — ready for pivot tables and charts. See docs/eval-comparison.md for column definitions and interpretation guidance.

Workflow evals (end-to-end agentic testing)

Workflow evals test models on complete coding tasks — the full development lifecycle — rather than just prompt quality. Each task runs in an isolated git worktree:

explore → plan → implement → npm test → commit

After the model finishes, Claude (always Claude, never the model being tested) reviews the git diff and judges it against the task criteria.

npm run eval:workflow -- --models claude/sonnet path/to/task.yaml
npm run eval:workflow -- \
  --models claude/sonnet,opencode/llama-qwen7b/qwen2.5-coder-7b \
  --output-csv results/workflow-comparison.csv \
  path/to/task.yaml

Task files are valid executant workflow YAMLs with an extra eval_criteria top-level field the harness reads for post-run judging.
