21 changes: 21 additions & 0 deletions .claude/skills/caveman/LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 Julius Brussee

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
54 changes: 54 additions & 0 deletions .claude/skills/caveman/README.md
@@ -0,0 +1,54 @@
# caveman skill (vendored — reference copy)

This directory vendors the [caveman](https://github.com/JuliusBrussee/caveman) Claude Code skill — an ultra-compressed communication mode that cuts agent prose token usage ~65–75% while keeping all technical substance intact.

## How ZO uses caveman (read this first)

**ZO does NOT rely on Claude Code auto-loading this `SKILL.md` file.** Caveman's always-on activation in Claude Code comes from upstream's SessionStart and UserPromptSubmit hooks (`hooks/install.sh` in their repo) which copy hook scripts into `~/.claude/hooks/` and patch `~/.claude/settings.json`. We deliberately do not run that installer — modifying the user's global Claude Code config is too invasive for an opt-in cost-saving feature.

Instead, when `--low-token` is active (and `caveman` is not opted out), the orchestrator inlines the caveman rules directly into the Lead Orchestrator's prompt — `src/zo/orchestrator.py:_prompt_low_token_overrides()` appends a "Token Efficiency: Caveman-Style Prose" subsection that contains the rules verbatim. The lead is instructed to pass the same rules to every sub-agent it spawns. The savings come from agents adopting the style as instructed; no skill or hook system is required.
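
The inlining described above can be sketched roughly as follows. This is a minimal illustration, not ZO's actual code: `CAVEMAN_RULES`, `low_token_overrides`, and `build_lead_prompt` are hypothetical names, the rule text is abbreviated, and the real `_prompt_low_token_overrides()` in `src/zo/orchestrator.py` may be structured differently.

```python
# Hypothetical sketch: names and rule text are illustrative assumptions,
# not the contents of src/zo/orchestrator.py.
CAVEMAN_RULES = (
    "Drop articles, filler, pleasantries, hedging. Fragments OK. "
    "Technical terms exact. Code blocks unchanged. Errors quoted exact."
)


def low_token_overrides(*, low_token: bool, caveman: bool) -> str:
    """Return the extra prompt text for low-token mode ('' when caveman is off)."""
    if not (low_token and caveman):
        return ""
    return (
        "\n### Token Efficiency: Caveman-Style Prose\n"
        f"{CAVEMAN_RULES}\n"
        "Pass these rules verbatim to every sub-agent you spawn.\n"
    )


def build_lead_prompt(base_prompt: str, *, low_token: bool, caveman: bool) -> str:
    """Append the caveman subsection to the lead prompt when it is enabled."""
    return base_prompt + low_token_overrides(low_token=low_token, caveman=caveman)
```

Because the rules travel inside the prompt text itself, this approach needs no hook or skill loader; the trade-off is that the inlined copy has to be kept in step with the vendored `SKILL.md` by hand.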

This `SKILL.md` file is kept here as:
1. A **reference copy** of the canonical rules — useful if a developer wants to read the upstream spec without leaving the repo.
2. A **forward-compatible artifact** — if a future Claude Code release adds proper project-level skill auto-loading, this file is already in the right place to be picked up.
3. A **reference for users who install caveman properly** — anyone who runs upstream's `install.sh` separately gets the full hook-based always-on enforcement, and this file is a convenient way to confirm version alignment.

## Why caveman is safe to apply across the team

The skill explicitly preserves:

- Code blocks (verbatim)
- Quoted error strings (verbatim)
- Tool inputs (Write/Edit args — caveman only compresses chat prose, not tool calls)
- Structured artifacts (`metrics.jsonl`, `result.md`, `training_status.json`, agent contracts) — these go through Write/Edit, not chat

It also auto-disables itself for security warnings, irreversible-action confirmations, and any multi-step sequence where dropped articles/conjunctions could create ambiguity.

## Source

- Upstream: https://github.com/JuliusBrussee/caveman
- License: MIT (`LICENSE` in this directory — Copyright (c) 2026 Julius Brussee)
- Source-of-truth path upstream: `skills/caveman/SKILL.md`
- Auto-generated mirror at upstream root: `caveman/SKILL.md` (identical content)
- Vendored: 2026-05-05
- Intensity: full (default)

## Updating

To re-sync from upstream's source-of-truth:

```bash
gh api repos/JuliusBrussee/caveman/contents/skills/caveman/SKILL.md --jq '.content' | base64 -d \
> .claude/skills/caveman/SKILL.md
```

Then verify the YAML frontmatter and `## Persistence` / `## Rules` / `## Intensity` / `## Auto-Clarity` / `## Boundaries` sections are intact, and re-check whether the inlined rules in `src/zo/orchestrator.py:_prompt_low_token_overrides()` need updating to match material changes.
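
The manual check above could be automated along these lines. This is a sketch under stated assumptions: the helper name `check_skill_text` is hypothetical, and it only confirms that the frontmatter delimiters and the five `##` headings exist, not that their content matches upstream.

```python
import re
from pathlib import Path

# Section headings this README says must survive a re-sync.
REQUIRED_SECTIONS = ("Persistence", "Rules", "Intensity", "Auto-Clarity", "Boundaries")


def check_skill_text(text: str) -> list[str]:
    """Return a list of problems found in SKILL.md text (empty list means OK)."""
    problems = []
    lines = text.splitlines()
    # Frontmatter: the first line opens with '---' and a later line closes it.
    has_open = bool(lines) and lines[0].strip() == "---"
    has_close = any(line.strip() == "---" for line in lines[1:])
    if not (has_open and has_close):
        problems.append("missing YAML frontmatter")
    # Collect every level-2 heading and compare against the required set.
    headings = {m.group(1).strip() for m in re.finditer(r"^## (.+)$", text, re.MULTILINE)}
    for name in REQUIRED_SECTIONS:
        if name not in headings:
            problems.append(f"missing section: ## {name}")
    return problems


if __name__ == "__main__":
    path = Path(".claude/skills/caveman/SKILL.md")
    if path.exists():
        for problem in check_skill_text(path.read_text(encoding="utf-8")):
            print(problem)
```

Comparing the inlined rules in `src/zo/orchestrator.py` against the refreshed file still needs a human read, since only material changes matter there.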

## Activation summary

1. User runs `zo build --low-token` (or sets `low_token: true` in plan frontmatter).
2. The `--low-token` preset includes `caveman: True` (default; opt out via `--no-caveman` CLI flag or `caveman: false` plan field).
3. The Lead Orchestrator's prompt gains a "Token Efficiency: Caveman-Style Prose" subsection containing the rules inline.
4. Lead adopts the style; sub-agents receive the same rules in their spawn prompts and follow them too.

See `docs/concepts/low-token-mode.mdx` and `docs/reference/cost-benchmark.mdx` for measured impact (target: an additional 10–20 percentage points of savings on top of the measured ~30% baseline from the lead Opus→Sonnet swap).
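
As a rough worked example of what that target means in dollars: percentage points add to the baseline percentage, so a 30% baseline plus 10–20 points is 40–50% total. The sketch below assumes the approximate $11 default-mode cost quoted in the header of `scripts/benchmark_low_token.sh`; the helper name is hypothetical and the figures are illustrative arithmetic, not measured results.

```python
def cost_after_savings(baseline_cost: float, savings_pct: float) -> float:
    """Cost remaining after cutting `savings_pct` percent off `baseline_cost`."""
    return baseline_cost * (1 - savings_pct / 100)


default_cost = 11.0  # assumed: rough default-mode anchor from the benchmark script header
base_savings = 30.0  # baseline savings quoted above, in percent

for extra_pp in (10.0, 20.0):
    total = base_savings + extra_pp  # percentage points add to the percentage
    remaining = cost_after_savings(default_cost, total)
    print(f"{total:.0f}% total savings -> ${remaining:.2f} per run")
```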
74 changes: 74 additions & 0 deletions .claude/skills/caveman/SKILL.md
@@ -0,0 +1,74 @@
---
name: caveman
description: >
Ultra-compressed communication mode. Cuts token usage ~75% by speaking like caveman
while keeping full technical accuracy. Supports intensity levels: lite, full (default), ultra,
wenyan-lite, wenyan-full, wenyan-ultra.
Use when user says "caveman mode", "talk like caveman", "use caveman", "less tokens",
"be brief", or invokes /caveman. Also auto-triggers when token efficiency is requested.
---

Respond terse like smart caveman. All technical substance stay. Only fluff die.

## Persistence

ACTIVE EVERY RESPONSE. No revert after many turns. No filler drift. Still active if unsure. Off only: "stop caveman" / "normal mode".

Default: **full**. Switch: `/caveman lite|full|ultra`.

## Rules

Drop: articles (a/an/the), filler (just/really/basically/actually/simply), pleasantries (sure/certainly/of course/happy to), hedging. Fragments OK. Short synonyms (big not extensive, fix not "implement a solution for"). Technical terms exact. Code blocks unchanged. Errors quoted exact.

Pattern: `[thing] [action] [reason]. [next step].`

Not: "Sure! I'd be happy to help you with that. The issue you're experiencing is likely caused by..."
Yes: "Bug in auth middleware. Token expiry check use `<` not `<=`. Fix:"

## Intensity

| Level | What change |
|-------|------------|
| **lite** | No filler/hedging. Keep articles + full sentences. Professional but tight |
| **full** | Drop articles, fragments OK, short synonyms. Classic caveman |
| **ultra** | Abbreviate prose words (DB/auth/config/req/res/fn/impl), strip conjunctions, arrows for causality (X → Y), one word when one word enough. Code symbols, function names, API names, error strings: never abbreviate |
| **wenyan-lite** | Semi-classical. Drop filler/hedging but keep grammar structure, classical register |
| **wenyan-full** | Maximum classical terseness. Fully 文言文. 80-90% character reduction. Classical sentence patterns, verbs precede objects, subjects often omitted, classical particles (之/乃/為/其) |
| **wenyan-ultra** | Extreme abbreviation while keeping classical Chinese feel. Maximum compression, ultra terse |

Example — "Why React component re-render?"
- lite: "Your component re-renders because you create a new object reference each render. Wrap it in `useMemo`."
- full: "New object ref each render. Inline object prop = new ref = re-render. Wrap in `useMemo`."
- ultra: "Inline obj prop → new ref → re-render. `useMemo`."
- wenyan-lite: "組件頻重繪,以每繪新生對象參照故。以 useMemo 包之。"
- wenyan-full: "物出新參照,致重繪。useMemo .Wrap之。"
- wenyan-ultra: "新參照→重繪。useMemo Wrap。"

Example — "Explain database connection pooling."
- lite: "Connection pooling reuses open connections instead of creating new ones per request. Avoids repeated handshake overhead."
- full: "Pool reuse open DB connections. No new connection per request. Skip handshake overhead."
- ultra: "Pool = reuse DB conn. Skip handshake → fast under load."
- wenyan-full: "池reuse open connection。不每req新開。skip handshake overhead。"
- wenyan-ultra: "池reuse conn。skip handshake → fast。"

## Auto-Clarity

Drop caveman when:
- Security warnings
- Irreversible action confirmations
- Multi-step sequences where fragment order or omitted conjunctions risk misread
- Compression itself creates technical ambiguity (e.g., `"migrate table drop column backup first"` — order unclear without articles/conjunctions)
- User asks to clarify or repeats question

Resume caveman after clear part done.

Example — destructive op:
> **Warning:** This will permanently delete all rows in the `users` table and cannot be undone.
> ```sql
> DROP TABLE users;
> ```
> Caveman resume. Verify backup exist first.

## Boundaries

Code/commits/PRs: write normal. "stop caveman" or "normal mode": revert. Level persist until changed or session end.
39 changes: 30 additions & 9 deletions scripts/benchmark_low_token.sh
@@ -1,25 +1,38 @@
#!/usr/bin/env bash
# benchmark_low_token.sh — measure token cost reduction from --low-token mode.
# benchmark_low_token.sh — measure token cost reduction from --low-token mode
# and caveman-skill ablation.
#
# Runs `zo build` against the MNIST plan twice (default mode + low-token mode),
# captures Claude Code token usage via `npx ccusage`, and writes a comparison
# JSON + console summary.
# Runs `zo build` against the MNIST plan up to THREE times:
# - default (no flags — historical anchor, ~$11)
# - low-token (--low-token, caveman default-on — measures the
# full preset including caveman skill)
# - low-token-no-caveman (--low-token --no-caveman — isolates caveman's
# contribution within low-token mode)
#
# Comparing low-token vs low-token-no-caveman gives the caveman delta.
# Comparing default vs low-token gives the full low-token-mode savings.
#
# Captures Claude Code token usage via `npx ccusage` and writes a
# comparison JSON + console summary.
#
# Prerequisites:
# - `zo` CLI installed (this repo, ./setup.sh)
# - `claude` CLI logged in
# - `npx ccusage` available (npm install -g ccusage)
# - Mac or Linux dev box (Apple Silicon recommended for ~25min low-token wall time)
# - ~75 minutes wall time, ~$13-14 spend on Anthropic API
# - Mac or Linux dev box (Apple Silicon recommended)
# - ~110 minutes wall time across all three runs, ~$20-22 spend on Anthropic API
# (default ~$11 + low-token ~$5-6 + low-token-no-caveman ~$6-7)
#
# Usage:
# ./scripts/benchmark_low_token.sh
# ./scripts/benchmark_low_token.sh --delivery-prefix /tmp/zo-bench
# ./scripts/benchmark_low_token.sh --skip-default --skip-low-token # dry preview
# ./scripts/benchmark_low_token.sh --skip-default # only low-token variants
# ./scripts/benchmark_low_token.sh --skip-low-token-no-caveman # skip ablation arm
# ./scripts/benchmark_low_token.sh --skip-default --skip-low-token --skip-low-token-no-caveman # dry preview
# ./scripts/benchmark_low_token.sh --help
#
# Output:
# benchmark-results-{ISO-timestamp}.json — full diff + summary
# benchmark-results-{ISO-timestamp}.json — full diff + summary across all runs
# stdout — human-readable summary table

set -euo pipefail
@@ -31,6 +44,7 @@ set -euo pipefail
DELIVERY_PREFIX="${TMPDIR:-/tmp}/zo-low-token-bench"
RUN_DEFAULT=true
RUN_LOW_TOKEN=true
RUN_LOW_TOKEN_NO_CAVEMAN=true
TIMESTAMP="$(date -u +%Y%m%d-%H%M%S)"
RESULT_FILE="benchmark-results-${TIMESTAMP}.json"

@@ -42,6 +56,8 @@ while [[ $# -gt 0 ]]; do
RUN_DEFAULT=false; shift ;;
--skip-low-token)
RUN_LOW_TOKEN=false; shift ;;
--skip-low-token-no-caveman)
RUN_LOW_TOKEN_NO_CAVEMAN=false; shift ;;
--help|-h)
grep '^# ' "$0" | sed 's/^# //'
exit 0 ;;
@@ -54,6 +70,7 @@ done

DEFAULT_DELIVERY="${DELIVERY_PREFIX}-default"
LOW_TOKEN_DELIVERY="${DELIVERY_PREFIX}-low-token"
LOW_TOKEN_NO_CAVEMAN_DELIVERY="${DELIVERY_PREFIX}-low-token-no-caveman"
PLAN_PATH="plans/mnist-digit-classifier.md"

# ---------------------------------------------------------------------------
@@ -189,6 +206,10 @@ if [[ "$RUN_LOW_TOKEN" == "true" ]]; then
run_one "low-token" "$LOW_TOKEN_DELIVERY" --low-token
fi

if [[ "$RUN_LOW_TOKEN_NO_CAVEMAN" == "true" ]]; then
run_one "low-token-no-caveman" "$LOW_TOKEN_NO_CAVEMAN_DELIVERY" --low-token --no-caveman
fi

# ---------------------------------------------------------------------------
# Summarise
# ---------------------------------------------------------------------------
@@ -213,7 +234,7 @@ results = {
"runs": {}
}

for label in ("default", "low-token"):
for label in ("default", "low-token", "low-token-no-caveman"):
meta_path = Path(f"{prefix}-{label}-meta.json")
if not meta_path.exists():
continue
55 changes: 53 additions & 2 deletions src/zo/cli.py
@@ -266,6 +266,9 @@ def _gate_mode_from_str(value: str) -> GateMode:
    "headlines_disabled": True,  # disable Haiku ticker (~60 calls/hr)
    "gate_mode": "full-auto",  # no human-loop overhead
    "compact_threshold": "60",  # CLAUDE_AUTOCOMPACT_PCT_OVERRIDE
    "caveman": True,  # vendor caveman skill at .claude/skills/caveman/
                      # for terse prose responses; preserves code blocks
                      # and structured artifacts. Opt out via --no-caveman.
}


@@ -298,18 +301,45 @@ def _resolve_gate_mode(
    return "supervised"


def _resolve_caveman(
    *,
    cli_no_caveman: bool,
    plan_caveman: bool | None,
    low_token: bool,
) -> bool:
    """Resolve effective caveman activation.

    Precedence (highest first): CLI ``--no-caveman`` flag > plan field
    ``caveman: false`` > preset default. Caveman only activates when
    ``low_token`` is also True; outside low-token mode the skill file at
    ``.claude/skills/caveman/SKILL.md`` is still available for manual
    invocation, but ZO does not auto-direct agents to use it.
    """
    if not low_token:
        return False
    if cli_no_caveman:
        return False
    if plan_caveman is False:
        return False
    return bool(_LOW_TOKEN_PRESET["caveman"])


def _show_banner(
project: str = "",
mode: str = "",
phase: str = "",
gate_mode: str = "",
low_token: bool = False,
caveman: bool = False,
) -> None:
"""Display the ZO brand panel at startup.

When ``low_token`` is True, appends a "low-token" badge to the
banner so the user has constant visual confirmation that the
cost-saving preset is active.
cost-saving preset is active. When ``caveman`` is also True
(only meaningful with ``low_token``), the badge becomes
"low-token + caveman" to signal the additional terse-output skill
is engaged.
"""
from rich.panel import Panel
from rich.text import Text
@@ -319,7 +349,8 @@ def _show_banner(
logo.append("Zero Operators", style="#F0C040 bold")
logo.append(f" v{_VERSION}", style=_DIM)
if low_token:
logo.append(" [low-token]", style="#F0C040 bold")
badge = " [low-token + caveman]" if caveman else " [low-token]"
logo.append(badge, style="#F0C040 bold")
logo.append("\n", style=_DIM)
logo.append(" Autonomous AI Research & Engineering Teams\n", style=_DIM)
if project:
@@ -910,6 +941,12 @@ def _print_status(team_status, pane_snapshot=""): # noqa: ANN001
    "--no-headlines", is_flag=True,
    help="Disable the Haiku headline ticker (saves ~60 small calls/hour).",
)
@click.option(
    "--no-caveman", is_flag=True,
    help="Opt out of caveman terse-output skill (auto-on with --low-token). "
         "Use if your run produces structured prose that caveman would compress "
         "ambiguously. Code blocks and structured artifacts are always preserved.",
)
def build(
    plan_path: Path,
    gate_mode: str | None,
@@ -918,6 +955,7 @@ def build(
    lead_model: str | None,
    max_iterations: int | None,
    no_headlines: bool,
    no_caveman: bool,
) -> None:
    """Launch a project from a plan.md file.

@@ -960,6 +998,11 @@ def build(
        plan_lead_model=plan.frontmatter.lead_model,
        low_token=effective_low_token,
    )
    effective_caveman = _resolve_caveman(
        cli_no_caveman=no_caveman,
        plan_caveman=plan.frontmatter.caveman,
        low_token=effective_low_token,
    )
    effective_headlines_disabled = (
        no_headlines or effective_low_token
    )
@@ -991,6 +1034,7 @@ def build(
        phase=state_check.phase if detected_mode == "continue" else "starting",
        gate_mode=effective_gate_mode,
        low_token=effective_low_token,
        caveman=effective_caveman,
    )

    # 5. Create CommsLogger and SemanticIndex
@@ -1017,6 +1061,7 @@ def build(
        semantic=semantic, zo_root=zo_root, gate_mode=gm,
        plan_path=plan_path,
        low_token=effective_low_token,
        caveman=effective_caveman,
        max_iterations_override=max_iterations,
    )
    orchestrator.start_session()
@@ -1098,6 +1143,10 @@ def build(
    "--no-headlines", is_flag=True,
    help="Disable the Haiku headline ticker.",
)
@click.option(
    "--no-caveman", is_flag=True,
    help="Opt out of caveman terse-output skill (auto-on with --low-token).",
)
def continue_(
    project_name: str | None,
    repo: str | None,
@@ -1107,6 +1156,7 @@ def continue_(
    lead_model: str | None,
    max_iterations: int | None,
    no_headlines: bool,
    no_caveman: bool,
) -> None:
    """Resume a paused project or reconnect on a new machine.

@@ -1172,6 +1222,7 @@ def continue_(
lead_model=lead_model,
max_iterations=max_iterations,
no_headlines=no_headlines,
no_caveman=no_caveman,
)

