diff --git a/.claude/skills/caveman/LICENSE b/.claude/skills/caveman/LICENSE new file mode 100644 index 0000000..d8c0ee8 --- /dev/null +++ b/.claude/skills/caveman/LICENSE @@ -0,0 +1,21 @@ +MIT License + +Copyright (c) 2026 Julius Brussee + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. diff --git a/.claude/skills/caveman/README.md b/.claude/skills/caveman/README.md new file mode 100644 index 0000000..8d8bead --- /dev/null +++ b/.claude/skills/caveman/README.md @@ -0,0 +1,54 @@ +# caveman skill (vendored — reference copy) + +This directory vendors the [caveman](https://github.com/JuliusBrussee/caveman) Claude Code skill — an ultra-compressed communication mode that cuts agent prose token usage ~65–75% while keeping all technical substance intact. + +## How ZO uses caveman (read this first) + +**ZO does NOT rely on Claude Code auto-loading this `SKILL.md` file.** Caveman's always-on activation in Claude Code comes from upstream's SessionStart and UserPromptSubmit hooks (`hooks/install.sh` in their repo) which copy hook scripts into `~/.claude/hooks/` and patch `~/.claude/settings.json`. We deliberately do not run that installer — modifying the user's global Claude Code config is too invasive for an opt-in cost-saving feature. + +Instead, when `--low-token` is active (and `caveman` is not opted out), the orchestrator inlines the caveman rules directly into the Lead Orchestrator's prompt — `src/zo/orchestrator.py:_prompt_low_token_overrides()` appends a "Token Efficiency: Caveman-Style Prose" subsection that contains the rules verbatim. The lead is instructed to pass the same rules to every sub-agent it spawns. The savings come from agents adopting the style as instructed; no skill or hook system is required. + +This `SKILL.md` file is kept here as: +1. A **reference copy** of the canonical rules — useful if a developer wants to read the upstream spec without leaving the repo. +2. A **forward-compatible artifact** — if a future Claude Code release adds proper project-level skill auto-loading, this file is already in the right place to be picked up. +3. A **reference for users who install caveman properly** — anyone who runs upstream's `install.sh` separately gets the full hook-based always-on enforcement, and this file is a convenient way to confirm version alignment. + +## Why caveman is safe to apply across the team + +The skill explicitly preserves: + +- Code blocks (verbatim) +- Quoted error strings (verbatim) +- Tool inputs (Write/Edit args — caveman only compresses chat prose, not tool calls) +- Structured artifacts (`metrics.jsonl`, `result.md`, `training_status.json`, agent contracts) — these go through Write/Edit, not chat + +It also auto-disables itself for security warnings, irreversible-action confirmations, and any multi-step sequence where dropped articles/conjunctions could create ambiguity. + +## Source + +- Upstream: https://github.com/JuliusBrussee/caveman +- License: MIT (`LICENSE` in this directory — Copyright (c) 2026 Julius Brussee) +- Source-of-truth path upstream: `skills/caveman/SKILL.md` +- Auto-generated mirror at upstream root: `caveman/SKILL.md` (identical content) +- Vendored: 2026-05-05 +- Intensity: full (default) + +## Updating + +To re-sync from upstream's source-of-truth: + +```bash +gh api repos/JuliusBrussee/caveman/contents/skills/caveman/SKILL.md --jq '.content' | base64 -d \ + > .claude/skills/caveman/SKILL.md +``` + +Then verify the YAML frontmatter and `## Persistence` / `## Rules` / `## Intensity` / `## Auto-Clarity` / `## Boundaries` sections are intact, and re-check whether the inlined rules in `src/zo/orchestrator.py:_prompt_low_token_overrides()` need updating to match material changes. + +## Activation summary + +1. User runs `zo build --low-token` (or sets `low_token: true` in plan frontmatter). +2. The `--low-token` preset includes `caveman: True` (default; opt out via `--no-caveman` CLI flag or `caveman: false` plan field). +3. The Lead Orchestrator's prompt gains a "Token Efficiency: Caveman-Style Prose" subsection containing the rules inline. +4. Lead adopts the style; sub-agents receive the same rules in their spawn prompts and follow them too. + +See `docs/concepts/low-token-mode.mdx` and `docs/reference/cost-benchmark.mdx` for measured impact (target: +10–20pp savings on top of the measured ~30% baseline from the lead Opus→Sonnet swap). diff --git a/.claude/skills/caveman/SKILL.md b/.claude/skills/caveman/SKILL.md new file mode 100644 index 0000000..09ea3ef --- /dev/null +++ b/.claude/skills/caveman/SKILL.md @@ -0,0 +1,74 @@ +--- +name: caveman +description: > + Ultra-compressed communication mode. Cuts token usage ~75% by speaking like caveman + while keeping full technical accuracy. Supports intensity levels: lite, full (default), ultra, + wenyan-lite, wenyan-full, wenyan-ultra. + Use when user says "caveman mode", "talk like caveman", "use caveman", "less tokens", + "be brief", or invokes /caveman. Also auto-triggers when token efficiency is requested. +--- + +Respond terse like smart caveman. All technical substance stay. Only fluff die. + +## Persistence + +ACTIVE EVERY RESPONSE. No revert after many turns. No filler drift. Still active if unsure. Off only: "stop caveman" / "normal mode". + +Default: **full**. Switch: `/caveman lite|full|ultra`. + +## Rules + +Drop: articles (a/an/the), filler (just/really/basically/actually/simply), pleasantries (sure/certainly/of course/happy to), hedging. Fragments OK. Short synonyms (big not extensive, fix not "implement a solution for"). Technical terms exact. Code blocks unchanged. Errors quoted exact. + +Pattern: `[thing] [action] [reason]. [next step].` + +Not: "Sure! I'd be happy to help you with that. The issue you're experiencing is likely caused by..." +Yes: "Bug in auth middleware. Token expiry check use `<` not `<=`. Fix:" + +## Intensity + +| Level | What change | +|-------|------------| +| **lite** | No filler/hedging. Keep articles + full sentences. Professional but tight | +| **full** | Drop articles, fragments OK, short synonyms. Classic caveman | +| **ultra** | Abbreviate prose words (DB/auth/config/req/res/fn/impl), strip conjunctions, arrows for causality (X → Y), one word when one word enough. Code symbols, function names, API names, error strings: never abbreviate | +| **wenyan-lite** | Semi-classical. Drop filler/hedging but keep grammar structure, classical register | +| **wenyan-full** | Maximum classical terseness. Fully 文言文. 80-90% character reduction. Classical sentence patterns, verbs precede objects, subjects often omitted, classical particles (之/乃/為/其) | +| **wenyan-ultra** | Extreme abbreviation while keeping classical Chinese feel. Maximum compression, ultra terse | + +Example — "Why React component re-render?" +- lite: "Your component re-renders because you create a new object reference each render. Wrap it in `useMemo`." +- full: "New object ref each render. Inline object prop = new ref = re-render. Wrap in `useMemo`." +- ultra: "Inline obj prop → new ref → re-render. `useMemo`." +- wenyan-lite: "組件頻重繪,以每繪新生對象參照故。以 useMemo 包之。" +- wenyan-full: "物出新參照,致重繪。useMemo .Wrap之。" +- wenyan-ultra: "新參照→重繪。useMemo Wrap。" + +Example — "Explain database connection pooling." +- lite: "Connection pooling reuses open connections instead of creating new ones per request. Avoids repeated handshake overhead." +- full: "Pool reuse open DB connections. No new connection per request. Skip handshake overhead." +- ultra: "Pool = reuse DB conn. Skip handshake → fast under load." +- wenyan-full: "池reuse open connection。不每req新開。skip handshake overhead。" +- wenyan-ultra: "池reuse conn。skip handshake → fast。" + +## Auto-Clarity + +Drop caveman when: +- Security warnings +- Irreversible action confirmations +- Multi-step sequences where fragment order or omitted conjunctions risk misread +- Compression itself creates technical ambiguity (e.g., `"migrate table drop column backup first"` — order unclear without articles/conjunctions) +- User asks to clarify or repeats question + +Resume caveman after clear part done. + +Example — destructive op: +> **Warning:** This will permanently delete all rows in the `users` table and cannot be undone. +> ```sql +> DROP TABLE users; +> ``` +> Caveman resume. Verify backup exist first. + +## Boundaries + +Code/commits/PRs: write normal. "stop caveman" or "normal mode": revert. Level persist until changed or session end. diff --git a/scripts/benchmark_low_token.sh b/scripts/benchmark_low_token.sh index a78d64d..01daadf 100755 --- a/scripts/benchmark_low_token.sh +++ b/scripts/benchmark_low_token.sh @@ -1,25 +1,38 @@ #!/usr/bin/env bash -# benchmark_low_token.sh — measure token cost reduction from --low-token mode. +# benchmark_low_token.sh — measure token cost reduction from --low-token mode +# and caveman-skill ablation. # -# Runs `zo build` against the MNIST plan twice (default mode + low-token mode), -# captures Claude Code token usage via `npx ccusage`, and writes a comparison -# JSON + console summary. +# Runs `zo build` against the MNIST plan up to THREE times: +# - default (no flags — historical anchor, ~$11) +# - low-token (--low-token, caveman default-on — measures the +# full preset including caveman skill) +# - low-token-no-caveman (--low-token --no-caveman — isolates caveman's +# contribution within low-token mode) +# +# Comparing low-token vs low-token-no-caveman gives the caveman delta. +# Comparing default vs low-token gives the full low-token-mode savings. +# +# Captures Claude Code token usage via `npx ccusage` and writes a +# comparison JSON + console summary. # # Prerequisites: # - `zo` CLI installed (this repo, ./setup.sh) # - `claude` CLI logged in # - `npx ccusage` available (npm install -g ccusage) -# - Mac or Linux dev box (Apple Silicon recommended for ~25min low-token wall time) -# - ~75 minutes wall time, ~$13-14 spend on Anthropic API +# - Mac or Linux dev box (Apple Silicon recommended) +# - ~110 minutes wall time across all three runs, ~$20-22 spend on Anthropic API +# (default ~$11 + low-token ~$5-6 + low-token-no-caveman ~$6-7) # # Usage: # ./scripts/benchmark_low_token.sh # ./scripts/benchmark_low_token.sh --delivery-prefix /tmp/zo-bench -# ./scripts/benchmark_low_token.sh --skip-default --skip-low-token # dry preview +# ./scripts/benchmark_low_token.sh --skip-default # only low-token variants +# ./scripts/benchmark_low_token.sh --skip-low-token-no-caveman # skip ablation arm +# ./scripts/benchmark_low_token.sh --skip-default --skip-low-token --skip-low-token-no-caveman # dry preview # ./scripts/benchmark_low_token.sh --help # # Output: -# benchmark-results-{ISO-timestamp}.json — full diff + summary +# benchmark-results-{ISO-timestamp}.json — full diff + summary across all runs # stdout — human-readable summary table set -euo pipefail @@ -31,6 +44,7 @@ set -euo pipefail DELIVERY_PREFIX="${TMPDIR:-/tmp}/zo-low-token-bench" RUN_DEFAULT=true RUN_LOW_TOKEN=true +RUN_LOW_TOKEN_NO_CAVEMAN=true TIMESTAMP="$(date -u +%Y%m%d-%H%M%S)" RESULT_FILE="benchmark-results-${TIMESTAMP}.json" @@ -42,6 +56,8 @@ while [[ $# -gt 0 ]]; do RUN_DEFAULT=false; shift ;; --skip-low-token) RUN_LOW_TOKEN=false; shift ;; + --skip-low-token-no-caveman) + RUN_LOW_TOKEN_NO_CAVEMAN=false; shift ;; --help|-h) grep '^# ' "$0" | sed 's/^# //' exit 0 ;; @@ -54,6 +70,7 @@ done DEFAULT_DELIVERY="${DELIVERY_PREFIX}-default" LOW_TOKEN_DELIVERY="${DELIVERY_PREFIX}-low-token" +LOW_TOKEN_NO_CAVEMAN_DELIVERY="${DELIVERY_PREFIX}-low-token-no-caveman" PLAN_PATH="plans/mnist-digit-classifier.md" # --------------------------------------------------------------------------- @@ -189,6 +206,10 @@ if [[ "$RUN_LOW_TOKEN" == "true" ]]; then run_one "low-token" "$LOW_TOKEN_DELIVERY" --low-token fi +if [[ "$RUN_LOW_TOKEN_NO_CAVEMAN" == "true" ]]; then + run_one "low-token-no-caveman" "$LOW_TOKEN_NO_CAVEMAN_DELIVERY" --low-token --no-caveman +fi + # --------------------------------------------------------------------------- # Summarise # --------------------------------------------------------------------------- @@ -213,7 +234,7 @@ results = { "runs": {} } -for label in ("default", "low-token"): +for label in ("default", "low-token", "low-token-no-caveman"): meta_path = Path(f"{prefix}-{label}-meta.json") if not meta_path.exists(): continue diff --git a/src/zo/cli.py b/src/zo/cli.py index 493f20f..c3b3167 100644 --- a/src/zo/cli.py +++ b/src/zo/cli.py @@ -266,6 +266,9 @@ def _gate_mode_from_str(value: str) -> GateMode: "headlines_disabled": True, # disable Haiku ticker (~60 calls/hr) "gate_mode": "full-auto", # no human-loop overhead "compact_threshold": "60", # CLAUDE_AUTOCOMPACT_PCT_OVERRIDE + "caveman": True, # vendor caveman skill at .claude/skills/caveman/ + # for terse prose responses; preserves code blocks + # and structured artifacts. Opt out via --no-caveman. } @@ -298,18 +301,45 @@ def _resolve_gate_mode( return "supervised" +def _resolve_caveman( + *, + cli_no_caveman: bool, + plan_caveman: bool | None, + low_token: bool, +) -> bool: + """Resolve effective caveman activation. + + Precedence (highest first): CLI ``--no-caveman`` flag > plan field + ``caveman: false`` > preset default. Caveman only activates when + ``low_token`` is also True; outside low-token mode the skill file at + ``.claude/skills/caveman/SKILL.md`` is still available for manual + invocation, but ZO does not auto-direct agents to use it. + """ + if not low_token: + return False + if cli_no_caveman: + return False + if plan_caveman is False: + return False + return bool(_LOW_TOKEN_PRESET["caveman"]) + + def _show_banner( project: str = "", mode: str = "", phase: str = "", gate_mode: str = "", low_token: bool = False, + caveman: bool = False, ) -> None: """Display the ZO brand panel at startup. When ``low_token`` is True, appends a "low-token" badge to the banner so the user has constant visual confirmation that the - cost-saving preset is active. + cost-saving preset is active. When ``caveman`` is also True + (only meaningful with ``low_token``), the badge becomes + "low-token + caveman" to signal the additional terse-output skill + is engaged. """ from rich.panel import Panel from rich.text import Text @@ -319,7 +349,8 @@ def _show_banner( logo.append("Zero Operators", style="#F0C040 bold") logo.append(f" v{_VERSION}", style=_DIM) if low_token: - logo.append(" [low-token]", style="#F0C040 bold") + badge = " [low-token + caveman]" if caveman else " [low-token]" + logo.append(badge, style="#F0C040 bold") logo.append("\n", style=_DIM) logo.append(" Autonomous AI Research & Engineering Teams\n", style=_DIM) if project: @@ -910,6 +941,12 @@ def _print_status(team_status, pane_snapshot=""): # noqa: ANN001 "--no-headlines", is_flag=True, help="Disable the Haiku headline ticker (saves ~60 small calls/hour).", ) +@click.option( + "--no-caveman", is_flag=True, + help="Opt out of caveman terse-output skill (auto-on with --low-token). " + "Use if your run produces structured prose that caveman would compress " + "ambiguously. Code blocks and structured artifacts are always preserved.", +) def build( plan_path: Path, gate_mode: str | None, @@ -918,6 +955,7 @@ def build( lead_model: str | None, max_iterations: int | None, no_headlines: bool, + no_caveman: bool, ) -> None: """Launch a project from a plan.md file. @@ -960,6 +998,11 @@ def build( plan_lead_model=plan.frontmatter.lead_model, low_token=effective_low_token, ) + effective_caveman = _resolve_caveman( + cli_no_caveman=no_caveman, + plan_caveman=plan.frontmatter.caveman, + low_token=effective_low_token, + ) effective_headlines_disabled = ( no_headlines or effective_low_token ) @@ -991,6 +1034,7 @@ def build( phase=state_check.phase if detected_mode == "continue" else "starting", gate_mode=effective_gate_mode, low_token=effective_low_token, + caveman=effective_caveman, ) # 5. Create CommsLogger and SemanticIndex @@ -1017,6 +1061,7 @@ def build( semantic=semantic, zo_root=zo_root, gate_mode=gm, plan_path=plan_path, low_token=effective_low_token, + caveman=effective_caveman, max_iterations_override=max_iterations, ) orchestrator.start_session() @@ -1098,6 +1143,10 @@ def build( "--no-headlines", is_flag=True, help="Disable the Haiku headline ticker.", ) +@click.option( + "--no-caveman", is_flag=True, + help="Opt out of caveman terse-output skill (auto-on with --low-token).", +) def continue_( project_name: str | None, repo: str | None, @@ -1107,6 +1156,7 @@ def continue_( lead_model: str | None, max_iterations: int | None, no_headlines: bool, + no_caveman: bool, ) -> None: """Resume a paused project or reconnect on a new machine. @@ -1172,6 +1222,7 @@ def continue_( lead_model=lead_model, max_iterations=max_iterations, no_headlines=no_headlines, + no_caveman=no_caveman, ) diff --git a/src/zo/orchestrator.py b/src/zo/orchestrator.py index 8d9419e..91f91ed 100644 --- a/src/zo/orchestrator.py +++ b/src/zo/orchestrator.py @@ -171,6 +171,7 @@ def __init__( gate_mode: GateMode = GateMode.SUPERVISED, plan_path: Path | None = None, low_token: bool = False, + caveman: bool = False, max_iterations_override: int | None = None, ) -> None: self._plan = plan @@ -182,6 +183,7 @@ def __init__( self._gate_mode = gate_mode self._plan_path = plan_path self._low_token = low_token + self._caveman = caveman self._max_iterations_override = max_iterations_override self._workflow: WorkflowDecomposition | None = None self._session_state: SessionState | None = None @@ -207,6 +209,17 @@ def low_token(self) -> bool: """Whether the low-token preset is active for this orchestrator.""" return self._low_token + @property + def caveman(self) -> bool: + """Whether the caveman terse-output skill is active. + + Only meaningful when ``low_token`` is also True; outside + low-token mode the orchestrator does not auto-direct agents to + invoke caveman (the skill file is still available at + ``.claude/skills/caveman/SKILL.md`` for manual use). + """ + return self._caveman + @property def session_state(self) -> SessionState: """Current session state (reads from memory if not yet loaded).""" @@ -399,13 +412,22 @@ def _prompt_low_token_overrides(self) -> str: Without this two-tier routing the savings ceiling is ~30% (lead-swap-only); with it, the ceiling moves to ~50-60% per ``docs/reference/cost-benchmark.mdx``. + + When ``caveman`` is also active (default with low-token; opt out + via ``--no-caveman`` or plan ``caveman: false``), an additional + "Token Efficiency: Caveman-Style Prose" subsection is appended + containing the caveman rules **inline** (no Claude Code skill + auto-loading dependency — that path requires hooks we + deliberately don't install). The lead is instructed to pass the + same rules into every sub-agent's spawn prompt so the team + adopts terse prose end-to-end. """ if not self._low_token: return "" haiku_list = ", ".join( f"`{name}`" for name in sorted(LOW_TOKEN_HAIKU_AGENTS) ) - return dedent(f"""\ + sections = [dedent(f"""\ # Low-Token Sub-Agent Model Override **CRITICAL:** Low-token mode is active. The lead session (you) @@ -454,7 +476,61 @@ def _prompt_low_token_overrides(self) -> str: does not accept a `model` parameter, log a `DECISION_LOG` entry noting this and proceed without the override — lead-side savings still apply, but sub-agent savings - cannot be achieved without an SDK-level fix.""") + cannot be achieved without an SDK-level fix.""")] + if self._caveman: + sections.append(dedent("""\ + # Token Efficiency: Caveman-Style Prose + + **Adopt terse, caveman-style prose for all chat + responses.** Pass this same instruction to every + sub-agent you spawn (include the rules below in their + spawn prompt — `Agent(..., prompt="...caveman rules...")`). + The reference skill is vendored at + `.claude/skills/caveman/SKILL.md` for users who install + the upstream hooks; we do not assume those hooks are + present, so the rules below are authoritative for this + session. + + **Rules (intensity: full):** + - Drop articles (a/an/the), filler (just/really/basically/ + actually/simply), pleasantries (sure/certainly/of course/ + happy to), hedging. + - Fragments OK. Short synonyms (big not extensive, + fix not "implement a solution for"). Technical terms + exact. + - Pattern: `[thing] [action] [reason]. [next step].` + - Example bad: "Sure! I'd be happy to help. The issue + is likely caused by your authentication middleware..." + - Example good: "Bug in auth middleware. Token expiry + check use `<` not `<=`. Fix:" + + **What stays VERBATIM (do not compress):** + - Code blocks — never compressed, every character intact. + - Quoted error strings — exact reproduction. + - Tool inputs (Write/Edit/Bash args) — we only compress + chat output, not tool calls. So all structured + artifacts the gates require (`metrics.jsonl`, + `result.md`, `training_status.json`, `hypothesis.md`, + agent contracts) are written normally — they go + through Write/Edit, not chat. + - Security warnings, irreversible-action confirmations, + multi-step sequences where dropped articles/conjunctions + could create order ambiguity. Drop caveman for these + and resume after the clear part is done. + + **DECISION_LOG entries, gate rationales, and + hand-off summaries:** these are persisted artifacts. + Apply caveman lightly — terse prose, but still readable + by the next session that reads the file cold. Lean + toward `lite` intensity for these (no filler/hedging, + keep articles + full sentences) over `full` (fragments, + drop articles). + + **Opt-out for the user:** they can pass `--no-caveman` at + CLI time or set `caveman: false` in plan frontmatter. If + they did, this subsection would not be present and you + would write normal prose.""" )) + return "\n\n".join(sections) def _prompt_autonomy(self) -> str: """Tell the agent how much autonomy it has based on gate mode.""" diff --git a/src/zo/plan.py b/src/zo/plan.py index 2535395..9fe28ff 100644 --- a/src/zo/plan.py +++ b/src/zo/plan.py @@ -39,6 +39,12 @@ class PlanFrontmatter(BaseModel): activates the preset; ``lead_model`` overrides the lead orchestrator model. Both can also be supplied via CLI flags, which take precedence (CLI > plan field > preset default > base default). + + ``caveman`` is opt-out for the caveman terse-output skill that + auto-activates with ``low_token``. ``None`` (the default) means + "use the preset default" (currently True). Set to ``False`` in plan + frontmatter to suppress caveman activation even when low-token is + on. CLI flag ``--no-caveman`` takes precedence over plan and preset. """ project_name: str @@ -49,6 +55,7 @@ class PlanFrontmatter(BaseModel): owner: str low_token: bool = False lead_model: str | None = None + caveman: bool | None = None class OracleDefinition(BaseModel):