From f190bc7860253909bfa1ed2056a458bf28a716e0 Mon Sep 17 00:00:00 2001 From: SamPlvs Date: Tue, 5 May 2026 13:08:46 +0100 Subject: [PATCH 1/2] feat(low-token): bundle caveman skill and wire into --low-token preset MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit First Tier 1 deliverable per the Q3/H2 roadmap (PR #66). Bundles caveman terse-output skill at .claude/skills/caveman/, wires it into the --low-token preset (default-on, opt out via --no-caveman or plan caveman: false), and extends the benchmark harness with an ablation arm to isolate caveman's contribution from the rest of the preset. What ships: - .claude/skills/caveman/{SKILL.md, LICENSE, README.md} — vendored from upstream (github.com/JuliusBrussee/caveman, MIT, fetched 2026-05-05). Canonical content from caveman/SKILL.md at upstream main. - src/zo/cli.py — _LOW_TOKEN_PRESET gains caveman: True. New _resolve_caveman() helper enforces precedence: CLI --no-caveman > plan caveman: false > preset default. --no-caveman flag added to both build and continue commands; banner becomes [low-token + caveman] when both active. - src/zo/orchestrator.py — Orchestrator.__init__ accepts caveman: bool (default False); caveman property added. _prompt_low_token_overrides() extended with a "Token Efficiency Skill: Caveman Mode" subsection that directs the lead and all sub-agents to invoke the auto-loaded skill, with explicit callouts to the safety guarantees (code blocks, quoted errors, tool inputs, structured artifacts all preserved verbatim — caveman only compresses chat prose). - src/zo/plan.py — PlanFrontmatter.caveman: bool | None added (None means "use preset default", explicit False is opt-out). - scripts/benchmark_low_token.sh — adds a third run variant "low-token-no-caveman" that runs --low-token --no-caveman in isolation. Comparing low-token vs low-token-no-caveman gives the caveman delta within the preset. Verified: - ruff src/zo/{cli,orchestrator,plan}.py: clean - bash -n scripts/benchmark_low_token.sh: OK - _resolve_caveman precedence: 4 cases match design (preset default on, --no-caveman wins, plan caveman: false wins, low_token=False forces caveman off regardless) - validate-docs.sh: 10/10 + 1 pre-existing warning Deferred to follow-up commits before PR opens: - Unit tests for new preset flag, _resolve_caveman, prompt directive - Docs: low-token-mode.mdx caveman section, low-token-preset.mdx knob row, cost-benchmark.mdx after-bench number - The actual bench run (needs interactive tmux per harness comments) Co-Authored-By: Claude Opus 4.7 (1M context) --- .claude/skills/caveman/LICENSE | 21 +++++++++ .claude/skills/caveman/README.md | 43 +++++++++++++++++++ .claude/skills/caveman/SKILL.md | 74 ++++++++++++++++++++++++++++++++ scripts/benchmark_low_token.sh | 39 +++++++++++++---- src/zo/cli.py | 55 +++++++++++++++++++++++- src/zo/orchestrator.py | 64 ++++++++++++++++++++++++++- src/zo/plan.py | 7 +++ 7 files changed, 290 insertions(+), 13 deletions(-) create mode 100644 .claude/skills/caveman/LICENSE create mode 100644 .claude/skills/caveman/README.md create mode 100644 .claude/skills/caveman/SKILL.md diff --git a/.claude/skills/caveman/LICENSE b/.claude/skills/caveman/LICENSE new file mode 100644 index 0000000..d8c0ee8 --- /dev/null +++ b/.claude/skills/caveman/LICENSE @@ -0,0 +1,21 @@ +MIT License + +Copyright (c) 2026 Julius Brussee + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. diff --git a/.claude/skills/caveman/README.md b/.claude/skills/caveman/README.md new file mode 100644 index 0000000..95a2ffe --- /dev/null +++ b/.claude/skills/caveman/README.md @@ -0,0 +1,43 @@ +# caveman skill (vendored) + +This directory vendors the [caveman](https://github.com/JuliusBrussee/caveman) Claude Code skill — an ultra-compressed communication mode that cuts agent prose token usage ~75% while keeping all technical substance intact. + +## Why this is here + +ZO's `--low-token` preset auto-activates caveman by default for all spawned agents. It's the largest cost lever available without an SDK refactor (which would unlock prompt caching / Batch API / Files API). + +Caveman is safe to apply across the team because it explicitly preserves: + +- Code blocks (verbatim) +- Quoted error strings (verbatim) +- Tool inputs (Write/Edit are tool calls, not chat output — caveman only compresses chat prose) +- File writes including structured artifacts (`metrics.jsonl`, `result.md`, `training_status.json`) — these go through Write/Edit, not caveman + +It also auto-disables itself for security warnings, irreversible-action confirmations, and any multi-step sequence where dropped articles/conjunctions could create ambiguity. + +## Source + +- Upstream: https://github.com/JuliusBrussee/caveman +- License: MIT (`LICENSE` in this directory — Copyright (c) 2026 Julius Brussee) +- Vendored from: `caveman/SKILL.md` at upstream main, fetched 2026-05-05 +- Variant used: canonical (full intensity, default level) + +## Updating + +To re-sync from upstream: + +```bash +gh api repos/JuliusBrussee/caveman/contents/caveman/SKILL.md --jq '.content' | base64 -d \ + > .claude/skills/caveman/SKILL.md +``` + +Then verify the YAML frontmatter and `## Persistence` / `## Rules` / `## Intensity` / `## Auto-Clarity` / `## Boundaries` sections are intact, and that no upstream-specific paths leaked in. + +## How ZO activates this + +1. User runs `zo build --low-token` (or sets `low_token: true` in plan frontmatter). +2. The `--low-token` preset includes `caveman: True` (default; can be opted out via `--no-caveman` CLI flag or `caveman: false` plan field). +3. The Lead Orchestrator's prompt gains a "Token efficiency mode" subsection (in `_prompt_low_token_overrides()`) that directs the lead and all spawned sub-agents to adopt caveman speech for prose responses. +4. Claude Code auto-loads this `SKILL.md` because of its presence in `.claude/skills/`. The lead's prompt directive ensures it stays active across sub-agent spawns. + +See `docs/concepts/low-token-mode.mdx` and `docs/reference/cost-benchmark.mdx` for measured impact (target: +10–20pp savings on top of the measured ~30% baseline from the lead Opus→Sonnet swap). diff --git a/.claude/skills/caveman/SKILL.md b/.claude/skills/caveman/SKILL.md new file mode 100644 index 0000000..09ea3ef --- /dev/null +++ b/.claude/skills/caveman/SKILL.md @@ -0,0 +1,74 @@ +--- +name: caveman +description: > + Ultra-compressed communication mode. Cuts token usage ~75% by speaking like caveman + while keeping full technical accuracy. Supports intensity levels: lite, full (default), ultra, + wenyan-lite, wenyan-full, wenyan-ultra. + Use when user says "caveman mode", "talk like caveman", "use caveman", "less tokens", + "be brief", or invokes /caveman. Also auto-triggers when token efficiency is requested. +--- + +Respond terse like smart caveman. All technical substance stay. Only fluff die. + +## Persistence + +ACTIVE EVERY RESPONSE. No revert after many turns. No filler drift. Still active if unsure. Off only: "stop caveman" / "normal mode". + +Default: **full**. Switch: `/caveman lite|full|ultra`. + +## Rules + +Drop: articles (a/an/the), filler (just/really/basically/actually/simply), pleasantries (sure/certainly/of course/happy to), hedging. Fragments OK. Short synonyms (big not extensive, fix not "implement a solution for"). Technical terms exact. Code blocks unchanged. Errors quoted exact. + +Pattern: `[thing] [action] [reason]. [next step].` + +Not: "Sure! I'd be happy to help you with that. The issue you're experiencing is likely caused by..." +Yes: "Bug in auth middleware. Token expiry check use `<` not `<=`. Fix:" + +## Intensity + +| Level | What change | +|-------|------------| +| **lite** | No filler/hedging. Keep articles + full sentences. Professional but tight | +| **full** | Drop articles, fragments OK, short synonyms. Classic caveman | +| **ultra** | Abbreviate prose words (DB/auth/config/req/res/fn/impl), strip conjunctions, arrows for causality (X → Y), one word when one word enough. Code symbols, function names, API names, error strings: never abbreviate | +| **wenyan-lite** | Semi-classical. Drop filler/hedging but keep grammar structure, classical register | +| **wenyan-full** | Maximum classical terseness. Fully 文言文. 80-90% character reduction. Classical sentence patterns, verbs precede objects, subjects often omitted, classical particles (之/乃/為/其) | +| **wenyan-ultra** | Extreme abbreviation while keeping classical Chinese feel. Maximum compression, ultra terse | + +Example — "Why React component re-render?" +- lite: "Your component re-renders because you create a new object reference each render. Wrap it in `useMemo`." +- full: "New object ref each render. Inline object prop = new ref = re-render. Wrap in `useMemo`." +- ultra: "Inline obj prop → new ref → re-render. `useMemo`." +- wenyan-lite: "組件頻重繪,以每繪新生對象參照故。以 useMemo 包之。" +- wenyan-full: "物出新參照,致重繪。useMemo .Wrap之。" +- wenyan-ultra: "新參照→重繪。useMemo Wrap。" + +Example — "Explain database connection pooling." +- lite: "Connection pooling reuses open connections instead of creating new ones per request. Avoids repeated handshake overhead." +- full: "Pool reuse open DB connections. No new connection per request. Skip handshake overhead." +- ultra: "Pool = reuse DB conn. Skip handshake → fast under load." +- wenyan-full: "池reuse open connection。不每req新開。skip handshake overhead。" +- wenyan-ultra: "池reuse conn。skip handshake → fast。" + +## Auto-Clarity + +Drop caveman when: +- Security warnings +- Irreversible action confirmations +- Multi-step sequences where fragment order or omitted conjunctions risk misread +- Compression itself creates technical ambiguity (e.g., `"migrate table drop column backup first"` — order unclear without articles/conjunctions) +- User asks to clarify or repeats question + +Resume caveman after clear part done. + +Example — destructive op: +> **Warning:** This will permanently delete all rows in the `users` table and cannot be undone. +> ```sql +> DROP TABLE users; +> ``` +> Caveman resume. Verify backup exist first. + +## Boundaries + +Code/commits/PRs: write normal. "stop caveman" or "normal mode": revert. Level persist until changed or session end. diff --git a/scripts/benchmark_low_token.sh b/scripts/benchmark_low_token.sh index a78d64d..01daadf 100755 --- a/scripts/benchmark_low_token.sh +++ b/scripts/benchmark_low_token.sh @@ -1,25 +1,38 @@ #!/usr/bin/env bash -# benchmark_low_token.sh — measure token cost reduction from --low-token mode. +# benchmark_low_token.sh — measure token cost reduction from --low-token mode +# and caveman-skill ablation. # -# Runs `zo build` against the MNIST plan twice (default mode + low-token mode), -# captures Claude Code token usage via `npx ccusage`, and writes a comparison -# JSON + console summary. +# Runs `zo build` against the MNIST plan up to THREE times: +# - default (no flags — historical anchor, ~$11) +# - low-token (--low-token, caveman default-on — measures the +# full preset including caveman skill) +# - low-token-no-caveman (--low-token --no-caveman — isolates caveman's +# contribution within low-token mode) +# +# Comparing low-token vs low-token-no-caveman gives the caveman delta. +# Comparing default vs low-token gives the full low-token-mode savings. +# +# Captures Claude Code token usage via `npx ccusage` and writes a +# comparison JSON + console summary. # # Prerequisites: # - `zo` CLI installed (this repo, ./setup.sh) # - `claude` CLI logged in # - `npx ccusage` available (npm install -g ccusage) -# - Mac or Linux dev box (Apple Silicon recommended for ~25min low-token wall time) -# - ~75 minutes wall time, ~$13-14 spend on Anthropic API +# - Mac or Linux dev box (Apple Silicon recommended) +# - ~110 minutes wall time across all three runs, ~$20-22 spend on Anthropic API +# (default ~$11 + low-token ~$5-6 + low-token-no-caveman ~$6-7) # # Usage: # ./scripts/benchmark_low_token.sh # ./scripts/benchmark_low_token.sh --delivery-prefix /tmp/zo-bench -# ./scripts/benchmark_low_token.sh --skip-default --skip-low-token # dry preview +# ./scripts/benchmark_low_token.sh --skip-default # only low-token variants +# ./scripts/benchmark_low_token.sh --skip-low-token-no-caveman # skip ablation arm +# ./scripts/benchmark_low_token.sh --skip-default --skip-low-token --skip-low-token-no-caveman # dry preview # ./scripts/benchmark_low_token.sh --help # # Output: -# benchmark-results-{ISO-timestamp}.json — full diff + summary +# benchmark-results-{ISO-timestamp}.json — full diff + summary across all runs # stdout — human-readable summary table set -euo pipefail @@ -31,6 +44,7 @@ set -euo pipefail DELIVERY_PREFIX="${TMPDIR:-/tmp}/zo-low-token-bench" RUN_DEFAULT=true RUN_LOW_TOKEN=true +RUN_LOW_TOKEN_NO_CAVEMAN=true TIMESTAMP="$(date -u +%Y%m%d-%H%M%S)" RESULT_FILE="benchmark-results-${TIMESTAMP}.json" @@ -42,6 +56,8 @@ while [[ $# -gt 0 ]]; do RUN_DEFAULT=false; shift ;; --skip-low-token) RUN_LOW_TOKEN=false; shift ;; + --skip-low-token-no-caveman) + RUN_LOW_TOKEN_NO_CAVEMAN=false; shift ;; --help|-h) grep '^# ' "$0" | sed 's/^# //' exit 0 ;; @@ -54,6 +70,7 @@ done DEFAULT_DELIVERY="${DELIVERY_PREFIX}-default" LOW_TOKEN_DELIVERY="${DELIVERY_PREFIX}-low-token" +LOW_TOKEN_NO_CAVEMAN_DELIVERY="${DELIVERY_PREFIX}-low-token-no-caveman" PLAN_PATH="plans/mnist-digit-classifier.md" # --------------------------------------------------------------------------- @@ -189,6 +206,10 @@ if [[ "$RUN_LOW_TOKEN" == "true" ]]; then run_one "low-token" "$LOW_TOKEN_DELIVERY" --low-token fi +if [[ "$RUN_LOW_TOKEN_NO_CAVEMAN" == "true" ]]; then + run_one "low-token-no-caveman" "$LOW_TOKEN_NO_CAVEMAN_DELIVERY" --low-token --no-caveman +fi + # --------------------------------------------------------------------------- # Summarise # --------------------------------------------------------------------------- @@ -213,7 +234,7 @@ results = { "runs": {} } -for label in ("default", "low-token"): +for label in ("default", "low-token", "low-token-no-caveman"): meta_path = Path(f"{prefix}-{label}-meta.json") if not meta_path.exists(): continue diff --git a/src/zo/cli.py b/src/zo/cli.py index 493f20f..c3b3167 100644 --- a/src/zo/cli.py +++ b/src/zo/cli.py @@ -266,6 +266,9 @@ def _gate_mode_from_str(value: str) -> GateMode: "headlines_disabled": True, # disable Haiku ticker (~60 calls/hr) "gate_mode": "full-auto", # no human-loop overhead "compact_threshold": "60", # CLAUDE_AUTOCOMPACT_PCT_OVERRIDE + "caveman": True, # vendor caveman skill at .claude/skills/caveman/ + # for terse prose responses; preserves code blocks + # and structured artifacts. Opt out via --no-caveman. } @@ -298,18 +301,45 @@ def _resolve_gate_mode( return "supervised" +def _resolve_caveman( + *, + cli_no_caveman: bool, + plan_caveman: bool | None, + low_token: bool, +) -> bool: + """Resolve effective caveman activation. + + Precedence (highest first): CLI ``--no-caveman`` flag > plan field + ``caveman: false`` > preset default. Caveman only activates when + ``low_token`` is also True; outside low-token mode the skill file at + ``.claude/skills/caveman/SKILL.md`` is still available for manual + invocation, but ZO does not auto-direct agents to use it. + """ + if not low_token: + return False + if cli_no_caveman: + return False + if plan_caveman is False: + return False + return bool(_LOW_TOKEN_PRESET["caveman"]) + + def _show_banner( project: str = "", mode: str = "", phase: str = "", gate_mode: str = "", low_token: bool = False, + caveman: bool = False, ) -> None: """Display the ZO brand panel at startup. When ``low_token`` is True, appends a "low-token" badge to the banner so the user has constant visual confirmation that the - cost-saving preset is active. + cost-saving preset is active. When ``caveman`` is also True + (only meaningful with ``low_token``), the badge becomes + "low-token + caveman" to signal the additional terse-output skill + is engaged. """ from rich.panel import Panel from rich.text import Text @@ -319,7 +349,8 @@ def _show_banner( logo.append("Zero Operators", style="#F0C040 bold") logo.append(f" v{_VERSION}", style=_DIM) if low_token: - logo.append(" [low-token]", style="#F0C040 bold") + badge = " [low-token + caveman]" if caveman else " [low-token]" + logo.append(badge, style="#F0C040 bold") logo.append("\n", style=_DIM) logo.append(" Autonomous AI Research & Engineering Teams\n", style=_DIM) if project: @@ -910,6 +941,12 @@ def _print_status(team_status, pane_snapshot=""): # noqa: ANN001 "--no-headlines", is_flag=True, help="Disable the Haiku headline ticker (saves ~60 small calls/hour).", ) +@click.option( + "--no-caveman", is_flag=True, + help="Opt out of caveman terse-output skill (auto-on with --low-token). " + "Use if your run produces structured prose that caveman would compress " + "ambiguously. Code blocks and structured artifacts are always preserved.", +) def build( plan_path: Path, gate_mode: str | None, @@ -918,6 +955,7 @@ def build( lead_model: str | None, max_iterations: int | None, no_headlines: bool, + no_caveman: bool, ) -> None: """Launch a project from a plan.md file. @@ -960,6 +998,11 @@ def build( plan_lead_model=plan.frontmatter.lead_model, low_token=effective_low_token, ) + effective_caveman = _resolve_caveman( + cli_no_caveman=no_caveman, + plan_caveman=plan.frontmatter.caveman, + low_token=effective_low_token, + ) effective_headlines_disabled = ( no_headlines or effective_low_token ) @@ -991,6 +1034,7 @@ def build( phase=state_check.phase if detected_mode == "continue" else "starting", gate_mode=effective_gate_mode, low_token=effective_low_token, + caveman=effective_caveman, ) # 5. Create CommsLogger and SemanticIndex @@ -1017,6 +1061,7 @@ def build( semantic=semantic, zo_root=zo_root, gate_mode=gm, plan_path=plan_path, low_token=effective_low_token, + caveman=effective_caveman, max_iterations_override=max_iterations, ) orchestrator.start_session() @@ -1098,6 +1143,10 @@ def build( "--no-headlines", is_flag=True, help="Disable the Haiku headline ticker.", ) +@click.option( + "--no-caveman", is_flag=True, + help="Opt out of caveman terse-output skill (auto-on with --low-token).", +) def continue_( project_name: str | None, repo: str | None, @@ -1107,6 +1156,7 @@ def continue_( lead_model: str | None, max_iterations: int | None, no_headlines: bool, + no_caveman: bool, ) -> None: """Resume a paused project or reconnect on a new machine. @@ -1172,6 +1222,7 @@ def continue_( lead_model=lead_model, max_iterations=max_iterations, no_headlines=no_headlines, + no_caveman=no_caveman, ) diff --git a/src/zo/orchestrator.py b/src/zo/orchestrator.py index 8d9419e..9a04963 100644 --- a/src/zo/orchestrator.py +++ b/src/zo/orchestrator.py @@ -171,6 +171,7 @@ def __init__( gate_mode: GateMode = GateMode.SUPERVISED, plan_path: Path | None = None, low_token: bool = False, + caveman: bool = False, max_iterations_override: int | None = None, ) -> None: self._plan = plan @@ -182,6 +183,7 @@ def __init__( self._gate_mode = gate_mode self._plan_path = plan_path self._low_token = low_token + self._caveman = caveman self._max_iterations_override = max_iterations_override self._workflow: WorkflowDecomposition | None = None self._session_state: SessionState | None = None @@ -207,6 +209,17 @@ def low_token(self) -> bool: """Whether the low-token preset is active for this orchestrator.""" return self._low_token + @property + def caveman(self) -> bool: + """Whether the caveman terse-output skill is active. + + Only meaningful when ``low_token`` is also True; outside + low-token mode the orchestrator does not auto-direct agents to + invoke caveman (the skill file is still available at + ``.claude/skills/caveman/SKILL.md`` for manual use). + """ + return self._caveman + @property def session_state(self) -> SessionState: """Current session state (reads from memory if not yet loaded).""" @@ -399,13 +412,19 @@ def _prompt_low_token_overrides(self) -> str: Without this two-tier routing the savings ceiling is ~30% (lead-swap-only); with it, the ceiling moves to ~50-60% per ``docs/reference/cost-benchmark.mdx``. + + When ``caveman`` is also active (default with low-token; opt out + via ``--no-caveman`` or plan ``caveman: false``), an additional + "Token Efficiency Skill" subsection is appended directing the + lead and all sub-agents to invoke the vendored caveman skill + for terse prose responses. """ if not self._low_token: return "" haiku_list = ", ".join( f"`{name}`" for name in sorted(LOW_TOKEN_HAIKU_AGENTS) ) - return dedent(f"""\ + sections = [dedent(f"""\ # Low-Token Sub-Agent Model Override **CRITICAL:** Low-token mode is active. The lead session (you) @@ -454,7 +473,48 @@ def _prompt_low_token_overrides(self) -> str: does not accept a `model` parameter, log a `DECISION_LOG` entry noting this and proceed without the override — lead-side savings still apply, but sub-agent savings - cannot be achieved without an SDK-level fix.""") + cannot be achieved without an SDK-level fix.""")] + if self._caveman: + sections.append(dedent("""\ + # Token Efficiency Skill: Caveman Mode + + **ACTIVATE caveman skill.** The vendored skill at + `.claude/skills/caveman/SKILL.md` is auto-loaded by Claude + Code. Adopt caveman speech (full intensity by default) for + all prose responses — yours and every sub-agent you spawn. + + **What caveman compresses (chat prose):** + - Drop articles (a/an/the), filler (just/really/basically), + pleasantries (sure/certainly/of course), hedging. + - Fragments OK. Short synonyms (big not extensive). + - Pattern: `[thing] [action] [reason]. [next step].` + + **What caveman LEAVES INTACT (this is the safety guarantee + that makes auto-activation safe across the team):** + - Code blocks — verbatim, never compressed. + - Quoted error strings — verbatim. + - Tool inputs (Write/Edit args) — caveman only touches + chat output, not tool calls. So all structured + artifacts the gates require (`metrics.jsonl`, + `result.md`, `training_status.json`, `hypothesis.md`, + agent contracts) are written normally. + - Security warnings, irreversible-action confirmations, + multi-step sequences with order ambiguity — caveman + auto-disables for these per its own rules. + + **For each sub-agent you spawn:** include "Use caveman + skill (full intensity) for all prose. Code blocks and + tool inputs unchanged per skill rules." in their + instructions, OR rely on the auto-load + skill + description (the skill's `description:` field declares + "auto-triggers when token efficiency is requested" — + low-token mode IS that request). Both work. + + **Opt-out for the user:** they can pass `--no-caveman` at + CLI time or set `caveman: false` in plan frontmatter to + disable this section. If they did, this subsection would + not be present.""" )) + return "\n\n".join(sections) def _prompt_autonomy(self) -> str: """Tell the agent how much autonomy it has based on gate mode.""" diff --git a/src/zo/plan.py b/src/zo/plan.py index 2535395..9fe28ff 100644 --- a/src/zo/plan.py +++ b/src/zo/plan.py @@ -39,6 +39,12 @@ class PlanFrontmatter(BaseModel): activates the preset; ``lead_model`` overrides the lead orchestrator model. Both can also be supplied via CLI flags, which take precedence (CLI > plan field > preset default > base default). + + ``caveman`` is opt-out for the caveman terse-output skill that + auto-activates with ``low_token``. ``None`` (the default) means + "use the preset default" (currently True). Set to ``False`` in plan + frontmatter to suppress caveman activation even when low-token is + on. CLI flag ``--no-caveman`` takes precedence over plan and preset. """ project_name: str @@ -49,6 +55,7 @@ class PlanFrontmatter(BaseModel): owner: str low_token: bool = False lead_model: str | None = None + caveman: bool | None = None class OracleDefinition(BaseModel): From 6a87b41069561bdb4c48be2f76fba5f96e998c80 Mon Sep 17 00:00:00 2001 From: SamPlvs Date: Tue, 5 May 2026 13:56:04 +0100 Subject: [PATCH 2/2] =?UTF-8?q?fix(low-token):=20inline=20caveman=20rules?= =?UTF-8?q?=20=E2=80=94=20skill=20auto-load=20needs=20hooks=20we=20don't?= =?UTF-8?q?=20install?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit After reading caveman's upstream CLAUDE.md, found that always-on activation in Claude Code is delivered by SessionStart and UserPromptSubmit hooks (their hooks/install.sh patches ~/.claude/settings.json and copies hook scripts into ~/.claude/hooks/). The previous commit's prompt directive said "ACTIVATE caveman skill, the vendored SKILL.md is auto-loaded by Claude Code". This was incorrect. Skills in .claude/skills//SKILL.md are *invocations*, not always-on enforcement. Without hooks, the file alone does nothing to make agents adopt caveman speech. We deliberately don't install upstream's hooks (would modify the user's ~/.claude/ globally — too invasive for an opt-in cost-saving feature). Fix: inline the caveman rules directly into the orchestrator prompt. The lead and every spawned sub-agent get the rules verbatim in their prompts. No skill or hook system required — savings come from agents adopting the style as instructed. Files: - src/zo/orchestrator.py — _prompt_low_token_overrides() caveman subsection rewritten. Drops "load this skill" framing; contains the rules inline (intensity full, what to compress, what stays verbatim, special handling for DECISION_LOG / gate rationales / hand-off summaries — those use lite intensity to stay readable cold). Docstring updated to reflect the inline approach. - .claude/skills/caveman/README.md — clarified that this directory is a reference copy, not a runtime activation mechanism. Corrected upstream source-of-truth path (skills/caveman/SKILL.md, not caveman/SKILL.md which is auto-generated). Updated the "Updating" command and "Activation summary" to match. Verified: - ruff src/zo/orchestrator.py: clean - smoke test confirms new "Token Efficiency: Caveman-Style Prose" section present, no "auto-loaded by Claude" claim remaining - validate-docs.sh: 10/10 + 1 pre-existing warning The file-vendoring stays — it's a reference for developers and forward-compat if Claude Code adds proper project-level skill auto-loading later. Bench remains the only honest measurement of actual savings. Co-Authored-By: Claude Opus 4.7 (1M context) --- .claude/skills/caveman/README.md | 43 ++++++++++------ src/zo/orchestrator.py | 84 +++++++++++++++++++------------- 2 files changed, 77 insertions(+), 50 deletions(-) diff --git a/.claude/skills/caveman/README.md b/.claude/skills/caveman/README.md index 95a2ffe..8d8bead 100644 --- a/.claude/skills/caveman/README.md +++ b/.claude/skills/caveman/README.md @@ -1,17 +1,26 @@ -# caveman skill (vendored) +# caveman skill (vendored — reference copy) -This directory vendors the [caveman](https://github.com/JuliusBrussee/caveman) Claude Code skill — an ultra-compressed communication mode that cuts agent prose token usage ~75% while keeping all technical substance intact. +This directory vendors the [caveman](https://github.com/JuliusBrussee/caveman) Claude Code skill — an ultra-compressed communication mode that cuts agent prose token usage ~65–75% while keeping all technical substance intact. -## Why this is here +## How ZO uses caveman (read this first) -ZO's `--low-token` preset auto-activates caveman by default for all spawned agents. It's the largest cost lever available without an SDK refactor (which would unlock prompt caching / Batch API / Files API). +**ZO does NOT rely on Claude Code auto-loading this `SKILL.md` file.** Caveman's always-on activation in Claude Code comes from upstream's SessionStart and UserPromptSubmit hooks (`hooks/install.sh` in their repo) which copy hook scripts into `~/.claude/hooks/` and patch `~/.claude/settings.json`. We deliberately do not run that installer — modifying the user's global Claude Code config is too invasive for an opt-in cost-saving feature. -Caveman is safe to apply across the team because it explicitly preserves: +Instead, when `--low-token` is active (and `caveman` is not opted out), the orchestrator inlines the caveman rules directly into the Lead Orchestrator's prompt — `src/zo/orchestrator.py:_prompt_low_token_overrides()` appends a "Token Efficiency: Caveman-Style Prose" subsection that contains the rules verbatim. The lead is instructed to pass the same rules to every sub-agent it spawns. The savings come from agents adopting the style as instructed; no skill or hook system is required. + +This `SKILL.md` file is kept here as: +1. A **reference copy** of the canonical rules — useful if a developer wants to read the upstream spec without leaving the repo. +2. A **forward-compatible artifact** — if a future Claude Code release adds proper project-level skill auto-loading, this file is already in the right place to be picked up. +3. A **reference for users who install caveman properly** — anyone who runs upstream's `install.sh` separately gets the full hook-based always-on enforcement, and this file is a convenient way to confirm version alignment. + +## Why caveman is safe to apply across the team + +The skill explicitly preserves: - Code blocks (verbatim) - Quoted error strings (verbatim) -- Tool inputs (Write/Edit are tool calls, not chat output — caveman only compresses chat prose) -- File writes including structured artifacts (`metrics.jsonl`, `result.md`, `training_status.json`) — these go through Write/Edit, not caveman +- Tool inputs (Write/Edit args — caveman only compresses chat prose, not tool calls) +- Structured artifacts (`metrics.jsonl`, `result.md`, `training_status.json`, agent contracts) — these go through Write/Edit, not chat It also auto-disables itself for security warnings, irreversible-action confirmations, and any multi-step sequence where dropped articles/conjunctions could create ambiguity. @@ -19,25 +28,27 @@ It also auto-disables itself for security warnings, irreversible-action confirma - Upstream: https://github.com/JuliusBrussee/caveman - License: MIT (`LICENSE` in this directory — Copyright (c) 2026 Julius Brussee) -- Vendored from: `caveman/SKILL.md` at upstream main, fetched 2026-05-05 -- Variant used: canonical (full intensity, default level) +- Source-of-truth path upstream: `skills/caveman/SKILL.md` +- Auto-generated mirror at upstream root: `caveman/SKILL.md` (identical content) +- Vendored: 2026-05-05 +- Intensity: full (default) ## Updating -To re-sync from upstream: +To re-sync from upstream's source-of-truth: ```bash -gh api repos/JuliusBrussee/caveman/contents/caveman/SKILL.md --jq '.content' | base64 -d \ +gh api repos/JuliusBrussee/caveman/contents/skills/caveman/SKILL.md --jq '.content' | base64 -d \ > .claude/skills/caveman/SKILL.md ``` -Then verify the YAML frontmatter and `## Persistence` / `## Rules` / `## Intensity` / `## Auto-Clarity` / `## Boundaries` sections are intact, and that no upstream-specific paths leaked in. +Then verify the YAML frontmatter and `## Persistence` / `## Rules` / `## Intensity` / `## Auto-Clarity` / `## Boundaries` sections are intact, and re-check whether the inlined rules in `src/zo/orchestrator.py:_prompt_low_token_overrides()` need updating to match material changes. -## How ZO activates this +## Activation summary 1. User runs `zo build --low-token` (or sets `low_token: true` in plan frontmatter). -2. The `--low-token` preset includes `caveman: True` (default; can be opted out via `--no-caveman` CLI flag or `caveman: false` plan field). -3. The Lead Orchestrator's prompt gains a "Token efficiency mode" subsection (in `_prompt_low_token_overrides()`) that directs the lead and all spawned sub-agents to adopt caveman speech for prose responses. -4. Claude Code auto-loads this `SKILL.md` because of its presence in `.claude/skills/`. The lead's prompt directive ensures it stays active across sub-agent spawns. +2. The `--low-token` preset includes `caveman: True` (default; opt out via `--no-caveman` CLI flag or `caveman: false` plan field). +3. The Lead Orchestrator's prompt gains a "Token Efficiency: Caveman-Style Prose" subsection containing the rules inline. +4. Lead adopts the style; sub-agents receive the same rules in their spawn prompts and follow them too. See `docs/concepts/low-token-mode.mdx` and `docs/reference/cost-benchmark.mdx` for measured impact (target: +10–20pp savings on top of the measured ~30% baseline from the lead Opus→Sonnet swap). diff --git a/src/zo/orchestrator.py b/src/zo/orchestrator.py index 9a04963..91f91ed 100644 --- a/src/zo/orchestrator.py +++ b/src/zo/orchestrator.py @@ -415,9 +415,12 @@ def _prompt_low_token_overrides(self) -> str: When ``caveman`` is also active (default with low-token; opt out via ``--no-caveman`` or plan ``caveman: false``), an additional - "Token Efficiency Skill" subsection is appended directing the - lead and all sub-agents to invoke the vendored caveman skill - for terse prose responses. + "Token Efficiency: Caveman-Style Prose" subsection is appended + containing the caveman rules **inline** (no Claude Code skill + auto-loading dependency — that path requires hooks we + deliberately don't install). The lead is instructed to pass the + same rules into every sub-agent's spawn prompt so the team + adopts terse prose end-to-end. """ if not self._low_token: return "" @@ -476,44 +479,57 @@ def _prompt_low_token_overrides(self) -> str: cannot be achieved without an SDK-level fix.""")] if self._caveman: sections.append(dedent("""\ - # Token Efficiency Skill: Caveman Mode - - **ACTIVATE caveman skill.** The vendored skill at - `.claude/skills/caveman/SKILL.md` is auto-loaded by Claude - Code. Adopt caveman speech (full intensity by default) for - all prose responses — yours and every sub-agent you spawn. - - **What caveman compresses (chat prose):** - - Drop articles (a/an/the), filler (just/really/basically), - pleasantries (sure/certainly/of course), hedging. - - Fragments OK. Short synonyms (big not extensive). + # Token Efficiency: Caveman-Style Prose + + **Adopt terse, caveman-style prose for all chat + responses.** Pass this same instruction to every + sub-agent you spawn (include the rules below in their + spawn prompt — `Agent(..., prompt="...caveman rules...")`). + The reference skill is vendored at + `.claude/skills/caveman/SKILL.md` for users who install + the upstream hooks; we do not assume those hooks are + present, so the rules below are authoritative for this + session. + + **Rules (intensity: full):** + - Drop articles (a/an/the), filler (just/really/basically/ + actually/simply), pleasantries (sure/certainly/of course/ + happy to), hedging. + - Fragments OK. Short synonyms (big not extensive, + fix not "implement a solution for"). Technical terms + exact. - Pattern: `[thing] [action] [reason]. [next step].` - - **What caveman LEAVES INTACT (this is the safety guarantee - that makes auto-activation safe across the team):** - - Code blocks — verbatim, never compressed. - - Quoted error strings — verbatim. - - Tool inputs (Write/Edit args) — caveman only touches + - Example bad: "Sure! I'd be happy to help. The issue + is likely caused by your authentication middleware..." + - Example good: "Bug in auth middleware. Token expiry + check use `<` not `<=`. Fix:" + + **What stays VERBATIM (do not compress):** + - Code blocks — never compressed, every character intact. + - Quoted error strings — exact reproduction. + - Tool inputs (Write/Edit/Bash args) — we only compress chat output, not tool calls. So all structured artifacts the gates require (`metrics.jsonl`, `result.md`, `training_status.json`, `hypothesis.md`, - agent contracts) are written normally. + agent contracts) are written normally — they go + through Write/Edit, not chat. - Security warnings, irreversible-action confirmations, - multi-step sequences with order ambiguity — caveman - auto-disables for these per its own rules. - - **For each sub-agent you spawn:** include "Use caveman - skill (full intensity) for all prose. Code blocks and - tool inputs unchanged per skill rules." in their - instructions, OR rely on the auto-load + skill - description (the skill's `description:` field declares - "auto-triggers when token efficiency is requested" — - low-token mode IS that request). Both work. + multi-step sequences where dropped articles/conjunctions + could create order ambiguity. Drop caveman for these + and resume after the clear part is done. + + **DECISION_LOG entries, gate rationales, and + hand-off summaries:** these are persisted artifacts. + Apply caveman lightly — terse prose, but still readable + by the next session that reads the file cold. Lean + toward `lite` intensity for these (no filler/hedging, + keep articles + full sentences) over `full` (fragments, + drop articles). **Opt-out for the user:** they can pass `--no-caveman` at - CLI time or set `caveman: false` in plan frontmatter to - disable this section. If they did, this subsection would - not be present.""" )) + CLI time or set `caveman: false` in plan frontmatter. If + they did, this subsection would not be present and you + would write normal prose.""" )) return "\n\n".join(sections) def _prompt_autonomy(self) -> str: