From f190bc7860253909bfa1ed2056a458bf28a716e0 Mon Sep 17 00:00:00 2001
From: SamPlvs <samyaktukra@gmail.com>
Date: Tue, 5 May 2026 13:08:46 +0100
Subject: [PATCH 1/2] feat(low-token): bundle caveman skill and wire into
 --low-token preset
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

First Tier 1 deliverable per the Q3/H2 roadmap (PR #66).

Bundles caveman terse-output skill at .claude/skills/caveman/, wires
it into the --low-token preset (default-on, opt out via --no-caveman
or plan caveman: false), and extends the benchmark harness with an
ablation arm to isolate caveman's contribution from the rest of the
preset.

What ships:

- .claude/skills/caveman/{SKILL.md, LICENSE, README.md} — vendored
  from upstream (github.com/JuliusBrussee/caveman, MIT, fetched
  2026-05-05). Canonical content from caveman/SKILL.md at upstream main.

- src/zo/cli.py — _LOW_TOKEN_PRESET gains caveman: True. New
  _resolve_caveman() helper enforces precedence: CLI --no-caveman >
  plan caveman: false > preset default. --no-caveman flag added to
  both build and continue commands; banner becomes
  [low-token + caveman] when both active.

- src/zo/orchestrator.py — Orchestrator.__init__ accepts caveman: bool
  (default False); caveman property added. _prompt_low_token_overrides()
  extended with a "Token Efficiency Skill: Caveman Mode" subsection
  that directs the lead and all sub-agents to invoke the auto-loaded
  skill, with explicit callouts to the safety guarantees (code blocks,
  quoted errors, tool inputs, structured artifacts all preserved
  verbatim — caveman only compresses chat prose).

- src/zo/plan.py — PlanFrontmatter.caveman: bool | None added (None
  means "use preset default", explicit False is opt-out).

- scripts/benchmark_low_token.sh — adds a third run variant
  "low-token-no-caveman" that runs --low-token --no-caveman in
  isolation. Comparing low-token vs low-token-no-caveman gives the
  caveman delta within the preset.

Verified:
- ruff src/zo/{cli,orchestrator,plan}.py: clean
- bash -n scripts/benchmark_low_token.sh: OK
- _resolve_caveman precedence: 4 cases match design
  (preset default on, --no-caveman wins, plan caveman: false wins,
  low_token=False forces caveman off regardless)
- validate-docs.sh: 10/10 + 1 pre-existing warning

Deferred to follow-up commits before PR opens:
- Unit tests for new preset flag, _resolve_caveman, prompt directive
- Docs: low-token-mode.mdx caveman section, low-token-preset.mdx
  knob row, cost-benchmark.mdx after-bench number
- The actual bench run (needs interactive tmux per harness comments)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .claude/skills/caveman/LICENSE   | 21 +++++++++
 .claude/skills/caveman/README.md | 43 +++++++++++++++++++
 .claude/skills/caveman/SKILL.md  | 74 ++++++++++++++++++++++++++++++++
 scripts/benchmark_low_token.sh   | 39 +++++++++++++----
 src/zo/cli.py                    | 55 +++++++++++++++++++++++-
 src/zo/orchestrator.py           | 64 ++++++++++++++++++++++++++-
 src/zo/plan.py                   |  7 +++
 7 files changed, 290 insertions(+), 13 deletions(-)
 create mode 100644 .claude/skills/caveman/LICENSE
 create mode 100644 .claude/skills/caveman/README.md
 create mode 100644 .claude/skills/caveman/SKILL.md

diff --git a/.claude/skills/caveman/LICENSE b/.claude/skills/caveman/LICENSE
new file mode 100644
index 0000000..d8c0ee8
--- /dev/null
+++ b/.claude/skills/caveman/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 Julius Brussee
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/.claude/skills/caveman/README.md b/.claude/skills/caveman/README.md
new file mode 100644
index 0000000..95a2ffe
--- /dev/null
+++ b/.claude/skills/caveman/README.md
@@ -0,0 +1,43 @@
+# caveman skill (vendored)
+
+This directory vendors the [caveman](https://github.com/JuliusBrussee/caveman) Claude Code skill — an ultra-compressed communication mode that cuts agent prose token usage ~75% while keeping all technical substance intact.
+
+## Why this is here
+
+ZO's `--low-token` preset auto-activates caveman by default for all spawned agents. It's the largest cost lever available without an SDK refactor (which would unlock prompt caching / Batch API / Files API).
+
+Caveman is safe to apply across the team because it explicitly preserves:
+
+- Code blocks (verbatim)
+- Quoted error strings (verbatim)
+- Tool inputs (Write/Edit are tool calls, not chat output — caveman only compresses chat prose)
+- File writes including structured artifacts (`metrics.jsonl`, `result.md`, `training_status.json`) — these go through Write/Edit, not caveman
+
+It also auto-disables itself for security warnings, irreversible-action confirmations, and any multi-step sequence where dropped articles/conjunctions could create ambiguity.
+
+## Source
+
+- Upstream: https://github.com/JuliusBrussee/caveman
+- License: MIT (`LICENSE` in this directory — Copyright (c) 2026 Julius Brussee)
+- Vendored from: `caveman/SKILL.md` at upstream main, fetched 2026-05-05
+- Variant used: canonical (full intensity, default level)
+
+## Updating
+
+To re-sync from upstream:
+
+```bash
+gh api repos/JuliusBrussee/caveman/contents/caveman/SKILL.md --jq '.content' | base64 -d \
+  > .claude/skills/caveman/SKILL.md
+```
+
+Then verify the YAML frontmatter and `## Persistence` / `## Rules` / `## Intensity` / `## Auto-Clarity` / `## Boundaries` sections are intact, and that no upstream-specific paths leaked in.
+
+## How ZO activates this
+
+1. User runs `zo build --low-token` (or sets `low_token: true` in plan frontmatter).
+2. The `--low-token` preset includes `caveman: True` (default; can be opted out via `--no-caveman` CLI flag or `caveman: false` plan field).
+3. The Lead Orchestrator's prompt gains a "Token efficiency mode" subsection (in `_prompt_low_token_overrides()`) that directs the lead and all spawned sub-agents to adopt caveman speech for prose responses.
+4. Claude Code auto-loads this `SKILL.md` because of its presence in `.claude/skills/`. The lead's prompt directive ensures it stays active across sub-agent spawns.
+
+See `docs/concepts/low-token-mode.mdx` and `docs/reference/cost-benchmark.mdx` for measured impact (target: +10–20pp savings on top of the measured ~30% baseline from the lead Opus→Sonnet swap).
diff --git a/.claude/skills/caveman/SKILL.md b/.claude/skills/caveman/SKILL.md
new file mode 100644
index 0000000..09ea3ef
--- /dev/null
+++ b/.claude/skills/caveman/SKILL.md
@@ -0,0 +1,74 @@
+---
+name: caveman
+description: >
+  Ultra-compressed communication mode. Cuts token usage ~75% by speaking like caveman
+  while keeping full technical accuracy. Supports intensity levels: lite, full (default), ultra,
+  wenyan-lite, wenyan-full, wenyan-ultra.
+  Use when user says "caveman mode", "talk like caveman", "use caveman", "less tokens",
+  "be brief", or invokes /caveman. Also auto-triggers when token efficiency is requested.
+---
+
+Respond terse like smart caveman. All technical substance stay. Only fluff die.
+
+## Persistence
+
+ACTIVE EVERY RESPONSE. No revert after many turns. No filler drift. Still active if unsure. Off only: "stop caveman" / "normal mode".
+
+Default: **full**. Switch: `/caveman lite|full|ultra`.
+
+## Rules
+
+Drop: articles (a/an/the), filler (just/really/basically/actually/simply), pleasantries (sure/certainly/of course/happy to), hedging. Fragments OK. Short synonyms (big not extensive, fix not "implement a solution for"). Technical terms exact. Code blocks unchanged. Errors quoted exact.
+
+Pattern: `[thing] [action] [reason]. [next step].`
+
+Not: "Sure! I'd be happy to help you with that. The issue you're experiencing is likely caused by..."
+Yes: "Bug in auth middleware. Token expiry check use `<` not `<=`. Fix:"
+
+## Intensity
+
+| Level | What change |
+|-------|------------|
+| **lite** | No filler/hedging. Keep articles + full sentences. Professional but tight |
+| **full** | Drop articles, fragments OK, short synonyms. Classic caveman |
+| **ultra** | Abbreviate prose words (DB/auth/config/req/res/fn/impl), strip conjunctions, arrows for causality (X → Y), one word when one word enough. Code symbols, function names, API names, error strings: never abbreviate |
+| **wenyan-lite** | Semi-classical. Drop filler/hedging but keep grammar structure, classical register |
+| **wenyan-full** | Maximum classical terseness. Fully 文言文. 80-90% character reduction. Classical sentence patterns, verbs precede objects, subjects often omitted, classical particles (之/乃/為/其) |
+| **wenyan-ultra** | Extreme abbreviation while keeping classical Chinese feel. Maximum compression, ultra terse |
+
+Example — "Why React component re-render?"
+- lite: "Your component re-renders because you create a new object reference each render. Wrap it in `useMemo`."
+- full: "New object ref each render. Inline object prop = new ref = re-render. Wrap in `useMemo`."
+- ultra: "Inline obj prop → new ref → re-render. `useMemo`."
+- wenyan-lite: "組件頻重繪，以每繪新生對象參照故。以 useMemo 包之。"
+- wenyan-full: "物出新參照，致重繪。useMemo .Wrap之。"
+- wenyan-ultra: "新參照→重繪。useMemo Wrap。"
+
+Example — "Explain database connection pooling."
+- lite: "Connection pooling reuses open connections instead of creating new ones per request. Avoids repeated handshake overhead."
+- full: "Pool reuse open DB connections. No new connection per request. Skip handshake overhead."
+- ultra: "Pool = reuse DB conn. Skip handshake → fast under load."
+- wenyan-full: "池reuse open connection。不每req新開。skip handshake overhead。"
+- wenyan-ultra: "池reuse conn。skip handshake → fast。"
+
+## Auto-Clarity
+
+Drop caveman when:
+- Security warnings
+- Irreversible action confirmations
+- Multi-step sequences where fragment order or omitted conjunctions risk misread
+- Compression itself creates technical ambiguity (e.g., `"migrate table drop column backup first"` — order unclear without articles/conjunctions)
+- User asks to clarify or repeats question
+
+Resume caveman after clear part done.
+
+Example — destructive op:
+> **Warning:** This will permanently delete all rows in the `users` table and cannot be undone.
+> ```sql
+> DROP TABLE users;
+> ```
+> Caveman resume. Verify backup exist first.
+
+## Boundaries
+
+Code/commits/PRs: write normal. "stop caveman" or "normal mode": revert. Level persist until changed or session end.
diff --git a/scripts/benchmark_low_token.sh b/scripts/benchmark_low_token.sh
index a78d64d..01daadf 100755
--- a/scripts/benchmark_low_token.sh
+++ b/scripts/benchmark_low_token.sh
@@ -1,25 +1,38 @@
 #!/usr/bin/env bash
-# benchmark_low_token.sh — measure token cost reduction from --low-token mode.
+# benchmark_low_token.sh — measure token cost reduction from --low-token mode
+# and caveman-skill ablation.
 #
-# Runs `zo build` against the MNIST plan twice (default mode + low-token mode),
-# captures Claude Code token usage via `npx ccusage`, and writes a comparison
-# JSON + console summary.
+# Runs `zo build` against the MNIST plan up to THREE times:
+#   - default              (no flags — historical anchor, ~$11)
+#   - low-token            (--low-token, caveman default-on — measures the
+#                           full preset including caveman skill)
+#   - low-token-no-caveman (--low-token --no-caveman — isolates caveman's
+#                           contribution within low-token mode)
+#
+# Comparing low-token vs low-token-no-caveman gives the caveman delta.
+# Comparing default vs low-token gives the full low-token-mode savings.
+#
+# Captures Claude Code token usage via `npx ccusage` and writes a
+# comparison JSON + console summary.
 #
 # Prerequisites:
 #   - `zo` CLI installed (this repo, ./setup.sh)
 #   - `claude` CLI logged in
 #   - `npx ccusage` available (npm install -g ccusage)
-#   - Mac or Linux dev box (Apple Silicon recommended for ~25min low-token wall time)
-#   - ~75 minutes wall time, ~$13-14 spend on Anthropic API
+#   - Mac or Linux dev box (Apple Silicon recommended)
+#   - ~110 minutes wall time across all three runs, ~$20-22 spend on Anthropic API
+#     (default ~$11 + low-token ~$5-6 + low-token-no-caveman ~$6-7)
 #
 # Usage:
 #   ./scripts/benchmark_low_token.sh
 #   ./scripts/benchmark_low_token.sh --delivery-prefix /tmp/zo-bench
-#   ./scripts/benchmark_low_token.sh --skip-default --skip-low-token  # dry preview
+#   ./scripts/benchmark_low_token.sh --skip-default       # only low-token variants
+#   ./scripts/benchmark_low_token.sh --skip-low-token-no-caveman  # skip ablation arm
+#   ./scripts/benchmark_low_token.sh --skip-default --skip-low-token --skip-low-token-no-caveman  # dry preview
 #   ./scripts/benchmark_low_token.sh --help
 #
 # Output:
-#   benchmark-results-{ISO-timestamp}.json — full diff + summary
+#   benchmark-results-{ISO-timestamp}.json — full diff + summary across all runs
 #   stdout — human-readable summary table
 
 set -euo pipefail
@@ -31,6 +44,7 @@ set -euo pipefail
 DELIVERY_PREFIX="${TMPDIR:-/tmp}/zo-low-token-bench"
 RUN_DEFAULT=true
 RUN_LOW_TOKEN=true
+RUN_LOW_TOKEN_NO_CAVEMAN=true
 TIMESTAMP="$(date -u +%Y%m%d-%H%M%S)"
 RESULT_FILE="benchmark-results-${TIMESTAMP}.json"
 
@@ -42,6 +56,8 @@ while [[ $# -gt 0 ]]; do
       RUN_DEFAULT=false; shift ;;
     --skip-low-token)
       RUN_LOW_TOKEN=false; shift ;;
+    --skip-low-token-no-caveman)
+      RUN_LOW_TOKEN_NO_CAVEMAN=false; shift ;;
     --help|-h)
       grep '^# ' "$0" | sed 's/^# //'
       exit 0 ;;
@@ -54,6 +70,7 @@ done
 
 DEFAULT_DELIVERY="${DELIVERY_PREFIX}-default"
 LOW_TOKEN_DELIVERY="${DELIVERY_PREFIX}-low-token"
+LOW_TOKEN_NO_CAVEMAN_DELIVERY="${DELIVERY_PREFIX}-low-token-no-caveman"
 PLAN_PATH="plans/mnist-digit-classifier.md"
 
 # ---------------------------------------------------------------------------
@@ -189,6 +206,10 @@ if [[ "$RUN_LOW_TOKEN" == "true" ]]; then
   run_one "low-token" "$LOW_TOKEN_DELIVERY" --low-token
 fi
 
+if [[ "$RUN_LOW_TOKEN_NO_CAVEMAN" == "true" ]]; then
+  run_one "low-token-no-caveman" "$LOW_TOKEN_NO_CAVEMAN_DELIVERY" --low-token --no-caveman
+fi
+
 # ---------------------------------------------------------------------------
 # Summarise
 # ---------------------------------------------------------------------------
@@ -213,7 +234,7 @@ results = {
     "runs": {}
 }
 
-for label in ("default", "low-token"):
+for label in ("default", "low-token", "low-token-no-caveman"):
     meta_path = Path(f"{prefix}-{label}-meta.json")
     if not meta_path.exists():
         continue
diff --git a/src/zo/cli.py b/src/zo/cli.py
index 493f20f..c3b3167 100644
--- a/src/zo/cli.py
+++ b/src/zo/cli.py
@@ -266,6 +266,9 @@ def _gate_mode_from_str(value: str) -> GateMode:
     "headlines_disabled": True,      # disable Haiku ticker (~60 calls/hr)
     "gate_mode": "full-auto",        # no human-loop overhead
     "compact_threshold": "60",       # CLAUDE_AUTOCOMPACT_PCT_OVERRIDE
+    "caveman": True,                 # vendor caveman skill at .claude/skills/caveman/
+                                     # for terse prose responses; preserves code blocks
+                                     # and structured artifacts. Opt out via --no-caveman.
 }
 
 
@@ -298,18 +301,45 @@ def _resolve_gate_mode(
     return "supervised"
 
 
+def _resolve_caveman(
+    *,
+    cli_no_caveman: bool,
+    plan_caveman: bool | None,
+    low_token: bool,
+) -> bool:
+    """Resolve effective caveman activation.
+
+    Precedence (highest first): CLI ``--no-caveman`` flag > plan field
+    ``caveman: false`` > preset default. Caveman only activates when
+    ``low_token`` is also True; outside low-token mode the skill file at
+    ``.claude/skills/caveman/SKILL.md`` is still available for manual
+    invocation, but ZO does not auto-direct agents to use it.
+    """
+    if not low_token:
+        return False
+    if cli_no_caveman:
+        return False
+    if plan_caveman is False:
+        return False
+    return bool(_LOW_TOKEN_PRESET["caveman"])
+
+
 def _show_banner(
     project: str = "",
     mode: str = "",
     phase: str = "",
     gate_mode: str = "",
     low_token: bool = False,
+    caveman: bool = False,
 ) -> None:
     """Display the ZO brand panel at startup.
 
     When ``low_token`` is True, appends a "low-token" badge to the
     banner so the user has constant visual confirmation that the
-    cost-saving preset is active.
+    cost-saving preset is active. When ``caveman`` is also True
+    (only meaningful with ``low_token``), the badge becomes
+    "low-token + caveman" to signal the additional terse-output skill
+    is engaged.
     """
     from rich.panel import Panel
     from rich.text import Text
@@ -319,7 +349,8 @@ def _show_banner(
     logo.append("Zero Operators", style="#F0C040 bold")
     logo.append(f"  v{_VERSION}", style=_DIM)
     if low_token:
-        logo.append("  [low-token]", style="#F0C040 bold")
+        badge = "  [low-token + caveman]" if caveman else "  [low-token]"
+        logo.append(badge, style="#F0C040 bold")
     logo.append("\n", style=_DIM)
     logo.append("     Autonomous AI Research & Engineering Teams\n", style=_DIM)
     if project:
@@ -910,6 +941,12 @@ def _print_status(team_status, pane_snapshot=""):  # noqa: ANN001
     "--no-headlines", is_flag=True,
     help="Disable the Haiku headline ticker (saves ~60 small calls/hour).",
 )
+@click.option(
+    "--no-caveman", is_flag=True,
+    help="Opt out of caveman terse-output skill (auto-on with --low-token). "
+    "Use if your run produces structured prose that caveman would compress "
+    "ambiguously. Code blocks and structured artifacts are always preserved.",
+)
 def build(
     plan_path: Path,
     gate_mode: str | None,
@@ -918,6 +955,7 @@ def build(
     lead_model: str | None,
     max_iterations: int | None,
     no_headlines: bool,
+    no_caveman: bool,
 ) -> None:
     """Launch a project from a plan.md file.
 
@@ -960,6 +998,11 @@ def build(
         plan_lead_model=plan.frontmatter.lead_model,
         low_token=effective_low_token,
     )
+    effective_caveman = _resolve_caveman(
+        cli_no_caveman=no_caveman,
+        plan_caveman=plan.frontmatter.caveman,
+        low_token=effective_low_token,
+    )
     effective_headlines_disabled = (
         no_headlines or effective_low_token
     )
@@ -991,6 +1034,7 @@ def build(
         phase=state_check.phase if detected_mode == "continue" else "starting",
         gate_mode=effective_gate_mode,
         low_token=effective_low_token,
+        caveman=effective_caveman,
     )
 
     # 5. Create CommsLogger and SemanticIndex
@@ -1017,6 +1061,7 @@ def build(
         semantic=semantic, zo_root=zo_root, gate_mode=gm,
         plan_path=plan_path,
         low_token=effective_low_token,
+        caveman=effective_caveman,
         max_iterations_override=max_iterations,
     )
     orchestrator.start_session()
@@ -1098,6 +1143,10 @@ def build(
     "--no-headlines", is_flag=True,
     help="Disable the Haiku headline ticker.",
 )
+@click.option(
+    "--no-caveman", is_flag=True,
+    help="Opt out of caveman terse-output skill (auto-on with --low-token).",
+)
 def continue_(
     project_name: str | None,
     repo: str | None,
@@ -1107,6 +1156,7 @@ def continue_(
     lead_model: str | None,
     max_iterations: int | None,
     no_headlines: bool,
+    no_caveman: bool,
 ) -> None:
     """Resume a paused project or reconnect on a new machine.
 
@@ -1172,6 +1222,7 @@ def continue_(
         lead_model=lead_model,
         max_iterations=max_iterations,
         no_headlines=no_headlines,
+        no_caveman=no_caveman,
     )
 
 
diff --git a/src/zo/orchestrator.py b/src/zo/orchestrator.py
index 8d9419e..9a04963 100644
--- a/src/zo/orchestrator.py
+++ b/src/zo/orchestrator.py
@@ -171,6 +171,7 @@ def __init__(
         gate_mode: GateMode = GateMode.SUPERVISED,
         plan_path: Path | None = None,
         low_token: bool = False,
+        caveman: bool = False,
         max_iterations_override: int | None = None,
     ) -> None:
         self._plan = plan
@@ -182,6 +183,7 @@ def __init__(
         self._gate_mode = gate_mode
         self._plan_path = plan_path
         self._low_token = low_token
+        self._caveman = caveman
         self._max_iterations_override = max_iterations_override
         self._workflow: WorkflowDecomposition | None = None
         self._session_state: SessionState | None = None
@@ -207,6 +209,17 @@ def low_token(self) -> bool:
         """Whether the low-token preset is active for this orchestrator."""
         return self._low_token
 
+    @property
+    def caveman(self) -> bool:
+        """Whether the caveman terse-output skill is active.
+
+        Only meaningful when ``low_token`` is also True; outside
+        low-token mode the orchestrator does not auto-direct agents to
+        invoke caveman (the skill file is still available at
+        ``.claude/skills/caveman/SKILL.md`` for manual use).
+        """
+        return self._caveman
+
     @property
     def session_state(self) -> SessionState:
         """Current session state (reads from memory if not yet loaded)."""
@@ -399,13 +412,19 @@ def _prompt_low_token_overrides(self) -> str:
         Without this two-tier routing the savings ceiling is ~30%
         (lead-swap-only); with it, the ceiling moves to ~50-60% per
         ``docs/reference/cost-benchmark.mdx``.
+
+        When ``caveman`` is also active (default with low-token; opt out
+        via ``--no-caveman`` or plan ``caveman: false``), an additional
+        "Token Efficiency Skill" subsection is appended directing the
+        lead and all sub-agents to invoke the vendored caveman skill
+        for terse prose responses.
         """
         if not self._low_token:
             return ""
         haiku_list = ", ".join(
             f"`{name}`" for name in sorted(LOW_TOKEN_HAIKU_AGENTS)
         )
-        return dedent(f"""\
+        sections = [dedent(f"""\
             # Low-Token Sub-Agent Model Override
 
             **CRITICAL:** Low-token mode is active. The lead session (you)
@@ -454,7 +473,48 @@ def _prompt_low_token_overrides(self) -> str:
             does not accept a `model` parameter, log a `DECISION_LOG`
             entry noting this and proceed without the override —
             lead-side savings still apply, but sub-agent savings
-            cannot be achieved without an SDK-level fix.""")
+            cannot be achieved without an SDK-level fix.""")]
+        if self._caveman:
+            sections.append(dedent("""\
+                # Token Efficiency Skill: Caveman Mode
+
+                **ACTIVATE caveman skill.** The vendored skill at
+                `.claude/skills/caveman/SKILL.md` is auto-loaded by Claude
+                Code. Adopt caveman speech (full intensity by default) for
+                all prose responses — yours and every sub-agent you spawn.
+
+                **What caveman compresses (chat prose):**
+                - Drop articles (a/an/the), filler (just/really/basically),
+                  pleasantries (sure/certainly/of course), hedging.
+                - Fragments OK. Short synonyms (big not extensive).
+                - Pattern: `[thing] [action] [reason]. [next step].`
+
+                **What caveman LEAVES INTACT (this is the safety guarantee
+                that makes auto-activation safe across the team):**
+                - Code blocks — verbatim, never compressed.
+                - Quoted error strings — verbatim.
+                - Tool inputs (Write/Edit args) — caveman only touches
+                  chat output, not tool calls. So all structured
+                  artifacts the gates require (`metrics.jsonl`,
+                  `result.md`, `training_status.json`, `hypothesis.md`,
+                  agent contracts) are written normally.
+                - Security warnings, irreversible-action confirmations,
+                  multi-step sequences with order ambiguity — caveman
+                  auto-disables for these per its own rules.
+
+                **For each sub-agent you spawn:** include "Use caveman
+                skill (full intensity) for all prose. Code blocks and
+                tool inputs unchanged per skill rules." in their
+                instructions, OR rely on the auto-load + skill
+                description (the skill's `description:` field declares
+                "auto-triggers when token efficiency is requested" —
+                low-token mode IS that request). Both work.
+
+                **Opt-out for the user:** they can pass `--no-caveman` at
+                CLI time or set `caveman: false` in plan frontmatter to
+                disable this section. If they did, this subsection would
+                not be present.""" ))
+        return "\n\n".join(sections)
 
     def _prompt_autonomy(self) -> str:
         """Tell the agent how much autonomy it has based on gate mode."""
diff --git a/src/zo/plan.py b/src/zo/plan.py
index 2535395..9fe28ff 100644
--- a/src/zo/plan.py
+++ b/src/zo/plan.py
@@ -39,6 +39,12 @@ class PlanFrontmatter(BaseModel):
     activates the preset; ``lead_model`` overrides the lead orchestrator
     model. Both can also be supplied via CLI flags, which take
     precedence (CLI > plan field > preset default > base default).
+
+    ``caveman`` is opt-out for the caveman terse-output skill that
+    auto-activates with ``low_token``. ``None`` (the default) means
+    "use the preset default" (currently True). Set to ``False`` in plan
+    frontmatter to suppress caveman activation even when low-token is
+    on. CLI flag ``--no-caveman`` takes precedence over plan and preset.
     """
 
     project_name: str
@@ -49,6 +55,7 @@ class PlanFrontmatter(BaseModel):
     owner: str
     low_token: bool = False
     lead_model: str | None = None
+    caveman: bool | None = None
 
 
 class OracleDefinition(BaseModel):

From 6a87b41069561bdb4c48be2f76fba5f96e998c80 Mon Sep 17 00:00:00 2001
From: SamPlvs <samyaktukra@gmail.com>
Date: Tue, 5 May 2026 13:56:04 +0100
Subject: [PATCH 2/2] =?UTF-8?q?fix(low-token):=20inline=20caveman=20rules?=
 =?UTF-8?q?=20=E2=80=94=20skill=20auto-load=20needs=20hooks=20we=20don't?=
 =?UTF-8?q?=20install?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

After reading caveman's upstream CLAUDE.md, found that always-on
activation in Claude Code is delivered by SessionStart and
UserPromptSubmit hooks (their hooks/install.sh patches
~/.claude/settings.json and copies hook scripts into ~/.claude/hooks/).

The previous commit's prompt directive said "ACTIVATE caveman skill,
the vendored SKILL.md is auto-loaded by Claude Code". This was
incorrect. Skills in .claude/skills/<name>/SKILL.md are *invocations*,
not always-on enforcement. Without hooks, the file alone does nothing
to make agents adopt caveman speech.

We deliberately don't install upstream's hooks (would modify the
user's ~/.claude/ globally — too invasive for an opt-in cost-saving
feature).

Fix: inline the caveman rules directly into the orchestrator prompt.
The lead and every spawned sub-agent get the rules verbatim in their
prompts. No skill or hook system required — savings come from agents
adopting the style as instructed.

Files:
- src/zo/orchestrator.py — _prompt_low_token_overrides() caveman
  subsection rewritten. Drops "load this skill" framing; contains
  the rules inline (intensity full, what to compress, what stays
  verbatim, special handling for DECISION_LOG / gate rationales /
  hand-off summaries — those use lite intensity to stay readable
  cold). Docstring updated to reflect the inline approach.
- .claude/skills/caveman/README.md — clarified that this directory
  is a reference copy, not a runtime activation mechanism. Corrected
  upstream source-of-truth path (skills/caveman/SKILL.md, not
  caveman/SKILL.md which is auto-generated). Updated the "Updating"
  command and "Activation summary" to match.

Verified:
- ruff src/zo/orchestrator.py: clean
- smoke test confirms new "Token Efficiency: Caveman-Style Prose"
  section present, no "auto-loaded by Claude" claim remaining
- validate-docs.sh: 10/10 + 1 pre-existing warning

The file-vendoring stays — it's a reference for developers and
forward-compat if Claude Code adds proper project-level skill
auto-loading later. Bench remains the only honest measurement of
actual savings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .claude/skills/caveman/README.md | 43 ++++++++++------
 src/zo/orchestrator.py           | 84 +++++++++++++++++++-------------
 2 files changed, 77 insertions(+), 50 deletions(-)

diff --git a/.claude/skills/caveman/README.md b/.claude/skills/caveman/README.md
index 95a2ffe..8d8bead 100644
--- a/.claude/skills/caveman/README.md
+++ b/.claude/skills/caveman/README.md
@@ -1,17 +1,26 @@
-# caveman skill (vendored)
+# caveman skill (vendored — reference copy)
 
-This directory vendors the [caveman](https://github.com/JuliusBrussee/caveman) Claude Code skill — an ultra-compressed communication mode that cuts agent prose token usage ~75% while keeping all technical substance intact.
+This directory vendors the [caveman](https://github.com/JuliusBrussee/caveman) Claude Code skill — an ultra-compressed communication mode that cuts agent prose token usage ~65–75% while keeping all technical substance intact.
 
-## Why this is here
+## How ZO uses caveman (read this first)
 
-ZO's `--low-token` preset auto-activates caveman by default for all spawned agents. It's the largest cost lever available without an SDK refactor (which would unlock prompt caching / Batch API / Files API).
+**ZO does NOT rely on Claude Code auto-loading this `SKILL.md` file.** Caveman's always-on activation in Claude Code comes from upstream's SessionStart and UserPromptSubmit hooks (`hooks/install.sh` in their repo) which copy hook scripts into `~/.claude/hooks/` and patch `~/.claude/settings.json`. We deliberately do not run that installer — modifying the user's global Claude Code config is too invasive for an opt-in cost-saving feature.
 
-Caveman is safe to apply across the team because it explicitly preserves:
+Instead, when `--low-token` is active (and `caveman` is not opted out), the orchestrator inlines the caveman rules directly into the Lead Orchestrator's prompt — `src/zo/orchestrator.py:_prompt_low_token_overrides()` appends a "Token Efficiency: Caveman-Style Prose" subsection that contains the rules verbatim. The lead is instructed to pass the same rules to every sub-agent it spawns. The savings come from agents adopting the style as instructed; no skill or hook system is required.
+
+This `SKILL.md` file is kept here as:
+1. A **reference copy** of the canonical rules — useful if a developer wants to read the upstream spec without leaving the repo.
+2. A **forward-compatible artifact** — if a future Claude Code release adds proper project-level skill auto-loading, this file is already in the right place to be picked up.
+3. A **reference for users who install caveman properly** — anyone who runs upstream's `install.sh` separately gets the full hook-based always-on enforcement, and this file is a convenient way to confirm version alignment.
+
+## Why caveman is safe to apply across the team
+
+The skill explicitly preserves:
 
 - Code blocks (verbatim)
 - Quoted error strings (verbatim)
-- Tool inputs (Write/Edit are tool calls, not chat output — caveman only compresses chat prose)
-- File writes including structured artifacts (`metrics.jsonl`, `result.md`, `training_status.json`) — these go through Write/Edit, not caveman
+- Tool inputs (Write/Edit args — caveman only compresses chat prose, not tool calls)
+- Structured artifacts (`metrics.jsonl`, `result.md`, `training_status.json`, agent contracts) — these go through Write/Edit, not chat
 
 It also auto-disables itself for security warnings, irreversible-action confirmations, and any multi-step sequence where dropped articles/conjunctions could create ambiguity.
 
@@ -19,25 +28,27 @@ It also auto-disables itself for security warnings, irreversible-action confirma
 
 - Upstream: https://github.com/JuliusBrussee/caveman
 - License: MIT (`LICENSE` in this directory — Copyright (c) 2026 Julius Brussee)
-- Vendored from: `caveman/SKILL.md` at upstream main, fetched 2026-05-05
-- Variant used: canonical (full intensity, default level)
+- Source-of-truth path upstream: `skills/caveman/SKILL.md`
+- Auto-generated mirror at upstream root: `caveman/SKILL.md` (identical content)
+- Vendored: 2026-05-05
+- Intensity: full (default)
 
 ## Updating
 
-To re-sync from upstream:
+To re-sync from upstream's source-of-truth:
 
 ```bash
-gh api repos/JuliusBrussee/caveman/contents/caveman/SKILL.md --jq '.content' | base64 -d \
+gh api repos/JuliusBrussee/caveman/contents/skills/caveman/SKILL.md --jq '.content' | base64 -d \
   > .claude/skills/caveman/SKILL.md
 ```
 
-Then verify the YAML frontmatter and `## Persistence` / `## Rules` / `## Intensity` / `## Auto-Clarity` / `## Boundaries` sections are intact, and that no upstream-specific paths leaked in.
+Then verify the YAML frontmatter and `## Persistence` / `## Rules` / `## Intensity` / `## Auto-Clarity` / `## Boundaries` sections are intact, and re-check whether the inlined rules in `src/zo/orchestrator.py:_prompt_low_token_overrides()` need updating to match material changes.
 
-## How ZO activates this
+## Activation summary
 
 1. User runs `zo build --low-token` (or sets `low_token: true` in plan frontmatter).
-2. The `--low-token` preset includes `caveman: True` (default; can be opted out via `--no-caveman` CLI flag or `caveman: false` plan field).
-3. The Lead Orchestrator's prompt gains a "Token efficiency mode" subsection (in `_prompt_low_token_overrides()`) that directs the lead and all spawned sub-agents to adopt caveman speech for prose responses.
-4. Claude Code auto-loads this `SKILL.md` because of its presence in `.claude/skills/`. The lead's prompt directive ensures it stays active across sub-agent spawns.
+2. The `--low-token` preset includes `caveman: True` (default; opt out via `--no-caveman` CLI flag or `caveman: false` plan field).
+3. The Lead Orchestrator's prompt gains a "Token Efficiency: Caveman-Style Prose" subsection containing the rules inline.
+4. Lead adopts the style; sub-agents receive the same rules in their spawn prompts and follow them too.
 
 See `docs/concepts/low-token-mode.mdx` and `docs/reference/cost-benchmark.mdx` for measured impact (target: +10–20pp savings on top of the measured ~30% baseline from the lead Opus→Sonnet swap).
diff --git a/src/zo/orchestrator.py b/src/zo/orchestrator.py
index 9a04963..91f91ed 100644
--- a/src/zo/orchestrator.py
+++ b/src/zo/orchestrator.py
@@ -415,9 +415,12 @@ def _prompt_low_token_overrides(self) -> str:
 
         When ``caveman`` is also active (default with low-token; opt out
         via ``--no-caveman`` or plan ``caveman: false``), an additional
-        "Token Efficiency Skill" subsection is appended directing the
-        lead and all sub-agents to invoke the vendored caveman skill
-        for terse prose responses.
+        "Token Efficiency: Caveman-Style Prose" subsection is appended
+        containing the caveman rules **inline** (no Claude Code skill
+        auto-loading dependency — that path requires hooks we
+        deliberately don't install). The lead is instructed to pass the
+        same rules into every sub-agent's spawn prompt so the team
+        adopts terse prose end-to-end.
         """
         if not self._low_token:
             return ""
@@ -476,44 +479,57 @@ def _prompt_low_token_overrides(self) -> str:
             cannot be achieved without an SDK-level fix.""")]
         if self._caveman:
             sections.append(dedent("""\
-                # Token Efficiency Skill: Caveman Mode
-
-                **ACTIVATE caveman skill.** The vendored skill at
-                `.claude/skills/caveman/SKILL.md` is auto-loaded by Claude
-                Code. Adopt caveman speech (full intensity by default) for
-                all prose responses — yours and every sub-agent you spawn.
-
-                **What caveman compresses (chat prose):**
-                - Drop articles (a/an/the), filler (just/really/basically),
-                  pleasantries (sure/certainly/of course), hedging.
-                - Fragments OK. Short synonyms (big not extensive).
+                # Token Efficiency: Caveman-Style Prose
+
+                **Adopt terse, caveman-style prose for all chat
+                responses.** Pass this same instruction to every
+                sub-agent you spawn (include the rules below in their
+                spawn prompt — `Agent(..., prompt="...caveman rules...")`).
+                The reference skill is vendored at
+                `.claude/skills/caveman/SKILL.md` for users who install
+                the upstream hooks; we do not assume those hooks are
+                present, so the rules below are authoritative for this
+                session.
+
+                **Rules (intensity: full):**
+                - Drop articles (a/an/the), filler (just/really/basically/
+                  actually/simply), pleasantries (sure/certainly/of course/
+                  happy to), hedging.
+                - Fragments OK. Short synonyms (big not extensive,
+                  fix not "implement a solution for"). Technical terms
+                  exact.
                 - Pattern: `[thing] [action] [reason]. [next step].`
-
-                **What caveman LEAVES INTACT (this is the safety guarantee
-                that makes auto-activation safe across the team):**
-                - Code blocks — verbatim, never compressed.
-                - Quoted error strings — verbatim.
-                - Tool inputs (Write/Edit args) — caveman only touches
+                - Example bad: "Sure! I'd be happy to help. The issue
+                  is likely caused by your authentication middleware..."
+                - Example good: "Bug in auth middleware. Token expiry
+                  check use `<` not `<=`. Fix:"
+
+                **What stays VERBATIM (do not compress):**
+                - Code blocks — never compressed, every character intact.
+                - Quoted error strings — exact reproduction.
+                - Tool inputs (Write/Edit/Bash args) — we only compress
                   chat output, not tool calls. So all structured
                   artifacts the gates require (`metrics.jsonl`,
                   `result.md`, `training_status.json`, `hypothesis.md`,
-                  agent contracts) are written normally.
+                  agent contracts) are written normally — they go
+                  through Write/Edit, not chat.
                 - Security warnings, irreversible-action confirmations,
-                  multi-step sequences with order ambiguity — caveman
-                  auto-disables for these per its own rules.
-
-                **For each sub-agent you spawn:** include "Use caveman
-                skill (full intensity) for all prose. Code blocks and
-                tool inputs unchanged per skill rules." in their
-                instructions, OR rely on the auto-load + skill
-                description (the skill's `description:` field declares
-                "auto-triggers when token efficiency is requested" —
-                low-token mode IS that request). Both work.
+                  multi-step sequences where dropped articles/conjunctions
+                  could create order ambiguity. Drop caveman for these
+                  and resume after the clear part is done.
+
+                **DECISION_LOG entries, gate rationales, and
+                hand-off summaries:** these are persisted artifacts.
+                Apply caveman lightly — terse prose, but still readable
+                by the next session that reads the file cold. Lean
+                toward `lite` intensity for these (no filler/hedging,
+                keep articles + full sentences) over `full` (fragments,
+                drop articles).
 
                 **Opt-out for the user:** they can pass `--no-caveman` at
-                CLI time or set `caveman: false` in plan frontmatter to
-                disable this section. If they did, this subsection would
-                not be present.""" ))
+                CLI time or set `caveman: false` in plan frontmatter. If
+                they did, this subsection would not be present and you
+                would write normal prose.""" ))
         return "\n\n".join(sections)
 
     def _prompt_autonomy(self) -> str: