SamPlvs · SamPlvs · May 5, 2026 · May 5, 2026
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 Julius Brussee
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
@@ -0,0 +1,54 @@
+# caveman skill (vendored — reference copy)
+
+This directory vendors the [caveman](https://github.com/JuliusBrussee/caveman) Claude Code skill — an ultra-compressed communication mode that cuts agent prose token usage ~65–75% while keeping all technical substance intact.
+
+## How ZO uses caveman (read this first)
+
+**ZO does NOT rely on Claude Code auto-loading this `SKILL.md` file.** Caveman's always-on activation in Claude Code comes from upstream's SessionStart and UserPromptSubmit hooks (`hooks/install.sh` in their repo) which copy hook scripts into `~/.claude/hooks/` and patch `~/.claude/settings.json`. We deliberately do not run that installer — modifying the user's global Claude Code config is too invasive for an opt-in cost-saving feature.
+
+Instead, when `--low-token` is active (and `caveman` is not opted out), the orchestrator inlines the caveman rules directly into the Lead Orchestrator's prompt — `src/zo/orchestrator.py:_prompt_low_token_overrides()` appends a "Token Efficiency: Caveman-Style Prose" subsection that contains the rules verbatim. The lead is instructed to pass the same rules to every sub-agent it spawns. The savings come from agents adopting the style as instructed; no skill or hook system is required.
+
+This `SKILL.md` file is kept here as:
+1. A **reference copy** of the canonical rules — useful if a developer wants to read the upstream spec without leaving the repo.
+2. A **forward-compatible artifact** — if a future Claude Code release adds proper project-level skill auto-loading, this file is already in the right place to be picked up.
+3. A **reference for users who install caveman properly** — anyone who runs upstream's `install.sh` separately gets the full hook-based always-on enforcement, and this file is a convenient way to confirm version alignment.
+
+## Why caveman is safe to apply across the team
+
+The skill explicitly preserves:
+
+- Code blocks (verbatim)
+- Quoted error strings (verbatim)
+- Tool inputs (Write/Edit args — caveman only compresses chat prose, not tool calls)
+- Structured artifacts (`metrics.jsonl`, `result.md`, `training_status.json`, agent contracts) — these go through Write/Edit, not chat
+
+It also auto-disables itself for security warnings, irreversible-action confirmations, and any multi-step sequence where dropped articles/conjunctions could create ambiguity.
+
+## Source
+
+- Upstream: https://github.com/JuliusBrussee/caveman
+- License: MIT (`LICENSE` in this directory — Copyright (c) 2026 Julius Brussee)
+- Source-of-truth path upstream: `skills/caveman/SKILL.md`
+- Auto-generated mirror at upstream root: `caveman/SKILL.md` (identical content)
+- Vendored: 2026-05-05
+- Intensity: full (default)
+
+## Updating
+
+To re-sync from upstream's source-of-truth:
+
+```bash
+gh api repos/JuliusBrussee/caveman/contents/skills/caveman/SKILL.md --jq '.content' | base64 -d \
+  > .claude/skills/caveman/SKILL.md
+```
+
+Then verify the YAML frontmatter and `## Persistence` / `## Rules` / `## Intensity` / `## Auto-Clarity` / `## Boundaries` sections are intact, and re-check whether the inlined rules in `src/zo/orchestrator.py:_prompt_low_token_overrides()` need updating to match material changes.
+
+## Activation summary
+
+1. User runs `zo build --low-token` (or sets `low_token: true` in plan frontmatter).
+2. The `--low-token` preset includes `caveman: True` (default; opt out via `--no-caveman` CLI flag or `caveman: false` plan field).
+3. The Lead Orchestrator's prompt gains a "Token Efficiency: Caveman-Style Prose" subsection containing the rules inline.
+4. Lead adopts the style; sub-agents receive the same rules in their spawn prompts and follow them too.
+
+See `docs/concepts/low-token-mode.mdx` and `docs/reference/cost-benchmark.mdx` for measured impact (target: +10–20pp savings on top of the measured ~30% baseline from the lead Opus→Sonnet swap).
@@ -0,0 +1,74 @@
+---
+name: caveman
+description: >
+  Ultra-compressed communication mode. Cuts token usage ~75% by speaking like caveman
+  while keeping full technical accuracy. Supports intensity levels: lite, full (default), ultra,
+  wenyan-lite, wenyan-full, wenyan-ultra.
+  Use when user says "caveman mode", "talk like caveman", "use caveman", "less tokens",
+  "be brief", or invokes /caveman. Also auto-triggers when token efficiency is requested.
+---
+
+Respond terse like smart caveman. All technical substance stay. Only fluff die.
+
+## Persistence
+
+ACTIVE EVERY RESPONSE. No revert after many turns. No filler drift. Still active if unsure. Off only: "stop caveman" / "normal mode".
+
+Default: **full**. Switch: `/caveman lite|full|ultra`.
+
+## Rules
+
+Drop: articles (a/an/the), filler (just/really/basically/actually/simply), pleasantries (sure/certainly/of course/happy to), hedging. Fragments OK. Short synonyms (big not extensive, fix not "implement a solution for"). Technical terms exact. Code blocks unchanged. Errors quoted exact.
+
+Pattern: `[thing] [action] [reason]. [next step].`
+
+Not: "Sure! I'd be happy to help you with that. The issue you're experiencing is likely caused by..."
+Yes: "Bug in auth middleware. Token expiry check use `<` not `<=`. Fix:"
+
+## Intensity
+
+| Level | What change |
+|-------|------------|
+| **lite** | No filler/hedging. Keep articles + full sentences. Professional but tight |
+| **full** | Drop articles, fragments OK, short synonyms. Classic caveman |
+| **ultra** | Abbreviate prose words (DB/auth/config/req/res/fn/impl), strip conjunctions, arrows for causality (X → Y), one word when one word enough. Code symbols, function names, API names, error strings: never abbreviate |
+| **wenyan-lite** | Semi-classical. Drop filler/hedging but keep grammar structure, classical register |
+| **wenyan-full** | Maximum classical terseness. Fully 文言文. 80-90% character reduction. Classical sentence patterns, verbs precede objects, subjects often omitted, classical particles (之/乃/為/其) |
+| **wenyan-ultra** | Extreme abbreviation while keeping classical Chinese feel. Maximum compression, ultra terse |
+
+Example — "Why React component re-render?"
+- lite: "Your component re-renders because you create a new object reference each render. Wrap it in `useMemo`."
+- full: "New object ref each render. Inline object prop = new ref = re-render. Wrap in `useMemo`."
+- ultra: "Inline obj prop → new ref → re-render. `useMemo`."
+- wenyan-lite: "組件頻重繪，以每繪新生對象參照故。以 useMemo 包之。"
+- wenyan-full: "物出新參照，致重繪。useMemo .Wrap之。"
+- wenyan-ultra: "新參照→重繪。useMemo Wrap。"
+
+Example — "Explain database connection pooling."
+- lite: "Connection pooling reuses open connections instead of creating new ones per request. Avoids repeated handshake overhead."
+- full: "Pool reuse open DB connections. No new connection per request. Skip handshake overhead."
+- ultra: "Pool = reuse DB conn. Skip handshake → fast under load."
+- wenyan-full: "池reuse open connection。不每req新開。skip handshake overhead。"
+- wenyan-ultra: "池reuse conn。skip handshake → fast。"
+
+## Auto-Clarity
+
+Drop caveman when:
+- Security warnings
+- Irreversible action confirmations
+- Multi-step sequences where fragment order or omitted conjunctions risk misread
+- Compression itself creates technical ambiguity (e.g., `"migrate table drop column backup first"` — order unclear without articles/conjunctions)
+- User asks to clarify or repeats question
+
+Resume caveman after clear part done.
+
+Example — destructive op:
+> **Warning:** This will permanently delete all rows in the `users` table and cannot be undone.
+> ```sql
+> DROP TABLE users;
+> ```
+> Caveman resume. Verify backup exist first.
+
+## Boundaries
+
+Code/commits/PRs: write normal. "stop caveman" or "normal mode": revert. Level persist until changed or session end.
@@ -1,25 +1,38 @@
 #!/usr/bin/env bash
-# benchmark_low_token.sh — measure token cost reduction from --low-token mode.
+# benchmark_low_token.sh — measure token cost reduction from --low-token mode
+# and caveman-skill ablation.
 #
-# Runs `zo build` against the MNIST plan twice (default mode + low-token mode),
-# captures Claude Code token usage via `npx ccusage`, and writes a comparison
-# JSON + console summary.
+# Runs `zo build` against the MNIST plan up to THREE times:
+#   - default              (no flags — historical anchor, ~$11)
+#   - low-token            (--low-token, caveman default-on — measures the
+#                           full preset including caveman skill)
+#   - low-token-no-caveman (--low-token --no-caveman — isolates caveman's
+#                           contribution within low-token mode)
+#
+# Comparing low-token vs low-token-no-caveman gives the caveman delta.
+# Comparing default vs low-token gives the full low-token-mode savings.
+#
+# Captures Claude Code token usage via `npx ccusage` and writes a
+# comparison JSON + console summary.
 #
 # Prerequisites:
 #   - `zo` CLI installed (this repo, ./setup.sh)
 #   - `claude` CLI logged in
 #   - `npx ccusage` available (npm install -g ccusage)
-#   - Mac or Linux dev box (Apple Silicon recommended for ~25min low-token wall time)
-#   - ~75 minutes wall time, ~$13-14 spend on Anthropic API
+#   - Mac or Linux dev box (Apple Silicon recommended)
+#   - ~110 minutes wall time across all three runs, ~$20-22 spend on Anthropic API
+#     (default ~$11 + low-token ~$5-6 + low-token-no-caveman ~$6-7)
 #
 # Usage:
 #   ./scripts/benchmark_low_token.sh
 #   ./scripts/benchmark_low_token.sh --delivery-prefix /tmp/zo-bench
-#   ./scripts/benchmark_low_token.sh --skip-default --skip-low-token  # dry preview
+#   ./scripts/benchmark_low_token.sh --skip-default       # only low-token variants
+#   ./scripts/benchmark_low_token.sh --skip-low-token-no-caveman  # skip ablation arm
+#   ./scripts/benchmark_low_token.sh --skip-default --skip-low-token --skip-low-token-no-caveman  # dry preview
 #   ./scripts/benchmark_low_token.sh --help
 #
 # Output:
-#   benchmark-results-{ISO-timestamp}.json — full diff + summary
+#   benchmark-results-{ISO-timestamp}.json — full diff + summary across all runs
 #   stdout — human-readable summary table
 
 set -euo pipefail
@@ -31,6 +44,7 @@ set -euo pipefail
 DELIVERY_PREFIX="${TMPDIR:-/tmp}/zo-low-token-bench"
 RUN_DEFAULT=true
 RUN_LOW_TOKEN=true
+RUN_LOW_TOKEN_NO_CAVEMAN=true
 TIMESTAMP="$(date -u +%Y%m%d-%H%M%S)"
 RESULT_FILE="benchmark-results-${TIMESTAMP}.json"
 
@@ -42,6 +56,8 @@ while [[ $# -gt 0 ]]; do
       RUN_DEFAULT=false; shift ;;
     --skip-low-token)
       RUN_LOW_TOKEN=false; shift ;;
+    --skip-low-token-no-caveman)
+      RUN_LOW_TOKEN_NO_CAVEMAN=false; shift ;;
     --help|-h)
       grep '^# ' "$0" | sed 's/^# //'
       exit 0 ;;
@@ -54,6 +70,7 @@ done
 
 DEFAULT_DELIVERY="${DELIVERY_PREFIX}-default"
 LOW_TOKEN_DELIVERY="${DELIVERY_PREFIX}-low-token"
+LOW_TOKEN_NO_CAVEMAN_DELIVERY="${DELIVERY_PREFIX}-low-token-no-caveman"
 PLAN_PATH="plans/mnist-digit-classifier.md"
 
 # ---------------------------------------------------------------------------
@@ -189,6 +206,10 @@ if [[ "$RUN_LOW_TOKEN" == "true" ]]; then
   run_one "low-token" "$LOW_TOKEN_DELIVERY" --low-token
 fi
 
+if [[ "$RUN_LOW_TOKEN_NO_CAVEMAN" == "true" ]]; then
+  run_one "low-token-no-caveman" "$LOW_TOKEN_NO_CAVEMAN_DELIVERY" --low-token --no-caveman
+fi
+
 # ---------------------------------------------------------------------------
 # Summarise
 # ---------------------------------------------------------------------------
@@ -213,7 +234,7 @@ results = {
     "runs": {}
 }
 
-for label in ("default", "low-token"):
+for label in ("default", "low-token", "low-token-no-caveman"):
     meta_path = Path(f"{prefix}-{label}-meta.json")
     if not meta_path.exists():
         continue

@@ -266,6 +266,9 @@ def _gate_mode_from_str(value: str) -> GateMode:
     "headlines_disabled": True,      # disable Haiku ticker (~60 calls/hr)
     "gate_mode": "full-auto",        # no human-loop overhead
     "compact_threshold": "60",       # CLAUDE_AUTOCOMPACT_PCT_OVERRIDE
+    "caveman": True,                 # vendor caveman skill at .claude/skills/caveman/
+                                     # for terse prose responses; preserves code blocks
+                                     # and structured artifacts. Opt out via --no-caveman.
 }
 
 
@@ -298,18 +301,45 @@ def _resolve_gate_mode(
     return "supervised"
 
 
+def _resolve_caveman(
+    *,
+    cli_no_caveman: bool,
+    plan_caveman: bool | None,
+    low_token: bool,
+) -> bool:
+    """Resolve effective caveman activation.
+
+    Precedence (highest first): CLI ``--no-caveman`` flag > plan field
+    ``caveman: false`` > preset default. Caveman only activates when
+    ``low_token`` is also True; outside low-token mode the skill file at
+    ``.claude/skills/caveman/SKILL.md`` is still available for manual
+    invocation, but ZO does not auto-direct agents to use it.
+    """
+    if not low_token:
+        return False
+    if cli_no_caveman:
+        return False
+    if plan_caveman is False:
+        return False
+    return bool(_LOW_TOKEN_PRESET["caveman"])
+
+
 def _show_banner(
     project: str = "",
     mode: str = "",
     phase: str = "",
     gate_mode: str = "",
     low_token: bool = False,
+    caveman: bool = False,
 ) -> None:
     """Display the ZO brand panel at startup.
 
     When ``low_token`` is True, appends a "low-token" badge to the
     banner so the user has constant visual confirmation that the
-    cost-saving preset is active.
+    cost-saving preset is active. When ``caveman`` is also True
+    (only meaningful with ``low_token``), the badge becomes
+    "low-token + caveman" to signal the additional terse-output skill
+    is engaged.
     """
     from rich.panel import Panel
     from rich.text import Text
@@ -319,7 +349,8 @@ def _show_banner(
     logo.append("Zero Operators", style="#F0C040 bold")
     logo.append(f"  v{_VERSION}", style=_DIM)
     if low_token:
-        logo.append("  [low-token]", style="#F0C040 bold")
+        badge = "  [low-token + caveman]" if caveman else "  [low-token]"
+        logo.append(badge, style="#F0C040 bold")
     logo.append("\n", style=_DIM)
     logo.append("     Autonomous AI Research & Engineering Teams\n", style=_DIM)
     if project:
@@ -910,6 +941,12 @@ def _print_status(team_status, pane_snapshot=""):  # noqa: ANN001
     "--no-headlines", is_flag=True,
     help="Disable the Haiku headline ticker (saves ~60 small calls/hour).",
 )
+@click.option(
+    "--no-caveman", is_flag=True,
+    help="Opt out of caveman terse-output skill (auto-on with --low-token). "
+    "Use if your run produces structured prose that caveman would compress "
+    "ambiguously. Code blocks and structured artifacts are always preserved.",
+)
 def build(
     plan_path: Path,
     gate_mode: str | None,
@@ -918,6 +955,7 @@ def build(
     lead_model: str | None,
     max_iterations: int | None,
     no_headlines: bool,
+    no_caveman: bool,
 ) -> None:
     """Launch a project from a plan.md file.
 
@@ -960,6 +998,11 @@ def build(
         plan_lead_model=plan.frontmatter.lead_model,
         low_token=effective_low_token,
     )
+    effective_caveman = _resolve_caveman(
+        cli_no_caveman=no_caveman,
+        plan_caveman=plan.frontmatter.caveman,
+        low_token=effective_low_token,
+    )
     effective_headlines_disabled = (
         no_headlines or effective_low_token
     )
@@ -991,6 +1034,7 @@ def build(
         phase=state_check.phase if detected_mode == "continue" else "starting",
         gate_mode=effective_gate_mode,
         low_token=effective_low_token,
+        caveman=effective_caveman,
     )
 
     # 5. Create CommsLogger and SemanticIndex
@@ -1017,6 +1061,7 @@ def build(
         semantic=semantic, zo_root=zo_root, gate_mode=gm,
         plan_path=plan_path,
         low_token=effective_low_token,
+        caveman=effective_caveman,
         max_iterations_override=max_iterations,
     )
     orchestrator.start_session()
@@ -1098,6 +1143,10 @@ def build(
     "--no-headlines", is_flag=True,
     help="Disable the Haiku headline ticker.",
 )
+@click.option(
+    "--no-caveman", is_flag=True,
+    help="Opt out of caveman terse-output skill (auto-on with --low-token).",
+)
 def continue_(
     project_name: str | None,
     repo: str | None,
@@ -1107,6 +1156,7 @@ def continue_(
     lead_model: str | None,
     max_iterations: int | None,
     no_headlines: bool,
+    no_caveman: bool,
 ) -> None:
     """Resume a paused project or reconnect on a new machine.
 
@@ -1172,6 +1222,7 @@ def continue_(
         lead_model=lead_model,
         max_iterations=max_iterations,
         no_headlines=no_headlines,
+        no_caveman=no_caveman,
     )