Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions docs/COMMANDS.md
Original file line number Diff line number Diff line change
Expand Up @@ -25,10 +25,10 @@ zo build plans/project.md [--gate-mode supervised|auto|full-auto] [--no-tmux]
```

**Cost-saving options:**
- `--low-token`: activates the cost-saving preset (Sonnet lead, Haiku for code-reviewer / test-engineer / oracle-qa, 2 max iterations, full-auto gates, no headlines, earlier auto-compaction). Equivalent to `low_token: true` in plan frontmatter. **~30% measured savings** on MNIST ($7.75 vs ~$11 default); 50-60% targeted, 70-80% on the roadmap (prompt caching + Batch API + Files API via SDK refactor). See `docs/concepts/low-token-mode.mdx`.
- `--low-token`: activates the cost-saving preset (Sonnet lead, Haiku for code-reviewer / test-engineer / oracle-qa, 2 max iterations, full-auto gates, no end-of-session Haiku summary, earlier auto-compaction). Equivalent to `low_token: true` in plan frontmatter. **~30% measured savings** on MNIST ($7.75 vs ~$11 default); 50-60% targeted, 70-80% on the roadmap (prompt caching + Batch API + Files API via SDK refactor). See `docs/concepts/low-token-mode.mdx`.
- `--lead-model`: override lead orchestrator model. Composes with `--low-token`.
- `--max-iterations N`: hard cap on Phase-4 experiment iterations. Wins over plan and preset.
- `--no-headlines`: disable the Haiku headline ticker (saves ~60 small calls/hour).
- `--no-headlines`: skip the end-of-session Haiku bullet summary (~1 Haiku call per run).

### zo continue

Expand Down
9 changes: 3 additions & 6 deletions docs/cli/build.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,7 @@ You don't pass a flag for this, `zo build` figures it out.
Creates a tmux pane (default) or runs headless (`--no-tmux`). Pastes the Lead Orchestrator's prompt. The Lead reads the full plan, creates the team via `TeamCreate`, and drives execution.
</Step>
<Step title="Monitor">
The Python `LifecycleWrapper` observes via `~/.claude/tasks/` and the tmux pane. Captures lifecycle events, pipes them to JSONL comms log, surfaces Haiku-summarised headlines every 60 seconds in your terminal. Auto-launches the training dashboard split-pane during Phase 4.
The Python `LifecycleWrapper` observes via `~/.claude/tasks/` and the tmux pane. Captures lifecycle events, pipes them to JSONL comms log, prints the live task list and recent agent events in your terminal. Auto-launches the training dashboard split-pane during Phase 4.
</Step>
<Step title="Gate handling">
At every phase boundary, checks artifact contracts and oracle thresholds. Automated gates auto-PROCEED. Blocking gates pause; you approve via `/gates:approve` or reject via `/gates:reject`.
Expand Down Expand Up @@ -117,9 +117,6 @@ While `zo build` is running, you have several windows into what the team is doin
<Card title="tmux pane" icon="window-restore">
`Ctrl-b n` switches to the agent session. Watch the Lead Orchestrator coordinate the team in real time. `Ctrl-b p` returns.
</Card>
<Card title="Headlines" icon="bullhorn">
Every 60 seconds, your invoking terminal gets a 1-line Haiku summary of recent agent events. Lightweight live signal.
</Card>
<Card title="Training dashboard" icon="chart-line">
Auto-launched as a split-pane during Phase 4. Rich Live panel: progress bar, metrics table, sparkline, checkpoints. See `zo watch-training`.
</Card>
Expand All @@ -138,7 +135,7 @@ While `zo build` is running, you have several windows into what the team is doin
| `--low-token` | (off) | Activate the cost-saving preset. See [low-token mode](/concepts/low-token-mode). |
| `--lead-model` | `opus` (or `sonnet` if `--low-token`) | Override the lead orchestrator model: `opus` / `sonnet` / `haiku`. |
| `--max-iterations N` | `10` (or `2` if `--low-token`) | Hard cap on Phase-4 experiment iterations. Wins over plan and preset. |
| `--no-headlines` | (off) | Disable the Haiku headline ticker (saves ~60 small calls/hour). |
| `--no-headlines` | (off) | Skip the end-of-session Haiku bullet summary (~1 Haiku call per run, ~$0.0002). |
| `--bypass-permissions` | (off) | Auto-approve Claude Code tool-call prompts. Implied by `--gate-mode full-auto`. See [Permission prompts](#permission-prompts-bypass-permissions). |

## Examples
Expand Down Expand Up @@ -173,7 +170,7 @@ While `zo build` is running, you have several windows into what the team is doin
```bash
zo build plans/my-project.md --low-token
```
Sonnet lead, Haiku for code-reviewer / test-engineer / oracle-qa, 2 max iterations, no headlines, full-auto gates, earlier auto-compaction. **~30% measured savings** on MNIST (\$7.75 vs ~\$11 default); 50-60% targeted, 70-80% on the [roadmap](/roadmap). See [low-token mode](/concepts/low-token-mode) for the full preset and trade-offs.
Sonnet lead, Haiku for code-reviewer / test-engineer / oracle-qa, 2 max iterations, no end-of-session Haiku summary, full-auto gates, earlier auto-compaction. **~30% measured savings** on MNIST (\$7.75 vs ~\$11 default); 50-60% targeted, 70-80% on the [roadmap](/roadmap). See [low-token mode](/concepts/low-token-mode) for the full preset and trade-offs.
</Accordion>
<Accordion title="Low-token mode with override">
```bash
Expand Down
2 changes: 1 addition & 1 deletion docs/cli/overview.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -77,7 +77,7 @@ Most project-aware commands also accept:
| `--low-token` | Activate the cost-saving preset (Sonnet lead, max-iterations 2, full-auto gates, no headlines, earlier auto-compaction). See [low-token mode](/concepts/low-token-mode). |
| `--lead-model` | Override the lead orchestrator model (`opus`/`sonnet`/`haiku`). Composes with `--low-token`. |
| `--max-iterations N` | Hard cap on Phase-4 experiment iterations. Wins over plan-level `## Experiment Loop` and the low-token preset. |
| `--no-headlines` | Disable the Haiku headline ticker. |
| `--no-headlines` | Skip the end-of-session Haiku bullet summary (~1 small call per run). |
| `--bypass-permissions` | Auto-approve Claude Code tool-call permission prompts. Implied by `--gate-mode full-auto`. Works in both tmux and headless modes. See [build → Permission prompts](/cli/build#permission-prompts-bypass-permissions). |

## Modes of operation
Expand Down
9 changes: 4 additions & 5 deletions docs/concepts/low-token-mode.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -43,7 +43,7 @@ The banner shows a `[low-token]` badge whenever the preset is active so you have
| Phase-4 `max_iterations` | `10` | `2` | Phase 4 is the dominant cost; iteration count is the multiplier. |
| Phase-4 `stop_on_tier` | `must_pass` | `could_pass` | Stops at the weakest acceptable oracle tier instead of pushing for the strongest. |
| Cross-cutting `research-scout` | on every phase | dropped | Saves ~6 spawns and their contracts. |
| Haiku headline ticker | every 60s | disabled | ~60 small calls/hour. Small individually, cumulative across long runs. |
| End-of-session Haiku summary | 1 call per run | disabled | ~$0.0002 per run. The previous per-60-second ticker (~60 small calls/hour) was removed unconditionally; only the one-shot wrap-up remains, and `--low-token` skips even that. |
| Default gate mode | `supervised` | `full-auto` | No human-loop overhead. You can still pass `--gate-mode supervised` to override. |
| `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE` | unset (Claude Code default ~83%) | `60` | Auto-compacts conversation context earlier, prevents performance degradation near the window limit. |

Expand Down Expand Up @@ -109,7 +109,7 @@ In each case, run on default settings, validate the result, then switch on `low_
The features below would help but require a larger architectural change (switching ZO from `claude` CLI launcher to direct Anthropic SDK). They're tracked as future work, not in scope for low-token mode v1:

- **Prompt caching**: Anthropic's 5-minute TTL cache. Would save ~90% on cached input tokens.
- **Batch API**: 50% discount for async jobs. Suitable for the Haiku ticker but incompatible with interactive tmux.
- **Batch API**: 50% discount for async jobs. Applicable to other Haiku/Sonnet calls ZO might add in future (e.g., end-of-phase reports) but incompatible with interactive tmux. The per-60-second Haiku ticker that originally motivated this note has since been removed.
- **Files API**: upload static artifacts (plans, specs) once, reference by ID.
- **Extended thinking budget tuning**: cap thinking tokens explicitly.

Expand All @@ -127,8 +127,7 @@ What changes, end-to-end, between a default run and a low-token run on the same
| Phase 4 stop condition | `must_pass` tier | `could_pass` tier | Stops earlier on weakest acceptable result |
| Cross-cutting `research-scout` | Spawned per phase | Skipped | ~6 spawns × ~1 KB contract |
| Cross-cutting `code-reviewer` | Spawned per phase | Spawned per phase | (kept, quality safety net) |
| Haiku headline ticker | Every 60s | Disabled | ~60 small calls/hour |
| End-of-session Haiku summary | Generated | Skipped | 1 small call per session |
| End-of-session Haiku summary | Generated | Skipped | 1 small call per session. (The previous per-60-second headline ticker — ~60 small calls/hour — was removed unconditionally.) |
| Auto-compaction trigger | ~83% of context window | 60% of context window | Earlier compaction → less degraded reasoning at the tail |

## Worked example: MNIST
Expand Down Expand Up @@ -236,7 +235,7 @@ In each case: run on default settings, validate the result, then switch on `low_
Several Anthropic API features could provide additional savings but require switching ZO from the `claude` CLI launcher to direct Anthropic SDK calls, out of scope for low-token v1:

- **Prompt caching**: 5-minute TTL cache. Would save ~90% on cached input tokens.
- **Batch API**: 50% discount for async jobs. Suitable for the Haiku ticker but incompatible with interactive tmux.
- **Batch API**: 50% discount for async jobs. Applicable to other Haiku/Sonnet calls ZO might add in future (e.g., end-of-phase reports) but incompatible with interactive tmux. The per-60-second Haiku ticker that originally motivated this note has since been removed.
- **Files API**: upload static artifacts (plans, specs) once, reference by ID.
- **Extended thinking budget tuning**: cap thinking tokens explicitly.

Expand Down
5 changes: 3 additions & 2 deletions docs/quickstart.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -99,8 +99,9 @@ The **Lead Orchestrator** reads the plan, decomposes it into phases, builds cont

- A tmux pane with the orchestrator's live session
- Phase progress + gate status streamed to your terminal
- Haiku-summarised headline events every 60 seconds
- Live task list + recent agent events from the comms log
- A live training dashboard (split-pane) once Phase 4 starts
- At the end of the run, a short Haiku-generated bullet summary of what the team accomplished

<Tip>
Press <kbd>Ctrl-b n</kbd> to switch into the agent pane and watch the team work in real time. <kbd>Ctrl-b p</kbd> jumps back.
Expand All @@ -113,7 +114,7 @@ The **Lead Orchestrator** reads the plan, decomposes it into phases, builds cont
zo build plans/mnist-demo.md --low-token
```

Sonnet lead instead of Opus, Haiku for code-reviewer / test-engineer / oracle-qa, 2 Phase-4 iterations instead of 10, full-auto gates, no headlines, earlier auto-compaction. **~30% measured savings** on the MNIST bench (\$7.75 vs ~\$11 default); 50-60% targeted with the Haiku routing + per-phase trims (pending second bench), 70-80% on the [roadmap](/roadmap) via SDK refactor (prompt caching, Batch API, Files API). Slight quality trade at the lead step. See [low-token mode](/concepts/low-token-mode) for the full preset and trade-offs.
Sonnet lead instead of Opus, Haiku for code-reviewer / test-engineer / oracle-qa, 2 Phase-4 iterations instead of 10, full-auto gates, no end-of-session Haiku summary, earlier auto-compaction. **~30% measured savings** on the MNIST bench (\$7.75 vs ~\$11 default); 50-60% targeted with the Haiku routing + per-phase trims (pending second bench), 70-80% on the [roadmap](/roadmap) via SDK refactor (prompt caching, Batch API, Files API). Slight quality trade at the lead step. See [low-token mode](/concepts/low-token-mode) for the full preset and trade-offs.
</Note>

## 6. Verify
Expand Down
5 changes: 2 additions & 3 deletions docs/reference/low-token-preset.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -57,8 +57,7 @@ Authoritative locations: [`src/zo/cli.py`](https://github.com/SamPlvs/zero-opera
| Phase-4 `stop_on_tier` | `must_pass` | `could_pass` | `_LOW_TOKEN_LOOP_CLAMPS` in `experiment_loop.py` |
| Cross-cutting `research-scout` | enabled | disabled | `_agents_for_phase` in `orchestrator.py` |
| Cross-cutting `code-reviewer` | enabled | dropped from Phase 1; **on Haiku** elsewhere | `LOW_TOKEN_PHASE_DROPS` + `LOW_TOKEN_HAIKU_AGENTS` |
| Haiku headline ticker | every 60s | disabled | `_maybe_print_headline` in `cli.py` |
| End-of-session Haiku summary | generated | skipped | `_generate_session_summary` in `cli.py` |
| End-of-session Haiku summary | generated | skipped | `_generate_session_summary` in `cli.py`. (The previous per-60-second `_maybe_print_headline` ticker was removed unconditionally; only the one-shot wrap-up call remains, and `--low-token` skips even that.) |
| Default gate mode | `supervised` | `full-auto` | `_resolve_gate_mode` in `cli.py` |
| Lead prompt: dedicated adaptations section | included | omitted | `build_lead_prompt` in `orchestrator.py` |
| Lead prompt: roster | full descriptive | compact comma-list | `_prompt_roster` in `orchestrator.py` |
Expand All @@ -72,7 +71,7 @@ These compose with `--low-token` to fine-tune individual knobs:
|---|---|---|
| `--lead-model {opus,sonnet,haiku}` | choice | Override the lead model |
| `--max-iterations N` | int | Hard cap on Phase-4 iterations |
| `--no-headlines` | bool | Disable the headline ticker (independent of low-token) |
| `--no-headlines` | bool | Skip the end-of-session Haiku bullet summary (independent of low-token) |
| `--gate-mode {supervised,auto,full-auto}` | choice | Override the gate mode |

## Plan-level fields
Expand Down
33 changes: 33 additions & 0 deletions memory/zo-platform/DECISION_LOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -1082,3 +1082,36 @@ The tmux mechanism (settings.local.json overlay) is novel for ZO and the risk su
- **Single-flag shorthand `--unattended` aliasing both `--gate-mode full-auto` and `--bypass-permissions`** — defer for future polish; the current explicit form is more learnable and the two-flag combo is short enough.

**Outcome:** Single feature commit on branch `claude/bypass-permissions-flag`, PR opened against `main`. validate-docs 10/11 (improved from baseline as the test-count warning resolved). 760 pytest pass / 7 skipped. **No new PRIOR added** — this is a clean feature with no failure trace; the design choices are auditable here. Power-user usage: `zo continue --repo /path/to/prod-001 --gate-mode full-auto` for an unattended overnight tmux run with full agent-team visibility AND no permission prompts.

---

## Decision: 2026-05-28T20:00:00Z
**Type:** FEATURE-REMOVAL + BEHAVIOR-CHANGE
**Title:** Remove the per-60-second Haiku headline ticker (keep the end-of-session bullet summary)

**Decision:** Remove the periodic Haiku headline ticker from `zo build` / `zo continue` runs. The ticker spawned `claude -p --model haiku` every 60 seconds throughout a session to summarise the last 15 buffered events into a one-line headline printed in the lead pane. User feedback: nobody uses the headlines. Cost: ~60 subprocess spawns per hour at ~$0.0001-$0.0003 each, totalling ~$0.06-$0.18 per 10-hour overnight run, plus the latency/CPU cost of repeatedly spawning a Claude Code subprocess. Pure waste if the output is going unread.

The companion `_generate_session_summary` call at session end is **kept** — it's a single Haiku call per run (~$0.0002) that prints a 2-3 bullet wrap-up of what the team accomplished. Useful, cheap, one-shot. User explicitly confirmed keeping this piece.

Scope (single PR to `main`):
- **`src/zo/cli.py`** — `_launch_and_monitor`: deleted `_maybe_print_headline` function (~30 lines), the `_last_headline_time` and `_headline_interval` timer variables, and the `_maybe_print_headline()` call inside `_print_status`. Kept `_headline_buffer` (still consumed by `_generate_session_summary`) and all the `_headline_buffer.append(...)` event-capture calls in `_print_status`. Kept `_generate_session_summary` and its end-of-session invocation. Updated `headlines_disabled` parameter docstring + the auxiliary-call comment to reflect the narrowed meaning. Updated `_LOW_TOKEN_PRESET["headlines_disabled"]` inline comment from "disable Haiku ticker (~60 calls/hr)" to "skip end-of-session Haiku summary (~1 call/session)". Updated `--no-headlines` flag help text on both `build` and `continue` commands.
- **`docs/cli/build.mdx`** — Step 5 monitoring sentence rewritten ("surfaces Haiku-summarised headlines every 60 seconds" → "prints the live task list and recent agent events"), "Headlines" Live-monitoring `<Card>` removed entirely, options table `--no-headlines` row updated, `--low-token` accordion text updated.
- **`docs/cli/overview.mdx`** — shared options table `--no-headlines` row updated.
- **`docs/quickstart.mdx`** — "What you'll see" bullet list rewritten + new end-of-session bullet added, `--low-token` Note text updated.
- **`docs/COMMANDS.md`** — `--low-token` and `--no-headlines` lines updated.
- **`docs/concepts/low-token-mode.mdx`** — two preset-comparison tables collapsed the "Haiku headline ticker" row into the existing "End-of-session Haiku summary" row with an explanatory note that the ticker has been removed unconditionally. Updated the Batch-API forward-looking note that referenced the (now-removed) ticker.
- **`docs/reference/low-token-preset.mdx`** — preset table same collapse, `--no-headlines` override-flag entry updated.

**Rationale:** This is the cheapest large-payoff cleanup left on the cost-reduction roadmap. The ticker output was visible (one line every minute in the lead pane) but redundant — the lead pane already shows the live task list, agent events, and gate decisions in real time. The Haiku summary added decoration, not signal. Removing it cuts ~$0.06-$0.18 per overnight run AND removes 60/hr subprocess churn, with no observed user impact (per user-confirmed "no one is using this feature"). Keeping the end-of-session wrap-up preserves the genuinely useful one-shot summary at session close without re-introducing the per-tick cost.

The `--no-headlines` flag is preserved (not removed) for backwards compatibility. Its meaning narrows from "disable both the ticker and the end-of-session summary" to "disable the end-of-session summary only" — the ticker is gone for everyone now, with or without the flag.

**Alternatives considered:**
- **Remove the end-of-session summary too** — rejected. User explicitly wanted to keep it. Cost is trivial (~$0.0002/run), and the bullet wrap-up is genuinely useful at session close.
- **Remove the `--no-headlines` flag entirely** — rejected. Backwards-compatibility concern; the flag still has a meaningful (if narrower) effect. Keeping it is one line with positive UX value.
- **Move the ticker off Haiku to a non-LLM summariser** — rejected. The redundancy with the live task list pane is the actual problem, not the Haiku call. A non-LLM summary would still be ignored noise.
- **Make the ticker opt-in via a `--headlines` flag** — rejected. Adds complexity for a feature with zero confirmed users. If demand surfaces later, easy to re-add.

**Outcome:** Single feature-removal commit on branch `claude/remove-haiku-ticker`, PR opened against `main`. After rebase onto main (which now includes PR #92), pytest 760 passed / 7 skipped on both Python 3.11 and 3.12. ruff `src/` clean. validate-docs clean. Full CI-equivalent matrix run locally per PR-039 protocol before push. **No new PRIOR added** — this is a clean feature-removal driven by user feedback, not a failure-trace self-evolution.

**Cross-reference:** PR-039 (pre-push verification must mirror full CI matrix) — this PR's verification followed that protocol end-to-end. Related to session 024's `--low-token` design where `headlines_disabled` was first introduced as a cost-saving preset field; this PR extends that thinking to the unconditional case (everyone gets the cost saving, not just `--low-token` users).
Loading
Loading