reduce ceremony and make verification/review gates stricter #7

## Context
  A real Codex `/goal` dogfood run used Planr to build and deploy a polished Phaser browser game, including generated assets, live browser verification, follow-up fixes, and a Cloudflare deploy.

  The retrospective verdict was **mixed**. Planr helped the original broad run by providing durable task state, dependency ordering, evidence logs, reviews, and an auditable goal contract. However, it also added ceremony during smaller polish/debug follow-ups and missed important product-level bugs.

  This issue tracks the concrete product gaps found during that run.
  ## What Worked
  - Planr made the original broad goal more reliable than plain Codex `/goal`.
  - The map turned the broad task into ordered slices.
  - `planr pick`, `planr done`, logs, and reviews preserved handoff evidence.
  - `planr plan audit <plan-id> --json` prevented premature completion for the original goal because the stored goal contract required live browser
  verification.
  - If the session had crashed, Planr would have preserved item state, dependency links, logs, reviews, approvals, context, and screenshot/artifact paths.
  ## Problems Found
  ### 1. Verification was not consistently binding across follow-up plans
  The original goal audit required live browser evidence because a goal contract existed.
  A later follow-up plan could report `holds: true` even when:
  ```json
  "verification_logged": {
    "pass": false,
    "required": false
  }
  ```
  This creates an unsafe distinction:
  - broad `/goal` plans can require verification,
  - follow-up plans can appear done without verification unless a new contract is created.
  For autonomous work, this is too easy to miss.
  ### 2. Browser capability context was not used
  The run used Codex Browser, but not because Planr selected it from capability state.
  Evidence from the retrospective:
  ```bash
  planr context list --tag capability
  ```
  returned:
  ```json
  {"contexts":[]}
  ```
  Browser use happened because the user prompt and plan/contract said to use Browser, not because a pinned Planr capability guided the worker or reviewer.
  Planr should make capability capture more automatic when the user says things like:
  - use Codex browser plugin
  - use browser-harness
  - use Playwright
  - use iOS simulator
  - use hatch-pet / image generation
  ### 3. Reviews could become ceremonial
  Some review artifacts had:
  ```text
  Source content included: false
  ```
  and no findings, yet review gates completed.
  That weakens the value of review gates. If a review does not inspect source, changed files, logs, and verification evidence, it should either:
  - close as `unclear`, or
  - be skipped as low-signal ceremony.
  ### 4. Small follow-up polish created too much overhead
  For small visual polish/debug loops, Planr sometimes produced too many items and repeated `done --review` / `review close` cycles.
  This made Planr feel slower than a plain Codex `/goal` loop for minor UI polish.
  Planr needs a lighter follow-up mode for small bug/polish/deploy tasks.
  ### 5. Semantic product bugs were not caught
  A real product bug escaped Planr review and verification: the leaderboard submitted `bestScore` instead of the score from the run that had just failed. The user caught it manually during live use.
  Planr’s smoke tests, browser verification, reviews, and audits did not catch this semantic acceptance bug.
  The issue is not that Planr should magically know every product bug, but that verification synthesis was too generic. The browser verifier should derive
  specific assertions from acceptance criteria.
  ## Proposed Fixes
  ### A. Make autonomous follow-up plans require a goal contract or verification gate
  When `$planr-loop` runs against a plan without a stored `goal-contract`, it should not silently treat verification as optional.
  Possible behavior:
  - If no contract exists, create/store one in iteration 1 using the plan’s verification section.
  - Or fail with a clear instruction to run `$planr-goal` / create a contract first.
  - Or mark `verification_logged.required = true` for all plans run by `$planr-loop`.
  Acceptance criteria:
  - A plan executed by `$planr-loop` cannot audit as complete without required verification evidence.
  - `plan audit` clearly explains when verification is skipped because no contract exists.
  - Follow-up bug/deploy plans have binding verification by default when run autonomously.
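
One way to read the third option is as a small change to how the audit decides `holds`. The sketch below is illustrative only: `verification_logged`, `pass`, `required`, and `holds` come from the audit output quoted earlier, while `audit_holds`, `has_goal_contract`, and `autonomous` are hypothetical names, not Planr's actual API.

```python
from dataclasses import dataclass


@dataclass
class VerificationCheck:
    """Mirrors the `verification_logged` object from the audit JSON."""
    passed: bool    # the "pass" field ("pass" is a Python keyword)
    required: bool  # the "required" field


def audit_holds(check: VerificationCheck, has_goal_contract: bool,
                autonomous: bool) -> tuple[bool, str]:
    """Return (holds, explanation) for a plan audit.

    Hypothetical rule: an autonomous run always requires verification,
    whether or not a goal contract is stored.
    """
    required = check.required or autonomous
    if required and not check.passed:
        return False, "verification required but no passing evidence logged"
    if not required and not has_goal_contract:
        # Still holds, but the audit says why verification was skipped.
        return True, "verification skipped: no goal contract stored"
    reason = "verification satisfied" if check.passed else "verification not required"
    return True, reason
```

Under this rule, the follow-up plan from Problem 1 (`pass: false`, `required: false`) would no longer hold when run autonomously, and a manual run without a contract would hold only with an explicit "skipped" explanation.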
  ### B. Capture requested capabilities during goal prep
  `$planr-goal` should detect user-requested tooling and record it as capability context.
  Examples:
  ```bash
  planr context add "web verification, Codex browser plugin, invoke via Codex browser tools" --tag capability
  planr context add "asset generation, use /hatch-pet for player character graphics" --tag capability
  ```
  Acceptance criteria:
  - If the user says “use Codex browser plugin”, `$planr-goal` records a `capability` context.
  - `$planr-verify-web` reads `planr context list --tag capability` before discovery.
  - Reviewers can see which capability was intended and whether it was actually used.
  - Capability contexts are included in summary/audit output where relevant.
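
A minimal sketch of the detection step, assuming simple phrase matching is good enough for a first pass. The trigger phrases are the ones listed under Problem 2 and the `planr context add ... --tag capability` form is the one shown above. `CAPABILITY_HINTS`, `capability_commands`, and the capability wording for everything except the two examples above are hypothetical.

```python
import shlex

# Hypothetical phrase -> capability text mapping. Only the first and last
# capability strings appear in this issue; the others are illustrative.
CAPABILITY_HINTS = {
    "codex browser": "web verification, Codex browser plugin, invoke via Codex browser tools",
    "browser-harness": "web verification, browser-harness",
    "playwright": "web verification, Playwright",
    "ios simulator": "mobile verification, iOS simulator",
    "hatch-pet": "asset generation, use /hatch-pet for player character graphics",
}


def capability_commands(prompt: str) -> list[str]:
    """Return one `planr context add` command per capability hint in the prompt."""
    lowered = prompt.lower()
    return [
        f"planr context add {shlex.quote(text)} --tag capability"
        for phrase, text in CAPABILITY_HINTS.items()
        if phrase in lowered
    ]
```

The function only builds command strings, so goal prep could show them to the user for confirmation before anything is recorded.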
  ### C. Synthesize acceptance-specific verification steps
  For web goals, `$planr-goal` or `$planr-verify-web` should derive concrete browser assertions from acceptance criteria.
  Example for a game:
  - start the game,
  - jump,
  - observe score increasing,
  - collide,
  - verify game over,
  - submit or inspect leaderboard score,
  - restart,
  - verify generated assets are visible.
  Acceptance criteria:
  - Browser verification logs include actions and assertions, not only “opened app and checked screenshot.”
  - Verification explicitly covers user-visible acceptance criteria.
  - The verification summary is detailed enough for a reviewer to replay or challenge it.
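
The game example above could be expressed as action/assertion pairs rather than a single screenshot check. This is a sketch under assumptions: `Step`, `STEPS`, `run_steps`, and the observed-state keys are hypothetical, and a real verifier would fill the state from the live page.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    action: str                    # what the verifier does in the browser
    assertion: str                 # what it must observe afterwards
    check: Callable[[dict], bool]  # predicate over the observed state


# Hypothetical state keys; values would come from the running game.
STEPS = [
    Step("start the game and jump", "score increases",
         lambda s: s["score_after_jump"] > s["score_at_start"]),
    Step("collide with an obstacle", "game over is shown",
         lambda s: s["game_over_visible"]),
    Step("submit to the leaderboard", "submitted score equals this run's score",
         lambda s: s["submitted_score"] == s["run_score"]),
]


def run_steps(state: dict) -> list[dict]:
    """Evaluate every step and return one replayable log entry per step."""
    return [
        {"action": st.action, "assertion": st.assertion, "pass": bool(st.check(state))}
        for st in STEPS
    ]
```

With made-up values such as `run_score` of 40 and `submitted_score` of 120 (a stored best score), the third entry logs `"pass": False`, which is the kind of signal that would have surfaced the `bestScore` bug.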
  ### D. Make review gates source-aware or skip them
  Improve review discipline so `complete` means the reviewer actually inspected evidence.
  Possible behavior:
  - `planr review close --verdict complete` warns or fails when the review has no source/evidence inspection.
  - Review artifacts should record:
    - changed files inspected,
    - commands rerun,
    - verification evidence checked,
    - whether source content was included.
  - Low-signal review gates should be avoided by default for trivial polish.
  Acceptance criteria:
  - A `complete` review is always distinguishable from a ceremonial approval.
  - Reviews without source/evidence inspection close as `unclear` or produce a warning.
  - `$planr-loop` requests fewer review gates for small low-risk follow-ups.
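
A rough sketch of the close-time check, assuming the review artifact records the four items listed above. `ReviewArtifact`, its field names, and `close_verdict` are hypothetical. Only the `complete` and `unclear` verdicts and the "source content included" flag come from this issue.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewArtifact:
    # Hypothetical field names for what a review should record.
    changed_files_inspected: list[str] = field(default_factory=list)
    commands_rerun: list[str] = field(default_factory=list)
    verification_evidence_checked: bool = False
    source_content_included: bool = False


def close_verdict(requested: str, artifact: ReviewArtifact) -> tuple[str, list[str]]:
    """Downgrade `complete` to `unclear` when the review inspected nothing."""
    warnings = []
    if not artifact.source_content_included:
        warnings.append("source content was not included")
    if not artifact.changed_files_inspected:
        warnings.append("no changed files inspected")
    if not artifact.verification_evidence_checked:
        warnings.append("verification evidence not checked")
    if requested == "complete" and warnings:
        return "unclear", warnings
    return requested, warnings
```

Whether a missing item should downgrade the verdict or only warn is a policy choice. The sketch takes the stricter option so that `complete` always implies inspection.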
  ### E. Add lightweight follow-up mode
  Planr should support small post-goal work without forcing a full heavy planning cycle.
  Example prompt:
  ```text
  Use $planr. Add follow-up tasks for two bugs and a deploy, then continue.
  ```
  Expected behavior:
  - Reuse the existing project.
  - Create a small follow-up plan or parent gate.
  - Add 2-4 map items.
  - Link deploy after fixes and verification.
  - Request approval for deploy.
  - Avoid excessive review gates unless risk warrants them.
  Acceptance criteria:
  - Follow-up bug/polish/deploy work is easy to add from a single prompt.
  - The old completed goal remains intact.
  - The live map shows old and new scope together.
  - Follow-up work still has evidence and verification, but less ceremony.
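
The "link deploy after fixes and verification" behavior amounts to a small dependency graph. The sketch below uses Python's standard `graphlib` to illustrate the ordering. The item names, `FOLLOW_UP`, `NEEDS_APPROVAL`, and `execution_order` are hypothetical and say nothing about how Planr stores its map.

```python
from graphlib import TopologicalSorter

# Hypothetical follow-up plan: each item maps to the items it depends on.
FOLLOW_UP = {
    "fix-bug-1": set(),
    "fix-bug-2": set(),
    "verify": {"fix-bug-1", "fix-bug-2"},
    "deploy": {"verify"},
}

# Items that must wait for explicit user approval before running.
NEEDS_APPROVAL = {"deploy"}


def execution_order(items: dict[str, set[str]]) -> list[str]:
    """Return the items in an order that respects every dependency link."""
    return list(TopologicalSorter(items).static_order())
```

Four items with a single approval gate on `deploy` stay within the 2-4 item budget while still putting verification ahead of the deploy.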
  ## Non-Goals
  - Do not replace Codex `/goal`.
  - Do not make Planr another loop engine.
  - Do not require Planr to ship browser tooling.
  - Do not require all work to have heavy independent reviews.
  - Do not make every tiny polish change a full product plan.
  ## Core Positioning
  Codex `/goal` is the loop engine.
  Planr should be the durable state, evidence, verification, review, approval, and recovery layer around that loop.
  This dogfood run supports that positioning, but also shows Planr must reduce ceremony and make its gates more meaningful.
