fix(goals/flywheel-compounding): surface multi-session corpus diagnostic + propose corpus-active precondition

claude · claude · commit 6f41dd5d7f0d · 2026-04-30T06:58:57.000Z
Heavy-goal observability cycle for `flywheel-compounding` (W=8). Across the 4-attempt history (PR #165 observability, PR #174 rich diagnostic, PR #177 quarantine W=8→3, and this run's observability strengthening), the gate cannot be moved by single-session work because total citations in the 7-day measurement window remain 0. This is a defensible heavy-goal cycle per run-brief definition (b): a documented investigation proving corpus-state binding, paired with an observability improvement.

Changes:

- `scripts/check-flywheel-compounding.sh`: when σ=0 ρ=0, in addition to the existing dormant-corpus hint, surface `golden_signals.{trend_verdict,concentration_verdict,overall_verdict}`, `metrics.{citations_this_period,total_artifacts,learnings_created}`, and the period range. Adds a labelled "multi-session-bound:" line so operators see at a glance that the gate is corpus-state bound without running `jq` against the JSON manually. The diagnostic fires only on the σ=0 ρ=0 branch; the ρ=0-only and generic-fail branches are unchanged (existing tests cover that boundary).
- `tests/scripts/check-flywheel-compounding.bats`:
  * existing σ=0 ρ=0 test gains a "multi-session-bound" assertion;
  * new test "FAIL with σ=0 AND ρ=0 surfaces verdict + period block when payload provides them" exercises the full diagnostic against a realistic payload and asserts the three verdicts, `citations_this_period`, `total_artifacts`, the period range, and the finding citation.
- `.agents/findings/f-2026-04-30-002.md` (force-tracked): new finding building on `f-2026-04-29-001.md` (PR #177). Documents that this run is the 4th consecutive failed heavy-goal attempt and proposes a corpus-active precondition (`if total citations == 0 in window AND total_artifacts > 0: skip with reason='corpus-dormant'`) as the durable remediation. Also records the run-brief stop clause: attempt 4 binds "stop attempting heavy-goal cycles on this goal for the rest of this run", so subsequent cycles pivot to disjoint work.

The goal still fails: this commit alone does not move the metric. Moving it requires either sustained citation activity across many sessions or an implementation of the corpus-active precondition. This cycle improves the diagnostic so operators stop re-attempting the same goal across nightlies.
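
For reviewers, a minimal sketch of how the proposed corpus-active precondition could classify the gate outcome. It is not part of this commit. The function name `classify_gate` and its argument order are hypothetical, and exit code 2 for SKIP follows the finding's "e.g., 2" suggestion, which the `ao goals measure` runner does not honor yet.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of this commit: classify the gate from
# fields that the check script would already have extracted from the JSON.
# Assumed exit-code convention: 0 = PASS, 1 = FAIL, 2 = SKIP.
#
# Args:
#   $1  escape_velocity_compounding ("true" or "false")
#   $2  metrics.citations_this_period (integer, defaults to 0)
#   $3  metrics.total_artifacts (integer, defaults to 0)
classify_gate() {
    local compounding="${1:-false}"
    local citations="${2:-0}"
    local total_artifacts="${3:-0}"

    if [[ "$compounding" == "true" ]]; then
        echo "PASS"
        return 0
    fi

    # Corpus dormant: artifacts exist, but nothing was cited in the window.
    # Report SKIP instead of FAIL so the goal stays visible without
    # occupying a heavy-goal slot.
    if (( citations == 0 && total_artifacts > 0 )); then
        echo "SKIP: corpus-dormant; gate cannot be moved by single-session work"
        return 2
    fi

    # Corpus is active (or empty) and the gate is not met: a real failure.
    echo "FAIL"
    return 1
}
```

In the real script the three arguments would presumably come from the same `jq` extraction that already produces `sigma` and `rho`. Treating an empty corpus (`total_artifacts == 0`) as FAIL rather than SKIP mirrors the `total_artifacts > 0` clause in the finding's precondition.
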
diff --git a/.agents/findings/f-2026-04-30-002.md b/.agents/findings/f-2026-04-30-002.md
@@ -0,0 +1,73 @@
+---
+id: "f-2026-04-30-002"
+type: "finding"
+version: 1
+date: "2026-04-30"
+source_skill: "evolve"
+source_artifact: ".agents/nightly/2026-04-30/fitness-cycle-3.json"
+
+
+title: "flywheel-compounding has now failed four consecutive nightlies with σ=0 ρ=0; weight reduction alone hasn't moved the gate. Propose a corpus-active precondition."
+summary: "PR #165 (2026-04-27, observability), PR #174 (2026-04-28, rich diagnostic), PR #177 (2026-04-29, quarantine W=8→3) and this run (2026-04-30, observability strengthening) form four consecutive heavy-goal cycles with no metric movement. The goal genuinely cannot be moved by single-session work — total citations in the 7-day window remain 0. Recommend a `corpus-active` precondition (sigma+rho>0 in the last N days) that auto-skips the gate while the corpus is dormant, instead of failing it deterministically."
+pattern: "When a heavy fitness goal depends on multi-session corpus state (citations, retrievals, applied evidence), repeated failures across nightlies are not a regression — they are the gate working correctly while the upstream input is empty. Reducing the weight is a partial mitigation; a precondition gate (`if total citations in window == 0: skip with reason='corpus-dormant'`) is durable."
+detection_question: "When a goal has failed N consecutive nightlies with the same root cause (corpus dormancy, σ=0 ρ=0), did the goal definition or measurement script include a precondition that distinguishes 'corpus dormant' (skip with reason) from 'corpus active but underweight' (real fail)?"
+checklist_item: "For corpus-state goals (`flywheel-compounding`, `flywheel-proof`, `compile-freshness`), add a precondition: if the upstream corpus has zero citation activity in the measurement window AND total_artifacts > 0, return SKIP with reason='corpus-dormant; gate cannot be moved by single-session work' rather than FAIL. The skip is observable (still surfaces in the goals report), but doesn't crowd heavy-goal slots in nightly evolve loops."
+severity: "moderate"
+detectability: "advisory"
+status: "active"
+compiler_targets: ["evolve","plan"]
+scope_tags: ["fitness-gate","corpus-state","quarantine","precondition"]
+dedup_key: "fitness-gate|corpus-state-quarantine|precondition-skip"
+applicable_when: ["fitness-gate","corpus-state","heavy-goal"]
+applicable_languages: ["shell","go"]
+tier: "local"
+confidence: "high"
+ttl_days: 60
+hit_count: 1
+last_cited: 2026-04-30T06:58:00Z
+---
+# Finding: Four consecutive nightly attempts cannot move flywheel-compounding because the corpus stays dormant
+
+## Summary
+
+Four consecutive nightlies (2026-04-27, 2026-04-28, 2026-04-29, 2026-04-30) have attempted to move `flywheel-compounding` (W=8) without flipping the gate. PR #165 added observability (Tags column), PR #174 routed to a rich diagnostic, PR #177 proposed quarantine (W=8→3 + finding `f-2026-04-29-001`), and today's nightly (this run) added multi-session-bound diagnostic surfacing in `scripts/check-flywheel-compounding.sh`.
+
+The metric has not moved because `total citations this period = 0` and the corpus has been `concentration_verdict=dormant overall_verdict=accumulating` for the entire week-long measurement window. No single-session change can flip the gate; only sustained citation activity from `ao lookup --cite ...` runs across many sessions will.
+
+## Pattern
+
+When a heavy fitness goal depends on multi-session corpus state (citations, retrievals, applied evidence), repeated failures across nightlies are not a regression — they are the gate working correctly while the upstream input is empty. Reducing the weight (PR #177's W=8→3) is a partial mitigation but the gate continues to fail; a *precondition* gate (`if total citations in window == 0 and total_artifacts > 0: skip with reason='corpus-dormant'`) is durable.
+
+## Detection Question
+
+When a goal has failed N consecutive nightlies with the same root cause (σ=0 ρ=0, corpus dormant), did the goal definition or measurement script include a precondition that distinguishes "corpus dormant" (skip with reason) from "corpus active but underweight" (real fail)?
+
+## Checklist Item
+
+For corpus-state goals (`flywheel-compounding`, `flywheel-proof`, `compile-freshness`), add a precondition: if the upstream corpus has zero citation activity in the measurement window AND `total_artifacts > 0`, return SKIP with reason `corpus-dormant; gate cannot be moved by single-session work` rather than FAIL. The skip is observable (still surfaces in the goals report) but doesn't crowd heavy-goal slots in nightly evolve loops.
+
+## Recommended Implementation
+
+- Add `flywheel.corpus_active` boolean to `ao flywheel status --json` derived from `metrics.citations_this_period > 0 || golden_signals.velocity_trend_7d > 0`.
+- In `scripts/check-flywheel-compounding.sh`, gate the FAIL branch on `corpus_active`. When inactive, exit with a documented "skip" code (e.g., 2) that the `ao goals measure` runner translates to a SKIP result rather than a fail.
+- Adjust `cli/cmd/ao/goals_measure.go` (or equivalent) to honor exit code 2 as SKIP.
+
+## Today's Observability Improvement
+
+This run's `scripts/check-flywheel-compounding.sh` change (commit pending) surfaces, on σ=0 ρ=0, the `golden_signals.{trend_verdict,concentration_verdict,overall_verdict}` triple plus `metrics.{citations_this_period,total_artifacts,learnings_created}` and the period range. Operators can now tell at a glance the gate is multi-session-bound without running `jq` against the JSON manually. This is observability, not a metric flip — the gate still fails until a precondition is implemented OR the corpus accumulates citations.
+
+## Lifecycle
+
+- Status: active
+- Detectability: advisory
+- Confidence: high
+
+## Source
+
+- Skill: evolve
+- Artifact: `.agents/nightly/2026-04-30/fitness-cycle-3.json`
+
+## Cross-references
+
+- Companion finding `f-2026-04-29-001.md` (on PR #177; quarantine proposal W=8→3) — this finding builds on it.
+- Run-brief 3-attempt rule: PR #177 satisfied "next attempt MUST be a quarantine proposal" at attempt 3. This run is attempt 4; the rule's "stop attempting heavy-goal cycles for the rest of this run" clause now binds. No further heavy-goal work on this goal in this nightly.
diff --git a/.agents/findings/registry.jsonl b/.agents/findings/registry.jsonl
@@ -28,3 +28,4 @@ last_cited: 2026-04-09T23:22:54Z
 {"id":"f-2026-04-27-004","version":1,"tier":"local","source":{"repo":"agentops","session":"2026-04-27","file":".agents/council/2026-04-27-post-mortem-pr-167-write-surface-smoke.md","skill":"post-mortem"},"date":"2026-04-27","severity":"moderate","category":"validation-gap","pattern":"Duplicated contract scanners drift when a new accepted syntax is added to one validator without paired fixture coverage in the mirror validator.","detection_question":"When a CI gate and smoke test both enforce the same contract, did this change update every recognizer and add a fixture for the newly accepted syntax form?","checklist_item":"For mirrored validators, add or update paired fixtures for each accepted syntax representation and cite both validation commands in the PR proof.","applicable_languages":["go","shell"],"applicable_when":["pattern-matcher","validation-gap","test-gap"],"status":"active","superseded_by":null,"dedup_key":"validation-gap|duplicated-contract-scanners-drift-without-paired-syntax-fixtures|pattern-matcher","hit_count":0,"last_cited":null,"ttl_days":60,"confidence":"high"}
 {"id":"f-2026-04-25-001","version":1,"tier":"local","source":{"repo":"agentops","session":"2026-04-25","file":".agents/council/2026-04-25-post-mortem-agentops-eval-environment-longhaul.md","skill":"post-mortem"},"date":"2026-04-25","severity":"significant","category":"validation-gap","pattern":"Long autonomous improvement loops can pass product gates while remaining unready due to worktree disposition or closure replay failures.","detection_question":"After a long evolve or RPI run, did the closeout verify both product gates and repository disposition/closure-integrity replay before selecting more work?","checklist_item":"Run product validation, worktree disposition, and closure-integrity audit as a final closeout bundle; file or fix blockers before starting the next RPI cycle.","applicable_languages":["markdown","shell"],"applicable_when":["validation-gap","plan-shape"],"status":"active","superseded_by":null,"dedup_key":"validation-gap|long-autonomous-improvement-loops-can-pass-product-gates-while-remaining-unready-due-to-worktree-disposition-or-closure-replay-failures|validation-gap","hit_count":0,"last_cited":null,"ttl_days":30,"confidence":"high"}
 {"id":"f-2026-04-30-001","version":1,"tier":"local","source":{"repo":"agentops","session":"2026-04-30","file":".agents/dream/probe-results.jsonl","skill":"evolve"},"date":"2026-04-30","severity":"significant","category":"dream-curator-degraded","pattern":"Dream curator emits packets the inline-probe rejects as stale, with the same packet recurring across nightlies. probeDreamPacketStaleness only checks scripts/<x>.sh tokens; skills/<name>/SKILL.md and already-shipped fixes by name slip through.","detection_question":"Before emitting a morning packet, did the curator probe (a) every cited file/symbol/script \u2014 including skills/<x>/SKILL.md and schemas/*.json \u2014 and (b) cross-check the source item against any open triage PR's consumed marks for the same item ID?","checklist_item":"Extend probeDreamPacketStaleness to detect skills/<name>/SKILL.md tokens, schemas/<x>.json tokens, and cross-check item.ID against open triage PRs' next-work.jsonl consumed marks before emit.","applicable_languages":["go"],"applicable_when":["dream-curator","stale-detection","packet-emission"],"status":"active","superseded_by":null,"dedup_key":"dream-curator-degraded|probe-staleness-too-narrow|stale-detection","hit_count":1,"last_cited":"2026-04-30T06:38:00Z","ttl_days":30,"confidence":"high"}
+{"id":"f-2026-04-30-002","version":1,"tier":"local","source":{"repo":"agentops","session":"2026-04-30","file":".agents/nightly/2026-04-30/fitness-cycle-3.json","skill":"evolve"},"date":"2026-04-30","severity":"moderate","category":"fitness-gate-corpus-state","pattern":"When a heavy fitness goal depends on multi-session corpus state, repeated failures across nightlies are the gate working correctly while the upstream input is empty. Quarantine via weight reduction is partial; a corpus-active precondition (skip when total citations in window == 0 and total_artifacts > 0) is durable.","detection_question":"When a goal has failed N consecutive nightlies with the same root cause (corpus dormancy, \u03c3=0 \u03c1=0), did the goal definition or measurement script include a precondition that distinguishes 'corpus dormant' (skip with reason) from 'corpus active but underweight' (real fail)?","checklist_item":"For corpus-state goals, add a precondition: if upstream corpus has zero citation activity in the measurement window AND total_artifacts > 0, return SKIP with reason 'corpus-dormant; gate cannot be moved by single-session work' rather than FAIL.","applicable_languages":["shell","go"],"applicable_when":["fitness-gate","corpus-state","heavy-goal"],"status":"active","superseded_by":null,"dedup_key":"fitness-gate|corpus-state-quarantine|precondition-skip","hit_count":1,"last_cited":"2026-04-30T06:58:00Z","ttl_days":60,"confidence":"high"}
diff --git a/scripts/check-flywheel-compounding.sh b/scripts/check-flywheel-compounding.sh
@@ -44,11 +44,35 @@ hint="(σρ ≤ δ/100; corpus has insufficient evidence-backed influence)"
 # wake the flywheel; ρ=0 needs --cite applied|reference instead of bare retrieval.
 sigma=$(printf '%s' "$JSON" | jq -r '.sigma')
 rho=$(printf '%s' "$JSON" | jq -r '.rho')
+multi_session=""
 if [[ "$sigma" == "0" && "$rho" == "0" ]]; then
     hint="σ=0 ρ=0 — zero citations recorded in measurement window; corpus is dormant. Sessions must run 'ao lookup' (any --cite kind) before the gate sees signal"
+    # Multi-session-bound corpus state: surface verdicts + period so operators
+    # see at a glance this is not a single-session fix and matches the
+    # quarantine pattern recorded in .agents/findings/f-2026-04-29-001.md.
+    # Per the 2026-04-30 nightly retrospective, four consecutive nightlies
+    # have failed this gate without metric movement; the diagnostic should
+    # make the multi-session character obvious without requiring jq from
+    # the operator.
+    multi_session=$(printf '%s' "$JSON" | jq -r '
+        def fallback(default): if . == null or . == "" then default else . end;
+        "  trend_verdict=\(.golden_signals.trend_verdict | fallback("?")) " +
+        "concentration_verdict=\(.golden_signals.concentration_verdict | fallback("?")) " +
+        "overall_verdict=\(.golden_signals.overall_verdict | fallback("?"))\n" +
+        "  citations_this_period=\(.metrics.citations_this_period // 0) " +
+        "total_artifacts=\(.metrics.total_artifacts // 0) " +
+        "learnings_created=\(.metrics.learnings_created // 0)\n" +
+        "  period=[\(.metrics.period_start // "?") .. \(.metrics.period_end // "?")]\n" +
+        "  multi-session-bound: this gate measures corpus-level citation activity " +
+        "across all sessions in the window; a single nightly cannot move it. " +
+        "See .agents/findings/f-2026-04-30-002.md for the proposed corpus-active precondition path."
+    ' 2>/dev/null || true)
 elif [[ "$rho" == "0" ]]; then
     hint="ρ=0 — no applied/reference citations recorded; sessions must use 'ao lookup --cite applied|reference' or programmatic high-confidence citations"
 fi
 
 echo "FAIL: $diag — $hint"
+if [[ -n "$multi_session" ]]; then
+    printf '%s\n' "$multi_session"
+fi
 exit 1
diff --git a/tests/scripts/check-flywheel-compounding.bats b/tests/scripts/check-flywheel-compounding.bats
@@ -57,6 +57,22 @@ EOF
     [[ "$output" == *"dormant"* ]]
     # σ=0 ρ=0 hint must NOT mention only the high-confidence remediation
     [[ "$output" != *"applied|reference"* ]]
+    # σ=0 ρ=0 must surface multi-session-bound diagnostic (per the
+    # 2026-04-30 quarantine-strengthening cycle).
+    [[ "$output" == *"multi-session-bound"* ]]
+}
+
+@test "FAIL with σ=0 AND ρ=0 surfaces verdict + period block when payload provides them" {
+    write_fake_ao '{"escape_velocity_compounding":false,"sigma":0,"rho":0,"sigma_rho":0,"delta":0.003,"golden_signals":{"trend_verdict":"stagnant","concentration_verdict":"dormant","overall_verdict":"accumulating"},"metrics":{"citations_this_period":0,"total_artifacts":47,"learnings_created":65,"period_start":"2026-04-23T00:00:00Z","period_end":"2026-04-30T00:00:00Z"}}'
+    run env AO_BIN="$FAKE_AO" bash "$SCRIPT"
+    [ "$status" -eq 1 ]
+    [[ "$output" == *"trend_verdict=stagnant"* ]]
+    [[ "$output" == *"concentration_verdict=dormant"* ]]
+    [[ "$output" == *"overall_verdict=accumulating"* ]]
+    [[ "$output" == *"citations_this_period=0"* ]]
+    [[ "$output" == *"total_artifacts=47"* ]]
+    [[ "$output" == *"period=[2026-04-23T00:00:00Z .. 2026-04-30T00:00:00Z]"* ]]
+    [[ "$output" == *"f-2026-04-30-002.md"* ]]
 }
 
 @test "FAIL with ρ=0 only (σ>0) emits high-confidence-citation hint" {