
Commit 6f41dd5

fix(goals/flywheel-compounding): surface multi-session corpus diagnostic + propose corpus-active precondition
Heavy-goal observability cycle for `flywheel-compounding` (W=8). Across the four-attempt history (PR #165 observability, PR #174 rich diagnostic, PR #177 quarantine W=8→3, and this run's observability strengthening), the gate cannot be moved by single-session work because total citations in the 7-day measurement window remain 0. This is a defensible heavy-goal cycle under run-brief definition (b): a documented investigation proving corpus-state binding, paired with an observability improvement.

Changes:

- `scripts/check-flywheel-compounding.sh`: when σ=0 ρ=0, in addition to the existing dormant-corpus hint, surface `golden_signals.{trend_verdict,concentration_verdict,overall_verdict}`, `metrics.{citations_this_period,total_artifacts,learnings_created}`, and the period range. Adds a labelled "multi-session-bound:" line so operators can see at a glance that the gate is corpus-state bound, without running `jq` against the JSON by hand. The diagnostic fires only on the σ=0 ρ=0 branch; the ρ=0-only and generic-fail branches are unchanged (existing tests cover that boundary).
- `tests/scripts/check-flywheel-compounding.bats`:
  * the existing σ=0 ρ=0 test gains a "multi-session-bound" assertion;
  * a new test, "FAIL with σ=0 AND ρ=0 surfaces verdict + period block when payload provides them", exercises the full diagnostic against a realistic payload and asserts the verdict, citation/artifact count, and period fields by name, plus the finding citation.
- `.agents/findings/f-2026-04-30-002.md` (force-tracked): new finding building on `f-2026-04-29-001.md` (PR #177). Documents that this run is the 4th consecutive failed heavy-goal attempt and proposes a corpus-active precondition (`if total citations == 0 in window AND total_artifacts > 0: skip with reason='corpus-dormant'`) as the durable remediation. Also records the run-brief stop clause: attempt 4 binds "stop attempting heavy-goal cycles on this goal for the rest of this run", so subsequent cycles pivot to disjoint work.
The goal still fails: this commit alone does not move the metric. Moving it requires either sustained citation activity across many sessions or an implementation of the corpus-active precondition. This cycle improves the diagnostic so operators stop re-attempting the same goal across nightlies.
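The branch selection described above (multi-session block only when both σ and ρ are zero; ρ=0-only and generic paths keep their existing hints) can be sketched as a small bash function. This is a minimal illustration, assuming σ and ρ arrive as the strings `jq -r` prints; `classify_fail_branch` is a hypothetical name, and the real script sets `hint` inline rather than calling a helper. Because the comparison is a string match, only a value printed exactly as `0` counts as zero.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the string comparisons the check script
# performs on jq's output for .sigma and .rho.
classify_fail_branch() {
  local sigma="$1" rho="$2"
  if [[ "$sigma" == "0" && "$rho" == "0" ]]; then
    echo "dormant"    # σ=0 ρ=0: dormant hint plus the multi-session-bound block
  elif [[ "$rho" == "0" ]]; then
    echo "rho-only"   # σ>0, ρ=0: high-confidence-citation hint
  else
    echo "generic"    # anything else keeps the default insufficient-influence hint
  fi
}

classify_fail_branch 0 0     # dormant
classify_fail_branch 0.4 0   # rho-only
classify_fail_branch 0 0.2   # generic
```

Note that σ=0 with ρ>0 falls through to the generic hint, which is the boundary the existing ρ=0-only and generic-fail tests protect.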
1 parent b1ae2ed commit 6f41dd5

4 files changed

Lines changed: 114 additions & 0 deletions

.agents/findings/f-2026-04-30-002.md

Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
+---
+id: "f-2026-04-30-002"
+type: "finding"
+version: 1
+date: "2026-04-30"
+source_skill: "evolve"
+source_artifact: ".agents/nightly/2026-04-30/fitness-cycle-3.json"
+
+
+title: "flywheel-compounding has now failed four consecutive nightlies with σ=0 ρ=0; weight reduction alone hasn't moved the gate. Propose a corpus-active precondition."
+summary: "PR #165 (2026-04-27, observability), PR #174 (2026-04-28, rich diagnostic), PR #177 (2026-04-29, quarantine W=8→3) and this run (2026-04-30, observability strengthening) form four consecutive heavy-goal cycles with no metric movement. The goal genuinely cannot be moved by single-session work — total citations in the 7-day window remain 0. Recommend a `corpus-active` precondition (sigma+rho>0 in the last N days) that auto-skips the gate while the corpus is dormant, instead of failing it deterministically."
+pattern: "When a heavy fitness goal depends on multi-session corpus state (citations, retrievals, applied evidence), repeated failures across nightlies are not a regression — they are the gate working correctly while the upstream input is empty. Reducing the weight is a partial mitigation; a precondition gate (`if total citations in window == 0: skip with reason='corpus-dormant'`) is durable."
+detection_question: "When a goal has failed N consecutive nightlies with the same root cause (corpus dormancy, σ=0 ρ=0), did the goal definition or measurement script include a precondition that distinguishes 'corpus dormant' (skip with reason) from 'corpus active but underweight' (real fail)?"
+checklist_item: "For corpus-state goals (`flywheel-compounding`, `flywheel-proof`, `compile-freshness`), add a precondition: if the upstream corpus has zero citation activity in the measurement window AND total_artifacts > 0, return SKIP with reason='corpus-dormant; gate cannot be moved by single-session work' rather than FAIL. The skip is observable (still surfaces in the goals report), but doesn't crowd heavy-goal slots in nightly evolve loops."
+severity: "moderate"
+detectability: "advisory"
+status: "active"
+compiler_targets: ["evolve","plan"]
+scope_tags: ["fitness-gate","corpus-state","quarantine","precondition"]
+dedup_key: "fitness-gate|corpus-state-quarantine|precondition-skip"
+applicable_when: ["fitness-gate","corpus-state","heavy-goal"]
+applicable_languages: ["shell","go"]
+tier: "local"
+confidence: "high"
+ttl_days: 60
+hit_count: 1
+last_cited: 2026-04-30T06:58:00Z
+---
+# Finding: Four consecutive nightly attempts cannot move flywheel-compounding because the corpus stays dormant
+
+## Summary
+
+Four consecutive nightlies (2026-04-27, 2026-04-28, 2026-04-29, 2026-04-30) have attempted to move `flywheel-compounding` (W=8) without flipping the gate. PR #165 added observability (Tags column), PR #174 routed to a rich diagnostic, PR #177 proposed quarantine (W=8→3 + finding `f-2026-04-29-001`), and today's nightly (this run) added multi-session-bound diagnostic surfacing in `scripts/check-flywheel-compounding.sh`.
+
+The metric has not moved because `total citations this period = 0` and the corpus has been `concentration_verdict=dormant overall_verdict=accumulating` for the entire week-long measurement window. No single-session change can flip the gate; only sustained citation activity from `ao lookup --cite ...` runs across many sessions will.
+
+## Pattern
+
+When a heavy fitness goal depends on multi-session corpus state (citations, retrievals, applied evidence), repeated failures across nightlies are not a regression — they are the gate working correctly while the upstream input is empty. Reducing the weight (PR #177's W=8→3) is a partial mitigation but the gate continues to fail; a *precondition* gate (`if total citations in window == 0 and total_artifacts > 0: skip with reason='corpus-dormant'`) is durable.
+
+## Detection Question
+
+When a goal has failed N consecutive nightlies with the same root cause (σ=0 ρ=0, corpus dormant), did the goal definition or measurement script include a precondition that distinguishes "corpus dormant" (skip with reason) from "corpus active but underweight" (real fail)?
+
+## Checklist Item
+
+For corpus-state goals (`flywheel-compounding`, `flywheel-proof`, `compile-freshness`), add a precondition: if the upstream corpus has zero citation activity in the measurement window AND `total_artifacts > 0`, return SKIP with reason `corpus-dormant; gate cannot be moved by single-session work` rather than FAIL. The skip is observable (still surfaces in the goals report) but doesn't crowd heavy-goal slots in nightly evolve loops.
+
+## Recommended Implementation
+
+- Add `flywheel.corpus_active` boolean to `ao flywheel status --json` derived from `metrics.citations_this_period > 0 || golden_signals.velocity_trend_7d > 0`.
+- In `scripts/check-flywheel-compounding.sh`, gate the FAIL branch on `corpus_active`. When inactive, exit with a documented "skip" code (e.g., 2) that the `ao goals measure` runner translates to a SKIP result rather than a fail.
+- Adjust `cli/cmd/ao/goals_measure.go` (or equivalent) to honor exit code 2 as SKIP.
+
+## Today's Observability Improvement
+
+This run's `scripts/check-flywheel-compounding.sh` change (commit pending) surfaces, on σ=0 ρ=0, the `golden_signals.{trend_verdict,concentration_verdict,overall_verdict}` triple plus `metrics.{citations_this_period,total_artifacts,learnings_created}` and the period range. Operators can now tell at a glance the gate is multi-session-bound without running `jq` against the JSON manually. This is observability, not a metric flip — the gate still fails until a precondition is implemented OR the corpus accumulates citations.
+
+## Lifecycle
+
+- Status: active
+- Detectability: advisory
+- Confidence: high
+
+## Source
+
+- Skill: evolve
+- Artifact: `.agents/nightly/2026-04-30/fitness-cycle-3.json`
+
+## Cross-references
+
+- Companion finding `f-2026-04-29-001.md` (on PR #177; quarantine proposal W=8→3) — this finding builds on it.
+- Run-brief 3-attempt rule: PR #177 satisfied "next attempt MUST be a quarantine proposal" at attempt 3. This run is attempt 4; the rule's "stop attempting heavy-goal cycles for the rest of this run" clause now binds. No further heavy-goal work on this goal in this nightly.
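The finding's Recommended Implementation proposes a skip path: exit code 2 from the check script, translated to SKIP by the `ao goals measure` runner. The sketch below illustrates that contract in bash, using the checklist's definition of dormancy (zero citations this period while `total_artifacts > 0`). Both functions are hypothetical: the Go runner change is stood in for by a bash `case`, and the counts are assumed to be integers already extracted from the status JSON (bash arithmetic cannot compare fractional values such as a velocity trend).

```shell
#!/usr/bin/env bash
# Sketch only: this exit-code contract is the finding's proposal, not
# behavior shipped in this commit.
#   0 = PASS, 1 = FAIL, 2 = SKIP (corpus-dormant)

# Hypothetical gate wrapper.
#   $1 = metrics.citations_this_period (integer)
#   $2 = metrics.total_artifacts (integer)
#   $3 = 1 if the existing σρ comparison passed, else 0
check_gate_with_precondition() {
  local citations="$1" artifacts="$2" passed="$3"
  # Precondition: artifacts exist, but nothing cited them this period.
  if (( citations == 0 && artifacts > 0 )); then
    echo "SKIP: corpus-dormant; gate cannot be moved by single-session work"
    return 2
  fi
  # Corpus is active (or empty): fall through to the normal pass/fail check.
  if (( passed == 1 )); then
    echo "PASS"
    return 0
  fi
  echo "FAIL"
  return 1
}

# Hypothetical runner mapping (the role the finding assigns to
# cli/cmd/ao/goals_measure.go): exit code 2 becomes SKIP instead of FAIL.
exit_code_to_result() {
  case "$1" in
    0) echo "PASS" ;;
    2) echo "SKIP" ;;
    *) echo "FAIL" ;;
  esac
}

# Usage: capture the exit code without tripping `set -e`.
rc=0
check_gate_with_precondition 0 47 0 || rc=$?
exit_code_to_result "$rc"   # prints the SKIP reason line, then SKIP
```

Placing the precondition ahead of the σρ comparison means an empty corpus (`total_artifacts == 0`) still falls through to the normal check and reports FAIL, as the checklist's `total_artifacts > 0` clause implies.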

.agents/findings/registry.jsonl

Lines changed: 1 addition & 0 deletions
@@ -28,3 +28,4 @@ last_cited: 2026-04-09T23:22:54Z
 {"id":"f-2026-04-27-004","version":1,"tier":"local","source":{"repo":"agentops","session":"2026-04-27","file":".agents/council/2026-04-27-post-mortem-pr-167-write-surface-smoke.md","skill":"post-mortem"},"date":"2026-04-27","severity":"moderate","category":"validation-gap","pattern":"Duplicated contract scanners drift when a new accepted syntax is added to one validator without paired fixture coverage in the mirror validator.","detection_question":"When a CI gate and smoke test both enforce the same contract, did this change update every recognizer and add a fixture for the newly accepted syntax form?","checklist_item":"For mirrored validators, add or update paired fixtures for each accepted syntax representation and cite both validation commands in the PR proof.","applicable_languages":["go","shell"],"applicable_when":["pattern-matcher","validation-gap","test-gap"],"status":"active","superseded_by":null,"dedup_key":"validation-gap|duplicated-contract-scanners-drift-without-paired-syntax-fixtures|pattern-matcher","hit_count":0,"last_cited":null,"ttl_days":60,"confidence":"high"}
 {"id":"f-2026-04-25-001","version":1,"tier":"local","source":{"repo":"agentops","session":"2026-04-25","file":".agents/council/2026-04-25-post-mortem-agentops-eval-environment-longhaul.md","skill":"post-mortem"},"date":"2026-04-25","severity":"significant","category":"validation-gap","pattern":"Long autonomous improvement loops can pass product gates while remaining unready due to worktree disposition or closure replay failures.","detection_question":"After a long evolve or RPI run, did the closeout verify both product gates and repository disposition/closure-integrity replay before selecting more work?","checklist_item":"Run product validation, worktree disposition, and closure-integrity audit as a final closeout bundle; file or fix blockers before starting the next RPI cycle.","applicable_languages":["markdown","shell"],"applicable_when":["validation-gap","plan-shape"],"status":"active","superseded_by":null,"dedup_key":"validation-gap|long-autonomous-improvement-loops-can-pass-product-gates-while-remaining-unready-due-to-worktree-disposition-or-closure-replay-failures|validation-gap","hit_count":0,"last_cited":null,"ttl_days":30,"confidence":"high"}
 {"id":"f-2026-04-30-001","version":1,"tier":"local","source":{"repo":"agentops","session":"2026-04-30","file":".agents/dream/probe-results.jsonl","skill":"evolve"},"date":"2026-04-30","severity":"significant","category":"dream-curator-degraded","pattern":"Dream curator emits packets the inline-probe rejects as stale, with the same packet recurring across nightlies. probeDreamPacketStaleness only checks scripts/<x>.sh tokens; skills/<name>/SKILL.md and already-shipped fixes by name slip through.","detection_question":"Before emitting a morning packet, did the curator probe (a) every cited file/symbol/script \u2014 including skills/<x>/SKILL.md and schemas/*.json \u2014 and (b) cross-check the source item against any open triage PR's consumed marks for the same item ID?","checklist_item":"Extend probeDreamPacketStaleness to detect skills/<name>/SKILL.md tokens, schemas/<x>.json tokens, and cross-check item.ID against open triage PRs' next-work.jsonl consumed marks before emit.","applicable_languages":["go"],"applicable_when":["dream-curator","stale-detection","packet-emission"],"status":"active","superseded_by":null,"dedup_key":"dream-curator-degraded|probe-staleness-too-narrow|stale-detection","hit_count":1,"last_cited":"2026-04-30T06:38:00Z","ttl_days":30,"confidence":"high"}
+{"id":"f-2026-04-30-002","version":1,"tier":"local","source":{"repo":"agentops","session":"2026-04-30","file":".agents/nightly/2026-04-30/fitness-cycle-3.json","skill":"evolve"},"date":"2026-04-30","severity":"moderate","category":"fitness-gate-corpus-state","pattern":"When a heavy fitness goal depends on multi-session corpus state, repeated failures across nightlies are the gate working correctly while the upstream input is empty. Quarantine via weight reduction is partial; a corpus-active precondition (skip when total citations in window == 0 and total_artifacts > 0) is durable.","detection_question":"When a goal has failed N consecutive nightlies with the same root cause (corpus dormancy, \u03c3=0 \u03c1=0), did the goal definition or measurement script include a precondition that distinguishes 'corpus dormant' (skip with reason) from 'corpus active but underweight' (real fail)?","checklist_item":"For corpus-state goals, add a precondition: if upstream corpus has zero citation activity in the measurement window AND total_artifacts > 0, return SKIP with reason 'corpus-dormant; gate cannot be moved by single-session work' rather than FAIL.","applicable_languages":["shell","go"],"applicable_when":["fitness-gate","corpus-state","heavy-goal"],"status":"active","superseded_by":null,"dedup_key":"fitness-gate|corpus-state-quarantine|precondition-skip","hit_count":1,"last_cited":"2026-04-30T06:58:00Z","ttl_days":60,"confidence":"high"}
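Each registry line is a standalone JSON object, and the new entry reuses the `dedup_key` from the finding's frontmatter. As a rough illustration of why that key matters, the hypothetical `duplicate_dedup_keys` helper below (not something this diff shows the repository providing) prints any key that appears on more than one line. It uses grep and sed rather than a JSON parser, so it assumes the compact `"dedup_key":"<value>"` serialization seen in the entries above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each dedup_key that occurs on more than one
# line of a JSONL findings registry. Assumes compact serialization with no
# whitespace around the colon and no escaped quotes inside the key.
duplicate_dedup_keys() {
  grep -o '"dedup_key":"[^"]*"' "$1" \
    | sed 's/^"dedup_key":"//; s/"$//' \
    | sort | uniq -d
}

# Usage against a throwaway two-entry registry that repeats one key.
tmp=$(mktemp)
printf '%s\n' \
  '{"id":"a","dedup_key":"fitness-gate|corpus-state-quarantine|precondition-skip"}' \
  '{"id":"b","dedup_key":"fitness-gate|corpus-state-quarantine|precondition-skip"}' \
  > "$tmp"
duplicate_dedup_keys "$tmp"   # prints the repeated key once
rm -f "$tmp"
```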

scripts/check-flywheel-compounding.sh

Lines changed: 24 additions & 0 deletions
@@ -44,11 +44,35 @@ hint="(σρ ≤ δ/100; corpus has insufficient evidence-backed influence)"
 # wake the flywheel; ρ=0 needs --cite applied|reference instead of bare retrieval.
 sigma=$(printf '%s' "$JSON" | jq -r '.sigma')
 rho=$(printf '%s' "$JSON" | jq -r '.rho')
+multi_session=""
 if [[ "$sigma" == "0" && "$rho" == "0" ]]; then
   hint="σ=0 ρ=0 — zero citations recorded in measurement window; corpus is dormant. Sessions must run 'ao lookup' (any --cite kind) before the gate sees signal"
+  # Multi-session-bound corpus state: surface verdicts + period so operators
+  # see at a glance this is not a single-session fix and matches the
+  # quarantine pattern recorded in .agents/findings/f-2026-04-29-001.md.
+  # Per the 2026-04-30 nightly retrospective, four consecutive nightlies
+  # have failed this gate without metric movement; the diagnostic should
+  # make the multi-session character obvious without requiring jq from
+  # the operator.
+  multi_session=$(printf '%s' "$JSON" | jq -r '
+    def fallback(default): if . == null or . == "" then default else . end;
+    " trend_verdict=\(.golden_signals.trend_verdict | fallback("?")) " +
+    "concentration_verdict=\(.golden_signals.concentration_verdict | fallback("?")) " +
+    "overall_verdict=\(.golden_signals.overall_verdict | fallback("?"))\n" +
+    " citations_this_period=\(.metrics.citations_this_period // 0) " +
+    "total_artifacts=\(.metrics.total_artifacts // 0) " +
+    "learnings_created=\(.metrics.learnings_created // 0)\n" +
+    " period=[\(.metrics.period_start // "?") .. \(.metrics.period_end // "?")]\n" +
+    " multi-session-bound: this gate measures corpus-level citation activity " +
+    "across all sessions in the window; a single nightly cannot move it. " +
+    "See .agents/findings/f-2026-04-30-002.md for the proposed corpus-active precondition path."
+  ' 2>/dev/null || true)
 elif [[ "$rho" == "0" ]]; then
   hint="ρ=0 — no applied/reference citations recorded; sessions must use 'ao lookup --cite applied|reference' or programmatic high-confidence citations"
 fi
 
 echo "FAIL: $diag$hint"
+if [[ -n "$multi_session" ]]; then
+  printf '%s\n' "$multi_session"
+fi
 exit 1
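To make the jq program's output shape concrete, here is a hypothetical plain-bash function that prints the same four lines from values that have already been extracted. It is an illustration only, not a proposed replacement, and `render_multi_session` is an invented name. Bash's `${n:-default}` substitutes when an argument is empty or unset, which approximates the script's `fallback` def for the verdicts and its `//` defaults for counts and period bounds (jq's `//` fires only on `null` or `false`, so a real count of 0 still prints as 0 in both versions).

```shell
#!/usr/bin/env bash
# Hypothetical plain-bash mirror of the jq diagnostic, for illustration.
# Args: trend concentration overall citations artifacts learnings start end
# Empty or missing args fall back as in the jq code: verdicts and period
# bounds to "?", counts to 0. Defaults are quoted so "?" is never globbed.
render_multi_session() {
  printf ' trend_verdict=%s concentration_verdict=%s overall_verdict=%s\n' \
    "${1:-?}" "${2:-?}" "${3:-?}"
  printf ' citations_this_period=%s total_artifacts=%s learnings_created=%s\n' \
    "${4:-0}" "${5:-0}" "${6:-0}"
  printf ' period=[%s .. %s]\n' "${7:-?}" "${8:-?}"
  printf ' multi-session-bound: %s\n' \
    "this gate measures corpus-level citation activity across all sessions in the window; a single nightly cannot move it. See .agents/findings/f-2026-04-30-002.md for the proposed corpus-active precondition path."
}

# Usage with the values from the new bats test's payload.
render_multi_session stagnant dormant accumulating 0 47 65 \
  2026-04-23T00:00:00Z 2026-04-30T00:00:00Z
```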

tests/scripts/check-flywheel-compounding.bats

Lines changed: 16 additions & 0 deletions
@@ -57,6 +57,22 @@ EOF
   [[ "$output" == *"dormant"* ]]
   # σ=0 ρ=0 hint must NOT mention only the high-confidence remediation
   [[ "$output" != *"applied|reference"* ]]
+  # σ=0 ρ=0 must surface multi-session-bound diagnostic (per the
+  # 2026-04-30 quarantine-strengthening cycle).
+  [[ "$output" == *"multi-session-bound"* ]]
+}
+
+@test "FAIL with σ=0 AND ρ=0 surfaces verdict + period block when payload provides them" {
+  write_fake_ao '{"escape_velocity_compounding":false,"sigma":0,"rho":0,"sigma_rho":0,"delta":0.003,"golden_signals":{"trend_verdict":"stagnant","concentration_verdict":"dormant","overall_verdict":"accumulating"},"metrics":{"citations_this_period":0,"total_artifacts":47,"learnings_created":65,"period_start":"2026-04-23T00:00:00Z","period_end":"2026-04-30T00:00:00Z"}}'
+  run env AO_BIN="$FAKE_AO" bash "$SCRIPT"
+  [ "$status" -eq 1 ]
+  [[ "$output" == *"trend_verdict=stagnant"* ]]
+  [[ "$output" == *"concentration_verdict=dormant"* ]]
+  [[ "$output" == *"overall_verdict=accumulating"* ]]
+  [[ "$output" == *"citations_this_period=0"* ]]
+  [[ "$output" == *"total_artifacts=47"* ]]
+  [[ "$output" == *"period=[2026-04-23T00:00:00Z .. 2026-04-30T00:00:00Z]"* ]]
+  [[ "$output" == *"f-2026-04-30-002.md"* ]]
 }
 
 @test "FAIL with ρ=0 only (σ>0) emits high-confidence-citation hint" {
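The new test drives the script through `write_fake_ao` and `AO_BIN="$FAKE_AO"`, but the helper itself is defined outside this hunk. One common way to build such a stub is sketched below, assuming the script only needs `ao` to print a JSON payload on stdout. `write_fake_ao_sketch` is a hypothetical name, and the real helper may work differently.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the bats helper used above. It writes an
# executable that ignores its arguments and prints a canned JSON payload,
# then exports its path as FAKE_AO for `run env AO_BIN="$FAKE_AO" ...`.
write_fake_ao_sketch() {
  local dir payload_file
  dir=$(mktemp -d)
  payload_file="$dir/payload.json"
  # Store the payload in a file so no JSON quoting ends up inside the stub.
  printf '%s\n' "$1" > "$payload_file"
  {
    printf '#!/usr/bin/env bash\n'
    printf 'cat %q\n' "$payload_file"   # %q shell-quotes the path safely
  } > "$dir/ao"
  chmod +x "$dir/ao"
  export FAKE_AO="$dir/ao"
}

# Usage: whatever arguments the script passes, the stub prints the payload.
write_fake_ao_sketch '{"sigma":0,"rho":0}'
"$FAKE_AO" flywheel status --json
```

Writing the payload to a side file keeps the generated stub trivial, so payloads containing quotes, `|`, or `$` need no extra escaping.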
