
Nightly 2026-04-29 — 9 productive cycles, +1 code goal, +2 runtime, -1 quarantined#177

Open
boshu2 wants to merge 12 commits into main from nightly/2026-04-29


@boshu2 boshu2 commented Apr 29, 2026

Digest

9 productive cycles. Fitness improved from 4 failing goals (warm baseline) → 1 failing (flywheel-compounding, now at quarantine weight 3, corpus-state-bound and unmovable by single-session automation per the 3-attempt rule).

  • 1 code-driven flip: go-complexity-ceiling (W=6): eliminated six CC violations across daemon, gascity, llm, rpi over 5 cycles (ExecuteRPIPhase 27 → ~10, RebuildProjections 24 → ~9, applyQueueEvent 22 → ~7, readRawFrame 22 → ~17, RunForgeTier1 19 → ~7, ClassifyRPIArtifact 19 → ~14).
  • 2 runtime-artifact flips: compile-freshness, compile-no-oscillation — Dream phase produced .agents/overnight/latest/defrag/latest.json; gitignored, does not propagate to other environments.
  • 1 quarantine: flywheel-compounding (W=8 → W=3) — attempt 3 in three consecutive nightlies; mandatory quarantine per the run-brief rule.
  • 0 auto-reverts and 0 transient-flake non-reverts (no regressions during the run).

Fitness delta (warm baseline → final)

| Goal | Tags | Baseline weight | Final weight | Baseline | Final | Notes |
|---|---|---|---|---|---|---|
| flywheel-compounding | long-cycle, corpus-state | 8 | 3 | fail | fail | Quarantined (cycle 1; finding f-2026-04-29-001); metric still corpus-bound but no longer crowds heavy-goal slot |
| flywheel-proof | | 7 | 7 | pass | pass | held |
| wiring-closure | | 7 | 7 | pass | pass | held |
| go-cli-builds | | 8 | 8 | pass | pass | held |
| go-cli-tests | | 8 | 8 | pass | pass | held (warmed in prelude; see "Cold-cache flake avoided" below) |
| go-complexity-ceiling | | 6 | 6 | fail | pass | Code-driven flip across cycles 2–6 |
| skill-frontmatter | | 6 | 6 | pass | pass | held |
| hook-preflight | | 6 | 6 | pass | pass | held |
| security-gate | | 6 | 6 | pass | pass | held |
| flywheel-lifecycle | | 6 | 6 | pass | pass | held |
| codex-parity-drift | | 5 | 5 | pass | pass | held |
| go-vet-clean | | 5 | 5 | pass | pass | held |
| manifest-versions-match | | 5 | 5 | pass | pass | held |
| contract-compatibility | | 5 | 5 | pass | pass | held |
| goals-validate | | 5 | 5 | pass | pass | held |
| install-smoke | | 5 | 5 | pass | pass | held |
| compile-freshness | runtime-artifact | 4 | 4 | fail | pass | Runtime-artifact flip (Dream defrag); does not propagate |
| compile-no-oscillation | runtime-artifact | 4 | 4 | fail | pass | Same defrag artifact |
| competitive-freshness | | 3 | 3 | pass | pass | held |

Final: 18 pass / 1 fail (vs warm baseline 15 pass / 4 fail). go-complexity-ceiling (W=6) was the only heavy goal that this run could legitimately move; it flipped via code.

Code-driven flips vs runtime-artifact flips (tabulated separately)

| Type | Goal | Cycles | Source |
|---|---|---|---|
| Code-driven | go-complexity-ceiling | 2, 3, 4, 5, 6 | five extract-helper refactors across daemon (rpi_runner.go, projections.go, jobs.go), gascity (events.go), llm (forge_tier1.go), rpi (artifacts.go); see per-cycle summary |
| Runtime-artifact | compile-freshness | (Dream phase) | .agents/overnight/latest/defrag/latest.json produced by ao overnight start --max-iterations 1 --warn-only; gitignored, does not propagate |
| Runtime-artifact | compile-no-oscillation | (Dream phase) | same defrag artifact |

Per-cycle summary

| # | Type | Goal/Finding | Commit | Fitness before → after |
|---|---|---|---|---|
| 1 | Heavy-goal quarantine (3-attempt rule MUST) | flywheel-compounding W=8 → W=3 + finding f-2026-04-29-001 + attempts.jsonl ledger | a17032ff | warm 4 fail → 2 fail |
| 2 | Heavy-goal partial fix (extract-helper) | go-complexity-ceiling: ExecuteRPIPhase CC 27 → ~10 (+ helpers resolveReadyCity, openSession, streamUntilTerminal, processGasCityFrame, classifyGasCityStreamError) | a1b66ba0 | 2 fail → 2 fail |
| 3 | Heavy-goal partial fix | go-complexity-ceiling: RebuildProjections CC 24 → ~9 via applyJobLedgerEvent | 49a04c67 | 2 fail → 2 fail |
| 4 | Heavy-goal partial fix | go-complexity-ceiling: applyQueueEvent CC 22 → ~7 via metadata/status split | 7d015081 | 2 fail → 2 fail |
| 5 | Heavy-goal partial fix | go-complexity-ceiling: readRawFrame CC 22 → ~17 via assignSSEField. cli/ now green at threshold 20. | 30ecff03 | 2 fail → 2 fail |
| 6 | Heavy-goal flip | go-complexity-ceiling: RunForgeTier1 19 → ~7 (validateAndDefaultTier1Options) + ClassifyRPIArtifact 19 → ~14 (classifyCouncilArtifact). cli/internal/ green at 18; goal flipped pass. | 03a010c4 | 2 fail → 1 fail |
| 7 | Generator-layer test hygiene | t.Chdir migration: cli/cmd/ao/plans_test.go (16 sites) | 01f38792 | 1 fail → 1 fail |
| 8 | Generator-layer test hygiene | t.Chdir migration: dedup_test, config_test, metrics_flywheel_test, metrics_health_test (19 sites, 4 files; both pattern A and B) | 4a9e52b4 | 1 fail → 1 fail |
| 9 | Generator-layer CI hygiene | Surface (warn-only) suffix on advisory CI jobs (agentops-eval-advisory, security-toolchain-gate, doctor-check, check-test-staleness); closes harvested council finding "Rename warn-only CI checks with explicit suffix" | 1b52db48 | 1 fail → 1 fail |

Total t.Chdir migration footprint this run: 35 sites across 5 test files; 221 lines deleted and 35 added (net −186).

Auto-reverts and transient-flake non-reverts

| Cycle | Action | Reason |
|---|---|---|
| None | | No regressions observed; no transient flakes hit. |

Cold-cache flake avoided. During the prelude the first ao goals measure showed go-cli-tests failing (W=8, cold cache) and the second showed it passing (warm cache). The run-brief calls this out explicitly: "A cold module cache makes go-cli-tests (240s timeout) flake on the first measure, falsely flipping a goal fail→pass between baseline and final." We discarded the cold measure and re-warmed via cd cli && go test -count=1 ./... before persisting the baseline. Stable warm baseline: 15 pass / 4 fail (no go-cli-tests regression).

Quarantined goals + proposed weight reductions

| Goal | Weight before | Weight after | Status | Finding | Recommendation |
|---|---|---|---|---|---|
| flywheel-compounding | 8 | 3 | long-cycle, corpus-state | f-2026-04-29-001 | Three consecutive nightlies (PR #165 → PR #174 → this run) failed to flip the metric because σρ requires multi-session human/agent citation activity. Lowered weight to 3 (matching competitive-freshness, the other slow-cadence gate). Diagnostic from PR #174's rich script remains actionable. Rollback path: if citations accumulate and gate stays green ≥7 consecutive nightlies, raise weight back to 5–8. |

Heavy-goal attempt history

| Goal | Run | Angle | Outcome |
|---|---|---|---|
| flywheel-compounding | 2026-04-27 (PR #165) | observability: restore long-cycle/corpus-state Tags column stripped by PR #162 | metric unchanged |
| flywheel-compounding | 2026-04-28 (PR #174) | observability: route goal command to rich-diagnostic script | metric unchanged; diagnostic actionable |
| flywheel-compounding | 2026-04-29 (this run) | quarantine: weight 8 → 3 + GOALS.md note + finding | 3-attempt rule fired; quarantine MUST per run-brief |
| go-complexity-ceiling | 2026-04-28 (PR #174, cycle 1) | extract-helper: probeRuntimeVersion from RunLiveRuntime (CC 21 → 17) | flipped pass |
| go-complexity-ceiling | 2026-04-29 (this run, cycles 2–6) | extract-helper × 5: ExecuteRPIPhase, RebuildProjections, applyQueueEvent, readRawFrame, RunForgeTier1 + ClassifyRPIArtifact | flipped pass after recent daemon ship reintroduced regressions |

The flywheel-compounding row hits the run-brief rule: "If 3+ consecutive nightlies attempted the same goal without flipping the metric, the next attempt MUST be a quarantine proposal." This run's cycle 1 satisfies that rule; see commit a17032ff and .agents/findings/f-2026-04-29-001.md. The goal will not be re-attempted on the heavy axis until the citation corpus is non-empty.
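The 3-attempt rule can be sketched as a check over the attempt history. This is a minimal illustration, not the project's implementation: the `Attempt` type, its fields, and `QuarantineRequired` are hypothetical names, the real attempts.jsonl schema is not shown in this PR, and counting the current run as one of the three attempts is an assumption based on how this digest counts them.

```go
package main

import "fmt"

// Attempt is a hypothetical record of one nightly's heavy-goal attempt.
// The actual attempts.jsonl schema is not shown in this PR.
type Attempt struct {
	Run     string // run date, e.g. "2026-04-27"
	Goal    string
	Flipped bool // true if the metric flipped fail -> pass in that run
}

// QuarantineRequired reports whether the last `threshold` attempts in
// history (ordered oldest first) all targeted goal and none flipped it.
func QuarantineRequired(history []Attempt, goal string, threshold int) bool {
	if threshold <= 0 || len(history) < threshold {
		return false
	}
	for _, a := range history[len(history)-threshold:] {
		if a.Goal != goal || a.Flipped {
			return false
		}
	}
	return true
}

func main() {
	history := []Attempt{
		{Run: "2026-04-27", Goal: "flywheel-compounding"},
		{Run: "2026-04-28", Goal: "flywheel-compounding"},
		{Run: "2026-04-29", Goal: "flywheel-compounding"},
	}
	fmt.Println(QuarantineRequired(history, "flywheel-compounding", 3))
}
```

Under these assumptions the go-complexity-ceiling history would not trigger the rule, since its single prior attempt (PR #174) flipped the goal.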

The go-complexity-ceiling row took the same axis (extract-helper) as PR #174, but only one prior nightly took that axis (PR #165 didn't touch this goal), so the "prior 2 nightlies same axis" rule did not fire.

Dream probe-results summary

  • Total morning packets: 3
  • Stale ratio: 1/3 = 33% (above 30% but below 50% — dream-curator-degraded finding not triggered, matching yesterday's run)
  • Per-packet (persisted to .agents/dream/probe-results.jsonl):
    • 01-audit-context-injection-latency: inconclusive (investigative, no concrete deliverable)
    • 02-decompose-skills-crank-skill-md: stale (claimed 248-line limit; skills/crank/SKILL.md is execution-tier with limit 800; current 660 lines is fine; same staleness as yesterday)
    • 03-replace-os-chdir: live (42 test files still contain os.Chdir; cycles 7–8 consumed this packet)

Yesterday's PR #174 noted "tomorrow's curator could still benefit from probing before emitting packet #2" — same packet emitted again today. The signal continues to accumulate; if a third consecutive nightly receives the same stale packet, file dream-curator-degraded regardless of the 30% threshold.

Findings opened, closed, deferred

  • Opened: f-2026-04-29-001 — flywheel-compounding quarantine proposal (force-tracked under .agents/findings/).
  • Closed: none in bd (bd unavailable). Code goals: go-complexity-ceiling flipped pass; compile-freshness and compile-no-oscillation flipped pass via runtime artifact.
  • Deferred:
    • flywheel-compounding re-attempt — locked at quarantine weight until citation corpus accumulates.
    • Pattern C/D (t.Cleanup-style) os.Chdir migration in curate_test, inject_context_test, feedback_test, index_test, hooks_test — first attempt this run hit err-shadowing in curate_test (the outer oldDir, err := os.Getwd() declares an err reused later by err = runCurateVerify(...)). Reverted; needs per-site review to keep var err error declarations where needed. Logged for next nightly.

Stale-audit count

  • Inline-probe rejections during selection: 2 (Dream packets #1 inconclusive + #2 stale; short-circuited before claim).
  • Explicit stale-audit cycles: 0.
  • Cap binding: triage ran earlier today (PR #176; fast-forwarded into nightly base) and the Dream probe-stale rate (33%) is below the 50% trigger, although it is not below the 30% threshold named in the run-brief. The cap bound at zero, in line with the run-brief's prediction: "If triage ran earlier today AND probe-stale rate < 30%, expect the cap to bind at zero." (Note: 33% is between 30% and 50%, so spending the cap is "may but not must"; this run had concrete heavy-goal work, so it didn't need to spend the bookkeeping cap.)

bd / tracker degradation notes

  • bd is NOT available in this environment. scripts/install-bd.sh returns curl 403. All bead-tracking ops are soft-fail throughout the run (Dream bead-sync reports soft-fail (bd not available)). New findings discovered this run are recorded in this PR body and as files under .agents/findings/, not in beads. Restoring local bd availability is independent of nightly work.

Tag push degradation

  • git push origin refs/tags/nightly/2026-04-29 failed with HTTP 403 (same endpoint signature as yesterday's tag push and today's bd-install 403). Per the run-brief rule, did not retry; falling back to the branch ref nightly/2026-04-29 as the anchor for tomorrow's audit. Local tag exists at 1b52db48 for archival.

Coordination notes

  • PR #176 (triage/2026-04-29, "chore(triage): mark 3 stale items consumed 2026-04-29") was open at run start. The nightly branch is based on origin/triage/2026-04-29 (fast-forwarded into local main during prelude), so PR #176's .agents/rpi/next-work.jsonl rewrite + triage digest are inherited. Files-touched overlap: PR #176 touches .agents/rpi/next-work.jsonl, .agents/triage/2026-04-29/baseline.json, .agents/triage/2026-04-29/digest.md. This PR additionally touches GOALS.md, 6 Go files across daemon/llm/rpi/gascity, 5 test files, 1 workflow file, and 2 new .agents/findings/ + .agents/goals/ files. These are disjoint from PR #176's incremental write set; no conflicts expected.
  • The .agents/rpi/next-work.jsonl mutations from ao overnight start and goal measurements were reset before each commit (per prelude rules) so each cycle's diff contains only its intended changes.
  • The cli/embedded/skills/compile/scripts/compile.sh mutation that Dream's make sync-hooks produced was also reset — it appears to be sync-hooks correcting an accidental embedded drift introduced by the recent feat(daemon) commit (6e7ed526), but resyncing it in a nightly cycle would be bookkeeping rather than productive. Logged as "next nightly should re-run make sync-hooks and ship the embedded resync as a tracked cycle."

Worktree-disposition gate FAIL

The known-false-positive worktree-disposition gate fails on a nightly branch (canonical root /home/user/agentops is on nightly/2026-04-29; expected main). Per the run-brief notes this is expected and not investigated. No env bypass exists in the current script.

Commit links

  • a17032ff — goals(flywheel-compounding): quarantine — reduce weight 8 → 3 (attempt 3)
  • a1b66ba0 — refactor(daemon): split ExecuteRPIPhase to drop CC 27 → ~10
  • 49a04c67 — refactor(daemon): extract applyJobLedgerEvent to drop RebuildProjections CC 24 → ~9
  • 7d015081 — refactor(daemon): split applyQueueEvent metadata vs status (CC 22 → ~7 each)
  • 30ecff03 — refactor(gascity): extract assignSSEField from readRawFrame (CC 22 → ~17)
  • 03a010c4 — refactor: drop final two CC 19 violations — go-complexity-ceiling flips
  • 01f38792 — test(plans): replace defer os.Chdir restore with t.Chdir (16 sites)
  • 4a9e52b4 — test(cmd/ao): replace defer os.Chdir restore with t.Chdir (19 sites, 4 files)
  • 1b52db48 — ci(validate): surface warn-only suffix on continue-on-error jobs

Generated by Claude Code

claude added 11 commits April 29, 2026 05:26
…t 3)

Three consecutive nightlies (PR #165 2026-04-27, PR #174 2026-04-28, this
run 2026-04-29) failed to flip flywheel-compounding because the metric
requires multi-session citation activity that single-session nightly
automation legitimately cannot manufacture without gaming.

Per the run-brief rule "If 3+ consecutive nightlies attempted the same
goal without flipping the metric, the next attempt MUST be a quarantine
proposal," lower weight 8 → 3 (matching competitive-freshness, the other
slow-cadence gate). Diagnostic remains actionable from PR #174's rich
script swap.

- GOALS.md: weight 8 → 3, description annotated with rollback path
- .agents/findings/f-2026-04-29-001.md: full proposal w/ history
- .agents/goals/flywheel-compounding/attempts.jsonl: ledger persisted

Rollback: if citation corpus accumulates and gate stays green ≥7
consecutive nightlies, weight may be raised back to 5–8.

https://claude.ai/code/session_01Sjd8AFQVEcZJgdv86nP7BU
Decomposed GasCityRPIPhaseExecutor.ExecuteRPIPhase
(cli/internal/daemon/rpi_runner.go:486) into orchestration plus five helpers:

- resolveReadyCity: city name + readiness validation
- openSession: create + submit gascity session, returns partial result
- streamUntilTerminal: opens event stream, drives consumption loop
- processGasCityFrame: per-frame match/classify/finalize logic
- classifyGasCityStreamError: maps NextEvent failure to typed error

Net effect on cli/ complexity:
  before: 27 ExecuteRPIPhase
  after:  ExecuteRPIPhase eliminated; helpers all ≤ 9 CC

Remaining cli/ violations: RebuildProjections (24), readRawFrame (22),
applyQueueEvent (22) — to be tackled in subsequent cycles.

Behavior preserved; daemon RPI tests still pass (`go test
./internal/daemon/`). The split also makes the per-frame loop unit-
testable in isolation, which the prior monolith didn't allow.

https://claude.ai/code/session_01Sjd8AFQVEcZJgdv86nP7BU
…ons CC 24 → ~9

Decomposed daemon.RebuildProjections (cli/internal/daemon/projections.go:125)
by extracting the per-event fold body into a new helper:

- applyJobLedgerEvent: lazy-create job projection, append request IDs,
  resolve job type / targets / artifacts, dispatch lifecycle.

The outer loop becomes a clean normalize → lifecycle-shortcut → fold sequence.

Net effect on cli/ complexity:
  before: 24 RebuildProjections
  after:  RebuildProjections ~9; helper applyJobLedgerEvent ~13

Remaining cli/ violations: readRawFrame (22), applyQueueEvent (22).

Behavior preserved; daemon tests still pass (`go test ./internal/daemon/`).

https://claude.ai/code/session_01Sjd8AFQVEcZJgdv86nP7BU
…7 each)

Split daemon.Queue.applyQueueEvent (cli/internal/daemon/jobs.go:520) into
two cohesive helpers:

- applyQueueJobMetadata: payload-derived fields (job type, targets,
  idempotency key, payload, max attempts, artifacts) — status-independent
- applyQueueStatusTransition: lifecycle status + status-bound fields
  (claim token, lease epoch/expiry, attempt, failure)

Net effect on cli/ complexity:
  before: 22 applyQueueEvent
  after:  applyQueueEvent ~3 (orchestration); helpers ~7 each

Remaining cli/ violation: readRawFrame (22). cli/ is one cycle away from
green at threshold 20.

Behavior preserved; daemon tests pass.

https://claude.ai/code/session_01Sjd8AFQVEcZJgdv86nP7BU
…~17)

Pulled the SSE field-line switch out of SSEDecoder.readRawFrame
(cli/internal/gascity/events.go:379) into a free function:

- assignSSEField: parses one non-empty, non-comment SSE line and folds
  the field/value into the in-progress frame.

Net effect:
  before: 22 readRawFrame
  after:  readRawFrame ~17, assignSSEField ~6

cli/ at threshold 20 is now GREEN. cli/internal/ at threshold 18 still
has 2 violations (RunForgeTier1, ClassifyRPIArtifact, both CC 19).

Behavior preserved; gascity tests pass.

https://claude.ai/code/session_01Sjd8AFQVEcZJgdv86nP7BU
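The kind of helper this commit extracts can be sketched as follows. This is not the code from cli/internal/gascity/events.go, which is not shown in this PR: the `Frame` type and `assignField` function are hypothetical, and the field handling follows the general server-sent events parsing rules (split on the first colon, strip one leading space from the value, ignore unknown fields).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Frame is a hypothetical in-progress SSE frame.
type Frame struct {
	Event string
	Data  []string // one entry per "data" line
	ID    string
	Retry int // reconnection delay in milliseconds
}

// assignField folds one non-empty, non-comment SSE line into f.
// A line without a colon is treated as a field name with an empty value.
func assignField(f *Frame, line string) {
	field, value, found := strings.Cut(line, ":")
	if found {
		value = strings.TrimPrefix(value, " ") // strip a single leading space
	}
	switch field {
	case "event":
		f.Event = value
	case "data":
		f.Data = append(f.Data, value)
	case "id":
		if !strings.ContainsRune(value, 0) { // ids containing NUL are ignored
			f.ID = value
		}
	case "retry":
		// ParseUint rejects signs, so only all-digit values are accepted.
		if n, err := strconv.ParseUint(value, 10, 31); err == nil {
			f.Retry = int(n)
		}
	}
	// Unknown field names fall through and are ignored.
}

func main() {
	f := &Frame{}
	for _, line := range []string{"event: update", "data: hello", "data:world", "id: 42"} {
		assignField(f, line)
	}
	fmt.Printf("%+v\n", *f)
}
```

Keeping the switch in a free function means the caller's loop only has to handle blank lines, comments, and frame termination, which is where the complexity reduction in the calling function would come from.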
Two cohesive extractions clear the last cli/internal/ complexity offenders:

- cli/internal/llm/forge_tier1.go:
  validateAndDefaultTier1Options pulls the 12-clause guard prologue out
  of RunForgeTier1 (CC 19 → ~7). The required-fields and default-fills
  checks become trivially unit-testable.

- cli/internal/rpi/artifacts.go:
  classifyCouncilArtifact extracts the three /council/-prefixed branches
  (pre-mortem, post-mortem, vibe) out of ClassifyRPIArtifact's giant
  switch (CC 19 → ~14).

Goal effect:
  before: go-complexity-ceiling FAIL: 2 violations across cli/internal/
          (post-cycle-5; cli/ already green)
  after:  go-complexity-ceiling PASS — both cli/ (≤20) and cli/internal/
          (≤18) thresholds clean

Behavior preserved; llm and rpi tests pass.

https://claude.ai/code/session_01Sjd8AFQVEcZJgdv86nP7BU
Migrate cli/cmd/ao/plans_test.go's 16 oldWD/Chdir/defer-Chdir patterns to
the t.Chdir helper introduced in Go 1.24. The helper handles cleanup via
the test's t.Cleanup chain, eliminating defer ordering pitfalls and the
unused oldWD locals.

Net: -80/+16 lines; tests still pass.

Continues the t.Chdir migration begun in PR #174 (84 sites across 10
files); plans_test.go was the largest remaining cmd/ao site after that
batch (32 raw os.Chdir calls = 16 patterns).

https://claude.ai/code/session_01Sjd8AFQVEcZJgdv86nP7BU
…4 files)

Migrate the remaining mechanical Getwd/Chdir/defer-Chdir patterns:

- dedup_test.go:           4 sites
- config_test.go:         10 sites (also drops unused "os" import)
- metrics_flywheel_test.go: 4 sites
- metrics_health_test.go:   1 site

Net: -141/+19 lines; cmd/ao tests still pass.

Two pattern variants are migrated:
  A) `oldWD, _ := os.Getwd()` + defer
  B) `oldWD, err := os.Getwd()` + nil-check + defer

Continues PR #174 + this nightly's plans_test work. Remaining cmd/ao
files (context_test, extract_test, curate_test, hooks_test, etc) use
non-mechanical patterns (e.g. multi-step Chdir within nested table-test
loops) that don't fit either A or B and need per-site review.

https://claude.ai/code/session_01Sjd8AFQVEcZJgdv86nP7BU
Adds an explicit `name: <id> (warn-only)` to the four advisory jobs so
the GitHub PR status line distinguishes advisory failures from real
ones at a glance:

- agentops-eval-advisory  (warn-only)
- security-toolchain-gate (warn-only)
- doctor-check            (warn-only)
- check-test-staleness    (warn-only)

Job IDs are left unchanged to preserve `summary.needs` references and
any external branch-protection rules.

Closes harvested council-finding "Rename warn-only CI checks with
explicit suffix" (target_repo: agentops, severity: low) — chooses the
non-renaming variant ("surface continue-on-error in the GitHub status
line") since it's zero-risk for downstream refs.

Validation: `scripts/validate-ci-policy-parity.sh` — PASS (31 jobs;
4 non-blocking).

https://claude.ai/code/session_01Sjd8AFQVEcZJgdv86nP7BU
The embedded-sync CI gate flagged drift in cli/embedded/skills/compile/
scripts/compile.sh vs the canonical skills/compile/scripts/compile.sh.

Per CLAUDE.md: "Embedded hooks must stay in sync. After editing hooks/,
lib/hook-helpers.sh, or skills/standards/references/: run cd cli && make
sync-hooks." The drift is a legacy artifact of feat(daemon) commit
6e7ed52 which inadvertently mutated the embedded copy without the
matching skills/ change.

Resolution: run `cd cli && make sync-hooks` to bring embedded in line
with the canonical source. Two-line delta:

- "fallback http://localhost:11434" → "default http://localhost:11434"
- removed an outdated `shellcheck disable=SC2016` line that no longer
  pairs with a backtick regex

Validation: `cd cli && go build ./...` PASS; complexity gates still
green; behavior unchanged (the message string and shellcheck pragma
were both decorative).

This was previously logged in the PR digest under "Coordination notes"
as a follow-up; CI made it non-deferrable.

https://claude.ai/code/session_01Sjd8AFQVEcZJgdv86nP7BU
boshu2 pushed a commit that referenced this pull request Apr 30, 2026
…tic + propose corpus-active precondition

Heavy-goal observability cycle for `flywheel-compounding` (W=8). Per the
4-attempt history (PR #165 observability, PR #174 rich diagnostic, PR #177
quarantine W=8→3, and today's observability strengthening in this run), the gate
genuinely cannot be moved by single-session work because total citations
in the 7-day measurement window remain 0.

Defensible heavy-goal cycle per run-brief definition (b): documented
investigation proving corpus-state binding paired with an observability
improvement.

Changes:

- `scripts/check-flywheel-compounding.sh`: when σ=0 ρ=0, in addition to
  the existing dormant-corpus hint, surface
  `golden_signals.{trend_verdict,concentration_verdict,overall_verdict}`,
  `metrics.{citations_this_period,total_artifacts,learnings_created}`,
  and the period range. Adds a labelled "multi-session-bound:" line so
  operators see at a glance the gate is corpus-state bound without
  running `jq` against the JSON manually. Diagnostic only fires on the
  σ=0 ρ=0 branch — ρ=0-only and generic-fail branches are unchanged
  (existing tests cover that boundary).

- `tests/scripts/check-flywheel-compounding.bats`:
  * existing σ=0 ρ=0 test gains a "multi-session-bound" assertion;
  * new test "FAIL with σ=0 AND ρ=0 surfaces verdict + period block when
    payload provides them" exercises the full diagnostic against a
    realistic payload and asserts each emitted field by name + the
    finding citation.

- `.agents/findings/f-2026-04-30-002.md` (force-tracked): new finding
  building on `f-2026-04-29-001.md` (PR #177). Documents that this
  run is the 4th consecutive failed heavy-goal attempt and proposes
  a corpus-active precondition (`if total citations == 0 in window AND
  total_artifacts > 0: skip with reason='corpus-dormant'`) as the
  durable remediation. Also records the run-brief stop clause:
  attempt 4 binds "stop attempting heavy-goal cycles on this goal for
  the rest of this run" — subsequent cycles pivot to disjoint work.

The goal still fails (the metric does not move from this commit alone —
moving it requires sustained citation activity across many sessions, OR
the corpus-active precondition's implementation). This cycle improves the
diagnostic so operators stop re-attempting the same goal across nightlies.
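The proposed corpus-active precondition is small enough to sketch directly. The condition and the 'corpus-dormant' reason string come from the finding quoted above; the `WindowMetrics` type and `skipReason` function are hypothetical, with field names modelled on the `metrics.citations_this_period` and `metrics.total_artifacts` keys that the diagnostic surfaces.

```go
package main

import "fmt"

// WindowMetrics is a hypothetical view of the measurement-window payload.
type WindowMetrics struct {
	CitationsThisPeriod int
	TotalArtifacts      int
}

// skipReason applies the proposed precondition: skip the heavy-goal attempt
// when the corpus has artifacts but no citations in the window.
func skipReason(m WindowMetrics) (reason string, skip bool) {
	if m.CitationsThisPeriod == 0 && m.TotalArtifacts > 0 {
		return "corpus-dormant", true
	}
	return "", false
}

func main() {
	reason, skip := skipReason(WindowMetrics{CitationsThisPeriod: 0, TotalArtifacts: 12})
	fmt.Println(reason, skip)
}
```

An empty corpus (zero artifacts) is deliberately not treated as dormant in this sketch, matching the `total_artifacts > 0` clause in the proposal.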
boshu2 pushed a commit that referenced this pull request Apr 30, 2026
… 3 files)

Generator-layer test hygiene cycle. Continues the t.Chdir migration started
in PR #177 (which covered 5 cmd/ao test files: plans, dedup, config,
metrics_flywheel, metrics_health). This cycle picks up disjoint files so
PR #177 and this nightly do not conflict on the same _test.go.

Files migrated (6 sites):
- batch_forge_test.go (3 sites): TestRunForgeBatch_NoPendingTranscripts,
  TestRunForgeBatch_DryRunAppliesMaxLimit, TestRunForgeBatch_ProcessesTranscript.
  In each, replaces the 4-line block `origDir, err := os.Getwd(); ...
  os.Chdir(tmpDir); ...; t.Cleanup(func() { _ = os.Chdir(origDir) })`
  with `t.Chdir(tmpDir)`. Subsequent `err = runForgeBatch(...)` becomes
  `err := runForgeBatch(...)` (the prior implicit declaration came from
  the os.Getwd line).
- batch_promote_test.go (1 site): TestRunBatchPromote_NoPendingCandidates,
  same pattern.
- cobra_commands_test.go (2 sites): TestCobraStatusCommand/json_not_initialized
  and TestCobraSeedCommand. Pattern is the simpler `defer func() { _ =
  os.Chdir(orig) }()` form; collapses to `t.Chdir(...)`.

Net: -32 lines, +0 logic change. `go test -count=1 ./cmd/ao/...` passes;
`go vet ./...` clean; pre-push fast gate passes; full goals measure
unchanged (17 pass / 2 fail — the two open-state failures are
flywheel-compounding W=8 corpus-bound and go-complexity-ceiling W=6
in-flight via PR #177).
boshu2 pushed a commit that referenced this pull request Apr 30, 2026
…ites, 3 files)

Generator-layer test hygiene cycle, continues PR #177's t.Chdir migration
on disjoint files (PR #177 covered cmd/ao tests; this cycle covers
internal/goals tests + the remaining doctor_test sites).

Files migrated (20 sites):

- cli/internal/goals/commands_test.go (17 sites): the per-package
  `chdir(t, dir) func()` helper is gone — every call site was the same
  exact two-line `cleanup := chdir(t, tmp); defer cleanup()` pattern,
  collapsed to `t.Chdir(tmp)`. Helper definition removed.
- cli/internal/goals/measure_test.go (1 site):
  TestGitSHA_OutsideGitRepo — the four-line origDir/Chdir/defer block
  collapsed to one line.
- cli/cmd/ao/doctor_test.go (2 sites): TestCheckHookCoverage and
  TestFallbackReasonSurfaced — both used the seven-line
  `oldWD, err := os.Getwd(); ... t.Cleanup(...)` pattern; collapsed to
  one line each.

Net: +20 lines, -74 lines. All package tests pass (`go test ./...`),
`go vet ./...` clean, fast pre-push gate passes, fitness still 17 pass
/ 2 fail (W=8 flywheel-compounding corpus-bound + W=6 go-complexity-
ceiling in-flight via PR #177).
boshu2 pushed a commit that referenced this pull request Apr 30, 2026
Generator-layer test hygiene. Continues t.Chdir migration on the
disjoint cli/cmd/ao/index_test.go (PR #177 covered different test files;
this file is independent so the migration does not conflict).

4 sites migrated (TestRunIndex_WriteMode/CheckMode_Stale/JSONOutput/
SingleDir): each had the 8-line `prevWD, err := os.Getwd(); ... ;
t.Cleanup(func() { _ = os.Chdir(prevWD) })` block, collapsed to a
single `t.Chdir(tmp)` call. The trailing bare `err = runIndex(...)`
that previously reused the `err` declared by `os.Getwd` is now
`err := runIndex(...)` (the test's `err` is now scoped to that
single assignment, which is fine since each subsequent use is a
read, not a reassignment).

Net: -36 lines, +8 lines. All cmd/ao tests pass; vet clean; goals
measure unchanged (17 pass / 2 fail).
boshu2 pushed a commit that referenced this pull request Apr 30, 2026
…, 2 files)

Generator-layer test hygiene cycle. Continues t.Chdir migration on
disjoint internal/ packages. PR #177 covered cmd/ao test files; this
cycle handles internal/search and internal/ratchet.

- cli/internal/search/constraint_test.go (18 sites): every usage of the
  three-line `prev, _ := os.Getwd(); defer func(){_ = os.Chdir(prev)}();
  _ = os.Chdir(tmp)` pattern collapses to `t.Chdir(tmp)`. Pattern was
  identical at every call site so a single replace_all captured them.
  No err shadowing or downstream `prev` references — the `_` discards
  in the original pattern proved the variable wasn't observed elsewhere.

- cli/internal/ratchet/gate_test.go (1 helper, 2 call sites): the
  package's `chdirTemp(t, dir)` helper had a 14-line manual save/restore
  using `os.Getwd` + `t.Cleanup`. Replaced the helper body with a single
  `t.Chdir(dir)` call. The two callers (line 646, 967) keep using the
  helper unchanged, so the migration is internal to the helper.

Net: -24 lines, +7 lines. Build clean, vet clean, both packages' tests
pass, fitness 17 pass / 2 fail (W=8 flywheel-compounding corpus,
W=6 go-complexity-ceiling in-flight via PR #177).

The rpi/worktree_test.go remaining os.Chdir sites are intentional
subprocess-helper code (not test cleanup) and were left unchanged.
boshu2 pushed a commit that referenced this pull request Apr 30, 2026
…ites)

Generator-layer test hygiene. Continues t.Chdir migration on disjoint
files (PR #177 covered different cmd/ao tests). The 7 sites in
extract_test.go all used the same identical 5-line pattern:

    origDir, _ := os.Getwd()
    defer func() { _ = os.Chdir(origDir) }()
    if err := os.Chdir(tempDir); err != nil {
        t.Fatal(err)
    }

The `_` discards err from os.Getwd and the `if err := ...` block has
its own scope, so no err-shadowing risk; a single replace_all swept
all sites cleanly.

Net: -28 lines, +7 lines. Build clean, vet clean, full cmd/ao test
suite passes (29.7s), fitness 17 pass / 2 fail.