Step 2.75: Replay harness v0 + CI gate #24

## Goal

Build the fleet replay harness (captured-stream loader, run-once promotion/prioritization entry point, golden-scenario assertions, and the one-boss invariant test stub) and gate it in CI so that intended changes to Overseer logic can be distinguished from regressions.

## Spec

- `docs/plans/2026-06-03-overseer-build-sequence.md` Step 2.75 (primary)
- `docs/plans/2026-06-03-overseer-prioritization.md` §6 (replay / evaluation harness, golden test cases, KPIs)
- `docs/adr/0001-worker-facing-attribution-one-boss.md` §"Invariant test" (the one-boss invariant stub)
- `docs/plans/2026-06-03-overseer-contracts.md` §7 (transcript retention — fixtures must not be production transcripts)

## Acceptance

- [ ] Captured-event-stream loader: reads `events`, `event_links`, and `inbox_items` from a snapshot file and replays them into a sandbox DB.
- [ ] Promotion + prioritization run-once entry point invokable against a snapshot without touching the production DB.
- [ ] Golden-scenario assertions for the starter set (prioritization §6 table), for example: 30 routine `progress` events surface nothing; events with the same `dedupe_key` collapse; a root-cause `blocked_by` chain surfaces the upstream cause, not its symptoms; stale items age. Initial target: at least 10 of the listed scenarios.
- [ ] One-boss invariant test stub (ADR-001 §"Invariant test"): for every `dispatched` event, the corresponding worker-facing `messages` row carries no Overseer-attribution metadata and the rendered instruction contains no generated attribution boilerplate. Passes vacuously now (no dispatches yet) but the assertion shape is wired so #26 activates real coverage automatically.
- [ ] CI gate: harness runs on every PR touching Overseer logic, inbox scoring, event taxonomy, or worker-emission contract. Failure blocks merge.
- [ ] Captured fixtures live under `test/fixtures/overseer-replay/` and are NOT production transcripts.
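
To make the loader and golden-scenario criteria above concrete, here is a minimal sketch, assuming a SQLite in-memory sandbox and a JSON snapshot with one array per table. The column layout, the `blocked` event kind, and every function name are hypothetical. `run_once` is a toy stand-in for the promotion + scoring pass from #23 (it skips `progress` events and collapses the rest by `dedupe_key`), not the real logic.

```python
import json
import sqlite3

# Hypothetical minimal schema; the real tables will carry more columns.
SCHEMA = """
CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT NOT NULL, dedupe_key TEXT);
CREATE TABLE event_links (src_id INTEGER, dst_id INTEGER, relation TEXT);
CREATE TABLE inbox_items (id INTEGER PRIMARY KEY, dedupe_key TEXT, title TEXT);
"""


def replay_snapshot(snapshot):
    """Replay a snapshot dict into a fresh in-memory sandbox DB."""
    db = sqlite3.connect(":memory:")  # never opens the production DB
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO events VALUES (:id, :kind, :dedupe_key)",
                   snapshot["events"])
    db.executemany("INSERT INTO event_links VALUES (:src_id, :dst_id, :relation)",
                   snapshot["event_links"])
    db.executemany("INSERT INTO inbox_items VALUES (:id, :dedupe_key, :title)",
                   snapshot["inbox_items"])
    db.commit()
    return db


def load_snapshot_file(path):
    """Read a JSON snapshot file and replay it into a sandbox DB."""
    with open(path, encoding="utf-8") as fh:
        return replay_snapshot(json.load(fh))


def run_once(db):
    """Toy stand-in for the #23 promotion pass (assumption, not the real rule).

    Skips routine `progress` events, creates one inbox item per distinct
    dedupe_key among the rest, and returns the surfaced keys.
    """
    db.execute(
        "INSERT INTO inbox_items (dedupe_key, title) "
        "SELECT dedupe_key, MIN(kind) FROM events "
        "WHERE kind != 'progress' GROUP BY dedupe_key"
    )
    db.commit()
    rows = db.execute("SELECT dedupe_key FROM inbox_items ORDER BY dedupe_key")
    return [row[0] for row in rows]


def progress_snapshot(count=30):
    """Golden scenario input: `count` routine progress events."""
    events = [{"id": i, "kind": "progress", "dedupe_key": None}
              for i in range(1, count + 1)]
    return {"events": events, "event_links": [], "inbox_items": []}


def duplicate_snapshot():
    """Golden scenario input: two events sharing one dedupe_key."""
    events = [
        {"id": 1, "kind": "blocked", "dedupe_key": "ci-red"},
        {"id": 2, "kind": "blocked", "dedupe_key": "ci-red"},
    ]
    return {"events": events, "event_links": [], "inbox_items": []}
```

A golden-scenario test would then assert on what `run_once` surfaces, for instance that `run_once(replay_snapshot(progress_snapshot()))` is empty. Swapping the toy `run_once` for the real entry point should leave the assertions unchanged.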

## Out of scope

- Salience-weight learning / quantitative persona tuning beyond the golden set (post-MVP).
- The dispatch path itself (#26) — only the invariant stub is wired here.
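
The invariant stub that is in scope here could take roughly the following shape. This is a sketch under stated assumptions: the `messages.event_id` join column, the JSON `metadata` column, and the contents of `ATTRIBUTION_KEYS` and `ATTRIBUTION_PHRASES` are all hypothetical placeholders; ADR-001 defines what actually counts as attribution.

```python
import json
import sqlite3

# Hypothetical markers; the real lists come from ADR-001.
ATTRIBUTION_KEYS = {"overseer_id", "dispatched_by"}
ATTRIBUTION_PHRASES = ("on behalf of the overseer",)


def make_sandbox():
    """Create an in-memory DB with an assumed, simplified schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT NOT NULL);
        CREATE TABLE messages (
            id INTEGER PRIMARY KEY, event_id INTEGER, metadata TEXT, body TEXT
        );
        """
    )
    return db


def one_boss_violations(db):
    """Return (event_id, reason) pairs for worker-facing messages that leak
    Overseer attribution. With no `dispatched` events the result is empty,
    so the test passes vacuously until dispatches exist."""
    rows = db.execute(
        "SELECT e.id, m.metadata, m.body FROM events e "
        "JOIN messages m ON m.event_id = e.id "
        "WHERE e.kind = 'dispatched' ORDER BY e.id"
    ).fetchall()
    violations = []
    for event_id, metadata_json, body in rows:
        metadata = json.loads(metadata_json or "{}")
        leaked = sorted(ATTRIBUTION_KEYS.intersection(metadata))
        if leaked:
            violations.append((event_id, "metadata: " + ", ".join(leaked)))
        text = (body or "").lower()
        for phrase in ATTRIBUTION_PHRASES:
            if phrase in text:
                violations.append((event_id, "boilerplate: " + phrase))
    return violations
```

Because the query is driven by `dispatched` events, the same assertion (`one_boss_violations(db) == []`) starts exercising real rows as soon as replayed snapshots contain dispatches, with no change to the test itself.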

## Dependencies

- Blocks: #25 (Steps 3-4 ship behind the harness)
- Blocked by: #23 (replay asserts against promotion + scoring output)
- Part of: #19

## Suggested PR breakdown

1 PR: replay harness v0; golden scenarios; one-boss invariant test stub; CI gate.

## Risks

- Skipping or under-investing in this step is the single highest-leverage way to fail the whole project. Without harness-backed assertions, Steps 3-4 ship behavior changes that nobody can classify as improving or regressing the persona, and every prompt edit becomes a hand-eval. Build at least the 10-scenario starter set and the one-boss invariant stub.
- Captured fixtures must NOT be production transcripts (contracts §7). Use only sanitized synthetic fixtures or explicitly captured non-production sessions; otherwise the harness leaks operator transcript data into CI logs, artifact storage, or public PR diffs.
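
One way to make the fixture rule mechanically checkable is a small guard that runs alongside the harness. The sketch below assumes a convention that is not in the spec: each fixture is a JSON object with a top-level `provenance` field, and only the values in `ALLOWED_PROVENANCE` are accepted. Both the field name and the allowed values are hypothetical.

```python
import json
from pathlib import Path

# Hypothetical convention: every fixture declares where it came from.
ALLOWED_PROVENANCE = {"synthetic", "non-production"}


def unvetted_fixtures(fixture_dir):
    """Return names of JSON fixtures that lack an allowed provenance tag."""
    bad = []
    for path in sorted(Path(fixture_dir).glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        tagged = isinstance(data, dict) and data.get("provenance") in ALLOWED_PROVENANCE
        if not tagged:
            bad.append(path.name)
    return bad
```

A CI step could call `unvetted_fixtures("test/fixtures/overseer-replay")` and fail when the list is non-empty. A tag like this only records intent; it does not prove a fixture is free of production data, so it complements review rather than replacing it.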
