Pre-flight: emission-contract empirical sniff #20

@heavygee

Description

Goal

Measure, in a one-evening experiment, how reliably real worker agents comply with the prompted event-emission contract (contracts §1), so that Step 2's scope and risk are calibrated against real data rather than assumption.

Spec

  • docs/plans/2026-06-03-overseer-contracts.md §1 (worker event taxonomy, wire format, hub-observed fallback)
  • docs/plans/2026-06-03-overseer-build-sequence.md Step 2 (the events substrate this informs)

Experiment

  • Spawn one worker per flavor (Cursor / Claude / Codex) with the §1 wire-format prompt baked into the system instruction (sentinel-delimited JSON block, schema_version, event objects with event_type + summary).
  • Give each a small bounded task (run tests, open a PR, fix a small bug).
  • Measure: emission rate (did the worker emit at the moments the taxonomy expects?) and shape conformance (valid JSON inside sentinels, required fields present, sane event_type values).
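The shape-conformance half of the measurement could be scripted roughly as follows. This is only a sketch: the sentinel strings, the top-level `events` key, and the set of known event types are placeholders assumed for illustration (the real values live in contracts §1), and only the four event-type names mentioned in this issue are listed.

```python
import json
import re

# Hypothetical sentinels; the actual delimiters are defined in contracts §1.
BEGIN = "<<<EVENTS"
END = "EVENTS>>>"
# Assumed subset of the taxonomy: only names that appear in this issue.
KNOWN_EVENT_TYPES = {"completed", "failed", "blocked", "stale"}

BLOCK_RE = re.compile(re.escape(BEGIN) + r"(.*?)" + re.escape(END), re.DOTALL)


def check_block(raw):
    """Return the list of problems for one block (empty list = conformant)."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return ["invalid_json"]
    if not isinstance(payload, dict):
        return ["not_an_object"]
    problems = []
    if "schema_version" not in payload:
        problems.append("missing_schema_version")
    # Assumption: events are carried in a list under an "events" key.
    events = payload.get("events")
    if not isinstance(events, list) or not events:
        problems.append("missing_events")
        return problems
    for ev in events:
        if not isinstance(ev, dict):
            problems.append("event_not_object")
            continue
        for field in ("event_type", "summary"):
            if field not in ev:
                problems.append(f"missing_{field}")
        if "event_type" in ev:
            et = ev["event_type"]
            if not isinstance(et, str) or et not in KNOWN_EVENT_TYPES:
                problems.append("unknown_event_type")
    return problems


def check_transcript(text):
    """Check every sentinel-delimited block found in a worker transcript."""
    return [check_block(m.group(1)) for m in BLOCK_RE.finditer(text)]
```

Running `check_transcript` over each worker's captured output gives one problem list per emitted block, which is enough to tally "malformed in Y way" by hand or with a counter.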

Acceptance

  • A short written finding: "compliance is X%, malformed in Y way, hub-observed fallback needs to cover Z."
  • The finding names which event types workers reliably emit vs. routinely miss (e.g. completed, blocked, stale).
  • The finding recalibrates Step 2's hub-observed synthesis scope (which gaps the fallback must cover).
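One way to produce the per-event-type part of the finding is to record, for each moment where the taxonomy expects an event, whether the worker actually emitted it, then compute a rate per type. The sketch below assumes that hand-labelled input; the `threshold` of 0.8 separating "reliably emitted" from "routinely missed" is an arbitrary illustrative choice, not something the spec defines.

```python
from collections import defaultdict


def emission_rates(observations):
    """Compute the emission rate per event type.

    observations: iterable of (expected_event_type, was_emitted) pairs,
    one pair per moment where the taxonomy expected an event.
    """
    expected = defaultdict(int)
    emitted = defaultdict(int)
    for event_type, was_emitted in observations:
        expected[event_type] += 1
        if was_emitted:
            emitted[event_type] += 1
    return {t: emitted[t] / expected[t] for t in expected}


def split_by_reliability(rates, threshold=0.8):
    """Split event types into (reliable, missed) using a hypothetical threshold."""
    reliable = sorted(t for t, r in rates.items() if r >= threshold)
    missed = sorted(t for t, r in rates.items() if r < threshold)
    return reliable, missed
```

The `missed` list is a starting point for the gaps the hub-observed fallback would need to cover.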

Out of scope

Dependencies

Kill-criterion

If compliance is under ~40% even with prompt iteration, the prompted-emission contract is the wrong primitive and #22-#26 need a code-level emission API rewrite (see framing doc "things to push back on" #11). Surfacing this early is the entire point of the pre-flight.
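The kill-criterion can be made mechanical once compliance has been measured for each prompt iteration. The sketch below reads "even with prompt iteration" as "the best iteration still falls short", which is an interpretation rather than something the issue states, and it treats the approximate "~40%" as a hard 0.40 cutoff.

```python
# The "~40%" from this issue, treated as an exact cutoff for illustration.
KILL_THRESHOLD = 0.40


def kill_criterion_met(compliance_by_iteration):
    """Return True if no prompt iteration reached the threshold.

    compliance_by_iteration: compliance fractions (0.0 to 1.0), one per
    prompt iteration tried during the experiment.
    """
    if not compliance_by_iteration:
        raise ValueError("need at least one measured iteration")
    return max(compliance_by_iteration) < KILL_THRESHOLD
```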

Risks

  • A too-narrow task set could overstate compliance (workers emit well on the happy path, poorly under failure). Include at least one task that fails or stalls so failed / blocked / stale emission is exercised.

Metadata

Assignees

No one assigned

    Labels

    • architecture: Architectural / substrate work
    • fleet-overseer: Fleet attention-arbitration architecture
    • pre-flight: Pre-flight experiment that recalibrates downstream scope

    Projects

    No projects

    Milestone

    No milestone
