Skip to content

Feat/api cesar feel#213

Merged
cukas merged 3 commits into
mainfrom
feat/api-cesar-feel
Jun 12, 2026
Merged

Feat/api cesar feel#213
cukas merged 3 commits into
mainfrom
feat/api-cesar-feel

Conversation

@cukas

@cukas cukas commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

No description provided.

Instrument the three API-session guards (solo-coding gate, read-spin
breaker, ReportConfidence) with zero-hot-path-I/O telemetry so a week of
real use can answer, per engine, whether each guard fire actually
prevented a bad action or was pure ceremony:

- packages/core kern/telemetry: per-turn tracker (in-memory only;
  finalize is pure) + pure resolution derivation — a blocked write that
  gets re-issued near-identically resolves as ceremony, a changed write
  as prevented, a different path as redirected; only successful writes
  count as evidence. ReportConfidence calls resolve to calibration
  buckets (stated confidence vs subsequent edit success).
- Append-only JSONL ledger + per-engine×guard counters snapshot under
  ~/.agon/telemetry (atomic writes, file-locked RMW, 10MB rotation,
  best-effort: never throws into the session; raw write content never
  lands on disk — argsPreview + contentHash only).
- session-resume wiring: per-step round-trip/assembly/tool-exec timing,
  parallel-call counts, guard-overhead timing, exactly-once flush on
  every exit path (incl. generator abandonment); abnormal exits finalize
  as aborted so open fires stay unresolved.
- Read-spin delayed observation: the first sole-trigger crossing defers
  the recovery nudge one step to observe a true would-have-recovered
  counterfactual; the wasted extra round-trip is charged to the fire's
  overhead. Hard ceiling and all other triggers unchanged. Kill switch:
  AGON_GUARD_TELEMETRY=0.
- StatusDashboard: per-guard signal rows (ceremony/prevented,
  would-have-recovered/stalled, calibration hit rates) + per-engine
  parallel-rate/round-trip/assembly aggregates, 60s-memoized snapshot
  keyed to AGON_HOME, week-1 relax/keep recommendation per cell.

Reviewed by a multi-engine panel over 4 rounds (claude+codex; all
accepted findings fixed). 199 test files / 3001 tests green.

⚔️ Forged by [Agon](https://github.com/KERNlang/agon)

Co-Authored-By: agon (KERN) <292465531+KERN-Agon@users.noreply.github.com>
…engine flag

Collapse the three API-Cesar guards (solo-coding gate, read-spin breaker,
ReportConfidence ceremony) into two outcome-based invariants plus
self-verifying edits, opt-in per engine. Default is unchanged: without
the flag every engine runs the existing strict guards byte-identically
(proven by a golden test that replays both legacy gate texts).

- guards (packages/core kern/guards): pure per-call pipeline —
  grounded-write invariant (block Edit/Write/MultiEdit to a path never
  Read/Glob'd this session; net-new files whitelisted; read-then-edit
  passes instantly, including same-batch grounding), evidence invariant
  (a completion claim needs a state-advancing tool result, a green
  diagnostic, or an explicit unresolved-failure statement; one
  corrective nudge per turn, then pass-through with telemetry),
  information-gain ladder replacing the read-spin counter (cache-key +
  result-path + error + bash-stdout signals; nudge at 4 consecutive
  zero-gain stalls, hard stop at 8, global backstop 12), write-spin
  hard stop (3 identical ungrounded re-issues; RetrieveResult doesn't
  count as grounding), and a confidence gate that asks for
  ReportConfidence only on risky Bash, >=3-file writes, or delegation.
- Modes: engines/*.json guards: strict|invariants|shadow with
  ~/.agon/config.json override; shadow evaluates and records what
  would have fired without blocking anything.
- diagnostics (packages/core kern/diagnostics): auto post-Edit checker
  (tsc/ruff/pyright only when the repo advertises one), 400ms debounce
  per package, single in-flight, 4s budget, introduced-errors-only
  digest (fingerprint baseline, <=20 lines) re-entering the same turn
  as a paired tool message; full output retrievable via RetrieveResult.
  A green digest doubles as completion evidence.
- Session wiring: session-scoped read-path registry (persisted with
  session state, carried across engine handoffs in the workspace
  conversation, derived best-effort for old files), every verdict —
  including shadowed ones — recorded into the Phase-0 guard telemetry
  (new guard ids: evidence, confidence-escalation).
- Cesar prompt: under invariants the every-turn ReportConfidence
  ceremony is dropped (risky-action/uncertainty only); strict and
  shadow prompts unchanged.

Reviewed by a multi-engine panel over 5 rounds; all accepted findings
fixed (incl. fire-routing corruption, registry lifetime, MultiEdit
coverage, provider-correct digest pairing, cross-engine handoff).
203 test files / 3117 tests green; strict golden green.

⚔️ Forged by [Agon](https://github.com/KERNlang/agon)

Co-Authored-By: agon (KERN) <292465531+KERN-Agon@users.noreply.github.com>
…king-set carry-forward

Three council-validated quality-of-feel improvements for API-Cesar sessions:

- Dead-air heartbeat (CLI, UI-only): the active status strip appends an
  adaptive "· thinking/responding/processing… Ns" suffix after 2s of no
  meaningful event, capped at 120s with a not-stalled reassurance. Fingerprint
  keys off the normalized phase (parseHeartbeatPhase) so the brain's embedded
  elapsed counter never resets the anchor; streamed chunks DO reset it;
  jobs-only activity suppresses the suffix. Reuses the strip's existing
  1s isActive-gated ticker — zero new idle repaints, no new events/props.

- Read dedupe (core): session-scoped meta-cache keyed like the tool cache;
  one statSync (mtime+size) gates staleness; unchanged re-reads get a
  3-tier stub — inline ("already in context", slice-aware wording),
  RetrieveResult pointer (disk-cached/folded, live-checked), or clean
  re-execute. Large >INLINE_LIMIT reads are marked compacted at record
  time so the stub never overclaims. Invalidation: Edit/Write/MultiEdit
  per-path, Bash clears all; cache capped at 200, compacted-id set pruned
  at compaction. Kill-switch AGON_READ_DEDUPE=off. Dedupe hits charge 0
  tool-exec ms but still feed read-spin/info-gain detectors.

- Working-set carry-forward (core, compaction-only): CompactionSummary
  gains a WorkingSet — filesInPlay (recency-first: current-cycle writes
  incl. MultiEdit via isWriteTool + live registry, cap 10), pendingVerifier
  (DiagnosticRunner.lastVerifierStatus, timed-out checked before clean,
  mirrored at every drain site), and recent decisions/discoveries — rendered
  as a single WORKING SET line in the compaction summary. Per-turn injection
  deferred with P3.

Strict mode stays byte-identical (guard-strict-golden green). Reviewed by
multi-engine agon review over 5 rounds to zero codex findings.

⚔️ Forged by [Agon](https://github.com/KERNlang/agon)

Co-Authored-By: agon (KERN) <292465531+KERN-Agon@users.noreply.github.com>
@cukas cukas merged commit 4fbd54b into main Jun 12, 2026
2 checks passed
@cukas cukas deleted the feat/api-cesar-feel branch June 12, 2026 15:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants