Feat/api cesar feel #213
Merged
Instrument the three API-session guards (solo-coding gate, read-spin breaker, ReportConfidence) with zero-hot-path-I/O telemetry, so that a week of real use can answer, per engine, whether each guard fire actually prevented a bad action or was pure ceremony:
- packages/core kern/telemetry: a per-turn tracker (in-memory only; finalize is pure) plus a pure resolution derivation. A blocked write that is re-issued near-identically resolves as ceremony, a changed write as prevented, and a write to a different path as redirected; only successful writes count as evidence. ReportConfidence calls resolve to calibration buckets (stated confidence vs. subsequent edit success).
- Append-only JSONL ledger plus a per-engine×guard counters snapshot under ~/.agon/telemetry (atomic writes, file-locked read-modify-write, 10MB rotation). Best-effort: it never throws into the session, and raw write content never lands on disk (argsPreview + contentHash only).
- Session-resume wiring: per-step round-trip/assembly/tool-exec timing, parallel-call counts, guard-overhead timing, and an exactly-once flush on every exit path (incl. generator abandonment). Abnormal exits finalize as aborted, so open fires stay unresolved.
- Read-spin delayed observation: the first sole-trigger crossing defers the recovery nudge by one step to observe a true would-have-recovered counterfactual; the wasted extra round-trip is charged to the fire's overhead. The hard ceiling and all other triggers are unchanged. Kill switch: AGON_GUARD_TELEMETRY=0.
- StatusDashboard: per-guard signal rows (ceremony/prevented, would-have-recovered/stalled, calibration hit rates) plus per-engine parallel-rate/round-trip/assembly aggregates, a 60s-memoized snapshot keyed to AGON_HOME, and a week-1 relax/keep recommendation per cell.
Reviewed by a multi-engine panel over 4 rounds (claude+codex; all accepted findings fixed). 199 test files / 3001 tests green.
⚔️ Forged by [Agon](https://github.com/KERNlang/agon)
Co-Authored-By: agon (KERN) <292465531+KERN-Agon@users.noreply.github.com>
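The resolution rule in the telemetry bullet (a near-identical re-issue is ceremony, a changed write is prevented, a different path is redirected, and only successful writes count as evidence) can be sketched as a pure function. This is a minimal sketch under stated assumptions, not the kern/telemetry code: `resolveBlockedWrite` and its types are hypothetical, exact `contentHash` equality stands in for the unspecified near-identical comparison, and letting a same-path write decide before any different-path write is an assumption.

```typescript
// Hypothetical types: the real kern/telemetry shapes are not shown in the PR.
type WriteAttempt = {
  path: string;
  contentHash: string; // only a hash is kept; raw write content never lands on disk
  succeeded: boolean;
};

type Resolution = "ceremony" | "prevented" | "redirected" | "unresolved";

/**
 * Resolve one blocked write (a guard fire) from the writes attempted after it.
 * Pure: no I/O, so it can run at turn finalization.
 */
function resolveBlockedWrite(
  blocked: WriteAttempt,
  laterWrites: readonly WriteAttempt[],
  turnAborted = false,
): Resolution {
  // Abnormal exits finalize as aborted, leaving open fires unresolved.
  if (turnAborted) return "unresolved";
  // Only successful writes count as evidence.
  const successes = laterWrites.filter((w) => w.succeeded);
  // Assumption: a successful write to the same path decides the outcome first.
  const samePath = successes.find((w) => w.path === blocked.path);
  if (samePath !== undefined) {
    // Exact hash equality stands in for the unspecified "near-identical" test.
    return samePath.contentHash === blocked.contentHash ? "ceremony" : "prevented";
  }
  return successes.length > 0 ? "redirected" : "unresolved";
}
```

Keeping the derivation pure means it can be unit-tested without touching the ledger; in this sketch any persistence would happen separately, after the resolution is computed.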
…engine flag
Collapse the three API-Cesar guards (solo-coding gate, read-spin breaker, ReportConfidence ceremony) into two outcome-based invariants plus self-verifying edits, opt-in per engine. The default is unchanged: without the flag, every engine runs the existing strict guards byte-identically (proven by a golden test that replays both legacy gate texts).
- Guards (packages/core kern/guards): a pure per-call pipeline.
  - Grounded-write invariant: block Edit/Write/MultiEdit to a path never Read/Glob'd this session; net-new files are whitelisted; read-then-edit passes instantly, including same-batch grounding.
  - Evidence invariant: a completion claim needs a state-advancing tool result, a green diagnostic, or an explicit unresolved-failure statement; one corrective nudge per turn, then pass-through with telemetry.
  - Information-gain ladder replacing the read-spin counter (cache-key + result-path + error + bash-stdout signals): nudge at 4 consecutive zero-gain stalls, hard stop at 8, global backstop at 12.
  - Write-spin hard stop: 3 identical ungrounded re-issues; RetrieveResult doesn't count as grounding.
  - Confidence gate: asks for ReportConfidence only on risky Bash, >=3-file writes, or delegation.
- Modes: guards: strict|invariants|shadow in engines/*.json, with a ~/.agon/config.json override; shadow evaluates and records what would have fired without blocking anything.
- Diagnostics (packages/core kern/diagnostics): automatic post-Edit checker (tsc/ruff/pyright, only when the repo advertises one), 400ms debounce per package, single in-flight run, 4s budget, and an introduced-errors-only digest (fingerprint baseline, <=20 lines) that re-enters the same turn as a paired tool message; full output is retrievable via RetrieveResult. A green digest doubles as completion evidence.
- Session wiring: a session-scoped read-path registry (persisted with session state, carried across engine handoffs in the workspace conversation, derived best-effort for old files); every verdict, including shadowed ones, is recorded into the Phase-0 guard telemetry (new guard ids: evidence, confidence-escalation).
- Cesar prompt: under invariants, the every-turn ReportConfidence ceremony is dropped (risky-action/uncertainty only); strict and shadow prompts are unchanged.
Reviewed by a multi-engine panel over 5 rounds; all accepted findings fixed (incl. fire-routing corruption, registry lifetime, MultiEdit coverage, provider-correct digest pairing, cross-engine handoff). 203 test files / 3117 tests green; strict golden green.
⚔️ Forged by [Agon](https://github.com/KERNlang/agon)
Co-Authored-By: agon (KERN) <292465531+KERN-Agon@users.noreply.github.com>
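The grounded-write invariant in the guards bullet can be sketched as a pure check over one batch of tool calls. This is an illustrative sketch, not the kern/guards code: `checkGroundedWrites`, `ToolCall`, `Verdict`, and the `fileExists` callback are hypothetical; a Glob is assumed to be reported as one entry per matched path; and the evidence invariant, information-gain ladder, and write-spin stop are left out.

```typescript
// Hypothetical shapes: the real kern/guards types and tool-call format are not shown in the PR.
type ToolCall = { tool: string; path: string };
type Verdict = { call: ToolCall; allowed: boolean; reason?: string };

const WRITE_TOOLS = new Set(["Edit", "Write", "MultiEdit"]);
// Assumption: a Glob call is modeled as one entry per matched path.
// RetrieveResult is deliberately absent: it does not count as grounding.
const GROUNDING_TOOLS = new Set(["Read", "Glob"]);

/**
 * Evaluate one batch of tool calls in order. A Read/Glob earlier in the same
 * batch grounds a later write ("same-batch grounding"). `readPaths` is the
 * session-scoped registry and is updated in place.
 */
function checkGroundedWrites(
  batch: readonly ToolCall[],
  readPaths: Set<string>,
  fileExists: (path: string) => boolean,
): Verdict[] {
  return batch.map((call): Verdict => {
    if (GROUNDING_TOOLS.has(call.tool)) {
      readPaths.add(call.path);
      return { call, allowed: true };
    }
    if (!WRITE_TOOLS.has(call.tool)) return { call, allowed: true };
    if (readPaths.has(call.path)) return { call, allowed: true }; // read-then-edit
    if (!fileExists(call.path)) return { call, allowed: true }; // net-new file
    return { call, allowed: false, reason: `Read ${call.path} before editing it` };
  });
}
```

Passing the same registry `Set` to every batch is what lets a Read from an earlier turn ground a later edit; the PR's registry is additionally persisted with session state and carried across engine handoffs, which this sketch does not attempt.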
…king-set carry-forward
Three council-validated quality-of-feel improvements for API-Cesar sessions:
- Dead-air heartbeat (CLI, UI-only): the active status strip appends an
adaptive "· thinking/responding/processing… Ns" suffix after 2s of no
meaningful event, capped at 120s with a not-stalled reassurance. Fingerprint
keys off the normalized phase (parseHeartbeatPhase) so the brain's embedded
elapsed counter never resets the anchor; streamed chunks DO reset it;
jobs-only activity suppresses the suffix. Reuses the strip's existing
1s isActive-gated ticker — zero new idle repaints, no new events/props.
- Read dedupe (core): session-scoped meta-cache keyed like the tool cache;
one statSync (mtime+size) gates staleness; unchanged re-reads get a
3-tier stub — inline ("already in context", slice-aware wording),
RetrieveResult pointer (disk-cached/folded, live-checked), or clean
re-execute. Large >INLINE_LIMIT reads are marked compacted at record
time so the stub never overclaims. Invalidation: Edit/Write/MultiEdit
per-path, Bash clears all; cache capped at 200, compacted-id set pruned
at compaction. Kill-switch AGON_READ_DEDUPE=off. Dedupe hits charge 0
tool-exec ms but still feed read-spin/info-gain detectors.
- Working-set carry-forward (core, compaction-only): CompactionSummary
gains a WorkingSet — filesInPlay (recency-first: current-cycle writes
incl. MultiEdit via isWriteTool + live registry, cap 10), pendingVerifier
(DiagnosticRunner.lastVerifierStatus, timed-out checked before clean,
mirrored at every drain site), and recent decisions/discoveries — rendered
as a single WORKING SET line in the compaction summary. Per-turn injection
deferred with P3.
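The read-dedupe bullet describes a stat-gated, slice-aware cache with three outcomes. The following is a minimal sketch of that idea, not the core implementation: `ReadDedupeCache`, its methods, and the injected `statFile` callback (standing in for a statSync-based mtime+size lookup so the sketch needs no Node typings) are hypothetical names, and the inline limit is a constructor argument because the PR does not state INLINE_LIMIT's value. The live check behind the RetrieveResult pointer and the compaction-time pruning are omitted.

```typescript
// Hypothetical sketch: the real cache key, INLINE_LIMIT value and stub wording are not given in the PR.
type FileStat = { mtimeMs: number; size: number };
type DedupeDecision =
  | { kind: "inline" } // "already in context" stub
  | { kind: "pointer"; resultId: string } // RetrieveResult pointer
  | { kind: "execute" }; // clean re-execute

type ReadEntry = { path: string; stat: FileStat; resultId: string; compacted: boolean };

class ReadDedupeCache {
  private readonly entries = new Map<string, ReadEntry>();
  private readonly statFile: (path: string) => FileStat | undefined;
  private readonly inlineLimit: number;
  private readonly maxEntries: number;

  constructor(statFile: (path: string) => FileStat | undefined, inlineLimit: number, maxEntries = 200) {
    this.statFile = statFile;
    this.inlineLimit = inlineLimit;
    this.maxEntries = maxEntries;
  }

  // Slice-aware key: the same file read with a different offset/limit is a separate entry.
  private static key(path: string, offset?: number, limit?: number): string {
    return JSON.stringify([path, offset ?? null, limit ?? null]);
  }

  record(path: string, resultId: string, contentLength: number, offset?: number, limit?: number): void {
    const stat = this.statFile(path);
    if (stat === undefined) return;
    const key = ReadDedupeCache.key(path, offset, limit);
    this.entries.delete(key); // re-insert so Map order reflects the latest record
    // Oversized reads are marked compacted up front so the stub never claims they are inline.
    this.entries.set(key, { path, stat, resultId, compacted: contentLength > this.inlineLimit });
    if (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  check(path: string, offset?: number, limit?: number): DedupeDecision {
    const key = ReadDedupeCache.key(path, offset, limit);
    const entry = this.entries.get(key);
    if (entry === undefined) return { kind: "execute" };
    const stat = this.statFile(path); // the single stat call that gates staleness
    if (stat === undefined || stat.mtimeMs !== entry.stat.mtimeMs || stat.size !== entry.stat.size) {
      this.entries.delete(key);
      return { kind: "execute" };
    }
    return entry.compacted ? { kind: "pointer", resultId: entry.resultId } : { kind: "inline" };
  }

  /** Edit/Write/MultiEdit invalidate every cached slice of one path. */
  invalidatePath(path: string): void {
    this.entries.forEach((entry, key) => {
      if (entry.path === path) this.entries.delete(key);
    });
  }

  /** Bash may touch anything, so it clears the whole cache. */
  clearAll(): void {
    this.entries.clear();
  }
}
```

Storing the path inside each entry, rather than parsing it back out of the key, keeps per-path invalidation correct even for paths that contain separator characters.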
Strict mode stays byte-identical (guard-strict-golden green). Reviewed by
a multi-engine agon review over 5 rounds, down to zero codex findings.
⚔️ Forged by [Agon](https://github.com/KERNlang/agon)
Co-Authored-By: agon (KERN) <292465531+KERN-Agon@users.noreply.github.com>
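The dead-air heartbeat rule from the last commit message can be sketched as a small state holder polled by the strip's existing 1s ticker. This is a sketch under stated assumptions, not the CLI code: `DeadAirHeartbeat` and `normalizeStatus` are hypothetical (the latter stands in for parseHeartbeatPhase, whose real signature is not shown), the elapsed-counter pattern it strips is a guess, and the wording of the capped suffix is invented for illustration.

```typescript
// Hypothetical sketch: the real parseHeartbeatPhase and strip wording are not shown in the PR.
type Phase = "thinking" | "responding" | "processing";

const HEARTBEAT_DELAY_MS = 2_000; // suffix appears after 2s without a meaningful event
const HEARTBEAT_CAP_S = 120; // the displayed counter stops growing here

/** Drop volatile elapsed counters such as "(12s)" so they cannot change the fingerprint. */
function normalizeStatus(status: string): string {
  return status.replace(/\s*\(?\d+(?:\.\d+)?s\)?/g, "").trim().toLowerCase();
}

class DeadAirHeartbeat {
  private fingerprint = "";
  private anchorMs = 0;

  /** Feed every status update; pass isChunk=true for streamed output, which always resets. */
  observe(status: string, nowMs: number, isChunk = false): void {
    const fp = normalizeStatus(status);
    if (isChunk || fp !== this.fingerprint) {
      this.fingerprint = fp;
      this.anchorMs = nowMs;
    }
  }

  /** Called from the existing 1s ticker; returns "" when no suffix should show. */
  suffix(phase: Phase, nowMs: number, jobsOnly: boolean): string {
    if (jobsOnly) return "";
    const idleMs = nowMs - this.anchorMs;
    if (idleMs < HEARTBEAT_DELAY_MS) return "";
    const seconds = Math.floor(idleMs / 1000);
    if (seconds >= HEARTBEAT_CAP_S) {
      return ` · ${phase}… ${HEARTBEAT_CAP_S}s+ (still working, not stalled)`;
    }
    return ` · ${phase}… ${seconds}s`;
  }
}
```

Because the fingerprint ignores the embedded elapsed counter, a status that only changes from "(3s)" to "(4s)" keeps the original anchor, while a streamed chunk always restarts the idle clock.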