feat(reviewer): INJ Phase 1 root fix — code-computed typed verdict (D-INJ-1) by ProtocolWarden · Pull Request #349 · ProtocolWarden/OperationsCenter

ProtocolWarden · 2026-06-20T03:04:48Z

Summary

First PR of Harness Trust-Hardening Phase 1 (INJ) — operator-implemented (the fleet must not author the controls that constrain it).

The reviewer emitted a free-text {"result": "LGTM"} that the model authored, so any prompt injection in the diff / campaign spec / Custodian findings contended directly for the merge decision (suppress a CONCERNS, forge an LGTM). Per spec §2.2 / D-INJ-1, the capability itself is removed.

Change

New pr_review_watcher/verdict.py: enumerated REVIEW_CHECKS, pure compute_verdict(checks) -> (result, failing), and verdict_schema_prompt().
The model now fills a typed {check_id, status, evidence_span} per check. _run_direct_review (the trust boundary) runs compute_verdict and returns a code-computed result — ignoring any model-authored result.
Fail-safe: missing / unknown / malformed → CONCERNS, never auto-LGTM (also satisfies D-INJ-2 degrade-to-stricter).

Acceptance (§2.4)

A forged {"result":"LGTM"} with no real checks computes to CONCERNS (unit + trust-boundary tests). 11 verdict-unit + 2 boundary tests; 237 reviewer tests pass; ruff/ty/audit clean.

Remaining Phase-1 PRs: typed circular hand-off (D-INJ-4), {detector_id,count} findings (D-INJ-3), output sanitization, nonce-fenced envelope, Custodian INJ1 detector.

…-INJ-1) The reviewer emitted a free-text {"result": "LGTM"} the MODEL authored, so any prompt injection in the diff/spec/Custodian findings contended directly for the merge decision. Per HARNESS_TRUST_HARDENING.md §2.2/D-INJ-1 the capability is removed: the model fills a typed {check_id, status, evidence_span} per enumerated review check and CODE computes LGTM/CONCERNS. - New pr_review_watcher/verdict.py: REVIEW_CHECKS, compute_verdict() (pure), verdict_schema_prompt(). - _run_direct_review (the trust boundary) computes the verdict from the model's typed checks and ignores any model-authored "result". - Fail-safe: missing/unknown/malformed -> CONCERNS, never auto-LGTM (D-INJ-2). Acceptance (§2.4): a forged {"result":"LGTM"} with no real checks computes to CONCERNS. 11 verdict-unit + 2 boundary tests; 237 reviewer tests pass. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

ProtocolWarden · 2026-06-20T03:33:40Z

Resolved: CI green on unchanged head — test suite validates implementation; automated review resumed

~~Needs human attention~~ (reason=ci_misconfigured_check). Left open — not merged (unresolved) and not closed (work preserved).

CI has not gone green after 20 checks (1 failing: License headers: failure). Not merged (red CI) and not closed (work preserved) — needs a human to fix CI.

ProtocolWarden · 2026-06-20T03:36:29Z

Needs human attention (reason=ci_misconfigured_check). Left open — not merged (unresolved) and not closed (work preserved).

CI has not gone green after 21 checks (1 failing: License headers: failure). Not merged (red CI) and not closed (work preserved) — needs a human to fix CI.

ProtocolWarden mentioned this pull request Jun 20, 2026

feat(reviewer): INJ Phase 1 — fenced findings (D-INJ-3) + nonce envelope + output sanitization #350

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(reviewer): INJ Phase 1 root fix — code-computed typed verdict (D-INJ-1)#349

feat(reviewer): INJ Phase 1 root fix — code-computed typed verdict (D-INJ-1)#349
ProtocolWarden wants to merge 1 commit into
mainfrom
goal/inj-typed-verdict

ProtocolWarden commented Jun 20, 2026

Uh oh!

ProtocolWarden commented Jun 20, 2026 •

edited

Loading

Uh oh!

ProtocolWarden commented Jun 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ProtocolWarden commented Jun 20, 2026

Summary

Change

Acceptance (§2.4)

Uh oh!

ProtocolWarden commented Jun 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ProtocolWarden commented Jun 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

ProtocolWarden commented Jun 20, 2026 •

edited

Loading