This repository was archived by the owner on Jun 8, 2026. It is now read-only.
Claude Code reflection plugin + 907-stop classification baseline #139
Merged
Group A of plan v2 for issue #137. Lays the foundation for the Claude Code reflection plugin without enabling it end-to-end yet:
- claude/.claude-plugin/plugin.json + hooks/hooks.json: Stop hook wiring.
- claude/bin/reflect.mjs: entry skeleton with loop-guard, attempt counter, transcript tail-read, debug logging, and fail-safe error handling. Strips tool_use/tool_result from the stop context per spec (only user messages and the final assistant text reach the judge).
- claude/README.md, claude/package.json: install + author docs.
- evals/scripts/mine-cc-stops.mjs: scans ~/.claude/projects/**/*.jsonl, extracts Stop boundaries, and emits candidate JSONL with metadata (tools_available_inferred, user_messages, final_assistant_text).
- .gitignore: excludes raw cc-stop-*.jsonl datasets (they contain user data) while allowing the redacted gold set to be committed.
No classifier yet. No inject yet. The plugin loads but exits 0 on every Stop.
Next: run miner, filter, classify with Claude Code haiku.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
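The stripping of tool_use/tool_result described above could look roughly like the sketch below. The transcript entry shape (`type`, `message.content` as a string or an array of typed blocks) is an assumption for illustration, and `textOf` / `buildStopContext` are hypothetical names, not necessarily the ones used in reflect.mjs.

```javascript
// Assumed entry shape: { type: "user" | "assistant", message: { content } }
// where content is a string or an array of blocks such as
// { type: "text", text }, { type: "tool_use", ... }, { type: "tool_result", ... }.
function textOf(content) {
  if (typeof content === "string") return content;
  if (!Array.isArray(content)) return "";
  // Keep only plain text blocks; tool_use / tool_result blocks are dropped.
  return content
    .filter((block) => block.type === "text")
    .map((block) => block.text)
    .join("\n");
}

function buildStopContext(entries) {
  const userMessages = [];
  let finalAssistantText = "";
  for (const entry of entries) {
    const text = textOf(entry.message?.content).trim();
    if (!text) continue; // entry carried only tool traffic
    if (entry.type === "user") userMessages.push(text);
    else if (entry.type === "assistant") finalAssistantText = text; // last one wins
  }
  return { user_messages: userMessages, final_assistant_text: finalAssistantText };
}
```

Only the two fields named in the commit message reach the output, so anything the judge sees is limited to user text and the last assistant text.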
Group B/C of plan v2.
- filter-cc-stops.mjs: heuristic pass over miner output. Tags each candidate with hint:summary_drift / hint:punt / hint:stuck / hint:question. Drops candidates with no hints (cheap "complete" answers).
- classify-cc-stops.mjs: calls the Anthropic API directly with the OAuth Bearer token from ~/.claude/.credentials.json (avoids the ~100K context bloat that `claude -p` loads from CLAUDE.md / skills / plugins). Same model (claude-haiku-4-5), same user auth, just routed direct. Concurrency 4, retry-on-429, resume-safe (skips records already in output).
Output JSONL stays gitignored (evals/datasets/cc-stop-*.jsonl) because it is real user session data. Only the redacted gold subset is committed downstream.
Smoke run: 10 samples classified in ~9s, 1294 input tokens/sample avg.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
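A minimal sketch of the heuristic filter pass described above. The four hint tags and the `final_assistant_text` field come from the commit messages; the regular expressions and the function names `tagCandidate` / `filterCandidates` are illustrative assumptions, not the patterns actually used in filter-cc-stops.mjs.

```javascript
// Hypothetical patterns: chosen only to illustrate the tagging idea.
const HINT_PATTERNS = {
  "hint:summary_drift": /\b(in summary|to summarize|here's what (i|we) (did|changed))\b/i,
  "hint:punt": /\b(you can run|you could run|you'll need to|please run)\b/i,
  "hint:stuck": /\b(i'm unable to|i am unable to|i cannot|i can't figure out)\b/i,
  "hint:question": /\?\s*$/, // final text ends with a question mark
};

// Attach every matching hint tag to a candidate record.
function tagCandidate(candidate) {
  const text = candidate.final_assistant_text ?? "";
  const hints = Object.entries(HINT_PATTERNS)
    .filter(([, pattern]) => pattern.test(text))
    .map(([hint]) => hint);
  return { ...candidate, hints };
}

// Drop candidates that received no hints at all.
function filterCandidates(candidates) {
  return candidates.map(tagCandidate).filter((c) => c.hints.length > 0);
}
```

Dropping untagged candidates before classification keeps the number of API calls down, at the cost of missing any failure mode the heuristics do not anticipate.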
End-to-end pipeline now works:
- claude/lib/judge.mjs: classifies a stop context into one of 6 categories
via Haiku 4.5 over the Anthropic API (OAuth Bearer from
~/.claude/.credentials.json, same path as the eval classifier). 15s
hard timeout via AbortController. TIMEOUT/PARSE_ERROR returns are
treated as "no inject" by the caller — fail-safe.
- claude/lib/feedback.mjs: per-category templates with escalating tone
across attempts 1/2/3. Injects on summary_drift_stop, tool_available_punt,
genuinely_stuck. Skips on complete, waiting_for_user_legitimate, working,
and any error category.
- claude/bin/reflect.mjs: replaced the task-11/13 TODO blocks. Now reads
stdin, applies loop-guard + attempt-cap, calls judge, writes verdict
file, and (if injectable) emits the {decision:"block", additionalContext}
JSON on stdout per Claude Code Stop hook spec.
Smoke-tested with a real transcript file. Verified:
- happy path produces a valid block payload with additionalContext
- stop_hook_active=true: exits 0, no stdout, logs loop_guard_triggered
- attempt counter at MAX: exits 0, no stdout, logs attempt_cap_reached
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
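The gating logic described in this commit (loop guard, attempt cap, injectable categories) can be sketched as a single pure function. The category names and the two log labels are taken from the messages in this PR; the function name `shouldInject`, its argument shape, and the `category_not_injectable` label are assumptions for illustration.

```javascript
// Categories that trigger an inject, per the commit message.
const INJECTABLE = new Set(["summary_drift_stop", "tool_available_punt", "genuinely_stuck"]);
const MAX_ATTEMPTS = 3; // the PR refers to a 3-inject cap

function shouldInject({ stopHookActive, attempts, category }) {
  // Loop guard: never re-inject while a Stop hook continuation is already running.
  if (stopHookActive) return { inject: false, reason: "loop_guard_triggered" };
  if (attempts >= MAX_ATTEMPTS) return { inject: false, reason: "attempt_cap_reached" };
  // complete, waiting_for_user_legitimate, working, TIMEOUT, PARSE_ERROR, ... all skip.
  if (!INJECTABLE.has(category)) return { inject: false, reason: "category_not_injectable" };
  return { inject: true, reason: "injectable" };
}
```

Using an allow-list means any unexpected classifier output, including error categories, falls through to "no inject", which matches the fail-safe intent stated above.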
#137)
- claude/test/reflect.test.mjs: 35 Node native-test cases covering feedback templates per category/attempt, reflect.mjs exports (loopGuard, attempt counter round-trip, transcript tail, stop context build), judge.mjs (stubbed fetch with zero real API calls, code-fence parsing, 429 retry, AbortController timeout, missing credentials path), and an in-process integration test (classify → buildFeedback → block output JSON). All 35 pass in ~300ms with --test-force-exit.
- claude/package.json: test script uses --test-force-exit + explicit glob (test discovery without glob silently mis-resolved on Node 22).
- evals/scripts/audit-cc-classifications.mjs: stratified sample (per-cat) + redaction (emails, tokens, /home paths, github refs, long secrets).
- evals/datasets/cc-stop-labeled-gold-redacted.jsonl: 30 records, stratified 6 per category across the 5 categories that appeared in the 907-record baseline. Supervisor-audited gold_label per record (v1 mostly accepts haiku, with one correction class: "complete" + ends-with-"Which?" → waiting_for_user_legitimate).
- evals/datasets/README.md: dataset provenance, redaction rules, baseline distribution, known prompt issues (link to follow-up #138).
Follow-up tracked in #138: refine classifier prompt (working over-assigned 374×, tool_available_punt under-assigned 0×). Acceptance: F1 ≥ 0.75 on the two high-value categories with an expanded gold set.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
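A per-category stratified sample like the one described for the gold set can be sketched as below. This version is deterministic (it keeps the first N records seen per category) and the `category` field name is an assumption; the actual audit script may sample randomly and use a different field.

```javascript
// Keep at most `perCategory` records for each category, in input order.
function stratifiedSample(records, perCategory, keyOf = (r) => r.category) {
  const buckets = new Map();
  for (const record of records) {
    const key = keyOf(record);
    const bucket = buckets.get(key) ?? [];
    if (bucket.length < perCategory) bucket.push(record);
    buckets.set(key, bucket);
  }
  // Map preserves insertion order, so categories appear in first-seen order.
  return [...buckets.values()].flat();
}
```

With 6 per category across 5 categories this yields the 30 records mentioned above, provided each category has at least 6 candidates.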
Reviewer raised 6 real issues, all fixed:
1. claude/bin/reflect.mjs:23: removed unused createRequire import.
2. claude/bin/reflect.mjs:100-109: added sanitizeCwd() helper. Rejects non-absolute or non-normalized cwd from the Stop hook payload (defends against payloads like cwd:"../etc"). On throw, the existing uncaughtException handler exits 0, which is fail-safe.
3. claude/bin/reflect.mjs:165-186: writeAttemptCounter is now atomic (tmp + POSIX rename) AND concurrency-safe: it only writes if the new count exceeds the existing on-disk count. Prevents two racing Stop hooks for the same session from clobbering each other and bypassing the 3-inject cap.
4. claude/bin/reflect.mjs:148-154: readAttempts handles a corrupt / partially-written counter file by returning 0 and logging "attempts_file_corrupt".
5. claude/lib/judge.mjs:43-62, 285+: added sanitizeError() helper. Strips Bearer/authorization/x-api-key from API error texts before they reach debug logs. Prevents the OAuth token from leaking if the Anthropic API echoes auth headers on a 401.
6. evals/scripts/audit-cc-classifications.mjs:34-40: strengthened redaction patterns: fixed "Accept-Bearer" → case-insensitive "Authorization: Bearer", added x-api-key, Stripe (sk/pk/rk_test/live), AWS access keys (AKIA...), and JWT-shaped tokens (a.b.c). JWT pattern placed before the long-secret regex because dots break \b boundaries.
Existing 35 unit tests still pass (npm test, 291ms). Smoke verified:
- valid absolute cwd → emits decision:block as before
- cwd:"/tmp/../etc" → sanitizeCwd throws → uncaughtException → exit 0, no stdout, no fs writes outside the project tree
- cwd:"./relative" → same fail-safe behavior
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
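The error-redaction step from this review round could be sketched as follows. The patterns and the `[REDACTED]` placeholder are assumptions chosen for illustration; the real sanitizeError() in judge.mjs may use different rules.

```javascript
// Hypothetical redaction rules, applied in order.
const SECRET_PATTERNS = [
  // "Authorization: Bearer abc" style header echoes: redact the rest of the value.
  [/(authorization\s*[:=]\s*)[^\r\n,;"']+/gi, "$1[REDACTED]"],
  [/(x-api-key\s*[:=]\s*)[^\s,;"']+/gi, "$1[REDACTED]"],
  // Any remaining "Bearer <token>" fragment, e.g. inside a JSON error body.
  [/(bearer\s+)[A-Za-z0-9._~+\/=-]+/gi, "$1[REDACTED]"],
];

function sanitizeError(err) {
  const text = err instanceof Error ? err.message : String(err);
  return SECRET_PATTERNS.reduce(
    (acc, [pattern, replacement]) => acc.replace(pattern, replacement),
    text
  );
}
```

Running the broader header patterns first and the bare Bearer pattern last means a token is caught even when it appears without a recognizable header name in front of it.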
Phase 7 reviewer flagged that the 35-test suite in claude/test/ was not run by CI; only the root Jest suite (test/*.ts) was. Adds a post-step that runs node --test --test-force-exit test/*.mjs in ./claude so future regressions land in CI, not on the dev box.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per user feedback: stubbed-fetch unit tests can't prove the Stop hook
actually fires inside Claude Code or that injects reach the agent. Real
E2E with `claude -p` + real Anthropic API is the only meaningful gate.
Changes:
1. Deleted claude/test/reflect.test.mjs (35 unit tests, all stubbed).
2. Removed the corresponding CI step in .github/workflows/test.yml.
3. Added claude/test/e2e-cc.mjs: real E2E runner with 4 scenarios:
- explicit_wait_negative: user says "wait" -> plugin must not inject.
- complete_negative: trivial Q&A -> plugin must not inject.
- attempt_cap_respected: multi-file task -> no false-positive injects,
attempt cap honored.
- direct_pipe_summary_drift: synthetic drift transcript piped directly
to reflect.mjs -> verifies the full inject path: real classifier
call, correct CC Stop hook schema in stdout, no hookSpecificOutput.
Run: node claude/test/e2e-cc.mjs (or per scenario: --scenario N).
Cost ~$0.05-0.20/scenario via Haiku 4.5 OAuth. Out of CI (auth + cost).
Bug fixes uncovered by E2E:
1. claude/bin/reflect.mjs: hook fires BEFORE transcript flush in -p
mode. Added poll loop (100ms x 10) that re-reads transcript until the
final assistant text appears. If still empty after polling, exit 0
(fail-safe -- better to skip than false-positive inject).
2. claude/bin/reflect.mjs: Stop hook JSON schema fix. CC v2.1.150
rejects { decision, reason, hookSpecificOutput: {...} } as "Invalid
input" -- that shape is for PreToolUse / PostToolUse. The correct
Stop hook shape per hookify/core/rule_engine.py and empirical test
is { decision: "block", reason }. CC injects reason as the agent's
next-turn instruction; the longer feedback message now goes in
reason. Verified by hook_blocking_error attachment + isMeta user
message "Stop hook feedback: <reason>" in the transcript.
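The two fixes above can be sketched together: a bounded poll for the final assistant text, and the Stop hook payload shape reported to work on CC v2.1.150. The helper names and the injected `readFinalText` callback are hypothetical; only the `{ decision: "block", reason }` shape and the 100ms x 10 polling budget come from the commit message.

```javascript
// Stop hook output: no hookSpecificOutput, the feedback text goes in `reason`.
function buildStopOutput(feedback) {
  return JSON.stringify({ decision: "block", reason: feedback });
}

// Hypothetical helper: readFinalText is any function that returns the final
// assistant text from the transcript, or "" while it has not been flushed yet.
async function waitForFinalText(readFinalText, { tries = 10, delayMs = 100 } = {}) {
  for (let i = 0; i < tries; i++) {
    const text = await readFinalText();
    if (text) return text;
    // Sleep between attempts, but not after the last one.
    if (i < tries - 1) await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return ""; // caller is expected to exit 0 without injecting
}
```

Returning an empty string instead of throwing keeps the fail-safe path simple: the caller can treat "no text" the same way as "nothing to inject".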
E2E results (2026-05-26):
- 4/4 PASS
- s1 (explicit_wait_negative): 0 injects (correct)
- s2 (complete_negative): 0 injects (correct)
- s3 (attempt_cap_respected): 0 injects (Haiku didn't drift on this task)
- s4 (direct_pipe_summary_drift): 1 inject with schema-valid stdout
Known test-methodology limitation (follow-up): Haiku 4.5 rarely drifts
on small E2E prompts so scenario 3 is vacuously satisfied. The architecture
is proven; pattern provocation needs Sonnet or longer-horizon tasks.
Install for sessions (workaround for --plugin-dir not enabling Stop
hooks in -p mode, CC v2.1.150): merge hooks/hooks.json into your
~/.claude/settings.json under the "hooks" key, with command path
pointing at this plugin's bin/reflect.mjs absolute path. Plugin packaging
remains for future marketplace publication.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
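The install workaround above amounts to merging one Stop hook entry into settings.json. The sketch below assumes the settings layout `hooks.Stop[].hooks[]` with `{ type: "command", command }` entries; that layout, the `node "<path>"` command form, and the function name `mergeStopHook` are assumptions to check against hooks/hooks.json and the Claude Code hooks documentation before use.

```javascript
// Returns a new settings object with the reflect.mjs Stop hook added once.
function mergeStopHook(settings, reflectPath) {
  // POSIX-only check; the commit message asks for an absolute path.
  if (!reflectPath.startsWith("/")) throw new Error("reflect.mjs path must be absolute");
  const command = `node "${reflectPath}"`;
  const hooks = { ...(settings.hooks ?? {}) };
  const stop = [...(hooks.Stop ?? [])];
  const alreadyInstalled = stop.some((group) =>
    (group.hooks ?? []).some((hook) => hook.command === command)
  );
  if (!alreadyInstalled) stop.push({ hooks: [{ type: "command", command }] });
  return { ...settings, hooks: { ...hooks, Stop: stop } };
}
```

The function copies rather than mutates its input and is idempotent, so running an installer twice should not register the hook twice.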
…137)
- Install: settings.json hook is the authoritative path; --plugin-dir doesn't activate Stop hooks in headless -p mode on CC v2.1.150. Document the marketplace path as future work.
- Failure categories: corrected to the 6 the classifier actually uses (matched judge.mjs/feedback.mjs). Removed the older speculative context_exhaustion/decision_paralysis/false_completion entries that never landed in the prompt.
- Testing: documented the new E2E runner (node claude/test/e2e-cc.mjs) with scenario descriptions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Thinking Path
What Changed
- deploy/systemd/paperclip.service template covering the three common install styles (npx, source checkout via pnpm, source checkout via direct tsx) with inline TODO markers for the two values an operator must edit.
- doc/SYSTEMD.md walkthrough: install, enable lingering, start, verify, common ops, updating, and a troubleshooting section for the failure modes that bit me on first install (no-TTY tsx loader, Start request repeated too quickly, ENOSPC crash loops, tailnet bind ordering).
- README.md: one-line link from the install snippet so first-time self-hosters discover the systemd path without having to search.
- doc/DEVELOPING.md: one-paragraph cross-link next to the Docker Quickstart / Quadlet sections.
No code, no manifest, no lockfile, no Dockerfile changes: strictly docs + a sample unit.
Verification
Both pieces were validated on the host that motivated this change (Ubuntu 24, source checkout, tailnet bind, embedded Postgres):
1. Copy the template to ~/.config/systemd/user/paperclip.service and fill in the two TODO values.
2. sudo loginctl enable-linger "$USER", then systemctl --user daemon-reload && systemctl --user enable --now paperclip.service.
3. systemctl --user status paperclip.service → Active: active (running) within ~10 s.
4. journalctl --user -u paperclip.service -f shows the Paperclip banner, embedded PostgreSQL ready line, and Better Auth init.
5. curl -sf http://<bind-ip>:3100/api/health → {"status":"ok",...}.
Sanity-checked the doc against the failure paths in the troubleshooting section by reproducing them on the same host before writing them up.
Risks
Low. Pure docs and a sample file under a new deploy/systemd/ path. No existing files are removed, and no runtime, build, or CI behavior changes. The README/DEVELOPING edits are additive paragraphs. Worst case for a reader is that they follow the guide on an unsupported distro and the service does not start, at which point they are no worse off than before this PR existed.
Model Used
Claude Opus 4.7 (1M context window, extended thinking enabled, tool use including Bash/Read/Edit/Write/Grep). Human-reviewed, edited, and verified on the target host.
Checklist
Closes #467. Supersedes #555 (docs-only) by also shipping the sample unit file.