This repository was archived by the owner on Jun 8, 2026. It is now read-only.
Install reflection on Claude Code (/plugin) + OpenCode (plugin section); macOS keychain auth; cheaper CI eval#142
Merged
Conversation
…n section)
Claude Code port (claude/):
- Fix Stop hook contract: event name "Stop" (was "stop"), array hook
format, read last_assistant_message from hook stdin, emit
{decision:"block",reason} to re-prompt; exit 0 to approve.
- Mirror reflection-3.ts PREMATURE-STOP ANTIPATTERNS into judge.mjs
(PERMISSION-SEEKING / STOPPED-WITH-TODOS / FALSE-COMPLETE).
- Auth: ANTHROPIC_API_KEY fallback in addition to OAuth; fail-safe
approve-stop on judge error. Add REFLECTION_CC_FAKE_JUDGE test hook.
- Add root .claude-plugin/marketplace.json (source ./claude) so
`/plugin marketplace add dzianisv/opencode-plugins` + `/plugin install
reflection-cc` work. Un-ignore .claude-plugin/ in .gitignore.
OpenCode plugin-section install (packages/reflection/):
- Publishable npm package `opencode-reflection` mirroring the auto-review
packaging pattern: index.ts re-export, prepack/postpack symlink swap,
files allowlist. Add to opencode.json "plugin": ["opencode-reflection"]
(or a local path) instead of the copy script.
- README: document both install paths.
Verified: typecheck clean, claude e2e 2/2 (3 need live API creds),
npm pack --dry-run = 3 files / 20.8kB, plugin-load test unaffected.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
On macOS, Claude Code stores credentials in the login keychain (generic password "Claude Code-credentials"), not ~/.claude/.credentials.json. The judge's OAuth path only read the file, so the in-hook judge could never authenticate on a Mac and silently fell back to no-inject — i.e. the plugin did nothing for most Mac users. loadAuth() now falls back to `security find-generic-password -s "Claude Code-credentials" -w` on darwin. Verified end-to-end: a real classifyStop() call authenticates via the keychain token and returns a verdict from the live Anthropic API. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Document that the plugin is a Reflexion-style actor/evaluator/verbal-self- reflection loop (not ReAct/CoH/AD), with a one-to-one mapping: the loop heuristics (PLANNING_LOOP, ACTION_LOOP) mirror Reflexion's inefficient/ hallucinated-trajectory detectors and MAX_ATTEMPTS=3 mirrors its bounded reflection memory. Notes where it differs (fires on stop/idle to catch premature stops; LLM-as-judge rubric mined from 227 real stops). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Cut CI judge-eval cost ~25x by switching the provider from gpt-5.1 to the cheapest deployed dev-endpoint model, gpt-5.4-nano. Benchmarked all 34 cases: gpt-5.1 34/34, gpt-5.4 / gpt-5.4-mini / gpt-5.4-nano all 33/34 — the single miss is calibration variance on one borderline case shared by the whole 5.4 family, not a premature-stop-logic failure. run-promptfoo.mjs now honors EVAL_PASS_THRESHOLD: when set, a run that promptfoo failed is re-checked against the suite pass rate (only ever relaxes a fail, never reddens a pass; falls back to native exit on any parse error) and prints the tolerated cases. CI judge step sets 0.97 (tolerate <=1 of 34); a 2nd failure turns CI red. gpt-5.1 remains a one-line swap for full fidelity. Production judge still blocks small models via JUDGE_BLOCKED_PATTERNS — this is CI-eval only. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
See commits. Adds Claude Code /plugin install (marketplace.json + fixed Stop-hook contract), OpenCode plugin-section install (packages/reflection -> opencode-reflection), macOS keychain OAuth fix for the in-hook judge, ~25x cheaper CI judge eval (gpt-5.4-nano + EVAL_PASS_THRESHOLD=0.97), and a README section mapping the plugin to Reflexion (Weng 2023). Verified locally: live claude -p Stop-hook E2E re-prompted via keychain-authed judge; typecheck clean; npm pack 3 files; CI-equivalent judge run green at 33/34 via threshold.