dzianisv · Jun 1, 2026
diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json
@@ -0,0 +1,19 @@
+{
+  "name": "opencode-plugins",
+  "owner": {
+    "name": "dzianisv",
+    "url": "https://github.com/dzianisv"
+  },
+  "plugins": [
+    {
+      "name": "reflection-cc",
+      "source": "./claude",
+      "description": "Re-prompts Claude Code when it stops prematurely — catches PERMISSION-SEEKING, STOPPED-WITH-TODOS, and FALSE-COMPLETE failure modes (78% of real agent stops are premature), and injects targeted recovery instructions via the Stop hook.",
+      "version": "0.1.0",
+      "author": {
+        "name": "dzianisv",
+        "url": "https://github.com/dzianisv"
+      }
+    }
+  ]
+}
diff --git a/.github/workflows/evals.yml b/.github/workflows/evals.yml
@@ -75,6 +75,10 @@ jobs:
           # Must be the base host, e.g. https://vibebrowser-dev.openai.azure.com
           AZURE_OPENAI_BASE_URL: ${{ secrets.AZURE_OPENAI_BASE_URL }}
           AZURE_OPENAI_API_BASE_URL: ${{ secrets.AZURE_OPENAI_BASE_URL }}
+          # Judge suite runs the cheap gpt-5.4-nano model (see promptfooconfig.yaml).
+          # It scores 33/34 — one borderline case differs from the gpt-5.1 baseline.
+          # Tolerate <=1 of 34 (>=97%); a 2nd failure turns CI red.
+          EVAL_PASS_THRESHOLD: "0.97"
         run: npm run eval:judge -- --no-progress-bar -o evals/results/judge-results.json
 
       - name: Run Stuck Detection Evaluation
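
The threshold arithmetic in the comment above can be checked with a minimal sketch. It assumes the eval runner gates on `passed / total >= EVAL_PASS_THRESHOLD`; that comparison is not shown in this diff, and `meetsThreshold` is a hypothetical helper, not a function from the repo.

```javascript
// Hypothetical helper. ASSUMPTION: the gate is `passed / total >= threshold`.
function meetsThreshold(passed, total, threshold) {
  return passed / total >= threshold;
}

// EVAL_PASS_THRESHOLD reaches the process as a string, so it is parsed first.
const threshold = Number("0.97");

console.log(meetsThreshold(33, 34, threshold)); // true  (33/34 is about 0.9706)
console.log(meetsThreshold(32, 34, threshold)); // false (32/34 is about 0.9412)
```

Under that assumption, 0.97 is the tightest two-decimal threshold that still admits a single failure out of 34.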

diff --git a/.gitignore b/.gitignore
@@ -3,6 +3,8 @@
 !.agents/**
 !.github
 !.github/**
+!.claude-plugin
+!.claude-plugin/**
 !claude/.claude-plugin
 !claude/.claude-plugin/**
 .tts

diff --git a/README.md b/README.md
@@ -57,7 +57,33 @@ This plugin adds a **judge layer** that automatically evaluates task completion
 - **Local TTS** - Hear responses read aloud (Coqui VCTK/VITS, Chatterbox, macOS)
 - **Voice-to-text** - Reply to Telegram with voice messages, transcribed by local Whisper
 
-## Quick Install
+## Install via opencode.json (preferred)
+
+Add the reflection plugin to the `plugin` array in your `opencode.json` (project-level or `~/.config/opencode/opencode.json` for global):
+
+**Published npm package:**
+```json
+{
+  "$schema": "https://opencode.ai/config.json",
+  "plugin": ["opencode-reflection"]
+}
+```
+
+**Local path (from a clone of this repo):**
+```json
+{
+  "$schema": "https://opencode.ai/config.json",
+  "plugin": ["/absolute/path/to/opencode-plugins/packages/reflection"]
+}
+```
+
+OpenCode resolves the entry point from `package.json` `exports`, imports the default export (a `Plugin` function), and calls it automatically at startup. No manual file copying or `bun install` required — OpenCode handles dependency installation.
+
+Restart OpenCode after editing `opencode.json` to activate.
+
+---
+
+## Quick Install (copy-script method)
 
 ```bash
 curl -fsSL https://raw.githubusercontent.com/dzianisv/opencode-plugins/main/install.sh | bash
@@ -155,6 +181,40 @@ Evaluates task completion after each agent response and provides feedback if wor
 4. **Verdict**: PASS → toast notification | FAIL → feedback injected into chat
 5. **Continuation**: Agent receives feedback and continues working
 
+### Relation to Reflexion (Weng 2023 / Shinn et al. 2023)
+
+This plugin is, in the taxonomy of Lilian Weng's [*LLM Powered Autonomous Agents*](https://lilianweng.github.io/posts/2023-06-23-agent/),
+a **Reflexion**-style self-improvement loop — not ReAct, Chain-of-Hindsight, or
+Algorithm Distillation. The mapping is almost one-to-one:
+
+| Reflexion concept (Weng / Shinn et al.) | This plugin |
+| --- | --- |
+| **Actor** — the policy LLM that acts | The coding agent (OpenCode / Claude Code) itself |
+| **Evaluator** — scores the trajectory | The LLM-as-judge self-assessment (`buildSelfAssessmentPrompt` / `classifyStop`), run in an unbiased hidden session |
+| **Self-reflection** — verbal feedback added to memory for the next attempt | The feedback string injected back into the chat / the Stop-hook `block` reason — natural-language, not a scalar reward |
+| **Heuristic: "inefficient" trajectory (too long without success)** | `PLANNING_LOOP` detector — many tool calls with a near-zero write ratio (`PLANNING_LOOP_MIN_TOOL_CALLS`, `PLANNING_LOOP_WRITE_RATIO_THRESHOLD`) |
+| **Heuristic: "hallucination" = consecutive identical actions → same observation** | `ACTION_LOOP` detector — repeated identical commands above `ACTION_LOOP_REPETITION_THRESHOLD` |
+| **"Up to three reflections stored in working memory"** | `MAX_ATTEMPTS = 3` — at most three feedback injections per task before giving up |
+| **Reset the environment to start a new trial** | Re-prompt the *same* session to continue (no env reset — agentic coding has no episodic reset) |
+
+**Where it differs from textbook Reflexion:**
+
+- **Trigger granularity.** Classic Reflexion evaluates at the end of an episode
+  / on a failed trajectory. This plugin fires on the `session.idle` (OpenCode) or
+  `Stop` (Claude Code) boundary — i.e. *every time the agent thinks it's done* —
+  so its primary job is catching **premature stops**, not just failed runs.
+- **Evaluator design.** Reflexion's evaluator is a task-specific heuristic (and
+  sometimes an LLM). Here the evaluator is primarily an **LLM-as-judge** whose
+  rubric is **mined from 227 real agent stops** (78% were premature), layered on
+  top of the two Reflexion-style heuristics above.
+- **Verbal, not numeric.** Like Reflexion (and unlike RLHF/CoH), the feedback is
+  natural language fed straight back into context — no fine-tuning, no reward
+  model, no gradient updates.
+
+In short: **Reflexion = actor + evaluator + verbal self-reflection with a small
+bounded memory of retries**, and that is exactly the shape of this plugin, with
+the evaluator specialized toward detecting premature task abandonment.
+
 ### State Graph
 
 ```

diff --git a/claude/.claude-plugin/plugin.json b/claude/.claude-plugin/plugin.json
@@ -1,13 +1,13 @@
 {
   "name": "reflection-cc",
+  "displayName": "Reflection (Claude Code)",
   "version": "0.1.0",
-  "description": "Re-prompts Claude Code when it stops prematurely due to failure modes like summary-drift-stop or tool-available-punt",
-  "author": "dzianisv",
+  "description": "Re-prompts Claude Code when it stops prematurely — catches PERMISSION-SEEKING, STOPPED-WITH-TODOS, and FALSE-COMPLETE failure modes, and injects targeted recovery instructions.",
+  "author": {
+    "name": "dzianisv",
+    "url": "https://github.com/dzianisv"
+  },
+  "repository": "https://github.com/dzianisv/opencode-plugins",
   "license": "MIT",
-  "hooks": {
-    "stop": {
-      "command": "${CLAUDE_PLUGIN_ROOT}/bin/reflect.mjs",
-      "timeout": 30000
-    }
-  }
+  "hooks": "./hooks/hooks.json"
 }
diff --git a/claude/README.md b/claude/README.md
@@ -4,7 +4,28 @@ Re-prompts Claude Code when it stops prematurely due to failure modes like summa
 
 ## Install
 
-**Recommended (works today, CC v2.x):** add the Stop hook directly to `~/.claude/settings.json`:
+### Via `/plugin` marketplace (recommended)
+
+```bash
+# 1. Register the marketplace (one-time per machine)
+/plugin marketplace add dzianisv/opencode-plugins
+
+# 2. Install the plugin
+/plugin install reflection-cc
+```
+
+Or run the same two steps from a shell with the `claude` CLI:
+
+```bash
+claude plugin marketplace add dzianisv/opencode-plugins
+claude plugin install reflection-cc
+```
+
+This uses `.claude-plugin/marketplace.json` at the repo root, which points to the `./claude` subdirectory as the plugin source.
+
+### Manual (settings-based install — always works)
+
+Add the Stop hook directly to `~/.claude/settings.json`:
 
 ```json
 {
@@ -24,9 +45,9 @@ Re-prompts Claude Code when it stops prematurely due to failure modes like summa
 }
 ```
 
-The plugin manifest under `.claude-plugin/` is included for future marketplace publication, but in CC v2.1.150 `--plugin-dir` and the `enabledPlugins` config path do NOT activate `Stop` hooks for headless `-p` sessions. The settings-based install above is the authoritative path until that gap closes.
+**One-session try:** write the JSON above to a file and pass `--settings ./reflect-settings.json`.
 
-**One-session try:** `claude --settings '<json above>'` ... or write the JSON to a file and pass `--settings ./reflect-settings.json`.
+> Note: the Stop hook event name is `"Stop"` (capital S) — lowercase `"stop"` is silently ignored by Claude Code.
 
 ## Failure Categories
 

diff --git a/claude/bin/reflect.mjs b/claude/bin/reflect.mjs
@@ -357,9 +357,10 @@ export function buildStopContext(stopPayload, transcriptTail) {
     }
   }
 
-  // Derive final assistant text: prefer CC's `response` field (it IS the last turn),
-  // fall back to the last assistant entry's text content from the tail.
-  let final_assistant_text = (stopPayload?.response ?? '').trim();
+  // Derive final assistant text: prefer CC's `last_assistant_message` field (the
+  // documented Stop hook field name as of CC v2.x), then the older `response`
+  // field, then the last assistant entry's text content from the transcript tail.
+  let final_assistant_text = (stopPayload?.last_assistant_message ?? stopPayload?.response ?? '').trim();
   if (!final_assistant_text) {
     // Walk tail in reverse, find last assistant entry with a text block
     for (let i = transcriptTail.length - 1; i >= 0; i--) {
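
The lookup order this hunk introduces can be sketched as a standalone function. `deriveFinalAssistantText` is a hypothetical name (the real logic sits inside `buildStopContext`), and the transcript entry shape used here is an assumption, since the diff does not show it.

```javascript
// Hypothetical standalone version of the lookup in buildStopContext.
function deriveFinalAssistantText(stopPayload, transcriptTail) {
  // 1. Prefer `last_assistant_message`, then the older `response` field.
  const direct = (stopPayload?.last_assistant_message ?? stopPayload?.response ?? '').trim();
  if (direct) return direct;

  // 2. Otherwise walk the tail in reverse for the last assistant text block.
  // ASSUMPTION: entries look like { type, message: { content: [{ type, text }] } }.
  for (let i = transcriptTail.length - 1; i >= 0; i--) {
    const entry = transcriptTail[i];
    if (entry?.type !== 'assistant') continue;
    const blocks = Array.isArray(entry.message?.content) ? entry.message.content : [];
    const textBlock = blocks.find(
      (b) => b?.type === 'text' && typeof b.text === 'string' && b.text.trim()
    );
    if (textBlock) return textBlock.text.trim();
  }
  return '';
}

const tail = [
  { type: 'assistant', message: { content: [{ type: 'text', text: 'Done. ' }] } },
  { type: 'user', message: { content: [{ type: 'text', text: 'ok' }] } },
];

console.log(deriveFinalAssistantText({ last_assistant_message: ' All tests pass. ' }, tail)); // "All tests pass."
console.log(deriveFinalAssistantText({ response: 'legacy' }, tail)); // "legacy"
console.log(deriveFinalAssistantText({}, tail)); // "Done."
```

One consequence of using `??`: an empty-string `last_assistant_message` does not fall through to `response`, because `??` only skips `null` and `undefined`. In that case the function goes straight to the transcript fallback.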

diff --git a/claude/hooks/hooks.json b/claude/hooks/hooks.json
@@ -1,8 +1,15 @@
 {
   "hooks": {
-    "stop": {
-      "command": "${CLAUDE_PLUGIN_ROOT}/bin/reflect.mjs",
-      "timeout": 30000
-    }
+    "Stop": [
+      {
+        "hooks": [
+          {
+            "type": "command",
+            "command": "${CLAUDE_PLUGIN_ROOT}/bin/reflect.mjs",
+            "timeout": 30
+          }
+        ]
+      }
+    ]
   }
 }
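
The two fixes in this hunk (the capitalised `Stop` key with its nested `hooks` array, and the timeout going from `30000` to `30`) could be guarded by a small lint. The sketch below is not part of this PR: `lintHooksConfig` is a hypothetical helper, it assumes the timeout is expressed in seconds (which the change to `30` implies), and the 600-second ceiling is an arbitrary sanity bound.

```javascript
// Hypothetical lint for a parsed hooks.json object; not part of this PR.
function lintHooksConfig(config) {
  const problems = [];
  const events = config?.hooks ?? {};
  for (const [event, groups] of Object.entries(events)) {
    // The README change in this PR notes that lowercase "stop" is silently ignored.
    if (event === 'stop') problems.push('event key "stop" should be "Stop"');
    if (!Array.isArray(groups)) {
      problems.push(`"${event}" should be an array of hook groups`);
      continue;
    }
    for (const group of groups) {
      for (const hook of group?.hooks ?? []) {
        if (!hook.type) problems.push(`"${event}" hook has no "type"`);
        // ASSUMPTION: timeout is in seconds; 600 is an arbitrary ceiling.
        if (typeof hook.timeout === 'number' && hook.timeout > 600) {
          problems.push(`"${event}" timeout ${hook.timeout} looks like milliseconds`);
        }
      }
    }
  }
  return problems;
}

// The old and new shapes from this diff, inlined as objects.
const before = {
  hooks: { stop: { command: '${CLAUDE_PLUGIN_ROOT}/bin/reflect.mjs', timeout: 30000 } },
};
const after = {
  hooks: {
    Stop: [
      { hooks: [{ type: 'command', command: '${CLAUDE_PLUGIN_ROOT}/bin/reflect.mjs', timeout: 30 }] },
    ],
  },
};

console.log(lintHooksConfig(before).length); // 2
console.log(lintHooksConfig(after).length);  // 0
```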