This repository was archived by the owner on Jun 8, 2026. It is now read-only.
Merged
19 changes: 19 additions & 0 deletions .claude-plugin/marketplace.json
@@ -0,0 +1,19 @@
{
"name": "opencode-plugins",
"owner": {
"name": "dzianisv",
"url": "https://github.com/dzianisv"
},
"plugins": [
{
"name": "reflection-cc",
"source": "./claude",
"description": "Re-prompts Claude Code when it stops prematurely — catches PERMISSION-SEEKING, STOPPED-WITH-TODOS, and FALSE-COMPLETE failure modes (78% of real agent stops are premature), and injects targeted recovery instructions via the Stop hook.",
"version": "0.1.0",
"author": {
"name": "dzianisv",
"url": "https://github.com/dzianisv"
}
}
]
}
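According to the `claude/README.md` change in this PR, Claude Code reads this marketplace file and uses `./claude` as the plugin source. The following is a minimal sketch of that lookup. It assumes relative `source` paths resolve against the repository root. `resolvePluginSource` is a hypothetical helper for illustration and is not part of Claude Code's plugin loader.

```javascript
// Hypothetical helper: find a plugin entry in a marketplace manifest and
// resolve its `source` against the repository root (POSIX-style paths).
function resolvePluginSource(manifest, pluginName, repoRoot) {
  if (!manifest || !Array.isArray(manifest.plugins)) {
    throw new Error('manifest must have a "plugins" array');
  }
  const entry = manifest.plugins.find((p) => p.name === pluginName);
  if (!entry) {
    throw new Error(`plugin "${pluginName}" is not listed in marketplace "${manifest.name}"`);
  }
  // This sketch only handles relative sources such as "./claude".
  if (typeof entry.source !== 'string' || !entry.source.startsWith('./')) {
    throw new Error(`unsupported source for "${pluginName}": ${JSON.stringify(entry.source)}`);
  }
  const root = repoRoot.replace(/\/+$/, ''); // drop trailing slashes
  return `${root}/${entry.source.slice(2)}`;
}

// Trimmed copy of the manifest above.
const marketplace = {
  name: 'opencode-plugins',
  owner: { name: 'dzianisv', url: 'https://github.com/dzianisv' },
  plugins: [{ name: 'reflection-cc', source: './claude', version: '0.1.0' }],
};

console.log(resolvePluginSource(marketplace, 'reflection-cc', '/src/opencode-plugins/'));
// prints "/src/opencode-plugins/claude"
```

The lookup is deliberately strict. It throws on a missing entry or a non-relative source, so a mistyped plugin name surfaces immediately instead of resolving to nothing.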
4 changes: 4 additions & 0 deletions .github/workflows/evals.yml
@@ -75,6 +75,10 @@ jobs:
# Must be the base host, e.g. https://vibebrowser-dev.openai.azure.com
AZURE_OPENAI_BASE_URL: ${{ secrets.AZURE_OPENAI_BASE_URL }}
AZURE_OPENAI_API_BASE_URL: ${{ secrets.AZURE_OPENAI_BASE_URL }}
# Judge suite runs the cheap gpt-5.4-nano model (see promptfooconfig.yaml).
# It scores 33/34 — one borderline case differs from the gpt-5.1 baseline.
# Tolerate <=1 of 34 (>=97%); a 2nd failure turns CI red.
EVAL_PASS_THRESHOLD: "0.97"
run: npm run eval:judge -- --no-progress-bar -o evals/results/judge-results.json

- name: Run Stuck Detection Evaluation
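The threshold comment claims that 33/34 passes and that a second failure turns CI red. A small sketch can check that arithmetic. It assumes the pass rate is compared with `>=`, as the comment's "(>=97%)" implies. This diff does not show how the eval script actually reads `EVAL_PASS_THRESHOLD`.

```javascript
// Sketch of the CI gate described in evals.yml, assuming the pass rate is
// compared against EVAL_PASS_THRESHOLD with ">=".
function meetsThreshold(passed, total, threshold) {
  if (!Number.isInteger(total) || total <= 0) throw new Error('total must be a positive integer');
  return passed / total >= threshold;
}

// Smallest number of passing cases that still clears the gate.
function minPassing(total, threshold) {
  return Math.ceil(total * threshold);
}

const threshold = Number('0.97'); // the workflow sets the env var as a string
console.log(meetsThreshold(33, 34, threshold)); // true  (33/34 is about 0.9706)
console.log(meetsThreshold(32, 34, threshold)); // false (32/34 is about 0.9412)
console.log(minPassing(34, threshold));         // 33, i.e. at most 1 failure of 34
```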
2 changes: 2 additions & 0 deletions .gitignore
@@ -3,6 +3,8 @@
!.agents/**
!.github
!.github/**
!.claude-plugin
!.claude-plugin/**
!claude/.claude-plugin
!claude/.claude-plugin/**
.tts
62 changes: 61 additions & 1 deletion README.md
@@ -57,7 +57,33 @@ This plugin adds a **judge layer** that automatically evaluates task completion
- **Local TTS** - Hear responses read aloud (Coqui VCTK/VITS, Chatterbox, macOS)
- **Voice-to-text** - Reply to Telegram with voice messages, transcribed by local Whisper

## Quick Install
## Install via opencode.json (preferred)

Add the reflection plugin to the `plugin` array in your `opencode.json` (project-level or `~/.config/opencode/opencode.json` for global):

**Published npm package:**
```json
{
"$schema": "https://opencode.ai/config.json",
"plugin": ["opencode-reflection"]
}
```

**Local path (from a clone of this repo):**
```json
{
"$schema": "https://opencode.ai/config.json",
"plugin": ["/absolute/path/to/opencode-plugins/packages/reflection"]
}
```

OpenCode resolves the entry point from `package.json` `exports`, imports the default export (a `Plugin` function), and calls it automatically at startup. No manual file copying or `bun install` required — OpenCode handles dependency installation.

Restart OpenCode after editing `opencode.json` to activate.
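The paragraph above says OpenCode imports the package's default export, a `Plugin` function, and calls it at startup. The following is a minimal sketch of that shape. It assumes the hook-object style from OpenCode's plugin documentation: an async function that returns handlers, where the `event` handler receives `{ event }`. `makeReflectionSketch` and `onIdle` are hypothetical and stand in for the real judge logic.

```javascript
// Hypothetical factory: `onIdle` stands in for the reflection plugin's judge call.
// In a real plugin package, the returned async function would be the default export.
function makeReflectionSketch(onIdle) {
  // Plugin shape: async (context) => hooks object.
  return async (_ctx) => ({
    // OpenCode dispatches bus events here; `session.idle` marks the agent stopping.
    event: async ({ event }) => {
      if (event.type === 'session.idle') {
        await onIdle(event);
      }
    },
  });
}

// Usage: simulate OpenCode loading the plugin and dispatching two events.
const seen = [];
const plugin = makeReflectionSketch(async (event) => { seen.push(event.type); });
plugin({ directory: '/tmp/project' }).then(async (hooks) => {
  await hooks.event({ event: { type: 'message.updated' } }); // ignored
  await hooks.event({ event: { type: 'session.idle' } });    // triggers onIdle
  console.log(seen); // [ 'session.idle' ]
});
```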

---

## Quick Install (copy-script method)

```bash
curl -fsSL https://raw.githubusercontent.com/dzianisv/opencode-plugins/main/install.sh | bash
@@ -155,6 +181,40 @@ Evaluates task completion after each agent response and provides feedback if wor
4. **Verdict**: PASS → toast notification | FAIL → feedback injected into chat
5. **Continuation**: Agent receives feedback and continues working
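On the Claude Code side, the FAIL branch amounts to the Stop hook refusing the stop. Claude Code's hook protocol lets a Stop hook print JSON with `"decision": "block"` and a `reason`. The `reason` is shown to the agent as the instruction for how to continue, and printing nothing lets the stop proceed. The sketch below maps a verdict to that output. It assumes a hypothetical `{ status, feedback }` verdict object and the cap of three feedback injections (`MAX_ATTEMPTS = 3`).

```javascript
// Sketch: turn a judge verdict into Claude Code Stop-hook stdout.
// `verdict` ({ status: 'PASS' | 'FAIL', feedback }) is a hypothetical shape.
function stopHookOutput(verdict, injectionsSoFar, maxAttempts = 3) {
  if (verdict.status === 'PASS') return '';       // empty output: allow the stop
  if (injectionsSoFar >= maxAttempts) return '';  // retry budget spent: give up
  // "block" keeps the agent working; `reason` is the injected feedback.
  return JSON.stringify({ decision: 'block', reason: verdict.feedback });
}

// In the hook script this string would be written with process.stdout.write().
console.log(stopHookOutput({ status: 'FAIL', feedback: 'You stopped with open TODOs; finish them.' }, 0));
// prints {"decision":"block","reason":"You stopped with open TODOs; finish them."}
```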

### Relation to Reflexion (Weng 2023 / Shinn et al. 2023)

This plugin is, in the taxonomy of Lilian Weng's [*LLM Powered Autonomous Agents*](https://lilianweng.github.io/posts/2023-06-23-agent/),
a **Reflexion**-style self-improvement loop — not ReAct, Chain-of-Hindsight, or
Algorithm Distillation. The mapping is almost one-to-one:

| Reflexion concept (Weng / Shinn et al.) | This plugin |
| --- | --- |
| **Actor** — the policy LLM that acts | The coding agent (OpenCode / Claude Code) itself |
| **Evaluator** — scores the trajectory | The LLM-as-judge self-assessment (`buildSelfAssessmentPrompt` / `classifyStop`), run in an unbiased hidden session |
| **Self-reflection** — verbal feedback added to memory for the next attempt | The feedback string injected back into the chat / the Stop-hook `block` reason — natural-language, not a scalar reward |
| **Heuristic: "inefficient" trajectory (too long without success)** | `PLANNING_LOOP` detector — many tool calls with a near-zero write ratio (`PLANNING_LOOP_MIN_TOOL_CALLS`, `PLANNING_LOOP_WRITE_RATIO_THRESHOLD`) |
| **Heuristic: "hallucination" = consecutive identical actions → same observation** | `ACTION_LOOP` detector — repeated identical commands above `ACTION_LOOP_REPETITION_THRESHOLD` |
| **"Up to three reflections stored in working memory"** | `MAX_ATTEMPTS = 3` — at most three feedback injections per task before giving up |
| **Reset the environment to start a new trial** | Re-prompt the *same* session to continue (no env reset — agentic coding has no episodic reset) |
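The two heuristic rows in the table can be sketched as plain predicates. The threshold values below are illustrative placeholders, because this diff does not show the plugin's real `PLANNING_LOOP_*` and `ACTION_LOOP_*` values. Which tools count as "writes" is also an assumption. `isActionLoop` counts *consecutive* identical commands, following Reflexion's definition. The actual detector may count repetitions differently.

```javascript
// Illustrative thresholds only (not the plugin's actual values).
const PLANNING_LOOP_MIN_TOOL_CALLS = 20;
const PLANNING_LOOP_WRITE_RATIO_THRESHOLD = 0.1;
const ACTION_LOOP_REPETITION_THRESHOLD = 3;

// Tool names treated as file-changing "writes" in this sketch (assumed).
const WRITE_TOOLS = new Set(['edit', 'write', 'patch']);

// PLANNING_LOOP: many tool calls, almost none of which change files.
function isPlanningLoop(toolCalls) {
  if (toolCalls.length < PLANNING_LOOP_MIN_TOOL_CALLS) return false;
  const writes = toolCalls.filter((c) => WRITE_TOOLS.has(c.tool)).length;
  return writes / toolCalls.length < PLANNING_LOOP_WRITE_RATIO_THRESHOLD;
}

// ACTION_LOOP: the same command issued back-to-back more times than allowed.
function isActionLoop(commands) {
  let run = 1;
  for (let i = 1; i < commands.length; i++) {
    run = commands[i] === commands[i - 1] ? run + 1 : 1;
    if (run > ACTION_LOOP_REPETITION_THRESHOLD) return true;
  }
  return false;
}

const reads = Array.from({ length: 25 }, () => ({ tool: 'read' }));
console.log(isPlanningLoop(reads));                                          // true
console.log(isActionLoop(['npm test', 'npm test', 'npm test', 'npm test'])); // true
console.log(isActionLoop(['npm test', 'npm test', 'npm test']));             // false
```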

**Where it differs from textbook Reflexion:**

- **Trigger granularity.** Classic Reflexion evaluates at the end of an episode
/ on a failed trajectory. This plugin fires on the `session.idle` (OpenCode) or
`Stop` (Claude Code) boundary — i.e. *every time the agent thinks it's done* —
so its primary job is catching **premature stops**, not just failed runs.
- **Evaluator design.** Reflexion's evaluator is a task-specific heuristic (and
sometimes an LLM). Here the evaluator is primarily an **LLM-as-judge** whose
rubric is **mined from 227 real agent stops** (78% were premature), layered on
top of the two Reflexion-style heuristics above.
- **Verbal, not numeric.** Like Reflexion (and unlike RLHF/CoH), the feedback is
natural language fed straight back into context — no fine-tuning, no reward
model, no gradient updates.

In short: **Reflexion = actor + evaluator + verbal self-reflection with a small
bounded memory of retries**, and that is exactly the shape of this plugin, with
the evaluator specialized toward detecting premature task abandonment.
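The "actor + evaluator + verbal self-reflection with a bounded memory" shape can be written as a short loop. In this sketch, `actor` and `evaluator` are hypothetical synchronous stand-ins for the coding agent and the LLM judge. The real plugin runs asynchronously across hook invocations rather than in one loop.

```javascript
// Reflexion shape: at most MAX_ATTEMPTS feedback injections per task.
const MAX_ATTEMPTS = 3;

function reflexionLoop(actor, evaluator, task) {
  const reflections = []; // verbal feedback carried into later attempts
  let output = actor(task, reflections);
  for (let injected = 0; ; injected++) {
    const verdict = evaluator(task, output);
    if (verdict.pass) return { status: 'PASS', output, injections: injected };
    if (injected >= MAX_ATTEMPTS) return { status: 'GAVE_UP', output, injections: injected };
    reflections.push(verdict.feedback); // natural language, not a scalar reward
    output = actor(task, reflections);  // same session continues; no environment reset
  }
}

// Usage: an actor that only finishes after two rounds of feedback.
const actor = (task, reflections) => (reflections.length >= 2 ? 'tests pass' : 'stopped early');
const evaluator = (task, out) =>
  out === 'tests pass' ? { pass: true } : { pass: false, feedback: 'Run the tests and finish the remaining TODOs.' };
console.log(reflexionLoop(actor, evaluator, 'fix bug'));
// { status: 'PASS', output: 'tests pass', injections: 2 }
```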

### State Graph

```
16 changes: 8 additions & 8 deletions claude/.claude-plugin/plugin.json
@@ -1,13 +1,13 @@
{
"name": "reflection-cc",
"displayName": "Reflection (Claude Code)",
"version": "0.1.0",
"description": "Re-prompts Claude Code when it stops prematurely due to failure modes like summary-drift-stop or tool-available-punt",
"author": "dzianisv",
"description": "Re-prompts Claude Code when it stops prematurely — catches PERMISSION-SEEKING, STOPPED-WITH-TODOS, and FALSE-COMPLETE failure modes, and injects targeted recovery instructions.",
"author": {
"name": "dzianisv",
"url": "https://github.com/dzianisv"
},
"repository": "https://github.com/dzianisv/opencode-plugins",
"license": "MIT",
"hooks": {
"stop": {
"command": "${CLAUDE_PLUGIN_ROOT}/bin/reflect.mjs",
"timeout": 30000
}
}
"hooks": "./hooks/hooks.json"
}
27 changes: 24 additions & 3 deletions claude/README.md
@@ -4,7 +4,28 @@ Re-prompts Claude Code when it stops prematurely due to failure modes like summa

## Install

**Recommended (works today, CC v2.x):** add the Stop hook directly to `~/.claude/settings.json`:
### Via `/plugin` marketplace (recommended)

```bash
# 1. Register the marketplace (one-time per machine)
/plugin marketplace add dzianisv/opencode-plugins

# 2. Install the plugin
/plugin install reflection-cc
```

Or in one step using the CLI:

```bash
claude plugin marketplace add dzianisv/opencode-plugins
claude plugin install reflection-cc
```

This uses the `marketplace.json` at the repo root (`.claude-plugin/marketplace.json`), which points to the `./claude` subdirectory as the plugin source.

### Manual (settings-based install — always works)

Add the Stop hook directly to `~/.claude/settings.json`:

```json
{
@@ -24,9 +45,9 @@ Re-prompts Claude Code when it stops prematurely due to failure modes like summa
}
```

The plugin manifest under `.claude-plugin/` is included for future marketplace publication, but in CC v2.1.150 `--plugin-dir` and the `enabledPlugins` config path do NOT activate `Stop` hooks for headless `-p` sessions. The settings-based install above is the authoritative path until that gap closes.
**One-session try:** write the JSON above to a file and pass `--settings ./reflect-settings.json`.

**One-session try:** `claude --settings '<json above>'` ... or write the JSON to a file and pass `--settings ./reflect-settings.json`.
> Note: the Stop hook event name is `"Stop"` (capital S) — lowercase `"stop"` is silently ignored by Claude Code.
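Because a lowercase key fails silently, a quick check of a settings object can catch it before a session runs. The following is a hypothetical lint sketch. The event list contains Claude Code hook event names known at the time of writing and may be incomplete.

```javascript
// Hypothetical lint: hook event keys are case-sensitive ("stop" is not "Stop").
// This list may be incomplete; extend it if Claude Code adds events.
const KNOWN_HOOK_EVENTS = [
  'PreToolUse', 'PostToolUse', 'Notification', 'UserPromptSubmit',
  'Stop', 'SubagentStop', 'PreCompact', 'SessionStart', 'SessionEnd',
];

function findMisnamedHookEvents(settings) {
  const problems = [];
  for (const key of Object.keys((settings && settings.hooks) || {})) {
    if (KNOWN_HOOK_EVENTS.includes(key)) continue; // exact match: fine
    const match = KNOWN_HOOK_EVENTS.find((e) => e.toLowerCase() === key.toLowerCase());
    problems.push(match ? `"${key}" should be "${match}"` : `unknown hook event "${key}"`);
  }
  return problems;
}

console.log(findMisnamedHookEvents({ hooks: { stop: [] } }));
// [ '"stop" should be "Stop"' ]
```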

## Failure Categories

7 changes: 4 additions & 3 deletions claude/bin/reflect.mjs
@@ -357,9 +357,10 @@ export function buildStopContext(stopPayload, transcriptTail) {
}
}

// Derive final assistant text: prefer CC's `response` field (it IS the last turn),
// fall back to the last assistant entry's text content from the tail.
let final_assistant_text = (stopPayload?.response ?? '').trim();
// Derive final assistant text: prefer CC's `last_assistant_message` field (the
// documented Stop hook field name as of CC v2.x — NOT `response`), fall back
// to the last assistant entry's text content from the transcript tail.
let final_assistant_text = (stopPayload?.last_assistant_message ?? stopPayload?.response ?? '').trim();
if (!final_assistant_text) {
// Walk tail in reverse, find last assistant entry with a text block
for (let i = transcriptTail.length - 1; i >= 0; i--) {
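The hunk above truncates the transcript walk, so the following is a self-contained reconstruction of the whole fallback chain for illustration, not the file's actual code. The transcript entry shape (`{ type: 'assistant', message: { content: [...] } }`, with `content` either a string or an array of `{ type: 'text', text }` blocks) is an assumption about Claude Code's JSONL transcript. This diff does not show that shape.

```javascript
// Sketch of the final-assistant-text fallback chain (reconstruction, not the real file).
function deriveFinalAssistantText(stopPayload, transcriptTail) {
  // 1. Documented Stop-hook field first, then the older `response` field.
  const direct = (stopPayload?.last_assistant_message ?? stopPayload?.response ?? '').trim();
  if (direct) return direct;
  // 2. Walk the tail backwards to the last assistant entry that carries text.
  for (let i = transcriptTail.length - 1; i >= 0; i--) {
    const entry = transcriptTail[i];
    if (entry?.type !== 'assistant') continue;
    const content = entry.message?.content;
    const text = typeof content === 'string'
      ? content.trim()
      : (Array.isArray(content) ? content : [])
          .filter((b) => b?.type === 'text' && typeof b.text === 'string')
          .map((b) => b.text)
          .join('\n')
          .trim();
    if (text) return text; // tool-only turns yield '' and are skipped
  }
  return '';
}

const tail = [
  { type: 'assistant', message: { content: [{ type: 'text', text: 'All tests pass.' }] } },
  { type: 'user', message: { content: 'ok' } },
  { type: 'assistant', message: { content: [{ type: 'tool_use', name: 'Bash' }] } },
];
console.log(deriveFinalAssistantText({}, tail)); // All tests pass.
```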
15 changes: 11 additions & 4 deletions claude/hooks/hooks.json
@@ -1,8 +1,15 @@
{
"hooks": {
"stop": {
"command": "${CLAUDE_PLUGIN_ROOT}/bin/reflect.mjs",
"timeout": 30000
}
"Stop": [
{
"hooks": [
{
"type": "command",
"command": "${CLAUDE_PLUGIN_ROOT}/bin/reflect.mjs",
"timeout": 30
}
]
}
]
}
}
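The restructured `hooks.json` nests each capitalized event name over an array of matcher groups, each holding a `hooks` array of `{ type, command, timeout }` entries. It also expresses `timeout` in seconds rather than milliseconds (30000 became 30). The following is a sketch only. A hypothetical `migrateHooks` helper could convert the old flat shape into the new one. Only the `stop` to `Stop` rename is mapped here.

```javascript
// Hypothetical migration from the old flat shape ({ stop: { command, timeout: <ms> } })
// to the nested shape used in the new hooks.json.
const CANONICAL_EVENTS = { stop: 'Stop' }; // extend for other events as needed

function migrateHooks(oldConfig) {
  const out = { hooks: {} };
  for (const [key, spec] of Object.entries(oldConfig.hooks || {})) {
    const event = CANONICAL_EVENTS[key.toLowerCase()] || key;
    const hook = { type: 'command', command: spec.command };
    // Old timeouts were milliseconds; the new schema uses seconds.
    if (typeof spec.timeout === 'number') hook.timeout = Math.ceil(spec.timeout / 1000);
    if (!out.hooks[event]) out.hooks[event] = [];
    out.hooks[event].push({ hooks: [hook] }); // one matcher group per old entry
  }
  return out;
}

const old = { hooks: { stop: { command: '${CLAUDE_PLUGIN_ROOT}/bin/reflect.mjs', timeout: 30000 } } };
console.log(JSON.stringify(migrateHooks(old), null, 2));
```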