feat(evolver): /evolve proposes skills from session friction#88

Closed
shuff57 wants to merge 3 commits into Doorman11991:master from shuff57:feat/evolver-phase1

shuff57 commented Jun 7, 2026

What

/evolve — a create-mode evolver that turns recurring session friction into skill proposals:

  1. Deterministic friction extraction from saved traces (src/plugins/friction_analyzer.js): near-duplicate prompts (Jaccard clustering with stopword filtering, 3+ occurrences) and consecutive same-tool retry loops. No LLM involved in detection.
  2. LLM judgment routed to the strong tier when configured (falls back to active model): proposes ONE skill as JSON, forgiving-parsed (strict → fenced → abort-with-raw-output).
  3. Quarantined drafts: proposals land in .smallcode/skills/drafts/ which SkillManager never auto-loads. /evolve promote <name> moves a reviewed draft live. /evolve list and /evolve log round it out.
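
The two detectors in step 1 can be sketched roughly as follows. This is a minimal illustration, not the code in src/plugins/friction_analyzer.js: the stopword list, the 0.5 similarity threshold, the greedy clustering strategy, the retry-run length, and every function name are assumptions. Only the 3+ occurrence cutoff comes from the description above.

```javascript
// Hypothetical sketch: names, stopwords, and thresholds are illustrative only.
const STOPWORDS = new Set([
  "a", "an", "the", "please", "another", "new", "to", "for", "me", "of", "in", "and",
]);

// Lowercase, split on non-alphanumerics, and drop stopwords.
function tokenize(prompt) {
  const tokens = prompt.toLowerCase().split(/[^a-z0-9]+/);
  return new Set(tokens.filter((t) => t && !STOPWORDS.has(t)));
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|.
function jaccard(a, b) {
  if (a.size === 0 && b.size === 0) return 0;
  let shared = 0;
  for (const t of a) if (b.has(t)) shared++;
  return shared / (a.size + b.size - shared);
}

// Greedy single-pass clustering: a prompt joins the first cluster whose seed
// is similar enough, otherwise it seeds a new cluster. Only clusters with at
// least minOccurrences members are reported as friction.
function clusterPrompts(prompts, threshold = 0.5, minOccurrences = 3) {
  const clusters = [];
  for (const prompt of prompts) {
    const tokens = tokenize(prompt);
    const home = clusters.find((c) => jaccard(c.seed, tokens) >= threshold);
    if (home) home.members.push(prompt);
    else clusters.push({ seed: tokens, members: [prompt] });
  }
  return clusters
    .filter((c) => c.members.length >= minOccurrences)
    .map((c) => c.members);
}

// Report runs of consecutive calls to the same tool with length >= minRun.
function findRetryLoops(toolCalls, minRun = 3) {
  const loops = [];
  let start = 0;
  for (let i = 1; i <= toolCalls.length; i++) {
    if (i === toolCalls.length || toolCalls[i] !== toolCalls[start]) {
      if (i - start >= minRun) loops.push({ tool: toolCalls[start], count: i - start });
      start = i;
    }
  }
  return loops;
}
```

Both functions are pure and take plain arrays, so they can be unit-tested without a model or a trace file on disk.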

Safety rails

  • Validation gates every write: name format, no frontmatter injection (newline check), trigger rules
  • Name-collision check across live + draft + user-global skill dirs aborts cleanly
  • Structural 1-create-per-run cap: EvolverRun raises on a second write, so the limit is enforced in code rather than by convention
  • Never deletes, never commits; every create appends to .smallcode/evolver-audit.jsonl with source trace IDs
  • Parse failure = nothing written, raw model output shown
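
A rough sketch of how the validation gate and the structural cap could fit together is shown here. It is not the code in src/plugins/evolver.js: the kebab-case name rule, the field names (`name`, `description`, `trigger`), the injected `writeDraft` callback, and the class name `EvolverRunSketch` are all assumptions made for illustration.

```javascript
// Hypothetical sketch: the concrete rules and field names are assumptions.
const NAME_PATTERN = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

// Returns a list of human-readable errors; an empty list means "safe to write".
function validateProposal(proposal, existingNames) {
  const errors = [];
  const name = proposal.name;
  if (typeof name !== "string" || !NAME_PATTERN.test(name)) {
    // Rejects path separators and dots, so "../x" can never reach the writer.
    errors.push("name must be lowercase kebab-case");
  } else if (existingNames.has(name)) {
    errors.push(`name collides with an existing skill: ${name}`);
  }
  // A newline in a frontmatter field would let the model inject extra keys.
  for (const field of ["name", "description", "trigger"]) {
    const value = proposal[field];
    if (typeof value === "string" && /[\r\n]/.test(value)) {
      errors.push(`${field} must be a single line`);
    }
  }
  if (typeof proposal.trigger !== "string" || proposal.trigger.trim() === "") {
    errors.push("trigger must be a non-empty string");
  }
  return errors;
}

// The cap lives in the object itself: a second create() throws.
class EvolverRunSketch {
  constructor(writeDraft) {
    this.writeDraft = writeDraft; // injected so the sketch never touches disk
    this.created = false;
  }

  create(proposal, existingNames) {
    if (this.created) {
      throw new Error("only one create is allowed per run");
    }
    const errors = validateProposal(proposal, existingNames);
    if (errors.length > 0) {
      throw new Error(`invalid proposal: ${errors.join("; ")}`);
    }
    this.writeDraft(proposal);
    this.created = true;
  }
}
```

In this sketch a rejected proposal does not consume the single create; whether the real implementation behaves the same way is not stated in this PR.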

Why mechanics/judgment split

Small models produce noisy judgments. All fuzzy output flows through validate-or-abort before touching disk, and the mechanics module (src/plugins/evolver.js) is LLM-free so it unit-tests deterministically.
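
The strict → fenced → abort parsing order mentioned above might look something like the sketch below. The function names and the shape of the return value are assumptions, not the handler's actual API. The fence marker is built with `repeat` only so the example does not contain a literal code fence.

```javascript
// Hypothetical sketch of forgiving parsing: strict JSON first, then a fenced
// block, then give up and hand the raw text back to the caller.
const FENCE = "`".repeat(3);
const FENCED_BLOCK = new RegExp(FENCE + "(?:json)?\\s*([\\s\\S]*?)" + FENCE);

// Parse text as JSON and accept it only if it is a plain object.
function tryParseObject(text) {
  try {
    const value = JSON.parse(text);
    const isObject = value !== null && typeof value === "object" && !Array.isArray(value);
    return isObject ? value : null;
  } catch (err) {
    return null;
  }
}

function parseProposal(raw) {
  // Stage 1: the whole reply is valid JSON.
  const strict = tryParseObject(raw);
  if (strict) return { ok: true, proposal: strict };

  // Stage 2: the reply wraps JSON in a fenced block, possibly with chatter around it.
  const match = raw.match(FENCED_BLOCK);
  const fenced = match ? tryParseObject(match[1]) : null;
  if (fenced) return { ok: true, proposal: fenced };

  // Stage 3: abort. Nothing is written; the caller shows the raw output.
  return { ok: false, raw };
}
```

Because the failure branch returns the untouched model output, the caller can display it verbatim and skip every write path.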

Tests

21 new tests: cap enforcement, traversal rejection, quarantine pin, promote round-trip, friction extraction incl. a field regression (real session prompts that initially failed to cluster). Full suite green.

Field-verified end-to-end on Ollama (qwen3-coder-next, deepseek-v4-flash): real friction from real sessions produced a sensible, promotable draft.

Stacked on #85 (includes its commits — review 086fa4a + 7095ce3 here). Phase 2 (subagent support) and Phase 3 (evolver proposes agents) are designed and can follow if there is appetite.

🤖 Generated with Claude Code

shuff57 and others added 3 commits June 5, 2026 12:33
Skills following the Claude Code layout (<skill-dir>/<name>/SKILL.md)
or written as plain .md without YAML frontmatter were silently skipped
in the standard skill dirs (.smallcode/skills, ~/.smallcode/skills,
~/.config/smallcode/skills). Both shapes now load; README-style files
(README/CHANGELOG/LICENSE/CONTRIBUTING) are filtered by name.

Fixes Doorman11991#81

Constraint: no warning channel exists in SkillManager, so silent skips had no user-visible signal
Rejected: warn-on-skip only | users following Claude Code conventions expect these layouts to work
Confidence: high
Scope-risk: narrow
Not-tested: fullscreen TUI /skill list rendering (logic shared with classic mode)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Create-mode evolver: deterministic friction extraction from saved
traces (repeated near-duplicate prompts, consecutive tool-retry
loops), LLM judgment routed to the strong tier, and ONE quarantined
skill draft per run written to .smallcode/skills/drafts/.

Drafts never auto-load; /evolve promote <name> moves them live.
Validation gates every write (name format, no frontmatter injection,
trigger rules); name collisions across live+draft+global dirs abort;
every create appends to .smallcode/evolver-audit.jsonl. The per-run
cap is structural — EvolverRun raises on a second create.

Constraint: small models produce noisy judgments, so all fuzzy output passes validate-or-abort before any write
Rejected: plugin delivery | needs TraceRecorder + SkillManager internals unreachable from plugin dirs under binary installs
Confidence: high
Scope-risk: narrow
Directive: keep mechanics LLM-free — judgment stays in the command handler so mechanics remain unit-testable
Not-tested: strong-tier routing with a separately configured SMALLCODE_MODEL_STRONG endpoint

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Field regression: rephrased prompts with filler drift (another/please/new) failed to cluster because stopwords diluted Jaccard below threshold. Real prompts from a live session are now pinned as a test.
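
The dilution effect can be reproduced with a small worked example. The two prompts, the filler list, and the helper names below are invented for illustration; they are not the session prompts pinned in the test.

```javascript
// Illustrative only: prompts and filler words are made up for this example.
const FILLER = new Set(["please", "another", "new", "a", "the", "for", "to", "our"]);

// Split a prompt into a set of lowercase words, optionally dropping stopwords.
function words(prompt, stopwords = new Set()) {
  const parts = prompt.toLowerCase().split(/[^a-z0-9]+/);
  return new Set(parts.filter((w) => w && !stopwords.has(w)));
}

// Jaccard similarity between two word sets.
function similarity(a, b) {
  const shared = [...a].filter((w) => b.has(w)).length;
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

const first = "please add another test for the parser";
const second = "add a new test to our parser";

// Without filtering: 3 shared words out of 11 distinct words.
const unfiltered = similarity(words(first), words(second));
// With filtering: both prompts reduce to {add, test, parser}.
const filtered = similarity(words(first, FILLER), words(second, FILLER));

console.log(unfiltered.toFixed(2), filtered.toFixed(2)); // prints "0.27 1.00"
```

With any threshold between those two values, the pair clusters only once filler words are removed, which matches the failure mode this commit describes.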