feat(evolver): /evolve proposes skills from session friction by shuff57 · Pull Request #88 · Doorman11991/smallcode

shuff57 · 2026-06-07T17:06:48Z

What

/evolve — a create-mode evolver that turns recurring session friction into skill proposals:

Deterministic friction extraction from saved traces (src/plugins/friction_analyzer.js): near-duplicate prompts (Jaccard clustering with stopword filtering, 3+ occurrences) and consecutive same-tool retry loops. No LLM involved in detection.
LLM judgment routed to the strong tier when configured (falls back to active model): proposes ONE skill as JSON, forgiving-parsed (strict → fenced → abort-with-raw-output).
Quarantined drafts: proposals land in .smallcode/skills/drafts/ which SkillManager never auto-loads. /evolve promote <name> moves a reviewed draft live. /evolve list and /evolve log round it out.
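
For reviewers, the strict → fenced → abort-with-raw-output order can be sketched roughly as below. This is a minimal illustration, not the code in this PR: `parseProposal`, `tryJson`, and the `{ ok, proposal | raw }` return shape are hypothetical names chosen for the example.

```javascript
// Hypothetical helper; the name and return shape are illustrative, not from the PR.
const FENCE = "`".repeat(3); // built at runtime so this sample nests cleanly in Markdown

function isPlainObject(value) {
  return value !== null && typeof value === "object" && !Array.isArray(value);
}

function tryJson(text) {
  try {
    const value = JSON.parse(text);
    return isPlainObject(value) ? value : null;
  } catch (err) {
    return null; // not valid JSON; the caller decides what to try next
  }
}

function parseProposal(raw) {
  // Stage 1 (strict): the whole reply is a single JSON object.
  const strict = tryJson(raw.trim());
  if (strict) return { ok: true, proposal: strict };

  // Stage 2 (fenced): take the body of the first fenced block, with or without a "json" tag.
  const fenced = raw.match(new RegExp(FENCE + "(?:json)?\\s*\\n([\\s\\S]*?)" + FENCE));
  if (fenced) {
    const inner = tryJson(fenced[1]);
    if (inner) return { ok: true, proposal: inner };
  }

  // Stage 3 (abort): write nothing and hand back the raw output for display.
  return { ok: false, raw };
}
```

The abort branch returns the untouched model output so the caller can show it, which matches the "parse failure = nothing written" rail.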

Safety rails

Validation gates every write: name format, no frontmatter injection (newline check), trigger rules
Name-collision check across live + draft + user-global skill dirs aborts cleanly
Structural 1-create-per-run cap — EvolverRun raises on a second write, not a convention
Never deletes, never commits; every create appends to .smallcode/evolver-audit.jsonl with source trace IDs
Parse failure = nothing written, raw model output shown
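
A rough sketch of how the validation gate and the structural cap could fit together. Assumptions, all labeled in the code: the kebab-case name pattern, the `description` field, and the `OneCreatePerRun` class are hypothetical stand-ins; the PR only states that `EvolverRun` raises on a second write and that names and newlines are checked.

```javascript
// ASSUMPTION: lowercase kebab-case names. The PR says "name format" without giving the pattern.
const NAME_RE = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

function validateDraft(draft) {
  // Rejects path traversal too, because "." and "/" are not in the allowed alphabet.
  if (typeof draft.name !== "string" || !NAME_RE.test(draft.name)) {
    throw new Error("invalid skill name: " + JSON.stringify(draft.name));
  }
  // A newline in a frontmatter value could smuggle in extra keys, so refuse it outright.
  if (typeof draft.description !== "string" || /[\r\n]/.test(draft.description)) {
    throw new Error("description must be a single-line string");
  }
}

// Hypothetical stand-in for EvolverRun: the cap lives in the object, not in caller discipline.
class OneCreatePerRun {
  constructor(writeFn) {
    this.writeFn = writeFn; // injected so the sketch never touches disk
    this.created = false;
  }

  create(draft) {
    validateDraft(draft); // validate-or-abort before any write
    if (this.created) {
      throw new Error("cap exceeded: this run already created a draft");
    }
    this.writeFn(draft);
    this.created = true; // only consumed once the write succeeded
  }
}
```

Injecting `writeFn` keeps the sketch LLM-free and disk-free, in the same spirit as the mechanics module described below.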

Why mechanics/judgment split

Small models produce noisy judgments. All fuzzy output flows through validate-or-abort before touching disk, and the mechanics module (src/plugins/evolver.js) is LLM-free so it unit-tests deterministically.

Tests

21 new tests: cap enforcement, traversal rejection, quarantine pin, promote round-trip, friction extraction incl. a field regression (real session prompts that initially failed to cluster). Full suite green.

Field-verified end-to-end on Ollama (qwen3-coder-next, deepseek-v4-flash): real friction from real sessions produced a sensible, promotable draft.

Stacked on #85 (includes its commits — review 086fa4a + 7095ce3 here). Phase 2 (subagent support) and Phase 3 (evolver proposes agents) are designed and can follow if there is appetite.

🤖 Generated with Claude Code

Skills following the Claude Code layout (<skill-dir>/<name>/SKILL.md) or written as plain .md without YAML frontmatter were silently skipped in the standard skill dirs (.smallcode/skills, ~/.smallcode/skills, ~/.config/smallcode/skills). Both shapes now load; README-style files (README/CHANGELOG/LICENSE/CONTRIBUTING) are filtered by name.

Fixes Doorman11991#81

Constraint: no warning channel exists in SkillManager, so silent skips had no user-visible signal
Rejected: warn-on-skip only | users following Claude Code conventions expect these layouts to work
Confidence: high
Scope-risk: narrow
Not-tested: fullscreen TUI /skill list rendering (logic shared with classic mode)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
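
The two accepted shapes and the name filter can be illustrated with a small pure function over paths relative to one skill dir. This is a sketch under assumptions, not SkillManager's code: `discoverSkillFiles` is a hypothetical name, and the exact-case match on `SKILL.md` and the one-level nesting limit are guesses made for the example.

```javascript
// README-style stems that are filtered by name (list taken from the commit message).
const SKIPPED_STEMS = new Set(["readme", "changelog", "license", "contributing"]);

// Hypothetical helper: takes "/"-separated paths relative to a skill dir.
function discoverSkillFiles(relativePaths) {
  const found = [];
  for (const rel of relativePaths) {
    const parts = rel.split("/");
    const base = parts[parts.length - 1];

    // Shape 1: <name>/SKILL.md (ASSUMPTION: exactly one directory level, exact case).
    if (parts.length === 2 && base === "SKILL.md") {
      found.push({ name: parts[0], file: rel });
      continue;
    }

    // Shape 2: a plain <name>.md at the top level, frontmatter optional.
    if (parts.length === 1 && base.toLowerCase().endsWith(".md")) {
      const stem = base.slice(0, -3);
      if (SKIPPED_STEMS.has(stem.toLowerCase())) continue; // README.md and friends
      found.push({ name: stem, file: rel });
    }
  }
  return found;
}
```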

Create-mode evolver: deterministic friction extraction from saved traces (repeated near-duplicate prompts, consecutive tool-retry loops), LLM judgment routed to the strong tier, and ONE quarantined skill draft per run written to .smallcode/skills/drafts/. Drafts never auto-load; /evolve promote <name> moves them live.

Validation gates every write (name format, no frontmatter injection, trigger rules); name collisions across live+draft+global dirs abort; every create appends to .smallcode/evolver-audit.jsonl. The per-run cap is structural: EvolverRun raises on a second create.

Constraint: small models produce noisy judgments, so all fuzzy output passes validate-or-abort before any write
Rejected: plugin delivery | needs TraceRecorder + SkillManager internals unreachable from plugin dirs under binary installs
Confidence: high
Scope-risk: narrow
Directive: keep mechanics LLM-free; judgment stays in the command handler so mechanics remain unit-testable
Not-tested: strong-tier routing with a separately configured SMALLCODE_MODEL_STRONG endpoint
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
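
Detecting consecutive tool-retry loops needs no model, only a single pass over the recorded calls. A minimal sketch, assuming a trace is an array of `{ tool }` records and that a run of 3 or more counts as a loop; both the record shape and the `minRun` default are assumptions, and `findRetryLoops` is a hypothetical name rather than the function in friction_analyzer.js.

```javascript
// Hypothetical helper: returns every maximal run of the same tool with length >= minRun.
function findRetryLoops(toolCalls, minRun = 3) {
  const loops = [];
  let start = 0; // index where the current run of identical tools began

  // Iterate one step past the end so the final run is closed like any other.
  for (let i = 1; i <= toolCalls.length; i++) {
    const runEnded = i === toolCalls.length || toolCalls[i].tool !== toolCalls[start].tool;
    if (!runEnded) continue;

    const length = i - start;
    if (length >= minRun) {
      loops.push({ tool: toolCalls[start].tool, start, length });
    }
    start = i;
  }
  return loops;
}
```

Because the function is pure, the same trace always yields the same friction items, which is what makes the detection side testable without an LLM.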

shuff57 and others added 3 commits June 5, 2026 12:33

fix(evolver): stopword filtering in prompt clustering

7095ce3

Field regression: rephrased prompts with filler drift (another/please/new) failed to cluster because stopwords diluted Jaccard below threshold. Real prompts from a live session pinned as a test.
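
A worked example of the dilution, as a sketch only: the stopword list, the 0.6 threshold, the greedy clustering strategy, and the sample prompts are all assumptions made for illustration (the real prompts pinned in the test are not reproduced here). Only the 3+ occurrence minimum comes from the PR description.

```javascript
// ASSUMPTION: a tiny illustrative stopword list, not the one shipped in the PR.
const STOPWORDS = new Set(["a", "the", "for", "please", "another", "new"]);

function tokenSet(prompt, stopwords) {
  const words = prompt.toLowerCase().split(/[^a-z0-9]+/);
  return new Set(words.filter((w) => w !== "" && !stopwords.has(w)));
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|. Two empty sets are treated as dissimilar.
function jaccard(a, b) {
  let shared = 0;
  for (const w of a) if (b.has(w)) shared++;
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Greedy clustering: each prompt joins the first cluster whose first member is
// similar enough. Only clusters with at least minSize prompts are reported.
function clusterPrompts(prompts, { stopwords = STOPWORDS, threshold = 0.6, minSize = 3 } = {}) {
  const clusters = [];
  for (const prompt of prompts) {
    const tokens = tokenSet(prompt, stopwords);
    const home = clusters.find((c) => jaccard(c.tokens, tokens) >= threshold);
    if (home) home.members.push(prompt);
    else clusters.push({ tokens, members: [prompt] });
  }
  return clusters.filter((c) => c.members.length >= minSize).map((c) => c.members);
}

const a = "please add a test for the parser";
const b = "add another new test for parser";
const noStopwords = new Set();

// Unfiltered: 4 shared words out of 9 distinct ones, so 4/9 ≈ 0.44, below 0.6.
console.log(jaccard(tokenSet(a, noStopwords), tokenSet(b, noStopwords)));
// Filtered: both prompts reduce to {add, test, parser}, so the similarity is 1.
console.log(jaccard(tokenSet(a, STOPWORDS), tokenSet(b, STOPWORDS)));
```

With filtering off, the filler words account for most of the union and the rephrasings fall below the threshold, so no cluster reaches three members.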

This was referenced Jun 7, 2026

feat(skills): lazy index-first loading + use_skill tool #89

Closed

feat(agents): subagent + team support with spawn_agent tool #92

Closed

shuff57 closed this Jun 14, 2026

shuff57 deleted the feat/evolver-phase1 branch June 14, 2026 19:35

shuff57 wants to merge 3 commits into Doorman11991:master from shuff57:feat/evolver-phase1
