feat(evolver): /evolve proposes skills from session friction #88
Closed
shuff57 wants to merge 3 commits into
Skills following the Claude Code layout (<skill-dir>/<name>/SKILL.md) or written as plain .md without YAML frontmatter were silently skipped in the standard skill dirs (.smallcode/skills, ~/.smallcode/skills, ~/.config/smallcode/skills). Both shapes now load; README-style files (README/CHANGELOG/LICENSE/CONTRIBUTING) are filtered by name.

Fixes Doorman11991#81

Constraint: no warning channel exists in SkillManager, so silent skips had no user-visible signal
Rejected: warn-on-skip only | users following Claude Code conventions expect these layouts to work
Confidence: high
Scope-risk: narrow
Not-tested: fullscreen TUI /skill list rendering (logic shared with classic mode)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
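The two accepted layouts and the name-based README filter can be sketched as a pure function over paths relative to a skill dir. This is a minimal illustration, not SkillManager's code. The names classifySkillPath and splitFrontmatter, and the { name, layout } return shape, are hypothetical. The sketch also assumes README-style files are matched case-insensitively by basename.

```javascript
// Hypothetical sketch: classifySkillPath, splitFrontmatter, and the return
// shapes are illustrative, not SkillManager's actual API.
const IGNORED_BASENAMES = new Set(["README", "CHANGELOG", "LICENSE", "CONTRIBUTING"]);

// relPath is relative to a skill dir such as .smallcode/skills
function classifySkillPath(relPath) {
  const parts = relPath.split("/").filter(Boolean);
  // Claude Code layout: <skill-dir>/<name>/SKILL.md
  if (parts.length === 2 && parts[1] === "SKILL.md") {
    return { name: parts[0], layout: "directory" };
  }
  // Flat layout: <skill-dir>/<name>.md, frontmatter optional
  if (parts.length === 1 && parts[0].toLowerCase().endsWith(".md")) {
    const base = parts[0].slice(0, -3);
    // README-style docs are skipped by name (assumed case-insensitive)
    if (IGNORED_BASENAMES.has(base.toUpperCase())) return null;
    return { name: base, layout: "flat" };
  }
  return null; // anything else is not a skill file
}

// Frontmatter is optional: split it off when present, else the whole file is the body.
function splitFrontmatter(text) {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?/.exec(text);
  if (!match) return { frontmatter: null, body: text };
  return { frontmatter: match[1], body: text.slice(match[0].length) };
}
```

A skill dir containing deploy/SKILL.md, notes.md, and README.md would yield two skills ("deploy" and "notes") and skip the README.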
Create-mode evolver: deterministic friction extraction from saved traces (repeated near-duplicate prompts, consecutive tool-retry loops), LLM judgment routed to the strong tier, and ONE quarantined skill draft per run written to .smallcode/skills/drafts/. Drafts never auto-load; /evolve promote <name> moves them live. Validation gates every write (name format, no frontmatter injection, trigger rules); name collisions across live, draft, and global dirs abort; every create appends to .smallcode/evolver-audit.jsonl. The per-run cap is structural: EvolverRun raises on a second create.

Constraint: small models produce noisy judgments, so all fuzzy output passes validate-or-abort before any write
Rejected: plugin delivery | needs TraceRecorder + SkillManager internals unreachable from plugin dirs under binary installs
Confidence: high
Scope-risk: narrow
Directive: keep mechanics LLM-free; judgment stays in the command handler so mechanics remain unit-testable
Not-tested: strong-tier routing with a separately configured SMALLCODE_MODEL_STRONG endpoint
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
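Retry-loop detection needs no model: it is a plain scan for runs of the same tool name. The sketch below is minimal and rests on two assumptions. First, each saved trace entry is assumed to carry a tool field. Second, three consecutive calls are assumed to count as a loop. The real trace shape and threshold in friction_analyzer.js may differ, and findRetryLoops is a hypothetical name.

```javascript
// Hypothetical sketch: assumes trace entries shaped like { tool: string }
// and an assumed minimum run length of 3 consecutive same-tool calls.
function findRetryLoops(calls, minRun = 3) {
  const loops = [];
  let start = 0; // index where the current same-tool run began
  for (let i = 1; i <= calls.length; i++) {
    // A run ends when the tool changes or the trace ends.
    if (i === calls.length || calls[i].tool !== calls[start].tool) {
      const length = i - start;
      if (length >= minRun) {
        loops.push({ tool: calls[start].tool, start, length });
      }
      start = i;
    }
  }
  return loops;
}
```

For a trace of bash, bash, bash, read, bash, this returns one loop: { tool: "bash", start: 0, length: 3 }. The same input always produces the same output, so the function can be unit-tested without a model.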
Field regression: rephrased prompts with filler drift (another/please/new) failed to cluster because stopwords diluted their Jaccard similarity below the threshold. Real prompts from a live session are pinned as a test.
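The regression is easy to reproduce with token-set Jaccard similarity, |A ∩ B| / |A ∪ B|. In the sketch below, several pieces are illustrative assumptions rather than the values or algorithm in friction_analyzer.js: the prompts, the stopword list, the 0.6 threshold, and the greedy clustering strategy. Only the 3+ occurrence minimum comes from this PR.

```javascript
// Hypothetical stopword list and threshold; the real ones may differ.
const STOPWORDS = new Set(["a", "an", "the", "please", "another", "new", "for", "me", "to"]);

function tokens(prompt, dropStopwords) {
  const words = prompt.toLowerCase().match(/[a-z0-9]+/g) || [];
  return new Set(dropStopwords ? words.filter((w) => !STOPWORDS.has(w)) : words);
}

// Jaccard similarity of two token sets: shared / union.
function jaccard(a, b) {
  if (a.size === 0 && b.size === 0) return 1;
  let shared = 0;
  for (const w of a) if (b.has(w)) shared++;
  return shared / (a.size + b.size - shared);
}

// Greedy clustering (an assumed strategy): each prompt joins the first cluster
// whose first member is similar enough; clusters under minSize are dropped.
function clusterPrompts(prompts, threshold = 0.6, minSize = 3) {
  const clusters = [];
  for (const p of prompts) {
    const t = tokens(p, true);
    const home = clusters.find((c) => jaccard(c.rep, t) >= threshold);
    if (home) home.members.push(p);
    else clusters.push({ rep: t, members: [p] });
  }
  return clusters.filter((c) => c.members.length >= minSize).map((c) => c.members);
}
```

Take "write a test for the parser" and "please write another new test for the parser". Their raw token sets share 5 of 9 distinct words, for a score of about 0.56. After stopword removal, both reduce to {write, test, parser} and score 1.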
This was referenced Jun 7, 2026
What
/evolve, a create-mode evolver that turns recurring session friction into skill proposals:
- Deterministic friction extraction (src/plugins/friction_analyzer.js): near-duplicate prompts (Jaccard clustering with stopword filtering, 3+ occurrences) and consecutive same-tool retry loops. No LLM is involved in detection.
- LLM judgment routed to the strong tier when configured (falls back to the active model): proposes ONE skill as JSON, parsed forgivingly (strict → fenced → abort with raw output).
- Drafts are quarantined in .smallcode/skills/drafts/, which SkillManager never auto-loads.
- /evolve promote <name> moves a reviewed draft live.
- /evolve list and /evolve log round it out.

Safety rails
- One draft per run, enforced structurally: EvolverRun raises on a second write, so the cap is not just a convention.
- Every create is appended to .smallcode/evolver-audit.jsonl with source trace IDs.

Why mechanics/judgment split
Small models produce noisy judgments. All fuzzy output flows through validate-or-abort before touching disk, and the mechanics module (src/plugins/evolver.js) is LLM-free, so it unit-tests deterministically.

Tests
21 new tests: cap enforcement, traversal rejection, quarantine pin, promote round-trip, and friction extraction, including a field regression (real session prompts that initially failed to cluster). Full suite green.
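The cap-enforcement and traversal-rejection tests target behavior like the following. This is a minimal sketch, not the PR's EvolverRun. Several details are assumptions made for illustration: the createDraft method, the injected write callback, the kebab-case name pattern, and the <name>.md draft file layout.

```javascript
// Hypothetical sketch: createDraft, the write callback, NAME_RE, and the
// drafts/<name>.md file layout are illustrative assumptions, not the PR's code.
const NAME_RE = /^[a-z0-9][a-z0-9-]{0,63}$/; // rejects "/", "..", spaces, and uppercase

class EvolverRun {
  constructor(write) {
    this.write = write; // injected file writer, so tests never touch disk
    this.created = null; // name of the one draft this run has written
  }

  createDraft(name, body) {
    // Structural cap: any second create throws, whatever the caller does.
    if (this.created !== null) {
      throw new Error(`EvolverRun already created "${this.created}"; one draft per run`);
    }
    // Validate before writing, so a bad name never reaches the filesystem.
    if (!NAME_RE.test(name)) {
      throw new Error(`invalid skill name: ${JSON.stringify(name)}`);
    }
    this.write(`.smallcode/skills/drafts/${name}.md`, body);
    this.created = name;
  }
}
```

Injecting the writer keeps the class free of I/O. That fits the PR's point that the mechanics stay LLM-free and unit-testable.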
Field-verified end-to-end on Ollama (qwen3-coder-next, deepseek-v4-flash): real friction from real sessions produced a sensible, promotable draft.
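Small local models often wrap JSON in prose or a Markdown code fence. That is what the forgiving parse (strict → fenced → abort-with-raw-output) is for. Below is a minimal sketch of that order; parseProposal and the rawOutput error property are hypothetical names.

```javascript
// Hypothetical sketch of a strict -> fenced -> abort parse for a model's skill proposal.
const FENCE = "`".repeat(3); // built at runtime so this snippet contains no literal fence
const FENCED_JSON = new RegExp(FENCE + "(?:json)?\\s*\\n([\\s\\S]*?)\\n?" + FENCE);

function parseProposal(raw) {
  // 1. Strict: the entire reply is valid JSON.
  try {
    return JSON.parse(raw);
  } catch {
    // not strict JSON; try the fenced form next
  }
  // 2. Fenced: JSON inside a Markdown code fence, optionally tagged "json".
  const match = FENCED_JSON.exec(raw);
  if (match) {
    try {
      return JSON.parse(match[1]);
    } catch {
      // fenced body is not JSON either; fall through to abort
    }
  }
  // 3. Abort, attaching the raw output so the user can see what the model said.
  const err = new Error("could not parse skill proposal as JSON");
  err.rawOutput = raw;
  throw err;
}
```

Aborting with the raw text, rather than guessing at repairs, keeps noisy model output from ever reaching the validation and write steps.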
Stacked on #85 (includes its commits; review 086fa4a + 7095ce3 here). Phase 2 (subagent support) and Phase 3 (evolver proposes agents) are designed and can follow if there is appetite.

🤖 Generated with Claude Code