The master toolkit for building production-quality skills and plugins for Claude Code.
Skillsmith is the plugin you install before you build your first plugin. It pairs an interview-driven scaffolder with a strict linter that knows every rule from the official Anthropic docs, a description-optimization loop that runs against a real eval set, a packager that produces marketplace-ready archives, and an HTML eval viewer that turns raw run output into something you can actually read. Every workflow ships with a slash command. Every hot path is tested. Every hook fails open.
If you've ever shipped a skill that mysteriously failed to trigger, scaffolded a plugin only to discover the manifest didn't validate, or written a description that the linter quietly disagreed with — Skillsmith is the answer.
- Built from the docs, not from vibes. Every rule the linter checks comes from code.claude.com/docs/skills, code.claude.com/docs/plugins-reference, code.claude.com/docs/hooks-guide, or a publicly-cited bug (Issue #56246, #24328, #46786, #38480). When the docs change, Skillsmith changes.
- The hooks are real. Save a `SKILL.md`, the linter runs in-process and you get findings in milliseconds. Save a `plugin.json`, `claude plugin validate --strict` runs and you see the result. No nags, no theatre.
- The eval loop actually works. Split your eval set 60/40 train/test, propose a new description with `claude -p`, re-evaluate, return the best by test score so you don't overfit. Stop early when no improvement is proposed.
- The packager produces archives you can ship. Top-level dir comes from the manifest name (not a worktree alias), `.claude/` is excluded, symlinks dropped, marketplace.json auto-generated.
- The whole codebase is tested. 176 unit and integration tests covering parsing, linting, scaffolding, eval-trigger detection, hook handlers, the CLI, the eval viewer, and every error path that could matter.
- Ruff-clean, zero `noqa`, zero TODO comments. What you read is what runs.
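
The eval-loop bullet above can be sketched roughly as follows. This is an illustration of the train/test idea only, not the code in `run_loop.py`: `score` and `propose` are hypothetical callables standing in for the trigger-rate eval and the `claude -p` proposal step, and the function name, seed, and iteration default are assumptions.

```python
import random
from typing import Callable, Optional, Sequence


def optimize_description(
    description: str,
    eval_set: Sequence[dict],
    score: Callable[[str, Sequence[dict]], float],             # hypothetical scorer
    propose: Callable[[str, Sequence[dict]], Optional[str]],   # hypothetical proposer
    max_iterations: int = 3,
    seed: int = 0,
) -> str:
    """Return the candidate description with the best held-out (test) score."""
    items = list(eval_set)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * 0.6)              # 60/40 train/test split
    train, test = items[:cut], items[cut:]

    best, best_test = description, score(description, test)
    current = description
    for _ in range(max_iterations):
        candidate = propose(current, train)  # proposals only see the train split
        if candidate is None or candidate == current:
            break                            # stop early: nothing new was proposed
        test_score = score(candidate, test)
        if test_score > best_test:           # selection uses the test split only
            best, best_test = candidate, test_score
        current = candidate
    return best
```

Selecting on the test split rather than the train split is what keeps a description that merely memorises the training queries from winning.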
claude plugin marketplace add voidfreud/skillsmith
claude plugin install skillsmith@skillsmith

git clone https://github.com/voidfreud/skillsmith ~/code/skillsmith
claude --plugin-dir ~/code/skillsmith

The CLI (bin/skillsmith) is added to your PATH whenever the plugin is enabled in a Claude Code session.
# Inside Claude Code, after install:
/skillsmith # interactive router — recommends the next move
/skillsmith:create my-new-skill # interview → scaffold → lint
/skillsmith:audit ~/.claude/skills/X # structured quality audit
/skillsmith:lint . # fast schema check on current dir
/skillsmith:package # wrap a skill as a distributable plugin
/skillsmith:publish # generate marketplace.json + push tag
# From the shell, anywhere:
skillsmith lint <skill-dir> # `bin/skillsmith` is on PATH when plugin is active
skillsmith lint-plugin <plugin-dir>
skillsmith review <workspace> # opens the HTML eval viewer

| Slash command | What it does |
|---|---|
| `/skillsmith` | Interactive router: assesses your repo and recommends the next action. |
| `/skillsmith:create` | Interview-first scaffolder: 4–7 targeted questions, then writes a valid skill or plugin tree. |
| `/skillsmith:audit` | Structured deep audit of an existing skill or plugin against the full ruleset. |
| `/skillsmith:lint` | Fast schema check: frontmatter, caps, names, anti-patterns. Exits 0/1 for CI. |
| `/skillsmith:eval` | Run the canonical eval workflow (with-skill vs without-skill, parallel subagents). |
| `/skillsmith:optimize` | Description optimization loop: train/test split, propose, re-evaluate, keep best. |
| `/skillsmith:package` | Wrap a skill in a distributable plugin scaffold; emit a clean zip. |
| `/skillsmith:publish` | Generate marketplace.json against the current git origin, ready for `claude plugin marketplace add`. |
| `/skillsmith:install` | Walk through installation flow + verify with `claude plugin details`. |
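
To make the "fast schema check" row concrete, here is a minimal sketch of that kind of lint pass. It is not the linter in `scripts/lint_skill.py`: the `parse_frontmatter` here is a simplified stand-in that only handles flat `key: value` pairs, the name pattern is an assumed rule, and only the 1024-character description cap comes from this README.

```python
import re

MAX_DESCRIPTION_CHARS = 1024                              # cap stated in this README
NAME_PATTERN = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")    # assumed naming rule


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse flat `key: value` pairs between the leading `---` fences."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # closing fence never found: treat as missing frontmatter


def lint_skill_text(text: str) -> list[str]:
    """Return human-readable findings; an empty list means the file is clean."""
    fields = parse_frontmatter(text)
    if not fields:
        return ["missing or unterminated frontmatter"]
    findings: list[str] = []
    name = fields.get("name", "")
    if not NAME_PATTERN.match(name):
        findings.append(f"invalid name: {name!r}")
    description = fields.get("description", "")
    if not description:
        findings.append("missing description")
    elif len(description) > MAX_DESCRIPTION_CHARS:
        findings.append(f"description is {len(description)} chars (cap {MAX_DESCRIPTION_CHARS})")
    return findings


def exit_code(findings: list[str]) -> int:
    """Map findings to the 0/1 exit status a CI job would check."""
    return 1 if findings else 0
```

Returning findings as data and deriving the exit status separately lets the same function serve both an in-process hook and a CI command.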
Every knowledge skill carries disable-model-invocation: true, so it's available for explicit reference but doesn't burn always-on tokens unprompted.
- `skill-anatomy`: the full layout of a SKILL.md
- `plugin-anatomy`: what goes where in a Claude Code plugin
- `progressive-disclosure`: the 500/5,000/25,000-token loading model
- `description-writing`: the rules (third person, 1024-char cap, what+when)
- `description-optimization`: the mechanics of running `run_loop.py`
- `writing-rules`: operator catalog + negative-example library (forked from hookify)
- `hook-authoring`: hooks.json schema, fail-open patterns, recursion safety
- `slash-command-authoring`: invocation syntax, argument-hint conventions
- `subagent-skills`: when to spawn a subagent vs inline a skill
- `eval-workflow`: the canonical `evals.json` / `eval_metadata.json` / `grading.json` schema
- `packaging-distribution`: marketplace.json schema, install paths, versioning
- `environments`: `claude-code` / `claude-ai` / `cowork` differences, `$CLAUDE_HARNESS`
- `anti-patterns`: the 12 documented anti-patterns to avoid
- `troubleshooting`: common failure modes and their fixes
- `skill-reviewer`: quality review of an individual SKILL.md against the full ruleset (forked from `plugin-dev`, extended).
- `plugin-validator`: end-to-end plugin validation: manifest, layout, security, anti-patterns (forked from `plugin-dev`, extended).
- `interviewer`: runs the 4–7-question intent-capture session before scaffolding. Returns a structured spec.
- `scaffolder`: takes the interviewer's spec and idempotently writes the file tree, runs the linter to confirm.
- `grader`: reads `<run-dir>/outputs/`, grades against the eval's declared `expectations`, writes `grading.json`.
- `benchmark-analyzer`: surfaces patterns the raw stats hide (non-discriminating expectations, flaky evals, time/token tradeoffs).
- `post-iteration-analyzer`: explains why a change helped or hurt between iterations, with actionable guidance for the next round.
A single PostToolUse hook in hooks/hooks.json routes through hooks/hook_dispatcher.py to two handlers:
- On `SKILL.md` save → `lint_skill` runs in-process (no subprocess overhead) and surfaces findings as a `systemMessage`.
- On `.claude-plugin/plugin.json` save → `claude plugin validate --strict` runs (8s timeout) and surfaces the result.
Both are toggled by userConfig flags (auto_review_on_save, auto_validate_manifest). Both fail open: a broken handler emits a diagnostic to stderr and exits 0.
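
The routing described above might look something like this sketch. The function name and the `validate_manifest` handler label are hypothetical; only the two file patterns and the two `userConfig` flags come from this README.

```python
from pathlib import PurePosixPath
from typing import Optional


def route_post_tool_use(file_path: str, user_config: dict) -> Optional[str]:
    """Return the handler to run for a saved file, or None when nothing applies."""
    path = PurePosixPath(file_path)
    if path.name == "SKILL.md":
        enabled = user_config.get("auto_review_on_save", True)
        return "lint_skill" if enabled else None
    if path.name == "plugin.json" and path.parent.name == ".claude-plugin":
        enabled = user_config.get("auto_validate_manifest", True)
        return "validate_manifest" if enabled else None  # hypothetical handler name
    return None
```

Keeping the routing decision a pure function of the path and the config makes it easy to unit-test without firing a real hook.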
A thin Python dispatcher added to PATH while the plugin is enabled. Subcommands route to python -m scripts.<module> with cwd set to the plugin root.
skillsmith lint <skill-path>
skillsmith lint-plugin <plugin-path>
skillsmith validate-workspace <iteration-dir>
skillsmith aggregate <iteration-dir> --skill-name <name>
skillsmith package --skill-path <p> --plugin-name <n> --output <o>
skillsmith package --archive --plugin-path <p> --output <p>.zip
skillsmith marketplace --plugin-path <p>
skillsmith scaffold --spec <spec.json> --target <dir>
skillsmith run-eval --eval-set <f> --skill-path <s> --skill-name <n>
skillsmith run-loop --eval-set <f> --skill-path <s> [--max-iterations N]
skillsmith improve-description --skill-name <n> --current-description <s> --eval-results <r>
skillsmith review <workspace> [--static <out>]
skillsmith trigger-eval-template
skillsmith root
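
A thin dispatcher of this shape can be sketched as below. The subcommand-to-module mapping is an assumption inferred from the file names under `scripts/`, and `build_invocation` is a hypothetical helper, not the contents of `bin/skillsmith`.

```python
import sys
from pathlib import Path

# Assumed mapping from CLI subcommand to module under scripts/.
SUBCOMMANDS = {
    "lint": "lint_skill",
    "lint-plugin": "lint_plugin",
    "scaffold": "scaffold",
    "run-eval": "run_eval",
    "run-loop": "run_loop",
}


def build_invocation(argv: list[str], plugin_root: Path) -> tuple[list[str], Path]:
    """Translate `skillsmith <sub> ...` into a `python -m scripts.<module> ...` call."""
    if not argv or argv[0] not in SUBCOMMANDS:
        raise ValueError(f"unknown subcommand: {argv[:1]}")
    module = SUBCOMMANDS[argv[0]]
    command = [sys.executable, "-m", f"scripts.{module}", *argv[1:]]
    return command, plugin_root  # the caller runs `command` with cwd=plugin_root
```

The returned pair would typically be handed to `subprocess.run(command, cwd=cwd)`, with the child's return code passed through as the CLI's exit status.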
eval-viewer/generate_review.py reads a workspace directory, finds every run (any directory with an outputs/ subdir), embeds the output files (text inline, images/PDF/xlsx as base64 data URIs), and serves a self-contained HTML page. Feedback auto-saves to feedback.json. Render with skillsmith review <workspace>.
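
Run discovery and file embedding could be approximated like this. It is a sketch under assumptions, not the code in `generate_review.py`: the set of suffixes treated as text and the dictionary shape returned by `embed_file` are made up for illustration.

```python
import base64
import mimetypes
from pathlib import Path

TEXT_SUFFIXES = {".txt", ".md", ".json", ".csv", ".py"}  # assumed set of inline-text types


def find_runs(workspace: Path) -> list[Path]:
    """A run is any directory that has an `outputs/` subdirectory."""
    return sorted(p.parent for p in workspace.rglob("outputs") if p.is_dir())


def embed_file(path: Path) -> dict:
    """Inline text files as text; wrap everything else in a base64 data URI."""
    if path.suffix.lower() in TEXT_SUFFIXES:
        content = path.read_text(encoding="utf-8", errors="replace")
        return {"name": path.name, "kind": "text", "content": content}
    mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    payload = base64.b64encode(path.read_bytes()).decode("ascii")
    return {"name": path.name, "kind": "data_uri", "content": f"data:{mime};base64,{payload}"}
```

Embedding binaries as data URIs is what lets the generated page stay self-contained, at the cost of roughly a third more bytes per embedded file.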
Set at install time via Claude Code's userConfig prompt:
| Setting | Default | Purpose |
|---|---|---|
| `default_author_name` | Void Freud | Written into scaffolded plugin.json files. |
| `default_author_email` | voidfreud@users.noreply.github.com | Also written into scaffolded plugin.json files. |
| `auto_review_on_save` | true | Trigger `lint_skill` on SKILL.md save. |
| `auto_validate_manifest` | true | Trigger `claude plugin validate --strict` on plugin.json save. |
| `default_eval_iterations` | 3 | How many `run_loop` iterations before declaring a skill ready. |
| `default_target_env` | claude-code | Default target environment for new skills. |
| `workspace_root` | ~/code | Where eval workspaces are placed. |
Change them at any time in ~/.claude/settings.json under plugins.skillsmith.userConfig.
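
As a rough illustration of how those settings could be read back, the sketch below merges overrides onto the defaults from the table. It assumes `plugins.skillsmith.userConfig` is a plain nested JSON object path in the settings file; `load_user_config` is a hypothetical helper, not part of Skillsmith.

```python
import json

# Defaults copied from the table above (author fields omitted for brevity).
DEFAULTS = {
    "auto_review_on_save": True,
    "auto_validate_manifest": True,
    "default_eval_iterations": 3,
    "default_target_env": "claude-code",
    "workspace_root": "~/code",
}


def load_user_config(settings_text: str) -> dict:
    """Overlay any userConfig overrides found in the settings JSON onto DEFAULTS."""
    settings = json.loads(settings_text or "{}")
    # Assumed layout: {"plugins": {"skillsmith": {"userConfig": {...}}}}
    overrides = settings.get("plugins", {}).get("skillsmith", {}).get("userConfig", {})
    return {**DEFAULTS, **overrides}
```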
skillsmith/
├── .claude-plugin/
│ ├── plugin.json # Manifest with userConfig schema
│ └── marketplace.json # Self-distribution manifest (auto-generated)
├── skills/ # 23 skills: 9 verb + 14 knowledge
├── agents/ # 7 specialized subagents
├── hooks/
│ ├── hooks.json # PostToolUse routing table
│ └── hook_dispatcher.py # Thin entry point — delegates to scripts.hook_handler
├── scripts/ # Python package (importable as `scripts.X`)
│ ├── lint_skill.py # SKILL.md linter (in-process from hook)
│ ├── lint_plugin.py # Plugin linter (manifest + layout + security)
│ ├── scaffold.py # Skill/plugin tree writer
│ ├── package_plugin.py # Skill→plugin wrapper + archive packager
│ ├── generate_marketplace_json.py
│ ├── run_eval.py # Trigger-rate eval (claude -p × N runs)
│ ├── run_loop.py # Description-optimization loop
│ ├── improve_description.py # `claude -p` prompt for a better description
│ ├── aggregate_benchmark.py # Per-run grading → benchmark.json/md
│ ├── validate_workspace.py # Pre-aggregation sanity check
│ ├── hook_handler.py # Hook event dispatcher (called by entry point)
│ └── utils.py # Shared: parse_frontmatter, lint report,
│ # name validation, plugin-root discovery,
│ # SKILL.md description read/replace.
├── eval-viewer/
│ ├── generate_review.py # HTTP server + static HTML generator
│ ├── viewer.html # Standalone viewer (XSS-safe escapeHtml + textContent)
│ └── trigger_review.html # Eval-review trigger template
├── bin/
│ └── skillsmith # CLI wrapper (PATH-exposed)
├── tests/ # 176 tests covering every testable component
├── evals/ # Self-eval suite + fixtures
│ ├── evals.json # 8 prompts: create, audit, lint, eval, …
│ ├── trigger-evals.json # 18 queries for description optimization
│ └── fixtures/ # broken-skill + pdf-helper fixtures
├── audits/ # Closed audit reports (per-finding verifications)
├── pyproject.toml # ruff + pytest config
├── CHANGELOG.md
└── DEFERRED.md # Items deferred by design, with reasoning
Every entry point under scripts/ supports three invocation styles:
python -m scripts.lint_skill <path> # package-style (canonical)
python scripts/lint_skill.py <path> # direct file invocation
skillsmith lint <path> # via the CLI dispatcher

The scripts/ package contains a five-line bootstrap that re-anchors __package__ and adds the plugin root to sys.path when the file is run directly. No noqa markers anywhere in the codebase.
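
A bootstrap of that kind could look roughly like the following. It is written as a function so it can be exercised in isolation; `bootstrap_direct_run` is a hypothetical name, and the actual five lines in the codebase may differ.

```python
from pathlib import Path


def bootstrap_direct_run(script_file: str, search_path: list[str]) -> str:
    """Put the plugin root (the parent of scripts/) on the import path and
    return the package name the script should adopt."""
    script_dir = Path(script_file).resolve().parent
    plugin_root = str(script_dir.parent)
    if plugin_root not in search_path:
        search_path.insert(0, plugin_root)
    return script_dir.name


# In a real script the body would be inlined at the top of the file, before
# any package-relative import, along these lines (with `import sys` above it):
#
#     if __name__ == "__main__" and not __package__:
#         __package__ = bootstrap_direct_run(__file__, sys.path)
```

The guard matters: under `python -m scripts.lint_skill` the interpreter has already set `__package__`, so the bootstrap only does work for direct file invocation.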
Every hook script ends with finally: sys.exit(0) (pattern forked from hookify). A broken rule or crashed handler can never block your tool calls. If a systemMessage from skillsmith reports an internal error, the hook failed open — your work continues unaffected.
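
The fail-open shape can be shown in a few lines. This is a minimal sketch of the pattern, with `run_fail_open` as a hypothetical wrapper name rather than the function used in `hook_handler.py`.

```python
import sys
import traceback
from typing import Callable


def run_fail_open(handler: Callable[[], None]) -> None:
    """Run a hook handler; whatever happens, the process exits with status 0."""
    try:
        handler()
    except Exception:
        traceback.print_exc(file=sys.stderr)  # diagnostic goes to stderr only
    finally:
        sys.exit(0)                           # replaces any in-flight exception or exit code
```

Because the `SystemExit(0)` raised in `finally` supersedes whatever was propagating, even a handler that calls `sys.exit(2)` itself still ends with exit status 0.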
git clone https://github.com/voidfreud/skillsmith
cd skillsmith
# Self-test
uv run pytest tests/ # 176 tests, ~2s
uvx ruff check . # all checks must pass
uvx ruff format --check . # formatting must be clean
uv run python -m scripts.lint_plugin . # the plugin must lint itself clean
claude plugin validate . --strict # the manifest must pass strict

A passing self-test means:
- The plugin manifest validates strict.
- Every shipped skill lints clean (regression-guarded in `test_lint_skill.py`).
- The trigger-detection heuristic correctly identifies the 7 documented Claude invocation signals (`test_run_eval.py`).
- The scaffolder rejects path-traversal names and budgets descriptions under the 1024-char cap (`test_scaffold.py`).
- The packager ships archives with the manifest name as the top-level dir, drops `.claude/` and symlinks (`test_package_plugin.py`).
- Every script's argparse `--help` resolves without import errors (`test_skillsmith_cli.py`).
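
The packager guarantees in the list above can be illustrated with a short sketch. It assumes the manifest lives at `.claude-plugin/plugin.json` with a `name` field, as in the repository layout; `build_archive` and `manifest_name` are hypothetical names, and this is not the code in `package_plugin.py`.

```python
import json
import os
import zipfile
from pathlib import Path


def manifest_name(plugin_dir: Path) -> str:
    """Read the plugin name from the manifest, not from the directory name."""
    manifest = plugin_dir / ".claude-plugin" / "plugin.json"
    return json.loads(manifest.read_text(encoding="utf-8"))["name"]


def build_archive(plugin_dir: Path, output: Path) -> list[str]:
    """Zip the plugin under <manifest name>/, skipping .claude/ and symlinks."""
    top = manifest_name(plugin_dir)
    written: list[str] = []
    with zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as zf:
        # os.walk does not descend into symlinked directories by default.
        for root, dirs, files in os.walk(plugin_dir):
            dirs[:] = sorted(d for d in dirs if d != ".claude")  # exact match keeps .claude-plugin/
            for fname in sorted(files):
                path = Path(root) / fname
                if path.is_symlink():
                    continue                                      # drop symlinked files
                arcname = f"{top}/{path.relative_to(plugin_dir).as_posix()}"
                zf.write(path, arcname)
                written.append(arcname)
    return written
```

Writing `output` outside `plugin_dir` avoids the archive trying to include itself.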
DEFERRED.md lists everything not shipped in the current release with reasoning. The short version: nothing is "we didn't get to it". Each entry is a deliberate choice — the rule engine is deferred until user-customizable lint rules exist (no rules → nothing to engine over), the live end-to-end eval run is deferred because it costs real model time (the framework ships, the run is a user-maintenance task), and a few skill responsibility overlaps are kept deliberately to make each skill self-contained when invoked.
Skillsmith forks code and patterns from the official Anthropic plugin examples:
- `plugin-dev`: `skill-reviewer`, `plugin-validator`, the 8-phase create-plugin workflow.
- `hookify`: the `finally: sys.exit(0)` runtime safety pattern, single-dispatcher design, the `writing-rules` operator catalog.
- `feature-dev`: parallel agent fan-out pattern.
- `code-review`: two-pass confidence gating.
- `claude-code-skill-factory`: the interview-first scaffolding flow.
MIT. See LICENSE.