Skillsmith

The master toolkit for building production-quality skills and plugins for Claude Code.

Skillsmith is the plugin you install before you build your first plugin. It pairs an interview-driven scaffolder with a strict linter that knows every rule from the official Anthropic docs, a description-optimization loop that runs against a real eval set, a packager that produces marketplace-ready archives, and an HTML eval viewer that turns raw run output into something you can actually read. Every workflow ships with a slash command. Every hot path is tested. Every hook fails open.

If you've ever shipped a skill that mysteriously failed to trigger, scaffolded a plugin only to discover the manifest didn't validate, or written a description that the linter quietly disagreed with — Skillsmith is the answer.

Why Skillsmith

Built from the docs, not from vibes. Every rule the linter checks comes from code.claude.com/docs/skills, code.claude.com/docs/plugins-reference, code.claude.com/docs/hooks-guide, or a publicly cited bug (issues #56246, #24328, #46786, #38480). When the docs change, Skillsmith changes.
The hooks are real. Save a SKILL.md, the linter runs in-process and you get findings in milliseconds. Save a plugin.json, claude plugin validate --strict runs and you see the result. No nags, no theatre.
The eval loop actually works. It splits your eval set 60/40 into train and test, proposes a new description with claude -p, re-evaluates, and returns the candidate with the best test score so you don't overfit to the training queries. It stops early when no improvement is proposed.
The packager produces archives you can ship. Top-level dir comes from the manifest name (not a worktree alias), .claude/ is excluded, symlinks dropped, marketplace.json auto-generated.
The whole codebase is tested. 176 unit and integration tests cover parsing, linting, scaffolding, eval-trigger detection, hook handlers, the CLI, the eval viewer, and every error path that could matter.
Ruff-clean, zero noqa, zero TODO comments. What you read is what runs.
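
The eval loop from the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the contents of run_loop.py: score and propose are hypothetical callables standing in for the trigger-rate eval and the claude -p proposal step, and the seed and function names are invented for the example.

```python
import random
from typing import Callable, Optional, Sequence

Query = dict  # hypothetical shape, e.g. {"query": "...", "should_trigger": True}


def split_evals(evals: Sequence[Query], train_frac: float = 0.6, seed: int = 0):
    """Shuffle deterministically, then split into (train, test)."""
    items = list(evals)
    random.Random(seed).shuffle(items)
    cut = round(len(items) * train_frac)
    return items[:cut], items[cut:]


def optimize_description(
    description: str,
    evals: Sequence[Query],
    score: Callable[[str, Sequence[Query]], float],
    propose: Callable[[str, Sequence[Query]], Optional[str]],
    max_iterations: int = 3,
):
    """Return (best_description, best_test_score)."""
    train, test = split_evals(evals)
    best, best_test = description, score(description, test)
    current = description
    for _ in range(max_iterations):
        # The proposer only ever sees the training queries.
        candidate = propose(current, train)
        if candidate is None or candidate == current:
            break  # early stop: nothing new was proposed
        test_score = score(candidate, test)
        if test_score > best_test:
            best, best_test = candidate, test_score
        current = candidate
    return best, best_test
```

Selecting on the held-out test score, rather than the score the proposer was tuned against, is what keeps a description that merely memorizes the training queries from winning.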

Install

From marketplace

claude plugin marketplace add voidfreud/skillsmith
claude plugin install skillsmith@skillsmith

From a local checkout

git clone https://github.com/voidfreud/skillsmith ~/code/skillsmith
claude --plugin-dir ~/code/skillsmith

The CLI (bin/skillsmith) is added to your PATH whenever the plugin is enabled in a Claude Code session.

Quick start

# Inside Claude Code, after install:
/skillsmith                          # interactive router — recommends the next move
/skillsmith:create my-new-skill      # interview → scaffold → lint
/skillsmith:audit ~/.claude/skills/X # structured quality audit
/skillsmith:lint .                   # fast schema check on current dir
/skillsmith:package                  # wrap a skill as a distributable plugin
/skillsmith:publish                  # generate marketplace.json + push tag

# From the shell, anywhere:
skillsmith lint <skill-dir>          # `bin/skillsmith` is on PATH when plugin is active
skillsmith lint-plugin <plugin-dir>
skillsmith review <workspace>        # opens the HTML eval viewer

What it ships

9 verb skills (user-invoked via `/skillsmith:*`)

Slash command	What it does
`/skillsmith`	Interactive router — assesses your repo and recommends the next action.
`/skillsmith:create`	Interview-first scaffolder: 4–7 targeted questions, then writes a valid skill or plugin tree.
`/skillsmith:audit`	Structured deep audit of an existing skill or plugin against the full ruleset.
`/skillsmith:lint`	Fast schema check — frontmatter, caps, names, anti-patterns. Exits 0/1 for CI.
`/skillsmith:eval`	Run the canonical eval workflow (with-skill vs without-skill, parallel subagents).
`/skillsmith:optimize`	Description optimization loop: train/test split, propose, re-evaluate, keep best.
`/skillsmith:package`	Wrap a skill in a distributable plugin scaffold; emit a clean zip.
`/skillsmith:publish`	Generate `marketplace.json` against the current `git origin`, ready for `claude plugin marketplace add`.
`/skillsmith:install`	Walk through the installation flow and verify the result with `claude plugin details`.
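
To make the "fast schema check" concrete, here is a minimal sketch of a frontmatter lint that maps findings to a 0/1 exit code. It is not lint_skill.py: the function names are hypothetical, the lowercase-hyphenated name pattern is an assumption, and only the 1024-character description cap comes from this README.

```python
import re

DESCRIPTION_CAP = 1024  # the cap cited under description-writing
# Assumed naming rule: lowercase letters, digits, single hyphens between parts.
NAME_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def lint_frontmatter(frontmatter: dict) -> list:
    """Return a list of human-readable findings (empty means clean)."""
    findings = []
    name = frontmatter.get("name", "")
    description = frontmatter.get("description", "")
    if not NAME_RE.match(name):
        # Also rejects path-traversal attempts such as "../evil".
        findings.append(f"name {name!r} is not lowercase-hyphenated")
    if not description.strip():
        findings.append("description is empty")
    elif len(description) > DESCRIPTION_CAP:
        findings.append(
            f"description is {len(description)} chars (cap {DESCRIPTION_CAP})"
        )
    return findings


def exit_code(findings: list) -> int:
    """0 when clean, 1 when there are findings, which suits CI."""
    return 1 if findings else 0
```

A CI step would call exit_code(lint_frontmatter(...)) and pass the result to sys.exit.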

14 knowledge skills (load-on-demand references)

Every knowledge skill carries disable-model-invocation: true, so it's available for explicit reference but doesn't burn always-on tokens unprompted.

skill-anatomy — the full layout of a SKILL.md
plugin-anatomy — what goes where in a Claude Code plugin
progressive-disclosure — the 500/5,000/25,000-token loading model
description-writing — the rules (third person, 1024-char cap, what+when)
description-optimization — the mechanics of running run_loop.py
writing-rules — operator catalog + negative-example library (forked from hookify)
hook-authoring — hooks.json schema, fail-open patterns, recursion safety
slash-command-authoring — invocation syntax, argument-hint conventions
subagent-skills — when to spawn a subagent vs inline a skill
eval-workflow — the canonical evals.json / eval_metadata.json / grading.json schema
packaging-distribution — marketplace.json schema, install paths, versioning
environments — claude-code / claude-ai / cowork differences, $CLAUDE_HARNESS
anti-patterns — the 12 documented anti-patterns to avoid
troubleshooting — common failure modes and their fixes

7 specialized agents

skill-reviewer — quality review of an individual SKILL.md against the full ruleset (forked from plugin-dev, extended).
plugin-validator — end-to-end plugin validation: manifest, layout, security, anti-patterns (forked from plugin-dev, extended).
interviewer — runs the 4–7-question intent-capture session before scaffolding. Returns a structured spec.
scaffolder — takes the interviewer's spec and idempotently writes the file tree, runs the linter to confirm.
grader — reads <run-dir>/outputs/, grades against the eval's declared expectations, writes grading.json.
benchmark-analyzer — surfaces patterns the raw stats hide (non-discriminating expectations, flaky evals, time/token tradeoffs).
post-iteration-analyzer — explains why a change helped or hurt between iterations, with actionable guidance for the next round.

Hooks

A single PostToolUse hook in hooks/hooks.json routes through hooks/hook_dispatcher.py to two handlers:

On SKILL.md save → lint_skill runs in-process (no subprocess overhead) and surfaces findings as a systemMessage.
On .claude-plugin/plugin.json save → claude plugin validate --strict runs (8s timeout) and surfaces the result.

Both are toggled by userConfig flags (auto_review_on_save, auto_validate_manifest). Both fail open: a broken handler emits a diagnostic to stderr and exits 0.
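
The routing described above could look something like this. It is a sketch, not hook_dispatcher.py: the handler labels and function names are hypothetical, and it assumes the PostToolUse payload carries the saved path at tool_input.file_path and that a hook replies with a JSON object containing systemMessage.

```python
import json
from pathlib import PurePosixPath
from typing import Optional


def route(event: dict, config: dict) -> Optional[str]:
    """Pick a handler label for a PostToolUse event, or None to do nothing."""
    file_path = event.get("tool_input", {}).get("file_path", "")
    if not file_path:
        return None
    path = PurePosixPath(file_path)
    if path.name == "SKILL.md" and config.get("auto_review_on_save", True):
        return "lint_skill"
    is_manifest = path.name == "plugin.json" and path.parent.name == ".claude-plugin"
    if is_manifest and config.get("auto_validate_manifest", True):
        return "validate_manifest"
    return None


def system_message(text: str) -> str:
    """Serialize a finding summary in the assumed hook reply shape."""
    return json.dumps({"systemMessage": text})
```

Keeping the routing decision a pure function of the event and the two flags makes it easy to unit-test without spawning Claude Code.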

CLI (`bin/skillsmith`)

A thin Python dispatcher added to PATH while the plugin is enabled. Subcommands route to python -m scripts.<module> with cwd set to the plugin root.

skillsmith lint <skill-path>
skillsmith lint-plugin <plugin-path>
skillsmith validate-workspace <iteration-dir>
skillsmith aggregate <iteration-dir> --skill-name <name>
skillsmith package --skill-path <p> --plugin-name <n> --output <o>
skillsmith package --archive --plugin-path <p> --output <p>.zip
skillsmith marketplace --plugin-path <p>
skillsmith scaffold --spec <spec.json> --target <dir>
skillsmith run-eval --eval-set <f> --skill-path <s> --skill-name <n>
skillsmith run-loop --eval-set <f> --skill-path <s> [--max-iterations N]
skillsmith improve-description --skill-name <n> --current-description <s> --eval-results <r>
skillsmith review <workspace> [--static <out>]
skillsmith trigger-eval-template
skillsmith root
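
A dispatcher of this kind can be quite small. The sketch below is an illustration rather than the contents of bin/skillsmith: only the lint mapping is confirmed by this README, the other table entries are inferred from the file names under scripts/, and the function names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# Partial, assumed subcommand table (subcommand -> module run with `python -m`).
SUBCOMMANDS = {
    "lint": "scripts.lint_skill",
    "lint-plugin": "scripts.lint_plugin",
    "scaffold": "scripts.scaffold",
    "run-eval": "scripts.run_eval",
    "run-loop": "scripts.run_loop",
}


def build_command(argv: list) -> list:
    """Translate `skillsmith <sub> ...` into a `python -m scripts.<module> ...` argv."""
    if not argv or argv[0] not in SUBCOMMANDS:
        raise SystemExit(f"usage: skillsmith <{'|'.join(SUBCOMMANDS)}> [args]")
    return [sys.executable, "-m", SUBCOMMANDS[argv[0]], *argv[1:]]


def dispatch(argv: list, plugin_root: Path) -> int:
    """Run the module with cwd at the plugin root so `scripts` is importable."""
    return subprocess.run(build_command(argv), cwd=plugin_root).returncode
```

Using sys.executable keeps the child process on the same interpreter as the dispatcher.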

Eval viewer

eval-viewer/generate_review.py reads a workspace directory, finds every run (any directory with an outputs/ subdir), embeds the output files (text inline, images/PDF/xlsx as base64 data URIs), and serves a self-contained HTML page. Feedback auto-saves to feedback.json. Render with skillsmith review <workspace>.
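
The two mechanics above, run discovery and embedding, can be sketched as follows. This is an assumption-laden illustration, not generate_review.py: the function names, the returned dict shape, and the set of suffixes treated as text are all hypothetical.

```python
import base64
import mimetypes
from pathlib import Path

TEXT_SUFFIXES = {".txt", ".md", ".json", ".csv", ".py"}  # assumed, not exhaustive


def find_runs(workspace: Path) -> list:
    """A run is any directory that has an outputs/ subdirectory."""
    return sorted(p.parent for p in workspace.rglob("outputs") if p.is_dir())


def embed(path: Path) -> dict:
    """Inline text files; turn everything else into a base64 data URI."""
    if path.suffix.lower() in TEXT_SUFFIXES:
        text = path.read_text(encoding="utf-8", errors="replace")
        return {"name": path.name, "kind": "text", "content": text}
    mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    payload = base64.b64encode(path.read_bytes()).decode("ascii")
    return {"name": path.name, "kind": "data-uri", "content": f"data:{mime};base64,{payload}"}
```

Because binary outputs travel inside the page as data URIs, the resulting HTML has no external file dependencies and can be shared as a single file.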

Configuration

Set at install time via Claude Code's userConfig prompt:

Setting	Default	Purpose
`default_author_name`	Void Freud	Written into scaffolded `plugin.json` files.
`default_author_email`	voidfreud@users.noreply.github.com	Written into scaffolded `plugin.json` files.
`auto_review_on_save`	`true`	Trigger `lint_skill` on `SKILL.md` save.
`auto_validate_manifest`	`true`	Trigger `claude plugin validate --strict` on `plugin.json` save.
`default_eval_iterations`	`3`	How many `run_loop` iterations before declaring a skill ready.
`default_target_env`	`claude-code`	Default target environment for new skills.
`workspace_root`	`~/code`	Where eval workspaces are placed.

Change them at any time in ~/.claude/settings.json under plugins.skillsmith.userConfig.
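
For scripts that need to read these values, a loader might look like the sketch below. It assumes the nesting is exactly plugins, then skillsmith, then userConfig, as stated above; the function name is hypothetical and the defaults are copied from the table (author fields omitted for brevity).

```python
import json
from pathlib import Path

DEFAULTS = {
    "auto_review_on_save": True,
    "auto_validate_manifest": True,
    "default_eval_iterations": 3,
    "default_target_env": "claude-code",
    "workspace_root": "~/code",
}


def load_user_config(settings_path: Path) -> dict:
    """Merge userConfig overrides from settings.json over the defaults."""
    try:
        settings = json.loads(settings_path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        settings = {}  # a missing or unreadable file falls back to defaults
    overrides = settings.get("plugins", {}).get("skillsmith", {}).get("userConfig", {})
    return {**DEFAULTS, **overrides}
```

Typical use would be load_user_config(Path.home() / ".claude" / "settings.json").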

Architecture

skillsmith/
├── .claude-plugin/
│   ├── plugin.json            # Manifest with userConfig schema
│   └── marketplace.json       # Self-distribution manifest (auto-generated)
├── skills/                    # 23 skills: 9 verb + 14 knowledge
├── agents/                    # 7 specialized subagents
├── hooks/
│   ├── hooks.json             # PostToolUse routing table
│   └── hook_dispatcher.py     # Thin entry point — delegates to scripts.hook_handler
├── scripts/                   # Python package (importable as `scripts.X`)
│   ├── lint_skill.py          # SKILL.md linter (in-process from hook)
│   ├── lint_plugin.py         # Plugin linter (manifest + layout + security)
│   ├── scaffold.py            # Skill/plugin tree writer
│   ├── package_plugin.py      # Skill→plugin wrapper + archive packager
│   ├── generate_marketplace_json.py
│   ├── run_eval.py            # Trigger-rate eval (claude -p × N runs)
│   ├── run_loop.py            # Description-optimization loop
│   ├── improve_description.py # `claude -p` prompt for a better description
│   ├── aggregate_benchmark.py # Per-run grading → benchmark.json/md
│   ├── validate_workspace.py  # Pre-aggregation sanity check
│   ├── hook_handler.py        # Hook event dispatcher (called by entry point)
│   └── utils.py               # Shared: parse_frontmatter, lint report,
│                              # name validation, plugin-root discovery,
│                              # SKILL.md description read/replace.
├── eval-viewer/
│   ├── generate_review.py     # HTTP server + static HTML generator
│   ├── viewer.html            # Standalone viewer (XSS-safe escapeHtml + textContent)
│   └── trigger_review.html    # Eval-review trigger template
├── bin/
│   └── skillsmith             # CLI wrapper (PATH-exposed)
├── tests/                     # 176 tests covering every testable component
├── evals/                     # Self-eval suite + fixtures
│   ├── evals.json             # 8 prompts: create, audit, lint, eval, …
│   ├── trigger-evals.json     # 18 queries for description optimization
│   └── fixtures/              # broken-skill + pdf-helper fixtures
├── audits/                    # Closed audit reports (per-finding verifications)
├── pyproject.toml             # ruff + pytest config
├── CHANGELOG.md
└── DEFERRED.md                # Items deferred by design, with reasoning

Module-or-script invocation

Every entry point under scripts/ supports three invocation styles:

python -m scripts.lint_skill <path>        # package-style (canonical)
python scripts/lint_skill.py <path>        # direct file invocation
skillsmith lint <path>                     # via the CLI dispatcher

The scripts/ package contains a five-line bootstrap that re-anchors __package__ and adds the plugin root to sys.path when the file is run directly. No noqa markers anywhere in the codebase.
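
One common way to write such a bootstrap follows the PEP 366 pattern. The sketch below wraps it in a function so it can be exercised in isolation; the real bootstrap is presumably inlined at the top of a script, and the function name and the hard-coded "scripts" package name are assumptions.

```python
import sys
from pathlib import Path


def anchor_package(module_globals: dict, package: str = "scripts") -> None:
    """If a module was run as a plain file, make package-style imports work.

    Only acts when the module is __main__ and has no __package__, which is
    the situation under `python scripts/lint_skill.py`.
    """
    if module_globals.get("__name__") != "__main__" or module_globals.get("__package__"):
        return
    # scripts/<file>.py -> plugin root is two levels up from the file.
    root = str(Path(module_globals["__file__"]).resolve().parents[1])
    if root not in sys.path:
        sys.path.insert(0, root)
    module_globals["__package__"] = package
```

Under python -m scripts.lint_skill the interpreter has already set __package__, so the function returns without touching anything.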

Hook safety

Every hook script ends with finally: sys.exit(0) (pattern forked from hookify). A broken rule or crashed handler can never block your tool calls. If a systemMessage from skillsmith reports an internal error, the hook failed open — your work continues unaffected.
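
The fail-open pattern is short enough to show in full. This is a minimal sketch of the idea, not the actual handler code: main and the handler argument are hypothetical names.

```python
import sys


def main(handler) -> None:
    """Run a hook handler, but never let its failure block the tool call."""
    try:
        handler()
    except Exception as exc:
        # Report the problem where it will not be mistaken for a verdict.
        print(f"skillsmith hook error: {exc}", file=sys.stderr)
    finally:
        # Runs on every path, including an explicit sys.exit(2) inside the
        # handler: the SystemExit raised here replaces the one in flight.
        sys.exit(0)
```

The trade-off is deliberate: a hook built this way can never veto an action, so it is suited to advisory checks like linting rather than enforcement.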

Development

git clone https://github.com/voidfreud/skillsmith
cd skillsmith

# Self-test
uv run pytest tests/                       # 176 tests, ~2s
uvx ruff check .                           # all checks must pass
uvx ruff format --check .                  # formatting must be clean
uv run python -m scripts.lint_plugin .     # the plugin must lint itself clean
claude plugin validate . --strict          # the manifest must pass strict

A passing self-test means:

The plugin manifest validates strict.
Every shipped skill lints clean (regression-guarded in test_lint_skill.py).
The trigger-detection heuristic correctly identifies the 7 documented Claude invocation signals (test_run_eval.py).
The scaffolder rejects path-traversal names and budgets descriptions under the 1024-char cap (test_scaffold.py).
The packager ships archives with the manifest name as the top-level dir, drops .claude/ and symlinks (test_package_plugin.py).
Every script's argparse --help resolves without import errors (test_skillsmith_cli.py).
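
The packager guarantees in the list above can be illustrated with the standard zipfile module. This is a sketch under assumptions, not package_plugin.py: build_archive is a hypothetical name, and it assumes the manifest lives at .claude-plugin/plugin.json with a name field.

```python
import json
import os
import zipfile
from pathlib import Path


def build_archive(plugin_dir: Path, output: Path) -> list:
    """Zip plugin_dir under a top-level dir named after the manifest."""
    manifest_path = plugin_dir / ".claude-plugin" / "plugin.json"
    top = json.loads(manifest_path.read_text(encoding="utf-8"))["name"]
    written = []
    with zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as zf:
        # os.walk does not descend into symlinked directories by default.
        for dirpath, dirnames, filenames in os.walk(plugin_dir):
            # Prune .claude/ in place; .claude-plugin/ is a different name and stays.
            dirnames[:] = sorted(d for d in dirnames if d != ".claude")
            for filename in sorted(filenames):
                path = Path(dirpath) / filename
                if path.is_symlink():
                    continue  # drop symlinked files
                arcname = f"{top}/{path.relative_to(plugin_dir).as_posix()}"
                zf.write(path, arcname)
                written.append(arcname)
    return written
```

Taking the top-level name from the manifest rather than from plugin_dir.name is what keeps a checkout directory such as a worktree alias out of the shipped archive.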

What's deferred

DEFERRED.md lists everything not shipped in the current release, with the reasoning for each item. The short version: nothing is "we didn't get to it"; each entry is a deliberate choice. The rule engine is deferred until user-customizable lint rules exist (with no rules, there is nothing for an engine to run). The live end-to-end eval run is deferred because it costs real model time (the framework ships; the run is a user-maintenance task). A few overlaps in skill responsibilities are kept deliberately so that each skill is self-contained when invoked.

Acknowledgements

Skillsmith forks code and patterns from the official Anthropic plugin examples:

plugin-dev — skill-reviewer, plugin-validator, the 8-phase create-plugin workflow.
hookify — the finally: sys.exit(0) runtime safety pattern, single-dispatcher design, the writing-rules operator catalog.
feature-dev — parallel agent fan-out pattern.
code-review — two-pass confidence gating.
claude-code-skill-factory — the interview-first scaffolding flow.

License

MIT. See LICENSE.
