Skip to content

voidfreud/skillsmith

Repository files navigation

Skillsmith

The master toolkit for building production-quality skills and plugins for Claude Code.

Skillsmith is the plugin you install before you build your first plugin. It pairs an interview-driven scaffolder with a strict linter that knows every rule from the official Anthropic docs, a description-optimization loop that runs against a real eval set, a packager that produces marketplace-ready archives, and an HTML eval viewer that turns raw run output into something you can actually read. Every workflow ships with a slash command. Every hot path is tested. Every hook fails open.

If you've ever shipped a skill that mysteriously failed to trigger, scaffolded a plugin only to discover the manifest didn't validate, or written a description that the linter quietly disagreed with — Skillsmith is the answer.


Why Skillsmith

  • Built from the docs, not from vibes. Every rule the linter checks comes from code.claude.com/docs/skills, code.claude.com/docs/plugins-reference, code.claude.com/docs/hooks-guide, or a publicly-cited bug (Issue #56246, #24328, #46786, #38480). When the docs change, Skillsmith changes.
  • The hooks are real. Save a SKILL.md, the linter runs in-process and you get findings in milliseconds. Save a plugin.json, claude plugin validate --strict runs and you see the result. No nags, no theatre.
  • The eval loop actually works. Split your eval set 60/40 train/test, propose a new description with claude -p, re-evaluate, return the best by test score so you don't overfit. Stop early when no improvement is proposed.
  • The packager produces archives you can ship. Top-level dir comes from the manifest name (not a worktree alias), .claude/ is excluded, symlinks dropped, marketplace.json auto-generated.
  • The whole codebase is tested. 176 unit and integration tests covering parsing, linting, scaffolding, eval-trigger detection, hook handlers, the CLI, the eval viewer, and every error path that could matter.
  • Ruff-clean, zero noqa, zero TODO comments. What you read is what runs.

Install

From marketplace

claude plugin marketplace add voidfreud/skillsmith
claude plugin install skillsmith@skillsmith

From a local checkout

git clone https://github.com/voidfreud/skillsmith ~/code/skillsmith
claude --plugin-dir ~/code/skillsmith

The CLI (bin/skillsmith) is added to your PATH whenever the plugin is enabled in a Claude Code session.


Quick start

# Inside Claude Code, after install:
/skillsmith                          # interactive router — recommends the next move
/skillsmith:create my-new-skill      # interview → scaffold → lint
/skillsmith:audit ~/.claude/skills/X # structured quality audit
/skillsmith:lint .                   # fast schema check on current dir
/skillsmith:package                  # wrap a skill as a distributable plugin
/skillsmith:publish                  # generate marketplace.json + push tag

# From the shell, anywhere:
skillsmith lint <skill-dir>          # `bin/skillsmith` is on PATH when plugin is active
skillsmith lint-plugin <plugin-dir>
skillsmith review <workspace>        # opens the HTML eval viewer

What it ships

9 verb skills (user-invoked via /skillsmith:*)

Slash command What it does
/skillsmith Interactive router — assesses your repo and recommends the next action.
/skillsmith:create Interview-first scaffolder: 4–7 targeted questions, then writes a valid skill or plugin tree.
/skillsmith:audit Structured deep audit of an existing skill or plugin against the full ruleset.
/skillsmith:lint Fast schema check — frontmatter, caps, names, anti-patterns. Exits 0/1 for CI.
/skillsmith:eval Run the canonical eval workflow (with-skill vs without-skill, parallel subagents).
/skillsmith:optimize Description optimization loop: train/test split, propose, re-evaluate, keep best.
/skillsmith:package Wrap a skill in a distributable plugin scaffold; emit a clean zip.
/skillsmith:publish Generate marketplace.json against the current git origin, ready for claude plugin marketplace add.
/skillsmith:install Walk through installation flow + verify with claude plugin details.

14 knowledge skills (load-on-demand references)

Every knowledge skill carries disable-model-invocation: true, so it's available for explicit reference but doesn't burn always-on tokens unprompted.

  • skill-anatomy — the full layout of a SKILL.md
  • plugin-anatomy — what goes where in a Claude Code plugin
  • progressive-disclosure — the 500/5,000/25,000-token loading model
  • description-writing — the rules (third person, 1024-char cap, what+when)
  • description-optimization — the mechanics of running run_loop.py
  • writing-rules — operator catalog + negative-example library (forked from hookify)
  • hook-authoring — hooks.json schema, fail-open patterns, recursion safety
  • slash-command-authoring — invocation syntax, argument-hint conventions
  • subagent-skills — when to spawn a subagent vs inline a skill
  • eval-workflow — the canonical evals.json / eval_metadata.json / grading.json schema
  • packaging-distribution — marketplace.json schema, install paths, versioning
  • environmentsclaude-code / claude-ai / cowork differences, $CLAUDE_HARNESS
  • anti-patterns — the 12 documented anti-patterns to avoid
  • troubleshooting — common failure modes and their fixes

7 specialized agents

  • skill-reviewer — quality review of an individual SKILL.md against the full ruleset (forked from plugin-dev, extended).
  • plugin-validator — end-to-end plugin validation: manifest, layout, security, anti-patterns (forked from plugin-dev, extended).
  • interviewer — runs the 4–7-question intent-capture session before scaffolding. Returns a structured spec.
  • scaffolder — takes the interviewer's spec and idempotently writes the file tree, runs the linter to confirm.
  • grader — reads <run-dir>/outputs/, grades against the eval's declared expectations, writes grading.json.
  • benchmark-analyzer — surfaces patterns the raw stats hide (non-discriminating expectations, flaky evals, time/token tradeoffs).
  • post-iteration-analyzer — explains why a change helped or hurt between iterations, with actionable guidance for the next round.

Hooks

A single PostToolUse hook in hooks/hooks.json routes through hooks/hook_dispatcher.py to two handlers:

  • On SKILL.md save → lint_skill runs in-process (no subprocess overhead) and surfaces findings as a systemMessage.
  • On .claude-plugin/plugin.json save → claude plugin validate --strict runs (8s timeout) and surfaces the result.

Both are toggled by userConfig flags (auto_review_on_save, auto_validate_manifest). Both fail open: a broken handler emits a diagnostic to stderr and exits 0.

CLI (bin/skillsmith)

A thin Python dispatcher added to PATH while the plugin is enabled. Subcommands route to python -m scripts.<module> with cwd set to the plugin root.

skillsmith lint <skill-path>
skillsmith lint-plugin <plugin-path>
skillsmith validate-workspace <iteration-dir>
skillsmith aggregate <iteration-dir> --skill-name <name>
skillsmith package --skill-path <p> --plugin-name <n> --output <o>
skillsmith package --archive --plugin-path <p> --output <p>.zip
skillsmith marketplace --plugin-path <p>
skillsmith scaffold --spec <spec.json> --target <dir>
skillsmith run-eval --eval-set <f> --skill-path <s> --skill-name <n>
skillsmith run-loop --eval-set <f> --skill-path <s> [--max-iterations N]
skillsmith improve-description --skill-name <n> --current-description <s> --eval-results <r>
skillsmith review <workspace> [--static <out>]
skillsmith trigger-eval-template
skillsmith root

Eval viewer

eval-viewer/generate_review.py reads a workspace directory, finds every run (any directory with an outputs/ subdir), embeds the output files (text inline, images/PDF/xlsx as base64 data URIs), and serves a self-contained HTML page. Feedback auto-saves to feedback.json. Render with skillsmith review <workspace>.


Configuration

Set at install time via Claude Code's userConfig prompt:

Setting Default Purpose
default_author_name Void Freud Written into scaffolded plugin.json files.
default_author_email voidfreud@users.noreply.github.com Same.
auto_review_on_save true Trigger lint_skill on SKILL.md save.
auto_validate_manifest true Trigger claude plugin validate --strict on plugin.json save.
default_eval_iterations 3 How many run_loop iterations before declaring a skill ready.
default_target_env claude-code Default target environment for new skills.
workspace_root ~/code Where eval workspaces are placed.

Change them at any time in ~/.claude/settings.json under plugins.skillsmith.userConfig.


Architecture

skillsmith/
├── .claude-plugin/
│   ├── plugin.json            # Manifest with userConfig schema
│   └── marketplace.json       # Self-distribution manifest (auto-generated)
├── skills/                    # 23 skills: 9 verb + 14 knowledge
├── agents/                    # 7 specialized subagents
├── hooks/
│   ├── hooks.json             # PostToolUse routing table
│   └── hook_dispatcher.py     # Thin entry point — delegates to scripts.hook_handler
├── scripts/                   # Python package (importable as `scripts.X`)
│   ├── lint_skill.py          # SKILL.md linter (in-process from hook)
│   ├── lint_plugin.py         # Plugin linter (manifest + layout + security)
│   ├── scaffold.py            # Skill/plugin tree writer
│   ├── package_plugin.py      # Skill→plugin wrapper + archive packager
│   ├── generate_marketplace_json.py
│   ├── run_eval.py            # Trigger-rate eval (claude -p × N runs)
│   ├── run_loop.py            # Description-optimization loop
│   ├── improve_description.py # `claude -p` prompt for a better description
│   ├── aggregate_benchmark.py # Per-run grading → benchmark.json/md
│   ├── validate_workspace.py  # Pre-aggregation sanity check
│   ├── hook_handler.py        # Hook event dispatcher (called by entry point)
│   └── utils.py               # Shared: parse_frontmatter, lint report,
│                              # name validation, plugin-root discovery,
│                              # SKILL.md description read/replace.
├── eval-viewer/
│   ├── generate_review.py     # HTTP server + static HTML generator
│   ├── viewer.html            # Standalone viewer (XSS-safe escapeHtml + textContent)
│   └── trigger_review.html    # Eval-review trigger template
├── bin/
│   └── skillsmith             # CLI wrapper (PATH-exposed)
├── tests/                     # 176 tests covering every testable component
├── evals/                     # Self-eval suite + fixtures
│   ├── evals.json             # 8 prompts: create, audit, lint, eval, …
│   ├── trigger-evals.json     # 18 queries for description optimization
│   └── fixtures/              # broken-skill + pdf-helper fixtures
├── audits/                    # Closed audit reports (per-finding verifications)
├── pyproject.toml             # ruff + pytest config
├── CHANGELOG.md
└── DEFERRED.md                # Items deferred by design, with reasoning

Module-or-script invocation

Every entry point under scripts/ supports three invocation styles:

python -m scripts.lint_skill <path>        # package-style (canonical)
python scripts/lint_skill.py <path>        # direct file invocation
skillsmith lint <path>                     # via the CLI dispatcher

The scripts/ package contains a five-line bootstrap that re-anchors __package__ and adds the plugin root to sys.path when the file is run directly. No noqa markers anywhere in the codebase.

Hook safety

Every hook script ends with finally: sys.exit(0) (pattern forked from hookify). A broken rule or crashed handler can never block your tool calls. If a systemMessage from skillsmith reports an internal error, the hook failed open — your work continues unaffected.


Development

git clone https://github.com/voidfreud/skillsmith
cd skillsmith

# Self-test
uv run pytest tests/                       # 176 tests, ~2s
uvx ruff check .                           # all checks must pass
uvx ruff format --check .                  # formatting must be clean
uv run python -m scripts.lint_plugin .     # the plugin must lint itself clean
claude plugin validate . --strict          # the manifest must pass strict

A passing self-test means:

  • The plugin manifest validates strict.
  • Every shipped skill lints clean (regression-guarded in test_lint_skill.py).
  • The trigger-detection heuristic correctly identifies the 7 documented Claude invocation signals (test_run_eval.py).
  • The scaffolder rejects path-traversal names and budgets descriptions under the 1024-char cap (test_scaffold.py).
  • The packager ships archives with the manifest name as the top-level dir, drops .claude/ and symlinks (test_package_plugin.py).
  • Every script's argparse --help resolves without import errors (test_skillsmith_cli.py).

What's deferred

DEFERRED.md lists everything not shipped in the current release with reasoning. The short version: nothing is "we didn't get to it". Each entry is a deliberate choice — the rule engine is deferred until user-customizable lint rules exist (no rules → nothing to engine over), the live end-to-end eval run is deferred because it costs real model time (the framework ships, the run is a user-maintenance task), and a few skill responsibility overlaps are kept deliberately to make each skill self-contained when invoked.


Acknowledgements

Skillsmith forks code and patterns from the official Anthropic plugin examples:

  • plugin-devskill-reviewer, plugin-validator, the 8-phase create-plugin workflow.
  • hookify — the finally: sys.exit(0) runtime safety pattern, single-dispatcher design, the writing-rules operator catalog.
  • feature-dev — parallel agent fan-out pattern.
  • code-review — two-pass confidence gating.
  • claude-code-skill-factory — the interview-first scaffolding flow.

License

MIT. See LICENSE.

About

Master Claude Code plugin for creating, auditing, evaluating, optimizing, and packaging skills and plugins

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors