14 specialist agents · 20+ slash-command workflows · 5 domain plugins: an opinionated Claude Code + Codex CLI configuration for Python/ML OSS maintainers, version-controlled and self-calibrating.
Things not possible with vanilla Claude Code:
- Parallel multi-specialist PR review with convergence callouts.
  /oss:review fans out to six specialist agents (architecture, tests, perf, docs, lint, security) plus an independent Codex pre-pass, all running simultaneously. The consolidator flags every finding that two or more reviewers independently raised. You see both per-dimension analysis and the overlap, in one report.
- Feature development that cannot skip the demo test.
  /develop:feature requires a failing demo test to exist and pass review before a single line of production code is written. The gate is structural: the workflow does not proceed to implementation without it.
- Metric-driven experiment loops that auto-rollback on regression.
  /research:run proposes a change, applies it, measures the target metric, and automatically reverts if the metric regresses, then tries the next hypothesis. The loop runs unattended; you set the goal and the guard, and review the committed result.
- Agent calibration benchmarks that measure overconfidence and fix it.
  /foundry:calibrate generates synthetic problems, scores each agent's responses against ground truth, and computes the gap between stated confidence and actual recall. Agents that are systematically overconfident get concrete fix proposals, applied automatically with --apply.
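As a rough illustration of the convergence callouts described in the first item, the sketch below groups review findings by location and keeps only those raised by two or more distinct reviewers. The `Finding` type, its fields, and the `(file, line)` matching key are assumptions made for this example; the actual consolidator may match findings in a different way.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    reviewer: str  # hypothetical reviewer label, e.g. "perf" or "security"
    file: str
    line: int
    message: str


def convergent_findings(
    findings: list[Finding], min_reviewers: int = 2
) -> dict[tuple[str, int], list[Finding]]:
    """Keep locations flagged by at least `min_reviewers` distinct reviewers."""
    by_location: dict[tuple[str, int], list[Finding]] = defaultdict(list)
    for finding in findings:
        by_location[(finding.file, finding.line)].append(finding)
    return {
        location: group
        for location, group in by_location.items()
        if len({f.reviewer for f in group}) >= min_reviewers
    }


findings = [
    Finding("perf", "cache.py", 42, "lock held during I/O"),
    Finding("security", "cache.py", 42, "race on invalidation"),
    Finding("docs", "api.py", 7, "missing docstring"),
]
overlap = convergent_findings(findings)
print(sorted(overlap))  # [('cache.py', 42)]
```

Counting distinct reviewers, rather than raw findings, means one reviewer repeating itself on the same line does not count as convergence.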
| Capability | Vanilla Claude Code | Borda's AI-Rig |
|---|---|---|
| Code review | Generalist single pass | 6 specialists in parallel + Codex pre-pass; convergence callouts |
| Context flooding | Context fills up across long sessions | File-based handoff: agents write full output to disk, return compact envelopes |
| Confidence calibration | No mechanism | /foundry:calibrate benchmarks recall vs stated confidence; auto-apply fixes |
| Demo-test gate | Skippable | Structural gate: /develop:feature cannot proceed without passing demo test |
| ML experiment safety | Manual rollback | /research:run auto-reverts regressions; goal + guard are explicit inputs |
| Release discipline | Manual | SemVer-aware /oss:release with deprecation tracking, migration guide, readiness audit |
| Token efficiency | Default verbosity | RTK hook compresses Bash output 60–99%; caveman plugin cuts response tokens ~75% |
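The "ML experiment safety" row can be pictured with a small in-memory loop: each hypothesis is applied to a copy of the configuration, measured against the goal metric, and kept only if the metric improves and the guard holds. This is a minimal sketch under stated assumptions; `run_loop`, `measure`, and `guard` are hypothetical names, and the real /research:run works on a repository rather than a dictionary.

```python
from typing import Callable

Config = dict[str, float]


def run_loop(
    config: Config,
    hypotheses: list[Callable[[Config], Config]],
    measure: Callable[[Config], float],
    guard: Callable[[Config], bool],
) -> tuple[Config, float]:
    """Try each hypothesis; keep it only if the metric improves and the guard holds."""
    best = measure(config)
    for propose in hypotheses:
        candidate = propose(dict(config))  # work on a copy so a revert is just a discard
        score = measure(candidate)
        if score > best and guard(candidate):
            config, best = candidate, score  # commit the improvement
        # otherwise the candidate is dropped, which plays the role of the auto-revert
    return config, best


def measure(config: Config) -> float:
    """Toy goal: higher is better, with the optimum at lr = 0.1."""
    return -((config["lr"] - 0.1) ** 2)


def guard(config: Config) -> bool:
    """Toy guard: the learning rate must stay positive."""
    return config["lr"] > 0


hypotheses = [
    lambda c: {**c, "lr": 0.5},   # regresses, reverted
    lambda c: {**c, "lr": 0.12},  # improves, kept
    lambda c: {**c, "lr": -0.1},  # regresses and violates the guard, reverted
]
final, score = run_loop({"lr": 0.2}, hypotheses, measure, guard)
print(final)  # {'lr': 0.12}
```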
# Install Claude Code
npm install -g @anthropic-ai/claude-code
# 1. Clone (run from the directory that will CONTAIN the clone)
git clone https://github.com/Borda/AI-Rig Borda-AI-Rig
# 2. Register as a local marketplace
claude plugin marketplace add ./Borda-AI-Rig
# 3. Install all five plugins
claude plugin install foundry@borda-ai-rig # base agents + audit, manage, calibrate, brainstorm, …
claude plugin install oss@borda-ai-rig # OSS workflow: analyse, review, resolve, release
claude plugin install develop@borda-ai-rig # development: feature, fix, refactor, plan, debug
claude plugin install research@borda-ai-rig # ML research: topic, plan, judge, run, sweep
claude plugin install codemap@borda-ai-rig # structural index: import graph, blast-radius scores
Note
Safe to install alongside any existing Claude Code setup. Plugins live in a private cache (~/.claude/plugins/cache/<plugin>/) under their own namespace. Your existing ~/.claude/agents/, ~/.claude/skills/, and settings.json are never modified or overwritten; custom agents and skills you have created remain fully independent. See the Claude Code plugin reference for details.
4. One-time settings merge (run inside Claude Code):
/foundry:init
OSS, develop, and research skills always use their plugin prefix (/oss:review, /develop:fix, /research:run). /foundry:init is safe to re-run.
Important
Codex CLI is an optional companion; the plugins install Claude Code agents and skills only:
npm install -g @openai/codex
cp -r Borda-AI-Rig/.codex/ ~/.codex/ # Codex agents and profiles
→ See Token Savings (RTK) for RTK install details.
A typical maintainer morning (15 new issues, 3 PRs waiting, a release due):
# 1. Morning triage: what needs attention?
/oss:analyse health # repo overview, duplicate issue clustering, stale PR detection
# 2. Review incoming PRs
/oss:review 55 --reply # 7-agent review + welcoming contributor comment
# or: full review first, then apply every finding in one automated pass
/oss:review 21 # 7-agent review → saved findings report
/oss:resolve 21 report # Codex reads the report and applies every comment
# 3. Fix the critical bug from overnight
/oss:analyse 42 # understand the issue
/develop:fix 42 # reproduce → regression test → minimal fix → quality stack
# 4. Ship the release
/oss:release prepare v2.1.0 # changelog, notes, migration guide, readiness audit
Each command chains agents in a defined topology; see Common Workflow Sequences below for more patterns.
Without AI-Rig: one generalist handles architecture, implementation, documentation, linting, testing, and performance with no boundary enforcement. A PR review misses the cache race condition because nobody ran the right checklist. The release gets the wrong SemVer bump because nobody counted the breaking changes. ML experiments run without a judge gate and silently fail to improve anything. Corrections evaporate between sessions.
With AI-Rig: each part of the loop has a dedicated skill backed by a calibrated specialist agent. The agents know your conventions, enforce discipline at every gate, and feed corrections back into their own instructions. The feedback loop is closed.
Managing AI coding workflows for Python/ML OSS is complex: you need domain-aware agents, not generic chat. This config packages 14 specialist agents and 20+ slash-command skill workflows across five focused plugins, in a version-controlled, continuously benchmarked setup optimized for:
- Python/ML OSS libraries requiring SemVer discipline and deprecation cycles
- ML training and inference codebases needing GPU profiling and data pipeline validation
- Multi-contributor projects with CI/CD, pre-commit hooks, and automated releases
- Agents are roles, skills are workflows: agents carry domain expertise, skills orchestrate multi-step processes
- No duplication: agents reference each other instead of repeating content
- Profile-first, measure-last: performance skills always bracket changes with measurements
- Link integrity: never cite a URL without fetching it first (enforced in all research agents)
- Python 3.10+ baseline: all configs target py310 minimum (3.9 EOL was Oct 2025)
- Modern toolchain: uv, ruff, mypy, pytest, GitHub Actions with trusted publishing
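To make the "profile-first, measure-last" principle concrete, here is a minimal sketch of bracketing a change with measurements: time the baseline, time the candidate, and report the ratio. The helper names (`time_call`, `bracket`) and the toy workloads are illustrative assumptions, not part of the rig; the performance skills may rely on proper profilers instead of wall-clock timing.

```python
import time
from typing import Callable


def time_call(fn: Callable[[], object], repeat: int = 5) -> float:
    """Return the best wall-clock time in seconds over `repeat` runs of fn."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def bracket(
    baseline: Callable[[], object], candidate: Callable[[], object], repeat: int = 5
) -> dict[str, float]:
    """Measure before and after a change and report the speedup ratio."""
    before = time_call(baseline, repeat)
    after = time_call(candidate, repeat)
    speedup = before / after if after > 0 else float("inf")
    return {"before_s": before, "after_s": after, "speedup": speedup}


def slow_sum(n: int = 50_000) -> int:
    """Baseline: explicit loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def fast_sum(n: int = 50_000) -> int:
    """Candidate: closed-form sum of 0..n-1."""
    return n * (n - 1) // 2


report = bracket(slow_sum, fast_sum)
print(f"before={report['before_s']:.6f}s after={report['after_s']:.6f}s")
```

Taking the best of several runs reduces noise from other processes; the correctness check (`slow_sum() == fast_sum()`) belongs in the test suite, separate from the timing.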
14 specialist agents
Specialist roles with deep domain knowledge, requested by name or auto-selected by Claude Code and Codex CLI.
| Agent | Claude [plugins] | Codex | Purpose |
|---|---|---|---|
| doc-scribe | π foundry | β | Google/Napoleon docstrings, Sphinx/mkdocs, API references |
| linting-expert | π foundry | β | ruff, mypy, pre-commit, type annotations |
| perf-optimizer | π foundry | β | Profile-first CPU/GPU/memory/I/O, torch.compile |
| qa-specialist | π foundry | β | pytest, hypothesis, mutation testing, ML test patterns |
| curator | π foundry | β | Config quality review, duplication detection, cross-ref audit |
| solution-architect | π foundry | β | System design, ADRs, API surface, migration plans |
| sw-engineer | π foundry | β | Architecture, implementation, SOLID principles, type safety |
| web-explorer | π foundry | β | API version comparison, migration guides, PyPI tracking |
| challenger | π foundry | β | Adversarial plan/architecture/code review; default-on in all develop skills + oss:review (--no-challenge to skip) |
| creator | π foundry | β | Blog posts, Marp slide decks, social threads, talk abstracts β four-beat narrative arc (ProblemβJourneyβInsightβAction) calibrated to audience; reads /foundry:create outline files |
| cicd-steward | π’ oss | β | GitHub Actions, test matrices, flaky test detection, caching |
| shepherd | π’ oss | β | Issue triage, PR review, SemVer, releases, trusted publishing |
| data-steward | π£ research | β | Dataset versioning, split validation, leakage detection |
| scientist | π£ research | β | Paper analysis, hypothesis generation, experiment design |
Agents and skills for Claude Code (Anthropic's AI coding CLI).
20+ slash-command skills reference
Skills are multi-agent workflows invoked via slash commands. Each skill composes several agents in a defined topology.
After running /foundry:init, foundry skills are available without a prefix. OSS, develop, and research skills always use their plugin prefix.
| Skill | What It Does |
|---|---|
| /foundry:brainstorm | /brainstorm <idea>: clarifying questions → approaches → spec → curator review → approval gate; breakdown <spec>: ordered task table with per-task skill tags |
| /foundry:manage | Create, update, delete agents/skills/rules; manage settings.json permissions; auto type-detection and cross-ref propagation |
| /foundry:investigate | Systematic diagnosis for unknown failures (env, tools, hooks, CI divergence); ranks hypotheses and hands off to the right skill |
| /foundry:session | Parking lot for diverging ideas: auto-parks unanswered questions and deferred threads; resume shows pending, archive closes, summary digests the session |
| /foundry:audit | Config audit: broken refs, inventory drift, docs freshness; fix level chosen from always-fire follow-up gate; --upgrade applies docs-sourced improvements; --adversarial runs challenger + Codex review |
| /foundry:calibrate | Synthetic benchmarks measuring recall vs confidence bias |
| /foundry:distill | Suggest new agents/skills, prune memory, consolidate lessons into rules; external <source> analyses an external plugin/skill/agent resource and produces a scored adoption proposal with install-as-is recommendation |
| /foundry:create | Interactive outline co-creation for developer advocacy content (format, audience, arc, voice) → .plans/content/<slug>-outline.md; hand-off to foundry:creator for one-shot generation |
| /develop:plan | Scope analysis and implementation planning without code changes |
| /develop:feature | TDD-first feature implementation: codebase analysis, demo test, TDD loop, docs, review |
| /develop:fix | Reproduce-first bug fixes: regression test, minimal fix, quality stack |
| /develop:debug | Systematic debugging for known test failures |
| /develop:refactor | Test-first refactors with scope analysis |
| /develop:review | Six-agent parallel review of local files or current git diff; no GitHub PR needed |
| /oss:analyse | GitHub thread analysis; health = repo overview + duplicate issue clustering |
| /oss:review | Tiered parallel review of GitHub PRs; --reply drafts welcoming contributor comments |
| /oss:resolve | OSS fast-close: resolving conflicts + applying review comments via codex-plugin-cc; three source modes: pr, report, pr + report |
| /oss:release | SemVer-disciplined release pipeline: notes, changelog with deprecation tracking, migration guides, full prepare pipeline |
| /research:topic | SOTA literature research with codebase-mapped implementation plan |
| /research:plan | Config wizard: profile-first bottleneck discovery → program.md |
| /research:judge | Research-supervisor review of experimental methodology (APPROVED/NEEDS-REVISION/BLOCKED) |
| /research:run | Metric-driven iteration loop; --resume continues after crash; --team for parallel exploration; --colab for GPU workloads |
| /research:sweep | Non-interactive pipeline: auto-plan → judge gate → run |
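The "recall vs confidence" measurement behind /foundry:calibrate can be sketched as follows: for each synthetic problem, compare what the agent reported against the planted ground truth, then subtract mean recall from mean stated confidence. The `Trial` structure and the exact gap formula are assumptions for illustration; the real benchmark may weight or aggregate differently.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    expected: set[str]        # ground-truth issues planted in the synthetic problem
    reported: set[str]        # issues the agent actually reported
    stated_confidence: float  # agent's self-reported confidence in [0, 1]


def recall(trial: Trial) -> float:
    """Fraction of ground-truth issues the agent found."""
    if not trial.expected:
        return 1.0
    return len(trial.expected & trial.reported) / len(trial.expected)


def calibration_gap(trials: list[Trial]) -> float:
    """Mean stated confidence minus mean recall; positive means overconfident."""
    mean_confidence = sum(t.stated_confidence for t in trials) / len(trials)
    mean_recall = sum(recall(t) for t in trials) / len(trials)
    return mean_confidence - mean_recall


trials = [
    Trial({"a", "b", "c", "d"}, {"a", "b", "x"}, 0.9),  # recall 0.5
    Trial({"e", "f"}, {"e", "f"}, 0.8),                 # recall 1.0
]
gap = calibration_gap(trials)
print(f"gap = {gap:+.2f}")  # gap = +0.10
```

A positive gap is the overconfidence signal; a value near zero means stated confidence tracks measured recall.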
→ Full command reference, orchestration flows, rules (10 auto-loaded rule files), architecture internals, status line: see .claude/README.md → Skills
Skills chain naturally: the output of one becomes the input for the next.
Bug report → fix → validate
/oss:analyse 42 # understand the issue, extract root cause hypotheses
/develop:fix 42 # reproduce with test, apply targeted fix
/oss:review # validate the fix meets quality standards
Code review → fix blocking issues
/oss:review 55 # 7 agent dimensions + Codex co-review
/develop:fix "race condition in cache invalidation" # fix blocking issue from review
/oss:review 55 # re-review after fix
Fuzzy idea → spec → breakdown → implement
/foundry:brainstorm "add caching layer to the data pipeline"
# clarifying questions → 2–3 approaches → spec saved to .plans/blueprint/ → curator review → approval
/foundry:brainstorm breakdown .plans/blueprint/2026-04-01-caching-layer.md
# reads spec → ordered task table with per-task skill/command tags:
# | 1 | audit existing pipeline | /foundry:audit |
# | 2 | implement caching layer | /develop:feature |
# | 3 | run quality gates | /develop:review |
# then execute each row in the breakdown table using its tagged skill
OSS contributor PR triage → review → reply
Preferred flow for maintainers responding to external contributions:
/oss:analyse 42 --reply # assess PR readiness + draft contributor reply in one step
# or if you need the full deep review first:
/oss:review 42 --reply # 7-agent + Codex co-review + draft overall comment + inline comments table
# output: .temp/output-reply-pr-42-dev-<date>.md
# post when ready:
gh pr comment 42 --body "$(cat .temp/output-reply-pr-42-dev-<date>.md)"
Both --reply flags produce a two-part shepherd output: an overall PR comment (prose, warm, decisive) and an inline comments table (file | line | 1–2 sentence fix). The /oss:analyse path is faster for routine triage; /oss:review gives deeper findings for complex PRs.
→ More sequences, full orchestration flows, and architecture internals: .claude/README.md
Multi-agent configuration for OpenAI Codex CLI. Default session model is gpt-5.4-mini, with 12 specialist agents and a mirrored skill backbone (review/develop/resolve/audit + calibrate/release/investigate/sync/manage/analyse/optimize/research).
npm install -g @openai/codex # install Codex CLI
cp -r Borda-AI-Rig/.codex/ ~/.codex/ # activate globally (run from parent of clone)
After pulling updates, re-apply: cp -r Borda-AI-Rig/.codex/ ~/.codex/ (or use rsync -av to preserve local customizations).
Mirrored skills are prompt-based, not slash commands:
codex # interactive; auto-selects agents
codex "use the qa-specialist to review src/api/auth.py" # address agent by name
codex --profile deep-review "full security audit of src/api/" # activate a profile
run investigate on this branch and find root cause of failing CI
run resolve for the current working tree and fix high-severity findings
→ Deep reference (agents, profiles, adversarial review, mirrored skills, RTK integration): .codex/README.md
Claude and Codex complement each other: Claude handles long-horizon reasoning, orchestration, and judgment calls; Codex handles focused, mechanical in-repo coding tasks with direct shell access.
Every skill that reviews or validates code uses a three-tier pipeline:
- Tier 0 (mechanical git diff --stat gate)
- Tier 1 (codex:review pre-pass, ~60s, diff-focused)
- Tier 2 (specialized Claude agents)
Cheaper tiers gate the expensive ones, which keeps full agent spawns reserved for diffs that actually need them.
→ Full architecture with skill-tier matrix: .claude/README.md → Tiered review pipeline
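The gating idea behind the three tiers can be sketched as a function that runs the cheapest check first and escalates only when asked to. The escalation rule used here (go to Tier 2 only when the pre-pass returns findings) and all names are assumptions for illustration; the actual skill-tier matrix is documented in .claude/README.md and may differ.

```python
from typing import Callable


def tiered_review(
    changed_lines: int,
    prepass: Callable[[], list[str]],
    full_review: Callable[[], list[str]],
    escalate_if: Callable[[list[str]], bool] = lambda findings: bool(findings),
) -> tuple[str, list[str]]:
    """Run the cheapest tier first; stop as soon as no deeper review is needed."""
    # Tier 0: mechanical gate. An empty diff needs no review at all.
    if changed_lines == 0:
        return "tier0", []
    # Tier 1: quick, diff-focused pre-pass.
    findings = prepass()
    if not escalate_if(findings):
        return "tier1", findings
    # Tier 2: expensive specialist agents, reached only when tier 1 escalates.
    return "tier2", findings + full_review()


tier, findings = tiered_review(
    changed_lines=12,
    prepass=lambda: ["stale docstring style in feature.md"],
    full_review=lambda: ["missing regression test"],
)
print(tier, findings)
```

Passing the tiers in as callables keeps the expensive work lazy: `full_review` is never invoked unless the earlier tiers let the diff through.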
Why unbiased review matters / Real example: Claude makes targeted changes with intentionality: it has a mental model of which files are "in scope". Codex has no such context: it reads the diff and the codebase independently. During one session, Claude applied a docstring-style mandate across 6 files and scored its own confidence at 0.88. The Codex pre-pass then found skills/develop/modes/feature.md still referencing the old style, a direct miss. The union of both passes is more complete than either alone.
- Offloading mechanical tasks from Claude to Codex
  Claude identifies what needs to change and delegates execution to the plugin agent. Claude keeps its context clean and validates the output via git diff HEAD.
  Dispatched automatically by /oss:review, /oss:resolve, /calibrate, and /research:run via codex-delegation.md. The plugin agent has full working-tree access.
- Codex reviewing staged work
  After Claude stages changes, codex:review --wait serves as a second pass: examining the diff, applying review comments, or resolving PR conflicts. The /oss:resolve skill automates this: it resolves conflicts semantically (Claude) then applies review comments (plugin agent).
  /oss:resolve 42 # Claude resolves conflicts → plugin agent applies review comments
  /oss:resolve "rename the `fit` method to `train` throughout the module"
Setup requirement
Install the Codex plugin in Claude Code:
/plugin marketplace add openai/codex-plugin-cc
/plugin install codex@openai-codex
/reload-plugins
Without the plugin: pre-pass review is skipped gracefully (skills check with claude plugin list | grep 'codex@openai-codex'); /oss:resolve's review-comment step is skipped (conflict resolution works with Claude alone).
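The graceful-skip check above amounts to a substring match on the output of `claude plugin list`. A minimal sketch of the same check from Python follows; the function names are hypothetical, and nothing is assumed about the list output beyond it containing the plugin identifier as plain text, which mirrors what `grep` relies on.

```python
import subprocess

CODEX_PLUGIN_ID = "codex@openai-codex"


def has_codex_plugin(list_output: str) -> bool:
    """Mirror the grep: true if any line mentions the plugin identifier."""
    return any(CODEX_PLUGIN_ID in line for line in list_output.splitlines())


def codex_plugin_installed() -> bool:
    """Run `claude plugin list`; treat a missing CLI or a failure as not installed."""
    try:
        result = subprocess.run(
            ["claude", "plugin", "list"],
            capture_output=True,
            text=True,
            check=False,
            timeout=30,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and has_codex_plugin(result.stdout)


# The sample text below is made up for the example, not real CLI output.
print(has_codex_plugin("foundry@borda-ai-rig\ncodex@openai-codex"))  # True
```

Returning `False` when the CLI is absent matches the "skipped gracefully" behaviour: callers skip the pre-pass instead of failing.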
RTK is an optional CLI proxy that compresses Bash output (git, pytest, build tools) before it reaches Claude, for 60–99% token savings with no workflow changes. A PreToolUse hook (plugins/foundry/hooks/rtk-rewrite.js) transparently rewrites supported commands across all Claude skills; Codex runs get the same treatment via .codex/hooks/rtk-enforce.js. The hook is a no-op when RTK is not installed, so the config stays portable.
→ Install instructions: rtk-ai/rtk
openai/codex-plugin-cc connects the Codex CLI to Claude Code as a local plugin, enabling the cross-validation, mechanical delegation, and diff pre-pass described in Claude + Codex Integration.
→ Install: /plugin marketplace add openai/codex-plugin-cc → /plugin install codex@openai-codex → /reload-plugins
Note
RTK only compresses Bash tool output (shell commands like git, cargo, pytest, etc.). It does not affect Claude Code's native tools (Read, Grep, Glob, Edit, Write), which run inside Claude's own engine and are already token-efficient by design.
cc-Lens is a local analytics dashboard for Claude Code: token/cost trends, tool usage breakdowns, session replay. Reads ~/.claude/ directly, no cloud, no data leaves the machine.
→ Run: npx cc-lens (no install required)
colab-mcp connects Google Colab as a remote GPU executor. Pre-configured in .mcp.json (disabled by default); used by /research:run --colab to offload metric-improvement iterations to a cloud GPU without a local CUDA setup. Supports hardware selection: --colab=H100, --colab=L4, --colab=T4, --colab=A100.
→ Enable: add "colab-mcp" to enabledMcpjsonServers in settings.local.json
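The enable step above is a one-key change to a JSON file. The sketch below shows it as an idempotent merge on an already-loaded settings dictionary; the helper name and the sample starting content are assumptions, and only the `enabledMcpjsonServers` key and the `"colab-mcp"` value come from the text.

```python
import json


def enable_mcp_server(settings: dict, name: str) -> dict:
    """Return a copy of settings with `name` listed in enabledMcpjsonServers."""
    updated = dict(settings)
    servers = list(updated.get("enabledMcpjsonServers", []))
    if name not in servers:  # idempotent: a second call changes nothing
        servers.append(name)
    updated["enabledMcpjsonServers"] = servers
    return updated


# Hypothetical existing content of settings.local.json, loaded via json.load().
settings = {"permissions": {"allow": []}}
settings = enable_mcp_server(settings, "colab-mcp")
print(json.dumps(settings, indent=2))
```

Working on a copy and writing the result back with `json.dump` leaves the other keys in the file untouched.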
semble runs a local MCP server that adds hybrid semantic + lexical search across any repo. When available, the develop and oss skills automatically expose mcp__semble__search to agents as a gap-fill tool, used when the codemap index is non-exhaustive. No cloud, no API key; runs fully local via uvx.
→ Install (global, all projects): claude mcp add semble -s user -- uvx --from "semble[mcp]" semble
→ Install (this project only): claude mcp add semble -s project -- uvx --from "semble[mcp]" semble
caveman makes Claude respond in compressed "caveman speak", cutting ~75% of output tokens while retaining full technical accuracy. Adjustable intensity levels (lite → full → ultra → 文言文) and a compression tool that also cuts ~46% of input tokens per session.
→ Install: claude plugin marketplace add JuliusBrussee/caveman && claude plugin install caveman@caveman
Repository layout
AI-Rig/
├── plugins/
│   ├── foundry/  # Base plugin: agents, hooks, audit/manage/calibrate/brainstorm/…
│   │   ├── .claude-plugin/
│   │   │   └── plugin.json  # plugin manifest
│   │   ├── agents/  # 10 foundry agents (canonical source)
│   │   ├── skills/  # foundry skills (canonical source)
│   │   ├── rules/  # rule files (canonical source; symlinked from .claude/rules/)
│   │   ├── CLAUDE.md  # workflow rules (symlinked from .claude/CLAUDE.md)
│   │   ├── TEAM_PROTOCOL.md  # AgentSpeak v2 protocol (symlinked from .claude/TEAM_PROTOCOL.md)
│   │   ├── permissions-guide.md  # allow-entry reference (symlinked from .claude/permissions-guide.md)
│   │   └── hooks/
│   │       └── hooks.json  # task tracking, quality gates, preprocessing
│   ├── oss/  # OSS plugin: shepherd, cicd-steward + analyse/review/resolve/release (+ internal: gh-scraper, repo-warden)
│   ├── develop/  # Develop plugin: feature/fix/refactor/plan/debug
│   ├── research/  # Research plugin: scientist, data-steward + topic/plan/judge/run/sweep
│   └── codemap/  # codemap plugin: structural index, blast-radius scores, import graph
├── .claude/  # Claude Code source of truth
│   ├── README.md  # full reference: restore, skills, rules, hooks, architecture (real file)
│   ├── CLAUDE.md  # workflow rules and core principles (symlink → plugins/foundry/)
│   ├── TEAM_PROTOCOL.md  # AgentSpeak v2 inter-agent protocol (symlink → plugins/foundry/)
│   ├── permissions-guide.md  # allow-entry reference (symlink → plugins/foundry/)
│   ├── settings.json  # deny list + project preferences (real file)
│   ├── agents/  # symlinks → plugins/foundry/agents/
│   ├── skills/  # symlinks → plugins/foundry/skills/
│   ├── rules/  # per-topic coding and config standards (symlinks → plugins/foundry/rules/)
│   └── hooks/  # symlinks → plugins/foundry/hooks/
├── .mcp.json  # MCP server definitions
├── .codex/  # OpenAI Codex CLI
│   ├── README.md  # full reference: agents, profiles, Claude integration
│   ├── AGENTS.md  # global instructions and subagent spawn rules
│   ├── config.toml  # multi-agent config (gpt-5.4-mini baseline)
│   ├── agents/  # per-agent model and instruction overrides
│   ├── calibration/  # self-calibration harness + fixed task set
│   └── skills/  # codex-native workflow skills
├── .pre-commit-config.yaml
├── .gitignore
└── README.md
cd Borda-AI-Rig && git pull
claude plugin install foundry@borda-ai-rig # reinstalls from updated source
claude plugin install oss@borda-ai-rig
claude plugin install develop@borda-ai-rig
claude plugin install research@borda-ai-rig
claude plugin install codemap@borda-ai-rig
Re-run /foundry:init only if permissions or enabledPlugins changed. Also re-run /foundry:init if you previously used the link mode; symlinks point to the old plugin cache after an upgrade.
claude --plugin-dir ./Borda-AI-Rig/plugins/foundry
claude plugin uninstall foundry
claude plugin uninstall oss
claude plugin uninstall develop
claude plugin uninstall research
claude plugin uninstall codemap
Settings added by /foundry:init remain in ~/.claude/settings.json; remove manually if desired. If /foundry:init was run, symlinks in ~/.claude/agents/ and ~/.claude/skills/ also persist and will be broken after uninstall; remove with rm ~/.claude/agents/<name>.md and rm -rf ~/.claude/skills/<name> for each.
Questions? Open an issue or start a discussion.
Made with ♥ by Borda et al.