ultra-instinct ML engineering intern for Claude Code. Reads papers, audits datasets, ships SFT/DPO/LoRA runs to Hugging Face.
ultra-ml-intern is a Claude Code plugin that gives Claude the workflow of an ML engineering intern. It researches ML papers, audits Hugging Face datasets, designs fine-tuning recipes (SFT, DPO, GRPO, LoRA, QLoRA, RLHF), and submits training jobs to HF Jobs with Trackio monitoring.
The procedural knowledge comes from huggingface/ml-intern, HF's standalone Python harness around the Claude API. This repo wires the same intelligence into Claude Code, Anthropic's official agentic harness for Claude. Same model, a more capable loop, and you bring your own Claude (Max subscription or API key) instead of paying for a second harness on top.
Works in any Claude Code surface: terminal CLI, IDE extensions, and the web app.
```
# In any Claude Code session:
/plugin marketplace add infiniV/ultra-ml-intern
/plugin install ml-intern@ultra-ml-intern
```

Restart Claude Code, then verify with `/plugin` and `/agents`. The slash commands (`/ml-intern`, `/ml-research`, …) keep their short names; the `ultra-` prefix is just the package wrapper.
What you get:
- 1 skill: `ml-intern` (the workflow)
- 5 slash commands: `/ml-intern`, `/ml-research`, `/ml-audit`, `/ml-preflight`, `/ml-train`
- 3 subagents: `ml-paper-researcher`, `dataset-auditor`, `training-job-architect`
- 1 MCP server: Hugging Face (activates when `HF_TOKEN` is set)
> "fine-tune Qwen3-0.5B for math reasoning"
The skill activates automatically and walks the 6-step research-driven workflow:
1. Find the landmark paper for the task
2. Crawl the citation graph for recent SOTA
3. Read the methodology sections (typically §3–5) and extract the recipe
4. Validate that the dataset and base model exist on the Hub
5. Write a training script grounded in current TRL APIs
6. Pre-flight check → smoke test → full `hf jobs run` with Trackio monitoring
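The smoke-test-before-full-run step is worth a sketch. A minimal, library-free version of the pattern, where `submit` is a hypothetical callable standing in for the real `hf jobs run` submission:

```python
def submit_with_smoke_test(submit, full_steps: int, smoke_steps: int = 10) -> bool:
    """Run a tiny job first; only spend full cluster hours if it survives.

    `submit` is any callable that runs a training job for N steps and
    returns True on success -- a hypothetical stand-in, not the plugin's API.
    """
    if not submit(smoke_steps):
        raise RuntimeError("smoke test failed; fix the script before the full run")
    return submit(full_steps)
```

The point is the ordering: a 10-step run surfaces import errors, dataset-format mismatches, and OOMs in minutes instead of hours.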
| You ask | It does |
|---|---|
| "fine-tune X for Y" | Full pipeline: literature review → dataset audit → training-job design → smoke test → full run |
| "what's the best recipe for X" | Dispatches the ml-paper-researcher subagent; returns recipe + citations |
| "audit dataset Y" | Dispatches the dataset-auditor; returns schema, anomalies, GO/NO-GO verdict |
| "preflight train.py" | Catches a missing `push_to_hub`, the default 30m timeout, bf16 on a T4, or a missing flash-attn install before you spend cluster hours |
| "submit hf jobs run" | Walks pre-flight → cost estimate → smoke test → full submission → Trackio dashboard URL |
| Command | What it does |
|---|---|
| `/ml-intern` | The full pipeline: research → audit → train → ship |
| `/ml-research` | Literature review only: landmark paper, citation graph, extracted recipe |
| `/ml-audit` | Dataset audit only: schema, samples, anomalies, training-method recommendation |
| `/ml-preflight` | Sanity-check a training script before submission |
| `/ml-train` | Submit a training job: local-first when a GPU is available, HF Jobs when not |
| Subagent | Role |
|---|---|
| `ml-paper-researcher` | Crawls arXiv and the papers citing the landmark paper, extracts the methodology into a recipe |
| `dataset-auditor` | Inspects HF datasets: schema, sample rows, distribution checks, anomaly flagging |
| `training-job-architect` | Writes the TRL/Transformers training script plus the `hf jobs run` command sized to your hardware |
The plugin recognizes and writes scripts for:
- SFT (Supervised Fine-Tuning) for single-turn and multi-turn chat
- DPO (Direct Preference Optimization) on pairwise preference data
- GRPO (Group Relative Policy Optimization) for reasoning tasks, DeepSeek-style
- LoRA and QLoRA for parameter-efficient fine-tuning, including 4-bit quantization
- RLHF with the full reward-model plus PPO pipeline
- Continued pretraining for domain adaptation on raw text
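Each method expects a different dataset shape, which is most of what the dataset audit checks. A sketch of the two most common shapes — field names follow TRL's documented conventions, but verify against the current docs, as the workflow itself does:

```python
# SFT, conversational format: one list of chat messages per row.
sft_row = {
    "messages": [
        {"role": "user", "content": "What is 12 * 8?"},
        {"role": "assistant", "content": "96"},
    ]
}

# DPO, pairwise preference format: a prompt with a chosen and a rejected answer.
dpo_row = {
    "prompt": "What is 12 * 8?",
    "chosen": "96",
    "rejected": "812",
}
```

A dataset in the wrong shape for the chosen trainer is one of the cheapest failures to catch before submission.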
All grounded in the current TRL API. The ml-paper-researcher reads the actual library source, not its training data, before writing imports.
The plugin knows the HF Jobs flavors (`t4-small` through `a100x8`) and picks one to fit your model:

| Model size | Default flavor |
|---|---|
| 1–3B | `a10g-largex2` (48 GB) |
| 7–13B | `a100-large` (80 GB) |
| 30B+ | `l40sx4` or `a100x4` |
| 70B+ | `a100x8` |

Full chart in `references/hardware-sizing.md`.
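The sizing rule reduces to a threshold lookup; an illustrative version (the thresholds mirror the table above, but the plugin's real chart in `references/hardware-sizing.md` is authoritative):

```python
def pick_flavor(params_billion: float) -> str:
    """Map a model's parameter count (in billions) to a default HF Jobs
    flavor. Illustrative only; mirrors the sizing table in this README."""
    if params_billion >= 70:
        return "a100x8"
    if params_billion >= 30:
        return "a100x4"        # or l40sx4
    if params_billion >= 7:
        return "a100-large"    # 80 GB
    return "a10g-largex2"      # 48 GB, the 1-3B default
```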
The 8 expensive errors from huggingface/ml-intern's system prompt, encoded here as procedural rules:
- Hallucinated TRL/Transformers imports → the plugin reads the current source first
- Wrong trainer arguments → fetches the actual config docs before writing
- Wrong dataset format → `inspect_dataset.sh` runs first
- Default 30m timeout kills the job → a 2h minimum is enforced
- Lost models (no `push_to_hub=True`) → preflight refuses without it
- Batch submission failures → smoke-test one job first
- Silent dataset substitution → surfaces the failure and asks the user
- Hardcoded missing packages (flash-attn, etc.) → preflight catches it
One more rule worth calling out: no scope-changing fixes. OOM doesn't mean silently rewriting SFT into LoRA; it means reducing the batch size and raising gradient accumulation.
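A toy rendition of a few of these rules, assuming the training script arrives as plain text — simple string checks for illustration only, not the plugin's actual preflight:

```python
def preflight(script: str, timeout_hours: float) -> list[str]:
    """Flag three of the cheap-to-catch failure modes listed above.
    Naive substring matching; the real preflight is far richer."""
    issues = []
    if "push_to_hub" not in script:
        issues.append("no push_to_hub: the trained model would be lost")
    if timeout_hours < 2:
        issues.append("timeout under 2h: the default 30m kills real runs")
    # flash_attn imported (underscore, Python module) but flash-attn
    # (hyphen, pip package) never mentioned, i.e. never installed:
    if "flash_attn" in script and "flash-attn" not in script:
        issues.append("flash_attn imported but never installed")
    return issues
```

An empty list is the GO verdict; anything else blocks submission.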
- Claude Code (any surface: terminal, IDE, web)
- Your own Claude access (Max subscription or API key) for the model itself
- Bash + standard Unix tools
- For HF Jobs submission: a Hugging Face account with billing enabled
- Optional: `HF_TOKEN` exported for the bundled MCP server
```
export HF_TOKEN="$(hf auth print-token)"  # or paste from https://huggingface.co/settings/tokens
```

The MCP server adds Hub doc semantic search and community Gradio Space tools. The plugin works without it, falling back to WebFetch and the bundled `inspect_dataset.sh`, `crawl_arxiv.sh`, and `hf_paper_meta.sh` helpers.
Both projects use the same model, Claude. What differs is the harness around it.
The upstream is a standalone Python project: roughly 50k lines of agent loop, plan tool, paper and dataset tools, jobs tool, and a hand-built system prompt. It calls the Claude API directly. It shipped first and the procedural knowledge in this repo is theirs.
Claude Code is Anthropic's official agentic harness for Claude. The agent loop, the planner (TodoWrite), subagent dispatch, MCP plumbing, and the editor surface are already there and tested. This plugin wires the procedural knowledge from the upstream into those primitives, so Claude runs the same workflow without a parallel harness sitting on top of the API.
In practice: same Claude model, a more capable surrounding loop, no extra billing layer (your Claude Max subscription or API key is enough), and the workflow runs anywhere Claude Code already runs.
If you want the standalone Python tool, use the upstream. If you already work in Claude Code, use this.
```
/plugin uninstall ml-intern@ultra-ml-intern
/plugin marketplace remove ultra-ml-intern
```
Issues and PRs welcome at https://github.com/infiniV/ultra-ml-intern.
CI runs:
- `shellcheck` on every script in `skills/ml-intern/scripts/`
- `bash -n` syntax check on the same
- JSON schema validation on `.claude-plugin/plugin.json` and `.claude-plugin/marketplace.json`
- Upstream: `huggingface/ml-intern` by the Hugging Face team. The procedural knowledge is theirs.
- Plugin format reference: `obra/superpowers`.
- Plugin skeleton mirrors `infiniV/claude-code-audio-notify`.
