ultra-instinct ML engineering intern for Claude Code. Reads papers, audits datasets, ships SFT/DPO/LoRA runs to Hugging Face.
ultra-ml-intern is a Claude Code plugin that gives Claude the workflow of an ML engineering intern. It researches ML papers, audits Hugging Face datasets, designs fine-tuning recipes (SFT, DPO, GRPO, LoRA, QLoRA, RLHF), and submits training jobs to HF Jobs with Trackio monitoring.
The procedural knowledge comes from huggingface/ml-intern, HF's standalone Python harness around the Claude API. This repo wires the same intelligence into Claude Code, Anthropic's official agentic harness for Claude. Same model, a more capable loop, and you bring your own Claude (Max subscription or API key) instead of paying for a second harness on top.
Works in any Claude Code surface: terminal CLI, IDE extensions, and the web app.
```
# In any Claude Code session:
/plugin marketplace add infiniV/ultra-ml-intern
/plugin install ml-intern@ultra-ml-intern
```

Restart Claude Code, then verify with `/plugin` and `/agents`. The slash commands (`/ml-intern`, `/ml-research`, …) keep their short names; the `ultra-` prefix is just the package wrapper.
What you get:
- 1 skill: `ml-intern` (the workflow)
- 5 slash commands: `/ml-intern`, `/ml-research`, `/ml-audit`, `/ml-preflight`, `/ml-train`
- 3 subagents: `ml-paper-researcher`, `dataset-auditor`, `training-job-architect`
- 1 MCP server: Hugging Face (activates when `HF_TOKEN` is set)
> "fine-tune Qwen3-0.5B for math reasoning"
The skill activates automatically and walks the 6-step research-driven workflow:
1. Find the landmark paper for the task
2. Crawl the citation graph for recent SOTA
3. Read the methodology sections (typically §3–5) and extract the recipe
4. Validate that the dataset and base model exist on the Hub
5. Write a training script grounded in current TRL APIs
6. Pre-flight check → smoke test → full `hf jobs run` with Trackio monitoring
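The smoke-test-before-full-run step is worth a sketch. A minimal, library-free version of the pattern, where `submit` is a hypothetical callable standing in for the real `hf jobs run` submission:

```python
def submit_with_smoke_test(submit, full_steps: int, smoke_steps: int = 10) -> bool:
    """Run a tiny job first; only spend full cluster hours if it survives.

    `submit` is any callable that runs a training job for N steps and
    returns True on success -- a hypothetical stand-in, not the plugin's API.
    """
    if not submit(smoke_steps):
        raise RuntimeError("smoke test failed; fix the script before the full run")
    return submit(full_steps)
```

The point is the ordering: a 10-step run surfaces import errors, dataset-format mismatches, and OOMs in minutes instead of hours.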
| You ask | It does |
|---|---|
| "fine-tune X for Y" | Full pipeline: literature review → dataset audit → training-job design → smoke test → full run |
| "what's the best recipe for X" | Dispatches the ml-paper-researcher subagent; returns recipe + citations |
| "audit dataset Y" | Dispatches the dataset-auditor; returns schema, anomalies, GO/NO-GO verdict |
| "preflight train.py" | Catches a missing `push_to_hub`, the default 30m timeout, bf16 on a T4, or a missing flash-attn install before you spend cluster hours |
| "submit hf jobs run" | Walks pre-flight → cost estimate → smoke test → full submission → Trackio dashboard URL |
| Command | What it does |
|---|---|
| `/ml-intern` | The full pipeline: research → audit → train → ship |
| `/ml-research` | Literature review only: landmark paper, citation graph, extracted recipe |
| `/ml-audit` | Dataset audit only: schema, samples, anomalies, training-method recommendation |
| `/ml-preflight` | Sanity-check a training script before submission |
| `/ml-train` | Submit a training job: local-first when a GPU is available, HF Jobs when not |
| Subagent | Role |
|---|---|
| `ml-paper-researcher` | Crawls arXiv and the papers citing the landmark paper, extracts the methodology into a recipe |
| `dataset-auditor` | Inspects HF datasets: schema, sample rows, distribution checks, anomaly flagging |
| `training-job-architect` | Writes the TRL/Transformers training script plus the `hf jobs run` command sized to your hardware |
The plugin recognizes and writes scripts for:
- SFT (Supervised Fine-Tuning) for single-turn and multi-turn chat
- DPO (Direct Preference Optimization) on pairwise preference data
- GRPO (Group Relative Policy Optimization) for reasoning tasks, DeepSeek-style
- LoRA and QLoRA for parameter-efficient fine-tuning, including 4-bit quantization
- RLHF with the full reward-model plus PPO pipeline
- Continued pretraining for domain adaptation on raw text
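Each method expects a different dataset shape, which is most of what the dataset audit checks. A sketch of the two most common shapes — field names follow TRL's documented conventions, but verify against the current docs, as the workflow itself does:

```python
# SFT, conversational format: one list of chat messages per row.
sft_row = {
    "messages": [
        {"role": "user", "content": "What is 12 * 8?"},
        {"role": "assistant", "content": "96"},
    ]
}

# DPO, pairwise preference format: a prompt with a chosen and a rejected answer.
dpo_row = {
    "prompt": "What is 12 * 8?",
    "chosen": "96",
    "rejected": "812",
}
```

A dataset in the wrong shape for the chosen trainer is one of the cheapest failures to catch before submission.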
All grounded in the current TRL API. The ml-paper-researcher reads the actual library source, not its training data, before writing imports.
The plugin knows the HF Jobs flavors (`t4-small` through `a100x8`) and picks one to fit your model:

| Model size | Default flavor |
|---|---|
| 1–3B | `a10g-largex2` (48 GB) |
| 7–13B | `a100-large` (80 GB) |
| 30B+ | `l40sx4` or `a100x4` |
| 70B+ | `a100x8` |

Full chart in `references/hardware-sizing.md`.
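The sizing rule reduces to a threshold lookup; an illustrative version (the thresholds mirror the table above, but the plugin's real chart in `references/hardware-sizing.md` is authoritative):

```python
def pick_flavor(params_billion: float) -> str:
    """Map a model's parameter count (in billions) to a default HF Jobs
    flavor. Illustrative only; mirrors the sizing table in this README."""
    if params_billion >= 70:
        return "a100x8"
    if params_billion >= 30:
        return "a100x4"        # or l40sx4
    if params_billion >= 7:
        return "a100-large"    # 80 GB
    return "a10g-largex2"      # 48 GB, the 1-3B default
```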
The 8 expensive errors from huggingface/ml-intern's system prompt, encoded here as procedural rules:
- Hallucinated TRL/Transformers imports → the plugin reads the current source first
- Wrong trainer arguments → fetches the actual config docs before writing
- Wrong dataset format → `inspect_dataset.sh` runs first
- Default 30m timeout kills the job → a 2h minimum is enforced
- Lost models (no `push_to_hub=True`) → preflight refuses without it
- Batch submission failures → smoke-test one job first
- Silent dataset substitution → surfaces the failure and asks the user
- Hardcoded missing packages (flash-attn, etc.) → preflight catches it
One more rule worth calling out: no scope-changing fixes. OOM doesn't mean silently rewriting SFT into LoRA; it means reducing the batch size and raising gradient accumulation.
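A toy rendition of a few of these rules, assuming the training script arrives as plain text — simple string checks for illustration only, not the plugin's actual preflight:

```python
def preflight(script: str, timeout_hours: float) -> list[str]:
    """Flag three of the cheap-to-catch failure modes listed above.
    Naive substring matching; the real preflight is far richer."""
    issues = []
    if "push_to_hub" not in script:
        issues.append("no push_to_hub: the trained model would be lost")
    if timeout_hours < 2:
        issues.append("timeout under 2h: the default 30m kills real runs")
    # flash_attn imported (underscore, Python module) but flash-attn
    # (hyphen, pip package) never mentioned, i.e. never installed:
    if "flash_attn" in script and "flash-attn" not in script:
        issues.append("flash_attn imported but never installed")
    return issues
```

An empty list is the GO verdict; anything else blocks submission.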
- Claude Code (any surface: terminal, IDE, web)
- Your own Claude access (Max subscription or API key) for the model itself
- Bash + standard Unix tools
- For HF Jobs submission: a Hugging Face account with billing enabled
- Optional: `HF_TOKEN` exported for the bundled MCP server
```
export HF_TOKEN="$(hf auth print-token)"  # or paste from https://huggingface.co/settings/tokens
```

The MCP server adds Hub doc semantic search and community Gradio Space tools. The plugin works without it, falling back to WebFetch and the bundled `inspect_dataset.sh`, `crawl_arxiv.sh`, and `hf_paper_meta.sh` helpers.
Both projects use the same model, Claude. What differs is the harness around it.
The upstream is a standalone Python project: roughly 50k lines of agent loop, plan tool, paper and dataset tools, jobs tool, and a hand-built system prompt. It calls the Claude API directly. It shipped first and the procedural knowledge in this repo is theirs.
Claude Code is Anthropic's official agentic harness for Claude. The agent loop, the planner (TodoWrite), subagent dispatch, MCP plumbing, and the editor surface are already there and tested. This plugin wires the procedural knowledge from the upstream into those primitives, so Claude runs the same workflow without a parallel harness sitting on top of the API.
In practice: same Claude model, a more capable surrounding loop, no extra billing layer (your Claude Max subscription or API key is enough), and the workflow runs anywhere Claude Code already runs.
If you want the standalone Python tool, use the upstream. If you already work in Claude Code, use this.
```
/plugin uninstall ml-intern@ultra-ml-intern
/plugin marketplace remove ultra-ml-intern
```
Issues and PRs welcome at https://github.com/infiniV/ultra-ml-intern.
CI runs:
- `shellcheck` on every script in `skills/ml-intern/scripts/`
- `bash -n` syntax check on the same
- JSON schema validation on `.claude-plugin/plugin.json` and `.claude-plugin/marketplace.json`
- Upstream: `huggingface/ml-intern` by the Hugging Face team. The procedural knowledge is theirs.
- Plugin format reference: `obra/superpowers`.
- Plugin skeleton mirrors `infiniV/claude-code-audio-notify`.
