ML Engineer, LLM Systems, Agentic AI
Production AI systems at scale: LLM-powered agents, RAG pipelines, and the backend services that run them.
| Project | What it ships | Hero number |
|---|---|---|
| langgraph-research-pilot | LangGraph + RAG research agent with live HuggingFace demo, HotpotQA benchmark | +17% F1 vs single-shot baseline |
| mcp-server-toolkit | Reusable Anthropic MCP framework, 3 example servers (filesystem, GitHub issues, SQLite), in-memory test runner | p99 tool round-trip 8.2 ms (6x under budget) |
| reproduce-stepback | ICLR 2024 reproduction on Claude Haiku 4.5, honest negative result | Paper's +5.5 absolute-point effect collapses to +0.0 |
| async-llm-batcher | Async LLM batch runner with checkpointing and resume-after-kill | 988/1000 success rate across 4 providers |
| streaming-etl-pipeline | Per-partition watermark ETL into DuckDB, 0% false-late events | 4,566 events/sec, 71% lower false-late than naive |
| fde-starter-kit | 5 deploy-ready Anthropic templates behind one fde-kit init CLI | Customer demo to first deploy in under 10 minutes |
More at the pinned repos and the full project bench.
| Project | What it is | Hero number |
|---|---|---|
| llm-knowledge-editing | ROME / LTE-LoRA / ICE benchmark on LLaMA2-7B, 4-GPU cluster | 12-page reproducibility report, 3 published-result discrepancies surfaced |
| cuda-inlj | GPU-accelerated B+ Tree nested-loop join | 4.2x speedup vs CPU on 10M rows, within 5% of theoretical bound |
| dota2-build-generator | XGBoost + PyTorch embeddings recommender, 50K+ matches | 78% top-3 accuracy with SHAP feature importance |
Production LLM agent systems with LangGraph, Anthropic MCP servers, and reliability tooling. Open to remote roles in AI engineering, ML platform, and forward-deployed engineering.
