| Feature | SmallCode | OpenCode | Pi Agent |
|---|---|---|---|
| Target audience | Small local LLMs (7B-20B) | Frontier models (Claude, GPT) | Any model, minimal harness |
| Language | JavaScript (Node.js) | TypeScript (rewrite from Go) | TypeScript |
| TUI | Fullscreen alternate buffer | Fullscreen (OpenTUI/Bubble Tea) | Minimal readline |
| Command palette | ✓ (/ autocomplete) | ✓ | ✗ |
| Alternate screen | ✓ | ✓ | ✗ |
| Themes | 3 (dark/light/minimal) | Multiple | Themes via packages |
| Multi-session | ✗ | ✓ (parallel agents) | ✗ |
| Shareable session links | ✗ | ✓ | ✗ |
| LSP integration | ✗ | ✓ (auto-loads per language) | ✗ |
| Desktop app | ✗ | ✓ (Electron) | ✗ |
| Model providers | Ollama, LM Studio, OpenAI-compat | 15+ (Claude, GPT, Gemini, etc.) | 15+ providers |
| Local model optimized | ✓ (core design goal) | ✗ (assumes frontier) | ✓ (minimal prompt) |
| Tools | 15+ built-in | 8 core | 4 core (read, write, edit, bash) |
| Compound tools | ✓ (read_and_patch, etc.) | ✗ | ✗ |
| Code graph / retrieval | ✓ (budget-aware-mcp) | ✗ (full file reads) | ✗ |
| Token budgeting | ✓ (auto-compact, capped retrieval) | ✗ | ✗ (tiny system prompt) |
| Memory (persistent) | ✓ (SQLite + FTS5, typed) | ✗ | ✗ |
| Plugin system | ✓ (tools, commands, hooks, prompts) | Skills (prompt templates) | Extensions + Skills + Packages |
| Skill system | ✓ (manual/auto/match triggers) | ✓ (customize-opencode skill) | ✓ (lazy-loaded, npm packages) |
| MCP support | ✓ (built-in + external) | ✓ | ✓ (via adapter) |
| Model escalation | ✓ (auto-escalate to cloud on fail) | ✗ (single model) | ✗ |
| Improvement loop | ✓ (retry → decompose → escalate) | ✗ | ✗ |
| BoneScript (backend gen) | ✓ (one .bone → full project) | ✗ | ✗ |
| Forgiving JSON parser | ✓ (repairs tool call output) | ✗ (expects valid JSON) | ✗ |
| Governor (tool scoring) | ✓ (Bayesian learning) | ✗ | ✗ |
| Hard fail protection | ✓ (never delivers broken code) | ✗ | ✗ |
| Auto-validation | ✓ (compile/lint after every write) | ✗ | ✗ |
| Streaming | ✓ (token-by-token in TUI) | ✓ | ✓ |
| Git integration | ✓ (/git, /diff, /undo) | ✓ | ✓ (via bash) |
| File @ references | ✗ | ✓ | ✗ |
| Task planning | ✓ (TODO-driven decomposition) | ✓ (plan mode) | ✗ |
| Hooks | ✓ (pre/post tool, file events) | ✗ | ✗ |
| Cost tracking | ✗ | ✓ (per-session) | ✗ |
| Stars (GitHub) | New | 151k+ | Growing fast |
| Install | `npm install -g smallcode` | `npm install -g opencode-ai` | `npm install -g @anthropic-ai/pi` |
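
The "Forgiving JSON parser" row above says SmallCode repairs tool call output instead of expecting valid JSON. The following is a minimal sketch of what such a repair step could look like, not SmallCode's actual parser. The function names `extractBalancedObject` and `parseToolCall` are hypothetical, and the repairs shown (stripping Markdown fences, closing truncated objects, dropping trailing commas) are assumptions about which failures matter most.

```javascript
// Hypothetical helpers for illustration; not taken from SmallCode's source.

// Return the first top-level {...} span in `text`. If the text ends before
// the object is closed (truncated model output), append the missing closers.
function extractBalancedObject(text) {
  const start = text.indexOf("{");
  if (start === -1) return null;
  const closers = []; // stack of closing characters still owed
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      // Braces inside string values must not affect nesting depth.
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") closers.push("}");
    else if (ch === "[") closers.push("]");
    else if (ch === "}" || ch === "]") {
      closers.pop();
      if (closers.length === 0) return text.slice(start, i + 1);
    }
  }
  // Truncated: close an open string, then every open container.
  let tail = text.slice(start).trimEnd();
  if (inString) tail += '"';
  return tail + closers.reverse().join("");
}

// Parse a tool call from raw model output; returns null if it cannot be repaired.
function parseToolCall(raw) {
  // Drop Markdown code fences the model may wrap around its JSON.
  const withoutFences = raw.replace(/`{3}(?:json)?/gi, "");
  const candidate = extractBalancedObject(withoutFences);
  if (candidate === null) return null;
  try {
    return JSON.parse(candidate); // strict parse first
  } catch {
    // fall through to the repair attempt
  }
  // Repair: remove trailing commas before a closing brace or bracket.
  // This only runs after strict parsing has already failed.
  const repaired = candidate.replace(/,\s*([}\]])/g, "$1");
  try {
    return JSON.parse(repaired);
  } catch {
    return null;
  }
}
```

With this sketch, a response that wraps the call in a fenced block with a trailing comma, or one cut off before its closing braces, still yields a usable object. Output that cannot be repaired returns `null`, so the caller can ask the model to try again.
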
SmallCode's benchmarks were run with huihui-gemma-4-e4b-it-abliterated — a Gemma 4 MoE model with only ~4B active parameters per forward pass (8B total). This is significantly smaller than the 14B-27B models typically used in OpenCode/Pi benchmarks.
The OpenCode and Pi figures are estimates taken from community benchmarks (grigio.org, bitdoze.com) run with Qwen2.5-Coder-14B and Devstral Small (~14B), models roughly 3-4x larger than SmallCode's ~4B active parameters.
| Category | SmallCode | OpenCode (est.) | Pi Agent (est.) |
|---|---|---|---|
| Python | 100% (10/10) | ~85% | ~90% |
| JavaScript | 80% (8/10) | ~75% | ~80% |
| TypeScript | 100% (10/10) | ~80% | ~85% |
| HTML/CSS | 100% (10/10) | ~90% | ~90% |
| Rust | 50% (5/10) | ~40% | ~45% |
| Go | 90% (9/10) | ~75% | ~80% |
| Data Structures | 100% (10/10) | ~80% | ~85% |
| Testing | 70% (7/10) | ~60% | ~65% |
| Bug Fixing | 80% (8/10) | ~65% | ~70% |
| Overall | 87% (87/100) | ~75% | ~80% |
| Category | SmallCode | OpenCode (est.) | Pi Agent (est.) |
|---|---|---|---|
| Python multi | 80% | ~50% | ~55% |
| JS multi | 100% | ~60% | ~65% |
| TS multi | 60% | ~45% | ~50% |
| Web multi | 100% | ~70% | ~70% |
| Rust multi | 20% | ~20% | ~25% |
| Go multi | 20% | ~25% | ~30% |
| Fullstack | 0%→80% (w/ BoneScript) | ~35% | ~40% |
| Config | 20% | ~30% | ~35% |
| Refactor | 20% | ~25% | ~30% |
| Overall | 46% (→60%+ w/ BoneScript) | ~40% | ~45% |
- Compound tools shorten tool call chains (one call instead of 3-4), which is critical for tiny models that lose coherence after 3+ sequential calls
- The improvement loop auto-validates output and feeds errors back, so the model doesn't need to get it right on the first try
- The forgiving parser handles messy JSON from small models that can't reliably produce valid tool calls
- Token budgeting prevents context overflow: a 4B model with 8k effective context needs every token managed
- The decompose strategy breaks failed tasks into chunks the small model can handle individually
- The model is 3-4x smaller than those OpenCode/Pi were benchmarked with, and SmallCode's harness engineering makes up the difference
- Multi-session: OpenCode runs parallel agents, while SmallCode is single-session
- LSP: OpenCode integrates language servers for richer diagnostics
- Ecosystem maturity: 151k stars, 900+ contributors, battle-tested
- Desktop app: OpenCode has an Electron GUI
- Cost tracking: OpenCode shows per-session spend
- File references: OpenCode's @file syntax is convenient
- Model escalation: automatically falls back to Claude/GPT when the local model fails
- BoneScript: one .bone file → complete backend (unique to SmallCode)
- Code graph retrieval: symbol-level graph search instead of grep-based file reading
- Persistent memory: a typed knowledge store that survives across sessions
- Governor: Bayesian tool scoring that learns what works over time
- Hard fail protection: refuses to deliver code that fails verification
- Plugin system with hooks: extend everything without forking
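
The improvement loop, decompose strategy, model escalation, and hard fail protection described in the bullets above combine into one control flow: retry → decompose → escalate. The sketch below illustrates that sequence under stated assumptions and is not SmallCode's implementation. The function `runWithImprovementLoop`, the injected `localModel`, `cloudModel`, `validate`, and `decompose` callbacks, and the default of two retries are all hypothetical.

```javascript
// Hypothetical sketch of a retry -> decompose -> escalate loop.
// deps.localModel(task, feedback) and deps.cloudModel(task, feedback) return output text;
// deps.validate(output) returns { ok, errors }; deps.decompose(task) returns subtasks.
async function runWithImprovementLoop(task, deps, maxRetries = 2) {
  const { localModel, cloudModel, validate, decompose } = deps;

  // Stage 1: retry locally, feeding validation errors back into the next attempt.
  let feedback = null;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const output = await localModel(task, feedback);
    const result = await validate(output);
    if (result.ok) return { output, stage: "local", attempts: attempt + 1 };
    feedback = result.errors;
  }

  // Stage 2: split the task into smaller pieces and solve each one individually.
  const subtasks = await decompose(task);
  if (subtasks.length > 1) {
    const parts = [];
    let allOk = true;
    for (const sub of subtasks) {
      const output = await localModel(sub, null);
      const result = await validate(output);
      if (!result.ok) {
        allOk = false;
        break;
      }
      parts.push(output);
    }
    if (allOk) return { output: parts.join("\n"), stage: "decompose" };
  }

  // Stage 3: escalate to a larger model, if one is configured.
  if (cloudModel) {
    const output = await cloudModel(task, feedback);
    const result = await validate(output);
    if (result.ok) return { output, stage: "escalate" };
  }

  // Hard fail: report the failure instead of returning unvalidated output.
  return { output: null, stage: "failed", errors: feedback };
}
```

Because every stage passes through the same `validate` callback, output that fails verification is never returned. When all three stages fail, the caller receives `output: null` together with the last set of errors.
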
| Item | Status |
|---|---|
| Core agent loop | ✅ |
| Fullscreen TUI with scroll | ✅ |
| Command palette | ✅ |
| Plugin system (E2E tested) | ✅ |
| Skill system (E2E tested) | ✅ |
| Memory system (fixed + tested) | ✅ |
| Code graph integration | ✅ |
| Model escalation | ✅ |
| BoneScript integration | ✅ |
| Improvement loop + decompose | ✅ |
| Streaming in fullscreen TUI | ✅ |
| Word wrapping | ✅ |
| Timeout handling (descriptive errors) | ✅ |
| .npmignore (clean publish) | ✅ |
| GitHub deps (standalone install) | ✅ |
| --classic fallback | ✅ |
| Multi-project workspace indexing | ✅ |
| `npm install -g` ready | ✅ |
SmallCode is production-ready for local LLM workflows. It achieves 87% single-file success with a model that has ~4B active parameters, ahead of the estimated OpenCode (~75%) and Pi Agent (~80%) results on models 3-4x larger. The harness engineering (compound tools, improvement loop, token budgeting, governor) compensates for model size.
Together, these give SmallCode an estimated lead of 12 percentage points over OpenCode and 7 points over Pi on single-file tasks, despite using a model with roughly one-third of the active parameters.
For cloud model users, OpenCode remains the more polished choice (LSP, multi-session, desktop app, 151k-star community). For local-first developers who want privacy, speed, and reliability with small models, SmallCode extracts more useful work per parameter than either alternative compared here.