feat(agents): subagent + team support with spawn_agent tool by shuff57 · Pull Request #92 · Doorman11991/smallcode

shuff57 · 2026-06-07T19:32:07Z

What

Bounded subagents and sequential agent teams, designed for small-model reality:

- Agent definitions: .smallcode/agents/<name>.md, with frontmatter {name, description, tools: [subset], model: tier-or-name} and the body as the system prompt. Same CRLF-tolerant frontmatter conventions as skills; a drafts/ subdirectory is quarantined (parity with skills, ready for evolver-proposed agents).
- spawn_agent tool: the model delegates a scoped sub-task. The sub-conversation is hard-isolated:
  - Initial history = ONLY the task; parent history never crosses the boundary (architecturally enforced, pinned by test)
  - Tools narrowed to agentDef.tools ∩ TOOLS (+ read_file fallback); no MCP, no plugins, no nested repair calls
  - Hard caps: 15 steps, min(8000, 30% of context) tokens; system prompt ≤600 tokens (body truncated at 1600 chars with marker)
  - model: accepts a tier (fast/default/medium/strong) or a model name, so sub-agents can run on a cheaper/stronger model than the parent
  - run() never throws; it always returns {output, steps, tokens, error?}
- Teams: .smallcode/teams/<name>.yaml (tiny parser, no dependency). A team is a sequential pipeline in which each agent's output becomes the next agent's task. No parallelism by design: local inference makes parallel agents a latency trap, not a win.
- Commands: /agents, /agent <name> <task>, /teams, /team <name> <task>, plus palette and help entries.
- Both tool routers register spawn_agent (plan/write/default). This is the same class of gap as the use_skill routing fix in feat(skills): lazy index-first loading + use_skill tool #89, handled up front.
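
The hard caps and tool narrowing listed above can be sketched as a few pure functions. This is a minimal illustration, not the code in this PR: the function names, the marker text, and the shape of the tool registry (an object keyed by tool name) are assumptions, and "read_file fallback" is read here as "used when the intersection is empty".

```javascript
// Caps quoted from the PR description; all names below are hypothetical.
const MAX_STEPS = 15;
const MAX_TOKENS = 8000;
const CONTEXT_FRACTION = 0.3;
const MAX_BODY_CHARS = 1600;
const TRUNCATION_MARKER = "\n[truncated]"; // assumed marker wording

// Token budget: min(8000, 30% of the context window).
function tokenBudget(contextWindow) {
  return Math.min(MAX_TOKENS, Math.floor(contextWindow * CONTEXT_FRACTION));
}

// Keep only requested tools that exist in the registry.
// Assumption: fall back to read_file when nothing survives.
function narrowTools(requested, registry) {
  const allowed = (requested || []).filter((name) =>
    Object.prototype.hasOwnProperty.call(registry, name)
  );
  return allowed.length > 0 ? allowed : ["read_file"];
}

// Truncate an over-long agent body and append a visible marker.
function clampSystemPrompt(body) {
  if (body.length <= MAX_BODY_CHARS) return body;
  return body.slice(0, MAX_BODY_CHARS) + TRUNCATION_MARKER;
}

// Bundle the limits for one sub-conversation.
function buildLimits(agentDef, registry, contextWindow) {
  return {
    maxSteps: MAX_STEPS,
    maxTokens: tokenBudget(contextWindow),
    tools: narrowTools(agentDef.tools, registry),
    systemPrompt: clampSystemPrompt(agentDef.body || ""),
  };
}
```

With a 4096-token context this budget works out to 1228 tokens, so the 8000 ceiling only applies to context windows above roughly 26k tokens.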

Tests

33 new across test/agent_loader.test.js + test/agent_runner.test.js: isolation pin (initial history is task-only), tool narrowing + fallback, step/token caps, error-shape on fetch failure, CRLF frontmatter, team YAML parsing, pipeline piping with stubbed runners, drafts quarantine. Full suite green.
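
To make the CRLF frontmatter case concrete, here is a rough sketch of a dependency-free parser for an agent definition file. It is an illustration under assumptions, not the loader from this PR: the function name, the returned shape, and the `grep` tool name in the usage note are hypothetical, and only flat `key: value` pairs and inline arrays are handled.

```javascript
// Hypothetical helper: parse "---\n<frontmatter>\n---\n<body>" into an agent definition.
function parseAgentFile(text) {
  // Normalize Windows line endings so CRLF files parse like LF files.
  const normalized = text.replace(/\r\n/g, "\n");
  const match = normalized.match(/^---\n([\s\S]*?)\n---\n?([\s\S]*)$/);
  if (!match) return null; // no frontmatter block found

  const meta = {};
  for (const line of match[1].split("\n")) {
    const idx = line.indexOf(":");
    if (idx === -1) continue; // skip lines that are not key: value
    const key = line.slice(0, idx).trim();
    let value = line.slice(idx + 1).trim();
    // Inline array such as "[read_file, grep]".
    if (value.startsWith("[") && value.endsWith("]")) {
      value = value
        .slice(1, -1)
        .split(",")
        .map((s) => s.trim())
        .filter(Boolean);
    }
    meta[key] = value;
  }
  // The body after the closing fence is the system prompt.
  return { ...meta, systemPrompt: match[2].trim() };
}
```

For example, a file containing `name: echo-reviewer`, `tools: [read_file, grep]` and `model: fast` between `---` fences, followed by a prompt body, would yield an object with those fields plus `systemPrompt`, regardless of whether the file uses LF or CRLF line endings.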

Field-verified on live Ollama: /agent manual runs (~2 steps / ~1000 tokens per delegation), model-initiated spawn_agent from natural language ("have the echo-reviewer agent check hello.py"), and a two-agent pipeline (reviewer → summarizer) producing a correct one-sentence digest.

Stacked on #90 — new commits: 97d12fb.

Phase 3 (the #88 evolver proposing agent/team drafts from friction) is designed and small (~150 lines) — can follow once this lands.

🤖 Generated with Claude Code

Skills following the Claude Code layout (<skill-dir>/<name>/SKILL.md) or written as plain .md without YAML frontmatter were silently skipped in the standard skill dirs (.smallcode/skills, ~/.smallcode/skills, ~/.config/smallcode/skills). Both shapes now load; README-style files (README/CHANGELOG/LICENSE/CONTRIBUTING) are filtered by name.

Fixes Doorman11991#81

Constraint: no warning channel exists in SkillManager, so silent skips had no user-visible signal
Rejected: warn-on-skip only | users following Claude Code conventions expect these layouts to work
Confidence: high
Scope-risk: narrow
Not-tested: fullscreen TUI /skill list rendering (logic shared with classic mode)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Create-mode evolver: deterministic friction extraction from saved traces (repeated near-duplicate prompts, consecutive tool-retry loops), LLM judgment routed to the strong tier, and ONE quarantined skill draft per run written to .smallcode/skills/drafts/. Drafts never auto-load; /evolve promote <name> moves them live. Validation gates every write (name format, no frontmatter injection, trigger rules); name collisions across live+draft+global dirs abort; every create appends to .smallcode/evolver-audit.jsonl. The per-run cap is structural: EvolverRun raises on a second create.

Constraint: small models produce noisy judgments, so all fuzzy output passes validate-or-abort before any write
Rejected: plugin delivery | needs TraceRecorder + SkillManager internals unreachable from plugin dirs under binary installs
Confidence: high
Scope-risk: narrow
Directive: keep mechanics LLM-free; judgment stays in the command handler so mechanics remain unit-testable
Not-tested: strong-tier routing with a separately configured SMALLCODE_MODEL_STRONG endpoint
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Field regression: rephrased prompts with filler drift (another/please/new) failed to cluster because stopwords diluted Jaccard below threshold. Real prompts from a live session pinned as a test.
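
The effect described here can be reproduced with a small Jaccard sketch. The stopword list, tokenizer, and function names below are assumptions for illustration (only another/please/new come from the commit message), and the evolver's actual clustering threshold is not stated in this PR.

```javascript
// Hypothetical stopword list; "another", "please", "new" are the fillers named in the commit.
const STOPWORDS = new Set(["a", "an", "the", "please", "another", "new"]);

// Lowercase, split on non-alphanumerics, drop empty tokens and stopwords.
function tokenize(prompt, stopwords = STOPWORDS) {
  return new Set(
    prompt
      .toLowerCase()
      .split(/[^a-z0-9]+/)
      .filter((word) => word && !stopwords.has(word))
  );
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|.
function jaccard(a, b) {
  if (a.size === 0 && b.size === 0) return 1;
  let shared = 0;
  for (const word of a) if (b.has(word)) shared++;
  return shared / (a.size + b.size - shared);
}

function promptSimilarity(p1, p2, stopwords = STOPWORDS) {
  return jaccard(tokenize(p1, stopwords), tokenize(p2, stopwords));
}
```

For "please add another unit test" and "add a new unit test", the unfiltered token sets share 3 of 7 distinct words (about 0.43), while the filtered sets are identical (1.0), which is the kind of gap that can move a pair across a clustering threshold.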

SkillManager now reads only frontmatter on startup (_index Map) and loads bodies on demand via _loadBody(), cached in skills Map. This cuts per-turn skill injection from ~60k chars (all bodies) to ~240 chars (compact index) for a typical 30-skill install. New surface: getIndex() flat list, formatSkillIndex/formatSkillResult in skill_index_formatter.js, use_skill tool (executor + tools.js). getSkillContext() injects the index always; auto-matched bodies append after, subject to the existing 4000-char cap. Public API (get/list/getAutoSkills/formatForPrompt/add/remove/promoteDraft/listDrafts) is unchanged; all 335 tests pass.

Rejected: inject all bodies always | O(skills) context cost per turn
Constraint: existing tests must pass unmodified
Confidence: high
Scope-risk: moderate
Not-tested: live use_skill call by real model (requires interactive session)
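
The index-first pattern in this commit can be sketched as below. This is not the real SkillManager: the class name, constructor, and the injected `readBody` callback are assumptions made so the example runs without touching the filesystem; only `_index`, `skills`, `_loadBody()` and `getIndex()` mirror names from the commit message.

```javascript
// Hypothetical stand-in for the lazy-loading part of SkillManager.
class LazySkillStore {
  // entries: [{ name, description, path }] parsed from frontmatter only.
  // readBody: function(path) -> string, called at most once per skill.
  constructor(entries, readBody) {
    this._index = new Map(entries.map((entry) => [entry.name, entry]));
    this.skills = new Map(); // body cache, filled on demand
    this._readBody = readBody;
  }

  // Compact flat list that is cheap to inject every turn.
  getIndex() {
    return [...this._index.values()].map(({ name, description }) => ({
      name,
      description,
    }));
  }

  // Load a body the first time it is requested, then serve it from cache.
  _loadBody(name) {
    const entry = this._index.get(name);
    if (!entry) return null;
    if (!this.skills.has(name)) {
      this.skills.set(name, this._readBody(entry.path));
    }
    return this.skills.get(name);
  }
}
```

The design choice is that the per-turn cost scales with the index entries rather than with the total size of all skill bodies; a body is only paid for when a skill is actually used.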

use_skill was defined in TOOLS but absent from both routers' category whitelists, so the model never saw it in routed mode. The skill index is injected every turn, so the tool rides along in every tool-bearing category (~80 tokens).

Memory objects gain tier (hot|archive) and last_used_at fields (backward-compat: backfilled on first hygiene run). runHygiene() sweeps: hot+unused>60d→archive, archive>90d→forget, hot>20→archive oldest 5. Adapter layer handles both SQLite budget-aware-mcp (via update()) and fallback MemoryStore (mutate+save) without touching node_modules. Auto-runs silently (try-catch) at 3 session-save points. /memory hygiene and /memory index subcommands added to commands.js. Generated .smallcode/MEMORY.md is human-readable + git-diffable; never authoritative.

Rejected: markdown-tier replacement | loses FTS5/BM25
Rejected: hybrid two-source write | inconsistency risk
Constraint: do not modify node_modules/budget-aware-mcp
Confidence: high
Scope-risk: narrow
Not-tested: budget-aware-mcp setMeta path (no setMeta exists; update() used instead)
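
The three sweep rules can be read as a pure planning step, sketched below. Several details are assumptions rather than facts from this PR: the function name, `last_used_at` stored as epoch milliseconds, age measured from `last_used_at` for both tiers, and the overflow rule applying to entries still hot after the 60-day rule.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical planner: returns ids to archive and ids to forget, without mutating anything.
function planHygiene(memories, now) {
  const ageDays = (m) => (now - m.last_used_at) / DAY_MS;
  const toArchive = [];
  const toForget = [];
  const stillHot = [];

  for (const m of memories) {
    if (m.tier === "archive") {
      if (ageDays(m) > 90) toForget.push(m.id); // archive > 90d -> forget
    } else if (ageDays(m) > 60) {
      toArchive.push(m.id); // hot and unused > 60d -> archive
    } else {
      stillHot.push(m);
    }
  }

  // More than 20 hot entries left: archive the 5 least recently used.
  if (stillHot.length > 20) {
    const oldestFirst = [...stillHot].sort(
      (a, b) => a.last_used_at - b.last_used_at
    );
    for (const m of oldestFirst.slice(0, 5)) toArchive.push(m.id);
  }
  return { toArchive, toForget };
}
```

Keeping the decision separate from the write path would let the same plan be applied through either adapter (update() or mutate+save), though whether runHygiene() is actually structured this way is not shown here.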

Without this, actively-retrieved old entries age out of the hot tier at 60d — hygiene tier sweeps need real usage signal. Try-catch wrapped; a failed touch never breaks retrieval.

AgentRunner runs isolated sub-conversations (task-only history, narrowed tools, MAX_STEPS=15, token budget min(8000,ctx*0.3), non-streaming). TeamLoader/team_runner add sequential pipelines (output → next agent input). spawn_agent tool wired in both compiled and two-stage routers. /agents, /agent, /teams, /team commands + fullscreen palette entries. 33 new tests (14 loader, 19 runner/team), 380 total, 0 failures.

Constraint: No yaml dep; team loader hand-parses inline-array yaml only
Constraint: No nested repair in sub-agents; bad JSON args → {} + tool error
Directive: Loaders skip drafts/; Phase 3 writes agent/team drafts there
Rejected: Parallel team execution | local inference perf trap on small hw
Confidence: high
Scope-risk: moderate
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
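
A sequential pipeline with a never-throwing run step could look roughly like the sketch below. The function names and the runner signature (an async function taking a task string) are hypothetical, and stopping at the first error is an assumption; the PR does not say how team_runner handles a failing stage.

```javascript
// Wrap a runner so the caller always gets {output, steps, tokens, error?}.
async function safeRun(runner, task) {
  try {
    const { output = "", steps = 0, tokens = 0 } = await runner(task);
    return { output, steps, tokens };
  } catch (err) {
    const message = err && err.message ? err.message : String(err);
    return { output: "", steps: 0, tokens: 0, error: message };
  }
}

// Run agents one after another; each output becomes the next task.
async function runTeam(runners, task) {
  const results = [];
  let current = task;
  for (const runner of runners) {
    const result = await safeRun(runner, current);
    results.push(result);
    if (result.error) break; // assumption: abort the pipeline on failure
    current = result.output;
  }
  return results;
}
```

With two stubbed runners, a reviewer and a summarizer, `runTeam([reviewer, summarizer], "check hello.py")` resolves to two result objects, the second built from the first one's output. Each stage sees only the previous output as its task, in line with the task-only history described above.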

shuff57 and others added 8 commits June 5, 2026 12:33

fix(evolver): stopword filtering in prompt clustering

7095ce3

fix(memory): touch last_used_at on memory_load retrieval

2115e60

shuff57 closed this Jun 14, 2026

shuff57 deleted the feat/agents-phase2 branch June 14, 2026 19:35

shuff57 wants to merge 8 commits into Doorman11991:master from shuff57:feat/agents-phase2
