
Consolidated features: agents, live TUI feed, skills/memory/evolver, small-model compat fixes #97

Open
shuff57 wants to merge 36 commits into Doorman11991:master from shuff57:master

shuff57 commented Jun 14, 2026

Summary

Consolidates the feature and fix work from this fork onto one branch and proposes it upstream. Brings agent/subagent support, a live TUI activity feed, lazy skill loading, memory hygiene, an evolver, plus a batch of compatibility and stability fixes for small local models.

Features

Agents / orchestration

  • Phase 2 subagent + team support
  • Bundled default agent pack + teams, with loader fallback
  • General-purpose agent for open-ended/authoring tasks

TUI

Skills

  • Lazy index-first skill loading + use_skill tool
  • Discover nested and frontmatter-less skills

Memory

  • Hygiene tiers + MEMORY.md index
  • Touch last_used_at on memory retrieval

Evolver

  • /evolve proposes new skills from session friction
  • Stopword filtering in prompt clustering

Models / compatibility

  • Read-guard uncaps tool reads for large-window models
  • Wizard lists local models and reuses caller readline
  • minimax alias layer + quality-monitor parrot-loop fix
  • Consolidate mid-conversation system messages for strict chat templates (llama.cpp qwen 3.5 error #62)
  • Recover a final answer placed in reasoning_content (Cannot work with qwen3.6-35b-int4 #49)

Fixes

  • Stop the smallcode --mcp fork bomb (Memory leak: thousands of smallcode.js --mcp processes spawning #82)
  • Quality monitor validates the hallucination check against the full tool registry, not the current router category — fixes false "Tool X does not exist" steers that derailed small models mid-task
  • Route use_skill through tool category filters
  • Restore the lockfile after a stale dependabot/hono merge stripped valid entries (the hono bump itself touched no declared dependency; restoring prevents npm ci breakage)

Verification

  • node --test test/*.test.js: 473 passing, 0 failing
  • package-lock.json consistent (peer/override deps intact)

Scope

36 commits, 60 files (+5554 / -162).

🤖 Generated with Claude Code

Doorman11991 and others added 30 commits May 29, 2026 10:58
…r strict templates

Qwen3/Qwen3.5 chat templates under llama.cpp --jinja raise
"System message must be at the beginning.", and llama.cpp returns HTTP 400
whenever a system-role message appears at any index other than 0. This only
happens when tools are present, since that is when llama.cpp compiles the
template to build a tool-call grammar.

SmallCode injects system content mid-conversation (clarifier, plan request,
planner injection, path-validation warnings, skill activation, compaction),
so the messages array routinely had system entries past index 0.

New src/session/message_normalizer.js#consolidateSystemMessages() collapses
all system-role messages into a single leading one (order preserved, identical
blocks de-duplicated) and keeps only non-system turns after it. Applied in
both request builders (bin/smallcode.js and bin/model_client.js
chatCompletion) right before the body is sent.
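
A minimal sketch of that collapse, assuming OpenAI-style `{ role, content }` messages with string content. The function name matches the commit, but the body, and joining the surviving blocks with a blank line, are assumptions rather than the code in src/session/message_normalizer.js.

```javascript
// Sketch: collapse every system-role message into one leading system message.
// Assumes string content; the "\n\n" separator is an illustrative choice.
function consolidateSystemMessages(messages) {
  const systemBlocks = [];
  const seen = new Set();
  const rest = [];
  for (const msg of messages) {
    if (msg.role === 'system') {
      // Keep the first copy of each identical block, preserving order.
      if (!seen.has(msg.content)) {
        seen.add(msg.content);
        systemBlocks.push(msg.content);
      }
    } else {
      rest.push(msg); // non-system turns keep their relative order
    }
  }
  if (systemBlocks.length === 0) return rest;
  return [{ role: 'system', content: systemBlocks.join('\n\n') }, ...rest];
}
```

For `[system A, user, system B, system A]` the sketch returns one system message `"A\n\nB"` followed by the user turn, which satisfies a template that only accepts a system message at index 0.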

Verified E2E against a Qwen3 model: every tool-bearing request now carries
exactly one system message at index 0. +9 tests; full suite 157 passing.
Bumps [hono](https://github.com/honojs/hono) from 4.12.19 to 4.12.23.
- [Release notes](https://github.com/honojs/hono/releases)
- [Commits](honojs/hono@v4.12.19...v4.12.23)

---
updated-dependencies:
- dependency-name: hono
  dependency-version: 4.12.23
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Skills following the Claude Code layout (<skill-dir>/<name>/SKILL.md)
or written as plain .md without YAML frontmatter were silently skipped
in the standard skill dirs (.smallcode/skills, ~/.smallcode/skills,
~/.config/smallcode/skills). Both shapes now load; README-style files
(README/CHANGELOG/LICENSE/CONTRIBUTING) are filtered by name.
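
The two accepted layouts and the name filter can be sketched as below. The helper names, the single level of nesting, and letting a frontmatter `name:` override the path-derived name are assumptions for illustration, not SkillManager's code.

```javascript
// Names that often sit beside skills but are not skills (case-insensitive).
const NON_SKILL_NAMES = new Set(['README', 'CHANGELOG', 'LICENSE', 'CONTRIBUTING']);

// relPath is relative to a skill dir and '/'-separated.
// Accepts flat "<name>.md" and Claude Code-style "<name>/SKILL.md".
function isSkillPath(relPath) {
  const parts = relPath.split('/');
  const file = parts[parts.length - 1];
  if (!file.toLowerCase().endsWith('.md')) return false;
  if (parts.length === 2) return file === 'SKILL.md';
  if (parts.length !== 1) return false;
  return !NON_SKILL_NAMES.has(file.slice(0, -3).toUpperCase());
}

// Skill name: frontmatter "name:" when present, otherwise derived from the
// path, so plain .md files without frontmatter still get a usable name.
function skillName(relPath, text) {
  const fm = /^---\r?\n([\s\S]*?)\r?\n---/.exec(text);
  const declared = fm && /^name:\s*(.+)$/m.exec(fm[1]);
  if (declared) return declared[1].trim();
  const parts = relPath.split('/');
  return parts.length === 2 ? parts[0] : parts[0].slice(0, -3);
}
```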

Fixes Doorman11991#81

Constraint: no warning channel exists in SkillManager, so silent skips had no user-visible signal
Rejected: warn-on-skip only | users following Claude Code conventions expect these layouts to work
Confidence: high
Scope-risk: narrow
Not-tested: fullscreen TUI /skill list rendering (logic shared with classic mode)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two wizard UX fixes for local providers:

- Ollama / LM Studio: fetch the installed model list from the
  OpenAI-compatible /models endpoint and offer a numbered picker
  instead of a blank free-text prompt. Falls back to manual entry
  when the server is unreachable or returns nothing.
- Borrow the caller's readline interface when provided. The wizard
  previously created a second readline on the same stdin while the
  TUI's interface was still attached, so every keystroke echoed
  twice (duplicated letters while typing).
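
The model-list half of this change could look roughly like the sketch below. It assumes the server answers `GET /models` with the usual OpenAI list shape `{ data: [{ id }] }`; the helper names and the injectable `fetchImpl` parameter are hypothetical.

```javascript
// Fetch installed model ids from an OpenAI-compatible GET <baseUrl>/models.
// Any failure yields [] so the caller falls back to free-text entry.
async function listLocalModels(baseUrl, fetchImpl = globalThis.fetch) {
  try {
    const res = await fetchImpl(`${baseUrl.replace(/\/+$/, '')}/models`);
    if (!res.ok) return [];
    const body = await res.json();
    if (!Array.isArray(body.data)) return [];
    return body.data.map((m) => m.id).filter((id) => typeof id === 'string' && id);
  } catch {
    return []; // unreachable server: manual entry
  }
}

// Render a numbered picker, one model per line.
function formatPicker(models) {
  return models.map((id, i) => `  ${i + 1}) ${id}`).join('\n');
}

// A number in range picks from the list; anything else is a typed model name.
function resolveChoice(answer, models) {
  const trimmed = answer.trim();
  const n = Number(trimmed);
  if (Number.isInteger(n) && n >= 1 && n <= models.length) return models[n - 1];
  return trimmed;
}
```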

Constraint: wizard must keep working when invoked without a readline (tool path)
Rejected: pausing the caller readline around the wizard | borrowed rl is simpler and fixes echo at the source
Confidence: high
Scope-risk: narrow
Not-tested: fullscreen TUI wizard flow (mock rl there has no question method; pre-existing Doorman11991#80 territory)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Mouse selection scoped to the chat panel: drag highlights, release
copies (ANSI-stripped) to the system clipboard. The 10-char role
gutter and the tool panel never select. Dwelling at the panel edge
auto-scrolls so selections extend beyond the visible window.

Enables SGR 1002 (button-motion tracking) — 1000 alone reports no
drag events, which also explains why text was previously unselectable
in fullscreen mode at all.
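
For reference, a sketch of the xterm mouse modes involved and a parser for SGR-encoded reports. The escape sequences are standard xterm control sequences; pairing 1002 with 1006, the parser, and the gutter helper (which assumes the chat panel starts at column 1) are illustrative assumptions, not the TUI's actual code.

```javascript
// xterm private modes: 1002 = button-event tracking (motion while a button
// is held, i.e. drags), 1006 = SGR extended report encoding.
const ENABLE_MOUSE = '\x1b[?1002h\x1b[?1006h';
const DISABLE_MOUSE = '\x1b[?1006l\x1b[?1002l';

// Parse one SGR report: ESC [ < Cb ; Cx ; Cy, then M (press/motion) or m (release).
function parseSgrMouse(seq) {
  const m = /^\x1b\[<(\d+);(\d+);(\d+)([Mm])$/.exec(seq);
  if (!m) return null;
  const code = Number(m[1]);
  return {
    button: code & 3,          // 0 left, 1 middle, 2 right; ignore when wheel
    motion: (code & 32) !== 0, // set on drag events under mode 1002
    wheel: (code & 64) !== 0,  // 64/65 = wheel up/down
    x: Number(m[2]),           // 1-based column
    y: Number(m[3]),           // 1-based row
    release: m[4] === 'm',
  };
}

// Map a 1-based column to a text offset in the chat panel, or -1 if the
// pointer is in the 10-char role gutter.
function chatColumn(x, gutterWidth = 10) {
  const col = x - 1 - gutterWidth;
  return col >= 0 ? col : -1;
}
```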

Constraint: motion events only arrive while the pointer moves, so edge auto-scroll needs dwell (repeat events), not hover
Rejected: terminal-native selection via disabling mouse tracking | loses wheel scroll and cannot scope to the chat panel
Confidence: high
Scope-risk: narrow
Not-tested: macOS/Linux clipboard paths (pbcopy/xclip) — same pattern as existing Ctrl+V paste

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Create-mode evolver: deterministic friction extraction from saved
traces (repeated near-duplicate prompts, consecutive tool-retry
loops), LLM judgment routed to the strong tier, and ONE quarantined
skill draft per run written to .smallcode/skills/drafts/.

Drafts never auto-load; /evolve promote <name> moves them live.
Validation gates every write (name format, no frontmatter injection,
trigger rules); name collisions across live+draft+global dirs abort;
every create appends to .smallcode/evolver-audit.jsonl. The per-run
cap is structural — EvolverRun raises on a second create.
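
The validate-or-abort gate and the structural per-run cap can be sketched as a small class. `EvolverRun` is the commit's name, but the name regex, the frontmatter-injection rule (reject a bare `---` line), and the method shape are assumptions.

```javascript
// Hypothetical rules: lowercase kebab-case names, and no body line that is a
// bare "---" (which could open or close a YAML frontmatter block).
const NAME_RE = /^[a-z][a-z0-9-]{1,47}$/;

function validateDraft({ name, body }) {
  if (!NAME_RE.test(name)) throw new Error(`invalid skill name: ${name}`);
  if (/^---\s*$/m.test(body)) throw new Error('body must not contain frontmatter delimiters');
}

// One create per run, enforced structurally: a second create() throws.
class EvolverRun {
  constructor(existingNames) {
    this.existingNames = existingNames; // live + draft + global skill names
    this.created = null;
  }

  create(draft) {
    if (this.created) throw new Error(`this run already created "${this.created}"`);
    validateDraft(draft);
    if (this.existingNames.has(draft.name)) throw new Error(`name collision: ${draft.name}`);
    this.created = draft.name;
    // The real command would write .smallcode/skills/drafts/ and append to
    // .smallcode/evolver-audit.jsonl here; this sketch only records the name.
    return draft.name;
  }
}
```

In this sketch a rejected draft does not consume the run's single create; whether the real command counts failed attempts is not stated in the commit.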

Constraint: small models produce noisy judgments, so all fuzzy output passes validate-or-abort before any write
Rejected: plugin delivery | needs TraceRecorder + SkillManager internals unreachable from plugin dirs under binary installs
Confidence: high
Scope-risk: narrow
Directive: keep mechanics LLM-free — judgment stays in the command handler so mechanics remain unit-testable
Not-tested: strong-tier routing with a separately configured SMALLCODE_MODEL_STRONG endpoint

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
SkillManager now reads only frontmatter on startup (_index Map) and
loads bodies on demand via _loadBody(), cached in skills Map. This
cuts per-turn skill injection from ~60k chars (all bodies) to ~240
chars (compact index) for a typical 30-skill install.

New surface: getIndex() flat list, formatSkillIndex/formatSkillResult
in skill_index_formatter.js, use_skill tool (executor + tools.js).
getSkillContext() injects the index always; auto-matched bodies append
after, subject to the existing 4000-char cap.
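
The index-first pattern reduces to a map of frontmatter entries plus a cached on-demand body read. `_index`, `_loadBody`, `getIndex` and `formatSkillIndex` are names from the commit; the class name, the injected `readFile`, the `reads` counter and the one-line index format are hypothetical.

```javascript
// Sketch of index-first skill loading. readFile is injected so the example
// runs without a filesystem.
class LazySkills {
  constructor(entries, readFile) {
    this._index = new Map(entries.map((e) => [e.name, e])); // frontmatter only
    this.skills = new Map(); // name -> body, filled on demand
    this._readFile = readFile;
    this.reads = 0; // counts actual body reads, to show caching
  }

  getIndex() {
    return [...this._index.values()];
  }

  _loadBody(name) {
    if (this.skills.has(name)) return this.skills.get(name); // cached
    const entry = this._index.get(name);
    if (!entry) return null;
    this.reads++;
    const body = this._readFile(entry.path);
    this.skills.set(name, body);
    return body;
  }
}

// Compact one-line-per-skill index, injected every turn instead of all bodies.
function formatSkillIndex(index) {
  return index.map((s) => `- ${s.name}: ${s.description}`).join('\n');
}
```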

Public API (get/list/getAutoSkills/formatForPrompt/add/remove/
promoteDraft/listDrafts) is unchanged — all 335 tests pass.

Rejected: inject all bodies always | O(skills) context cost per turn
Constraint: existing tests must pass unmodified
Confidence: high
Scope-risk: moderate
Not-tested: live use_skill call by real model (requires interactive session)
Memory objects gain tier (hot|archive) and last_used_at fields
(backward-compat: backfilled on first hygiene run). runHygiene()
sweeps: hot+unused>60d→archive, archive>90d→forget, hot>20→archive
oldest 5. Adapter layer handles both SQLite budget-aware-mcp (via
update()) and fallback MemoryStore (mutate+save) without touching
node_modules.
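
The sweep rules above can be sketched as a pure function over memory objects. The rules come from the commit; measuring age from `last_used_at`, treating "oldest" as least recently used, backfilling from `created_at`, and running the over-cap rule after the age rules are all assumptions.

```javascript
const DAY = 24 * 60 * 60 * 1000;

function runHygiene(memories, now = Date.now()) {
  const result = [];
  for (const m of memories) {
    // Backfill fields missing on pre-hygiene records.
    const mem = { ...m, tier: m.tier ?? 'hot', last_used_at: m.last_used_at ?? m.created_at ?? now };
    const idleDays = (now - mem.last_used_at) / DAY;
    if (mem.tier === 'archive' && idleDays > 90) continue; // forget
    if (mem.tier === 'hot' && idleDays > 60) mem.tier = 'archive';
    result.push(mem);
  }
  // Over the hot cap (20): archive the 5 least recently used hot entries.
  const hot = result.filter((m) => m.tier === 'hot');
  if (hot.length > 20) {
    hot
      .sort((a, b) => a.last_used_at - b.last_used_at)
      .slice(0, 5)
      .forEach((m) => { m.tier = 'archive'; });
  }
  return result;
}
```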

Auto-runs silently (try-catch) at 3 session-save points. /memory
hygiene and /memory index subcommands added to commands.js. Generated
.smallcode/MEMORY.md is human-readable + git-diffable; never authoritative.

Rejected: markdown-tier replacement | loses FTS5/BM25
Rejected: hybrid two-source write | inconsistency risk
Constraint: do not modify node_modules/budget-aware-mcp
Confidence: high
Scope-risk: narrow
Not-tested: budget-aware-mcp setMeta path (no setMeta exists — update() used instead)
Field regression: rephrased prompts with filler drift (another/please/new) failed to cluster because stopwords diluted Jaccard below threshold. Real prompts from a live session pinned as a test.
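
A sketch of why the filter matters. The stopword list and tokenizer are illustrative assumptions; the evolver's real list is not shown in this PR.

```javascript
// Illustrative stopwords, including the "filler drift" words from the commit.
const STOPWORDS = new Set(['a', 'an', 'the', 'to', 'for', 'of', 'and', 'please', 'another', 'new']);

function tokens(prompt) {
  return new Set(
    prompt.toLowerCase().split(/[^a-z0-9_]+/).filter((w) => w && !STOPWORDS.has(w)),
  );
}

// Jaccard similarity |A ∩ B| / |A ∪ B| of two token sets.
function jaccard(a, b) {
  if (a.size === 0 && b.size === 0) return 1;
  let inter = 0;
  for (const w of a) if (b.has(w)) inter++;
  return inter / (a.size + b.size - inter);
}
```

With the filter, "add a test for the parser" and "please add another new test for parser" both reduce to {add, test, parser} and score 1.0. Unfiltered, the same pair shares 4 of 9 distinct words and scores about 0.44.
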
use_skill was defined in TOOLS but absent from both routers' category whitelists, so the model never saw it in routed mode. The skill index is injected every turn, so the tool rides along in every tool-bearing category (~80 tokens).
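
"Rides along in every tool-bearing category" can be expressed as a small helper over a category whitelist. The whitelist shape, category names and helper name are hypothetical.

```javascript
// Hypothetical whitelist shape: category name -> allowed tool names.
// Adds `tool` to every category that already exposes at least one tool.
function withToolInEveryCategory(categories, tool) {
  const out = {};
  for (const [category, tools] of Object.entries(categories)) {
    out[category] = tools.length > 0 && !tools.includes(tool) ? [...tools, tool] : [...tools];
  }
  return out;
}
```
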
Without this, actively retrieved old entries age out of the hot tier at 60d; hygiene tier sweeps need a real usage signal. Try-catch wrapped; a failed touch never breaks retrieval.
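
A best-effort touch around retrieval might look like this. The store interface (`search`, `touch`) is hypothetical; only the swallow-and-continue behavior comes from the commit.

```javascript
// Stamp last_used_at on every retrieved entry without letting a storage
// failure break retrieval. store.search/store.touch are hypothetical.
function retrieveAndTouch(store, query, now = Date.now()) {
  const hits = store.search(query);
  for (const hit of hits) {
    try {
      store.touch(hit.id, now);
    } catch {
      // A failed touch is swallowed; the results still return.
    }
  }
  return hits;
}
```
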
AgentRunner runs isolated sub-conversations (task-only history, narrowed
tools, MAX_STEPS=15, token budget min(8000,ctx*0.3), non-streaming).
TeamLoader/team_runner add sequential pipelines (output → next agent
input). spawn_agent tool wired in both compiled and two-stage routers.
/agents, /agent, /teams, /team commands + fullscreen palette entries.
33 new tests (14 loader, 19 runner/team), 380 total, 0 failures.
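
The budget, tool narrowing and sequential team pipeline can be sketched as three small functions. The budget formula is from the commit; rounding down, the helper names and the `runAgent(name, input)` runner signature are assumptions.

```javascript
// Sub-agent token budget per the commit: min(8000, 30% of the context window).
// Rounding down to a whole token count is an assumption.
function subAgentBudget(contextWindow) {
  return Math.min(8000, Math.floor(contextWindow * 0.3));
}

// Narrow the parent's tool list to the names an agent definition allows.
function narrowTools(allTools, allowedNames) {
  const allowed = new Set(allowedNames);
  return allTools.filter((tool) => allowed.has(tool.name));
}

// Sequential team pipeline: each agent's output is the next agent's input.
// runAgent(name, input) is a hypothetical async sub-agent runner.
async function runTeam(agentNames, task, runAgent) {
  let input = task;
  for (const name of agentNames) {
    input = await runAgent(name, input); // strictly one at a time
  }
  return input;
}
```

Running agents one after another, rather than with `Promise.all`, matches the commit's rejection of parallel team execution on small local hardware.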

Constraint: No yaml dep — team loader hand-parses inline-array yaml only
Constraint: No nested repair in sub-agents — bad JSON args → {} + tool error
Directive: Loaders skip drafts/ — Phase 3 writes agent/team drafts there
Rejected: Parallel team execution | local inference perf trap on small hw
Confidence: high
Scope-risk: moderate

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Boot the interactive fullscreen (or classic) TUI and immediately fire a
user-supplied prompt without entering non-interactive mode. The user can
continue typing after the seeded run completes.

Wired in three places:
- Arg parser: `else if (arg === '--task') { flags.task = args[++i]; }`
- main() dispatch: `!flags.task &&` guard prevents --task routing to runNonInteractive
- runTUI(): setImmediate(runAgentLoop) in both fullscreen and classic branches
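
A sketch of the parse-then-seed flow, reusing the `--task` branch quoted above. `startInteractive` and the positional-argument handling are hypothetical; the point is that `setImmediate` defers the seeded run until the UI setup code has returned.

```javascript
// Minimal parser covering only the --task branch; everything else is
// collected as positional input (an assumption for this sketch).
function parseArgs(args) {
  const flags = { positional: [] };
  for (let i = 0; i < args.length; i++) {
    const arg = args[i];
    if (arg === '--task') { flags.task = args[++i]; }
    else { flags.positional.push(arg); }
  }
  return flags;
}

// Boot the UI first, then seed the first run on a later event-loop turn so
// the TUI is drawn before the agent loop starts.
function startInteractive(flags, runAgentLoop) {
  const ui = { ready: true }; // stand-in for fullscreen/classic TUI setup
  if (flags.task) setImmediate(() => runAgentLoop(flags.task));
  return ui;
}
```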

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
SMALLCODE_MAX_TOOL_RESULT_CHARS now accepts 0/none/unlimited/off to disable
tool-result trimming entirely, and when unset the default scales with the model
window — large-window models (>=131072 tokens, e.g. minimax-m3's 512K) are left
uncapped, since the read guard only exists to protect small windows. Small
models keep the 8000-char guard.
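
The resolution order described above fits in one function. The keywords, the 131072-token threshold and the 8000-char default come from the commit; falling back to the default on an unparseable value and the truncation marker text are assumptions.

```javascript
const SMALL_MODEL_CAP = 8000; // chars: the small-window guard
const LARGE_WINDOW = 131072;  // tokens: at or above this, reads are uncapped

// Returns the tool-result char cap, or Infinity for "no trimming".
function resolveToolResultCap(envValue, contextWindow) {
  if (envValue !== undefined && String(envValue).trim() !== '') {
    const v = String(envValue).trim().toLowerCase();
    if (['0', 'none', 'unlimited', 'off'].includes(v)) return Infinity;
    const n = Number(v);
    if (Number.isInteger(n) && n > 0) return n;
    // Assumption: anything else falls through to the window-based default.
  }
  return contextWindow >= LARGE_WINDOW ? Infinity : SMALL_MODEL_CAP;
}

// Trim a tool result to the cap; the marker text is illustrative.
function trimToolResult(text, cap) {
  if (text.length <= cap) return text;
  return `${text.slice(0, cap)}\n[... ${text.length - cap} chars trimmed]`;
}
```

A caller would combine them as `trimToolResult(output, resolveToolResultCap(process.env.SMALLCODE_MAX_TOOL_RESULT_CHARS, windowTokens))`, where `windowTokens` stands for the configured model window.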

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- agent_loader.js / team_loader.js: load bundled package-root
  agents/ and teams/ dirs first, then project .smallcode/ dirs,
  so project files win over bundled defaults (mirrors skills.js
  pattern).
- agents/: 10 bundled agents — scout, code-engineer, critic,
  debugger, oracle, planner, qa-tester, red-team, documenter,
  librarian — concise SmallCode-native prompts with correct tool
  lists and model tiers.
- teams/: 3 bundled teams — build, review, debug.
- test/agent_loader.test.js: update 4 tests that assumed empty
  loader on an empty project; now assert bundled defaults are
  present and named agents are reachable.
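
The precedence rule is a last-writer-wins merge by name. The helper is hypothetical; it only illustrates why loading bundled directories first lets project files win.

```javascript
// Later sources override earlier ones by name, so loading bundled defaults
// first and project .smallcode/ definitions second lets the project win.
function mergeByName(...sources) {
  const merged = new Map();
  for (const source of sources) {
    for (const def of source) merged.set(def.name, def);
  }
  return merged;
}
```
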

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Adds a catch-all bundled agent geared toward content authoring and text
transformation (remaster/rewrite/summarize from source + a prompt) — the name
small models reach for by default (spawn_agent "general-purpose"). Follows a
named prompt/template when the task references one, reads source fully, and
writes the actual output artifact.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…1991#80,Doorman11991#93,Doorman11991#96)

Doorman11991#93 Add line/word navigation to the fullscreen input: Home/End (and
Ctrl+A/Ctrl+E), Ctrl+Left/Right word jumps, Ctrl+W / Ctrl+Backspace
word-delete-left, Ctrl+Delete word-delete-right, and forward Delete.
Previously only plain arrows and Backspace were handled.
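
Ctrl+Left/Right and the word-delete keys reduce to a few pure cursor functions. This sketch treats a word as a run of non-whitespace characters, which is an assumption; the editor in the commit may also split on punctuation.

```javascript
// Ctrl+Left: skip whitespace, then skip the word, moving left.
function wordLeft(text, cursor) {
  let i = cursor;
  while (i > 0 && /\s/.test(text[i - 1])) i--;
  while (i > 0 && !/\s/.test(text[i - 1])) i--;
  return i;
}

// Ctrl+Right: skip whitespace, then skip the word, moving right.
function wordRight(text, cursor) {
  let i = cursor;
  while (i < text.length && /\s/.test(text[i])) i++;
  while (i < text.length && !/\s/.test(text[i])) i++;
  return i;
}

// Ctrl+W / Ctrl+Backspace: delete from the previous word start to the cursor.
function deleteWordLeft(text, cursor) {
  const start = wordLeft(text, cursor);
  return { text: text.slice(0, start) + text.slice(cursor), cursor: start };
}

// Ctrl+Delete: delete from the cursor to the end of the next word.
function deleteWordRight(text, cursor) {
  const end = wordRight(text, cursor);
  return { text: text.slice(0, cursor) + text.slice(end), cursor };
}
```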

Doorman11991#96 Honor right-click as paste. SGR mouse tracking makes the terminal
forward right-clicks to the app instead of showing its native paste menu,
which broke right-click paste on Linux. Handle SGR button-2 release as a
clipboard paste; refactor the Ctrl+V clipboard read into a shared
_pasteFromClipboard() helper.

Doorman11991#80 Make /provider work in the fullscreen TUI. It now appears in the
command palette, and since its interactive wizard cannot run inside the
captured-stdout TUI, /provider shows current provider status plus guidance
to use /endpoint, /model, or the shell wizard instead of silently doing
nothing.

Doorman11991#76 Document the already-supported model response timeout
(SMALLCODE_MODEL_TIMEOUT / smallcode.toml [model].timeout) in the README.

Adds test/input_editing.test.js (10 cases). Full suite: 410 passing.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… warning line but keeps the corrective steer

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
)

Move the /provider TUI-routing logic out of the inline onCommand handler in
bin/smallcode.js into bin/tui_commands.js as a pure resolveTuiCommand(cmd)
function returning { command, guidance }. Behavior is unchanged — the guidance
text is byte-identical to the previous inline version — but the mapping is now
unit-testable without booting the full TUI.

Adds test/tui_commands.test.js (7 cases): bare /provider reroutes to status
+ guidance, status/--status/-s pass through, unknown subcommands reroute,
non-provider commands untouched, /providerx word-boundary, and defensive
handling of empty/undefined input.
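
The case list above maps onto a short pure function. `resolveTuiCommand` and its `{ command, guidance }` return shape come from the commit; the guidance wording and the rerouted command string `/provider status` are placeholders, not the byte-identical text in bin/tui_commands.js.

```javascript
// Placeholder wording; the real guidance string is not reproduced here.
const PROVIDER_GUIDANCE =
  'The /provider wizard cannot run inside the fullscreen TUI. ' +
  'Use /endpoint or /model here, or run the wizard from a shell.';

const STATUS_ARGS = new Set(['status', '--status', '-s']);

// Pure mapping from a typed command to { command, guidance }.
function resolveTuiCommand(cmd) {
  const text = typeof cmd === 'string' ? cmd.trim() : ''; // defensive input
  // The lookahead keeps "/providerx" from matching "/provider".
  const m = /^\/provider(?=\s|$)\s*(\S*)/.exec(text);
  if (!m) return { command: text, guidance: null };
  if (STATUS_ARGS.has(m[1])) return { command: text, guidance: null };
  // Bare /provider or an unknown subcommand: show status plus guidance.
  return { command: '/provider status', guidance: PROVIDER_GUIDANCE };
}
```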

Full suite: 417 passing.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two fixes for misbehaving small models (minimax):

1. Add src/tools/tool_aliases.js — maps OpenAI/Claude-style tool names
   (Read, Edit, Bash, Grep, Glob, str_replace, LS, …) to SmallCode's real
   tool names, re-keying argument names as needed (file_path→path,
   old_string→old_str, etc.). Wire normalizeToolCall() in bin/smallcode.js
   right before the quality monitor and tool dispatch so the monitor sees
   real names (no false hallucinated_tool) and dispatch executes real tools.
   Also filter out quality-monitor/quality_monitor echo calls before they
   can re-trigger the feedback loop.

2. Change all four [QUALITY-MONITOR] injection prefixes in
   src/governor/quality_monitor.js to "Self-check note:" — a plain-text
   prefix small models won't parrot back as a bracketed tool name.
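
The alias layer from item 1 can be sketched as a lookup table plus an argument re-keyer. `normalizeToolCall` is the commit's name; the table is a small illustrative subset whose target tool names (`read_file`, `edit_file`, `run_command`) and `new_string` mapping are hypothetical, and arguments are assumed to be already parsed from JSON.

```javascript
// Illustrative subset of an alias table; target names are hypothetical.
const TOOL_ALIASES = new Map([
  ['Read', { tool: 'read_file', args: { file_path: 'path' } }],
  ['Edit', { tool: 'edit_file', args: { file_path: 'path', old_string: 'old_str', new_string: 'new_str' } }],
  ['str_replace', { tool: 'edit_file', args: { old_string: 'old_str', new_string: 'new_str' } }],
  ['Bash', { tool: 'run_command', args: {} }],
]);

const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Rename an aliased call and re-key its (already parsed) argument object.
function normalizeToolCall(call) {
  const alias = TOOL_ALIASES.get(call.name);
  if (!alias) return call; // unknown or already-real names pass through
  const args = {};
  for (const [key, value] of Object.entries(call.arguments || {})) {
    args[hasOwn(alias.args, key) ? alias.args[key] : key] = value;
  }
  return { ...call, name: alias.tool, arguments: args };
}

// Drop calls that merely echo the monitor's label back as a tool name.
function dropMonitorEchoes(calls) {
  return calls.filter((c) => !/^quality[-_]monitor$/i.test(c.name));
}
```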

Tests: test/tool_aliases.test.js (31 cases); updated quality_monitor.test.js.
All 431 tests pass (npm test).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A self-referential entry in mcp.json (a server whose command relaunches
`smallcode --mcp`) caused unbounded process spawning: each smallcode MCP
server booted, connected to its own configured MCP servers, and spawned
another smallcode MCP server — recursively. Reporters saw thousands of
`node smallcode.js --mcp` processes and thousands of identical session
files, exhausting RAM until the OOM killer fired.

Two layers of defense:

1. Host-side (root cause): bin/smallcode.js no longer initializes the
   external MCP client when running in --mcp server mode. An MCP server
   must not also act as an MCP host. This alone breaks the recursion
   regardless of mcp.json contents, and stops the server from creating a
   session file on every spawn.

2. Config-side (defense-in-depth): MCPClient.loadConfig() now skips any
   server entry that would relaunch smallcode in --mcp mode, via the new
   static MCPClient._isSelfReference(). Catches direct, node, npx, and
   smolv2 forms while leaving legitimate third-party servers untouched.
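
MCPClient._isSelfReference() is the real check; the standalone function below only approximates the forms the commit lists (direct, node, npx, smolv2). Treating `smolv2` as an executable name and requiring a literal `--mcp` argument are assumptions.

```javascript
// True when an mcp.json entry would relaunch smallcode in --mcp server mode.
function isSelfReference(entry) {
  const parts = [String(entry.command ?? ''), ...(entry.args ?? []).map(String)];
  if (!parts.includes('--mcp')) return false;
  // Matches "smallcode", ".../smallcode.js", "smallcode.cmd", "smolv2", etc.
  return parts.some((p) => /(^|[\\/])(smallcode|smolv2)(\.js|\.cmd|\.exe)?$/i.test(p));
}
```

Filtering such entries in `loadConfig()` is only the second layer; the commit's primary fix is that an MCP server never acts as an MCP host, so recursion stops even if a self-reference slips through.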

Adds test/mcp_self_reference.test.js (4 cases) plus an on-disk loadConfig
integration check. Full suite: 452 passing.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…oorman11991#77)

Phase A of the live activity feed: the TUI now shows work as it happens
instead of only finished tool results. Per-feature toggles via a new /live
command, seeded from env.

- bin/live_settings.js: pure settings module (tools/context/stream/thinking),
  env-seeded, with resolveLiveCommand() for the /live command. tools+context
  default ON; stream+thinking default OFF (Phase B changes the request path).
- TUI: toolStart()/toolEnd() push an in-progress ⚙ line the moment a tool
  starts and rewrite it to ✓/✗ in place on completion (anchored against
  front-trimming). setContextMeter() renders a live "ctx 42% (13k/32k)"
  footer indicator.
- TokenMonitor: track lastPromptTokens; contextMeter(window) snapshot.
- Agent loop: dispatch site starts the live tool line; the console.log
  override finishes it and refreshes the context meter; meter also refreshes
  after each model turn. Classic behavior preserved when /live tools is off.
- /live command (commands.js) + palette entry.
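
The footer string can be produced by a small formatter. `TokenMonitor.contextMeter(window)` is the commit's method; this free function and its rounding (percent rounded, token counts divided by 1024) are assumptions chosen to reproduce the "ctx 42% (13k/32k)" example.

```javascript
// Hypothetical formatter for the live context footer.
function formatK(tokens) {
  return `${Math.round(tokens / 1024)}k`;
}

function formatContextMeter(promptTokens, contextWindow) {
  const pct = contextWindow > 0 ? Math.round((promptTokens / contextWindow) * 100) : 0;
  return `ctx ${pct}% (${formatK(promptTokens)}/${formatK(contextWindow)})`;
}
```
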

Adds test/live_settings.test.js (8) and test/live_tui.test.js (6). Plus an
end-to-end /live command check. Full suite: 465 passing.

Design: docs/plans/2026-06-14-live-activity-feed-design.md

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
shuff57 and others added 6 commits June 14, 2026 12:04
…n11991#77)

Opt-in live streaming of the model reply and a live reasoning preview, gated
behind /live stream and /live thinking (both default OFF).

- bin/stream_assembler.js: pure StreamAssembler + parseSSEBuffer. Folds
  streamed OpenAI chunks (content, reasoning_content, tool_call deltas, usage,
  finish_reason) back into the exact non-streaming `data` shape, so all
  downstream chatCompletion logic is untouched. Buffer parser tolerates lines
  split across network reads.
- chatCompletion: when /live stream is on AND a fullscreen TUI is attached,
  request stream:true (+ stream_options.include_usage), consume the SSE via the
  assembler, drive streamToken (and streamThinking when /live thinking is on)
  as tokens arrive, then return the reassembled data. Any streaming error falls
  back to what was assembled. The non-streaming default path is unchanged; the
  error-retry is forced non-streamed so its JSON parse stays valid.
- TUI: streamThinking() renders a single collapsing dimmed [thinking] line,
  reset at turn boundaries by endStream().
- Suppress the post-turn addChat('assistant') when content was already shown
  live, to avoid double-rendering.
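
A compact sketch of the two pieces named above. `StreamAssembler` and `parseSSEBuffer` are the commit's names, but the method names (`push`, `toData`), the parser's `{ events, rest }` return shape and the field handling are assumptions covering only the fields the list mentions.

```javascript
// Split an SSE buffer into complete "data:" payloads. The last, possibly
// incomplete line is returned as `rest` and prepended to the next read.
function parseSSEBuffer(buffer) {
  const lines = buffer.split('\n');
  const rest = lines.pop();
  const events = [];
  for (const raw of lines) {
    const line = raw.trim(); // also strips a trailing \r
    if (!line.startsWith('data:')) continue;
    const payload = line.slice(5).trim();
    if (payload === '[DONE]') events.push({ done: true });
    else events.push(JSON.parse(payload)); // malformed JSON throws to the caller
  }
  return { events, rest };
}

// Fold streamed chunks back into the non-streaming response shape.
class StreamAssembler {
  constructor() {
    this.content = '';
    this.reasoning = '';
    this.toolCalls = [];
    this.finishReason = null;
    this.usage = null;
  }

  push(chunk) {
    if (chunk.usage) this.usage = chunk.usage; // usage-only chunk has no choices
    const choice = chunk.choices && chunk.choices[0];
    if (!choice) return;
    const delta = choice.delta || {};
    if (delta.content) this.content += delta.content;
    if (delta.reasoning_content) this.reasoning += delta.reasoning_content;
    for (const tc of delta.tool_calls || []) {
      // Parallel tool calls are keyed by their stream index.
      const slot = (this.toolCalls[tc.index] ??= { id: '', type: 'function', function: { name: '', arguments: '' } });
      if (tc.id) slot.id = tc.id;
      if (tc.function?.name) slot.function.name += tc.function.name;
      if (tc.function?.arguments) slot.function.arguments += tc.function.arguments;
    }
    if (choice.finish_reason) this.finishReason = choice.finish_reason;
  }

  toData() {
    const message = { role: 'assistant', content: this.content || null };
    if (this.reasoning) message.reasoning_content = this.reasoning;
    const calls = this.toolCalls.filter(Boolean);
    if (calls.length) message.tool_calls = calls;
    return { choices: [{ index: 0, message, finish_reason: this.finishReason }], usage: this.usage };
  }
}
```

A caller appends each network read to a buffer, runs `parseSSEBuffer`, keeps `rest` for the next read, and feeds every non-`done` event to `push`; `toData()` then returns the object the non-streaming path would have parsed.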

Adds test/stream_assembler.test.js (8) incl. split-buffer reconstruction,
parallel tool_calls, reasoning routing, and usage capture. Full suite: 473.

Note: the SSE assembler is fully unit-tested, but the live streaming path has
not been exercised against a real streaming endpoint here — smoke-test with
`/live stream on` against your local model.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…registry

The hallucination check scoped knownTools to currentToolCategory, causing
false "Tool X does not exist" steers when a real tool was invoked from a
different category (the dispatcher widens to all essential tools and runs
it). Validate against the full registry (getAllTools(config, null)) instead.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… all-features

# Conflicts:
#	bin/smallcode.js
#	src/session/message_normalizer.js
#	test/message_normalizer.test.js
The dependabot/hono branch was cut from an old base; its 3-way merge stripped
valid lockfile entries (playwright-extra, puppeteer-extra-plugin-stealth,
rimraf, fs-extra, and ~25 others) that would break `npm ci`. hono is not a
declared dependency, so the bump itself is a no-op. Restore package-lock.json
to integration's current, correct state.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>