Consolidated features: agents, live TUI feed, skills/memory/evolver, small-model compat fixes#97
Open
shuff57 wants to merge 36 commits into
Open
Consolidated features: agents, live TUI feed, skills/memory/evolver, small-model compat fixes#97shuff57 wants to merge 36 commits into
shuff57 wants to merge 36 commits into
Conversation
…r strict templates Qwen3/Qwen3.5 chat templates under llama.cpp --jinja raise "System message must be at the beginning." and llama.cpp 400s when a system-role message appears at any index but 0 — but only when tools are present (that's when it compiles the template to build a tool-call grammar). SmallCode injects system content mid-conversation (clarifier, plan request, planner injection, path-validation warnings, skill activation, compaction), so the messages array routinely had system entries past index 0. New src/session/message_normalizer.js#consolidateSystemMessages() collapses all system-role messages into a single leading one (order preserved, identical blocks de-duplicated) and keeps only non-system turns after it. Applied in both request builders (bin/smallcode.js and bin/model_client.js chatCompletion) right before the body is sent. Verified E2E against a Qwen3 model: every tool-bearing request now carries exactly one system message at index 0. +9 tests; full suite 157 passing.
Bumps [hono](https://github.com/honojs/hono) from 4.12.19 to 4.12.23. - [Release notes](https://github.com/honojs/hono/releases) - [Commits](honojs/hono@v4.12.19...v4.12.23) --- updated-dependencies: - dependency-name: hono dependency-version: 4.12.23 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>
Skills following the Claude Code layout (<skill-dir>/<name>/SKILL.md) or written as plain .md without YAML frontmatter were silently skipped in the standard skill dirs (.smallcode/skills, ~/.smallcode/skills, ~/.config/smallcode/skills). Both shapes now load; README-style files (README/CHANGELOG/LICENSE/CONTRIBUTING) are filtered by name. Fixes Doorman11991#81 Constraint: no warning channel exists in SkillManager, so silent skips had no user-visible signal Rejected: warn-on-skip only | users following Claude Code conventions expect these layouts to work Confidence: high Scope-risk: narrow Not-tested: fullscreen TUI /skill list rendering (logic shared with classic mode) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two wizard UX fixes for local providers: - Ollama / LM Studio: fetch the installed model list from the OpenAI-compatible /models endpoint and offer a numbered picker instead of a blank free-text prompt. Falls back to manual entry when the server is unreachable or returns nothing. - Borrow the caller''s readline interface when provided. The wizard previously created a second readline on the same stdin while the TUI''s interface was still attached, so every keystroke echoed twice (duplicated letters while typing). Constraint: wizard must keep working when invoked without a readline (tool path) Rejected: pausing the caller readline around the wizard | borrowed rl is simpler and fixes echo at the source Confidence: high Scope-risk: narrow Not-tested: fullscreen TUI wizard flow (mock rl there has no question method; pre-existing Doorman11991#80 territory) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Mouse selection scoped to the chat panel: drag highlights, release copies (ANSI-stripped) to the system clipboard. The 10-char role gutter and the tool panel never select. Dwelling at the panel edge auto-scrolls so selections extend beyond the visible window. Enables SGR 1002 (button-motion tracking) — 1000 alone reports no drag events, which also explains why text was previously unselectable in fullscreen mode at all. Constraint: motion events only arrive while the pointer moves, so edge auto-scroll needs dwell (repeat events), not hover Rejected: terminal-native selection via disabling mouse tracking | loses wheel scroll and cannot scope to the chat panel Confidence: high Scope-risk: narrow Not-tested: macOS/Linux clipboard paths (pbcopy/xclip) — same pattern as existing Ctrl+V paste Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Create-mode evolver: deterministic friction extraction from saved traces (repeated near-duplicate prompts, consecutive tool-retry loops), LLM judgment routed to the strong tier, and ONE quarantined skill draft per run written to .smallcode/skills/drafts/. Drafts never auto-load; /evolve promote <name> moves them live. Validation gates every write (name format, no frontmatter injection, trigger rules); name collisions across live+draft+global dirs abort; every create appends to .smallcode/evolver-audit.jsonl. The per-run cap is structural — EvolverRun raises on a second create. Constraint: small models produce noisy judgments, so all fuzzy output passes validate-or-abort before any write Rejected: plugin delivery | needs TraceRecorder + SkillManager internals unreachable from plugin dirs under binary installs Confidence: high Scope-risk: narrow Directive: keep mechanics LLM-free — judgment stays in the command handler so mechanics remain unit-testable Not-tested: strong-tier routing with a separately configured SMALLCODE_MODEL_STRONG endpoint Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
SkillManager now reads only frontmatter on startup (_index Map) and loads bodies on demand via _loadBody(), cached in skills Map. This cuts per-turn skill injection from ~60k chars (all bodies) to ~240 chars (compact index) for a typical 30-skill install. New surface: getIndex() flat list, formatSkillIndex/formatSkillResult in skill_index_formatter.js, use_skill tool (executor + tools.js). getSkillContext() injects the index always; auto-matched bodies append after, subject to the existing 4000-char cap. Public API (get/list/getAutoSkills/formatForPrompt/add/remove/ promoteDraft/listDrafts) is unchanged — all 335 tests pass. Rejected: inject all bodies always | O(skills) context cost per turn Constraint: existing tests must pass unmodified Confidence: high Scope-risk: moderate Not-tested: live use_skill call by real model (requires interactive session)
Memory objects gain tier (hot|archive) and last_used_at fields (backward-compat: backfilled on first hygiene run). runHygiene() sweeps: hot+unused>60d→archive, archive>90d→forget, hot>20→archive oldest 5. Adapter layer handles both SQLite budget-aware-mcp (via update()) and fallback MemoryStore (mutate+save) without touching node_modules. Auto-runs silently (try-catch) at 3 session-save points. /memory hygiene and /memory index subcommands added to commands.js. Generated .smallcode/MEMORY.md is human-readable + git-diffable; never authoritative. Rejected: markdown-tier replacement | loses FTS5/BM25 Rejected: hybrid two-source write | inconsistency risk Constraint: do not modify node_modules/budget-aware-mcp Confidence: high Scope-risk: narrow Not-tested: budget-aware-mcp setMeta path (no setMeta exists — update() used instead)
Field regression: rephrased prompts with filler drift (another/please/new) failed to cluster because stopwords diluted Jaccard below threshold. Real prompts from a live session pinned as a test.
use_skill was defined in TOOLS but absent from both routers' category whitelists, so the model never saw it in routed mode. The skill index is injected every turn, so the tool rides along in every tool-bearing category (~80 tokens).
Without this, actively-retrieved old entries age out of the hot tier at 60d — hygiene tier sweeps need real usage signal. Try-catch wrapped; a failed touch never breaks retrieval.
AgentRunner runs isolated sub-conversations (task-only history, narrowed
tools, MAX_STEPS=15, token budget min(8000,ctx*0.3), non-streaming).
TeamLoader/team_runner add sequential pipelines (output → next agent
input). spawn_agent tool wired in both compiled and two-stage routers.
/agents, /agent, /teams, /team commands + fullscreen palette entries.
33 new tests (14 loader, 19 runner/team), 380 total, 0 failures.
Constraint: No yaml dep — team loader hand-parses inline-array yaml only
Constraint: No nested repair in sub-agents — bad JSON args → {} + tool error
Directive: Loaders skip drafts/ — Phase 3 writes agent/team drafts there
Rejected: Parallel team execution | local inference perf trap on small hw
Confidence: high
Scope-risk: moderate
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Boot the interactive fullscreen (or classic) TUI and immediately fire a
user-supplied prompt without entering non-interactive mode. The user can
continue typing after the seeded run completes.
Wired in three places:
- Arg parser: `else if (arg === '--task') { flags.task = args[++i]; }`
- main() dispatch: `!flags.task &&` guard prevents --task routing to runNonInteractive
- runTUI(): setImmediate(runAgentLoop) in both fullscreen and classic branches
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
SMALLCODE_MAX_TOOL_RESULT_CHARS now accepts 0/none/unlimited/off to disable tool-result trimming entirely, and when unset the default scales with the model window — large-window models (>=131072 tokens, e.g. minimax-m3's 512K) are left uncapped, since the read guard only exists to protect small windows. Small models keep the 8000-char guard. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- agent_loader.js / team_loader.js: load bundled package-root agents/ and teams/ dirs first, then project .smallcode/ dirs, so project files win over bundled defaults (mirrors skills.js pattern). - agents/: 10 bundled agents — scout, code-engineer, critic, debugger, oracle, planner, qa-tester, red-team, documenter, librarian — concise SmallCode-native prompts with correct tool lists and model tiers. - teams/: 3 bundled teams — build, review, debug. - test/agent_loader.test.js: update 4 tests that assumed empty loader on an empty project; now assert bundled defaults are present and named agents are reachable. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Adds a catch-all bundled agent geared toward content authoring and text transformation (remaster/rewrite/summarize from source + a prompt) — the name small models reach for by default (spawn_agent "general-purpose"). Follows a named prompt/template when the task references one, reads source fully, and writes the actual output artifact. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…1991#80,Doorman11991#93,Doorman11991#96) Doorman11991#93 Add line/word navigation to the fullscreen input: Home/End (and Ctrl+A/Ctrl+E), Ctrl+Left/Right word jumps, Ctrl+W / Ctrl+Backspace word-delete-left, Ctrl+Delete word-delete-right, and forward Delete. Previously only plain arrows and Backspace were handled. Doorman11991#96 Honor right-click as paste. SGR mouse tracking makes the terminal forward right-clicks to the app instead of showing its native paste menu, which broke right-click paste on Linux. Handle SGR button-2 release as a clipboard paste; refactor the Ctrl+V clipboard read into a shared _pasteFromClipboard() helper. Doorman11991#80 Make /provider work in the fullscreen TUI. It now appears in the command palette, and since its interactive wizard cannot run inside the captured-stdout TUI, /provider shows current provider status plus guidance to use /endpoint, /model, or the shell wizard instead of silently doing nothing. Doorman11991#76 Document the already-supported model response timeout (SMALLCODE_MODEL_TIMEOUT / smallcode.toml [model].timeout) in the README. Adds test/input_editing.test.js (10 cases). Full suite: 410 passing. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… warning line but keeps the corrective steer Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
) Move the /provider TUI-routing logic out of the inline onCommand handler in bin/smallcode.js into bin/tui_commands.js as a pure resolveTuiCommand(cmd) function returning { command, guidance }. Behavior is unchanged — the guidance text is byte-identical to the previous inline version — but the mapping is now unit-testable without booting the full TUI. Adds test/tui_commands.test.js (7 cases): bare /provider reroutes to status + guidance, status/--status/-s pass through, unknown subcommands reroute, non-provider commands untouched, /providerx word-boundary, and defensive handling of empty/undefined input. Full suite: 417 passing. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two fixes for small models (minimax) misbehaving: 1. Add src/tools/tool_aliases.js — maps OpenAI/Claude-style tool names (Read, Edit, Bash, Grep, Glob, str_replace, LS, …) to SmallCode's real tool names, re-keying argument names as needed (file_path→path, old_string→old_str, etc.). Wire normalizeToolCall() in bin/smallcode.js right before the quality monitor and tool dispatch so the monitor sees real names (no false hallucinated_tool) and dispatch executes real tools. Also filter out quality-monitor/quality_monitor echo calls before they can re-trigger the feedback loop. 2. Change all four [QUALITY-MONITOR] injection prefixes in src/governor/quality_monitor.js to "Self-check note:" — a plain-text prefix small models won't parrot back as a bracketed tool name. Tests: test/tool_aliases.test.js (31 cases); updated quality_monitor.test.js. All 431 tests pass (npm test). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…1991#80,Doorman11991#93,Doorman11991#96) Doorman11991#93 Add line/word navigation to the fullscreen input: Home/End (and Ctrl+A/Ctrl+E), Ctrl+Left/Right word jumps, Ctrl+W / Ctrl+Backspace word-delete-left, Ctrl+Delete word-delete-right, and forward Delete. Previously only plain arrows and Backspace were handled. Doorman11991#96 Honor right-click as paste. SGR mouse tracking makes the terminal forward right-clicks to the app instead of showing its native paste menu, which broke right-click paste on Linux. Handle SGR button-2 release as a clipboard paste; refactor the Ctrl+V clipboard read into a shared _pasteFromClipboard() helper. Doorman11991#80 Make /provider work in the fullscreen TUI. It now appears in the command palette, and since its interactive wizard cannot run inside the captured-stdout TUI, /provider shows current provider status plus guidance to use /endpoint, /model, or the shell wizard instead of silently doing nothing. Doorman11991#76 Document the already-supported model response timeout (SMALLCODE_MODEL_TIMEOUT / smallcode.toml [model].timeout) in the README. Adds test/input_editing.test.js (10 cases). Full suite: 410 passing. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
) Move the /provider TUI-routing logic out of the inline onCommand handler in bin/smallcode.js into bin/tui_commands.js as a pure resolveTuiCommand(cmd) function returning { command, guidance }. Behavior is unchanged — the guidance text is byte-identical to the previous inline version — but the mapping is now unit-testable without booting the full TUI. Adds test/tui_commands.test.js (7 cases): bare /provider reroutes to status + guidance, status/--status/-s pass through, unknown subcommands reroute, non-provider commands untouched, /providerx word-boundary, and defensive handling of empty/undefined input. Full suite: 417 passing. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A self-referential entry in mcp.json (a server whose command relaunches `smallcode --mcp`) caused unbounded process spawning: each smallcode MCP server booted, connected to its own configured MCP servers, and spawned another smallcode MCP server — recursively. Reporters saw thousands of `node smallcode.js --mcp` processes and thousands of identical session files, exhausting RAM until the OOM killer fired. Two layers of defense: 1. Host-side (root cause): bin/smallcode.js no longer initializes the external MCP client when running in --mcp server mode. An MCP server must not also act as an MCP host. This alone breaks the recursion regardless of mcp.json contents, and stops the server from creating a session file on every spawn. 2. Config-side (defense-in-depth): MCPClient.loadConfig() now skips any server entry that would relaunch smallcode in --mcp mode, via the new static MCPClient._isSelfReference(). Catches direct, node, npx, and smolv2 forms while leaving legitimate third-party servers untouched. Adds test/mcp_self_reference.test.js (4 cases) plus an on-disk loadConfig integration check. Full suite: 452 passing. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…oorman11991#77) Phase A of the live activity feed: the TUI now shows work as it happens instead of only finished tool results. Per-feature toggles via a new /live command, seeded from env. - bin/live_settings.js: pure settings module (tools/context/stream/thinking), env-seeded, with resolveLiveCommand() for the /live command. tools+context default ON; stream+thinking default OFF (Phase B changes the request path). - TUI: toolStart()/toolEnd() push an in-progress ⚙ line the moment a tool starts and rewrite it to ✓/✗ in place on completion (anchored against front-trimming). setContextMeter() renders a live "ctx 42% (13k/32k)" footer indicator. - TokenMonitor: track lastPromptTokens; contextMeter(window) snapshot. - Agent loop: dispatch site starts the live tool line; the console.log override finishes it and refreshes the context meter; meter also refreshes after each model turn. Classic behavior preserved when /live tools is off. - /live command (commands.js) + palette entry. Adds test/live_settings.test.js (8) and test/live_tui.test.js (6). Plus an end-to-end /live command check. Full suite: 465 passing. Design: docs/plans/2026-06-14-live-activity-feed-design.md Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…n11991#77) Opt-in live streaming of the model reply and a live reasoning preview, gated behind /live stream and /live thinking (both default OFF). - bin/stream_assembler.js: pure StreamAssembler + parseSSEBuffer. Folds streamed OpenAI chunks (content, reasoning_content, tool_call deltas, usage, finish_reason) back into the exact non-streaming `data` shape, so all downstream chatCompletion logic is untouched. Buffer parser tolerates lines split across network reads. - chatCompletion: when /live stream is on AND a fullscreen TUI is attached, request stream:true (+ stream_options.include_usage), consume the SSE via the assembler, drive streamToken (and streamThinking when /live thinking is on) as tokens arrive, then return the reassembled data. Any streaming error falls back to what was assembled. The non-streaming default path is unchanged; the error-retry is forced non-streamed so its JSON parse stays valid. - TUI: streamThinking() renders a single collapsing dimmed [thinking] line, reset at turn boundaries by endStream(). - Suppress the post-turn addChat('assistant') when content was already shown live, to avoid double-rendering. Adds test/stream_assembler.test.js (8) incl. split-buffer reconstruction, parallel tool_calls, reasoning routing, and usage capture. Full suite: 473. Note: the SSE assembler is fully unit-tested, but the live streaming path has not been exercised against a real streaming endpoint here — smoke-test with `/live stream on` against your local model. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…registry The hallucination check scoped knownTools to currentToolCategory, causing false "Tool X does not exist" steers when a real tool was invoked from a different category (the dispatcher widens to all essential tools and runs it). Validate against the full registry (getAllTools(config, null)) instead. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… all-features # Conflicts: # bin/smallcode.js # src/session/message_normalizer.js # test/message_normalizer.test.js
…2.23' into all-features
The dependabot/hono branch was cut from an old base; its 3-way merge stripped valid lockfile entries (playwright-extra, puppeteer-extra-plugin-stealth, rimraf, fs-extra, and ~25 others) that would break `npm ci`. hono is not a declared dependency, so the bump itself is a no-op. Restore package-lock.json to integration's current, correct state. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Consolidates the feature and fix work from this fork onto one branch and proposes it upstream. Brings agent/subagent support, a live TUI activity feed, lazy skill loading, memory hygiene, an evolver, plus a batch of compatibility and stability fixes for small local models.
Features
Agents / orchestration
TUI
--taskflag to boot the TUI with an auto-seeded prompt/providerin the TUI (v1.6.0 - /provider command missing #80, Keyboard navigation does not follow either bash or windows conventions #93, [Linux Mint] Unable to right-click in Smallcode Terminal window #96)resolveTuiCommandhelper extracted + tests (v1.6.0 - /provider command missing #80)Skills
use_skilltoolMemory
MEMORY.mdindexlast_used_aton memory retrievalEvolver
/evolveproposes new skills from session frictionModels / compatibility
reasoning_content(Cannot work with qwen3.6-35b-int4 #49)Fixes
smallcode --mcpfork bomb (Memory leak: thousands of smallcode.js --mcp processes spawning #82)use_skillthrough tool category filtersnpm cibreakage)Verification
node --test test/*.test.js→ 473 passing, 0 failingpackage-lock.jsonconsistent (peer/override deps intact)Scope
36 commits, 60 files (+5554 / -162).
🤖 Generated with Claude Code