Round 4: AI copilot on the board, remote & mobile, insights by lacraig2 · Pull Request #1 · rehosting/muse

lacraig2 · 2026-06-17T01:48:54Z

Round 4 of the muse build-out, in three independent tracks. 249 backend tests pass; ruff + frontend build clean. Every track was verified live against the real corpus.

✦ AI copilot on the board (commit 1)

Headless claude -p brought to mission control, all on-demand or budget-capped (worker concurrency stays 1; shares the Max 5h window):

Draft reply — ✦ drafts the next message from the session's recent context + your reply style + the live pane; you edit and send. sanitize_draft strips fences and refuses a leading / or ! (injection defense); prompts carry an untrusted-data clause. Live: 8s / $0.09, clean directive output.
Diagnose — ✦ on a health-bad card explains the failure + the unstick, saved as a note.
Triage — one batched pass labels every needs-attention card; merged client-side so the live ticker stays AI-free.
Autopilot ai mode — two-phase in the controller (the only tmux-write site): enqueue a draft, then re-check idle / unchanged / rate-limit / MUSE_AI_DAILY_BUDGET_USD (default $2) before typing. 8 tests cover the discard matrix; never answers permission prompts.
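A minimal sketch of the draft sanitizing described above, not the actual muse code. The function body, the fence regex, and the choice to return None on refusal are assumptions; only the behavior (strip fences, refuse a leading `/` or `!`) comes from this PR.

```python
import re
from typing import Optional

# Matches a draft that is wrapped entirely in one Markdown code fence.
_FENCE_RE = re.compile(r"^```[^\n]*\n(.*?)\n?```\s*$", re.DOTALL)


def sanitize_draft(text: str) -> Optional[str]:
    """Return a cleaned draft, or None when the draft must be refused."""
    draft = text.strip()
    match = _FENCE_RE.match(draft)
    if match:
        # Keep only the fenced body; the model sometimes wraps its answer.
        draft = match.group(1).strip()
    # A leading "/" or "!" would be read as a slash command or bash mode
    # once typed into the pane, so those drafts are dropped outright.
    if not draft or draft[0] in "/!":
        return None
    return draft
```

Refusing rather than escaping keeps the rule easy to audit: a draft either goes to the ReplyBox unchanged or does not go at all.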

📱 Remote & mobile (commit 2)

Token auth — pure-ASGI middleware (never BaseHTTPMiddleware → SSE-safe). Loopback bypasses by default so local UI/scripts/MCP are untouched; remote uses a Bearer header or login cookie. Token auto-generates to ~/.muse/auth_token on a public bind. Verified the full 401/200 matrix live on a real 0.0.0.0 bind, incl. SSE short-circuiting before any bytes.
Mobile — navbar scrolls, board single-columns, viewer defaults to conversation-only on phones; PWA installable.
MUSE_PUBLIC_URL makes ntfy taps + MCP links open the reachable address.

📊 Insights (commit 3)

New page: "what did my sessions produce" (vs Stats' "what did I spend"), pure compute over warm tables — no transcript parsing on the request path (0.46s warm over 57 sessions).

Shipped vs burned — productive (commits/$) vs wasteful (cost + bad health + 0 commits); per-model & per-project commits-per-$10. Only high+medium provenance counts as shipped; low shown separately (evidence, not authorship proof).
Quantile cost heatmap (click → that day's journal), 24×7 when-you-work matrix (activity/errors/commits), per-project timeline of session bars + commit ticks.

Cuts (deliberate)

Per-card AI triage / any per-tick AI; permission-dialog automation; service worker; hamburger menu; touch-drag resize; timeline zoom; derived outcomes table.

🤖 Generated with Claude Code

…opilot

New `claude -p` job kinds (draft_reply, diagnose, triage) through the existing worker + _execute_ai_job seam:
- pack_for_reply: small recent-weighted digest + reentry brief + the user's last few replies (style) + the live tmux pane capture.
- pack_for_diagnose: digest + detected failure patterns (retry loop / spiral / denials) → markdown, routed to an AI note.
- pack_for_triage: ≤8 attention cards → one fenced-JSON line each.

sanitize_draft strips fences and refuses leading `/`/`!` (slash/bash-mode injection via transcript content); prompts carry an untrusted-data clause.

UI: ✦ draft prefills ReplyBox (the human always edits and sends via /respond, so it's safe even on waiting cards); ✦ diagnose on health-bad cards; ✦ triage on the board header, merged client-side so the ticker's diffed cards stay AI-free.

Autopilot idle_mode="ai" — two-phase in the controller (the only tmux-write site): Phase A enqueues a draft at the message-mode injection point; Phase B, a later tick, re-checks idle / no waiting_for / updated_at unchanged / job <10min / rate-limit banner / max_sends / daily budget before typing, else logs ai_discarded with the reason. Budget MUSE_AI_DAILY_BUDGET_USD (default $2, ≤0 disables). Callables wired in main.py (track B commit) like on_reset.

Also adds session_service.get_insights/get_insights_timeline seams used by the insights track.
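The Phase B re-check can be pictured as one function that returns the first reason to discard, or None when it is safe to type. This is a sketch under assumptions: the `Snapshot` fields, the argument names, and the reason strings are hypothetical; the list of checks and the 10-minute job age come from the commit message.

```python
from dataclasses import dataclass
from typing import Optional

MAX_JOB_AGE_S = 600.0  # "job <10min" in the commit message


@dataclass
class Snapshot:
    """Hypothetical view of a session at the Phase B tick."""
    idle: bool
    waiting_for: Optional[str]
    updated_at: float
    rate_limited: bool
    sends: int


def discard_reason(snap: Snapshot, *, drafted_updated_at: float,
                   job_enqueued_at: float, now: float, max_sends: int,
                   spent_usd: float, budget_usd: float) -> Optional[str]:
    """Return why the draft must be discarded, or None if it may be typed."""
    if budget_usd <= 0:
        return "ai_disabled"           # a budget of <= 0 disables AI mode
    if not snap.idle:
        return "not_idle"
    if snap.waiting_for is not None:
        return "waiting_for"           # never answer a pending prompt
    if snap.updated_at != drafted_updated_at:
        return "session_changed"       # transcript moved since Phase A
    if now - job_enqueued_at >= MAX_JOB_AGE_S:
        return "job_stale"
    if snap.rate_limited:
        return "rate_limited"
    if snap.sends >= max_sends:
        return "max_sends"
    if spent_usd >= budget_usd:
        return "budget_exhausted"
    return None
```

Returning a reason string rather than a bool lets the caller log ai_discarded with the cause, as the commit describes.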

Auth is a PURE ASGI middleware (never BaseHTTPMiddleware — it buffers streaming responses and breaks SSE + the /mcp sub-app): protects /api/* and /mcp*, leaves the SPA shell public, and accepts a Bearer header, the muse_auth HttpOnly cookie (EventSource can't send headers; cookies ride along), or a loopback client. Loopback bypass defaults ON, so the local UI, vite dev proxy, scripts, and local Claude Code MCP are untouched with zero config. Token from MUSE_AUTH_TOKEN or auto-generated into ~/.muse/auth_token (chmod 600) only when binding non-loopback. routers/auth.py: login (204 + SameSite=Lax cookie so ntfy taps authenticate; Secure only over https), logout, status. Frontend LoginGate overlays a token prompt on a 401 event or an unauthenticated probe; the board SSE hook probes status on error instead of reconnecting blindly. Settings.base_url (MUSE_PUBLIC_URL) makes ntfy click-throughs and MCP-cited links open the reachable (tailscale) address. README gains a remote-access section. main.py also wires the autopilot AI callables and registers the auth + insights routers, and serves real dist files (manifest/icons) from the catch-all with a no-cache index (no service worker → no stale shell). PWA: manifest.webmanifest + ✻ icons (192/512/maskable/apple-180), index.html meta/links.

…, timeline)

New Insights page answering "what did my sessions produce" (Stats answers "what did I spend"). insights_outcomes.py is pure compute over tables the alerts tick already keeps warm — board_rollup (cost/tokens), session_windows (duration), session_health, commit_session⋈git_commits (provenance), usage_daily (calendar), file_activity error times, commit times — so nothing parses a transcript on the request path. Service get_insights is TTL-cached like get_stats.
- Shipped vs burned: most-productive (commits per $, cost-floored) and most-wasteful (cost + bad health + zero commits); per-model/per-project commits-per-$10. Confidence policy: only high+medium count as shipped, low is shown separately and never enters a ratio (provenance is evidence, not proof).
- Calendar heatmap (cost by local day, quantile intensity, click → that day's journal — JournalPage now reads ?day=).
- 24×7 hour×weekday matrix (activity / errors / commits), local time.
- Per-project timeline: session bars in greedy lanes + commit ticks by confidence.

Bulk store reads added: GitIndex.commits_by_session/commit_times/commits_for_repo, FileIndex.error_times, HealthStore.snapshot_rows. Three hand-rolled SVG components (no chart dep). Also includes the shared frontend plumbing for all three round-4 tracks (api client/types, styles, nav/routes). 249 backend tests pass; ruff + build clean.
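The confidence policy and the cost floor can be illustrated with a small worked function. This is a sketch, not insights_outcomes.py: the function name, the return shape, and the floor value of $0.10 are assumptions; the rule that only high and medium provenance enters the ratio is from the commit message.

```python
SHIPPED_CONFIDENCE = {"high", "medium"}


def shipped_summary(cost_usd: float, confidences: list,
                    cost_floor_usd: float = 0.10) -> dict:
    """Summarize one session's commits by provenance confidence."""
    shipped = sum(1 for c in confidences if c in SHIPPED_CONFIDENCE)
    low = sum(1 for c in confidences if c == "low")
    # The floor keeps a near-free session from producing a huge ratio.
    # Low-confidence commits are reported but never enter the ratio.
    per_10 = 10.0 * shipped / max(cost_usd, cost_floor_usd)
    return {"shipped": shipped, "low": low, "commits_per_10_usd": per_10}
```

For example, a $5.00 session with one high, one medium, and one low commit counts 2 shipped and 1 low, which is 4.0 commits per $10.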

A /tmp/muse.har capture showed concurrent requests gridlocking the server (/related 39-103s, /open-loops 41s) — isolated curls had hidden it.
- incremental.new_objects + transcript.iter_json_lines: tail-read files over MAX_PARSE_BYTES (env MUSE_MAX_PARSE_BYTES, 128MB; 0 disables). Largest real session ~40MB; gemini tmp dumps run 114MB-3.5GB. 3.6GB session 33.8s -> 2.4s.
- lineage.build_lineage: mtime-cached + incremental (was a full re-parse on EVERY session open, ~0.23s flat; now 1.7ms warm).
- _parse_cached: evict by total bytes (MUSE_PARSE_CACHE_BYTES, 256MB) not a 6-entry count. 100+ sessions were thrashing, so open-loops re-parsed ~12 briefs every poll. Now sessions stay warm.
- get_related_sessions: one bulk edited_files_by_session() query instead of a per-session lock-acquiring query in a loop.

Results: /related 0.31s cold/14ms warm, /open-loops 2.8s/2ms, /board 0.27s/1ms. Adds test_parse_cap.py (tail-cap + offset-advance regression tests).
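The switch from a 6-entry count to a byte budget is the part most likely to be reused elsewhere, so here is a sketch of that eviction policy. The class and method names are hypothetical, and least-recently-used ordering is an assumption; the commit only says eviction is by total bytes.

```python
from collections import OrderedDict


class ByteBudgetCache:
    """LRU cache bounded by the summed size of its entries, not their count."""

    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes
        self.total_bytes = 0
        self._entries = OrderedDict()  # key -> (value, size)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return entry[0]

    def put(self, key, value, size: int) -> None:
        if key in self._entries:
            self.total_bytes -= self._entries.pop(key)[1]
        self._entries[key] = (value, size)
        self.total_bytes += size
        # Evict oldest entries until under budget; always keep the newest
        # one so a single oversized parse is still served from cache.
        while self.total_bytes > self.max_bytes and len(self._entries) > 1:
            _, (_, evicted_size) = self._entries.popitem(last=False)
            self.total_bytes -= evicted_size
```

With a byte budget, many small sessions can stay warm together, which is what stops the open-loops poll from re-parsing on every tick.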

…header dropdown
- ConversationView: viewport virtualization (opt-in via virtualize, only the main viewer). Mounts only the rows near the scroll position instead of all ~8900, so the Markdown/React mount cost that made big sessions slow is gone. Prefix-sum spacers from measured heights; native scrollIntoView for every jump so correctness doesn't depend on offset math. forwardRef exposes scrollToUuid/scrollToTool/scrollToBottom; SessionViewPage routes its six scroll sites (deep link, timeline, tool-sync, side panels, jump-to-bottom, live append) through it. Opens at top / follows tail, as before. Live panes unchanged (virtualize defaults off).
- RelatedSessions: rendered as a header dropdown next to Subagents (was a below-header bar).
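The prefix-sum spacer idea is language-independent, so it is sketched here in Python rather than in the component's own React code. The function name, the overscan parameter, and the return shape are assumptions for illustration only.

```python
from bisect import bisect_right
from itertools import accumulate


def visible_window(heights, scroll_top, viewport_h, overscan=2):
    """Return (first, last, top_spacer, bottom_spacer) for the rows to mount."""
    # offsets[i] is the top edge of row i; offsets[-1] is the total height.
    offsets = [0, *accumulate(heights)]
    last_index = len(heights) - 1
    first = bisect_right(offsets, scroll_top) - 1 - overscan
    first = min(max(first, 0), max(last_index, 0))
    last = min(bisect_right(offsets, scroll_top + viewport_h) - 1 + overscan,
               last_index)
    # Two empty blocks stand in for every row that is not mounted, so the
    # scrollbar keeps its full length.
    top_spacer = offsets[first]
    bottom_spacer = offsets[-1] - offsets[last + 1]
    return first, last, top_spacer, bottom_spacer
```

The spacers plus the mounted rows always sum to the full list height, which is why the scroll position stays stable as rows mount and unmount.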

The event loop crept to 100% CPU because SSE streams + their per-session tailers leaked. The stream/board generators polled request.is_disconnected(), which consumes the ASGI receive channel sse_starlette's own disconnect listener needs — so http.disconnect was stolen, streams were never cancelled, finally/unsubscribe never ran, and tailers (force-polling the FS every 500ms) ran forever. A fresh server with 0 connections still had 5 SSE handlers + 4 tailers; over hours these pile up and peg the loop.
- routers/stream.py, routers/board.py: drop the manual is_disconnected() (let sse_starlette cancel the generator) + EventSourceResponse(ping=) so half-open sockets are caught by a failed write. Verified: tailer 1->2 on connect, back to 1 on disconnect; idle CPU 0-2% and flat.
- cli.py: restart now kills whatever holds the port (ss-based _pids_on_port), not just the pidfile pid — a stale/pegged instance under a different pid no longer survives 'muse restart'. _start refuses if the port is held by an unknown pid. Shared _kill() helper.
- main.py: add GET /api/debug/tasks (asyncio.all_tasks() + stacks) — pinpoints event-loop spins the thread dump can't see (loop shows only uvicorn.run).

The event-loop spin had a second, more fundamental cause beyond the SSE leak: uvicorn's default httptools protocol busy-loops at ~100% CPU on half-closed (CLOSE-WAIT) connections — left behind by port probes, dropped SSE/MCP streams, and proxies. A single such connection pegged the loop indefinitely (the prior stale instance ran 15h at 100%). h11 handles the half-close correctly. Verified: with 5 CLOSE-WAIT connections present, httptools held a constant 100%; h11 settles to 0-2% and the dead connections drain to 0. Throughput is irrelevant for a local single-user tool.

…pection

Continued hunt for the 100%-CPU event-loop spin that GIL-starves every request (seen in veryslow.har: /related 129s, events/files/commits 24-31s — all fast, <20ms, when the loop is calm). Confirmed: CPU spins only while unreaped CLOSE-WAIT connections exist; the access proxy creates them. Could NOT reproduce synthetically (clean half-closes are always reaped under both loops), so these are defensive, not a verified root-cause fix:
- cli.py: loop="asyncio" (was uvloop). uvloop AND httptools are uvicorn's C fast-path and both have known busy-loops on half-closed sockets; h11 (already) + the stdlib loop are the correctness-first combo and let /api/debug/loop introspect the loop.
- routers/stream.py, board.py: MUSE_SSE_MAX_SECONDS cap (default 300s) so a proxy-half-closed SSE connection can't hold the loop for hours — it self-heals on the client's transparent EventSource reconnect.
- main.py: GET /api/debug/loop (ready / scheduled / selector_fds / tasks) — call it WHILE slow to finally classify the spin (callback storm vs fd spin vs task).

254 tests pass; ruff clean.

… cause) Caught the 100%-CPU spin live via /api/debug/loop: ready≈N (loop never sleeps), N orphaned RequestResponseCycle.run_asgi SSE tasks, no muse code in any stack. Root cause: sse_starlette's _listen_for_disconnect (and Starlette's StreamingResponse) run 'while active: msg = await receive()'. uvicorn's receive() blocks on an idle request EXCEPT when the connection has pending bytes / certain keep-alive/proxy states, where it returns http.request immediately and repeatedly — so the listener loops at 100% CPU, one spin per orphaned SSE connection. That GIL-starved every request (the HARs: session open 0.5s -> 60s, /related -> 121s; all <20ms when the loop is calm). SafeEventSourceResponse (muse/sse.py) wraps the receive channel so non-disconnect messages are swallowed with a 0.5s floor — the listener can only ever wake on a real http.disconnect and can never busy-loop, whatever uvicorn/the proxy feeds it. Used by both /stream and /board/stream. Verified: a 2000-request pipelined flood on an SSE connection keeps CPU 0% / ready 0 (and /api/debug/loop now measures the spin directly: ready>0 == spinning).
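The receive wrapper described above can be sketched in a few lines. This is an illustration, not muse/sse.py: the function name `disconnect_only` is hypothetical, and only the behavior (swallow everything except http.disconnect, with a 0.5s floor between polls) is taken from the commit message.

```python
import asyncio


def disconnect_only(receive, floor_s: float = 0.5):
    """Wrap an ASGI receive callable so it only returns http.disconnect."""

    async def wrapped():
        while True:
            message = await receive()
            if message.get("type") == "http.disconnect":
                return message
            # Any other message (for example a repeated http.request) is
            # dropped, and the sleep bounds how often the loop can wake.
            await asyncio.sleep(floor_s)

    return wrapped
```

A listener that awaits the wrapped callable can wake at most once per `floor_s` on non-disconnect traffic, so it cannot spin however quickly the server returns messages.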

The spin survived SafeEventSourceResponse (ready stays 3 under real proxy use), so it is NOT sse_starlette's disconnect listener. Dump the actual ready-queue callbacks + per-fd selector registration so the next live capture names exactly what is re-scheduling itself / which fd the loop keeps waking on.

…at the source Root-caused via /api/debug/loop with ready-callback dumps: the perpetually-ready callback was anyio CancelScope._deliver_cancellation. anyio re-schedules it via call_soon for as long as a cancelled task group has a child that hasn't exited. sse_starlette's EventSourceResponse (and Starlette's StreamingResponse) run the stream + a receive()-based disconnect listener inside anyio.create_task_group(); on the connection states a reverse proxy creates, a child never settles, so anyio busy-loops cancelling it — one stuck CancelScope per orphaned SSE connection, pegging the loop at 100% and GIL-starving every request (session open 0.5s->60s, /related->121s in the HARs; all <20ms when calm). Neither the earlier is_disconnected removal nor SafeEventSourceResponse helped because the busy-loop is anyio's cancellation, not the receive. RawSSE (muse/sse.py) streams with plain ASGI sends: no task group, no concurrent receive listener — nothing for anyio to busy-loop cancelling. A gone client's response lingers idle (zero CPU) until MUSE_SSE_MAX_SECONDS (default 90) then self-closes; the browser's EventSource reconnects transparently. Verified the frames are valid text/event-stream.
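A rough sketch of streaming SSE with plain ASGI sends, in the spirit of RawSSE but not its actual code. The names `sse_frame` and `raw_sse` and the headers chosen are assumptions; the absence of a task group or receive listener and the 90-second default cap follow the commit message.

```python
import asyncio
import time


def sse_frame(data: str) -> bytes:
    """Encode one text/event-stream frame; each line gets a data: prefix."""
    lines = [f"data: {line}" for line in (data.splitlines() or [""])]
    return ("\n".join(lines) + "\n\n").encode()


async def raw_sse(send, events, max_seconds: float = 90.0) -> None:
    """Stream an async iterator of strings with plain ASGI sends."""
    await send({"type": "http.response.start", "status": 200,
                "headers": [(b"content-type", b"text/event-stream"),
                            (b"cache-control", b"no-cache")]})
    deadline = time.monotonic() + max_seconds
    iterator = events.__aiter__()
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            # Waiting with a timeout means an idle stream still self-closes.
            data = await asyncio.wait_for(iterator.__anext__(), timeout=remaining)
        except (StopAsyncIteration, asyncio.TimeoutError):
            break
        await send({"type": "http.response.body", "body": sse_frame(data),
                    "more_body": True})
    # An empty final body ends the response; EventSource then reconnects.
    await send({"type": "http.response.body", "body": b"", "more_body": False})
```

Because nothing here awaits `receive`, there is no concurrent listener to cancel; the cost is that a vanished client is only noticed at the cap or on a failed send.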

Responses had no compression; the big thread JSON dominates load time over a port-forward. Pure-ASGI gzip that passes text/event-stream through untouched (stock GZipMiddleware buffers SSE and reintroduces the stall class we just killed). Compression runs in a thread executor — zlib releases the GIL, so a ~300ms compress of an 11MB body doesn't block the loop or other requests.
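The two decisions in this commit, skipping event streams and compressing off the loop, can be sketched as follows. The function names and the 1024-byte minimum size are assumptions; `loop.run_in_executor` and `gzip.compress` are standard library calls.

```python
import asyncio
import gzip


def should_compress(content_type: str, accept_encoding: str, size: int,
                    min_size: int = 1024) -> bool:
    """Decide whether a fully buffered response body should be gzipped."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "text/event-stream":
        return False  # SSE must pass through untouched and unbuffered
    return "gzip" in accept_encoding.lower() and size >= min_size


async def gzip_off_loop(body: bytes, level: int = 6) -> bytes:
    """Compress in the default thread pool so the event loop stays free."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, gzip.compress, body, level)
```

While the worker thread compresses, the loop keeps serving other requests, which matters when a single body is several megabytes.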

The viewer virtualized RENDERING but still fetched the entire thread (~10k items / 11MB) before showing a slice. Now GET /api/sessions/{id} takes limit/anchor/before/after/around and ships only the needed window; the full parse stays cached so a window is a cheap slice. Default anchor follows liveness — head for a finished session (read top-down), tail for a live one. Frontend loads a window on open and fetches adjacent windows as you scroll past an edge: append reuses the append-only machinery, prepend shifts the height cache + anchors scrollTop so the viewport stays put. All jump paths (timeline, backlinks, notes, health, re-entry, ?focus=, compaction, file-ops) load-then-scroll via around=. Subagent menu derives from the full event timeline, not the windowed items. Opening a 45MB live session: 99KB on the wire vs 2.4MB. No params still returns the full thread (MCP + subagents unchanged).
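A sketch of how such a window could be cut from the cached parse. It assumes before/after/around are integer positions in the item list, which the PR does not state (they may well be item ids), and the function name and return shape are hypothetical.

```python
def slice_window(items, limit, anchor="head", before=None, after=None, around=None):
    """Return (start_index, window) for one page of a cached thread."""
    n = len(items)
    if around is not None:
        # Center on the target, then clamp so the window stays in range.
        start = max(min(around - limit // 2, n - limit), 0)
        end = start + limit
    elif before is not None:
        end = max(min(before, n), 0)    # items strictly before this index
        start = max(end - limit, 0)
    elif after is not None:
        start = min(after + 1, n)       # items strictly after this index
        end = start + limit
    elif anchor == "tail":
        start, end = max(n - limit, 0), n
    else:
        start, end = 0, limit
    return start, items[start:min(end, n)]
```

Returning the start index along with the slice lets the client place the window within the full thread when it prepends or appends.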

lacraig2 added 13 commits June 16, 2026 21:47

lacraig2 wants to merge 13 commits into main from round-4-copilot-remote-insights
