Round 4: AI copilot on the board, remote & mobile, insights #1
Open
lacraig2 wants to merge 13 commits into
…opilot

New `claude -p` job kinds (draft_reply, diagnose, triage) through the existing worker + _execute_ai_job seam:
- pack_for_reply: small recent-weighted digest + reentry brief + the user's last few replies (style) + the live tmux pane capture.
- pack_for_diagnose: digest + detected failure patterns (retry loop / spiral / denials) → markdown, routed to an AI note.
- pack_for_triage: ≤8 attention cards → one fenced-JSON line each.

sanitize_draft strips fences and refuses leading `/`/`!` (slash/bash-mode injection via transcript content); prompts carry an untrusted-data clause.

UI: ✦ draft prefills ReplyBox (the human always edits and sends via /respond, so it's safe even on waiting cards); ✦ diagnose on health-bad cards; ✦ triage on the board header, merged client-side so the ticker's diffed cards stay AI-free.

Autopilot idle_mode="ai": two-phase in the controller (the only tmux-write site). Phase A enqueues a draft at the message-mode injection point; Phase B, a later tick, re-checks idle / no waiting_for / updated_at unchanged / job <10min / rate-limit banner / max_sends / daily budget before typing, else logs ai_discarded with the reason. Budget MUSE_AI_DAILY_BUDGET_USD (default $2, ≤0 disables). Callables wired in main.py (track B commit) like on_reset.

Also adds session_service.get_insights/get_insights_timeline seams used by the insights track.
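A minimal sketch of the draft sanitizer described above, assuming it works line by line on the model's raw output. The name `sanitize_draft_sketch`, the fence regex, and the choice to return `None` on refusal are illustrative assumptions, not the actual `sanitize_draft` implementation.

```python
import re
from typing import Optional

# Assumption: a "fence" is a line holding only ``` plus an optional language tag.
_FENCE_LINE = re.compile(r"^\s*```[\w-]*\s*$")


def sanitize_draft_sketch(text: str) -> Optional[str]:
    """Return a cleaned draft, or None when the draft must be refused."""
    # Drop fence lines so the reply box never receives markdown wrappers.
    lines = [ln for ln in text.strip().splitlines() if not _FENCE_LINE.match(ln)]
    cleaned = "\n".join(lines).strip()
    # A leading "/" or "!" would be read as a slash command or bash mode,
    # so transcript content must never be able to produce one.
    if not cleaned or cleaned[0] in "/!":
        return None
    return cleaned
```

The leader check runs after fence stripping, so a fenced `/compact` is refused as well.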
Auth is a PURE ASGI middleware (never BaseHTTPMiddleware, which buffers streaming responses and breaks SSE + the /mcp sub-app): protects /api/* and /mcp*, leaves the SPA shell public, and accepts a Bearer header, the muse_auth HttpOnly cookie (EventSource can't send headers; cookies ride along), or a loopback client. Loopback bypass defaults ON, so the local UI, vite dev proxy, scripts, and local Claude Code MCP are untouched with zero config. Token from MUSE_AUTH_TOKEN or auto-generated into ~/.muse/auth_token (chmod 600) only when binding non-loopback.

routers/auth.py: login (204 + SameSite=Lax cookie so ntfy taps authenticate; Secure only over https), logout, status. Frontend LoginGate overlays a token prompt on a 401 event or an unauthenticated probe; the board SSE hook probes status on error instead of reconnecting blindly. Settings.base_url (MUSE_PUBLIC_URL) makes ntfy click-throughs and MCP-cited links open the reachable (tailscale) address. README gains a remote-access section.

main.py also wires the autopilot AI callables and registers the auth + insights routers, and serves real dist files (manifest/icons) from the catch-all with a no-cache index (no service worker → no stale shell).

PWA: manifest.webmanifest + ✻ icons (192/512/maskable/apple-180), index.html meta/links.
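The shape of a pure ASGI auth check can be sketched as below. This is an illustration under assumptions, not the middleware in this PR: the class name `TokenAuthSketch` and the loopback host set are hypothetical, and the sketch does not exempt the login/status routes that a real deployment would need to leave reachable.

```python
import hmac

LOOPBACK_HOSTS = {"127.0.0.1", "::1"}  # assumption: what counts as loopback


class TokenAuthSketch:
    """Pure ASGI middleware: it only inspects the scope, so streamed bodies pass through unbuffered."""

    def __init__(self, app, token: str, allow_loopback: bool = True):
        self.app = app
        self.token = token.encode("utf-8")
        self.allow_loopback = allow_loopback

    def _authorized(self, scope) -> bool:
        client = scope.get("client")
        if self.allow_loopback and client and client[0] in LOOPBACK_HOSTS:
            return True
        headers = dict(scope.get("headers") or [])
        auth = headers.get(b"authorization", b"")
        if auth.startswith(b"Bearer ") and hmac.compare_digest(auth[7:], self.token):
            return True
        # EventSource cannot set headers, so the cookie is the SSE path.
        for part in headers.get(b"cookie", b"").split(b";"):
            name, _, value = part.strip().partition(b"=")
            if name == b"muse_auth" and hmac.compare_digest(value, self.token):
                return True
        return False

    async def __call__(self, scope, receive, send):
        path = scope.get("path", "")
        protected = path.startswith("/api/") or path.startswith("/mcp")
        if scope["type"] != "http" or not protected or self._authorized(scope):
            await self.app(scope, receive, send)
            return
        # Reject before the inner app runs, so no stream bytes are ever sent.
        await send({"type": "http.response.start", "status": 401,
                    "headers": [(b"content-type", b"text/plain")]})
        await send({"type": "http.response.body", "body": b"unauthorized"})
```

Because the wrapper never touches `receive` or the response body, an SSE stream behind it behaves exactly as it would without auth.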
…, timeline)

New Insights page answering "what did my sessions produce" (Stats answers "what did I spend"). insights_outcomes.py is pure compute over tables the alerts tick already keeps warm: board_rollup (cost/tokens), session_windows (duration), session_health, commit_session⋈git_commits (provenance), usage_daily (calendar), file_activity error times, commit times. Nothing parses a transcript on the request path. Service get_insights is TTL-cached like get_stats.
- Shipped vs burned: most-productive (commits per $, cost-floored) and most-wasteful (cost + bad health + zero commits); per-model/per-project commits-per-$10. Confidence policy: only high+medium count as shipped, low is shown separately and never enters a ratio (provenance is evidence, not proof).
- Calendar heatmap (cost by local day, quantile intensity, click → that day's journal; JournalPage now reads ?day=).
- 24×7 hour×weekday matrix (activity / errors / commits), local time.
- Per-project timeline: session bars in greedy lanes + commit ticks by confidence.

Bulk store reads added: GitIndex.commits_by_session/commit_times/commits_for_repo, FileIndex.error_times, HealthStore.snapshot_rows. Three hand-rolled SVG components (no chart dep). Also includes the shared frontend plumbing for all three round-4 tracks (api client/types, styles, nav/routes).

249 backend tests pass; ruff + build clean.
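The confidence policy and cost floor can be illustrated with a small sketch. The function name, the returned dict shape, and the $0.10 floor are assumptions for illustration; the commit message does not state the actual floor value.

```python
SHIPPED_CONFIDENCES = {"high", "medium"}


def commits_per_dollar(commit_confidences, cost_usd, cost_floor=0.10):
    """Summarise one session: only high+medium commits enter the ratio."""
    shipped = sum(1 for c in commit_confidences if c in SHIPPED_CONFIDENCES)
    # Low-confidence provenance is reported on its own and never divided by cost.
    low = sum(1 for c in commit_confidences if c == "low")
    # The floor keeps near-free sessions from producing huge ratios.
    per_dollar = shipped / max(cost_usd, cost_floor)
    return {"shipped": shipped, "low": low, "per_dollar": per_dollar}
```

For example, `commits_per_dollar(["high", "medium", "low"], 4.0)` counts two shipped commits and one low-confidence commit, giving 0.5 commits per dollar.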
A /tmp/muse.har capture showed concurrent requests gridlocking the server (/related 39-103s, /open-loops 41s); isolated curls had hidden it.
- incremental.new_objects + transcript.iter_json_lines: tail-read files over MAX_PARSE_BYTES (env MUSE_MAX_PARSE_BYTES, 128MB; 0 disables). Largest real session ~40MB; gemini tmp dumps run 114MB-3.5GB. 3.6GB session 33.8s -> 2.4s.
- lineage.build_lineage: mtime-cached + incremental (was a full re-parse on EVERY session open, ~0.23s flat; now 1.7ms warm).
- _parse_cached: evict by total bytes (MUSE_PARSE_CACHE_BYTES, 256MB) not a 6-entry count. 100+ sessions were thrashing, so open-loops re-parsed ~12 briefs every poll. Now sessions stay warm.
- get_related_sessions: one bulk edited_files_by_session() query instead of a per-session lock-acquiring query in a loop.

Results: /related 0.31s cold/14ms warm, /open-loops 2.8s/2ms, /board 0.27s/1ms. Adds test_parse_cap.py (tail-cap + offset-advance regression tests).
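Evicting by total bytes rather than entry count can be sketched as an LRU with a byte budget. The class name `ByteBudgetCache` and its interface are hypothetical; the real `_parse_cached` may track sizes and recency differently.

```python
from collections import OrderedDict


class ByteBudgetCache:
    """LRU cache bounded by the summed size of its entries, not their count."""

    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes
        self.total = 0
        self._entries = OrderedDict()  # key -> (value, size_in_bytes)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return entry[0]

    def put(self, key, value, size: int) -> None:
        old = self._entries.pop(key, None)
        if old is not None:
            self.total -= old[1]
        self._entries[key] = (value, size)
        self.total += size
        # Evict least recently used entries until back under budget,
        # but always keep the entry that was just inserted.
        while self.total > self.max_bytes and len(self._entries) > 1:
            _, (_, evicted_size) = self._entries.popitem(last=False)
            self.total -= evicted_size
```

With a byte budget, many small sessions can stay warm at once, while a single oversized parse still displaces older entries.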
…header dropdown
- ConversationView: viewport virtualization (opt-in via virtualize, only the main viewer). Mounts only the rows near the scroll position instead of all ~8900, so the Markdown/React mount cost that made big sessions slow is gone. Prefix-sum spacers from measured heights; native scrollIntoView for every jump so correctness doesn't depend on offset math. forwardRef exposes scrollToUuid/scrollToTool/scrollToBottom; SessionViewPage routes its six scroll sites (deep link, timeline, tool-sync, side panels, jump-to-bottom, live append) through it. Opens at top / follows tail, as before. Live panes unchanged (virtualize defaults off).
- RelatedSessions: rendered as a header dropdown next to Subagents (was a below-header bar).
The event loop crept to 100% CPU because SSE streams + their per-session tailers leaked. The stream/board generators polled request.is_disconnected(), which consumes the ASGI receive channel sse_starlette's own disconnect listener needs, so http.disconnect was stolen, streams were never cancelled, finally/unsubscribe never ran, and tailers (force-polling the FS every 500ms) ran forever. A fresh server with 0 connections still had 5 SSE handlers + 4 tailers; over hours these pile up and peg the loop.
- routers/stream.py, routers/board.py: drop the manual is_disconnected() (let sse_starlette cancel the generator) + EventSourceResponse(ping=) so half-open sockets are caught by a failed write. Verified: tailer 1->2 on connect, back to 1 on disconnect; idle CPU 0-2% and flat.
- cli.py: restart now kills whatever holds the port (ss-based _pids_on_port), not just the pidfile pid; a stale/pegged instance under a different pid no longer survives 'muse restart'. _start refuses if the port is held by an unknown pid. Shared _kill() helper.
- main.py: add GET /api/debug/tasks (asyncio.all_tasks() + stacks), which pinpoints event-loop spins the thread dump can't see (loop shows only uvicorn.run).
The event-loop spin had a second, more fundamental cause beyond the SSE leak: uvicorn's default httptools protocol busy-loops at ~100% CPU on half-closed (CLOSE-WAIT) connections — left behind by port probes, dropped SSE/MCP streams, and proxies. A single such connection pegged the loop indefinitely (the prior stale instance ran 15h at 100%). h11 handles the half-close correctly. Verified: with 5 CLOSE-WAIT connections present, httptools held a constant 100%; h11 settles to 0-2% and the dead connections drain to 0. Throughput is irrelevant for a local single-user tool.
…pection

Continued hunt for the 100%-CPU event-loop spin that GIL-starves every request (seen in veryslow.har: /related 129s, events/files/commits 24-31s; all fast, <20ms, when the loop is calm). Confirmed: CPU spins only while unreaped CLOSE-WAIT connections exist; the access proxy creates them. Could NOT reproduce synthetically (clean half-closes are always reaped under both loops), so these are defensive, not a verified root-cause fix:
- cli.py: loop="asyncio" (was uvloop). uvloop AND httptools are uvicorn's C fast-path and both have known busy-loops on half-closed sockets; h11 (already) + the stdlib loop are the correctness-first combo and let /api/debug/loop introspect the loop.
- routers/stream.py, board.py: MUSE_SSE_MAX_SECONDS cap (default 300s) so a proxy-half-closed SSE connection can't hold the loop for hours; it self-heals on the client's transparent EventSource reconnect.
- main.py: GET /api/debug/loop (ready / scheduled / selector_fds / tasks): call it WHILE slow to finally classify the spin (callback storm vs fd spin vs task).

254 tests pass; ruff clean.
… cause) Caught the 100%-CPU spin live via /api/debug/loop: ready≈N (loop never sleeps), N orphaned RequestResponseCycle.run_asgi SSE tasks, no muse code in any stack. Root cause: sse_starlette's _listen_for_disconnect (and Starlette's StreamingResponse) run 'while active: msg = await receive()'. uvicorn's receive() blocks on an idle request EXCEPT when the connection has pending bytes / certain keep-alive/proxy states, where it returns http.request immediately and repeatedly — so the listener loops at 100% CPU, one spin per orphaned SSE connection. That GIL-starved every request (the HARs: session open 0.5s -> 60s, /related -> 121s; all <20ms when the loop is calm). SafeEventSourceResponse (muse/sse.py) wraps the receive channel so non-disconnect messages are swallowed with a 0.5s floor — the listener can only ever wake on a real http.disconnect and can never busy-loop, whatever uvicorn/the proxy feeds it. Used by both /stream and /board/stream. Verified: a 2000-request pipelined flood on an SSE connection keeps CPU 0% / ready 0 (and /api/debug/loop now measures the spin directly: ready>0 == spinning).
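The receive-channel wrapping described above can be sketched in a few lines. `disconnect_only` is a hypothetical name and this is not the code in muse/sse.py; it only shows the idea of swallowing every message except `http.disconnect`, with a sleep floor so a flood of messages cannot turn into a busy-loop.

```python
import asyncio


def disconnect_only(receive, floor: float = 0.5):
    """Wrap an ASGI receive callable so it only ever returns http.disconnect."""

    async def wrapped():
        while True:
            message = await receive()
            if message.get("type") == "http.disconnect":
                return message
            # Any other message is dropped; sleeping bounds the wake-up rate
            # even if receive() returns immediately every time.
            await asyncio.sleep(floor)

    return wrapped
```

A listener that awaits the wrapped callable wakes at most once per `floor` seconds, whatever the server feeds it.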
The spin survived SafeEventSourceResponse (ready stays 3 under real proxy use), so it is NOT sse_starlette's disconnect listener. Dump the actual ready-queue callbacks + per-fd selector registration so the next live capture names exactly what is re-scheduling itself / which fd the loop keeps waking on.
…at the source Root-caused via /api/debug/loop with ready-callback dumps: the perpetually-ready callback was anyio CancelScope._deliver_cancellation. anyio re-schedules it via call_soon for as long as a cancelled task group has a child that hasn't exited. sse_starlette's EventSourceResponse (and Starlette's StreamingResponse) run the stream + a receive()-based disconnect listener inside anyio.create_task_group(); on the connection states a reverse proxy creates, a child never settles, so anyio busy-loops cancelling it — one stuck CancelScope per orphaned SSE connection, pegging the loop at 100% and GIL-starving every request (session open 0.5s->60s, /related->121s in the HARs; all <20ms when calm). Neither the earlier is_disconnected removal nor SafeEventSourceResponse helped because the busy-loop is anyio's cancellation, not the receive. RawSSE (muse/sse.py) streams with plain ASGI sends: no task group, no concurrent receive listener — nothing for anyio to busy-loop cancelling. A gone client's response lingers idle (zero CPU) until MUSE_SSE_MAX_SECONDS (default 90) then self-closes; the browser's EventSource reconnects transparently. Verified the frames are valid text/event-stream.
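Streaming with plain ASGI sends, with no task group and no concurrent receive listener, could look roughly like the sketch below. The function names, the queue-based event source, and the `None` sentinel are assumptions for illustration, not the RawSSE implementation.

```python
import asyncio
import time


def sse_frame(name: str, data: str) -> bytes:
    """Encode one text/event-stream frame; multi-line data gets one data: line each."""
    body = "".join(f"data: {line}\n" for line in data.split("\n"))
    return f"event: {name}\n{body}\n".encode("utf-8")


async def raw_sse(send, queue: asyncio.Queue, max_seconds: float = 90.0) -> None:
    """Stream (name, data) items from a queue until a None sentinel or the deadline."""
    await send({"type": "http.response.start", "status": 200, "headers": [
        (b"content-type", b"text/event-stream"), (b"cache-control", b"no-cache")]})
    deadline = time.monotonic() + max_seconds
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            # Waiting on the queue is idle time, not a spin: zero CPU until
            # an event arrives or the cap expires.
            item = await asyncio.wait_for(queue.get(), timeout=remaining)
        except asyncio.TimeoutError:
            break
        if item is None:  # hypothetical sentinel: producer finished
            break
        name, data = item
        await send({"type": "http.response.body", "body": sse_frame(name, data),
                    "more_body": True})
    # Close the response; a live browser EventSource reconnects on its own.
    await send({"type": "http.response.body", "body": b"", "more_body": False})
```

The only awaits are the queue read and the sends, so there is no cancelled child task left for a cancel scope to keep re-delivering to.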
Responses had no compression; the big thread JSON dominates load time over a port-forward. Pure-ASGI gzip that passes text/event-stream through untouched (stock GZipMiddleware buffers SSE and reintroduces the stall class we just killed). Compression runs in a thread executor — zlib releases the GIL, so a ~300ms compress of an 11MB body doesn't block the loop or other requests.
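The two decisions in this commit, skipping event streams and compressing off the loop, can be sketched as helpers. The function names and the 1024-byte minimum are assumptions; the actual middleware also has to rewrite headers and handle chunked bodies, which this sketch omits.

```python
import asyncio
import gzip


def should_compress(content_type: bytes, accept_encoding: bytes, size: int,
                    min_size: int = 1024) -> bool:
    """Never touch text/event-stream; otherwise gzip only when the client accepts it."""
    media_type = content_type.split(b";")[0].strip().lower()
    if media_type == b"text/event-stream":
        return False
    return b"gzip" in accept_encoding.lower() and size >= min_size


async def compress_off_loop(body: bytes, level: int = 6) -> bytes:
    """Run gzip.compress in the default thread executor so the event loop stays free."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, gzip.compress, body, level)
```

While the worker thread compresses, the loop keeps serving other requests, which is the property the commit message relies on.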
The viewer virtualized RENDERING but still fetched the entire thread (~10k
items / 11MB) before showing a slice. Now GET /api/sessions/{id} takes
limit/anchor/before/after/around and ships only the needed window; the full
parse stays cached so a window is a cheap slice. Default anchor follows
liveness — head for a finished session (read top-down), tail for a live one.
Frontend loads a window on open and fetches adjacent windows as you scroll past
an edge: append reuses the append-only machinery, prepend shifts the height
cache + anchors scrollTop so the viewport stays put. All jump paths (timeline,
backlinks, notes, health, re-entry, ?focus=, compaction, file-ops) load-then-
scroll via around=. Subagent menu derives from the full event timeline, not the
windowed items. Opening a 45MB live session: 99KB on the wire vs 2.4MB.
No params still returns the full thread (MCP + subagents unchanged).
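Slicing a window out of the cached full parse can be sketched as below. This assumes `before`, `after`, and `around` are item indices; the real endpoint may key them on uuids or timestamps, and `select_window` is a hypothetical name.

```python
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def select_window(items: Sequence[T], limit: int, anchor: str = "head",
                  before: Optional[int] = None, after: Optional[int] = None,
                  around: Optional[int] = None) -> Tuple[int, List[T]]:
    """Return (start_index, window) for one page of a fully parsed thread."""
    n = len(items)
    if around is not None:
        # Centre on the target, clamped so the window stays inside the thread.
        start = max(0, min(around - limit // 2, n - limit))
    elif before is not None:
        # Scrolling up: the `limit` items that precede index `before`.
        start = max(0, before - limit)
        return start, list(items[start:before])
    elif after is not None:
        # Scrolling down: the items that follow index `after`.
        start = min(n, after + 1)
    elif anchor == "tail":
        start = max(0, n - limit)  # live session: follow the end
    else:
        start = 0  # finished session: read top-down
    return start, list(items[start:start + limit])
```

Returning the start index alongside the slice lets the client place the window in the full thread, for example when prepending shifts the height cache.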
Round 4 of the muse build-out, in three independent tracks. 249 backend tests pass; ruff + frontend build clean. Every track was verified live against the real corpus.
✦ AI copilot on the board (commit 1)
Headless `claude -p` brought to mission control, all on-demand or budget-capped (worker concurrency stays 1; shares the Max 5h window):
- ✦ drafts the next message from the session's recent context + your reply style + the live pane; you edit and send. `sanitize_draft` strips fences and refuses `/`/`!` leaders (injection defense); prompts carry an untrusted-data clause. Live: 8s / $0.09, clean directive output.
- ✦ on a health-bad card explains the failure + the unstick, saved as a note.
- `ai` mode: two-phase in the controller (the only tmux-write site): enqueue a draft, then re-check idle / unchanged / rate-limit / `MUSE_AI_DAILY_BUDGET_USD` (default $2) before typing. 8 tests cover the discard matrix; never answers permission prompts.
📱 Remote & mobile (commit 2)
- `~/.muse/auth_token` on a public bind.
- Verified the full 401/200 matrix live on a real `0.0.0.0` bind, incl. SSE short-circuiting before any bytes.
- `MUSE_PUBLIC_URL` makes ntfy taps + MCP links open the reachable address.
📊 Insights (commit 3)
New page: "what did my sessions produce" (vs Stats' "what did I spend"), pure compute over warm tables — no transcript parsing on the request path (0.46s warm over 57 sessions).
Cuts (deliberate)
Per-card AI triage / any per-tick AI; permission-dialog automation; service worker; hamburger menu; touch-drag resize; timeline zoom; derived outcomes table.
🤖 Generated with Claude Code