
Round 4: AI copilot on the board, remote & mobile, insights #1

Open
lacraig2 wants to merge 13 commits into main from round-4-copilot-remote-insights


@lacraig2

Contributor

Round 4 of the muse build-out, in three independent tracks. 249 backend tests pass; ruff + frontend build clean. Every track was verified live against the real corpus.

✦ AI copilot on the board (commit 1)

Headless claude -p brought to mission control, all on-demand or budget-capped (worker concurrency stays 1; shares the Max 5h window):

  • Draft reply drafts the next message from the session's recent context + your reply style + the live pane; you edit and send. sanitize_draft strips fences and refuses a leading / or ! (slash/bash-mode injection defense); prompts carry an untrusted-data clause. Live: 8s / $0.09, clean directive output.
  • Diagnose on a health-bad card explains the failure + the unstick, saved as a note.
  • Triage — one batched pass labels every needs-attention card; merged client-side so the live ticker stays AI-free.
  • Autopilot ai mode — two-phase in the controller (the only tmux-write site): enqueue a draft, then re-check idle / unchanged / rate-limit / MUSE_AI_DAILY_BUDGET_USD (default $2) before typing. 8 tests cover the discard matrix; never answers permission prompts.

📱 Remote & mobile (commit 2)

  • Token auth — pure-ASGI middleware (never BaseHTTPMiddleware → SSE-safe). Loopback bypasses by default so local UI/scripts/MCP are untouched; remote uses a Bearer header or login cookie. Token auto-generates to ~/.muse/auth_token on a public bind. Verified the full 401/200 matrix live on a real 0.0.0.0 bind, incl. SSE short-circuiting before any bytes.
  • Mobile — navbar scrolls, board single-columns, viewer defaults to conversation-only on phones; PWA installable.
  • MUSE_PUBLIC_URL makes ntfy taps + MCP links open the reachable address.

📊 Insights (commit 3)

New page: "what did my sessions produce" (vs Stats' "what did I spend"), pure compute over warm tables — no transcript parsing on the request path (0.46s warm over 57 sessions).

  • Shipped vs burned — productive (commits/$) vs wasteful (cost + bad health + 0 commits); per-model & per-project commits-per-$10. Only high+medium provenance counts as shipped; low shown separately (evidence, not authorship proof).
  • Quantile cost heatmap (click → that day's journal), 24×7 when-you-work matrix (activity/errors/commits), per-project timeline of session bars + commit ticks.
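
The shipped-vs-burned policy can be sketched roughly as below. This is an illustration only: `SessionOutcome`, `COST_FLOOR_USD`, and the function names are hypothetical, and the real floor value and table shapes are not stated in this PR.

```python
from dataclasses import dataclass, field

# Only these provenance levels count as shipped (per the policy above).
SHIPPED_CONFIDENCE = {"high", "medium"}
# Assumed floor so near-free sessions cannot dominate a ratio; the real value is not given.
COST_FLOOR_USD = 0.25


@dataclass
class SessionOutcome:
    cost_usd: float
    commit_confidences: list = field(default_factory=list)  # one label per attributed commit
    health_bad: bool = False


def shipped_and_low(session):
    """Count shipped (high+medium) commits and low-confidence commits separately."""
    shipped = sum(1 for c in session.commit_confidences if c in SHIPPED_CONFIDENCE)
    low = sum(1 for c in session.commit_confidences if c == "low")
    return shipped, low


def commits_per_10_usd(sessions):
    """Shipped commits per $10 over a group (for example one model or one project)."""
    shipped = sum(shipped_and_low(s)[0] for s in sessions)
    cost = sum(s.cost_usd for s in sessions)
    return 10.0 * shipped / max(cost, COST_FLOOR_USD)


def is_wasteful(session):
    """Cost with bad health and zero shipped commits; low-confidence commits do not rescue it."""
    shipped, _ = shipped_and_low(session)
    return session.cost_usd > 0 and session.health_bad and shipped == 0


group = [SessionOutcome(4.0, ["high", "low"]), SessionOutcome(6.0, ["medium"])]
ratio = commits_per_10_usd(group)
```

Low-confidence commits are counted but never enter the ratio, which matches the "evidence, not authorship proof" rule.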

Cuts (deliberate)

Per-card AI triage / any per-tick AI; permission-dialog automation; service worker; hamburger menu; touch-drag resize; timeline zoom; derived outcomes table.

🤖 Generated with Claude Code

lacraig2 added 13 commits June 16, 2026 21:47
…opilot

New `claude -p` job kinds (draft_reply, diagnose, triage) through the existing
worker + _execute_ai_job seam:
- pack_for_reply: small recent-weighted digest + reentry brief + the user's
  last few replies (style) + the live tmux pane capture.
- pack_for_diagnose: digest + detected failure patterns (retry loop / spiral /
  denials) → markdown, routed to an AI note.
- pack_for_triage: ≤8 attention cards → one fenced-JSON line each.

sanitize_draft strips fences and refuses leading `/`/`!` (slash/bash-mode
injection via transcript content); prompts carry an untrusted-data clause.
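
A minimal sketch of what a sanitizer with this behavior might look like. The real `sanitize_draft` is not shown in this PR, so the body below is an assumption that only mirrors the two rules named above (strip fences, refuse a leading `/` or `!`).

```python
# Built from single backticks so this example contains no literal fence marker.
FENCE = "`" * 3


def sanitize_draft(text):
    """Return the cleaned draft, or None when the draft must be refused."""
    lines = text.strip().splitlines()
    # Drop an opening fence line (with or without a language tag).
    if lines and lines[0].startswith(FENCE):
        lines = lines[1:]
    # Drop a closing fence line.
    if lines and lines[-1].strip() == FENCE:
        lines = lines[:-1]
    draft = "\n".join(lines).strip()
    # A leading "/" would run a slash command and "!" would enter bash mode,
    # so a draft that starts with either is refused rather than repaired.
    if not draft or draft[0] in "/!":
        return None
    return draft
```

Checking the leader after the fence is removed matters: a fenced `/compact` would otherwise slip through.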

UI: ✦ draft prefills ReplyBox (the human always edits and sends via /respond,
so it's safe even on waiting cards); ✦ diagnose on health-bad cards; ✦ triage
on the board header, merged client-side so the ticker's diffed cards stay
AI-free.

Autopilot idle_mode="ai" — two-phase in the controller (the only tmux-write
site): Phase A enqueues a draft at the message-mode injection point; Phase B,
a later tick, re-checks idle / no waiting_for / updated_at unchanged / job
<10min / rate-limit banner / max_sends / daily budget before typing, else logs
ai_discarded with the reason. Budget MUSE_AI_DAILY_BUDGET_USD (default $2, ≤0
disables). Callables wired in main.py (track B commit) like on_reset.
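
The Phase B re-check can be pictured as a single function that returns a discard reason or `None`. This is a sketch under assumptions: the dataclasses, the reason strings, and the reading of "≤0 disables" as "AI sends are off" are all hypothetical, not taken from the controller code.

```python
from dataclasses import dataclass
from typing import Optional

MAX_JOB_AGE_S = 600  # "job <10min" from the commit message


@dataclass
class Pane:
    idle: bool
    waiting_for: Optional[str]
    updated_at: float
    rate_limit_banner: bool


@dataclass
class DraftJob:
    text: str
    finished_at: float
    pane_updated_at: float  # pane.updated_at as seen when Phase A enqueued the draft


def discard_reason(job, pane, now, sends, max_sends, spent_usd, budget_usd):
    """Return why the draft must not be typed, or None when it is safe to send."""
    if not pane.idle:
        return "not_idle"
    if pane.waiting_for is not None:
        return "waiting_for"  # never answer prompts on the user's behalf
    if pane.updated_at != job.pane_updated_at:
        return "changed"  # the session moved on since the draft was written
    if now - job.finished_at >= MAX_JOB_AGE_S:
        return "stale_job"
    if pane.rate_limit_banner:
        return "rate_limit"
    if sends >= max_sends:
        return "max_sends"
    # Assumption: a budget <= 0 means AI sends are disabled entirely.
    if budget_usd <= 0 or spent_usd >= budget_usd:
        return "budget"
    return None


pane = Pane(idle=True, waiting_for=None, updated_at=100.0, rate_limit_banner=False)
job = DraftJob("run the tests", finished_at=110.0, pane_updated_at=100.0)
```

A non-`None` result would be what gets logged alongside `ai_discarded`.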

Also adds session_service.get_insights/get_insights_timeline seams used by the
insights track.
Auth is a PURE ASGI middleware (never BaseHTTPMiddleware — it buffers streaming
responses and breaks SSE + the /mcp sub-app): protects /api/* and /mcp*, leaves
the SPA shell public, and accepts a Bearer header, the muse_auth HttpOnly cookie
(EventSource can't send headers; cookies ride along), or a loopback client.
Loopback bypass defaults ON, so the local UI, vite dev proxy, scripts, and local
Claude Code MCP are untouched with zero config. Token from MUSE_AUTH_TOKEN or
auto-generated into ~/.muse/auth_token (chmod 600) only when binding non-loopback.
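
For readers unfamiliar with the pure-ASGI shape, here is a rough sketch of such a middleware. The class name, constants, and demo helper are hypothetical; only the protected prefixes, the `muse_auth` cookie name, and the three accepted credentials come from the description above.

```python
import asyncio
import hmac

LOOPBACK_HOSTS = {"127.0.0.1", "::1"}
PROTECTED_PREFIXES = ("/api/", "/mcp")
COOKIE_NAME = b"muse_auth"


class TokenAuthMiddleware:
    """Pure ASGI: it never wraps or buffers the response, so SSE streams are untouched."""

    def __init__(self, app, token, allow_loopback=True):
        self.app = app
        self.token = token.encode()
        self.allow_loopback = allow_loopback

    async def __call__(self, scope, receive, send):
        needs_auth = scope["type"] == "http" and scope["path"].startswith(PROTECTED_PREFIXES)
        if not needs_auth or self._authorized(scope):
            await self.app(scope, receive, send)
            return
        # Reject before the inner app (and any stream) produces a byte.
        await send({"type": "http.response.start", "status": 401,
                    "headers": [(b"content-type", b"text/plain")]})
        await send({"type": "http.response.body", "body": b"unauthorized"})

    def _authorized(self, scope):
        client = scope.get("client")
        if self.allow_loopback and client and client[0] in LOOPBACK_HOSTS:
            return True
        headers = dict(scope.get("headers") or [])
        bearer = headers.get(b"authorization", b"")
        if bearer.startswith(b"Bearer ") and hmac.compare_digest(bearer[7:], self.token):
            return True
        # EventSource cannot set headers, so the cookie is the fallback credential.
        for part in headers.get(b"cookie", b"").split(b";"):
            name, _, value = part.strip().partition(b"=")
            if name == COOKIE_NAME and hmac.compare_digest(value, self.token):
                return True
        return False


async def ok_app(scope, receive, send):
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"ok"})


def status_for(path, client_ip, headers=()):
    """Demo helper: run one request through the middleware and return the status."""
    sent = []

    async def send(message):
        sent.append(message)

    async def receive():
        return {"type": "http.request", "body": b""}

    scope = {"type": "http", "path": path, "client": (client_ip, 50000),
             "headers": list(headers)}
    asyncio.run(TokenAuthMiddleware(ok_app, "s3cret")(scope, receive, send))
    return sent[0]["status"]
```

Token comparison uses `hmac.compare_digest` on bytes to avoid timing differences; whether the real middleware does the same is not stated.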

routers/auth.py: login (204 + SameSite=Lax cookie so ntfy taps authenticate;
Secure only over https), logout, status. Frontend LoginGate overlays a token
prompt on a 401 event or an unauthenticated probe; the board SSE hook probes
status on error instead of reconnecting blindly.

Settings.base_url (MUSE_PUBLIC_URL) makes ntfy click-throughs and MCP-cited
links open the reachable (tailscale) address. README gains a remote-access
section. main.py also wires the autopilot AI callables and registers the auth
+ insights routers, and serves real dist files (manifest/icons) from the
catch-all with a no-cache index (no service worker → no stale shell).

PWA: manifest.webmanifest + ✻ icons (192/512/maskable/apple-180), index.html
meta/links.
…, timeline)

New Insights page answering "what did my sessions produce" (Stats answers "what
did I spend"). insights_outcomes.py is pure compute over tables the alerts tick
already keeps warm — board_rollup (cost/tokens), session_windows (duration),
session_health, commit_session⋈git_commits (provenance), usage_daily (calendar),
file_activity error times, commit times — so nothing parses a transcript on the
request path. Service get_insights is TTL-cached like get_stats.

- Shipped vs burned: most-productive (commits per $, cost-floored) and
  most-wasteful (cost + bad health + zero commits); per-model/per-project
  commits-per-$10. Confidence policy: only high+medium count as shipped, low is
  shown separately and never enters a ratio (provenance is evidence, not proof).
- Calendar heatmap (cost by local day, quantile intensity, click → that day's
  journal — JournalPage now reads ?day=).
- 24×7 hour×weekday matrix (activity / errors / commits), local time.
- Per-project timeline: session bars in greedy lanes + commit ticks by
  confidence.
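
"Greedy lanes" for the timeline can be illustrated with a first-fit assignment like the one below. It is a sketch of the general technique; the function name and the `(start, end)` tuple shape are assumptions, and the actual layout lives in the frontend.

```python
def assign_lanes(sessions):
    """Place each (start, end) bar in the first lane whose last bar has already ended.

    Returns one lane index per session, in the same order as the input.
    """
    order = sorted(range(len(sessions)), key=lambda i: sessions[i][0])
    lane_ends = []  # lane index -> end time of the most recent bar in that lane
    lanes = [0] * len(sessions)
    for i in order:
        start, end = sessions[i]
        for lane, lane_end in enumerate(lane_ends):
            if lane_end <= start:  # free lane: reuse it
                lane_ends[lane] = end
                lanes[i] = lane
                break
        else:  # every lane is busy: open a new one
            lane_ends.append(end)
            lanes[i] = len(lane_ends) - 1
    return lanes


lanes = assign_lanes([(0, 10), (5, 15), (10, 20), (12, 13)])
```

Overlapping sessions end up stacked in separate lanes, while back-to-back sessions share one.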

Bulk store reads added: GitIndex.commits_by_session/commit_times/commits_for_repo,
FileIndex.error_times, HealthStore.snapshot_rows. Three hand-rolled SVG
components (no chart dep). Also includes the shared frontend plumbing for all
three round-4 tracks (api client/types, styles, nav/routes).

249 backend tests pass; ruff + build clean.
A /tmp/muse.har capture showed concurrent requests gridlocking the server
(/related 39-103s, /open-loops 41s) — isolated curls had hidden it.

- incremental.new_objects + transcript.iter_json_lines: tail-read files over
  MAX_PARSE_BYTES (env MUSE_MAX_PARSE_BYTES, 128MB; 0 disables). Largest real
  session ~40MB; gemini tmp dumps run 114MB-3.5GB. 3.6GB session 33.8s -> 2.4s.
- lineage.build_lineage: mtime-cached + incremental (was a full re-parse on
  EVERY session open, ~0.23s flat; now 1.7ms warm).
- _parse_cached: evict by total bytes (MUSE_PARSE_CACHE_BYTES, 256MB) not a
  6-entry count. 100+ sessions were thrashing, so open-loops re-parsed ~12
  briefs every poll. Now sessions stay warm.
- get_related_sessions: one bulk edited_files_by_session() query instead of a
  per-session lock-acquiring query in a loop.
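
Evicting by total bytes rather than by entry count might look like the sketch below. The class is hypothetical, and the least-recently-used order is an assumption; the commit message only says that eviction is driven by a byte budget.

```python
from collections import OrderedDict


class ByteBudgetCache:
    """Keep entries until their combined size exceeds max_bytes, then evict the oldest."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.total = 0
        self._items = OrderedDict()  # key -> (value, size_in_bytes)

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key][0]

    def put(self, key, value, size):
        if key in self._items:
            self.total -= self._items.pop(key)[1]
        self._items[key] = (value, size)
        self.total += size
        # Always keep the newest entry, even if it alone exceeds the budget.
        while self.total > self.max_bytes and len(self._items) > 1:
            _, (_, old_size) = self._items.popitem(last=False)
            self.total -= old_size


cache = ByteBudgetCache(max_bytes=100)
cache.put("a", "parsed-a", 40)
cache.put("b", "parsed-b", 40)
cache.get("a")                  # touch "a" so "b" becomes the oldest
cache.put("c", "parsed-c", 40)  # 120 bytes > 100, so "b" is evicted
```

With a byte budget, many small sessions stay warm together, which a fixed 6-entry count could not allow.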

Results: /related 0.31s cold/14ms warm, /open-loops 2.8s/2ms, /board 0.27s/1ms.
Adds test_parse_cap.py (tail-cap + offset-advance regression tests).
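
The tail-cap idea can be sketched as follows. `iter_json_lines_tail` is a hypothetical stand-in for the real `transcript.iter_json_lines`, and it does not cover the offset-advance behavior the regression tests mention.

```python
import json
import os
import tempfile


def iter_json_lines_tail(path, max_bytes):
    """Yield JSON objects from a JSONL file, reading only the last max_bytes if it is larger.

    max_bytes <= 0 disables the cap and reads the whole file.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        if max_bytes > 0 and size > max_bytes:
            # Seek one byte early: if that byte is a newline, readline() consumes only it
            # and no complete line is lost; otherwise it discards the partial line.
            f.seek(size - max_bytes - 1)
            f.readline()
        for raw in f:
            raw = raw.strip()
            if not raw:
                continue
            try:
                yield json.loads(raw)
            except json.JSONDecodeError:
                continue  # skip a truncated or corrupt line instead of failing


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "session.jsonl")
    with open(path, "w", newline="\n") as f:
        for n in range(3):
            f.write(json.dumps({"n": n}) + "\n")  # each line is 9 bytes
    everything = [o["n"] for o in iter_json_lines_tail(path, max_bytes=0)]
    aligned = [o["n"] for o in iter_json_lines_tail(path, max_bytes=18)]
    mid_line = [o["n"] for o in iter_json_lines_tail(path, max_bytes=12)]
```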
…header dropdown

- ConversationView: viewport virtualization (opt-in via virtualize, only the
  main viewer). Mounts only the rows near the scroll position instead of all
  ~8900, so the Markdown/React mount cost that made big sessions slow is gone.
  Prefix-sum spacers from measured heights; native scrollIntoView for every
  jump so correctness doesn't depend on offset math. forwardRef exposes
  scrollToUuid/scrollToTool/scrollToBottom; SessionViewPage routes its six
  scroll sites (deep link, timeline, tool-sync, side panels, jump-to-bottom,
  live append) through it. Opens at top / follows tail, as before. Live panes
  unchanged (virtualize defaults off).
- RelatedSessions: rendered as a header dropdown next to Subagents (was a
  below-header bar).
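
The prefix-sum spacer computation is language-agnostic; the sketch below shows it in Python rather than the frontend's own code. The function name and the `overscan` parameter are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def visible_window(heights, scroll_top, viewport, overscan=2):
    """Return (first, last, top_spacer, bottom_spacer) for rows with measured heights.

    Rows first..last-1 are mounted; the two spacers stand in for everything else,
    so the scrollbar keeps its full length.
    """
    # offsets[i] is the top of row i; offsets[-1] is the total content height.
    offsets = [0, *accumulate(heights)]
    first = max(bisect_right(offsets, scroll_top) - 1 - overscan, 0)
    last = min(bisect_right(offsets, scroll_top + viewport) + overscan, len(heights))
    top_spacer = offsets[first]
    bottom_spacer = offsets[-1] - offsets[last]
    return first, last, top_spacer, bottom_spacer


window = visible_window([100] * 10, scroll_top=250, viewport=200, overscan=0)
```

Binary search over the prefix sums finds the window in logarithmic time, so the cost per scroll event stays small even at roughly 8900 rows.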
The event loop crept to 100% CPU because SSE streams + their per-session
tailers leaked. The stream/board generators polled request.is_disconnected(),
which consumes the ASGI receive channel sse_starlette's own disconnect
listener needs — so http.disconnect was stolen, streams were never cancelled,
finally/unsubscribe never ran, and tailers (force-polling the FS every 500ms)
ran forever. A fresh server with 0 connections still had 5 SSE handlers + 4
tailers; over hours these pile up and peg the loop.

- routers/stream.py, routers/board.py: drop the manual is_disconnected() (let
  sse_starlette cancel the generator) + EventSourceResponse(ping=) so half-open
  sockets are caught by a failed write. Verified: tailer 1->2 on connect, back
  to 1 on disconnect; idle CPU 0-2% and flat.
- cli.py: restart now kills whatever holds the port (ss-based _pids_on_port),
  not just the pidfile pid — a stale/pegged instance under a different pid no
  longer survives 'muse restart'. _start refuses if the port is held by an
  unknown pid. Shared _kill() helper.
- main.py: add GET /api/debug/tasks (asyncio.all_tasks() + stacks) — pinpoints
  event-loop spins the thread dump can't see (loop shows only uvicorn.run).
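
A task dump of this kind needs only the standard library. The sketch below shows the core of what such an endpoint could return; `dump_tasks` and the output shape are assumptions, and the route wiring is omitted.

```python
import asyncio


def dump_tasks(limit=5):
    """Snapshot every pending asyncio task with its name, coroutine, and top frames.

    Must be called from inside the running event loop.
    """
    snapshot = []
    for task in asyncio.all_tasks():
        coro = task.get_coro()
        frames = task.get_stack(limit=limit)
        snapshot.append({
            "name": task.get_name(),
            "coro": getattr(coro, "__qualname__", repr(coro)),
            "stack": [
                f"{frame.f_code.co_filename}:{frame.f_lineno} in {frame.f_code.co_name}"
                for frame in frames
            ],
        })
    return snapshot


async def _demo():
    sleeper = asyncio.create_task(asyncio.sleep(60), name="sleeper")
    await asyncio.sleep(0)  # let the new task start
    result = dump_tasks()
    sleeper.cancel()
    return result


snapshot = asyncio.run(_demo())
```

A thread dump shows only the loop's own frame, whereas this view lists each suspended coroutine separately.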
The event-loop spin had a second, more fundamental cause beyond the SSE leak:
uvicorn's default httptools protocol busy-loops at ~100% CPU on half-closed
(CLOSE-WAIT) connections — left behind by port probes, dropped SSE/MCP streams,
and proxies. A single such connection pegged the loop indefinitely (the prior
stale instance ran 15h at 100%). h11 handles the half-close correctly.

Verified: with 5 CLOSE-WAIT connections present, httptools held a constant 100%;
h11 settles to 0-2% and the dead connections drain to 0. Throughput is
irrelevant for a local single-user tool.
…pection

Continued hunt for the 100%-CPU event-loop spin that GIL-starves every request
(seen in veryslow.har: /related 129s, events/files/commits 24-31s — all fast,
<20ms, when the loop is calm). Confirmed: CPU spins only while unreaped
CLOSE-WAIT connections exist; the access proxy creates them. Could NOT reproduce
synthetically (clean half-closes are always reaped under both loops), so these
are defensive, not a verified root-cause fix:

- cli.py: loop="asyncio" (was uvloop). uvloop AND httptools are uvicorn's C
  fast-path and both have known busy-loops on half-closed sockets; h11 (already
  in use) + the stdlib loop are the correctness-first combo and let
  /api/debug/loop introspect the loop.
- routers/stream.py, board.py: MUSE_SSE_MAX_SECONDS cap (default 300s) so a
  proxy-half-closed SSE connection can't hold the loop for hours — it self-heals
  on the client's transparent EventSource reconnect.
- main.py: GET /api/debug/loop (ready / scheduled / selector_fds / tasks) — call
  it WHILE slow to finally classify the spin (callback storm vs fd spin vs task).

254 tests pass; ruff clean.
… cause)

Caught the 100%-CPU spin live via /api/debug/loop: ready≈N (loop never sleeps),
N orphaned RequestResponseCycle.run_asgi SSE tasks, no muse code in any stack.
Root cause: sse_starlette's _listen_for_disconnect (and Starlette's
StreamingResponse) run 'while active: msg = await receive()'. uvicorn's receive()
blocks on an idle request EXCEPT when the connection has pending bytes / certain
keep-alive/proxy states, where it returns http.request immediately and repeatedly
— so the listener loops at 100% CPU, one spin per orphaned SSE connection. That
GIL-starved every request (the HARs: session open 0.5s -> 60s, /related -> 121s;
all <20ms when the loop is calm).

SafeEventSourceResponse (muse/sse.py) wraps the receive channel so non-disconnect
messages are swallowed with a 0.5s floor — the listener can only ever wake on a
real http.disconnect and can never busy-loop, whatever uvicorn/the proxy feeds it.
Used by both /stream and /board/stream. Verified: a 2000-request pipelined flood
on an SSE connection keeps CPU 0% / ready 0 (and /api/debug/loop now measures the
spin directly: ready>0 == spinning).
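
The receive-wrapping idea reduces to a small pure-ASGI helper. This is a sketch of the technique only: `disconnect_only` is a hypothetical name, and how `SafeEventSourceResponse` actually installs the wrapper is not shown in the commit.

```python
import asyncio


def disconnect_only(receive, floor=0.5):
    """Wrap an ASGI receive callable so it only ever returns http.disconnect.

    Any other message is swallowed, with a sleep of `floor` seconds between polls,
    so a caller looping on receive() cannot spin even if the server keeps
    returning messages immediately.
    """
    async def wrapped():
        while True:
            message = await receive()
            if message["type"] == "http.disconnect":
                return message
            await asyncio.sleep(floor)
    return wrapped


async def _demo():
    feed = [{"type": "http.request"}, {"type": "http.request"},
            {"type": "http.disconnect"}]
    calls = 0

    async def noisy_receive():
        nonlocal calls
        calls += 1
        return feed[calls - 1]

    message = await disconnect_only(noisy_receive, floor=0.01)()
    return message, calls


message, calls = asyncio.run(_demo())
```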
The spin survived SafeEventSourceResponse (ready stays 3 under real proxy use),
so it is NOT sse_starlette's disconnect listener. Dump the actual ready-queue
callbacks + per-fd selector registration so the next live capture names exactly
what is re-scheduling itself / which fd the loop keeps waking on.
…at the source

Root-caused via /api/debug/loop with ready-callback dumps: the perpetually-ready
callback was anyio CancelScope._deliver_cancellation. anyio re-schedules it via
call_soon for as long as a cancelled task group has a child that hasn't exited.
sse_starlette's EventSourceResponse (and Starlette's StreamingResponse) run the
stream + a receive()-based disconnect listener inside anyio.create_task_group();
on the connection states a reverse proxy creates, a child never settles, so anyio
busy-loops cancelling it — one stuck CancelScope per orphaned SSE connection,
pegging the loop at 100% and GIL-starving every request (session open 0.5s->60s,
/related->121s in the HARs; all <20ms when calm). Neither the earlier
is_disconnected removal nor SafeEventSourceResponse helped because the busy-loop
is anyio's cancellation, not the receive.

RawSSE (muse/sse.py) streams with plain ASGI sends: no task group, no concurrent
receive listener — nothing for anyio to busy-loop cancelling. A gone client's
response lingers idle (zero CPU) until MUSE_SSE_MAX_SECONDS (default 90) then
self-closes; the browser's EventSource reconnects transparently. Verified the
frames are valid text/event-stream.
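
Streaming SSE with plain ASGI sends can be sketched as below. `raw_sse` is a hypothetical function, not the `RawSSE` class itself; it shows only the two properties described above: no task group or receive listener, and a hard lifetime cap.

```python
import asyncio
import time


async def raw_sse(send, events, max_seconds=90.0):
    """Stream an async iterator of strings as text/event-stream frames."""
    await send({"type": "http.response.start", "status": 200,
                "headers": [(b"content-type", b"text/event-stream"),
                            (b"cache-control", b"no-cache")]})
    deadline = time.monotonic() + max_seconds
    iterator = events.__aiter__()
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            # Waiting on the next event is the only suspension point, so an idle
            # stream costs no CPU until the deadline closes it.
            data = await asyncio.wait_for(iterator.__anext__(), timeout=remaining)
        except (StopAsyncIteration, asyncio.TimeoutError):
            break
        frame = "".join(f"data: {line}\n" for line in (data.splitlines() or [""])) + "\n"
        await send({"type": "http.response.body", "body": frame.encode(),
                    "more_body": True})
    await send({"type": "http.response.body", "body": b"", "more_body": False})


async def _demo():
    sent, idle_sent = [], []

    async def two_events():
        yield "hello"
        yield "two\nlines"

    async def never():
        await asyncio.sleep(10)
        yield "late"

    async def collect(message):
        sent.append(message)

    async def collect_idle(message):
        idle_sent.append(message)

    await raw_sse(collect, two_events(), max_seconds=5)
    await raw_sse(collect_idle, never(), max_seconds=0.05)
    return sent, idle_sent


sent, idle_sent = asyncio.run(_demo())
```

The idle case mirrors the described behavior: nothing arrives, the cap expires, and the response closes so the client can reconnect.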
Responses had no compression; the big thread JSON dominates load time over a
port-forward. Pure-ASGI gzip that passes text/event-stream through untouched
(stock GZipMiddleware buffers SSE and reintroduces the stall class we just
killed). Compression runs in a thread executor — zlib releases the GIL, so a
~300ms compress of an 11MB body doesn't block the loop or other requests.
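
A compact sketch of a pure-ASGI gzip layer with an SSE passthrough. The class name and `minimum_size` are hypothetical, and this simplified version buffers every non-SSE response in full, which the real middleware may or may not do.

```python
import asyncio
import gzip


class GzipExceptSSE:
    def __init__(self, app, minimum_size=1024):
        self.app = app
        self.minimum_size = minimum_size

    async def __call__(self, scope, receive, send):
        accepts = scope["type"] == "http" and b"gzip" in dict(
            scope.get("headers") or []).get(b"accept-encoding", b"")
        if not accepts:
            await self.app(scope, receive, send)
            return
        start, chunks, passthrough = None, [], False

        async def wrapped_send(message):
            nonlocal start, passthrough
            if passthrough or message["type"] not in (
                    "http.response.start", "http.response.body"):
                await send(message)
                return
            if message["type"] == "http.response.start":
                headers = dict(message.get("headers") or [])
                if (headers.get(b"content-type", b"").startswith(b"text/event-stream")
                        or b"content-encoding" in headers):
                    passthrough = True  # streams go through untouched, frame by frame
                    await send(message)
                else:
                    start = message  # hold the start until the body size is known
                return
            chunks.append(message.get("body", b""))
            if message.get("more_body"):
                return
            body = b"".join(chunks)
            headers = [(k, v) for k, v in start.get("headers", [])
                       if k != b"content-length"]
            if len(body) >= self.minimum_size:
                # Compress off the event loop so other requests keep being served.
                loop = asyncio.get_running_loop()
                body = await loop.run_in_executor(None, gzip.compress, body)
                headers += [(b"content-encoding", b"gzip"), (b"vary", b"Accept-Encoding")]
            headers.append((b"content-length", str(len(body)).encode()))
            await send({**start, "headers": headers})
            await send({"type": "http.response.body", "body": body})

        await self.app(scope, receive, wrapped_send)


PAYLOAD = b'{"k": "v"}' * 500


async def json_app(scope, receive, send):
    await send({"type": "http.response.start", "status": 200,
                "headers": [(b"content-type", b"application/json")]})
    await send({"type": "http.response.body", "body": PAYLOAD})


async def sse_app(scope, receive, send):
    await send({"type": "http.response.start", "status": 200,
                "headers": [(b"content-type", b"text/event-stream")]})
    await send({"type": "http.response.body", "body": b"data: hi\n\n", "more_body": True})
    await send({"type": "http.response.body", "body": b"", "more_body": False})


def run(app):
    sent = []

    async def send(message):
        sent.append(message)

    async def receive():
        return {"type": "http.request", "body": b""}

    scope = {"type": "http", "path": "/", "headers": [(b"accept-encoding", b"gzip, br")]}
    asyncio.run(GzipExceptSSE(app)(scope, receive, send))
    return sent


json_out, sse_out = run(json_app), run(sse_app)
```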
The viewer virtualized RENDERING but still fetched the entire thread (~10k
items / 11MB) before showing a slice. Now GET /api/sessions/{id} takes
limit/anchor/before/after/around and ships only the needed window; the full
parse stays cached so a window is a cheap slice. Default anchor follows
liveness — head for a finished session (read top-down), tail for a live one.
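
One way to picture the windowing is a pure function from the query parameters to a slice. This is a sketch under assumptions: the parameters are treated as integer item indices here, whereas the real endpoint may key them on UUIDs or timestamps, and the function names are hypothetical.

```python
def default_anchor(is_live):
    """Head for a finished session (read top-down), tail for a live one."""
    return "tail" if is_live else "head"


def window_bounds(n, limit, anchor="head", before=None, after=None, around=None):
    """Return half-open (start, end) indices into a cached thread of n items."""
    if before is not None:
        # The `limit` items that end just before index `before`.
        end = min(before, n)
        return max(end - limit, 0), end
    if around is not None:
        # Centered on `around`, shifted inward so edge windows stay full-sized.
        start = min(max(around - limit // 2, 0), max(n - limit, 0))
    elif after is not None:
        start = min(after + 1, n)
    elif anchor == "tail":
        start = max(n - limit, 0)
    else:
        start = 0
    return start, min(start + limit, n)


def slice_thread(items, **params):
    start, end = window_bounds(len(items), **params)
    return items[start:end]


thread = list(range(1000))
tail = slice_thread(thread, limit=100, anchor=default_anchor(is_live=True))
```

Because the full parse stays cached, each window is only a list slice on the server side.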

Frontend loads a window on open and fetches adjacent windows as you scroll past
an edge: append reuses the append-only machinery, prepend shifts the height
cache + anchors scrollTop so the viewport stays put. All jump paths (timeline,
backlinks, notes, health, re-entry, ?focus=, compaction, file-ops) load-then-
scroll via around=. Subagent menu derives from the full event timeline, not the
windowed items. Opening a 45MB live session: 99KB on the wire vs 2.4MB.

No params still returns the full thread (MCP + subagents unchanged).