feat: --watch mode — keep the index live as files change
Summary
Add a long-running java-codebase-rag watch (or init/increment --watch) that
listens for source-tree changes and updates the index incrementally in the
background, so the index stays current without the operator re-running
increment by hand.
Problem
Today the index is static between explicit increment invocations. During
active development (tight edit→search loops inside an MCP host), the index
drifts from the working tree until the user remembers to re-run increment.
For a tool whose value is "fresh, correct context for the agent,"
stale-by-default is the wrong default.
Why this is more tractable than it looks
The hard parts of incremental update already exist:
- Vectors: cocoindex @coco.fn(memo=True) skips unchanged files automatically
(java_index_flow_lancedb.py). A single changed file ≈ a single embed + a
small merge-insert.
- Graph: build_ast_graph.py::incremental_rebuild (--incremental) already
detects added/changed/removed files and their dependents (cross-file
CALLS / HTTP_CALLS / ASYNC_CALLS edges are re-scoped), and rewrites only
that scope. The dependent-rescope logic — usually the gnarly part of a code
graph watcher — is done.
- Ignore: LayeredIgnore already decides what's in/out of the index, so the
watcher can filter noise for free.
So watch is mostly a trigger/wiring problem, not a core-algorithm problem.
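To make the "trigger/wiring" framing concrete, here is a minimal sketch of the change set a watcher would hand to the existing primitives. It is an illustration only: snapshot and diff_snapshots are hypothetical helpers, and content hashing is an assumed stand-in, not a description of how cocoindex memoization or incremental_rebuild detect changes internally.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple


def snapshot(root: Path, suffix: str = ".java") -> Dict[str, str]:
    """Map each matching file (path relative to root) to a hash of its content."""
    result: Dict[str, str] = {}
    for path in sorted(root.rglob(f"*{suffix}")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            result[path.relative_to(root).as_posix()] = digest
    return result


def diff_snapshots(
    old: Dict[str, str], new: Dict[str, str]
) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, changed, removed) paths between two snapshots."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return added, changed, removed
```

In watch mode the existing increment path would presumably keep doing this detection itself; the watcher only needs to decide when to call it.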
Proposed UX
java-codebase-rag watch --source-root <repo> [--index-dir <dir>] [--debounce 1.5s]
# runs until Ctrl-C; prints a structured event per applied update
Optional: init --watch / increment --watch flags that enter watch mode after
the initial pass.
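The proposed command line could be parsed roughly as follows. This is a sketch under assumptions: build_parser and parse_duration are hypothetical names, the 1.5 second default mirrors the example above, and accepting an "ms" suffix is an extra convenience not stated in the proposal.

```python
import argparse
import re


def parse_duration(text: str) -> float:
    """Parse '1.5s', '250ms', or a bare number of seconds into seconds."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(ms|s)?\s*", text)
    if match is None:
        raise argparse.ArgumentTypeError(f"invalid duration: {text!r}")
    value = float(match.group(1))
    return value / 1000.0 if match.group(2) == "ms" else value


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing only the proposed watch subcommand."""
    parser = argparse.ArgumentParser(prog="java-codebase-rag")
    sub = parser.add_subparsers(dest="command", required=True)
    watch = sub.add_parser("watch", help="keep the index live as files change")
    watch.add_argument("--source-root", required=True)
    watch.add_argument("--index-dir", default=None)
    watch.add_argument("--debounce", type=parse_duration, default=1.5)
    return parser
```

For example, build_parser().parse_args(["watch", "--source-root", "repo", "--debounce", "250ms"]) yields a debounce value of 0.25 seconds.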
Design options (the part to grill)
Option A — Watchdog → debounced increment (recommended MVP)
watchdog (FSEvents on macOS, inotify on Linux) watches the tree; events are
debounced (editors multi-save; build tools rewrite generated files); on a quiet
window we invoke the existing increment path (run_cocoindex_update +
run_incremental_graph).
- ✅ Reuses the proven subprocess architecture (flow runs in a child cocoindex
process today; this changes nothing about that).
- ✅ Smallest blast radius — no new coupling to cocoindex internals.
- ❌ Re-spawns the cocoindex child per flush → per-flush process startup +
model-load cost (mitigated: model stays in OS page cache; embedder warm-up is
the ~1s cost, acceptable at human edit cadence).
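A minimal sketch of the Option A loop, assuming a trailing-edge debounce: every event resets a timer, and the batch is flushed only after a quiet window. Debouncer and watch are hypothetical names; the flush callback is where the existing increment path (run_cocoindex_update + run_incremental_graph) would be invoked, and its real signature is not assumed here. The watchdog import is deferred so the debounce logic can be exercised without the package installed.

```python
import threading
from typing import Callable, Optional, Set


class Debouncer:
    """Collect changed paths and flush them once no event arrives for `quiet` seconds."""

    def __init__(self, quiet: float, flush: Callable[[Set[str]], None]) -> None:
        self._quiet = quiet
        self._flush = flush
        self._lock = threading.Lock()
        self._pending: Set[str] = set()
        self._timer: Optional[threading.Timer] = None

    def notify(self, path: str) -> None:
        """Record a changed path and restart the quiet-window timer."""
        with self._lock:
            self._pending.add(path)
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self._quiet, self._fire)
            self._timer.daemon = True
            self._timer.start()

    def _fire(self) -> None:
        # Swap the pending set out under the lock, then flush outside it
        # so a slow flush never blocks incoming events.
        with self._lock:
            batch, self._pending = self._pending, set()
            self._timer = None
        if batch:
            self._flush(batch)


def watch(root: str, debouncer: Debouncer) -> None:
    """Feed watchdog file events into the debouncer until Ctrl-C."""
    from watchdog.events import FileSystemEventHandler
    from watchdog.observers import Observer

    class Handler(FileSystemEventHandler):
        def on_any_event(self, event):
            if not event.is_directory:
                debouncer.notify(event.src_path)

    observer = Observer()
    observer.schedule(Handler(), root, recursive=True)
    observer.start()
    try:
        while observer.is_alive():
            observer.join(timeout=1.0)
    except KeyboardInterrupt:
        pass
    finally:
        observer.stop()
        observer.join()
```

One design note: this sketch does not serialize flushes, so a real implementation would likely need to ensure a second flush cannot start while the previous increment child is still running.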
Option B — cocoindex LiveComponent (proper reactive vectors)
Use cocoindex's live-component model (coco.LiveComponentOperator,
update_full / mark_ready / update / delete, watchdog-driven) so vectors
update in-process without re-spawning, paired with graph incremental_rebuild.
- ✅ No per-flush child spawn; true incremental vector upsert.
- ❌ Architectural mismatch. The flow currently runs as a short-lived child
(cocoindex update …). LiveComponent requires a long-lived process
holding the flow live — i.e. we'd host cocoindex in-process as a daemon. This
is a real shift in how the lifecycle works, and conflicts with the current
"spawn and wait" model in pipeline.py.
- ❌ cocoindex's live model only covers the vectors half; the graph half
still needs our own incremental_rebuild trigger, so we'd carry two update
mechanisms (one cocoindex-native, one ours) that must stay coherent.
Recommendation
Ship Option A first (debounced increment). Treat Option B as a future
optimization only if per-flush spawn cost is measured to matter at real edit
cadence.
Open questions (please grill)
- optimize() throttling. increment ends with a serialized
table.optimize() (full Lance compaction). On a watch loop firing per edit,
that's unacceptable overhead. Proposal: in watch mode, skip optimize per
flush and run it on a coarse timer (e.g. every 60s of activity, or every N
flushes). Acceptable? Does un-compacted-with-deletes hurt query correctness,
or only latency/disk? (Recall: query correctness does not depend on
optimize(); only compaction and pruning do.)
- FTS index freshness. ensure_text_fts_index is created lazily on first
search and not refreshed as rows are added/removed. Under continuous watch,
do deleted/added chunks appear in find results until a re-create? Need to
confirm whether the FTS index is append-friendly or needs periodic
replace=True.
- Process model. Watch is a long-running daemon; the MCP server is a
short-lived stdio process per host session. Should watch be (a) a separate
background process the user starts (simplest, decoupled), or (b) something the
MCP server spawns/manages? Lean (a) — keeps the stdio server lean and avoids
lifecycle coupling. Confirm.
- MCP server cache invalidation. Does the running MCP server (or its
callers) cache anything that a background watch update would make stale? If
so, watch needs a signal/heartbeat the server can observe (or the server
must re-open tables per request).
- Editor/build-tool noise. target/, build/, generated sources. Do we
rely solely on LayeredIgnore, or add a watch-specific debounce/heuristic for
rapid burst writes (e.g. a mvn compile touching hundreds of files)?
- Coherence across the two stores. If vectors update but the graph rebuild
fails (or vice-versa) mid-flush, the index is temporarily inconsistent. Is
best-effort eventual consistency acceptable for a background freshness
feature (vs. the strict consistency init/increment promise)?
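The optimize() throttling proposal from the first open question could be sketched as a small policy object. This is an assumption-laden illustration: OptimizeThrottle is a hypothetical name, the default of 20 flushes is a placeholder for "every N flushes", and the time check runs only when a flush is recorded, so an idle watcher never triggers compaction.

```python
import time
from typing import Callable


class OptimizeThrottle:
    """Decide when a watch loop should run the expensive optimize step."""

    def __init__(
        self,
        every_flushes: int = 20,
        every_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._every_flushes = every_flushes
        self._every_seconds = every_seconds
        self._clock = clock  # injectable so tests can control time
        self._flushes_since = 0
        self._last_optimize = clock()

    def record_flush(self) -> bool:
        """Record one applied flush; return True if optimize should run now."""
        self._flushes_since += 1
        due = (
            self._flushes_since >= self._every_flushes
            or self._clock() - self._last_optimize >= self._every_seconds
        )
        if due:
            self._flushes_since = 0
            self._last_optimize = self._clock()
        return due
```

The watch loop would call record_flush() after each applied update and run table.optimize() only when it returns True, plus presumably once more on shutdown so the table is left compacted.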
Non-goals
- Real-time (<250ms) updates — human edit cadence with debounce is the target.
- Distributed / multi-writer index access.
- Replacing the increment command: watch layers on top of the same primitives.
Rough effort
Option A MVP: ~1 PR (watchdog integration + debounce + increment reuse +
optimize throttling + tests). Option B: multi-PR, blocked on the lifecycle
decision in Q3 (the process-model question above).