diff --git a/plans/AGENT-PROMPTS-INIT-INCREMENT-PERF.md b/plans/AGENT-PROMPTS-INIT-INCREMENT-PERF.md new file mode 100644 index 00000000..d1c602fb --- /dev/null +++ b/plans/AGENT-PROMPTS-INIT-INCREMENT-PERF.md @@ -0,0 +1,373 @@ +# Agent task prompts — Faster init/increment (PR-P1 → PR-P3) + +Status: **active**. One self-contained prompt per PR. Copy the prompt verbatim +into the agent, attach the files in its `@-files` block, and let it execute. + +> **Revised after the PR #339 subagent review.** PRs are split by **write-function**, +> not by path — the graph write helpers are shared between the full and incremental +> paths, so converting one accelerates both. Sentinel greps were corrected (the +> PR-P3 "must return zero" grep previously over-matched once-per-run sites). + +**Workflow per PR:** + +1. Create the branch named in the prompt off the stated base. +2. Read the cited plan section in full **before** writing code. +3. Implement step-by-step; run the listed tests after each step. +4. Run the sentinel greps — every "must return zero" line must be empty, every + "must be non-zero" line must hit. +5. Paste the manual-evidence output into the PR description. +6. Open a PR with the exact title in the Definition of Done. + +**Universal rules for every prompt:** + +- Use only `.venv/bin/python` and `.venv/bin/pip` (never system python/pip). +- `server.py` is stdio — never write to stdout from anything reachable by a tool handler. +- Do not add a cocoindex dependency outside `java_index_flow_lancedb.py`. +- The plan is the source of truth — if this prompt and the plan disagree, the plan wins. +- Do not touch any file outside the prompt's `@-files` + the test files it names. If you think an adjacent file must change, **stop and ask**. +- Do not loosen any existing test assertion to make it pass. +- Breaking changes are allowed; no compatibility shims. 
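The sentinel-grep rule in workflow step 4 can be sketched in Python, purely as an illustration. `check_sentinels` and its arguments are hypothetical names, not part of the repo; the binding gate remains the `grep` commands listed in each prompt.

```python
import re


def check_sentinels(source_text, must_be_zero, must_be_nonzero):
    """Return failure messages; an empty list means the gate passes.

    Hypothetical helper: every "must return zero" pattern has to find
    nothing, and every "must be non-zero" pattern has to find a hit.
    """
    lines = source_text.splitlines()
    failures = []
    for pattern in must_be_zero:
        hits = [n for n, line in enumerate(lines, 1) if re.search(pattern, line)]
        if hits:
            failures.append(f"scope leak: {pattern!r} matched lines {hits}")
    for pattern in must_be_nonzero:
        if not any(re.search(pattern, line) for line in lines):
            failures.append(f"over-deletion: {pattern!r} matched nothing")
    return failures


# Made-up two-line module text standing in for build_ast_graph.py.
sample = "_MERGE_SYMBOL = '...'\nconn.execute(f'COPY {t} FROM $rows', p)\n"
result = check_sentinels(
    sample,
    must_be_zero=[r"_CREATE_(EXT|IMPL)\b"],
    must_be_nonzero=[r"_MERGE_SYMBOL\b", r"COPY .*FROM \$rows"],
)
print(result)
```

A non-empty result in the first loop corresponds to a scope leak, and in the second to over-deletion, matching the two failure modes the orchestrator notes describe.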
+ +--- + +## PR-P1 — Bulk `COPY FROM` for `_write_edges` (shared; the ~250s prize) + +**Branch:** `perf/bulk-graph-writes-p1` off `master`. +**Base:** `master`. +**Plan section:** `plans/active/PLAN-INIT-INCREMENT-PERF.md` § PR-P1 (read this first). +**Estimated diff size:** medium (one module + tests + a fixture). + +**Attach (`@-files`):** + +- `@build_ast_graph.py` (focus: `_write_edges:3244`, `_node_row:2994`, `_SCHEMA_*:2812-2940`, `_callee_declaring_role_at_write:1647`, `seen_calls:3282`, `seen_ucs:3317`, `_populate_declares_rows:3189`) +- `@propose/active/INIT-INCREMENT-PERF-PROPOSE.md` (design; on PR #338's branch — if absent, the staging invariants are inlined in the plan §PR-P1) +- `@tests/test_ast_graph_build.py` (regression net + where new tests go) +- `@tests/test_incremental_graph.py` (`_write_edges` is shared — incremental regression is binding) +- `@tests/_builders.py` (graph-build helpers) + +**Prompt:** + +```` +You are implementing PR-P1 from `plans/active/PLAN-INIT-INCREMENT-PERF.md`. + +Read the **PR-P1** section in full. The plan wins if this prompt disagrees. + +KEY FACT: `_write_edges` (build_ast_graph.py:3244) is a SHARED helper called by +BOTH the full path (write_ladybug:3926) and the incremental path (:805). So +converting it accelerates both — there is no "full-only" edge conversion. PR-P1 +converts ONLY `_write_edges` (+ adds the `_bulk_copy` primitive). Nodes, routes, +clients/producers, and GraphMeta are PR-P2. + +## Scope + +1. **Step-1 spike (first commit):** confirm the exact REL `COPY FROM` column + naming + `pa.Table.from_pylist` typing with a throwaway 2-Symbol + 1-CALLS + toy. Record the working incantation in the `_bulk_copy` docstring. +2. Add `_bulk_copy(conn, table_name, columns, rows)` + the `_REL_*_COLUMNS` / + column constants (REL tables list FROM/TO first; match `_SCHEMA_*` order). +3. 
Convert `_write_edges` to per-edge-type row staging (apply the SAME + `seen_calls`/`seen_ucs` dedup and SAME `_callee_declaring_role_at_write` + lookup at staging, before appending), then `_bulk_copy` each REL table. + Bulk-load UnresolvedCallSite NODE rows before UNRESOLVED_AT edges (Symbol + nodes are already loaded by `_write_nodes`). +4. Delete the dead module constants `_CREATE_EXT/IMPL/INJ/DECL/OVERRIDES/CALL` + and the local `_CREATE_UNRESOLVED`/`_CREATE_UNRESOLVED_AT` (defined inside + `_write_edges` at :3307/:3313 — removed with the rewrite). +5. Generate + commit `tests/fixtures/graph_baseline_bank_chat.json` from the + last per-row `_write_edges` build before removal. +6. Add the four named tests. + +## Out of scope (do NOT touch) + +- Node writes (`_write_nodes`, `_write_nodes_impl`, `_write_nodes_merge`, + `_CREATE_SYMBOL`, `_MERGE_SYMBOL`) — PR-P2. +- Routes/clients/producers/calls (`_write_routes_and_exposes`, + `_write_clients_producers_and_calls`, and their `_CREATE_*` constants + including `_CREATE_CLIENT`/`_CREATE_PRODUCER`/`_CREATE_ROUTE` etc.) — PR-P2. +- `_write_meta` / GraphMeta — leave the MERGE alone (PR-P2 also leaves it). +- `java_index_flow_lancedb.py`, `path_filtering.py`, `server.py`. +- Any schema/ontology/re-index change. CSV or Parquet-file staging. + +If you find yourself wanting to touch any of the above, **stop and ask**. + +## Deliverables + +1. `_bulk_copy` helper + column-order constants. +2. `_write_edges` stages per-type rows and bulk-loads (CALLS dedup + callee_declaring_role at staging; UnresolvedCallSite before UNRESOLVED_AT). +3. Dead `_CREATE_EXT/IMPL/INJ/DECL/OVERRIDES/CALL` + locals removed. +4. `tests/fixtures/graph_baseline_bank_chat.json` committed. +5. Four new tests in `tests/test_ast_graph_build.py`. 
+ +## Tests + +Run, all must pass: +``` +.venv/bin/python -m pytest tests/test_ast_graph_build.py tests/test_incremental_graph.py -v +.venv/bin/python -m pytest tests/test_bank_chat_brownfield_integration.py tests/test_call_edges_e2e.py -q +.venv/bin/ruff check . +``` + +Sentinel greps — **must return zero**: +``` +grep -nE "_CREATE_(EXT|IMPL|INJ|DECL|OVERRIDES|CALL|UNRESOLVED|UNRESOLVED_AT)\b" build_ast_graph.py +``` + +Sentinel greps — **must be non-zero** (guards against over-deletion; these belong to PR-P2 / are retained): +``` +grep -n "_MERGE_SYMBOL\b" build_ast_graph.py # node upsert, kept until PR-P2 +grep -n "_CREATE_CLIENT\b" build_ast_graph.py # routes/clients, PR-P2 +grep -n "MERGE (r:Route" build_ast_graph.py # pass5/6 dedup, kept +grep -n "COPY .*FROM \$rows" build_ast_graph.py # bulk path present +``` + +## Manual evidence (paste in PR description) + +```bash +rm -rf /tmp/p1 && .venv/bin/python build_ast_graph.py \ + --source-root tests/bank-chat-system \ + --ladybug-path /tmp/p1/code_graph.lbug --verbose +.venv/bin/java-codebase-rag meta --source-root tests/bank-chat-system --index-dir /tmp/p1 +``` +Expected: meta `counts_json` + node/edge counts identical to a pre-PR per-row +build (paste both). Note the graph-write phase timing from `JCIRAG_PROGRESS` +lines vs the pre-PR baseline. + +## Definition of Done + +- [ ] Step-1 spike result recorded in `_bulk_copy` docstring. +- [ ] `_write_edges` stages per-type rows (CALLS dedup + callee_declaring_role at staging); UnresolvedCallSite bulk-loaded before UNRESOLVED_AT. +- [ ] `_CREATE_EXT/IMPL/INJ/DECL/OVERRIDES/CALL` + local `_CREATE_UNRESOLVED/_UNRESOLVED_AT` deleted. +- [ ] `test_bulk_write_edges_match_per_row_baseline`, `test_bulk_write_is_deterministic_double_build`, `test_bulk_write_preserves_calls_dedup_and_callee_declaring_role`, `test_bulk_write_empty_rel_table_is_noop` pass. +- [ ] Full `test_ast_graph_build.py` + `test_incremental_graph.py` pass unchanged. 
+- [ ] Sentinel greps: zero where required, non-zero where required. +- [ ] `.venv/bin/ruff check .` clean; benchmark in PR description. +- [ ] PR title: `perf(graph): bulk COPY FROM for _write_edges (PR-P1)`. +```` + +--- + +## PR-P2 — Bulk write for nodes + routes/clients/producers/calls + +**Branch:** `perf/bulk-graph-writes-p2` off PR-P1's branch (or `master` if PR-P1 merged). +**Base:** PR-P1 merged. +**Plan section:** `plans/active/PLAN-INIT-INCREMENT-PERF.md` § PR-P2 (read this first). +**Estimated diff size:** medium. + +**Attach (`@-files`):** + +- `@build_ast_graph.py` (`_write_nodes_impl:3029`, `_write_nodes:3096`, `_write_nodes_merge:817`, `_CREATE_SYMBOL`, `_MERGE_SYMBOL`, `_write_routes_and_exposes:3338`, `_write_clients_producers_and_calls:3810`, the Route `MERGE (r:Route` at `:3819`, `_write_meta:3421`, `_bulk_copy` + `_REL_*_COLUMNS` from PR-P1) +- `@tests/test_incremental_graph.py` (regression; add to `TestIncrementalOrchestrator`) + +**Prompt:** + +```` +You are implementing PR-P2 from `plans/active/PLAN-INIT-INCREMENT-PERF.md`. +Read the **PR-P2** section in full. It reuses PR-P1's `_bulk_copy` + +`_REL_*_COLUMNS`. The plan wins. + +## Scope + +1. Convert `_write_nodes_impl` (:3029, the shared workhorse called by + `_write_nodes` full + `_write_nodes_merge` incremental) to stage Symbol rows + then `_bulk_copy(conn, "Symbol", NODE_COLUMNS, rows)`. Delete `_CREATE_SYMBOL` + and `_MERGE_SYMBOL` (both dead once the workhorse is bulk). Do the existing + `resolve_role_and_capabilities` + `type_role_by_node_id` population before + staging, unchanged. +2. Convert `_write_routes_and_exposes` (:3338, shared) to per-table staging + + `_bulk_copy` for Route/EXPOSES/Client/Producer/DECLARES_CLIENT/DECLARES_PRODUCER/ + HTTP_CALLS/ASYNC_CALLS (keep the existing `_file_by_node_id`/`_file_by_client_id`/ + `_file_by_producer_id` source_file resolution). Bulk-load Route/Client/Producer + NODES before the EXPOSES/DECLARES_*/HTTP_CALLS/ASYNC_CALLS edges. 
Delete + `_CREATE_ROUTE`/`_CREATE_EXPOSES`. +3. Convert `_write_clients_producers_and_calls` (:3810, incremental-only global + pass5/6) Client/Producer/edge writes to per-type staging + `_bulk_copy` (keep + the `member_by_id`/`client_by_id`/`producer_by_id` resolution). **Retain the + `MERGE (r:Route {id:$id}) …` dedup (:3819-3828) verbatim** + add a one-line + comment it is intentionally kept. Now delete the 6 shared constants — + `_CREATE_CLIENT`/`_CREATE_PRODUCER`/`_CREATE_DECLARES_CLIENT`/ + `_CREATE_DECLARES_PRODUCER`/`_CREATE_HTTP_CALL`/`_CREATE_ASYNC_CALL` — which + are dead only after BOTH routes/exposes and clients_producers functions convert. +4. Leave `_write_meta` (:3421) and its `MERGE (m:GraphMeta …)` UNTOUCHED. +5. Add the two named tests as methods of `TestIncrementalOrchestrator` in + `tests/test_incremental_graph.py`. + +## Out of scope (do NOT touch) + +- `_write_edges` (done in PR-P1). +- `_write_meta` / GraphMeta MERGE — leave it. +- `_delete_file_scope`, `incremental_rebuild` algorithm, dependent-expansion, + crash-marker logic. +- Anything outside `build_ast_graph.py` + `tests/test_incremental_graph.py`. + +If you find yourself wanting to touch any of the above, **stop and ask**. + +## Deliverables + +1. `_write_nodes_impl` bulk; `_CREATE_SYMBOL` + `_MERGE_SYMBOL` deleted. +2. `_write_routes_and_exposes` bulk; `_CREATE_ROUTE`/`_CREATE_EXPOSES` deleted. +3. `_write_clients_producers_and_calls` Client/Producer/edges bulk; `MERGE (r:Route)` retained + commented; 6 shared `_CREATE_*` deleted. +4. `_write_meta` untouched. +5. Two new tests in `TestIncrementalOrchestrator`. + +## Tests + +Run, all must pass: +``` +.venv/bin/python -m pytest tests/test_incremental_graph.py tests/test_ast_graph_build.py -v +.venv/bin/ruff check . 
+``` + +Sentinel greps — **must return zero**: +``` +grep -nE "_CREATE_(SYMBOL|ROUTE|EXPOSES|CLIENT|PRODUCER|DECLARES_CLIENT|DECLARES_PRODUCER|HTTP_CALL|ASYNC_CALL)\b" build_ast_graph.py +grep -nE "_MERGE_SYMBOL\b" build_ast_graph.py +``` + +Sentinel greps — **must be non-zero** (Route dedup + GraphMeta MERGE retained; bulk present): +``` +grep -n "MERGE (r:Route" build_ast_graph.py +grep -n "MERGE (m:GraphMeta" build_ast_graph.py +grep -n "COPY .*FROM \$rows" build_ast_graph.py +``` + +## Manual evidence (paste in PR description) + +Single-file change equivalence: +```bash +# set up an index, touch one file, increment (bulk), then full-rebuild the same +# state (bulk) and diff graphs (node count, per-type edge counts, GraphMeta). +``` +Expected: incremental(bulk) == full-rebuild(bulk) for that state. Paste side-by-side counts. + +## Definition of Done + +- [ ] `_write_nodes_impl` bulk; `_CREATE_SYMBOL` + `_MERGE_SYMBOL` deleted. +- [ ] `_write_routes_and_exposes` bulk (Route/Client/Producer before edges); `_CREATE_ROUTE`/`_CREATE_EXPOSES` deleted. +- [ ] `_write_clients_producers_and_calls` Client/Producer/edges bulk; `MERGE (r:Route)` retained + commented; 6 shared `_CREATE_*` deleted. +- [ ] `_write_meta` untouched. +- [ ] `test_incremental_bulk_write_equivalent_to_full_rebuild`, `test_incremental_route_merge_dedup_preserved` (both in `TestIncrementalOrchestrator`) pass; full `test_incremental_graph.py` + `test_ast_graph_build.py` green. +- [ ] Sentinel greps pass. +- [ ] `.venv/bin/ruff check .` clean. +- [ ] PR title: `perf(graph): bulk COPY FROM for nodes, routes, clients/producers (PR-P2)`. +```` + +--- + +## PR-P3 — Cached `LayeredIgnore` + `is_ignored` memo + +**Branch:** `perf/cached-ignore-p3` off `master`. +**Base:** `master` (independent of PR-P1/P2). +**Plan section:** `plans/active/PLAN-INIT-INCREMENT-PERF.md` § PR-P3 (read this first). +**Estimated diff size:** small. 
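The directory-keyed memo named in this PR's title can be sketched as follows. This is a minimal illustration under stated assumptions: `DirMemo` and `build_for_dir` are hypothetical stand-ins, not the real `LayeredIgnore` or `_mega_build_for_rel`, and `PurePosixPath` is used here only to keep the example platform-independent.

```python
from pathlib import PurePosixPath


class DirMemo:
    """Hypothetical stand-in for the `_mega` memo; not the real LayeredIgnore."""

    def __init__(self, build_for_dir):
        # Assumed to be expensive and to depend on the directory only.
        self._build_for_dir = build_for_dir
        self._mega_cache = {}
        self.builds = 0  # counts cache misses, for illustration

    def mega(self, rel_path):
        # Key by the file's project-relative directory, not the file itself.
        key = PurePosixPath(rel_path).parent.as_posix()
        if key not in self._mega_cache:
            self.builds += 1
            self._mega_cache[key] = self._build_for_dir(key)
        return self._mega_cache[key]


memo = DirMemo(lambda d: ("rules-for", d))
for rel in ["src/a/Foo.java", "src/a/Bar.java", "src/b/Baz.java"]:
    memo.mega(rel)
print(memo.builds)
```

The per-file match against the cached result would still run for every file; only the directory-level merge is shared between siblings.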
+ +**Attach (`@-files`):** + +- `@java_index_flow_lancedb.py` (`ContextKey` defs `:60-72`, `coco_lifespan` provide sites `:287-306`, `process_java_file:345`/`process_sql_file:417`/`process_yaml_file:465`) +- `@path_filtering.py` (`LayeredIgnore`, `_mega:334`, `_mega_build_for_rel:193`, `is_ignored:345`, `diagnose_dict:377`) +- `@tests/test_path_filtering.py` (where the two memo unit tests go) +- `@tests/test_lancedb_e2e.py` (HEAVY once-per-flow test) + +**Prompt:** + +```` +You are implementing PR-P3 from `plans/active/PLAN-INIT-INCREMENT-PERF.md`. + +Read the **PR-P3** section in full. Independent of PR-P1/P2. The plan wins. + +KEY FACT: `LayeredIgnore(project_root)` appears at FIVE sites in +java_index_flow_lancedb.py — :177, :351, :423, :471, :569. PR-P3 converts ONLY +the three `process_*_file` sites (:351/:423/:471). The other two (:177 in +`_approximate_vectors_total`, :569 in the app_main pre-walk) call +`cocoindex_excluded_patterns()` ONCE PER RUN — leave them alone. + +## Scope + +1. Define `IGNORE = coco.ContextKey[LayeredIgnore]("java_lance_layered_ignore")` + alongside `PROJECT_ROOT`/`EMBEDDER`/`LANCE_DB` (:60-72), reusing the SAME + `_ck_params` (`detect_change` vs `tracked`) detection block. +2. In `coco_lifespan` (:287-306), add `builder.provide(IGNORE, LayeredIgnore(root))` + — built ONCE per flow run. +3. In `process_java_file`/`process_sql_file`/`process_yaml_file`: add + `ignore = coco.use_context(IGNORE)` and replace + `LayeredIgnore(project_root).is_ignored((project_root / file.file_path.path).resolve())` + with `ignore.is_ignored((project_root / file.file_path.path).resolve())`. + Keep `project_root`. DO NOT touch :177 or :569. +4. In `path_filtering.py` `LayeredIgnore`: add `self._mega_cache` in `__init__` + and memoize `_mega(rel)` keyed by `Path(rel_project).parent.as_posix()` + (`_mega_build_for_rel` reads only `dir_parts`, so this is correct). +5. Add the three named tests in the right files. 
+ +## Out of scope (do NOT touch) + +- `build_ast_graph.py` and the graph write path (PR-P1/P2). +- The ignore *decision* logic (`_mega_build_for_rel`, `_winning_row`, negation + scanning) — only memoize. +- Sites :177 and :569. +- Any schema/ontology/re-index change. Loosening any existing test. + +If you find yourself wanting to touch any of the above, **stop and ask**. + +## Deliverables + +1. `IGNORE` ContextKey (version-detected) + provided once in `coco_lifespan`. +2. The three `process_*_file` consume it; :177/:569 untouched. +3. `_mega` memoized by directory in `LayeredIgnore`. +4. Three tests: two in `tests/test_path_filtering.py`, one (HEAVY) in `tests/test_lancedb_e2e.py`. + +## Tests + +Run, all must pass: +``` +.venv/bin/python -m pytest tests/test_path_filtering.py tests/test_lancedb_e2e.py -q +.venv/bin/python -m pytest tests -q -k "ignore or path_filter or vectors_progress" +.venv/bin/ruff check . +``` +Heavy (only if you can run cocoindex e2e locally): +``` +JAVA_CODEBASE_RAG_RUN_HEAVY=1 .venv/bin/python -m pytest tests/test_lancedb_e2e.py -q +``` + +Sentinel greps — **must return zero** (matches ONLY the 3 process sites; :177/:569 use the bare constructor + `cocoindex_excluded_patterns`, not `.is_ignored`): +``` +grep -nE "LayeredIgnore\(project_root\)\.is_ignored" java_index_flow_lancedb.py +``` + +Sentinel greps — **must be non-zero**: +``` +grep -n "coco.use_context(IGNORE)" java_index_flow_lancedb.py # 3 sites +grep -n "_mega_cache" path_filtering.py # memo present +``` + +## Manual evidence (paste in PR description) + +With `JAVA_CODEBASE_RAG_RUN_HEAVY=1`, run the flow over a small corpus and log +`id(ignore)` per file (temporary instrumentation) — confirm a single object id +across all files. Then micro-benchmark `is_ignored` over N files: same-directory +files hit the `_mega` cache. + +## Definition of Done + +- [ ] `IGNORE` ContextKey (version-detected) + provided once in `coco_lifespan`. 
+- [ ] The three `process_*_file` consume it; :177/:569 untouched. +- [ ] `_mega` memoized by directory; `is_ignored`/`diagnose_dict` results unchanged. +- [ ] `test_is_ignored_mega_caches_by_directory`, `test_layered_ignore_memo_preserves_decisions` (in `tests/test_path_filtering.py`), `test_layered_ignore_provided_once_per_flow` (in `tests/test_lancedb_e2e.py`, HEAVY) pass. +- [ ] Existing ignore + vectors-progress tests pass unchanged. +- [ ] Sentinel greps pass. +- [ ] `.venv/bin/ruff check .` clean. +- [ ] PR title: `perf(vectors): lifespan-cached LayeredIgnore + is_ignored memo (PR-P3)`. +```` + +--- + +## Notes for the orchestrator + +- **Landing order:** PR-P1 → PR-P2 (P2 needs `_bulk_copy` + `_REL_*_COLUMNS`). + PR-P3 is independent (can go first, last, or between). +- **Shared-helper awareness:** `_write_edges`, `_write_routes_and_exposes`, + `_write_nodes_impl`, `_write_meta` are each called by BOTH paths. Converting + one accelerates both — so `test_incremental_graph.py` is a binding gate for + PR-P1 and PR-P2, not just PR-P2. +- **Review between PRs** (`superpowers:requesting-code-review`): the equivalence + harness gates P1/P2; the memo-parity test gates P3. +- **Sentinel greps are binding:** a non-empty "must return zero" grep = scope + leak; an empty "must be non-zero" grep = over-deletion. Either blocks merge. diff --git a/plans/active/PLAN-INIT-INCREMENT-PERF.md b/plans/active/PLAN-INIT-INCREMENT-PERF.md new file mode 100644 index 00000000..b6cff842 --- /dev/null +++ b/plans/active/PLAN-INIT-INCREMENT-PERF.md @@ -0,0 +1,406 @@ +# Plan: Faster init/increment — bulk graph writes + cached ignore + +Status: **active (planning)**. This plan implements +[`propose/active/INIT-INCREMENT-PERF-PROPOSE.md`](../../propose/active/INIT-INCREMENT-PERF-PROPOSE.md). +The proposal lands via PR #338; this plan lands via PR #339 and **stacks behind +#338** (the proposal file is on #338's branch until it merges). 
The staging +invariants are inlined below so this plan is self-contained if #338 has not +merged yet. + +Depends on: PR #338 (proposal) — non-blocking for reading this plan (key +invariants inlined), but the proposal should merge first. + +> **Revised after a 5-lens subagent review (PR #339 thread).** The original +> draft assumed `_write_edges` / `_write_routes_and_exposes` / `_write_meta` +> were full-rebuild-only and split PRs by *path*. They are **shared by both +> paths** (verified: `_write_edges` `build_ast_graph.py:3244` is called at +> `:3926` full + `:805` incremental; `_write_routes_and_exposes` `:3338` at +> `:3930` full + `:811` incremental; `_write_meta` `:3421` at `:3933` full + +> `:3733` incremental). PRs are now split by **write-function**, not by path, +> and GraphMeta is left on MERGE (not bulk-converted). + +## Goal + +- Cut `init` / `reprocess` / `increment` graph-write wall-clock by replacing + per-row `conn.execute` writes with bulk `COPY FROM` — the ~81% init lever. +- Remove the ~25s `LayeredIgnore` re-construction + `is_ignored` re-merge paid + per file in the cocoindex vectors phase. +- Preserve graph contents exactly: no ontology bump, no re-index, no + query-result change (proven by an equivalence harness). + +## Principles (do not relitigate in review) + +- **Byte-equivalent graph.** Every PR changes only the write mechanism; the + graph (node/edge rows, properties, `GraphMeta` counters) must be identical to + today. The equivalence harness + the full `test_incremental_graph.py` suite + are the merge gate. +- **Split by write-function, not by path.** The graph write helpers are shared + between the full path (`write_ladybug:3893`) and the incremental path + (`incremental_rebuild:3535`): `_write_edges`, `_write_routes_and_exposes`, + `_write_nodes_impl`, `_write_meta` are each called by BOTH. Converting a + shared helper accelerates **both** paths at once — there is no "full-only" + conversion for edges/routes. 
`_write_clients_producers_and_calls:3810` is + **incremental-only** (the global pass5/6 step; sole caller `:3716`). +- **GraphMeta stays on MERGE.** `_write_meta:3421` is shared and recomputes + counters before a single `MERGE (m:GraphMeta {key:$k})` (`:3472`). It is one + row and not worth the risk — **do not bulk-convert it**. (This reverses the + proposal's Open Q1 recommendation.) +- **In-memory pyarrow `COPY FROM`** is the bulk mechanism (verified: the + `ladybug` wrapper forwards `COPY FROM` and accepts a pyarrow param — + `ladybug/connection.py:337`/`:488`). Parquet-file is a fallback, not the + default. Do not propose CSV. +- **MPS device default is out of scope** — the flow already auto-selects MPS. +- **No new env vars, CLI flags, or public surface.** No compatibility shims. + +## PR breakdown - overview + +| PR | Scope (write-functions converted) | Ontology bump | Areas of concern | Test buckets | Independent of | +| --- | --- | --- | --- | --- | --- | +| **PR-P1** | Add `_bulk_copy`; convert **`_write_edges`** (shared → both paths' Symbol→Symbol edges + UnresolvedCallSite/UNRESOLVED_AT) | none | REL-table `COPY FROM` column order (FROM/TO first); CALLS dedup + `callee_declaring_role` materialized at staging; UnresolvedCallSite loaded before UNRESOLVED_AT | equivalence + determinism + baseline; full `test_ast_graph_build.py` **and** `test_incremental_graph.py` (shared helper) | — | +| **PR-P2** | Convert **`_write_nodes_impl`** (shared nodes), **`_write_routes_and_exposes`** (shared routes/clients/producers/calls), **`_write_clients_producers_and_calls`** (incremental-only global; Route MERGE preserved) | none | Route/Client/Producer nodes loaded before EXPOSES/DECLARES_*/HTTP_CALLS/ASYNC_CALLS; the 6 client/producer `_CREATE_*` constants are shared with the incremental-only function — delete only after BOTH convert; pass5/6 Route MERGE retained | `test_incremental_graph.py` regression + new incremental-equivalence test | depends on **PR-P1** 
(`_bulk_copy`) | +| **PR-P3** | Hoist `LayeredIgnore` to a cocoindex `ContextKey` + memoize `is_ignored`'s `_mega` by directory | none | `ContextKey` lifespan scoping; `_mega` memo correctness (mega depends on directory only); leave the once-per-run sites (`:177`, `:569`) alone; no change to any ignore *decision* | `tests/test_path_filtering.py` memo tests + `tests/test_lancedb_e2e.py` (HEAVY) once-per-flow | independent of PR-P1, PR-P2 | + +Landing order: **P1 → P2**; **P3** may land in any order (independent). + +## Resolved design decisions + +| Topic | Decision | +| --- | --- | +| Bulk mechanism | In-memory pyarrow: `conn.execute(f"COPY {table_name} FROM $rows", {"rows": pa_table})`. Verified `ladybug/connection.py:337` (`FROM $param`) + `:488` (pyarrow accepted). | +| REL-table column rule | First two staged columns are the FROM/TO node primary keys. Exact column naming for REL `COPY FROM` locked by the PR-P1 step-1 spike. | +| GraphMeta (`_write_meta`) | **Leave on MERGE.** Shared helper, one row, recomputes counters (`build_ast_graph.py:3421`). Reverses proposal Open Q1. | +| `_write_nodes_impl` (shared workhorse) | Converted in PR-P2; both `_write_nodes` (full, `_CREATE_SYMBOL`) and `_write_nodes_merge` (incremental, `_MERGE_SYMBOL`) call it, so converting it once kills both constants. | +| PR-P3 cache vehicle | cocoindex `ContextKey[LayeredIgnore]` (lifespan-scoped, built once). | +| `is_ignored` memo | Cache `_mega(rel)` → `(mega, spec, meta)` keyed by the file's project-relative **directory** (`Path(rel).parent.as_posix()`); `_mega_build_for_rel` reads only `dir_parts = parts[:-1]` (`path_filtering.py:226-227`), so this is correct. `spec.match_file(rel)` stays per-file. | +| Sites `:177` / `:569` | Left alone — both call `cocoindex_excluded_patterns()` **once per run** (the `_approximate_vectors_total` helper and the app_main pre-walk), not per-file. 
| + +--- + +# PR-P1 — Bulk `COPY FROM` for `_write_edges` (shared; the ~250s prize) + +**Goal:** add the bulk primitive and convert the shared `_write_edges` helper. +Because `_write_edges` is called by both paths, this accelerates the +Symbol→Symbol edges + UnresolvedCallSite writes for **both** `init` and +`increment` (the largest graph-write cost: ~250s of ~321s). + +## File-by-file changes + +### 1. `build_ast_graph.py` — `_bulk_copy` primitive + `_write_edges` conversion + +#### 1a. New helper `_bulk_copy` (add near `_node_row`, ~`:2994`) +```python +import pyarrow as pa + +def _bulk_copy(conn, table_name, columns, rows): + """Bulk-load rows into a node/rel table via in-memory pyarrow COPY FROM. + + `columns` fixes column order; for REL tables the first two MUST be the + FROM/TO node primary keys (kuzu requirement). Empty `rows` is a no-op. + """ + if not rows: + return + # Reorder to `columns`; from_pylist alone takes its order from the row dicts. + tbl = pa.Table.from_pylist(rows).select(columns) + conn.execute(f"COPY {table_name} FROM $rows", {"rows": tbl}) +``` +Column-order constants next to the `_SCHEMA_*` strings (`~:2812`), each matching +its `_SCHEMA_*` order; for REL tables the first two entries are the endpoint ids: +`_REL_EXTENDS_COLUMNS = ["FROM","TO","source_file","dst_name","dst_fqn","resolved"]`, +`_REL_CALLS_COLUMNS = ["FROM","TO","source_file","callee_declaring_role", <…resolved props>]`, +`UNRESOLVED_CALL_SITE_COLUMNS`, `_REL_UNRESOLVED_AT_COLUMNS = ["FROM","TO","source_file"]`, etc. + +> **Step-1 spike (mandatory, first commit of PR-P1):** confirm (a) the exact REL +> `COPY FROM` column naming — toy 2-Symbol + 1-CALLS `COPY CALLS FROM $rows` +> with `["FROM","TO",…]`, assert the edge lands with correct endpoints and +> `callee_declaring_role`; (b) `pa.Table.from_pylist` type inference for any LIST +> columns this helper touches. Record the working incantation in the docstring. 
+> (On kuzu 0.11.3 + the repo's pybind backend, all-empty LIST columns infer as +> `list` and are accepted — confirmed by review — so an explicit pa.schema +> is a fallback, not required.) + +#### 1b. Convert `_write_edges` (`:3244`, shared) to per-type staging + bulk +Today it loops `conn.execute(_CREATE_EXT, {…})` etc. with two dedup sets — +`seen_calls` (`:3282-3288`, key `(src_id,dst_id,arg_count,call_site_line)` — +verified) and `seen_ucs` (`:3317-3321`) — and a `callee_declaring_role` lookup +(`_callee_declaring_role_at_write`, `:1647`/`:3302`). Restructure to accumulate +per-edge-type row lists, applying the **same** dedup and **same** +`callee_declaring_role` materialization **before appending**, then `_bulk_copy` +each REL table once. REL row dicts get `FROM`/`TO` = src/dst node ids plus the +properties in `_SCHEMA_*` order. + +**Within-helper load order (kuzu validates endpoint existence):** Symbol nodes +are already loaded by `_write_nodes` (called before `_write_edges` in both +paths). So bulk-load the **UnresolvedCallSite node rows before the UNRESOLVED_AT +edge rows** (UNRESOLVED_AT is Symbol→UnresolvedCallSite). The Symbol→Symbol edges +(EXTENDS/IMPLEMENTS/INJECTS/DECLARES/OVERRIDES/CALLS) reference only already-loaded +Symbol nodes. + +`_CREATE_EXT`/`_CREATE_IMPL`/`_CREATE_INJ`/`_CREATE_DECL`/`_CREATE_OVERRIDES`/ +`_CREATE_CALL` (module constants) become dead → delete. `_CREATE_UNRESOLVED` and +`_CREATE_UNRESOLVED_AT` are **locals** defined inside `_write_edges` (`:3307`/`:3313`), +removed when the function is rewritten to bulk. `_populate_declares_rows` +(`:3189`) / `_populate_overrides_rows` (`:3206`) are pure in-memory population — +unchanged. + +### 2. `tests/test_ast_graph_build.py` — equivalence harness + baseline +The existing 26 tests are the regression net (e.g. 
+`test_schema_has_all_expected_tables`, `test_each_edge_type_populated`, +`test_pass3_callee_declaring_role_bank_annotated_types`, +`test_pass3_unresolved_call_site_emitted`, +`test_pass3_known_external_calls_preserved`). Add a committed baseline +`tests/fixtures/graph_baseline_bank_chat.json` (node count, per-type edge counts, +`GraphMeta` counters, and N=3 sampled edge property rows per type incl. +`source_file` and CALLS `callee_declaring_role`). **It is an equivalence anchor, +not a production invariant** — generated from the last per-row build before +removal, and regenerated only when `ontology_version` changes (it does not in +this plan). Asserting invariants here is acceptable because PR-P1 is a +behavior-preserving write-mechanism swap. + +### 3. `scripts/bench_init_graph_write.py` (new, dev-only) — benchmark +Times `init`/`build_ast_graph.py` on a medium corpus; prints the graph-write +phase delta. Not packaged; its output documents the measured speedup in the PR description. + +## Tests for PR-P1 + +1. `test_bulk_write_edges_match_per_row_baseline` — build `tests/bank-chat-system` + via the bulk path, assert node count, per-type edge counts, `GraphMeta` + counters, and sampled edge rows equal `graph_baseline_bank_chat.json`. +2. `test_bulk_write_is_deterministic_double_build` — build bank-chat twice to two + DBs via the bulk path, assert identical counts + query battery. Modeled on + `tests/test_brownfield_routes.py::test_29_determinism_pass4_route_ids` and + `tests/test_mcp_v2_compose.py::test_overrides_edge_set_deterministic_double_build`. +3. `test_bulk_write_preserves_calls_dedup_and_callee_declaring_role` — CALLS rows + deduped by `(src,dst,argc,line)`; carry correct `callee_declaring_role` + (reuse the `@Service` callee assertion against a bulk build). +4. `test_bulk_write_empty_rel_table_is_noop` — a corpus with no `EXTENDS` edges + must not error (`_bulk_copy` no-ops on empty rows). 
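The dedup-at-staging behavior that tests 3 and 4 exercise can be sketched as below. This is an assumption-laden illustration: `stage_calls_rows`, the edge dict keys, the `role_of` mapping, the `"SERVICE"` role value, and the empty-string default are all hypothetical and do not mirror the repo's actual types.

```python
def stage_calls_rows(call_edges, role_of):
    """Stage CALLS rows, dropping duplicates before they are appended.

    Hypothetical sketch: `call_edges` is an iterable of dicts and `role_of`
    maps a callee id to its declaring role.
    """
    seen_calls = set()
    rows = []
    for e in call_edges:
        # Same dedup key the plan cites: (src, dst, arg count, call-site line).
        key = (e["src_id"], e["dst_id"], e["arg_count"], e["call_site_line"])
        if key in seen_calls:
            continue
        seen_calls.add(key)
        rows.append({
            "FROM": e["src_id"],
            "TO": e["dst_id"],
            "source_file": e["source_file"],
            # Role is resolved at staging time, before the row is appended.
            "callee_declaring_role": role_of.get(e["dst_id"], ""),
        })
    return rows


edge = {"src_id": "A.m", "dst_id": "B.n", "arg_count": 1,
        "call_site_line": 10, "source_file": "A.java"}
edges = [edge, dict(edge), dict(edge, call_site_line=12)]
rows = stage_calls_rows(edges, {"B.n": "SERVICE"})
print(len(rows))
```

An empty input yields an empty list, which is the case a no-op bulk load has to tolerate.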
+ +**Must-still-pass (regression — `_write_edges` is shared, so both paths):** full +`tests/test_ast_graph_build.py`, `tests/test_incremental_graph.py` (28 tests), +`tests/test_bank_chat_brownfield_integration.py`, `tests/test_call_edges_e2e.py`. + +## Definition of done (PR-P1) + +- [ ] `_bulk_copy` helper added; step-1 spike result in its docstring. +- [ ] `_write_edges` stages per-type rows (CALLS dedup + `callee_declaring_role` at staging) and bulk-loads UnresolvedCallSite before UNRESOLVED_AT. +- [ ] `_CREATE_EXT/IMPL/INJ/DECL/OVERRIDES/CALL` deleted; local `_CREATE_UNRESOLVED/_UNRESOLVED_AT` gone with the rewrite. +- [ ] `test_bulk_write_edges_match_per_row_baseline`, + `test_bulk_write_is_deterministic_double_build`, + `test_bulk_write_preserves_calls_dedup_and_callee_declaring_role`, + `test_bulk_write_empty_rel_table_is_noop` pass. +- [ ] Full `test_ast_graph_build.py` + `test_incremental_graph.py` pass unchanged. +- [ ] `.venv/bin/ruff check .` clean. +- [ ] Sentinel greps pass (see AGENT-PROMPTS); benchmark numbers in PR description. +- [ ] PR title: `perf(graph): bulk COPY FROM for _write_edges (PR-P1)`. 
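A baseline comparison of the kind `test_bulk_write_edges_match_per_row_baseline` performs might look like the sketch below. The JSON layout and the `diff_against_baseline` helper are assumptions for illustration; the committed fixture's real schema is not shown in this plan, and the numbers are made up.

```python
import json


def diff_against_baseline(baseline, observed):
    """Return {key: (expected, actual)} for every top-level key that differs.

    Hypothetical helper; an empty dict means the two snapshots are equal.
    """
    mismatches = {}
    for key in sorted(set(baseline) | set(observed)):
        if baseline.get(key) != observed.get(key):
            mismatches[key] = (baseline.get(key), observed.get(key))
    return mismatches


# Illustrative snapshot only; not the real fixture contents.
baseline = json.loads(
    '{"node_count": 3, "edge_counts": {"CALLS": 2, "EXTENDS": 0}}'
)
same = {"node_count": 3, "edge_counts": {"CALLS": 2, "EXTENDS": 0}}
drifted = {"node_count": 3, "edge_counts": {"CALLS": 1, "EXTENDS": 0}}

print(diff_against_baseline(baseline, same))
print(diff_against_baseline(baseline, drifted))
```

Returning the mismatching keys, rather than a bare boolean, keeps a failing run readable when only one edge type drifts.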
+ +## Implementation step list + +| # | Step | File(s) | Done when | +| --- | --- | --- | --- | +| 1 | Spike: REL `COPY FROM` column order + `from_pylist` typing | `build_ast_graph.py` (throwaway) | toy CALLS edge lands with correct endpoints; result in `_bulk_copy` docstring | +| 2 | Add `_bulk_copy` + `_REL_*_COLUMNS`/column constants | `build_ast_graph.py` | helper + constants defined | +| 3 | Convert `_write_edges` to per-type staging + bulk; load UnresolvedCallSite before UNRESOLVED_AT | `build_ast_graph.py` | edge counts match baseline; dedup + callee_declaring_role preserved | +| 4 | Delete dead `_CREATE_EXT/IMPL/INJ/DECL/OVERRIDES/CALL` + locals | `build_ast_graph.py` | sentinel greps pass | +| 5 | Generate + commit baseline | `tests/fixtures/graph_baseline_bank_chat.json` | from last per-row `_write_edges` build | +| 6 | Add 4 tests | `tests/test_ast_graph_build.py` | all pass | +| 7 | Full regression (ast_graph_build + incremental) + ruff; benchmark | repo | green; numbers in PR description | + +--- + +# PR-P2 — Bulk write for nodes + routes/clients/producers/calls + +**Goal:** convert the remaining graph writes. **Depends on PR-P1's `_bulk_copy`.** + +## File-by-file changes + +### 1. `build_ast_graph.py` + +#### 1a. Convert `_write_nodes_impl` (`:3029`, shared workhorse) +It is called by `_write_nodes` (`:3103`, full, `_CREATE_SYMBOL`) and +`_write_nodes_merge` (`:825`, incremental, `_MERGE_SYMBOL`). Convert its body to +stage all Symbol rows (packages/files/types/members, with the existing +`resolve_role_and_capabilities` + `type_role_by_node_id` population done before +staging) then `_bulk_copy(conn, "Symbol", NODE_COLUMNS, rows)`. Both wrappers now +share one bulk path; `_CREATE_SYMBOL` and `_MERGE_SYMBOL` become dead → delete +both. (`_write_nodes_impl`'s per-row loop is gone; the two wrappers may collapse +to one, but keep both names if they differ in non-write setup — minimize churn.) 
+> Note: this removes the ~40-line node-row-building duplication that a
+> full-path-only conversion would have created — converting the shared workhorse
+> avoids it entirely.
+
+#### 1b. Convert `_write_routes_and_exposes` (`:3338`, shared) to bulk
+Today it loops over `routes_rows`/`exposes_rows`/`client_rows`/`declares_client_rows`/
+`producer_rows`/`declares_producer_rows`/`http_call_rows`/`async_call_rows`, calling
+`_CREATE_ROUTE`/`_CREATE_EXPOSES`/`_CREATE_CLIENT`/`_CREATE_DECLARES_CLIENT`/
+`_CREATE_PRODUCER`/`_CREATE_DECLARES_PRODUCER`/`_CREATE_HTTP_CALL`/`_CREATE_ASYNC_CALL`
+with the existing `_file_by_node_id`/`_file_by_client_id`/`_file_by_producer_id`
+source_file resolution. Stage each table's rows (applying that resolution) and
+`_bulk_copy`. **Load order within the helper:** bulk-load Route/Client/Producer
+NODE rows before the EXPOSES/DECLARES_CLIENT/DECLARES_PRODUCER/HTTP_CALLS/
+ASYNC_CALLS edges (those edges reference those nodes + already-loaded Symbol).
+`_CREATE_ROUTE`/`_CREATE_EXPOSES` become dead → delete.
+
+#### 1c. Convert `_write_clients_producers_and_calls` (`:3810`, incremental-only) to bulk
+This is the global pass5/6 step (sole caller `incremental_rebuild:3716`). It
+writes Route (via `MERGE (r:Route {id:$id}) …` to dedup against the scoped step,
+`:3819-3828`), Client/Producer nodes (`_CREATE_CLIENT`/`_CREATE_PRODUCER`), and
+DECLARES_CLIENT/DECLARES_PRODUCER/HTTP_CALLS/ASYNC_CALLS edges. Convert the
+Client/Producer/edge writes to per-type staging + `_bulk_copy` (same
+`member_by_id`/`client_by_id`/`producer_by_id` source_file resolution).
+**Retain the `MERGE (r:Route …)` dedup verbatim** — routes written during the
+scoped step must not duplicate; add a one-line comment that it is intentionally
+kept. (Route upsert stays MERGE; only the Client/Producer/edge writes bulk.)
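The split described in 1c (Route stays on per-row MERGE, Client/Producer nodes bulk-load next, edges bulk-load last) can be pictured as an ordered write plan. The sketch below only builds that plan as data. The function name `plan_global_writes`, the `(mode, table, payload)` tuple shape, and the input dictionaries are hypothetical. The table names come from the section above.

```python
EDGE_TABLES = ("DECLARES_CLIENT", "DECLARES_PRODUCER", "HTTP_CALLS", "ASYNC_CALLS")


def plan_global_writes(route_rows, node_rows, edge_rows):
    """Build an ordered write plan for the global pass5/6 step.

    Returns (mode, table, payload) tuples. "merge" entries are per-row
    upserts; "bulk" entries carry every staged row for one table.
    """
    plan = []
    # Routes stay on per-row MERGE so rows already written by the scoped
    # step are not duplicated (intentionally not converted to bulk).
    for row in route_rows:
        plan.append(("merge", "Route", row))
    # Node tables load before the edges that reference them.
    for table in ("Client", "Producer"):
        rows = node_rows.get(table, [])
        if rows:
            plan.append(("bulk", table, rows))
    for table in EDGE_TABLES:
        rows = edge_rows.get(table, [])
        if rows:
            plan.append(("bulk", table, rows))
    return plan


example_plan = plan_global_writes(
    route_rows=[{"id": "r1"}],
    node_rows={"Client": [{"id": "c1"}]},
    edge_rows={
        "HTTP_CALLS": [{"from": "m1", "to": "r1"}],
        "DECLARES_CLIENT": [{"from": "m1", "to": "c1"}],
    },
)
```

Keeping the plan as plain data makes the ordering easy to assert in a unit test without touching the database.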
+Now that BOTH `_write_routes_and_exposes` and `_write_clients_producers_and_calls`
+are converted, the 6 shared constants — `_CREATE_CLIENT`/`_CREATE_PRODUCER`/
+`_CREATE_DECLARES_CLIENT`/`_CREATE_DECLARES_PRODUCER`/`_CREATE_HTTP_CALL`/
+`_CREATE_ASYNC_CALL` — are dead → delete all six.
+
+#### 1d. `_write_meta` — UNCHANGED
+Leave `_write_meta` (`:3421`) on its `MERGE (m:GraphMeta …)`. Do not touch.
+
+### 2. `tests/test_incremental_graph.py` — incremental equivalence
+Add the new tests as **methods of `TestIncrementalOrchestrator`** (same class as
+`test_incremental_single_file_change`, `:230`-ish). The existing 28 tests must
+pass unchanged.
+
+## Tests for PR-P2
+
+1. `test_incremental_bulk_write_equivalent_to_full_rebuild` (in
+   `TestIncrementalOrchestrator`) — single-file change → `increment` (bulk) →
+   full rebuild of that state (bulk) → assert identical node count, per-type edge
+   counts, `GraphMeta` counters.
+2. `test_incremental_route_merge_dedup_preserved` (in
+   `TestIncrementalOrchestrator`) — a corpus where pass5/6 re-emits an existing
+   route → no duplicate `Route` rows after `increment` (the retained MERGE
+   dedups).
+
+**Must-still-pass:** full `tests/test_incremental_graph.py`, `test_ast_graph_build.py`.
+
+## Definition of done (PR-P2)
+
+- [ ] `_write_nodes_impl` bulk; `_CREATE_SYMBOL` + `_MERGE_SYMBOL` deleted.
+- [ ] `_write_routes_and_exposes` bulk; `_CREATE_ROUTE`/`_CREATE_EXPOSES` deleted.
+- [ ] `_write_clients_producers_and_calls` Client/Producer/edge writes bulk; `MERGE (r:Route)` retained + commented; 6 shared `_CREATE_*` deleted.
+- [ ] `_write_meta` untouched.
+- [ ] Both new tests pass (in `TestIncrementalOrchestrator`); full incremental + ast_graph_build suites green.
+- [ ] Sentinel greps pass; `.venv/bin/ruff check .` clean.
+- [ ] PR title: `perf(graph): bulk COPY FROM for nodes, routes, clients/producers (PR-P2)`.
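The equivalence check behind `test_incremental_bulk_write_equivalent_to_full_rebuild` compares node count, per-type edge counts, and `GraphMeta` counters between the two builds. A minimal sketch of that comparison follows, assuming both graphs have already been read back into plain Python lists and dicts. The helper names `summarize_graph` and `summary_mismatches` and the row shapes are hypothetical, not part of the existing test helpers.

```python
from collections import Counter


def summarize_graph(nodes, edges, meta):
    """Reduce a graph to the three things the equivalence test compares."""
    return {
        "node_count": len(nodes),
        "edge_counts": dict(Counter(edge["type"] for edge in edges)),
        "meta": dict(meta),
    }


def summary_mismatches(incremental, full):
    """Return the summary keys whose values differ, sorted for stable output."""
    keys = set(incremental) | set(full)
    return sorted(k for k in keys if incremental.get(k) != full.get(k))


# Toy data standing in for an incremental build and a full rebuild.
inc = summarize_graph(
    nodes=[{"id": "a"}, {"id": "b"}],
    edges=[{"type": "CALLS"}, {"type": "CALLS"}],
    meta={"symbols": 2},
)
full = summarize_graph(
    nodes=[{"id": "a"}, {"id": "b"}],
    edges=[{"type": "CALLS"}],
    meta={"symbols": 2},
)
mismatches = summary_mismatches(inc, full)
```

Reporting which keys differ, rather than asserting on the whole summary at once, tends to give a more readable failure when a single edge type drifts.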
+
+## Implementation step list
+
+| # | Step | File(s) | Done when |
+| --- | --- | --- | --- |
+| 1 | Convert `_write_nodes_impl` to bulk; delete `_CREATE_SYMBOL`+`_MERGE_SYMBOL` | `build_ast_graph.py` | node counts match baseline; both wrappers share bulk |
+| 2 | Convert `_write_routes_and_exposes` to bulk; load Route/Client/Producer before edges; delete `_CREATE_ROUTE`/`_CREATE_EXPOSES` | `build_ast_graph.py` | route/client/producer/call counts match |
+| 3 | Convert `_write_clients_producers_and_calls` Client/Producer/edges to bulk; retain Route MERGE; delete 6 shared `_CREATE_*` | `build_ast_graph.py` | no duplicate routes; sentinel greps pass |
+| 4 | Add 2 tests to `TestIncrementalOrchestrator` | `tests/test_incremental_graph.py` | both pass; full suite green |
+| 5 | ruff + full regression | repo | clean + green |
+
+---
+
+# PR-P3 — Cached `LayeredIgnore` (+ `is_ignored` memo) as a `ContextKey`
+
+**Goal:** remove the ~25s from re-constructing `LayeredIgnore(project_root)` per
+file and re-merging `_mega` per file. **Independent** of PR-P1/P2.
+
+## File-by-file changes
+
+### 1. `java_index_flow_lancedb.py`
+- Define `IGNORE = coco.ContextKey[LayeredIgnore]("java_lance_layered_ignore")`
+  alongside `PROJECT_ROOT`/`EMBEDDER`/`LANCE_DB` (`:60-72`), reusing the SAME
+  `_ck_params` (`detect_change` vs `tracked`) detection block.
+- In `coco_lifespan` (`:287-306`), add `builder.provide(IGNORE, LayeredIgnore(root))`
+  — built **once** per flow run.
+- In `process_java_file` (`:345`), `process_sql_file` (`:417`), `process_yaml_file`
+  (`:465`): add `ignore = coco.use_context(IGNORE)` and replace
+  `LayeredIgnore(project_root).is_ignored((project_root / file.file_path.path).resolve())`
+  (`:351`/`:423`/`:471`) with `ignore.is_ignored((project_root / file.file_path.path).resolve())`.
+  Keep `project_root` (still used for path resolution + `_parse_and_enrich_java`).
+- **Leave `:177` and `:569` alone** — they call `cocoindex_excluded_patterns()`
+  once per run (the `_approximate_vectors_total` helper and the app_main
+  pre-walk), not per file.
+
+### 2. `path_filtering.py`
+- Add `self._mega_cache: dict[str, tuple[list[str], GitIgnoreSpec, list]] = {}`
+  in `LayeredIgnore.__init__`.
+- In `_mega` (`:334`): key on `Path(rel_project).parent.as_posix()`; return cached
+  `(mega, spec, meta)` if present, else compute via `_mega_build_for_rel` +
+  `GitIgnoreSpec.from_lines`, store, return. Correctness rests on
+  `_mega_build_for_rel` reading only `dir_parts` (`:226-227`).
+- `is_ignored` (`:345`) and `diagnose_dict` (`:377`) call `_mega` unchanged and
+  benefit transparently (both consume the full `(mega, spec, meta)` tuple).
+
+## Tests for PR-P3
+
+1. `test_is_ignored_mega_caches_by_directory` — in `tests/test_path_filtering.py`;
+   assert `_mega` is computed once per directory (spy on `_mega_build_for_rel`)
+   and decisions match the uncached path.
+2. `test_layered_ignore_memo_preserves_decisions` — in `tests/test_path_filtering.py`;
+   for a corpus with nested ignore + gitignore negations, assert `is_ignored` is
+   identical with and without the cache.
+3. `test_layered_ignore_provided_once_per_flow` — in `tests/test_lancedb_e2e.py`
+   (HEAVY, `JAVA_CODEBASE_RAG_RUN_HEAVY=1`); run the real flow, assert a single
+   `LayeredIgnore` instance (identity check), not per-file.
+
+**Must-still-pass:** `tests/test_lancedb_e2e.py::test_lancedb_ignore_file_reduces_indexed_java_files`,
+`tests/test_path_filtering.py`, and the heavy
+`tests/test_vectors_progress.py::test_flow_emits_vectors_progress_per_file`.
+
+## Definition of done (PR-P3)
+
+- [ ] `IGNORE` ContextKey (version-detected) + provided once in `coco_lifespan`.
+- [ ] The three `process_*_file` consume it; sites `:177`/`:569` untouched.
+- [ ] `_mega` memoized by directory; `is_ignored`/`diagnose_dict` results unchanged.
+- [ ] The 3 named tests pass; existing ignore/vectors-progress tests pass unchanged.
+- [ ] `.venv/bin/ruff check .` clean.
+- [ ] PR title: `perf(vectors): lifespan-cached LayeredIgnore + is_ignored memo (PR-P3)`.
+
+## Implementation step list
+
+| # | Step | File(s) | Done when |
+| --- | --- | --- | --- |
+| 1 | Add `IGNORE` ContextKey (version-detected) + provide in lifespan | `java_index_flow_lancedb.py` | resolvable in process_*_file |
+| 2 | Switch the three `process_*_file` to `coco.use_context(IGNORE)` | `java_index_flow_lancedb.py` | only `:351`/`:423`/`:471` changed; `:177`/`:569` untouched |
+| 3 | Add `_mega` dirname cache | `path_filtering.py` | repeated same-dir calls hit cache |
+| 4 | Add memo + correctness tests in `test_path_filtering.py`; once-per-flow in `test_lancedb_e2e.py` | `tests/` | all pass |
+| 5 | ruff + regression (incl. heavy ignore/vectors tests) | repo | clean + green |
+
+---
+
+# Cross-PR risks and mitigations
+
+| # | Risk | Severity | Mitigation |
+| --- | --- | --- | --- |
+| 1 | REL-table `COPY FROM` column order/naming differs from assumption | High | PR-P1 step-1 spike locks it before conversion. |
+| 2 | Shared-helper conversion changes incremental output | High | `_write_edges`/`_write_routes_and_exposes`/`_write_nodes_impl` are shared → full `test_incremental_graph.py` (28 tests) is a merge gate for P1 and P2, plus the new incremental-equivalence test. |
+| 3 | CALLS dedup / `callee_declaring_role` drift when moved to staging | High | `test_bulk_write_preserves_calls_dedup_and_callee_declaring_role` + sampled-edge baseline. |
+| 4 | Deleting a `_CREATE_*` constant still used by the other path | High | PR-P2 deletes the 6 client/producer constants only AFTER both `_write_routes_and_exposes` and `_write_clients_producers_and_calls` are converted (same PR). Sentinel greps enforce. |
+| 5 | pass5/6 Route dedup breaks | Medium | `MERGE (r:Route)` retained by name; `test_incremental_route_merge_dedup_preserved` guards it. |
+| 6 | PR-P3 `_mega` memo returns stale rules | Low | `_mega_build_for_rel` reads only `dir_parts`; `test_layered_ignore_memo_preserves_decisions` asserts parity. |
+| 7 | PR-P3 sentinel over-matches `:177`/`:569` | Medium | Sentinel is `LayeredIgnore\(project_root\)\.is_ignored` (matches only the 3 process sites), NOT the bare constructor. |
+| 8 | `ContextKey` lifespan differs across cocoindex versions | Low | Reuse the existing `_ck_params` `detect_change`/`tracked` detection block. |
+
+# Out of scope
+
+- MPS embedding default (already auto-selected).
+- GraphMeta bulk conversion (left on MERGE — shared, one row).
+- ANN index (#337) and `watch` mode (#336).
+- Replacing/restructuring the cocoindex flow; changing embedding model/dim.
+- Parallelizing graph analysis passes (pass1–pass6).
+- Parquet-file or CSV bulk paths (pyarrow in-memory only).
+- Converting `:177`/`:569` (once-per-run setup, not hot paths).
+
+# Whole-plan done definition
+
+1. `init`/`reprocess`/`increment` graph-write phase on the medium corpus drops
+   from ~321s to tens of seconds (benchmark in PR-P1; completed in PR-P2).
+2. The vectors phase pays no per-file `LayeredIgnore`/`_mega` cost (PR-P3).
+3. No ontology bump (`ontology_version` stays 17); no re-index required; all
+   existing graph/edge/brownfield/incremental/ignore/vectors tests pass.
+4. Proposal moved to `propose/completed/` and this plan to `plans/completed/`
+   once all three PRs land.
+
+# Tracking
+
+- `PR-P1`: _pending_
+- `PR-P2`: _pending_ (blocked by PR-P1)
+- `PR-P3`: _pending_