diff --git a/plans/AGENT-PROMPTS-INIT-INCREMENT-PERF.md b/plans/AGENT-PROMPTS-INIT-INCREMENT-PERF.md
new file mode 100644
index 00000000..d1c602fb
--- /dev/null
+++ b/plans/AGENT-PROMPTS-INIT-INCREMENT-PERF.md
@@ -0,0 +1,373 @@
+# Agent task prompts — Faster init/increment (PR-P1 → PR-P3)
+
+Status: **active**. One self-contained prompt per PR. Copy the prompt verbatim
+into the agent, attach the files in its `@-files` block, and let it execute.
+
+> **Revised after the PR #339 subagent review.** PRs are split by **write-function**,
+> not by path — the graph write helpers are shared between the full and incremental
+> paths, so converting one accelerates both. Sentinel greps were corrected (the
+> PR-P3 "must return zero" grep previously over-matched once-per-run sites).
+
+**Workflow per PR:**
+
+1. Create the branch named in the prompt off the stated base.
+2. Read the cited plan section in full **before** writing code.
+3. Implement step-by-step; run the listed tests after each step.
+4. Run the sentinel greps — every "must return zero" line must be empty, every
+   "must be non-zero" line must hit.
+5. Paste the manual-evidence output into the PR description.
+6. Open a PR with the exact title in the Definition of Done.
+
+**Universal rules for every prompt:**
+
+- Use only `.venv/bin/python` and `.venv/bin/pip` (never system python/pip).
+- `server.py` is stdio — never write to stdout from anything reachable by a tool handler.
+- Do not add a cocoindex dependency outside `java_index_flow_lancedb.py`.
+- The plan is the source of truth — if this prompt and the plan disagree, the plan wins.
+- Do not touch any file outside the prompt's `@-files` + the test files it names. If you think an adjacent file must change, **stop and ask**.
+- Do not loosen any existing test assertion to make it pass.
+- Breaking changes are allowed; no compatibility shims.
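The sentinel-grep rule in workflow step 4 can be sketched in Python. This is a minimal illustration and not part of the plan: `check_sentinels` and its parameters are hypothetical names, and it uses Python's `re` module instead of `grep`, so the pattern dialect may differ slightly from the shell commands in the prompts.

```python
import re


def check_sentinels(text, must_be_zero, must_be_nonzero):
    """Return a list of failure messages; an empty list means all sentinels pass.

    Hypothetical helper: `text` is the content of the file being checked,
    and both pattern lists hold Python regular expressions.
    """
    lines = text.splitlines()
    failures = []
    # A "must return zero" pattern that still matches signals a scope leak.
    for pattern in must_be_zero:
        hits = [n for n, line in enumerate(lines, start=1) if re.search(pattern, line)]
        if hits:
            failures.append(f"scope leak: {pattern!r} matched line(s) {hits}")
    # A "must be non-zero" pattern that matches nothing signals over-deletion.
    for pattern in must_be_nonzero:
        if not any(re.search(pattern, line) for line in lines):
            failures.append(f"over-deletion: {pattern!r} matched nothing")
    return failures
```

A caller would read the target file, pass the two pattern lists from the relevant prompt, and treat any returned message as a merge blocker.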
+
+---
+
+## PR-P1 — Bulk `COPY FROM` for `_write_edges` (shared; the ~250s prize)
+
+**Branch:** `perf/bulk-graph-writes-p1` off `master`.
+**Base:** `master`.
+**Plan section:** `plans/active/PLAN-INIT-INCREMENT-PERF.md` § PR-P1 (read this first).
+**Estimated diff size:** medium (one module + tests + a fixture).
+
+**Attach (`@-files`):**
+
+- `@build_ast_graph.py` (focus: `_write_edges:3244`, `_node_row:2994`, `_SCHEMA_*:2812-2940`, `_callee_declaring_role_at_write:1647`, `seen_calls:3282`, `seen_ucs:3317`, `_populate_declares_rows:3189`)
+- `@propose/active/INIT-INCREMENT-PERF-PROPOSE.md` (design; on PR #338's branch — if absent, the staging invariants are inlined in the plan §PR-P1)
+- `@tests/test_ast_graph_build.py` (regression net + where new tests go)
+- `@tests/test_incremental_graph.py` (`_write_edges` is shared — incremental regression is binding)
+- `@tests/_builders.py` (graph-build helpers)
+
+**Prompt:**
+
+````
+You are implementing PR-P1 from `plans/active/PLAN-INIT-INCREMENT-PERF.md`.
+
+Read the **PR-P1** section in full. The plan wins if this prompt disagrees.
+
+KEY FACT: `_write_edges` (build_ast_graph.py:3244) is a SHARED helper called by
+BOTH the full path (write_ladybug:3926) and the incremental path (:805). So
+converting it accelerates both — there is no "full-only" edge conversion. PR-P1
+converts ONLY `_write_edges` (+ adds the `_bulk_copy` primitive). Nodes, routes,
+clients/producers, and GraphMeta are PR-P2.
+
+## Scope
+
+1. **Step-1 spike (first commit):** confirm the exact REL `COPY FROM` column
+   naming + `pa.Table.from_pylist` typing with a throwaway 2-Symbol + 1-CALLS
+   toy. Record the working incantation in the `_bulk_copy` docstring.
+2. Add `_bulk_copy(conn, table_name, columns, rows)` + the `_REL_*_COLUMNS` /
+   column constants (REL tables list FROM/TO first; match `_SCHEMA_*` order).
+3. Convert `_write_edges` to per-edge-type row staging (apply the SAME
+   `seen_calls`/`seen_ucs` dedup and SAME `_callee_declaring_role_at_write`
+   lookup at staging, before appending), then `_bulk_copy` each REL table.
+   Bulk-load UnresolvedCallSite NODE rows before UNRESOLVED_AT edges (Symbol
+   nodes are already loaded by `_write_nodes`).
+4. Delete the dead module constants `_CREATE_EXT/IMPL/INJ/DECL/OVERRIDES/CALL`
+   and the local `_CREATE_UNRESOLVED`/`_CREATE_UNRESOLVED_AT` (defined inside
+   `_write_edges` at :3307/:3313 — removed with the rewrite).
+5. Generate + commit `tests/fixtures/graph_baseline_bank_chat.json` from the
+   last per-row `_write_edges` build before removal.
+6. Add the four named tests.
+
+## Out of scope (do NOT touch)
+
+- Node writes (`_write_nodes`, `_write_nodes_impl`, `_write_nodes_merge`,
+  `_CREATE_SYMBOL`, `_MERGE_SYMBOL`) — PR-P2.
+- Routes/clients/producers/calls (`_write_routes_and_exposes`,
+  `_write_clients_producers_and_calls`, and their `_CREATE_*` constants
+  including `_CREATE_CLIENT`/`_CREATE_PRODUCER`/`_CREATE_ROUTE` etc.) — PR-P2.
+- `_write_meta` / GraphMeta — leave the MERGE alone (PR-P2 also leaves it).
+- `java_index_flow_lancedb.py`, `path_filtering.py`, `server.py`.
+- Any schema/ontology/re-index change. CSV or Parquet-file staging.
+
+If you find yourself wanting to touch any of the above, **stop and ask**.
+
+## Deliverables
+
+1. `_bulk_copy` helper + column-order constants.
+2. `_write_edges` stages per-type rows and bulk-loads (CALLS dedup + callee_declaring_role at staging; UnresolvedCallSite before UNRESOLVED_AT).
+3. Dead `_CREATE_EXT/IMPL/INJ/DECL/OVERRIDES/CALL` + locals removed.
+4. `tests/fixtures/graph_baseline_bank_chat.json` committed.
+5. Four new tests in `tests/test_ast_graph_build.py`.
+
+## Tests
+
+Run, all must pass:
+```
+.venv/bin/python -m pytest tests/test_ast_graph_build.py tests/test_incremental_graph.py -v
+.venv/bin/python -m pytest tests/test_bank_chat_brownfield_integration.py tests/test_call_edges_e2e.py -q
+.venv/bin/ruff check .
+```
+
+Sentinel greps — **must return zero**:
+```
+grep -nE "_CREATE_(EXT|IMPL|INJ|DECL|OVERRIDES|CALL|UNRESOLVED|UNRESOLVED_AT)\b" build_ast_graph.py
+```
+
+Sentinel greps — **must be non-zero** (guards against over-deletion; these belong to PR-P2 / are retained):
+```
+grep -n "_MERGE_SYMBOL\b" build_ast_graph.py # node upsert, kept until PR-P2
+grep -n "_CREATE_CLIENT\b" build_ast_graph.py # routes/clients, PR-P2
+grep -n "MERGE (r:Route" build_ast_graph.py # pass5/6 dedup, kept
+grep -n "COPY .*FROM \$rows" build_ast_graph.py # bulk path present
+```
+
+## Manual evidence (paste in PR description)
+
+```bash
+rm -rf /tmp/p1 && .venv/bin/python build_ast_graph.py \
+  --source-root tests/bank-chat-system \
+  --ladybug-path /tmp/p1/code_graph.lbug --verbose
+.venv/bin/java-codebase-rag meta --source-root tests/bank-chat-system --index-dir /tmp/p1
+```
+Expected: meta `counts_json` + node/edge counts identical to a pre-PR per-row
+build (paste both). Note the graph-write phase timing from `JCIRAG_PROGRESS`
+lines vs the pre-PR baseline.
+
+## Definition of Done
+
+- [ ] Step-1 spike result recorded in `_bulk_copy` docstring.
+- [ ] `_write_edges` stages per-type rows (CALLS dedup + callee_declaring_role at staging); UnresolvedCallSite bulk-loaded before UNRESOLVED_AT.
+- [ ] `_CREATE_EXT/IMPL/INJ/DECL/OVERRIDES/CALL` + local `_CREATE_UNRESOLVED/_UNRESOLVED_AT` deleted.
+- [ ] `test_bulk_write_edges_match_per_row_baseline`, `test_bulk_write_is_deterministic_double_build`, `test_bulk_write_preserves_calls_dedup_and_callee_declaring_role`, `test_bulk_write_empty_rel_table_is_noop` pass.
+- [ ] Full `test_ast_graph_build.py` + `test_incremental_graph.py` pass unchanged.
+- [ ] Sentinel greps: zero where required, non-zero where required.
+- [ ] `.venv/bin/ruff check .` clean; benchmark in PR description.
+- [ ] PR title: `perf(graph): bulk COPY FROM for _write_edges (PR-P1)`.
+````
+
+---
+
+## PR-P2 — Bulk write for nodes + routes/clients/producers/calls
+
+**Branch:** `perf/bulk-graph-writes-p2` off PR-P1's branch (or `master` if PR-P1 merged).
+**Base:** PR-P1 merged.
+**Plan section:** `plans/active/PLAN-INIT-INCREMENT-PERF.md` § PR-P2 (read this first).
+**Estimated diff size:** medium.
+
+**Attach (`@-files`):**
+
+- `@build_ast_graph.py` (`_write_nodes_impl:3029`, `_write_nodes:3096`, `_write_nodes_merge:817`, `_CREATE_SYMBOL`, `_MERGE_SYMBOL`, `_write_routes_and_exposes:3338`, `_write_clients_producers_and_calls:3810`, the Route `MERGE (r:Route` at `:3819`, `_write_meta:3421`, `_bulk_copy` + `_REL_*_COLUMNS` from PR-P1)
+- `@tests/test_incremental_graph.py` (regression; add to `TestIncrementalOrchestrator`)
+
+**Prompt:**
+
+````
+You are implementing PR-P2 from `plans/active/PLAN-INIT-INCREMENT-PERF.md`.
+Read the **PR-P2** section in full. It reuses PR-P1's `_bulk_copy` +
+`_REL_*_COLUMNS`. The plan wins.
+
+## Scope
+
+1. Convert `_write_nodes_impl` (:3029, the shared workhorse called by
+   `_write_nodes` full + `_write_nodes_merge` incremental) to stage Symbol rows
+   then `_bulk_copy(conn, "Symbol", NODE_COLUMNS, rows)`. Delete `_CREATE_SYMBOL`
+   and `_MERGE_SYMBOL` (both dead once the workhorse is bulk). Do the existing
+   `resolve_role_and_capabilities` + `type_role_by_node_id` population before
+   staging, unchanged.
+2. Convert `_write_routes_and_exposes` (:3338, shared) to per-table staging +
+   `_bulk_copy` for Route/EXPOSES/Client/Producer/DECLARES_CLIENT/DECLARES_PRODUCER/
+   HTTP_CALLS/ASYNC_CALLS (keep the existing `_file_by_node_id`/`_file_by_client_id`/
+   `_file_by_producer_id` source_file resolution). Bulk-load Route/Client/Producer
+   NODES before the EXPOSES/DECLARES_*/HTTP_CALLS/ASYNC_CALLS edges. Delete
+   `_CREATE_ROUTE`/`_CREATE_EXPOSES`.
+3. Convert `_write_clients_producers_and_calls` (:3810, incremental-only global
+   pass5/6) Client/Producer/edge writes to per-type staging + `_bulk_copy` (keep
+   the `member_by_id`/`client_by_id`/`producer_by_id` resolution). **Retain the
+   `MERGE (r:Route {id:$id}) …` dedup (:3819-3828) verbatim** + add a one-line
+   comment it is intentionally kept. Now delete the 6 shared constants —
+   `_CREATE_CLIENT`/`_CREATE_PRODUCER`/`_CREATE_DECLARES_CLIENT`/
+   `_CREATE_DECLARES_PRODUCER`/`_CREATE_HTTP_CALL`/`_CREATE_ASYNC_CALL` — which
+   are dead only after BOTH routes/exposes and clients_producers functions convert.
+4. Leave `_write_meta` (:3421) and its `MERGE (m:GraphMeta …)` UNTOUCHED.
+5. Add the two named tests as methods of `TestIncrementalOrchestrator` in
+   `tests/test_incremental_graph.py`.
+
+## Out of scope (do NOT touch)
+
+- `_write_edges` (done in PR-P1).
+- `_write_meta` / GraphMeta MERGE — leave it.
+- `_delete_file_scope`, `incremental_rebuild` algorithm, dependent-expansion,
+  crash-marker logic.
+- Anything outside `build_ast_graph.py` + `tests/test_incremental_graph.py`.
+
+If you find yourself wanting to touch any of the above, **stop and ask**.
+
+## Deliverables
+
+1. `_write_nodes_impl` bulk; `_CREATE_SYMBOL` + `_MERGE_SYMBOL` deleted.
+2. `_write_routes_and_exposes` bulk; `_CREATE_ROUTE`/`_CREATE_EXPOSES` deleted.
+3. `_write_clients_producers_and_calls` Client/Producer/edges bulk; `MERGE (r:Route)` retained + commented; 6 shared `_CREATE_*` deleted.
+4. `_write_meta` untouched.
+5. Two new tests in `TestIncrementalOrchestrator`.
+
+## Tests
+
+Run, all must pass:
+```
+.venv/bin/python -m pytest tests/test_incremental_graph.py tests/test_ast_graph_build.py -v
+.venv/bin/ruff check .
+```
+
+Sentinel greps — **must return zero**:
+```
+grep -nE "_CREATE_(SYMBOL|ROUTE|EXPOSES|CLIENT|PRODUCER|DECLARES_CLIENT|DECLARES_PRODUCER|HTTP_CALL|ASYNC_CALL)\b" build_ast_graph.py
+grep -nE "_MERGE_SYMBOL\b" build_ast_graph.py
+```
+
+Sentinel greps — **must be non-zero** (Route dedup + GraphMeta MERGE retained; bulk present):
+```
+grep -n "MERGE (r:Route" build_ast_graph.py
+grep -n "MERGE (m:GraphMeta" build_ast_graph.py
+grep -n "COPY .*FROM \$rows" build_ast_graph.py
+```
+
+## Manual evidence (paste in PR description)
+
+Single-file change equivalence:
+```bash
+# set up an index, touch one file, increment (bulk), then full-rebuild the same
+# state (bulk) and diff graphs (node count, per-type edge counts, GraphMeta).
+```
+Expected: incremental(bulk) == full-rebuild(bulk) for that state. Paste side-by-side counts.
+
+## Definition of Done
+
+- [ ] `_write_nodes_impl` bulk; `_CREATE_SYMBOL` + `_MERGE_SYMBOL` deleted.
+- [ ] `_write_routes_and_exposes` bulk (Route/Client/Producer before edges); `_CREATE_ROUTE`/`_CREATE_EXPOSES` deleted.
+- [ ] `_write_clients_producers_and_calls` Client/Producer/edges bulk; `MERGE (r:Route)` retained + commented; 6 shared `_CREATE_*` deleted.
+- [ ] `_write_meta` untouched.
+- [ ] `test_incremental_bulk_write_equivalent_to_full_rebuild`, `test_incremental_route_merge_dedup_preserved` (both in `TestIncrementalOrchestrator`) pass; full `test_incremental_graph.py` + `test_ast_graph_build.py` green.
+- [ ] Sentinel greps pass.
+- [ ] `.venv/bin/ruff check .` clean.
+- [ ] PR title: `perf(graph): bulk COPY FROM for nodes, routes, clients/producers (PR-P2)`.
+````
+
+---
+
+## PR-P3 — Cached `LayeredIgnore` + `is_ignored` memo
+
+**Branch:** `perf/cached-ignore-p3` off `master`.
+**Base:** `master` (independent of PR-P1/P2).
+**Plan section:** `plans/active/PLAN-INIT-INCREMENT-PERF.md` § PR-P3 (read this first).
+**Estimated diff size:** small.
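A minimal sketch of the directory-keyed memo that the PR-P3 title refers to. `DirectoryMemo`, `build_for_dir`, and the `builds` counter are hypothetical stand-ins and not the real `LayeredIgnore` API; only the `_mega_cache` attribute and the `Path(...).parent.as_posix()` key come from the PR-P3 scope. The sketch assumes, as the plan states, that the merged rule set depends only on the containing directory.

```python
from pathlib import Path


class DirectoryMemo:
    """Hypothetical stand-in for LayeredIgnore; only the caching shape matters."""

    def __init__(self, build_for_dir):
        # Assumed expensive callable: directory (posix str) -> set of ignored file names.
        self._build_for_dir = build_for_dir
        self._mega_cache = {}
        self.builds = 0  # counts cache misses, for illustration only

    def _mega(self, rel_project):
        # Key by the containing directory so sibling files share one merged result.
        key = Path(rel_project).parent.as_posix()
        if key not in self._mega_cache:
            self.builds += 1
            self._mega_cache[key] = self._build_for_dir(key)
        return self._mega_cache[key]

    def is_ignored(self, rel_project):
        # Toy decision rule: a file is ignored if its name is listed for its directory.
        return Path(rel_project).name in self._mega(rel_project)
```

With one shared instance, two files in the same directory trigger a single build; a per-file instance would rebuild every time, which is the cost this PR removes.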
+
+**Attach (`@-files`):**
+
+- `@java_index_flow_lancedb.py` (`ContextKey` defs `:60-72`, `coco_lifespan` provide sites `:287-306`, `process_java_file:345`/`process_sql_file:417`/`process_yaml_file:465`)
+- `@path_filtering.py` (`LayeredIgnore`, `_mega:334`, `_mega_build_for_rel:193`, `is_ignored:345`, `diagnose_dict:377`)
+- `@tests/test_path_filtering.py` (where the two memo unit tests go)
+- `@tests/test_lancedb_e2e.py` (HEAVY once-per-flow test)
+
+**Prompt:**
+
+````
+You are implementing PR-P3 from `plans/active/PLAN-INIT-INCREMENT-PERF.md`.
+
+Read the **PR-P3** section in full. Independent of PR-P1/P2. The plan wins.
+
+KEY FACT: `LayeredIgnore(project_root)` appears at FIVE sites in
+java_index_flow_lancedb.py — :177, :351, :423, :471, :569. PR-P3 converts ONLY
+the three `process_*_file` sites (:351/:423/:471). The other two (:177 in
+`_approximate_vectors_total`, :569 in the app_main pre-walk) call
+`cocoindex_excluded_patterns()` ONCE PER RUN — leave them alone.
+
+## Scope
+
+1. Define `IGNORE = coco.ContextKey[LayeredIgnore]("java_lance_layered_ignore")`
+   alongside `PROJECT_ROOT`/`EMBEDDER`/`LANCE_DB` (:60-72), reusing the SAME
+   `_ck_params` (`detect_change` vs `tracked`) detection block.
+2. In `coco_lifespan` (:287-306), add `builder.provide(IGNORE, LayeredIgnore(root))`
+   — built ONCE per flow run.
+3. In `process_java_file`/`process_sql_file`/`process_yaml_file`: add
+   `ignore = coco.use_context(IGNORE)` and replace
+   `LayeredIgnore(project_root).is_ignored((project_root / file.file_path.path).resolve())`
+   with `ignore.is_ignored((project_root / file.file_path.path).resolve())`.
+   Keep `project_root`. DO NOT touch :177 or :569.
+4. In `path_filtering.py` `LayeredIgnore`: add `self._mega_cache` in `__init__`
+   and memoize `_mega(rel)` keyed by `Path(rel_project).parent.as_posix()`
+   (`_mega_build_for_rel` reads only `dir_parts`, so this is correct).
+5. Add the three named tests in the right files.
+
+## Out of scope (do NOT touch)
+
+- `build_ast_graph.py` and the graph write path (PR-P1/P2).
+- The ignore *decision* logic (`_mega_build_for_rel`, `_winning_row`, negation
+  scanning) — only memoize.
+- Sites :177 and :569.
+- Any schema/ontology/re-index change. Loosening any existing test.
+
+If you find yourself wanting to touch any of the above, **stop and ask**.
+
+## Deliverables
+
+1. `IGNORE` ContextKey (version-detected) + provided once in `coco_lifespan`.
+2. The three `process_*_file` consume it; :177/:569 untouched.
+3. `_mega` memoized by directory in `LayeredIgnore`.
+4. Three tests: two in `tests/test_path_filtering.py`, one (HEAVY) in `tests/test_lancedb_e2e.py`.
+
+## Tests
+
+Run, all must pass:
+```
+.venv/bin/python -m pytest tests/test_path_filtering.py tests/test_lancedb_e2e.py -q
+.venv/bin/python -m pytest tests -q -k "ignore or path_filter or vectors_progress"
+.venv/bin/ruff check .
+```
+Heavy (only if you can run cocoindex e2e locally):
+```
+JAVA_CODEBASE_RAG_RUN_HEAVY=1 .venv/bin/python -m pytest tests/test_lancedb_e2e.py -q
+```
+
+Sentinel greps — **must return zero** (matches ONLY the 3 process sites; :177/:569 use the bare constructor + `cocoindex_excluded_patterns`, not `.is_ignored`):
+```
+grep -nE "LayeredIgnore\(project_root\)\.is_ignored" java_index_flow_lancedb.py
+```
+
+Sentinel greps — **must be non-zero**:
+```
+grep -n "coco.use_context(IGNORE)" java_index_flow_lancedb.py # 3 sites
+grep -n "_mega_cache" path_filtering.py # memo present
+```
+
+## Manual evidence (paste in PR description)
+
+With `JAVA_CODEBASE_RAG_RUN_HEAVY=1`, run the flow over a small corpus and log
+`id(ignore)` per file (temporary instrumentation) — confirm a single object id
+across all files. Then micro-benchmark `is_ignored` over N files: same-directory
+files hit the `_mega` cache.
+
+## Definition of Done
+
+- [ ] `IGNORE` ContextKey (version-detected) + provided once in `coco_lifespan`.
+- [ ] The three `process_*_file` consume it; :177/:569 untouched.
+- [ ] `_mega` memoized by directory; `is_ignored`/`diagnose_dict` results unchanged.
+- [ ] `test_is_ignored_mega_caches_by_directory`, `test_layered_ignore_memo_preserves_decisions` (in `tests/test_path_filtering.py`), `test_layered_ignore_provided_once_per_flow` (in `tests/test_lancedb_e2e.py`, HEAVY) pass.
+- [ ] Existing ignore + vectors-progress tests pass unchanged.
+- [ ] Sentinel greps pass.
+- [ ] `.venv/bin/ruff check .` clean.
+- [ ] PR title: `perf(vectors): lifespan-cached LayeredIgnore + is_ignored memo (PR-P3)`.
+````
+
+---
+
+## Notes for the orchestrator
+
+- **Landing order:** PR-P1 → PR-P2 (P2 needs `_bulk_copy` + `_REL_*_COLUMNS`).
+  PR-P3 is independent (can go first, last, or between).
+- **Shared-helper awareness:** `_write_edges`, `_write_routes_and_exposes`,
+  `_write_nodes_impl`, `_write_meta` are each called by BOTH paths. Converting
+  one accelerates both — so `test_incremental_graph.py` is a binding gate for
+  PR-P1 and PR-P2, not just PR-P2.
+- **Review between PRs** (`superpowers:requesting-code-review`): the equivalence
+  harness gates P1/P2; the memo-parity test gates P3.
+- **Sentinel greps are binding:** a non-empty "must return zero" grep = scope
+  leak; an empty "must be non-zero" grep = over-deletion. Either blocks merge.
diff --git a/plans/active/PLAN-INIT-INCREMENT-PERF.md b/plans/active/PLAN-INIT-INCREMENT-PERF.md
new file mode 100644
index 00000000..b6cff842
--- /dev/null
+++ b/plans/active/PLAN-INIT-INCREMENT-PERF.md
@@ -0,0 +1,406 @@
+# Plan: Faster init/increment — bulk graph writes + cached ignore
+
+Status: **active (planning)**. This plan implements
+[`propose/active/INIT-INCREMENT-PERF-PROPOSE.md`](../../propose/active/INIT-INCREMENT-PERF-PROPOSE.md).
+The proposal lands via PR #338; this plan lands via PR #339 and **stacks behind
+#338** (the proposal file is on #338's branch until it merges). The staging
+invariants are inlined below so this plan is self-contained if #338 has not
+merged yet.
+
+Depends on: PR #338 (proposal) — non-blocking for reading this plan (key
+invariants inlined), but the proposal should merge first.
+
+> **Revised after a 5-lens subagent review (PR #339 thread).** The original
+> draft assumed `_write_edges` / `_write_routes_and_exposes` / `_write_meta`
+> were full-rebuild-only and split PRs by *path*. They are **shared by both
+> paths** (verified: `_write_edges` `build_ast_graph.py:3244` is called at
+> `:3926` full + `:805` incremental; `_write_routes_and_exposes` `:3338` at
+> `:3930` full + `:811` incremental; `_write_meta` `:3421` at `:3933` full +
+> `:3733` incremental). PRs are now split by **write-function**, not by path,
+> and GraphMeta is left on MERGE (not bulk-converted).
+
+## Goal
+
+- Cut `init` / `reprocess` / `increment` graph-write wall-clock by replacing
+  per-row `conn.execute` writes with bulk `COPY FROM` — the ~81% init lever.
+- Remove the ~25s `LayeredIgnore` re-construction + `is_ignored` re-merge paid
+  per file in the cocoindex vectors phase.
+- Preserve graph contents exactly: no ontology bump, no re-index, no
+  query-result change (proven by an equivalence harness).
+
+## Principles (do not relitigate in review)
+
+- **Byte-equivalent graph.** Every PR changes only the write mechanism; the
+  graph (node/edge rows, properties, `GraphMeta` counters) must be identical to
+  today. The equivalence harness + the full `test_incremental_graph.py` suite
+  are the merge gate.
+- **Split by write-function, not by path.** The graph write helpers are shared
+  between the full path (`write_ladybug:3893`) and the incremental path
+  (`incremental_rebuild:3535`): `_write_edges`, `_write_routes_and_exposes`,
+  `_write_nodes_impl`, `_write_meta` are each called by BOTH. Converting a
+  shared helper accelerates **both** paths at once — there is no "full-only"
+  conversion for edges/routes. `_write_clients_producers_and_calls:3810` is
+  **incremental-only** (the global pass5/6 step; sole caller `:3716`).
+- **GraphMeta stays on MERGE.** `_write_meta:3421` is shared and recomputes
+  counters before a single `MERGE (m:GraphMeta {key:$k})` (`:3472`). It is one
+  row and not worth the risk — **do not bulk-convert it**. (This reverses the
+  proposal's Open Q1 recommendation.)
+- **In-memory pyarrow `COPY FROM`** is the bulk mechanism (verified: the
+  `ladybug` wrapper forwards `COPY FROM` and accepts a pyarrow param —
+  `ladybug/connection.py:337`/`:488`). Parquet-file is a fallback, not the
+  default. Do not propose CSV.
+- **MPS device default is out of scope** — the flow already auto-selects MPS.
+- **No new env vars, CLI flags, or public surface.** No compatibility shims.
+
+## PR breakdown - overview
+
+| PR | Scope (write-functions converted) | Ontology bump | Areas of concern | Test buckets | Independent of |
+| --- | --- | --- | --- | --- | --- |
+| **PR-P1** | Add `_bulk_copy`; convert **`_write_edges`** (shared → both paths' Symbol→Symbol edges + UnresolvedCallSite/UNRESOLVED_AT) | none | REL-table `COPY FROM` column order (FROM/TO first); CALLS dedup + `callee_declaring_role` materialized at staging; UnresolvedCallSite loaded before UNRESOLVED_AT | equivalence + determinism + baseline; full `test_ast_graph_build.py` **and** `test_incremental_graph.py` (shared helper) | — |
+| **PR-P2** | Convert **`_write_nodes_impl`** (shared nodes), **`_write_routes_and_exposes`** (shared routes/clients/producers/calls), **`_write_clients_producers_and_calls`** (incremental-only global; Route MERGE preserved) | none | Route/Client/Producer nodes loaded before EXPOSES/DECLARES_*/HTTP_CALLS/ASYNC_CALLS; the 6 client/producer `_CREATE_*` constants are shared with the incremental-only function — delete only after BOTH convert; pass5/6 Route MERGE retained | `test_incremental_graph.py` regression + new incremental-equivalence test | depends on **PR-P1** (`_bulk_copy`) |
+| **PR-P3** | Hoist `LayeredIgnore` to a cocoindex `ContextKey` + memoize `is_ignored`'s `_mega` by directory | none | `ContextKey` lifespan scoping; `_mega` memo correctness (mega depends on directory only); leave the once-per-run sites (`:177`, `:569`) alone; no change to any ignore *decision* | `tests/test_path_filtering.py` memo tests + `tests/test_lancedb_e2e.py` (HEAVY) once-per-flow | independent of PR-P1, PR-P2 |
+
+Landing order: **P1 → P2**; **P3** may land in any order (independent).
+
+## Resolved design decisions
+
+| Topic | Decision |
+| --- | --- |
+| Bulk mechanism | In-memory pyarrow: `conn.execute("COPY