
Commit 986c824

aksOps and claude committed
refactor(graph): use Kuzu 0.11 native features (FTS, param LIMIT, []string)
Kuzu 0.11.3 bundles features that were unavailable or broken in 0.7.1. This commit unwinds the workarounds documented in CLAUDE.md.

### FTS (fulltext search)

`CreateIndexes()` was a no-op because Kuzu 0.7.1's FTS extension needed a network INSTALL (incompatible with air-gapped builds). 0.11.3 ships FTS pre-bundled. `CreateIndexes()` now runs:

- `INSTALL fts; LOAD EXTENSION fts;`
- `CALL DROP_FTS_INDEX` / `CALL CREATE_FTS_INDEX` for two indexes:
  - `code_node_label_fts` over `(label, fqn_lower)`
  - `code_node_lexical_fts` over `(prop_lex_comment, prop_lex_config_keys)`

`SearchByLabel` / `SearchLexical` route through `CALL QUERY_FTS_INDEX` with BM25 score ranking. A trailing `*` is auto-appended when the user query is a single bare token, giving prefix-match UX similar to the old CONTAINS behaviour. CONTAINS-based fallbacks remain in place for graphs that never ran enrich (the FTS index would be missing).

The mutation gate (`MutationKeyword`) allows the read-only `CALL QUERY_FTS_INDEX` procedure; the catalog writers `CALL CREATE_FTS_INDEX` / `CALL DROP_FTS_INDEX` stay blocked under `OpenReadOnly`.

### Parameterized LIMIT / SKIP

Kuzu 0.7.1 rejected `$lim` / `$skip` bindings — values had to be inline literals. 0.11.3 accepts them as bound parameters. Affected sites:

- `internal/graph/indexes.go` — SearchByLabel / SearchLexical
- `internal/graph/reads.go` — FindByKindPaginated
- `internal/query/service.go` — FindCycles, FindDeadCode
- `internal/mcp/tools_graph.go` — list-edges, ego-neighbours, endpoints-by-id

Helper `intLiteral` is removed (it was only used to format inline LIMITs).

### Drop `stringsToAny` widener

Kuzu 0.7's Go binding required `[]any` for list parameters; `[]string` tripped `unsupported type` in `goValueToKuzuValue`. 0.11.3's binding accepts `[]string` directly. The widener helper is removed, and its two callers (`query.FindDeadCode`, `topology.FindServicesContainingNodes`) pass `[]string` straight through.
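Here is a minimal sketch of the two parameter-binding changes. It assumes a plain `map[string]any` parameter map like the one passed to `s.Cypher` in the diffs on this page. `deadCodeParams` and `deadCodeQuery` are hypothetical illustrations, not the real `FindDeadCode`. `stringsToAny` is a reconstruction of what the deleted widener did, because its code is not shown here.

```go
package main

import "fmt"

// stringsToAny reconstructs the deleted widener: Kuzu 0.7's Go binding only
// accepted []any for list parameters, so every []string had to be copied.
func stringsToAny(ss []string) []any {
	out := make([]any, len(ss))
	for i, s := range ss {
		out[i] = s
	}
	return out
}

// deadCodeQuery is illustrative only (not the real FindDeadCode text): both
// the IN list and the LIMIT are bound parameters.
const deadCodeQuery = `MATCH (n:CodeNode) WHERE n.kind IN $kinds
RETURN n.id AS id ORDER BY n.id LIMIT $lim`

// deadCodeParams (hypothetical) builds the 0.11.3-style parameter map:
// []string goes in as-is and LIMIT is an int64 instead of an inline literal.
func deadCodeParams(kinds []string, limit int) map[string]any {
	if limit < 0 {
		limit = 0
	}
	return map[string]any{"kinds": kinds, "lim": int64(limit)}
}

func main() {
	kinds := []string{"method", "class"}
	before := map[string]any{"kinds": stringsToAny(kinds)}
	after := deadCodeParams(kinds, 50)
	fmt.Println(deadCodeQuery)
	fmt.Printf("0.7.1: %T, 0.11.3: %T, lim=%v\n", before["kinds"], after["kinds"], after["lim"])
}
```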
### CLAUDE.md

Reworked the Kuzu quirks section into "lifted in 0.11.3" vs "still present" buckets so future contributors don't reintroduce workarounds that the runtime no longer needs.

### Verification

- `cd go && CGO_ENABLED=1 go test ./... -count=1` — 883 passed
- End-to-end on `~/projects/polyglot-bench/airflow`: enrich exit 0, 95k nodes, 246k edges, FTS search returns BM25-ranked hits
- End-to-end on `~/projects/`: enrich exit 0, 187k nodes, 414k edges, 1m 29s wall, 1.88 GiB peak RSS; FTS `'service*'` returns the top-5 hits, ranked at scores ~12-14

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
1 parent 690fba6 commit 986c824

7 files changed

Lines changed: 160 additions & 124 deletions


CLAUDE.md

Lines changed: 22 additions & 14 deletions
@@ -27,7 +27,7 @@ landing) and `c630245` (release infra).
 - **Go 1.25.10** — toolchain pin; module min is 1.25.0 (clamped by the
   MCP SDK's own `go` directive).
 - **Kuzu 0.7.1** (`github.com/kuzudb/go-kuzu`) — embedded graph DB.
-  CGO. v0.7.1 quirks documented in `## Gotchas` below.
+  CGO. v0.11.3 capability matrix documented in `## Gotchas` below.
 - **`mattn/go-sqlite3` 1.14.22** — SQLite analysis cache. CGO.
 - **`smacker/go-tree-sitter`** — AST parsing for Java / Python /
   TypeScript / Go.
@@ -357,26 +357,34 @@ Release pipeline:
   silently produced 0 nodes pre-fix. Test: `codeiq plugins` lists
   every detector by name; new ones must appear.

-### Kuzu v0.7.1 quirks
-
-- FTS extension not bundled, not downloadable offline. `INSTALL fts`
-  errors with "fts is not an official extension". `CreateIndexes()`
-  no-ops FTS; `SearchByLabel` / `SearchLexical` use case-insensitive
-  `CONTAINS` predicates.
-- LIMIT / SKIP can't be parameterized. Inline as literals;
-  parameterize the needle only.
-- Uses `lower()` (SQL) not `toLower()` (openCypher).
-- `RETURN DISTINCT` scope tighter than openCypher; `ORDER BY` must
-  reference the projected alias, not the bound variable.
+### Kuzu v0.11.3 (current pin)
+
+**Lifted in 0.11.3**`CLAUDE.md` previously documented these as 0.7.1
+quirks; they were unwound in the post-bump cleanup:
+
+- FTS extension ships bundled. `CreateIndexes()` runs `INSTALL fts; LOAD
+  EXTENSION fts;` then `CALL CREATE_FTS_INDEX`. `SearchByLabel` /
+  `SearchLexical` query via `CALL QUERY_FTS_INDEX` with BM25 ranking;
+  CONTAINS predicates remain as fallback for pre-enrich graphs.
+- `LIMIT $param` and `SKIP $param` work as bound parameters. No more
+  `fmt.Sprintf` for integer literals.
+- `toLower()` works (use it; `lower()` still accepted for SQL parity).
+- Go binding accepts `[]string` for `IN $param` directly. The
+  `stringsToAny` widener is gone.
+
+**Still present in 0.11.3** — keep workarounds:
+
 - List comprehension binder rejects out-of-scope variables. Use
   `properties(nodes(p), 'id')` instead of `[n IN nodes(p) | n.id]`.
 - `EXISTS { … }` subquery doesn't see outer-scope `$param`. Inline
   static lists as rel-pattern alternations.
-- Go binding's `goValueToKuzuValue` accepts `[]any` only. Added
-  `stringsToAny` widener for `IN $param` use cases.
 - Multi-label rel alternation + kleene-star in the same recursive
   pattern breaks the binder. BlastRadius uses an anonymous recursive
   pattern.
+- Recursive pattern upper bound `[*1..N]` must be a literal, not a
+  parameter — only LIMIT/SKIP are now bindable.
+- Mutation gate allows `CALL QUERY_FTS_INDEX` but blocks
+  `CALL CREATE_FTS_INDEX` / `CALL DROP_FTS_INDEX` (catalog writes).

 ### MCP SDK v1.6

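The `EXISTS { … }` workaround still applies in 0.11.3. The sketch below shows one way to inline a static rel-type list. It assumes the list comes from code, not user input. `relAlternation` is a hypothetical helper. The identifier check is an added precaution, because inlined names bypass parameter binding. The `:A|:B` alternation form is also an assumption about what the binder accepts inside a subquery.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// relTypeRE admits plain identifiers only. Inlined rel types skip parameter
// binding, so anything else is rejected rather than escaped.
var relTypeRE = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*$`)

// relAlternation (hypothetical) renders a static rel-type list as ":A|:B" for
// use inside EXISTS { … }, where an outer-scope $param is not visible.
func relAlternation(types []string) (string, error) {
	if len(types) == 0 {
		return "", fmt.Errorf("relAlternation: empty type list")
	}
	for _, t := range types {
		if !relTypeRE.MatchString(t) {
			return "", fmt.Errorf("relAlternation: invalid rel type %q", t)
		}
	}
	return ":" + strings.Join(types, "|:"), nil
}

func main() {
	alt, err := relAlternation([]string{"CALLS", "IMPORTS"})
	if err != nil {
		panic(err)
	}
	// $id sits outside the subquery, so it can stay a bound parameter.
	q := fmt.Sprintf(`MATCH (n:CodeNode) WHERE n.id = $id
AND NOT EXISTS { MATCH (n)<-[%s]-(:CodeNode) }
RETURN n.id AS id`, alt)
	fmt.Println(q)
}
```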
go/internal/graph/indexes.go

Lines changed: 95 additions & 46 deletions
@@ -7,78 +7,127 @@ import (
 	"github.com/randomcodespace/codeiq/go/internal/model"
 )

+// FTS index names. CreateIndexes builds these after enrich. Read paths
+// query them via QUERY_FTS_INDEX.
+const (
+	ftsLabelIndex   = "code_node_label_fts"
+	ftsLexicalIndex = "code_node_lexical_fts"
+)
+
 // CreateIndexes installs the fulltext-search indexes the read side relies
-// on. Mirrors GraphStore.createIndexes() on the Java side, which declares
-// two Neo4j fulltext indexes:
+// on. Two indexes are created:
 //
-//   - search_index: covers label_lower + fqn_lower. Powers /api/search and
-//     the search_graph MCP tool.
-//   - lexical_index: covers prop_lex_comment + prop_lex_config_keys.
+//   - code_node_label_fts: covers label + fqn_lower. Powers SearchByLabel
+//     and the search_graph MCP tool surface.
+//   - code_node_lexical_fts: covers prop_lex_comment + prop_lex_config_keys.
 //     Powers LexicalQueryService's doc-comment / config-key search.
 //
-// Implementation note (Kuzu version gap): Kuzu's official FTS extension
-// ships pre-bundled from v0.11.3 onwards. We pin go-kuzu v0.7.1 (Kuzu
-// 0.7.x runtime), which requires a network INSTALL of the FTS extension —
-// incompatible with the air-gapped build policy. We therefore expose the
-// same SearchByLabel / SearchLexical surface and back it with Cypher
-// CONTAINS predicates. When we bump Kuzu past 0.11.3 the implementation
-// swaps to CALL CREATE_FTS_INDEX / QUERY_FTS_INDEX without touching the
-// caller surface.
+// Idempotent: existing indexes are dropped before re-create. The enrich
+// pipeline calls this once after BulkLoadNodes / BulkLoadEdges complete,
+// so the indexes always reflect the latest snapshot.
 //
-// Because there is no actual index to create at this version, CreateIndexes
-// is a no-op that returns nil. It stays in the API so call sites in the
-// enrich command line up with the eventual FTS implementation.
+// FTS bundled in Kuzu 0.11.3+ (no network install needed — air-gapped safe).
 func (s *Store) CreateIndexes() error {
-	// Touch the property columns to make sure schema is in place. We do
-	// NOT attempt INSTALL fts here — that path requires network access
-	// the air-gapped build policy forbids (see playbooks/build.md).
+	// FTS extension ships bundled but still needs LOAD to register the
+	// catalog functions. INSTALL is a no-op when bundled.
+	if _, err := s.Cypher("INSTALL fts;"); err != nil {
+		return fmt.Errorf("graph: install fts: %w", err)
+	}
+	if _, err := s.Cypher("LOAD EXTENSION fts;"); err != nil {
+		return fmt.Errorf("graph: load fts: %w", err)
+	}
+	// Drop-then-create — idempotent across re-enrich. Dropping a missing
+	// index errors; ignore that single error path.
+	for _, idx := range []string{ftsLabelIndex, ftsLexicalIndex} {
+		_, _ = s.Cypher(fmt.Sprintf("CALL DROP_FTS_INDEX('CodeNode', '%s');", idx))
+	}
+	if _, err := s.Cypher(fmt.Sprintf(
+		`CALL CREATE_FTS_INDEX('CodeNode', '%s', ['label', 'fqn_lower']);`,
+		ftsLabelIndex)); err != nil {
+		return fmt.Errorf("graph: create fts label index: %w", err)
+	}
+	if _, err := s.Cypher(fmt.Sprintf(
+		`CALL CREATE_FTS_INDEX('CodeNode', '%s', ['prop_lex_comment', 'prop_lex_config_keys']);`,
+		ftsLexicalIndex)); err != nil {
+		return fmt.Errorf("graph: create fts lexical index: %w", err)
+	}
 	return nil
 }

-// SearchByLabel runs a case-insensitive substring search across
-// label_lower and fqn_lower. Returns up to `limit` nodes ordered by id for
-// stable test output. Behaviour matches the Java search_index contract at
-// the API surface; ranking differs (no BM25 until Kuzu FTS lands).
+// SearchByLabel runs a fulltext search across the label + fqn_lower index.
+// The query is auto-suffixed with '*' to give prefix matching (so 'auth'
+// matches 'AuthService' identifiers). Results are ranked by BM25 score.
+// Falls back to CONTAINS predicate when the FTS index hasn't been built
+// (pre-enrich or enrich aborted before CreateIndexes).
 func (s *Store) SearchByLabel(q string, limit int) ([]*model.CodeNode, error) {
-	needle := strings.ToLower(q)
-	// Kuzu 0.7.1 rejects parameter binding on LIMIT — the value must be
-	// an inline literal. Coerce `limit` to a non-negative int and inline
-	// it via fmt; the user-supplied needle still goes through prepared
-	// parameter binding.
+	return s.ftsSearch(ftsLabelIndex, q, limit, s.searchByLabelFallback)
+}
+
+// SearchLexical runs a fulltext search across the prose columns
+// (prop_lex_comment + prop_lex_config_keys). BM25 ranks results. Same
+// CONTAINS fallback as SearchByLabel for pre-enrich graphs.
+func (s *Store) SearchLexical(q string, limit int) ([]*model.CodeNode, error) {
+	return s.ftsSearch(ftsLexicalIndex, q, limit, s.searchLexicalFallback)
+}
+
+// ftsSearch is the shared FTS path for SearchByLabel and SearchLexical.
+// On any FTS error (missing index, malformed query, etc.) it routes to the
+// caller-supplied CONTAINS fallback.
+func (s *Store) ftsSearch(idx, q string, limit int,
+	fallback func(string, int) ([]*model.CodeNode, error)) ([]*model.CodeNode, error) {
 	if limit < 0 {
 		limit = 0
 	}
-	rows, err := s.Cypher(fmt.Sprintf(`
+	needle := strings.TrimSpace(strings.ToLower(q))
+	// Prefix-search via wildcard: "auth" → "auth*". Skip if user already
+	// supplied a wildcard or a multi-token query (FTS treats space as AND).
+	if needle != "" && !strings.ContainsAny(needle, "* ") {
+		needle += "*"
+	}
+	rows, err := s.Cypher(`
+		CALL QUERY_FTS_INDEX('CodeNode', $idx, $q)
+		WITH node AS n, score
+		RETURN n.id AS id, n.kind AS kind, n.label AS label,
+		       n.file_path AS file_path, n.layer AS layer, score
+		ORDER BY score DESC, n.id
+		LIMIT $lim`,
+		map[string]any{"idx": idx, "q": needle, "lim": int64(limit)})
+	if err != nil {
+		return fallback(needle, limit)
+	}
+	return rowsToNodes(rows), nil
+}
+
+// searchByLabelFallback uses CONTAINS — same shape as pre-FTS code, retained
+// for graphs where CreateIndexes has not run. Strips the trailing '*' added
+// by ftsSearch since CONTAINS is already substring-y.
+func (s *Store) searchByLabelFallback(needle string, limit int) ([]*model.CodeNode, error) {
+	q := strings.TrimSuffix(needle, "*")
+	rows, err := s.Cypher(`
 		MATCH (n:CodeNode)
 		WHERE n.label_lower CONTAINS $q OR n.fqn_lower CONTAINS $q
 		RETURN n.id AS id, n.kind AS kind, n.label AS label,
 		       n.file_path AS file_path, n.layer AS layer
-		ORDER BY n.id LIMIT %d`, limit),
-		map[string]any{"q": needle})
+		ORDER BY n.id LIMIT $lim`,
+		map[string]any{"q": q, "lim": int64(limit)})
 	if err != nil {
 		return nil, fmt.Errorf("graph: search by label: %w", err)
 	}
 	return rowsToNodes(rows), nil
 }

-// SearchLexical runs a case-insensitive substring search across
-// prop_lex_comment and prop_lex_config_keys — the two columns
-// LexicalEnricher fills with doc-comment text and surfaced config keys.
-// Same Kuzu version caveat as SearchByLabel above.
-func (s *Store) SearchLexical(q string, limit int) ([]*model.CodeNode, error) {
-	needle := strings.ToLower(q)
-	if limit < 0 {
-		limit = 0
-	}
-	// Kuzu 0.7.1 uses SQL-style `lower()`, not `toLower()`.
-	rows, err := s.Cypher(fmt.Sprintf(`
+// searchLexicalFallback uses CONTAINS with toLower() over prose columns.
+// Retained for graphs that haven't run enrich/CreateIndexes.
+func (s *Store) searchLexicalFallback(needle string, limit int) ([]*model.CodeNode, error) {
+	q := strings.TrimSuffix(needle, "*")
+	rows, err := s.Cypher(`
 		MATCH (n:CodeNode)
-		WHERE lower(n.prop_lex_comment) CONTAINS $q
-		   OR lower(n.prop_lex_config_keys) CONTAINS $q
+		WHERE toLower(n.prop_lex_comment) CONTAINS $q
+		   OR toLower(n.prop_lex_config_keys) CONTAINS $q
 		RETURN n.id AS id, n.kind AS kind, n.label AS label,
 		       n.file_path AS file_path, n.layer AS layer
-		ORDER BY n.id LIMIT %d`, limit),
-		map[string]any{"q": needle})
+		ORDER BY n.id LIMIT $lim`,
+		map[string]any{"q": q, "lim": int64(limit)})
 	if err != nil {
 		return nil, fmt.Errorf("graph: search lexical: %w", err)
 	}

go/internal/graph/mutation.go

Lines changed: 4 additions & 1 deletion
@@ -41,13 +41,16 @@ var callRE = regexp.MustCompile(`(?i)\bCALL\s+(\w+(?:\.\w+)?)`)
 // readOnlyCallPrefixes are case-insensitive procedure-name prefixes that
 // are permitted under CALL. db.* covers Neo4j's read-only schema
 // procedures (db.indexes, db.constraints, db.labels); show_/table_/
-// current_setting/table_info cover Kuzu's introspection helpers.
+// current_setting/table_info cover Kuzu's introspection helpers;
+// query_fts_index is Kuzu 0.11's read-only FTS search procedure
+// (create_/drop_fts_index stay blocked because they mutate the catalog).
 var readOnlyCallPrefixes = []string{
 	"db.",
 	"show_",
 	"table_",
 	"current_setting",
 	"table_info",
+	"query_fts_index",
 }

 // blockCommentRE matches /* … */ and line comments. Both are stripped

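Here is a sketch of how the prefix allow-list could be applied to every `CALL` in a query. It reuses the `callRE` pattern from the hunk header and the updated prefix list. `callsAreReadOnly` is hypothetical. The real gate's surrounding logic (comment stripping via `blockCommentRE`, other mutation keywords) is not shown on this page.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// callRE is the pattern from the hunk header: it captures the procedure name.
var callRE = regexp.MustCompile(`(?i)\bCALL\s+(\w+(?:\.\w+)?)`)

// readOnlyCallPrefixes matches the list after this commit.
var readOnlyCallPrefixes = []string{
	"db.", "show_", "table_", "current_setting", "table_info", "query_fts_index",
}

// callsAreReadOnly (hypothetical) reports whether every CALL target starts
// with an allow-listed prefix, compared case-insensitively.
func callsAreReadOnly(cypher string) bool {
	for _, m := range callRE.FindAllStringSubmatch(cypher, -1) {
		name := strings.ToLower(m[1])
		allowed := false
		for _, p := range readOnlyCallPrefixes {
			if strings.HasPrefix(name, p) {
				allowed = true
				break
			}
		}
		if !allowed {
			return false
		}
	}
	return true
}

func main() {
	for _, q := range []string{
		"CALL QUERY_FTS_INDEX('CodeNode', 'code_node_label_fts', 'auth*')",
		"CALL CREATE_FTS_INDEX('CodeNode', 'x', ['label'])",
		"CALL DROP_FTS_INDEX('CodeNode', 'x')",
	} {
		fmt.Println(callsAreReadOnly(q), q)
	}
}
```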
go/internal/graph/reads.go

Lines changed: 4 additions & 6 deletions
@@ -11,8 +11,7 @@ import (
 // GraphController. All return projections through rowsToNodes (defined in
 // indexes.go) — `id`, `kind`, `label`, and optionally `file_path` / `layer`.
 //
-// Kuzu 0.7.1 caveats relevant here:
-//   - LIMIT/SKIP values must be inlined literals, not bound parameters.
+// Kuzu caveats relevant here:
 //   - count(*) on rels works fine across all rel tables via
 //     `MATCH ()-[r]->()` — Kuzu treats the wildcard as the union of every
 //     declared rel type.
@@ -107,13 +106,12 @@ func (s *Store) FindByKindPaginated(kind string, offset, limit int) ([]*model.Co
 	if limit < 0 {
 		limit = 0
 	}
-	// Kuzu 0.7.1 disallows parameter binding on SKIP/LIMIT — inline them.
-	rows, err := s.Cypher(fmt.Sprintf(`
+	rows, err := s.Cypher(`
 		MATCH (n:CodeNode) WHERE n.kind = $k
 		RETURN n.id AS id, n.kind AS kind, n.label AS label,
 		       n.file_path AS file_path, n.layer AS layer
-		ORDER BY n.id SKIP %d LIMIT %d`, offset, limit),
-		map[string]any{"k": kind})
+		ORDER BY n.id SKIP $skip LIMIT $lim`,
+		map[string]any{"k": kind, "skip": int64(offset), "lim": int64(limit)})
 	if err != nil {
 		return nil, fmt.Errorf("graph: find by kind: %w", err)
 	}

go/internal/mcp/tools_graph.go

Lines changed: 16 additions & 29 deletions
@@ -181,21 +181,15 @@ func toolQueryEdges(d *Deps) Tool {
 		// the anonymous-rel pattern.
 		cypher := `MATCH (a:CodeNode)-[r]->(b:CodeNode)
 			RETURN a.id AS source, b.id AS target, LABEL(r) AS kind
-			ORDER BY source, kind, target LIMIT ` + intLiteral(limit)
-		args := map[string]any{}
+			ORDER BY source, kind, target LIMIT $lim`
+		args := map[string]any{"lim": int64(limit)}
 		if p.Kind != "" {
 			cypher = `MATCH (a:CodeNode)-[r]->(b:CodeNode) WHERE LABEL(r) = $k
 				RETURN a.id AS source, b.id AS target, LABEL(r) AS kind
-				ORDER BY source, kind, target LIMIT ` + intLiteral(limit)
+				ORDER BY source, kind, target LIMIT $lim`
 			args["k"] = p.Kind
 		}
-		var rows []map[string]any
-		var err error
-		if len(args) == 0 {
-			rows, err = d.Store.Cypher(cypher)
-		} else {
-			rows, err = d.Store.Cypher(cypher, args)
-		}
+		rows, err := d.Store.Cypher(cypher, args)
 		if err != nil {
 			return NewErrorEnvelope(CodeInternalError, err, RequestID(ctx)), nil
 		}
@@ -275,11 +269,11 @@ func toolGetEgoGraph(d *Deps) Tool {
 		}
 		depth := CapDepth(p.Radius, d.MaxDepth)
 		// Variable-length match centered on Center, walking outbound up to
-		// depth. Kuzu 0.7's binder is fussy about projecting properties
-		// from the endpoint of a variable-length pattern; the supported
-		// shape is `properties(nodes(p), 'id')` over the named path.
-		// Splitting outbound + inbound queries keeps the rows shape
-		// uniform (both sides projected through nodes(p)).
+		// depth. Kuzu's binder is fussy about projecting properties from
+		// the endpoint of a variable-length pattern; the supported shape
+		// is `properties(nodes(p), 'id')` over the named path. The
+		// recursive `[*1..N]` upper bound must be a literal (binder gap);
+		// LIMIT goes through parameter binding fine.
 		limit := CapResults(0, d.MaxResults)
 		cypher := fmt.Sprintf(`
 			MATCH p = (c:CodeNode {id: $center})-[*1..%d]-(:CodeNode)
@@ -289,8 +283,10 @@ func toolGetEgoGraph(d *Deps) Tool {
 			WHERE n.id <> $center
 			RETURN n.id AS id, n.kind AS kind, n.label AS label,
 			       n.file_path AS file_path, n.layer AS layer
-			ORDER BY n.id LIMIT %d`, depth, limit)
-		rows, err := d.Store.Cypher(cypher, map[string]any{"center": p.Center})
+			ORDER BY n.id LIMIT $lim`, depth)
+		rows, err := d.Store.Cypher(cypher, map[string]any{
+			"center": p.Center, "lim": int64(limit),
+		})
 		if err != nil {
 			return NewErrorEnvelope(CodeInternalError, err, RequestID(ctx)), nil
 		}
@@ -621,16 +617,16 @@ func toolFindRelatedEndpoints(d *Deps) Tool {
 		// Endpoints that share a service container with the identifier
 		// (file path / class / fqn) — the simplest semantic match that
 		// works across languages.
-		cypher := fmt.Sprintf(`
+		cypher := `
 			MATCH (target:CodeNode)
 			WHERE target.file_path = $i OR target.label = $i OR target.id = $i OR target.fqn = $i
 			MATCH (target)<-[:CONTAINS]-(svc:CodeNode {kind: 'service'})-[:CONTAINS]->(ep:CodeNode)
 			WHERE ep.kind = 'endpoint' OR ep.kind = 'websocket_endpoint'
 			RETURN DISTINCT ep.id AS id, ep.kind AS kind, ep.label AS label,
 			       ep.file_path AS file_path, ep.layer AS layer,
 			       svc.label AS service
-			ORDER BY ep.id LIMIT %d`, limit)
-		rows, err := d.Store.Cypher(cypher, map[string]any{"i": p.Identifier})
+			ORDER BY ep.id LIMIT $lim`
+		rows, err := d.Store.Cypher(cypher, map[string]any{"i": p.Identifier, "lim": int64(limit)})
 		if err != nil {
 			return NewErrorEnvelope(CodeInternalError, err, RequestID(ctx)), nil
 		}
@@ -754,13 +750,4 @@ func toolReadFile(d *Deps) Tool {
 	}
 }

-// intLiteral renders a non-negative int as a Cypher literal. Kuzu 0.7.1
-// rejects parameter binding on LIMIT — the value must be inline. The cap
-// floor is 1 to match Kuzu's `LIMIT 0` failure mode.
-func intLiteral(n int) string {
-	if n < 1 {
-		n = 1
-	}
-	return fmt.Sprintf("%d", n)
-}
