|
| 1 | +# CLAUDE.md — codeiq |
| 2 | + |
| 3 | +> **Repo-specific instructions for Claude Code (and any AI coding agent with similar tooling).** Read this in full before making changes. For the full doc set see [`docs/`](docs/); for the one-stop agent brief see [`docs/11-agent-handoff.md`](docs/11-agent-handoff.md). |
| 4 | + |
| 5 | +## What this project is |
| 6 | + |
| 7 | +A deterministic code-knowledge-graph CLI + stdio MCP server. Pure static analysis — no AI in the index/enrich pipeline. Single static Go binary. CGO mandatory. |
| 8 | + |
| 9 | +- **Module path:** `github.com/randomcodespace/codeiq` |
| 10 | +- **Entry:** [`cmd/codeiq/main.go`](cmd/codeiq/main.go) → [`internal/cli/root.go`](internal/cli/root.go) |
| 11 | +- **Tech stack pinned in [`go.mod`](go.mod):** Go 1.25.10 toolchain, Kuzu 0.11.3, SQLite 1.14.44, MCP SDK v1.6, tree-sitter (smacker), cobra 1.10.2. |
| 12 | + |
| 13 | +## Architecture in 10 lines |
| 14 | + |
| 15 | +1. `codeiq index <path>` walks files (git ls-files → fallback), parses with tree-sitter / structured / regex, runs **100 detectors**, dedup-merges into a graph, and writes to **SQLite cache** at `.codeiq/cache/codeiq.sqlite`. |
| 16 | +2. `codeiq enrich <path>` loads cache, runs linkers + LayerClassifier + intelligence extractors + ServiceDetector, then BulkLoads into **Kuzu** at `.codeiq/graph/codeiq.kuzu/` and builds two FTS indexes (`code_node_label_fts`, `code_node_lexical_fts`). |
| 17 | +3. `codeiq mcp <path>` opens Kuzu read-only and serves MCP over stdio (JSON-RPC), exposing **10 tools** (6 mode-driven + `run_cypher` + `read_file` + `generate_flow` + `review_changes`). |
| 18 | +4. `codeiq review` is the only LLM touchpoint: diff + graph evidence → Ollama (`localhost:11434` default; `OLLAMA_API_KEY` flips to cloud). |
| 19 | +5. Every other subcommand (`stats`, `find`, `query`, `cypher`, `flow`, `graph`, `topology`, `cache`, `plugins`, `version`) is a thin read-only consumer of the Kuzu store. |
| 20 | + |
| 21 | +## Critical rules |
| 22 | + |
| 23 | +### Read-only MCP |
| 24 | + |
| 25 | +The MCP server (`codeiq mcp`) is strictly read-only. `run_cypher` enforces this via [`MutationKeyword`](internal/graph/mutation.go), a regex gate that rejects CREATE/DELETE/DETACH/SET/REMOVE/MERGE/DROP/FOREACH/LOAD CSV/COPY and any CALL outside the allow-list (`db.*`, `show_*`, `table_*`, `current_setting`, `table_info`, **`query_fts_index`**). `read_file` is path-sandboxed to the indexed root. |
| 26 | + |
| 27 | +Belt-and-braces: Kuzu is opened with `OpenReadOnly` at the engine level too. |
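
To make the keyword gate concrete, here is a minimal sketch of a regex-based mutation check. It is not the real `MutationKeyword` code: the names `mutationPattern` and `looksLikeMutation` are hypothetical, and the `CALL` allow-list is left out entirely.

```go
package main

import (
	"fmt"
	"regexp"
)

// mutationPattern is a hypothetical stand-in for the real gate. It matches
// the write keywords listed above as whole words, case-insensitively.
var mutationPattern = regexp.MustCompile(
	`(?i)\b(CREATE|DELETE|DETACH|SET|REMOVE|MERGE|DROP|FOREACH|COPY|LOAD\s+CSV)\b`)

// looksLikeMutation reports whether the query contains a write keyword.
// The check is deliberately conservative: a keyword inside a string
// literal is rejected as well.
func looksLikeMutation(query string) bool {
	return mutationPattern.MatchString(query)
}

func main() {
	for _, q := range []string{
		"MATCH (n) RETURN n.id",
		"MATCH (n) DETACH DELETE n",
	} {
		fmt.Printf("%-28q rejected=%v\n", q, looksLikeMutation(q))
	}
}
```

A whole-word match keeps identifiers such as `offset` from tripping the `SET` rule. Erring on the side of rejection is the safer default for a read-only surface.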
| 28 | + |
| 29 | +### Determinism |
| 30 | + |
| 31 | +Same input ⇒ same output, byte-for-byte. Every detector ships a determinism test. Conventions: |
| 32 | + |
| 33 | +- Never iterate a Go `map` without sorting keys first. |
| 34 | +- `GraphBuilder.Snapshot()` sorts nodes + edges by ID. |
| 35 | +- Linker outputs go through `.Sorted()` at the call site. |
| 36 | +- Detectors are stateless — no mutable struct fields. Method-local state only. |
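
The first convention can be sketched as follows. This is a generic illustration of sorted map iteration, not code from the repository; `sortedKeys` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys returns the keys of m in ascending order so that callers can
// iterate the map in a stable, reproducible order.
func sortedKeys(m map[string]int) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	counts := map[string]int{"spring": 3, "flask": 1, "gin": 2}
	// Ranging over counts directly would yield a different order per run.
	for _, k := range sortedKeys(counts) {
		fmt.Printf("%s=%d\n", k, counts[k])
	}
}
```

Go randomizes map iteration order on purpose, so any output assembled from a bare `range` over a map can differ between runs.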
| 37 | + |
| 38 | +### Detector registration choke point |
| 39 | + |
| 40 | +Adding a new detector under `internal/detector/<dir>/` is **not enough**. The package leaf must be blank-imported in [`internal/cli/detectors_register.go`](internal/cli/detectors_register.go). Without that line, the Go linker drops the package's `init()` and the binary ships with no registration for that detector family. This was the #1 silent-failure bug during the Java→Go port — 15 language families silently produced 0 nodes before the auto-import check was added to the dev workflow. |
| 41 | + |
| 42 | +### Goroutine safety |
| 43 | + |
| 44 | +- File I/O and detector dispatch run on a worker pool (`opts.Workers`, default `2 × GOMAXPROCS`). |
| 45 | +- Detectors must be stateless. Method-local state only. |
| 46 | +- Kuzu reads serialize behind the [`Store.mu`](internal/graph/store.go) mutex; one query at a time. |
| 47 | +- The intelligence extractor pool is also `2 × GOMAXPROCS`-bounded to keep tree-sitter heap under control (Phase A OOM fix). |
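
A bounded pool of this kind might look like the sketch below. It assumes nothing about the real pipeline: `processAll` is a hypothetical name, and the only detail taken from the rules above is the `2 × GOMAXPROCS` bound.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// processAll runs fn over every input on at most `workers` goroutines and
// returns the results in input order, so scheduling never affects output.
func processAll(inputs []string, workers int, fn func(string) int) []int {
	if workers < 1 {
		workers = 1
	}
	results := make([]int, len(inputs))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is written by exactly one worker, so no lock is needed.
				results[i] = fn(inputs[i])
			}
		}()
	}
	for i := range inputs {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	workers := 2 * runtime.GOMAXPROCS(0)
	files := []string{"a.go", "bb.go", "ccc.go"}
	fmt.Println(processAll(files, workers, func(s string) int { return len(s) }))
}
```

Writing results by index rather than appending from the workers keeps the output order independent of goroutine scheduling, which fits the determinism rule.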
| 48 | + |
| 49 | +### Confidence ladder is monotonic |
| 50 | + |
| 51 | +``` |
| 52 | +ConfidenceLexical ("LEXICAL", 0.6) — regex / textual pattern |
| 53 | +ConfidenceSyntactic ("SYNTACTIC", 0.8) — AST / parse-tree match |
| 54 | +ConfidenceResolved ("RESOLVED", 0.95) — SymbolResolver cross-file resolution |
| 55 | +``` |
| 56 | + |
| 57 | +In `mergeNode`, the higher-confidence node wins. The donor only fills properties the survivor doesn't already have (so a Spring detector's `framework=spring` stamp can't be overwritten by a generic detector's lower-confidence emission). |
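
The merge rule can be illustrated with a small sketch. The `node` type and `merge` function below are hypothetical simplifications, not the real `mergeNode`; the tie-breaking choice (first argument wins) is also an assumption.

```go
package main

import "fmt"

// node is a simplified, hypothetical stand-in for the real graph node type.
type node struct {
	ID         string
	Confidence float64
	Props      map[string]string
}

// merge keeps the higher-confidence node as the survivor. The other node,
// the donor, only contributes properties the survivor does not have.
// On a tie the first argument survives (an assumption of this sketch).
func merge(a, b node) node {
	survivor, donor := a, b
	if b.Confidence > a.Confidence {
		survivor, donor = b, a
	}
	merged := node{ID: survivor.ID, Confidence: survivor.Confidence, Props: map[string]string{}}
	for k, v := range survivor.Props {
		merged.Props[k] = v
	}
	// Map writes are order-independent, so ranging unsorted is safe here.
	for k, v := range donor.Props {
		if _, exists := merged.Props[k]; !exists {
			merged.Props[k] = v
		}
	}
	return merged
}

func main() {
	spring := node{ID: "n1", Confidence: 0.8, Props: map[string]string{"framework": "spring"}}
	generic := node{ID: "n1", Confidence: 0.6, Props: map[string]string{"framework": "unknown", "lang": "java"}}
	m := merge(generic, spring)
	fmt.Println(m.Confidence, m.Props["framework"], m.Props["lang"])
}
```

The result is the same whichever order the two nodes arrive in, unless their confidences tie.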
| 58 | + |
| 59 | +### Phantom edge drop |
| 60 | + |
| 61 | +Edges with endpoints not in the node set get dropped at `Snapshot()`. Detectors emitting imports / depends-on edges across files must explicitly create the anchor nodes: |
| 62 | + |
| 63 | +- `base.EnsureFileAnchor(ctx, lang)` — emits a `<lang>:file:<path>` node |
| 64 | +- `base.EnsureExternalAnchor(ctx, lang, name)` — emits a `<lang>:external:<name>` node |
| 65 | + |
| 66 | +See [`internal/detector/base/imports_helpers.go`](internal/detector/base/) and the gotcha note in [`docs/10-known-risks-and-todos.md`](docs/10-known-risks-and-todos.md). |
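
To show why the anchor nodes matter, here is a minimal sketch of a phantom-edge filter. The `edge` type and `dropPhantomEdges` are hypothetical and far simpler than the real `Snapshot()`; only the ID shapes `<lang>:file:<path>` and `<lang>:external:<name>` come from the text above.

```go
package main

import (
	"fmt"
	"sort"
)

// edge is a simplified, hypothetical stand-in for the real edge type.
type edge struct{ From, To string }

// dropPhantomEdges keeps only edges whose two endpoints are known nodes and
// returns them sorted, so the result does not depend on emission order.
func dropPhantomEdges(nodeIDs []string, edges []edge) []edge {
	known := make(map[string]struct{}, len(nodeIDs))
	for _, id := range nodeIDs {
		known[id] = struct{}{}
	}
	kept := make([]edge, 0, len(edges))
	for _, e := range edges {
		_, fromOK := known[e.From]
		_, toOK := known[e.To]
		if fromOK && toOK {
			kept = append(kept, e)
		}
	}
	sort.Slice(kept, func(i, j int) bool {
		if kept[i].From != kept[j].From {
			return kept[i].From < kept[j].From
		}
		return kept[i].To < kept[j].To
	})
	return kept
}

func main() {
	nodes := []string{"java:file:src/App.java", "java:external:org.slf4j.Logger"}
	edges := []edge{
		{"java:file:src/App.java", "java:external:org.slf4j.Logger"},
		{"java:file:src/App.java", "java:external:com.example.Missing"}, // no anchor node
	}
	fmt.Println(len(dropPhantomEdges(nodes, edges)))
}
```

The second edge disappears without any error, which is why a detector that skips the anchor helpers can lose its import edges silently.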
| 67 | + |
| 68 | +## Build / test / run commands |
| 69 | + |
| 70 | +```bash |
| 71 | +# Build |
| 72 | +CGO_ENABLED=1 go build -o /usr/local/bin/codeiq ./cmd/codeiq |
| 73 | + |
| 74 | +# Test — full suite (884+ tests, ~30s) |
| 75 | +CGO_ENABLED=1 go test ./... -count=1 |
| 76 | + |
| 77 | +# Race detector (CI-equivalent) |
| 78 | +CGO_ENABLED=1 go test ./... -race -count=1 |
| 79 | + |
| 80 | +# Single package |
| 81 | +CGO_ENABLED=1 go test ./internal/detector/jvm/java/... -count=1 |
| 82 | + |
| 83 | +# Static analysis (mirrors go-ci.yml) |
| 84 | +go vet ./... |
| 85 | +"$(go env GOPATH)/bin/staticcheck" ./... # honnef.co/go/tools@2025.1.1 |
| 86 | +"$(go env GOPATH)/bin/gosec" -exclude=G104,G115,G202,G204,G301,G304,G306,G401,G404,G501 ./... |
| 87 | +"$(go env GOPATH)/bin/govulncheck" ./... |
| 88 | + |
| 89 | +# Smoke: index + enrich + stats on the canonical fixture |
| 90 | +codeiq index testdata/fixture-minimal |
| 91 | +codeiq enrich testdata/fixture-minimal |
| 92 | +codeiq stats testdata/fixture-minimal |
| 93 | + |
| 94 | +# MCP wiring for Claude Code / Cursor |
| 95 | +codeiq mcp /path/to/repo |
| 96 | +``` |
| 97 | + |
| 98 | +## Layout |
| 99 | + |
| 100 | +``` |
| 101 | +codeiq/ |
| 102 | +├── cmd/codeiq/main.go — entry; 5-line shim into internal/cli |
| 103 | +├── cmd/extcheck/main.go — build-time helper (Inference) |
| 104 | +├── internal/ |
| 105 | +│ ├── analyzer/ — index + enrich pipelines, GraphBuilder, ServiceDetector |
| 106 | +│ ├── buildinfo/ — Version/Commit/Date with debug.BuildInfo fallback |
| 107 | +│ ├── cache/ — SQLite analysis cache (5 tables, CacheVersion=6) |
| 108 | +│ ├── cli/ — cobra subcommands + detectors_register.go CHOKE POINT |
| 109 | +│ ├── detector/ — 100 detectors organized by family |
| 110 | +│ │ ├── auth/ csharp/ frontend/ generic/ golang/ iac/ |
| 111 | +│ │ ├── jvm/java/ jvm/kotlin/ jvm/scala/ |
| 112 | +│ │ ├── markup/ proto/ python/ script/shell/ sql/ |
| 113 | +│ │ ├── structured/ systems/{cpp,rust}/ typescript/ |
| 114 | +│ │ └── base/ — shared helpers (NOT detectors) |
| 115 | +│ ├── flow/ — architecture-flow diagram engine |
| 116 | +│ ├── graph/ — Kuzu facade + FTS + mutation gate |
| 117 | +│ ├── intelligence/ — Lexical enricher + per-language extractors |
| 118 | +│ ├── mcp/ — MCP server + 10 tools |
| 119 | +│ ├── model/ — CodeNode, CodeEdge, NodeKind, EdgeKind, Confidence, Layer |
| 120 | +│ ├── parser/ — tree-sitter + structured parsers |
| 121 | +│ ├── query/ — service / topology / stats / dead-code Cypher templates |
| 122 | +│ └── review/ — PR-review pipeline (diff + Ollama) |
| 123 | +├── parity/ — parity harness (build tag `parity`); mostly idle |
| 124 | +├── testdata/ — fixture-minimal, fixture-multi-lang |
| 125 | +├── scripts/ — release / git-setup shell helpers |
| 126 | +├── .github/workflows/ — go-ci, perf-gate, release-go, release-darwin, security, scorecard |
| 127 | +├── .goreleaser.yml — Goreleaser v2 (CGO multi-arch + Cosign + Syft) |
| 128 | +├── go.mod / go.sum |
| 129 | +├── docs/                                — Full reference doc tree (its README equivalent is the sibling README.md) |
| 130 | +├── CLAUDE.md — this file |
| 131 | +├── AGENTS.md — short pointer to CLAUDE.md (Inference, may be regenerated) |
| 132 | +└── README.md — user-facing entry |
| 133 | +``` |
| 134 | + |
| 135 | +## Gotchas (kept terse — full list in [`docs/10-known-risks-and-todos.md`](docs/10-known-risks-and-todos.md)) |
| 136 | + |
| 137 | +### Build / install |
| 138 | + |
| 139 | +- **CGO mandatory.** `CGO_ENABLED=0` fails at link time. Kuzu, SQLite, tree-sitter all CGO. |
| 140 | +- **Module is at repo root.** Post-PR-#162 hoist. Stale instructions saying `cd go && go build` are wrong. |
| 141 | +- **`go install …@latest` may resolve to a poisoned version.** Deleted tags (`v0.1.0`, `v0.3.0`, `v1.0.0`) live on at `proxy.golang.org` with old layouts. Use an explicit `@v0.4.1` (or a later version whose tag name has never been used before). |
| 142 | + |
| 143 | +### Pipeline |
| 144 | + |
| 145 | +- **Detector blank-import is mandatory.** Forget [`detectors_register.go`](internal/cli/detectors_register.go) and the family ships dead. `codeiq plugins list` is the quick check. |
| 146 | +- **Determinism over all else.** Map iteration without sort = silent regression. Determinism tests will catch you. |
| 147 | +- **Phantom edges drop at Snapshot.** Use `base.EnsureFileAnchor` / `EnsureExternalAnchor`. |
| 148 | + |
| 149 | +### Kuzu 0.11.3 (current) |
| 150 | + |
| 151 | +- **Native FTS bundled.** `INSTALL fts` is a no-op when bundled. `CALL CREATE_FTS_INDEX('<table>', '<name>', [cols])` + `CALL QUERY_FTS_INDEX('<table>', '<name>', '<query>')` work. |
| 152 | +- **Parameterized `LIMIT $lim` / `SKIP $skip`** — use them. The old `fmt.Sprintf("LIMIT %d", n)` pattern is gone after PR #159. |
| 153 | +- **`[]string` accepted directly for `IN $param`.** The old `stringsToAny` widener is gone (PR #159). |
| 154 | +- **Mutation gate allow-lists `CALL QUERY_FTS_INDEX`.** Write-side `CREATE_FTS_INDEX` / `DROP_FTS_INDEX` stay blocked under `OpenReadOnly`. |
| 155 | +- **Recursive pattern upper bound is still literal-only.** `[*1..N]` — `N` must be inline. We use `fmt.Sprintf` here; depth comes from a clamped `--max-depth` (default 10). |
| 156 | +- **`EXISTS { … }` subqueries don't see outer-scope `$param`.** Inline static lists as rel-pattern alternations. |
| 157 | +- **List comprehension on path nodes is broken.** Use `properties(nodes(p), 'id')`, not `[n IN nodes(p) | n.id]`. |
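
The literal-only upper bound means the depth has to be formatted into the query text, which is only safe when the value is a clamped integer. The sketch below assumes a hypothetical ceiling `maxAllowedDepth` and hypothetical helper names; the query text is illustrative and does not use the real schema.

```go
package main

import "fmt"

// maxAllowedDepth is a hypothetical ceiling chosen for this sketch.
const maxAllowedDepth = 20

// clampDepth forces the requested depth into [1, maxAllowedDepth].
func clampDepth(requested int) int {
	if requested < 1 {
		return 1
	}
	if requested > maxAllowedDepth {
		return maxAllowedDepth
	}
	return requested
}

// boundedPattern renders the recursive bound inline. Only a clamped integer
// is ever formatted into the query; all other values stay as parameters.
func boundedPattern(depth int) string {
	return fmt.Sprintf("[*1..%d]", clampDepth(depth))
}

func main() {
	query := "MATCH p = (a)-" + boundedPattern(10) + "->(b) " +
		"WHERE a.id = $id RETURN properties(nodes(p), 'id') LIMIT $lim"
	fmt.Println(query)
}
```

Because only an integer passes through `fmt.Sprintf`, user-supplied text never reaches the query string.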
| 158 | + |
| 159 | +### Bulk-load CSV |
| 160 | + |
| 161 | +- **`DELIM='|'` + `QUOTE='"'` + `ESCAPE='"'`** in every Kuzu COPY. Required for RFC-4180 round-trip from Go's `csv.Writer`. Three production bugs in series taught us this (#150 commas, #153 pipes inside fields). |
| 162 | +- **Service IDs are path-qualified.** `service:<dir>:<name>`. Two modules sharing a name don't collide on Kuzu PK (#151). |
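
The writer side of that round-trip can be sketched with the standard library alone. `encodeRow` and `decodeRow` are hypothetical helper names; the point is that `csv.Writer` with `Comma = '|'` quotes fields containing pipes and doubles embedded quotes, which is what the `QUOTE`/`ESCAPE` settings above expect.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"strings"
)

// encodeRow writes one record using '|' as the delimiter. csv.Writer quotes
// any field containing the delimiter, a quote, or a newline, and escapes
// embedded quotes by doubling them.
func encodeRow(fields []string) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	w.Comma = '|'
	if err := w.Write(fields); err != nil {
		return "", err
	}
	w.Flush()
	if err := w.Error(); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// decodeRow parses a single '|'-delimited record back into its fields.
func decodeRow(line string) ([]string, error) {
	r := csv.NewReader(strings.NewReader(line))
	r.Comma = '|'
	return r.Read()
}

func main() {
	line, err := encodeRow([]string{"svc:a", "name|with|pipes", `say "hi"`})
	if err != nil {
		panic(err)
	}
	fmt.Print(line)
	fields, err := decodeRow(line)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(fields))
}
```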
| 163 | + |
| 164 | +### TOML quoted keys |
| 165 | + |
| 166 | +- **`unquote()` on both the key AND the section header.** Airflow's `.cherry_picker.toml` had `"check_sha" = "..."` which used to ship as `"check_sha"` (with quotes) into node IDs. Fixed in PR #152. |
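
As a rough illustration of the fix, the sketch below strips one pair of matching surrounding quotes. It is not the repository's `unquote()`: the name `stripSurroundingQuotes` is hypothetical and escape sequences inside the quotes are ignored.

```go
package main

import "fmt"

// stripSurroundingQuotes removes one pair of matching single or double
// quotes around s. Anything else is returned unchanged. Escape sequences
// inside the quotes are not interpreted in this sketch.
func stripSurroundingQuotes(s string) string {
	if len(s) >= 2 {
		first, last := s[0], s[len(s)-1]
		if first == last && (first == '"' || first == '\'') {
			return s[1 : len(s)-1]
		}
	}
	return s
}

func main() {
	// The same helper would apply to a key and to a section header name.
	for _, raw := range []string{`"check_sha"`, `check_sha`, `'tool.name'`} {
		fmt.Println(stripSurroundingQuotes(raw))
	}
}
```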
| 167 | + |
| 168 | +### MCP |
| 169 | + |
| 170 | +- **MCP SDK v1.6 quirks:** |
| 171 | + - No `NewStdioTransport(in, out)` helper. `StdioTransport{}` zero-value binds `os.Stdin`/`os.Stdout`. Tests use `NewInMemoryTransports()`. |
| 172 | + - `Server.AddTool(t *Tool, h ToolHandler)` — two args, not aggregate. |
| 173 | + - `CallToolRequest.Params` is `*CallToolParamsRaw{Arguments json.RawMessage}`. The wrapper in [`internal/mcp/tool.go`](internal/mcp/tool.go) unmarshals once. |
| 174 | + - ToolHandler returns get JSON-marshaled by the SDK. **Special-case `string` returns** in `asSDKTool` so the Mermaid/DOT string from `generate_flow` doesn't double-encode. |
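
The string special case in the last bullet can be sketched as below. `encodeResult` is a hypothetical name and the real `asSDKTool` may be structured differently; the sketch only shows why a plain string should bypass `json.Marshal`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encodeResult turns a handler return value into response text. Strings are
// passed through untouched; everything else is JSON-encoded once.
func encodeResult(v any) (string, error) {
	if s, ok := v.(string); ok {
		return s, nil
	}
	b, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	diagram := "graph TD\n  A-->B"
	passthrough, _ := encodeResult(diagram)
	marshaled, _ := json.Marshal(diagram)
	fmt.Println(passthrough)       // usable as a diagram
	fmt.Println(string(marshaled)) // quoted, with the newline and '>' escaped
}
```

Marshaling the diagram would wrap it in quotes and escape the newline and `>`, so the client would receive a JSON string literal rather than Mermaid or DOT text.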
| 175 | + |
| 176 | +### Release pipeline |
| 177 | + |
| 178 | +- **`draft: true`** in `.goreleaser.yml` — every release lands as a draft, needs `gh release edit --draft=false`. |
| 179 | +- **`release-darwin.yml` polls `release-go`** for 15 min with early-bail on upstream failure (PR #165 raised the budget from 90s). |
| 180 | +- **Never re-use a deleted tag name.** `proxy.golang.org` caches version content immutably. |
| 181 | + |
| 182 | +## Adding a new detector |
| 183 | + |
| 184 | +1. Create `internal/detector/<family>/<name>.go`: |
| 185 | + ```go |
| 186 | + package <family> |
| 187 | + |
| 188 | + import ( |
| 189 | + "github.com/randomcodespace/codeiq/internal/detector" |
| 190 | + "github.com/randomcodespace/codeiq/internal/detector/base" |
| 191 | + "github.com/randomcodespace/codeiq/internal/model" |
| 192 | + ) |
| 193 | + |
| 194 | + type MyDetector struct{} |
| 195 | + |
| 196 | + func NewMyDetector() *MyDetector { return &MyDetector{} } |
| 197 | + |
| 198 | + func (MyDetector) Name() string { return "my_detector" } |
| 199 | + func (MyDetector) SupportedLanguages() []string { return []string{"java"} } |
| 200 | + func (MyDetector) DefaultConfidence() model.Confidence { return base.RegexDetectorDefaultConfidence } |
| 201 | + |
| 202 | + func init() { detector.RegisterDefault(NewMyDetector()) } |
| 203 | + |
| 204 | + func (MyDetector) Detect(ctx *detector.Context) *detector.Result { |
| 205 | +       return detector.EmptyResult() // TODO: pattern matching; return detector.ResultOf(nodes, edges) on a match |
| 206 | + } |
| 207 | + ``` |
| 208 | + |
| 209 | +2. If the family `<family>/` is **new** (no detector lived there before), blank-import it in [`internal/cli/detectors_register.go`](internal/cli/detectors_register.go). |
| 210 | + |
| 211 | +3. Write `<name>_test.go` next to it. Three test cases required: |
| 212 | + - Positive match |
| 213 | + - Negative match (avoids false positives) |
| 214 | + - Determinism (run twice, assert byte-identical output) |
| 215 | + |
| 216 | +4. `CGO_ENABLED=1 go test ./internal/detector/<family>/... -count=1` |
| 217 | + |
| 218 | +5. Smoke check: `codeiq plugins list | grep my_detector` should show the new detector. |
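
The determinism case from step 3 boils down to running the same function twice and comparing bytes. The sketch below is generic and makes no assumption about the detector API: `isDeterministic` and `render` are hypothetical names, and in a real test the closure would call the detector and serialize its result.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// isDeterministic runs produce twice and reports whether both runs yield
// byte-identical output.
func isDeterministic(produce func() ([]byte, error)) (bool, error) {
	first, err := produce()
	if err != nil {
		return false, err
	}
	second, err := produce()
	if err != nil {
		return false, err
	}
	return bytes.Equal(first, second), nil
}

// render serializes a map with sorted keys. It stands in for whatever a
// detector test would serialize.
func render(m map[string]int) []byte {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var buf bytes.Buffer
	for _, k := range keys {
		fmt.Fprintf(&buf, "%s=%d\n", k, m[k])
	}
	return buf.Bytes()
}

func main() {
	data := map[string]int{"b": 2, "a": 1, "c": 3}
	ok, err := isDeterministic(func() ([]byte, error) { return render(data), nil })
	fmt.Println(ok, err)
}
```

Two runs can match by luck when the unordered data set is small, so looping the comparison several times makes such a test stricter.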
| 219 | + |
| 220 | +## Adding a new MCP tool mode |
| 221 | + |
| 222 | +If extending one of the 6 consolidated tools with a new mode: |
| 223 | + |
| 224 | +1. Edit the relevant tool builder in [`internal/mcp/tools_consolidated.go`](internal/mcp/tools_consolidated.go). |
| 225 | +2. Add a parity-test entry in [`internal/mcp/tools_consolidated_parity_test.go`](internal/mcp/tools_consolidated_parity_test.go) covering arg-name mapping to the underlying narrow handler. |
| 226 | +3. Update [`docs/04-main-flows.md`](docs/04-main-flows.md) MCP tool table. |
| 227 | + |
| 228 | +If adding a wholly new top-level MCP tool: |
| 229 | + |
| 230 | +1. Add a `toolXxx(d) Tool` builder somewhere under `internal/mcp/`. |
| 231 | +2. Register it in `RegisterGraphUserFacing` / `RegisterConsolidated` / `RegisterFlow` (in [`internal/cli/mcp.go`](internal/cli/mcp.go) → `registerAllTools`). |
| 232 | +3. Write an integration test in [`internal/mcp/integration_test.go`](internal/mcp/integration_test.go). |
| 233 | + |
| 234 | +## Permission discipline |
| 235 | + |
| 236 | +- **Never commit unless the user explicitly asks.** Agent-generated `*.md` files (plans, scratchpad) must be in `.gitignore` before any push. |
| 237 | +- **Never push to `main` directly.** Always via PR. |
| 238 | +- **Never bypass branch protection** with `gh pr merge --admin`. `go-ci.yml` and `security.yml` are required for a reason. |
| 239 | +- **Never `git tag --force` a deleted version name.** `proxy.golang.org` cache poison. |
| 240 | +- **Always use `t.TempDir()` in tests.** No test should write outside its tempdir. |
| 241 | +- **For destructive ops** (`rm -rf`, `git push --delete`, `gh release delete`, `git reset --hard`): ask first, unless the operator has explicitly authorized the operation. |
| 242 | + |
| 243 | +## When in doubt |
| 244 | + |
| 245 | +- Read [`docs/11-agent-handoff.md`](docs/11-agent-handoff.md). |
| 246 | +- Run the smoke test on `testdata/fixture-minimal` after any pipeline change. |
| 247 | +- Use `git log -p --since="1 month"` to learn the recent change pattern. |
| 248 | +- The user values terse output. Skip preamble. Show the change + verification command. |