Commit 669b7c7

aksOps and claude committed
docs: complete brownfield reference documentation
User-requested full reference-doc set ahead of stopping codeiq work with Claude Code. Replaces the doc-wipe slate from PR #168 with a coherent, source-grounded reference tree.

Created (15 files / ~2,200 lines / ~180 KB):

* README.md: user-facing entry + badges
* CLAUDE.md: agent-facing repo guide
* docs/00-project-overview.md: what / who / status
* docs/01-local-setup.md: build / test / common issues
* docs/02-architecture.md: components + tradeoffs
* docs/03-code-map.md: directory-by-directory tour
* docs/04-main-flows.md: index / enrich / mcp / review
* docs/05-configuration.md: env / flags / config files
* docs/06-data-model.md: Kuzu + SQLite schemas + taxonomy
* docs/07-integrations.md: Ollama + Sigstore, nothing else
* docs/08-testing.md: strategy + fixtures + perf-gate
* docs/09-build-deploy-release.md: Goreleaser + cosign keyless
* docs/10-known-risks-and-todos.md: gotchas / debt / security-sensitive areas
* docs/11-agent-handoff.md: one-stop brief for the next agent
* docs/adr/0001-current-architecture.md: why the shape is what it is

Grounding policy:

* Every concrete claim points at a file, using `path/to/file:line` form where line-level detail matters.
* Anything not directly verified is marked `Inference`.
* Anything unknown is marked `Unknown`.
* No code changes. Only docs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
1 parent 61230ae commit 669b7c7

15 files changed

Lines changed: 2169 additions & 0 deletions

CLAUDE.md

Lines changed: 248 additions & 0 deletions
# CLAUDE.md — codeiq
> **Repo-specific instructions for Claude Code (and any AI coding agent with similar tooling).** Read this in full before making changes. For the full doc set see [`docs/`](docs/); for the one-stop agent brief see [`docs/11-agent-handoff.md`](docs/11-agent-handoff.md).

## What this project is

A deterministic code-knowledge-graph CLI plus a stdio MCP server. It is pure static analysis: no AI runs anywhere in the index/enrich pipeline. It ships as a single static Go binary, and CGO is mandatory.

- **Module path:** `github.com/randomcodespace/codeiq`
- **Entry:** [`cmd/codeiq/main.go`](cmd/codeiq/main.go) → [`internal/cli/root.go`](internal/cli/root.go)
- **Tech stack pinned in [`go.mod`](go.mod):** Go 1.25.10 toolchain, Kuzu 0.11.3, SQLite 1.14.44, MCP SDK v1.6, tree-sitter (smacker), cobra 1.10.2.
## Architecture at a glance

1. `codeiq index <path>` walks files (via `git ls-files`, with a fallback when that is unavailable), parses them with tree-sitter, structured parsers, or regex, runs **100 detectors**, dedup-merges the results into a graph, and writes it to the **SQLite cache** at `.codeiq/cache/codeiq.sqlite`.
2. `codeiq enrich <path>` loads the cache; runs the linkers, LayerClassifier, intelligence extractors, and ServiceDetector; bulk-loads the result into **Kuzu** at `.codeiq/graph/codeiq.kuzu/`; and builds two FTS indexes (`code_node_label_fts`, `code_node_lexical_fts`).
3. `codeiq mcp <path>` opens Kuzu read-only and serves the MCP protocol (JSON-RPC over stdio) with **10 tools**: 6 mode-driven tools plus `run_cypher`, `read_file`, `generate_flow`, and `review_changes`.
4. `codeiq review` is the only LLM touchpoint: it sends the diff plus graph evidence to Ollama (`localhost:11434` by default; setting `OLLAMA_API_KEY` switches to the cloud endpoint).
5. Every other subcommand (`stats`, `find`, `query`, `cypher`, `flow`, `graph`, `topology`, `cache`, `plugins`, `version`) is a thin, read-only consumer of the Kuzu store.
## Critical rules

### Read-only MCP

The MCP server (`codeiq mcp`) is strictly read-only. `run_cypher` enforces this via [`MutationKeyword`](internal/graph/mutation.go), a regex gate that rejects CREATE/DELETE/DETACH/SET/REMOVE/MERGE/DROP/FOREACH/LOAD CSV/COPY and any CALL outside the allow-list (`db.*`, `show_*`, `table_*`, `current_setting`, `table_info`, **`query_fts_index`**). `read_file` is path-sandboxed to the indexed root.

Belt-and-braces: Kuzu is also opened with `OpenReadOnly` at the engine level.
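For illustration, here is a minimal sketch of a keyword gate in the spirit of `MutationKeyword`, assuming a case-insensitive, whole-word regex over the raw query text. The helpers `mutationRe`, `callRe`, `allowedCall`, and `isReadOnly` are hypothetical, and the table and index names in the sample queries are placeholders. The real gate in internal/graph/mutation.go may handle string literals, comments, and casing differently.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// mutationRe matches write keywords as whole words, case-insensitively.
// Hypothetical: it mirrors the keyword list above, not the real source.
var mutationRe = regexp.MustCompile(`(?i)\b(CREATE|DELETE|DETACH|SET|REMOVE|MERGE|DROP|FOREACH|LOAD\s+CSV|COPY)\b`)

// callRe captures the procedure name that follows each CALL.
var callRe = regexp.MustCompile(`(?i)\bCALL\s+([A-Za-z_][A-Za-z0-9_.]*)`)

// allowedCall reports whether a CALLed procedure is on the read-only allow-list.
func allowedCall(proc string) bool {
	p := strings.ToLower(proc)
	return strings.HasPrefix(p, "db.") ||
		strings.HasPrefix(p, "show_") ||
		strings.HasPrefix(p, "table_") ||
		p == "current_setting" ||
		p == "table_info" ||
		p == "query_fts_index"
}

// isReadOnly returns an error naming the first rejected construct, or nil.
func isReadOnly(query string) error {
	if kw := mutationRe.FindString(query); kw != "" {
		return fmt.Errorf("mutation keyword rejected: %s", kw)
	}
	for _, m := range callRe.FindAllStringSubmatch(query, -1) {
		if !allowedCall(m[1]) {
			return fmt.Errorf("CALL not allow-listed: %s", m[1])
		}
	}
	return nil
}

func main() {
	fmt.Println(isReadOnly("MATCH (n) RETURN n.id LIMIT 5"))                                   // <nil>
	fmt.Println(isReadOnly("MATCH (n) DETACH DELETE n"))                                       // mutation keyword rejected: DETACH
	fmt.Println(isReadOnly("CALL QUERY_FTS_INDEX('CodeNode', 'code_node_label_fts', 'auth')")) // <nil>
	// The \b after CREATE does not match before "_", so this is caught by the CALL allow-list instead.
	fmt.Println(isReadOnly("CALL CREATE_FTS_INDEX('CodeNode', 'idx', ['label'])")) // CALL not allow-listed: CREATE_FTS_INDEX
}
```

A word-boundary regex is conservative, since a string literal containing `SET` is rejected too. It can also miss constructs it was not written for, which is why the engine-level `OpenReadOnly` backstop still matters.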
### Determinism

Same input ⇒ same output, byte for byte. Every detector ships a determinism test. Conventions:

- Never iterate a Go `map` without sorting its keys first.
- `GraphBuilder.Snapshot()` sorts nodes and edges by ID.
- Linker outputs go through `.Sorted()` at the call site.
- Detectors are stateless: no mutable struct fields, method-local state only.
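Here is a minimal sketch of the first convention, using a hypothetical generic helper `sortedKeys` that is not taken from the codeiq source. Go deliberately randomizes map iteration order, so anything emitted while ranging over a map has to go through a sorted key slice first.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys returns the map's keys in ascending order, so output built by
// ranging over them is byte-identical across runs.
func sortedKeys[V any](m map[string]V) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	counts := map[string]int{"python": 3, "go": 7, "java": 5}
	// Ranging over counts directly would visit the keys in a randomized order.
	for _, lang := range sortedKeys(counts) {
		fmt.Printf("%s=%d\n", lang, counts[lang])
	}
	// Always prints go=7, java=5, python=3, one per line.
}
```

On Go 1.23 and later, `slices.Sorted(maps.Keys(m))` is an equivalent one-liner.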
### Detector registration choke point

Adding a new detector under `internal/detector/<dir>/` is **not enough**. The package leaf must be blank-imported in [`internal/cli/detectors_register.go`](internal/cli/detectors_register.go). Without that line nothing imports the package, so it is never compiled into the binary and its `init()` never runs. The binary then ships with no registration for that detector family. This was the #1 silent-failure bug during the Java→Go port: 15 language families silently produced 0 nodes until the auto-import check was added to the dev workflow.
### Goroutine safety

- File I/O and detector dispatch run on a worker pool (`opts.Workers`, default `2 × GOMAXPROCS`).
- Detectors must be stateless. Method-local state only.
- Kuzu reads serialize behind the [`Store.mu`](internal/graph/store.go) mutex: one query at a time.
- The intelligence extractor pool is also bounded at `2 × GOMAXPROCS` to keep the tree-sitter heap under control (Phase A OOM fix).
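The pool shape can be sketched as follows, assuming each per-file job is a pure function of its input. `runPool` and `result` are hypothetical names and do not come from codeiq's implementation. The sketch also sorts results by path at the end, so goroutine scheduling never leaks into the output.

```go
package main

import (
	"fmt"
	"runtime"
	"sort"
	"sync"
)

// result pairs an input path with the output of a stateless per-file function.
type result struct {
	Path  string
	Count int
}

// runPool fans paths out to a fixed number of goroutines and returns the
// results sorted by path. workers <= 0 falls back to 2 × GOMAXPROCS.
func runPool(paths []string, workers int, work func(string) int) []result {
	if workers <= 0 {
		workers = 2 * runtime.GOMAXPROCS(0)
	}
	jobs := make(chan string)
	out := make(chan result)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range jobs {
				out <- result{Path: p, Count: work(p)} // no shared mutable state
			}
		}()
	}
	go func() { // feed jobs, then signal workers to stop
		for _, p := range paths {
			jobs <- p
		}
		close(jobs)
	}()
	go func() { // close out once every worker has returned
		wg.Wait()
		close(out)
	}()

	var results []result
	for r := range out {
		results = append(results, r)
	}
	sort.Slice(results, func(i, j int) bool { return results[i].Path < results[j].Path })
	return results
}

func main() {
	paths := []string{"b.go", "a.py", "c.java"}
	for _, r := range runPool(paths, 0, func(p string) int { return len(p) }) {
		fmt.Println(r.Path, r.Count)
	}
	// Prints a.py 4, b.go 4, c.java 6 on separate lines, in that order every run.
}
```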
### Confidence ladder is monotonic

```
ConfidenceLexical    ("LEXICAL",   0.6)   // regex / textual pattern
ConfidenceSyntactic  ("SYNTACTIC", 0.8)   // AST / parse-tree match
ConfidenceResolved   ("RESOLVED",  0.95)  // SymbolResolver cross-file resolution
```

In `mergeNode`, the higher-confidence node wins. The donor only fills in properties the survivor doesn't already have, so a Spring detector's `framework=spring` stamp can't be overwritten by a generic detector's lower-confidence emission.
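Here is a minimal sketch of the merge rule. It uses a hypothetical, trimmed-down `node` type with a bare numeric confidence in place of `model.CodeNode` and the named `Confidence` values. In this sketch a tie keeps the first argument. This guide does not say how `mergeNode` actually breaks ties.

```go
package main

import "fmt"

// Confidence is a bare score here; the real type also carries a name.
type Confidence float64

const (
	ConfidenceLexical   Confidence = 0.6
	ConfidenceSyntactic Confidence = 0.8
	ConfidenceResolved  Confidence = 0.95
)

// node is a hypothetical stand-in for model.CodeNode.
type node struct {
	ID         string
	Confidence Confidence
	Props      map[string]string
}

// mergeNodes keeps the higher-confidence node as the survivor and copies in
// only those donor properties the survivor does not already have.
func mergeNodes(a, b node) node {
	survivor, donor := a, b
	if b.Confidence > a.Confidence {
		survivor, donor = b, a
	}
	merged := node{ID: survivor.ID, Confidence: survivor.Confidence, Props: map[string]string{}}
	// Ranging over maps is safe here: we only write into another map.
	for k, v := range survivor.Props {
		merged.Props[k] = v
	}
	for k, v := range donor.Props {
		if _, ok := merged.Props[k]; !ok {
			merged.Props[k] = v
		}
	}
	return merged
}

func main() {
	generic := node{ID: "java:class:Foo", Confidence: ConfidenceLexical,
		Props: map[string]string{"framework": "generic", "file": "Foo.java"}}
	spring := node{ID: "java:class:Foo", Confidence: ConfidenceSyntactic,
		Props: map[string]string{"framework": "spring"}}

	m := mergeNodes(generic, spring)
	fmt.Println(m.Confidence, m.Props["framework"], m.Props["file"]) // 0.8 spring Foo.java
}
```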
### Phantom edge drop

Edges whose endpoints are not in the node set are dropped at `Snapshot()`. Detectors that emit imports / depends-on edges across files must explicitly create the anchor nodes:

- `base.EnsureFileAnchor(ctx, lang)` emits a `<lang>:file:<path>` node.
- `base.EnsureExternalAnchor(ctx, lang, name)` emits a `<lang>:external:<name>` node.

See [`internal/detector/base/imports_helpers.go`](internal/detector/base/) and the gotcha note in [`docs/10-known-risks-and-todos.md`](docs/10-known-risks-and-todos.md).
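Here is a simplified sketch of the drop-then-sort step, assuming nodes and edges carry string IDs and edges reference nodes by ID. The `snapshot` function and the `Node`/`Edge` types are hypothetical stand-ins for `GraphBuilder.Snapshot()` and the model types. The anchor IDs follow the `<lang>:file:<path>` and `<lang>:external:<name>` formats above.

```go
package main

import (
	"fmt"
	"sort"
)

type Node struct{ ID string }

type Edge struct{ ID, Source, Target string }

// snapshot drops edges with an endpoint missing from the node set, then
// returns nodes and surviving edges sorted by ID.
func snapshot(nodes []Node, edges []Edge) ([]Node, []Edge) {
	known := make(map[string]bool, len(nodes))
	for _, n := range nodes {
		known[n.ID] = true
	}
	kept := make([]Edge, 0, len(edges))
	for _, e := range edges {
		if known[e.Source] && known[e.Target] {
			kept = append(kept, e)
		}
	}
	ns := append([]Node(nil), nodes...) // copy so the caller's slice is untouched
	sort.Slice(ns, func(i, j int) bool { return ns[i].ID < ns[j].ID })
	sort.Slice(kept, func(i, j int) bool { return kept[i].ID < kept[j].ID })
	return ns, kept
}

func main() {
	nodes := []Node{{ID: "python:file:app.py"}}
	edges := []Edge{{ID: "e1", Source: "python:file:app.py", Target: "python:external:requests"}}

	_, kept := snapshot(nodes, edges)
	fmt.Println(len(kept)) // 0: the external target was never emitted

	// Emitting the anchor node first keeps the edge.
	nodes = append(nodes, Node{ID: "python:external:requests"})
	_, kept = snapshot(nodes, edges)
	fmt.Println(len(kept)) // 1
}
```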
## Build / test / run commands

```bash
# Build
CGO_ENABLED=1 go build -o /usr/local/bin/codeiq ./cmd/codeiq

# Test: full suite (884+ tests, ~30s)
CGO_ENABLED=1 go test ./... -count=1

# Race detector (CI-equivalent)
CGO_ENABLED=1 go test ./... -race -count=1

# Single package
CGO_ENABLED=1 go test ./internal/detector/jvm/java/... -count=1

# Static analysis (mirrors go-ci.yml)
go vet ./...
"$(go env GOPATH)/bin/staticcheck" ./...   # honnef.co/go/tools@2025.1.1
"$(go env GOPATH)/bin/gosec" -exclude=G104,G115,G202,G204,G301,G304,G306,G401,G404,G501 ./...
"$(go env GOPATH)/bin/govulncheck" ./...

# Smoke: index + enrich + stats on the canonical fixture
codeiq index testdata/fixture-minimal
codeiq enrich testdata/fixture-minimal
codeiq stats testdata/fixture-minimal

# MCP wiring for Claude Code / Cursor
codeiq mcp /path/to/repo
```
## Layout

```
codeiq/
├── cmd/codeiq/main.go    # entry; 5-line shim into internal/cli
├── cmd/extcheck/main.go  # build-time helper (Inference)
├── internal/
│   ├── analyzer/         # index + enrich pipelines, GraphBuilder, ServiceDetector
│   ├── buildinfo/        # Version/Commit/Date with debug.BuildInfo fallback
│   ├── cache/            # SQLite analysis cache (5 tables, CacheVersion=6)
│   ├── cli/              # cobra subcommands + detectors_register.go CHOKE POINT
│   ├── detector/         # 100 detectors organized by family
│   │   ├── auth/ csharp/ frontend/ generic/ golang/ iac/
│   │   ├── jvm/java/ jvm/kotlin/ jvm/scala/
│   │   ├── markup/ proto/ python/ script/shell/ sql/
│   │   ├── structured/ systems/{cpp,rust}/ typescript/
│   │   └── base/         # shared helpers (NOT detectors)
│   ├── flow/             # architecture-flow diagram engine
│   ├── graph/            # Kuzu facade + FTS + mutation gate
│   ├── intelligence/     # lexical enricher + per-language extractors
│   ├── mcp/              # MCP server + 10 tools
│   ├── model/            # CodeNode, CodeEdge, NodeKind, EdgeKind, Confidence, Layer
│   ├── parser/           # tree-sitter + structured parsers
│   ├── query/            # service / topology / stats / dead-code Cypher templates
│   └── review/           # PR-review pipeline (diff + Ollama)
├── parity/               # parity harness (build tag `parity`); mostly idle
├── testdata/             # fixture-minimal, fixture-multi-lang
├── scripts/              # release / git-setup shell helpers
├── .github/workflows/    # go-ci, perf-gate, release-go, release-darwin, security, scorecard
├── .goreleaser.yml       # Goreleaser v2 (CGO multi-arch + Cosign + Syft)
├── go.mod / go.sum
├── docs/                 # full reference doc tree (the sibling README.md doubles as its index)
├── CLAUDE.md             # this file
├── AGENTS.md             # short pointer to CLAUDE.md (Inference; may be regenerated)
└── README.md             # user-facing entry
```
## Gotchas (kept terse; full list in [`docs/10-known-risks-and-todos.md`](docs/10-known-risks-and-todos.md))

### Build / install

- **CGO mandatory.** `CGO_ENABLED=0` fails at link time because Kuzu, SQLite, and tree-sitter all use CGO.
- **Module is at the repo root.** It was hoisted there in PR #162. Stale instructions saying `cd go && go build` are wrong.
- **`go install …@latest` may resolve to a poisoned version.** Deleted tags (`v0.1.0`, `v0.3.0`, `v1.0.0`) live on at `proxy.golang.org` with old layouts. Use an explicit `@v0.4.1` (or a later, never-previously-used version).

### Pipeline

- **Detector blank-import is mandatory.** Forget [`detectors_register.go`](internal/cli/detectors_register.go) and the family ships dead. `codeiq plugins list` is the quick check.
- **Determinism over all else.** Map iteration without sorting is a silent regression; the determinism tests will catch it.
- **Phantom edges drop at Snapshot.** Use `base.EnsureFileAnchor` / `EnsureExternalAnchor`.
### Kuzu 0.11.3 (current)

- **Native FTS is bundled.** `INSTALL fts` is a no-op when bundled. `CALL CREATE_FTS_INDEX('<table>', '<name>', [cols])` and `CALL QUERY_FTS_INDEX('<table>', '<name>', '<query>')` both work.
- **Parameterized `LIMIT $lim` / `SKIP $skip` work, so use them.** The old `fmt.Sprintf("LIMIT %d", n)` pattern was removed in PR #159.
- **`[]string` is accepted directly for `IN $param`.** The old `stringsToAny` widener is gone (PR #159).
- **The mutation gate allow-lists `CALL QUERY_FTS_INDEX`.** Write-side `CREATE_FTS_INDEX` / `DROP_FTS_INDEX` stay blocked under `OpenReadOnly`.
- **The recursive-pattern upper bound is still literal-only.** In `[*1..N]`, `N` must be inline. We use `fmt.Sprintf` here; the depth comes from a clamped `--max-depth` (default 10).
- **`EXISTS { … }` subqueries don't see outer-scope `$param`.** Inline static lists as rel-pattern alternations.
- **List comprehension on path nodes is broken.** Use `properties(nodes(p), 'id')`, not `[n IN nodes(p) | n.id]`.
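Because the recursion bound has to be spliced into the query text, the safe pattern is to clamp the integer first and format it with `%d`, while everything else stays parameterized. This is a sketch under assumptions. The ceiling of 20, the `CodeNode` label, the query shape, and the helpers `clampDepth` and `reachableQuery` are all hypothetical. Only the default depth of 10 comes from this guide.

```go
package main

import "fmt"

const (
	defaultMaxDepth = 10 // documented default for --max-depth
	maxAllowedDepth = 20 // hypothetical ceiling; the real clamp bounds are not documented here
)

// clampDepth bounds a user-supplied depth before it is spliced into Cypher.
// Non-positive input falls back to the default.
func clampDepth(d int) int {
	switch {
	case d <= 0:
		return defaultMaxDepth
	case d > maxAllowedDepth:
		return maxAllowedDepth
	}
	return d
}

// reachableQuery inlines only the clamped integer bound (formatted with %d,
// so no text can be injected) and still passes the start ID as $id.
func reachableQuery(depth int) string {
	return fmt.Sprintf("MATCH (a:CodeNode {id: $id})-[*1..%d]->(b) RETURN DISTINCT b.id", clampDepth(depth))
}

func main() {
	fmt.Println(reachableQuery(3))
	fmt.Println(reachableQuery(0))   // falls back to the default depth of 10
	fmt.Println(reachableQuery(500)) // clamped to the hypothetical ceiling
}
```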
### Bulk-load CSV

- **`DELIM='|'` + `QUOTE='"'` + `ESCAPE='"'`** in every Kuzu COPY. These are required for an RFC-4180 round-trip from Go's `csv.Writer`. Three production bugs in a row taught us this, among them #150 (commas) and #153 (pipes inside fields).
- **Service IDs are path-qualified** as `service:<dir>:<name>`, so two modules sharing a name don't collide on the Kuzu primary key (#151).
### TOML quoted keys

- **Apply `unquote()` to both the key AND the section header.** Airflow's `.cherry_picker.toml` had `"check_sha" = "..."`, which used to ship into node IDs as `"check_sha"` (quotes included). Fixed in PR #152.
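Here is a minimal sketch of the fix, assuming a line-oriented pass over simple `key = value` lines and `[section]` headers. `unquote`, `keyOf`, and `sectionOf` are hypothetical helpers. This version does not decode escape sequences, split dotted keys, or handle a quoted key that itself contains `=`, all of which a real TOML parser must do.

```go
package main

import (
	"fmt"
	"strings"
)

// unquote strips one pair of matching TOML quotes ("basic" or 'literal')
// from a key token. Escape sequences are not decoded.
func unquote(s string) string {
	s = strings.TrimSpace(s)
	if len(s) >= 2 {
		if (s[0] == '"' && s[len(s)-1] == '"') || (s[0] == '\'' && s[len(s)-1] == '\'') {
			return s[1 : len(s)-1]
		}
	}
	return s
}

// keyOf extracts and unquotes the key from a `key = value` line.
func keyOf(line string) string {
	k, _, _ := strings.Cut(line, "=")
	return unquote(k)
}

// sectionOf extracts and unquotes the name from a `[section]` header.
func sectionOf(line string) string {
	name := strings.TrimSuffix(strings.TrimPrefix(strings.TrimSpace(line), "["), "]")
	return unquote(name)
}

func main() {
	fmt.Println(keyOf(`"check_sha" = "abc123"`)) // check_sha
	fmt.Println(sectionOf(`["my section"]`))     // my section
}
```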
### MCP

- **MCP SDK v1.6 quirks:**
  - There is no `NewStdioTransport(in, out)` helper. A zero-value `StdioTransport{}` binds `os.Stdin`/`os.Stdout`. Tests use `NewInMemoryTransports()`.
  - `Server.AddTool(t *Tool, h ToolHandler)` takes two arguments, not a single aggregate.
  - `CallToolRequest.Params` is `*CallToolParamsRaw{Arguments json.RawMessage}`. The wrapper in [`internal/mcp/tool.go`](internal/mcp/tool.go) unmarshals it once.
  - The SDK JSON-marshals ToolHandler return values. **Special-case `string` returns** in `asSDKTool` so the Mermaid/DOT string from `generate_flow` isn't double-encoded.
### Release pipeline

- **`draft: true`** in `.goreleaser.yml`: every release lands as a draft and needs `gh release edit <tag> --draft=false` to publish.
- **`release-darwin.yml` polls `release-go`** for up to 15 min, bailing early if the upstream job fails (PR #165 raised the budget from 90s).
- **Never re-use a deleted tag name.** `proxy.golang.org` caches version content immutably.
## Adding a new detector

1. Create `internal/detector/<family>/<name>.go`:

   ```go
   // Replace <family> with the package name of internal/detector/<family>/.
   package <family>

   import (
       "github.com/randomcodespace/codeiq/internal/detector"
       "github.com/randomcodespace/codeiq/internal/detector/base"
       "github.com/randomcodespace/codeiq/internal/model"
   )

   // MyDetector is stateless: no mutable fields, method-local state only.
   type MyDetector struct{}

   func NewMyDetector() *MyDetector { return &MyDetector{} }

   func (MyDetector) Name() string { return "my_detector" }
   func (MyDetector) SupportedLanguages() []string { return []string{"java"} }
   func (MyDetector) DefaultConfidence() model.Confidence { return base.RegexDetectorDefaultConfidence }

   // init runs only if this package is imported (see detectors_register.go).
   func init() { detector.RegisterDefault(NewMyDetector()) }

   func (MyDetector) Detect(ctx *detector.Context) *detector.Result {
       // Pattern-match over ctx; on a hit, return detector.ResultOf(nodes, edges).
       return detector.EmptyResult()
   }
   ```

2. If the family `<family>/` is **new** (no detector lived there before), blank-import it in [`internal/cli/detectors_register.go`](internal/cli/detectors_register.go).
3. Write `<name>_test.go` next to it. Three test cases are required:
   - Positive match
   - Negative match (guards against false positives)
   - Determinism (run twice, assert byte-identical output)
4. `CGO_ENABLED=1 go test ./internal/detector/<family>/... -count=1`
5. Smoke check: `codeiq plugins list | grep my_detector` should show the new detector.
## Adding a new MCP tool mode

If you are extending one of the 6 consolidated tools with a new mode:

1. Edit the relevant tool builder in [`internal/mcp/tools_consolidated.go`](internal/mcp/tools_consolidated.go).
2. Add a parity-test entry in [`internal/mcp/tools_consolidated_parity_test.go`](internal/mcp/tools_consolidated_parity_test.go) covering the arg-name mapping to the underlying narrow handler.
3. Update the MCP tool table in [`docs/04-main-flows.md`](docs/04-main-flows.md).

If you are adding a wholly new top-level MCP tool:

1. Add a `toolXxx(d) Tool` builder somewhere under `internal/mcp/`.
2. Register it in `RegisterGraphUserFacing` / `RegisterConsolidated` / `RegisterFlow` (see `registerAllTools` in [`internal/cli/mcp.go`](internal/cli/mcp.go)).
3. Write an integration test in [`internal/mcp/integration_test.go`](internal/mcp/integration_test.go).
## Permission discipline

- **Never commit unless the user explicitly asks.** Agent-generated `*.md` files (plans, scratchpads) must be in `.gitignore` before any push.
- **Never push to `main` directly.** Always go through a PR.
- **Never bypass branch protection** with `gh pr merge --admin`. `go-ci.yml` and `security.yml` are required for a reason.
- **Never `git tag --force` a deleted version name.** It poisons the `proxy.golang.org` cache.
- **Always use `t.TempDir()` in tests.** No test should write outside its temp dir.
- **For destructive ops** (`rm -rf`, `git push --delete`, `gh release delete`, `git reset --hard`): ask before acting, unless the operator has explicitly authorized it.
## When in doubt

- Read [`docs/11-agent-handoff.md`](docs/11-agent-handoff.md).
- Run the smoke test on `testdata/fixture-minimal` after any pipeline change.
- Use `git log -p --since="1 month"` to learn the recent change pattern.
- The user values terse output. Skip the preamble; show the change plus the verification command.
