Commit e5fd3fe

aksOps and claude authored
docs: post-cutover + Kuzu 0.11 sweep (#160)
Stale doc references after Phase 6 (Java deletion, #132) and the Kuzu 0.7.1 → 0.11.3 bump (#155 + #159).

- CLAUDE.md / PROJECT_SUMMARY.md: bump Kuzu 0.7.1 → 0.11.3, go-sqlite3 1.14.22 → 1.14.44, cobra to 1.10.2; note native FTS.
- AGENTS.md: rewrite "What this repo is" (no more "REST API"); flip `mvn -B -ntp clean verify` → `go test ./...`; clarify that REST + React SPA were deleted in Phase 6 and won't return.
- SECURITY.md: rewrite scope. Drop the dead JAR / serve / REST API / React UI / H2 / Neo4j Embedded references. New in-scope list covers every codeiq subcommand, the 10 MCP tools (with `run_cypher` mutation gate called out), `.codeiq/cache/` (SQLite) + `.codeiq/graph/` (Kuzu), and `read_file` path sandboxing. Add the security CI workflows (CodeQL, Semgrep, OSV-Scanner, Trivy, Gitleaks, SBOM, Socket Security) + perf-gate to the hardening references.
- CHANGELOG.md: populate [Unreleased] with the OOM-fix saga (PRs #145-#148), the five correctness fixes (#149-#153), the Kuzu 0.7.1 → 0.11.3 bump (#155-#158), the FTS migration (#159), the Dependabot config rewrite (#154), and the enrich CLI knobs.

No code changes.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
1 parent 799be73 commit e5fd3fe

5 files changed

Lines changed: 80 additions & 26 deletions

AGENTS.md

Lines changed: 3 additions & 3 deletions
@@ -4,7 +4,7 @@
 
 ## What this repo is
 
-codeiq is a CLI + read-only server that builds a deterministic code-knowledge graph over a codebase. No AI, no external APIs — pure static analysis. See [`/CLAUDE.md`](CLAUDE.md) for the architecture, package map, pipeline, conventions, and gotchas.
+codeiq is a CLI + read-only stdio MCP server that builds a deterministic code-knowledge graph over a codebase. No AI in the index/enrich pipeline; LLM use is opt-in via `codeiq review`. Single static Go binary (CGO for Kuzu + SQLite). See [`/CLAUDE.md`](CLAUDE.md) for the architecture, package map, pipeline, conventions, and gotchas.
 
 ## Pointers, in priority order
 
@@ -22,9 +22,9 @@ codeiq is a CLI + read-only server that builds a deterministic code-knowledge gr
 - **Sign every commit.** The repo-local config (`scripts/setup-git-signed.sh`) makes this automatic; do not rewrite it.
 - **One logical change per commit.** Conventional-commit subjects (`feat:`, `fix:`, `chore:`, `refactor:`, `test:`, `docs:`, `perf:`).
 - **Squash-merge only.** Branch protection rejects merge commits and force-pushes to `main`.
-- **Tests + jacoco gate must pass.** `mvn -B -ntp clean verify` is the contract.
+- **Tests + race + vet must pass.** `cd go && CGO_ENABLED=1 go test ./... -count=1` is the contract; release CI runs `-race` too. 880+ tests today.
 - **Determinism is non-negotiable.** Same input → same output, byte-for-byte. Any new detector ships with a determinism test.
-- **Read-only serving layer.** MCP and REST API on the `serve` path do not mutate. If you find yourself adding `POST /api/<verb>` that writes, stop and reconsider.
+- **Read-only MCP server.** Tool calls never write to the graph. Index/enrich happen only via the CLI commands `codeiq index` / `codeiq enrich`. The Java reference's REST API + React SPA were deleted in Phase 6 cutover (#132) and will not be reintroduced.
 - **No secrets in code.** Repo-level GitHub Actions secrets only.
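
The read-only rule in the list above is what the `run_cypher` mutation gate is meant to enforce. This commit does not show the gate's code, so the following is only a minimal sketch of how such a check could look in Go. The function `isReadOnly`, the keyword list, and the `CodeNode` table name in the sample queries are assumptions made for illustration. The sketch rejects any query containing a write keyword and any `CALL` whose procedure is not on a small allow-list, so `QUERY_FTS_INDEX` passes while `CREATE_FTS_INDEX` and `DROP_FTS_INDEX` do not.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// writeKeywords matches Cypher clauses that change data or schema.
// The list is illustrative and not claimed to be exhaustive.
var writeKeywords = regexp.MustCompile(
	`(?i)\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|ALTER|COPY|INSTALL|LOAD)\b`)

// callTarget captures the procedure name that follows a CALL clause.
var callTarget = regexp.MustCompile(`(?i)\bCALL\s+([A-Za-z_][A-Za-z0-9_]*)`)

// allowedCalls is a hypothetical allow-list of read-only procedures.
var allowedCalls = map[string]bool{"QUERY_FTS_INDEX": true}

// isReadOnly reports whether query looks safe for a read-only store.
// It works on raw text, so it errs on the side of rejecting.
func isReadOnly(query string) bool {
	if writeKeywords.MatchString(query) {
		return false
	}
	for _, m := range callTarget.FindAllStringSubmatch(query, -1) {
		if !allowedCalls[strings.ToUpper(m[1])] {
			return false
		}
	}
	return true
}

func main() {
	queries := []string{
		"MATCH (n) RETURN n LIMIT 5",
		"CALL QUERY_FTS_INDEX('CodeNode', 'code_node_label_fts', 'auth*') RETURN node.label",
		"CALL CREATE_FTS_INDEX('CodeNode', 'idx', ['label'])",
		"MATCH (n) DETACH DELETE n",
	}
	for _, q := range queries {
		fmt.Printf("%-5v %s\n", isReadOnly(q), q)
	}
}
```

A keyword scan like this is deliberately conservative: it also rejects harmless queries that mention a write keyword inside a string literal. A stricter design would inspect the parsed statement instead of the raw text.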
 
 ## Paperclip / RAN-* coordination

CHANGELOG.md

Lines changed: 50 additions & 0 deletions
@@ -14,6 +14,56 @@ for that specific tag for the per-commit details.
 
 ## [Unreleased]
 
+### Fixed
+
+- `codeiq enrich` survives polyglot codebases at `~/projects/` scale (49k
+  files, 15 GiB host). Pre-fix runs OOM-killed at exit 137; now exits 0
+  with peak RSS 1.8–2.2 GiB. PRs #145, #146, #147, #148.
+- Five enrich pipeline correctness fixes that surfaced at scale (each one
+  blocked the next — landed in order):
+  - PR #149: MCP dispatch arg names in `tools_consolidated` (7 modes were
+    permanently returning `INVALID_INPUT`).
+  - PR #150: pipe-delimited Kuzu COPY staging — JSON property values
+    containing commas (e.g. Python `imports`) no longer break the parser.
+  - PR #151: path-qualified SERVICE node IDs — two modules sharing a name
+    in different folders no longer collide on primary key.
+  - PR #152: TOML detector unquotes quoted keys (e.g. airflow's
+    `.cherry_picker.toml` `"check_sha" = ...`).
+  - PR #153: explicit `QUOTE='"', ESCAPE='"'` on Kuzu COPY so RFC-4180
+    quoting round-trips correctly (Istio EDS cluster names with `|`).
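
The two COPY fixes above (PR #150 and PR #153) both come down to how the staging file is quoted. As a rough illustration only, and not the project's actual staging code, the sketch below uses Go's standard `encoding/csv` package with `|` as the delimiter. That writer wraps a field in double quotes when it contains the delimiter, a double quote, or a line break, and doubles embedded quotes, which is the convention the `QUOTE='"', ESCAPE='"'` options describe. The helper names `stageRows` and `parseRows` and the sample rows are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"strings"
)

// stageRows renders rows as pipe-delimited text with RFC 4180 quoting.
func stageRows(rows [][]string) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	w.Comma = '|' // pipe instead of comma, so JSON commas need no special care
	if err := w.WriteAll(rows); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// parseRows reads the staged text back with the same delimiter.
func parseRows(staged string) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(staged))
	r.Comma = '|'
	return r.ReadAll()
}

func main() {
	rows := [][]string{
		// JSON value with commas and double quotes.
		{"py:app.main", `{"imports":["os","sys"]}`},
		// Value that contains the delimiter itself.
		{"istio:eds", "outbound|80||reviews.default.svc.cluster.local"},
	}
	staged, err := stageRows(rows)
	if err != nil {
		fmt.Println("stage error:", err)
		return
	}
	fmt.Print(staged)

	back, err := parseRows(staged)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("round trip intact:", back[1][1] == rows[1][1])
}
```

Commas no longer need quoting once the delimiter is `|`, but a value containing `|` or `"` still does, which is why both fixes were needed.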
+
+### Changed
+
+- **Kuzu 0.7.1 → 0.11.3** (PR #155). Migrates the embedded graph DB to a
+  release with bundled FTS extension and bound `LIMIT`/`SKIP` parameters.
+- **Real FTS replaces CONTAINS predicates** (PR #159). `SearchByLabel`
+  and `SearchLexical` now route through `CALL QUERY_FTS_INDEX` with BM25
+  ranking; CONTAINS fallback retained for pre-enrich graphs. Auto-suffix
+  `*` on single-token queries preserves prefix-match UX. Two indexes
+  created at enrich time:
+  - `code_node_label_fts` over `(label, fqn_lower)`
+  - `code_node_lexical_fts` over `(prop_lex_comment, prop_lex_config_keys)`
+- **Parameterized `LIMIT`/`SKIP`** across the query layer (PR #159).
+  `intLiteral` helper removed; `fmt.Sprintf("LIMIT %d", n)` replaced with
+  `LIMIT $lim` bindings.
+- **Dropped `stringsToAny` widener** (PR #159). Kuzu 0.11's Go binding
+  accepts `[]string` directly for `IN $param` clauses.
+- **Mutation gate** allow-lists read-only `CALL QUERY_FTS_INDEX` (PR #159);
+  `CREATE_FTS_INDEX` / `DROP_FTS_INDEX` stay blocked under
+  `OpenReadOnly`.
+- **Dependabot config** rewritten (PR #154) — drops the dead Java `maven`
+  (`/`) and `npm` (`/src/main/frontend`) ecosystems, adds `gomod` (`/go`)
+  with groups for `kuzu`, `tree-sitter`, `mcp`, `cobra-viper`, `sqlite`,
+  `test-libs`. Routine bumps land via PRs #155, #156, #157, #158.
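
The parameterized `LIMIT`/`SKIP` entry in the list above replaces string interpolation with bound values. The sketch below contrasts the two styles using only the Go standard library; it does not call the Kuzu binding, and the `CodeNode` label, the `id` property, and the helper names are hypothetical. With bindings the statement text stays constant across page sizes, and the numbers travel in a separate parameter map that a driver's prepared-statement call would receive.

```go
package main

import "fmt"

// interpolatedPage splices the numbers into the query text, so every
// page size yields a different statement string.
func interpolatedPage(limit, skip int64) string {
	return fmt.Sprintf(
		"MATCH (n:CodeNode) RETURN n.id ORDER BY n.id SKIP %d LIMIT %d", skip, limit)
}

// boundPage keeps the statement text fixed and returns the values
// separately, keyed by parameter name without the leading "$".
func boundPage(limit, skip int64) (string, map[string]any) {
	const q = "MATCH (n:CodeNode) RETURN n.id ORDER BY n.id SKIP $skip LIMIT $lim"
	return q, map[string]any{"skip": skip, "lim": limit}
}

func main() {
	fmt.Println(interpolatedPage(10, 0))
	fmt.Println(interpolatedPage(50, 100))

	q1, p1 := boundPage(10, 0)
	q2, p2 := boundPage(50, 100)
	fmt.Println(q1, p1)
	fmt.Println(q2, p2)
	fmt.Println("same statement text:", q1 == q2)
}
```

Whether the parameter map keys include the `$` prefix depends on the driver; the form shown here is an assumption.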
+
+### Added
+
+- `codeiq enrich` knobs (PR #147): `--memprofile=<path>` writes a Go
+  heap profile; `--max-buffer-pool=N` overrides the 2 GiB Kuzu cap;
+  `--copy-threads=N` overrides `MaxNumThreads` default.
+- Perf-gate CI step (PR #148): `/usr/bin/time -v codeiq enrich` runs on
+  fixture-multi-lang; fails the build if peak RSS exceeds 300 MB.
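
The perf-gate entry above relies on the report that GNU `time -v` prints, which includes a `Maximum resident set size (kbytes)` line. How the CI step actually evaluates that report is not shown in this commit; the sketch below is one possible way to do the comparison in Go. The function names and the sample report are made up for illustration, and treating 300 MB as 300 × 1024 KiB is an assumption.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// rssLabel is the field name GNU time -v uses for peak memory.
const rssLabel = "Maximum resident set size (kbytes):"

// peakRSSKiB extracts the peak resident set size, in KiB, from a report.
func peakRSSKiB(report string) (int64, error) {
	sc := bufio.NewScanner(strings.NewReader(report))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if rest, ok := strings.CutPrefix(line, rssLabel); ok {
			return strconv.ParseInt(strings.TrimSpace(rest), 10, 64)
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("no %q line in report", rssLabel)
}

// withinBudget reports whether the peak stays at or below limitMB,
// counting 1 MB as 1024 KiB (an assumption of this sketch).
func withinBudget(peakKiB, limitMB int64) bool {
	return peakKiB <= limitMB*1024
}

func main() {
	// Made-up excerpt in the shape GNU time -v writes to stderr.
	report := "\tCommand being timed: \"codeiq enrich\"\n" +
		"\tMaximum resident set size (kbytes): 204800\n"

	peak, err := peakRSSKiB(report)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("peak=%d KiB, within 300 MB budget: %v\n", peak, withinBudget(peak, 300))
}
```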
+
 ## [v0.3.0] - 2026-05-13
 
 ### Changed

CLAUDE.md

Lines changed: 5 additions & 4 deletions
@@ -26,15 +26,16 @@ landing) and `c630245` (release infra).
 
 - **Go 1.25.10** — toolchain pin; module min is 1.25.0 (clamped by the
   MCP SDK's own `go` directive).
-- **Kuzu 0.7.1** (`github.com/kuzudb/go-kuzu`) — embedded graph DB.
-  CGO. v0.11.3 capability matrix documented in `## Gotchas` below.
-- **`mattn/go-sqlite3` 1.14.22** — SQLite analysis cache. CGO.
+- **Kuzu 0.11.3** (`github.com/kuzudb/go-kuzu`) — embedded graph DB.
+  CGO. Native FTS via `CALL CREATE_FTS_INDEX` / `QUERY_FTS_INDEX`.
+  Capability matrix documented in `## Gotchas` below.
+- **`mattn/go-sqlite3` 1.14.44** — SQLite analysis cache. CGO.
 - **`smacker/go-tree-sitter`** — AST parsing for Java / Python /
   TypeScript / Go.
 - **`modelcontextprotocol/go-sdk` v1.6** — stdio MCP server. v1.6 API
   shape: `Server.Serve(ctx, mcpsdk.Transport)`; no `NewStdioTransport`
   helper.
-- **`spf13/cobra`** — CLI framework. Subcommand registration via
+- **`spf13/cobra` 1.10.2** — CLI framework. Subcommand registration via
   `internal/cli` blank imports.
 
 ## Architecture

PROJECT_SUMMARY.md

Lines changed: 4 additions & 3 deletions
@@ -22,11 +22,12 @@
 
 - **Go 1.25.10** — toolchain pin in `go/go.mod` (module min 1.25.0,
   clamped by `modelcontextprotocol/go-sdk`).
-- **Kuzu 0.7.1** (`github.com/kuzudb/go-kuzu`) — embedded graph DB.
-- **`mattn/go-sqlite3` 1.14.22** — SQLite analysis cache.
+- **Kuzu 0.11.3** (`github.com/kuzudb/go-kuzu`) — embedded graph DB.
+  Native FTS via `QUERY_FTS_INDEX` (bundled).
+- **`mattn/go-sqlite3` 1.14.44** — SQLite analysis cache.
 - **`smacker/go-tree-sitter`** — AST parsing (Java / Python / TS / Go).
 - **`modelcontextprotocol/go-sdk` v1.6** — stdio MCP server.
-- **`spf13/cobra`** — CLI framework.
+- **`spf13/cobra` 1.10.2** — CLI framework.
 - Manifest files read: `go/go.mod`, `go/go.sum`.
 
 ## Entry points

SECURITY.md

Lines changed: 18 additions & 16 deletions
@@ -2,14 +2,14 @@
 
 ## Supported versions
 
-Security fixes are issued against the latest minor release line on Maven Central. While codeiq is pre-1.0 (`0.x.y`) only the **latest** released `0.MINOR.x` line receives backports; older minor lines are EOL the moment a new minor ships.
+Security fixes are issued against the latest minor release line. While codeiq is pre-1.0 (`0.x.y`) only the **latest** released `0.MINOR.x` line receives backports; older minor lines are EOL the moment a new minor ships.
 
 | Version line | Status |
 |---|---|
-| `0.1.x` | Supported (current) |
-| `< 0.1.0` | Unsupported |
+| `0.3.x` | Supported (current — Go single binary) |
+| `0.2.x` and below | Unsupported (Java/Spring Boot reference, deleted at Phase 6 cutover) |
 
-`-SNAPSHOT` builds are development snapshots; they do not receive security fixes by themselves — you should be tracking the latest tagged release.
+Development builds (untagged `main`) are not covered — track the latest tagged release.
 
 ## Reporting a vulnerability
 
@@ -22,8 +22,8 @@ Use one of:
 
 Please include:
 
-- The codeiq version (`java -jar code-iq-*-cli.jar version` or `pom.xml` coordinate).
-- The shortest reproducer you can produce — a CLI command or test case is ideal.
+- The codeiq version (`codeiq --version`).
+- The shortest reproducer you can produce — a CLI command, a test case, or an indexed-fixture path.
 - Your assessment of impact (e.g., RCE, path traversal, info-disclosure, DoS).
 - Whether the issue is in a transitive dependency (please name the dependency + advisory ID if known).
 
@@ -40,26 +40,28 @@ We do not currently run a paid bug bounty.
 
 In-scope:
 
-- The codeiq CLI (`code-iq-*-cli.jar`).
-- The library JAR (`io.github.randomcodespace.iq:code-iq`).
-- The bundled REST API + MCP server (`serve` subcommand) — including path traversal, authn/authz, deserialisation, request smuggling, and SSRF.
-- The bundled React UI assets shipped inside the JAR.
-- The pipeline cache (H2) and graph store (Neo4j Embedded) — including local privilege escalation and data tampering.
+- The `codeiq` CLI binary and every subcommand (`index`, `enrich`, `mcp`, `query`, `find`, `cypher`, `stats`, `flow`, `graph`, `topology`, `review`, `cache`, `plugins`, `config`).
+- The stdio MCP server (`codeiq mcp`) — including its 10 user-facing tools (`graph_summary`, `find_in_graph`, `inspect_node`, `trace_relationships`, `analyze_impact`, `topology_view`, `run_cypher`, `read_file`, `generate_flow`, `review_changes`). The mutation gate on `run_cypher` is in-scope — bypassing it to mutate the read-only Kuzu store is a vulnerability.
+- The pipeline cache (SQLite, `.codeiq/cache/codeiq.sqlite`) and graph store (Kuzu embedded, `.codeiq/graph/codeiq.kuzu`) — including local privilege escalation and data tampering of the indexed graph.
+- File-read sandboxing in `read_file` and `codeiq review` — path traversal out of the indexed root is in-scope.
+- The release pipeline — Goreleaser config, signing keys (cosign keyless via OIDC), GitHub Actions workflows under `.github/workflows/`, and the published artifacts (binary tarballs + checksums + cosign bundles).
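
The in-scope list above names path traversal out of the indexed root as a vulnerability in `read_file`. The commit does not include the sandboxing code, so the following is a minimal sketch of one common lexical check in Go, under the assumption that callers pass paths relative to the root. The function `resolveInRoot` and the `/repo` root are hypothetical. The check does not resolve symlinks, so a link inside the root that points outside it would still pass.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

var errEscapesRoot = errors.New("path escapes the indexed root")

// resolveInRoot joins a caller-supplied relative path onto root and
// rejects any result that lands outside root. Lexical check only.
func resolveInRoot(root, requested string) (string, error) {
	if filepath.IsAbs(requested) {
		return "", errEscapesRoot
	}
	full := filepath.Join(root, requested) // Join also cleans "." and ".." segments
	rel, err := filepath.Rel(root, full)
	if err != nil {
		return "", err
	}
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", errEscapesRoot
	}
	return full, nil
}

func main() {
	root := "/repo" // hypothetical indexed root
	for _, p := range []string{"go/main.go", "src/../README.md", "../etc/passwd", "/etc/passwd"} {
		full, err := resolveInRoot(root, p)
		if err != nil {
			fmt.Printf("%-18s rejected: %v\n", p, err)
			continue
		}
		fmt.Printf("%-18s ok: %s\n", p, full)
	}
}
```

Go 1.24 added `os.Root`, which confines file operations to a directory at the OS level and also covers the symlink case; it may be a better fit where that Go version is available.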
 
 Out of scope:
 
 - Vulnerabilities that require pre-existing local code execution on the developer's machine (we ship as a developer tool — by definition you trust the code you point it at).
-- Public-internet attack surface — codeiq does not expose any service to the public internet by default; deploying the `serve` endpoint behind hostile reverse-proxies is out of scope.
-- Findings in third-party services we do not control (Maven Central, GitHub itself, SonarCloud, etc.) — please report those upstream.
+- Public-internet attack surface — codeiq does not expose any service to the public internet. It is a CLI + stdio MCP server only; there is no REST API and no web UI (the Java reference had both; they were deleted in Phase 6 cutover and will not be reintroduced).
+- Vulnerabilities in the LLM endpoint used by `codeiq review` (Ollama local or cloud) — those are the LLM vendor's surface area.
+- Findings in third-party services we do not control (GitHub itself, OpenSSF, Socket Security, etc.) — please report those upstream.
 
 ## Hardening references
 
 - [`shared/runbooks/engineering-standards.md`](shared/runbooks/engineering-standards.md) — CVE policy and quality gates.
 - [`shared/runbooks/rollback.md`](shared/runbooks/rollback.md) §6 — secret rotation flow.
 - `.github/workflows/scorecard.yml` — OpenSSF Scorecard supply-chain checks.
-- GitHub repo-level **CodeQL default setup** (java-kotlin + javascript-typescript + actions) — code scanning, SARIF in the Security tab. Configured under repo Settings → Code security → Code scanning, not via a workflow file (a workflow-driven `codeql.yml` was tried and removed because GitHub rejects duplicate SARIF uploads when default setup is on for the same language).
-- `.github/dependabot.yml` — automated dependency / GHA / npm bumps.
+- `.github/workflows/security.yml` — CodeQL, Semgrep, OSV-Scanner, Trivy, Gitleaks, SBOM, Socket Security on every PR.
+- `.github/workflows/perf-gate.yml` — enrich memory regression gate (300 MB ceiling on fixture-multi-lang).
+- `.github/dependabot.yml` — automated `gomod` + `github-actions` bumps, grouped per ecosystem.
 
 ## Changelog
 
-This file is versioned as part of the repo. Material changes (e.g., raising the supported-versions table, changing the disclosure timeline) are announced via a Release note and a Paperclip board comment.
+This file is versioned as part of the repo. Material changes (e.g., raising the supported-versions table, changing the disclosure timeline) are announced via a Release note.
