You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Zero. A repo-wide grep for TODO, FIXME, HACK, XXX in internal/ and cmd/ returns 0 occurrences.
This is the result of a deliberate "no TODOs in main" discipline — incomplete work either ships behind a clear interface or is captured in plan files (now wiped). The flip side: there's no in-code roadmap for "things known to need work". The list below substitutes for that, drawn from comments, deleted plan files, and observed bugs.
Risks by surface
Index / enrich pipeline
Risk
Severity
Why
Where
Detector registration choke point
High
One forgotten blank import in detectors_register.go ships an empty registry for that language family. Phase 4 caught the bug: 15 language families silently produced 0 nodes.
Edges whose endpoint isn't in the node set get dropped at Snapshot(). Detectors must emit anchor nodes via base.EnsureFileAnchor / EnsureExternalAnchor for imports to survive. New detectors authored without this pattern silently lose edges.
Three production bugs in series: commas in JSON properties (#150), Istio cluster names with | (#153), TOML quoted keys (#152). Current state: DELIM='|', QUOTE='"', ESCAPE='"' with RFC-4180 round-trip + per-field tests. Any new structured-output channel must re-verify quoting end-to-end.
Two modules sharing a folder name used to emit service:<name> twice → Kuzu COPY aborts. Now path-qualified service:<dir>:<name> (#151). Any future "container" node type (e.g. workspace, layer) needs the same discipline.
Same input must produce byte-identical output. Map iteration without sort.Strings of keys, or goroutine ordering without explicit sort, is the typical bug. Every detector ships a determinism test for exactly this reason.
All 100 detectors
Tree-sitter grammar drift
Medium
Detectors pattern-match on AST node names. A grammar bump that renames nodes silently regresses fidelity. No automated regression for this.
The gate is regex over the literal query string, comment-stripped. Adversarial inputs embedding CREATE inside a string literal, or weird whitespace, could in principle slip through to Kuzu. Belt-and-braces is OpenReadOnly at the Kuzu level.
Even read-only Cypher can exhaust resources (huge result sets, recursive patterns). --max-results 500 + --query-timeout 30s cap the worst case, but a CALL-procedure on the allow-list could theoretically be slow.
Sandboxed to indexed root via filepath.Rel checks (Inference — verify in mcp/read_file.go). If the rel-check is missing or wrong, file content outside the root could leak. Worth a focused fuzz test.
7 modes used to permanently return INVALID_INPUT because the 6 consolidated tools passed wrong arg names to the underlying narrow handlers (#149). Now covered by parity tests. New mode additions need a parity-test entry.
proxy.golang.org caches every version's zip immutably. The pre-v0.4.0 tags were deleted from GitHub but their zips live on at the proxy. Anyone running go install …@v0.3.0 gets the old, broken layout.
History; mitigated by always picking a new version on releases.
goreleaser files: hard-fail on missing literal
Mitigated
files: README.md was a literal-file match — release blew up after the doc-wipe. Fixed by switching to globs README.md*.
The binary statically links Kuzu/SQLite/tree-sitter C++ code against glibc. Won't run on Alpine / musl without rebuilding upstream Kuzu.
All CGO deps.
No SLSA build attestation visibility on the consumer side
Low
actions/attest-build-provenance posts to GitHub's attestations store, but the README doesn't tell users how to verify it (gh attestation verify codeiq.tar.gz --owner RandomCodeSpace).
Kuzu is C++. A use-after-free or heap-overflow in Kuzu's query engine becomes our problem. Mitigated by pinning to a stable Kuzu version + cosign-verified release archives. No automated fuzzing today.
Per-file regex execution under a worker pool. A pathological input (e.g. very long backtracking-prone regex match) could stall a worker. Go's RE2 doesn't backtrack, so the risk is bounded — but a 1 GB single-line file is still 1 GB of byte scans. The default 2 MB tree-sitter buffer + DefaultExcludeDirs filter limit exposure.
Sufficient for detector use today. Misses TOML edge cases (array-of-tables, inline tables, multi-line strings — documented in code). Replace with a real TOML library if a bug surfaces.
No central error type
Across packages
Errors are stringly-typed. Operator scripting around stderr is fragile.
No structured logging
Across packages
Plain fmt.Fprintln(os.Stderr, ...) at -v levels. Fine for a CLI; would be limiting in a long-running service (and there isn't one, so it's fine).
The fixture has User.java + UserController.java to exercise the Java detector. They are content, not project Java code — easy to confuse with stale Java-era artifacts.
Incomplete features
Feature
State
codeiq config <action>
Mentioned in older docs; never implemented. Root --config flag works.
Incremental enrich
Not implemented. Today enrich does a full re-bulk-load of every cached row.
Multi-host distributed enrich
Not implemented; not currently a need.
Real-time / watch mode
Not implemented. Each run is operator-initiated.
Kuzu version-upgrade path
Manual rm -rf .codeiq/graph/ + re-enrich. No tooling.
Outbound HTTP to Ollama. Only network-touching surface in the binary. SSRF / injection in model / messages would land in the LLM, not in our process — but a malicious Ollama endpoint could feed back crafted responses.
Release pipeline (workflows + goreleaser)
Cosign keyless signing depends on GitHub OIDC. A workflow-modify PR that changes the signing identity would produce an unverifiable signature — branch protection on main is the mitigation.
GitHub Actions third-party actions
All third-party actions are pinned by SHA (per Scorecard guidance). Updating one requires re-pinning.
Migration risks
Bump
Risk
Kuzu major / minor
On-disk format may break. The graph is rebuildable from cache; the cache is SQLite (stable). Plan: enrich-rebuild on every Kuzu bump. Test by running enrich against testdata/fixture-multi-lang post-bump.
Go toolchain
Module pins go 1.25.0 (clamped by modelcontextprotocol/go-sdk). Bumping past 1.26 requires waiting for the SDK to update.
tree-sitter grammars
A grammar bump that renames AST nodes silently regresses detectors that pattern-match on those node names. No automated regression.
mattn/go-sqlite3
Stable; minor bumps have shipped without issue (1.14.22 → 1.14.44 in PR #157).
MCP SDK
Pins the Go module minimum. A major bump may rework the Server.AddTool / Tool signature; the wrapper in internal/mcp/tool.go is the absorption layer.
Recommended next sweeps
Fuzz the mutation gate with adversarial Cypher (comment smuggling, string literals containing blocked keywords, unicode whitespace tricks).
Property fuzz the bulk-load CSV writer with random byte sequences in node/edge properties.
Snapshot tests for tree-sitter grammar output on canonical files per language; pin grammar versions in go.mod rather than the wildcard tag patterns currently in use.
gh attestation verify documentation for release consumers.
Consider extracting an extra_files block in .goreleaser.yml with match: optional rather than the *-glob hack for README.md.