feat(graph): edge confidence audit trail + architecture analytics (Phase 1/4) #12
Phase 1 of porting graphify's best ideas into codebase-index.

- edges.confidence (extracted/inferred/ambiguous), SCHEMA_VERSION 2->3. Derived from how an edge resolved (exact / import-suffix heuristic / unresolved); never LLM-guessed. Surfaced in refs + impact for honesty.
- graph/analysis.py: zero-dep, deterministic communities (Louvain local-move), god nodes, surprising bridges, auto-labels, suggested questions. Cached in meta['graph_analysis'] at build time.
- Tests for confidence + analysis; refs/impact goldens regenerated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
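As a rough illustration of "derived from how an edge resolved", the sketch below maps a resolution outcome to one of the three confidence values. The function name `derive_confidence`, its parameters, and the shape of the lookup tables are assumptions made for this example; the actual resolver in codebase-index is not shown in this conversation and may differ.

```python
from typing import Dict, List, Optional, Set


def derive_confidence(
    name: str,
    same_file_names: Set[str],
    repo_symbols: Dict[str, List[str]],
    import_hint: Optional[str],
) -> str:
    """Hypothetical helper: classify an edge by how its target resolved."""
    # Exact: the target is defined in the same file as the reference.
    if name in same_file_names:
        return "extracted"

    candidates = repo_symbols.get(name, [])
    # Exact: the name is unique across the whole repository.
    if len(candidates) == 1:
        return "extracted"

    # Heuristic: an import path suffix narrows several candidates to one.
    if import_hint is not None:
        hits = [p for p in candidates if p.rsplit(".", 1)[0].endswith(import_hint)]
        if len(hits) == 1:
            return "inferred"

    # A named target that could not be pinned to a unique node.
    return "ambiguous"


repo = {"load": ["src/pkg/util.py", "src/other/io.py"], "main": ["src/app.py"]}
print(derive_confidence("load", set(), repo, "pkg/util"))
```

Because the value is a pure function of the resolution path, the same index input always produces the same confidence, with no model call involved.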
Test files cluster with the code they exercise and often outnumber it, which mislabelled production modules as "tests". Community labels now prefer the dominant non-test directory; an all-test community is still labelled as tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
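A minimal sketch of the labelling rule described in this commit, assuming a community is represented by the list of file paths of its members. `is_test_path` and `label_community` are hypothetical names, and the test-detection heuristic is an assumption for illustration, not the project's actual rule.

```python
from collections import Counter
from pathlib import PurePosixPath
from typing import Iterable


def is_test_path(path: str) -> bool:
    """Assumed heuristic: a tests/ directory or a test_*.py / *_test.py file."""
    parts = PurePosixPath(path).parts
    name = parts[-1]
    in_test_dir = any(p in ("tests", "test") for p in parts[:-1])
    return in_test_dir or name.startswith("test_") or name.endswith("_test.py")


def label_community(paths: Iterable[str]) -> str:
    paths = list(paths)
    if not paths:
        return "(empty)"
    # Prefer non-test members; fall back to everything for all-test communities.
    non_test = [p for p in paths if not is_test_path(p)]
    pool = non_test or paths
    counts = Counter(str(PurePosixPath(p).parent) for p in pool)
    # Deterministic tie-break: highest count first, then alphabetical.
    return min(counts.items(), key=lambda kv: (-kv[1], kv[0]))[0]


members = [
    "src/graph/analysis.py",
    "tests/test_analysis.py",
    "tests/test_graph.py",
    "tests/test_cache.py",
]
print(label_community(members))
```

Even though three of the four members are tests, the label comes from the single production file, which is the behaviour the commit message describes.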
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c62bcbed5f
"""
SELECT e.line AS line, f.path AS path, e.edge_type AS edge_type,
e.resolved AS resolved, e.src_id AS src_id, e.src_kind AS src_kind,
e.confidence AS confidence,
Handle schema-v2 indexes before reading confidence
When a user has an existing schema_version 2 index and runs refs after upgrading, Database._guard_version still allows opening older indexes for read-only commands, but this query now unconditionally selects e.confidence. Since schema.sql uses CREATE TABLE IF NOT EXISTS, the column is not added to the old edges table, so SQLite raises no such column: e.confidence before the fallback in refs_lookup can run; impact has the same risk through the updated edge accessors. Please rebuild/reject old indexes in read paths or make these accessors tolerate the missing column.
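One way to make an accessor tolerate the missing column is to inspect the table before building the query. The sketch below uses only the standard `sqlite3` module and `PRAGMA table_info`; the helper names `has_column` and `fetch_edges` and the reduced two-column `edges` table are assumptions for illustration, not the project's real schema or accessors.

```python
import sqlite3
from typing import List, Optional, Tuple


def has_column(conn: sqlite3.Connection, table: str, column: str) -> bool:
    # table is a trusted constant here; PRAGMA arguments cannot be bound.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return any(row[1] == column for row in rows)


def fetch_edges(conn: sqlite3.Connection) -> List[Tuple[int, str, Optional[str]]]:
    # Fall back to NULL when reading an index built before the column existed.
    confidence_expr = (
        "e.confidence" if has_column(conn, "edges", "confidence") else "NULL"
    )
    sql = (
        "SELECT e.line AS line, e.edge_type AS edge_type, "
        f"{confidence_expr} AS confidence FROM edges e ORDER BY e.line"
    )
    return conn.execute(sql).fetchall()


# Simulate an older index whose edges table lacks the confidence column.
old = sqlite3.connect(":memory:")
old.execute("CREATE TABLE edges (line INTEGER, edge_type TEXT)")
old.execute("INSERT INTO edges VALUES (3, 'call')")
print(fetch_edges(old))
```

The alternative the review mentions, rejecting or rebuilding old indexes on read, avoids carrying a `None` confidence through the read path, at the cost of forcing a rebuild before read-only commands work.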
surface the actual endpoint pair for each such bridge.
"""
pair_edges: dict[tuple[int, int], list[tuple[Node, Node]]] = defaultdict(list)
for (a, b), _w in edge_weight.items():
Count parallel edge weights when selecting bridges
For graphs with several calls/imports between the same two endpoints in different communities, edge_weight already contains that multiplicity, but this loop ignores _w and appends the endpoint pair only once. That makes a heavily connected community pair look like a single rare bridge (edge_count is also underreported), so the cached architecture summary can surface non-surprising links as surprising whenever duplicate edges share endpoints.
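The suggestion can be sketched as follows: accumulate the weight per community pair instead of counting each endpoint pair once. The function names, the `community_of` mapping, and the `max_weight` threshold for what counts as "surprising" are assumptions for this example; the PR's actual bridge criterion is not shown here.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable]


def community_pair_weights(
    edge_weight: Dict[Edge, int], community_of: Dict[Hashable, int]
) -> Tuple[Dict[Tuple[int, int], int], Dict[Tuple[int, int], List[Tuple]]]:
    totals: Dict[Tuple[int, int], int] = defaultdict(int)
    endpoints: Dict[Tuple[int, int], List[Tuple]] = defaultdict(list)
    for (a, b), w in sorted(edge_weight.items()):
        ca, cb = community_of[a], community_of[b]
        if ca == cb:
            continue  # intra-community edges are not bridges
        pair = (min(ca, cb), max(ca, cb))
        totals[pair] += w  # keep the multiplicity of parallel edges
        endpoints[pair].append((a, b, w))
    return dict(totals), dict(endpoints)


def surprising_bridges(edge_weight, community_of, max_weight: int = 1):
    totals, endpoints = community_pair_weights(edge_weight, community_of)
    # A pair only counts as a rare bridge if its total weight is small.
    return [(p, endpoints[p]) for p in sorted(totals) if totals[p] <= max_weight]


edge_weight = {("a", "x"): 5, ("b", "y"): 1, ("a", "b"): 2}
community_of = {"a": 0, "b": 0, "x": 1, "y": 2}
print(surprising_bridges(edge_weight, community_of))
```

Here the pair (0, 1) is connected by five parallel edges and is therefore excluded, while (0, 2) with a single edge is reported.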
… ids

Symbol ids are assigned in file-walk order, which differs across OSes, so the community partition / god-node ranking (and thus the architecture/path/describe golden snapshots) diverged between Windows and the Linux/macOS CI runners. Key the analysis graph by content (kind:path:name:line) instead. build_adjacency takes an optional key_fn; analyze() passes the stable key so the result is identical on every platform. The community/degree helpers are generic over the node-key type (tuple in tests, str in analyze/export).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
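To show why a content-derived key removes the platform dependence, here is a small sketch. The `Symbol` tuple, `stable_key`, and `build_adjacency_sketch` are hypothetical stand-ins; only the key format kind:path:name:line and the idea of an optional `key_fn` come from the commit message, and the real `build_adjacency` signature may differ.

```python
from typing import Callable, Dict, Hashable, Iterable, List, NamedTuple, Optional, Tuple


class Symbol(NamedTuple):
    id: int      # assigned in file-walk order, so it varies by OS
    kind: str
    path: str
    name: str
    line: int


def stable_key(sym: Symbol) -> str:
    # Content-derived key in the kind:path:name:line format.
    return f"{sym.kind}:{sym.path}:{sym.name}:{sym.line}"


def build_adjacency_sketch(
    symbols: Iterable[Symbol],
    edges: Iterable[Tuple[int, int]],
    key_fn: Optional[Callable[[Symbol], Hashable]] = None,
) -> Dict[Hashable, List[Hashable]]:
    key = key_fn or (lambda s: s.id)
    symbols = list(symbols)
    by_id = {s.id: s for s in symbols}
    adj = {key(s): set() for s in symbols}
    for src, dst in edges:
        a, b = key(by_id[src]), key(by_id[dst])
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    # Sorted output so iteration order never depends on insertion order.
    return {k: sorted(v) for k, v in sorted(adj.items())}


# The same two symbols, walked in opposite orders on two platforms.
walk_a = [Symbol(1, "function", "a.py", "f", 1), Symbol(2, "function", "b.py", "g", 4)]
walk_b = [Symbol(1, "function", "b.py", "g", 4), Symbol(2, "function", "a.py", "f", 1)]
same = build_adjacency_sketch(walk_a, [(1, 2)], stable_key) == build_adjacency_sketch(
    walk_b, [(2, 1)], stable_key
)
print(same)
```

With integer ids, node 1 names a different symbol in each walk, so any id-based tie-break in later analysis can diverge; with the content key both walks describe the identical graph.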
Phase 1 of 4 — porting the best of safishamsi/graphify into codebase-index
This is the graph foundation: an honest confidence trail on every edge, plus a zero-dependency architecture-analytics engine. Later phases surface it via an architecture command, path/describe navigation, and an upgraded HTML/export view.

Added

- Edge confidence audit trail: every graph edge carries a confidence:
  - extracted: exact (same-file symbol, or a repo-unique name)
  - inferred: a heuristic resolved it (import path-suffix match)
  - ambiguous: a named target we could not pin to a unique node

  Derived from how the edge resolved, never LLM-guessed; the index stays fully local. refs and impact now show it, so an empty/short answer over ambiguous/inferred edges reads as inconclusive, not as proof of "no callers". Bumps SCHEMA_VERSION 2 → 3 (older indexes stay readable; index/update rebuild on mismatch).
- graph/analysis.py: pure, deterministic, zero new dependencies (keeps the "small pure-Python core" promise). Over the resolved edge graph it computes communities (Louvain local-move), god nodes, surprising bridges, auto-labels, and suggested questions. Cached in meta['graph_analysis'] at build time for instant reads.

Tests

- tests/test_analysis.py: community split, modularity, god-node ranking, bridge detection, labelling, cache round-trip, and a real sample_repo build.
- tests/test_graph.py: asserts the three confidence values end-to-end.

CI

- pytest: 375 passed, 14 skipped, 84% coverage (gate 80%)
- ruff: clean · mypy: clean

🤖 Generated with Claude Code
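For readers unfamiliar with the modularity score that the community tests check, the standard definition for an undirected weighted graph can be computed in a few lines of dependency-free Python. This is a generic sketch of the textbook formula, not the code in graph/analysis.py; the function name and the edge-list representation are assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def modularity(
    edges: Iterable[Tuple[Hashable, Hashable, float]],
    community_of: Dict[Hashable, int],
) -> float:
    """Q = sum over communities of (internal / m) - (degree / 2m)^2."""
    edges = list(edges)
    m = sum(w for _, _, w in edges)  # total edge weight
    if m == 0:
        return 0.0
    internal = defaultdict(float)  # weight of edges inside each community
    degree = defaultdict(float)    # summed weighted degree per community
    for a, b, w in edges:
        ca, cb = community_of[a], community_of[b]
        degree[ca] += w
        degree[cb] += w
        if ca == cb:
            internal[ca] += w
    # Sorted iteration keeps the floating-point sum order deterministic.
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in sorted(degree))


# Two triangles joined by a single bridge edge (2, 3).
triangles = [
    (0, 1, 1), (1, 2, 1), (0, 2, 1),
    (3, 4, 1), (4, 5, 1), (3, 5, 1),
    (2, 3, 1),
]
split = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(round(modularity(triangles, split), 4))
```

For this graph the two-community split scores 5/14 (about 0.3571), while placing every node in one community scores 0, which is the kind of gain a Louvain local-move step looks for.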