feat(graph): edge confidence audit trail + architecture analytics (Phase 1/4) by denfry · Pull Request #12 · denfry/codebase-index

denfry · 2026-06-23T20:07:47Z

Phase 1 of 4 — porting the best of safishamsi/graphify into codebase-index

This is the graph foundation: an honest confidence trail on every edge, plus a zero-dependency architecture-analytics engine. Later phases surface it via an architecture command, path/describe navigation, and an upgraded HTML/export view.

Added

Edge confidence audit trail — every graph edge carries a confidence:
- extracted — exact (same-file symbol, or a repo-unique name)
- inferred — a heuristic resolved it (import path-suffix match)
- ambiguous — a named target we could not pin to a unique node
Derived from how the edge resolved — never LLM-guessed; the index stays fully local. refs and impact now show it, so an empty/short answer over ambiguous/inferred edges reads as inconclusive, not as proof of "no callers". Bumps SCHEMA_VERSION 2 → 3 (older indexes stay readable; index/update rebuild on mismatch).
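The three-way rule above can be sketched as a small pure function. This is only an illustration: the function name, its parameters, and the precedence between the checks are assumptions, not the code in this PR.

```python
from typing import Literal

Confidence = Literal["extracted", "inferred", "ambiguous"]


def classify_confidence(
    same_file: bool, name_matches: int, import_suffix_match: bool
) -> Confidence:
    """Hypothetical helper: map how an edge resolved to a confidence label.

    same_file           -- the target symbol lives in the same file as the source
    name_matches        -- how many nodes in the repo carry the target name
    import_suffix_match -- an import path-suffix heuristic picked the target
    """
    if same_file or name_matches == 1:
        return "extracted"  # exact: same-file symbol or repo-unique name
    if import_suffix_match:
        return "inferred"  # a heuristic resolved it
    return "ambiguous"  # named target, but no unique node


# Example: a name shared by three symbols, disambiguated only by import suffix.
label = classify_confidence(same_file=False, name_matches=3, import_suffix_match=True)
```

Because the label depends only on how resolution went, the same index always yields the same labels, in line with the "never LLM-guessed" claim.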
graph/analysis.py — pure, deterministic, zero new dependencies (keeps the "small pure-Python core" promise). Over the resolved edge graph it computes:
- communities — greedy modularity (Louvain local-move). Unlike label propagation it does not collapse two cliques joined by a single bridge.
- god nodes — the most-connected symbols/files
- surprising connections — edges bridging weakly-linked communities
- auto-labelled modules + suggested questions
Cached in meta['graph_analysis'] at build time for instant reads.
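To make the "two cliques joined by a single bridge" claim concrete, here is a minimal sketch of a Louvain-style local-move pass in pure Python. It is not the code in graph/analysis.py; the function name, the adjacency layout (node -> {neighbour: weight}), and the tie-breaking by sorted order are assumptions chosen for the illustration.

```python
from collections import defaultdict


def local_move(adj):
    """Greedy modularity local-move over a symmetric weighted adjacency map."""
    two_m = sum(w for nbrs in adj.values() for w in nbrs.values())  # 2 * total weight
    degree = {n: sum(nbrs.values()) for n, nbrs in adj.items()}
    community = {n: n for n in adj}  # start with every node alone
    tot = dict(degree)  # community -> sum of member degrees
    improved = True
    while improved:
        improved = False
        for node in sorted(adj):  # sorted order keeps the pass deterministic
            current = community[node]
            links = defaultdict(float)  # weight from node into each community
            for nbr, w in adj[node].items():
                links[community[nbr]] += w
            tot[current] -= degree[node]  # take the node out of its community
            best = current
            best_gain = links.get(current, 0.0) - tot[current] * degree[node] / two_m
            for comm in sorted(links):
                # Modularity gain (up to a constant factor) of joining comm.
                gain = links[comm] - tot[comm] * degree[node] / two_m
                if gain > best_gain + 1e-12:
                    best, best_gain = comm, gain
            tot[best] += degree[node]
            if best != current:
                community[node] = best
                improved = True
    return community


def clique(nodes):
    return [(a, b) for i, a in enumerate(nodes) for b in nodes[i + 1:]]


# Two 4-cliques joined by the single bridge edge (3, 4).
adj = defaultdict(dict)
for a, b in clique([0, 1, 2, 3]) + clique([4, 5, 6, 7]) + [(3, 4)]:
    adj[a][b] = 1
    adj[b][a] = 1

partition = local_move(adj)
```

On this toy graph the pass keeps the two cliques in separate communities, because moving a bridge endpoint across would lower modularity.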

Tests

tests/test_analysis.py — community split, modularity, god-node ranking, bridge detection, labelling, cache round-trip, and a real sample_repo build.
tests/test_graph.py — asserts the three confidence values end-to-end.
refs/impact goldens (CLI + MCP) regenerated to include the new fields (diff is additive only).

CI

pytest: 375 passed, 14 skipped, 84% coverage (gate 80%)
ruff: clean · mypy: clean

🤖 Generated with Claude Code

Phase 1 of porting graphify's best ideas into codebase-index.

- edges.confidence (extracted/inferred/ambiguous), SCHEMA_VERSION 2->3. Derived from how an edge resolved (exact / import-suffix heuristic / unresolved); never LLM-guessed. Surfaced in refs + impact for honesty.
- graph/analysis.py: zero-dep, deterministic communities (Louvain local-move), god nodes, surprising bridges, auto-labels, suggested questions. Cached in meta['graph_analysis'] at build time.
- Tests for confidence + analysis; refs/impact goldens regenerated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Test files cluster with the code they exercise and often outnumber it, which mislabelled production modules as "tests". Community labels now prefer the dominant non-test directory; a community made up entirely of tests is still labelled "tests".

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
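A minimal sketch of that labelling rule, assuming a community is given as a non-empty list of repo-relative POSIX paths. The helper names and the test-path heuristic (a `tests` directory or a `test_` filename prefix) are assumptions, not the PR's implementation.

```python
from collections import Counter
from pathlib import PurePosixPath


def is_test_path(path: str) -> bool:
    """Heuristic (assumed): under a tests/ directory or named test_*."""
    parts = PurePosixPath(path).parts
    return "tests" in parts[:-1] or parts[-1].startswith("test_")


def label_community(paths: list[str]) -> str:
    """Label a community by its dominant non-test directory."""
    non_test = [p for p in paths if not is_test_path(p)]
    if not non_test:
        return "tests"  # an all-test community keeps the "tests" label
    dirs = Counter(str(PurePosixPath(p).parent) for p in non_test)
    # Highest count wins; ties break alphabetically so the label is deterministic.
    return min(dirs, key=lambda d: (-dirs[d], d))


mixed = ["tests/test_a.py", "tests/test_b.py", "tests/test_c.py", "graph/analysis.py"]
mixed_label = label_community(mixed)
```

Here the three test files outnumber the one production file, yet the community is labelled after `graph`.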

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c62bcbed5f

chatgpt-codex-connector · 2026-06-23T20:11:05Z

        """
        SELECT e.line AS line, f.path AS path, e.edge_type AS edge_type,
               e.resolved AS resolved, e.src_id AS src_id, e.src_kind AS src_kind,
+               e.confidence AS confidence,


Handle schema-v2 indexes before reading confidence

When a user has an existing schema_version 2 index and runs refs after upgrading, Database._guard_version still allows opening older indexes for read-only commands, but this query now unconditionally selects e.confidence. Since schema.sql uses CREATE TABLE IF NOT EXISTS, the column is not added to the old edges table, so SQLite raises no such column: e.confidence before the fallback in refs_lookup can run; impact has the same risk through the updated edge accessors. Please rebuild/reject old indexes in read paths or make these accessors tolerate the missing column.
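One way to make an accessor tolerate the missing column is to check the table's columns before building the query. This is a sketch of that option only: the trimmed-down `edges` table, the helper names, and the `NULL` fallback are assumptions; `PRAGMA table_info` is standard SQLite.

```python
import sqlite3


def edge_columns(conn: sqlite3.Connection) -> set[str]:
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    return {row[1] for row in conn.execute("PRAGMA table_info(edges)")}


def select_edges(conn: sqlite3.Connection) -> list[tuple]:
    """Read edges, substituting NULL when the index predates the confidence column."""
    conf_expr = "e.confidence" if "confidence" in edge_columns(conn) else "NULL"
    sql = (
        "SELECT e.line AS line, e.edge_type AS edge_type, "
        f"{conf_expr} AS confidence FROM edges AS e ORDER BY e.line"
    )
    return conn.execute(sql).fetchall()


# A schema-v2-style table (no confidence column) and a v3-style table.
old = sqlite3.connect(":memory:")
old.execute("CREATE TABLE edges (line INTEGER, edge_type TEXT)")
old.execute("INSERT INTO edges VALUES (10, 'call')")

new = sqlite3.connect(":memory:")
new.execute("CREATE TABLE edges (line INTEGER, edge_type TEXT, confidence TEXT)")
new.execute("INSERT INTO edges VALUES (10, 'call', 'inferred')")

old_rows = select_edges(old)
new_rows = select_edges(new)
```

The alternative named in the comment, rejecting or rebuilding old indexes in read paths, avoids the conditional SQL at the cost of forcing a rebuild.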

chatgpt-codex-connector · 2026-06-23T20:11:05Z

+    surface the actual endpoint pair for each such bridge.
+    """
+    pair_edges: dict[tuple[int, int], list[tuple[Node, Node]]] = defaultdict(list)
+    for (a, b), _w in edge_weight.items():


Count parallel edge weights when selecting bridges

For graphs with several calls/imports between the same two endpoints in different communities, edge_weight already contains that multiplicity, but this loop ignores _w and appends the endpoint pair only once. That makes a heavily connected community pair look like a single rare bridge (edge_count is also underreported), so the cached architecture summary can surface non-surprising links as surprising whenever duplicate edges share endpoints.
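The suggested fix can be sketched as summing the stored weight per community pair instead of counting each endpoint pair once. The function name and the data shapes (an `edge_weight` dict keyed by endpoint pair, a `community` dict from node to community id) are assumptions based on the snippet above.

```python
from collections import defaultdict


def bridge_weights(edge_weight, community):
    """Total cross-community weight per unordered community pair."""
    totals = defaultdict(int)
    for (a, b), w in edge_weight.items():
        ca, cb = community[a], community[b]
        if ca == cb:
            continue  # intra-community edges are not bridges
        pair = (ca, cb) if ca <= cb else (cb, ca)
        totals[pair] += w  # honour the multiplicity instead of counting 1
    return dict(totals)


# Five parallel calls between a and x, one call between b and y.
edge_weight = {("a", "x"): 5, ("b", "y"): 1}
community = {"a": 0, "b": 0, "x": 1, "y": 2}
totals = bridge_weights(edge_weight, community)
```

With weights counted, the pair (0, 1) carries weight 5 and no longer looks like a rare bridge, while (0, 2) still does.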

… ids

Symbol ids are assigned in file-walk order, which differs across OSes, so the community partition / god-node ranking (and thus the architecture/path/describe golden snapshots) diverged between Windows and the Linux/macOS CI runners.

Key the analysis graph by content (kind:path:name:line) instead. build_adjacency takes an optional key_fn; analyze() passes the stable key so the result is identical on every platform. The community/degree helpers are generic over the node-key type (tuple in tests, str in analyze/export).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
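The content-keyed graph described in this commit might look roughly like the sketch below. The `Symbol` dataclass and this `build_adjacency` signature are hypothetical stand-ins, not the repository's own definitions; only the `kind:path:name:line` key format and the optional `key_fn` come from the commit message.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:  # hypothetical stand-in for an indexed symbol row
    id: int
    kind: str
    path: str
    name: str
    line: int


def stable_key(sym: Symbol) -> str:
    """Content-derived key: identical on every platform for the same source."""
    return f"{sym.kind}:{sym.path}:{sym.name}:{sym.line}"


def build_adjacency(edges, key_fn=None):
    """Undirected adjacency keyed by key_fn (defaults to the numeric id)."""
    key = key_fn or (lambda s: s.id)
    adj = defaultdict(set)
    for src, dst in edges:
        a, b = key(src), key(dst)
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return {k: sorted(v) for k, v in sorted(adj.items())}


# The same two symbols, numbered differently by two file-walk orders.
walk_a = [(Symbol(1, "function", "a.py", "foo", 3), Symbol(2, "function", "b.py", "bar", 7))]
walk_b = [(Symbol(3, "function", "a.py", "foo", 3), Symbol(1, "function", "b.py", "bar", 7))]

by_id_differs = build_adjacency(walk_a) != build_adjacency(walk_b)
by_key_same = build_adjacency(walk_a, stable_key) == build_adjacency(walk_b, stable_key)
```

Keyed by id, the two walks produce different graphs; keyed by content, they are identical, which is what lets the golden snapshots agree across platforms.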

denfry and others added 2 commits June 23, 2026 23:07

chatgpt-codex-connector Bot reviewed Jun 23, 2026

This was referenced Jun 23, 2026

feat(architecture): architecture command + MCP tool (Phase 2/4) #13

Merged

feat(navigate): path + describe graph navigation (Phase 3/4) #14

Merged

feat(graph): viz upgrade + GraphML/DOT/Neo4j exports (Phase 4/4) #15

Merged

denfry merged commit 7d3340d into main Jun 23, 2026
10 checks passed

denfry mentioned this pull request Jun 23, 2026

release: v1.5.0 — graph analytics, navigation & interop #16

Merged

denfry deleted the feat/graph-confidence-analysis branch June 23, 2026 21:12

denfry merged 3 commits into main from feat/graph-confidence-analysis
