
feat(graph): edge confidence audit trail + architecture analytics (Phase 1/4) #12

Merged
denfry merged 3 commits into main from feat/graph-confidence-analysis
Jun 23, 2026


denfry (Owner) commented Jun 23, 2026

Phase 1 of 4 — porting the best of safishamsi/graphify into codebase-index

This is the graph foundation: an honest confidence trail on every edge, plus a zero-dependency architecture-analytics engine. Later phases surface it via an architecture command, path/describe navigation, and an upgraded HTML/export view.

Added

  • Edge confidence audit trail — every graph edge carries a confidence:

    • extracted — exact (same-file symbol, or a repo-unique name)
    • inferred — a heuristic resolved it (import path-suffix match)
    • ambiguous — a named target we could not pin to a unique node

    Derived from how the edge resolved, never guessed by an LLM, so the index stays fully local. Both refs and impact now show it, so an empty or short answer built on ambiguous/inferred edges reads as inconclusive rather than as proof of "no callers". Bumps SCHEMA_VERSION 2 → 3 (older indexes stay readable; index/update rebuild on mismatch).

  • graph/analysis.py — pure, deterministic, zero new dependencies (keeps the "small pure-Python core" promise). Over the resolved edge graph it computes:

    • communities — greedy modularity (Louvain local-move). Unlike label propagation, it does not merge two cliques joined by a single bridge into one community.
    • god nodes — the most-connected symbols/files
    • surprising connections — edges bridging weakly-linked communities
    • auto-labelled modules + suggested questions

    Cached in meta['graph_analysis'] at build time for instant reads.
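A minimal sketch of the Louvain local-move phase over an undirected weighted adjacency, to show why two cliques joined by one bridge stay split. The names undirected, modularity and local_move are hypothetical; the PR does not show graph/analysis.py's actual API. As a simplification, a node keeps its current community unless a neighbouring community strictly beats it (full Louvain also considers moving it to a fresh, empty community).

```python
from collections import defaultdict

# Symmetric adjacency: node -> {neighbour: edge weight}.
Graph = dict[str, dict[str, float]]


def undirected(edges: list[tuple[str, str]]) -> Graph:
    """Build a symmetric weighted adjacency; parallel edges add weight."""
    adj: dict[str, dict[str, float]] = defaultdict(dict)
    for u, v in edges:
        adj[u][v] = adj[u].get(v, 0.0) + 1.0
        adj[v][u] = adj[v].get(u, 0.0) + 1.0
    return dict(adj)


def modularity(adj: Graph, comm: dict[str, int]) -> float:
    """Q = sum_c [in_c / 2m - (tot_c / 2m)^2], in_c counting each internal edge twice."""
    two_m = sum(sum(nbrs.values()) for nbrs in adj.values())
    inside: dict[int, float] = defaultdict(float)
    tot: dict[int, float] = defaultdict(float)
    for n, nbrs in adj.items():
        tot[comm[n]] += sum(nbrs.values())
        for nb, w in nbrs.items():
            if comm[nb] == comm[n]:
                inside[comm[n]] += w
    return sum(inside[c] / two_m - (tot[c] / two_m) ** 2 for c in tot)


def local_move(adj: Graph, max_passes: int = 20) -> dict[str, int]:
    nodes = sorted(adj)  # fixed visiting order keeps the result deterministic
    comm = {n: i for i, n in enumerate(nodes)}  # every node starts alone
    degree = {n: sum(adj[n].values()) for n in nodes}
    two_m = sum(degree.values())
    if two_m == 0:
        return comm
    tot: dict[int, float] = defaultdict(float)  # total degree per community
    for n in nodes:
        tot[comm[n]] += degree[n]
    for _ in range(max_passes):
        moved = False
        for n in nodes:
            old = comm[n]
            tot[old] -= degree[n]  # take n out of its community
            links: dict[int, float] = defaultdict(float)
            for nb, w in adj[n].items():
                if nb != n:
                    links[comm[nb]] += w  # weight from n into each neighbour community

            # Modularity gain (up to the constant factor 1/m) of placing n in c.
            def gain(c: int) -> float:
                return links.get(c, 0.0) - tot[c] * degree[n] / two_m

            best = old
            for c in sorted(links):
                if gain(c) > gain(best):
                    best = c
            comm[n] = best
            tot[best] += degree[n]
            moved = moved or best != old
        if not moved:
            break
    return comm
```

For two triangles a-b-c and d-e-f joined by the single bridge c-d, this sketch converges to the two triangles as separate communities with Q = 5/14.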

Tests

  • tests/test_analysis.py — community split, modularity, god-node ranking, bridge detection, labelling, cache round-trip, and a real sample_repo build.
  • tests/test_graph.py — asserts the three confidence values end-to-end.
  • refs/impact goldens (CLI + MCP) regenerated to include the new fields (diff is additive only).

CI

  • pytest: 375 passed, 14 skipped, 84% coverage (gate 80%)
  • ruff: clean · mypy: clean

🤖 Generated with Claude Code

denfry and others added 2 commits June 23, 2026 23:07
Phase 1 of porting graphify's best ideas into codebase-index.

- edges.confidence (extracted/inferred/ambiguous), SCHEMA_VERSION 2->3.
  Derived from how an edge resolved (exact / import-suffix heuristic /
  unresolved); never LLM-guessed. Surfaced in refs + impact for honesty.
- graph/analysis.py: zero-dep, deterministic communities (Louvain local-move),
  god nodes, surprising bridges, auto-labels, suggested questions. Cached in
  meta['graph_analysis'] at build time.
- Tests for confidence + analysis; refs/impact goldens regenerated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
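The first commit derives edges.confidence from how each edge resolved (exact / import-suffix heuristic / unresolved). A minimal sketch of such a resolver follows, assuming a hypothetical Symbol record and a by_name index of candidate definitions. The real resolver in codebase-index is not shown in this PR, and the ".py" suffix rule is an illustrative assumption.

```python
from dataclasses import dataclass
from typing import Literal, Optional

Confidence = Literal["extracted", "inferred", "ambiguous"]


@dataclass(frozen=True)
class Symbol:
    name: str
    path: str  # repo-relative file that defines the symbol


def resolve(
    name: str,
    src_path: str,
    by_name: dict[str, list[Symbol]],
    imported_modules: list[str],
) -> tuple[Optional[Symbol], Confidence]:
    """Resolve a referenced name and record HOW it was resolved (hypothetical)."""
    candidates = by_name.get(name, [])

    # Exact: a definition in the referencing file itself.
    same_file = [s for s in candidates if s.path == src_path]
    if len(same_file) == 1:
        return same_file[0], "extracted"

    # Exact: the name is unique across the whole repo.
    if len(candidates) == 1:
        return candidates[0], "extracted"

    # Heuristic: exactly one candidate lives in a file whose path ends with
    # an imported module path, e.g. "pkg.util" -> ".../pkg/util.py".
    suffixes = [m.replace(".", "/") + ".py" for m in imported_modules]
    matched = [
        s
        for s in candidates
        if any(s.path == sfx or s.path.endswith("/" + sfx) for sfx in suffixes)
    ]
    if len(matched) == 1:
        return matched[0], "inferred"

    # A named target we could not pin to a unique node.
    return None, "ambiguous"
```

Because the label comes from the resolution branch that fired, nothing is guessed after the fact, which matches the "never LLM-guessed" claim above.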
Test files cluster with the code they exercise and often outnumber it, which
mislabelled production modules as "tests". Community labels now prefer the
dominant non-test directory; an all-test community still names for tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
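A sketch of the labelling rule described in the second commit: pick the dominant directory among non-test files, and fall back to the test files only when the community contains nothing else. label_community and is_test_path are hypothetical names, and the test-path heuristic (a test/ or tests/ directory, or a test_ file prefix) is an assumption, not the project's actual rule.

```python
from collections import Counter
from pathlib import PurePosixPath


def is_test_path(path: str) -> bool:
    """Hypothetical heuristic: under a test/ or tests/ dir, or named test_*."""
    p = PurePosixPath(path)
    return any(part in {"test", "tests"} for part in p.parts[:-1]) or p.name.startswith("test_")


def label_community(paths: list[str]) -> str:
    """Label a community by its dominant directory, preferring non-test code."""
    if not paths:
        return ""
    prod = [p for p in paths if not is_test_path(p)]
    # An all-test community still gets named for its tests.
    pool = prod or paths
    counts = Counter(str(PurePosixPath(p).parent) for p in pool)
    # Most files wins; ties go to the alphabetically first directory.
    return min(counts.items(), key=lambda kv: (-kv[1], kv[0]))[0]
```

In a community of one production file and three test files, a plain majority vote would pick "tests"; preferring non-test paths names it for the production module instead.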

chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c62bcbed5f


"""
SELECT e.line AS line, f.path AS path, e.edge_type AS edge_type,
e.resolved AS resolved, e.src_id AS src_id, e.src_kind AS src_kind,
e.confidence AS confidence,


P2: Handle schema-v2 indexes before reading confidence

When a user has an existing schema_version 2 index and runs refs after upgrading, Database._guard_version still allows opening the older index for read-only commands, but this query now unconditionally selects e.confidence. Because schema.sql uses CREATE TABLE IF NOT EXISTS, the column is never added to the old edges table, so SQLite raises "no such column: e.confidence" before the fallback in refs_lookup can run. The impact path has the same risk through the updated edge accessors. Please either rebuild/reject old indexes in the read paths or make these accessors tolerate the missing column.
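One way to follow the second suggestion is to check the live table with SQLite's PRAGMA table_info and select NULL when the column is absent. A minimal sketch using the standard sqlite3 module; fetch_refs, has_column and the dst_name column are hypothetical stand-ins for the real accessors and schema. Rejecting the index and asking for a rebuild, the reviewer's other option, avoids the per-query check.

```python
import sqlite3


def has_column(conn: sqlite3.Connection, table: str, column: str) -> bool:
    # PRAGMA arguments cannot be bound as parameters, so `table` must be a
    # trusted internal constant here, never user input.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)  # row[1] is the column name


def fetch_refs(conn: sqlite3.Connection, target: str) -> list[tuple]:
    # On a schema-v2 index the column is missing: select NULL instead so the
    # query still runs and callers can treat confidence as unknown.
    conf = "e.confidence" if has_column(conn, "edges", "confidence") else "NULL"
    sql = (
        f"SELECT e.line AS line, e.edge_type AS edge_type, {conf} AS confidence "
        "FROM edges AS e WHERE e.dst_name = ? ORDER BY e.line"
    )
    return conn.execute(sql, (target,)).fetchall()
```

A NULL confidence from an old index should then be rendered as "unknown", not as any of the three v3 values.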


surface the actual endpoint pair for each such bridge.
"""
pair_edges: dict[tuple[int, int], list[tuple[Node, Node]]] = defaultdict(list)
for (a, b), _w in edge_weight.items():


P2: Count parallel edge weights when selecting bridges

For graphs with several calls/imports between the same two endpoints in different communities, edge_weight already contains that multiplicity, but this loop ignores _w and appends the endpoint pair only once. That makes a heavily connected community pair look like a single rare bridge (edge_count is also underreported), so the cached architecture summary can surface non-surprising links as surprising whenever duplicate edges share endpoints.
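A sketch of the fix the review asks for: accumulate the weight w for each community pair instead of recording the pair once. The edge_weight name comes from the snippet above. The surprising_bridges function, its return shape, and ranking purely by raw edge count are illustrative assumptions; the real analysis may also normalise by community size.

```python
from collections import defaultdict


def surprising_bridges(
    edge_weight: dict[tuple[str, str], int],
    comm: dict[str, int],
    limit: int = 5,
) -> list[tuple[tuple[int, int], int, list[tuple[str, str]]]]:
    """Rank cross-community pairs by how few edges actually connect them."""
    pair_count: dict[tuple[int, int], int] = defaultdict(int)
    pair_edges: dict[tuple[int, int], list[tuple[str, str]]] = defaultdict(list)
    for (a, b), w in sorted(edge_weight.items()):
        ca, cb = comm[a], comm[b]
        if ca == cb:
            continue  # intra-community edge, not a bridge
        pair = (min(ca, cb), max(ca, cb))
        pair_edges[pair].append((a, b))
        pair_count[pair] += w  # count every parallel edge, not once per pair
    # Rarest links first; ties broken by community ids for determinism.
    ranked = sorted(pair_count, key=lambda p: (pair_count[p], p))
    return [(p, pair_count[p], pair_edges[p]) for p in ranked[:limit]]
```

With three parallel a→c edges plus a b→d edge, communities 0 and 1 report an edge count of 4 and rank below the genuinely rare single a→e link.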


… ids

Symbol ids are assigned in file-walk order, which differs across OSes, so the
community partition / god-node ranking (and thus the architecture/path/describe
golden snapshots) diverged between Windows and the Linux/macOS CI runners.

Key the analysis graph by content (kind:path:name:line) instead. build_adjacency
takes an optional key_fn; analyze() passes the stable key so the result is
identical on every platform. The community/degree helpers are generic over the
node-key type (tuple in tests, str in analyze/export).
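A sketch of the idea in this commit: key the adjacency by a content-derived string (kind:path:name:line) rather than by walk-order ids, so degree ranking no longer depends on the OS. The Node record, the god_nodes helper and its key-based tie-break are hypothetical. build_adjacency's optional key_fn mirrors the commit message, but its exact signature here is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable, Hashable, Iterable, Optional


@dataclass(frozen=True)
class Node:
    id: int  # assigned in file-walk order, so NOT stable across OSes
    kind: str
    path: str
    name: str
    line: int


def stable_key(n: Node) -> str:
    """Content-derived key: identical on every platform."""
    return f"{n.kind}:{n.path}:{n.name}:{n.line}"


def build_adjacency(
    edges: Iterable[tuple[Node, Node]],
    key_fn: Optional[Callable[[Node], Hashable]] = None,
) -> dict[Hashable, dict[Hashable, int]]:
    """Undirected weighted adjacency keyed by key_fn (default: the raw id)."""
    key = key_fn or (lambda n: n.id)
    adj: dict[Hashable, dict[Hashable, int]] = defaultdict(lambda: defaultdict(int))
    for src, dst in edges:
        a, b = key(src), key(dst)
        if a == b:
            continue  # self-edges (e.g. recursion) are skipped in this sketch
        adj[a][b] += 1
        adj[b][a] += 1
    return {k: dict(v) for k, v in adj.items()}


def god_nodes(adj: dict[Any, dict[Any, int]], top: int = 3) -> list[Any]:
    """Most-connected nodes; ties broken by key so the order is reproducible."""
    return sorted(adj, key=lambda k: (-sum(adj[k].values()), k))[:top]
```

Keyed by id, two walks over the same symbols can rank different nodes first when degrees tie. Keyed by stable_key, both walks produce the same ranking.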

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
denfry merged commit 7d3340d into main Jun 23, 2026
10 checks passed
denfry deleted the feat/graph-confidence-analysis branch June 23, 2026 21:12