11 changes: 11 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,17 @@ All notable changes to this project are documented here. The format is based on

## [Unreleased]

### Added — graph navigation: `path` and `describe`
- **`codebase-index path <A> <B>`** — shortest undirected dependency/call path
between two symbols or files ("how is X connected to Y"). Renders the node chain
annotated with each link's edge type and confidence; `inferred`/`ambiguous` hops
are marked, so a path is only as trustworthy as its weakest edge.
- **`codebase-index describe <symbol>`** — a node card: definition(s), direct
callers and callees (with confidence), in/out degree, the symbol's module, and
its god-node rank if it has one. The graphify `explain Symbol` idea, named
`describe` so it doesn't collide with the existing how-it-works `explain`.
- **`path_between` and `describe_symbol` MCP tools** expose both to agents.
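
The "only as trustworthy as its weakest edge" rule can be illustrated with a small sketch. It assumes the three confidence labels rank `extracted` > `inferred` > `ambiguous` (the changelog names the labels but does not state an ordering), and the `steps` list and its `calls`/`imports` edge types are hypothetical sample data, not output from the tool.

```python
from typing import Optional

# Assumed ranking, strongest first. The source names these labels but does
# not specify how they are ordered.
CONFIDENCE_ORDER = ["extracted", "inferred", "ambiguous"]


def weakest_confidence(steps: list[dict]) -> Optional[str]:
    """Return the least certain confidence label among a path's steps."""
    if not steps:
        return None

    def rank(label: str) -> int:
        # Unknown labels sort after every known one, so they count as weakest.
        if label in CONFIDENCE_ORDER:
            return CONFIDENCE_ORDER.index(label)
        return len(CONFIDENCE_ORDER)

    return max((s["confidence"] for s in steps), key=rank)


# Hypothetical steps, shaped like the "steps" entries of a path payload.
example_steps = [
    {"edge_type": "calls", "confidence": "extracted"},
    {"edge_type": "calls", "confidence": "inferred"},
    {"edge_type": "imports", "confidence": "extracted"},
]
weakest = weakest_confidence(example_steps)
```

A caller could use the returned label to decide whether a path is solid enough to act on or should be verified by reading the code.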

### Added — `architecture` command + `architecture_overview` MCP tool
- **`codebase-index architecture`** prints a high-level map of the codebase from
the analytics cached at index time: detected modules (with auto-derived labels),
10 changes: 8 additions & 2 deletions README.md
@@ -100,8 +100,8 @@ See [CHANGELOG.md](CHANGELOG.md) and

MCP is now available as a stdio server via `codebase-index mcp --root <repo>`.
It exposes `healthcheck`, `search_code`, `find_symbol`, `find_refs`,
-`impact_of`, `explain_code`, `architecture_overview`, and `index_stats`;
-see [docs/MCP.md](docs/MCP.md).
+`impact_of`, `explain_code`, `architecture_overview`, `path_between`,
+`describe_symbol`, and `index_stats`; see [docs/MCP.md](docs/MCP.md).

```
You: "Where is user authentication implemented?"
@@ -391,6 +391,12 @@ codebase-index impact "src/auth/AuthService.ts"
# Map the codebase: modules, god nodes, surprising links, suggested questions
codebase-index architecture

# How are two symbols/files connected? Shortest dependency/call path
codebase-index path "renew" "refresh_access_token"

# Node card: definition, callers, callees, centrality, module
codebase-index describe "Database"

# View index statistics
codebase-index stats

6 changes: 4 additions & 2 deletions docs/MCP.md
@@ -38,6 +38,8 @@ The MCP server exposes the same retrieval contract as the CLI.
| `impact_of` | Return affected files/symbols from graph expansion | `impact` |
| `explain_code` | Intent-aware retrieval packet for a natural-language question | `explain` |
| `architecture_overview` | Modules, god nodes, surprising connections, suggested questions | `architecture` |
| `path_between` | Shortest dependency/call path between two symbols or files | `path` |
| `describe_symbol` | Node card: definition, callers, callees, centrality, module | `describe` |
| `index_stats` | Return counts, language coverage, graph stats, freshness | `stats` |

## Output contract
@@ -66,7 +68,7 @@ branch on the contract without sniffing the shape:
version. The current version is **1**.
- `tool` (string) — the emitting tool name (`search_code`, `find_symbol`,
`find_refs`, `impact_of`, `explain_code`, `architecture_overview`,
-`index_stats`, `healthcheck`).
+`path_between`, `describe_symbol`, `index_stats`, `healthcheck`).
- The no-index / error path carries the same envelope plus an `"error"` field.
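
A client might branch on this envelope roughly as follows. This is a sketch under assumptions: it treats `schema_version` as the integer `1`, and the sample payloads (including the `found` and `hops` fields and the error text) are made up for illustration. Only the `schema_version`, `tool`, and `error` field names come from the contract described above.

```python
import json

SUPPORTED_SCHEMA_VERSION = 1  # the contract above states the current version is 1


def read_envelope(raw: str) -> dict:
    """Parse a tool response and classify it using only the envelope fields."""
    payload = json.loads(raw)
    version = payload.get("schema_version")
    if version != SUPPORTED_SCHEMA_VERSION:
        raise ValueError(f"unsupported schema_version: {version!r}")
    return {
        "tool": payload.get("tool"),
        # The no-index / error path carries the same envelope plus "error".
        "ok": "error" not in payload,
        "payload": payload,
    }


# Hypothetical responses; every field beyond the envelope is illustrative only.
ok_raw = json.dumps({"schema_version": 1, "tool": "path_between", "found": True, "hops": 2})
err_raw = json.dumps({"schema_version": 1, "tool": "path_between", "error": "no index"})

ok_result = read_envelope(ok_raw)
err_result = read_envelope(err_raw)
```

Checking the version before reading anything else lets a client fail loudly on a future contract change instead of misreading a payload.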

Rules:
@@ -158,7 +160,7 @@ same trust boundaries:
- Done: `src/codebase_index/mcp/server.py` thin adapter over retrieval/storage code.
- Done: `codebase-index mcp --root <path>` CLI entrypoint.
- Done: `healthcheck`, `search_code`, `find_symbol`, `find_refs`, `impact_of`, `explain_code`,
-`architecture_overview`, and `index_stats` tools.
+`architecture_overview`, `path_between`, `describe_symbol`, and `index_stats` tools.
- Done: focused tests for tool registration, missing-index behavior, config resolution, and run entrypoint.
- Done: explicit `schema_version` + `tool` envelope on every structured tool payload (including the
error path), asserted by `tests/test_mcp_server.py` and `tests/test_mcp_golden.py`.
39 changes: 39 additions & 0 deletions src/codebase_index/cli.py
@@ -521,6 +521,45 @@ def architecture(
)


@app.command("path")
def path_between(
    ctx: typer.Context,
    source: str = typer.Argument(..., help="File path or symbol name to start from."),
    target: str = typer.Argument(..., help="File path or symbol name to reach."),
    json_flag: bool = typer.Option(False, "--json", help="Emit machine-readable JSON."),
) -> None:
    """Shortest dependency/call path between two symbols or files (how are they connected)."""
    from .graph.navigate import path_payload
    from .output import json as json_renderer
    from .output import markdown as md_renderer
    from .storage.db import Database

    is_json = json_flag or bool(ctx.obj and ctx.obj.get("json"))
    db_path, _cfg = _ensure_index(ctx)
    with Database(db_path) as db:
        payload = path_payload(db.conn, source, target)
    typer.echo(json_renderer.render(payload) if is_json else md_renderer.render_path(payload))


@app.command("describe")
def describe(
    ctx: typer.Context,
    symbol: str = typer.Argument(..., help="Symbol name to describe."),
    json_flag: bool = typer.Option(False, "--json", help="Emit machine-readable JSON."),
) -> None:
    """Node card for a symbol: definition, callers, callees, centrality, module."""
    from .graph.navigate import describe_payload
    from .output import json as json_renderer
    from .output import markdown as md_renderer
    from .storage.db import Database

    is_json = json_flag or bool(ctx.obj and ctx.obj.get("json"))
    db_path, _cfg = _ensure_index(ctx)
    with Database(db_path) as db:
        payload = describe_payload(db.conn, symbol)
    typer.echo(json_renderer.render(payload) if is_json else md_renderer.render_describe(payload))


@app.command("graph")
def graph_view(
ctx: typer.Context,
201 changes: 201 additions & 0 deletions src/codebase_index/graph/navigate.py
@@ -0,0 +1,201 @@
"""Graph navigation: shortest path between two nodes, and a node "card".

graphify ships `path A B` (how are two things connected?) and `explain Symbol`
(what is this node?). codebase-index already uses `explain` for how-it-works
retrieval, so the node card lives under `describe` to avoid colliding with it.

Both walk the *resolved* edge graph and carry the Phase-1 confidence trail, so a
path through an `inferred`/`ambiguous` edge is visibly less certain than one
through `extracted` edges.
"""

from __future__ import annotations

import sqlite3
from collections import deque
from typing import Optional

from ..storage import repo

# BFS safety valve: stop exploring after this many nodes so `path` stays cheap on
# very large graphs (the shortest path, if short, is found long before this).
_MAX_VISITS = 20000

Node = tuple[str, int]


def _freshness(conn: sqlite3.Connection) -> dict:
    return {
        "exists": True,
        "stale": False,
        "built_at": repo.get_meta(conn, "built_at"),
        "head_commit": repo.get_meta(conn, "head_commit"),
    }


def _resolve_targets(conn: sqlite3.Connection, token: str) -> list[Node]:
    """Resolve a path/symbol token to one or more graph nodes (file or symbols)."""
    frow = repo.file_by_path(conn, token)
    if frow is not None:
        return [("file", int(frow["id"]))]

Review comment (P2): Include file-contained symbols in path resolution

When a user supplies a file path as either endpoint, this seeds only the file node. The graph stores imports on file nodes, but ordinary call/reference edges live on the symbols inside the file, and no containment edge connects a file to its symbols. As a result, `codebase-index path src/api/service.py refresh_access_token` reports no path even when a function in `service.py` directly calls that target. Seed the file's symbols as well (as `impact` does) or add containment edges; otherwise the advertised symbols-or-files path mode misses common file-to-symbol and file-to-file connections.
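
One way to apply the first suggestion is to seed the file node together with every symbol defined in that file. The sketch below is not the project's code: `seeds_for_file` is a hypothetical helper, and the two-table schema is a cut-down stand-in that assumes only the `symbols.file_id` to `files.id` relationship visible in the `_node_ref` join.

```python
import sqlite3

Node = tuple[str, int]


def seeds_for_file(conn: sqlite3.Connection, file_id: int) -> list[Node]:
    """Hypothetical helper: the file node plus every symbol defined in it."""
    rows = conn.execute(
        "SELECT id FROM symbols WHERE file_id = ? ORDER BY id", (file_id,)
    ).fetchall()
    return [("file", file_id)] + [("symbol", int(r[0])) for r in rows]


# Minimal in-memory stand-in for the index; the real schema has more columns.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE files (id INTEGER PRIMARY KEY, path TEXT);
    CREATE TABLE symbols (id INTEGER PRIMARY KEY, name TEXT, file_id INTEGER);
    INSERT INTO files VALUES (1, 'src/api/service.py');
    INSERT INTO symbols VALUES (10, 'renew', 1), (11, 'helper', 1);
    """
)
seeds = seeds_for_file(conn, 1)
```

With seeds like these, a multi-source BFS could start from the file's symbols, so call edges out of functions in `service.py` become reachable from the file endpoint.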


    sym_rows = repo.symbols_by_name(conn, token, exact=True)
    if sym_rows:
        return [("symbol", int(r["id"])) for r in sym_rows]
    suffix = repo.files_with_suffix(conn, token)
    if len(suffix) == 1:
        return [("file", int(suffix[0]["id"]))]
    return []


def _node_ref(conn: sqlite3.Connection, kind: str, node_id: int) -> Optional[dict]:
    if kind == "file":
        row = conn.execute("SELECT path FROM files WHERE id = ?", (node_id,)).fetchone()
        if row is None:
            return None
        return {"kind": "file", "name": row["path"].rsplit("/", 1)[-1], "path": row["path"],
                "line_start": None}
    row = conn.execute(
        "SELECT s.name AS name, s.kind AS kind, s.line_start AS line_start, f.path AS path "
        "FROM symbols s JOIN files f ON f.id = s.file_id WHERE s.id = ?",
        (node_id,),
    ).fetchone()
    if row is None:
        return None
    return {"kind": "symbol", "name": row["name"], "symbol_kind": row["kind"],
            "path": row["path"], "line_start": row["line_start"]}


def _undirected_neighbors(conn: sqlite3.Connection, kind: str, node_id: int):
    """Yield (next_kind, next_id, edge_type, confidence, direction) ignoring edge
    direction — `path` answers "how are these connected", not "who calls whom"."""
    for e in repo.incoming_edges(conn, kind, node_id):
        yield e["src_kind"], int(e["src_id"]), e["edge_type"], e["confidence"], "in"
    for e in repo.outgoing_edges(conn, kind, node_id):
        if e["dst_id"] is not None:
            yield e["dst_kind"], int(e["dst_id"]), e["edge_type"], e["confidence"], "out"


# ---------------------------------------------------------------------------
# path A B
# ---------------------------------------------------------------------------

def path_payload(conn: sqlite3.Connection, src: str, dst: str) -> dict:
    """Shortest undirected path between two nodes, with the edge audit trail."""
    src_seeds = _resolve_targets(conn, src)
    dst_seeds = set(_resolve_targets(conn, dst))
    base = {"src": src, "dst": dst, "index": _freshness(conn), "nodes": [], "steps": []}
    if not src_seeds or not dst_seeds:
        missing = src if not src_seeds else dst
        return {**base, "found": False, "reason": f"Could not resolve `{missing}` to an indexed node."}

    # Multi-source BFS from every src node; stop at the first dst node reached.
    parent: dict[Node, Optional[Node]] = {seed: None for seed in src_seeds}
    via: dict[Node, tuple] = {}
    queue: deque[Node] = deque(src_seeds)
    found: Optional[Node] = None
    visits = 0
    while queue and visits < _MAX_VISITS:
        node = queue.popleft()
        visits += 1
        if node in dst_seeds:
            found = node
            break
        for nk, nid, etype, conf, direction in _undirected_neighbors(conn, *node):
            nxt = (nk, nid)
            if nxt not in parent:
                parent[nxt] = node
                via[nxt] = (etype, conf, direction)
                queue.append(nxt)

    if found is None:
        return {**base, "found": False,
                "reason": "No path found between the two nodes in the resolved graph."}

    # Reconstruct from `found` back to a src seed.
    chain: list[Node] = []
    cur: Optional[Node] = found
    while cur is not None:
        chain.append(cur)
        cur = parent[cur]
    chain.reverse()

    nodes = [ref for n in chain if (ref := _node_ref(conn, *n)) is not None]
    steps = []
    for prev, nxt in zip(chain, chain[1:]):
        etype, conf, direction = via[nxt]
        a, b = _node_ref(conn, *prev), _node_ref(conn, *nxt)
        if a and b:
            steps.append({"from": a, "to": b, "edge_type": etype,
                          "confidence": conf, "direction": direction})
    return {**base, "found": True, "hops": len(steps), "nodes": nodes, "steps": steps}
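

# Editorial sketch, not part of the diff: the same BFS idea on a plain in-memory
# edge list, so it runs without the index database. The edge data and the
# `budget_exhausted` flag are assumptions added for illustration; the function
# above reports "No path found" whether the queue drained or the visit cap hit.
#
# ```python
from collections import deque


def shortest_undirected_path(edges, src, dst, max_visits=20000):
    """BFS over directed (src, dst, confidence) edges, ignoring direction."""
    neighbors = {}
    for a, b, conf in edges:
        neighbors.setdefault(a, []).append((b, conf, "out"))
        neighbors.setdefault(b, []).append((a, conf, "in"))

    parent = {src: None}
    via = {}
    queue = deque([src])
    visits = 0
    found = False
    while queue and visits < max_visits:
        node = queue.popleft()
        visits += 1
        if node == dst:
            found = True
            break
        for nxt, conf, direction in neighbors.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                via[nxt] = (conf, direction)
                queue.append(nxt)

    if not found:
        # A non-empty queue means the visit cap stopped the search early.
        return {"found": False, "budget_exhausted": bool(queue)}

    chain = []
    cur = dst
    while cur is not None:
        chain.append(cur)
        cur = parent[cur]
    chain.reverse()
    steps = [
        {"from": a, "to": b, "confidence": via[b][0], "direction": via[b][1]}
        for a, b in zip(chain, chain[1:])
    ]
    return {"found": True, "hops": len(steps), "nodes": chain, "steps": steps}


# Hypothetical edges: `renew` and `refresh_access_token` both call `session`.
sample_edges = [
    ("renew", "session", "extracted"),
    ("refresh_access_token", "session", "inferred"),
    ("renew", "audit", "extracted"),
]
result = shortest_undirected_path(sample_edges, "renew", "refresh_access_token")
# ```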


# ---------------------------------------------------------------------------
# describe <symbol>
# ---------------------------------------------------------------------------

def describe_payload(conn: sqlite3.Connection, query: str) -> dict:
    """A node card: definition(s), callers, callees, centrality, module, god status."""
    base = {"query": query, "index": _freshness(conn)}
    sym_rows = repo.symbols_by_name(conn, query, exact=True)
    if not sym_rows:
        return {**base, "found": False,
                "reason": f"No symbol named `{query}` is indexed. Try `search` or `symbol`."}

    definitions = [
        {
            "name": r["name"],
            "qualified": r["qualified"],
            "kind": r["kind"],
            "path": r["path"],
            "line_start": r["line_start"],
            "line_end": r["line_end"],
            "signature": r["signature"],
            "in_degree": int(r["in_degree"]),
            "out_degree": int(r["out_degree"]),
        }
        for r in sym_rows
    ]
    # Primary = most-connected definition (the one worth describing in depth).
    primary_row = max(sym_rows, key=lambda r: int(r["in_degree"]) + int(r["out_degree"]))
    primary_id = int(primary_row["id"])

    callers = [
        {"path": r["path"], "line": r["line"], "confidence": r["confidence"]}
        for r in repo.refs_for_name(conn, query)
    ]
    callees = []
    for e in repo.outgoing_edges(conn, "symbol", primary_id):
        if e["dst_id"] is None:
            continue
        ref = _node_ref(conn, e["dst_kind"], int(e["dst_id"]))
        if ref is not None:
            callees.append({**ref, "edge_type": e["edge_type"], "confidence": e["confidence"]})

    module = primary_row["path"].rsplit("/", 1)[0] if "/" in primary_row["path"] else "(root)"
    god = _god_rank(conn, primary_row["name"], primary_row["path"])

    return {
        **base,
        "found": True,
        "definitions": definitions,
        "primary": {"name": primary_row["name"], "path": primary_row["path"],
                    "module": module, "god_rank": god,
                    "in_degree": int(primary_row["in_degree"]),
                    "out_degree": int(primary_row["out_degree"])},
        "callers": callers,
        "callees": callees,
    }
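

# Editorial sketch, not part of the diff: how the "primary" definition and its
# module label are chosen when a name has several definitions. The rows below
# are hypothetical sample data; `pick_primary` and `module_of` are illustrative
# names that mirror the two expressions used in `describe_payload`.
#
# ```python
def pick_primary(rows: list[dict]) -> dict:
    """Most-connected definition; on a tie, max() keeps the earliest row."""
    return max(rows, key=lambda r: int(r["in_degree"]) + int(r["out_degree"]))


def module_of(path: str) -> str:
    """Directory part of the path, or "(root)" for a top-level file."""
    return path.rsplit("/", 1)[0] if "/" in path else "(root)"


sample_rows = [
    {"name": "Database", "path": "tests/fakes.py", "in_degree": 1, "out_degree": 0},
    {"name": "Database", "path": "src/storage/db.py", "in_degree": 14, "out_degree": 6},
]
primary = pick_primary(sample_rows)
primary_module = module_of(primary["path"])
# ```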


def _god_rank(conn: sqlite3.Connection, name: str, path: str) -> Optional[int]:
    """1-based rank of this symbol among the cached god nodes, or None."""
    from . import analysis

    summary = analysis.load_analysis(conn)
    if not summary:
        return None
    for idx, g in enumerate(summary.get("god_nodes", []), start=1):
        if g.get("name") == name and g.get("path") == path:
            return idx
    return None
46 changes: 46 additions & 0 deletions src/codebase_index/mcp/server.py
@@ -283,6 +283,52 @@ def architecture_overview() -> str:
return _emit("architecture_overview", payload)


@_tool()
def path_between(source: str, target: str) -> str:
    """Shortest dependency/call path between two symbols or files.

    Answers "how is X connected to Y" — returns the chain of nodes and the edge
    types (with confidence) linking them. Useful for tracing how a request reaches
    the database, or how two modules touch.

    Args:
        source: File path (relative) or symbol name to start from.
        target: File path (relative) or symbol name to reach.
    """
    db_path, _ = _resolve_db()
    if not db_path.exists():
        return _emit("path_between", _no_index_payload())

    from ..graph.navigate import path_payload
    from ..storage.db import Database

    with Database(db_path) as db:
        payload = path_payload(db.conn, source, target)
    return _emit("path_between", payload)


@_tool()
def describe_symbol(symbol: str) -> str:
    """Node card for a symbol: definition(s), callers, callees, centrality, module.

    A compact "what is this and how does it sit in the graph" view — the in/out
    degree, its module, whether it's a god node, and its direct callers/callees.

    Args:
        symbol: Symbol name to describe (e.g. "Database", "build_index").
    """
    db_path, _ = _resolve_db()
    if not db_path.exists():
        return _emit("describe_symbol", _no_index_payload())

    from ..graph.navigate import describe_payload
    from ..storage.db import Database

    with Database(db_path) as db:
        payload = describe_payload(db.conn, symbol)
    return _emit("describe_symbol", payload)


@_tool()
def index_stats() -> str:
"""Return index freshness, file count, symbol count, and per-language coverage."""