10 changes: 10 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,16 @@ All notable changes to this project are documented here. The format is based on

## [Unreleased]

### Added — `architecture` command + `architecture_overview` MCP tool
- **`codebase-index architecture`** prints a high-level map of the codebase from
the analytics cached at index time: detected modules (with auto-derived labels),
god nodes (most-connected symbols/files), surprising cross-module connections,
and suggested starting questions. `--json` for the structured payload.
- **`architecture_overview` MCP tool** exposes the same map to MCP clients, so an
agent can orient itself before diving into specifics. Reports
`available: false` (rather than crashing) on an index built before the analytics
existed; a reindex fixes it.

### Added — graph foundation: edge confidence + architecture analytics (requires a one-time reindex)
- **Edge confidence audit trail.** Every graph edge now carries a `confidence`:
`extracted` (exact — a same-file symbol or repo-unique name), `inferred` (a
6 changes: 5 additions & 1 deletion README.md
@@ -100,7 +100,8 @@ See [CHANGELOG.md](CHANGELOG.md) and

MCP is now available as a stdio server via `codebase-index mcp --root <repo>`.
It exposes `healthcheck`, `search_code`, `find_symbol`, `find_refs`,
`impact_of`, `explain_code`, and `index_stats`; see [docs/MCP.md](docs/MCP.md).
`impact_of`, `explain_code`, `architecture_overview`, and `index_stats`;
see [docs/MCP.md](docs/MCP.md).

```
You: "Where is user authentication implemented?"
@@ -387,6 +388,9 @@ codebase-index refs "AuthService.login"
# Analyze impact of a change
codebase-index impact "src/auth/AuthService.ts"

# Map the codebase: modules, god nodes, surprising links, suggested questions
codebase-index architecture

# View index statistics
codebase-index stats

6 changes: 4 additions & 2 deletions docs/MCP.md
@@ -37,6 +37,7 @@ The MCP server exposes the same retrieval contract as the CLI.
| `find_refs` | Return callers/references for a symbol | `refs` |
| `impact_of` | Return affected files/symbols from graph expansion | `impact` |
| `explain_code` | Intent-aware retrieval packet for a natural-language question | `explain` |
| `architecture_overview` | Modules, god nodes, surprising connections, suggested questions | `architecture` |
| `index_stats` | Return counts, language coverage, graph stats, freshness | `stats` |

## Output contract
@@ -64,7 +65,8 @@ branch on the contract without sniffing the shape:
breaking change (field removal or type change); additive fields keep the same
version. The current version is **1**.
- `tool` (string) — the emitting tool name (`search_code`, `find_symbol`,
`find_refs`, `impact_of`, `explain_code`, `index_stats`, `healthcheck`).
`find_refs`, `impact_of`, `explain_code`, `architecture_overview`,
`index_stats`, `healthcheck`).
- The no-index / error path carries the same envelope plus an `"error"` field.

Rules:
@@ -156,7 +158,7 @@ same trust boundaries:
- Done: `src/codebase_index/mcp/server.py` thin adapter over retrieval/storage code.
- Done: `codebase-index mcp --root <path>` CLI entrypoint.
- Done: `healthcheck`, `search_code`, `find_symbol`, `find_refs`, `impact_of`, `explain_code`,
and `index_stats` tools.
`architecture_overview`, and `index_stats` tools.
- Done: focused tests for tool registration, missing-index behavior, config resolution, and run entrypoint.
- Done: explicit `schema_version` + `tool` envelope on every structured tool payload (including the
error path), asserted by `tests/test_mcp_server.py` and `tests/test_mcp_golden.py`.
22 changes: 22 additions & 0 deletions src/codebase_index/cli.py
@@ -499,6 +499,28 @@ def explain(
typer.echo(json_renderer.render(payload) if want_json else md_renderer.render(payload))


@app.command("architecture")

[P2] Add `architecture` to the `cbx` allowlist

Skill/plugin users who invoke the supported `cbx` wrapper instead of the raw `codebase-index` binary are still blocked from this new subcommand. `skill/scripts/cbx`, `skill/scripts/cbx.ps1`, the packaged template copies, and `bin/cbx` only allow `search`, `explain`, `symbol`, `refs`, `impact`, `graph`, `stats`, `doctor`, `update`, and `index`, so `cbx architecture` exits 2 before reaching this command. Please add `architecture` to those synced allowlists so the advertised command works through the bundled wrappers.

def architecture(
    ctx: typer.Context,
    json_flag: bool = typer.Option(False, "--json", help="Emit machine-readable JSON."),
) -> None:
    """High-level map of the codebase: modules, god nodes, surprising links, questions.

    Reads the analytics cached at index time (no recompute). Rebuild the index if it
    reports no analysis available.
    """
    from .output import json as json_renderer
    from .output import markdown as md_renderer
    from .service import architecture_payload

    is_json = json_flag or bool(ctx.obj and ctx.obj.get("json"))
    db_path, cfg = _ensure_index(ctx)
    payload = architecture_payload(db_path, cfg)
    typer.echo(
        json_renderer.render(payload) if is_json else md_renderer.render_architecture(payload)
    )
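
Note on the `is_json` line: it lets either the subcommand's own `--json` option or a value stored on the shared context switch on JSON output. A minimal standalone sketch of that resolution, assuming `ctx.obj` is either `None` or a dict with an optional `"json"` key. The helper name `wants_json` is hypothetical and not part of this PR.

```python
from typing import Any, Optional


def wants_json(local_flag: bool, ctx_obj: Optional[dict[str, Any]]) -> bool:
    """Return True if JSON was requested locally or via the shared context."""
    # `ctx_obj and ...` short-circuits on None or an empty dict, so bool() gives False.
    return local_flag or bool(ctx_obj and ctx_obj.get("json"))


if __name__ == "__main__":
    print(wants_json(False, None))            # neither source asks for JSON
    print(wants_json(False, {"json": True}))  # requested via the shared context
    print(wants_json(True, {}))               # requested via the local flag
```

Wrapping the right-hand side in `bool()` keeps the result a real boolean even when the context object is `None`.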


@app.command("graph")
def graph_view(
ctx: typer.Context,
22 changes: 21 additions & 1 deletion src/codebase_index/mcp/server.py
@@ -41,7 +41,8 @@
instructions=(
"Local codebase index. Use search_code for general queries, find_symbol for exact "
"symbol lookups, find_refs to find callers/usages, impact_of for blast-radius analysis, "
"and explain_code for architecture/how-it-works questions."
"explain_code for architecture/how-it-works questions, and architecture_overview to map "
"the codebase's modules, god nodes, and surprising connections before diving in."
),
)

@@ -263,6 +264,25 @@ def explain_code(
return _emit("explain_code", payload)


@_tool()
def architecture_overview() -> str:
    """High-level map of the codebase from the cached graph analytics.

    Returns the detected modules (communities), god nodes (most-connected
    symbols/files), surprising cross-module connections, and suggested starting
    questions. Use this to orient before diving into specifics. Rebuild the index
    if it reports ``available: false``.
    """
    db_path, cfg = _resolve_db()
    if not db_path.exists():
        return _emit("architecture_overview", _no_index_payload())

    from ..service import architecture_payload

    payload = architecture_payload(db_path, cfg)
    return _emit("architecture_overview", payload)
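
Note on the envelope: docs/MCP.md states that every structured payload carries `schema_version` (currently 1) and `tool`. `_emit` itself is not shown in this diff, so the following is only a sketch of what such a wrapper and a matching client-side check might look like. `emit_envelope` and `parse_envelope` are hypothetical names, and the real helper may serialize or order fields differently.

```python
import json
from typing import Any

SCHEMA_VERSION = 1  # per docs/MCP.md, the current contract version


def emit_envelope(tool: str, payload: dict[str, Any]) -> str:
    """Hypothetical stand-in for `_emit`: wrap a payload in the documented envelope."""
    return json.dumps({"schema_version": SCHEMA_VERSION, "tool": tool, **payload}, sort_keys=True)


def parse_envelope(raw: str, expected_tool: str) -> dict[str, Any]:
    """Client-side check: branch on the envelope instead of sniffing the payload shape."""
    data = json.loads(raw)
    if data.get("schema_version") != SCHEMA_VERSION:
        raise ValueError(f"unsupported schema_version: {data.get('schema_version')!r}")
    if data.get("tool") != expected_tool:
        raise ValueError(f"unexpected tool: {data.get('tool')!r}")
    return data


if __name__ == "__main__":
    raw = emit_envelope("architecture_overview", {"exists": True, "available": False})
    print(parse_envelope(raw, "architecture_overview"))
```

A client written this way can reject an unknown contract version before it reads any tool-specific keys.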


@_tool()
def index_stats() -> str:
"""Return index freshness, file count, symbol count, and per-language coverage."""
56 changes: 56 additions & 0 deletions src/codebase_index/output/markdown.py
@@ -160,6 +160,62 @@ def render_refs(resp: RefsResponse) -> str:
return "\n".join(lines).rstrip() + "\n"


def render_architecture(payload: dict) -> str:
    """Render the architecture overview: modules, god nodes, surprising links, questions."""
    if not payload.get("available", False):
        reason = payload.get("reason", "No architecture analysis available.")
        return f"_{reason}_\n"

    idx = payload.get("index", {})
    freshness = "fresh" if not idx.get("stale") else "STALE"
    lines = [
        f"**Architecture overview** | **index:** {freshness} | "
        f"{payload.get('node_count', 0)} nodes · {payload.get('edge_count', 0)} edges · "
        f"{payload.get('community_count', 0)} modules · modularity {payload.get('modularity', 0)}",
        "",
    ]

    communities = payload.get("communities", [])
    if communities:
        lines.append("### Modules")
        lines.append("| # | module | size | key nodes |")
        lines.append("|---|--------|------|-----------|")
        for c in communities:
            tops = ", ".join(f"`{t['name']}`" for t in c.get("top_nodes", [])[:4])
            lines.append(f"| {c['id']} | {c['label']} | {c['size']} | {tops} |")
        lines.append("")

    gods = payload.get("god_nodes", [])
    if gods:
        lines.append("### God nodes (most-connected)")
        lines.append("| node | kind | degree | location |")
        lines.append("|------|------|--------|----------|")
        for g in gods:
            loc = g.get("path") or ""
            lines.append(f"| `{g['name']}` | {g['kind']} | {g['degree']} | `{loc}` |")
        lines.append("")

    surprising = payload.get("surprising", [])
    if surprising:
        lines.append("### Surprising connections (cross-module bridges)")
        for s in surprising:
            fr, to = s["from"], s["to"]
            lines.append(
                f"- `{fr['name']}` ({fr.get('path') or '?'}) ↔ "
                f"`{to['name']}` ({to.get('path') or '?'}) — {s['edge_count']} edge(s)"
            )
        lines.append("")

    questions = payload.get("questions", [])
    if questions:
        lines.append("### Suggested questions")
        for q in questions:
            lines.append(f"- {q}")
        lines.append("")

    return "\n".join(lines).rstrip() + "\n"


def _header(query: str, exists: bool, stale: bool) -> str:
freshness = "fresh" if not stale else "STALE"
if not exists:
27 changes: 27 additions & 0 deletions src/codebase_index/service.py
@@ -98,6 +98,33 @@ def search_payload(
)


def architecture_payload(db_path: Path, cfg: "Config") -> dict[str, Any]:
    """The cached architecture analytics (communities / god nodes / surprising /
    questions) plus index freshness — the payload both CLI and MCP serialize.

    Returns ``available: False`` when no analysis is cached (an index built before
    this feature, or an empty graph); the caller tells the user to reindex.
    """
    from .graph import analysis
    from .indexer.freshness import compute_freshness
    from .storage.db import Database

    with Database(db_path) as db:
        fresh = compute_freshness(db.conn, Path(cfg.root), cfg)
        summary = analysis.load_analysis(db.conn)
    if summary is None:
        return {
            "exists": True,
            "available": False,
            "reason": (
                "No architecture analysis cached. Rebuild the index "
                "(`codebase-index index`) to compute it."
            ),
            "index": fresh.model_dump(),
        }
    return {"exists": True, "available": True, "index": fresh.model_dump(), **summary}
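
Note for consumers: the CLI and the MCP tool receive the same dict, so a caller can branch on `available` before touching any analytics keys. A small sketch, assuming only keys visible in this diff (`available`, `reason`, `community_count`, `god_nodes`). `headline` is a hypothetical helper, not something this PR adds.

```python
from typing import Any


def headline(payload: dict[str, Any]) -> str:
    """One-line summary of an architecture payload, tolerant of the unavailable case."""
    if not payload.get("available", False):
        # Mirror the fallback path: surface the reason instead of reading missing keys.
        return payload.get("reason", "No architecture analysis available.")
    gods = payload.get("god_nodes", [])
    top = gods[0]["name"] if gods else "n/a"
    return f"{payload.get('community_count', 0)} modules; most-connected node: {top}"


if __name__ == "__main__":
    print(headline({"exists": True, "available": False, "reason": "Rebuild the index."}))
    print(headline({"available": True, "community_count": 3,
                    "god_nodes": [{"name": "service.py", "degree": 2}]}))
```

Reading everything through `.get` with defaults keeps the helper safe against an index built before the analytics existed.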


def stats_payload(conn: sqlite3.Connection) -> dict[str, Any]:
"""Index size, freshness, and per-language coverage with the graph tier."""
from .parsers.languages import has_full_graph
151 changes: 151 additions & 0 deletions tests/golden/architecture.json
@@ -0,0 +1,151 @@
{
"available": true,
"communities": [
{
"id": 0,
"label": "src/api",
"size": 3,
"top_nodes": [
{
"degree": 2,
"kind": "file",
"name": "service.py",
"path": "src/api/service.py"
},
{
"degree": 1,
"kind": "file",
"name": "token.py",
"path": "src/auth/token.py"
},
{
"degree": 1,
"kind": "file",
"name": "user.py",
"path": "src/models/user.py"
}
]
},
{
"id": 1,
"label": "src/auth",
"size": 3,
"top_nodes": [
{
"degree": 2,
"kind": "symbol",
"name": "refresh_access_token",
"path": "src/auth/token.py"
},
{
"degree": 1,
"kind": "symbol",
"name": "renew",
"path": "src/api/service.py"
},
{
"degree": 1,
"kind": "symbol",
"name": "login",
"path": "src/auth/token.py"
}
]
},
{
"id": 2,
"label": "src/api",
"size": 2,
"top_nodes": [
{
"degree": 1,
"kind": "symbol",
"name": "AdminUser",
"path": "src/api/service.py"
},
{
"degree": 1,
"kind": "symbol",
"name": "User",
"path": "src/models/user.py"
}
]
}
],
"community_count": 3,
"edge_count": 5,
"exists": true,
"god_nodes": [
{
"community": 0,
"degree": 2,
"kind": "file",
"name": "service.py",
"path": "src/api/service.py"
},
{
"community": 1,
"degree": 2,
"kind": "symbol",
"name": "refresh_access_token",
"path": "src/auth/token.py"
},
{
"community": 0,
"degree": 1,
"kind": "file",
"name": "token.py",
"path": "src/auth/token.py"
},
{
"community": 0,
"degree": 1,
"kind": "file",
"name": "user.py",
"path": "src/models/user.py"
},
{
"community": 2,
"degree": 1,
"kind": "symbol",
"name": "AdminUser",
"path": "src/api/service.py"
},
{
"community": 1,
"degree": 1,
"kind": "symbol",
"name": "renew",
"path": "src/api/service.py"
},
{
"community": 1,
"degree": 1,
"kind": "symbol",
"name": "login",
"path": "src/auth/token.py"
},
{
"community": 2,
"degree": 1,
"kind": "symbol",
"name": "User",
"path": "src/models/user.py"
}
],
"index": {
"built_at": "<TS>",
"exists": true,
"files_changed_since_build": 0,
"head_commit": "<SHA>",
"stale": false
},
"modularity": 0.82,
"node_count": 8,
"questions": [
"What is the role of `service.py` in the architecture?",
"How does `refresh_access_token` work?",
"What breaks if `refresh_access_token` changes?",
"What is the role of `token.py` in the architecture?"
],
"surprising": []
}
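
Note on the fixture: a golden file pins the exact payload, but a couple of structural properties can also be checked directly. The sketch below tests two properties that hold in this fixture: `community_count` equals the number of `communities` entries, and `god_nodes` is ordered by non-increasing `degree`. Whether the analytics code guarantees either property in general is an assumption, and `check_overview` is a hypothetical helper rather than part of the test suite.

```python
from typing import Any


def check_overview(payload: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems: list[str] = []
    communities = payload.get("communities", [])
    if payload.get("community_count") != len(communities):
        problems.append("community_count does not match len(communities)")
    degrees = [g["degree"] for g in payload.get("god_nodes", [])]
    # Compare each degree with its successor; any increase breaks the ordering.
    if any(a < b for a, b in zip(degrees, degrees[1:])):
        problems.append("god_nodes is not sorted by descending degree")
    return problems


if __name__ == "__main__":
    sample = {
        "community_count": 2,
        "communities": [{"id": 0}, {"id": 1}],
        "god_nodes": [{"name": "service.py", "degree": 2}, {"name": "token.py", "degree": 1}],
    }
    print(check_overview(sample))
```

Returning a list of messages instead of raising on the first failure lets one test run report every violated property.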