MCP server and tool contract

codebase-index ships a stdio MCP server powered by the optional mcp extra:

pip install "codebase-index[mcp]"
codebase-index mcp --root /path/to/repo

The server speaks MCP over stdio through FastMCP. Build the index with codebase-index index before connecting a client.
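As a rough illustration of connecting a client, the sketch below spawns the server over stdio with the MCP Python SDK and calls healthcheck. The helper names (server_args, call_healthcheck, healthcheck_sync) are hypothetical, and calling healthcheck with an empty arguments dict is an assumption, since the tool's argument schema is not documented here.

```python
import asyncio
import json
from typing import Any, Dict, List


def server_args(root: str) -> List[str]:
    """Arguments for the documented `codebase-index mcp --root <repo>` command."""
    return ["mcp", "--root", root]


async def call_healthcheck(root: str) -> Dict[str, Any]:
    """Spawn the stdio server, call healthcheck, and decode the JSON payload."""
    # Imported lazily so this module loads without the optional mcp extra.
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    params = StdioServerParameters(command="codebase-index", args=server_args(root))
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Assumption: healthcheck accepts no arguments.
            result = await session.call_tool("healthcheck", arguments={})
            # Tool responses are JSON strings carried in text content blocks.
            return json.loads(result.content[0].text)


def healthcheck_sync(root: str) -> Dict[str, Any]:
    """Blocking convenience wrapper for scripts."""
    return asyncio.run(call_healthcheck(root))
```

Running healthcheck_sync("/path/to/repo") requires the codebase-index executable on PATH and the mcp extra installed.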

Current shipped interfaces:

  • codebase-index / cbx CLI
  • codebase-index mcp --root <repo> stdio MCP server
  • Claude Code skill generated by codebase-index init --target claude
  • Codex AGENTS.md package generated by --target codex
  • OpenCode command/agent resources generated by --target opencode
  • Optional hooks and watch mode for freshness

Current non-goals:

  • No HTTP/SSE transport.
  • No streaming result events yet; use limit and token_budget.
  • Client-specific config paths are templates until verified against each client version.

Tools

The MCP server exposes the same retrieval contract as the CLI.

| Tool | Purpose | CLI equivalent |
| --- | --- | --- |
| healthcheck | Report server, package, config, index, freshness, and safety status | doctor, freshness checks |
| search_code | Hybrid code search returning ranked file:line ranges | search |
| find_symbol | Locate definitions by symbol name/kind | symbol |
| find_refs | Return callers/references for a symbol | refs |
| impact_of | Return affected files/symbols from graph expansion | impact |
| explain_code | Intent-aware retrieval packet for a natural-language question | explain |
| architecture_overview | Modules, god nodes, surprising connections, suggested questions | architecture |
| path_between | Shortest dependency/call path between two symbols or files | path |
| describe_symbol | Node card: definition, callers, callees, centrality, module | describe |
| index_stats | Return counts, language coverage, graph stats, freshness | stats |

Output contract

Tool responses are JSON strings returned through MCP content blocks. Every payload, whether success or error, is wrapped in a stable envelope so clients can branch on the contract without inspecting the payload shape:

{
  "schema_version": 1,
  "tool": "search_code",
  "index": {
    "exists": true,
    "stale": false,
    "built_at": "2026-05-29T12:00:00Z",
    "files_changed_since_build": 0
  },
  "results": [],
  "recommended_reads": []
}

  • schema_version (int): the payload contract version. Bumped only on a breaking change (field removal or type change); additive fields keep the same version. The current version is 1.
  • tool (string): the emitting tool name (search_code, find_symbol, find_refs, impact_of, explain_code, architecture_overview, path_between, describe_symbol, index_stats, healthcheck).
  • The no-index / error path carries the same envelope plus an "error" field.
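A client can branch on the envelope before looking at any tool-specific fields. The following is a minimal sketch, not the project's own client code: Envelope and parse_envelope are hypothetical names, and the type of the "error" field is treated as opaque because it is not specified above.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict

SUPPORTED_SCHEMA_VERSION = 1  # the current version stated in this document


@dataclass
class Envelope:
    schema_version: int
    tool: str
    index: Dict[str, Any]
    error: Any = None  # present only on the no-index / error path
    extra: Dict[str, Any] = field(default_factory=dict)  # tool-specific and additive fields


def parse_envelope(text: str, expected_tool: str) -> Envelope:
    """Decode one tool response and validate the envelope fields."""
    payload = json.loads(text)
    version = payload.get("schema_version")
    if version != SUPPORTED_SCHEMA_VERSION:
        raise ValueError(f"unsupported schema_version: {version!r}")
    if payload.get("tool") != expected_tool:
        raise ValueError(f"unexpected tool: {payload.get('tool')!r}")
    known = {"schema_version", "tool", "index", "error"}
    return Envelope(
        schema_version=version,
        tool=payload["tool"],
        index=payload.get("index", {}),
        error=payload.get("error"),
        # Unknown keys are kept rather than rejected, since additive fields
        # are allowed within a schema_version.
        extra={k: v for k, v in payload.items() if k not in known},
    )
```

Callers would typically check envelope.error first, then envelope.index["stale"], and only then read results from envelope.extra.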

Rules:

  • Additive fields are allowed within a schema_version.
  • Field removal or type changes bump schema_version.
  • Tool descriptions should include examples and expected failure modes.
  • Errors should fail closed: no partial unsafe result when config or index state is unsafe.
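The fail-closed rule can be illustrated from the server side. This is a sketch under stated assumptions rather than the code in server.py: build_response is a hypothetical helper, the error message text is invented for the example, and treating a missing index as the only unsafe state is a simplification.

```python
import json
from typing import Any, Dict, List

SCHEMA_VERSION = 1


def build_response(tool: str, index: Dict[str, Any], results: List[Any]) -> str:
    """Return the JSON envelope, dropping all results if the index is unusable."""
    envelope: Dict[str, Any] = {
        "schema_version": SCHEMA_VERSION,
        "tool": tool,
        "index": index,
    }
    if not index.get("exists", False):
        # Fail closed: an error field and empty results, never a partial payload.
        envelope["error"] = "index not found"  # hypothetical message text
        envelope["results"] = []
    else:
        envelope["results"] = results
    return json.dumps(envelope)
```

The error path keeps schema_version, tool, and index, so a client that validates the envelope handles both outcomes with the same code.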

Every tool's enveloped output is locked by golden snapshots in tests/golden/mcp_*.json (regenerate intentionally with UPDATE_GOLDEN=1 pytest tests/test_mcp_golden.py), and the schema_version / tool values are asserted explicitly so a golden can never silently freeze a wrong contract version.
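For readers unfamiliar with the pattern, a golden-snapshot check with an explicit contract assertion might look like the sketch below. check_golden is a hypothetical helper, not the code in tests/test_mcp_golden.py; only the UPDATE_GOLDEN=1 switch is taken from the text above.

```python
import json
import os
from pathlib import Path
from typing import Any, Dict


def check_golden(path: Path, payload: Dict[str, Any], tool: str) -> None:
    """Compare payload with the snapshot at path, or rewrite it on request."""
    # Assert the contract fields first, in both modes, so regenerating a
    # golden cannot freeze a wrong schema_version or tool name.
    assert payload["schema_version"] == 1
    assert payload["tool"] == tool

    rendered = json.dumps(payload, indent=2, sort_keys=True) + "\n"
    if os.environ.get("UPDATE_GOLDEN") == "1":
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(rendered, encoding="utf-8")
        return
    assert path.read_text(encoding="utf-8") == rendered
```

Sorting keys and fixing the indent keeps the snapshot diff stable, so a change in the file reflects a change in the payload rather than in serialization order.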

Client config templates

Claude Desktop

{
  "mcpServers": {
    "codebase-index": {
      "command": "codebase-index",
      "args": ["mcp", "--root", "/path/to/repo"]
    }
  }
}

Claude Code

{
  "mcpServers": {
    "codebase-index": {
      "command": "codebase-index",
      "args": ["mcp", "--root", "${PWD}"]
    }
  }
}

Cursor / VS Code / Zed / Windsurf

Use the client's MCP server configuration UI or JSON file and register:

{
  "name": "codebase-index",
  "command": "codebase-index",
  "args": ["mcp", "--root", "/path/to/repo"]
}

Client-specific config file paths and screenshots should be added only after they are verified against the current client versions.
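Where a client stores its servers under an mcpServers key, as in the Claude Desktop and Claude Code templates, the entry can be merged programmatically. The sketch below is illustrative only: register_server is a hypothetical helper, and it deliberately works on an in-memory dict because the config file locations are not verified here.

```python
import json
from typing import Any, Dict


def register_server(config: Dict[str, Any], root: str) -> Dict[str, Any]:
    """Return a copy of config with the codebase-index entry under mcpServers."""
    servers = dict(config.get("mcpServers", {}))  # keep other registered servers
    servers["codebase-index"] = {
        "command": "codebase-index",
        "args": ["mcp", "--root", root],
    }
    return {**config, "mcpServers": servers}


def render(config: Dict[str, Any]) -> str:
    """Serialize the config the way the templates above are formatted."""
    return json.dumps(config, indent=2)
```

For example, render(register_server({}, "/path/to/repo")) produces the Claude Desktop template shown earlier in this section.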

Progressive results

Large codebase queries currently support:

  • limit
  • token_budget

Future protocol work should add one of:

  • cursor for paging
  • token_budget to cap output (already supported today, as listed above)
  • progressive result events where supported by the MCP SDK

Agents should be able to stop after enough context rather than receiving a large static payload.
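To make the interaction of limit and token_budget concrete, here is a minimal sketch of capping a ranked result list. It does not describe how the server counts tokens: the four-characters-per-token estimate, the "snippet" key, and the function names are all assumptions made for the example.

```python
from typing import Dict, List


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about four characters per token."""
    return max(1, len(text) // 4)


def cap_results(
    results: List[Dict[str, str]], limit: int, token_budget: int
) -> List[Dict[str, str]]:
    """Keep ranked results in order until limit or token_budget is reached."""
    kept: List[Dict[str, str]] = []
    spent = 0
    for item in results[:limit]:  # limit bounds the number of results
        cost = estimate_tokens(item["snippet"])  # "snippet" is a hypothetical key
        if spent + cost > token_budget:
            break  # stop at the first result that would exceed the budget
        kept.append(item)
        spent += cost
    return kept
```

Because results are ranked, stopping at the first item that overflows the budget preserves the highest-ranked context and keeps the output a prefix of the full list, which is also the property a future cursor-based pager would rely on.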

Security requirements

The MCP server reads the same repository data as the CLI, so it inherits the same trust boundaries:

  • Respect .gitignore, .codeindexignore, .claudeignore, and .cursorignore.
  • Exclude secret filenames and binary/generated/dependency files before parsing.
  • Redact tokens, credentials, private keys, JWTs, and connection strings in snippets.
  • Make external embeddings opt-in only, with explicit config and warnings.
  • Never log source snippets or secrets by default.
  • Bind only to stdio unless a future HTTP transport has explicit host/protocol restrictions.
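The redaction requirement can be sketched with a few regular expressions. These patterns are illustrative assumptions, not the project's actual rules, and a real redactor needs a broader, tested set; redact and the pattern names are hypothetical.

```python
import re

# Patterns whose whole match is replaced.
_WHOLE_MATCH = [
    # PEM private key blocks.
    re.compile(
        r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
        re.DOTALL,
    ),
    # JWT-shaped tokens: three base64url segments, the first two starting with "eyJ".
    re.compile(r"\beyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
    # Connection strings with inline credentials: scheme://user:password@host...
    re.compile(r"\b[a-z][a-z0-9+.-]*://[^\s:/@]+:[^\s@/]+@\S+", re.IGNORECASE),
]

# key = value assignments where the key name suggests a secret; the key is kept.
_ASSIGNMENT = re.compile(
    r"(password|secret|token|api[_-]?key)(\s*[:=]\s*)(\"[^\"]*\"|'[^']*'|\S+)",
    re.IGNORECASE,
)


def redact(snippet: str) -> str:
    """Replace secret-looking values in a snippet with a placeholder."""
    for pattern in _WHOLE_MATCH:
        snippet = pattern.sub("[REDACTED]", snippet)
    return _ASSIGNMENT.sub(lambda m: m.group(1) + m.group(2) + "[REDACTED]", snippet)
```

Redaction of this kind is a last line of defense for snippets; excluding secret files before parsing, as the list above requires, remains the primary control.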

Implementation status

  • Done: src/codebase_index/mcp/server.py, a thin adapter over the retrieval/storage code.
  • Done: codebase-index mcp --root <path> CLI entrypoint.
  • Done: healthcheck, search_code, find_symbol, find_refs, impact_of, explain_code, architecture_overview, path_between, describe_symbol, and index_stats tools.
  • Done: focused tests for tool registration, missing-index behavior, config resolution, and run entrypoint.
  • Done: explicit schema_version + tool envelope on every structured tool payload (including the error path), asserted by tests/test_mcp_server.py and tests/test_mcp_golden.py.
  • Done: golden snapshots for every tool output (tests/golden/mcp_*.json).
  • Done: unstructured-output registration (structured_output=False where supported) so the server loads on mcp>=1.27 + pydantic>=2.10, where auto-detecting a structured schema from the -> str return annotation otherwise raises at import time.
  • Follow-up: verified client-specific docs for Claude Desktop, Claude Code, Cursor, VS Code, Zed, and Windsurf.
  • Follow-up: paging or progressive result support.