Merged
9 changes: 0 additions & 9 deletions .agents/skills/propose/SKILL.md
Expand Up @@ -234,15 +234,6 @@ Docs-only; baseline unchanged.
- `TOOL-NAME-PROPOSE.md`
- `ARCHITECTURE-CHANGE-PROPOSE.md`

## Final checklist

- [ ] Proposal file lives under `propose/active/`
- [ ] Problem statement includes concrete examples
- [ ] Schema/ontology/re-index impact is explicit
- [ ] Open questions include `[TBD]` with recommendations
- [ ] Out-of-scope section is present
- [ ] Sequencing/follow-up path is clear

## Key Principles

- **One question at a time** — Don't overwhelm with multiple questions
Expand Down
4 changes: 3 additions & 1 deletion README.md
Expand Up @@ -92,7 +92,9 @@ With the package installed, the console script `java-codebase-rag-mcp` is on you
claude mcp add --transport stdio java-codebase-rag -- java-codebase-rag-mcp
```

Then set env vars (`JAVA_CODEBASE_RAG_INDEX_DIR`, `JAVA_CODEBASE_RAG_SOURCE_ROOT`, `SBERT_MODEL`, …) in `.mcp.json` or your shell profile. For a project-scoped `.mcp.json` template, see [`mcp.json.example`](./mcp.json.example). Official docs: [Claude Code settings](https://docs.anthropic.com/en/docs/claude-code/settings).
**Zero-env-var configuration:** The tool walks up the directory tree from the current working directory to find `.java-codebase-rag.yml`, so you don't need to set `JAVA_CODEBASE_RAG_SOURCE_ROOT` when working inside a project. Place the config file at your project root. See [`mcp.json.example`](./mcp.json.example) for the minimal configuration.

If you need to override defaults, you can set env vars (`JAVA_CODEBASE_RAG_INDEX_DIR`, `JAVA_CODEBASE_RAG_SOURCE_ROOT`, `SBERT_MODEL`, …) in `.mcp.json` or your shell profile. For a full configuration template, see [`mcp.json.example`](./mcp.json.example). Official docs: [Claude Code settings](https://docs.anthropic.com/en/docs/claude-code/settings).

### Claude Desktop

Expand Down
46 changes: 46 additions & 0 deletions docs/CONFIGURATION.md
Expand Up @@ -22,6 +22,25 @@ For the architecture rationale (the GPS metaphor, three-layer design, future wor

The operator-facing surface is **six** variables (plus MCP-only `JAVA_CODEBASE_RAG_SOURCE_ROOT` below). Precedence for knobs that also exist as CLI flags or YAML entries is **CLI flag > env var > YAML > built-in default** (see [`JAVA-CODEBASE-RAG-CLI.md`](./JAVA-CODEBASE-RAG-CLI.md)).

### Config file discovery (walk-up)

The tool walks up the directory tree from the current working directory to find `.java-codebase-rag.yml` (or `.yaml`), similar to how Git finds `.git`. You can therefore run CLI commands and MCP queries from any subdirectory of your project.

**Walk-up behavior:**
- Starts from the current working directory and walks up the directory tree
- Stops at `$HOME`: `$HOME` itself is checked, but the search never goes above it (when cwd is outside `$HOME`, the search ends at the filesystem root)
- First match wins: the config closest to cwd is used and the search ends there
- If no config is found, falls back to the current working directory
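
The walk-up rules above can be sketched as a short standalone function. This is a minimal illustration under stated assumptions, not the project's implementation: `find_config_dir`, its `stop_at` parameter (standing in for `$HOME`), and the `CONFIG_NAMES` tuple are names introduced only for this example.

```python
from __future__ import annotations

from pathlib import Path

# Assumed file names for illustration, taken from the description above.
CONFIG_NAMES = (".java-codebase-rag.yml", ".java-codebase-rag.yaml")


def find_config_dir(start: Path, stop_at: Path) -> Path | None:
    """Return the closest ancestor of `start` (inclusive) that holds a config file.

    `stop_at` plays the role of $HOME: it is checked, but never walked past.
    """
    current = start.resolve()
    stop_at = stop_at.resolve()
    while True:
        if any((current / name).is_file() for name in CONFIG_NAMES):
            return current  # first match wins: closest to the start directory
        if current == stop_at or current.parent == current:
            return None  # reached the boundary or the filesystem root
        current = current.parent
```

A caller that receives `None` would fall back to the current working directory, as the last bullet describes.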

**Precedence for source root resolution:**
1. CLI flag `--source-root` (highest priority)
2. Environment variable `JAVA_CODEBASE_RAG_SOURCE_ROOT`
3. YAML field `source_root` (resolved relative to config directory)
4. Walk-up discovery result (config directory itself)
5. Current working directory (fallback)

With walk-up discovery, you no longer need to set environment variables or pass flags when working inside a project.
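
The five-step precedence for the source root can be expressed as a small pure function. The sketch below is illustrative only: `resolve_source_root` and its parameter names are hypothetical, and it takes the already-discovered config directory as an input rather than performing the walk-up itself.

```python
from __future__ import annotations

from pathlib import Path


def resolve_source_root(
    cli_flag: str | None,
    env_value: str | None,
    yaml_value: str | None,
    config_dir: Path | None,
    cwd: Path,
) -> Path:
    """Pick the source root using the documented precedence order."""
    if cli_flag:  # 1. --source-root
        return Path(cli_flag).expanduser().resolve()
    if env_value:  # 2. JAVA_CODEBASE_RAG_SOURCE_ROOT
        return Path(env_value).expanduser().resolve()
    if yaml_value and config_dir is not None:  # 3. YAML source_root
        candidate = Path(yaml_value).expanduser()
        # Relative values resolve against the config file's directory, not cwd.
        if not candidate.is_absolute():
            candidate = config_dir / candidate
        return candidate.resolve()
    if config_dir is not None:  # 4. directory found by walk-up
        return config_dir.resolve()
    return cwd.resolve()  # 5. fallback
```

Keeping the decision in one function with explicit inputs makes each precedence level easy to test in isolation.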

| Variable | Purpose |
|---|---|
| `JAVA_CODEBASE_RAG_INDEX_DIR` | Local filesystem **directory** for Lance tables, the Kuzu file `code_graph.kuzu`, and cocoindex state (`cocoindex.db`). Not a `lancedb://` or cloud URI — use a path. Default: `./.java-codebase-rag/` under the resolved Java tree root. |
Expand Down Expand Up @@ -58,6 +77,14 @@ A single file at the project root (the directory you pass as `--source-root`, or

# -------- Core knobs (mirror env vars; precedence: CLI > env > YAML > default) --------

# Source root: the Java project root. Useful when the config file lives
# separately from the Java source code (e.g., monorepo with configs at repo root).
# - Tilde (`~`) is expanded; `$VAR` is NOT (use absolute paths or `~`).
# - Relative paths resolve against the config file's parent directory, not cwd.
# - Env: JAVA_CODEBASE_RAG_SOURCE_ROOT. CLI: --source-root.
# - Default: the directory containing this config file (for walk-up discovery).
# source_root: ../java-project

# Index directory: where Lance tables, code_graph.kuzu, and cocoindex.db live.
# - Tilde (`~`) is expanded; `$VAR` is NOT (use absolute paths or `~`).
# - Relative paths resolve against source_root, not cwd.
Expand Down Expand Up @@ -95,6 +122,25 @@ microservice_roots:
- chat-orchestrator
- ranking

# Automatic microservice scope for queries (MCP server only)
# When working from a microservice subdirectory, queries automatically scope
# to that microservice — no manual filter needed. This provides correct
# codebase boundaries for agents working on specific microservices.
#
# Behavior:
# - At microservice root or inside a microservice subdirectory:
# → Queries automatically scoped to that microservice
# - At project root (above all microservices):
# → Queries span all microservices with an advisory message
# - Explicit microservice filters always override auto-detected scope
#
# The MCP server logs scope detection at startup:
# [scope] Detected microservice: chat-core
# [scope] Queries scoped to chat-core
# Or at the project root:
# [scope] No microservice detected (at project root)
# [scope] Queries will span all microservices

# -------- Cross-service edge resolution --------

# How the resolver treats auto-detected cross-service call edges. See §4.2.
Expand Down
29 changes: 29 additions & 0 deletions graph_enrich.py
Expand Up @@ -1565,6 +1565,35 @@ def microservice_for_path(
return ""


def detect_microservice_from_path(cwd: Path, source_root: Path) -> str | None:
    """Detect microservice from cwd for query-time auto-scope.

    Returns None if cwd is outside source_root, cwd IS source_root (system level),
    or no microservice is detected. Otherwise returns the microservice name.
    """
    cwd_resolved = cwd.resolve()
    source_resolved = source_root.resolve()

    # Check if cwd is outside source_root
    try:
        cwd_resolved.relative_to(source_resolved)
    except ValueError:
        return None

    # Check if cwd IS source_root (at system level, no specific scope)
    if cwd_resolved == source_resolved:
        return None

    # Check if cwd itself matches a YAML override (directory name matches microservice_roots)
    overrides = load_microservice_overrides(source_resolved)
    if overrides and cwd_resolved.name in overrides:
        return cwd_resolved.name

    # Call existing microservice_for_path to detect microservice from build markers
    ms = microservice_for_path(str(cwd_resolved), source_resolved)
    return ms if ms else None


# ---------- chunk enrichment ----------


Expand Down
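
The scoping idea in `detect_microservice_from_path` can be tried out with a simplified, self-contained sketch. It assumes microservices are identified purely by a known set of top-level directory names under the source root. The function in the diff additionally consults YAML overrides and build markers via `microservice_for_path`, which is not reproduced here. `scope_for_cwd` and `known_services` are hypothetical names.

```python
from __future__ import annotations

from pathlib import Path


def scope_for_cwd(cwd: Path, source_root: Path, known_services: set[str]) -> str | None:
    """Return the microservice that `cwd` sits in, or None for no automatic scope."""
    cwd = cwd.resolve()
    root = source_root.resolve()
    try:
        relative = cwd.relative_to(root)
    except ValueError:
        return None  # cwd is outside the source root
    if not relative.parts:
        return None  # cwd is the source root itself: queries span all services
    top_level = relative.parts[0]
    return top_level if top_level in known_services else None
```

Returning `None` for both "outside the root" and "at the root" mirrors the diff's contract, where the caller treats either case as an unscoped query.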
12 changes: 12 additions & 0 deletions java_codebase_rag/cli.py
Expand Up @@ -229,6 +229,18 @@ def _add_verbosity_flags(p: argparse.ArgumentParser) -> None:

def _cmd_init(args: argparse.Namespace) -> int:
    cfg = _resolved_from_ns(args)
    # Check for parent config
    from java_codebase_rag.config import discover_project_root, YAML_CONFIG_FILENAMES
    parent_config_dir = discover_project_root(cfg.source_root.parent)
    if parent_config_dir is not None:
        parent_config = parent_config_dir / YAML_CONFIG_FILENAMES[0]
        if not parent_config.is_file():
            parent_config = parent_config_dir / YAML_CONFIG_FILENAMES[1]
        print(
            f"Warning: found existing config at {parent_config}. "
            f"Creating a new project here will create a separate index.",
            file=sys.stderr,
        )
    _startup_hints(cfg)
    cfg.apply_to_os_environ()
    occupied, paths = index_dir_has_existing_artifacts(cfg.index_dir)
Expand Down
59 changes: 57 additions & 2 deletions java_codebase_rag/config.py
Expand Up @@ -123,6 +123,33 @@ def find_yaml_config_file(source_root: Path) -> Path | None:
return None


def discover_project_root(start: Path) -> Path | None:
    """Walk up from start to find the directory containing a config file.

    First match wins (closest to start). Stops at $HOME inclusive — checks $HOME
    itself but does not walk past it. Returns None if no config found.
    """
    start = start.resolve()
    home = Path.home().resolve()

    current = start
    while True:
        # Check if current directory contains a config file
        if find_yaml_config_file(current) is not None:
            return current

        # Stop if we've reached home (check home itself, but don't walk past it)
        if current == home:
            return None

        # Stop if we've reached filesystem root
        parent = current.parent
        if parent == current:
            return None

        current = parent


def load_yaml_mapping(source_root: Path) -> dict[str, Any]:
path = find_yaml_config_file(source_root)
if path is None:
Expand Down Expand Up @@ -277,8 +304,36 @@ def resolve_operator_config(
cli_embedding_model: str | None = None,
cli_embedding_device: str | None = None,
) -> ResolvedOperatorConfig:
root = (source_root or Path.cwd()).expanduser().resolve()
yaml_dict = load_yaml_mapping(root)
    # Phase 1: Find the config file directory
    if source_root is not None:
        # CLI flag provided: use it as both config_dir and effective source_root
        # (skip YAML source_root check - CLI wins)
        root = source_root.expanduser().resolve()
        config_dir = root
        yaml_dict = load_yaml_mapping(config_dir)
    else:
        # Check env var first
        env_raw = os.environ.get(ENV_SOURCE_ROOT, "").strip()
        if env_raw:
            root = Path(env_raw).expanduser().resolve()
            config_dir = root
            yaml_dict = load_yaml_mapping(config_dir)
        else:
            # Walk up to find config dir
            discovered = discover_project_root(Path.cwd())
            config_dir = discovered if discovered is not None else Path.cwd().resolve()
            # Load YAML from config dir
            yaml_dict = load_yaml_mapping(config_dir)

            # Phase 2: Resolve effective source root. This runs on the walk-up
            # path only: the CLI flag and env var branches above already set
            # `root` and take precedence over the YAML field.
            # Check for YAML source_root field (resolved relative to config dir)
            yaml_source_root = yaml_dict.get("source_root")
            if isinstance(yaml_source_root, str) and yaml_source_root.strip():
                yroot = Path(yaml_source_root.strip()).expanduser()
                root = yroot.resolve() if yroot.is_absolute() else (config_dir / yroot).resolve()
            else:
                root = config_dir

    index_dir, index_src = _resolve_index_dir_path(
        source_root=root, cli_index_dir=cli_index_dir, yaml_dict=yaml_dict
    )
Expand Down
25 changes: 25 additions & 0 deletions mcp.json.example
Expand Up @@ -11,3 +11,28 @@
}
}
}

// Minimal configuration with walk-up discovery (no env vars required):
// The tool walks up from the current directory to find .java-codebase-rag.yml,
// then uses the config's source_root (or the config directory itself) to find the index.
// Just omit the "env" section entirely:
//
// {
// "mcpServers": {
// "java-codebase-rag": {
// "type": "stdio",
// "command": "java-codebase-rag-mcp"
// }
// }
// }
//
// For Claude Code (which uses the same MCP protocol but different config format),
// the minimal configuration in .mcp.json is similar:
//
// {
// "mcpServers": {
// "java-codebase-rag": {
// "command": "java-codebase-rag-mcp"
// }
// }
// }