fix(increment): stop endless full graph rebuild from stale hash entries #314

Merged
HumanBean17 merged 1 commit into master from fix/increment-graph-rebuild-loop on Jun 13, 2026
HumanBean17 (Owner) commented Jun 13, 2026

Problem

Every java-codebase-rag increment reprocessed the full graph (same timing as reprocess) even when nothing had changed. On every run, no matter how many times it ran, the log showed "deleting outdated nodes and edges" and a constant "7 removed files". Reported reproduction:

  1. java-codebase-rag reprocess --graph-only
  2. java-codebase-rag increment (nothing changed in between)
  3. → graph does a full rebuild again

Root cause

Two bugs in build_ast_graph.py feeding each other in a loop:

1. _init_hash_tracker never pruned stale entries. It did load() + update(), so hashes for files that no longer exist (deleted, or now --ignored) were preserved forever. On the next increment, detect_changes() re-flagged those ghosts as "removed" → non-empty change set → non-empty scoped rebuild → triggers the second bug → falls back to full rebuild → _init_hash_tracker runs again and preserves the ghosts → loop.

2. _write_clients_producers_and_calls crashed on missing node_id. The default MemberEntry passed to member_by_id.get(row.symbol_id, ...) was missing the now-required node_id field. Because dict.get(k, default) evaluates default eagerly, the TypeError fired on every incremental run that had any declares_client / declares_producer row — not just on misses. That crash forced every client-bearing incremental rebuild into a full-graph fallback, which (via bug #1) kept the ghosts alive.

The two bugs formed a closed loop, which is why "7 removed files" appeared on every single run.
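
The eager-default behaviour behind the second bug can be reproduced in isolation. The sketch below is illustrative only: the MemberEntry shape (a dataclass with a `name` field), the sample symbol ids, and the `lookup_buggy` / `lookup_fixed` helpers are assumptions, not the project's actual code. Only the required `node_id` field and the `dict.get(k, default)` pattern come from the description above.

```python
from dataclasses import dataclass


@dataclass
class MemberEntry:
    # Hypothetical shape: only node_id is named in the PR description.
    name: str
    node_id: str  # required field, no default value


# Hypothetical lookup table keyed by symbol id.
member_by_id = {"sym-1": MemberEntry(name="sendMessage", node_id="n1")}


def lookup_buggy(symbol_id: str) -> MemberEntry:
    # The default is constructed before .get() runs, so the missing
    # node_id raises TypeError even when symbol_id is present.
    return member_by_id.get(symbol_id, MemberEntry(name=""))


def lookup_fixed(symbol_id: str) -> MemberEntry:
    # Supplying node_id="" makes the default constructible.
    return member_by_id.get(symbol_id, MemberEntry(name="", node_id=""))


try:
    lookup_buggy("sym-1")  # a hit, yet the default is still built
    buggy_raised = False
except TypeError:
    buggy_raised = True

hit = lookup_fixed("sym-1")
miss = lookup_fixed("unknown")
```

Here `buggy_raised` ends up true although "sym-1" is in the dictionary, which is why the crash was not limited to lookup misses.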

Fix

  • _init_hash_tracker: rebuild the hash store fresh instead of load()+update(). Since this runs right after a full rebuild, the store should mirror exactly the files just indexed — old ghost entries get pruned. (No behaviour change for update, which re-hashes every current file anyway.)
  • _write_clients_producers_and_calls: add node_id="" to both MemberEntry defaults.
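
The difference between load()+update() and a fresh rebuild of the hash store can be sketched with plain dictionaries. This is a simplified model under stated assumptions: the real tracker's storage format and API are not shown in this PR, so `init_merge`, `init_fresh`, `detect_removed`, and the in-memory "files" are hypothetical stand-ins.

```python
import hashlib


def file_hash(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


def detect_removed(stored: dict, current_files: dict) -> list:
    # Paths that have a stored hash but are no longer indexed.
    return sorted(path for path in stored if path not in current_files)


def init_merge(stored: dict, current_files: dict) -> dict:
    # Old behaviour (load + update): stale keys survive the merge.
    merged = dict(stored)
    merged.update({p: file_hash(c) for p, c in current_files.items()})
    return merged


def init_fresh(current_files: dict) -> dict:
    # New behaviour: the store mirrors exactly the files just indexed.
    return {p: file_hash(c) for p, c in current_files.items()}


# Hypothetical state: Gone.java was deleted after an earlier run.
current = {"A.java": b"class A {}", "B.java": b"class B {}"}
stale_store = {
    "A.java": file_hash(b"class A {}"),
    "Gone.java": file_hash(b"class Gone {}"),
}

ghosts_after_merge = detect_removed(init_merge(stale_store, current), current)
ghosts_after_fresh = detect_removed(init_fresh(current), current)
```

With the merge strategy, "Gone.java" is reported as removed again on the next run; with the fresh store, the removed list is empty.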

Verification

  • Real change on the bank-chat fixture (130 files): increment runs mode=incremental (~8s), no fallback.
  • No changes: increment is a no-op (~0.04s).
  • Stale/ghost hash entries: pruned on the next rebuild.
  • Confirmed the CLI regression test is a true regression test: stashing build_ast_graph.py with git stash makes it fail on the "fell back to full graph rebuild" assertion, and it passes again with the fix restored.
  • Full suite: 767 passed, 11 skipped (cocoindex-gated heavy tests); ruff clean.

Behaviour changes

  • increment is now genuinely incremental when changes are limited; no more needless full rebuilds.
  • No ontology version bump, no schema change, no env-var change → no mandatory reindex. Existing ghost entries clear automatically on the next reprocess or fallback rebuild.

🤖 Generated with Claude Code

Two bugs in build_ast_graph.py fed an endless full-rebuild loop on every
`java-codebase-rag increment` (the "7 removed files every run" symptom):

1. _init_hash_tracker (run by every full reprocess and by incremental
   fallback) did load()+update() and never removed hashes for files no
   longer on disk. Ghost entries were re-detected as "removed" on every
   run, sustaining the loop.
2. _write_clients_producers_and_calls built a default MemberEntry missing
   the required node_id field. Because dict.get(k, default) evaluates the
   default eagerly, the TypeError fired whenever ANY declares_client /
   declares_producer row existed, crashing every client-bearing incremental
   rebuild into a full fallback (which then preserved the ghosts via #1).

Fix: rebuild the hash store fresh in _init_hash_tracker so stale entries
are pruned; add node_id="" to the two MemberEntry defaults.

Adds regression tests (builder-level + a CLI test for the reported
reprocess --graph-only -> increment scenario, verified to fail without
the fix).

Co-Authored-By: Claude <noreply@anthropic.com>
HumanBean17 merged commit f4ebf46 into master on Jun 13, 2026
1 check failed