fix(increment): stop endless full graph rebuild from stale hash entries #314
Merged
Two bugs in build_ast_graph.py fed an endless full-rebuild loop on every `java-codebase-rag increment` (the "7 removed files every run" symptom):

1. _init_hash_tracker (run by every full reprocess and by incremental fallback) did load()+update() and never removed hashes for files no longer on disk. Ghost entries were re-detected as "removed" on every run, sustaining the loop.
2. _write_clients_producers_and_calls built a default MemberEntry missing the required node_id field. Because dict.get(k, default) evaluates the default eagerly, the TypeError fired whenever ANY declares_client / declares_producer row existed, crashing every client-bearing incremental rebuild into a full fallback (which then preserved the ghosts via #1).

Fix: rebuild the hash store fresh in _init_hash_tracker so stale entries are pruned; add node_id="" to the two MemberEntry defaults.

Adds regression tests (builder-level + a CLI test for the reported reprocess --graph-only -> increment scenario, verified to fail without the fix).

Co-Authored-By: Claude <noreply@anthropic.com>
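The stale-entry behaviour in the first bug can be illustrated with a toy hash store. This is only a sketch under stated assumptions: the store is modelled as a plain dict of path to hash, and `detect_removed`, `init_load_and_update`, and `init_fresh` are hypothetical names, not the real functions in build_ast_graph.py.

```python
def detect_removed(store, files_on_disk):
    """Paths that are tracked in the store but no longer exist on disk."""
    return set(store) - set(files_on_disk)


def init_load_and_update(old_store, files_on_disk):
    # Old behaviour (as described): load the existing store, then update it.
    # Entries for files that have disappeared are never dropped.
    merged = dict(old_store)
    merged.update(files_on_disk)
    return merged


def init_fresh(files_on_disk):
    # Fixed behaviour (as described): the store mirrors exactly what was indexed.
    return dict(files_on_disk)


# Hypothetical data: Ghost.java was deleted after an earlier run.
old_store = {"A.java": "h1", "Ghost.java": "h2"}
on_disk = {"A.java": "h1"}

stale = init_load_and_update(old_store, on_disk)
fresh = init_fresh(on_disk)

removed_with_stale_store = detect_removed(stale, on_disk)
removed_with_fresh_store = detect_removed(fresh, on_disk)
```

With the merged store, the ghost path is reported as removed on every run; with the fresh store the removed set is empty, so a run with no changes has nothing to do.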
Problem
Every `java-codebase-rag increment` reprocessed the full graph (same timing as `reprocess`) even when nothing had changed. The log showed `deleting outdated nodes and edges` and a constant `7 removed files` on each run, no matter how many times it ran. Reported reproduction:

- `java-codebase-rag reprocess --graph-only`
- `java-codebase-rag increment` (nothing changed in between)

Root cause
Two bugs in `build_ast_graph.py` feeding each other in a loop:

1. `_init_hash_tracker` never pruned stale entries. It did `load()` + `update()`, so hashes for files that no longer exist (deleted, or now `--ignore`d) were preserved forever. On the next `increment`, `detect_changes()` re-flagged those ghosts as "removed" → non-empty change set → non-empty scoped rebuild → triggers the second bug → falls back to full rebuild → `_init_hash_tracker` runs again and preserves the ghosts → loop.
2. `_write_clients_producers_and_calls` crashed on missing `node_id`. The default `MemberEntry` passed to `member_by_id.get(row.symbol_id, ...)` was missing the now-required `node_id` field. Because `dict.get(k, default)` evaluates `default` eagerly, the `TypeError` fired on every incremental run that had any `declares_client` / `declares_producer` row, not just on misses. That crash forced every client-bearing incremental rebuild into a full-graph fallback, which (via bug #1) kept the ghosts alive.

The two bugs formed a closed loop, which is why "7 removed files" appeared on every single run.
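The eager-default behaviour behind the second bug is easy to reproduce in isolation. The sketch below is illustrative only: the `MemberEntry` here is a hypothetical stand-in (only the `node_id` field comes from this PR; `name` and the lookup helpers are assumptions), not the real class.

```python
from dataclasses import dataclass


@dataclass
class MemberEntry:
    # Hypothetical stand-in: node_id is required, mirroring the PR description.
    node_id: str
    name: str = ""  # assumed extra field, for illustration only


member_by_id = {"sym-1": MemberEntry(node_id="n1", name="fooClient")}


def lookup_broken(symbol_id):
    # The default argument is built BEFORE .get() runs, so the missing
    # node_id raises TypeError even when symbol_id is present in the dict.
    return member_by_id.get(symbol_id, MemberEntry(name="unknown"))


def lookup_fixed(symbol_id):
    # Supplying node_id="" makes the default constructible.
    return member_by_id.get(symbol_id, MemberEntry(node_id="", name="unknown"))


try:
    lookup_broken("sym-1")  # key exists, yet the call still fails
    broken_raised = False
except TypeError:
    broken_raised = True
```

Because the failure happens while the arguments are being evaluated, a hit and a miss behave the same way in the broken version, which matches the "not just on misses" observation.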
Fix
- `_init_hash_tracker`: rebuild the hash store fresh instead of `load()` + `update()`. Since this runs right after a full rebuild, the store should mirror exactly the files just indexed, so old ghost entries get pruned. (No behaviour change for `update`, which re-hashes every current file anyway.)
- `_write_clients_producers_and_calls`: add `node_id=""` to both `MemberEntry` defaults.

Verification
- `increment` runs `mode=incremental` (~8s), no fallback.
- `increment` is a no-op (~0.04s).
- `git stash`-ing `build_ast_graph.py` makes it fail on the `fell back to full graph rebuild` assertion, then passes with the fix restored.
- `ruff` clean.

Behaviour changes
- `increment` is now genuinely incremental when changes are limited; no more needless full rebuilds.
- `reprocess` or fallback rebuild.

🤖 Generated with Claude Code