Fix search result set bounded by k instead of efSearch #18
Closed
suykerbuyk wants to merge 2 commits into
Conversation
added 2 commits
April 9, 2026 18:27
This commit fixes the root causes of the 0-2/10 recall measured when integrating coder/hnsw into vibe-palace's semantic search pipeline. Four distinct bugs were identified and fixed:

1. `Heap.Max()` and `PopLast()` returned wrong elements (`heap/heap.go`). The min-heap's `Max()` returned `data[len-1]` (the last array element), which is NOT the maximum in a binary min-heap. `PopLast()` removed that same arbitrary element. This corrupted search result sets: when evicting the "worst" candidate to make room for a better one, a random element was removed instead. Fixed by scanning the leaf nodes (indices n/2 through n-1) to find the true maximum.

2. Search termination condition was too aggressive (`graph.go`). The search algorithm terminated when no neighbor improved `result.Min()` (the BEST result), but the standard HNSW algorithm terminates when the best remaining CANDIDATE is farther than `result.Max()` (the WORST result). The old condition stopped exploring far too early, leaving most of the k result slots filled with suboptimal nodes. After this fix, recall on synthetic 200-node/64-dim data jumped from ~0.46 to ~0.96.

3. `replenish()` hardcoded `CosineDistance` (`graph.go:168`). When a node was deleted, the neighbor repair function always used `CosineDistance` regardless of the graph's configured distance function. This corrupted graph topology for any non-Cosine metric. Fixed by threading `DistanceFunc` through the `addNeighbor()` -> `replenish()` -> `isolate()` -> `Delete()` call chain.

4. Stale elevator panic (`graph.go` Add/search methods). When traversing layers top-to-bottom, the elevator node (best entry point from the previous layer) could reference a node that had been deleted between layers. Added existence checks with fallback to `layer.entry()`.

Addresses upstream issue coder#15.

Test embeddings: TF-IDF weighted feature hashing

The integration tests and sample CLI use a deterministic text embedding based on the hashing trick with TF-IDF weighting.
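The leaf-scan fix for bug 1 can be sketched as follows. This is a minimal standalone illustration of the min-heap property the fix relies on, not the upstream `heap/heap.go` API; `maxOfMinHeap` is a hypothetical name:

```go
package main

import "fmt"

// In a binary min-heap stored as a slice, the maximum element is always
// among the leaves (indices n/2 .. n-1): by the heap property, every
// internal node is <= all of its descendants, so no internal node can
// be the maximum.
func maxOfMinHeap(data []int) (maxIdx int) {
	n := len(data)
	maxIdx = n / 2 // index of the first leaf
	for i := n/2 + 1; i < n; i++ {
		if data[i] > data[maxIdx] {
			maxIdx = i
		}
	}
	return maxIdx
}

func main() {
	// A valid min-heap whose LAST slice element (3) is not the maximum,
	// demonstrating why returning data[len-1] was wrong.
	h := []int{1, 5, 2, 9, 6, 4, 3}
	fmt.Println(h[maxOfMinHeap(h)]) // 9
}
```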
Each word hashes to a dimension index (feature hashing), weighted by its inverse document frequency. Stop words (~40 common English words) are removed so that rare discriminating terms dominate the vector. The embedding dimension is 2048 to minimize hash collisions (~6400 unique terms in the corpus). This produces meaningful lexical similarity: a query for "Arms" finds the 2nd Amendment and Mexico's Article 10 (right to keep arms). This is a test/demo utility only; vibe-palace uses real transformer embeddings (all-MiniLM-L6-v2 via hugot), which are unaffected by this change.

Test changes:
- Heap tests: 1 test -> 11 tests, coverage 61.1% -> 100%
- Graph tests: added 16 new tests covering SearchWithDistance, Lookup, empty graph ops, Dims, Len, AddReplace, delete with Euclidean/Cosine, stale elevator regression, dimension mismatch
- Distance tests: RegisterDistanceFunc, name roundtrip, unknown func
- Encode tests: error paths for bad version, unknown distance, truncated data, string key round-trip
- Integration tests: constitution corpus (US/Canada/Mexico/Amendments) with recall@k vs brute-force ground truth, delete+search, export/import round-trip, benchmarks
- Updated TestGraph_AddSearch expected results to reflect the correct nearest neighbors after the heap fix: {64,65,63,66}, not {64,65,62,63}
- Overall coverage: main 89.1%, heap 100%

New files:
- integration_test.go: integration tests with TF-IDF feature hashing
- example/constitution-search/main.go: sample CLI application
- test-data/*.md: US Constitution, Amendments, Canada, Mexico
- Makefile: standard build facade (help, build, test, integration, bench, vet, check, install, clean)
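The hashing-trick embedding described above can be sketched like this. It is an illustrative reconstruction under the stated parameters (2048 dimensions, IDF weighting); the function and variable names (`embed`, `bucket`, `idf`) are hypothetical and not the repo's API:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
)

// dims matches the embedding dimension stated in the commit message.
const dims = 2048

// bucket maps a word to a dimension index via FNV-1a feature hashing.
func bucket(word string) int {
	h := fnv.New32a()
	h.Write([]byte(word))
	return int(h.Sum32() % uint32(dims))
}

// idf gives rare terms a higher weight: log(N / (df+1)).
func idf(df, nDocs int) float64 {
	return math.Log(float64(nDocs) / float64(df+1))
}

// embed sums each word's IDF weight into its hashed dimension.
// Stop-word removal (done upstream of this step) is omitted here.
func embed(words []string, df map[string]int, nDocs int) []float32 {
	v := make([]float32, dims)
	for _, w := range words {
		v[bucket(w)] += float32(idf(df[w], nDocs))
	}
	return v
}

func main() {
	// "arms" appears in 2 of 100 docs, so it dominates the vector.
	df := map[string]int{"arms": 2, "the": 100}
	v := embed([]string{"arms"}, df, 100)
	fmt.Println(v[bucket("arms")] > 0) // true
}
```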
The HNSW search algorithm's `layerNode.search()` method bounded its internal result set (the found-neighbors set W) by k (the number of results to return) instead of efSearch (the exploration beam width). Per the HNSW paper's Algorithm 5 (SEARCH-LAYER), the found-neighbors set must be bounded by ef to prevent premature termination; results are trimmed to k only after the search completes.

With k=3 and efSearch=100, the result set held only 3 entries. The worst-result threshold (`result.Max().dist`) converged to a local minimum (~0.86) within a few iterations, triggering the termination condition before the search could navigate to the true nearest neighbors. For example, searching "Bear Arms" against the constitution corpus missed the 2nd Amendment entirely (true nearest at distance 0.7383), returning a Mexican constitution article at 0.8576 instead. The bug required efSearch=400 (~50% of the 801-node graph) to find correct results, effectively degrading HNSW to near-brute-force search.

Changes to `graph.go` `layerNode.search()`:

1. Result set capacity and bounds changed from k to efSearch. The termination condition now checks `result.Len() >= efSearch` instead of `result.Len() >= k`, giving the search a proper beam width to explore before stopping.

2. Candidate insertion is now guarded by the result improvement check: neighbors are only added as candidates when they could improve the efSearch-bounded result set (`result.Len() < efSearch || dist < result.Max().dist`). This matches the standard HNSW algorithm and prevents wasting candidate slots on unpromising neighbors.

3. Post-search sort and trim: after the search loop completes, the efSearch-sized result set is sorted by distance and trimmed to the requested k results. Tiebreaking uses the node key (`cmp.Compare`) for deterministic ordering of equidistant results.

This is the 5th bug found in upstream coder/hnsw.
Like the previous four, it was invisible to unit tests (small synthetic datasets have well-connected graphs where k-bounded results don't cause premature termination). It was discovered through full-stack integration testing with the constitution-search example against real TF-IDF embeddings. All 48 tests pass. Coverage: main 89.3%, heap 100%.
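Change 3 above (post-search sort and trim with deterministic tiebreaking) can be sketched as follows. `cand` and `trimToK` are hypothetical stand-ins for the repo's heap-backed result set, shown only to illustrate the sort-then-trim step:

```go
package main

import (
	"cmp"
	"fmt"
	"slices"
)

// cand is a candidate result: a node key and its distance to the query.
type cand struct {
	key  int
	dist float64
}

// trimToK sorts the efSearch-sized working set by distance, breaking
// ties on the node key via cmp.Compare for deterministic ordering, then
// trims to the requested k results.
func trimToK(result []cand, k int) []cand {
	slices.SortFunc(result, func(a, b cand) int {
		if c := cmp.Compare(a.dist, b.dist); c != 0 {
			return c
		}
		return cmp.Compare(a.key, b.key) // deterministic tiebreak
	})
	if len(result) > k {
		result = result[:k]
	}
	return result
}

func main() {
	// An efSearch-bounded working set, trimmed to k=2 after the loop.
	// Keys 1 and 2 are equidistant; the key tiebreak orders them.
	w := []cand{{3, 0.9}, {1, 0.2}, {2, 0.2}, {4, 0.7}}
	for _, c := range trimToK(w, 2) {
		fmt.Println(c.key, c.dist)
	}
	// 1 0.2
	// 2 0.2
}
```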
Force-pushed from 40d4c46 to 7715256
Author
Closing to restructure into smaller, focused PRs as requested. The efSearch bound fix will be included in the restructured PR series.
Summary
Fixes the 5th recall bug:
`layerNode.search()` bounded its internal result set by `k` (the number of results to return) instead of `efSearch` (the exploration beam width). Per the HNSW paper's Algorithm 5 (SEARCH-LAYER), the found-neighbors set must be bounded by `ef`; results are trimmed to `k` only after the search completes. With k=3 and efSearch=100, the result set held only 3 entries. The worst-result threshold converged to a local minimum within a few iterations, triggering early termination before reaching the true nearest neighbors. This required efSearch≈400 (~50% of the graph) to find correct results, effectively degrading HNSW to near-brute-force.
Changes to `graph.go` `layerNode.search()`:
- Result set capacity and termination bound changed from `k` to `efSearch`
- Post-search sort and trim to `k` results with deterministic tiebreaking

Depends on #17; uses the integration test infrastructure added there for validation.
Test plan
- `go test -short ./...`: all 48 tests pass
- `go test ./...`: integration tests confirm recall improvement

🤖 Generated with Claude Code