feat!: lbug-only graph backend; rip DuckDB graph adapter #117
Merged
DuckDB no longer implements IGraphStore. lbug (`@ladybugdb/core`) is the
sole graph backend; DuckDB stays as the temporal-only sidecar
(cochanges, symbol summaries, sql escape hatch, embeddings staging for
the deterministic Parquet sidecar). The auto-probe / dual-artifact /
CODEHUB_STORE resolver, the stale-artifact detection, and ~1900 LOC of
DuckDB graph-tier code are all gone.
Storage shape after the rip:
- `openStore({path})` always returns `{graph: GraphDbStore, temporal:
DuckDbStore, graphFile, temporalFile, close}`. No `backend` field on
the result, no `backend?` option on input.
- `<repo>/.codehub/graph.lbug` + `<repo>/.codehub/temporal.duckdb`.
`paths.describeArtifacts()` takes no arguments. `resolveDbPath` is
renamed `resolveGraphPath` and returns the lbug filename.
- The `IGraphStore` / `ITemporalStore` segregation stays — that's the
contract community AGE/Memgraph/Neo4j/Neptune adapters target. The
v1.0 conformance suite stays for the same reason.
- Embeddings live in graph.lbug; the pack sidecar streams them through
a per-call DuckDB temp table on temporal.duckdb so the byte-identical
Parquet writer still works.
- MCP `sql` tool's `cypher` field becomes unconditionally available.
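
The storage shape above can be sketched in TypeScript roughly as follows. This is an illustrative sketch, not the project's code: `GraphStoreLike`, `TemporalStoreLike`, `OpenedStore`, and `artifactPaths` are hypothetical names, and the async `close()` signature is an assumption. Only the result field names and the two filenames come from the description above.

```typescript
// Hypothetical stand-ins for the real IGraphStore / ITemporalStore implementations.
interface GraphStoreLike { readonly kind: "graph" }
interface TemporalStoreLike { readonly kind: "temporal" }

// Shape of the openStore({path}) result as described: no `backend` field.
interface OpenedStore {
  graph: GraphStoreLike;
  temporal: TemporalStoreLike;
  graphFile: string;
  temporalFile: string;
  close(): Promise<void>; // assumed async; the description does not say
}

// Hypothetical helper: derive both artifact paths under <repo>/.codehub/.
function artifactPaths(repoRoot: string): { graphFile: string; temporalFile: string } {
  const dir = `${repoRoot.replace(/\/+$/, "")}/.codehub`; // drop trailing slashes
  return {
    graphFile: `${dir}/graph.lbug`,
    temporalFile: `${dir}/temporal.duckdb`,
  };
}
```

Because the result always carries both stores, callers no longer need to branch on which backend was resolved.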
lbug operational fixes captured in the same change:
- Pool now passes explicit `maxDbBytes=16 GiB` and
`bufferManagerBytes=2 GiB` so concurrent test Databases don't exhaust
the 47-bit user VA on Linux (the default `maxDBSize=1<<43` reserves
8 TiB per Database at construction). Citations: kuzudb/kuzu#1826,
`BufferPoolConstants::DEFAULT_VM_REGION_MAX_SIZE`.
- Bulk-load STRING[] sentinel switched from `[]` to `["__sentinel__"]`
so lbug's struct-field type inference doesn't resolve to LIST(ANY).
Empty-array sentinels surface as "Trying to create a vector with ANY
type" the moment a data row supplies a string.
- `ensureFtsIndex` / `ensureVectorIndex` no-op in readOnly mode, and
bulkLoad runs them at the end of the write path so readers don't
trigger writes on lbug.
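
The virtual-address arithmetic behind the first fix can be checked with a short sketch. It assumes each open Database reserves exactly its maximum size and ignores every other mapping in the process, so the results are upper bounds; `maxConcurrentDatabases` is a hypothetical helper, not part of lbug or the pool.

```typescript
const GiB = 2 ** 30;
// 47-bit user address space on Linux x86-64 (128 TiB).
const USER_VA_BYTES = 2 ** 47;

// Upper bound on simultaneously open Databases if each one reserves
// `reservedBytesPerDb` of virtual address space at construction.
function maxConcurrentDatabases(
  reservedBytesPerDb: number,
  vaBytes: number = USER_VA_BYTES,
): number {
  if (reservedBytesPerDb <= 0) {
    throw new RangeError("reservation must be positive");
  }
  return Math.floor(vaBytes / reservedBytesPerDb);
}

// Default reservation: 1 << 43 bytes = 8 TiB per Database.
const withDefaultReservation = maxConcurrentDatabases(2 ** 43); // 16
// Explicit cap used by the pool: 16 GiB per Database.
const withExplicitCap = maxConcurrentDatabases(16 * GiB); // 8192
```

Under these assumptions the default leaves room for at most 16 Databases per process, which a parallel test run can plausibly reach; the 16 GiB cap raises that bound to 8192.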
ADR 0016 records the rip-out and supersedes ADR 0013 entirely; ADR
0011's "DuckDB-default + LadybugDB opt-in" framing is partially
superseded.
Net diff: +1297 / -7391 (6094 net deletions across 60 files).
Workspace verdict: `mise run check` exit 0 — 1931 passing tests, 0
failing, 2 platform-skipped.
…n before delete
Two lbug-vs-DuckDB-graph behavioral gaps surfaced by self-scan against
the OCH repo:
1. lbug's COPY enforces that every relation's from/to is a real CodeNode
primary key. The pipeline's fetches phase emits synthetic targets
(e.g. `fetches:unresolved:GET:/users/1`) carrying the URL template in
`reason`; these intentionally have no node. DuckDB silently accepted
them; lbug rejects with `Copy exception: Unable to find primary key
value`. Synthesize a Route placeholder for every orphan edge target
before insertNodes; downstream tools recognise the well-known prefix.
2. lbug builds the FTS index against CodeNode; deleting from CodeNode
(truncateAll in replace mode, mergeNodes per-id in upsert mode)
without the FTS extension loaded surfaces `Binder exception: Trying
to delete from an index on table CodeNode but its extension is not
loaded`. ingest-sarif's bulkLoad(graph, {mode: "upsert"}) hits this
on every analyze run after the first. Load the extension at the top
of bulkLoad so both modes' deletes succeed; failures are swallowed
on platforms without FTS so the search-side codepath surfaces the
clearer error later.
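
The placeholder synthesis from item 1 can be sketched as below. This is a minimal illustration under assumptions, not the pipeline's implementation: the `NodeRow` / `EdgeRow` shapes, their field names, and `synthesizeOrphanPlaceholders` are hypothetical, and using `reason` as the placeholder's name is a guess at a sensible default.

```typescript
// Hypothetical row shapes; the real CodeNode / relation schemas are richer.
interface NodeRow { id: string; kind: string; name: string }
interface EdgeRow { from: string; to: string; type: string; reason?: string }

// Return one Route placeholder per edge target that has no node yet,
// so a COPY that enforces primary keys can resolve every `to`.
function synthesizeOrphanPlaceholders(nodes: NodeRow[], edges: EdgeRow[]): NodeRow[] {
  const known = new Set(nodes.map((n) => n.id));
  const placeholders: NodeRow[] = [];
  for (const edge of edges) {
    if (known.has(edge.to)) continue; // real node or already synthesized
    known.add(edge.to);
    placeholders.push({
      id: edge.to, // keeps the well-known prefix, e.g. fetches:unresolved:...
      kind: "Route",
      name: edge.reason ?? edge.to, // assumed: URL template if present
    });
  }
  return placeholders;
}
```

The caller would append the returned rows to the node batch before inserting nodes, so the edge batch never references a missing primary key.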
Verified: `codehub analyze .` runs end-to-end on the OCH repo (18,893
findings ingested via SARIF). Full workspace tests still 1931 pass / 0
fail. mise run check exit 0.
Summary
- DuckDB no longer implements `IGraphStore`. lbug (`@ladybugdb/core`) is the sole graph backend; DuckDB stays as the temporal-only sidecar (cochanges, symbol summaries, sql escape hatch, deterministic embeddings Parquet via temp-table staging).
- The `CODEHUB_STORE` resolver and ~1900 LOC of DuckDB graph-tier code are gone.
- `openStore({path})` always returns `{graph, temporal, graphFile, temporalFile, close}` over `graph.lbug` + `temporal.duckdb`.
- `IGraphStore` / `ITemporalStore` segregation is preserved as the v1.0 community-adapter contract (AGE / Memgraph / Neo4j / Neptune still target `IGraphStore`).
Operational fixes captured in the same change
- `maxDBSize` virtual-address exhaustion: the default `1 << 43` = 8 TiB per `Database` is reserved at construction. Pool now passes explicit `maxDbBytes=16 GiB` + `bufferManagerBytes=2 GiB` (citations: "Buffer manager exception" kuzudb/kuzu#1826, `BufferPoolConstants::DEFAULT_VM_REGION_MAX_SIZE`).
- `LIST(ANY)` runtime trap. Sentinel now seeds `["__sentinel__"]`; `WITH r WHERE r.id <> SENTINEL` filters before COPY.
- `CALL CREATE_FTS_INDEX` / `CALL CREATE_VECTOR_INDEX` on a readOnly Database surfaced as "Cannot execute write operations in a read-only database!" the moment a reader called `search()`. Fix: build both at end of `bulkLoad`; readOnly opens skip index creation.
Net diff
`+1545 / -7391` across 60 files (5,846 net deletions).
Test plan
- `mise run check` exit 0 (lint + typecheck + banned-strings + build + test)
- `@ladybugdb/core` binding installs on your platform. Alpine/musl users need a cmake-js source build (Wave 4 doctor flips to fail-hard)
🤖 Generated with Claude Code