
feat!: lbug-only graph backend; rip DuckDB graph adapter #117

Merged
theagenticguy merged 3 commits into main from feat/duckdb-graph-rip on May 16, 2026

@theagenticguy
Owner

Summary

  • DuckDB no longer implements IGraphStore. lbug (@ladybugdb/core) is the sole graph backend; DuckDB stays as the temporal-only sidecar (cochanges, symbol summaries, sql escape hatch, deterministic embeddings Parquet via temp-table staging).
  • The auto-probe / dual-artifact / CODEHUB_STORE resolver and ~1900 LOC of DuckDB graph-tier code are gone. openStore({path}) always returns {graph, temporal, graphFile, temporalFile, close} over graph.lbug + temporal.duckdb.
  • The IGraphStore / ITemporalStore segregation is preserved as the v1.0 community-adapter contract (AGE / Memgraph / Neo4j / Neptune still target IGraphStore).
  • See ADR 0016 for the rationale + operational impact (lbug platform reach, future SQLite-WASM temporal swap). ADR 0013 is fully superseded; ADR 0011 partially.
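A minimal TypeScript sketch of the store contract described in the summary, assuming `path` is the repository root. `IGraphStore` / `ITemporalStore` are reduced to marker interfaces here, and `artifactPaths` / `openStoreSketch` are hypothetical helpers, not the package's real exports. Only the result fields (`graph`, `temporal`, `graphFile`, `temporalFile`, `close`) and the two filenames come from this PR.

```typescript
// Marker stand-ins: the real IGraphStore / ITemporalStore carry the full
// v1.0 adapter contract, which is not reproduced here.
interface IGraphStore {
  readonly role: "graph";
}

interface ITemporalStore {
  readonly role: "temporal";
}

// Shape of openStore({ path })'s result: no `backend` field, always both stores.
interface OpenedStore {
  graph: IGraphStore;
  temporal: ITemporalStore;
  graphFile: string;
  temporalFile: string;
  close(): Promise<void>;
}

// Hypothetical helper: both artifacts always live side by side in .codehub.
function artifactPaths(repoRoot: string): { graphFile: string; temporalFile: string } {
  const dir = `${repoRoot.replace(/\/+$/, "")}/.codehub`;
  return {
    graphFile: `${dir}/graph.lbug`,
    temporalFile: `${dir}/temporal.duckdb`,
  };
}

// Hypothetical stand-in for openStore: wires placeholder stores to the fixed
// paths. The real function opens an lbug Database and a DuckDB connection.
async function openStoreSketch(opts: { path: string }): Promise<OpenedStore> {
  const { graphFile, temporalFile } = artifactPaths(opts.path);
  return {
    graph: { role: "graph" },
    temporal: { role: "temporal" },
    graphFile,
    temporalFile,
    close: async () => {
      // Real implementation would close the lbug handle and the DuckDB sidecar.
    },
  };
}
```

Keeping the adapter contract as interfaces is what lets an AGE, Memgraph, Neo4j, or Neptune adapter sit behind `graph` without touching the DuckDB temporal sidecar.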

Operational fixes captured in the same change

  1. lbug maxDBSize virtual-address exhaustion: the default 1 << 43 = 8 TiB is reserved per Database at construction. The pool now passes explicit maxDbBytes=16 GiB + bufferManagerBytes=2 GiB (citations: Buffer manager exception kuzudb/kuzu#1826, BufferPoolConstants::DEFAULT_VM_REGION_MAX_SIZE).
  2. Empty-array STRING[] sentinel → LIST(ANY) runtime trap. The sentinel now seeds ["__sentinel__"], and WITH r WHERE r.id <> SENTINEL filters it out before COPY.
  3. CALL CREATE_FTS_INDEX / CALL CREATE_VECTOR_INDEX ran against a readOnly Database, surfacing as "Cannot execute write operations in a read-only database!" the moment a reader called search(). Fix: build both indexes at the end of bulkLoad; readOnly opens skip index creation.
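Item 1 is easiest to see as arithmetic. The sketch below assumes a 47-bit (128 TiB) user address space, as on x86-64 Linux, and that every Database reserves its full size at construction. `maxConcurrentDatabases` is a hypothetical helper that ignores all other mappings in the process, so real headroom is lower than the bound it reports.

```typescript
const GiB = 2 ** 30;
const TiB = 2 ** 40;

// Assumed 47-bit user virtual address space (128 TiB), as on x86-64 Linux.
const USER_VA_BYTES = 2 ** 47;
// lbug's default maxDBSize (1 << 43 per this PR), reserved per Database.
const DEFAULT_MAX_DB_BYTES = 2 ** 43;
// Explicit limits the pool now passes.
const POOL_MAX_DB_BYTES = 16 * GiB;
const POOL_BUFFER_MANAGER_BYTES = 2 * GiB;

// Hypothetical helper: upper bound on Databases whose reservations fit in the
// address space, ignoring heap, stacks, and every other mapping.
function maxConcurrentDatabases(perDbBytes: number, vaBytes: number = USER_VA_BYTES): number {
  return Math.floor(vaBytes / perDbBytes);
}

console.log(DEFAULT_MAX_DB_BYTES / TiB); // 8
console.log(maxConcurrentDatabases(DEFAULT_MAX_DB_BYTES)); // 16
console.log(maxConcurrentDatabases(POOL_MAX_DB_BYTES)); // 8192
console.log(POOL_BUFFER_MANAGER_BYTES / GiB); // 2
```

The constants use `2 ** 43` rather than `1 << 43` on purpose: in JavaScript and TypeScript the shift count is taken modulo 32, so `1 << 43` evaluates to 2048.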

Net diff

+1545 / -7391 across 60 files (5,846 net deletions).

Test plan

  • mise run check exit 0 (lint + typecheck + banned-strings + build + test)
  • Storage package test count: 148 pass / 0 fail / 1 skip (deterministic across 3 consecutive runs)
  • Workspace test count: 1931 pass / 0 fail / 2 platform-skipped
  • Pack package: byte-identity test for embeddings.parquet staged lbug → temp table → COPY → drop temp; SHA equality across two runs
  • Reviewer: confirm the @ladybugdb/core binding installs on your platform. Alpine/musl users need a cmake-js source build (Wave 4 doctor flips to fail-hard)
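The SHA-equality check in the pack test can be sketched with Node's built-in `crypto` module. `sha256Hex` and `assertByteIdentical` are hypothetical helper names; the actual test may compare digests differently.

```typescript
import { createHash } from "node:crypto";

// Hex SHA-256 of a byte buffer (e.g. the contents of embeddings.parquet).
function sha256Hex(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Throws when two runs produced different bytes; returns the shared digest.
function assertByteIdentical(runA: Uint8Array, runB: Uint8Array): string {
  const a = sha256Hex(runA);
  const b = sha256Hex(runB);
  if (a !== b) {
    throw new Error(`embeddings.parquet is not byte-identical: ${a} != ${b}`);
  }
  return a;
}
```

In a test, the two inputs would be the bytes of `embeddings.parquet` from two independent runs, for example read with `readFileSync` from `node:fs`, which returns a `Buffer` (a `Uint8Array` subclass).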

🤖 Generated with Claude Code

DuckDB no longer implements IGraphStore. lbug (`@ladybugdb/core`) is the
sole graph backend; DuckDB stays as the temporal-only sidecar
(cochanges, symbol summaries, sql escape hatch, embeddings staging for
the deterministic Parquet sidecar). The auto-probe / dual-artifact /
CODEHUB_STORE resolver, the stale-artifact detection, and ~1900 LOC of
DuckDB graph-tier code are all gone.

Storage shape after the rip:
- `openStore({path})` always returns `{graph: GraphDbStore, temporal:
  DuckDbStore, graphFile, temporalFile, close}`. No `backend` field on
  the result, no `backend?` option on input.
- Artifacts are always `<repo>/.codehub/graph.lbug` + `<repo>/.codehub/temporal.duckdb`.
  `paths.describeArtifacts()` takes no arguments. `resolveDbPath` is
  renamed `resolveGraphPath` and returns the lbug filename.
- The `IGraphStore` / `ITemporalStore` segregation stays — that's the
  contract community AGE/Memgraph/Neo4j/Neptune adapters target. The
  v1.0 conformance suite stays for the same reason.
- Embeddings live in graph.lbug; the pack sidecar streams them through
  a per-call DuckDB temp table on temporal.duckdb so the byte-identical
  Parquet writer still works.
- MCP `sql` tool's `cypher` field becomes unconditionally available.
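The embeddings staging flow (lbug rows into a per-call DuckDB temp table, `COPY` to Parquet, then drop the temp table) might look roughly like this. The `SqlExecutor` interface, the table and column names, and the counter-based temp-table name are assumptions; the statements themselves are ordinary DuckDB SQL. Ordering by `id` inside the `COPY` is one assumed way to pin row order so repeated runs write identical bytes.

```typescript
// Hypothetical shape of one staged embedding row.
interface EmbeddingRow {
  id: string;
  vector: number[];
}

// Hypothetical executor over the DuckDB connection behind temporal.duckdb.
// Whether a JS array binds directly to FLOAT[] depends on the driver.
interface SqlExecutor {
  run(sql: string, params?: unknown[]): Promise<void>;
}

let stageCounter = 0;

// DuckDB statements for one staging pass; identifier and path are quoted.
function stagingStatements(tempTable: string, parquetPath: string) {
  const table = `"${tempTable.replace(/"/g, '""')}"`;
  const target = `'${parquetPath.replace(/'/g, "''")}'`;
  return {
    create: `CREATE TEMP TABLE ${table} (id VARCHAR, vector FLOAT[])`,
    insert: `INSERT INTO ${table} VALUES (?, ?)`,
    copy: `COPY (SELECT id, vector FROM ${table} ORDER BY id) TO ${target} (FORMAT PARQUET)`,
    drop: `DROP TABLE IF EXISTS ${table}`,
  };
}

// Stage -> COPY -> drop; the temp table is dropped even if an insert or COPY fails.
async function writeEmbeddingsParquet(
  db: SqlExecutor,
  rows: Iterable<EmbeddingRow>,
  parquetPath: string,
): Promise<void> {
  const s = stagingStatements(`embeddings_stage_${++stageCounter}`, parquetPath);
  await db.run(s.create);
  try {
    for (const row of rows) {
      await db.run(s.insert, [row.id, row.vector]);
    }
    await db.run(s.copy);
  } finally {
    await db.run(s.drop);
  }
}
```

Staging through a temp table keeps the Parquet writer on DuckDB even though the embeddings now live in graph.lbug, which is why the existing byte-identity guarantee can survive the rip.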

lbug operational fixes captured in the same change:
- Pool now passes explicit `maxDbBytes=16 GiB` and
  `bufferManagerBytes=2 GiB` so concurrent test Databases don't exhaust
  the 47-bit (128 TiB) user VA on Linux. The default `maxDBSize=1<<43`
  (8 TiB) is reserved at construction, so at most 16 such Databases fit
  before the reservations alone fill the address space. Citations:
  kuzudb/kuzu#1826, `BufferPoolConstants::DEFAULT_VM_REGION_MAX_SIZE`.
- Bulk-load STRING[] sentinel switched from `[]` to `["__sentinel__"]`
  so lbug's struct-field type inference doesn't resolve to LIST(ANY).
  Empty-array sentinels surface as "Trying to create a vector with ANY
  type" the moment a data row supplies a string.
- `ensureFtsIndex` / `ensureVectorIndex` no-op in readOnly mode, and
  bulkLoad runs them at the end of the write path so readers don't
  trigger writes on lbug.
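The readOnly guard and the index-build ordering can be modelled as statement plans. The helper names mirror the PR's `ensureFtsIndex` / `ensureVectorIndex`, but these signatures are invented for illustration: they return statements instead of executing them, and the table, index, and property names are hypothetical. The `CALL` argument order follows Kuzu's documented form and is assumed to carry over to lbug.

```typescript
type Stmt = string;

// Hypothetical index definitions over CodeNode.
const FTS_DDL: Stmt = "CALL CREATE_FTS_INDEX('CodeNode', 'codenode_fts', ['name', 'body'])";
const VECTOR_DDL: Stmt = "CALL CREATE_VECTOR_INDEX('CodeNode', 'codenode_vec', 'embedding')";

// On a read-only Database these contribute no statements, so a reader
// calling search() can never trigger a write.
function ensureFtsIndex(readOnly: boolean): Stmt[] {
  return readOnly ? [] : [FTS_DDL];
}

function ensureVectorIndex(readOnly: boolean): Stmt[] {
  return readOnly ? [] : [VECTOR_DDL];
}

// Writer path: COPY the data first, then build both indexes as the final
// step, so graph.lbug already carries them when readers open it.
function bulkLoadPlan(copyStatements: Stmt[]): Stmt[] {
  return [...copyStatements, ...ensureFtsIndex(false), ...ensureVectorIndex(false)];
}

// Reader path: whatever search() asks for is a no-op on a readOnly handle.
function searchPrelude(readOnly: boolean): Stmt[] {
  return [...ensureFtsIndex(readOnly), ...ensureVectorIndex(readOnly)];
}
```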

ADR 0016 records the rip-out and supersedes ADR 0013 entirely; ADR
0011's "DuckDB-default + LadybugDB opt-in" framing is partially
superseded.

Net diff: +1297 / -7391 (6094 net deletions across 60 files).

Workspace verdict: `mise run check` exit 0 — 1931 passing tests, 0
failing, 2 platform-skipped.
…n before delete

Two lbug-vs-DuckDB-graph behavioral gaps surfaced by self-scan against
the OCH repo:

1. lbug's COPY enforces that every relation's from/to is a real CodeNode
   primary key. The pipeline's fetches phase emits synthetic targets
   (e.g. `fetches:unresolved:GET:/users/1`) carrying the URL template in
   `reason`; these intentionally have no node. DuckDB silently accepted
   them; lbug rejects with `Copy exception: Unable to find primary key
   value`. Synthesize a Route placeholder for every orphan edge target
   before insertNodes; downstream tools recognise the well-known prefix.

2. lbug builds the FTS index against CodeNode; deleting from CodeNode
   (truncateAll in replace mode, mergeNodes per-id in upsert mode)
   without the FTS extension loaded surfaces `Binder exception: Trying
   to delete from an index on table CodeNode but its extension is not
   loaded`. ingest-sarif's bulkLoad(graph, {mode: "upsert"}) hits this
   on every analyze run after the first. Load the extension at the top
   of bulkLoad so both modes' deletes succeed; failures are swallowed
   on platforms without FTS so the search-side codepath surfaces the
   clearer error later.
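Fix 1 amounts to a pre-pass over the edges before `insertNodes`. A minimal sketch, assuming simplified node and edge shapes: `GraphNode`, `GraphEdge`, `withOrphanTargetPlaceholders`, `isPlaceholderRoute`, the `FETCHES` edge type, and using `reason` as the placeholder's name are all hypothetical. The prefix constant is taken from the example target id above.

```typescript
interface GraphNode {
  id: string;
  kind: string;
  name: string;
}

interface GraphEdge {
  from: string;
  to: string;
  type: string;
  reason?: string;
}

// Well-known prefix, assumed from the `fetches:unresolved:GET:/users/1` example.
const UNRESOLVED_FETCH_PREFIX = "fetches:unresolved:";

// How a downstream tool might recognise a synthesized placeholder.
function isPlaceholderRoute(node: GraphNode): boolean {
  return node.kind === "Route" && node.id.startsWith(UNRESOLVED_FETCH_PREFIX);
}

// Add one Route placeholder per distinct edge target that has no node, so
// every relationship target resolves to a primary key before COPY.
function withOrphanTargetPlaceholders(nodes: GraphNode[], edges: GraphEdge[]): GraphNode[] {
  const known = new Set(nodes.map((n) => n.id));
  const placeholders: GraphNode[] = [];
  for (const edge of edges) {
    if (known.has(edge.to)) continue;
    known.add(edge.to); // dedupe repeated orphan targets
    placeholders.push({ id: edge.to, kind: "Route", name: edge.reason ?? edge.to });
  }
  return [...nodes, ...placeholders];
}
```

As described, only orphan targets get placeholders; an edge whose source is missing would still fail lbug's primary-key check.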

Verified: `codehub analyze .` runs end-to-end on the OCH repo (18,893
findings ingested via SARIF). Full workspace tests still 1931 pass / 0
fail. mise run check exit 0.
theagenticguy merged commit 49e14fd into main on May 16, 2026
42 of 46 checks passed
theagenticguy deleted the feat/duckdb-graph-rip branch on May 16, 2026 at 10:45
github-actions (bot) mentioned this pull request on May 16, 2026