From ca93640d84ee78b4f340db1a2af172f20909d67d Mon Sep 17 00:00:00 2001 From: Kris McGinnes Date: Tue, 16 Jun 2026 15:32:27 -0500 Subject: [PATCH 1/2] Add ADRs for cross-tab persistence reconciliation strategy MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Record the two architectural decisions governing the fix for #1820: - IndexedDB (via localForage), not localStorage, for client-side persistence — Session and Schema payloads exceed localStorage's ~5MB limit, a constraint invisible in the code. - Per-key diff-merge reconciliation at the storage layer, scoped to durability only; same-entry conflicts and live cross-tab freshness are explicitly deferred. ADRs use the date-prefixed naming convention (#1835) and reference each other by slug. --- ...exeddb-not-localstorage-for-persistence.md | 38 +++++++++++++++ ...key-diff-merge-cross-tab-reconciliation.md | 47 +++++++++++++++++++ 2 files changed, 85 insertions(+) create mode 100644 docs/adr/20260616-indexeddb-not-localstorage-for-persistence.md create mode 100644 docs/adr/20260616-per-key-diff-merge-cross-tab-reconciliation.md diff --git a/docs/adr/20260616-indexeddb-not-localstorage-for-persistence.md b/docs/adr/20260616-indexeddb-not-localstorage-for-persistence.md new file mode 100644 index 000000000..e5fabf35d --- /dev/null +++ b/docs/adr/20260616-indexeddb-not-localstorage-for-persistence.md @@ -0,0 +1,38 @@ +# ADR — IndexedDB (via localForage), not localStorage, for client-side persistence + +- **Status:** Accepted +- **Date:** 2026-06-16 +- **Related:** ADR `per-key-diff-merge-cross-tab-reconciliation` builds the cross-tab fix on top of this constraint. Issue #1820. + +## Context + +Graph Explorer stores all user data client-side — there is no server-side storage. The persisted collections are **User Preferences**, **Schema**, **Connections**, and **Sessions**, all routed through `atomWithLocalForage` (`core/StateProvider/atomWithLocalForage.ts`), which is backed by localForage. localForage is configured (`name: "ge"`, `storeName: "graph-explorer"`) to use IndexedDB as its driver. + +Two of these payloads are routinely large: + +- A **Session** holds every **Vertex** and **Edge** a user has loaded through exploration for a **Connection**. Active exploration accumulates thousands of entities with their **Properties**. +- A **Schema** holds every discovered **Vertex Type**, **Edge Type**, and **Edge Connection** with their attributes, produced by **Schema Sync** against the connected database. + +Both regularly exceed `localStorage`'s ~5MB per-origin quota. `localStorage` is also synchronous and string-only (it would force `JSON.stringify`/`parse` of the whole payload on the main thread for every read and write). + +This constraint is **invisible in the code.** Nothing at the `atomWithLocalForage` call sites signals "these values are too big for `localStorage`." A reader who sees a simple read-once / write-whole-value atom could reasonably "simplify" it to `localStorage`, or reach for a `localStorage`-based collection library, and it would appear to work in development against small datasets — then silently fail in production when a real Session or Schema blows the quota. + +## Decision + +Persistence stays on **IndexedDB via localForage.** `localStorage` (and any library built on it) is explicitly ruled out for these collections. + +Rationale, recorded so it is not re-litigated: + +- **Capacity.** Session and Schema payloads exceed `localStorage`'s ~5MB limit; IndexedDB has no comparable practical ceiling for our payload sizes. +- **Asynchronous, structured storage.** IndexedDB stores structured values off the main thread; `localStorage` is synchronous and string-only. +- **The simplification is a trap, not a win.** A `localStorage`-backed `LocalStorageCollection`-style abstraction looks cleaner but cannot hold the data. Any persistence library evaluated for this codebase must use an IndexedDB driver (or equivalent large-capacity store), not `localStorage`. + +## Consequences + +- The persistence layer is inherently **asynchronous**. `atomWithLocalForage` preloads each value before returning the atom so reads are synchronous thereafter, but writes flush to IndexedDB in the background. The cross-tab fix in ADR `per-key-diff-merge-cross-tab-reconciliation` has to live inside that async write path. +- IndexedDB is **shared across all same-origin tabs** but has **no built-in cross-tab synchronization**. That sharing is precisely what makes the concurrent-write clobber in #1820 possible, and what ADR `per-key-diff-merge-cross-tab-reconciliation` reconciles. +- Tests mock localForage globally (see `setupTests.ts`), so this constraint does not impose IndexedDB on the test environment. + +## Out of scope + +- The choice of localForage as the IndexedDB wrapper (vs. raw IndexedDB or another wrapper) is not revisited here — only the IndexedDB-class storage requirement is. Swapping wrappers is allowed as long as the replacement keeps large-capacity, asynchronous, structured storage. diff --git a/docs/adr/20260616-per-key-diff-merge-cross-tab-reconciliation.md b/docs/adr/20260616-per-key-diff-merge-cross-tab-reconciliation.md new file mode 100644 index 000000000..104a8eff4 --- /dev/null +++ b/docs/adr/20260616-per-key-diff-merge-cross-tab-reconciliation.md @@ -0,0 +1,47 @@ +# ADR — Per-key diff-merge reconciliation for shared persisted collections + +- **Status:** Accepted +- **Date:** 2026-06-16 +- **Related:** ADR `indexeddb-not-localstorage-for-persistence` (IndexedDB constraint) is the storage substrate this reconciliation runs on. Issue #1820 (the clobber bug); inverse of #1788 / per-tab active connection, which wants per-tab _divergence_ rather than reconciliation. + +## Context + +`atomWithLocalForage` (`core/StateProvider/atomWithLocalForage.ts`) reads each value **once at startup** into an in-memory Jotai atom, then writes the **whole value** back on every change. IndexedDB is shared across same-origin tabs but has no cross-tab synchronization (no BroadcastChannel, no `storage` events, no Web Locks — see ADR `indexeddb-not-localstorage-for-persistence`). + +For collections mutated read-modify-write — e.g. `set(atom, prev => [...prev, x])` — a second tab's in-memory `prev` is stale the moment another tab persists. Writing the whole collection back then **silently drops entries the other tab added**. The clobber is last-writer-wins over the _entire_ collection, not just the field a tab touched. + +Concrete failure (#1820): Tab A styles **Vertex Type** X and persists. Tab B, opened before that and never having seen X, styles type Y and writes its stale **User Preferences** array — type X's styling is silently lost, discovered only on the next reload. The same hazard hits **Schema** (worst case: async **Schema Sync** completion clobbering a just-added SPARQL prefix), **Connections**, and **Sessions**. Affected: 32 write sites across 7 files. + +Scalar atoms (e.g. active connection) are unaffected — each write is a complete intended value, so there is no sibling entry to lose. + +## Decision + +Reconcile at the **storage layer**, inside the `atomWithLocalForage` write path, using a **per-key, diff-the-output merge**. On each persist: + +1. **Re-read** the current persisted value from IndexedDB (it may reflect writes by other tabs since this tab loaded). +2. **Diff this tab's output** — compare the value this tab is about to write against this tab's _previous in-memory value_ to determine exactly which keys this tab changed. +3. **Apply only those changed keys** onto the freshly-read value, and persist the result. + +The unit of reconciliation is the **key** (collection entry — e.g. one **Vertex Type**'s styling, one **Connection**, one **Schema** entry), so a tab editing entry Y never overwrites entry X that another tab added. The diff is computed from the _resulting_ values, not by replaying the updater function. + +### Scope boundary — durability only + +This decision fixes **durability** of concurrent writes to _different_ entries. It deliberately does **not** address: + +- **Same-entry conflicts.** Two tabs editing the _same_ key remain **last-writer-wins**. Field-level merge within a single entry is out of scope. +- **Live cross-tab freshness.** A tab still reads its in-memory copy and can show **stale data** while another tab is open. Propagating live changes between open tabs (so both reflect each other's edits without a reload) is explicitly deferred to a separate effort. + +## Considered Options + +- **Per-key diff-merge re-read at write (chosen).** Smallest change that fixes the clobber for sibling entries; lives entirely in the storage layer so all 32 call sites are fixed without touching them. Does not fix stale reads. +- **Replay-the-updater on the re-read value.** Re-run the `prev => next` updater against the freshly-read persisted value instead of diffing outputs. **Rejected: unsafe for non-idempotent updaters** — an updater like "append X" or "increment" applied to an already-updated base double-applies. Diffing the _output_ is update-function-agnostic and safe. +- **Cross-tab live sync (BroadcastChannel / `storage` events).** Keeps in-memory copies live and would also fix stale reads. **Deferred:** larger surface, and freshness is out of scope here. Can be layered on later without contradicting this decision. +- **Web Locks to serialize writes.** Heaviest option, with **availability caveats** across our deploy targets (not guaranteed in all embedding contexts, e.g. some notebook/proxy environments). Rejected for this fix. + +## Consequences + +- The write path becomes **read-modify-write against live IndexedDB** rather than a blind whole-value overwrite, adding one re-read per persist. Acceptable for the mutation frequencies involved (User Preferences is the highest, still human-interaction-paced). +- The merge needs each tab's **previous in-memory value** to compute its diff — `atomWithLocalForage` already holds this in its base atom, so no new state is introduced. +- Reconciliation is **per key**, so the persisted collections must be key-addressable (objects/maps keyed by entry id or type, or arrays reducible to such). Collections shaped as opaque blobs would not benefit. +- **Same-entry conflicts and stale reads persist by design** — anyone surprised by either should read this ADR's scope boundary before "fixing" it, and the live-sync follow-up is the intended home for the freshness work. +- This is the **inverse** of the per-tab active-connection decision (#1788): those scalars want each tab to _diverge_; these collections are genuinely shared and must _reconcile_. The two ship separately and must not be conflated. From 5b7a50a5a169d016c7939b56c4791b28e1b06629 Mon Sep 17 00:00:00 2001 From: Kris McGinnes Date: Tue, 16 Jun 2026 16:24:06 -0500 Subject: [PATCH 2/2] Add core rule against hard-wrapping Markdown prose --- AGENTS.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/AGENTS.md b/AGENTS.md index e19001967..22e28a9aa 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -4,7 +4,8 @@ - Prefer descriptive variable and function names over code comments - Prefer simple to follow logic over clever concise code -- Every commit should have no type errors, lint errors, or failing tests +- Every commit should have no type errors, lint errors, formatting issues, or failing tests +- Don't hard-wrap Markdown prose to a fixed column width, let it soft-wrap - When possible, create failing tests first then implement the logic to make the tests pass - Add or update tests for the code you change, even if nobody asked - For TypeScript conventions and rules, refer to `.kiro/skills/typescript/SKILL.md`