feat(memory): structured messages + opt-in dedupBySession; add greymemory-cc plugin #8

Merged
arun-dev-des merged 4 commits into main from feat/structured-messages-and-dedup
Jun 2, 2026
@arun-dev-des

Owner

Summary

Implements the two greymemory library features needed to back a Claude Code memory plugin, and adds that plugin (greymemory-cc/). Both library features are additive and opt-in — existing string / {role,content:string}[] callers behave byte-for-byte as before.

Library — Feature 2: structured messages (src/memory.js, src/index.d.ts)

  • Message now accepts a tool role, tool_calls / tool_call_id / name, and content as string | ContentBlock[].
  • A single normalization chokepoint in add() (_normalizeContent / _serializeToolCalls / _normalizeMessages) flattens everything to { role, content:string }: content blocks → text + [image] placeholder (never the URL), assistant tool_calls → [tool_call name=.. args=..], tool results → [tool result name=..].
  • source_role widened to Role | null.
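As a rough illustration of the flattening described above, here is a minimal sketch. The function names (flattenContent, serializeToolCalls, flattenMessage), the content-block shape ({ type, text }) and the tool-call shape ({ name, args }) are assumptions made for this example; they are not the library's _normalize* internals and may differ from the real types in src/index.d.ts.

```javascript
// Sketch only: all names and shapes here are hypothetical stand-ins,
// not the actual _normalizeContent / _serializeToolCalls implementations.

// Collapse string | ContentBlock[] content into a single string.
function flattenContent(content) {
  if (typeof content === "string") return content;
  if (!Array.isArray(content)) return "";
  return content
    .map((block) => {
      if (block.type === "text") return block.text ?? "";
      // Assumed image block types; the image URL is deliberately not copied.
      if (block.type === "image" || block.type === "image_url") return "[image]";
      return "";
    })
    .filter((part) => part !== "")
    .join("\n");
}

// Render assistant tool calls as bracketed one-line markers.
function serializeToolCalls(toolCalls = []) {
  return toolCalls
    .map((call) => `[tool_call name=${call.name} args=${JSON.stringify(call.args ?? {})}]`)
    .join("\n");
}

// Reduce any accepted message shape to { role, content: string }.
function flattenMessage(message) {
  const text = flattenContent(message.content);
  if (message.role === "tool") {
    const label = `[tool result name=${message.name ?? "unknown"}]`;
    return { role: "tool", content: `${label} ${text}`.trim() };
  }
  const calls = serializeToolCalls(message.tool_calls);
  return { role: message.role, content: [text, calls].filter(Boolean).join("\n") };
}
```

Plain string messages pass through this sketch unchanged, which mirrors the back-compat claim in the summary.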

Library — Feature 1: dedupBySession (src/memory.js, src/storage.js, bin/migrate.js, src/index.d.ts)

  • Opt-in add(input, { sessionId, dedupBySession: true }). Each round's raw text is sha256-hashed before contextualization; rounds already ingested under the same sessionId+container are skipped (no chunk, no contextualization, no extraction).
  • New chunks.content_hash + partial idx_chunks_dedup, kept in lockstep across _init, _migrate (rebuild + ALTER fallback), and bin/migrate.js; new storage.chunkExists(); AddResult.roundsSkipped / chunksSkipped.
  • Session-mode extraction is rebuilt from surviving rounds with re-indexed provenance.
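The skip logic can be sketched roughly as follows. This is an in-memory approximation: a Set stands in for chunks.content_hash and storage.chunkExists(), and the names makeDedupFilter and filterRounds, the key layout, and the error thrown when sessionId is missing are all assumptions for illustration rather than the library's actual behavior.

```javascript
const { createHash } = require("node:crypto");

// Hex-encoded SHA-256 of the raw round text.
function sha256(text) {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Sketch only: the Set replaces the database lookup the PR describes.
function makeDedupFilter() {
  const seen = new Set();
  // Returns the rounds that still need ingesting plus a skip count.
  return function filterRounds(rounds, { sessionId, container, dedupBySession = false } = {}) {
    // Opt-in gating: without the flag every round is kept.
    if (!dedupBySession) return { survivors: rounds, roundsSkipped: 0 };
    // Assumed handling of the "requires sessionId" rule.
    if (!sessionId) throw new TypeError("dedupBySession requires sessionId");
    const survivors = [];
    let roundsSkipped = 0;
    for (const rawText of rounds) {
      // Hash the raw text, before any contextualization would be applied.
      const key = `${container}\u0000${sessionId}\u0000${sha256(rawText)}`;
      if (seen.has(key)) {
        roundsSkipped += 1;
        continue;
      }
      seen.add(key);
      survivors.push(rawText);
    }
    return { survivors, roundsSkipped };
  };
}
```

Because the key includes both the container and the session, the same text re-added under a different container is treated as new, which matches the container-isolation case listed under Testing.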

Plugin — greymemory-cc/

A Claude Code plugin that captures sessions into a self-hosted greymemory DB (Stop hook → structured transcript mapping → add({ sessionId, dedupBySession })) and injects memories (SessionStart + UserPromptSubmit → hookSpecificOutput.additionalContext), plus a stdio MCP server (grey_search / grey_add / grey_profile) and slash commands. DB opened with journal_mode=WAL + busy_timeout for multi-process safety. Capture runs in a detached worker so the Stop hook never blocks the turn; a UUID-watermark cursor + dedupBySession make re-reading a growing transcript idempotent.
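One way to picture the UUID-watermark cursor is the sketch below. It assumes each transcript entry carries a uuid field and that the cursor stores the uuid of the last entry processed; the function names are hypothetical and the plugin's real cursor logic may differ.

```javascript
// Sketch only: entry shape ({ uuid, ... }) and function names are assumptions.

// Return the entries that appear after the stored watermark.
function entriesAfterWatermark(entries, lastUuid) {
  if (!lastUuid) return entries;
  const index = entries.findIndex((entry) => entry.uuid === lastUuid);
  // Unknown watermark (for example a rewritten transcript): re-read everything
  // and rely on dedupBySession to drop rounds that were already ingested.
  return index === -1 ? entries : entries.slice(index + 1);
}

// Advance the watermark to the last entry seen, or keep the old one.
function nextWatermark(entries, lastUuid) {
  return entries.length > 0 ? entries[entries.length - 1].uuid : lastUuid;
}
```

In this sketch the cursor only limits how much of the transcript is re-read; the content-hash dedup remains the safety net when the watermark cannot be found.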

Testing

  • tsc --noEmit clean.
  • New tracked suites (stub extractor/embedder, no external deps): test files/test-task-3-dedup.js (skip counts, opt-in gating, contextualRetrieval, session-mode survivor prompt, container isolation) and test files/test-task-4-structured.js (content blocks, image-only, tool_calls in prompt, tool source_role, plain back-compat).
  • Migration paths verified on simulated pre-v0.4 (rebuild), v0.4 (ALTER fallback), and fresh/idempotent DBs.
  • Adversarial multi-agent review: back-compat and dedup-correctness clean; the only findings raised were verified as false positives (pre-existing behavior; the contract is Message[] | string).

Verified live

Ran the plugin end-to-end against Ollama (mxbai-embed-large) + Anthropic (claude-haiku-4-5): capture produced 4 round-chunks + 8 correctly-typed facts (with [tool result name=Bash] / [tool_call name=Bash] mapping), retrieval injected ranked context, the MCP server answered tools/list / grey_search / grey_profile over stdio, and dedupBySession skipped all rounds on re-add (chunks unchanged).

🤖 Generated with Claude Code

arun-dev-des and others added 4 commits June 1, 2026 13:34
…mory-cc plugin

Library (src/):
- Message now accepts a 'tool' role, tool_calls/tool_call_id/name, and content
  as string | ContentBlock[]. A normalization chokepoint in add()
  (_normalizeContent / _serializeToolCalls / _normalizeMessages) flattens
  everything to { role, content:string }: content blocks -> text + [image]
  placeholder, assistant tool_calls -> [tool_call name=.. args=..], tool results
  -> [tool result name=..]. source_role widened to Role | null.
- dedupBySession: opt-in add() option (requires sessionId). Each round's RAW text
  is sha256-hashed before contextualization; rounds already ingested under the
  same sessionId in this container are skipped (no chunk, no contextualization,
  no extraction). New chunks.content_hash + partial idx_chunks_dedup across
  _init / _migrate (rebuild + ALTER fallback) / bin/migrate.js; storage.chunkExists();
  AddResult.roundsSkipped / chunksSkipped. Session-mode extraction is rebuilt from
  surviving rounds with re-indexed provenance. Off by default -> byte-for-byte
  back-compat for existing string / {role,content:string}[] callers.
- index.d.ts updated (Role, ContentBlock, ToolCall, Message, AddOptions, AddResult).

Tests (test files/): test-task-3-dedup.js (dedup: skip counts, opt-in gating,
contextualRetrieval, session-mode survivor prompt, container isolation),
test-task-4-structured.js (content blocks, image-only, tool_calls, tool source_role,
plain back-compat). tsc --noEmit clean; migration paths verified on simulated old DBs.

Plugin (greymemory-cc/): Claude Code plugin that captures sessions into a
self-hosted greymemory DB (Stop hook -> structured transcript mapping ->
add({ sessionId, dedupBySession })) and injects memories (SessionStart +
UserPromptSubmit -> hookSpecificOutput.additionalContext), plus a stdio MCP server
(grey_search / grey_add / grey_profile) and slash commands. DB opened with WAL +
busy_timeout for multi-process safety. Verified live against Ollama
(mxbai-embed-large) + Anthropic (claude-haiku-4-5).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

… seam, CI

- Tier 1 unit (test/plugin.test.mjs): pure-logic tests for transcript mapping,
  cursor watermark, and container resolution. Zero deps (node built-ins only).
- Tier 2 integration (test/integration.test.mjs): spawns the real capture-worker,
  retrieve hook, and MCP server with CC-shaped payloads, verifying DB writes,
  structured tool mapping, retrieval injection, MCP tools, and dedup — fully
  offline.
- lib/memory.mjs: add GREYMEMORY_EXTRACTOR=stub and GREYMEMORY_EMBEDDER=stub
  providers (deterministic, no network), gated behind explicit env; defaults
  (anthropic / ollama) unchanged. Enables offline tests and fully-local runs.
- package.json: `npm test` (unit) + `npm run test:integration`.
- .github/workflows/ci.yml: tsc --noEmit + library feature tests + plugin unit +
  offline integration, on push/PR. No secrets required.
- README: Testing section (three tiers + provider knobs).
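For readers unfamiliar with stub providers, a deterministic embedder gated behind an environment variable could look roughly like this. Only the GREYMEMORY_EMBEDDER=stub setting and the ollama default come from the commit message; stubEmbed, pickEmbedder, the 8-dimension default, and the hashing scheme are assumptions for illustration, not the code in lib/memory.mjs.

```javascript
const { createHash } = require("node:crypto");

// Sketch only: derive a fixed-length vector from a SHA-256 digest so the
// same text always maps to the same numbers, with no network access.
function stubEmbed(text, dimensions = 8) {
  const digest = createHash("sha256").update(text, "utf8").digest(); // 32 bytes
  const vector = [];
  for (let i = 0; i < dimensions; i += 1) {
    // Scale each byte into the range [0, 1].
    vector.push(digest[i % digest.length] / 255);
  }
  return vector;
}

// The stub is used only when explicitly requested; otherwise the default stays.
function pickEmbedder(env = process.env) {
  return env.GREYMEMORY_EMBEDDER === "stub" ? "stub" : "ollama";
}
```

Such vectors carry no semantic meaning; they are only useful for making offline tests repeatable.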

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Pivot transcript mapping from structured messages to a clean prose round stream. Fix ide-context prompt loss (strip <ide_*> BEFORE the injected-row test, so a prompt typed with IDE context attached is no longer dropped wholesale). Move readJsonl to io.mjs so transcript.mjs is a pure entries->messages transform. Add lib/config.mjs: captureTools is user opt-in (settings.json / GREYMEMORY_CAPTURE_TOOLS), default off (conversational). Update unit + integration suites and README to match.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Merge the former greymemory-viz and greymemory-diag into a single greymemory-console (one server + one client with viz/ and diag/ surfaces). Add benchmark helper scripts (ingest-single, verify-task-1, test-task-1-1) and CP1/CP2 round/key test files; refresh CLAUDE.md/benchmark/run.js. (Pre-existing working-tree changes, committed as-is per request; greymemory-cc work is in the preceding commit.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

arun-dev-des merged commit 467694b into main on Jun 2, 2026