
MemStack

Persistent semantic memory for AI agents — local files, no cloud.

AI agents forget everything between sessions. MemStack gives them persistent memory through a REST API and MCP server — write once, search by meaning, store as plain Markdown. No cloud, no API keys, no database.


Quick Start

Linux and macOS only; for Windows, see docs/windows.md.

# 1. Install
uv sync --extra dev

# 2. Configure (copy example env and set vault path)
cp .env.example .env
# Then edit .env and set MEMSTACK_VAULT_PATH to your vault directory

# 3. Start
uv run memstack start

# 4. Verify
curl http://127.0.0.1:7777/health

Expected output:

{
  "status": "healthy",
  "version": "1.4.4",
  "components": {
    "vault": "healthy",
    "lancedb": "healthy",
    "embeddings": "healthy",
    "mcp_port": 7778,
    "shared": "disabled",
    "watcher": "healthy"
  }
}

How It Works

Every write goes through a deduplication pipeline that decides add (new), merge (append facts), update (replace), or ignore (duplicate). Each agent gets its own namespace; shared mode also copies private writes to a shared pool.

Write pipeline: similarity thresholds filter obvious matches, then Ollama decides add/merge/update/ignore for ambiguous cases; falls back to "add" if LLM is unavailable.

Storage: Markdown files with YAML frontmatter in a vault directory — human-readable, editable, version-controllable.

Search: vector + keyword hybrid via LanceDB, importance-weighted reranking with RRF fusion and time-based decay.
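
The reranking can be sketched as follows. The constants are the documented defaults; the exact blending formula is an assumption for illustration, not the library's actual scoring code.

```python
# Illustrative reranking: RRF fusion of vector and keyword rankings,
# blended with a time-decayed importance score (documented defaults).
RRF_K = 10               # MEMSTACK_RRF_K
IMPORTANCE_WEIGHT = 0.3  # MEMSTACK_IMPORTANCE_RERANK_WEIGHT
HALF_LIFE_DAYS = 7.0     # MEMSTACK_IMPORTANCE_DECAY_HALFLIFE

def rrf_score(ranks: list[int]) -> float:
    # Reciprocal Rank Fusion: one term per ranked list (1-based ranks).
    return sum(1.0 / (RRF_K + r) for r in ranks)

def decayed_importance(importance: float, age_days: float) -> float:
    # Exponential decay: importance halves every HALF_LIFE_DAYS.
    return importance * 0.5 ** (age_days / HALF_LIFE_DAYS)

def rerank_score(vector_rank: int, keyword_rank: int,
                 importance: float, age_days: float) -> float:
    fused = rrf_score([vector_rank, keyword_rank])
    return ((1 - IMPORTANCE_WEIGHT) * fused
            + IMPORTANCE_WEIGHT * decayed_importance(importance, age_days))
```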


Features

  • Persistent memory — text memories that survive across sessions, stored as Markdown files
  • Smart deduplication — add, merge, update, or ignore based on similarity + LLM consultation
  • Semantic search — find memories by meaning, not just keywords, with importance-weighted reranking
  • Multi-agent namespaces — isolated memory per agent, plus optional shared namespace
  • MCP server — 7 memory tools for AI agents, runs as a separate process on its own port
  • Local-first and private — no cloud, no API keys, no telemetry; data stays on your machine
  • Graceful degradation — CRUD works without embeddings; search returns 503 until a provider is available
  • Background consolidation — LLM-driven review of stale memories (rewrite, enrich, merge, split)

Memory Files

Every memory is a Markdown file with YAML frontmatter. You can edit them in any text editor, put the vault under version control, or browse them with tools like Obsidian.

---
agent: my-agent
created: "2026-05-06T04:39:16.923211+00:00"
id: deployed-v2-to-production-my-agent-2026-05-07
importance: 0.9
importance_updated: "2026-05-06T04:39:16.923211+00:00"
tags:
  - deploy
  - production
type: memory
updated: "2026-05-06T04:39:16.923211+00:00"
---

Deployed v2 to production on Saturday
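
Because memories are plain files, external tooling can read them directly. A minimal stdlib-only sketch of splitting a file like the one above into frontmatter and body (a real tool would use a YAML parser; scalar values come back as strings here):

```python
def parse_memory(text: str) -> tuple[dict, str]:
    # Split the "---"-fenced frontmatter from the Markdown body.
    _, frontmatter, body = text.split("---\n", 2)
    meta: dict = {}
    key = None
    for line in frontmatter.splitlines():
        if not line.strip():
            continue
        if line.startswith("  - "):        # YAML list item, e.g. under tags:
            meta[key].append(line[4:].strip())
        else:
            key, _, value = line.partition(":")
            value = value.strip().strip('"')
            meta[key] = [] if value == "" else value
    return meta, body.strip()
```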

REST API

Method Endpoint Description
POST /agents/{id}/memories Write a memory
GET /agents/{id}/memories List memories
GET /agents/{id}/memories/{mem_id} Read a memory
DELETE /agents/{id}/memories/{mem_id} Delete a memory
GET /agents/{id}/memories/search?q= Search by meaning
GET /agents/{id}/inject?q= Inject context
GET /agents/{id}/system-prompt Get agent system prompt
GET /health Server health

# Write a memory
curl -X POST http://127.0.0.1:7777/agents/my-agent/memories \
  -H "Content-Type: application/json" \
  -d '{"content":"Deployed v2 to production on Saturday","tags":["deploy","production"],"importance":0.9}'

Response:

{
  "decision": "added",
  "id": "deployed-v2-to-production-my-agent-2026-05-07",
  "similarity_score": null
}

Full API reference: docs/api.md
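
The id in the response above follows a recognizable pattern: a slug of the leading content words, the agent name, and the date. A hypothetical reconstruction — the real scheme is not documented here and may well differ:

```python
import re

# Hypothetical id scheme, reverse-engineered from one observed example;
# the word cutoff and normalization are guesses.
def memory_id(content: str, agent: str, date: str, max_words: int = 4) -> str:
    words = re.findall(r"[a-z0-9]+", content.lower())[:max_words]
    return "-".join(words + [agent, date])
```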


MCP Server

The MCP server exposes 7 memory operations as tools, running as a separate process on port 7778. Enabled by default — disable with MEMSTACK_MCP_ENABLED=false.

Tool Description
memory_write Write a memory
memory_search Search by meaning
memory_read Read a single memory
memory_delete Delete a memory
memory_list List agent memories
memory_inject Inject context
memory_get_system_prompt Get system prompt block

Configuration

Required:

Variable Default Description
MEMSTACK_VAULT_PATH (Required) Path to vault directory

Optional:

Variable Default Description
MEMSTACK_HOST 127.0.0.1 Server bind address
MEMSTACK_PORT 7777 Server bind port
MEMSTACK_IMPORTANCE_INITIAL_SCORE 0.5 Default importance for new memories
MEMSTACK_LOG_LEVEL INFO Log level: DEBUG, INFO, WARNING, ERROR, CRITICAL
MEMSTACK_LOG_FILE ~/.memstack/logs/memstack.log Log file path
MEMSTACK_LOG_ROTATION 10 MB Log rotation size threshold
MEMSTACK_LOG_RETENTION 7 days Log retention period
MEMSTACK_STATE_FILE ~/.memstack/state.json PID state file path
MEMSTACK_EMBEDDING_PROVIDER ollama Embedding provider: ollama or fastembed
MEMSTACK_EMBEDDING_MODEL nomic-embed-text Embedding model name (provider-specific)
MEMSTACK_EMBEDDING_AUTOFALLBACK true Auto-fallback to fastembed if Ollama unavailable
MEMSTACK_CHUNK_MAX_TOKENS 512 Max tokens per chunk for semantic chunking
MEMSTACK_CHUNK_OVERLAP_TOKENS 50 Overlap tokens between adjacent chunks
MEMSTACK_RRF_K 10 RRF constant for hybrid search fusion
MEMSTACK_IMPORTANCE_RERANK_WEIGHT 0.3 Weight for importance score in reranking (0.0–1.0)
MEMSTACK_INDEX_PATH ~/.memstack/index LanceDB index directory
MEMSTACK_SIMILARITY_ADD_THRESHOLD 0.25 Below this, always add as new memory
MEMSTACK_SIMILARITY_IGNORE_THRESHOLD 0.85 At or above this, treat as duplicate (if ignore enabled)
MEMSTACK_SIMILARITY_IGNORE_ENABLED false Auto-ignore duplicates above threshold (default: off — ambiguous cases go to LLM)
MEMSTACK_IMPORTANCE_DECAY_HALFLIFE 7.0 Half-life in days for importance decay
MEMSTACK_IMPORTANCE_HIT_INCREMENT 0.05 Importance bump on each retrieval
MEMSTACK_LLM_MODEL llama3 Ollama model for smart write consultation
MEMSTACK_LLM_HOST http://localhost:11434 Ollama host URL
MEMSTACK_MCP_ENABLED true Enable MCP server (separate process)
MEMSTACK_MCP_PORT 7778 Port for the standalone MCP server
MEMSTACK_WATCHER_ENABLED true Enable file watcher for automatic vault sync
MEMSTACK_WATCHER_DEBOUNCE_MS 2000 Debounce time in ms for file watcher events
MEMSTACK_SHARED_MODE false Enable shared mode — private writes also copied to shared namespace
MEMSTACK_INJECTION_MIN_SCORE 0.3 Minimum score for inject endpoint results
MEMSTACK_INJECTION_TOP_N 5 Max results returned by inject endpoint
MEMSTACK_EMBEDDING_CACHE_SIZE 1024 LRU cache size for embedding vectors
MEMSTACK_FTS_REBUILD_INTERVAL 50 Adds before FTS index rebuild
MEMSTACK_SEARCH_CACHE_TTL 30 TTL in seconds for search result cache
MEMSTACK_VAULT_CACHE_SIZE 512 LRU cache size for vault read/list operations
MEMSTACK_SYNTHESIS_ENABLED false Enable LLM synthesis for auto-capture writes
MEMSTACK_SYNTHESIS_MODEL (empty) Ollama model for synthesis (falls back to MEMSTACK_LLM_MODEL)
MEMSTACK_CONSOLIDATION_ENABLED false Enable background memory consolidation
MEMSTACK_CONSOLIDATION_INTERVAL 3600 Seconds between consolidation runs (min: 60)
MEMSTACK_CONSOLIDATION_BATCH_SIZE 20 Max memories per agent per run (1–100)
MEMSTACK_CONSOLIDATION_MODEL (empty) Ollama model for consolidation (falls back to MEMSTACK_LLM_MODEL)

All variables can be set in a .env file. See .env.example for the full reference.
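
For intuition on how MEMSTACK_CHUNK_MAX_TOKENS and MEMSTACK_CHUNK_OVERLAP_TOKENS interact, here is a simplified fixed-window sketch; the real chunker is semantic, so actual boundaries will differ.

```python
# Sliding windows of up to `max_tokens`, each sharing `overlap` tokens
# with its neighbor (documented defaults: 512 and 50).
def chunk(tokens: list[str], max_tokens: int = 512,
          overlap: int = 50) -> list[list[str]]:
    step = max_tokens - overlap
    return [tokens[i:i + max_tokens]
            for i in range(0, max(len(tokens) - overlap, 1), step)]
```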


Integrations

OpenClaw Bridge — TypeScript plugin for auto-recall, auto-capture, and native memory blocking → Setup guide


Troubleshooting

ValidationError: vault_path field required

MEMSTACK_VAULT_PATH not set. Set it in .env or as an environment variable before starting the server.

Port 7777 already in use

Another process is on port 7777. Use --port 8080 or kill the process on that port.

Search returns 503

Embedding provider unavailable. Start Ollama (ollama serve, then ollama pull nomic-embed-text) or set MEMSTACK_EMBEDDING_PROVIDER=fastembed.

Smart write always adds (never deduplicates)

Ollama not running, so LLM consultation falls back to "add". Start Ollama and pull llama3.

More: docs/troubleshooting.md


Changelog

  • v1.4.4 — LLM memory synthesis for auto-capture writes; background consolidation with rewrite/enrich/merge/split
  • v1.4.3 — Merge decision in smart write pipeline; auto-ignore made opt-in; lower similarity thresholds
  • v1.4.2 — Caching layer; async REST/MCP handlers; deferred importance updates

Full changelog


License

MIT © 2026 Atrv-Shrn
