Persistent semantic memory for AI agents — local files, no cloud.
AI agents forget everything between sessions. MemStack gives them persistent memory through a REST API and MCP server — write once, search by meaning, store as plain Markdown. No cloud, no API keys, no database.
Linux/macOS only. Windows users can check docs/windows.md.
```bash
# 1. Install
uv sync --extra dev

# 2. Configure (copy example env and set vault path)
cp .env.example .env
# Then edit .env and set MEMSTACK_VAULT_PATH to your vault directory

# 3. Start
uv run memstack start

# 4. Verify
curl http://127.0.0.1:7777/health
```

Expected output:
```json
{
  "status": "healthy",
  "version": "1.4.4",
  "components": {
    "vault": "healthy",
    "lancedb": "healthy",
    "embeddings": "healthy",
    "mcp_port": 7778,
    "shared": "disabled",
    "watcher": "healthy"
  }
}
```

Every write goes through a deduplication pipeline: similarity check, then LLM consultation for ambiguous cases. The pipeline decides add (new), merge (append facts), update (replace), or ignore (duplicate). Memories are stored as Markdown files with YAML frontmatter — human-readable, version-controllable. Search uses vector similarity + keyword matching, reranked by importance with time-based decay. Each agent gets its own namespace; shared mode copies private writes to a shared pool.
- **Write pipeline:** similarity thresholds filter obvious matches, then Ollama decides add/merge/update/ignore for ambiguous cases; falls back to "add" if the LLM is unavailable.
- **Storage:** Markdown files with YAML frontmatter in a vault directory — human-readable, editable, version-controllable.
- **Search:** vector + keyword hybrid via LanceDB, with RRF fusion, importance-weighted reranking, and time-based decay.
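The threshold logic described above can be sketched roughly as follows. This is an illustrative sketch, not MemStack's actual code: `consult_llm` is a stand-in for the Ollama call, and the threshold values are the documented defaults.

```python
# Illustrative sketch of the smart-write decision flow (not the real code).
ADD_THRESHOLD = 0.25     # MEMSTACK_SIMILARITY_ADD_THRESHOLD
IGNORE_THRESHOLD = 0.85  # MEMSTACK_SIMILARITY_IGNORE_THRESHOLD
IGNORE_ENABLED = False   # MEMSTACK_SIMILARITY_IGNORE_ENABLED

def decide(similarity: float, consult_llm) -> str:
    """Return one of 'add', 'merge', 'update', 'ignore'."""
    if similarity < ADD_THRESHOLD:
        return "add"                    # clearly new: no close match exists
    if IGNORE_ENABLED and similarity >= IGNORE_THRESHOLD:
        return "ignore"                 # near-exact duplicate, auto-ignored
    try:
        return consult_llm(similarity)  # ambiguous zone: ask the LLM
    except ConnectionError:
        return "add"                    # LLM unavailable: fall back to add

print(decide(0.10, lambda s: "merge"))  # add
print(decide(0.50, lambda s: "merge"))  # merge
```

Note that with `MEMSTACK_SIMILARITY_IGNORE_ENABLED` at its default of `false`, even very high-similarity writes are routed to the LLM rather than silently dropped.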
- Persistent memory — text memories that survive across sessions, stored as Markdown files
- Smart deduplication — add, merge, update, or ignore based on similarity + LLM consultation
- Semantic search — find memories by meaning, not just keywords, with importance-weighted reranking
- Multi-agent namespaces — isolated memory per agent, plus optional shared namespace
- MCP server — 7 memory tools for AI agents, runs as a separate process on its own port
- Local-first and private — no cloud, no API keys, no telemetry; data stays on your machine
- Graceful degradation — CRUD works without embeddings; search returns 503 until provider available
- Background consolidation — LLM-driven review of stale memories (rewrite, enrich, merge, split)
Every memory is a Markdown file with YAML frontmatter. You can edit them in any text editor, put the vault under version control, or browse them with tools like Obsidian.
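Because the format is just YAML frontmatter plus a Markdown body, a file like the example below can be loaded with a few lines of Python. This is a sketch, not part of MemStack: it assumes PyYAML is installed, and `load_memory` is a hypothetical helper.

```python
# Sketch: parse a memory file by hand (assumes PyYAML; not MemStack code).
# A memory file is YAML frontmatter between two "---" lines, then the body.
import yaml

def load_memory(text: str) -> tuple[dict, str]:
    _, frontmatter, body = text.split("---\n", 2)
    return yaml.safe_load(frontmatter), body.strip()

sample = """---
agent: my-agent
importance: 0.9
tags:
- deploy
---
Deployed v2 to production on Saturday
"""
meta, body = load_memory(sample)
print(meta["agent"], meta["importance"])  # my-agent 0.9
```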
```markdown
---
agent: my-agent
created: "2026-05-06T04:39:16.923211+00:00"
id: deployed-v2-to-production-my-agent-2026-05-07
importance: 0.9
importance_updated: "2026-05-06T04:39:16.923211+00:00"
tags:
- deploy
- production
type: memory
updated: "2026-05-06T04:39:16.923211+00:00"
---

Deployed v2 to production on Saturday
```

| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/agents/{id}/memories` | Write a memory |
| `GET` | `/agents/{id}/memories` | List memories |
| `GET` | `/agents/{id}/memories/{mem_id}` | Read a memory |
| `DELETE` | `/agents/{id}/memories/{mem_id}` | Delete a memory |
| `GET` | `/agents/{id}/memories/search?q=` | Search by meaning |
| `GET` | `/agents/{id}/inject?q=` | Inject context |
| `GET` | `/agents/{id}/system-prompt` | Get agent system prompt |
| `GET` | `/health` | Server health |
```bash
# Write a memory
curl -X POST http://127.0.0.1:7777/agents/my-agent/memories \
  -H "Content-Type: application/json" \
  -d '{"content":"Deployed v2 to production on Saturday","tags":["deploy","production"],"importance":0.9}'
```

Response:

```json
{
  "decision": "added",
  "id": "deployed-v2-to-production-my-agent-2026-05-07",
  "similarity_score": null
}
```

Full API reference: docs/api.md
The MCP server exposes 7 memory operations as tools, running as a separate process on port 7778. Enabled by default — disable with MEMSTACK_MCP_ENABLED=false.
| Tool | Description |
|---|---|
| `memory_write` | Write a memory |
| `memory_search` | Search by meaning |
| `memory_read` | Read a single memory |
| `memory_delete` | Delete a memory |
| `memory_list` | List agent memories |
| `memory_inject` | Inject context |
| `memory_get_system_prompt` | Get system prompt block |
Required:

| Variable | Default | Description |
|---|---|---|
| `MEMSTACK_VAULT_PATH` | — | (Required) Path to vault directory |
Optional:

| Variable | Default | Description |
|---|---|---|
| `MEMSTACK_HOST` | `127.0.0.1` | Server bind address |
| `MEMSTACK_PORT` | `7777` | Server bind port |
| `MEMSTACK_IMPORTANCE_INITIAL_SCORE` | `0.5` | Default importance for new memories |
| `MEMSTACK_LOG_LEVEL` | `INFO` | Log level: DEBUG, INFO, WARNING, ERROR, CRITICAL |
| `MEMSTACK_LOG_FILE` | `~/.memstack/logs/memstack.log` | Log file path |
| `MEMSTACK_LOG_ROTATION` | `10 MB` | Log rotation size threshold |
| `MEMSTACK_LOG_RETENTION` | `7 days` | Log retention period |
| `MEMSTACK_STATE_FILE` | `~/.memstack/state.json` | PID state file path |
| `MEMSTACK_EMBEDDING_PROVIDER` | `ollama` | Embedding provider: ollama or fastembed |
| `MEMSTACK_EMBEDDING_MODEL` | `nomic-embed-text` | Embedding model name (provider-specific) |
| `MEMSTACK_EMBEDDING_AUTOFALLBACK` | `true` | Auto-fallback to fastembed if Ollama unavailable |
| `MEMSTACK_CHUNK_MAX_TOKENS` | `512` | Max tokens per chunk for semantic chunking |
| `MEMSTACK_CHUNK_OVERLAP_TOKENS` | `50` | Overlap tokens between adjacent chunks |
| `MEMSTACK_RRF_K` | `10` | RRF constant for hybrid search fusion |
| `MEMSTACK_IMPORTANCE_RERANK_WEIGHT` | `0.3` | Weight for importance score in reranking (0.0–1.0) |
| `MEMSTACK_INDEX_PATH` | `~/.memstack/index` | LanceDB index directory |
| `MEMSTACK_SIMILARITY_ADD_THRESHOLD` | `0.25` | Below this, always add as new memory |
| `MEMSTACK_SIMILARITY_IGNORE_THRESHOLD` | `0.85` | At or above this, treat as duplicate (if ignore enabled) |
| `MEMSTACK_SIMILARITY_IGNORE_ENABLED` | `false` | Auto-ignore duplicates above threshold (default: off — ambiguous cases go to LLM) |
| `MEMSTACK_IMPORTANCE_DECAY_HALFLIFE` | `7.0` | Half-life in days for importance decay |
| `MEMSTACK_IMPORTANCE_HIT_INCREMENT` | `0.05` | Importance bump on each retrieval |
| `MEMSTACK_LLM_MODEL` | `llama3` | Ollama model for smart write consultation |
| `MEMSTACK_LLM_HOST` | `http://localhost:11434` | Ollama host URL |
| `MEMSTACK_MCP_ENABLED` | `true` | Enable MCP server (separate process) |
| `MEMSTACK_MCP_PORT` | `7778` | Port for the standalone MCP server |
| `MEMSTACK_WATCHER_ENABLED` | `true` | Enable file watcher for automatic vault sync |
| `MEMSTACK_WATCHER_DEBOUNCE_MS` | `2000` | Debounce time in ms for file watcher events |
| `MEMSTACK_SHARED_MODE` | `false` | Enable shared mode — private writes also copied to shared namespace |
| `MEMSTACK_INJECTION_MIN_SCORE` | `0.3` | Minimum score for inject endpoint results |
| `MEMSTACK_INJECTION_TOP_N` | `5` | Max results returned by inject endpoint |
| `MEMSTACK_EMBEDDING_CACHE_SIZE` | `1024` | LRU cache size for embedding vectors |
| `MEMSTACK_FTS_REBUILD_INTERVAL` | `50` | Adds before FTS index rebuild |
| `MEMSTACK_SEARCH_CACHE_TTL` | `30` | TTL in seconds for search result cache |
| `MEMSTACK_VAULT_CACHE_SIZE` | `512` | LRU cache size for vault read/list operations |
| `MEMSTACK_SYNTHESIS_ENABLED` | `false` | Enable LLM synthesis for auto-capture writes |
| `MEMSTACK_SYNTHESIS_MODEL` | (empty) | Ollama model for synthesis (falls back to MEMSTACK_LLM_MODEL) |
| `MEMSTACK_CONSOLIDATION_ENABLED` | `false` | Enable background memory consolidation |
| `MEMSTACK_CONSOLIDATION_INTERVAL` | `3600` | Seconds between consolidation runs (min: 60) |
| `MEMSTACK_CONSOLIDATION_BATCH_SIZE` | `20` | Max memories per agent per run (1–100) |
| `MEMSTACK_CONSOLIDATION_MODEL` | (empty) | Ollama model for consolidation (falls back to MEMSTACK_LLM_MODEL) |
All variables can be set in a `.env` file. See `.env.example` for the full reference.
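To illustrate how a few of these knobs interact, here is a rough sketch (not MemStack's actual code) of the decay and fusion math implied by the defaults above; the function names are hypothetical.

```python
# Illustrative sketch of importance decay, RRF fusion, and reranking,
# using the documented defaults. Not MemStack's real implementation.
HALFLIFE_DAYS = 7.0   # MEMSTACK_IMPORTANCE_DECAY_HALFLIFE
RRF_K = 10            # MEMSTACK_RRF_K
RERANK_WEIGHT = 0.3   # MEMSTACK_IMPORTANCE_RERANK_WEIGHT

def decayed_importance(importance: float, age_days: float) -> float:
    """Exponential decay: importance halves every HALFLIFE_DAYS."""
    return importance * 0.5 ** (age_days / HALFLIFE_DAYS)

def rrf(rank_vector: int, rank_keyword: int) -> float:
    """Reciprocal-rank fusion of the vector and keyword result lists (ranks start at 1)."""
    return 1 / (RRF_K + rank_vector) + 1 / (RRF_K + rank_keyword)

def rerank_score(fused: float, importance: float, age_days: float) -> float:
    """Blend fused relevance with decayed importance."""
    return (1 - RERANK_WEIGHT) * fused + RERANK_WEIGHT * decayed_importance(importance, age_days)

# With the default 7-day half-life, a week-old memory written at
# importance 0.9 has decayed to 0.45:
print(round(decayed_importance(0.9, 7.0), 2))  # 0.45
```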
OpenClaw Bridge — TypeScript plugin for auto-recall, auto-capture, and native memory blocking → Setup guide
- `MEMSTACK_VAULT_PATH` not set: set it in `.env` or as an environment variable before starting the server.
- Another process is on port 7777: use `--port 8080` or kill the process on that port.
- Embedding provider unavailable: start Ollama (`ollama serve`, then `ollama pull nomic-embed-text`) or set `MEMSTACK_EMBEDDING_PROVIDER=fastembed`.
- Ollama not running, so LLM consultation falls back to "add": start Ollama and pull `llama3`.

More: docs/troubleshooting.md
- v1.4.4 — LLM memory synthesis for auto-capture writes; background consolidation with rewrite/enrich/merge/split
- v1.4.3 — Merge decision in smart write pipeline; auto-ignore made opt-in; lower similarity thresholds
- v1.4.2 — Caching layer; async REST/MCP handlers; deferred importance updates
MIT © 2026 Atrv-Shrn