🧠 MATHIR

⚠️ DISCLAIMER — Read before use

MATHIR has NOT undergone formal security testing or third-party audits. It was built by an independent developer for the AI community. Use at your own risk in production environments.

What this means for you:

🔒 No penetration testing has been performed on the daemon, MCP server, or HTTP endpoints

🛡️ The immunological tier is a research prototype, not a certified security layer

🐛 There may be undiscovered vulnerabilities in memory persistence, IPC, or daemon networking

⚡ The codebase has 173 tests but they cover functionality, not adversarial security

Why open source?

🌍 Built for the community, by one person

🤝 Feel free to use, modify, fork, and improve

📝 PRs and security reports welcome — please open an issue or pull request

💡 If you find a vulnerability, please report it privately first if possible

Recommended for production:

Run MATHIR behind a firewall

Don't expose the daemon (port 7338) directly to the internet

Validate all inputs at your application layer

Monitor for unusual memory patterns

Not recommended for:

❌ Handling sensitive PII without an additional encryption layer

❌ Mission-critical systems without a fallback

❌ Production AI agents without sandboxing

License: MIT — do what you want, but you carry the risk.

🧠 MATHIR

Memory-Augmented Tensor Hybrid with Intelligent Routing

The first cognitive memory layer for LLMs that actually thinks — promotes, forgets, consolidates, and links.

🆕 v8.5.0 — FastMCP rewrite + auto-injection. MCP server rewritten using FastMCP 3.4.2 (19 tools). Memories auto-injected into agent system prompts via plugin. Unified Flask+Waitress server (single process, single port). Direct DB access, no daemon bridge.

v8.4.2 — Immunological is now a real 5th tier. The architecture (working, episodic, semantic, procedural) is extended with a first-class immunological tier for prompt-injection detection and input anomaly scoring. All 17 MCP tools gain tier-aware routing. 173/173 tests pass.

v8.4.0 — Living memory, not a write-only disk. MATHIR now ships a full Ebbinghaus forgetting curve, tier promotion (working → episodic → semantic → procedural), semantic consolidation (auto-merge near-duplicates), and a link graph (spreading activation à la Collins & Loftus 1975). Memories that get recalled grow stronger; memories that don't, decay and archive. 7 new MCP tools. 173/173 tests pass.

🧭 Project Origin · 🔥 5 Problems Solved · 🆕 What's new in 8.4 · 🔌 MCP Plug & Play · 📖 The Story · ⚡ Quick Start · 🏗️ Architecture · 🆚 vs Alternatives

🧭 Project Origin — 2 years, 1 question

This is the story behind MATHIR. It's also my end-of-study project.

The original question (2024)

Watching modern cars — with their dozens of sensors, cameras, lidars, ultrasonic arrays — I asked myself a simple question:

Can they actually navigate an unknown environment?

Not a highway with lane markings. Not a pre-mapped city. A place they've never seen, where the rules change every meter.

The 2D simulator (Pygame)

My first attempt to answer that question was a 2D simulator in Pygame. I knew it had limits — no real lidar, no ROS, no real sensor noise, no SLAM. But the question wasn't about precision. The question was more fundamental:

Is a car really autonomous and intelligent?

The answer was a clear NO. A car following pre-programmed rules in a perfect simulation isn't intelligent — it's scripted. True autonomy requires the ability to learn, remember, and adapt across situations it's never seen before.

MATHIR's original purpose

That's where MATHIR started. Its first goal was to be a long-term memory for an autonomous agent — initially as a competitor to LSTM-based memory (you can still see the LSTM-era code in _deprecated/ for historical reference). The idea was simple:

An AI can't be intelligent if it can't remember. Every session starts from zero — that's amnesia, not intelligence.

2 years later: LSTM is dead, the question evolved

LSTM-based memory aged out in 2 years. The AI landscape shifted toward transformers and retrieval-augmented generation. So I refactored MATHIR to use vector search, semantic consolidation, link graphs, and Ebbinghaus forgetting — the cognitive architecture you see today.

But the deeper question stayed:

Is AI itself intelligent? Or just better at looking intelligent?

The next step: a real 3D RC car

MATHIR has been validated in software (173/173 tests, 5-tier architecture, plug-and-play MCP). The next step is to build a 3D-printed RC car and test MATHIR as its actual memory layer in a real autonomous-driving scenario — physical sensors, real noise, real unknowns.

That's the validation I can't fake with a Pygame simulation. Stay tuned.

🔥 5 real-world problems MATHIR solves

1. Medical AI — "We've never seen this disease before"

A diagnostic model trained on 10,000 cases works great — until a rare disease appears that wasn't in the training data. Today's solution? Retrain the entire model. Expensive. Slow. Sometimes impossible with limited data.

With MATHIR: The rare case is stored as an episodic memory. Next time a similar patient walks in, MATHIR recalls it instantly. No retraining. The model learns from experience, like a doctor does.

2. Chat sessions — "Sorry, who are you?"

You spend 2 hours explaining your project to ChatGPT. Next day, new chat. You explain everything again. After 7 sessions, you've repeated yourself 7 times. Your context lives in 7 separate boxes that never talk to each other.

With MATHIR: Your context persists across sessions, across tools, across time. Switch from Claude to Gemini to local Llama — MATHIR remembers. You never explain the same thing twice.

3. Autonomous driving — "The sensor just died"

LIDAR, cameras, ROS2, HD maps — today's self-driving stack is impressive. But what happens when a sensor fails?

Camera is blinded by sun glare → no visual data
LIDAR gets covered in mud → no depth perception
GPS loses signal in a tunnel → no position
Radar picks up ghost objects → false positives

In that moment, the car is blind. It has no memory of what happened 5 seconds ago, 5 minutes ago, or on this exact road last week.

With MATHIR: The car doesn't just see — it remembers. "Last time I was here, there was a speed bump at this GPS coordinate." "This pattern of cones meant a lane merge 200m ahead." When sensors fail, memory fills the gap. Like a human driver who's been on that road before.

4. Fine-tuning — "My data is a mess"

You want to fine-tune a model, but your data is scattered across Notion, Slack, email, and 15 different documents. Nothing is classified. Nothing is in the right format. You spend weeks just preparing data before any training starts.

With MATHIR: You feed raw knowledge directly. MATHIR auto-classifies, deduplicates, links related concepts, and organizes everything into 5 cognitive tiers. Your data is ready for fine-tuning as you add it — not after weeks of cleanup.

5. Knowledge drift — "Is this still accurate?"

That API endpoint you documented 6 months ago? It changed. But your old notes still say the old URL. Your team follows outdated instructions. Nobody knows which version is current.

With MATHIR: Memories decay when unused. When an API changes, the old memory fades and the new one takes over. MATHIR self-maintains its knowledge — no human cleanup needed.

🆕 What's new in v8.5.0 — FastMCP + Auto-Injection

MATHIR v8.5.0 is a major rewrite. The hand-rolled JSON-RPC MCP server is replaced by FastMCP 3.4.2 (used by 70% of MCP servers in the wild). Memories are now auto-injected into agent system prompts — no manual recall needed.

Key changes

FastMCP 3.4.2 — 19 MCP tools, stdio transport, battle-tested
Auto-injection plugin — memories injected at session start + during session
Unified server — single process, single port (7338), Flask + Waitress
Direct DB access — no HTTP daemon bridge for core operations
Embedder pre-warmed — 25-30s first load, then cached in memory
Portable config — no OpenCode hardcodes in templates

19 MCP tools

Category	Tools
Auto-injection	`memory_session_start`, `memory_context`
Basic CRUD	`memory_save`, `memory_recall`, `memory_smart_search`, `memory_hybrid_search`, `memory_delete`, `memory_stats`
Lifecycle	`memory_promote`, `memory_auto_promote`, `memory_decay`, `memory_consolidate`, `memory_link`, `memory_get_links`, `memory_build_links`
Other	`memory_export`, `memory_audit`, `memory_sessions`, `memory_dashboard`

🆕 What's new in v8.4.0 — Living memory

MATHIR v8.4.0 closes the gap between "memory that stores" and "memory that thinks". Every other memory layer for LLMs is a write-only disk: you save, you recall, and that's it. MATHIR is the first that actually manages its own memory lifecycle.

Your brain doesn't keep everything — and neither does MATHIR

Think about how your own memory works. You don't remember every breakfast you've ever had. Your brain quietly discards the boring stuff and keeps what matters. When you solve the same problem twice, the second time feels easier — because the memory got stronger. And when you learn something new, it connects to things you already know, forming a web of associations.

MATHIR does the same thing. Here's how:

Your brain	MATHIR	What happens
🩷 Focus — you hold a few things in mind right now	`working_memory`	Scratchpad for the current session. Fades fast.
🩵 Autobiography — you remember what happened yesterday	`episodic`	Events: bugs fixed, decisions made, sessions completed.
🟩 Knowledge — you know that water boils at 100°C	`semantic`	Stable facts that apply everywhere, not tied to one event.
🟨 Muscle memory — you don't think about how to ride a bike	`procedural`	Recipes, runbooks, how-to guides. Automatized.
🟥 Immune system — your body rejects what's dangerous	`immunological`	Anomalies, prompt injections, suspicious patterns.
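To make the table concrete, here is a minimal sketch of the tiers as a promotion ladder. The order working → episodic → semantic → procedural comes from the v8.4.0 release notes; the `Tier` enum, the `next_tier` helper, and the choice to keep `immunological` off the ladder are illustrative assumptions, not MATHIR's actual API.

```python
from enum import Enum
from typing import Optional


class Tier(Enum):
    # Values mirror the tier names in the table above.
    WORKING = "working_memory"
    EPISODIC = "episodic"
    SEMANTIC = "semantic"
    PROCEDURAL = "procedural"
    IMMUNOLOGICAL = "immunological"


# Promotion order from the release notes. The immunological tier is assumed
# to sit outside the ladder (hypothetical design choice for this sketch).
PROMOTION_LADDER = [Tier.WORKING, Tier.EPISODIC, Tier.SEMANTIC, Tier.PROCEDURAL]


def next_tier(current: Tier) -> Optional[Tier]:
    """Return the tier one step up the ladder, or None if there is none."""
    if current not in PROMOTION_LADDER:
        return None
    index = PROMOTION_LADDER.index(current)
    if index + 1 < len(PROMOTION_LADDER):
        return PROMOTION_LADDER[index + 1]
    return None


print(next_tier(Tier.WORKING))     # Tier.EPISODIC
print(next_tier(Tier.PROCEDURAL))  # None
```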

The 4 things MATHIR now does

1. Memories grow stronger when you use them 🧠 Every time you recall a memory, it gets a little more stable. Memories that are recalled often get promoted to a higher tier — like turning a casual fact into deep knowledge. Forgotten memories slowly fade and get archived.

2. Duplicates merge automatically 🧹 Ever saved the same note three times? MATHIR finds near-duplicates (cosine similarity > 0.95) and merges them into one canonical memory — no more noise.

3. Related memories link together 🔗 MATHIR builds a web of associations between memories. When you recall one, the system follows links to surface related context — like how thinking about "coffee" makes you think about "morning routine" and "focus".

4. Stale memories decay 📉 Memories you never recall slowly lose stability, following Ebbinghaus's forgetting curve. After 30 days of no recall, they start decaying 5% every 30 days. When stability drops below 0.05, they're archived — not deleted, just out of the way.
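The decay rule in point 4 can be sketched in a few lines. This is a simplified reading of the numbers quoted above (5% per 30 days, archive below 0.05), assuming the loss compounds and is applied once per full 30-day period without recall. The function names are hypothetical and MATHIR's real implementation may differ.

```python
ARCHIVE_THRESHOLD = 0.05  # below this, a memory is archived (from the text)
DECAY_PER_PERIOD = 0.05   # 5% lost per period (from the text)
PERIOD_DAYS = 30          # one period = 30 days without recall


def decayed_stability(stability: float, days_since_recall: int) -> float:
    """Apply one 5% step per full 30-day period (assumed to compound)."""
    periods = days_since_recall // PERIOD_DAYS
    return stability * (1.0 - DECAY_PER_PERIOD) ** periods


def should_archive(stability: float) -> bool:
    """Archived memories are kept, just moved out of the active set."""
    return stability < ARCHIVE_THRESHOLD


print(decayed_stability(1.0, 29))            # 1.0 (still inside the first 30 days)
print(round(decayed_stability(1.0, 90), 4))  # 0.8574 (three periods)
print(should_archive(0.04))                  # True
```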

Before vs after

# BEFORE v8.4.0 — passive storage
memory_save("the API uses /v2/chat/completions")
memory_save("the API uses /v2/chat/completions")  # duplicate
memory_save("the API uses /v2/chat/completions")  # duplicate
# → 3 memories, all the same, no ranking, no decay, no links

# AFTER v8.4.0 — living memory
memory_save(...)
memory_recall(query)                # auto-touches: stability↑, recall_count↑
memory_auto_promote()               # working → episodic if mature enough
memory_decay()                      # archive stale memories
memory_consolidate(dry_run=False)   # merge 3 duplicates into 1 canonical
memory_build_links(threshold=0.7)   # link related concepts
# → 1 canonical memory + N linked memories, ranked, aging, connected
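As a rough illustration of what the consolidation step does, the sketch below merges near-duplicates greedily by cosine similarity. The 0.95 threshold is the one quoted earlier; the `consolidate` function, the toy 2-dimensional vectors, and the "first one wins" merge policy are assumptions made for this example only.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def consolidate(memories: List[Tuple[str, Sequence[float]]],
                threshold: float = 0.95) -> List[Dict]:
    """Greedy merge: the first memory of each near-duplicate group is canonical."""
    canonical: List[Dict] = []
    for text, vector in memories:
        for entry in canonical:
            if cosine(vector, entry["vector"]) > threshold:
                entry["merged"] += 1  # fold the duplicate into the canonical entry
                break
        else:
            canonical.append({"text": text, "vector": vector, "merged": 1})
    return canonical


memories = [
    ("the API uses /v2/chat/completions", [1.0, 0.0]),
    ("the API uses /v2/chat/completions", [1.0, 0.0]),
    ("the API uses /v2/chat/completions", [1.0, 0.0]),
    ("user prefers dark mode", [0.0, 1.0]),
]
result = consolidate(memories)
print(len(result), result[0]["merged"])  # 2 3
```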

Live verification (2026-06-23)

stats: 29 memories, by_tier={episodic:14, semantic:9, working:6}
promote: episodic → semantic (force=True)
recall: 3 results, touched=3
build_links: 246 links created from 29 memories (threshold=0.5)
consolidate: 3 candidates at threshold 0.9 (dry_run)

7 new MCP tools, 7 new daemon RPC methods, 26 new pytest tests (173/173 total).
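For scale: 29 memories allow at most 29 × 28 / 2 = 406 undirected pairs, so if each link is counted once, 246 links at threshold 0.5 connect about 61% of all pairs. The sketch below shows one plausible way to build such a link graph and walk it with spreading activation. The function names, the `decay` factor, and the `min_activation` cutoff are hypothetical and not taken from the MATHIR codebase.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_links(vectors: Dict[str, List[float]],
                threshold: float = 0.7) -> Dict[str, Dict[str, float]]:
    """Link every pair of memories whose similarity reaches the threshold."""
    links: Dict[str, Dict[str, float]] = defaultdict(dict)
    for a, b in combinations(vectors, 2):
        similarity = cosine(vectors[a], vectors[b])
        if similarity >= threshold:
            links[a][b] = similarity
            links[b][a] = similarity
    return links


def spread(links: Dict[str, Dict[str, float]], start: str,
           decay: float = 0.5, min_activation: float = 0.1) -> Dict[str, float]:
    """Spread activation outward from one memory, fading with every hop."""
    activation = {start: 1.0}
    frontier = [start]
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor, weight in links.get(node, {}).items():
                value = activation[node] * weight * decay
                if value >= min_activation and value > activation.get(neighbor, 0.0):
                    activation[neighbor] = value
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return activation


vectors = {
    "coffee": [1.0, 0.0, 0.0],
    "morning routine": [0.9, 0.4, 0.0],
    "tax deadline": [0.0, 0.0, 1.0],
}
activated = spread(build_links(vectors), "coffee")
print(sorted(activated))  # ['coffee', 'morning routine']
```

Unrelated memories never receive activation, which is why "tax deadline" does not surface when "coffee" is recalled.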

The story that hurts

🧑‍💻 The developer

Monday morning. You open Claude. You tell it:
  "My name is Thomas, I'm building a RAG with Python, FastAPI + Postgres."
Claude says: "Got it, I'll remember that."

3 months later. You switch to Cursor + Llama 3.1.
  Llama: "Hi! Who are you?"
  ↑ Everything Claude "remembered"? Gone. Vendor-locked.

You try Mem0. $79/month. Not open source. You can't audit what it does with your data.
You want to run it offline on your Jetson. "We'll get back to you with an enterprise quote."
You want to detect prompt injection. "That's not what we do."

6 months of memory. Wiped in 3 seconds. Because your memory doesn't belong to you.

🚗 The autonomous vehicle

2:32 PM. The Tesla learns that a yellow pedestrian marker at a crosswalk
  = slow down. Pattern stored in local memory.

2:33 PM. OTA restart. Memory is wiped.
  The model no longer "remembers" the pattern.
  Next time, it won't slow down.

2:34 PM. A truck ahead sends corrupted data on the CAN bus.
  The sensor reports 0 km/h while actually doing 80.
  No system flags the anomaly. The vehicle accelerates.

2:35 PM. 80 km/h. Zero detection. Zero alerts. Zero memory.

A car that doesn't remember = a car that doesn't understand.

What MATHIR changes

✅ Memory that follows you everywhere — SQLite local, MIT, zero vendor lock-in.
✅ Memory that improves — +37.8% online learning, not static facts.
✅ Anomaly detected in <1ms — immunological tier, AUC = 1.0, zero false positives.
✅ Runs on edge — 240 MB VRAM, Jetson Orin ✅, Raspberry Pi ⚠️, zero cloud.

🔌 MCP Plug & Play — 2 lines

One server, 19 tools, same memory. Connect any LLM in 2 steps:

# 1. Start the daemon (once)
python -m mathir_mcp
# or use the console script:
mathir-server

// 2. Add to your MCP tool (opencode.json, claude_desktop_config, etc.)
{
  "mcp": {
    "mathir": {
      "command": "mathir-mcp"
    }
  }
}

That's it. memory_save, memory_recall, memory_smart_search, memory_hybrid_search — available in all your tools.

v9.0 Console Scripts (universal, IDE-agnostic)

Command	What it does
`mathir-mcp`	MCP stdio server (19 tools, 2 prompts) — for any MCP client
`mathir-server`	HTTP unified server (port 7338, optional auth on LAN)
`mathir-client`	CLI client: `mathir-client recall "my query"`
`mathir-dashboard`	Stats dashboard (port 7420)
`mathir-migrate`	One-shot legacy→new schema migration (`--dry-run` / `--apply`)
`mathir-brain`	Orchestrator (starts server + watchdog + proxy)

Install: pip install -e ./mathir_mcp

Tool	MCP	Config
OpenCode	✅ Native	`opencode.json` → `mcp`
Claude Code	✅ Native	`claude_desktop_config.json`
Kilo Code	✅ Native	Settings → MCP → Add Server
MiMo Code	✅ Native	Config `mcp` section
Cursor / Windsurf	✅ Native	Any MCP-compatible client works

Supports: OpenAI · Anthropic · Gemini · Groq · Ollama · llama_cpp · any LLM.

✅ Fully tested on (verified 2026-06-23)

🤖 AI Coding Tools (MCP clients)

Tool	Status	Notes
OpenCode	✅ Verified	Native MCP support, plug-and-play
MiMo Code	✅ Verified	Native MCP support, config mcp section
Zcode	✅ Verified	Native MCP support, custom config

🧠 LLMs (backend models via the same MATHIR memory)

Model	Recall speed	Notes
MiMo (basic)	⚡ Best	Best recall quality, recommended for memory tasks
MiMo Pro	⚡ Excellent	Best + extended context
MiniMax 2.7	⚡ Excellent	Fast recall, low latency
MiniMax M3	⚡ Excellent	Newest, best multilingual support
GLM 5.1	⚡ Fast	Great for reasoning + memory combo
Nemotron	⚡ Fast	NVIDIA, robust on edge
GPT-OSS	⚡ Fast	OpenAI open-source variant

Same MATHIR memory across all LLMs. Switch the backend, the memory stays.

✅ Cold-boot auto-start (v8.4.2 — verified 2026-06-24)

As of v8.4.2, true cold-boot auto-start is implemented on all platforms via:

Windows: VBS launcher in shell:startup (no admin needed)
macOS: ~/Library/LaunchAgents/com.mathir.daemon.plist
Linux: ~/.config/systemd/user/mathir-daemon.service with lingering

Run once: python install_smart.py --autostart-only — daemon will then start silently on every login/reboot before any agent even boots. See the ⚠️ After PC Reboot section for full details.

Scenario	v8.4.1 (before)	v8.4.2 (after)
Daemon running, recall called	✅ Works	✅ Works
Daemon crashed, recall called	✅ Auto-restart	✅ Auto-restart
PC rebooted, autostart configured	❌ Silent failure	✅ Daemon up before agent boots
PC rebooted, no autostart yet	❌ Silent failure	⚠️ Manual start required (see Option 1/2)
Daemon down, Python API used	✅ Works	✅ Works (the only truly automatic path)

💡 Why this matters

One memory, any LLM — save with Claude, recall with MiMo, continue with GPT. No vendor lock-in.
Fast recall — memory lookup is in milliseconds thanks to the persistent daemon (port 7338).
Python fallback when MCP fails — if the MCP layer fails for any reason, the Python API (from mathir_lib import ...) is always there as a reliable backup.
Cold-boot auto-start — daemon is up and warm before your first prompt on a fresh boot (after one-time setup).
Watchdog (within session) — daemon auto-restarts on crash mid-session, embeddings reload, connections retry.

🛠️ Installation recommendation

⚠️ Highly recommended: Use an AI agent (OpenCode, Claude Code, MiMo) to install the MCP server for you.

Manual config is error-prone because every tool has a different config format:

OpenCode: opencode.json (mcp section)

Claude Desktop: claude_desktop_config.json (mcpServers)

MiMo: custom config format

Zcode: YAML

Just say to your agent: "Install the MATHIR MCP server from https://github.com/sil3d/MATHIR". The mathir_inject.py and mathir_sync.py tools will handle the rest automatically.

📦 Cross-platform install scripts

For developers who prefer manual install, two smart installers are shipped in mathir_mcp/bin/:

Platform	Script	What it does
Windows	`install.bat` (wraps `install_smart.py`)	Auto-detects 40+ coding agents, injects MCP config + system prompt
Linux / macOS	`install.sh` (wraps `install_smart.py`)	Same as Windows but bash

# Linux/Mac
git clone https://github.com/sil3d/MATHIR.git
cd MATHIR/mathir_mcp/bin
./install.sh

# Windows
git clone https://github.com/sil3d/MATHIR.git
cd MATHIR\mathir_mcp\bin
install.bat

📖 Platform-specific install guides

For complete step-by-step instructions (including auto-start setup):

🪟 INSTALL/INSTALL_WINDOWS.md — Task Scheduler, mathir.bat, troubleshooting
🐧 INSTALL/INSTALL_LINUX.md — systemd user unit, loginctl enable-linger, Apple Silicon note
🍎 INSTALL/INSTALL_MACOS.md — launchd LaunchAgent, Homebrew fix, Apple Silicon

🚀 Auto-start the daemon after reboot

Once installed, run python install_smart.py --autostart-only to enable cold-boot auto-start, or start the daemon manually after each reboot (see ⚠️ After PC Reboot below for details).

🔧 Dynamic Injection & Sync (v8.4.1)

Two new dev-loop tools in ~/.config/opencode/bin/ (or mathir_mcp/mathir_lib/ in the source repo) automate the MATHIR injection block across all your AI config files.

Tool	What it does	When to run
`mathir_inject.py`	Reads `<target>/_MATHIR_INJECT.md` and injects the block into every `.md` of that target. Idempotent.	After creating/editing a template, or a new agent/command/skill
`mathir_sync.py`	Copies new files from `<repo_root>/mathir_mcp/` into your `~/.config/opencode/`. Safe by default — never overwrites.	After dev work in the source repo

# 5 targets: agents, commands, skills, skills-global, docs (+ "all")
python bin/mathir_inject.py --apply --target all         # inject everything
python bin/mathir_inject.py --check --target all        # see what would change
python bin/mathir_inject.py --apply --file agents/foo.md # inject one file
python bin/mathir_inject.py --list                      # show targets/templates
python bin/mathir_inject.py --explain                   # how it works

# Sync source -> config (NEW files only by default)
python bin/mathir_sync.py                               # dry-run
python bin/mathir_sync.py --force                       # apply
python bin/mathir_sync.py --only modules                # Python files only
python bin/mathir_sync.py --update-existing             # overwrite (CAREFUL)
python bin/mathir_sync.py --explain                     # how it works

5 target templates in mathir_mcp/opencode/<target>/_MATHIR_INJECT.md — edit the template once, re-inject everywhere. Pair: sync.py --force && inject.py --apply --target all.
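"Idempotent" here means that running the injector twice leaves a file exactly as it was after the first run. A minimal sketch of that property, assuming the block is delimited by marker comments: the `START`/`END` markers and the `inject_block` function are hypothetical, and the real `mathir_inject.py` may use a different mechanism.

```python
START = "<!-- MATHIR:BEGIN -->"  # hypothetical marker, not MATHIR's real one
END = "<!-- MATHIR:END -->"      # hypothetical marker, not MATHIR's real one


def inject_block(document: str, block: str) -> str:
    """Insert the block, or replace it in place if it is already present."""
    wrapped = f"{START}\n{block.strip()}\n{END}"
    start = document.find(START)
    end = document.find(END)
    if start != -1 and end > start:
        # A previous injection exists: swap it out instead of appending again.
        return document[:start] + wrapped + document[end + len(END):]
    separator = "" if (not document or document.endswith("\n")) else "\n"
    return f"{document}{separator}\n{wrapped}\n"


once = inject_block("# My agent\n", "Call memory_recall before answering.")
twice = inject_block(once, "Call memory_recall before answering.")
print(once == twice)  # True
```

Editing the template and re-running replaces the old block rather than stacking a second copy, which is what makes "edit once, re-inject everywhere" safe.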

🆚 vs Alternatives (honest 2026 comparison)

Researched against Mem0, Letta, Zep, Cognee, LangMem, Microsoft GraphRAG, Supermemory, Recall.it, ChatGPT Memory, Claude Projects, Gemini memories, Microsoft Copilot Work IQ. Sources at the bottom of this section.

Product	Architecture	OSS?	LLM-agnostic?	Edge?	Anomaly detection	Cost
🧠 MATHIR	5 cognitive tiers + KL router + Mahalanobis	✅ MIT	✅ Any	✅ ~500 MB GPU / 80 MB CPU	✅ AUC = 1.0	Free
Mem0	Vector + rerankers + LLM compression	⚠️ SDK only	✅ Any	❌ Cloud	❌	Free → $249/mo
Letta	Core/archival/recall tiers	✅ Apache 2.0	✅ Any	⚠️ Heavy	❌	Free (BYO infra)
Zep	Temporal knowledge graph	⚠️ Graphiti OSS	✅ Any	❌ Cloud	❌	$1,250/yr → Custom
Cognee	Self-hosted KG + vector	✅ Apache 2.0	✅ Any	⚠️ Heavy	❌	$35/mo → Custom
LangMem	Library on LangGraph store	✅ MIT	✅ Via LangChain	⚠️ DIY	❌	Free (BYO infra)
Microsoft GraphRAG	KG + community detection	✅ MIT	✅ Any	⚠️ DIY	❌	Free (BYO infra)
Supermemory	Custom vector graph	❌ Self-host binary	✅ Any	⚠️ Self-host	❌	$19 → $399/mo
Recall.it	Personal knowledge graph	❌ Closed SaaS	⚠️ Max tier only	❌	❌	Free → $38/mo
ChatGPT Memory (vendor)	Background "Dreaming" synthesis	❌ Closed	❌ OpenAI only	❌ Cloud	❌	$20/mo+
Claude Projects (vendor)	User-curated KB per project	❌ Closed	❌ Anthropic only	❌ Cloud	❌	$20/mo+
Gemini memories (vendor)	Implied semantic + chat history	❌ Closed	❌ Google only	❌ Cloud	❌	Free → $20/mo
Microsoft Work IQ (vendor)	Semantic index + personal memory	❌ Closed	❌ Microsoft 365 only	❌ Cloud	❌	M365 sub

What this table actually says

3 things only MATHIR does, as of June 2026:

Anomaly detection on inputs (immunological tier, AUC = 1.0). No competitor in this list has it.
Edge deployment in ~500 MB VRAM. All others need cloud or heavy local infra. Jetson Orin ✅ (full CUDA), Raspberry Pi ⚠️ (CPU fallback with ONNX INT8).
MIT-licensed, fully open source, no managed service. The only true OSS option with a 5-tier cognitive architecture.
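The comparison table lists Mahalanobis distance as part of MATHIR's architecture. As a rough illustration of how such an anomaly score behaves, the sketch below fits a baseline from normal samples and scores new inputs against it. It assumes a diagonal covariance (each dimension treated independently), which is a simplification of the full Mahalanobis distance, and the function names and toy data are hypothetical.

```python
import math
from statistics import mean, pvariance
from typing import List, Sequence, Tuple


def fit_baseline(samples: List[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-dimension mean and variance of known-normal inputs."""
    dimensions = list(zip(*samples))
    means = [mean(d) for d in dimensions]
    # A small floor avoids division by zero for constant dimensions.
    variances = [max(pvariance(d), 1e-9) for d in dimensions]
    return means, variances


def anomaly_score(x: Sequence[float], means: Sequence[float],
                  variances: Sequence[float]) -> float:
    """Mahalanobis distance under a diagonal-covariance assumption."""
    return math.sqrt(sum((xi - m) ** 2 / v
                         for xi, m, v in zip(x, means, variances)))


normal_inputs = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [1.0, 2.0]]
means, variances = fit_baseline(normal_inputs)

print(round(anomaly_score([1.1, 2.1], means, variances), 2))  # 1.0
print(round(anomaly_score([3.0, 0.0], means, variances), 2))  # 20.0
```

An input close to the baseline scores about one standard deviation away, while the outlier scores twenty. Where to place the alert threshold between the two is a tuning decision this sketch does not address.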

Things others do that MATHIR doesn't (honesty):

Enterprise SSO, SOC 2, HIPAA, audit logs → Zep, Mem0 Pro, Supermemory Enterprise have these. MATHIR doesn't.
Managed hosted service → Mem0, Zep, Cognee, Supermemory all offer this. MATHIR is self-host only.
Temporal fact validity (modeling "this preference is no longer valid") → Zep's specialty.
1M+ tokens of pre-curated memory → Mem0's LoCoMo benchmark wins.

Where MATHIR is competitive:

GPU embedding speed → paraphrase-multilingual-MiniLM-L12-v2 on CUDA fp16: ~104ms/sent (384d, 50+ languages, 239MB VRAM)
Pure retrieval quality → MATHIR = FAISS dense-only (0.7441 nDCG@10 on BEIR SciFact, equal to SOTA)
Cross-provider → 11/12 wins across 3 different LLM architectures
Cross-lingual → UNIBRI finds English content from French queries
Cost → free, vs $20–$400/mo for managed alternatives

Sources

Mem0 pricing & research: mem0.ai/pricing, mem0.ai/research
Letta docs: docs.letta.com, letta.com/blog/continual-learning
Zep docs: getzep.com, help.getzep.com
Cognee: cognee.ai, github.com/topoteretes/cognee
LangMem: langchain-ai.github.io/langmem
Microsoft GraphRAG: microsoft.github.io/graphrag/, arXiv 2404.16130
Supermemory: supermemory.ai
Recall: recall.it
ChatGPT Memory: openai.com/index/chatgpt-memory-dreaming
Claude Projects: anthropic.com/news/projects, anthropic.com/news/claude-fable-5-mythos-5
Microsoft Work IQ: microsoft.com/.../work-iq-apis
Magic AI 100M tokens: magic.dev/blog/100m-token-context-windows
Chroma Context Rot: research.trychroma.com/context-rot
Breunig, "How Long Contexts Fail": dbreunig.com

🧩 Embedding Providers (NEW: ONNX support)

MATHIR v8.x+ ships with 6 embedding providers. The default is now paraphrase-multilingual-MiniLM-L12-v2 — 384d, 50+ languages, low VRAM (239MB fp16).

Provider comparison

Provider	Model	Dim	Speed (single)	Size	Quality	Local	Cost
🆕 HuggingFace (GPU) — DEFAULT	`paraphrase-multilingual-MiniLM-L12-v2`	384	~104ms/sent	471 MB (239 fp16)	🟢 Multilingual 50+	✅	Free
HuggingFace (GPU)	`BAAI/bge-large-en-v1.5`	1024	25 ms	1.3 GB	🟢 High (EN)	✅	Free
🆕 ONNX	`Octen-Embedding-0.6B-INT8`	1024	18.8 ms	5.2 MB graph + 1.06 GB weights	🟢 High	✅	Free
HuggingFace	`all-MiniLM-L6-v2`	384	5.2 ms	80 MB	🟡 Medium (EN)	✅	Free
HuggingFace	`Qwen/Qwen2.5-7B-Instruct`	3584	10–30 ms (GPU)	14 GB	🟢 High	✅	Free
Ollama	`llama3.2:3b`	2048	30–80 ms	2 GB	🟢 High	✅	Free
OpenAI	`text-embedding-3-small`	1536	80–200 ms	Cloud	🟢 High	❌	$0.02/1M

ONNX Provider (v8.4.0)

# In v8.4.0 the v7 `mathir_lib.providers.get_provider` was replaced by a
# dedicated OctenEmbedder class in mathir_mcp/mathir_lib/mathir_onnx_embedder.py
from mathir_lib.mathir_onnx_embedder import OctenEmbedder, get_onnx_embedder

# Quantized ONNX model (recommended for CPU + cross-language paraphrase)
embedder = OctenEmbedder(
    model_dir=r"C:\Users\So-i-learn-3D\.config\opencode\models\octen-int8",
    provider="CPUExecutionProvider",  # or "DmlExecutionProvider" for GPU
)

print(embedder.dim)                    # 1024

embeddings = embedder.encode(["Hello", "World"])
# Shape: (2, 1024), L2-normalized, ready for cosine similarity

ONNX vs HuggingFace benchmark (5 queries + 8 docs, RTX 4060)

Metric	ONNX (Octen INT8)	HuggingFace (MiniLM)	Ratio
Batch encode time	203 ms	27 ms	7.5×
Single query	18.8 ms	5.2 ms	3.6×
Embedding dim	1024	384	2.7×
Model size (ONNX graph file only; weights are a separate 1.06 GB .data file)	5.2 MB	80 MB	15×
Memory footprint	50 % of FP32	100 %	0.5×
Similarity range	[0.42, 0.98]	[-2.53, 34.34] *	—
L2-normalized	✅ Yes	❌ No	—

* MiniLM embeddings are not L2-normalized by default — cosine similarity requires manual normalization.
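The footnote is worth a small worked example. A raw dot product grows with vector length, which is how similarity values far outside [-1, 1] can appear. Dividing each vector by its L2 norm first turns the dot product into cosine similarity. The helper names below are illustrative, and plain Python lists stand in for real embeddings.

```python
import math
from typing import List, Sequence


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


a = [3.0, 4.0]
b = [6.0, 8.0]  # same direction as a, twice the length

print(dot(a, b))                                        # 50.0
print(round(dot(l2_normalize(a), l2_normalize(b)), 6))  # 1.0
```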

When to use ONNX vs HuggingFace

Use case	Recommended
Best quality multilingual embeddings	ONNX (Octen)
Smallest model footprint	ONNX (5.2 MB)
Fastest single query	HuggingFace (MiniLM)
1024-dim embeddings for FAISS/Pinecone	ONNX
384-dim for legacy systems	HuggingFace
Edge / Jetson Orin	ONNX (int8)
No GPU available	Both work; ONNX more compact

Download ONNX model

# Manual download (recommended — faster than pip)
# Create folder: C:\Users\So-i-learn-3D\.config\opencode\models\octen-int8\
# Download from https://huggingface.co/cstr/Octen-Embedding-0.6B-ONNX-INT8/resolve/main/
#   - model.int8.onnx (5.2 MB)
#   - model.int8.onnx.data (1.06 GB)
#   - tokenizer.json
#   - vocab.txt
#   - config.json

MCP Server

See the 🔌 MCP Plug & Play section at the top of the page.

🚀 Deployment Options

MATHIR supports multiple deployment targets. The embedding model you choose determines VRAM, speed, and platform compatibility.

Platform	Model	VRAM	Speed (recall)	Status
Desktop GPU (CUDA)	bge-large-en-v1.5 (1024d)	~500 MB	25 ms	✅ Recommended
Jetson Orin (CUDA)	bge-large-en-v1.5 (1024d)	~500 MB	~30 ms	✅ Supported
CPU only	bge-large-en-v1.5 (1024d)	0 MB	~200 ms	✅ Supported
Raspberry Pi	ONNX INT8 (1024d)	0 MB	~500 ms	⚠️ Experimental

Notes:

MATHIR internal memory (working_memory/episodic/semantic/procedural tiers + immunological anomaly bank) is ~60 KB regardless of platform — this is always true (Theorem 1, bounded capacity).
Embedding model VRAM varies by model: ~500 MB for bge-large on GPU, 0 MB for CPU-only ONNX.
Raspberry Pi requires CPU fallback — use ONNX INT8. The bge-large model (1024d) is too large for Pi-class ARM devices without GPU.
Jetson Orin has CUDA support and runs bge-large at near-desktop speeds.

⚡ Quick Start (30 seconds)

1. Install

git clone https://github.com/sil3d/MATHIR.git
cd MATHIR
pip install -e .

2. The smallest possible example

from mathir_dropin.simple import SimpleMemory   # zero dependencies (just SQLite FTS5)

memory = SimpleMemory(db_path="my_app.db")
memory.store("User asked about Python closures")
memory.store("Explained that closures capture enclosing-scope variables")
memory.store("User then asked about decorators")

results = memory.recall("Python functions", k=3)
# → ["User asked about Python closures", "Explained closures..."]

3. With HybridSearch (auto-scaling vector search)

from mathir_dropin.simple import SimpleMemory

# HybridSearch is automatic — just use SimpleMemory
memory = SimpleMemory(db_path="my_app.db")

# Store memories (auto-selects numpy for N < 5K)
for i in range(1000):
    memory.store(f"Memory item {i}: This is a test memory about topic {i % 10}")

# At N=5,000, auto-switches to USearch HNSW (1.37ms)
results = memory.recall("test topic", k=5)

# Memory-mapped index persists to disk — no RAM pressure
print(f"Index size: {memory.get_index_size()}")  # ~50 KB on disk
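The auto-scaling idea in the example above can be sketched as a size check in front of two search strategies. Only the 5,000-item switch point comes from the comments above; the `choose_backend` and `exact_top_k` names and the backend labels are hypothetical, and pure Python stands in for numpy to keep the sketch dependency-free.

```python
import math
from typing import List, Sequence

HNSW_THRESHOLD = 5_000  # switch point quoted in the example above


def choose_backend(n_items: int) -> str:
    """Exact search for small stores, approximate (HNSW) search for large ones."""
    return "numpy-exact" if n_items < HNSW_THRESHOLD else "usearch-hnsw"


def exact_top_k(query: Sequence[float],
                vectors: List[Sequence[float]], k: int) -> List[int]:
    """Brute-force cosine search: score every vector, return the k best indices."""
    def cosine(a: Sequence[float], b: Sequence[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    scored = [(cosine(query, vector), index) for index, vector in enumerate(vectors)]
    scored.sort(reverse=True)
    return [index for _, index in scored[:k]]


vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
print(choose_backend(len(vectors)))           # numpy-exact
print(exact_top_k([1.0, 0.0], vectors, k=2))  # [0, 2]
```

Exact search costs one similarity computation per stored vector, which is cheap for a few thousand items and is the usual reason to defer an approximate index until the store grows.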

4. Plug it into any LLM (3 lines)

import openai
from mathir_dropin.simple import SimpleMemory

memory = SimpleMemory(db_path="my_app.db")  # same store as in the examples above

def chat(user_message):
    # Retrieve stored context relevant to this message
    context = memory.search_context(user_message, k=5, last_n=3)
    response = openai.chat.completions.create(  # or anthropic, or local llama_cpp
        model="gpt-4",
        messages=[
            {"role": "system", "content": f"Relevant memories:\n{context}"},
            {"role": "user",   "content": user_message}
        ],
    )
    answer = response.choices[0].message.content
    memory.store(f"Q: {user_message} | A: {answer}")  # remember this exchange
    return answer

Works with any LLM — OpenAI, Anthropic, Gemini, Groq, Ollama, local 7B via llama_cpp, anything.

5. Or use the full V7 plugin (8 algorithms, 6 theorems)

from mathir_lib import MATHIRPluginV7

plugin = MATHIRPluginV7(embedding_dim=4096)
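# llm_embedding is the embedding produced by your own model; it is assumed here
# to match embedding_dim (the output below is annotated as [1, 4096]).
output = plugin.perceive(llm_embedding)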

print(output["enhanced_embedding"])  # [1, 4096]
print(output["router_weights"])      # 5-tier allocation: [0.4, 0.3, 0.2, 0.1, 0.0]
print(output["anomaly_score"])       # novelty detection (0.0–1.0)
print(output["episodic_context"])    # retrieved past experiences

⚠️ After PC Reboot — Auto-Start (v8.4.2+)

As of v8.4.2, cold-boot auto-start is supported on all three platforms (Windows, macOS, Linux). Without configuration, the daemon does NOT auto-start on boot and MCP tools will silently fail until you start it manually.

One-time setup (per platform)

Platform	Command	What it does
Windows	`python install_smart.py --autostart-only`	Creates a VBS launcher in `shell:startup` (no admin needed) + adds Task Scheduler fallback
macOS	`python install_smart.py --autostart-only`	Installs `~/Library/LaunchAgents/com.mathir.daemon.plist` (auto-loads on login)
Linux	`python install_smart.py --autostart-only`	Installs `~/.config/systemd/user/mathir-daemon.service` and enables it
RPi / Jetson	`bash install.sh --autostart-only`	Same as Linux, with `--user` systemd scope (no root needed)

Once set up, the daemon starts silently on every login/reboot, before any agent even boots. memory_recall works immediately on first call.

Quick start WITHOUT auto-start setup

If you haven't run the one-time setup:

Option 1 — One-click (recommended): Double-click: C:\Users\So-i-learn-3D\.config\opencode\bin\auto_start.bat

Option 2 — Command line:

python C:\Users\So-i-learn-3D\.config\opencode\bin\mathir_server.py

Verify the daemon is running

curl http://localhost:7338/health
# → {"status":"ok","model":"paraphrase-multilingual-MiniLM-L12-v2",...}

If you get "connection refused", the daemon is down — start it with Option 1 or 2 above.
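The same check can be done from Python before an agent starts calling memory tools. This sketch uses only the standard library and relies on the `/health` endpoint and the `"status":"ok"` field shown above; the `daemon_is_healthy` name and the timeout value are assumptions.

```python
import json
import urllib.error
import urllib.request


def daemon_is_healthy(url: str = "http://localhost:7338/health",
                      timeout: float = 2.0) -> bool:
    """Return True only if the daemon answers with a JSON body whose status is ok."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            payload = json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a body that is not valid JSON.
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


if __name__ == "__main__":
    if daemon_is_healthy():
        print("MATHIR daemon is up")
    else:
        print("MATHIR daemon is down: start it with Option 1 or 2")
```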

What is automatic (v8.4.2)

✅ Cold-boot auto-start (after one-time setup above)
✅ Daemon auto-restarts on crash (within session, via watchdog)
✅ Python fallback when MCP layer fails (no daemon required)
✅ Daemon reloads embedding model on connection drop
✅ MCP server survives daemon crash and reconnects automatically

What is still manual

❌ MCP prompt does NOT auto-inject at agent startup; it relies on the agents/*.md files already containing the _MATHIR_INJECT.md block. (Re-inject with mathir_inject.py after an OpenCode update.)
❌ Agents do NOT auto-check daemon status before calling memory_recall (you'll see an error, not silent failure)

Known limitations

Windows admin-blocked: Task Scheduler approach requires admin elevation; the VBS-in-Startup-folder approach is the non-admin fallback and works for the logged-in user only (no console session).
Linux user-scope systemd: only runs while the user is logged in (no headless server mode without enabling lingering: loginctl enable-linger $USER).
macOS launchd: only triggers on login (not boot) — this is by Apple design.

📚 Documentation Map

Doc	Purpose	Audience
README.md	Overview, quick start, vs alternatives	Everyone
CHANGELOG.md	Version history (source of truth for version)	Maintainers
docs/01_MASTER_RESEARCH_PAPER.md	Doctoral paper (147KB)	Researchers
docs/03_MASTER_QA_GUIDE.md	63 defense Q&A	Decision-makers
docs/05_SHIPPING_GUIDE.md	Production shipping FAQ	DevOps
docs/06_MULTIMODAL_MEMORY_GUIDE.md	Modality details	Integrators
docs/07_MATHIR_VS_VECTORDB_USE_CASES.md	MATHIR vs FAISS	Architects
docs/08_WHY_SAME_RESULTS.md	Math proof A=FAISS	Theorists
docs/BRAIN_ARCHITECTURE.md	5-phase brain stack	Engineers
mathir_mcp/README.md	MCP install + 3-step quick start	MCP users
mathir_mcp/GLOBAL_INSTRUCTIONS.md	Injected into agent prompts	Agent devs
mathir_mcp/docs/AGENT.md	Per-agent MCP config	MCP integrators
mathir_mcp/docs/DAEMON.md	Daemon JSON-RPC protocol	Backend devs
mathir_mcp/docs/DIMENSIONS.md	Embedding model selection	ML engineers
mathir_mcp/docs/GPU_SETUP.md	GPU acceleration	GPU users
mathir_mcp/docs/DASHBOARD_GUIDE.md	Dashboard setup	Admins

🎬 Live Demo

cd vision_testing
pip install -r requirements.txt
python start_ui.py
# → Opens at http://127.0.0.1:5000

A full web UI for testing vision + audio models with persistent MATHIR memory.

┌─────────────────────────────────────────────────────────────────────────┐
│  MATHIR Vision Testing UI                          🟢 MATHIR connected │
├─────────────────────────────────────────────────────────────────────────┤
│  [💬 Chat]   [📷 Camera]   [🧠 Memory]   [🤖 Models]   [🎯 Accuracy]   │
│                                                                         │
│  ┌──────────────────────────┐    ┌──────────────────────────────────┐   │
│  │ Camera: 1280x720 @ 30fps │    │  Chat history                    │   │
│  │ ┌────────────────────┐   │    │  ─────────────────────────────    │   │
│  │ │                    │   │    │  You: What's in front of me?     │   │
│  │ │   [Live Preview]   │   │    │  AI:  A red apple on a desk.    │   │
│  │ │                    │   │    │                                   │   │
│  │ └────────────────────┘   │    │  You: Count the objects.         │   │
│  │                          │    │  AI:  I see 3 objects.           │   │
│  │ [📸 Snapshot] [🎤 Talk]  │    │                                   │   │
│  └──────────────────────────┘    │  🧠 MATHIR: 12 memories stored  │   │
│                                   └──────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────────┘

6 views in the web UI

View	What it does	Key features
💬 Chat	Real-time chat with vision/audio models + persistent memory	Drag-and-drop images, hold-to-talk audio, history in localStorage
📷 Camera	Live webcam (backend OpenCV) — describe, ask, count objects	MJPEG stream, ask-on-frame, auto-capture
🧠 Memory	Query MATHIR memory across all sessions	Search, recall, delete individual memories
🤖 Models	Switch between LFM2.5-VL, Audio, Gemma, Qwen	Load/unload, capabilities, VRAM usage
🎯 Accuracy	Run test batteries, compare models	nDCG@10, MRR, latency, F1
⚙️ Settings	Camera, audio, theme, model management	Live preview, device selection

A standalone playground at /playground.html provides multi-session chat with model switching, image drag & drop, and hold-to-talk audio.

💡 More Examples

Example 1 — Persistent chat memory across sessions (and across LLMs)

from mathir_dropin.simple import SimpleMemory

# === Day 1, with GPT-4 ===
memory = SimpleMemory(db_path="alice.db")
memory.store("Alice is a software engineer at Google")
memory.store("Alice prefers Python over JavaScript")
memory.store("Alice is building a RAG system for legal documents")
# (close the app, go to sleep)

# === Day 2 (re-open, still GPT-4) ===
memory = SimpleMemory(db_path="alice.db")   # same DB, no config
print(memory.search_context("What does Alice do?", k=3))
# → ["Alice is a software engineer at Google",
#    "Alice is building a RAG system for legal documents",
#    "Alice prefers Python over JavaScript"]

# === Day 3 (switch to local Llama 3.1 — same memory!) ===
# Same SQLite file, same memories, different LLM.
# This is what vendor-locked ChatGPT Memory can't do.

Example 2 — Anomaly detection (no other LLM-memory product has this)

from mathir_lib import MATHIRPluginV7

plugin = MATHIRPluginV7(embedding_dim=768)

# Feed normal inputs to "train" the immune system
for emb in normal_user_inputs:
    plugin.perceive(emb)

# Now anomalies are flagged
output = plugin.perceive(weird_prompt_injection)
if output["anomaly_score"] > 0.95:
    print("⚠️ Possible prompt injection detected!")
    # AUC-ROC = 1.0 on test set

Example 3 — Context-aware retrieval (same query, different results)

plugin = MATHIRPluginV7(embedding_dim=768)

# No context loaded
print(plugin.perceive(embed("What's the capital of France?"))["results"])
# → ["Paris", "Lyon", "Marseille"]   (generic)

# Load cooking context
plugin.load_context(recent_conversation_about_french_cuisine)

# Same query, different results
print(plugin.perceive(embed("What's the capital of France?"))["results"])
# → ["Paris", "Bordeaux wine region", "Provence herbs"]   (context-aware)

Example 4 — Cross-lingual recall (UNIBRI)

from mathir_dropin.universal_bridge import universal_recall

# Store English content
memory.store("Python closures capture variables from enclosing scope")

# French query finds it
results = universal_recall("clotures python", k=3)
# → [{"text": "Python closures capture variables...", "score": 0.89}, ...]

Example 5 — Cross-provider (works with any LLM)

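import anthropic
import openai
from mathir_dropin.simple import SimpleMemory

# Same memory, different providers
memory = SimpleMemory(db_path="my_app.db")
memory.store("The capital of France is Paris")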

# OpenAI
client = openai.OpenAI()
# ... use memory in prompt

# Anthropic
client = anthropic.Anthropic()
# ... same memory, different API

# Local 7B
from llama_cpp import Llama
# ... same memory, on-device, no internet

# The memory layer is provider-agnostic.

Example 6 — Full cognitive pipeline

from mathir_lib import MATHIRPluginV7

plugin = MATHIRPluginV7(embedding_dim=768)

# A single perceive() call routes through all 5 tiers
output = plugin.perceive(input_embedding, metadata={"user": "alice"})

# What just happened:
print(f"Router picked: {output['router_weights']}")
# → [0.4, 0.3, 0.2, 0.1]  (working_memory, episodic, semantic, procedural)

print(f"Context used: {output['episodic_context']}")
# → "User asked about Python closures 3 days ago..."

print(f"Anomaly score: {output['anomaly_score']:.3f}")
# → 0.02  (looks normal)

print(f"Enhanced embedding: {output['enhanced_embedding'].shape}")
# → (1, 768)

🏗️ Architecture

┌─────────────────────────────────────────────┐
│              ANY LLM                       │
│   (Claude · GPT-5 · Qwen · LFM2.5 · 7B)    │
└─────────────────┬───────────────────────────┘
                  │ embeddings (1024-d)
                  ▼
┌─────────────────────────────────────────────┐
│           🧠  MATHIR PLUGIN                │
│    ~500 MB VRAM (GPU) · ~107 ms · edge-ready │
│                                             │
│   NOTE: MATHIR internal memory (working_   │
│   memory/episodic/semantic/procedural +    │
│   immunological anomaly bank)              │
│   is ~60 KB (always, Theorem 1). VRAM usage │
│   is the embedding model, not the tiers.    │
│                                             │
│   ┌──────────┐  ┌──────────┐  ┌─────────┐  │
│   │ Working  │  │ Episodic │  │Semantic │  │
│   │  (now)   │  │  (past)  │  │(always) │  │
│   └────┬─────┘  └────┬─────┘  └────┬────┘  │
│        └──────────────┼──────────────┘      │
│               ┌───────▼──────┐               │
│               │ KL  Router   │               │
│               └───────┬──────┘               │
│               ┌───────▼──────┐               │
│               │Immunological │               │
│               │  (anomaly)   │               │
│               └──────────────┘              │
│                                             │
│   ┌─────────────────────────────────────┐   │
│   │     HybridSearch Auto-Scaling       │   │
│   │  ┌─────────┐    ┌──────────────┐   │   │
│   │  │ numpy   │───►│ USearch HNSW │   │   │
│   │  │ (N<5K)  │auto│ (mmap index) │   │   │
│   │  └─────────┘    └──────────────┘   │   │
│   └─────────────────────────────────────┘   │
└─────────────────┬───────────────────────────┘
                  │ enhanced context + anomaly flag
                  ▼
┌─────────────────────────────────────────────┐
│              LLM DECISIONS                  │
└─────────────────────────────────────────────┘

5 cognitive memory tiers

Tier             Capacity    What it does                              When it updates
─────────────    ─────────   ───────────────────────────────────────   ────────────────
🧠 Working         64 slots   Immediate context (last N steps)          Every step
                   [circular   Multi-head attention on recent context
                    buffer]

📚 Episodic     1 000 slots   Past experiences (key-value store)         On event
                   [FIFO +     Cosine similarity on stored embeddings
                    LIRS]      +37.8 % recall improvement

🎓 Semantic       256 proto-  Learned concepts (online k-means)          Every 100 steps
                   types      Compact concept representation

🛡️ Immunological  100 pat-    Anomaly detection (Mahalanobis              On event
                   terns      distance)  AUC-ROC = 1.0
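
The capacity and eviction behavior in the table can be pictured with the standard library alone. The sketch below is illustrative and is not MATHIR's implementation: WorkingBuffer and EpisodicStore are hypothetical names, the episodic store uses plain FIFO eviction (the LIRS part is omitted), and the attention step of working memory is left out.

```python
import math
from collections import OrderedDict, deque


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class WorkingBuffer:
    """Circular buffer: once full, the newest item overwrites the oldest."""

    def __init__(self, slots=64):
        self._items = deque(maxlen=slots)

    def push(self, item):
        self._items.append(item)

    def recent(self):
        return list(self._items)


class EpisodicStore:
    """Key-value store with FIFO eviction and cosine-similarity recall."""

    def __init__(self, slots=1000):
        self.slots = slots
        self._entries = OrderedDict()  # key -> (embedding, payload)

    def store(self, key, embedding, payload):
        if key in self._entries:
            del self._entries[key]             # re-storing refreshes position
        elif len(self._entries) >= self.slots:
            self._entries.popitem(last=False)  # evict the oldest entry
        self._entries[key] = (embedding, payload)

    def recall(self, query, k=3):
        scored = [(cosine(query, emb), payload)
                  for emb, payload in self._entries.values()]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]
```

With slots=3, pushing five items into WorkingBuffer leaves only the last three, which is the "last N steps" behavior the table describes.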

HybridSearch Auto-Scaling Backend

┌─────────────────────────────────────────────────────────────────┐
│                    HybridSearch Auto-Scaling                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  N < 5,000 docs          N >= 5,000 docs                       │
│  ┌─────────────┐         ┌─────────────────────────────┐       │
│  │   numpy     │  ────►  │  USearch HNSW (mmap index)  │       │
│  │  0.78 ms    │  auto   │  1.37 ms                    │       │
│  └─────────────┘  scale  └─────────────────────────────┘       │
│                                                                 │
│  Memory: ~20 KB        Memory: ~50 KB (mmap on disk)           │
│  RAM-only              Index persisted to disk, not RAM         │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
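
The size-based switch in the diagram amounts to a dispatch on collection size. The following sketch shows that dispatch only. AutoScalingIndex, ExactBackend and ApproxBackendStub are hypothetical names, and the "approx" backend is a placeholder that reuses the exact scan so the example has no dependencies; a real deployment would put an HNSW index such as USearch there.

```python
import math

SWITCH_AT = 5_000  # switch point shown in the diagram


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ExactBackend:
    """Brute-force scan; fine while the collection is small."""
    name = "exact"

    def __init__(self, vectors=None):
        self.vectors = list(vectors or [])

    def add(self, vector):
        self.vectors.append(vector)

    def search(self, query, k):
        # Returns (score, index) pairs, best score first.
        scored = sorted(
            ((cosine(query, v), i) for i, v in enumerate(self.vectors)),
            reverse=True,
        )
        return scored[:k]


class ApproxBackendStub(ExactBackend):
    """Placeholder for an approximate index; reuses the exact scan here."""
    name = "approx"


class AutoScalingIndex:
    def __init__(self, switch_at=SWITCH_AT):
        self.switch_at = switch_at
        self.backend = ExactBackend()

    def add(self, vector):
        self.backend.add(vector)
        if self.backend.name == "exact" and len(self.backend.vectors) >= self.switch_at:
            # One-way migration: rebuild the larger backend from existing vectors.
            self.backend = ApproxBackendStub(self.backend.vectors)

    def search(self, query, k=5):
        return self.backend.search(query, k)
```

The caller only ever sees add() and search(), which is what makes the switch "zero-config" from the application's point of view.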

Daemon Architecture (v7.8.0+)

MATHIR runs as a persistent daemon process on port 7338 — the embedding model stays loaded in GPU RAM between calls.

Client (opencode / Python)
         │ TCP socket (localhost:7338)
         ▼
┌─────────────────────────────────────┐
│  mathir_server.py (Flask+Waitress)      │
│  ├── Model loaded ONCE at startup   │
│  ├── GPU memory held across calls   │
│  ├── TCP server on port 7338        │
│  ├── HybridSearch auto-scaling      │
│  └── No model reload per request    │
└─────────────────────────────────────┘
         │
         ▼
    mathir_client.py (thin client)
    → 1–2 ms per call (model already in VRAM)

Why daemon instead of per-request:

Model load: ~3–5 seconds (bge-large-en-v1.5) → eliminated after first call
Per-call overhead: 1–2 ms (TCP round-trip only)
GPU memory: ~500 MB held continuously (vs 0 MB between calls)
No cold starts within a session — once the daemon is running, the model stays loaded in VRAM. Note: a cold start happens only on first launch; with auto-start configured (install_smart.py --autostart-only) the daemon is up before any agent boots (see ⚠️ After PC Reboot).

# Start daemon (background, persists until PC reboot or crash)
python -m mathir_mcp &

# Thin client — fast, model already loaded
python ~/.config/opencode/mathir_mcp/mathir_lib/mathir_client.py recall "query" -k 5
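
The load-once pattern described above can be reproduced in miniature with the standard library. This is a sketch under stated assumptions, not MATHIR's server: FakeModel stands in for the embedding model, the /embed route and all function names are hypothetical, and the real daemon uses Flask+Waitress rather than http.server.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class FakeModel:
    """Stand-in for an embedding model; counts how often it is loaded."""
    loads = 0

    def __init__(self):
        FakeModel.loads += 1  # a real model would spend seconds here

    def embed(self, text):
        return [float(len(text)), float(text.count(" ") + 1)]


def make_server(port=0):
    model = FakeModel()  # loaded once, before any request arrives

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            text = json.loads(self.rfile.read(length))["text"]
            body = json.dumps({"embedding": model.embed(text)}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):  # keep the demo quiet
            pass

    return ThreadingHTTPServer(("127.0.0.1", port), Handler)


def embed_via_daemon(port, text):
    """Thin client: one HTTP round-trip, no model load on the client side."""
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/embed",
        data=json.dumps({"text": text}).encode(),
        headers={"Content-Type": "application/json"},
    )
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(req, timeout=5) as resp:
        return json.loads(resp.read())["embedding"]


def demo():
    before = FakeModel.loads
    server = make_server()  # port 0 lets the OS pick a free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        first = embed_via_daemon(port, "hello world")
        second = embed_via_daemon(port, "hello again world")
    finally:
        server.shutdown()
        server.server_close()
    return FakeModel.loads - before, first, second
```

demo() serves two requests from a single model instance, which is the property the daemon relies on to keep per-call cost down to the round-trip.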

Daemon Push (NEW in v8.2.0)

MATHIR v8.2.0 introduces proactive memory delivery — the daemon can push relevant memories to clients without explicit recall requests. This enables automatic context injection for ongoing conversations.

┌─────────────────────────────────────────────────────────────────┐
│                    Daemon Push Flow                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Client                    Daemon                              │
│    │                         │                                 │
│    │  push --auto            │                                 │
│    ├────────────────────────►│  Analyze context                │
│    │                         │  Query 5-tier memory            │
│    │                         │  Rank by relevance              │
│    │  ◄──────────────────────┤                                 │
│    │  [memory1, memory2, ...] │  Return ranked memories        │
│    │                         │                                 │
│  Push Modes:                                                    │
│  ┌─────────────┬─────────────────────────────────────────────┐ │
│  │ --auto      │ Daemon analyzes context, returns JSON array │ │
│  │ --json      │ Returns structured {memories: [...]}        │ │
│  │ --simple    │ Returns plain text memories                 │ │
│  └─────────────┴─────────────────────────────────────────────┘ │
│                                                                 │
│  Use Cases:                                                     │
│  • Auto-inject relevant context before each LLM call           │
│  • Proactive memory suggestions during conversations           │
│  • Background context enrichment for long sessions             │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Push commands:

# Auto mode — daemon pushes relevant memories based on context
python ~/.config/opencode/mathir_mcp/mathir_lib/mathir_client.py push "context here" --auto

# JSON mode — returns structured memory suggestions
python ~/.config/opencode/mathir_mcp/mathir_lib/mathir_client.py push "context here" --json

# Simple mode — returns plain text memories (default)
python ~/.config/opencode/mathir_mcp/mathir_lib/mathir_client.py push "context here"

Why push instead of pull:

Latency: Memories delivered proactively, no recall delay
Context: Daemon analyzes full conversation history, not just current query
Automatic: No need to remember to call recall — daemon delivers relevant memories
Efficient: Cache prevents redundant embedding computations
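
How the daemon's cache is implemented is not described here. As a rough illustration of the idea in the last point, the sketch below memoizes an embedding call with functools.lru_cache. expensive_embed is a stand-in for a real model call, and both function names and the cache size are assumptions made for the example.

```python
from functools import lru_cache

calls = {"embed": 0}


def expensive_embed(text):
    """Stand-in for a real embedding call; counts invocations."""
    calls["embed"] += 1
    # Return a tuple so the cached value cannot be mutated by callers.
    return tuple(float(ord(c)) for c in text[:8])


@lru_cache(maxsize=1024)
def cached_embed(text):
    """Identical text is embedded once; repeats are served from memory."""
    return expensive_embed(text)
```

Calling cached_embed with the same context twice triggers one underlying embedding; cached_embed.cache_info() reports the hit and miss counts.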

KL-constrained router

The router decides which memory tier to consult for each input. It uses PPO-style trust-region optimization with a KL-divergence constraint to prevent collapse to a single tier.

Input: "What's the weather?"
              │
              ▼
       ┌──────────────┐
       │  KL Router   │   weights = [0.40, 0.30, 0.20, 0.10]
       └──────┬───────┘
              │
   ┌──────────┼──────────┬──────────┐
   ▼          ▼          ▼          ▼
Working    Episodic   Semantic   Immune
(0.40)     (0.30)     (0.20)     (0.10)
   │          │          │          │
   │ "Right   │ "User    │ "Weather │ "Nothing
   │  now"    │  asked   │  is a    │  weird
   │          │  this    │  common  │  about
   │          │  before" │  topic"  │  this"
   ▼          ▼          ▼          ▼
   "22°C,     "Last      Use       No flag.
   sunny"     time you   general   All normal.
              asked,    knowledge
              it was
              sunny"

The router learns its allocation strategy over time (no hard-coded rules):

Short-term reflex → working memory
Recall a past situation → episodic memory
Apply a general concept → semantic memory
Novel / unusual input → immunological memory
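
To make the KL constraint concrete, here is a small numeric sketch. It assumes the penalty form of PPO, where the objective is the reward minus beta times KL(old || new) between the previous and the proposed routing distributions. The README does not specify the reference distribution or the value of beta, so both are assumptions here, as are the function names.

```python
import math


def softmax(logits):
    """Turn raw router scores into tier weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same tiers."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def penalized_objective(reward, new_weights, old_weights, beta=0.5):
    """Reward minus a KL penalty that discourages large routing jumps."""
    return reward - beta * kl_divergence(old_weights, new_weights)


old = [0.40, 0.30, 0.20, 0.10]        # weights from the diagram
small_step = [0.42, 0.29, 0.19, 0.10]
collapsed = [0.97, 0.01, 0.01, 0.01]  # almost everything sent to one tier

print(kl_divergence(old, small_step))
print(kl_divergence(old, collapsed))
```

The collapsed proposal has a KL divergence more than a thousand times larger than the small step, so for equal reward the penalized objective prefers the update that keeps all tiers in use.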

📊 Tests & Benchmarks

All results reproducible. Scripts in benchmarks/, full HTML report in benchmarks/06_results/current/MATHIR_FINAL_REPORT.html.

🆕 Lifecycle Benchmarks (v8.4.0)

Two complementary benchmarks that measure whether the living memory improves recall quality:

# Memory-only throughput (no LLM, ~5 min)
python benchmarks/04_lifecycle_bench/micro_bench.py --count 1000

# AI-driven end-to-end (20 min default) — measures recall quality before/after
python benchmarks/04_lifecycle_bench/run_all.py --duration 20

The AI bench runs 4 phases: generate experiences → baseline Q&A → age + maintenance cycle → re-test same Q&A. The headline metric: does recall@5 and has_answer_rate improve after decay + promote + consolidate + build_links runs? See benchmarks/04_lifecycle_bench/README.md for details.
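
For readers unfamiliar with the headline metrics, the sketch below shows one common way to compute them. recall_at_k follows the standard definition. The definition of has_answer_rate used here (fraction of questions with at least one relevant memory in the top k) is an assumption; the benchmark's own README is the authority on how it is actually computed.

```python
def recall_at_k(retrieved, relevant, k=5):
    """Share of the relevant items that appear in the top-k results."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def has_answer_rate(runs, k=5):
    """Fraction of queries with at least one relevant item in the top k.

    runs is a list of (retrieved_ids, relevant_ids) pairs. This definition
    is assumed for illustration.
    """
    if not runs:
        return 0.0
    hits = sum(1 for retrieved, relevant in runs
               if set(retrieved[:k]) & set(relevant))
    return hits / len(runs)
```

Running both functions on the same question set before and after the maintenance cycle gives the before/after comparison the benchmark reports.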

🧪 Test suite — 226 tests, 99 % pass

Suite                         Tests   Status     Coverage
─────────────────────────     ─────   ────────    ───────────────────
test_v7_memory.py               49    ✅ 49/49    Memory tier algorithms
test_v7_integration.py          16    ✅ 14/16    End-to-end pipelines
test_raw_embedding.py           28    ✅ 28/28    Embedding layer
test_ensemble.py                36    ✅ 36/36    Anomaly ensemble
test_faiss_memory.py            32    ✅ 32/32    FAISS integration
test_hybrid.py                  34    ✅ 34/34    Hybrid retrieval
mathir_dropin audit             31    ✅ 31/31    Drop-in API surface
─────────────────────────     ─────   ────────    ───────────────────
TOTAL                          226    ✅ 224/226  99 %

pytest mathir_dropin/tests/ -v

🐍 Daemon stress test — 50/50 pass (V8.3)

Test                          Requests  Status    Latency
──────────────────────────    ────────  ────────  ────────
memory_save (rapid fire)         20/20  ✅ PASS   50-120ms
ping (rapid fire)                20/20  ✅ PASS   2-23ms
memory_recall (rapid fire)       10/10  ✅ PASS   47-94ms
memory_hybrid_search             10/10  ✅ PASS   47-65ms
──────────────────────────    ────────  ────────  ────────
TOTAL                            50/50  ✅ PASS   ~60ms avg

📈 BEIR benchmark results (nDCG@10)

System	SciFact	NFCorpus	ArguAna	Verdict
FAISS dense-only (BGE-base)	0.7441	0.3657	0.6613	✅ SOTA baseline
BM25 only	0.5438	0.2617	—	⚠️ Too weak for scientific
Hybrid RRF (1:1)	0.6602	0.3263	—	⚠️ BM25 dilutes dense
Hybrid + Cross-Encoder	0.5910	0.2620	—	❌ Cross-encoder wrong domain

nDCG@10 (SciFact)
0.8 ┤
    │      ████████
0.7 ┤      ████████
    │      ████████
0.6 ┤      ████████  ████████
    │      ████████  ████████
0.5 ┤      ████████  ████████  ████████
    │      ████████  ████████  ████████
0.4 ┤      ████████  ████████  ████████
    │      ████████  ████████  ████████
0.3 ┤      ████████  ████████  ████████
    │      ████████  ████████  ████████
0.2 ┤      ████████  ████████  ████████
    │      ████████  ████████  ████████
0.1 ┤      ████████  ████████  ████████
    │      ████████  ████████  ████████
0.0 ┴───────────────────────────────
        FAISS       Hybrid     Hybrid+CE
       (0.7441)    (0.6602)    (0.5910)
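
nDCG@10 rewards placing relevant documents near the top of the ranking. The sketch below uses the usual linear-gain form, DCG = sum of rel_i / log2(i + 1), normalized by the DCG of an ideal ordering. It is meant to explain the metric; it is not the evaluation code behind the numbers above, and the function names are illustrative.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k ranks (rank starts at 1)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(ranked_relevances, all_relevances, k=10):
    """nDCG@k for one query.

    ranked_relevances: relevance grade of each returned document, in rank order.
    all_relevances: grades of every judged document for the query.
    """
    ideal = dcg_at_k(sorted(all_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0
```

A single relevant document at rank 1 scores 1.0, and the same document at rank 2 scores 1 / log2(3), roughly 0.63, which is why small ranking changes move the table values.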

🔍 Vector Search — HybridSearch Auto-Scaling

MATHIR v7.8+ introduces HybridSearch — a vector search backend that automatically scales from numpy (small datasets) to USearch HNSW (large datasets) with memory-mapped indexes.

BEIR Benchmark Results (5,183 documents)

Backend	Latency (search)	Index Size	RAM Usage	Notes
numpy (cosine)	0.78 ms	~20 KB	~20 KB	Fastest for small N
USearch HNSW (mmap)	1.37 ms	~50 KB	~50 KB	Memory-mapped, scales to 1M+
sqlite-vec	23.68 ms	~100 KB	~100 KB	Slower, WAL-optimized

Search latency at 5,183 documents
30 ms ┤
      │
25 ms ┤
      │                    ████████
20 ms ┤                    ████████
      │                    ████████
15 ms ┤                    ████████
      │                    ████████
10 ms ┤                    ████████
      │                    ████████
 5 ms ┤                    ████████
      │  ████████          ████████
 0 ms ┴─────────────────────────────
        numpy (0.78)     sqlite-vec (23.68)
              USearch (1.37)

Key Features

Feature	Description
Auto-scaling	numpy → USearch at N=5,000 (no config needed)
Memory-mapped indexes	Index on disk, not RAM — no memory pressure for large datasets
sqlite-vec WAL mode	Write-Ahead Logging for 3.4× write speedup
Zero-config	HybridSearch picks the optimal backend automatically
FAISS fallback	Optional FAISS integration for production deployments
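
Write-Ahead Logging is a standard SQLite setting and can be enabled with one PRAGMA. The sketch below shows the general technique with the built-in sqlite3 module. How MATHIR configures its own store is not shown in this README, open_wal_db is a hypothetical helper, and the 3.4× figure above is the project's own measurement, which this snippet does not reproduce.

```python
import sqlite3


def open_wal_db(path):
    """Open a SQLite database file and switch it to WAL journaling.

    Returns the connection and the journal mode SQLite actually selected.
    WAL needs a file-backed database; in-memory databases report "memory".
    """
    conn = sqlite3.connect(path)
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    # NORMAL is a common pairing with WAL that reduces fsync calls; optional.
    conn.execute("PRAGMA synchronous=NORMAL")
    return conn, mode
```

The journal mode is stored in the database file, so it only has to be set once per database, although setting it on every open is harmless.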

Usage

from mathir_dropin.simple import SimpleMemory

# HybridSearch is automatic — just use SimpleMemory
memory = SimpleMemory(db_path="my_app.db")

# For N < 5K: numpy backend (0.78ms)
# For N >= 5K: auto-switches to USearch HNSW (1.37ms)
# No configuration needed

memory.store("Python closures capture variables")
results = memory.recall("Python functions", k=5)  # Uses optimal backend

🚀 What MATHIR adds over FAISS

Capability	FAISS	MATHIR	Delta
Online learning	❌	✅ +37.8 %	🟢 MATHIR
Anomaly detection (AUC)	❌	1.0	🟢 MATHIR
Context-aware results	❌	88 %	🟢 MATHIR
2-hour stress (no crash)	❌	100 %	🟢 MATHIR
No memory leak	❌	✅	🟢 MATHIR
Router balanced	❌	100 % acc.	🟢 MATHIR
Graceful degradation	❌	✅	🟢 MATHIR
Raw retrieval speed	< 1 ms	0.78 ms (numpy) / 1.37 ms (USearch)	🔵 FAISS (similar)

⏱ 2-hour stress test (all 5 tiers active)

Metric	Value	Status
Uptime	100 %	✅
Memory leaks	None	✅
Retrieval quality @ 120 min	0.959	✅
P99 latency	17.8 ms	✅
Total operations	26 440	✅

Retrieval quality over 2 hours
1.0 ┤█████████████████████████████████████████████████
    │
0.95┤█████████████████████████████████████████████████
    │
0.9 ┤█████████████████████████████████████████████████
    │
0.85┤█████████████████████████████████████████████████
    └─────────────────────────────────────────────────
    0    20    40    60    80    100   120  (minutes)

🌍 Cross-provider generalization (OpenRouter)

Model	API latency	MATHIR wins	Result
`openrouter/owl-alpha`	2.6 s	4 / 4	🏆 MATHIR wins all
`openai/gpt-oss-120b:free`	2.0 s	3 / 4	🏆 MATHIR wins most
`openai/gpt-oss-20b:free`	1.1 s	4 / 4	🏆 MATHIR wins all

Total: 11 / 12 scenarios — MATHIR wins across 3 different LLM architectures.

🌐 Cross-lingual (UNIBRI)

"What do you know about python closures?"  → finds "python-closures"         ✅
"clotures python"  (French)                → finds English "Python closures" ✅
provider="unknown"  (no stored embedding)  → 3 results via fallback chain   ✅

The Universal Bridge uses multi-resolution character n-gram kernels (Broder 1997) + Johnson-Lindenstrauss random projection + Procrustes SVD for cross-space alignment. Mathematically grounded, vocabulary-free, language-agnostic.
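
The first two stages of that pipeline can be sketched without any dependencies. The example below hashes character n-grams of several lengths into a count vector and then applies a Gaussian random projection. It is an illustration only: the dimensions, n-gram sizes, seed and function names are assumptions, and the Procrustes alignment step is omitted because it needs an SVD routine.

```python
import math
import random
import zlib


def ngram_vector(text, dims=1024, sizes=(2, 3, 4)):
    """Hash character n-grams of several lengths into a count vector."""
    vec = [0.0] * dims
    s = text.lower()
    for n in sizes:
        for i in range(len(s) - n + 1):
            # crc32 is stable across runs, unlike the built-in hash().
            bucket = zlib.crc32(s[i:i + n].encode("utf-8")) % dims
            vec[bucket] += 1.0
    return vec


def make_projection(in_dims=1024, out_dims=128, seed=0):
    """Gaussian random matrix scaled so projected norms are roughly preserved."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(out_dims)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dims)]
            for _ in range(out_dims)]


def project(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


french = ngram_vector("clotures python")
english = ngram_vector("Python closures")
unrelated = ngram_vector("weather forecast tomorrow")
print(cosine(french, english), cosine(french, unrelated))
```

Because "clotures python" and "Python closures" share many character n-grams, their vectors are far closer to each other than to the unrelated sentence, with no vocabulary or tokenizer involved.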

⚡ Performance Benchmarks (v7.8.0+)

Real-world benchmarks on RTX 4060 + CUDA, measuring save (store memory) and recall (search memory) latency.

End-to-End Latency

Operation	Latency	Breakdown
Save (store memory)	58 ms	bge-large CUDA 3ms + DB write 55ms
Recall (search memory)	107 ms	bge-large CUDA 3ms + vector search 104ms

Vector Search at Scale

Dataset Size	Backend	Latency	Notes
5,000 docs	numpy	0.78 ms	Auto-selected for small N
5,000 docs	USearch HNSW	1.37 ms	Memory-mapped index
5,000 docs	sqlite-vec	23.68 ms	WAL-optimized

Embedding Model Comparison

Model	Dimensions	Device	Save	Recall	Notes
BAAI/bge-large-en-v1.5	1024	CUDA	58 ms	107 ms	✅ Recommended — best quality/speed
MiniLM-L6-v2	384	CUDA	22 ms	53 ms	⚠️ Faster save and recall (per the latencies in this row), lower-dimensional 384-d embeddings
Octen INT8	1024	CPU	~5 000 ms	~2 700 ms	🐢 50–100× slower
Octen INT8	1024	CUDA (onnxruntime-gpu)	~776 ms	—	⚠️ Partial GPU (ONNX limitation)

Key insight: bge-large-en-v1.5 achieves 107 ms recall at 1024d on full CUDA — includes embedding time (3ms) + vector search (104ms). The larger dimension space produces better similarity scores, and CUDA handles the extra compute efficiently.

🔬 Why It Works — Theoretical Foundation

Component	Guarantee	Citation
Episodic memory	Cosine similarity on stored embeddings → +37.8 % recall (measured on BEIR)	Empirical
Immunological memory	Mahalanobis distance is the Neyman-Pearson-optimal (NP-optimal) detector for anomalies in Gaussian data	McLachlan 1999
Working memory	Multi-head attention on circular buffer → bounded latency, context-aware	Vaswani 2017
KL Router	KL-divergence penalty (PPO-style) prevents tier collapse; max-entropy ensures exploration	Schulman 2017
UNIBRI	Theorems 1–4 give OOV / cross-lingual / cross-provider stability guarantees	Broder 1997, J-L 1984, Wedin 1972

Full mathematical proofs in docs/01_MASTER_RESEARCH_PAPER.md.
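
As a concrete instance of the Mahalanobis row, the sketch below fits a mean and covariance to two-dimensional "normal" points and scores new points by d(x) = sqrt((x - mu)^T Sigma^-1 (x - mu)). It is restricted to 2-D so the covariance can be inverted in closed form without a linear-algebra library. The function names and the small ridge term are illustrative choices, not MATHIR's code.

```python
import math


def fit_gaussian_2d(points):
    """Sample mean and covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_2d(x, mean, cov, ridge=1e-9):
    """Distance of x from the fitted Gaussian, in covariance-scaled units."""
    sxx, sxy, syy = cov
    sxx += ridge  # tiny ridge keeps the matrix invertible
    syy += ridge
    det = sxx * syy - sxy * sxy
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Inverse of [[sxx, sxy], [sxy, syy]] is (1/det) * [[syy, -sxy], [-sxy, sxx]].
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(max(d2, 0.0))


normal_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
mean, cov = fit_gaussian_2d(normal_points)
print(mahalanobis_2d((1.0, 1.0), mean, cov))  # a point seen during fitting
print(mahalanobis_2d((5.0, 5.0), mean, cov))  # a point far from the data
```

The far point scores several times higher than any training point; an anomaly threshold is then a cut-off on this distance.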

📁 Project Structure

MATHIR/
├── 🧠 mathir_lib/             # Full library (8 algorithms · 6 theorems · 9.3× compression)
│   ├── plugin_v7.py           # V7 plugin (recommended)
│   ├── memory/                # Memory tier implementations
│   └── config.py
│
├── 📦 mathir_dropin/          # Drop-in memory (copy to your project)
│   ├── memory.py              # MATHIRMemory (torch-powered)
│   ├── simple.py              # SimpleMemory (FTS5, zero deps)
│   ├── store.py               # SQLite storage
│   └── universal_bridge.py    # UNIBRI: cross-provider · cross-lingual
│
├── 👁️ vision_testing/         # Full vision/audio testing UI
│   ├── ui_server.py           # Flask backend · 18 API routes
│   ├── ui/                    # Web UI (HTML · CSS · JS)
│   └── playground.html        # Multi-session chat playground
│
├── 📊 benchmarks/             # Reproducible benchmarks + HTML report
│   └── 06_results/current/     # Benchmark reports
│       └── MATHIR_FINAL_REPORT.html   # Full visual report
│
├── 🧪 mathir_dropin/tests/    # 226 tests
├── 📚 docs/                   # Tutorials · theory · LaTeX paper
├── 🔧 examples/               # Demo scripts
└── ⚙️ config/                 # Configuration

🛠️ Try the Examples

# Zero-dep memory (works without torch)
python examples/simple_memory_demo.py

# Vision + audio UI
cd vision_testing && python start_ui.py

# Multi-session chat playground
cd vision_testing && python start_ui.py
# → http://127.0.0.1:5000/playground.html

📚 Documentation

📖 New here? See the Documentation Map above — full doc index by audience and purpose. (The old docs/00_README.md index was removed in v8.3.0 — it's now in this README.)

Top 7 "Hidden Gems"

#	Document	Lines	What it is
1	`README.md`	510	The pitch (you are here)
2	`MATHIR_FINAL_REPORT.html`	348	All benchmark numbers
3	`docs/03_MASTER_QA_GUIDE.md`	637	63 Q&A for CTO defense
4	`docs/07_MATHIR_VS_VECTORDB_USE_CASES.md`	454	MATHIR vs FAISS on chat + autonomous driving
5	`docs/MATHIR_Research_Paper.tex`	1 130	LaTeX paper — peer-review ready
6	`docs/01_MASTER_RESEARCH_PAPER.md`	699	6 theorems with proofs
7	`docs/01_MASTER_RESEARCH_PAPER.md`	2 155	Doctoral research paper — 145 KB

Quick links

Document	Description
📄 `docs/MATHIR_Research_Paper.tex`	LaTeX paper for scientific review
📖 `docs/01_MASTER_RESEARCH_PAPER.md`	Full research paper (Markdown, 145 KB)
📊 `benchmarks/06_results/current/MATHIR_FINAL_REPORT.html`	Visual benchmark report (HTML, interactive charts)
📊 `benchmarks/MATHIR_FINAL_REPORT.md`	Benchmark report (Markdown)
🎯 `docs/03_MASTER_QA_GUIDE.md`	63 Q&A for defense / evaluation
🆚 `docs/07_MATHIR_VS_VECTORDB_USE_CASES.md`	MATHIR vs FAISS use cases
🔬 `docs/01_MASTER_RESEARCH_PAPER.md`	Mathematical proofs (6 theorems)
🤖 `mathir_mcp/docs/AGENT.md`	Quick reference for AI agents
👁️ `vision_testing/README.md`	Vision/audio testing docs
📦 `mathir_dropin/README.md`	Drop-in memory docs
📋 `CHANGELOG.md`	Version history

🗺️ Roadmap

Version	Milestone	Status
V1–V5	Core architecture + KL router	✅
V6	LLM-agnostic plugin API	✅
V7	8 algorithms + 6 theorems + 9.3× compression	✅
V7.5	Real BEIR benchmarks (0.7441 SOTA)	✅
V7.6	Universal Bridge (UNIBRI)	✅
V7.7	Vision & audio testing + MATHIR memory	✅
V7.7.1	SimpleMemory (FTS5) + UI overhaul	✅
V7.8	GPU embeddings (bge-large) + daemon architecture	✅
V8	Cascade architecture + arXiv paper	🔜
V9	Edge deployment (Jetson / ONNX)	📋
V10	Open-source release (HuggingFace · PyPI)	📋

🤝 Contributing

We welcome contributions.

# 1. Fork & clone
git clone https://github.com/YOUR_USERNAME/MATHIR.git
cd MATHIR
pip install -e .

# 2. Create a branch
git checkout -b feature/my-feature

# 3. Make changes, add tests, run them
pytest mathir_dropin/tests/ -v

# 4. Submit a PR

Areas where help is needed

📚 Documentation — improve tutorials, add examples
🧪 Testing — edge cases, more coverage
📊 Benchmarks — more corpora, more embedding models
📱 Edge deployment — Rust / ONNX port
🔌 Integrations — LangChain · LlamaIndex · Haystack

📄 Citation

If you use MATHIR in your research, please cite:

@software{mathir2026,
  title  = {MATHIR: Memory-Augmented Tensor Hybrid with Intelligent Routing},
  author = {Mbama Kombila, Prince Gildas},
  year   = {2026},
  url    = {https://github.com/sil3d/MATHIR}
}

Full paper: docs/MATHIR_Research_Paper.tex

📜 License

MIT — free for commercial and research use.

🧠 MATHIR — A 5-tier cognitive memory layer for any LLM, on any hardware.

Author: Prince Gildas Mbama Kombila · Email: soilearn3d@gmail.com

⭐ Star this repo if you find it useful — it helps others discover MATHIR.
