A closed-loop memory engine for streaming AI.
Cortext turns live text, audio, and image streams into durable, source-backed memory. It stores compact traces, builds graph associations, retrieves relevant context, and runs explicit shallow consolidation over embeddings. It ships as a C++20 library with a stable C ABI and bindings for Python, Go, JavaScript/TypeScript, Dart, and WebAssembly.
Most LLM memory systems are open-loop. Content is chunked, embedded, summarized, and retrieved while the parameters that govern those stages stay fixed. Cortext feeds retrieval outcomes, prediction error, storage pressure, and consolidation results back into three continuous control knobs: Focus (F), Sensitivity (S), and Stability (T). Those knobs modulate write gating, attention width, decay, thresholds, and consolidation cadence for the next input.
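To make the open-loop versus closed-loop distinction concrete, here is a toy sketch of a feedback step. It is not Cortext's update rule: the `Knobs` and `Feedback` structs, the `nudge` helper, the targets each knob moves toward, and the 0.1 rate are all assumptions chosen for illustration. The actual derivations are in the paper.

```cpp
#include <algorithm>

// Hypothetical toy types; not Cortext's real state or update rules.
struct Knobs {
    double focus = 0.5;        // F
    double sensitivity = 0.5;  // S
    double stability = 0.5;    // T
};

struct Feedback {
    double retrieval_hit_rate;  // share of retrieved memories that were used, 0..1
    double prediction_error;    // normalized surprise, 0..1
    double storage_pressure;    // share of the storage budget in use, 0..1
};

// Move value a fraction `rate` of the way toward target, clamped to [0, 1].
double nudge(double value, double target, double rate) {
    return std::clamp(value + rate * (target - value), 0.0, 1.0);
}

// One closed-loop step: outcomes from the last input set the knobs for the next.
// The choice of targets below is arbitrary and only shows the shape of the loop.
Knobs update(Knobs k, const Feedback& fb, double rate = 0.1) {
    k.focus = nudge(k.focus, 1.0 - fb.retrieval_hit_rate, rate);   // unused retrievals narrow attention
    k.sensitivity = nudge(k.sensitivity, fb.prediction_error, rate);  // surprise raises reactivity
    k.stability = nudge(k.stability, fb.storage_pressure, rate);      // pressure shifts retention
    return k;
}
```

An open-loop system would skip `update` entirely and reuse the same `Knobs` for every input.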
The strongest result so far is a hosted frontier-judge eval on a public Meta Multi-Session Chat slice (2026-06-30). Cortext won 21 of 27 blind judgments and used 98% fewer context tokens than traditional chat+RAG.
| System | Win rate | Raw wins | 95% CI | Mean context tokens |
|---|---|---|---|---|
| Cortext native | 0.778 | 21/27 | [0.519, 0.963] | 998 |
| Traditional chat+RAG | 0.000 | 0/27 | [0.000, 0.000] | 49,196 |
| Full-history upper bound | 0.037 | 1/27 | [0.000, 0.111] | 185,439 |
| Hosted compaction rollup | 0.185 | 5/27 | [0.037, 0.444] | n/a |
That is a 97.97% context-token reduction versus traditional chat+RAG (probe-bootstrap 95% CI [97.77%, 98.17%]). Cortext also had the lowest mean judged noise: 1.85, against 4.70, 5.00, and 3.63 for the other three systems.
Caveat: this is not yet a full sufficiency-match claim. Traditional chat+RAG scored higher mean sufficiency (4.67 versus 4.41), and the compaction baseline scored 4.63. Cortext wins on token cost, judge preference, and noise, not yet on raw sufficiency.
Protocol: the replay processed 9,130 text turns from 708 rows of the Hugging
Face nayohan/multi_session_chat mirror, with daily source-time consolidation
at 02:00 UTC, a 5,000-event warmup, and 9 probes at 500-event intervals. The
judge was gpt-5.5 over hosted OpenAI Chat Completions at a 1,000,000-token
context, with three blind repetitions per probe, judge_seed=42, and 2,000
bootstrap samples. All 27 judgments completed across four text-only systems.
The strict gates passed: the judge prompt fits the context, the full-history
prompt fits the judge context, there is no future or current-turn leakage,
hidden labels are absent, and the RAG baselines are text-only. Artifact:
eval_runs/msc_frontier_late_200dlg_gpt55_20260630T053427Z/judge_openai_gpt55_four_system_clean.json.
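For readers unfamiliar with bootstrap intervals, the following is a minimal sketch of a percentile bootstrap that resamples at the probe level. It is one plausible reading of "probe-bootstrap", not the harness code: the function name, the percentile indexing, and the use of `std::mt19937` are assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Percentile bootstrap over probes: resample per-probe scores with replacement,
// then take the 2.5th and 97.5th percentiles of the resampled means.
// Preconditions: per_probe is non-empty and samples > 0.
std::pair<double, double> probe_bootstrap_ci(const std::vector<double>& per_probe,
                                             int samples, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick(0, per_probe.size() - 1);
    std::vector<double> means;
    means.reserve(static_cast<std::size_t>(samples));
    for (int s = 0; s < samples; ++s) {
        double sum = 0.0;
        for (std::size_t i = 0; i < per_probe.size(); ++i) {
            sum += per_probe[pick(rng)];  // draw one probe with replacement
        }
        means.push_back(sum / static_cast<double>(per_probe.size()));
    }
    std::sort(means.begin(), means.end());
    const std::size_t lo = static_cast<std::size_t>(0.025 * (samples - 1));
    const std::size_t hi = static_cast<std::size_t>(0.975 * (samples - 1));
    return {means[lo], means[hi]};
}
```

Here each entry of `per_probe` would be one probe's win fraction across its three repetitions, so the interval reflects variation between probes rather than between individual judgments.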
A post-optimization full replay on the same 9,130-turn MSC slice preserved the
native probe behavior exactly while flattening graph-retrieval latency. The 9
progress checkpoints reported GraphRetrieve.total between 13.7 ms and 27.7
ms, and the 9 judged probe turns reported 21.8-28.6 ms. Non-timing probe fields,
retrieved/working memory IDs, retrieval counts, memory counts, and consolidation
counts matched the saved baseline exactly. Artifact:
build/graph_profile/full_msc_verify_final/summary.json.
A follow-up 128k-capped RAG ablation on the same probes used six blinded
systems and three hosted gpt-5.5 judgments per probe. Cortext won 19/27,
the capped compaction rollup won 8/27, and semantic-vector-only,
lexical-keyword-only, rolling-window-only, and 16k hybrid chat+vector RAG each
won 0/27. The max estimated judge prompt was 116,425 tokens under a 128,000
context cap; actual OpenAI usage was 1,871,994 prompt tokens and 39,001
completion tokens. Mean context tokens were 816 for Cortext, 88 for semantic
vector RAG, 224 for lexical RAG, 15,999 for rolling-window chat, 15,999 for
hybrid RAG, and 7,110 for compaction. Artifact:
eval_runs/msc_rag_ablation_128k_gpt55_20260630T_actual/judge_openai_gpt55_rag_ablation_128k.json.
An earlier local blind-judge pass (2026-06-28) on a one-year sparse replay, judged by Gemma4-12B-AWQ over vLLM at a 131,072-token context, completed 93/93 judgments.
| System | Per-31-probe wins | Raw wins | Mean context tokens |
|---|---|---|---|
| Cortext native | 15.7/31 | 47/93 | 467 |
| Traditional chat RAG | 5.3/31 | 16/93 | 7,447 |
| Full-history upper bound | 1.0/31 | 3/93 | 15,974 |
| Tie / unclear | 9.0/31 | 27/93 | n/a |
Cortext's three single-repetition counts were 14, 17, and 16 of 31, bracketing the historical 15-win A/B baseline. Mean context was 467 tokens versus 7,447 for traditional RAG, a 93.7% reduction.
A separate release probe confirms the v1.0 hard cut is behavior-preserving.
Against a preserved replay binary on the same AIST GGUF model and system GGML
runtime, dense and full sparse replay matched exactly: retr_diffs=0 and
rank_diffs=0.
Full protocol and artifacts live in docs/paper/sections/9_experimental.qmd and
the generated manuscript at docs/paper/_manuscript/index.md.
Replay and harness tools:
cmake --build build -j --target cortext_chat_replay_live_run
./build/examples/benchmark/cortext_chat_replay_live_run --help
python scripts/run_memory_harness.py --max-conversations 2 --max-turns 360 --max-total 720 --no-multi
scripts/run_msc_frontier_judge.sh
Cortext v1.0 is the hard-cut production runtime: the embedding and graph memory engine, and nothing else. The older research stack (decoder, provider registry, semantic extractor, summarizer, static label bank, fact layer, label-bucket graph, mode-selected deep consolidation) is preserved in git but not shipped. The release surface is deliberately smaller than the research history.
Requirements:
- C++20 compiler
- CMake
- Git and Python 3 for default dependency and model bootstrap
The default build fetches bundled native dependencies and downloads the required
AIST GGUF model into models/AIST-87M-GGUF/. Configure, build, and test:
cmake -S . -B build -DCMAKE_BUILD_TYPE=Debug
cmake --build build -j
ctest --test-dir build -R cortext_tests --output-on-failure
The model-free CI release gate runs:
./build/tests/cortext_tests '~[aist]' --reporter compact
Examples are optional:
cmake -S . -B build -DCMAKE_BUILD_TYPE=Debug -DCORTEXT_BUILD_EXAMPLES=ON
cmake --build build -j
./build/examples/topical_chat_analysis/cortext_topical_chat_analysis --help
#include <cortext/cortext.hpp>
#include <iostream>

int main()
{
    // The three primary control knobs.
    cortext::Cortext::Config cfg;
    cfg.focus = 0.7;        // F: attentional precision
    cfg.sensitivity = 0.5;  // S: reactivity to surprise
    cfg.stability = 0.8;    // T: plasticity vs. retention

    auto engine = cortext::Cortext::Create(cfg, "memory.db", "models");

    // Process a text turn; the returned context carries the retrieved memories.
    auto context = engine->ProcessText("Bailey likes tennis balls.", "chat/main");
    for (const auto &memory : context.retrieved_memory)
    {
        std::cout << memory.id << " " << memory.source_id << "\n";
    }

    // Embed-only call: does not mutate memory state.
    auto embedding = engine->EmbedText("embed without storing");
    std::cout << "embedding dims: " << embedding.size() << "\n";

    engine->Consolidate();  // explicit shallow consolidation
    engine->Flush();        // commit buffered writes
}
Public entrypoints:
- C++ API: include/cortext/cortext.hpp
- C API: include/cortext/capi.h
- Native C++20 API and stable C ABI.
- FFI bindings for Python, Go, JavaScript/TypeScript, Dart, and WebAssembly.
- Text, audio, and image processing calls that store durable memory.
- Text, audio, and image embed-only calls that do not mutate memory state.
- Working-memory and long-term retrieval packets returned as JSON.
- Explicit Consolidate() / cortext_consolidate_json() shallow replay.
- Reset() / cortext_reset() for volatile processor-state reset that keeps durable memories.
- SQLite metadata storage plus sqlite-objstore payload storage by default.
- External database and object-store callback seams for embedders that own storage.
Cortext's required encoder is augmem/AIST-87M in the local GGUF layout under
models/AIST-87M-GGUF/. The default build downloads and verifies
AIST-87M_q8_0.gguf automatically. The engine auto-discovers
AIST-87M_q8_0.gguf or AIST-87M_q5_1.gguf, or you can pin a path with
CORTEXT_AIST_MODEL_PATH. If the model cannot be resolved, engine creation
fails instead of falling back to a different embedding space.
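The resolution behavior described above can be sketched as a small function. This is an assumption-laden illustration, not the engine's code: `resolve_aist_model` is a hypothetical name, the precedence (pinned path, then q8_0, then q5_1) is assumed, and so is the choice to reject a pinned path that does not exist rather than fall through to discovery. File existence is injected as a callback so the logic can be exercised without touching the filesystem.

```cpp
#include <functional>
#include <initializer_list>
#include <optional>
#include <string>

// Hypothetical resolver. Returns the model path, or nullopt if nothing resolves.
std::optional<std::string> resolve_aist_model(
    const std::optional<std::string>& pinned_path,  // value of CORTEXT_AIST_MODEL_PATH, if set
    const std::string& model_dir,                   // e.g. "models/AIST-87M-GGUF"
    const std::function<bool(const std::string&)>& exists) {
    if (pinned_path) {
        // Assumed: a pinned path is authoritative, so a missing file is a failure.
        if (exists(*pinned_path)) return *pinned_path;
        return std::nullopt;
    }
    // Assumed discovery order: q8_0 before q5_1.
    for (const char* name : {"AIST-87M_q8_0.gguf", "AIST-87M_q5_1.gguf"}) {
        const std::string candidate = model_dir + "/" + name;
        if (exists(candidate)) return candidate;
    }
    return std::nullopt;  // the caller fails engine creation; no fallback encoder
}
```

Returning an empty optional instead of a default path mirrors the rule that creation fails rather than switching embedding spaces.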
AIST is multimodal: text, audio, speech, and image inputs map into one retrieval space. Audio inputs use 16 kHz mono float32 PCM. Image inputs use row-major RGB/RGBA bytes with explicit width, height, and channel count.
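Callers often hold audio as 16-bit PCM and images as packed byte buffers, so a couple of preparation helpers may be useful. These are illustrative and not part of the Cortext API. Two assumptions are baked in: float samples are scaled to the range [-1, 1), which the text above does not specify, and image rows are tightly packed with no padding.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert signed 16-bit PCM samples to float32.
// Assumption: the encoder expects samples scaled to [-1, 1).
std::vector<float> pcm16_to_float32(const std::vector<std::int16_t>& pcm) {
    std::vector<float> out;
    out.reserve(pcm.size());
    for (std::int16_t sample : pcm) {
        out.push_back(static_cast<float>(sample) / 32768.0f);
    }
    return out;
}

// Byte length of a tightly packed row-major image (channels is 3 for RGB, 4 for RGBA).
std::size_t image_byte_count(std::size_t width, std::size_t height, std::size_t channels) {
    return width * height * channels;
}

// Byte offset of the first channel of pixel (x, y) in a row-major buffer.
std::size_t pixel_offset(std::size_t x, std::size_t y, std::size_t width, std::size_t channels) {
    return (y * width + x) * channels;
}
```

Checking that a buffer's size equals `image_byte_count(width, height, channels)` before handing it over catches most width/height/channel mix-ups early.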
Every database pins the encoder fingerprint that produced its embeddings,
because mixing embedding spaces silently corrupts retrieval. source_id is
opaque provenance: it is used for exact same-source grouping and hydration,
not as a hidden behavior switch.
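The fingerprint pin can be pictured as a check-or-pin step at open time. The sketch below is hypothetical: `DbHeader` and `check_or_pin_fingerprint` are invented names, the fingerprint is treated as an opaque string, and how Cortext actually computes or stores it is not shown here.

```cpp
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for per-database metadata; not Cortext's schema.
struct DbHeader {
    std::optional<std::string> encoder_fingerprint;  // empty for a fresh database
};

// Pin the fingerprint on first use; refuse to continue on a mismatch.
void check_or_pin_fingerprint(DbHeader& header, const std::string& current) {
    if (!header.encoder_fingerprint) {
        header.encoder_fingerprint = current;  // first open: record the encoder
        return;
    }
    if (*header.encoder_fingerprint != current) {
        throw std::runtime_error(
            "encoder fingerprint mismatch: database was built with a different embedding space");
    }
}
```

Failing loudly at open time is the point: a mismatch that went unnoticed would surface only later as degraded retrieval.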
Default build flags are opt-out:
- CORTEXT_FETCH_AIST_MODEL=ON downloads AIST during cmake --build.
- CORTEXT_AIST_MODEL_QUANT=q8_0 selects the quantization; use q5_1 or all.
- CORTEXT_FETCH_GGML=ON fetches and builds the bundled GGML backend.
- CORTEXT_DISABLE_OPENTELEMETRY=OFF fetches the OpenTelemetry API; exporters stay off unless CORTEXT_OPENTELEMETRY_EXPORTERS=ON.
- CORTEXT_USE_SYSTEM_GGML=ON with CORTEXT_FETCH_AIST_MODEL=OFF are overrides for packagers or offline builds, not the default release path.
Cortext separates metadata storage from payload storage. SQLite is the supported default, but embedders can own persistence topology.
- cortext::Store and cortext::Transaction define the database boundary for schema, query, and transactional work. SQLiteStore is the built-in implementation.
- cortext::ObjectStore and cortext::ObjectTransaction define content-addressed payload storage. SqlObjectStore is the current sqlite-objstore-backed implementation over a Store.
- Cortext::Create() has overloads for the default SQLite path, a caller-supplied database store, a caller-supplied object store, or both.
Alternate providers are extension points today, not bundled backends.
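"Content-addressed" means the key is derived from the payload bytes, so storing the same payload twice yields one blob. The toy below illustrates only that idea. `ToyObjectStore` is not the cortext::ObjectStore interface, and `std::hash` is a placeholder for the cryptographic digest a real store would use (this sketch ignores hash collisions).

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Toy in-memory content-addressed store; illustrative only.
class ToyObjectStore {
public:
    // Store a payload and return its content-derived key.
    std::size_t put(const std::string& payload) {
        const std::size_t key = std::hash<std::string>{}(payload);
        blobs_.emplace(key, payload);  // no-op when the key is already present
        return key;
    }

    // Look up a payload by key; returns nullptr when absent.
    const std::string* get(std::size_t key) const {
        const auto it = blobs_.find(key);
        return it == blobs_.end() ? nullptr : &it->second;
    }

    std::size_t size() const { return blobs_.size(); }

private:
    std::unordered_map<std::size_t, std::string> blobs_;
};
```

Because metadata rows only need to hold the key, the metadata store and the payload store can live in different places, which is the separation the section describes.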
The production loop is built from small operations in src/operations/.
flowchart TD
input["input<br/>(text / audio / image)"] --> perception[perception]
perception --> accumulator[stream accumulator]
accumulator --> wm[working memory]
wm --> retrieval[graph retrieval]
retrieval --> ctx[context out]
wm --> consolidation[shallow consolidation]
consolidation --> ltm[long-term store]
retrieval -. memory-usage events,<br/>prediction error,<br/>contextual gain .-> control
ltm -. storage pressure .-> control
control["<b>homeostatic control</b><br/>Focus / Sensitivity / Stability<br/>storage pressure / neuromodulators"]
control -. modulates write gate,<br/>attention width, decay,<br/>thresholds, consolidation cadence .-> perception
control -.-> accumulator
control -.-> wm
control -.-> consolidation
classDef loop fill:#fef3c7,stroke:#b45309,color:#78350f
classDef flow fill:#e0f2fe,stroke:#0369a1,color:#0c4a6e
class control loop
class input,perception,accumulator,wm,retrieval,ctx,consolidation,ltm flow
Key pieces:
- focus_feedback, sensitivity_feedback, and stability_feedback adjust the three control knobs from retrieval and usage outcomes.
- accumulator, accumulator_scores, spike_bypass, and write_gate turn streaming signals into bounded memory writes.
- embedding_prediction_error contributes surprise to the feedback path.
- neuromodulators and emotion_cascade modulate encoding and reconsolidation strength.
- reconsolidation and constructive_recall_internal update existing memory surfaces on re-exposure instead of duplicating memories.
- graph_retrieval combines embedding similarity, durable graph edges, reconstruction-aware ranking, and temporal scoring.
- soft_anchor maintains provisional anchor evidence without treating it as immutable ground truth.
The three Config fields are the primary control knobs. Most other tunable
parameters derive from them through transformations specified in the paper.
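As a rough picture of how the graph_retrieval piece listed above might blend its signals, here is a toy scoring function. Everything in it is assumed for illustration: the linear blend, the weights, the one-day half-life, and the names `ToyWeights` and `toy_score`. It also omits reconstruction-aware ranking. The real scoring is specified in the paper and implemented in src/operations/.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Cosine similarity over the shared prefix of two vectors; 0 if either has zero norm.
double cosine(const std::vector<float>& a, const std::vector<float>& b) {
    double dot = 0.0, norm_a = 0.0, norm_b = 0.0;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
        dot += static_cast<double>(a[i]) * b[i];
        norm_a += static_cast<double>(a[i]) * a[i];
        norm_b += static_cast<double>(b[i]) * b[i];
    }
    if (norm_a == 0.0 || norm_b == 0.0) return 0.0;
    return dot / std::sqrt(norm_a * norm_b);
}

// Hypothetical blend weights; not Cortext's parameters.
struct ToyWeights {
    double similarity = 0.6;
    double edge = 0.3;
    double recency = 0.1;
    double half_life_s = 86400.0;  // recency halves every day
};

// Linear blend of embedding similarity, graph edge strength, and recency.
double toy_score(double cos_sim, double edge_strength, double age_s,
                 const ToyWeights& w = ToyWeights{}) {
    const double recency = std::exp2(-age_s / w.half_life_s);
    return w.similarity * cos_sim + w.edge * edge_strength + w.recency * recency;
}
```

In a closed-loop design, weights like these would not be constants; they would be among the parameters that the control knobs modulate between inputs.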
The C ABI lives in include/cortext/capi.h. JSON-returning helpers allocate
strings owned by Cortext; callers must release them with cortext_string_free.
| Purpose | C ABI |
|---|---|
| create | cortext_create_with_config |
| process text/audio/image | cortext_process_*_json |
| embed text/audio/image only | cortext_embed_*_json |
| explicit shallow consolidation | cortext_consolidate_json |
| commit buffered writes | cortext_flush |
| reset volatile state | cortext_reset |
| diagnostics | cortext_last_error, cortext_version |
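The ownership rule for returned strings maps naturally onto a `std::unique_ptr` with a custom deleter. The sketch below compiles without Cortext by using stand-ins: `fake_json_call` and `fake_string_free` are hypothetical placeholders for a cortext_*_json call and cortext_string_free. The real signatures are in include/cortext/capi.h and are only assumed here to take and return `char*`.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <string>

int g_live_strings = 0;  // counts stand-in allocations that have not been freed

// Stand-in for a JSON-returning C ABI call (hypothetical).
char* fake_json_call() {
    const char* text = "{\"ok\":true}";
    const std::size_t len = std::strlen(text);
    char* out = static_cast<char*>(std::malloc(len + 1));
    if (!out) return nullptr;
    std::memcpy(out, text, len + 1);
    ++g_live_strings;
    return out;
}

// Stand-in for cortext_string_free (assumed signature).
void fake_string_free(char* s) {
    if (s) {
        std::free(s);
        --g_live_strings;
    }
}

// Owning handle: the deleter runs on every exit path, including exceptions.
using OwnedCString = std::unique_ptr<char, void (*)(char*)>;

// Call the C function, copy the result into a std::string, release the original.
std::string call_and_copy() {
    OwnedCString raw(fake_json_call(), &fake_string_free);
    if (!raw) return std::string();
    return std::string(raw.get());
}
```

In real code, swapping the two stand-ins for the library functions should be the only change needed, provided the header's signatures match the assumption above.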
Bindings live under bindings/:
- bindings/python: ctypes
- bindings/go: cgo
- bindings/javascript: Node-API plus TypeScript declarations
- bindings/dart: dart:ffi
- bindings/wasm: browser ES-module wrapper over the WebAssembly C ABI
Build the shared library for FFI consumers:
cmake --preset ffi-release
cmake --build --preset ffi-release --target cortext
Node's native addon uses the Node-enabled preset:
cmake --preset ffi-release-node
cmake --build --preset ffi-release-node --target cortext cortext_node
Zig can also build the shared library. The default Zig path downloads and verifies AIST and compiles the bundled GGML CPU backend; prebuilt GGML paths are an override for packagers.
zig build check
The browser build uses Emscripten and emits an ES module plus a .wasm payload:
./build-wasm.sh
Outputs:
build-wasm/dist/wasm/cortext.js
build-wasm/dist/wasm/cortext.wasm
The module exports the public C ABI and malloc/free. A small JavaScript
wrapper lives in bindings/wasm/cortext.js, and a browser demo lives in
examples/web/.
The browser build still needs the AIST GGUF model. For demos, either select the model file in the browser UI or embed the model directory into the virtual filesystem at build time:
./build-wasm.sh -DCORTEXT_WASM_PRELOAD_MODELS_DIR="$PWD/models/AIST-87M-GGUF"
Serve the repository root after building:
python3 -m http.server 8000
Then open http://localhost:8000/examples/web/.
- include/, src/: public headers and C++ implementation.
- src/operations/: control-loop and memory pipeline operations.
- tests/: Catch2 test suite.
- examples/: benchmarks, demos, and smoke tests.
- bindings/: Python, Go, JavaScript/TypeScript, Dart, and WebAssembly FFI.
- scripts/, tools/: experiment harnesses and offline utilities.
- docs/paper/: manuscript source and generated markdown.
- models/: local model assets.
- third_party/: vendored native dependencies.
The architecture is specified in the manuscript generated at
docs/paper/_manuscript/index.md from the source sections in
docs/paper/sections/. To understand why the loop is shaped the way it is, start
with the knob derivations, stability/plasticity analysis, homeostatic threshold
control, the consolidation section, and the release evidence.
Regenerate it with:
QUARTO_DISABLE_GIT=1 QUARTO_DISABLE_GITHUB=1 quarto render docs/paper
Cortext began for a personal reason. In 2022, my father-in-law was diagnosed with dementia. Since then I have been focused on building systems that help people with memory loss preserve continuity, confidence, and independence.
The same architecture is useful for long-horizon LLM memory, but the primary motivation is human. Cortext is built for realtime streams where the system has to notice what matters, surface relevant context, and avoid forcing the user to manage memory by hand. A care context needs homeostasis. Salience, confusion, and emotional state change through the day, and that is exactly what an open-loop memory system cannot track.