Reality Intelligence: Perception without Hallucination

The authoritative local world-model system for high-fidelity scene understanding.

Architecture · Smriti Memory · Setu-2 Bridge · Quick Start · API

"Memory is not a caption waiting to be written. It is a geometry waiting to be understood."

The Reality Intelligence Manifesto

Toori is a local-only, JEPA-powered world-model ecosystem that understands the physical world through visual truth, not generative imagination. It replaces hallucinatory captions with geometric grounding, enabling high-fidelity scene understanding and private semantic memory.

👁️ Observe Natively: Zero-shot perception via deep geometric grounding.
🧠 Remember Semantically: Private memory structured by physical reality.
💬 Query Cinematically: Natural language reasoning on visual truth.

The intelligence layer for the era of embodied perception. 100% Local. Zero Cloud. Just the Truth.

What Is TOORI?

TOORI is a local-first AI system that builds a world model of your visual environment — in real time from a live camera, and over time from your personal photo and video library.

At its core, TOORI uses Joint Embedding Predictive Architecture (JEPA) — the class of energy-based, non-generative models that represent the visual world as abstract geometry rather than text. This isn't another app that runs your photos through a language model and stores flat captions. TOORI stores the geometry itself. The memory is the embedding.

The result: a system that can tell you when you wore that red jacket in Kolkata without ever hallucinating context, confusing a telescope for a wig, or uploading a single byte to a cloud server.

The Problem We Solve

Every major photo AI — Apple Photos, Google Photos, and their ilk — works like this:

Photo → Generative LLM/VLM → Generate caption → Store caption → Search captions

Generation = hallucination. When your camera captures you sitting at a desk with a telescope in the background, generative systems often assert "a person with a cylindrical object on their shoulder" or "someone wearing an unusual hair accessory." They confuse context. They invent details. They are fundamentally unreliable for high-stakes recall.

TOORI works like this:

Photo/Frame → JEPA Engine → Embed geometry → Store embedding → Query with energy bridge

No generation. No captions. No hallucinations. The visual truth is never compressed through a language bottleneck.

Mission & Vision

The mission is to turn abstract JEPA ideas into a practical, operator-facing system. We want you to point a camera at the real world and directly observe:

what the model expects to stay true
what actually changed
whether an entity persisted through occlusion or motion
whether a temporal world model outperforms caption-only baselines

The long-term vision is a camera-native cognition layer that can sit behind many products:

A scientific desktop proof surface for demonstrating JEPA-style behavior.
A plugin/runtime boundary that other applications can call as a perception-and-memory layer.
A cross-platform stack for desktop, mobile, robotics-adjacent interfaces, and ambient intelligence workflows.

Smriti — Personal Memory

Smriti is TOORI's personal media organization surface. It watches your photo and video library, indexes every item using JEPA embeddings, and lets you recall any moment with a natural language query — all on your hardware, in complete privacy.

Why Smriti Defeats Global Photo AIs

	Apple / Google Photos	TOORI Smriti
Core AI	CLIP / VLM generation	JEPA energy-based embeddings
Hallucination	Inherent (LLMs confabulate context)	Bounded (ECGD gate blocks uncertain output)
Memory model	Calendar + face clusters + NLP index	JEPA latent geometry + Semantic Anchor Graph
Query engine	Text→image CLIP similarity	Setu-2 EBM: language → JEPA metric space
Personalization	Global model (no per-user adaptation)	W-matrix updates directly from your ✓/✗ feedback
Privacy	Cloud-dependent	100% local — nothing leaves your machine
Video	Frame-level only	V-JEPA temporal prediction + TPDS depth separation

What Smriti does today

Automatic ingestion daemon — Watch any folder via filesystem events, with SHA-256 deduplication and tuned back-pressure queuing (PyAV for video, watchdog for folders).
Natural language recall — Ask "the evening I was by the river". Results ranked by JEPA energy, not keywords.
Mandala view — A massive 2D Canvas semantic cluster map of your library, organized by visual proximity. Backed by a high-performance Web Worker.
Deepdive patch grid — Click any memory to see an immersive 14×14 JEPA energy patch grid. See exactly why a semantic template recognized it. (Press E to toggle, F for fullscreen, Esc to close).
Person Journal — Tag people to generate a living timeline and co-occurrence graph showing social topology.
Storage control — Set GB budgets, view disk usage, prune failed data, and perform one-click copy-first data migrations to external drives.

Setu-2 — The Language Bridge

Setu-2 solves the core problem: how do you query in natural language without making the AI describe images in natural language?

Instead of an LLM generation step, Setu-2 is an Energy-Based Model (EBM) bridge:

User query: "red jacket Kolkata"
     ↓  tokenize + P_θ projector (4-layer MLP)
Query embedding  ∈  ℝ³⁸⁴  (JEPA metric space)
     ↓  E(q, v) = ‖W(q − v)‖²  for each v in corpus
Rank by ascending energy  →  top-K results
     ↓  ECGD gate filters uncertain proposals
✓   Hallucination-bounded recall

The W-matrix (384×384) personalizes the metric. When you use the ✓ and ✗ buttons in the Recall UI, the W-matrix pulls that query-memory pair closer or pushes them apart. The system learns your visual vocabulary natively. Zero LLM calls at runtime, zero generative steps, zero shared global training data.

System Design & Architecture

High-Level Architecture

Architecture diagram — data flow from capture through JEPA to proof surfaces:

flowchart LR
  Camera["Live Camera / File Input"] --> Runtime["Toori Runtime"]
  Runtime --> Perception["Primary Local Perception"]
  Perception --> JEPA["JEPA Engine"]
  JEPA --> Atlas["Epistemic Atlas"]
  JEPA --> Talker["Selective Talker"]
  Perception --> Observation["Observation Storage"]
  Observation --> World["World Model Layer"]
  World --> Living["Living Lens Proof Surface"]
  World --> Challenge["Challenge Evaluation"]
  Challenge --> Compare["Baseline Comparison"]
  World --> ToolState["Tool-State Observer"]
  World --> Planning["Planning Rollout Engine"]
  Planning --> Recovery["Recovery Benchmarks"]
  Runtime --> Reasoning["Optional Local / Cloud Reasoning"]
  Reasoning --> Observation
  Runtime --> Smriti["Smriti Memory System"]
  Smriti --> Ingestion["Ingestion Daemon"]
  Smriti --> Recall["Recall + Journals + HUD"]
  World --> Events["WebSocket Events"]
  Events --> Living
  Events --> Recall
  Events --> Plugins["Plugin SDKs / Host Apps"]
  Challenge --> Docs["Reports / Results"]

Toori is organized into seven functional layers:

Capture layer — Real camera frames or uploaded images enter through Browser, Electron, or mobile clients.
Observation layer — The runtime stores observations, thumbnails, embeddings/descriptors, provenance, and session context.
World-model layer — Temporal state is built on top of observations through SceneState, EntityTrack, PredictionWindow, continuity signals, persistence signals, and challenge runs.
Planning layer — GroundedEntity and GroundedAffordance objects derived from live frames feed ActionToken-ranked rollout branches. RolloutComparison ranks Plan A vs Plan B. Uncertainty gates suppress low-confidence branches.
Proof layer — Living Lens exposes predicted vs observed state, stability/change, persistence, challenge evaluation, and baseline comparison. The Recovery Lab adds closed-loop recovery benchmarks and tool-state grounding.
Smriti layer — The semantic memory system persists media, builds cluster graphs, performs guarded recall, and surfaces person/location timelines without breaking the main live runtime.
Extension layer — The same runtime is available through HTTP, WebSocket events, and generated plugin SDKs.

The VL-JEPA Pipeline (5 Stages)

Smriti relies on an intentionally heavily-gated pipeline:

Stage	File Location	What it does
TPDS	`depth_separator.py`	Temporal Parallax Depth Separator: computes foreground, midground, and background strata directly from JEPA motion energy deltas (no depth sensor required).
SAG	`anchor_graph.py`	Semantic Anchor Graph: matches patch topology against 8 predefined bootstrap templates (e.g. `person_torso`, `chair_seated`). Eliminates severe semantic confusion.
CWMA	`world_model_alignment.py`	Cross-Modal World Model Alignment: injects physical co-occurrence priors via the 50KB SCPT table to penalize impossible configurations.
ECGD	`confidence_gate.py`	Epistemic Confidence Gate: A mathematical 3-condition lock. Blocks low-consistency matches and emits uncertainty maps instead of hallucinated answers.
Setu-2	`setu2.py`	JEPA-to-Language Bridge: Grounded EBM query scoring and template-based descriptions.

The Sentinel Production Contract

The system's integrity revolves around one non-negotiable invariant test:

def test_telescope_behind_person_not_described_as_body_part():
    """
    A telescope in the background must NEVER be described as
    a body part (shoulder, wig, hair, arm, etc.) when a person
    sits in the foreground.
    """

If this test fails, all engineering work stops until it passes. It is the sentinel proving TPDS, SAG, CWMA, and ECGD are fully operational.

Quick Start

Prerequisites

macOS (Apple Silicon M1/M2/M3 highly recommended) or Linux
Python 3.11
Node.js 20+

The runtime is validated on Python 3.11 only. Launching the JEPA backend with a newer python3 alias can trip unsupported native-extension crashes in the torch/transformers/onnx stack, so prefer bash scripts/run_runtime.sh.

1. Install & Launch

# Clone the repository
git clone https://github.com/NeoOne601/Toori.git
cd Toori

# Install Python backend dependencies
python3.11 -m pip install -r requirements.txt

# Launch the FastAPI JEPA runtime in Terminal A
TOORI_DATA_DIR=.toori bash scripts/run_runtime.sh --reload

# Install and start the Desktop UI in Terminal B
cd desktop/electron
npm install
npm run web

Open: http://127.0.0.1:4173

(Browser mode is recommended natively for active proof development, as it seamlessly handles macOS Camera permission flows avoiding unsigned Electron bundle identifier quirks. Legacy ONNX models can optionally be downloaded using python3.11 scripts/download_desktop_models.py)

2. Configure Desktop Settings

Open the Settings menu.

World Model: V-JEPA2 is now configured dynamically via a JSON settings mirror (no hard-coded constants). Use World Model in Settings to point Toori at a local V-JEPA2 weight directory, choose the HuggingFace cache directory used for local-only loads, tune n_frames, and inspect whether the last tick stayed on vjepa2 or degraded to the honest dinov2-vits14-onnx fallback. The WorldModelStatus panel reports the configured encoder, the encoder actually used on the last tick, and an explicit degradation reason/stage if V-JEPA2 fell back.
Providers: M1 users default to DINOv2 perception. ONNX remains a proposal-box compatibility path, while ollama, mlx, and cloud are optional language sidecars — they must never overwrite authoritative world-model metrics or rollout rankings.
Themes: Choose from system, dark, light, graphite, sepia, high_contrast_dark, and high_contrast_light. Theme changes save immediately; other settings remain draft until you save them.
Smriti Storage: Configure the heavy-data directory. (On an M1 iMac with a 256 GB SSD, point Smriti at an external drive before indexing a large video corpus).

3. Smriti Quick Workflow

In Settings -> Smriti Storage, use + Add Folder to select your Photos directory or watch folder.
Under Smriti -> HUD, watch the background queue and ingestion workers process your life.
Once ingested, go to Smriti -> Recall and ask questions: "red jacket", "snow at night", "living room".
Click a result. Press E to view JEPA energy overlays.
Use ✓ and ✗ buttons on results to train your personalized Setu-2 W-matrix.
Tag a person to automatically generate their Journals and social topology graph.

4. Living Lens & Recovery Lab Workflow

Use the Live Lens for live camera capture debugging. The Living Lens proof surface runs continuously in passive mode to track entity persistence, predicting moment-to-moment reality.

The Recovery Lab (inside Living Lens) extends the scene-state pipeline with:

Tool-State Observer — ground browser or desktop tool evidence into the same world-state pipeline via POST /v1/tool-state/observe.
Planning Rollout — submit a scene + candidate actions to get ranked Plan A / Plan B branches scored by the world model via POST /v1/planning/rollout.
Recovery Benchmarks — run closed-loop hybrid recovery benchmarks and retrieve historical results via POST/GET /v1/benchmarks/recovery/.

Evaluate baselines by running an occlusion challenge and hit Copy Share Text for a portable recap.

Environment & Configuration

Variable	Default	Description
`TOORI_DATA_DIR`	`.toori`	Root runtime data directory
`TOORI_SMRITI_DATA_DIR`	`{data_dir}/smriti`	Override Smriti specific data location
`TOORI_PUBLIC_URL`	`https://github.com/NeoOne601/Toori`	Public URL used in auto-generated Share CTAs
`TOORI_DINOV2_DEVICE`	`cpu`	Set to `mps` for Apple Silicon neural engine acceleration

Project Structure & What's Included

cloud/
  api/main.py             — Loopback runtime entrypoint (port 7777)
  api/tests/              — Comprehensive test suite (197 tests passing)
  jepa_service/engine.py  — Core JEPA engine & spatial energy maps
  runtime/
    smriti_ingestion.py   — Ingestion daemon & watch queues
    smriti_storage.py     — SmetiDB + FAISS-lite vector index (+ RecoveryBenchmarkRun persistence)
    smriti_migration.py   — Copy-first data migration service
    setu2.py              — Setu-2 EBM language bridge
    service.py            — Runtime contracts, states, configs (V-JEPA2 dynamic config + planning logic)
    world_model.py        — Grounded entities, affordances, rollout comparison, recovery benchmark builder
    models.py             — Canonical schema: ActionToken, GroundedEntity, GroundedAffordance,
                            RolloutBranch, RolloutComparison, RecoveryBenchmarkRun, WorldModelStatus, WorldModelConfig
    proof_report.py       — PDF proof report generator (now byte-streaming via _render_pdf_bytes)
  search_service/main.py  — Compatibility search layer
  perception/
    vjepa2_encoder.py     — V-JEPA2 encoder (dynamic JSON-based config: _resolve_model_id, _resolve_n_frames)

desktop/electron/         — Electron shell & React/Vite operator UI
  src/tabs/SmritiTab.tsx  — Main workspace
  src/components/smriti/  — Mandala Canvas, Recall grid, HUD, Deepdive

mobile/                   — Native clients
  ios/TooriApp/           — SwiftUI client sources
  android/com/toori/app/  — Jetpack Compose client sources

sdk/                      — Plugin SDK outputs (Python, TypeScript, Swift, Kotlin)
docs/                     — Extended system-design and manual architecture specs

Core Operational API Highlights

Method	Route	Description
`POST`	`/v1/analyze`	Single frame camera analysis
`POST`	`/v1/living-lens/tick`	Advance live world-model state
`GET`	`/v1/world-model/status`	Report configured encoder, last tick result, and degradation stage (`WorldModelStatus`)
`GET`	`/v1/world-model/config`	Retrieve current V-JEPA2 dynamic config (`WorldModelConfig`)
`PUT`	`/v1/world-model/config`	Update V-JEPA2 model path, cache directory, and `n_frames` at runtime
`POST`	`/v1/tool-state/observe`	Ground browser or desktop tool state into the same world-state pipeline
`POST`	`/v1/planning/rollout`	Rank local Plan A / Plan B action-conditioned branches (`RolloutComparison`)
`POST`	`/v1/benchmarks/recovery/run`	Run the hybrid recovery benchmark pack (`RecoveryBenchmarkRun`)
`GET`	`/v1/benchmarks/recovery/{id}`	Fetch a stored recovery benchmark run
`GET`	`/v1/world-state`	Get Epistemic Atlas and scene threads
`WS`	`/v1/events`	Stream live observations & syncs
`POST`	`/v1/share/observation`	Build shareable observation text
`POST`	`/v1/smriti/ingest`	Ingest media or watch folder
`POST`	`/v1/smriti/recall`	Setu-2 semantic recall query
`POST`	`/v1/smriti/tag/person`	Tag a person in media
`GET`	`/v1/smriti/person/{name}/...`	Person journal and co-occurrence graphs
`GET`	`/v1/smriti/clusters`	Force graph geometry points (Mandala)
`POST`	`/v1/smriti/storage/migrate`	Non-destructive deep migration

Proof Surface Glossary

Live Lens: Manual capture and debugging surface.
Living Lens: Continuous proof surface demonstrating real-time persistence and continuity.
Recovery Lab: World-model planning workspace inside Living Lens — tool-state grounding, ranked rollout branches, and benchmark evaluation.
JEPA Energy / Residual: Pure loss target. Low energy = prediction consistency.
Epistemic Atlas: Spatial tracking of entities and their relationship threads across time.
Selective Talker: Adaptive event narration. Only speaks on extreme JEPA surprise or track shifts.
GroundedEntity: A tracked scene element attached to world-model state with a spatial domain (table_surface, hand, tool, etc.).
GroundedAffordance: An affordance (reachable, graspable, blocked, etc.) attached to a GroundedEntity as predicted by the world model.
ActionToken: A ranked candidate action derived from grounded entities and affordances, used to seed rollout branches.
RolloutComparison: A ranked pair of Plan A vs Plan B rollout branches. Each branch scores predicted outcome + uncertainty.
RecoveryBenchmarkRun: A persisted benchmark run that evaluates camera + tool planning across a recovery scenario set.
WorldModelStatus: Live diagnostic: configured encoder, encoder actually used on last tick, degradation reason, and fallback stage.
WorldModelConfig: Dynamic V-JEPA2 parameters (model path, cache directory, n_frames) that can be updated at runtime via PUT /v1/world-model/config.
Responsive Grid: Modular Living Lens layout separating understanding, pulse, and relinking.
Saliency Filtering: Rejects weak or irrelevant proposals from dynamic entities.
Passive mode: Continuous monitoring updating the scene model seamlessly.

Performance Goals

Benchmarked on Apple M1 iMac (8GB Unified Memory), cpu device:

Mean JEPA Tick: 48.5ms
Recall Latency: < 500ms for 10,000 items (empty corpus: < 100ms)
FAISS Index Footprint: ~5MB (10k items)
Idle Runtime RAM: ~180MB

Production Invariants & Tests

Run the backend verification suite:

pytest -q cloud/api/tests cloud/jepa_service/tests \
       cloud/search_service/tests cloud/monitoring/tests tests/test_readme.py

Run the rigid production gate (12 tests that lock in the Sentinel contract):

pytest -v cloud/api/tests/test_smriti_production.py

Run Desktop Typechecks:

cd desktop/electron && npm run typecheck && npm run build

Critical Invariants & Notes:

torch imports are absolutely forbidden anywhere outside of cloud/perception/.
JEPA calculation remains pure numpy float32.
Storage Migrations are non-destructive and verify the copy before updating destinations.
EMA updates strictly happen before predictor forward steps.
Missing cloud reasoning models degrade gracefully into local memory-storage operations.
Native mobile wiring directories exist but require separate packaging steps outside the desktop Python runner loop.
ollama and MLX are explanatory sidecars only — they must never overwrite authoritative world-model metrics, rollout rankings, or recovery benchmark winners.
V-JEPA2 encoder configuration is now dynamic. The canonical settings are written to a JSON mirror on disk by _write_settings_mirror() and re-read by _resolve_model_id() / _resolve_n_frames() on each load. Never hard-code encoder parameters.
WorldModelStatus must stay truthful: report the configured encoder, the encoder actually used on the last tick, and explicit degradation reason/stage when V-JEPA2 falls back to surrogate.

Roadmap / Sprints Summary

Sprint	Tests	Delivered
Baseline	84	Original TOORI (Camera + world model basics)
Sprint 1	139	Full JEPA Pipeline (TPDS, SAG, CWMA, ECGD, Setu-2)
Sprint 2+3	146	PyAV integration, Ingestion, Smriti Recall & Journals UI
Sprint 4	162	Storage config, quotas, Watch Folders management
Sprint 5	197	Safe Migrations, W-Matrix Feedback, Web Worker Mandala, WCAG AA
Sprint 6	235	World Model Foundation: GroundedEntities, Affordances, Planning Rollouts, Recovery Benchmarks, V-JEPA2 Dynamic Config
Sprint 7	—	Mobile companion · federated Setu-2 · signed macOS bundle

Accessibility

Smriti UI rigorously complies with WCAG 2.1 AA standards:

Extensive keyboard trap mechanisms inside modals (Deepdive focus retention) + Skip link options.
Native ARIA live regions for background ingestion indexing status.
Respect for global prefers-reduced-motion logics.
#9db1c6 text on #07111b background, guaranteeing superior >4.6:1 contrast ratio.

Contributing

We welcome PRs for perception architectures and new extensions! See CONTRIBUTING.md. Before filing:

pytest -v cloud/api/tests/test_smriti_production.py — ensure 12/12 passing.
npm run typecheck — zero TypeScript leakages.
The telescope Sentinel test must NEVER be tampered with.
No torch in root or Smriti modules.

License

Engine and Plugin SDK: Apache 2.0
UI proof surfaces / Translators logic: CC-BY-SA 4.0 (unless otherwise noted).
Otherwise covered under standard MIT License. See LICENSE.

"Memory is not a caption waiting to be written.
It is a geometry waiting to be understood."

Built with JEPA principles · FastAPI · Electron

Name		Name	Last commit message	Last commit date
Latest commit History 56 Commits
.github		.github
.smriti_build		.smriti_build
cloud		cloud
desktop/electron		desktop/electron
docs		docs
fastlane		fastlane
graphify-out		graphify-out
mobile		mobile
models		models
output/pdf		output/pdf
scripts		scripts
sdk		sdk
smriti-kit		smriti-kit
tests		tests
tmp/pdfs		tmp/pdfs
.gitignore		.gitignore
.graphify_detect.json		.graphify_detect.json
.pytest_results.txt		.pytest_results.txt
AGENTS.md		AGENTS.md
CLAUDE.md		CLAUDE.md
CONTRIBUTING.md		CONTRIBUTING.md
Codex_SharedDesignSystem_performancepolish_viralmoments_Session3.md		Codex_SharedDesignSystem_performancepolish_viralmoments_Session3.md
LICENSE		LICENSE
Llama-Modelfile		Llama-Modelfile
README.md		README.md
conftest.py		conftest.py
pytest.ini		pytest.ini
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

Reality Intelligence: Perception without Hallucination

"Memory is not a caption waiting to be written. It is a geometry waiting to be understood."

The Reality Intelligence Manifesto

What Is TOORI?

The Problem We Solve

Mission & Vision

Smriti — Personal Memory

Why Smriti Defeats Global Photo AIs

What Smriti does today

Setu-2 — The Language Bridge

System Design & Architecture

High-Level Architecture

The VL-JEPA Pipeline (5 Stages)

The Sentinel Production Contract

Quick Start

Prerequisites

1. Install & Launch

2. Configure Desktop Settings

3. Smriti Quick Workflow

4. Living Lens & Recovery Lab Workflow

Environment & Configuration

Project Structure & What's Included

Core Operational API Highlights

Proof Surface Glossary

Performance Goals

Production Invariants & Tests

Roadmap / Sprints Summary

Accessibility

Contributing

License

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages