Earth as memory, for real-world agents.
Where emem has attested facts right now. 1440×720 plate-carrée. The same SVG renders live at /v1/coverage_map.svg.
Mumbai · Manhattan · Tokyo, painted by Copernicus DEM elevation. Each image is a live endpoint URL; click to see the latest signed render. /docs/gallery has the full set.
An LLM asked what is at a place will usually guess. It has no stable handle for the patch of ground at 19.07° N, 72.87° E and no audit trail for the number it returns. emem is the missing handle.
emem is a shared memory for AI agents that connects facts and gets better over time — and every answer is signed, so anyone can check it without trusting the server.
Here is how it works, in plain terms first. A cell64 addresses a place the way a token addresses text in an LLM: every patch of ground on Earth gets a 64-bit identifier (about 9.55 m on a side at the equator). A fact is one measurement at one cell, keyed by cell, band, and time.

Each fact is signed. emem packs the answer's bytes in a fixed, canonical order (canonical CBOR) so they are identical on every machine, then takes a one-way fingerprint of those bytes (a blake3 hash: change one byte and the fingerprint changes completely). That fingerprint is the fact's content id: a 26-character string that names the bytes themselves. emem then signs the content id with an ed25519 key, the same kind of digital signature that secures SSH and HTTPS. The signer is the responder.

Paste the content id into a chat and a colleague can pull the same bytes from any responder and check the signature in their browser at /verify, with no need to trust the server it came from. The hosted node is https://emem.dev; there are no API keys, no accounts, and the same handlers answer both MCP and plain REST.
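The content-id idea can be sketched in a few lines of Python. This is an illustration only, not emem's encoder: sorted-key JSON stands in for canonical CBOR, BLAKE2b from the standard library stands in for BLAKE3, and the 16-byte digest length is an assumption chosen because 16 bytes is what 26 unpadded base32 characters hold. The function name and the sample value are made up.

```python
import base64
import hashlib
import json


def content_id(fact: dict) -> str:
    """Derive a 26-character id from a fact's canonical bytes (illustrative)."""
    # Stand-in for canonical CBOR: sorted keys and fixed separators, so the
    # same fact serialises to the same bytes regardless of key order.
    canonical = json.dumps(fact, sort_keys=True, separators=(",", ":")).encode("utf-8")
    # Stand-in for BLAKE3 (not in the standard library): BLAKE2b, 16-byte digest.
    digest = hashlib.blake2b(canonical, digest_size=16).digest()
    # Lowercase base32 without padding: 16 bytes encode to 26 characters.
    return base64.b32encode(digest).decode("ascii").rstrip("=").lower()


# Made-up fact for illustration; the value is not a real reading.
fact = {"cell": "defi.zb493.xuqA.zcb5f", "band": "weather.temperature_2m",
        "tslot": "t.aaaaagy", "value": 27.4}
reordered = {"value": 27.4, "tslot": "t.aaaaagy",
             "band": "weather.temperature_2m", "cell": "defi.zb493.xuqA.zcb5f"}
tampered = dict(fact, value=27.5)

cid = content_id(fact)
```

Key order does not change the id, while changing a single value does. That is the property that lets two machines agree on a name for the same bytes.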
When an agent asks for a band at a cell that has no signed fact yet, the responder fetches the underlying tile through one of its 46 upstream sources, signs the result as its own primary attestation, persists it, and returns it in the same response. The cold path takes about 180 ms; warm reads are sub-ten milliseconds. Every cell on Earth answers cite-ably from day one, without a pre-seeded corpus. Five of those 46 schemes are declared but not yet wired (openet.30m.daily, dynamic_world.v1, tropomi.s5p.ch4, tropomi.s5p.no2, viirs.dnb.monthly), so they return a typed Absence rather than data — the catalog never promises more than it can sign.
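The read-through behaviour described above can be pictured as a small cache-aside loop. The sketch below is a toy model under loud assumptions: the class and field names are hypothetical, the band key and the source scheme are treated as the same string, signing is omitted, and the Absence reason strings are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple, Union


@dataclass(frozen=True)
class Fact:
    cell: str
    band: str
    value: float


@dataclass(frozen=True)
class Absence:
    cell: str
    band: str
    reason: str  # hypothetical reason strings, not emem's wire values


class Responder:
    def __init__(self, fetchers: Dict[str, Optional[Callable[[str], float]]]):
        # band -> fetcher; None marks a scheme that is declared but not wired.
        self.fetchers = fetchers
        self.store: Dict[Tuple[str, str], Fact] = {}

    def recall(self, cell: str, band: str) -> Union[Fact, Absence]:
        key = (cell, band)
        if key in self.store:            # warm path: already persisted
            return self.store[key]
        if band not in self.fetchers:
            return Absence(cell, band, "unknown_band")
        fetcher = self.fetchers[band]
        if fetcher is None:              # declared but unwired: typed Absence
            return Absence(cell, band, "declared_but_unwired")
        fact = Fact(cell, band, fetcher(cell))  # cold path: fetch (signing omitted)
        self.store[key] = fact                  # persist before returning
        return fact
```

A second recall for the same (cell, band) is served from the store without touching the upstream, which is the distinction between the cold and warm paths.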
Above the spatial fact store sits a writable memory substrate the agent owns. memory_create writes content-addressed, signed files into /memories/. memory_search runs BGE-768 embeddings against a LanceDB IVF_PQ partition over those files, so the agent can find its own past notes even when it paraphrases them. memory_contradictions scores disagreement between attesters at the same (cell, band, tslot) per band kind: scalar by normalised spread, vector by mean cosine, categorical by mode-share. memory_bundle composes N facts into one signed envelope (memb:<bundle_cid>) that resolves byte-identically on any peer. Each memory file carries a kind from the CoALA taxonomy (episodic / semantic / procedural / resource); writes can be capability-bound to a specific ed25519 attester so paths under /memories/by_attester/<pubkey>/... reject any signer that isn't their owner. The same signing surface that proves a Sentinel-2 reading is real proves an agent's own memory is unmodified.
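The three per-kind disagreement scores can be illustrated as follows. The text names the measures but not their exact formulas, so the definitions below are assumptions: spread normalised by the mean absolute value, one minus the mean pairwise cosine, and one minus the share of the most common label. Function names are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def scalar_disagreement(values: Sequence[float]) -> float:
    """Spread (max - min) divided by the mean absolute value (assumed form)."""
    scale = sum(abs(v) for v in values) / len(values)
    return 0.0 if scale == 0 else (max(values) - min(values)) / scale


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)  # assumes non-zero vectors


def vector_disagreement(vectors: Sequence[Sequence[float]]) -> float:
    """One minus the mean pairwise cosine; needs at least two vectors."""
    pairs = [(i, j) for i in range(len(vectors)) for j in range(i + 1, len(vectors))]
    mean_cos = sum(cosine(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)
    return 1.0 - mean_cos


def categorical_disagreement(labels: Sequence[str]) -> float:
    """One minus the share held by the most common label."""
    mode_count = Counter(labels).most_common(1)[0][1]
    return 1.0 - mode_count / len(labels)
```

In each case 0 means the attesters agree exactly and larger values mean more disagreement, so the three kinds can be read on a comparable footing.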
Every read primitive in the substrate carries a bi-temporal axis. as_of_tslot returns the latest fact whose observation time is on or before the bound — what the world looked like then. as_of_signed_at returns the latest fact whose signing time is on or before the bound — what the system knew then. Set both and both predicates hold simultaneously. The receipt carries an as_of block when the bound is non-empty, so an auditor in 2027 takes a 2026 receipt to any peer and replays the exact same query without trusting our server. /v1/memory/sse?path_prefix=&kind=&attester= opens a Server-Sent Events stream filtered server-side; a compliance subscriber receives every write the moment its sled commit lands.
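A minimal model of the two bounds, assuming integer timestamps and a hypothetical in-memory list of facts. The tie-break (latest observation time first, then latest signing time) is an assumption; the text says only that both predicates must hold when both bounds are set.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Fact:
    tslot: int      # observation time (valid time)
    signed_at: int  # signing time (transaction time)
    value: float


def recall_as_of(facts: Iterable[Fact],
                 as_of_tslot: Optional[int] = None,
                 as_of_signed_at: Optional[int] = None) -> Optional[Fact]:
    """Return the latest fact that satisfies every bound that is set."""
    eligible = [
        f for f in facts
        if (as_of_tslot is None or f.tslot <= as_of_tslot)
        and (as_of_signed_at is None or f.signed_at <= as_of_signed_at)
    ]
    # Assumed ordering: latest observation wins, signing time breaks ties.
    return max(eligible, key=lambda f: (f.tslot, f.signed_at), default=None)


facts = [
    Fact(tslot=100, signed_at=110, value=1.0),
    Fact(tslot=200, signed_at=210, value=2.0),
    Fact(tslot=200, signed_at=300, value=2.5),  # later re-attestation of tslot 200
    Fact(tslot=300, signed_at=310, value=3.0),
]
```

With these facts, bounding only by observation time at 250 picks the re-attested value, while adding a signing bound of 250 returns what was known before the re-attestation landed.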
Four foundation encoders sit GPU-pinned alongside the responder: Clay v1.5 reads Sentinel-2 at a 2.56 km receptive field, Prithvi-EO-2.0-300M-TL reads HLS at 6.7 km, Tessera reads Sentinel-1 plus Sentinel-2 per-pixel, and Galileo is built for multimodal stacks (S1 + S2 + DEM + climate), although only its Sentinel-2 input is wired today. Each carries its own aliasing pattern, so disagreements are informative. The clay_prithvi_tessera_triple_consensus@1 recipe puts Clay, Prithvi, and Tessera to a vote; six domain variants follow for deforestation, wetland change, urban expansion, disaster anomaly, climate archetype, and coastal erosion. Receipts pin the algorithm CID, so a third party can replay the score against the same input facts and reproduce the same number.
When a band genuinely has no data at a cell (the encoder is offline, the place is outside coverage, the archetype seed never materialised), the responder returns a signed absence with a typed reason. An empty answer is itself a citable receipt, not a 404 and not an empty array. The whitepaper at /whitepaper.md walks through the math.
# Geocode a place to a cell64.
curl -s -X POST https://emem.dev/v1/locate \
-H 'content-type: application/json' \
-d '{"q":"Bengaluru"}' | jq .cell64
# "defi.zb493.xuqA.zcb5f" # (geocoder result — may drift)
# Recall a band at that cell (auto-fetched if cold).
curl -s -X POST https://emem.dev/v1/recall \
-H 'content-type: application/json' \
-d '{"cell":"defi.zb493.xuqA.zcb5f","bands":["weather.temperature_2m"]}' \
| jq '.facts[0]'
# Ask a free-text question; the foundation-embedding fan-out fires
# automatically on "find places like" / "what changed" intents.
curl -s -X POST https://emem.dev/v1/ask \
-H 'content-type: application/json' \
-d '{"q":"find places like Yellowstone","place":"Yellowstone National Park"}' \
| jq '.answer'
# Hunter mode: discover event hotspots over a named region. The same
# classifier reads "find <event> in <region>" from /v1/ask and routes
# here; structured callers can hit /v1/hunt directly.
curl -s -X POST https://emem.dev/v1/hunt \
-H 'content-type: application/json' \
-d '{"event":"algal_bloom","region":"Lake Erie"}' \
| jq '.hotspots[0]'

The receipt's fact_cid is a durable handle. Re-fetching it from any responder, in any year, returns the same bytes.
The pitch lives or dies on this flow. Every recall response carries a receipt with fact_cids[], a merkle_proof, and an Ed25519 signature over the canonical preimage blake3(request_id ‖ served_at ‖ primitive ‖ cells ‖ fact_cids) — UTF-8, sections joined by |, list elements followed by ,. The signer's public key is stable; the receipt verifies offline against any copy of the responder pubkey.
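The preimage layout described above can be sketched as plain string assembly. This follows the stated rules (UTF-8, sections joined by |, every list element followed by a comma) but the function name is hypothetical and the field formats, for instance served_at as an ISO-8601 string, are assumptions; the wire spec is the authority. Hashing the result with BLAKE3 and checking the Ed25519 signature need third-party libraries and are left out.

```python
from typing import Sequence


def receipt_preimage(request_id: str, served_at: str, primitive: str,
                     cells: Sequence[str], fact_cids: Sequence[str]) -> bytes:
    """Assemble the bytes that would be hashed and signed (illustrative)."""
    def list_section(items: Sequence[str]) -> str:
        # Each list element is followed by a comma, including the last one.
        return "".join(item + "," for item in items)

    sections = [
        request_id,
        served_at,
        primitive,
        list_section(cells),
        list_section(fact_cids),
    ]
    # Sections are joined by a single pipe and encoded as UTF-8.
    return "|".join(sections).encode("utf-8")


# Placeholder inputs; the ids and timestamp are made up for illustration.
preimage = receipt_preimage(
    "req-1", "2026-01-01T00:00:00Z", "recall",
    ["defi.zb493.xuqA.zcb5f"], ["cidA", "cidB"],
)
```

Because the trailing comma is applied to every element, an empty list and a list holding one empty string produce different bytes, which removes one source of ambiguity in the signed message.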
# 1. Resolve a place to a cell64.
CELL=$(curl -s -X POST https://emem.dev/v1/locate \
-H 'content-type: application/json' \
-d '{"q":"Golden Gate Park, San Francisco"}' | jq -r .cell64)
# 2. Recall a band and capture the receipt envelope.
curl -s -X POST https://emem.dev/v1/recall \
-H 'content-type: application/json' \
-d "{\"cell\":\"$CELL\",\"bands\":[\"indices.ndvi\"]}" > /tmp/recall.json
jq '.receipt | {primitive, served_at, responder_pubkey_b32, fact_cids, merkle_proof: .merkle_proof.root}' \
/tmp/recall.json
# 3. Ask the responder to verify its own signature (server-side check).
jq '{receipt: .receipt}' /tmp/recall.json > /tmp/receipt.json
curl -s -X POST https://emem.dev/v1/verify_receipt \
-H 'content-type: application/json' --data @/tmp/receipt.json
# {"valid":true,"preimage_blake3_hex":"…","fact_cids_count":1,"signer_pubkey_b32":"…",…}
# 4. Reproduce: pull the same fact_cid from any responder, on any day.
# The cell, band, tslot, and derivation.fn_key are content-addressed —
# the bytes you receive will hash to the same fact_cid.
jq '.facts[0].derivation' /tmp/recall.json

For a browser-only verify, open /verify/<fact_cid>. The page does the same Ed25519 check in WebCrypto + @noble/ed25519, so you never have to trust the responder you got the receipt from. A guided walk lives at /demos/signed-answer.
The MCP endpoint is https://emem.dev/mcp. Drop a config snippet into your client.
| Client | Config |
|---|---|
| Claude Desktop | examples/claude-desktop.json |
| Claude Code | examples/claude-code.mcp.json |
| Cursor | examples/cursor.mcp.json |
| Cline (VS Code) | examples/cline.mcp.json |
| Gemini CLI | gemini extensions install https://emem.dev/gemini-extension.json |
| ChatGPT (Custom GPT) | examples/openai-gpt-action.json |
| LangChain (Python) | examples/langchain.py |
| LangChain MCP agent | examples/langchain/ |
| LlamaIndex (Python) | examples/llamaindex.py |
| LlamaIndex MCP agent | examples/llamaindex/ |
| Agno MCP agent | examples/agno/ |
| Pydantic AI MCP agent | examples/pydantic-ai/ |
| AutoGen MCP agent | examples/autogen/ |
| CrewAI MCP agent | examples/crewai/ |
| Mastra MCP agent | examples/mastra/ |
Python and TypeScript SDKs live under sdks/ (publication to PyPI / NPM pending; install from the repo today).
80 MCP tools (10 core, 70 extended), 92 documented REST paths under /v1/*, surfaced through /openapi.json. Every tool carries a when_to_use string written for LLM tool-selection, and four MCP behavioural annotations (readOnlyHint, destructiveHint, idempotentHint, openWorldHint). A no-param tools/list returns all 80 tools (so every MCP client discovers the full surface); pass {"tier":"core"} for just the 10 essentials. Tools are callable via tools/call regardless of tier.
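For orientation, the two tools/list variants look like this as JSON-RPC 2.0 request bodies. This is a minimal sketch: the helper name is hypothetical, only the tier parameter comes from the text above, and a real MCP client would normally complete the protocol's initialize handshake before sending it to POST /mcp.

```python
import json
from typing import Optional


def tools_list_request(request_id: int, tier: Optional[str] = None) -> str:
    """Build a JSON-RPC 2.0 body for tools/list, optionally filtered by tier."""
    body = {"jsonrpc": "2.0", "id": request_id, "method": "tools/list"}
    if tier is not None:
        # Per the text above, {"tier": "core"} narrows the list to the core tools.
        body["params"] = {"tier": tier}
    return json.dumps(body)


full_surface = tools_list_request(1)
core_only = tools_list_request(2, tier="core")
```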
- Locate: name or lat/lng → cell64. Five-layer cascade: wide-bbox table → embedded gazetteer → GeoNames cities-5000 (68 581 places, in-process) → sled cache → Photon → Nominatim. Polygon geometry from Overture divisions/division_area. District-level queries reroute through Overture when Nominatim returns a POI courthouse.
- Memory substrate (state + tokens + bundles + memory files + search + contradictions + SSE): POST /v1/state returns a signed dense per-place embedding (view=encoder default 128-D, view=cube full 1792-D). POST /v1/state_multi fans across geotessera + clay_v1 + prithvi_eo2 + galileo. POST /v1/state_diff returns residual + L2 + cosine between two vintages. POST /v1/memory_token composes memt:<cell64>:<fact_cid>. POST /v1/memory_bundle composes a signed envelope memb:<bundle_cid> over N (cell, band, tslot) triples. Six MCP file-op verbs (memory_view, memory_create, memory_str_replace, memory_insert, memory_delete, memory_rename) conform to Anthropic's memory-tool spec; every write is ed25519-signed and content-addressed. Paths under /memories/by_attester/<pubkey>/... enforce capability binding (ed25519 signature over blake3("emem.memory_write|" + verb + "|" + path + "|" + body_hash)). Each file carries a kind from the CoALA taxonomy (episodic / semantic / procedural / resource). POST /v1/memory/search does BGE-768 semantic search over file contents via a LanceDB IVF_PQ partition. POST /v1/memory_contradictions walks a parallel multi-attester index and scores disagreement per band kind (scalar / vector / categorical). GET /v1/memory/sse?path_prefix=&kind=&attester= streams write events with server-side filter. Every read primitive accepts as_of_tslot + as_of_signed_at for bi-temporal queries (valid-time + transaction-time); the receipt carries an as_of block when set. See docs/memory.md for the full reference.
- Recall / recall_many / recall_polygon: 122 materializer-wired band names across 42 cube slots. Recall answers any wired band, auto-fetching on a cold miss and signing the result. Signed Absence on out-of-coverage.
- Find similar: k-NN over any vector band. Hamming fast path (sign-bit pop-count) auto-derives from the cosine band when the binary sibling is absent. Mode hamming_then_rerank triages with Hamming then re-orders by cosine; the over-sampling factor is EWMA-adaptive.
- Compare / compare_bands / diff / trajectory: pairwise and time-series.
- Connect & evolve: typed temporal edges (emem_edges_recall reads a fact's signed connections, namely disagrees_with, supersedes, and relates_to, with a valid-time bound), multi-attester contradiction scoring (memory_contradictions, per band kind), and a deterministic refinement loop that re-derives a fact when a newer attestation or a disagrees_with edge lands. All three ship in 0.0.9.
- Verify: structured claim against attested facts; returns signed verdict + evidence CIDs.
- Physics: /v1/heat_solve (2-D explicit FTCS heat, MODIS LST stencil), /v1/wave_solve (1-D shallow-water along seaward bathymetry gradient), /v1/jepa_predict (closed-form NDVI AR(2) seasonal), /v1/jepa_predict_v2 (Tessera embedding dynamics; short-circuits to last-vintage identity baseline while the trained head is pending, receipt carries untrained_baseline).
- Ask: free-text question with topic routing. The classifier covers four intent families: place-anchored topical questions (the topic router fan-out), foundation-embedding intents on find places like / what changed / deforestation / anomaly (cross-encoder consensus over Clay + Prithvi + Tessera), corpus-meta intents on where do you have data / how fresh is your corpus (redirect to coverage surfaces), and hunter-mode discovery on find <event> in <region> (routes to /v1/hunt).
- Hunter: POST /v1/hunt and MCP emem_hunt for open-world event discovery. Twelve event keywords (algal_bloom, deforestation, flood_extent, wildfire, urban_heat_island, methane_plume, landslide, drought, soil_salinity, crop_stress, water_turbidity, oil_slick) each map to a registered detection algorithm. The responder samples up to 32 cells from the named region (8 for slow primary bands such as MODIS LST), recalls the algorithm's primary scalar plus any configured gate band (e.g. NDWI > 0 for water-mask events), and returns the top 8 hotspots with cell64, lat/lng, recalled value, gate value, fact CID, and a Sentinel-2 scene URL. A Tessera embedding rerank fires when at least three candidate cells have a geotessera vector available, re-ordering by cosine similarity to the cluster centroid. oil_slick returns status: not_yet_implemented with pointers at flood_extent_sar_threshold@1 and water_turbidity_red_band@1 instead of fabricating detections.
- EUDR Due Diligence Statement: POST /v1/eudr_dds and MCP emem_eudr_dds produce a signed Annex II-shaped DDS under Regulation (EU) 2023/1115. The per-cell algorithm eudr_compliance@1 implements Article 2(4) as written: >0.5 ha, >5 m height, >10 % canopy cover, excluding land predominantly under agricultural or urban use. Forest baseline is JRC GFC2020 V3 (the Commission's expected non-binding baseline; live single-COG materializer at JEODPP) confirmed against Hansen GFC v1.12 treecover2000. JRC TMF (annual change, deforestation year, degradation year, transition subtype) reads through a pull-and-cache connector: JEODPP's TMF endpoint serves 84 MB tiles without HTTP Range, so the responder fetches once into $EMEM_DATA/jrc_tmf_cache/ with atomic rename, then samples. The Article 2(28) dispatch picks POINT (≤4 ha non-cattle) vs POLYGON (>4 ha or any cattle plot under HS 0102/0201/0202). Sims et al. 2025 driver attribution and RADD Sentinel-1 alerts layer on as refinement: both currently materialize Absence with a structured NotImplemented reason because Zenodo and GFW S3 do not honour HTTP Range, and the responder will not fabricate a value for a connector it cannot read. Every response carries the explicit Article 9(1)(b) legality disclaimer: land tenure, FPIC, and country-of-origin laws are structurally out of EO scope and need a partner module before TRACES NT submission. The JSON Schema at /v1/schemas/eudr_dds.json cites the exact EUR-Lex paragraph each field maps to.
- Domain shortcuts: emem_at, emem_ndvi, emem_air, emem_lst, emem_soil, emem_water, emem_forest, emem_weather. Collapse locate → recall → polygon-aggregate into one call by place name.
- Field boundaries: Fields of The World (~3.17 B field polygons, 241 countries, 10 m, CC-BY-4.0) via PMTiles range reads on source.coop.
- Visual surfaces: /v1/coverage_map.svg (1440×720 plate-carrée of attested cells, log-scale density) and /v1/places/scene_overlay.svg?place=…&band=… (per-place value-painted bbox grid; band-aware ColorBrewer ramps, horizontal legend, km scale bar, signed source line). The MCP equivalents return the same SVG as an EmbeddedResource block. The full set, plus the 32-diagram protocol/industry suite, lives at /docs/gallery and /docs/diagrams.
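The hamming_then_rerank mode named under Find similar can be illustrated with a toy version: reduce each vector to its sign bits, shortlist by Hamming distance, then re-order the shortlist by cosine. This is a sketch of the general technique, not the responder's code; the function names are hypothetical and the over-sampling factor is a fixed parameter here rather than EWMA-adaptive.

```python
import math
from typing import List, Sequence


def sign_bits(vec: Sequence[float]) -> int:
    """Pack one bit per dimension: 1 where the component is positive."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits


def hamming(a: int, b: int) -> int:
    """Pop-count of the XOR: number of dimensions whose signs differ."""
    return bin(a ^ b).count("1")


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def hamming_then_rerank(query: Sequence[float], corpus: Sequence[Sequence[float]],
                        k: int, oversample: int = 4) -> List[int]:
    """Return indices of the k best matches: cheap triage, exact re-order."""
    q_bits = sign_bits(query)
    # Triage: keep the k * oversample candidates with the fewest differing signs.
    shortlist = sorted(range(len(corpus)),
                       key=lambda i: hamming(q_bits, sign_bits(corpus[i])))[: k * oversample]
    # Rerank: order the shortlist by true cosine similarity, best first.
    shortlist.sort(key=lambda i: cosine(query, corpus[i]), reverse=True)
    return shortlist[:k]


corpus = [
    [1.0, 1.0, 1.0, 1.0],
    [0.9, 1.1, 1.0, -0.1],
    [-1.0, -1.0, -1.0, -1.0],
    [1.0, 0.8, 1.2, 1.0],
]
top2 = hamming_then_rerank([1.0, 1.0, 1.0, 1.0], corpus, k=2)
```

The over-sampling factor trades recall for speed: a larger shortlist makes it less likely that a true neighbour is dropped during the sign-bit triage.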
159 named composition recipes (flood_risk@2, walkability_score@1, heat_index@2, carbon_sink_score@1, eudr_compliance@1, forest_carbon_loss_co2_flux@1, enteric_ch4_dairy_tier1_ipcc2019@1, n2o_synthetic_fertilizer_ef1_ipcc2019@1, ...) live in a content-addressed registry. Each carries:
- formula: plain math the agent can read and apply.
- inputs: band keys with role + explanation.
- when_to_use: agent-targeted trigger guidance.
- citation: peer-reviewed source.
- accuracy_band: honest precision estimate, not marketing.
- parameters: typed tunable thresholds (gate, k, timeout, ...).
- learned_from: citation provenance for every tuned number. An auditor can trace any gate threshold back to a referee.
Algorithms with an evaluation: Expr AST are also re-executable in-process: the responder walks the AST against the snapshot recall and returns a signed composite scalar that any third party with matching algorithms_cid and input fact CIDs reproduces deterministically.
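To make "walks the AST against the snapshot recall" concrete, here is a minimal expression evaluator. The node types, operator set, and band keys are hypothetical and much smaller than whatever the registry actually defines; the point is only that a pure tree walk over fixed inputs yields the same scalar for anyone who holds the same expression and the same input facts.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Band:
    key: str          # looked up in the snapshot of recalled values


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class BinOp:
    op: str           # one of "+", "-", "*", "/"
    left: "Expr"
    right: "Expr"


Expr = Union[Band, Const, BinOp]


def evaluate(expr: Expr, snapshot: Dict[str, float]) -> float:
    """Recursively evaluate an expression tree against recalled band values."""
    if isinstance(expr, Band):
        return snapshot[expr.key]
    if isinstance(expr, Const):
        return expr.value
    left = evaluate(expr.left, snapshot)
    right = evaluate(expr.right, snapshot)
    if expr.op == "+":
        return left + right
    if expr.op == "-":
        return left - right
    if expr.op == "*":
        return left * right
    if expr.op == "/":
        return left / right
    raise ValueError(f"unsupported operator: {expr.op}")


# NDVI = (nir - red) / (nir + red); band keys and values are made up.
ndvi = BinOp("/", BinOp("-", Band("nir"), Band("red")),
             BinOp("+", Band("nir"), Band("red")))
score = evaluate(ndvi, {"nir": 0.5, "red": 0.25})
```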
Browse at GET /v1/algorithms or per-key at GET /v1/algorithms/<key>.
Designed for agents to read, not for humans to remember:
GET /openapi.json — OpenAPI 3.1 of every REST route
GET /v1/agent_card — live capability snapshot + manifest CIDs
GET /v1/tools — 80 MCP tools (10 core, 70 extended) with when_to_use + annotations
GET /v1/algorithms?summary=true — 159 algorithm keys + categories
GET /v1/topics — 27 topic-grouped bands + algorithms (router brain)
GET /v1/manifests — bands_cid, algorithms_cid, sources_cid, schema_cid
GET /v1/schemas/eudr_dds.json — Annex II JSON Schema with EUR-Lex paragraph citations
GET /.well-known/{emem,agent,mcp,ai-plugin}.json
POST /v1/state — signed dense state vector at any cell (view=encoder | view=cube)
POST /v1/state_multi — fan-out across geotessera + clay_v1 + prithvi_eo2 with typed missing[]
POST /v1/state_diff — vintage delta at one cell: residual vector + L2 + cosine
POST /v1/memory_token — compose memt:<cell64>:<fact_cid> citation handle
POST /v1/memory_token/resolve — single round-trip dereference back to signed fact body
GET /v1/stream — Server-Sent Events corpus heartbeat, signed every 5-300 s
GET /v1/corpus_state_stats — signed snapshot of corpus liveness (one-shot equivalent of /v1/stream)
GET /v1/benchmark — hand-verified eval items; pair with POST /v1/benchmark/grade
POST /v1/hunt — structured event-discovery sweep (12 events × region)
POST /v1/eudr_dds — EUDR Due Diligence Statement (Regulation EU 2023/1115)
POST /mcp — JSON-RPC 2.0 (Streamable HTTP)
GET /llms.txt /llms-full.txt — plaintext catalog for LLM ingestion
GET /humans /humans.json — interactive try-it surface + machine twin
GET /verify /verify/<fact_cid> — in-browser ed25519 receipt verifier
GET /docs/gallery — live coverage map + hunter case studies + 32 diagrams
GET /docs/diagrams/ — 32 SVGs of protocol + industry deployments
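The memt:<cell64>:<fact_cid> citation handle listed above is simple enough to compose and split client-side. The helpers below are a hypothetical sketch that assumes neither part contains a colon; the fact_cid used is a placeholder, not a real content id, and resolving a token back to a signed fact body still goes through POST /v1/memory_token/resolve.

```python
from typing import Tuple

PREFIX = "memt"


def compose_memory_token(cell64: str, fact_cid: str) -> str:
    """Join a cell64 and a fact CID into a memt: citation handle."""
    if not cell64 or not fact_cid or ":" in cell64 or ":" in fact_cid:
        raise ValueError("cell64 and fact_cid must be non-empty and colon-free")
    return f"{PREFIX}:{cell64}:{fact_cid}"


def parse_memory_token(token: str) -> Tuple[str, str]:
    """Split a memt: handle back into (cell64, fact_cid)."""
    parts = token.split(":")
    if len(parts) != 3 or parts[0] != PREFIX or not parts[1] or not parts[2]:
        raise ValueError(f"not a memory token: {token!r}")
    return parts[1], parts[2]


# Placeholder CID of the right length; not a real fact.
token = compose_memory_token("defi.zb493.xuqA.zcb5f", "a" * 26)
```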
The operator_attestation block in /.well-known/emem.json binds the running binary's BLAKE3 hash to its git_commit + build_timestamp and signs the triple under the responder's ed25519 key, so a verifier can confirm the live binary corresponds to the published source tree without trusting the operator.
Every receipt pins four content-addressed registry CIDs (bands_cid, algorithms_cid, sources_cid, schema_cid). A peer that recomputes a fact under matching CIDs produces the same bytes. A peer with drifted registries returns a different bands_cid on /health and the divergence is visible before any data flows.
cargo run --release --bin emem-server
# Or via container.
docker run -p 5051:5051 ghcr.io/vortx-ai/emem:latest

No required env vars. EMEM_BIND overrides the listener (default 0.0.0.0:5051). EMEM_DATA overrides the data directory (default ./var/emem; pass :memory: for ephemeral). For TLS, systemd, ACME on :443, and the HuggingFace Space wrapper, see docs/operators/operating.md.
| field | bits | wire form | example |
|---|---|---|---|
| cell | 64 | four base-1024 bigrams, dot-sep | defi.zb493.xuqA.zcb5f |
| tslot | 64 | base32-nopad-leb128, t. prefix | t.aaaaagy |
| cid | 32 B BLAKE3 | base32-nopad-lowercase, 26 chars | qi3jo4sqcg…l2hgjtwm |
| vec | 1792-D fp16 | 12-byte prefix in receipts | full vector via recall |
The active grid is ~9.54 m × ~9.55 m at the equator (lat 21 bits × lng 22 bits, asymmetric to match the 360°/180° ratio). Away from the equator, longitude pitch narrows with cos(lat). The Hilbert-ordered base-1024 alphabet keeps adjacent cells string-prefix-similar, so an LLM that emits defi.zb493… already lands in roughly the right place. GET /v1/grid_info declares the active resolution honestly; the spec target is a hierarchical migration toward H3-equivalent res-13 (~3.4 m).
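The quoted pitch follows from the bit widths. The check below assumes WGS84 lengths (equatorial circumference and pole-to-pole meridian arc) and an even split of each axis into 2^21 latitude steps and 2^22 longitude steps; the function names are illustrative.

```python
import math

EQUATORIAL_CIRCUMFERENCE_M = 40_075_016.686   # WGS84, 2 * pi * 6378137 m
MERIDIAN_POLE_TO_POLE_M = 20_003_931.458      # WGS84, 180 degrees of latitude
LAT_BITS = 21
LNG_BITS = 22


def lat_pitch_m() -> float:
    """Mean north-south cell size: 180 degrees split into 2**21 steps."""
    return MERIDIAN_POLE_TO_POLE_M / 2 ** LAT_BITS


def lng_pitch_m(lat_deg: float) -> float:
    """East-west cell size: 360 degrees in 2**22 steps, shrinking with cos(lat)."""
    return EQUATORIAL_CIRCUMFERENCE_M / 2 ** LNG_BITS * math.cos(math.radians(lat_deg))


equator_ns = lat_pitch_m()      # about 9.54 m
equator_ew = lng_pitch_m(0.0)   # about 9.55 m
at_60_ew = lng_pitch_m(60.0)    # about half the equatorial value
```

At 60° latitude the east-west pitch is roughly 4.78 m while the north-south pitch is unchanged, so cells become tall rectangles toward the poles.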
emem/
├── crates/ # 16 workspace crates, MSRV 1.91, version 0.0.9
│ ├── emem-core/ # bands, algorithms, functions, sources, topics, schema
│ ├── emem-codec/ # cell64, cid64, vec64, hilbert, geo, alphabet
│ ├── emem-fact/ # canonical CBOR; fact, receipt, attestation
│ ├── emem-claim/ # claim predicates (Op enum)
│ ├── emem-cache/ # sled cache wrapper
│ ├── emem-fetch/ # 16 data connectors + 13 utility modules
│ ├── emem-storage/ # sled hot cache + append-only merkle log
│ ├── emem-cubes/ # 1792-D voxel cube handle
│ ├── emem-primitives/ # recall, find_similar, trajectory, compare, diff, verify, query_region
│ ├── emem-attest/ # merkle root over fact CIDs
│ ├── emem-intent/ # rule-based intent → plan planner
│ ├── emem-mcp/ # 80-tool MCP descriptor registry
│ ├── emem-api-rest/ # axum router, physics solvers, foundation fan-out
│ ├── emem-cli/ # binaries: emem-server, emem-livedemo, emem-realdemo, emem-demo, emem-ask-eval
│ ├── emem-membench/ # memory-substrate benchmark harness
│ └── emem-sleep-agent/ # offline refinement loop over contradictions + edges
├── sdks/
│ ├── emem-py/ # Python client (httpx, sync + async)
│ └── emem-ts/ # TypeScript client (zero runtime deps, native fetch)
├── python/ # FastAPI sidecar over UDS: Prithvi-EO-2.0, Galileo, Clay v1.5, JEPA-v2
├── examples/ # MCP configs + LangChain / LlamaIndex
├── ops/ # systemd units, journald retention
└── web/ # SSR HTML, humans, verify, llms.txt, agent.json
The 16 data connectors back 46 declared source schemes and 122 live materializer registrations. Five of the 46 schemes are declared-but-unwired (openet.30m.daily, dynamic_world.v1, tropomi.s5p.ch4, tropomi.s5p.no2, viirs.dnb.monthly); they return a typed Absence, not data. Most wired schemes route through cog.rs, the universal STAC + COG sampler, plus bespoke modules for chirps (rainfall), dmsp_ols (nightlights), esa_cci_biomass (above-ground biomass, CEDA), firms (active fire), ftw (Fields of The World), geonames (gazetteer), gmrt (topobathymetry, PointServer + GridServer), hansen_gfc (forest change), jrc_gfc2020 (EUDR forest baseline, JEODPP single-COG), jrc_tmf (tropical moist forest, pull-and-cache), koppen (climate classification), overture (places / buildings / divisions), radd_alerts (Sentinel-1 disturbance), terraclimate (climate), wdpa (protected areas), worldpop (population), wri_gdm_drivers (Sims et al. 2025 driver attribution).
The GPU sidecar (Python FastAPI over Unix domain socket) co-resides four encoders on a 20 GB VRAM budget:
- Clay v1.5: 1024-D CLS, S2 L2A 10 bands, ~12 ms warm. Teacher (DINOv2 vit_large_patch14_reg4_dinov2.lvd142m) pre-staged at boot so HF_HUB_OFFLINE=1 holds.
- Prithvi-EO-2.0-300M-TL: 1024-D CLS, HLS V2 6-band, ~13 ms warm.
- Galileo (variant base in production; tiny / nano selectable via EMEM_GALILEO_VARIANT): S2-only modality wired (S1 / ERA5 / SRTM / VIIRS / Dynamic-World / WorldCover / LandScan / location zero-masked; the scaffold is multimodal but only S2 is connected today). The advertised capability is galileo-<variant> in /v1/capabilities.extensions[].
- JEPA v2 dynamics: untrained baseline. Metadata-only is_trained() check short-circuits to last-vintage identity; receipt carries untrained_baseline and via: "short_circuit_untrained". Training is upstream-bottlenecked on multi-vintage Tessera availability.
Sidecar crash does not cascade. The REST router degrades to scalar bands and signs the GPU-anchored algorithms as Absence with gpu_unavailable. See docs/developers/inference.md.
emem is built to be a protocol, not a single service. Because every fact is content-addressed and signed, any responder can serve it and any client can verify it offline, without trusting the source. Today that runs as one hosted responder plus self-hosted nodes. The design target is a federation of independent responders that resolve the same content ids byte-for-byte, cross-cite each other's attestations, and record where they disagree, so the shared memory gets more trustworthy the more agents read and write against it. None of the multi-host federation routing ships in 0.0.9. What ships today is the substrate that makes it possible: content addressing, signed receipts, typed temporal edges, multi-attester contradiction scoring, and a deterministic refinement loop.
- No commercial sub-meter imagery. Sentinel-2 (10 m), Landsat (30 m), HLS. For Planet Pelican (50 cm) or Maxar bring your own connector.
- No edge / onboard inference. Sidecar runs on a single host.
- Single-host deployment. No federation, no global routing, no SOC 2.
- JEPA v2 is untrained today. The endpoint exists and signs honestly; predictions equal the last attested vintage until the dynamics head is trained.
- 16 data connectors, 122 live materializer registrations. Catalog-by-count is not the pitch; every wired band is auto-fetchable, signed, and content-addressed. Bands without a wired materializer are listed under declared_but_no_materializer_at_this_responder.
- Foundation-encoder materializers are uneven. geotessera (Tessera 128-D) has a wired materializer and auto-fetches on miss. clay_v1 and prithvi_eo2 are seed-only at this responder: the GPU sidecar runs both models, but the auto-materialise path that fans out to upstream tile archives is not wired today. Recall against either returns whatever has already been signed; the hunter-mode envelope discloses this per request under materializer_status[].
- Tessera is upstream-rate-limited. dl2.geotessera.org reliably serves 2024 vintages today; historical backfill across all eight vintages (2017–2024) is partial. The Tessera-coherence rerank in hunter mode gracefully degrades to primary-scalar order when the upstream is unreachable, surfacing the reason under embedding_rerank.reason.
- MODIS LST is rate-limited. modis.lst_day_8day materialises through the NASA/ORNL REST API at roughly 30 s per cell. Hunter mode caps the per-region fan-out for the LST family to 8 cells (env override EMEM_HUNTER_SLOW_BAND_CAP) so urban-heat queries return inside the gateway timeout.
- No interactive notebook UI. For exploration there is /humans (try-it drawer, manifest grid, ontology SVG); for analytics, drive from a notebook against the REST or MCP endpoint.
| Resource | Location |
|---|---|
| Agent loop | https://emem.dev/agents.md |
| Wire spec | https://emem.dev/spec.md |
| llms.txt | https://emem.dev/llms.txt |
| OpenAPI 3.1 | https://emem.dev/openapi.json |
| MCP | https://emem.dev/mcp |
| Verify | https://emem.dev/verify |
| Container | ghcr.io/vortx-ai/emem:latest (multi-arch, anonymously pullable) |
| HF Space | huggingface.co/spaces/vortx-ai/emem |
| MCP Directory | docs/mcp-directory.md |
| Issues / PRs | github.com/Vortx-AI/emem/issues |
| Security | SECURITY.md, avijeet@vortx.ai |
Apache-2.0. See LICENSE and NOTICE.
Default-build data sources are open: Copernicus DEM, JRC GSW (CC-BY 4.0), Hansen GFC, ESA WorldCover (CC-BY 4.0), Overture Maps (places, buildings, transportation, divisions/division_area; ODbL / CDLA-Permissive), Fields of The World (CC-BY 4.0), GeoNames cities-5000 (CC-BY 4.0), OSM (ODbL), met.no, Open-Meteo, Tessera. No API keys, no operator credentials, no SaaS lock-in.