kylemaa/distributed-semantic-cache

Distributed Semantic Cache

Open-source semantic caching for LLM applications. Reduce API costs by 50-80% while improving response times.

License: MIT TypeScript Node.js Tests

🎯 Why Distributed Semantic Cache?

| Challenge | Solution |
|-----------|----------|
| High LLM API costs | Semantic caching reduces calls by 50-80% |
| Slow response times | Sub-millisecond cache hits vs 1-3s API calls |
| Exact match limitations | Semantic similarity catches paraphrased queries |
| Data privacy concerns | 100% local embeddings, your data never leaves |
| Production scalability | Kubernetes-ready with HNSW indexing for 100K+ vectors |

📦 SDK - The Developer Experience

npm install @distributed-semantic-cache/sdk

Drop-in LLM Integration

import { createOpenAIMiddleware, SemanticCache } from '@distributed-semantic-cache/sdk';
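import OpenAI from 'openai';

// Create the OpenAI client (reads OPENAI_API_KEY from the environment by default)
const openai = new OpenAI();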

// Setup cache
const cache = new SemanticCache({
  baseUrl: process.env.CACHE_URL,
  apiKey: process.env.CACHE_API_KEY,
});

// Create middleware
const middleware = createOpenAIMiddleware({ cache, threshold: 0.85 });

// Wrap your OpenAI calls - that's it!
const result = await middleware.chat(
  { model: 'gpt-4', messages: [{ role: 'user', content: 'Explain quantum computing' }] },
  () => openai.chat.completions.create({ model: 'gpt-4', messages: [...] })
);

if (result.cached) {
  console.log(`💰 Saved API call! Similarity: ${result.similarity}`);
}

Also Supports

  • Anthropic Claude - createAnthropicMiddleware()
  • Custom LLMs - createGenericLLMMiddleware()
  • React Apps - createSemanticCacheHooks(React)
  • Fluent Config - buildCache().withPreset('production').build()

📚 Full SDK Documentation

🏗️ Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         Your Application                        │
├─────────────────────────────────────────────────────────────────┤
│                         SDK Middleware                          │
│    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐        │
│    │   OpenAI    │    │  Anthropic  │    │   Custom    │        │
│    │  Middleware │    │  Middleware │    │     LLM     │        │
│    └──────┬──────┘    └──────┬──────┘    └──────┬──────┘        │
└───────────┼──────────────────┼──────────────────┼───────────────┘
            │                  │                  │
            ▼                  ▼                  ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Semantic Cache API                           │
├─────────────────────────────────────────────────────────────────┤
│  ┌─────────────┐    ┌──────────────┐    ┌─────────────┐         │
│  │ L1: Exact   │ →  │L2: Normalized│ →  │L3: Semantic │         │
│  │   Match     │    │    Match     │    │   Search    │         │
│  │   O(1)      │    │    O(1)      │    │  O(log n)   │         │
│  └─────────────┘    └──────────────┘    └─────────────┘         │
├─────────────────────────────────────────────────────────────────┤
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐          │
│  │   HNSW      │    │  Matryoshka │    │  Predictive │          │
│  │   Index     │    │   Cascade   │    │   Warming   │          │
│  └─────────────┘    └─────────────┘    └─────────────┘          │
└─────────────────────────────────────────────────────────────────┘
            │                  │                  │
            ▼                  ▼                  ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Storage & Embeddings                         │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐          │
│  │   SQLite    │    │    Local    │    │   OpenAI    │          │
│  │   Storage   │    │  Embeddings │    │  Embeddings │          │
│  └─────────────┘    └─────────────┘    └─────────────┘          │
└─────────────────────────────────────────────────────────────────┘
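
The Exact, Normalized, Semantic lookup order in the diagram can be sketched in a few lines of TypeScript. This is only an illustration under stated assumptions, not the project's implementation: `ThreeLayerCache`, `normalize`, `cosine`, and the small contraction table are hypothetical names, and the L3 step is a brute-force scan where the diagram shows an HNSW index.

```typescript
// Hypothetical sketch of the L1 -> L2 -> L3 lookup order. Not the project's API.
type Entry = { query: string; response: string; embedding: number[] };
type Hit = { response: string; layer: 'L1' | 'L2' | 'L3'; similarity: number };

// Illustrative contraction table; a real normalizer would cover many more.
const CONTRACTIONS: Record<string, string> = {
  "what's": 'what is',
  "it's": 'it is',
  "don't": 'do not',
};

// L2 key: lowercase, expand contractions, strip punctuation, collapse whitespace.
function normalize(query: string): string {
  return query
    .toLowerCase()
    .replace(/[a-z]+'[a-z]+/g, (word) => CONTRACTIONS[word] ?? word)
    .replace(/[^a-z0-9\s]/g, '')
    .replace(/\s+/g, ' ')
    .trim();
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

class ThreeLayerCache {
  private exact = new Map<string, Entry>();
  private normalized = new Map<string, Entry>();
  private entries: Entry[] = [];

  store(entry: Entry): void {
    this.exact.set(entry.query, entry);
    this.normalized.set(normalize(entry.query), entry);
    this.entries.push(entry);
  }

  lookup(query: string, embedding: number[], threshold = 0.85): Hit | null {
    // L1: exact string match, O(1).
    const l1 = this.exact.get(query);
    if (l1) return { response: l1.response, layer: 'L1', similarity: 1 };

    // L2: match on the normalized form, O(1).
    const l2 = this.normalized.get(normalize(query));
    if (l2) return { response: l2.response, layer: 'L2', similarity: 1 };

    // L3: nearest neighbour by cosine similarity (linear scan in this sketch).
    let best: Entry | null = null;
    let bestScore = -1;
    for (const entry of this.entries) {
      const score = cosine(embedding, entry.embedding);
      if (score > bestScore) {
        best = entry;
        bestScore = score;
      }
    }
    return best && bestScore >= threshold
      ? { response: best.response, layer: 'L3', similarity: bestScore }
      : null;
  }
}
```

Checking the cheap layers first means an embedding comparison is only needed when both string lookups miss; in this sketch the caller supplies the embedding up front to keep the example short.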

🚀 Features

  • 3-Layer Cache Architecture - Exact → Normalized → Semantic matching
  • Local Embeddings - 100% free, privacy-first (MiniLM, mpnet, e5)
  • Query Normalization - Case, punctuation, contraction handling
  • Confidence Scoring - Multi-factor cache hit confidence
  • SQLite Storage - Lightweight, file-based, zero-config
  • Full REST API - Query, store, stats, admin endpoints
  • React Chat UI - Interactive demo and testing interface
  • Multi-Tenancy - Complete data isolation, per-tenant quotas
  • Analytics - Cost tracking, ROI dashboards, time-series metrics
  • Predictive Cache Warming - Pattern-based pre-population
  • HNSW Indexing - O(log n) search for 100K+ vectors
  • Matryoshka Cascade - Adaptive dimension search (4-8x faster)
  • Production Ready - Docker, Kubernetes, Terraform templates
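
As a rough illustration of the Matryoshka Cascade idea listed above, the sketch below scores every vector on a short prefix of its dimensions, keeps a shortlist, and re-ranks only that shortlist on the full vectors. It assumes embeddings whose leading dimensions are already meaningful on their own (the property Matryoshka-style models are trained for). `prefixCosine`, `cascadeSearch`, and the parameters `coarseDims` and `shortlist` are hypothetical names; the project's actual cascade may differ.

```typescript
// Cosine similarity restricted to the first `dims` components.
function prefixCosine(a: number[], b: number[], dims: number): number {
  const n = Math.min(dims, a.length, b.length);
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < n; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Two-stage search: cheap coarse pass, then exact re-rank of the shortlist.
function cascadeSearch(
  query: number[],
  vectors: number[][],
  coarseDims: number,
  shortlist: number,
): { index: number; similarity: number } | null {
  if (vectors.length === 0) return null;

  // Stage 1: score every vector using only the leading dimensions.
  const candidates = vectors
    .map((vector, index) => ({ index, score: prefixCosine(query, vector, coarseDims) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, Math.max(1, shortlist));

  // Stage 2: re-rank the shortlist with the full-dimension similarity.
  let best = { index: candidates[0].index, similarity: -Infinity };
  for (const candidate of candidates) {
    const similarity = prefixCosine(query, vectors[candidate.index], query.length);
    if (similarity > best.similarity) best = { index: candidate.index, similarity };
  }
  return best;
}
```

The coarse pass touches fewer numbers per vector, which is where a speed-up would come from; the trade-off is that a true nearest neighbour can be dropped if it scores poorly on the prefix, so the shortlist size controls recall.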

📊 Performance

| Metric | Value |
|--------|-------|
| Cache Hit Latency | < 5ms |
| L1 (Exact) Lookup | O(1) |
| L3 (Semantic) Search | O(log n) with HNSW |
| Vector Capacity | 100K+ entries |
| Storage Reduction | 75% with quantization |
| API Cost Savings | 50-80% typical |
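
The 75% storage figure is consistent with storing each embedding component as an 8-bit integer instead of a 32-bit float (1 byte instead of 4). The scheme actually used is not stated here, so the following is a sketch of symmetric scalar quantization under that assumption; `quantize` and `dequantize` are hypothetical helper names.

```typescript
type Quantized = { data: Int8Array; scale: number };

// Map each float to an int8 in [-127, 127] using one scale per vector.
function quantize(vec: Float32Array): Quantized {
  let maxAbs = 0;
  for (let i = 0; i < vec.length; i++) maxAbs = Math.max(maxAbs, Math.abs(vec[i]));
  const scale = maxAbs === 0 ? 1 : maxAbs / 127;
  const data = new Int8Array(vec.length);
  for (let i = 0; i < vec.length; i++) data[i] = Math.round(vec[i] / scale);
  return { data, scale };
}

// Approximate reconstruction; the error per component is about scale / 2 at most.
function dequantize(q: Quantized): Float32Array {
  const out = new Float32Array(q.data.length);
  for (let i = 0; i < out.length; i++) out[i] = q.data[i] * q.scale;
  return out;
}

// all-MiniLM-L6-v2 produces 384-dimensional vectors; the values here are synthetic.
const original = new Float32Array(384);
for (let i = 0; i < original.length; i++) original[i] = Math.sin(i);

const compressed = quantize(original);
const reduction = 1 - compressed.data.byteLength / original.byteLength;
console.log(reduction); // 0.75 (1536 bytes down to 384, ignoring the single scale value)
```

The price is a small loss of precision in similarity scores, which is why quantized vectors are often paired with a re-ranking step on full-precision data when exact ordering matters.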

🛠️ Quick Start

Prerequisites

  • Node.js 18+
  • pnpm 8+

Installation

# Clone the repository
git clone https://github.com/kylemaa/distributed-semantic-cache.git
cd distributed-semantic-cache

# Install dependencies
pnpm install

# Configure environment
cp .env.example .env

Configuration

Option A: Local Embeddings (Free, Privacy-First) ⭐ Recommended

EMBEDDING_PROVIDER=local
LOCAL_EMBEDDING_MODEL=all-MiniLM-L6-v2

Option B: OpenAI Embeddings (Higher Quality)

EMBEDDING_PROVIDER=openai
OPENAI_API_KEY=your_openai_api_key

Run

# Development mode (all packages)
pnpm dev

# Or individually
cd packages/api && pnpm dev   # API: http://localhost:3000
cd packages/web && pnpm dev   # Web: http://localhost:5173

📑 API Reference

Query Cache

curl -X POST http://localhost:3000/api/cache/query \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_API_KEY" \
  -d '{"query": "What is TypeScript?", "threshold": 0.85}'

Store Response

curl -X POST http://localhost:3000/api/cache/store \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_API_KEY" \
  -d '{"query": "What is TypeScript?", "response": "TypeScript is..."}'

Get Statistics

curl http://localhost:3000/api/cache/stats \
  -H "x-api-key: YOUR_API_KEY"
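
The same endpoints can be combined into a query-then-store flow from TypeScript. The sketch below uses only the paths, header, and request fields shown in the curl examples. The response body is not documented in this README, so the `QueryResult` shape (`hit`, `response`) is an assumption, and `cachedAnswer`, `FetchLike`, and `callLLM` are hypothetical names introduced for illustration.

```typescript
// Minimal fetch-like signature so the sketch has no dependencies.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// ASSUMPTION: hypothetical response shape; check the API documentation for the real one.
type QueryResult = { hit?: boolean; response?: string };

async function cachedAnswer(
  fetchImpl: FetchLike,
  baseUrl: string,
  apiKey: string,
  query: string,
  callLLM: (q: string) => Promise<string>,
): Promise<{ answer: string; cached: boolean }> {
  // Shared POST helper using the headers from the curl examples.
  const post = async (path: string, payload: object): Promise<unknown> => {
    const res = await fetchImpl(`${baseUrl}${path}`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', 'x-api-key': apiKey },
      body: JSON.stringify(payload),
    });
    if (!res.ok) throw new Error(`${path} failed with HTTP ${res.status}`);
    return res.json();
  };

  // 1. Ask the cache first.
  const result = (await post('/api/cache/query', { query, threshold: 0.85 })) as QueryResult;
  if (result.hit && typeof result.response === 'string') {
    return { answer: result.response, cached: true };
  }

  // 2. On a miss, call the model and store the answer for next time.
  const answer = await callLLM(query);
  await post('/api/cache/store', { query, response: answer });
  return { answer, cached: false };
}
```

On Node.js 18+ the global `fetch` should fit the `FetchLike` parameter, for example `cachedAnswer(fetch, 'http://localhost:3000', apiKey, question, askModel)`, where `askModel` stands for whatever function calls your LLM.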

📖 Full API Documentation

🐳 Production Deployment

Docker

docker-compose up -d

Kubernetes

kubectl apply -f deploy/kubernetes/

Terraform (AWS)

cd deploy/terraform/aws
terraform init && terraform apply

🚀 Deployment Guide

📁 Project Structure

distributed-semantic-cache/
├── packages/
│   ├── api/           # Fastify REST API server
│   ├── sdk/           # TypeScript SDK for developers
│   ├── web/           # React demo application
│   └── shared/        # Shared types and utilities
├── deploy/
│   ├── kubernetes/    # K8s manifests
│   ├── terraform/     # Infrastructure as code
│   └── nginx/         # Reverse proxy config
└── docs/
    ├── architecture/  # System design docs
    ├── guides/        # User guides
    └── business/      # Strategy docs

📄 License

This project is licensed under the MIT License - see LICENSE for details.

Free to use, modify, and distribute for any purpose.

📚 Documentation

| Document | Description |
|----------|-------------|
| SDK Documentation | TypeScript SDK reference |
| Quick Start Guide | Get running in 5 minutes |
| Architecture | System design overview |
| Security Guide | Production hardening |
| Examples | Integration patterns |

🧪 Testing

# Run all tests
pnpm test

# Run SDK tests
cd packages/sdk && pnpm test

# Run API tests
cd packages/api && pnpm test

220+ tests passing across all packages.

🤝 Contributing

See CONTRIBUTING.md for development guidelines.

📞 Support

Have questions or need help?

  • 📝 Open an Issue for bugs or feature requests
  • 💬 Discussions for questions and ideas
  • ⭐ Star this repo if you find it useful!

Reduce LLM costs. Improve performance. Ship faster.

Built with ❤️ for the AI community
