Battle-tested caching strategies that reduce LLM costs by up to 95% in production systems.
```bash
npm install llm-cache-patterns
# or
pip install llm-cache-patterns
```

```javascript
import { SemanticCache } from 'llm-cache-patterns';

const cache = new SemanticCache({
  threshold: 0.95, // Minimum similarity for a cache hit
  ttl: 3600,       // 1 hour TTL (seconds)
  provider: 'redis'
});

// `openai.complete` stands in for your LLM client call.
// Before: ~$0.10 per request, every request hits the API
const uncached = await openai.complete(prompt);

// After: ~$0.005 average per request at a 95% cache hit rate
const response = await cache.get(prompt, () => openai.complete(prompt));
```

| Company | Cost before | Cost after | Savings | Cache hit rate |
|---|---|---|---|---|
| FinTech Startup | $45K/mo | $3K/mo | 93% | 89% |
| E-commerce Platform | $120K/mo | $15K/mo | 87% | 76% |
| SaaS Analytics | $28K/mo | $1.8K/mo | 94% | 91% |
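The per-request figures in the quick start follow from a blended-cost calculation: only cache misses pay the API price. Here is a minimal Python sketch, assuming cache hits cost nothing. It ignores cache infrastructure and, for semantic caching, embedding calls, and the function names are hypothetical.

```python
def blended_cost_per_request(api_cost: float, hit_rate: float, hit_cost: float = 0.0) -> float:
    """Average cost per request when a fraction `hit_rate` is served from cache.

    Assumes each miss costs `api_cost` and each hit costs `hit_cost`
    (0 by default, i.e. cache storage and lookups are treated as free).
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return (1.0 - hit_rate) * api_cost + hit_rate * hit_cost


def savings_fraction(before: float, after: float) -> float:
    """Fraction of spend saved, e.g. monthly cost before vs. after caching."""
    return (before - after) / before


# Quick-start numbers: $0.10 per API call at a 95% hit rate
print(round(blended_cost_per_request(0.10, 0.95), 4))  # 0.005
# FinTech row of the table: $45K/mo -> $3K/mo
print(round(savings_fraction(45_000, 3_000), 2))  # 0.93
```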
Exact-match caching is simple but effective for repeated queries: a response is reused only when the prompt is identical to one seen before.
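To make the idea concrete, here is a minimal in-memory sketch in plain Python (not the library's implementation). Prompts are normalized only by trimming surrounding whitespace and hashed with SHA-256. Once `max_size` entries are stored, the least recently used one is evicted. The class and method names are hypothetical.

```python
import hashlib
from collections import OrderedDict
from typing import Callable


class ExactMatchLRU:
    """Hypothetical exact-match cache: same prompt text -> same cached response."""

    def __init__(self, max_size: int = 10_000):
        self.max_size = max_size
        self._store: "OrderedDict[str, str]" = OrderedDict()

    @staticmethod
    def key(prompt: str) -> str:
        # Exact match: only surrounding whitespace is ignored.
        return hashlib.sha256(prompt.strip().encode("utf-8")).hexdigest()

    def get(self, prompt: str, compute: Callable[[], str]) -> str:
        k = self.key(prompt)
        if k in self._store:
            self._store.move_to_end(k)  # mark as most recently used
            return self._store[k]
        value = compute()  # cache miss: call the LLM
        self._store[k] = value
        if len(self._store) > self.max_size:
            self._store.popitem(last=False)  # evict the least recently used entry
        return value
```

An LRU bound keeps memory predictable; a shared store such as Redis would replace the `OrderedDict` when several processes need the same cache.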
```javascript
const exactCache = new ExactMatchCache({
  provider: 'memory', // or 'redis', 'dynamodb'
  maxSize: 10000
});
```

Semantic caching uses embeddings to match similar queries, so a paraphrased prompt can reuse a cached response.
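Conceptually, a semantic cache embeds each prompt and compares that embedding with the embeddings of previously cached prompts. It returns the stored response when cosine similarity reaches the threshold. The sketch below uses a plain-Python linear scan. `embed` is a hypothetical function you supply, in practice backed by an embedding model such as the one configured below. At scale, a vector database would replace the scan. This is an illustration of the technique, not the library's code.

```python
import math
from typing import Callable, List, Optional, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCacheSketch:
    """Hypothetical semantic cache: a close-enough prompt reuses a cached response."""

    def __init__(self, embed: Callable[[str], Vector], threshold: float = 0.95):
        self.embed = embed          # assumed: prompt -> embedding vector
        self.threshold = threshold  # minimum cosine similarity for a hit
        self._entries: List[Tuple[Vector, str]] = []

    def _nearest(self, query: Vector) -> Optional[str]:
        best_score, best_value = -1.0, None
        for vec, value in self._entries:  # linear scan; a vector DB would index this
            score = cosine(query, vec)
            if score > best_score:
                best_score, best_value = score, value
        return best_value if best_score >= self.threshold else None

    def get(self, prompt: str, compute: Callable[[], str]) -> str:
        query = self.embed(prompt)
        cached = self._nearest(query)
        if cached is not None:
            return cached
        value = compute()  # miss: call the LLM and remember the embedding
        self._entries.append((query, value))
        return value
```

A high threshold such as 0.95 trades hit rate for safety: lowering it serves more requests from cache but raises the chance of returning an answer to a question that was only superficially similar.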
```javascript
const semanticCache = new SemanticCache({
  model: 'text-embedding-ada-002',
  threshold: 0.95,
  vectorDB: 'pinecone' // or 'weaviate', 'qdrant'
});
```

Template caching stores responses for prompts built from named templates.
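One way to cache templated prompts is to key the cache on the template name plus its variable values, rather than on the rendered text. Two requests that fill the same template with the same values then share one entry. The plain-Python sketch below makes its own assumptions, not the library's: values are normalized by trimming whitespace, and the class name is hypothetical. The templates mirror the configuration below.

```python
from typing import Callable, Dict, Tuple

TEMPLATES = {
    "user-query": "Answer this question about {topic}: {question}",
    "summary": "Summarize this {type} document: {content}",
}


class TemplateCacheSketch:
    def __init__(self, templates: Dict[str, str]):
        self.templates = templates
        self._store: Dict[Tuple, str] = {}

    def get(self, name: str, variables: Dict[str, str], llm: Callable[[str], str]) -> str:
        # Normalize values, then key on (template name, sorted variables);
        # sorting makes the key independent of argument order.
        normalized = {k: v.strip() for k, v in variables.items()}
        key = (name, tuple(sorted(normalized.items())))
        if key not in self._store:
            prompt = self.templates[name].format(**normalized)
            self._store[key] = llm(prompt)  # miss: render and call the LLM
        return self._store[key]
```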
```javascript
const templateCache = new TemplateCache({
  templates: {
    'user-query': 'Answer this question about {topic}: {question}',
    'summary': 'Summarize this {type} document: {content}'
  }
});
```

Sliding-window caching is time-based, with sliding expiration.
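Sliding expiration means each read pushes an entry's expiry forward. Frequently used entries therefore stay cached while idle ones age out. The minimal sketch below uses an injectable clock so it can be tested. It does not model the bucketing or the per-window token budget shown in the configuration below, and the names are hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class SlidingExpiryCache:
    def __init__(self, window_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.window = window_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        now = self.clock()
        if now >= expires_at:
            del self._store[key]  # idle longer than the window: expired
            return None
        self._store[key] = (value, now + self.window)  # slide the expiry forward
        return value

    def set(self, key: str, value: str) -> None:
        self._store[key] = (value, self.clock() + self.window)
```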
```javascript
const slidingCache = new SlidingWindowCache({
  windowSize: 3600, // 1 hour
  bucketSize: 60,   // 1-minute buckets
  maxTokensPerWindow: 1000000
});
```

Hierarchical caching uses multiple levels (hot, warm, cold), each with its own TTL and size, for different query types.
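Below is a minimal sketch of a multi-level lookup. It rests on design choices that are our assumptions, not necessarily the library's:

- Writes go to every level.
- Reads check levels from hot to cold.
- A hit in a lower level is promoted back into the hot level.
- Each level evicts its least recently written entry when full.

TTLs and sizes mirror the hot/warm/cold configuration below. The clock is injectable for testing, and all names are hypothetical.

```python
import time
from collections import OrderedDict
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Level:
    name: str
    ttl: float  # seconds
    size: int   # max entries
    store: "OrderedDict[str, tuple]" = field(default_factory=OrderedDict)


class HierarchicalCacheSketch:
    def __init__(self, levels: List[Level], clock: Callable[[], float] = time.monotonic):
        self.levels = levels  # ordered hot -> cold
        self.clock = clock

    def _put(self, level: Level, key: str, value: str) -> None:
        level.store[key] = (value, self.clock() + level.ttl)
        level.store.move_to_end(key)
        if len(level.store) > level.size:
            level.store.popitem(last=False)  # evict the least recently written entry

    def get(self, key: str) -> Optional[str]:
        now = self.clock()
        for i, level in enumerate(self.levels):
            entry = level.store.get(key)
            if entry is None:
                continue
            value, expires_at = entry
            if now >= expires_at:
                del level.store[key]  # expired at this level; try the next one
                continue
            if i > 0:
                self._put(self.levels[0], key, value)  # promote to the hot level
            return value
        return None

    def set(self, key: str, value: str) -> None:
        for level in self.levels:  # write through to every level
            self._put(level, key, value)
```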
```javascript
const hierarchicalCache = new HierarchicalCache({
  levels: [
    { name: 'hot', ttl: 300, size: 1000 },     // 5 min
    { name: 'warm', ttl: 3600, size: 10000 },  // 1 hour
    { name: 'cold', ttl: 86400, size: 100000 } // 1 day
  ]
});
```

Request deduplication prevents duplicate requests while an identical one is still in flight.
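In-flight deduplication is often called the single-flight pattern. The first caller for a key starts the request, and concurrent callers await the same result. Below is a minimal asyncio sketch. The names are hypothetical, and it omits the 30-second timeout window from the configuration below.

```python
import asyncio
from typing import Awaitable, Callable, Dict


class SingleFlight:
    def __init__(self) -> None:
        self._inflight: Dict[str, asyncio.Future] = {}

    async def request(self, key: str, fn: Callable[[], Awaitable[str]]) -> str:
        task = self._inflight.get(key)
        if task is None:
            # First caller: start the real request and register it.
            task = asyncio.ensure_future(fn())
            self._inflight[key] = task
            # Forget the key once the request finishes, so later calls start fresh.
            task.add_done_callback(lambda _t: self._inflight.pop(key, None))
        # Every caller, first or concurrent, awaits the same task.
        return await task
```

Deduplication complements caching: the cache helps once a response exists, while single-flight covers the window before the first response arrives.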
```javascript
const deduper = new RequestDeduplicator({
  timeout: 30000 // 30-second window (ms)
});

// Multiple simultaneous requests for the same prompt:
// only one API call is made
const [r1, r2, r3] = await Promise.all([
  deduper.request(prompt, () => llm.complete(prompt)),
  deduper.request(prompt, () => llm.complete(prompt)),
  deduper.request(prompt, () => llm.complete(prompt))
]);
```

Cache warming preloads the cache with common queries.
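Warming amounts to running known queries through the normal fill path ahead of traffic, in batches so the provider is not flooded. Below is a minimal sketch under several assumptions: the query file is a JSON array of prompt strings, `cache` is any dict-like store, and `fetch` is a hypothetical async LLM call. Scheduling, such as the cron expression below, is left to your job runner.

```python
import asyncio
import json
from typing import Awaitable, Callable, Dict, List


def load_queries(path: str) -> List[str]:
    # Assumed file format: a JSON array of prompt strings.
    with open(path, encoding="utf-8") as f:
        return json.load(f)


async def warm_cache(
    cache: Dict[str, str],
    queries: List[str],
    fetch: Callable[[str], Awaitable[str]],
    batch_size: int = 100,
) -> int:
    """Fill `cache` for every query not already present; return how many were fetched."""
    missing = [q for q in dict.fromkeys(queries) if q not in cache]  # dedupe, keep order
    for start in range(0, len(missing), batch_size):
        batch = missing[start:start + batch_size]
        # Fetch one batch concurrently, then move on to the next.
        results = await asyncio.gather(*(fetch(q) for q in batch))
        cache.update(zip(batch, results))
    return len(missing)
```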
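```javascript
const warmer = new CacheWarmer({
  schedule: '0 6 * * *', // cron: daily at 6 AM
  queries: './common-queries.json',
  batchSize: 100
});

await warmer.warmCache(cache);
```

Smart invalidation sets the TTL by matching each prompt against pattern rules, so volatile data expires quickly while stable data is kept longer.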
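Rule-based TTLs can be sketched as an ordered list of (pattern, TTL) pairs. The first matching pattern wins, and a default applies otherwise. The rules mirror the configuration below. The one-hour default and the function name are assumptions.

```python
import re

# (compiled pattern, TTL in seconds); order matters: first match wins.
RULES = [
    (re.compile(r"stock price", re.IGNORECASE), 60),      # 1 minute for stock data
    (re.compile(r"weather", re.IGNORECASE), 1800),        # 30 min for weather
    (re.compile(r"definition", re.IGNORECASE), 604800),   # 1 week for definitions
]
DEFAULT_TTL = 3600  # assumed fallback: 1 hour


def ttl_for(prompt: str, rules=RULES, default: int = DEFAULT_TTL) -> int:
    """Return the TTL (seconds) of the first rule whose pattern matches the prompt."""
    for pattern, ttl in rules:
        if pattern.search(prompt):
            return ttl
    return default
```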
```javascript
const invalidator = new SmartInvalidator({
  rules: [
    { pattern: /stock price/i, ttl: 60 },     // 1 minute for stock data
    { pattern: /weather/i, ttl: 1800 },       // 30 min for weather
    { pattern: /definition/i, ttl: 604800 }   // 1 week for definitions
  ]
});
```

Cache monitoring tracks cache performance in production.
```javascript
const monitor = new CacheMonitor({
  metrics: ['hitRate', 'latency', 'savings'],
  dashboard: 'grafana',
  alerts: {
    hitRate: { below: 0.7 }, // alert when the hit rate drops below 70%
    latency: { above: 100 }
  }
});
```

- Hit Rate: Percentage of requests served from cache
- Latency: Cache vs API response times
- Cost Savings: Actual dollars saved
- Token Usage: Cached vs fresh tokens
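The four metrics above can be computed from a per-request log. Below is a minimal sketch with several assumptions: each record notes whether it was a cache hit, its latency, and its token count, and every hit avoids one API call at a flat price you supply. The 0.7 alert threshold mirrors the monitor configuration, and all names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class RequestRecord:
    from_cache: bool
    latency_ms: float
    tokens: int


def cache_metrics(records: List[RequestRecord], cost_per_api_call: float) -> Dict[str, float]:
    hits = [r for r in records if r.from_cache]
    misses = [r for r in records if not r.from_cache]
    return {
        "hit_rate": len(hits) / len(records) if records else 0.0,
        "cache_latency_ms": mean(r.latency_ms for r in hits) if hits else 0.0,
        "api_latency_ms": mean(r.latency_ms for r in misses) if misses else 0.0,
        "savings_usd": len(hits) * cost_per_api_call,  # each hit avoided one paid call
        "cached_tokens": sum(r.tokens for r in hits),
        "fresh_tokens": sum(r.tokens for r in misses),
    }


def should_alert(metrics: Dict[str, float], min_hit_rate: float = 0.7) -> bool:
    return metrics["hit_rate"] < min_hit_rate
```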
```javascript
// pages/api/ai-complete.js
import { SemanticCache } from 'llm-cache-patterns';

// `openai` is your LLM client; `openai.complete` is a placeholder call.
const cache = new SemanticCache({
  provider: process.env.REDIS_URL
});

export default async function handler(req, res) {
  const { prompt } = req.body;

  const cached = await cache.get(prompt, async () => {
    return await openai.complete({ prompt });
  });

  res.json({
    response: cached.value,
    cached: cached.fromCache,
    savings: cached.fromCache ? '$0.02' : '$0.00'
  });
}
```

```python
from fastapi import FastAPI
from llm_cache_patterns import SemanticCache
import openai

app = FastAPI()

cache = SemanticCache(
    provider="redis",
    threshold=0.95
)

@app.post("/complete")
async def complete(prompt: str):
    result = await cache.get(
        prompt,
        # Legacy (openai<1.0) Completion API; adapt to your SDK version and model.
        lambda: openai.Completion.create(prompt=prompt)
    )
    return {
        "response": result.value,
        "cached": result.from_cache,
        "latency": result.latency_ms
    }
```

- PII Handling: Never cache personally identifiable information (PII)
- Encryption: Use encryption at rest for sensitive domains
- Access Control: Namespace cache keys (for example, per tenant or user) so one caller cannot read another's cached responses
- Audit Logging: Track cache access for compliance
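Two of these practices, namespacing and keeping PII out of the cache, can both be applied when the cache key is built. The minimal sketch below scopes each key to a tenant and refuses to cache prompts that match simple PII patterns. The regexes are illustrative assumptions and fall far short of complete PII detection. The function name is hypothetical.

```python
import hashlib
import re
from typing import Optional

# Illustrative patterns only: email addresses and US-style SSNs.
PII_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
]


def cache_key(tenant_id: str, prompt: str) -> Optional[str]:
    """Return a tenant-scoped cache key, or None if the prompt should not be cached."""
    if any(p.search(prompt) for p in PII_PATTERNS):
        return None  # skip caching prompts that look like they contain PII
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return f"{tenant_id}:{digest}"
```

Hashing the prompt also keeps raw prompt text out of key names, which tend to show up in logs and monitoring tools.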
- Blog: Why Your AI Agent Needs a Cache
- Video: Implementing Semantic Cache
- Benchmarks: Cache Provider Comparison
We welcome contributions! See CONTRIBUTING.md for guidelines.
```bash
npm test
npm run benchmark
```

- @BinaryBourbon - Author
- @contributor1 - Redis optimizations
- @contributor2 - Python bindings
MIT License - see LICENSE for details.
Built with ❤️ by BinaryBourbon | Star on GitHub