A unified API gateway for multiple Large Language Model (LLM) providers, providing a single endpoint to access various AI models from Gemini, Groq, Mistral, and Nvidia.
- Unified API: Single `/v1/chat/completions` endpoint for all supported providers
- Intelligent Routing: Automatically routes requests to the appropriate provider based on model name
- Response Caching: Redis-based caching to reduce latency and API costs
- Authentication: Secure API key-based authentication
- Request Logging: PostgreSQL database for tracking usage and costs
- Rate Limiting: Built-in middleware for request throttling (configurable)
- Docker Support: Easy deployment with Docker Compose
| Provider | Models | Base URL |
|---|---|---|
| Gemini | gemini-1.5-flash, gemini-1.5-pro | Google AI |
| Groq | llama-3.1-70b, mixtral-8x7b, gemma-7b | Groq Cloud |
| Mistral | mistral-large, mistral-medium | Mistral AI |
| Nvidia | deepseek-v3, various Nvidia models | Nvidia AI |
- Docker and Docker Compose
- API keys for desired providers
git clone <your-repo-url>
cd llm-gateway

Create a .env file in the root directory:
# Provider API Keys
GEMINI_API_KEY=your_gemini_api_key
GROQ_API_KEY=your_groq_api_key
MISTRAL_API_KEY=your_mistral_api_key
NVIDIA_API_KEY=your_nvidia_api_key
# Gateway Configuration
GATEWAY_API_KEY=your_gateway_secret_key

docker-compose up --build

The gateway will be available at http://localhost:8000
pip install -r requirements.txt

# Start Redis (if not using Docker)
redis-server
# Start PostgreSQL (if not using Docker)
# Configure your local database
# Run the application
python main.py

All requests require the x-gateway-key header:
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "x-gateway-key: your_gateway_secret_key" \
-d '{
"model": "gemini-1.5-flash",
"messages": [
{"role": "user", "content": "Hello, world!"}
]
}'

The gateway accepts standard OpenAI-compatible chat completion requests:
{
"model": "gemini-1.5-flash",
"messages": [
{"role": "user", "content": "Your message here"}
],
"temperature": 0.7,
"max_tokens": 100
}

The gateway automatically routes based on model names:
- gemini-* → Gemini API
- llama-*, mixtral-*, gemma-* → Groq API
- mistral-* → Mistral API
- nvidia-*, deepseek-* → Nvidia API
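The prefix rules above can be sketched as a small lookup function. This is a minimal illustration, not the gateway's actual code: the `ROUTES` table, the provider labels, and the `resolve_provider` name are assumptions made for the example, and the real logic in main.py may be structured differently.

```python
from typing import Optional

# Prefix-to-provider table mirroring the routing rules listed above.
# The provider labels are illustrative; the real keys in main.py may differ.
ROUTES = {
    "gemini-": "gemini",
    "llama-": "groq",
    "mixtral-": "groq",
    "gemma-": "groq",
    "mistral-": "mistral",
    "nvidia-": "nvidia",
    "deepseek-": "nvidia",
}


def resolve_provider(model: str) -> Optional[str]:
    """Return the provider label for a model name, or None if no prefix matches."""
    name = model.lower()
    for prefix, provider in ROUTES.items():
        if name.startswith(prefix):
            return provider
    return None
```

Returning `None` for an unknown model leaves the choice of error response to the caller. Note that `mixtral-` and `mistral-` are distinct prefixes, so `mixtral-8x7b` resolves to Groq while `mistral-large` resolves to Mistral.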
┌─────────────────┐ ┌─────────────────┐
│ Client Apps │────│ LLM Gateway │
└─────────────────┘ └─────────────────┘
│
┌───────────────┼───────────────┐
│ │ │
┌───────▼──────┐ ┌──────▼──────┐ ┌─────▼─────┐
│ Redis │ │ PostgreSQL │ │ Providers │
│ Cache │ │ Logs │ │ APIs │
└──────────────┘ └─────────────┘ └───────────┘
CREATE TABLE users (
id SERIAL PRIMARY KEY,
api_key VARCHAR(255) UNIQUE NOT NULL,
name VARCHAR(100),
balance_usd DECIMAL(10, 4) DEFAULT 0.0
);

CREATE TABLE request_logs (
id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES users(id),
provider VARCHAR(50),
model_name VARCHAR(100),
prompt_tokens INTEGER,
completion_tokens INTEGER,
total_cost DECIMAL(10, 6),
status_code INTEGER,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

| Variable | Description | Required |
|---|---|---|
| GEMINI_API_KEY | Google Gemini API key | No |
| GROQ_API_KEY | Groq API key | No |
| MISTRAL_API_KEY | Mistral AI API key | No |
| NVIDIA_API_KEY | Nvidia API key | No |
| GATEWAY_API_KEY | Gateway authentication key | Yes |
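One way to enforce the required/optional split in the table at startup is sketched below. The `load_config` function and the shape of its return value are hypothetical and not taken from the gateway's source; only the variable names come from the table.

```python
import os

# Optional provider keys from the table above; any subset may be set.
PROVIDER_KEYS = (
    "GEMINI_API_KEY",
    "GROQ_API_KEY",
    "MISTRAL_API_KEY",
    "NVIDIA_API_KEY",
)


def load_config(env=None) -> dict:
    """Read gateway settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    gateway_key = env.get("GATEWAY_API_KEY")
    if not gateway_key:
        # GATEWAY_API_KEY is the only variable marked as required.
        raise RuntimeError("GATEWAY_API_KEY is required")
    # Keep only the provider keys that are set and non-empty.
    provider_keys = {name: env[name] for name in PROVIDER_KEYS if env.get(name)}
    return {"gateway_key": gateway_key, "provider_keys": provider_keys}
```

Failing fast on a missing gateway key avoids starting a service that would reject or, worse, accept every request. Providers without a key are simply left out of the result.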
- Cache TTL: 1 hour (3600 seconds)
- Cache key: SHA256 hash of model + messages
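A cache key of this kind could be derived as follows. This is a sketch under assumptions: the source only states that the key is a SHA256 hash of model + messages, so the canonical JSON serialization and the `cache_key` name are choices made for the example.

```python
import hashlib
import json


def cache_key(model: str, messages: list) -> str:
    """Return a hex SHA256 digest over the model name and message list."""
    # Sorted keys and fixed separators give a stable serialization, so
    # equal inputs always hash to the same key (an assumed convention).
    payload = json.dumps(
        {"model": model, "messages": messages},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

If the key really covers only the model and messages, two requests that differ only in `temperature` or `max_tokens` would map to the same cache entry, which is worth keeping in mind when relying on cached responses.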
llm-gateway/
├── main.py # FastAPI application and routing logic
├── middleware.py # Authentication and rate limiting
├── database.py # Database schema and initialization
├── requirements.txt # Python dependencies
├── Dockerfile # Container configuration
├── docker-compose.yml # Multi-service setup
└── README.md # This file
- Add provider configuration to the `PROVIDERS` dict in `main.py`
- Implement routing logic in the gateway endpoint
- Update model routing conditions
- Add environment variable for API key
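The steps above might look roughly like this in code. The README does not show the structure of `PROVIDERS`, so the entry shape (`api_key_env`, `prefixes`), the `register_provider` helper, and the `example` provider are all hypothetical and serve only to illustrate the idea.

```python
# Hypothetical shape for a PROVIDERS entry; the real dict in main.py may differ.
PROVIDERS = {
    "mistral": {
        "api_key_env": "MISTRAL_API_KEY",  # environment variable holding the key
        "prefixes": ("mistral-",),         # model-name prefixes routed here
    },
}


def register_provider(name: str, api_key_env: str, prefixes: tuple) -> None:
    """Add a provider entry, refusing to overwrite an existing one."""
    if name in PROVIDERS:
        raise ValueError(f"provider {name!r} is already registered")
    PROVIDERS[name] = {"api_key_env": api_key_env, "prefixes": tuple(prefixes)}


# Illustrative only: "example" and EXAMPLE_API_KEY are placeholders.
register_provider("example", "EXAMPLE_API_KEY", ("example-",))
```

Keeping the key's environment variable name and the routing prefixes in one entry means a new provider touches a single place in the configuration.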
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
For issues and questions, please open a GitHub issue or contact the maintainers.