
Zem-0/Multi-Provider-LLM-Inference-Gateway


LLM Gateway

A unified API gateway for multiple Large Language Model (LLM) providers. It exposes a single OpenAI-compatible endpoint for models from Gemini, Groq, Mistral, and Nvidia.

Features

  • Unified API: Single /v1/chat/completions endpoint for all supported providers
  • Automatic Routing: Sends each request to the appropriate provider based on the model name
  • Response Caching: Redis-based caching to reduce latency and API costs
  • Authentication: API key authentication via the x-gateway-key header
  • Request Logging: Per-request token usage and cost recorded in PostgreSQL
  • Rate Limiting: Middleware for configurable request throttling
  • Docker Support: Easy deployment with Docker Compose
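The README does not describe how the rate-limiting middleware works. One common way to throttle per API key is a token bucket; the sketch below assumes that approach, and the class name, the capacity/refill parameters, and the injectable clock are all hypothetical rather than taken from middleware.py.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """In-memory token bucket per API key (hypothetical sketch, not the repo's middleware.py)."""

    def __init__(self, capacity: float = 10, refill_per_sec: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity              # maximum burst size
        self.refill_per_sec = refill_per_sec  # sustained requests per second
        self.clock = clock                    # injectable so tests can control time
        # api_key -> (tokens remaining, time of last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, api_key: str) -> bool:
        """Consume one token for api_key; return False if the bucket is empty."""
        now = self.clock()
        tokens, last = self._buckets.get(api_key, (self.capacity, now))
        # Refill in proportion to elapsed time, never above capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_sec)
        if tokens >= 1:
            self._buckets[api_key] = (tokens - 1, now)
            return True
        self._buckets[api_key] = (tokens, now)
        return False


# Usage with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
limiter = TokenBucketLimiter(capacity=2, refill_per_sec=1.0, clock=lambda: fake_now[0])
print([limiter.allow("key-a") for _ in range(3)])  # [True, True, False]
fake_now[0] = 1.0
print(limiter.allow("key-a"))  # True: one token refilled after 1 s
```

This state lives in one process. With several workers or replicas, the counters would need to sit in shared storage, such as the Redis instance already in the stack.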

Supported Providers & Models

Provider   Models                                  Platform
Gemini     gemini-1.5-flash, gemini-1.5-pro        Google AI
Groq       llama-3.1-70b, mixtral-8x7b, gemma-7b   Groq Cloud
Mistral    mistral-large, mistral-medium           Mistral AI
Nvidia     deepseek-v3, various Nvidia models      Nvidia AI

Quick Start

Prerequisites

  • Docker and Docker Compose
  • API keys for desired providers

1. Clone and Setup

git clone <your-repo-url>
cd llm-gateway

2. Environment Configuration

Create a .env file in the root directory:

# Provider API Keys
GEMINI_API_KEY=your_gemini_api_key
GROQ_API_KEY=your_groq_api_key
MISTRAL_API_KEY=your_mistral_api_key
NVIDIA_API_KEY=your_nvidia_api_key

# Gateway Configuration
GATEWAY_API_KEY=your_gateway_secret_key

3. Launch with Docker

docker-compose up --build

The gateway will be available at http://localhost:8000

Local Development

Install Dependencies

pip install -r requirements.txt

Run Locally

# Start Redis (if not using Docker)
redis-server

# Start PostgreSQL (if not using Docker)
# Configure your local database

# Run the application
python main.py

API Usage

Authentication

All requests require the x-gateway-key header:

curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "x-gateway-key: your_gateway_secret_key" \
  -d '{
    "model": "gemini-1.5-flash",
    "messages": [
      {"role": "user", "content": "Hello, world!"}
    ]
  }'
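Below is a minimal sketch of how a server might validate the header. It assumes the gateway compares the header against the single GATEWAY_API_KEY value. The users.api_key column in the schema suggests per-user keys may also be supported, and this sketch does not cover that. The function name is hypothetical.

```python
import hmac
from typing import Mapping, Optional


def is_authorized(headers: Mapping[str, str], expected_key: str) -> bool:
    """Return True if the x-gateway-key header matches expected_key (hypothetical helper)."""
    # HTTP header names are case-insensitive, so normalise them before lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    supplied: Optional[str] = normalised.get("x-gateway-key")
    if supplied is None or not expected_key:
        return False
    # compare_digest runs in time independent of where the strings first differ.
    return hmac.compare_digest(supplied.encode("utf-8"), expected_key.encode("utf-8"))


print(is_authorized({"X-Gateway-Key": "s3cret"}, "s3cret"))  # True
print(is_authorized({"x-gateway-key": "wrong"}, "s3cret"))   # False
print(is_authorized({}, "s3cret"))                           # False
```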

Request Format

The gateway accepts standard OpenAI-compatible chat completion requests:

{
  "model": "gemini-1.5-flash",
  "messages": [
    {"role": "user", "content": "Your message here"}
  ],
  "temperature": 0.7,
  "max_tokens": 100
}
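Groq, Mistral, and Nvidia accept OpenAI-style bodies more or less directly. Gemini's native generateContent API uses a different shape. The sketch below shows one way to map the request above onto that shape. It assumes the gateway talks to Gemini's native REST API rather than an OpenAI-compatible endpoint, and the field names (contents, parts, systemInstruction, generationConfig, maxOutputTokens) should be checked against Google's current API reference. With the native API, the model name goes in the URL path (models/{model}:generateContent), not in the body.

```python
from typing import Any, Dict, List


def openai_to_gemini(request: Dict[str, Any]) -> Dict[str, Any]:
    """Map an OpenAI-style chat request onto a Gemini generateContent body (sketch)."""
    contents: List[Dict[str, Any]] = []
    system_parts: List[Dict[str, str]] = []
    for message in request["messages"]:
        role, text = message["role"], message["content"]
        if role == "system":
            # Gemini carries system prompts separately from the conversation turns.
            system_parts.append({"text": text})
        else:
            # Gemini calls the assistant role "model"; any other role is sent as "user".
            contents.append({
                "role": "model" if role == "assistant" else "user",
                "parts": [{"text": text}],
            })
    body: Dict[str, Any] = {"contents": contents}
    if system_parts:
        body["systemInstruction"] = {"parts": system_parts}
    generation_config: Dict[str, Any] = {}
    if "temperature" in request:
        generation_config["temperature"] = request["temperature"]
    if "max_tokens" in request:
        generation_config["maxOutputTokens"] = request["max_tokens"]
    if generation_config:
        body["generationConfig"] = generation_config
    return body


body = openai_to_gemini({
    "model": "gemini-1.5-flash",
    "messages": [{"role": "user", "content": "Your message here"}],
    "temperature": 0.7,
    "max_tokens": 100,
})
print(body["generationConfig"])  # {'temperature': 0.7, 'maxOutputTokens': 100}
```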

Model Routing

The gateway selects a provider from the prefix of the model name:

  • gemini-* → Gemini API
  • llama-*, mixtral-*, gemma-* → Groq API
  • mistral-* → Mistral API
  • nvidia-*, deepseek-* → Nvidia API
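The prefix rules above can be expressed as a small lookup table. This is a sketch, not the code in main.py: the ROUTES table, the route_model name, case-insensitive matching, and raising ValueError for unknown models are all assumptions.

```python
from typing import Tuple

# Prefix -> provider, mirroring the routing rules listed above.
ROUTES: Tuple[Tuple[str, str], ...] = (
    ("gemini-", "gemini"),
    ("llama-", "groq"),
    ("mixtral-", "groq"),
    ("gemma-", "groq"),
    ("mistral-", "mistral"),
    ("nvidia-", "nvidia"),
    ("deepseek-", "nvidia"),
)


def route_model(model: str) -> str:
    """Return the provider for a model name, or raise ValueError if no prefix matches."""
    name = model.lower()  # assumption: model names are matched case-insensitively
    for prefix, provider in ROUTES:
        if name.startswith(prefix):
            return provider
    raise ValueError(f"No provider configured for model {model!r}")


print(route_model("gemma-7b"))          # groq
print(route_model("gemini-1.5-flash"))  # gemini
```

Matching on a trailing hyphen keeps similar names apart: gemma- and gemini- are different prefixes, and so are mistral- and mixtral-.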

Architecture

┌─────────────────┐    ┌─────────────────┐
│   Client Apps   │────│   LLM Gateway   │
└─────────────────┘    └─────────────────┘
                                │
                ┌───────────────┼───────────────┐
                │               │               │
        ┌───────▼──────┐ ┌──────▼──────┐ ┌──────▼─────┐
        │   Redis      │ │ PostgreSQL  │ │ Providers  │
        │   Cache      │ │   Logs      │ │  APIs      │
        └──────────────┘ └─────────────┘ └────────────┘

Database Schema

Users Table

CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    api_key VARCHAR(255) UNIQUE NOT NULL,
    name VARCHAR(100),
    balance_usd DECIMAL(10, 4) DEFAULT 0.0
);

Request Logs Table

CREATE TABLE request_logs (
    id SERIAL PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    provider VARCHAR(50),
    model_name VARCHAR(100),
    prompt_tokens INTEGER,
    completion_tokens INTEGER,
    total_cost DECIMAL(10, 6),
    status_code INTEGER,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
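A total_cost value for request_logs could be computed from the two token counts as sketched below. The per-million-token prices are placeholders, not real provider rates, and the model name "example-model" is hypothetical. Decimal is used so the value rounds cleanly to the six decimal places of DECIMAL(10, 6).

```python
from decimal import Decimal, ROUND_HALF_UP

# HYPOTHETICAL prices in USD per 1M tokens: (prompt, completion). Not real rates.
PRICES_PER_MILLION = {
    "example-model": (Decimal("0.50"), Decimal("1.50")),
}


def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Cost of one request, rounded to 6 places to fit request_logs.total_cost."""
    prompt_price, completion_price = PRICES_PER_MILLION[model]
    raw = (prompt_tokens * prompt_price + completion_tokens * completion_price) / Decimal(1_000_000)
    return raw.quantize(Decimal("0.000001"), rounding=ROUND_HALF_UP)


# 1200 * 0.50 + 350 * 1.50 = 1125.00 USD per million tokens -> 0.001125 USD
print(request_cost("example-model", 1200, 350))  # 0.001125
```

The scales differ: users.balance_usd keeps four decimal places and total_cost keeps six. Debiting each request's cost from the balance directly would therefore round away small charges. Summing costs before debiting is one way to avoid that.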

Configuration

Environment Variables

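Variable          Description                  Required
GEMINI_API_KEY    Google Gemini API key        Only for gemini-* models
GROQ_API_KEY      Groq API key                 Only for llama-*, mixtral-*, gemma-* models
MISTRAL_API_KEY   Mistral AI API key           Only for mistral-* models
NVIDIA_API_KEY    Nvidia API key               Only for nvidia-*, deepseek-* models
GATEWAY_API_KEY   Gateway authentication key   Yes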
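The table implies a simple startup check: GATEWAY_API_KEY is mandatory, and each provider is usable only when its key is set. The loader below assumes that behaviour. GatewayConfig and load_config are hypothetical names, and the repository may read these variables differently.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping

# Provider name -> environment variable, taken from the table above.
PROVIDER_KEY_VARS = {
    "gemini": "GEMINI_API_KEY",
    "groq": "GROQ_API_KEY",
    "mistral": "MISTRAL_API_KEY",
    "nvidia": "NVIDIA_API_KEY",
}


@dataclass(frozen=True)
class GatewayConfig:
    gateway_api_key: str
    enabled_providers: List[str]


def load_config(env: Mapping[str, str] = os.environ) -> GatewayConfig:
    """Fail fast if GATEWAY_API_KEY is missing; enable only providers whose key is set."""
    gateway_key = env.get("GATEWAY_API_KEY", "").strip()
    if not gateway_key:
        raise RuntimeError("GATEWAY_API_KEY must be set")
    enabled = sorted(
        name for name, var in PROVIDER_KEY_VARS.items() if env.get(var, "").strip()
    )
    return GatewayConfig(gateway_api_key=gateway_key, enabled_providers=enabled)


print(load_config({"GATEWAY_API_KEY": "s3cret", "GROQ_API_KEY": "gsk_example"}).enabled_providers)
# ['groq']
```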

Cache Configuration

  • Cache TTL: 1 hour (3600 seconds)
  • Cache key: SHA256 hash of model + messages
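The documented cache key can be sketched as follows. The request is serialised to canonical JSON so that logically equal requests hash to the same value. The "llm-cache:" prefix and the JSON canonicalisation are assumptions, not details confirmed by the README. With redis-py, the entry could be stored using `setex(key, CACHE_TTL_SECONDS, value)`.

```python
import hashlib
import json
from typing import Any, Dict, List

CACHE_TTL_SECONDS = 3600  # 1 hour, as documented above


def cache_key(model: str, messages: List[Dict[str, Any]]) -> str:
    """SHA-256 over model + messages; the key prefix is a hypothetical namespace."""
    # Sorted keys and compact separators make the serialisation deterministic.
    payload = json.dumps({"model": model, "messages": messages},
                         sort_keys=True, separators=(",", ":"))
    return "llm-cache:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


print(cache_key("gemini-1.5-flash", [{"role": "user", "content": "Hello, world!"}]))
```

The key covers only the model and messages, so requests that differ only in temperature or max_tokens would share a cached response.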

Development

Project Structure

llm-gateway/
├── main.py          # FastAPI application and routing logic
├── middleware.py    # Authentication and rate limiting
├── database.py      # Database schema and initialization
├── requirements.txt # Python dependencies
├── Dockerfile       # Container configuration
├── docker-compose.yml # Multi-service setup
└── README.md        # This file

Adding New Providers

  1. Add the provider's configuration to the PROVIDERS dict in main.py
  2. Implement the routing logic in the gateway endpoint
  3. Update the model-name routing conditions
  4. Add an environment variable for the provider's API key
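The README does not show the shape of the PROVIDERS dict. The sketch below assumes that each entry bundles the base URL, the API-key variable, and the model prefixes. With that layout, steps 1, 3, and 4 reduce to adding one entry. ProviderConfig, resolve, the "acme" provider, its URL, and ACME_API_KEY are all placeholders.

```python
from dataclasses import dataclass
from typing import Dict, Mapping, Tuple


@dataclass(frozen=True)
class ProviderConfig:
    base_url: str
    api_key_env: str                  # environment variable holding the key
    model_prefixes: Tuple[str, ...]   # model-name prefixes routed here


# HYPOTHETICAL registry entry; "acme" is a placeholder, not a real provider.
PROVIDERS: Dict[str, ProviderConfig] = {
    "acme": ProviderConfig(
        base_url="https://api.acme.example/v1",
        api_key_env="ACME_API_KEY",
        model_prefixes=("acme-",),
    ),
}


def resolve(model: str, env: Mapping[str, str]) -> Tuple[str, ProviderConfig, str]:
    """Find the provider for a model and fetch its API key from env."""
    for name, cfg in PROVIDERS.items():
        if model.startswith(cfg.model_prefixes):  # str.startswith accepts a tuple
            api_key = env.get(cfg.api_key_env)
            if not api_key:
                raise RuntimeError(f"{cfg.api_key_env} is not set")
            return name, cfg, api_key
    raise ValueError(f"No provider configured for model {model!r}")


name, cfg, key = resolve("acme-small", {"ACME_API_KEY": "example-key"})
print(name, cfg.base_url)  # acme https://api.acme.example/v1
```

Keeping routing prefixes and credentials in the same entry means a new provider cannot be routable while its key variable goes unchecked.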

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

License

MIT License

Support

For issues and questions, please open a GitHub issue or contact the maintainers.

About

A self-hosted API gateway that routes OpenAI-compatible requests across four providers (Gemini, Groq, Mistral, NVIDIA), with automatic provider translation, per-key authentication, and real-time token cost tracking in PostgreSQL.
