A production-style Retrieval-Augmented Generation (RAG) API built with FastAPI.
This project combines secure JWT authentication, vector search with pgvector, Hugging Face embeddings, and Groq LLMs to deliver context-aware answers from your own data.
- Last Updated: 15-04-2026
- Python Version: 3.12
- JWT-based authentication (HS256)
- Protected endpoints using Bearer tokens
- Environment-based credentials
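For illustration, a minimal sketch of how HS256 token issuing and verification can be wired into FastAPI with `python-jose` and environment-based credentials. The variable names (`JWT_SECRET_KEY`, `API_USERNAME`, `API_PASSWORD`) are assumptions, not the project's actual configuration:

```python
import os
from datetime import datetime, timedelta, timezone

from fastapi import Depends, FastAPI, HTTPException
from fastapi.security import OAuth2PasswordBearer, OAuth2PasswordRequestForm
from jose import JWTError, jwt

app = FastAPI()

# Signing key and credentials come from the environment, never from source code.
SECRET_KEY = os.environ["JWT_SECRET_KEY"]                 # assumed variable name
API_USERNAME = os.environ.get("API_USERNAME", "admin")    # assumed demo defaults
API_PASSWORD = os.environ.get("API_PASSWORD", "change-me")
ALGORITHM = "HS256"

oauth2_scheme = OAuth2PasswordBearer(tokenUrl="token")


@app.post("/token")
def issue_token(form: OAuth2PasswordRequestForm = Depends()):
    # Check the submitted credentials against the environment-based ones.
    if form.username != API_USERNAME or form.password != API_PASSWORD:
        raise HTTPException(status_code=401, detail="Invalid credentials")
    expires = datetime.now(timezone.utc) + timedelta(minutes=30)
    token = jwt.encode(
        {"sub": form.username, "exp": int(expires.timestamp())},
        SECRET_KEY,
        algorithm=ALGORITHM,
    )
    return {"access_token": token, "token_type": "bearer"}


def get_current_user(token: str = Depends(oauth2_scheme)) -> str:
    # Decode and validate the Bearer token; protected endpoints depend on this.
    try:
        payload = jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
        return payload["sub"]
    except JWTError:
        raise HTTPException(status_code=401, detail="Invalid or expired token")
```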
- Ingests `.txt` documents from URLs
- Splits content into topic-based chunks
- Generates embeddings using Hugging Face
- Stores vectors in PostgreSQL (`pgvector`)
- Retrieves relevant context for queries
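The README does not show the exact splitting logic, so here is a simplified sketch of topic-based chunking under the assumption that blank-line-separated paragraphs mark topic boundaries and are greedily packed up to a size cap:

```python
import re


def split_into_chunks(text: str, max_chars: int = 1200) -> list[str]:
    """Split cleaned text into topic-ish chunks (illustrative only)."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        # Start a new chunk when adding this paragraph would exceed the cap.
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "Topic one.\n\nMore on topic one.\n\nTopic two starts here."
    print(split_into_chunks(sample, max_chars=40))
```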
- Model: `llama-3.1-8b-instant`
- Context-aware answer generation
- Structured prompting for grounded responses
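A minimal sketch of grounded answer generation with the Groq Python client. The system prompt wording and the `generate_answer` helper are illustrative assumptions; only the model name comes from this README:

```python
import os

from groq import Groq  # Groq's official Python client

client = Groq(api_key=os.environ["GROQ_API_KEY"])

SYSTEM_PROMPT = (
    "Answer ONLY from the provided context. "
    "If the context does not contain the answer, say you don't know."
)


def generate_answer(question: str, context_chunks: list[str]) -> str:
    # Structured prompt: retrieved chunks go into a clearly delimited context
    # block, which keeps the model grounded in the ingested documents.
    context = "\n\n---\n\n".join(context_chunks)
    response = client.chat.completions.create(
        model="llama-3.1-8b-instant",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
        temperature=0.2,
    )
    return response.choices[0].message.content
```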
- Query → embedding
- Top-K similarity search via `pgvector`
- Distance operator: `<->` (equivalent to cosine ranking for normalized embeddings)
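A sketch of the top-K query, assuming `psycopg2`, the `documents` table described below, and column names (`content`, `source_url`, `embedding`) that match the stored fields; the DSN is a placeholder:

```python
import psycopg2


def retrieve_top_k(query_embedding: list[float], k: int = 5) -> list[dict]:
    """Return the k chunks closest to the query embedding.

    Ordering by the `<->` distance operator; for normalized embeddings this
    produces the same ranking as cosine similarity.
    """
    conn = psycopg2.connect("postgresql://user:pass@localhost:5432/ragdb")  # assumed DSN
    try:
        with conn.cursor() as cur:
            cur.execute(
                """
                SELECT content, source_url, embedding <-> %s::vector AS distance
                FROM documents
                ORDER BY distance
                LIMIT %s
                """,
                (str(query_embedding), k),  # pgvector accepts '[x, y, ...]' text input
            )
            rows = cur.fetchall()
    finally:
        conn.close()
    return [{"content": r[0], "source_url": r[1], "distance": r[2]} for r in rows]
```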
Stores:
- Document content
- Embeddings (384-dim vectors)
- Source URL
- Metadata
- Timestamp
Optimizations:
- `VECTOR(384)` column
- `ivfflat` index for fast retrieval
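A possible schema matching the stored fields and optimizations above. The exact column names, index parameters, and the `init_db` helper are assumptions for illustration:

```python
import psycopg2

DDL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS documents (
    id          BIGSERIAL PRIMARY KEY,
    content     TEXT NOT NULL,            -- document chunk text
    embedding   VECTOR(384) NOT NULL,     -- all-MiniLM-L6-v2 output size
    source_url  TEXT,
    metadata    JSONB DEFAULT '{}'::jsonb,
    created_at  TIMESTAMPTZ DEFAULT now()
);

-- Approximate nearest-neighbour index for fast similarity search.
CREATE INDEX IF NOT EXISTS documents_embedding_idx
    ON documents USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);
"""


def init_db(dsn: str) -> None:
    # Run the schema setup; safe to call repeatedly thanks to IF NOT EXISTS.
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(DDL)
```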
- FastAPI `BackgroundTasks`
- Async ingestion pipeline
- Non-blocking embedding + DB insert
- `/debug/retrieve` → test retrieval without the LLM
- Console logging for inspection
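A sketch of how the non-blocking ingest endpoint can be wired with FastAPI's `BackgroundTasks`; the request model and the `ingest_documents` worker are illustrative assumptions:

```python
from fastapi import BackgroundTasks, FastAPI
from pydantic import BaseModel

app = FastAPI()


class IngestRequest(BaseModel):
    urls: list[str]


def ingest_documents(urls: list[str]) -> None:
    # Runs after the response is sent: download, chunk, embed and insert each URL.
    for url in urls:
        print(f"[ingest] processing {url}")  # console logging for inspection
        # download -> clean -> chunk -> embed -> INSERT INTO documents ...


@app.post("/ingest")
def ingest(request: IngestRequest, background_tasks: BackgroundTasks):
    # The heavy work is queued, so the endpoint returns immediately.
    background_tasks.add_task(ingest_documents, request.urls)
    return {"status": "accepted", "urls": len(request.urls)}
```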
| Method | Endpoint | Description |
|---|---|---|
| POST | `/token` | Get JWT access token |
| POST | `/ask` | Ask questions (RAG-powered) 🔐 |
| POST | `/ingest` | Ingest `.txt` files from URLs |
| GET | `/debug/retrieve` | Debug semantic search |

🔐 = Requires authentication
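A quick client-side example of calling the API with `requests`. The credentials and the exact request/response payload shapes are assumptions:

```python
import requests

BASE_URL = "http://127.0.0.1:8000"

# 1. Obtain a JWT access token (form-encoded, as in the OAuth2 password flow).
token_resp = requests.post(
    f"{BASE_URL}/token",
    data={"username": "admin", "password": "change-me"},  # assumed demo credentials
)
token_resp.raise_for_status()
access_token = token_resp.json()["access_token"]

# 2. Call the protected /ask endpoint with the Bearer token.
ask_resp = requests.post(
    f"{BASE_URL}/ask",
    headers={"Authorization": f"Bearer {access_token}"},
    json={"question": "What does the ingested document cover?"},  # assumed payload shape
)
print(ask_resp.json())
```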
Clone the repository and create a virtual environment:

```bash
git clone https://github.com/your-username/your-repo.git
cd your-repo
python -m venv venv
```

Activate it:

Windows (PowerShell):

```powershell
venv\Scripts\activate
```

Mac/Linux:

```bash
source venv/bin/activate
```

Install dependencies and start the server:

```bash
pip install -r requirements.txt
uvicorn main:app --reload
```

Once running:

- 🌐 API: http://127.0.0.1:8000
- 📄 Swagger Docs: http://127.0.0.1:8000/docs
Use Swagger UI to:

- Authenticate via `/token`
- Copy the JWT token
- Authorize requests

Authentication flow:

- Call `/token` with credentials
- Receive a JWT access token
- Use it in headers: `Authorization: Bearer <your_token>`

Request flow:

```
User Query
↓
Embedding (Hugging Face)
↓
pgvector Similarity Search
↓
Top-K Relevant Chunks
↓
Groq LLM (LLaMA 3.1)
↓
Final Answer + Sources
```
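A sketch of how the `/ask` endpoint could tie these steps together, reusing the helpers sketched earlier in this README (`get_current_user`, `embed_texts`, `retrieve_top_k`, `generate_answer`); those names and the response shape are assumptions, not the project's actual code:

```python
from fastapi import Depends, FastAPI
from pydantic import BaseModel

app = FastAPI()


class AskRequest(BaseModel):
    question: str


@app.post("/ask")
def ask(request: AskRequest, user: str = Depends(get_current_user)):
    # 1. Embed the query, 2. fetch the most similar chunks, 3. ask the LLM.
    query_embedding = embed_texts([request.question])[0]
    chunks = retrieve_top_k(query_embedding, k=5)
    answer = generate_answer(request.question, [c["content"] for c in chunks])
    return {
        "answer": answer,
        "sources": sorted({c["source_url"] for c in chunks if c["source_url"]}),
    }
```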
Ingestion pipeline:

- Accepts `.txt` file URLs
- Downloads and cleans content
- Splits into topic-based chunks
- Generates embeddings
- Stores results in PostgreSQL
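For the final step, a sketch of inserting chunks and their embeddings into the `documents` table with `psycopg2`; the column names and metadata shape are assumptions consistent with the schema sketch above:

```python
import json

import psycopg2


def store_chunks(
    dsn: str,
    chunks: list[str],
    embeddings: list[list[float]],
    source_url: str,
) -> None:
    """Insert each chunk and its embedding into the documents table."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        for content, embedding in zip(chunks, embeddings):
            cur.execute(
                """
                INSERT INTO documents (content, embedding, source_url, metadata)
                VALUES (%s, %s::vector, %s, %s::jsonb)
                """,
                (
                    content,
                    str(embedding),  # pgvector accepts '[x, y, ...]' text input
                    source_url,
                    json.dumps({"chunk_chars": len(content)}),  # example metadata
                ),
            )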
Embeddings:

- Model: `sentence-transformers/all-MiniLM-L6-v2`
- 384-dimensional normalized vectors
- Batch processing with retry support
- Powered by the Hugging Face Inference API
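A sketch of batched embedding with retries against the Hugging Face Inference API's feature-extraction pipeline. The endpoint URL pattern, the `HF_API_TOKEN` env var, and the explicit normalization step are assumptions made for illustration:

```python
import math
import os
import time

import requests

# Hugging Face Inference API endpoint for sentence embeddings (feature extraction).
HF_API_URL = (
    "https://api-inference.huggingface.co/pipeline/feature-extraction/"
    "sentence-transformers/all-MiniLM-L6-v2"
)
HF_HEADERS = {"Authorization": f"Bearer {os.environ['HF_API_TOKEN']}"}  # assumed env var


def _normalize(vector: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def embed_texts(texts: list[str], batch_size: int = 32, max_retries: int = 3) -> list[list[float]]:
    """Embed texts in batches, retrying transient failures (e.g. model cold starts)."""
    embeddings: list[list[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        for attempt in range(max_retries):
            resp = requests.post(HF_API_URL, headers=HF_HEADERS, json={"inputs": batch}, timeout=60)
            if resp.status_code == 200:
                embeddings.extend(_normalize(vec) for vec in resp.json())
                break
            time.sleep(2 ** attempt)  # simple exponential backoff
        else:
            raise RuntimeError(f"Embedding request failed after {max_retries} retries")
    return embeddings
```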
On application startup:
- Creates the `pgvector` extension
- Creates the `documents` table
- Builds the `ivfflat` similarity index
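One way to hook this into startup is a FastAPI lifespan handler that runs the `init_db` helper sketched earlier; the `DATABASE_URL` env var is an assumption:

```python
import os
from contextlib import asynccontextmanager

from fastapi import FastAPI

# init_db() is the schema-setup helper sketched above (extension, table, index).


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Runs once on startup, before the first request is served.
    init_db(os.environ["DATABASE_URL"])  # assumed env var
    yield  # application serves requests here


app = FastAPI(lifespan=lifespan)
```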
Document fetching:

- Fetches `.txt` files from URLs
- Validates the content type
- Cleans and normalizes text
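A minimal sketch of that fetch-and-clean step; the specific cleaning rules (line-ending normalization, whitespace collapsing) are assumptions:

```python
import re

import requests


def fetch_text_document(url: str) -> str:
    """Download a .txt file, validate its content type, and clean the text."""
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()

    content_type = resp.headers.get("content-type", "")
    if "text/plain" not in content_type:
        raise ValueError(f"Expected a plain-text document, got {content_type!r}")

    text = resp.text
    text = text.replace("\r\n", "\n")          # normalize line endings
    text = re.sub(r"[ \t]+", " ", text)        # collapse runs of spaces/tabs
    text = re.sub(r"\n{3,}", "\n\n", text)     # cap consecutive blank lines
    return text.strip()
```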
- 🔄 Refresh tokens
- 📊 Admin dashboard
- 🔍 Hybrid search (BM25 + vector)
- 📈 Monitoring & logging
- 🧩 Plugin/tool integrations
- 🗂️ Splitting `app.py` into separate modules organized in folders for a cleaner project structure
MIT License
This project is designed as a clean, production-style RAG backend and can be extended into:
- Chatbots
- Internal knowledge systems
- AI assistants
- Document search platforms
Happy coding :-)