Skip to content

persteenolsen/fastapi-jwt-auth-rag-two

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Python + FastAPI + JWT Auth + RAG + Hugging Face embeddings

A production-style Retrieval-Augmented Generation (RAG) API built with FastAPI.
This project combines secure JWT authentication, vector search with pgvector, Hugging Face embeddings, and Groq LLMs to deliver context-aware answers from your own data.


📌 Project Info

  • Last Updated: 15-04-2026
  • Python Version: 3.12

✨ Features

🔐 Authentication

  • JWT-based authentication (HS256)
  • Protected endpoints using Bearer tokens
  • Environment-based credentials

🧠 RAG Pipeline

  • Ingests .txt documents from URLs
  • Splits content into topic-based chunks
  • Generates embeddings using Hugging Face
  • Stores vectors in PostgreSQL (pgvector)
  • Retrieves relevant context for queries

🤖 LLM Integration (Groq)

  • Model: llama-3.1-8b-instant
  • Context-aware answer generation
  • Structured prompting for grounded responses

🔎 Semantic Search

  • Query → embedding
  • Top-K similarity search via pgvector
  • Cosine distance (<->)

🗄️ Vector Database (PostgreSQL + pgvector)

Stores:

  • Document content
  • Embeddings (384-dim vectors)
  • Source URL
  • Metadata
  • Timestamp

Optimizations:

  • VECTOR(384) column
  • ivfflat index for fast retrieval

⚙️ Background Processing

  • FastAPI BackgroundTasks
  • Async ingestion pipeline
  • Non-blocking embedding + DB insert

🧪 Debug Tools

  • /debug/retrieve → test retrieval without LLM
  • Console logging for inspection

📡 API Endpoints

Method Endpoint Description
POST /token Get JWT access token
POST /ask Ask questions (RAG-powered) 🔐
POST /ingest Ingest .txt files from URLs
GET /debug/retrieve Debug semantic search

🔐 = Requires authentication


⚙️ Getting Started

1. Clone the Repository

git clone https://github.com/your-username/your-repo.git
cd your-repo

2. Create Virtual Environment

python -m venv venv

Activate it:

Windows (PowerShell):

venv\Scripts\activate

Mac/Linux:

source venv/bin/activate

3. Install Dependencies

pip install -r requirements.txt

▶️ Run the Application

uvicorn main:app --reload

Once running:

Use Swagger UI to:

  1. Authenticate via /token
  2. Copy the JWT token
  3. Authorize requests

🔑 Authentication Flow

  1. Call /token with credentials
  2. Receive JWT access token
  3. Use in headers:
Authorization: Bearer <your_token>

🧠 How RAG Works

User Query
   ↓
Embedding (Hugging Face)
   ↓
pgvector Similarity Search
   ↓
Top-K Relevant Chunks
   ↓
Groq LLM (LLaMA 3.1)
   ↓
Final Answer + Sources

📥 Document Ingestion

/ingest

  • Accepts .txt file URLs
  • Downloads and cleans content
  • Splits into topic-based chunks
  • Generates embeddings
  • Stores results in PostgreSQL

🧾 Embeddings

  • Model: sentence-transformers/all-MiniLM-L6-v2
  • 384-dimensional normalized vectors
  • Batch processing with retry support
  • Powered via Hugging Face Inference API

🗄️ Database Initialization

On application startup:

  • Creates pgvector extension
  • Creates documents table
  • Builds ivfflat similarity index

🛠️ Text Processing

  • Fetches .txt files from URLs
  • Validates content type
  • Cleans and normalizes text

📌 Future Improvements

  • 🔄 Refresh tokens
  • 📊 Admin dashboard
  • 🔍 Hybrid search (BM25 + vector)
  • 📈 Monitoring & logging
  • 🧩 Plugin/tool integrations
  • Splitting the code of the app.py into seperates files inside folders for improved structure

📄 License

MIT License


🙌 Final Notes

This project is designed as a clean, production-style RAG backend and can be extended into:

  • Chatbots
  • Internal knowledge systems
  • AI assistants
  • Document search platforms

Happy coding :-)

About

Python FastAPI with JWT Auth serving RAG using Real embeddings

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages