RAG-powered document Q&A that lets you upload PDF documents and ask natural-language questions about their content. It uses hybrid search (semantic + BM25) over a FAISS vector index to retrieve relevant chunks, then generates grounded answers using a locally running LLM.
- PDF Ingestion - Upload PDFs; their text is extracted and embedded into a vector index for retrieval.
- Hybrid Search - Combines semantic and keyword-based retrieval for better coverage across query types.
- Streaming Q&A - Retrieved context is passed to a local LLM and answers are streamed token-by-token.
- Conversation Memory - Chat history is persisted per session for coherent multi-turn conversations.
- Document Management - List, delete, or clear documents from the UI.
- Ingest - PDF text is extracted and split into chunks, then embedded and indexed for search.
- Retrieve - The question is run against both the semantic and keyword indexes to find the most relevant chunks.
- Generate - Retrieved context is passed to a local LLM, which produces a grounded answer.
- Stream - The answer is streamed token-by-token to the frontend in real time.
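The Ingest and Retrieve steps above can be sketched in plain Python. This is a minimal illustration, not the project's code: the README does not say how text is chunked or how semantic and BM25 results are merged, so fixed-size character windows with overlap and reciprocal rank fusion (RRF) are assumptions. The names `split_into_chunks` and `reciprocal_rank_fusion`, the default sizes, and the chunk ids are all hypothetical.

```python
from collections import defaultdict


def split_into_chunks(text, chunk_size=500, overlap=50):
    """Split text into overlapping character windows (assumed strategy)."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end, to avoid a redundant tail chunk.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


def reciprocal_rank_fusion(rankings, k=60, top_n=5):
    """Merge ranked lists of chunk ids into one ranking (assumed strategy).

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1, so chunks ranked well by both retrievers win.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk id for determinism.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [chunk_id for chunk_id, _ in ordered[:top_n]]


# Hypothetical retriever outputs: chunk ids ordered best-first.
semantic_hits = ["c3", "c1", "c7"]  # stand-in for FAISS vector search results
keyword_hits = ["c1", "c9", "c3"]   # stand-in for BM25 results

fused = reciprocal_rank_fusion([semantic_hits, keyword_hits], top_n=3)
print(fused)  # ['c1', 'c3', 'c9']
```

RRF only needs ranks, not raw scores, which sidesteps the problem that vector distances and BM25 scores live on different scales. A weighted sum of normalized scores would be another reasonable choice.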
- Python 3.11 (Conda recommended)
- CUDA-capable GPU with at least 6 GB VRAM (required for both the embeddings and the LLM)
- Redis running locally (for Celery broker + conversation memory)
- Node.js 18+ (for the frontend)
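A quick way to confirm the prerequisites above before setup is a small preflight script. The sketch below is an assumption about what is worth checking, not part of the project: it only looks for executables on `PATH` (`redis-server`, `node`, and `nvidia-smi` as a rough proxy for a CUDA driver) and does not verify the Node.js version, VRAM size, or that Redis is actually running. The function name `check_prerequisites` is hypothetical.

```python
import shutil
import sys


def check_prerequisites():
    """Return a mapping of prerequisite name to whether it looks available."""
    return {
        "python 3.11": sys.version_info[:2] == (3, 11),
        "redis-server on PATH": shutil.which("redis-server") is not None,
        "node on PATH": shutil.which("node") is not None,
        # Presence of nvidia-smi suggests an NVIDIA driver; it says nothing about VRAM.
        "nvidia-smi on PATH": shutil.which("nvidia-smi") is not None,
    }


for name, ok in check_prerequisites().items():
    print(f"{'ok' if ok else 'missing':8}{name}")
```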
- Clone the repository
    git clone https://github.com/Choco-10/RAG_Project.git
    cd RAG_Project
- Create the Conda environment
    cd server
    conda env create -f environment.yml
    conda activate rag-faiss
- Start Redis
    redis-server
- Start the FastAPI server
    uvicorn app.main:app --reload --port 8000
  The server will be available at http://127.0.0.1:8000. Open http://127.0.0.1:8000/docs for the interactive Swagger UI.
- Start the Celery worker (in a separate terminal)
    cd server
    celery -A app.celery_worker:celery_app worker --loglevel=info --pool=solo
cd client
npm install
npm run dev
The frontend will be available at http://localhost:5173.
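Once the FastAPI server is running, its endpoints can also be listed from a script instead of the Swagger UI. The sketch below assumes the server keeps FastAPI's default schema URL, `/openapi.json`; the helper names `list_routes` and `fetch_spec` are hypothetical, and no specific endpoint paths are assumed, since this README does not list them.

```python
import json
from urllib.request import urlopen

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_routes(spec):
    """Return sorted (METHOD, path) pairs from an OpenAPI schema dict."""
    routes = []
    for path, operations in spec.get("paths", {}).items():
        for method in operations:
            # Path items can also hold non-method keys such as "parameters".
            if method.lower() in HTTP_METHODS:
                routes.append((method.upper(), path))
    return sorted(routes)


def fetch_spec(base_url="http://127.0.0.1:8000"):
    """Download the schema; assumes FastAPI's default /openapi.json URL."""
    with urlopen(f"{base_url}/openapi.json", timeout=5) as response:
        return json.load(response)
```

With the server up, `for method, path in list_routes(fetch_spec()): print(method, path)` prints one line per endpoint. If the request fails, check that the uvicorn process from the setup steps is still running on port 8000.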