YouTube RAG Chatbot API

A FastAPI-based Retrieval-Augmented Generation (RAG) chatbot that extracts insights from YouTube videos. Load any YouTube video and ask questions about its content using local LLMs served by Ollama.

Features

YouTube Transcript Extraction: Automatically fetches transcripts from YouTube videos
Multi-language Support: Handles English and Hindi transcripts and auto-detects other languages
Intelligent Translation: Uses a lightweight Ollama model (qwen2.5:1.5b) to translate non-English transcripts to English
Vector Database: Stores embeddings using ChromaDB for efficient semantic search
Local LLM: Uses Ollama for both embeddings and generation (no external LLM API calls)
Caching: In-memory caching of loaded videos and answered questions to avoid repeated processing
RAG Pipeline: Retrieves relevant context from transcripts and generates accurate answers

Quick Start

Prerequisites

Python 3.8+
Ollama installed and running, with the following models pulled:
- qwen2.5:1.5b (for translation)
- qwen2.5:7b (for question answering)

Installation

Clone the repository:

git clone https://github.com/GuyOnAKeyboard/YouTube-RAG-Chatbot-API-with-FastAPI-ChromaDB-Local-LLM-Ollama-.git
cd YouTube-RAG-Chatbot-API-with-FastAPI-ChromaDB-Local-LLM-Ollama-

Create a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Start the FastAPI server:

python main.py

The API will be available at http://localhost:8000

API Endpoints

1. Load Video

POST /load_video

Load a YouTube video for analysis.

Request:

{
  "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
}

Response:

{
  "video_id": "hash_of_url",
  "cached": false
}
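
As an illustration of calling this endpoint from Python, here is a minimal client sketch using only the standard library. It assumes the server is running at the default address from Quick Start; the helper names `build_load_video_request` and `load_video` are hypothetical and not part of the project.

```python
import json
import urllib.request

# Assumption: the server runs at the default address from Quick Start.
BASE_URL = "http://localhost:8000"


def build_load_video_request(video_url: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /load_video request with a JSON body."""
    body = json.dumps({"url": video_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/load_video",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def load_video(video_url: str, base_url: str = BASE_URL) -> dict:
    """Send the request and return the parsed JSON response."""
    request = build_load_video_request(video_url, base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `load_video("https://www.youtube.com/watch?v=dQw4w9WgXcQ")` should return a dictionary with the `video_id` and `cached` fields shown above. Keep the returned `video_id` for later calls to /ask.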

2. Ask Question

POST /ask

Ask a question about a loaded video's content.

Request:

{
  "video_id": "hash_of_url",
  "question": "What is the main topic discussed?"
}

Response:

{
  "answer": "The video discusses...",
  "cached": false
}
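
The `cached` field indicates whether the answer came from the in-memory cache. The sketch below shows one way such a cache could behave; it is not the project's actual implementation. The `AnswerCache` class, the `answer_fn` callable standing in for the RAG pipeline, and the question normalization are all assumptions made for illustration.

```python
from typing import Callable, Dict, Tuple


class AnswerCache:
    """In-memory answer cache keyed by (video_id, normalized question)."""

    def __init__(self, answer_fn: Callable[[str, str], str]) -> None:
        # answer_fn is a hypothetical stand-in for the real RAG pipeline.
        self._answer_fn = answer_fn
        self._store: Dict[Tuple[str, str], str] = {}

    def ask(self, video_id: str, question: str) -> dict:
        # Assumption: questions are matched after trimming and lowercasing.
        key = (video_id, question.strip().lower())
        if key in self._store:
            return {"answer": self._store[key], "cached": True}
        answer = self._answer_fn(video_id, question)
        self._store[key] = answer
        return {"answer": answer, "cached": False}
```

In this sketch the first call for a given video and question runs the pipeline and reports `"cached": false`; an identical follow-up question is served from memory and reports `"cached": true`. Because the store is a plain dictionary, it is emptied whenever the process restarts.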

Architecture

YouTube Video
    ↓
[Extract Transcript]
    ↓
[Translate to English] ← Uses Ollama qwen2.5:1.5b
    ↓
[Split & Chunk Text]
    ↓
[Generate Embeddings] ← Uses Ollama
    ↓
[Store in ChromaDB]
    ↓
[RAG Query Pipeline]
    ↓
Answer Generation ← Uses Ollama qwen2.5:7b
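
To make the "Split & Chunk Text" step concrete, here is a minimal sketch of overlapping fixed-size chunking in plain Python. The project lists LangChain among its technologies and may use one of its text splitters with different settings; the `chunk_text` function and its default sizes are assumptions for illustration only.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into character windows that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far each new window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval. Each chunk would then be embedded and stored in ChromaDB.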

Components

agents.py: RAG pipeline logic, video loading, document processing
main.py: FastAPI endpoints and request handling

Key Notes

Translation Behavior

Translation runs only when a transcript is in a language other than English
The system uses Ollama (qwen2.5:1.5b) to translate non-English transcripts to English for better LLM compatibility
If translation fails or is unavailable, the system falls back to the original transcript, so a failed translation does not break the pipeline
Translation quality depends on the Ollama model used
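
The fallback behavior described above can be sketched as follows. This is an illustration under assumptions, not the code in agents.py: `to_english` is a hypothetical helper, and `translate_fn` stands in for whatever call sends the transcript to the qwen2.5:1.5b model.

```python
from typing import Callable


def to_english(transcript: str, language_code: str,
               translate_fn: Callable[[str], str]) -> str:
    """Return an English transcript, or the original one if translation fails."""
    # English transcripts (e.g. "en", "en-US") are passed through untouched.
    if language_code.lower().startswith("en"):
        return transcript
    try:
        translated = translate_fn(transcript)
    except Exception:
        # Model unavailable, timeout, etc.: fall back to the original text.
        return transcript
    # Treat an empty result as a failed translation as well.
    return translated if translated and translated.strip() else transcript
```

Passing the translator in as a parameter keeps the fallback logic testable without a running Ollama instance.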

Future Updates & Improvements

Multiple LLM Models
- Support for additional Ollama models
- Model selection via API parameter
Advanced Caching
- Persistent caching (Redis/PostgreSQL)
- Cache TTL management
- Cache statistics and monitoring
Enhanced Translation
- Support for more languages
- Improved translation accuracy
- Language-specific prompts
Vector DB Improvements
- Metadata filtering (video title, date, speaker)
- Hybrid search (keyword + semantic)
- Database persistence across sessions
API Enhancements
- User authentication and API keys
- Rate limiting
- Request logging and analytics
- WebSocket support for real-time streaming responses
UI/Frontend
- Web interface for easy video upload and querying
- Chat history and saved conversations
- Export answers to PDF/Markdown
Quality Improvements
- Unit and integration tests
- Error handling improvements
- Structured logging
- Docker containerization
Scalability
- Async request processing (Celery/RQ)
- Horizontal scaling support
- Load balancing
- Queue-based video processing

License

This project is licensed under the MIT License - see the LICENSE file for details.

You are free to:

Use the project for personal or commercial purposes
Modify and distribute the code
Use privately or publicly

Just include the original license notice in distributions.

Commercial Use Attribution

If you are earning money from any commercial use or deployment of this project, you must provide clear attribution to the original creator. Please mention the original repository and author in your application's documentation, credits, or appropriate attribution section.

Technologies Used

FastAPI - Modern web framework
LangChain - LLM orchestration and RAG
ChromaDB - Vector database
Ollama - Local LLM runtime
youtube-transcript-api - YouTube transcript extraction
Pydantic - Data validation

Contributing

Contributions are welcome! Feel free to open issues or submit pull requests.

Support

For questions or issues, please open an issue on the GitHub repository.
