SmartDoc RAG

Intelligent Document Q&A System: combines Standard RAG and CoRAG (Corrective RAG) with a local LLM served through Ollama/ngrok.

Tech Stack

Layer	Technology
Frontend	React 19 + Vite + Zustand
Backend	FastAPI + LangChain
Vector DB	FAISS (local) + BM25 (hybrid search)
Embeddings	`paraphrase-multilingual-mpnet-base-v2` (768-dim)
OCR	EasyOCR
LLM	Ollama (`qwen2.5:7b`)
Web Search	Tavily API (fallback: DuckDuckGo)
CoRAG Eval	CrossEncoder `ms-marco-MiniLM-L-6-v2`
History	SQLite

Installation

Requirements

Python 3.8+
Node.js 18+
Ollama installed and running

1. Install Ollama and pull the model

# Download Ollama from https://ollama.ai, then pull the model
ollama pull qwen2.5:7b

2. Create the `.env` file

OLLAMA_BASE_URL=http://localhost:11434/v1
OLLAMA_MODEL=qwen2.5:7b
TAVILY_API_KEY=tvly-xxxxxxxxxxxx        # Get one at tavily.com (optional)

# Tuning (optional; defaults apply when omitted)
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
TOP_K_RETRIEVAL=5
RELEVANCE_THRESHOLD=0.35
RETRIEVAL_MODE=hybrid                   # hybrid | vector
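
To make CHUNK_SIZE and CHUNK_OVERLAP concrete, here is a minimal character-level sketch of overlapping chunking. The function name `split_with_overlap` is hypothetical, and the backend's actual splitter is not shown in this README, so it may behave differently (for example by splitting on separators). The sketch only illustrates what the two numbers mean.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into character windows; consecutive windows share `chunk_overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

With the defaults, each window advances by 800 characters, so the last 200 characters of one chunk are repeated at the start of the next.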

Run the LLM on Google Colab (optional)

If your local machine does not have enough VRAM, you can run Ollama on a Google Colab T4 GPU and expose it through ngrok.

Architecture

Colab (GPU T4)               Local machine
┌───────────────────┐        ┌──────────────────────┐
│  Ollama            │        │  FastAPI backend      │
│  qwen2.5:7b        │◄───────│  (uvicorn :8000)      │
│  + ngrok tunnel    │        │  React frontend       │
└───────────────────┘        │  (vite :5173)         │
                             └──────────────────────┘

Colab notebook

Step 1: Select the T4 GPU runtime: Runtime → Change runtime type → Hardware accelerator: T4 GPU → Save.

Step 2: Create a new notebook at colab.research.google.com and run the cells in order:

# Cell 1: install Ollama + ngrok
!apt-get install -y zstd -q
!curl -fsSL https://ollama.com/install.sh | sh
!pip install pyngrok -q

# Cell 2: start the Ollama server (in a background thread so the cell does not block)
import os, threading, subprocess, time

def run_ollama():
    os.environ["OLLAMA_HOST"] = "0.0.0.0:11434"
    subprocess.run(["ollama", "serve"])

threading.Thread(target=run_ollama, daemon=True).start()
time.sleep(5)  # give the server a few seconds to start
print("Ollama server is ready!")

# Cell 3: pull the model (takes a few minutes the first time)
!ollama pull qwen2.5:7b

# Cell 4: verify that Ollama is running
!curl -s http://localhost:11434/api/tags | python3 -c "import sys,json; print('Models:', [m['name'] for m in json.load(sys.stdin)['models']])"

# Cell 5: create the ngrok tunnel and print the URL
from pyngrok import ngrok
ngrok.set_auth_token("<YOUR_NGROK_TOKEN>")  # Get one at ngrok.com/dashboard
tunnel = ngrok.connect(11434, "http")
print("NGROK_LLM_URL =", tunnel.public_url)
# → Copy this URL into .env on the local machine

Update the local `.env`

After Cell 5 prints the URL, update the .env file:

NGROK_LLM_URL=https://xxxx.ngrok-free.app   # URL from Cell 5
OLLAMA_MODEL=qwen2.5:7b

Notes:

The ngrok URL changes every time Colab restarts (free tier), so .env must be updated each time.

For a fixed URL, use the free static domain available in the ngrok dashboard.

Colab free disconnects automatically after about 90 minutes without interaction; keep the tab active or use Colab Pro.

3. Backend

# Create a virtual environment
python -m venv venv
source venv/bin/activate        # Linux/Mac
venv\Scripts\activate           # Windows

# Install dependencies
pip install -r requirements.txt

# Start the server
uvicorn app:app --reload --host 0.0.0.0 --port 8000

First run: three ML models are downloaded automatically when first needed:

MPNet embeddings (~450 MB): on the first upload or query

EasyOCR (~120 MB): on the first upload of an image or scanned PDF

CrossEncoder (~80 MB): on the first CoRAG query

4. Frontend

cd frontend
npm install
npm run dev
# Open http://localhost:5173

Usage Guide

Upload documents

Drag and drop files onto the "Upload tài liệu" (Upload documents) area in the left sidebar, or click it to choose files.
Supported formats: PDF, DOCX, PNG, JPG, JPEG, BMP, WEBP, TIFF.
To select several files at once, hold Ctrl/Cmd while choosing, or drag and drop several files. The system processes them in parallel and shows separate progress for each file.
When the progress bar shows "Xong" (Done), the file is ready for questions.

Ask a question

Type your question into the input box at the bottom of the screen.
Press Enter to send (Shift+Enter for a new line).
The system runs two pipelines concurrently:
- RAG: searches the documents and answers quickly.
- CoRAG: re-evaluates relevance; if it is insufficient, it rewrites the query, retrieves again, or adds a web search.
Results are shown with source citations (file name, page number, excerpt).
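
The concurrent execution of the two pipelines can be sketched with `asyncio`. This is only an illustration under assumptions: `run_both` and the `Pipeline` type are hypothetical names, and the real backend streams progress over SSE rather than returning a list at the end.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# Hypothetical shape of a pipeline: takes a question, returns an answer.
Pipeline = Callable[[str], Awaitable[str]]


async def run_both(question: str, rag: Pipeline, corag: Pipeline) -> List[Dict[str, str]]:
    """Run both pipelines concurrently and tag each answer with its source."""

    async def tagged(source: str, pipeline: Pipeline) -> Dict[str, str]:
        return {"source": source, "answer": await pipeline(question)}

    # gather() schedules both coroutines at once and keeps the argument order.
    return list(await asyncio.gather(tagged("rag", rag), tagged("corag", corag)))
```

Because both coroutines are awaited together, the total latency is roughly that of the slower pipeline instead of the sum of the two.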

Continuous conversation (Conversational RAG)

The system automatically remembers the questions and answers within the same chat session. You can ask follow-up questions without restating the context:

You: What is this document about?
Bot: [overview answer]

You: Summarize the first section
Bot: [answers correctly without the document name being repeated]

You: What about the next section?
Bot: [continues from the previous context]

By default the system uses the 4 most recent Q&A pairs as context. This can be changed through the history_window field in the API.
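
A minimal sketch of how a history window of this kind can be applied before building the prompt. The helper names `recent_history` and `format_history` and the "User:/Assistant:" layout are assumptions for illustration; only the default of 4 pairs and the `history_window` field come from this README.

```python
from typing import List, Tuple

QAPair = Tuple[str, str]  # (question, answer)


def recent_history(pairs: List[QAPair], history_window: int = 4) -> List[QAPair]:
    """Keep only the most recent `history_window` Q&A pairs, oldest first."""
    if history_window <= 0:
        return []
    return pairs[-history_window:]


def format_history(pairs: List[QAPair]) -> str:
    """Render the kept pairs as plain text that can be prepended to a prompt."""
    return "\n".join(f"User: {q}\nAssistant: {a}" for q, a in pairs)
```

A smaller window keeps prompts short; a larger one lets follow-up questions refer further back at the cost of more tokens per request.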

Manage chat sessions

The left sidebar lists all saved chat sessions:

Action	How to do it
Create a new session	Click the "+ Mới" (New) button at the top of the list
Review an old session	Click the session name; its history reloads immediately
Delete a session	Click × to the right of the session name, then confirm
Clear the current messages	Click "Xóa lịch sử" (Clear history) in the chat header, then confirm

The active session is highlighted. History is stored in SQLite (data/history.db) and persists across restarts.

Delete indexed data

To delete all uploaded and indexed documents (FAISS + BM25):

Scroll to the bottom of the left sidebar.
Click "Xóa Vector Store" (Clear Vector Store), then confirm.
Upload the documents again to continue.

Advanced Features

Hybrid Search (BM25 + FAISS)

By default the system uses RETRIEVAL_MODE=hybrid, which combines semantic search (FAISS) and keyword search (BM25) using Reciprocal Rank Fusion:

Semantic search (FAISS): understands the meaning of the question; works well for paraphrased questions.
Keyword search (BM25): matches exact keywords; works well for proper names, codes, and technical terms.

To return to pure semantic search, set RETRIEVAL_MODE=vector in .env and restart the server.

Note: a FAISS index built before hybrid mode was enabled has no BM25 corpus. With hybrid set, that older data automatically falls back to vector-only retrieval. For full hybrid search, clear the Vector Store and upload the documents again.
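
Reciprocal Rank Fusion itself is simple enough to sketch in a few lines. Each chunk gets a score of 1 / (k + rank) from every ranking it appears in, and the scores are summed. The function below is an illustration, not the code in core/retriever.py: the name `reciprocal_rank_fusion`, the string chunk IDs, and the alphabetical tie-break are assumptions. The default k = 60 matches the RRF_K setting.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]], k: int = 60, top_n: int = 5
) -> List[str]:
    """Fuse several ranked lists of chunk IDs into one list using RRF."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID to keep the output stable.
    ordered = sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))
    return ordered[:top_n]


faiss_hits = ["c1", "c2", "c3"]  # hypothetical semantic ranking
bm25_hits = ["c3", "c1", "c4"]   # hypothetical keyword ranking
fused = reciprocal_rank_fusion([faiss_hits, bm25_hits])
```

Chunks found by both retrievers ("c1" and "c3" here) end up ahead of chunks found by only one, which is the behavior hybrid mode relies on.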

CoRAG Pipeline

Query
  |
  v
[Retrieve] Hybrid search (FAISS + BM25) top-5
  |
  v
[Evaluate] Cross-encoder score
  |
  +-- score >= threshold --> [Generate] LLM --> Answer
  |
  +-- score < threshold --> [Rewrite query] --> [Re-retrieve]
                                  |
                                  +-- score >= threshold --> [Generate]
                                  |
                                  +-- score < threshold --> [Web Search]
                                                                |
                                                                v
                                                          [Generate] LLM (doc + web) --> Answer
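
The branching in the diagram can be sketched as plain control flow with the individual stages injected as callables. Everything here is hypothetical: the function name `corag_answer`, the callable signatures, and the choices to score the rewritten query and to generate from the original question are assumptions, not the contents of features/corag/corag_chain.py.

```python
from typing import Callable, List, Tuple


def corag_answer(
    query: str,
    retrieve: Callable[[str], List[str]],
    score: Callable[[str, List[str]], float],
    rewrite: Callable[[str], str],
    web_search: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
    threshold: float = 0.35,
) -> Tuple[str, str]:
    """Return (answer, route), where route is 'direct', 'rewritten', or 'web'."""
    docs = retrieve(query)
    if score(query, docs) >= threshold:
        return generate(query, docs), "direct"

    # First fallback: rewrite the query and retrieve again.
    new_query = rewrite(query)
    docs = retrieve(new_query)
    if score(new_query, docs) >= threshold:
        return generate(query, docs), "rewritten"

    # Second fallback: add web results to the retrieved document context.
    web_docs = web_search(new_query)
    return generate(query, docs + web_docs), "web"
```

Passing the stages in as callables keeps the control flow testable without a model, an index, or network access.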

Directory Structure

smartdoc-rag/
├── app.py
├── .env
├── requirements.txt
├── core/
│   ├── document_loader.py    # PDF / DOCX / images + EasyOCR
│   ├── embeddings.py         # MPNet singleton
│   ├── vector_store.py       # FAISS persist/load + sync BM25 corpus
│   ├── bm25_retriever.py     # BM25Okapi, corpus pickle, tokenize
│   ├── retriever.py          # Hybrid RRF / pure-vector dispatch
│   ├── llm.py                # Ollama client + anti-hallucination prompt
│   ├── rag_chain.py          # Standard RAG + conversational history
│   └── history_store.py      # SQLite: chat_history + chat_sessions
├── features/
│   ├── citation_tracker.py
│   └── corag/
│       ├── evaluator.py      # Cross-encoder context eval
│       ├── rewriter.py       # Query rewrite
│       ├── web_search.py     # Tavily + DuckDuckGo
│       └── corag_chain.py    # CoRAG pipeline + conversational history
├── api/
│   ├── schemas.py
│   └── routes/
│       ├── upload.py         # POST /api/upload — multi-file SSE
│       ├── query.py          # POST /api/query  — SSE
│       └── history.py        # GET/DELETE sessions & history
└── frontend/
    └── src/
        ├── components/
        │   ├── Sidebar.jsx               # Upload + sessions + docs
        │   ├── ConfirmDialog.jsx
        │   ├── upload/DropZone.jsx       # Multi-file drag-drop
        │   ├── upload/ProgressStepper.jsx
        │   ├── chat/ChatPanel.jsx
        │   ├── chat/MessageBubble.jsx
        │   ├── chat/CitationCard.jsx
        │   └── query/QueryProgress.jsx
        ├── store/chatStore.js
        └── services/api.js

API Endpoints

Method	Path	Description
POST	`/api/upload`	Upload one or more files (SSE stream)
POST	`/api/query`	Ask a question (SSE stream)
GET	`/api/sessions`	List chat sessions
DELETE	`/api/sessions/{id}`	Delete a chat session
GET	`/api/history`	Question history of a session
DELETE	`/api/history`	Delete the messages of a session
DELETE	`/api/vectorstore`	Delete the entire FAISS + BM25 index
GET	`/api/stats`	Current number of vectors
GET	`/api/health`	Health check
GET	`/docs`	Swagger UI

SSE Event Format

Upload (step field):

reading_file → ocr → ocr_done → chunking → chunking_done → indexing → done

Each event also carries filename, file_index, and total_files, so that events can be told apart when several files are uploaded.

Query (step field, with source: "rag" | "corag"):

retrieval → retrieval_done → [evaluating → evaluation_done]
→ [rewriting_query → re_retrieval → re_evaluating → re_evaluation_done]
→ [web_search → web_search_done]
→ generating → answer
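
A client can consume these streams by decoding one event at a time. The sketch below assumes each event is sent as one or more `data:` lines holding a JSON object and terminated by a blank line, which is the usual SSE framing; the README does not show the exact wire format, and `iter_sse_events` is a hypothetical helper.

```python
import json
from typing import Dict, Iterable, Iterator, List


def iter_sse_events(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield one decoded JSON object per SSE event from an iterable of text lines."""
    data_parts: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            if value.startswith(" "):
                value = value[1:]  # SSE drops one optional space after the colon
            data_parts.append(value)
        elif line == "" and data_parts:
            # A blank line ends the current event.
            yield json.loads("\n".join(data_parts))
            data_parts = []
        # Other lines (comments starting with ":", "event:", "id:") are ignored here.
    if data_parts:  # stream ended without a trailing blank line
        yield json.loads("\n".join(data_parts))
```

Each yielded dictionary can then be dispatched on its step value, and for queries on its source value, to update the matching progress indicator.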

Environment Variables

Variable	Default	Description
`OLLAMA_BASE_URL`	`http://localhost:11434/v1`	Ollama API endpoint
`OLLAMA_MODEL`	`qwen2.5:7b`	Model name
`TAVILY_API_KEY`	(none)	Web search (optional)
`EMBEDDING_MODEL`	`sentence-transformers/paraphrase-multilingual-mpnet-base-v2`	MPNet model
`FAISS_INDEX_PATH`	`./data/faiss_index`	Directory where FAISS is stored
`BM25_CORPUS_PATH`	`./data/bm25_corpus.pkl`	BM25 corpus file
`RETRIEVAL_MODE`	`hybrid`	`hybrid` or `vector`
`CHUNK_SIZE`	`1000`	Chunk size
`CHUNK_OVERLAP`	`200`	Overlap between chunks
`TOP_K_RETRIEVAL`	`5`	Number of chunks returned per query
`RELEVANCE_THRESHOLD`	`0.35`	Threshold that triggers the CoRAG fallback
`HISTORY_DB_PATH`	`./data/history.db`	SQLite path
`RRF_K`	`60`	RRF fusion constant
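
As an illustration of how these variables and their defaults fit together, here is a small loader that reads a subset of them from the environment. The `Settings` dataclass and `load_settings` function are hypothetical and not taken from the project; the variable names and default values are the ones listed in the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    ollama_base_url: str
    ollama_model: str
    retrieval_mode: str
    chunk_size: int
    chunk_overlap: int
    top_k_retrieval: int
    relevance_threshold: float
    rrf_k: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (os.environ by default), applying documented defaults."""
    env = os.environ if env is None else env
    mode = env.get("RETRIEVAL_MODE", "hybrid")
    if mode not in ("hybrid", "vector"):
        raise ValueError(f"RETRIEVAL_MODE must be 'hybrid' or 'vector', got {mode!r}")
    return Settings(
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434/v1"),
        ollama_model=env.get("OLLAMA_MODEL", "qwen2.5:7b"),
        retrieval_mode=mode,
        chunk_size=int(env.get("CHUNK_SIZE", "1000")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "200")),
        top_k_retrieval=int(env.get("TOP_K_RETRIEVAL", "5")),
        relevance_threshold=float(env.get("RELEVANCE_THRESHOLD", "0.35")),
        rrf_k=int(env.get("RRF_K", "60")),
    )
```

Calling `load_settings({})` returns the documented defaults, which makes it easy to check a configuration without touching the real environment.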
