AI-powered OCR backend that converts handwritten notes and scanned documents into searchable, editable text — with RAG-based Q&A over your notes.
Architecture
```mermaid
graph TD
    Client["Frontend (React)"] -->|REST| API["FastAPI (server.py)"]
    API --> OCR["OCR Engine"]
    OCR -->|handwritten| TrOCR["TrOCR\n(microsoft/trocr-base-handwritten)"]
    OCR -->|printed text| Tesseract["Tesseract OCR"]
    OCR -->|fallback / low confidence| Gemini["Gemini Vision\n(gemini-2.0-flash)"]
    API --> PDF["PDF Processor\n(pypdfium2 + reportlab)"]
    API --> RAG["RAG Engine\n(Groq LLM + sentence-transformers)"]
    API --> DB[(MongoDB)]
    RAG --> DB
```
Engine fallback chain: TrOCR → Tesseract → Gemini (Gemini is triggered when confidence is below 60%).
PDF handling: text-layer extraction first; falls back to image OCR only when no selectable text exists.
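The fallback chain can be sketched as follows. This is a minimal illustration under stated assumptions, not the backend's actual code: `OcrResult`, `run_with_fallback`, and the engine callables are hypothetical, confidence is assumed to be on a 0 to 100 scale, and local engines are assumed to be tried in order with an early exit once one reaches the 60% threshold.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Threshold from the README: Gemini is used when confidence is below 60%.
CONFIDENCE_THRESHOLD = 60.0


@dataclass
class OcrResult:
    text: str
    confidence: float  # assumed to be on a 0-100 scale
    engine: str


# Hypothetical engine interface: raw image bytes in, OcrResult out.
Engine = Callable[[bytes], OcrResult]


def run_with_fallback(
    image: bytes,
    local_engines: Sequence[Engine],
    gemini: Optional[Engine] = None,
) -> OcrResult:
    """Try local engines in order (e.g. TrOCR, then Tesseract) and escalate
    to Gemini only if no local result reaches the confidence threshold."""
    best: Optional[OcrResult] = None
    for engine in local_engines:
        result = engine(image)
        if best is None or result.confidence > best.confidence:
            best = result
        if best.confidence >= CONFIDENCE_THRESHOLD:
            return best  # confident enough: skip the remaining engines
    if gemini is not None:  # Gemini needs GEMINI_API_KEY, so it may be absent
        return gemini(image)
    if best is None:
        raise ValueError("no OCR engine available")
    return best  # best low-confidence local result
```

Passing `gemini=None` models a deployment without `GEMINI_API_KEY`: the best low-confidence local result is returned instead of failing.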
Features
Multi-engine OCR — TrOCR for handwriting, Tesseract for printed text, Gemini Vision as a high-quality fallback
PDF processing — extracts embedded text layers (instant, lossless) or renders pages for OCR
Batch PDF OCR — sends all pages to Gemini in a single API call to reduce latency
Searchable PDF export — saves original image with an invisible text layer for copy-paste
Folder & note management — full CRUD with MongoDB storage
RAG Q&A — ask natural language questions over your notes (Groq LLM + local embeddings)
Semantic re-indexing — /api/rag/reindex generates embeddings for all existing notes
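The text-layer-first rule for PDFs can be sketched like this. It is a hypothetical illustration: `extract_pdf_text` and `extract_with_pypdfium2` are invented names, the decision is assumed to be made per page (the README does not say whether it is per page or per document), and the adapter assumes the pypdfium2 4.x API (`PdfDocument`, `get_textpage().get_text_range()`, `render(...).to_pil()`).

```python
from typing import Callable, List


def extract_pdf_text(
    page_count: int,
    get_text_layer: Callable[[int], str],
    ocr_page: Callable[[int], str],
) -> List[str]:
    """Use each page's embedded text layer when it has any; otherwise OCR it."""
    pages: List[str] = []
    for i in range(page_count):
        text = get_text_layer(i)
        if text.strip():
            pages.append(text)  # selectable text: instant and lossless
        else:
            pages.append(ocr_page(i))  # no text layer: render and OCR
    return pages


def extract_with_pypdfium2(
    path: str, ocr_image: Callable[[object], str], scale: float = 2.0
) -> List[str]:
    """Hypothetical adapter wiring extract_pdf_text to pypdfium2."""
    import pypdfium2 as pdfium  # imported lazily so the core logic has no deps

    pdf = pdfium.PdfDocument(path)
    try:
        def get_text_layer(i: int) -> str:
            return pdf[i].get_textpage().get_text_range()

        def ocr_page(i: int) -> str:
            # scale=2.0 renders at about 144 DPI (PDF units are 1/72 inch)
            image = pdf[i].render(scale=scale).to_pil()
            return ocr_image(image)

        return extract_pdf_text(len(pdf), get_text_layer, ocr_page)
    finally:
        pdf.close()
```

Keeping the routing logic separate from pypdfium2 lets it be tested with plain callables; `ocr_image` can be any function that accepts a PIL image and returns text.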
Tesseract (printed text): tries page segmentation modes (PSM) 3, 6, and 4 and keeps the highest-confidence result
Gemini Vision (complex layouts, mixed content): requires GEMINI_API_KEY; used as a fallback when confidence is below 60%
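Tesseract's multi-PSM strategy might look like the sketch below. `score_tesseract_data` and `best_psm_result` are hypothetical helpers, and scoring each run by mean word confidence is an assumption about how "highest-confidence" is measured. The `pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT)` call is real pytesseract API; it returns per-word `text` and `conf` lists, with `-1` for non-word layout rows. PSM 3 is fully automatic segmentation, 4 assumes a single column of text, and 6 a single uniform block.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

PSM_MODES = (3, 6, 4)  # order taken from the README


def score_tesseract_data(data: Dict[str, list]) -> Tuple[str, float]:
    """Join recognized words and average their confidences.

    Rows with conf -1 (block/paragraph/line entries) and blank words are
    ignored. Joining with spaces drops line breaks; this is a simplification.
    """
    words, confs = [], []
    for word, conf in zip(data["text"], data["conf"]):
        c = float(conf)  # conf may be str, int, or float depending on version
        if c >= 0 and str(word).strip():
            words.append(str(word))
            confs.append(c)
    mean = sum(confs) / len(confs) if confs else 0.0
    return " ".join(words), mean


def best_psm_result(
    image,
    modes: Sequence[int] = PSM_MODES,
    run: Optional[Callable[[object, int], Dict[str, list]]] = None,
) -> Tuple[str, float, Optional[int]]:
    """Run Tesseract once per PSM and keep the highest mean-confidence text."""
    if run is None:
        import pytesseract  # lazy import; requires the tesseract binary

        def run(img, psm):
            return pytesseract.image_to_data(
                img, config=f"--psm {psm}", output_type=pytesseract.Output.DICT
            )

    best_text, best_conf, best_psm = "", -1.0, None
    for psm in modes:
        text, conf = score_tesseract_data(run(image, psm))
        if conf > best_conf:
            best_text, best_conf, best_psm = text, conf, psm
    return best_text, best_conf, best_psm
```

The `run` parameter exists only so the selection logic can be exercised without a Tesseract installation.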
Running Tests
```bash
# From the repo root
pip install pytest
pytest backend/tests/ -v
```
Requires MongoDB running locally. Tests use the scribeai_test database (overriding DB_NAME).
Known Limitations
CORS is wide open (*) — tighten CORS_ORIGINS before deploying publicly
RAG similarity search fetches up to 500 notes and computes cosine similarity in Python — not suitable for large corpora; consider Atlas Vector Search for scale
TrOCR is CPU-only by default; a CUDA-enabled GPU will give ~10× speedup
Hindi PDF export font support is a stub — replace the try/pass in pdf_generator.py with a Noto Devanagari font if needed
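The Python-side similarity search described above amounts to a linear scan like the following sketch. The `top_k_notes` helper and the `embedding` field name are assumptions (`_id` is MongoDB's default key). all-MiniLM-L6-v2 produces 384-dimensional vectors, so each query costs roughly N × 384 multiply-adds plus the cost of loading N notes, which illustrates why the approach does not scale to large corpora.

```python
import heapq
import math
from typing import Any, Dict, Iterable, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # avoid division by zero for all-zero vectors
    return dot / (norm_a * norm_b)


def top_k_notes(
    query_vec: Sequence[float], notes: Iterable[Dict[str, Any]], k: int = 5
) -> List[Tuple[float, Any]]:
    """Score every note against the query and return the k best (score, _id)."""
    scored = (
        (cosine(query_vec, note["embedding"]), note["_id"])
        for note in notes
        if note.get("embedding")  # skip notes that were never indexed
    )
    # nlargest with a key never compares the _id values, so ties are safe
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])
```

If embeddings are L2-normalized at encode time (sentence-transformers' `encode(..., normalize_embeddings=True)`), cosine similarity reduces to a plain dot product, but the scan stays linear in the number of notes.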
Tech Stack
| Layer | Technology |
|---|---|
| API framework | FastAPI + Uvicorn |
| Database | MongoDB (Motor async driver) |
| OCR (handwriting) | Microsoft TrOCR (Transformers) |
| OCR (printed) | Tesseract 5 via pytesseract |
| OCR (vision LLM) | Google Gemini 2.0 Flash |
| Embeddings | sentence-transformers (all-MiniLM-L6-v2) |
| LLM for RAG | Groq (llama-3.1-8b-instant) |
| PDF read | pypdfium2 |
| PDF write | ReportLab |
| Image processing | Pillow + OpenCV |
| Validation | Pydantic v2 |
About
ScribeAI: AI OCR for handwritten + scanned documents (TrOCR + Tesseract) with RAG-based Q&A over your notes — React + FastAPI.