GitHub - abdrahman-dev/RatMD: Convert bloated PDFs into clean, AI-ready Markdown. Reduce LLM token usage with client-side parsing and tiktoken-accurate estimation.

██████╗  █████╗ ████████╗███╗   ███╗██████╗
██╔══██╗██╔══██╗╚══██╔══╝████╗ ████║██╔══██╗
██████╔╝███████║   ██║   ██╔████╔██║██║  ██║
██╔══██╗██╔══██║   ██║   ██║╚██╔╝██║██║  ██║
██║  ██║██║  ██║   ██║   ██║ ╚═╝ ██║██████╔╝
╚═╝  ╚═╝╚═╝  ╚═╝   ╚═╝   ╚═╝     ╚═╝╚═════╝

PDF to Markdown, optimized for AI — strip noise, preserve structure, and reduce token count for LLM ingestion.

✨ What is RatMD

RatMD converts PDF documents into clean, token-efficient Markdown designed for LLM workflows. PDF conversion runs entirely in your browser: no uploads, no server-side parsing, no privacy leaks. The parser extracts text from PDFs using pdfjs-dist, groups text fragments into lines, detects headings from font-size ratios, and outputs Markdown that preserves the document hierarchy.
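The line-grouping and heading-detection steps can be illustrated with a simplified sketch. It is not RatMD's parser. It assumes text fragments have already been flattened to `x`, `y`, and `fontSize`. In pdfjs-dist, `getTextContent()` items expose position through `transform[4]`/`transform[5]`, and their `height` roughly tracks font size. The y-tolerance and ratio thresholds are hypothetical values.

```typescript
// Simplified fragment (assumption): x/y would come from item.transform[4]/[5]
// in pdfjs-dist, and fontSize roughly from item.height.
interface TextFragment {
  str: string;
  x: number;
  y: number;
  fontSize: number;
}

interface Line {
  text: string;
  fontSize: number;
}

// Hypothetical tolerance: baselines closer than this (PDF units) share a line.
const Y_TOLERANCE = 2;

function groupIntoLines(fragments: TextFragment[]): Line[] {
  // PDF y grows upward, so sort descending to read top-to-bottom.
  const sorted = [...fragments].sort((a, b) => b.y - a.y);
  const groups: TextFragment[][] = [];
  for (const frag of sorted) {
    const current = groups[groups.length - 1];
    if (current && Math.abs(current[0].y - frag.y) < Y_TOLERANCE) {
      current.push(frag);
    } else {
      groups.push([frag]);
    }
  }
  return groups.map((group) => {
    group.sort((a, b) => a.x - b.x); // left-to-right within a line
    return {
      // Naive join: real PDFs may need smarter spacing between fragments.
      text: group.map((f) => f.str).join(" ").replace(/\s+/g, " ").trim(),
      fontSize: Math.max(...group.map((f) => f.fontSize)),
    };
  });
}

// Body size = the most common line font size (the document's normal text).
function bodyFontSize(lines: Line[]): number {
  const counts = new Map<number, number>();
  for (const line of lines) {
    counts.set(line.fontSize, (counts.get(line.fontSize) ?? 0) + 1);
  }
  let best = 0;
  let bestCount = -1;
  counts.forEach((count, size) => {
    if (count > bestCount) {
      best = size;
      bestCount = count;
    }
  });
  return best;
}

// Hypothetical thresholds: size relative to body text picks the heading level.
function toMarkdown(lines: Line[]): string {
  const body = bodyFontSize(lines);
  return lines
    .map((line) => {
      const ratio = line.fontSize / body;
      if (ratio >= 1.8) return `# ${line.text}`;
      if (ratio >= 1.4) return `## ${line.text}`;
      if (ratio >= 1.15) return `### ${line.text}`;
      return line.text;
    })
    .join("\n");
}

const sample: TextFragment[] = [
  { str: "Intro", x: 50, y: 700, fontSize: 24 },
  { str: "world.", x: 90, y: 650, fontSize: 12 },
  { str: "Hello", x: 50, y: 650.5, fontSize: 12 },
  { str: "More text.", x: 50, y: 630, fontSize: 12 },
];

console.log(toMarkdown(groupIntoLines(sample)));
// prints:
// # Intro
// Hello world.
// More text.
```

Using the most common font size as the body baseline keeps the ratios independent of a document's overall scale. The trade-off, as the Known Limitations section notes, is that PDFs with unusual sizing can be misclassified.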

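Token savings are real but vary by document. Heavily formatted PDFs with repeated headers, footers, and whitespace typically see 30–60% fewer tokens; plain academic papers with minimal formatting see smaller gains. The estimator counts tokens with OpenAI's cl100k_base encoding (via js-tiktoken) rather than a characters-per-token heuristic. Counts are therefore exact for models that use that encoding, such as GPT-4 and GPT-3.5-turbo, while models with other tokenizers will produce somewhat different numbers.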
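The savings percentages are relative reductions: (tokens before - tokens after) / tokens before. A minimal sketch of that comparison follows. `compareTokens` is a hypothetical helper, and the encoder is passed in as a function so the example runs without js-tiktoken installed.

```typescript
// Any function mapping text to token ids. With js-tiktoken this would be the
// encode method of getEncoding("cl100k_base"); injected here (assumption).
type Encode = (text: string) => number[];

interface TokenComparison {
  before: number;
  after: number;
  saved: number;
  percentSaved: number; // 0-100, rounded to one decimal place
}

function compareTokens(original: string, markdown: string, encode: Encode): TokenComparison {
  const before = encode(original).length;
  const after = encode(markdown).length;
  const saved = before - after;
  // Guard against division by zero for empty input.
  const percentSaved = before === 0 ? 0 : Math.round((saved / before) * 1000) / 10;
  return { before, after, saved, percentSaved };
}
```

In the browser, the real encoder would come from js-tiktoken: `const enc = getEncoding("cl100k_base")` (imported from `js-tiktoken`), then `compareTokens(rawText, markdown, (t) => enc.encode(t))`.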

The backend provides user accounts, conversion history tracking, community leaderboards, and profile management — all secured with httpOnly cookie-based JWT auth.
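As a rough illustration of what httpOnly cookie auth means on the wire, the sketch below builds a `Set-Cookie` header value by hand. The cookie name, lifetime, and `serializeAuthCookie` helper are hypothetical; RatMD's real cookie options live in `services/auth/config/`.

```typescript
// Subset of standard Set-Cookie attributes relevant to an auth cookie.
interface CookieOptions {
  maxAgeSeconds: number;
  secure: boolean;
  sameSite: "Strict" | "Lax" | "None";
  path?: string;
}

// Hypothetical helper. HttpOnly hides the token from document.cookie, which
// limits what an injected script (XSS) can read.
function serializeAuthCookie(name: string, token: string, opts: CookieOptions): string {
  const parts = [
    `${name}=${encodeURIComponent(token)}`,
    `Max-Age=${opts.maxAgeSeconds}`,
    `Path=${opts.path ?? "/"}`,
    "HttpOnly",
    `SameSite=${opts.sameSite}`,
  ];
  // Browsers reject SameSite=None cookies unless they are also Secure.
  if (opts.secure || opts.sameSite === "None") parts.push("Secure");
  return parts.join("; ");
}
```

In Express this header is normally produced by `res.cookie(name, token, { httpOnly: true, secure: true, sameSite: "strict", maxAge })`, where `maxAge` is given in milliseconds.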

🚀 Features

PDF parsing — text extraction via pdfjs-dist v5 with line grouping and heading detection
Token estimation — real cl100k_base encoding via js-tiktoken, not approximate math
Auth & Community — JWT-based auth with httpOnly cookies, email verification via OTP, conversion tracking, rat ranks, leaderboards, user profiles
Light/dark theme — warm parchment light mode, dark-first default, persisted in localStorage
Mobile navigation — hamburger menu with animated dropdown on screens < 768px
FAQ page — 18 questions across 6 categories with accordion expand/collapse
Client-side privacy — PDF parsing and token counting happen in the browser, with zero server uploads
RAG-ready output — clean Markdown structured for vector databases and LLM context windows
Export — download .md file or copy to clipboard
Responsive design — full mobile support, floating pill navbar, container breakpoints
Framer Motion animations — scroll-triggered fade-ins, entrance sequences, pulse effects
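The theme behavior (dark by default, persisted in localStorage) fits in a few lines. This framework-free sketch shows the persistence logic a hook like `useTheme` might wrap; the `"theme"` storage key and the function names are hypothetical. Storage is injected so the example runs outside a browser; in the app you would pass `window.localStorage`.

```typescript
type Theme = "light" | "dark";

// Minimal key-value interface; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const THEME_KEY = "theme"; // hypothetical storage key

// Dark-first: anything other than an explicitly stored "light" resolves to dark.
function loadTheme(store: KeyValueStore): Theme {
  return store.getItem(THEME_KEY) === "light" ? "light" : "dark";
}

// Flip the current theme and persist the new value.
function toggleTheme(store: KeyValueStore): Theme {
  const next: Theme = loadTheme(store) === "dark" ? "light" : "dark";
  store.setItem(THEME_KEY, next);
  return next;
}
```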

🏗 Architecture

┌─────────────────────────────────────────────────────────┐
│  app/web/                  services/auth/               │
│  ──────────                ─────────────                │
│  React 19 SPA              Node.js + Express 5          │
│  Vite 8 + TailwindCSS v4   MongoDB + Mongoose 9         │
│  Zustand 5 state           JWT auth (httpOnly cookies)  │
│  React Router 7            Zod validation               │
│                                                         │
│  Pages:                    Endpoints:                   │
│  / /converter /docs /faq   /api/auth/*                  │
│  /login /register          /api/conversions/*           │
│  /verify-email             /api/profile                 │
│  /dashboard /profile       /api/community/*             │
└─────────────────────────────────────────────────────────┘
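Because the JWT sits in an httpOnly cookie, frontend code never touches the token; it only has to ask the browser to send cookies with each request. Below is a minimal sketch of such a client, assuming the backend is reached directly at the dev port listed under Getting Started. `apiRequest`, `API_BASE`, and the injected `fetchImpl` are hypothetical (the global `fetch` fits the `FetchLike` type).

```typescript
// Minimal request/response shapes; the real fetch accepts and returns more.
interface RequestOptions {
  method: string;
  credentials: "include";
  headers: Record<string, string>;
  body?: string;
}
interface MinimalResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}
type FetchLike = (url: string, init: RequestOptions) => Promise<MinimalResponse>;

// Hypothetical base URL matching the backend dev port.
const API_BASE = "http://localhost:5000";

// credentials: "include" makes the browser attach the httpOnly auth cookie;
// JavaScript never reads the token itself.
async function apiRequest<T>(
  fetchImpl: FetchLike,
  path: string,
  method = "GET",
  body?: unknown,
): Promise<T> {
  const res = await fetchImpl(`${API_BASE}${path}`, {
    method,
    credentials: "include",
    headers: { "Content-Type": "application/json" },
    body: body === undefined ? undefined : JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Request to ${path} failed with status ${res.status}`);
  return (await res.json()) as T;
}
```

When the frontend and API are on different origins, as with the two dev servers, the backend's CORS configuration must also allow credentials (`Access-Control-Allow-Credentials: true` with an explicit, non-wildcard origin), or the browser will refuse to expose the response.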

📦 Tech Stack

| Technology | Version | Purpose |
|---|---|---|
| React | 19 | UI framework |
| TypeScript | 6 | Type safety |
| Vite | 8 | Bundler and dev server |
| TailwindCSS | 4 | Utility-first styling with `@theme` tokens |
| Framer Motion | 12 | Animation library |
| Zustand | 5 | State management |
| React Router | 7 | Client-side routing |
| pdfjs-dist | 5 | PDF text extraction |
| js-tiktoken | 1 | OpenAI `cl100k_base` token encoding |
| Express | 5 | Backend API framework |
| MongoDB + Mongoose | 9 (Mongoose) | Database and ODM |
| Zod | 4 | Input validation |

📁 Project Structure

RatMD/
├── app/web/src/
│   ├── app/
│   │   ├── layouts/         # RootLayout with header + footer + outlet
│   │   ├── providers/       # Session restore on app mount
│   │   ├── router/          # React Router config (10 routes)
│   │   └── store/           # Zustand stores (app-store, auth-store)
│   ├── components/
│   │   ├── animations/      # AnimatedElement (Framer Motion scroll-reveal)
│   │   ├── layout/          # Header (auth-aware navbar), Footer
│   │   ├── shared/          # Section wrapper component
│   │   └── ui/              # Button, Card, Badge, Container, Logo
│   ├── features/
│   │   ├── export/          # Download .md + clipboard copy
│   │   ├── markdown-preview/# Rendered Markdown output viewer
│   │   ├── parser/          # ParserPanel with animated stages
│   │   ├── token-estimator/ # Token comparison bars + detail view
│   │   └── upload/          # Drag-and-drop upload zone
│   ├── hooks/               # useTheme, useFileUpload, useTokenEstimate
│   ├── lib/
│   │   ├── api/             # Fetch client (credentials: include) + endpoints
│   │   ├── constants/       # Routes, nav links, feature data, steps
│   │   ├── pdf/             # Real PDF parser (pdfjs-dist, heading detection)
│   │   ├── tokenizer/       # Real token estimator (js-tiktoken cl100k_base)
│   │   └── utils/           # cn(), formatBytes, formatNumber, generateId
│   ├── pages/
│   │   ├── auth/            # Login, Register, VerifyEmail
│   │   ├── converter/       # Full conversion workflow
│   │   ├── dashboard/       # User stats + conversion history
│   │   ├── docs/            # CLI reference + web guide
│   │   ├── faq/             # 18-question accordion FAQ
│   │   ├── home/            # 7-section landing page
│   │   └── profile/         # Avatar picker + profile form
│   ├── services/            # Parser + export service abstractions
│   ├── styles/              # @theme tokens + light/dark mode + keyframes
│   ├── types/               # TypeScript interfaces
│   ├── App.tsx
│   └── main.tsx
│
└── services/auth/
    ├── app.js               # Entry point, middleware, route registration
    ├── config/              # Cookie options, Zod env validation
    ├── controllers/         # auth/, conversion/, community/, profile/
    ├── middleware/          # authMiddleware, errorHandler, rateLimiter, validate
    ├── model/               # userModel, refreshToken, conversionModel, mongodb
    ├── routes/              # authRoutes, conversionRoutes, communityRoutes, profileRoutes
    ├── utils/               # authTokens, logger
    └── .env                 # PORT, MONGODB_URL, JWT secrets
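The `.env` values are validated with Zod in `config/`. A dependency-free stand-in for the same fail-fast idea is sketched below. It checks only `PORT` and `MONGODB_URL`, since the JWT secret variable names are not listed here, and `parseEnv` is a hypothetical name. In the service you would call it once at startup with `process.env`.

```typescript
interface Env {
  PORT: number;
  MONGODB_URL: string;
}

// Validate raw environment strings at startup so a missing or malformed value
// fails immediately with a clear message instead of surfacing later.
function parseEnv(raw: Record<string, string | undefined>): Env {
  const errors: string[] = [];
  const rawPort = raw["PORT"];
  const port = Number(rawPort);
  if (!rawPort || !Number.isInteger(port) || port < 1 || port > 65535) {
    errors.push("PORT must be an integer between 1 and 65535");
  }
  const mongoUrl = raw["MONGODB_URL"] ?? "";
  if (!/^mongodb(\+srv)?:\/\//.test(mongoUrl)) {
    errors.push("MONGODB_URL must start with mongodb:// or mongodb+srv://");
  }
  if (errors.length > 0) throw new Error(`Invalid environment:\n${errors.join("\n")}`);
  return { PORT: port, MONGODB_URL: mongoUrl };
}
```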

🛠 Getting Started

Prerequisites

Node.js 20+
npm 10+
MongoDB (local or Atlas)

Installation

```bash
# Backend (from the project root)
cd services/auth
npm install

# Frontend (back to the project root, then into app/web)
cd ../../app/web
npm install
```

Development

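Create `services/auth/.env` with `PORT`, `MONGODB_URL`, and the JWT secrets, then run both services in separate terminals, each starting from the project root: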

```bash
# Terminal 1: Backend
cd services/auth
npm run dev
# Runs at http://localhost:5000

# Terminal 2: Frontend
cd app/web
npm run dev
# Opens at http://localhost:5173
```

Build

```bash
cd app/web
npm run build
# Output in app/web/dist/
```

🐳 Docker

```bash
# From project root
docker compose up -d
# Opens at http://localhost:3000
```

The Docker image serves the built static app via Nginx.

⚠️ Known Limitations

Heading detection is heuristic-based — font size ratios determine heading levels. PDFs with non-standard sizing or inline formatting may produce incorrect hierarchy.
Token savings vary by document type — heavily formatted PDFs (whitespace, repeated headers, page numbers) see 30–60% reduction. Plain academic papers with minimal formatting see smaller gains.
Client-side processing limit — uploads are capped at 10MB, reflecting practical browser memory limits. PDFs approaching the cap may still be slow or fail on low-end devices.
No image/table extraction — the current parser only extracts text. Images, tables, and complex layouts are not preserved.
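A pre-parse guard for the 10MB cap can be sketched as below. `checkPdfFile` is hypothetical, `formatBytes` here is an illustrative version rather than the one in `lib/utils/`, and treating 10MB as 10 × 1024 × 1024 bytes is an assumption. A browser `File` object satisfies the parameter type, and its `size` is known without reading the file into memory.

```typescript
// Assumption: the 10MB cap is measured in binary megabytes.
const MAX_PDF_BYTES = 10 * 1024 * 1024;

// Illustrative byte formatter: whole bytes below 1 KB, one decimal above.
function formatBytes(bytes: number): string {
  const units = ["B", "KB", "MB", "GB"];
  let value = bytes;
  let i = 0;
  while (value >= 1024 && i < units.length - 1) {
    value /= 1024;
    i++;
  }
  return `${i === 0 ? value : value.toFixed(1)} ${units[i]}`;
}

// Returns an error message, or null if the file may be parsed.
// A file counts as a PDF if either its MIME type or its extension says so.
function checkPdfFile(file: { name: string; size: number; type: string }): string | null {
  if (file.type !== "application/pdf" && !file.name.toLowerCase().endsWith(".pdf")) {
    return `${file.name} is not a PDF`;
  }
  if (file.size > MAX_PDF_BYTES) {
    return `${file.name} is ${formatBytes(file.size)}; the limit is ${formatBytes(MAX_PDF_BYTES)}`;
  }
  return null;
}
```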

🗺 Roadmap

Auth system — JWT with httpOnly cookies, email verification, password reset
Dashboard — user stats, conversion history with pagination
Profile — avatar picker, bio, social links
Community leaderboard UI
Server-side parsing — offload heavy processing to a worker service
CLI tool — standalone binary for terminal workflows (ratmd convert file.pdf)
Batch processing — convert multiple PDFs in a single operation
Image extraction — preserve embedded images in output
