Scoring OCR quality for any incoming documents or text sequences
-
Updated
Mar 15, 2022 - Python
Scoring OCR quality for any incoming documents or text sequences
A tool for checking whether AI-written documents are ready to hand off, ship, publish, or use for decisions. Not an AI detector.
Claude Code skill: vision-LLM audit of rendered .docx/.pptx/.pdf pages. Catches formatting drift, typography mixing, SmartArt overflow, placeholder text, broken cross-refs — what Claude can't see when it only reads the source.
Fork of DeQA-Doc (VQualA 2025 DIQA Champion) extended with a confidence-weighted pseudo-labeling pipeline for scaling beyond 5K human-labeled images via 4-signal uncertainty quantification and active learning.
Add a description, image, and links to the document-quality topic page so that developers can more easily learn about it.
To associate your repository with the document-quality topic, visit your repo's landing page and select "manage topics."