【PaddlePaddle Hackathon 10】Add Doc-QnA demo with PaddleOCR-VL + OpenVINO #552

Open
bob798 wants to merge 4 commits into openvinotoolkit:master from bob798:paddleocr-vl-doc-qna

bob798 commented Jun 4, 2026

Summary

Add a new Document Q&A demo that combines PaddleOCR-VL with OpenVINO for intelligent document understanding and RAG-based question answering.

  • PaddleOCR-VL (OpenVINO): Parses scanned PDFs, tables, and formulas into structured Markdown
  • Table-aware chunking: Each table row carries its header for precise cell-level lookup
  • Qwen3-Embedding-0.6B-int8 (OpenVINO): 1024-dim multilingual vector indexing via ChromaDB
  • Qwen3-1.7B-int4 (OpenVINO GenAI): Answer generation with [doc_name p.page] source citations
  • All-OpenVINO inference: OCR, Embedding, and LLM — no PyTorch at runtime
  • CPU-friendly: ~3.5s/question on Intel i5
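The table-aware chunking idea listed above can be illustrated with a minimal sketch. It assumes the parser emits pipe-delimited Markdown tables; the `Chunk` dataclass, the function names, and the sample table are hypothetical (only the A100 temperature range comes from the demo output in this PR), and this is not the code in `src/chunker.py`.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    doc_name: str
    page: int


def split_row(line: str) -> list[str]:
    """Split one Markdown table line into stripped cell strings."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def is_separator(line: str) -> bool:
    """True for the |---|---| line that follows a Markdown table header."""
    cells = split_row(line)
    return all(cell and set(cell) <= set("-: ") for cell in cells)


def chunk_table(markdown: str, doc_name: str, page: int) -> list[Chunk]:
    """Emit one chunk per data row, pairing every cell with its column header."""
    lines = [ln for ln in markdown.splitlines() if ln.strip().startswith("|")]
    if len(lines) < 3 or not is_separator(lines[1]):
        return []  # not a well-formed Markdown table
    header = split_row(lines[0])
    chunks = []
    for line in lines[2:]:
        cells = split_row(line)
        pairs = [f"{h}: {c}" for h, c in zip(header, cells)]
        chunks.append(Chunk("; ".join(pairs), doc_name, page))
    return chunks


# Hypothetical sample table; the Power column and the second row are made up.
table = """
| Model | Operating temperature | Power |
|-------|-----------------------|-------|
| A100  | -20~70℃               | 12 W  |
| X1    | 0~50℃                 | 18 W  |
"""
chunks = chunk_table(table, "spec_with_tables", 1)
for chunk in chunks:
    print(chunk.text)
```

Because each row chunk repeats its column names, a question about one cell can match a single short chunk instead of a whole table.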

Demo output example

Q: What is the operating temperature range of the A100 model?
A: The operating temperature range of the A100 model is -20~70℃ [spec_with_tables p.1]
⏱  embed=75ms  retrieve=1.4ms  llm=3397ms  total=3474ms  tps=10.3
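A timing line like the one above could be produced by a small per-stage timer. The sketch below is an assumption about how such a breakdown might be collected, not the demo's actual code: `StageTimer` is a hypothetical helper, `tps` is assumed to mean generated tokens divided by LLM time, and the token count of 35 is illustrative.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Collects wall-clock durations (in ms) for named pipeline stages."""

    def __init__(self) -> None:
        self.ms: dict[str, float] = {}

    @contextmanager
    def stage(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the stage raises.
            self.ms[name] = (time.perf_counter() - start) * 1000.0

    def report(self, generated_tokens: int) -> str:
        total = sum(self.ms.values())
        llm_s = self.ms.get("llm", 0.0) / 1000.0
        # Assumption: tokens per second is measured over the LLM stage only.
        tps = generated_tokens / llm_s if llm_s > 0 else 0.0
        parts = "  ".join(f"{k}={v:.1f}ms" for k, v in self.ms.items())
        return f"{parts}  total={total:.0f}ms  tps={tps:.1f}"


timer = StageTimer()
with timer.stage("embed"):
    time.sleep(0.001)  # stand-in for the embedding call
with timer.stage("retrieve"):
    pass               # stand-in for the vector store query
with timer.stage("llm"):
    time.sleep(0.002)  # stand-in for answer generation
print(timer.report(generated_tokens=35))
```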

One-click run

cd demos/doc_qna_demo
pip install -r requirements.txt
python main.py

Models are downloaded automatically from Hugging Face on first run (~1.6 GB total for the embedding model and the LLM).
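The commit notes in this PR say `main.py` installs missing dependencies itself when it detects them. A minimal sketch of that kind of check, using only the standard library, might look like the following; the function names and the import-name to package-name mapping are hypothetical and not taken from `main.py` or `requirements.txt`.

```python
import importlib.util
import subprocess
import sys


def missing_modules(module_names: list[str]) -> list[str]:
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


def ensure_installed(requirements: dict[str, str]) -> list[str]:
    """Install the pip package for each missing import name; return the missing names.

    `requirements` maps import name -> pip package name (the two can differ).
    """
    missing = missing_modules(list(requirements))
    if missing:
        packages = [requirements[name] for name in missing]
        # Use the running interpreter's pip so packages land in the active environment.
        subprocess.check_call([sys.executable, "-m", "pip", "install", *packages])
    return missing


# Only the detection step runs here; ensure_installed() would invoke pip.
print(missing_modules(["json", "surely_not_a_real_module_xyz"]))
```

Calling pip through `sys.executable -m pip` avoids installing into a different Python than the one running the script.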

Files added

demos/doc_qna_demo/
├── main.py                    # Single entry point
├── README.md                  # Setup + usage + architecture
├── requirements.txt           # Pinned dependencies
├── setup/
│   ├── install.bat            # Windows one-click install + run
│   └── install.sh             # Linux/macOS one-click install + run
├── src/                       # Core modules
│   ├── embedding.py           # Qwen3-Embedding OpenVINO wrapper
│   ├── llm.py                 # Qwen3-1.7B OpenVINO GenAI wrapper
│   ├── rag.py                 # RAG orchestration
│   ├── vector_store.py        # ChromaDB persistence
│   ├── chunker.py             # Table-aware chunking
│   ├── doc_parser.py          # PaddleOCR-VL document parsing
│   ├── pdf_preprocessor.py    # PDF text-layer detection
│   ├── inference.py           # OpenVINO inference wrapper
│   └── pipeline.py            # Phase 2 end-to-end pipeline
├── data/
│   └── demo_questions.txt     # 5 business questions for demo
└── results/phase2/            # Pre-generated chunks (skip OCR step)
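The module names under `src/` suggest a chunk, embed, store, retrieve, generate flow. As a rough illustration of the retrieval step only, the sketch below ranks chunks by cosine similarity in plain Python. It is a stand-in for the ChromaDB query, not the code in `vector_store.py` or `rag.py`, and the 3-dimensional vectors and chunk labels are toy data in place of the 1024-dimensional embeddings.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float],
          indexed: list[tuple[str, list[float]]],
          k: int = 2) -> list[tuple[float, str]]:
    """Return the k highest-scoring (score, chunk_text) pairs, best first."""
    scored = [(cosine(query_vec, vec), text) for text, vec in indexed]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Toy index: (chunk text, embedding vector). All values are illustrative.
index = [
    ("temperature row", [1.0, 0.0, 0.0]),
    ("pricing row", [0.0, 1.0, 0.0]),
    ("overview paragraph", [0.7, 0.7, 0.0]),
]
hits = top_k([0.9, 0.1, 0.0], index, k=2)
for score, text in hits:
    print(f"{score:.3f}  {text}")
```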

Test plan

  • Verified on Windows (Python 3.12, Intel i5 CPU)
  • pip install -r requirements.txt + python main.py runs end-to-end
  • 5/5 questions answered with correct citations
  • Performance: 3.5s/question avg on CPU
  • Linux/macOS verification (pending)
  • GPU device test (pending)
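The "correct citations" item could be checked automatically. The sketch below assumes citations always follow the `[doc_name p.page]` shape described in the summary; the regular expression and helper names are hypothetical and are not part of this PR.

```python
import re

# Matches citations such as "[spec_with_tables p.1]".
CITATION = re.compile(r"\[(?P<doc>[^\[\]]+?) p\.(?P<page>\d+)\]")


def extract_citations(answer: str) -> list[tuple[str, int]]:
    """Return every (doc_name, page) pair cited in the answer text."""
    return [(m.group("doc"), int(m.group("page"))) for m in CITATION.finditer(answer)]


def citations_supported(answer: str, retrieved: set[tuple[str, int]]) -> bool:
    """True if the answer cites at least one source and every citation was retrieved."""
    cited = extract_citations(answer)
    return bool(cited) and all(pair in retrieved for pair in cited)


answer = "The operating temperature range is -20~70℃ [spec_with_tables p.1]"
retrieved = {("spec_with_tables", 1)}
print(extract_citations(answer))
print(citations_supported(answer, retrieved))
```

A check like this only confirms that cited pages were among the retrieved chunks; it does not verify that the answer text itself is correct.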

bob798 and others added 4 commits June 5, 2026 07:53
Document understanding and RAG Q&A system:
- PaddleOCR-VL for document parsing (scanned PDFs, tables, formulas)
- Table-aware chunking preserving row-header context
- Qwen3-Embedding-0.6B-int8 (OpenVINO) for vector indexing
- Qwen3-1.7B-int4 (OpenVINO GenAI) for answer generation
- Source citations [doc_name p.page] in every answer
- CPU-only: ~3.5s/question on Intel i5

All inference through OpenVINO — no PyTorch at runtime.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
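- main.py: automatically run pip install when missing dependencies are detected, so that python main.py is a true one-command run
- Add demo.png screenshot of the demo output
- README: update to One Command reproduction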

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>