A production-ready Retrieval-Augmented Generation (RAG) system built over internal engineering documentation and simulation guidelines. Integrates local LLMs for full data privacy. Enables semantic search over complex technical specifications.
Engineering teams accumulate large volumes of simulation guidelines, test protocols, and technical specifications. Finding the right document or the right paragraph within a document wastes significant time. This RAG system turns that knowledge base into a queryable AI assistant — without sending proprietary data to external APIs.
User query
│
▼
Query embedding (sentence-transformers)
│
▼
FAISS vector store — similarity search → Top-k relevant chunks
│
▼
Prompt construction (query + retrieved context)
│
▼
Local LLM (Ollama / llama.cpp) — generates grounded answer
│
▼
Response + source citations
| Component | Technology |
|---|---|
| Orchestration | LangChain · LlamaIndex |
| Embeddings | sentence-transformers (all-MiniLM-L6-v2) |
| Vector store | FAISS (local) · Chroma (optional) |
| LLM | Ollama (Mistral 7B / LLaMA 3) — fully local |
| Document parsing | PyMuPDF · python-docx · LangChain loaders |
| Interface | Gradio (optional web UI) |
rag-engineering-docs/
│
├── data/
│ └── sample_docs/ # Sample engineering PDFs for testing
│
├── src/
│ ├── ingest.py # Document loading, chunking, embedding
│ ├── retriever.py # FAISS vector store build and query
│ ├── rag_chain.py # LangChain RAG chain construction
│ ├── local_llm.py # Ollama LLM integration
│ └── app.py # Gradio interface (optional)
│
├── notebooks/
│ └── 01_rag_demo.ipynb # End-to-end walkthrough notebook
│
├── config.yaml # Chunk size, overlap, model settings
├── requirements.txt
└── README.md
# Clone
git clone https://github.com/PRATdoppelEK/rag-engineering-docs.git
cd rag-engineering-docs
# Install dependencies
pip install -r requirements.txt
# Install Ollama and pull a local model
# https://ollama.ai
ollama pull mistral
# Ingest your documents
python src/ingest.py --docs_path data/sample_docs/
# Query the system
python src/rag_chain.py --query "What is the recommended mesh density for HV battery simulations?"
# Optional: launch web UI
python src/app.pyEdit config.yaml to adjust:
chunking:
chunk_size: 512
chunk_overlap: 64
embedding:
model: "all-MiniLM-L6-v2"
llm:
provider: "ollama"
model: "mistral"
temperature: 0.1
retrieval:
top_k: 5| Metric | Observation |
|---|---|
| Retrieval relevance | Top-3 chunks contain the answer in >90% of test queries on engineering PDFs |
| Query response time | < 3 seconds end-to-end on CPU (embedding + FAISS search + Mistral 7B generation) |
| Context faithfulness | Local LLM stays grounded to retrieved context with temperature=0.1 — minimal hallucination |
| Document types tested | Engineering PDFs, simulation guidelines, technical spec sheets, MATLAB documentation |
Key observations:
- Chunk size of 512 tokens with 64-token overlap performs best for multi-paragraph engineering specifications
all-MiniLM-L6-v2embeddings handle domain-specific terminology (SOH, BMS, HV, ECM) well without fine-tuning- Local Mistral 7B via Ollama produces factually grounded answers comparable to GPT-3.5 for structured technical queries
- FAISS flat index retrieves top-5 chunks in < 10ms even on large document collections
- Privacy advantage confirmed: zero outbound API calls during inference — suitable for confidential engineering documentation
Example query & response:
Query: "What is the recommended mesh density for HV battery thermal simulations?"
Answer: [Retrieved from relevant simulation guideline chunk]
"For HV battery modules, a minimum mesh density of 2mm element size is
recommended in high-gradient zones (cell tabs, busbar connections)..."
Sources: [simulation_guidelines_v3.pdf, page 14]
Engineering documentation often contains proprietary specifications and confidential simulation parameters. Routing this through cloud APIs (OpenAI, Anthropic) creates data privacy risks. This pipeline uses fully local models via Ollama — nothing leaves the machine.
langchain>=0.2.0
llama-index>=0.10.0
faiss-cpu>=1.7.4
sentence-transformers>=2.7.0
pymupdf>=1.24.0
python-docx>=1.1.0
gradio>=4.0.0
pyyaml>=6.0
Prateek Gaur — Applied ML Engineer | LLM Pipelines | Energy Systems LinkedIn · prateekgaur@gmx.de
MIT License