groundwork

A RAG agent that reads documents from the data/ directory and answers questions based solely on those documents. Everything runs locally. No external API calls, and no internet required once the models are downloaded.

How it works

When you start the script, it reads every text file in data/ and splits each one into chunks of roughly 300 words. Each chunk is passed through an embedding model (nomic-embed-text), which converts it into a 768-dimensional vector. All vectors are stored in a NumPy matrix in memory.
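The loading and chunking step can be sketched roughly as follows. This is a minimal illustration, not the code in rag_agent.py: the names chunk_text, load_chunks, and CHUNK_WORDS are hypothetical, and the sketch assumes chunks are cut on whitespace with no overlap, which the description above does not confirm.

```python
from pathlib import Path

CHUNK_WORDS = 300  # assumed constant name; the README only says "roughly 300 words"


def chunk_text(text: str, size: int = CHUNK_WORDS) -> list[str]:
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def load_chunks(data_dir: str = "data") -> list[str]:
    """Read every .txt file in data_dir and return all chunks in one list."""
    chunks = []
    for path in sorted(Path(data_dir).glob("*.txt")):
        chunks.extend(chunk_text(path.read_text(encoding="utf-8")))
    return chunks
```

With this sketch, a 650-word file would yield three chunks of 300, 300, and 50 words.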

When you type a question:

1. The same embedding model converts your question into a vector.
2. Cosine similarity finds the 3 chunks closest to your question. It is computed as a dot product, which equals cosine similarity when the vectors are normalized to unit length.
3. Those 3 chunks are inserted into a prompt that instructs the LLM to answer only from the provided context.
4. The model (qwen2.5-coder:1.5b-instruct-q4_K_M) streams the answer back.
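The retrieval step might look something like the sketch below. The function names normalize and top_k are hypothetical, and the sketch assumes the vectors are normalized at query time; the actual script may normalize once at startup or rely on the embedding model's output as-is.

```python
import numpy as np


def normalize(matrix: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    return matrix / np.clip(norms, 1e-12, None)


def top_k(query_vec: np.ndarray, chunk_matrix: np.ndarray, k: int = 3) -> list[int]:
    """Return the indices of the k rows most similar to query_vec, best first."""
    q = normalize(query_vec.reshape(1, -1))[0]
    scores = normalize(chunk_matrix) @ q  # one cosine score per chunk
    return [int(i) for i in np.argsort(scores)[::-1][:k]]
```

Here chunk_matrix would have one row per chunk (768 columns for nomic-embed-text), and the returned indices select the chunks that go into the prompt.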

The prompt explicitly tells the model not to use its own knowledge, only the context it receives.
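A grounding prompt of this kind could be assembled as in the sketch below. The wording of the instructions, the numbered context labels, and the function name build_prompt are all assumptions for illustration; the exact prompt used by the script is not shown in this README.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a context-only prompt from the retrieved chunks."""
    # Number each chunk so the context boundaries are visible to the model.
    context = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below. "
        "Do not use any outside knowledge. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

The resulting string would then be sent to the LLM as the user message.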

Setup

ollama pull qwen2.5-coder:1.5b-instruct-q4_K_M
ollama pull nomic-embed-text:latest
pip install ollama numpy
python3 rag_agent.py

Place .txt files in data/ before running. Type your questions at the prompt. Type quit to exit.

Models used

LLM: qwen2.5-coder:1.5b-instruct-q4_K_M (4-bit quantized, roughly 1GB). Selected for its small footprint and strong instruction following relative to its size.

Embedding model: nomic-embed-text (274MB, 768-dimensional output). Purpose-built for semantic similarity tasks. Using a dedicated embedding model rather than the LLM itself is typically more efficient and tends to produce better retrieval results.

You can swap either model by changing the EMBED_MODEL and LLM_MODEL variables at the top of the script.

Why this exists

Standard chatbots have no mechanism to restrict their answers to a specific set of documents. They draw from their training data, which makes them unsuitable for querying private or domain-specific information. This agent solves that by:

Keeping all data local (privacy)
Using retrieval to select only relevant context (precision)
Structuring the prompt to restrict the LLM's output (grounding)

Project structure

rag/
├── rag_agent.py
├── data/               # source documents (txt files)
└── README.md

Notes

Only supports plain text files currently
Embeddings are recomputed on every startup (no persistence yet)
Chunk size is fixed at ~300 words
Retrieval is limited to top 3 chunks
