Skip to content

builtbyashwin/groundwork

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 

Repository files navigation

groundwork

A RAG agent that reads documents from the data/ directory and answers questions based solely on those documents. Everything runs locally. No API calls, no internet required.


How it works

When you start the script, it reads every text file in data/ and splits them into chunks of roughly 300 words each. Each chunk is passed through an embedding model (nomic-embed-text) which converts it into a 768-dimensional vector. All vectors are stored in a numpy matrix in memory.

When you type a question:

  1. The same embedding model converts your question into a vector
  2. Cosine similarity (dot product) finds the top 3 chunks closest to your question
  3. Those 3 chunks are inserted into a prompt that instructs the LLM to answer only from the provided context
  4. The model (qwen2.5-coder:1.5b-instruct-q4_K_M) streams the answer back

The prompt explicitly tells the model not to use its own knowledge, only the context it receives.


Setup

ollama pull qwen2.5-coder:1.5b-instruct-q4_K_M
ollama pull nomic-embed-text:latest
pip install ollama numpy
python3 rag_agent.py

Place .txt files in data/ before running. Type your questions at the prompt. Type quit to exit.


Models used

LLM: qwen2.5-coder:1.5b-instruct-q4_K_M — 4-bit quantized, roughly 1GB. Selected for its small footprint and strong instruction following relative to its size.

Embedding model: nomic-embed-text — 274MB, 768-dimensional output. Purpose-built for semantic similarity tasks. Using a dedicated embedding model rather than the LLM itself is more efficient and produces better retrieval results.

You can swap either model by changing the EMBED_MODEL and LLM_MODEL variables at the top of the script.


Why this exists

Standard chatbots have no mechanism to restrict their answers to a specific set of documents. They draw from their training data, which makes them unsuitable for querying private or domain-specific information. This agent solves that by:

  • Keeping all data local (privacy)
  • Using retrieval to select only relevant context (precision)
  • Structuring the prompt to restrict the LLM's output (grounding)

Project structure

rag/
├── rag_agent.py
├── data/               # source documents (txt files)
└── README.md

Notes

  • Only supports plain text files currently
  • Embeddings are recomputed on every startup (no persistence yet)
  • Chunk size is fixed at ~300 words
  • Retrieval is limited to top 3 chunks

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages