AI Repo Assistant — RAG over GitHub Repositories

An AI-powered assistant that ingests GitHub repositories, stores code embeddings in a vector database, and enables contextual Q&A over a codebase using a local LLM — fully local, no data leaves your machine.

Overview

The system implements a Retrieval-Augmented Generation (RAG) pipeline with two-stage retrieval, in five steps:

1. Fetch Python files from a GitHub repository via the GitHub API (no cloning)
2. Parse each file with AST to extract functions and classes with full signatures
3. Embed each chunk with all-MiniLM-L6-v2 (SentenceTransformer) and store it in Qdrant
4. On query: retrieve candidate chunks by vector similarity (bi-encoder), then rerank them with a Cross-Encoder for higher precision
5. Build a size-bounded context window (character-based truncation) and generate an answer via a local Ollama LLM

Tech Stack

| Component | Role |
| --- | --- |
| FastAPI | Async REST API |
| Qdrant | Vector database (cosine similarity search) |
| Ollama | Local LLM inference |
| SentenceTransformers | Bi-encoder embeddings (`all-MiniLM-L6-v2`, 384-dim) + Cross-Encoder reranking (`ms-marco-MiniLM-L-6-v2`) |
| aiohttp | Async GitHub API client |
| Docker Compose | Three-service local deployment |

Features

GitHub repo ingestion — fetches files via GitHub API with async concurrency (semaphore=20, exponential backoff)
AST-based chunking — extracts functions and classes using ast.unparse() for complete signatures including type annotations, *args, **kwargs, and base classes
Idempotent ingest — point IDs are deterministic SHA256 hashes of (repo_id, file_path, symbol), so re-ingesting the same repo is a no-op; force=true triggers a full re-index
Two-stage retrieval — vector search fetches limit×3 candidates, then a Cross-Encoder reranks them and keeps the top limit; each result carries both score (cosine) and rerank_score
Token-aware context — context is truncated by character count (8,000 characters in total, 2,000 per chunk) before being sent to the LLM; the character budget stands in for a token budget, which avoids context window overflows without a tokenizer dependency
Query rewriting — optional LLM-powered query rewrite before retrieval (adapt_user_query=true)
Observability — structured key=value log lines at every pipeline stage; /health and /readiness endpoints
Graceful error handling — VectorDBError → 503, LLMError → 502, PermanentGitHubError (401/403) is not retried
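
To make the idempotent-ingest idea concrete, here is a minimal sketch of a deterministic point ID derived from (repo_id, file_path, symbol). Qdrant accepts unsigned integers or UUIDs as point IDs, so this sketch folds the first 16 bytes of the SHA256 digest into a UUID. That conversion, the separator, and the function name `point_id` are assumptions for illustration; the README only states that IDs are deterministic SHA256 hashes.

```python
import hashlib
import uuid


def point_id(repo_id: str, file_path: str, symbol: str) -> str:
    """Derive a stable ID from the chunk's identity (hypothetical helper)."""
    # A NUL separator keeps ("a/b", "c") and ("a", "b/c") from colliding.
    key = "\x00".join((repo_id, file_path, symbol))
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Assumption: truncate the 32-byte digest to 16 bytes to form a UUID.
    return str(uuid.UUID(bytes=digest[:16]))
```

Because the ID depends only on the chunk's identity, upserting the same symbol twice targets the same point rather than creating a duplicate. A renamed or deleted symbol leaves its old ID behind, which is the case force=true addresses.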

Architecture

GitHub API (HTTP)
      ↓
GitHubParser  — async fetch, base64 decode, exponential backoff
      ↓
AST Chunker   — ast.unparse() signatures, module-level + class methods
      ↓
SentenceTransformer (bi-encoder, 384-dim)
      ↓
Qdrant  — deterministic point IDs, batched upsert (100 pts/batch)
      ↓
Vector Search  — cosine similarity, score_threshold filter
      ↓
Cross-Encoder  — ms-marco-MiniLM-L-6-v2 reranking
      ↓
Context Builder  — char-based truncation (8k chars)
      ↓
Ollama LLM  — local inference, 150s timeout
      ↓
Answer
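
The Context Builder stage can be pictured with the following sketch of character-based truncation. The limits come from the Features section (8,000 characters in total, 2,000 per chunk); the function name `build_context`, the blank-line separator, and the choice to count the separator against the budget are assumptions for this example.

```python
MAX_CONTEXT_CHARS = 8_000   # total budget stated in the README
MAX_CHUNK_CHARS = 2_000     # per-chunk budget stated in the README
SEPARATOR = "\n\n"          # assumption: chunks are joined by a blank line


def build_context(
    chunks: list[str],
    max_total: int = MAX_CONTEXT_CHARS,
    max_chunk: int = MAX_CHUNK_CHARS,
) -> str:
    """Join chunks in ranked order without exceeding either budget."""
    parts: list[str] = []
    used = 0
    for chunk in chunks:
        sep_len = len(SEPARATOR) if parts else 0
        remaining = max_total - used - sep_len
        if remaining <= 0:
            break
        # Cap the chunk first, then cap it again to what is left overall.
        piece = chunk[:max_chunk][:remaining]
        if not piece:
            continue
        parts.append(piece)
        used += sep_len + len(piece)
    return SEPARATOR.join(parts)
```

Counting characters instead of tokens is only an approximation of the model's real limit, but it keeps the builder free of any tokenizer dependency.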

Getting Started

cp .env.example .env          # set GITHUB_TOKEN and optionally LLM_MODEL
make up-local                 # start FastAPI (:8000), Qdrant (:6333), Ollama (:11434)
make pull-model               # pull the model specified in .env (once after first start)

To stop: make down-local

API Endpoints

`POST /api/v1/repo_parser/ingest`

Fetches a GitHub repository, parses Python files with AST, embeds each chunk, and stores in Qdrant.

{ "owner": "tiangolo", "repo": "fastapi", "branch": "master", "force": false }

force: true — deletes existing vectors for this repo before re-indexing (handles renamed/deleted symbols)
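
The fetch side of ingestion is described as using bounded concurrency (semaphore=20), exponential backoff, and no retries for 401/403. The sketch below shows one way those three pieces might fit together. `TransientError`, `with_backoff`, `fetch_all`, the retry count, the base delay, and the jitter are all assumptions for illustration; `PermanentGitHubError` here is a local stand-in for the project's exception of the same name.

```python
import asyncio
import random


class PermanentGitHubError(Exception):
    """Stand-in for the project's error raised on 401/403 responses."""


class TransientError(Exception):
    """Hypothetical error for retryable failures (timeouts, 5xx, rate limits)."""


async def with_backoff(call, *, retries: int = 4, base_delay: float = 0.5,
                       sleep=asyncio.sleep):
    """Await call(), retrying transient failures with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return await call()
        except PermanentGitHubError:
            raise  # bad credentials will not improve on retry
        except TransientError:
            if attempt == retries:
                raise
            delay = base_delay * 2 ** attempt      # 0.5s, 1s, 2s, ...
            await sleep(delay + random.uniform(0, delay / 10))  # small jitter


async def fetch_all(paths, fetch_one, limit: int = 20):
    """Fetch every path with at most `limit` requests in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(path):
        async with semaphore:
            return await with_backoff(lambda: fetch_one(path))

    return await asyncio.gather(*(guarded(p) for p in paths))
```

Here `fetch_one` would be an async function that downloads and base64-decodes a single file; it is passed in so the retry logic stays independent of the HTTP client.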

`GET /api/v1/repo_parser/query`

Vector-searches the index and returns matching code chunks with similarity scores.

?query=how does dependency injection work&owner=tiangolo&repo=fastapi&branch=master&limit=5&score_threshold=0.3

Response includes items, total, and score_threshold_used.
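
Since the query text contains spaces, a client needs to URL-encode the parameters. The following sketch builds the request URL with the standard library. The host `localhost` and the helper name `build_query_url` are assumptions; the port and parameter names are taken from this README.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"  # assumption: FastAPI reachable locally on :8000


def build_query_url(query: str, owner: str, repo: str, branch: str = "master",
                    limit: int = 5, score_threshold: float = 0.3) -> str:
    """Return a fully encoded URL for the /query endpoint (hypothetical helper)."""
    params = {
        "query": query,
        "owner": owner,
        "repo": repo,
        "branch": branch,
        "limit": limit,
        "score_threshold": score_threshold,
    }
    # urlencode escapes reserved characters and turns spaces into "+".
    return f"{BASE_URL}/api/v1/repo_parser/query?{urlencode(params)}"
```

The resulting URL can be passed to any HTTP client, for example `curl` or `aiohttp`.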

`GET /api/v1/repo_parser/ask`

Two-stage retrieval + LLM answer generation.

?query=how does dependency injection work&owner=tiangolo&repo=fastapi&adapt_user_query=false

adapt_user_query=true — rewrites the query via Ollama before retrieval

Response:

{ "answer": "...", "context_found": true }

context_found: false — no matching code was found in the vector DB; the LLM answered from general knowledge without a RAG context constraint
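
The retrieval behind /ask can be sketched as follows: oversample with the vector search, score each candidate against the query with a second model, and keep the best few. The `search` and `rerank` callables are placeholders supplied by the caller, and `two_stage_retrieve` is a hypothetical name; in the real service these roles would be played by the Qdrant query and the Cross-Encoder.

```python
from typing import Callable


def two_stage_retrieve(
    query: str,
    search: Callable[[str, int], list[dict]],
    rerank: Callable[[str, str], float],
    limit: int = 5,
    oversample: int = 3,
) -> list[dict]:
    """Fetch limit*oversample candidates, rerank them, return the top `limit`.

    Each candidate is assumed to be a dict with "text" and "score" (cosine).
    """
    # Stage 1: cheap bi-encoder search casts a wide net.
    candidates = search(query, limit * oversample)
    # Stage 2: the more expensive pairwise scorer judges each (query, text) pair.
    scored = [
        {**candidate, "rerank_score": rerank(query, candidate["text"])}
        for candidate in candidates
    ]
    scored.sort(key=lambda item: item["rerank_score"], reverse=True)
    return scored[:limit]
```

Keeping both `score` and `rerank_score` on each result makes it possible to see where the two models disagree. An empty result list corresponds to the context_found: false case.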

`GET /api/v1/repo_parser/ask/stream`

Same as /ask but streams tokens as they are generated (use curl -N).

The X-Context-Found: true/false response header signals whether vector DB results were used.

`GET /api/v1/health` / `GET /api/v1/readiness`

Liveness and readiness probes (readiness checks Qdrant connectivity).

Configuration

All settings are in .env (see .env.example):

| Variable | Default | Description |
| --- | --- | --- |
| `GITHUB_TOKEN` | — | GitHub personal access token (required) |
| `LLM_MODEL` | `qwen2.5-coder:1.5b` | Ollama model name |
| `QDRANT_HOST` | `qdrant` | Qdrant service hostname |
| `LLM_HOST` | `ollama` | Ollama service hostname |
| `RERANKER_ENABLED` | `true` | Enable Cross-Encoder reranking |
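
For readers who want to see how these variables and defaults might be consumed, here is a small sketch that reads them from the environment. The `Settings` dataclass, the `load_settings` function, and the accepted spellings of a true value are assumptions for illustration; the application may load its configuration differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container mirroring the variables in the table above."""
    github_token: str
    llm_model: str
    qdrant_host: str
    llm_host: str
    reranker_enabled: bool


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    token = env.get("GITHUB_TOKEN")
    if not token:
        raise ValueError("GITHUB_TOKEN is required")
    # Assumption: these spellings count as "enabled".
    flag = env.get("RERANKER_ENABLED", "true").strip().lower()
    return Settings(
        github_token=token,
        llm_model=env.get("LLM_MODEL", "qwen2.5-coder:1.5b"),
        qdrant_host=env.get("QDRANT_HOST", "qdrant"),
        llm_host=env.get("LLM_HOST", "ollama"),
        reranker_enabled=flag in {"1", "true", "yes"},
    )
```

Passing a plain dictionary instead of relying on `os.environ` keeps the loader easy to exercise in tests.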

Development

make check     # ruff lint + mypy type check + pytest
make format    # auto-format with ruff
make test      # run pytest only

CI

GitHub Actions runs make check (lint → typecheck → tests) on every push and pull request to master. The workflow installs only requirements-dev.txt — no runtime dependencies — since tests cover pure-Python utilities and mypy is configured with ignore_missing_imports = true.
