Live Demo • View Screenshots • Quick Start • Architecture
Revolutionary RAG system that builds dependency graphs using Tree-sitter for context-aware code understanding
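Traditional RAG systems split code into arbitrary chunks that ignore syntax boundaries, which breaks context. CodeRAG AI chunks along the syntax tree instead and keeps each chunk linked to its dependencies.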
| Feature | Description |
|---|---|
| Tree-sitter Parsing | Context-rich chunking that preserves code structure |
| Dependency Graphs | Complete function, class, and import relationship mapping |
| Neo4j Storage | Graph database with vector embeddings for semantic search |
| Gemini-Powered | AI responses with full codebase context |
| Session Management | Create new sessions or rejoin existing conversations |
| Smart Context | Retrieves code with all dependencies intact |
| Natural Queries | Ask "define loginController" - automatically enhanced |
| Multi-Language | Supports multiple programming languages via Tree-sitter |
| Persistent Sessions | Use session IDs to continue conversations anytime |
┌─────────────────┐      ┌──────────────────┐      ┌─────────────────┐
│  Git Clone      │ ───> │  Tree-sitter     │ ───> │  Extract Nodes  │
│  Repository     │      │  AST Parser      │      │  & Chunks       │
└─────────────────┘      └──────────────────┘      └─────────────────┘
                                                            │
                    ┌───────────────────────────────────────┘
                    ▼
        ┌──────────────────────────┐
        │ Build Dependency Graph   │
        │  • Function calls        │
        │  • Class calls           │
        │  • Import resolution     │
        │  • Sibling relationships │
        └──────────────────────────┘
                    │
        ┌───────────┴────────────┐
        ▼                        ▼
┌───────────────┐        ┌──────────────┐
│  Generate     │        │  Store in    │
│  Embeddings   │        │  Neo4j with  │
│  (Gemini)     │        │  Session ID  │
└───────────────┘        └──────────────┘
- Repository Cloning - Clone via Git with unique session ID
- File Walking - Extract language from extension, process code files
- AST Generation - Tree-sitter parses code into Abstract Syntax Trees
- Import Extraction - Capture all imports for inter-file dependencies
- Chunk Creation - DFS traversal with MIN_CHUNK_SIZE threshold
  - Node ID: `{file_path}:{start_line}:{node_type}`
  - Extract: name, calls, siblings, parent relationships
- Call Resolution - Match function/class calls to actual definitions
- Import Resolution - Link imports to their source definitions
- Document Handling - Process README, docs with recursive chunking
- Graph Storage - Store in Neo4j with embeddings and session ID
- User Query - Natural language question with session ID
- Query Enhancement - LLM expands "define loginController" to full context
- Vector Embedding - Convert enhanced query to vector (Gemini)
- Top-K Retrieval - Find most similar chunks from Neo4j
- Dependency Fetching - Retrieve all related nodes (calls, imports, siblings)
- Context Assembly - Concatenate code with dependencies
- LLM Response - Gemini generates answer with full context
- Return Result - Send back to client with session preserved
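The retrieval steps above (Top-K retrieval, dependency fetching, context assembly) can be sketched in a few lines. This is a minimal illustration, not the project's implementation: an in-memory dictionary stands in for Neo4j, the two-dimensional vectors stand in for Gemini embeddings, and `retrieve_ids` and `assemble_context` are hypothetical helper names.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_ids(query_vec: List[float], chunks: Dict[str, Dict], k: int = 1) -> List[str]:
    """Return the top-k chunk IDs by similarity, then their one-hop dependencies."""
    ranked = sorted(
        chunks.values(),
        key=lambda c: cosine(query_vec, c["embedding"]),
        reverse=True,
    )
    selected = [c["id"] for c in ranked[:k]]
    for chunk_id in list(selected):  # iterate over a copy while appending
        for targets in chunks[chunk_id]["relationships"].values():
            for target in targets:
                if target in chunks and target not in selected:
                    selected.append(target)
    return selected


def assemble_context(ids: List[str], chunks: Dict[str, Dict]) -> str:
    """Concatenate the selected chunks, each labelled with its node ID."""
    return "\n\n".join(f"# {i}\n{chunks[i]['code_str']}" for i in ids)


# Toy data: the login chunk calls validateUser; format_date is unrelated.
chunks = {
    "src/auth.py:42:function_definition": {
        "id": "src/auth.py:42:function_definition",
        "code_str": "def loginController(req, res): ...",
        "embedding": [1.0, 0.0],
        "relationships": {"function_call": ["src/utils.py:15:function_definition"]},
    },
    "src/utils.py:15:function_definition": {
        "id": "src/utils.py:15:function_definition",
        "code_str": "def validateUser(user): ...",
        "embedding": [0.0, 1.0],
        "relationships": {},
    },
    "src/dates.py:3:function_definition": {
        "id": "src/dates.py:3:function_definition",
        "code_str": "def format_date(d): ...",
        "embedding": [-1.0, 0.0],
        "relationships": {},
    },
}

ids = retrieve_ids([1.0, 0.1], chunks, k=1)
context = assemble_context(ids, chunks)
```

With `k=1`, only the login chunk is retrieved by similarity, and `validateUser` is added because it is a dependency. In the real system the similarity search would presumably run inside Neo4j rather than in Python.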
- Paste any GitHub repository URL
- System generates unique session ID
- Index repository and build dependency graph
- Start chatting immediately
- Use session ID from previous conversations
- Instantly access same codebase context
- Continue conversations anytime
- Share session IDs with team members
Live Application: code-rag.vercel.app
{
  "id": "file.py:42:function_definition",
  "name": "loginController",
  "code_str": "def loginController(req, res): ...",
  "ast_type": "function_definition",
  "file": "src/controllers/auth.py",
  "language": "python",
  "start_line": 42,
  "end_line": 58,
  "size": 245,
  "relationships": {
    "belongs_to": ["file.py:10:class_declaration"],
    "parent": ["file.py:5:module"],
    "sibling": ["file.py:60:function_definition"],
    "function_call": ["utils.py:15:function_definition"],
    "class_call": ["models.py:20:class_declaration"],
    "imports_from": ["auth_service.py:5:function_definition"]
  },
  "metadata": {
    "depth": 2,
    "calls": ["validateUser", "generateToken"],
    "type_references": ["User", "AuthService"],
    "is_definition": true,
    "definition_type": "function"
  }
}
Problem: Traditional RAG loses context
# Random chunk breaks meaning
def process_payment(order):
    validator = OrderValidator()         # What is OrderValidator?
    if validator.check(order):           # check() definition lost
        return payment_gateway.charge()  # No import context
Solution: CodeRAG preserves everything
- `OrderValidator` → Links to class definition
- `check()` → Links to method implementation
- `payment_gateway` → Resolves import source
- Siblings → Related functions in same file
Choose to create new session or join existing conversation
Natural conversation with full dependency context
Visual representation of code relationships
- Python 3.9+
- Node.js 18+ and npm
- API Keys:
  - Neo4j Aura - Graph Database
  - Google AI Studio - Gemini LLM & Embeddings
git clone https://github.com/shivamsahu-tech/coderag-ai.git
cd coderag-ai
Client (client/.env):
VITE_SERVER_URL=http://localhost:8000
Server (server/.env):
NEO4J_URI=neo4j+s://xxxxx.databases.neo4j.io
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your-password
LLM_API_KEY=your-gemini-api-key
EMBEDDING_API_KEY=your-gemini-api-key
Frontend:
cd client
npm install
Backend:
cd server
python3 -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r requirements.txt
Backend (Terminal 1):
cd server
source .venv/bin/activate
uvicorn main:app --reload
Running at: http://localhost:8000
Frontend (Terminal 2):
cd client
npm run dev
Running at: http://localhost:5173
- Open http://localhost:5173
- New Session: Paste GitHub URL → Wait for indexing
- Join Session: Enter existing session ID
- Ask questions about the codebase!
MIN_CHUNK_SIZE determines granularity based on:
- LLM token limits (directly proportional)
- Embedding dimensions (directly proportional)
- Graph density (inversely proportional)
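To make the role of MIN_CHUNK_SIZE concrete, here is a minimal sketch of a depth-first chunker. It rests on assumptions: the `Node` class is a simplified stand-in for a Tree-sitter node, size is measured in characters, the threshold value is arbitrary, and the rule "nodes below the threshold stay inside their parent chunk" is one plausible reading of the threshold, not necessarily the project's exact rule.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

MIN_CHUNK_SIZE = 30  # characters; illustrative value only


@dataclass
class Node:
    """Simplified stand-in for a Tree-sitter node (hypothetical)."""
    type: str
    text: str
    start_line: int
    children: List["Node"] = field(default_factory=list)


def make_chunks(root: Node, file_path: str, min_size: int = MIN_CHUNK_SIZE) -> List[Dict]:
    """Depth-first walk that emits one chunk per sufficiently large node."""
    chunks: List[Dict] = []

    def visit(node: Node, parent_id: Optional[str], depth: int) -> None:
        # The root is always kept so that small files still produce a chunk.
        if parent_id is not None and len(node.text) < min_size:
            return  # too small: its text stays inside the parent chunk
        node_id = f"{file_path}:{node.start_line}:{node.type}"
        chunks.append({
            "id": node_id,
            "ast_type": node.type,
            "size": len(node.text),
            "parent": parent_id,
            "depth": depth,
        })
        for child in node.children:
            visit(child, node_id, depth + 1)

    visit(root, None, 0)
    return chunks


# A tiny hand-built tree: module -> function -> call expression.
func_src = "def login(user):\n    token = issue(user)\n    return token"
call = Node("call", "issue(user)", 3)
func = Node("function_definition", func_src, 2, [call])
module = Node("module", "import os\n" + func_src, 1, [func])

chunks = make_chunks(module, "src/auth.py")
```

The call expression is shorter than the threshold, so it does not become its own chunk. Raising `min_size` produces fewer, larger chunks and a sparser graph, and lowering it does the opposite.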
Node ID Format: {file_path}:{start_line}:{node_type}
Example: src/auth.py:42:function_definition
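A small sketch of building and parsing IDs in this format. `format_node_id` and `parse_node_id` are hypothetical helper names. Splitting from the right is an assumption that keeps file paths containing colons (for example Windows drive letters) intact.

```python
from typing import NamedTuple


class NodeId(NamedTuple):
    file_path: str
    start_line: int
    node_type: str


def format_node_id(file_path: str, start_line: int, node_type: str) -> str:
    """Build an ID of the form {file_path}:{start_line}:{node_type}."""
    return f"{file_path}:{start_line}:{node_type}"


def parse_node_id(node_id: str) -> NodeId:
    """Split an ID back into its parts, taking the last two colons as separators."""
    file_path, start_line, node_type = node_id.rsplit(":", 2)
    return NodeId(file_path, int(start_line), node_type)


parsed = parse_node_id("src/auth.py:42:function_definition")
rebuilt = format_node_id(*parsed)
```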
- Extract calls from AST nodes (e.g., `foo()`)
- Traverse all chunks to find matching definitions
- Link caller → callee with `function_call` relationship
- Store resolved node IDs in relationship fields
- Parse `imports_from` field (e.g., `from auth import login`)
- Identify source file from import path
- Search source file chunks for matching module
- Create `imports_from` relationship edge
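The call and import resolution steps above can be sketched as follows. This is an illustration under stated assumptions: chunks are plain dictionaries shaped like the node schema, raw import statements live in a hypothetical `metadata["imports"]` list, only the simple `from module import name` form is handled, and a name index replaces the full traversal for brevity.

```python
import re
from pathlib import PurePosixPath
from typing import Dict, List

# Matches only the simple form "from module import name".
FROM_IMPORT = re.compile(r"from\s+([\w.]+)\s+import\s+(\w+)")


def resolve_calls(chunks: List[Dict]) -> None:
    """Link each chunk to the definitions of the names it calls."""
    # Index definitions by name once instead of rescanning for every call.
    by_name: Dict[str, List[str]] = {}
    for chunk in chunks:
        if chunk["metadata"].get("is_definition"):
            by_name.setdefault(chunk["name"], []).append(chunk["id"])

    for chunk in chunks:
        edges = chunk["relationships"].setdefault("function_call", [])
        for called in chunk["metadata"].get("calls", []):
            # Names with no definition in the repo (builtins, libraries) are skipped.
            for target in by_name.get(called, []):
                if target != chunk["id"] and target not in edges:
                    edges.append(target)


def resolve_imports(chunks: List[Dict]) -> None:
    """Link each chunk to the definitions it imports from other files."""
    for chunk in chunks:
        edges = chunk["relationships"].setdefault("imports_from", [])
        for statement in chunk["metadata"].get("imports", []):
            match = FROM_IMPORT.match(statement)
            if match is None:
                continue  # other import forms are out of scope for this sketch
            module = match.group(1).split(".")[-1]
            name = match.group(2)
            for candidate in chunks:
                same_file = PurePosixPath(candidate["file"]).stem == module
                if same_file and candidate["name"] == name and candidate["id"] not in edges:
                    edges.append(candidate["id"])


login = {
    "id": "src/controllers/auth.py:42:function_definition",
    "name": "loginController",
    "file": "src/controllers/auth.py",
    "metadata": {
        "is_definition": True,
        "calls": ["validateUser", "print"],
        "imports": ["from utils import validateUser"],
    },
    "relationships": {},
}
validate = {
    "id": "src/utils.py:15:function_definition",
    "name": "validateUser",
    "file": "src/utils.py",
    "metadata": {"is_definition": True, "calls": []},
    "relationships": {},
}

all_chunks = [login, validate]
resolve_calls(all_chunks)
resolve_imports(all_chunks)
```

Matching by bare name is ambiguous when two files define the same function. A fuller version would likely use the resolved imports to pick the right candidate.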
"Define loginController"
→ Enhanced: "Explain loginController function, its purpose, related
functions, and how it handles authentication"
"Where is the database initialized?"
"Show all functions that call validateUser"
"What does the UserService class do?"
"How are imports structured in this project?"
"Find all API endpoints"