🧠 PROVE

Portfolio Reasoning Over Verified Evidence

"In God we trust; all others must bring data." — W. Edwards Deming

PROVE turns your actual code into proof of what you know. Point it at a resume and some GitHub repos; it builds a Neo4j knowledge graph of code-backed skill evidence, then answers questions about your experience with real snippets, GitHub links, and computed proficiency scores.

🔗 Live demo: prove.codeblackwell.ai

✨ What Makes PROVE Special

🔬 Evidence, Not Vibes — Every skill claim is backed by real code snippets, GitHub links, and computed proficiency scores. No self-reported ratings.

🧩 Semantic Bridge — Claude Sonnet writes dense context paragraphs for every code snippet at ingestion time, solving the vocabulary gap between how recruiters search ("OAuth experience") and how code reads (def refresh_token). This is the secret sauce — it makes every future search smarter.

🤖 ReAct Agent with Receipts — The query agent reasons, it doesn't just search: up to 4 tool calls per question across vector search, skill graphs, resume data, and architecture summaries, then an LLM curator picks only the most impressive snippets.

📊 Living Knowledge Graph — A Neo4j graph where Engineer → Repository → File → CodeSnippet → Skill relationships answer questions like "What skills are claimed but unverified?"

⚡ Smart Model Routing — Sonnet for the expensive one-time ingestion work, Haiku for fast per-request queries (A/B tested: matches Sonnet quality at 4.8x cheaper).

🎯 Three Modes, One Graph — QA chat with multi-turn memory, JD matching with per-requirement confidence, and an interactive D3 competency treemap.
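The Semantic Bridge idea above can be illustrated with a toy example. This is a minimal sketch, not PROVE's implementation: the real pipeline has Claude Sonnet write the context paragraph and uses a neural embedding model, whereas here the context is hand-written and a bag-of-words cosine stands in for the embedding. The helper names (`build_embedding_input`, `cosine`) and the concatenation format are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    """Lowercase word tokens; underscores split identifiers apart."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (a stand-in for a real embedding model)."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


def build_embedding_input(context: str, code: str) -> str:
    """Hypothetical format: context paragraph first, then the raw snippet."""
    return f"{context.strip()}\n\n{code.strip()}"


code = (
    "def refresh_token(session):\n"
    "    return session.post('/token', data={'grant_type': 'refresh_token'})"
)
# Hand-written here; in PROVE this paragraph is generated at ingestion time.
context = (
    "Implements the OAuth 2.0 refresh flow: exchanges a refresh token "
    "for a new access token."
)
query = "OAuth experience"

raw_score = cosine(query, code)  # the query shares no words with the code
augmented_score = cosine(query, build_embedding_input(context, code))
```

The raw snippet never mentions "OAuth", so the recruiter-style query has nothing to match against until the context paragraph is embedded alongside the code.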

Who it's for

Engineers — Replace "5 years of Python" with a live system that proves it; find your own gaps before an interviewer does.
Hiring managers — Ask any question and get code-backed answers with confidence scores; paste a JD for per-requirement match scores. Pre-screen technical depth in seconds.
The economics — ~$0.01/query on Haiku. Free tier via NVIDIA NIM; quality tier (Anthropic + Voyage) is ~$2–5 for a full codebase ingestion.
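Finding "skills claimed but unverified" reduces to a set difference between the `CLAIMS` edges and the `DEMONSTRATES` edges. The sketch below models that with plain Python sets rather than Neo4j; the `ToyGraph` class and its fields are hypothetical, and it ignores the Engineer → Repository → File ownership path that a real query would need to follow.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGraph:
    # engineer name -> skills asserted on the resume (CLAIMS edges)
    claims: dict[str, set[str]] = field(default_factory=dict)
    # snippet id -> skills the snippet shows (DEMONSTRATES edges)
    demonstrates: dict[str, set[str]] = field(default_factory=dict)

    def unverified(self, engineer: str) -> list[str]:
        """Skills the engineer claims that no code snippet backs up."""
        evidenced = set().union(*self.demonstrates.values())
        return sorted(self.claims.get(engineer, set()) - evidenced)


graph = ToyGraph(
    claims={"ada": {"Python", "Kubernetes", "OAuth"}},
    demonstrates={"snippet-1": {"Python"}, "snippet-2": {"OAuth"}},
)
gaps = graph.unverified("ada")
```

Here `gaps` holds only the skill that appears on the resume with no snippet behind it.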

🏗️ Architecture at a Glance

flowchart TB
    subgraph INGEST["<b>📥 Ingestion</b> · one-time · Claude Sonnet"]
        direction TB
        R[Resume PDF] --> RP[Sonnet Parse<br><i>roles, skills, companies</i>]
        G[GitHub Repos] --> TS[Tree-sitter Parse<br><i>functions, classes</i>]
        TS --> SC[Sonnet Classify<br><i>map to skill taxonomy</i>]
        SC --> CG[Sonnet Context Gen<br><i>dense paragraph per snippet</i>]
        CG --> EM[Embed<br><i>Voyage-3.5 / EmbedQA</i>]
        EM --> N4[(Neo4j<br>Knowledge Graph)]
        RP --> N4
    end

    subgraph GRAPH["<b>🕸️ Knowledge Graph</b> · Neo4j"]
        direction LR
        ENG[Engineer] -->|OWNS| REPO[Repository]
        REPO -->|CONTAINS| FILE[File]
        FILE -->|CONTAINS| CS[CodeSnippet<br><i>content, context,<br>embeddings, lines</i>]
        CS -->|DEMONSTRATES| SK[Skill<br><i>proficiency, counts</i>]
        DOM[Domain] -->|CONTAINS| CAT[Category]
        CAT -->|CONTAINS| SK
        ENG -->|CLAIMS| SK
    end

    subgraph QUERY["<b>💬 Query</b> · per-request · Claude Haiku"]
        direction TB
        Q[User Question] --> EMQ[Embed Query]
        EMQ --> REACT[ReAct Agent<br><i>up to 4 tool calls</i>]
        REACT --> T1[search_code<br><i>vector similarity</i>]
        REACT --> T2[get_evidence<br><i>skill lookup</i>]
        REACT --> T3[find_gaps<br><i>gap analysis</i>]
        REACT --> T4[get_repo_overview<br><i>repo structure</i>]
        REACT --> T5[get_connected_evidence<br><i>multi-file view</i>]
        REACT --> T6[search_resume<br><i>work history</i>]
        T1 & T2 & T3 & T4 & T5 & T6 --> EV[Evidence Collection<br><i>sort, dedup, diversify</i>]
        EV --> CUR[Haiku Curation<br><i>pick best, assign display mode</i>]
    end

    subgraph STREAM["<b>📡 Response</b> · SSE Stream"]
        direction LR
        SS[Status Updates<br><i>tool-by-tool</i>] --> SG[Skill Subgraph<br><i>progressive D3 viz</i>]
        SG --> ANS[Answer + Evidence<br><i>narrative, code, GitHub links,<br>confidence score</i>]
    end

    N4 --> QUERY
    REACT -.->|intermediate<br>subgraph| SG
    CUR --> ANS

    style INGEST fill:#f5f0eb,stroke:#8b7355,color:#2c2c2c
    style GRAPH fill:#f5f0eb,stroke:#6b8f9e,color:#2c2c2c
    style QUERY fill:#f5f0eb,stroke:#7a8b6f,color:#2c2c2c
    style STREAM fill:#f5f0eb,stroke:#b8805a,color:#2c2c2c

→ Full pipeline walkthrough: docs/HOW_IT_WORKS.md · Design rationale: docs/ARCHITECTURE.md
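The query stage in the diagram is a tool-calling loop with a fixed budget. A minimal sketch of that control flow follows, assuming a `plan` callable in place of the LLM and plain functions in place of the real tools. Only the tool name `search_code` and the budget of 4 come from this README; `run_agent`, `greedy_plan`, and `fake_search` are illustrative.

```python
from typing import Callable, Optional

MAX_TOOL_CALLS = 4  # the "up to 4 tool calls" budget per question

# An action is (tool name, argument), or None when the planner is done.
Action = Optional[tuple[str, str]]


def run_agent(
    question: str,
    plan: Callable[[str, list[str]], Action],
    tools: dict[str, Callable[[str], list[str]]],
) -> list[str]:
    """Call tools until the planner stops or the budget runs out."""
    evidence: list[str] = []
    for _ in range(MAX_TOOL_CALLS):
        action = plan(question, evidence)
        if action is None:
            break
        name, arg = action
        evidence.extend(tools[name](arg))
    # Deduplicate while keeping first-seen order.
    return list(dict.fromkeys(evidence))


calls: list[str] = []


def fake_search(query: str) -> list[str]:
    """Stand-in tool that alternates between two snippet ids."""
    calls.append(query)
    return [f"snippet-{len(calls) % 2}"]


def greedy_plan(question: str, evidence: list[str]) -> Action:
    """Stand-in planner that never stops on its own."""
    return ("search_code", question)


result = run_agent("OAuth experience", greedy_plan, {"search_code": fake_search})
```

Even with a planner that never stops, the loop ends after four calls, and the duplicate snippet ids collapse before curation would see them.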

🚀 Quick Start

Prerequisites

Tool	Why	Install
🐳 Docker	Runs Neo4j	get.docker.com
🐍 Python 3.11+	Runtime	python.org
📦 uv	Fast package manager	`curl -LsSf https://astral.sh/uv/install.sh | sh`
🔀 Git	Clones repos during ingestion	Pre-installed on most systems

1. Clone and install

git clone https://github.com/CodeBlackwell/PROVE.git
cd PROVE
uv sync

2. Fire up Neo4j 🔥

docker compose up -d

Wait a few seconds for Neo4j to become healthy — check at http://localhost:7474.
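If you would rather script the wait than refresh a browser tab, a small polling helper works. This is a sketch using only the standard library; the function names and the retry defaults are assumptions, and it only checks that the HTTP port answers with status 200, not that the database itself is ready for queries.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str) -> bool:
    """True if the URL answers with HTTP 200 within two seconds."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


def wait_until_ready(
    url: str,
    attempts: int = 30,
    delay: float = 1.0,
    probe: Callable[[str], bool] = http_ok,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `url` until `probe` succeeds or the attempts run out."""
    for _ in range(attempts):
        if probe(url):
            return True
        sleep(delay)
    return False
```

Calling `wait_until_ready("http://localhost:7474")` after `docker compose up -d` returns `True` once the Neo4j browser endpoint responds, or `False` after about 30 seconds of retries.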

3. Configure 🔑

cp .env.example .env

Add your API keys to .env. Two pipeline options:

🆓 Free pipeline: Set NVIDIA_API_KEY only. Uses NVIDIA NIM for everything (Nemotron 49B + EmbedQA 1B).

💎 Quality pipeline: Set ANTHROPIC_API_KEY + VOYAGE_API_KEY. Ingestion auto-upgrades to Claude Sonnet; queries use Haiku 4.5. This is what the live demo runs.
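A quick preflight check can tell you which pipeline your keys enable before you start an ingestion run. The sketch below is an assumption about how such a check might look, not PROVE's own detection logic: the function name `detect_pipeline` is hypothetical, and preferring the quality pipeline when all three keys are present is a guess.

```python
from typing import Mapping


def detect_pipeline(env: Mapping[str, str]) -> str:
    """Return "quality" or "free" based on which API keys are set."""

    def has(key: str) -> bool:
        return bool(env.get(key, "").strip())

    # Assumption: the quality pipeline wins when both of its keys are present.
    if has("ANTHROPIC_API_KEY") and has("VOYAGE_API_KEY"):
        return "quality"
    if has("NVIDIA_API_KEY"):
        return "free"
    raise ValueError(
        "Set NVIDIA_API_KEY, or both ANTHROPIC_API_KEY and VOYAGE_API_KEY"
    )
```

Pass it `os.environ` once the values from `.env` have been loaded into the process environment. A lone `ANTHROPIC_API_KEY` without `VOYAGE_API_KEY` is treated as incomplete and raises.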

To run PROVE for someone other than the default subject, edit subject.toml (name, naming rules, GitHub owner, domain). See docs/CONFIGURATION.md.

👀 Just want to see it work? Seed a small synthetic graph (no API key needed) and skip straight to step 5:
uv run python scripts/seed_demo.py            # structural demo (homepage + treemap)
uv run python scripts/seed_demo.py --embed    # also enables vector search (needs an embed key)

4. Ingest your data 🍽️

# All public repos for a GitHub user
uv run python -m src.ingestion.cli --resume path/to/resume.pdf --github-user your-username

# Or specific repos
uv run python -m src.ingestion.cli --resume path/to/resume.pdf \
  --repos https://github.com/you/repo1 https://github.com/you/repo2

More options (private repos, languages, re-embedding) in docs/INGESTION.md.
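To give a feel for what ingestion extracts from each file, here is a sketch of pulling function and class snippets with their line ranges. PROVE uses Tree-sitter for this step, which handles many languages; this example swaps in Python's standard-library `ast` module so it runs without extra dependencies, and therefore only handles Python source. The `extract_snippets` name and the dictionary keys are assumptions.

```python
import ast


def extract_snippets(source: str) -> list[dict]:
    """Collect every function and class in `source` with its line range."""
    snippets = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            snippets.append(
                {
                    "name": node.name,
                    "kind": "class" if isinstance(node, ast.ClassDef) else "function",
                    "start_line": node.lineno,
                    "end_line": node.end_lineno,
                    "content": ast.get_source_segment(source, node),
                }
            )
    # ast.walk is breadth-first, so sort to get file order.
    return sorted(snippets, key=lambda s: s["start_line"])


SAMPLE = (
    "class Auth:\n"
    "    def refresh_token(self):\n"
    "        return 1\n"
    "\n"
    "def main():\n"
    "    pass\n"
)
snippets = extract_snippets(SAMPLE)
```

Each entry corresponds loosely to a `CodeSnippet` node in the graph: the content and line range are captured here, while the context paragraph and embeddings are added by later ingestion stages.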

5. Launch! 🚀

just dev
# → http://127.0.0.1:7860

No just? The raw command:

CHAT_PROVIDER=anthropic EMBED_PROVIDER=voyage uv run uvicorn src.app:app --port 7860 --reload

6. Verify ✅

Ask: "What are this engineer's strongest skills?" — you should see a narrative answer with GitHub-linked code evidence and a competency treemap building in the right panel. 🏆

📚 Documentation

Doc	What's inside
How It Works	Ingestion, knowledge graph, query pipeline, and JD match — with diagrams
Architecture	Model strategy, context augmentation, taxonomy, dual-provider system, SSE streaming
Context-Augmented Embeddings	The core retrieval technique, written up so you can reuse it in your own code-RAG
Configuration	`subject.toml` + every environment variable
Ingestion Guide	Resume formats, repo sources, languages, re-embedding, architecture summaries
Development	Project structure, testing, structured logging
Deployment	Production stack, deploy commands, security
Contributing	Local setup and the checks CI runs

🤝 Contributing

PRs welcome 🛠️ See CONTRIBUTING.md for setup and conventions. Before submitting:

uv run ruff check src tests
uv run ruff format --check src tests
uv run pytest tests/ -m "not e2e"   # no services needed

📝 License

PROVE is open source under a modified MIT license. Use it, fork it, make it yours 🎁

All I ask: keep the attribution, and if it helped you — let's chat.

GitHub | LinkedIn

Made with 🔥 and intent by @CodeBlackwell
