FastAPI backend for the AI-powered career guidance platform.

| Layer | Technology |
|---|---|
| Framework | FastAPI 0.115 + Uvicorn (ASGI) |
| Database | PostgreSQL 16 on Supabase (asyncpg + SQLAlchemy 2 async) |
| Migrations | Alembic (28 migrations) |
| Background tasks | Celery 5 + Redis |
| Auth | JWT (HS256) + OAuth (Google, GitHub) + TOTP MFA + Email OTP |
| Embeddings | HuggingFace sentence-transformers/all-MiniLM-L6-v2 (384-dim, pgvector) |
| File storage | Supabase Storage (S3-compatible, CV/resume files) |
| Model storage | DigitalOcean Spaces (ML model staging/production) |
| Error tracking | Sentry (sentry-sdk[fastapi]) |
| Metrics | Prometheus (prometheus-fastapi-instrumentator) at `/metrics` |

AI-powered career guidance platform built with FastAPI. Helps students and professionals with career discovery, resume generation, learning roadmaps, job matching, and a personalized AI chat assistant.

| Layer | Technology |
|---|---|
| Web Framework | FastAPI 0.115 (Python 3.12) |
| Database | PostgreSQL 16 + pgvector |
| ORM | SQLAlchemy 2.x (async) |
| Migrations | Alembic |
| Task Queue | Celery 5.x |
| Cache / OTP Store | Upstash Redis (HTTP) |
| Celery Broker | Upstash Redis (`rediss://`) |
| AI / LLM | OpenAI (gpt-4o-mini), LangChain, LangGraph |
| Embeddings | HuggingFace all-MiniLM-L6-v2 (384 dims) |
| File Storage | Supabase Storage (S3-compatible) |
| Email | Mailgun |
| Auth | JWT + bcrypt + TOTP MFA |
| OAuth | Google + GitHub |
| Deployment | Azure Container Apps via GitHub Actions |

python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
---
## Features
- **Auth**: Register, login, email OTP verification, password reset, Google/GitHub OAuth, TOTP MFA, re-authentication, JWT revocation
- **User Profile**: Profile management, skills, work experience, CV upload (S3), profile picture, GitHub sync
- **AI Chat**: WebSocket streaming RAG chatbot powered by a LangGraph ReAct agent + web search
- **Transcripts**: Academic transcript upload with version history, GPA validation, AI profile summary
- **Learning Roadmaps**: AI-generated week-by-week roadmaps with real resource URLs, step progress tracking, completion percentage
- **Resume Generation**: AI-generated resumes from the user knowledge base (transcript, CV, GitHub, profile)
- **Jobs**: Trending careers, in-demand skills, job market stats
- **Notifications**: In-app notification system
- **Admin**: User management (list, update, soft/hard delete, reactivate)
- **Background Tasks**: Async embedding generation via Celery workers
---
## Project Structure
Backend/
├── app/
│ ├── api/ # Route handlers (14 modules)
│ ├── core/ # Config, database, security
│ ├── models/ # SQLAlchemy ORM models (13 models)
│ ├── repositories/ # Data access layer
│ ├── schemas/ # Pydantic request/response models
│ ├── services/ # Business logic
│ ├── tasks/ # Celery background tasks
│ ├── templates/email/ # Email templates
│ └── main.py
├── alembic/ # Database migrations
├── tests/ # Unit, integration, e2e tests
├── scripts/ # Dev/ops scripts
├── Dockerfile # API image
├── Dockerfile.worker # Celery worker image
├── docker-compose.yml # Local dev stack
├── docker-compose.prod.yml
└── requirements.txt
---
## API Endpoints
| Endpoint | Description |
|---|---|
| `POST /api/auth/register` | Register with email/password |
| `POST /api/auth/login` | Login, returns JWT |
| `POST /api/auth/verify-email` | Verify OTP |
| `POST /api/auth/forgot-password` | Request password reset |
| `POST /api/auth/reset-password` | Reset password with OTP |
| `POST /api/auth/logout` | Revoke JWT |
| `GET /api/auth/oauth/google/login` | Google OAuth URL |
| `GET /api/auth/oauth/github/login` | GitHub OAuth URL |
| `POST /api/auth/mfa/enroll` | Start TOTP enrollment |
| `POST /api/auth/mfa/verify` | Verify TOTP code |
| `GET /api/users/me` | Get own profile |
| `PATCH /api/users/me` | Update profile |
| `POST /api/users/me/cv` | Upload CV |
| `GET /api/users/me/github/sync` | Sync GitHub profile |
| `POST /api/users/me/experiences` | Add work experience |
| `GET /api/transcripts/` | List transcript versions |
| `POST /api/transcripts/` | Upload transcript |
| `GET /api/chat/sessions` | List chat sessions |
| `WS /api/chat/ws/{session_id}` | Streaming chat WebSocket |
| `POST /api/roadmaps/generate` | Generate learning roadmap |
| `GET /api/roadmaps/` | List roadmaps with progress % |
| `PATCH /api/roadmaps/steps/{id}/progress` | Update step progress |
| `POST /api/resume/generate` | Generate AI resume |
| `GET /api/jobs/trending` | Trending careers |
| `GET /api/jobs/in-demand-skills` | In-demand skills |
| `GET /api/notifications/` | List notifications |
| `GET /api/health` | Health check |
Full interactive docs at `http://localhost:8000/docs` (Swagger UI).
---
## Local Development
### Prerequisites
- Python 3.12
- PostgreSQL 16 with pgvector extension
- Redis (local, for Celery worker)
### 1. Clone and install
```bash
git clone https://github.com/VentureScope/Backend.git
cd Backend
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
### 2. Configure environment
```bash
cp .env.example .env
# Fill in at minimum: DATABASE_URL, SECRET_KEY
# Generate SECRET_KEY: python -c "import secrets; print(secrets.token_hex(32))"
```
Edit `.env` and fill in all required values; see the Environment Variables Reference for all options.
The app connects to Supabase directly, so no local Postgres is needed for the default setup.
### 3. Run database migrations (local)
```bash
alembic upgrade head
```
### 4. Start the API server
```bash
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```
### Optional: Docker Compose (local Postgres + Redis)
```bash
# API + local Postgres + Redis (DATABASE_URL is overridden to local postgres in compose)
docker compose up -d
# Also start Celery worker
docker compose --profile worker up -d
# Also start Prometheus
docker compose --profile monitoring up -d
```
Services:
Note: docker-compose.yml overrides DATABASE_URL to point at the local
Postgres container. To use Supabase, run uvicorn directly (step 4).

| Method | Path | Description |
|---|---|---|
| POST | `/api/auth/register` | Register with email + password |
| POST | `/api/auth/login` | Login → JWT |
| POST | `/api/auth/logout` | Invalidate token |
| POST | `/api/auth/refresh` | Refresh access token |
| GET | `/api/auth/oauth/google` | Google OAuth flow |
| GET | `/api/auth/oauth/github` | GitHub OAuth flow |
| POST | `/api/auth/mfa/enable` | Enable TOTP MFA |
| POST | `/api/auth/mfa/verify` | Verify TOTP code |


| Method | Path | Description |
|---|---|---|
| GET | `/api/users/me` | Current user profile |
| PATCH | `/api/users/me` | Update profile |
| POST | `/api/users/me/cv` | Upload CV/resume |

Jobs, Chat, Roadmap (/api/*)

| Method | Path | Description |
|---|---|---|
| GET | `/api/jobs` | Search job listings |
| POST | `/api/chat` | AI career chat |
| GET | `/api/roadmap` | Learning roadmap |

Super-admin dashboard (/api/admin/*)
All routes require is_admin=True in the JWT unless noted.

| Method | Path | Description |
|---|---|---|
| GET | `/api/admin/ml/runs` | List training runs (Supabase), filterable by status/model_type |
| GET | `/api/admin/ml/runs/{run_id}` | Single training run with full metrics |
| POST | `/api/admin/ml/deploy/{run_id}` | Deploy model: copies `models/staging/` → `models/production/` in DO Spaces, updates status |
| POST | `/api/admin/ml/trigger` | Trigger `monthly_training_pipeline` DAG via Airflow |


| Method | Path | Description |
|---|---|---|
| GET | `/api/admin/taxonomy/unmatched` | List low-confidence job titles pending review |
| PATCH | `/api/admin/taxonomy/unmatched/{id}` | Accept (→ writes to `taxonomy_roles` DB) or decline |
| GET | `/api/admin/taxonomy/roles` | List accepted canonical roles |


| Method | Path | Description |
|---|---|---|
| GET | `/api/admin/system/pipeline-status` | Last run state for both DAGs (Airflow proxy) |
| GET | `/api/admin/system/pipeline-runs` | ETL run history + task durations (Recharts data) |
| GET | `/api/admin/system/storage` | DO Spaces model file listing + total size |


| Method | Path | Auth | Description |
|---|---|---|---|
| GET | `/api/admin/sentry/summary` | is_admin + 5-min cache | Error counts, trend, top issues, p95, Apdex |
| POST | `/api/admin/sentry-webhook` | HMAC-SHA256 only | Receives Sentry alert webhooks |

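
The `HMAC-SHA256 only` entry for the webhook route means the request is authenticated by a signature over its body rather than by a JWT. A minimal sketch of that kind of check, assuming the sender transmits a hex digest of the raw request body; the header name, the digest encoding, and the helper names `sign_body` and `is_valid_signature` are assumptions for illustration, not the app's actual code:

```python
import hashlib
import hmac


def sign_body(secret: str, body: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw request body."""
    return hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()


def is_valid_signature(secret: str, body: bytes, received: str) -> bool:
    """Compare the received signature to the expected one in constant time."""
    expected = sign_body(secret, body)
    # Comparing bytes avoids a TypeError if the header holds non-ASCII text.
    return hmac.compare_digest(expected.encode(), received.encode())
```

The signature must be computed over the exact bytes received, before any JSON parsing, since re-serializing the payload can change whitespace or key order and invalidate the digest.
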

| Method | Path | Description |
|---|---|---|
| POST | `/api/admin/notifications` | Receive HMAC-signed pipeline webhook from CareerCompass |
| GET | `/api/admin/notifications-feed` | List stored notifications (pipeline + Sentry), paginated |
| PATCH | `/api/admin/notifications-feed/{id}/read` | Mark one notification as read |
| PATCH | `/api/admin/notifications-feed/mark-all-read` | Bulk mark read |


| Method | Path | Description |
|---|---|---|
| GET | `/api/admin/users` | List all users (paginated) |
| GET/PATCH/DELETE | `/api/admin/users/{id}` | Get / update / deactivate user |
| POST | `/api/admin/users/{id}/reactivate` | Reactivate deactivated user |

app/
├── api/
│ ├── deps.py # JWT auth dependencies
│ ├── auth.py, mfa.py # Auth routes
│ ├── users.py, admin.py # User management
│ ├── admin_ml.py # ML pipeline admin + notifications feed
│ ├── admin_taxonomy.py # Taxonomy review admin
│ ├── admin_system.py # System health / Airflow proxy
│ ├── admin_sentry.py # Sentry proxy + webhook receiver
│ ├── chat.py, jobs.py # Core product routes
│ └── health.py
├── core/
│ ├── config.py # Pydantic Settings (validates secrets at startup)
│ ├── database.py # SQLAlchemy async engine + session
│ ├── security.py # JWT helpers
│ └── rate_limit.py # In-process fixed-window rate limiter
├── models/ # SQLAlchemy ORM models (17 files)
├── repositories/ # Data access layer
├── schemas/ # Pydantic request/response models
├── services/
│ ├── airflow_service.py # Airflow REST API client (async, parallel calls)
│ ├── sentry_service.py # Sentry API client (async, 5-min TTL cache, parallel calls)
│ ├── supabase_service.py # asyncpg pool for Supabase admin queries + writes
│ ├── spaces_service.py # Shared DO Spaces boto3 client factory
│ ├── auth_service.py, user_service.py, ...
│ └── email_service.py, embedding_service.py, ...
└── main.py # App factory, lifespan, router mounts
alembic/versions/ # 28 migration files
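
The tree above describes `core/rate_limit.py` as an in-process fixed-window rate limiter. As a rough illustration of that technique (not the module's actual code; the class name, method name, and parameters are hypothetical), a fixed-window limiter counts hits per key inside windows of equal length and rejects calls once the count reaches the limit:

```python
import time
from collections import defaultdict
from typing import Optional


class FixedWindowRateLimiter:
    """Allow at most `limit` hits per key in each window of `window_seconds`."""

    def __init__(self, limit: int, window_seconds: int) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._counts = defaultdict(int)  # (key, window index) -> hits so far

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # All timestamps in the same window share one integer index.
        bucket = (key, int(now // self.window_seconds))
        if self._counts[bucket] >= self.limit:
            return False
        self._counts[bucket] += 1
        return True


limiter = FixedWindowRateLimiter(limit=2, window_seconds=60)
print([limiter.allow("203.0.113.7", now=t) for t in (0, 10, 59, 60)])
```

Because the counters live in process memory, each worker process enforces its own limit, and this sketch never evicts old windows; a long-running service would need to purge them.
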
# Apply all pending migrations
alembic upgrade head
# Create a new migration
alembic revision --autogenerate -m "describe change"
# Check current state
alembic current
# Roll back one step
alembic downgrade -1
API: http://localhost:8000
Docs: http://localhost:8000/docs
### 5. Start the Celery worker (local)
Upstash Redis wire-protocol is blocked by WSL2 networking. Use your local Redis for development:
```bash
CELERY_BROKER_URL=redis://localhost:6379 \
CELERY_RESULT_BACKEND=redis://localhost:6379 \
celery -A app.celery_config.celery_app worker --loglevel=info
```
In production (Azure) the worker uses the Upstash `rediss://` URL from `.env` automatically.
Copy .env.example to .env and fill in:

| Variable | Description |
|---|---|
| `DATABASE_URL` | PostgreSQL async URL (`postgresql+asyncpg://...`) |
| `SECRET_KEY` | JWT signing key; generate with `openssl rand -hex 32` |
| `UPSTASH_REDIS_URL` | Upstash REST URL (`https://...upstash.io`) |
| `UPSTASH_REDIS_TOKEN` | Upstash REST token |
| `CELERY_BROKER_URL` | Upstash wire-protocol URL (`rediss://...`) |
| `CELERY_RESULT_BACKEND` | Same as `CELERY_BROKER_URL` |
| `EMBEDDING_PROVIDER` | `hf` for HuggingFace or `hosted` for OpenAI-compatible |
| `HF_TOKEN` | HuggingFace API token (if `EMBEDDING_PROVIDER=hf`) |
| `EMBEDDING_MODEL_NAME` | e.g. `sentence-transformers/all-MiniLM-L6-v2` |
| `EMBEDDING_DIMENSIONS` | Must match the model output (e.g. 384) |
| `END_POINT` | LLM API base URL |
| `HOSTED_LLM_TOKEN` | LLM API token |
| `CHAT_MODEL_NAME` | LLM model name, e.g. `gpt-4o-mini` |
| `AWS_ACCESS_KEY_ID` | Supabase Storage access key |
| `AWS_SECRET_ACCESS_KEY` | Supabase Storage secret |
| `S3_BUCKET_NAME` | Storage bucket name |
| `S3_ENDPOINT_URL` | Supabase Storage endpoint |
| `MAILGUN_API_KEY` | Mailgun API key |
| `MAILGUN_DOMAIN` | Mailgun sending domain |
| `GOOGLE_CLIENT_ID` | Google OAuth client ID |
| `GOOGLE_CLIENT_SECRET` | Google OAuth client secret |
| `GITHUB_CLIENT_ID` | GitHub OAuth client ID |
| `GITHUB_CLIENT_SECRET` | GitHub OAuth client secret |
| `SERPER_API_KEY` | Serper API key for web search |

Three Celery tasks run in the background:

| Task | Triggered by | What it does |
|---|---|---|
| `generate_user_profile_embedding` | Register, profile update, CV upload, skills update | Builds user document text → generates vector embedding → stores in `users.embedding` |
| `generate_knowledge_embedding` | Transcript upload, CV upload, GitHub sync | Embeds individual knowledge chunks → stores in `user_knowledge.embedding` |
| `batch_generate_knowledge_embeddings` | Transcript re-upload | Re-embeds all knowledge chunks for a source type |

These embeddings power semantic job matching and the RAG chatbot retrieval.
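
To make "semantic matching" concrete: vectors that point in similar directions represent similar text, and cosine similarity is one common way to score that. The sketch below is a pure-Python illustration with toy 3-dimensional vectors; in the app the comparison presumably runs inside PostgreSQL through pgvector on 384-dimensional embeddings, and the function names, the choice of cosine as the metric, and the sample data are assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float],
    candidates: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (id, score) pairs, best match first."""
    scored = [(cid, cosine_similarity(query, vec)) for cid, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors stand in for real user and job embeddings.
user = [1.0, 0.0, 1.0]
jobs = {
    "backend-engineer": [0.9, 0.1, 0.8],
    "graphic-designer": [0.0, 1.0, 0.1],
}
print(rank_by_similarity(user, jobs, top_k=1))
```

The dimension check mirrors the requirement that `EMBEDDING_DIMENSIONS` match the model output: vectors of different lengths cannot be compared.
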

| Model | Table | Purpose |
|---|---|---|
| `User` | `users` | Core user identity, skills, embedding |
| `OAuthAccount` | `oauth_accounts` | Google/GitHub OAuth connections |
| `TokenBlocklist` | `token_blocklist` | JWT revocation store |
| `AcademicTranscript` | `academic_transcripts` | E-student transcript versions |
| `TranscriptConfig` | `transcript_configs` | User GPA scale config |
| `UserKnowledge` | `user_knowledge` | Vector-searchable RAG knowledge chunks |
| `Experience` | `experiences` | Work experience entries |
| `GitHubSyncSnapshot` | `github_sync_snapshots` | Cached GitHub profile data |
| `Job` | `jobs` | Job listings with embeddings |
| `LearningRoadmap` | `learning_roadmaps` | AI-generated learning plans |
| `LearningRoadmapStep` | `learning_roadmap_steps` | Weekly steps |
| `LearningRoadmapStepResource` | `learning_roadmap_step_resources` | Resources per step |
| `LearningRoadmapProgress` | `learning_roadmap_progress` | User progress per step |
| `Resume` | `resumes` | AI-generated resume data |
| `ChatSession` | `chat_sessions` | Conversation threads |
| `ChatMessage` | `chat_messages` | Individual messages |
| `Notification` | `notifications` | In-app notifications |

## Authentication & Security
- **JWT**: HS256-signed tokens with a `jti` UUID for per-token revocation via `token_blocklist`
- **AAL2**: Sensitive routes (password change, account deletion, MFA management) require re-authentication or TOTP verification
- **OTP**: 6-digit codes stored in Upstash Redis with a TTL, rate-limited (60s cooldown, max 3/hour)
- **OAuth CSRF**: State parameter signed with HMAC-SHA256 + timestamp expiry
- **Timing attacks**: Constant-time comparison on passwords and OTP codes throughout
- **bcrypt**: Password hashing via passlib
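
The OAuth CSRF protection above (a state parameter signed with HMAC-SHA256 plus a timestamp expiry) can be sketched roughly as follows. This assumes a `nonce.timestamp.signature` layout and a 10-minute lifetime; the layout, the lifetime, and the function names are assumptions, not the app's actual format.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

STATE_TTL_SECONDS = 600  # assumed lifetime; the real value is not documented here


def sign_state(secret: bytes, now: Optional[float] = None) -> str:
    """Build a state value of the form nonce.timestamp.signature."""
    issued_at = int(time.time() if now is None else now)
    payload = f"{secrets.token_urlsafe(16)}.{issued_at}"
    signature = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{signature}"


def verify_state(secret: bytes, state: str, now: Optional[float] = None) -> bool:
    """Accept the state only if the signature matches and it has not expired."""
    try:
        nonce, issued_at, signature = state.split(".")
        issued = int(issued_at)
    except ValueError:
        return False  # wrong number of parts or a non-numeric timestamp
    expected = hmac.new(secret, f"{nonce}.{issued_at}".encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison, in line with the timing-attack note above.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False
    current = time.time() if now is None else now
    return 0 <= current - issued <= STATE_TTL_SECONDS
```

The server would call `sign_state` before redirecting to the provider and `verify_state` on the callback, so a forged or stale `state` is rejected without storing anything server-side.
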
Deploys automatically to Azure Container Apps on every push to `master-v2` via GitHub Actions. The workflow:
1. Builds and pushes the API image (`Dockerfile`) to GitHub Container Registry
2. Builds and pushes the Worker image (`Dockerfile.worker`) to GitHub Container Registry
3. Updates the `venturescope` Container App with the new API image
4. Updates the `backgroundworker` Container App with the new worker image

| Secret | Value |
|---|---|
| `AZURE_CREDENTIALS` | Azure service principal JSON |
| `AZURE_RG` | Azure resource group name |

Update environment variables on Azure
# API container
az containerapp update \
  --name venturescope \
  --resource-group <AZURE_RG> \
  --set-env-vars KEY="value" KEY2="value2"
# Worker container
az containerapp update \
  --name backgroundworker \
  --resource-group <AZURE_RG> \
  --set-env-vars KEY="value" KEY2="value2"
# All tests
./run_tests.sh
# With coverage
./run_tests.sh coverage
# In Docker
./run_tests.sh docker
# Directly
pytest tests/ -v
## Environment Variables Reference

| Variable | Description |
|---|---|
| `DATABASE_URL` | PostgreSQL async URL (`postgresql+asyncpg://...`) |
| `SECRET_KEY` | JWT signing key; generate with `python -c "import secrets; print(secrets.token_hex(32))"` |

The app refuses to start in production (`ENVIRONMENT=production`) if either
of these is still set to its placeholder default.
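
A startup guard of this kind can be sketched as below. The real check lives in the app's Pydantic Settings; this dependency-free version only illustrates the rule, and the placeholder strings and the function name `validate_settings` are hypothetical.

```python
from typing import Mapping

# Hypothetical placeholder values; the real defaults are defined in the app's config.
PLACEHOLDER_VALUES = {"", "changeme", "your-secret-key"}
REQUIRED_IN_PRODUCTION = ("DATABASE_URL", "SECRET_KEY")


def validate_settings(env: Mapping[str, str]) -> None:
    """Raise RuntimeError if production is configured with placeholder secrets."""
    if env.get("ENVIRONMENT", "development") != "production":
        return  # placeholders are tolerated outside production
    bad = [
        name
        for name in REQUIRED_IN_PRODUCTION
        if env.get(name, "").strip() in PLACEHOLDER_VALUES
    ]
    if bad:
        raise RuntimeError("Refusing to start; placeholder values for: " + ", ".join(bad))
```

Calling it once during startup, for example `validate_settings(os.environ)`, makes a misconfigured deployment fail immediately instead of running with a guessable signing key.
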

| Variable | Default | Description |
|---|---|---|
| `ENVIRONMENT` | `development` | `development` / `staging` / `production` |
| `DEBUG` | `false` | Enable debug mode |
| `ALGORITHM` | `HS256` | JWT algorithm |
| `ACCESS_TOKEN_EXPIRE_MINUTES` | `1440` | 24 hours |


| Variable | Description |
|---|---|
| `GOOGLE_CLIENT_ID` / `GOOGLE_CLIENT_SECRET` | Google OAuth app credentials |
| `GITHUB_CLIENT_ID` / `GITHUB_CLIENT_SECRET` | GitHub OAuth app credentials |
| `OAUTH_STATE_SECRET` | CSRF protection secret (different from `SECRET_KEY`) |


| Variable | Default | Description |
|---|---|---|
| `EMBEDDING_PROVIDER` | `hf` | `hf` (HuggingFace local) or `hosted` (OpenAI-compatible) |
| `EMBEDDING_MODEL_NAME` | `sentence-transformers/all-MiniLM-L6-v2` | Model name |
| `EMBEDDING_DIMENSIONS` | `384` | Must match pgvector column dimension |
| `HF_TOKEN` | | HuggingFace token (for `hf` provider) |
| `END_POINT` / `HOSTED_LLM_TOKEN` | | Hosted LLM endpoint + token (for `hosted` provider) |


| Variable | Description |
|---|---|
| `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` | Supabase Storage S3 credentials |
| `S3_BUCKET_NAME` | Resume bucket name |
| `S3_ENDPOINT_URL` | Supabase Storage endpoint |


| Variable | Description |
|---|---|
| `REDIS_URL` | Redis connection URL (supports `rediss://` TLS) |
| `CELERY_BROKER_URL` / `CELERY_RESULT_BACKEND` | Usually same as `REDIS_URL` |


| Variable | Description |
|---|---|
| `MAILGUN_API_KEY` / `MAILGUN_DOMAIN` | Mailgun credentials |
| `MAILGUN_FROM_EMAIL` | Sender address |
| `OTP_EXPIRE_MINUTES` | OTP validity window (default 10) |

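
To show how `OTP_EXPIRE_MINUTES` interacts with the OTP limits stated under Authentication & Security (60s cooldown, max 3/hour), here is a small in-memory sketch. The app stores codes in Upstash Redis with a TTL; the `OtpStore` class, its method names, and the single-use behaviour are assumptions made for illustration.

```python
import hmac
import secrets
import time
from typing import Dict, List, Optional, Tuple


class OtpStore:
    """In-memory stand-in for a Redis-backed OTP store (illustration only)."""

    def __init__(self, ttl_seconds: int = 600, cooldown_seconds: int = 60, max_per_hour: int = 3) -> None:
        self.ttl_seconds = ttl_seconds            # OTP_EXPIRE_MINUTES * 60
        self.cooldown_seconds = cooldown_seconds
        self.max_per_hour = max_per_hour
        self._codes: Dict[str, Tuple[str, float]] = {}  # email -> (code, expires_at)
        self._issued: Dict[str, List[float]] = {}       # email -> issue timestamps

    def issue(self, email: str, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        recent = [t for t in self._issued.get(email, []) if now - t < 3600]
        if recent and now - recent[-1] < self.cooldown_seconds:
            raise RuntimeError("cooldown active")
        if len(recent) >= self.max_per_hour:
            raise RuntimeError("hourly limit reached")
        code = f"{secrets.randbelow(10**6):06d}"  # 6 digits, zero-padded
        self._codes[email] = (code, now + self.ttl_seconds)
        self._issued[email] = recent + [now]
        return code

    def verify(self, email: str, submitted: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        code, expires_at = self._codes.get(email, ("", 0.0))
        if now >= expires_at:
            return False  # unknown email or expired code
        ok = hmac.compare_digest(code.encode(), submitted.encode())
        if ok:
            del self._codes[email]  # assumed single use
        return ok
```

With Redis, the expiry would normally be delegated to the key's TTL rather than checked by hand as it is here.
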

| Variable | Description |
|---|---|
| `SUPABASE_URL` | Plain psycopg2 URL for Supabase (admin read/write queries) |
| `AIRFLOW_API_URL` | Airflow REST API base URL (`http://...:8080/api/v1`) |
| `AIRFLOW_SERVICE_ACCOUNT_USER` / `AIRFLOW_SERVICE_ACCOUNT_PASSWORD` | Airflow `backend-svc` account |
| `SENTRY_DSN` | Sentry ingest URL |
| `SENTRY_AUTH_TOKEN` | Internal integration token (`project:read` + `org:read`) |
| `SENTRY_ORG_SLUG` / `SENTRY_PROJECT_SLUG` | Sentry org/project identifiers |
| `SENTRY_WEBHOOK_SECRET` | HMAC secret for verifying inbound Sentry webhooks |
| `PIPELINE_WEBHOOK_SECRET` | HMAC secret shared with CareerCompass `notify_admin` task |
| `DO_SPACES_KEY` / `DO_SPACES_SECRET` | DO Spaces credentials for model deploy |
| `DO_SPACES_BUCKET` / `DO_SPACES_ENDPOINT` / `DO_SPACES_REGION` | DO Spaces config |


| Phase | Description | Status |
|---|---|---|
| Scaffold | FastAPI app, config, CORS, folder structure | ✅ Done |
| Auth | JWT, register/login, token blocklist | ✅ Done |
| OAuth | Google + GitHub OAuth 2.0 | ✅ Done |
| MFA | TOTP + Email OTP | ✅ Done |
| Users | Profile update, CV upload, GitHub sync | ✅ Done |
| Alembic | 28 versioned migrations | ✅ Done |
| Jobs | Job listings, search, pgvector similarity | ✅ Done |
| Chat | LangGraph AI career chat | ✅ Done |
| Roadmap | Learning roadmap generation | ✅ Done |
| Admin users | User management endpoints | ✅ Done |
| Phase 2 | Super-admin dashboard (ML, taxonomy, system, Sentry, notifications) | ✅ Done |
| Phase 4 | Prometheus instrumentation + `/metrics` endpoint | ✅ Done |

pytest tests/ -v
pytest tests/unit/ -v
pytest tests/integration/ -v
pytest tests/ --cov=app --cov-report=html
tests/
├── conftest.py # Fixtures: engine, db session, client, users
├── unit/ # Service and repository unit tests (9 modules)
├── integration/ # API endpoint + migration tests (6 modules)
└── e2e/ # Full user journey tests (1 module)
HTTP Request
→ FastAPI Router
→ get_current_user (JWT → blocklist → user fetch)
→ [require_aal2 if sensitive]
→ Route Handler
→ Service Layer (business logic)
→ Repository Layer (SQLAlchemy async)
→ PostgreSQL
→ Celery task dispatched (embeddings)
→ Pydantic response serialization
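
The `get_current_user` step in the flow above (JWT → blocklist → user fetch) depends on each token carrying a `jti` that can be revoked individually. The sketch below hand-rolls HS256 with the standard library purely to stay dependency-free; the app presumably uses a JWT library, and `issue_token`, `decode_token`, and the in-memory `blocklist` set are hypothetical stand-ins for the real helpers and the `token_blocklist` table.

```python
import base64
import hashlib
import hmac
import json
import time
import uuid


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(secret: bytes, user_id: str, ttl_seconds: int = 3600) -> str:
    """Create an HS256 token with a unique jti claim."""
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": user_id, "jti": str(uuid.uuid4()), "exp": int(time.time()) + ttl_seconds}
    signing_input = _b64url(json.dumps(header).encode()) + "." + _b64url(json.dumps(claims).encode())
    signature = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)


def decode_token(secret: bytes, token: str, blocklist: set) -> dict:
    """Verify signature, expiry, and revocation; return the claims."""
    try:
        header_b64, claims_b64, signature_b64 = token.split(".")
        expected = hmac.new(secret, f"{header_b64}.{claims_b64}".encode(), hashlib.sha256).digest()
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            raise ValueError("bad signature")
        claims = json.loads(_b64url_decode(claims_b64))
    except (ValueError, TypeError) as exc:
        raise PermissionError("invalid token") from exc
    if claims["exp"] < time.time():
        raise PermissionError("token expired")
    if claims["jti"] in blocklist:  # logout adds the jti here
        raise PermissionError("token revoked")
    return claims


blocklist = set()
token = issue_token(b"dev-only-secret", "user-1")
claims = decode_token(b"dev-only-secret", token, blocklist)
blocklist.add(claims["jti"])  # simulate POST /api/auth/logout
```

After the `jti` is added to the blocklist, the same token fails `decode_token` even though its signature and expiry are still valid, which is what makes per-token revocation possible with otherwise stateless JWTs.
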
WS Connect (?token=JWT)
→ Auth check
→ Receive message
→ Embed query → vector search UserKnowledge
→ Load message history
→ LangGraph ReAct agent (may call web search tool)
→ Stream tokens → WS send_json
→ Save assistant message
→ Create notification