An open-source SRE Ops dashboard that helps teams apply Site Reliability Engineering practices. Config-driven, Prometheus-native, ships with a zero-infra demo stack.
MIT License · FastAPI + React + Vite · Docker Compose · Live Demo
From DockerHub (no build needed):
docker run -p 8000:8000 ops4life/sre-framework:latest
From source:
git clone https://github.com/ops4life/sre-framework
cd sre-framework
docker compose -f demo/docker-compose.yml up --build
Open http://localhost:8080 for a live SRE dashboard with synthetic metrics from three fake services (frontend, api, worker). No Traefik, no Prometheus to install, no real services needed. Or view the hosted Live Demo.
| Preset | Works with |
|---|---|
| traefik | Traefik reverse proxy + dockerstats + node_exporter |
| http | Any app exposing http_requests_total + http_request_duration_seconds_bucket |
provider: http # or "traefik"
default_service: api
services:
  - name: api
    slo_target: 99.5
    labels:
      service: api # fills {service} in query templates
  - name: frontend
    slo_target: 99.9
    labels:
      service: frontend
For the traefik preset, add a container label too:
labels:
  service: devex-svc@file
  container: devex
docker compose up --build
Set PROMETHEUS_URL and SRE_CONFIG_FILE in your .env (see .env.example).
Add a queries: block to sre.yaml to override or extend the preset:
provider: http
queries:
  # replace the default availability query
  availability: 'avg_over_time(my_custom_up{job="{service}"}[{window}]) * 100'
  # add a signal the preset doesn't have
  saturation: 'my_cpu_ratio{container="{container}"}'
Every panel has a lightbulb icon: hover (or tap on mobile) to see what the metric means, how it's computed, and a link to the CONCEPTS.md primer.
Click Tour in the top bar for a guided walkthrough of the full dashboard.
| Variable | Default | Description |
|---|---|---|
| PROMETHEUS_URL | http://prometheus:9090 | Prometheus API endpoint |
| SRE_CONFIG_FILE | app/config/sre.yaml | Path to main config (mount your own without rebuilding) |
| TRAEFIK_HOST | (ops4life-only) | Domain for Traefik TLS routing |
| COMPOSE_PROJECT_NAME | sre | Docker Compose project name |
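To make the default behaviour in the table concrete, here is a minimal Python sketch of overlaying environment values on those defaults. The helper name `resolve_settings` is hypothetical and is not taken from the project's code; TRAEFIK_HOST is left out because the table gives it no default, and treating an empty value as unset is an assumption.

```python
import os
from typing import Dict, Mapping

# Defaults copied from the table above.
DEFAULTS: Dict[str, str] = {
    "PROMETHEUS_URL": "http://prometheus:9090",
    "SRE_CONFIG_FILE": "app/config/sre.yaml",
    "COMPOSE_PROJECT_NAME": "sre",
}


def resolve_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Hypothetical helper: overlay environment values on the documented defaults.

    An empty string is treated as "not set" and falls back to the default.
    """
    return {key: env.get(key) or default for key, default in DEFAULTS.items()}


if __name__ == "__main__":
    for key, value in resolve_settings().items():
        print(f"{key}={value}")
```

Passing a plain dict instead of `os.environ` makes it easy to check which value wins without touching the real environment.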
The UI variables below are injected at serve time, so changing them requires no rebuild.
| Variable | Default | Description |
|---|---|---|
| SRE_TITLE | SRE Ops — Mission Control | Browser tab title and dashboard heading |
| SRE_TIMEZONE | UTC | Clock display timezone. Any IANA string (e.g. America/New_York). Run timedatectl list-timezones or see tz database |
| SRE_WINDOW | 28d | SLO and error budget evaluation window. Day format only (e.g. 7d, 30d) |
| SRE_FAVICON | /favicon.png | URL to a custom favicon and sidebar logo. To serve a local file, mount it into frontend/dist/ and reference it by path |
| SRE_ACCENT | #caff04 | Override UI accent color. 6-digit hex string (e.g. #3b82f6). Applied at runtime, no rebuild needed. |
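To show how SRE_WINDOW and a service's slo_target relate, the sketch below works out a time-based error budget in Python. It uses the textbook definition (allowed downtime = (100 − SLO) % of the window); whether the dashboard computes its budget from time or from request ratios is not stated here, so treat this as an illustration. The function names are hypothetical.

```python
import re


def window_days(window: str) -> int:
    """Parse a day-format window such as "28d"; reject anything else."""
    match = re.fullmatch(r"(\d+)d", window)
    if match is None or int(match.group(1)) == 0:
        raise ValueError(f"expected a day window like '28d', got {window!r}")
    return int(match.group(1))


def error_budget_minutes(slo_target: float, window: str) -> float:
    """Minutes of full unavailability an SLO allows over the window."""
    total_minutes = window_days(window) * 24 * 60
    return (100.0 - slo_target) / 100.0 * total_minutes


if __name__ == "__main__":
    for target in (99.5, 99.9):
        budget = error_budget_minutes(target, "28d")
        print(f"{target}% over 28d allows {budget:.1f} minutes of downtime")
```

With the default 28d window, a 99.5% target leaves roughly 201.6 minutes of budget and a 99.9% target about 40.3 minutes.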
# .env example
SRE_TITLE=Acme SRE Dashboard
SRE_TIMEZONE=America/Chicago
SRE_WINDOW=30d
sre/
├── app/                          # FastAPI backend
│   ├── config_loader.py          # load sre.yaml + provider preset, render PromQL
│   ├── metrics.py                # query logic (config-driven, provider-agnostic)
│   ├── prometheus.py             # Prometheus HTTP client
│   └── config/
│       ├── sre.yaml              # main user config
│       └── providers/
│           ├── traefik.yaml      # Traefik preset
│           └── http.yaml         # generic HTTP RED preset
├── frontend/                     # React + Vite
│   └── src/
│       ├── components/           # KpiStrip, SloTable, GoldenSignals, ErrorBudgetBurn, CapacityGrid
│       └── content/concepts.ts   # SRE concept definitions (for Learn Mode)
├── demo/                         # standalone zero-infra demo stack
│   ├── docker-compose.yml
│   ├── metrics-generator/        # synthetic Prometheus metrics for 3 fake services
│   ├── prometheus.yml
│   └── sre.demo.yaml
├── tests/                        # pytest: config loader unit tests
├── CONCEPTS.md                   # SRE primer
└── CONTRIBUTING.md
Data flow: Browser → FastAPI /api/sre/overview → config_loader renders PromQL → prometheus.py queries Prometheus → JSON response → React components.
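The Prometheus step of that flow can be sketched with the standard library alone. The code below is not the project's prometheus.py; it is a minimal illustration that assumes the backend uses Prometheus' instant-query endpoint (`/api/v1/query`) and reads the first sample of the result. The function names are hypothetical.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url: str, promql: str) -> str:
    """URL for Prometheus' instant-query endpoint, with the query URL-encoded."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def first_value(payload: dict) -> Optional[float]:
    """First sample of an instant-vector response, or None if there is no data."""
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload.get('error')}")
    result = payload["data"]["result"]
    if not result:
        return None
    # Each sample's "value" is [unix_timestamp, "value-as-string"].
    return float(result[0]["value"][1])


def query_scalar(base_url: str, promql: str, timeout: float = 5.0) -> Optional[float]:
    """Run one instant query against a live Prometheus and return its first value."""
    with urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return first_value(json.load(resp))
```

Returning `None` for an empty result mirrors the idea of a panel rendering null when a signal has no data; raising on a non-success status keeps query errors distinct from missing data.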
Create app/config/providers/<name>.yaml:
name: mystack
latency_unit: seconds # or "milliseconds"
queries:
  availability: '...'
  request_rate: '...'
  latency_p99: '...'
  error_rate: '...'
  saturation: '...' # optional; panel renders null if absent
  cap_vps_cpu: '...' # optional
  cap_vps_mem: '...' # optional
  cap_vps_disk: '...' # optional
  cap_container_cpu: '...' # optional
  cap_container_mem: '...' # optional
Use {service}, {container}, {window} as placeholders. Escape literal PromQL {} as {{ and }}.
Then set provider: mystack in sre.yaml.
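The placeholder and escaping rules match Python's `str.format` conventions, so a template can be rendered roughly as follows. This is a sketch under that assumption, not the code in config_loader.py; `render_query` is a hypothetical name, and the `service` label on `http_requests_total` is only an example.

```python
from typing import Dict


def render_query(template: str, labels: Dict[str, str], window: str) -> str:
    """Fill {service}, {container} and {window}; '{{' and '}}' become literal braces."""
    values = {**labels, "window": window}
    try:
        return template.format_map(values)
    except KeyError as exc:
        # Raised when the template uses a placeholder the service has no label for.
        raise ValueError(f"template needs a value for {exc.args[0]!r}") from exc


if __name__ == "__main__":
    template = 'sum(rate(http_requests_total{{service="{service}"}}[{window}]))'
    print(render_query(template, {"service": "api"}, "28d"))
    # prints: sum(rate(http_requests_total{service="api"}[28d]))
```

Failing loudly on a missing label (for example a template that uses {container} for a service with no container label) surfaces a config mistake before a malformed query reaches Prometheus.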
Every push to main runs the full CI pipeline and publishes a new image to DockerHub:
ops4life/sre-framework:latest # always tracks main
ops4life/sre-framework:<sha> # pinned to commit
Pipeline stages:
- Python tests: pytest tests/
- Frontend: tsc -b && vite build
- Docker build + push (amd64 + arm64): main branch only
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt -r requirements-dev.txt
pytest tests/ -v
Frontend:
cd frontend && pnpm install && pnpm run build
See CONTRIBUTING.md.