Monorepo with two Python packages:
- maturity_tools: tools to assess data maturity.
- data_viewer: Streamlit UI to visualize maturity results.
They are separate so maturity_tools can be used as a dependency without pulling UI dependencies.
The project runs as a small stack (Postgres + API + Streamlit viewer) via Docker Compose.
Each service now builds from its own Dockerfile (inside the package folder) to keep dependencies isolated.
- Docker + Docker Compose
- Secret files (not committed): see secrets/README.md
Docker Compose reads runtime secrets from ./secrets/*, not from the inline secret values in .env.
Keep real values in:
- secrets/postgres_password
- secrets/api_key
- secrets/github_token
- secrets/openai_api_key
make build
make up
However, the scheduler profile must be started early for the cache and summaries to populate, so using the command in the next section is recommended.
The scheduler refreshes cached data, writes metrics snapshots, and runs summaries after each refresh.
Start it as soon as the stack is up so the database populates quickly and summaries stay current.
It needs GITHUB_TOKEN, API_KEY, and OPENAI_API_KEY (or their _FILE variants) when summaries are enabled.
make build
make up-all
- Viewer: http://localhost:8501
- API docs: http://localhost:8000/docs
- Database:
  - POSTGRES_DB (default: maturity)
  - POSTGRES_USER (default: maturity)
  - secrets/postgres_password mounted as POSTGRES_PASSWORD_FILE for the database service
  - App containers use DB_HOST, DB_PORT, DB_NAME, DB_USER, and DB_PASSWORD_FILE
- Scheduler (profile scheduler only):
  - REFRESH_OWNERS=owner1,owner2 (recommended; falls back to DISTINGUISHED_OWNERS if unset)
  - REFRESH_REPO (optional, single repo name)
  - REFRESH_INTERVAL_DAYS (default: 7)
  - REFRESH_INTERVAL_SECONDS (optional override; mainly for testing)
  - FORCE_REFRESH (default: false)
  - RUN_SUMMARIES (default: true)
  - SUMMARY_BASE_URL (default: http://api:8000)
  - SUMMARY_MODEL, SUMMARY_HISTORY, SUMMARY_MAX_AGE_DAYS, SUMMARY_FORCE, SUMMARY_NO_STORE (optional)
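The fallback and default rules for the scheduler variables can be sketched in Python as follows. This is an illustration only, not the scheduler's actual code: the function names `parse_bool` and `scheduler_settings`, the set of accepted truthy strings, and the choice to treat an empty value like an unset one are all assumptions.

```python
import os


def parse_bool(value: str) -> bool:
    # Assumption: these spellings count as "true"; the real scheduler may differ.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def scheduler_settings(env=None) -> dict:
    """Resolve scheduler options from environment-style variables (hypothetical helper)."""
    env = os.environ if env is None else env

    # REFRESH_OWNERS is preferred; DISTINGUISHED_OWNERS is the documented fallback.
    # An empty REFRESH_OWNERS is treated like an unset one here (assumption).
    owners_raw = env.get("REFRESH_OWNERS") or env.get("DISTINGUISHED_OWNERS", "")
    owners = [o.strip() for o in owners_raw.split(",") if o.strip()]

    # REFRESH_INTERVAL_SECONDS overrides REFRESH_INTERVAL_DAYS (default: 7).
    seconds = env.get("REFRESH_INTERVAL_SECONDS")
    if seconds:
        interval = int(seconds)
    else:
        interval = int(env.get("REFRESH_INTERVAL_DAYS", "7")) * 86400

    return {
        "owners": owners,
        "repo": env.get("REFRESH_REPO") or None,
        "interval_seconds": interval,
        "force_refresh": parse_bool(env.get("FORCE_REFRESH", "false")),
        "run_summaries": parse_bool(env.get("RUN_SUMMARIES", "true")),
        "summary_base_url": env.get("SUMMARY_BASE_URL", "http://api:8000"),
    }
```

Passing a plain dict instead of `os.environ` makes it easy to check a configuration before starting the scheduler profile, for example `scheduler_settings({"REFRESH_OWNERS": "owner1,owner2"})`.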
- Cache and metrics snapshots are stored in Postgres only.
- Docker Compose uses DB_* environment variables plus DB_PASSWORD_FILE rather than a single DATABASE_URL.
- Refresh cache/metrics: python scripts/refresh_cache.py --owner <org>
- Summaries are generated by scripts/summarize.py (or by the scheduler when RUN_SUMMARIES=true).
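For tools that still expect a single connection URL, one can be assembled from the DB_* variables and the password file. The sketch below is an assumption-laden illustration rather than code from this repository: the helper name `database_url` is hypothetical, and the fallback values (`db` as host, `5432` as port, `maturity` as database name and user) are guesses based on the Compose service names and the Postgres defaults listed above.

```python
import os
from pathlib import Path
from urllib.parse import quote


def database_url(env=None) -> str:
    """Build a postgresql:// URL from DB_* variables (hypothetical helper)."""
    env = os.environ if env is None else env

    # The password is read from the file named by DB_PASSWORD_FILE.
    password = Path(env["DB_PASSWORD_FILE"]).read_text(encoding="utf-8").strip()

    # Fallbacks below are assumptions, not values confirmed by the project.
    host = env.get("DB_HOST", "db")
    port = env.get("DB_PORT", "5432")
    name = env.get("DB_NAME", "maturity")
    user = env.get("DB_USER", "maturity")

    # Percent-encode user and password so special characters stay valid in a URL.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{name}"
    )
```

Keeping the password in a file and building the URL only in memory avoids placing the secret in the environment, which matches the reason the stack uses DB_PASSWORD_FILE in the first place.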
- Per-service requirements live in dpg_butler_api/requirements.txt, data_viewer/requirements.txt, and scripts/requirements.txt.
- Refresh them with make refresh-requirements after updating Poetry dependencies.
- The requirements.txt at repo root is legacy and not used by Docker Compose.
- .env is for non-secret config and optional local non-Docker overrides.
- In Docker Compose, API_KEY_FILE, GITHUB_TOKEN_FILE, OPENAI_API_KEY_FILE, and DB_PASSWORD_FILE take precedence over plain env vars.
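The precedence of the _FILE variants over plain variables can be illustrated with a small lookup function. This is a minimal sketch of the pattern, assuming a hypothetical helper named `read_secret`; the services in this repository may implement the lookup differently.

```python
import os
from pathlib import Path


def read_secret(name: str, env=None):
    """Return the secret for `name`, preferring `<name>_FILE` (hypothetical helper)."""
    env = os.environ if env is None else env

    # If e.g. API_KEY_FILE is set, the file content wins over API_KEY.
    file_path = env.get(f"{name}_FILE")
    if file_path:
        return Path(file_path).read_text(encoding="utf-8").strip()

    # Otherwise fall back to the plain variable; None if neither is set.
    return env.get(name)
```

With this pattern, `read_secret("GITHUB_TOKEN")` would read the mounted secret file inside Docker Compose and fall back to a plain GITHUB_TOKEN from .env in a local non-Docker run.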
- make build: build api, viewer, and refresh_scheduler
- make build-no-cache: rebuild all service images without Docker cache
- make up: start db, api, and viewer
- make up-all: start db, api, viewer, and refresh_scheduler
- make down: stop the Compose stack
- make ps: show service status
- make logs: follow Compose logs
- MkDocs site: docs/ (config in mkdocs.yml)
- Scripts: docs/scripts.md
- Storage/cache: docs/storage.md
- API: docs/api.md