DPGAlliance/maturity-tool

maturity-tool

Monorepo with two Python packages:

  • maturity_tools: tools to assess data maturity.
  • data_viewer: Streamlit UI to visualize maturity results.

They are kept separate so that maturity_tools can be installed as a dependency without pulling in the UI dependencies.

Docker (recommended)

Runs as a small stack (Postgres + API + Streamlit viewer) via Docker Compose.

Each service builds from its own Dockerfile (inside its package folder) so that service dependencies stay isolated.

Prereqs

  • Docker + Docker Compose
  • Secret files (not committed): see secrets/README.md

Docker Compose reads runtime secrets from ./secrets/*, not from the inline secret values in .env. Keep real values in:

  • secrets/postgres_password
  • secrets/api_key
  • secrets/github_token
  • secrets/openai_api_key
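Before running make up, it can help to confirm that the secret files exist and are non-empty, since Compose would otherwise mount an empty secret. The sketch below is a hypothetical preflight helper, not part of the repository; it only checks the four file names listed above, relative to an assumed ./secrets directory.

```python
from pathlib import Path

# The four secret files this README says Docker Compose reads from ./secrets/.
REQUIRED_SECRETS = ("postgres_password", "api_key", "github_token", "openai_api_key")


def missing_secrets(secrets_dir: str = "secrets") -> list:
    """Return the names of required secret files that are absent or empty.

    Hypothetical helper for a local preflight check; not shipped with the repo.
    """
    base = Path(secrets_dir)
    missing = []
    for name in REQUIRED_SECRETS:
        path = base / name
        # Treat whitespace-only files as missing: the container would see no usable value.
        if not path.is_file() or not path.read_text().strip():
            missing.append(name)
    return missing
```

Calling missing_secrets() from the repository root returns an empty list when every file is in place; any names it returns still need real values.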

Start (local)

make build
make up

However, the cache and summaries only populate once the scheduler profile is running, so the commands in the next section are recommended instead.

Start the scheduler early (recommended for servers)

The scheduler refreshes cached data, writes metrics snapshots, and runs summaries after each refresh. Start it as soon as the stack is up so the database populates quickly and summaries stay current. It needs GITHUB_TOKEN, API_KEY, and OPENAI_API_KEY (or their _FILE variants) when summaries are enabled.

make build
make up-all
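The scheduler cycle described above (refresh cached data and metrics snapshots, then run summaries, then wait for the next interval) can be pictured with the minimal sketch below. It is an illustration under stated assumptions, not the project's scheduler: refresh and summarize are hypothetical callables standing in for the real refresh and summary steps, the first cycle is assumed to run immediately on startup, and max_cycles and sleep exist only so the loop can be exercised without waiting.

```python
import time
from typing import Callable, Optional


def run_scheduler(
    refresh: Callable[[], None],
    summarize: Callable[[], None],
    interval_seconds: float,
    run_summaries: bool = True,
    max_cycles: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Refresh, optionally summarize, then wait; repeat. Returns completed cycles.

    Hypothetical sketch: the callables stand in for the real refresh/summary steps.
    """
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        refresh()  # assumed to update the cache and write a metrics snapshot
        if run_summaries:
            summarize()  # summaries run only after a refresh has completed
        cycles += 1
        # Skip the final sleep when a cycle limit has been reached.
        if max_cycles is None or cycles < max_cycles:
            sleep(interval_seconds)
    return cycles
```

Running the first refresh before any sleep mirrors the advice to start the scheduler early: the database starts filling as soon as the container is up rather than one interval later.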

URLs

Configuration

  • Database:
    • POSTGRES_DB (default: maturity)
    • POSTGRES_USER (default: maturity)
    • secrets/postgres_password mounted as POSTGRES_PASSWORD_FILE for the database service
    • App containers use DB_HOST, DB_PORT, DB_NAME, DB_USER, and DB_PASSWORD_FILE
  • Scheduler (profile scheduler only):
    • REFRESH_OWNERS=owner1,owner2 (recommended; falls back to DISTINGUISHED_OWNERS if unset)
    • REFRESH_REPO (optional, single repo name)
    • REFRESH_INTERVAL_DAYS (default: 7)
    • REFRESH_INTERVAL_SECONDS (optional override; mainly for testing)
    • FORCE_REFRESH (default: false)
    • RUN_SUMMARIES (default: true)
    • SUMMARY_BASE_URL (default: http://api:8000)
    • SUMMARY_MODEL, SUMMARY_HISTORY, SUMMARY_MAX_AGE_DAYS, SUMMARY_FORCE, SUMMARY_NO_STORE (optional)
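The scheduler variables above interact in two places: REFRESH_OWNERS falls back to DISTINGUISHED_OWNERS, and REFRESH_INTERVAL_SECONDS overrides REFRESH_INTERVAL_DAYS. The sketch below shows one way to resolve them using the defaults listed in this README. The SchedulerConfig class, the load_scheduler_config function, and the accepted boolean spellings are assumptions for illustration; the real scheduler may parse these values differently.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional


def _as_bool(value: Optional[str], default: bool) -> bool:
    # Assumed parsing rule: common truthy spellings, case-insensitive.
    if value is None or value.strip() == "":
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass
class SchedulerConfig:  # hypothetical container for the resolved settings
    owners: List[str]
    repo: Optional[str]
    interval_seconds: int
    force_refresh: bool
    run_summaries: bool
    summary_base_url: str


def load_scheduler_config(env: Mapping[str, str] = os.environ) -> SchedulerConfig:
    # REFRESH_OWNERS is preferred; fall back to DISTINGUISHED_OWNERS when unset or empty.
    raw_owners = env.get("REFRESH_OWNERS") or env.get("DISTINGUISHED_OWNERS", "")
    owners = [o.strip() for o in raw_owners.split(",") if o.strip()]
    # REFRESH_INTERVAL_SECONDS, when set, overrides REFRESH_INTERVAL_DAYS (default 7).
    if env.get("REFRESH_INTERVAL_SECONDS"):
        interval = int(env["REFRESH_INTERVAL_SECONDS"])
    else:
        interval = int(env.get("REFRESH_INTERVAL_DAYS", "7")) * 86_400
    return SchedulerConfig(
        owners=owners,
        repo=env.get("REFRESH_REPO") or None,
        interval_seconds=interval,
        force_refresh=_as_bool(env.get("FORCE_REFRESH"), False),
        run_summaries=_as_bool(env.get("RUN_SUMMARIES"), True),
        summary_base_url=env.get("SUMMARY_BASE_URL", "http://api:8000"),
    )
```

Passing a plain dict instead of os.environ makes it easy to check how a given combination of variables resolves before putting it in .env.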

Storage and refresh workflow

  • Cache and metrics snapshots are stored in Postgres only.
  • Docker Compose uses DB_* environment variables plus DB_PASSWORD_FILE rather than a single DATABASE_URL.
  • Refresh cache/metrics: python scripts/refresh_cache.py --owner <org>
  • Summaries are generated by scripts/summarize.py (or by the scheduler when RUN_SUMMARIES=true).
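Because Compose passes DB_* variables plus DB_PASSWORD_FILE rather than a single DATABASE_URL, a client has to assemble the connection settings itself. The sketch below builds libpq-style keywords (host, port, dbname, user, password), which drivers such as psycopg2 accept as connect() keyword arguments. The function name and the fallback values ("db" as the Compose service name, the standard Postgres port 5432, and "maturity" mirroring the POSTGRES_* defaults) are assumptions, not taken from the project's code.

```python
import os
from pathlib import Path
from typing import Dict, Mapping


def db_connect_kwargs(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build libpq-style connection keywords from DB_* variables and DB_PASSWORD_FILE.

    Hypothetical helper; the defaults below are assumptions for illustration.
    """
    kwargs = {
        "host": env.get("DB_HOST", "db"),        # assumed Compose service name
        "port": env.get("DB_PORT", "5432"),      # standard Postgres port
        "dbname": env.get("DB_NAME", "maturity"),  # mirrors POSTGRES_DB default
        "user": env.get("DB_USER", "maturity"),    # mirrors POSTGRES_USER default
    }
    password_file = env.get("DB_PASSWORD_FILE")
    if password_file:
        # Strip the trailing newline that secret files commonly end with.
        kwargs["password"] = Path(password_file).read_text().strip()
    return kwargs
```

With psycopg2 installed, psycopg2.connect(**db_connect_kwargs()) would open a connection using these values; other drivers may expect different keyword names.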

Dependency exports (Poetry)

  • Per-service requirements live in dpg_butler_api/requirements.txt, data_viewer/requirements.txt, and scripts/requirements.txt.
  • Refresh them with make refresh-requirements after updating Poetry dependencies.
  • requirements.txt at repo root is legacy and not used by Docker Compose.

Secrets note

  • .env is for non-secret config and optional local non-Docker overrides.
  • In Docker Compose, API_KEY_FILE, GITHUB_TOKEN_FILE, OPENAI_API_KEY_FILE, and DB_PASSWORD_FILE take precedence over plain env vars.
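The precedence rule above (a NAME_FILE variable wins over a plain NAME variable) is the common Docker secrets convention. The sketch below illustrates that rule with a hypothetical resolve_secret helper; it is not the project's implementation, and it assumes the file contents are the secret value with surrounding whitespace stripped.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_secret(name: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the secret for NAME, preferring the file named by NAME_FILE.

    Hypothetical helper illustrating the _FILE-over-plain-variable precedence.
    """
    file_path = env.get(f"{name}_FILE")
    if file_path:
        # The mounted file takes precedence; a missing file raises FileNotFoundError.
        return Path(file_path).read_text().strip()
    # Otherwise fall back to the plain environment variable (None if unset).
    return env.get(name)
```

For example, resolve_secret("GITHUB_TOKEN") reads the file at GITHUB_TOKEN_FILE inside a container where Compose sets it, and falls back to GITHUB_TOKEN from .env in a local non-Docker run.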

Makefile shortcuts

  • make build: build api, viewer, and refresh_scheduler
  • make build-no-cache: rebuild all service images without Docker cache
  • make up: start db, api, and viewer
  • make up-all: start db, api, viewer, and refresh_scheduler
  • make down: stop the Compose stack
  • make ps: show service status
  • make logs: follow Compose logs

Docs

  • MkDocs site: docs/ (config in mkdocs.yml)
  • Scripts: docs/scripts.md
  • Storage/cache: docs/storage.md
  • API: docs/api.md