From 82531d814c4893e67ccdecb1cbdad182993a5f29 Mon Sep 17 00:00:00 2001
From: Greg Kostin
Date: Wed, 22 Apr 2026 07:54:28 -0400
Subject: [PATCH 1/2] Add backend/config developer guide, DELTA.md automation, and gen_delta.py

- backend/config/README.md: developer guide for updating dspace.cfg secrets
  in Kubernetes across production, workshop, and demo environments. Covers
  the two-layer config (Secret + ConfigMap overrides), kubectl context
  switching, step-by-step secret update workflow, and how to regenerate the
  local cfg files from scratch.
- backend/config/DELTA.md: auto-generated comparison of the three dspace.cfg
  environments (25 differing properties, key findings, recommendations).
  Regenerated by dotpy/gen_delta.py.
- dotpy/gen_delta.py: new script that parses the three
  from-kube.*.dspace.cfg files, computes all differing properties, redacts
  sensitive values, and writes a fresh DELTA.md with a padded Markdown table,
  auto-detected findings, and recommendations.
- dotpy/README.md: added gen_delta.py entry following existing conventions.
- .gitignore: added backend/config/*.cfg to prevent secrets from being
  committed (db passwords, DOI credentials, API keys).
---
 .gitignore               |   1 +
 backend/config/DELTA.md  | 113 ++++++++++++
 backend/config/README.md | 287 ++++++++++++++++++++++++++++++
 dotpy/README.md          |  55 ++++++
 dotpy/gen_delta.py       | 375 +++++++++++++++++++++++++++++++++++++++
 5 files changed, 831 insertions(+)
 create mode 100644 backend/config/DELTA.md
 create mode 100644 backend/config/README.md
 create mode 100755 dotpy/gen_delta.py

diff --git a/.gitignore b/.gitignore
index be47843..e080607 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,2 +1,3 @@
 .DS_Store
 .env
+backend/config/*.cfg
diff --git a/backend/config/DELTA.md b/backend/config/DELTA.md
new file mode 100644
index 0000000..fe694a5
--- /dev/null
+++ b/backend/config/DELTA.md
@@ -0,0 +1,113 @@
+# Config File Comparison: demo vs production vs workshop
+
+> Generated 2026-04-22T02:19:26Z by `python3 dotpy/gen_delta.py`.
+> Re-run after fetching fresh copies of the three cfg files from the cluster.
+
+Comparing the three decrypted Kubernetes config files:
+
+- `from-kube.demo.dspace.cfg`
+- `from-kube.production.dspace.cfg`
+- `from-kube.workshop.dspace.cfg`
+
+> **Note:** these `*.cfg` files are gitignored and are never committed.
+> Fetch them before running this script (see `README.md`).
+
+---
+
+## Summary Table
+
+Properties that differ across environments (25 of 263 total):
+
+| Property | demo | production | workshop |
+|------------------------------------------------------------|---------------------------------------------------|--------------------------------------------------|----------------------------------------------|
+| `alert.recipient` | ${mail.admin} | ulib-deepblue-documents-cron-reporting@umich.edu | ${mail.admin} |
+| `cc.license.jurisdiction` | us | *(absent)* | *(absent)* |
+| `core.authorization.collection-admin.submitters` | false | true | true |
+| `core.authorization.community-admin.item-admin.cc-license` | *(absent)* | false | false |
+| `core.authorization.community-admin.item.create-bitstream` | *(absent)* | true | true |
+| `core.authorization.community-admin.item.delete-bitstream` | *(absent)* | false | false |
+| `db.password` | *(redacted)* | *(redacted)* | *(redacted)* |
+| `db.url` | jdbc:postgresql://localhost:5432/dspace | jdbc:postgresql://localhost:5432/dspace-prod | jdbc:postgresql://localhost:5432/dspace-prod |
+| `db.username` | dspace | dspace-prod | dspace-prod |
+| `filestorage.dir` | ${dspace.dir} | /mnt/prod-assetstore | /mnt/prod-assetstore |
+| `google.analytics.key` | *(absent)* | UA-12656561-1 | *(absent)* |
+| `handle.canonical.prefix` | ${dspace.ui.url}/handle/ | https://hdl.handle.net/ | ${dspace.ui.url}/handle/ |
+| `handle.hide.listhandles` | *(absent)* | false | false |
+| `handle.prefix` | 123456789 | 2027.42 | 123456789 |
+| `handle.remote-resolver.enabled` | *(absent)* | true | true |
+| `handle.use.uuid` | *(absent)* | true | true |
+| `hidden.format` | 86 | 39 | 39 |
+| `identifier.doi.password` | *(redacted)* | *(redacted)* | *(redacted)* |
+| `identifier.doi.password_working` | *(absent)* | *(redacted)* | *(absent)* |
+| `identifier.doi.prefix` | 10.33577 | 10.7302 | 10.33577 |
+| `nodoi.email` | abcblancoj@umich.edu | depositsarefun@acm.org | abcblancoj@umich.edu |
+| `sitemap.dir` | ${dspace.dir}/sitemaps | ${dspace.dir}/data/sitemaps | ${dspace.dir}/data/sitemaps |
+| `sitemap.path` | *(absent)* | sitemaps | *(absent)* |
+| `upload.temp.dir` | ${dspace.dir}/upload | ${dspace.dir}/data/ui-upload | ${dspace.dir}/upload |
+| `webui.browse.index.2` | author:metadata:dc.contributor.*\,dc.creator:text | author:metadata:dc.contributor.author:text | author:metadata:dc.contributor.author:text |
+
+---
+
+## Key Findings
+
+### 1. `dspace.server.url` and `dspace.ui.url` are `localhost` in all three environments
+
+All three config files have:
+
+```
+dspace.server.url = http://localhost:8080/server
+dspace.ui.url = http://localhost:4000
+```
+
+This is **intentional**. These values are always overridden at pod startup by environment variables injected from the `backend-environment` ConfigMap (`dspace__P__server__P__url`, `dspace__P__ui__P__url`). Do **not** edit these in `dspace.cfg`; edit `environments//backend-cm.jsonnet` in `deepblue-documents-kube` instead.
+
+### 2. workshop shares the production database credentials
+
+- `db.url` = `jdbc:postgresql://localhost:5432/dspace-prod` in both production and workshop — workshop connects to the **production database**.
+- `db.username` / `db.password` are also identical in production and workshop.
+
+⚠️ Workshop is **not** an isolated test environment with respect to data. The production assetstore is mounted read-only in workshop (`readOnly: true`), but database writes (deposits, metadata edits, workflow actions) **affect production data**.
+
+### 3. demo uses a local/test assetstore and database
+
+- `filestorage.dir` = `${dspace.dir}` — files stored relative to the DSpace install directory (ephemeral in a container).
+- `db.url` = `jdbc:postgresql://localhost:5432/dspace` with username `dspace` — a lightweight local database, not production.
+
+demo is fully isolated from production data.
+
+### 4. Production has unique real-world identifiers
+
+- `handle.prefix` = `2027.42` is the registered U-M prefix at CNRI. demo and workshop use `123456789` (test/dummy).
+- `identifier.doi.prefix` = `10.7302` is the registered Deep Blue Data prefix at DataCite. demo and workshop use `10.33577` (test).
+- Production has an extra `identifier.doi.password_working` field absent from demo and workshop.
+
+
+### 5. `nodoi.email` differs between production and the other environments
+
+- Production: `depositsarefun@acm.org` — ⚠️ this does not appear to be a valid U-M address; it should be reviewed.
+- demo and workshop: `abcblancoj@umich.edu`.
+
+### 6. IP ranges, API key, and mail settings are identical across all three environments
+
+All three environments share the same values for:
+
+- `ip.bioIPsRange1`
+- `ip.bioIPsRange2`
+- `api.user.key`
+- `mail.server`
+- `mail.from.address`
+
+Changes to these properties must be applied to all three secrets.
+
+---
+
+## Recommendations
+
+1. **Do not edit ConfigMap-controlled properties in `dspace.cfg`.** Properties such as `dspace.server.url`, `dspace.ui.url`, `db.url`, `solr.server`, `handle.prefix`, `identifier.doi.prefix`, and mail settings are overridden at runtime by the `backend-environment` ConfigMap. Edit `environments//backend-cm.jsonnet` in `deepblue-documents-kube` instead.
+
+2. **Isolate workshop from production data.** Workshop should have its own database and assetstore. The current setup (shared `dspace-prod` database) is dangerous for testing.
+
+3. **Review `nodoi.email` in production** — the current value does not appear to be a valid U-M address.
+
+4. **Properties shared by all three environments** (IP ranges, `api.user.key`, mail settings) must be updated in all three secrets simultaneously.
+
diff --git a/backend/config/README.md b/backend/config/README.md
new file mode 100644
index 0000000..41eb10c
--- /dev/null
+++ b/backend/config/README.md
@@ -0,0 +1,287 @@
+# backend/config — Developer Guide: Updating `dspace.cfg` in Kubernetes
+
+This directory contains supporting documentation for working with the
+`dspace.cfg` Kubernetes Secrets across all three deployment environments.
+
+## Files in This Directory
+
+| File | Description |
+|-----------------------------------|------------------------------------------------------------------------|
+| `from-kube.production.dspace.cfg` | Decrypted `dspace.cfg` pulled from the **production** namespace |
+| `from-kube.workshop.dspace.cfg` | Decrypted `dspace.cfg` pulled from the **workshop** namespace |
+| `from-kube.demo.dspace.cfg` | Decrypted `dspace.cfg` pulled from the **demo** namespace |
+| `DELTA.md` | Auto-generated diff of the three configs; key differences and findings |
+
+> **Note:** all `*.cfg` files in this directory are listed in `.gitignore`
+> and are **never committed to the repository**. They contain secrets
+> (database passwords, DOI credentials, API keys) and exist only as
+> local working copies. Fetch them fresh from the cluster before editing
+> (see [Step 1](#1-fetch-the-current-secret) below).
+
+---
+
+## Regenerating This Directory from Scratch
+
+If the `from-kube.*.dspace.cfg` files and/or `DELTA.md` are missing, run the
+following commands from the repository root. You need `kubectl` access to
+both clusters and the two contexts shown below.
+
+### Step A — Fetch the cfg files from the cluster
+
+```shell
+# production runs on its own cluster
+kubectl config use-context deepblue-documents-production
+kubectl -n production get secret dspace-cfg \
+  -o jsonpath="{.data.dspace\.cfg}" | base64 --decode \
+  > backend/config/from-kube.production.dspace.cfg
+
+# workshop and demo both run on the workshop cluster
+kubectl config use-context deepblue-documents-workshop
+kubectl -n workshop get secret dspace-cfg \
+  -o jsonpath="{.data.dspace\.cfg}" | base64 --decode \
+  > backend/config/from-kube.workshop.dspace.cfg
+kubectl -n demo get secret dspace-cfg \
+  -o jsonpath="{.data.dspace\.cfg}" | base64 --decode \
+  > backend/config/from-kube.demo.dspace.cfg
+```
+
+### Step B — Regenerate DELTA.md
+
+```shell
+python3 dotpy/gen_delta.py backend/config
+```
+
+This parses the three cfg files, computes all differing properties, redacts
+sensitive values, and writes `backend/config/DELTA.md`. Run it any time the
+cfg files change to keep the diff current.
+
+---
+
+## How `dspace.cfg` Is Deployed
+
+In Kubernetes, `dspace.cfg` is stored as a **Secret** named `dspace-cfg` in
+each environment's namespace. At pod startup the backend container copies the
+secret to `/dspace/config/dspace.cfg` and starts Tomcat.
+
+Changing the config requires:
+
+1. Fetching the current secret from the cluster.
+2. Editing the plain-text config.
+3. Re-encoding and re-applying the secret.
+4. Restarting the backend pod to pick up the change.
+
+### Two-layer configuration: Secret + ConfigMap overrides
+
+`dspace.cfg` (the Secret) is the base configuration, but a number of its
+properties are **overridden at runtime** by environment variables injected
+from the `backend-environment` **ConfigMap** in each namespace. DSpace 7+
+uses Spring Boot's externalized configuration: an env var named
+`dspace__P__server__P__url` overrides the `dspace.server.url` property in
+`dspace.cfg` (the `__P__` sequence represents a `.`).
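
Reviewer note: the `__P__` naming rule described in the paragraph above can be sketched in a few lines of Python. This is an illustration only and not part of the patch. The helper names `prop_to_env`, `env_to_prop`, and `overridden` are hypothetical, the hostname is a placeholder, and only the `.` to `__P__` substitution stated in the README text is modelled.

```python
# Hypothetical helpers illustrating the `__P__` convention from the README.
# Only the '.' <-> '__P__' substitution described in the text is modelled.


def prop_to_env(prop: str) -> str:
    """Return the env var name that would override a dspace.cfg property."""
    return prop.replace('.', '__P__')


def env_to_prop(env_var: str) -> str:
    """Return the dspace.cfg property that an env var name refers to."""
    return env_var.replace('__P__', '.')


def overridden(props: dict, env: dict) -> dict:
    """Map each cfg property to the env value that would replace it."""
    return {
        key: env[prop_to_env(key)]
        for key in props
        if prop_to_env(key) in env
    }


if __name__ == '__main__':
    cfg = {
        'dspace.server.url': 'http://localhost:8080/server',
        'db.password': 'example-only',
    }
    # Placeholder value, not a real hostname from any environment.
    configmap = {'dspace__P__server__P__url': 'https://example.invalid/server'}
    print(overridden(cfg, configmap))
```

A check like `overridden()` could be run against a fetched cfg file to see which edits would have no effect at runtime because a ConfigMap variable takes precedence.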
+
+The ConfigMap is rendered by Tanka from `environments//backend-cm.jsonnet`
+and sets these properties per environment:
+
+| ConfigMap env var key | Overrides `dspace.cfg` property |
+|----------------------------------------------------|-----------------------------------------------|
+| `dspace__P__server__P__url` | `dspace.server.url` |
+| `dspace__P__ui__P__url` | `dspace.ui.url` |
+| `db__P__url` | `db.url` |
+| `solr__P__server` | `solr.server` |
+| `handle__P__prefix` | `handle.prefix` |
+| `identifier__P__doi__P__prefix` | `identifier.doi.prefix` |
+| `dspace__P__name` / `dspace__P__shortname` | `dspace.name` / `dspace.shortname` |
+| `pr__P__collectionid` | `pr.collectionid` |
+| `hidden__P__format` | `hidden.format` |
+| `mail__P__server`, `mail__P__server__P__port`, etc | Mail settings |
+
+> **Important:** if you change one of the above properties in `dspace.cfg`,
+> the ConfigMap env var will still **win at runtime**. To change these
+> properties for a Kubernetes environment, edit the corresponding
+> `backend-cm.jsonnet` file in the `deepblue-documents-kube` repository
+> and let Argo CD sync the change — do not edit the secret alone.
+>
+> Properties that are **not** in the ConfigMap (e.g. `db.password`,
+> `identifier.doi.password`, IP ranges, `api.user.key`, `bitstream.virus.check`,
+> `nodoi.email`) can only be changed via the `dspace-cfg` Secret.
+
+> **⚠️ Warning — workshop shares the production database.**
+> The `workshop` namespace points at the **production** PostgreSQL database
+> (`dspace-prod`) via the `backend-environment` ConfigMap
+> (`db__P__url = jdbc:postgresql://db:5432/dspace-prod`).
+> The production assetstore is mounted in workshop, but as **read-only**
+> (`readOnly: true`). Database writes (deposits, metadata edits, workflow
+> actions) still affect production data. See `DELTA.md` for full details.
+
+---
+
+## Environment Reference
+
+| Environment | Kubernetes Namespace | Cluster API Server | Public Hostname |
+|--------------|----------------------|----------------------------------------------------------------|-----------------------------------------------|
+| `production` | `production` | `https://production.cluster.deepblue-documents.lib.umich.edu` | `production.deepblue-documents.lib.umich.edu` |
+| `workshop` | `workshop` | `https://workshop.cluster.deepblue-documents.lib.umich.edu` | `workshop.deepblue-documents.lib.umich.edu` |
+| `demo` | `demo` | `https://workshop.cluster.deepblue-documents.lib.umich.edu` | `demo.deepblue-documents.lib.umich.edu` |
+
+---
+
+## Per-Environment Differences, Findings, and Recommendations
+
+See **[`DELTA.md`](DELTA.md)** for the full, auto-generated comparison of all
+three environments — including a table of every differing property, key
+findings, and recommendations. `DELTA.md` is regenerated from the live cfg
+files by running:
+
+```shell
+python3 dotpy/gen_delta.py backend/config
+```
+
+---
+
+## Step-by-Step: Updating `dspace.cfg`
+
+Replace `` below with `production`, `workshop`, or `demo` as
+appropriate.
+
+> **Run all commands from the repository root.** The `.gitignore` rule
+> `backend/config/*.cfg` only protects files written into that directory.
+> Using explicit `backend/config/` paths below ensures decrypted secrets
+> are never accidentally staged or committed.
+
+### 0. Select the correct kubectl context
+
+`production` runs on a separate cluster from `workshop` and `demo`. Switch to
+the right context before running any `kubectl` commands:
+
+```shell
+# for production:
+kubectl config use-context deepblue-documents-production
+
+# for workshop or demo (both run on the workshop cluster):
+kubectl config use-context deepblue-documents-workshop
+```
+
+Confirm the active context at any time with:
+
+```shell
+kubectl config current-context
+```
+
+### 1. Fetch the current secret
+
+```shell
+kubectl -n get secret dspace-cfg \
+  -o jsonpath="{.data.dspace\.cfg}" | base64 --decode \
+  > backend/config/from-kube..dspace.cfg
+```
+
+This writes the decrypted config to `backend/config/from-kube..dspace.cfg`,
+which is covered by the `.gitignore` rule.
+
+### 2. Edit the config
+
+```shell
+$EDITOR backend/config/from-kube..dspace.cfg
+```
+
+Refer to [DELTA.md](DELTA.md) for key per-environment configuration differences
+to confirm you are using values appropriate to the target
+environment. In particular:
+
+- Do **not** use production `handle.prefix` (`2027.42`) or
+  `identifier.doi.prefix` (`10.7302`) in `demo` or `workshop`.
+- Do **not** use production database credentials in `demo`.
+
+### 3. Re-encode to base64
+
+```shell
+base64 < backend/config/from-kube..dspace.cfg \
+  > backend/config/from-kube..dspace.cfg.base64
+```
+
+### 4. Patch the secret directly (recommended)
+
+```shell
+kubectl -n patch secret dspace-cfg \
+  --type='json' \
+  -p="[{\"op\":\"replace\",\"path\":\"/data/dspace.cfg\",\"value\":\"$(cat backend/config/from-kube..dspace.cfg.base64)\"}]"
+```
+
+Alternatively, copy the contents of
+`backend/config/from-kube..dspace.cfg.base64`
+into a local `config-secret.yaml` manifest and apply it:
+
+```shell
+kubectl apply -f config-secret.yaml
+```
+
+### 5. Restart the backend pod
+
+The backend pod must be restarted to reload the mounted secret:
+
+```shell
+kubectl -n rollout restart deployment backend
+```
+
+Wait for the rollout to complete:
+
+```shell
+kubectl -n rollout status deployment backend
+```
+
+### 6. Verify
+
+Check the backend logs for startup errors:
+
+```shell
+kubectl -n logs -l app=backend --tail=100
+```
+
+Confirm the REST API is healthy:
+
+```shell
+# production:
+curl -s https://backend.production.deepblue-documents.lib.umich.edu/server/actuator/health
+
+# workshop:
+curl -s https://backend.workshop.deepblue-documents.lib.umich.edu/server/actuator/health
+
+# demo:
+curl -s https://backend.demo.deepblue-documents.lib.umich.edu/server/actuator/health
+```
+
+---
+
+## Working with Local Config Copies
+
+The `from-kube.*.dspace.cfg` files are gitignored — they are never committed.
+If they are missing, follow [Regenerating This Directory](#regenerating-this-directory-from-scratch)
+above to fetch them from the cluster.
+
+After applying a change to a secret, re-fetch to keep your local copy current:
+
+```shell
+kubectl config use-context deepblue-documents-production
+kubectl -n production get secret dspace-cfg \
+  -o jsonpath="{.data.dspace\.cfg}" | base64 --decode \
+  > backend/config/from-kube.production.dspace.cfg
+
+kubectl config use-context deepblue-documents-workshop
+for NS in workshop demo; do
+  kubectl -n $NS get secret dspace-cfg \
+    -o jsonpath="{.data.dspace\.cfg}" | base64 --decode \
+    > backend/config/from-kube.${NS}.dspace.cfg
+done
+```
+
+Then re-run `gen_delta.py` to update `DELTA.md`:
+
+```shell
+python3 dotpy/gen_delta.py backend/config
+```
+
+Because these files contain secrets (passwords, API keys) they must stay
+out of version control. The `.gitignore` rule `backend/config/*.cfg` ensures
+this. Do not remove or override that rule.
+
+
diff --git a/dotpy/README.md b/dotpy/README.md
index 105773b..d05bf6a 100644
--- a/dotpy/README.md
+++ b/dotpy/README.md
@@ -66,6 +66,61 @@ ERROR: README.md:67 col 3: width mismatch (header=32, this row=28)
 
 ---
 
+### `gen_delta.py` — Regenerate `backend/config/DELTA.md` from the live cfg files
+
+Parses the three gitignored `from-kube.*.dspace.cfg` files in `backend/config/`,
+compares every property across all three environments, and writes a fresh
+`DELTA.md` containing a summary table of differing properties, auto-generated
+key findings, and recommendations. Sensitive values (`db.password`,
+`identifier.doi.password`, `api.user.key`, etc.) are redacted in the output
+so `DELTA.md` is safe to commit.
+
+**Usage**
+
+```shell
+# Auto-discovers backend/config/ and writes backend/config/DELTA.md:
+python3 dotpy/gen_delta.py
+
+# Explicit directory:
+python3 dotpy/gen_delta.py backend/config/
+
+# Explicit directory and output file:
+python3 dotpy/gen_delta.py backend/config/ backend/config/DELTA.md
+
+# Write to stdout instead of a file:
+python3 dotpy/gen_delta.py backend/config/ -
+```
+
+**Prerequisites**
+
+The three `from-kube.*.dspace.cfg` files must exist locally (they are
+gitignored). Fetch them from the cluster first:
+
+```shell
+for NS in production workshop demo; do
+  kubectl -n $NS get secret dspace-cfg \
+    -o jsonpath="{.data.dspace\.cfg}" | base64 --decode \
+    > backend/config/from-kube.${NS}.dspace.cfg
+done
+```
+
+**Example output (stderr)**
+
+```
+Parsed backend/config/from-kube.demo.dspace.cfg (254 properties)
+Parsed backend/config/from-kube.production.dspace.cfg (262 properties)
+Parsed backend/config/from-kube.workshop.dspace.cfg (259 properties)
+Written to backend/config/DELTA.md
+```
+
+**When to use**
+
+- After fetching fresh cfg files from the cluster to update the diff.
+- After making a change to a `dspace-cfg` secret to document what changed.
+- To bootstrap `DELTA.md` from a blank slate.
+
+---
+
 ## Conventions for adding new scripts
 
 When a new Python utility is useful enough to save for future use, add it here:
diff --git a/dotpy/gen_delta.py b/dotpy/gen_delta.py
new file mode 100755
index 0000000..d61fe28
--- /dev/null
+++ b/dotpy/gen_delta.py
@@ -0,0 +1,375 @@
+#!/usr/bin/env python3
+"""
+gen_delta.py — Compare the three environment dspace.cfg Kubernetes secrets
+and regenerate DELTA.md from scratch.
+
+Usage:
+    python3 dotpy/gen_delta.py # auto-discovers backend/config/
+    python3 dotpy/gen_delta.py # explicit dir → /DELTA.md
+    python3 dotpy/gen_delta.py # explicit dir and output file
+    python3 dotpy/gen_delta.py - # write to stdout
+
+config_dir
+    Directory containing the three from-kube.*.dspace.cfg files.
+    Defaults to backend/config/ relative to the repository root.
+
+output
+    Where to write DELTA.md. Defaults to /DELTA.md.
+    Pass '-' to write to stdout instead.
+
+Prerequisites
+    The three input files must exist in config_dir:
+        from-kube.demo.dspace.cfg
+        from-kube.production.dspace.cfg
+        from-kube.workshop.dspace.cfg
+
+    Fetch them from the cluster first (see backend/config/README.md):
+        kubectl -n get secret dspace-cfg \\
+            -o jsonpath="{.data.dspace\\.cfg}" | base64 --decode > \\
+            backend/config/from-kube..dspace.cfg
+"""
+
+from __future__ import annotations
+
+import sys
+from datetime import datetime, timezone
+from pathlib import Path
+
+
+# ── constants ────────────────────────────────────────────────────────────────
+
+ENVS = ['demo', 'production', 'workshop']
+
+# Values for these keys are replaced with a placeholder in committed output
+# because DELTA.md itself is version-controlled.
+REDACTED_KEYS = {
+    'db.password',
+    'identifier.doi.password',
+    'identifier.doi.password_working',
+    'api.user.key',
+}
+
+# Cell values longer than this are truncated with '…' in the table
+MAX_CELL = 58
+
+
+# ── cfg parser ────────────────────────────────────────────────────────────────
+
+def parse_cfg(path: Path) -> dict:
+    """
+    Parse a dspace.cfg (Java .properties style) file.
+
+    Returns a dict mapping property name → value string.
+    Handles:
+      - '#' and blank line comments
+      - 'key = value' and 'key=value' syntax
+      - Multi-line values joined with '\\' line continuations
+    """
+    props: dict = {}
+    key: str | None = None
+    parts: list[str] = []
+
+    def flush():
+        nonlocal key, parts
+        if key is not None:
+            props[key] = ' '.join(parts).strip()
+        key = None
+        parts = []
+
+    with open(path, encoding='utf-8', errors='replace') as fh:
+        for raw in fh:
+            line = raw.rstrip('\n')
+
+            # If the previous line ended with \, this line continues that value.
+            if parts and parts[-1].endswith('\\'):
+                parts[-1] = parts[-1][:-1].rstrip()
+                parts.append(line.strip())
+                continue
+
+            s = line.strip()
+            if not s or s.startswith('#'):
+                # A blank or comment line ends a simple (non-continuation) value
+                # naturally; we let flush() happen on the next key= line.
+                continue
+
+            eq = line.find('=')
+            if eq == -1:
+                continue  # malformed line — skip
+
+            flush()
+            key = line[:eq].strip()
+            parts = [line[eq + 1:].strip()]
+
+    flush()
+    return props
+
+
+# ── formatting helpers ────────────────────────────────────────────────────────
+
+def cell(key: str, val: str | None) -> str:
+    """Return a display-safe, length-capped cell value for a property."""
+    if val is None:
+        return '*(absent)*'
+    if key in REDACTED_KEYS:
+        return '*(redacted)*'
+    if not val:
+        return '*(empty)*'
+    return val if len(val) <= MAX_CELL else val[:MAX_CELL] + '…'
+
+
+def md_table(headers: list[str], rows: list[list[str]]) -> str:
+    """
+    Render a properly padded Markdown table.
+
+    Column widths are determined by the widest cell in each column (header
+    or data). Every cell is left-padded with one space and right-padded to
+    the column width plus one trailing space.
+    """
+    widths = [len(h) for h in headers]
+    for row in rows:
+        for i, c in enumerate(row):
+            widths[i] = max(widths[i], len(c))
+
+    def fmt(cells: list[str]) -> str:
+        return '|' + '|'.join(
+            ' ' + c.ljust(widths[i]) + ' ' for i, c in enumerate(cells)
+        ) + '|'
+
+    sep = '|' + '|'.join('-' * (w + 2) for w in widths) + '|'
+    return '\n'.join([fmt(headers), sep] + [fmt(r) for r in rows])
+
+
+# ── delta generation ──────────────────────────────────────────────────────────
+
+def generate(cfgs: dict[str, dict]) -> str:
+    """
+    Compare cfgs[env] dicts and return DELTA.md content as a string.
+    cfgs keys must be exactly the three strings in ENVS.
+    """
+    # All property keys present in any environment
+    all_keys: set[str] = set()
+    for d in cfgs.values():
+        all_keys.update(d.keys())
+
+    # Partition into differing vs identical
+    differing: list[str] = []
+    identical: list[str] = []
+    for k in sorted(all_keys):
+        vals = {env: cfgs[env].get(k) for env in ENVS}
+        unique = {v for v in vals.values() if v is not None}
+        absent = [env for env in ENVS if k not in cfgs[env]]
+        if len(unique) > 1 or absent:
+            differing.append(k)
+        else:
+            identical.append(k)
+
+    # ── Summary Table ────────────────────────────────────────────────────────
+    headers = ['Property', 'demo', 'production', 'workshop']
+    rows = []
+    for k in differing:
+        rows.append(
+            ['`' + k + '`'] + [cell(k, cfgs[env].get(k)) for env in ENVS]
+        )
+
+    # ── Key Findings ─────────────────────────────────────────────────────────
+    findings: list[tuple[str, str]] = []
+
+    # 1. server/ui URLs
+    server_vals = {env: cfgs[env].get('dspace.server.url', '') for env in ENVS}
+    ui_vals = {env: cfgs[env].get('dspace.ui.url', '') for env in ENVS}
+    if (
+        all('localhost' in v for v in server_vals.values())
+        and all('localhost' in v for v in ui_vals.values())
+    ):
+        findings.append((
+            '`dspace.server.url` and `dspace.ui.url` are `localhost` in all three environments',
+            'All three config files have:\n\n'
+            '```\n'
+            f'dspace.server.url = {server_vals["production"]}\n'
+            f'dspace.ui.url = {cfgs["production"].get("dspace.ui.url", "")}\n'
+            '```\n\n'
+            'This is **intentional**. These values are always overridden at pod startup '
+            'by environment variables injected from the `backend-environment` ConfigMap '
+            '(`dspace__P__server__P__url`, `dspace__P__ui__P__url`). '
+            'Do **not** edit these in `dspace.cfg`; edit '
+            '`environments//backend-cm.jsonnet` in `deepblue-documents-kube` instead.'
+        ))
+
+    # 2. Workshop shares production database
+    w_db = cfgs['workshop'].get('db.url', '')
+    p_db = cfgs['production'].get('db.url', '')
+    d_db = cfgs['demo'].get('db.url', '')
+    if w_db == p_db and w_db != d_db:
+        findings.append((
+            'workshop shares the production database credentials',
+            f'- `db.url` = `{p_db}` in both production and workshop — workshop '
+            'connects to the **production database**.\n'
+            '- `db.username` / `db.password` are also identical in production and workshop.\n\n'
+            '⚠️ Workshop is **not** an isolated test environment with respect to data. '
+            'The production assetstore is mounted read-only in workshop '
+            '(`readOnly: true`), but database writes (deposits, metadata edits, '
+            'workflow actions) **affect production data**.'
+        ))
+
+    # 3. Demo uses local test database
+    if 'localhost' in d_db and 'prod' not in d_db:
+        d_fs = cfgs['demo'].get('filestorage.dir', '')
+        findings.append((
+            'demo uses a local/test assetstore and database',
+            f'- `filestorage.dir` = `{d_fs}` — files stored relative to the DSpace '
+            'install directory (ephemeral in a container).\n'
+            f'- `db.url` = `{d_db}` with username `{cfgs["demo"].get("db.username", "")}` '
+            '— a lightweight local database, not production.\n\n'
+            'demo is fully isolated from production data.'
+        ))
+
+    # 4. Production has real-world identifiers
+    p_handle = cfgs['production'].get('handle.prefix', '')
+    d_handle = cfgs['demo'].get('handle.prefix', '')
+    p_doi = cfgs['production'].get('identifier.doi.prefix', '')
+    d_doi = cfgs['demo'].get('identifier.doi.prefix', '')
+    pw_working = 'identifier.doi.password_working' in cfgs['production']
+    if p_handle != d_handle:
+        findings.append((
+            'Production has unique real-world identifiers',
+            f'- `handle.prefix` = `{p_handle}` is the registered U-M prefix at CNRI. '
+            f'demo and workshop use `{d_handle}` (test/dummy).\n'
+            f'- `identifier.doi.prefix` = `{p_doi}` is the registered Deep Blue Data '
+            f'prefix at DataCite. demo and workshop use `{d_doi}` (test).\n'
+            + ('- Production has an extra `identifier.doi.password_working` field absent '
+               'from demo and workshop.\n' if pw_working else '')
+        ))
+
+    # 5. nodoi.email
+    p_nodoi = cfgs['production'].get('nodoi.email', '')
+    d_nodoi = cfgs['demo'].get('nodoi.email', '')
+    if p_nodoi != d_nodoi:
+        findings.append((
+            '`nodoi.email` differs between production and the other environments',
+            f'- Production: `{p_nodoi}` — ⚠️ this does not appear to be a valid '
+            'U-M address; it should be reviewed.\n'
+            f'- demo and workshop: `{d_nodoi}`.'
+        ))
+
+    # 6. Properties identical in all three (notable ones only)
+    notable_shared = [
+        k for k in ['ip.bioIPsRange1', 'ip.bioIPsRange2', 'api.user.key',
+                    'mail.server', 'mail.from.address']
+        if k in identical
+    ]
+    if notable_shared:
+        findings.append((
+            'IP ranges, API key, and mail settings are identical across all three environments',
+            'All three environments share the same values for:\n\n'
+            + '\n'.join(f'- `{k}`' for k in notable_shared)
+            + '\n\nChanges to these properties must be applied to all three secrets.'
+        ))
+
+    # ── Assemble document ─────────────────────────────────────────────────────
+    now = datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')
+    out: list[str] = [
+        '# Config File Comparison: demo vs production vs workshop',
+        '',
+        f'> Generated {now} by `python3 dotpy/gen_delta.py`.',
+        '> Re-run after fetching fresh copies of the three cfg files from the cluster.',
+        '',
+        'Comparing the three decrypted Kubernetes config files:',
+        '',
+        '- `from-kube.demo.dspace.cfg`',
+        '- `from-kube.production.dspace.cfg`',
+        '- `from-kube.workshop.dspace.cfg`',
+        '',
+        '> **Note:** these `*.cfg` files are gitignored and are never committed.',
+        '> Fetch them before running this script (see `README.md`).',
+        '',
+        '---',
+        '',
+        '## Summary Table',
+        '',
+        f'Properties that differ across environments ({len(differing)} of {len(all_keys)} total):',
+        '',
+        md_table(headers, rows),
+        '',
+        '---',
+        '',
+        '## Key Findings',
+        '',
+    ]
+
+    for i, (title, body) in enumerate(findings, 1):
+        out.append(f'### {i}. {title}')
+        out.append('')
+        out.append(body)
+        out.append('')
+
+    out += [
+        '---',
+        '',
+        '## Recommendations',
+        '',
+        '1. **Do not edit ConfigMap-controlled properties in `dspace.cfg`.** '
+        'Properties such as `dspace.server.url`, `dspace.ui.url`, `db.url`, '
+        '`solr.server`, `handle.prefix`, `identifier.doi.prefix`, and mail '
+        'settings are overridden at runtime by the `backend-environment` ConfigMap. '
+        'Edit `environments//backend-cm.jsonnet` in `deepblue-documents-kube` instead.',
+        '',
+        '2. **Isolate workshop from production data.** '
+        'Workshop should have its own database and assetstore. '
+        'The current setup (shared `dspace-prod` database) is dangerous for testing.',
+        '',
+        '3. **Review `nodoi.email` in production** — the current value does not appear '
+        'to be a valid U-M address.',
+        '',
+        '4. **Properties shared by all three environments** (IP ranges, `api.user.key`, '
+        'mail settings) must be updated in all three secrets simultaneously.',
+        '',
+    ]
+
+    return '\n'.join(out)
+
+
+# ── entry point ───────────────────────────────────────────────────────────────
+
+def main():
+    script_dir = Path(__file__).parent
+    default_config_dir = script_dir.parent / 'backend' / 'config'
+
+    config_dir = Path(sys.argv[1]) if len(sys.argv) > 1 else default_config_dir
+
+    # Verify and load cfg files
+    cfgs: dict[str, dict] = {}
+    for env in ENVS:
+        fname = f'from-kube.{env}.dspace.cfg'
+        fpath = config_dir / fname
+        if not fpath.exists():
+            print(
+                f'ERROR: {fpath} not found.\n'
+                f'Fetch it from the cluster:\n'
+                f' kubectl -n {env} get secret dspace-cfg \\\n'
+                f' -o jsonpath="{{.data.dspace\\.cfg}}" | base64 --decode \\\n'
+                f' > {fpath}',
+                file=sys.stderr,
+            )
+            sys.exit(1)
+        cfgs[env] = parse_cfg(fpath)
+        print(f'Parsed {fpath} ({len(cfgs[env])} properties)', file=sys.stderr)
+
+    content = generate(cfgs)
+
+    # Determine output destination
+    if len(sys.argv) > 2:
+        out_arg = sys.argv[2]
+        if out_arg == '-':
+            print(content)
+            return
+        out_path = Path(out_arg)
+    else:
+        out_path = config_dir / 'DELTA.md'
+
+    out_path.write_text(content + '\n', encoding='utf-8')
+    print(f'Written to {out_path}', file=sys.stderr)
+
+
+if __name__ == '__main__':
+    main()
+

From a6a9314e86d44771d0a9d8540468ec350118ef5d Mon Sep 17 00:00:00 2001
From: Greg Kostin
Date: Wed, 22 Apr 2026 07:55:38 -0400
Subject: [PATCH 2/2] Pull request review fixes

---
 backend/config/README.md | 25 +++++++------------------
 1 file changed, 7 insertions(+), 18 deletions(-)

diff --git a/backend/config/README.md b/backend/config/README.md
index 41eb10c..45ada53 100644
--- a/backend/config/README.md
+++ b/backend/config/README.md
@@ -192,30 +192,19 @@ environment. In particular:
   `identifier.doi.prefix` (`10.7302`) in `demo` or `workshop`.
 - Do **not** use production database credentials in `demo`.
 
-### 3. Re-encode to base64
+### 3. Re-apply the secret
 
-```shell
-base64 < backend/config/from-kube..dspace.cfg \
-  > backend/config/from-kube..dspace.cfg.base64
-```
-
-### 4. Patch the secret directly (recommended)
+Encode the edited file and patch the secret in one pipeline.
+`tr -d '\n'` strips any line-wrapping that `base64` may insert (behaviour
+differs between macOS and Linux), ensuring the value is a single-line string:
 
 ```shell
 kubectl -n patch secret dspace-cfg \
   --type='json' \
-  -p="[{\"op\":\"replace\",\"path\":\"/data/dspace.cfg\",\"value\":\"$(cat backend/config/from-kube..dspace.cfg.base64)\"}]"
-```
-
-Alternatively, copy the contents of
-`backend/config/from-kube..dspace.cfg.base64`
-into a local `config-secret.yaml` manifest and apply it:
-
-```shell
-kubectl apply -f config-secret.yaml
+  -p="[{\"op\":\"replace\",\"path\":\"/data/dspace.cfg\",\"value\":\"$(base64 < backend/config/from-kube..dspace.cfg | tr -d '\n')\"}]"
 ```
 
-### 5. Restart the backend pod
+### 4. Restart the backend pod
 
 The backend pod must be restarted to reload the mounted secret:
 
@@ -229,7 +218,7 @@ Wait for the rollout to complete:
 kubectl -n rollout status deployment backend
 ```
 
-### 6. Verify
+### 5. Verify
 
 Check the backend logs for startup errors:
 
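
Reviewer note on the revised step 3: the encode-and-patch pipeline can also be sketched in Python, which sidesteps the question of how the local `base64` tool wraps its output. This is a minimal sketch and not part of the patch. It assumes the Secret key is `dspace.cfg`, as in the `kubectl patch` command above. The helper name `build_patch` is hypothetical, and the script only prints the JSON Patch document rather than calling `kubectl`.

```python
import base64
import json


def build_patch(cfg_bytes: bytes) -> str:
    """Build a JSON Patch that replaces the Secret's dspace.cfg value.

    base64.b64encode returns the encoding without inserted line breaks,
    so no separate newline-stripping step is needed here.
    """
    encoded = base64.b64encode(cfg_bytes).decode('ascii')
    return json.dumps(
        [{'op': 'replace', 'path': '/data/dspace.cfg', 'value': encoded}]
    )


if __name__ == '__main__':
    # Placeholder content only; a real run would read the edited cfg file.
    sample = b'dspace.name = Example\n' * 10
    print(build_patch(sample))
```

Building the payload with `json.dumps` also avoids the hand-escaped quotes in the shell one-liner. The printed string has the same shape as the `-p` argument used in step 3.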