Pentest fixes and hardening#321
Conversation
Satisfies SLSA v1.2 Source track requirement for a documented Safe Expunging Process. Covers scope, permitted reasons, two-maintainer approval, procedure, and consumer notification.
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Bitnami removed bitnami/nginx:1.24 from Docker Hub in August 2025, breaking compose builds. Point to the bitnamilegacy mirror that Bitnami publishes for deprecated tags.
The celery-ocr service defined in docker-compose.yml builds with ROLE=celery-ocr, but flask/Dockerfile only handled 'flask' and 'celery' branches, so the image had no Python runtime and the container exited 127 with 'celery: not found'. Share the celery install path with celery-ocr, which needs the same deps.
The compose file defaulted ENV_FILE to .env, which is the local development file where POSTGRES_HOST=localhost. That mount made the bayanat container try to reach postgres via a local socket instead of the compose service. .env.docker already ships with the correct service hostnames (postgres, redis) and is the intended default for this compose file.
flask create-db builds the full schema from models, so running db upgrade after it conflicts on indexes the latest migrations try to create. Detect fresh vs existing DB via flask db current.
Swap ubuntu:24.04 for python:3.12-slim-bookworm in the runtime stage and use the official ghcr.io/astral-sh/uv image as the builder. Keeps the ROLE-based conditional install for celery-only deps (exiftool, ffmpeg) and drops incidental ubuntu bulk that is not actually used by the app. - builder stage installs only build headers; runtime stage installs only shared libraries (libpq5, pango, cairo, harfbuzz, libxml2, libxslt, libjpeg62-turbo, libopenjp2, libffi8, dejavu fonts) - libimage-exiftool-perl is kept in the builder because pyexifinfo's setup.py probes for the exiftool binary during wheel install - XDG_CACHE_HOME=/tmp/.cache silences fontconfig cache warnings during weasyprint PDF generation Verified end-to-end against a fresh compose stack: weasyprint PDF gen, psycopg2, pillow, lxml, exiftool, ffmpeg, nginx->uwsgi->Flask login all work. Image sizes drop ~160MB (bayanat) and ~210MB (celery).
The previous `service nginx status` check failed on the bitnami nginx image, which has no sysvinit, so the container always reported unhealthy even when nginx was serving correctly. Switch to a bash TCP probe against port 80 — no external tools required (curl and wget are not present in the image).
## Summary - Installer's systemd unit for `bayanat-celery` only subscribed to the default `celery` queue, but `ocr_single` tasks route to a dedicated `ocr` queue (`enferno/tasks/__init__.py`). - Any bulk OCR dispatched via the admin UI (`POST /admin/api/ocr/bulk`) or the `flask ocr process` CLI piled up in Redis with no consumer. - Single-media OCR (per-item UI button / `flask ocr extract`) was unaffected — it takes a sync path through `process_media_extraction_task()`. ## Fix Add `-Q celery,ocr` to the `ExecStart` line so the worker consumes both queues. ## Affected versions - v4.0.0 (GA install command ships the broken unit). ## Upgrade path ### Fresh install Nothing special — new installs get the fixed unit. ### Existing v4.0.0 installs (in-place patch, no reinstall needed) ``` sudo sed -i.bak 's|worker --autoscale 2,5 -B$|& -Q celery,ocr|' /etc/systemd/system/bayanat-celery.service sudo systemctl daemon-reload sudo systemctl restart bayanat-celery ``` Verify with `sudo journalctl -u bayanat-celery --since "30 seconds ago" | grep queues` — should list both `celery` and `ocr`. ## Test plan - [x] Reproduced on auto-update-simplified deployment: 6 tasks stuck in `ocr` queue with 0 processed, worker idle. - [x] Applied the same fix to prod2 manually: queue drains, `ocr_single` tasks processed end-to-end via Google Vision API. - [x] All 4 test media processed successfully (confidence 24-98%, zero failures). - [ ] Tag `v4.0.1` after merge and update docs installer URL to pin v4.0.1.
Bulletin/actor/incident history routes only checked the global view_*_history permission, never the per-item access of the parent record. A user with history-view permission but no group access to a restricted item could read its full revision payload via the history API. Resolve the parent entity and gate on current_user.can_access(parent) before the history query. Log denied attempts to Activity. Location history is unaffected (Location has no role-based access).
The PUT /api/extraction/<id> endpoint resolved the Extraction row directly by ID without re-checking the parent Media's group membership, mirroring the GET sibling. Any DA/Admin could overwrite text/status on extractions in groups they don't belong to, and the success response leaked the full text+history payload back. Resolve the parent Media, gate on current_user.can_access(media), and trim the success response to to_compact_dict so even authorised calls don't echo the full text/history block.
api_csv_analyze, api_xls_sheet and api_xls_analyze concatenated the caller-supplied filename onto IMPORT_DIR with no containment check, so '../../../../app/.env' resolved outside the import directory and the resulting file content was returned as parsed CSV. With Admin credentials this leaked SECRET_KEY, SECURITY_TOTP_SECRETS and SECURITY_PASSWORD_SALT. Add _resolve_import_path() that joins via werkzeug.safe_join, resolves symlinks, and asserts the candidate is inside IMPORT_DIR. Reject with 400 on traversal or missing filename, with a warning log.
The /login endpoint had no server-side throttle: Flask-Limiter was only attached to /csrf, the session-scoped failure counter is reset by starting a new session, and reCAPTCHA is off by default. An attacker could issue an unbounded number of POSTs against /login. Add a Redis-backed throttle keyed independently on (username) and (ip) with a 15-minute sliding window: 10 failures per username, 30 per IP. Throttle check runs in before_app_request for POST /login and returns 429 once either ceiling is hit. Counters increment in after_app_request on failed login and clear on success. Failed attempts are logged via the regular logger; reCAPTCHA stays as an optional secondary friction layer.
Interactive CRUD runs Bulletin.description / ActorProfile.description through SanitizedField, but the import helpers wrote raw, untrusted strings straight to those columns. Both fields are rendered with v-html in BulletinCard and ActorProfiles, so an attacker-controlled CSV cell or external metadata payload could carry stored XSS that fires when an analyst opens the record. Wrap the import-side writes in sanitize_string() so the same bleach allowlist used by SanitizedField applies before persistence. Covers sheet_import.set_description (actor profile), and the three bulletin.description sinks in media_import (update_description, video info description, text_content).
One test file covering 001 / 002 / 004 / 007 / 008. Each test mirrors the auditor's PoC payload so a future regression flips the test. Mix of e2e (history endpoint, extraction PUT, traversal endpoints) and unit (login throttle helpers, sanitize_string) depending on what was practical to wire through fixtures. 7ASec retest map: run 'uv run pytest tests/test_pentest_fixes.py -v'.
The previous fix reinvented sliding-window rate limiting on top of raw Redis incr/expire. Replace it with two stacked Flask-Limiter decorators applied to security.login view post-registration: one keyed by username, one by IP. Storage, TTL, sliding window, 429 response and X-RateLimit-* headers all come from the existing limiter setup. Limits live in settings.py (LOGIN_RATE_LIMIT_PER_USERNAME and LOGIN_RATE_LIMIT_PER_IP), env-overridable, no UI exposure. TestConfig uses tighter values so the e2e test trips quickly. Drops the custom helpers in rate_limit_utils.py, the throttle code in user/views.py, the _FakeRedis test class, three throttle helper unit tests, and the weak resolver unit test. Adds one e2e test that posts bad credentials repeatedly until 429. Net: -94 lines, leaning on a battle-tested library.
The installer created the bayanat role with createuser -s (full superuser) and inserted 'local all bayanat trust' into pg_hba.conf. Any local OS user could then open a socket connection and claim the bayanat role with no password and full superuser privileges. Drop -s on createuser so the role is a plain owner. Create pg_trgm and postgis as the postgres superuser at install time so the app role never needs CREATE EXTENSION privilege. Replace the 'trust' pg_hba rule with 'peer' so only processes running as the bayanat OS user can authenticate as the bayanat PG role. Idempotent for upgrades: existing installs get ALTER USER ... NOSUPERUSER and the trust rule is removed before the peer rule is inserted.
Celery export tasks (PDF / JSON / CSV / media) iterated the caller-supplied IDs and serialised every row, with no per-record group check. A user with can_export but no group access could submit restricted IDs in an export request; once an admin approved it (the approval UI gives no signal that items cross access boundaries), the requester received the full payload. Add _accessible_items helper that yields only rows the export requester is authorised to see (mirrors current_user.can_access on the direct GET path) and route all three generator tasks plus the media-attachment task through it. CSV path uses an inline check because each iteration mutates a DataFrame. Restricted items get a warning log; missing requester yields nothing (fail closed). UI surface (admin approval warning when items are out-of-group) deferred to a follow-up; this commit closes the data leak.
…r CLI The /api/create-admin route was reachable over the network without authentication during the install window. The supported quick-install script started uWSGI + Caddy before the operator visited the wizard, so the first network client could claim a fully privileged admin account. The same hole reopened any time the admin role had zero users. Drop the route entirely (and its sibling /api/check-admin). The installer now provisions the admin out-of-band: it generates a high-entropy password and runs 'flask install --username admin --password ...' before the service is exposed, then prints the credentials once for the operator. The setup wizard now requires an authenticated Admin session and starts from the language step. flask install gains --username and --password options for non-interactive use; if username is given without a password it generates and prints one. Drive-by: pre-commit ruff dropped 4 pre-existing unused imports (Bulletin, DynamicField, DateHelper, celery_app) and one redundant f-string in commands.py.
The credentials banner already prints username + password, but the operator still has to construct the login URL themselves. Add the fully-qualified /login URL to the banner and explain what the wizard covers after sign-in, so the post-install handoff is one copy + one click.
The native installer (bayanat script) provisions the initial admin via flask install before the service is reachable. The Docker entrypoint was not aligned: it ran create-db on a fresh volume but never created an admin, leaving Pattern A's network surface (wizard requires auth, no admin = no login) effectively unusable for compose users. Add a flask install --username admin call after schema creation. With no --password supplied the command generates a random one and prints the credentials banner to stdout, which docker-compose logs flask captures so the operator can retrieve it after first start.
Two issues from review: - The entrypoint comment referenced 'docker-compose logs flask' but the service in docker-compose.yml is named bayanat. - The deployment docs still told operators to run 'flask install' after startup, which now silently no-ops because the entrypoint bootstrapped the admin already, leaving them without credentials. Fix the comment and rewrite the docs: the first-run password is in 'docker-compose logs bayanat | grep -A4 "Generated password"'; the CLI fallback only mints a fresh password if no admin exists.
Reviewer flagged that the native installer's _bootstrap_admin generated
the password in shell and passed it via 'flask install --password "$pw"',
exposing it through /proc/<pid>/cmdline and 'ps' for the lifetime of
the child.
Add a --password-stdin flag to the flask install CLI (mirroring the
docker login --password-stdin pattern). The installer now feeds the
password over a pipe:
printf '%s\n' "$pw" | flask install --username admin --password-stdin
The password never appears in argv or environ. The Docker entrypoint
already used the safer shape (flask install --username admin generates
and prints) and is unchanged.
f-string substitution into postgresql:// and redis:// URLs broke when passwords contained URL-special characters (/, @, #, +, =). Surfaced during Docker bootstrap testing: a base64 password with '/' caused 'Port could not be cast to integer' at app boot. quote(safe='') around POSTGRES_USER/POSTGRES_PASSWORD/REDIS_PASSWORD fixes Postgres URI plus all four Redis URLs (broker, result, session, cache). Verified roundtrip via redis.from_url decode.
Wizard no longer creates the admin user; the installer (native and Docker entrypoint) prints generated credentials and operators sign in before the wizard runs. Update installation and docker guides to match, and switch examples to docker compose v2 with --env-file .env.docker so substitution of POSTGRES_USER/REDIS_PASSWORD does not silently fall back to empty values.
apodacaduron
left a comment
There was a problem hiding this comment.
BAY-01-001:
- If the user does not belong to the same access group as a restricted bulletin, the history endpoint correctly blocks the request:
GET /admin/api/bulletinhistory/3returns{"message": "Restricted Access"}✅ - Direct access to the restricted bulletin also correctly blocks:
GET /admin/api/bulletin/3returns restricted access and triggers the admin notification ✅ - Related follow-up: the relations endpoint still returns
200for the same restricted parent bulletin. Example:GET /admin/api/bulletin/relations/3?class=actorreturns masked entity stubs plus relationship metadata likerelated_as,probability,comment, anduser_id. This leaks relationship existence/metadata for a restricted parent record and should probably return403.⚠️ - Related follow-up: the graph endpoint also returns
200for the same restricted bulletin. BothGET /admin/api/graph/json?id=3&type=bulletin&expanded=falseandGET /admin/api/graph/json?id=3&type=bulletin&expanded=truereturn masked nodes, but still expose graph topology and metadata, including restricted entity IDs, relation existence, relation type (Perpetrator), connected accessible records, and connected locations (Mexico,USA). This should probably return403when the root entity is restricted from the current user.⚠️ - Related follow-up: the flowmap endpoint masks the restricted actor but still leaks event/location timeline data. Example:
POST /admin/api/flowmap/actors-for-locationswith{"location_ids":[1],"q":[{}]}returns actor1as restricted, but includescurrent_eventandnext_eventdetails such as event type (Detained/Abducted,Death), locations (Mexico,USA), location IDs, and event dates. Restricted actors should probably be excluded from the flowmap result or returned without event details.⚠️
BAY-01-008:
|
BAY-01-009:
|
BAY-01-010
|
BAY-01-011:
|
BAY-01-012:
NOTE: the Bulletin drawer still renders the Media card for this user, including media ID, filename, etag/hash, and media action buttons. The file itself is blocked, but media metadata is still exposed through the parent Bulletin view/UI |
BAY-01-035:
|
BAY-01-036
|
BAY-01-037
|
BAY-01-038:
|
BAY-01-039:
|
BAY-01-040:
|
BAY-01-042:
|
Security fixes and hardening on top of baseline
8615ff11b. Each commit addresses a specific BAY-01-XXX finding so 7ASec can map fixes to their report bygit log --grep BAY-01 pentest-fixes.BAY-01 status
Fixed in this branch (Wave 1+2)
BAY-01-001Revision history bypasses object-level access —80f56b9bcBAY-01-002OCR extraction PUT IDOR —d0f4581daBAY-01-003Export pipeline skips per-item authz —3ce769fe8BAY-01-004Path traversal in CSV/XLS analyze —4b66981d5BAY-01-005Unauthenticated admin takeover via/api/create-admin—597e7e6c1BAY-01-006DB superuser + trust auth in installer —507bab008BAY-01-007No login rate limit —6b6973456(initial) +ecc4aa76f(refactor to Flask-Limiter onsecurity.login)BAY-01-008Stored XSS via import sanitization gap —9800fbed6Awaiting full report from 7ASec (early week of 2026-05-04)
BAY-01-009Insufficient authz on media updateBAY-01-010Relation mutation on restricted itemsBAY-01-011Moderator bulk-update clears access rolesBAY-01-012Direct media APIs bypass accessBAY-01-013Privilege escalation via auto-update root wrapper (likely overlaps with existing CLI work)BAY-01-014Domain allowlist bypass in web import (hardening)Notes for retest
git log --grep BAY-01 pentest-fixes --format=fullerfor the full set.tests/test_pentest_fixes.py(uv run pytest tests/test_pentest_fixes.py -v).flask installis now non-interactive when--username/--passwordare supplied; the installer generates and prints the admin password once. Fresh installs are recommended for retest.Will be converted from draft once all in-scope items are addressed.