Skip to content

Pentest fixes and hardening#321

Open
level09 wants to merge 52 commits into
auto-update-simplifiedfrom
pentest-fixes
Open

Pentest fixes and hardening#321
level09 wants to merge 52 commits into
auto-update-simplifiedfrom
pentest-fixes

Conversation

@level09
Copy link
Copy Markdown
Collaborator

@level09 level09 commented Apr 21, 2026

Security fixes and hardening on top of baseline 8615ff11b. Each commit addresses a specific BAY-01-XXX finding so 7ASec can map fixes to their report by git log --grep BAY-01 pentest-fixes.

BAY-01 status

Fixed in this branch (Wave 1+2)

  • BAY-01-001 Revision history bypasses object-level access — 80f56b9bc
  • BAY-01-002 OCR extraction PUT IDOR — d0f4581da
  • BAY-01-003 Export pipeline skips per-item authz — 3ce769fe8
  • BAY-01-004 Path traversal in CSV/XLS analyze — 4b66981d5
  • BAY-01-005 Unauthenticated admin takeover via /api/create-admin597e7e6c1
  • BAY-01-006 DB superuser + trust auth in installer — 507bab008
  • BAY-01-007 No login rate limit — 6b6973456 (initial) + ecc4aa76f (refactor to Flask-Limiter on security.login)
  • BAY-01-008 Stored XSS via import sanitization gap — 9800fbed6

Awaiting full report from 7ASec (early week of 2026-05-04)

  • BAY-01-009 Insufficient authz on media update
  • BAY-01-010 Relation mutation on restricted items
  • BAY-01-011 Moderator bulk-update clears access roles
  • BAY-01-012 Direct media APIs bypass access
  • BAY-01-013 Privilege escalation via auto-update root wrapper (likely overlaps with existing CLI work)
  • BAY-01-014 Domain allowlist bypass in web import (hardening)

Notes for retest

  • Each fix commit message contains: root cause, what changed, and where to verify.
  • Run git log --grep BAY-01 pentest-fixes --format=fuller for the full set.
  • Regression tests at tests/test_pentest_fixes.py (uv run pytest tests/test_pentest_fixes.py -v).
  • BAY-01-005 changes the install flow: flask install is now non-interactive when --username/--password are supplied; the installer generates and prints the admin password once. Fresh installs are recommended for retest.

Will be converted from draft once all in-scope items are addressed.

Satisfies SLSA v1.2 Source track requirement for a documented
Safe Expunging Process. Covers scope, permitted reasons, two-maintainer
approval, procedure, and consumer notification.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 21, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9ef9b352-bddb-4408-9776-e2419df3ffb7

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch pentest-fixes

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

level09 added 5 commits April 21, 2026 15:31
Bitnami removed bitnami/nginx:1.24 from Docker Hub in August 2025,
breaking compose builds. Point to the bitnamilegacy mirror that
Bitnami publishes for deprecated tags.
The celery-ocr service defined in docker-compose.yml builds with
ROLE=celery-ocr, but flask/Dockerfile only handled 'flask' and
'celery' branches, so the image had no Python runtime and the
container exited 127 with 'celery: not found'. Share the celery
install path with celery-ocr, which needs the same deps.
The compose file defaulted ENV_FILE to .env, which is the local
development file where POSTGRES_HOST=localhost. That mount made
the bayanat container try to reach postgres via a local socket
instead of the compose service. .env.docker already ships with
the correct service hostnames (postgres, redis) and is the
intended default for this compose file.
flask create-db builds the full schema from models, so running
db upgrade after it conflicts on indexes the latest migrations
try to create. Detect fresh vs existing DB via flask db current.
@level09 level09 self-assigned this Apr 21, 2026
level09 added 20 commits April 22, 2026 00:48
Swap ubuntu:24.04 for python:3.12-slim-bookworm in the runtime stage
and use the official ghcr.io/astral-sh/uv image as the builder. Keeps
the ROLE-based conditional install for celery-only deps (exiftool,
ffmpeg) and drops incidental ubuntu bulk that is not actually used by
the app.

- builder stage installs only build headers; runtime stage installs
  only shared libraries (libpq5, pango, cairo, harfbuzz, libxml2,
  libxslt, libjpeg62-turbo, libopenjp2, libffi8, dejavu fonts)
- libimage-exiftool-perl is kept in the builder because pyexifinfo's
  setup.py probes for the exiftool binary during wheel install
- XDG_CACHE_HOME=/tmp/.cache silences fontconfig cache warnings during
  weasyprint PDF generation

Verified end-to-end against a fresh compose stack: weasyprint PDF gen,
psycopg2, pillow, lxml, exiftool, ffmpeg, nginx->uwsgi->Flask login
all work. Image sizes drop ~160MB (bayanat) and ~210MB (celery).
The previous `service nginx status` check failed on the bitnami nginx
image, which has no sysvinit, so the container always reported
unhealthy even when nginx was serving correctly. Switch to a bash TCP
probe against port 80 — no external tools required (curl and wget are
not present in the image).
## Summary

- Installer's systemd unit for `bayanat-celery` only subscribed to the
default `celery` queue, but `ocr_single` tasks route to a dedicated
`ocr` queue (`enferno/tasks/__init__.py`).
- Any bulk OCR dispatched via the admin UI (`POST /admin/api/ocr/bulk`)
or the `flask ocr process` CLI piled up in Redis with no consumer.
- Single-media OCR (per-item UI button / `flask ocr extract`) was
unaffected — it takes a sync path through
`process_media_extraction_task()`.

## Fix

Add `-Q celery,ocr` to the `ExecStart` line so the worker consumes both
queues.

## Affected versions

- v4.0.0 (GA install command ships the broken unit).

## Upgrade path

### Fresh install

Nothing special — new installs get the fixed unit.

### Existing v4.0.0 installs (in-place patch, no reinstall needed)

```
sudo sed -i.bak 's|worker --autoscale 2,5 -B$|& -Q celery,ocr|' /etc/systemd/system/bayanat-celery.service
sudo systemctl daemon-reload
sudo systemctl restart bayanat-celery
```

Verify with `sudo journalctl -u bayanat-celery --since "30 seconds ago"
| grep queues` — should list both `celery` and `ocr`.

## Test plan

- [x] Reproduced on auto-update-simplified deployment: 6 tasks stuck in
`ocr` queue with 0 processed, worker idle.
- [x] Applied the same fix to prod2 manually: queue drains, `ocr_single`
tasks processed end-to-end via Google Vision API.
- [x] All 4 test media processed successfully (confidence 24-98%, zero
failures).
- [ ] Tag `v4.0.1` after merge and update docs installer URL to pin
v4.0.1.
Bulletin/actor/incident history routes only checked the global
view_*_history permission, never the per-item access of the parent
record. A user with history-view permission but no group access to a
restricted item could read its full revision payload via the history
API.

Resolve the parent entity and gate on current_user.can_access(parent)
before the history query. Log denied attempts to Activity. Location
history is unaffected (Location has no role-based access).
The PUT /api/extraction/<id> endpoint resolved the Extraction row
directly by ID without re-checking the parent Media's group
membership, mirroring the GET sibling. Any DA/Admin could overwrite
text/status on extractions in groups they don't belong to, and the
success response leaked the full text+history payload back.

Resolve the parent Media, gate on current_user.can_access(media),
and trim the success response to to_compact_dict so even authorised
calls don't echo the full text/history block.
api_csv_analyze, api_xls_sheet and api_xls_analyze concatenated the
caller-supplied filename onto IMPORT_DIR with no containment check, so
'../../../../app/.env' resolved outside the import directory and the
resulting file content was returned as parsed CSV. With Admin
credentials this leaked SECRET_KEY, SECURITY_TOTP_SECRETS and
SECURITY_PASSWORD_SALT.

Add _resolve_import_path() that joins via werkzeug.safe_join, resolves
symlinks, and asserts the candidate is inside IMPORT_DIR. Reject with
400 on traversal or missing filename, with a warning log.
The /login endpoint had no server-side throttle: Flask-Limiter was
only attached to /csrf, the session-scoped failure counter is reset
by starting a new session, and reCAPTCHA is off by default. An
attacker could issue an unbounded number of POSTs against /login.

Add a Redis-backed throttle keyed independently on (username) and
(ip) with a 15-minute sliding window: 10 failures per username,
30 per IP. Throttle check runs in before_app_request for POST /login
and returns 429 once either ceiling is hit. Counters increment in
after_app_request on failed login and clear on success. Failed
attempts are logged via the regular logger; reCAPTCHA stays as an
optional secondary friction layer.
Interactive CRUD runs Bulletin.description / ActorProfile.description
through SanitizedField, but the import helpers wrote raw, untrusted
strings straight to those columns. Both fields are rendered with
v-html in BulletinCard and ActorProfiles, so an attacker-controlled
CSV cell or external metadata payload could carry stored XSS that
fires when an analyst opens the record.

Wrap the import-side writes in sanitize_string() so the same bleach
allowlist used by SanitizedField applies before persistence.
Covers sheet_import.set_description (actor profile), and the three
bulletin.description sinks in media_import (update_description,
video info description, text_content).
One test file covering 001 / 002 / 004 / 007 / 008. Each test mirrors
the auditor's PoC payload so a future regression flips the test. Mix
of e2e (history endpoint, extraction PUT, traversal endpoints) and
unit (login throttle helpers, sanitize_string) depending on what was
practical to wire through fixtures.

7ASec retest map: run 'uv run pytest tests/test_pentest_fixes.py -v'.
The previous fix reinvented sliding-window rate limiting on top of raw
Redis incr/expire. Replace it with two stacked Flask-Limiter
decorators applied to security.login view post-registration: one keyed
by username, one by IP. Storage, TTL, sliding window, 429 response and
X-RateLimit-* headers all come from the existing limiter setup.

Limits live in settings.py (LOGIN_RATE_LIMIT_PER_USERNAME and
LOGIN_RATE_LIMIT_PER_IP), env-overridable, no UI exposure. TestConfig
uses tighter values so the e2e test trips quickly.

Drops the custom helpers in rate_limit_utils.py, the throttle code in
user/views.py, the _FakeRedis test class, three throttle helper unit
tests, and the weak resolver unit test. Adds one e2e test that posts
bad credentials repeatedly until 429.

Net: -94 lines, leaning on a battle-tested library.
The installer created the bayanat role with createuser -s (full
superuser) and inserted 'local all bayanat trust' into pg_hba.conf.
Any local OS user could then open a socket connection and claim the
bayanat role with no password and full superuser privileges.

Drop -s on createuser so the role is a plain owner. Create pg_trgm
and postgis as the postgres superuser at install time so the app
role never needs CREATE EXTENSION privilege. Replace the 'trust'
pg_hba rule with 'peer' so only processes running as the bayanat OS
user can authenticate as the bayanat PG role.

Idempotent for upgrades: existing installs get ALTER USER ...
NOSUPERUSER and the trust rule is removed before the peer rule is
inserted.
Celery export tasks (PDF / JSON / CSV / media) iterated the
caller-supplied IDs and serialised every row, with no per-record
group check. A user with can_export but no group access could submit
restricted IDs in an export request; once an admin approved it (the
approval UI gives no signal that items cross access boundaries),
the requester received the full payload.

Add _accessible_items helper that yields only rows the export
requester is authorised to see (mirrors current_user.can_access on
the direct GET path) and route all three generator tasks plus the
media-attachment task through it. CSV path uses an inline check
because each iteration mutates a DataFrame. Restricted items get a
warning log; missing requester yields nothing (fail closed).

UI surface (admin approval warning when items are out-of-group)
deferred to a follow-up; this commit closes the data leak.
…r CLI

The /api/create-admin route was reachable over the network without
authentication during the install window. The supported quick-install
script started uWSGI + Caddy before the operator visited the wizard,
so the first network client could claim a fully privileged admin
account. The same hole reopened any time the admin role had zero
users.

Drop the route entirely (and its sibling /api/check-admin). The
installer now provisions the admin out-of-band: it generates a
high-entropy password and runs 'flask install --username admin
--password ...' before the service is exposed, then prints the
credentials once for the operator. The setup wizard now requires an
authenticated Admin session and starts from the language step.

flask install gains --username and --password options for
non-interactive use; if username is given without a password it
generates and prints one.

Drive-by: pre-commit ruff dropped 4 pre-existing unused imports
(Bulletin, DynamicField, DateHelper, celery_app) and one redundant
f-string in commands.py.
The credentials banner already prints username + password, but the
operator still has to construct the login URL themselves. Add the
fully-qualified /login URL to the banner and explain what the wizard
covers after sign-in, so the post-install handoff is one copy + one
click.
The native installer (bayanat script) provisions the initial admin via
flask install before the service is reachable. The Docker entrypoint
was not aligned: it ran create-db on a fresh volume but never created
an admin, leaving Pattern A's network surface (wizard requires auth,
no admin = no login) effectively unusable for compose users.

Add a flask install --username admin call after schema creation. With
no --password supplied the command generates a random one and prints
the credentials banner to stdout, which docker-compose logs flask
captures so the operator can retrieve it after first start.
Two issues from review:
- The entrypoint comment referenced 'docker-compose logs flask' but
  the service in docker-compose.yml is named bayanat.
- The deployment docs still told operators to run 'flask install'
  after startup, which now silently no-ops because the entrypoint
  bootstrapped the admin already, leaving them without credentials.

Fix the comment and rewrite the docs: the first-run password is in
'docker-compose logs bayanat | grep -A4 "Generated password"';
the CLI fallback only mints a fresh password if no admin exists.
Reviewer flagged that the native installer's _bootstrap_admin generated
the password in shell and passed it via 'flask install --password "$pw"',
exposing it through /proc/<pid>/cmdline and 'ps' for the lifetime of
the child.

Add a --password-stdin flag to the flask install CLI (mirroring the
docker login --password-stdin pattern). The installer now feeds the
password over a pipe:

    printf '%s\n' "$pw" | flask install --username admin --password-stdin

The password never appears in argv or environ. The Docker entrypoint
already used the safer shape (flask install --username admin generates
and prints) and is unchanged.
f-string substitution into postgresql:// and redis:// URLs broke
when passwords contained URL-special characters (/, @, #, +, =).
Surfaced during Docker bootstrap testing: a base64 password with
'/' caused 'Port could not be cast to integer' at app boot.

quote(safe='') around POSTGRES_USER/POSTGRES_PASSWORD/REDIS_PASSWORD
fixes Postgres URI plus all four Redis URLs (broker, result, session,
cache). Verified roundtrip via redis.from_url decode.
Wizard no longer creates the admin user; the installer (native and
Docker entrypoint) prints generated credentials and operators sign in
before the wizard runs. Update installation and docker guides to match,
and switch examples to docker compose v2 with --env-file .env.docker
so substitution of POSTGRES_USER/REDIS_PASSWORD does not silently fall
back to empty values.
Copy link
Copy Markdown
Contributor

@apodacaduron apodacaduron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BAY-01-001:

  • If the user does not belong to the same access group as a restricted bulletin, the history endpoint correctly blocks the request: GET /admin/api/bulletinhistory/3 returns {"message": "Restricted Access"}
  • Direct access to the restricted bulletin also correctly blocks: GET /admin/api/bulletin/3 returns restricted access and triggers the admin notification ✅
  • Related follow-up: the relations endpoint still returns 200 for the same restricted parent bulletin. Example: GET /admin/api/bulletin/relations/3?class=actor returns masked entity stubs plus relationship metadata like related_as, probability, comment, and user_id. This leaks relationship existence/metadata for a restricted parent record and should probably return 403. ⚠️
  • Related follow-up: the graph endpoint also returns 200 for the same restricted bulletin. Both GET /admin/api/graph/json?id=3&type=bulletin&expanded=false and GET /admin/api/graph/json?id=3&type=bulletin&expanded=true return masked nodes, but still expose graph topology and metadata, including restricted entity IDs, relation existence, relation type (Perpetrator), connected accessible records, and connected locations (Mexico, USA). This should probably return 403 when the root entity is restricted from the current user. ⚠️
  • Related follow-up: the flowmap endpoint masks the restricted actor but still leaks event/location timeline data. Example: POST /admin/api/flowmap/actors-for-locations with {"location_ids":[1],"q":[{}]} returns actor 1 as restricted, but includes current_event and next_event details such as event type (Detained/Abducted, Death), locations (Mexico, USA), location IDs, and event dates. Restricted actors should probably be excluded from the flowmap result or returned without event details. ⚠️

@apodacaduron
Copy link
Copy Markdown
Contributor

BAY-01-008:

  • The original reported import path for Actor Profile Description appears to be fixed: importing <img src=x onerror=alert(document.domain)> through the sheet import stores a sanitized value, and the onerror handler is removed before the value reaches the actor description v-html sink ✅
  • Follow-up found while testing the same import surface: the Missing Person import path still persists raw HTML when MP fields are mapped/populated and the actor profile is created with mode = 3. The payload is then rendered by MissingPersonField.js via v-html. In my test, CSP blocked execution of the onerror handler, but the raw attacker-controlled HTML was still stored and rendered, so this should not be considered fully remediated ⚠️
  • Suggested fix: apply the same sanitize_string() policy to imported MP fields before persistence, especially MP detail fields such as seen_in_detention_details, injured_details, known_dead_details, and skin_markings_details, plus direct MP text fields rendered by MissingPersonField.js. Alternatively, render these values as plain text instead of v-html.

@apodacaduron
Copy link
Copy Markdown
Contributor

BAY-01-009:

  • The main media update endpoint looks fixed now. When I tested PUT /admin/api/media/<id> as another DA user who can see the bulletin but is not assigned to it, the request correctly returned 403

  • However, I found the same assignment issue still exists on related media actions. As the unassigned DA user, these still returned 200:

    • PUT /admin/api/media/<id>/orientation
    • POST /admin/api/ocr/process/<media_id>
    • POST /admin/api/ocr/bulk
    • PUT /admin/api/extraction/<extraction_id>
  • So the direct media update is fixed, but the related media/OCR/extraction actions still seem to be missing the same assigned-user check. A DA who is not assigned to the bulletin can still rotate media, trigger OCR, and update extracted text on another analyst’s assigned item ⚠️

  • I think these endpoints should use the same edit boundary as PUT /admin/api/media/<id>, and we should add regression tests for the related media actions too.

@apodacaduron
Copy link
Copy Markdown
Contributor

BAY-01-010

  • The request completed, and the restricted/private relation targets were skipped and were not added to the Bulletin ✅

@apodacaduron
Copy link
Copy Markdown
Contributor

BAY-01-011:

  • Tested the Moderator bulk update case Bulletin access roles were not cleared after processing ✅

@apodacaduron
Copy link
Copy Markdown
Contributor

apodacaduron commented May 26, 2026

BAY-01-012:

  • Direct media/file access is now blocked with 403 / Unauthorized
  • The transcribe/media API flow is also blocked with 403

NOTE: the Bulletin drawer still renders the Media card for this user, including media ID, filename, etag/hash, and media action buttons. The file itself is blocked, but media metadata is still exposed through the parent Bulletin view/UI

@apodacaduron
Copy link
Copy Markdown
Contributor

BAY-01-035:

  • Tested the ragged CSV reproducer (a,b,c followed by 1,2,3,4,5) against SheetImport.parse_csv().
  • The parser now returns a normal JSON-serializable structure instead of tuple keys ✅
  • The extra fields are dropped and the result serializes cleanly, so this looks fixed.

@apodacaduron
Copy link
Copy Markdown
Contributor

BAY-01-036

  • Uploading a .xls file returns HTTP 500 Internal Server Error ❌
  • The fix pins engine="openpyxl" but openpyxl can't read .xls files, so the error still happens, just differently
  • .xls is still in the allowlist so users can still upload it

@apodacaduron
Copy link
Copy Markdown
Contributor

BAY-01-037

  • sheet=null and sheet=[0] on /import/api/xls/analyze correctly return 417 ✅
  • sheet=true on /import/api/xls/analyze returns HTTP 500 ❌
  • sheet=true on /import/api/process-sheet returns HTTP 500 ❌ — this endpoint has no type check

@apodacaduron
Copy link
Copy Markdown
Contributor

BAY-01-038:

  • XLSX import works correctly with text headers (a, b, c) mapped to actor name fields ✅
  • With numeric headers (1, 2, 3), the mapping screen accepts them, but the imported actor name fields are not populated and the actor is created as UNKNOWN UNKNOWN ⚠️

@apodacaduron
Copy link
Copy Markdown
Contributor

BAY-01-039:

  • Covered while testing BAY-01-008: the main sheet-import description and mismatch paths now sanitize the XSS payloads before they reach the actor description ✅
  • Related follow-up remains in the Missing Person import fields, which can still render raw imported HTML through MissingPersonField.js with CSP blocking execution ⚠️

@apodacaduron
Copy link
Copy Markdown
Contributor

BAY-01-040:

  • Tested malformed export request bodies like null, [], "hello", {}, {"config": null}, and {"config": "foo"}.
  • They no longer crash the server with HTTP 500; the API now returns a controlled {"message": "Error creating export request"} response ✅

@apodacaduron
Copy link
Copy Markdown
Contributor

BAY-01-042:

  • Tested the malformed import API bodies from the report. Those now return controlled responses instead of HTTP 500 ✅
  • Follow-up: /import/api/process-sheet still returns HTTP 500 for malformed files input, for example {}, {"files": null}, or {"files": "x"}. This looks like the same class of issue and should validate that files is a non-empty list before processing ⚠️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants