Skip to content

Security audit: fix 32 confirmed vulnerabilities (10 critical / 16 high / 5 medium / 1 low)#36

Open
erseco wants to merge 11 commits into
mainfrom
fix/critical-security-audit
Open

Security audit: fix 32 confirmed vulnerabilities (10 critical / 16 high / 5 medium / 1 low)#36
erseco wants to merge 11 commits into
mainfrom
fix/critical-security-audit

Conversation

@erseco
Copy link
Copy Markdown
Member

@erseco erseco commented Jun 2, 2026

Security audit: fix 32 confirmed vulnerabilities (10 critical / 16 high / 5 medium / 1 low)

Fixes every issue from a multi-agent security audit that survived independent adversarial verification. Organized into themed commits grouped by root cause so the change is reviewable and bisectable. Every fix ships with colocated tests in the same change.

Summary

  • 68 files changed, 12 commits.
  • make fix, make test-unit (incl. the per-file 90% line-coverage gate), make test-integration, and make test-e2e all green locally.
  • New .ts files have colocated *.spec.ts; new/changed public/app JS has colocated *.test.js; user-visible flows covered by existing/updated E2E specs.

Fixes by theme

1. Path traversal / arbitrary file write — shared safe-path foundation + sinks

New src/utils/safe-path.ts (single source of truth: isSafePathSegment / safeJoin / sanitizeFileExtension) plus a fix to file-helper.isPathSafe, which used a naive startsWith(base) that treated /data/assets-evil as inside /data/assets. Applied to every sink building an on-disk path from untrusted input:

  • C1 (critical)clientId used directly as the on-disk filename across asset upload/finalize/sync/stream (assets.ts), batch upload (upload-session.ts), and API v1 (api/v1/assets.ts) → arbitrary file write/overwrite (webshell / DB / cross-tenant). The repo already had isPathSafe/sanitizeFolderPath but applied them to the wrong field.
  • C2 (critical)odeSessionId path param in export download (GET/POST) → arbitrary file write + directory read (export.ts).
  • C3 (critical)themeName/themeId in theme bundle/download/metadata endpoints → unauthenticated arbitrary file/dir read, incl. .env/DB (themes.ts).
  • H10 (high)resumableIdentifier chunk-directory traversal (assets.ts).
  • H14 (high)ideviceId metadata endpoint traversal (idevices.ts).

No auth added to the public theme/idevice resource endpoints (they are legitimately fetched unauthenticated by preview/embedding/static builds); the traversal validation is the fix.

2. Broken authorization / IDOR

  • C4 (critical)POST /projects/uuid/:uuid/duplicate had no auth/access check; any caller could clone an arbitrary private project into their account. Now requires auth + checkProjectAccess on the source.
  • H4 (high)GET /api/games/:odeSessionId/idevices returned full iDevice content of any project by UUID, unauthenticated. Now withJwtAuth + enforceProjectAccess. Verified the editor/preview caller is same-origin and sends the auth cookie; exported static packages never call the live endpoint.
  • M1 (medium) — project sharing endpoints leaked owner/collaborator emails for any project (numeric variant unauthenticated + enumerable). Now require auth + access.
  • M2 (medium)set_platform_new_ode exported any project by UUID with no ownership binding. Now requires the project's platform_id to match the validated JWT cmid.

3. SSRF

New src/utils/ssrf-guard.ts: safeFetch allows only http(s), resolves the host and rejects private/loopback/link-local/CGNAT/unspecified addresses, and follows redirects manually re-validating each hop (DNS + fetch injectable for hermetic tests).

  • H2 (high)link-validator.validateLink fetched arbitrary URLs with redirect: 'follow' and no egress filter. Now routes through safeFetch; a blocked target is reported as broken without leaking the resolved IP.
  • H1 (high)/brokenlinks reimplemented the same unguarded fetch inline and was unauthenticated. Deleted the duplicate, calls the shared SSRF-safe service, and requires auth on all three brokenlinks endpoints.
  • M3 (medium)isAllowedProviderUrl failed open when PROVIDER_URLS was empty (the shipped default). Now fails closed; returnurls with internal-IP/non-http hosts are rejected.

4. Yjs persistence

  • C7 (critical) — the document mutex woke all waiters on release, so several callers believed they held the lock (reproduced: 4–5 of 5 concurrent). Replaced with a real FIFO mutex; release is idempotent.
  • C6 (critical) — REST API v1 edits were saved to yjs_updates with hardcoded version '1', always older than the Date.now() snapshot version, so they were filtered out on load (silent, unrecoverable loss). Now uses a monotonic timestamp version.
  • H6 (high)saveFullState did a non-transactional delete-then-insert; a failed insert wiped state. Now wrapped in a transaction.
  • H5 (high) — the client load endpoint read only the snapshot table; a project whose state lived in yjs_updates loaded empty/404. Now reconstructs snapshot + updates.
  • M6 (medium)CAST(version AS INTEGER) is rejected by MySQL (needs SIGNED) and overflows 32-bit on PostgreSQL for the millisecond-timestamp versions already in use. Now dialect-aware (INTEGER/BIGINT/SIGNED).

5. Stored / DOM XSS

  • C5 (critical) — a remote collaborator's iDevice htmlContent was assigned to innerHTML with no sanitization (zero-click JS in every collaborator's session). New public/app/utils/sanitizeHtml.js (DOMPurify-backed, with a fail-safe fallback) sanitizes remote content before innerHTML; DOMPurify was wired into the workarea (it had only been loaded for the equation editor). loadInitScriptIdevice re-attaches legitimate iDevice behavior, so interactivity is preserved.
  • H12 (high) — the File Manager list view interpolated the asset filename raw into innerHTML in the reference-count branch. Now escaped with the existing escapeHtml helper.

6. Resource exhaustion / DoS

  • C9 (critical) — ELP import called unzipSync with no decompressed-size cap (zip bomb → OOM). Now enforces cumulative / per-entry / entry-count caps via fflate's central-directory filter, aborting before inflation, on the main, nested-ELP, and zip-contents paths.
  • H8 (high) — client-controlled awareness availableAssets grew server memory unbounded. Now capped per message, ids validated, distinct entries and pending requests bounded per project, orphaned keys pruned on disconnect; explicit 8MB maxPayloadLength on the Yjs WS route.
  • H9 (high) — abandoned chunked uploads never expired (disk + memory leak). Added per-chunk / total-chunk caps and a TTL sweeper (wired into server startup/shutdown).
  • M5 (medium)upload-session-manager.cleanupExpired() was defined but never scheduled. Now scheduled (with serverPriorityQueue.cleanupStaleSlots), wired into startup/shutdown.
  • L1 (low) — batch upload buffered all files before checking the size cap. Now enforces the cap by declared size before buffering, with incremental-abort fallback.

7. Client-side data loss

  • H7 (high)AssetManager._putToCache only fell back to IndexedDB on scheme errors, so a QuotaExceededError was swallowed and the only in-memory blob copy was then evicted — losing it entirely. Now any cache write failure falls back to IndexedDB and only resolves after a confirmed persistent write; otherwise it re-throws so the caller keeps the blob in memory.

8. Auth secret / EPUB / DB / parser

  • H3 (high) — admin/user/pages/project routes and admin-route-helpers each used a divergent local getJwtSecret (JWT_SECRET || APP_SECRET || <public default>) to verify tokens, while auth.ts signs with API_JWT_SECRET || JWT_SECRET || <default>. With the shipped .env (JWT_SECRET unset) this mismatched and broke admin auth; and if APP_SECRET were unset it fell back to a public in-repo constant, enabling token forgery. All now use the single canonical resolver; APP_SECRET stays only for platform JWTs.
  • H11 (high) — EPUB export used blind global regexes that broke external .html links, mangled prose, and corrupted void elements with > in an attribute. Replaced with a real xmldom parse + XHTML serialize, rewriting only internal page links.
  • M4 (medium)transferOwnership ran 3 writes non-transactionally. Now atomic.
  • H13 (high) — legacy XML import resolved each <reference> via a full-document scan (O(n²); ~21.8s on a crafted file). Now a Map built once per parse (sub-second).

Re-verified, not changed

  • C8 ("Ctrl+S skips assets") — re-verified against the live code: YjsProjectManagerMixin.save() overrides project.save() and routes through SaveManager, which uploads pending assets before markClean. Not a bug; no change.
  • 4 audit findings were rejected by the verifiers as false positives and are not included.

Testing

  • make test-unit: 7280 pass, 0 fail; coverage gate ✓ (all 219 files ≥90% line coverage). Two files I touched near the gate were brought back over 90% with real behavior tests rather than excluded.
  • make test-integration: green (admin integration suite's JWT env aligned with the unified resolver).
  • make test-e2e (Playwright/chromium): green.

Notes for reviewers

  • The dual-table Yjs model (yjs_documents snapshot + yjs_updates) is preserved; this PR makes its two write paths and version schemes consistent and the load path reconstruct from both. Collapsing to a single table is a larger refactor left for follow-up.
  • set_platform_new_ode_browser has a residual (non-exfiltrating) platform-link gap noted during M2; worth a follow-up.

erseco added 11 commits June 4, 2026 10:37
…ix bypass

Introduce src/utils/safe-path.ts as the single source of truth for validating
untrusted path segments/identifiers before building on-disk paths:
- isSafePathSegment / assertSafePathSegment reject separators, traversal
  tokens (. / ..), NUL/control chars, and enforce a charset + length cap.
- isWithinBase / safeJoin assert post-join containment.
- sanitizeFileExtension yields a safe lowercased extension or ''.

Also fix file-helper.isPathSafe which used a naive startsWith(resolvedBase)
check that treated /data/assets-evil as inside /data/assets; now uses a
separator-aware containment check.

Foundation for the path-traversal / arbitrary-file-write fixes that follow.

Refs: C1,C2,C3,H10,H14
…utes

Validate every untrusted identifier that becomes part of an on-disk path with
the shared safe-path helper, and build paths via safeJoin (post-join
containment) instead of raw path.join:

- assets.ts: clientId on upload/finalize/sync/stream + resumableIdentifier on
  chunk upload (C1, H10) — arbitrary file write outside FILES_DIR.
- api/v1/assets.ts: clientId on upload (C1).
- upload-session.ts: per-file clientId in batch (C1).
- export.ts: odeSessionId on export download GET/POST (C2) — arbitrary file
  write + directory read.
- themes.ts: themeName/themeId on bundle, download-zip and metadata endpoints
  (C3, H14) — unauthenticated arbitrary file/dir read.
- idevices.ts: ideviceId on installed-metadata endpoint (H14).

Invalid identifiers now return the route's existing 4xx shape; legitimate
UUIDs/slugs/timestamps still pass. No authentication added to public
theme/idevice resource endpoints (avoids breaking preview/embedding/static).
Tests added for each sink.

Refs: C1,C2,C3,H10,H14
… routes

- project duplicate: require auth and checkProjectAccess on the SOURCE project
  before cloning it; previously any (even unauthenticated) caller could clone
  an arbitrary private project into their account (IDOR, C4).
- project sharing (numeric + uuid): require auth and project access before
  returning owner/collaborator emails; previously unauthenticated and
  enumerable by sequential id (M1).
- games iDevices endpoint: add withJwtAuth + enforceProjectAccess; previously
  unauthenticated, exposing full iDevice content of any project by UUID (H4).
  Verified the editor/preview caller is same-origin and sends the auth cookie;
  exported static packages never call the live endpoint.
- platform set_platform_new_ode: require the requested project's platform_id to
  match the validated JWT cmid before exporting/posting it back (IDOR, M2).

Tests updated to authenticate and added negative 401/403 cases.

Refs: C4,H4,M1,M2
…+ platform callbacks

- New src/utils/ssrf-guard.ts: safeFetch() allows only http(s), resolves the
  host and rejects private/loopback/link-local/CGNAT/unspecified addresses,
  and follows redirects MANUALLY re-validating every hop (DNS + fetch are
  injectable for hermetic tests).
- link-validator.validateLink: route external checks through safeFetch instead
  of fetch(redirect:'follow'); a blocked target is reported as broken without
  leaking the resolved IP (H2).
- project.ts /brokenlinks: delete the duplicated inline validateLink (which had
  its own unguarded fetch) and call the shared SSRF-safe service; require
  authentication on all three brokenlinks endpoints (H1).
- platform-jwt: isAllowedProviderUrl now fails CLOSED when PROVIDER_URLS is
  unset, and returnurls with internal-IP/non-http hosts are rejected (M3).

Refs: H1,H2,M3
- doc-lock: replace the broken mutex (which woke ALL waiters on release, so
  several callers believed they held the lock) with a real FIFO mutex that
  hands off to exactly one waiter per release; release is idempotent (C7).
- queries/yjs.saveFullState: stamp a monotonic Date.now() version instead of the
  hardcoded '1' that was always older than a snapshot version and thus filtered
  out on load (silent loss of every REST API v1 edit, C6); wrap the
  delete-then-insert in a transaction so a failed insert can't wipe state (H6).
- routes/yjs GET: reconstruct from snapshot + updates instead of snapshot-only,
  so a project whose server-side state lives in yjs_updates no longer loads as
  empty/404 (H5).
- queries/yjs version cast is now dialect-aware (INTEGER/BIGINT/SIGNED): bare
  INTEGER overflowed PostgreSQL and is rejected by MySQL for the ms-timestamp
  versions already used by incremental updates (M6).

Refs: C6,C7,H5,H6,M6
…enames (XSS)

- Add public/app/utils/sanitizeHtml.js (DOMPurify-backed, with a fail-safe
  fallback) and route remote/collaborator iDevice htmlContent through it before
  the innerHTML assignment in idevicesEngine, closing zero-click stored XSS
  across collaborators (C5). Load DOMPurify in the workarea template (it was
  only bundled for the equation editor). loadInitScriptIdevice re-attaches
  legitimate iDevice behavior, so interactivity is preserved.
- Escape the asset filename in the File Manager list reference-count branch
  (the one unescaped innerHTML sink), using the class's existing escapeHtml
  helper (H12).

Refs: C5,H12
- ElpxImporter: enforce decompressed-size, per-entry and entry-count caps via
  fflate's central-directory filter (aborts before inflation), on the main,
  nested-ELP and zip-contents paths — closes the zip-bomb OOM (C9).
- asset-coordinator: cap awareness availableAssets per message, validate asset
  ids, bound distinct assets and pendingRequests per project, and prune orphaned
  asset keys on unregisterClient; set an explicit 8MB maxPayloadLength on the
  Yjs WS route (H8).
- assets chunked upload: per-chunk and total-chunk-count caps, plus a TTL
  sweeper (start/stop wired in index.ts) that reaps abandoned uploads from the
  Map and disk (H9).
- upload-session-manager: schedule the previously-orphaned cleanupExpired() and
  serverPriorityQueue.cleanupStaleSlots() (start/stop wired in index.ts) (M5).
- upload-session batch: enforce the size cap by declared size BEFORE buffering,
  with incremental-abort fallback, instead of buffering everything first (L1).

Refs: C9,H8,H9,M5,L1
…viction

AssetManager._putToCache only fell back to IndexedDB when the error message
matched /unsupported|scheme|Failed to execute/. A QuotaExceededError matched
none, so it resolved as success and putAsset/putBlob then evicted the only
in-memory copy — losing the blob entirely (broken on save and reload). Now any
Cache write failure falls back to IndexedDB and only resolves after a confirmed
persistent write (verified by read-back); if neither store succeeds it re-throws
so the caller keeps the blob in memory. QuotaExceededError is detected
explicitly and surfaced as a storage-full warning.

C8 (Ctrl+S skipping assets) was re-verified and is NOT a bug: the
YjsProjectManagerMixin routes project.save() through SaveManager, which uploads
pending assets before markClean.

Refs: H7 (C8 verified not-a-bug)
…, O(n) ref lookup

- JWT (H3): admin/user/pages/project routes and admin-route-helpers all used a
  divergent local getJwtSecret (JWT_SECRET || APP_SECRET || public-default) to
  VERIFY tokens, while auth.ts SIGNS with API_JWT_SECRET || JWT_SECRET || default.
  With the shipped .env (JWT_SECRET unset) this mismatched, breaking admin auth —
  and if APP_SECRET were unset it fell back to a public in-repo constant,
  enabling token forgery. All now use the single canonical resolver from
  routes/auth; APP_SECRET stays only for platform JWTs. Specs aligned to sign
  with the canonical secret.
- EPUB (H11): replace the blind regex HTML->XHTML rewrites (which broke external
  .html links, mangled prose, and corrupted void elements with '>' in an
  attribute) with a real xmldom parse + XHTML serialize, rewriting only internal
  page links via the page filename map.
- DB (M4): wrap transferOwnership's read-verify + 3 writes in a transaction so
  ownership can't end up half-transferred.
- Import (H13): resolve <reference> instances via a Map built once per parse
  instead of a full-document scan per reference (O(n^2) -> O(n)); ~21.8s -> sub-second.

Refs: H3,H11,M4,H13
Add behavior-asserting tests for the security fixes' surrounding code so both
files clear the per-file 90% line-coverage gate: idevices route config
resolution / dedup / CSS url rewrite (89.27% -> 93.35%) and asset-coordinator
pending-request bounds, id validation, and upload-session callback/error paths
(87.29% -> 91.90%).
…resolver

The canonical getJwtSecret (routes/auth) prefers API_JWT_SECRET, which the
loaded .env sets, so the admin integration suite must override API_JWT_SECRET
(not only JWT_SECRET) to match the secret generateTestToken signs with.
@erseco erseco force-pushed the fix/critical-security-audit branch from bf57f4b to af15281 Compare June 4, 2026 09:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant