EPUB Content Sanitization and Sandbox Hardening#440
Conversation
- Implement `sanitizeEpubDocument` and `createEpubSanitizerHook` using DOMPurify - Integrate sanitization hook into `EpubLoader` and `useReaderEpub` - Remove `allow-scripts` from reader iframe sandbox in web app - Remove `allow-scripts` from CSP sandbox directive in worker file server - Add unit tests for document sanitization - Update existing tests to match hardened sandbox configuration This change addresses a critical security item in AGENTS.md by ensuring all EPUB content is sanitized before rendering and enforcing a script-free environment via browser-level sandbox controls. Co-authored-by: d-oit <6849456+d-oit@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
Deploying do-epub-studio with
|
| Latest commit: |
3b01794
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://e1e38dff.do-epub-studio.pages.dev |
| Branch Preview URL: | https://fix-epub-sanitization-harden.do-epub-studio.pages.dev |
Not up to standards ⛔🔴 Issues
|
| Category | Results |
|---|---|
| Security | 3 high |
🟢 Metrics 21 complexity · 2 duplication
Metric Results Complexity 21 Duplication 2
NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.
🚀 Performance Report⚡ Startup & Interaction
🛠️ CI & Workflow
|
…idempotency Addresses the two Codacy high-severity findings on PR #440: 1. Security: replace FORBID_TAGS-only with an explicit ALLOWED_TAGS allowlist (EPUB_BODY/HEAD/STRUCTURAL_TAGS). The denylist approach is a known foot-gun for untrusted HTML because a future DOMPurify version that ships a new default-permitted tag would silently let it through. The new allowlist also drops form/input/button/select/ textarea, blocking phishing forms embedded in EPUBs. 2. ErrorProne: replace WHOLE_DOCUMENT=true + IN_PLACE=true with a deterministic three-pass implementation: (a) DOMPurify allowlist on a clone (RETURN_DOM:true) (b) live documentElement.replaceChildren() with sanitized children (c) sanitizeDom() for href-scheme + event-attr enforcement This avoids the WHOLE_DOCUMENT mutation foot-gun and is provably idempotent: a new 'hook is safe to invoke multiple times' test asserts the DOM is identical after N invocations. Also added: structural tags (html/head/body) so DOMPurify no longer drops the document wrapper, and 7 new sanitizer tests covering forms, unknown tags, event-handler stripping, idempotency, empty input, and null-document defensive paths. The double registration in useReaderEpub.ts vs epub-loader.ts is intentional and now documented: each creates its own rendition, and the hook is idempotent even if the renditions are ever unified.
…plementation - Implement `sanitizeEpubDocument` with an explicit `ALLOWED_TAGS` allowlist - Address Codacy findings by replacing denylist-only approach - Use a deterministic three-pass implementation to avoid mutation foot-guns: (a) DOMPurify on a clone with `RETURN_DOM: true` (b) Sync attributes and children back to live document (c) Post-process with `sanitizeDom` for scheme validation - Remove phishing surface by stripping `form`, `input`, `button`, etc. - Preserve structural tags (`html`, `head`, `body`) and head metadata - Add unit tests for phishing prevention, structural integrity, and idempotency - Harden reader sandbox and CSP by removing `allow-scripts` Addresses high-severity security debt in AGENTS.md. Co-authored-by: d-oit <6849456+d-oit@users.noreply.github.com>
Implemented a multi-layered security enhancement for the EPUB reader:
DOMPurify-backed sanitizer for EPUB documents that strips dangerous tags (script, iframe, object) while preserving necessary styling tags (link, style, meta).allow-scriptsfrom the rendition iframe's sandbox attribute, preventing any potential script execution even if sanitization is bypassed.allow-scriptsfrom the sandbox directive.PR created automatically by Jules for task 10893744634142080706 started by @d-oit