EPUB Content Sanitization and Sandbox Hardening#440

Open
d-oit wants to merge 3 commits into main from fix/epub-sanitization-hardening-10893744634142080706

Conversation

@d-oit
Owner

@d-oit d-oit commented Jun 6, 2026

Implemented a multi-layered security enhancement for the EPUB reader:

  1. Content Sanitization: Added a DOMPurify-backed sanitizer for EPUB documents that strips dangerous tags (script, iframe, object) while preserving the head tags needed for styling and metadata (link, style, meta).
  2. Iframe Hardening: Removed allow-scripts from the rendition iframe's sandbox attribute, preventing any potential script execution even if sanitization is bypassed.
  3. Backend Defense: Updated the Content-Security-Policy for EPUB assets in the Cloudflare Worker to remove allow-scripts from the sandbox directive.
  4. Testing: Added comprehensive unit tests for the new sanitizer and updated existing loader and hook tests to verify the secure configuration.
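The iframe hardening in item 2 amounts to dropping one token from the sandbox token list. A minimal sketch of that step, assuming a hypothetical helper name (`sketchDropAllowScripts`) and example token lists that are not taken from this PR's diff:

```typescript
// Hypothetical helper: the name and the token lists below are assumptions
// for illustration, not code from this PR.
function sketchDropAllowScripts(sandbox: string): string {
  return sandbox
    .split(/\s+/)
    // Sandbox tokens are ASCII case-insensitive, so compare in lower case.
    .filter((token) => token !== "" && token.toLowerCase() !== "allow-scripts")
    .join(" ");
}

// Example: a permissive token list becomes script-free.
const sketchHardened = sketchDropAllowScripts("allow-same-origin allow-scripts");
// sketchHardened === "allow-same-origin"
```

If the result is an empty string, the iframe still carries every sandbox restriction, since an empty `sandbox` attribute enables all of them.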

PR created automatically by Jules for task 10893744634142080706 started by @d-oit

- Implement `sanitizeEpubDocument` and `createEpubSanitizerHook` using DOMPurify
- Integrate sanitization hook into `EpubLoader` and `useReaderEpub`
- Remove `allow-scripts` from reader iframe sandbox in web app
- Remove `allow-scripts` from CSP sandbox directive in worker file server
- Add unit tests for document sanitization
- Update existing tests to match hardened sandbox configuration
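For the worker-side change, the same idea applies to the `sandbox` directive inside the Content-Security-Policy header value. The following is a rough sketch only: `sketchHardenCsp` and the example policy string are assumptions, and the worker's real policy is not shown in this conversation.

```typescript
// Hypothetical helper: rewrites only the `sandbox` directive of a CSP
// header value and leaves every other directive untouched.
function sketchHardenCsp(policy: string): string {
  return policy
    .split(";")
    .map((directive) => directive.trim())
    .filter((directive) => directive !== "")
    .map((directive) => {
      const [name, ...values] = directive.split(/\s+/);
      if (name === undefined || name.toLowerCase() !== "sandbox") {
        return directive;
      }
      // Drop the token that would re-enable script execution.
      const kept = values.filter((v) => v.toLowerCase() !== "allow-scripts");
      return ["sandbox", ...kept].join(" ");
    })
    .join("; ");
}

// Example policy (assumed, not the worker's actual header).
const sketchPolicy = sketchHardenCsp(
  "default-src 'none'; sandbox allow-same-origin allow-scripts",
);
// sketchPolicy === "default-src 'none'; sandbox allow-same-origin"
```

Keeping the header-level sandbox in step with the iframe attribute means the asset stays script-free even when it is loaded outside the reader's iframe.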

This change addresses a critical security item in AGENTS.md by ensuring
all EPUB content is sanitized before rendering and enforcing a
script-free environment via browser-level sandbox controls.

Co-authored-by: d-oit <6849456+d-oit@users.noreply.github.com>
@google-labs-jules
Contributor

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Jun 6, 2026

Deploying do-epub-studio with Cloudflare Pages

Latest commit: 3b01794
Status: ✅  Deploy successful!
Preview URL: https://e1e38dff.do-epub-studio.pages.dev
Branch Preview URL: https://fix-epub-sanitization-harden.do-epub-studio.pages.dev

@codacy-production
Contributor

codacy-production Bot commented Jun 6, 2026

Not up to standards ⛔

🔴 Issues 3 high

Alerts:
⚠ 3 issues (≤ 0 issues of at least minor severity)

Results:
3 new issues

| Category | Results |
| --- | --- |
| Security | 3 high |

🟢 Metrics 21 complexity · 2 duplication

| Metric | Results |
| --- | --- |
| Complexity | 21 |
| Duplication | 2 |

@github-actions

github-actions Bot commented Jun 6, 2026

🚀 Performance Report

⚡ Startup & Interaction

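| Metric | Value (ms) | Limit (ms) | Trend | Status |
| --- | --- | --- | --- | --- |
| First Contentful Paint | 180.00 | 1500 | - | |
| Chapter Switch Latency | 0.00 | 300 | - | |
| Offline Rehydrate Time | 214.00 | 800 | - | |
| DOM Interactive | 21.80 | - | - | - |
| Load Event End | 119.70 | - | - | - |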

🛠️ CI & Workflow

| Metric | Value | Limit | Status |
| --- | --- | --- | --- |
| Total CI Duration | 5.27 min | 15.00 min | |
| Pnpm Cache Hit | Hit | - | |

d-oit and others added 2 commits June 6, 2026 10:20
…idempotency

Addresses the two Codacy high-severity findings on PR #440:

1. Security: replace FORBID_TAGS-only with an explicit ALLOWED_TAGS
   allowlist (EPUB_BODY/HEAD/STRUCTURAL_TAGS). The denylist approach
   is a known foot-gun for untrusted HTML because a future DOMPurify
   version that ships a new default-permitted tag would silently let
   it through. The new allowlist also drops form/input/button/select/
   textarea, blocking phishing forms embedded in EPUBs.

2. ErrorProne: replace WHOLE_DOCUMENT=true + IN_PLACE=true with a
   deterministic three-pass implementation:
     (a) DOMPurify allowlist on a clone (RETURN_DOM:true)
     (b) live documentElement.replaceChildren() with sanitized children
     (c) sanitizeDom() for href-scheme + event-attr enforcement
   This avoids the WHOLE_DOCUMENT mutation foot-gun and is verified to be
   idempotent: a new 'hook is safe to invoke multiple times' test
   asserts the DOM is identical after N invocations.

Also added: structural tags (html/head/body) so DOMPurify no longer
drops the document wrapper, and 7 new sanitizer tests covering forms,
unknown tags, event-handler stripping, idempotency, empty input, and
null-document defensive paths.

The double registration in useReaderEpub.ts vs epub-loader.ts is
intentional and now documented: each creates its own rendition, and
the hook is idempotent even if the renditions are ever unified.
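The allowlist, href-scheme, event-attribute, and idempotency points above can be illustrated without a browser. This is a sketch under stated assumptions: it does not call DOMPurify, and `SketchElement`, the tag and scheme sets, and every `sketch*` name are made up for illustration; the PR's actual EPUB_BODY/HEAD/STRUCTURAL_TAGS lists are not reproduced here.

```typescript
// Simplified element model (an assumption; the PR operates on the real DOM).
interface SketchElement {
  tag: string;
  attrs: Record<string, string>;
  children: SketchElement[];
}

// Illustrative allowlist and scheme list, not the PR's real ones.
const SKETCH_ALLOWED_TAGS = new Set(["html", "head", "body", "p", "a", "link", "style", "meta"]);
const SKETCH_SAFE_SCHEMES = new Set(["http", "https", "mailto"]);

function sketchIsSafeHref(href: string): boolean {
  // Strip whitespace and control characters first, so that
  // "java\tscript:" cannot hide its scheme.
  const compact = href.replace(/[\u0000-\u0020]/g, "");
  const scheme = /^[a-zA-Z][a-zA-Z0-9+.-]*(?=:)/.exec(compact);
  // No scheme means a relative reference inside the book.
  return scheme === null || SKETCH_SAFE_SCHEMES.has(scheme[0].toLowerCase());
}

// Returns a sanitized copy, or null when the tag is not on the allowlist.
function sketchSanitize(node: SketchElement): SketchElement | null {
  const tag = node.tag.toLowerCase();
  if (!SKETCH_ALLOWED_TAGS.has(tag)) return null; // unknown tags, forms, scripts
  const attrs: Record<string, string> = {};
  for (const [name, value] of Object.entries(node.attrs)) {
    const lower = name.toLowerCase();
    if (lower.startsWith("on")) continue; // event-handler attributes
    if (lower === "href" && !sketchIsSafeHref(value)) continue;
    attrs[lower] = value;
  }
  const children = node.children
    .map((child) => sketchSanitize(child))
    .filter((child): child is SketchElement => child !== null);
  return { tag, attrs, children };
}

// Idempotency check: sanitizing already-sanitized output changes nothing.
function sketchIsIdempotentOn(input: SketchElement): boolean {
  const once = sketchSanitize(input);
  if (once === null) return true;
  return JSON.stringify(sketchSanitize(once)) === JSON.stringify(once);
}
```

Because an allowlist rejects anything it does not name, a tag that a future parser or library version starts permitting by default is still dropped here, which is the property the denylist approach lacked.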
…plementation

- Implement `sanitizeEpubDocument` with an explicit `ALLOWED_TAGS` allowlist
- Address Codacy findings by replacing denylist-only approach
- Use a deterministic three-pass implementation to avoid mutation foot-guns:
  (a) DOMPurify on a clone with `RETURN_DOM: true`
  (b) Sync attributes and children back to live document
  (c) Post-process with `sanitizeDom` for scheme validation
- Remove phishing surface by stripping `form`, `input`, `button`, etc.
- Preserve structural tags (`html`, `head`, `body`) and head metadata
- Add unit tests for phishing prevention, structural integrity, and idempotency
- Harden reader sandbox and CSP by removing `allow-scripts`

Addresses high-severity security debt in AGENTS.md.

Co-authored-by: d-oit <6849456+d-oit@users.noreply.github.com>
d-oit added a commit that referenced this pull request Jun 6, 2026
Plan 068 orchestrates resolution of the 15 open issues (#439-#454) and
the codacy-flagged PR #440 via a hybrid+swarm strategy. Phase 0 commits
the in-tree fixes; Phase 1 hardens PR #440 sanitizer; Phases 2-3 fan
out DX scaffolding/hygiene work; Phase 4 synthesizes the ADR and PRs.