Skip to content

Replace Bleach with JustHTML#24986

Open
chrstinalin wants to merge 3 commits into
mozilla:masterfrom
chrstinalin:#15291-replace-bleach
Open

Replace Bleach with JustHTML#24986
chrstinalin wants to merge 3 commits into
mozilla:masterfrom
chrstinalin:#15291-replace-bleach

Conversation

@chrstinalin

@chrstinalin chrstinalin commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Fixes mozilla/addons#15291

Description

Replaces the deprecated bleach library with JustHTML.

Notable behavioural changes:

  1. Quotes on their own are not necessarily escaped. See: src/olympia/amo/tests/test_amo_utils.py, src/olympia/translations/tests/test_commands.py
  2. Malformed tags are escaped rather than stripped. See: src/olympia/addons/tests/test_models.py

Testing

  1. Follow the testing section of Support Markdown in Add-on Listing Fields #22956 to ensure no regressions around Markdown have occurred.
image

Checklist

  • Add #ISSUENUM at the top of your PR to an existing open issue in the mozilla/addons repository.
  • Successfully verified the change locally.
  • The change is covered by automated tests, or otherwise indicated why doing so is unnecessary/impossible.

@chrstinalin chrstinalin marked this pull request as draft June 4, 2026 14:34
Comment thread requirements/prod.txt
@chrstinalin chrstinalin force-pushed the #15291-replace-bleach branch 8 times, most recently from 063a96d to 821bf0a Compare June 8, 2026 21:10
@chrstinalin chrstinalin force-pushed the #15291-replace-bleach branch from 821bf0a to 79ce7cf Compare June 8, 2026 21:23
@chrstinalin chrstinalin marked this pull request as ready for review June 8, 2026 21:53
@chrstinalin chrstinalin requested a review from diox June 8, 2026 21:53
@diox

diox commented Jun 9, 2026

Copy link
Copy Markdown
Member

Quotes on their own are not necessarily escaped

That could have been dangerous, but I can't find any places where that's problematic. The pattern I was worried about would be XSS through attributes like " onmouseover="alert(5) when inserted into something like title="{{ clean_me(dangerous) }}" but the code where this behavior has changed is linkify_* and escape_all, and those are meant to return HTMLified content anyway, so we wouldn't use that inside an attribute value.

Malformed tags are escaped rather than stripped.

That was a weird behavior which I'm happy to see go away - stripping malformed tag we recognize instead of escaping them never made much sense...

@diox diox left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Some comments to try to refactor a bit to make it simpler if possible.

Comment thread src/olympia/translations/models.py Outdated
Comment on lines +293 to +295
# Markdown wraps paragraphs in <p>, which bleach used to unwrap into newlines.
# justHTML does not, so this needs to be manually accounted for.
markdown = re.sub(r'(</\w+>)\s*<p[^>]*>', r'\1\n\n', markdown)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we care? That seems to add complexity that's going to be annoying later. Is the difference only newlines in the final output ? Can you point to a test that was failing without that bit ?

@chrstinalin chrstinalin Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the tests that were failing:

src/olympia/addons/tests/test_models.py::TestAddonModels::test_blockquote_markdown
src/olympia/addons/tests/test_models.py::TestAddonModels::test_bold_markdown
src/olympia/addons/tests/test_models.py::TestAddonModels::test_code_markdown
src/olympia/addons/tests/test_models.py::TestAddonModels::test_italics_markdown
src/olympia/addons/tests/test_indexers.py::TestAddonIndexer::test_extract_version_and_files
src/olympia/addons/tests/test_models.py::TestAddonModels::test_nested_newline
src/olympia/addons/tests/test_models.py::TestAddonModels::test_newlines
src/olympia/addons/tests/test_models.py::TestAddonModels::test_newlines_xss_script

I don't like it being there, but I was concerned specifically by src/olympia/addons/tests/test_models.py::TestAddonModels::test_newlines, since justHTML will interpret them into single newlines on the second clean. Without this line:
image

@chrstinalin chrstinalin Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking again, I misinterpreted here. It's because <p> is not in the allowed tags and is stripped; so a possible way to avoid this is adding p to the allowed_tags and updating the tests:

<p>Paragraph one.\nThis should be on the very next line.</p>\n<p>Should be two nl's before this line.</p>\n<p>Should be two nl's before this line.</p>\n<p>Should be two nl's before this line.</p>

Which I believe should be visually identical?

This would involve updating alot more tests than the ones above, though.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't mind allowing the <p>, I think atm add-on descriptions are the only place where we use this, and they are inside a <div> to it would probably be fine to allow <p>. Could you test locally with description from several popular add-ons like https://addons.mozilla.org/en-US/firefox/addon/ublock-origin/ which are using newlines ?

@chrstinalin chrstinalin Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There does seem to be issues with <p> rather than newlines. Just copy/pasting, with newlines:

image

and without (allowing/using <p>):

image

Which seems to notably affect block-level elements. It seems like going down this route would still introduce a workaround somewhere along the line, so getting it right at the root as it currently does does seem like the simplest solution.

Comment thread src/olympia/translations/models.py Outdated
Comment thread src/olympia/translations/models.py Outdated
Comment thread src/olympia/translations/models.py
Comment thread src/olympia/devhub/feeds.py Outdated
@chrstinalin chrstinalin requested a review from diox June 9, 2026 13:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Task]: Replace bleach

2 participants