Replace Bleach with JustHTML#24986
Conversation
063a96d to
821bf0a
Compare
821bf0a to
79ce7cf
Compare
That could have been dangerous, but I can't find any places where that's problematic. The pattern I was worried about would be XSS through attributes like
That was a weird behavior which I'm happy to see go away - stripping malformed tag we recognize instead of escaping them never made much sense... |
diox
left a comment
There was a problem hiding this comment.
Looks good! Some comments to try to refactor a bit to make it simpler if possible.
| # Markdown wraps paragraphs in <p>, which bleach used to unwrap into newlines. | ||
| # justHTML does not, so this needs to be manually accounted for. | ||
| markdown = re.sub(r'(</\w+>)\s*<p[^>]*>', r'\1\n\n', markdown) |
There was a problem hiding this comment.
Do we care? That seems to add complexity that's going to be annoying later. Is the difference only newlines in the final output ? Can you point to a test that was failing without that bit ?
There was a problem hiding this comment.
These are the tests that were failing:
src/olympia/addons/tests/test_models.py::TestAddonModels::test_blockquote_markdown
src/olympia/addons/tests/test_models.py::TestAddonModels::test_bold_markdown
src/olympia/addons/tests/test_models.py::TestAddonModels::test_code_markdown
src/olympia/addons/tests/test_models.py::TestAddonModels::test_italics_markdown
src/olympia/addons/tests/test_indexers.py::TestAddonIndexer::test_extract_version_and_files
src/olympia/addons/tests/test_models.py::TestAddonModels::test_nested_newline
src/olympia/addons/tests/test_models.py::TestAddonModels::test_newlines
src/olympia/addons/tests/test_models.py::TestAddonModels::test_newlines_xss_script
I don't like it being there, but I was concerned specifically by src/olympia/addons/tests/test_models.py::TestAddonModels::test_newlines, since justHTML will interpret them into single newlines on the second clean. Without this line:

There was a problem hiding this comment.
Looking again, I misinterpreted here. It's because <p> is not in the allowed tags and is stripped; so a possible way to avoid this is adding p to the allowed_tags and updating the tests:
<p>Paragraph one.\nThis should be on the very next line.</p>\n<p>Should be two nl's before this line.</p>\n<p>Should be two nl's before this line.</p>\n<p>Should be two nl's before this line.</p>
Which I believe should be visually identical?
This would involve updating alot more tests than the ones above, though.
There was a problem hiding this comment.
I wouldn't mind allowing the <p>, I think atm add-on descriptions are the only place where we use this, and they are inside a <div> to it would probably be fine to allow <p>. Could you test locally with description from several popular add-ons like https://addons.mozilla.org/en-US/firefox/addon/ublock-origin/ which are using newlines ?
There was a problem hiding this comment.
There does seem to be issues with <p> rather than newlines. Just copy/pasting, with newlines:
and without (allowing/using <p>):
Which seems to notably affect block-level elements. It seems like going down this route would still introduce a workaround somewhere along the line, so getting it right at the root as it currently does does seem like the simplest solution.
Fixes mozilla/addons#15291
Description
Replaces the deprecated bleach library with JustHTML.
Notable behavioural changes:
src/olympia/amo/tests/test_amo_utils.py,src/olympia/translations/tests/test_commands.pysrc/olympia/addons/tests/test_models.pyTesting
Checklist
#ISSUENUMat the top of your PR to an existing open issue in the mozilla/addons repository.