fix(markdown_parser): prevent unclosed <u> tags from pairing via emphasis algorithm #13134
…asis algorithm

When two <u> underline start tags appear without closing </u> tags, they were incorrectly paired as an opener/closer pair through the emphasis algorithm, causing the literal <u> markers and surrounding text to be silently deleted.

Root cause: <u> was tokenized as UnderlineStart with can_close set from right-flanking rules, allowing process_emphasis to match two <u> openers as a pair. Underline is only meant to be closed by an explicit </u> tag (handled by parse_underline), never by another <u>.

Fix:
- Set can_close = false for UnderlineStart delimiters so <u> is never treated as a closing delimiter by process_emphasis
- Add a guard in can_open_for to reject UnderlineStart pairs (defense in depth)

Adds regression tests for the cases from the issue:
- <u><u> should round-trip as literal text
- <u>a<u> should preserve both <u> markers
- a <u>word<u> b should not delete text

Fixes warpdotdev#12863
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: zhenxing.shen.
I'm starting a first review of this pull request. You can view the conversation on Warp. I reviewed this pull request and requested human review from:
Powered by Oz
Overview
This PR updates the inline Markdown delimiter handling so <u> start tags can no longer act as emphasis closers, while preserving explicit underline closing through </u>. It also adds regression coverage for unclosed underline-start sequences that previously dropped literal text.
Concerns
- No blocking correctness, security, or spec-alignment concerns found in the attached diff.
Verdict
Found: 0 critical, 0 important, 0 suggestions
Approve
Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).
Powered by Oz
Summary
When two <u> underline start tags appear without closing </u> tags, they were incorrectly paired as an opener/closer pair through the emphasis algorithm, causing the literal <u> markers and surrounding text to be silently deleted.
Root Cause
<u> was tokenized as UnderlineStart with can_close set from right-flanking rules, allowing process_emphasis to match two <u> openers as a pair. Underline is only meant to be closed by an explicit </u> tag (handled by parse_underline), never by another <u>.
Fix
- Set can_close = false for UnderlineStart delimiters so <u> is never treated as a closing delimiter by process_emphasis
- Add a guard in can_open_for to reject UnderlineStart pairs (defense in depth)
Regression Tests
Added tests covering the cases from the issue:
- <u><u> round-trips as literal text (was: empty line, all text deleted)
- <u>a<u> preserves both <u> markers (was: a, both markers deleted)
- a <u>word<u> b does not delete text (was: a word b, markers deleted)
Fixes #12863
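The root cause and the two-part fix can be illustrated with a toy model. This is only a sketch, not the code in this PR: the Rust language choice, the DelimKind and Delim types, the make_delim and pair_delims helpers, the fixed flag, and the hard-coded flanking values are all assumptions made for illustration. Only the names UnderlineStart, can_close, and can_open_for come from the description above, and their real signatures may differ.

```rust
/// Hypothetical delimiter kinds; the parser's real enum is not shown in this PR.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum DelimKind {
    Star,
    UnderlineStart,
}

/// Hypothetical delimiter record carrying the opener/closer flags.
#[derive(Clone, Copy, Debug)]
struct Delim {
    kind: DelimKind,
    can_open: bool,
    can_close: bool,
}

/// Builds a delimiter from (assumed) flanking results.
/// With `fixed == true`, an UnderlineStart is never allowed to close.
fn make_delim(kind: DelimKind, left_flanking: bool, right_flanking: bool, fixed: bool) -> Delim {
    let can_close = match kind {
        DelimKind::UnderlineStart if fixed => false,
        _ => right_flanking,
    };
    Delim { kind, can_open: left_flanking, can_close }
}

/// Guard deciding whether `opener` may pair with `closer`.
/// With `fixed == true`, two UnderlineStart delimiters are rejected outright
/// (the "defense in depth" check), regardless of their flags.
fn can_open_for(opener: &Delim, closer: &Delim, fixed: bool) -> bool {
    if fixed
        && opener.kind == DelimKind::UnderlineStart
        && closer.kind == DelimKind::UnderlineStart
    {
        return false;
    }
    opener.can_open && closer.can_close && opener.kind == closer.kind
}

/// Greatly simplified stand-in for emphasis pairing: each potential closer
/// is matched with the nearest earlier unused opener that the guard accepts.
/// Returns (opener_index, closer_index) pairs.
fn pair_delims(delims: &[Delim], fixed: bool) -> Vec<(usize, usize)> {
    let mut used = vec![false; delims.len()];
    let mut pairs = Vec::new();
    for c in 0..delims.len() {
        if !delims[c].can_close {
            continue;
        }
        for o in (0..c).rev() {
            if !used[o] && can_open_for(&delims[o], &delims[c], fixed) {
                used[o] = true;
                used[c] = true;
                pairs.push((o, c));
                break;
            }
        }
    }
    pairs
}
```

For the <u>a<u> case, assume the first <u> is left-flanking only and the second is right-flanking only. With fixed set to false, pair_delims reports the two markers as a pair, which corresponds to the deletion described in the summary. With fixed set to true, the second marker is never a closer, so no pair is formed and both markers can survive as literal text. Ordinary emphasis delimiters such as Star are unaffected in this model.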