fix(ipynb): use a dynamic code fence so cells containing backticks don't leak #2170

Open
YellowFoxH4XOR wants to merge 1 commit into microsoft:main from YellowFoxH4XOR:fix/ipynb-code-fence-injection

YellowFoxH4XOR commented Jun 29, 2026

Summary

IpynbConverter wraps every code/raw cell in a hard-coded 3-backtick fence
(```python … ```) without inspecting the cell's contents. When a cell's
source itself contains a ``` line — common in notebooks that demo Markdown,
print fenced strings, or embed heredocs — the inner backticks close the fence
early. The rest of the cell then leaks out of the code block and renders as
prose, corrupting the document.

This PR sizes the fence to the content: it uses a run of backticks one longer
than the longest backtick run inside the cell, per the
CommonMark fenced-code-block rules.
Cells with no backticks are unaffected (still 3 backticks), so output is
unchanged for the common case.
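
The sizing rule described above can be sketched in a few lines. This is a minimal illustration rather than the code in this PR: `fence_for` and `wrap_cell` are hypothetical names, and the 3-backtick minimum is taken from the description above.

```python
import re


def fence_for(source: str, minimum: int = 3) -> str:
    """Return a backtick fence one longer than the longest backtick run in `source`.

    Hypothetical helper for illustration; not the converter's actual function.
    """
    # Collect every maximal run of backticks, wherever it appears in the cell.
    longest = max((len(run) for run in re.findall(r"`+", source)), default=0)
    # Never go below the conventional 3-backtick fence.
    return "`" * max(minimum, longest + 1)


def wrap_cell(source: str, lang: str = "python") -> str:
    """Wrap a cell's source in a fence sized to its content."""
    fence = fence_for(source)
    return f"{fence}{lang}\n{source}\n{fence}"
```

With this sketch, a cell without backticks still gets a 3-backtick fence, while a cell containing a 3-backtick line gets a 4-backtick fence.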

Repro

import io
import json

from markitdown import MarkItDown

# Minimal notebook: a single code cell whose source contains ``` lines.
nb = {"nbformat": 4, "nbformat_minor": 5, "metadata": {}, "cells": [
    {"cell_type": "code", "source": ['print("""\n```\nnot python\n```\n""")'],
     "metadata": {}, "outputs": [], "execution_count": None}]}

# Convert the notebook from an in-memory stream and print the resulting Markdown.
print(MarkItDown().convert_stream(io.BytesIO(json.dumps(nb).encode()),
                                  file_extension=".ipynb").markdown)

Before — the inner ``` closes the block; the code leaks out as prose:

```python
print("""
```
not python
```
""")
```

After — a 4-backtick fence keeps the cell intact:

````python
print("""
```
not python
```
""")
````

Context

This is the same defect jupytext fixed in #712
and quarto in #3179 — the
"longer fence than content" approach is the standard, ecosystem-wide fix.

Note: the fence length counts all backtick runs, including
inline ones that cannot actually close a fence. This is a safe
over-approximation — it occasionally yields a fence one backtick longer than
strictly minimal, which is always valid CommonMark, and in exchange it also
covers indented closing fences (which a line-prefix check would miss).
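
The trade-off in the note can be made concrete with a small comparison. Both functions below are hypothetical sketches written for this illustration, not code from the PR: `fence_from_all_runs` counts every backtick run, while `fence_from_line_starts` models a naive line-prefix check that only looks at lines beginning with a backtick in column 0.

```python
import re


def fence_from_all_runs(source: str) -> str:
    """Size the fence from every backtick run in the cell (the approach in the note)."""
    longest = max((len(run) for run in re.findall(r"`+", source)), default=0)
    return "`" * max(3, longest + 1)


def fence_from_line_starts(source: str) -> str:
    """Naive alternative: only consider lines that begin with backticks at column 0."""
    longest = 0
    for line in source.splitlines():
        match = re.match(r"`+", line)
        if match:
            longest = max(longest, len(match.group()))
    return "`" * max(3, longest + 1)


# An inline run cannot close a fence, so 3 backticks would have been enough here.
inline_cell = "s = '```'"

# A fence line indented by up to three spaces can still close a 3-backtick block.
indented_cell = 'print("""\n  ```\n""")'
```

For `inline_cell`, the all-runs version returns 4 backticks where 3 would do, which is longer than needed but still valid. For `indented_cell`, the line-start version returns only 3 backticks, which the indented inner fence would close early; the all-runs version returns 4.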

Testing

  • Added test_ipynb_code_cell_with_backtick_fence — fails on main, passes with this change.
  • Full module/vector suite shows no new failures (pre-existing failures are all
    missing optional dependencies: docx, llm, speech, exiftool — identical on baseline).
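
The property such a regression test needs to assert can be sketched without markitdown installed. The code below is not the test added in this PR: `fenced_body` is a hypothetical, simplified scanner that applies only the CommonMark closing rule for backtick fences (it ignores tilde fences and other edge cases), and the `before` and `after` strings reproduce the outputs shown in the Repro section.

```python
import re


def fenced_body(markdown: str) -> str:
    """Return the body of the first backtick-fenced block in `markdown`.

    A closing fence must be a line of backticks (indented at most three
    spaces) that is at least as long as the opening fence.
    """
    lines = markdown.split("\n")
    for i, line in enumerate(lines):
        opener = re.match(r" {0,3}(`{3,})[^`]*$", line)
        if not opener:
            continue
        width = len(opener.group(1))
        body = []
        for inner in lines[i + 1:]:
            closer = re.match(r" {0,3}(`{3,})\s*$", inner)
            if closer and len(closer.group(1)) >= width:
                return "\n".join(body)
            body.append(inner)
        return "\n".join(body)  # an unclosed fence runs to the end of the document
    raise ValueError("no fenced block found")


cell = 'print("""\n```\nnot python\n```\n""")'
before = "```python\n" + cell + "\n```"    # hard-coded 3-backtick fence
after = "````python\n" + cell + "\n````"   # fence sized to the content

# With the fixed fence, the block ends at the first inner ``` line.
assert fenced_body(before) == 'print("""'
# With the sized fence, the whole cell stays inside the block.
assert fenced_body(after) == cell
```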

@YellowFoxH4XOR (Author)

@microsoft-github-policy-service agree

…t leak

Code and raw notebook cells were always wrapped in a 3-backtick fence
without inspecting the cell content. A cell whose source itself contains a
``` line (common in notebooks that demo Markdown or print fenced strings)
closed the fence early, leaking the rest of the cell out as prose and
corrupting the document structure.

Wrap cells with a fence longer than the longest backtick run in the
content, per CommonMark. Adds a regression test.

YellowFoxH4XOR force-pushed the fix/ipynb-code-fence-injection branch from 0cd5bc7 to 6971e65 on June 29, 2026, 04:01