Skip to content

Preserve DOCX run boundaries on PM round trip#832

Merged
jedrazb merged 5 commits into
eigenpal:mainfrom
aldrinjenson:aldrinjenson/codex/preserve-run-boundaries
Jun 16, 2026
Merged

Preserve DOCX run boundaries on PM round trip#832
jedrazb merged 5 commits into
eigenpal:mainfrom
aldrinjenson:aldrinjenson/codex/preserve-run-boundaries

Conversation

@aldrinjenson

@aldrinjenson aldrinjenson commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Preserve source DOCX run boundaries through no-op toProseDoc -> fromProseDoc round trips.
  • Restore empty source runs and split ProseMirror-merged same-mark text only when the paragraph text and mark stream still match the imported source.
  • Add regression coverage, a patch changeset, and regenerated API snapshots for the new internal paragraph attr.

Testing

  • bun run format
  • bun test packages/core/src/prosemirror/conversion
  • bun run typecheck
  • bun run check:parity
  • bun run build:packages
  • bun run api:check

Refs #830


View with Codesmith Autofix with Codesmith
Need help on this PR? Tag /codesmith with what you need. Autofix is disabled.

@vercel

vercel Bot commented Jun 16, 2026

Copy link
Copy Markdown

@aldrinjenson is attempting to deploy a commit to the EigenPal Team on Vercel.

A member of the Team first needs to authorize it.

@eigenpal-release-pal

eigenpal-release-pal Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

All contributors have signed the CLA ✍️ ✅

Posted by the CLA bot.

@aldrinjenson

Copy link
Copy Markdown
Contributor Author

Update after corpus verification: the original version only restored boundaries after toProseDoc, but the real-doc gap was earlier. This branch now also stops parseParagraph from applying consolidateParagraphContent during DOCX parse, so adjacent same-rPr <w:r> elements and leading empty runs survive into the document model before the PM round trip.

Sanity check on the #830 attachment: raw document.xml has 139 body paragraphs, parsed model has 139, and the paragraphs with 7 raw <w:r> runs now parse as 7 model runs.

Additional validation after the parser fix:

  • bun test packages/core/src/docx packages/core/src/prosemirror/conversion
  • bun run typecheck
  • bun run check:parity
  • bun run api:check

@vercel

vercel Bot commented Jun 16, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
docx-editor Ready Ready Preview, Comment Jun 16, 2026 3:02pm

Request Review

@jedrazb jedrazb left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for taking this on, the approach is solid. i like that it fails closed, and i couldn't get it to corrupt or duplicate content. requesting changes for a few things before merge:

  1. needs a rebase (behind main). since the parser de-consolidation is pipeline-wide, not just round-trip, please run the full playwright suite after rebasing, not only the conversion unit tests.

the rest are inline. nice work on the changeset, the regenerated api snapshots, and the stale-after-edit coverage.

Comment thread packages/core/src/prosemirror/conversion/fromProseDoc/paragraph.ts
Comment thread packages/core/src/prosemirror/conversion/fromProseDoc/paragraph.ts Outdated
Comment thread packages/core/src/docx/paragraphParser.ts
@aldrinjenson

Copy link
Copy Markdown
Contributor Author

Pushed e97ddd5 addressing the review comments.

Validation run:

  • bun test packages/core/src/prosemirror/conversion/__tests__/run-boundary-roundtrip.test.ts passed
  • bun test packages/core/src/docx packages/core/src/prosemirror/conversion passed
  • bun run typecheck passed
  • bun run check:parity passed
  • bun run api:extract && bun run api:check passed
  • pre-commit repeated typecheck/parity/API extraction successfully
  • bun run test:e2e was attempted as requested, but the full Playwright suite is not green in this checkout: 793 passed, 5 skipped, 213 failed. The failures are broad UI/e2e issues (toolbar/dropdown selector timeouts, visual snapshot differences, table toolbar actions, find/replace, scroll-to-page, etc.) rather than localized failures from this run-boundary change.

@jedrazb jedrazb merged commit a89af59 into eigenpal:main Jun 16, 2026
8 checks passed
jedrazb added a commit that referenced this pull request Jun 16, 2026
After the leading-page-break (#831) and run-boundary-preservation (#832)
fixes both landed on main, the empty leading-break run is kept distinct, so
the lastRenderedPageBreak marker serializes onto its own w:r instead of the
text run. The output is valid OOXML; relax the over-specified assertion to
check the marker and the text independently. Fixes red main.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
jedrazb added a commit that referenced this pull request Jun 16, 2026
After the leading-page-break (#831) and run-boundary-preservation (#832)
fixes both landed on main, the empty leading-break run is kept distinct, so
the lastRenderedPageBreak marker serializes onto its own w:r instead of the
text run. The output is valid OOXML; relax the over-specified assertion to
check the marker and the text independently. Fixes red main.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants