fix(docx): round-trip TOC fields and point comments by sathinator-afk · Pull Request #837 · eigenpal/docx-editor

sathinator-afk · 2026-06-16T08:21:26Z

What this fixes

Two Tier-1 .docx open→save round-trip losses, each with a regression test. Found while building a round-trip fidelity bake-off against @eigenpal/docx-editor-core.

1. Cross-paragraph complex fields (Table of Contents) were dropped

parseParagraphContents runs the complex-field state machine per paragraph and only finalizes a field when it sees the matching w:fldChar end in the same paragraph. A real Word TOC (and INDEX, a multi-paragraph TA) has its begin/instrText/separate in the first entry paragraph and its end in a later one, so the field never closed and its consumed runs were swallowed — the begin + instruction were silently lost. This regressed the documented fields.toc roundTrip: 'full' guarantee (docs/site/data/word-features.ts).

Fix: buffer the runs an open field consumes and, if the paragraph ends with the field still open, splice them back at the field's start index — preserving order vs. in-paragraph non-run content (e.g. the first TOC entry's hyperlink). Each run already carries its fieldChar/instrText content, so the begin + instruction round-trip verbatim and Word re-stitches the field across paragraphs. Touches only paragraphParser/content.ts.

2. Range-less "point" comments were dropped

A comment can be anchored by a bare w:commentReference with no commentRangeStart/End. The model had no commentReference node, so the reference run parsed to empty content and was dropped by the run consolidator; references were only emitted as a side-effect of commentRangeEnd. A point comment therefore had no representation, and a two-comment document round-tripped to one.

Fix: add a first-class CommentReference paragraph-content node — parse every w:commentReference (wrapped in a run or bare) and serialize it explicitly, making it the single source of truth for every reference (and stop the double-emit from commentRangeEnd). Ranged and point comments now both survive with exactly one reference each.

Tests & verification

New: toc-field-roundtrip.test.ts (incl. the begin-runs-before-first-entry-hyperlink ordering), comment-reference-roundtrip.test.ts.
Full packages/core docx suite green (325 tests, 0 fail); bun run typecheck clean.
API report refreshed for the new public CommentReference export.

Notes

Authored with AI assistance (Claude Code); I'll sign the CLA on this PR.
Happy to split into two PRs (TOC; point comments) if you'd prefer — they share content.ts but are independent.

🤖 Generated with Claude Code

^{Need help on this PR? Tag /codesmith with what you need. Autofix is disabled.}

Two Tier-1 open->save fidelity fixes found via a docx round-trip bake-off: 1. Cross-paragraph complex fields (TOC, INDEX, multi-paragraph TA) were dropped: the per-paragraph field state machine only finalizes a field when its w:fldChar end is in the same paragraph, so an unterminated field's begin/instrText/separate runs were swallowed. Buffer those runs and, if the paragraph ends with the field still open, splice them back at the field's start index (preserving order vs. in-paragraph non-run content such as the first TOC entry's hyperlink). Restores the documented fields.toc roundTrip:'full' guarantee. 2. Range-less 'point' comments (bare w:commentReference, no commentRangeStart/End) were dropped: the model had no commentReference node, so the reference run parsed empty and was consolidated away, and references were only emitted as a side-effect of commentRangeEnd. Add a first-class CommentReference node (parse + serialize) as the single source of truth for every reference. Adds toc-field-roundtrip + comment-reference-roundtrip tests. Full packages/core docx suite green (325 tests), typecheck clean. Pre-PR: run 'bun run api:extract' to refresh the API report for the new CommentReference export. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The new public CommentReference type (point-comment round-trip fix) adds a member to ParagraphContent; regenerate the API Extractor report so api:check passes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

vercel · 2026-06-16T08:21:32Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
docx-editor	Ready	Preview, Comment	Jun 16, 2026 10:09pm

eigenpal-release-pal · 2026-06-16T08:21:43Z

All contributors have signed the CLA ✍️ ✅

_{Posted by the CLA bot.}

sathinator-afk · 2026-06-16T08:30:48Z

I have read the CLA Document and I hereby sign the CLA

jedrazb

Thanks for this — the TOC / cross-paragraph field fix (part 1) is solid: I traced the buffer-and-splice and the begin-run-before-hyperlink ordering and it round-trips correctly. Requesting changes on the point-comment half (part 2) because, as written, it regresses comment fidelity on the editor's own save path.

The regression

This PR removes the <w:commentReference> run from the commentRangeEnd serializer case and makes it depend on a commentReference content node. That node is produced only by the DOCX parser — never by the editor's save pipeline:

Editor comments are ProseMirror marks (commentOps.ts addMark(... schema.marks.comment ...)), not content nodes.
On save, insertCommentRanges (packages/core/src/prosemirror/conversion/fromProseDoc/paragraph.ts) emits only commentRangeStart / commentRangeEnd — there is zero commentReference producer anywhere under packages/core/src/prosemirror/. The reply / tracked-change paths (injectReplyRangeMarkers.ts) emit only range markers too.
The import side drops it as well: toProseDoc/paragraph.ts has no commentReference case, so a parsed reference node is silently discarded before it reaches PM.

Net effect: any document opened and saved through the editor — and every comment created in the editor — loses all <w:commentReference> runs. Imported range-less point comments disappear entirely on re-save. Per ECMA-376 the reference run is the comment anchor, so this breaks the documented fields/comments roundTrip: 'full' guarantee (docs/site/data/word-features.ts).

Verified empirically, not just by reading: building a PM doc with a comment mark and running the real fromProseDoc + serializer with this diff applied yields commentRangeStart + commentRangeEnd and zero commentReference runs (pre-PR: one). A full toProseDoc → fromProseDoc → serialize round-trip of a 2-comment doc drops both references and loses the point comment.

What's needed before merge

Emit the node on the PM-save path: in insertCommentRanges (and injectReplyRangeMarkers / the TC variant), push { type: 'commentReference', id } immediately after each commentRangeEnd.
Decide commentReference handling in toProseDoc/paragraph.ts explicitly (round-trip it or intentionally drop it — right now it's an unhandled silent drop).
Add a regression test that goes through ProseMirror (toProseDoc → fromProseDoc → serialize), not just parseDocx → repackDocx — that's why the current suite stays green.

The TOC fix is independent and mergeable on its own; splitting it out (as you offered) and holding the comment half for the PM-path fix is the clean path. (Also still blocked on CLA.)

jedrazb · 2026-06-16T08:40:32Z

    case 'commentRangeEnd':
-      return (
-        `<w:commentRangeEnd w:id="${content.id}"/>` +
-        `<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="${content.id}"/></w:r>`
-      );
+      // Emit only the end marker. The reference run is emitted from its own
+      // 'commentReference' node (preserved at parse time), so every comment —
+      // ranged or point — round-trips its reference exactly once.
+      return `<w:commentRangeEnd w:id="${content.id}"/>`;
+    case 'commentReference':
+      return `<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="${content.id}"/></w:r>`;


This split is the root of the regression. Moving the <w:commentReference> run out of the commentRangeEnd case is only safe if every code path that produces a commentRangeEnd also produces a commentReference node. The DOCX parser now does — but the editor's PM-save path does not.

insertCommentRanges in fromProseDoc/paragraph.ts emits commentRangeStart/commentRangeEnd from comment marks and never a commentReference node (grep confirms no commentReference producer anywhere under prosemirror/). So after this change, any comment saved through the editor serializes with no reference run at all.

Empirically reproduced: PM doc with a comment mark → fromProseDoc → serialize with this diff = commentRangeStart + commentRangeEnd, zero commentReference (was 1 pre-PR).

Fix: push { type: 'commentReference', id } right after each commentRangeEnd in insertCommentRanges and in injectReplyRangeMarkers / the TC variant, and add a commentReference case to toProseDoc/paragraph.ts.

jedrazb · 2026-06-16T08:40:32Z

+    const doc = await parseDocx(await buildCommentsDocx(), { preloadFonts: false });
+    const out = new Uint8Array(await repackDocx(doc));


This test exercises parseDocx → repackDocx (raw Document model → serializer), which preserves the parser-produced commentReference node — so it passes even though the editor save path regresses. The bug lives in the ProseMirror round-trip, which this never touches. Please add a sibling test that goes toProseDoc → fromProseDoc → serialize and asserts exactly one <w:commentReference> per comment (and that a range-less point comment survives).

Making `commentReference` the single source of truth for the reference run regressed comment fidelity on the editor's own save path: editor comments are ProseMirror `comment` marks, and `insertCommentRanges` (fromProseDoc) only synthesized `commentRangeStart`/`End` from them — nothing produced the `commentReference` node, so the serializer (which now emits the run only from that node) dropped every `<w:commentReference>` on save. Any doc opened+saved through the editor, and every editor-created comment, lost its anchor. Fix, per review: - fromProseDoc `insertCommentRanges`: emit a `commentReference` node right after each `commentRangeEnd` (both the mid-paragraph close and the end-of-paragraph close), so the editor save path round-trips the anchor. - docx `injectReplyRangeMarkers` + `injectTCReplyRangeMarkers`: emit a parallel `commentReference` for every reply / tracked-change reply too. - toProseDoc: handle `commentReference` explicitly — intentionally NOT converted to a PM node (the mark is the source of truth; it's regenerated on save, so keeping it would double-emit). Range-less "point" comments can't be a zero-width mark and are dropped on the editor path (still lossless via parseDocx→repackDocx). - New regression test goes through ProseMirror (toProseDoc → fromProseDoc → serialize), not just parseDocx → repackDocx, and asserts exactly one reference survives a save→load→save cycle (no loss, no accumulation). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

sathinator-afk · 2026-06-16T22:34:35Z

Thanks for the careful review — you're right, and the empirical repro nailed it. I fixed the PM-save-path regression rather than splitting the PR:

insertCommentRanges (fromProseDoc/paragraph.ts) now emits a commentReference node right after each commentRangeEnd (both the mid-paragraph and end-of-paragraph close), and injectReplyRangeMarkers / injectTCReplyRangeMarkers do the same for replies and tracked-change-parented replies.
toProseDoc/paragraph.ts now handles commentReference explicitly — intentionally not converted to a PM node, since the comment mark is the source of truth and the reference is regenerated on save (keeping it would double-emit). A range-less point comment can't be expressed as a zero-width mark, so it's dropped on the editor path (still lossless via parseDocx→repackDocx), documented inline.
Added a regression test that goes through ProseMirror (toProseDoc → fromProseDoc → serialize) and asserts exactly one reference survives a save→load→save cycle — no loss, no accumulation.

Kept the two halves together since the comment path is correct now, but happy to split the TOC fix out if you'd still prefer. CLA's signed too.

sathinator-afk and others added 2 commits June 16, 2026 18:09

docs(api): refresh API report for CommentReference export

236eb59

The new public CommentReference type (point-comment round-trip fix) adds a member to ParagraphContent; regenerate the API Extractor report so api:check passes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

vercel Bot deployed to Preview June 16, 2026 08:23 View deployment

eigenpal-release-pal Bot added a commit that referenced this pull request Jun 16, 2026

Record CLA signature for @sathinator-afk (PR #837)

ae10230

Merge branch 'main' into fix/roundtrip-toc-and-point-comments

c281375

vercel Bot deployed to Preview June 16, 2026 08:38 View deployment

jedrazb requested changes Jun 16, 2026

View reviewed changes

vercel Bot deployed to Preview June 16, 2026 22:09 View deployment

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(docx): round-trip TOC fields and point comments#837

fix(docx): round-trip TOC fields and point comments#837
sathinator-afk wants to merge 4 commits into
eigenpal:mainfrom
sathinator-afk:fix/roundtrip-toc-and-point-comments

sathinator-afk commented Jun 16, 2026 •

edited by blacksmith-sh Bot

Loading

Uh oh!

vercel Bot commented Jun 16, 2026 •

edited

Loading

Uh oh!

eigenpal-release-pal Bot commented Jun 16, 2026 •

edited

Loading

Uh oh!

sathinator-afk commented Jun 16, 2026

Uh oh!

jedrazb left a comment

Uh oh!

jedrazb Jun 16, 2026

Uh oh!

jedrazb Jun 16, 2026

Uh oh!

sathinator-afk commented Jun 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		const doc = await parseDocx(await buildCommentsDocx(), { preloadFonts: false });
		const out = new Uint8Array(await repackDocx(doc));

Conversation

sathinator-afk commented Jun 16, 2026 • edited by blacksmith-sh Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this fixes

1. Cross-paragraph complex fields (Table of Contents) were dropped

2. Range-less "point" comments were dropped

Tests & verification

Notes

Uh oh!

vercel Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

eigenpal-release-pal Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sathinator-afk commented Jun 16, 2026

Uh oh!

jedrazb left a comment

Choose a reason for hiding this comment

The regression

What's needed before merge

Uh oh!

jedrazb Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

jedrazb Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

sathinator-afk commented Jun 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

sathinator-afk commented Jun 16, 2026 •

edited by blacksmith-sh Bot

Loading

vercel Bot commented Jun 16, 2026 •

edited

Loading

eigenpal-release-pal Bot commented Jun 16, 2026 •

edited

Loading