Expand LaTeX symbol coverage beyond hand-maintained map #6

Description

@chitwitgit

Problem

The @m2d/math plugin maps LaTeX commands to Unicode for Word OMML text runs via a hand-maintained LATEX_SYMBOLS table in lib/src/index.ts (~120 entries). This limits how many commands are handled well:

  • Incomplete coverage — many common commands are not mapped and fall through as raw command names in DOCX output.
  • Incorrect mappings — some entries appear wrong (e.g. \bigwedge was mapped to the wrong character instead of ⋀, U+22C0).
  • Hard to maintain — adding or correcting symbols requires manual edits with no structured source to build from.

Accent handling is also limited: only \hat / \widehat produce OMML accent characters; other accent commands are treated as plain symbols.
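
As a rough illustration of what broader accent handling could look like, the sketch below maps a few accent commands to Unicode combining characters. The table name ACCENT_CHARS and the helper accentCharFor are hypothetical, and the choice of combining character per command is an assumption that would need checking against what Word actually renders; only \hat / \widehat are handled today.

```typescript
// Hypothetical lookup table: accent command name -> Unicode combining character.
// Which character Word expects for each command is an assumption, not verified.
const ACCENT_CHARS: ReadonlyMap<string, string> = new Map([
  ["hat", "\u0302"], // combining circumflex accent
  ["widehat", "\u0302"],
  ["tilde", "\u0303"], // combining tilde
  ["bar", "\u0304"], // combining macron
  ["dot", "\u0307"], // combining dot above
  ["ddot", "\u0308"], // combining diaeresis
  ["vec", "\u20d7"], // combining right arrow above
]);

// Returns the accent character for a command such as "\\vec",
// or undefined when the command is not a known accent.
export function accentCharFor(command: string): string | undefined {
  const name = command.startsWith("\\") ? command.slice(1) : command;
  return ACCENT_CHARS.get(name);
}
```

A caller could pass the returned character to createMathAccentCharacter and fall back to the plain-symbol path when the result is undefined.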

Goal

Support as many LaTeX commands as possible when converting math to DOCX. The plugin should not be tied to any single renderer — the aim is broad LaTeX command coverage, not parity with a particular engine.

Proposed approach

Use KaTeX v0.16.22 as a practical starting point, not a target spec. Its symbol and macro tables are a convenient, well-structured source that is easy to vendor and codegen from, giving us a large baseline quickly. Other sources (manual overrides, additional macro tables, etc.) can be layered on later.
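
To show why these tables lend themselves to codegen, here is a minimal sketch that pulls command-to-character pairs out of source text. It assumes the vendored file contains calls of the form defineSymbol(mode, font, group, "char", "\\name"); the function name extractSymbols is hypothetical, and calls that pass variables instead of string literals are simply skipped.

```typescript
// Hypothetical extractor: scans KaTeX-style source text for
// defineSymbol(mode, font, group, "<char>", "<\\name>") calls with string literals.
export function extractSymbols(source: string): Map<string, string> {
  // Captures: 1 = mode, 2 = quoted replacement char, 3 = quoted command name.
  const pattern =
    /defineSymbol\(\s*(\w+)\s*,\s*\w+\s*,\s*\w+\s*,\s*("(?:[^"\\]|\\.)*")\s*,\s*("(?:[^"\\]|\\.)*")/g;
  const table = new Map<string, string>();
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(source)) !== null) {
    if (m[1] !== "math") continue; // this sketch keeps math-mode entries only
    let char: string;
    let name: string;
    try {
      // The literals are decoded as JSON strings; literals that are not valid JSON are skipped.
      char = JSON.parse(m[2]) as string;
      name = JSON.parse(m[3]) as string;
    } catch {
      continue;
    }
    if (!name.startsWith("\\")) continue; // keep backslash commands only
    const key = name.slice(1);
    if (!table.has(key)) table.set(key, char); // first definition wins
  }
  return table;
}
```

The "first definition wins" rule is an arbitrary choice for the sketch; a real script would need a deliberate policy for duplicates and for manual overrides layered on top.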

  1. Vendor KaTeX source snippets (symbols.js, macros.js, functions/op.js) under lib/scripts/data/ as seed input.
  2. Add a codegen script (pnpm generate:katex) that extracts:
    • symbol → Unicode mappings
    • macro aliases
    • accent commands for createMathAccentCharacter
    • named functions (\sin, \log, etc.) rendered as literal text
    • macro-only symbol overrides (e.g. \neq, \copyright)
  3. Replace the inline map in index.ts with the generated tables and a resolveLatexSymbol() helper.
  4. Narrow tsup entry to ./src/index.ts so generated data files are bundled, not published as separate entry points.
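
One possible shape for the helper in step 3 is sketched below. The issue only names resolveLatexSymbol(); the signature, the Resolved type, and the three tiny tables are assumptions standing in for the generated data, not the actual generated output.

```typescript
// Hypothetical result type: what kind of thing a command resolved to.
type Resolved =
  | { kind: "symbol"; text: string } // Unicode character for an OMML text run
  | { kind: "function"; text: string } // named function rendered as literal text
  | { kind: "unknown"; text: string }; // unmapped: falls through as the raw name

// Placeholder stand-ins for the generated tables.
const SYMBOLS = new Map<string, string>([
  ["alpha", "\u03b1"],
  ["bigwedge", "\u22c0"],
  ["neq", "\u2260"],
]);
const ALIASES = new Map<string, string>([["ne", "neq"]]);
const NAMED_FUNCTIONS = new Set<string>(["sin", "log"]);

export function resolveLatexSymbol(command: string): Resolved {
  const name = command.startsWith("\\") ? command.slice(1) : command;

  // Named functions render as text, never as a Unicode symbol.
  if (NAMED_FUNCTIONS.has(name)) return { kind: "function", text: name };

  // Follow alias chains, guarding against accidental cycles.
  let current = name;
  const seen = new Set<string>();
  while (!seen.has(current)) {
    seen.add(current);
    const next = ALIASES.get(current);
    if (next === undefined) break;
    current = next;
  }

  const symbol = SYMBOLS.get(current);
  if (symbol !== undefined) return { kind: "symbol", text: symbol };
  return { kind: "unknown", text: name };
}
```

Returning a tagged result rather than a bare string keeps the named-function and fall-through cases distinguishable at the call site, which matches the acceptance criterion that functions render as text.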

Acceptance criteria

  • Symbol tables cover substantially more commands than the current hand-maintained map
  • Accent commands use OMML accent characters where appropriate
  • Named functions render as text, not Unicode symbols
  • pnpm typecheck, pnpm build, and pnpm test pass in lib/
  • Regeneration is documented via pnpm generate:katex
