fix(docx): tolerate styles missing type#2168

Open
jstar0 wants to merge 1 commit into microsoft:main from jstar0:fix/docx-missing-style-type

jstar0 commented Jun 28, 2026

Summary

Fixes the DOCX conversion crash from #2166 when a document contains a malformed word/styles.xml entry whose w:style element is missing the w:type attribute.

Root Cause / Context

MarkItDown pre-processes DOCX files and then hands them to Mammoth. Mammoth reads every style entry in word/styles.xml and indexes the w:type attribute directly. If a DOCX exporter writes a style without that attribute, Mammoth raises KeyError("w:type"), which MarkItDown reports as a FileConversionException.
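The failure mode described above can be reproduced in miniature. The sketch below is not Mammoth's code; it uses the standard library's `xml.etree.ElementTree`, and `read_style_types` and `STYLES_XML` are hypothetical names chosen for illustration. It only shows why indexing an attribute mapping directly fails on a style entry that lacks `w:type`.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, bound to the "w" prefix in DOCX parts.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
TYPE_ATTR = f"{{{W_NS}}}type"

# Hypothetical minimal styles part: the second entry has no w:type.
STYLES_XML = f"""<w:styles xmlns:w="{W_NS}">
  <w:style w:type="paragraph" w:styleId="Normal"/>
  <w:style w:styleId="Broken"/>
</w:styles>"""


def read_style_types(xml_text):
    """Index w:type directly on every child element, with no fallback."""
    root = ET.fromstring(xml_text)
    # attrib is a plain dict, so a missing key raises KeyError.
    # ElementTree spells the key as "{namespace}type" rather than "w:type".
    return [style.attrib[TYPE_ATTR] for style in root]
```

Calling `read_style_types(STYLES_XML)` raises `KeyError` on the second entry. A reader that used `attrib.get(...)` would tolerate the missing attribute instead.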

Changes

  • Remove malformed DOCX style entries that do not have w:type during the existing DOCX pre-processing step.
  • Add a regression test that creates a DOCX fixture containing one style entry without w:type and verifies conversion still succeeds.
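A minimal sketch of the first change, assuming the cleanup operates on the raw bytes of word/styles.xml. This is an illustration built on the standard library, not the code in this PR; `drop_untyped_styles` and `SAMPLE_STYLES` are hypothetical names.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
STYLE_TAG = f"{{{W_NS}}}style"
TYPE_ATTR = f"{{{W_NS}}}type"

# Keep the conventional "w" prefix when the tree is serialized again.
ET.register_namespace("w", W_NS)

# Hypothetical styles part with one valid and one malformed entry.
SAMPLE_STYLES = (
    f'<w:styles xmlns:w="{W_NS}">'
    '<w:style w:type="paragraph" w:styleId="Normal"/>'
    '<w:style w:styleId="Broken"/>'
    "</w:styles>"
).encode("utf-8")


def drop_untyped_styles(styles_xml):
    """Return styles.xml bytes with every w:style lacking w:type removed."""
    root = ET.fromstring(styles_xml)
    # Copy the match list first so removal does not disturb iteration.
    for style in list(root.findall(STYLE_TAG)):
        if TYPE_ATTR not in style.attrib:
            root.remove(style)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```

Valid entries are left in place; only direct `w:style` children without the attribute are dropped.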

Scope / Risk

This only changes DOCX pre-processing for malformed style definitions. It does not alter document text, comments, equations, or valid style entries. If style cleanup itself fails, the pre-processor falls back to the original file content, matching the existing best-effort behavior used for math pre-processing.
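The fallback behavior described above can be sketched as a small wrapper. The name `best_effort` and its signature are assumptions made for illustration; the actual pre-processor may structure this differently.

```python
def best_effort(transform, content):
    """Apply transform to content; return content unchanged if it fails."""
    try:
        return transform(content)
    except Exception:
        # Cleanup is optional: a failure here should not block conversion,
        # so the original bytes are passed through untouched.
        return content
```

With this shape, a cleanup step that raises on unexpected input degrades to a no-op rather than turning a convertible document into a conversion failure.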

Verification

cd packages/markitdown; hatch test tests/test_module_misc.py::test_docx_missing_style_type_does_not_crash
cd packages/markitdown; hatch test tests/test_module_misc.py::test_docx_comments tests/test_module_misc.py::test_docx_equations tests/test_module_misc.py::test_docx_missing_style_type_does_not_crash tests/test_module_vectors.py::test_convert_local -- -k docx
cd packages/markitdown; hatch test tests/test_module_misc.py
pre-commit run --files packages/markitdown/src/markitdown/converter_utils/docx/pre_process.py packages/markitdown/tests/test_module_misc.py

All commands passed.

Closes #2166

Development

Successfully merging this pull request may close these issues.

- DocxConverter threw KeyError with message: 'w:type'