Title: Question about SML labels in the released CMER training code
Hi, thanks for releasing the CMERNet implementation and the CMER/MER datasets.
I am trying to reproduce the training pipeline described in the paper. There, CMERNet introduces Structured Mathematical Language (SML): the raw LaTeX string is first parsed into a syntax tree and then serialized into a structured token sequence with grammar tokens.
However, when I checked the released OpenOCR CMER configuration and README, the training label format seems to be: