convert equation inside word(.docx) file to latex, current only supports omml
First of all, word(.docx) file is a zip file. After unziping a word file, you can get several documents. The content is mainly stored at word/document.xml.
$: unzip equation.docx $: tree ├── equation.docx ├── [Content_Types].xml ├── docProps │ ├── app.xml │ └── core.xml ├── equation.docx ├── _rels └── word ├── document.xml ├── fontTable.xml ├── _rels │ └── document.xml.rels └── styles.xml
There are three ways to insert a equation into word file:
- eq field
- omml
- MathType
All of them are stored as markup inside docx file. You can easily find them at word/document.xml.
<m:oMath xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math">
<m:sSup>
<m:e>
<m:d>
<m:dPr>
<m:begChr m:val="("/>
<m:endChr m:val=")"/>
</m:dPr>
<m:e>
<m:r>
<w:rPr>
<w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
</w:rPr>
<m:t xml:space="preserve">x</m:t>
</m:r>
.......
</m:e>
</m:sSup>
</m:e>
</m:nary>
</m:oMath>
So we can just extract the equation markup out and convert it into LaTeX.
There is a demo word file named equation.docx which contains a euqation. Install requirements and run the demo.py and you will get equation.html contains the LaTeX format equation.
<p>${\left(x+a\right)}^{n}={\sum }_{k=0}^{n}\left(\begin{array}{c}n\\ k\end{array}\right){x}^{k}{a}^{n-k}$</p>