We are seeing odd behavior: processing a table of contents (TOC) in a Word document fails silently, with no errors reported, and the resulting document is missing the content that was originally in the TOC.
What we have tried:
Using OCR does not help: the TOC content is still omitted.
Exporting to PDF (using Word) and then using Docling to convert the PDF to Markdown works as expected, with no content omitted.
Steps to reproduce
Use the attached minimal example .docx file and run:
docling sample.docx
This produces the attached Markdown file, which is missing the TOC content.
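To check which text goes missing, here is a rough sketch that uses only the Python standard library and does not touch Docling at all. It assumes, without having confirmed this for sample.docx, that Word stores the TOC inside a structured document tag (`w:sdt`) in `word/document.xml`. The helper names `sdt_paragraph_texts` and `missing_from_markdown` are made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def sdt_paragraph_texts(docx_path):
    """Return the text of every non-empty paragraph found inside a w:sdt block.

    Assumption: the TOC lives inside a w:sdt element. Other content controls
    are picked up as well, so treat the result as a candidate list.
    """
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    texts = []
    seen = set()  # avoid counting a paragraph twice when w:sdt blocks are nested
    for sdt in root.iter(f"{{{W}}}sdt"):
        for para in sdt.iter(f"{{{W}}}p"):
            if id(para) in seen:
                continue
            seen.add(id(para))
            # Join all text runs; tabs (w:tab) are not w:t and are skipped.
            text = "".join(t.text or "" for t in para.iter(f"{{{W}}}t")).strip()
            if text:
                texts.append(text)
    return texts


def missing_from_markdown(docx_path, markdown_text):
    """List w:sdt paragraph texts that do not appear verbatim in the Markdown."""
    return [t for t in sdt_paragraph_texts(docx_path) if t not in markdown_text]
```

For example, `missing_from_markdown("sample.docx", open("sample.md", encoding="utf-8").read())` would list the candidate TOC lines that are absent from the converted output. The substring match is crude: because tab characters are skipped, a TOC entry's title and page number end up joined together, so a mismatch does not by itself prove that the content was dropped.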
Docling version
Docling version: 2.13.0
Docling Core version: 2.12.1
Docling IBM Models version: 3.1.0
Docling Parse version: 3.0.0
Python version
Python 3.11.11
Note: all shared samples are publicly available documents.
sample.md
sample.docx