With the latest docling-core (v2.6.1) and docling (v2.8.1), .xlsx files cannot be processed. Conversion fails with a ValueError stating that the MIME type "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" is not valid for DocumentOrigin.
"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" is not one of the MIME types that Python's mimetypes module (used by docling for this check) provides by default, so it has to be added to the list of extra MIME types.
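To see whether a particular Python installation knows this MIME type, a short diagnostic can compare CPython's built-in table with the table produced after platform sources (such as /etc/mime.types or the Windows registry) are merged in. This is a sketch that assumes, as described above, that the validation relies on the stdlib `mimetypes` module. The helper names `builtin_types` and `system_types` are hypothetical, and the printed results will differ between machines.

```python
import mimetypes

XLSX_MIME = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"


def builtin_types() -> set[str]:
    """MIME types from CPython's built-in table only (no system mime.types files)."""
    # A fresh MimeTypes() with no filenames is populated from the built-in defaults
    return set(mimetypes.MimeTypes().types_map[True].values())


def system_types() -> set[str]:
    """MIME types after mimetypes has merged in the platform's own sources."""
    if not mimetypes.inited:
        mimetypes.init()
    return set(mimetypes.types_map.values())


if __name__ == "__main__":
    print("built-in table knows xlsx:", XLSX_MIME in builtin_types())
    print("this machine knows xlsx:  ", XLSX_MIME in system_types())
```

If the second line prints False, a strict membership check against `mimetypes.types_map` will reject .xlsx on that machine.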
Fix
Add the .xlsx MIME type to the extra MIME types that DocumentOrigin accepts in docling-core.
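The fix follows an allow-list fallback pattern: a MIME type passes validation if it appears in the `mimetypes` table or in an explicit list of extra types. The sketch below shows that pattern with a plain dataclass. `OriginSketch` and `EXTRA_MIMETYPES` are hypothetical stand-ins, not docling-core's actual `DocumentOrigin`, and the error message only mirrors the kind of ValueError described in this report.

```python
import mimetypes
from dataclasses import dataclass
from typing import ClassVar

XLSX_MIME = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"


@dataclass
class OriginSketch:
    """Hypothetical stand-in for a document-origin record (not docling-core's class)."""

    mimetype: str
    filename: str

    # Types accepted even when the stdlib mimetypes table does not list them
    EXTRA_MIMETYPES: ClassVar[list[str]] = [
        XLSX_MIME,  # the proposed fix: accept .xlsx spreadsheets
    ]

    def __post_init__(self) -> None:
        known = set(mimetypes.types_map.values())
        if self.mimetype not in known and self.mimetype not in self.EXTRA_MIMETYPES:
            raise ValueError(f"'{self.mimetype}' is not a valid MIME type")
```

With the extra entry in place, `OriginSketch(mimetype=XLSX_MIME, filename="report.xlsx")` validates whether or not the local `mimetypes` table knows the type. Without that entry, the result depends on the platform.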
Steps to reproduce
Pass an .xlsx file to the converter as a DocumentStream, then call converter.convert():
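import io

import streamlit as st
from docling.datamodel.base_models import DocumentStream, InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter
from docling.pipeline.standard_pdf_pipeline import StandardPdfPipeline

file = st.file_uploader("Upload a spreadsheet", type=["xlsx"])
# PDF model artifacts (not used by the XLSX backend; kept from the original setup)
arti = StandardPdfPipeline.download_models_hf()
pipelinePDF = PdfPipelineOptions(artifacts_path=arti)
converter = DocumentConverter(allowed_formats=[InputFormat.XLSX])

if file is not None:
    # Wrap the uploaded bytes in a DocumentStream and convert it
    buf = io.BytesIO(file.read())
    src = DocumentStream(name=file.name, stream=buf)
    doclingified = converter.convert(src)  # raises ValueError on the .xlsx MIME type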
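Until a docling-core release that includes the fix is installed, one possible stopgap is to register the type in the process-wide `mimetypes` table before converting. This is an untested sketch. It assumes the validator reads `mimetypes.types_map` at call time. `register_xlsx_mimetype` is a hypothetical helper, and any later call to `mimetypes.init()` elsewhere in the process would discard the registration.

```python
import mimetypes

XLSX_MIME = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"


def register_xlsx_mimetype() -> None:
    """Add the .xlsx type to the global mimetypes table if it is missing (hypothetical helper)."""
    if not mimetypes.inited:
        mimetypes.init()  # load the global database once so the addition is not overwritten later
    if XLSX_MIME not in mimetypes.types_map.values():
        # strict=True puts it in the same table that mimetypes.types_map exposes
        mimetypes.add_type(XLSX_MIME, ".xlsx", strict=True)


# Call once at start-up, before converter.convert(src)
register_xlsx_mimetype()
```

Upgrading docling-core to a version that contains the fix remains the cleaner solution. This workaround only patches global interpreter state.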
Docling version
v2.6.1
Docling core: v2.8.1
Python version
Python 3.12
Fix
PR with the fix in docling-core:
DS4SD/docling-core#88