XLSX Not Working- Docling Core Needs Update #493

Closed
ctandrewtran opened this issue Dec 2, 2024 · 4 comments
ctandrewtran commented Dec 2, 2024

Context

With the latest docling-core (v2.6.1) and docling (v2.8.1), .xlsx files cannot be processed: conversion raises a ValueError saying that the mimetype "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" is not valid for DocumentOrigin.

"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" is not among the mimetypes provided by default by the mimetype package that docling uses, so it has to be added to the extras section.

Fix
Add the mimetype as an extra mimetype
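
To illustrate the idea, here is a minimal sketch of a mimetype check that accepts a set of known types plus an explicit extras list. This is not docling-core's actual code: `validate_mimetype`, `default_known_mimetypes`, and `EXTRA_MIMETYPES` are hypothetical names, and using Python's standard `mimetypes` module as the default source is an assumption.

```python
import mimetypes

XLSX_MIMETYPE = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"

# Hypothetical extras list: types accepted even if the default table lacks them.
EXTRA_MIMETYPES = [XLSX_MIMETYPE]


def default_known_mimetypes():
    """Return the MIME types listed in Python's built-in table (assumed source)."""
    return set(mimetypes.types_map.values())


def validate_mimetype(mimetype, known, extras=()):
    """Return mimetype if it is in known or extras, otherwise raise ValueError."""
    if mimetype in known or mimetype in extras:
        return mimetype
    raise ValueError(f"'{mimetype}' is not a valid MIME type")


if __name__ == "__main__":
    # With the extras list, the XLSX type passes regardless of the default table.
    print(validate_mimetype(XLSX_MIMETYPE, default_known_mimetypes(), EXTRA_MIMETYPES))
```

Without the extras entry, whether the XLSX type is accepted depends entirely on what the default table contains on a given system, which is the failure described above.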

Steps to reproduce

Pass an .xlsx file as a DocumentStream, then call DocumentConverter.convert():

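import io

import streamlit as st
from docling.datamodel.base_models import DocumentStream, InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter
from docling.pipeline.standard_pdf_pipeline import StandardPdfPipeline

# st.file_uploader requires a label and returns None until a file is uploaded
file = st.file_uploader("Upload an .xlsx file", type=['xlsx'])

# PDF model setup kept from the original report; the XLSX conversion does not use it
arti = StandardPdfPipeline.download_models_hf()
pipelinePDF = PdfPipelineOptions(artifacts_path=arti)

converter = DocumentConverter(allowed_formats=[InputFormat.XLSX])

if file is not None:
    buf = io.BytesIO(file.read())
    src = DocumentStream(name=file.name, stream=buf)
    doclingified = converter.convert(src)  # the ValueError is raised here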

Docling version

v2.6.1

Docling core: v2.8.1

Python version

Python 3.12

Fix

PR to fix in docling core:
DS4SD/docling-core#88

@ctandrewtran ctandrewtran added the bug Something isn't working label Dec 2, 2024
@ctandrewtran ctandrewtran changed the title from "XLSX Not Working" to "XLSX Not Working- Docling Core Needs Update" Dec 3, 2024
@ctandrewtran
Author

Any updates on this? A fix would be very helpful; this is currently blocking me.

@kime541200

Same problem here.

@ozgurnsahin

Same here.

@ctandrewtran
Author

This was resolved in v2.9.0 of docling-core.

@ozgurnsahin @kime541200
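
For anyone checking whether their environment already has the fix, a small sketch along these lines may help. It compares the installed docling-core version against 2.9.0 using only the standard library; `parse_version` and `has_xlsx_fix` are hypothetical helper names, and the simple three-part version parsing is an assumption that ignores pre-release suffixes.

```python
import re
from importlib import metadata

# Version of docling-core reported in this thread as containing the fix.
FIXED_IN = (2, 9, 0)


def parse_version(version):
    """Extract (major, minor, patch) from the start of a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def has_xlsx_fix(version):
    """Return True if the given docling-core version is 2.9.0 or later."""
    return parse_version(version) >= FIXED_IN


if __name__ == "__main__":
    try:
        installed = metadata.version("docling-core")
    except metadata.PackageNotFoundError:
        print("docling-core is not installed")
    else:
        status = "includes" if has_xlsx_fix(installed) else "predates"
        print(f"docling-core {installed} {status} the XLSX mimetype fix")
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating "2.10.0" as older than "2.9.0".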
