-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: pandas cannot open xlsx with openpyxl engine #40499
Comments
Thanks for the report! I also cannot open this file directly using openpyx:
If I download the file, open it in Libre Office on ubuntu, and resave it as an Open Office XML spreadsheet, I can then open it with both openpyxl and pandas. It seems to me this may be an invalid excel file, but if there is an issue it would be with openpyxl. |
@newbie9188 - what tool was used to produce this excel file? If it was produced programmatically using pandas, sharing some example code that reproduces would be helpful. |
Hi @newbie9188 , can you share how you solve this issue? do we need to resave the file as xml file type? |
@newbie9188 I'm also curious how you fixed this issue as I also see these errors when trying to read in a xlsx from this ViCell unit. |
I cannot fix it. my solution is use "win32.com" package to open the xlsx file via excel application and parse out the data then re-save. |
I cannot fix it. my solution is use "win32.com" package to open the xlsx file via excel application and parse out the data then re-save. The reason that I cannot open the xlsx file is because the vicell instrument exported the xlsx as protected file. The analyst wants export the result on a per sample bases. If the vicell export result for multiple samples then it will be in csv format. |
I found a different work around - in pandas 1.3.5 (or earlier) and xlrd version < 2.0, use engine='xlrd' and the ViCell file can be opened successfully. |
I have the same problem for ACL Analytics generated files. |
I had the same problem. import pandas as pd
from xlsx2csv import Xlsx2csv
Xlsx2csv("examplexlsx.xlsx", outputencoding="utf-8").convert("examplecsv.csv")
pd.read_csv("examplecsv.csv") |
+1 to this issue. I get the same error. As far as reproducing the error, I have an unusual situation. Like the OP, the Excel files are generated by a lab instrument using NI Labview. I use Now, just a few days before running into this bug, I had to rebuild my entire Python environment (installing I don't know that this information helps anyone at all, and the bug seems to belong to |
From what I've seen, this doesn't mean a whole lot. Excel can be very forgiving when reading invalid files (a reason why certain lab tools get away with writing invalid files), whereas open source Excel libraries are not (and understandably so). I would not expect requests to have engines read invalid files to be turned down. On the other hand, if it is a valid Excel file, then that changes things. But we would need a reproducible example. |
OK, fine. I copied the first 10 rows of the offending file to a blank Excel file and saved it. Trying to read the resulting file using I have |
On a fresh venv with pandas 2.1.4 and openpyxl 3.1.2 the script import openpyxl
import pandas as pd
fpath = 'Heaterpower.xlsx'
wb = openpyxl.load_workbook(fpath)
pd.read_excel(fpath, index_col=0) runs without error for me. |
Fascinating. Works for me, too. |
Oh, I'm sorry. I still had the original file in my data folder, and the error was actually coming from that file. I think you're right that the Labview code running on the tester is somehow generating a malformation in the file that Excel tolerates. |
[ x] I have checked that this issue has not already been reported.
[ x] I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.
Code Sample, a copy-pastable example
Problem description
[this should explain why the current behaviour is a problem and why the expected output is a better solution]
Expected Output
Output of
pd.show_versions()
[paste the output of
pd.show_versions()
here leaving a blank line after the details tag]pandas: v1.2.3
openpyxl: v 3.0.7
python : v 3.8.5
The text was updated successfully, but these errors were encountered: