Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: pandas cannot open xlsx with openpyxl engine #40499

Closed
1 task
newbie9188 opened this issue Mar 18, 2021 · 16 comments
Closed
1 task

BUG: pandas cannot open xlsx with openpyxl engine #40499

newbie9188 opened this issue Mar 18, 2021 · 16 comments
Labels
Bug IO Excel read_excel, to_excel

Comments

@newbie9188
Copy link

  • [ x] I have checked that this issue has not already been reported.

  • [ x] I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.

Code Sample, a copy-pastable example

[NXTS122ND8S10.xlsx](https://github.com/pandas-dev/pandas/files/6165254/NXTS122ND8S10.xlsx)

I have this trouble that pandas cannot open any xlsx files. All of the xlsx files can be open and read by excel.

Current packages and versions in use:
pandas: v1.2.3
openpyxl: v 3.0.7
python : v 3.8.5

method 1:
excel = pd.read_excel('NXTS122ND8S10.xlsx',sheet_name='Run results',engine='openpyxl')
print(excel)

method 2:
excel = pd.read_excel(open('NXTS122ND8S10.xlsx','rb'), sheet_name ='Run results')
print(excel)

method 3:
wb = openpyxl.load_workbook('NXTS122ND8S10.xlsx')
sheet = wb.worksheets[0]
excel = pd.dataframe(sheet.values)
print(excel)

method 4:  
excel = pd.read_excel('NXTS122ND8S10.xlsx')
print(excel)

same error code:

Traceback (most recent call last):
File "C:\Users\XXXXX\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\descriptors\base.py", line 55, in _convert
value = expected_type(value)
TypeError: Fill() takes no arguments

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\XXXXX\eclipse-workspace\Work Related Projects\transpose_3.py", line 107, in
excel = pd.read_excel(open(directory+''+file,'rb'), sheet_name ='Run results')
File "C:\Users\XXXXX\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\util_decorators.py", line 299, in wrapper
return func(*args, **kwargs)
File "C:\Users\XXXXX\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\excel_base.py", line 336, in read_excel
io = ExcelFile(io, storage_options=storage_options, engine=engine)
File "C:\Users\XXXXX\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\excel_base.py", line 1131, in init
self._reader = self._engines[engine](self._io, storage_options=storage_options)
File "C:\Users\XXXXX\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\excel_openpyxl.py", line 475, in init
super().init(filepath_or_buffer, storage_options=storage_options)
File "C:\Users\XXXXX\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\excel_base.py", line 391, in init
self.book = self.load_workbook(self.handles.handle)
File "C:\Users\XXXXX\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\excel_openpyxl.py", line 486, in load_workbook
return load_workbook(
File "C:\Users\XXXXX\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\reader\excel.py", line 317, in load_workbook
reader.read()
File "C:\Users\XXXXX\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\reader\excel.py", line 281, in read
apply_stylesheet(self.archive, self.wb)
File "C:\Users\XXXXX\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\styles\stylesheet.py", line 198, in apply_stylesheet
stylesheet = Stylesheet.from_tree(node)
File "C:\Users\XXXXX\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\styles\stylesheet.py", line 103, in from_tree
return super(Stylesheet, cls).from_tree(node)
File "C:\Users\XXXXX\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\descriptors\serialisable.py", line 103, in from_tree
return cls(**attrib)
File "C:\Users\XXXXX\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\styles\stylesheet.py", line 74, in init
self.fills = fills
File "C:\Users\XXXXX\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\descriptors\sequence.py", line 26, in set
seq = [_convert(self.expected_type, value) for value in seq]
File "C:\Users\XXXXX\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\descriptors\sequence.py", line 26, in
seq = [_convert(self.expected_type, value) for value in seq]
File "C:\Users\XXXXX\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\descriptors\base.py", line 57, in _convert
raise TypeError('expected ' + str(expected_type))
TypeError: expected <class 'openpyxl.styles.fills.Fill'>

Problem description

[this should explain why the current behaviour is a problem and why the expected output is a better solution]

Expected Output

Output of pd.show_versions()

[paste the output of pd.show_versions() here leaving a blank line after the details tag]
pandas: v1.2.3
openpyxl: v 3.0.7
python : v 3.8.5

@newbie9188 newbie9188 added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Mar 18, 2021
@newbie9188 newbie9188 changed the title BUG: BUG: pandas cannot open xlsx with openpyxl engine Mar 18, 2021
@rhshadrach
Copy link
Member

rhshadrach commented Mar 19, 2021

Thanks for the report! I also cannot open this file directly using openpyx:

from openpyxl import load_workbook
print(load_workbook(filename='filename.xlsx'))

If I download the file, open it in Libre Office on ubuntu, and resave it as an Open Office XML spreadsheet, I can then open it with both openpyxl and pandas. It seems to me this may be an invalid excel file, but if there is an issue it would be with openpyxl.

@rhshadrach rhshadrach added IO Excel read_excel, to_excel and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Mar 19, 2021
@rhshadrach
Copy link
Member

@newbie9188 - what tool was used to produce this excel file? If it was produced programmatically using pandas, sharing some example code that reproduces would be helpful.

@newbie9188
Copy link
Author

Hi Richard,

The excel is generated by lab instrument and exported with the instrument software.
Thank you for the response, this will save me a lot of time in debugging by eliminating a possible error.

I'm very appreciate for your help.

image

image

@chu4276
Copy link

chu4276 commented Apr 8, 2022

Hi @newbie9188 , can you share how you solve this issue? do we need to resave the file as xml file type?

@agileminor
Copy link

@newbie9188 I'm also curious how you fixed this issue as I also see these errors when trying to read in a xlsx from this ViCell unit.

@newbie9188
Copy link
Author

I cannot fix it. my solution is use "win32.com" package to open the xlsx file via excel application and parse out the data then re-save.

@newbie9188
Copy link
Author

I cannot fix it. my solution is use "win32.com" package to open the xlsx file via excel application and parse out the data then re-save. The reason that I cannot open the xlsx file is because the vicell instrument exported the xlsx as protected file. The analyst wants export the result on a per sample bases. If the vicell export result for multiple samples then it will be in csv format.

@agileminor
Copy link

I found a different work around - in pandas 1.3.5 (or earlier) and xlrd version < 2.0, use engine='xlrd' and the ViCell file can be opened successfully.

@dkabal14
Copy link

dkabal14 commented Mar 1, 2024

I have the same problem for ACL Analytics generated files.

@nicilytop
Copy link

I had the same problem.
Unfortunately, I did not find any solutions and solved it by converting the xlsx file to csv using the xlsx2csv library and then importing it into pandas.

import pandas as pd

from xlsx2csv import Xlsx2csv

Xlsx2csv("examplexlsx.xlsx", outputencoding="utf-8").convert("examplecsv.csv")
pd.read_csv("examplecsv.csv")

@igurin-invn
Copy link

+1 to this issue. I get the same error.

As far as reproducing the error, I have an unusual situation. Like the OP, the Excel files are generated by a lab instrument using NI Labview. I use pandas to convert them to csv and save them in a repo, after which we generally don't retain the Excel files. I first wrote the conversion code a couple of years ago, and it worked fine, although usually it was my colleague who used it. The other day, I was trying to debug an unrelated issue and found that the converter had stopped working. What I have is an Excel file, on one hand, and on the other hand a CSV file that I'm pretty sure was generated from the Excel file using read_csv. The Excel file opens in Excel just fine.

Now, just a few days before running into this bug, I had to rebuild my entire Python environment (installing pyside6 had apparently messed something up). I thought this might have caused the problem, so I downgraded openpyxl from 3.1.5 to 3.1.2 (which my colleague has), but the problem didn't go away.

I don't know that this information helps anyone at all, and the bug seems to belong to openpyxl rather than pandas, but no one has reported it there yet, so it seemed more constructive to add my 2 cents' worth here.

@rhshadrach
Copy link
Member

The Excel file opens in Excel just fine.

From what I've seen, this doesn't mean a whole lot. Excel can be very forgiving when reading invalid files (a reason why certain lab tools get away with writing invalid files), whereas open source Excel libraries are not (and understandably so). I would not expect requests to have engines read invalid files to be turned down.

On the other hand, if it is a valid Excel file, then that changes things. But we would need a reproducible example.

@igurin-invn
Copy link

OK, fine. I copied the first 10 rows of the offending file to a blank Excel file and saved it. Trying to read the resulting file using pd.read_excel(fpath, index_col=0) reproduces the OP's error.
Heaterpower.xlsx

I have pandas 2.1.4 and openpyxl 3.1.2.

@rhshadrach
Copy link
Member

On a fresh venv with pandas 2.1.4 and openpyxl 3.1.2 the script

import openpyxl
import pandas as pd

fpath = 'Heaterpower.xlsx'
wb = openpyxl.load_workbook(fpath)
pd.read_excel(fpath, index_col=0)

runs without error for me.

@igurin-invn
Copy link

igurin-invn commented Aug 21, 2024

Fascinating. Works for me, too.

@igurin-invn
Copy link

Oh, I'm sorry. I still had the original file in my data folder, and the error was actually coming from that file.

I think you're right that the Labview code running on the tester is somehow generating a malformation in the file that Excel tolerates.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug IO Excel read_excel, to_excel
Projects
None yet
Development

No branches or pull requests

7 participants