BUG: pandas cannot open xlsx with openpyxl engine #40499

newbie9188 · 2021-03-18T15:24:00Z

[ x] I have checked that this issue has not already been reported.
[ x] I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.

Code Sample, a copy-pastable example

[NXTS122ND8S10.xlsx](https://github.com/pandas-dev/pandas/files/6165254/NXTS122ND8S10.xlsx)

I have this trouble that pandas cannot open any xlsx files. All of the xlsx files can be open and read by excel.

Current packages and versions in use:
pandas: v1.2.3
openpyxl: v 3.0.7
python : v 3.8.5

method 1:
excel = pd.read_excel('NXTS122ND8S10.xlsx',sheet_name='Run results',engine='openpyxl')
print(excel)

method 2:
excel = pd.read_excel(open('NXTS122ND8S10.xlsx','rb'), sheet_name ='Run results')
print(excel)

method 3:
wb = openpyxl.load_workbook('NXTS122ND8S10.xlsx')
sheet = wb.worksheets[0]
excel = pd.dataframe(sheet.values)
print(excel)

method 4:  
excel = pd.read_excel('NXTS122ND8S10.xlsx')
print(excel)

same error code:

Traceback (most recent call last):
File "C:\Users\XXXXX\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\descriptors\base.py", line 55, in _convert
value = expected_type(value)
TypeError: Fill() takes no arguments

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\XXXXX\eclipse-workspace\Work Related Projects\transpose_3.py", line 107, in
excel = pd.read_excel(open(directory+''+file,'rb'), sheet_name ='Run results')
File "C:\Users\XXXXX\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\util_decorators.py", line 299, in wrapper
return func(*args, **kwargs)
File "C:\Users\XXXXX\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\excel_base.py", line 336, in read_excel
io = ExcelFile(io, storage_options=storage_options, engine=engine)
File "C:\Users\XXXXX\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\excel_base.py", line 1131, in init
self._reader = self._engines[engine](self._io, storage_options=storage_options)
File "C:\Users\XXXXX\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\excel_openpyxl.py", line 475, in init
super().init(filepath_or_buffer, storage_options=storage_options)
File "C:\Users\XXXXX\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\excel_base.py", line 391, in init
self.book = self.load_workbook(self.handles.handle)
File "C:\Users\XXXXX\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\excel_openpyxl.py", line 486, in load_workbook
return load_workbook(
File "C:\Users\XXXXX\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\reader\excel.py", line 317, in load_workbook
reader.read()
File "C:\Users\XXXXX\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\reader\excel.py", line 281, in read
apply_stylesheet(self.archive, self.wb)
File "C:\Users\XXXXX\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\styles\stylesheet.py", line 198, in apply_stylesheet
stylesheet = Stylesheet.from_tree(node)
File "C:\Users\XXXXX\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\styles\stylesheet.py", line 103, in from_tree
return super(Stylesheet, cls).from_tree(node)
File "C:\Users\XXXXX\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\descriptors\serialisable.py", line 103, in from_tree
return cls(**attrib)
File "C:\Users\XXXXX\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\styles\stylesheet.py", line 74, in init
self.fills = fills
File "C:\Users\XXXXX\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\descriptors\sequence.py", line 26, in set
seq = [_convert(self.expected_type, value) for value in seq]
File "C:\Users\XXXXX\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\descriptors\sequence.py", line 26, in
seq = [_convert(self.expected_type, value) for value in seq]
File "C:\Users\XXXXX\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\descriptors\base.py", line 57, in _convert
raise TypeError('expected ' + str(expected_type))
TypeError: expected <class 'openpyxl.styles.fills.Fill'>

Problem description

[this should explain why the current behaviour is a problem and why the expected output is a better solution]

Expected Output

Output of `pd.show_versions()`

[paste the output of pd.show_versions() here leaving a blank line after the details tag]
pandas: v1.2.3
openpyxl: v 3.0.7
python : v 3.8.5

The text was updated successfully, but these errors were encountered:

rhshadrach · 2021-03-19T00:09:20Z

Thanks for the report! I also cannot open this file directly using openpyx:

from openpyxl import load_workbook
print(load_workbook(filename='filename.xlsx'))

If I download the file, open it in Libre Office on ubuntu, and resave it as an Open Office XML spreadsheet, I can then open it with both openpyxl and pandas. It seems to me this may be an invalid excel file, but if there is an issue it would be with openpyxl.

rhshadrach · 2021-03-19T00:12:21Z

@newbie9188 - what tool was used to produce this excel file? If it was produced programmatically using pandas, sharing some example code that reproduces would be helpful.

newbie9188 · 2021-03-19T00:54:19Z

Hi Richard,

The excel is generated by lab instrument and exported with the instrument software.
Thank you for the response, this will save me a lot of time in debugging by eliminating a possible error.

I'm very appreciate for your help.

chu4276 · 2022-04-08T06:22:10Z

Hi @newbie9188 , can you share how you solve this issue? do we need to resave the file as xml file type?

agileminor · 2022-04-27T19:28:54Z

@newbie9188 I'm also curious how you fixed this issue as I also see these errors when trying to read in a xlsx from this ViCell unit.

newbie9188 · 2022-04-28T22:21:33Z

I cannot fix it. my solution is use "win32.com" package to open the xlsx file via excel application and parse out the data then re-save.

newbie9188 · 2022-04-28T22:23:15Z

I cannot fix it. my solution is use "win32.com" package to open the xlsx file via excel application and parse out the data then re-save. The reason that I cannot open the xlsx file is because the vicell instrument exported the xlsx as protected file. The analyst wants export the result on a per sample bases. If the vicell export result for multiple samples then it will be in csv format.

agileminor · 2022-04-29T01:05:54Z

I found a different work around - in pandas 1.3.5 (or earlier) and xlrd version < 2.0, use engine='xlrd' and the ViCell file can be opened successfully.

dkabal14 · 2024-03-01T20:23:47Z

I have the same problem for ACL Analytics generated files.

nicilytop · 2024-04-18T09:42:19Z

I had the same problem.
Unfortunately, I did not find any solutions and solved it by converting the xlsx file to csv using the xlsx2csv library and then importing it into pandas.

import pandas as pd

from xlsx2csv import Xlsx2csv

Xlsx2csv("examplexlsx.xlsx", outputencoding="utf-8").convert("examplecsv.csv")
pd.read_csv("examplecsv.csv")

igurin-invn · 2024-08-21T20:48:49Z

+1 to this issue. I get the same error.

As far as reproducing the error, I have an unusual situation. Like the OP, the Excel files are generated by a lab instrument using NI Labview. I use pandas to convert them to csv and save them in a repo, after which we generally don't retain the Excel files. I first wrote the conversion code a couple of years ago, and it worked fine, although usually it was my colleague who used it. The other day, I was trying to debug an unrelated issue and found that the converter had stopped working. What I have is an Excel file, on one hand, and on the other hand a CSV file that I'm pretty sure was generated from the Excel file using read_csv. The Excel file opens in Excel just fine.

Now, just a few days before running into this bug, I had to rebuild my entire Python environment (installing pyside6 had apparently messed something up). I thought this might have caused the problem, so I downgraded openpyxl from 3.1.5 to 3.1.2 (which my colleague has), but the problem didn't go away.

I don't know that this information helps anyone at all, and the bug seems to belong to openpyxl rather than pandas, but no one has reported it there yet, so it seemed more constructive to add my 2 cents' worth here.

rhshadrach · 2024-08-21T21:02:40Z

The Excel file opens in Excel just fine.

From what I've seen, this doesn't mean a whole lot. Excel can be very forgiving when reading invalid files (a reason why certain lab tools get away with writing invalid files), whereas open source Excel libraries are not (and understandably so). I would not expect requests to have engines read invalid files to be turned down.

On the other hand, if it is a valid Excel file, then that changes things. But we would need a reproducible example.

igurin-invn · 2024-08-21T21:15:11Z

OK, fine. I copied the first 10 rows of the offending file to a blank Excel file and saved it. Trying to read the resulting file using pd.read_excel(fpath, index_col=0) reproduces the OP's error.
Heaterpower.xlsx

I have pandas 2.1.4 and openpyxl 3.1.2.

rhshadrach · 2024-08-21T21:40:31Z

On a fresh venv with pandas 2.1.4 and openpyxl 3.1.2 the script

import openpyxl
import pandas as pd

fpath = 'Heaterpower.xlsx'
wb = openpyxl.load_workbook(fpath)
pd.read_excel(fpath, index_col=0)

runs without error for me.

igurin-invn · 2024-08-21T21:44:54Z

Fascinating. Works for me, too.

igurin-invn · 2024-08-21T21:58:23Z

Oh, I'm sorry. I still had the original file in my data folder, and the error was actually coming from that file.

I think you're right that the Labview code running on the tester is somehow generating a malformation in the file that Excel tolerates.

newbie9188 added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Mar 18, 2021

newbie9188 changed the title ~~BUG:~~ BUG: pandas cannot open xlsx with openpyxl engine Mar 18, 2021

rhshadrach added IO Excel read_excel, to_excel and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Mar 19, 2021

newbie9188 closed this as completed Mar 19, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: pandas cannot open xlsx with openpyxl engine #40499

BUG: pandas cannot open xlsx with openpyxl engine #40499

newbie9188 commented Mar 18, 2021

rhshadrach commented Mar 19, 2021 •

edited

Loading

rhshadrach commented Mar 19, 2021

newbie9188 commented Mar 19, 2021

chu4276 commented Apr 8, 2022

agileminor commented Apr 27, 2022

newbie9188 commented Apr 28, 2022

newbie9188 commented Apr 28, 2022

agileminor commented Apr 29, 2022

dkabal14 commented Mar 1, 2024 •

edited

Loading

nicilytop commented Apr 18, 2024

igurin-invn commented Aug 21, 2024

rhshadrach commented Aug 21, 2024

igurin-invn commented Aug 21, 2024

rhshadrach commented Aug 21, 2024

igurin-invn commented Aug 21, 2024 •

edited

Loading

igurin-invn commented Aug 21, 2024

BUG: pandas cannot open xlsx with openpyxl engine #40499

BUG: pandas cannot open xlsx with openpyxl engine #40499

Comments

newbie9188 commented Mar 18, 2021

Code Sample, a copy-pastable example

Problem description

Expected Output

Output of pd.show_versions()

rhshadrach commented Mar 19, 2021 • edited Loading

rhshadrach commented Mar 19, 2021

newbie9188 commented Mar 19, 2021

chu4276 commented Apr 8, 2022

agileminor commented Apr 27, 2022

newbie9188 commented Apr 28, 2022

newbie9188 commented Apr 28, 2022

agileminor commented Apr 29, 2022

dkabal14 commented Mar 1, 2024 • edited Loading

nicilytop commented Apr 18, 2024

igurin-invn commented Aug 21, 2024

rhshadrach commented Aug 21, 2024

igurin-invn commented Aug 21, 2024

rhshadrach commented Aug 21, 2024

igurin-invn commented Aug 21, 2024 • edited Loading

igurin-invn commented Aug 21, 2024

Output of `pd.show_versions()`

rhshadrach commented Mar 19, 2021 •

edited

Loading

dkabal14 commented Mar 1, 2024 •

edited

Loading

igurin-invn commented Aug 21, 2024 •

edited

Loading