ENH: Having pandas.read_excel FASTER (with an available proof of concept) #47290
Comments
If I understand correctly, there are two topics here:
For the first question I'd suggest I'm a bit curious how you came across the scenario. You want to load a small portion of the data, and you're unhappy with the performance. So I hope this isn't a silly question, but are you by any chance calling
Thanks for the quick response and your feedback.
Yes, I think that a pull request that
would be welcome. I'd suggest that you compare the speed of the main branch and your feature branch in at least three cases:
and ideally for all four backends: xls, xlsx, xlsb, ods. For example, #46894 was the recent change to improve case #2. A more experienced contributor than me would have to judge if the performance improvement is big enough to justify any increase in code complexity. I've found the devs to be quite helpful for us newer contributors.
Thanks for the issue, but it appears this hasn't gotten traction in a while, so closing.
Is your feature request related to a problem?
I wish pandas read_excel could be faster.
Describe the solution you'd like
pandas.read_excel should get faster if it uses the engines' iterators. For these tests I created a new parameter, offset, that is passed all the way down to the engines' iterator functions.
API breaking implications
It should not break the actual API.
Describe alternatives you've considered
For now, this is the only way I have found.
Additional context
Basically, offset keeps the column names and can be used like this:
LIVE DEMO
BENCHMARKS
The benchmarks were run with:
OUTPUT
I made a proof of concept for this update, available here: https://github.com/Sanix-Darker/pandas/pull/1
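To illustrate the general idea independently of the proof of concept, here is a minimal sketch in plain Python. It is not the code from the linked pull request and it does not use pandas internals: read_rows and fake_sheet are hypothetical names, and the "sheet" is just a generator standing in for an engine's row iterator. It only shows why an offset applied at the iterator level can keep the header row while avoiding reading rows beyond the requested window.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Optional


def read_rows(
    rows: Iterable[List[object]],
    offset: int = 0,
    nrows: Optional[int] = None,
) -> Iterator[List[object]]:
    """Yield the header row, skip `offset` data rows, then yield up to `nrows` rows.

    Hypothetical helper: `rows` stands in for a lazy engine-level row iterator.
    """
    it = iter(rows)
    try:
        header = next(it)  # always keep the column names
    except StopIteration:
        return  # empty sheet: nothing to yield
    yield header
    stop = None if nrows is None else offset + nrows
    # islice pulls rows lazily and stops consuming once `stop` is reached
    yield from islice(it, offset, stop)


def fake_sheet(n: int, consumed: List[int]) -> Iterator[List[object]]:
    """Hypothetical stand-in for a worksheet; records which rows were actually read."""
    for i in range(n):
        consumed.append(i)
        yield ["id", "value"] if i == 0 else [i, i * 10]


consumed: List[int] = []
window = list(read_rows(fake_sheet(1000, consumed), offset=2, nrows=3))
print(window)         # [['id', 'value'], [3, 30], [4, 40], [5, 50]]
print(len(consumed))  # 6 rows pulled from the source, not 1000
```

Whether a real engine benefits as much depends on how cheaply its iterator can skip rows, which this sketch does not measure.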