-
Notifications
You must be signed in to change notification settings - Fork 362
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consolidate block fetch requests. #1733
Conversation
Would you mind explaining more completely what this does? It also makes that block of code look more complex (to me), so some comments inline would be appreciated. |
Python's groupby works by splitting based on a change in the value of a key function. The chosen key computes the difference between an ascending range and the block numbers (which are also necessarily sorted). If the that difference changes, we know that there was a "hole" in the block numbers (ie, we already have the block cached). Assume a file of 5 blocks,
I'll add an explanation via comments. |
You might be interested, that something similar happens in https://github.com/fsspec/filesystem_spec/blob/master/fsspec/utils.py#L528 (where we don't have blocks, but we do have byte ranges and possible gaps). |
@martindurant Thank you for pointing my attention to that function. I'm not sure if the approach here is directly applicable there. I'll have to spend more time reasoning thru it. |
@martindurant Would you prefer that I instead use |
Only if it's obvious how. I am not in a good position to make that choice, so long as you have considered the possibility. |
@martindurant I played around with an implementation and observed that |
Implement the TODO to consolidate consecutive block fetch requests into a single fetch request.
Adapted from the itertools example found here: https://docs.python.org/2.6/library/itertools.html#examples