Parallelize downloading objects #62

Open
krisis opened this issue Oct 3, 2017 · 5 comments

Comments

@krisis
Member

krisis commented Oct 3, 2017

[Inspired by https://github.com/minio/minio-go/issues/797]

Downloading an object can be parallelized by issuing multiple GET requests with Range headers. This can significantly improve download performance for large files.

The strategy can be as follows:

  1. Perform a HEAD request on the object to retrieve its size and ETag.
  2. Choose a suitable split size, and build the list of Range headers to be used in subsequent requests.
  3. Perform the GET requests with the Range headers and write each response body to the corresponding offset in the on-disk file. In each GET request, set the If-Match header to the ETag retrieved in step 1, so that any change to the object during the download is detected.
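
Steps 2 and 3 can be sketched in Haskell as follows. This is only a minimal illustration, not code from minio-hs: the names `ByteRange`, `partRanges`, `rangeHeader` and `partHeaders` are hypothetical, and the ETag is assumed to be passed through exactly as the HEAD response returned it (including its quotes).

```haskell
import Data.Int (Int64)

-- | Inclusive byte range (first byte, last byte), as used by the HTTP Range header.
-- Hypothetical type for illustration only.
type ByteRange = (Int64, Int64)

-- | Split an object of the given size into ranges of at most 'splitSize' bytes.
-- Returns an empty list for a non-positive split size or object size.
partRanges :: Int64 -> Int64 -> [ByteRange]
partRanges splitSize objectSize
  | splitSize <= 0 || objectSize <= 0 = []
  | otherwise =
      [ (start, min (start + splitSize - 1) (objectSize - 1))
      | start <- [0, splitSize .. objectSize - 1]
      ]

-- | Render the value of the Range header for one part.
rangeHeader :: ByteRange -> String
rangeHeader (firstByte, lastByte) =
  "bytes=" ++ show firstByte ++ "-" ++ show lastByte

-- | Headers for one ranged GET: the Range itself, plus If-Match carrying the
-- ETag from the HEAD request so that a changed object is rejected by the server.
partHeaders :: String -> ByteRange -> [(String, String)]
partHeaders etag range =
  [ ("Range", rangeHeader range)
  , ("If-Match", etag)
  ]
```

For example, `partRanges 4 10` yields `[(0,3),(4,7),(8,9)]`. The first component of each range is also the offset at which that part would be written into the on-disk file.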
@Megamiun

Hello, I have never worked with Haskell and am thinking of taking on this issue. Before that, how complicated is the project setup for a Linux user who has played with Haskell a little but doesn't have much experience?

@krisis
Member Author

krisis commented Oct 11, 2017

@Megamiun The instructions for setting up a development environment are the same as those for installation. See https://github.com/minio/minio-hs/blob/master/README.md#installation. You can join us on https://slack.minio.io/ if you have further questions.

@Megamiun

Oh, thanks, sorry for the delay. I was studying other things and didn't have much time.

Can I ask some questions? How can I determine a suitable split size? Also, I went to the minio-go page to check their implementation and saw that the issue is blocked. Do we still want this feature even so?

@donatello
Member

@Megamiun You can just use a constant split size of 64 MiB to begin with. The feature will be useful to speed up downloads (it is blocked in minio-go for some other reason, and should be unblocked eventually).
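
To make the constant concrete, here is a small sketch of how many ranged GET requests a 64 MiB split size implies for a given object size. The names `splitSize` and `partCount` are hypothetical and do not come from the minio-hs code base.

```haskell
import Data.Int (Int64)

-- | Constant split size suggested above: 64 MiB, in bytes.
splitSize :: Int64
splitSize = 64 * 1024 * 1024

-- | Number of ranged GET requests needed for an object of the given size,
-- computed by ceiling division. Hypothetical helper for illustration.
partCount :: Int64 -> Int64
partCount objectSize
  | objectSize <= 0 = 0
  | otherwise = (objectSize + splitSize - 1) `div` splitSize
```

With this constant, a 1 GiB object splits into 16 parts and a 200 MiB object into 4 parts, the last of which is shorter than the others.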

In the current code, we would like to replace fGetObject with a parallelized version. A function to perform tasks in parallel with a fixed number of threads is already available in the code base; see limitedConcurrently.
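
For readers unfamiliar with the idea, the sketch below shows one way to run a list of tasks with a bounded number running at once, using only the `base` library. It is not the library's limitedConcurrently, whose signature is not shown in this thread; `boundedMapConcurrently` and `trySome` are hypothetical names, and the limit is assumed to be at least 1.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.QSem (QSem, newQSem, signalQSem, waitQSem)
import Control.Exception (SomeException, bracket_, throwIO, try)

-- | 'try' specialised to 'SomeException', to keep the types unambiguous.
trySome :: IO b -> IO (Either SomeException b)
trySome = try

-- | Apply an action to every input, with at most @limit@ actions running at
-- the same time. Results are returned in input order. Hypothetical helper;
-- assumes @limit >= 1@.
boundedMapConcurrently :: Int -> (a -> IO b) -> [a] -> IO [b]
boundedMapConcurrently limit action inputs = do
  sem <- newQSem limit
  -- Fork one thread per input; the semaphore caps how many run the action.
  vars <- mapM (spawn sem) inputs
  -- Wait for every thread to finish, then rethrow the first failure
  -- (in input order) if there was one.
  outcomes <- mapM takeMVar vars
  mapM (either throwIO pure) outcomes
  where
    spawn sem input = do
      var <- newEmptyMVar
      _ <- forkIO $ do
        -- bracket_ releases the semaphore slot even if the action throws.
        outcome <- trySome (bracket_ (waitQSem sem) (signalQSem sem) (action input))
        putMVar var outcome
      pure var
```

In a parallel download, the inputs would be the byte ranges of the object and the action would fetch one range and write it at its offset in the file, for instance `boundedMapConcurrently 4 downloadPart ranges` with a hypothetical `downloadPart`.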

@harshavardhana
Member

Removed the hacktoberfest GitHub tag.
