Issue with read after write consistency #531
Comments
From looking at the logs more closely, the second command, See example output from the reproducer:
s5cmd is listing objects then exiting 0 without copying.
Seeing the same lack of strong consistency with the script below, in the latest beta v2.1.0-beta.1-3e08061, v2.0.0, and v1.4.0-d7a0dda:
It doesn't happen 100% of the time, but I'd say over 50% of the time. I wrote similar scripts using the AWS CLI and github.com/aws/aws-sdk-go v1.40.25, and they never showed this lack of strong consistency. The file used in testing is 122 KB, and s5cmd behaved the same whether it was a new file or an overwritten file.
So this seems tied to S3 having a fast clock, i.e. we did a copy at 16:27:27, failed a read at 16:27:28, and then succeeded a read at 16:27:33 showing a LastModified time of 16:27:29. s5cmd discarded the object on the initial read due to this logic ignoring future objects: https://github.com/peak/s5cmd/pull/188/files
So when we went back to s5cmd 1.0.0 it worked fine.
Thoughts on a flag to optionally disable ignoring files with future timestamps? e.g. s5cmd --show-files-with-future-lastmodified-dates ls s3://bucket2/my.parquet
https://github.com/peak/s5cmd/blob/v2.1.0-beta.1/storage/s3.go#L208 -> if mod.After(now) && ignoreFutureFiles {
This problem occurs because the clocks differ, as @mwkaufman mentioned. Lines 333 to 337 in 74d21bb
Lines 345 to 349 in 74d21bb
These checks were added for #168. According to AWS, ListObjectsV2 paginates results when there are more than 1000 objects. If you move an object that matches the wildcard pattern and the moved object shows up in a later ListObjectsV2 call, it is copied again because it still matches the pattern, which creates an infinite loop. This happens because s5cmd does not list all the objects before it starts copying; it lists and copies simultaneously. gsutil, by contrast, lists first and then copies, and aws-cli handles wildcards differently from s5cmd.
This is another approach by #567, where we request the time from the header, but
So, what are some possible solutions?
After further research and discussion, we have concluded that the current solution is the best-effort solution at the moment. We are leaning towards adding a flag to disable the future object check when
Version: 2.0.0
I find that files uploaded to S3 are not always immediately available for access.
It was my understanding that this should not happen with strong consistency. Is s5cmd exiting before the PUT request returns?
This tends to fail after a few iterations for me.
Note that a similar variant of this test never fails when file is transferred directly instead of by directory/prefix.