
Not enough context to root-cause S3 access error #346

Open
philipmw opened this issue Oct 31, 2024 · 9 comments
Labels
A-instrumentation Area: Related to tracing, instrumentation, metrics C-enhancement Category: New feature or request

Comments

@philipmw

I am having a problem configuring rustic with a set of hot and cold S3 buckets. That may be its own bug, as I believe all permissions are properly configured, but the meta-bug that I want to report here is that rustic doesn't give me enough debug information, even at trace log level, to root-cause the issue.

What I want: for rustic to print the S3 response so I can see why S3 is returning a 400 status code.

What I get: just the status code and headers. S3 has a bunch of reasons why it would give a 400 status code, so I need more.

% rustic init --log-level trace
[INFO] using config ./rustic.toml
[DEBUG] (1) opendal::services::s3::backend: backend build started: S3Builder { config: S3Config { root: None, bucket: "pmw-backups-laptop-rustic-cold", endpoint: None, region: None, .. }, .. }
[DEBUG] (1) opendal::services::s3::backend: backend use root /
[DEBUG] (1) opendal::services::s3::backend: backend use bucket pmw-backups-laptop-rustic-cold
[DEBUG] (1) reqsign::aws::config: load_via_profile_config_file failed: No such file or directory (os error 2)
[DEBUG] (1) opendal::services::s3::backend: backend use region: us-west-2
[DEBUG] (1) opendal::services::s3::backend: backend use endpoint: https://s3.us-west-2.amazonaws.com/pmw-backups-laptop-rustic-cold
[TRACE] (1) mio::poll: [/private/tmp/nix-build-rustic-0.9.4.drv-0/rustic-0.9.4-vendor.tar.gz/mio/src/poll.rs:569] registering event source with poller: token=Token(1), interests=READABLE
[DEBUG] (1) opendal::services: service=s3 name=pmw-backups-laptop-rustic-cold : metadata started
[DEBUG] (1) opendal::services: service=s3 name=pmw-backups-laptop-rustic-cold : metadata finished
[DEBUG] (1) opendal::services: service=s3 name=pmw-backups-laptop-rustic-cold : metadata started
[DEBUG] (1) opendal::services: service=s3 name=pmw-backups-laptop-rustic-cold : metadata finished
[DEBUG] (1) opendal::services: service=s3 name=pmw-backups-laptop-rustic-cold : metadata started
[DEBUG] (1) opendal::services: service=s3 name=pmw-backups-laptop-rustic-cold : metadata finished
[DEBUG] (1) opendal::services::s3::backend: backend build started: S3Builder { config: S3Config { root: None, bucket: "pmw-backups-laptop-rustic-hot", endpoint: None, region: None, .. }, .. }
[DEBUG] (1) opendal::services::s3::backend: backend use root /
[DEBUG] (1) opendal::services::s3::backend: backend use bucket pmw-backups-laptop-rustic-hot
[DEBUG] (1) reqsign::aws::config: load_via_profile_config_file failed: No such file or directory (os error 2)
[DEBUG] (1) opendal::services::s3::backend: backend use region: us-west-2
[DEBUG] (1) opendal::services::s3::backend: backend use endpoint: https://s3.us-west-2.amazonaws.com/pmw-backups-laptop-rustic-hot
[DEBUG] (1) opendal::services: service=s3 name=pmw-backups-laptop-rustic-hot : metadata started
[DEBUG] (1) opendal::services: service=s3 name=pmw-backups-laptop-rustic-hot : metadata finished
[DEBUG] (1) opendal::services: service=s3 name=pmw-backups-laptop-rustic-hot : metadata started
[DEBUG] (1) opendal::services: service=s3 name=pmw-backups-laptop-rustic-hot : metadata finished
[DEBUG] (1) opendal::services: service=s3 name=pmw-backups-laptop-rustic-hot : metadata started
[DEBUG] (1) opendal::services: service=s3 name=pmw-backups-laptop-rustic-hot : metadata finished
[DEBUG] (1) opendal::services: service=s3 name=pmw-backups-laptop-rustic-cold : metadata started
[DEBUG] (1) opendal::services: service=s3 name=pmw-backups-laptop-rustic-cold : metadata finished
[DEBUG] (1) opendal::services: service=s3 name=pmw-backups-laptop-rustic-hot : metadata started
[DEBUG] (1) opendal::services: service=s3 name=pmw-backups-laptop-rustic-hot : metadata finished
[TRACE] (1) rustic_backend::opendal: [/private/tmp/nix-build-rustic-0.9.4.drv-0/rustic-0.9.4-vendor.tar.gz/rustic_backend/src/opendal.rs:187] listing tpe: Config
[DEBUG] (1) opendal::services: service=s3 name=pmw-backups-laptop-rustic-cold path=config: stat started
[DEBUG] (1) reqsign::aws::v4: calculated scope: 20241030/us-west-2/s3/aws4_request
[DEBUG] (1) reqsign::aws::v4: calculated string to sign: AWS4-HMAC-SHA256
20241030T235504Z
20241030/us-west-2/s3/aws4_request
9df280dec8161b026ac8e1bea92c72e1df19c07f80c33cef7bbc1b230dbdb4e7
[TRACE] (1) hyper_util::client::legacy::pool: [/private/tmp/nix-build-rustic-0.9.4.drv-0/rustic-0.9.4-vendor.tar.gz/hyper-util/src/client/legacy/pool.rs:671] checkout waiting for idle connection: ("https", s3.us-west-2.amazonaws.com)
[DEBUG] (1) reqwest::connect: starting new connection: https://s3.us-west-2.amazonaws.com/
[TRACE] (1) hyper_util::client::legacy::connect::http: [/private/tmp/nix-build-rustic-0.9.4.drv-0/rustic-0.9.4-vendor.tar.gz/hyper-util/src/client/legacy/connect/http.rs:372] Http::connect; scheme=Some("https"), host=Some("s3.us-west-2.amazonaws.com"), port=None
[DEBUG] (1) hyper_util::client::legacy::connect::dns: resolve; host=s3.us-west-2.amazonaws.com
[TRACE] (1) tracing::span: [/private/tmp/nix-build-rustic-0.9.4.drv-0/rustic-0.9.4-vendor.tar.gz/hyper-util/src/client/legacy/connect/dns.rs:121] -- resolve;
[DEBUG] (14) hyper_util::client::legacy::connect::dns: resolving host="s3.us-west-2.amazonaws.com"
[DEBUG] (1) hyper_util::client::legacy::connect::http: connecting to 52.92.153.192:443
[TRACE] (1) mio::poll: [/private/tmp/nix-build-rustic-0.9.4.drv-0/rustic-0.9.4-vendor.tar.gz/mio/src/poll.rs:569] registering event source with poller: token=Token(140479080479232), interests=READABLE | WRITABLE
[DEBUG] (1) hyper_util::client::legacy::connect::http: connected to 52.92.153.192:443
[TRACE] (1) hyper_util::client::legacy::client: [/private/tmp/nix-build-rustic-0.9.4.drv-0/rustic-0.9.4-vendor.tar.gz/hyper-util/src/client/legacy/client.rs:592] http1 handshake complete, spawning background dispatcher task
[TRACE] (1) hyper_util::client::legacy::pool: [/private/tmp/nix-build-rustic-0.9.4.drv-0/rustic-0.9.4-vendor.tar.gz/hyper-util/src/client/legacy/pool.rs:713] checkout dropped for ("https", s3.us-west-2.amazonaws.com)
[TRACE] (4) mio::poll: [/private/tmp/nix-build-rustic-0.9.4.drv-0/rustic-0.9.4-vendor.tar.gz/mio/src/poll.rs:700] deregistering event source from poller
[ERROR] service=s3 name=pmw-backups-laptop-rustic-cold path=config: stat failed Unexpected (persistent) at stat

Context:
   uri: https://s3.us-west-2.amazonaws.com/pmw-backups-laptop-rustic-cold/config
   response: Parts { status: 400, version: HTTP/1.1, headers: {"x-amz-request-id": "FVWGD70YSD2FB1EF", "x-amz-id-2": "wnJtc3uryZlrkr99gmE1z1YUOEFvLz1uin6C2do8Em+tn61Q4+l/jX7y7nCPYO0728mpDCyFR5Q=", "content-type": "application/xml", "date": "Wed, 30 Oct 2024 23:55:04 GMT", "server": "AmazonS3", "connection": "close"} }
   service: s3
   path: config

[DEBUG] (1) opendal::services: service=s3 name=pmw-backups-laptop-rustic-cold path=config: stat failed
error: error listing the repo config file

My request: print the full response from S3 when the log level is trace or even debug.

This is what the AWS CLI does: when verbose output is enabled, I see the full request and response payload.
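To illustrate what "printing the full response" would buy: for responses that carry a body, S3 reports error details in a small XML document. Below is a minimal, stdlib-only sketch of extracting those fields. The XML here is a hypothetical example following S3's documented error format (reusing the request ID from the log above), not an actual response captured from this issue.

```python
# Sketch: what surfacing an S3 error body could look like.
# SAMPLE_ERROR_BODY is a hypothetical example of S3's error XML
# format; a real HEAD response carries no body at all.
import xml.etree.ElementTree as ET

SAMPLE_ERROR_BODY = """<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>InvalidRequest</Code>
  <Message>Missing required header for this request.</Message>
  <RequestId>FVWGD70YSD2FB1EF</RequestId>
</Error>"""

def summarize_s3_error(body: str) -> dict:
    """Extract the fields S3 puts in an error document."""
    root = ET.fromstring(body)
    return {child.tag: child.text for child in root}

print(summarize_s3_error(SAMPLE_ERROR_BODY))
```

Logging this body (when one exists) alongside the status code and headers would give exactly the `Code`/`Message` pair that makes 400-class errors diagnosable.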

@github-actions github-actions bot added the S-triage Status: Waiting for a maintainer to triage this issue/PR label Oct 31, 2024
@simonsan simonsan transferred this issue from rustic-rs/rustic Oct 31, 2024
@simonsan simonsan added A-meta Area: Project wide C-enhancement Category: New feature or request A-instrumentation Area: Related to tracing, instrumentation, metrics and removed S-triage Status: Waiting for a maintainer to triage this issue/PR A-meta Area: Project wide labels Oct 31, 2024
@Xuanwo

Xuanwo commented Nov 16, 2024

Hi, stat is a HEAD request, which doesn't have a body, so we don't have additional information to return here.

It's very weird that HEAD can return a 400 response since the request seems correct to me. Could you share more details about your setup? If you are using a static key, please double-check if the access key and secret key are configured correctly.

@philipmw
Author

philipmw commented Nov 16, 2024

Hello! I didn't realize that this is a HEAD request and that there's no more info to show me.

Yes, this problem has been perplexing me. I quadruple checked the access key, and it is the same bucket config that I use for restic that works. (Though I have not tried restic with this bucket.)

Here is my full rustic.toml:

[backup]
glob-files = ["/Users/pmw/Documents/laptop/backups/includes.dat"]

[repository]
repository = "opendal:s3"
repo-hot = "opendal:s3"
password-file = "/Users/pmw/Documents/laptop/backups/restic-repo-password.dat"

[repository.options]
access_key_id = "AKIA..."
secret_access_key = "..."

[repository.options-cold]
bucket = "pmw-backups-laptop-rustic-cold"
default_storage_class = "DEEP_ARCHIVE"

[repository.options-hot]
bucket = "pmw-backups-laptop-rustic-hot"

@Xuanwo

Xuanwo commented Nov 16, 2024

Hi, @philipmw, thank you for providing more information on this. It's really helpful.

I now believe it's related to the DEEP_ARCHIVE storage class, which needs to be restored before it can be read. Calling stat (also known as HeadObject) on an archived file may return a 400 error. Please refer to https://docs.aws.amazon.com/AmazonS3/latest/API/API_RestoreObject.html for more details.

To verify my answer, please try using s3cmd to send a HeadObject or GetObject request to the same object.


cc @simonsan and @aawsome, perhaps we should consider handling config differently by not storing it as DEEP_ARCHIVE.

@philipmw
Author

Thanks. The command I am running is rustic init, and both buckets are newly created and empty. I don't know what archived objects rustic is trying to stat or what I can do to get around it.

@Xuanwo

Xuanwo commented Nov 16, 2024

> Thanks. The command I am running is rustic init, and both buckets are newly created and empty. I don't know what archived objects rustic is trying to stat or what I can do to get around it.

Hi, by setting default_storage_class = "DEEP_ARCHIVE", all files written in pmw-backups-laptop-rustic-cold will be in DEEP_ARCHIVE storage class.

@philipmw
Author

Is this not what I should be doing? I do want the -cold bucket to be in Deep Archive, and my understanding is that rustic supports this. An example of someone else configuring Deep Archive: https://kmh.prasil.info/posts/rustic-cold-storage-glacier-migration-configuration/

What's the right way to use Deep Archive with rustic?

@aawsome
Member

aawsome commented Nov 16, 2024

Actually rustic should write the config file into the cold storage location as DEEP_ARCHIVE, if it is specified like that.
While rustic will not read any file/object in the cold storage, it does need to list its contents. For config, it uses Operator::exists, which seems to call stat internally. I wonder why this would return a 400 error.

@philipmw Did you ensure you have s3:GetObject and s3:ListBucket permissions on your S3 bucket? If the error really is 400 Bad Request, the AWS documentation says that only occurs when using encryption with wrong encryption request headers. Can you double-check this?

@aawsome
Member

aawsome commented Nov 16, 2024

Another test to rule out any permission issues would be to run the same config without default_storage_class and see whether the error persists...

@philipmw
Author

philipmw commented Dec 9, 2024

After a long vacation, I am back and spent some more time on this.

I figured out the cause of that 400 error: the access_key_id and secret_access_key defined in my rustic.toml were being ignored. I found this out by looking at the IAM console and seeing that the key's "last accessed" timestamp never updated when I tried to initialize the repository. Commenting out those credentials changed nothing, so they were being ignored; rustic must have been picking up some irrelevant credentials from my environment instead.

I fixed this by adding disable_config_load = "true" to my repository options. This (correctly) caused rustic to fail when my credentials were commented out. On uncommenting the credentials, rustic started working!
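For reference, the fix described above amounts to one added line in the [repository.options] section of the rustic.toml posted earlier (a sketch of the section after the change, with the same redacted credentials):

```toml
[repository.options]
# Use only the credentials given here; do not load credentials
# from the environment or AWS config files.
disable_config_load = "true"
access_key_id = "AKIA..."
secret_access_key = "..."
```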

I believe this issue is still relevant, as I had a heck of a time diagnosing the failure. I assumed that the credentials I defined were being used. What would have helped is printing the access key ID and secret access key, at least in debug log-level. How do y'all feel about that?
