Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add aws sts library to support IRSA for logging #263

Closed
wants to merge 6 commits into from

Conversation

msaffitz
Copy link
Contributor

What / How

Adds libs.aws.java.sdk.sts to the classpath to expose WebIdentityTokenFileCredentialsProvider for use with IRSA (IAM Roles for Service Accounts) for logging to S3. This (hopefully!) will enable use of S3 without providing keys. The docs on this for v1 of the SDK are little ambiguous; the v2 docs are much clearer. See also the discussion here

This builds on the previous work done in #7766 and supersedes #177.

Can this PR be safely reverted / rolled back?

  • YES 💚
  • NO ❌

🚨 User Impact 🚨

No known user impact.

@msaffitz
Copy link
Contributor Author

@bothra90 -- FYI. You mentioned here #177 (comment) that you had to make some changes and were able to get things working. Does this look right to you? Did I miss anything?

@msaffitz
Copy link
Contributor Author

cc @pmossman -- Thanks for the help with the previous PR; FYI for this follow on.

@bothra90
Copy link

@msaffitz: Here's a git patch for what I was able to get working: https://gist.github.com/bothra90/62403b32449c90807ee4b8ec9c14f862. Your PR is in particular missing the necessary changes to log4j2.xml.

@msaffitz
Copy link
Contributor Author

Thanks @bothra90 . With removing the values from log4j2.xml, did you confirm if logging still works when using AWS keys? I wasn't sure about that part.

(I'm curious about the error you saw with leaving the values in the XML file -- I'd think the null / empty string values here would both be ok? Unfortunately I don't have a good way to test this locally at the moment as our AWS stuff is only configured to pull from official sources).

@bothra90
Copy link

Yes, I tested that logging works with the values removed.

@msaffitz
Copy link
Contributor Author

Awesome, thanks for confirming. I'll update this PR this evening with those additional changes.

@weichunnn
Copy link

Thanks @msaffitz @bothra90 - looking forward to this PR getting landed into the main codebase 👀

@@ -12,8 +12,6 @@

<Property name="s3-bucket">${sys:S3_LOG_BUCKET:-${env:S3_LOG_BUCKET}}</Property>
<Property name="s3-region">${sys:S3_LOG_BUCKET_REGION:-${env:S3_LOG_BUCKET_REGION}}</Property>
<Property name="s3-aws-key">${sys:AWS_ACCESS_KEY_ID:-${env:AWS_ACCESS_KEY_ID}}</Property>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@msaffitz can you explain why the removal of these lines are necessary? What happens if you introduce the sts library without removing these?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not actually sure; I had the same question above (#263 (comment)) and @bothra90 indicated that removing them was necessary to get things functioning. @bothra90 -- do you have more context on this?

Copy link

@bothra90 bothra90 Aug 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was unable to create a source from the console and seeing the following error log in airbyte-server process:

Cannot end publish with com.van.logging.aws.S3PublishHelper@53f4b9a9 due to error:
Cannot end publishing: Cannot publish to S3: The AWS Access Key Id you provided does not exist in our records. (Service: Amazon S3; Status Code: 403; Error Code:
InvalidAccessKeyId; Request ID: TY5JJZ76SMD9BA86; S3 Extended Request ID: WEK4bCkgavHUqbgoywsSNVMwBt14/ke89r9mC/usS879oBQ1MYNw+S/2urb3xedmfWxWBpfj90TMLqnYL+hCfw==; Proxy: null)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I discussed this internally with some folks that have more familiarity around our Log4J2 setup. Simply removing these properties won't work -- we use an underlying log4j-appender implementation that will need to support IRSA.

We opened an issue for this in that project so we'll wait and see what the maintainer says.

Copy link

@bothra90 bothra90 Aug 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, it did the job for me. Happy to jump on a quick Slack huddle and demo it to you or @davinchia

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears that the underlying appender library that I linked above actually supports roles by default. The s3-aws-key and s3-aws-secret properties are optional, for situations where role-based credentials aren't available. That said, I'm concerned that if we simply delete these properties from the xml configuration, we could break Airbyte installations that are using those properties instead of roles.

An ideal solution would be backwards compatible for such installations, though I'm not sure what that would look like, since it seems like as soon as these tags are defined, that appender will try to use them, even if the value is blank

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the concern. Ultimately though, the passed parameters get used here and if the s3AwsSecret and s3AwsKey are not provided, the library will use the default AWS providers chain which will correctly use AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. This is how these variables in the configuration are populated in the first place, so I don't suspect this would be an issue.

Having said that, I can try to test this out on my end and make sure nothing breaks.

Copy link

@weichunnn weichunnn Aug 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seconded @bothra90's comment. Given that Airbyte is already following the standard naming conversion for storing AWS keys which is how it's currently being passed as environment variables, I also don't see any issue with letting the DefaultAWSCredentialsProviderChain handle it instead of being passed explicitly.

Using the ProviderChain would be backwards compatible with existing setup as any existing or new credentials will just be picked up according to their priority - in this case ENV first followed by roles / instance profile

Reference: https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/DefaultAWSCredentialsProviderChain.html

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bothra90 @weichunnn thanks for pointing this out, and apologies for the slow-moving back and forth here. I agree that it seems like these XML properties look unnecessary. @bothra90 if you're able to test and verify that this works as expected on your end, I'd love to see your results.

Otherwise, I'll open this PR against our internal repo where I can run our full suite of tests. Thanks for bearing with me here!

Copy link

@weichunnn weichunnn Aug 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pmossman! Checking in to see if there are any updates on the review internally? 👀

@pmossman
Copy link
Contributor

pmossman commented Aug 9, 2023

/create-oss-pr

@pmossman
Copy link
Contributor

@msaffitz @bothra90 @weichunnn I was able to test these changes with accessKey/secretKey configuration and verified it's all working as expected. The internal PR is merged, so I am going to close this PR. Thank you all for your engagement and contribution!

@pmossman pmossman closed this Aug 19, 2023
@msaffitz msaffitz deleted the fix_irsa_logging branch August 22, 2023 18:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants