Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(aws-lambda): Refactor AWS initialization #11534

Closed
wants to merge 3 commits into from

Conversation

hanshuebner
Copy link
Contributor

@hanshuebner hanshuebner commented Sep 9, 2023

Summary

#11350 introduced a new plugin callback function init() which would be called after plugin schemas had been initialized. The purpose of the function was to provide a hook in the init phase, as that is the only time in which the full process environment is accessible. The AWS Lambda plugin was changed to use this function to initialize the global AWS configuration, which uses a couple of environment variables, as well as initialize the global AWS instance.

The instance initialization reaches out to the AWS metadata service using an HTTP request, and as that happened during the initialization of the AWS Lambda plugin (which is enabled by default), the AWS metadata request would be made during Kong startup, even if no AWS Lambda plugin was configured or when not running in AWS. This caused a five second startup delay in most non-AWS environments before the metadata request would time out.

This PR changes the initialization of the global AWS instance so that it is done when the AWS Lambda plugin is first used. This means that the first request making use of AWS Lambda will see an additional delay caused by the initialization, but as the metadata endpoint is always local to the host running Kong, while the Lambda environment might be in a different data center and need to bring up the Lambda function, the additional delay is not significant.

As the init() function is no longer needed, it is completely removed again.

Supersedes #11528

Checklist

  • The Pull Request has tests
  • A changelog file has been added to CHANGELOG/unreleased/kong or adding skip-changelog label on PR if unnecessary. README.md
  • There is a user-facing docs PR against https://github.com/Kong/docs.konghq.com - PUT DOCS PR HERE

Full changelog

1bf074ad18 feat(concurrency): add new get_worker_singleton function
40b2d2a780 refactor(aws-lambda): use worker singleton for AWS instance
af75f1e656 refactor(core): remove plugin init() function

Issue reference

KAG-2551

kong/tools/aws.lua Show resolved Hide resolved
kong/tools/aws.lua Outdated Show resolved Hide resolved
@windmgc
Copy link
Member

windmgc commented Sep 12, 2023

Hmm, CI seems failing, did a quick test(with adding some logging) and there is some circular loading inside

2023/09/12 16:37:35 [error] 74871#0: *372 [kong] concurrency.lua:169 [aws-lambda] failed to get singleton: ...azel-bin/build/kong-dev/share/lua/5.1/resty/aws/init.lua:534: ...-bin/build/kong-dev/openresty/lualib/resty/core/base.lua:80: loop or pr
evious error loading module 'resty.aws.credentials.CredentialProviderChain'
stack traceback:
        [C]: in function 'error'
        ...-bin/build/kong-dev/openresty/lualib/resty/core/base.lua:80: in function <...-bin/build/kong-dev/openresty/lualib/resty/core/base.lua:76>
        [C]: in function 'xpcall'
        ...azel-bin/build/kong-dev/share/lua/5.1/resty/aws/init.lua:510: in function 'init'
        ./kong/concurrency.lua:166: in function <./kong/concurrency.lua:164>
        [C]: in function 'pcall'
        ./kong/concurrency.lua:130: in function 'with_coroutine_mutex'
        ./kong/concurrency.lua:159: in function 'tools_aws'
        ./kong/plugins/aws-lambda/handler.lua:53: in function <./kong/plugins/aws-lambda/handler.lua:38>, client: 127.0.0.1, server: kong, request: "GET /get?a=1&b=2 HTTP/1.1", host: "lambda21.com"

@hanshuebner
Copy link
Contributor Author

@windmgc Thanks for pointing at this again. I'll need to talk to Thijs to get some input how we can make late initialization of lua-resty-aws possible. It might need to be changed for that.

@windmgc
Copy link
Member

windmgc commented Sep 13, 2023

@hanshuebner I found a way to work around this - We need to manually call global aws config before require("resty.aws")(), so that the global config is initialized before the require happens. Otherwise, the call chain ProviderChain require xpcall -> global config being used -> utils being used -> global config again will create a loop inside require

@hanshuebner
Copy link
Contributor Author

@windmgc I have been investigating where we can require("resty.aws.utils") because I ran into issues with it (yielding across a C call boundary). The problem is that resty.aws.utils uses .config and that in turn can yield. Consequently, it can only be required on the module top level, and that in turn causes the metadata access.

@Tieske and I discussed this and we're going to fix this by globally replacing the require implementation by one written in Lua. This already happens in busted tests so we believe it is safe to move it to globalpatches and not have to worry about this problem anymore.

@hanshuebner hanshuebner marked this pull request as draft September 13, 2023 08:14
@windmgc
Copy link
Member

windmgc commented Sep 13, 2023

The problem is that resty.aws.utils uses .config and that in turn can yield.

@hanshuebner Hmm, If I understand correctly, the reason of the yield is because that resty.aws.utils tries to access global config and thus triggers the getCurrentRegion, is that right? Since we've decided to postpone the happening of getCurrentRegion to let it happen when the singleton is being used for the first time(it will be in the access phase in the lambda plugin), then we shouldn't require it on the module top level, which means that replacing the require implementation seems not related to the singleton change? Am I missing something here?

@hanshuebner
Copy link
Contributor Author

The problem that we have is that we need to require resty.aws.utils eventually, as it contains the region detection code, and that module must first be required on the top level. If it is required on first use, it will try to yield when querying the metadata service, which is not currently possible from within require other than when Kong is initializing.

A worker singleton is a worker-global variable that is initialized only once.  The
get_worker_singleton function is used to access a worker singleton by name.  It
makes sure that when the variable is not yet initialized and multiple coroutines
try to access it, only one calls the initialization functions and other coroutines
wait.  If get_worker_singleton is called in the init_worker phase, initialization
is performed by the first call without locking as there are no other coroutines yet.
The init() function added in ea85db8 was perceived to have
too little overall value.  It is thus removed.
@hanshuebner hanshuebner force-pushed the refactor/aws-initialization branch 3 times, most recently from 0817f31 to fd95058 Compare September 14, 2023 13:46
@hanshuebner
Copy link
Contributor Author

Superseded by #11614

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants