Change us-west-2 check, as only working in EC2 instances #444

Open
sabinehelm opened this issue Feb 5, 2024 · 19 comments
Labels
type: enhancement New feature or request

Comments

@sabinehelm

In pull request #424, API access to store._running_in_us_west_2() was added in the form of a printed statement telling the user whether or not they are running in AWS region us-west-2.
Unfortunately, the check in store._running_in_us_west_2() only works for EC2 instances. It does not work, for example, for ECS (Elastic Container Service) instances running in region us-west-2. The big question is: is it intended that only EC2 instances in region us-west-2 can access the data, or can it be any compute instance running in region us-west-2?

In the second case, store._running_in_us_west_2() could be adapted to check the region using boto3, following the code snippet in issue #231:

import boto3

if boto3.client('s3').meta.region_name == 'us-west-2':
    return True
else:
    # The original snippet also had `return False` here, which is unreachable after the raise.
    raise ValueError(
        'Your notebook is not running inside the AWS us-west-2 region, '
        'and will not be able to directly access NASA Earthdata S3 buckets'
    )
@betolink
Member

betolink commented Feb 5, 2024

The check is intended to verify in-region execution, but it shouldn't be limited to EC2. I think your change would be a valid PR if it works the same way in EC2!

@mfisher87 mfisher87 added the type: enhancement New feature or request label Feb 6, 2024
@sabinehelm
Author

Thanks @betolink. This is great to hear!
I did not use exactly the same code snippet as given in #231; instead we used the following to get the current region, which should also work from EC2:

import boto3

my_session = boto3.session.Session()
my_region = my_session.region_name

@JessicaS11
Collaborator

@sabinehelm Thanks for sharing your updated solution. We worked on this a bunch today and decided to try and use botocore directly:
botocore.session.get_session().get_config_variable("region")

Can you confirm whether or not this will work for your use case?

@jhkennedy
Collaborator

jhkennedy commented Feb 6, 2024

I don't think boto3/botocore is going to do what you want -- namely, determine which region you're actually running in.

Boto is going to pull the session information from your AWS config or AWS_* environment variables, so it's really checking which region you're configured to access (which APIs to hit), not which region you're actually running in.

For example, on my laptop:

>>> import botocore.session
>>> botocore.session.get_session().get_config_variable("region")
'us-west-2'

because I have my default region in my ~/.aws/config set up like so:

[default]
region = us-west-2

Likewise, if I instead do:

>>> import os
>>> import botocore.session
>>> os.environ['AWS_DEFAULT_REGION'] = 'us-east-1'
>>> botocore.session.get_session().get_config_variable("region")
'us-east-1'

AFAIK, from an EC2 instance, the only way to determine what region the AWS instance is running in is to hit this special IP address:

http://169.254.169.254/latest/meta-data/placement/region

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html#instance-metadata-ex-1

And you'll need to handle IMDSv1 or IMDSv2 metadata requests -- see this answer on SO:
https://stackoverflow.com/a/77902397
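
For reference, here's a minimal sketch of that lookup (not earthaccess code; the function name and the use of the requests library are my own), trying IMDSv2 first and falling back to IMDSv1:

import requests

IMDS_BASE = "http://169.254.169.254/latest"

def get_ec2_region(timeout: float = 1.0) -> str | None:
    # IMDSv2: fetch a short-lived session token first; if that fails,
    # fall back to a token-less IMDSv1 request.
    try:
        token = requests.put(
            f"{IMDS_BASE}/api/token",
            headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
            timeout=timeout,
        ).text
        headers = {"X-aws-ec2-metadata-token": token}
    except requests.RequestException:
        headers = {}
    try:
        response = requests.get(
            f"{IMDS_BASE}/meta-data/placement/region",
            headers=headers,
            timeout=timeout,
        )
        response.raise_for_status()
        return response.text
    except requests.RequestException:
        return None  # not on EC2, or metadata access is blocked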

This should work on ECS configured for EC2, but I don't know if it works on ECS configured for Fargate (I suspect it will though).

@jhkennedy
Collaborator

It looks like the AWS_REGION environment variable is set on Fargate, so the botocore method should work there. But it's not robust: it's just checking environment variables, which are mutable and are used primarily to select which APIs to interact with, not to determine what region you're running in.
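
In other words, something like this (just an environment-variable read, so it can be spoofed or misconfigured):

import os

# Fargate sets AWS_REGION in the task environment; this only tells us what
# the environment claims, not where the code is actually running.
region = os.environ.get("AWS_REGION")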

@jhkennedy
Collaborator

On ECS, here's a summary of available metadata:
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-metadata-endpoint.html

So, determining this is non-trivial and requires knowing a bit about the service you're running in.
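
For illustration, a sketch of what an ECS lookup might look like (assumptions: the v4 task metadata endpoint is advertised via the ECS_CONTAINER_METADATA_URI_V4 environment variable, and the region can be parsed from the task ARN; the function name is made up):

import os
import requests

def get_ecs_task_region(timeout: float = 1.0) -> str | None:
    # ECS (both EC2 and Fargate launch types) exposes a per-task metadata
    # endpoint via this environment variable.
    base = os.environ.get("ECS_CONTAINER_METADATA_URI_V4")
    if not base:
        return None
    try:
        task = requests.get(f"{base}/task", timeout=timeout).json()
        # Task ARNs look like arn:aws:ecs:<region>:<account>:task/...,
        # so the region is the fourth colon-separated field.
        return task["TaskARN"].split(":")[3]
    except (requests.RequestException, KeyError, IndexError, ValueError):
        return None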

@sabinehelm
Author

@sabinehelm Thanks for sharing your updated solution. We worked on this a bunch today and decided to try and use botocore directly: botocore.session.get_session().get_config_variable("region")

Can you confirm whether or not this will work for your use case?

@JessicaS11: Thanks for your response. I tested the code snippet, and it would work for our use case. But I fear it is already outdated given the last comments.

@JessicaS11
Collaborator

Thanks @sabinehelm, and agreed!

@jhkennedy, can you comment on what you'd recommend for #231 and #424? If I understand what you're saying correctly, our current implementation in #424 would not properly keep the user from running a notebook out of region (or tell us if someone really is working out of region), depending on how their config parameters are set.

@jhkennedy
Collaborator

jhkennedy commented Feb 8, 2024

If I understand what you're saying correctly, our current implementation in #424 would not properly keep the user from running a notebook out of region (or tell us if someone really is working out of region), depending on how their config parameters are set.

Yes, correct. The current implementation will determine which AWS API you're configured to hit. That could be something the user manually configured or the one set by a service, but since it's all via config files or environment variables, there's no good way of knowing which.

@jhkennedy, can you comment on what you'd recommend for #231 and #424?

@JessicaS11 I'm not sure -- this is hard.

So, I'll start by answering this question:

Was the spirit to check specifically for us-west-2, or to enable the user to see what region they are running in?

The answer should probably be yes to both, or at least to confirm what region they are in.

So, overall, I think I'd recommend:

  1. creating an earthaccess.aws module with:
     1. get_ec2_region, which has most of the current implementation inside _running_in_us_west_2:
        https://github.com/nsidc/earthaccess/blob/main/earthaccess/store.py#L146-L155
     2. get_fargate_region, which would need to do the same, but for Fargate, as detailed here:
        https://docs.aws.amazon.com/AmazonECS/latest/developerguide/fargate-metadata.html
     3. (maybe?) get_ecs_container_region, as detailed here:
        https://docs.aws.amazon.com/AmazonECS/latest/developerguide/container-metadata.html
     4. (maybe) get_config_region, which would use boto3/botocore as in #424 (add us-west-2 check to API and improve auth repr)
     5. and then a get_region (get_running_region?) method that'd go through the above 4 in order and return the first one that didn't raise (or return None, depending on implementation). If it fell through to 4, I'd probably also throw a warning like "Region inferred from AWS config or environment variables and may not represent the region you're running in."

Then you could also have a convenience method like

def earthaccess.aws.ensure_in_region(region: str = 'us-west-2') -> bool:

on top of it and use it in the Earthaccess Store (a rough sketch of how these pieces could fit together is below).

Note: Actual method/function names could be improved
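
For concreteness, here's a rough, illustrative sketch of how these pieces could fit together (all names are placeholders; get_ec2_region and get_ecs_task_region stand in for the per-service metadata lookups sketched in the comments above and are not defined here):

import warnings

def get_config_region() -> str | None:
    # Whatever boto3/botocore is configured to talk to (config file or
    # AWS_* environment variables) -- not necessarily where we're running.
    import botocore.session
    return botocore.session.get_session().get_config_variable("region")

def get_region() -> str | None:
    # Try the service metadata lookups first; fall back to the config-based
    # region only as a last resort, with a warning.
    for check in (get_ec2_region, get_ecs_task_region):
        try:
            region = check()
        except Exception:
            region = None
        if region:
            return region
    region = get_config_region()
    if region:
        warnings.warn(
            "Region inferred from AWS config or environment variables and "
            "may not represent the region you're running in."
        )
    return region

def ensure_in_region(region: str = "us-west-2") -> bool:
    return get_region() == region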


All that said, this seems like something that should already exist so it'd be worth spending some time searching GitHub/PyPI... I don't see anything though, oddly.

@jhkennedy
Collaborator

For EC2, it might be worth just using this package:
https://github.com/adamchainz/ec2-metadata

It looks well maintained, is sponsored, and is a "critical" project on PyPI.

I still don't see anything for ECS and Fargate, however.
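
If I remember its API correctly (worth double-checking against the project's README), usage would be roughly:

from ec2_metadata import ec2_metadata

# Reads the region from the instance metadata service (IMDS), handling the
# IMDSv2 token dance for you; only works where the metadata endpoint is reachable.
print(ec2_metadata.region)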

@betolink
Member

betolink commented Feb 9, 2024

I like what you're proposing @jhkennedy, and one would think there should already be a package that works in all of the AWS execution environments: EC2, ECS, Lambda, etc.

@yuvipanda

ec2-metadata will not work on any z2jh instances by default, as access to the metadata server is explicitly blocked by default (https://z2jh.jupyter.org/en/stable/administrator/security.html#block-cloud-metadata-api-with-a-privileged-initcontainer-running-iptables).

@betolink
Member

If I test the current approach to verify in-region execution, it works from Openscapes (2i2c). Is this the same endpoint? @yuvipanda

http://169.254.169.254/latest/meta-data/placement/region

def _running_in_us_west_2(self) -> bool:

@yuvipanda

@betolink yes, because we've intentionally unblocked that access point in the openscapes hub :) But we coupled it with appropriate IRSA roles so it is secure. At least when I last looked, just unblocking access to the metadata server without setting up IRSA or similar was pretty insecure.

@betolink
Member

Ah, this reminded me of an issue the VEDA hub reported: when using earthaccess, the library didn't detect in-region execution and used the HTTP links. This is probably why that is happening. cc @abarciauskas-bgse

@yuvipanda

yeah, i'd suggest (similar to @jhkennedy elsewhere) possibly looking at the redirects coming back to figure out whatever you need to do internally, as ultimately everything else is only going to be a heuristic. For example, we're going to do 2i2c-org/infrastructure#3273 soon for the openscapes hub; I'm not sure what effect that will have on ec2-metadata.

@abarciauskas-bgse
Contributor

abarciauskas-bgse commented Feb 23, 2024

@betolink sorry for the delay here but I verified that an updated earthaccess (v0.8.2) does open via S3 direct access on VEDA, whereas the current version installed on VEDA's Hub (v0.5.2) does not properly register that the instance is in-region. Hopefully this will be resolved once NASA-IMPACT/veda-jh-environments#41 is completed.

@itcarroll
Collaborator

Overheard from maintainers of oss.smce.nasa.gov:

yes, we do indeed block the instance metadata (for security reasons), so the check that earthaccess is making is not ideal for what they’re trying to do.

@meteodave

I am using an AWS pcluster instance, and I need to add earthaccess.__store__.in_region = True to my script to enable the S3 earthaccess.download() transfer, even though I am already located in the us-west-2 region.
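
For reference, the workaround in context looks roughly like this (the search parameters are illustrative, not from my actual script):

import earthaccess

earthaccess.login()

# Workaround: force in-region (S3 direct access) behavior, since the
# metadata-based check doesn't detect the region on the pcluster instance.
earthaccess.__store__.in_region = True

# Illustrative query; any granule search works here.
results = earthaccess.search_data(short_name="ATL06", count=1)
earthaccess.download(results, "./data")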
