Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#157] Example notebook for NLDI data access #158

Merged

Conversation

pkdash
Copy link
Contributor

@pkdash pkdash commented Aug 30, 2024

No description provided.

@pkdash
Copy link
Contributor Author

pkdash commented Aug 30, 2024

@thodson-usgs I am waiting for Jeff to do some final edits to the new notebook for nldi data access. After that I it should be ready for review.

@thodson-usgs thodson-usgs self-requested a review September 3, 2024 21:40
@pkdash pkdash marked this pull request as ready for review September 10, 2024 13:49
@pkdash
Copy link
Contributor Author

pkdash commented Sep 12, 2024

@thodson-usgs It is now ready for review. Thanks.

@ehinman
Copy link
Collaborator

ehinman commented Sep 25, 2024

@pkdash I will take a look in the next week. Thanks for contributing!

Copy link
Collaborator

@ehinman ehinman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good! I definitely learned a lot. As someone who has not used this module before, my questions represent how a beginner might struggle with the nldi functions. I also feel that the nldi module could benefit from more detailed descriptions of what the arguments do/mean. For example, what is stop_comid in get_flowlines(), and what does trim_start() trim? I am also curious if you can use basin polygons as features from which to obtain upstream/downstream basins? If so, that might be a nice example to add.

"source": [
"### Install the Package\n",
"\n",
"Use the following code to install the package if it doesn't exist already within your Jupyter Python environment. Note the `nldi` option in the `dataretrieval` package installation. The default `dataretrieval` package installation does not support NLDI data access. You must run the dataretrieval install with the `nldi` option to ensure that you have the necessary capabilities. If you are running this notebook in the CUAHSI JuypyterHub server linked to HydroShare, you will want to run the following pip install command. The base `dataretrieval` package is already installed in the CUAHSI JupyterHub Python environment, but it does not include the NLDI option."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure I follow this. Are you saying the official release of dataretrieval-python does not have the nldi module? I think if you pip install the latest version, the default does come with NLDI. Please let me know if I misunderstand.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NLDI was added after the last official release.

"\n",
"The following arguments are supported:\n",
"\n",
"* **feature_source** (string): The name of the NLDI feature source.\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One difficulty I have here is that I don't know all the feature_source options. I see in this example wqp and the NLDI blog post mentions nwissite, but do you know of a place where one can reference all possible inputs?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is something USGS should answer. @dblodgett-usgs might recommend a NLDI API reference.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"The following arguments are supported:\n",
"\n",
"* **feature_source** (string): The name of the NLDI feature source.\n",
"* **feature_id** (string): The identifier of the NLDI feature.\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to previous comment, it's not clear the format of the feature_id. I see examples with monitoring locations, like USGS-#######, but what about other features?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have to know the id you are looking for or discover it through navigation requests.

demos/hydroshare/USGS_dataretrieval_NLDI_Examples.ipynb Outdated Show resolved Hide resolved
"metadata": {},
"source": [
"#### Example 2: Get flowline data using a NHDPlus comid\n",
"In some cases, you may wish to get flowline data associated with a NHDPlus COMID rather than flowlines associated with a monitoring station. The following shows how to get flowline data for a COMID as a geopandas dataframe."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you provide a link to what a COMID is?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ehinman that's our job to define. I'm also hesitant to use links that we know will break. Perhaps a conceptual description would suffice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

demos/hydroshare/USGS_dataretrieval_NLDI_Examples.ipynb Outdated Show resolved Hide resolved
"* \"WQP\" - Water Quality Portal\n",
"* \"comid\" - NHDPlus comid\n",
"\n",
"There are a few others - for a complete list, consult the getDataSources endpoint of the NLDI API. For this example, we will trace upstream along the mainstem and find any NWIS surface water sites located along the mainstem of the river."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"There are a few others - for a complete list, consult the getDataSources endpoint of the NLDI API. For this example, we will trace upstream along the mainstem and find any NWIS surface water sites located along the mainstem of the river."
"There are a few others - for a complete list, consult the getDataSources endpoint of the NLDI API. For this example, we will trace from an NWIS surface water site upstream along the mainstem and find any NWIS surface water sites located along the mainstem of the river."

"metadata": {},
"outputs": [],
"source": [
"gdf = nldi.get_features(lat=43.073051, long=-89.401230)\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the buffer distance for a call like this? Can you use navigation mode with a lat/long input?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This piece is actually throwing an error for me: ValueError: Error getting features for lat '43.073051' and long '-89.40123'. Error reason: Internal Server Error

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disregard bug comment, above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ehinman I was able to run it without error:
image

"source": [
"***\n",
"\n",
"### Examples for the `search()` function:\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a little bit confused by the search function. I don't see how it differs from get_basin/flowlines/features(as_json=True)? I could see using search() for situations when someone isn't sure what kind of geometry they want, they just want to know the geometries available around a certain point.

Copy link
Collaborator

@thodson-usgs thodson-usgs Oct 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think search mainly allows you to search for multiple feature types in a single query, which can be much more efficient. My recollection was that I wanted to tweak the implementation somewhat, but that's not on @pkdash. He did great for having little-to-no requirements to work from.

"source": [
"#### Example 1: Get all indexed features along a specified navigation path\n",
"Get all of the indexed features from a particular data source using a navigation path that traces upstream along the mainstam (UM) and uses as an origin for the trace a feature from a given feature source (e.g., a monitoring station from the Water Quality Portal). You can get any indexed features from any of the included data sources. Example data sources and the codes you need to retrieve features from those sources include:\n",
"* \"census2020-nhdpv2\" - 2020 Census Block - NHDPlusV2 Catchment Intersections\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this list is really helpful and should be at the top. Have you considered providing this list as a function like dataRetrieval::get_nldi_sources()?

@thodson-usgs
Copy link
Collaborator

I would just add that @ehinman makes some good points, but some seem out of scope of the PR and will be for us to follow up later.

I also feel that the nldi module could benefit from more detailed descriptions of what the arguments do/mean. For example, what is stop_comid in get_flowlines(), and what does trim_start() trim?

After reviewing our own swagger doc, I'm also confused. As these functions aren't implemented in the PR, this is a little out of scope and perhaps not something @pkdash can answer.

@dblodgett-usgs
Copy link
Contributor

BTW, we are migrating the NLDI to the URL in that link. If the "labs" url is used in this package, can you change "https://labs.waterdata.usgs.gov/api/nldi/" to "https://api.water.usgs.gov/nldi/" throughout?

thodson-usgs and others added 3 commits October 3, 2024 23:19
Co-authored-by: Elise Hinman <121896266+ehinman@users.noreply.github.com>
Co-authored-by: Elise Hinman <121896266+ehinman@users.noreply.github.com>
Co-authored-by: Elise Hinman <121896266+ehinman@users.noreply.github.com>
Copy link
Collaborator

@thodson-usgs thodson-usgs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, for your review @ehinman. Indeed, there are a few loose ends for us to follow up on, but otherwise, this contribution is a clear improvement to our documentation, and I support merging it.

@thodson-usgs thodson-usgs merged commit 103724b into DOI-USGS:main Oct 4, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants