title | date | bibliography
---|---|---
rabpro: global watershed boundaries, river elevation profiles, and catchment statistics | 25 February 2022 | paper.bib
River and Basin Profiler (`rabpro`) is a Python package to delineate watersheds, extract river flowlines and elevation profiles, and compute watershed statistics for any location on the Earth’s surface. As fundamental hydrologically-relevant units of surface area, watersheds are areas of land that drain via aboveground pathways to the same location, or outlet. Delineations of watershed boundaries are typically performed on digital elevation models (DEMs) that represent surface elevations as gridded rasters. Depending on the resolution of the DEM and the size of the watershed, delineation may be very computationally expensive. With this in mind, we designed `rabpro` to provide user-friendly workflows to manage the complexity and computational expense of watershed calculations given an arbitrary coordinate pair. In addition to basic watershed delineation, `rabpro` will extract the elevation profile for a watershed’s main-channel flowline. This enables the computation of river slope, which is a critical parameter in many hydrologic and geomorphologic models. Finally, `rabpro` provides a user-friendly wrapper around Google Earth Engine’s (GEE) Python API to enable cloud-computing of zonal watershed statistics and/or time-varying forcing data from hundreds of available datasets. Altogether, `rabpro` provides the ability to automate or semi-automate complex watershed analysis workflows across broad spatial extents.
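To illustrate how a main-channel elevation profile leads to a river slope, the following is a minimal sketch, not `rabpro`'s own implementation. It assumes the profile is available as two equal-length lists (along-channel distance measured downstream and elevation, both in meters) and fits a least-squares line; the function name `mean_slope` and the sample values are hypothetical.

```python
def mean_slope(distances_m, elevations_m):
    """Least-squares channel slope (m/m) from an elevation profile.

    distances_m: along-channel distance, increasing downstream.
    elevations_m: elevation at each distance.
    Returns a positive value when elevation drops downstream.
    """
    n = len(distances_m)
    if n != len(elevations_m) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")

    mean_x = sum(distances_m) / n
    mean_z = sum(elevations_m) / n

    # Sum of squared deviations in distance, and the distance-elevation cross term.
    sxx = sum((x - mean_x) ** 2 for x in distances_m)
    if sxx == 0:
        raise ValueError("all distances are identical; slope is undefined")
    sxz = sum((x - mean_x) * (z - mean_z)
              for x, z in zip(distances_m, elevations_m))

    # The fitted gradient is negative for a descending channel; negate it so
    # the reported slope is positive in the downstream direction.
    return -sxz / sxx


# Hypothetical 2 km profile sampled every 500 m.
profile_distance = [0.0, 500.0, 1000.0, 1500.0, 2000.0]
profile_elevation = [120.0, 119.0, 118.5, 117.0, 116.0]
slope = mean_slope(profile_distance, profile_elevation)
print(f"mean channel slope: {slope:.4f} m/m")
```

A regression over the whole reach is less sensitive to DEM noise than an endpoint difference, though either could be appropriate depending on the model being driven.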
Watersheds play a central role in many scientific, engineering, and environmental management applications (see @brooks_hydrology_2003 for a comprehensive overview). While `rabpro` can benefit any watershed-based research or analysis, it was designed to satisfy the needs of data-driven rainfall-runoff models. These models aim to predict streamflow (runoff) time series as a function of precipitation over the upstream land area (i.e., the watershed). In addition to watershed delineations and precipitation estimates, they typically require data on both time-varying parameters (or forcing data) like temperature, humidity, soil moisture, and vegetation, and static watershed properties like topography, soil type, or land use/land cover [@kratzert_toward_2019; @gauch_rainfallrunoff_2021; @nearing_data_2021; @kratzert_note_2021]. The `rabpro` API enables users to manage the complete data pipeline necessary to drive such a model, from the initial watershed delineation through the calculation of static and time-varying parameters. Some hydrologic and hydraulic models also require channel slope for routing streamflow [@boyle_toward_2001; @piccolroaz_hyperstream_2016; @wilson_water_2008], developing rating curves [@fenton_calculation_2001; @colby_relationship_1956], or modeling local hydraulics [@schwenk_life_2015; @schwenk_high_2017; @schwenk_meander_2016].
The need for watershed-based data analysis tools is exemplified by the growing collection of published datasets that provide watershed boundaries, forcing data, and/or watershed attributes in precomputed form, including CAMELS [@addor_camels_2017], CAMELS-CL, -AUS, and -BR [@alvarez-garreton_camels-cl_2018; @fowler_camels-aus_2021; @chagas_camels-br_2020], Hysets [@arsenault_comprehensive_2020], and HydroAtlas [@linke_global_2019]. These datasets provide off-the-shelf options for building streamflow models, but they suffer from a degree of inflexibility. For example, someone who wants to add a watershed attribute, use a new remotely-sensed data product, or update the forcing data time series to include the most recently available data must go through the arduous process of sampling the data themselves. `rabpro` was designed to provide flexibility both for building a watershed dataset from scratch and for appending to an existing one.
While we point to streamflow modeling as an example, many other applications exist. `rabpro` is currently being used to contextualize streamflow trends, build a data-driven model of riverbank erosion, and generate forcing data for a mosquito population dynamics model. `rabpro`'s focus is primarily on watersheds, but some users may also find `rabpro`'s Google Earth Engine wrapper convenient for sampling raster data over any geopolygon(s). For example, Earth System Models commonly require sampling raster datasets over watersheds or other polygons for parameterizations and validations [@fisher2019parametric; @chen2020global].
The importance of watersheds, availability of DEMs, and growing computational power have led to the development of many excellent open-source terrain (DEM) analysis packages that provide watershed delineation tools, including TauDEM [@tarboton_terrain_2005], pysheds [@bartos_pysheds_2020], Whitebox Tools [@lindsay_whitebox_2016], and SAGA [@conrad_system_2015], among many others. Computing statistics and forcing data from geospatial rasters also has a rich history of development, and Google Earth Engine [@gorelick_google_2017] has played an important role. Almost a decade has passed since Google Earth Engine became available to developers, and the community has in turn developed open-source packages to interface with its Python API in more user-friendly ways, including gee_tools [@principe_gee_tools_2021], geemap [@wu_geemap_2020], eemont [@montero_eemont_2021], and restee [@markert_restee_2021], each of which provides support for sampling zonal statistics and time series from geospatial polygons.
However, to our knowledge, `rabpro` is the only available package that provides efficient end-to-end delineation and characterization of watershed basins at scale. While a combination of the cited terrain analysis packages and GEE toolboxes can achieve `rabpro`’s functionality, `rabpro`’s blending of them enables simpler, less error-prone, and faster results.
One unique `rabpro` innovation is its automation of “hydrologically addressing” input coordinates. DEM watershed delineations require that the outlet pixel be precisely specified; in many `rabpro` use cases, this is simply a (latitude, longitude) coordinate that may not align with the underlying DEM. `rabpro` will attempt to “snap” the provided coordinate to a nearby flowline while minimizing the snapping distance and the difference in upstream drainage area (if provided by the user). Another unique `rabpro` feature is the ability to optimize the watershed delineation method according to basin size such that pixel-based (from MERIT-Hydro [@yamazaki_merit_2019]) delineations can be used for more accurate estimates and/or smaller basins, and coarser subbasin-based (from HydroBASINS [@lehner_hydrobasins_2014]) delineations can be used for rapid estimates of larger basins.
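The snapping idea can be sketched with a toy cost function. This is an illustration under stated assumptions, not the algorithm `rabpro` actually uses: the candidate flowline pixels, the `max_dist` search radius, the equal weighting of the two cost terms, and the name `snap_to_flowline` are all hypothetical.

```python
import math


def snap_to_flowline(point, candidates, target_da_km2=None, max_dist=5.0):
    """Pick the candidate flowline pixel with the lowest combined cost.

    point: (x, y) of the user-supplied location, in pixel units.
    candidates: dicts with keys "name", "x", "y", and "da_km2"
        (upstream drainage area of that flowline pixel).
    target_da_km2: optional drainage area reported by the user.
    max_dist: search radius; candidates farther away are ignored.
    Returns the chosen candidate, or None if none is within range.
    """
    best, best_cost = None, math.inf
    for cand in candidates:
        dist = math.hypot(cand["x"] - point[0], cand["y"] - point[1])
        if dist > max_dist:
            continue
        # Distance term, scaled to [0, 1] by the search radius.
        cost = dist / max_dist
        # Optional drainage-area term: relative mismatch with the target.
        if target_da_km2 is not None:
            cost += abs(cand["da_km2"] - target_da_km2) / target_da_km2
        if cost < best_cost:
            best, best_cost = cand, cost
    return best


# A gauge located near a small tributary and a much larger main stem.
flowline_pixels = [
    {"name": "tributary", "x": 2.0, "y": 3.0, "da_km2": 15.0},
    {"name": "main_stem", "x": 4.0, "y": 2.0, "da_km2": 980.0},
    {"name": "far_away", "x": 9.0, "y": 9.0, "da_km2": 1000.0},
]
gauge = (2.0, 2.0)

nearest_only = snap_to_flowline(gauge, flowline_pixels)
with_area = snap_to_flowline(gauge, flowline_pixels, target_da_km2=1000.0)
print(nearest_only["name"], with_area["name"])
```

Without a drainage area the nearest pixel (the tributary) is chosen. Supplying an area of 1000 km^2 shifts the choice to the main stem, which is the kind of mismatch that hydrological addressing is meant to avoid.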
`rabpro` executes watershed delineation based on either the MERIT-Hydro dataset, which provides a global, ~90 meter per pixel, hydrologically-processed DEM suite, or the HydroBASINS data product, which provides pre-delineated subbasins of approximately 230 km^2 each. Conceptually, basin delineation is identical for both. The user-provided coordinate is hydrologically addressed by finding the downstream-most pixel (MERIT-Hydro) or subbasin (HydroBASINS). The watershed is then delineated by finding all upstream pixels or subbasins that drain into the downstream pixel/subbasin and taking the union of these pixels/subbasins to form a single polygon. A user must therefore download either the MERIT-Hydro tiles covering their study watershed or the appropriate HydroBASINS product; `rabpro` provides tooling to automate these downloads and create its expected data structure (see the Downloading data notebook). `rabpro` does not currently support custom watershed datasets similar to HydroBASINS, because its attribute field and data structure requirements must stay consistent for generalizability.
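The upstream search described above can be sketched on a toy subbasin network. This is a minimal illustration, not `rabpro`'s code: it assumes each subbasin records the ID of the subbasin it drains into, in the spirit of the `HYBAS_ID` and `NEXT_DOWN` attributes of HydroBASINS (with `0` marking a terminal outlet), and the IDs and the function name `upstream_subbasins` are made up.

```python
from collections import defaultdict, deque


def upstream_subbasins(next_down, outlet_id):
    """Return the set of subbasin IDs draining through outlet_id (inclusive).

    next_down: dict mapping each subbasin ID to the ID it drains into,
        with 0 meaning "no downstream subbasin".
    """
    if outlet_id not in next_down:
        raise KeyError(f"unknown subbasin: {outlet_id}")

    # Invert the downstream links so each subbasin lists its direct contributors.
    upstream_of = defaultdict(list)
    for basin_id, down_id in next_down.items():
        upstream_of[down_id].append(basin_id)

    # Breadth-first search upstream from the outlet.
    found = {outlet_id}
    queue = deque([outlet_id])
    while queue:
        current = queue.popleft()
        for up_id in upstream_of[current]:
            if up_id not in found:
                found.add(up_id)
                queue.append(up_id)
    return found


# Toy network: 4 and 5 drain to 2; 6 drains to 3; 2 and 3 drain to 1;
# 7 is a separate coastal subbasin.
toy_next_down = {1: 0, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 0}

print(sorted(upstream_subbasins(toy_next_down, 2)))
print(sorted(upstream_subbasins(toy_next_down, 1)))
```

The watershed polygon would then be the union of the geometries of the returned subbasins. That step is omitted here because it depends on the geospatial library in use. The same traversal applies to pixels when each pixel's flow direction plays the role of the downstream link.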
There are three primary operations supported by `rabpro`: 1) basin delineation, 2) elevation profiling, and 3) subbasin (zonal) statistics. If operating on a single coordinate pair, the cleanest workflow is to instantiate an object of the `profiler` class and call (in order) the `delineate_basins()`, `elev_profile()`, and `basin_stats()` methods (see the Basic Example notebook). If operating on multiple coordinate pairs, the workflow is to loop through each coordinate pair while delineating each watershed (optionally calculating its elevation profile). As the loop runs, the user collects each basin polygon in a list, concatenates the list, and directly calls `basin_stats.compute()` on the resulting GeoDataFrame (see the Multiple Basins Example notebook). More details on package functionality can be found in the documentation.
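For readers unfamiliar with zonal statistics, the following is a local, pure-Python analogue of the idea. It is not how `rabpro` or Google Earth Engine computes them. It assumes the basin has already been rasterized to a boolean mask on the same grid as the data; the function name `zonal_stats`, the nodata value, and the sample grid are hypothetical.

```python
def zonal_stats(raster, zone_mask, nodata=None):
    """Summarize raster cells that fall inside a zone.

    raster: 2D list of numeric cell values.
    zone_mask: 2D list of truthy/falsy flags, same shape as raster.
    nodata: value marking missing cells, which are excluded.
    """
    # Keep only cells that are inside the zone and hold valid data.
    values = [
        value
        for raster_row, mask_row in zip(raster, zone_mask)
        for value, inside in zip(raster_row, mask_row)
        if inside and value != nodata
    ]
    if not values:
        return {"count": 0, "mean": None, "min": None, "max": None}
    return {
        "count": len(values),
        "mean": sum(values) / len(values),
        "min": min(values),
        "max": max(values),
    }


# Hypothetical 3x3 precipitation grid (mm) with one missing cell.
precip_mm = [
    [10, 12, -9999],
    [14, 16, 18],
    [20, 22, 24],
]
# Cells belonging to the basin of interest.
basin_mask = [
    [1, 1, 1],
    [0, 1, 1],
    [0, 0, 1],
]

stats = zonal_stats(precip_mm, basin_mask, nodata=-9999)
print(stats)
```

Repeating the same reduction for each time step of a raster stack yields a basin-averaged time series, which is the form of forcing data that rainfall-runoff models typically consume.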
`rabpro` relies on functionality from the following Python packages: GDAL [@gdalogr_contributors_gdalogr_2020], NumPy [@harris_array_2020], GeoPandas [@jordahl_geopandasgeopandas_2020], Shapely [@gillies_shapely_2007], pyproj [@snow_pyproj4pyproj_2021], scikit-image [@van_der_walt_scikit-image_2014], SciPy [@virtanen_scipy_2020], and earthengine-api [@gorelick_google_2017]. Use of the watershed statistics methods requires a free Google Earth Engine account. Required MERIT-Hydro and HydroBASINS data are freely available for download by visiting their websites or using `rabpro`’s download scripts; MERIT-Hydro requires users to first register to receive a username and password for access to downloads.
Jordan Muss, Joel Rowland, and Eiten Shelef envisioned and created a predecessor to `rabpro` and helped guide its early development. `rabpro` was developed with support from the Laboratory Directed Research and Development program of Los Alamos National Laboratory (Project numbers 20210213ER, 20220697PRD1) and as part of the Interdisciplinary Research for Arctic Coastal Environments (InteRFACE) project through the Department of Energy, Office of Science, Biological and Environmental Research Earth and Environment Systems Sciences Division RGMA program, awarded under contract grant #9233218CNA000001 to Triad National Security, LLC (“Triad”). TZ was supported by funding from the Columbia Undergraduate Scholars Program Summer Enhancement Fellowship.