This document serves as a guide for cities-cif developers who would like to contribute to this project. At its core, this project intends to standardize the process of calculating zonal statistics for city boundaries. To do so, we need three basic components:
- Data layers that we want to use as inputs
- Indicator functions that define the desired calculations
- City boundary files to run the indicator calculation on
The `city_metrix` library allows users of geospatial data to collect and apply zonal statistics on global geospatial datasets for measuring sustainability indicators in urban areas.
It provides two main functionalities:
- Extracting geospatial `layers` based on specific areas of interest (defined as a `GeoDataFrame`). These data layers are collected from any cloud source (Google Earth Engine, AWS S3 public buckets, public APIs). Two formats of data layers are handled in `city_metrix`: rasters and vectors.
  - Raster data are collected and transformed into arrays using `xarray` (GEE image collections are also converted into arrays using `xee`).
  - Vector data are stored as `GeoDataFrame`.
- Measuring `indicators` using the extracted `layers` by implementing zonal statistics operations.
The main package source code is located in the `city_metrix` directory.
The `layers` sub-directory contains the scripts used to extract layers from various data sources. Every `layer` is defined in a separate Python file, with the file name referencing the name of the layer.
Every layer is defined as a Python class, which contains all instructions to calculate and extract the data from the global data sources.
Every layer class should implement at least a `get_data` function that returns either an `xarray` object with the raster data or a `GeoDataFrame` with the vector data. `layer` classes can also define parameters in their `__init__` function that affect how the data is extracted (e.g. a date range or a land cover class).
The `get_data` function is then used by the `indicators` scripts to collect the data for a region of interest.
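As a rough illustration of that class shape, here is a minimal, dependency-free sketch. The names `Layer`, `ExampleTreeCover`, the `bbox` argument, and the hard-coded grid are assumptions made for this example only; a plain nested list stands in for the `xarray` object a real raster layer would return, and `None` stands in for a missing value.

```python
from abc import ABC, abstractmethod


class Layer(ABC):
    """Hypothetical stand-in for a shared layer base class."""

    @abstractmethod
    def get_data(self, bbox):
        """Return the layer data for bbox = (min_x, min_y, max_x, max_y)."""


class ExampleTreeCover(Layer):
    """Illustrative layer; the name, parameter, and data are made up."""

    def __init__(self, min_tree_cover=None):
        # Extraction parameters are stored here and applied in get_data().
        self.min_tree_cover = min_tree_cover

    def get_data(self, bbox):
        # A real layer would query its data source for bbox.
        # A fixed 2x3 grid of tree cover percentages keeps this sketch offline.
        grid = [[5, 40, 80], [0, 10, 25]]
        if self.min_tree_cover is None:
            return grid
        # Cells below the threshold become None (a stand-in for NaN).
        return [
            [v if v >= self.min_tree_cover else None for v in row]
            for row in grid
        ]


layer = ExampleTreeCover(min_tree_cover=10)
data = layer.get_data(bbox=(106.7, -6.4, 107.0, -6.1))
```

The point of the sketch is the split of responsibilities: `__init__` only records how the data should be filtered, and `get_data` does the extraction for whatever region it is given.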
The indicator methods are defined in the `metrics` folder.
Every indicator is implemented as a separate function in a separate file that uses the layer extraction defined in the `layers` sub-module.
The `layer` objects can be chained together, similar to `pandas` DataFrames, to perform zonal statistics. Generally, you'll need three things:
- The `layer` you want to collect statistics over (e.g. count, mean)
- Any `layers` you want to apply as a mask (e.g. built-up land)
- A `GeoDataFrame` representing one city (and possibly many subdistricts) that you want to use as zones
For example, you can get the tree cover count in built up land over Jakarta with the following code:
```python
TreeCover(min_tree_cover=10).mask(EsaWorldCover(land_cover_class=EsaWorldCoverClass.BUILT_UP)).groupby(jakarta_gdf).count()
```
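To make the chaining pattern concrete, the following toy sketch mimics `mask`, `groupby`, and `count` on small in-memory grids. It is not the `city_metrix` implementation: `GridLayer` and `GridGroupBy` are hypothetical classes, `None` marks a cell with no data, and the zones are given as a grid of zone ids rather than as polygons in a `GeoDataFrame`.

```python
class GridLayer:
    """Toy layer holding a small 2D grid; None marks a cell with no data."""

    def __init__(self, grid):
        self.grid = grid

    def mask(self, other):
        # Keep a cell only where the masking layer also has a value.
        masked = [
            [v if m is not None else None for v, m in zip(row, mask_row)]
            for row, mask_row in zip(self.grid, other.grid)
        ]
        return GridLayer(masked)

    def groupby(self, zones):
        # zones is a grid of zone ids with the same shape as the data.
        return GridGroupBy(self.grid, zones)


class GridGroupBy:
    """Toy result of groupby(); computes statistics per zone."""

    def __init__(self, grid, zones):
        self.grid = grid
        self.zones = zones

    def count(self):
        # Count the cells with data that fall in each zone.
        counts = {}
        for row, zone_row in zip(self.grid, self.zones):
            for value, zone in zip(row, zone_row):
                counts.setdefault(zone, 0)
                if value is not None:
                    counts[zone] += 1
        return counts


tree_cover = GridLayer([[40, None, 80], [None, 10, 25]])
built_up = GridLayer([[1, 1, None], [None, 1, 1]])
zones = [["north", "north", "north"], ["south", "south", "south"]]

result = tree_cover.mask(built_up).groupby(zones).count()
print(result)  # {'north': 1, 'south': 2}
```

Each step returns a new object, which is what allows the calls to be written as a single chain.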
The indicator function takes a `GeoDataFrame` (passed as `zones`) as input and returns the indicator values.
By default we assume you will provide the city boundary files you want to run calculations on.
We also have an API for storing city polygons and calculated metrics in a standardized way for projects that want a system to keep track of their inputs and outputs. If you want to use that, get in touch with Saif, Tori, or Chris.
- Before getting started here, check out the main project README file to set up your local environment and run through the tutorial to get a sense of how to use the existing functionality.
- We keep track of all our datasets, layers, and cities in Airtable, so make sure you have permission to add records to https://airtable.com/appDWCVIQlVnLLaW2
- [Optional] If you want to add new cities or indicator values to the API, you will need access to our Carto account, but the framework does not depend on this.
Hopefully we already have the layers you need in `city_metrix/layers/` and you can skip this step. If not, here is the process for creating a new one.
- Add a record in the Airtable Datasets table

  You should add a record to the Airtable table for the new dataset. It should have a formal Name for the dataset with the associated metadata, including Theme, Data source, Provider, Spatial resolution, etc. You should also link the record to the Indicators (if any) that use this dataset for calculation.
- Add a record in the Airtable Layers table

  You should add a record to the Airtable table for the new data layer. A data layer is processed data from an existing or new dataset. For example, esa_world_cover is the world cover layer from ESA, and natural_areas is a reclassification of ESA world cover. The new data layer should be linked to the Datasets that generated it and to the Indicators (if any) that use this layer for calculation.
- Create a Python file in city_metrix/layers

  For consistency, the Python file name should match the Name in the Airtable Layers table. Each data layer should be a class with `__init__()` and `get_data()` functions. It can also use functions defined in layers.py or other necessary functions.

  If the layer is from a new dataset, ideally pull the data from the source API or S3. If we need to get the data from Google Earth Engine, we use `xee` wherever possible.
- Import the new layer class to city_metrix/layers/__init__.py
- Add a test to tests/layers.py to ensure the new layer is working as expected.
- Add new dependencies to setup.py and environment.yml.
- Add a section to the get layers.ipynb notebook to demonstrate how to use the new layer.
- Create a PR to merge the new layer into the main branch with these in the PR description:
  - Link to Jira ticket (if any)
  - A brief description of the new layer
  - A link to the Airtable record for the new layer
  - Explain how to test the new layer
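For the testing step above, a layer test could look roughly like the sketch below. `ExampleLayer`, `SAMPLE_BBOX`, and the returned values are hypothetical stand-ins so that the example runs without network access; in the real test you would import your new class from `city_metrix.layers` and call its `get_data()` on a small area.

```python
import math


class ExampleLayer:
    """Hypothetical stand-in for a class you added under city_metrix/layers."""

    def get_data(self, bbox):
        # A real layer would fetch data for bbox; fixed values keep this offline.
        return [[0.2, 0.4], [0.6, 0.8]]


# Hypothetical small bounding box (min_x, min_y, max_x, max_y) to keep the test fast.
SAMPLE_BBOX = (-38.36, -12.77, -38.34, -12.75)


def test_example_layer():
    data = ExampleLayer().get_data(SAMPLE_BBOX)
    values = [v for row in data for v in row]
    # The layer should return some data, and none of it should be NaN.
    assert len(values) > 0
    assert not any(math.isnan(v) for v in values)
```

Keeping the test area small matters more for real layers, where every call downloads data from a remote source.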
Once you have all the data layers you need as inputs, here is the process to create an indicator using them.
- Add a record in the Airtable Indicators table

  You should add a record to the Airtable table for the new indicator. It should have a unique indicator_label with the associated metadata, including theme, indicator_legend, code, indicator_definition, etc. You should also link the record to the Layers and data_sources_link that are used to calculate this indicator.
- Define the indicator calculation function in a new file in city_metrix/metrics

  Define a function for the new indicator that takes the calculation zones as a `GeoDataFrame` and returns the calculated indicator values as a `GeoSeries`.
- Add a test to tests/metrics.py to ensure the new indicator is working as expected.
- Add new dependencies to setup.py and environment.yml.
- Create a PR to merge the new indicator into the main branch with these in the PR description:
  - Link to Jira ticket (if any)
  - A brief description of the new indicator
  - A link to the Airtable record for the new indicator
  - Explain how to test the new indicator
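The shape of an indicator function can be sketched as follows. This is a toy version under stated assumptions: `count_by_zone` and `example_tree_cover_share` are hypothetical names, the two grids are hard-coded where a real indicator would call `get_data()` on layers, the zones are a grid of zone ids rather than a `GeoDataFrame`, and a plain dict stands in for the returned `GeoSeries`.

```python
def count_by_zone(grid, zones):
    """Count the cells with data (not None) in each zone."""
    counts = {}
    for row, zone_row in zip(grid, zones):
        for value, zone in zip(row, zone_row):
            counts.setdefault(zone, 0)
            if value is not None:
                counts[zone] += 1
    return counts


def example_tree_cover_share(zones):
    """Share of built-up cells per zone that also have tree cover."""
    # Hard-coded stand-ins for layer data; None marks a cell with no data.
    built_up = [[1, 1, None], [None, 1, 1]]
    tree_cover_in_built_up = [[40, None, None], [None, 10, 25]]

    built = count_by_zone(built_up, zones)
    treed = count_by_zone(tree_cover_in_built_up, zones)
    # Return one value per zone; zones with no built-up cells get None.
    return {
        zone: (treed[zone] / built[zone] if built[zone] else None)
        for zone in built
    }


zones = [["north", "north", "north"], ["south", "south", "south"]]
shares = example_tree_cover_share(zones)
print(shares)  # {'north': 0.5, 'south': 1.0}
```

The function takes only the zones as input and returns one value per zone, which mirrors the input and output described in the steps above.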
You can always have users provide their own boundary files, but if you are working on a project where you want to provide access to a common set of city boundaries, the best option is to add them to the API.
A standalone `GeoDataFrame` can be used as input zones for analysis. However, if the new cities will be part of the CIF project, you should consider converting the `GeoDataFrame` to match the current format of the boundaries table on Carto.
Add a city to the boundaries table on Carto
Ensure the columns in the `GeoDataFrame` align with the boundaries table on Carto. Then use the `to_carto()` function in the `cartoframes` package to upload:
to_carto(gdf, "boundaries", if_exists='append')
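Before uploading, it can help to compare column names explicitly. The helper below is a small sketch of that check; `compare_columns` is a hypothetical function, and the column names are invented for illustration because the actual schema of the boundaries table is not listed in this document.

```python
def compare_columns(gdf_columns, table_columns):
    """Return (missing, extra) column names relative to the target table."""
    gdf_set, table_set = set(gdf_columns), set(table_columns)
    missing = sorted(table_set - gdf_set)  # in the table but not in the data
    extra = sorted(gdf_set - table_set)    # in the data but not in the table
    return missing, extra


# Hypothetical column names, for illustration only.
table_columns = ["geo_id", "geo_name", "geo_level", "the_geom"]
gdf_columns = ["geo_id", "geo_name", "geometry"]

missing, extra = compare_columns(gdf_columns, table_columns)
print(missing)  # ['geo_level', 'the_geom']
print(extra)    # ['geometry']
```

With a real `GeoDataFrame` you would pass `gdf.columns` as the first argument and only append to the table once both lists are empty.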
You can run the tests by setting the credentials above and running the following:
pytest