Add documentation on the effects of duplicates in the source geometries #182

darribas · 2023-08-15T09:36:56Z

I don't think this is necessarily a bug, but it is something that caught me off guard until I thought it through, and could trip up other users, so maybe the solution is adding a bit of documentation.

In areal interpolation (I'm not sure about other cases), if the source geometries contain duplicates or overlaps, the results are wrong. At least for categorical variables (I'm not sure what would happen with intensive/extensive variables, but I suspect something similar), some percentages add up to more than 1. My sense is that this happens because more than one source geometry covers the same patch of land, which then gets counted more than once. Again, this is what the method would be expected to do and, arguably, a strange case (overlapping or duplicate source geometries are unusual), but maybe it's worth adding a line to the source_df documentation?

tobler/tobler/area_weighted/area_interpolate.py

Line 221 in df0cbc6

source_df : geopandas.GeoDataFrame
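The double counting described above can be reproduced with a toy example. This is a minimal sketch of area-weighted categorical shares, not tobler's actual implementation: it assumes each category's share in a target is the summed intersection area of that category's source polygons divided by the target's area, and it uses axis-aligned rectangles `(xmin, ymin, xmax, ymax)` in place of real geometries so it runs without geopandas. The helper names are hypothetical.

```python
# Sketch only: rectangles stand in for polygons; not tobler's code.

def area(rect):
    """Area of an axis-aligned rectangle (0 if degenerate or empty)."""
    xmin, ymin, xmax, ymax = rect
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin)

def intersection_area(a, b):
    """Area shared by two axis-aligned rectangles."""
    return area((max(a[0], b[0]), max(a[1], b[1]),
                 min(a[2], b[2]), min(a[3], b[3])))

def categorical_shares(sources, target):
    """Share of the target's area covered by each category (assumed method)."""
    shares = {}
    for rect, category in sources:
        covered = intersection_area(rect, target) / area(target)
        shares[category] = shares.get(category, 0.0) + covered
    return shares

target = (0.0, 0.0, 1.0, 1.0)

# Planar sources: two halves that tile the target without overlapping.
planar = [((0.0, 0.0, 0.5, 1.0), "urban"), ((0.5, 0.0, 1.0, 1.0), "rural")]

# Non-planar sources: "urban" is duplicated and "rural" overlaps its right half.
non_planar = [
    ((0.0, 0.0, 1.0, 1.0), "urban"),
    ((0.0, 0.0, 1.0, 1.0), "urban"),  # exact duplicate
    ((0.5, 0.0, 1.5, 1.0), "rural"),  # overlaps the right half of the target
]

print(categorical_shares(planar, target))      # {'urban': 0.5, 'rural': 0.5}
print(categorical_shares(non_planar, target))  # {'urban': 2.0, 'rural': 0.5}
```

With planar sources the shares sum to 1; with the duplicate and the overlap they sum to 2.5, because the same patch of land is counted once per source polygon that covers it.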

What do you think?


knaaptime · 2023-08-21T16:19:16Z

> In areal interpolation (not sure about other cases), if the source geometries have duplicates or overlaps, the results are wrong.

Not quite. The validity depends on the question. If you've got data on, say, overlapping school districts (some private, some public) and you're sending average test scores to a smaller geometry, then the target geometry receives the area-weighted average of the overlapping polys (which is what you want in this case). If that small poly is covered entirely by two different overlapping schools, one private and one public, then the target gets 50/50 shares.
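The school-district case can be sketched numerically. This assumes, for illustration, that an intensive (average-type) variable is interpolated by weighting each source by its intersection with the target divided by the total intersected area of all sources in that target; that weighting scheme is an assumption here, not a quote of tobler's code. Rectangles stand in for polygons, and the district extents and scores are made up.

```python
# Sketch only: assumed weighting for intensive variables, rectangles as polygons.

def area(rect):
    """Area of an axis-aligned rectangle (0 if degenerate or empty)."""
    xmin, ymin, xmax, ymax = rect
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin)

def intersection_area(a, b):
    """Area shared by two axis-aligned rectangles."""
    return area((max(a[0], b[0]), max(a[1], b[1]),
                 min(a[2], b[2]), min(a[3], b[3])))

def intensive_estimate(sources, target):
    """Weighted average of source values; weights sum to 1 within the target."""
    overlaps = [(intersection_area(rect, target), value) for rect, value in sources]
    total = sum(a for a, _ in overlaps)
    if total == 0:
        return None, []  # target not covered by any source
    weights = [a / total for a, _ in overlaps]
    estimate = sum(w * v for w, (_, v) in zip(weights, overlaps))
    return estimate, weights

target = (0.0, 0.0, 1.0, 1.0)
districts = [
    ((0.0, 0.0, 2.0, 2.0), 70.0),    # hypothetical public district, avg score 70
    ((-1.0, -1.0, 1.0, 1.0), 90.0),  # hypothetical private district, avg score 90
]

estimate, weights = intensive_estimate(districts, target)
print(weights)   # [0.5, 0.5]
print(estimate)  # 80.0
```

Both districts fully cover the target, so each gets half the weight and the target receives the average of the two scores, which is the intended result when the overlap is meaningful.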

If you've got an extensive variable with overlapping sources (and those overlaps are conceptually valid in the source data), then the overlapping sum is correct.
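For an extensive (count-type) variable, the overlapping sum can be illustrated the same way. This minimal sketch assumes each source contributes its value times the fraction of its own area that falls inside the target; using the full source area as the denominator is an assumption (tobler's `allocate_total` option changes how that denominator is chosen). Rectangles and counts are hypothetical.

```python
# Sketch only: assumed extensive weighting, rectangles as polygons.

def area(rect):
    """Area of an axis-aligned rectangle (0 if degenerate or empty)."""
    xmin, ymin, xmax, ymax = rect
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin)

def intersection_area(a, b):
    """Area shared by two axis-aligned rectangles."""
    return area((max(a[0], b[0]), max(a[1], b[1]),
                 min(a[2], b[2]), min(a[3], b[3])))

def extensive_estimate(sources, target):
    """Sum of each source's value, scaled by the share of its area in the target."""
    return sum(value * intersection_area(rect, target) / area(rect)
               for rect, value in sources)

target = (1.0, 0.0, 2.0, 1.0)
sources = [
    ((0.0, 0.0, 2.0, 1.0), 100.0),  # hypothetical count spread over area 2
    ((1.0, 0.0, 3.0, 1.0), 60.0),   # overlaps the first source on the target
]

print(extensive_estimate(sources, target))  # 80.0 (50 from one source + 30 from the other)
```

Both sources cover the target, so both contribute; whether that sum is "correct" depends on whether the overlap is conceptually valid in the source data.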

Non-planar geometries are something that can obviously surface a lot in interpolation problems, so I've thought a few times about including some sort of check. Ultimately, though, non-planarity is also a basic data check and something users need to understand about their data, so I've landed on the idea that folks should use https://github.com/sjsrey/geoplanar when they need to check their data.
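A rough idea of what such a data check looks for: exact duplicates and pairs of source polygons that share area. This sketch uses a hypothetical `find_overlaps` helper on axis-aligned rectangles; it is not part of tobler or geoplanar, and for real GeoDataFrames geoplanar (linked above) is the suggested tool. The pairwise loop is quadratic in the number of polygons, so it only suits small illustrations.

```python
from itertools import combinations

# Sketch only: hypothetical overlap/duplicate check on rectangles.

def area(rect):
    """Area of an axis-aligned rectangle (0 if degenerate or empty)."""
    xmin, ymin, xmax, ymax = rect
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin)

def intersection_area(a, b):
    """Area shared by two axis-aligned rectangles."""
    return area((max(a[0], b[0]), max(a[1], b[1]),
                 min(a[2], b[2]), min(a[3], b[3])))

def find_overlaps(rects, tol=0.0):
    """Return (i, j, kind) for each pair that is duplicated or shares area."""
    problems = []
    for (i, a), (j, b) in combinations(enumerate(rects), 2):
        if a == b:
            problems.append((i, j, "duplicate"))
        elif intersection_area(a, b) > tol:
            problems.append((i, j, "overlap"))
    return problems

sources = [
    (0.0, 0.0, 1.0, 1.0),
    (0.0, 0.0, 1.0, 1.0),  # duplicate of 0
    (0.5, 0.0, 1.5, 1.0),  # overlaps 0 and 1
    (2.0, 0.0, 3.0, 1.0),  # disjoint from the others
]

print(find_overlaps(sources))
# [(0, 1, 'duplicate'), (0, 2, 'overlap'), (1, 2, 'overlap')]
```

Polygons that only touch along an edge share zero area and are not flagged, which matches the usual notion of a planar partition.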

darribas added the documentation label (Improvements or additions to documentation) on Aug 15, 2023
