Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CellAlignment doesn't work when using enclosed tessellation #278

Open
matthew-law opened this issue Jul 18, 2021 · 5 comments
Open

CellAlignment doesn't work when using enclosed tessellation #278

matthew-law opened this issue Jul 18, 2021 · 5 comments

Comments

@matthew-law
Copy link
Contributor

matthew-law commented Jul 18, 2021

momepy.CellAlignment expects the left and right GeoDataFrames (containing buildings and tessellation cells) to have the same number of elements, ie each building corresponds to one cell (which is the case in morphological tessellation). This means that it sometimes doesn't work when using enclosed tessellation, when there are some tessellation cells with no corresponding buildings (ie areas enclosed by roads with no buildings on them).

Code example:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-78-f527b308bee5> in <module>
----> 1 blg['blg_CellAlignment'] = mm.CellAlignment(blg, tess, 'blg_Orientation', 'tess_Orientation', 'uID', 'uID').series

/usr/local/Cellar/jupyterlab/3.0.1/libexec/lib/python3.9/site-packages/momepy/distribution.py in __init__(self, left, right, left_orientations, right_orientations, left_unique_id, right_unique_id)
    419             right_orientations = right_orientations + "_y"
    420         self.series = np.absolute(comp[left_orientations] - comp[right_orientations])
--> 421         self.series.index = left.index
    422 
    423 

/usr/local/Cellar/jupyterlab/3.0.1/libexec/lib/python3.9/site-packages/pandas/core/generic.py in __setattr__(self, name, value)
   5476         try:
   5477             object.__getattribute__(self, name)
-> 5478             return object.__setattr__(self, name, value)
   5479         except AttributeError:
   5480             pass

pandas/_libs/properties.pyx in pandas._libs.properties.AxisProperty.__set__()

/usr/local/Cellar/jupyterlab/3.0.1/libexec/lib/python3.9/site-packages/pandas/core/series.py in _set_axis(self, axis, labels, fastpath)
    468         if not fastpath:
    469             # The ensure_index call above ensures we have an Index object
--> 470             self._mgr.set_axis(axis, labels)
    471 
    472     # ndarray compatibility

/usr/local/Cellar/jupyterlab/3.0.1/libexec/lib/python3.9/site-packages/pandas/core/internals/managers.py in set_axis(self, axis, new_labels)
    218 
    219         if new_len != old_len:
--> 220             raise ValueError(
    221                 f"Length mismatch: Expected axis has {old_len} elements, new "
    222                 f"values have {new_len} elements"

ValueError: Length mismatch: Expected axis has 81708 elements, new values have 81479 elements

Where tess is an enclosed tessellation generated from the buildings blg.

@martinfleis
Copy link
Member

Thanks for the report!

momepy.CellAlignment expects the left and right GeoDataFrames (containing buildings and tessellation cells) to have the same number of elements, ie each building corresponds to one cell (which is the case in morphological tessellation).

Actually, it doesn't. In the example below, there are more enclosed tessellation cells than buildings and it works fine.

import geopandas
import momepy

blg = geopandas.read_file(momepy.datasets.get_path('bubenec'), layer="buildings")
streets = geopandas.read_file(momepy.datasets.get_path('bubenec'), layer="streets")

enclosures = momepy.enclosures(streets, limit=geopandas.GeoSeries([streets.unary_union.convex_hull]))
tessellation = momepy.Tessellation(blg, "uID", enclosures=enclosures).tessellation

blg['orient_blg'] = momepy.Orientation(blg).series
tessellation['orient_tess'] = momepy.Orientation(tessellation).series

momepy.CellAlignment(blg, tessellation, 'orient_blg', 'orient_tess', 'uID', 'uID').series
0       0.290739
1       0.547136
2       5.486106
3       0.263380
4       0.165063
         ...    
139     0.185136
140    17.566047
141     0.386790
142     0.010713
143     0.438541
Length: 144, dtype: float64

print(blg.shape, tessellation.shape)
(144, 3) (153, 4)

So the issue is not in the non-matching number of elements.

I can reproduce the issue if I have duplicated unique IDs in the tessellation layer. That can happen when a building intersects two enclosures. It is generally not safe to use a building unique ID as a unique identifier of enclosures.

One way around this I was using is to combine both geometries (buildings and tessellation) into the same GeoDataFrame and then dynamically change the active geometry.

# renaming to keep track of which is which and merging together
combined = tessellation.rename_geometry("tessellation").merge(blg.rename_geometry("buildings"), on="uID", how="left")
combined["unique_id"] = range(len(combined))  # new truly unique ID

# switching active geometry from default tessellation
momepy.CellAlignment(combined.set_geometry("buildings"), combined, 'orient_blg', 'orient_tess', 'unique_id', 'unique_id').series

I am not sure if this is a bug or not though :D.

@matthew-law
Copy link
Contributor Author

Thanks for having a look at this and hopefully getting to the bottom of the issue!

Would it be a bad idea to do a geometric overlay of streets (ie the edges of enclosures) and buildings as a preprocessing step? It should solve the issue of duplicate unique IDs, but it would mess up some of the metrics based on (eg) building area etc (by splitting buildings which overlap with streets). But I would assume that it is a relatively small number of buildings that cross a road (especially since my streets data only contains roads for cars, and not pedestrian-only streets), so hopefully this wouldn't have a significant effect on the overall results. I did try to find which buildings crossed which streets in my data, but crashed QGIS in the attempt so have yet to do so.

Alternatively, could there be some way to, for each building, get one of either a) the eID of the enclosure that the building is entirely within (this will be the case for most buildings), or b) the eID of the enclosure that the largest portion of the building is in, if it is split between two buildings? This could then be used in place of the uIDs that were previously used to match buildings and tessellation cells when running CellAlignment.

@martinfleis
Copy link
Member

I don't think there's a universal solution for this situation. You can do overlay if you know that you're not going to use that building geometry for anything meaningful in the future. You can do what I've done above but that causes trouble in AreaRatio as you may have a larger building than its cell (in that case, an overlay is the best solution).

I'd say that you need to know your data and your use case and apply a solution that works the best for that.

a) the eID of the enclosure that the building is entirely within (this will be the case for most buildings)

that is geopandas.sjoin with op="within".

b) the eID of the enclosure that the largest portion of the building is in, if it is split between two buildings?

that is tobler's area_join

@jGaboardi
Copy link
Member

Is this issue the same as #252?

@martinfleis
Copy link
Member

@jGaboardi it seems so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants