Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow non-regularly gridded and lat-lon coordinates #32

Merged
merged 42 commits into from
Nov 29, 2024

Conversation

joeloskarsson
Copy link
Contributor

@joeloskarsson joeloskarsson commented Oct 13, 2024

Describe your changes

This PR allows for more flexibility in the coordinates used to create graphs. They do no longer need to be on a regular $N_x \times N_y$ grid, but can be at arbitrary positions. This means that the main coordinate input no longer has the shape [2, Ny, Nx], but instead [N_grid_points, 2] with each column representing a coordinate for all grid points. This makes it easy to also allow for lat-lon input coordinates, and use a specified projection to map these to a euclidean space.

To get an overview of the lat-lon functionality I recommend checking out the new page I added to the documentation.

The motivation for this change is to allow more flexibility in the regions graphs are built for. This is in particular useful when we consider grid points as the union of some LAM-region and a boundary of global grid-points, possibly on different gridding. With this change such cases can be handled without issues.

There are a few noteworthy changes that comes with this feature:

  • I broke out the flat graph creation into a function create_flat_singlescale_mesh_graph, as it made more sense that also this type of mesh graph had its own function. This made sense to do here as the amount of related code was growing.
  • I also moved some utility functions around the testing (for creating dummy coordinate arrays) into their own tests/utils.py file. This made sense as these functions were duplicated in multiple files before and I had to change them with this new coordinate setup.
  • The argument grid_refinement_factor was earlier based on the number of grid nodes in x- and y-directions. As we do no longer assume that these grid nodes lay in regular rows and columns this definition will no longer work. Instead we will have to define the grid refinement in terms of "mesh nodes / distance", using the actual coordinate values. The argument is also being renamed to mesh_node_distance to match this.
  • In the high-level functions that are used to create graphs the xy argument is renamed to coords, as this can be lat-lons or euclidean x- and y-coordinates. If a projection is given this is assumed to be lat-lons, otherwise euclidean. For functions further down (e.g. create_single_level_2d_mesh_graph), that still take only euclidean coordinates, this argument is still called xy.

New dependencies

This PR introduces a dependency to cartopy, as that is needed for the projection specification when working with lat-lons.

Merge and release planning

  • It would be good if this change went in after Fix wrong number of mesh levels when grid is multiple of refinement factor #26 as both deal with the refinement factor calculations. (Now merged)
  • We might want to create a new release before merging this, as this is a breaking change that will alter how the main input to the package is formatted. It could be nice to separate all previous changes in a release before adding this on.

Issue Link

This PR closes #29

Type of change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📖 Documentation (Addition or improvements to documentation)

Checklist before requesting a review

  • My branch is up-to-date with the target branch - if not update your fork with the changes from the target branch (use pull with --rebase option if possible).
  • I have performed a self-review of my code
  • For any new/modified functions/classes I have added docstrings that clearly describe its purpose, expected inputs and returned values
  • I have placed in-line comments to clarify the intent of any hard-to-understand passages of my code
  • I have updated the documentation to cover introduced code changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have given the PR a name that clearly describes the change, written in imperative form (context).
  • I have requested a reviewer and an assignee (assignee is responsible for merging)

Checklist for reviewers

Each PR comes with its own improvements and flaws. The reviewer should check the following:

  • the code is readable
  • the code is well tested
  • the code is documented (including return types and parameters)
  • the code is easy to maintain

Author checklist after completed review

  • I have added a line to the CHANGELOG describing this change, in a section
    reflecting type of change (add section where missing):
    • added: when you have added new functionality
    • changed: when default behaviour of the code has been changed
    • fixes: when your contribution fixes a bug

Checklist for assignee

  • PR is up to date with the base branch
  • the tests pass
  • author has added an entry to the changelog (and designated the change as added, changed or fixed)
  • Once the PR is ready to be merged, squash commits and merge the PR.

@joeloskarsson joeloskarsson changed the title Allow non-regularly gridded coordinates Allow non-regularly gridded and lat-lon coordinates Oct 16, 2024
@joeloskarsson joeloskarsson self-assigned this Oct 16, 2024
@joeloskarsson joeloskarsson marked this pull request as ready for review October 16, 2024 15:43
@joeloskarsson
Copy link
Contributor Author

This is now ready for review 🥳

@joeloskarsson joeloskarsson requested a review from sadamov October 16, 2024 15:43
@leifdenby leifdenby added this to the v0.3.0 milestone Nov 20, 2024
@leifdenby
Copy link
Member

leifdenby commented Nov 20, 2024

I'm in the process of giving this a thorough read through and executing the notebooks etc as I go through. In general this is really excellent (as usual), and I fully agree with all of the following three changes:

  • I broke out the flat graph creation into a function create_flat_singlescale_mesh_graph, as it made more sense that also this type of mesh graph had its own function. This made sense to do here as the amount of related code was growing.

  • I also moved some utility functions around the testing (for creating dummy coordinate arrays) into their own tests/utils.py file. This made sense as these functions were duplicated in multiple files before and I had to change them with this new coordinate setup.

  • The argument grid_refinement_factor was earlier based on the number of grid nodes in x- and y-directions. As we do no longer assume that these grid nodes lay in regular rows and columns this definition will no longer work. Instead we will have to define the grid refinement in terms of "mesh nodes / distance", using the actual coordinate values. The argument is also being renamed to mesh_node_distance to match this.

The following I also really like, but I have a question/suggestion wrt it:

  • In the high-level functions that are used to create graphs the xy argument is renamed to coords, as this can be lat-lons or euclidean x- and y-coordinates. If a projection is given this is assumed to be lat-lons, otherwise euclidean. For functions further down (e.g. create_single_level_2d_mesh_graph), that still take only euclidean coordinates, this argument is still called xy.

As I am reading it the intended use in your new implementation has two main use cases (in terms of how the coordinates are defined):

# 1. Using the coordinate values as-is, measuring distance in units of
# coordinate values with the euclidean distance metric. Relevant when working
# with cartesian coordinates.
graph = wmg.create.archetype.create_keisler_graph(
    coords,
    mesh_node_distance=10,  # distance in units of coordinate values, measured as euclidean distance
    projection=None,
)

# 2. First projecting the coordinates to a different coordinate system,
# measuring distance in units of the projection with the euclidean distance
# metric. Relevant when working with lat/lon coordinates.
graph = wmg.create.archetype.create_keisler_graph(
    coords,
    mesh_node_distance=1.0e6,  # distance in units of projected coordinates, measured as euclidean distance
    projection=ae_projection
)

with the current function signature being:

def create_keisler_graph(
    coords: np.ndarray,
    mesh_node_distance: float,
    projection: Optional[cartopy.crs.CRSj] = None,
) -> nx.Graph:

Where it is assumed that distance is measured with the euclidean distance metric, with either the coordinate values as-is or first projecting.

I think it would be cool to extend this in future by allowing for a different distance metric to be used. That would allow us to create graphs from lat/lon coordinates directly (and thereby have graphs that fully wrap the Earth).

For example we could then do:

# 1. Using the coordinate values as-is, measuring distance in units of
# coordinate values with the euclidean distance metric. This would be useful
# for creating graphs from cartesian coordinates.
graph = wmg.create.archetype.create_keisler_graph(
    coords,
    mesh_node_distance=10,
    projection=None,
    distance_metric="euclidean"
)

# 2. First projecting the coordinates to a different coordinate system,
# measuring distance in units of the projection with the euclidean distance
# metric. This would be useful for creating graphs from lat/lons for limited
# areas.
graph = wmg.create.archetype.create_keisler_graph(
    coords,
    mesh_node_distance=mesh_distance,
    projection=ae_projection,
    distance_metric="euclidean"
)

# 3. Using the coordinate values as-is, measuring distance in great circle
# distance with the haversine distance metric. This would be useful for
# creating graphs from lat/lon coordinates directly. We should default to the
# distance being in SI units (as in meters) and so use a fixed Earth radius in
# meters.
graph = wmg.create.archetype.create_keisler_graph(
    coords,
    mesh_node_distance=1.0e4,  # distance of 10 km between coordinates given in lat/lon
    projection=None,
    distance_metric="haversine"
)

With the function signature changed to:

def create_keisler_graph(
    coords: np.ndarray,
    mesh_node_distance: float,
    projection: Optional[cartopy.crs.CRS] = None,
    distance_metric: str = "euclidean"
) -> nx.Graph:

So that by default the function will use euclidean distance with the coordinate values, but the metric can be changed to haversine (which gives approximate great circle distances in meters between lat/lon coordinates, assuming a fixed Earth radius).

All that is then required is to pass the distance_metric argument down the function calls and where the actual distance is calculated and used implement the logic for the different distance metrics.

What do you think of this idea? If it sounds good to you then maybe we could go ahead and add distance_metric as an argument to the function (with default value euclidean) to make the metric used explicit, and then that would make it easy to add the haversine distance metric in the future.

Of course one could argue that it should be possible to provide the Earth radius, but maybe that could become an optional metric_argument down the line. Or we allow people to optionally provide a projection that makes it clear that the coordinates are lat/lon and in that projection object they can then set the globe radius (e.g. with cartopy.crs.PlateCarree(globe=cartopy.crs.Globe(semimajor_axis=..., semiminor_axis=...)) or something like that)

Copy link
Member

@leifdenby leifdenby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really great to me. There isn't that you've written that I think is incorrect, I've only made some minor suggestions for small improvements. And then there's the suggestion to make the distance metric used explicit so that we can later introduce other distance metrics (for example haversine)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe where we create the coordinate values here we could point the reader towards your notebook on working with lat/lons in case that's what they have? Something like

... (x/y cartesian, see [notebook link] on how to use lat/lon coordinate values in weather-model-graphs)

Copy link
Contributor Author

@joeloskarsson joeloskarsson Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested locally that this way of linking to another notebook works in jupyter-book.

pyproject.toml Outdated Show resolved Hide resolved
src/weather_model_graphs/create/grid/grid.py Show resolved Hide resolved
src/weather_model_graphs/create/mesh/kinds/hierarchical.py Outdated Show resolved Hide resolved
src/weather_model_graphs/create/mesh/mesh.py Outdated Show resolved Hide resolved
tests/test_graph_creation.py Outdated Show resolved Hide resolved
docs/creating_the_graph.ipynb Outdated Show resolved Hide resolved
src/weather_model_graphs/create/archetype.py Show resolved Hide resolved
@joeloskarsson
Copy link
Contributor Author

About the idea of having a distance_metric="euclidean" keyword argument:

I think that this is a good idea in general, but it seems to be quite a big project to generalize beyond euclidean metrics. I also wonder if the different ways to build graphs that are interesting are as easily summarized as just switching out a distance metric in the same algorithm. I have not given that any deeper thought, it's just not obvious to me that you would want to build graphs using lat-lons with the same algorithm, just a different metric. And if the algorithm is completely different, then maybe it's better to just have a different type of switch, e.g. graph_creation_method=.... Just some thoughts for the future :)

But for this PR, while I think this keyword argument could be nice, I don't think it belongs with this change. This does not generalize the metric in any way, so it would just be adding an option that only has one accepted value. I think that option should be added in the potential future PR that actually generalizes the metric option instead.

@leifdenby
Copy link
Member

About the idea of having a distance_metric="euclidean" keyword argument:

I think that this is a good idea in general, but it seems to be quite a big project to generalize beyond euclidean metrics. I also wonder if the different ways to build graphs that are interesting are as easily summarized as just switching out a distance metric in the same algorithm. I have not given that any deeper thought, it's just not obvious to me that you would want to build graphs using lat-lons with the same algorithm, just a different metric. And if the algorithm is completely different, then maybe it's better to just have a different type of switch, e.g. graph_creation_method=.... Just some thoughts for the future :)

But for this PR, while I think this keyword argument could be nice, I don't think it belongs with this change. This does not generalize the metric in any way, so it would just be adding an option that only has one accepted value. I think that option should be added in the potential future PR that actually generalizes the metric option instead.

Ok, thanks for the feedback :) I am glad you like the idea anyway. I have quickly hacked together what I meant here: #37, and it seems to be working :) So I will continue with that for now, but glad to hear that this idea seems to fit.

@joeloskarsson
Copy link
Contributor Author

This is ready for another look @leifdenby . All comments above should be fixed.

Copy link
Member

@leifdenby leifdenby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only some very minor comments/suggestions left. Great!

docs/lat_lons.ipynb Show resolved Hide resolved
mesh_node_distance: float
Distance (in x- and y-direction) between created mesh nodes,
in coordinate system of coords
projection: cartopy.crs.CRS or None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this docstring line needs updating to reflect that coords_crs and graph_crs are now provided separately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, forgot to change those docstrings as well. Thanks!

level_refinement_factor: int
Refinement factor between grid points and bottom level of mesh hierarchy
NOTE: Must be an odd integer >1 to create proper multiscale graph
max_num_levels: int
The number of levels of longer-range connections in the mesh graph.
projection: cartopy.crs.CRS or None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this docstring line needs updating to reflect that coords_crs and graph_crs are now provided separately.

level_refinement_factor: float
Refinement factor between grid points and bottom level of mesh hierarchy
projection: cartopy.crs.CRS or None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And there, I think this docstring line needs updating to reflect that coords_crs and graph_crs are now provided separately.

src/weather_model_graphs/create/base.py Outdated Show resolved Hide resolved
src/weather_model_graphs/create/mesh/mesh.py Show resolved Hide resolved
tests/test_graph_creation.py Show resolved Hide resolved
joeloskarsson and others added 2 commits November 29, 2024 14:45
@joeloskarsson
Copy link
Contributor Author

Should be all fixed now @leifdenby :)

Copy link
Member

@leifdenby leifdenby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, just minor tweak suggested to make the change to coordinate shape explicit in CHANGELOG

CHANGELOG.md Outdated Show resolved Hide resolved
joeloskarsson and others added 2 commits November 29, 2024 16:24
Co-authored-by: Leif Denby <leif@denby.eu>
@joeloskarsson joeloskarsson merged commit d909a1c into mllam:main Nov 29, 2024
3 checks passed
@joeloskarsson joeloskarsson deleted the general_coordinates branch November 29, 2024 15:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow non-regularly gridded and lat-lon coordinates
4 participants