Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add containing_rectangle graph connection method for m2g #28

Merged
merged 10 commits into from
Oct 23, 2024

Conversation

joeloskarsson
Copy link
Contributor

@joeloskarsson joeloskarsson commented Oct 2, 2024

Describe your changes

This PR introduces a new method for connecting a mesh graph to the grid.

Background

In earlier work we have used 4-nearest-neighbors (4nn) to create the m2g connection. This is not terrible, but has its issues (see below), especially for grid cells along the edges (outside the area covered by the mesh graph).

Below is an example of how a 4nn m2g graph can look like, with a few edges highlighted

4nn_problem_illustration

  • In red: These edges are not really desirable, as they represent strangely long connections in the graph
  • In green: This edge does not exist, but it would be natural if it would. This mesh node represents one corner of the "rectangle" made up of 4 mesh nodes.

containing_rectangle

The 4nn method was somewhat inspired by how m2g is created in GraphCast, where each grid node is projected up to the corresponding triangular face containing it in the mesh graph. The containing_rectangle method is an attempt at achieving something closer to this for regular quadrilateral mesh graphs.

Starting with the end result, m2g edges created using the containing_rectangle look like

cont_example

The idea of this method is to similarly to GraphCast find the face of the mesh graph containing each grid node and connect the grid nodes with the edges of that face. Since we currently work with a regular layout for the mesh nodes these faces are rectangles, as illustrated below:

cont_example_grid

Note that this means that grid nodes along the edges will have less than 4 mesh connections.

Definition of edge set

The actual definition of this edge set is as follows: Let $d_x$ be the distance between the columns of mesh nodes in the x-direction and similarly $d_y$ in the y-direction. A mesh nodes with coordinates $(x_m, y_m)$ should be connected to a grid node with coordinates $(x_g, y_g)$ if

$$ (|x_g - x_m| < d_x) \& (|y_g - y_m| < d_y). $$

To avoid having to check all grid and mesh nodes pairwise for the expression above we in practice start by building a radius graph based on radius $r = \sqrt{d_x^2 + d_y^2}$. This gives an edge set that is a superset of the containing_rectangle edge set. We then filter this edge set using the expression above to get the final set of edges. Trying to build a KD-tree using this distance measure directly seemed a bit over-complicated and this two-step method works fast without any issues.

For now this method is only defined for m2g. It is not entirely clear how to do it for g2m, but it should be possible to extend also to those edges.

Issue Link

No relevant issue.

Type of change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📖 Documentation (Addition or improvements to documentation)

Checklist before requesting a review

  • My branch is up-to-date with the target branch - if not update your fork with the changes from the target branch (use pull with --rebase option if possible).
  • I have performed a self-review of my code
  • For any new/modified functions/classes I have added docstrings that clearly describe its purpose, expected inputs and returned values
  • I have placed in-line comments to clarify the intent of any hard-to-understand passages of my code
  • I have updated the documentation to cover introduced code changes
    • As far as I see there is no place in the docs where the different graph connections are discussed, so will not add that right now.
  • I have added tests that prove my fix is effective or that my feature works
  • I have given the PR a name that clearly describes the change, written in imperative form (context).
  • I have requested a reviewer and an assignee (assignee is responsible for merging)

Checklist for reviewers

Each PR comes with its own improvements and flaws. The reviewer should check the following:

  • the code is readable
  • the code is well tested
  • the code is documented (including return types and parameters)
  • the code is easy to maintain

Author checklist after completed review

  • I have added a line to the CHANGELOG describing this change, in a section
    reflecting type of change (add section where missing):
    • added: when you have added new functionality
    • changed: when default behaviour of the code has been changed
    • fixes: when your contribution fixes a bug

Checklist for assignee

  • PR is up to date with the base branch
  • the tests pass
  • author has added an entry to the changelog (and designated the change as added, changed or fixed)
  • Once the PR is ready to be merged, squash commits and merge the PR.

@TomasLandelius
Copy link

Just a thought - what about requiring that the mesh surrounds the grid so that all grid points have 4 decent nns?

@joeloskarsson
Copy link
Contributor Author

Me and @TomasLandelius discussed this a bit offline and realized the fact that the outermost nodes will always be on the boundary in LAM models, meaning that we don't actually care about the prediction for that node and hence its m2g connectivity pattern does not matter. This means that the long red edge in my first figure above is not really a problem. The "problem" with the second red edge remains.

This relates to the question of whether we should place the outermost mesh nodes to surround the grid or (as we do now) inside the bounding box of the grid. This basically boils down to a tradeoff of where we want the irregular connectivity pattern:

  1. The outermost grid nodes having a somewhat strange and irregular connectivity pattern (current situation), or
  2. The outermost mesh nodes having a somewhat strange and irregular connectivity pattern

Since we realize that we do not care about predicting for the outermost grid nodes (we don't even need to connect to these in m2g really), the first option is likely to prefer. The mesh nodes (also those over the boundary) will impact the prediction also for nodes within the state region.

@joeloskarsson joeloskarsson requested a review from sadamov October 14, 2024 06:20
@sadamov sadamov added the enhancement New feature or request label Oct 14, 2024
@sadamov sadamov modified the milestones: v0.1.1, v0.2.0 Oct 14, 2024
Copy link
Contributor

@sadamov sadamov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for this extension. The code is in good shape. I have some conceptual questions and one technical comment (mostly for my education I guess).

  • Once we start connecting realistic boundaries the state: is your statement below referring to the nodes at the border of the state or at the border of the boundary(+state)? I don't think we would want irregularities neither in grid nor mesh nodes at the border of the state, right?
    The outermost grid nodes having a somewhat strange and irregular connectivity pattern (current situation), or The outermost mesh nodes having a somewhat strange and irregular connectivity pattern

  • Could containing_rectangle be used for many shapes in the future (e.g. hex-grid). Will we add seperate functions for each type of grid or call this one containing_shape?

  • Looking at the roadmap I saw that graphtools is planned as an additional backend for v0.3.0. The motivation for the two-step approach (within-radius -> containing-rectangle) seems to be performance. I was wondering whether you would still do this in a highly optimized library like graphtools?

I am not complete sure from the roadmap whether this feature was planned for 0.1.1 or 0.2.0? From my side we can merge now and make sure that we add it to the roadmap accordingly.

# Connect to all nodes that could potentially be close enough,
# which is at a relative distance of 1
rad_graph = connect_nodes_across_graphs(
G_source, G_target, method="within_radius", rel_max_dist=1.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this 1.0 and not srt(dx^2 + dy^2)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. This is relative distance, which is relative to the longest edge in (the bottom level of) the source and target graphs. These longest edges will be the diagonal edges in each rectangle in the mesh graph, which have the length sqrt(dx^2 + dy^2). So that is exactly the distance of rel_max_dist=1.0.

I added a bit of a clarification about this, and also realized that there was nothing describing the rel_max_dist argument in the docstring, so fixed that as well.

@joeloskarsson
Copy link
Contributor Author

The outermost grid nodes having a somewhat strange and irregular connectivity pattern (current situation), or The outermost mesh nodes having a somewhat strange and irregular connectivity pattern

This was referring to the nodes at the border of the boundary(+state)? I was a bit unspecific (I sometimes think we should just define a full on list of terminology with figures for these things 😄 ). I agree that we would not want any form of irregularities at the border of the state (inner region) itself.

Could containing_rectangle be used for many shapes in the future (e.g. hex-grid). Will we add seperate functions for each type of grid or call this one containing_shape?

Sadly I think that this is quite specific to these rectangular mesh graphs, so I do not expect that this will trivially work for other mesh node layouts. However, for the hex-grid case (i.e. triangular mesh) I am quite sure that just 3-nn connections will achieve the same thing as this does for rectangular meshes. This just because of the geometry with triangles. The only issue with 3-nn in that case would be at the border of the state+boundary region, but as discussed above that does not really matter as we do not predict there.

Looking at the roadmap I saw that graphtools is planned as an additional backend for v0.3.0. The motivation for the two-step approach (within-radius -> containing-rectangle) seems to be performance. I was wondering whether you would still do this in a highly optimized library like graphtools?

I would still do this two-step approach, as this is not needed due to networkx being slow (there is actually not really any networkx functions being used here). This is just inherent to the problem, if you would check each pair of nodes if they should be connected that will always be $O(N^2)$, whereas this two-step approach is dominated by the kd-tree ball lookup which I think is $O(N^\frac{3}{2})$ (?, https://en.wikipedia.org/wiki/K-d_tree#Range_search), but very fast in practice.

I am not complete sure from the roadmap whether this feature was planned for 0.1.1 or 0.2.0? From my side we can merge now and make sure that we add it to the roadmap accordingly.

Yes, I think this was not on the original road map, but is mentioned in #1 (comment). It should probably go in under some version number though. I think it would be a good idea to create a new release here soon, in particular before #32 is merged in.

@joeloskarsson
Copy link
Contributor Author

@TomasLandelius did you want to go over this code in more detail or are you happy with the addition?

@TomasLandelius
Copy link

@joeloskarsson Sorry for being so slow. I'm happy with this!

@joeloskarsson joeloskarsson merged commit c3ba212 into mllam:main Oct 23, 2024
3 checks passed
@joeloskarsson joeloskarsson mentioned this pull request Oct 23, 2024
24 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants