
Improve performance for molecular dynamics simulations #359

Open
aprokop opened this issue Aug 13, 2020 · 2 comments
Labels: application (Anything to support specific applications), performance (Something is slower than it should be)

Comments

aprokop (Contributor) commented Aug 13, 2020

Problem

Given a set of 3D points (particles) and a radius, for every particle find all particles within a sphere of the given radius.

Of note:

  • Finding extra particles outside of the sphere (false positives) may be acceptable
  • There are two ways to get the result: a full neighbor list and a half neighbor list
    Half-neighbor lists record each edge only once (see the sketch after this list)
  • The output format may affect performance
    CSR vs 2D
  • There is potentially a way to use callbacks
    Neighbor construction is followed by force calculations
  • There are simulations that use multiple radii
    Solvent/colloid simulations
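
For concreteness, here is a minimal brute-force sketch of the two output conventions (hypothetical code, not taken from ArborX or CabanaMD; all names and signatures are illustrative):

```cpp
#include <array>
#include <vector>

using Point = std::array<float, 3>;

bool withinRadius(Point const &a, Point const &b, float r2)
{
  float dx = a[0] - b[0], dy = a[1] - b[1], dz = a[2] - b[2];
  return dx * dx + dy * dy + dz * dz <= r2;
}

// Full list: every pair (i, j) is reported twice, once from each endpoint.
std::vector<std::vector<int>> fullNeighborList(std::vector<Point> const &pts, float r)
{
  int const n = static_cast<int>(pts.size());
  std::vector<std::vector<int>> list(n);
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < n; ++j)
      if (i != j && withinRadius(pts[i], pts[j], r * r))
        list[i].push_back(j);
  return list;
}

// Half list: restricting to j > i stores each edge exactly once.
std::vector<std::vector<int>> halfNeighborList(std::vector<Point> const &pts, float r)
{
  int const n = static_cast<int>(pts.size());
  std::vector<std::vector<int>> list(n);
  for (int i = 0; i < n; ++i)
    for (int j = i + 1; j < n; ++j)
      if (withinRadius(pts[i], pts[j], r * r))
        list[i].push_back(j);
  return list;
}
```

A CSR output would flatten these per-particle vectors into a single indices array plus an offsets array, which tends to be friendlier to GPUs than a 2D (particle × max-neighbors) layout.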

Current status (CabanaMD)

ArborX lags behind the Verlet list for uniform particle distributions, but is a clear win for non-uniform ones.

Things to try with the current ArborX implementation

Things to consider implementing in ArborX

Papers from HOOMD group:

[1] Howard, M. P., Anderson, J. A., Nikoubashman, A., Glotzer, S. C., & Panagiotopoulos, A. Z. (2016). Efficient neighbor list calculation for molecular simulation of colloidal systems using graphics processing units. Computer Physics Communications, 203, 45-52.

[2] Howard, M. P., Statt, A., Madutsa, F., Truskett, T. M., & Panagiotopoulos, A. Z. (2019). Quantized bounding volume hierarchies for neighbor search in molecular simulations on graphics processing units. Computational Materials Science, 164, 139-146.

The code is available here: https://github.com/mphowardlab/neighbor

Summary of [2] (as I understand it):

  • Use standard LBVH (Karras), not a high quality tree
  • Use stackless traversal using ropes
  • Use quantization to compress node size to 16 bytes
    • Filtering out false-positives is optional

The main difference of [2] compared to [1] is quantization; the claim is that this is what allowed them to beat grid-based methods. Essentially, node bounds are snapped to the boundaries of Morton cells (2^10 in each direction), as illustrated below.
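
A minimal sketch of how such quantization could look (my reading of [2], not their code; names are illustrative, and [2] bit-packs everything into 16 bytes, which this sketch skips for readability):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Assumed layout: [2] packs six 10-bit bounds plus child/rope indices into
// 16 bytes; here the bounds are kept in unpacked uint16_t fields (20 bytes).
struct QuantizedNode
{
  std::uint16_t lo[3], hi[3]; // quantized bounds (only 10 bits of each used)
  std::int32_t left, rope;    // child index and rope (skip) index
};

// Snap a coordinate to the 2^10 grid over [scene_min, scene_min + scene_extent],
// rounding down for lower bounds...
std::uint16_t quantizeDown(float x, float scene_min, float scene_extent)
{
  float t = std::clamp((x - scene_min) / scene_extent, 0.f, 1.f);
  return static_cast<std::uint16_t>(std::floor(t * 1023.f));
}

// ...and up for upper bounds, so the quantized box always contains the
// original one: false positives are possible, false negatives are not.
std::uint16_t quantizeUp(float x, float scene_min, float scene_extent)
{
  float t = std::clamp((x - scene_min) / scene_extent, 0.f, 1.f);
  return static_cast<std::uint16_t>(std::ceil(t * 1023.f));
}

// Expansion back to single precision, e.g. inside getBoundingVolume().
float expand(std::uint16_t q, float scene_min, float scene_extent)
{
  return scene_min + (q / 1023.f) * scene_extent;
}
```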

aprokop added the performance and application labels Aug 13, 2020
aprokop (Contributor, Author) commented Aug 13, 2020

I think we can try the algorithms of [2] (or approximations of them) without too much trouble:

  • Quantization is pretty straightforward. Need to update Node storage and then do coordinate expansion from quantized to single precision inside getBoundingVolume()
  • Ropes may not be too hard
    • Rope traversal is pretty straightforward to implement (see the sketch below)
      The main downside is that it only works for spatial traversal and is incompatible with nearest queries
    • Rope construction may be slightly tricky
      It should only be considered after merging Apetrei (Apetrei's construction algorithm #350), as it depends on the construction algorithm. There is also an autorope algorithm.

Update: rope traversal implemented in #364.
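
For reference, a generic sketch of stackless rope traversal (the general technique; ArborX's actual implementation in #364 differs in details, and the node layout and leaf encoding here are assumptions):

```cpp
#include <cstdint>

constexpr std::int32_t SENTINEL = -1; // rope value marking end of traversal

struct Node
{
  std::int32_t left; // first child; assumed leaf encoding: left = ~payload < 0
  std::int32_t rope; // next node to visit when this subtree is skipped or done
  float lo[3], hi[3];
};

template <class Predicate, class Callback>
void ropeTraversal(Node const *nodes, Predicate const &overlaps,
                   Callback const &onLeaf)
{
  std::int32_t node = 0; // root assumed at index 0
  while (node != SENTINEL)
  {
    Node const &n = nodes[node];
    if (overlaps(n))
    {
      if (n.left < 0)
      {
        onLeaf(~n.left); // recover leaf payload index
        node = n.rope;   // leaf processed, follow the rope
      }
      else
        node = n.left;   // descend; the sibling is reached via ropes
    }
    else
      node = n.rope;     // prune: skip the entire subtree
  }
}
```

No stack is needed because the ropes encode, for every node, where traversal continues after its subtree, which is why this only works for spatial (not nearest) queries.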

aprokop (Contributor, Author) commented Aug 19, 2020

Quantization

In the papers [1] and [2], they use the following layout with (a) original and (b) quantized (c/o [2]):
[Figure: node memory layout with (a) original and (b) quantized bounds, reproduced from [2]]

Here are some thoughts on quantization.

Can we live with false positives?

For MD, likely yes, as long as their number is comparatively small (in [2], they see ~5%). For other problems, like the halo finder, likely no. And we can't really postprocess/filter them out, as the information about the true boxes is lost after construction (unless we store the original full-precision boxes, which is a bad idea).

If we can't live with false positives, then what?

We would need to separate internal and leaf node types. Internal nodes would be compressed, while leaf nodes would not. Since a tree with N leaves has N-1 internal nodes, this means roughly a 25% memory reduction instead of 50%. I think we'd like to know the ratio of internal to leaf nodes hit during traversal: it's at least 1:1 but may be better, in which case the effective savings during traversal would exceed 25%.

Is quantization better than using low precision floating points?

Not sure. We would want to examine how it compares with an (unconventional) 10-bit floating-point number with a 6-bit mantissa and 4-bit exponent, from both the error and the performance points of view. Independent of how they are constructed, 10 bits give 2^10 distinct values; so if the partition of the range is not uniform, some bins are necessarily larger than the uniform ones. Therefore, the original uniform quantization seems preferable for minimizing the worst-case error.
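
A back-of-the-envelope illustration of that argument (my own arithmetic, not from [1] or [2]):

```cpp
// Worst-case bin width of a hypothetical 10-bit float (6-bit mantissa,
// 4-bit exponent) vs. uniform 10-bit quantization over a scene extent L.
#include <cstdio>

int main()
{
  double const L = 1.0;              // normalized scene extent
  double const uniform = L / 1024.0; // 2^10 equal bins everywhere
  // In the top binade [L/2, L) a float with a 6-bit mantissa has spacing
  // (L/2) * 2^-6 = L/128: its worst-case bin is 8x wider than uniform,
  // while bins near zero are needlessly fine.
  double const float_top = (L / 2.0) / 64.0;
  std::printf("uniform: %g  float (top binade): %g  ratio: %g\n",
              uniform, float_top, float_top / uniform);
}
```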

Is the information in the node sufficient to perform operations?

No. With quantized bounds, intersecting them with user data requires first expanding the stored data using the scene bounds. Thus, that additional information (the scene bounding box) would need to be passed to those calls. On the other hand, if the user's query is transformed to the quantized space prior to traversal, that is not necessary; a sketch of this is below. But this approach results in false positives, so see the first point above.
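
A sketch of that query-side transformation (hypothetical code, reusing quantizeDown/quantizeUp from the earlier quantization sketch):

```cpp
#include <cstdint>

// Declared in the earlier quantization sketch.
std::uint16_t quantizeDown(float x, float scene_min, float scene_extent);
std::uint16_t quantizeUp(float x, float scene_min, float scene_extent);

struct QuantizedQuery
{
  std::uint16_t lo[3], hi[3];
};

// Conservative rounding (lower down, upper up) is what introduces the
// false positives mentioned above, but never loses a true neighbor.
QuantizedQuery quantizeQuery(float const lo[3], float const hi[3],
                             float const scene_min[3],
                             float const scene_extent[3])
{
  QuantizedQuery q;
  for (int d = 0; d < 3; ++d)
  {
    q.lo[d] = quantizeDown(lo[d], scene_min[d], scene_extent[d]);
    q.hi[d] = quantizeUp(hi[d], scene_min[d], scene_extent[d]);
  }
  return q;
}

// Node test during traversal: pure integer comparisons, no scene bounds needed.
template <class QuantizedNode>
bool overlaps(QuantizedQuery const &q, QuantizedNode const &n)
{
  for (int d = 0; d < 3; ++d)
    if (q.hi[d] < n.lo[d] || q.lo[d] > n.hi[d])
      return false;
  return true;
}
```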

aprokop mentioned this issue Dec 12, 2021