
Fix bond deletion #4763

Merged: 8 commits into MDAnalysis:develop, Oct 26, 2024

Conversation

@lilyminium (Member) commented Oct 25, 2024

Fixes #4762

Changes made in this Pull Request:

  • Changes Topology creation and addition to accept only a single copy of an identical bond/angle/dihedral/etc. instead of multiple (see discussion in the PR)
  • Changes bond/angle/etc. deletion to delete all copies of an identical bond when that bond is passed in for deletion
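The deletion behaviour described above can be modelled in a few lines of plain Python; the function name and data layout here are simplified stand-ins for the actual Topology machinery, purely for illustration:

```python
# Minimal model of the PR's deletion semantics: deleting a bond removes
# every copy of it, and (i, j) is treated the same as (j, i).
def delete_bonds(bonds, to_delete):
    doomed = {frozenset(b) for b in to_delete}
    return [b for b in bonds if frozenset(b) not in doomed]

bonds = [(0, 1), (0, 1), (1, 2)]          # duplicate (0, 1) bond
remaining = delete_bonds(bonds, [(1, 0)])  # reversed order still matches
print(remaining)                           # → [(1, 2)]: both copies of (0, 1) are gone
```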

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

Developer's Certificate of Origin


📚 Documentation preview 📚: https://mdanalysis--4763.org.readthedocs.build/en/4763/

@pep8speaks commented Oct 25, 2024

Hello @lilyminium! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2024-10-26 04:28:34 UTC

@@ -146,7 +146,7 @@ def parse():
struc = parse()

assert hasattr(struc, 'bonds')
-assert len(struc.bonds.values) == 4
+assert len(struc.bonds.values) == 2
@lilyminium (Member, Author) commented:
These used to be bonds between [(0, 1), (0, 1), (1, 2), (1, 2)] -- so two bonds repeated. This change makes it so each bond is unique.
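The collapse from four entries to two can be sketched with an order-preserving, direction-insensitive dedup (illustrative code, not the actual parser):

```python
# Keep the first occurrence of each bond; (i, j) and (j, i) count as the same bond.
def unique_bonds(bonds):
    seen = {}
    for b in bonds:
        seen.setdefault(frozenset(b), b)
    return list(seen.values())

bonds = [(0, 1), (0, 1), (1, 2), (1, 2)]
print(unique_bonds(bonds))   # → [(0, 1), (1, 2)], i.e. len goes from 4 to 2
```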

@@ -158,7 +158,7 @@ def parse():
with pytest.warns(UserWarning):
struc = parse()
assert hasattr(struc, 'bonds')
-assert len(struc.bonds.values) == 2
+assert len(struc.bonds.values) == 1
@lilyminium (Member, Author) commented:
Similar to above.

codecov bot commented Oct 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.15%. Comparing base (d7153b9) to head (c64627e).
Report is 3 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #4763      +/-   ##
===========================================
- Coverage    93.59%   93.15%   -0.45%     
===========================================
  Files          177       12     -165     
  Lines        21708     1066   -20642     
  Branches      3052        0    -3052     
===========================================
- Hits         20318      993   -19325     
+ Misses         943       73     -870     
+ Partials       447        0     -447     


@IAlibay (Member) commented Oct 25, 2024

@lilyminium this PR is marked as draft - is it meant to be up for review or is there more work that needs to be done?

@IAlibay (Member) left a review comment:

Overall this looks good to me -- however, there's one likely outcome of this "behaviour change" that might have a real impact.

Some software intentionally misuses the PDB format's CONECT records to represent different bond orders -- PyMOL is a good example of this, iirc. In those cases, my understanding is that with this change you would lose that information?

As far as I'm aware, the PDB writer would lose that information even if you have duplicate bonds, because we call set() on the bonds of all the atoms (which should squash out duplicates). So there's probably a good case to be made that supporting multi-CONECT PDBs is not something we want to do anyway.
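The duplicated-CONECT convention described above could be recovered by counting repeated pairs; the record text and the parsing here are a simplified PDB-style illustration, not MDAnalysis code:

```python
# Some tools repeat a CONECT pair once per bond order, so counting
# duplicates recovers the order. Each bond appears from both atoms'
# records, so the raw count is halved at the end.
from collections import Counter

conect_lines = [
    "CONECT    1    2    2",        # atom 1 bonded to atom 2 twice -> double bond
    "CONECT    2    1    1    3",
    "CONECT    3    2",
]

orders = Counter()
for line in conect_lines:
    serials = [int(x) for x in line.split()[1:]]
    origin, partners = serials[0], serials[1:]
    for p in partners:
        orders[frozenset((origin, p))] += 1

bond_orders = {tuple(sorted(pair)): n // 2 for pair, n in orders.items()}
print(bond_orders)   # → {(1, 2): 2, (2, 3): 1}
```

Calling set() on such a bond list, as the PDB writer effectively does, would collapse the repeated (1, 2) pair and lose the order-2 information.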

I'm not particularly sure that I care about the behaviour change, but I do want to cc @MDAnalysis/coredevs (especially @orbeckst @yuxuanzhuang and @RMeli ) who might have some more immediate thoughts on the whole thing.

@IAlibay (Member) commented Oct 25, 2024

Failures aren't happening in other PRs - it looks like something that has been specifically introduced here.

@IAlibay (Member) left a review comment:

We should try to revert to a set for the `not in` comparison, or use some other trick to avoid the performance loss.

 for v, t, g, o in zip(values, types, guessed, order):
-    if v not in existing:
+    if v not in self.values:
A member commented:
This change is likely the culprit - doing a not in operation on a list is orders of magnitude slower than on a set.

Example:

In [1]: import random

In [2]: lister = [random.random() for i in range(10000)]

In [3]: setter = set(lister)

In [4]: len(setter)
Out[4]: 10000

In [5]: len(lister)
Out[5]: 10000

In [6]: def notin(a):
   ...:     for v in range(10000):
   ...:         if v not in a:
   ...:             pass
   ...:     return
   ...: 

In [7]: %timeit notin(lister)
734 ms ± 5.85 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [8]: %timeit notin(setter)
129 μs ± 166 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

@lilyminium (Member, Author) commented Oct 26, 2024:
I removed this in the commit below, but note to self: changing self.values to a dict might be the easiest way to do this performantly while retaining both uniqueness and order.

Alternatively, masking the input values after checking against a set version of self.values might be less work and still more performant.
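Both ideas can be sketched in a few lines; the variable names (`existing`, `incoming`) are illustrative stand-ins, not the actual Topology attributes:

```python
# Option 1: a dict keyed by the bond gives uniqueness plus insertion order
# for free (Python dicts preserve insertion order since 3.7).
values = [(0, 1), (1, 2), (0, 1), (2, 3)]
deduped = list(dict.fromkeys(values))
print(deduped)   # → [(0, 1), (1, 2), (2, 3)]

# Option 2: mask incoming values against a set snapshot of the stored
# values, keeping each membership test O(1) instead of O(n) on a list.
existing = {(0, 1)}                       # stand-in for set(self.values)
incoming = [(1, 2), (0, 1), (2, 3)]
new = [v for v in incoming if v not in existing]
print(new)       # → [(1, 2), (2, 3)]
```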

@lilyminium (Member, Author) commented Oct 26, 2024

Hmm. Yes, that would be the case -- multi-bonds could no longer be sketchily represented by digging into Topology internals, which downstream people may well be doing.

I can think of two solutions to this in this PR:

  • Put that information in the order attribute (which TopologyObjects should have). This is a bit of work -- I may have time later today to push it through, but that's not guaranteed, and it means even more changes to current behaviour.
  • This PR fixes the deletion issue twice over: it first counts identical bonds only once when creating or adding to the topology object, and it secondly deletes all identical bonds when we try to delete one. If we take out the first part and just ensure that deletion works, we still fix the original issue. This is a lot less work.

@lilyminium (Member, Author) commented:
Pushed a commit that removes the "addition" fixes and preserves previous behaviour (should be easy to revert if discussion goes the other way)

@lilyminium (Member, Author) commented:
I'm leaning towards the simpler fix of just ensuring that all identical bonds are deleted when deletion is requested, as it's less disruptive to current behaviour; fixing the addition of identical bonds can happen in a separate issue/PR later. I've updated the changelog to reflect that -- it should still be easy to revert if there's any disagreement.

@lilyminium lilyminium marked this pull request as ready for review October 26, 2024 04:30
@lilyminium lilyminium added this to the Release 2.8.0 milestone Oct 26, 2024
@IAlibay (Member) commented Oct 26, 2024

Going with just the bond deletion fix seems like a reasonable thing for this release.

@IAlibay IAlibay merged commit 800b4b2 into MDAnalysis:develop Oct 26, 2024
9 checks passed
Successfully merging this pull request may close these issues.

delete_bonds not deleting all bonds