Fix bond deletion #4763

lilyminium · 2024-10-25T07:12:21Z

Fixes #4762

Changes made in this Pull Request:

~~Changes Topology creation and addition to only accept a single identical bond/angle/dihedral/etc instead of multiple~~ (see discussion in PR)
Changes bond/angle/etc deletion to delete all copies of an identical bond if that bond is passed in for deletion

PR Checklist

Tests?
Docs?
CHANGELOG updated?
Issue raised/referenced?

Developers certificate of origin

I certify that this contribution is covered by the LGPLv2.1+ license as defined in our LICENSE and adheres to the Developer Certificate of Origin.

📚 Documentation preview 📚: https://mdanalysis--4763.org.readthedocs.build/en/4763/

pep8speaks · 2024-10-25T07:53:22Z

Hello @lilyminium! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2024-10-26 04:28:34 UTC

lilyminium · 2024-10-25T12:28:58Z

testsuite/MDAnalysisTests/topology/test_pdb.py

@@ -146,7 +146,7 @@ def parse():
        struc = parse()

    assert hasattr(struc, 'bonds')
-    assert len(struc.bonds.values) == 4
+    assert len(struc.bonds.values) == 2


These used to be bonds between [(0, 1), (0, 1), (1, 2), (1, 2)] -- so two bonds repeated. This change makes it so each bond is unique.

lilyminium · 2024-10-25T12:29:09Z

testsuite/MDAnalysisTests/topology/test_pdb.py

@@ -158,7 +158,7 @@ def parse():
    with pytest.warns(UserWarning):
        struc = parse()
    assert hasattr(struc, 'bonds')
-    assert len(struc.bonds.values) == 2
+    assert len(struc.bonds.values) == 1


Similar to above.

codecov · 2024-10-25T13:03:11Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.15%. Comparing base (d7153b9) to head (c64627e).
Report is 3 commits behind head on develop.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #4763      +/-   ##
===========================================
- Coverage    93.59%   93.15%   -0.45%     
===========================================
  Files          177       12     -165     
  Lines        21708     1066   -20642     
  Branches      3052        0    -3052     
===========================================
- Hits         20318      993   -19325     
+ Misses         943       73     -870     
+ Partials       447        0     -447

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

IAlibay · 2024-10-25T19:07:02Z

@lilyminium this PR is marked as draft - is it meant to be up for review or is there more work that needs to be done?

IAlibay

Overall this looks good to me - however there's one likely outcome of this "behaviour change" that might have a real impact.

Some software intentionally misuse the PDB format's CONECT records to represent different bond orders - PyMol is a good example of this iirc. In those cases, my understanding is that with this change you would lose that information?

As far as I'm aware, the PDB writer would lose that information even if you have duplicate bonds because we call a set on the bonds of all the atoms (which should squash out duplicates?). So there's probably a good case to be made that support for multi-conect PDBs is not something we want to do anyways.

I'm not particularly sure that I care about the behaviour change, but I do want to cc @MDAnalysis/coredevs (especially @orbeckst @yuxuanzhuang and @RMeli ) who might have some more immediate thoughts on the whole thing.

IAlibay · 2024-10-25T21:56:19Z

Failures aren't happening in other PRs - it looks like something that has been specifically introduced here.

IAlibay

We should try to revert to a set for the not in comparison, or do some other kind of trick to avoid the performance loss.

IAlibay · 2024-10-25T22:04:21Z

package/MDAnalysis/core/topologyattrs.py

        for v, t, g, o in zip(values, types, guessed, order):
-            if v not in existing:
+            if v not in self.values:


This change is likely the culprit - doing a not in operation on a list is orders of magnitude slower than on a set.

Example:

In [1]: import random In [2]: lister = [random.random() for i in range(10000)] In [3]: setter = set(lister) In [4]: len(setter) Out[4]: 10000 In [5]: len(lister) Out[5]: 10000 In [6]: def notin(a): ...: for v in range(10000): ...: if v not in a: ...: pass ...: return ...: In [7]: %timeit notin(lister) 734 ms ± 5.85 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) In [8]: %timeit notin(setter) 129 μs ± 166 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

I removed this in the commit below, but note to self that changing self.values to a dict might be the easiest way to do this performantly and retain both uniqueness and order.

Or masking the input values after checking a set version of self.values might be less work but still more performant.

lilyminium · 2024-10-26T02:15:34Z

Hmm. Yes that would be the case that multi-bonds can't be sketchily represented by digging into Topology internals, which downstream people may well be doing.

I can think of two solutions to this in this PR:

put that information in the order attribute (which TopologyObjects should have) -- a bit of work, may have time later today to push it through but not guaranteed. Also, even more changes in current behaviour
This PR fixes the deletion issue twice over. It first only counts identical bonds once when creating /adding to the topology object. It secondly also deletes all identical bonds when we try to delete. If we take out the first and just ensure that deletion works, we should still fix the original issue. This is a lot less work

lilyminium · 2024-10-26T02:21:08Z

Pushed a commit that removes the "addition" fixes and preserves previous behaviour (should be easy to revert if discussion goes the other way)

lilyminium · 2024-10-26T04:29:55Z

I'm leaning towards the simpler fix of just ensuring that all identical bonds are deleted when deletion is caused, as it's less disruptive to current behaviour, and potentially fixing the addition of identical bonds in a separate issue/PR later, so I've updated the changelog too to reflect that -- should still be easy to revert if there's any disagreement

IAlibay · 2024-10-26T10:02:19Z

Going with just the bond deletion fix seems like a reasonable thing for this release.

lilyminium added 3 commits October 25, 2024 18:11

add failing test

46f5b15

add fix

5485b02

update changelog

0f1b83e

lilyminium mentioned this pull request Oct 25, 2024

Allow bonds etc to be additively guessed when present #4761

Merged

5 tasks

lilyminium added 2 commits October 25, 2024 18:54

rm comment

64be27b

change tests

367e2c4

lilyminium commented Oct 25, 2024

View reviewed changes

Merge branch 'develop' into fix-bond-deletion

560ad70

IAlibay reviewed Oct 25, 2024

View reviewed changes

IAlibay requested changes Oct 25, 2024

View reviewed changes

remove "adding" fixes -- keep only "deletion" fixes

fa68282

update changelog [skip ci]

c64627e

lilyminium marked this pull request as ready for review October 26, 2024 04:30

lilyminium added this to the Release 2.8.0 milestone Oct 26, 2024

IAlibay approved these changes Oct 26, 2024

View reviewed changes

IAlibay merged commit 800b4b2 into MDAnalysis:develop Oct 26, 2024
9 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix bond deletion #4763

Fix bond deletion #4763

lilyminium commented Oct 25, 2024 •

edited

Loading

pep8speaks commented Oct 25, 2024 •

edited

Loading

lilyminium Oct 25, 2024

lilyminium Oct 25, 2024

codecov bot commented Oct 25, 2024 •

edited

Loading

IAlibay commented Oct 25, 2024

IAlibay left a comment

IAlibay commented Oct 25, 2024

IAlibay left a comment

IAlibay Oct 25, 2024

lilyminium Oct 26, 2024 •

edited

Loading

lilyminium commented Oct 26, 2024 •

edited

Loading

lilyminium commented Oct 26, 2024

lilyminium commented Oct 26, 2024

IAlibay commented Oct 26, 2024

Fix bond deletion #4763

Fix bond deletion #4763

Conversation

lilyminium commented Oct 25, 2024 • edited Loading

PR Checklist

Developers certificate of origin

pep8speaks commented Oct 25, 2024 • edited Loading

Comment last updated at 2024-10-26 04:28:34 UTC

lilyminium Oct 25, 2024

Choose a reason for hiding this comment

lilyminium Oct 25, 2024

Choose a reason for hiding this comment

codecov bot commented Oct 25, 2024 • edited Loading

Codecov Report

IAlibay commented Oct 25, 2024

IAlibay left a comment

Choose a reason for hiding this comment

IAlibay commented Oct 25, 2024

IAlibay left a comment

Choose a reason for hiding this comment

IAlibay Oct 25, 2024

Choose a reason for hiding this comment

lilyminium Oct 26, 2024 • edited Loading

Choose a reason for hiding this comment

lilyminium commented Oct 26, 2024 • edited Loading

lilyminium commented Oct 26, 2024

lilyminium commented Oct 26, 2024

IAlibay commented Oct 26, 2024

lilyminium commented Oct 25, 2024 •

edited

Loading

pep8speaks commented Oct 25, 2024 •

edited

Loading

codecov bot commented Oct 25, 2024 •

edited

Loading

lilyminium Oct 26, 2024 •

edited

Loading

lilyminium commented Oct 26, 2024 •

edited

Loading