-
Notifications
You must be signed in to change notification settings - Fork 652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix bond deletion #4763
Fix bond deletion #4763
Conversation
Hello @lilyminium! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2024-10-26 04:28:34 UTC |
@@ -146,7 +146,7 @@ def parse(): | |||
struc = parse() | |||
|
|||
assert hasattr(struc, 'bonds') | |||
assert len(struc.bonds.values) == 4 | |||
assert len(struc.bonds.values) == 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These used to be bonds between [(0, 1), (0, 1), (1, 2), (1, 2)] -- so two bonds repeated. This change makes it so each bond is unique.
@@ -158,7 +158,7 @@ def parse(): | |||
with pytest.warns(UserWarning): | |||
struc = parse() | |||
assert hasattr(struc, 'bonds') | |||
assert len(struc.bonds.values) == 2 | |||
assert len(struc.bonds.values) == 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to above.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## develop #4763 +/- ##
===========================================
- Coverage 93.59% 93.15% -0.45%
===========================================
Files 177 12 -165
Lines 21708 1066 -20642
Branches 3052 0 -3052
===========================================
- Hits 20318 993 -19325
+ Misses 943 73 -870
+ Partials 447 0 -447 ☔ View full report in Codecov by Sentry. |
@lilyminium this PR is marked as draft - is it meant to be up for review or is there more work that needs to be done? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks good to me - however there's one likely outcome of this "behaviour change" that might have a real impact.
Some software intentionally misuse the PDB format's CONECT records to represent different bond orders - PyMol is a good example of this iirc. In those cases, my understanding is that with this change you would lose that information?
As far as I'm aware, the PDB writer would lose that information even if you have duplicate bonds because we call a set on the bonds of all the atoms (which should squash out duplicates?). So there's probably a good case to be made that support for multi-conect PDBs is not something we want to do anyways.
I'm not particularly sure that I care about the behaviour change, but I do want to cc @MDAnalysis/coredevs (especially @orbeckst @yuxuanzhuang and @RMeli ) who might have some more immediate thoughts on the whole thing.
Failures aren't happening in other PRs - it looks like something that has been specifically introduced here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should try to revert to a set for the not in
comparison, or do some other kind of trick to avoid the performance loss.
for v, t, g, o in zip(values, types, guessed, order): | ||
if v not in existing: | ||
if v not in self.values: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is likely the culprit - doing a not in
operation on a list is orders of magnitude slower than on a set.
Example:
In [1]: import random
In [2]: lister = [random.random() for i in range(10000)]
In [3]: setter = set(lister)
In [4]: len(setter)
Out[4]: 10000
In [5]: len(lister)
Out[5]: 10000
In [6]: def notin(a):
...: for v in range(10000):
...: if v not in a:
...: pass
...: return
...:
In [7]: %timeit notin(lister)
734 ms ± 5.85 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [8]: %timeit notin(setter)
129 μs ± 166 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this in the commit below, but note to self that changing self.values
to a dict might be the easiest way to do this performantly and retain both uniqueness and order.
Or masking the input values
after checking a set
version of self.values
might be less work but still more performant.
Hmm. Yes that would be the case that multi-bonds can't be sketchily represented by digging into Topology internals, which downstream people may well be doing. I can think of two solutions to this in this PR:
|
Pushed a commit that removes the "addition" fixes and preserves previous behaviour (should be easy to revert if discussion goes the other way) |
I'm leaning towards the simpler fix of just ensuring that all identical bonds are deleted when deletion is caused, as it's less disruptive to current behaviour, and potentially fixing the addition of identical bonds in a separate issue/PR later, so I've updated the changelog too to reflect that -- should still be easy to revert if there's any disagreement |
Going with just the bond deletion fix seems like a reasonable thing for this release. |
Fixes #4762
Changes made in this Pull Request:
Changes Topology creation and addition to only accept a single identical bond/angle/dihedral/etc instead of multiple(see discussion in PR)PR Checklist
Developers certificate of origin
📚 Documentation preview 📚: https://mdanalysis--4763.org.readthedocs.build/en/4763/