-
Notifications
You must be signed in to change notification settings - Fork 652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for reading chainID info from prmtop amber topologies. #4007
Conversation
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## develop #4007 +/- ##
==========================================
Coverage 93.39% 93.40%
==========================================
Files 170 184 +14
Lines 22224 23354 +1130
Branches 4065 4071 +6
==========================================
+ Hits 20757 21814 +1057
- Misses 951 1024 +73
Partials 516 516
☔ View full report in Codecov by Sentry. |
@lilyminium would you have some bandwidth to have a look at this topology parser enhancement? Alternatively, can you recommend someone else to review — if so, please pass the buck and ping them ;-). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple initial comments, my major blocker here is that, as far as I can tell, RESIDUE_CHAINID isn't a canonical flag for the parm7 format. I'd like to ask for docs on this format spec addition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That should have been a blocking review sorry.
02079e6
to
4a1ddd4
Compare
Here's the parmed function that add's the IDK if Amber developers consider it canonical, but parmed writes it, so I guess it can be relied upon. |
ace127d
to
dad2cbf
Compare
Could one of the other @MDAnalysis/coredevs weigh in here please? I'm rather on the fence on adding fields being introduced by parmed that aren't directly in the spec, I could go either way here. |
I am not an AMBER user myself so I don't know how important it is for typical use to support ParmEd extensions to the format. Could someone with AMBER experience comment? @pgbarletta is the parm7 format extensible, i.e. does it allow for additional/optional flags? Does it state what should happen if an optional flag is encountered? |
Parmed does provide functions for the addition/removal of %FLAGs. This is the most up to date (that is, 2004) reference for the format (according to the official site) which already shows modifications to support conversion from CHARMM parameters. If we decide to move forward, I should modify some one of the |
Thank you @pgbarletta . The PDF reference says
I read this as "it is allowed to extend the PRMTOP" so that we may have optional Given that ParmEd produced files are probably still common, I am supporting parsing an optional |
dad2cbf
to
a8fb3ab
Compare
Linter Bot Results:Hi @pgbarletta! Thanks for making this PR. We linted your code and found the following: Some issues were found with the formatting of your code.
Please have a look at the Please note: The |
8796ff1
to
3a35e36
Compare
Oops, sorry, missed that. Modified the |
d670ca0
to
5802593
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My primary concern is with testing: Can you leave the old test file as is and make a copy with the additions. Then use the new one to run additional tests for the new features. We need to know that files without the optional parts also work. Can you change the new test file so that there are multiple segments (even if it makes little sense for the actual system) but so that multi-chains is also tested.
For other things see inline comments — please reply there. Thanks!
can be added with the addPDB command from parmed: | ||
https://parmed.github.io/ParmEd/html/parmed.html#addpdb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use named links instead of bare links (see example above with
`PARM parameter/topology file specification`_
State what happens when %RESIDUE_CHAINID is not present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
x = FORTRANReader(y) | ||
|
||
def get_fmt(file): | ||
if (line := next(file))[:7] == "%FORMAT": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does the syntax (line := ...)
do, i.e., (x := y)
??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it is equivalent to
line = next(file)
if line[:7] == "%FORMAT":
...
I hadn't see C-style "assignment inside conditionals" in Python. In which version of Python was this introduced?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I find the old-school Python two-line syntax clearer but I'll concede that this is a matter of personal style. More important is if this syntax is supported by all our Python versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the 'walrus operator', added on 3.8 and the minimum supported one is 3.9, so we'd be good. Anyways, I ended up dropping it.
attr : :class:`Segids` | ||
A :class:`Segids` instance containing the chainID of each residue | ||
as defined in the parm7 file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this really return a segids instance?? It looks as if this just returns an array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bad mistake. Fixed.
%VERSION VERSION_STAMP = V0001.000 DATE = 07/25/06 09:16:39 | ||
%FLAG TITLE | ||
%FORMAT(20a4) | ||
NALA | ||
%FLAG POINTERS | ||
%FORMAT(10I8) | ||
%VERSION VERSION_STAMP = V0001.000 DATE = 07/13/23 13:12:57 | ||
%FLAG TITLE | ||
%FORMAT(20a4) | ||
NALA | ||
%FLAG POINTERS | ||
%FORMAT(10I8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it matter that the new version does not have full white-space padded lines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it's a tell of which software was used to write the .prmtop
. tleap uses fortran format writing, while parmed writes only the character that needs.
%COMMENT If present: %FLAG RESIDUE_ICODE, %FORMAT(20a4) | ||
%FORMAT(20I4) | ||
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | ||
%FLAG RESIDUE_CHAINID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather keep the old test file and make a new one with the additions. Then test both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
d1ddbcb
to
2a0bb02
Compare
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some quick initial comments - I'll have a proper review over the weekend.
attrs["segids"] = Segids(segids) | ||
attrs["ChainIDs"] = ChainIDs(chainids) | ||
n_segs = len(segids) | ||
except (KeyError, ValueError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm reading this, the default option is that the optional attribute (which isn't going to be present in most PARM7 files), is the option you hope will happen and then throw an error in all other cases?
I'm not very keen on this, since it means that we have to eat up the cost of throwing an error for 99% of the time. Is there a better way to reformat this so we don't have to deal with the error throwing / catching cost in most cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Using ifs now.
@@ -0,0 +1,4506 @@ | |||
%VERSION VERSION_STAMP = V0001.000 DATE = 07/20/23 21:22:41 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
304 kb for a new optional case is quite a lot. Can we bz2 this file please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -426,6 +426,8 @@ | |||
PFncdf_Top = (_data_ref / 'Amber/posfor.top').as_posix() | |||
PFncdf_Trj = (_data_ref / 'Amber/posfor.ncdf').as_posix() | |||
|
|||
PRMmulti = (_data_ref / "Amber/multi_ache.prmtop").as_posix() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PRMmulti doesn't really give me a sense of what is in this file. Can we either rename it to "PRM_chainids" or at the very least have a comment explaining what this file contains?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
"names", | ||
"types", | ||
"type_indices", | ||
"charges", | ||
"masses", | ||
"resnames", | ||
"bonds", | ||
"angles", | ||
"dihedrals", | ||
"impropers", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What changed here? If it's just a reformatting thing, could you revert this? It's a lot of extra noise on the diff that will make git blaming difficult if it ever needs to happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I can revert that if you want, but Darker asks for it. If the idea was to go towards flake8/black compliance, then stuff like this is gonna happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per the text of the darker bot, if it's not a flake8 failure then it's ok to ignore.
The flake8 failures are there but don't need addressing because we don't enforce changes on datafiles.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, I misread that. I thought it was my new TestPRMChainidParser
class, not the old TestPRMParser
. It's fixed now.
@@ -201,6 +210,76 @@ class TestPRMParser(TOPBase): | |||
expected_elems = None | |||
|
|||
|
|||
class TestPRMmultiParser(TOPBase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, multi is a bit hard to know what it does, so maybe add an issue / PR number and a comment on what it's checking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
80b1ed2
to
532832a
Compare
@pgbarletta we are changing the way in which future contributions to MDAnalysis are made. Could you please confirm that you agree to releasing this code under the terms of the LGPLv2.1 and that your contribution also adheres to the developer certificate of origion? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some quick feedback, sorry this is taking so long.
@orbeckst can I ask you for a review here please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Smaller questions + what @IAlibay said.
package/CHANGELOG
Outdated
* Add support for reading chainID info from prmtop amber topologies | ||
(PR #4007) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still not resolved because your editor inserted a <TAB>
instead of spaces, try
* Add support for reading chainID info from prmtop amber topologies | |
(PR #4007) | |
* Add support for reading chainID info from prmtop amber topologies | |
(PR #4007) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The indentation is still incorrect. If you commit my suggestion, it should be fixed.
Yes I release the code under LGPLv2.1 terms and yes my contribution adheres to the developer certificate of origin. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You addressed all the major things but it looks as if some of the smaller ones fell through the cracks — see comments please. Just the minor issues and then it's good from my side.
EDIT: The test is a major request that still needs to be added, see uncovered lines https://app.codecov.io/gh/MDAnalysis/mdanalysis/pull/4007/blob/package/MDAnalysis/topology/TOPParser.py#L322
package/CHANGELOG
Outdated
* Add support for reading chainID info from prmtop amber topologies | ||
(PR #4007) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The indentation is still incorrect. If you commit my suggestion, it should be fixed.
Some runners timed out. I restarted one but stupid GH does not let me restart the other one in parallel. |
f601f14
to
10f89be
Compare
Sorry about the delay @pgbarletta - Just been a bit stalled by the current release. I'm going to try to get a re-review in this evening. |
No worries, I just switched continents and I've been dealing with that as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So sorry about the delay here @pgbarletta - I'm finally out of release mode so gettting this merged is now my priority.
On my end of things, there's only 2 things left and we should be good to go.
@orbeckst can I ask you for a re-review please?
@@ -162,6 +170,8 @@ class TOPParser(TopologyReaderBase): | |||
warns users that chamber-style topologies are not currently supported | |||
.. versionchanged:: 2.0.0 | |||
no longer guesses elements if missing | |||
.. versionchanged:: 2.6.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.. versionchanged:: 2.6.0 | |
.. versionchanged:: 2.7.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just this and I think we're good (at least on my side of things)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, sorry. Fixed.
10f89be
to
18485e0
Compare
@@ -85,6 +88,23 @@ def test_impropers_atom_counts(self, filename): | |||
assert len(u.atoms[[self.atom_i]].impropers) == \ | |||
self.expected_n_i_impropers | |||
|
|||
def test_chainIDs(self, filename): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rather than have this in a parent class that will be only used by one child class, could we just put this in the child class only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this looks good to me.
@orbeckst could I ask you for a re-review please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All looking good! Thank you @pgbarletta !!
expected_chainIDs = np.array( | ||
[ | ||
"A", | ||
"A", | ||
"A", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this "one line for each element" black
's doing? ... Not my favorite code aesthetic (because it makes it harder to see context) but I accept it as your preferred choice.
So don't worry about my comment, I am just venting — I actually recognize the upsides to auto-formatters, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is one of those bits that really kills readability and intent in code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't even understand the rationale when it seems perfectly capable to organize other arrays neatly fitted in a line. ... Aaaaanyways... I am trying out black in MDPOW, just to see what it feels like and if it's really as life-altering, time-saving as @RMeli says it is ;-).
Thank you @pgbarletta — it took a while (sorry) but it's finally merged! 🎉 |
Hi!
Would something like this be of interest? While tleap doesn't add chainID info, parmed does and I though I would be nice to support it as well.
Changes made in this Pull Request:
get_fmt()
toTOPParser.parsesection_mapper()
to support.prmtop
comments like the following:parse_chainids()
toTOPParser
to try to parse chainID information from.prmtop
topology files.PR Checklist