Add support for reading chainID info from prmtop amber topologies. #4007

pgbarletta · 2023-01-23T16:24:24Z

Hi!

Would something like this be of interest? While tleap doesn't add chainID info, parmed does, and I thought it would be nice to support it as well.

Changes made in this Pull Request:

Add get_fmt() to TOPParser.parsesection_mapper() to support .prmtop comments like the following:

%FLAG RESIDUE_CHAINID
%COMMENT Residue chain ID (chainId) read from PDB file; DIMENSION(NRES)
%FORMAT(20a4)

Add parse_chainids() to TOPParser to try to parse chainID information from .prmtop topology files.
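To make the two changes above concrete, here is a minimal sketch of reading a `%FORMAT` line and the chain IDs that follow it. This is not the code from this PR: `parse_fortran_format` and `read_chainid_section` are hypothetical names, and dropping empty fields is an assumption made to cope with whitespace-padded lines.

```python
import re


def parse_fortran_format(line):
    """Return (count, kind, width) for a line such as '%FORMAT(20a4)'."""
    match = re.match(r"%FORMAT\((\d+)([aAiIeE])(\d+)", line.strip())
    if match is None:
        raise ValueError(f"not a %FORMAT line: {line!r}")
    count, kind, width = match.groups()
    return int(count), kind.lower(), int(width)


def read_chainid_section(lines):
    """Return the chain IDs of a RESIDUE_CHAINID section, or None if absent."""
    it = iter(lines)
    for line in it:
        if line.startswith("%FLAG RESIDUE_CHAINID"):
            break
    else:
        return None  # optional flag not present in this file

    # Skip any %COMMENT lines until the %FORMAT line is reached.
    for line in it:
        if line.startswith("%FORMAT"):
            _, _, width = parse_fortran_format(line)
            break
    else:
        raise ValueError("RESIDUE_CHAINID section has no %FORMAT line")

    chainids = []
    for line in it:
        if line.startswith("%"):
            break  # next section starts here
        data = line.rstrip("\n")
        for start in range(0, len(data), width):
            field = data[start:start + width].strip()
            if field:  # assumption: empty fields are padding, not chain IDs
                chainids.append(field)
    return chainids


example = [
    "%FLAG RESIDUE_CHAINID",
    "%COMMENT Residue chain ID (chainId) read from PDB file; DIMENSION(NRES)",
    "%FORMAT(20a4)",
    "A   A   B   ",
]
chain_ids = read_chainid_section(example)
```

Returning `None` for a missing flag keeps the "flag absent" case distinct from an empty section, which matters because most prmtop files will not carry this flag.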

PR Checklist

Tests?
Docs?
CHANGELOG updated?
Issue raised/referenced?

codecov · 2023-01-23T16:43:40Z

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (c2e6df9) 93.39% compared to head (2c6513b) 93.40%.

Additional details and impacted files

@@            Coverage Diff             @@
##           develop    #4007     +/-   ##
==========================================
  Coverage    93.39%   93.40%             
==========================================
  Files          170      184     +14     
  Lines        22224    23354   +1130     
  Branches      4065     4071      +6     
==========================================
+ Hits         20757    21814   +1057     
- Misses         951     1024     +73     
  Partials       516      516

Files Changed	Coverage Δ
package/MDAnalysis/topology/TOPParser.py	`100.00% <100.00%> (ø)`

... and 14 files with indirect coverage changes

orbeckst · 2023-02-13T02:39:19Z

@lilyminium would you have some bandwidth to have a look at this topology parser enhancement?

Alternatively, can you recommend someone else to review — if so, please pass the buck and ping them ;-).

IAlibay

A couple of initial comments. My major blocker here is that, as far as I can tell, RESIDUE_CHAINID isn't a canonical flag for the parm7 format. I'd like to ask for docs on this format spec addition.

package/MDAnalysis/topology/TOPParser.py

IAlibay

That should have been a blocking review, sorry.

pgbarletta · 2023-02-18T14:08:57Z

Here's the parmed function that adds the RESIDUE_CHAINID flag.

IDK if Amber developers consider it canonical, but parmed writes it, so I guess it can be relied upon.

IAlibay · 2023-07-11T19:39:35Z

Could one of the other @MDAnalysis/coredevs weigh in here please? I'm rather on the fence on adding fields being introduced by parmed that aren't directly in the spec, I could go either way here.

orbeckst · 2023-07-11T21:33:52Z

I am not an AMBER user myself so I don't know how important it is for typical use to support ParmEd extensions to the format. Could someone with AMBER experience comment?

@pgbarletta is the parm7 format extensible, i.e. does it allow for additional/optional flags? Does it state what should happen if an optional flag is encountered?

pgbarletta · 2023-07-12T08:14:14Z

> @pgbarletta is the parm7 format extensible, i.e. does it allow for additional/optional flags? Does it state what should happen if an optional flag is encountered?

Parmed does provide functions for the addition/removal of %FLAGs. This is the most up to date (that is, 2004) reference for the format (according to the official site) which already shows modifications to support conversion from CHARMM parameters.

If we decide to move forward, I should modify one of the .prmtop topologies from the tests so I can test this feature. Otherwise, like @IAlibay, I could either move forward or scrap it. I just thought it would be a nice addition since amber users are left with no chainID info when reading from .rst7 and .prmtop files.

orbeckst · 2023-07-12T17:53:49Z

Thank you @pgbarletta. The PDF reference says

> Any new SECTION should be added to the end of the topology file to avoid conflicts with order-dependent parsers.
>
> ...
>
> Avoid modifying if possible. Consider if this new section or change is truly necessary and belongs in the prmtop.

I read this as "it is allowed to extend the PRMTOP" so that we may have optional %FLAG SECTION.

Given that ParmEd-produced files are probably still common, I support parsing an optional %FLAG RESIDUE_CHAINID field, provided there are no problems when it is absent.

github-actions · 2023-07-13T11:04:55Z

Linter Bot Results:

Hi @pgbarletta! Thanks for making this PR. We linted your code and found the following:

Some issues were found with the formatting of your code.

Code Location	Outcome
main package	⚠️ Possible failure
testsuite	✅ Passed

Please have a look at the darker-main-code and darker-test-code steps here for more details: https://github.com/MDAnalysis/mdanalysis/actions/runs/6060116484/job/16443944129

Please note: The black linter is purely informational, you can safely ignore these outcomes if there are no flake8 failures!

pgbarletta · 2023-07-13T15:00:51Z

> Thank you @pgbarletta . The PDF reference says
> ...

Oops, sorry, missed that.

Modified the ache.prmtop data file and the corresponding test, so the feature is tested as well.
If tests go well, I'll add it to the changelog and re-request a review.

orbeckst

My primary concern is with testing: can you leave the old test file as is and make a copy with the additions? Then use the new one to run additional tests for the new features. We need to know that files without the optional parts also work. Can you also change the new test file so that there are multiple segments (even if it makes little sense for the actual system), so that multiple chains are tested as well?

For other things see inline comments — please reply there. Thanks!

package/CHANGELOG

orbeckst · 2023-07-19T17:20:54Z

package/MDAnalysis/topology/TOPParser.py

+    can be added with the addPDB command from parmed:
+    https://parmed.github.io/ParmEd/html/parmed.html#addpdb


Please use named links instead of bare links (see the example above with `PARM parameter/topology file specification`_).

State what happens when %FLAG RESIDUE_CHAINID is not present.

package/MDAnalysis/topology/TOPParser.py

orbeckst · 2023-07-19T17:28:49Z

package/MDAnalysis/topology/TOPParser.py

-        x = FORTRANReader(y)
+
+        def get_fmt(file):
+            if (line := next(file))[:7] == "%FORMAT":


What does the syntax (line := ...) do, i.e., (x := y) ??

I guess it is equivalent to

    line = next(file)
    if line[:7] == "%FORMAT": ...

I hadn't seen C-style "assignment inside conditionals" in Python. In which version of Python was this introduced?

Personally, I find the old-school Python two-line syntax clearer but I'll concede that this is a matter of personal style. More important is if this syntax is supported by all our Python versions.

Yes, that's the 'walrus operator'. It was added in 3.8 and the minimum supported version is 3.9, so we'd be good. Anyway, I ended up dropping it.
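For reference, a small sketch of the equivalence discussed in this thread. Only the `(line := next(file))[:7] == "%FORMAT"` expression comes from the diff; the two function names are made up for illustration.

```python
def format_line_walrus(file):
    """Assignment expression (Python 3.8+): bind and test in one step."""
    if (line := next(file))[:7] == "%FORMAT":
        return line
    return None


def format_line_two_step(file):
    """Equivalent two-line form: bind first, then test."""
    line = next(file)
    if line[:7] == "%FORMAT":
        return line
    return None


walrus_result = format_line_walrus(iter(["%FORMAT(20a4)"]))
two_step_result = format_line_two_step(iter(["%FORMAT(20a4)"]))
```

Both forms consume exactly one item from the iterator, so swapping one for the other does not change how far the parser has advanced.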

orbeckst · 2023-07-19T17:37:02Z

package/MDAnalysis/topology/TOPParser.py

+        attr : :class:`Segids`
+            A :class:`Segids` instance containing the chainID of each residue
+            as defined in the parm7 file


Does this really return a segids instance?? It looks as if this just returns an array.

Bad mistake. Fixed.

package/MDAnalysis/topology/TOPParser.py

orbeckst · 2023-07-19T17:39:49Z

testsuite/MDAnalysisTests/data/Amber/ache.prmtop

-%VERSION  VERSION_STAMP = V0001.000  DATE = 07/25/06  09:16:39                  
-%FLAG TITLE                                                                     
-%FORMAT(20a4)                                                                   
-NALA                                                                            
-%FLAG POINTERS                                                                  
-%FORMAT(10I8)                                                                   
+%VERSION  VERSION_STAMP = V0001.000  DATE = 07/13/23  13:12:57
+%FLAG TITLE
+%FORMAT(20a4)
+NALA
+%FLAG POINTERS
+%FORMAT(10I8)


Does it matter that the new version does not have full white-space padded lines?

No, it's a tell of which software was used to write the .prmtop. tleap uses Fortran formatted writing, while parmed writes only the characters it needs.
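A quick sketch of why the padding is harmless for a parser that strips whitespace. `flag_name` is a hypothetical helper, not a function from TOPParser, and the 80-column width is taken from the padded lines in the diff above.

```python
def flag_name(line):
    """Return the section name of a %FLAG line, ignoring trailing padding."""
    if not line.startswith("%FLAG"):
        return None
    return line[len("%FLAG"):].strip()


padded = "%FLAG TITLE".ljust(80)  # fixed-width line, as in the tleap-written file
unpadded = "%FLAG TITLE"          # only the needed characters, as written by parmed

same_name = flag_name(padded) == flag_name(unpadded)
```

A parser that compared whole lines instead of stripped tokens would treat the two files differently, which is the case the question above is probing.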

orbeckst · 2023-07-19T17:41:45Z

testsuite/MDAnalysisTests/data/Amber/ache.prmtop

+%COMMENT If present: %FLAG RESIDUE_ICODE, %FORMAT(20a4)
+%FORMAT(20I4)
+   1   2   3   4   5   6   7   8   9  10  11  12  13  14
+%FLAG RESIDUE_CHAINID


I'd rather keep the old test file and make a new one with the additions. Then test both.

pgbarletta · 2023-07-21T10:17:24Z

I added TestPRMmultiParser to test the new multi_ache.prmtop and restored TestPRMParser and its tested file ache.prmtop. I copied TestPRMParser to make TestPRMmultiParser, so there's a lot of overlap between the tests, which I think is a bit of a waste. If you're ok with it, I'd like to remove the tests on atom_i and atom_zero.
Regarding errors in the RESIDUE_CHAINID flag: I decided that giving a proper warning and just reading the file as if there was no RESIDUE_CHAINID info was the right thing to do, mainly because this is an optional flag and I don't think a field that is optional should prevent the user from doing what they want. Let me know if it's ok with you.
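The warn-and-continue behaviour described above could look roughly like the sketch below. The length check and the name `chainids_or_none` are assumptions for illustration; the thread does not show which validity checks the PR actually performs.

```python
import warnings


def chainids_or_none(chainids, n_residues):
    """Return chainids if they match the residue count, else warn and return None."""
    if len(chainids) != n_residues:
        warnings.warn(
            f"RESIDUE_CHAINID has {len(chainids)} entries but the topology has "
            f"{n_residues} residues; ignoring chainID information.",
            stacklevel=2,
        )
        return None
    return chainids


good = chainids_or_none(["A", "A", "B"], 3)
```

The caller can then treat `None` exactly like a file that never had the flag, so a malformed optional section does not stop the rest of the topology from loading.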

IAlibay

Some quick initial comments - I'll have a proper review over the weekend.

IAlibay · 2023-07-21T10:41:57Z

package/MDAnalysis/topology/TOPParser.py

+            attrs["segids"] = Segids(segids)
+            attrs["ChainIDs"] = ChainIDs(chainids)
+            n_segs = len(segids)
+        except (KeyError, ValueError):


If I'm reading this correctly, the default path assumes the optional attribute (which isn't going to be present in most PARM7 files) is there, and an error is thrown in all other cases?

I'm not very keen on this, since it means that we have to eat the cost of throwing an error 99% of the time. Is there a better way to restructure this so we don't have to deal with the error throwing / catching cost in most cases?

Done. Using ifs now.
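A minimal sketch of the membership-test pattern requested here, assuming the parsed sections live in a plain dict keyed by flag name. `build_attrs` and the `sections` layout are hypothetical and not taken from TOPParser.

```python
def build_attrs(sections):
    """Collect topology attributes, adding chainIDs only when the flag exists."""
    attrs = {"resnames": sections["RESIDUE_LABEL"]}
    # Common case: the optional flag is absent, so this is one cheap lookup
    # rather than raising and catching a KeyError.
    if "RESIDUE_CHAINID" in sections:
        attrs["chainIDs"] = sections["RESIDUE_CHAINID"]
    return attrs


without_chainids = build_attrs({"RESIDUE_LABEL": ["ALA", "GLY"]})
with_chainids = build_attrs(
    {"RESIDUE_LABEL": ["ALA", "GLY"], "RESIDUE_CHAINID": ["A", "B"]}
)
```

With this shape the frequent path (no chain IDs) never enters exception handling, which is the cost raised in the review comment.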

package/MDAnalysis/topology/TOPParser.py

IAlibay · 2023-07-21T10:44:41Z

testsuite/MDAnalysisTests/data/Amber/multi_ache.prmtop

@@ -0,0 +1,4506 @@
+%VERSION  VERSION_STAMP = V0001.000  DATE = 07/20/23  21:22:41


304 kb for a new optional case is quite a lot. Can we bz2 this file please?
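One way to produce the compressed copy with the standard library is sketched below. `compress_to_bz2` and `demo` are illustrative names, and the demo writes a throwaway stand-in file rather than the real multi_ache.prmtop.

```python
import bz2
import pathlib
import tempfile


def compress_to_bz2(src):
    """Write a bz2-compressed copy next to src and return its path."""
    src = pathlib.Path(src)
    dst = src.with_name(src.name + ".bz2")
    dst.write_bytes(bz2.compress(src.read_bytes()))
    return dst


def demo():
    """Compress a stand-in file; return (original size, packed size, round trip ok)."""
    with tempfile.TemporaryDirectory() as tmp:
        top = pathlib.Path(tmp) / "example.prmtop"  # stand-in, not the real data file
        top.write_text("%FORMAT(20a4)\n" + ("A   " * 20 + "\n") * 200)
        packed = compress_to_bz2(top)
        round_trip_ok = bz2.decompress(packed.read_bytes()) == top.read_bytes()
        return top.stat().st_size, packed.stat().st_size, round_trip_ok


original_size, packed_size, round_trip_ok = demo()
```

Fixed-width text such as a prmtop is highly repetitive, so it usually compresses well; the exact saving for the real file is not stated in this thread.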

IAlibay · 2023-07-21T10:45:39Z

testsuite/MDAnalysisTests/datafiles.py

@@ -426,6 +426,8 @@
 PFncdf_Top = (_data_ref / 'Amber/posfor.top').as_posix()
 PFncdf_Trj = (_data_ref / 'Amber/posfor.ncdf').as_posix()

+PRMmulti = (_data_ref / "Amber/multi_ache.prmtop").as_posix()


PRMmulti doesn't really give me a sense of what is in this file. Can we either rename it to "PRM_chainids" or at the very least have a comment explaining what this file contains?

IAlibay · 2023-07-21T10:47:20Z

testsuite/MDAnalysisTests/topology/test_top.py

+        "names",
+        "types",
+        "type_indices",
+        "charges",
+        "masses",
+        "resnames",
+        "bonds",
+        "angles",
+        "dihedrals",
+        "impropers",


What changed here? If it's just a reformatting thing, could you revert this? It's a lot of extra noise on the diff that will make git blaming difficult if it ever needs to happen.

Well, I can revert that if you want, but Darker asks for it. If the idea was to go towards flake8/black compliance, then stuff like this is gonna happen.

As per the text of the darker bot, if it's not a flake8 failure then it's ok to ignore.

The flake8 failures are there but don't need addressing because we don't enforce changes on datafiles.

sorry, I misread that. I thought it was my new TestPRMChainidParser class, not the old TestPRMParser. It's fixed now.

IAlibay · 2023-07-21T10:48:05Z

testsuite/MDAnalysisTests/topology/test_top.py

@@ -201,6 +210,76 @@ class TestPRMParser(TOPBase):
    expected_elems = None


+class TestPRMmultiParser(TOPBase):


As above, multi is a bit hard to know what it does, so maybe add an issue / PR number and a comment on what it's checking?

package/MDAnalysis/topology/TOPParser.py

IAlibay · 2023-07-31T10:57:45Z

@pgbarletta we are changing the way in which future contributions to MDAnalysis are made. Could you please confirm that you agree to releasing this code under the terms of the LGPLv2.1 and that your contribution also adheres to the developer certificate of origin?

IAlibay

Some quick feedback, sorry this is taking so long.

@orbeckst can I ask you for a review here please?

package/CHANGELOG

package/MDAnalysis/topology/TOPParser.py

testsuite/MDAnalysisTests/datafiles.py

package/MDAnalysis/topology/TOPParser.py

orbeckst

Smaller questions + what @IAlibay said.

orbeckst · 2023-07-31T15:54:30Z

package/CHANGELOG

+  * Add support for reading chainID info from prmtop amber topologies
+	(PR #4007)


Still not resolved because your editor inserted a <TAB> instead of spaces, try

Suggested change

-  * Add support for reading chainID info from prmtop amber topologies
-	(PR #4007)
+  * Add support for reading chainID info from prmtop amber topologies
+    (PR #4007)

The indentation is still incorrect. If you commit my suggestion, it should be fixed.

package/MDAnalysis/topology/TOPParser.py

pgbarletta · 2023-08-01T13:20:11Z

> @pgbarletta we are changing the way in which future contributions to MDAnalysis are made. Could you please confirm that you agree to releasing this code under the terms of the LGPLv2.1 and that your contribution also adheres to the developer certificate of origin?

Yes, I release the code under LGPLv2.1 terms, and yes, my contribution adheres to the developer certificate of origin.

orbeckst

You addressed all the major things but it looks as if some of the smaller ones fell through the cracks — see comments please. Just the minor issues and then it's good from my side.

EDIT: The test is a major request that still needs to be added, see uncovered lines https://app.codecov.io/gh/MDAnalysis/mdanalysis/pull/4007/blob/package/MDAnalysis/topology/TOPParser.py#L322

orbeckst · 2023-08-11T01:26:15Z

package/CHANGELOG

+  * Add support for reading chainID info from prmtop amber topologies
+	(PR #4007)


The indentation is still incorrect. If you commit my suggestion, it should be fixed.

package/MDAnalysis/topology/TOPParser.py

orbeckst · 2023-08-11T16:30:31Z

Some runners timed out. I restarted one but stupid GH does not let me restart the other one in parallel.

IAlibay · 2023-08-16T14:37:50Z

Sorry about the delay @pgbarletta - Just been a bit stalled by the current release. I'm going to try to get a re-review in this evening.

pgbarletta · 2023-08-16T14:42:39Z

> Sorry about the delay @pgbarletta - Just been a bit stalled by the current release. I'm going to try to get a re-review in this evening.

No worries, I just switched continents and I've been dealing with that as well.

IAlibay

So sorry about the delay here @pgbarletta - I'm finally out of release mode so getting this merged is now my priority.

On my end of things, there are only 2 things left and we should be good to go.

@orbeckst can I ask you for a re-review please?

IAlibay · 2023-09-02T10:55:01Z

package/MDAnalysis/topology/TOPParser.py

@@ -162,6 +170,8 @@ class TOPParser(TopologyReaderBase):
      warns users that chamber-style topologies are not currently supported
    .. versionchanged:: 2.0.0
      no longer guesses elements if missing
+    .. versionchanged:: 2.6.0


Suggested change

-    .. versionchanged:: 2.6.0
+    .. versionchanged:: 2.7.0

Just this and I think we're good (at least on my side of things)

oops, sorry. Fixed.

testsuite/MDAnalysisTests/topology/test_top.py

IAlibay · 2023-09-02T18:50:25Z

testsuite/MDAnalysisTests/topology/test_top.py

@@ -85,6 +88,23 @@ def test_impropers_atom_counts(self, filename):
        assert len(u.atoms[[self.atom_i]].impropers) == \
            self.expected_n_i_impropers

+    def test_chainIDs(self, filename):


rather than have this in a parent class that will be only used by one child class, could we just put this in the child class only?

IAlibay

Thanks, this looks good to me.

@orbeckst could I ask you for a re-review please?

orbeckst

All looking good! Thank you @pgbarletta !!

orbeckst · 2023-09-02T20:06:08Z

testsuite/MDAnalysisTests/topology/test_top.py

+    expected_chainIDs = np.array(
+        [
+            "A",
+            "A",
+            "A",


Is this "one line for each element" black's doing? ... Not my favorite code aesthetic (because it makes it harder to see context) but I accept it as your preferred choice.

So don't worry about my comment, I am just venting — I actually recognize the upsides to auto-formatters, too.

Yeah this is one of those bits that really kills readability and intent in code.

I don't even understand the rationale when it seems perfectly capable to organize other arrays neatly fitted in a line. ... Aaaaanyways... I am trying out black in MDPOW, just to see what it feels like and if it's really as life-altering, time-saving as @RMeli says it is ;-).

orbeckst · 2023-09-03T01:37:13Z

Thank you @pgbarletta — it took a while (sorry) but it's finally merged! 🎉

github-actions bot added the Component-Topology label Jan 23, 2023

IAlibay reviewed Feb 13, 2023

IAlibay requested changes Feb 13, 2023

pgbarletta force-pushed the add_top_segid branch from 02079e6 to 4a1ddd4 Compare February 18, 2023 13:57

pgbarletta force-pushed the add_top_segid branch 2 times, most recently from ace127d to dad2cbf Compare February 27, 2023 12:38

orbeckst assigned IAlibay Mar 23, 2023

orbeckst added enhancement Format-Amber labels Jul 12, 2023

pgbarletta force-pushed the add_top_segid branch from dad2cbf to a8fb3ab Compare July 13, 2023 11:01

pgbarletta force-pushed the add_top_segid branch from 8796ff1 to 3a35e36 Compare July 13, 2023 14:56

pgbarletta force-pushed the add_top_segid branch from d670ca0 to 5802593 Compare July 13, 2023 19:16

pgbarletta requested a review from IAlibay July 13, 2023 19:54

orbeckst requested changes Jul 19, 2023

pgbarletta force-pushed the add_top_segid branch 4 times, most recently from d1ddbcb to 2a0bb02 Compare July 21, 2023 09:48

pgbarletta requested a review from orbeckst July 21, 2023 10:17

IAlibay requested changes Jul 21, 2023

pgbarletta force-pushed the add_top_segid branch from 80b1ed2 to 532832a Compare July 21, 2023 13:08

IAlibay self-requested a review July 29, 2023 00:13

IAlibay requested changes Jul 31, 2023

orbeckst requested changes Jul 31, 2023

IAlibay requested review from orbeckst and IAlibay August 10, 2023 22:32

orbeckst requested changes Aug 11, 2023

pgbarletta force-pushed the add_top_segid branch 4 times, most recently from f601f14 to 10f89be Compare August 16, 2023 14:22

IAlibay requested changes Sep 2, 2023

pgbarletta added 3 commits September 2, 2023 14:04

Add chainID support for prmtop amber topologies

c052e04

Fix formatting

c2c91ae

Add test_chainIDs()

18485e0

pgbarletta force-pushed the add_top_segid branch from 10f89be to 18485e0 Compare September 2, 2023 18:47

IAlibay requested changes Sep 2, 2023

pgbarletta added 2 commits September 2, 2023 15:54

Moved test_chainIDs() to TestPRMChainidParser

5b461f5

Fix versionchanged on TOPParser

2c6513b

IAlibay approved these changes Sep 2, 2023

IAlibay requested a review from orbeckst September 2, 2023 19:24

orbeckst approved these changes Sep 2, 2023

orbeckst self-assigned this Sep 3, 2023

orbeckst merged commit 2acd594 into MDAnalysis:develop Sep 3, 2023
19 of 21 checks passed
