support for near-null vectors (and eigenvectors)
#1398
Conversation
… format, plus exposed it on the command line
This is a great addition. A couple of things:

- `io_test` needs to be extended to test the `PARTFILE` saving. Of course this will only be non-trivial when running on multiple processes, but that's fine.
- Should the `QudaBoolean` addition in the interface instead just be `bool`? We already implicitly require C99 support, so I see no reason not to just use `bool`. While in the long term we'll want to remove `QudaBoolean` for legacy interface options, perhaps now we draw a line in the sand and just use `bool` for new additions?
re: As for just using |
Fair enough regarding |
This PR exposes the ability to save near-null vectors (and eigenvectors) in QIO's `PARTFILE` format, which is one file per MPI rank. The primary purpose of this is to speed up the saving (and loading) of near-null vectors during MG when tuning the algorithm, but it can also be used (very effectively) in production runs so long as you can assume the process decomposition will not change between runs.

A description of a `PARTFILE` workflow, where files are stored to per-node local scratch disks, copied to the network drive after the run, and then the process run in reverse on later runs, has already been documented on the QUDA wiki.

This is threaded through the test executables via the flags `--mg-save-partfile` and `--eig-save-partfile`, as well as through the MILC MG interface.

Of note: there is no need for an analogous "loading" flag because QIO will automatically look for a singlefile, then a partfile, version of a file on load. There is also no functional reason why this couldn't be added for gauge fields as well; there is just far less of a use case (and much more risk of confusion).
This has been verified to give a speedup for 144^3x288 HISQ MG workflows on Selene where saving 64 fine-level near-null vectors goes from taking ~144 seconds to ~6 seconds. While I don't have the allocation to perform fresh timings on other machines, historically I have seen the analogous save take up to an hour on Summit; it's expected this would be much faster with the on-node SSDs.