
Meeting Minutes 2020 08 10

Tomislav Janjusic edited this page Jan 6, 2024 · 1 revision

2020-08-10 Open MPI virtual face-to-face meeting

Attendees:

  • Jeff Squyres
  • Austen Lauria
  • Brian Barrett
  • Brice Goglin
  • Christoph Niethammer
  • Howard Pritchard
  • Joseph Schuchart
  • Josh Hursey
  • Matthew Dosanjh
  • Nysal Jan
  • Raghu Raja
  • Tom Naughton
  • Ralph Castain
  • Shinji Sumimoto
  • Todd Kordenbrock
  • Nathan Hjelm

Notes

Make a list of v5.0.0 user-noticeable changes

Just gather a list of all the user-noticeable changes here.

Make a list here in this one place so that it's easy to find if/when we go to actually document them.

  • 5.0 breaks backward compatibility

    • ABI / SO version change

    • mpirun command line arguments change

    • MPIR is gone from v5.0, there is a shim library that users can use.

      • If you need the shim, go get it yourself
    • TotalView and DDT are both working on releases that support this; they're waiting for Open MPI v5.0, so we don't have TV/DDT version numbers to cite yet.

    • PMI1/PMI2 support (from Slurm and Cray): gone

    • ORTE is gone

      • Most noticeable via MCA params that are gone and/or other mpirun CLI args
    • mpirun: most args are double-dash now (single-dash is largely gone)

      • Would be good to make a list of the params that are gone
    • PMIx symbols are now visible to user applications

      • Be careful to not link in another PMIx!
    • Multiple different MCA layers -- three different namespaces of MCA params; need to document these somehow

      • PMIx
      • PRRTE
      • OMPI
    • Similar issue for the MCA config files

    • Similar issue for configure params

      • E.g., how to reach the configure CLI options of the underlying packages (e.g., expose PRRTE / PMIx configure CLI options through Open MPI's configure)
    • Cross reference to https://github.com/open-mpi/ompi/wiki/Webex-affinity-discussions-2019-09 wiki page for many CLI options that now exist in v5.0 (including listing deprecated options)

    • Josh is working on prte.1.md man page.

  • Qthreads / Argobots

    • This is a compile-time only decision
    • We should describe this specifically somewhere (README?)
    • There is only a very small subset of people who can/should use these options.
    • ...need some docs from Qthreads/Argobots people here.
    • NOTE: UCX PML does not use the synchronization object, so Argobots/Qthreads will effectively be stuck.
      • Joseph cites issue #7702
  • ULFM:

    • Point off to their documentation
  • openib:

    • Use UCX
  • Vader:

    • Use sm. The vader name will go away eventually (6.0?)
  • ADAPT and HAN

    • More details on how to configure these yourselves if you want to. Advanced users only.
  • Connectivity map

    • ob1: will show the BTLs
    • cm: might show the MTLs...? Need to check
      • Assuming it does not go into Libfabric, etc.
    • UCX: shows ucx
    • Talk to Josh with suggestions for mpirun CLI options
      • Spectrum is "--prot" (and "--prot-lazy")
  • Tell people that they need to update their shell auto-completion setup (bash, etc.) because the format of the ompi_info output [may have] changed in 5.0

  • 5.0 general messaging

    • We really recommend you use external:
      • hwloc
      • libevent
      • pmix
      • prrte (see note below)
    • If you use the internal ones, you won't get the headers
    • sidenote: we do not (yet?) recommend using external PRRTE (because you won't get the trivial mpirun wrapper for PRRTE; you can "prun ..." yourself, of course)
  • How do we communicate this to users?

    • v5.0 release guide
    • Point to the EasyBuild videos / slides
    • Make error messages "google-able"
      • ...maybe FAQ style?
      • Maybe something better...? --> Brian points out that making something google-able is pretty darn easy. We just need the content that is linked to from somewhere.
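
A minimal sketch (not from the meeting) of how a site might scan existing job scripts for the single-dash mpirun options mentioned above. It only flags the spelling pattern; it does not know which options Open MPI v5.0 actually accepts. The function name and the `allowed` allowlist are hypothetical, and anything that follows the application name on the command line will be scanned too, so expect false positives.

```python
import re
import shlex

# A single dash followed by two or more characters, e.g. "-bind-to".
# One-letter options such as "-n" or "-x" are deliberately not matched,
# and neither are double-dash options such as "--bind-to".
SINGLE_DASH_LONG = re.compile(r"^-[A-Za-z][A-Za-z0-9_-]+$")


def find_single_dash_options(command, allowed=()):
    """Return tokens in `command` that look like single-dash long options.

    `allowed` is a hypothetical allowlist for spellings that you have
    confirmed your mpirun still accepts; this sketch cannot check that.
    """
    flagged = []
    for token in shlex.split(command):
        if token in allowed:
            continue
        if SINGLE_DASH_LONG.match(token):
            flagged.append(token)
    return flagged


if __name__ == "__main__":
    cmd = "mpirun -np 4 -bind-to core --map-by node ./a.out"
    for opt in find_single_dash_options(cmd, allowed=("-np",)):
        # Suggest the double-dash spelling; verify it against the man page.
        print(f"check option spelling: {opt} (perhaps -{opt})")
```

With the sample command, only `-bind-to` is reported, because `-np` is on the allowlist and `--map-by` already uses a double dash.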

To-dos

  • Sidenotes:

    • Still need ompi_info work to see PMIx/PRRTE/etc. (???)
    • Still need PRRTE pass-thru of configure params from OMPI configure (Josh)
  • Raghu pointed out that only recent versions of HDF5 (as of Aug 2020) stopped using MPI-1 functionality

    • What are distros/packagers doing?
      • Debian ships Open MPI v4.0.2 and does not pass --enable-mpi1-compatibility
      • Fedora ...?
    • Did they silently enable --enable-mpi1-compatibility so that packages didn't notice?
    • Might be worth checking with the community on this -- last time we talked/checked this was Oct 2018 -- see https://github.com/open-mpi/ompi/wiki/5.0.x-FeatureList

Fujitsu status

See the Fujitsu slides.

MPI-4 features

  • MPI_SIZEOF deprecated: Jeff will handle

  • MPI_COMM_TYPE_HW_UNGUIDED / MPI_COMM_TYPE_HW_GUIDED added as possible values for split_type - Section 6.4.2 on page 269

    • Guillaume did his prototype as an external library, "hsplit"
    • The external library will have another hwloc
    • We will want to integrate that better in OMPI -- use our scalability stuff for hwloc, yadda yadda yadda
    • Embed this?
      • Probably best bet: pull it in as a good basis/starting point.
      • Integrate it deeply from there.
    • Brice will check with Guillaume, but the assumption so far is that we should just integrate it as a starting point and go from there.
  • Callback-driven event interface added to MPI_T - Section 14.3.8

    • Nathan has this prototyped in a branch
    • There are still a few things that need to be adjusted on the branch, but it's close
    • We would want some test coverage for this
      • We assume there are no existing tests (Nathan thinks there may be some, somewhere...?)
        • Nathan had some test code to develop his branch.
        • He might be able to re-purpose those as real tests...?
    • This deletes PERUSE support -- need to sync with George on this.
  • New MPI_INFO_CREATE_ENV function - Section 10.2.1 on page 420

    • This is what Aurelien did, right?
    • We think so.
    • It's kind of an addendum to the sessions stuff.
  • MPI Sessions - many places in standard touched, main additions are in Chapter 6 and Chapter 10.

    • Howard has this on a branch.
    • It's "fully functional" but not fully debugged.
    • Last time it was rebased was mid-May.
    • Would be beneficial to restructure INIT/FINALIZE first
      • Some pieces of this went into master already (allowed us to delete nice big chunks of code)
      • There are other pieces that are still only in the SESSIONS branch
    • Is this v5.0 or not?
      • ...not clear yet.
      • Would want to make sure that there's some MTT coverage of this
    • If we haven't yet branched for v5 by October, let's talk then about whether to bring this into master/v5.0.
      • Should have some more sessions tests by then.
  • Embiggenment - https://github.com/mpi-forum/mpi-issues/issues/137

    • Doesn't sound like anyone is working on this in Open MPI ...?
    • Lower priority, but we'll need this someday (to claim MPI-4 conformance)
  • MPI shared memory window / alignment stuff

    • Joseph has a PR outstanding for this
  • MPI partitioned communication

    • Matthew/Sandia is working on a prototype implementation in Open MPI
    • It's Matthew's main focus for the next few months

OFI heterogeneous memory support (e.g., CUDA/Accelerators)

Raghu talked about how AWS is working to support new Libfabric APIs for "hmem" (heterogeneous memory). There will eventually be a PR to discuss.

Public testing repo

Do we want to have a public repo for Open MPI tests?

  • ompi-tests is private because of its history
    • Mainly because we needed a way to easily share publicly-downloadable test suites all in one place back in the beginning of the project
    • It's a different internet now (trivially easy to download tests from anywhere on the internet). But ompi-tests remains.
  • What about new tests -- should they be private?
    • At least LANL would like DOE-funded self-written tests to be in a public repo, not a private repo.
    • This is a fair point.
    • HLRS would like to make their test suite public, too (that's currently in ompi-tests).
  • Proposal:
    • Make a new repo that is public
    • Use same permissions that we have on main OMPI repo
    • Use same LICENSE file that we have in the main OMPI repo
    • Any new tests can go in there
    • Old tests that we are 100% sure are of Open MPI community provenance can be moved to the new public repo (e.g., the HLRS test suite)
  • There was general agreement that this was a good thing.
    • Jeff will create ompi-tests-public repo.

No meeting tomorrow

Agenda is complete. No need to continue this meeting tomorrow.

But we will still have the "regular" Tuesday meeting tomorrow.
