
WeeklyTelcon_20210112


Open MPI Weekly Telecon

  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees (on Web-ex)

  • Akshay Venkatesh (NVIDIA)
  • Aurelien Bouteiller (UTK)
  • Brendan Cunningham (Cornelis Networks)
  • Christoph Niethammer (HLRS)
  • Edgar Gabriel (UH)
  • Geoffrey Paulsen (IBM)
  • George Bosilca (UTK)
  • Harumi Kuno (HPE)
  • Hessam Mirsadeghi (UCX/nVidia)
  • Howard Pritchard (LANL)
  • Jeff Squyres (Cisco)
  • Joseph Schuchart
  • Matthew Dosanjh (Sandia)
  • Michael Heinz (Cornelis Networks)
  • Naughton III, Thomas (ORNL)
  • Raghu Raja (AWS)
  • Ralph Castain (Intel)
  • Todd Kordenbrock (Sandia)
  • William Zhang (AWS)

Not there today (I keep this for easy cut-n-paste for future notes)

  • Artem Polyakov (nVidia/Mellanox)
  • Austen Lauria (IBM)
  • Barrett, Brian (AWS)
  • Brandon Yates (Intel)
  • Charles Shereda (LLNL)
  • David Bernhold (ORNL)
  • Erik Zeiske
  • Geoffroy Vallee (ARM)
  • Josh Hursey (IBM)
  • Joshua Ladd (nVidia/Mellanox)
  • Mark Allen (IBM)
  • Matias Cabral (Intel)
  • Nathan Hjelm (Google)
  • Noah Evans (Sandia)
  • Scott Breyer (Sandia?)
  • Shintaro Iwasaki
  • Tomislav Janjusic
  • Xin Zhao (nVidia/Mellanox)
  • mohan (AWS)

Web-Ex

  • link has changed for 2021. Please see email from Jeff Squyres to devel-core@lists.open-mpi.org on 12/15/2020 for the new link

v4.0.x

  • v4.0.6rc1 - built, please test.
  • Discussed https://github.com/open-mpi/ompi/issues/8299 - an srun issue in v4.0.x; mpirun works.
    • srun might not give us enough info, so a fix might be needed.
    • Curious what version of hwloc their Slurm is built with.
  • Discussed https://github.com/open-mpi/ompi/issues/8321
    • Possible silent error with UCX in a VM.
    • Added the blocker label.
    • In v4.0.x and master, though it might be down in UCX.
  • SLURM_WHOLE issue; want to stay in sync with OMPI v4.1.x.
  • Howard wants to get Lustre testing done before v4.0.6rc2 (see the build sketch below).
    • Geoff pinged Mark to post his branch of ROMIO fixes for Lustre.
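
A hedged build sketch for the Lustre testing mentioned above; the --with-lustre and --with-io-romio-flags configure options and the romio321 component name match the v4.0.x tree, but the paths and the test binary are only placeholders.

```shell
# Hedged sketch: build a Lustre-enabled Open MPI to exercise the ROMIO fixes.
./configure --prefix=$HOME/ompi-lustre-test \
            --with-lustre=/usr \
            --with-io-romio-flags="--with-file-system=ufs+lustre"
make -j 8 install

# Force ROMIO (instead of the default OMPIO) for an MPI-IO test run on Lustre:
mpirun --mca io romio321 -np 4 ./mpiio_lustre_test
```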

v4.1

  • Merged a number of PRs yesterday.

  • Issue 8334 - a performance regression with AVX. Still digging into it.

    • AVX perf issue.
    • Raghu tested; AVX-512 seems to make things slower.
    • Papers show that anything beyond AVX2 throttles down the cores and has this effect.
    • Need to look into the root cause.
    • Probably not ready to be the default.
    • Many apps run just one rank per node, which might WANT AVX on, but fully subscribed runs may want AVX off (see the runtime sketch at the end of this section).
  • Issue 8335 - Trying to run with external PMIx.

    • resolved
  • Michael Heinz is looking at a new PSM2(?) issue from yesterday. Possibly for v4.1.1.

    • Fix PR'd for the CQ entry data size field.
  • Josh Hursey is working on Issue 8304 (verified in v4.1, v4.0, and v3.1)

    • Resolved.
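
Since issue 8334 is still being root-caused, here is a hedged runtime sketch of how the AVX-accelerated reductions could be switched off (or left on) per job with the usual MCA selection syntax; the op/avx component name comes from the issue discussion, and the benchmark binary is only an example.

```shell
# Hedged sketch: exclude the AVX op component while the regression is investigated
# ("^" excludes the listed component from selection).
mpirun --mca op ^avx -np 64 ./osu_allreduce

# For the one-rank-per-node case, where wider vectors may actually help,
# leave the component enabled and spread the ranks out instead:
mpirun --map-by ppr:1:node -np 4 ./osu_allreduce
```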

Open-MPI v5.0

What's the state of ULFM (PR 7740) for v5.0?

  • Does the community want this ULFM PR 7740 for OMPI v5.0? If so, we need a PRRTE v3.0
    • Aurelien will rebase.
    • Works with the PRRTE referenced by the ompi master submodule pointer.
    • Currently used in a bunch of places.
    • Run normal regression tests. Should not see any performance regressions.
    • When this works, can provide other tests.
    • It is a configure flag. The default is to configure it in, but it is disabled at runtime.
      • A number of things have to be set to enable it.
      • Aurelien is working to get it down to a single parameter (see the sketch after this list).
    • Let's get some code reviews done.
      • Look at the intersections with the core, and ensure that the NOT-ULFM paths are "clean".
    • There is also a downstream effect on PMIx and PRRTE.
    • Let's put a deadline on reviews; let's say in 4 weeks we'll push the merge button.
      • Jan 26th: we'll merge if there are no issues.
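
A hedged sketch of what enabling ULFM could look like once the single-parameter work lands; the --with-ft=ulfm configure option and the --with-ft ulfm runtime option are the names used in the ULFM work, but the exact spelling may still change before v5.0.

```shell
# Hedged sketch: fault tolerance compiled in (expected to be the configure default) ...
./configure --with-ft=ulfm --prefix=$HOME/ompi-ulfm
make -j 8 install

# ... and enabled at runtime only for jobs that want it, so the non-ULFM
# ("clean") code paths stay untouched for everyone else.
mpirun --with-ft ulfm -np 8 ./ft_aware_app
```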

Josh and George removed Checkpoint Restart

  • Modified ABI - removed one callback/member function, used for FT events, from some components (BTLs/PMLs).
    • This touches all of the structures for those components.
    • Pending this discussion.
    • Going to version the frameworks that are affected.
    • It is not that simple in practice, because usually we just return a pointer to a static object.
      • But this isn't possible anymore.
      • We don't support multiple versions
  • Do we think we should allow Open-MPI v5.0 to run with MCAs from past versions?
    • Maybe it would be good to protect against it?
    • Unless we know of someone we need to support like this, we shouldn't bend over backwards for it.
    • Josh thinks the Container community is experimenting with this.
  • Josh has advised that Open-MPI doesn't guarantee support for this.
  • v5.0 is advertised as an ABI break.
  • In this case, the framework doesn't exist anymore.
  • George will add a check to ensure we're not loading MCAs from an earlier version (see the ompi_info sketch below).
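
For reference, ompi_info already reports the MCA, API, and component version triplet for every plugin it finds; a load-time check along the lines George described would compare those against the versions libmpi was built with. The output below is illustrative only.

```shell
# Hedged sketch: inspect the version triplets of the BTL components that would be loaded.
ompi_info | grep "MCA btl"
# example output:
#   MCA btl: self (MCA v2.1.0, API v3.1.0, Component v4.1.0)
#   MCA btl: tcp (MCA v2.1.0, API v3.1.0, Component v4.1.0)
```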

Jeff Squyres wants the v5.0 RMs to generate a list of the versions it'll support, to document.

  • Still need to coordinate on this. He'd like it this week.

  • PMIx v4.0 is working on tools; hopefully done soon.

    • The PMIx tools go through the Python bindings.
    • A new shmem component as a replacement.
    • Still being worked on.
  • Dave Wooten pushed up some PRRTE patches, and is making some progress there.

    • Slow but steady progress.
    • Once tool work is more stabilized on PMIx v4.0, will add some tool tests to CI.
    • Probably won't start until the first of the year.
  • How are the submodule reference updates on Open-MPI master going?

    • We'll probably be switching OMPI master to PMIx master in the next few weeks.
      • PR 8319 - this failed. Should it be closed and a new one created?
    • Josh was still looking at adding some cross-checking CI.
    • When making a PRRTE PR, a comment could be added to the PR to trigger Open-MPI CI against that PR.
  • v4.0 PMIx and PRRTE master.

    • When PRRTE creates a v2.0 branch, we can switch to that then.
  • Two different drivers:

    • OFI MTL
    • HFI support
    • Interest in PRRTE in a release, and a few other things that are already in v4.1.x
    • HAN and ADAPT as default.
    • Amazon is helping with testing and other resources.
    • Amazon is also investing in contracting Ralph to help get PRRTE up to speed.
  • Other features in PMIx

    • Can set GPU affinities and can query GPU info.

This is the last Tuesday call of December.

  • New web-ex for January

ROMIO issue on Lustre

  • Took the latest ROMIO from MPICH and it failed on both.
  • But then he took last week's 3.4 beta ROMIO and it passed. But it's a little too new.
    • He gave a bit more info about the stuff he integrates and the stuff he moves forward.
        1. ROMIO modernization (don't use MPI-1 based things)
        2. ROMIO integration items.
    • We're hesitant to put this into 4.1.0 because it's NOT yet released from MPICH.
    • Hesitant to even update ROMIO in v4.0.6, since it's a big change.
    • If we delay and pickup newer ROMIO in the next minor, would there be backwards compatibility issues?
      • Need to ask about compatibility between ROMIO 3.2.2 and 3.4
        • If fully compatible, then only one ROMIO is needed.
    • We could ship multiple ROMIOs, but that has a lot of problems.

Edgar hunted down the OMPIO performance issue

  • Just got resources to test, and root-caused the issue in OMPIO.
  • So, given some more time, Edgar will get a fix, and OMPIO can be the default.

ROMIO Long Term (12/8)

  • What do we want to do about ROMIO in general?
    • OMPIO is the default everywhere.
    • Gilles is saying the changes we made are integration changes.
      • There have been some OMPI-specific changes put into ROMIO, meaning the upstream maintainers refuse to help us with it.
      • We may be able to work with upstream to make a clear API between the two.
    • As a 3rd party package, should we move it up to the 3rd-party packaging area, to be clear that we shouldn't make changes to this area?
  • Need to look at this treematch thing; it is an upstream package that is now inside of Open-MPI.
  • Might want a CI bot to watch a set of files, and flag PRs that violate principles like this.

Doc update

  • PR 8329 - convert README, HACKING, and possibly Manpages to restructured text.
    • Uses https://www.sphinx-doc.org/en/master/ (a Python tool; can pip install it - see the sketch below).
    • Has a build from this PR, so we can see what it looks like.
    • Have a look. It's a different approach to have one document that's the whole thing.
      • FAQ, README, HACKING.
  • Do people even use manpages anymore? Do we need/want them in our tarballs?
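
A hedged sketch of the usual Sphinx workflow for reviewing PR 8329 locally; the docs/ directory name is a placeholder for wherever the restructured-text sources live in the PR.

```shell
# Hedged sketch: install Sphinx and render the docs to HTML for review.
pip install sphinx sphinx-rtd-theme
sphinx-build -b html docs/ docs/_build/html
# then open docs/_build/html/index.html in a browser
```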

Josh described new command line flags (-prot / -protlazy)

  • Putting new tests there

  • Very little there so far, but working on adding some more.

  • Should have some new Sessions tests

  • What's going to be the state of the SM Cuda BTL and CUDA support in v5.0?

    • What's the general state? Any known issues?
    • AWS would like to get.
    • Josh Ladd will take it internally to see what they have to say.
    • From nVidia/Mellanox, CUDA support is through UCX; SM Cuda isn't tested that much.
    • Hessam Mirsadeghi: all CUDA awareness is through UCX.
    • May ask George Bosilca about this.
    • Don't want to remove a BTL if someone is interested in it.
    • UCX also supports CUDA over TCP.
    • PRRTE CLI on v5.0 will have some GPU functionality that Ralph is working on
  • Update 11/17/2020

    • UTK is interested in this BTL, and maybe others.
    • Still a gap in the MTL use-case.
    • nVidia is not maintaining SMCuda anymore. All CUDA support will be through UCX
    • What's the state of the shared memory in the BTL?
      • This is the really old generation Shared Memory. Older than Vader.
    • We were told that after a certain point there would be no more development in SM Cuda.
    • One option might be to
    • Another option might be to bring the shared-memory support in SM Cuda over to Vader (now SM).
  • Restructured Text doc (more features than Markdown, including cross-references)

    • Jeff had a first stab at this; take a look. Sent it out to the devel list.
    • All work for master / v5.0
      • Might just be useful to do README for v4.1.? (don't block v4.1.0 for this)
    • Sphinx is the tool used to generate docs from restructured text.
      • It can handle the current markdown manpages together with the new docs.
    • readthedocs.io encourages "restructured text" format over markdown.
      • They also support a hybrid for projects that have both.
    • Thomas Naughton has done the restructured text, and it allows
    • LICENSE question - what license would the docs be available under? The Open-MPI BSD license, or something else?
  • Ralph tried Instant-On at scale:

    • 10,000 nodes x 32 PPN
    • Ralph verified Open-MPI could do all of that in < 5 seconds, Instant-On.
    • Through MPI_Init() (if using Instant-On)
    • TCP and Slingshot (the OFI provider is private for now).
    • PRRTE with PMIx v4.0 support
    • Slurm has some of the integration, but hasn't taken this patch yet.
  • Discussion on:

    • Draft request: make the default static - https://github.com/open-mpi/ompi/pull/8132 (see the configure sketch below).
    • One con is that many providers hard-link against libraries, which would then make libmpi dependent on them.
    • Non-homogeneous clusters (GPUs on some nodes and not on others).
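
A hedged sketch of the closest existing knob to what the draft PR proposes: --disable-dlopen already builds the components into the libraries, and ldd shows the extra hard-linked provider dependencies that were raised as a con; the prefix is a placeholder.

```shell
# Hedged sketch: build with components slurped into the libraries instead of DSOs.
./configure --disable-dlopen --prefix=$HOME/ompi-static
make -j 8 install

# See what libmpi now depends on; hard-linked provider libraries
# (CUDA, verbs, etc.) would show up here on the build host.
ldd $HOME/ompi-static/lib/libmpi.so
```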

Video Presentation

  • New: George and Jeff are leading.
  • One for Open-MPI and one for PMIx
  • In a month and a half or so. George will send the date to Jeff.