
WeeklyTelcon_20220614


Open MPI Weekly Telecon

  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees (on Web-ex)

  • Akshay Venkatesh (NVIDIA)
  • Austen Lauria (IBM)
  • Brendan Cunningham (Cornelis Networks)
  • Brian Barrett (AWS)
  • Christoph Niethammer (HLRS)
  • Edgar Gabriel (UoH)
  • Geoffrey Paulsen (IBM)
  • Hessam Mirsadeghi (UCX/nVidia)
  • Joseph Schuchart
  • Josh Fisher (Cornelis Networks)
  • Josh Hursey (IBM)
  • Matthew Dosanjh (Sandia)
  • Todd Kordenbrock (Sandia)
  • Tommy Janjusic (nVidia)
  • William Zhang (AWS)

Not there today (kept for easy cut-n-paste for future notes)

  • Artem Polyakov (nVidia)
  • Aurelien Bouteiller (UTK)
  • Brandon Yates (Intel)
  • Charles Shereda (LLNL)
  • David Bernhold (ORNL)
  • Erik Zeiske
  • Geoffroy Vallee (ARM)
  • George Bosilca (UTK)
  • Harumi Kuno (HPE)
  • Howard Pritchard (LANL)
  • Jeff Squyres (Cisco)
  • Joshua Ladd (nVidia)
  • Marisa Roman (Cornelis)
  • Mark Allen (IBM)
  • Matias Cabral (Intel)
  • Michael Heinz (Cornelis Networks)
  • Nathan Hjelm (Google)
  • Noah Evans (Sandia)
  • Raghu Raja (AWS)
  • Ralph Castain (Intel)
  • Sam Gutierrez (LLNL)
  • Scott Breyer (Sandia?)
  • Shintaro Iwasaki
  • Thomas Naughton (ORNL)
  • Xin Zhao (nVidia)

v4.1.x

  • v4.1.5
    • Schedule: targeting ~6 months out (Nov 1).
    • No driver for the schedule yet.

v5.0.x

  • Updated PMIx and PRRTE submodule pointers.

    • Issue 10437 - we hope this is resolved by the updated pointers.
    • Austen couldn't reproduce it; can anyone confirm that this is resolved? (A retest sketch follows below.)
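
For anyone retesting, a hedged sketch of picking up the updated submodule pointers on v5.0.x before re-running the reproducer; the install prefix is only an example and site-specific configure options still apply:

```
# Refresh a v5.0.x checkout so the new PMIx/PRRTE submodule pointers are picked up
git checkout v5.0.x
git pull --ff-only
git submodule update --init --recursive

# Rebuild and reinstall before re-running the reproducer (prefix is only an example)
./autogen.pl
./configure --prefix=$HOME/ompi-v5.0.x-test
make -j 8 install
```
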
  • Issue 10468 - Doc to-do list.

  • Issue 10459 - a bunch of issues with ompi-master.

    • Compiler issues with Qthreads
      • Not sure who owns the Qthreads component.
  • Discussions about the new accelerator framework:

  • Mellanox still has some use cases for the sm_cuda BTL.

  • Any idea how mature the accelerator framework is?

    • NVIDIA commits to testing the framework on main.
    • Still some discussion on the Pull Request.
  • A couple of critical new issues.

    • Issue 10435 - a regression from v4.1.
      • No update.
  • Progress being made on missing Sessions symbols.

    • Howard has a PR open that needs a bit more work.
  • Call to PRRTE / PMIx

    • Longest pole in the tent right now.
    • If you want OMPI v5.0 released in the near-ish future, please scare up some resources.
    • Use the 'PRRTE critical' and 'Target v2.1' labels for issues.
  • Schedule:

    • Blockers are still the same; some, including the PRRTE blocker, are in the PRRTE project.
    • Right now the release is looking like late summer (the holdup being us not having a PRRTE release for packagers to package).
      • Call for help - if anyone has resources to help, we can move this release date much sooner.
      • Requires investment from us.
    • Any alternatives?
      • The problem for Open MPI is not that PRRTE isn't ready to release; the parts we use work great, but other parts still have issues (namely the DVM).
      • Because we install PMIx and PRRTE as if they came from their own tarballs, packagers have no good way to distribute Open MPI.
      • How do we install PMIx and PRRTE into Open MPI's lib directory instead and get all of the rpaths correct? (See the sketch after this list.)
      • This might be the best bet (aside from fixing the PRRTE sources, of course).
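
A rough sketch of the "install PMIx and PRRTE under Open MPI's own prefix with correct rpaths" idea discussed above. This is not what the build does today; the prefix, submodule paths, and LDFLAGS usage are illustrative assumptions:

```
# Assumption: $OMPI_PREFIX is where Open MPI itself will be installed.
OMPI_PREFIX=/opt/openmpi-5.0.0

# Build the bundled PMIx so its libraries land in $OMPI_PREFIX/lib,
# with that libdir baked into the runtime search path via an rpath.
cd 3rd-party/openpmix
./autogen.pl
./configure --prefix="$OMPI_PREFIX" LDFLAGS="-Wl,-rpath,$OMPI_PREFIX/lib"
make -j 8 install
cd ../..

# Same for PRRTE, pointing it at the PMIx just installed there.
cd 3rd-party/prrte
./autogen.pl
./configure --prefix="$OMPI_PREFIX" --with-pmix="$OMPI_PREFIX" \
            LDFLAGS="-Wl,-rpath,$OMPI_PREFIX/lib"
make -j 8 install
cd ../..
```
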
  • Several Backported PRs

Main branch

  • coll_han tuning runs discussion [PR 10347]

    • Tommy (NVIDIA) + UCX on v5.0.x: it seems that Adapt and HAN are underperforming relative to Tuned.
      • Graph of data posted to [PR 10347]
        • Percentage-difference latency graphs.
        • Anything ABOVE 0 is where HAN outperformed Tuned.
      • He's been seeing some "sorry" messages.
        • Perhaps a combination of SLURM and mpirun?
      • Just tested Alltoall, Allreduce, and Allgather.
      • x86 cluster, 32 nodes x 40 ppn.
        • By node, HAN seems to perform better.
        • By core, Tuned seems to perform better.
      • Some dips might be due to UCX dynamic transport selection at this scale (rather than RC).
      • Tommy can do some more testing if others have suggestions.
      • Used mpirun with either --map-by node or --map-by core, forcing UCX and selecting the collective component (see the sketch at the end of this item).
      • Tommy will also run 1 ppn and full ppn.
    • Would be good to run the Open MPI v4.1 branch as well, especially since George's paper was against v4.1.
    • Brian (AWS) was using EFA and seeing similar things.
    • Would also be interesting to see how UCC stands up against these numbers.
    • Cornelis (Brendan) ran both v4.1 and main - not highly tuned clusters, but similar components.
      • Trying to isolate the differences between v4.1 and main.
      • Just increasing the priority SHOULD be enough to select the desired collective components.
      • OFI with the PSM2 provider.
      • Substantial difference between main and v4.1.
      • Has seen substantial differences with different mapping flags.
      • Maybe we should rerun this with explicit mapping controls.
      • Small messages seem better with HAN and large messages better with Tuned?
    • Austen (IBM) also did graphs with v5.0.x.
      • Lower percentages.
      • OB1 out of the box with Tuned/HAN.
      • Orange is --map-by core, blue is --map-by node.
      • Bcast getting close to 90%.
      • Will run with IMB to verify the OSU data.
      • Using UCX, didn't see much difference between HAN and Tuned.
      • HAN is hierarchical, so scaling ppn shouldn't make as noticeable a difference as scaling nodes.
      • Doesn't see much difference between --map-by core and --map-by node (expected with HAN), which is dissimilar to Brian's and Tommy's data.
    • Would be good for George to look and comment on this.
    • Joseph is also planning to do runs.
      • Will talk to George about the posted numbers and post any suggestions.
    • Thomas Naughton.
    • main and v5.0.x should be the same; use either.
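
For reference, a hedged example of the kind of run matrix described in this item (forcing the UCX PML, selecting the collective component, and varying the mapping). Defaults differ between branches, so the exact priorities and exclusions may need adjusting:

```
# 32 nodes x 40 ppn (1280 ranks), OSU collective micro-benchmarks.

# Tuned: exclude HAN/ADAPT so selection falls back to tuned.
mpirun -np 1280 --map-by node --mca pml ucx \
       --mca coll ^han,adapt ./osu_allreduce

# HAN: raise its priority above tuned's default.
mpirun -np 1280 --map-by node --mca pml ucx \
       --mca coll_han_priority 100 ./osu_allreduce

# Repeat both with --map-by core, and with osu_alltoall / osu_allgather.
```
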
  • Please HELP!

    • Performance test the default selection of Tuned vs HAN.
    • Brian hasn't had (and might not for a while have) time to send out instructions on how to test.
      • Can anyone send out these instructions? (A minimal sketch is included below as a starting point.)
    • Call for folks to performance test at 16 nodes, and at whatever scale "makes sense" for them.
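
In the absence of formal instructions, one minimal starting point (an assumption, not Brian's procedure) might be to compare the out-of-the-box selection against explicitly forced components at 16 nodes:

```
# Default selection - whatever the branch picks out of the box
# (<ranks> is a placeholder for 16 nodes x whatever ppn makes sense):
mpirun -np <ranks> --map-by node ./osu_allreduce

# Forced HAN and forced Tuned for comparison (see the run matrix above):
mpirun -np <ranks> --map-by node --mca coll_han_priority 100 ./osu_allreduce
mpirun -np <ranks> --map-by node --mca coll ^han,adapt ./osu_allreduce
```
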
  • The accelerator work William is working on should be able to come out of draft.

    • Edgar has been working on the ROCm component of the framework.
    • Post v5.0.0? Originally the thinking was it shouldn't go in since the release was close, but if the release slips to the end of summer, we'll see...
  • Edgar finished the ROCm component... it appears to be working.

    • William or Brian can comment on how close it is to merging to main.
    • William is working on the btl sm_cuda and rcache code; could maybe merge at the end of this week.
    • Tommy was going to get some NVIDIA people to review / test.
    • Discussion on btl sm_cuda - it was cloned from the older sm component, not from vader (which was later renamed to sm).
      • Might be time to drop btl sm_cuda?
      • The vader component (now sm) does not have hooks into the new framework.
      • Cases where btl sm_cuda might still get used today:
        • The TCP path would use it for on-node transfers.
        • Nodes without UCX.
      • Even one-sided would not end up using btl sm_cuda.
    • v5.0.0 would be a good time to remove it.
      • Being based on the old sm is a big detractor.
      • Can we ALSO remove rcache? Unclear.
  • What's the status of the accelerator work with respect to the v5.0.x branch?

    • PR is just to main.
    • We said we could do a backport, but that would be after it gets merged to main.
      • If v5.0.0 is still a month out, is that enough time?
      • v5.0.0 is looming closer.
    • This is a BIG chunk of code...
      • But if v5.0.0 delays longer... this would be good to get in.
    • The answer is largely dependent on PMIx and PRRTE.
    • It also has implications for OMPI-next?
  • Can anyone who understands packaging review: https://github.com/open-mpi/ompi/pull/10386 ?

  • Automate 3rd-party minimum version checks into a txt file

    • Both configure and the docs could read from a common file.
    • config.py runs at the beginning of the Sphinx build and could read in files, etc.
    • Still iterating on this (a hypothetical sketch follows below).
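
A hypothetical sketch of the shared-file idea; the file name, format, and variable names are invented for illustration, not what the build system currently uses:

```
# Hypothetical shared file, e.g. config/minimum-versions.txt, one "name=version" per line:
#   pmix=x.y.z
#   prrte=x.y.z
#   hwloc=x.y.z
#   libevent=x.y.z

# A configure-time shell fragment could read it like this;
# the docs' config.py could parse the same file on the Sphinx side.
while IFS='=' read -r pkg ver; do
    case "$pkg" in
        pmix)     min_pmix_version="$ver" ;;
        prrte)    min_prrte_version="$ver" ;;
        hwloc)    min_hwloc_version="$ver" ;;
        libevent) min_libevent_version="$ver" ;;
    esac
done < config/minimum-versions.txt
```
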
  • https://github.com/open-mpi/ompi/pull/8941 -

    • Would like to get this in, or close it.
    • Geoff will send George an email to ask him to review.

MTT

Face-to-face

  • What are companies thinking about travel?
  • Wiki for face to face: https://github.com/open-mpi/ompi/wiki/Meeting-2022
    • Should think about schedule, location, and topics.
    • Some new topics added this week. Please consider adding more topics.
  • The MPI Forum meeting was virtual.
  • The next one (at EuroMPI) will be hybrid.
    • Plan to continue being hybrid, with 1-2 meetings / year.