
WeeklyTelcon_20211102

Geoffrey Paulsen edited this page Nov 8, 2021 · 1 revision

Open MPI Weekly Telecon

Attendees (on Webex)

Oops, attendance was not recorded today. :(

not there today (I keep this for easy cut-n-paste for future notes)

  • Akshay Venkatesh (NVIDIA)
  • Artem Polyakov (NVIDIA)
  • Aurelien Bouteiller (UTK)
  • Austen Lauria (IBM)
  • Brandon Yates (Intel)
  • Brendan Cunningham (Cornelis Networks)
  • Brian Barrett (AWS) - Welcome Back!
  • Charles Shereda (LLNL)
  • Christoph Niethammer (HLRS)
  • David Bernholdt (ORNL)
  • Edgar Gabriel (UH)
  • Erik Zeiske (HPE)
  • Geoffrey Paulsen (IBM)
  • Geoffroy Vallee (ARM)
  • George Bosilca (UTK)
  • Harumi Kuno (HPE)
  • Hessam Mirsadeghi (NVIDIA)
  • Howard Pritchard (LANL)
  • Jeff Squyres (Cisco)
  • Joseph Schuchart (HLRS)
  • Josh Hursey (IBM)
  • Joshua Ladd (NVIDIA)
  • Marisa Roman (Cornelis Networks)
  • Mark Allen (IBM)
  • Matias Cabral (Intel)
  • Matthew Dosanjh (Sandia)
  • Michael Heinz (Cornelis Networks)
  • Nathan Hjelm (Google)
  • Noah Evans (Sandia)
  • Raghu Raja
  • Ralph Castain (Intel)
  • Sam Gutierrez (LANL)
  • Scott Breyer (Sandia?)
  • Shintaro Iwasaki
  • Sriraj Paul (Intel)
  • Thomas Naughton (ORNL)
  • Todd Kordenbrock (Sandia)
  • Tomislav Janjusic (NVIDIA)
  • William Zhang (AWS)
  • Xin Zhao (NVIDIA)

New Topics For Today

  • Do the Fortran fixes affect the API (i.e., are they needed for v5.0.0)?
  • Howard has been implementing isend/recv and isend/replace (MPI_Isendrecv and MPI_Isendrecv_replace).
    • ULFM might not have looked closely enough at how this was defined in the standard.
    • What if send completed, but the recv failed?
      • Not hard to code, just not well defined. Let the forum discuss.
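
The open question above (send completed, receive failed) can be pictured with a toy model. This is only an illustrative sketch, not Open MPI or ULFM code: the names `HalfState`, `CombinedRequest`, and `outcome` are hypothetical, and the "partial" label is just a placeholder for whatever the forum decides to define.

```python
from dataclasses import dataclass
from enum import Enum


class HalfState(Enum):
    """State of one half (send or receive) of a combined operation."""
    PENDING = "pending"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class CombinedRequest:
    """Hypothetical model of a single request that covers a send and a receive."""
    send: HalfState = HalfState.PENDING
    recv: HalfState = HalfState.PENDING

    def outcome(self) -> str:
        halves = (self.send, self.recv)
        if HalfState.PENDING in halves:
            return "pending"
        if all(h is HalfState.COMPLETED for h in halves):
            return "success"
        if all(h is HalfState.FAILED for h in halves):
            return "failed"
        # One half completed and the other failed: the case raised in the meeting.
        return "partial"


# The scenario from the notes: the send went through, the receive did not.
req = CombinedRequest(send=HalfState.COMPLETED, recv=HalfState.FAILED)
print(req.outcome())
```

The model only makes the ambiguous state explicit; how a single request should report it to the caller is the part left to the forum.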

v4.0.x

  • Schedule: Pushed to Nov. for 4.0.7
  • Thursday we'll build 4.0.7 rc2
  • Adding the ireduce_scatter 2GB silent-wrong-answer bug to NEWS.

v4.1.x

  • Schedule:
    • Another RC probably tomorrow
  • Cisco MTT just got their testing back online this morning.
    • RHEL7 - Let's Encrypt certificate expiration last month.
    • Will just run v4.1.x to try to get a good run in.
      • All running
    • Howard added a new osc test named "empty" that includes ompi.h, but doesn't use it.
      • Some test harness fix PRs ready to merge.

v5.0.x

  • Schedule: rc2 went out yesterday.
  • https://github.com/open-mpi/ompi/issues/9540 might be ready on v5.0.x
  • 8 PRs open.
    • PR 9594 - Fixes some BTL issues (against master) will take a few days to review.
  • Issue #9554 Jeff asked about Partitions support going to v5.0 or not?
    • Matthew is interested
  • PR #9495 TCP Onesided for master.
  • Tommy's still pushing on UCX Onesided.
  • PR 9576 - Ralph filed a ticket about building packages externally.
    • Working with Fedora packagers. Will be a v5.0.x
    • Might need some back and forth with PMIx. The way he updated PMIx might need a massive change to OMPI.
      • Ball is somewhat in Jeff's court.
      • Across OMPI/PMIx/PRRTE - Just need to
  • MPI Info stuff that Joseph and Howard are working on.
    • Marking a few MPI_ calls as deprecated.
    • Never mind: since we're not yet MPI 4.0 compliant, DON'T mark them as deprecated yet.
    • No additional discussion.
  • Documentation
    • Got a needed change into the Sphinx tools. Not sure if there's a release yet.
      • This fixes outputting issues in manpages.
    • Process to update FAQ is to talk to Jeff or Harumi.
    • Any changes in README or FAQ let them know to make changes in NEW docs.
      • For now, make changes in ompi-www and README as usual and let them know.
  • Issue 9501 regression, needs to be fixed or reverted.
  • No test for building from tarball, ensure we don't need pandoc.
  • GitHub Project of [critical v5.0.x issues|https://github.com/open-mpi/ompi/projects/3]
    • Issue #8983 If we partially disable OSC/TCP BTL - Not breaking MPI compliance, just breaking one-sided performance badly.
    • Described approach of rc1 on Sept 23: disabling any functionality that is a blocker, to allow for the rc.
      • Worried that blockers might not be fixed in time, so will put in code to issue an error at runtime to prevent getting into those paths, and document it heavily.

Supercomputing (SC) BoF

  • Time and date of BoF: Nov 16 @ 12:15pm US Eastern Time.
  • Was accepted for Open MPI
    • Our Hybrid BoF will be a mostly VIRTUAL BoF
      • George may be there in person for tutorial (though other tutorials will be fully virtual)
    • Birds of a Feather will be virtual.
    • George sent out an email to Amazon, Cisco, IBM, NVIDIA
  • Where do we drop slides? Jeff will send again. Deadline T-minus 1-week.
    • Google Slides - Due Tuesday Nov 9th.
    • Focus on v5.0

Master

Documentation

  • No update
  • Don't do the old system, use this new system for v5.0.0

MPI 4.0 API

  • No discussion [Open MPI 4.0 API Compliance GitHub Project|https://github.com/open-mpi/ompi/projects/2]
  • Joseph says we're not dropping Info Keys as we SHOULD under MPI 4.0.
    • Can't make it work easily for Comms because it would need to go down into the PMLs.
    • Issue #9555
    • Do we want this in OMPI v5.0.0?
      • It'd be nice, because it's going to change behavior.
      • But it might also be bad because it's a change in behavior (if users depend on MPI 3.1 behavior)
        • But it wasn't specified in MPI 3.1, so maybe whatever we do is okay.
  • Jeff's going to review PR 9246
  • Howard will review 7985
  • Need to decide what to do with 8057
  • Sessions branch: don't want to merge into master until possibly v5.0.1 gets out.
    • It will complicate things in finalize/initialize code.

MTT

  • Looking okay.
  • Looks like something was wrong with MTT.
    • That machine just got upgraded.
    • Install fail is kinda weird.

Longer Term discussions

  • No discussion.