
WeeklyTelcon_20211019

Geoffrey Paulsen edited this page Nov 2, 2021 · 1 revision

Open MPI Weekly Telecon

Attendees (on Web-ex)

  • Austen Lauria (IBM)
  • Brendan Cunningham (Cornelis Networks)
  • Brian Barrett (AWS) - Welcome Back!
  • Geoffrey Paulsen (IBM)
  • Harumi Kuno (HPE)
  • Hessam Mirsadeghi (NVIDIA)
  • Howard Pritchard (LANL)
  • Jeff Squyres (Cisco)
  • Joseph Schuchart (HLRS)
  • Josh Hursey (IBM)
  • Matthew Dosanjh (Sandia)
  • Michael Heinz (Cornelis Networks)
  • Sam Gutierrez (LANL)
  • Sriraj Paul (Intel)
  • Thomas Naughton (ORNL)
  • Todd Kordenbrock (Sandia)
  • William Zhang (AWS)

Not there today (kept for easy cut-n-paste into future notes)

  • Akshay Venkatesh (NVIDIA)
  • Artem Polyakov (NVIDIA)
  • Aurelien Bouteiller (UTK)
  • Brandon Yates (Intel)
  • Charles Shereda (LLNL)
  • Christoph Niethammer (HLRS)
  • David Bernholdt (ORNL)
  • Edgar Gabriel (UH)
  • Erik Zeiske (HPE)
  • Geoffroy Vallee (ARM)
  • George Bosilca (UTK)
  • Joshua Ladd (NVIDIA)
  • Marisa Roman (Cornelis Networks)
  • Mark Allen (IBM)
  • Matias Cabral (Intel)
  • Nathan Hjelm (Google)
  • Noah Evans (Sandia)
  • Raghu Raja
  • Ralph Castain (Intel)
  • Scott Breyer (Sandia?)
  • Shintaro Iwasaki
  • Tomislav Janjusic (NVIDIA)
  • Xin Zhao (NVIDIA)

New Topics For Today

  • Discuss the relative submodule path issue

    • Only master and v5.0.x.
    • Request from distributor to change from https:// to relative path.
    • Works with the git client in RHEL7 (the git in RHEL6 was too old).
    • Some issues:
      • One client was using a mirror, but was accidentally still fetching the submodules over https. The change exposed that.
    • If this is a problem, please let us know.
  • Do the Fortran fixes affect the API? (i.e., are they needed for v5.0.0?)
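
For anyone checking whether the relative submodule paths discussed above affect their mirror, here is a minimal sketch of how a relative submodule URL resolves against the parent repository's remote URL. It is a simplified approximation of git's behavior, not git's actual implementation: it handles only slash-separated URLs (not scp-style `user@host:path` remotes). The function name `resolve_submodule_url`, the mirror host, and the example submodule path are illustrative assumptions, not taken from these notes.

```python
def resolve_submodule_url(parent_url: str, relative_url: str) -> str:
    """Approximate how a relative submodule URL is resolved against the
    superproject's remote URL (simplified; slash-separated URLs only)."""
    base = parent_url.rstrip("/")
    rest = relative_url
    while True:
        if rest.startswith("../"):
            # Each "../" drops the last path component of the parent URL.
            base = base.rsplit("/", 1)[0]
            rest = rest[3:]
        elif rest.startswith("./"):
            rest = rest[2:]
        else:
            break
    return f"{base}/{rest}"


# Hypothetical submodule entry and remotes, for illustration only.
relative = "../../openpmix/openpmix.git"
upstream = resolve_submodule_url("https://github.com/open-mpi/ompi.git", relative)
mirror = resolve_submodule_url("https://mirror.example.com/open-mpi/ompi.git", relative)
print(upstream)
print(mirror)
```

With a relative path, a clone from a mirror looks for the submodules on that same mirror, so the mirror has to carry them too. With a hard-coded https:// URL the submodules always come from the original host, which is how a mirror setup can silently keep depending on it.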

v4.0.x

  • Schedule: Pushed to October for 4.0.7
  • --cpu-set - Geoff working on PR for nice warning/docs
  • Fortran PR 9259, 9367 probably affect v4.0.x branch as well.
    • Geoff will follow up.
  • Geoff and Howard

v4.1.x

v5.0.x

  • Schedule: rc2 went out yesterday.
  • 9534 - needs a cherry-pick back.
  • Issue #9554: Jeff asked whether Partitions support is going into v5.0 or not.
    • Matthew is interested
  • PR #9495 TCP Onesided for master.
  • Tommy's still pushing on UCX Onesided.
  • MPI Info stuff that Joseph and Howard are working on.
    • Marking a few MPI_ calls as deprecated was proposed.
    • Never mind: since we're not MPI 4.0 compliant, DON'T mark them as deprecated yet.
  • Documentation
    • Got a change into the Sphinx tools that we need. Not sure if there's a release with it yet.
      • This fixes outputting issues in manpages.
    • Process to update FAQ is to talk to Jeff or Harumi.
    • For any changes to the README or FAQ, let them know so the changes also get made in the NEW docs.
      • For now, make changes in ompi-www and README as usual and let them know.
  • v5.0.x requires pandoc. If a user builds from a downloaded tarball, they do NOT need pandoc installed.
    • If a user runs make dist or make distcheck, they WILL need pandoc.
      • This is a strange quirk, but seems fine.
  • Github Project of [critical v5.0.x issues|https://github.com/open-mpi/ompi/projects/3]
    • Issue #8983: if we partially disable the OSC/TCP BTL, we are not breaking MPI compliance, just hurting One-sided performance badly.
    • Described the approach for rc1 on Sept 23: disable any functionality that is a blocker, to allow for the rc.
      • Worried that blockers might not be fixed in time, so will put in code to issue an error at runtime to prevent getting into those paths, and document it heavily.
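
The pandoc rule noted above can be written down as a small decision function. This is only a sketch of the rule as stated in these notes, not part of the Open MPI build system; the names `pandoc_required` and `pandoc_available` are hypothetical.

```python
import shutil


def pandoc_required(from_tarball: bool, make_target: str = "all") -> bool:
    """Return True if pandoc is needed, following the rule in the notes:
    tarball builds do not need it, but 'make dist' / 'make distcheck' do."""
    if make_target in {"dist", "distcheck"}:
        return True
    # Per the notes, v5.0.x otherwise requires pandoc unless building from a tarball.
    return not from_tarball


def pandoc_available() -> bool:
    """Check whether a pandoc executable is on PATH."""
    return shutil.which("pandoc") is not None


if pandoc_required(from_tarball=True, make_target="dist") and not pandoc_available():
    print("pandoc is needed for this target but was not found on PATH")
```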

Supercomputing (SC) BoF

  • Date and time of the BoF: Nov 16 @ 12:15pm US Eastern Time.
  • Was accepted for Open MPI
    • Our hybrid BoF will be mostly VIRTUAL.
      • George may be there in person for tutorial (tho other tutorials will be fully-virtual)
    • The Birds of a Feather session will be virtual.
    • George sent out an email to Amazon, Cisco, IBM, NVIDIA

Master

Documentation

  • No update
  • Don't use the old system; use the new system for v5.0.0.

MPI 4.0 API

  • No discussion [Open MPI 4.0 API Compliance Github Project|https://github.com/open-mpi/ompi/projects/2]
  • Joseph says we're not dropping Info Keys as we SHOULD under MPI 4.0.
    • Can't make it work easily for Comms because it would need to go down into the PMLs.
    • Issue #9555
    • Do we want this in OMPI v5.0.0?
      • It'd be nice, because it's going to change behavior.
      • But it might also be bad because it's a change in behavior (if users depend on the MPI 3.1 behavior).
        • But since it wasn't specified in MPI 3.1, maybe whatever we do is okay.
  • Jeff's going to review PR 9246
  • Howard will review 7985
  • Need to decide what to do with 8057
  • Sessions branch: don't want to merge it into master, possibly until v5.0.1 gets out.
    • It will complicate things in finalize/initialize code.

MTT

  • Looking okay.
  • Looks like something was wrong with MTT.
    • That machine just got upgraded.
    • Install fail is kinda weird.

Longer Term discussions

  • No discussion.