
WeeklyTelcon_20161220

Geoffrey Paulsen edited this page Jan 9, 2018 · 1 revision

Open MPI Weekly Telcon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees

  • Geoff
  • Ralph
  • Howard
  • Josh Hursey
  • Nathan Hjelm

Agenda

Review 1.10

Review 2.0.x

  • Wiki: https://github.com/open-mpi/ompi/wiki/Releasev20

  • Blocker Issues: https://github.com/open-mpi/ompi/issues?utf8=%E2%9C%93&q=is%3Aopen+milestone%3Av2.0.0+label%3Ablocker

  • Milestones: https://github.com/open-mpi/ompi-release/milestones/v2.0.0

  • 2.0.2 -

    • 1sided stuff - Worked with Mark yesterday, and noticed an assumption in the test.
    • In short, osc_pt2pt is passing for single-threaded (after the test case fix).
    • multi-threaded still hanging in heavy order.
      • If a rank has multiple locks outstanding, they can be handled in an arbitrary order on remote side, causing a deadlock.
      • Unclear if this is a test problem (shouldn't put MPI in this situation).
      • Mark gave Josh a pretty small reproducer problem.
    • Josh will get the lock and lock2 test cases.

    2953 - locks must complete in order?

    • IBM will PR osc MCA parameters to disable osc_pt2pt for the thread-multiple case.
    • IBM will also PR the current osc_pt2pt fix commits. Single-threaded is working, but multi-threaded still hangs.

    Howard will close some already merged Issues.

    Anything else about 2.0.2 that we should discuss?

    • Howard wants to do an RC2 after we get 1sided in, tomorrow or Thursday morning.
    • Then check with Jeff to see if we want a release by end of week.
      • He has a nice checklist to work through for
  • 2.1 -

    • Did darray fix get into 2.1? Nathan thinks so.

PMIx update

  • PMIx 1.2.0 - rc'ed last week, looking good. So decided this morning to release 1.2.0 today.
  • Next step is to integrate into OMPI.
    • The original plan was to go into OMPI v2.1, but there was some discussion that we may want it in 2.0.3 to resolve memory issues.

General question

  • We have a bunch of sleeps that we do during startup. Totalview has a problem with a bunch of little usleeps.
  • Totalview requested that we change the usleeps to pthread condition waits.
    • In some environments, the pthread library didn't idle the CPUs; a waiting thread just kept using its full time slice.
    • Totalview claims that on the systems they run, that is no longer true, and that a pthread condition wait now idles the CPU.
    • Nathan says pthread condition waits are implemented in the kernel with something that will idle the CPU.
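
A minimal sketch of the two waiting styles discussed above, assuming a hypothetical startup flag that one thread sets and another waits on. The names `startup_gate_t`, `gate_init`, `gate_wait_polling`, `gate_wait_cond`, and `gate_open` are made up for illustration and are not Open MPI symbols; the 100 microsecond poll interval is also an assumption.

```c
#define _DEFAULT_SOURCE          /* exposes usleep() under strict -std= modes */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical startup gate: "ready" is protected by "lock". */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            ready;
} startup_gate_t;

static void gate_init(startup_gate_t *g)
{
    int rc = pthread_mutex_init(&g->lock, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&g->cond, NULL);
    assert(rc == 0);
    (void)rc;                    /* silence unused warning when NDEBUG is set */
    g->ready = false;
}

/* Polling style: check the flag, sleep briefly, repeat.
 * Returns how many times the thread slept before the gate opened. */
static unsigned long gate_wait_polling(startup_gate_t *g)
{
    unsigned long sleeps = 0;
    for (;;) {
        pthread_mutex_lock(&g->lock);
        bool ready = g->ready;
        pthread_mutex_unlock(&g->lock);
        if (ready) {
            return sleeps;
        }
        usleep(100);             /* assumed interval; many short wakeups */
        ++sleeps;
    }
}

/* Condition-wait style: block until gate_open() signals.
 * The while loop guards against spurious wakeups. */
static void gate_wait_cond(startup_gate_t *g)
{
    pthread_mutex_lock(&g->lock);
    while (!g->ready) {
        pthread_cond_wait(&g->cond, &g->lock);
    }
    pthread_mutex_unlock(&g->lock);
}

/* Open the gate and wake every waiter. */
static void gate_open(startup_gate_t *g)
{
    pthread_mutex_lock(&g->lock);
    g->ready = true;
    pthread_cond_broadcast(&g->cond);
    pthread_mutex_unlock(&g->lock);
}
```

In the condition-wait version the waiter makes one blocking call instead of a stream of short sleeps, which is the pattern Totalview asked for. Whether the blocked thread actually idles the CPU depends on the platform's pthread implementation, as noted above.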

Face2Face in Jan.

  • Please sign up.

Review Master MTT testing (https://mtt.open-mpi.org/)

MTT Dev status:

Status Updates:


Status Update Rotation

  1. Cisco, ORNL, UTK, NVIDIA
  2. Mellanox, Sandia, Intel
  3. LANL, Houston, IBM

