WeeklyTelcon_20160216
Jeff Squyres edited this page Nov 18, 2016 · 1 revision
- Dialup Info: (Do not post to public mailing list or public wiki)
- Geoff Paulsen
- Jeff Squyres
- Brad Benton
- Edgar Gabriel
- Howard
- Josh Hursey
- Ryan Grant
- Todd Kordenbrock
- Joshua Ladd
- Ralph
- Sylvain Jeaugey
- Milestones: https://github.com/open-mpi/ompi-release/milestones/v1.10.3
- Targeting the beginning of April for 1.10.3 - no new drivers
- Nathan - 0 byte send issue?
- Howard - verbs usNIC build default issue? - PR 938 waiting for Howard to review.
- Jeff - Fortran08? - ralph just committed.
- Issue 1136 - SLES12 - Longrunning jobs mpirun SIGCHLD at end of Job?
- NVIDIA now showing MTT failures; these were silently failing before.
- hello_alloc_memusempi - 1sided. Sylvain should open an issue against 1.10.x.
- Some race condition, so possibly not fixed on master and 2.x; they might just not hit it.
- Wiki: https://github.com/open-mpi/ompi/wiki/Releasev20
- Blocker Issues: https://github.com/open-mpi/ompi/issues?utf8=%E2%9C%93&q=is%3Aopen+milestone%3Av2.0.0+label%3Ablocker
- Issue 1348 - disable addprocs 32bit & bigendian. Resolved.
- Issue 1346 - grpcomm fixes. Resolved.
- Issue 1252 - openib causes horrible same node perf.
- Can't test on 2.0 due to unresolved symbol / bad init.
- Milestones: https://github.com/open-mpi/ompi-release/milestones/v2.0.0
- PR 953: --host discussion Issue 1344
- Have a pow-wow at the face-to-face. Let's make sure we settle on what we want.
- ompi-release PR 962 - in master.
- Nathan would like this change everywhere. Nathan will put this on master everywhere PR.
- 966 - Fortran 08 - good with this, push everywhere.
- PR 967 - work occurred over weekend.
- Is there a more generic jenkins test that could be configured to catch this?
- Yes, but can't anticipate that this will happen commonly
- Jeff could add another configure option to Jeff's nightly MTTs.
- Edgar has a general problem that many components require OMPIO component.
- PMIx 1.1.2? - fixes some bugs, but also adds more "stuff".
- Ralph: this is a 1.1.2+ for OMPI 2.0 release.
- PMIx 1.2 for OMPI 2.0.1
- Will this be okay for our versioning backwards compatibility statements?
- Yes, none of this gets built into the users' application.
- From Last week:
- Lot of issues are usNIC related. Jeff will STILL look at.
- non-one-sided failures with usNIC cluster. Perhaps cluster network setup.
- NVIDIA failures look dynamics related. Sylvain is fixing something about the way it launches.
- Turned off NVIDIA MTT tests for now. Just started getting different errors.
- BOTH master and 2.x - some CUDA related things are broken. It IS collective related.
- Some new errors for 1.10 - because Jeff committed some fixes to the test, which is now SHOWING the error.
- Hope to get testing back online today or tomorrow.
- Nathan will look at all one-sided failures.
- TCP BTL might have an issue: getting a warning about trying to lock a resource that is already locked.
- LANL - Release stuff, Some investigations for meeting next week.
- Now that we have KNL boxes, been working some with Open MPI and MPICH on KNL; vast improvement over KNC.
- Binaries will work on KNL or Haswells.
- Want to get back to OMPI_PLACES setting. Not sure where to put it. Discuss at face2face.
- Will need to use NESTED OMP parallelism. Want to make that easy.
- Want to make sure everything is clean for 1-sided for 2.0
- Trying to find last error with MPOOL re-write. Asking for feedback, and asking how people like the new organization.
- Really want George's comment here.
- Will give us the ability to use MEMKIND, and will take some work to get everything to use the same allocators.
- Can expose performance variables to tweak settings.
- Houston - Mostly using release branch, done a little more code development for glass
- IBM -
- Getting MTT and builds setup internally.
- Defining support matrix for new Open MPI product.
- Will be using RFC process for some bigger features.
- Problem with MTT reporter. Josh put up a patch for it. Still running off the svn repo, but we'll need to do a swap.
- During the swap MTT will be down.
- LANL, Houston, IBM
- Cisco, ORNL, UTK, NVIDIA
- Mellanox, Sandia, Intel