WeeklyTelcon_20170221
Geoffrey Paulsen edited this page Jan 9, 2018
- Dialup Info: (Do not post to public mailing list or public wiki)
- Geoff Paulsen
- Jeff Squyres
- Artem Polyakov
- Edgar Gabriel
- Geoffroy Vallee
- Howard
- Josh Hursey
- Josh Ladd
- Nathan Hjelm
- Ralph
- Thomas Naughton
- Todd Kordenbrock
Review All Open Blockers
Review Milestones v1.10.6
- No plans for a v1.10.7
Review Milestones v2.0.2
- No plans for a v2.0.3
Review Milestones v2.1.0
- PMIx 1.2.1 will release today.
- Nathan thinks he can test Open MPI master with both PMIx 1.2.1 and PMIx master.
- Nathan is still concerned, still sees lots of scaling issues.
- Josh: Release today, open a PR to Open MPI tomorrow, and leave a few days for Nathan to test PMIx 1.2.1 and PMIx master.
- Ralph: We know Open MPI v2.1 won't scale that well. Memory scaling will be better, but launch scaling won't be as good as in Open MPI v2.0.
- Don't know how much better it will be with direct launch (the memory scaling issue remains because direct launch doesn't use dstore, unless the SLURM plugin uses it).
- How do we describe the PMIx 1.2.1 update in the Open MPI v2.1 release? Reduced memory footprint, but we haven't fixed the launch time problem.
- A bunch of PRs on 2.1.0 - Howard will merge in when he gets a chance.
- Went through the blocker list for v2.1.0.
- Nathan will try to get the fix for Issue 2106 in today, so that it fails gracefully.
- Probably not BSD-specific and could show up elsewhere, so adding a graceful failure. Removed blocker.
Bcast Corruption in libnbc.
- Hard to fix without packing. The problem is that each side picks a different algorithm.
- The workaround is in, so not a blocker for v2.1. Removed blocker and moved to v3.0.
Missing a few F08 symbols in C mpi.h.
- MPI says all constants are supposed to appear in all headers, regardless of language.
- Not technically MPI 3.0 compliant without this. Removed blocker for v2.1.
- So now the only two blockers are PMIx and the release checklist.
- Do we have a code-complete date? No new features once PMIx is in, but bug fixes until we release?
- We all want to get this out soon.
- Discussed a proposal to accelerate: skip v2.2 and move toward v3.0 as soon as v2.1.0 is out (soon!).
- Proposal was to branch v3.0 "soon", and then release on June 15th.
- The four-month release cycle (off of master) may not be feasible until we have better CI.
- CI only provides faster turnaround and prevents really bad code from going into master.
- Want to release what we test, but there are many features that we can't test.
- There is value in guidelines, but it is dangerous to make this a hard rule, because we don't want to kill things that people out there depend on us for.
- Can ask the community to test release candidates.
- Cutting a release branch enables vendors to begin their back-end testing. With the four-month cycle, that's
- We know there are things in Master we need, but probably don't want to back-port to v2.x
- So probably want to branch v3.0 pretty soon.
- MTT on master doesn't look too bad, but there are some issues.
- New proposal doesn't leave much time for new features wanted for v3.0:
- New features needed for v3.0 from those on the call?
- hooks framework, uGNI BTL
- Put something out on the devel list.
- New date-based approach allows us to ship v2.1 and push some non-regression type bugs back to the next release.
- Write up an email to the devel list stating that we will branch v3.0 off of master on Feb 28th [Action: IBM]
Review Master Pull Requests
Review Master MTT testing
- External components: renaming of external component symbols.
- When we first embedded these packages, they weren't available in distros (at least not at the versions we require).
- PMIx - Not in RHEL5 or RHEL6; may still be in an early-adopter phase where we have to carry it with us.
- libevent - Could use the alternative libev, but would HAVE to maintain a downstream fork.
- hwloc - configure tests for hwloc 1.8 or newer because of a function introduced in 1.8, but it looks like we don't actually use it. Without that requirement, most distros would already ship a suitable hwloc, and it wouldn't be an issue.
- Would be nice to strip them out, and have the glue to make them work.
- What would this look like?
- If we get rid of the internal components, we'd still have one or more external components to link against the various external libraries.
- Can we really go back to an hwloc 1.7?
- Cisco, ORNL, UTK, NVIDIA
- Mellanox, Sandia, Intel
- LANL, Houston, IBM, Fujitsu