WeeklyTelcon_20171024
Geoffrey Paulsen edited this page Jan 9, 2018
·
1 revision
- Dialup Info: (Do not post to public mailing list or public wiki)
- Geoff Paulsen (IBM)
- Jeff Squyres
- Brian
- Geoffroy Vallee
- George
- Howard
- Josh Hursey
- Mathew / SNL
- Mohan
- Nathan Hjelm
- Ralph
- Todd Kordenbrock
- Edgar Gabriel
Review All Open Blockers
Review v2.0.x Milestones v2.0.4
- NEWS - Labor intensive to write the NEWS entries every time. Can't we automate this?
- Can we just use the short titles from the PR titles?
- Not this week.
- Don't include the High Sierra fix.
- Schedule: Get it out this week.
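One way the "use the short PR titles" idea could look is sketched below. This is only an illustration: the function name `format_news_entries`, the bullet layout, and the example PR data are all assumptions, not the project's actual NEWS format or tooling.

```python
def format_news_entries(pull_requests):
    """Turn (number, title) pairs into NEWS-style bullet lines.

    The "- title (PR #n)" layout is an assumed format for illustration.
    """
    entries = []
    # Sort by PR number so the output is stable between runs.
    for number, title in sorted(pull_requests):
        # Trim whitespace and a trailing period from the short title.
        short = title.strip().rstrip(".")
        entries.append(f"- {short} (PR #{number})")
    return "\n".join(entries)


# Hypothetical PR data, for illustration only.
example = [(2, "Fix placeholder issue B."), (1, "  Fix placeholder issue A")]
print(format_news_entries(example))
```

A person would still need to review the generated list, since PR titles are not always written with end users in mind.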
Review v2.x Milestones v2.1.2
- v2.1.3 (unscheduled, but probably Jan 19, 2018)
- PR4172 - a mix between feature / bugfix.
- Are we going to do anything for v2.x for hwloc 2?
- At least add a configure error if it detects hwloc v2.x.
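The logic of such a guard is simple enough to sketch. The real check would live in the configure scripts; the Python below only illustrates the decision, and the function name `check_hwloc_version` and the error text are hypothetical.

```python
def check_hwloc_version(version):
    """Reject hwloc 2.x (or newer) and accept 1.x version strings.

    Illustrative only: mirrors the proposed configure-time error.
    """
    major = int(version.split(".")[0])
    if major >= 2:
        raise RuntimeError(
            f"hwloc {version} detected: hwloc v2.x is not supported on this branch"
        )
    return version


print(check_hwloc_version("1.11.7"))
```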
Review v3.0.x Milestones v3.0
- v3.0.1
- Still targeting End of October for release of v3.0.1
- a few PRs need review.
- Schedule: Still shooting for End of October.
Review v3.1.x Milestones [v3.1](https://github.com/open-mpi/ompi/milestone/27)
- v3.1.x
- Roll hwloc back to 1.11.7 on v3.1.x branch (Ralph put together, Brian reviews)
- Will support an external hwloc v2.0.x, but default will be hwloc 1.11.7.
- PMIx - v3.1.0 was supposed to go out with PMIx 2.1.0 with cross version support
- Cross version support of PMIx is working fine, as long as not using PMIx shared memory.
- Fixing shared memory piece in v2.1 (with cross version support) needs a complete re-write.
- Ticket out there, needs review.
- Do we want to ship with PMIx v2.0 and no cross-version support? Or PMIx v2.1, but without shared memory support? (Would need a number of build-time flags to get this to build.)
- Could delay...
- Could we ship BOTH, and have the default be the PMIx v2.1 without shared memory
- provide a configure time flag to build with PMIx v2.0 to allow shared memory for high core-count platforms.
- BUT, the backwards-compatible PMIx v2.1 still doesn't work with older PMIx versions if they were built with dstore (which is/was the default), so users would have to go back and rebuild their PMIx installations.
- All of our options are BAD, so let's delay a week and discuss next week what we can do.
- Send out an email to devel-core, and say we're going to delay v3.1 to fix it.
- Amazon will scope the amount of changes for dstore this week.
- Schedule - Unsure, will see about above, and discuss next week.
- Add v3.1 to MTT tests
- Database is active now to accept v3.1 tests.
- MTT disks were getting full: PHP was trying to use /tmp, and the local /tmp was full all weekend, so submissions weren't working. Josh moved what he could, but still thinks PHP is putting something in /tmp.
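A small check like the one below could flag a filling /tmp before submissions start failing. It is a minimal sketch using only the Python standard library; the 90% threshold and the function names are assumptions, not part of the MTT setup.

```python
import shutil
import tempfile


def fraction_used(total, used):
    """Return the used fraction of a filesystem (0.0 if total is zero)."""
    return used / total if total else 0.0


def is_nearly_full(path, threshold=0.90):
    """True if the filesystem holding `path` is at or above `threshold`."""
    usage = shutil.disk_usage(path)
    return fraction_used(usage.total, usage.used) >= threshold


if __name__ == "__main__":
    tmp = tempfile.gettempdir()
    print(tmp, "nearly full" if is_nearly_full(tmp) else "ok")
```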
- Administration
- Restored the Partner designation.
- Voted in Mexico Consortium
Review Master Master Pull Requests
- Looking reasonably good, but the history is all mucked up.
- Something is going on with Jenkins (it looks like it's totally turned off right now)
- Treematch segfault issue - just master? We think.
- IBM has a patch we'll get PRed upstream, not sure if it fixes the same root issue others were seeing, but it fixes it in IBM's environment.
- George accidentally pushed a branch 'v3.x' into upstream.
- Just delete it.
- Jenkins - Botany Bay and Berkeley machines - both had issues where Jenkins couldn't ssh into those machines, and logged that it couldn't.
- This filled the disk, and ran us out of Web server credits.
- Brian will send Nathan the config for setting up a daemon for connections, so Jenkins won't sit in a loop trying to ssh to nodes it can't reach. He already has the Mac OS X config.
- There is a wiki page with instructions also.
- Brian will also put Jenkins on its own partition to help isolate us.
- When Jenkins goes bonkers it consumes all CPU cycles on the machine.
- Discussed Issue 4349
- We seem to remember disabling it due to a real bug.
- IBM will dig through notes and reply on Issue.
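The Jenkins ssh problem above (retrying unreachable nodes in a loop and logging every failure) is the kind of thing a bounded retry with backoff avoids. The sketch below is a generic illustration, not Jenkins's mechanism or the daemon config Brian mentioned; `connect_with_backoff` and its parameters are hypothetical.

```python
import time


def connect_with_backoff(connect, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `connect` until it succeeds, giving up after `max_attempts`.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts
    and re-raises the last OSError instead of retrying forever.
    """
    for attempt in range(max_attempts):
        try:
            return connect()
        except OSError:
            if attempt == max_attempts - 1:
                raise  # give up: surface the failure once
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter keeps the function easy to exercise without real delays.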
Review Master MTT testing
- Website - openmpi.org
- Brian is trying to make things more automated, so that one can check out the repo, etc. The repo is TOO large.
- The majority of the problem is the tarballs, and those are already stored in S3.
- Need to see if Attributes are MT - IBM will see if we have any tests to audit.
- Asked, need to get answer back from them.
- Jan / Feb
- Possible locations: San Jose, Portland, Albuquerque, Dallas
- Mellanox, Sandia, Intel
- LANL, Houston, IBM, Fujitsu
- Amazon,
- Cisco, ORNL, UTK, NVIDIA