WeeklyTelcon_20160719
Jeff Squyres edited this page Nov 18, 2016
·
1 revision
- Dialup Info: (Do not post to public mailing list or public wiki)
- Geoff Paulsen
- Jeff Squyres
- Artem Polyakov
- Brian
- Edgar Gabriel
- Howard
- Josh Hursey
- Nathan Hjelm
- Ralph
- Ryan Grant
- Todd Kordenbrock
- Milestones
- A couple of things sitting against 1.10.4
- Wiki
- 2.0.1 PRs that are reviewed and approved
- v2.0.1 PRs are open. Need to get PRs reviewed!
- Blocker Issues *
- Milestones
- We released last Tuesday. Now taking in PRs.
- A lot of 2.0.1 PRs that did not get reviewed yet, so please get reviews.
- Howard and Jeff merging in low risk ones.
- nvidia failures with OFED install (false failures)
- cisco failures - still some failures here. Have to do with sparse groups. One of the PRs we haven't pulled in yet.
- IBM failures seem to have to do with spawn and intercomm connect.
- Might call Connect / Accept - when we create key, we use PMIx to communicate between leaders.
- PMIX needs to support Exchange.
- Aborts, but then hangs. A PMIx error code is coming up.
- Cray - all associated with SPAWN, but CRAY PMI doesn't support it.
- Applaunch with Master doesn't work
- MPI_Info keys are weird. OMPI_NUM_APPS? what is that?
- Mellanox will host Eventbrite (vendor process fees). Thank you Mellanox.
Review Master MTT testing (https://mtt.open-mpi.org/)
- Gentle reminder that lots of 2.0.1 PRs haven't been reviewed yet.
- Merging github master and ompi_release is taking a backseat to migration.
- Migration ongoing, nothing's moved yet, just testing:
- Mailman lists - sanity check of list of lists that we are migrating, and not-migrating.
- if Community is good with list of lists, then give everyone a heads up that it's moving.
- New aliases will be @lists.open-mpi.org.
- Transfer MTT to Ralph's machine to address the PostgreSQL issue before transitioning.
- MTT code is somewhat PostgreSQL-specific, and HostGator supports MySQL but not PostgreSQL.
- So need to modify the code from PostgreSQL to MySQL.
- So Intel is temporarily hosting the MTT server until we can migrate to MySQL.
- Meeting with MTT to guesstimate the time: a few months of realistic effort.
- Mostly an API issue, though some PostgreSQL-specific tables will need to change. Database structure won't have to change.
- moving main website. Mostly a solved issue. Want to do mailing list stuff first.
- PDFs for 3rd party agreements. Ralph talked to HostGator; they have a file-sharing option that increases the price dramatically.
- If only one or two people have access, and have permission on HostGator, perhaps this is acceptable.
- Mellanox Jenkins - some Jenkins testing was failing in MPI_Init; not sure if new MELLANOX Seed.
- Will look into. Server was rebooted, they are doing some maintenance. Perhaps this is causing issues.
- Jeff tagged Artem in PR in last few hours.
- Possible to put a :bot-mellanox-retest: on Mellanox Jenkins
- Artem will try.
- Howard pointed out yesterday: Jeff did a bot-retest of old 2.0.1 PRs because he thought they'd be done serially, but the Mellanox config says it will run 10 in parallel.
- Artem - discuss benchmarks.
- Test blocking versus non-blocking MPI_Send/Recv -
- could run 16 processes per node, and pair processes on two nodes to send back and forth.
- Or could run 1 process per node, and run 16 threads, and do the same thing.
- would expect this to be similar, but in reality, it is very different (16 threads is much worse).
- So this is one of the questions to discuss?
- Not talking about oversubscribing.
- Is each thread pair using a separate communicator or the same one?
- Can do both, no difference.
- Are you preposting the buffers? If not, and everything is using different tags, then the receive list gets quite large (OB1).
- Each thread using different tag, but different messages using the same tag.
- Unclear if Yalla has a way to distinguish based on Comms (just tags?) so possibly no gain in parallelism.
- Artem was thinking about this.
- Artem can include link to sources.
- Would like community to work with Artem on them.
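
The preposting question above can be illustrated with a toy model. This is only a sketch of linear tag matching, not Open MPI's OB1 code: the function name `unexpected_scan_cost` and the idea of counting comparisons are assumptions made for illustration. It assumes every message arrives before its receive is posted (nothing is preposted) and that each posted receive scans the queue of waiting messages from the front.

```python
def unexpected_scan_cost(arrivals, receives):
    """Count tag comparisons in a toy linear-scan matching queue.

    arrivals: tags of messages in arrival order; all are queued first,
              modelling the case where no receive was preposted.
    receives: tags of receives in the order they are posted.
    A receive with no matching message scans the whole queue and is
    then simply dropped in this toy model.
    """
    unexpected = list(arrivals)
    comparisons = 0
    for tag in receives:
        for i, queued in enumerate(unexpected):
            comparisons += 1
            if queued == tag:
                del unexpected[i]  # matched: remove the message
                break
    return comparisons


if __name__ == "__main__":
    threads, per_thread = 16, 64  # hypothetical sizes
    one_tag = [0] * (threads * per_thread)
    # One distinct tag per thread, messages interleaved round-robin.
    many_tags = [t for _ in range(per_thread) for t in range(threads)]
    print("single tag:   ", unexpected_scan_cost(one_tag, one_tag))
    print("distinct tags:", unexpected_scan_cost(many_tags, many_tags[::-1]))
```

With a single tag every receive matches the head of the queue; with distinct tags and an unfavourable posting order, each receive may walk past other threads' messages first, which is the growth the question was probing.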
- can do the fine binding (if you have 16 procs per node).
- We do allow you to bind a process to 4 cores: bind to core, map by core:ppe=4.
- Artem has benchmark do fine binding.
- Doing non-overlapping fine binding based on MPI_Exchanging.
- Can run 4 multi-threaded processes. OSU can only run 1 proc/node.
- Now can reproduce OSU results with 1 process per node.
- Can reproduce ARM's results with OSU, having each thread doing separate Send/Recvs.
- Artem will send out link to his personal public github repo for others to try, and provide PRs against, etc.
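
The process pairing and the non-overlapping fine binding discussed above can be sketched as plain bookkeeping, without MPI. This is not taken from Artem's benchmark: the helper names `pair_across_nodes` and `core_sets` are hypothetical, the rank numbering assumes ranks fill node 0 before node 1, and 16 cores per node is an assumption chosen to match the 16-process case.

```python
def pair_across_nodes(procs_per_node):
    """Pair rank r on node 0 with rank r + procs_per_node on node 1.

    Assumes two nodes and that ranks are placed node by node
    (ranks 0..N-1 on node 0, ranks N..2N-1 on node 1).
    """
    return [(r, r + procs_per_node) for r in range(procs_per_node)]


def core_sets(cores_per_node, procs_per_node):
    """Split a node's cores into equal, non-overlapping contiguous sets,
    one set per process (e.g. 4 processes x 4 cores on 16 cores)."""
    if procs_per_node <= 0 or cores_per_node % procs_per_node != 0:
        raise ValueError("cores must divide evenly among processes")
    width = cores_per_node // procs_per_node
    return [list(range(p * width, (p + 1) * width))
            for p in range(procs_per_node)]


if __name__ == "__main__":
    # 16 single-threaded processes per node, paired across two nodes.
    print(pair_across_nodes(16))
    # 4 multi-threaded processes per node, each bound to 4 cores.
    print(core_sets(16, 4))
```

The same `core_sets` call with `procs_per_node=1` gives the one-process, 16-thread layout, so both configurations from the discussion can be described with one helper.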
- Mellanox
- Artem sent out message rate email.
- Sandia
- Intel
- Mellanox, Sandia, Intel
- LANL, Houston, IBM
- Cisco, ORNL, UTK, NVIDIA