- Fix compatibility issue with TF2 + standalone Keras
- Add support for `tensorflow.keras`
- Improve robustness of broadcast
- Add DistributedDataParallel module for PyTorch
- Fix a problem where different CPU tensors used the same name
- Add a `skip_synchronize` API for PyTorch
- Add the option for lazy/non-lazy init
- Significantly improve RDMA performance by enforcing page-aligned memory.
- Add IPC support for RDMA. Servers and workers can now be colocated without sacrificing much performance.
- Fix a hanging bug in the BytePS server.
- Fix an RDMA-related segmentation fault that occurred during fork() (e.g., as used by the PyTorch data loader).
- New feature: Enable mixed use of colocated and non-colocated servers, along with a smart tensor allocation strategy.
- New feature: Add `bpslaunch` as the command to launch tasks.
- Add support for pip install: `pip3 install byteps`
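The install and launch items above can be combined into a minimal usage sketch. This assumes a machine with a compatible compiler and CUDA/NCCL toolchain, and `train.py` is a placeholder for your own training script:

```shell
# Install BytePS from PyPI (builds native extensions, so a compatible
# compiler and CUDA/NCCL toolchain must be present on the machine)
pip3 install byteps

# Launch a training task through the bpslaunch wrapper; it reads the
# usual BytePS/PS-Lite environment variables (DMLC_ROLE, DMLC_NUM_WORKER,
# etc.) to determine this process's role in the job.
# train.py is illustrative, not part of the release.
bpslaunch python3 train.py
```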
- First official release.