ColossalAI vs. Fairscale #1582
-
Colossal-AI is native to PyTorch as well; it is just not maintained by Meta. Compared with FairScale, our ZeRO and offloading modules outperform their FairScale counterparts. This is because we build our system from the tensor layer up: we created a new tensor called ColoTensor (compatible with torch.Tensor) and designed it so that it supports distributed training natively. Besides that, we also support tensor and pipeline parallelism, and one exciting upcoming feature is auto parallelism, which automates all of these parallelisms. As for mixed-precision training, the ZeRO module in ColossalAI already uses FP16 internally with some modifications, so it does not need a separate fp16/AMP configuration.
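To make the tensor-layer idea concrete, here is a minimal sketch (not ColossalAI's actual ColoTensor code; the class name and the `dist_spec` field are made up for illustration) of how a `torch.Tensor` subclass can stay compatible with every existing torch op while carrying a distributed layout description:

```python
import torch

class ColoTensorSketch(torch.Tensor):
    """Hypothetical illustration of the ColoTensor idea, NOT the real
    implementation: a torch.Tensor subclass that every torch op accepts,
    extended with a description of how the tensor is laid out across devices."""

    @staticmethod
    def __new__(cls, data, dist_spec=None):
        t = torch.Tensor._make_subclass(cls, data)
        t.dist_spec = dist_spec  # e.g. {"shard_dim": 0} plus a process group
        return t

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        # Default dispatch keeps plain torch.Tensor semantics; a real system
        # would reroute ops to sharded / communication-aware kernels here.
        return super().__torch_function__(func, types, args, kwargs or {})

x = ColoTensorSketch(torch.randn(4, 4), dist_spec={"shard_dim": 0})
y = x @ x.t()  # ordinary torch ops keep working on the subclass
print(type(y).__name__, x.dist_spec)
```

Because the subclass is accepted everywhere a `torch.Tensor` is, existing model code does not have to change; the layout metadata is what lets the runtime decide how to shard and communicate.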
-
Hi, Happy Mid-Autumn Festival~
I saw an impressive figure of the GitHub star history for ColossalAI, Megatron, and Fairscale. There must be good reasons why ColossalAI is gaining popularity with such strong momentum that it has surpassed Fairscale so sharply. How would you compare ColossalAI with Fairscale, given that Fairscale (FSDP) is native/friendly to PyTorch and also offers efficient memory management, data parallelism, 1-D model/pipeline parallelism with FSDP, and model offloading to CPU?
Other than NVMe offloading and the Zero Redundancy Optimizer (ZeRO) with chunk-based memory management, does Colossal-AI have more distinctive features that beat Fairscale? How much does ZeRO influence performance compared to Fairscale, which does not have this feature?
Also, in terms of FP16 mixed-precision support, how would you compare ZeRO, ColossalAI AMP, and Fairscale, given that the fp16 and zero configurations are not compatible with each other, and pipeline parallelism currently only supports ColossalAI's naive AMP?
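For concreteness, the FairScale recipe I have in mind looks roughly like the sketch below (my assumptions: a single-node multi-GPU launch via torchrun; `ShardedDDP` + `OSS` provide FairScale's ZeRO-2-style gradient/optimizer-state sharding, and `ShardedGradScaler` is its AMP loss scaler for sharded optimizers):

```python
# Launch with e.g.: torchrun --nproc_per_node=2 train.py
import os

import torch
import torch.distributed as dist
from fairscale.nn.data_parallel import ShardedDataParallel as ShardedDDP
from fairscale.optim.grad_scaler import ShardedGradScaler
from fairscale.optim.oss import OSS

dist.init_process_group("nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

model = torch.nn.Linear(1024, 1024).cuda()
# OSS shards optimizer state across ranks (ZeRO-1); ShardedDDP then
# reduce-scatters gradients so each rank only keeps its own shard (ZeRO-2).
optimizer = OSS(model.parameters(), optim=torch.optim.Adam, lr=1e-3)
model = ShardedDDP(model, optimizer)
scaler = ShardedGradScaler()  # AMP loss scaler aware of sharded gradients

x = torch.randn(8, 1024, device="cuda")
with torch.cuda.amp.autocast():  # FP16 mixed-precision forward
    loss = model(x).float().pow(2).mean()
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```

How does ColossalAI's ZeRO with built-in FP16 compare against this kind of setup?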