Tagged: distributed training


Beyond Data and Model Parallelism for Deep Neural Networks

https://arxiv.org/abs/1807.05358 Existing deep learning frameworks use either data or model parallelism as their distribution strategy. However, these strategies often produce sub-optimal performance for many models. In data-parallel distribution, the computation graph is partitioned along the data sample dimension....
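To make the data-parallel case concrete, here is a minimal NumPy sketch (my own illustration, not from the paper): the model parameters are replicated on every "device", the batch is split along the sample dimension, and the per-shard gradients are averaged, which matches the gradient computed on the full batch.

```python
import numpy as np

# Toy linear model y = x @ w. Data parallelism replicates w on every
# "device" and partitions the batch along the sample dimension.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 2))          # replicated parameters
x = rng.normal(size=(8, 4))          # global batch of 8 samples
y = rng.normal(size=(8, 2))          # targets

num_devices = 2
x_shards = np.split(x, num_devices)  # partition on the sample dimension
y_shards = np.split(y, num_devices)

# Each device computes the mean-squared-error gradient on its own shard.
local_grads = [
    2 * xs.T @ (xs @ w - ys) / len(xs)
    for xs, ys in zip(x_shards, y_shards)
]

# Averaging the shard gradients (an all-reduce in a real framework)
# keeps the replicas in sync.
grad = sum(local_grads) / num_devices

# Sanity check: the averaged gradient equals the full-batch gradient.
full_grad = 2 * x.T @ (x @ w - y) / len(x)
assert np.allclose(grad, full_grad)
```

Model parallelism, by contrast, would partition `w` itself across devices; the paper's point is that neither axis alone is optimal for all operators.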