Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training

https://arxiv.org/abs/1712.01887 This paper is from the same team that published Deep Compression. This time they aim to reduce the network traffic generated during the training of distributed deep learning models by compressing the gradient matrix. The key insight of...
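The core mechanism behind the compression is gradient sparsification: each worker transmits only the largest-magnitude gradient entries and accumulates the rest locally, so small gradients are not lost but deferred until they grow large enough to send. The sketch below illustrates this idea under stated assumptions; the function name, the fixed top-k selection, and the omission of the paper's momentum correction and warm-up tricks are simplifications, not the authors' exact implementation.

```python
import numpy as np

def sparsify(grad, residual, sparsity=0.99):
    """Minimal sketch of gradient sparsification with local accumulation.

    Keeps only the largest-magnitude entries of (grad + residual) for
    transmission and carries the remainder forward as the new residual.
    This is an illustrative simplification of Deep Gradient Compression.
    """
    acc = grad + residual                        # add previously unsent gradients
    k = max(1, int(acc.size * (1 - sparsity)))   # number of entries to transmit
    idx = np.argpartition(np.abs(acc).ravel(), -k)[-k:]
    sent = np.zeros(acc.size)
    sent[idx] = acc.ravel()[idx]                 # transmit only the top-k values
    sent = sent.reshape(acc.shape)
    new_residual = acc - sent                    # keep the rest for later rounds
    return sent, new_residual

# With 50% sparsity, only the two largest-magnitude gradients are sent;
# the small ones accumulate locally.
grad = np.array([[0.01, -0.5], [0.3, 0.02]])
residual = np.zeros_like(grad)
sent, residual = sparsify(grad, residual, sparsity=0.5)
```

Because the residual is added back on the next step, the scheme loses no gradient information in expectation; it only delays the small updates, which is why such aggressive sparsity ratios remain trainable.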