Abstract: When performing stochastic gradient descent (SGD) in a distributed environment, a large number of local gradients must be exchanged over the network, so communication cost becomes a bottleneck of distributed ...