As data and models scale, multiple processing units become necessary for both training and inference. SignSGD is a gradient compression algorithm that transmits only the sign of each stochastic gradient coordinate during distributed training; since a sign occupies a single bit rather than a 32-bit float, it uses 32 times less communication per iteration than distributed SGD. We show that signSGD obtains a free lunch both in theory and in practice: no loss in accuracy while yielding communication speedups. Pushing the current boundaries of deep learning also requires handling data with multiple dimensions and modalities. Such data can be encoded as tensors, which are natural higher-order extensions of matrices. These tensor methods are available in the TensorLy package, which provides multiple backend interfaces for large-scale deep learning.
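
To illustrate the communication pattern, the sketch below runs one distributed signSGD step in which each worker sends only the sign of its stochastic gradient and the server aggregates by a majority vote over those signs. It is a minimal toy example: the quadratic objective, function names, and hyperparameters are ours, not taken from any reference implementation.

```python
import numpy as np

def stochastic_gradient(params, rng):
    # Toy objective ||params||^2 / 2; Gaussian noise stands in for
    # mini-batch stochasticity.
    return params + rng.normal(scale=0.1, size=params.shape)

def signsgd_step(params, num_workers, lr, rng):
    # Each worker computes a local stochastic gradient and transmits
    # only its sign: 1 bit per coordinate instead of a 32-bit float.
    worker_signs = [np.sign(stochastic_gradient(params, rng))
                    for _ in range(num_workers)]
    # The server takes a majority vote and broadcasts back a sign
    # vector, so the downlink is also 1 bit per coordinate.
    vote = np.sign(np.sum(worker_signs, axis=0))
    return params - lr * vote

rng = np.random.default_rng(0)
params = rng.normal(size=10)
for _ in range(100):
    params = signsgd_step(params, num_workers=4, lr=0.01, rng=rng)
print(params)  # parameters shrink toward the optimum at zero
```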
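
To give a flavor of the backend interface, the snippet below switches TensorLy's backend to PyTorch and computes a CP (PARAFAC) decomposition of a random third-order tensor; it assumes TensorLy and PyTorch are installed, and the tensor shape and rank are arbitrary choices for illustration.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

# Switch the computational backend; the same TensorLy code can run on
# top of NumPy, PyTorch, and other deep learning frameworks.
tl.set_backend('pytorch')

# A random 3rd-order tensor, converted to the active backend.
tensor = tl.tensor(np.random.rand(8, 8, 8))

# Rank-3 CP decomposition: the tensor is approximated by a sum of
# rank-one terms, each an outer product of three factor vectors.
weights, factors = parafac(tensor, rank=3)
print([tuple(f.shape) for f in factors])  # three factor matrices of shape (8, 3)
```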