AI at scale requires a confluence of data, algorithms, and cloud infrastructure. Modern deep learning depends on large amounts of training data, so we develop methods that improve data collection, aggregation, and augmentation, drawing on active learning, partial feedback, crowdsourcing, and generative models. We also analyze large-scale machine learning methods for distributed training, and we show that gradient quantization can achieve the best of both worlds: accuracy and communication efficiency. We extend matrix methods to higher dimensions using tensor algebraic techniques and demonstrate superior performance. Finally, at AWS, we are developing robust software frameworks and AI cloud services at all levels of the stack.
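To make the active-learning ingredient concrete, here is a minimal sketch of uncertainty sampling, one standard query strategy: ask for labels on the pool examples about which the current model is least certain. The `predict_proba` callable and `unlabeled_pool` array are illustrative placeholders, not part of any specific system described above.

```python
import numpy as np

def select_queries(predict_proba, unlabeled_pool, batch_size=16):
    """Uncertainty sampling: return indices of the pool examples whose
    predicted class distribution has the highest entropy."""
    probs = predict_proba(unlabeled_pool)               # shape (n, n_classes)
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return np.argsort(entropy)[-batch_size:]            # indices to label next
```

Labeling only these high-entropy examples concentrates annotation effort where the model stands to learn the most, which is the core economy that active learning buys over random sampling.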
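The gradient-quantization claim can be illustrated with a sketch in the spirit of QSGD-style stochastic quantization (one well-known scheme; the function names are illustrative, and this is not presented as our exact method). Each coordinate is rounded to a few discrete levels with probabilities chosen so the quantized gradient remains an unbiased estimate of the true one, which is why accuracy is preserved while communication shrinks.

```python
import numpy as np

def quantize(v, num_levels=4, rng=None):
    """QSGD-style stochastic quantization of a gradient vector.

    Each coordinate is rounded to one of `num_levels` evenly spaced
    magnitudes; the rounding probabilities make the result an unbiased
    estimate of `v`, so averaged quantized gradients still converge.
    """
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return np.zeros(v.shape, dtype=np.int8), 0.0
    scaled = np.abs(v) / norm * num_levels              # values in [0, num_levels]
    lower = np.floor(scaled)
    # Round up with probability equal to the fractional part (unbiased).
    levels = lower + (rng.random(v.shape) < scaled - lower)
    return (np.sign(v) * levels).astype(np.int8), norm

def dequantize(levels, norm, num_levels=4):
    """Reconstruct the unbiased gradient estimate from its compressed form."""
    return levels.astype(np.float64) * norm / num_levels
```

Each worker then ships int8 levels plus a single float norm instead of full float32 gradients, cutting per-coordinate communication from 32 bits to a few bits while the server averages the dequantized estimates as usual.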
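As one canonical example of extending matrix methods to higher dimensions, the sketch below computes a CP decomposition of a 3-way tensor by alternating least squares, the tensor analogue of low-rank matrix factorization. This is a textbook algorithm under simplifying assumptions (dense NumPy arrays, fixed iteration count), not our specific technique.

```python
import numpy as np

def unfold(T, mode):
    """Flatten a 3-way tensor into a matrix along the given mode."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(A, B):
    """Column-wise Kronecker product of two factor matrices."""
    r = A.shape[1]
    return (A[:, None, :] * B[None, :, :]).reshape(-1, r)

def cp_als(T, rank, n_iters=50, seed=0):
    """Rank-`rank` CP decomposition of a 3-way tensor via alternating
    least squares: fix two factor matrices, solve for the third."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in T.shape]
    for _ in range(n_iters):
        for mode in range(3):
            others = [f for m, f in enumerate(factors) if m != mode]
            kr = khatri_rao(others[0], others[1])
            # Least-squares update: unfold(T, mode) ~= factors[mode] @ kr.T
            factors[mode] = unfold(T, mode) @ np.linalg.pinv(kr).T
    return factors

# Example: recover the factors of a synthetic rank-3 tensor.
rng = np.random.default_rng(1)
A, B, C = (rng.standard_normal((d, 3)) for d in (10, 11, 12))
T = np.einsum('ir,jr,kr->ijk', A, B, C)
A_hat, B_hat, C_hat = cp_als(T, rank=3)
```

Where a matrix factorization separates two modes of variation, the CP factors separate all three at once, which is what lets tensor methods recover structure that matrix methods provably cannot.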