This document discusses attention mechanisms in deep learning models. It covers attention in sequence models like recurrent neural networks (RNNs) and neural machine translation. It also discusses attention in convolutional neural network (CNN) based models, including spatial transformer networks which allow spatial transformations of feature maps. The document notes that spatial transformer networks have achieved state-of-the-art results on image classification tasks and fine-grained visual recognition. It provides an overview of the localisation network, parameterised sampling grid, and differentiable image sampling components of spatial transformer networks.