Transformers now achieve state-of-the-art results on computer vision tasks, outperforming the convolutional neural networks that long dominated the field. Rather than relying on the local connectivity of CNNs, transformers use attention mechanisms to incorporate information from across the entire representation. Recent work has applied transformers directly to vision problems and found that they can match or exceed CNN accuracy, and researchers are also exploring hybrids of transformers and convolutions to combine the strengths of both approaches. As applications demand more complex visual understanding, attention-based models will remain important for vision.
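The global-versus-local distinction above can be made concrete with a minimal sketch of single-head scaled dot-product self-attention, the core operation in a transformer. This is an illustrative toy with random matrices standing in for learned weights (the input shape and dimensions are arbitrary assumptions, not from any particular model): every output position is a weighted mixture of all input positions, in contrast to a convolution, which mixes only a fixed local neighborhood.

```python
import numpy as np

def self_attention(x, seed=0):
    """Minimal single-head self-attention: each position attends to all others."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    # Random projections stand in for learned query/key/value weights.
    Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)  # (n, n): pairwise scores over ALL positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over all positions
    return weights @ v  # each output row mixes information from every input row

# 16 "patch tokens" of dimension 8, as a vision transformer might produce
x = np.random.default_rng(1).normal(size=(16, 8))
out = self_attention(x)
print(out.shape)  # (16, 8): same shape as input, but globally mixed
```

Note that the `(n, n)` score matrix is what gives attention its global receptive field in a single layer, at quadratic cost in the number of tokens; a CNN would need many stacked layers to achieve a comparable receptive field.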