This document provides an overview of image captioning with attention models. It frames captioning as two linked tasks: understanding the content of an image and generating a sentence that describes it. It mentions real-world applications such as image search and aids for visually impaired people. It also covers the history and evolution of image captioning, including early work that mapped images to sentences. Finally, it reviews sequence-to-sequence learning, the Show and Tell model, the attention-based Show, Attend and Tell model, and beam search decoding, and it links to a PyTorch implementation.
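Caption decoders such as Show and Tell and Show, Attend and Tell emit one word at a time, and beam search is a common way to choose the final sentence. Greedy decoding keeps only the single most likely word at each step. Beam search instead keeps the `beam_width` highest-scoring partial captions, ranked by cumulative log-probability. The following is a minimal, framework-free sketch that assumes a hypothetical `step` function standing in for a trained decoder. It is not taken from the linked PyTorch code. The toy probability table is invented purely for illustration, and the sketch omits refinements such as length normalization.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical decoder interface (an assumption, not a real library API):
# given the token prefix so far, return {next_token: probability}.
StepFn = Callable[[Tuple[str, ...]], Dict[str, float]]


def beam_search(step: StepFn, start: str, end: str,
                beam_width: int = 3, max_len: int = 20
                ) -> List[Tuple[Tuple[str, ...], float]]:
    """Return hypotheses as (tokens, log_prob), best first."""
    beams = [((start,), 0.0)]  # (token prefix, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = []
        for tokens, score in beams:
            for tok, p in step(tokens).items():
                if p > 0:
                    candidates.append((tokens + (tok,), score + math.log(p)))
        if not candidates:
            break
        # Keep only the top `beam_width` expansions across all hypotheses.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            # Hypotheses that emit the end token leave the beam, so it can shrink.
            if tokens[-1] == end:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # unfinished hypotheses if max_len was reached
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished


# Hypothetical toy "decoder": a fixed table of next-word probabilities.
TOY_MODEL = {
    ("<s>",): {"a": 0.6, "the": 0.4},
    ("<s>", "a"): {"dog": 0.4, "cat": 0.3, "man": 0.3},
    ("<s>", "the"): {"dog": 0.9, "cat": 0.1},
}


def toy_step(prefix: Tuple[str, ...]) -> Dict[str, float]:
    # Any prefix not in the table ends the caption.
    return TOY_MODEL.get(prefix, {"</s>": 1.0})


if __name__ == "__main__":
    best_tokens, best_score = beam_search(toy_step, "<s>", "</s>", beam_width=2)[0]
    print(" ".join(best_tokens), round(math.exp(best_score), 2))
```

With `beam_width=1` the search reduces to greedy decoding. In this toy table, greedy decoding commits to "a" (0.6) and ends with "a dog" (probability 0.24). A width of 2 keeps "the" alive and finds "the dog" (0.9 × 0.4 = 0.36), which is the situation where beam search pays off.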