This document provides a review of image question answering, which involves understanding visual elements in an image and common-sense knowledge to provide responses to open-ended questions about the image. It discusses approaches that map images and questions to a common feature space, along with datasets used to train and evaluate these systems. Several existing datasets for image question answering are described and compared, including DAQUAR, COCO-QA, VQA, and FM-IQA. Algorithms for image question answering extract features from the image and question and combine them to generate an answer, and baseline models are used as starting points for evaluation.