This document presents a project on image caption generation using deep learning and natural language processing. It discusses using a convolutional neural network to extract features from images and a long short term memory network to generate captions by predicting words from the extracted features. The objectives are to describe image contents, showcase LSTM effectiveness, and create a working model. It proposes using CNN, RNN and LSTM with Flickr datasets. Literature on existing approaches and references are provided.