This document discusses building a semantic image search engine through deep representation learning. It describes using a pre-trained convolutional neural network to extract image embeddings, which are then used to build an index for fast proximity search of similar images. The model is further trained to predict word embeddings for images, allowing image search to be guided by text queries beyond the initial training classes. This provides a way to build a joint image-text embedding space and enable generalized image search using only a small dataset for training.