This document discusses bag-of-words representations for large-scale visual recognition. These representations quantize local image descriptors into "visual words", producing one sparse feature vector per image, and such vectors can be searched efficiently over large datasets using inverted files. The document also covers techniques that improve search efficiency and accuracy, including very large vocabularies, weak geometric verification, and query expansion.
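The pipeline summarized above (quantize descriptors into visual words, build sparse per-image vectors, search them through an inverted file) can be illustrated with a minimal sketch. It assumes a small flat vocabulary with brute-force nearest-centroid assignment, tf-idf weighting, and cosine-similarity scoring. These choices, and the names `quantize`, `InvertedFile`, and `finalize`, are illustrative assumptions and not the method described in the document. With very large vocabularies, practical systems typically replace the brute-force assignment with an approximate one. The key property shown is that a query only visits the postings lists of the visual words it contains, so its cost depends on the lengths of those lists rather than on the total number of images.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def quantize(descriptors: Sequence[Sequence[float]],
             vocabulary: Sequence[Sequence[float]]) -> List[int]:
    """Hard-assign each descriptor to its nearest visual word (brute force, squared L2)."""
    return [
        min(range(len(vocabulary)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(d, vocabulary[k])))
        for d in descriptors
    ]


class InvertedFile:
    """Hypothetical inverted file: visual word id -> list of (image id, term frequency)."""

    def __init__(self) -> None:
        self.postings: Dict[int, List[Tuple[str, int]]] = defaultdict(list)
        self.num_images = 0
        self.norms: Dict[str, float] = {}

    def add(self, image_id: str, words: Sequence[int]) -> None:
        # Only the non-zero entries of the sparse bag-of-words vector are stored.
        for w, count in Counter(words).items():
            self.postings[w].append((image_id, count))
        self.num_images += 1

    def idf(self, w: int) -> float:
        n = len(self.postings.get(w, ()))  # .get avoids creating empty lists
        return math.log(self.num_images / n) if n else 0.0

    def finalize(self) -> None:
        """Precompute the L2 norm of every database image's tf-idf vector."""
        sq: Dict[str, float] = defaultdict(float)
        for w, plist in self.postings.items():
            idf = self.idf(w)
            for image_id, count in plist:
                sq[image_id] += (count * idf) ** 2
        self.norms = {img: math.sqrt(s) for img, s in sq.items()}

    def query(self, words: Sequence[int], top_k: int = 5) -> List[Tuple[str, float]]:
        q = {w: c * self.idf(w) for w, c in Counter(words).items()}
        q_norm = math.sqrt(sum(v * v for v in q.values()))
        if q_norm == 0:
            return []
        scores: Dict[str, float] = defaultdict(float)
        for w, qw in q.items():
            if qw == 0:
                continue
            idf = self.idf(w)
            # Only images sharing this visual word are touched.
            for image_id, count in self.postings.get(w, ()):
                scores[image_id] += qw * count * idf
        ranked = [(img, s / (q_norm * self.norms[img]))
                  for img, s in scores.items() if self.norms.get(img)]
        ranked.sort(key=lambda item: item[1], reverse=True)
        return ranked[:top_k]


# Toy usage: 2-D "descriptors" and a 4-word vocabulary (made-up illustrative data).
vocab = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
index = InvertedFile()
index.add("a", quantize([(0.5, 0.2), (9.8, 0.1), (0.1, 0.3)], vocab))
index.add("b", quantize([(0.2, 9.9), (10.1, 10.2)], vocab))
index.add("c", quantize([(9.7, 9.9), (10.2, 0.3)], vocab))
index.finalize()

query_words = quantize([(0.3, 0.1), (9.9, 0.4)], vocab)
results = index.query(query_words)
print(query_words, results)
```

Image "b" shares no visual words with the query, so it is never scored; this skipping is what makes inverted-file search scale. Refinements such as weak geometric verification or query expansion would operate on top of this initial ranked list.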