Machine learning methods are increasingly used for computer vision tasks such as object recognition. Bag-of-words models represent images as collections of local features, or "visual words", and classify images based on their visual content. These models first detect local features, then cluster their descriptors into a codebook of visual words; each image is subsequently represented as a histogram of visual-word frequencies. Naive Bayes classifiers and hierarchical Bayesian models such as pLSA and LDA can then be applied to these representations for classification or topic modeling. While intuitive and computationally efficient, bag-of-words models discard explicit geometric information about object parts and have not been extensively tested for viewpoint and scale invariance.
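The pipeline described above can be sketched in a few lines. This is a minimal, self-contained illustration using NumPy with synthetic descriptors standing in for real local features (e.g. SIFT vectors); the toy k-means routine and the codebook size `k=10` are assumptions for demonstration, not part of any particular published system.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 8-D "local feature descriptors" standing in for real ones (e.g. SIFT).
descriptors = rng.normal(size=(200, 8))

def build_codebook(feats, k=10, iters=20):
    """Cluster descriptors into k 'visual words' with a tiny k-means loop."""
    centers = feats[rng.choice(len(feats), size=k, replace=False)]
    for _ in range(iters):
        # Assign each descriptor to its nearest center.
        dists = np.linalg.norm(feats[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its assigned descriptors.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = feats[labels == j].mean(axis=0)
    return centers

def bow_histogram(feats, centers):
    """Represent one image as a histogram of visual-word frequencies."""
    dists = np.linalg.norm(feats[:, None, :] - centers[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    return np.bincount(words, minlength=len(centers))

codebook = build_codebook(descriptors, k=10)
hist = bow_histogram(descriptors, codebook)
# hist is the fixed-length image representation fed to a classifier
# (e.g. Naive Bayes) or a topic model (pLSA, LDA).
print(hist.sum())  # total count equals the number of local features (200)
```

Note that the histogram deliberately discards where each feature occurred in the image, which is exactly the loss of geometric information the paragraph above points out.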