This document summarizes research comparing convolutional neural network (CNN) architectures and feature representations on common image classification tasks. The main finding is that CNN-based methods outperform traditional bag-of-words models. The study compares several pre-trained CNNs, examines the effect of data augmentation, and shows that fine-tuning a network on the target dataset improves performance. The best results come from smaller filters, deeper networks, and fine-tuning with a ranking loss, a combination that outperforms more complex architectures. Code and models are available online so that others can replicate the findings.
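
The summary does not say how the ranking loss used for fine-tuning is defined. As a rough illustration of the general idea only, the sketch below computes a pairwise hinge ranking loss for a single class: each image that contains the class should score higher than each image that does not, by at least a margin. The function name `pairwise_ranking_hinge_loss`, the `margin` default of 1.0, and the averaging over pairs are assumptions made for this example, not details taken from the research.

```python
from typing import Sequence


def pairwise_ranking_hinge_loss(
    scores: Sequence[float],
    labels: Sequence[int],
    margin: float = 1.0,  # assumed value, chosen for illustration
) -> float:
    """Mean hinge penalty over all (positive, negative) image pairs for one class.

    scores: classifier scores for one class, one per image.
    labels: 1 if the image contains the class, 0 otherwise.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        return 0.0  # no (positive, negative) pairs to rank

    total = 0.0
    for p in positives:
        for n in negatives:
            # A pair is penalised unless the positive outscores
            # the negative by at least `margin`.
            total += max(0.0, margin - (p - n))
    return total / (len(positives) * len(negatives))


if __name__ == "__main__":
    example_scores = [2.0, 0.5, -1.0, 0.0]
    example_labels = [1, 1, 0, 0]
    print(pairwise_ranking_hinge_loss(example_scores, example_labels))
```

A loss of this shape depends only on the relative order of the scores, which is why it suits tasks evaluated by ranking quality rather than by top-1 accuracy. In a real fine-tuning setup it would be written in a framework with automatic differentiation instead of plain Python.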