This document summarizes a research paper on near-duplicate video retrieval using features extracted from intermediate layers of convolutional neural networks. The researchers extract features from multiple layers of pretrained CNNs such as AlexNet, VGGNet, and GoogLeNet. They aggregate the features with two schemes: vector-based aggregation, which concatenates the per-layer features into a single descriptor, and layer-based aggregation, which averages features within each layer, producing one descriptor per layer. The aggregated representations are indexed and used to retrieve near-duplicate videos from a dataset. The approach outperforms previous methods on standard evaluation metrics, achieving a mean average precision of up to 0.81. The researchers also discuss extending the work to 3D CNNs and evaluating on larger, more challenging datasets.
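
The two aggregation schemes described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the spatial max-pooling step, the array shapes, and the function names (`pool_layer`, `vector_aggregation`, `layer_aggregation`) are assumptions for demonstration, with each CNN layer's activations represented as a NumPy array of shape (channels, height, width).

```python
import numpy as np

def pool_layer(activations):
    # Reduce one layer's activation tensor (C, H, W) to a C-dim vector
    # by max-pooling each channel's spatial map (an assumed pooling choice).
    return activations.reshape(activations.shape[0], -1).max(axis=1)

def vector_aggregation(per_layer_activations):
    # Vector-based aggregation: concatenate the pooled vectors of all
    # layers into a single global descriptor.
    return np.concatenate([pool_layer(a) for a in per_layer_activations])

def layer_aggregation(frames_per_layer):
    # Layer-based aggregation: average frame-level vectors within each
    # layer, keeping one descriptor per layer.
    return [np.mean([pool_layer(f) for f in frames], axis=0)
            for frames in frames_per_layer]
```

For example, pooled features from a 64-channel layer and a 128-channel layer yield a 192-dimensional vector under vector-based aggregation, while layer-based aggregation keeps the two layers as separate 64- and 128-dimensional descriptors.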