This document discusses using object detection models such as Faster R-CNN and Mask R-CNN to extract embeddings from images for image retrieval. It proposes a student-teacher training paradigm in which the object detection model is the student: through knowledge distillation, it learns to map its output feature map into the feature space of a classification teacher model, which yields more semantically meaningful embeddings. The goal is efficient image retrieval based on these object-level embeddings.
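
The distillation and retrieval steps described above can be illustrated with a minimal pure-Python sketch. It rests on several assumptions that are not stated in the source: the student's feature map is taken to be already pooled into a single vector, the mapping into the teacher's space is a plain linear projection, the distillation loss is mean squared error, and retrieval ranks by cosine similarity. All names (`project`, `distillation_loss`, `retrieve`) and all numbers are hypothetical and chosen only for illustration.

```python
import math

def project(student_feat, weights):
    """Linearly map a pooled student feature into the teacher's embedding space.
    (Assumption: the source does not specify the form of the mapping.)"""
    return [sum(w * x for w, x in zip(row, student_feat)) for row in weights]

def distillation_loss(student_emb, teacher_emb):
    """Mean squared error between student and teacher embeddings.
    (Assumption: MSE stands in for whatever loss the document actually uses.)"""
    return sum((s - t) ** 2 for s, t in zip(student_emb, teacher_emb)) / len(teacher_emb)

def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, gallery, k=1):
    """Return the indices of the k gallery embeddings most similar to the query."""
    ranked = sorted(range(len(gallery)),
                    key=lambda i: cosine(query, gallery[i]),
                    reverse=True)
    return ranked[:k]

# Toy example with made-up numbers.
student_feat = [1.0, 2.0, 3.0]                 # pooled detector feature (3-d)
weights = [[1.0, 0.0, 0.0],                    # hypothetical 2x3 projection
           [0.0, 1.0, 1.0]]
teacher_emb = [1.0, 4.0]                       # teacher embedding for the same object

student_emb = project(student_feat, weights)   # student feature in teacher space
loss = distillation_loss(student_emb, teacher_emb)

gallery = [[-1.0, 0.0], [1.0, 5.0], [0.0, -1.0]]  # stored object-level embeddings
top = retrieve(student_emb, gallery, k=1)
```

In training, the projection weights (and the detector itself) would be updated to reduce `loss`. At query time only `project` and `retrieve` would be needed, so the teacher model is not required for retrieval.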