This document discusses visual search and recognition, specifically large-scale instance search. It outlines how to detect the same object across different images despite changes in scale, viewpoint, lighting, and occlusion. The key steps are covariant feature detection, which finds corresponding regions in each image, and the computation of invariant descriptors such as SIFT, which allow those regions to be matched between images. The goal is to cast object recognition as nearest-neighbor matching or text retrieval, so that search remains efficient over very large datasets.
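
To make the nearest-neighbor view of matching concrete, here is a minimal Python sketch, not the document's own implementation. It assumes descriptors are already available as plain lists of floats (for example, 128-dimensional SIFT vectors; the toy vectors below are 4-dimensional and invented for illustration). The function name `match_descriptors`, the parameter `max_ratio`, and the ratio check used to reject ambiguous matches are assumptions added here and are not taken from the source. The search is brute force, so it only illustrates the idea; a large-scale system would need an approximate or index-based search instead.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_descriptors(query, database, max_ratio=0.8):
    """Match each query descriptor to its nearest database descriptor.

    Returns a list of (query_index, database_index) pairs.
    A match is kept only if the nearest distance is at most `max_ratio`
    times the second-nearest distance (an assumed ambiguity filter,
    not something the source document specifies).
    """
    matches = []
    for qi, q in enumerate(query):
        # Brute-force search: rank every database descriptor by distance.
        ranked = sorted((euclidean(q, d), di) for di, d in enumerate(database))
        if not ranked:
            continue
        best_dist, best_idx = ranked[0]
        if len(ranked) > 1:
            second_dist = ranked[1][0]
            # Reject matches whose best and second-best are too similar.
            if best_dist > max_ratio * second_dist:
                continue
        matches.append((qi, best_idx))
    return matches


# Toy descriptors (hypothetical values, for illustration only).
query = [
    [0.0, 1.0, 0.0, 0.0],   # close to database[0]
    [0.5, 0.5, 0.5, 0.5],   # roughly equidistant from everything
]
database = [
    [0.1, 0.9, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]

matches = match_descriptors(query, database)
print(matches)
```

In this toy example the first query descriptor has one clearly closest neighbor and is matched, while the second is nearly equidistant from all database descriptors and is discarded as ambiguous.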