One of the fundamental challenges in automatically detecting and localizing objects in images is the need to collect a large number of example images with annotated object locations (bounding boxes). The introduction of detection challenge datasets has propelled progress by providing the research community with enough fully annotated images to train competitive detectors for 20-200 classes. However, as we look toward the goal of scaling our systems to human-level category detection, it becomes impractical to collect a large quantity of bounding box labels for tens or even hundreds of thousands of categories.
In this talk I will discuss recent work on enabling the training of detectors with weakly annotated images, i.e., images that are known to contain the object but with unknown object location (bounding box). The first approach I will present proposes a new multiple instance learning (MIL) method for object detection that is capable of handling noisy, automatically obtained annotations. Our approach consists of first obtaining confidence estimates over the label space and second incorporating these estimates within a new boosting procedure. We demonstrate the efficiency of our procedure on two detection tasks, namely horse detection and pedestrian detection, where the training data is primarily annotated by a coarse area-of-interest detector, and show substantial improvements over existing MIL methods.
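To make the two-step idea concrete, here is a minimal toy sketch (not the talk's actual algorithm) of how per-example confidence estimates can be folded into a boosting procedure: a plain AdaBoost loop over decision stumps in which the initial sample weights are scaled by label-confidence scores, so that noisily annotated examples exert less influence on the weak learners. All function names and the stump learner are illustrative assumptions.

```python
import numpy as np

def stump_predict(X, feat, thresh, sign):
    # weak learner: threshold one feature, output in {-1, +1}
    return sign * np.where(X[:, feat] > thresh, 1, -1)

def fit_stump(X, y, w):
    # exhaustive search over (feature, threshold, polarity)
    # minimizing the weighted classification error
    best = None
    for feat in range(X.shape[1]):
        for thresh in np.unique(X[:, feat]):
            for sign in (1, -1):
                pred = stump_predict(X, feat, thresh, sign)
                err = np.sum(w[pred != y])
                if best is None or err < best[0]:
                    best = (err, feat, thresh, sign)
    return best

def confidence_weighted_boost(X, y, conf, n_rounds=10):
    """Toy sketch: AdaBoost whose initial sample weights are scaled by
    per-example label-confidence estimates, so low-confidence (likely
    noisy) annotations matter less. Illustrative only."""
    w = conf / conf.sum()              # confidence-scaled initial weights
    ensemble = []
    for _ in range(n_rounds):
        err, feat, thresh, sign = fit_stump(X, y, w)
        err = max(err, 1e-10)
        if err >= 0.5:                 # weak learner no better than chance
            break
        alpha = 0.5 * np.log((1 - err) / err)
        pred = stump_predict(X, feat, thresh, sign)
        w = w * np.exp(-alpha * y * pred)  # standard AdaBoost reweighting
        w /= w.sum()
        ensemble.append((alpha, feat, thresh, sign))
    return ensemble

def boost_predict(ensemble, X):
    score = np.zeros(len(X))
    for alpha, feat, thresh, sign in ensemble:
        score += alpha * stump_predict(X, feat, thresh, sign)
    return np.sign(score)
```

In this simplified form the confidence estimates only set the starting weight distribution; the actual method integrates them into the boosting objective itself.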
I will also present a second, complementary approach: a domain adaptation algorithm which learns the difference between the classification task and the detection task, and transfers this knowledge to classifiers for categories without bounding box annotated data, adapting them into detectors. Our method has the potential to enable detection for the tens of thousands of categories that lack bounding box annotations, yet have plenty of classification data in ImageNet. The approach is evaluated on the ImageNet ILSVRC-2013 detection challenge.
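The core transfer idea can be caricatured in a few lines. The sketch below is an illustrative assumption, not the authors' algorithm: for categories that have both a trained classifier and a trained detector, measure the average change that detection training induces in the model weights, then apply that change to the classifiers of categories lacking bounding box data.

```python
import numpy as np

def adapt_classifiers_to_detectors(W_cls, W_det_known, known_idx):
    """Toy sketch of classifier-to-detector adaptation.

    W_cls:       (n_categories, d) classifier weight vectors, all categories
    W_det_known: (k, d) detector weights for the k categories with boxes
    known_idx:   indices (length k) of those categories in W_cls

    Learns the mean classifier-to-detector weight shift on the known
    categories and applies it to every other category's classifier.
    Illustrative placeholder for the learned transformation.
    """
    delta = (W_det_known - W_cls[known_idx]).mean(axis=0)
    W_det_all = W_cls + delta            # adapt every classifier
    W_det_all[known_idx] = W_det_known   # keep directly trained detectors
    return W_det_all
```

The real algorithm learns a richer transformation than a single mean offset, but the sketch captures the premise: detection supervision for a few categories informs detectors for many.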