Learning Object Detectors From Weakly Supervised Image Data

One of the fundamental challenges in automatically detecting and localizing objects in images is the need to collect a large number of example images with annotated object locations (bounding boxes). The introduction of detection challenge datasets has propelled progress by providing the research community with enough fully annotated images to train competitive detectors for 20-200 classes. However, as we look towards the goal of scaling our systems to human-level category detection, it becomes impractical to collect large quantities of bounding box labels for tens, or even hundreds, of thousands of categories.

In this talk I will discuss recent work on enabling the training of detectors with weakly annotated images, i.e. images that are known to contain the object but whose object locations (bounding boxes) are unknown. The first approach I will present proposes a new multiple instance learning (MIL) method for object detection that is capable of handling noisy, automatically obtained annotations. Our approach consists of first obtaining confidence estimates over the label space and then incorporating these estimates into a new boosting procedure. We demonstrate the effectiveness of our procedure on two detection tasks, namely horse detection and pedestrian detection, where the training data is primarily annotated by a coarse area-of-interest detector, and show substantial improvements over existing MIL methods.

I will also present a second, complementary approach: a domain adaptation algorithm which learns the difference between the classification task and the detection task, and transfers this knowledge to classifiers for categories without bounding-box-annotated data, adapting them into detectors. Our method has the potential to enable detection for the tens of thousands of categories that lack bounding box annotations yet have plenty of classification data in ImageNet. The approach is evaluated on the ImageNet LSVRC-2013 detection challenge.

Transcript:

  1. COMPUTER VISION: LEARNING TO DETECT OBJECTS (title slide in Russian) Kate Saenko, University of Massachusetts, Lowell
  2. COMPUTER VISION: LEARNING TO DETECT OBJECTS Kate Saenko, University of Massachusetts, Lowell
  3. What is computer vision?
  4. Computer Vision: we're not quite there yet, but… (clips from Terminator 2 and Enemy of the State, from the UCSD "Fact or Fiction" DVD)
  5. Machine Learning: What is it? Program a computer to learn from experience; learn from "big data"
  6. Machine Learning in practice
  7. Machine learning is not perfect
  8. Machine learning is not perfect
  9. Personal photo albums: lots of image data available!
  10. Data for computer vision
  11. What are applications of computer vision?
  12. Computer Vision: Surveillance and Security
  13. Smart cars: Mobileye vision systems, currently in high-end BMW, GM, and Volvo models; by 2010, 70% of car manufacturers. Slide content courtesy of Amnon Shashua
  14. Scientific Images
  15. Medical Imaging: image-guided surgery (Grimson et al., MIT); 3D imaging (MRI, CT). Slide by S. Seitz
  16. Vision for Robotics: http://www.robocup.org/ ; NASA's Mars Spirit Rover (http://en.wikipedia.org/wiki/Spirit_rover). Slide by S. Seitz
  17. Object Detection: Face Detection. Viola and Jones, Robust object detection using a boosted cascade of simple features, CVPR 2001
  18. What is object detection?
  19. Goal of object detection: Detect: PERSON
  20. Why is object detection difficult?
  21. Why is object detection difficult? Can you detect all objects in this image?
  22. Easy to collect data on the web!
  23. Difficult to label image annotations: image-level tags (e.g. "dog", "apple") are easy to get from a search engine, but bounding boxes are much more difficult and costly to label
  24. Goal of this research: learn from weakly labeled data!
  25. How well can we do without bounding box labels? Computer detecting pedestrians
  26. How well can we do without bounding box labels? Computer detecting 7,000 object categories
  27. Confidence-Rated Multiple Instance Boosting for Detection. Joint work with Karim Ali
  28. Motivation: object detection with high accuracy requires large labeled data sets. Scalability means reducing annotation requirements, via semi-supervised learning, active learning, or multiple-instance learning
  29. Overview: CR-MILBoost
  30. Multiple instance learning with noise: MI learning cannot handle noisy bags
  31. Outline: Reminder: What is MIL? CR-MILBoost (CVPR'14); Conclusion & Future Work; Discussion
  32. Reminder: What is MIL? In supervised learning, each instance has an associated label. MIL is weaker supervision: examples come in bags, and each bag has a label. Negative bag: all instances in the bag are negative. Positive bag: at least one instance in the bag is positive
  33. Supervised vs MIL (binary). Supervised learning: $(x_i, y_i) \in \mathbb{R}^D \times \{-1,1\}$; the classifier satisfies $\varphi(x) > 0$ if $y = +1$ and $\varphi(x) < 0$ if $y = -1$, with $\varphi^* = \arg\min_\varphi L(\varphi; x, y)$. MI learning: $(X_i = \{x_{i1}, \ldots, x_{iK}\}, y_i) \in (\mathbb{R}^D)^K \times \{-1,1\}$; the classifier satisfies $\max_j \varphi(x_{ij}) > 0$ if $y_i = +1$ and $\max_j \varphi(x_{ij}) < 0$ if $y_i = -1$, with $\varphi^* = \arg\min_\varphi L(\varphi; X, y)$
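
The bag-level decision rule on slide 33 reduces to a max over per-instance scores. A minimal sketch (the instance scorer `phi` is a placeholder for any real-valued classifier, not code from the talk):

    def supervised_predict(phi, x):
        # Supervised: one instance, one label; the sign of the score is the prediction.
        return 1 if phi(x) > 0 else -1

    def mil_bag_predict(phi, bag):
        # MIL: a bag is positive iff its highest-scoring instance scores positive,
        # i.e. the bag score is max_j phi(x_ij).
        return 1 if max(phi(x) for x in bag) > 0 else -1
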
  34. Related methods: how to estimate latent labels for positives. Gärtner, ICML'02: $X_i = \frac{1}{N}\sum_j x_{ij}$. Xu, ICML'04: $\varphi(X_i) = \frac{1}{N}\sum_j \varphi(x_{ij})$. Andrews, NIPS'03: $\varphi(X_i) = \max_j \varphi(x_{ij})$. Bunescu, ICML'07: SVM constraints. Viola, NIPS'07: $p_i = 1 - \prod_j (1 - p_{ij})$. (Methods span the range from fully supervised to MIL)
  35. CR-MILBoost: recall MILBoost, which solves $\varphi^*(x) = \arg\max_\varphi \prod_i p_i^{t_i} (1 - p_i)^{1-t_i}$, with instance probabilities $p_{ij} = \frac{1}{1 + e^{-\varphi(x_{ij})}}$ and noisy-OR bag probabilities $p_i = 1 - \prod_j (1 - p_{ij})$
  36. CR-MILBoost: MILBoost optimizes this likelihood by boosting, $\varphi(x) = \sum_k \alpha_k h_k(x)$, with per-instance weights $w_{ij} = \frac{y_i - p_i}{p_i} p_{ij}$
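
A sketch of one boosting round's weight computation under the definitions on slides 35-36 (assuming bag labels t_i in {0,1}, matching the likelihood; the slide's y_i plays this role):

    import numpy as np

    def milboost_weights(scores, t_i):
        # scores: current boosted scores phi(x_ij) for all instances in bag i.
        p_ij = 1.0 / (1.0 + np.exp(-scores))   # instance probabilities
        p_i = 1.0 - np.prod(1.0 - p_ij)        # noisy-OR bag probability
        # Per-instance weights from the slide: w_ij = (t_i - p_i) / p_i * p_ij.
        # Instances drive the update in proportion to how much they explain the bag.
        return (t_i - p_i) / p_i * p_ij
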
  37. CR-MILBoost: a two-step procedure. Step 1: estimate probabilities on the latent labels. Step 2: integrate the estimates into a new loss. This mitigates label estimation error by incorporating priors
  38. CR-MILBoost, Step 1: given prior classifiers $Q = \{\varphi_1(x), \varphi_2(x), \ldots, \varphi_q(x)\}$, estimate instance confidences $\eta_{ij} \equiv P(y_{ij} = y_i \mid Q) = \frac{1}{1 + e^{-y_i \sum_q \varphi_q(x_{ij})}}$ and bag confidences $\eta_i \equiv P(y_i \mid Q) = \max_j \eta_{ij}$
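
A sketch of Step 1 under these definitions (the layout of `prior_scores`, holding the scores of the q prior classifiers in Q, is an assumption, not the authors' code):

    import numpy as np

    def step1_confidences(prior_scores, y_i):
        # prior_scores: array of shape (q, K), one row per prior classifier in Q.
        s = prior_scores.sum(axis=0)               # sum_q phi_q(x_ij), per instance
        eta_ij = 1.0 / (1.0 + np.exp(-y_i * s))    # P(y_ij = y_i | Q)
        eta_i = eta_ij.max()                       # P(y_i | Q): most confident instance
        return eta_ij, eta_i
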
  39. CR-MILBoost, Step 2: solve the confidence-rated analogue of the MILBoost objective, $\varphi^*(x) = \arg\max_\varphi \prod_i (\eta_i p_i)^{t_i} (1 - \eta_i p_i)^{1-t_i}$, with $p_{ij} = \frac{\eta_{ij}}{1 + e^{-\varphi(x_{ij})}}$ and $p_i = 1 - \prod_j (1 - p_{ij})$
  40. CR-MILBoost, Step 2: optimize by boosting, $\varphi(x) = \sum_k \alpha_k h_k(x)$, with confidence-rated weights $w_{ij} = \frac{y_i - \eta_i p_i}{p_i} \eta_{ij} p_{ij}$
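
A sketch of the Step 2 weight computation, mirroring the MILBoost sketch above with the Step 1 confidences folded in (a reconstruction from the slide's formulas, again with t_i in {0,1}):

    import numpy as np

    def cr_milboost_weights(scores, eta_ij, eta_i, t_i):
        # scores: current boosted scores phi(x_ij) for all instances in bag i.
        p_ij = eta_ij / (1.0 + np.exp(-scores))  # confidence-rated instance probs
        p_i = 1.0 - np.prod(1.0 - p_ij)          # noisy-OR bag probability
        # Instances the priors distrust (small eta_ij) get down-weighted.
        return (t_i - eta_i * p_i) / p_i * eta_ij * p_ij
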
  41. Experiments: features. Weak learners are decision stumps $(e, R, \theta)$ defined by an edge orientation $e$, a sub-window $R$, and a threshold $\theta$: $h_{e,R}(x) = \frac{\sum_{m \in R} \xi_e(x, m)}{\sum_{d \in F, m \in R} \xi_d(x, m)}$, combined as $f(x) = \sum_k \alpha_k h_k(x)$. Simple, efficient; Q = 4 (number of stumps)
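
A sketch of one such stump (the oriented-edge histogram layout is an assumption for illustration):

    import numpy as np

    def stump_response(hist, e, R, theta):
        # hist: (H, W, D) array of oriented-edge energies; R: (H, W) boolean
        # mask selecting the sub-window; e: orientation index; theta: threshold.
        num = hist[R][:, e].sum()              # energy of orientation e inside R
        den = hist[R].sum()                    # total edge energy inside R
        return 1 if num / max(den, 1e-12) > theta else -1
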
  42. Experiments: pedestrian detection. Training data: 200 images automatically downloaded from the web, with 200 "objectness" bounding boxes
  43. Experiments: pedestrian detection. Testing data: INRIA Person, 300 images containing 600 pedestrians
  44. Experiments: pedestrian detection
  45. Experiments: pedestrian detection
  46. Experiments: pedestrian detection
  47. Experiments: horse detection. Training data: 200 images automatically downloaded from the web, with 200 "objectness" bounding boxes
  48. Experiments: horse detection. Testing data: 200 images containing 200 side-view horses
  49. Experiments: horse detection
  50. Experiments: horse detection
  51. Experiments: horse detection
  52. Conclusion: CR-MILBoost is a new MIL method with a two-step procedure, giving a dramatic increase in performance (200% on two datasets). The quality of the selected examples still suffers from additional ambiguity compared to fully supervised examples
  53. Adapting Deep CNNs from Classification to Detection. Joint work with Judy Hoffman, Eric Tzeng, Sergio Guadarrama and Trevor Darrell at UC Berkeley
  54. Recall: classification is easier than detection. Classification labels (image-level tags like "dog", "apple") are easy to collect; detection labels (bounding boxes) are much more difficult and costly!
  55. Main idea behind the approach: from classification images we can train classifiers $W^{CLASSIFY}$ for every category (dog, apple, cat, ...); from detection images we can train detectors $W^{DET}$ only for the categories with bounding boxes (dog, apple). The question is how to obtain $W^{DET}_{cat}$ for a category like "cat" that has only a classifier $W^{CLASSIFY}_{cat}$
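
One way to picture the classifier-to-detector transfer on slide 55 (a sketch in the spirit of the approach, not the authors' code; the k-nearest-neighbor averaging is an assumption):

    import numpy as np

    def adapt_to_detector(w_classify_new, W_classify_B, W_det_B, k=5):
        # Categories in set B have both classifier and detector weights. Learn the
        # classifier-to-detector change from the k categories most similar to the
        # new one, and apply that change to its classifier-only weights.
        deltas = W_det_B - W_classify_B                          # per-category change
        dists = np.linalg.norm(W_classify_B - w_classify_new, axis=1)
        nn = np.argsort(dists)[:k]                               # nearest B categories
        return w_classify_new + deltas[nn].mean(axis=0)
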
  56. (a) Classification CNN: train a deep convolutional neural network (layers 1-5, fc6, fc7, fcA, fcB) on classification data from category sets A and B, producing outputs such as cat: 0.90, dog: 0.85, airplane: 0.05, person: 0.10
  57. (b) Hidden layer adaptation: fine-tune an adapted detection CNN (det layers 1-5, det fc6, det fc7, det fcB, plus a background output) on detection data from categories B, using labeled warped regions (e.g. dog vs. background). (c) Output layer adaptation: adapt fcA into det fcA, yielding a final combined, fully adapted CNN that scores all categories as detections (e.g. cat: 0.90, dog: 0.45, background: 0.25)
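
A rough sketch of steps (a) and (b) (using torchvision's AlexNet as a stand-in for the talk's network; the category counts and layer sharing are assumptions):

    import torchvision.models as models

    num_A, num_B = 7000, 200
    # (a) Train a classification CNN on categories A and B.
    clf = models.alexnet(num_classes=num_A + num_B)
    # ... train clf on classification data ...
    # (b) Initialize a detection CNN from the classifier's hidden layers and
    # fine-tune it on warped detection regions for categories B + background.
    det = models.alexnet(num_classes=num_B + 1)
    det.features.load_state_dict(clf.features.state_dict())
    # ... fine-tune det, then (c) adapt the A-category output weights as above.
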
  58. Results on ILSVRC 2013 Detection
  59. Results on ILSVRC 2013 Detection
  60. Results on ILSVRC 2013 Detection
  61. Preliminary results on 7K categories
  62. Conclusion: presented two new methods for object detector training with minimal bounding box annotation: an MIL-based method for learning from the results of image search, and adaptation from the classification task to the detection task
  63. Questions?
