Representational Challenges of Recognition,
     from Detection to Interpretation



    NSF Frontiers in Computer Vision Workshop

                 Derek Hoiem
           University of Illinois (UIUC)
                    Aug 2011
Recognition in the last 15 years
• Focus on object search: “Where is it?”
• Build templates that quickly differentiate an object
  patch from a background patch

[Figure: the Dog Model template scores an image patch: Object or Non-Object?]
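The template idea above can be sketched as a linear template slid over a feature grid. Each window is scored by a dot product, and windows above a threshold are kept as "object". This is a minimal illustration under stated assumptions. The features are assumed to be a precomputed grid of HOG-like cell descriptors, which the slide does not specify. The function names and the brute-force loop are hypothetical, not the detector evaluated in the talk.

```python
import numpy as np

def score_windows(features, template, bias=0.0):
    """Score every template-sized window of a feature map with a linear template.

    features: (H, W, D) array, assumed to be a grid of HOG-like cell descriptors.
    template: (h, w, D) array of learned weights (hypothetical here).
    Returns an (H - h + 1, W - w + 1) array of scores w . x + b.
    """
    H, W, D = features.shape
    h, w, d = template.shape
    assert d == D, "template and features must share the descriptor dimension"
    scores = np.empty((H - h + 1, W - w + 1))
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            patch = features[y:y + h, x:x + w, :]
            scores[y, x] = np.sum(patch * template) + bias  # linear classifier score
    return scores

def detect(features, template, bias=0.0, threshold=0.0):
    """Return (y, x, score) for every window classified as 'object' (score > threshold)."""
    scores = score_windows(features, template, bias)
    ys, xs = np.nonzero(scores > threshold)
    return [(int(y), int(x), float(scores[y, x])) for y, x in zip(ys, xs)]
```

Each window gets a single object/non-object score from its appearance alone. This is why such detectors excel at separating objects from background but have no built-in notion of object extent or of fine differences between similar categories.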
Template Matching Problem

[Figure: Dog Model detections grouped into True Detections, Bad Localization, Confused with Similar Object, Confused with Dissimilar Object, and Misc. Background]
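The detection outcomes in the figure above can be assigned automatically from box overlap (intersection over union, IoU) with ground-truth objects. The sketch below is an illustration, not the talk's exact protocol. The 0.5 cutoff approximates the usual PASCAL VOC overlap criterion. The 0.1 cutoff and the `SIMILAR` category groups are assumptions chosen for illustration. Duplicate detections are not handled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Box:
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter = Box(max(a.x1, b.x1), max(a.y1, b.y1), min(a.x2, b.x2), min(a.y2, b.y2)).area()
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0

# Hypothetical groups of visually similar categories (an assumption for illustration).
SIMILAR = [{"cat", "dog"}, {"car", "bus"}]

def are_similar(c1: str, c2: str) -> bool:
    return any(c1 in g and c2 in g for g in SIMILAR)

def classify_detection(det_box, det_class, ground_truth, hit=0.5, near=0.1):
    """Label one detection with the outcome types from the slide.

    ground_truth: list of (Box, class_name) pairs. Thresholds are assumptions.
    """
    same = [iou(det_box, b) for b, c in ground_truth if c == det_class]
    if same and max(same) > hit:
        return "true detection"
    if same and max(same) >= near:
        return "bad localization"  # right category, wrong extent
    overlapping = [c for b, c in ground_truth if c != det_class and iou(det_box, b) >= near]
    if any(are_similar(det_class, c) for c in overlapping):
        return "similar object"
    if overlapping:
        return "dissimilar object"
    return "misc. background"  # fired on no annotated object at all
```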
Breakdown of top 100 false positives
Felzenszwalb et al. (v4) detector, PASCAL VOC 2010 valset

Error type          Airplane    Car    Cat    Dog
Localization             65%    70%    41%    23%
Similar Object           15%    16%    39%    51%
Other Object              4%     5%    15%     9%
Misc. Background         16%     9%     5%    17%
Key Challenge: localize the object from a detection

[Figure: Dog Model detections marked Good / Bad / Good / Bad localization]

• Need good category-sensitive segmentation methods
• Segmentation can then free up detectors to focus on discriminative pieces
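One way segmentation could repair localization is to tighten a detection box to the foreground pixels near it. The box can then grow when the detector cut part of the object off, or shrink when it included background. This is a minimal sketch. It assumes a per-pixel foreground probability map from some category-sensitive segmenter, which is not provided here. The `pad` and `threshold` values and the function name are hypothetical.

```python
import numpy as np

def box_from_mask(fg_prob, det_box, threshold=0.5, pad=0.25):
    """Refine a detection box using a foreground probability map.

    fg_prob: (H, W) per-pixel foreground probabilities for the detected category
             (assumed to come from a category-sensitive segmenter).
    det_box: (x1, y1, x2, y2) integer pixel box, with x2 and y2 exclusive.
    Returns the tight box around thresholded foreground inside a padded search
    region, or the original box if no pixel passes the threshold.
    """
    H, W = fg_prob.shape
    x1, y1, x2, y2 = det_box
    # Enlarge the search region so a box that cuts off part of the object can grow.
    px = int(round(pad * (x2 - x1)))
    py = int(round(pad * (y2 - y1)))
    sx1, sy1 = max(0, x1 - px), max(0, y1 - py)
    sx2, sy2 = min(W, x2 + px), min(H, y2 + py)
    region = fg_prob[sy1:sy2, sx1:sx2] >= threshold
    if not region.any():
        return det_box  # nothing to refine against; keep the detector's box
    rows = np.flatnonzero(region.any(axis=1))  # rows containing foreground
    cols = np.flatnonzero(region.any(axis=0))  # columns containing foreground
    return (sx1 + int(cols[0]), sy1 + int(rows[0]),
            sx1 + int(cols[-1]) + 1, sy1 + int(rows[-1]) + 1)
```

With localization delegated to a step like this, the detector itself only needs to fire somewhere on the object.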
Key Challenge: differentiate between similar categories

[Figure: Dog Model]

• Robustness through learned abstraction (e.g., shape) rather than hand-coded invariance
• Compare details rather than holistic appearance
To get large improvements, we need to solve the “mid-level” problems

[Figure: Potential Gains in Precision-Recall]
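A "potential gain" of the kind plotted above can be estimated by removing every false positive of one error type from the ranked detection list and recomputing average precision (AP). This procedure is assumed here, not taken from the talk. The AP below is the simple non-interpolated mean of precision at each true positive. The official PASCAL VOC protocol computes AP differently. The label strings are hypothetical.

```python
def average_precision(is_tp, num_positives):
    """Non-interpolated AP: sum of precision at each true positive / #ground truth.

    is_tp: booleans for detections sorted by decreasing confidence.
    num_positives: number of ground-truth objects (missed objects lower AP).
    """
    tp = 0
    total = 0.0
    for rank, hit in enumerate(is_tp, start=1):
        if hit:
            tp += 1
            total += tp / rank  # precision at this recall level
    return total / num_positives if num_positives else 0.0

def gain_if_fixed(labels, num_positives, error_type):
    """AP improvement if all false positives labeled `error_type` were removed.

    labels: per-detection labels in confidence order, e.g. "tp", "loc", "sim".
    """
    base = average_precision([l == "tp" for l in labels], num_positives)
    kept = [l for l in labels if l != error_type]
    fixed = average_precision([l == "tp" for l in kept], num_positives)
    return fixed - base
```

For example, with labels `["tp", "loc", "tp", "sim", "tp"]` and 4 ground-truth objects, removing the single localization error raises AP by about 0.12. Removing the similar-object error raises it by about 0.04, because that error is ranked lower.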
Object Recognition Challenges
• Last 15 years: object detection
  – Good methods to detect objects, ignore
    background
  – Better segmentation and mid-level
    representations are crucial for further
    improvement


• Next 10+ years: object interpretation
  – How do we represent objects themselves?
Key Challenge: How do we deal with objects that we can’t categorize?

How to localize objects without categorization?

How to build representations that apply to novel objects?
Key Challenge: build/infer representations that encode physical context

How to infer physical relations (contact, engagement, etc.)?

How to interpret an object’s role in the scene?
Key Challenge: build/infer representations that depend on task context

[Figure: the same animal labeled “Cow” versus “Big animal ahead, moving left”]

Which objects are relevant, and how are they relevant?
We need complex, multi-faceted representations

• Categories, pose, material, unusual characteristics, etc.

[Figure: a motorcycle annotated with attributes (Vehicle, Two-wheeled, Motorcycle, Facing right, On the street, Has a rider), parts (Mirrors, Gas tank, Seat, Headlight, Lic. Plate, Tail light, Exhaust, Engine, two Wheels), and materials (Metal, Rubber)]
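A multi-faceted representation like the annotated motorcycle above can be sketched as a record that holds category, pose, attributes, context, and parts with materials side by side, instead of a single class label. The class and field names here are hypothetical. The split between attributes and context, and the part-to-material pairings, are one illustrative reading of the figure, not the author's design.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass
class Part:
    name: str
    material: Optional[str] = None  # e.g. "metal", "rubber"

@dataclass
class ObjectDescription:
    categories: List[str]  # coarse to fine, e.g. ["vehicle", "motorcycle"]
    pose: Optional[str] = None  # e.g. "facing right"
    attributes: Set[str] = field(default_factory=set)  # intrinsic properties
    context: Set[str] = field(default_factory=set)  # relations to the scene
    parts: List[Part] = field(default_factory=list)

    def materials(self) -> Set[str]:
        """Materials of all parts that have one recorded."""
        return {p.material for p in self.parts if p.material}

    def has_part(self, name: str) -> bool:
        return any(p.name == name for p in self.parts)

# The motorcycle from the slide; part-to-material pairings are illustrative.
motorcycle = ObjectDescription(
    categories=["vehicle", "motorcycle"],
    pose="facing right",
    attributes={"two-wheeled"},
    context={"on the street", "has a rider"},
    parts=[Part("wheel", "rubber"), Part("wheel", "rubber"), Part("gas tank", "metal"),
           Part("engine", "metal"), Part("seat"), Part("headlight"), Part("tail light"),
           Part("mirrors"), Part("exhaust"), Part("license plate")],
)
```

A query such as `motorcycle.has_part("wheel")` stays answerable even if the fine category were unknown. That is one motivation for representations that go beyond a single category label.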
Summary

• In object detection, the key challenges are object
  segmentation and fine-grained differentiation

• Object interpretation is a wide-open problem,
  and we need new object representations
  – Unfamiliar objects
  – Situational context
  – Task context
Thank you