Promising avenues for interdisciplinary research in vision
Presentation Transcript

  • Dr. Oge Marques Associate Professor Computer Science and Engineering Florida Atlantic University Boca Raton, FL (USA) June 2009
  • Take-home message We postulate that many challenging problems in human and computer vision research can be approached in a truly interdisciplinary way and show examples of recent work on the topic of “objects in context” that support our claim.
  • Outline •  Background and motivation •  Visual perception •  Object detection and recognition •  Scene recognition and analysis •  The role of context •  Representative work •  Concluding remarks
  • Background and motivation Computer vision is not as easy as it seemed 40+ years ago
  • Background and motivation •  Computer vision has many open research questions –  Object detection, recognition, and categorization –  Scene analysis, recognition, and understanding –  Objects in context •  Research in human vision has grown tremendously –  Computational models of selected visual processes have emerged •  A truly interdisciplinary effort can help bring the best of human vision research into selected problems in computer vision.
  • The fundamental question of vision How are we able so quickly and effortlessly to perceive meaningful, coherent, 3D scenes from incomplete, 2D patterns of light that enter our eyes?
  • A selected related question How are we able to perceive, detect, categorize and recognize objects and scenes?
  • Vision Science Interdisciplinary study of many areas of visual processing and function. Areas of research: detection, attention, memory, recognition, motion perception, etc. Disciplines: psychology, neuroscience, biology, computer science, engineering, etc.
  • Reverse engineering the perceptual system •  We know the visual system works •  But how?
  • We don’t ‘see’ with our eyes We see with our brains!
  • The hierarchical nature of the scientific knowledge of the visual system The deeper in the system you go, the less we know…
  • What do we know about visual perception? Not much compared to what we don’t know Ignorance Knowledge
  • Outline •  Background and motivation •  Visual perception •  Object detection and recognition •  Scene recognition and analysis •  The role of context •  Representative work •  Concluding remarks
  • The perceptual process Source: E.B. Goldstein, “Sensation and Perception”
  • Four Stages of Visual Perception Inspired by work by David Marr (1945-1980) •  One of the most influential neuroscientists of vision. •  Thought of vision as an information-processing task. •  In his book Vision (1982), he distinguished three different levels of description involved in understanding complex information processing systems: –  Computational level –  Algorithmic level –  Implementation level •  An important point is that the levels can be considered independently.
  • Four Stages of Visual Perception
  • Four Stages of Visual Perception
  • Four Stages of Visual Perception
  • Four Stages of Visual Perception
  • Four Stages of Visual Perception “cup”
  • Outline •  Background and motivation •  Visual perception •  Object detection and recognition •  Scene recognition and analysis •  The role of context •  Representative work •  Concluding remarks
  • The challenge of object recognition •  Why is it so difficult for computers to carry out object recognition tasks that humans can perform easily? •  Although most human visual perception appears to be almost effortless, it involves complex “behind the scenes” processes.
  • The challenge of object recognition Human vision scientist: “Let’s look at selected behavioral and neural processes that make it possible for people to perceive (i.e., detect and recognize) objects.” Computer vision scientist: “Let’s model what is known – and reasonable – and try it out on standard databases containing real-world images.”
  • The challenge of object perception •  The stimulus on the receptors is ambiguous
  • The challenge of object perception •  The stimulus on the receptors is ambiguous The inverse projection problem
  • The challenge of object perception •  The stimulus on the receptors is ambiguous
  • The challenge of object perception •  The stimulus on the receptors is ambiguous http://users.skynet.be/J.Beever/pave.htm
  • The challenge of object perception •  Objects can be hidden or blurred Can you find… - the pencil? - the glasses?
  • The challenge of object perception •  Objects can be hidden or blurred Who are these people?
  • The challenge of object perception Objects look different from different viewpoints The ability of humans to recognize an object seen from different viewpoints is called viewpoint invariance.
  • The challenge of object perception Objects look different from different viewpoints Q: Which two faces correspond to the same person? A1 (human): (a) and (c) A2 (computer): (a) and (b)
  • Research question How do we recognize objects from different viewpoints? Structural-Description Models propose that our ability to recognize 3D objects is based on 3D volumes (called volumetric features) that can be combined to create the overall shape of an object. Image-Description Models propose that our ability to recognize objects from different viewpoints is based on stored 2D views of the object as it would appear from different viewpoints. Which model is correct? The actual mechanism for object recognition probably involves elements of both the structural-description and image-description models (Palmeri & Gauthier, 2004)
  • Why do we care about object recognition? Because object recognition leads to perception of function.
  • So, what do we use: direct or indirect? “It seems exceedingly unlikely (though logically possible) that we categorize everything in our visual fields”, Palmer. Hypothesis: we categorize the objects that are relevant for the specific task we have at hand, but we only extract affordances from the others.
  • Object detection and the “Head in the coffee beans problem”
  • “Head in the coffee beans problem” Can you find the head in this image?
  • “Head in the coffee beans problem” Can you find the head in this image?
  • So what does object recognition involve? Slide by Fei-Fei, Fergus, Torralba
  • Verification: is that a lamp? Slide by Fei-Fei, Fergus, Torralba
  • Detection: are there people? Slide by Fei-Fei, Fergus, Torralba
  • Identification: is that Potala Palace? Slide by Fei-Fei, Fergus, Torralba
  • Object categorization mountain tree building banner street lamp vendor people Slide by Fei-Fei, Fergus, Torralba
  • Scene and context categorization •  outdoor •  city •  … Slide by Fei-Fei, Fergus, Torralba
  • Is this space large or small? How far are the buildings in the back? Slide by Fei-Fei, Fergus, Torralba
  • Activity What is this person doing? What are these two doing?? Slide by Fei-Fei, Fergus, Torralba
  • Outline •  Background and motivation •  Visual perception •  Object detection and recognition •  Scene recognition and analysis •  The role of context •  Representative work •  Concluding remarks
  • What is a scene? •  A scene is a view of a real-world environment that contains multiple surfaces and objects, organized in a meaningful way. –  A tour of scene understanding literature: http://cvcl.mit.edu/SUNSarticles.htm
  • The “gist” of a scene •  Mary Potter (1975, 1976) demonstrated that during a rapid sequential visual presentation (100 msec per image), a novel scene picture is indeed instantly understood and observers seem to comprehend a lot of visual information, but a delay of a few hundred msec (~300 msec) is required for the picture to be consolidated in memory. •  The “gist” (a summary) refers to the visual information perceived during or right after a glance at an image. •  To simplify, the gist is often synonymous with the basic-level category of the scene or event (e.g., wedding, bathroom, beach, forest, street)
  • What we (don’t) know about scene analysis, recognition, and classification •  Humans are very good at recognizing and classifying scenes •  We are also very fast (100 ms or less) •  We often sacrifice accuracy in the name of speed (we capture the gist but miss many details) •  How exactly do we do it?
  • What is the basis for scene identification? •  Different schools of thought: – Scene-centered – Part-based (i.e., object-centered) – Holistic
  • Outline •  Background and motivation •  Visual perception •  Object detection and recognition •  Scene recognition and analysis •  The role of context •  Representative work •  Concluding remarks
  • Objects in context •  Objects do not exist isolated from a context •  Torralba’s challenge: “How far can you go without using an object detector?”
  • Objects in context
  • The multiple personalities of a blob
  • The multiple personalities of a blob
  • Look-Alikes by Joan Steiner
  • Why is context important?
  • What are the hidden objects?
  • What are the hidden objects?
  • Biederman 1982 •  Pictures shown for 150 ms. •  Objects in appropriate context were detected more accurately than objects in an inappropriate context. •  Scene consistency affects object detection.
  • Objects and Scenes Biederman’s violations (1981):
  • Support
  • Interposition
  • Size
  • Position, Probability
  • Biederman’s classes in Computer Vision Galleguillos & Belongie, Tech Report (2008) •  Interposition and support can be coded by reference to physical space. •  Probability, position and size are defined as semantic relations because they require access to the referential meaning of the object. •  Semantic relations include information about detailed interactions among objects in the scene and they are often used as contextual features.
  • Dreaming of an ideal computer vision solution… Galleguillos & Belongie, Tech Report (2008)
  • Types of context Galleguillos & Belongie, Tech Report (2008) •  Contextual features can be grouped into 3 categories: –  semantic context (probability) –  spatial context (position) –  scale context (size). •  Contextual knowledge can be any information that is not directly produced by the appearance of an object. •  It can be obtained from: –  the nearby image data; –  image tags or annotations; –  the presence and location of other objects.
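To make this grouping concrete, here is a minimal illustrative sketch (not from the Galleguillos & Belongie report; all names and values are hypothetical) of how the three kinds of contextual cues could be attached to a candidate object detection:

    # Illustrative sketch: grouping semantic, spatial, and scale context cues
    # for one object hypothesis. Names and values are hypothetical.
    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class ContextCues:
        # Semantic context (probability): labels that tend to co-occur with this object
        co_occurring_labels: List[str] = field(default_factory=list)
        # Spatial context (position): relations to other objects, e.g. ("on", "road")
        spatial_relations: List[Tuple[str, str]] = field(default_factory=list)
        # Scale context (size): size of this object relative to a reference object
        relative_scale: float = 1.0

    @dataclass
    class ObjectHypothesis:
        label: str
        bbox: Tuple[int, int, int, int]   # (x, y, width, height) in image coordinates
        appearance_score: float           # score from an appearance-only detector
        context: ContextCues = field(default_factory=ContextCues)

    # Example: a "car" hypothesis whose plausibility could be re-scored using context
    car = ObjectHypothesis(
        label="car",
        bbox=(120, 200, 80, 40),
        appearance_score=0.62,
        context=ContextCues(
            co_occurring_labels=["road", "pedestrian"],
            spatial_relations=[("on", "road"), ("below", "sky")],
            relative_scale=0.3,           # e.g. relative to the road region
        ),
    )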
  • Acquiring and modeling context Galleguillos & Belongie, Tech Report (2008) •  Which of the three should one use? –  Spatial and scale context are the types of context most exploited by recognition frameworks. –  Generally, semantic context is implicitly present in spatial context, since information about object co-occurrences comes from identifying the objects involved in the spatial relations in the scene. –  The same happens with scale context, as scale is measured with respect to other objects. –  Therefore, using spatial and scale context involves using all forms of contextual information in the scene.
  • Outline •  Background and motivation •  Visual perception •  Object detection and recognition •  Scene recognition and analysis •  The role of context •  Representative work •  Concluding remarks
  • Representative work •  There are many research groups working at the intersection of human and computer vision on numerous topics, including “objects in context”. •  A prominent example: work by Aude Oliva and Antonio Torralba (and collaborators) at MIT.
  • Representative work •  A case study: –  L.W. Renninger and J. Malik (2004). When is scene recognition just texture recognition? Vision Research, 44, 2301-2311.
  • Renninger and Malik •  Basic idea –  Consider texture as an early cue for scene perception. •  It’s simple •  It’s fast (pre-attentive) (Julesz, 1981)
  • Renninger and Malik •  Approach: How well do humans discriminate scenes with very limited exposure? Build a texture-based model for scene discrimination. Compare performance!
  • Scene categories
  • Scene categories
  • Renninger and Malik •  Task –  2AFC (two-alternative forced choice) –  Subjects are shown an image •  Image exposure time: 37, 50, and 69 ms –  Image followed by a jumbled scene mask –  The task is to select one of two word choices that best describes the image –  Subject performance: 77%, 82%, and 92% correct •  Get ready…
  • Texture Discrimination Model –  Cluster response distributions from V1-like filters to get prototypical responses (textons) –  Remember what types of textons occur in particular scenes (build a histogram) –  Label a new image using a nearest-neighbor classifier •  Compare the texton histogram for the new image to stored representations (χ2 distance) (Malik and Perona, 1990) (Malik et al., 1999)
  • Texture Discrimination Model •  V1-like filters
  • Texture Discrimination Model •  Textons
  • Texture Discrimination Model
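The texton pipeline listed above can be summarized in a short sketch. This is an illustrative approximation under simplifying assumptions (a tiny derivative-of-Gaussian filter bank, k-means textons, a chi-squared nearest-neighbor rule), not Renninger and Malik's actual implementation:

    # Sketch of texton-based scene classification (illustrative, simplified).
    import numpy as np
    from scipy.ndimage import gaussian_filter
    from sklearn.cluster import KMeans

    def filter_responses(img, sigmas=(1, 2, 4)):
        """Per-pixel stack of simple V1-like responses at several scales."""
        feats = []
        for s in sigmas:
            feats.append(gaussian_filter(img, s))                 # smoothed (blob-like)
            feats.append(gaussian_filter(img, s, order=(0, 1)))   # derivative along x
            feats.append(gaussian_filter(img, s, order=(1, 0)))   # derivative along y
        return np.stack(feats, axis=-1).reshape(-1, len(feats))   # pixels x filters

    def learn_textons(train_imgs, k=32):
        """Cluster filter responses from training images into k prototypes (textons)."""
        samples = np.vstack([filter_responses(im) for im in train_imgs])
        return KMeans(n_clusters=k, n_init=4, random_state=0).fit(samples)

    def texton_histogram(img, kmeans):
        """Normalized histogram of texton assignments for one image."""
        labels = kmeans.predict(filter_responses(img))
        hist = np.bincount(labels, minlength=kmeans.n_clusters).astype(float)
        return hist / hist.sum()

    def chi2(h1, h2, eps=1e-10):
        """Chi-squared distance between two histograms."""
        return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

    def classify(img, kmeans, stored_hists, stored_labels):
        """Assign the scene label of the nearest stored histogram (chi-squared)."""
        h = texton_histogram(img, kmeans)
        dists = [chi2(h, s) for s in stored_hists]
        return stored_labels[int(np.argmin(dists))]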
  • Confusion matrix (%)

                   Natural   Outdoor/MM   Indoor
     Natural        50.56       33.26      16.19
     Outdoor/MM     23.14       46.54      30.33
     Indoor          8.12       18.69      73.18
  • Discrimination of Superordinate Categories
  • Renninger and Malik •  Conclusion –  Early scene identification can be mostly explained by a simple texture model
  • Outline •  Background and motivation •  Visual perception •  Object detection and recognition •  Scene recognition and analysis •  The role of context •  Representative work •  Concluding remarks
  • Our experience •  Working with Dept of Psychology @ FAU –  Two joint graduate-level courses –  Joint student supervision –  Joint grant proposals –  Joint papers (in preparation) –  Constant discussions –  Promising days ahead… •  Imaging Science & Technology Center •  Multidisciplinary Vision Program
  • Our focus •  To establish quantitative measures of the importance of context – Method: present subjects with degraded (blocky, blurry, etc.) objects against a context and ask them to recognize the object as it becomes progressively more visible (see the sketch below). – Human vision: behavioral experiments – Computer vision: stimuli creation
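On the stimuli-creation side, the following is a minimal sketch of how progressively less degraded (blurry or blocky) versions of an image could be generated; the file names and degradation levels are illustrative, not the actual experimental protocol:

    # Illustrative stimulus generation: sequences of increasingly visible images.
    from PIL import Image, ImageFilter

    def blur_sequence(img, radii=(16, 8, 4, 2, 1, 0)):
        """Gaussian-blurred versions, from most to least degraded."""
        return [img.filter(ImageFilter.GaussianBlur(r)) if r > 0 else img.copy()
                for r in radii]

    def blocky_sequence(img, block_sizes=(32, 16, 8, 4, 2, 1)):
        """Pixelated ('blocky') versions: downsample, then upsample with nearest neighbor."""
        w, h = img.size
        out = []
        for b in block_sizes:
            small = img.resize((max(1, w // b), max(1, h // b)), Image.NEAREST)
            out.append(small.resize((w, h), Image.NEAREST))
        return out

    if __name__ == "__main__":
        scene = Image.open("object_in_context.jpg")    # hypothetical input image
        for i, frame in enumerate(blocky_sequence(scene)):
            frame.save(f"stimulus_{i:02d}.png")        # shown in order of increasing visibility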
  • Concluding remarks •  Great potential •  Cultural barriers •  Open problems and challenges on both sides •  The time is ripe for interdisciplinary research on vision, particularly “objects in context”
  • Acknowledgments •  Thanks to Prof. Elan Barenholtz (Dept of Psychology, FAU) for allowing me to use some of his slides and for the many interesting discussions on the topics presented in this talk. •  Many slides for this talk contain material made publicly available on the Web by Antonio Torralba and Aude Oliva (MIT) and Fei-Fei Li (UIUC).
  • Thank you for attending my talk! Questions? Email: omarques@fau.edu