An Introduction to Computer Vision


Computer vision has achieved impressive results over the last 5-10 years. It is now possible to quickly and reliably detect faces, recognize and localize target images, and even classify pictures of objects into generic categories. Unfortunately, knowledge of these techniques remains largely confined to academia. In this session we’ll go over some of the available tools, with an emphasis on the ideas and algorithms behind their design.

To show how these components fit together, a sample system will be developed over the course of the presentation. Starting with standard image descriptors, we’ll first see how to do direct image recognition. We’ll then extend that into a simple object classifier that can distinguish, for example, between images that contain a bicycle and those that don’t.

Published in: Technology

  1. 1. An Introduction to Computer Vision Matthew Dockrey June 18, 2009
  2. 2. Introduction <ul><li>State of the art in computer vision </li></ul><ul><ul><li>PASCAL Visual Object Classes (VOC) Challenge </li></ul></ul><ul><ul><li>Semantic Robot Vision Challenge (SRVC) </li></ul></ul>
  3. 3. PASCAL VOC Data <ul><li>Table: Mark Everingham </li></ul>
  4. 4. PASCAL VOC Results <ul><li>Table: Mark Everingham </li></ul>
  5. 5. SRVC
  6. 6. SRVC Results <ul><li>Table: Paul E. Rybski, Alexei Efros </li></ul>
  7. 7. What is computer vision? <ul><li>Step 1: Image analysis/feature extraction </li></ul><ul><li>Step 2: Machine learning/statistical analysis </li></ul><ul><li>Step 3: Profit? </li></ul>
  8. 8. Scale-invariant feature transform (SIFT) <ul><li>The standard image feature / descriptor </li></ul><ul><li>David Lowe, 2004 </li></ul><ul><li>The algorithm is patented, but free for non-commercial use </li></ul><ul><li>Binaries only for original implementation, but GPL'd versions exist </li></ul>
  9. 9. SIFT – Point Detection <ul><li>Images: David Lowe </li></ul>
  10. 10. SIFT – Point Descriptor <ul><ul><li>Image by David Lowe </li></ul></ul>
  11. 11. SIFT Feature <ul><li>A SIFT feature consists of: </li></ul><ul><li>Row location </li></ul><ul><li>Column location </li></ul><ul><li>Orientation </li></ul><ul><li>Scale </li></ul><ul><li>128-dimensional descriptor vector </li></ul>
  12. 12. SIFT Image Matching <ul><li>1) Extract SIFT features </li></ul><ul><li>2) Match using nearest-neighbor search </li></ul><ul><li>3) Apply semi-local constraints </li></ul><ul><ul><ul><li>Orientation </li></ul></ul></ul><ul><ul><ul><li>Scale </li></ul></ul></ul><ul><ul><ul><li>Location </li></ul></ul></ul><ul><li>4) Huzzah! </li></ul>
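
Step 2 of the matching slide above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the talk: it assumes the descriptors have already been extracted, uses brute-force search instead of an approximate index, and adds a nearest/second-nearest ratio test as an assumed way of rejecting ambiguous matches. The function name and the 0.8 threshold are hypothetical. The semi-local constraints of step 3 are not covered.

```python
import math

def nearest_neighbor_matches(desc_a, desc_b, max_ratio=0.8):
    """Match each descriptor in desc_a to its nearest neighbor in desc_b.

    A match is kept only when the best distance is clearly smaller than the
    second-best one (ratio test; the 0.8 threshold is an assumed value).
    Returns a list of (index_in_a, index_in_b) pairs.
    """
    matches = []
    for i, a in enumerate(desc_a):
        # Brute-force Euclidean distances from a to every descriptor in desc_b.
        ranked = sorted((math.dist(a, b), j) for j, b in enumerate(desc_b))
        if len(ranked) < 2:
            continue  # the ratio test needs two candidates
        (best, j), (second, _) = ranked[0], ranked[1]
        if best < max_ratio * second:
            matches.append((i, j))
    return matches

# Toy 4-dimensional descriptors standing in for 128-dimensional SIFT vectors.
image_a = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.5, 0.5, 0.5, 0.5]]
image_b = [[0.0, 0.9, 0.1, 0.0], [0.9, 0.0, 0.0, 0.1], [0.0, 0.0, 1.0, 0.0]]
matches = nearest_neighbor_matches(image_a, image_b)
print(matches)
```

The third descriptor in `image_a` is about equally far from two candidates, so the ratio test drops it rather than guessing.
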
  13. 13. Goal: Object Classifier <ul><li>Most object classes won't share exact SIFT features </li></ul><ul><li>Need to abstract properties of the class into a form that we can reason with </li></ul>
  14. 14. Do we really need context?
  15. 15. No. <ul><ul><li>Image parts: Thomas Hawk </li></ul></ul>
  16. 16. <ul><ul><li>Image: Thomas Hawk </li></ul></ul>
  17. 17. Bag of Words <ul><li>Comes from computational linguistics, document matching </li></ul><ul><li>Cluster features into codebook words </li></ul><ul><ul><li>(Using k-means, usually) </li></ul></ul><ul><li>Image descriptor is a histogram vector counting how many times each word is seen </li></ul><ul><li>[5 8 14 2 12 4 3 5 11 26 1 3 ...] </li></ul>
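
The codebook and histogram steps on the Bag of Words slide can be sketched as follows. This is an illustrative sketch under stated assumptions: plain Lloyd's k-means in pure Python, toy 2-dimensional points standing in for 128-dimensional SIFT descriptors, and hypothetical function names (`kmeans`, `nearest_word`, `bag_of_words`). A real system would use far more descriptors and a much larger codebook.

```python
import math
import random

def nearest_word(descriptor, codebook):
    """Index of the codebook word closest to the descriptor."""
    return min(range(len(codebook)),
               key=lambda w: math.dist(descriptor, codebook[w]))

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns k cluster centers (the codebook)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_word(p, centers)].append(p)
        for idx, members in enumerate(clusters):
            if members:  # keep the old center if a cluster went empty
                centers[idx] = [sum(dim) / len(members) for dim in zip(*members)]
    return centers

def bag_of_words(descriptors, codebook):
    """Histogram counting how many descriptors fall on each codebook word."""
    histogram = [0] * len(codebook)
    for d in descriptors:
        histogram[nearest_word(d, codebook)] += 1
    return histogram

# Descriptors pooled from all training images (toy data, two obvious clusters).
training_descriptors = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1],
                        [5.0, 5.1], [5.1, 4.9], [4.9, 5.0]]
codebook = kmeans(training_descriptors, k=2)

# Descriptors from one image become a fixed-length histogram vector.
image_descriptors = [[0.1, 0.1], [5.0, 5.0], [4.8, 5.2], [5.2, 5.1]]
histogram = bag_of_words(image_descriptors, codebook)
print(histogram)
```

The histogram length equals the codebook size regardless of how many features an image has, which is what lets a fixed-input classifier consume it.
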
  18. 18. Support Vector Machine (SVM) <ul><li>Very popular and reasonably fast </li></ul><ul><li>Given a set of training vectors and their labels, builds a classifier that predicts a label for any other vector </li></ul><ul><li>(It looks for the hyperplane that maximizes the margin between the classes) </li></ul>
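
To make the margin idea concrete, here is a minimal sketch of a linear soft-margin SVM trained by subgradient descent on the hinge loss. It is an assumption-laden toy, not the SVM implementation used in the talk: the function names, learning rate, regularization strength, and 2-bin histograms are all hypothetical, and a real project would normally rely on an existing SVM library.

```python
def train_linear_svm(vectors, labels, epochs=200, learning_rate=0.01, reg=0.01):
    """Fit weights w and bias b on the regularized hinge loss.

    labels must be +1 or -1. Hyperparameter defaults are assumed values.
    """
    w = [0.0] * len(vectors[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # The regularizer shrinks w, which corresponds to a wider margin.
            w = [wi - learning_rate * reg * wi for wi in w]
            if margin < 1:  # point is inside the margin or misclassified
                w = [wi + learning_rate * y * xi for wi, xi in zip(w, x)]
                b += learning_rate * y
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Toy 2-bin histograms: class +1 is heavy in bin 0, class -1 in bin 1.
train_vectors = [[8, 1], [7, 2], [9, 1], [1, 8], [2, 7], [1, 9]]
train_labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(train_vectors, train_labels)

print(predict(w, b, [6, 3]), predict(w, b, [2, 6]))
```
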
  19. 19. Training Data <ul><li>But where do we get training data? </li></ul><ul><li>Dangers of data sets </li></ul><ul><ul><li>Background class? </li></ul></ul><ul><ul><li>Well framed? </li></ul></ul><ul><ul><li>Sufficient variation? </li></ul></ul>
  20. 20. Bag of Words Classifier <ul><li>Putting it all together </li></ul><ul><ul><li>To train the classifier </li></ul></ul><ul><ul><ul><li>Extract SIFT features from every training image </li></ul></ul></ul><ul><ul><ul><li>Cluster all features into codebook words </li></ul></ul></ul><ul><ul><ul><li>Generate a histogram vector for each image </li></ul></ul></ul><ul><ul><ul><li>Train the SVM on these vectors </li></ul></ul></ul><ul><ul><li>To test an image </li></ul></ul><ul><ul><ul><li>Extract SIFT features </li></ul></ul></ul><ul><ul><ul><li>Generate the vector </li></ul></ul></ul><ul><ul><ul><li>Use the SVM to predict the class </li></ul></ul></ul>
  21. 21. Classifier Results <ul><li>68.9% accurate! </li></ul><ul><ul><li>(Not bad for a couple hours of programming...) </li></ul></ul><ul><li>Confusion matrix: </li></ul>
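
For readers unfamiliar with the terms on the results slide, the sketch below shows how an accuracy figure and a confusion matrix are computed from predictions. The labels are made-up toy data and have nothing to do with the 68.9% result from the talk; the function names are hypothetical.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix

def accuracy(matrix):
    """Fraction of predictions that lie on the diagonal (the correct ones)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Invented example labels, for illustration only.
classes = ["bicycle", "background"]
truth = ["bicycle", "bicycle", "bicycle", "background",
         "background", "background", "background", "bicycle"]
predicted = ["bicycle", "background", "bicycle", "background",
             "bicycle", "background", "background", "bicycle"]

matrix = confusion_matrix(truth, predicted, classes)
print(matrix)            # [[3, 1], [1, 3]]
print(accuracy(matrix))  # 0.75
```

Off-diagonal cells show which classes get mistaken for which, something a single accuracy number hides.
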
  22. 22. Any questions?
  23. 23. Other Machine Learning Systems <ul><li>Randomized trees/forests </li></ul><ul><li>Image: Wikipedia </li></ul>
  24. 24. Other Machine Learning Systems <ul><li>Boosting </li></ul><ul><li>Image: Kihwan Kim </li></ul>
  25. 25. Viola Jones Face Detector <ul><li>Image: Kihwan Kim </li></ul>
  26. 26. Summed Area Table / Integral Image <ul><li>Image: Nvidia </li></ul>
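
A summed area table can be sketched in a few lines. The version below is a minimal pure-Python illustration with hypothetical function names. It pads the table with an extra zero row and column (a common convention, assumed here) so that the sum over any rectangle needs only four lookups, whatever the rectangle's size.

```python
def integral_image(image):
    """Summed area table with an extra zero row and column.

    table[r][c] holds the sum of all pixels in rows 0..r-1 and columns 0..c-1.
    """
    rows, cols = len(image), len(image[0])
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            table[r + 1][c + 1] = (image[r][c] + table[r][c + 1]
                                   + table[r + 1][c] - table[r][c])
    return table

def box_sum(table, top, left, bottom, right):
    """Sum of pixels in rows top..bottom-1 and columns left..right-1."""
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
table = integral_image(image)
total = box_sum(table, 0, 0, 3, 3)         # whole image
center_block = box_sum(table, 1, 1, 3, 3)  # 5 + 6 + 8 + 9
print(total, center_block)                 # 45 28
```

Constant-time box sums are what make rectangle-based features cheap to evaluate at many positions and scales.
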