Iccv2009 recognition and learning object categories p2 c03 - objects and annotations

  1. 1. Pictures and Words<br />
  2. 2. Vision and language in human brain<br />Language<br />Vision<br />Wernicke<br />Area<br />Broca<br />Area<br />PPA<br />LOC<br />V1<br />FFA<br />
  3. 3. Vision and language in human brain<br />figure modified from: http://www.colorado.edu/intphys/Class/IPHY3730<br />
  4. 4. Vision and language in human brain<br />?<br />(Translation: “This is not a pipe.”)<br />figure modified from: http://www.colorado.edu/intphys/Class/IPHY3730<br />
  5. 5.
  6. 6.
  7. 7. What can you see in a glance of a scene?<br />Fei-Fei, Iyer, Koch, Perona, JoV, 2007<br />
  8. 8. PT = 27ms<br />This was a picture with some dark sploches in it. Yeah. . .that's about it. (Subject: KM)<br />PT = 40ms<br />I think I saw two people on a field. (Subject: RW)<br />PT = 67ms<br />Outdoor scene. There were some kind of animals, maybe dogs or horses, in the middle of the picture. It looked like they were running in the middle of a grassy field. (Subject: IV)<br />PT = 107ms<br />two people, whose profile was toward me. looked like they were on a field of some sort and engaged in some sort of sport (their attire suggested soccer, but it looked like there was too much contact for that). (Subject: AI)<br />PT = 500ms<br />Some kind of game or fight. Two groups of two men? The foregound pair looked like one was getting a fist in the face. Outdoors seemed like because i have an impression of grass and maybe lines on the grass? That would be why I think perhaps a game, rough game though, more like rugby than football because they pairs weren't in pads and helmets, though I did get the impression of similar clothing. maybe some trees? in the background. (Subject: SM)<br />Fei-Fei, Iyer, Koch, Perona, JoV, 2007<br />
  9. 9. Section outline<br />Early “pictures and words” work<br />Content-based retrieval<br />Beyond nouns, towards total scene annotation<br />
  10. 10. “Pictures and words”<br />Barnard, Duygulu, de Freitas, Forsyth, Blei, Jordan, Matching words and pictures, JMLR, 2003<br />Duygulu, Barnard, de Freitas, Forsyth, Object Recognition as Machine Translation: Learning a lexicon for a fixed image vocabulary, ECCV, 2002<br />Blei & Jordan, Modeling annotated data, ACM SIGIR, 2003<br />Chang, Goh, Sychay, & Wu, Soft annotation using Bayes point machines, IEEE Transactions on Circuits and Systems for Video Technology, 2003<br />Goh, Chang, & Cheng, Ensemble of SVM-based classifiers for annotation, 2003<br />…<br />
  11. 11. <ul><li>Images are composed of multimodal “concepts”.
  12. 12. Images are clustered based on priors over concepts.
  13. 13. Learning determines localized concept models from global annotations.
  14. 14. Addresses the correspondence problem
  15. 15. One possible assumption: concept models simultaneously generate both a word and a blob </li></ul>sun<br />sun<br />sky<br />water<br />waves<br />Barnard et al. JMLR, 2003<br />Slide courtesy of Kobus Barnard (1 hour ago!)<br />
  16. 16. <ul><li>A generative model for assembling image data sets from multimodal clusters
  17. 17. Choose an image cluster by p(c)
  18. 18. Choose multimodal concept clusters using p(s|c)
  19. 19. From each multimodal cluster, sample blob features from a Gaussian, p(b|s), and a word from a multinomial, p(w|s)
  20. 20. (Skip with some probability to account for mismatched numbers of words and blobs)
  21. 21. For a given correspondence*</li></ul>sun<br />sun<br />sky<br />water<br />waves<br />Barnard et al. JMLR, 2003<br />Slide courtesy of Kobus Barnard (1 hour ago!)<br />
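The generative steps on this slide can be sketched in a few lines of Python. This is a minimal toy, not the model from Barnard et al.: all parameter values, the two-dimensional blob features, the names `P_CLUSTER`, `P_CONCEPT`, `BLOB_MODEL`, `WORD_MODEL`, `SKIP_PROB`, and `sample_image`, and the choice to apply the "skip" step to word emission are assumptions made for illustration.

```python
import random

# Hypothetical toy parameters (not taken from the slides or the paper).
P_CLUSTER = {"beach": 0.5, "field": 0.5}                       # p(c)
P_CONCEPT = {                                                   # p(s|c)
    "beach": {"sun": 0.3, "sky": 0.3, "water": 0.4},
    "field": {"sun": 0.2, "sky": 0.4, "grass": 0.4},
}
BLOB_MODEL = {                                                  # p(b|s): (mean, std) per feature
    "sun": ((0.9, 0.8), (0.05, 0.05)),
    "sky": ((0.3, 0.6), (0.10, 0.10)),
    "water": ((0.2, 0.4), (0.10, 0.10)),
    "grass": ((0.3, 0.9), (0.10, 0.10)),
}
WORD_MODEL = {                                                  # p(w|s)
    "sun": {"sun": 0.9, "sky": 0.1},
    "sky": {"sky": 0.8, "sun": 0.2},
    "water": {"water": 0.7, "waves": 0.3},
    "grass": {"grass": 1.0},
}
SKIP_PROB = 0.2  # assumed chance that a concept emits a blob but no word


def draw(rng, dist):
    """Draw one key from a {key: probability} dictionary."""
    keys = list(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys], k=1)[0]


def sample_image(rng, n_concepts=4):
    """Sample one annotated image: a cluster, a list of blobs, a list of words."""
    cluster = draw(rng, P_CLUSTER)                    # choose an image cluster by p(c)
    blobs, words = [], []
    for _ in range(n_concepts):
        concept = draw(rng, P_CONCEPT[cluster])       # choose a concept by p(s|c)
        mean, std = BLOB_MODEL[concept]
        # Blob features from an axis-aligned Gaussian, p(b|s).
        blobs.append(tuple(rng.gauss(m, s) for m, s in zip(mean, std)))
        # Emit a word from the multinomial p(w|s), unless this step is skipped.
        if rng.random() >= SKIP_PROB:
            words.append(draw(rng, WORD_MODEL[concept]))
    return cluster, blobs, words


if __name__ == "__main__":
    print(sample_image(random.Random(0)))
```

Because some words are skipped, a sampled image can have fewer words than blobs, which is the mismatch the slide mentions.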
  22. 22. Barnard et al. JMLR, 2003<br />
  23. 23. Section outline<br />Early “pictures and words” work<br />Content-based retrieval<br />Beyond nouns, towards total scene annotation<br />
  24. 24. Content-based retrieval<br />Elegance<br />Love<br />Symmetry<br />Flower<br />Petals<br />Tower<br />France<br />Rose<br />Corolla<br />Australian Floribunda Rose<br />Eiffel Tower<br />Paris<br />Slide courtesy of Ritendra Datta, Jia Li, James Z. Wang<br />
  25. 25. Literature – MANY!!!<br />A. W. Smeulders, M. Worring, S. Santini, A. Gupta, R. Jain, Content-Based Image Retrieval at the End of the Early Years, IEEE Trans. Pattern Analysis and Machine Intelligence, 22(12):1349-1380, 2000.<br />R. Datta, D. Joshi, J. Li, and J. Z. Wang, Image Retrieval: Ideas, Influences, and Trends of the New Age, ACM Computing Surveys, vol. 40, no. 2, pp. 5:1-60, 2008.<br />
  26. 26. Try out Alipr (www.alipr.com)<br />
  27. 27. Try out Alipr (www.alipr.com)<br />
  28. 28. Automatic Image Annotation: ALIP<br />Slide courtesy of Ritendra Datta, Jia Li, James Z. Wang<br />
  29. 29. Automatic Image Annotation: ALIP<br />Slide courtesy of Ritendra Datta, Jia Li, James Z. Wang<br />
  30. 30. Automatic Image Annotation: ALIP<br />2D-MHMM: Two-dimensional multi-resolution hidden Markov model<br />Slide courtesy of Ritendra Datta, Jia Li, James Z. Wang<br />
  31. 31. Automatic Image Annotation: ALIP<br />Annotation Process<br /><ul><li>Classification results form the basis
  32. 32. Salient words appearing in the classification results are favored</li></ul>Food, indoor, cuisine, dessert<br />Building, sky, lake, landscape, Europe, tree<br />Snow, animal, wildlife, sky, cloth, ice, people<br />Slide courtesy of Ritendra Datta, Jia Li, James Z. Wang<br />
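One way to picture the annotation process is the sketch below: words are collected from the descriptions of the top-ranked categories, and words that are rare across the whole category dictionary are ranked higher. The ratio-based salience score is a simple stand-in chosen for illustration, not the statistical test ALIP itself uses, and the function `annotate`, its parameters, and the toy category dictionary are all hypothetical.

```python
from collections import Counter


def annotate(ranked_categories, category_words, top_k=3, n_words=3):
    """Pick annotation words from the top-k categories, favoring salient words."""
    # In how many category descriptions each word appears overall.
    global_counts = Counter(
        w for words in category_words.values() for w in set(words)
    )
    # In how many of the top-k classified categories each word appears.
    top_counts = Counter(
        w for cat in ranked_categories[:top_k] for w in set(category_words[cat])
    )

    def sort_key(word):
        # Assumed salience: share of the word's occurrences that fall in the
        # top-k categories; ties go to the more frequent word, then alphabetical.
        salience = top_counts[word] / global_counts[word]
        return (-salience, -top_counts[word], word)

    return sorted(top_counts, key=sort_key)[:n_words]


if __name__ == "__main__":
    # Toy category dictionary (made up for this example).
    category_words = {
        "dessert_table": ["food", "indoor", "dessert"],
        "restaurant": ["food", "indoor", "cuisine"],
        "bakery": ["food", "indoor", "dessert"],
        "lake": ["landscape", "sky", "lake"],
        "city": ["building", "sky", "indoor"],
    }
    ranked = ["dessert_table", "bakery", "restaurant", "city", "lake"]
    print(annotate(ranked, category_words))
```

In the toy data, "indoor" appears in every top-ranked category but also in an unrelated one, so it ranks below "food", "dessert", and "cuisine".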
  33. 33. Section outline<br />Early “pictures and words” work<br />Content-based retrieval<br />Beyond nouns, towards total scene annotation<br />Prepositions<br />A. Gupta and L. S. Davis, Beyond Nouns: Exploiting prepositions and comparative adjectives for learning visual classifiers, ECCV, 2008<br />Objects, scenes, activities<br />L.-J. Li and L. Fei-Fei. What, where and who? Classifying events by scene and object recognition. ICCV, 2007<br />L.-J. Li, R. Socher and L. Fei-Fei. Towards Total Scene Understanding: Classification, Annotation and Segmentation in an Automatic Framework. CVPR, 2009<br />
  35. 35. Gupta & Davis, ECCV, 2008<br />“Beyond nouns”<br />
  36. 36. “Beyond nouns”<br />Gupta & Davis, ECCV, 2008<br />
  37. 37. Gupta & Davis, ECCV, 2008<br />
  38. 38. Section outline<br />Early “pictures and words” work<br />Content-based retrieval<br />Beyond nouns, towards total scene annotation<br />Prepositions<br />A. Gupta and L. S. Davis, Beyond Nouns: Exploiting prepositions and comparative adjectives for learning visual classifiers, ECCV, 2008<br />Objects, scenes, activities<br />L.-J. Li and L. Fei-Fei. What, where and who? Classifying events by scene and object recognition. ICCV, 2007<br />L.-J. Li, R. Socher and L. Fei-Fei. Towards Total Scene Understanding: Classification, Annotation and Segmentation in an Automatic Framework. CVPR, 2009<br />
  39. 39. What, where and who? Classifying events by scene and object recognition<br />L-J Li & L. Fei-Fei, ICCV 2007<br />
  40. 40. scene pathway<br />object pathway<br />event<br />PFC<br />“where” pathway<br />“what” pathway<br />L.-J. Li & L. Fei-Fei ICCV 2007<br />
  41. 41. scene pathway<br />“Polo Field”<br />Fei-Fei & Perona, CVPR, 2005<br />L.-J. Li & L. Fei-Fei ICCV 2007<br />
  42. 42. O = ‘horse’<br />object pathway<br />G. Wang & L. Fei-Fei, CVPR, 2006<br />L.-J. Li, G. Wang & L. Fei-Fei, CVPR, 2007<br />L. Cao & L. Fei-Fei, ICCV, 2007<br />L.-J. Li & L. Fei-Fei ICCV 2007<br />
  43. 43. The 3W stories<br />what<br />who<br />where<br />L.-J. Li & L. Fei-Fei ICCV 2007<br />
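The idea of combining a scene ("where") pathway and an object ("what") pathway to label an event can be illustrated with a deliberately simplified sketch. It treats the scene label and the detected objects as independent given the event, which is a naive-Bayes simplification chosen here for brevity and not the integrative model of Li & Fei-Fei. The probability tables and the names `P_EVENT`, `P_SCENE`, `P_OBJECT`, and `classify_event` are invented for the example.

```python
import math

# Hypothetical probability tables (illustrative values only).
P_EVENT = {"polo": 0.5, "rowing": 0.5}                          # p(event)
P_SCENE = {                                                      # p(scene | event)
    "polo": {"polo field": 0.8, "lake": 0.2},
    "rowing": {"polo field": 0.1, "lake": 0.9},
}
P_OBJECT = {                                                     # p(object | event)
    "polo": {"horse": 0.6, "athlete": 0.3, "boat": 0.1},
    "rowing": {"horse": 0.05, "athlete": 0.35, "boat": 0.6},
}


def classify_event(scene, objects):
    """Return the most likely event and the per-event log scores."""
    scores = {}
    for event, prior in P_EVENT.items():
        # Scene pathway: evidence from the recognized scene category.
        log_score = math.log(prior) + math.log(P_SCENE[event][scene])
        # Object pathway: evidence from each recognized object.
        log_score += sum(math.log(P_OBJECT[event][obj]) for obj in objects)
        scores[event] = log_score
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    print(classify_event("polo field", ["horse", "athlete"])[0])
    print(classify_event("lake", ["boat", "athlete"])[0])
```

Working in log space keeps the product of many small probabilities numerically stable when an image contains many objects.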
  44. 44. Classification<br />Annotation<br />Segmentation<br />[Figure: example image with class: Polo and labels Sky, Tree, Athlete, Horse, Grass, Trees, Saddle.]<br />L-J Li, R. Socher & L. Fei-Fei, CVPR, 2009<br />
  45. 45. Our model: a hierarchical representation of the image and its semantic contents<br />[Figure: Total Scene overview. Stages shown: initialization, learning, and recognition around a generative model; input: noisy images and tags. Example classes and tags: Polo (Athlete, Horse, Grass, Trees, Sky, Saddle); Rock climbing (Athlete, Mountain, Trees, Rock, Sky, Ascent); Sailing (Athlete, Sailboat, Trees, Water, Sky, Wind).]<br />L-J Li, R. Socher & L. Fei-Fei, CVPR, 2009<br />
  46. 46. Our model: a hierarchical representation of the image and its semantic contents<br />[Figure: same Total Scene overview, with the “Generative Model” label repeated.]<br />L-J Li, R. Socher & L. Fei-Fei, CVPR, 2009<br />
  47. 47. The model: a hierarchical representation of the image and its semantic contents<br />[Figure: Total Scene graphical model with visual and text components. Node labels: C, S, O, T, X, R, Z; other labels: Ar, NF, Nr, Nt, D; callouts: “Switch variable” (visible / not visible) and “Connector variable”. Example class: Polo, with tags Athlete, Horse, Grass, Trees, Sky, Saddle.]<br />
  48. 48. Our model: a hierarchical representation of the image and its semantic contents<br />[Figure: same Total Scene overview, with the “initialization”, “Generative Model”, and “Learning” labels repeated.]<br />L-J Li, R. Socher & L. Fei-Fei, CVPR, 2009<br />
  49. 49. Need some good, initial “guesstimate” of O<br />[Figure: Total Scene graphical model (labels C, S, O, T, X, R, Z, Nr, NF, Ar, Nt) with scene/event images from the Internet.]<br />L-J Li, R. Socher & L. Fei-Fei, CVPR, 2009<br />
  50. 50. Auto-semi-supervised learning:<br />Small # of initialized images + Large # of uninitialized images<br />[Figure: scene/event images from the Internet and the Total Scene generative model; example labels: Athlete, Horse, Grass, Tree, Wind, Saddle.]<br />L-J Li, R. Socher & L. Fei-Fei, CVPR, 2009<br />
  51. 51. Our model: a hierarchical representation of the image and its semantic contents<br />[Figure: Total Scene overview. Stages shown: initialization, learning, and recognition around a generative model; input: noisy images and tags. Example classes and tags: Polo (Athlete, Horse, Grass, Trees, Sky, Saddle); Rock climbing (Athlete, Mountain, Trees, Rock, Sky, Ascent); Sailing (Athlete, Sailboat, Trees, Water, Sky, Wind).]<br />L-J Li, R. Socher & L. Fei-Fei, CVPR, 2009<br />
  52. 52. 8 Event/Scene Classes<br />Rock climbing<br />Badminton<br />Bocce<br />Rowing<br />Croquet<br />Sailing<br />Snowboarding<br />Polo<br />
  53. 53. Some sample results<br />Total Scene<br />Class: Croquet<br />Class: Bocce<br />Class: Snowboarding<br />Class: Polo<br />Class: Sailing<br />Class: Badminton<br />Class: Rock Climbing<br />Class: Rowing<br />L-J Li, R. Socher & L. Fei-Fei, CVPR, 2009<br />
  54. 54. PT = 27ms<br />This was a picture with some dark sploches in it. Yeah. . .that's about it. (Subject: KM)<br />PT = 40ms<br />I think I saw two people on a field. (Subject: RW)<br />PT = 67ms<br />Outdoor scene. There were some kind of animals, maybe dogs or horses, in the middle of the picture. It looked like they were running in the middle of a grassy field. (Subject: IV)<br />PT = 107ms<br />two people, whose profile was toward me. looked like they were on a field of some sort and engaged in some sort of sport (their attire suggested soccer, but it looked like there was too much contact for that). (Subject: AI)<br />PT = 500ms<br />Some kind of game or fight. Two groups of two men? The foregound pair looked like one was getting a fist in the face. Outdoors seemed like because i have an impression of grass and maybe lines on the grass? That would be why I think perhaps a game, rough game though, more like rugby than football because they pairs weren't in pads and helmets, though I did get the impression of similar clothing. maybe some trees? in the background. (Subject: SM)<br />Fei-Fei, Iyer, Koch, Perona, JoV, 2007<br />