Contenxt 100407

In computer vision, context has been mostly ignored in the last two decades. We show that in understanding images, context plays a more significant role than content.

Published in: Education

  1. 1. Contenxt: Bridging the Semantic Gap Ramesh Jain (with Pinaki Sinha and other collaborators) Department of Computer Science University of California, Irvine jain@ics.uci.edu © Ramesh Jain
  2. 2. Football Highlight System: Automatic Segmentation 15 College teams All games – 4 cameras 30 minutes after the game © Ramesh Jain
  3. 3. Find Mubarak Shah © Ramesh Jain
  4. 4. Image Search: Ramesh Jain © Ramesh Jain
  5. 5. Gives some details © Ramesh Jain
  6. 6. Tells me who may not be the ‘Real’ © Ramesh Jain
  7. 7. Finds people who are my friends © Ramesh Jain
  8. 8. Image Search: Finds activities © Ramesh Jain
  9. 9. My current research   EventWeb   Connecting and accessing Events   From Twitter, Facebook   From Web cams, Planetary Skin, …   Connecting environments   Personal Media Management   Images, Video, Text, …   Doing Computer Vision Correctly © Ramesh Jain
  10. 10. Computer Vision   Computer vision is the science and technology of machines that see.   As a scientific discipline, computer vision is concerned with the theory behind artificial systems that extract information from images. From Wikipedia. © Ramesh Jain
  11. 11. How do you Search for Images?   Use a Content-based Image retrieval engine from XYZ University?   Use a Content Based Image Search Engine from a company?   Is there any?   I tried to do one in 1994 and built Virage as a result – but …..   Or do you use just a ‘text’ search engine? © Ramesh Jain
  12. 12. Text Search Engines   How good are text search engines in object recognition?   Let's look at some real working systems by searching for people here. © Ramesh Jain
  13. 13. Disruptive Stages in Computing:1 Data Data: Numbers, Text, (Computation) Statistics, Sensors (Video) © Ramesh Jain
  14. 14. Computing 1: Data   Mainframe and workstations   Main applications:   Scientific and engineering   Business   Users:   Sophisticated   Expected to be trained   Dominant Technology   Computing © Ramesh Jain
  15. 15. Disruptive Stages in Computing:2 Information Information: Search, Specialized sources (Communication) Data Data: Numbers, Text, (Computation) Statistics, Sensors (Video) © Ramesh Jain
  16. 16. Computing 2: Information   PC and Internet   Main applications:   Information   Communication   Users:   Common people in ‘developed world’   Easy access using keyboards   Dominant Technology   Authoring tools   Access mechanisms   Sharing © Ramesh Jain
  17. 17. Disruptive Stages in Computing:3 Experience Experience: What Next? Direct observation or (Insights) participation Information Information: Search, Specialized sources (Communication) Data Data: Numbers, Text, (Computation) Statistics, Sensors (Video) © Ramesh Jain
  18. 18. Computing 3: Experience   Experiential devices: Mobile phones   Main applications:   Experience management   Experiential communication   Users:   Humans   No language issues   Dominant Technology   Sensor understanding   Vision and audio will be dominant © Ramesh Jain
  19. 19. © Ramesh Jain
  20. 20. © Ramesh Jain
  21. 21. The Challenge Connecting © Ramesh Jain
  22. 22. Transformations   Lists, Arrays, Documents, Images …   Alphanumeric Characters   Bits and Bytes © Ramesh Jain
  23. 23. Semantic Gap The semantic gap is the lack of coincidence between the information that one can extract from the (visual) data and the interpretation that the same data have for a user in a given situation. A linguistic description is almost always contextual, whereas an (image) may live by itself. Content-Based Image Retrieval at the End of the Early Years Found in: IEEE Transactions on Pattern Analysis and Machine Intelligence Arnold Smeulders et al., December 2000 © Ramesh Jain
  24. 24. Data Information Experience © Ramesh Jain
  25. 25. Rorschach Test   Psychologists use this test to examine a person's personality characteristics and emotional functioning © Ramesh Jain
  26. 26. Falling Tree and George Berkeley   "If a tree falls in a forest and no one is around to hear it, does it make a sound?"   "No. Sound is the sensation excited in the ear when the air or other medium is set in motion."   Observation, Reality, and Perception. © Ramesh Jain
  27. 27. Context   - text surrounding word or passage: the words, phrases, or passages that come before and after a particular word or pas…   - surrounding conditions: the circumstances or events that form the environment within which something exists or takes …   - data transfer structure: a data structure used to transfer electronic data to and from a business management system © Ramesh Jain
  28. 28. Content   - amount of something in container: the amount of something contained in something else   - subject matter: the various issues, topics, or questions dealt with in speech, discussion, or a piece of writing   - meaning or message: the meaning or message contained in a creative work, as distinct from its appearance, form, or style © Ramesh Jain
  29. 29. The Story of Computer Vision Marvin Minsky and the summer project to solve computer vision. The Psychology of Computer Vision (McGraw-Hill Computer Science Series), 1975. © Ramesh Jain
  30. 30. D.L. Waltz, Understanding Line Drawings of Scenes with Shadows. © Ramesh Jain
  31. 31. MSYS: A System for Reasoning about Scenes Harry Barrow and Martin Tenenbaum April 1976 © Ramesh Jain
  32. 32. MSYS: Relational Constraints © Ramesh Jain
  33. 33. Relaxation labelling algorithms - a review J Kittler and J Illingworth Image and Vision Computing Volume 3, Issue 4, November 1985, Pages 206-216 Abstract An important research topic in image processing and image interpretation methodology is the development of methods to incorporate contextual information into the interpretation of objects. Over the last decade, relaxation labelling has been a useful and much studied approach to this problem. It is an attractive technique because it is highly parallel, involving the propagation of local information via iterative processing. The paper surveys the literature pertaining to relaxation labelling and highlights the important theoretical advances and the interesting applications for which it has proven useful. © Ramesh Jain
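
To make "propagation of local information via iterative processing" concrete, here is a minimal sketch of one classic relaxation labelling update, the nonlinear probabilistic rule usually attributed to Rosenfeld, Hummel and Zucker. The function name `relax`, the compatibility table, and the toy labels are assumptions made for illustration; none of them comes from the talk or the reviewed paper.

```python
def relax(probs, neighbors, compat, iterations=5):
    """Iteratively update label probabilities using neighbour support.

    probs:     list of dicts, label -> probability, one dict per object
    neighbors: list of lists of neighbour indices, one list per object
    compat:    dict, (label, neighbour_label) -> compatibility in [-1, 1]
    """
    for _ in range(iterations):
        updated = []
        for i, p_i in enumerate(probs):
            raw = {}
            for label, p in p_i.items():
                # Average support for `label` from the neighbours' current beliefs.
                support = 0.0
                for j in neighbors[i]:
                    support += sum(compat.get((label, other), 0.0) * q
                                   for other, q in probs[j].items())
                if neighbors[i]:
                    support /= len(neighbors[i])
                raw[label] = p * (1.0 + support)
            total = sum(raw.values())
            # Keep the old beliefs if every label was fully suppressed.
            updated.append({k: v / total for k, v in raw.items()} if total > 0 else dict(p_i))
        probs = updated  # all objects are updated in parallel
    return probs


# Toy scene (hypothetical): object 0 is ambiguous, object 1 is probably a road.
compat = {("car", "road"): 1.0, ("road", "car"): 1.0,
          ("boat", "water"): 1.0, ("water", "boat"): 1.0,
          ("car", "water"): -1.0, ("water", "car"): -1.0,
          ("boat", "road"): -1.0, ("road", "boat"): -1.0}
initial = [{"car": 0.5, "boat": 0.5}, {"road": 0.9, "water": 0.1}]
labels = relax(initial, neighbors=[[1], [0]], compat=compat)
```

In this toy scene the ambiguous object drifts toward "car" because its neighbour is most likely a road, which is the kind of contextual disambiguation the abstract refers to.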
  34. 34. Serge Belongie and Co-Researchers   semantic context (probability),   spatial context (position)   and scale context (size). © Ramesh Jain
  35. 35. Modeling the World   Data (Semantic Web)   Objects (Search Companies, …)   Events (Relationships among objects and attributes) Both Objects and Events are essential to model the world. © Ramesh Jain
  36. 36. Events   Take place in the real world.   Captured using different sensory mechanisms.   Each sensor captures only a limited aspect of the event.   Are used to understand a Situation. © Ramesh Jain
  37. 37. What is in an Event? © Ramesh Jain
  38. 38. Events 1- dimensional Space Time © Ramesh Jain
  39. 39. History: Gopher to Google   We had Internet.   Lots of computers were connected to each other.   Computers had files on them.   We had GOPHER and other FTP mechanisms. © Ramesh Jain
  40. 40. Tim Berners-Lee thought:   Suppose all the information stored on computers everywhere were linked.   Suppose I could program my computer to create a space in which anything could be linked to anything. Others – including Bush -- had that idea earlier but the technology was not ready. © Ramesh Jain
  41. 41. That resulted in the Web   DocumentWeb   Each node is a ‘Page’ or a document.   Pages are linked through explicit referential links © Ramesh Jain
  42. 42. Then Came Google, Facebook, Twitter   Search   Maps   …   Social Network   Events   Twitter   Status Updates   Eventful © Ramesh Jain
  43. 43. Evolution of Search   Alphanumeric structured data: Databases   Information Retrieval   Search   Multimedia Search   Real Time Search (Event Search)   Will lead to identifying situations © Ramesh Jain
  44. 44. Continuing the Evolution of the Web   Consider a Web in which each node   Is an event   Has informational as well as experiential data   Is connected to other nodes using   Referential links   Structural links   Relational links   Causal links   Explicit links can be created by anybody   This EventWeb is connected to other Webs. © Ramesh Jain
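
One way to picture the EventWeb described on this slide is as a graph whose nodes are events and whose edges carry one of the four link types. The sketch below is only an illustration under that assumption; the class names `Event`, `EventWeb`, and `LinkType`, and the example events, are hypothetical and not part of any published EventWeb implementation.

```python
from dataclasses import dataclass, field
from enum import Enum


class LinkType(Enum):
    REFERENTIAL = "referential"
    STRUCTURAL = "structural"
    RELATIONAL = "relational"
    CAUSAL = "causal"


@dataclass
class Event:
    event_id: str
    info: dict = field(default_factory=dict)   # informational data
    media: list = field(default_factory=list)  # experiential data (photos, audio, ...)


class EventWeb:
    def __init__(self):
        self.events = {}
        self.links = []  # (source_id, target_id, LinkType) triples

    def add_event(self, event):
        self.events[event.event_id] = event

    def link(self, source_id, target_id, kind):
        # Explicit links can be created by anybody, but only between known events.
        if source_id not in self.events or target_id not in self.events:
            raise KeyError("both events must be added before linking")
        self.links.append((source_id, target_id, kind))

    def linked_from(self, source_id, kind=None):
        """Return ids of events reachable from source_id, optionally by link type."""
        return [t for s, t, k in self.links
                if s == source_id and (kind is None or k == kind)]


web = EventWeb()
web.add_event(Event("rain", {"what": "heavy rain"}, ["webcam.jpg"]))
web.add_event(Event("game", {"what": "football game"}, ["broadcast.mp4"]))
web.add_event(Event("delay", {"what": "kick-off delayed"}))
web.link("rain", "delay", LinkType.CAUSAL)      # the rain caused the delay
web.link("game", "delay", LinkType.STRUCTURAL)  # the delay is part of the game
```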
  45. 45. Connectors   My 5 Senses are connectors between ‘me’ and the world.   We use our sensors (vision, audio, …) to experience the world.   Sensors could be the interface between the Cyberspace and the Real World.   Sensors are placed for ‘detecting events’.   How do you decide what sensors to put at any place?   Would you put a sensor if nothing interesting ever happens at a place? © Ramesh Jain
  46. 46. From Atomic Events to Composite Events   Spatial and Temporal aggregation   Assimilation   Composition   Using sophisticated models   Ontological models could be used   May include causality © Ramesh Jain
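
As a minimal sketch of spatial and temporal aggregation, the code below merges consecutive atomic events into one composite event while they stay close in both time and space. The function name `aggregate`, the thresholds, and the planar coordinates are assumptions chosen for illustration; the slide itself points to more sophisticated (for example ontological or causal) models.

```python
import math


def aggregate(atomic_events, max_gap=3600.0, max_dist=1.0):
    """Group atomic events into composite events.

    atomic_events: list of (timestamp_seconds, (x, y)) tuples
    max_gap:       largest time gap (seconds) allowed inside one composite event
    max_dist:      largest jump in position allowed inside one composite event
    """
    composites = []
    for ts, pos in sorted(atomic_events):
        if composites:
            last_ts, last_pos = composites[-1][-1]
            close_in_time = ts - last_ts <= max_gap
            close_in_space = math.dist(pos, last_pos) <= max_dist
            if close_in_time and close_in_space:
                composites[-1].append((ts, pos))
                continue
        # Otherwise start a new composite event.
        composites.append([(ts, pos)])
    return composites


photos = [(0, (0.0, 0.0)), (600, (0.1, 0.0)), (900, (0.2, 0.1)),
          (20000, (0.2, 0.1)), (20300, (5.0, 5.0))]
groups = aggregate(photos)
```

Here the first three photos form one composite event, the fourth starts a new one because of the long time gap, and the fifth starts another because of the jump in location.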
  47. 47. EventWeb 1- dimensional Space Time © Ramesh Jain
  48. 48. Types of Context   Relationships among different objects, and even among their subparts, in the real world   Environmental parameters of the digital device at the time the photo is taken   Knowledge about the person taking the photo, and even about the person interpreting it   Real-world situation in which the data is interpreted © Ramesh Jain
  49. 49. Context Starts much Before the Photo is Taken   Where   When   Why   Who (Photographer)   Which device   Parameters of the device © Ramesh Jain
  50. 50. Modern Cameras   Are more than ‘Camera Obscura’: They capture an event.   Many sensors capture scene context and store it along with intensity values.   EXIF data is all metadata related to the Event. Exposure Time Aperture Diameter Flash Metering Mode ISO Ratings Focal Length Time Location (soon) Face © Ramesh Jain
  51. 51. Sony CyberShot DSC-T2 Touchscreen 8MP Digital Camera with Smile Detection © Ramesh Jain
  52. 52. Information in a Digital Photo Voice Tags, Preset Modes, Ontology etc Latitude, Longitude Exposure Time, Focal Length, Aperture, Flash, ISO Ratings Date, Time, Time Zone © Ramesh Jain
  53. 53. Experiential Media Management Environment   Event-based   Should be able to deal with ‘multimedia’   Photos   Audio   Video   Text   Information and data   …   Searching based on events and media.   Storytelling © Ramesh Jain
  54. 54. EMME Event Cycle Event Atomic Event Presentation/ Entry Navigation Story Telling EXIF Explore Features Search Tags/ Event Base Context User Photo Annot- stream ations Event Segment. Ontology Event Grouping, Linking, Assimilation © Ramesh Jain
  55. 55. Using EMME   Searching for photo   ACM MM 2009   Creating Albums:   Professional   Family   Tourism   Telling stories   What did I do in Beijing?   Scenario: In December 2009, I have 20,000 pictures taken in 2008. How do I (semi-automatically) select 25 to send to   My mother   The uncle that I hate   My personal friend   My professional friend   … © Ramesh Jain
  56. 56. Content Contenxt Context   Contenxt = Content + Context   Context is as powerful as content, possibly more so, in understanding audio-visual information © Ramesh Jain
  57. 57. Examples of Photos from the Unsupervised Clusters: High Exposure Time, Small Aperture © Ramesh Jain
  58. 58. Examples of Photos from the Unsupervised Clusters: Low Aperture (High DOF), Low FL (Wide Angle) © Ramesh Jain
  59. 59. Examples of Photos from the Unsupervised Clusters: High Aperture (Low DOF), High FL (Telephoto) © Ramesh Jain
  60. 60. Examples of Photos from the Unsupervised Clusters: Photos with Flash: Indoor shots © Ramesh Jain
  61. 61. Examples of Photos from the Unsupervised Clusters: Photos with Flash: Darker Outdoors © Ramesh Jain
  62. 62. Photos can be tagged using only EXIF! © Ramesh Jain
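
The clusters on the preceding slides were found by unsupervised clustering of EXIF values. As a much simpler stand-in, the sketch below tags a photo with hand-written rules over the same optical parameters. The `OpticalContext` record, the `guess_tags` function, and every threshold are hypothetical choices for illustration, not the method or values used in the talk. Aperture is expressed as an f-number, so a high f-number means a small aperture and a deep depth of field.

```python
from dataclasses import dataclass


@dataclass
class OpticalContext:
    exposure_time_s: float   # EXIF exposure time, in seconds
    f_number: float          # EXIF aperture as an f-number
    focal_length_mm: float   # EXIF focal length
    flash_fired: bool        # EXIF flash status


def guess_tags(ctx):
    """Return coarse tags using optical metadata only (thresholds are illustrative)."""
    tags = []
    if ctx.flash_fired:
        tags.append("flash: indoor or dark outdoor scene")
    if ctx.exposure_time_s >= 0.5 and ctx.f_number >= 8:
        tags.append("long exposure, small aperture")
    if ctx.f_number >= 8 and ctx.focal_length_mm <= 35:
        tags.append("wide angle, deep depth of field")
    if ctx.f_number <= 4 and ctx.focal_length_mm >= 100:
        tags.append("telephoto, shallow depth of field")
    return tags


wildlife_like = guess_tags(OpticalContext(1 / 60, 2.8, 200.0, False))
landscape_like = guess_tags(OpticalContext(1 / 250, 11.0, 24.0, False))
```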
  63. 63. Guess the Tags!! Using Image Features Only: Scenery, City Streets, Illuminations, People Posing for Photo, Wildlife. Using Optical Parameters: Single Person Indoors, Portraits, Party Indoors, People at Dinner. © Ramesh Jain
  64. 64. Guess The Tags!! Predicted Tags: Using Image Features Only: Scenery City Streets People Posing Outdoors Group Photo Indoors Wildlife Using Optical Metadata and Thumbnail Features: Group Photo Indoors Single Person Indoors Indoor Party Confusing Indoor Artifact Background !! Illuminations © Ramesh Jain
  65. 65. Automatic Annotation   Use both Content and Optical Context   How to Combine them?   Is the Optical Context Really Useful for Annotation?   What should be the nature of annotations?   Grass, sky, …   People, animals, … © Ramesh Jain
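
One possible answer to "How to combine them?" is late fusion: run a content-based classifier and a context-based classifier separately, then blend their per-tag scores. The sketch below assumes both classifiers already produce scores in [0, 1]; the function `fuse_scores`, the weight, and the example scores are hypothetical and are not results from the talk.

```python
def fuse_scores(content_scores, context_scores, context_weight=0.5):
    """Blend per-tag scores from a content model and a context model.

    Tags missing from one model are treated as having score 0.0 there.
    """
    fused = {}
    for tag in set(content_scores) | set(context_scores):
        c = content_scores.get(tag, 0.0)
        x = context_scores.get(tag, 0.0)
        fused[tag] = (1.0 - context_weight) * c + context_weight * x
    return fused


# Made-up scores: image features lean toward "scenery", optical context disagrees.
content = {"scenery": 0.6, "group photo indoors": 0.4}
context = {"scenery": 0.1, "group photo indoors": 0.9}
fused = fuse_scores(content, context)
best_tag = max(fused, key=fused.get)
```

Varying `context_weight` between 0 and 1 is one simple way to probe how much the optical context actually contributes to annotation.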
  66. 66. More on EXIF-Related Experiments for Photo Tagging   Build models separately for Point-and-Shoot vs SLR cameras since their optical parameters vary a lot.   Do rigorous experiments using the same dataset (NUS WIDE or MIR Flickr) to find how content based classifiers compare with context based classifiers.   How much do we gain by including both? © Ramesh Jain
  67. 67. Personal-Photo-EventWeb © Ramesh Jain
  68. 68. Singapore – Outdoor -- People © Ramesh Jain
  69. 69. People-No Face - Outdoor © Ramesh Jain
  70. 70. Sharing Photos   Taking photos is (almost) zero cost.   People now ‘Shoot first – see later’.   Let me share 344 photos that I took yesterday with you.   Here   On Flickr   On Facebook   Tweeting cameras This is a serious problem now. Today. © Ramesh Jain
  71. 71. I want to share, but …   Flickr Problem   Facebook © Ramesh Jain
  72. 72. Our Solution: Photo Summarization   Many TYPES of Summaries to choose from:   Time/ Face Based   Image Feature Based   Applications   Sharing with friends without making them enemies   Uploading to your favorite sites   Selecting exemplar photos for printing   Refreshing your memory   Photo frames   Soon will be available on your camera. © Ramesh Jain
  73. 73. Technical Specifications:   Uses and extends state of art   EXIF   GIST Features   Faces   Color Histograms   Affinity Propagation Algorithm   Performance: Great!   Very Intuitive   Very fast   Human in the Loop: Fine Tuning   We believe – You are the BOSS © Ramesh Jain
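
The slide names Affinity Propagation as the algorithm that picks exemplar photos. As a deliberately simpler stand-in (not Affinity Propagation itself), the sketch below selects exemplars by greedy farthest-first selection over color histograms, just to show the idea of choosing a few representative photos from many. The functions `histogram_distance` and `summarize` and the tiny 3-bin histograms are assumptions for illustration.

```python
def histogram_distance(h1, h2):
    """L1 distance between two normalised histograms of equal length."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def summarize(histograms, k):
    """Pick up to k exemplar indices by greedy farthest-first selection."""
    if not histograms or k <= 0:
        return []
    chosen = [0]  # start from the first photo (an arbitrary choice)
    while len(chosen) < min(k, len(histograms)):
        # Add the photo whose nearest already-chosen exemplar is farthest away.
        best = max(
            (i for i in range(len(histograms)) if i not in chosen),
            key=lambda i: min(histogram_distance(histograms[i], histograms[c])
                              for c in chosen),
        )
        chosen.append(best)
    return chosen


# Toy 3-bin color histograms for five photos.
hists = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.0, 0.9, 0.1],
    [0.1, 0.7, 0.2],
    [0.1, 0.1, 0.8],
]
summary = summarize(hists, 3)
```

A human-in-the-loop step, as the slide suggests, could then let the user swap any chosen exemplar for a neighbouring photo.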
  74. 74. Photos Summarization © Ramesh Jain
  75. 75. Original Data Set © Ramesh Jain
  76. 76. Photo-Summarization using content © Ramesh Jain
  77. 77. Photo-Summarization using Faces © Ramesh Jain
  78. 78. Using Contenxt to find Unique People in Photostreams from Multiple People in an Event © Ramesh Jain
  79. 79. Using Clothing + Face Feature (Contenxt)   Step 1: Detect Faces Across All Photostreams   Step 2: Detect Clothing Across All Photostreams   Step 3: Cluster Clothing Based on Color   Step 4: Find Unique Faces within each Clothing Cluster   Step 5: Iterate through 3-4 by refining the parameters to get a unique set of people. © Ramesh Jain
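
Steps 3 and 4 above can be sketched as two nested clustering passes: first group detections by clothing color, then group faces inside each clothing cluster. The sketch assumes Steps 1 and 2 have already produced a clothing color and a face feature vector per detection. The simple threshold-based `greedy_cluster`, the feature values, and both thresholds are hypothetical; the clustering method actually used in the talk is not stated on the slide.

```python
import math


def greedy_cluster(items, key, threshold):
    """Put each item in the first cluster whose first member is within threshold."""
    clusters = []
    for item in items:
        for cluster in clusters:
            if math.dist(key(item), key(cluster[0])) <= threshold:
                cluster.append(item)
                break
        else:
            clusters.append([item])  # no close cluster found: start a new one
    return clusters


def unique_people(detections, color_threshold=60.0, face_threshold=0.5):
    """Step 3: cluster by clothing color. Step 4: split each cluster by face."""
    people = []
    by_clothing = greedy_cluster(detections, lambda d: d["clothing_rgb"], color_threshold)
    for clothing_cluster in by_clothing:
        people.extend(greedy_cluster(clothing_cluster, lambda d: d["face"], face_threshold))
    return people


# Made-up detections from three photostreams (a, b, c) at one event.
detections = [
    {"photo": "a1", "clothing_rgb": (200, 30, 30), "face": (0.10, 0.90)},
    {"photo": "b4", "clothing_rgb": (210, 40, 25), "face": (0.12, 0.88)},
    {"photo": "a2", "clothing_rgb": (205, 35, 35), "face": (0.80, 0.20)},
    {"photo": "c7", "clothing_rgb": (20, 20, 220), "face": (0.10, 0.90)},
]
people = unique_people(detections)
```

Step 5 would correspond to re-running `unique_people` with adjusted thresholds until the set of people looks stable; that loop is left out of this sketch.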
  80. 80. Clothing Cluster 1 with corresponding Faces © Ramesh Jain
  81. 81. Unique Faces in Cluster 1: (each row is one person) © Ramesh Jain
  82. 82. Clothing Cluster 2 with corresponding Faces © Ramesh Jain
  83. 83. Unique Faces in Cluster 2: (each row is one person) © Ramesh Jain
  84. 84. Clothing Cluster 3 with corresponding Faces © Ramesh Jain
  85. 85. Unique Faces in Cluster 3: (each row is one person) © Ramesh Jain
  86. 86. Conclusions and Future research   Content (data) is important for computer vision.   Context is more important than content for solving real (and hard) problems in vision.   Real success is only possible by using ConteNXt. © Ramesh Jain
  87. 87. Thanks. For more information, jain@ics.uci.edu ? © Ramesh Jain
