Contenxt 100407

In computer vision, context has been largely ignored over the last two decades. We show that, in understanding images, context plays a more significant role than content.


    Contenxt 100407: Presentation Transcript

    • Contenxt: Bridging the Semantic Gap Ramesh Jain (with Pinaki Sinha and other collaborators) Department of Computer Science University of California, Irvine jain@ics.uci.edu © Ramesh Jain
    • Football Highlight System: Automatic Segmentation 15 College teams All games – 4 cameras 30 minutes after the game © Ramesh Jain
    • Find Mubarak Shah © Ramesh Jain
    • Image Search: Ramesh Jain © Ramesh Jain
    • Gives some details © Ramesh Jain
    • Tells me who may not be the ‘Real’ © Ramesh Jain
    • Finds people who are my friends © Ramesh Jain
    • Image Search: Finds activities © Ramesh Jain
    • My current research   EventWeb   Connecting and accessing Events   From Twitter, Facebook   From Web cams, Planetary Skin, …   Connecting environments   Personal Media Management   Images, Video, Text, …   Doing Computer Vision Correctly © Ramesh Jain
    • Computer Vision   Computer vision is the science and technology of machines that see.   As a scientific discipline, computer vision is concerned with the theory behind artificial systems that extract information from images. From Wikipedia. © Ramesh Jain
    • How do you Search for Images?   Use a Content-Based Image Retrieval engine from XYZ University?   Use a Content-Based Image Search Engine from a company?   Is there any?   I tried to do one in 1994 and built Virage as a result – but …   Or do you just use a ‘text’ search engine? © Ramesh Jain
    • Text Search Engines   How good are text search engines at object recognition?   Let's look at some real working systems by searching for people here. © Ramesh Jain
    • Disruptive Stages in Computing:1 Data Data: Numbers, Text, (Computation) Statistics, Sensors (Video) © Ramesh Jain
    • Computing 1: Data   Mainframe and workstations   Main applications:   Scientific and engineering   Business   Users:   Sophisticated   Expected to be trained   Dominant Technology   Computing © Ramesh Jain
    • Disruptive Stages in Computing:2 Information Information: Search, Specialized sources (Communication) Data Data: Numbers, Text, (Computation) Statistics, Sensors (Video) © Ramesh Jain
    • Computing 2: Information   PC and Internet   Main applications:   Information   Communication   Users:   Common people in ‘developed world’   Easy access using keyboards   Dominant Technology   Authoring tools   Access mechanisms   Sharing © Ramesh Jain
    • Disruptive Stages in Computing:3 Experience Experience: What Next? Direct observation or (Insights) participation Information Information: Search, Specialized sources (Communication) Data Data: Numbers, Text, (Computation) Statistics, Sensors (Video) © Ramesh Jain
    • Computing 3: Experience   Experiential devices: Mobile phones   Main applications:   Experience management   Experiential communication   Users:   Humans   No language issues   Dominant Technology   Sensor understanding   Vision and audio will be dominant © Ramesh Jain
    • © Ramesh Jain
    • © Ramesh Jain
    • The Challenge Connecting © Ramesh Jain
    • Transformations Lists, Arrays, Documents, Images … Alphanumeric Characters Bits and Bytes © Ramesh Jain
    • Semantic Gap The semantic gap is the lack of coincidence between the information that one can extract from the (visual) data and the interpretation that the same data have for a user in a given situation. A linguistic description is almost always contextual, whereas an image may live by itself. Content-Based Image Retrieval at the End of the Early Years, IEEE Transactions on Pattern Analysis and Machine Intelligence, Arnold Smeulders et al., December 2000 © Ramesh Jain
    • Data Information Experience © Ramesh Jain
    • Rorschach Test   Psychologists use this test to examine a person's personality characteristics and emotional functioning © Ramesh Jain
    • Falling Tree and George Berkeley   "If a tree falls in a forest and no one is around to hear it, does it make a sound?"   "No. Sound is the sensation excited in the ear when the air or other medium is set in motion."   Observation, Reality, and Perception. © Ramesh Jain
    • Context   - text surrounding word or passage: the words, phrases, or passages that come before and after a particular word or pas…   - surrounding conditions: the circumstances or events that form the environment within which something exists or takes …   - data transfer structure: a data structure used to transfer electronic data to and from a business management system © Ramesh Jain
    • Content   - amount of something in container: the amount of something contained in something else   - subject matter: the various issues, topics, or questions dealt with in speech, discussion, or a piece of writing   - meaning or message: the meaning or message contained in a creative work, as distinct from its appearance, form, or style © Ramesh Jain
    • The Story of Computer Vision Marvin Minsky and the summer project to solve computer vision. The Psychology of Computer Vision (McGraw-Hill Computer Science Series), 1975. © Ramesh Jain
    • D.L. Waltz, Understanding Line Drawings of Scenes with Shadows. © Ramesh Jain
    • MSYS: A System for Reasoning about Scenes Harry Barrow and Martin Tenenbaum April 1976 © Ramesh Jain
    • MSYS: Relational Constraints © Ramesh Jain
    • Relaxation labelling algorithms — a review J Kittler and J Illingworth Image and Vision Computing Volume 3, Issue 4, November 1985, Pages 206-216 Abstract An important research topic in image processing and image interpretation methodology is the development of methods to incorporate contextual information into the interpretation of objects. Over the last decade, relaxation labelling has been a useful and much studied approach to this problem. It is an attractive technique because it is highly parallel, involving the propagation of local information via iterative processing. The paper surveys the literature pertaining to relaxation labelling and highlights the important theoretical advances and the interesting applications for which it has proven useful. © Ramesh Jain
    • Serge Belongie and Co-Researchers   semantic context (probability),   spatial context (position)   and scale context (size). © Ramesh Jain
    • Modeling the World   Data (Semantic Web)   Objects (Search Companies, …)   Events (Relationships among objects and attributes) Both Objects and Events are essential to model the world. © Ramesh Jain
    • Events   Take place in the real world.   Captured using different sensory mechanism.   Each sensor captures only a limited aspect of the event.   Are used to understand a Situation. © Ramesh Jain
    • What is in an Event? © Ramesh Jain
    • Events 1- dimensional Space Time © Ramesh Jain
    • History: Gopher to Google   We had Internet.   Lots of computers were connected to each other.   Computers had files on them.   We had GOPHER and other FTP mechanisms. © Ramesh Jain
    • Tim Berners-Lee thought:   Suppose all the information stored on computers everywhere were linked.   Suppose I could program my computer to create a space in which anything could be linked to anything. Others – including Bush -- had that idea earlier but the technology was not ready. © Ramesh Jain
    • That resulted in the Web   DocumentWeb   Each node is a ‘Page’ or a document.   Pages are linked through explicit referential links © Ramesh Jain
    • Then Came Google, Facebook, Twitter   Search   Maps   …   Social Network   Events   Twitter   Status Updates   Eventful © Ramesh Jain
    • Evolution of Search   Alphanumeric structured data: Databases   Information Retrieval   Search   Multimedia Search   Real Time Search (Event Search)   Will lead to identifying situations © Ramesh Jain
    • Continuing the Evolution of the Web   Consider a Web in which each node   Is an event   Has informational as well as experiential data   Is connected to other nodes using   Referential links   Structural links   Relational links   Causal links   Explicit links can be created by anybody   This EventWeb is connected to other Webs. © Ramesh Jain
    • Connectors   My 5 Senses are connectors between ‘me’ and the world.   We use our sensors (vision, audio, …) to experience the world.   Sensors could be the interface between the Cyberspace and the Real World.   Sensors are placed for ‘detecting events’.   How do you decide what sensors to put at any place?   Would you put a sensor if nothing interesting ever happens at a place? © Ramesh Jain
    • From Atomic Events to Composite Events   Spatial and temporal aggregation   Assimilation   Composition   Using sophisticated models   Ontological models could be used   May include causality © Ramesh Jain
    • EventWeb 1- dimensional Space Time © Ramesh Jain
    • Types of Context   Relationships among different objects, and even their subparts, in the real world   Environmental parameters of the digital devices at the time the photo is taken   Knowledge about the person taking the photo, and even of the person interpreting the photo   The real-world situation in which the data is interpreted © Ramesh Jain
    • Context Starts much Before the Photo is Taken   Where   When   Why   Who (Photographer)   Which device   Parameters of the device © Ramesh Jain
    • Modern Cameras   Are more than ‘Camera Obscura’: They capture an event.   Many sensors capture scene context and store it along with intensity values.   EXIF data is all metadata related to the Event. Exposure Time Aperture Diameter Flash Metering Mode ISO Ratings Focal Length Time Location (soon) Face © Ramesh Jain
    • Sony CyberShot DSC-T2 Touchscreen 8MP Digital Camera with Smile Detection © Ramesh Jain
    • Information in a Digital Photo Voice Tags, Preset Modes, Ontology etc Latitude, Longitude Exposure Time, Focal Length, Aperture, Flash, ISO Ratings Date, Time, Time Zone © Ramesh Jain
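The EXIF fields listed above can drive coarse tagging before any pixel is examined. A minimal sketch of the idea, with illustrative field names and hand-set thresholds (the clusters shown later in the talk were learned, not hand-set):

```python
# Sketch: coarse context tagging from EXIF optical parameters alone.
# Field names and thresholds here are illustrative assumptions, not
# the talk's actual model.

def tag_from_exif(exif):
    """Return coarse context tags from an EXIF-style dict."""
    tags = []
    if exif.get("Flash"):
        tags.append("indoor-or-dark")          # flash fired
    if exif.get("ExposureTime", 0) > 0.1 and exif.get("FNumber", 0) >= 11:
        tags.append("long-exposure-small-aperture")  # e.g. night scenes
    if exif.get("FocalLength", 50) <= 28:
        tags.append("wide-angle")              # landscapes, group shots
    elif exif.get("FocalLength", 50) >= 100:
        tags.append("telephoto")               # wildlife, portraits
    return tags

photo = {"ExposureTime": 0.5, "FNumber": 16, "Flash": 0, "FocalLength": 24}
print(tag_from_exif(photo))  # ['long-exposure-small-aperture', 'wide-angle']
```

Note that no image content is consulted at all, which is the point of the slides that follow.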
    • Experiential Media Management Environment   Event-based   Should be able to deal with ‘multimedia’   Photos   Audio   Video   Text   Information and data   …   Searching based on events and media.   Storytelling © Ramesh Jain
    • EMME Event Cycle Event Atomic Event Presentation/ Entry Navigation Story Telling EXIF Explore Features Search Tags/ Event Base Context User Photo Annot- stream ations Event Segment. Ontology Event Grouping, Linking, Assimilation © Ramesh Jain
    • Using EMME   Searching for photo   ACM MM 2009   Creating Albums:   Professional   Family   Tourism   Telling stories   What did I do in Beijing?   Scenario: In December 2009, I have 20,000 pictures taken in 2008. How do I (semi-automatically) select 25 to send to   My mother   The uncle that I hate   My personal friend   My professional friend   … © Ramesh Jain
    • Content Contenxt Context   Contenxt = Content + Context   Context is as powerful as content, possibly more so, in understanding audio-visual information © Ramesh Jain
    • Examples of Photos from the Unsupervised Clusters: High Exposure Time, Small Aperture © Ramesh Jain
    • Examples of Photos from the Unsupervised Clusters: Low Aperture (High DOF), Low FL (Wide Angle) © Ramesh Jain
    • Examples of Photos from the Unsupervised Clusters: High Aperture (Low DOF), High FL (Telephoto) © Ramesh Jain
    • Examples of Photos from the Unsupervised Clusters: Photos with Flash: Indoor shots © Ramesh Jain
    • Examples of Photos from the Unsupervised Clusters: Photos with Flash: Darker Outdoors © Ramesh Jain
    • Photos can be tagged using only EXIF! © Ramesh Jain
    • Guess the Tags!! Using Image Features Only: Scenery, City Streets, Illuminations, People Posing for Photo, Wildlife. Using Optical Parameters: Single Person Indoors, Portraits, Party Indoors, People at Dinner. © Ramesh Jain
    • Guess The Tags!! Predicted Tags: Using Image Features Only: Scenery City Streets People Posing Outdoors Group Photo Indoors Wildlife Using Optical Metadata and Thumbnail Features: Group Photo Indoors Single Person Indoors Indoor Party Confusing Indoor Artifact Background !! Illuminations © Ramesh Jain
    • Automatic Annotation   Use both Content and Optical Context   How to Combine them?   Is the Optical Context Really Useful for Annotation?   What should be the nature of annotations?   Grass, sky, …   People, animals, … © Ramesh Jain
    • More on EXIF-Related Experiments for Photo Tagging   Build models separately for point-and-shoot vs. SLR cameras, since their optical parameters vary a lot.   Do rigorous experiments on the same dataset (NUS-WIDE or MIR Flickr) to find how content-based classifiers compare with context-based classifiers.   How much do we gain by including both? © Ramesh Jain
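The slide leaves "how to combine them" open. One common answer is late fusion: run a content-based and a context (EXIF)-based classifier separately, then mix their per-tag scores. A sketch with an assumed weight alpha and made-up score dicts:

```python
# Sketch: late fusion of a content-based and a context (EXIF)-based
# classifier. The weight alpha and the score values are illustrative
# assumptions; the talk does not specify a combination method.

def fuse_scores(content_scores, context_scores, alpha=0.5):
    """Weighted late fusion of per-tag scores from two classifiers."""
    tags = set(content_scores) | set(context_scores)
    return {t: alpha * content_scores.get(t, 0.0)
               + (1 - alpha) * context_scores.get(t, 0.0)
            for t in tags}

content = {"scenery": 0.7, "people": 0.2}          # from image features
context = {"people": 0.8, "indoor": 0.6}           # e.g. flash fired, short focal length
fused = fuse_scores(content, context, alpha=0.6)
best = max(fused, key=fused.get)
print(best, round(fused[best], 2))  # people 0.44
```

Here context evidence flips the top tag from "scenery" to "people", illustrating the slide's question of how much the two signals gain from each other.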
    • Personal-Photo-EventWeb © Ramesh Jain
    • Singapore – Outdoor -- People © Ramesh Jain
    • People-No Face - Outdoor © Ramesh Jain
    • Sharing Photos   Taking photos is (almost) zero cost ($12.30 at Amazon.com).   People now ‘Shoot first – see later’.   Let me share 344 photos that I took yesterday with you.   Here   On Flickr   On Facebook   Tweeting cameras This is a serious problem now. Today. © Ramesh Jain
    • I want to share, but …   Flickr Problem   Facebook © Ramesh Jain
    • Our Solution: Photo Summarization   Many TYPES of summaries to choose from:   Time/Face Based   Image Feature Based   Applications   Sharing with friends without making them enemies   Uploading to your favorite sites   Selecting exemplar photos for printing   Refreshing your memory   Photo frames   Soon will be available on your camera. © Ramesh Jain
    • Technical Specifications:   Uses and extends the state of the art   EXIF   GIST Features   Faces   Color Histograms   Affinity Propagation Algorithm   Performance: Great!   Very intuitive   Very fast   Human in the Loop: Fine Tuning   We believe – You are the BOSS © Ramesh Jain
    • Photo Summarization © Ramesh Jain
    • Original Data Set © Ramesh Jain
    • Photo-Summarization using content © Ramesh Jain
    • Photo-Summarization using Faces © Ramesh Jain
    • Using Contenxt to find Unique People in Photostreams from Multiple People in an Event © Ramesh Jain
    • Using Clothing + Face Features (Contenxt) Step 1: Detect faces across all photostreams. Step 2: Detect clothing across all photostreams. Step 3: Cluster clothing based on color. Step 4: Find unique faces within each clothing cluster. Step 5: Iterate through Steps 3-4, refining the parameters, to get a unique set of people. © Ramesh Jain
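Steps 3 and 4 of the pipeline above can be sketched as follows, assuming the detectors have already run. The coarse color buckets and face ids are illustrative stand-ins for real clothing segmentation and face matching:

```python
# Sketch of Steps 3-4: group detections by a coarse clothing-color
# bucket, then list the unique face ids within each cluster. Color
# buckets and face ids stand in for real detector output.

from collections import defaultdict

def cluster_people(detections):
    """detections: list of (clothing_color_bucket, face_id) pairs."""
    clusters = defaultdict(set)
    for color, face in detections:
        clusters[color].add(face)        # Step 3: group by clothing color
    # Step 4: unique faces per clothing cluster
    return {color: sorted(faces) for color, faces in clusters.items()}

detections = [("red", "f1"), ("red", "f1"), ("red", "f2"),
              ("blue", "f3"), ("blue", "f3")]
print(cluster_people(detections))  # {'red': ['f1', 'f2'], 'blue': ['f3']}
```

Step 5's iteration would rerun this with tighter or looser color/face thresholds until the per-cluster face lists stabilize.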
    • Clothing Cluster 1 with corresponding Faces © Ramesh Jain
    • Unique Faces in Cluster 1: (each row is one person) © Ramesh Jain
    • Clothing Cluster 2 with corresponding Faces © Ramesh Jain
    • Unique Faces in Cluster 2: (each row is one person) © Ramesh Jain
    • Clothing Cluster 3 with corresponding Faces © Ramesh Jain
    • Unique Faces in Cluster 3: (each row is one person) © Ramesh Jain
    • Conclusions and Future Research   Content (data) is important for computer vision.   Context is more important than content for solving real (and hard) problems in vision.   Real success is only possible by using Contenxt. © Ramesh Jain
    • Thanks. For more information: jain@ics.uci.edu © Ramesh Jain