Machine learning and multimedia information retrieval

  1. Machine Learning and Multimedia Information Retrieval* Integrated Knowledge Solutions, iksinc@yahoo.com (* Based on a talk at the ICMLA Conference)
  2. Outline • Introduction • Bridging the Semantic Gap • Events in Videos • Use of Tagging in MIR • Killer Apps of MIR • Take Home Message (slide footer: 12/12/2010, ICMLA Talk)
  3. Too Much Information. Which is more frustrating? • Being stuck in traffic on the way to or from work • Not being able to find information you urgently need (According to a survey by Xerox)
  4. Not a New Problem. Nalanda University was one of the first universities in the world, founded in the 5th century AD at a site reported to have been visited by the Buddha during his lifetime. At its peak, in the 7th century AD, Nalanda held some 10,000 students when it was visited by the Chinese scholar Xuanzang. The Royal Library of Alexandria, in Egypt, seems to have been the largest and most significant great library of the ancient world. It functioned as a major center of scholarship from its construction in the third century BC until the Roman conquest of Egypt in 30 BC.
  5. However, Earlier: Data Producers, Data Consumers
  6. But Nowadays
  7. Some Relevant Numbers. Photobucket has 6.2 billion photos and Flickr has over 2 billion. Facebook has over 10 billion photos and over 400 million active users.
  8. Phenomenon • 24 hours of video are uploaded to YouTube every minute • YouTube streams 2 billion videos every day
  9. So how do we get help in finding the desired multimedia information? MIR
  10. So What is MIR? • Also known as CBIR (Content-based Image Retrieval) and CBVIR (Content-based Visual Information Retrieval) • Deals with systems that manage and facilitate content-based searching for multimedia documents such as images, videos, audio clips and slides
  11. History of MIR • Conference on Database Techniques for Pictorial Applications, 1979 (Florence, Italy) • NSF Workshop on Visual Information Management Systems, 1992 (Redwood, CA) • QBIC (Query By Image Content), 1993 (SPIE’s Conf. on Storage and Retrieval for Image and Video Databases); also the first ACM Multimedia Conference • Shift from signal similarity to semantic similarity, 1999 • Community tagging, photo and video sharing sites, 2002
  12. A Typical MIR System (block diagram). Components shown: Query, Feature Extraction, Indexing & Matching, Retrieved Results, Relevance Feedback; Media Collection, Feature Extraction, Features.
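The components named on this slide can be sketched as a toy retrieval loop. This is only a minimal illustration under stated assumptions: the gray-level histogram feature, the Euclidean matching, the function names and the tiny in-memory "collection" are all hypothetical stand-ins, and relevance feedback is left out.

```python
from math import sqrt

def extract_features(pixels, bins=4):
    """Toy feature extractor: a normalized gray-level histogram.
    `pixels` is a flat list of intensities in 0..255 (assumed input format)."""
    hist = [0.0] * bins
    for p in pixels:
        hist[min(p * bins // 256, bins - 1)] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_index(collection):
    """Offline path: media collection -> feature extraction -> stored features."""
    return {name: extract_features(px) for name, px in collection.items()}

def retrieve(query_pixels, index, k=2):
    """Online path: query -> feature extraction -> matching -> ranked results."""
    q = extract_features(query_pixels)
    ranked = sorted(index, key=lambda name: distance(q, index[name]))
    return ranked[:k]

# Hypothetical three-item media collection.
collection = {
    "dark": [10, 20, 30, 40],
    "bright": [220, 230, 240, 250],
    "mixed": [10, 120, 130, 250],
}
index = build_index(collection)
results = retrieve([5, 15, 25, 35], index, k=2)
print(results)
```

A relevance-feedback step would take the user's judgment of `results` and adjust the query features or the distance weights before matching again.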
  13. Semantic Gap. Early systems produced results in which the retrieved documents were visually similar (similar at the signal level) but did not necessarily show the same semantic concept. http://www.searchenginejournal.com/7-similarity-based-image-search-engines/8265/ Arnold Smeulders, Marcel Worring, Simone Santini, Amarnath Gupta, and Ramesh Jain, “Content-Based Image Retrieval at the End of the Early Years,” IEEE Transactions on Pattern Analysis and Machine Intelligence, December 2000.
  14. Semantic Gap. Users also like to query using descriptive words rather than query images or other multimedia objects. This requires MIR systems to correlate low-level features with high-level concepts. (Figure: visually dissimilar images representing the same concept.)
  15. How to Bridge the Semantic Gap? Exploit context: • Text surrounding images • Associated sound track and closed captions in videos • Query history. Use machine learning to: • Build image category classifiers to perform semantic filtering of the results • Build specific detectors for objects to associate concepts with images • Build object models using low-level features
  16. Exploiting Context: An Example. Kulesh, Petrushin and Sethi, “The PERSEUS Project: Creating Personalized Multimedia News Portal,” Proceedings Second Int’l Workshop on Multimedia Data Mining, 2001
  17. Example of Using Surrounding Text
  18. Context via Surrounding Text
  19. Context via Surrounding Text: One More Example
  20. Better Context with More Text
  21. Improving Context via More Words per Query
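The idea behind the last few slides, that surrounding text supplies context and that longer queries disambiguate better, can be illustrated with a small sketch. The scoring rule (a plain count of shared words), the file names and the captions are all made up for illustration and are not taken from the talk.

```python
def tokenize(text):
    """Lowercase word set with trailing punctuation stripped."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return {w for w in words if w}

def rank_by_surrounding_text(query, surrounding_text):
    """Rank images by how many query words occur in their surrounding text.
    Ties are broken by image name so the ordering is deterministic."""
    q = tokenize(query)
    scores = {img: len(q & tokenize(txt)) for img, txt in surrounding_text.items()}
    return sorted(scores, key=lambda img: (-scores[img], img))

# Hypothetical images with the text found around them on a web page.
surrounding_text = {
    "img_a.jpg": "Jaguar unveils a new sports car at the auto show.",
    "img_b.jpg": "A jaguar rests on a tree branch in the rain forest.",
}

short_query = rank_by_surrounding_text("jaguar", surrounding_text)
long_query = rank_by_surrounding_text("jaguar rain forest", surrounding_text)
print(short_query)  # the one-word query cannot separate the two images
print(long_query)   # extra query words favor the animal photo
```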
  22. Issues Unique to ML for MIR • Simultaneous presence of multiple concepts • How to extract/isolate concept-specific features? Segment or do not segment? • Imbalance between positive and negative examples • Extremely large number of concepts for a general-purpose MIR system. (Example image tagged: Romance, couple, beach, sundown. From: s163.photobucket.com)
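One common way to soften the imbalance between positive and negative examples is to weight each class inversely to its frequency during training. The sketch below assumes the widely used "balanced" heuristic n / (k · n_c); the slide does not say which remedy, if any, the speaker had in mind, and the label counts are invented.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight for class c is n / (k * n_c): n samples, k classes, n_c samples in c."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

# Hypothetical concept detector data: 5 positive images versus 95 negatives.
labels = ["beach"] * 5 + ["not_beach"] * 95
weights = balanced_class_weights(labels)
print(weights)
```

With these weights, one misclassified positive costs as much as nineteen misclassified negatives, so a classifier can no longer score well by always answering "not_beach".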
  23. A Template Relating Concepts with Pictures: Concepts, Image Tokens, Images
  24. Feature Extraction Issues • Whole-image features: easy to use but not very effective • Region-based features: both regular region structures and segmented regions are popular • Salient-object features: connected regions corresponding to dominant visual properties of objects in an image
  25. Scale Invariant Feature Transform (SIFT) Descriptors. SIFT descriptors and their variants are currently the most popular features in use. Each image generates thousands of features (keypoint descriptors), with each feature typically consisting of 128 values. http://www.vlfeat.org/ D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” IJCV, 2004.
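To show how 128-value keypoint descriptors are typically compared, here is a minimal sketch of nearest-neighbor matching with the distance-ratio test described in Lowe's paper. The descriptors are random stand-ins rather than real SIFT output, and the function names are illustrative; a real system would obtain descriptors from a library such as VLFeat.

```python
import random
from math import sqrt

DIM = 128  # length of one keypoint descriptor, as stated on the slide

def l2(a, b):
    """Euclidean distance between two descriptors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """Keep a match only if the best distance is clearly smaller than the
    second best. `desc_b` must contain at least two descriptors."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = sorted((l2(d, e), j) for j, e in enumerate(desc_b))
        (best, j), (second, _) = dists[0], dists[1]
        if best < ratio * second:
            matches.append((i, j))
    return matches

rng = random.Random(0)
# Image B: 20 synthetic descriptors (stand-ins for real keypoint descriptors).
desc_b = [[rng.random() for _ in range(DIM)] for _ in range(20)]
# Image A: slightly noisy copies of descriptors 3 and 7 from image B.
desc_a = [[v + rng.gauss(0, 0.01) for v in desc_b[j]] for j in (3, 7)]

matches = ratio_test_matches(desc_a, desc_b)
print(matches)
```

The brute-force search is quadratic in the number of descriptors; with thousands of descriptors per image, practical systems use approximate nearest-neighbor indexes instead.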
  26. Feature Discovery. The basic idea is to discover features that are best suited to a given collection. Mukhopadhyay, Ma, and Sethi, “Pathfinder Networks for Content Based Image Retrieval Based on Automated Shape Feature Discovery,” ISMSE 2004
  27. Image Category Classifiers (ICC) • Trained using both supervised and unsupervised learning methods (SVM, DT, AdaBoost, VQ, etc.) • Early work was limited to a few tens of categories; however, some current systems can work with thousands of categories/concepts
  28. VQ Based Image Category Classifier (diagram). A Test Image is compared against the Fire Codebook, Sky Codebook and Water Codebook; the Best Codebook gives the Label. Mustafa & Sethi (2004)
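The diagram suggests one codebook per category, with a test image labeled by the codebook that encodes it best. The following is a minimal sketch of that idea, assuming plain k-means codebooks and mean squared quantization error as the "best codebook" criterion; the feature values and function names are invented, and the cited paper may differ in its details.

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_codebook(vectors, k=2, iters=10):
    """Plain k-means, seeded with the first k vectors to stay deterministic."""
    codebook = [list(v) for v in vectors[:k]]
    for _ in range(iters):
        cells = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda c: sq_dist(v, codebook[c]))
            cells[nearest].append(v)
        for c, members in enumerate(cells):
            if members:  # keep the old codeword if a cell is empty
                codebook[c] = [sum(col) / len(members) for col in zip(*members)]
    return codebook

def distortion(vectors, codebook):
    """Mean squared error when each vector is replaced by its nearest codeword."""
    return sum(min(sq_dist(v, cw) for cw in codebook) for v in vectors) / len(vectors)

def classify(vectors, codebooks):
    """Label = category whose codebook gives the lowest distortion."""
    return min(codebooks, key=lambda label: distortion(vectors, codebooks[label]))

# Hypothetical RGB block features for three categories.
training = {
    "fire": [(0.9, 0.3, 0.1), (1.0, 0.5, 0.0), (0.8, 0.2, 0.1), (0.95, 0.6, 0.05)],
    "sky": [(0.4, 0.6, 0.9), (0.5, 0.7, 1.0), (0.3, 0.5, 0.9), (0.6, 0.8, 1.0)],
    "water": [(0.0, 0.3, 0.5), (0.1, 0.4, 0.6), (0.0, 0.2, 0.4), (0.1, 0.35, 0.55)],
}
codebooks = {label: train_codebook(vecs) for label, vecs in training.items()}

test_image = [(0.85, 0.35, 0.1), (0.9, 0.55, 0.05)]
label = classify(test_image, codebooks)
print(label)
```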
  29. Object Detectors: PASCAL Visual Object Classes Challenge
  30. Project: a web-based annotation tool to segment and label image regions. Labeled objects in images are used as training images to build object detectors. http://labelme.csail.mit.edu/
  31. Image Category Classifiers Examples. IMARS provides a large number of built-in classifiers for visual categories that cover places, people, objects, settings, activities and events. It is easy to add new ones. IMARS can work on a PC or laptop (a trial version is available at IBM alphaWorks). IMARS can also work at large scale for high-volume batch processing of millions of images and videos per day. Several demos of IMARS are available (see IMARS demos).
  32. Image Classification via Probabilistic Modeling. Semantic labeling: (a) An MPE semantic retrieval system groups images by semantic concept and learns a probabilistic model for each concept. (b) The system represents each image by a vector of posterior concept probabilities. Nuno Vasconcelos, “From Pixels to Semantic Spaces: Advances in Content-Based Image Retrieval,” IEEE Computer, July 2007
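The "vector of posterior concept probabilities" in (b) follows from Bayes' rule once each concept has a probabilistic model. The sketch below assumes, purely for illustration, a single scalar feature per image and one Gaussian per concept; the concept names, parameters and priors are hypothetical, and the cited system uses far richer models.

```python
from math import exp, pi, sqrt

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian."""
    return exp(-0.5 * ((x - mean) / std) ** 2) / (std * sqrt(2 * pi))

def posterior_vector(x, models, priors):
    """P(concept | x) for every concept, via Bayes' rule."""
    joint = {c: priors[c] * gaussian_pdf(x, *models[c]) for c in models}
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}

# Hypothetical per-concept models of a scalar "blueness" feature: (mean, std).
models = {"sky": (0.8, 0.1), "grass": (0.4, 0.1), "night": (0.1, 0.1)}
priors = {c: 1 / 3 for c in models}

semantic_vector = posterior_vector(0.75, models, priors)
best = max(semantic_vector, key=semantic_vector.get)
print(best, semantic_vector)
```

Images can then be compared by the similarity of their posterior vectors rather than by their raw features, which is what lets retrieval operate at the concept level.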
  33. Retrieving Events in Videos • An event in MIR implies an interesting spatiotemporal instance • Considerable work in the MIR community on events because of the popularity of sports videos • Also tremendous interest in detecting and recognizing events with potential homeland security applications
  34. Event Retrieval Examples: Supervised Approach. Mustafa & Sethi, AVSS Conference 2005
  35. Unsupervised Learning for Event Retrieval. Mustafa & Sethi, ICTAI 2007
  36. Unsupervised Learning Based Event Retrieval. Mustafa & Sethi, ICTAI 2007
  37. Retrieval By Cross-Modal Associations • Using a query from one modality (e.g. audio) to retrieve content in a different modality (e.g. video) • Directly on low-level features. Approaches: • Latent semantic indexing (LSI) • Cross-modal factor analysis (CFA) • Canonical correlation analysis (CCA). Li, Dimitrova, Li and Sethi (ACM MM 03)
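As a rough illustration of one of the listed approaches, the sketch below follows the usual description of cross-modal factor analysis: take the SVD of the cross-covariance between the two mean-centered feature sets and use the leading singular vectors as column-orthonormal transforms into a shared space. NumPy, the synthetic audio and video features, the helper names and the Euclidean ranking in the shared space are all assumptions made for this example.

```python
import numpy as np

def fit_cfa(X, Y, k):
    """Return (x_mean, y_mean, A, B) where A is p x k and B is q x k,
    both with orthonormal columns, taken from the SVD of X^T Y."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    U, _, Vt = np.linalg.svd((X - x_mean).T @ (Y - y_mean), full_matrices=False)
    return x_mean, y_mean, U[:, :k], Vt[:k].T

def rank_videos(audio_query, video_feats, model):
    """Project an audio query and all video candidates into the shared space,
    then order the candidates by distance to the query."""
    x_mean, y_mean, A, B = model
    q = (audio_query - x_mean) @ A
    candidates = (video_feats - y_mean) @ B
    return np.argsort(np.linalg.norm(candidates - q, axis=1))

# Synthetic paired data: 50 clips whose audio (6-D) and video (8-D) features
# are driven by the same hidden 2-D cause, plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))
audio = latent @ rng.normal(size=(2, 6)) + 0.01 * rng.normal(size=(50, 6))
video = latent @ rng.normal(size=(2, 8)) + 0.01 * rng.normal(size=(50, 8))

model = fit_cfa(audio, video, k=2)
ranking = rank_videos(audio[0], video, model)
print(ranking[:5])
```

CCA differs mainly in that it normalizes each modality's own covariance before pairing directions, so it maximizes correlation rather than covariance.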
  38. Talking Face Example (block diagram). Components shown: Query, Feature Extraction, Cross-Modal Association, Retrieval Results; Collection of Image Sequences, Feature Extraction. M. Li, D. Li, Dimitrova and Sethi, “Audio-Visual Talking Face Detection,” Proceedings, ICME, 2003
  39. Tagging in MIR: all-time most popular tags at Flickr
  40. About Tags • User centered • Imprecise and often overly personalized • Tag distribution follows a power law • Most users use very few distinct tags, while a small group of users works with an extremely large set of tags
  41. How are Tags Being Used in MIR? Relating tags in different languages through visual features. Aurnhammer, Hanappe and Steels, Proc. WWW 2006
  42. Tag Suggester. Kucuktunc, Sevil, Tosun, Zitouni, Duygulu, and Can (SAMT 08)
  43. Collaborative Tags • Also known as folksonomy, social tagging, and social classification • Great for content characterization • The tag size represents the number of times the tag has been applied to the same item by different users. It roughly indicates the level of agreement/confidence in a tag.
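The tag-size rule described on this slide can be written down directly: count the distinct users who applied each tag to an item, then map the counts to display sizes. The user names, tags and the linear 10 to 30 point scaling below are illustrative assumptions.

```python
from collections import Counter

def tag_agreement(assignments):
    """Count distinct users per tag for ONE item.
    `assignments` is an iterable of (user, tag) pairs; repeated use of a tag
    by the same user counts once, and tags are compared case-insensitively."""
    distinct = {(user, tag.lower()) for user, tag in assignments}
    return Counter(tag for _, tag in distinct)

def font_sizes(counts, min_pt=10, max_pt=30):
    """Scale counts linearly into a font-size range for a tag cloud."""
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1
    return {t: min_pt + (max_pt - min_pt) * (c - lo) / span for t, c in counts.items()}

# Hypothetical tag assignments for a single photo.
assignments = [
    ("u1", "beach"), ("u2", "Beach"), ("u3", "beach"), ("u3", "beach"),
    ("u1", "sunset"), ("u2", "sunset"),
    ("u3", "vacation2010"),
]
counts = tag_agreement(assignments)
sizes = font_sizes(counts)
print(counts)
print(sizes)
```

The overly personal tag applied by a single user ends up at the minimum size, which matches the reading of tag size as a level of agreement.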
  44. Decision Tree Based Tagger • Uses social tags in binary/weighted mode • Generates/suggests multiple tags through a single decision tree classifier. Procedure: • First, the label vectors associated with training vectors are clustered into two initial groups • Next, the SVM is used on training vectors to yield the split that best matches the clustering result • An impurity-based measure is used to iteratively adjust the split, if needed. Ma, Sethi, and Patel, “Multilabel Classification Method for Multimedia Tagging,” IJMDEM, 2010
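The three steps for growing one tree node can be sketched as follows. This is a simplified illustration and not the authors' implementation: a perceptron stands in for the SVM, the impurity (a mean per-tag Gini index) is only reported rather than used to adjust the split, and the features and tag vectors are made-up toy data.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def two_means(label_vectors, iters=5):
    """Step 1: cluster the label vectors into two groups (0 or 1)."""
    first = label_vectors[0]
    far = max(label_vectors, key=lambda v: sq_dist(v, first))
    centers = [list(first), list(far)]
    groups = [0] * len(label_vectors)
    for _ in range(iters):
        groups = [0 if sq_dist(v, centers[0]) <= sq_dist(v, centers[1]) else 1
                  for v in label_vectors]
        for c in (0, 1):
            members = [v for v, g in zip(label_vectors, groups) if g == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return groups

def fit_linear_split(features, groups, epochs=100):
    """Step 2 (stand-in for the SVM): a perceptron that tries to reproduce
    the label clustering from the feature vectors."""
    w, b = [0.0] * len(features[0]), 0.0
    targets = [1 if g == 0 else -1 for g in groups]
    for _ in range(epochs):
        errors = 0
        for x, t in zip(features, targets):
            if t * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + t * xi for wi, xi in zip(w, x)]
                b += t
                errors += 1
        if errors == 0:
            break
    return w, b

def side(x, w, b):
    """Child (0 or 1) that the split sends feature vector x to."""
    return 0 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 1

def mean_tag_gini(label_vectors):
    """Step 3 ingredient: average Gini impurity over the individual tags."""
    if not label_vectors:
        return 0.0
    n = len(label_vectors)
    ps = [sum(col) / n for col in zip(*label_vectors)]
    return sum(2 * p * (1 - p) for p in ps) / len(ps)

# Toy data: 2-D features and binary tag vectors (beach, sea, city, night).
features = [(0.9, 0.1), (0.8, 0.2), (0.85, 0.15), (0.1, 0.9), (0.2, 0.8), (0.15, 0.7)]
labels = [(1, 1, 0, 0), (1, 1, 0, 0), (1, 0, 0, 0), (0, 0, 1, 1), (0, 0, 1, 0), (0, 0, 1, 1)]

groups = two_means(labels)
w, b = fit_linear_split(features, groups)
sides = [side(x, w, b) for x in features]
parent = mean_tag_gini(labels)
children = [mean_tag_gini([l for l, s in zip(labels, sides) if s == c]) for c in (0, 1)]
print(groups, sides, parent, children)
```

Applying the same procedure recursively to each child would grow the tree, and the tag proportions at a leaf can then serve as the suggested tags for a new item routed to that leaf.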
  47. Current Status of MIR • Extensive interest, as evident from conferences, journals, and special issues • Most in the MM community are happy with the progress • Gap between published results and results from publicly available systems on the web (http://www.theopavlidis.com/technology/CBIR/PaperB/icpr08.htm) • Lack of application focus • Plenty of scope for machine learning to help improve MIR system performance • Killer applications are beginning to emerge
  48. MIR Application Examples. Tattoo-ID: Automatic Tattoo Image Retrieval for Suspect & Victim Identification (Anil K. Jain, Jung-Eun Lee, and Rong Jin)
  49. Biological and Medical Data Retrieval. http://www.cs.washington.edu/research/VACE/Multimedia/
  50. Killer Apps?
  51. http://www.iqengines.com/applications.php
  52. http://www.iqengines.com/applications.php
  53. http://www.thingd.com (Bloomberg Businessweek, Nov 29, 2010)
  55. Take Home Message • MIR is emerging in the commercial domain. A lot more activity is expected in the near future • The MIR community is obsessed with a general-purpose retrieval engine, a folly pursued by the computer vision community for a long time • ML is playing a vital role in MIR • Approaches combining social search and visual search techniques are expected to gain prominence
  56. Acknowledgement • This presentation is based on the work of numerous researchers from the MIR/ML/CVPR community. I have tried to give credit/references wherever possible. Any omission is unintentional, and I apologize for it. • I also want to thank my present and past students and collaborators.
  57. Questions?
