Machine learning and multimedia information retrieval
  • 1. Machine Learning and Multimedia Information Retrieval* (Integrated Knowledge Solutions). *Based on a talk at the ICMLA Conference.
  • 2. Outline: Introduction • Bridging the Semantic Gap • Events in Videos • Use of Tagging in MIR • Killer Apps of MIR • Take Home Message. 12/12/2010 ICMLA Talk
  • 3. Too Much Information. Which is more frustrating: being stuck in traffic on the way to or from work, or not being able to find information you urgently need? (According to a survey by Xerox.)
  • 4. Not a New Problem. Nalanda University was one of the first universities in the world, founded in the 5th century AD at a site reported to have been visited by the Buddha during his lifetime. At its peak, in the 7th century AD, Nalanda held some 10,000 students when it was visited by the Chinese scholar Xuanzang. The Royal Library of Alexandria, in Egypt, seems to have been the largest and most significant great library of the ancient world. It functioned as a major center of scholarship from its construction in the third century B.C. until the Roman conquest of Egypt in 30 B.C.
  • 5. However, Earlier. [Figure: Data Producers and Data Consumers]
  • 6. But Nowadays
  • 7. Some Relevant Numbers. Photobucket has 6.2 billion photos and Flickr has over 2 billion. Facebook has over 10 billion photos and over 400 million active users.
  • 8. Phenomenon • 24 hours of video are uploaded to YouTube every minute • YouTube streams 2 billion videos every day
  • 9. So how do we get help in finding the desired multimedia information? MIR
  • 10. So What is MIR? • Also known as CBIR (Content-based Image Retrieval) and CBVIR (Content-based Visual Information Retrieval) • Deals with systems that manage and facilitate content-based searching for multimedia documents such as images, videos, audio clips, and slides
  • 11. History of MIR • Conference on Database Techniques for Pictorial Applications, 1979 (Florence, Italy) • NSF Workshop on Visual Information Management Systems, 1992 (Redwood, CA) • QBIC (Query By Image Content), 1993 (SPIE's Conference on Storage and Retrieval for Image and Video Databases); also the first ACM Multimedia Conference • Shift to semantic similarity from signal similarity, 1999 • Community tagging, photo and video sharing sites, 2002
  • 12. A Typical MIR System. [Diagram: Query, Feature Extraction, Indexing & Matching, Retrieved Results, Relevance Feedback; Media Collection, Feature Extraction, Features]
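The blocks named on the slide above (feature extraction for both the query and the media collection, indexing and matching, retrieved results) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the talk: images are modelled as flat lists of RGB pixels, the feature is a coarse whole-image colour histogram, the "index" is a plain dictionary, and matching uses Euclidean distance. The relevance feedback loop is left out.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical image representation: a flat list of (R, G, B) pixels, 0..255 each.
Image = List[Tuple[int, int, int]]


def extract_features(image: Image, bins: int = 4) -> List[float]:
    """Whole-image colour histogram with bins**3 cells, normalised to sum to 1."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in image:
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1.0
    total = sum(hist) or 1.0  # guard against an empty image
    return [h / total for h in hist]


def build_index(collection: Dict[str, Image]) -> Dict[str, List[float]]:
    """Run feature extraction once over the media collection."""
    return {name: extract_features(img) for name, img in collection.items()}


def retrieve(query: Image, index: Dict[str, List[float]], top_k: int = 3) -> List[str]:
    """Extract query features, then rank indexed items by Euclidean distance."""
    q = extract_features(query)
    scored = sorted((math.dist(q, feats), name) for name, feats in index.items())
    return [name for _, name in scored[:top_k]]
```

A real system would replace the dictionary scan with an approximate nearest-neighbour index and would feed user judgments on the results back into the ranking; the sketch only shows how the boxes connect.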
  • 13. Semantic Gap. Early systems retrieved documents that were visually similar to the query (similar at the signal level) but did not necessarily show the same semantic concept. A. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, "Content-Based Image Retrieval at the End of the Early Years," IEEE Transactions on Pattern Analysis and Machine Intelligence, December 2000.
  • 14. Semantic Gap. Users also like to query using descriptive words rather than query images or other multimedia objects. This requires MIR systems to correlate low-level features with high-level concepts. [Figure: visually dissimilar images representing the same concept.]
  • 15. How to Bridge the Semantic Gap? Exploit context: • Text surrounding images • Associated sound track and closed captions in videos • Query history. Use machine learning to: • Build image category classifiers to perform semantic filtering of the results • Build specific detectors for objects to associate concepts with images • Build object models using low-level features
  • 16. Exploiting Context: An Example. Kulesh, Petrushin and Sethi, "The PERSEUS Project: Creating Personalized Multimedia News Portal," Proceedings Second Int'l Workshop on Multimedia Data Mining, 2001
  • 17. Example of Using Surrounding Text
  • 18. Context via Surrounding Text
  • 19. Context via Surrounding Text: One More Example
  • 20. Better Context with More Text
  • 21. Improving Context via More Words per Query
  • 22. Issues Unique to ML for MIR • Simultaneous presence of multiple concepts • How to extract/isolate concept-specific features? Segment or do not segment? • Imbalance between positive and negative examples • Extremely large number of concepts for a general purpose MIR. [Figure: example image labeled "Romance, couple, beach, sundown"]
  • 23. A Template Relating Concepts with Pictures. [Diagram: Concepts, Image Tokens, Images]
  • 24. Feature Extraction Issues • Whole-image based features: easy to use but not very effective • Region based features: both regular region structures and segmented regions are popular • Salient-object based features: connected regions corresponding to dominant visual properties of objects in an image
  • 25. Scale Invariant Feature Transform (SIFT) Descriptors. SIFT descriptors and their variants are currently the most popular features in use. Each image generates thousands of features (key point descriptors), with each feature typically consisting of 128 values. D. G. Lowe, "Distinctive image features from scale-invariant keypoints," IJCV, 2004.
  • 26. Feature Discovery. The basic idea is to discover the features that are best suited to a given collection. Mukhopadhyay, Ma, and Sethi, "Pathfinder Networks for Content Based Image Retrieval Based on Automated Shape Feature Discovery," ISMSE 2004
  • 27. Image Category Classifiers (ICC) • Trained using both supervised and unsupervised learning methods (SVM, DT, AdaBoost, VQ, etc.) • Early work was limited to a few tens of categories; however, some current systems can work with thousands of categories/concepts
  • 28. VQ Based Image Category Classifier. [Diagram: Test Image; Fire Codebook, Sky Codebook, Water Codebook; Best Codebook; Label] Mustafa & Sethi (2004)
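One common reading of the codebook diagram above is: keep one vector-quantization codebook per category, quantize the test image's feature vectors against each codebook, and assign the label of the codebook that gives the lowest distortion. The sketch below follows that reading. It is an illustration under stated assumptions, not the method of Mustafa & Sethi: the codebooks are taken as given (in practice they would be learned, for example by clustering training features), the function names are hypothetical, and mean Euclidean distance is used as the distortion measure.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def distortion(features: List[Vector], codebook: List[Vector]) -> float:
    """Mean distance from each feature vector to its nearest codeword."""
    nearest = (min(math.dist(f, c) for c in codebook) for f in features)
    return sum(nearest) / len(features)


def classify(features: List[Vector], codebooks: Dict[str, List[Vector]]) -> str:
    """Return the label whose codebook quantizes the features with least distortion."""
    return min(codebooks, key=lambda label: distortion(features, codebooks[label]))


# Hypothetical codebooks over (R, G, B) region colours scaled to 0..1.
CODEBOOKS = {
    "fire": [(0.9, 0.3, 0.1), (1.0, 0.6, 0.0)],
    "sky": [(0.4, 0.6, 0.9), (0.7, 0.8, 1.0)],
    "water": [(0.1, 0.3, 0.5), (0.0, 0.2, 0.4)],
}
```

For example, `classify([(0.95, 0.4, 0.05), (0.85, 0.35, 0.1)], CODEBOOKS)` picks the fire codebook, because both feature vectors lie close to its codewords. `features` is assumed to be non-empty.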
  • 29. Object Detectors. PASCAL Visual Object Classes Challenge
  • 30. Project. Web-based annotation tool to segment and label image regions. Labeled objects in images are used as training images to build object detectors.
  • 31. Image Category Classifiers Examples. IMARS provides a large number of built-in classifiers for visual categories that cover places, people, objects, settings, activities and events. It is easy to add new ones. IMARS can work on a PC or laptop (a trial version is available at IBM alphaWorks). IMARS can also work at large scale for high-volume batch processing of millions of images and videos per day. Several demos of IMARS are available.
  • 32. Image Classification via Probabilistic Modeling. Semantic labeling: (a) An MPE semantic retrieval system groups images by semantic concept and learns a probabilistic model for each concept. (b) The system represents each image by a vector of posterior concept probabilities. Nuno Vasconcelos, "From Pixels to Semantic Spaces: Advances in Content-Based Image Retrieval," IEEE Computer, July 2007
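The idea of representing an image by a vector of posterior concept probabilities can be made concrete with Bayes' rule: given a per-concept likelihood model and concept priors, the posterior for each concept is its likelihood times its prior, normalised over all concepts. The sketch below is illustrative only. The one-dimensional Gaussian models, the scalar feature, and all names are assumptions chosen to keep the example short; they are not the models used in the cited article.

```python
import math
from typing import Dict, Tuple


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Density of a 1-D Gaussian, used here as a toy per-concept model."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def concept_posteriors(
    x: float,
    models: Dict[str, Tuple[float, float]],  # concept -> (mean, std), hypothetical
    priors: Dict[str, float],
) -> Dict[str, float]:
    """Bayes' rule: P(concept | x) = P(x | concept) P(concept) / P(x)."""
    joint = {c: gaussian_pdf(x, m, s) * priors[c] for c, (m, s) in models.items()}
    evidence = sum(joint.values())
    return {c: p / evidence for c, p in joint.items()}


def mpe_label(posteriors: Dict[str, float]) -> str:
    """Picking the largest posterior minimises the probability of error."""
    return max(posteriors, key=posteriors.get)
```

The returned dictionary is the "semantic space" representation of the image: two images can then be compared by the distance between their posterior vectors rather than between their raw features.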
  • 33. Retrieving Events in Videos • An event in MIR implies an interesting spatiotemporal instance • Considerable work in the MIR community on events because of the popularity of sports videos • Also tremendous interest in detecting and recognizing events with potential homeland security applications
  • 34. Event Retrieval Examples: Supervised Approach. Mustafa & Sethi, AVSS Conference, 2005
  • 35. Unsupervised Learning for Event Retrieval. Mustafa & Sethi, ICTAI 2007
  • 36. Unsupervised Learning Based Event Retrieval. Mustafa & Sethi, ICTAI 2007
  • 37. Retrieval By Cross-Modal Associations • Using a query from one modality (e.g. audio) to retrieve content in a different modality (e.g. video) • Directly on low-level features. Approaches: • Latent semantic indexing (LSI) • Cross-modal factor analysis (CFA) • Canonical correlation analysis (CCA). Li, Dimitrova, Li and Sethi (ACM MM 03)
  • 38. Talking Face Example. [Diagram: Query, Feature Extraction, Cross-Modal Association, Retrieval Results; Collection of Image Sequences, Feature Extraction] M. Li, D. Li, Dimitrova and Sethi, "Audio-Visual Talking Face Detection," Proceedings, ICME, 2003
  • 39. Tagging in MIR. All-time most popular tags at Flickr
  • 40. About Tags • User centered • Imprecise and often overly personalized • Tag distribution follows a power law • Most users use very few distinct tags, while a small group of users works with an extremely large set of tags
  • 41. How are Tags Being Used in MIR? Relating tags in different languages through visual features. Aurnhammer, Hanappe and Steels, Proc. WWW 2006
  • 42. Tag Suggester. Kucuktunc, Sevil, Tosun, Zitouni, Duygulu, and Can (SAMT 08)
  • 43. Collaborative Tags • Also known as folksonomy, social tagging, and social classification • Great for content characterization • The tag size represents the number of times the tag has been applied to the same item by different users. It roughly represents the level of agreement/confidence in a tag.
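The tag-size rule described above (size reflects how many different users applied the tag to the same item) can be sketched as a count of distinct users per tag followed by a linear mapping to a display size. The function names, the case-folding of tags, and the point-size range are assumptions for illustration; the slide does not specify how any particular site scales its tag cloud.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tag_counts(assignments: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct users per tag for one item.

    `assignments` holds (user, tag) pairs; a user who repeats a tag counts once.
    """
    users_per_tag = defaultdict(set)
    for user, tag in assignments:
        users_per_tag[tag.strip().lower()].add(user)
    return {tag: len(users) for tag, users in users_per_tag.items()}


def font_sizes(counts: Dict[str, int], min_pt: int = 10, max_pt: int = 32) -> Dict[str, int]:
    """Linearly map counts to point sizes; `counts` is assumed to be non-empty."""
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {
        tag: round(min_pt + (count - lo) * (max_pt - min_pt) / span)
        for tag, count in counts.items()
    }
```

Because tag usage tends to be heavily skewed, a logarithmic mapping is often a better fit than the linear one shown here; the linear version is kept for clarity.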
  • 44. Decision Tree Based Tagger • Uses social tags in binary/weighted mode • Generates/suggests multiple tags through a single decision tree classifier. First, the label vectors associated with training vectors are clustered into two initial groups. Next, the SVM is used on training vectors to yield the split that best matches the clustering result. An impurity-based measure is used to iteratively adjust the split, if needed. Ma, Sethi, and Patel, "Multilabel Classification Method for Multimedia Tagging," IJMDEM, 2010
  • 47. Current Status of MIR • Extensive interest, as evident from conferences, journals, and special issues • Most in the MM community are happy with the progress • Gap between published results and results from publicly available systems on the web • Lack of application focus • Plenty of scope for machine learning to help improve MIR system performance • Killer applications are beginning to emerge
  • 48. MIR Application Examples. Tattoo-ID: Automatic Tattoo Image Retrieval for Suspect & Victim Identification (Anil K. Jain, Jung-Eun Lee, and Rong Jin)
  • 49. Biological and Medical Data Retrieval
  • 50. Killer Apps?
  • 53. Bloomberg Businessweek, Nov 29, 2010
  • 55. Take Home Message • MIR is emerging in the commercial domain. A lot more activity is expected in the near future • The MIR community is obsessed with a general purpose retrieval engine, a folly pursued by the computer vision community for a long time • ML is playing a vital role in MIR • Approaches combining social search and visual search techniques are expected to gain prominence
  • 56. Acknowledgement • This presentation is based on the work of numerous researchers from the MIR/ML/CVPR community. I have tried to give credit/references wherever possible. Any omission is unintentional and I apologize for that. • I also want to thank my present and past students and collaborators.
  • 57. Questions?