
Image Search: Then and Now



A presentation on image search emphasizing how social interactions around images can be used to improve image retrieval.

Published in: Data & Analytics


  1. 1. Image Search: Then and Now Integrated Knowledge Solutions
  2. 2. Outline • Introduction • Image = Content + Context • Content Based Image Retrieval (CBIR) • Bridging the Semantic Gap • Using Social Interactions for Retrieval • Where do we go from here
  3. 3. What is Image Search? • Image search means retrieving images from an image database that satisfy the user’s need. • The user need may be expressed in the following ways: – Keywords or text describing the image content – An exemplar image • Other names for image search – Image retrieval – Image similarity search – Content based image retrieval (CBIR)
  4. 4. Document Search Not a New Problem
  5. 5. Nalanda University was one of the first universities in the world, founded in the 5th century AD on a site reported to have been visited by the Buddha during his lifetime. At its peak, in the 7th century AD, Nalanda held some 10,000 students when it was visited by the Chinese scholar Xuanzang.
  6. 6. The Royal Library of Alexandria, in Egypt, seems to have been the largest and most significant great library of the ancient world. It functioned as a major center of scholarship from its construction in the third century B.C. until the Roman conquest of Egypt in 30 B.C.
  7. 7. However, Earlier: Few Document Producers, Many Document Consumers
  8. 8. But Nowadays: No Distinction Between Document Producers and Consumers
  9. 9. Some Relevant Numbers. Flickr has over 6 billion pictures as of August 2011, and 3.5 million images are uploaded daily. Photobucket has more than 10 billion images, and over 4 million images are uploaded every day. Facebook has over 60 billion photos, and more than 350 million photos are uploaded every day. Instagram has over 20 billion photos, and about 60 million photos are uploaded every day.
  10. 10. Nowadays an image is not just a picture; it is a picture with a thousand words
  11. 11. Image = Content + Context. Content: the picture itself. Context: its tags (Cherry blossom, Japantown, San Francisco, Peace Pagoda).
  12. 12. So, image retrieval should benefit from the contextual component, if present. How? But, first let us look at image retrieval from the content perspective only
  13. 13. History of Image Retrieval. Three stages: QBIC/signal similarity; concept/semantic similarity; concept plus context. Years marked on the timeline: 1993, 1999, 2002.
  14. 14. A Typical QBIC-Type Image Retrieval System. Components: Media Collection, Feature Extraction, Features, Indexing & Matching, Query, Query Feature Extraction, Retrieved Results, Relevance Feedback. Such systems/approaches are often referred to as Content Based Image Retrieval (CBIR)
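The components named on this slide can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration: the intensity-histogram feature, the Euclidean matching, the toy pixel lists, and the function names are hypothetical, and the relevance feedback loop is left out.

```python
from math import sqrt

def gray_histogram(pixels, bins=4, max_value=256):
    """Feature extraction: normalized intensity histogram of a flat pixel list."""
    counts = [0] * bins
    for p in pixels:
        counts[p * bins // max_value] += 1
    return [c / len(pixels) for c in counts]

def euclidean(u, v):
    """Distance between two feature vectors of equal length."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def build_index(collection):
    """Indexing: map each image id to its feature vector."""
    return {image_id: gray_histogram(px) for image_id, px in collection.items()}

def retrieve(index, query_pixels, top_k=2):
    """Matching: rank indexed images by feature distance to the query image."""
    q = gray_histogram(query_pixels)
    ranked = sorted(index, key=lambda image_id: euclidean(index[image_id], q))
    return ranked[:top_k]

# Hypothetical media collection: image id -> grayscale pixel values (0..255)
collection = {
    "dark": [10, 20, 30, 40],
    "bright": [200, 220, 240, 250],
    "mixed": [10, 20, 240, 250],
}
index = build_index(collection)
result = retrieve(index, [5, 15, 25, 35, 45, 55, 65, 245])
print(result)
```

The query is mostly dark with one bright pixel, so the "dark" image ranks first and "mixed" second. A real system would use richer features and an index structure that avoids scanning the whole collection.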
  15. 15. Semantic Gap Early systems produced results wherein the retrieved documents were visually similar (signal level similar) but not necessarily similar in showing the same semantic concept. Arnold Smeulders, Marcel Worring, Simone Santini, Amarnath Gupta, Ramesh Jain, “Content-Based Image Retrieval at the End of the Early Years,” IEEE Transactions on Pattern Analysis and Machine Intelligence, December 2000
  16. 16. Semantic Gap Users also like to query using descriptive words rather than query images or other multimedia objects. This requires retrieval systems to correlate low-level features with high level concepts. Visually dissimilar images representing the same concept.
  17. 17. Semantic Gap Challenge
  18. 18. How to Bridge the Semantic Gap? Manual annotation Use machine learning to: • Build image category classifiers to perform semantic filtering of the results • Build specific detectors for objects to associate concepts with images • Build object models using low level features Exploit context: • Text surrounding images • Associated sound track and closed captions in videos • Query history
  19. 19. Crowdsourcing for Manual Annotation
  20. 20. Example of Image Search using Keywords Search result in 2010
  21. 21. Example of Image Search using Keywords Search result in 2014 The results are better organized in sub-categories
  22. 22. Example of Image Search using Keywords
  23. 23. Example of Image Search using Keywords Search result in 2014 Again, the results are better organized in sub-categories
  24. 24. Exploiting Context: An Example Kulesh, Petrushin and Sethi, “The PERSEUS Project: Creating Personalized Multimedia News Portal,” Proceedings Second Int’l Workshop on Multimedia Data Mining, 2001
  25. 25. Machine Learning of Image Concepts • Challenging problem • Presence of multiple concepts/multiple instances • Disproportionate number of negative examples • Manpower needed for labeling training examples
  26. 26. Feature Extraction Issues Whole image based features. Easy to use but not very effective Region based features. Both regular region structure and segmented regions are popular Salient objects based features. Connected regions corresponding to dominant visual properties of objects in an image
  27. 27. Scale Invariant Feature Transform (SIFT) Descriptors SIFT descriptors or their variants are currently the most popular features in use. Each image generates thousands of features (key point descriptors), with each feature typically consisting of 128 values. D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” IJCV, 2004.
  28. 28. Learning Image Concepts • Both supervised and unsupervised learning methods (SVM, DT, AdaBoost, VQ etc.) have been used • Early work limited to few tens of categories; however some of the current systems can work with thousands of categories/concepts
  29. 29. VQ Based Learning Classifier Test Image Best Codebook Label Water Codebook Sky Codebook Fire Codebook Mustafa & Sethi (2004)
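One way to read the "best codebook" step is sketched below. This assumes, since the slide does not state it, that the best codebook is the one that quantizes the test image's feature vectors with the lowest average distortion. The codebooks, feature values, and function names are hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def distortion(features, codebook):
    """Average squared distance from each feature to its nearest codeword."""
    total = sum(min(squared_distance(f, c) for c in codebook) for f in features)
    return total / len(features)

def classify(features, codebooks):
    """Return the label whose codebook encodes the features with least distortion."""
    return min(codebooks, key=lambda label: distortion(features, codebooks[label]))

# Hypothetical per-class codebooks of mean (R, G, B) block colors
codebooks = {
    "water": [(20, 60, 120), (40, 90, 150)],
    "sky": [(120, 180, 240), (200, 220, 250)],
    "fire": [(230, 80, 20), (250, 160, 30)],
}
# Hypothetical block features extracted from a test image
test_features = [(225, 90, 25), (245, 150, 40), (240, 100, 20)]
label = classify(test_features, codebooks)
print(label)
```

In practice each class codebook would be learned by clustering features from training images of that class rather than written by hand.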
  30. 30. Bag of Words Approach (source file: ecog_bow_cs223b.pdf)
  31. 31. Bag of Words Representation of Images
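A minimal sketch of the bag of words idea: each local descriptor is assigned to its nearest codeword, and the image is represented by the normalized histogram of codeword counts. The two-dimensional descriptors, the fixed codebook, and the function names are assumptions for illustration. In practice the codebook is usually learned by clustering descriptors such as SIFT.

```python
from collections import Counter

def nearest_codeword(descriptor, codebook):
    """Index of the codeword closest to the descriptor (squared Euclidean)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(descriptor, codebook[i])),
    )

def bag_of_words(descriptors, codebook):
    """Normalized histogram of codeword assignments for one image."""
    counts = Counter(nearest_codeword(d, codebook) for d in descriptors)
    return [counts[i] / len(descriptors) for i in range(len(codebook))]

# Hypothetical 3-word codebook and 4 local descriptors from one image
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
descriptors = [(0.1, 0.0), (0.9, 1.1), (0.2, 0.1), (0.1, 0.9)]
histogram = bag_of_words(descriptors, codebook)
print(histogram)
```

Two descriptors fall on codeword 0 and one each on codewords 1 and 2, so the image is summarized by a fixed-length vector regardless of how many descriptors it produced.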
  32. 32. Co-occurrence of Bag of Words. Processing stages shown: Image Collection; Edge Analysis Images; Collection of Binary Image Blocks; Clustering; Local Feature Descriptors (Codewords); Codeword Representation of Images; Co-occurrence Matrices of Local Features; Compute Distances; Image Distance Matrix; Pathfinder Network. Mukhopadhyay, Ma, and Sethi, “Pathfinder Networks for Content Based Image Retrieval Based on Automated Shape Feature Discovery,” ISMSE 2004
  33. 33. Co-occurrence of BoW. Panels: original image; representation by feature indices (cluster membership); co-occurrence matrix. Distance measures: Hausdorff metric, Manhattan distance. Hausdorff metric: $H(A,B) = \max\{h(A,B),\ h(B,A)\}$, where $h(A,B) = \max_{a \in A} \min_{b \in B} \lVert a - b \rVert$
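The Hausdorff metric named on this slide can be sketched as follows. Using the Manhattan distance as the point-to-point distance is an assumption here, since the slide lists both measures without saying how they are combined. The point sets are hypothetical.

```python
def manhattan(a, b):
    """Manhattan (L1) distance between two points."""
    return sum(abs(x - y) for x, y in zip(a, b))

def directed_hausdorff(A, B, dist=manhattan):
    """h(A, B): the largest distance from a point of A to its nearest point in B."""
    return max(min(dist(a, b) for b in B) for a in A)

def hausdorff(A, B, dist=manhattan):
    """H(A, B) = max{h(A, B), h(B, A)}."""
    return max(directed_hausdorff(A, B, dist), directed_hausdorff(B, A, dist))

# Hypothetical point sets
A = [(0, 0), (1, 0)]
B = [(0, 0), (4, 0)]
print(directed_hausdorff(A, B), directed_hausdorff(B, A), hausdorff(A, B))
```

The directed distances differ (1 versus 3) because every point of A is close to some point of B while (4, 0) is far from all of A. Taking the maximum of the two makes the measure symmetric.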
  34. 34. Notice how similar images are placed together in the graph
  35. 35. Object Detectors for Image Concepts PASCAL Visual Object Classes Challenge
  36. 36. Project Web-based annotation tool to segment and label image regions. Labeled objects in images are used as training images to build object detectors.
  37. 37. Image Category Classifiers Examples. IMARS provides a large number of built-in classifiers for visual categories that cover places, people, objects, settings, activities and events. It is easy to add new ones. IMARS can work on a PC or laptop (a trial version is available at IBM alphaWorks). IMARS can also work at large scale for high-volume batch processing of millions of images and videos per day. Several demos of IMARS are available (see IMARS demos).
  38. 38. Image Classification via Probabilistic Modeling. Semantic labeling. (a) An MPE semantic retrieval system groups images by semantic concept and learns a probabilistic model for each concept. (b) The system represents each image by a vector of posterior concept probabilities. Nuno Vasconcelos, “From Pixels to Semantic Spaces: Advances in Content-Based Image Retrieval,” IEEE Computer, July 2007
  39. 39. Image = Content + Context. Content: the picture itself. Context: its tags (Cherry blossom, Japantown, San Francisco, Peace Pagoda).
  40. 40. Tagging All time most popular tags at Flickr
  41. 41. About Tags • User centered • Imprecise and often overly personalized • Tag distribution follows a power law • Most users use very few distinct tags, while a small group of users works with an extremely large set of tags • Also known as folksonomy, social tagging, and social classification
  42. 42. Why Not Use Social Tags for Retrieval? Problem: The relevant tag is often not at the top of the list. Fewer than 10% of the images have their most relevant tag at the top of the list. Solution: Improve tagging by suggesting potential tags to a user, tag ranking, tag completion, etc. Dong Liu, Xian-Sheng Hua, Linjun Yang, Meng Wang, Hong-Jiang Zhang. Tag Ranking. WWW 2009. Madrid, Spain
  43. 43. Tag Recommendation using Tag Co-occurrences Given a target image and initial tags, use co-occurrence of tags to recommend tags for the target image. This approach doesn’t take into account co-occurrences of visual features.
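A minimal sketch of co-occurrence based recommendation, assuming a simple scoring rule that the slide does not specify: each candidate tag is scored by how often it appears together with any of the initial tags in an existing tagged collection. The collection, scoring rule, and function names are hypothetical.

```python
from collections import Counter
from itertools import permutations

def cooccurrence_counts(tagged_images):
    """Count, for every ordered tag pair, the images in which both appear."""
    counts = Counter()
    for tags in tagged_images:
        for a, b in permutations(set(tags), 2):
            counts[(a, b)] += 1
    return counts

def recommend(initial_tags, counts, top_n=2):
    """Rank tags by total co-occurrence with the initial tags (ties by name)."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a in initial_tags and b not in initial_tags:
            scores[b] += n
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:top_n]]

# Hypothetical collection of already tagged images
tagged_images = [
    ["cherry blossom", "japantown", "san francisco"],
    ["cherry blossom", "spring", "san francisco"],
    ["peace pagoda", "japantown", "san francisco"],
    ["cherry blossom", "spring"],
]
counts = cooccurrence_counts(tagged_images)
suggested = recommend({"cherry blossom"}, counts)
print(suggested)
```

Raw counts favor globally frequent tags, so a practical system would likely normalize by tag frequency. That refinement is omitted to keep the sketch short.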
  44. 44. Tag Recommendation using Tags Co-occurrences and Visual Similarity Kucuktunc, Sevil, Tosun, Zitouni, Duygulu, and Can (SAMT 08) Given a target image and initial tags, use the existing tagged images to suggest tags for the target image.
  45. 45. Tag Ranking
  46. 46. Tag Ranking: Another Approach Dong Liu, Xian-Sheng Hua, Linjun Yang, Meng Wang, Hong-Jiang Zhang. Tag Ranking. WWW 2009. Madrid, Spain
  47. 47. How to Compute Tag Similarity
  48. 48. Tag Recommendation After Tag Ranking • Given an untagged image, find its k most visually similar images • Pool the top two ranked tags from each of the k images and select the unique tags as recommended tags
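The two steps on this slide can be sketched as follows. The feature vectors, the squared Euclidean similarity, and the function names are assumptions for illustration. The ranked tag lists are taken as given, as if a tag ranking step had already been run.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two visual feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def recommend_tags(query_features, tagged_images, k=2, tags_per_image=2):
    """Pool the top-ranked tags of the k nearest images, keeping unique tags in order."""
    neighbors = sorted(
        tagged_images,
        key=lambda img: squared_distance(img["features"], query_features),
    )[:k]
    recommended = []
    for img in neighbors:
        for tag in img["ranked_tags"][:tags_per_image]:
            if tag not in recommended:
                recommended.append(tag)
    return recommended

# Hypothetical collection: visual features plus tags already ranked by relevance
tagged_images = [
    {"features": (0.9, 0.1), "ranked_tags": ["sunset", "beach", "vacation"]},
    {"features": (0.8, 0.2), "ranked_tags": ["beach", "sea", "2009"]},
    {"features": (0.1, 0.9), "ranked_tags": ["forest", "hiking", "me"]},
]
tags = recommend_tags((0.85, 0.2), tagged_images)
print(tags)
```

Because only the top two ranked tags of each neighbor are pooled, low-relevance tags such as "vacation" or "2009" never reach the recommendation list.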
  49. 49. Tag Completion The complete tag matrix is generated by imposing constraints based on visual similarity, tag to tag similarity, and similarity with the initial tag matrix. The matrix completion is done by an optimization procedure. Wu and Jain, IEEE-PAMI, JANUARY 2011
  50. 50. What about Taggers & Commenters? Question: How can we incorporate taggers’/commenters’ characteristics for improved tag recommendations? Answer: Use three sets of features, derived from the image to be tagged, the user’s tag history, and the user’s social interactions
  51. 51. Tag History & Social Interaction Features Tag history features are based on the tags the user has used in the past Social interaction features are derived from tags/comments posted by the user’s friends/favorite posters X. Chen & H. Shin, ICDM 2010
  52. 52. Current Status of Image Search • Extensive interest as evident from conferences, journals, and special issues • Overall, solid progress is being made • Efforts towards performance evaluation with benchmarked collections are gaining more traction • Integration of content and context through tags and comments is receiving increasing attention to help improve retrieval • Killer applications are beginning to emerge as visual search gains prominence • Need for more applications outside entertainment
  53. 53. Performance Evaluation Efforts ImageCLEF 2013 Annotation Task: - 250,000 training images - 95 (development), 116 (test) concepts to be identified - A lot of label noise in the training set, due to the automatic label extraction from websites
  54. 54. Performance Evaluation Efforts TRECVID workshops, an offshoot of TREC, have been held yearly as evaluation meetings since 2003. The goal of the workshops is to encourage research in content-based video retrieval and analysis by providing large test collections, realistic system tasks, uniform scoring procedures, and a forum for organizations interested in comparing their results.
  55. 55. Application Examples Tattoo-ID: Automatic Tattoo Image Retrieval for Suspect & Victim Identification (Anil K. Jain, Jung-Eun Lee, and Rong Jin)
  56. 56. CBIR for Whole Slide Imagery • The availability of digital whole slide data sets represents an enormous opportunity to carry out new forms of numerical and data-driven query, in modes not based on textual, ontological or lexical matching. – Search image repositories with whole images or image regions of interest – Carry out search in real time via use of scalable computational architectures [Diagram labels: extraction from image repositories based upon spatial information; analysis of data in the digital domain; resultant surface map or gallery of matching images.] Slide courtesy of Ulysses J. Balis, M.D., Director, Division of Pathology Informatics, Department of Pathology, University of Michigan Health System
  57. 57. Medical Image Retrieval. Text query: “Find all the cases in which a tumor decreased in size for less than three months post treatment, then resumed a growth pattern after that period.” Text + medical image query: “Find images with large-sized frontal lobe brain tumors for patients approximately 35 years old.” [Diagram labels: text query; text query concepts extraction; medical ontology; query text-based concepts (CUIs); query medical image; medical image visual analysis; general and specialized image-based ontology; query image-based concepts (CUIs); indexes and image-specific signatures.]
  58. 58. Image Search Products
  59. 59. Image Search Products
  60. 60. Image Search Products
  61. 61. Image Search Products
  63. 63. Take Home Message • Image/video retrieval is moving into the commercial domain. A lot more activity is expected in the near future • Multimodal/cross-modal retrieval is gaining importance • Approaches combining social search and visual search techniques are expected to gain prominence • Crowdsourcing is a cheap and effective way of tagging media
  64. 64. Acknowledgement • This presentation is based on the work of numerous researchers from the MIR/ML/CVPR community. I have tried to give credit/references wherever possible. Any omission is unintentional and I apologize for that. • I also want to thank my present and past students and collaborators.
  65. 65. Questions?