Image retrieval: challenges and opportunities


Published in: Technology, Education

  1. 1. Image retrieval: challenges and opportunities. Oge Marques, Florida Atlantic University, Boca Raton, FL, USA. June 4, 2012, UTFPR, Curitiba, PR, Brazil
  2. 2. Watch this… http://
  3. 3. Google Goggles • Mobile visual search (MVS) solution – Android and iPhone – Narrow-domain search and retrieval http://
  4. 4. Outline • How does it work? • Why is it relevant? • What else is going on? • Which challenges and opportunities lie ahead?
  5. 5. Fundamentals How does it work?
  6. 6. Fundamentals • Google Goggles is one of the first, and perhaps the best-known, solutions for MVS • It is a contemporary example of content-based image retrieval (CBIR) • Its technical details (algorithms, etc.) are not publicly available • However…
  7. 7. MVS: Pipeline for image retrieval (Girod et al., IEEE Multimedia 2011)
  8. 8. MVS: 3 scenarios (Girod et al., IEEE Multimedia 2011)
  9. 9. MVS: descriptor extraction • Interest point detection • Feature descriptor computation (Girod et al., IEEE Multimedia 2011)
  10. 10. Interest point detection • Numerous interest-point detectors have been proposed in the literature: – Harris Corners (Harris and Stephens 1988) – Scale-Invariant Feature Transform (SIFT) Difference-of-Gaussian (DoG) (Lowe 2004) – Maximally Stable Extremal Regions (MSERs) (Matas et al. 2002) – Hessian affine (Mikolajczyk et al. 2005) – Features from Accelerated Segment Test (FAST) (Rosten and Drummond 2006) – Hessian blobs (Bay, Tuytelaars and Van Gool 2006) • The detectors offer different tradeoffs between repeatability and complexity • See (Mikolajczyk and Schmid 2005) for a comparative performance evaluation of local descriptors in a common framework. (Girod et al., IEEE Signal Processing Magazine 2011)
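To make the first detector in the list concrete, here is a minimal sketch of a Harris-style corner response in plain Python. It is an illustration under stated assumptions, not the detector used by any system named in these slides: the function name `harris_response`, the 3x3 box window (a Gaussian window is more common), and the constant `k=0.05` are all choices made for this example.

```python
def harris_response(img, k=0.05):
    """Return a Harris-style corner response map for a 2-D list of gray values."""
    h, w = len(img), len(img[0])
    # Central-difference gradients; border pixels keep a zero gradient.
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    resp = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Structure tensor summed over a 3x3 window
            # (assumption: box filter instead of a Gaussian).
            sxx = sxy = syy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    sxy += gx * gy
                    syy += gy * gy
            det = sxx * syy - sxy * sxy
            trace = sxx + syy
            resp[y][x] = det - k * trace * trace
    return resp


# Synthetic test image: a bright square whose top-left corner is at (4, 4).
img = [[1.0 if (x >= 4 and y >= 4) else 0.0 for x in range(12)] for y in range(12)]
resp = harris_response(img)
peak = max((resp[y][x], (y, x)) for y in range(12) for x in range(12))[1]
print(peak)
```

On this synthetic image the response is largest at the corner of the square, negative along its straight edges, and zero in flat regions, which is the behaviour an interest-point detector relies on.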
  11. 11. Feature descriptor computation • After interest-point detection, we compute a visual word descriptor on a normalized patch. • Ideally, descriptors should be: – robust to small distortions in scale, orientation, and lighting conditions; – discriminative, i.e., characteristic of an image or a small set of images; – compact, due to typical mobile computing constraints. (Girod et al., IEEE Signal Processing Magazine 2011)
  12. 12. Feature descriptor computation • Examples of feature descriptors in the literature: – SIFT (Lowe 1999) – Speeded-Up Robust Features (SURF) descriptor (Bay et al. 2008) – Gradient Location and Orientation Histogram (GLOH) (Mikolajczyk and Schmid 2005) – Compressed Histogram of Gradients (CHoG) (Chandrasekhar et al. 2009, 2010) • See (Winder and Brown, CVPR 2007), (Winder, Hua, and Brown, CVPR 2009), and (Mikolajczyk and Schmid, PAMI 2005) for comparative performance evaluations of different descriptors. (Girod et al., IEEE Signal Processing Magazine 2011)
  13. 13. Feature descriptor computation • What about compactness? – Option 1: Compress off-the-shelf descriptors. • Result: poor rate-constrained image-retrieval performance. – Option 2: Design a descriptor with compression in mind. • Example: CHoG (Compressed Histogram of Gradients) (Chandrasekhar et al. 2009, 2010) (Girod et al., IEEE Signal Processing Magazine 2011)
  14. 14. CHoG: Compressed Histogram of Gradients [Figure: patch → gradients (dx, dy) → spatial binning → gradient distributions for each bin → histogram compression → CHoG descriptor (a bit string). Source: Bernd Girod, Mobile Visual Search] (Chandrasekhar et al., CVPR 2009, 2010)
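The stages named on the CHoG slide (gradients, spatial binning, per-bin gradient distributions, compression) can be illustrated with a toy descriptor. The sketch below is not the published CHoG algorithm: the 2x2 spatial grid, the four sign-based gradient bins, and the uniform scalar quantization that stands in for the histogram-compression stage are all simplifying assumptions, and `toy_gradient_descriptor` is a hypothetical name.

```python
def toy_gradient_descriptor(patch, levels=4):
    """Quantized per-cell gradient histograms for a square grayscale patch."""
    n = len(patch)
    half = n // 2
    # One 4-bin histogram per spatial cell of a 2x2 grid. The gradient bins
    # are the sign combinations of (dx, dy), a coarse partition of gradient space.
    hists = [[0.0] * 4 for _ in range(4)]
    for y in range(1, n - 1):
        for x in range(1, n - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            if dx == 0 and dy == 0:
                continue  # flat pixels carry no gradient information
            cell = (2 if y >= half else 0) + (1 if x >= half else 0)
            gbin = (2 if dy < 0 else 0) + (1 if dx < 0 else 0)
            hists[cell][gbin] += 1.0
    descriptor = []
    for hist in hists:
        total = sum(hist)
        for count in hist:
            p = count / total if total else 0.0
            # Assumption: uniform scalar quantization of each probability
            # to `levels` values (2 bits per entry when levels == 4).
            descriptor.append(round(p * (levels - 1)))
    return descriptor


# A horizontal ramp has the same gradient direction everywhere.
ramp = [[float(x) for x in range(8)] for _ in range(8)]
desc = toy_gradient_descriptor(ramp)
print(desc)  # 16 small integers, i.e. 32 bits at 2 bits per entry
```

The point of the sketch is the size: 16 entries at 2 bits each is far smaller than a floating-point descriptor, which is the motivation behind designing a descriptor with compression in mind.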
  15. 15. CHoG: Compressed Histogram of Gradients • Performance evaluation – Recall vs. bit rate [Figure 7 (Girod et al., IEEE Multimedia 2011): comparison of different schemes with regard to classification accuracy and query size. Axes: classification accuracy (%) vs. query size (Kbytes). Curves: Send feature (CHoG), Send image (JPEG), Send feature (SIFT). CHoG descriptor data is an order of magnitude smaller compared to JPEG images or uncompressed SIFT descriptors.]
  16. 16. MVS: feature indexing and matching • Goal: produce a data structure that can quickly return a short list of the database candidates most likely to match the query image. – The short list may contain false positives as long as the correct match is included. – Slower pairwise comparisons can be subsequently performed on just the short list of candidates rather than the entire database. • Example of a technique: Vocabulary Tree (VT)-Based Retrieval (Girod et al., IEEE Multimedia 2011)
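The short-list idea can be sketched with a plain inverted index over visual-word IDs. This is a simplified illustration, not vocabulary-tree retrieval as such: it assumes each image has already been quantized into integer visual words (the step a vocabulary tree would perform), it ranks by raw vote counts rather than weighted scores, and the names `build_inverted_index`, `shortlist`, and the toy database are hypothetical.

```python
from collections import Counter, defaultdict


def build_inverted_index(database):
    """Map each visual-word ID to the set of image IDs that contain it."""
    index = defaultdict(set)
    for image_id, words in database.items():
        for word in words:
            index[word].add(image_id)
    return index


def shortlist(index, query_words, top_k=2):
    """Return the top_k image IDs ranked by number of shared visual words."""
    votes = Counter()
    for word in set(query_words):
        for image_id in index.get(word, ()):
            votes[image_id] += 1
    # Sort by votes (descending), then by ID so ties are deterministic.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [image_id for image_id, _ in ranked[:top_k]]


# Toy database: image ID -> visual-word IDs (assumed to come from a quantizer).
database = {
    "cd_cover": [3, 17, 42, 56],
    "dvd_cover": [3, 8, 42, 99],
    "painting": [5, 8, 61, 70],
}
index = build_inverted_index(database)
candidates = shortlist(index, [3, 42, 56, 70])
print(candidates)
```

Only images that share at least one word with the query are ever touched, which is why the short list can be produced quickly; the slower pairwise comparison then runs on `candidates` alone.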
  17. 17. MVS: geometric verification • Goal: use location information of features in query and database images to confirm that the feature matches are consistent with a change in viewpoint between the two images. (Girod et al., IEEE Multimedia 2011)
  18. 18. MVS: geometric verification • Method: perform pairwise matching of feature descriptors and evaluate geometric consistency of correspondences. • Techniques: – The geometric transform between the query and database image is usually estimated using robust regression techniques such as: • Random sample consensus (RANSAC) (Fischler and Bolles 1981) • Hough transform (Lowe 2004) – The transformation is often represented by an affine mapping or a homography. • Note: GV is computationally expensive, which is why it is only used for a subset of images selected during the feature-matching stage. [Figure 4: in the GV step, feature descriptors are matched pairwise to find geometrically consistent feature correspondences.] (Girod et al., IEEE Multimedia 2011)
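As an illustration of the RANSAC option for geometric verification, the sketch below fits an affine mapping to putative feature matches and reports which matches agree with it. It is a minimal example under stated assumptions: the function names, the 1-pixel tolerance, the iteration count, the fixed seed, and the synthetic matches are all hypothetical choices, and a production system would typically refit the model on all inliers afterwards.

```python
import random


def solve_affine(src, dst):
    """Exact affine map from three point pairs (Cramer's rule); None if collinear."""
    (x1, y1), (x2, y2), (x3, y3) = src
    det = x1 * (y2 - y3) - y1 * (x2 - x3) + (x2 * y3 - x3 * y2)
    if abs(det) < 1e-9:
        return None
    rows = []
    for i in (0, 1):  # solve once for x' and once for y'
        u1, u2, u3 = dst[0][i], dst[1][i], dst[2][i]
        a = (u1 * (y2 - y3) - y1 * (u2 - u3) + (u2 * y3 - u3 * y2)) / det
        b = (x1 * (u2 - u3) - u1 * (x2 - x3) + (x2 * u3 - x3 * u2)) / det
        c = (x1 * (y2 * u3 - y3 * u2) - y1 * (x2 * u3 - x3 * u2)
             + u1 * (x2 * y3 - x3 * y2)) / det
        rows.append((a, b, c))
    return rows


def ransac_affine(matches, iterations=200, tol=1.0, seed=0):
    """Return (model, inlier_indices) for the affine map with the most inliers."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        sample = rng.sample(matches, 3)
        model = solve_affine([m[0] for m in sample], [m[1] for m in sample])
        if model is None:
            continue
        (a, b, c), (d, e, f) = model
        inliers = []
        for i, ((x, y), (u, v)) in enumerate(matches):
            err = ((a * x + b * y + c - u) ** 2 + (d * x + e * y + f - v) ** 2) ** 0.5
            if err <= tol:
                inliers.append(i)
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers


def true_map(p):
    """Synthetic ground-truth viewpoint change used to generate test matches."""
    x, y = p
    return (1.2 * x - 0.3 * y + 5.0, 0.4 * x + 0.9 * y - 2.0)


points = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 2), (3, 8), (7, 6), (2, 4)]
matches = [(p, true_map(p)) for p in points]
# Three deliberately wrong matches, as a descriptor matcher might produce.
matches += [((1, 9), (50.0, 50.0)), ((8, 1), (-20.0, 3.0)), ((4, 4), (0.0, 40.0))]
model, inliers = ransac_affine(matches)
print(len(inliers), "of", len(matches), "matches are geometrically consistent")
```

A database image would be accepted when the inlier count clears some threshold. Each hypothesis must be scored against every match, which gives a sense of why this step is reserved for the short list.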
  19. 19. Relevance Why is it relevant?
  20. 20. Relevance • Explosive growth and increasing popularity of mobile devices and apps • (Finally!) a good use case for CBIR • Many commercial opportunities
  21. 21. Mobile visual search: driving factors • Age of mobile computing http://-mobile-phones-than-toothbrushes/
  22. 22. Mobile visual search: driving factors • Why do I need a camera? I have a smartphone… (22 Dec 2011) http://www.cellular-
  23. 23. Mobile visual search: driving factors • Powerful devices – 1 GHz ARM Cortex-A9 processor, PowerVR SGX543MP2, Apple A5 chipset http:// http://-4212.php
  24. 24. Mobile visual search: driving factors • Powerful devices http://-series/808/Nokia808PureView_Whitepaper.pdf http://-fr/produits/mobiles/808/
  25. 25. Mobile visual search: driving factors • Instagram: – 50 million registered users (35 M in the last four months) – 7 employees – A growing ecosystem based on it! • Search • Send postcards • Manage your photos • Build a poster • etc. – Sold to Facebook (for $1 billion!) earlier this year http://-hits-15m-users-and-has-2-people-working-on-an-android-app-right-now/ http://
  26. 26. Mobile visual search: driving factors • A natural use case for CBIR with query by example (QBE) (at last!) – The example is right in front of the user! [Figure 1: a snapshot of an outdoor mobile visual search system being used. The system augments the viewfinder with information about the objects it recognizes in the image taken with a camera phone.] (Girod et al., IEEE Multimedia 2011)
  27. 27. MVS: commercial opportunities • Example app (La Redoute by pixlinQ) http://
  28. 28. Context What else is going on?
  29. 29. Context • Research: datasets and groups • Standardization: MPEG CDVS efforts • Commercial: main players (so far)
  30. 30. Datasets for MVS research • Stanford Mobile Visual Search Data Set – Key characteristics: • rigid objects • widely varying lighting conditions • perspective distortion • foreground and background clutter • realistic ground-truth reference data • query data collected from heterogeneous low and high-end camera phones. (Chandrasekhar et al., ACM MMSys 2011)
  31. 31. SMVS Data Set: categories and examples • DVD covers http://-2011-dataset/stanford/mvs_images/dvd_covers.html
  32. 32. SMVS Data Set: categories and examples • CD covers http://-2011-dataset/stanford/mvs_images/cd_covers.html
  33. 33. SMVS Data Set: categories and examples • Museum paintings http://-2011-dataset/stanford/mvs_images/museum_paintings.html
  34. 34. Other MVS data sets (ISO/IEC JTC1/SC29/WG11/N12202, July 2011, Torino, IT)
  35. 35. MPEG Compact Descriptors for Visual Search (CDVS) • Objective – Define a standard that enables efficient implementation of visual search functionality on mobile devices • Scope – bitstream of descriptors – parts of descriptor extraction process (e.g. key-point detection) needed to ensure interoperability • Additional info: • • (Ad hoc groups) (Bober, Cordara, and Reznik 2010)
  36. 36. MPEG CDVS • Summarized timeline. Table 1. Timeline for development of MPEG standard for visual search. (Girod et al., IEEE Multimedia 2011)
      When | Milestone | Comments
      March 2011 | Call for Proposals is published | Registration deadline: 11 July 2011. Proposals due: 21 November 2011.
      December 2011 | Evaluation of proposals | None
      February 2012 | 1st Working Draft | First specification and test software model that can be used for subsequent improvements.
      July 2012 | Committee Draft | Essentially complete and stabilized specification.
      January 2013 | Draft International Standard | Complete specification. Only minor editorial changes are allowed after DIS.
      July 2013 | Final Draft International Standard | Finalized specification, submitted for approval and publication as International standard.
  37. 37. Commercial apps • SnapTell • oMoby (and the IQ Engines API) • Moodstocks
  38. 38. SnapTell • One of the earliest (ca. 2008) MVS apps for iPhone – Eventually acquired by Amazon (A9) • Proprietary technique (“highly accurate and robust algorithm for image matching: Accumulated Signed Gradient (ASG)”). http://
  39. 39. oMoby (and the IQ Engines API) – iPhone app http://
  40. 40. oMoby (and the IQ Engines API) • The IQ Engines API: “vision as a service” http://
  41. 41. Moodstocks: overview • Offline image recognition thanks to smart synchronization of image signatures http://
  42. 42. Perspective Which challenges and opportunities lie ahead?
  43. 43. MVS: technical challenges • How to ensure low latency (and interactive queries) under constraints such as: – Network bandwidth – Computational power – Battery consumption • How to achieve robust visual recognition in spite of low-resolution cameras, varying lighting conditions, etc. • How to handle broad and narrow domains
  44. 44. Other technical challenges • How to handle the (infamous) semantic gap • Combination of text-based and visual queries • Visualization of results • Users’ needs and intentions
  45. 45. The semantic gap • The semantic gap is the lack of coincidence between the information that one can extract from the visual data and the interpretation that the same data have for a user in a given situation. • “The pivotal point in content-based retrieval is that the user seeks semantic similarity, but the database can only provide similarity by data processing. This is what we called the semantic gap.” [Smeulders et al., 2000]
  46. 46. Alipr
  47. 47. Alipr
  48. 48. Alipr
  49. 49. Alipr
  50. 50. Google similarity search
  51. 51. Google similarity search
  52. 52. Google sort by subject
  53. 53. Google image swirl
  54. 54. Challenge: users’ needs and intentions • Users and developers have quite different views • Cultural and contextual information should be taken into account • User intentions are hard to infer – Privacy issues – Users themselves don’t always know what they want – Who misses the MS Office paper clip?
  55. 55. Concluding thoughts (Mobile) visual search and retrieval is a fascinating research field with many open challenges and opportunities that have the potential to impact the way we organize, annotate, and retrieve visual data (images and videos).
  56. 56. Learn more about it •
  57. 57. Thanks! •  Questions? •  For additional information: Oge  Marques