
Mobile Visual Search



Mobile Visual Search (MVS) is a fascinating research field with many open challenges and opportunities, which have the potential to impact the way we organize, annotate, and retrieve visual data (images and videos) using mobile devices.

This talk is structured in four parts:

1. Opportunities: where I present recent and relevant numbers of the mobile computing market, particularly in the field of photography apps, social networks, and mobile search.

2. Basic concepts: where I explain the basic MVS pipeline and discuss the three main MVS scenarios and associated challenges.

3. Technical aspects: where I briefly cover topics such as feature extraction, indexing, descriptor matching, and geometric verification, discuss the state of the art in these fields, and comment on open problems and research opportunities.

4. Examples and applications: where I show representative examples of academic research and commercial apps in this field.

Published in: Technology, Business

Mobile Visual Search

  1. Mobile Visual Search. Oge Marques, Florida Atlantic University. Universitat Politècnica de Catalunya, Barcelona, 2 Mar 2012
  2. Take-home message: Mobile Visual Search (MVS) is a fascinating research field with many open challenges and opportunities, which have the potential to impact the way we organize, annotate, and retrieve visual data (images and videos) using mobile devices.
  3. Outline • This talk is structured in four parts: 1. Opportunities 2. Basic concepts 3. Technical aspects 4. Examples and applications
  4. Part I: Opportunities
  5. Mobile visual search: driving factors • Age of mobile computing http://-mobile-phones-than-toothbrushes/
  6. Mobile visual search: driving factors • Why do I need a camera? I have a smartphone… (22 Dec 2011) http://www.cellular-
  7. Mobile visual search: driving factors • Powerful devices: 1 GHz ARM Cortex-A9 processor, PowerVR SGX543MP2 GPU, Apple A5 chipset http://-4212.php
  8. Mobile visual search: driving factors • Powerful devices http://-series/808/Nokia808PureView_Whitepaper.pdf http://-fr/produits/mobiles/808/
  9. Mobile visual search: driving factors • Social networks and mobile devices (May 2011) http://-universe-2/
  10. Mobile visual search: driving factors • Social networks and mobile devices – Motivated users: image taking and image sharing are huge! http://www.onlinemarketing--photo-statistics-and-insights.html
  11. Mobile visual search: driving factors • Instagram: – 15 million registered users (in 13 months) – 7 employees – A (growing) ecosystem based on it! • Search • Send postcards • Manage your photos • Build a poster • etc. http://-hits-15m-users-and-has-2-people-working-on-an-android-app-right-now/
  12. Mobile visual search: driving factors • Legitimate (or not quite…) needs and use cases http://-by-sight-google-goggles https://twitt!/courtanee/status/14704916575
  13. Mobile visual search: driving factors • A natural use case for content-based image retrieval (CBIR) with query by example (QBE) (at last!) – The example is right in front of the user! [FIG1] A snapshot of an outdoor mobile visual search system being used. The system augments the viewfinder with information about the objects it recognizes in the image taken with a camera phone. (Girod et al., IEEE MultiMedia 2011)
  14. Part II: Basic concepts
  15. MVS: technical challenges • How to ensure low latency (and interactive queries) under constraints such as: – Network bandwidth – Computational power – Battery consumption • How to achieve robust visual recognition in spite of low-resolution cameras, varying lighting conditions, etc. • How to handle broad and narrow domains
  16. MVS: Pipeline for image retrieval (Girod et al., IEEE MultiMedia 2011)
  17. 3 scenarios (Girod et al., IEEE MultiMedia 2011)
  18. Part III: Technical aspects
  19. Part III - Outline • The MVS pipeline in greater detail • Datasets for MVS research • MPEG Compact Descriptors for Visual Search (CDVS)
  20. MVS: descriptor extraction • Interest point detection • Feature descriptor computation (Girod et al., IEEE MultiMedia 2011)
  21. Interest point detection • Numerous interest-point detectors have been proposed in the literature: – Harris Corners (Harris and Stephens 1988) – Scale-Invariant Feature Transform (SIFT) Difference-of-Gaussian (DoG) (Lowe 2004) – Maximally Stable Extremal Regions (MSERs) (Matas et al. 2002) – Hessian affine (Mikolajczyk et al. 2005) – Features from Accelerated Segment Test (FAST) (Rosten and Drummond 2006) – Hessian blobs (Bay, Tuytelaars and Van Gool 2006) • Different tradeoffs in repeatability and complexity • See (Mikolajczyk and Schmid 2005) for a comparative performance evaluation of local descriptors in a common framework. (Girod et al., IEEE Signal Processing Magazine 2011)
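
To make the detector list above more concrete, here is a minimal sketch of the segment test behind FAST, one of the cheapest detectors named on the slide. It is an illustration only: the threshold `t`, the arc length `n`, the helper names, and the synthetic images are all assumptions, and the published detector adds speed-ups and non-maximum suppression that are omitted here.

```python
# Illustrative sketch of the FAST segment test (Rosten and Drummond 2006).
# The threshold t, the arc length n, and all function names are assumptions.

# Offsets (dx, dy) of the 16 pixels on a Bresenham circle of radius 3.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def longest_circular_run(flags):
    """Length of the longest run of True values, treating the list as circular."""
    if all(flags):
        return len(flags)
    best = run = 0
    for flag in flags + flags:  # doubling the list handles wrap-around
        run = run + 1 if flag else 0
        best = max(best, run)
    return best


def is_fast_corner(img, x, y, t=20, n=9):
    """True if n contiguous circle pixels are all brighter than img[y][x] + t
    or all darker than img[y][x] - t. img is a list of rows of grey levels."""
    centre = img[y][x]
    ring = [img[y + dy][x + dx] for dx, dy in CIRCLE]
    brighter = [p > centre + t for p in ring]
    darker = [p < centre - t for p in ring]
    return max(longest_circular_run(brighter), longest_circular_run(darker)) >= n


def step_image(size, corner):
    """Synthetic test image: a bright quadrant (corner=True) or a half-plane."""
    return [[100 if x >= 5 and (y >= 5 or not corner) else 0
             for x in range(size)] for y in range(size)]
```

On these synthetic images the test fires at the tip of the bright quadrant but not on the straight edge, which is the behaviour that makes corners more repeatable than edge points.
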
  22. Feature descriptor computation • After interest-point detection, we compute a visual word descriptor on a normalized patch. • Ideally, descriptors should be: – robust to small distortions in scale, orientation, and lighting conditions; – discriminative, i.e., characteristic of an image or a small set of images; – compact, due to typical mobile computing constraints. (Girod et al., IEEE Signal Processing Magazine 2011)
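
The robustness requirement above can be illustrated with a toy descriptor. The sketch below computes an L2-normalized histogram of gradient orientations over a patch; it is far simpler than SIFT, SURF, or CHoG and is not taken from the talk, and the function name, bin count, and test patch are assumptions. It only shows why gradient-based, normalized descriptors tolerate brightness and contrast changes.

```python
import math


def orientation_histogram(patch, bins=8):
    """L2-normalized histogram of gradient orientations over a grey-level patch.

    patch is a list of rows; gradients use central differences on interior
    pixels, and each pixel votes with its gradient magnitude.
    """
    hist = [0.0] * bins
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            dx = (patch[y][x + 1] - patch[y][x - 1]) / 2.0
            dy = (patch[y + 1][x] - patch[y - 1][x]) / 2.0
            magnitude = math.hypot(dx, dy)
            if magnitude == 0.0:
                continue  # flat pixels carry no orientation
            angle = math.atan2(dy, dx) % (2.0 * math.pi)
            hist[int(angle / (2.0 * math.pi) * bins) % bins] += magnitude
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0.0 else hist


# Hypothetical patch and a re-lit copy (doubled contrast, brightness offset).
patch = [[3 * x + y * y for x in range(5)] for y in range(5)]
relit = [[2 * v + 10 for v in row] for row in patch]
same = all(abs(p - q) < 1e-9 for p, q in
           zip(orientation_histogram(patch), orientation_histogram(relit)))
```

The brightness offset vanishes when differences are taken and the contrast factor is removed by the normalization, so `same` holds for this pair.
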
  23. Feature descriptor computation • Examples of feature descriptors in the literature: – SIFT (Lowe 1999) – Speeded-Up Robust Features (SURF) (Bay et al. 2008) – Gradient Location and Orientation Histogram (GLOH) (Mikolajczyk and Schmid 2005) – Compressed Histogram of Gradients (CHoG) (Chandrasekhar et al. 2009, 2010) • See (Winder, (Hua,) and Brown CVPR 2007, 2009) and (Mikolajczyk and Schmid PAMI 2005) for comparative performance evaluation of different descriptors. (Girod et al., IEEE Signal Processing Magazine 2011)
  24. Feature descriptor computation • What about compactness? – Option 1: Compress off-the-shelf descriptors. • Result: poor rate-constrained image-retrieval performance. – Option 2: Design a descriptor with compression in mind. • Example: CHoG (Compressed Histogram of Gradients) (Chandrasekhar et al. 2009, 2010) (Girod et al., IEEE Signal Processing Magazine 2011)
  25. CHoG: Compressed Histogram of Gradients [Figure labels: patch; gradients (dx, dy); spatial binning; gradient distributions for each bin; histogram compression; CHoG descriptor. Credit: Bernd Girod, Mobile Visual Search] (Chandrasekhar et al., CVPR 2009, 2010)
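
As a rough illustration of what "histogram compression" can mean, the sketch below rounds a normalized histogram to integer counts that sum to a fixed total `n` (a so-called type), so that the histogram can be sent as a fixed-length index. This is a generic technique chosen for illustration; it is not claimed to reproduce the coding schemes of the CHoG papers, and the function names and parameters are assumptions.

```python
import math


def quantize_to_type(hist, n):
    """Round a normalized histogram to integer counts k_i that sum to n."""
    scaled = [p * n for p in hist]
    counts = [math.floor(s + 0.5) for s in scaled]
    # Rounding can leave the total off by a few counts; repair it on the bins
    # whose rounding error is largest in the direction we need to move.
    while sum(counts) != n:
        step = 1 if sum(counts) < n else -1
        errors = [(s - c) * step for s, c in zip(scaled, counts)]
        candidates = [i for i, c in enumerate(counts) if step > 0 or c > 0]
        counts[max(candidates, key=lambda i: errors[i])] += step
    return counts


def bits_per_histogram(m, n):
    """Fixed-length index size: there are C(n + m - 1, m - 1) types."""
    return math.ceil(math.log2(math.comb(n + m - 1, m - 1)))
```

For a 3-bin histogram with `n = 10` there are 66 possible types, so the index fits in 7 bits; raising `n` trades more bits for a finer approximation of the histogram.
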
  26. CHoG: Compressed Histogram of Gradients • Performance evaluation – Recall vs. bit rate. Figure 7: Comparison of different schemes with regard to classification accuracy and query size (axes: classification accuracy (%) vs. query size (Kbytes); schemes: send feature (CHoG), send image (JPEG), send feature (SIFT)). CHoG descriptor data is an order of magnitude smaller compared to JPEG images or uncompressed SIFT descriptors. (Girod et al., IEEE MultiMedia 2011)
  27. MVS: feature indexing and matching • Goal: produce a data structure that can quickly return a short list of the database candidates most likely to match the query image. – The short list may contain false positives as long as the correct match is included. – Slower pairwise comparisons can be subsequently performed on just the short list of candidates rather than the entire database. • Example of a technique: Vocabulary Tree (VT)-Based Retrieval (Girod et al., IEEE MultiMedia 2011)
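
The short-list idea above can be sketched with an inverted index over quantized visual words. To stay brief, the sketch assumes the descriptors have already been quantized to word IDs by some flat vocabulary rather than by a vocabulary tree, and it scores images with an unnormalized TF-IDF sum. The database contents, image names, and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_inverted_index(database):
    """Map each visual word to the images containing it, with term counts."""
    index = defaultdict(dict)
    for image_id, words in database.items():
        for word, count in Counter(words).items():
            index[word][image_id] = count
    return index


def short_list(query_words, index, num_images, k=2):
    """Score only images that share a word with the query; return the top k."""
    scores = Counter()
    for word, q_count in Counter(query_words).items():
        postings = index.get(word, {})
        if not postings:
            continue
        idf = math.log(num_images / len(postings))  # rare words weigh more
        for image_id, d_count in postings.items():
            scores[image_id] += q_count * d_count * idf * idf
    return [image_id for image_id, _ in scores.most_common(k)]


# Hypothetical database: each image is a list of quantized visual-word IDs.
database = {
    "dvd_cover": [1, 4, 4, 7, 9],
    "cd_cover": [2, 4, 5, 8],
    "painting": [3, 6, 6, 8, 9],
}
index = build_inverted_index(database)
candidates = short_list([4, 4, 7, 1], index, len(database))
```

Only images that share at least one word with the query are touched, which is why the lookup stays fast as the database grows. The short list may still contain false positives (here `cd_cover`), and those are left for the slower pairwise stage to reject.
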
  28. MVS: geometric verification • Goal: use location information of features in query and database images to confirm that the feature matches are consistent with a change in viewpoint between the two images. (Girod et al., IEEE MultiMedia 2011)
  29. MVS: geometric verification • Method: perform pairwise matching of feature descriptors and evaluate geometric consistency of correspondences. • Techniques: – The geometric transform between the query and database image is usually estimated using robust regression techniques such as: • Random sample consensus (RANSAC) (Fischler and Bolles 1981) • Hough transform (Lowe 2004) – The transformation is often represented by an affine mapping or a homography. • Note: GV is computationally expensive, which is why it’s only used for a subset of images selected during the feature-matching stage. [FIG4: pairwise matching of feature descriptors and geometrically consistent feature correspondences in the GV step] (Girod et al., IEEE MultiMedia 2011)
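
A minimal RANSAC sketch may help make the verification step concrete. For brevity it fits a similarity transform (rotation, scale, translation) instead of the affine mapping or homography mentioned on the slide, and it represents 2-D points as complex numbers. The tolerance, iteration count, seed, and synthetic matches are all assumptions.

```python
import random


def fit_similarity(p1, p2, q1, q2):
    """Similarity transform q = a * p + b from two correspondences.

    Points are complex numbers x + yj; a encodes rotation and scale,
    b encodes translation.
    """
    a = (q1 - q2) / (p1 - p2)
    return a, q1 - a * p1


def ransac_similarity(matches, iterations=200, tol=2.0, seed=0):
    """Return (a, b, inliers) for the hypothesis with the most inliers."""
    rng = random.Random(seed)
    best = (None, None, [])
    for _ in range(iterations):
        (p1, q1), (p2, q2) = rng.sample(matches, 2)
        if p1 == p2:
            continue  # degenerate sample, no unique transform
        a, b = fit_similarity(p1, p2, q1, q2)
        inliers = [(p, q) for p, q in matches if abs(a * p + b - q) <= tol]
        if len(inliers) > len(best[2]):
            best = (a, b, inliers)
    return best


# Synthetic matches: 8 follow a known transform, 4 are wrong matches.
true_a, true_b = 1.5 * complex(0.8, 0.6), complex(20, -5)  # rotate, scale, shift
points = [complex(x, y) for x, y in
          [(0, 0), (10, 0), (0, 10), (10, 10), (5, 2), (3, 8), (7, 7), (1, 4)]]
matches = [(p, true_a * p + true_b) for p in points]
matches += [(complex(2, 2), complex(90, 90)), (complex(8, 1), complex(-40, 30)),
            (complex(4, 9), complex(60, -70)), (complex(6, 5), complex(-80, -80))]
a, b, inliers = ransac_similarity(matches)
```

A database image would be accepted when the inlier count clears some threshold. Each hypothesis is checked against every match, so the cost grows with both the number of iterations and the number of matches, which is consistent with running GV only on the short list.
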
  30. Datasets for MVS research • Stanford Mobile Visual Search Data Set – Key characteristics: • rigid objects • widely varying lighting conditions • perspective distortion • foreground and background clutter • realistic ground-truth reference data • query data collected from heterogeneous low and high-end camera phones. (Chandrasekhar et al., ACM MMSys 2011)
  31. SMVS Data Set: categories and examples • DVD covers http://-2011-dataset/stanford/mvs_images/dvd_covers.html
  32. SMVS Data Set: categories and examples • CD covers http://-2011-dataset/stanford/mvs_images/cd_covers.html
  33. SMVS Data Set: categories and examples • Museum paintings http://-2011-dataset/stanford/mvs_images/museum_paintings.html
  34. Other MVS data sets (ISO/IEC JTC1/SC29/WG11/N12202, July 2011, Torino, IT)
  35. MPEG Compact Descriptors for Visual Search (CDVS) • Objective – Define a standard that enables efficient implementation of visual search functionality on mobile devices • Scope – bitstream of descriptors – parts of descriptor extraction process (e.g. key-point detection) needed to ensure interoperability • Additional info: (Ad hoc groups) (Bober, Cordara, and Reznik 2010)
  36. MPEG CDVS • Summarized timeline. Table 1: Timeline for development of MPEG standard for visual search. (Girod et al., IEEE MultiMedia 2011)
     When | Milestone | Comments
     March, 2011 | Call for Proposals is published | Registration deadline: 11 July 2011. Proposals due: 21 November 2011.
     December, 2011 | Evaluation of proposals | None
     February, 2012 | 1st Working Draft | First specification and test software model that can be used for subsequent improvements.
     July, 2012 | Committee Draft | Essentially complete and stabilized specification.
     January, 2013 | Draft International Standard | Complete specification. Only minor editorial changes are allowed after DIS.
     July, 2013 | Final Draft International Standard | Finalized specification, submitted for approval and publication as International standard.
  37. Part IV: Examples and applications
  38. Examples • Google Goggles • SnapTell • oMoby (and the IQ Engines API) • pixlinQ • Moodstocks
  39. Examples of commercial MVS apps • Google Goggles – Android and iPhone – Narrow-domain search and retrieval
  40. SnapTell • One of the earliest (ca. 2008) MVS apps for iPhone – Eventually acquired by Amazon (A9) • Proprietary technique (“highly accurate and robust algorithm for image matching: Accumulated Signed Gradient (ASG)”).
  41. oMoby (and the IQ Engines API) – iPhone app
  42. oMoby (and the IQ Engines API) • The IQ Engines API: “vision as a service”
  43. pixlinQ • A “mobile visual search solution that enables you to link users to digital content whenever they take a mobile picture of your printed materials.” – Powered by image recognition from LTU technologies
  44. pixlinQ • Example app (La Redoute)
  45. Moodstocks: overview • Offline image recognition thanks to smart synchronization of image signatures
  46. Moodstocks: technology • Unique features: – offline image recognition thanks to smart synchronization of image signatures, – QR Code decoding, – EAN 8/13 decoding, – online image recognition as a fallback for very large image databases, – simultaneous run of image recognition and barcode decoding, – seamless logging of scans in the background. • Cross-platform (iOS / Android) client-side SDK and HTTP API available • JPEG encoder used within their SDK also publicly available
  47. Moodstocks • Many successful apps for different platforms
  48. Concluding thoughts
  49. Concluding thoughts • Mobile Visual Search (MVS) is coming of age. • This is not a fad and it can only grow. • Still a good research topic – Many relevant technical challenges – MPEG efforts have just started • Infinite creative commercial possibilities
  50. Thanks! • Questions? • For additional information: Oge Marques