Mobile Visual Search	


     Oge Marques	

 Florida Atlantic University	

   Boca Raton, FL - USA	


TEWI Kolloquium – 24 Jan 2012
  
Take-home message	



Mobile Visual Search (MVS) is a fascinating research
field with many open challenges and opportunities
which have the potential to impact the way we
organize, annotate, and retrieve visual data (images
and videos) using mobile devices.	





Oge Marques
Outline	

•  This talk is structured in four parts:	


   1.  Opportunities	


   2.  Basic concepts	


   3.  Technical details	


   4.  Examples and applications	


Part I	


Opportunities
Mobile visual search: driving factors	

  •  Age of mobile computing	





http://60secondmarketer.com/blog/2011/10/18/more-mobile-phones-than-toothbrushes/
  
Mobile visual search: driving factors	

  •  Smartphone market	





http://www.idc.com/getdoc.jsp?containerId=prUS23123911
  
Mobile visual search: driving factors	

  •  Smartphone market	





http://www.cellular-news.com/story/48647.php?s=h
  
Mobile visual search: driving factors	

  •  Why do I need a camera? I have a smartphone… 	





http://www.cellular-news.com/story/52382.php
  
Mobile visual search: driving factors	

  •  Powerful devices	





1 GHz ARM Cortex-A9 processor, PowerVR SGX543MP2, Apple A5 chipset

http://www.apple.com/iphone/specs.html
http://www.gsmarena.com/apple_iphone_4s-4212.php
  
Mobile visual search: driving factors	

  •  Social networks and mobile devices (May 2011)

http://jess3.com/geosocial-universe-2/
  
Mobile visual search: driving factors	

  •  Social networks and mobile devices	

           –  Motivated users: image taking and image sharing are
              huge!	





           	



http://www.onlinemarketing-trends.com/2011/03/facebook-photo-statistics-and-insights.html
  
Mobile visual search: driving factors	

                                  •  Instagram: 	

                                          –  13 million registered (although not
                                             necessarily active) users (in 13 months)	

                                          –  7 employees	

                                          –  Several apps based on it!	





                                  	


http://venturebeat.com/2011/11/18/instagram-13-million-users/
  
Mobile visual search: driving factors	

  •  Food photo
     sharing!	





  	





http://mashable.com/2011/05/09/foodtography-infographic/
  
Mobile visual search: driving factors	

  •  Legitimate (or not quite…) needs and use cases	





http://www.slideshare.net/dtunkelang/search-by-sight-google-goggles
https://twitter.com/#!/courtanee/status/14704916575
  
Mobile visual search: driving factors

  •  A natural use case for CBIR with QBE (at last!)

     –  The example is right in front of the user!

Today, the most successful algorithms for content-based image retrieval use an approach referred to as bag of features (BoF) or bag of words (BoW). The BoW idea is borrowed from text retrieval: to find a particular text document, such as a Web page, it is sufficient to use a few well-chosen words.

[FIG1] A snapshot of an outdoor mobile visual search system being used. The system augments the viewfinder with information about the objects it recognizes in the image taken with a camera phone.

[FIG2] A pipeline for image retrieval: features are extracted from the query image, feature matching finds database images that share many features with the query, and the GV step rejects matches whose feature locations are not plausible under a change in viewing position.

Girod et al. IEEE Multimedia 2011
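The BoW analogy with text retrieval can be made concrete with a small sketch. This is a toy illustration, not the deck's implementation: the visual-word IDs below are hypothetical (a real system quantizes local descriptors such as SIFT into words first), and the cosine comparison is one common, simple choice of similarity.

```python
import numpy as np

def bow_histogram(words, vocab_size):
    """Bag-of-words histogram: an image is represented by the counts
    of its quantized local features ('visual words'), just as a text
    document can be represented by word counts."""
    h = np.bincount(words, minlength=vocab_size).astype(float)
    return h / max(h.sum(), 1.0)

def cosine(a, b):
    """Cosine similarity between two normalized word histograms."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Toy vocabulary of 8 visual words; three 'images' as word-ID lists
img_a = np.array([0, 0, 1, 3, 3, 3, 5])
img_b = np.array([0, 1, 3, 3, 5, 5])   # shares most words with A
img_c = np.array([2, 4, 6, 6, 7, 7])   # disjoint vocabulary
ha, hb, hc = (bow_histogram(w, 8) for w in (img_a, img_b, img_c))
```

Images sharing many visual words (A and B) score high; images with disjoint vocabularies (A and C) score near zero, which is exactly how a few well-chosen words identify a document.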
  
Part II	


Basic concepts
MVS: technical challenges	

•  How to ensure low latency (and interactive
   queries) under constraints such as:	

  –  Network bandwidth	

  –  Computational power 	

  –  Battery consumption	

•  How to achieve robust visual recognition in spite
   of low-resolution cameras, varying lighting
   conditions, etc.	

•  How to handle broad and narrow domains	


MVS: Pipeline for image retrieval	





Girod et al. IEEE Multimedia 2011
  
3 scenarios

•  Where the work runs: the client sends the query image and the server performs retrieval; the client extracts and transmits features; or (part of) the database is cached on the device and matching runs locally.





Girod et al. IEEE Multimedia 2011
  
Part III	


Technical details
Part III - Outline	

•  The MVS pipeline in greater detail	


•  Datasets for MVS research	


•  MPEG Compact Descriptors for Visual Search
   (CDVS)	





MVS: descriptor extraction	

    •  Interest point detection	

    •  Feature descriptor computation	





Girod et al. IEEE Multimedia 2011
  
Interest point detection	

   •  Numerous interest-point detectors have been
      proposed in the literature:	

              –  Harris Corners (Harris and Stephens 1988)	

              –  Scale-Invariant Feature Transform (SIFT) Difference-of-
                 Gaussian (DoG) (Lowe 2004)	

              –  Maximally Stable Extremal Regions (MSERs) (Matas et al.
                 2002)	

              –  Hessian affine (Mikolajczyk et al. 2005)	

              –  Features from Accelerated Segment Test (FAST) (Rosten
                 and Drummond 2006)	

              –  Hessian blobs (Bay, Tuytelaars and Van Gool 2006) 	

              –  etc.	


Girod et al. IEEE Signal Processing Magazine 2011
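To make the first entry in the list above concrete, here is a minimal NumPy sketch of the Harris corner response (Harris and Stephens 1988). The box smoothing window, window radius, and k are illustrative simplifications; practical implementations use Gaussian weighting and non-maximum suppression.

```python
import numpy as np

def harris_response(img, k=0.04):
    """Harris corner response R = det(M) - k * trace(M)^2, where M is
    the (smoothed) second-moment matrix of image gradients. Corners
    are local maxima of R; edges give negative R; flat regions ~0."""
    # Image gradients via central differences
    Ix = np.gradient(img, axis=1)
    Iy = np.gradient(img, axis=0)

    def box(a, r=2):
        # Box-window sum over a (2r+1)x(2r+1) neighborhood
        out = np.zeros_like(a)
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                out += np.roll(np.roll(a, dy, 0), dx, 1)
        return out

    Sxx, Syy, Sxy = box(Ix * Ix), box(Iy * Iy), box(Ix * Iy)
    return (Sxx * Syy - Sxy ** 2) - k * (Sxx + Syy) ** 2

# Toy example: a white square on black background. The response is
# positive at the square's corners, negative along its edges, and
# zero in the flat interior.
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0
R = harris_response(img)
```

The single response formula explains the detector's behavior: at a corner both eigenvalues of M are large (det dominates, R > 0), at an edge only one is large (R < 0), and in flat regions both vanish.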
  
Interest point detection	

   •  Different tradeoffs in repeatability and complexity:	

               –  SIFT DoG and other affine interest-point detectors are slow to
                  compute but highly repeatable.

               –  The SURF interest-point detector provides a significant speedup
                  over DoG interest-point detectors by using box filters and
                  integral images for fast computation.

                          •  However, the box-filter approximation causes significant anisotropy, i.e.,
                             the matching performance varies with the relative orientation of query
                             and database images.

               –  The FAST corner detector is extremely fast but offers very low
                  repeatability.


   •  See (Mikolajczyk and Schmid 2005) for a comparative
      performance evaluation of local descriptors in a common
      framework. 	


Girod et al. IEEE Signal Processing Magazine 2011
  
Feature descriptor computation	

   •  After interest-point detection, we compute a
      visual word descriptor on a normalized patch. 	


   •  Ideally, descriptors should be:	

              –  robust to small distortions in scale, orientation, and
                 lighting conditions;	

              –  discriminative, i.e., characteristic of an image or a small
                 set of images;	

              –  compact, due to typical mobile computing constraints.	



Girod et al. IEEE Signal Processing Magazine 2011
  
Feature descriptor computation	

   •  Examples of feature descriptors in the literature:	

              –  SIFT (Lowe 1999)	

               –  Speeded-Up Robust Features (SURF) (Bay et al. 2008)

              –  Gradient Location and Orientation Histogram (GLOH)
                 (Mikolajczyk and Schmid 2005)	

              –  Compressed Histogram of Gradients (CHoG)
                 (Chandrasekhar et al. 2009, 2010)	

    •  See (Winder and Brown CVPR 2007), (Winder, Hua, and Brown
       CVPR 2009), and (Mikolajczyk and Schmid PAMI 2005) for
       comparative performance evaluations of different descriptors.

Girod et al. IEEE Signal Processing Magazine 2011
  
Feature descriptor computation	

   •  What about compactness?	

               –  Several attempts in the literature to compress off-the-shelf
                  descriptors did not lead to the best rate-constrained
                  image-retrieval performance.

              –  Alternative: design a descriptor with compression in
                 mind. 	





Girod et al. IEEE Signal Processing Magazine 2011
  
Feature descriptor computation	

   •  CHoG (Compressed Histogram of Gradients) 
      (Chandrasekhar et al. 2009, 2010)	


              –  Based on the distribution of gradients within a patch of pixels 	


              –  Histogram of gradient (HoG)-based descriptors [e.g. (Lowe
                 2004), (Bay et al. 2008), (Dalal and Triggs 2005), (Freeman and
                 Roth 1994), and (Winder et al. 2009)] have been shown to be
                 highly discriminative at low bit rates.	





Girod et al. IEEE Signal Processing Magazine 2011
  
CHoG: Compressed Histogram of Gradients

[Figure] CHoG descriptor pipeline: a patch around the interest point is divided into spatial bins; gradient distributions (dx, dy) are computed for each bin; the resulting per-bin gradient histograms are compressed to yield the CHoG descriptor bitstream. (Source: Bernd Girod, Mobile Visual Search)

Chandrasekhar et al. CVPR 09, 10
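The patch → spatial binning → gradient histogram → compression pipeline can be sketched with a toy gradient-histogram descriptor. This is in the spirit of CHoG only: the 2x2 spatial binning, 8 orientation bins, and 2-bit rounding below are illustrative choices, not the published CHoG parameters (which use learned bin configurations and entropy coding).

```python
import numpy as np

def hog_like_descriptor(patch, n_orient=8):
    """Toy CHoG-style descriptor: split the patch into 2x2 spatial
    bins, build a magnitude-weighted gradient-orientation histogram
    per bin, then coarsely quantize each normalized histogram so the
    descriptor is compact (each count rounded to a 2-bit level)."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx)          # gradient orientation in [-pi, pi]
    H, W = patch.shape
    desc = []
    for by in range(2):
        for bx in range(2):
            sl = (slice(by * H // 2, (by + 1) * H // 2),
                  slice(bx * W // 2, (bx + 1) * W // 2))
            hist, _ = np.histogram(ang[sl], bins=n_orient,
                                   range=(-np.pi, np.pi),
                                   weights=mag[sl])
            if hist.sum() > 0:
                hist = hist / hist.sum()
            desc.append(np.round(hist * 3).astype(np.uint8))
    return np.concatenate(desc)       # 4 x 8 values, 2 bits each

# A vertical intensity ramp: all gradient energy falls into a single
# orientation bin in every spatial bin
patch = np.outer(np.linspace(0, 1, 16), np.ones(16))
d = hog_like_descriptor(patch)
```

Quantizing the normalized histograms (rather than raw floats) is what makes the descriptor compact: here each patch costs 64 bits instead of 32 floats.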
  
Encoding descriptor's location information

  •  Location Histogram Coding (LHC)

     –  Rationale: Interest-point locations in images tend to cluster
        spatially.

     –  Encoding the locations of a set of N features as a histogram
        reduces the bit rate by log(N!) compared to encoding each
        feature location in sequence, because the features can be sent
        in any order: there are N! orderings that represent the same
        feature set.

[FIG S3] Interest-point locations in images tend to cluster spatially.

Girod et al. IEEE Signal Processing Magazine 2011
  
Encoding descriptor's location information

  •  Location Histogram Coding (LHC)

  •  Method:

     1.  Generate a 2D histogram from the locations of the descriptors.

         •  Divide the image into spatial bins and count the number of
            features within each spatial bin.

     2.  Compress the binary map, indicating which spatial bins contain
         features, and a sequence of feature counts, representing the
         number of features in occupied bins.

     3.  Encode the binary map using a trained context-based arithmetic
         coder, with the neighboring bins being used as the context for
         each spatial bin.

  •  In experiments, quantizing the (x, y) location to four-pixel blocks
     is sufficient for GV. With a simple fixed-length code, the rate is
     log2(640/4) + log2(640/4), roughly 14 b/feature for a VGA-size
     image; using LHC, the same location data can be transmitted with
     about 5 b/descriptor: roughly a 12.5x reduction in data compared to
     a 64-b floating-point representation and a 2.8x rate reduction
     compared to fixed-length coding.

[FIG S4] We represent the location of the descriptors using a location histogram. The image is first divided into evenly spaced blocks. We enumerate the features within each spatial block by generating a location histogram.

Girod et al. IEEE Signal Processing Magazine 2011
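Steps 1 and 2 of the method above can be sketched directly under the slide's assumptions (VGA image, four-pixel blocks). The helper names `location_histogram` and `ordering_gain_bits` are hypothetical, and the trained context-based arithmetic coding of step 3 is omitted.

```python
import numpy as np
from math import lgamma, log2

def location_histogram(points, img_w=640, img_h=480, block=4):
    """Quantize (x, y) feature locations to block-pixel cells and
    return the binary occupancy map plus the feature counts of the
    occupied cells (steps 1-2; the arithmetic coder is omitted)."""
    hist = np.zeros((img_h // block, img_w // block), dtype=int)
    for x, y in points:
        hist[int(y) // block, int(x) // block] += 1
    binary_map = (hist > 0).astype(np.uint8)
    counts = hist[hist > 0]   # row-major order of occupied cells
    return binary_map, counts

def ordering_gain_bits(n):
    """log2(N!) bits saved because the N locations form an unordered
    set: any of the N! transmission orders encodes the same set."""
    return lgamma(n + 1) / np.log(2)

# Fixed-length baseline from the slide's formula (about 14 b/feature)
fixed_rate = 2 * log2(640 / 4)

# Clustered interest points: two pairs land in the same 4-pixel cell,
# so the occupancy map is much sparser than the point list
pts = [(10, 12), (11, 13), (300, 200), (302, 201)]
bmap, counts = location_histogram(pts)
```

The clustering is exactly what the context-based coder of step 3 exploits: occupied cells predict occupied neighbors, so the binary map compresses well.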
  
MVS: feature indexing and matching	

    •  Goal: produce a data structure that can quickly return a short
       list of the database candidates most likely to match the query
       image. 	

               –  The short list may contain false positives as long as the correct match
                  is included. 	

               –  Slower pairwise comparisons can be subsequently performed on just
                  the short list of candidates rather than the entire database.	





Girod et al. IEEE Multimedia 2011
  
MVS: feature indexing and matching

  •  Vocabulary Tree (VT)-Based Retrieval

     –  Hierarchical k-means clustering is applied to the training
        descriptors assigned to each cluster, generating k smaller
        clusters; this recursive division of the descriptor space is
        repeated until there are enough bins to ensure good
        classification performance.

     –  Figure B1 shows a VT with only two levels, branching factor
        k = 3, and 3^2 = 9 leaf nodes. In practice, the VT can be much
        larger, for example, with height 6, branching factor k = 10,
        and containing 10^6 = 1 million nodes.

     –  The associated inverted index structure maintains two lists for
        each VT leaf node, as shown in Figure B2.

     –  During a query, the VT is traversed for each feature in the
        query image, finishing at one of the leaf nodes. The
        corresponding lists of images and frequency counts are
        subsequently used to compute similarity scores between these
        images and the query image. By pulling images from all these
        lists and sorting them according to the scores, we arrive at a
        subset of database images that is likely to contain a true
        match to the query image.

Figure B. (1) Vocabulary tree and (2) inverted index structures.

Girod et al. IEEE Multimedia 2011
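The two-level VT example (k = 3, 9 leaves) can be sketched with a tiny recursive k-means plus an inverted index. This is a toy: the helper names `kmeans`, `build_vt`, and `quantize` are hypothetical, the 2-D "descriptors" stand in for 128-D SIFT-like vectors, and real trees have on the order of a million leaves as noted above.

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=20):
    """Minimal Lloyd's k-means used to split the descriptor space."""
    C = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(iters):
        a = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(a == j):
                C[j] = X[a == j].mean(0)
    return C

def build_vt(X, k=3, depth=2):
    """Vocabulary tree: recursively cluster the training descriptors
    assigned to each node until the desired depth (k**depth leaves)."""
    node = {"centroids": kmeans(X, k), "children": None}
    if depth > 1:
        a = np.argmin(((X[:, None] - node["centroids"][None]) ** 2).sum(-1), 1)
        node["children"] = [build_vt(X[a == j], k, depth - 1)
                            for j in range(k)]
    return node

def quantize(node, x, path=0, k=3):
    """Traverse the VT for one descriptor; return its leaf index."""
    j = int(np.argmin(((node["centroids"] - x) ** 2).sum(-1)))
    if node["children"] is None:
        return path * k + j
    return quantize(node["children"][j], x, path * k + j, k)

# Train on toy descriptors, then build an inverted index mapping each
# leaf ('visual word') to the ids of images whose features landed there
X = rng.normal(size=(300, 2))
tree = build_vt(X, k=3, depth=2)
inverted = {}
for img_id, x in enumerate(X[:10]):
    inverted.setdefault(quantize(tree, x), []).append(img_id)
```

At query time, only the short image lists hanging off the visited leaves are touched, which is what makes VT lookup fast compared to scanning the whole database.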
  
MVS: geometric verification	

    •  Goal: use the location information of features in query and
       database images to confirm that the feature matches are
       consistent with a change in viewpoint between the two images.





Girod et al. IEEE Multimedia 2011
  
MVS: geometric verification

  •  Method: perform pairwise matching of feature descriptors and
     evaluate the geometric consistency of the correspondences.

[FIG4] In the GV step, we match feature descriptors pairwise and find feature correspondences that are consistent with a geometric model. True feature matches are shown in red. False feature matches are shown in green.

Girod et al. IEEE Multimedia 2011
  
MVS: geometric verification	

    •  Techniques: 	

               –  The geometric transform between the query and database
                  image is usually estimated using robust regression
                  techniques such as:	

                          •  Random sample consensus (RANSAC) (Fischler and Bolles 1981)	

                          •  Hough transform (Lowe 2004)	

               –  The transformation is often represented by an affine
                  mapping or a homography. 	


    •  GV is computationally expensive, which is why it’s
       only used for a subset of images selected during the
       feature-matching stage. 	


Girod et al. IEEE Multimedia 2011
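The RANSAC-based estimation described above can be sketched end to end on synthetic correspondences. The affine model matches the slide; the sample size, iteration count, and inlier threshold are illustrative choices, and `ransac_affine` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(1)

def ransac_affine(src, dst, iters=200, tol=2.0):
    """RANSAC sketch for GV: estimate a 2D affine map dst ~ A @ [x,y,1]
    from putative correspondences and keep the model with the most
    inliers. Returns the best 2x3 affine and the inlier mask."""
    best, best_inl = None, np.zeros(len(src), bool)
    for _ in range(iters):
        idx = rng.choice(len(src), 3, replace=False)  # minimal sample
        # Solve the 6 affine parameters exactly from 3 point pairs
        M = np.zeros((6, 6))
        b = np.zeros(6)
        for r, i in enumerate(idx):
            x, y = src[i]
            M[2 * r] = [x, y, 1, 0, 0, 0]
            M[2 * r + 1] = [0, 0, 0, x, y, 1]
            b[2 * r], b[2 * r + 1] = dst[i]
        try:
            A = np.linalg.solve(M, b).reshape(2, 3)
        except np.linalg.LinAlgError:
            continue  # degenerate (collinear) sample
        pred = src @ A[:, :2].T + A[:, 2]
        inl = np.linalg.norm(pred - dst, axis=1) < tol
        if inl.sum() > best_inl.sum():
            best, best_inl = A, inl
    return best, best_inl

# Synthetic check: 20 correspondences under a known affine map, plus
# 5 gross outliers standing in for false descriptor matches
src = rng.uniform(0, 100, (25, 2))
A_true = np.array([[0.9, -0.2, 5.0], [0.2, 0.9, -3.0]])
dst = src @ A_true[:, :2].T + A_true[:, 2]
dst[20:] += rng.uniform(30, 60, (5, 2))  # corrupt the last 5 matches
A_est, inliers = ransac_affine(src, dst)
```

The inlier mask is the GV verdict: a candidate image whose best model explains many correspondences is accepted, and the green (false) matches of FIG4 are exactly the rejected outliers.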
  
MVS: geometric reranking

  •  Speed-up step between Vocabulary Tree-based retrieval and
     Geometric Verification.

[FIG5] An image retrieval pipeline can be greatly sped up by incorporating a geometric reranking stage: Query Data -> VT -> Geometric Reranking -> GV -> Identify Information.

Girod et al. IEEE Signal Processing Magazine 2011
  
Fast geometric reranking

•  The location geometric score is computed as follows:
   a)  features of two images are matched based on VT quantization;
   b)  distances between pairs of features within an image are calculated;
   c)  log-distance ratios of the corresponding pairs (denoted by color) are calculated;
   d)  histogram of log-distance ratios is computed.
•  The maximum value of the histogram is the geometric similarity score.
   –  A peak in the histogram indicates a similarity transform between the query and database image.
•  The time required to calculate a geometric similarity score is one to two orders of magnitude less than using RANSAC. Typically, fast geometric reranking is performed on the top 500 images and RANSAC on the top 50 ranked images.

[FIG S7] The location geometric score: (a) features of two images matched via VT quantization; (b) distances between pairs of features within each image; (c) log-distance ratios of the corresponding pairs; (d) histogram of log-distance ratios.

Girod et al. IEEE Multimedia 2011
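The log-distance-ratio scoring above can be sketched in a few lines of Python. This is a toy illustration: the bin count and histogram range are arbitrary choices, not values from the paper.

```python
import numpy as np

def geometric_score(query_pts, db_pts, bins=20):
    """Fast geometric reranking score from log-distance ratios.

    query_pts, db_pts: (N, 2) arrays of coordinates of VT-matched
    feature pairs (query_pts[i] corresponds to db_pts[i]).
    """
    q = np.asarray(query_pts, dtype=float)
    d = np.asarray(db_pts, dtype=float)
    n = len(q)
    if n < 2:
        return 0
    # Distances between all pairs of matched features within each image.
    iu = np.triu_indices(n, k=1)
    dq = np.linalg.norm(q[:, None, :] - q[None, :, :], axis=-1)[iu]
    dd = np.linalg.norm(d[:, None, :] - d[None, :, :], axis=-1)[iu]
    ok = (dq > 0) & (dd > 0)
    # Under a similarity transform all ratios equal the log of the scale
    # factor, so true matches pile up in a single histogram bin.
    ratios = np.log(dq[ok] / dd[ok])
    hist, _ = np.histogram(ratios, bins=bins, range=(-3, 3))
    return int(hist.max())
```

For a database image that is a scaled and translated copy of the query, every feature pair contributes the same log ratio, so the score equals the number of pairs; unrelated images spread their ratios across many bins and score low.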
  
Datasets for MVS research	

   •  Stanford Mobile Visual Search Data Set 
           (http://web.cs.wpi.edu/~claypool/mmsys-2011-dataset/stanford/)	

             –  Key characteristics:	

                       •  rigid objects	

                       •  widely varying lighting conditions	

                       •  perspective distortion	

                       •  foreground and background clutter	

                       •  realistic ground-truth reference data	

                       •  query data collected from heterogeneous low and high-end
                          camera phones. 	




Chandrasekhar et al. ACM MMSys 2011
  Marques	
  
Stanford Mobile Visual Search (SMVS) Data Set

•  Limitations of popular datasets

     Data Set    Database (#)   Query (#)   Classes (#)
     ZuBuD           1005          115          200
     Oxford          5062           55           17
     INRIA           1491          500          500
     UKY            10200         2550         2550
     ImageNet         11M          15K          15K
     SMVS            1200         3300         1200

Table 1: Comparison of different data sets. “Classes” refers to the number of distinct objects in the data set. “Rigid” refers to whether or not the objects in the database are rigid. “Lighting” refers to whether or not the query images capture widely varying lighting conditions. “Clutter” refers to whether or not the query images contain foreground/background clutter. “Perspective” refers to whether the data set contains typical perspective distortions. “Camera-phone” refers to whether the images were captured with mobile devices. SMVS is a good data set for mobile visual search applications.

[Figure 2] Limitations of popular data sets in computer vision. The left-most image in each row is the database image; the other three images are query images. For example, some categories like ZuBuD, INRIA, and UKY consist of images taken at the same time and location; ImageNet is not suitable for image retrieval applications; and the Oxford data set has different façades of the same building labelled with the same name.

For categories like CDs, DVDs, books, text documents, and business cards, the images were captured indoors under widely varying lighting conditions over several days, with foreground and background clutter that would typically be present in the application (e.g., a picture of a CD might have other CDs in the background). For landmarks, images of buildings in San Francisco were captured, and query images were collected several months after the reference data. For video clips, the query images were taken from laptop, computer, and TV screens to include typical specular distortions. Finally, the paintings were captured at the Cantor Arts Center at Stanford University under controlled lighting conditions typical of museums. The resolution of the query images varies for each camera phone.

Results are reported for three state-of-the-art schemes: (1) Difference-of-Gaussian (DoG) interest point detector and SIFT descriptor, (2) Hessian-affine interest point detector and SIFT descriptor, and (3) Fast Hessian blob interest point detector sped up with integral images, combined with the recently proposed Compressed Histogram of Gradients (CHoG) descriptor, using affine models with a minimum threshold of 10 matches post-RANSAC for declaring a pair of images a valid match. For each category, the percentage of images that match, the average number of features, and the average number of features that match post-RANSAC are reported. Indoor categories are easier than outdoor categories; for example, categories like CDs, DVDs, and book covers achieve over 95% accuracy.

Chandrasekhar et al. ACM MMSys 2011
SMVS Data Set: categories and examples	


   •  Number of query and database images per
      category	





Chandrasekhar et al. ACM MMSys 2011
  Marques	
  
SMVS Data Set: categories and examples	


  •  DVD covers	





http://web.cs.wpi.edu/~claypool/mmsys-2011-dataset/stanford/mvs_images/dvd_covers.html
  
SMVS Data Set: categories and examples	


  •  CD covers	





http://web.cs.wpi.edu/~claypool/mmsys-2011-dataset/stanford/mvs_images/cd_covers.html
  
SMVS Data Set: categories and examples	


  •  Museum paintings	





http://web.cs.wpi.edu/~claypool/mmsys-2011-dataset/stanford/mvs_images/museum_paintings.html
  
Other MVS data sets	





ISO/IEC JTC1/SC29/WG11/N12202 - July 2011, Torino, IT
  
Other MVS data sets	





ISO/IEC JTC1/SC29/WG11/N12202 - July 2011, Torino, IT
  
Other MVS data sets	

   •  Distractor set	

           –  1 million images of various resolutions and content
              collected from Flickr.





ISO/IEC JTC1/SC29/WG11/N12202 - July 2011, Torino, IT
  
MPEG Compact Descriptors for Visual Search (CDVS)	


   •  Objectives	

              –  Define a standard that:	

                        •  enables design of visual search applications	

                        •  minimizes lengths of query requests	

                        •  ensures high matching performance (in terms of reliability and
                           complexity)	

                        •  enables interoperability between search applications and visual databases	

                        •  enables efficient implementation of visual search functionality on mobile
                           devices	

   •  Scope	

              –  It is envisioned that (as a minimum) the standard will specify:	

                        •  bitstream of descriptors	

                        •  parts of descriptor extraction process (e.g. key-point detection) needed
                           to ensure interoperability	



Bober, Cordara, and Reznik (2010)
  
MPEG CDVS	

   •  Requirements	

              –  Robustness	

                        •  High matching accuracy shall be achieved at least for images of textured
                           rigid objects, landmarks, and printed documents. The matching accuracy
                           shall be robust to changes in vantage point, camera parameters, lighting
                           conditions, as well as in the presence of partial occlusions.	

              –  Sufficiency	

                        •  Descriptors shall be self-contained, in the sense that no other data are
                           necessary for matching.	

              –  Compactness	

                        •  Shall minimize lengths/size of image descriptors	

              –  Scalability	

                        •  Shall allow adaptation of descriptor lengths to support the required
                           performance level and database size.	

                        •  Shall enable design of web-scale visual search applications and databases.	


Bober, Cordara, and Reznik (2010)
  
MPEG CDVS	

   •  Requirements (cont’d)	

              –  Image format independence	

                        •  Descriptors shall be independent of the image format	

              –  Extraction complexity	

                        •  Shall allow descriptor extraction with low complexity (in terms of
                           memory and computation) to facilitate video rate implementations	

              –  Matching complexity	

                        •  Shall allow matching of descriptors with low complexity (in terms of
                           memory and computation).	

                        •  If decoding of descriptors is required for matching, such decoding shall
                           also be possible with low complexity.	

              –  Localization

                        •  Shall support visual search algorithms that identify and localize matching
                           regions of the query image and the database image	

                        •  Shall support visual search algorithms that provide an estimate of a
                           geometric transformation between matching regions of the query image
                           and the database image	


Bober, Cordara, and Reznik (2010)
  
MPEG CDVS	





  •  Summarized timeline	

Table 1. Timeline for development of MPEG standard for visual search.

     When            Milestone                            Comments
     March 2011      Call for Proposals is published      Registration deadline: 11 July 2011;
                                                          proposals due: 21 November 2011
     December 2011   Evaluation of proposals              None
     February 2012   1st Working Draft                    First specification and test software model that
                                                          can be used for subsequent improvements.
     July 2012       Committee Draft                      Essentially complete and stabilized specification.
     January 2013    Draft International Standard         Complete specification. Only minor editorial
                                                          changes are allowed after DIS.
     July 2013       Final Draft International Standard   Finalized specification, submitted for approval
                                                          and publication as International Standard.

Among several component technologies for image retrieval, such a standard should focus primarily on defining the format of descriptors and the parts of their extraction process (such as interest point detectors) needed to ensure interoperability.

Girod et al. IEEE Multimedia 2011
MPEG CDVS	

   •  CDVS: evaluation framework	

             –  Experimental setup	

                       •  Retrieval experiment: intended to assess performance of
                          proposals in the context of an image retrieval system	





ISO/IEC JTC1/SC29/WG11/N12202 - July 2011, Torino, IT
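A retrieval experiment of this kind is typically scored with mean average precision over the query set, run against the database plus distractor images. A minimal, self-contained sketch follows; the function names and ID conventions are placeholders for illustration, not part of the CDVS evaluation framework specification.

```python
def average_precision(ranked_ids, relevant_ids):
    """AP for one query: ranked_ids is the retrieval order over the
    whole database (matching images plus distractors)."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, image_id in enumerate(ranked_ids, start=1):
        if image_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(relevant)

def mean_average_precision(results, ground_truth):
    """results: {query_id: ranked id list}; ground_truth: {query_id: relevant ids}."""
    aps = [average_precision(results[q], ground_truth[q]) for q in results]
    return sum(aps) / len(aps)
```

Distractors never count as relevant, so adding them can only push relevant images down the ranking and lower the score, which is exactly what makes a large distractor set a meaningful stress test.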
  
MPEG CDVS

   •  CDVS: evaluation framework

             –  Experimental setup

                       •  Pair-wise matching experiments: intended for assessing
                          performance of proposals in the context of an application
                          that uses descriptors for the purpose of image matching.

[Diagram] Image A → Extract descriptor; Image B → Extract descriptors; both descriptors → Match → check accuracy of search results against annotations → Report.

ISO/IEC JTC1/SC29/WG11/N12202 - July 2011, Torino, IT
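The pair-wise matching step can be sketched as nearest-neighbor descriptor matching with a ratio test, declaring a match when enough correspondences survive. This is a generic illustration, not the CDVS test model: the functions, the 0.8 ratio, and the minimum-match threshold are assumptions (the threshold echoes the 10-matches-post-RANSAC criterion mentioned for the SMVS evaluation).

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Nearest-neighbor matching with a ratio test.
    desc_a, desc_b: (N, D) arrays of local descriptors for two images.
    Returns accepted (index_a, index_b) correspondences."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        # Accept only if the best match is clearly better than the runner-up.
        if len(order) >= 2 and dists[order[0]] < ratio * dists[order[1]]:
            matches.append((i, int(order[0])))
    return matches

def images_match(desc_a, desc_b, min_matches=10):
    """Pair-wise matching decision: the pair is declared a match when
    enough descriptor correspondences survive the ratio test."""
    return len(match_descriptors(desc_a, desc_b)) >= min_matches
```

Comparing this boolean decision against the ground-truth annotations over many matching and non-matching pairs yields the true-positive and false-positive rates used to judge a proposal.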
  
MPEG CDVS	

•  For more info: 	

                	

   –  https://mailhost.tnt.uni-hannover.de/mailman/listinfo/cdvs 	

   –  http://mpeg.chiariglione.org/meetings/geneva11-1/geneva_ahg.htm 
      (Ad hoc groups)	





  
Part IV	


Examples and applications
Examples	

•  Academic	

   –  Stanford Product Search System	

•  Commercial 	

   –  Google Goggles	

   –  Kooaba: Déjà Vu and Paperboy	

   –  SnapTell 	

   –  oMoby (and the IQ Engines API)	

   –  pixlinQ	

   –  Moodstocks	


  
Stanford Product Search (SPS) System

          •  Local feature based visual search system

          •  Client-server architecture

[FIG6] Stanford Product Search system: on the client, Image → Feature Extraction → Feature Compression; the query data is sent over the network to the server, which performs VT Matching and GV and returns identification data for display. Because of the large database, the image-recognition server is placed at a remote location. In most systems [1], [3], [7], the query image is sent to the server and feature extraction is performed. In our system, we show that by performing feature extraction on the phone we can significantly reduce the transmission delay and provide an interactive experience.

A list of important references for each module in the matching pipeline is given in Table 2.

[TABLE 2] Summary of references for modules in a matching pipeline.

  Module                          References
  Feature extraction              Harris and Stephens [17], Lowe [15], [23], Matas et al. [18], Mikolajczyk et al. [16], [22], Dalal and Triggs [41], Rosten and Drummond [19], Bay et al. [20], Winder et al. [27], [28], Chandrasekhar et al. [25], [26], Philbin et al. [40]
  Feature indexing and matching   Schmid and Mohr [13], Lowe [15], [23], Sivic and Zisserman [9], Nistér and Stewénius [10], Chum et al. [50], [52], [53], Yeh et al. [51], Philbin et al. [12], Jegou et al. [11], [59], [60], Zhang et al. [54], Chen et al. [58], Perronnin [61], Mikulik et al. [55], Turcot and Lowe [56], Li et al. [57]
  GV                              Fischler and Bolles [66], Schaffalitzky and Zisserman [74], Lowe [15], [23], Chum et al. [53], [70], [71], Ferrari et al. [68], Jegou et al. [11], Wu et al. [69], Tsai et al. [73]

IEEE Signal Processing Magazine, July 2011

Girod et al. IEEE Multimedia 2011
Tsai et al. ACM MM 2010
Stanford Product Search (SPS) System	

    •  Key contributions:	

               –  Optimized feature extraction implementation	

               –  CHoG: a low bit-rate compact descriptor (provides up
                  to 20× bit-rate saving over SIFT with comparable
                  image retrieval performance)	

               –  Inverted index compression to enable large-scale
                  image retrieval on the server	

               –  Fast geometric re-ranking 	




Girod et al. IEEE Multimedia 2011
Tsai et al. ACM MM 2010
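To see why the compact descriptor matters, a back-of-the-envelope query-size comparison can help. The feature count and the per-descriptor bit rates below are illustrative assumptions, not measurements from the paper; only the ~20× saving figure comes from the slide.

```python
# Illustrative query-size comparison between sending uncompressed SIFT
# descriptors and a compact descriptor at ~20x lower bit rate.
NUM_FEATURES = 500               # assumed features per query image
SIFT_BITS = 128 * 8              # 128 dimensions at 1 byte each
COMPACT_BITS = SIFT_BITS // 20   # ~20x bit-rate saving, as claimed for CHoG

sift_kb = NUM_FEATURES * SIFT_BITS / 8 / 1024
compact_kb = NUM_FEATURES * COMPACT_BITS / 8 / 1024
print(f"SIFT query:    {sift_kb:.1f} KiB")
print(f"Compact query: {compact_kb:.1f} KiB")
```

Over a slow 3G uplink, shrinking the query from tens of kilobytes to a few kilobytes is what makes the "send features" architecture interactive.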
  
Stanford Product Search (SPS) System

    •  Two modes:

               –  Send Image mode

               –  Send Feature mode

Mobile image-based retrieval technologies

…including different distances, viewing angles, and lighting conditions, or in the presence of partial occlusions or motion blur. Most successful algorithms for image-based retrieval today use an approach that is referred to as bag of features (BoF) or bag of words (BoW). The BoW idea is borrowed from text document retrieval. To find a particular text document, such as a webpage, it's sufficient to use a few well-chosen words. In the database, the document itself can likewise be represented by a bag of salient words, regardless of where these words appear in the document. For images, robust local features that are characteristic of a particular image take the role of visual words. As with text retrieval, BoF image retrieval does not consider where in the image the features occur, at least in the…

[Figure 1] A snapshot of a mobile visual-search system: the system augments the viewfinder with information about the objects it recognizes in the image taken with a phone camera.

[Figure 2] Mobile visual-search architectures. (a) Image → Image encoding (JPEG) → wireless network → Image decoding → Descriptor extraction → Descriptor matching (Database) → Search results → Process and display results: the mobile phone transmits the compressed query image, and analysis and matching are done entirely on the remote server. (b) Image → Descriptor extraction → Descriptor encoding → wireless network → Descriptor decoding → Descriptor matching (Database) → Search results → Process and display results: local image features (descriptors) are extracted on the phone, then encoded and transmitted over the wireless network, and the server decodes the descriptors and performs the matching. (c) The mobile phone maintains a…
                                                                                                                                                                  descriptors a
                                                                                                                                                                  the databas
                                                                                                                                                                  used by the
                                                                                           Wireless network                    Descriptor
                                                                                                                               matching            Database       search reque
                                                                                                                                                                  perform the
                                                      (b)
                                                                                                                                                                  remote serve
                                                                                                                                                                  (c) The mob
                                                                 Process and                                      Search
                                                                display results                                                                                   object of int
                                                                                                                                                                  maintains a
                                                                   Mobile phone                                   results   Visual search server
                                                                                                                                                                  found in thi
                                                                                                                                                                  the databas
                                                                                                                                                                  further requ
                                                                                                                                                                  search redu
Girod	
  et	
  al.	
  IEEE	
  MulVmedia	
  2011	
     (b)
                                                        Image
                                                                  Descriptor Descriptor                          Descriptor
                                                                                                                                                                  amountserve
                                                                                                                                                                  remote of d
                                                                  extraction  encoding                           decoding
Tsai	
  et	
  al.	
  ACM	
  MM	
  2010	
                                                                                                             Oge	
  Marques	
  
                                                                                                                                                                  over the netw
                                                                                                                                                                  object of int
                                                                  Mobile phone      No                                     Visual search server
                                                                  Descriptor    Found       Wireless network                Descriptor                            found in th
                                                                  matching                                                  matching            Database          further redu
                                                                  Descriptor Descriptor                          Descriptor
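The trade-off between the transmission modes (send the image, send compressed features, or match locally and fall back to the server) can be sketched as a toy client pipeline. This is a minimal illustration with a stub feature extractor standing in for the real CHoG/SIFT descriptors; all function names and sizes here are hypothetical, not part of the systems described above.

```python
import json
import zlib

def extract_descriptors(image_bytes):
    """Stub feature extractor (stand-in for real CHoG/SIFT descriptors)."""
    # Hypothetical: one 8-dimensional "descriptor" per 64-byte window.
    return [list(image_bytes[i:i + 8]) for i in range(0, len(image_bytes) - 8, 64)]

def send_image(image_bytes):
    """(a) Send Image mode: ship the whole image; the server does the rest."""
    return image_bytes

def send_features(image_bytes):
    """(b) Send Feature mode: extract and compress descriptors on the phone."""
    descriptors = extract_descriptors(image_bytes)
    return zlib.compress(json.dumps(descriptors).encode())

def local_match_then_query(image_bytes, local_cache):
    """(c) Local matching mode: contact the server only on a cache miss."""
    descriptors = extract_descriptors(image_bytes)
    key = zlib.crc32(json.dumps(descriptors).encode())  # toy "matching"
    if key in local_cache:
        return local_cache[key], b""            # matched on the phone: no traffic
    return None, send_features(image_bytes)     # fall back to the server

image = bytes(range(256)) * 40                  # fake ~10-KB "image"
print(len(send_image(image)), len(send_features(image)))
```

Even with this crude stand-in, mode (b) transmits far less data than mode (a), which is the point of the architecture comparison.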
Stanford Product Search System	

   •  Performance evaluation	

             –  Dataset: 1 million CD, DVD, and book cover images +
                1,000 query images (500×500) with challenging
                photometric and geometric distortions	





                  [FIG7] Example image pairs from the data set. (a) A
                  clean database picture is matched against (b) a
                  real-world picture with various distortions.

Girod et al., IEEE Multimedia 2011
                                               Oge Marques


Stanford Product Search System

  •  Performance evaluation

       –  Recall vs. bit rate

       –  Client: Nokia 5800 mobile phone with a 300-MHz processor

Figure 7. Comparison of different schemes with regard to classification
accuracy and query size. CHoG descriptor data is an order of magnitude
smaller compared to JPEG images or uncompressed SIFT descriptors.
[Plot omitted: classification accuracy (%) vs. query size (Kbytes) for
send feature (CHoG), send image (JPEG), and send feature (SIFT).]

  •  The server matches features as they arrive and terminates the
     search as soon as a result with a sufficient matching score is
     found, immediately sending the results back; this optimization
     reduces system latency by another factor of two.

Girod et al., IEEE Multimedia 2011
                                               Oge Marques
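The order-of-magnitude gap reported in Figure 7 can be sanity-checked with rough arithmetic. The feature count and per-descriptor sizes below are assumptions chosen for illustration (CHoG is typically quoted at under ~100 bits per descriptor; SIFT is 128 bytes uncompressed), not values read off the plot.

```python
# Back-of-the-envelope query payload sizes (all numbers are rough
# assumptions for illustration, not measurements from the figure).
N_FEATURES = 500        # assumed number of local features per query image
CHOG_BITS = 80          # compressed CHoG: on the order of tens of bits/descriptor
SIFT_BYTES = 128        # uncompressed SIFT: 128 bytes per descriptor
JPEG_KB = 40.0          # a plausible size for a JPEG query image

chog_kb = N_FEATURES * CHOG_BITS / 8 / 1024
sift_kb = N_FEATURES * SIFT_BYTES / 1024
print(f"CHoG: {chog_kb:.1f} KB  SIFT: {sift_kb:.1f} KB  JPEG: {JPEG_KB:.1f} KB")
```

Under these assumptions the CHoG query lands around 5 KB, roughly an order of magnitude below the JPEG and uncompressed-SIFT payloads, consistent with the figure's claim.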
Stanford Product Search System

  •  Performance evaluation

       –  Processing times

  •  The system achieves <1 s server processing latency while
     maintaining high recall.

  •  Transmission delay depends on the type of network used: data
     transmission time is insignificant for a WLAN network because of
     its high bandwidth.

Table 3. Processing times.

  Client-side operations                            Time (s)
  Image capture                                     1–2
  Feature extraction and compression                1–1.5
  (for Send Feature mode)

  Server-side operations                            Time (ms)
  Feature extraction (for Send Image mode)          100
  VT matching                                       100
  Fast geometric reranking (per image)              0.46
  GV (per image)                                    30

[Fig. 9 omitted: measured communication time-out percentage over a 3G
network at indoor and outdoor test locations.]

Girod et al., IEEE Multimedia 2011
Tsai et al., ACM MM 2010
                                               Oge Marques
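A quick budget built from the Table 3 server-side numbers shows why sub-second server latency is plausible. The candidate-list sizes (how many images are reranked and how many are geometrically verified) are hypothetical values chosen for illustration, not figures from the slide.

```python
# Server-side times from Table 3 (milliseconds)
FEATURE_EXTRACTION_MS = 100     # Send Image mode only
VT_MATCHING_MS = 100
RERANK_MS_PER_IMAGE = 0.46      # fast geometric reranking
GV_MS_PER_IMAGE = 30            # geometric verification

# Hypothetical candidate-list sizes (not from the slide)
rerank_candidates = 500
gv_candidates = 10

server_ms = (FEATURE_EXTRACTION_MS + VT_MATCHING_MS
             + rerank_candidates * RERANK_MS_PER_IMAGE
             + gv_candidates * GV_MS_PER_IMAGE)
print(f"server-side budget: {server_ms:.0f} ms")  # well under 1 s
```

Note how the per-image costs dominate: cheap reranking lets the system screen hundreds of candidates, while expensive geometric verification is reserved for a short list.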
  
Stanford Product Search System

  •  Performance evaluation

       –  End-to-end latency

Figure 8. End-to-end latency for different schemes. Compared to a
system transmitting a JPEG query image, a scheme employing progressive
transmission of CHoG features achieves approximately four times
reduction in system latency over a 3G network.
[Plot omitted: response time (seconds), split into feature extraction,
network transmission, and retrieval, for JPEG (3G), feature (3G),
feature progressive (3G), JPEG (WLAN), and feature (WLAN).]

  •  Exchanging compressed visual descriptors rather than images raises
     the question of interoperability across a broad range of devices.
     This question, discussed during the Workshop on Mobile Visual
     Search held at Stanford University, led the US delegation to MPEG
     to propose that a standard for such applications be explored, and
     an exploratory activity in MPEG followed.

Girod et al., IEEE Multimedia 2011
                                               Oge Marques
Examples of commercial MVS apps

  •  Google Goggles

       –  Android and iPhone

       –  Narrow-domain search and retrieval

http://www.google.com/mobile/goggles
                                               Oge Marques
  
SnapTell

  •  One of the earliest (ca. 2008) MVS apps for iPhone

       –  Eventually acquired by Amazon (A9)

  •  Proprietary technique (“highly accurate and robust algorithm for
     image matching: Accumulated Signed Gradient (ASG)”).

http://www.snaptell.com/technology/index.htm
                                               Oge Marques
  
oMoby (and the IQ Engines API)

       –  iPhone app

http://omoby.com/pages/screenshots.php
                                               Oge Marques
oMoby (and the IQ Engines API)

  •  The IQ Engines API: “vision as a service”

http://www.iqengines.com/applications.php
                                               Oge Marques
  
The IQEngines API demo app	

•  Screenshots	





                                       Oge Marques
  
The IQEngines API demo app	

•  XML-formatted response	





                                      Oge Marques
  
Kooaba: Déjà Vu and Paperboy

  •  “Image recognition in the cloud” platform

http://www.kooaba.com/en/home/developers
                                               Oge Marques
Kooaba: Déjà Vu and Paperboy

  •  Déjà Vu

       –  Enhanced digital memories / notes / journal

  •  Paperboy

       –  News sharing from printed media

http://www.kooaba.com/en/products/dejavu
http://www.kooaba.com/en/products/paperboy
                                               Oge Marques
  
pixlinQ

  •  A “mobile visual search solution that enables you to link users
     to digital content whenever they take a mobile picture of your
     printed materials.”

       –  Powered by image recognition from LTU Technologies

http://www.pixlinq.com/home
                                               Oge Marques
pixlinQ

  •  Example app (La Redoute)

http://www.youtube.com/watch?v=qUZCFtc42Q4
                                               Oge Marques
  
Moodstocks

  •  Real-time mobile image recognition that works offline (!)

  •  API and SDK available

http://www.youtube.com/watch?v=tsxe23b12eU
                                               Oge Marques
Moodstocks

  •  Many successful apps for different platforms

http://www.moodstocks.com/gallery/
                                               Oge Marques
  
Concluding thoughts
Concluding thoughts	

•  Mobile Visual Search (MVS) is coming of age.	


•  This is not a fad and it can only grow.	


•  Still a good research topic	

   –  Many relevant technical challenges	

   –  MPEG efforts have just started	



•  Infinite creative commercial possibilities	

                                                     Oge Marques
  
Side note	

•  The power of Twitter…	





                                   Oge Marques
  
Thanks!	

•  Questions?	





•  For additional information: omarques@fau.edu	

                                                 Oge Marques
  

 
IRJET- A Survey on the Enhancement of Video Action Recognition using Semi-Sup...
IRJET- A Survey on the Enhancement of Video Action Recognition using Semi-Sup...IRJET- A Survey on the Enhancement of Video Action Recognition using Semi-Sup...
IRJET- A Survey on the Enhancement of Video Action Recognition using Semi-Sup...IRJET Journal
 
Chapter 1_Introduction.docx
Chapter 1_Introduction.docxChapter 1_Introduction.docx
Chapter 1_Introduction.docxKISHWARYA2
 
IRJET- Criminal Recognization in CCTV Surveillance Video
IRJET-  	  Criminal Recognization in CCTV Surveillance VideoIRJET-  	  Criminal Recognization in CCTV Surveillance Video
IRJET- Criminal Recognization in CCTV Surveillance VideoIRJET Journal
 

Similar to Mobile Visual Search (20)

Volume 2-issue-6-1960-1964
Volume 2-issue-6-1960-1964Volume 2-issue-6-1960-1964
Volume 2-issue-6-1960-1964
 
Volume 2-issue-6-1960-1964
Volume 2-issue-6-1960-1964Volume 2-issue-6-1960-1964
Volume 2-issue-6-1960-1964
 
Image processing project list for java and dotnet
Image processing project list for java and dotnetImage processing project list for java and dotnet
Image processing project list for java and dotnet
 
Presentation1
Presentation1Presentation1
Presentation1
 
FACE EXPRESSION IDENTIFICATION USING IMAGE FEATURE CLUSTRING AND QUERY SCHEME...
FACE EXPRESSION IDENTIFICATION USING IMAGE FEATURE CLUSTRING AND QUERY SCHEME...FACE EXPRESSION IDENTIFICATION USING IMAGE FEATURE CLUSTRING AND QUERY SCHEME...
FACE EXPRESSION IDENTIFICATION USING IMAGE FEATURE CLUSTRING AND QUERY SCHEME...
 
Presentation1
Presentation1Presentation1
Presentation1
 
Presentation1
Presentation1Presentation1
Presentation1
 
IRJET- Real-Time Object Detection System using Caffe Model
IRJET- Real-Time Object Detection System using Caffe ModelIRJET- Real-Time Object Detection System using Caffe Model
IRJET- Real-Time Object Detection System using Caffe Model
 
IRJET - Visual E-Commerce Application using Deep Learning
IRJET - Visual E-Commerce Application using Deep LearningIRJET - Visual E-Commerce Application using Deep Learning
IRJET - Visual E-Commerce Application using Deep Learning
 
J018136669
J018136669J018136669
J018136669
 
pedersen
pedersenpedersen
pedersen
 
Mobile Web Browsing Based On Content Preserving With Reduced Cost
Mobile Web Browsing Based On Content Preserving With Reduced CostMobile Web Browsing Based On Content Preserving With Reduced Cost
Mobile Web Browsing Based On Content Preserving With Reduced Cost
 
Web crawler with email extractor and image extractor
Web crawler with email extractor and image extractorWeb crawler with email extractor and image extractor
Web crawler with email extractor and image extractor
 
A Intensified Approach on Deep Neural Networks for Human Activity Recognition...
A Intensified Approach on Deep Neural Networks for Human Activity Recognition...A Intensified Approach on Deep Neural Networks for Human Activity Recognition...
A Intensified Approach on Deep Neural Networks for Human Activity Recognition...
 
Location based reminder
Location based reminderLocation based reminder
Location based reminder
 
IRJET- Recognition of OPS using Google Street View Images
IRJET-  	  Recognition of OPS using Google Street View ImagesIRJET-  	  Recognition of OPS using Google Street View Images
IRJET- Recognition of OPS using Google Street View Images
 
IRJET- A Survey on the Enhancement of Video Action Recognition using Semi-Sup...
IRJET- A Survey on the Enhancement of Video Action Recognition using Semi-Sup...IRJET- A Survey on the Enhancement of Video Action Recognition using Semi-Sup...
IRJET- A Survey on the Enhancement of Video Action Recognition using Semi-Sup...
 
H0314450
H0314450H0314450
H0314450
 
Chapter 1_Introduction.docx
Chapter 1_Introduction.docxChapter 1_Introduction.docx
Chapter 1_Introduction.docx
 
IRJET- Criminal Recognization in CCTV Surveillance Video
IRJET-  	  Criminal Recognization in CCTV Surveillance VideoIRJET-  	  Criminal Recognization in CCTV Surveillance Video
IRJET- Criminal Recognization in CCTV Surveillance Video
 

More from Förderverein Technische Fakultät

The Digital Transformation of Education: A Hyper-Disruptive Era through Block...
The Digital Transformation of Education: A Hyper-Disruptive Era through Block...The Digital Transformation of Education: A Hyper-Disruptive Era through Block...
The Digital Transformation of Education: A Hyper-Disruptive Era through Block...Förderverein Technische Fakultät
 
Engineering Serverless Workflow Applications in Federated FaaS.pdf
Engineering Serverless Workflow Applications in Federated FaaS.pdfEngineering Serverless Workflow Applications in Federated FaaS.pdf
Engineering Serverless Workflow Applications in Federated FaaS.pdfFörderverein Technische Fakultät
 
The Role of Machine Learning in Fluid Network Control and Data Planes.pdf
The Role of Machine Learning in Fluid Network Control and Data Planes.pdfThe Role of Machine Learning in Fluid Network Control and Data Planes.pdf
The Role of Machine Learning in Fluid Network Control and Data Planes.pdfFörderverein Technische Fakultät
 
Nonequilibrium Network Dynamics_Inference, Fluctuation-Respones & Tipping Poi...
Nonequilibrium Network Dynamics_Inference, Fluctuation-Respones & Tipping Poi...Nonequilibrium Network Dynamics_Inference, Fluctuation-Respones & Tipping Poi...
Nonequilibrium Network Dynamics_Inference, Fluctuation-Respones & Tipping Poi...Förderverein Technische Fakultät
 
East-west oriented photovoltaic power systems: model, benefits and technical ...
East-west oriented photovoltaic power systems: model, benefits and technical ...East-west oriented photovoltaic power systems: model, benefits and technical ...
East-west oriented photovoltaic power systems: model, benefits and technical ...Förderverein Technische Fakultät
 
Advances in Visual Quality Restoration with Generative Adversarial Networks
Advances in Visual Quality Restoration with Generative Adversarial NetworksAdvances in Visual Quality Restoration with Generative Adversarial Networks
Advances in Visual Quality Restoration with Generative Adversarial NetworksFörderverein Technische Fakultät
 
Industriepraktikum_ Unterstützung bei Projekten in der Automatisierung.pdf
Industriepraktikum_ Unterstützung bei Projekten in der Automatisierung.pdfIndustriepraktikum_ Unterstützung bei Projekten in der Automatisierung.pdf
Industriepraktikum_ Unterstützung bei Projekten in der Automatisierung.pdfFörderverein Technische Fakultät
 

More from Förderverein Technische Fakultät (20)

Supervisory control of business processes
Supervisory control of business processesSupervisory control of business processes
Supervisory control of business processes
 
The Digital Transformation of Education: A Hyper-Disruptive Era through Block...
The Digital Transformation of Education: A Hyper-Disruptive Era through Block...The Digital Transformation of Education: A Hyper-Disruptive Era through Block...
The Digital Transformation of Education: A Hyper-Disruptive Era through Block...
 
A Game of Chess is Like a Swordfight.pdf
A Game of Chess is Like a Swordfight.pdfA Game of Chess is Like a Swordfight.pdf
A Game of Chess is Like a Swordfight.pdf
 
From Mind to Meta.pdf
From Mind to Meta.pdfFrom Mind to Meta.pdf
From Mind to Meta.pdf
 
Miniatures Design for Tabletop Games.pdf
Miniatures Design for Tabletop Games.pdfMiniatures Design for Tabletop Games.pdf
Miniatures Design for Tabletop Games.pdf
 
Distributed Systems in the Post-Moore Era.pptx
Distributed Systems in the Post-Moore Era.pptxDistributed Systems in the Post-Moore Era.pptx
Distributed Systems in the Post-Moore Era.pptx
 
Don't Treat the Symptom, Find the Cause!.pptx
Don't Treat the Symptom, Find the Cause!.pptxDon't Treat the Symptom, Find the Cause!.pptx
Don't Treat the Symptom, Find the Cause!.pptx
 
Engineering Serverless Workflow Applications in Federated FaaS.pdf
Engineering Serverless Workflow Applications in Federated FaaS.pdfEngineering Serverless Workflow Applications in Federated FaaS.pdf
Engineering Serverless Workflow Applications in Federated FaaS.pdf
 
The Role of Machine Learning in Fluid Network Control and Data Planes.pdf
The Role of Machine Learning in Fluid Network Control and Data Planes.pdfThe Role of Machine Learning in Fluid Network Control and Data Planes.pdf
The Role of Machine Learning in Fluid Network Control and Data Planes.pdf
 
Nonequilibrium Network Dynamics_Inference, Fluctuation-Respones & Tipping Poi...
Nonequilibrium Network Dynamics_Inference, Fluctuation-Respones & Tipping Poi...Nonequilibrium Network Dynamics_Inference, Fluctuation-Respones & Tipping Poi...
Nonequilibrium Network Dynamics_Inference, Fluctuation-Respones & Tipping Poi...
 
Towards a data driven identification of teaching patterns.pdf
Towards a data driven identification of teaching patterns.pdfTowards a data driven identification of teaching patterns.pdf
Towards a data driven identification of teaching patterns.pdf
 
Förderverein Technische Fakultät.pptx
Förderverein Technische Fakultät.pptxFörderverein Technische Fakultät.pptx
Förderverein Technische Fakultät.pptx
 
The Computing Continuum.pdf
The Computing Continuum.pdfThe Computing Continuum.pdf
The Computing Continuum.pdf
 
East-west oriented photovoltaic power systems: model, benefits and technical ...
East-west oriented photovoltaic power systems: model, benefits and technical ...East-west oriented photovoltaic power systems: model, benefits and technical ...
East-west oriented photovoltaic power systems: model, benefits and technical ...
 
Machine Learning in Finance via Randomization
Machine Learning in Finance via RandomizationMachine Learning in Finance via Randomization
Machine Learning in Finance via Randomization
 
IT does not stop
IT does not stopIT does not stop
IT does not stop
 
Advances in Visual Quality Restoration with Generative Adversarial Networks
Advances in Visual Quality Restoration with Generative Adversarial NetworksAdvances in Visual Quality Restoration with Generative Adversarial Networks
Advances in Visual Quality Restoration with Generative Adversarial Networks
 
Recent Trends in Personalization at Netflix
Recent Trends in Personalization at NetflixRecent Trends in Personalization at Netflix
Recent Trends in Personalization at Netflix
 
Industriepraktikum_ Unterstützung bei Projekten in der Automatisierung.pdf
Industriepraktikum_ Unterstützung bei Projekten in der Automatisierung.pdfIndustriepraktikum_ Unterstützung bei Projekten in der Automatisierung.pdf
Industriepraktikum_ Unterstützung bei Projekten in der Automatisierung.pdf
 
Introduction to 5G from radio perspective
Introduction to 5G from radio perspectiveIntroduction to 5G from radio perspective
Introduction to 5G from radio perspective
 

Recently uploaded

Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераMark Opanasiuk
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoTAnalytics
 
WSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptxWSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptxJennifer Lim
 
Designing for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at ComcastDesigning for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at ComcastUXDXConf
 
A Business-Centric Approach to Design System Strategy
A Business-Centric Approach to Design System StrategyA Business-Centric Approach to Design System Strategy
A Business-Centric Approach to Design System StrategyUXDXConf
 
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka DoktorováCzechDreamin
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIES VE
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomCzechDreamin
 
Server-Driven User Interface (SDUI) at Priceline
Server-Driven User Interface (SDUI) at PricelineServer-Driven User Interface (SDUI) at Priceline
Server-Driven User Interface (SDUI) at PricelineUXDXConf
 
The architecture of Generative AI for enterprises.pdf
The architecture of Generative AI for enterprises.pdfThe architecture of Generative AI for enterprises.pdf
The architecture of Generative AI for enterprises.pdfalexjohnson7307
 
UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1DianaGray10
 
Introduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG EvaluationIntroduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG EvaluationZilliz
 
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀DianaGray10
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupCatarinaPereira64715
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
 
Transforming The New York Times: Empowering Evolution through UX
Transforming The New York Times: Empowering Evolution through UXTransforming The New York Times: Empowering Evolution through UX
Transforming The New York Times: Empowering Evolution through UXUXDXConf
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
 
Enterprise Security Monitoring, And Log Management.
Enterprise Security Monitoring, And Log Management.Enterprise Security Monitoring, And Log Management.
Enterprise Security Monitoring, And Log Management.Boni Yeamin
 
Intelligent Gimbal FINAL PAPER Engineering.pdf
Intelligent Gimbal FINAL PAPER Engineering.pdfIntelligent Gimbal FINAL PAPER Engineering.pdf
Intelligent Gimbal FINAL PAPER Engineering.pdfAnthony Lucente
 

Recently uploaded (20)

Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджера
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024
 
WSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptxWSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptx
 
Designing for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at ComcastDesigning for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at Comcast
 
A Business-Centric Approach to Design System Strategy
A Business-Centric Approach to Design System StrategyA Business-Centric Approach to Design System Strategy
A Business-Centric Approach to Design System Strategy
 
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and Planning
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
 
Server-Driven User Interface (SDUI) at Priceline
Server-Driven User Interface (SDUI) at PricelineServer-Driven User Interface (SDUI) at Priceline
Server-Driven User Interface (SDUI) at Priceline
 
The architecture of Generative AI for enterprises.pdf
The architecture of Generative AI for enterprises.pdfThe architecture of Generative AI for enterprises.pdf
The architecture of Generative AI for enterprises.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1
 
Introduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG EvaluationIntroduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG Evaluation
 
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
Transforming The New York Times: Empowering Evolution through UX
Transforming The New York Times: Empowering Evolution through UXTransforming The New York Times: Empowering Evolution through UX
Transforming The New York Times: Empowering Evolution through UX
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Enterprise Security Monitoring, And Log Management.
Enterprise Security Monitoring, And Log Management.Enterprise Security Monitoring, And Log Management.
Enterprise Security Monitoring, And Log Management.
 
Intelligent Gimbal FINAL PAPER Engineering.pdf
Intelligent Gimbal FINAL PAPER Engineering.pdfIntelligent Gimbal FINAL PAPER Engineering.pdf
Intelligent Gimbal FINAL PAPER Engineering.pdf
 

Mobile Visual Search

  • 1. Mobile Visual Search. Oge Marques, Florida Atlantic University, Boca Raton, FL - USA. TEWI Kolloquium, 24 Jan 2012
  • 2. Take-home message: Mobile Visual Search (MVS) is a fascinating research field with many open challenges and opportunities, which have the potential to impact the way we organize, annotate, and retrieve visual data (images and videos) using mobile devices. Oge Marques
  • 3. Outline • This talk is structured in four parts: 1. Opportunities 2. Basic concepts 3. Technical details 4. Examples and applications
  • 5. Mobile visual search: driving factors • Age of mobile computing. http://60secondmarketer.com/blog/2011/10/18/more-mobile-phones-than-toothbrushes/
  • 6. Mobile visual search: driving factors • Smartphone market. http://www.idc.com/getdoc.jsp?containerId=prUS23123911
  • 7. Mobile visual search: driving factors • Smartphone market. http://www.cellular-news.com/story/48647.php?s=h
  • 8. Mobile visual search: driving factors • Why do I need a camera? I have a smartphone… http://www.cellular-news.com/story/52382.php
  • 9. Mobile visual search: driving factors • Why do I need a camera? I have a smartphone… http://www.cellular-news.com/story/52382.php
  • 10. Mobile visual search: driving factors • Powerful devices: 1 GHz ARM Cortex-A9 processor, PowerVR SGX543MP2, Apple A5 chipset. http://www.apple.com/iphone/specs.html http://www.gsmarena.com/apple_iphone_4s-4212.php
  • 11. Mobile visual search: driving factors • Social networks and mobile devices (May 2011). http://jess3.com/geosocial-universe-2/
  • 12. Mobile visual search: driving factors • Social networks and mobile devices – Motivated users: image taking and image sharing are huge! http://www.onlinemarketing-trends.com/2011/03/facebook-photo-statistics-and-insights.html
  • 13. Mobile visual search: driving factors • Instagram: – 13 million registered (although not necessarily active) users (in 13 months) – 7 employees – Several apps based on it! http://venturebeat.com/2011/11/18/instagram-13-million-users/
  • 14. Mobile visual search: driving factors • Food photo sharing! http://mashable.com/2011/05/09/foodtography-infographic/
  • 15. Mobile visual search: driving factors • Legitimate (or not quite…) needs and use cases. http://www.slideshare.net/dtunkelang/search-by-sight-google-goggles https://twitter.com/#!/courtanee/status/14704916575
  • 16. Mobile visual search: driving factors • A natural use case for CBIR with QBE (at last!) – The example is right in front of the user! [Slide background, excerpted from Girod et al.: today's most successful content-based image retrieval algorithms use a bag-of-features (BoF) or bag-of-words (BoW) approach borrowed from text retrieval; an inverted file index selects a short list of potentially similar database images, and a geometric verification (GV) step checks the spatial pattern of matching features between the query and each candidate to ensure a plausible change in viewing position. Fig. 1: snapshot of an outdoor mobile visual search system that augments the camera-phone viewfinder with information about the objects it recognizes. Fig. 2: a pipeline for image retrieval. For mobile visual search, deployed systems typically transmit the query to a server with a low-latency interactive requirement; for large databases, the inverted file index causes memory swapping that slows the matching stage, and the GV step further increases response time.] Girod et al. IEEE Multimedia 2011
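The bag-of-words retrieval idea mentioned on this slide can be sketched in a few lines: quantize each local descriptor to its nearest "visual word", represent an image as a histogram of word counts, and rank database images by histogram similarity. A minimal pure-Python illustration; the function names and the toy 2-D "descriptors" are invented for the example (real systems use large vocabularies, tf-idf weighting, and inverted files):

```python
# Toy bag-of-visual-words (BoW) retrieval sketch.
import math
from collections import Counter

def quantize(descriptor, vocabulary):
    """Map one local descriptor to the index of its nearest visual word."""
    return min(range(len(vocabulary)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(descriptor, vocabulary[i])))

def bow_histogram(descriptors, vocabulary):
    """Represent an image as a histogram of visual-word counts."""
    return Counter(quantize(d, vocabulary) for d in descriptors)

def cosine(h1, h2):
    """Cosine similarity between two word-count histograms."""
    dot = sum(h1[w] * h2[w] for w in h1)
    n1 = math.sqrt(sum(v * v for v in h1.values()))
    n2 = math.sqrt(sum(v * v for v in h2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

# Tiny 2-D "descriptors" and a 3-word vocabulary
vocabulary = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
query = bow_histogram([(0.1, 0.0), (0.9, 0.1)], vocabulary)
database = {
    "img_a": bow_histogram([(0.0, 0.1), (1.0, 0.0)], vocabulary),  # same words as query
    "img_b": bow_histogram([(0.0, 0.9), (0.1, 1.0)], vocabulary),  # different words
}
best = max(database, key=lambda k: cosine(query, database[k]))
print(best)  # img_a
```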
  • 18. MVS: technical challenges • How to ensure low latency (and interactive queries) under constraints such as: – Network bandwidth – Computational power – Battery consumption • How to achieve robust visual recognition in spite of low-resolution cameras, varying lighting conditions, etc. • How to handle broad and narrow domains
  • 19. MVS: Pipeline for image retrieval. Girod et al. IEEE Multimedia 2011
  • 20. 3 scenarios. Girod et al. IEEE Multimedia 2011
  • 22. Part III - Outline • The MVS pipeline in greater detail • Datasets for MVS research • MPEG Compact Descriptors for Visual Search (CDVS)
  • 23. MVS: descriptor extraction • Interest point detection • Feature descriptor computation. Girod et al. IEEE Multimedia 2011
  • 24. Interest point detection • Numerous interest-point detectors have been proposed in the literature: – Harris Corners (Harris and Stephens 1988) – Scale-Invariant Feature Transform (SIFT) Difference-of-Gaussian (DoG) (Lowe 2004) – Maximally Stable Extremal Regions (MSERs) (Matas et al. 2002) – Hessian affine (Mikolajczyk et al. 2005) – Features from Accelerated Segment Test (FAST) (Rosten and Drummond 2006) – Hessian blobs (Bay, Tuytelaars and Van Gool 2006) – etc. Girod et al. IEEE Signal Processing Magazine 2011
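As a toy illustration of the first detector on this list, a bare-bones Harris corner response can be computed from image gradients and the local structure tensor: R = det(M) - k * trace(M)^2, which is large at corners, near zero in flat regions, and negative on edges. A pure-Python sketch on a synthetic image (illustrative only; real detectors add Gaussian weighting, non-maximum suppression, and, for SIFT/SURF, scale selection):

```python
# Minimal Harris corner response on a tiny synthetic image.
def harris_response(img, k=0.04):
    h, w = len(img), len(img[0])
    # Central-difference gradients, clamped at the borders
    Ix = [[(img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]) / 2.0
           for x in range(w)] for y in range(h)]
    Iy = [[(img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]) / 2.0
           for x in range(w)] for y in range(h)]
    R = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Structure tensor summed over a 3x3 window
            a = b = c = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ix, iy = Ix[y + dy][x + dx], Iy[y + dy][x + dx]
                    a += ix * ix
                    b += ix * iy
                    c += iy * iy
            R[y][x] = (a * c - b * b) - k * (a + c) ** 2
    return R

# 8x8 image with a bright square: the square's corner should outscore a flat region
img = [[255 if (y >= 4 and x >= 4) else 0 for x in range(8)] for y in range(8)]
R = harris_response(img)
print(R[4][4] > R[1][1])  # True: corner of the square vs. flat background
```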
  • 25. Interest point detection • Different tradeoffs in repeatability and complexity: – SIFT DoG and other affine interest-point detectors are slow to compute but are highly repeatable. – The SURF interest-point detector provides a significant speed-up over DoG interest-point detectors by using box filters and integral images for fast computation. • However, the box filter approximation causes significant anisotropy, i.e., the matching performance varies with the relative orientation of query and database images. – The FAST corner detector is an extremely fast interest-point detector that offers very low repeatability. • See (Mikolajczyk and Schmid 2005) for a comparative performance evaluation of local descriptors in a common framework. Girod et al. IEEE Signal Processing Magazine 2011
  • 26. Feature descriptor computation • After interest-point detection, we compute a visual word descriptor on a normalized patch. • Ideally, descriptors should be: – robust to small distortions in scale, orientation, and lighting conditions; – discriminative, i.e., characteristic of an image or a small set of images; – compact, due to typical mobile computing constraints. Girod et al. IEEE Signal Processing Magazine 2011
  • 27. Feature descriptor computation • Examples of feature descriptors in the literature: – SIFT (Lowe 1999) – Speeded-Up Robust Features (SURF) (Bay et al. 2008) – Gradient Location and Orientation Histogram (GLOH) (Mikolajczyk and Schmid 2005) – Compressed Histogram of Gradients (CHoG) (Chandrasekhar et al. 2009, 2010) • See (Winder, Hua, and Brown CVPR 2007, 2009) and (Mikolajczyk and Schmid PAMI 2005) for comparative performance evaluations of different descriptors. Girod et al. IEEE Signal Processing Magazine 2011
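A common way such descriptors are used is nearest-neighbor matching with Lowe's ratio test: accept a match only when the closest database descriptor is much closer than the second closest, which filters out ambiguous matches. A toy sketch with invented 2-D vectors (real SIFT descriptors are 128-D, SURF typically 64-D):

```python
# Lowe's ratio test for descriptor matching (toy sketch).
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def ratio_test_match(query_desc, db_descs, ratio=0.8):
    """Return the index of the best match, or None if the match is ambiguous."""
    order = sorted(range(len(db_descs)),
                   key=lambda i: euclidean(query_desc, db_descs[i]))
    best, second = order[0], order[1]
    if euclidean(query_desc, db_descs[best]) < ratio * euclidean(query_desc, db_descs[second]):
        return best
    return None

db = [(0.0, 0.0), (5.0, 5.0), (5.1, 5.0)]
print(ratio_test_match((0.1, 0.0), db))   # 0: a clear winner
print(ratio_test_match((5.05, 5.0), db))  # None: two near-identical candidates
```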
  • 28. Feature descriptor computation • What about compactness? – Several attempts in the literature to compress off-the-shelf descriptors did not lead to the best rate-constrained image-retrieval performance. – Alternative: design a descriptor with compression in mind. Girod et al. IEEE Signal Processing Magazine 2011
  • 29. Feature descriptor computation • CHoG (Compressed Histogram of Gradients) (Chandrasekhar et al. 2009, 2010) – Based on the distribution of gradients within a patch of pixels – Histogram-of-gradients (HoG)-based descriptors [e.g., (Lowe 2004), (Bay et al. 2008), (Dalal and Triggs 2005), (Freeman and Roth 1994), and (Winder et al. 2009)] have been shown to be highly discriminative at low bit rates. Girod et al. IEEE Signal Processing Magazine 2011
  • 30. CHoG: Compressed Histogram of Gradients [Figure: pipeline from patch to descriptor: compute gradients (dx, dy) within the patch, apply spatial binning, form a gradient distribution for each spatial bin, then apply histogram compression to obtain the CHoG descriptor.] Bernd Girod: Mobile Visual Search. Chandrasekhar et al. CVPR 09, 10
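The spatial-binning stage of a HoG-style descriptor can be mimicked on a toy scale: split a patch into spatial bins and histogram a coarsely quantized gradient in each bin. This is only the shape of the computation, not CHoG itself, which quantizes joint (dx, dy) distributions and entropy-codes the histograms; the function name and the 3-level gradient quantization are invented for the example:

```python
# Toy spatially binned gradient-histogram descriptor (HoG-style sketch).
def toy_hog(patch):
    n = len(patch)
    half = n // 2
    # 2x2 spatial bins, each holding a 3-bin histogram of the dx sign (-1, 0, +1)
    descriptor = [[0, 0, 0] for _ in range(4)]
    for y in range(n):
        for x in range(n - 1):
            dx = patch[y][x + 1] - patch[y][x]          # horizontal gradient
            sign = 0 if dx == 0 else (1 if dx > 0 else -1)
            bin_idx = (y >= half) * 2 + (x >= half)     # which spatial quadrant
            descriptor[bin_idx][sign + 1] += 1
    return descriptor

patch = [[0, 0, 10, 10]] * 4   # a vertical rising edge in the middle of the patch
d = toy_hog(patch)
print(d)  # the left-column bins record the positive gradient of the edge
```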
  • 31. Encoding descriptor's location information • Location Histogram Coding (LHC) – Rationale: interest-point locations in images tend to cluster spatially [Fig. S3]. Encoding the (x, y) locations of a set of N features as a histogram reduces the bit rate by log(N!) compared to encoding each feature location in sequence, because the features can be sent in any order: there are N! codes that represent the same feature set, so fixing the ordering yields a log(N!) saving. Girod et al. IEEE Signal Processing Magazine 2011
  • 32. Encoding descriptor's location information •  Location Histogram Coding (LHC) – method: 1.  Generate a 2D histogram from the locations of the descriptors: divide the image into evenly spaced spatial bins and count the number of features within each bin. 2.  Compress the binary map indicating which spatial bins contain features, together with a sequence of feature counts representing the number of features in the occupied bins. 3.  Encode the binary map using a trained context-based arithmetic coder, with the neighboring bins used as the context for each spatial bin. •  In experiments, quantizing the (x, y) location to four-pixel blocks is sufficient for GV. With simple fixed-length coding, the rate would be log2(640/4) + log2(480/4) ≈ 14 b/feature for a VGA-size image; LHC transmits the same location data with ≈ 5 b/descriptor — a ≈ 12.5× reduction compared to a 64-b floating-point representation and a ≈ 2.8× rate reduction compared to fixed-length coding [48]. •  Coding gains differ per detector: Hessian-Laplace has the highest gain, followed by the SIFT and SURF interest-point detectors. Even if the feature points are uniformly scattered in the image, LHC still exploits the ordering gain of log(N!) bits. [FIG S4] The locations of the descriptors are represented using a location histogram: the image is divided into evenly spaced blocks and the features within each spatial block are enumerated. Girod  et  al.  IEEE  Signal  Processing  Magazine  2011   Oge  Marques  
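The two ingredients of LHC — the quantized 2D location histogram and the log(N!) ordering gain — can be illustrated with a small sketch (the 640×480 image size and 4-pixel blocks follow the slide; the helper names are my own):

```python
import numpy as np
from math import lgamma

def location_histogram(points, img_w=640, img_h=480, block=4):
    """Quantize (x, y) interest-point locations into `block`-pixel bins
    and build the 2D count histogram used by LHC (illustrative sketch)."""
    hist = np.zeros((img_h // block, img_w // block), dtype=int)
    for x, y in points:
        hist[int(y) // block, int(x) // block] += 1
    return hist

def ordering_gain_bits(n):
    """log2(N!) bits saved because the features can be sent in any order."""
    return lgamma(n + 1) / np.log(2)  # lgamma(n+1) = ln(n!)

pts = [(10, 20), (11, 21), (300, 200), (12, 22)]
h = location_histogram(pts)
print(h.sum())                           # 4 features counted
print(round(ordering_gain_bits(4), 2))   # log2(4!) ≈ 4.58 bits
```

For realistic feature counts the gain is substantial: with N = 500 features, log2(500!) is already several thousand bits.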
  • 33. MVS: feature indexing and matching •  Goal: produce a data structure that can quickly return a short list of the database candidates most likely to match the query image. –  The short list may contain false positives as long as the correct match is included. –  Slower pairwise comparisons can be subsequently performed on just the short list of candidates rather than the entire database. Girod  et  al.  IEEE  Multimedia  2011   Oge  Marques  
  • 34. MVS: feature indexing and matching •  Vocabulary Tree (VT)-Based Retrieval –  Hierarchical k-means clustering is applied to the training descriptors: each cluster is recursively divided into k smaller clusters, until there are enough bins to ensure good classification performance. –  Figure B1 shows a VT with only two levels, branching factor k = 3, and 3² = 9 leaf nodes. In practice, a VT can be much larger, e.g., height 6 and branching factor k = 10, containing 10⁶ = 1 million nodes. –  The associated inverted index maintains two lists for each VT leaf node: the images whose descriptors fall in that leaf, and the corresponding frequency counts. –  During a query, the VT is traversed for each feature in the query image, finishing at one of the leaf nodes. The corresponding lists of images and frequency counts are used to compute similarity scores between these images and the query image. Pulling images from all these lists and sorting them by score yields a subset of database images likely to contain a true match. [Figure B: (1) vocabulary tree and (2) inverted index structures.] Girod  et  al.  IEEE  Multimedia  2011   Oge  Marques  
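A minimal sketch of the VT idea: a two-level tree of centroids quantizes each descriptor to a leaf, and an inverted index of leaf → {image: count} lists scores candidate images. For brevity, "training" here just samples centroids from the training descriptors instead of running full hierarchical k-means, and the scoring is plain count summation (real systems use TF-IDF weighting):

```python
import numpy as np
from collections import defaultdict

class VocabTree:
    """Two-level vocabulary tree sketch with branching factor k and an
    inverted index mapping leaf id -> {image_id: feature count}."""
    def __init__(self, train_desc, k=3, seed=0):
        rng = np.random.default_rng(seed)
        self.k = k
        self.root = train_desc[rng.choice(len(train_desc), k, replace=False)]
        self.leaves = [train_desc[rng.choice(len(train_desc), k, replace=False)]
                       for _ in range(k)]
        self.inverted = defaultdict(lambda: defaultdict(int))

    def quantize(self, d):
        # Greedy descent: nearest centroid at each level.
        i = int(np.argmin(np.linalg.norm(self.root - d, axis=1)))
        j = int(np.argmin(np.linalg.norm(self.leaves[i] - d, axis=1)))
        return i * self.k + j              # leaf id in [0, k*k)

    def add_image(self, image_id, descs):
        for d in descs:
            self.inverted[self.quantize(d)][image_id] += 1

    def query(self, descs):
        # Score each database image by summed frequency counts over the
        # leaves visited by the query descriptors.
        scores = defaultdict(int)
        for d in descs:
            for image_id, c in self.inverted[self.quantize(d)].items():
                scores[image_id] += c
        return dict(scores)

rng = np.random.default_rng(1)
train = rng.normal(size=(100, 8))          # toy 8-dim "descriptors"
vt = VocabTree(train, k=3)
img_a = rng.normal(size=(10, 8))
vt.add_image("A", img_a)
vt.add_image("B", rng.normal(size=(10, 8)) + 8.0)
print(vt.query(img_a))                     # image "A" scores highest
```

Traversing a tree of height h with branching factor k costs only h·k distance computations per descriptor, versus k^h for a flat codebook of the same size — which is what makes million-word vocabularies practical.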
  • 35. MVS: geometric verification •  Goal: use the location information of features in the query and database images to confirm that the feature matches are consistent with a change in viewpoint between the two images. Girod  et  al.  IEEE  Multimedia  2011   Oge  Marques  
  • 36. MVS: geometric verification •  Method: perform pairwise matching of feature descriptors and evaluate the geometric consistency of the correspondences. [FIG4] In the GV step, feature descriptors are matched pairwise and feature correspondences consistent with a geometric model are found. True feature matches are shown in red; false feature matches are shown in green. –  (Related server-side result: index compression reduces memory usage from nearly 10 GB to 2 GB; this five-times reduction avoids swapping between main and virtual memory and yields a substantial speedup in server-side processing.) Girod  et  al.  IEEE  Signal  Processing  Magazine  2011   Oge  Marques  
  • 37. MVS: geometric verification •  Techniques: –  The geometric transform between the query and database image is usually estimated using robust regression techniques such as: •  Random sample consensus (RANSAC) (Fischler and Bolles 1981) •  Hough transform (Lowe 2004) –  The transformation is often represented by an affine mapping or a homography. •  GV is computationally expensive, which is why it is only applied to a subset of images selected during the feature-matching stage. Girod  et  al.  IEEE  Multimedia  2011   Oge  Marques  
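A minimal RANSAC sketch for geometric verification: repeatedly fit a 2D affine map from three random putative correspondences and keep the model with the most geometrically consistent inliers. This is illustrative only — production systems would use a library routine such as OpenCV's `estimateAffine2D` or `findHomography`:

```python
import numpy as np

def ransac_affine(src, dst, n_iter=200, tol=3.0, seed=0):
    """Return a boolean inlier mask over putative correspondences
    (src[i] <-> dst[i]) under the best affine model found."""
    rng = np.random.default_rng(seed)
    n = len(src)
    src_h = np.hstack([src, np.ones((n, 1))])   # homogeneous [x y 1]
    best_inliers = np.zeros(n, dtype=bool)
    for _ in range(n_iter):
        idx = rng.choice(n, 3, replace=False)   # minimal sample
        try:
            A = np.linalg.solve(src_h[idx], dst[idx])  # 3x2 affine params
        except np.linalg.LinAlgError:
            continue                            # degenerate (collinear) sample
        err = np.linalg.norm(src_h @ A - dst, axis=1)
        inliers = err < tol                     # reprojection test
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers

# Synthetic data: 25 true matches related by one affine map + 5 outliers.
rng = np.random.default_rng(1)
src = rng.uniform(0, 640, size=(30, 2))
A_true = np.array([[0.9, 0.1], [-0.1, 0.9], [20.0, 10.0]])
dst = np.hstack([src, np.ones((30, 1))]) @ A_true
dst[25:] = rng.uniform(0, 640, size=(5, 2))    # false matches (clutter)
inl = ransac_affine(src, dst)
print(inl.sum())  # 25 (the planted true matches)
```

The cost of this loop over hundreds of candidates is exactly why the pipeline first prunes with the VT short list, and why fast reranking (next slides) is used before full RANSAC.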
  • 38. MVS: geometric reranking •  A speed-up step between Vocabulary Tree matching and full Geometric Verification. –  Weak geometric consistency checks, using the x, y, and scale information of the features, are used to rerank a larger list of candidates before a full GV is performed on the top entries. [FIG5] An image retrieval pipeline can be greatly sped up by incorporating a geometric reranking stage: Query → VT → Geometric Reranking → GV → Identify Data. Girod  et  al.  IEEE  Signal  Processing  Magazine  2011   Oge  Marques  
  • 39. Fast geometric reranking •  The location geometric score is computed as follows: a)  features of two images are matched based on VT quantization; b)  distances between pairs of features within an image are calculated; c)  log-distance ratios of the corresponding pairs (denoted by color) are calculated; d)  a histogram of the log-distance ratios is computed. [FIG S7, panels (a)-(d)] •  The maximum value of the histogram is the geometric similarity score. –  A peak in the histogram indicates a similarity transform between the query and database image. •  The time required to calculate a geometric similarity score this way is one to two orders of magnitude less than using RANSAC. Typically, fast geometric reranking is performed on the top 500 images and RANSAC on the top 50 ranked images. Girod  et  al.  IEEE  Multimedia  2011   Oge  Marques  
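The steps above can be sketched directly: if the query is related to the database image by a similarity transform, every pairwise log-distance ratio equals the log of the scale factor, so the histogram has a single sharp peak. Bin count and range below are illustrative choices:

```python
import numpy as np

def fast_geometric_score(q_pts, d_pts, n_bins=21):
    """Fast geometric reranking sketch: for matched feature pairs,
    histogram the log-ratios of within-image pairwise distances and
    return the histogram maximum as the similarity score."""
    q = np.asarray(q_pts, float)
    d = np.asarray(d_pts, float)
    n = len(q)
    ratios = []
    for i in range(n):
        for j in range(i + 1, n):
            dq = np.linalg.norm(q[i] - q[j])   # distance in query image
            dd = np.linalg.norm(d[i] - d[j])   # distance in database image
            if dq > 0 and dd > 0:
                ratios.append(np.log(dq / dd))
    hist, _ = np.histogram(ratios, bins=n_bins, range=(-3, 3))
    return int(hist.max())

# Query related to the database image by a 2x scale + translation.
rng = np.random.default_rng(0)
db = rng.uniform(0, 100, size=(20, 2))
query = 2.0 * db + 10.0
good = fast_geometric_score(query, db)   # all 190 pairs share one ratio
bad = fast_geometric_score(rng.uniform(0, 100, size=(20, 2)), db)
print(good, bad)  # good == 190; bad is much smaller
```

No model fitting or sampling is needed, which is the source of the one-to-two-orders-of-magnitude speedup over RANSAC.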
  • 40. Datasets for MVS research •  Stanford Mobile Visual Search Data Set (http://web.cs.wpi.edu/~claypool/mmsys-2011-dataset/stanford/) –  Key characteristics: •  rigid objects •  widely varying lighting conditions •  perspective distortion •  foreground and background clutter •  realistic ground-truth reference data •  query data collected from heterogeneous low- and high-end camera phones. Chandrasekhar  et  al.  ACM  MMSys  2011   Oge  Marques  
  • 41. Stanford Mobile Visual Search (SMVS) Data Set •  Limitations of popular datasets –  Table 1: comparison of data sets. "Classes" = number of distinct objects; "Rigid" = whether the objects in the database are rigid; "Lighting" = whether the query images capture widely varying lighting conditions; "Clutter" = whether the query images contain foreground/background clutter; "Perspective" = whether the data set contains typical perspective distortions; "Camera-phone" = whether the images were captured with mobile devices.

Data Set  | Database (#) | Query (#) | Classes (#)
ZuBuD     | 1005         | 115       | 200
Oxford    | 5062         | 55        | 17
INRIA     | 1491         | 500       | 500
UKY       | 10200        | 2550      | 2550
ImageNet  | 11M          | 15K       | 15K
SMVS      | 1200         | 3300      | 1200

–  SMVS satisfies all of the above criteria, making it a good data set for mobile visual search applications. [Figure 2: limitations of popular data sets — e.g., ZuBuD, INRIA, and UKY consist of images taken at the same time and location; ImageNet is not suitable for image retrieval applications; the Oxford data set has different façades of the same building labelled with the same name. The left-most image in each row is a database image; the other three are query images.]
–  Data collection: for product categories (CDs, DVDs, books, text documents, business cards), images were captured indoors under widely varying lighting conditions over several days, with foreground and background clutter typical of the application (e.g., a picture of a CD might have other CDs in the background). For landmarks, images of buildings in San Francisco were captured, with query images collected several months after the reference data. For video clips, query images were taken from laptop, computer, and TV screens to include typical specular distortions. Paintings were captured at the Cantor Arts Center at Stanford University under controlled lighting conditions typical of museums. The resolution of the query images varies per camera phone; the original high-quality JPEG compressed images are provided.
–  Baseline results (Fig. 3) for three state-of-the-art schemes: (1) SIFT Difference-of-Gaussian (DoG) interest-point detector and SIFT descriptor [27]; (2) Hessian-affine interest-point detector and SIFT descriptor [17]; (3) Fast Hessian blob interest-point detector [2] sped up with integral images, with the Compressed Histogram of Gradients (CHoG) descriptor [4]. Pairwise matching uses affine models with a minimum threshold of 10 post-RANSAC matches to declare a valid match; reported are the percentage of images that match, the average number of features, and the average number of post-RANSAC feature matches per category. Indoor categories are easier than outdoor ones; some categories, like CDs, DVDs, and books, achieve over 95% accuracy.
Chandrasekhar  et  al.  ACM  MMSys  2011   Oge  Marques  
  • 42. SMVS Data Set: categories and examples •  Number of query and database images per category Chandrasekhar  et  al.  ACM  MMSys  2011   Oge  Marques  
  • 43. SMVS Data Set: categories and examples •  DVD covers http://web.cs.wpi.edu/~claypool/mmsys-2011-dataset/stanford/mvs_images/dvd_covers.html   Oge  Marques  
  • 44. SMVS Data Set: categories and examples •  CD covers http://web.cs.wpi.edu/~claypool/mmsys-2011-dataset/stanford/mvs_images/cd_covers.html   Oge  Marques  
  • 45. SMVS Data Set: categories and examples •  Museum paintings http://web.cs.wpi.edu/~claypool/mmsys-2011-dataset/stanford/mvs_images/museum_paintings.html   Oge  Marques  
  • 46. Other MVS data sets ISO/IEC JTC1/SC29/WG11/N12202 – July 2011, Torino, IT   Oge  Marques  
  • 47. Other MVS data sets ISO/IEC JTC1/SC29/WG11/N12202 – July 2011, Torino, IT   Oge  Marques  
  • 48. Other MVS data sets •  Distractor set –  1 million images of various resolution and content collected from Flickr. ISO/IEC JTC1/SC29/WG11/N12202 – July 2011, Torino, IT   Oge  Marques  
  • 49. MPEG Compact Descriptors for Visual Search (CDVS) •  Objectives –  Define a standard that: •  enables design of visual search applications •  minimizes lengths of query requests •  ensures high matching performance (in terms of reliability and complexity) •  enables interoperability between search applications and visual databases •  enables efficient implementation of visual search functionality on mobile devices •  Scope –  It is envisioned that (as a minimum) the standard will specify: •  bitstream of descriptors •  parts of descriptor extraction process (e.g. key-point detection) needed to ensure interoperability Bober,  Cordara,  and  Reznik  (2010)   Oge  Marques  
  • 50. MPEG CDVS •  Requirements –  Robustness •  High matching accuracy shall be achieved at least for images of textured rigid objects, landmarks, and printed documents. The matching accuracy shall be robust to changes in vantage point, camera parameters, lighting conditions, as well as in the presence of partial occlusions. –  Sufficiency •  Descriptors shall be self-contained, in the sense that no other data are necessary for matching. –  Compactness •  Shall minimize lengths/size of image descriptors –  Scalability •  Shall allow adaptation of descriptor lengths to support the required performance level and database size. •  Shall enable design of web-scale visual search applications and databases. Bober,  Cordara,  and  Reznik  (2010)   Oge  Marques  
  • 51. MPEG CDVS •  Requirements (cont’d) –  Image format independence •  Descriptors shall be independent of the image format –  Extraction complexity •  Shall allow descriptor extraction with low complexity (in terms of memory and computation) to facilitate video rate implementations –  Matching complexity •  Shall allow matching of descriptors with low complexity (in terms of memory and computation). •  If decoding of descriptors is required for matching, such decoding shall also be possible with low complexity. –  Localization: •  Shall support visual search algorithms that identify and localize matching regions of the query image and the database image •  Shall support visual search algorithms that provide an estimate of a geometric transformation between matching regions of the query image and the database image Bober,  Cordara,  and  Reznik  (2010)   Oge  Marques  
  • 52. MPEG CDVS •  Summarized timeline – Table 1: timeline for development of the MPEG standard for visual search.

When          | Milestone                          | Comments
March 2011    | Call for Proposals is published    | Registration deadline: 11 July 2011; proposals due: 21 November 2011
December 2011 | Evaluation of proposals            | None
February 2012 | 1st Working Draft                  | First specification and test software model that can be used for subsequent improvements
July 2012     | Committee Draft                    | Essentially complete and stabilized specification
January 2013  | Draft International Standard       | Complete specification; only minor editorial changes are allowed after DIS
July 2013     | Final Draft International Standard | Finalized specification, submitted for approval and publication as International Standard

•  Among the several component technologies for image retrieval, the standard should focus primarily on defining the format of descriptors and the parts of their extraction process (such as interest point detectors) needed to ensure interoperability, building on existing standards such as MPEG Query Format, HTTP, XML, JPEG, and JPSearch. Girod  et  al.  IEEE  Multimedia  2011   Oge  Marques  
  • 53. MPEG CDVS •  CDVS: evaluation framework –  Experimental setup •  Retrieval experiment: intended to assess the performance of proposals in the context of an image retrieval system. ISO/IEC JTC1/SC29/WG11/N12202 – July 2011, Torino, IT   Oge  Marques  
  • 54. MPEG CDVS •  CDVS: evaluation framework –  Experimental setup •  Pair-wise matching experiments: intended for assessing the performance of proposals in the context of an application that uses descriptors for the purpose of image matching. [Diagram: Image A → extract descriptor; Image B → extract descriptors; match; check accuracy of search results against annotations; report.] ISO/IEC JTC1/SC29/WG11/N12202 – July 2011, Torino, IT   Oge  Marques  
  • 55. MPEG CDVS •  For more info: –  https://mailhost.tnt.uni-hannover.de/mailman/listinfo/cdvs –  http://mpeg.chiariglione.org/meetings/geneva11-1/geneva_ahg.htm (Ad hoc groups) Oge  Marques  
  • 56. Part IV Examples and applications
  • 57. Examples •  Academic –  Stanford Product Search System •  Commercial –  Google Goggles –  Kooaba: Déjà Vu and Paperboy –  SnapTell –  oMoby (and the IQ Engines API) –  pixlinQ –  Moodstocks Oge  Marques  
  • 58. Stanford Product Search (SPS) System •  Local feature-based visual search system •  Client-server architecture –  Client (mobile phone): query image → feature extraction → feature compression → wireless network –  Server: VT matching → GV → identification data → back to the client for display [TABLE 2] Summary of references for modules in a matching pipeline: Feature extraction — Harris and Stephens [17]; Lowe [15], [23]; Matas et al. [18]; Mikolajczyk et al. [16], [22]; Dalal and Triggs [41]; Rosten and Drummond [19]; Bay et al. [20]; Winder et al. [27], [28]; Chandrasekhar et al. [25], [26]; Philbin et al. [40]. Feature indexing and matching — Schmid and Mohr [13]; Lowe [15], [23]; Sivic and Zisserman [9]; Nistér and Stewénius [10]; Chum et al. [50], [52], [53]; Yeh et al. [51]; Philbin et al. [12]; Jegou et al. [11], [59], [60]; Zhang et al. [54]; Chen et al. [58]; Perronnin [61]; Mikulik et al. [55]; Turcot and Lowe [56]; Li et al. [57]. GV — Fischler and Bolles [66]; Schaffalitzky and Zisserman [74]; Lowe [15], [23]; Chum et al. [53], [70], [71]; Ferrari et al. [68]; Jegou et al. [11]; Wu et al. [69]; Tsai et al. [73]. [FIG6] Stanford Product Search system. Because of the large database, the image-recognition server is placed at a remote location. In most systems, the query image is sent to the server and feature extraction is performed there; by performing feature extraction on the phone, the SPS system significantly reduces the transmission delay and provides an interactive experience. Girod  et  al.  IEEE  Signal  Processing  Magazine  2011   Tsai  et  al.  ACM  MM  2010   Oge  Marques  
  • 59. Stanford Product Search (SPS) System •  Key contributions: –  Optimized feature extraction implementation –  CHoG: a low bit-rate compact descriptor (provides up to 20× bit-rate savings over SIFT with comparable image retrieval performance) –  Inverted index compression to enable large-scale image retrieval on the server –  Fast geometric reranking Girod  et  al.  IEEE  Multimedia  2011   Tsai  et  al.  ACM  MM  2010   Oge  Marques  
  • 60. Stanford Product Search (SPS) System •  Mobile image-based retrieval: most successful algorithms for image-based retrieval today use an approach referred to as bag of features (BoF) or bag of words (BoW). –  The BoW idea is borrowed from text document retrieval: to find a particular text document, such as a webpage, it's sufficient to use a few well-chosen words. In the database, the document itself can likewise be represented by a bag of salient words, regardless of where those words appear in the document. For images, robust local features that are characteristic of a particular image take the role of visual words. As with text retrieval, BoF image retrieval does not consider where in the image the features occur, at least in the initial stages of retrieval. [Figure 1: an outdoor visual-search system recognizes objects in the image taken through the viewfinder of a phone camera.] •  Two modes: –  Send Image mode: the mobile phone compresses the query image (JPEG) and transmits it over the wireless network; image decoding, descriptor extraction, descriptor matching against the database, and search are done entirely on the remote server, which sends results back to the phone for display. –  Send Feature mode: local image features (descriptors) are extracted on the phone, encoded, and transmitted over the network; the server decodes the descriptors and uses them to perform the database matching. –  (A third variant keeps a cache of the database on the phone, so a search request is sent to the remote server only if the object of interest is not found in the cache, further reducing the amount of data sent over the network.) [Figure 2: mobile visual search architectures (a)-(c).] Girod  et  al.  IEEE  Multimedia  2011   Tsai  et  al.  ACM  MM  2010   Oge  Marques  
  • 61. Stanford Product Search System •  Performance evaluation –  Dataset: 1 million CD, DVD, and book cover images + 1,000 query images (500×500 pixel resolution) with challenging photometric and geometric distortions. [FIG7] Example image pairs from the data set: (a) a clean database picture is matched against (b) a real-world picture with various distortions. –  Client: a Nokia 5800 mobile phone with a 300-MHz CPU. Girod  et  al.  IEEE  Signal  Processing  Magazine  2011   Oge  Marques  
  • 62. Stanford Product Search System •  Performance evaluation –  Recall vs. bit rate [Figure 7] Comparison of three schemes — Send Feature (CHoG), Send Image (JPEG), and Send Feature (SIFT) — with regard to classification accuracy and query size: CHoG descriptor data is an order of magnitude smaller than JPEG images or uncompressed SIFT descriptors, at comparable accuracy. –  The server processes features as they arrive; once it finds a result with a sufficient matching score, it terminates the search and immediately sends the results back. This optimization reduces system latency by another factor of two. –  Overall, the SPS system demonstrates that mobile visual-search systems can achieve high recognition accuracy, scale to realistic databases, and deliver search results in acceptable time. Girod  et  al.  IEEE  Multimedia  2011   Oge  Marques  
  • 63. Stanford Product Search System •  Performance evaluation –  The system achieves <1 s server processing latency while maintaining high recall. –  Transmission delay depends on the type of network used; data transmission time is insignificant for a WLAN network because of the high bandwidth. [TABLE 3] Processing times:

Client-side operations                                  | Time (s)
Image capture                                           | 1–2
Feature extraction and compression (Send Feature mode)  | 1–1.5

Server-side operations                                  | Time (ms)
Feature extraction (Send Image mode)                    | 100
VT matching                                             | 100
Fast geometric reranking (per image)                    | 0.46
GV (per image)                                          | 30

Girod  et  al.  IEEE  Signal  Processing  Magazine  2011   Tsai  et  al.  ACM  MM  2010   Oge  Marques  
  • 64. Stanford Product Search System •  Performance evaluation –  End-to-end latency [Figure 8] End-to-end latency for different schemes (components: feature extraction, network transmission, retrieval; schemes: JPEG (3G), Feature (3G), Feature progressive (3G), JPEG (WLAN), Feature (WLAN)). Compared to a system transmitting a JPEG query image, a scheme employing progressive transmission of CHoG features achieves approximately a four-times reduction in system latency over a 3G network. Girod  et  al.  IEEE  Multimedia  2011   Oge  Marques  
  • 65. Examples of commercial MVS apps •  Google Goggles –  Android and iPhone –  Narrow-domain search and retrieval http://www.google.com/mobile/goggles   Oge  Marques  
  • 66. SnapTell •  One of the earliest (ca. 2008) MVS apps for iPhone –  Eventually acquired by Amazon (A9) •  Proprietary technique ("highly accurate and robust algorithm for image matching: Accumulated Signed Gradient (ASG)") http://www.snaptell.com/technology/index.htm   Oge  Marques  
  • 67. oMoby (and the IQ Engines API) –  iPhone app http://omoby.com/pages/screenshots.php   Oge  Marques  
  • 68. oMoby (and the IQ Engines API) •  The IQ Engines API: "vision as a service" http://www.iqengines.com/applications.php   Oge  Marques  
  • 69. The IQEngines API demo app •  Screenshots Oge  Marques  
  • 70. The IQEngines API demo app •  XML-formatted response Oge  Marques  
  • 71. Kooaba: Déjà Vu and Paperboy •  "Image recognition in the cloud" platform http://www.kooaba.com/en/home/developers   Oge  Marques  
  • 72. Kooaba: Déjà Vu and Paperboy •  Déjà Vu –  Enhanced digital memories / notes / journal •  Paperboy –  News sharing from printed media http://www.kooaba.com/en/products/dejavu   http://www.kooaba.com/en/products/paperboy   Oge  Marques  
  • 73. pixlinQ •  A "mobile visual search solution that enables you to link users to digital content whenever they take a mobile picture of your printed materials." –  Powered by image recognition from LTU technologies http://www.pixlinq.com/home   Oge  Marques  
  • 74. pixlinQ •  Example app (La Redoute) http://www.youtube.com/watch?v=qUZCFtc42Q4   Oge  Marques  
  • 75. Moodstocks •  Real-time mobile image recognition that works offline (!) •  API and SDK available http://www.youtube.com/watch?v=tsxe23b12eU   Oge  Marques  
  • 76. Moodstocks •  Many successful apps for different platforms http://www.moodstocks.com/gallery/   Oge  Marques  
  • 78. Concluding thoughts •  Mobile Visual Search (MVS) is coming of age. •  This is not a fad and it can only grow. •  Still a good research topic –  Many relevant technical challenges –  MPEG efforts have just started •  Infinite creative commercial possibilities Oge  Marques  
  • 79. Side note •  The power of Twitter… Oge  Marques  
  • 80. Thanks! •  Questions? •  For additional information: omarques@fau.edu Oge  Marques