Google Goggles is a mobile visual search system that allows users to search for information by taking photos with their smartphone cameras. It uses computer vision techniques like interest point detection, feature extraction and indexing to match query images to images in large online databases. This kind of visual search is relevant because of the rise of powerful mobile devices with cameras and popular image-sharing apps. It enables new commercial opportunities for visual search and discovery on mobile phones.
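The matching step described above can be illustrated with a toy nearest-neighbor search over descriptor vectors. Real systems use high-dimensional local descriptors and approximate indexes; the database entries and vector values here are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def best_match(query_desc, database):
    """Return the database key whose descriptor is nearest to the query."""
    return min(database, key=lambda k: euclidean(query_desc, database[k]))

# Toy "image database" of 3-dimensional descriptors (illustrative values)
database = {
    "eiffel_tower": [0.9, 0.1, 0.3],
    "golden_gate": [0.2, 0.8, 0.5],
}
print(best_match([0.85, 0.15, 0.25], database))  # → eiffel_tower
```

In practice the query photo yields many local descriptors, each matched against an inverted or tree-based index rather than by brute-force scan.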
This document discusses challenges with integrating elevation datasets including differences in vertical datums, resolutions, and times of capture. It presents results from a study to develop strategies for accurately transforming elevation data between datums. The strategies include using geoid and tide gauge data to transform between ellipsoidal and mean sea level datums. The document also describes a method for co-registering and blending multi-resolution elevation datasets to create a seamless digital elevation model while assessing accuracy using checkpoints.
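The ellipsoidal-to-mean-sea-level transformation described above amounts to subtracting the geoid undulation, plus any local tide-gauge offset between the geoid and local MSL, from the ellipsoidal height. A minimal sketch with illustrative values:

```python
def ellipsoidal_to_msl(h_ellipsoidal, geoid_undulation, msl_offset=0.0):
    """Approximate orthometric (mean-sea-level) height from an
    ellipsoidal height: H ~= h - N, optionally shifted by a local
    tide-gauge offset between the geoid and local mean sea level."""
    return h_ellipsoidal - geoid_undulation - msl_offset

# GPS height 52.3 m, geoid 25.1 m above the ellipsoid,
# local MSL 0.4 m above the geoid (all values illustrative)
print(ellipsoidal_to_msl(52.3, 25.1, 0.4))
```

Operational datum transformations also account for tidal epoch and datum realization differences, which this one-line approximation omits.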
Presentation at International Advanced School on Knowledge Co-creation and Service Innovation 2012, Japan Advanced Institute of Science and Technology, March 1
The document presents a zero-adjusted gamma model for estimating loss given default (LGD) on residential mortgage loans. It compares the performance of this model, which directly models the loss amount, to a traditional linear regression approach. The zero-adjusted gamma model is found to accommodate the non-linear relationships between loss amounts and predictor variables better than the linear model. It also estimates separate factors that predict the probability of loss and those that influence the loss amount. The zero-adjusted gamma model is shown to produce competitively predictive LGD estimates through validation testing.
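The two-part structure can be sketched as follows: with probability 1 − p the loss is exactly zero, and otherwise the loss amount follows a gamma distribution, so the expected loss factorizes into the two sets of estimated factors. This is a toy illustration with hypothetical parameter values; in the model itself, the probability and the gamma parameters are driven by loan-level predictors.

```python
def expected_lgd(p_loss, gamma_shape, gamma_scale):
    """Expected loss under a zero-adjusted gamma model: a point mass
    at zero with probability (1 - p_loss), and a Gamma(shape, scale)
    loss amount otherwise, so E[loss] = p_loss * shape * scale."""
    return p_loss * gamma_shape * gamma_scale

# 25% chance of any loss; gamma mean = 2.0 * 0.1 = 0.2 of exposure
print(expected_lgd(0.25, 2.0, 0.1))  # → 0.05
```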
Sparse feature analysis for detection of clustered microcalcifications in mam... — Wesley De Neve

This document analyzes the use of sparse feature analysis for detecting clustered microcalcifications in mammogram images. It compares different feature types, combinations of features, and dictionary construction techniques for sparse representation based classification (SRC) of mammogram images. The experimental results show that texture features like Laws' texture features (LAW) are more effective than shape/morphology features. SRC using LAW features alone or combined with local binary patterns (LBP) achieved high performance. Larger dictionaries containing more atoms resulted in higher discriminative power for the SRC-based detection system.
This document summarizes a presentation on assessing quality of experience (QoE) for 3D television (3DTV) and beyond. It discusses testing methodologies for 3DTV image quality assessment (IQA), including both subjective and objective approaches. On the subjective side, it compares different rating scales that have been used to evaluate attributes like depth quality, visual comfort, and overall experience. It also addresses challenges in measuring long-term QoE factors like visual fatigue. On the objective side, it proposes initial approaches for 3D IQA metrics and recognizes the need to define new metrics that consider both visual quality and depth quality. Overall, the presentation examines moving from 2D visual quality evaluation to a multidimensional assessment of 3D QoE.
Capping is one of the most complex phenomena in the pharmaceutical industry: it is a mechanical defect in the tableting process in which catastrophic failure of the compact can occur. Understanding what influences tablet capping in terms of process variables, material properties, and density/stress distributions in tablets, and developing specialized techniques to correlate these variables with mechanical failures, are of practical interest to the pharmaceutical industry. In this presentation, we describe a nondestructive ultrasonic device/methodology to predict the capping tendencies of tablet formulations based on their manufacturing performance.
1) The document summarizes a presentation given at the 2007 ACES Conference on using the Partial Element Equivalent Circuit (PEEC) technique to model frequency-dependent phenomena like skin effect.
2) It describes using volume filaments and macro-basis functions to model skin effect in conductors with thickness comparable to the skin depth. Broadband macromodels are generated from a frequency-domain PEEC solver.
3) The presentation outlines modeling skin effect using an analytic solution of Maxwell's equations to derive impedance terms dependent on conductor thickness and frequency, and examines the asymptotic behavior of the model in the low- and high-frequency limits.
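The frequency dependence outlined above hinges on the standard skin-depth formula δ = sqrt(2/(ωμσ)): at low frequencies δ exceeds the conductor thickness and the resistance stays near its DC value, while at high frequencies current crowds into a surface layer of depth δ and resistance grows roughly as √f. A minimal numeric sketch (material values for copper; this is the textbook formula, not the paper's PEEC formulation):

```python
import math

def skin_depth(freq_hz, conductivity, mu_r=1.0):
    """Skin depth delta = sqrt(2 / (omega * mu * sigma)) for a good conductor."""
    mu0 = 4e-7 * math.pi          # vacuum permeability, H/m
    omega = 2 * math.pi * freq_hz  # angular frequency, rad/s
    return math.sqrt(2.0 / (omega * mu_r * mu0 * conductivity))

# Copper (sigma ~ 5.8e7 S/m) at 1 MHz: delta on the order of 66 micrometers
delta = skin_depth(1e6, 5.8e7)
print(round(delta * 1e6, 1))  # depth in micrometers
```

Doubling the frequency shrinks δ by a factor of √2, which is exactly the √f growth of AC resistance in the high-frequency limit.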
Recent advances in visual information retrieval marques klu june 2010 — Oge Marques
The document summarizes key points from a 2010 presentation on visual information retrieval (VIR). It revisits conclusions from a 2000 paper on challenges facing content-based image retrieval (CBIR). While some predictions were accurate, like increased data sizes and interaction options, others were not, like solving image understanding. Significant progress was made on benchmarks and datasets but less on similarity metrics. Medical image retrieval poses new challenges to understand but offers opportunities if VIR methods can adapt to new domains.
This document discusses advances in image search and retrieval. It begins with an overview of visual information retrieval and its challenges, including the semantic gap between low-level visual features and high-level semantics. It then covers recent techniques like Google image search and similarity search. The document outlines core concepts like capturing similarity, large datasets, and user needs. It also revisits a 2000 paper on the challenges still facing the field, including the unsolved semantic gap and need for standardized evaluation benchmarks.
Using games to improve computer vision solutions — Oge Marques
Dr. Oge Marques discusses using games to improve computer vision solutions. Specifically, Dr. Marques describes a two-player web-based guessing game called Ask'nSeek that helps solve the computer vision problems of object detection, labeling, and semantic scene segmentation. Ask'nSeek logs spatial relationships and labels from a small number of games per image to train machine learning models for these tasks.
Visual Information Retrieval: Advances, Challenges and Opportunities — Oge Marques
Visual Information Retrieval: Advances, Challenges and Opportunities discusses advances and challenges in visual information retrieval. Key points include:
- Visual information retrieval aims to find relevant images/videos based on visual and text queries, addressing the "semantic gap" between low-level features and high-level meanings.
- Advances include improved text-based, content-based, and mixed search methods, as well as applications in medical image retrieval and mobile visual search.
- Ongoing challenges include capturing image similarity, addressing various representation gaps, understanding user intentions, and developing broad domain solutions.
A Comparative Study of Content Based Image Retrieval Trends and Approaches — CSCJournals
Content Based Image Retrieval (CBIR) is an important step in addressing image storage and management problems. Improvements in imaging technology, along with the growth of the Internet, have produced a huge amount of digital multimedia in recent decades. Various methods, algorithms and systems have been proposed to address these problems. Such studies introduced the indexing and retrieval concepts that have since evolved into Content-Based Image Retrieval. CBIR systems often analyze image content via so-called low-level features for indexing and retrieval, such as color, texture and shape. To achieve significantly higher semantic performance, recent systems seek to combine low-level features with high-level features that carry perceptual information for humans. The purpose of this review is to identify the set of methods that have been used for CBIR, to discuss some of the key contributions of the current decade related to image retrieval, and to examine the main challenges involved in adapting existing image retrieval techniques to build useful systems that can handle real-world data. To improve retrieval accuracy, CBIR approaches must extract accurate, repeatable, quantitative data efficiently. In this paper, various CBIR approaches and available algorithms are reviewed; comparative results of various techniques are presented, and their advantages, disadvantages and limitations are discussed.
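The low-level color features mentioned above are often simple quantized histograms: each RGB channel is bucketed into a few levels and the joint bins are counted. A minimal sketch (the bin count and pixel data are illustrative):

```python
def color_histogram(pixels, bins=4):
    """Quantize each RGB channel into `bins` levels and count joint-bin
    occupancy -- a classic low-level CBIR color feature, normalized so
    the entries sum to 1."""
    hist = [0] * (bins ** 3)
    for r, g, b in pixels:
        i = ((r * bins // 256) * bins * bins
             + (g * bins // 256) * bins
             + (b * bins // 256))
        hist[i] += 1
    total = float(len(pixels))
    return [count / total for count in hist]

img = [(255, 0, 0), (250, 5, 3), (0, 0, 255)]  # two reddish pixels, one blue
h = color_histogram(img)
print(max(h))  # the red bin holds 2/3 of the pixels
```

Two images are then compared by a distance between their histograms (e.g. L1 or histogram intersection), which is what makes the feature usable for indexing and retrieval.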
Multimodal Analysis for Bridging Semantic Gap with Biologically Inspired Algo... — techkrish
The amount and complexity of digital media being generated, stored, transmitted, analysed and accessed has increased exponentially as a result of advances in computer and Web technologies. Much of this information combines digital images, video, audio, graphics and textual data. Large-scale online video repositories enable users to creatively share material with a wide audience. Consequently, there is increasing interest in associating media items with free-text annotations, ranging from simple titles to detailed descriptions of the video content. In an effort to reduce the complexity of the annotation task, this talk will outline some of the techniques developed for indexing large-scale multimedia repositories by exploiting the multi-modality of the information space. One such approach combines semantic expansion and visual analysis to predict user tags for online videos. The framework is designed to exploit visual features using biologically inspired algorithms together with the associated textual metadata, which is semantically expanded using complementary textual resources. The experimental results indicate the usefulness of the proposed approach for analysing large-scale media items.
UNSUPERVISED VISUAL HASHING WITH SEMANTIC ASSISTANT FOR CONTENT-BASED IMAGE R... — Nexgen Technology
This document proposes a novel unsupervised visual hashing approach called semantic-assisted visual hashing (SAVH) to improve content-based image retrieval. SAVH leverages rich semantics from auxiliary texts associated with images to boost visual hashing performance without requiring explicit semantic labels. It develops a unified framework to learn hash codes by preserving visual similarities between images, integrating semantic assistance from texts, and characterizing correlations between images and shared topics. Experimental results on several datasets show SAVH achieves superior performance over state-of-the-art techniques by effectively utilizing semantics from texts to assist visual hashing.
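SAVH itself is beyond a short sketch, but the underlying idea of visual hashing — mapping feature vectors to short binary codes so that similar images tend to collide — can be illustrated with generic random-projection hashing. This is a plain LSH baseline, not the paper's method, and it uses no semantic assistance:

```python
import random

def make_hasher(dim, n_bits, seed=0):
    """Random-projection hashing: each bit is the sign of a dot product
    with a random hyperplane (a generic LSH baseline, not SAVH)."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

    def hash_code(vec):
        return tuple(int(sum(p * v for p, v in zip(plane, vec)) >= 0)
                     for plane in planes)
    return hash_code

hasher = make_hasher(dim=4, n_bits=8)
a = hasher([1.0, 0.2, 0.0, 0.5])
b = hasher([1.1, 0.18, 0.05, 0.48])  # near-duplicate vector
hamming = sum(x != y for x, y in zip(a, b))
print(hamming)  # Hamming distance; usually small for near-duplicates
```

Learned schemes like SAVH replace the random hyperplanes with projections optimized against visual similarity and, here, text-derived topic structure.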
The Real Problem of Bridging the Multimedia “Semantic Gap” — jrs21
WWW-2007 Panel Position:
- Since video search is visual, the semantic spaces should be defined visually as well
- Create large multimedia knowledge-base with exemplar content representing all semantic concepts relevant for search
- Allow the semantic space to evolve from end-user perspectives (across sports, entertainment, news)
- Allow technology to focus on extracting the relevant semantics – truly providing the needed data-driven approach for bridging the multimedia semantic gap
Content-Based Image Retrieval (CBIR) systems employ colour as the primary feature, with texture and shape as secondary features. In this project, a simple image retrieval system will be implemented.
The document discusses content-based image retrieval (CBIR) systems. It describes how CBIR systems use feature extraction to search large image databases based on visual content. The key components of CBIR systems are feature extraction, indexing, and system design. Feature extraction involves extracting information about images' colors, textures, shapes, and spatial locations. Effective features and indexing techniques are needed to make CBIR scalable for large image collections. Performance is evaluated based on how well systems return relevant images.
Literature Review on Content Based Image Retrieval — Upekha Vandebona
This document summarizes a literature review on content-based image retrieval (CBIR). It discusses how CBIR uses computer vision techniques to automatically extract visual features from images for retrieval, unlike traditional concept-based methods that rely on metadata/text. The key visual features discussed are color, texture, and shape. A typical CBIR system architecture includes creating an image database, automatically extracting features, searching by example or semantics, and ranking results. Distance measures are used to compare image features and evaluate retrieval performance. Combining CBIR with concept-based techniques could improve image retrieval overall.
Visual search, also known as content-based image retrieval, allows users to search for images using either text queries, visual queries by uploading an example image, or visual queries by drawing an image. It has many applications including searching product catalogs, maps, photo archives, and for law enforcement. A visual search system typically uses low-level image descriptors for color, texture, shape and spatial layout to extract machine-understandable features from images. It then calculates similarity distances between images and indexes them to allow efficient searching. Performance is measured using precision and recall metrics. Existing visual search engines can still struggle with semantic gaps between low-level features and high-level human concepts.
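The precision and recall metrics mentioned above can be computed directly from the set of retrieved images and the set of relevant images (the identifiers below are illustrative):

```python
def precision_recall(retrieved, relevant):
    """Precision = fraction of retrieved images that are relevant;
    recall = fraction of relevant images that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# 4 images returned, 3 images actually relevant, 2 of them found
p, r = precision_recall(retrieved=["a", "b", "c", "d"], relevant=["a", "c", "e"])
print(p, r)  # → 0.5 0.666...
```

Reporting both numbers matters: returning everything maximizes recall at terrible precision, and returning one safe hit does the reverse.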
This document outlines a presentation on content-based image retrieval (CBIR). It discusses the motivation for CBIR by describing limitations of text-based image retrieval, such as problems with image annotation, human perception, and queries that cannot be described with text. CBIR allows images to be retrieved based on automatically extracted visual features like color, texture, and histograms. A typical CBIR system extracts image features and then matches features to find visually similar images. Applications of CBIR include crime prevention, security, medical diagnosis, and intellectual property. The conclusion states that CBIR reduces computation time and increases user interaction compared to other methods.
Content Based Image and Video Retrieval Algorithm — Akshit Bum
The document describes content-based image and video retrieval (CBIR) algorithms. It discusses how CBIR works by extracting features from query images, indexing images, and retrieving similar images based on color, shape, and texture features. CBIR techniques include reverse image search, semantic retrieval using queries, and relevance feedback to refine searches based on user input about retrieved images. The document provides examples of CBIR applications in areas like crime prevention, military, web searching, and medical diagnosis.
The project aims at the development of an efficient segmentation method for the CBIR system. Mean-shift segmentation generates a list of potential objects which are meaningful, and these objects are then clustered according to a predefined similarity measure. The method was tested on benchmark data and an F-score of 0.30 was achieved.
- Content-based image retrieval (CBIR) searches for images based on visual features like color, texture, and shape rather than keywords.
- CBIR systems extract features from images to create metadata and use those features to calculate visual similarity between images.
- Relevance feedback allows users to provide feedback on initial search results to help the system recalculate feature weights and improve subsequent results.
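The feedback loop in the last bullet is often implemented as a Rocchio-style query update: the query vector is moved toward the mean of results the user marked relevant and away from the mean of non-relevant ones. A minimal sketch (the weights are illustrative defaults, and the vectors are toy features):

```python
def rocchio_update(query, relevant, nonrelevant,
                   alpha=1.0, beta=0.75, gamma=0.25):
    """Rocchio relevance feedback: shift the query toward the centroid
    of relevant results and away from the non-relevant centroid."""
    def mean(vectors):
        n = len(vectors)
        return ([sum(col) / n for col in zip(*vectors)]
                if n else [0.0] * len(query))
    rel_mean = mean(relevant)
    non_mean = mean(nonrelevant)
    return [alpha * q + beta * r - gamma * s
            for q, r, s in zip(query, rel_mean, non_mean)]

q2 = rocchio_update([1.0, 0.0],
                    relevant=[[2.0, 0.0], [4.0, 0.0]],
                    nonrelevant=[[0.0, 2.0]])
print(q2)  # → [3.25, -0.5], pulled toward the relevant cluster
```

Other CBIR systems achieve the same effect by reweighting per-feature distances instead of moving the query point; both variants use the same relevant/non-relevant split from the user.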
Content-based image retrieval (CBIR) uses visual image content to search large image databases according to user needs. CBIR systems represent images by extracting features related to color, shape, texture, and spatial layout. Features are extracted from regions of the image and compared to features of images in the database to find the most similar matches. CBIR has applications in medical imaging, fingerprints, photo collections, and more. Techniques include representing images with histograms of color and texture features extracted through transforms.
Vitamins are complex substances that regulate body processes and act as coenzymes in enzyme reactions. They are named alphabetically and classified as either fat-soluble or water-soluble. Fat-soluble vitamins like A, D, E and K can be stored in the body, while water-soluble vitamins like C and the B vitamins cannot be stored and must be replenished regularly through diet. Deficiencies in vitamins can cause a variety of symptoms and even permanent damage if left untreated. The document provides details on the functions, dietary sources, deficiency symptoms and risk factors for various vitamins.
This document discusses a project involving the development of a new type of aircraft called the WHITNEY S201. It mentions components like the MmLC2 AMM engine and provides technical specifications like a maximum takeoff weight of 41,000 pounds. The document also notes this project is still in development with further testing planned.
The document summarizes the key strategies and tactics used in Barack Obama's successful 2008 presidential campaign. It outlines how the campaign focused on themes of change, unity, reform, honesty and hope. It emphasizes expanding the electoral map, embracing new technologies, and expanding the electorate by appealing to youth and Latino voters. Advertising was tailored to different demographic groups and coordinated across paid, earned, and social media to turn interested voters into active supporters. The integrated approach helped Obama win the election.
Presented at DocTrain East 2007 by Joe Gollner, Stilo International -- This workshop will introduce participants to S1000D, a rapidly evolving standard that has gained a growing level of adoption as a shared approach to addressing the wide range of requirements associated with planning, creating, managing, publishing and exchanging documentation for complex equipment systems. The workshop will provide guidelines for assessing the applicability of S1000D and an implementation framework for managing S1000D deployments. The following topics will receive specific attention:
* An overview of S1000D, its purpose and history
* A review of the S1000D schema framework
* A closer look at specific models
* The underlying identification and management schemes
* Recent changes and future directions
* Implementation examples
* Criteria for determining if S1000D is right for you
* Key considerations to keep in mind when implementing S1000D
* Comparing S1000D with other standards (e.g., DITA)
The document discusses a contact analysis of a deep groove ball bearing using ANSYS finite element software. A 3D parametric model of the bearing was created using APDL. Contact pairs between the inner/outer rings and balls were defined. Boundary conditions fixing the outer ring and applying a radial load to the inner ring were specified. A nonlinear contact analysis was performed and results for von Mises stress, strain, contact area shape, and penetration were found to be consistent with Hertzian contact theory values, validating the model and analysis. The contact analysis provides a scientific basis for optimizing bearing design under complex loads.
Recent advances in visual information retrieval marques klu june 2010Oge Marques
The document summarizes key points from a 2010 presentation on visual information retrieval (VIR). It revisits conclusions from a 2000 paper on challenges facing content-based image retrieval (CBIR). While some predictions were accurate, like increased data sizes and interaction options, others were not, like solving image understanding. Significant progress was made on benchmarks and datasets but less on similarity metrics. Medical image retrieval poses new challenges to understand but offers opportunities if VIR methods can adapt to new domains.
This document discusses advances in image search and retrieval. It begins with an overview of visual information retrieval and its challenges, including the semantic gap between low-level visual features and high-level semantics. It then covers recent techniques like Google image search and similarity search. The document outlines core concepts like capturing similarity, large datasets, and user needs. It also revisits a 2000 paper on the challenges still facing the field, including the unsolved semantic gap and need for standardized evaluation benchmarks.
Using games to improve computer vision solutionsOge Marques
Dr. Oge Marques discusses using games to improve computer vision solutions. Specifically, Dr. Marques describes a two-player web-based guessing game called Ask'nSeek that helps solve the computer vision problems of object detection, labeling, and semantic scene segmentation. Ask'nSeek logs spatial relationships and labels from a small number of games per image to train machine learning models for these tasks.
Visual Information Retrieval: Advances, Challenges and OpportunitiesOge Marques
Visual Information Retrieval: Advances, Challenges and Opportunities discusses advances and challenges in visual information retrieval. Key points include:
- Visual information retrieval aims to find relevant images/videos based on visual and text queries, addressing the "semantic gap" between low-level features and high-level meanings.
- Advances include improved text-based, content-based, and mixed search methods, as well applications in medical image retrieval and mobile visual search.
- Ongoing challenges include capturing image similarity, addressing various representation gaps, understanding user intentions, and developing broad domain solutions.
A Comparative Study of Content Based Image Retrieval Trends and ApproachesCSCJournals
Content Based Image Retrieval (CBIR) is an important step in addressing image storage and management problems. Latest image technology improvements along with the Internet growth have led to a huge amount of digital multimedia during the recent decades. Various methods, algorithms and systems have been proposed to solve these problems. Such studies revealed the indexing and retrieval concepts, which have further evolved to Content-Based Image Retrieval. CBIR systems often analyze image content via the so-called low-level features for indexing and retrieval, such as color, texture and shape. In order to achieve significantly higher semantic performance, recent systems seek to combine low-level with high-level features that contain perceptual information for human. Purpose of this review is to identify the set of methods that have been used for CBR and also to discuss some of the key contributions in the current decade related to image retrieval and main challenges involved in the adaptation of existing image retrieval techniques to build useful systems that can handle real-world data. By making use of various CBIR approaches accurate, repeatable, quantitative data must be efficiently extracted in order to improve the retrieval accuracy of content-based image retrieval systems. In this paper, various approaches of CBIR and available algorithms are reviewed. Comparative results of various techniques are presented and their advantages, disadvantages and limitations are discussed.
Multimodal Analysis for Bridging Semantic Gap with Biologically Inspired Algo...techkrish
The amount and complexity of digital media being generated, stored, transmitted, analysed and accessed has exponentially increased as a result of advances in computer and Web technologies. Much of this information combines digital images, video, audio, graphics and textual data. Large-scale online video repositories enable users to creatively share material along a wide audience. Consequently, there is an increasing interest in associating media items with free-text annotations, ranging from simple titles and detailed descriptions of the video content. In an effort to reduce the complexity of the annotation task, this talk will outline some of the techniques developed for indexing large-scale multimedia repositories by exploiting multi-modality of information space. One such approach combines the use of semantic expansion and visual analysis for predicting user tags for online videos. The framework is designed to exploit visual features using biologically inspired algorithms and associated textual metadata, which is semantically, expanded using complementary textual resources. The experimental results indicate the usefulness of the proposed approach for analysing large-scale media items.
UNSUPERVISED VISUAL HASHING WITH SEMANTIC ASSISTANT FOR CONTENT-BASED IMAGE R...Nexgen Technology
This document proposes a novel unsupervised visual hashing approach called semantic-assisted visual hashing (SAVH) to improve content-based image retrieval. SAVH leverages rich semantics from auxiliary texts associated with images to boost visual hashing performance without requiring explicit semantic labels. It develops a unified framework to learn hash codes by preserving visual similarities between images, integrating semantic assistance from texts, and characterizing correlations between images and shared topics. Experimental results on several datasets show SAVH achieves superior performance over state-of-the-art techniques by effectively utilizing semantics from texts to assist visual hashing.
The Real Problem of Bridging the Multimedia “Semantic Gap” jrs21
WWW-2007 Panel Position:
- Since video search is visual, the semantic spaces should be defined visually as well
- Create large multimedia knowledge-base with exemplar content representing all semantic concepts relevant for search
- Allow semantics space to evolve from end-user perspectives (across sports, entertainment, news)
- Allow technology to focus on extracting the relevant semantics – truly providing the needed data-driven approach for bridging the multimedia semantic gap
Content-Based Image Retrieval (CBIR) systems employ colour as primary feature with texture and shape as secondary features. In this project a simple, image retrieval system will be implemented
The document discusses content-based image retrieval (CBIR) systems. It describes how CBIR systems use feature extraction to search large image databases based on visual content. The key components of CBIR systems are feature extraction, indexing, and system design. Feature extraction involves extracting information about images' colors, textures, shapes, and spatial locations. Effective features and indexing techniques are needed to make CBIR scalable for large image collections. Performance is evaluated based on how well systems return relevant images.
Literature Review on Content Based Image RetrievalUpekha Vandebona
This document summarizes a literature review on content-based image retrieval (CBIR). It discusses how CBIR uses computer vision techniques to automatically extract visual features from images for retrieval, unlike traditional concept-based methods that rely on metadata/text. The key visual features discussed are color, texture, and shape. A typical CBIR system architecture includes creating an image database, automatically extracting features, searching by example or semantics, and ranking results. Distance measures are used to compare image features and evaluate retrieval performance. Combining CBIR with concept-based techniques could improve image retrieval overall.
Visual search, also known as content-based image retrieval, allows users to search for images using either text queries, visual queries by uploading an example image, or visual queries by drawing an image. It has many applications including searching product catalogs, maps, photo archives, and for law enforcement. A visual search system typically uses low-level image descriptors for color, texture, shape and spatial layout to extract machine-understandable features from images. It then calculates similarity distances between images and indexes them to allow efficient searching. Performance is measured using precision and recall metrics. Existing visual search engines can still struggle with semantic gaps between low-level features and high-level human concepts.
This document outlines a presentation on content-based image retrieval (CBIR). It discusses the motivation for CBIR by describing limitations of text-based image retrieval, such as problems with image annotation, human perception, and queries that cannot be described with text. CBIR allows images to be retrieved based on automatically extracted visual features like color, texture, and histograms. A typical CBIR system extracts image features and then matches features to find visually similar images. Applications of CBIR include crime prevention, security, medical diagnosis, and intellectual property. The conclusion states that CBIR reduces computation time and increases user interaction compared to other methods.
Content Based Image and Video Retrieval Algorithm (Akshit Bum)
The document describes content-based image and video retrieval (CBIR) algorithms. It discusses how CBIR works by extracting features from query images, indexing images, and retrieving similar images based on color, shape, and texture features. CBIR techniques include reverse image search, semantic retrieval using queries, and relevance feedback to refine searches based on user input about retrieved images. The document provides examples of CBIR applications in areas like crime prevention, military, web searching, and medical diagnosis.
The project aims at developing an efficient segmentation method for a CBIR system. Mean-shift segmentation generates a list of potential objects which are meaningful, and these objects are then clustered according to a predefined similarity measure. The method was tested on benchmark data, and an F-score of 0.30 was achieved.
- Content-based image retrieval (CBIR) searches for images based on visual features like color, texture, and shape rather than keywords.
- CBIR systems extract features from images to create metadata and use those features to calculate visual similarity between images.
- Relevance feedback allows users to provide feedback on initial search results to help the system recalculate feature weights and improve subsequent results.
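One classic way to recalculate the query from user feedback is a Rocchio-style update, borrowed from text retrieval. This is a hedged sketch with made-up weights, not the method of any particular CBIR system described here:

```python
import numpy as np

def rocchio_update(query, positives, negatives, alpha=1.0, beta=0.75, gamma=0.25):
    """Move the query feature vector toward images the user marked
    relevant and away from those marked non-relevant."""
    query = alpha * np.asarray(query, dtype=float)
    if len(positives):
        query += beta * np.mean(positives, axis=0)
    if len(negatives):
        query -= gamma * np.mean(negatives, axis=0)
    return query

q0 = np.array([0.5, 0.5])
q1 = rocchio_update(q0, positives=[[1.0, 0.0]], negatives=[[0.0, 1.0]])
# The updated query leans toward the direction the user marked relevant.
assert q1[0] > q0[0] and q1[1] < q0[1]
```

Iterating this loop — search, mark results, update — is what lets subsequent result sets improve without the user ever naming a feature explicitly.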
Content-based image retrieval (CBIR) uses visual image content to search large image databases according to user needs. CBIR systems represent images by extracting features related to color, shape, texture, and spatial layout. Features are extracted from regions of the image and compared to features of images in the database to find the most similar matches. CBIR has applications in medical imaging, fingerprints, photo collections, and more. Techniques include representing images with histograms of color and texture features extracted through transforms.
Vitamins are complex substances that regulate body processes and act as coenzymes in enzyme reactions. They are named alphabetically and classified as either fat-soluble or water-soluble. Fat-soluble vitamins like A, D, E and K can be stored in the body, while water-soluble vitamins like C and the B vitamins cannot be stored and must be replenished regularly through diet. Deficiencies in vitamins can cause a variety of symptoms and even permanent damage if left untreated. The document provides details on the functions, dietary sources, deficiency symptoms and risk factors for various vitamins.
This document discusses a project involving the development of a new type of aircraft called the WHITNEY S201. It mentions components like the MmLC2 AMM engine and provides technical specifications like a maximum takeoff weight of 41,000 pounds. The document also notes this project is still in development with further testing planned.
The document summarizes the key strategies and tactics used in Barack Obama's successful 2008 presidential campaign. It outlines how the campaign focused on themes of change, unity, reform, honesty and hope. It emphasizes expanding the electoral map, embracing new technologies, and expanding the electorate by appealing to youth and Latino voters. Advertising was tailored to different demographic groups and coordinated across paid, earned, and social media to turn interested voters into active supporters. The integrated approach helped Obama win the election.
Presented at DocTrain East 2007 by Joe Gollner, Stilo International -- This workshop will introduce participants to S1000D, a rapidly evolving standard that has gained a growing level of adoption as a shared approach to addressing the wide range of requirements associated with planning, creating, managing, publishing and exchanging documentation for complex equipment systems. The workshop will provide guidelines for assessing the applicability of S1000D and an implementation framework for managing S1000D deployments. The following topics will receive specific attention:
* An overview of S1000D, its purpose and history
* A review of the S1000D schema framework
* A closer look at specific models
* The underlying identification and management schemes
* Recent changes and future directions
* Implementation examples
* Criteria for determining if S1000D is right for you
* Key considerations to keep in mind when implementing S1000D
* Comparing S1000D with other standards (e.g., DITA)
The document discusses a contact analysis of a deep groove ball bearing using ANSYS finite element software. A 3D parametric model of the bearing was created using APDL. Contact pairs between the inner/outer rings and balls were defined. Boundary conditions fixing the outer ring and applying a radial load to the inner ring were specified. A nonlinear contact analysis was performed and results for von Mises stress, strain, contact area shape, and penetration were found to be consistent with Hertzian contact theory values, validating the model and analysis. The contact analysis provides a scientific basis for optimizing bearing design under complex loads.
OGC Sept 2010: Meta-propagation of uncertainties within workflows (Didier G. Leibovici)
To begin with, let us quote QA4EO (Quality Assurance for Earth Observation):
“If the vision of GEOSS is to be achieved, Quality Indicators (QIs) should be ascribed to data and, in particular, to delivered information products, at each stage of the data processing chain - from collection and processing to delivery. A QI should provide sufficient information to allow all users to readily evaluate a product’s suitability for their particular application, i.e. its “fitness for purpose”. To ensure that this process is internationally harmonised and consistent, the QI needs to be based on a documented and quantifiable assessment of evidence demonstrating the level of traceability to internationally agreed (where possible SI) reference standards. Such standards may be manmade, natural or intrinsic in nature. The documented evidence should include a description of the processes used, together with an uncertainty budget (or other appropriate quality performance measure). The guidelines of QA4EO provide a template and guidance on how to achieve this in a harmonised and robust manner.”
For interoperability purposes, each data and process registered within EuroGEOSS possesses appropriate metadata elements. The metadata description and the semantics attached to each component of a workflow (datasets and processing services) allow updating/swapping of these components. With varying quality of the components of the workflow, the quality of the outputs of this workflow can become unreliable. With the knowledge of the level of uncertainty in each dataset involved and the sensitivity aspects of the processing steps it is possible to define the quality of a workflow and the level of uncertainty of the outputs by error propagation principles.
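The error-propagation idea can be illustrated with a Monte Carlo sketch: sample each input dataset value according to its stated uncertainty, push the samples through a processing step, and read the output uncertainty off the resulting spread. The workflow step below is hypothetical, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def propagate(step, means, stdevs, n=100_000):
    """Monte Carlo error propagation: draw each input from a normal
    distribution with its stated uncertainty and observe the output spread."""
    samples = [rng.normal(m, s, n) for m, s in zip(means, stdevs)]
    out = step(*samples)
    return out.mean(), out.std()

# Hypothetical workflow step: output = a * b (e.g. an area times a density).
mean, std = propagate(lambda a, b: a * b, means=[10.0, 2.0], stdevs=[0.1, 0.05])
# First-order theory predicts std ≈ sqrt((2*0.1)^2 + (10*0.05)^2) ≈ 0.54
```

Swapping in a dataset with larger stated uncertainty immediately shows up as a larger output spread, which is exactly the assessment the workflow quality metadata is meant to support.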
Reusing a given model encapsulated in a scientific workflow implies running the workflow using either the same datasets (though not necessarily from the same sources) or different datasets, which do not necessarily have the scale required or desired by the workflow. From error propagation principles and knowledge of the quality metadata of the workflow's components, the impact on workflow quality of using datasets from different sources or at different scales can be assessed. As part of the integrated modelling activity, this assessment will help the modeller choose appropriate datasets or refine the workflow model, for example by considering data assimilation, downscaling, or multiple-scale integration steps within the scientific model and its associated workflow. The workflow quality assessment will also help the modeller swap or refine the processing steps. Under these modelling activities, the workflow is then seen as the concrete support of a conceptual model, and it evolves as the conceptual model does.
Beyond the quality descriptors already defined in ISO 19157, the present document describes the requirements for uncertainty analysis within scientific workflows.
iMinds 2009 Health Decision Support, Prof. Bart De Moor (IBBT ESAT KU Leuven) (imec.archive)
This document discusses trends and opportunities in health decision support systems. It notes the exponential growth of data from technologies like genomics and imaging. This data tsunami creates opportunities for advanced decision support through integration of heterogeneous data sources. Multimodal imaging data and gene prioritization are examples given. The document also discusses building clinical decision support systems, policy decision support, and embedded decision support systems. It outlines several areas for further research and development like information security, population data mining, home health monitoring, and advanced signal processing.
This document outlines a master's project that aims to apply 2-Dimensional Digital Image Correlation (2D-DIC) to map bond strain and stress distribution in concrete pull-out specimens. Eleven concrete specimens with varying bar diameters and fiber contents were tested. 2D-DIC analysis was used to find displacement fields from images taken during testing, which were then used to calculate strain and stress distributions. Results showed good agreement between 2D-DIC displacements and measurements from LVDT sensors. Strain contours were mapped for two selected specimens.
The document discusses expressive gesture generation for the NAO robot. It aims to 1) generate communicative gestures integrated within an existing virtual agent platform and 2) focus on expressivity and synchronization of gestures with speech. The methodology includes building a gesture library from video, using a common framework to control virtual and physical agents, and specifying gestures symbolically to convey meaning while accounting for different embodiments.
Genmab reported financial results for the first nine months of 2011, with revenue of DKK 258 million and a net loss of DKK 553 million. Operating expenses decreased 22% year-over-year due to lower R&D costs. GSK sales of ofatumumab increased 45% year-over-year. Genmab updated 2011 guidance and objectives, including maximizing ofatumumab value, evaluating opportunities for zalutumumab, advancing daratumumab, and expanding the pipeline.
Lithography technology and trends for « Semiconductor frontier » held by Aman... (Yole Developpement)
Lithography technology and trends for « Semiconductor frontier »
Mask aligners are the fastest lithography technology
Stepper technology provides the best resolution
Key requirements for Advanced Packaging
WAFER SIZE
LED manufacturers use small-diameter wafers (2”, 3”, 4” or 6”) and transition to larger diameters more rapidly than the traditional semiconductor industry.
WAFER BOW
Wafer bow can reach up to 50 μm for 2” wafers and 100 μm for 4” wafers, inducing pattern distortion.
LED manufacturers can use different substrates, mostly sapphire or SiC wafers, which are transparent with light-diffusing features such as rough or patterned surfaces. They can also use metal wafers for vertical structures, so there is large material variability.
This document discusses methods for identifying and reporting on intellectual capital (IC). It outlines both manual and automatic content analysis approaches, noting their respective advantages and disadvantages. Manual coding allows for consideration of IC complexity but is time-consuming, while automatic methods save time but cannot account for nuances. The document also examines the dynamic and complex nature of IC. It proposes establishing an IC taxonomy and developing IC reporting that recognizes complexity, establishes an IC ecosystem, and is dynamic over time. The goal is to effectively locate, measure, and mobilize a company's IC.
D3 manufactures high-quality LED displays and digital signage systems. They offer indoor and outdoor displays in a variety of sizes with pixel pitches ranging from 6mm to 24mm. D3 displays have advanced features such as high image quality, reliable operation, energy efficiency, and easy maintenance. Their content management systems allow clients to display video, graphics, and other media on their LED signs. Major D3 clients include ABC Studios, Walgreens, JVC, Forever 21, Disney Stores, and various advertisers.
Sustainable growth in a sustained crisis - the business model as a tool to in... (Kasper Roldsgaard)
Research Festival 2012, April 19-20. Copenhagen, Denmark
Includes some examples of business model innovation and examples of companies that haven't innovated their business model.
This document discusses a study that used tagged magnetic resonance imaging (MRI) in the OsiriX platform to estimate myocardial torsion in the left ventricle. Tagged MRI was used to track motion over time by overlaying a grid pattern on the heart that deforms with cardiac motion. Researchers developed a plugin for OsiriX to segment the left ventricle boundaries and track motion between frames using harmonic phase flow imaging. They validated their method on data from 48 subjects, finding similar torsion trends but noisy results for some acquisitions. Future work will focus on quality indicators to improve reliability.
Lip Reading by Using 3-D Discrete Wavelet Transform with Dmey Wavelet (CSCJournals)
Lip movement is a useful way to communicate with machines and is extremely helpful in noisy environments. However, the recognition of lip motion is a difficult task since the region of interest (ROI) is nonlinear and noisy. The proposed lip reading method uses a two-stage feature extraction mechanism which is precise, discriminative and computationally efficient. The first stage converts video frame data into a 3-dimensional space, and the second stage trims down the raw information space by using a 3-D Discrete Wavelet Transform (DWT). These features are smaller in size, giving rise to a novel lip reading system. In addition to the novel feature extraction technique, we have also compared the performance of Back Propagation Neural Network (BPNN) and Support Vector Machine (SVM) classifiers. The CUAVE and Tulips databases are used for experimentation. Experimental results show that 3-D DWT feature mining is better than 2-D DWT, and 3-D DWT with the Dmey wavelet gives better results than 3-D DWT with Db4. The experiments also show that 3-D DWT-Dmey along with the BPNN classifier outperforms SVM.
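The paper's feature extraction uses a 3-D DWT with the Dmey wavelet; as a much-simplified illustration of how a wavelet transform trims down the raw pixel space, here is one level of a 2-D Haar DWT (Haar chosen only for brevity, not used by the paper):

```python
import numpy as np

def haar_dwt2(x):
    """One level of a 2-D Haar DWT: returns the approximation (LL) and
    detail (LH, HL, HH) subbands, each half the input size per axis."""
    # Rows: orthonormal average / difference of adjacent pixel pairs.
    lo = (x[:, 0::2] + x[:, 1::2]) / np.sqrt(2)
    hi = (x[:, 0::2] - x[:, 1::2]) / np.sqrt(2)
    # Columns: same filtering applied to the row-filtered outputs.
    ll = (lo[0::2, :] + lo[1::2, :]) / np.sqrt(2)
    lh = (lo[0::2, :] - lo[1::2, :]) / np.sqrt(2)
    hl = (hi[0::2, :] + hi[1::2, :]) / np.sqrt(2)
    hh = (hi[0::2, :] - hi[1::2, :]) / np.sqrt(2)
    return ll, lh, hl, hh

frame = np.arange(64.0).reshape(8, 8)   # stand-in for one lip-ROI frame
ll, lh, hl, hh = haar_dwt2(frame)
assert ll.shape == (4, 4)               # each subband is 4x smaller in area
```

Keeping mostly the LL coefficients (and recursing on them) is what shrinks the feature space before classification; the 3-D version in the paper additionally filters along the time axis across frames.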
The document discusses the GS100 survey of global IT and business process outsourcing providers. It provides an overview of the 2010 survey results, including geographic coverage, industry sizes, revenue growth from 2008-2009, new customer contracts by size and sector, global delivery locations and headcounts, top employers, and common management practices. A model for service providers emphasizes customer maturity, global delivery maturity, services portfolio breadth, and management excellence.
Designing at 2x nanometers: Some New Problems Appear & Some Old Ones Remain (chiportal)
Designing at the 2x nanometer scale presents new challenges. Some key challenges include increased complexity, higher power consumption, and difficulties with lithography at smaller scales. Potential solutions explored include non-planar transistor structures, double patterning lithography, and 3D stacking through silicon interposers. Tools are being enhanced to support these new device structures and integration approaches to continue scaling to smaller nodes.
This document describes a time-domain surface grinding model developed for dynamic simulation purposes. The model accounts for machine tool dynamics, grinding force modeling, and workpiece representation. An experimental campaign was used to calibrate and validate the model. Tests with introduced wheel unbalances showed good agreement between measured and simulated forces. Some evidence of a non-regenerative instability was also observed during single-pass grinding tests. Future work will involve analytical study of this instability and inclusion of wheel wear modeling.
This document discusses software tools for umbilical design, analysis, and modeling. UmbiliCAD is introduced as a cross-section design tool that allows early analysis of umbilical properties. Helica is then presented as a stress analysis tool that can be used with UmbiliCAD for more advanced analysis, including load sharing, stiffness properties, fatigue analysis, and bending behavior accounting for friction. The document demonstrates how UmbiliCAD and Helica can be used together for a bundled umbilical design workflow, from cross-section design to local and global analysis.
Similar to Image retrieval: challenges and opportunities (18)
Image Processing and Computer Vision in iOS (Oge Marques)
- Image processing and computer vision applications are becoming more common on mobile devices like the iPhone and iPad. There are many opportunities to build successful apps that can improve how users work with images and videos.
- The talk provided an overview of developing image and computer vision apps for iOS, including recommended tools like Core Image and OpenCV. It also offered advice on focusing an app idea on solving a specific problem and being aware of competition and market timing.
- Mobile image processing and computer vision have a promising future, and there is a need for good solutions to specific problems in this area that developers can work on building.
Advances and Challenges in Visual Information Search and Retrieval (WVC 2012 ...) (Oge Marques)
Part I – Concepts, challenges, and state of the art
Part II – Medical image retrieval
Part III – Mobile visual search
Part IV – Where is image search headed?
Mobile Visual Search (MVS) is a fascinating research field that has the potential to impact how visual data is organized, annotated, and retrieved using mobile devices. The document outlines opportunities in MVS, basic concepts, and technical aspects of MVS systems. It discusses the MVS pipeline including descriptor extraction, interest point detection, feature descriptor computation, feature indexing/matching, and geometric verification. Challenges of MVS like low latency, robust recognition, and handling broad/narrow domains are also covered. The Compressed Histogram of Gradients (CHoG) descriptor is presented as an example of a compact descriptor designed for MVS.
Image Processing and Computer Vision in iPhone and iPad (Oge Marques)
This document provides an overview of image processing and computer vision applications for the iPhone and iPad. It discusses the growing market for mobile apps in this field and the technical capabilities of iPhone devices. The document outlines a mini-course on developing iPhone and iPad apps for image processing and computer vision. It covers fundamentals of iOS development like Xcode, Objective-C, classes and objects, and the model-view-controller design pattern. It also discusses OpenCV and examples of commercial apps and student projects.
Oge Marques (FAU) - invited talk at WISMA 2010 (Barcelona, May 2010)
- Image search and retrieval remains a challenging problem with many open issues even 10 years after it was deemed to be past its early years.
- While progress has been made in areas like datasets, benchmarks and interfaces, core problems around similarity, semantics, and bridging the semantic gap between low-level visual features and high-level concepts remain largely unsolved.
- Narrowing domains and combining content-based techniques with metadata and user involvement through tagging and feedback may provide more successful solutions going forward.
In the realm of cybersecurity, offensive security practices act as a critical shield. By simulating real-world attacks in a controlled environment, these techniques expose vulnerabilities before malicious actors can exploit them. This proactive approach allows manufacturers to identify and fix weaknesses, significantly enhancing system security.
This presentation delves into the development of a system designed to mimic Galileo's Open Service signal using software-defined radio (SDR) technology. We'll begin with a foundational overview of both Global Navigation Satellite Systems (GNSS) and the intricacies of digital signal processing.
The presentation culminates in a live demonstration. We'll showcase the manipulation of Galileo's Open Service pilot signal, simulating an attack on various software and hardware systems. This practical demonstration serves to highlight the potential consequences of unaddressed vulnerabilities, emphasizing the importance of offensive security practices in safeguarding critical infrastructure.
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc... (DanBrown980551)
This LF Energy webinar took place June 20, 2024. It featured:
-Alex Thornton, LF Energy
-Hallie Cramer, Google
-Daniel Roesler, UtilityAPI
-Henry Richardson, WattTime
In response to the urgency and scale required to effectively address climate change, open source solutions offer significant potential for driving innovation and progress. Currently, there is a growing demand for standardization and interoperability in energy data and modeling. Open source standards and specifications within the energy sector can also alleviate challenges associated with data fragmentation, transparency, and accessibility. At the same time, it is crucial to consider privacy and security concerns throughout the development of open source platforms.
This webinar will delve into the motivations behind establishing LF Energy’s Carbon Data Specification Consortium. It will provide an overview of the draft specifications and the ongoing progress made by the respective working groups.
Three primary specifications will be discussed:
-Discovery and client registration, emphasizing transparent processes and secure and private access
-Customer data, centering around customer tariffs, bills, energy usage, and full consumption disclosure
-Power systems data, focusing on grid data, inclusive of transmission and distribution networks, generation, intergrid power flows, and market settlement data
What is an RPA CoE? Session 1 – CoE Vision (DianaGray10)
In the first session, we will review the organization's vision and how this has an impact on the CoE structure.
Topics covered:
• The role of a steering committee
• How do the organization’s priorities determine CoE Structure?
Speaker:
Chris Bolin, Senior Intelligent Automation Architect Anika Systems
Dandelion Hashtable: beyond billion requests per second on a commodity server (Antonios Katsarakis)
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables, which go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state of the art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line chaining. This design (1) offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. On a commodity server and a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
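For orientation, the closed-addressing (chaining) scheme DLHT builds on can be sketched in a few lines. This toy omits everything that makes DLHT interesting (lock-freedom, bounded cache-line chains, prefetching, non-blocking resize) but shows why chaining can free slots on delete instantly, unlike open addressing:

```python
class ChainedHashTable:
    """Minimal closed-addressing hashtable: each bucket holds a small
    list of (key, value) pairs, so a delete frees its slot immediately."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]
        self.size = 0

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        b = self._bucket(key)
        for i, (k, _) in enumerate(b):
            if k == key:
                b[i] = (key, value)   # overwrite existing key in place
                return
        b.append((key, value))
        self.size += 1

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default

    def delete(self, key):
        b = self._bucket(key)
        for i, (k, _) in enumerate(b):
            if k == key:
                del b[i]              # slot freed instantly; no tombstones
                self.size -= 1
                return True
        return False

t = ChainedHashTable()
t.put("a", 1); t.put("b", 2); t.delete("a")
assert t.get("a") is None and t.get("b") == 2
```

Open-addressing tables must instead leave tombstones or block requests to compact, which is the delete-path problem the abstract calls out.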
Northern Engraving | Nameplate Manufacturing Process - 2024
Manufacturing custom quality metal nameplates and badges involves several standard operations. Processes include sheet prep, lithography, screening, coating, punch press and inspection. All decoration is completed in the flat sheet with adhesive and tooling operations following. The possibilities for creating unique durable nameplates are endless. How will you create your brand identity? We can help!
The Department of Veteran Affairs (VA) invited Taylor Paschal, Knowledge & Information Management Consultant at Enterprise Knowledge, to speak at a Knowledge Management Lunch and Learn hosted on June 12, 2024. All Office of Administration staff were invited to attend and received professional development credit for participating in the voluntary event.
The objectives of the Lunch and Learn presentation were to:
- Review what KM ‘is’ and ‘isn’t’
- Understand the value of KM and the benefits of engaging
- Define and reflect on your “what’s in it for me?”
- Share actionable ways you can participate in Knowledge Capture & Transfer
How information systems are built or acquired puts information, which is what they should be about, in a secondary place. Our language adapted accordingly, and we no longer talk about information systems but applications. Applications evolved in a way to break data into diverse fragments, tightly coupled with applications and expensive to integrate. The result is technical debt, which is repaid by taking even bigger "loans", resulting in an ever-increasing technical debt. Software engineering and procurement practices work in sync with market forces to maintain this trend. This talk demonstrates how natural this situation is. The question is: can something be done to reverse the trend?
ScyllaDB is making a major architecture shift. We’re moving from vNode replication to tablets – fragments of tables that are distributed independently, enabling dynamic data distribution and extreme elasticity. In this keynote, ScyllaDB co-founder and CTO Avi Kivity explains the reason for this shift, provides a look at the implementation and roadmap, and shares how this shift benefits ScyllaDB users.
Discover top-tier mobile app development services, offering innovative solutions for iOS and Android. Enhance your business with custom, user-friendly mobile applications.
"Scaling RAG Applications to serve millions of users", Kevin Goedecke (Fwdays)
How we managed to grow and scale a RAG application from zero to thousands of users in 7 months. Lessons from technical challenges around managing high load for LLMs, RAGs and Vector databases.
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk (Fwdays)
At this talk we will discuss DDoS protection tools and best practices, discuss network architectures, and look at what AWS has to offer. Also, we will look into one of the largest DDoS attacks on Ukrainian infrastructure, which happened in February 2022. We'll see what techniques helped to keep the web resources available for Ukrainians, and how AWS improved DDoS protection for all customers based on the Ukraine experience.
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf (Chart Kalyan)
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
Must Know Postgres Extension for DBA and Developer during Migration (Mydbops)
Mydbops Opensource Database Meetup 16
Topic: Must-Know PostgreSQL Extensions for Developers and DBAs During Migration
Speaker: Deepak Mahto, Founder of DataCloudGaze Consulting
Date & Time: 8th June | 10 AM - 1 PM IST
Venue: Bangalore International Centre, Bangalore
Abstract: Discover how PostgreSQL extensions can be your secret weapon! This talk explores how key extensions enhance database capabilities and streamline the migration process for users moving from other relational databases like Oracle.
Key Takeaways:
* Learn about crucial extensions like oracle_fdw, pgtt, and pg_audit that ease migration complexities.
* Gain valuable strategies for implementing these extensions in PostgreSQL to achieve license freedom.
* Discover how these key extensions can empower both developers and DBAs during the migration process.
* Don't miss this chance to gain practical knowledge from an industry expert and stay updated on the latest open-source database trends.
Mydbops Managed Services specializes in taking the pain out of database management while optimizing performance. Since 2015, we have been providing top-notch support and assistance for the top three open-source databases: MySQL, MongoDB, and PostgreSQL.
Our team offers a wide range of services, including assistance, support, consulting, 24/7 operations, and expertise in all relevant technologies. We help organizations improve their database's performance, scalability, efficiency, and availability.
Contact us: info@mydbops.com
Visit: https://www.mydbops.com/
Follow us on LinkedIn: https://in.linkedin.com/company/mydbops
For more details and updates, please follow up the below links.
Meetup Page : https://www.meetup.com/mydbops-databa...
Twitter: https://twitter.com/mydbopsofficial
Blogs: https://www.mydbops.com/blog/
Facebook(Meta): https://www.facebook.com/mydbops/
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
Image retrieval: challenges and opportunities
1. Image retrieval: challenges and opportunities
Oge Marques
Florida Atlantic University
Boca Raton, FL - USA
June 4, 2012
UTFPR, Curitiba, PR - Brazil
3. Google Goggles
• Mobile visual search (MVS) solution
– Android and iPhone
– Narrow-domain search and retrieval
http://www.google.com/mobile/goggles
4. Outline
• How does it work?
• Why is it relevant?
• What else is going on?
• Which challenges and opportunities lie ahead?
6. Fundamentals
• Google Goggles is (one of) the first – and maybe the best-known – solutions for MVS
• It is a contemporary example of content-based image retrieval (CBIR)
• Its technical details (algorithms, etc.) are not publicly available
• However…
7. MVS: Pipeline for image retrieval
Girod et al., IEEE Multimedia, 2011
9. MVS: descriptor extraction
• Interest point detection
• Feature descriptor computation
Girod et al., IEEE Multimedia, 2011
10. Interest point detection
• Numerous interest-point detectors have been proposed in the literature:
– Harris Corners (Harris and Stephens 1988)
– Scale-Invariant Feature Transform (SIFT) Difference-of-Gaussian (DoG) (Lowe 2004)
– Maximally Stable Extremal Regions (MSERs) (Matas et al. 2002)
– Hessian affine (Mikolajczyk et al. 2005)
– Features from Accelerated Segment Test (FAST) (Rosten and Drummond 2006)
– Hessian blobs (Bay, Tuytelaars and Van Gool 2006)
• Different tradeoffs in repeatability and complexity
• See (Mikolajczyk and Schmid 2005) for a comparative performance evaluation of local descriptors in a common framework.
Girod et al., IEEE Signal Processing Magazine, 2011
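Of the detectors listed above, the Harris corner detector is the simplest to sketch: build the 2x2 structure tensor of image gradients over a local window and score each pixel with R = det(M) - k·trace(M)². The naive, unoptimized illustration below is only a teaching sketch, not production detector code:

```python
import numpy as np

def harris_response(img, k=0.04):
    """Harris corner measure R = det(M) - k * trace(M)^2, where M is the
    2x2 structure tensor of image gradients summed over a 3x3 window."""
    img = img.astype(float)
    iy, ix = np.gradient(img)          # central-difference gradients (rows, cols)
    ixx, iyy, ixy = ix * ix, iy * iy, ix * iy
    r = np.zeros_like(img)
    for y in range(1, img.shape[0] - 1):
        for x in range(1, img.shape[1] - 1):
            sxx = ixx[y-1:y+2, x-1:x+2].sum()
            syy = iyy[y-1:y+2, x-1:x+2].sum()
            sxy = ixy[y-1:y+2, x-1:x+2].sum()
            det = sxx * syy - sxy * sxy
            r[y, x] = det - k * (sxx + syy) ** 2
    return r

# A white square on black: responses peak near its corners, not along its edges,
# because corners have strong gradients in BOTH directions.
img = np.zeros((12, 12)); img[3:9, 3:9] = 1.0
resp = harris_response(img)
assert resp[3, 3] > resp[3, 6]   # corner scores above the middle of an edge
```

Repeatability under viewpoint and lighting change, and the cost of this window scan, are exactly the tradeoffs the slide refers to; FAST trades some repeatability for far lower complexity.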
11. Feature descriptor computation
• After interest-point detection, we compute a visual word descriptor on a normalized patch.
• Ideally, descriptors should be:
– robust to small distortions in scale, orientation, and lighting conditions;
– discriminative, i.e., characteristic of an image or a small set of images;
– compact, due to typical mobile computing constraints.
Girod et al., IEEE Signal Processing Magazine 2011
12. Feature descriptor computation
• Examples of feature descriptors in the literature:
– SIFT (Lowe 1999)
– Speeded-Up Robust Features (SURF) (Bay et al. 2008)
– Gradient Location and Orientation Histogram (GLOH)
(Mikolajczyk and Schmid 2005)
– Compressed Histogram of Gradients (CHoG)
(Chandrasekhar et al. 2009, 2010)
• See (Winder and Brown, CVPR 2007; Winder, Hua, and Brown, CVPR 2009) and
(Mikolajczyk and Schmid, PAMI 2005) for comparative performance
evaluations of different descriptors.
Girod et al., IEEE Signal Processing Magazine 2011
13. Feature descriptor computation
• What about compactness?
– Option 1: Compress off-the-shelf descriptors.
• Result: poor rate-constrained image-retrieval
performance.
– Option 2: Design a descriptor with compression in
mind.
– Example: CHoG (Compressed Histogram of Gradients)
(Chandrasekhar et al. 2009, 2010)
Girod et al., IEEE Signal Processing Magazine 2011
14. CHoG: Compressed Histogram of Gradients
[Figure: CHoG pipeline. A patch yields gradients; (dx, dy) gradient distributions are collected for each spatial bin; histogram compression produces the compact CHoG descriptor.]
Bernd Girod: Mobile Visual Search
Chandrasekhar et al., CVPR 2009, 2010
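To ground the pipeline in the figure, here is a toy CHoG-flavored descriptor in NumPy: gradient-orientation histograms over a spatial grid, with a crude 2-bit quantization standing in for real histogram compression. The 3x3 grid, 8 orientation bins, and quantizer are illustrative assumptions, not the published CHoG design:

```python
import numpy as np

def chog_like(patch, grid=3, nbins=8):
    # gradients of the (already normalized) patch
    dy, dx = np.gradient(patch.astype(float))
    mag = np.hypot(dx, dy)           # gradient magnitude
    ang = np.arctan2(dy, dx)         # gradient orientation in (-pi, pi]
    h, w = patch.shape
    desc = []
    for gy in range(grid):           # spatial binning: grid x grid cells
        for gx in range(grid):
            ys = slice(gy * h // grid, (gy + 1) * h // grid)
            xs = slice(gx * w // grid, (gx + 1) * w // grid)
            # magnitude-weighted orientation histogram per cell
            hist, _ = np.histogram(ang[ys, xs], bins=nbins,
                                   range=(-np.pi, np.pi),
                                   weights=mag[ys, xs])
            s = hist.sum()
            desc.append(hist / s if s > 0 else hist)
    desc = np.concatenate(desc)
    # crude "histogram compression": quantize each bin to 2 bits (4 levels)
    return np.clip((desc * 4).astype(np.uint8), 0, 3)

patch = np.random.default_rng(0).random((16, 16))
d = chog_like(patch)  # 3*3 cells x 8 bins = 72 quantized values
```

A real implementation would use CHoG's soft spatial binning and tree-coded histogram compression; the sketch only shows the spatial-binning-plus-compressed-histogram structure that makes the descriptor compact.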
15. CHoG: Compressed Histogram of Gradients
• Performance evaluation
– Recall vs. bit rate
[Figure 7: Comparison of different schemes (send feature (CHoG), send image (JPEG), send feature (SIFT)) with regard to classification accuracy and query size. CHoG descriptor data is an order of magnitude smaller compared to JPEG images or uncompressed SIFT descriptors.]
Girod et al., IEEE Multimedia 2011
16. MVS: feature indexing and matching
• Goal: produce a data structure that can quickly return a short
list of the database candidates most likely to match the query
image.
– The short list may contain false positives as long as the correct match
is included.
– Slower pairwise comparisons can be subsequently performed on just
the short list of candidates rather than the entire database.
• Example of a technique: Vocabulary Tree (VT)-Based Retrieval
Girod et al., IEEE Multimedia 2011
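The shortlist idea can be sketched with a flat bag-of-visual-words inverted index standing in for the vocabulary tree (a real VT quantizes with hierarchical k-means so lookup is logarithmic in vocabulary size); the 50-word random vocabulary and toy database below are illustrative assumptions:

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(1)
words = rng.random((50, 8))  # toy "vocabulary": 50 visual words in 8-D

def quantize(desc):
    # nearest visual word for each descriptor (flat search here;
    # a vocabulary tree makes this O(log #words))
    d2 = ((desc[:, None, :] - words[None, :, :]) ** 2).sum(-1)
    return d2.argmin(1)

# toy database: image id -> descriptors; inverted index: word -> image ids
db = {i: rng.random((30, 8)) for i in range(5)}
inverted = defaultdict(set)
for img_id, desc in db.items():
    for w in quantize(desc):
        inverted[w].add(img_id)

def shortlist(query_desc, k=3):
    # vote for every database image that shares a visual word with the query
    votes = defaultdict(int)
    for w in set(quantize(query_desc)):
        for img_id in inverted[w]:
            votes[img_id] += 1
    return sorted(votes, key=votes.get, reverse=True)[:k]

# a query built from image 2's own descriptors should put image 2 on the shortlist
result = shortlist(db[2])
```

False positives on the shortlist are acceptable, exactly as the slide says: the expensive pairwise comparison (and later geometric verification) runs only on these few candidates.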
17. MVS: geometric verification
• Goal: use location information of features in
query and database images to confirm that the
feature matches are consistent with a change in
viewpoint between the two images.
Girod et al., IEEE Multimedia 2011
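A minimal sketch of this kind of consistency check: fit an affine model to putative correspondences with RANSAC (Fischler and Bolles 1981) and count inliers. The point counts, inlier tolerance, and outlier fraction below are illustrative assumptions:

```python
import numpy as np

def ransac_affine(src, dst, iters=200, tol=1.0, seed=0):
    """Fit dst ~ [src, 1] @ M (2-D affine) with RANSAC; return best inlier count."""
    rng = np.random.default_rng(seed)
    n = len(src)
    best_inliers = 0
    for _ in range(iters):
        idx = rng.choice(n, 3, replace=False)  # 3 correspondences fix an affine map
        A = np.hstack([src[idx], np.ones((3, 1))])
        try:
            M = np.linalg.solve(A, dst[idx])   # solve [x y 1] @ M = [x' y']
        except np.linalg.LinAlgError:
            continue                           # degenerate (collinear) sample
        pred = np.hstack([src, np.ones((n, 1))]) @ M
        inliers = (np.linalg.norm(pred - dst, axis=1) < tol).sum()
        best_inliers = max(best_inliers, inliers)
    return best_inliers

# synthetic matches: a rotation + translation, with 15 of 50 corrupted to outliers
rng = np.random.default_rng(42)
src = rng.random((50, 2)) * 100
theta = 0.3
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
dst = src @ rot.T + np.array([5.0, -2.0])
dst[:15] = rng.random((15, 2)) * 100  # wrong matches
inliers = ransac_affine(src, dst)     # should recover the ~35 true matches
```

A match is accepted only if enough correspondences agree with one transform; random mismatches rarely do, which is why this step filters the false positives that survive feature matching.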
18. MVS: geometric verification
• Method: perform pairwise matching of feature descriptors and evaluate the
geometric consistency of the correspondences.
• Techniques:
– The geometric transform between the query and database image is usually estimated
using robust regression techniques such as:
• Random sample consensus (RANSAC) (Fischler and Bolles 1981)
• Hough transform (Lowe 2004)
– The transformation is often represented by an affine mapping or a homography.
• Note: GV is computationally expensive, which is why it’s only used for a subset
of images selected during the feature-matching stage.
[FIG4: In the GV step, we match feature descriptors pairwise and find feature correspondences that are consistent with a geometric transformation.]
Girod et al., IEEE Multimedia 2011
20. Relevance
• Explosive growth and increasing popularity of
mobile devices and apps
• (Finally!) a good use case for CBIR
• Many commercial opportunities
21. Mobile visual search: driving factors
• Age of mobile computing
http://60secondmarketer.com/blog/2011/10/18/more-mobile-phones-than-toothbrushes/
22. Mobile visual search: driving factors
• Why do I need a camera? I have a smartphone…
(22 Dec 2011)
http://www.cellular-news.com/story/52382.php
23. Mobile visual search: driving factors
• Powerful devices
1 GHz ARM Cortex-A9 processor, PowerVR SGX543MP2, Apple A5 chipset
http://www.apple.com/iphone/specs.html
http://www.gsmarena.com/apple_iphone_4s-4212.php
24. Mobile visual search: driving factors
• Powerful devices
http://europe.nokia.com/PRODUCT_METADATA_0/Products/Phones/8000-series/808/Nokia808PureView_Whitepaper.pdf
http://www.nokia.com/fr-fr/produits/mobiles/808/
25. Mobile visual search: driving factors
• Instagram:
– 50 million registered users (35 M in last four
months)
– 7 employees
– A growing ecosystem based on it!
• Search
• Send postcards
• Manage your photos
• Build a poster
• etc.
– Sold to Facebook (for $1 billion!) earlier this year
http://thenextweb.com/apps/2011/12/07/instagram-hits-15m-users-and-has-2-people-working-on-an-android-app-right-now/
http://www.nuwomb.com/instagram/
26. Mobile visual search: driving factors
• A natural use case for CBIR with QBE (at last!)
– The example is right in front of the user!
[FIG1: A snapshot of an outdoor mobile visual search system being used. The system augments the viewfinder with information about the objects it recognizes in the image taken with a camera phone.]
Girod et al., IEEE Multimedia 2011
27. MVS: commercial opportunities
• Example app (La Redoute by pixlinQ)
http://www.youtube.com/watch?v=qUZCFtc42Q4
29. Context
• Research: datasets and groups
• Standardization: MPEG CDVS efforts
• Commercial: main players (so far)
30. Datasets for MVS research
• Stanford Mobile Visual Search Data Set
(http://web.cs.wpi.edu/~claypool/mmsys-dataset/2011/stanford/)
– Key characteristics:
• rigid objects
• widely varying lighting conditions
• perspective distortion
• foreground and background clutter
• realistic ground-truth reference data
• query data collected from heterogeneous low and high-end
camera phones.
Chandrasekhar et al., ACM MMSys 2011
31. SMVS Data Set: categories and examples
• DVD covers
http://web.cs.wpi.edu/~claypool/mmsys-2011-dataset/stanford/mvs_images/dvd_covers.html
32. SMVS Data Set: categories and examples
• CD covers
http://web.cs.wpi.edu/~claypool/mmsys-2011-dataset/stanford/mvs_images/cd_covers.html
33. SMVS Data Set: categories and examples
• Museum paintings
http://web.cs.wpi.edu/~claypool/mmsys-2011-dataset/stanford/mvs_images/museum_paintings.html
34. Other MVS data sets
ISO/IEC JTC1/SC29/WG11/N12202, July 2011, Torino, IT
35. MPEG Compact Descriptors for Visual Search (CDVS)
• Objective
– Define a standard that enables efficient
implementation of visual search functionality on mobile
devices
• Scope
– bitstream of descriptors
– parts of the descriptor extraction process (e.g. key-point detection)
needed to ensure interoperability
– Additional info:
• https://mailhost.tnt.uni-hannover.de/mailman/listinfo/cdvs
• http://mpeg.chiariglione.org/meetings/geneva11-1/geneva_ahg.htm (Ad hoc groups)
Bober, Cordara, and Reznik (2010)
36. MPEG CDVS
• Summarized timeline
Table 1. Timeline for development of the MPEG standard for visual search:
– March 2011: Call for Proposals is published (registration deadline: 11 July 2011; proposals due: 21 November 2011)
– December 2011: Evaluation of proposals
– February 2012: 1st Working Draft (first specification and test software model that can be used for subsequent improvements)
– July 2012: Committee Draft (essentially complete and stabilized specification)
– January 2013: Draft International Standard (complete specification; only minor editorial changes are allowed after DIS)
– July 2013: Final Draft International Standard (finalized specification, submitted for approval and publication as International Standard)
Girod et al., IEEE Multimedia 2011
38. SnapTell
• One of the earliest (ca. 2008) MVS apps for iPhone
– Eventually acquired by Amazon (A9)
• Proprietary technique (“highly accurate and robust
algorithm for image matching: Accumulated Signed Gradient
(ASG)”).
http://www.snaptell.com/technology/index.htm
39. oMoby (and the IQ Engines API)
– iPhone app
http://omoby.com/pages/screenshots.php
40. oMoby (and the IQ Engines API)
• The IQ Engines API:
“vision as a service”
http://www.iqengines.com/applications.php
41. Moodstocks: overview
• Offline image recognition thanks to smart synchronization of image signatures
http://www.youtube.com/watch?v=tsxe23b12eU
43. MVS: technical challenges
• How to ensure low latency (and interactive
queries) under constraints such as:
– Network bandwidth
– Computational power
– Battery consumption
• How to achieve robust visual recognition in spite
of low-resolution cameras, varying lighting
conditions, etc.
• How to handle broad and narrow domains
44. Other technical challenges
• How to handle the (infamous) semantic gap
• Combination of text-based and visual queries
• Visualization of results
• Users' needs and intentions
45. The semantic gap
• The semantic gap is the lack of coincidence
between the information that one can extract
from the visual data and the interpretation that
the same data have for a user in a given situation.
• “The pivotal point in content-based retrieval is that the user
seeks semantic similarity, but the database can only provide
similarity by data processing. This is what we called the
semantic gap.” [Smeulders et al., 2000]
54. Challenge: users’ needs and intentions
• Users and developers have quite different views
• Cultural and contextual information should be
taken into account
• User intentions are hard to infer
– Privacy issues
– Users themselves don’t always know what they want
– Who misses the MS Office paper clip?
55. Concluding thoughts
(Mobile) visual search and retrieval is a fascinating
research field with many open challenges and
opportunities, which have the potential to impact
the way we organize, annotate, and retrieve visual
data (images and videos).