Part I – Concepts, challenges, and state of the art
Part II – Medical image retrieval
Part III – Mobile visual search
Part IV – Where is image search headed?
Visual Information Retrieval: Advances, Challenges and Opportunities – Oge Marques
Visual Information Retrieval: Advances, Challenges and Opportunities discusses advances and challenges in visual information retrieval. Key points include:
- Visual information retrieval aims to find relevant images/videos based on visual and text queries, addressing the "semantic gap" between low-level features and high-level meanings.
- Advances include improved text-based, content-based, and mixed search methods, as well as applications in medical image retrieval and mobile visual search.
- Ongoing challenges include capturing image similarity, addressing various representation gaps, understanding user intentions, and developing broad domain solutions.
This document discusses advances in image search and retrieval. It begins with an overview of visual information retrieval and its challenges, including the semantic gap between low-level visual features and high-level semantics. It then covers recent techniques like Google image search and similarity search. The document outlines core concepts like capturing similarity, large datasets, and user needs. It also revisits a 2000 paper on the challenges still facing the field, including the unsolved semantic gap and need for standardized evaluation benchmarks.
Image Processing and Computer Vision in iOS – Oge Marques
- Image processing and computer vision applications are becoming more common on mobile devices like the iPhone and iPad. There are many opportunities to build successful apps that can improve how users work with images and videos.
- The talk provided an overview of developing image and computer vision apps for iOS, including recommended tools like Core Image and OpenCV. It also offered advice on focusing an app idea on solving a specific problem and being aware of competition and market timing.
- Mobile image processing and computer vision have a promising future, and there is a need for good solutions to specific problems in this area that developers can work on building.
Oge Marques (FAU) – invited talk at WISMA 2010 (Barcelona, May 2010)
- Image search and retrieval remains a challenging problem with many open issues, even ten years after the field was declared to be past its early years.
- While progress has been made in areas like datasets, benchmarks and interfaces, core problems around similarity, semantics, and bridging the semantic gap between low-level visual features and high-level concepts remain largely unsolved.
- Narrowing domains and combining content-based techniques with metadata and user involvement through tagging and feedback may provide more successful solutions going forward.
Recent advances in visual information retrieval (Marques, KLU, June 2010) – Oge Marques
The document summarizes key points from a 2010 presentation on visual information retrieval (VIR). It revisits conclusions from a 2000 paper on challenges facing content-based image retrieval (CBIR). While some predictions were accurate, like increased data sizes and interaction options, others were not, like solving image understanding. Significant progress was made on benchmarks and datasets but less on similarity metrics. Medical image retrieval poses new challenges to understand but offers opportunities if VIR methods can adapt to new domains.
(1) Portable scanners, scanning pens, smartphones, and tablets allow students quick access to electronic documents and provide flexibility. They are useful for students with physical disabilities or learning disabilities.
(2) Files can be converted to text for use with Kurzweil 3000 software, which reads documents aloud. The KESI Virtual Printer or OCR software can convert scanned images or other files into text files compatible with Kurzweil 3000.
(3) These scanning and conversion tools can benefit students with physical, communication, or learning disabilities by providing accessible electronic texts and notes. They allow independence and flexibility, but schools must consider the costs of devices and software.
Multimedia Information Retrieval: What is it, and why isn't ... – webhostingguy
The document discusses opportunities and challenges in video search. It begins with an introduction to video search and outlines key market trends driving growth in online video. It then explores opportunities in leveraging metadata, community contributions, and large datasets. However, it also notes challenges including developing theoretical frameworks for video search and addressing the complexity of video content analysis.
This document provides an overview of the first class in an Information Architecture course. The class covers introductions and an overview of the course schedule and assignments. It also begins to define what information architecture is, noting there is no single agreed upon definition but providing some examples. The instructor introduces himself and his background, as well as the goals and philosophy of the course.
Volumetric medical images contain an enormous amount of visual information, which can discourage the exhaustive use of local descriptors for image analysis, comparison and retrieval. The distinctive features and patterns that need to be analyzed for finding diseases are most often local or regional, confined to very small parts of the image. Separating out the large amount of image data that contains little important information is therefore an important task, as it could reduce the current information overload of physicians and make clinical work more efficient. In this paper a novel method for detecting key regions is introduced as a way of extending the concept of keypoints often used in 2D image analysis. Computation is also reduced, as important visual features are only extracted from the detected key regions.
The region detection method is integrated into a platform-independent, web-based graphical interface for medical image visualization and retrieval in three dimensions. This web-based interface is easy to deploy on existing infrastructures in both small and large-scale clinical environments.
By including the region detection method in the interface, manual annotation is reduced and time is saved, making it possible to integrate the presented interface and methods into clinical routines and workflows, analyzing image data at a large scale.
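The core idea of the abstract (detect key regions first, then compute descriptors only inside them) can be sketched as follows; the blockwise-variance detector and the two-value descriptor are illustrative placeholders, not the paper's actual method:

```python
import numpy as np

def detect_key_regions(volume, threshold=0.8, box=8):
    """Toy key-region detector: return boxes around high-variance blocks.
    Stand-in for the paper's detector, purely illustrative."""
    regions = []
    z, y, x = volume.shape
    for i in range(0, z - box + 1, box):
        for j in range(0, y - box + 1, box):
            for k in range(0, x - box + 1, box):
                block = volume[i:i+box, j:j+box, k:k+box]
                if block.std() > threshold:
                    regions.append((i, j, k, box))
    return regions

def describe(block):
    """Toy local descriptor: mean and standard deviation of intensities."""
    return np.array([block.mean(), block.std()])

rng = np.random.default_rng(0)
vol = np.zeros((32, 32, 32))
vol[8:16, 8:16, 8:16] = rng.normal(0, 2, (8, 8, 8))  # one "interesting" region

# Descriptors are computed only inside detected regions,
# never over the full volume, which is where the savings come from.
feats = [describe(vol[i:i+s, j:j+s, k:k+s])
         for i, j, k, s in detect_key_regions(vol)]
```

With a whole-volume keypoint approach every block would be described; here only the single high-variance region yields a descriptor.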
- The document discusses radiology workflow and Picture Archiving and Communication System (PACS) storage levels.
- There are three main PACS storage levels: Short Term Storage (STS) for online access, Near Line Storage/Archive (NLS/A) for reasonable retrieval speeds, and Long Term Storage (LTS) which requires user intervention for offline access.
- The sizes of STS and NLS are calculated based on factors like the number of exams, growth rate, and need to cover exams for a certain period. LTS can be both online and offline using technologies like tape libraries and optical discs.
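The tier-sizing logic described above can be sketched as a back-of-envelope calculation. All figures below (exam volume, size per exam, growth rate, coverage periods) are illustrative assumptions, not values from the document:

```python
def storage_needed_tb(exams_per_year, gb_per_exam, years_covered, annual_growth):
    """Capacity for a storage tier that must hold `years_covered` years of exams,
    with exam volume growing each year."""
    total_gb = 0.0
    exams = exams_per_year
    for _ in range(years_covered):
        total_gb += exams * gb_per_exam
        exams *= 1 + annual_growth
    return total_gb / 1024  # TB

# Hypothetical hospital: 50,000 exams/year, 100 MB/exam, 10% annual growth.
# STS keeps ~1 year online; NLS must cover ~5 years.
sts_tb = storage_needed_tb(50_000, 0.1, 1, 0.10)
nls_tb = storage_needed_tb(50_000, 0.1, 5, 0.10)
```

Under these assumptions the STS needs roughly 5 TB and the NLS roughly 30 TB; anything older would spill to LTS on tape or optical media.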
Presentation by Prof. Dr. Henning Müller.
Overview:
- Medical image retrieval projects
- Image analysis and 3D texture modeling
- Data science evaluation infrastructures (ImageCLEF, VISCERAL, EaaS – Evaluation as a Service)
- What comes next?
This document summarizes key aspects of fossils and the geologic column according to a young-earth creationist perspective. It describes how fossils are formed, different types of fossils, issues with dating methods, and anomalies in the fossil record that are puzzling from an evolutionary viewpoint but align with a global flood model. The document casts doubt on assumptions of deep time and presents alternative explanations for the fossil record based on a literal interpretation of Genesis.
Presented by Adrien Depeursinge, PhD, at MICCAI 2015 Tutorial on Biomedical Texture Analysis (BTA), Munich, Oct 5 2015.
Texture-based imaging biomarkers complement focal, invasive, biopsy-based biomarkers by providing information on tissue structure over broad regions, non-invasively and repeatedly across multiple time points. Texture has been used to predict patient survival, tissue function, disease subtypes and genomics (imagenomics and radiogenomics). Nevertheless, several challenges remain, such as: the lack of an appropriate framework for multi-scale, multi-spectral analysis in 2D and 3D; localization uncertainty of texture operators; validation; and translation to routine clinical applications.
This document summarizes a presentation on the OpenNLP toolkit. OpenNLP is an open-source Java toolkit for natural language processing. It provides common NLP features like tokenization, sentence segmentation, part-of-speech tagging, and named entity extraction. The presentation discusses how these features work using pre-trained models for different languages. An example is also given showing how OpenNLP could be used to extract tags from a website and display them in a tag cloud. The presentation concludes by providing contact information for the presenter.
Latent semantic analysis (LSA) is a natural language processing technique for analyzing relationships between documents and terms by deriving a set of latent concepts that relate them. LSA assumes that words with similar meanings occur in similar texts; it builds a documents-terms matrix and applies singular value decomposition to discover hidden concepts and to represent words and documents as vectors in a semantic vector space. Apache OpenNLP is a machine learning toolkit that can be used for various natural language processing tasks like part-of-speech tagging and parsing, and LSA can be seen as part of natural language processing.
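The documents-terms matrix and SVD step described above can be sketched with a toy example (the vocabulary and counts are invented for illustration):

```python
import numpy as np

# Toy term-document count matrix: rows = terms, columns = documents.
# Terms: "image", "retrieval", "fossil", "rock"
A = np.array([
    [2, 1, 0],
    [1, 2, 0],
    [0, 0, 3],
    [0, 0, 2],
], dtype=float)

# Truncated SVD: A is approximated by U_k @ diag(s_k) @ Vt_k.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
term_vecs = U[:, :k] * s[:k]       # terms in the latent concept space
doc_vecs = Vt[:k, :].T * s[:k]     # documents in the latent concept space

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# "image" and "retrieval" co-occur, so their latent vectors are close;
# "image" and "fossil" never co-occur, so they are nearly orthogonal.
print(cosine(term_vecs[0], term_vecs[1]))  # high
print(cosine(term_vecs[0], term_vecs[2]))  # near zero
```

This is the whole mechanism in miniature: co-occurrence structure in the count matrix becomes proximity in the low-rank concept space.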
There are three main types of fossils: preserved organisms, mineral replacements, and impression fossils. Preserved organism fossils occur when the soft body parts of an animal are frozen in time with minimal decay. Mineral replacement fossils form when the hard parts of an animal decay and are replaced by minerals over time, eventually becoming stone. Impression fossils show detailed outlines or carbon deposits left behind when thin plants or small animals die and decay in sediment.
The document describes October monthly specials with reduced pricing on various products like market bags, totes, coolers, and more made from materials like recycled cotton, organic cotton, and polyester. Key details include product colors, sizes, original prices, and new sale prices valid through October 31, 2011 while supplies last. Additional fees may apply.
The European Union has proposed a new package of sanctions against Russia that includes an embargo on Russian oil. The embargo would be phased in over six months for crude oil and eight months for refined products. This sanctions package requires the unanimous approval of all 27 EU member states.
Murals are wall paintings created by artist Tanja van Achterberg. She specializes in painting murals on walls in homes and businesses. Her murals bring color and visual interest to bare walls through artistic depictions of nature, abstract designs, and other themes.
The document discusses the results of a study on the effects of exercise on memory and thinking abilities in older adults. The study found that regular exercise led to improvements in memory performance and helped reduce declines in thinking abilities that often occur with age. Exercising for just 30 minutes three times a week was enough to produce these cognitive benefits in adults aged 60-79.
The document discusses the results of a study on the impact of COVID-19 lockdowns on air pollution. The study found that lockdowns led to short-term reductions in nitrogen dioxide and fine particulate matter concentrations globally. However, the decreases in air pollution were temporary and not sufficient to significantly improve air quality or public health in the long run without systemic changes to reduce fossil fuel use and other polluting activities.
This document summarizes and evaluates various news filtering applications on the web. It provides a table listing common news filtering applications and their URLs. It then analyzes the features and drawbacks of each application, noting that while many rely on community recommendations to surface relevant stories, the systems themselves are limited in their ability to do so implicitly without direct user feedback. The document concludes with suggestions for possible enhancements, such as free previews, tutorials, related content feeds, statistics tracking, and improved personalized recommendation capabilities.
Email and Social Media Marketing Synergies – Responsys Leadership Forum – Edmund Wong
This is a presentation I gave at Responsys Leadership Forum on May 14, 2009 in SF. It is a brief discussion of ways to integrate email marketing and social media. Developed in conjunction with Alisa Hansen and Mark Beekman.
The document summarizes renewable energy developments in India. It states that the central government has asked all states to develop solar energy policies to help meet national targets. It also mentions that India's total renewable grid capacity has increased over three times between April 2014 and February 2015. Key targets mentioned include installing 100 GW of solar and 60 GW of wind power by 2022.
This document is an email from Dan Reifsnyder at the State Department to Phil Cooney at the White House Council on Environmental Quality. It forwards a link to an article from the Competitive Enterprise Institute announcing that the institute has filed a petition to prevent distribution of a flawed White House climate report. The full text of the forwarded article is included.
The document discusses starting a shift change at a facility. Staff are instructed to begin transitioning duties to the next shift team and exchanging important updates. Standard safety protocols should be followed during the handoff process to ensure smooth transfer of responsibilities between shifts.
R&B artists such as Rihanna, Amy Winehouse, Beyoncé, Norah Jones, Lauryn Hill, and Alicia Keys are listed along with some of their popular albums from the 2000s and early 2010s. The document encourages the reader to listen to music from these artists and keep an open mind to different genres, stating that while your eyes can be closed, your ears cannot. It suggests being open to music as a way to find friendship.
This document discusses several issues related to the Intergovernmental Panel on Climate Change (IPCC) negotiations on the Third Assessment Report:
1. It outlines recommendations for restructuring the US attendance at upcoming IPCC meetings to replace representatives from the Clinton/Gore administration with scientists skeptical of climate change risks.
2. It raises issues with deferring portions of the Third Assessment Report to allow more input from the new Bush Administration on the reports and conclusions.
3. It discusses disagreements between climate model projections of increased tropospheric warming and satellite temperature data showing less warming than surface temperatures, calling the ability to correctly model tropospheric temperature changes critically important.
This document discusses research at the intersection of human and computer vision, with a focus on objects in context. It provides background on visual perception and challenges in object and scene recognition. Context is important for human vision but difficult for computers. Representative work by Renninger and Malik shows that early scene identification can be explained by a simple texture model, demonstrating the value of interdisciplinary research between human and computer vision. The document concludes by discussing the author's experiences with interdisciplinary collaboration between psychology and computer science.
Mapping the use of digital sources amongst Humanities scholars in the Netherl... – MaxKemman
1) The document reports on a survey of 294 Dutch and Belgian academics regarding their use of digital sources and databases.
2) It finds that text is the most commonly used digital medium, and Google is the dominant search tool and platform. Younger academics are more confident in using audiovisual search tools.
3) Disciplines like history and literature most commonly use images and digitized objects, while fields like social studies and linguistics make more use of video, audio, and statistical data.
4) The study has implications for how to increase awareness, appeal and adoption of digital humanities approaches through user-focused design and inclusion in education.
Diagramming, Figures, and Imagery (2D): Think Visual in Online LearningShalin Hai-Jew
Learners will…
define “visual thinking” and “visual cognition”
describe some dimensions of visuals in online learning
describe some ways to create visuals in online learning
consider some uses of visuals in online learning
explore legal considerations related to online learning visuals
consider going open-source for visuals
think about signatures and styles in terms of online visuals (and sharing broadly)
contemplate common errors in visualizations for online learning
review ways to think visually
The document discusses strategies for social enterprises to tackle challenges with content, engagement, and scale on social media. It focuses on leveraging curation to create scalable content by organizing information from various sources and adding context. Various curation tools and techniques are examined, from basic options like email alerts, hashtags and lists, to more advanced visual tools like Scoop.it and Flipboard. Guidelines for effective curation are provided. The document also explores strategies for creating compelling original content, such as using animations, unique presentations, free images and templates, and repurposing content across multiple channels and touchpoints.
Invited Talk OAGM Workshop Salzburg, May 2015dermotte
There is a gap between a user's information need and the queries they submit, known as the "intention gap". Bridging this gap is challenging due to the difficulty of translating intentions into search queries. Researchers have studied user intentions in various contexts like search, media production and sharing. However, fully understanding intentions is difficult as people have trouble expressing their own intentions and judging those of others. Future work should develop new techniques to relate content-based image retrieval to user intentions and take an interdisciplinary approach to better model intentions across domains.
The document outlines Gráinne Conole's presentation on design thinking, learning design, and creativity. It discusses technological trends in learning like mobile learning, games-based learning, and the Internet of things. It then covers learning design frameworks like the 7Cs model and socio-cultural perspectives on design. Finally, it discusses approaches like design-based research and e-pedagogies that integrate technology and pedagogy for learning.
Avatars' in teaching the early experiences of [autosaved]newMartin Rieser
This document discusses the author's experiences using avatars in teaching as a non-technologist. It describes some of the challenges faced, including computer hardware issues and learning new software. The author explains using iClone software to create interactive avatar-based simulations. Potential benefits of avatars discussed are their ability to hold student interest, run different simulations, and be familiar technology to students. Examples are provided of using avatars to present concepts in unusual ways, for role-playing scenarios, and blending with film clips.
1) The document discusses the use of scientific imagery in higher education. Visuals can engage people and aid memory and recall compared to text alone.
2) Different types of images are classified, including static images, illustrations, photographs, animations, videos and more. Images serve instructional functions like informing, engaging, and bridging print and digital media.
3) The university's image library provides images for educational use, selecting from Creative Commons, free royalty, and rights-managed sources. Examples show how 3D imagery, graphs, and photographs can be used for learning.
This document provides an overview of a course on computer vision called CSCI 455: Intro to Computer Vision. It acknowledges that many of the course slides were modified from other similar computer vision courses. The course will cover topics like image filtering, projective geometry, stereo vision, structure from motion, face detection, object recognition, and convolutional neural networks. It highlights current applications of computer vision like biometrics, mobile apps, self-driving cars, medical imaging, and more. The document discusses challenges in computer vision like viewpoint and illumination variations, occlusion, and local ambiguity. It emphasizes that perception is an inherently ambiguous problem that requires using prior knowledge about the world.
This document discusses cognitive learning theory and several theorists who contributed to its development. It explains that cognitive learning results from listening, watching, or touching, and involves learning through experiences. Theorists discussed include Allan Paivio, who discovered dual coding theory explaining how people process information through images and language. Robert Gagne identified nine events of instruction and five areas of learning outcomes. Charles Reigeluth created the Elaboration Theory, which proposes starting with basic information and adding more details later. The document also provides examples of how teachers and students can apply cognitive learning theory.
This document outlines a group project on smartphones. It discusses the project workflow including planning, data collection from libraries and surveys, designing web pages, and distributing tasks among group members. The group's objective was to investigate how people use smartphones and educational apps. Their website presented survey results and analysis on the usefulness and variety of educational apps. The summary concludes that the website was informative but could be improved with more entertainment and interactivity, such as games.
This document discusses images in learning and provides information on finding, editing, and creating images. It defines an image as a two-dimensional representation and notes images can be visual or mental. The document recommends royalty-free sites for finding images and lists tools like Pixlr, Paint.net, GIMP and Photoshop for editing and creating images. Copyright considerations are also mentioned when
This presentation discusses Information Architecture (IA), how to achieve it on your team, and how to apply IA concepts to your technical documentation.
e-Learning for Radiation Oncology: What, Why & How?adrianaberlanga
The document discusses e-learning and its potential applications for radiation oncology education. It begins by defining e-learning and explaining its benefits, such as increased access to learning materials, lower costs, and flexibility. Examples are provided of how e-learning has been used for medical education through interactive simulations, quick updates of materials, and remote guidance. The document then outlines various e-learning tools and resources like videos, virtual patients, webinars, online repositories, and e-activities. It also describes some e-courses and e-master programs that have been developed. In the future, the document suggests e-learning could increasingly integrate social learning and connect learners to ideas, interests, and each other through technologies like augmented
CBMI 2013 Presentation: User Intentions in Multimediadermotte
This document discusses user intentions in visual information retrieval and multimedia information systems. It begins by introducing query by example search and different low-level visual features that work better for some domains than others. It then discusses how determining the right features and defining visual similarity is challenging. The document defines context and intention, and discusses how a user's intention relates to their information need. It reviews taxonomies of user intentions in web search and proposes intentions in multimedia may include search, production, sharing, archiving. The document proposes several open PhD theses around developing a general model of user intentions in multimedia, using games and human computation to infer intentions, bringing context to queries, and creating adaptable applications based on user intentions.
The document discusses infographics and provides guidance on creating them. It defines infographics as visual representations of data that allow information to be seen rather than read. The document outlines best practices for infographic design, such as prioritizing data and showing rather than telling. It also discusses properly attributing images using Creative Commons licenses. Finally, it introduces four free infographic tools - Wordle, Infogr.am, Piktochart, and Easel.ly - and suggests when each might be useful.
The document summarizes a student project on smartphones and educational apps. It outlines the project workflow including planning, data collection from libraries and online surveys, designing web pages using software like Flash and Dreamweaver, and distributing tasks among the student team. It also presents the project contents like an introduction, mind map, survey results, slideshow, analysis, and references. The conclusion reflects on strengths and weaknesses, difficulties, and suggestions that educational apps are useful but lack promotion and variety.
Teaching Visual Literacy Skills in a One-Shot Sessionmollyjschoen
Just as one-shot information literacy sessions can be implemented in college classes to improve students’ research capabilities, similarly-styled sessions on image research can increase their visual literacy skills. While most students interact with images daily, capturing photos on their mobile devices, reading picture-heavy articles on websites, and reposting images from social media pages, such activities do not transform them into critical viewers and users of visual media. To be considered visually literate, as defined by the Visual Literacy Competency Standards for Higher Education by the Association of College and Research Libraries, an individual must “effectively find, interpret, evaluate, use, and create images and visual media.”
A wide range of research and critical thinking strategies may be introduced through these instructional sessions. Locating trustworthy sources online, evaluating the content and quality of images, scrutinizing manipulated images, understanding the implications of copyright, and creating an effective system to store digital files and manage citations are among the recommended topics for presentation. Teaching strategies for image research sessions include using live web searches in both scholarly and open access resources to highlight their relative strengths and weaknesses, using real life examples of image use scenarios to provide context, and structuring presentations based around the specific class in which it will be taught. The desired outcome of teaching an instructional session is to provide students with the tools and confidence they need to effectively use high-quality visual materials in their undergraduate years and beyond.
Similar to Advances and Challenges in Visual Information Search and Retrieval (WVC 2012 - Goiania-GO, Brazil) (20)
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...Fwdays
Direct losses from downtime in 1 minute = $5-$10 thousand dollars. Reputation is priceless.
As part of the talk, we will consider the architectural strategies necessary for the development of highly loaded fintech solutions. We will focus on using queues and streaming to efficiently work and manage large amounts of data in real-time and to minimize latency.
We will focus special attention on the architectural patterns used in the design of the fintech system, microservices and event-driven architecture, which ensure scalability, fault tolerance, and consistency of the entire system.
From Natural Language to Structured Solr Queries using LLMsSease
This talk draws on experimentation to enable AI applications with Solr. One important use case is to use AI for better accessibility and discoverability of the data: while User eXperience techniques, lexical search improvements, and data harmonization can take organizations to a good level of accessibility, a structural (or “cognitive” gap) remains between the data user needs and the data producer constraints.
That is where AI – and most importantly, Natural Language Processing and Large Language Model techniques – could make a difference. This natural language, conversational engine could facilitate access and usage of the data leveraging the semantics of any data source.
The objective of the presentation is to propose a technical approach and a way forward to achieve this goal.
The key concept is to enable users to express their search queries in natural language, which the LLM then enriches, interprets, and translates into structured queries based on the Solr index’s metadata.
This approach leverages the LLM’s ability to understand the nuances of natural language and the structure of documents within Apache Solr.
The LLM acts as an intermediary agent, offering a transparent experience to users automatically and potentially uncovering relevant documents that conventional search methods might overlook. The presentation will include the results of this experimental work, lessons learned, best practices, and the scope of future work that should improve the approach and make it production-ready.
Essentials of Automations: Exploring Attributes & Automation ParametersSafe Software
Building automations in FME Flow can save time, money, and help businesses scale by eliminating data silos and providing data to stakeholders in real-time. One essential component to orchestrating complex automations is the use of attributes & automation parameters (both formerly known as “keys”). In fact, it’s unlikely you’ll ever build an Automation without using these components, but what exactly are they?
Attributes & automation parameters enable the automation author to pass data values from one automation component to the next. During this webinar, our FME Flow Specialists will cover leveraging the three types of these output attributes & parameters in FME Flow: Event, Custom, and Automation. As a bonus, they’ll also be making use of the Split-Merge Block functionality.
You’ll leave this webinar with a better understanding of how to maximize the potential of automations by making use of attributes & automation parameters, with the ultimate goal of setting your enterprise integration workflows up on autopilot.
Dandelion Hashtable: beyond billion requests per second on a commodity serverAntonios Katsarakis
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables, that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state-of-the-art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open-addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line-chaining. This design offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. In a commodity server and a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
"What does it really mean for your system to be available, or how to define w...Fwdays
We will talk about system monitoring from a few different angles. We will start by covering the basics, then discuss SLOs, how to define them, and why understanding the business well is crucial for success in this exercise.
Conversational agents, or chatbots, are increasingly used to access all sorts of services using natural language. While open-domain chatbots - like ChatGPT - can converse on any topic, task-oriented chatbots - the focus of this paper - are designed for specific tasks, like booking a flight, obtaining customer support, or setting an appointment. Like any other software, task-oriented chatbots need to be properly tested, usually by defining and executing test scenarios (i.e., sequences of user-chatbot interactions). However, there is currently a lack of methods to quantify the completeness and strength of such test scenarios, which can lead to low-quality tests, and hence to buggy chatbots.
To fill this gap, we propose adapting mutation testing (MuT) for task-oriented chatbots. To this end, we introduce a set of mutation operators that emulate faults in chatbot designs, an architecture that enables MuT on chatbots built using heterogeneous technologies, and a practical realisation as an Eclipse plugin. Moreover, we evaluate the applicability, effectiveness and efficiency of our approach on open-source chatbots, with promising results.
How information systems are built or acquired puts information, which is what they should be about, in a secondary place. Our language adapted accordingly, and we no longer talk about information systems but applications. Applications evolved in a way to break data into diverse fragments, tightly coupled with applications and expensive to integrate. The result is technical debt, which is re-paid by taking even bigger "loans", resulting in an ever-increasing technical debt. Software engineering and procurement practices work in sync with market forces to maintain this trend. This talk demonstrates how natural this situation is. The question is: can something be done to reverse the trend?
Introduction of Cybersecurity with OSS at Code Europe 2024Hiroshi SHIBATA
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
This talk will cover ScyllaDB Architecture from the cluster-level view and zoom in on data distribution and internal node architecture. In the process, we will learn the secret sauce used to get ScyllaDB's high availability and superior performance. We will also touch on the upcoming changes to ScyllaDB architecture, moving to strongly consistent metadata and tablets.
Must Know Postgres Extension for DBA and Developer during MigrationMydbops
Mydbops Opensource Database Meetup 16
Topic: Must-Know PostgreSQL Extensions for Developers and DBAs During Migration
Speaker: Deepak Mahto, Founder of DataCloudGaze Consulting
Date & Time: 8th June | 10 AM - 1 PM IST
Venue: Bangalore International Centre, Bangalore
Abstract: Discover how PostgreSQL extensions can be your secret weapon! This talk explores how key extensions enhance database capabilities and streamline the migration process for users moving from other relational databases like Oracle.
Key Takeaways:
* Learn about crucial extensions like oracle_fdw, pgtt, and pg_audit that ease migration complexities.
* Gain valuable strategies for implementing these extensions in PostgreSQL to achieve license freedom.
* Discover how these key extensions can empower both developers and DBAs during the migration process.
* Don't miss this chance to gain practical knowledge from an industry expert and stay updated on the latest open-source database trends.
Mydbops Managed Services specializes in taking the pain out of database management while optimizing performance. Since 2015, we have been providing top-notch support and assistance for the top three open-source databases: MySQL, MongoDB, and PostgreSQL.
Our team offers a wide range of services, including assistance, support, consulting, 24/7 operations, and expertise in all relevant technologies. We help organizations improve their database's performance, scalability, efficiency, and availability.
Contact us: info@mydbops.com
Visit: https://www.mydbops.com/
Follow us on LinkedIn: https://in.linkedin.com/company/mydbops
For more details and updates, please follow up the below links.
Meetup Page : https://www.meetup.com/mydbops-databa...
Twitter: https://twitter.com/mydbopsofficial
Blogs: https://www.mydbops.com/blog/
Facebook(Meta): https://www.facebook.com/mydbops/
"Choosing proper type of scaling", Olena SyrotaFwdays
Imagine an IoT processing system that is already quite mature and production-ready and for which client coverage is growing and scaling and performance aspects are life and death questions. The system has Redis, MongoDB, and stream processing based on ksqldb. In this talk, firstly, we will analyze scaling approaches and then select the proper ones for our system.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/temporal-event-neural-networks-a-more-efficient-alternative-to-the-transformer-a-presentation-from-brainchip/
Chris Jones, Director of Product Management at BrainChip , presents the “Temporal Event Neural Networks: A More Efficient Alternative to the Transformer” tutorial at the May 2024 Embedded Vision Summit.
The expansion of AI services necessitates enhanced computational capabilities on edge devices. Temporal Event Neural Networks (TENNs), developed by BrainChip, represent a novel and highly efficient state-space network. TENNs demonstrate exceptional proficiency in handling multi-dimensional streaming data, facilitating advancements in object detection, action recognition, speech enhancement and language model/sequence generation. Through the utilization of polynomial-based continuous convolutions, TENNs streamline models, expedite training processes and significantly diminish memory requirements, achieving notable reductions of up to 50x in parameters and 5,000x in energy consumption compared to prevailing methodologies like transformers.
Integration with BrainChip’s Akida neuromorphic hardware IP further enhances TENNs’ capabilities, enabling the realization of highly capable, portable and passively cooled edge devices. This presentation delves into the technical innovations underlying TENNs, presents real-world benchmarks, and elucidates how this cutting-edge approach is positioned to revolutionize edge AI across diverse applications.
In the realm of cybersecurity, offensive security practices act as a critical shield. By simulating real-world attacks in a controlled environment, these techniques expose vulnerabilities before malicious actors can exploit them. This proactive approach allows manufacturers to identify and fix weaknesses, significantly enhancing system security.
This presentation delves into the development of a system designed to mimic Galileo's Open Service signal using software-defined radio (SDR) technology. We'll begin with a foundational overview of both Global Navigation Satellite Systems (GNSS) and the intricacies of digital signal processing.
The presentation culminates in a live demonstration. We'll showcase the manipulation of Galileo's Open Service pilot signal, simulating an attack on various software and hardware systems. This practical demonstration serves to highlight the potential consequences of unaddressed vulnerabilities, emphasizing the importance of offensive security practices in safeguarding critical infrastructure.
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor IvaniukFwdays
At this talk we will discuss DDoS protection tools and best practices, discuss network architectures and what AWS has to offer. Also, we will look into one of the largest DDoS attacks on Ukrainian infrastructure that happened in February 2022. We'll see, what techniques helped to keep the web resources available for Ukrainians and how AWS improved DDoS protection for all customers based on Ukraine experience
Main news related to the CCS TSI 2023 (2023/1695)Jakub Marek
An English 🇬🇧 translation of a presentation to the speech I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on Communications and signalling systems on Railways, which was held in Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). Attended by around 500 participants and 200 on-line followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
Astute Business Solutions | Oracle Cloud Partner |
Advances and Challenges in Visual Information Search and Retrieval (WVC 2012 - Goiania-GO, Brazil)
1. Advances and Challenges in Visual Information Search and Retrieval
Oge Marques
Florida Atlantic University
Boca Raton, FL - USA
VIII Workshop de Visão Computacional (WVC) 2012
May 27–30, 2012, Goiania, GO - Brazil
2. Take-home message
Visual Information Retrieval (VIR) is a fascinating research field with many open challenges and opportunities which have the potential to impact the way we organize, annotate, and retrieve visual data (images and videos).
3. Disclaimer #1
• Visual Information Retrieval (VIR) is a highly interdisciplinary field, but …
[Diagram: Visual Information Retrieval at the intersection of Image and Video Processing, (Multimedia) Database Systems, Information Retrieval, Machine Learning, Computer Vision, Data Mining, Human Visual Perception, and visual data modeling and representation]
4. Disclaimer #2
• There are many things that I believe…
• … but cannot prove
5. Background and Motivation
“What is it that we’re trying to do, and why is it so difficult?”
– Taking pictures and storing, sharing, and publishing them has never been so easy and inexpensive.
– If only we could say the same about finding the images we want and retrieving them…
6. Background and Motivation
The “big mismatch”:
– Taking, storing, publishing, and sharing pictures: easy and cheap
– Organizing, annotating, finding, and retrieving pictures: expensive and difficult
7. Background and Motivation
• Q: What do you do when you need to find an image (on the Web)?
• A1: Google (image search), of course!
8. Background and Motivation
Google image search results for “sydney opera house”
Source: Google Image Search (http://images.google.com/)
9. Background and Motivation
Google image search results for “opera”
Source: Google Image Search (http://images.google.com/)
10. Background and Motivation
• Q: What do you do when you need to find an image (on the Web)?
• A2: Other (so-called specialized) image search engines:
• http://images.search.yahoo.com/
• http://pictures.ask.com
• http://www.bing.com/images
14. Background and Motivation
• Q: What do you do when you need to find an image (on the Web)?
• A3: Search directly on large photo repositories:
– Flickr
– Webshots
– Shutterstock
19. Background and Motivation
• Back to our original (two-part) question:
– What is it that we’re trying to do?
– We're trying to create automated solutions to the problem of finding and retrieving visual information, from (large, unstructured) repositories, in a way that satisfies search criteria specified by users, relying (primarily) on the visual contents of the media.
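The definition above can be made concrete with a minimal query-by-example sketch (a toy illustration of my own, not from the talk; it assumes NumPy and uses a global color histogram as a stand-in for whatever visual signature a real system would extract):

```python
import numpy as np

def color_histogram(image, bins=8):
    """Global color histogram: a simple low-level visual signature.
    `image` is an (H, W, 3) uint8 array."""
    hist, _ = np.histogramdd(
        image.reshape(-1, 3), bins=(bins,) * 3, range=((0, 256),) * 3)
    hist = hist.ravel()
    return hist / hist.sum()  # normalize so images of any size compare

def retrieve(query, repository, top_k=3):
    """Rank repository images by histogram distance to the query."""
    q = color_histogram(query)
    dists = [np.linalg.norm(q - color_histogram(img)) for img in repository]
    return np.argsort(dists)[:top_k]  # indices of the best matches

# Toy repository of random "images"
rng = np.random.default_rng(0)
repo = [rng.integers(0, 256, (32, 32, 3), dtype=np.uint8) for _ in range(10)]
ranking = retrieve(repo[4], repo)
print(ranking[0])  # the query image is its own best match: 4
```

In a real system the linear scan would be replaced by an index and the histogram by a much richer descriptor; the shape of the loop — extract a signature, rank by distance — is the part that carries over.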
20. Background and Motivation
• Why is it so difficult?
• There are many challenges, among them:
– The elusive notion of similarity
– The semantic gap
– Large datasets and broad domains
– Combination of visual and textual information
– The users (and how to make them happy)
21. Outline
• Part I – Concepts, challenges, and state of the art
• Part II – Medical image retrieval
• Part III – Mobile visual search
• Part IV – Where is image search headed?
23. The elusive notion of similarity
• Are these two images similar?
Source: Eidenberger, H., Introduction: Visual Information Retrieval, “Habilitation thesis”, Vienna University of Technology, 2004. Available at http://www.ims.tuwien.ac.at/~hme/papers/habil-full.pdf
24. The elusive notion of similarity
• Are these two images similar?
Source: Eidenberger, H., Introduction: Visual Information Retrieval, “Habilitation thesis”, Vienna University of Technology, 2004. Available at http://www.ims.tuwien.ac.at/~hme/papers/habil-full.pdf
25. The elusive notion of similarity
• Is the second or the third image more similar to the first?
Source: Eidenberger, H., Introduction: Visual Information Retrieval, “Habilitation thesis”, Vienna University of Technology, 2004. Available at http://www.ims.tuwien.ac.at/~hme/papers/habil-full.pdf
26. The elusive notion of similarity
• Which image fits better to the first two: the third or the fourth?
Source: Eidenberger, H., Introduction: Visual Information Retrieval, “Habilitation thesis”, Vienna University of Technology, 2004. Available at http://www.ims.tuwien.ac.at/~hme/papers/habil-full.pdf
27. The semantic gap
• The semantic gap is the lack of coincidence between the information that one can extract from the visual data and the interpretation that the same data have for a user in a given situation.
• “The pivotal point in content-based retrieval is that the user seeks semantic similarity, but the database can only provide similarity by data processing. This is what we called the semantic gap.” [Smeulders et al., 2000]
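A toy demonstration of the point (my own sketch, not from the talk; assumes NumPy): shuffling the pixels of an image destroys everything a user would call its meaning, yet leaves a global color histogram — a typical "similarity by data processing" — exactly unchanged:

```python
import numpy as np

def color_histogram(image, bins=8):
    """Global color histogram over an (H, W, 3) uint8 image."""
    hist, _ = np.histogramdd(
        image.reshape(-1, 3), bins=(bins,) * 3, range=((0, 256),) * 3)
    return hist.ravel() / (image.shape[0] * image.shape[1])

rng = np.random.default_rng(42)
scene_a = rng.integers(0, 256, (64, 64, 3), dtype=np.uint8)

# Shuffle the pixels: same color content, a completely different "scene".
pixels = scene_a.reshape(-1, 3).copy()
rng.shuffle(pixels, axis=0)
scene_b = pixels.reshape(64, 64, 3)

# By data processing the two are identical; semantically they need not be.
d = np.linalg.norm(color_histogram(scene_a) - color_histogram(scene_b))
print(d)  # 0.0
```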
36. How I see it…
• The semantic gap problem has not been solved (and maybe never will be…)
• What are the alternatives?
– Treat visual similarity and semantic relatedness differently
• Examples: Alipr, Google (or Bing) similarity search, etc.
– Improve both (text-based and visual) search methods independently
– Combine visual and textual information in a meaningful way
– Engage the user
• Collaborative filtering, crowdsourcing, games.
37. • But, wait… There are other gaps!
– Just when you thought the semantic gap was your only problem…
Source: [Deserno, Antani, and Long, 2009]
38. Large datasets and broad domains
• Large datasets bring additional challenges in all aspects of the system:
– Storage requirements: images, metadata, and “visual signatures”
– Computational cost of indexing, searching, retrieving, and displaying images
– Network and latency issues
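To make the storage point concrete (back-of-the-envelope figures of my own, not from the talk): even compact visual signatures add up to terabytes at web scale, before the images and metadata themselves are counted:

```python
# Hypothetical numbers for illustration: 512-dimensional float32
# signatures for one billion images.
n_images = 1_000_000_000
dims = 512
bytes_per_float = 4

total_bytes = n_images * dims * bytes_per_float
print(total_bytes / 1e12)  # ~2 TB for the signatures alone
```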
39. Large datasets and broad domains
Source: Smeulders et al., “Content-based image retrieval at the end of the early years”, IEEE Transactions on PAMI, Vol. 22, Issue 12, Dec 2000
40. Challenge: users’ needs and intentions
• Users and developers have quite different views
• Cultural and contextual information should be
taken into account
• User intentions are hard to infer
– Privacy issues
– Users themselves don’t always know what they want
– Who misses the MS Office paper clip?
41. Challenge: users’ needs and intentions
• The user’s perspective
– What do they want?
– Where do they want to search?
– In what form do they express their query?
Source: R. Datta, D. Joshi, J. Li, and J. Z. Wang, “Image Retrieval: Ideas, Influences, and Trends of the New Age”, ACM Computing Surveys, April 2008.
42. Challenge: users’ needs and intentions
• The image retrieval system should be mindful of:
– How users wish the results to be presented
– Where users desire to search
– The nature of user input/interaction
Source: R. Datta, D. Joshi, J. Li, and J. Z. Wang, “Image Retrieval: Ideas, Influences, and Trends of the New Age”, ACM Computing Surveys, April 2008.
43. Challenge: users’ needs and intentions
• Each application has different users (with different intent, needs, background, cultural bias, etc.) and different visual assets.
44. Challenge: growing up (as a field)
• It’s been 10 years since the “end of the early years”
– Are the challenges from 2000 still relevant?
– Are the directions and guidelines from 2000 still
appropriate?
– Have we grown up (at all)?
– Let’s revisit the ‘Concluding Remarks’ from that paper…
45. Revisiting [Smeulders et al. 2000]
What they said:
• Driving forces – “[…] content-based image retrieval (CBIR) will continue to grow in every direction: new audiences, new purposes, new styles of use, new modes of interaction, larger data sets, and new methods to solve the problems.”
How I see it:
• Yes, we have seen many new audiences, new purposes, new styles of use, and new modes of interaction emerge.
• Each of these usually requires new methods to solve the problems that they bring.
• However, not too many researchers see them as a driving force (as they should).
46. Revisiting [Smeulders et al. 2000]
What they said:
• Heritage of computer vision – “An important obstacle to overcome […] is to realize that image retrieval does not entail solving the general image understanding problem.”
How I see it:
• I’m afraid I have bad news…
– Computer vision hasn’t made so much progress during the past 10 years.
– Some classical problems (including image understanding) remain unresolved.
– Similarly, CBIR from a pure computer vision perspective didn’t work too well either.
47. Revisiting [Smeulders et al. 2000]
What they said:
• Influence on computer vision – “[…] CBIR offers a different look at traditional computer vision problems: large data sets, no reliance on strong segmentation, and revitalized interest in color image processing and invariance.”
How I see it:
• The adoption of large data sets became standard practice in computer vision.
• No reliance on strong segmentation (still unresolved) led to new areas of research, e.g., automatic ROI extraction and RBIR.
• Color image processing and color descriptors became incredibly popular, useful, and (to some degree) effective.
• Invariance is still a huge problem – but it’s cheaper than ever to have multiple views.
48. Revisiting [Smeulders et al. 2000]
What they said:
• Similarity and learning
– “We make a pledge for the importance of human-based similarity rather than general similarity. Also, the connection between image semantics, image data, and query context will have to be made clearer in the future.”
– “[…] in order to bring semantics to the user, learning is inevitable.”
How I see it:
• The authors were pointing in the right direction (human in the loop, role of context, benefits from learning, …)
• However:
– Similarity is a tough problem to crack and model. Even our understanding of how humans judge image similarity is very limited.
– Machine learning is almost inevitable… but sometimes it can be abused.
49. Revisiting [Smeulders et al. 2000]
What they said:
• Interaction – better visualization options, more control to the user, ability to provide feedback […]
How I see it:
• Significant progress on visualization interfaces and devices.
• Relevance feedback: still a very tricky tradeoff (effort vs. perceived benefit), but more popular than ever (rating, thumbs up/down, etc.)
50. Revisiting [Smeulders et al. 2000]
What they said:
• Need for databases – “The connection between CBIR and database research is likely to increase in the future. […] problems like the definition of suitable query languages, efficient search in high-dimensional feature space, search in the presence of changing similarity measures are largely unsolved […]”
How I see it:
• Very little progress.
– Image search and retrieval has benefited much more from document information retrieval than from database research.
51. Revisiting [Smeulders et al. 2000]
What they said:
• The problem of evaluation
– CBIR could use a reference standard against which new algorithms could be evaluated (similar to TREC in the field of text retrieval).
– “A comprehensive and publicly available collection of images, sorted by class and retrieval purposes, together with a protocol to standardize experimental practices, will be instrumental in the next phase of CBIR.”
How I see it:
• Significant progress on benchmarks, standardized datasets, etc.:
– ImageCLEF
– Pascal VOC Challenge
– MSRA dataset
– SIMPLIcity dataset
– UCID dataset and ground truth (GT)
– Accio / SIVAL dataset and GT
– Caltech 101, Caltech 256
– LabelMe
52. Revisiting [Smeulders et al. 2000]
What they said:
• Semantic gap and other sources – “A critical point in the advancement of CBIR is the semantic gap, where the meaning of an image is rarely self-evident. […] One way to resolve the semantic gap comes from sources outside the image by integrating other sources of information about the image in the query.”
How I see it:
• The semantic gap problem has not been solved (and maybe never will be…)
• But the idea about using other sources was right on the spot!
– Geographical context
– Social networks
– Tags
54. Medical image retrieval
• Challenges
– We’re entering a new country…
• How much can we bring?
• Do we speak the language?
• Do we know their culture?
• Do they understand us and where we come from?
• Opportunities
– They use images (extensively)
– They have expert knowledge
– Domains are narrow (almost by definition)
– Fewer clients, but potentially more $$
55. Medical image retrieval
• Selected challenges:
– Different terminology
– Standards
– Modality dependencies
• Other challenges:
– Equipment dependencies
– Privacy issues
– Proprietary data
56. Different terminology
• Be prepared for:
– New acronyms
• CBMIR (Content-Based Medical Image Retrieval)
• PACS (Picture Archiving and Communication System)
• DICOM (Digital Imaging and COmmunication in Medicine)
• Hospital Information Systems (HIS)
• Radiological Information Systems (RIS)
– New phrases
• Imaging informatics
– Lots of technical medical terms
57. Standards
• DICOM (http://medical.nema.org/)
– Global IT standard, created in 1993, used in virtually all
hospitals worldwide.
– Designed to ensure the interoperability of different
systems and manage related workflow.
– Will be required by all EHR systems that include imaging
information as an integral part of the patient record.
– 750+ technical and medical experts participate in 20+
active DICOM working groups.
– Standard is updated 4-5 times per year.
– Many available tools! (see http://www.idoimaging.com/)
58. Medical image modalities
• The IRMA code [Lehmann et al., 2003]
– 4 axes with 3 to 4 positions each, in {0,...,9,a,...,z}, where ‘0’ denotes ‘unspecified’ and marks the end of a path along an axis.
• Technical code (T) describes the imaging modality
• Directional code (D) models body orientations
• Anatomical code (A) refers to the body region examined
• Biological code (B) describes the biological system
examined.
59. Medical image modalities
• The IRMA code [Lehmann et al., 2003]
– The entire code results in a string of 13 characters (IRMA: TTTT – DDD – AAA – BBB).
– Example: “x-ray, projection radiography, analog, high energy – sagittal, left lateral decubitus, inspiration – chest, lung – respiratory system, lung”
Source: [Lehmann et al., 2003]
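The axis structure above lends itself to a tiny parser. The sketch below splits a TTTT-DDD-AAA-BBB string into its four axes and truncates an axis at the first ‘0’ (unspecified); the axis names come from the slide, but the example code value is hypothetical and not taken from [Lehmann et al., 2003].

```python
# Minimal sketch of decoding an IRMA code string (TTTT-DDD-AAA-BBB).
# The example code value is invented for illustration.

AXES = [("technical", 4), ("directional", 3),
        ("anatomical", 3), ("biological", 3)]

def parse_irma(code):
    """Split a 13-character IRMA code into its four axes."""
    parts = code.split("-")
    assert [len(p) for p in parts] == [n for _, n in AXES], "malformed code"
    return {name: part for (name, _), part in zip(AXES, parts)}

def truncate_axis(axis_value):
    """'0' means unspecified and ends the path along an axis."""
    prefix = ""
    for ch in axis_value:
        if ch == "0":
            break
        prefix += ch
    return prefix

code = parse_irma("1121-127-720-500")   # hypothetical example
print(code["technical"])                 # '1121'
print(truncate_axis(code["biological"])) # '5' (rest of path unspecified)
```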
60. Medical image modalities
• The IRMA code [Lehmann et al., 2003]
– The companion tool…
Source: [Lehmann et al., 2004]
61. CBMIR vs. text-based MIR
• Most current retrieval systems in clinical use rely on
text keywords such as DICOM header information to
perform retrieval.
• CBIR has been widely researched in a variety of
domains and provides an intuitive and expressive
method for querying visual data using features, e.g.
color, shape, and texture.
• However, current CBIR systems:
– are not easily integrated into the healthcare environment;
– have not been widely evaluated using a large dataset; and
– lack the ability to perform relevance feedback to refine
retrieval results.
Source: [Hsu et al., 2009]
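The keyword-based retrieval that most clinical systems rely on can be sketched as filtering over DICOM header fields. The records below are invented stand-ins for parsed headers (a real system would read them with a DICOM toolkit); the field names `Modality`, `BodyPartExamined`, and `StudyDescription` are standard DICOM attributes.

```python
# Minimal sketch of text-based MIR: filtering on DICOM header fields.
# The three records are hypothetical parsed headers.

records = [
    {"id": "s001", "Modality": "CR", "BodyPartExamined": "CHEST",
     "StudyDescription": "chest x-ray, pneumonia follow-up"},
    {"id": "s002", "Modality": "MR", "BodyPartExamined": "BRAIN",
     "StudyDescription": "brain MRI with contrast"},
    {"id": "s003", "Modality": "CR", "BodyPartExamined": "SPINE",
     "StudyDescription": "lumbar spine x-ray"},
]

def search(records, modality=None, keyword=None):
    """Return record ids matching a modality and/or a free-text keyword."""
    hits = []
    for r in records:
        if modality and r["Modality"] != modality:
            continue
        text = (r["StudyDescription"] + " " + r["BodyPartExamined"]).lower()
        if keyword and keyword.lower() not in text:
            continue
        hits.append(r["id"])
    return hits

print(search(records, modality="CR"))                   # ['s001', 's003']
print(search(records, modality="CR", keyword="spine"))  # ['s003']
```

The limitations listed above follow directly: this search sees only what the header text says, never what the image shows.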
62. Who are the main players?
• USA
– NIH (National Institutes of Health)
• NIBIB - National Institute of Biomedical Imaging and
Bioengineering
• NCI - National Cancer Institute
• NLM – National Library of Medicine
– Several universities and hospitals
• Europe
– Aachen University (Germany)
– Geneva University (Switzerland)
• Big companies (Siemens, GE, etc.)
63. Medical image retrieval systems: examples
• IRMA (Image Retrieval in Medical Applications)
– Aachen University (Germany)
• http://ganymed.imib.rwth-aachen.de/irma/
– 3 online demos:
• IRMA Query demo: allows the evaluation of CBIR on several
databases.
• IRMA Extended Query Refinement demo: CBIR from the IRMA
database (a subset of 10,000 images).
• Spine Pathology and Image Retrieval Systems (SPIRS) designed by the
NLM/NIH (USA): holds information of ~17,000 spine x-rays.
64. Medical image retrieval systems: examples
• MedGIFT (GNU Image Finding Tool)
– Geneva University (Switzerland)
• http://www.sim.hcuge.ch/medgift/
– Large effort, including projects such as:
• Talisman (lung image retrieval)
• Case-based fracture image retrieval system
• Onco-Media: medical image retrieval + grid computing
• ImageCLEF: evaluation and validation
• medSearch
65. Medical image retrieval systems: examples
• WebMIRS
– NIH / NLM (USA)
• http://archive.nlm.nih.gov/proj/webmirs/index.php
– Query by text + navigation by categories
– Uses datasets and related x-ray images from the
National Health and Nutrition Examination Survey
(NHANES)
66. Medical image retrieval systems: examples
• SPIRS (Spine Pathology Image Retrieval System):
Web-based image retrieval system for large
biomedical databases
– NIH / UCLA (USA)
– Representative case study on highly specialized CBMIR
Source: [Hsu et al., 2009]
67. Medical image retrieval systems: examples
• National Biomedical Imaging Archive (NBIA)
– NCI / NIH (USA)
• https://imaging.nci.nih.gov/
– Search based on metadata (DICOM fields)
– 3 search options:
• Simple
• Advanced
• Dynamic
68. Medical image retrieval systems: examples
• ARRS GoldMiner
– American Roentgen Ray Society (USA)
• http://goldminer.arrs.org/
– Query by text
– Results can be filtered by:
• Modality
• Age
• Sex
69. Evaluation: ImageCLEF Medical Image Retrieval
• http://www.imageclef.org/2011/medical
– Dataset: 77,000+ images from articles published in
medical journals including text of the captions and link
to the html of the full text articles.
– 3 types of tasks:
• Modality Classification: given an image, return its modality
• Ad-hoc retrieval: classic medical retrieval task, with 3
“flavors”: textual, mixed and semantic queries
• Case-based retrieval: retrieve cases including images that
might best suit the provided case description.
70. Medical Image Retrieval: promising directions
• Better user interfaces (responsive, highly interactive,
and capable of supporting relevance feedback)
• New applications of CBMIR, including:
– Teaching
– Research
– Diagnosis
– PACS and Electronic Patient Records
• CBMIR evaluation using medical experts
• Integration of local and global features
• New visual descriptors
73. Mobile visual search: driving factors
• Age of mobile computing
http://60secondmarketer.com/blog/2011/10/18/more-mobile-phones-than-toothbrushes/
74. Mobile visual search: driving factors
• Why do I need a camera? I have a smartphone…
(22 Dec 2011)
http://www.cellular-news.com/story/52382.php
75. Mobile visual search: driving factors
• Powerful devices
– 1 GHz ARM Cortex-A9 processor, PowerVR SGX543MP2, Apple A5 chipset
http://www.apple.com/iphone/specs.html
http://www.gsmarena.com/apple_iphone_4s-4212.php
76. Mobile visual search: driving factors
• Powerful devices
http://europe.nokia.com/PRODUCT_METADATA_0/Products/Phones/8000-series/808/Nokia808PureView_Whitepaper.pdf
http://www.nokia.com/fr-fr/produits/mobiles/808/
77. Mobile visual search: driving factors
• Social networks and mobile devices (May 2011)
http://jess3.com/geosocial-universe-2/
78. Mobile visual search: driving factors
• Social networks and mobile devices
– Motivated users: image taking and image sharing are huge!
http://www.onlinemarketing-trends.com/2011/03/facebook-photo-statistics-and-insights.html
79. Mobile visual search: driving factors
• Instagram:
– 50 million registered users (35 M in the last four months)
– 7 employees
– A growing ecosystem based on it!
• Search
• Send postcards
• Manage your photos
• Build a poster
• etc.
– Sold to Facebook (for $1 billion!) earlier this year
http://thenextweb.com/apps/2011/12/07/instagram-hits-15m-users-and-has-2-people-working-on-an-android-app-right-now/
http://www.nuwomb.com/instagram/
80. Mobile visual search: driving factors
• Legitimate (or not quite…) needs and use cases
http://www.slideshare.net/dtunkelang/search-by-sight-google-goggles
https://twitter.com/#!/courtanee/status/14704916575
81. Mobile visual search: driving factors
• A natural use case for CBIR with QBE (at last!)
– The example is right in front of the user!
[FIG1] A snapshot of an outdoor mobile visual search system being used. The system augments the viewfinder with information about the objects it recognizes in the image taken with a camera phone.
Source: Girod et al., IEEE Multimedia, 2011
82. MVS: technical challenges
• How to ensure low latency (and interactive
queries) under constraints such as:
– Network bandwidth
– Computational power
– Battery consumption
• How to achieve robust visual recognition in spite
of low-resolution cameras, varying lighting
conditions, etc.
• How to handle broad and narrow domains
83. MVS: Pipeline for image retrieval
Source: Girod et al., IEEE Multimedia, 2011
85. MVS: descriptor extraction
• Interest point detection
• Feature descriptor computation
Source: Girod et al., IEEE Multimedia, 2011
86. Interest point detection
• Numerous interest-point detectors have been proposed in
the literature:
– Harris Corners (Harris and Stephens 1988)
– Scale-Invariant Feature Transform (SIFT) Difference-of-Gaussian
(DoG) (Lowe 2004)
– Maximally Stable Extremal Regions (MSERs) (Matas et al. 2002)
– Hessian affine (Mikolajczyk et al. 2005)
– Features from Accelerated Segment Test (FAST) (Rosten and
Drummond 2006)
– Hessian blobs (Bay, Tuytelaars and Van Gool 2006)
• Different tradeoffs in repeatability and complexity
• See (Mikolajczyk and Schmid 2005) for a comparative
performance evaluation of local descriptors in a common
framework.
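The oldest detector in the list above, Harris corners, can be sketched in a few lines: build the local structure tensor from image gradients and score each pixel with R = det(M) − k·trace(M)². The tiny synthetic image and the window/k values below are illustrative; real detectors add Gaussian smoothing, non-maximum suppression, and scale handling.

```python
# Minimal, pure-Python sketch of the Harris corner measure
# (Harris and Stephens 1988) on a tiny synthetic image.

def gradients(img):
    """Central-difference gradients of a 2D list of floats."""
    h, w = len(img), len(img[0])
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            gy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    return gx, gy

def harris_response(img, y, x, win=1, k=0.04):
    """Corner response R = det(M) - k*trace(M)^2 at (y, x)."""
    gx, gy = gradients(img)  # recomputed per call for brevity
    sxx = sxy = syy = 0.0
    for dy in range(-win, win + 1):
        for dx in range(-win, win + 1):
            ix, iy = gx[y + dy][x + dx], gy[y + dy][x + dx]
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace

# Synthetic 8x8 image: a bright square whose top-left corner is at (3, 3).
img = [[1.0 if (y >= 3 and x >= 3) else 0.0 for x in range(8)]
       for y in range(8)]

corner = harris_response(img, 3, 3)  # at the corner: large positive R
edge = harris_response(img, 5, 3)    # along an edge: negative R
print(corner > edge)  # True
```

This illustrates the repeatability/complexity tradeoff mentioned above: the measure is cheap, but it fires only where gradients vary in two directions.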
Source: Girod et al., IEEE Signal Processing Magazine, 2011
87. Feature descriptor computation
• After interest-point detection, we compute a
visual word descriptor on a normalized patch.
• Ideally, descriptors should be:
– robust to small distortions in scale, orientation, and
lighting conditions;
– discriminative, i.e., characteristic of an image or a small
set of images;
– compact, due to typical mobile computing constraints.
Source: Girod et al., IEEE Signal Processing Magazine, 2011
88. Feature descriptor computation
• Examples of feature descriptors in the literature:
– SIFT (Lowe 1999)
– Speeded-Up Robust Features (SURF) (Bay et al. 2008)
– Gradient Location and Orientation Histogram (GLOH)
(Mikolajczyk and Schmid 2005)
– Compressed Histogram of Gradients (CHoG)
(Chandrasekhar et al. 2009, 2010)
• See (Winder and Brown, CVPR 2007), (Winder, Hua, and Brown, CVPR 2009), and (Mikolajczyk and Schmid, PAMI 2005) for comparative performance evaluations of different descriptors.
Source: Girod et al., IEEE Signal Processing Magazine, 2011
89. Feature descriptor computation
• What about compactness?
– Option 1: Compress off-the-shelf descriptors.
• Result: poor rate-constrained image-retrieval
performance.
– Option 2: Design a descriptor with compression in mind.
• Example: CHoG (Compressed Histogram of Gradients) (Chandrasekhar et al. 2009, 2010)
Source: Girod et al., IEEE Signal Processing Magazine, 2011
90. CHoG: Compressed Histogram of Gradients
• Pipeline: patch → gradients (dx, dy) → spatial binning → gradient distributions for each bin → histogram compression → CHoG descriptor (a compact bit string).
Source: Chandrasekhar et al., CVPR 2009, 2010; Bernd Girod, “Mobile Visual Search”
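The CHoG-style pipeline just outlined can be sketched without its compression stage: divide the patch into spatial cells, histogram the quantized gradient directions inside each cell, and concatenate the normalized histograms. The grid/bin sizes and the all-horizontal gradient field below are simplifying assumptions; real CHoG uses carefully designed bin layouts and entropy-codes each histogram.

```python
# Simplified, uncompressed sketch of a per-cell gradient histogram
# descriptor in the spirit of CHoG. Toy gradient fields; the
# compression step of the real descriptor is omitted.
import math

def gradient_histogram_descriptor(dx, dy, grid=2, bins=4):
    """Concatenate per-cell histograms of quantized gradient directions."""
    h, w = len(dx), len(dx[0])
    cell_h, cell_w = h // grid, w // grid
    descriptor = []
    for cy in range(grid):
        for cx in range(grid):
            hist = [0] * bins
            for y in range(cy * cell_h, (cy + 1) * cell_h):
                for x in range(cx * cell_w, (cx + 1) * cell_w):
                    angle = math.atan2(dy[y][x], dx[y][x])  # [-pi, pi]
                    b = int((angle + math.pi) / (2 * math.pi) * bins) % bins
                    hist[b] += 1
            total = sum(hist)
            descriptor.extend(v / total for v in hist)  # cell distribution
    return descriptor

# Hypothetical 4x4 gradient fields: every gradient points along +x.
dx = [[1.0] * 4 for _ in range(4)]
dy = [[0.0] * 4 for _ in range(4)]
d = gradient_histogram_descriptor(dx, dy)
print(len(d))  # 16 values: 2x2 spatial cells x 4 direction bins
```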
91. CHoG: Compressed Histogram of Gradients
• Performance evaluation: classification accuracy (%) vs. query size (Kbytes)
– Sending CHoG features matches the accuracy of sending the image as JPEG or as uncompressed SIFT descriptors, at query sizes an order of magnitude smaller.
[Figure 7] Comparison of different schemes with regard to classification accuracy and query size. CHoG descriptor data is an order of magnitude smaller compared to JPEG images or uncompressed SIFT descriptors.
Source: Girod et al., IEEE Multimedia, 2011
92. MVS: feature indexing and matching
• Goal: produce a data structure that can quickly return a short
list of the database candidates most likely to match the query
image.
– The short list may contain false positives as long as the correct match
is included.
– Slower pairwise comparisons can be subsequently performed on just
the short list of candidates rather than the entire database.
• Example of a technique: Vocabulary Tree (VT)-Based Retrieval
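The shortlist idea above can be sketched with a flat codebook: quantize each descriptor to its nearest “visual word,” keep an inverted index from word to image IDs, and rank candidates by shared words. The 2D codebook and descriptors are made-up toy data; a vocabulary tree replaces the linear nearest-centroid scan with a coarse-to-fine tree walk over a much larger vocabulary.

```python
# Minimal sketch of visual-word indexing with an inverted index.
# Codebook and descriptors are invented 2D toy data.
from collections import defaultdict

codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # 4 visual words

def quantize(desc):
    """Index of the nearest codebook centroid (linear scan for clarity)."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2
                                 for a, b in zip(codebook[i], desc)))

def build_index(database):
    """word -> set of image IDs containing at least one such feature."""
    index = defaultdict(set)
    for image_id, descriptors in database.items():
        for d in descriptors:
            index[quantize(d)].add(image_id)
    return index

def shortlist(index, query_descriptors):
    """Rank database images by the number of query words they share."""
    votes = defaultdict(int)
    for d in query_descriptors:
        for image_id in index[quantize(d)]:
            votes[image_id] += 1
    return sorted(votes, key=votes.get, reverse=True)

database = {
    "imgA": [(0.1, 0.1), (0.9, 0.1)],
    "imgB": [(0.1, 0.9), (0.9, 0.9)],
}
index = build_index(database)
print(shortlist(index, [(0.05, 0.0), (1.0, 0.05)]))  # ['imgA']
```

As the slide notes, false positives on the shortlist are acceptable: the slower pairwise (geometric) comparison that follows only has to process these candidates.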
Source: Girod et al., IEEE Multimedia, 2011
93. MVS: geometric verification
• Goal: use location information of features in
query and database images to confirm that the
feature matches are consistent with a change in
view-point between the two images.
Source: Girod et al., IEEE Multimedia, 2011
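This verification step can be sketched in RANSAC style: hypothesize a transform from a random correspondence, count how many other matches agree, and keep the largest consistent set. For brevity the motion model below is a pure 2D translation (one correspondence per hypothesis) rather than the affine mapping or homography used in practice, and the point correspondences are invented toy data.

```python
# Minimal RANSAC-style sketch of geometric verification with a
# translation-only model. Matches are ((qx, qy), (dx, dy)) toy pairs.
import random

def verify(matches, trials=50, tol=1.0, seed=0):
    """Return the largest set of matches consistent with one translation."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(trials):
        (qx, qy), (mx, my) = rng.choice(matches)  # hypothesis from 1 match
        tx, ty = mx - qx, my - qy
        inliers = [m for m in matches
                   if abs(m[1][0] - m[0][0] - tx) <= tol
                   and abs(m[1][1] - m[0][1] - ty) <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers

# Three matches consistent with a (+10, +5) shift, plus one outlier.
matches = [((0, 0), (10, 5)), ((2, 1), (12, 6)),
           ((5, 5), (15, 10)), ((1, 1), (40, -3))]
print(len(verify(matches)))  # 3 inliers survive verification
```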
94. MVS: geometric verification
• Method: perform pairwise matching of feature descriptors and evaluate the geometric consistency of the correspondences.
• Techniques:
– The geometric transform between the query and database image is usually estimated using robust regression techniques such as:
• Random sample consensus (RANSAC) (Fischler and Bolles 1981)
• Hough transform (Lowe 2004)
– The transformation is often represented by an affine mapping or a homography.
• Note: GV is computationally expensive, which is why it is only applied to a subset of images selected during the feature-matching stage.
[FIG4] In the GV step, we match feature descriptors pairwise and find feature correspondences that are consistent with a geometric transformation.
Source: Girod et al., IEEE Multimedia, 2011
95. Datasets for MVS research
• Stanford Mobile Visual Search Data Set
(http://web.cs.wpi.edu/~claypool/mmsys-dataset/2011/stanford/)
– Key characteristics:
• rigid objects
• widely varying lighting conditions
• perspective distortion
• foreground and background clutter
• realistic ground-truth reference data
• query data collected from heterogeneous low and high-end
camera phones.
Source: Chandrasekhar et al., ACM MMSys 2011
96. SMVS Data Set: categories and examples
• DVD covers
http://web.cs.wpi.edu/~claypool/mmsys-2011-dataset/stanford/mvs_images/dvd_covers.html
97. SMVS Data Set: categories and examples
• CD covers
http://web.cs.wpi.edu/~claypool/mmsys-2011-dataset/stanford/mvs_images/cd_covers.html
98. SMVS Data Set: categories and examples
• Museum paintings
http://web.cs.wpi.edu/~claypool/mmsys-2011-dataset/stanford/mvs_images/museum_paintings.html
99. Other MVS data sets
Source: ISO/IEC JTC1/SC29/WG11/N12202, July 2011, Torino, IT
100. MPEG Compact Descriptors for Visual Search (CDVS)
• Objective
– Define a standard that enables efficient
implementation of visual search functionality on mobile
devices
• Scope
– bitstream of descriptors
– parts of the descriptor extraction process (e.g., key-point detection) needed to ensure interoperability
– Additional info:
• https://mailhost.tnt.uni-hannover.de/mailman/listinfo/cdvs
• http://mpeg.chiariglione.org/meetings/geneva11-1/geneva_ahg.htm (Ad hoc groups)
Source: Bober, Cordara, and Reznik (2010)
101. MPEG CDVS
• Summarized timeline (Table 1. Timeline for development of the MPEG standard for visual search):
– March 2011: Call for Proposals published (registration deadline: 11 July 2011; proposals due: 21 November 2011)
– December 2011: Evaluation of proposals
– February 2012: 1st Working Draft – first specification and test software model that can be used for subsequent improvements
– July 2012: Committee Draft – essentially complete and stabilized specification
– January 2013: Draft International Standard – complete specification; only minor editorial changes allowed after DIS
– July 2013: Final Draft International Standard – finalized specification, submitted for approval and publication as an International Standard
Source: Girod et al., IEEE Multimedia, 2011
102. Examples
• Google Goggles
• SnapTell
• oMoby (and the IQ Engines API)
• pixlinQ
• Moodstocks
103. Examples of commercial MVS apps
• Google Goggles
– Android and iPhone
– Narrow-domain search and retrieval
http://www.google.com/mobile/goggles
104. SnapTell
• One of the earliest (ca. 2008) MVS apps for iPhone
– Eventually acquired by Amazon (A9)
• Proprietary technique (“highly accurate and robust algorithm for image matching: Accumulated Signed Gradient (ASG)”).
http://www.snaptell.com/technology/index.htm
105. oMoby (and the IQ Engines API)
– iPhone app
http://omoby.com/pages/screenshots.php
106. oMoby (and the IQ Engines API)
• The IQ Engines API: “vision as a service”
http://www.iqengines.com/applications.php
107. pixlinQ
• A “mobile visual search solution that enables you to link users to digital content whenever they take a mobile picture of your printed materials.”
– Powered by image recognition from LTU technologies
http://www.pixlinq.com/home
108. pixlinQ
• Example app (La Redoute)
http://www.youtube.com/watch?v=qUZCFtc42Q4
109. Moodstocks: overview
• Offline image recognition thanks to smart image-signature synchronization
http://www.youtube.com/watch?v=tsxe23b12eU
110. Moodstocks: technology
• Unique features:
– offline image recognition thanks to smart image-signature synchronization,
– QR Code decoding,
– EAN 8/13 decoding,
– online image recognition as a fallback for very large image databases,
– simultaneous run of image recognition and barcode decoding,
– seamless scans logging in the background.
• Cross-platform (iOS / Android) client-side SDK and HTTP API
available: https://github.com/Moodstocks
• JPEG encoder used within their SDK also publicly
available: https://github.com/Moodstocks/jpec
111. Moodstocks
• Many successful apps for different platforms
http://www.moodstocks.com/gallery/
112. MVS: concluding thoughts
• Mobile Visual Search (MVS) is coming of age.
• This is not a fad and it can only grow.
• Still a good research topic
– Many relevant technical challenges
– MPEG efforts have just started
• Infinite creative commercial possibilities
114. Where is image search headed?
• Advice for [young] researchers
– In this last part, I’ve compiled bits and pieces of advice that I believe might help researchers who are entering the field.
– They focus on research avenues that I personally
consider to be the most promising.
115. Advice for [young] researchers
• LOOK
• THINK
• UNDERSTAND
• CREATE
116. Advice for [young] researchers
• LOOK…
– at yourself (how do you search for images and videos?)
– around (related areas and how they have grown)
– at Google (and other major players)
117. Advice for [young] researchers
• THINK…
– mobile devices
– new devices and services
– social networks
– games
118. Advice for [young] researchers
• UNDERSTAND…
– human intentions and emotions
– the context of the search
– user’s preferences and needs
119. Advice for [young] researchers
• CREATE…
– better interfaces
– better user experience
– new business opportunities (added value)
120. Concluding thoughts
– I believe (but cannot prove…) that successful VIR
solutions will:
• combine content-based image retrieval (CBIR) with
metadata (high-level semantic-based image retrieval)
• only be truly successful in narrow domains
• include the user in the loop
– Relevance Feedback (RF)
– Collaborative efforts (tagging, rating, annotating)
• provide friendly, intuitive interfaces
• incorporate results and insights from cognitive science,
particularly human visual attention, perception, and
memory
125. Concluding thoughts
• “Image search and retrieval” is not a problem, but
rather a collection of related problems that look like
one.
• There is a great need for good solutions to specific
problems.
• 10 years after “the end of the early years”, research in
visual information retrieval still has many open
problems, challenges, and opportunities.