Information Retrieval group seminar series. The University of Glasgow. 21st February 2011.
Southampton has a long history of research in the area of multimedia information analysis. This talk will focus on some of the recent work we have been involved with in the area of image search. The talk will start by looking at how image content can be represented in ways analogous to textual information, and how techniques developed for indexing text can be adapted to images. In particular, the talk will introduce ImageTerrier, a research platform for image retrieval that is built around Glasgow's Terrier software. The talk will also cover some of our recent work on image classification and image search result diversification.
5. Bags-of-visual words
In the computer vision community over recent years it has become popular to model the content of an image in a similar way to a “bag-of-terms” in textual document analysis.
[Figure: two parallel pipelines. Text: “The quick brown fox jumped over the lazy dog!” → tokenisation → stemming/lemmatisation → count occurrences → a term-frequency vector over the vocabulary (brown dog fox jumped lazy over quick the → [1 1 1 1 1 1 1 2]). Image: local feature extraction → feature quantisation → count occurrences → a visual-term-frequency vector (e.g. [1 2 0 0 6 …]).]
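The text side of this pipeline can be sketched in a few lines (a minimal illustration: stemming/lemmatisation is omitted and tokenisation is a crude regex):

```python
from collections import Counter
import re

def bag_of_terms(text):
    """Tokenise, crudely normalise, and count term occurrences."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)

bag = bag_of_terms("The quick brown fox jumped over the lazy dog!")
# 'the' occurs twice; every other term once.
```

The visual pipeline replaces tokenisation with local feature extraction and stemming with quantisation, but ends at the same place: a histogram of term occurrences.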
6. BoVW using local features
• Features are localised by a robust region detector and described by a local descriptor such as SIFT.
• A vocabulary of exemplar feature-vectors is learnt.
  – Traditionally through k-means clustering.
• Local descriptors can then be quantised to discrete visual terms by finding the closest exemplar in the vocabulary.
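The quantisation step can be sketched as a nearest-exemplar lookup (an illustrative brute-force version; at real vocabulary sizes an approximate nearest-neighbour search would be used instead):

```python
import numpy as np

def quantise(descriptors, vocabulary):
    """Assign each local descriptor to its nearest exemplar (visual term).

    descriptors: (n, d) array of local features (e.g. 128-d SIFT).
    vocabulary:  (k, d) array of exemplar vectors learnt by clustering.
    Returns an (n,) array of visual-term ids.
    """
    # Squared Euclidean distance from every descriptor to every exemplar.
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

rng = np.random.default_rng(0)
vocab = rng.random((5, 128))                              # toy 5-term vocabulary
descs = vocab[[2, 0, 4]] + 0.001 * rng.random((3, 128))   # descriptors near exemplars 2, 0, 4
terms = quantise(descs, vocab)                            # -> visual terms [2, 0, 4]
```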
7. Clustering features into vocabularies
• A typical image may have a few thousand local features.
• For indexing images, the number of discrete visual terms is large (e.g. around 1 million terms).
  – Smaller vocabularies are typically used in classification tasks.
  – Building vocabularies using k-means is hard:
    • e.g. 1M clusters, 128-dimensional vectors, >>10M samples.
    • Special k-means variants have been developed to deal with this (perhaps! - see next slide) [e.g. AKM (Philbin et al., 2007), HKM (Nistér and Stewénius, 2006)].
• Other tricks can also be applied by exploiting the shape of the space in which the vectors lie.
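For reference, plain Lloyd's k-means looks like the sketch below. AKM keeps this same loop but replaces the exact nearest-centroid assignment (the expensive O(n·k) step) with an approximate nearest-neighbour search over randomised k-d trees, while HKM clusters hierarchically. This is an illustrative toy on 2-d data, not the actual vocabulary-training code:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means: alternate exact assignment and centroid update."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Exact assignment: all n*k distances -- the step AKM approximates.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        for j in range(k):
            members = X[assign == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids, assign

# Toy data: three tight clusters in 2-d (real data: millions of 128-d SIFT vectors).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc, 0.05, (50, 2)) for loc in ([0, 0], [5, 5], [0, 5])])
centroids, assign = kmeans(X, 3)
```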
8. A preliminary study of the performance of clustering techniques
[Chart: UKBench score (4 == perfect retrieval; y-axis from 2.0 to 3.5) against vocabulary size (10,000, 1,000,000 and 3,000,000 terms) for HKM/SIFT, AKM/SIFT and RND/SIFT.]
10. Let's start with a demo!
• Task: given a query image, match it against a collection of geo-tagged and labelled images and attempt to determine:
  – the location.
  – potential tags/labels.
• Collection: >114,000 images (∼36GB) crawled from Flickr.
  – All images are geo-tagged and from the Trentino & Alto-Adige regions of northern Italy; many images have Flickr tags.
  – Images represented by:
    • Region detector: difference-of-Gaussian peaks.
    • Local descriptor: SIFT.
    • Vocabulary: 1 million terms, trained on a completely separate dataset using AKM.
11-12. [Demo screenshots.]
13. How does it work?
• Vector-space retrieval model.
• A compressed inverted index of visual terms is created using a single-pass indexer.
  – The index is augmented with the position information of each term as it occurred in the image (x, y, scale, primary orientation).
• Searching is a one- or two-pass operation:
  – Images are first ranked using a standard scoring function (e.g. tf-idf, unweighted cosine, L1, L1+IDF, etc.).
    • The L1+IDF distance works well.
  – Top hits are then (potentially) re-ranked using geometric information.
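One plausible reading of the L1+IDF scoring is an IDF-weighted L1 distance over L1-normalised bag-of-visual-word histograms, sketched below; ImageTerrier's actual WeightingModel implementation may differ in detail:

```python
import numpy as np

def l1_idf_rank(query, index):
    """Rank indexed images against a query histogram by IDF-weighted
    L1 distance over L1-normalised histograms (smaller = better)."""
    index = np.asarray(index, dtype=float)
    query = np.asarray(query, dtype=float)
    n_docs = index.shape[0]
    df = (index > 0).sum(axis=0)                 # document frequency per visual term
    idf = np.log((n_docs + 1.0) / (df + 1.0))    # smoothed inverse document frequency
    docs = index / index.sum(axis=1, keepdims=True)
    q = query / query.sum()
    dists = np.abs((docs - q) * idf).sum(axis=1)
    return np.argsort(dists)                     # best match first

index = np.array([[5, 0, 1, 0],   # image 0
                  [0, 4, 0, 2],   # image 1
                  [4, 1, 1, 0]])  # image 2
ranking = l1_idf_rank([5, 0, 1, 0], index)       # image 0 matches the query exactly
```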
14. Geometric re-ranking
• A pair of images sharing a number of visual terms may or may not be related.
  – It can often be useful to ensure that the matching visual terms are spatially consistent between the images.
    • This is the visual equivalent of phrase or proximity searches in text.
    • Spatial/geometric constraints can be very strict;
      – e.g. there must be an exact transform between the images (homography, affine, etc.)
    • or quite loose;
      – e.g. all pairs of matching terms should have a similar relative orientation or scale.
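A loose constraint of the kind described can be sketched as a vote over the relative orientations of matched terms (an illustrative scoring function, not ImageTerrier's DocumentScoreModifier code):

```python
import numpy as np

def orientation_consistency(matches, n_bins=18):
    """Loose geometric check: matches between two related images should
    mostly agree on relative orientation.  Score = fraction of matches
    falling in the most popular orientation-difference bin.

    matches: list of (theta_query, theta_target) orientations in radians.
    """
    m = np.asarray(matches, dtype=float)
    delta = (m[:, 1] - m[:, 0]) % (2 * np.pi)
    hist, _ = np.histogram(delta, bins=n_bins, range=(0, 2 * np.pi))
    return hist.max() / len(m)

# A consistent pair: every matched term rotated by the same ~0.5 rad.
good = [(t, t + 0.5) for t in np.linspace(0, 3, 20)]
# An inconsistent pair: random relative orientations.
rng = np.random.default_rng(0)
bad = [(t, rng.uniform(0, 2 * np.pi)) for t in np.linspace(0, 3, 20)]
score_good = orientation_consistency(good)   # all votes land in one bin
score_bad = orientation_consistency(bad)     # votes are spread out
```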
15. Introducing ImageTerrier!
• It would take a considerable amount of effort to write a new search engine from scratch.
  – So, we’ve been building ours on top of Terrier ☺
  – ImageTerrier is a set of classes that extend the Terrier core:
    • Collections and documents that read data produced by image feature extractors.
    • New indexers and supporting classes to make compressed augmented inverted indices for visual-term data.
    • New distance measures implemented as WeightingModels.
    • Geometric re-ranking implemented as DocumentScoreModifiers.
    • Command-line tools for indexing and searching.
16. Scalability: Using Map-Reduce
• Indexing is an expensive operation, but its cost is small compared to that of feature extraction.
  – One way to deal with larger datasets is to distribute the workload across multiple machines.
  – The Map-Reduce framework popularised by Google lets us do this in a way that minimises data transfer, by distributing the data and performing work only on the local portions.
• We have been experimenting with Map-Reduce implementations of our image-processing tools that enable us to work on much bigger datasets.
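The shape of such a job is the classic word-count pattern applied to visual terms. The toy sketch below runs the map and reduce phases in-process; Hadoop's distributed shuffle, I/O formats and the actual ImageTerrier indexers are all omitted:

```python
from collections import defaultdict

def map_phase(image_id, visual_terms):
    """Mapper: emit (term, (image_id, count)) pairs -- per-term postings
    that the reducer will merge into an inverted index."""
    counts = defaultdict(int)
    for t in visual_terms:
        counts[t] += 1
    for term, n in counts.items():
        yield term, (image_id, n)

def reduce_phase(mapped):
    """Reducer: group postings by visual term (a toy stand-in for the
    shuffle/reduce that Hadoop would distribute across machines)."""
    index = defaultdict(list)
    for term, posting in mapped:
        index[term].append(posting)
    return dict(index)

records = {"img0": [3, 3, 7], "img1": [7, 9]}
mapped = [kv for img, terms in records.items() for kv in map_phase(img, terms)]
inverted = reduce_phase(mapped)
# inverted maps each visual term to its postings list, e.g. term 7 -> both images.
```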
18. Implementation Details
• The entire toolset is implemented in pure Java:
  – Currently using Hadoop 0.20.2.
  – Many different state-of-the-art techniques for creating visual terms are implemented.
  – Completely re-written Terrier/Hadoop implementation:
    • Works with the new Map-Reduce API.
    • Doesn’t subclass BasicSinglePassIndexer.
      – Allows any indexer implementation to be used internally (useful, as ImageTerrier has a number of different ones).
19. ImageTerrier: Next steps
• Test (and benchmark) the scalability:
  – 100,000 images: no problem.
    • (This was without Hadoop indexing, but did use Hadoop for visual-term extraction.)
  – 1,000,000 images? 10,000,000 images?
• Open-source the code.
  – Aiming for a summer release:
    • ImageTerrier + image-processing toolkit + tools,
      – and demos,
      – and documentation!
21. Implicit Search Result Diversification
• Diversity in search result rankings is needed when users’ queries are poorly specified or ambiguous.
  – By presenting a diverse range of results covering all possible interpretations of a query, the probability of finding relevant images is increased.
  – Duplicate images should be reduced, as they are not considered useful by users.
• In 2009, we participated in an ImageCLEF task that addressed this issue.
  – Corpus of 498,920 images from the Belga news agency.
  – 84 runs from 19 different research groups were submitted.
    • We took 1st and 2nd place in the ‘part 1 queries’ category.
22. Pure Visual Diversification
[Figure: IAM@ImageCLEFphoto 2009 - results of a search for ‘euro’, shown without and with visual features.]
Hare, J., Dupplaw, D. and Lewis, P. (2009) IAM@ImageCLEFphoto 2009: Experiments on Maximising Diversity using Image Features. In: CLEF 2009 Workshop, 30 September - 2 October 2009, Corfu, Greece. p. 42.
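One common way to implement this kind of implicit diversification is a greedy, maximal-marginal-relevance style re-ranking that trades relevance against visual similarity to the results already selected. The sketch below is illustrative only and is not the exact method used in the CLEF runs:

```python
import numpy as np

def mmr_rerank(relevance, features, k, lam=0.7):
    """Greedy re-ranking: at each step pick the image maximising
    lam * relevance - (1 - lam) * max-similarity-to-selected.
    Similarity is cosine over visual feature vectors, so near-duplicate
    images are pushed down the ranking."""
    F = np.asarray(features, dtype=float)
    F = F / np.linalg.norm(F, axis=1, keepdims=True)
    sim = F @ F.T
    selected, candidates = [], list(range(len(relevance)))
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max(sim[i, j] for j in selected) if selected else 0.0
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

relevance = [0.9, 0.85, 0.5]
features = [[1, 0], [1, 0.01], [0, 1]]    # images 0 and 1 are near-duplicates
order = mmr_rerank(relevance, features, k=3)
# The visually distinct (but less relevant) image 2 is promoted above the duplicate.
```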
32. Explicit Search Result Diversification
• Sometimes users have an idea about how they would like their results to be diversified.
  – For example, searching for images of Arnold Schwarzenegger diversified by the films he has starred in.
• Using a combination of different technologies we have built a prototype search engine that allows this kind of query.
http://www.diversity-search.info
33-34. [Screenshots of the prototype diversity search engine.]
35. How does it work?
[Architecture diagram of the diversity-enabled image search engine:
• Data sources: the DBPedia triple-store (via its SPARQL endpoint) and, via the Yahoo BOSS web service, the Yahoo Wikipedia, Web, News and Image search indices.
• The query specification comprises a subject, a context, a diversity axis and a search type.
• Pipeline: search for the Wikipedia page about the subject → get the DBPedia URI for the subject → search DBPedia for resources and literals that link the diversity specification with the subject URI → generate a list of queries by combining subject, context and resource names/literal text (using "dbprop:redirect of" to provide query expansion for resources) → search for documents of the given type using the generated queries → organise and present the results as the response.]
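The "search DBPedia for linking resources" step amounts to issuing a SPARQL query against the endpoint. The sketch below builds such a query as a string; the property patterns and URIs are assumptions for illustration, and the prototype's actual queries (including the "dbprop:redirect of" expansion) may differ:

```python
def build_diversity_sparql(subject_uri, diversity_property):
    """Build a SPARQL query asking for resources/literals connected to
    the subject along the chosen diversity axis, in either direction.
    Each returned ?value would then be combined with the subject and
    context to generate one image-search query."""
    return (
        "SELECT DISTINCT ?value WHERE {\n"
        f"  {{ <{subject_uri}> <{diversity_property}> ?value . }}\n"
        "  UNION\n"
        f"  {{ ?value <{diversity_property}> <{subject_uri}> . }}\n"
        "}"
    )

q = build_diversity_sparql(
    "http://dbpedia.org/resource/Arnold_Schwarzenegger",   # hypothetical subject URI
    "http://dbpedia.org/ontology/starring")                # hypothetical diversity axis
```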
37. Classifying sentiment
• Recently we’ve been investigating the possibility of estimating the sentiment of an image based on its visual characteristics.
  – Applications include search and diversification along a sentiment axis.
Joint work with Stefan Siersdorfer, Enrico Minack and Fan Deng at the L3S Research Centre in Hannover.
Siersdorfer, S., Hare, J., Minack, E. and Deng, F. (2010) Analyzing and Predicting Sentiment of Images on the Social Web. In: ACM Multimedia 2010, 25-29 October 2010, Firenze, Italy. pp. 715-718.
Zontone, P., Boato, G., Hare, J., Lewis, P., Siersdorfer, S. and Minack, E. (2010) Image and Collateral Text in Support of Auto-annotation and Sentiment Analysis. In: TextGraphs-5: Graph-based Methods for Natural Language Processing, 16th July 2010, Uppsala, Sweden. pp. 88-92.
38. Sentiment Image Dataset
The top-1000 most positive and top-1000 most negative words from SentiWordNet were selected to form query terms for images that were likely to be associated with either positive or negative sentiment.
[Word clouds of the NEGATIVE and POSITIVE query terms.]
Up to 5000 images per term were selected by searching Flickr with each query term. Over 586,000 images were collected together with their respective metadata.
39. Visual Features
• Global Colour Histograms
• Local Colour Histograms
  [Illustration: the image is partitioned into a 4×4 grid of cells (A1-D4) and a histogram is built per cell.]
• Quantised SIFT Feature Histograms
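The two colour features can be sketched as follows (a toy version: a real descriptor would choose bin counts and colour spaces more carefully; the 4×4 grid mirrors the A1-D4 layout on the slide):

```python
import numpy as np

def global_colour_histogram(image, bins=4):
    """Joint RGB histogram over the whole image (image: H x W x 3, values 0-255)."""
    pixels = image.reshape(-1, 3).astype(int) * bins // 256   # quantise each channel
    idx = pixels[:, 0] * bins * bins + pixels[:, 1] * bins + pixels[:, 2]
    hist = np.bincount(idx, minlength=bins ** 3).astype(float)
    return hist / hist.sum()

def local_colour_histograms(image, grid=4, bins=4):
    """One histogram per cell of a grid x grid partition of the image."""
    h, w, _ = image.shape
    cells = []
    for r in range(grid):
        for c in range(grid):
            cell = image[r * h // grid:(r + 1) * h // grid,
                         c * w // grid:(c + 1) * w // grid]
            cells.append(global_colour_histogram(cell, bins))
    return np.array(cells)

img = np.zeros((64, 64, 3), dtype=np.uint8)   # toy image: all black
gch = global_colour_histogram(img)            # all mass in the first (darkest) bin
lch = local_colour_histograms(img)            # 16 cell histograms
```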
40. Classification Experiments
• Binary classification experiments using a linear SVM (the SVMlight implementation of linear support vector machines, with default parameterization).
• 100,000 training images (50:50 positive/negative); 35,000 test images, with an equal number of positive and negative images.
[Figure 4 (from the ACM Multimedia 2010 paper): example images classified as positive and negative based on each of the three features: GCH, LCH and SIFT.]
[Table 1: statistics on labelled images in the dataset.]
  Labelling       Positive   Negative   Total
  SW              294,559    199,370    493,929
  SWN-avg-0.00    316,089    238,388    554,477
  SWN-avg-0.10    260,225    190,012    450,237
  SWN-avg-0.20    194,700    149,096    343,796
  RND             293,456    292,812    586,268
[Plots: precision at recall for the SW and SWN-avg-0.20 labellings, comparing SIFT, GCH, LCH, GCH+LCH, GCH+SIFT, LCH+SIFT and a random baseline (RND); precision lies roughly between 0.5 and 0.7 across the recall range.]
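The experimental setup can be mimicked with any linear classifier; the sketch below uses a small logistic-regression stand-in trained on toy features, rather than the SVMlight linear SVM actually used in the paper:

```python
import numpy as np

def train_linear(X, y, epochs=200, lr=0.5):
    """Minimal logistic-regression stand-in for a linear SVM.
    X: (n, d) feature histograms; y: labels in {0, 1} (negative/positive)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted P(positive)
        grad = p - y
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

# Toy data: 'positive' images have a bump on feature 0, 'negative' on feature 1.
rng = np.random.default_rng(0)
pos = rng.random((40, 8)) + np.array([1, 0, 0, 0, 0, 0, 0, 0])
neg = rng.random((40, 8)) + np.array([0, 1, 0, 0, 0, 0, 0, 0])
X = np.vstack([pos, neg])
y = np.array([1] * 40 + [0] * 40)
w, b = train_linear(X, y)
acc = (((X @ w + b) > 0).astype(int) == y).mean()   # training accuracy
```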
41. Sentiment Correlated Features
• Mutual information was used to investigate which visual features are most strongly correlated with positive or negative sentiment.
[Figure: the most positively and most negatively correlated features for each feature type: Global Colour Histogram, Local Colour Histogram and SIFT Visual Terms.]
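Mutual information between a binary visual feature and the sentiment label can be estimated directly from co-occurrence counts, as in this sketch (illustrative; the paper's exact estimator may differ):

```python
import numpy as np

def mutual_information(feature_present, positive):
    """MI (in bits) between a binary visual feature and a binary sentiment
    label, estimated from counts over a labelled corpus."""
    f = np.asarray(feature_present, dtype=bool)
    s = np.asarray(positive, dtype=bool)
    n = len(f)
    mi = 0.0
    for fv in (True, False):
        for sv in (True, False):
            p_joint = ((f == fv) & (s == sv)).sum() / n
            p_f = (f == fv).sum() / n
            p_s = (s == sv).sum() / n
            if p_joint > 0:
                mi += p_joint * np.log2(p_joint / (p_f * p_s))
    return mi

# A feature that always co-occurs with positive images vs one that is random.
labels = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=bool)
correlated = labels.copy()
uninformative = np.array([1, 0, 1, 0, 1, 0, 1, 0], dtype=bool)
mi_correlated = mutual_information(correlated, labels)       # 1 bit: fully informative
mi_uninformative = mutual_information(uninformative, labels)  # 0 bits: independent
```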