Searching Images
Recent research at Southampton
Joint work with Paul Lewis, David Dupplaw & Sina Samangooei
Jonathon Hare 

21 February 2011
http://www.livingknowledge-project.eu
http://www.livememories.org
http://www.arcomem.eu
Contents
• Image feature representation
• Highly scalable content-based image search
–Demo: Guessing tags and geo location of images
–Compressed single-pass indexing to build an augmented index
–Using Map-Reduce for scalability
• Diversity in search result ranking
–Implicit image search diversification
–Explicit search diversification
• Classifying images to aid search and diversification
–Sentiment classification
Image Feature Representation
Bags-of-visual words
In the computer vision community over recent years it has
become popular to model the content of an image in a
similar way to a “bag-of-terms” in textual document
analysis.
Text: “The quick brown fox jumped over the lazy dog!” → Tokenisation → Stemming/Lemmatisation → Count Occurrences → term histogram over [brown dog fox jumped lazy over quick the] = [1 1 1 1 1 1 1 2].
Image: → Local Feature Extraction → Feature Quantisation → Count Occurrences → visual-term histogram, e.g. [1 2 0 0 6 ...].
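To make the analogy concrete, the final “count occurrences” step is the same in both pipelines. The minimal sketch below builds the sparse histogram once local features have been quantised to integer visual-term ids (the ids in the example are made up for illustration).

import java.util.HashMap;
import java.util.Map;

/** Minimal sketch: build a bag-of-visual-words histogram from quantised term ids. */
public class BovwHistogram {
    /** Count how often each visual-term id occurs in one image. */
    public static Map<Integer, Integer> count(int[] termIds) {
        Map<Integer, Integer> histogram = new HashMap<>();
        for (int id : termIds) {
            histogram.merge(id, 1, Integer::sum);
        }
        return histogram;
    }

    public static void main(String[] args) {
        // e.g. five local features quantised to three distinct visual terms
        int[] termIds = {42, 7, 42, 42, 13};
        System.out.println(count(termIds)); // e.g. {42=3, 7=1, 13=1} (order unspecified)
    }
}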
BoVW using local features
• Features are localised by a robust region detector and described by a local descriptor such as SIFT.
• A vocabulary of exemplar feature-vectors is learnt, traditionally through k-means clustering.
• Local descriptors can then be quantised to discrete visual terms by finding the closest exemplar in the vocabulary (sketched below).
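A minimal sketch of the quantisation step, assuming the vocabulary is simply an array of centroids: each descriptor is assigned the id of its nearest exemplar. Exhaustive search is shown for clarity; million-term vocabularies need approximate nearest-neighbour search in practice.

/**
 * Minimal sketch of visual-term quantisation: assign a local descriptor
 * (e.g. a 128-d SIFT vector) to the nearest exemplar in the vocabulary
 * by exhaustive Euclidean search.
 */
public class Quantiser {
    private final float[][] vocabulary; // one exemplar (centroid) per visual term

    public Quantiser(float[][] vocabulary) {
        this.vocabulary = vocabulary;
    }

    /** Returns the id (index) of the closest vocabulary exemplar. */
    public int quantise(float[] descriptor) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < vocabulary.length; i++) {
            double d = 0;
            for (int j = 0; j < descriptor.length; j++) {
                double diff = descriptor[j] - vocabulary[i][j];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        return best;
    }
}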
Clustering features into vocabularies
• A typical image may have a few thousand local features.
• For indexing images, the number of discrete visual terms is large (around 1 million terms).
 – Smaller vocabularies are typically used in classification tasks.
 – Building vocabularies using k-means is hard:
  • e.g. 1M clusters, 128-dimensional vectors, >>10M samples.
  • Special k-means variants have been developed to deal with this (perhaps! - see next slide), e.g. AKM (Philbin et al., 2007) and HKM (Nistér and Stewénius, 2006).
  • Other tricks can also be applied by exploiting the shape of the space in which the vectors lie.
(A single standard k-means iteration is sketched below.)
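For reference, one iteration of standard (Lloyd's) k-means is sketched below: assign every point to its nearest centroid, then recompute each centroid as the mean of its assigned points. At the 1M-cluster scale the assignment step becomes prohibitively expensive, which is exactly the step AKM and HKM approximate.

import java.util.Arrays;

/** Sketch of a single Lloyd iteration of k-means over float vectors. */
public class KMeansStep {
    public static float[][] iterate(float[][] points, float[][] centroids) {
        int k = centroids.length, dim = centroids[0].length;
        float[][] sums = new float[k][dim];
        int[] counts = new int[k];

        // Assignment step: nearest centroid by squared Euclidean distance.
        for (float[] p : points) {
            int best = 0;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int c = 0; c < k; c++) {
                double d = 0;
                for (int j = 0; j < dim; j++) {
                    double diff = p[j] - centroids[c][j];
                    d += diff * diff;
                }
                if (d < bestDist) { bestDist = d; best = c; }
            }
            counts[best]++;
            for (int j = 0; j < dim; j++) sums[best][j] += p[j];
        }

        // Update step: new centroid = mean of assigned points (keep the old one if empty).
        float[][] updated = new float[k][];
        for (int c = 0; c < k; c++) {
            if (counts[c] == 0) { updated[c] = Arrays.copyOf(centroids[c], dim); continue; }
            updated[c] = new float[dim];
            for (int j = 0; j < dim; j++) updated[c][j] = sums[c][j] / counts[c];
        }
        return updated;
    }
}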
A preliminary study of the performance of clustering techniques
[Chart: UKBench score (4 == perfect retrieval) against vocabulary size (10,000 to 3,000,000 terms) for HKM/SIFT, AKM/SIFT and RND/SIFT vocabularies.]
Highly scalable content-based image search
Let's start with a demo!
• Task: Given a query image, match it against a collection of geo-tagged and labelled images and attempt to determine:
 – the location;
 – potential tags/labels.
 (One simple way of producing these guesses from the top matches is sketched below.)
• Collection: >114,000 images (∼36GB) crawled from Flickr.
 – All images are geo-tagged and from the Trentino & Alto-Adige regions of northern Italy; many images have Flickr tags.
 – Images represented by:
  • Region detector: difference-of-Gaussian peaks.
  • Local descriptor: SIFT.
  • Vocabulary: 1 million terms, trained on a completely separate dataset using AKM.
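One simple way the tag and location guesses could be produced from the retrieval results (an illustrative assumption, not necessarily the exact method used in the demo): take the top-ranked visually matching images, vote over their Flickr tags, and average their coordinates for a location estimate.

import java.util.*;
import java.util.stream.Collectors;

/** Illustrative sketch: guess tags and location from the top-ranked visual matches. */
public class TagGuesser {
    /** A retrieved image: its Flickr tags and latitude/longitude. */
    public static class Hit {
        final Set<String> tags;
        final double lat, lon;
        Hit(Set<String> tags, double lat, double lon) { this.tags = tags; this.lat = lat; this.lon = lon; }
    }

    /** Return the tags that occur most often among the top-ranked hits. */
    public static List<String> guessTags(List<Hit> topHits, int numTags) {
        Map<String, Integer> votes = new HashMap<>();
        for (Hit h : topHits)
            for (String t : h.tags)
                votes.merge(t, 1, Integer::sum);
        return votes.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .limit(numTags)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    /** Naive location estimate: centroid of the top hits' coordinates. */
    public static double[] guessLocation(List<Hit> topHits) {
        double lat = 0, lon = 0;
        for (Hit h : topHits) { lat += h.lat; lon += h.lon; }
        return new double[] { lat / topHits.size(), lon / topHits.size() };
    }
}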
How does it work?
• Vector-space retrieval model.
• Compressed inverted index of visual terms created using a single-pass indexer.
 – The index is augmented with the position information of each term as it occurred in the image (x, y, scale, primary orientation).
• Searching is a one- or two-pass operation:
 – Images are first ranked using a standard scoring function (e.g. tf-idf, unweighted cosine, L1, L1+IDF, etc.).
  • The L1+IDF distance works well (sketched below).
 – The top hits are then (potentially) re-ranked using geometric information.
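One common formulation of an L1+IDF score is an IDF-weighted L1 distance between (L1-normalised) term histograms. The sketch below follows that formulation; it is an assumption for illustration and may differ in detail from the weighting model used in ImageTerrier.

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Sketch of an IDF-weighted L1 distance between two bag-of-visual-words
 * histograms (assumed to hold L1-normalised term frequencies).
 * Smaller distances mean more similar images.
 */
public class L1IdfScorer {
    /** idf[t] = log(N / df[t]), precomputed from the collection statistics. */
    private final Map<Integer, Double> idf;

    public L1IdfScorer(Map<Integer, Double> idf) { this.idf = idf; }

    public double distance(Map<Integer, Double> query, Map<Integer, Double> doc) {
        Set<Integer> terms = new HashSet<>(query.keySet());
        terms.addAll(doc.keySet());
        double d = 0;
        for (int t : terms) {
            double q = query.getOrDefault(t, 0.0);
            double x = doc.getOrDefault(t, 0.0);
            d += idf.getOrDefault(t, 0.0) * Math.abs(q - x);
        }
        return d;
    }
}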
Geometric re-ranking
• A pair of images sharing a number of visual terms may or may not be related.
 – It can often be useful to ensure that the matching visual terms are spatially consistent between the images.
  • This is the visual equivalent of phrase or proximity searches in text.
• Spatial/geometric constraints can be very strict,
 – e.g. requiring an exact transform between the images (homography, affine, etc.),
• or quite loose,
 – e.g. requiring that all pairs of matching terms have a similar relative orientation or scale (a loose check of this kind is sketched below).
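The sketch below illustrates the loose end of the spectrum: it histograms the differences in orientation between matched regions and scores the pair by the largest bin, so images whose matches agree on a single rotation score higher. It is an illustrative check only, not the exact re-ranker used in ImageTerrier.

/**
 * Sketch of a loose geometric-consistency check: each matched visual term
 * carries the orientation of its interest region in both images; if the
 * match is genuine, the orientation differences should cluster around a
 * single value. A strict check would instead fit a full homography or
 * affine transform, e.g. with RANSAC.
 */
public class OrientationConsistency {
    /** orientationsA[i] and orientationsB[i] (radians) belong to the i-th matched term. */
    public static int consistentMatches(double[] orientationsA, double[] orientationsB, int bins) {
        int[] histogram = new int[bins];
        for (int i = 0; i < orientationsA.length; i++) {
            double diff = orientationsA[i] - orientationsB[i];
            // wrap the difference into [0, 2*pi)
            diff = ((diff % (2 * Math.PI)) + 2 * Math.PI) % (2 * Math.PI);
            int bin = (int) (bins * diff / (2 * Math.PI));
            if (bin == bins) bin = 0; // guard against rounding at the boundary
            histogram[bin]++;
        }
        int best = 0;
        for (int count : histogram) best = Math.max(best, count);
        return best; // more matches voting for the same rotation => more consistent
    }
}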
Introducing ImageTerrier!
• It would take a considerable amount of effort to write a new search
engine from scratch.
–So, we’ve been building ours on top of Terrier ☺
–ImageTerrier is a set of classes that extend the Terrier core:
• Collections and documents that read data produced from
image feature extractors.
• New indexers and supporting classes to make compressed
augmented inverted indices for visual term data.
• New distance measures implemented as WeightingModels.
• Geometric re-ranking implemented as
DocumentScoreModifiers.
• Command-line tools for indexing and searching.
Scalability: Using Map-Reduce
• Indexing is an expensive operation, although its cost is small compared to that of feature extraction.
 – One solution for dealing with larger datasets is to distribute the workload across multiple machines.
 – The Map-Reduce framework popularised by Google lets us do this in a way that minimises data transfer, by distributing the data and performing work only on the local portions.
• We have been experimenting with Map-Reduce implementations of our image processing tools that enable us to work on much bigger datasets.
A Complete Map-Reduce Pipeline
[Pipeline diagram:] Image Corpus → Feature Extraction Mappers → Generate Vocabulary (repeatedly: Assign features to Centroids → Recalculate Centroids) → Quantisation Mappers → ImageTerrier Map-Reduce Indexing → ImageTerrier Index.
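To make the "Quantisation Mappers" stage concrete, here is a hedged sketch of a mapper written against the new org.apache.hadoop.mapreduce API (mentioned on the next slide). The record layout (one local feature per line as "imageId <tab> comma-separated descriptor values") and the tiny hard-coded vocabulary are assumptions for illustration, not the ImageTerrier implementation.

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * Sketch of a quantisation mapper: assign each local descriptor to its nearest
 * vocabulary centroid and emit (imageId, termId). A reducer (not shown) would
 * aggregate the term ids per image ready for indexing.
 */
public class QuantisationMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private float[][] vocabulary;

    @Override
    protected void setup(Context context) {
        // In practice the vocabulary would be loaded from the distributed cache;
        // a tiny hard-coded one keeps this sketch self-contained.
        vocabulary = new float[][] { {0f, 0f}, {1f, 1f} };
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] parts = value.toString().split("\t");
        if (parts.length != 2) return;
        String[] comps = parts[1].split(",");
        float[] descriptor = new float[comps.length];
        for (int i = 0; i < comps.length; i++) descriptor[i] = Float.parseFloat(comps[i]);

        // Nearest-centroid assignment (exhaustive here; AKM/HKM would be used at scale).
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < vocabulary.length; c++) {
            double d = 0;
            int dims = Math.min(descriptor.length, vocabulary[c].length);
            for (int j = 0; j < dims; j++) {
                double diff = descriptor[j] - vocabulary[c][j];
                d += diff * diff;
            }
            if (d < bestDist) { bestDist = d; best = c; }
        }
        context.write(new Text(parts[0]), new IntWritable(best));
    }
}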
Implementation Details
• Entire toolset implemented in pure Java:
–Using Hadoop 0.20.2 currently.
–Many different state-of-the-art techniques for creating
visual-terms implemented.
–Completely re-written Terrier/Hadoop implementation
•Works with the new M-R API.
•Doesn’t subclass BasicSinglePassIndexer.
–Allows any indexer implementation to be used
internally (useful as ImageTerrier has a number of
different ones).
ImageTerrier: Next steps
• Test (and benchmark) the scalability:
–100,000 images; no problem
•(this was without Hadoop indexing, but did use Hadoop
for visual-term extraction)
–1,000,000 images? 10,000,000 images?
• Open-source the code.
–Aiming for a summer release:
•ImageTerrier + image processing toolkit + tools
–and demos,
–and documentation!
Diversity in search result ranking
Implicit Search Result Diversification
• Diversity in search result rankings is needed when users' queries are poorly specified or ambiguous.
 – By presenting a diverse range of results covering all possible representations of a query, the probability of finding relevant images is increased.
 – Duplicate images should be reduced, as they are not considered useful by users.
• In 2009, we participated in an ImageCLEF task that addressed this issue.
 – Corpus of 498,920 images from the Belga news agency.
 – 84 runs from 19 different research groups were submitted.
 – Our runs took 1st and 2nd place in the 'part 1 queries' category.
Pure Visual Diversification
IAM@ImageCLEFphoto 2009 - results of a search for ‘euro’, shown without and with visual features.
Hare, J., Dupplaw, D. and Lewis, P. (2009) IAM@ImageCLEFphoto 2009: Experiments on Maximising Diversity using Image Features. In: CLEF 2009 Workshop, 30 September - 2 October 2009, Corfu, Greece. p. 42.
The implicit diversification pipeline (built up over several slides):
Input from text search → Local Feature Extraction → Feature Histograms/Vectors → Image distribution in feature-space → Iterative re-ranking procedure by maximising distance in feature-space → Re-ranked output.
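A minimal sketch of the iterative re-ranking step, assuming each result already has a feature vector (e.g. a colour or visual-term histogram): starting from the top text-search result, the next image chosen is always the one furthest (by its minimum distance) from everything already selected, so near-duplicates sink down the list. The greedy max-min strategy and Euclidean distance here are one simple realisation; the ImageCLEF runs may have used different features and distances.

import java.util.ArrayList;
import java.util.List;

/** Sketch of greedy max-min diversification of a ranked result list. */
public class MaxMinDiversifier {
    public static List<Integer> rerank(double[][] features) {
        List<Integer> order = new ArrayList<>();
        boolean[] used = new boolean[features.length];
        order.add(0);            // keep the top-ranked result first
        used[0] = true;

        while (order.size() < features.length) {
            int bestIdx = -1;
            double bestScore = -1;
            for (int i = 0; i < features.length; i++) {
                if (used[i]) continue;
                double minDist = Double.POSITIVE_INFINITY;
                for (int chosen : order)
                    minDist = Math.min(minDist, distance(features[i], features[chosen]));
                if (minDist > bestScore) { bestScore = minDist; bestIdx = i; }
            }
            order.add(bestIdx);
            used[bestIdx] = true;
        }
        return order; // new ranking, as indices into the original result list
    }

    private static double distance(double[] a, double[] b) {
        double d = 0;
        for (int i = 0; i < a.length; i++) { double diff = a[i] - b[i]; d += diff * diff; }
        return Math.sqrt(d);
    }
}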
Explicit Search Result Diversification
• Sometimes users have an idea about how they would like their results to be diversified.
 – For example, searching for images of Arnold Schwarzenegger diversified by the films he has starred in.
• Using a combination of different technologies, we have built a prototype search engine that allows this kind of query.
http://www.diversity-search.info
How does it work?
The Diversity Enabled Image Search Engine sits between a DBPedia SPARQL Endpoint (backed by the DBPedia Triple-Store) and the Yahoo BOSS Web Service (backed by the Yahoo Wikipedia, Image, Web and News search indices).
Given a Query Specification (Subject, Context, Diversity Axis, Search Type), the engine:
• searches for the Wikipedia page about the Subject;
• gets the DBPedia URI for the Subject;
• searches DBPedia for resources and literals that link the Diversity Specification with the Subject URI (a sketch of this step is given below);
• generates a list of queries by combining subject, context and resource names/literal text, using "dbprop:redirect of" to provide query expansion for resources;
• searches for documents of the given type using the generated queries;
• organises and presents the results as the Response.
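As an illustration of the DBPedia lookup step, the sketch below issues a SPARQL query to the public endpoint asking for films starring Arnold Schwarzenegger, matching the example on the previous slide. The dbpedia.org/ontology/starring property, the CSV-free plain output handling and the endpoint behaviour are assumptions for the example; the prototype generates its queries automatically from the diversity specification.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

/** Sketch: query the DBpedia SPARQL endpoint for resources along a diversity axis. */
public class DbpediaLookup {
    public static void main(String[] args) throws Exception {
        String sparql =
            "SELECT DISTINCT ?film WHERE { " +
            "  ?film <http://dbpedia.org/ontology/starring> " +
            "        <http://dbpedia.org/resource/Arnold_Schwarzenegger> . " +
            "}";
        String url = "http://dbpedia.org/sparql?query=" + URLEncoder.encode(sparql, "UTF-8");

        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestProperty("Accept", "application/sparql-results+json");
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                // raw JSON results; each bound ?film URI would be used to build image-search queries
                System.out.println(line);
            }
        }
    }
}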
Classifying images to aid search and diversification
Classifying sentiment
• Recently we’ve been investigating the possibility of
estimating the sentiment of an image based on its visual
characteristics.
–Applications include search and diversification along a
sentiment axis.
Joint work with Stefan Siersdorfer, Enrico Minack and Fan Deng at the L3S Research Centre in Hannover
Siersdorfer, S., Hare, J., Minack, E. and Deng, F. (2010) Analyzing and Predicting Sentiment of Images on the Social Web. In: ACM Multimedia 2010, 25-29 October
2010, Firenze, Italy. pp. 715-718.
Zontone, P., Boato, G., Hare, J., Lewis, P., Siersdorfer, S. and Minack, E. (2010) Image and Collateral Text in Support of Auto-annotation and Sentiment Analysis. In:
TextGraphs-5: Graph-based Methods for Natural Language Processing, 16th July 2010, Uppsala, Sweden. pp. 88-92.
Sentiment Image Dataset
The top-1,000 most positive and the top-1,000 most negative words from SentiWordNet were selected to form query terms for images likely to be associated with either positive or negative sentiment.
[Word clouds of the NEGATIVE and POSITIVE query terms.]
Up to 5,000 images per term were selected by searching Flickr with each query term. Over 586,000 images were collected together with their respective metadata.
Visual Features
• Global Colour Histograms
• Local Colour Histograms (the image is divided into a grid of cells, e.g. A1 to D4, and a colour histogram is computed for each cell)
• Quantised SIFT Feature Histograms
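A minimal sketch of the global colour histogram (the simplest of the three features): pixels are counted into a small number of bins per RGB channel. A local colour histogram is the same computation applied per grid cell. The number of bins and the normalisation here are illustrative choices, not necessarily those used in the experiments.

import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

/** Sketch: quantise each pixel's R, G and B channels and count pixels per (r,g,b) bin. */
public class GlobalColourHistogram {
    public static double[] compute(BufferedImage img, int binsPerChannel) {
        double[] hist = new double[binsPerChannel * binsPerChannel * binsPerChannel];
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                int rgb = img.getRGB(x, y);
                int r = ((rgb >> 16) & 0xff) * binsPerChannel / 256;
                int g = ((rgb >> 8) & 0xff) * binsPerChannel / 256;
                int b = (rgb & 0xff) * binsPerChannel / 256;
                hist[(r * binsPerChannel + g) * binsPerChannel + b]++;
            }
        }
        // normalise so histograms from different-sized images are comparable
        double total = img.getWidth() * (double) img.getHeight();
        for (int i = 0; i < hist.length; i++) hist[i] /= total;
        return hist;
    }

    public static void main(String[] args) throws Exception {
        BufferedImage img = ImageIO.read(new File(args[0]));
        System.out.println(compute(img, 4).length + " bins"); // 64 bins for 4 bins/channel
    }
}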
[Screenshots from Siersdorfer et al. (2010):
Figure 4 - example images classified as positive and negative based on each of the three features (GCH, LCH and SIFT).
Table 1 - statistics on labelled images in the dataset:
  labelling scheme   positive   negative   total
  SW                 294,559    199,370    493,929
  SWN-avg-0.00       316,089    238,388    554,477
  SWN-avg-0.10       260,225    190,012    450,237
  SWN-avg-0.20       194,700    149,096    343,796
  RND                293,456    292,812    586,268
Precision-at-recall curves for the SW and SWN-avg-0.20 labellings, comparing SIFT, GCH, LCH, GCH+LCH, GCH+SIFT, LCH+SIFT and RND classifiers.]
Classification Experiments
• Binary classification experiments using a linear SVM (SVMlight).
• 100,000 training images (50:50 positive:negative).
• 35,000 test images.
Sentiment Correlated Features
Mutual information was used to investigate which visual features are most strongly correlated with positive or negative sentiment (sketched below).
[Image grids: the most positively and most negatively correlated Global Colour Histogram bins, Local Colour Histogram bins and SIFT visual terms.]
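A sketch of the mutual-information measure for this kind of analysis, computed for one binary visual feature (present or absent in an image) against the binary sentiment label, from the four counts of their contingency table. The counts in the usage example are made up; higher values indicate features more strongly associated with one sentiment class.

/** Sketch: mutual information (in bits) between a binary feature and the sentiment label. */
public class MutualInformation {
    /**
     * @param posWith    positive images containing the feature
     * @param posWithout positive images without the feature
     * @param negWith    negative images containing the feature
     * @param negWithout negative images without the feature
     */
    public static double compute(long posWith, long posWithout, long negWith, long negWithout) {
        double n = posWith + posWithout + negWith + negWithout;
        double[][] joint = {
            { posWith / n, posWithout / n },
            { negWith / n, negWithout / n }
        };
        double[] pClass = { joint[0][0] + joint[0][1], joint[1][0] + joint[1][1] };
        double[] pFeat  = { joint[0][0] + joint[1][0], joint[0][1] + joint[1][1] };

        double mi = 0;
        for (int c = 0; c < 2; c++)
            for (int f = 0; f < 2; f++)
                if (joint[c][f] > 0)
                    mi += joint[c][f] * Math.log(joint[c][f] / (pClass[c] * pFeat[f])) / Math.log(2);
        return mi;
    }

    public static void main(String[] args) {
        // a feature seen mostly in positive images carries more information about sentiment
        System.out.println(compute(900, 100, 200, 800));
    }
}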
Thank You!
Any Questions?
