Sharp images and
fuzzy concepts
Multimedia retrieval and the semantic gap
Jonathon Hare <jsh2@ecs.soton.ac.uk>
[Diagram: Multimedia Understanding & Retrieval and its related fields: Artificial Intelligence, Computer Vision, Information Retrieval, Text Analysis, Signal Processing, Machine Learning]
Why do people
search for images?
• Content-based retrieval: visual index, visual query
• Context-based retrieval: textual index, textual query
• Semantic retrieval: textual or structured query
• Auto-annotation based retrieval: visual index, textual query
How are images traditionally
indexed and retrieved?
Metadata (images not reproduced):
• circa 1960: A windmill in Romania. (Photo by Keystone/Getty Images)
• Labour politician Ellen Wilkinson (1891-1947) making a speech at a 'Save Peace' demonstration in Trafalgar Square, London, on the Czech-German crisis, 18th September 1938. (Photo by A. Hudson/Topical Press Agency/Hulton Archive/Getty Images)
• circa 1960: The Slovnaft oil works near Bratislava at the end of one branch of the Druzba/Friendship pipeline which brings oil from the USSR to Czechoslovakia. (Photo by Three Lions/Getty Images)
Retrieval components: search statement ("romania"), vocabulary control, text matcher
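A minimal sketch of metadata-based retrieval, assuming the search statement passes through vocabulary control before a text matcher compares it with the captions. The captions are shortened from the slide; the `PREFERRED_TERM` mapping and the function names are hypothetical, and real picture libraries use far richer controlled vocabularies.

```python
import re

# Captions shortened from the slide; image ids are made up for illustration.
CAPTIONS = {
    "windmill": "circa 1960: A windmill in Romania.",
    "wilkinson": "Labour politician Ellen Wilkinson making a speech in Trafalgar Square, London.",
    "slovnaft": "circa 1960: The Slovnaft oil works near Bratislava.",
}

# Vocabulary control (hypothetical): map variant spellings onto one preferred term.
PREFERRED_TERM = {"rumania": "romania", "roumania": "romania"}


def index_terms(text):
    """Lower-case, tokenise and normalise text into a set of index terms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {PREFERRED_TERM.get(t, t) for t in tokens}


def search(statement):
    """Text matcher: ids of images whose caption contains every query term."""
    wanted = index_terms(statement)
    return sorted(i for i, caption in CAPTIONS.items() if wanted <= index_terms(caption))


print(search("romania"))   # the windmill caption is the only one mentioning Romania
```

Note that a query such as "dolphin" returns nothing here, however many dolphins the pictures contain: only what the cataloguer wrote down can be found.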
What about images on the
web?
What are some of
the problems?
Title: Romanian Windmill
Caption: circa 1960: A windmill in Romania. (Photo by Keystone/Getty Images)
Image #: 3404453
Photographer: Keystone/Stringer
Collection: Hulton Archive
Credit: Getty Images
Date created: 01 Jan 1960
Copyright: 2005 Getty Images
Keywords:
Finance, Lifestyles, Horizontal, Rural Scene, Black And White, Romania,
Wind Powered Building, Nobody, Fuel and Power Generation
$$
Fish!
Dolphin!
“Gypsy tart?”
Content-based retrieval
[Diagram components: image, search statement, feature extraction, dimensionality reduction, descriptors, similarity matcher]
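A rough sketch of the content-based pipeline, assuming a grey-level histogram as the descriptor and Euclidean distance as the similarity measure. Both are stand-ins chosen for brevity, the toy "images" are hypothetical, and the dimensionality reduction step is omitted.

```python
import math


def grey_histogram(pixels, bins=4):
    """Feature extraction: normalised histogram of 8-bit grey levels."""
    counts = [0] * bins
    for p in pixels:
        counts[min(p * bins // 256, bins - 1)] += 1
    total = len(pixels)
    return [c / total for c in counts]


def distance(a, b):
    """Euclidean distance between two descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query_pixels, index):
    """Similarity matcher: rank indexed image ids by distance to the query."""
    q = grey_histogram(query_pixels)
    return sorted(index, key=lambda image_id: distance(q, index[image_id]))


# Toy "images": flat lists of grey levels (hypothetical data).
images = {
    "dark": [10, 20, 30, 40, 50, 60],
    "bright": [200, 210, 220, 230, 240, 250],
    "mixed": [10, 20, 30, 200, 210, 220],
}
index = {image_id: grey_histogram(px) for image_id, px in images.items()}

# The query is itself an image: mostly dark pixels with one bright one.
ranking = retrieve([5, 15, 25, 35, 45, 220], index)
print(ranking)
```

The query and the index are both visual: nothing in this pipeline knows what the images depict, only how similar their descriptors are.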
Kirby Moor
Country House
Hotel AA**
Longtown Road, Brampton, CA8 2AB
Phone: +44 (0)1697 73893
Fax: +44 (0)1697 741847
Check Availability
Content-based
Retrieval Applications
Auto-annotation
and the semantic
gap
• Semantics: object relationships and more
–e.g. Wolf on Road with Snow on roadside in Yosemite National Park, California on 24/1/2004 at 23:19:11GMT
• Object Labels: symbolic names of objects
–e.g. Wolf, Snow, Road
• Objects: prototypical combinations of descriptors
• Descriptors: feature-vectors
–segmented blobs, salient regions, pixel-level histograms, Fourier descriptors, etc...
• Raw Media: images
Content-based
Image Retrieval
Semantic Image
Retrieval
Semantic
Retrieval
Automatic
Annotation
Annotated images: [CAR, SUN] [CAR, TREE] [TREE]; unannotated images: [?] [?] [?]
Semantic retrieval of unannotated images is equivalent to automatic annotation!
Semantic Retrieval?
Auto-Annotation?
Annotation and
retrieval with
semantic spaces
[Figure: occurrence matrix with one row per term (Bird, Sun, Beach, Grass, ...) ≅ T D]
Singular Value
Decomposition
Probabilistic Latent
Factor Models
Non-negative Matrix
Factorisation
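As an illustration of the first of these techniques, the sketch below factorises a small term-by-image occurrence matrix with a truncated singular value decomposition, so that O ≅ T D. It assumes NumPy is available; the matrix values, the choice k = 2, and the convention of folding the singular values into D are assumptions made for the example.

```python
import numpy as np

terms = ["bird", "sun", "beach", "grass"]
# Rows are terms, columns are images; entries are occurrence counts (toy data).
O = np.array([
    [2, 0, 0, 1],
    [0, 1, 2, 0],
    [0, 2, 1, 0],
    [1, 0, 0, 1],
], dtype=float)

k = 2  # number of dimensions kept in the semantic space
U, s, Vt = np.linalg.svd(O, full_matrices=False)

T = U[:, :k]                     # term basis, shape (n_terms, k)
D = np.diag(s[:k]) @ Vt[:k, :]   # image coordinates, shape (k, n_images)

# Frobenius norm of the residual left by the rank-k approximation.
error = np.linalg.norm(O - T @ D)
print(T.shape, D.shape, error)
```

For a truncated SVD the residual equals the root of the summed squares of the discarded singular values, which gives a quick way to judge how much is lost for a given k.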
Semantic Spaces
Searching by Keyword
SUN
TRAIN
Ranked Search Results:
Search for images
about “SUN”
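A hedged sketch of keyword search in a semantic space, assuming the space is built by SVD from a matrix that stacks keyword rows on top of visual-term rows. Unannotated images, which have visual terms only, are projected into the same space and ranked by cosine similarity to the keyword. All counts, image names and function names are made up for illustration, and NumPy is assumed.

```python
import numpy as np

keywords = ["sun", "train"]
# Rows: two keywords followed by two visual terms; columns: four annotated
# training images. All counts are toy values.
O = np.array([
    [1, 1, 0, 0],   # keyword "sun"
    [0, 0, 1, 1],   # keyword "train"
    [2, 3, 0, 0],   # visual term 0 (co-occurs with "sun")
    [0, 0, 3, 3],   # visual term 1 (co-occurs with "train")
], dtype=float)

k = 2
U, _, _ = np.linalg.svd(O, full_matrices=False)
T = U[:, :k]   # basis of the k-dimensional semantic space

# Unannotated images carry visual-term counts only (keyword rows are zero).
unannotated = {
    "photo_a": np.array([0, 0, 4, 1], dtype=float),
    "photo_b": np.array([0, 0, 0, 5], dtype=float),
}


def project(observation):
    """Map a term-count vector into the semantic space."""
    return T.T @ observation


def cosine(a, b):
    """Cosine similarity between two vectors in the semantic space."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def search(keyword):
    """Rank unannotated images by cosine similarity to a keyword."""
    q = np.zeros(O.shape[0])
    q[keywords.index(keyword)] = 1.0
    q = project(q)
    scores = {name: cosine(q, project(v)) for name, v in unannotated.items()}
    return sorted(scores, key=scores.get, reverse=True)


print(search("sun"))
```

Neither photo was ever labelled, yet each can be ranked against "sun" or "train" because its visual terms co-occurred with those keywords in the training images.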
Demo
Annotation by
large-scale content-
based search
Goal
Use image analysis techniques combined with very large sets of
partially annotated images to build software that can automatically
annotate new images.
Motivation
• Sources such as Flickr can provide very large amounts of partially
annotated imagery.
–Some images are annotated with textual “tags”, some have
captions, others are geo-located (“geo-tagged”).
• Efficient indexing of the visual information should allow
unannotated images to be associated with annotated images, and the
annotations to be transferred.
–Advanced aggregation techniques could be applied to remove
noise and semantically enrich the annotations.
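The idea of transferring annotations can be sketched as a nearest-neighbour vote. This is a minimal illustration, assuming each image is already reduced to a feature vector; the vectors, tags, and the `k` and `min_votes` parameters are hypothetical, and simple vote counting stands in for the "advanced aggregation techniques" mentioned above.

```python
import math
from collections import Counter

# Hypothetical annotated collection: (feature vector, tags).
annotated = [
    ((0.9, 0.1), {"beach", "sun"}),
    ((0.8, 0.2), {"beach", "holiday"}),
    ((0.85, 0.15), {"beach", "sun", "me"}),
    ((0.1, 0.9), {"train", "station"}),
    ((0.2, 0.8), {"train"}),
]


def annotate(features, k=3, min_votes=2):
    """Transfer tags from the k visually nearest annotated images.

    Keeping only tags with at least min_votes votes is a simple
    aggregation step that filters out idiosyncratic tags.
    """
    nearest = sorted(annotated, key=lambda item: math.dist(features, item[0]))[:k]
    votes = Counter(tag for _, tags in nearest for tag in tags)
    return sorted(tag for tag, n in votes.items() if n >= min_votes)


print(annotate((0.88, 0.12)))
```

At the scale of a source such as Flickr, the linear scan in `annotate` would have to be replaced by an efficient visual index; the voting step stays the same.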
Demo
Analogies with Text
in Image Feature
Representations
Bags-of-visual words
In recent years it has become popular in the computer vision
community to model the content of an image in a similar way to
a “bag-of-terms” in textual document analysis.
Text: The quick brown fox jumped over the lazy dog!
–Tokenisation
–Stemming/Lemmatisation
–Count Occurrences
brown dog fox jumped lazy over quick the
[ 1 1 1 1 1 1 1 2 ]
Image:
–Local Feature Extraction
–Feature Quantization
–Count Occurrences
[ 1 2 0 0 6 ]
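The text side of the analogy can be reproduced in a few lines. This sketch tokenises the example sentence and counts occurrences; stemming is left out, since the slide's vocabulary keeps "jumped" unstemmed. The function name is an illustrative choice.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Tokenise text and count term occurrences (no stemming applied)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    vocabulary = sorted(counts)
    return vocabulary, [counts[term] for term in vocabulary]


vocabulary, vector = bag_of_words("The quick brown fox jumped over the lazy dog!")
print(vocabulary)
print(vector)
```

Word order is discarded; only the counts survive, which is exactly the property carried over to images.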
BoVW using local features
• Features localised by a
robust region detector and
described by a local
descriptor such as SIFT.
• A vocabulary of exemplar
feature-vectors is learnt.
– Traditionally through k-
means clustering.
• Local descriptors can then be
quantised to discrete visual
terms by finding the closest
exemplar in the vocabulary.
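The vocabulary learning and quantisation steps above can be sketched as follows. This is a toy version under stated assumptions: the descriptors are 2-D points rather than 128-dimensional SIFT vectors, the k-means is a plain Lloyd's iteration seeded with the first k vectors, and all function names are illustrative.

```python
import math
from collections import Counter


def nearest(vector, vocabulary):
    """Index of the exemplar closest to the vector (Euclidean distance)."""
    return min(range(len(vocabulary)), key=lambda i: math.dist(vector, vocabulary[i]))


def kmeans(vectors, k, iterations=10):
    """Plain Lloyd's k-means, seeded with the first k vectors for repeatability."""
    centroids = [tuple(v) for v in vectors[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[nearest(v, centroids)].append(v)
        # Recompute each centroid as the mean of its members; keep empty ones.
        centroids = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids


def bovw_histogram(descriptors, vocabulary):
    """Quantise descriptors to visual terms and count occurrences."""
    counts = Counter(nearest(d, vocabulary) for d in descriptors)
    return [counts[i] for i in range(len(vocabulary))]


# Toy 2-D "descriptors"; real SIFT descriptors are 128-dimensional.
training = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 9.0), (9.0, 10.0)]
vocabulary = kmeans(training, k=2)

# Descriptors from one new image, turned into a bag-of-visual-words vector.
histogram = bovw_histogram([(0.5, 0.5), (9.5, 9.5), (0.2, 0.1)], vocabulary)
print(histogram)
```

Practical vocabularies hold many thousands of visual terms, so both clustering and nearest-exemplar lookup normally rely on approximate methods rather than the exhaustive search used here.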
Next Steps
• Current research challenges being tackled:
–Scalable retrieval and duplicate detection (millions of
images)
–Better automatic annotators and classifiers
•Object/entity recognition
•Sentiment analysis
•Face recognition
–Improved image representations
–Extensions of techniques to integrate video and audio
analysis
• All with open-source software!
Thank You!
Any Questions?
