Spot the Dog
An overview of semantic retrieval of
unannotated images in the Semantic
Gap project
Semantic Image Retrieval - The User Perspective
Jonathon Hare
Intelligence, Agents, Multimedia Group
School of Electronics and Computer Science
University of Southampton
{jsh2}@ecs.soton.ac.uk
The previous talks have described the issues associated with
image retrieval from the practitioner perspective -- a
problem that has become known as the ‘semantic gap’ in
image retrieval.
This presentation aims to explore how novel computational and mathematical techniques can help improve content-based multimedia search by enabling textual search of unannotated imagery.
Introduction
Unannotated Imagery
Manually constructing metadata in order to index
images is expensive.
Perhaps US$1-$5 per image for simple keywording.
More for archival quality metadata (keywords,
caption, title, description, dates, times, events).
Every day, the number of images is increasing.
In many domains, manually indexing everything is
an impossible task!
Unannotated Imagery
An Example
Kennel Club image collection:
relatively small (~60,000 images)
~7000 of those digitised.
~3000 of those have subject metadata (mostly keywords); the remainder have little or no information.
Each year, after the Crufts dog show, they expect to receive additional (digital) images [of the order of a few thousand] with little, if any, metadata other than date/time (and only then if the camera is set up correctly).
An Overview of Our Approach
Conceptually simple idea: teach a machine to learn the relationship between the visual features of images and the metadata that describes them.
So, two stages:
Use exemplar image/metadata pairs to learn
relationships.
Project learnt relationships to images without
metadata in order to make them searchable.
Modelling Visual Information
In order to model the visual content of an image we can
generate and extract descriptors or feature-vectors.
Feature-vectors can describe many differing aspects of the
image content.
Low-level features:
Fourier transforms, wavelet decomposition, texture
histograms, colour histograms, shape primitives, filter
primitives, etc.
Higher-level features:
Faces, objects, etc.
Visual Term Representations
A modern approach to modelling the content of an
image is to treat it like a textual document.
Model image as a collection of “visual terms”.
Synonymous with words in a text document.
Feature-vectors can be transformed into visual terms
through some mapping.
Visual Term Representations
Bag-of-Terms
For indexing purposes, we often discount order/arrangement
of terms and just count number of occurrences.
Example: "The quick brown fox jumped over the lazy dog"
Terms:   brown  dog  fox  jumped  lazy  over  quick  the
Counts:    1     1    1     1       1     1      1     2
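For concreteness, here is a minimal Python sketch of the bag-of-terms idea; the `bag_of_terms` helper and the toy visual-term indices are purely illustrative, and exactly the same counting applies to visual terms as to words.

```python
from collections import Counter

def bag_of_terms(terms):
    """Count occurrences of each term, discarding order/arrangement."""
    return Counter(terms)

sentence = "the quick brown fox jumped over the lazy dog".split()
print(bag_of_terms(sentence))
# Counter({'the': 2, 'quick': 1, 'brown': 1, 'fox': 1,
#          'jumped': 1, 'over': 1, 'lazy': 1, 'dog': 1})

# The same idea works for visual terms: each image becomes a vector of
# visual-term counts rather than word counts.
visual_terms = [3, 17, 3, 42, 17, 3]   # term indices from a visual vocabulary
print(bag_of_terms(visual_terms))       # Counter({3: 3, 17: 2, 42: 1})
```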
Visual Term Representations
Example: Global Colour Visual Terms
A common way of indexing the global colours used in an
image is the colour histogram.
Each bin of the histogram counts the number of pixels in the colour range represented by that bin.
The colour histogram can thus be used directly as a term
occurrence vector in which each bin is represented as a visual
term.
[Figure: a sample image and its 16-bin colour histogram, with bin counts 1569, 3408, 491, 0, 0, 902, 2146, 5026, 0, 0, 56, 3633, 0, 0, 0, 6827]
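A minimal sketch of this in Python/NumPy; the `colour_histogram_terms` helper, the choice of 4 bins per channel and the random stand-in image are illustrative assumptions rather than the exact quantisation used in the project.

```python
import numpy as np

def colour_histogram_terms(image, bins_per_channel=4):
    """Quantise an RGB image (H x W x 3, uint8) into a joint colour
    histogram; each bin acts as one visual term and the bin count is the
    term occurrence count."""
    # Map each 0-255 channel value to a bin index 0..bins_per_channel-1
    quantised = (image.astype(np.int64) * bins_per_channel) // 256
    # Combine the three channel indices into a single joint bin index
    bin_index = (quantised[..., 0] * bins_per_channel + quantised[..., 1]) \
                * bins_per_channel + quantised[..., 2]
    n_bins = bins_per_channel ** 3
    return np.bincount(bin_index.ravel(), minlength=n_bins)

# Usage: a random 100x100 "image"; in practice load a real image instead.
image = np.random.randint(0, 256, size=(100, 100, 3), dtype=np.uint8)
histogram = colour_histogram_terms(image, bins_per_channel=4)
print(histogram.shape)   # (64,) -- a 64-term occurrence vector
print(histogram.sum())   # 10000 -- one count per pixel
```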
Visual Term Representations
Example: Local interest-point based visual terms
Features based on Lowe’s
difference-of-Gaussian
region detector and SIFT
feature vector.
A vocabulary of exemplar
feature-vectors is learnt by
applying k-means clustering
to a training set of features.
Feature-vectors can then be
quantised to discrete visual
terms by finding the closest
exemplar in the vocabulary.
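A hedged sketch of the vocabulary-learning and quantisation steps, assuming the 128-dimensional SIFT descriptors have already been extracted, and using scikit-learn's KMeans purely as a stand-in for whatever clustering implementation is actually used; the vocabulary size and random data are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for SIFT descriptors extracted from a training set of images.
training_descriptors = np.random.rand(10000, 128)

# Learn a "visual vocabulary": k exemplar descriptors found by k-means.
vocabulary_size = 300
kmeans = KMeans(n_clusters=vocabulary_size, n_init=10, random_state=0)
kmeans.fit(training_descriptors)

# Quantise the descriptors of a new image to discrete visual terms by
# assigning each one to its closest exemplar (cluster centre).
new_image_descriptors = np.random.rand(250, 128)
visual_terms = kmeans.predict(new_image_descriptors)

# Bag-of-visual-terms representation: a vocabulary_size-long count vector.
term_counts = np.bincount(visual_terms, minlength=vocabulary_size)
print(term_counts.shape)  # (300,)
```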
Semantic Spaces
Basic idea: Create a large multidimensional space in which
images, keywords (or other metadata) and visual terms can
be placed.
In the training stage learn how keywords are related to
visual terms and images.
Place related visual terms, images and keywords close together within the space.
In the projection stage unannotated images can be placed in
the space based upon the visual terms they contain.
The placement should be such that they lie near
keywords that describe them.
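One way to realise this idea, sketched below in the spirit of latent semantic analysis, is to stack visual-term counts and keyword counts for the training images into a single observation matrix and take a truncated SVD. This is only an illustrative assumption about the factorisation, with random stand-in data and arbitrary dimensions; it is not the project's exact construction.

```python
import numpy as np

n_train, n_visual_terms, n_keywords, k = 50, 64, 10, 8

visual_counts  = np.random.rand(n_train, n_visual_terms)     # stand-in data
keyword_counts = (np.random.rand(n_train, n_keywords) > 0.7)  # stand-in data

# One long "document" vector per training image: visual terms + keywords.
observations = np.hstack([visual_counts, keyword_counts.astype(float)])

# Truncated SVD: rows of U*S place training images in the k-dim space,
# rows of Vt.T place visual terms and keywords in the same space.
U, S, Vt = np.linalg.svd(observations, full_matrices=False)
image_positions   = U[:, :k] * S[:k]
term_positions    = Vt[:k, :n_visual_terms].T   # visual-term positions
keyword_positions = Vt[:k, n_visual_terms:].T   # keyword positions

# Projection stage: an unannotated image has only visual-term counts; it
# is folded into the space using the visual-term part of the basis, so it
# should land near the keywords that describe it.
new_visual_counts = np.random.rand(n_visual_terms)
new_image_position = new_visual_counts @ Vt[:k, :n_visual_terms].T
print(new_image_position.shape)  # (k,)
```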
Semantic Spaces
Conceptual Overview
[Figures: conceptual overview of images, keywords and visual terms placed together in the semantic space]
Semantic Spaces
Uses of the space
Once constructed, the semantic space has a number of uses (the first three are sketched in code after this list):
Finding images (both annotated and unannotated) by
keyword(s)/metadata.
Finding images (both annotated and unannotated) by
semantically similar images.
Determining likely metadata for an image.
Examining keyword-keyword and keyword-visual term
relationships.
Segmenting an image.
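Continuing the illustrative sketch above (reusing `image_positions`, `keyword_positions` and `new_image_position`), the first three uses reduce to ranking by similarity in the space; the index constants and the choice of cosine similarity are assumptions for illustration only.

```python
import numpy as np

# SUN and QUERY_IMAGE are illustrative index constants, not identifiers
# from the real system.
SUN, QUERY_IMAGE = 0, 0

def cosine_similarities(query, positions):
    """Cosine similarity between a query vector and each row of positions."""
    query = query / (np.linalg.norm(query) + 1e-12)
    norms = np.linalg.norm(positions, axis=1) + 1e-12
    return (positions @ query) / norms

# 1. Keyword search: rank all images (annotated or not) by how close they
#    lie to the chosen keyword's position in the space.
ranked_images = np.argsort(
    cosine_similarities(keyword_positions[SUN], image_positions))[::-1]

# 2. Query-by-example: rank images by closeness to another image.
similar_images = np.argsort(
    cosine_similarities(image_positions[QUERY_IMAGE], image_positions))[::-1]

# 3. Keyword suggestion: rank keywords by closeness to an unannotated
#    image's projected position.
suggested_keywords = np.argsort(
    cosine_similarities(new_image_position, keyword_positions))[::-1]
```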
Semantic Spaces
Searching by Keyword
[Figures: the keywords SUN and TRAIN and a set of images placed in the space; a search for images about "SUN" returns a ranked list of the images lying closest to the SUN keyword]
Semantic Spaces
Searching by Image
[Figures: "search for images like this" places the query image in the space and returns a ranked list of the images lying closest to it]
Semantic Spaces
Suggesting Keywords
[Figures: the keywords SUN, SKY, MOUNTAIN, TREE and CAR placed in the space; keywords are suggested for an unannotated image by ranking them by closeness to the image's projected position, yielding a ranked list of suggested keywords]
Semantic Spaces
Experimental Retrieval Results - Corel Dataset
Colour Histograms used as visual terms (each bin
representing a single term).
Standard experimental collection: 500 test images, 4500
training images.
Results are quite impressive, roughly comparable with the Machine Translation auto-annotation technique (but remember we are using much simpler image features).
Works well for query keywords that are easily associated
with a particular set of colours,
but not so well for the other keywords.
Semantic Spaces
Experimental Retrieval Results - Corel Dataset
Top 15 images when querying for ‘sun’
Semantic Spaces
Experimental Retrieval Results - Corel Dataset
Top 15 images when querying for ‘horse’
Semantic Spaces
Experimental Retrieval Results - Corel Dataset
Top 15 images when querying for ‘foals’
Demo
The K9 Retrieval System
We have built a demonstration system around the semantic
space idea and applied it to images from the Kennel Club
picture library (>7000 images, ∼3000 with keywords).
The system allows annotated images to be retrieved by
keywords and concepts (keywords with thesaurus expansion).
Both annotated and unannotated images can also be retrieved using the semantic space and regular content-based techniques.
This brief demo will concentrate on retrieval of annotated
images using keyword matching, and unannotated images
using the semantic space.
Conclusions
Semantic retrieval of unannotated images is hard!
Our semantic space approach takes us some of the
way, but there is still a long way to go.
Retrieval is limited by the choice of visual
features, and how well those features relate to
the keywords.
Questions?
