Semantic Retrieval and Automatic Annotation:
Linear Transformations, Correlation and Semantic Spaces
Jonathon Hare & Paul Lewis
School of Electronics and Computer Science
University of Southampton
Introduction and Motivation
• Introduce a new, simple linear-transform-based
annotation/retrieval technique
• Compare against a number of similar existing
techniques for automatic annotation & semantic
retrieval that:
• Represent images by a fixed-length histogram (of
visual-term occurrences)
• Optionally use SVD for noise reduction
• Are deterministic (no randomness)
• Are (relatively) computationally efficient
• Reflect on real-world performance
Singular Value Decomposition
• SVD can be used to filter noise by producing a rank-k
estimate of the original data matrix
• The rank-k estimate is optimal in the least-squares
sense
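
A minimal sketch of this rank-k estimate in Python with numpy (the function name and the rank used in the comment are illustrative, not from the slides):

import numpy as np

def rank_k_estimate(X, k):
    """Best rank-k approximation of X in the least-squares (Frobenius) sense."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Example: denoise a 500 x 4000 term-by-image matrix, keeping (say) 100 singular values
# F_hat = rank_k_estimate(F, 100)
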
Nomenclature
• F is a visual-term occurrence matrix
(columns represent images, rows visual-
terms)
• W is a keyword occurrence matrix
(columns represent images, rows keywords)
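
As an illustration of this nomenclature, a minimal sketch of how F and W could be assembled, assuming each image is described by a list of term indices (that input format is an assumption, not from the slides):

import numpy as np

def occurrence_matrix(term_lists, vocab_size):
    """One row per term, one column per image; entries count term occurrences."""
    M = np.zeros((vocab_size, len(term_lists)))
    for j, terms in enumerate(term_lists):
        for t in terms:
            M[t, j] += 1
    return M

# F = occurrence_matrix(visual_term_lists, 500)          # visual terms x images
# W = occurrence_matrix(keyword_lists, vocabulary_size)  # keywords x images
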
Technique: linear transform
Assume that visual-term occurrence vectors can be
related to keyword occurrence vectors by a simple
linear transformation, T.
FT = W
Given a training set with known F and W, T can be estimated
using the pseudo-inverse (computed via the SVD, which allows
noise reduction); the unknown W* for unannotated images can
then be calculated from their F* and T.
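
A minimal numpy sketch of this technique. Note that the nomenclature slide puts images in columns, so for FT = W to be dimensionally consistent this sketch assumes rows index images (i.e. it works with the transposes of F and W); the retained rank k would be tuned on a validation set:

import numpy as np

def estimate_transform(F, W, k):
    """Estimate T such that F @ T ~= W via a rank-k (truncated-SVD) pseudo-inverse.

    F: (n_images, n_visual_terms) training visual-term occurrences (images in rows)
    W: (n_images, n_keywords)     training keyword occurrences
    """
    U, s, Vt = np.linalg.svd(F, full_matrices=False)
    F_pinv = Vt[:k].T @ np.diag(1.0 / s[:k]) @ U[:, :k].T  # truncated pseudo-inverse
    return F_pinv @ W

def annotate(F_new, T):
    """Predicted keyword scores W* = F* T for unannotated images."""
    return F_new @ T
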
Technique: Semantic Spaces
• Based around the factorisation [F; W] = TD, where [F; W] is
the visual-term matrix stacked on top of the keyword matrix
• Calculated using truncated SVD
• Rows of T represent coordinates of the features
and words in a vector space
• Columns of D represent coordinates of images in
the same space
• Similar objects have similar locations in the space, so it
is possible to rank images by their distance to a given word
Hare, J. S., Lewis, P. H., Enser, P. G. B., and Sandom, C. J., “A Linear-Algebraic Technique with an
Application in Semantic Image Retrieval,” in CIVR 2006, Sundaram, H., Naphade, M., Smith, J. R., and
Rui, Y., eds., LNCS 4071, 31–40, Springer (2006).
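
A minimal sketch of the semantic-space construction using the nomenclature above (images in columns). How the singular values are split between T and D follows a common LSA convention and is an assumption here, as is the use of cosine similarity rather than a distance for the ranking:

import numpy as np

def build_semantic_space(F, W, k):
    """Truncated SVD of the stacked matrix [F; W] ~= T D.

    Rows of T locate the visual terms (first) and keywords (after them) in the
    k-dimensional space; columns of D locate the training images.
    """
    O = np.vstack([F, W])
    U, s, Vt = np.linalg.svd(O, full_matrices=False)
    T = U[:, :k]
    D = np.diag(s[:k]) @ Vt[:k, :]
    return T, D

def rank_images_for_word(T, D, word_index, n_visual_terms):
    """Rank the image columns of D by cosine similarity to a keyword's row of T."""
    w = T[n_visual_terms + word_index]
    sims = (w @ D) / (np.linalg.norm(w) * np.linalg.norm(D, axis=0) + 1e-12)
    return np.argsort(-sims)
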
Technique: Correlation
• Pan et al. defined four techniques for building
translation tables between visual terms and keywords
[i.e. the elements of the table/matrix represent
p(w_i, f_j)].
• The Corr method used the co-occurrence product WFᵀ to
build the table
• The Cos method used the cosine similarity between the
occurrence vectors of w_i and f_j
• The SVDCorr and SVDCos methods filtered the Corr and
Cos tables by reducing their rank with the SVD
Pan, J.-Y., Yang, H.-J., Duygulu, P., and Faloutsos, C., “Automatic image captioning,” IEEE International Conference
on Multimedia and Expo 2004 (ICME ’04), Vol. 3 (27-30 June 2004).
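
A minimal sketch of the four tables under the nomenclature above; the exact normalisation used by Pan et al. may differ, so treat this as an approximation rather than their definition:

import numpy as np

def translation_tables(F, W, k):
    """Keyword-by-visual-term translation tables (Corr, Cos, SVDCorr, SVDCos).

    F: (n_visual_terms, n_images), W: (n_keywords, n_images).
    """
    corr = W @ F.T                                           # co-occurrence counts
    cos = corr / (np.linalg.norm(W, axis=1, keepdims=True)   # cosine of w_i and f_j vectors
                  * np.linalg.norm(F, axis=1, keepdims=True).T + 1e-12)

    def rank_k(X):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    return {"Corr": corr, "Cos": cos, "SVDCorr": rank_k(corr), "SVDCos": rank_k(cos)}
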
Technique Summary
Technique          Variables                                    Notes
Transform          feature-weighting, dimensionality reduction  Words independent
Corr, Cos          feature-weighting                            Words independent
SVDCorr, SVDCos    feature-weighting, dimensionality reduction  Words independent
Semantic Space     feature-weighting, dimensionality reduction  Inter-word dependencies
Image Features
• Two types of visual-term feature considered:
• Segmented-blob based (using shape, colour, texture
descriptors) [500 terms]
• Quantised DCT-based [500 terms]
Experimental Protocol
• 5000-image Corel data-set
• 4000 training images
• 500 validation images (for optimising reduced rank)
• 500 test images
• Two weighting types: unweighted and IDF
• Evaluation performed as a hypothetical retrieval
experiment
• Unannotated test images retrieved using each word in
turn as a query
• Mean average precision (mAP) used for comparison
(evaluation loop sketched below)
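
A minimal sketch of the retrieval-style evaluation (argument names and the array layout are assumptions; the scores could come from any of the techniques above):

import numpy as np

def average_precision(ranked_relevance):
    """AP for one keyword query, given relevance of the test images in ranked order."""
    rel = np.asarray(ranked_relevance, dtype=float)
    if rel.sum() == 0:
        return 0.0
    precision_at_hits = np.cumsum(rel) / (np.arange(len(rel)) + 1)
    return float((precision_at_hits * rel).sum() / rel.sum())

def mean_average_precision(scores, ground_truth):
    """mAP over all keyword queries.

    scores:       (n_keywords, n_test_images) predicted annotation strengths
    ground_truth: (n_keywords, n_test_images) true annotations (0/1)
    """
    aps = []
    for s, gt in zip(scores, ground_truth):
        order = np.argsort(-s)            # rank test images for this keyword
        aps.append(average_precision(gt[order]))
    return float(np.mean(aps))
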
Results
Real-world performance
• ~20% mAP might sound low, but in practice many
queries work quite well (reasonable initial precision,
though it drops off quickly)
• Choice of image features is very important
• It would be difficult to learn the concept of “sun”
from grey-level SIFT features!
• See the paper for some more reflection on real-world
performance...
Conclusions
• We have described a set of auto-annotation/semantic
retrieval algorithms
• Performance is less than the state-of-the-art, but this is
partially explained by the use of different image
features (see our MIR 2010 paper)
• However, the methods:
• Are computationally inexpensive (although cost grows
with the amount of training data)
• Are deterministic, and don’t rely on algorithms such
as EM which might get stuck in local minima/maxima
