1. IMAGE RETRIEVAL: CONTENT VERSUS CONTEXT
Thijs Westerveld
Teoria e Tecnologia della Comunicazione
Sistemi Informativi Multimediali AA’11-’12
Angelo Oldani
744818
2. INTRODUCTION
These slides present the paper “Image retrieval: content versus context” by Thijs Westerveld.
The paper presents a “new” approach to image retrieval that takes the best from two worlds.
It combines:
• image features (content)
• collateral text (context)
4. CONTEXT-BASED IMAGE RETRIEVAL
Context-based retrieval may rely on two sources of text:
• Annotations that are manually added.
• Collateral text available with an image.
The similarity between images is then based on the
similarity between the associated texts.
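As a rough illustration of text-based similarity, the sketch below compares captions as bags of words using cosine similarity. The captions are invented for this example, and cosine similarity over raw word counts is an assumed measure, not necessarily the one used in the paper.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count whitespace-separated tokens."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical captions, invented for illustration only.
caption_1 = bag_of_words("queen visits flooded village")
caption_2 = bag_of_words("queen opens new bridge")
caption_3 = bag_of_words("football team wins cup")

print(cosine_similarity(caption_1, caption_2))  # the captions share the term "queen"
print(cosine_similarity(caption_1, caption_3))  # the captions share no terms
```

Two images whose captions share no words get a similarity of zero here, even if they show the same subject.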
5. CONTEXT-BASED IMAGE RETRIEVAL: PROBLEMS
• Synonymy
Different words are used to describe the same subject in different documents.
• Ambiguity
The same words can describe different subjects.
6. CONTENT-BASED IMAGE RETRIEVAL
Returns the images that are visually most similar to the query.
Similarity is based on a set of low-level image features such as:
• colour
• shape
• texture
• …
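A minimal sketch of feature-based similarity, assuming each image is already described by a colour histogram. Histogram intersection is used here as one common choice of measure; the slides do not say which measure the paper uses, and the 4-bin histograms are invented toy data.

```python
from typing import Dict, List, Sequence


def normalise(histogram: Sequence[float]) -> List[float]:
    """Scale a histogram so that its bins sum to 1 (all-zero input stays zero)."""
    total = sum(histogram)
    if total == 0:
        return [0.0 for _ in histogram]
    return [value / total for value in histogram]


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Overlap of two normalised histograms: 1.0 means identical, 0.0 means disjoint."""
    return sum(min(a, b) for a, b in zip(normalise(h1), normalise(h2)))


def rank_by_similarity(query: Sequence[float],
                       collection: Dict[str, Sequence[float]]) -> List[str]:
    """Return image ids ordered from most to least similar to the query."""
    return sorted(collection,
                  key=lambda image_id: histogram_intersection(query, collection[image_id]),
                  reverse=True)


# Toy 4-bin colour histograms (hypothetical pixel counts per bin).
query_histogram = [8, 2, 0, 0]
collection = {
    "sunset": [6, 4, 0, 0],
    "forest": [0, 0, 5, 5],
    "beach": [3, 3, 2, 2],
}

print(rank_by_similarity(query_histogram, collection))
```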
8. LATENT SEMANTIC INDEXING (LSI)
LSI is a method that uses co-occurrence statistics of
terms to find the semantics behind a document’s terms.
Documents using similar terms are probably related.
Example terms: RESERVATION, DOUBLE ROOM, SHOWER, BREAKFAST
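The idea can be sketched with a tiny term-document matrix and a truncated SVD. The five documents below are invented, NumPy is assumed to be available, and keeping k = 2 dimensions is an arbitrary choice for this toy example. Documents 0 and 2 share no terms, but both co-occur with terms of document 1, so they should end up close together in the reduced space.

```python
import numpy as np

terms = ["reservation", "double room", "shower", "breakfast", "goal", "match"]

# Rows are terms, columns are five hypothetical documents.
term_doc = np.array([
    [1, 0, 0, 0, 0],  # reservation
    [1, 1, 0, 0, 0],  # double room
    [0, 1, 1, 0, 0],  # shower
    [0, 0, 1, 0, 0],  # breakfast
    [0, 0, 0, 1, 1],  # goal
    [0, 0, 0, 1, 1],  # match
], dtype=float)


def lsi_document_vectors(matrix: np.ndarray, k: int) -> np.ndarray:
    """Project documents onto the k strongest latent dimensions (one row per document)."""
    _, singular_values, vt = np.linalg.svd(matrix, full_matrices=False)
    return (np.diag(singular_values[:k]) @ vt[:k]).T


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denominator = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denominator) if denominator > 0 else 0.0


doc_vectors = lsi_document_vectors(term_doc, k=2)

# Raw term space: documents 0 and 2 have no term in common.
print(cosine(term_doc[:, 0], term_doc[:, 2]))
# Latent space: the same two documents are expected to be close to each other.
print(cosine(doc_vectors[0], doc_vectors[2]))
# Latent space: a hotel document against a football document.
print(cosine(doc_vectors[0], doc_vectors[3]))
```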
9. LATENT SEMANTIC INDEXING (LSI)
According to the paper, no one had previously combined text and images in the same semantic space using LSI.
The approach: list the terms from both modalities in one term-document matrix, then apply the singular value decomposition (SVD). The result is a semantic space that contains both visual and textual items.
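A sketch of how terms from both modalities could be placed in one term-document matrix before the SVD is applied. The documents, the visual term names and the `txt:` / `img:` prefixes are all hypothetical; the prefixes only keep the two vocabularies from colliding.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical documents: caption words plus already-discretised visual terms.
documents = [
    {"text": ["queen", "visit"], "image": ["hue3_sat1_val2", "gabor_w1_o0_b5"]},
    {"text": ["flood", "village"], "image": ["hue11_sat0_val1", "gabor_w1_o0_b5"]},
]


def combined_terms(document: Dict[str, List[str]]) -> List[str]:
    """Merge textual and visual terms of one document into a single term list."""
    textual = [f"txt:{term}" for term in document["text"]]
    visual = [f"img:{term}" for term in document["image"]]
    return textual + visual


def term_document_matrix(docs: List[Dict[str, List[str]]]) -> Tuple[List[str], List[List[int]]]:
    """Return the shared vocabulary and a matrix with one row per term, one column per document."""
    counts = [Counter(combined_terms(doc)) for doc in docs]
    vocabulary = sorted(set().union(*counts))
    matrix = [[count[term] for count in counts] for term in vocabulary]
    return vocabulary, matrix


vocabulary, matrix = term_document_matrix(documents)
for term, row in zip(vocabulary, matrix):
    print(f"{term:24s} {row}")
```

The resulting matrix has textual and visual rows side by side, so a decomposition of it places both kinds of terms in the same space.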
10. LATENT SEMANTIC INDEXING (LSI): CALCULATING IMAGE TERMS
To use LSI on image content, it is necessary to define a set of discrete image features that has the same distribution as the set of textual terms.
Two options are considered:
• A set of image terms that is as sparse as the set of textual terms.
• A set of image terms that is the same size as the set of textual terms.
11. FEATURE EXTRACTION
Feature extraction should produce the indexing terms from the documents.
TEXTUAL TERMS
• Image captions
IMAGE FEATURES
• Colours
• Textures
12. SPARSE SET OF IMAGE TERMS
COLOUR FEATURES
The HSV colour space was divided into 18 hues, 3 saturations and 3 values, and two sets of features were extracted:
• Histogram for the whole image.
• Binary value of the most frequent colour for each block.
TEXTURE FEATURES
Gabor filters were applied at 3 different wavelengths and 4 orientations, and the average energy was extracted for each combination of wavelength and orientation. The average energy values are quantised into 128 bands, and values that fall within the lower 16 bands are disregarded.
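The colour features can be sketched as follows, assuming pixels are RGB triples in the range 0 to 1 and using the standard-library `colorsys` module for the HSV conversion. The bin numbering, the term names and the way pixels are grouped into blocks are assumptions of this sketch; the slides do not specify them.

```python
import colorsys
from collections import Counter
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]  # (r, g, b), each in [0, 1]

HUES, SATURATIONS, VALUES = 18, 3, 3  # 18 * 3 * 3 = 162 colour bins


def colour_bin(r: float, g: float, b: float) -> int:
    """Map an RGB pixel to one of the 162 quantised HSV bins."""
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    hue_index = min(int(h * HUES), HUES - 1)
    saturation_index = min(int(s * SATURATIONS), SATURATIONS - 1)
    value_index = min(int(v * VALUES), VALUES - 1)
    return (hue_index * SATURATIONS + saturation_index) * VALUES + value_index


def colour_histogram(pixels: Sequence[Pixel]) -> List[int]:
    """Count how many pixels fall into each colour bin (whole image or one block)."""
    histogram = [0] * (HUES * SATURATIONS * VALUES)
    for pixel in pixels:
        histogram[colour_bin(*pixel)] += 1
    return histogram


def dominant_colour_term(block_pixels: Sequence[Pixel], block_id: int) -> str:
    """Binary term naming the most frequent colour bin of one block (hypothetical naming)."""
    most_common_bin, _ = Counter(colour_bin(*p) for p in block_pixels).most_common(1)[0]
    return f"block{block_id}_colour{most_common_bin}"


red, white, black = (1.0, 0.0, 0.0), (1.0, 1.0, 1.0), (0.0, 0.0, 0.0)
pixels = [red, red, white, black]

print(colour_bin(*red))
print(dominant_colour_term(pixels, block_id=0))
```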
13. SPARSE SET OF IMAGE TERMS
TERM FREQUENCIES
              Tot. #terms   Avg. #terms/doc   Ratio
Text          4283          27                158:1
Image         37752         625               63:1
Combination   42035         598               70:1
14. SMALL SET OF IMAGE TERMS
COLOUR FEATURES
The HSV colour space was divided into 18 hues, 3 saturations and 3 values, and two sets of features were extracted:
• Histogram for each block.
• Histogram for the whole image.
TEXTURE FEATURES
Gabor filters were applied at 3 different wavelengths and 4 orientations, and the average energy was extracted for each combination of wavelength and orientation. The average energy values are quantised into 10 bands, and values that fall within the lower 2 bands are disregarded.
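The quantisation step can be sketched as below. It assumes the average Gabor energies have already been computed and normalised to the range 0 to 1, which the slides do not state; the term naming is also hypothetical. The same function covers the 10-band setting described here and a 128-band setting with 16 dropped bands.

```python
from typing import Dict, List, Tuple

# Key: (wavelength index, orientation index); value: average energy in [0, 1].
Energies = Dict[Tuple[int, int], float]


def texture_terms(energies: Energies, n_bands: int, dropped_bands: int) -> List[str]:
    """Turn average energies into discrete terms, skipping the lowest bands."""
    terms = []
    for (wavelength, orientation), energy in sorted(energies.items()):
        band = min(int(energy * n_bands), n_bands - 1)
        if band < dropped_bands:
            continue  # low-energy responses are disregarded
        terms.append(f"gabor_w{wavelength}_o{orientation}_band{band}")
    return terms


# Hypothetical energies for 3 of the 12 wavelength/orientation combinations.
example_energies = {(0, 0): 0.05, (0, 1): 0.55, (2, 3): 0.95}

print(texture_terms(example_energies, n_bands=10, dropped_bands=2))
print(texture_terms(example_energies, n_bands=128, dropped_bands=16))
```

With fewer bands, more images share the same texture term, which is one way to obtain a smaller and denser vocabulary.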
15. SMALL SET OF IMAGE TERMS
TERM FREQUENCIES
              Tot. #terms   Avg. #terms/doc   Ratio
Text          4283          27                158:1
Image         4442          1131              4:1
Combination   8725          1158              8:1
16. EXPERIMENT
3379 images from the Reformatorisch Dagblad online archive, together with their captions.
A set of 20 documents used as queries
3 indexes (LSI indexing):
• Visual terms
• Textual terms
• Visual and textual terms combined
Top 100 returned documents
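Retrieving the top-ranked documents for a query can be sketched as follows, assuming every document and the query are already represented as vectors in the same space. Ranking by cosine similarity is an assumption of this sketch, and the 2-dimensional vectors and document ids are invented.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k(query: Sequence[float], index: Dict[str, Sequence[float]], k: int = 100) -> List[str]:
    """Return the ids of the k documents most similar to the query, best first."""
    scored = [(cosine(query, vector), doc_id) for doc_id, vector in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical index with three documents in a 2-dimensional space.
index = {
    "img_a": [1.0, 0.0],
    "img_b": [0.8, 0.6],
    "img_c": [0.0, 1.0],
}

print(top_k([1.0, 0.1], index, k=2))
```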
17. EXPERIMENT RESULTS
• The small set of image features seems to perform somewhat better than the sparse set.
• The combined approach for this set of features
outperforms both the image and the text approach for
queries with many relevant documents in the data set.
18. DISCUSSION
Latent Semantic Indexing can help bridge the semantic gap.
LIMITATIONS
• The research is based on a very small set of images.
• Text is not available with every image.