MultiModal Retrieval Image

MMRetrieval.net
A Multimodal Search Engine

Multimodal Information
 Single-language, text-only retrieval is reaching its limits.
 Content-based Image Retrieval is computationally
costly and still in its infancy.
 Digital Information is increasingly becoming
multimodal
 Example: Wikipedia

Modality
 Dictionary: A tendency to conform to a general
pattern or belong to a particular group or
category.
 Definition of Modality in Information Retrieval
 It is unclear, fuzzy
 1st Definition: Modality = Media
 2nd Definition: Modality = Data Stream

MMRetrieval.net
 A Product of Cooperation
 Started June, 2010
 Avi Arampatzis, Lecturer D.U.T.H.
 Konstantinos Zagoris, Ph.D., D.U.T.H.
 Savvas A. Chatzichristofis, Ph.D. candidate, D.U.T.H.

ImageCLEF 2010
Wikipedia Retrieval Task
 ImageCLEF 2010 Wikipedia Collection
 Consisting of 237434 items
 Images are the primary medium
 Noisy and Incomplete User Supplied Textual
Annotations
 Wikipedia Articles Containing the Images
 Written in any combination of English, German,
French, or any other unidentified language

Wikipedia Collection
<image id="244845" file="images/25/244845.jpg">
<name>Balloons Festival - Chateaux d'Oex.jpg</name>
<text xml:lang="en">
<description/>
<comment/>
<caption article="text/en/4/331622">Balloon
festival </caption>
</text>
<text xml:lang="de">
<description/>
<comment/>
<caption/>
</text>
<text xml:lang="fr">
<description/>
<comment/>
<caption/>
</text>
<comment>(Balloon festival in Chateaux d'Oex.
Category:Chateau d'Oex Category:Hot air balloons)
</comment>
<license>GFDL</license>
</image>
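
As a rough illustration of how the per-language annotation fields in such a record could be read, here is a minimal Python sketch using the standard library. The function name `extract_text_fields` and the abbreviated `SAMPLE` record are assumptions made for this example; the slides do not describe the actual parsing code.

```python
import xml.etree.ElementTree as ET

# The "xml:" prefix is bound to this namespace by the XML specification,
# so ElementTree exposes xml:lang under this qualified attribute name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Abbreviated copy of the sample record (illustrative only).
SAMPLE = """<image id="244845" file="images/25/244845.jpg">
<name>Balloons Festival - Chateaux d'Oex.jpg</name>
<text xml:lang="en">
<description/>
<comment/>
<caption article="text/en/4/331622">Balloon
festival </caption>
</text>
<text xml:lang="de">
<description/>
<comment/>
<caption/>
</text>
<license>GFDL</license>
</image>"""


def extract_text_fields(xml_string):
    """Return {language: {field: text}} for the non-empty annotation fields."""
    root = ET.fromstring(xml_string)
    fields = {}
    for text_node in root.findall("text"):
        lang = text_node.get(XML_LANG)
        per_lang = {}
        for child in text_node:
            # Collapse whitespace and skip empty elements such as <caption/>.
            value = " ".join((child.text or "").split())
            if value:
                per_lang[child.tag] = value
        fields[lang] = per_lang
    return fields


record = extract_text_fields(SAMPLE)
print(record)
```

Empty elements are dropped, which makes the sparsity of the annotations visible: in this record only the English caption carries any text.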

ImageCLEF 2010
Wikipedia Retrieval Task
 70 test topics
 consisting of a textual and a visual part
 three title fields (one per language—English,
German, French)
 one or more example images

Wikipedia Topic
<topic>
<number>8</number>
<title xml:lang="en">tennis player on court</title>
<title xml:lang="de">tennisspieler auf dem platz</title>
<title xml:lang="fr">joueur de tennis sur le terrain</title>
<image>2197587684_94542c6fbd.jpg</image>
<image>777629689_443a25ba08.jpg</image>
</topic>

Extraction of Modalities
 Image descriptors: Joint Composite Descriptor (JCD),
Spatial Color Distribution (SpCD)
 Text fields: description, comment, caption, article, name
 Languages: English, French, German
 Text indexed with Lemur Toolkit V4.11 and Indri V2.11, using
the tf.idf retrieval model

Fusion in Information Retrieval
 combining evidence about relevance from
different sources of information
 from several modalities
 fusion consists of two components
 score normalization
 score combination

Score Normalization
 the relevance scores are not comparable
 popular text retrieval models (tf.idf) can be turned to
probabilities of relevance via the score-distributional
method
 image descriptor scores do not fit this method
 MinMax (maps scores linearly to [0,1])
 Zscore (maps to the number of standard deviations it
lies above or below the mean score)
 non-linear Known-Item Aggregate Cumulative Density
Function (KIACDF)
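
The two linear normalizations can be sketched in a few lines of Python. This is a minimal illustration of the standard MinMax and Zscore definitions, not the system's code; the function names and the handling of constant score lists are assumptions. KIACDF is not shown because the slides do not define it.

```python
from statistics import mean, pstdev


def min_max(scores):
    """Map scores linearly to [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Assumption: a constant score list carries no ranking information.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def z_score(scores):
    """Express each score as standard deviations above or below the mean."""
    mu = mean(scores)
    sigma = pstdev(scores)  # population standard deviation
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


print(min_max([2.0, 4.0, 6.0]))
print(z_score([2.0, 4.0, 6.0]))
```

MinMax is bounded but sensitive to outliers at either end of the list; Zscore is unbounded and can produce negative values, which matters for multiplicative combination.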

Score Combination
 CompSUM
 CompMULT
 CompMAX
 CompMED
 CompWSUM
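
A minimal Python sketch of these combination rules follows, assuming the names denote the sum, product, maximum, median, and weighted sum of an item's normalized per-modality scores. The function `combine` and its `method` keys are hypothetical names for this example.

```python
from math import prod
from statistics import median


def combine(scores, method="sum", weights=None):
    """Combine one item's normalized scores across modalities."""
    if method == "sum":
        return sum(scores)
    if method == "mult":
        return prod(scores)
    if method == "max":
        return max(scores)
    if method == "med":
        return median(scores)
    if method == "wsum":
        if weights is None or len(weights) != len(scores):
            raise ValueError("wsum needs one weight per score")
        return sum(w * s for w, s in zip(weights, scores))
    raise ValueError(f"unknown method: {method}")


item_scores = [0.2, 0.5, 0.8]  # e.g. two visual descriptors and one text score
for name in ("sum", "mult", "max", "med"):
    print(name, combine(item_scores, name))
print("wsum", combine(item_scores, "wsum", weights=[0.5, 0.25, 0.25]))
```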

Results
Rank Participant MAP
1 xrce 0.2765
2 unt 0.2251
3 telecom 0.2227
4 i2rcviu 0.2126
5 dcu 0.2039
6 cheshire 0.2014
7 duth 0.1998
8 uned 0.1927
9 daedalus 0.1820
10 sztaki 0.1794
11 nus 0.1581
12 rgu 0.0617
13 uaic 0.0423
Rank Participant P@10
1 xrce 0.6114
2 duth 0.5200
3 i2rcviu 0.4971
4 cheshire 0.4929
5 telecom 0.4914
6 sztaki 0.4857
7 daedalus 0.4471
8 unt 0.4314
9 dcu 0.4271
10 uned 0.4200
11 nus 0.3529
12 rgu 0.2271
13 uaic 0.1543
Rank Participant P@20
1 xrce 0.5407
2 duth 0.4836
3 telecom 0.4407
4 cheshire 0.4364
5 sztaki 0.4329
6 i2rcviu 0.4321
7 daedalus 0.4029
8 unt 0.3986
9 dcu 0.3907
10 uned 0.3671
11 nus 0.3264
12 uaic 0.1529
13 rgu 0.1514

Corrected Results
Rank Participant MAP
1 xrce 0.2765
2 duth 0.2561
3 unt 0.2251
4 telecom 0.2227
5 i2rcviu 0.2126
6 dcu 0.2039
7 cheshire 0.2014
8 uned 0.1927
9 daedalus 0.1820
10 sztaki 0.1794
11 nus 0.1581
12 rgu 0.0617
13 uaic 0.0423
Rank Participant P@10
1 xrce 0.6114
2 duth 0.5257
3 i2rcviu 0.4971
4 cheshire 0.4929
5 telecom 0.4914
6 sztaki 0.4857
7 daedalus 0.4471
8 unt 0.4314
9 dcu 0.4271
10 uned 0.4200
11 nus 0.3529
12 rgu 0.2271
13 uaic 0.1543
Rank Participant P@20
1 xrce 0.5407
2 duth 0.4900
3 telecom 0.4407
4 cheshire 0.4364
5 sztaki 0.4329
6 i2rcviu 0.4321
7 daedalus 0.4029
8 unt 0.3986
9 dcu 0.3907
10 uned 0.3671
11 nus 0.3264
12 uaic 0.1529
13 rgu 0.1514
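
For readers unfamiliar with the measures in these tables, the following Python sketch shows the standard definitions of precision at k (P@k) and average precision; MAP is the mean of average precision over all topics. The exact evaluation tooling used by the campaign is not described in the slides, so the function names and toy data here are illustrative assumptions.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked items that are relevant."""
    top = ranked_ids[:k]
    return sum(1 for item in top if item in relevant_ids) / k


def average_precision(ranked_ids, relevant_ids):
    """Mean of the precision values at the rank of each relevant item."""
    if not relevant_ids:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids)


ranking = ["a", "b", "c", "d"]  # toy ranking for one topic
relevant = {"a", "c"}           # toy relevance judgments
print(precision_at_k(ranking, relevant, 2))
print(average_precision(ranking, relevant))
```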

Fusion Problems
 appropriate weighting of modalities and score
normalization/combination are not trivial
problems
 if results are assessed by visual similarity only,
fusion is not a theoretically sound method

Content-based Image Retrieval
Problems
 Content-based Image Retrieval (CBIR) with global
features is notoriously noisy for image queries of
low generality (generality being the fraction of
relevant images in a collection).
 does not scale up well to large databases
efficiency-wise

Two-Stage Image Retrieval
 how it works: first use the secondary modality to rank the
collection then perform CBIR only on the top-K items
 assumption: primary (image) – secondary (text) modalities
 hypothesis: CBIR can do better than text retrieval in small
sets or sets of high query generality
 efficiency benefit: using a ‘cheaper’ secondary modality
also improves efficiency by cutting down on costly CBIR
operations
 possible drawback: relevant images with empty or very
noisy secondary modalities would be completely missed
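
The two-stage idea can be sketched in Python as follows. This is a minimal illustration with a fixed K, not the system's implementation; `two_stage_rank`, the score dictionaries, and the choice to keep items below the cut-off in their text order are all assumptions made for the example.

```python
def two_stage_rank(text_scores, image_score_fn, k):
    """Rank by text first, then re-rank only the top-k items by visual similarity.

    text_scores: {item_id: text retrieval score}, higher is better.
    image_score_fn: callable item_id -> visual similarity, higher is better.
    """
    by_text = sorted(text_scores, key=text_scores.get, reverse=True)
    head, tail = by_text[:k], by_text[k:]
    # The costly CBIR comparison is evaluated for k items only.
    reranked = sorted(head, key=image_score_fn, reverse=True)
    # Assumption: items below the cut-off keep their text-based order.
    return reranked + tail


text_scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1}
visual = {"a": 0.2, "b": 0.9, "c": 0.5, "d": 1.0}
print(two_stage_rank(text_scores, visual.get, k=3))
```

In the toy data, item "d" is the best visual match but has a weak text score, so it never reaches the CBIR stage. This is the drawback noted above.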

Previous Work
 Re-ranking the best results by visual content has been
seen before
 mostly in different setups
 All these approaches employed a static predefined
K for all queries
 not clear if it works

Our Two-Stage Method
 dynamic K
 calculated dynamically per query
 optimize a predefined effectiveness measure
 without using external information or training
data

Retrieval Results
[Figure: retrieved images for the query "cockpit of an airplane" under four settings: Image Only, Text Only, Static K=25, Dynamic K]

Best Fusion Method – Max of Sums
 i is the index running over the example images (i=1,2,…)
 j runs over the visual descriptors (j∈{1,2})
 DESC_ji is the score against the ith example image
for the jth descriptor
 parameter w controls the relative contribution of
the two media
$$s = (1 - w)\,\max_i \Big( \sum_j \mathrm{MinMax}(\mathrm{DESC}_{ji}) \Big) + w\,\mathrm{MinMax}(\mathrm{tf.idf})$$
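
A minimal Python sketch of this fusion rule, reading it as: sum the MinMax-normalized descriptor scores for each example image, take the maximum of those sums over the example images, and mix the result with the MinMax-normalized tf.idf score using weight w. The function name `max_of_sums` and the input layout are assumptions for illustration, and the inputs are assumed to be normalized already.

```python
def max_of_sums(desc_scores, text_score, w):
    """Fused score for one collection item.

    desc_scores[i][j]: normalized score against example image i for descriptor j.
    text_score: normalized tf.idf score of the item.
    w: weight of the text modality, between 0 and 1.
    """
    visual = max(sum(per_image) for per_image in desc_scores)
    return (1 - w) * visual + w * text_score


# Two example images (rows) and two descriptors (columns); toy values.
desc = [[0.2, 0.4],
        [0.5, 0.3]]
print(max_of_sums(desc, text_score=0.5, w=0.25))
```

With w = 0 the ranking is purely visual and with w = 1 purely textual, so w is the knob that sets the relative contribution of the two media.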

Implementation
• developed in C# on the .NET
Framework 4.0
• HTML, CSS and JavaScript (AJAX)
technologies for the interface
• requires a fairly modern browser

Directions for Further Research
 Multi-stage retrieval for multimodal databases
based on modality hierarchy.
 Fuzzy Fusion (replace w with membership
function m).
 Create artificial modalities (not only from
relevance scores)
 pseudo relevance feedback – cross media
feedback

Publications
 Multimedia Search with Noisy Modalities: Fusion and
Multistage Retrieval. Avi Arampatzis, Savvas A.
Chatzichristofis, and Konstantinos Zagoris. In: CLEF
(Notebook Papers/LABs/Workshops), 22-23
September, Padua, Italy, 2010.
 www.MMRetrieval.net: A Multimodal Search Engine.
Konstantinos Zagoris, Avi Arampatzis, and Savvas A.
Chatzichristofis. In: Proceedings of the 3rd
International Conference on SImilarity Search and
APplications, SISAP 2010, Istanbul, Turkey, September
18-19, 2010. © Association for Computing Machinery
(ACM).
