MultiModal Retrieval Image

  1. MMRetrieval.net: A Multimodal Search Engine
  2. Multimodal Information
     • Single-language, text-only retrieval reaches a limit.
     • Content-based Image Retrieval is computationally costly and still in its infancy.
     • Digital information is increasingly becoming multimodal.
     • Example: Wikipedia
  3. Modality
     • Dictionary: A tendency to conform to a general pattern or belong to a particular group or category.
     • Definition of Modality in Information Retrieval
       • It is unclear, fuzzy
       • 1st Definition: Modality = Media
       • 2nd Definition: Modality = Data Stream
  4. MMRetrieval.net
     • A Product of Cooperation
     • Started June 2010
     • Avi Arampatzis, Lecturer, D.U.T.H.
     • Konstantinos Zagoris, Ph.D., D.U.T.H.
     • Savvas A. Chatzichristofis, Ph.D. candidate, D.U.T.H.
  5. ImageCLEF 2010 Wikipedia Retrieval Task
     • ImageCLEF 2010 Wikipedia Collection
     • Consisting of 237,434 items
     • Image Primary Media
     • Noisy and Incomplete User-Supplied Textual Annotations
     • Wikipedia Articles Containing the Images
     • Written in any combination of English, German, French, or any other unidentified language
  6. Wikipedia Collection
     <image id="244845" file="images/25/244845.jpg">
       <name>Balloons Festival - Chateaux d'Oex.jpg</name>
       <text xml:lang="en">
         <description/>
         <comment/>
         <caption article="text/en/4/331622">Balloon festival </caption>
       </text>
       <text xml:lang="de">
         <description/>
         <comment/>
         <caption/>
       </text>
       <text xml:lang="fr">
         <description/>
         <comment/>
         <caption/>
       </text>
       <comment>(Balloon festival in Chateaux d'Oex. Category:Chateau d'Oex Category:Hot air balloons) </comment>
       <license>GFDL</license>
     </image>
  7. ImageCLEF 2010 Wikipedia Retrieval Task
     • 70 test topics
     • each consisting of a textual and a visual part
       • three title fields (one per language: English, German, French)
       • one or more example images
  8. Wikipedia Topic
     <topic>
       <number>8</number>
       <title xml:lang="en">tennis player on court</title>
       <title xml:lang="de">tennisspieler auf dem platz</title>
       <title xml:lang="fr">joueur de tennis sur le terrain</title>
       <image>2197587684_94542c6fbd.jpg</image>
       <image>777629689_443a25ba08.jpg</image>
     </topic>
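A topic such as the one above could be read into a query structure as follows. This is only a minimal sketch using Python's standard library; the function name `parse_topic` and the shape of the returned dictionary are assumptions made for illustration, not part of MMRetrieval.net (which the slides say is written in C#).

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined xml: prefix under this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

TOPIC_XML = """<topic>
<number>8</number>
<title xml:lang="en">tennis player on court</title>
<title xml:lang="de">tennisspieler auf dem platz</title>
<title xml:lang="fr">joueur de tennis sur le terrain</title>
<image>2197587684_94542c6fbd.jpg</image>
<image>777629689_443a25ba08.jpg</image>
</topic>"""


def parse_topic(xml_text):
    """Hypothetical helper: split a topic into its textual and visual parts."""
    root = ET.fromstring(xml_text)
    return {
        "number": int(root.findtext("number")),
        # One title per language code (en, de, fr).
        "titles": {t.get(XML_LANG): t.text for t in root.findall("title")},
        # File names of the example images (the visual part of the query).
        "images": [img.text for img in root.findall("image")],
    }


topic = parse_topic(TOPIC_XML)
```

The textual part (`titles`) would feed the text retrieval side, and the visual part (`images`) would feed the image descriptors.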
  9. Extraction of Modalities
     • Visual descriptors: Joint Composite Descriptor (JCD), Spatial Color Distribution (SpCD)
     • Textual fields: description, comment, caption, article, name
     • Languages: English, French, German
     • Text indexing: Lemur Toolkit V4.11 and Indri V2.11 with the tf.idf retrieval model
  10. MMRetrieval.net Structure
  11. Fusion in Information Retrieval
     • combining evidence about relevance from different sources of information
     • from several modalities
     • fusion consists of two components
       • score normalization
       • score combination
  12. Score Normalization
     • the relevance scores of different modalities are not comparable
     • popular text retrieval models (tf.idf) can be turned into probabilities of relevance via the score-distributional method
     • image descriptors do not fit this method
     • MinMax (maps scores linearly to [0, 1])
     • Zscore (maps each score to the number of standard deviations it lies above or below the mean score)
     • the non-linear Known-Item Aggregate Cumulative Density Function (KIACDF)
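The two linear normalizations named on this slide can be sketched in a few lines of Python. This is an illustrative sketch, not the project's code: the function names are hypothetical, the population standard deviation is an assumption (the slide does not say which variant is used), and KIACDF is left out because the slides do not define it.

```python
from statistics import mean, pstdev


def min_max(scores):
    """Map scores linearly to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def z_score(scores):
    """Express each score as standard deviations above or below the mean."""
    mu = mean(scores)
    sigma = pstdev(scores)  # assumption: population standard deviation
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]
```

Either function would be applied separately to the ranked list of each modality, so that the resulting values live on a common scale before they are combined.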
  13. Score Combination
     • CompSUM
     • CompMULT
     • CompMAX
     • CompMED
     • CompWSUM
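The slide lists the combination methods by name only. Assuming the names mean what they suggest (sum, product, maximum, median, and weighted sum of an item's normalized per-modality scores), they could be sketched as below. The function names and the exact definitions are assumptions for illustration.

```python
from math import prod
from statistics import median


def comb_sum(scores):
    """Sum of the item's normalized scores across modalities."""
    return sum(scores)


def comb_mult(scores):
    """Product of the scores; one zero score zeroes the result."""
    return prod(scores)


def comb_max(scores):
    """Highest score any single modality gives the item."""
    return max(scores)


def comb_med(scores):
    """Median score across modalities."""
    return median(scores)


def comb_wsum(scores, weights):
    """Weighted sum; weights[k] is the weight of the k-th modality."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(w * s for w, s in zip(weights, scores))
```

For example, an item scored 0.2 by one modality and 0.8 by another, with weights 0.25 and 0.75, would get a weighted sum of 0.65.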
  14. Results
     Rank  MAP                 P@10                P@20
     1     xrce      0.2765    xrce      0.6114    xrce      0.5407
     2     unt       0.2251    duth      0.5200    duth      0.4836
     3     telecom   0.2227    i2rcviu   0.4971    telecom   0.4407
     4     i2rcviu   0.2126    cheshire  0.4929    cheshire  0.4364
     5     dcu       0.2039    telecom   0.4914    sztaki    0.4329
     6     cheshire  0.2014    sztaki    0.4857    i2rcviu   0.4321
     7     duth      0.1998    daedalus  0.4471    daedalus  0.4029
     8     uned      0.1927    unt       0.4314    unt       0.3986
     9     daedalus  0.1820    dcu       0.4271    dcu       0.3907
     10    sztaki    0.1794    uned      0.4200    uned      0.3671
     11    nus       0.1581    nus       0.3529    nus       0.3264
     12    rgu       0.0617    rgu       0.2271    uaic      0.1529
     13    uaic      0.0423    uaic      0.1543    rgu       0.1514
  15. Corrected Results
     Rank  MAP                 P@10                P@20
     1     xrce      0.2765    xrce      0.6114    xrce      0.5407
     2     duth      0.2561    duth      0.5257    duth      0.4900
     3     unt       0.2251    i2rcviu   0.4971    telecom   0.4407
     4     telecom   0.2227    cheshire  0.4929    cheshire  0.4364
     5     i2rcviu   0.2126    telecom   0.4914    sztaki    0.4329
     6     dcu       0.2039    sztaki    0.4857    i2rcviu   0.4321
     7     cheshire  0.2014    daedalus  0.4471    daedalus  0.4029
     8     uned      0.1927    unt       0.4314    unt       0.3986
     9     daedalus  0.1820    dcu       0.4271    dcu       0.3907
     10    sztaki    0.1794    uned      0.4200    uned      0.3671
     11    nus       0.1581    nus       0.3529    nus       0.3264
     12    rgu       0.0617    rgu       0.2271    uaic      0.1529
     13    uaic      0.0423    uaic      0.1543    rgu       0.1514
  16. Fusion Problems
     • appropriate weighting of modalities and score normalization/combination are not trivial problems
     • if results are assessed by visual similarity only, fusion is not a theoretically sound method
  17. Content-based Image Retrieval Problems
     • Content-based Image Retrieval (CBIR) with global features is notoriously noisy for image queries of low generality (generality being the fraction of relevant images in a collection).
     • it does not scale up well to large databases efficiency-wise
  18. Two-Stage Image Retrieval
     • how it works: first use the secondary modality to rank the collection, then perform CBIR only on the top-K items
     • assumption: images are the primary modality and text is the secondary modality
     • hypothesis: CBIR can do better than text retrieval in small sets or sets of high query generality
     • efficiency benefit: using a 'cheaper' secondary modality also improves efficiency by cutting down on costly CBIR operations
     • possible drawback: relevant images with empty or very noisy secondary modalities would be missed completely
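The two-stage idea on this slide can be sketched as follows, with a fixed K for simplicity. The function and parameter names are hypothetical, and keeping the items below the cut-off in their text order (rather than dropping them) is an assumption the slide does not settle.

```python
def two_stage_rank(text_scores, image_score, k):
    """Rank by text first, then re-rank only the top-k items by visual score.

    text_scores: dict mapping item id -> score from the secondary (text) modality.
    image_score: callable mapping item id -> CBIR score for the primary modality.
    k:           number of top text results passed on to the CBIR stage.
    """
    # Stage 1: cheap ranking of the whole collection by the secondary modality.
    text_ranking = sorted(text_scores, key=text_scores.get, reverse=True)
    head, tail = text_ranking[:k], text_ranking[k:]
    # Stage 2: costly CBIR scoring is computed for the top-k items only.
    reranked = sorted(head, key=image_score, reverse=True)
    # Assumption: items below the cut-off keep their text-based order.
    return reranked + tail
```

Because `image_score` is called only for the `k` items in `head`, the visual comparison cost depends on K rather than on the collection size. An item with an empty text annotation gets a low text score and may never reach the second stage, which is the drawback the slide mentions.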
  19. Previous Work
     • Re-ranking the best results by visual content has been seen before
       • mostly in different setups
     • All these approaches employed a static, predefined K for all queries
       • not clear if it works
  20. Our Two-Stage Method
     • dynamic K
       • calculated dynamically per query
       • optimizes a predefined effectiveness measure
       • without using external information or training data
  21. Retrieval Results
     [Figure: results for the query "cockpit of an airplane"; panels: Image Only, Text Only, Static K=25, Dynamic K]
  22. Best Fusion Method: Max of Sums
     • i is the index running over the example images (i = 1, 2, …)
     • j is the index running over the visual descriptors (j ∈ {1, 2})
     • DESC_ji is the score against the i-th example image for the j-th descriptor
     • the parameter w controls the relative contribution of the two media
     $$ s = (1 - w)\,\max_i \sum_j \mathrm{MinMax}(\mathrm{DESC}_{ji}) + w\,\mathrm{MinMax}(\mathrm{tf.idf}) $$
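Reading the formula on this slide as "for each example image, sum the MinMax-normalized descriptor scores, take the maximum over example images, then mix with the normalized tf.idf score", a possible sketch looks like this. The data layout (`desc[j][i]` as a list of raw scores over all items) and the function names are assumptions made for illustration.

```python
def min_max(scores):
    """Map scores linearly to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def max_of_sums(desc, tfidf, w):
    """Fuse visual and textual scores for every item in the collection.

    desc[j][i]: raw scores of all items for descriptor j against example image i.
    tfidf:      raw text scores of all items.
    w:          weight of the text side, between 0 and 1.
    """
    # Normalize every ranked list on its own before combining.
    norm = [[min_max(run) for run in per_descriptor] for per_descriptor in desc]
    text = min_max(tfidf)
    n_examples = len(desc[0])
    fused = []
    for item in range(len(tfidf)):
        # Sum over descriptors j, then take the maximum over example images i.
        visual = max(
            sum(norm[j][i][item] for j in range(len(desc)))
            for i in range(n_examples)
        )
        fused.append((1 - w) * visual + w * text[item])
    return fused
```

With two descriptors the visual term can reach 2 while the text term stays within [0, 1], so w = 0.5 does not weight the two media equally; that asymmetry follows from the formula as written on the slide.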
  23. Fusion vs Two-Stage
  24. Implementation
     • developed in C#/.NET Framework 4.0
     • HTML, CSS and JavaScript (AJAX) technologies for the interface
     • requires a fairly modern browser
  25. Directions for Further Research
     • Multi-stage retrieval for multimodal databases based on modality hierarchy.
     • Fuzzy Fusion (replace w with membership function m).
     • Create artificial modalities (not only from relevance scores)
     • pseudo relevance feedback, cross-media feedback
  26. Publications
     • Multimedia Search with Noisy Modalities: Fusion and Multistage Retrieval. Avi Arampatzis, Savvas A. Chatzichristofis, and Konstantinos Zagoris. In: CLEF (Notebook Papers/LABs/Workshops), 22-23 September, Padua, Italy, 2010.
     • www.MMRetrieval.net: A Multimodal Search Engine. Konstantinos Zagoris, Avi Arampatzis, and Savvas A. Chatzichristofis. In: Proceedings of the 3rd International Conference on SImilarity Search and APplications, SISAP 2010, Istanbul, Turkey, September 18-19, 2010. © Association for Computing Machinery (ACM).
