MMRetrieval.net
A Multimodal Search Engine
Multimodal Information
 Single-language, text-only retrieval has reached a limit.
 Content-based Image Retrieval is computationally
costly and still in its infancy.
 Digital Information is increasingly becoming
multimodal
 Example: Wikipedia
Modality
 Dictionary: A tendency to conform to a general
pattern or belong to a particular group or
category.
 Definition of Modality in Information Retrieval
 It is unclear, fuzzy
 1st Definition: Modality = Media
 2nd Definition: Modality = Data Stream
MMRetrieval.net
 A Product of Cooperation
 Started June, 2010
 Avi Arampatzis, Lecturer, D.U.T.H.
 Konstantinos Zagoris, Ph.D., D.U.T.H.
 Savvas A. Chatzichristofis, Ph.D. candidate, D.U.T.H.
ImageCLEF 2010
Wikipedia Retrieval Task
 ImageCLEF 2010 Wikipedia Collection
 Consisting of 237434 items
 Images are the primary medium
 Noisy and Incomplete User Supplied Textual
Annotations
 Wikipedia Articles Containing the Images
 Written in any combination of English, German,
French, or any other unidentified language
Wikipedia Collection
<image id="244845" file="images/25/244845.jpg">
<name>Balloons Festival - Chateaux d'Oex.jpg</name>
<text xml:lang="en">
<description/>
<comment/>
<caption article="text/en/4/331622">Balloon
festival </caption>
</text>
<text xml:lang="de">
<description/>
<comment/>
<caption/>
</text>
<text xml:lang="fr">
<description/>
<comment/>
<caption/>
</text>
<comment>(Balloon festival in Chateaux d'Oex.
Category:Chateau d'Oex Category:Hot air balloons)
</comment>
<license>GFDL</license>
</image>
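A minimal sketch of how the text fields of such a record could be pulled out per language, using Python's standard `xml.etree.ElementTree`. The function name `extract_text_fields` and the shape of the returned dictionary are illustrative assumptions, not part of MMRetrieval.net; the sample record is abbreviated from the one above.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined xml: prefix (as in xml:lang) under this namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

RECORD = """<image id="244845" file="images/25/244845.jpg">
<name>Balloons Festival - Chateaux d'Oex.jpg</name>
<text xml:lang="en">
<description/>
<comment/>
<caption article="text/en/4/331622">Balloon
festival </caption>
</text>
<text xml:lang="de">
<description/>
<comment/>
<caption/>
</text>
<license>GFDL</license>
</image>"""


def extract_text_fields(xml_string):
    """Return id, file, name and the per-language text fields of one record.

    Hypothetical helper: the output layout is an assumption for illustration.
    """
    root = ET.fromstring(xml_string)
    fields = {
        "id": root.get("id"),
        "file": root.get("file"),
        "name": (root.findtext("name") or "").strip(),
        "by_lang": {},
    }
    for text in root.findall("text"):
        per_lang = {}
        for tag in ("description", "comment", "caption"):
            node = text.find(tag)
            raw = node.text if node is not None and node.text else ""
            # Collapse line breaks and repeated spaces left by the annotation.
            per_lang[tag] = " ".join(raw.split())
        caption = text.find("caption")
        # The article path, when present, is an attribute of <caption>.
        per_lang["article"] = caption.get("article") if caption is not None else None
        fields["by_lang"][text.get(XML_LANG)] = per_lang
    return fields


record = extract_text_fields(RECORD)
print(record["by_lang"]["en"]["caption"])
```

Empty elements such as `<caption/>` come back as empty strings, which makes the noisy and incomplete nature of the annotations easy to detect downstream.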
ImageCLEF 2010
Wikipedia Retrieval Task
 70 test topics
 consisting of a textual and a visual part
 three title fields (one per language—English,
German, French)
 one or more example images
Wikipedia Topic
<topic>
<number>8</number>
<title xml:lang="en">tennis player on court</title>
<title xml:lang="de">tennisspieler auf dem platz</title>
<title xml:lang="fr">joueur de tennis sur le terrain</title>
<image>2197587684_94542c6fbd.jpg</image>
<image>777629689_443a25ba08.jpg</image>
</topic>
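A topic can be read in a similar way. The sketch below, again with the standard `xml.etree.ElementTree`, collects the three titles and the example images; `parse_topic` and its return layout are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined xml: prefix (as in xml:lang) under this namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

TOPIC = """<topic>
<number>8</number>
<title xml:lang="en">tennis player on court</title>
<title xml:lang="de">tennisspieler auf dem platz</title>
<title xml:lang="fr">joueur de tennis sur le terrain</title>
<image>2197587684_94542c6fbd.jpg</image>
<image>777629689_443a25ba08.jpg</image>
</topic>"""


def parse_topic(xml_string):
    """Split a topic into its textual part (titles) and visual part (images)."""
    root = ET.fromstring(xml_string)
    return {
        "number": int(root.findtext("number")),
        "titles": {t.get(XML_LANG): t.text.strip() for t in root.findall("title")},
        "images": [img.text.strip() for img in root.findall("image")],
    }


topic = parse_topic(TOPIC)
print(topic["titles"]["en"], topic["images"])
```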
Extraction of Modalities
 Image descriptors: Joint Composite Descriptor (JCD),
Spatial Color Distribution (SpCD)
 Text fields: description, comment, caption, article, name
 Languages: English, French, German
 Text retrieval: Lemur Toolkit V4.11 and Indri V2.11 with
the tf.idf retrieval model
MMRetrieval.net Structure
Fusion in Information Retrieval
 combining evidence about relevance from
different sources of information
 from several modalities
 fusion consists of two components
 score normalization
 score combination
Score Normalization
 the relevance scores are not comparable
 popular text retrieval models (tf.idf) can be turned to
probabilities of relevance via the score-distributional
method
 image descriptor scores do not fit this method
 MinMax (maps scores linearly to [0,1])
 Zscore (maps each score to the number of standard deviations it
lies above or below the mean score)
 non-linear Known-Item Aggregate Cumulative Density
Function (KIACDF)
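The two linear normalizations listed above can be sketched in a few lines of Python. This is an illustration under stated assumptions: scores are a plain list of floats for one ranked list, constant lists are mapped to zeros, and the population standard deviation is used for Zscore. KIACDF is not covered here because the slides do not describe how it is computed.

```python
from statistics import mean, pstdev


def min_max(scores):
    """Map scores linearly onto [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def z_score(scores):
    """Express each score as standard deviations above or below the mean."""
    mu = mean(scores)
    sigma = pstdev(scores)  # population standard deviation (an assumption)
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


print(min_max([2.0, 4.0, 6.0]))
print(z_score([1.0, 2.0, 3.0]))
```

MinMax output is bounded, while Zscore output is unbounded and centered on zero, which matters when the normalized scores are later multiplied or compared with a maximum.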
Score Combination
 CompSUM
 CompMULT
 CompMAX
 CompMED
 CompWSUM
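A rough sketch of the five combination rules, assuming each takes the already normalized scores of one item across the modalities. The reading of the names (SUM as sum, MULT as product, MAX as maximum, MED as median, WSUM as weighted sum) is an assumption based on the names alone, and the function `combine` is hypothetical.

```python
from math import prod
from statistics import median


def combine(scores, method="SUM", weights=None):
    """Combine one item's normalized per-modality scores into a single score."""
    if method == "SUM":
        return sum(scores)
    if method == "MULT":
        return prod(scores)
    if method == "MAX":
        return max(scores)
    if method == "MED":
        return median(scores)
    if method == "WSUM":
        if weights is None or len(weights) != len(scores):
            raise ValueError("WSUM needs one weight per score")
        return sum(w * s for w, s in zip(weights, scores))
    raise ValueError(f"unknown combination method: {method}")


item_scores = [0.2, 0.5, 0.8]
for name in ("SUM", "MULT", "MAX", "MED"):
    print(name, combine(item_scores, name))
print("WSUM", combine(item_scores, "WSUM", weights=[0.5, 0.3, 0.2]))
```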
Results
Rank Participant MAP
1 xrce 0.2765
2 unt 0.2251
3 telecom 0.2227
4 i2rcviu 0.2126
5 dcu 0.2039
6 cheshire 0.2014
7 duth 0.1998
8 uned 0.1927
9 daedalus 0.1820
10 sztaki 0.1794
11 nus 0.1581
12 rgu 0.0617
13 uaic 0.0423
Rank Participant P@10
1 xrce 0.6114
2 duth 0.5200
3 i2rcviu 0.4971
4 cheshire 0.4929
5 telecom 0.4914
6 sztaki 0.4857
7 daedalus 0.4471
8 unt 0.4314
9 dcu 0.4271
10 uned 0.4200
11 nus 0.3529
12 rgu 0.2271
13 uaic 0.1543
Rank Participant P@20
1 xrce 0.5407
2 duth 0.4836
3 telecom 0.4407
4 cheshire 0.4364
5 sztaki 0.4329
6 i2rcviu 0.4321
7 daedalus 0.4029
8 unt 0.3986
9 dcu 0.3907
10 uned 0.3671
11 nus 0.3264
12 uaic 0.1529
13 rgu 0.1514
Corrected Results
Rank Participant MAP
1 xrce 0.2765
2 duth 0.2561
3 unt 0.2251
4 telecom 0.2227
5 i2rcviu 0.2126
6 dcu 0.2039
7 cheshire 0.2014
8 uned 0.1927
9 daedalus 0.1820
10 sztaki 0.1794
11 nus 0.1581
12 rgu 0.0617
13 uaic 0.0423
Rank Participant P@10
1 xrce 0.6114
2 duth 0.5257
3 i2rcviu 0.4971
4 cheshire 0.4929
5 telecom 0.4914
6 sztaki 0.4857
7 daedalus 0.4471
8 unt 0.4314
9 dcu 0.4271
10 uned 0.4200
11 nus 0.3529
12 rgu 0.2271
13 uaic 0.1543
Rank Participant P@20
1 xrce 0.5407
2 duth 0.4900
3 telecom 0.4407
4 cheshire 0.4364
5 sztaki 0.4329
6 i2rcviu 0.4321
7 daedalus 0.4029
8 unt 0.3986
9 dcu 0.3907
10 uned 0.3671
11 nus 0.3264
12 uaic 0.1529
13 rgu 0.1514
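For readers unfamiliar with the measures in these tables, the standard definitions of precision at k (P@10, P@20) and average precision can be sketched as follows; MAP is the mean of average precision over all topics. The toy ranking and relevance set are made up for illustration and are unrelated to the ImageCLEF data.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the first k ranked items that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


def average_precision(ranked, relevant):
    """Mean of the precision values at the rank of each relevant item found."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs: list of (ranked, relevant) pairs, one per topic."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


ranked = ["a", "b", "c", "d"]   # toy ranking
relevant = {"a", "c"}           # toy relevance judgments
print(precision_at_k(ranked, relevant, 2))
print(average_precision(ranked, relevant))
```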
Fusion Problems
 appropriate weighting of modalities and score
normalization/combination are not trivial
problems
 if results are assessed by visual similarity only,
fusion is not a theoretically sound method
Content-based Image Retrieval
Problems
 Content-based Image Retrieval (CBIR) with global
features is notoriously noisy for image queries of
low generality (generality being the fraction of
relevant images in a collection).
 does not scale up well to large databases
efficiency-wise
Two-Stage Image Retrieval
 how it works: first use the secondary modality to rank the
collection then perform CBIR only on the top-K items
 assumption: primary (image) – secondary (text) modalities
 hypothesis: CBIR can do better than text retrieval in small
sets or sets of high query generality
 efficiency benefit: using a ‘cheaper’ secondary modality
also improves efficiency by cutting down on costly CBIR
operations
 possible drawback: relevant images with empty or very
noisy secondary modalities would be completely missed
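The two-stage idea can be sketched as below for a fixed K. The sketch assumes text scores are already available for the whole collection, that `image_score` is a caller-supplied function standing in for the costly CBIR comparison, and that items below the top-K keep their text order; the last point is an assumption, since the slides do not say what happens to them.

```python
def two_stage_rank(text_scores, image_score, k):
    """Rank by text first, then re-rank only the top-k items by visual score.

    text_scores: dict mapping item id to a text retrieval score (higher is better).
    image_score: hypothetical callable, item id -> visual similarity (higher is better).
    """
    by_text = sorted(text_scores, key=text_scores.get, reverse=True)
    top_k, rest = by_text[:k], by_text[k:]
    # The costly visual comparison is evaluated once per top-k item only.
    visual = {item: image_score(item) for item in top_k}
    reranked = sorted(top_k, key=visual.get, reverse=True)
    return reranked + rest


calls = []
toy_visual = {"a": 0.2, "b": 0.9, "c": 0.5, "d": 1.0}


def toy_image_score(item):
    calls.append(item)  # record how many CBIR comparisons were needed
    return toy_visual[item]


toy_text = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1}
ranking = two_stage_rank(toy_text, toy_image_score, k=3)
print(ranking, len(calls))
```

In the toy data, item `d` is the best visual match but sits outside the top-K by text, so it is never compared visually; this is the drawback noted in the last bullet.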
Previous Work
 Re-ranking the best results by visual content has been
seen before
 mostly in different setups
 All these approaches employed a static predefined
K for all queries
 not clear if it works
Our Two-Stage Method
 dynamic K
 calculated dynamically per query
 optimize a predefined effectiveness measure
 without using external information or training
data
Retrieval Results
Example query: "cockpit of an airplane"
(result panels: Image Only, Text Only, Static K=25, Dynamic K)
Best Fusion Method – Max of Sums
 i is the index running over the example images (i=1,2,…)
 j is the index running over the visual descriptors (j∈{1,2})
 DESCji is the score against the ith example image
for the jth descriptor
 parameter w controls the relative contribution of
the two media
$$ s = (1 - w)\,\max_{i} \sum_{j} \mathrm{MinMax}(\mathrm{DESC}_{ji}) + w\,\mathrm{MinMax}(\mathrm{tf.idf}) $$
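A small sketch of how this score could be computed for every item in a collection. It assumes that MinMax is applied separately to each ranked list (one per descriptor and example image, plus one for tf.idf), that higher descriptor scores mean greater similarity, and that scores are held in nested lists indexed as `desc_scores[j][i][d]` for descriptor j, example image i and item d. The names `max_of_sums` and `desc_scores` are illustrative, not taken from the system.

```python
def min_max(scores):
    """Map scores linearly onto [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def max_of_sums(desc_scores, text_scores, w):
    """Fuse visual and text scores: (1 - w) * max_i sum_j DESC_ji + w * tf.idf."""
    norm_desc = [[min_max(per_image) for per_image in per_desc]
                 for per_desc in desc_scores]
    norm_text = min_max(text_scores)
    n_desc = len(norm_desc)
    n_images = len(norm_desc[0])
    fused = []
    for d in range(len(text_scores)):
        # Sum over descriptors j for each example image i, then keep the best image.
        visual = max(sum(norm_desc[j][i][d] for j in range(n_desc))
                     for i in range(n_images))
        fused.append((1 - w) * visual + w * norm_text[d])
    return fused


# Toy data: 2 descriptors, 2 example images, 3 items in the collection.
desc_scores = [
    [[0.0, 5.0, 10.0], [10.0, 0.0, 5.0]],  # descriptor 1: example image 1, 2
    [[0.0, 1.0, 2.0], [2.0, 0.0, 1.0]],    # descriptor 2: example image 1, 2
]
text_scores = [3.0, 1.0, 2.0]
fused = max_of_sums(desc_scores, text_scores, w=0.5)
print(fused)
```

Because the visual part sums two normalized descriptor scores, it ranges over [0, 2] while the text part ranges over [0, 1]; the weight w has to be read with that in mind.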
Fusion vs Two-Stage
Implementation
• developed in C# on the .NET
Framework 4.0
• HTML, CSS and JavaScript (AJAX)
technologies for the interface
• requires a fairly modern browser
Directions for Further Research
 Multi-stage retrieval for multimodal databases
based on modality hierarchy.
 Fuzzy Fusion (replace w with membership
function m).
 Create artificial modalities (not only from
relevance scores)
 pseudo relevance feedback – cross media
feedback
Publications
 Multimedia Search with Noisy Modalities: Fusion and
Multistage Retrieval. Avi Arampatzis, Savvas A.
Chatzichristofis, and Konstantinos Zagoris. In: CLEF
(Notebook Papers/LABs/Workshops), 22-23
September, Padua, Italy, 2010.
 www.MMRetrieval.net: A Multimodal Search Engine.
Konstantinos Zagoris, Avi Arampatzis, and Savvas A.
Chatzichristofis. In: Proceedings of the 3rd
International Conference on SImilarity Search and
APplications, SISAP 2010, Istanbul, Turkey, September
18-19, 2010. © Association for Computing Machinery
(ACM).
The Art of the Pitch: WordPress Relationships and Sales
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 

MultiModal Retrieval Image

  • 2. Multimodal Information
      • Single-language, text-only retrieval has reached a limit.
      • Content-based image retrieval is computationally costly and still in its infancy.
      • Digital information is increasingly becoming multimodal.
      • Example: Wikipedia
  • 3. Modality
      • Dictionary: a tendency to conform to a general pattern or belong to a particular group or category.
      • The definition of modality in information retrieval is unclear and fuzzy.
      • 1st definition: modality = media
      • 2nd definition: modality = data stream
  • 4. MMRetrieval.net
      • A product of cooperation
      • Started in June 2010
      • Avi Arampatzis, Lecturer, D.U.T.H.
      • Konstantinos Zagoris, Ph.D., D.U.T.H.
      • Savvas A. Chatzichristofis, Ph.D. candidate, D.U.T.H.
  • 5. ImageCLEF 2010 Wikipedia Retrieval Task
      • ImageCLEF 2010 Wikipedia collection
      • Consists of 237434 items
      • The image is the primary medium
      • Noisy and incomplete user-supplied textual annotations
      • Wikipedia articles containing the images
      • Written in any combination of English, German, French, or any other unidentified language
  • 6. Wikipedia Collection
      <image id="244845" file="images/25/244845.jpg">
        <name>Balloons Festival - Chateaux d'Oex.jpg</name>
        <text xml:lang="en">
          <description/>
          <comment/>
          <caption article="text/en/4/331622">Balloon festival </caption>
        </text>
        <text xml:lang="de">
          <description/>
          <comment/>
          <caption/>
        </text>
        <text xml:lang="fr">
          <description/>
          <comment/>
          <caption/>
        </text>
        <comment>(Balloon festival in Chateaux d'Oex. Category:Chateau d'Oex Category:Hot air balloons) </comment>
        <license>GFDL</license>
      </image>
  • 7. ImageCLEF 2010 Wikipedia Retrieval Task
      • 70 test topics
      • each consisting of a textual and a visual part
      • three title fields (one per language: English, German, French)
      • one or more example images
  • 8. Wikipedia Topic
      <topic>
        <number>8</number>
        <title xml:lang="en">tennis player on court</title>
        <title xml:lang="de">tennisspieler auf dem platz</title>
        <title xml:lang="fr">joueur de tennis sur le terrain</title>
        <image>2197587684_94542c6fbd.jpg</image>
        <image>777629689_443a25ba08.jpg</image>
      </topic>
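As an illustration of how a topic like the one on slide 8 might be read, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The function name `parse_topic` and the returned dictionary layout are assumptions made for this example; they are not part of the ImageCLEF tooling or of MMRetrieval.net.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined xml: prefix under this namespace URI.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

# The topic from slide 8, copied verbatim.
TOPIC_XML = """<topic>
<number>8</number>
<title xml:lang="en">tennis player on court</title>
<title xml:lang="de">tennisspieler auf dem platz</title>
<title xml:lang="fr">joueur de tennis sur le terrain</title>
<image>2197587684_94542c6fbd.jpg</image>
<image>777629689_443a25ba08.jpg</image>
</topic>"""


def parse_topic(xml_text):
    """Return the topic number, per-language titles, and example image names.

    Hypothetical helper: the dictionary layout is chosen for illustration only.
    """
    root = ET.fromstring(xml_text)
    return {
        "number": int(root.findtext("number")),
        # Textual part: one title per language code.
        "titles": {t.get(XML_NS + "lang"): t.text for t in root.findall("title")},
        # Visual part: one or more example images.
        "images": [img.text for img in root.findall("image")],
    }


topic = parse_topic(TOPIC_XML)
```

The textual part ends up keyed by language code and the visual part is a plain list, which mirrors the "textual and visual part" structure described on slide 7.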
  • 9. Extraction of Modalities
      • Image descriptors: Joint Composite Descriptor (JCD), Spatial Color Distribution (SpCD)
      • Text fields: description, comment, caption, article, name
      • Languages: English, French, German
      • Lemur Toolkit V4.11 and Indri V2.11 with the tf.idf retrieval model
  • 11. Fusion in Information Retrieval
      • combining evidence about relevance from different sources of information
      • from several modalities
      • fusion consists of two components
      • score normalization
      • score combination
  • 12. Score Normalization
      • the relevance scores of different modalities are not comparable
      • popular text retrieval models (tf.idf) can be turned into probabilities of relevance via the score-distributional method
      • image descriptors do not fit this approach
      • MinMax (maps scores linearly to [0, 1])
      • Zscore (maps a score to the number of standard deviations it lies above or below the mean score)
      • non-linear Known-Item Aggregate Cumulative Density Function (KIACDF)
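The two linear normalizations named on slide 12 can be sketched as follows. This is a minimal illustration, assuming each modality returns a plain list of raw scores for one query; the function names `min_max` and `z_score`, the use of the population standard deviation, and the handling of constant score lists are choices made for this example, not details taken from the slides. KIACDF is not sketched because the slides do not define it.

```python
from statistics import mean, pstdev


def min_max(scores):
    """Map scores linearly onto [0, 1] (MinMax)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Assumption: a constant list carries no ranking information.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def z_score(scores):
    """Express each score as standard deviations above or below the mean."""
    mu = mean(scores)
    sigma = pstdev(scores)  # population standard deviation (assumption)
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


raw = [2.0, 4.0, 6.0]
normalized = min_max(raw)   # lowest score becomes 0, highest becomes 1
standardized = z_score(raw)  # the mean score becomes 0
```

After either transformation, scores from a text engine and from an image descriptor live on a common scale, which is the precondition for combining them.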
  • 13. Score Combination
      • CombSUM
      • CombMULT
      • CombMAX
      • CombMED
      • CombWSUM
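The slide lists the combination rules by name only. Reading the suffixes as sum, product, maximum, median, and weighted sum, a minimal sketch could look like the one below. It assumes `scores` holds one item's already-normalized scores, one per modality; the function names and the weight convention are illustrative assumptions.

```python
from math import prod
from statistics import median


def comb_sum(scores):
    """Sum of the item's per-modality scores."""
    return sum(scores)


def comb_mult(scores):
    """Product of the item's per-modality scores."""
    return prod(scores)


def comb_max(scores):
    """Highest per-modality score."""
    return max(scores)


def comb_med(scores):
    """Median per-modality score."""
    return median(scores)


def comb_wsum(scores, weights):
    """Weighted sum; weights[k] scales the k-th modality (assumed convention)."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(w * s for w, s in zip(weights, scores))


item_scores = [0.2, 0.5, 0.8]  # e.g. one text score and two descriptor scores
fused = comb_wsum(item_scores, [0.5, 0.25, 0.25])
```

The weighted sum is the only rule with free parameters, which is why choosing the weights is a tuning problem in its own right.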
  • 14. Results
      Rank  Participant  MAP     Participant  P@10    Participant  P@20
      1     xrce         0.2765  xrce         0.6114  xrce         0.5407
      2     unt          0.2251  duth         0.5200  duth         0.4836
      3     telecom      0.2227  i2rcviu      0.4971  telecom      0.4407
      4     i2rcviu      0.2126  cheshire     0.4929  cheshire     0.4364
      5     dcu          0.2039  telecom      0.4914  sztaki       0.4329
      6     cheshire     0.2014  sztaki       0.4857  i2rcviu      0.4321
      7     duth         0.1998  daedalus     0.4471  daedalus     0.4029
      8     uned         0.1927  unt          0.4314  unt          0.3986
      9     daedalus     0.1820  dcu          0.4271  dcu          0.3907
      10    sztaki       0.1794  uned         0.4200  uned         0.3671
      11    nus          0.1581  nus          0.3529  nus          0.3264
      12    rgu          0.0617  rgu          0.2271  uaic         0.1529
      13    uaic         0.0423  uaic         0.1543  rgu          0.1514
  • 15. Corrected Results
      Rank  Participant  MAP     Participant  P@10    Participant  P@20
      1     xrce         0.2765  xrce         0.6114  xrce         0.5407
      2     duth         0.2561  duth         0.5257  duth         0.4900
      3     unt          0.2251  i2rcviu      0.4971  telecom      0.4407
      4     telecom      0.2227  cheshire     0.4929  cheshire     0.4364
      5     i2rcviu      0.2126  telecom      0.4914  sztaki       0.4329
      6     dcu          0.2039  sztaki       0.4857  i2rcviu      0.4321
      7     cheshire     0.2014  daedalus     0.4471  daedalus     0.4029
      8     uned         0.1927  unt          0.4314  unt          0.3986
      9     daedalus     0.1820  dcu          0.4271  dcu          0.3907
      10    sztaki       0.1794  uned         0.4200  uned         0.3671
      11    nus          0.1581  nus          0.3529  nus          0.3264
      12    rgu          0.0617  rgu          0.2271  uaic         0.1529
      13    uaic         0.0423  uaic         0.1543  rgu          0.1514
  • 16. Fusion Problems
      • appropriate weighting of modalities and score normalization/combination are not trivial problems
      • if results are assessed by visual similarity only, fusion is not a theoretically sound method
  • 17. Content-based Image Retrieval Problems
      • Content-based Image Retrieval (CBIR) with global features is notoriously noisy for image queries of low generality, where generality is the fraction of relevant images in a collection.
      • It does not scale up well to large databases efficiency-wise.
  • 18. Two-Stage Image Retrieval
      • how it works: first use the secondary modality to rank the collection, then perform CBIR only on the top-K items
      • assumption: the image is the primary modality and the text is the secondary one
      • hypothesis: CBIR can do better than text retrieval in small sets or sets of high query generality
      • efficiency benefit: using a 'cheaper' secondary modality also improves efficiency by cutting down on costly CBIR operations
      • possible drawback: relevant images with empty or very noisy secondary modalities would be completely missed
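The two-stage idea on slide 18 can be sketched as below for a fixed K. This is a minimal illustration under stated assumptions: `text_scores` maps item ids to text-retrieval scores, `image_score` is a hypothetical callable standing in for the costly CBIR comparison, and items below the cut-off keep their text order (the slides do not say what happens to them).

```python
def two_stage_rank(text_scores, image_score, k):
    """Rank by text first, then re-rank only the top-k items by image score.

    text_scores: dict of item id -> score from the secondary (text) modality.
    image_score: callable, item id -> visual similarity to the query image(s);
                 hypothetical stand-in for a CBIR engine.
    k:           static cut-off chosen in advance (assumption for this sketch).
    """
    # Stage 1: rank the whole collection with the cheap secondary modality.
    by_text = sorted(text_scores, key=text_scores.get, reverse=True)
    head, tail = by_text[:k], by_text[k:]
    # Stage 2: CBIR is evaluated for the k head items only.
    head = sorted(head, key=image_score, reverse=True)
    # Assumption: items outside the top-k keep their text-based order.
    return head + tail


text = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1}
visual = {"a": 0.2, "b": 0.9, "c": 0.5, "d": 1.0}
ranking = two_stage_rank(text, visual.get, k=3)
```

In this toy run item `d` is visually the best match but stays last because its text score keeps it out of the top 3, which illustrates the drawback listed on the slide.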
  • 19. Previous Work
      • Re-ranking the best results by visual content has been seen before, mostly in different setups.
      • All these approaches employed a static, predefined K for all queries.
      • It is not clear whether this works.
  • 20. Our Two-Stage Method
      • dynamic K
      • calculated dynamically per query
      • optimizes a predefined effectiveness measure
      • without using external information or training data
  • 21. Retrieval Results
      [Figure: retrieval results for the query "cockpit of an airplane"; panels: Image Only, Text Only, Static K=25, Dynamic K]
  • 22. Best Fusion Method – Max of Sums
      • i is the index running over the example images (i = 1, 2, …)
      • j is the index running over the visual descriptors (j ∈ {1, 2})
      • DESC_ji is the score against the i-th example image for the j-th descriptor
      • the parameter w controls the relative contribution of the two media
      $$ s = (1 - w)\,\max_i \sum_j \mathrm{MinMax}(\mathrm{DESC}_{ji}) + w\,\mathrm{MinMax}(\mathrm{tf.idf}) $$
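The max-of-sums score on slide 22 can be computed for a single item as sketched below. The sketch assumes all inputs are already MinMax-normalized and that `desc_scores[j][i]` holds the score for descriptor j against example image i; the function name and argument layout are illustrative, not taken from the MMRetrieval.net code.

```python
def max_of_sums(desc_scores, text_score, w):
    """Fuse visual and textual evidence for one item.

    desc_scores: list indexed by descriptor j, each a list indexed by
                 example image i, holding MinMax-normalized scores.
    text_score:  MinMax-normalized tf.idf score of the item.
    w:           weight in [0, 1] controlling the text contribution.
    """
    n_examples = len(desc_scores[0])
    # For each example image, add up the scores of all descriptors,
    # then keep the best example image.
    visual = max(
        sum(per_descriptor[i] for per_descriptor in desc_scores)
        for i in range(n_examples)
    )
    return (1 - w) * visual + w * text_score


# Two descriptors (rows) scored against two example images (columns).
scores = [[0.2, 0.6],
          [0.4, 0.5]]
s = max_of_sums(scores, text_score=0.3, w=0.5)
```

Here the second example image gives the larger descriptor sum (0.6 + 0.5), so it alone determines the visual term before the text score is mixed in.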
  • 24. Implementation
      • developed in the C#/.NET Framework 4.0
      • HTML, CSS and JavaScript (AJAX) technologies for the interface
      • requires a fairly modern browser
  • 25. Directions for Further Research
      • Multi-stage retrieval for multimodal databases based on modality hierarchy.
      • Fuzzy fusion (replace w with a membership function m).
      • Create artificial modalities (not only from relevance scores).
      • Pseudo-relevance feedback – cross-media feedback.
  • 26. Publications
      • Multimedia Search with Noisy Modalities: Fusion and Multistage Retrieval. Avi Arampatzis, Savvas A. Chatzichristofis, and Konstantinos Zagoris. In: CLEF (Notebook Papers/LABs/Workshops), 22-23 September, Padua, Italy, 2010.
      • www.MMRetrieval.net: A Multimodal Search Engine. Konstantinos Zagoris, Avi Arampatzis, and Savvas A. Chatzichristofis. In: Proceedings of the 3rd International Conference on SImilarity Search and APplications, SISAP 2010, Istanbul, Turkey, September 18-19, 2010. © Association for Computing Machinery (ACM).