Serwah Sabetghadam
PhD Defense Presentation
Institute of Software Technology and Interactive Systems (IFS Group)
Vienna University of Technology
Supervisors
Ao.univ.Prof. Dr. Andreas Rauber
Dr. Mihai Lupu
A Graph-based Model for Multimodal
Information Retrieval
Rapid growth of the multimodal content
On average 350 million photos are uploaded
daily to Facebook
Multimodal Information Retrieval
has become a challenge
Multimodal Information Retrieval (IR)
- Search for information of any modality with an information need that can be unimodal or multimodal
- Unimodal query: only keywords or an image example
- Multimodal query: a combination of images, video, or music files
Outline: Introduction | Model | Reachability (C1-Facets, C2-Links, C3-Topic Categories, C4-Graph visit) | Precision | Conclusion
An example of an information need:
find paintings like this image
multimodal query
Motivation (1)
- Conventional text search is used to find multimodal results
  • E.g. text-based image retrieval
  • Lacks the indexing information of other modalities
    o Content-based image retrieval
  • Motivates the use of different modalities
- Traditional IR does not explicitly include relations between documents
  • Documents are not isolated anymore
  • Hyperlinks, metadata and semantic connections
  • User-generated multimodal content, multimodal collections
  • Motivates the use of structured IR too
Motivation (2)
Multimodal data is interlinked
Structured data is represented by graphs
Related work considers only one type of relation and one type of modality in the graphs
Motivations:
- To consider different modalities
- To use a graph of objects
- To use different types of relations (e.g. semantic or similarity)
- To consider contained information objects separately
Research Questions
- RQ1: How to define a graph-based model for multimodal information retrieval?
- RQ2: In such a graph model, can the relevant nodes be reached?
- RQ3: In such a model, can scores identify the relevant nodes?
Contributions
Model
- RQ1: How to define a graph-based model for multi-modal information retrieval (MMIR)?
- Contributions: Astera model for MMIR; defined search based on facets
- Publications: [ECIR Workshop 2013] [IRFC 2014] [ICMR 2014]
Reachability
- RQ2: In such a graph model, can the relevant nodes be reached (recall)?
- Contributions: analysis of the effect of different facets and links on reachability of relevant nodes
- Publications: [ECIR 2015]
Precision
- RQ3: In such a model, can scores identify the relevant nodes?
- Contributions: analysis of the effect of different facets on precision; the effect of query-dependent and query-independent routing on precision
- Publications: [CLEF 2014] [GSB 2015] [Keystone 2016]
An Information Object and Different Representations/Facets
An information object may have several representations or facets.
Facet: an inherent feature or property of an information object
[Figure: an image as an information object with five facets, F1 to F5]
RQ1: How to define a graph-based model for multi-modal information retrieval?
- We propose a graph of information objects G = (V, E) named Astera
  • V is the set of vertices (information objects and their facets)
  • E is the set of edges (defined by different types of relations)
- Different types of relations
  • Semantic (α): any semantic relation between two objects
  • Part-of (β): an object is part of another object, e.g. an image in a document
  • Similarity (γ): relation between facets of the same type of two information objects
  • Facet (δ): links an object to its facets
S. Sabetghadam, M. Lupu, and A. Rauber, “Astera - A Generic Model for Multimodal Information Retrieval”, in
Proceedings of Integrating IR Technologies for Professional Search Workshop, held in ECIR 2013, pp. 551-554.
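The graph definition above can be illustrated with a small data structure. This is only a sketch under stated assumptions: the class `AsteraGraph`, the node naming scheme and the undirected storage of edges are hypothetical choices made for illustration, not the actual Astera implementation.

```python
from collections import defaultdict
from enum import Enum


class Relation(Enum):
    SEMANTIC = "alpha"    # semantic relation between two objects
    PART_OF = "beta"      # an object is part of another object
    SIMILARITY = "gamma"  # relation between facets of the same type
    FACET = "delta"       # links an object to one of its facets


class AsteraGraph:
    """Toy adjacency-list graph with typed, weighted edges (hypothetical class)."""

    def __init__(self):
        # node -> list of (neighbour, relation, weight)
        self.edges = defaultdict(list)

    def add_edge(self, source, target, relation, weight=1.0):
        # Assumption: edges are stored in both directions for simplicity.
        self.edges[source].append((target, relation, weight))
        self.edges[target].append((source, relation, weight))

    def neighbours(self, node, relations=None):
        """Neighbours of a node, optionally restricted to some relation types."""
        return [t for t, r, _ in self.edges.get(node, [])
                if relations is None or r in relations]


graph = AsteraGraph()
graph.add_edge("doc:Elvis Presley", "doc:Graceland", Relation.SEMANTIC)
graph.add_edge("img:elvis", "doc:Elvis Presley", Relation.PART_OF)
graph.add_edge("doc:Elvis Presley", "facet:doc:Elvis Presley:TF.IDF", Relation.FACET)
graph.add_edge("img:elvis", "facet:img:elvis:ColorHistogram", Relation.FACET)
graph.add_edge("img:graceland", "facet:img:graceland:ColorHistogram", Relation.FACET)
graph.add_edge("facet:img:elvis:ColorHistogram",
               "facet:img:graceland:ColorHistogram", Relation.SIMILARITY, weight=0.8)
```

Filtering `neighbours` by relation type makes it easy to run a traversal over a chosen subset of links, for example only part-of and facet links.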
An Example of the Model
[Figure: an example graph built up step by step around the nodes "Elvis Presley", "Graceland (Home of Elvis Presley)" and "Rockabilly", with TF.IDF, Color Histogram and Edge Histogram facet nodes and links labelled α, β, δ and γ]
Legend: α: Semantic, β: Part-of, δ: Facet, γ: Similarity
Hybrid Search
- Standard search
  • Text: Lucene
  • Image: LIRE
  • Any similarity computation framework
- Graph search
  • Start from the top results of the standard search and traverse the graph
  • We take the top 20 results of each facet
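The two stages of hybrid search can be sketched as follows. This is a minimal illustration, assuming the standard search results are already available as per-facet score dictionaries; the function names, the plain adjacency dictionary and the breadth-first traversal are assumptions for this sketch, not the Astera code, and no Lucene or LIRE call is made.

```python
def top_k(scores, k=20):
    """Return the k best-scoring node ids of one facet's standard search result."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [node for node, _ in ranked[:k]]


def hybrid_search(adjacency, facet_results, k=20, max_steps=3):
    """Seed the graph search with the top-k results of each facet, then traverse."""
    seeds = []
    for scores in facet_results.values():
        for node in top_k(scores, k):
            if node not in seeds:
                seeds.append(node)

    visited = set(seeds)
    frontier = list(seeds)
    for _ in range(max_steps):
        next_frontier = []
        for node in frontier:
            for neighbour in adjacency.get(node, []):
                if neighbour not in visited:
                    visited.add(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return seeds, visited


# Toy data (hypothetical): two documents, three images, two facets.
adjacency = {"D1": ["I1"], "I1": ["D1", "I2"], "I2": ["I1"], "D2": ["I3"], "I3": ["D2"]}
facet_results = {"TF.IDF": {"D1": 0.9, "D2": 0.1}, "CEDD": {"I2": 0.8}}
seeds, visited = hybrid_search(adjacency, facet_results, k=1, max_steps=1)
```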
Relevance Score Value Function (RSV)
- Reaching a node, we calculate the similarity of its facets with the query facets
- The result is a score given to this node
- The RSV formula combines a normalization function, a similarity function, and the weight of each facet f_i
S. Sabetghadam, M. Lupu, and A. Rauber, “A Combined Approach of Structured and Non-structured IR in multimodal
domain,” in Proceedings of International Conference on Multimedia Retrieval, ICMR 2014, pp. 491-494.
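The formula itself did not survive on this slide; only its three ingredients are named. One plausible reading, stated here as an assumption, is a weighted sum over facets: RSV(Q, v) = sum over i of w_i * norm(sim(q_i, v_i)). The sketch below implements that assumed form; `rsv`, `cosine` and the identity normalization are illustrative names and choices.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equally long feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rsv(query_facets, node_facets, weights, similarity=cosine, normalize=lambda s: s):
    """Assumed form: weighted sum of normalized per-facet similarities."""
    score = 0.0
    for facet, weight in weights.items():
        # Only facets present on both the query and the node contribute.
        if facet in query_facets and facet in node_facets:
            score += weight * normalize(similarity(query_facets[facet], node_facets[facet]))
    return score


# Hypothetical two-dimensional facet vectors, for illustration only.
query = {"TF.IDF": [1.0, 0.0], "CEDD": [0.0, 1.0]}
node = {"TF.IDF": [1.0, 0.0], "CEDD": [1.0, 0.0]}
score = rsv(query, node, weights={"TF.IDF": 0.7, "CEDD": 0.3})
```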
Graph Traversal Methods (1)
- Two of the well-known methods: Spreading Activation, Random Walks
- Spreading activation
  • Based on the associative retrieval idea [Crestani97]
    o Nodes and associations
  • Some nodes get activated
  • Energy propagates to the neighbours
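The propagation idea can be sketched in a few lines. This is a simplified variant under assumptions of my own: energy is split equally among neighbours and damped by a `decay` factor at each step; the slides do not specify the exact activation or decay functions used.

```python
def spread_activation(adjacency, initial, steps=2, decay=0.5):
    """Propagate energy from initially activated nodes to their neighbours."""
    activation = dict(initial)   # total energy accumulated per node
    frontier = dict(initial)     # energy that is still to be propagated
    for _ in range(steps):
        incoming = {}
        for node, energy in frontier.items():
            neighbours = adjacency.get(node, [])
            if not neighbours:
                continue
            # Assumption: equal split among neighbours, damped by the decay factor.
            share = decay * energy / len(neighbours)
            for neighbour in neighbours:
                incoming[neighbour] = incoming.get(neighbour, 0.0) + share
        for node, energy in incoming.items():
            activation[node] = activation.get(node, 0.0) + energy
        frontier = incoming
    return activation


adjacency = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
activation = spread_activation(adjacency, {"a": 1.0}, steps=1, decay=0.5)
```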
Graph Traversal Methods (2)
- Random Walks
  • A chain of states created by some stochastic process
  • Stationary distribution of the graph
- The two methods are in principle the same
  • Under certain conditions
  • Both are used in different experiments
S. Sabetghadam, M. Lupu, A. Rauber, “Which one to choose? Spreading Activation or Random Walks?”,
Information Retrieval Facility Conference, IRFC 2014, pp. 112-119.
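As an illustration of the stationary distribution of a random walk, the sketch below row-normalizes a weight matrix into a transition matrix and applies power iteration. It assumes every node has at least one outgoing edge and that the chain is irreducible and aperiodic; the function names and the toy matrix are illustrative only.

```python
def row_normalize(weights):
    """Turn a non-negative weight matrix into a row-stochastic transition matrix."""
    # Assumption: every row has a positive sum (each node has an outgoing edge).
    return [[w / sum(row) for w in row] for row in weights]


def stationary_distribution(transition, iterations=1000, tol=1e-12):
    """Power iteration: repeatedly apply pi <- pi * P until it stops changing."""
    n = len(transition)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        updated = [sum(pi[i] * transition[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(updated, pi)) < tol:
            return updated
        pi = updated
    return pi


# Symmetric toy graph with self-loops (hypothetical weights).
weights = [[1, 1, 0],
           [1, 1, 1],
           [0, 1, 1]]
pi = stationary_distribution(row_normalize(weights))
```

For a symmetric weight matrix such as this one, the stationary probability of a node is proportional to its row sum, which gives a quick sanity check of the result.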
Data collection
- ImageCLEF 2011 Wikipedia collection
- About 400,000 documents and images
  • 125,828 documents, 273,434 images
Query data
- 50 topics
  • Easy, medium, hard, very hard [Tsikrika 2011]
- Query
  • Keywords
  • Four/five image examples
Query example: “Flying bird”
Visual features: CEDD, TLEP, SURF, CIME
Textual features: TF.IDF, LM, BM25
Image Metadata Provided by the Collection
Query: Flying Bird
The Collection Mapped to our Model
[Figure: documents D1 to D4 and images I1 to I4 with their TF.IDF and CEDD facet nodes, connected by Facet and Part-of links]
Hybrid Search Example
[Figure: step-by-step animation of the hybrid search on the mapped collection (documents D1 to D4, images I1 to I4, TF.IDF and CEDD facet nodes)]
Summary
- We proposed a model which supports
  • Different modalities
  • Different relation types
  • Decomposition of an information object into facets
- Hybrid search
  • Standard and graph search
- Mapped the collection to our model
Relevant Data Distribution
- 40 steps
- In each step
  • Check whether we visit a new relevant image
[Figure: distribution of relevant images over the traversal steps; shape size: number of relevant nodes at a step relative to the total number of relevant nodes]
RQ2: In such a graph model, can the relevant nodes be reached?
C1: Reachability Analysis from Different Facets
C2: Reachability Analysis from Different Links
C3: Reachability Analysis of Different Topic Categories
C4: Graph Visit from Different Facets
Recall from Document Textual Facets
- Links used
  • Facet and Part-of
- Top 20 results of each facet
- Traverse the graph from these results
- Calculate recall in each step
Observation: similar recall behaviour
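The experiment setup above (start from seed results, traverse, measure recall at every step) can be sketched as a breadth-first traversal. The adjacency dictionary, the seed list and the relevance judgments below are toy assumptions; in the experiments the seeds would be the top 20 results of a facet.

```python
def recall_per_step(adjacency, seeds, relevant, max_steps=40):
    """Recall of relevant nodes after each traversal step, starting from the seeds."""
    visited = set(seeds)
    frontier = set(seeds)
    recalls = [len(visited & relevant) / len(relevant)]  # recall at step 0
    for _ in range(max_steps):
        # Nodes that become reachable in this step and were not seen before.
        frontier = {nb for node in frontier for nb in adjacency.get(node, [])} - visited
        visited |= frontier
        recalls.append(len(visited & relevant) / len(relevant))
    return recalls


adjacency = {"s": ["a"], "a": ["s", "r1"], "r1": ["a", "r2"], "r2": ["r1"]}
recalls = recall_per_step(adjacency, seeds=["s"], relevant={"r1", "r2"}, max_steps=3)
```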
Recall from Image Textual Facets
- Same experiment, starting from the results of the image metadata textual facets
Observation: similar recall behaviour
Recall from Image Visual Facets
- Same experiment, starting from the results of the image visual facets
Observation: similar recall behaviour
Representative Facet from each Category of Facets
- Document textual facets: TF.IDF_D
- Image textual facets: LM_I
- Image visual facets: CEDD
Facet Combinations
- Links: part-of, facet
  • No semantic/similarity links
- Fewer visited nodes, higher recall: TF.IDF_D, LM_I > TF.IDF_D, CEDD
- Highest recall: TF.IDF_D, CEDD, LM_I
[Figure: recall and percentage of graph seen per step for the combinations TF.IDF_D, CEDD and TF.IDF_D, LM_I]
Observations
- Different facets lead to visiting different parts of the collection
- This reinforces the importance of the poly-representation idea to identify the relevant objects
- Still limited access to the graph: only half of the graph is reachable
Sabetghadam S., Lupu M., Bierig R., and Rauber A., "Reachability Analysis of Graph Modelled Collections", 37th
European Conference on Information Retrieval, ECIR 2015, pp. 370-381.
RQ2: In such a graph model, can the relevant nodes be reached?
C2: Reachability Analysis from Different Links (α, β, δ, γ)
Recall Baseline Graph vs Lucene Results
- Base graph recall: 0.76 (links: β Part-of, δ Facet)
- Better than Lucene: 0.66
Adding Semantic Links
- Using the DBpedia dump
- Adding semantic links between equivalent pages
- 55,544 intra-lingual links added
- 100,653 inter-lingual links added
Recall after Adding Semantic Links
- Semantic links added from the DBpedia dump
- Facets to start from
  • TF.IDF_D, CEDD, LM_I
- Recall increase of 10% after adding semantic links
[Figure: recall values 0.66, 0.76, 0.84 and 0.98 for Lucene and for graphs with different link types; legend: α: Semantic, β: Part-of, δ: Facet, γ: Similarity]
Question: Is this recall increase just because of adding more links?
Recall comparison: Semantic vs. Random
- We added the same number of random links between docs
- Higher recall with random links, but at the expense of visiting almost all of the graph!
[Figure: average recall and percentage of graph seen per step, for semantic links and for random links]
Question: How does visiting this large number of nodes affect precision?
Adding random links - Precision loss
Adding semantic links effectively helped to increase recall.
Recall after Adding Similarity Links
We did the same analysis after adding similarity links.
They are effective in reaching more relevant nodes.
RQ2: In such a graph model, can the relevant nodes be reached?
C3: Reachability Analysis of Different Topic Categories
Different Topic Categories (Tsikrika 2011)
Recall analysis of different topic categories
- Links: Part-of, Facet
- Only TF.IDF_D: high recall gain for hard and very hard topics
- All three facets: facet combination increased the recall pace
[Figure: recall per step by topic category; annotated recall gains: 138%, 128%, 373%, 266%]
Base graph vs added Semantic links
[Figure: recall by topic category for "Only TF.IDF_D, without semantic links" (links: Part-of, Facet) and "Only TF.IDF_D, with semantic links" (links: Part-of, Facet, Semantic); annotated gains: 13%, 8%]
We obtained a 13% increase in recall for hard topics by using semantic links
Observations
- The graph structure outpaces the Lucene results
  • with a 373% recall increase for very hard topics
- Leveraging multiple facets
  • saved at least 5 steps to reach the same recall compared to using only one facet
- Adding semantic links
  • increased recall for very hard and hard topics by 13% and 8%
  • raised the recall value considerably already in the first few steps
[Figure: recall curves of the document textual facets, image textual facets and image visual facets]
Question: Do facets with similar recall behaviour also visit the same parts of the graph and the same relevant nodes?
RQ2: In such a graph model, can the relevant nodes be reached?
C4: Graph Visit from Different Facets (β, δ)
Do different facets with the same recall value visit the same parts of the graph?
- Percentage of different nodes visited in a step, based on the nodes seen in a step for facet f_i and the nodes reachable only through facet f_i
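The exact formula is not preserved on this slide, only the two quantities it uses. A plausible reading, stated here as an assumption, is the share of the nodes seen for facet f_i in a step that no other facet has reached. The sketch below computes that ratio with set operations; the function name and the toy node sets are illustrative.

```python
def distinct_visit_ratio(seen_by_facet, facet):
    """Share of the nodes seen for one facet that no other facet has reached."""
    own = seen_by_facet[facet]
    if not own:
        return 0.0
    # Union of the nodes seen when starting from any of the other facets.
    others = set().union(*(nodes for name, nodes in seen_by_facet.items() if name != facet))
    return len(own - others) / len(own)


# Hypothetical node ids seen in one step when starting from each facet.
seen = {"TF.IDF": {1, 2, 3, 4}, "BM25": {3, 4, 5}, "LM": {4, 6}}
ratio_tfidf = distinct_visit_ratio(seen, "TF.IDF")
ratio_lm = distinct_visit_ratio(seen, "LM")
```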
Graph Visit - Doc Textual Facets
[Figure: ratio of nodes seen and ratio of relevant nodes seen per step]
Links: Part-of, Facet
Graph Visit - Image Textual Facets
[Figure: ratio of nodes seen and ratio of relevant nodes seen per step]
Image textual facets visit at least 20% different parts of the graph
Links: Part-of, Facet
Graph Visit - Image Visual Facets
[Figure: ratio of nodes seen and ratio of relevant nodes seen per step]
Image visual facets visit very different parts, but not so many different relevant nodes.
Links: Part-of, Facet
Observations
- Reinforces the importance of the poly-representation idea
  • to identify relevant information objects
- The image visual facets show the same recall behaviour
  • but they visit totally different relevant images in the first steps (up to 10)
- The LM (Language Model) facet has a more divergent view than the BM25 and TF.IDF facets
RQ3: In such a model, can scores identify the relevant nodes?
C1: The effect of different facet combinations on precision
C2: Precision in query-dependent and query-independent routing
Baseline test (1) - No Graph
- Standard test
  • Text-based search
[Figure: documents D1, D2, D3 with their images (I11, I12, I13, I22, I31, I32) and the resulting ranked list of images]
Baseline test (2) - No Graph
- Reranked by image similarity computation
- Each image has two scores: text similarity score, image similarity score
[Figure: the ranked list of images from documents D1, D2, D3 and the list reranked by similarity with the query images]
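The reranking baseline can be illustrated with a short sketch. It assumes the text-based ranked list and a per-image similarity score to the query images are given; the function name and the scores are made up for illustration, and how the real baseline breaks ties or combines the two scores is not stated on the slide.

```python
def rerank_by_image_similarity(text_ranked, image_similarity):
    """Reorder a text-based ranked list by image similarity to the query images."""
    # sorted() is stable, so images with equal similarity keep their text-based order.
    return sorted(text_ranked, key=lambda image: image_similarity.get(image, 0.0),
                  reverse=True)


text_ranked = ["I11", "I12", "I13", "I22"]                              # from text search
image_similarity = {"I11": 0.9, "I12": 0.2, "I13": 0.7, "I22": 0.2}     # hypothetical scores
reranked = rerank_by_image_similarity(text_ranked, image_similarity)
```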
Precision with different facet combinations, st: 0.9, links: β, δ
- All facets > TF.IDF_D & LM_I > TF.IDF_D & CEDD > TF.IDF_D
- Precision increase by 9%
[Figure: precision for TF.IDF_D; TF.IDF_D & CEDD; TF.IDF_D & LM_I; TF.IDF_D & CEDD & LM_I]
Sabetghadam S., Lupu M., Bierig R., and Rauber A., "A Hybrid Approach for Multi-Faceted IR in Multi-modal Domain",
5th Conference of Labs and Evaluation Forums, CLEF 2014, pp. 86-97.
RQ3: In such a model, can scores identify the relevant nodes?
C2: Precision in very large steps with query-dependent and query-independent routing
How does the graph behave in very large steps?
- Normalized weighting in the graph to satisfy the stochastic property
- Random Walk
- See the graph in the stationary distribution
- Compare query-dependent and query-independent routing
Query-dependent and Query-independent Routing
- Random Walks as query-independent routing
  • The basic definition of RW does not consider relevancy to the query
  • What we need is only the transition matrix
- Metropolis-Hastings as query-dependent routing
  • In each step
    o The relevance of the source and target nodes to the query is considered
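The difference between the two routing schemes can be made concrete with a small Metropolis-Hastings sketch. It assumes a uniform proposal over the neighbours of the current node and a strictly positive relevance score per node as the target distribution; the function names, graph and scores are hypothetical and not taken from the Astera experiments.

```python
import random


def acceptance_probability(current, candidate, relevance, neighbours):
    """Metropolis-Hastings acceptance for a uniform proposal over neighbours."""
    # With q(i, j) = 1 / deg(i), the ratio pi(j) q(j, i) / (pi(i) q(i, j)) becomes:
    ratio = (relevance[candidate] * len(neighbours[current])) / (
        relevance[current] * len(neighbours[candidate]))
    return min(1.0, ratio)


def metropolis_hastings_walk(start, steps, relevance, neighbours, rng):
    """Walk the graph, accepting moves based on source and target relevance."""
    visits = {node: 0 for node in neighbours}
    current = start
    for _ in range(steps):
        candidate = rng.choice(neighbours[current])
        if rng.random() < acceptance_probability(current, candidate, relevance, neighbours):
            current = candidate
        visits[current] += 1
    return visits


neighbours = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
relevance = {"a": 0.6, "b": 0.3, "c": 0.1}   # assumed query relevance scores, all > 0
visits = metropolis_hastings_walk("c", 20000, relevance, neighbours, random.Random(0))
```

In the long run the walk visits nodes roughly in proportion to their relevance, whereas a plain random walk on this fully connected toy graph would visit all three nodes equally often.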
Precision analysis of Random Walk and Metropolis-Hastings
[Figure: precision plots for Random Walk and for Metropolis-Hastings]
Sabetghadam S., Lupu M., and Rauber A., "Leveraging Metropolis-Hastings Algorithm on Graph-based Model for
multimodal IR", GSB'15: First International Workshop on Graph Search and Beyond, held at SIGIR 2015, pp. 14-18.
Observations
- The combination of three facets outperforms each combination of two
- Precision increased by 9%
- Compared the performance of query-dependent and query-independent Random Walks
- Higher precision with the query-dependent RW
Sabetghadam S., Lupu M., Rauber A., "Random Walks Analysis on Graph Modelled Multimodal Collections",
Second International KEYSTONE Conference, Keystone 2016.
Conclusion (1)
- RQ1: How to define a graph-based model for multi-modal multi-faceted information retrieval?
- We defined a graph-based model which supports
  • Different modalities
  • Different types of relations
  • Our defined search based on facets
- Decomposes an information object into its facets
- Defined a relevancy function based on the poly-representation principle
- Calculates the relevancy of an information object to the query to select the starting points in the graph
Conclusion (2)
- RQ2: In such a graph model, can the relevant nodes be reached?
- We showed the effect of poly-representation on recall
  • Combining facets improves the results
  • Not every combination is effective
- The graph structure helps in reaching relevant nodes, especially for hard and very hard topics
- The results from adding random links showed:
  • The effect of adding meaningful links on higher reachability
- Facets may show the same recall behaviour but visit different relevant nodes (again poly-representation)
Conclusion (3)
- RQ3: In such a model, can scores identify the relevant nodes?
- The combination of different facets resulted in better precision
- Compared query-dependent and query-independent Random Walks in the stationary distribution
- Query-dependent Random Walks show better precision
Astera Status
- Astera: 2012 - now
- Open source and available online
  • http://ifs.tuwien.ac.at/~sabetghadam/Astera.html
- Highly configurable to work with other collections
[Figure: Astera architecture; labels: Query files, Query Manager, Indexed Search Manager, Indexed Data, Graph Search, Graph Structured Data, Data Collection (three), Scoring, Ranking, Linked Data, Indexed Search Results, Final Results, Data Interface, Semantic Manager]
Scientific Publications
- Sabetghadam S., Lupu M., Rauber A., "Astera - A generic model for multi-modal Information Retrieval", Workshop on Integrating IR technologies for Professional Search, held in ECIR 2013, pp. 551-554.
- Sabetghadam S., Lupu M., and Rauber A., "Which one do you choose? Spreading Activation or Random Walks?", Information Retrieval Facility Conference, IRFC 2014, pp. 112-119.
- Sabetghadam S., Lupu M., Bierig R., and Rauber A., "A combined approach of structured and non-structured IR in multi-modal domain", in Proceedings of International Conference on Multimedia Retrieval, ICMR 2014, pp. 491-494.
- Sabetghadam S., "Astera - A model for Multimodal IR with a Combined Approach of Structured and Non-structured Retrieval", Doctoral Symposium, ICMR 2014, pp. 551.
- Sabetghadam S., Lupu M., Bierig R., and Rauber A., "A Hybrid Approach for Multi-Faceted IR in multi-modal Domain", 5th Conference of Labs and Evaluation Forums, CLEF 2014, pp. 86-97.
- Sabetghadam S., Lupu M., Bierig R., and Rauber A., "Reachability Analysis of Graph Modelled Collections", 37th European Conference on Information Retrieval, ECIR 2015, pp. 370-381.
- Sabetghadam S., Palotti J., Rekabsaz N., Lupu M., Hanbury A., "TUW at MediaEval 2015", MediaEval 2015. Obtained first place in the task "Diverse Social Image Retrieval", MediaEval 2015.
- Sabetghadam S., Lupu M., and Rauber A., "Leveraging Metropolis-Hastings Algorithm on Graph-based Model for multimodal IR", GSB'15: First International Workshop on Graph Search and Beyond, held at SIGIR 2015, pp. 14-18.
- Rekabsaz N., Sabetghadam S., Lupu M., Andersson L., Hanbury A., "Standard Test Collection for English-Persian Cross-Lingual Word Sense Disambiguation", Proceedings of the Language Resources and Evaluation Conference, LREC 2016.
- Sabetghadam S., Lupu M., Rauber A., "Random Walks Analysis on Graph Modelled Multimodal Collections", Second International KEYSTONE Conference, Keystone 2016.
- Eskevich M., Larson M., Aly R., Sabetghadam S., Jones G., Ordelman R., Huet B., "Multimodal Video-to-Video Linking: Turning to the Crowd for Insight and Evaluation", Proceedings of Multimedia Modeling, MMM 2017, pp. 280-292.
- Sabetghadam S., Lupu M., Bierig R., Rauber A., "A faceted approach to reachability analysis of graph modelled collections", International Journal of Multimedia Information Retrieval, IJMIR (2017), 10.1007/s13735-017-0145-8, pp. 1-15.

A Graph-based Model for Multimodal Information Retrieval

  • 1.
    Serwah Sabetghadam PhD DefensePresentation Institute of Software Technology and Interactive Systems (IFS Group) Vienna University of Technology Supervisors Ao.univ.Prof. Dr. Andreas Rauber Dr. Mihai Lupu A Graph-based Model for Multimodal Information Retrieval
  • 2.
    Rapid growth ofthe multimodal content
  • 3.
    On average 350million photos are uploaded daily to Facebook Multimodal Information Retrieval has become a challenge
  • 4.
    Multimodal Information Retrieval(IR)  Search for information of any modality with an information need that can be unimodal or multimodal  Unimodal query: only keywords or an image example  Multimodal query: a combination of images, video, or music files 4 modality ConclusionIntroduction Model Reachability PrecisionC2-Links C3-Topic Categories C4-Graph visitC1-Facets
  • 5.
    An example ofan information need: find paintings like this image 5 multimodal query
  • 6.
    Motivation (1)  Conventionaltext search to find multimodal result • E.g. Text based Image retrieval • Lack of indexing information of other modalities o Content-based image retrieval • Motivates to use different modalities  Traditional IR does not include explicitly relations between docs • Documents are not isolated anymore • Hyperlinks, Metadata and Semantic connections • User-generated multimodal content, multimodal collections • Motivates to use structured IR too 6ConclusionIntroduction Model Reachability PrecisionC2-Links C3-Topic Categories C4-Graph visitC1-Facets
  • 7.
    Motivation (2) Multimodal Datais Interlinked Structured data represented by Graphs Related work consider only one type of relation and one type of modality in the graphs ConclusionIntroduction Model Reachability PrecisionC2-Links C3-Topic Categories C4-Graph visitC1-Facets
  • 8.
    Motivation (2) Multimodal Datais Interlinked Structured data represented by Graphs Related work consider only one type of relation and one type of modality in the graphs ConclusionIntroduction Model Reachability PrecisionC2-Links C3-Topic Categories C4-Graph visitC1-Facets Motivations: - To consider different modalities - To use graph of objects - To use different types of relations (e.g. semantic or similarity) - To consider contained information object separately
  • 9.
    Research Questions  RQ1:How to define a graph-based model for multimodal information retrieval?  RQ2: In such a graph model, can the relevant nodes be reached?  RQ3: In such a model can scores identify the relevant nodes? 9ConclusionIntroduction Model Reachability PrecisionC2-Links C3-Topic Categories C4-Graph visitC1-Facets
  • 10.
    Contributions 10 [ECIR Workshop 2013] [IRFC2014] [ICMR 2014] Contributions RQ1: How to define a graph-based model for multi-modal information retrieval (MMIR)? [ECIR 2015] [CLEF 2014] [GBS 2015] [Keystone 2016] Astera model for MMIR Defined search based on facets RQ2: In such a graph model, can the relevant nodes be reached (recall)? Contributions Analysis of the effect of different facets and links on reachability of relevant nodes Model RQ3: In such a model can scores identify the relevant nodes? Contributions Analysis of the effect of different facets on precision. The effect of Query- dependent and – independent routing on precision Reachability Precision
  • 11.
    Contributions 11 [ECIR Workshop 2013] [IRFC2014] [ICMR 2014] Contributions RQ1: How to define a graph-based model for multi-modal information retrieval (MMIR)? [ECIR 2015] [CLEF 2014] [GBS 2016] [Keystone 2016] Astera model for MMIR Defined search based on facets RQ2: In such a graph model, can the relevant nodes be reached? Contributions Analysis of the effect of different facets and links on reachability of relevant nodes Model RQ3: In such a model can scores identify the relevant nodes? Contributions Analysis of the effect of different facets on precision. The effect of Query- dependent and – independent routing on precision Reachability Score Analysis
  • 12.
    An Information Objectand Different Representations/Facets 12 An information object may have several representations or facets. Facet: an inherent feature or property of an information object Image F3 F2 F1 F4 F5 Model ConclusionIntroduction Model Reachability PrecisionC2-Links C3-Topic Categories C4-Graph visitC1-Facets
  • 13.
    RQ1: How to define a graph-based model for multi-modal information retrieval?  We propose a graph of information objects G = (V,E) named Astera • V is the set of vertices (nodes/facets) • E is the set of edges (defined by different types of relations)  Different types of relations • Semantic (α): any semantic relation between two objects • Part-of (β): an object as part of another object, e.g. an image in a document • Similarity (γ): a relation between facets of the same type belonging to two information objects • Facet (δ): linking an object to its facets  S. Sabetghadam, M. Lupu, and A. Rauber, “Astera - A Generic Model for Multimodal Information Retrieval”, in Proceedings of Integrating IR Technologies for Professional Search Workshop, held in ECIR 2013, pp. 551-554.
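The typed-edge definition above can be illustrated with a small adjacency-list sketch. This is only an illustration, not the Astera implementation: the class name `AsteraGraph`, its methods, the node identifiers and the edge directions are all hypothetical choices made for the example.

```python
from collections import defaultdict
from enum import Enum


class Relation(Enum):
    SEMANTIC = "alpha"    # any semantic relation between two objects
    PART_OF = "beta"      # an object contained in another object
    SIMILARITY = "gamma"  # facets of the same type of two objects
    FACET = "delta"       # an object linked to one of its facets


class AsteraGraph:
    """Toy graph G = (V, E) with typed edges (hypothetical helper class)."""

    def __init__(self):
        # node -> list of (neighbour, relation) pairs
        self.edges = defaultdict(list)

    def add_edge(self, source, target, relation):
        self.edges[source].append((target, relation))

    def neighbours(self, node, allowed=None):
        """Neighbours of a node, optionally restricted to some relation types."""
        return [t for t, r in self.edges[node] if allowed is None or r in allowed]


# Node names are made up; the direction of each edge is an assumption.
graph = AsteraGraph()
graph.add_edge("doc:Elvis Presley", "doc:Graceland", Relation.SEMANTIC)
graph.add_edge("doc:Elvis Presley", "img:1", Relation.PART_OF)
graph.add_edge("img:1", "img:1/ColorHistogram", Relation.FACET)
graph.add_edge("img:1/ColorHistogram", "img:2/ColorHistogram", Relation.SIMILARITY)
```

Restricting `neighbours` to a subset of relation types mirrors the experiments later in the talk, where traversal uses only part-of and facet links or additionally semantic and similarity links.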
  • 14.
    An Example of the Model  [Figure: example graph built up over several animation steps. The nodes "Elvis Presley", "Graceland (Home of Elvis Presley)" and "Rockability" are connected by semantic (α) links; further nodes are attached by part-of (β) links; TF.IDF, Color Histogram and Edge Histogram facet nodes are attached by facet (δ) links; facet nodes are connected to each other by similarity (γ) links.]  Legend: α: Semantic, β: Part-of, δ: Facet, γ: Similarity
  • 15.
    Hybrid Search  Standard search • Text: Lucene • Image: LIRE • Any similarity computation framework  Graph search • Start from the top results of standard search and traverse the graph • We take the top 20 results of each facet
  • 16.
    Relevance Score Value Function (RSV)  On reaching a node, we calculate the similarity of its facets to the query facets.  The result is a score assigned to this node.  [Equation on slide not preserved; its annotated components are the normalization function, the similarity function, and the weight of facet f_i.]  S. Sabetghadam, M. Lupu, and A. Rauber, “A Combined Approach of Structured and Non-structured IR in multimodal domain,” in Proceedings of International Conference on Multimedia Retrieval, ICMR 2014, pp. 491-494.
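The slide's equation did not survive extraction, so the following is a sketch under a stated assumption: that the RSV is a weighted sum, over the facets shared by query and node, of a normalised similarity. The function names, the Jaccard similarity and the identity normalisation are illustrative choices, not taken from the talk.

```python
def jaccard(a, b):
    """Illustrative similarity function: Jaccard overlap of two term lists."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def rsv(query_facets, node_facets, weights, similarity, normalise=lambda s: s):
    """Assumed form: sum over shared facets of weight * normalise(similarity)."""
    score = 0.0
    for name, query_value in query_facets.items():
        if name in node_facets:
            raw = similarity(query_value, node_facets[name])
            score += weights.get(name, 0.0) * normalise(raw)
    return score


# Hypothetical query and node, each described by two textual facets.
query = {"text": ["flying", "bird"], "tags": ["bird"]}
node = {"text": ["bird", "flying", "sky", "blue"], "tags": ["bird"]}
weights = {"text": 0.5, "tags": 0.5}
score = rsv(query, node, weights, jaccard)
```

In a real setting each facet would have its own similarity function (for example Lucene scores for text and LIRE distances for images), which is why a normalisation step is needed before the scores are combined.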
  • 17.
    Graph Traversal Methods (1)  Two well-known methods: Spreading Activation and Random Walks  Spreading activation • Based on the associative retrieval idea [Crestani97] o Nodes and associations • Some nodes get activated • Energy propagates to the neighbours
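A minimal sketch of the spreading activation idea described above: some nodes start with energy, and in each step a share of that energy is passed on to the neighbours. The `decay` factor and the equal split among neighbours are assumptions for the example; the slide does not specify the propagation rule.

```python
def spread_activation(adjacency, initial, steps, decay=0.5):
    """adjacency: node -> list of neighbours; initial: node -> starting energy."""
    activation = dict(initial)   # accumulated energy per node
    frontier = dict(initial)     # energy that still has to be propagated
    for _ in range(steps):
        next_frontier = {}
        for node, energy in frontier.items():
            neighbours = adjacency.get(node, [])
            if not neighbours:
                continue
            # Assumed rule: decayed energy is split equally among neighbours.
            share = decay * energy / len(neighbours)
            for neighbour in neighbours:
                next_frontier[neighbour] = next_frontier.get(neighbour, 0.0) + share
        for neighbour, energy in next_frontier.items():
            activation[neighbour] = activation.get(neighbour, 0.0) + energy
        frontier = next_frontier
    return activation


# Hypothetical four-node graph, with node "a" activated by the query.
adjacency = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
result = spread_activation(adjacency, {"a": 1.0}, steps=2)
```

Node "d" receives energy along two paths, which illustrates how nodes that are well connected to activated nodes accumulate a higher score.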
  • 18.
    Graph Traversal Methods (2)  Random Walks • A chain of states created by some stochastic process • Stationary distribution of the graph  The two methods are in principle the same • Under certain conditions • Both are used in different experiments  S. Sabetghadam, M. Lupu, A. Rauber, “Which one to choose? Spreading Activation or Random Walks?”, Information Retrieval Facility Conference, IRFC 2014, pp. 112-119.
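To make the notion of a stationary distribution concrete, the sketch below repeatedly applies a row-stochastic transition matrix to a starting distribution until it stops changing (power iteration). The three-state matrix is a made-up example and the function name is hypothetical.

```python
def stationary_distribution(transition, iterations=1000, tol=1e-12):
    """Power iteration on a row-stochastic matrix given as a list of rows."""
    n = len(transition)
    dist = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iterations):
        new = [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, dist)) < tol:
            return new
        dist = new
    return dist


# Hypothetical 3-state chain; every row sums to 1 (the stochastic property).
P = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]
pi = stationary_distribution(P)
```

For this chain the iteration settles on a distribution that puts half of the probability mass on the middle state and a quarter on each of the other two.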
  • 19.
    Data collection  ImageCLEF 2011 Wikipedia collection  About 400,000 documents and images • 125,828 documents, 273,434 images
  • 20.
    Query data  50 topics • Easy, medium, hard, very hard [Tsikrika 2011]  Query • Keywords • Four or five image examples
  • 21.
    Query example: “Flying bird”  Visual features: CEDD, TLEP, SURF, CIME  Textual features: TF.IDF, LM, BM25
  • 22.
    Image Metadata Provided by the Collection  Query: Flying Bird
  • 23.
    The Collection Mapped to our Model  [Figure: documents D1 to D4 and images I1 to I4, with TF.IDF and CEDD facet nodes; link types shown: Facet, Part-of]
  • 24.
    Hybrid Search Example  [Figure: the same graph (documents D1 to D4, images I1 to I4, TF.IDF and CEDD facet nodes) shown over four animation steps of a hybrid search]
  • 25.
    Summary  We proposed a model which supports • Different modalities • Different relation types • Decomposition of an information object into facets  Hybrid search • Standard and graph search  Mapped the collection to our model
  • 26.
    Contributions  RQ1 (Model): Can we define a graph-based model for multi-modal multi-faceted information retrieval (MMIR)? Contributions: Astera model for MMIR; modelled faceted search and relevancy computation function. [ECIR Workshop 2013] [IRFC 2014] [ICMR 2014]  RQ2 (Reachability): In such a graph model, can the relevant nodes be reached? Contributions: analysis of the effect of different facets and links on reachability of relevant nodes.  RQ3 (Score Analysis): In such a model, can scores identify the relevant nodes? Contributions: analysis of the effect of different facets on precision; the effect of query-dependent and query-independent routing on precision. Publications for RQ2 and RQ3: [ECIR 2015] [CLEF 2014] [GSB 2015] [Keystone 2016]
  • 27.
    Relevant Data Distribution  40 steps  In each step • Check whether we visit a new relevant image  [Figure legend: shape size = number of related nodes over total number of related nodes at each step]
  • 28.
    RQ2: In such a graph model, can the relevant nodes be reached?  C1: Reachability Analysis from Different Facets  C2: Reachability Analysis from Different Links  C3: Reachability Analysis of Different Topic Categories  C4: Graph Visit from Different Facets
  • 29.
    Recall from Document Textual Facets  Links used • Facet and Part-of  Top 20 results of each facet  Traverse the graph from these results  Calculate recall in each step  Observation: similar recall behaviour
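The procedure on this slide (start from the top results of a facet, traverse the graph step by step, and compute recall after each step) can be sketched as a breadth-first expansion. The graph, the start nodes and the relevance judgements below are invented for illustration, and `recall_per_step` is a hypothetical helper name.

```python
def recall_per_step(adjacency, start_nodes, relevant, max_steps):
    """Expand one hop per step and report recall over all nodes visited so far."""
    if not relevant:
        raise ValueError("at least one relevant node is required")
    visited = set(start_nodes)
    frontier = set(start_nodes)
    recalls = []
    for _ in range(max_steps):
        # Nodes one hop away from the current frontier that are still unseen.
        frontier = {n for node in frontier for n in adjacency.get(node, [])} - visited
        visited |= frontier
        recalls.append(len(visited & relevant) / len(relevant))
    return recalls


# Hypothetical toy collection: documents d1, d2 and images i1 to i3.
adjacency = {"d1": ["i1", "i2"], "i1": ["d2"], "d2": ["i3"], "i2": [], "i3": []}
recalls = recall_per_step(adjacency, ["d1"], {"i1", "i3"}, max_steps=3)
```

In the experiments the start set would be the top 20 results of each facet and the adjacency would contain only the link types under study.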
  • 30.
    Recall from Image Textual Facets  Same experiment, starting from the results of the image metadata textual facets  Observation: similar recall behaviour
  • 31.
    Recall from Image Visual Facets  Same experiment, starting from the results of the image visual facets  Observation: similar recall behaviour
  • 32.
    Representative Facet from each Category of Facets  Document textual facets: TF.IDF_D  Image textual facets: LM_I  Image visual facets: CEDD
  • 33.
    Facet Combinations (1)  Links: part-of, facet • No semantic/similarity links  Fewer visited nodes, higher recall: (TF.IDF_D, LM_I) > (TF.IDF_D, CEDD)  Highest recall: (TF.IDF_D, CEDD, LM_I)  [Figure: recall and percentage of graph seen for the combinations (TF.IDF_D, CEDD) and (TF.IDF_D, LM_I)]
  • 35.
    Facet Combinations (2)  Links: part-of, facet • No semantic/similarity links  Fewer visited nodes, higher recall: (TF.IDF_D, LM_I) > (TF.IDF_D, CEDD)  Highest recall: (TF.IDF_D, CEDD, LM_I)  [Figure: recall plotted against percentage of graph seen for the combinations (TF.IDF_D, CEDD) and (TF.IDF_D, LM_I)]
  • 37.
    Observations  Different facets lead to visiting different parts of the collection.  This reinforces the importance of the poly-representation idea for identifying the relevant objects.  Sabetghadam S., Lupu M., Bierig R., and Rauber A., "Reachability Analysis of Graph Modelled Collections", 37th European Conference on Information Retrieval, ECIR 2015, pp. 370-381.
  • 38.
    Observations (continued)  Access to the graph is still limited: only half of the graph is reachable.
  • 39.
    RQ2: In such a graph model, can the relevant nodes be reached?  C1: Reachability Analysis from Different Facets  C2: Reachability Analysis from Different Links (α, β, δ, γ)  C3: Reachability Analysis of Different Topic Categories  C4: Graph Visit from Different Facets
  • 40.
    Recall: Baseline Graph vs Lucene Results  Base graph recall (links: β Part-of, δ Facet): 0.76  Better than Lucene: 0.66
  • 41.
    Adding Semantic Links  Using the DBpedia dump  Adding semantic links between equivalent pages  55,544 intra-lingual links added  100,653 inter-lingual links added
  • 42.
    Recall after Adding Semantic Links  Semantic links added from the DBpedia dump  Facets to start from • TF.IDF_D, CEDD, LM_I  Recall increases by 10% after adding semantic links  [Figure: bar chart with recall values 0.66, 0.76, 0.84 and 0.98; legend α: Semantic, β: Part-of, δ: Facet, γ: Similarity]
  • 43.
    Recall after Adding Semantic Links (continued)  Question: Is this recall increase just because of adding more links?
  • 44.
    Recall comparison: Semantic vs. Random  We added the same number of random links between documents.  Higher recall with random links, but at the expense of visiting almost all of the graph!  [Figure: average recall and percentage of graph seen, for semantic links and for random links]
  • 45.
    Recall comparison: Semantic vs. Random (continued)  Question: How does visiting this large number of nodes affect precision?
  • 46.
    Adding Random Links: Precision Loss  Adding semantic links contributed effectively to the recall increase.
  • 47.
    Recall after Adding Similarity Links  We did the same analysis after adding similarity links.  They are effective in reaching more relevant nodes.
  • 48.
    RQ2: In such a graph model, can the relevant nodes be reached?  C1: Reachability Analysis from Different Facets  C2: Reachability Analysis from Different Links  C3: Reachability Analysis of Different Topic Categories  C4: Graph Visit from Different Facets
  • 49.
    Different Topic Categories (Tsikrika 2011)
  • 50.
    Recall Analysis of Different Topic Categories  Links: Part-of, Facet  High recall gain for hard and very hard topics  Facet combination increased the recall pace  [Figure: recall per topic category for "Only TF.IDF_D" and "All three facets"; annotated gains: 138%, 128%, 373%, 266%]
  • 51.
    Base Graph vs Added Semantic Links  We obtained a 13% increase in recall for hard topics by using semantic links.  [Figure: "Only TF.IDF_D, without semantic links" (links: Part-of, Facet) compared with "Only TF.IDF_D, with semantic links" (links: Part-of, Facet, Semantic); annotated gains: 13%, 8%]
  • 52.
    Observations  The graph structure outpaces the Lucene results • with a 373% recall increase for very hard topics  Leveraging multiple facets • saved at least 5 steps to reach the same recall compared to using only one facet  Adding semantic links • increased recall for very hard and hard topics by 13% and 8% • raised the recall value substantially already in the first few steps
  • 53.
    [Figure panels: Document textual facets, Image textual facets, Image visual facets]
  • 54.
    Question: Do facets with similar recall behavior also visit the same parts of the graph or the same relevant nodes?  [Figure panels: Document textual facets, Image textual facets, Image visual facets]
  • 55.
    RQ2: In such a graph model, can the relevant nodes be reached?  C1: Reachability Analysis from Different Facets  C2: Reachability Analysis from Different Links  C3: Reachability Analysis of Different Topic Categories  C4: Graph Visit from Different Facets (β, δ)
  • 56.
    Do different facets with the same recall value visit the same parts of the graph?  Percentage of different nodes visited in a step  [Equation on slide not preserved; its annotated terms are "nodes seen in a step for facet f_i" and "nodes reachable only through facet f_i".]
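Since the slide's equation is not preserved, the sketch below assumes the measure is the share of nodes seen by one facet in a step that no other facet has seen. The function name and the sample node sets are hypothetical.

```python
def unique_visit_ratio(seen_by_facet, facet):
    """Fraction of the nodes seen via `facet` that no other facet has seen."""
    seen = seen_by_facet[facet]
    if not seen:
        return 0.0
    # Union of the nodes seen by all other facets in the same step.
    others = set().union(
        *(nodes for name, nodes in seen_by_facet.items() if name != facet)
    )
    return len(seen - others) / len(seen)


# Made-up node ids seen in one step when starting from three textual facets.
seen_by_facet = {
    "TF.IDF": {1, 2, 3, 4},
    "BM25": {3, 4, 5},
    "LM": {4, 6},
}
ratio_tfidf = unique_visit_ratio(seen_by_facet, "TF.IDF")
ratio_lm = unique_visit_ratio(seen_by_facet, "LM")
```

The same computation restricted to relevant nodes would give the "ratio of relevant nodes seen" reported on the following slides, again under the assumption made here.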
  • 57.
    Graph Visit – Doc Textual Facets  Links: Part-of, Facet  [Figure: ratio of nodes seen and ratio of relevant nodes seen]
  • 58.
    Graph Visit – Image Textual Facets  Links: Part-of, Facet  Image textual facets visit parts of the graph that differ by at least 20%.  [Figure: ratio of nodes seen and ratio of relevant nodes seen]
  • 59.
    Graph Visit – Image Visual Facets  Links: Part-of, Facet  Image visual facets visit very different parts of the graph, but not many different relevant nodes.  [Figure: ratio of nodes seen and ratio of relevant nodes seen]
  • 60.
    Observations  Reinforces the importance of the poly-representation idea • to identify relevant information objects  The image visual facets show the same recall behaviour • They visit totally different relevant images in the first steps (up to 10)  The LM (Language Model) facet has a more divergent view than the BM25 and TF.IDF facets
  • 61.
    Contributions  RQ1 (Model): Can we define a graph-based model for multi-modal multi-faceted information retrieval (MMIR)? Contributions: Astera model for MMIR; modelled faceted search and relevancy computation function. [ECIR Workshop 2013] [IRFC 2014] [ICMR 2014]  RQ2 (Reachability): In such a graph model, can the relevant nodes be reached? Contributions: analysis of the effect of different facets and links on reachability of relevant nodes.  RQ3 (Score Analysis): In such a model, can scores identify the relevant nodes? Contributions: analysis of the effect of different facets on precision; the effect of query-dependent and query-independent routing on precision. Publications for RQ2 and RQ3: [ECIR 2015] [CLEF 2014] [GSB 2015] [Keystone 2016]
  • 62.
    RQ3: In such a model, can scores identify the relevant nodes?  C1: The effect of different facet combinations on precision  C2: Precision in query-dependent and query-independent routing
  • 63.
    Baseline test (1) – No Graph  Standard test • Text-based search  [Figure: documents D1 to D3 and their images are flattened into the ranked list I11, I12, I13, I22, I31, I32]
  • 64.
    Baseline test (2) – No Graph  Reranked by image similarity computation  Each image has two scores: a text similarity score and an image similarity score  [Figure: the ranked list I11, I12, I13, I22, I31, I32 is reranked by similarity with the query images]
  • 65.
    Precision with Different Facet Combinations (st: 0.9, links: β, δ)  Combinations compared: TF.IDF_D; TF.IDF_D & CEDD; TF.IDF_D & LM_I; TF.IDF_D & CEDD & LM_I  Result: All facets > TF.IDF_D & LM_I > TF.IDF_D & CEDD > TF.IDF_D  Precision increase of 9%  Sabetghadam S., Lupu M., Bierig R., and Rauber A., "A Hybrid Approach for Multi-Faceted IR in Multi-modal Domain", 5th Conference of Labs and Evaluation Forums, CLEF 2014, pp. 86-97.
  • 66.
    RQ3: In such a model, can scores identify the relevant nodes?  C1: The effect of different facet combinations on precision  C2: Precision in very large steps with query-dependent and query-independent routing
  • 67.
    How does the graph behave over a very large number of steps?  Normalized weighting in the graph to satisfy the stochastic property  Random Walk  View the graph in its stationary distribution  Compare query-dependent and query-independent routing
  • 68.
    Query-dependent and Query-independent Routing  Random Walks as query-independent routing • The basic definition of RW does not consider relevancy to the query • All that is needed is the transition matrix  Metropolis-Hastings as query-dependent routing • In each step o the relevance of the source and target nodes to the query is considered
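The query-dependent routing described above can be sketched with the standard Metropolis-Hastings acceptance rule on a graph. This is a generic illustration, not the configuration used in the talk: it assumes an undirected graph, a uniform proposal over neighbours, and strictly positive relevance scores as the target distribution; all names and values are hypothetical.

```python
import random


def metropolis_hastings_walk(adjacency, relevance, start, steps, rng):
    """Walk whose long-run visit frequencies are proportional to relevance.

    Assumes an undirected graph (j in adjacency[i] implies i in adjacency[j])
    and strictly positive relevance scores.
    """
    current = start
    visits = {}
    for _ in range(steps):
        neighbours = adjacency[current]
        candidate = rng.choice(neighbours)  # proposal q(i, j) = 1 / deg(i)
        # Acceptance ratio: pi(j) * q(j, i) / (pi(i) * q(i, j)).
        ratio = (relevance[candidate] * len(neighbours)) / (
            relevance[current] * len(adjacency[candidate])
        )
        if rng.random() < min(1.0, ratio):
            current = candidate  # accept the move, otherwise stay put
        visits[current] = visits.get(current, 0) + 1
    return visits


# Hypothetical triangle graph in which node "c" is most relevant to the query.
adjacency = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
relevance = {"a": 1.0, "b": 1.0, "c": 8.0}
steps = 20000
visits = metropolis_hastings_walk(adjacency, relevance, "a", steps, random.Random(0))
share_c = visits.get("c", 0) / steps
```

A plain random walk on the same triangle would visit all three nodes equally often; with the acceptance step, the walk spends most of its time on the node with the highest relevance score.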
  • 69.
    Precision Analysis of Random Walk and Metropolis-Hastings  [Figure: precision of Random Walk compared with Metropolis-Hastings]  Sabetghadam S., Lupu M., and Rauber A., "Leveraging Metropolis-Hastings Algorithm on Graph-based Model for multimodal IR", GSB’15: First International Workshop on Graph Search and Beyond, held at SIGIR 2015, pp. 14-18.
  • 70.
    Observations  Combination of three facets > combination of any two  Precision increased by 9%  Compared the performance of query-dependent and query-independent Random Walks  Higher precision with the query-dependent RW  Sabetghadam S., Lupu M., Rauber A., "Random Walks Analysis on Graph Modelled Multimodal Collections", Second International KEYSTONE Conference, Keystone 2016.
  • 71.
    Conclusion (1)  RQ1: How to define a graph-based model for multi-modal multi-faceted information retrieval?  We defined a graph-based model which supports • Different modalities • Different types of relations • Search based on facets  Decomposes an information object into its facets  Defined a relevancy function based on the poly-representation principle  Calculates the relevancy of an information object to the query to select starting points in the graph
  • 72.
    Conclusion (2)  RQ2: In such a graph model, can the relevant nodes be reached?  We showed the effect of poly-representation on recall • Combining facets increases recall • Not every combination is effective  The graph structure helps in reaching relevant nodes, especially for hard and very hard topics  The results from adding random links showed • the effect of adding meaningful links on higher reachability  Facets may show the same recall behavior but visit different relevant nodes (again poly-representation)
  • 73.
    Conclusion (3)  RQ3: In such a model, can scores identify the relevant nodes?  Combination of different facets resulted in better precision  Compared query-dependent and query-independent Random Walks in the stationary distribution  Query-dependent Random Walks show better precision
  • 74.
    Astera Status  Astera: 2012 - now  Open source and available online • http://ifs.tuwien.ac.at/~sabetghadam/Astera.html  Highly configurable to work with other collections  [Figure: architecture diagram with the components Data Collection, Data Interface, Semantic Manager, Linked Data, Indexed Data, Graph Structured Data, Query files, Query Manager, Indexed Search Manager, Indexed Search Results, Graph Search, Scoring, Ranking, Final Results]
  • 75.
    Scientific Publications
     Sabetghadam S., Lupu M., Rauber A., "Astera - A generic model for multi-modal Information Retrieval", Workshop on Integrating IR technologies for Professional Search, held in ECIR 2013, pp. 551-554.
     Sabetghadam S., Lupu M., and Rauber A., "Which one do you choose? Spreading Activation or Random Walks?", Information Retrieval Facility Conference, IRFC 2014, pp. 112-119.
     Sabetghadam S., Lupu M., Bierig R., and Rauber A., "A combined approach of structured and non-structured IR in multi-modal domain", in Proceedings of International Conference on Multimedia Retrieval, ICMR 2014, pp. 491-494.
     Sabetghadam S., "Astera - A model for Multimodal IR with a Combined Approach of Structured and Non-structured Retrieval", Doctoral Symposium, ICMR 2014, pp. 551.
     Sabetghadam S., Lupu M., Bierig R., and Rauber A., "A Hybrid Approach for Multi-Faceted IR in multi-modal Domain", 5th Conference of Labs and Evaluation Forums, CLEF 2014, pp. 86-97.
     Sabetghadam S., Lupu M., Bierig R., and Rauber A., "Reachability Analysis of Graph Modelled Collections", 37th European Conference on Information Retrieval, ECIR 2015, pp. 370-381.
     Sabetghadam S., Palotti J., Rekabsaz N., Lupu M., Hanbury A., "TUW at MediaEval 2015", MediaEval 2015. Obtained first place in the task "Diverse Social Image Retrieval", MediaEval 2015.
     Sabetghadam S., Lupu M., and Rauber A., "Leveraging Metropolis-Hastings Algorithm on Graph-based Model for multimodal IR", GSB’15: First International Workshop on Graph Search and Beyond, held at SIGIR 2015, pp. 14-18.
     Rekabsaz N., Sabetghadam S., Lupu M., Andersson L., Hanbury A., "Standard Test Collection for English-Persian Cross-Lingual Word Sense Disambiguation", Language Resources and Evaluation Conference, LREC 2016.
     Sabetghadam S., Lupu M., Rauber A., "Random Walks Analysis on Graph Modelled Multimodal Collections", Second International KEYSTONE Conference, Keystone 2016.
     Eskevich M., Larson M., Aly R., Sabetghadam S., Jones G., Ordelman R., Huet B., "Multimodal Video-to-Video Linking: Turning to the Crowd for Insight and Evaluation", Multimedia Modeling, MMM 2017, pp. 280-292.
     Sabetghadam S., Lupu M., Bierig R., Rauber A., "A faceted approach to reachability analysis of graph modelled collections", International Journal of Multimedia Information Retrieval, IJMIR (2017), 10.1007/s13735-017-0145-8, pp. 1-15.