A Graph-based Model for Multimodal Information Retrieval

Serwah Sabetghadam
PhD Defense Presentation
Institute of Software Technology and Interactive Systems (IFS Group)
Vienna University of Technology
Supervisors
Ao.univ.Prof. Dr. Andreas Rauber
Dr. Mihai Lupu

Rapid growth of the multimodal content

On average 350 million photos are uploaded
daily to Facebook
Multimodal Information Retrieval
has become a challenge

Multimodal Information Retrieval (IR)
 Search for information of any modality with an information
need that can be unimodal or multimodal
 Unimodal query: only keywords or
an image example
 Multimodal query: a combination of
images, video, or music files

An example of an information need:
find paintings like this image
[Figure: a multimodal query]

Motivation (1)
 Conventional text search is used to find multimodal results
• E.g. text-based image retrieval
• Lacks the indexing information of other modalities
o Content-based image retrieval
• This motivates using different modalities
 Traditional IR does not explicitly include relations between documents
• Documents are no longer isolated
• Hyperlinks, metadata and semantic connections
• User-generated multimodal content, multimodal collections
• This motivates using structured IR too

Motivation (2)
Multimodal data is
interlinked
Structured data is represented
by graphs
Related work considers only
one type of relation and
one type of modality in the graphs

Motivations:
- To consider different modalities
- To use graph of objects
- To use different types of relations
(e.g. semantic or similarity)
- To consider
contained information objects
separately

Research Questions
 RQ1: How to define a graph-based model for multimodal
information retrieval?
 RQ2: In such a graph model, can the relevant nodes be
reached?
 RQ3: In such a model can scores identify the relevant nodes?

Contributions
 RQ1 (Model): How to define a graph-based model for multi-modal information retrieval (MMIR)?
• Astera model for MMIR
• Defined search based on facets
 RQ2 (Reachability): In such a graph model, can the relevant nodes be reached (recall)?
• Analysis of the effect of different facets and links on the reachability of relevant nodes
 RQ3 (Precision): In such a model, can scores identify the relevant nodes?
• The effect of different facets on precision
• The effect of query-dependent and query-independent routing on precision
 Publications: [ECIR Workshop 2013] [IRFC 2014] [ICMR 2014] [ECIR 2015] [CLEF 2014] [GSB 2015] [Keystone 2016]


An Information Object and Different Representations/Facets
An information object may have several representations or facets.
Facet: an inherent feature or property of an information object
[Figure: an image as an information object with five facets, F1 to F5]

RQ1: How to define a graph-based model for
multi-modal information retrieval?
 We propose a graph of information objects G = (V,E) named Astera
• V is the set of vertices (nodes/facets)
• E is the set of edges (defined by different types of relations)
 Different types of relations
• Semantic (α): any semantic relation between two objects
• Part-of (β): an object as part of another object, e.g. an image in a
document
• Similarity (γ): relation between facets of the same type belonging to two
information objects
• Facet (δ): linking an object to its facets
S. Sabetghadam, M. Lupu, and A. Rauber, “Astera - A Generic Model for Multimodal Information Retrieval”, in
Proceedings of Integrating IR Technologies for Professional Search Workshop, held in ECIR 2013, pp. 551-554.
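The definition G = (V, E) with four relation types can be sketched as a small typed-edge graph. This is a minimal illustration, not the Astera implementation: the class name `AsteraGraph`, its methods, the node identifiers and the choice to store every edge in both directions are all assumptions made for the example.

```python
from collections import defaultdict
from enum import Enum


class Relation(Enum):
    SEMANTIC = "alpha"    # α: any semantic relation between two objects
    PART_OF = "beta"      # β: an object that is part of another object
    SIMILARITY = "gamma"  # γ: same-type facets of two information objects
    FACET = "delta"       # δ: an object linked to one of its facets


class AsteraGraph:
    """Toy container for G = (V, E) with typed edges (hypothetical API)."""

    def __init__(self):
        self.vertices = set()
        self.edges = defaultdict(set)  # node -> {(neighbour, Relation)}

    def add_edge(self, source, target, relation):
        self.vertices.update((source, target))
        # Assumption of this sketch: edges can be traversed in both directions.
        self.edges[source].add((target, relation))
        self.edges[target].add((source, relation))

    def neighbours(self, node, allowed=None):
        """Neighbours of `node`, optionally restricted to some relation types."""
        return {
            other
            for other, relation in self.edges[node]
            if allowed is None or relation in allowed
        }


graph = AsteraGraph()
graph.add_edge("doc:Elvis Presley", "doc:Graceland", Relation.SEMANTIC)
graph.add_edge("img:1", "doc:Elvis Presley", Relation.PART_OF)
graph.add_edge("img:1", "img:1/CEDD", Relation.FACET)
graph.add_edge("img:1/CEDD", "img:2/CEDD", Relation.SIMILARITY)
```

Restricting `neighbours` to a subset of relation types mirrors the experiments that traverse only some link types, for example only part-of and facet links.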

An Example of the Model
[Figure: example graph built up in five steps. Nodes labelled Elvis Presley, Graceland (Home of Elvis Presley) and Rockabilly are connected by α links; further nodes are attached by β links; δ links connect nodes to their facets (TF.IDF, Color Histogram, Edge Histogram); γ links connect facets of the same type.]
Legend: α: Semantic, β: Part-of, δ: Facet, γ: Similarity

Hybrid Search
 Standard search
• Text: Lucene
• Image: LIRE
• Any similarity computation
framework
 Graph search
• Start from the top results of
standard search and traverse the graph
• We take the top 20 results of each facet
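The two phases of hybrid search can be sketched as follows: the top results of each facet's standard search become the starting points, and the graph is then expanded step by step. This is an illustrative sketch only; the function name `hybrid_search`, the plain adjacency dictionary and the breadth-first expansion are assumptions, and the ranked lists stand in for real Lucene or LIRE results.

```python
def hybrid_search(facet_rankings, adjacency, top_k=20, max_steps=40):
    """Seed from the top_k hits of each facet, then expand breadth-first.

    facet_rankings: {facet_name: [node, ...]} ranked lists from standard search
    adjacency:      {node: iterable of neighbour nodes}
    Returns one set of newly visited nodes per step (step 0 holds the seeds).
    """
    seeds = set()
    for ranking in facet_rankings.values():
        seeds.update(ranking[:top_k])

    visited = set(seeds)
    frontier = set(seeds)
    steps = [set(seeds)]
    for _ in range(max_steps):
        next_frontier = set()
        for node in frontier:
            for neighbour in adjacency.get(node, ()):
                if neighbour not in visited:
                    visited.add(neighbour)
                    next_frontier.add(neighbour)
        if not next_frontier:
            break  # nothing new is reachable
        steps.append(next_frontier)
        frontier = next_frontier
    return steps


# Hypothetical toy graph: documents D1, D2 and images I1 to I3.
adjacency = {
    "D1": ["I1", "I2"],
    "I1": ["D1"],
    "I2": ["D1", "D2"],
    "D2": ["I2", "I3"],
    "I3": ["D2"],
}
rankings = {"TF.IDF": ["D1"], "CEDD": ["I1"]}
steps = hybrid_search(rankings, adjacency)
```

Returning the newly visited nodes per step keeps the traversal separate from any later scoring or recall measurement.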

Relevance Score Value Function (RSV)
 On reaching a node, we calculate the similarity of its facets
with the query facets
 The result is a score given to this node
[Equation: the RSV is built from a normalization function, a similarity function, and the weight of facet fi]
S. Sabetghadam, M. Lupu, and A. Rauber, “A Combined Approach of Structured and Non-structured IR in multimodal
domain,” in Proceedings of International Conference on Multimedia Retrieval, ICMR 2014, pp. 491-494.
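One way to combine the three components named for the RSV (normalization function, similarity function, weight of facet fi) is a weighted sum over the facets. The form below is an assumption for illustration: the symbols Q, IO, n, norm, sim and w are chosen here, and the cited paper should be consulted for the exact definition.

```latex
\[
\mathrm{RSV}(Q, IO) \;=\; \sum_{i=1}^{n}
  \mathrm{norm}\bigl(\mathrm{sim}(Q_{f_i},\, IO_{f_i})\bigr)\cdot w_{f_i}
\]
```

Here Q_{f_i} and IO_{f_i} denote facet f_i of the query and of the information object, and w_{f_i} is the weight given to that facet.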

Graph Traversal Methods (1)
 Two well-known methods:
Spreading Activation and Random Walks
 Spreading activation
• Based on associative retrieval idea [Crestani97]
o Nodes and associations
• Some nodes get activated
• Energy propagates to the neighbours
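The propagation idea can be sketched in a few lines. This is a simplified variant for illustration, not the configuration used in the experiments: the `decay` factor, the equal split of energy among neighbours and the function name are assumptions.

```python
def spread_activation(adjacency, initial, decay=0.5, steps=3):
    """Propagate activation energy from the initially activated nodes.

    adjacency: {node: list of neighbour nodes}
    initial:   {node: starting energy}
    In each step every active node sends `decay` times its energy, split
    equally among its neighbours. The energy received is accumulated.
    """
    total = dict(initial)
    current = dict(initial)
    for _ in range(steps):
        incoming = {}
        for node, energy in current.items():
            neighbours = adjacency.get(node, [])
            if not neighbours:
                continue
            share = decay * energy / len(neighbours)
            for neighbour in neighbours:
                incoming[neighbour] = incoming.get(neighbour, 0.0) + share
        for node, energy in incoming.items():
            total[node] = total.get(node, 0.0) + energy
        current = incoming
    return total


adjacency = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
activation = spread_activation(adjacency, {"A": 1.0}, decay=0.5, steps=2)
# activation == {"A": 1.25, "B": 0.25, "C": 0.25}
```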

Graph Traversal Methods (2)
 Random Walks
• A chain of states created by some stochastic process
• Stationary distribution of the graph
 The two methods are in principle the same
• Under certain conditions
• Both are used in different experiments
S. Sabetghadam, M. Lupu, A. Rauber, “Which one to choose? Spreading Activation or Random Walks?”,
Information Retrieval Facility Conference, IRFC 2014, pp. 112-119.

Data collection
 ImageCLEF 2011 Wikipedia collection
 About 400,000 Documents and Images
• 125,828 documents, 273,434 images

Query data
 50 topics
• Easy, medium, hard, very hard [Tsikrika 2011]
 Query
• keywords
• four/five image examples

Query example: “Flying bird”
Visual features: CEDD, TLEP, SURF, CIME
Textual features: TF.IDF, LM, BM25

Image Metadata Provided by the Collection
Query: Flying Bird

The Collection Mapped to our Model
[Figure: documents D1 to D4 and images I1 to I4 with their facet nodes (TF.IDF, CEDD); link types shown: Facet, Part-of]

Hybrid Search Example
[Figure: hybrid search illustrated in four steps on the graph of documents D1 to D4, images I1 to I4 and their TF.IDF and CEDD facet nodes]

Summary
 We proposed a model which supports
• Different modalities
• Different relation types
• Decomposition of an information object into facets
 Hybrid Search
• Standard and Graph search
 Mapped the collection to our model


Relevant Data Distribution
 40 steps
 In each step
• Check whether we visit
a new relevant image
[Figure: relevant data distribution per step; shape size: number of relevant nodes reached at the step / total number of relevant nodes]

RQ2: In such a graph model, can the relevant
nodes be reached?
C1: Reachability Analysis from Different Facets
C2: Reachability Analysis from Different Links
C3: Reachability Analysis of Different Topic Categories
C4: Graph Visit from Different Facets

Recall from Document Textual Facets
 Links used
• Facet and Part-of
 Top 20 results of each facet
 Traverse the graph from
these results
 Calculate recall in each step
 Similar recall behaviour
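Calculating recall in each step amounts to tracking which relevant nodes have been visited so far. The sketch below assumes the traversal delivers one set of newly visited nodes per step; the function name and the toy node identifiers are hypothetical.

```python
def recall_per_step(steps, relevant):
    """Cumulative recall after each traversal step.

    steps:    list of sets, the nodes newly visited at each step
    relevant: set of nodes judged relevant for the topic
    """
    if not relevant:
        return [0.0 for _ in steps]
    seen_relevant = set()
    curve = []
    for newly_visited in steps:
        seen_relevant |= newly_visited & relevant
        curve.append(len(seen_relevant) / len(relevant))
    return curve


steps = [{"D1", "I1"}, {"I2"}, {"D2"}, {"I3"}]
curve = recall_per_step(steps, relevant={"I1", "I3"})
# curve == [0.5, 0.5, 0.5, 1.0]
```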

Recall from Image Textual Facets
 Same experiment, starting
from the results of the
image metadata textual facets

Recall from Image Visual Facets
 Same experiment,
starting from the results of the
image visual facets

Representative Facet from each Category of Facets
 Document textual facets: TF.IDFD (TF.IDF on document text)
 Image textual facets: LMI (language model on image metadata)
 Image visual facets: CEDD

Facet Combinations (1)
 Links: part-of, facet
• No semantic/
similarity links
 Fewer visited nodes,
higher recall
• TF.IDFD, LMI > TF.IDFD, CEDD
 Highest recall
• TF.IDFD, CEDD, LMI
[Figure: recall and percentage of graph seen for the facet combinations TF.IDFD, CEDD and TF.IDFD, LMI]


Observations
 Different facets lead to visiting different parts of the
collection
 This reinforces the importance of the poly-representation idea
for identifying the relevant objects.
 Still limited access to the graph:
only half of the graph is reachable
Sabetghadam S., Lupu M., Bierig R., and Rauber A., "Reachability Analysis of Graph Modelled Collections". 37th
European Conference on Information Retrieval, ECIR 2015, pp. 370-381.

RQ2: In such a graph model, can the relevant
nodes be reached?
C2: Reachability Analysis from Different Links (α, β, δ, γ)

Recall: Baseline Graph vs. Lucene Results
 Base graph recall: 0.76
 Better than Lucene: 0.66
 Links in the base graph: β (Part-of), δ (Facet)

Adding Semantic Links
 Using DBpedia dump
 Adding semantic links
between equivalent pages
 55,544 Intra-lingual
links added
 100,653 Inter-lingual
links added

Recall after Adding Semantic Links
 Semantic links added
from DBpedia dump
 Facets to start from
• TF.IDFD, CEDD, LMI
 Recall increase of 10%
after adding semantic
links
[Figure: recall values shown: 0.66 (Lucene), 0.76 (base graph), 0.84, 0.98; link types: α: Semantic, β: Part-of, δ: Facet, γ: Similarity]
Question: Is this recall increase just because of adding more links?

Recall comparison: Semantic vs. Random
 We added the same
number of random links
between docs
 Higher recall with random links, but at the
expense of visiting almost all of the graph!
[Figure: average recall and percentage of graph seen, for semantic links and for random links]
How does visiting this large number of nodes affect precision?
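The random-link control can be sketched as adding a fixed number of links between randomly chosen document pairs, so that the link count matches the semantic-link run. This is an illustration under assumptions: the function name, the undirected links and the exhaustive candidate list (practical only for small graphs) are choices made for the example.

```python
import random
from itertools import combinations


def add_random_links(adjacency, documents, n_links, seed=0):
    """Return a copy of `adjacency` with n_links extra undirected doc-doc links."""
    rng = random.Random(seed)
    result = {node: set(neighbours) for node, neighbours in adjacency.items()}
    for doc in documents:
        result.setdefault(doc, set())
    # Candidate pairs are document pairs that are not linked yet.
    candidates = [
        (a, b)
        for a, b in combinations(sorted(documents), 2)
        if b not in result[a]
    ]
    for a, b in rng.sample(candidates, min(n_links, len(candidates))):
        result[a].add(b)
        result[b].add(a)
    return result


base = {"D1": {"I1"}, "I1": {"D1"}, "D2": set(), "D3": set()}
docs = ["D1", "D2", "D3"]
randomised = add_random_links(base, docs, n_links=2)
```

Running the same traversal and recall measurement on `base`, on the graph with semantic links and on `randomised` separates the effect of link meaning from the effect of link count.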

Adding random links - Precision loss
Adding semantic links contributed effectively
to the recall increase.

Recall after Adding Similarity Links
 We repeated the same analysis after adding
similarity links
 They are effective in reaching
more relevant nodes

RQ2: In such a graph model, can the relevant
nodes be reached?
C3: Reachability Analysis of Different Topic Categories

Different Topic Categories (Tsikrika 2011)

Recall analysis of different topic categories
 Links: Part-of, Facet
 High recall gain for hard and very hard topics
 Facet combination increased the recall pace
[Figure: recall per topic category, using only TF.IDFD vs. all three facets; recall gains shown: 138%, 128%, 373%, 266%]

Base graph vs added Semantic links
[Figure: recall using only TF.IDFD, without semantic links vs. with semantic links; recall increases shown: 13%, 8%]
 We obtained a 13% increase in recall for hard topics
by using semantic links
 Links: Part-of, Facet, Semantic

Observations
 The graph structure outperforms the Lucene results
• with a 373% recall increase for very hard topics
 Leveraging multiple facets
• saved at least 5 steps to reach the same recall compared to using only
one facet
 Adding semantic links
• increased recall for very hard and hard topics by 13% and 8%
• raised the recall value substantially already in the first few steps.

[Figure: recall for document textual facets, image textual facets, and image visual facets]
Question: Do facets with similar recall behavior also visit the
same parts of the graph and the same relevant nodes?

RQ2: In such a graph model, can the relevant
nodes be reached?
C4: Graph Visit from Different Facets (β, δ)

Do different facets with the same recall value visit
the same parts of the graph?
 Percentage of different nodes visited in a step
[Equation: defined from the nodes seen in a step for facet fi and the nodes reachable only through facet fi]
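One plausible reading of this measure is the share of the nodes seen through facet fi in a step that no other facet reaches. The sketch below implements that reading; the ratio itself, the function name and the toy data are assumptions, since only the two labelled quantities are given.

```python
def unique_visit_ratio(seen_by_facet, facet):
    """Share of the nodes seen via `facet` that no other facet has seen.

    seen_by_facet: {facet_name: set of nodes seen in a given step}
    """
    own = seen_by_facet[facet]
    if not own:
        return 0.0
    others = set().union(
        *(nodes for name, nodes in seen_by_facet.items() if name != facet)
    )
    return len(own - others) / len(own)


seen = {
    "TF.IDF": {"a", "b", "c", "d"},
    "BM25": {"a", "b"},
    "LM": {"b", "e"},
}
ratio = unique_visit_ratio(seen, "TF.IDF")
# ratio == 0.5, because c and d are reached only through TF.IDF
```

Applying the same function to the relevant nodes only gives the corresponding ratio of relevant nodes seen.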

Graph Visit – Doc Textual Facets
[Figure: ratio of nodes seen and ratio of relevant nodes seen for the document textual facets]

Graph Visit – Image Textual Facets
 Links: Part-of, Facet
 Image textual facets visit at least 20% different parts of the graph

Graph Visit – Image Visual Facets
 Image visual facets visit very different parts of the graph, but not so many
different relevant nodes.

Observations
 Reinforces the importance of the poly-representation idea
• to identify relevant information objects
 The image visual facets show the same recall behaviour
• They visit totally different relevant images in the first steps (up
to 10)
 The LM (Language Model) facet has a more divergent view than
the BM25 and TF.IDF facets


RQ3: In such a model can scores identify the
relevant nodes?
C1: The effect of different facet combinations on precision
C2: Precision in query-dependent and query-independent routing

Baseline test (1) – No Graph
 Standard Test
• text-based search
[Figure: documents D1, D2, D3 and their images; ranked list of images: I11, I12, I13, I22, I31, I32]

Baseline test (2) – No Graph
 Reranked by image similarity computation
 Each image has two scores:
• Text similarity score
• Image similarity score
[Figure: ranked list I11, I12, I13, I22, I31, I32 reranked by similarity with the query images; reranked list as shown: I11, I13, I32, I12, I22, I32]
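The second baseline can be sketched as reordering the text-ranked images by their image similarity score. This is a minimal illustration; the function name, the scores and the tie handling are assumptions, not the evaluated setup.

```python
def rerank_by_image_similarity(text_ranked, image_scores):
    """Reorder a text-ranked list of images by image similarity score.

    text_ranked:  image ids in text-search order
    image_scores: {image_id: similarity to the query images}
    Images with equal scores keep their text order because the sort is
    stable; images without a score are treated as 0.0.
    """
    return sorted(
        text_ranked,
        key=lambda image: image_scores.get(image, 0.0),
        reverse=True,
    )


text_ranked = ["I11", "I12", "I13", "I22"]
image_scores = {"I11": 0.9, "I12": 0.2, "I13": 0.7, "I22": 0.2}
reranked = rerank_by_image_similarity(text_ranked, image_scores)
# reranked == ["I11", "I13", "I12", "I22"]
```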

Precision with different facet combinations,
st: 0.9, links: β, δ
 Facet combinations compared
• TF.IDFD
• TF.IDFD & CEDD
• TF.IDFD & LMI
• TF.IDFD & CEDD & LMI
 All Facets > TF.IDFD & LMI > TF.IDFD & CEDD > TF.IDFD
 Precision increase by 9%
Sabetghadam S., Lupu M., Bierig R., and Rauber A., "A Hybrid Approach for Multi-Faceted IR in Multi-modal Domain",
5th Conference of Labs and Evaluation Forums, CLEF 2014, pp. 86-97.

RQ3: In such a model can scores identify the
relevant nodes?
C1: The effect of different facet combinations on precision
C2: Precision in very large steps with query-dependent and
query-independent routing

How does the graph behave in very large steps?
 Normalized weighting in the graph to satisfy the stochastic
property
 Random Walk
 See the graph in the stationary distribution
 Compare Query-dependent and Query-independent routing
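The two ingredients named above, normalized weights that satisfy the stochastic property and the stationary distribution, can be sketched with a row-normalized transition matrix and power iteration. The tiny weight matrix, the self-loops and the function names are assumptions for illustration.

```python
def transition_matrix(weights):
    """Row-normalise non-negative edge weights so that each row sums to 1."""
    matrix = []
    for row in weights:
        total = sum(row)
        if total == 0:
            raise ValueError("every node needs at least one outgoing edge")
        matrix.append([w / total for w in row])
    return matrix


def stationary_distribution(matrix, iterations=1000, tol=1e-12):
    """Power iteration: repeatedly apply pi <- pi * P from a uniform start."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(pi, nxt)) < tol:
            return nxt
        pi = nxt
    return pi


# Symmetric toy weights; the self-loops keep the chain aperiodic.
weights = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
P = transition_matrix(weights)
pi = stationary_distribution(P)
```

For symmetric weights the stationary probability of a node is proportional to its row sum, so this example converges to approximately (2/7, 3/7, 2/7) regardless of any query, which is what makes this routing query-independent.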

Query-dependent and Query-independent
Routing
 Random Walks as Query-independent routing
• The basic definition of RW does not consider relevancy to the query
• What we need is only the transition matrix
 Metropolis Hastings as Query-dependent routing
• In each step
o The relevance of both the source and target nodes to the query is considered
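A query-dependent walk in the Metropolis-Hastings style can be sketched as follows. This is a textbook form of the algorithm under assumptions, not necessarily the variant used in the experiments: the proposal picks a neighbour uniformly at random, the graph is undirected, all relevance scores are positive, and the names are hypothetical.

```python
import random


def metropolis_hastings_walk(adjacency, relevance, start, steps, seed=0):
    """Walk whose visit counts approximate a distribution ~ relevance[node].

    Proposal:   pick a neighbour uniformly, q(s -> t) = 1 / deg(s).
    Acceptance: min(1, rel(t) * q(t -> s) / (rel(s) * q(s -> t))).
    """
    rng = random.Random(seed)
    visits = {node: 0 for node in adjacency}
    current = start
    for _ in range(steps):
        target = rng.choice(adjacency[current])
        ratio = (relevance[target] * len(adjacency[current])) / (
            relevance[current] * len(adjacency[target])
        )
        if rng.random() < min(1.0, ratio):
            current = target  # accept the move
        visits[current] += 1  # a rejected move counts as staying
    return visits


# Toy triangle graph with hypothetical relevance scores to the query.
adjacency = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
relevance = {"A": 0.6, "B": 0.3, "C": 0.1}
visits = metropolis_hastings_walk(adjacency, relevance, "C", steps=20000)
```

Unlike the plain random walk, the acceptance step uses the relevance of both the source and the target node, so the walk spends more time on nodes that score higher for the query.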

Precision analysis of Random Walk and
Metropolis-Hastings
[Figure: precision of Random Walk vs. Metropolis-Hastings]
Sabetghadam S., Lupu M., and Rauber A., "Leveraging Metropolis-Hastings Algorithm on Graph-based Model for
multimodal IR". GSB’15: First International Workshop on Graph Search and Beyond, held at SIGIR 2015, pp. 14-18.

Observations
 Combination of three facets > combination of any two
 Precision increased by 9%
 Compared the performance of query-dependent and
query-independent Random Walks
 Higher precision with query-dependent Random Walks
Sabetghadam S., Lupu M., Rauber A., "Random Walks Analysis on Graph Modelled Multimodal Collections",
Second International KEYSTONE Conference, Keystone 2016.

Conclusion (1)
 RQ1: How to define a graph-based model for multi-modal, multi-faceted
information retrieval?
 We defined a graph-based model which supports
• Different modalities
• Different types of relations
• Search based on facets
 Decomposes an information object into its facets
 Defined a relevancy function based on the poly-representation principle
 Calculates the relevancy of information objects to the query to find the starting
points in the graph

Conclusion (2)
 RQ2: In such a graph model, can the relevant nodes be reached?
 We showed the effect of poly-representation on recall
• Combining facets increases recall
• Not every combination is effective
 The graph structure helps in reaching relevant nodes, especially for hard
and very hard topics
 The results from adding random links showed:
• The effect of adding meaningful links on reachability
 Facets may show the same recall behavior but visit different relevant
nodes (again poly-representation)

Conclusion (3)
 RQ3: In such a model, can scores identify the relevant nodes?
 Combining different facets resulted in better precision
 Compared query-dependent and query-independent Random
Walks in the stationary distribution
 Query-dependent Random Walks show better precision

Astera Status
 Astera: 2012 - now
 Open source and available online
• http://ifs.tuwien.ac.at/~sabetghadam/Astera.html
 Highly configurable to work
with other collections
[Figure: Astera architecture, with the components Query files, Query Manager, Indexed Search Manager, Indexed Data, Graph Search, Graph Structured Data, Data Collection (three instances), Scoring, Ranking, Linked Data, Indexed Search Results, Final Results, Data Interface, Semantic Manager]

Scientific Publications
 Sabetghadam S., Lupu M., Rauber A., "Astera - A generic model for multi-modal Information Retrieval", Workshop on Integrating IR technologies
for Professional Search, held in ECIR 2013, pp. 551-554.
 Sabetghadam S., Lupu M., Rauber A., "Which one do you choose? Spreading Activation or Random Walks?", Information Retrieval Facility
Conference, IRFC 2014, pp. 112-119.
 Sabetghadam S., Lupu M., Bierig R., Rauber A., "A combined approach of structured and non-structured IR in multi-modal domain", in
Proceedings of International Conference on Multimedia Retrieval, ICMR 2014, pp. 491-494.
 Sabetghadam S., "Astera - A model for Multimodal IR with a Combined Approach of Structured and Non-structured Retrieval", Doctoral
Symposium, ICMR 2014, pp. 551.
 Sabetghadam S., Lupu M., Bierig R., Rauber A., "A Hybrid Approach for Multi-Faceted IR in multi-modal Domain", 5th Conference of Labs and
Evaluation Forums, CLEF 2014, pp. 86-97.
 Sabetghadam S., Lupu M., Bierig R., Rauber A., "Reachability Analysis of Graph Modelled Collections", 37th European Conference on
Information Retrieval, ECIR 2015, pp. 370-381.
 Sabetghadam S., Palotti J., Rekabsaz N., Lupu M., Hanbury A., "TUW at MediaEval 2015", MediaEval, 2015. Obtained first place in the task
"Diverse Social Image Retrieval", MediaEval 2015.
 Sabetghadam S., Lupu M., Rauber A., "Leveraging Metropolis-Hastings Algorithm on Graph-based Model for multimodal IR", GSB’15: First
International Workshop on Graph Search and Beyond, held at SIGIR 2015, pp. 14-18.
 Rekabsaz N., Sabetghadam S., Lupu M., Andersson L., Hanbury A., "Standard Test Collection for English-Persian Cross-Lingual
Word Sense Disambiguation", Proceedings of the Language Resources and Evaluation Conference, LREC 2016.
 Sabetghadam S., Lupu M., Rauber A., "Random Walks Analysis on Graph Modelled Multimodal Collections", Second International KEYSTONE
Conference, Keystone 2016.
 Eskevich M., Larson M., Aly R., Sabetghadam S., Jones G., Ordelman R., Huet B., "Multimodal Video-to-Video Linking: Turning to the Crowd for
Insight and Evaluation", Proceedings of Multimedia Modeling, MMM 2017, pp. 280-292.
 Sabetghadam S., Lupu M., Bierig R., Rauber A., "A faceted approach to reachability analysis of graph modelled collections",
International Journal of Multimedia Information Retrieval, IJMIR (2017), 10.1007/s13735-017-0145-8, pp. 1-15.
