SlideShare a Scribd company logo
1 of 109
Download to read offline
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Information Retrieval Performance using
Word Embedding
Dwaipayan Roy
Indian Statistical Institute, Kolkata
December 28, 2018
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 1 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Information Retrieval
www.trec.nist.gov
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 2 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Information Retrieval: A Practical Application
www.google.com
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 3 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Information Retrieval: A Graphical Representation
Corpus
Index
IndexingInformation
Need
Query
Query
(Re)Formulation
Retrieval
System
Ranked
Docs.
d1
d2
d3
d4
d5
.
.
.
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 4 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Information Retrieval: A Graphical Representation
Corpus
Index
IndexingInformation
Need
Query
Query
(Re)Formulation
Retrieval
Model
Ranked
Docs.
d1
d2
d3
d4
d5
.
.
.
Input
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Information Retrieval: A Graphical Representation
Corpus
Index
IndexingInformation
Need
Query
Query
(Re)Formulation
Retrieval
Model
Ranked
Docs.
d1
d2
d3
d4
d5
.
.
.
Input
Information
Retrieval
System
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Information Retrieval: A Graphical Representation
Corpus
Index
IndexingInformation
Need
Query
Query
(Re)Formulation
Retrieval
Model
Ranked
Docs.
d1
d2
d3
d4
d5
.
.
.
Input
Information
Retrieval
System
Output
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 5 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Retrieval Models
Language Model with Jelinek Mercer Smoothing (LM-JM)
Language Model with Dirichlet Smoothing (LM-Dir)
BM25
DFR
...
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 6 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Retrieval Models
Language Model with Jelinek Mercer Smoothing (LM-JM)
Language Model with Dirichlet Smoothing (LM-Dir)
BM25
DFR
...
Score(D,Q) =



∑
t∈Q f(D, t)
∏
t∈Q f(D, t)
...
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 6 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Retrieval Models
Language Model with Jelinek Mercer Smoothing (LM-JM)
Language Model with Dirichlet Smoothing (LM-Dir)
BM25
DFR
...
Score(D,Q) =



∑
t∈Q f(D, t)
∏
t∈Q f(D, t)
...
LM-JM(D,Q) =
∏
t∈Q
[
λ ∗ MLE(t in Document)
+ (1 − λ) ∗ MLE(t in Collection)
]
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 6 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Evaluation: Datasets
Collection Name Documents # documents # topics # rel-docs
TREC123 Tipster disks 1, 2 741,856
150
37836
(51-200)
TREC678
Tipster disks 4, 5
528,155
150
13692
exclude docs. from CR (301-450)
Robust
Tipster disks 4, 5
528,155
100
3720
exclude docs. from CR (601-700)
TREC910 WT10G 1,692,096
100
5980
(451-550)
GOV2 GOV2 25,205,179
150
26917
(701-850)
ClueWeb09B ClueWeb09 Disk1 - English 50,220,423
200
11037
(1-200)
Table: Overview of datasets used in experiments reported in this presentation.
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 7 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Evaluation: Metrics
MAP: Mean average precision
P@5: Precision upto rank 5
Recall: Percentage of relevant documents retrieved
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 8 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Word Embedding
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 9 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Word Embedding
Represents every word as low dimensional vector in an abstract space.
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 9 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Word Embedding
Represents every word as low dimensional vector in an abstract space.
Similarity between vectors reflect semantic similarity between terms.
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 10 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Word Embedding
Represents every word as low dimensional vector in an abstract space.
Similarity between vectors reflect semantic similarity between terms.
Effect of conceptual composition by simple addition of vectors.
u
v
w1
w2
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 11 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Word Embedding
Represents every word as low dimensional vector in an abstract space.
Similarity between vectors reflect semantic similarity between terms.
Effect of conceptual composition by simple addition of vectors.
u
v
w1
w2
German + Airlines ∼ Lufthansa
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 11 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Word Embedding
Represents every word as low dimensional vector in an abstract space.
Similarity between vectors reflect semantic similarity between terms.
Effect of conceptual composition by simple addition of vectors.
Words have close vector representations if they share similar contexts.
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 12 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Outline
1 Improving Baseline Retrieval Model
2 Improving Relevance Feedback based Query Expansion
3 Query Performance Prediction using Word Embedding
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 13 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Baseline Retrieval Model
Outline
1 Improving Baseline Retrieval Model
2 Improving Relevance Feedback based Query Expansion
3 Query Performance Prediction using Word Embedding
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 14 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Baseline Retrieval Model
Traditional Language Model
Given:
Collection C: A set of documents
Query Q
Documents ranked:
in decreasing order by the posterior probabilities P(D|Q)
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 15 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Baseline Retrieval Model
Traditional Language Model
P(D|Q) =
∏
t∈Q
λ
tf(t, D)
|D|
+ (1 − λ)
cf(t)
cs
Query terms generated by independent sampling from either
the document or the collection
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 16 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Baseline Retrieval Model
Language Model with Word Embedding
A generative process in which a noisy channel may transform a term
t′ into a term t
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 17 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Baseline Retrieval Model
Term Transformation Events : via Document Sampling
Figure: Schematics of generating a query term t from d
P(t, t′|d) = P(t|t′, d)P(t′|d)
P(t|t′, d) = sim(t′,t)
Σ(d)
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 18 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Baseline Retrieval Model
Term Transformation Events : via Collection Sampling
Figure: Schematics of generating a query term t from the collection
P(t, t′|C) = P(t|t′, C)P(t′|C)
P(t|t′, C) = sim(t,t′)∑
t′′∈Nt
sim(t,t′′)
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 19 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Baseline Retrieval Model
Term Transformation Events
Noisy Channel
d
C
t
λ
t′
1 − λ − α − β
α
β
Figure: A Generalized Language Model (GLM). GLM degenerates to LM when
α = β = 0.
λ: Standard Jelinek Mercer weighting parameter
α: Probability of sampling term t via a transformation through a term
t′ sampled from the document d
β: Probability of sampling term t via a transformation through a term
t′ sampled from collection
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 20 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Baseline Retrieval Model
Term Transformation Events : Combined Events
P(Q|d) =
∏
t∈Q
[
λP(t|d) + α
∑
t′∈d
P(t, t′
|d)P(t′
)+
β
∑
t′∈Nt
P(t, t′
|C)P(t′
) + (1 − λ − α − β)P(t|C)
]
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 21 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Baseline Retrieval Model
Generalized Language Model : Evaluation
Query Method
Metrics
MAP P@5 Recall
TREC 123
LM 0.1967 0.4213 0.4477
GLM 0.2104†
0.4562†
0.4893†
TREC 678
LM 0.2189 0.4160 0.5300
GLM 0.2214†
0.4187 0.5327
Robust
LM 0.2658 0.4364 0.7881
GLM 0.2777†
0.4586†
0.7990
WT10G
LM 0.1747 0.3152 0.6377
GLM 0.1906†
0.3782†
0.6689†
Table: Comparative performance of GLM on the basis of mean average precision
(MAP) precision at 5 (P@5) and recall at 1000. A † indicates the significance of
the metric value with respect to the baseline LM based retrieval model.
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 22 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Baseline Retrieval Model
Evaluation: Summary
GLM > LM
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 23 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Baseline Retrieval Model
Evaluation: Summary
GLM > LM
A Word Embedding based Generalized Language Model for
Information Retrieval (SIGIR 2015)
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 23 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Baseline Retrieval Model
Retrieval Models: Keyword Matching
Score(D,Q) =



∑
t∈Q f(D, t)
∏
t∈Q f(D, t)
...
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 24 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Baseline Retrieval Model
Retrieval Models: Keyword Matching
Score(D,Q) =



∑
t∈Q f(D, t)
∏
t∈Q f(D, t)
...
Query: Vehicle Smoke Pollution
Score(D,Q) = f(D, Vehicle) ∗ f(D, Smoke) ∗ f(D, Pollution)
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 24 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Baseline Retrieval Model
Vocabulary Mismatch
Vehicle
Smoke
Pollution
Doc1
Passenger vehicles & heavy-duty trucks
are a major source of air pollution
which includes ozone, particulate matter,
and other smog-forming emissions
in the form of smoke.
Doc2
The exhaust gas of cars powered by
fossil fuels are a major source of
toxic materials in the air that causes
severe damage to the eco-system.
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 25 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Baseline Retrieval Model
Vocabulary Mismatch
Vehicle
Smoke
Pollution
Doc1
Passenger vehicles & heavy-duty trucks
are a major source of air pollution
which includes ozone, particulate matter,
and other smog-forming emissions
in the form of smoke.
Doc2
The exhaust gas of cars powered by
fossil fuels are a major source of
toxic materials in the air that causes
severe damage to the eco-system.
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 25 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Baseline Retrieval Model
Vocabulary Mismatch
Vehicle
Smoke
Pollution
Doc1
Passenger vehicles & heavy-duty trucks
are a major source of air pollution
which includes ozone, particulate matter,
and other smog-forming emissions
in the form of smoke.
Doc2
The exhaust gas of cars powered by
fossil fuels are a major source of
toxic materials in the air that causes
severe damage to the eco-system.
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 25 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Baseline Retrieval Model
Vocabulary Mismatch: Solution
Query Expansion
Add more words of similar meaning to initial query for better retrieval.
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 26 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Baseline Retrieval Model
Outline
1 Improving Baseline Retrieval Model
2 Improving Relevance Feedback based Query Expansion
3 Query Performance Prediction using Word Embedding
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 27 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Relevance Feedback based Query Expansion
Outline
1 Improving Baseline Retrieval Model
2 Improving Relevance Feedback based Query Expansion
3 Query Performance Prediction using Word Embedding
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 28 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Relevance Feedback based Query Expansion
Pseudo-Relevance Feedback based Query Expansion
Given a query Q
Perform initial retrieval using some retrieval model
Consider top K documents as (pseudo-)relevant
Select terms from those documents as candidate expansion terms
Perform re-retrieval with the expanded query
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 29 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Relevance Feedback based Query Expansion
Relevance based Language Model (RLM)
Assumes that both query and (pseudo-)relevant documents sampled
from a latent relevance model R.
The task → to find (estimate) the density function for R.
P(w|R) =
∑
D∈M
P(w|D)
∏
q∈Q
P(q|D)
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 30 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Relevance Feedback based Query Expansion
RM3
RM3
A mixture model of RLM and query likelihood model.
P′
(w|R) = µP(w|R) + (1 − µ)P(w|Q)
P(w|R) =
∑
D∈M
P(w|D)
∏
q∈Q
P(q|D)
P(w|Q) =
tf(w, Q)
|Q|
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 31 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Relevance Feedback based Query Expansion
Motivation of this work
Word Vector Compositionality based Relevance Feedback
using Kernel Density Estimation
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 32 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Relevance Feedback based Query Expansion
Motivation of this work
Word Vector Compositionality based Relevance Feedback
using Kernel Density Estimation
Word embedding:
Captures the semantic relationship between terms.
Relevance feedback:
Captures co-occurence information of terms with query in top docs.
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 32 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Relevance Feedback based Query Expansion
Kernel Density Estimation (KDE)
76543210-1
5
4
3
2
1
0
76543210-1
5
4
3
2
1
0
76543210-1
5
4
3
2
1
0
Estimate a distribution that generates the given data (data points).
Place kernel function (e.g. Gaussian) centered around each observed
data point.
Combine the Gaussians to get a function peaked at the observed data
point.
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 33 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Relevance Feedback based Query Expansion
Kernel Density Estimation (KDE)
f(x) =
1
nh
n∑
i=1
K(
x − xi
h
)
h - bandwidth (smoothing parameter)
n - number of data points
xi - ith data point
K() - the kernel
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 34 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Relevance Feedback based Query Expansion
KDE with Gaussian Kernel
f(x) =
1
nh
n∑
i=1
K(
x − xi
h
)
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 35 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Relevance Feedback based Query Expansion
KDE with Gaussian Kernel
f(x) =
1
nh
n∑
i=1
K(
x − xi
h
)
f(x) =
1
nh
n∑
i=1
1
√
2πσ
e− 1
2σ2 (
x−xi
h
)
2
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 35 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Relevance Feedback based Query Expansion
KDE with Gaussian Kernel
f(x) =
1
nh
n∑
i=1
K(
x − xi
h
)
f(x) =
1
nh
n∑
i=1
1
√
2πσ
e− 1
2σ2 (
x−xi
h
)
2
h and σ: tunable parameters
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 35 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Relevance Feedback based Query Expansion
Comparing the Estimations
Relevance Model Estimation
Given set of terms from (pseudo-)relevant documents, estimate the density
function that generates R.
76543210-1
5
4
3
2
1
0
Kernel Density Estimation
Given set of n data samples, estimates the density function that generates
the observed data.
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 36 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Relevance Feedback based Query Expansion
KDE in Relevance Feedback Scenario
f(x) =
1
nh
n∑
i=1
K(
x − xi
h
)
The Points
Observed data points (xi): Q = {q1, . . . , qn}
Points at which probability (density) is to be estimated (x):
w → Candidate expansion term
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 37 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Relevance Feedback based Query Expansion
KDE in Relevance Feedback Scenario
f(x) =
1
nh
n∑
i=1
K(
x − xi
h
)
The Points
Observed data points (xi): Q = {q1, . . . , qn}
Points at which probability (density) is to be estimated (x):
w → Candidate expansion term
f(w) =
1
nh
n∑
i=1
K(
w − qi
h
)
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 37 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Relevance Feedback based Query Expansion
Relevance Feedback with KDE
f(w1)
f(q2)
f(w2)
q1 q2 q3
w1 w2
One dimensional projection of embedded vectors
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 38 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Relevance Feedback based Query Expansion
Kernel Density Estimate (Unweighted)
f(w) =
1
nh
n∑
i=1
K(
w − qi
h
)
f(w1)
f(q2)
f(w2)
q1 q2 q3
w1 w2
n - number of samples
h - bandwidth
K(·) - kernel function
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 39 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Relevance Feedback based Query Expansion
Kernel Density Estimate (Weighted)
f(w, α) =
1
nh
n∑
i=1
αiK(
w − qi
h
)
f(w2)
f(q2)
f(w1)
q1 q2 q3
w1 w2
n - number of samples
h - bandwidth
K(·) - kernel function
αi - weight of local
Kernel function around
ith data point
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 40 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Relevance Feedback based Query Expansion
One dimensional KDE
f(w2)
f(q2)
f(w1)
q1 q2 q3
w1 w2
f(w, α) =
1
nh
n∑
i=1
αi
1
√
2πσ
e−
(w−qi)2
2σ2h2
αi : weight of the ith query term
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 41 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Relevance Feedback based Query Expansion
One dimensional KDE
f(w, α) =
n∑
i=1
P(w|M)P(qi|M)
1
σ
√
2π
exp(−
(w − qi)T(w − qi)
2σ2h2
)
Captures co-occurence information of terms with query in top
docs.
Captures the semantic relationship between terms.
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 42 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Relevance Feedback based Query Expansion
Two dimensional KDE
In 1D-KDE, set of all top ranked documents considered as a single
document model (M).
We now generalize the 1D-KDE to two dimensions so as to weight
the contributions obtained from different documents in the ranked list
separately.
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 43 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Relevance Feedback based Query Expansion
Two dimensional KDE
q1 q2 q3 Terms
Documents
D1
D2
2
1
D1
D3
The x-axis corresponds to the query term vectors similar to one
dimensional KDE (unordered).
The y-axis represents the normalized term frequency of query terms in
respective feedback documents.
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 44 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Relevance Feedback based Query Expansion
Two dimensional KDE
The Points: One Dimensional KDE
Observed data points (xi): Q = {q1, . . . , qn}
Points at which probability (density) is to be estimated (x):
w → Candidate expansion term
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 45 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Relevance Feedback based Query Expansion
Two dimensional KDE
The Points: Two Dimensional KDE
Observed data points (xij) = (qi, Dj): Encapsulates word vector for
i-th query word qi and its normalized term frequency in feedback
document Dj (P(qi|Dj)).
Points at which probability (density) is to be estimated (x)
= (w, Dj): Encapsulates word vector for candidate expansion term w
and its normalized term frequency in feedback document Dj
(P(w|Dj)).
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 45 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Relevance Feedback based Query Expansion
Two dimensional KDE
f(x, α) =
n∑
i=1
M∑
j=1
P(w|Dj)P(qi|Dj)
2πσ2
exp(
(w − qi)2 + (P(w|Dj) − P(qi|Dj))2
−2σ2h2
)
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 46 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Relevance Feedback based Query Expansion
Two dimensional KDE
f(x, α) =
n∑
i=1
M∑
j=1
P(w|Dj)P(qi|Dj)
2πσ2
exp(
(w − qi)2 + (P(w|Dj) − P(qi|Dj))2
−2σ2h2
)
w in document Dj, denoted as x, will get a high value of the density
function if:
(i) w is semanticlly close to query terms qi;
(ii) w frequently co-occurs with query terms in each top ranked document
Dj.
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 46 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Relevance Feedback based Query Expansion
Term Composition in KDE
Extending the set of pivot points:
q1 q2q1+q2
w
(a) Without composition
q1 q2q1+q2
w
(b) With composition
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 47 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Relevance Feedback based Query Expansion
Importance of Composition
ISI Inter Services
Intelligence
Infrared
Spatial In-
terferometer
Indian
Statistical
Institute
Institute for
Scientific
Interchange
Islamic
State of Iraq
Integral
Satcom
Initiative
Ice Skating
Institute
Islamic
School
of Irving
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 48 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Relevance Feedback based Query Expansion
Importance of Composition
ISI
Kolkata
Inter Services
Intelligence
Infrared
Spatial In-
terferometer
Indian
Statistical
Institute
Institute for
Scientific
Interchange
Islamic
State of Iraq
Integral
Satcom
Initiative
Ice Skating
Institute
Islamic
School
of Irving
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 49 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Relevance Feedback based Query Expansion
Summary
Given:
Corpora : Set of documents.
Query Q : q1, . . . , qn.
Vector embeddings of each of the terms.
First level retrieval performed using any LM.
ET = All terms of top ranked M documents considered as potential
expansion terms.
Calculate KDE(w) for each w ∈ ET.
Terms with high estimated value taken as expansion terms.
Linearly interpolate derived density function with underlying query
model.
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 50 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Relevance Feedback based Query Expansion
Evaluation
Query Method
LM-JM
MAP P@5 Recall
TREC 123
LM 0.1967 0.4213 0.4477
RM3 0.2507∗ 0.4653∗ 0.5154∗
KDERLM 0.2661∗† 0.4798∗ 0.5373∗†
TREC 678
LM 0.2189 0.4160 0.5300
RM3 0.2351∗ 0.4547∗ 0.5366
KDERLM 0.2536∗† 0.4601∗† 0.5520∗†
Robust
LM 0.2658 0.4364 0.7881
RM3 0.3309∗ 0.4929∗ 0.8596∗
KDERLM 0.3420∗† 0.5043∗ 0.8596∗
WT10G
LM 0.1747 0.3152 0.6377
RM3 0.2094∗ 0.3394∗ 0.6743∗
KDERLM 0.2221∗† 0.3419∗ 0.6910∗†
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 51 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Relevance Feedback based Query Expansion
Evaluation: Summary
KDERLM > RM3 > GLM > LM
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 52 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Relevance Feedback based Query Expansion
Evaluation: Summary
KDERLM > RM3 > GLM > LM
Word Vector Compositionality based Relevance Feedback
using Kernel Density Estimation (CIKM 2016)
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 52 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Relevance Feedback based Query Expansion
Query Expansion for Simple Query
Searching data set
National Geographic Readers: Snakes!
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 53 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Relevance Feedback based Query Expansion
Query Expansion for Simple Query
Searching data set
National Geographic Readers: Snakes!
Query
python life span
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 53 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Relevance Feedback based Query Expansion
Query Expansion for Simple Query
Searching data set
National Geographic Readers: Snakes!
Query
python life span
python setup in windows 10
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 53 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Relevance Feedback based Query Expansion
Query Expansion for Simple Query
Searching data set
National Geographic Readers: Snakes!
Query
python life span
python setup in windows 10
Query expansion → overkill or, not work at all.
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 53 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Improving Relevance Feedback based Query Expansion
Query Expansion for Simple Query
Searching data set
National Geographic Readers: Snakes!
Query
python life span
python setup in windows 10
Query expansion → overkill or, not work at all.
How to understand whether query actually need expansion? → QPP
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 53 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Query Performance Prediction using Word Embedding
Outline
1 Improving Baseline Retrieval Model
2 Improving Relevance Feedback based Query Expansion
3 Query Performance Prediction using Word Embedding
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 54 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Query Performance Prediction using Word Embedding
Query Performance Prediction (QPP)
Definition
To quantify the quality of search results when no relevance feedback is
given.
For an ambiguous query, it may be difficult for a search engine to
return satisfactory results for the query.
This can lead to poor IR effectiveness.
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 55 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Query Performance Prediction using Word Embedding
Query Performance Prediction (QPP)
Why we need QP Predictor?
▶ Feedback to Users.
▶ Feedback to Search Engine.
▶ Feedback to System Administrator.
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 56 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Query Performance Prediction using Word Embedding
What Makes a Query Difficult?
Figure: A general model of a topic based on a query Q expressing a specific
information need, the relevant documents R for Q, the entire collection C, and
the distances between the sets involved.
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 57 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Query Performance Prediction using Word Embedding
Characteristics of Difficult Query
d(Q, C) - Distance between query and collection; Reflects likelihood of generation of Q
from C.
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 58 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Query Performance Prediction using Word Embedding
Characteristics of Difficult Query
d(Q, C) - Distance between query and collection; Reflects likelihood of generation of Q
from C.
d(R, C) - Distance between retrieved documents and collection; Reflects likelihood of
generation of documents of R from C.
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 58 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Query Performance Prediction using Word Embedding
Characteristics of Difficult Query
d(Q, C) - Distance between query and collection; Reflects likelihood of generation of Q
from C.
d(R, C) - Distance between retrieved documents and collection; Reflects likelihood of
generation of documents of R from C.
d(Q, Q) - Diversity of query. Reflects the ambiguity of Q; more the diameter, more
ambiguous the query.
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 58 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Query Performance Prediction using Word Embedding
Characteristics of Difficult Query
d(Q, C) - Distance between query and collection; Reflects likelihood of generation of Q
from C.
d(R, C) - Distance between retrieved documents and collection; Reflects likelihood of
generation of documents of R from C.
d(Q, Q) - Diversity of query. Reflects the ambiguity of Q; more the diameter, more
ambiguous the query.
d(R, R) - Distance among retrieved documents; Manifestation of ambiguity of retrieved
documents.
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 58 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Query Performance Prediction using Word Embedding
Characteristics of Difficult Query
d(Q, C) - Distance between query and collection; Reflects likelihood of generation of Q
from C.
d(R, C) - Distance between retrieved documents and collection; Reflects likelihood of
generation of documents of R from C.
d(Q, Q) - Diversity of query. Reflects the ambiguity of Q; more the diameter, more
ambiguous the query.
d(R, R) - Distance among retrieved documents; Manifestation of ambiguity of retrieved
documents.
d(Q, R) - Distance between query and retrieved documents R; Equivalent to distribution
of retrieval scores of R.
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 58 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Query Performance Prediction using Word Embedding
Characteristics of Difficult Query
d(Q, C) - Distance between query and collection; Reflects likelihood of generation of Q
from C.
d(Q, Q) - Diversity of query. Reflects the ambiguity of Q; more the diameter, more
ambiguous the query.
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 58 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Query Performance Prediction using Word Embedding
Word Embedding based QPP
Ambiguous Terms
Terms with multiple senses
Hypothesis
For an ambiguous term w, in terms of the embedded space, w is more
likely to be a peripheral point of a word cluster rather than being an
interior point close to the cluster centre.
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 59 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Query Performance Prediction using Word Embedding
Ambiguous Term in Embedded Space
Figure: The neighbourhood of an ambiguous word with multiple senses
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 60 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Query Performance Prediction using Word Embedding
Ambiguous Term in Embedded Space
Figure: Histogram of number of terms in the ϵ-neighbourhood of a term
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 61 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Query Performance Prediction using Word Embedding
Ambiguous Term in Embedded Space
Figure: Histogram of number of terms in the ϵ-neighbourhood of a term
Cluster the set of points in the ϵ-neighbourhood of query term
For an ambiguous word, the distribution of points in the clusters likely
to be close to uniform
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 61 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Query Performance Prediction using Word Embedding
Word Embedding based QPP
AMBWV(Q; K, ϵ) =
1
K
K∑
k=1
(|Nϵ(Q)| − µ)2
;
Q - The query.
Nϵ(Q) - ϵ-neighborhood of Q.
K - Number of clusters.
ϵ - Neighbourhood size.
µ - Average cardinality of clusters. (|Nϵ(Q)|
K )
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 62 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Query Performance Prediction using Word Embedding
Word Embedding based QPP
AMBWV(Q; K, ϵ) =
1
K
K∑
k=1
(|Nϵ(Q)| − µ)2
;
High variance → One sense predominates others → Non-ambiguous
Low variance → Having multiple senses → Ambiguous
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 63 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Query Performance Prediction using Word Embedding
Pre-retrieval QPP
d(Q, C) - Reflects likelihood of generation of Q from C.
d(Q, Q) - Reflects the ambiguity of Q;
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 64 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Query Performance Prediction using Word Embedding
Pre-retrieval QPP
d(Q, C) - Reflects likelihood of generation of Q from C.
d(Q, Q) - Reflects the ambiguity of Q;
Pre-retrieval QPP Methods
Ignores any post-retrieval information
Hybrid of pre-retrieval and post-retrieval approaches outperforms both
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 64 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Query Performance Prediction using Word Embedding
Combining with NQC
NQC
Post-retrieval QPP.
Variance of similarity score proportional to query performance.
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 65 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Query Performance Prediction using Word Embedding
Word Embedding and NQC based Hybrid Predictor
AMBWV-NQC = αAMBWV + (1 − α)NQC
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 66 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Query Performance Prediction using Word Embedding
Evaluation: Metrics
Correlation between QPP scores and true score (AP value)
Evaluation Metrics
Pearson’s ρ: Correlation coefficient
Kendall’s τ: Rank correlation coefficient
High correlation → Better prediction
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 67 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Query Performance Prediction using Word Embedding
Results on Some of the Topic sets
TREC3 TREC7 TREC8 TREC10
ρ τ ρ τ ρ τ ρ τ
AvgIDF 0.2353 0.2490 0.4262 0.3796 0.5910 0.3518 0.3736 0.2359
MaxIDF 0.2285 0.2772 0.3524 0.2662 0.4938 0.2996 0.2219 0.1233
WordNet 0.0297 -0.0375 0.2696 0.2509 0.0746 0.1014 0.2221 0.0896
AMBWV 0.3222 0.2457 0.2043 0.2132 0.2132 0.1559 0.1544 0.0033
NQC 0.3146 0.1559 0.4580 0.3502 0.6717 0.4335 0.4676 0.2686
AvgIDF-NQC 0.2782 0.2620 0.4551 0.4253 0.6315 0.3878 0.4119 0.2588
MaxIDF-NQC 0.2862 0.3159 0.4204 0.3094 0.5815 0.3714 0.2157 0.2336
WordNet-NQC 0.1794 0.1069 0.4095 0.3453 0.3457 0.3078 0.3292 0.1673
AMBWV-NQC 0.4741 0.3648 0.5362 0.3844 0.7070 0.4498 0.4936 0.2914
Table: Comaprisons of the word embedding based QPP method against various
baselines on the test topic sets.
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 68 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Query Performance Prediction using Word Embedding
Contribution
An embedding based approach to quantify underlying ambiguity of
user query.
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 69 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Query Performance Prediction using Word Embedding
Contribution
An embedding based approach to quantify underlying ambiguity of
user query.
Estimating Gaussian Mixture Models in the Local
Neighbourhood of Embedded Word Vectors for Query
Performance Prediction
(Information Processing and Management - in press)
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 69 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Query Performance Prediction using Word Embedding
Conclusion
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 70 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Query Performance Prediction using Word Embedding
Conclusion
WE & I R Great together!
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 70 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Query Performance Prediction using Word Embedding
Conclusion
WE & I R Great together!
https://github.com/dwaipayanroy
dwaipayan_r@isical.ac.in
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 70 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Query Performance Prediction using Word Embedding
THANK YOU!
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 71 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
GLM: A less-heavy QE method
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 72 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
GLM Parameter Details
λ empirically set to 0.2
GLM parameters α and β varied within range of [0.1, 0.4] to ensure
α + β + λ < 1
Word vectors embedded in a 200-dimensional space with
negative-sampling using 5-word window on continuous bag-of- words
model.
Nt set to 3
Back
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 73 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
1D-KDE Details
P(w|M) : tf(w,M)
|M|
P(qi|M) : tf(qi,M)
|M|
h: Bandwidth set to one (1)
σ: Trained on the development set (TREC 6 and TREC 9)
Return
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 74 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
2D-KDE Details
(w − qi)2
= (w − qi)T(w − qi)
P(w|Dj) :
tf(w,Dj)
|Dj|
P(qi|Dj) :
tf(qi,Dj)
|Dj|
h: Bandwidth set to one (1)
σ: Trained on the development set (TREC 6 and TREC 9)
covariance matrix σ as a diagonal matrix with equal covariance for
both the dimensions.
Return
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 75 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Extra slide
Return
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 76 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Pearson’s ρ
ρ(X,Y) =
∑n
i=1 (xi − ¯x)(yi − ¯y)
√∑n
i=1 (xi − ¯x)2
√∑n
i=1 (yi − ¯y)2
Range: (−1, 1)
Back
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 77 / 78
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Kendall’s τ
τ(X,Y) =
(No. of concordant pair) - (No. of discordant pair)
n(n − 1)/2
Range: (−1, 1)
Back
D. Roy (ISI, Kolkata) WE for IR December 28, 2018 78 / 78

More Related Content

Similar to Improving Information Retrieval Performance using Word Embedding

Wise Document Translator Report
Wise Document Translator ReportWise Document Translator Report
Wise Document Translator Report
Raouf KESKES
 
Arules_TM_Rpart_Markdown
Arules_TM_Rpart_MarkdownArules_TM_Rpart_Markdown
Arules_TM_Rpart_Markdown
Adrian Cuyugan
 
The Role of Selfies in Creating the Next Generation Computer Vision Infused O...
The Role of Selfies in Creating the Next Generation Computer Vision Infused O...The Role of Selfies in Creating the Next Generation Computer Vision Infused O...
The Role of Selfies in Creating the Next Generation Computer Vision Infused O...
hanumayamma
 
Stock_Market_Prediction_using_Social_Media_Analysis
Stock_Market_Prediction_using_Social_Media_AnalysisStock_Market_Prediction_using_Social_Media_Analysis
Stock_Market_Prediction_using_Social_Media_Analysis
Oktay Bahceci
 
R journal 2010-2
R journal 2010-2R journal 2010-2
R journal 2010-2
Ajay Ohri
 
Applying the GNST to the Engie IT RAL-final
Applying the GNST to the Engie IT RAL-finalApplying the GNST to the Engie IT RAL-final
Applying the GNST to the Engie IT RAL-final
Geert Haerens
 
EE660 Project_sl_final
EE660 Project_sl_finalEE660 Project_sl_final
EE660 Project_sl_final
Shanglin Yang
 

Similar to Improving Information Retrieval Performance using Word Embedding (20)

Master's Thesis
Master's ThesisMaster's Thesis
Master's Thesis
 
Lakhotia09
Lakhotia09Lakhotia09
Lakhotia09
 
Wise Document Translator Report
Wise Document Translator ReportWise Document Translator Report
Wise Document Translator Report
 
A Comparative Study Of Generalized Arc-Consistency Algorithms
A Comparative Study Of Generalized Arc-Consistency AlgorithmsA Comparative Study Of Generalized Arc-Consistency Algorithms
A Comparative Study Of Generalized Arc-Consistency Algorithms
 
AIAA Future of Fluids 2018 Balaji
AIAA Future of Fluids 2018 BalajiAIAA Future of Fluids 2018 Balaji
AIAA Future of Fluids 2018 Balaji
 
Arules_TM_Rpart_Markdown
Arules_TM_Rpart_MarkdownArules_TM_Rpart_Markdown
Arules_TM_Rpart_Markdown
 
An introduction to data cleaning with r
An introduction to data cleaning with rAn introduction to data cleaning with r
An introduction to data cleaning with r
 
Notes econometricswithr
Notes econometricswithrNotes econometricswithr
Notes econometricswithr
 
Schedule Classes DataLab Community
Schedule Classes DataLab CommunitySchedule Classes DataLab Community
Schedule Classes DataLab Community
 
SEALS Whitepaper
SEALS WhitepaperSEALS Whitepaper
SEALS Whitepaper
 
Master_2_7a
Master_2_7aMaster_2_7a
Master_2_7a
 
The Role of Selfies in Creating the Next Generation Computer Vision Infused O...
The Role of Selfies in Creating the Next Generation Computer Vision Infused O...The Role of Selfies in Creating the Next Generation Computer Vision Infused O...
The Role of Selfies in Creating the Next Generation Computer Vision Infused O...
 
Stock_Market_Prediction_using_Social_Media_Analysis
Stock_Market_Prediction_using_Social_Media_AnalysisStock_Market_Prediction_using_Social_Media_Analysis
Stock_Market_Prediction_using_Social_Media_Analysis
 
Word Vector Compositionality based Relevance Feedback using Kernel Density Es...
Word Vector Compositionality based Relevance Feedback using Kernel Density Es...Word Vector Compositionality based Relevance Feedback using Kernel Density Es...
Word Vector Compositionality based Relevance Feedback using Kernel Density Es...
 
R journal 2010-2
R journal 2010-2R journal 2010-2
R journal 2010-2
 
Applying the GNST to the Engie IT RAL-final
Applying the GNST to the Engie IT RAL-finalApplying the GNST to the Engie IT RAL-final
Applying the GNST to the Engie IT RAL-final
 
Project final report
Project final reportProject final report
Project final report
 
okafor2021.pdf
okafor2021.pdfokafor2021.pdf
okafor2021.pdf
 
EE660 Project_sl_final
EE660 Project_sl_finalEE660 Project_sl_final
EE660 Project_sl_final
 
An Optical Character Recognition Engine For Graphical Processing Units
An Optical Character Recognition Engine For Graphical Processing UnitsAn Optical Character Recognition Engine For Graphical Processing Units
An Optical Character Recognition Engine For Graphical Processing Units
 

Recently uploaded

如何办理加州大学伯克利分校毕业证(UCB毕业证)成绩单留信学历认证
如何办理加州大学伯克利分校毕业证(UCB毕业证)成绩单留信学历认证如何办理加州大学伯克利分校毕业证(UCB毕业证)成绩单留信学历认证
如何办理加州大学伯克利分校毕业证(UCB毕业证)成绩单留信学历认证
a8om7o51
 
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
fztigerwe
 
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
Amil baba
 
Audience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptxAudience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptx
Stephen266013
 
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
pwgnohujw
 
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Valters Lauzums
 
原件一样伦敦国王学院毕业证成绩单留信学历认证
原件一样伦敦国王学院毕业证成绩单留信学历认证原件一样伦敦国王学院毕业证成绩单留信学历认证
原件一样伦敦国王学院毕业证成绩单留信学历认证
pwgnohujw
 
如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证
如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证
如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证
ju0dztxtn
 

Recently uploaded (20)

Predictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesPredictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting Techniques
 
MATERI MANAJEMEN OF PENYAKIT TETANUS.ppt
MATERI  MANAJEMEN OF PENYAKIT TETANUS.pptMATERI  MANAJEMEN OF PENYAKIT TETANUS.ppt
MATERI MANAJEMEN OF PENYAKIT TETANUS.ppt
 
What is Insertion Sort. Its basic information
What is Insertion Sort. Its basic informationWhat is Insertion Sort. Its basic information
What is Insertion Sort. Its basic information
 
The Significance of Transliteration Enhancing
The Significance of Transliteration EnhancingThe Significance of Transliteration Enhancing
The Significance of Transliteration Enhancing
 
Aggregations - The Elasticsearch "GROUP BY"
Aggregations - The Elasticsearch "GROUP BY"Aggregations - The Elasticsearch "GROUP BY"
Aggregations - The Elasticsearch "GROUP BY"
 
Digital Marketing Demystified: Expert Tips from Samantha Rae Coolbeth
Digital Marketing Demystified: Expert Tips from Samantha Rae CoolbethDigital Marketing Demystified: Expert Tips from Samantha Rae Coolbeth
Digital Marketing Demystified: Expert Tips from Samantha Rae Coolbeth
 
Northern New England Tableau User Group (TUG) May 2024
Northern New England Tableau User Group (TUG) May 2024Northern New England Tableau User Group (TUG) May 2024
Northern New England Tableau User Group (TUG) May 2024
 
如何办理加州大学伯克利分校毕业证(UCB毕业证)成绩单留信学历认证
如何办理加州大学伯克利分校毕业证(UCB毕业证)成绩单留信学历认证如何办理加州大学伯克利分校毕业证(UCB毕业证)成绩单留信学历认证
如何办理加州大学伯克利分校毕业证(UCB毕业证)成绩单留信学历认证
 
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
 
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
 
Audience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptxAudience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptx
 
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarjSCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
 
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
 
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
 
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
 
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
 
原件一样伦敦国王学院毕业证成绩单留信学历认证
原件一样伦敦国王学院毕业证成绩单留信学历认证原件一样伦敦国王学院毕业证成绩单留信学历认证
原件一样伦敦国王学院毕业证成绩单留信学历认证
 
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
 
如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证
如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证
如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证
 
Seven tools of quality control.slideshare
Seven tools of quality control.slideshareSeven tools of quality control.slideshare
Seven tools of quality control.slideshare
 

Improving Information Retrieval Performance using Word Embedding