Word Embedding in Information Retrieval: Present and Beyond
Debasis Ganguly¹  Dwaipayan Roy²  Ayan Bandopadhyay²  Mandar Mitra²
¹ IBM Research, Dublin
² CVPR Unit, Indian Statistical Institute
December 19, 2017
D. Ganguly, D. Roy, M. Mitra Word Embeddings in IR December 19, 2017 1 / 158
Overview
1 Information Retrieval: A Primer
2 Introduction to Word Embedding
3 An overview of Word Embedding applications for IR
4 Generalized Language Model
5 Embedding based Relevance Feedback with Kernel Density Estimate
6 Documents as sets of vectors
7 Task specific word vector learning
Information retrieval
Problem definition:
Given a user’s information need, find documents satisfying that need.
Types of information: text, images/graphics, speech, video, etc.
Text is still the most commonly used.
Outline
1 Information Retrieval: A Primer
    Bag of Words Representation
    Vector space model
    Language Model
    Pseudo-Relevance Feedback and Query Expansion
    IR Evaluation
2 Introduction to Word Embedding
3 An overview of Word Embedding applications for IR
4 Generalized Language Model
5 Embedding based Relevance Feedback with Kernel Density Estimate
6 Documents as sets of vectors
7 Task specific word vector learning
IR: bag of words approach
Document → list of keywords / content-descriptors / terms
User’s information need → (natural-language) query → list of keywords
Measure overlap between query and documents.
Indexing
Tokenization: identify individual words.
In the prestigious season-ending tournament finale, Sindhu played her
heart out before losing 21-15, 12-21, 19-21 to Yamaguchi in an
energy-sapping summit clash that lasted an hour and 31 minutes.
⇓
In the prestigious season-ending tournament . . .
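The tokenization step above can be sketched with a simple regular-expression tokenizer. This is a minimal illustration in plain Python; real systems (e.g. NLTK) offer more careful tokenizers, and keeping hyphenated words whole is just one possible design choice:

```python
import re

def tokenize(text):
    # Lowercase, then pull out alphabetic tokens; hyphenated words
    # such as "season-ending" are kept as single tokens here.
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())

tokens = tokenize("In the prestigious season-ending tournament finale, "
                  "Sindhu played her heart out")
print(tokens[:5])  # ['in', 'the', 'prestigious', 'season-ending', 'tournament']
```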
Indexing: stemming
Stemming: reduce words to a common root.
e.g. resignation, resigned, resigns → resign
Use standard algorithms (Porter).
Stemming in NLTK
import nltk

filtered = ['resignation', 'resigned', 'resigns']  # tokens after tokenization/stopword filtering
porter = nltk.PorterStemmer()
stemmed = [porter.stem(w) for w in filtered]
index_terms = sorted(set(stemmed))
Indexing
Thesaurus: find synonyms for words in the document.
Phrases: find multi-word terms, e.g. computer science, data mining.
  Use syntax/linguistic methods or “statistical” methods.
Named entities: identify names of people, organizations and places; dates; monetary or other amounts, etc.
IR: basic principle
Document → list of keywords / content-descriptors / terms
User’s information need → (natural-language) query → list of keywords
Measure overlap between query and documents.
Boolean model
Keywords combined using and, or, (and) not
e.g. (medicine or treatment) and (hypertension or “high blood pressure”)
Efficient and easy to implement (list merging)
  and ≡ intersection
  or ≡ union
Drawbacks
  or: one match is as good as many
  and: one miss is as bad as all
  no ranking
  queries may be difficult to formulate
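The list-merging idea behind and/or can be sketched as follows (a toy illustration; the terms and doc-id posting lists below are made up):

```python
def and_merge(a, b):
    # and ≡ intersection: walk two sorted posting lists in lockstep.
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def or_merge(a, b):
    # or ≡ union of the two posting lists.
    return sorted(set(a) | set(b))

# Hypothetical sorted posting lists (doc-ids) for two terms.
medicine = [1, 4, 7, 9]
hypertension = [2, 4, 9, 11]
print(and_merge(medicine, hypertension))  # [4, 9]
print(or_merge(medicine, hypertension))   # [1, 2, 4, 7, 9, 11]
```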
Vector space model
Any text item (“document”) is represented as a list of terms and associated weights.
  D = (⟨t1, w1⟩, . . . , ⟨tn, wn⟩)
Term = keywords or content-descriptors
Weight = measure of the importance of a term in representing the information contained in the document
Term weights
Term frequency (tf): repeated words are strongly related to content.
Inverse document frequency (idf): an uncommon term is more important.
  Example: medicine vs. antibiotic
Normalization by document length:
  long docs contain many distinct words
  long docs contain the same word many times
  term weights for long documents should therefore be reduced
  use # bytes, # distinct words, Euclidean length, etc.
Weight = tf × idf / normalization
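The tf × idf / normalization recipe can be sketched as follows (one of many variants; the log idf and the length-ratio normalization below are illustrative choices, not the only ones in use):

```python
import math

def tfidf_weight(tf, df, N, doc_len, avg_doc_len):
    # idf = log(N/df) rewards rare terms; dividing by the length ratio
    # (doc_len / avg_doc_len) penalizes terms from overly long documents.
    idf = math.log(N / df)
    normalization = doc_len / avg_doc_len
    return tf * idf / normalization

# A term occurring 3 times, in 10 of 1000 docs, in a doc 1.2x average length:
w = tfidf_weight(tf=3, df=10, N=1000, doc_len=120, avg_doc_len=100)
print(round(w, 3))  # 11.513
```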
Term weights: commonly used weighting schemes
Pivoted normalization [Singhal et al., SIGIR 96]:
  weight = [ (1 + log(tf)) / (1 + log(average tf)) ] × log(N/df) / [ (1.0 − slope) × pivot + slope × (# unique terms) ]
BM25 (probabilistic model) [Robertson and Zaragoza, FTIR 2009]:
  weight = tf × log( (N − df + 0.5) / (df + 0.5) ) / [ k1·((1 − b) + b·(dl/avdl)) + tf ]
Retrieval
Measure vocabulary overlap between user query and documents.
       t1  . . .  tn
  Q = (q1, . . . , qn)
  D = (d1, . . . , dn)
  Sim(Q, D) = Q⃗ · D⃗ = ∑i qi × di
Use inverted list (index).
  Termi → (Di1, wi1), . . . , (Dik, wik)
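Scoring with an inverted index can be sketched like this (a toy index; the terms, weights, and doc-ids are made up for illustration):

```python
from collections import defaultdict

# Toy inverted index: term_i -> [(doc_id, weight), ...]
index = {
    "satellite": [("d1", 0.8), ("d2", 0.5)],
    "nasa":      [("d1", 0.6), ("d3", 0.9)],
}

def retrieve(query_weights, index):
    # Sim(Q, D) = sum_i q_i * d_i, accumulated document-by-document
    # while scanning only the posting lists of the query terms.
    scores = defaultdict(float)
    for term, q_w in query_weights.items():
        for doc_id, d_w in index.get(term, []):
            scores[doc_id] += q_w * d_w
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(retrieve({"satellite": 1.0, "nasa": 1.0}, index))
```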
Language Model
Each document is a multinomial distribution over terms.
Equivalent to an urn that contains N balls of p different colors, of which n1 are of color c1, n2 are of color c2, . . . , with ∑_{i=1}^{p} ni = N.
Each term sampling is independent of the others.
Query sampling from document
Query terms are sampled independently from each document.
Documents are ranked by decreasing values of the posterior
probabilities P(D|Q).
Converting posteriors to priors during indexing
P(D|Q) is not known during indexing. Why? Because queries are not yet seen; only the terms are known (from the document content).
Estimate the sampling probability of each term from a document:
  P(t|D) = tf(t, D) / ∑_{t′∈D} tf(t′, D)   (maximum likelihood)
  P(D|Q) ∝ P(Q|D) · P(D)
  P(D|Q) ∝ P(D) · ∏_{q∈Q} P(q|D)
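The maximum-likelihood query-likelihood score can be sketched as follows (toy tokenized documents; no smoothing yet, so an unseen query term zeroes the whole product, which is exactly the problem smoothing addresses):

```python
from collections import Counter

def score(query, doc, prior=1.0):
    # P(D|Q) ∝ P(D) · Π_{q∈Q} P(q|D), with the maximum-likelihood
    # estimate P(t|D) = tf(t, D) / |D|.
    tf = Counter(doc)
    p = prior
    for q in query:
        p *= tf[q] / len(doc)
    return p

doc = ["space", "satellite", "space", "launch"]
print(score(["space"], doc))           # 0.5
print(score(["space", "probe"], doc))  # 0.0  (unseen term zeroes the product)
```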
Smoothing
Problem with maximum likelihood? If a query term is absent from a document, the probability becomes zero. Too harsh.
Consider a complementary event: a term can also be generated from the collection.
Toss a coin. If ‘Heads’, sample the term t from document D; if ‘Tails’, sample t from the collection.
Let λ be the probability of ‘Heads’; the probability of ‘Tails’ is (1 − λ).
  P(t|D) = λ · tf(t, D) / ∑_{t′∈D} tf(t′, D) + (1 − λ) · P(t|coll)
  P(t|coll) = cf(t) / ∑_{t′∈V} cf(t′)
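This mixture (Jelinek-Mercer smoothing) can be sketched as follows (toy document and collection token lists; λ = 0.7 is an arbitrary illustrative value):

```python
from collections import Counter

def p_smoothed(t, doc, collection, lam=0.7):
    # P(t|D) = λ · tf(t,D)/Σ tf(t',D)  +  (1−λ) · cf(t)/Σ cf(t')
    tf = Counter(doc)
    cf = Counter(collection)
    p_doc = tf[t] / len(doc)
    p_coll = cf[t] / len(collection)
    return lam * p_doc + (1 - lam) * p_coll

doc = ["space", "satellite", "space", "launch"]
collection = doc + ["probe", "orbit", "space", "nasa"]
# "probe" is absent from the document but no longer gets zero probability:
print(p_smoothed("probe", doc, collection))
```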
Motivation
Searching depends on matching keywords between user query and document.
Searchers and document creators may use different keywords to denote the same “concept”.
Vocabulary mismatch =⇒ poor retrieval quality.
Users may not know what they are looking for, but they’ll know when they see it.
Effective query formulation may be difficult; simplify the problem through iteration.
Boost recall: “find me similar documents...”
Solution: expand the query by adding related words/phrases.
Issues:
  select which terms to add to the query
  calculate weights for the added terms
Relevance Feedback: A Graphical Representation
[Figure: the corpus is indexed; the user’s information need is formulated as a query to the IR system, which searches the index and returns ranked documents (d1, d2, d3, . . .); the user marks the ranked documents as relevant/non-relevant; this feedback drives query reformulation, and re-retrieval produces a new ranked list (D1, D2, D3, . . .).]
Relevance Feedback: Basic Idea
User issues a query.
Search engine returns a set of documents.
User marks some documents as relevant, some as non-relevant.
Search engine uses this feedback information, together with collection statistics, to formulate a better representation of the information need.
Search engine performs retrieval with the reformulated query.
The engine returns a new set of documents, with (hopefully) better recall.
Can iterate this: several rounds of relevance feedback.
Relevance Feedback: Example 1
Initial Query: bike
Relevance Feedback: Example 1
Results for initial query
Relevance Feedback: Example 1
Feedback: Relevant document selected
Relevance Feedback: Example 1
Result after re-retrieval
Relevance Feedback: Example 2
TREC 2 Topic 113: New Space Satellite Applications

   Rank  Score  Document title
 ✓    1  0.539  NASA Hasn’t Scrapped Imaging Spectrometer
 ✓    2  0.533  NASA Scratches Environment Gear From Satellite Plan
      3  0.528  Science Panel Backs NASA Satellite Plan, But Urges Launches of Smaller Probes
      4  0.526  A NASA Satellite Project Accomplishes Incredible Feat: Staying Within Budget
      5  0.525  Scientist Who Exposed Global Warming Proposes Satellites for Climate Research
      6  0.524  Report Provides Support for the Critics of Using Big Satellites to Study Climate
      7  0.516  Arianespace Receives Satellite Launch Pact From Telesat Canada
 ✓    8  0.509  Telecommunications Tale of Two Companies

(✓ = marked relevant by the user)
Relevance Feedback: Example 2
Initial query: new space satellite applications
Expanded query after relevance feedback:
   2.074 new          15.106 space
  30.816 satellite     5.660 application
   5.991 nasa          5.196 eos
   4.196 launch        3.972 aster
   3.516 instrument    3.446 arianespace
   3.004 bundespost    2.806 ss
   2.790 rocket        2.053 scientist
   2.003 broadcast     1.172 earth
   0.836 oil           0.646 measure
Relevance Feedback: Example 2
TREC 2 Topic 113: New Space Satellite Applications

 Old  Rank  Score  Document title
   2     1  0.513  NASA Scratches Environment Gear From Satellite Plan
   1     2  0.500  NASA Hasn’t Scrapped Imaging Spectrometer
         3  0.493  When the Pentagon Launches a Secret Satellite, Space Sleuths Do Some Spy Work of Their Own
         4  0.493  NASA Uses ‘Warm’ Superconductors For Fast Circuit
   8     5  0.492  Telecommunications Tale of Two Companies
         6  0.491  Soviets May Adapt Parts of SS-20 Missile For Commercial Use
         7  0.490  Gaping Gap: Pentagon Lags in Race To Match the Soviets In Rocket Launchers
         8  0.490  Rescue of Satellite By Space Agency To Cost $90 Million
Key concept for relevance feedback: Centroid
The centroid is the center of mass of a set of points.
Recall that we represent documents as points in a high-dimensional space.
Thus: we can compute centroids of documents.
The centroid of a set of documents D:
  μ⃗(D) = (1/|D|) ∑_{d⃗∈D} d⃗
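Computing a centroid component-wise can be sketched as follows (tiny 2-dimensional dense vectors for illustration; real document vectors are high-dimensional and sparse):

```python
def centroid(vectors):
    # μ(D) = (1/|D|) Σ_{d∈D} d, computed component by component.
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]

docs = [[1.0, 0.0], [3.0, 2.0]]
print(centroid(docs))  # [2.0, 1.0]
```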
Rocchio’s Algorithm for Relevance Feedback
Standard algorithm used for relevance feedback (SMART, ’70s).
Integrates a measure of relevance feedback into Vector Space Model.
Maximize the difference between average similarities for relevant and
non-relevant documents.
Rocchio’s Algorithm for Relevance Feedback
Chooses the optimal query Q⃗opt that maximizes:
  Q⃗opt = arg max_Q⃗ [ sim(Q⃗, μ(DR)) − sim(Q⃗, μ(DNR)) ]
DR: set of relevant documents
DNR: set of non-relevant documents
μ(DX): centroid of a set of documents X
Goal: find the vector Q⃗opt that separates the relevant and non-relevant documents maximally.
Rocchio’s Algorithm Illustrated
[Figure sequence: relevant and non-relevant documents plotted as points in the vector space.]
μ⃗R: centroid of the relevant documents; by itself, μ⃗R does not separate the relevant documents from the non-relevant ones.
μ⃗NR: centroid of the non-relevant documents.
μ⃗R − μ⃗NR: the difference vector.
Adding the difference vector to μ⃗R gives Q⃗opt.
Q⃗opt separates the relevant and non-relevant documents perfectly.
Rocchio’s Algorithm for Relevance Feedback
The optimal query vector:
  Q⃗opt = μ(DR) + [ μ(DR) − μ(DNR) ]
        = (1/|DR|) ∑_{d⃗i∈DR} d⃗i + [ (1/|DR|) ∑_{d⃗i∈DR} d⃗i − (1/|DNR|) ∑_{d⃗j∈DNR} d⃗j ]
The centroid of the relevant documents, moved by the difference between the two centroids.
Rocchio’s Algorithm
In practice:
  Q⃗m = α·Q⃗O + β·(1/|DR|) ∑_{d⃗i∈DR} d⃗i − γ·(1/|DNR|) ∑_{d⃗j∈DNR} d⃗j
Qm: modified query
QO: original query
DR: set of known relevant documents
DNR: set of known non-relevant documents
α, β, γ: weights (belief) of the corresponding sets
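The practical update rule can be sketched directly in code (the query vector, feedback vectors, and α, β, γ values below are the ones from the worked example that appears later in this deck):

```python
def rocchio(q, rel_docs, nonrel_docs, alpha=1.0, beta=0.5, gamma=0.25):
    # Q_m = α·Q_O + β·centroid(D_R) − γ·centroid(D_NR)
    def centroid(vecs):
        n = len(vecs)
        return [sum(v[i] for v in vecs) / n for i in range(len(vecs[0]))]
    c_rel = centroid(rel_docs)
    c_non = centroid(nonrel_docs)
    return [alpha * qi + beta * ri - gamma * ni
            for qi, ri, ni in zip(q, c_rel, c_non)]

q_new = rocchio([0, 4, 0, 8, 0, 0],
                [[2, 4, 8, 0, 0, 2]],    # one relevant document
                [[8, 0, 4, 4, 0, 16]])   # one non-relevant document
print(q_new)  # [-1.0, 6.0, 3.0, 7.0, 0.0, -3.0]
```

Note the negative components in the result; as the parameter-settings slide says, these are usually clipped to 0 in the vector space model.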
Rocchio’s Algorithm: Parameter Settings
  Q⃗m = α·Q⃗O + β·(1/|DR|) ∑_{d⃗i∈DR} d⃗i − γ·(1/|DNR|) ∑_{d⃗j∈DNR} d⃗j   (new query)
Set higher β, γ when there are many judged documents.
Positive feedback is more valuable than negative feedback.
Set negative term weights to 0 (a negative weight doesn’t make sense in the vector space model).
Rocchio’s Algorithm: Example
Original query      0 4 0  8 0  0    α = 1.00  →  0 4 0 8 0 0
Positive feedback   2 4 8  0 0  2    β = 0.50  →  1 2 4 0 0 1
Negative feedback   8 0 4  4 0 16    γ = 0.25  →  2 0 1 1 0 4
New query          -1 6 3  7 0 -3
Rocchio’s Algorithm: issues
What should be regarded as NR?
  Documents marked non-relevant by the user? Too little information.
  All documents not known to be relevant? Unseen relevant documents may affect term weights.
How should terms be selected? How many?
  Rank terms by the number of relevant documents they occur in.
  Add 50-100 terms.
Relevance Feedback: Problem
Usually expensive with respect to the user’s time.
Users are reluctant to provide explicit feedback.
Indirect Relevance Feedback
Uses implicit evidence rather than explicit feedback.
Not specific to individual users.
Suitable for Web search.
Evidence: clicks on, or time spent on, a particular retrieved document.
Blind/Pseudo Relevance Feedback
In the absence of feedback, assume the top-ranked documents are relevant.
Obvious danger: if the initial retrieval is poor, ad hoc feedback can aggravate the problem.
Most groups at TREC found ad hoc feedback useful on average.
Suggestion: find ways to improve the initial retrieval.
Relevance based Language Model (RLM)¹
Assumes that both the query and the relevant documents are sampled from a latent relevance model R.
In the absence of training data, the top-ranked documents are considered as the set of (known) relevant documents.
The query serves as the only absolute evidence about the relevance model.
The task is to find (estimate) the density function for R.
Estimates P(w|R) on the basis of P(w|Q).
¹ Lavrenko and Croft, SIGIR 2001
RLM: Formally
Probability Ranking Principle
Rank documents by the odds of their being observed in the relevant class. [a]
P(D|R) / P(D|N) ∼ ∏_{w∈D} P(w|R) / P(w|N)
[a] S. Robertson - Morgan Kaufmann Publishers, Inc.
The Estimation
The probability of sampling a term w from R, denoted P(w|R), is approximated by
P(w|R) ≈ P(w, Q) / P(Q) ∼ P(w, Q)
Numerator: joint probability of observing w along with the query terms
Denominator: query prior probability (constant across terms, so it can be dropped for ranking)
Independent and Identically Distributed (IID) Sampling
Given a query Q = {q1, . . . , qk} and M, a set of initially retrieved documents.
Assumption
The query words Q = {q1, . . . , qk} and w are sampled independently and identically from the (pseudo-)relevant documents
Pseudo-relevant documents =⇒ top-ranked documents
Independent and Identically Distributed (IID) Sampling
Q = {q1, . . . , qk}; M = {D1, . . . , D|M|}
P(w, Q) = Σ_{D∈M} P(D) P(w, Q|D)
P(w, Q|D) = P(w|D) ∏_{q∈Q} P(q|D)
P(w, Q) = Σ_{D∈M} P(D) P(w|D) ∏_{q∈Q} P(q|D)
∏_{q∈Q} P(q|D): maximum likelihood estimate of Q in D (the LM-based retrieval score of D).
P(w|D): maximum likelihood estimate of w in D.
P(D): prior probability of selecting the document.
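Under the IID assumption, the estimate above fits in a few lines of Python. This is an illustrative sketch, not the authors' code: the toy token lists, the uniform document prior, and the unsmoothed maximum-likelihood estimates are all assumptions made here for brevity.

```python
from collections import Counter

def rm1(query, docs):
    """Relevance model estimate: P(w, Q) = sum_D P(D) * P(w|D) * prod_q P(q|D).

    query: list of query tokens; docs: list of token lists (pseudo-relevant set M).
    Uses a uniform prior P(D) and unsmoothed maximum-likelihood estimates.
    """
    p_w_r = {}
    prior = 1.0 / len(docs)
    for doc in docs:
        tf, n = Counter(doc), len(doc)
        # prod_q P(q|D): the LM-based retrieval score of D
        q_lik = 1.0
        for q in query:
            q_lik *= tf[q] / n
        for w, f in tf.items():
            p_w_r[w] = p_w_r.get(w, 0.0) + prior * (f / n) * q_lik
    z = sum(p_w_r.values())          # normalise into a distribution
    return {w: v / z for w, v in p_w_r.items()} if z > 0 else p_w_r
```

In practice P(q|D) would be smoothed; with the unsmoothed estimate used here, a document missing any query term contributes nothing to the model.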
RM3
Drawback of RLM
Does not consider the original query terms when estimating the density function
RM3
A mixture model of RLM and the query likelihood model:
P′(w|R) = µ P(w|R) + (1 − µ) P(w|Q)
P(w|R) = Σ_{D∈M} P(w|D) ∏_{q∈Q} P(q|D)
P(w|Q) = tf(w, Q) / |Q|
2 UMass at TREC 2004: Novelty and HARD - Jaleel et al.
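The mixture above is a one-liner once a relevance model has been estimated. A minimal sketch (assumptions: `p_w_r` is any dictionary representing P(w|R), and the query model is the maximum-likelihood estimate tf(w, Q)/|Q|):

```python
def rm3(p_w_r, query, mu=0.6):
    """P'(w|R) = mu * P(w|R) + (1 - mu) * P(w|Q)."""
    # ML query model: tf(w, Q) / |Q|
    q_model = {q: query.count(q) / len(query) for q in set(query)}
    vocab = set(p_w_r) | set(q_model)
    return {w: mu * p_w_r.get(w, 0.0) + (1 - mu) * q_model.get(w, 0.0)
            for w in vocab}
```

Because both inputs are distributions, the mixture is again a distribution; original query terms can no longer be drowned out by expansion terms.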
Relevance Feedback: Summary
Different types of feedback
Explicit: Users do the work
Implicit: System guesses based on user activity
Blind/Pseudo: Users not involved
Query Expansion as a general mechanism
Outline
1 Information Retrieval: A Primer
Bag of Words Representation
Vector space model
Language Model
Pseudo-Relevance Feedback and Query Expansion
IR Evaluation
2 Introduction to Word Embedding
3 An overview of Word Embedding applications for IR
4 Generalized Language Model
5 Embedding based Relevance Feedback with Kernel Density Estimate
6 Documents as sets of vectors
7 Task specific word vector learning
Cranfield method (Cleverdon et al., 1960s)
Benchmark data
Document collection
Query / topic collection
Relevance judgments - information about which document is relevant
to which query
syllabus
question paper
correct answers
Assumptions
relevance of a document to a query is objectively discernible
all relevant documents contribute equally to the performance
measures
relevance of a document is independent of the relevance of other
documents
Evaluation metrics
Background
User has an information need.
Information need is converted into a query.
Documents are relevant or non-relevant.
Ideal system retrieves all and only the relevant documents.
[Diagram: the user's information need is expressed as a query to the system, which searches the document collection]
Set-based metrics
Recall = #(relevant retrieved) / #(relevant) = #(true positives) / #(true positives + false negatives)
Precision = #(relevant retrieved) / #(retrieved) = #(true positives) / #(true positives + false positives)
F = 1 / (α/P + (1 − α)/R) = (β² + 1)PR / (β²P + R)
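These set-based measures can be sketched directly from the definitions (illustrative code; the document IDs in the usage are made up):

```python
def precision_recall_f(retrieved, relevant, beta=1.0):
    """Set-based precision, recall and F (beta = 1 weighs P and R equally)."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)                 # true positives
    p = tp / len(retrieved) if retrieved else 0.0
    r = tp / len(relevant) if relevant else 0.0
    f = (beta**2 + 1) * p * r / (beta**2 * p + r) if p + r > 0 else 0.0
    return p, r, f
```

For example, retrieving {d1, d2, d3, d4} when {d1, d3, d5} are relevant gives P = 1/2, R = 2/3, F1 = 4/7.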
Metrics for ranked results
(Non-interpolated) average precision
Which is better?
Ranking A: 1 Non-relevant, 2 Non-relevant, 3 Non-relevant, 4 Relevant, 5 Relevant
Ranking B: 1 Relevant, 2 Relevant, 3 Non-relevant, 4 Non-relevant, 5 Non-relevant
Metrics for ranked results
(Non-interpolated) average precision
Rank  Type          Recall  Precision
1     Relevant      0.2     1.00
2     Non-relevant
3     Relevant      0.4     0.67
4     Non-relevant
5     Non-relevant
6     Relevant      0.6     0.50
∞     Relevant      0.8     0.00
∞     Relevant      1.0     0.00
AvgP = (1/5) (1 + 2/3 + 3/6)   (5 relevant docs in all)
In general: AvgP = (1/NRel) Σ_{di∈Rel} i / Rank(di)
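The computation above can be sketched as follows (illustrative code; the doc IDs are made up). Relevant documents that are never retrieved contribute zero precision, exactly as the ∞-rank rows in the table do:

```python
def average_precision(ranking, relevant):
    """Non-interpolated AP: average, over all relevant docs, of the
    precision at the rank where each is retrieved (i / Rank(d_i));
    unretrieved relevant docs contribute 0."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)
```

On the slide's example (relevant docs at ranks 1, 3 and 6, five relevant in all) this reproduces (1 + 2/3 + 3/6)/5.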
Metrics for ranked results
Interpolated average precision at a given recall point
Recall points correspond to multiples of 1/NRel
NRel is different for different queries
[Plot: precision-recall curves for Q1 (3 rel. docs) and Q2 (4 rel. docs)]
Interpolation is required to compute averages across queries
Metrics for ranked results
Interpolated average precision: Pint(r) = max_{r′≥r} P(r′)
11-pt interpolated average precision
Rank  Type          Recall  Precision
1     Relevant      0.2     1.00
2     Non-relevant
3     Relevant      0.4     0.67
4     Non-relevant
5     Non-relevant
6     Relevant      0.6     0.50
∞     Relevant      0.8     0.00
∞     Relevant      1.0     0.00
R     Interp. P
0.0   1.00
0.1   1.00
0.2   1.00
0.3   0.67
0.4   0.67
0.5   0.50
0.6   0.50
0.7   0.00
0.8   0.00
0.9   0.00
1.0   0.00
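The interpolation rule Pint(r) = max_{r′≥r} P(r′), sampled at the 11 standard recall points, can be sketched directly (illustrative code):

```python
def interpolated_11pt(recalls, precisions):
    """P_int(r) = max over r' >= r of P(r'), sampled at r = 0.0, 0.1, ..., 1.0.

    recalls/precisions: the (R, P) pairs observed at each relevant document.
    """
    points = []
    for i in range(11):
        r = i / 10
        at_or_beyond = [p for rec, p in zip(recalls, precisions) if rec >= r]
        points.append(max(at_or_beyond) if at_or_beyond else 0.0)
    return points
```

Fed the (R, P) pairs from the table above, it reproduces the interpolated column exactly.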
Metrics for ranked results
11-pt interpolated average precision
[Plot: interpolated precision at the standard recall points 0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
Metrics for sub-document retrieval
Let pr be the document part retrieved at rank r
rsize(pr): amount of relevant text contained in pr
size(pr): total number of characters contained in pr
Trel: total amount of relevant text for the given topic
P[r] = Σ_{i=1..r} rsize(pi) / Σ_{i=1..r} size(pi)
R[r] = (1/Trel) Σ_{i=1..r} rsize(pi)
Metrics for ranked results
Precision at k (P@k) - precision after k documents have been
retrieved
easy to interpret
not very stable / discriminatory
does not average well
R precision - precision after NRel documents have been retrieved
Cumulated Gain
Idea:
Highly relevant documents are more valuable than marginally relevant documents
Documents ranked low are less valuable
Gain ∈ {0, 1, 2, 3}
G = ⟨3, 2, 3, 0, 0, 1, 2, 2, 3, 0, . . .⟩
CG[i] = Σ_{j=1..i} G[j]
(n)DCG
DCG[i] = CG[i] if i < b;  DCG[i − 1] + G[i] / log_b i if i ≥ b
Ideal G = ⟨3, 3, . . . , 3, 2, . . . , 2, 1, . . . , 1, 0, . . .⟩
nDCG[i] = DCG[i] / Ideal DCG[i]
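The definition translates directly into code (illustrative sketch; gains are the graded judgments of the documents in rank order, and b is the log base of the discount):

```python
import math

def dcg(gains, b=2):
    """DCG[i] = CG[i] for i < b, else DCG[i-1] + G[i] / log_b(i)."""
    total = 0.0
    for i, g in enumerate(gains, start=1):
        total += g if i < b else g / math.log(i, b)
    return total

def ndcg(gains, b=2):
    """Normalise by the DCG of the ideal (descending-gain) ranking."""
    ideal = dcg(sorted(gains, reverse=True), b)
    return dcg(gains, b) / ideal if ideal > 0 else 0.0
```

A ranking already in descending gain order scores nDCG = 1; pushing the high gains down the list lowers it.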
Mean Reciprocal Rank
Useful for known-item searches with a single target
Let ri be the rank at which the “answer” for query i is retrieved.
Then the reciprocal rank for query i is 1/ri
Mean reciprocal rank over n queries: MRR = (1/n) Σ_{i=1..n} 1/ri
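A minimal sketch (illustrative; a query whose answer is never retrieved is conventionally scored 0 here):

```python
def mean_reciprocal_rank(answer_ranks):
    """answer_ranks: for each query, the rank of its answer, or None if the
    answer was not retrieved (contributing a reciprocal rank of 0)."""
    rr = [1.0 / r if r is not None else 0.0 for r in answer_ranks]
    return sum(rr) / len(rr)
```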
TREC
http://trec.nist.gov
Organized by NIST every year since 1992
Typical tasks
adhoc
user enters a search topic for a one-time information need
document collection is static
routing/filtering
user’s information need is persistent
document collection is a stream of incoming documents
question answering
TREC data
Documents
Genres:
news (AP, LA Times, WSJ, SJMN, Financial Times, FBIS)
govt. documents (Federal Register, Congressional Records)
technical articles (Ziff Davis, DOE abstracts)
Size: 0.8 million documents – 1.7 million web pages
(cf. Google indexes several billion pages)
Topics
title
description
narrative
Outline
1 Information Retrieval: A Primer
2 Introduction to Word Embedding
3 An overview of Word Embedding applications for IR
4 Generalized Language Model
5 Embedding based Relevance Feedback with Kernel Density Estimate
6 Documents as sets of vectors
7 Task specific word vector learning
What is Word Embedding?
Represent every word as a vector in some abstract space.
What are the characteristics of this space?
Two terms t1 and t2 are close if and only if they share similar contexts.
Paris is close to France. Why?
If Paris is close to France, then Berlin will be close to Germany. Why?
Outline
1 Information Retrieval: A Primer
2 Introduction to Word Embedding
Word2vec: Working principle
Word2vec: Properties and Limitations
3 An overview of Word Embedding applications for IR
4 Generalized Language Model
5 Embedding based Relevance Feedback with Kernel Density Estimate
6 Documents as sets of vectors
7 Task specific word vector learning
Word2vec
Word2vec
Architecture:
Input: one-hot vector representation of words.
Context: concatenated one-hot vectors of neighboring words.
Output: softmax of dimension D (the desired dimensionality of the vector).
Connecting the cells:
The current output is a function of the current input and the previous outputs:
y(wn) = f(y(wn−1), c(wn))   (1)
Objective function (to maximize):
J(W) = Σ_{(wt,ct)∈D+} Σ_{c∈ct} p(D = 1 | w⃗t, c⃗) − Σ_{(wt,c′t)∈D−} Σ_{c∈c′t} p(D = 1 | w⃗t, c⃗)
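The positive/negative structure of the objective can be made concrete with a tiny numpy sketch. Everything here — the vocabulary size, the dimension, the randomly initialised tables — is an illustrative assumption; a real trainer would update W_in/W_out by gradient ascent on this quantity.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D = 5, 4                                  # toy vocabulary size and dimension
W_in = rng.normal(scale=0.1, size=(V, D))    # word vectors
W_out = rng.normal(scale=0.1, size=(V, D))   # context vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def p_observed(word, context):
    """p(D = 1 | w, c): probability that (word, context) is a genuine pair."""
    return sigmoid(float(W_in[word] @ W_out[context]))

def objective(pos_pairs, neg_pairs):
    """Log-likelihood form of J(W): reward observed pairs (D+),
    penalise randomly sampled noise pairs (D-)."""
    j = sum(np.log(p_observed(w, c)) for w, c in pos_pairs)
    j += sum(np.log(1.0 - p_observed(w, c)) for w, c in neg_pairs)
    return float(j)
```

Maximising this pushes true (word, context) dot products up and sampled noise pairs down, which is what makes co-occurring words end up with nearby vectors.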
Outline
1 Information Retrieval: A Primer
2 Introduction to Word Embedding
Word2vec: Working principle
Word2vec: Properties and Limitations
3 An overview of Word Embedding applications for IR
4 Generalized Language Model
5 Embedding based Relevance Feedback with Kernel Density Estimate
6 Documents as sets of vectors
7 Task specific word vector learning
Compositional properties of word2vec
Adding (vector sum) of word vectors lead to composing the meanings
of terms.
Example: vec(‘Jadavpur’) + vec(‘NLP’) + vec(‘conference’) is
expected to be close to vec(‘ICON’17’)
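A toy illustration of additive composition with made-up 2-d vectors (the words and the numbers are hypothetical; real embeddings behave analogously in a few hundred dimensions):

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

vec = {
    "nuclear":       np.array([0.9, 0.1]),
    "power":         np.array([0.1, 0.9]),
    "nuclear_power": np.array([0.7, 0.7]),   # the composed concept
    "fish":          np.array([-0.5, 0.2]),  # an unrelated word
}

composed = vec["nuclear"] + vec["power"]     # vector sum composes meanings
```

The sum lands much closer (by cosine) to the composite concept than to the unrelated word.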
FastText - Learning word vectors from character sequences
Word2vec Limitations
1 For less frequent terms, vector representations would be noisy. Why?
2 For morphologically complex languages, the word vector representation
would also be noisy. Why?
Solution: FastText
1 Learn vector representations of character n-grams.
2 Obtain vector representation of a whole word by summing the vectors
of the n-grams.
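The subword decomposition can be sketched directly (boundary markers '<' and '>' as in FastText; the word vector is then the sum of the vectors of these n-grams):

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word, including boundary markers."""
    w = f"<{word}>"
    return [w[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]
```

Because morphological variants share most of their n-grams, rare and out-of-vocabulary words still get a sensible representation.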
Outline
1 Information Retrieval: A Primer
2 Introduction to Word Embedding
3 An overview of Word Embedding applications for IR
4 Generalized Language Model
5 Embedding based Relevance Feedback with Kernel Density Estimate
6 Documents as sets of vectors
7 Task specific word vector learning
Outline
1 Information Retrieval: A Primer
2 Introduction to Word Embedding
3 An overview of Word Embedding applications for IR
Problems with the Term Independence Assumption
Word vectors to address term dependence
Multi-modal
4 Generalized Language Model
5 Embedding based Relevance Feedback with Kernel Density Estimate
6 Documents as sets of vectors
Term Dependence: Initial Retrieval
Term association has long been an intriguing problem in IR.
Vocabulary mismatch: different terms may be used in two documents that are about the same topic, e.g. “atomic” and “nuclear”.
Terms used in query are different from those in its relevant
documents.
Standard retrieval models assume term independence.
Term Dependence: Relevance Feedback
Uses statistical co-occurrence of words in top ranked docs with query
terms.
No way to take into account multi-word ‘concepts’, e.g. relating
‘osteoporosis’ to ‘bone disease’ beyond pre-defined phrases.
Noisy expansion terms can lead to ‘query drift’ and hence degraded
IR effectiveness after RF.
Outline
1 Information Retrieval: A Primer
2 Introduction to Word Embedding
3 An overview of Word Embedding applications for IR
Problems with the Term Independence Assumption
Word vectors to address term dependence
Multi-modal
4 Generalized Language Model
5 Embedding based Relevance Feedback with Kernel Density Estimate
6 Documents as sets of vectors
How could Word Embedding help?
Embedded vector space could model the relationship between term
pairs, e.g. ‘atomic’ and ‘nuclear’.
Mainly two approaches: Global and Local Analysis
Global Analysis
Independent of queries.
Done once at indexing time.
Takes into account term co-occurrences to normalize terms.
Examples: Approaches such as LSA, PLSA, LDA etc.
Local Analysis
More focused than global analysis.
Considers a focused subset of documents retrieved in response to a
query.
Can handle term dependence in specific context.
Word Embedding for Relevance Feedback (RF)
Semantic similarity captured by distance measure between word
vectors.
Integrate semantic similarity with statistical co-occurrences between
terms for RF.
Exploit term compositionality to extract meaningful concepts to use
in RF.
Outline
1 Information Retrieval: A Primer
2 Introduction to Word Embedding
3 An overview of Word Embedding applications for IR
Problems with the Term Independence Assumption
Word vectors to address term dependence
Multi-modal
4 Generalized Language Model
5 Embedding based Relevance Feedback with Kernel Density Estimate
6 Documents as sets of vectors
Word Embeddings for Multi-modal IR
Multi-modality: a document may comprise text, images, speech and video, e.g. a typical Wikipedia page.
Given a unimodal (e.g. text/image) query or more generally a
multi-modal query, how can one retrieve relevant multi-modal
documents?
Given a Wiki page as a query, what are the other relevant documents?
Word Embeddings for Multi-modal IR
Index the different modalities separately. Compute similarities
individually and fuse.
Problems: Different retrieval strategies. How to combine the scores?
Vector Embedding Approach: Joint embedding of categorical data,
such as text, and continuous data such as image features into vectors
of reals.
What we need: A similarity function between sets of vectors.
Outline
1 Information Retrieval: A Primer
2 Introduction to Word Embedding
3 An overview of Word Embedding applications for IR
4 Generalized Language Model
5 Embedding based Relevance Feedback with Kernel Density Estimate
6 Documents as sets of vectors
7 Task specific word vector learning
Traditional Language Model
Given:
Collection C: A set of documents
Query q
Documents ranked:
in decreasing order by the posterior probabilities P(d|q)
Traditional Language Model
P(d|q) ∝ P(q|d) · P(d)
= P(d) ∏_{t∈q} P(t|d)
= ∏_{t∈q} [ λ P̂(t|d) + (1 − λ) P̂(t|C) ]   (assuming a uniform document prior P(d))
= ∏_{t∈q} [ λ tf(t, d)/|d| + (1 − λ) cf(t)/cs ]
Query terms generated by sampling independently from either the
document or the collection
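A minimal sketch of this smoothed query likelihood (illustrative only: toy token lists, `lam` playing the role of λ; a production system would score in log space over an inverted index):

```python
import math
from collections import Counter

def lm_score(query, doc, collection, lam=0.7):
    """log P(q|d) = sum_t log( lam * tf(t,d)/|d| + (1-lam) * cf(t)/cs )."""
    tf, dlen = Counter(doc), len(doc)
    cf, cs = Counter(collection), len(collection)
    score = 0.0
    for t in query:
        p = lam * tf[t] / dlen + (1 - lam) * cf[t] / cs
        if p == 0.0:                 # term absent from the entire collection
            return float("-inf")
        score += math.log(p)
    return score
```

Smoothing with the collection model keeps a missing query term from zeroing out the whole product.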
Term Transformation Events
We propose
A generative process in which a noisy channel may transform a term
t′ into a term t
Term Transformation Events
If a term t is observed in query corresponding to a document d, it may
have occured in three possible ways:
Direct term sampling: Standard LM term sampling
Transformation via Document Sampling: Sampling a term t′ from
d which is then transformed to t by a noisy channel
Transformation via Collection Sampling: Sampling the term t′
from the collection which is then transformed to t by the noisy
channel
Term Transformation Events : via Document Sampling
Figure: Schematics of generating a query term t from d
P(t, t′|d) = P(t|t′, d)P(t′|d)
Term Transformation Events : via Document Sampling
P(t, t′|d) = P(t|t′, d) P(t′|d)
P(t|t′, d) = sim(t, t′) / Σ_{t″∈d} sim(t, t″)
P(t, t′|d) = [ sim(t, t′) / Σ_{t″∈d} sim(t, t″) ] · tf(t′, d)/|d|
Term Transformation Events : via Collection Sampling
Figure: Schematics of generating a query term t from the collection
P(t, t′|C) = P(t|t′, C)P(t′|C)
Term Transformation Events : via Collection Sampling
P(t, t′|C) = P(t|t′, C) P(t′|C)
P(t|t′, C) = sim(t, t′) / Σ_{t″∈Nt} sim(t, t″)
P(t, t′|C) = [ sim(t, t′) / Σ_{t″∈Nt} sim(t, t″) ] · cf(t′)/cs
Term Transformation Events
[Diagram: the noisy channel generating a query term t in the Generalized Language Model (GLM): t is sampled directly from d with probability λ, via a transformed term t′ from d with probability α, via a transformed term t′ from the collection with probability β, and directly from C with probability 1 − λ − α − β. GLM degenerates to LM when α = β = 0.]
λ: standard Jelinek-Mercer weighting parameter
α: probability of sampling term t via a transformation through a term t′ sampled from the document d
β: probability of sampling term t via a transformation through a term t′ sampled from the collection
Term Transformation Events : Combined Events
P(t|d) = λ P(t|d) + α Σ_{t′∈d} P(t, t′|d) P(t′) + β Σ_{t′∈Nt} P(t, t′|C) P(t′) + (1 − λ − α − β) P(t|C)
Parameter Details
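A simplified sketch of the combined event probability. Several things here are assumptions of the illustration, not the exact published estimator: the P(t′) priors are folded in as uniform and dropped, `sim` stands in for the (non-negative) embedding similarity, and `neighbours[t]` stands in for the neighbourhood N_t.

```python
from collections import Counter

def glm_prob(t, doc, collection, sim, neighbours,
             lam=0.5, alpha=0.2, beta=0.1):
    """lam * direct sampling + alpha * document-side transformation
    + beta * collection-side transformation
    + (1 - lam - alpha - beta) * collection smoothing."""
    tf, dlen = Counter(doc), len(doc)
    cf, cs = Counter(collection), len(collection)

    direct = tf[t] / dlen
    z_d = sum(sim(t, u) for u in tf) or 1.0        # normaliser over terms of d
    doc_trans = sum(sim(t, u) / z_d * tf[u] / dlen for u in tf)
    nt = neighbours.get(t, [])
    z_c = sum(sim(t, u) for u in nt) or 1.0        # normaliser over N_t
    col_trans = sum(sim(t, u) / z_c * cf[u] / cs for u in nt)
    return (lam * direct + alpha * doc_trans + beta * col_trans
            + (1 - lam - alpha - beta) * cf[t] / cs)
```

Setting alpha = beta = 0 recovers the plain Jelinek-Mercer language model, mirroring the degeneration noted above.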
Generalized Language Model : Evaluation
Topic Set  Method   MAP     GMAP    Recall
TREC-6     LM       0.2148  0.0761  0.4778
           LDA-LM   0.2192  0.0790  0.5333
           GLM      0.2287  0.0956  0.5020
TREC-7     LM       0.1771  0.0706  0.4867
           LDA-LM   0.1631  0.0693  0.4854
           GLM      0.1958  0.0867  0.5021
TREC-8     LM       0.2357  0.1316  0.5895
           LDA-LM   0.2428  0.1471  0.5833
           GLM      0.2503  0.1492  0.6246
Robust     LM       0.2555  0.1290  0.7715
           LDA-LM   0.2623  0.1712  0.8005
           GLM      0.2864  0.1656  0.7967
Table: Comparative performance of LM, LDA and GLM on the TREC query sets.
Parameter Variation Effects
[Plots: MAP as a function of α for several values of β, on (a) TREC-6, (b) TREC-7, (c) TREC-8 and (d) Robust]
Generalized Language Model : Discussion
GLM consistently and significantly outperforms both LM and the LDA-smoothed LM on all query sets.
LDA achieves higher recall on TREC-6 and Robust, but does not significantly increase MAP, indicating no improvement in precision at top ranks.
For GLM, an increase in recall is always accompanied by a significant increase in MAP as well, indicating that precision at top ranks remains relatively stable compared to LDA.
Outline
1 Information Retrieval: A Primer
2 Introduction to Word Embedding
3 An overview of Word Embedding applications for IR
4 Generalized Language Model
5 Embedding based Relevance Feedback with Kernel Density Estimate
6 Documents as sets of vectors
7 Task specific word vector learning
Motivation of this work
The vocabulary mismatch problem is a hindrance to effective search
Standard IR approach: Use relevance feedback along with statistical
co-occurrence of words in top ranked docs with query terms.
Semantic similarity can be captured by distances between word vectors
Research Challenges:
Develop a formal framework to use word embeddings in IR feedback
models
Utilize semantic similarity in conjunction with statistical co-occurrences
in top docs
Relevance based Language Model (RLM)3
Assumes that both the query and the relevant documents are sampled from a latent relevance model R
In the absence of training data, the top-ranked documents are treated as the set of (known) relevant documents
The query serves as the only absolute evidence about the relevance model
The task is to find (estimate) the density function for R
Estimates P(w|R) on the basis of P(w|Q)
3 Lavrenko et al. - SIGIR 2001
The Estimation
The probability of sampling a term w from R, denoted P(w|R), is approximated by

P(w|R) ≈ P(w, Q) / P(Q) ∼ P(w, Q)

where P(w, Q) is the joint probability of observing w along with the query terms, and P(Q) is the query prior probability.
Independent and Identically Distributed (IID) Sampling Assumption
The query words Q = {q1, . . . , qk} and w are sampled from the (pseudo-)relevant documents.
Pseudo-relevant documents =⇒ top ranked documents.

P(w|R) ≈ P(w, Q) = ∑_{D∈M} P(w|D) ∏_{q∈Q} P(q|D)

M : the set of pseudo-relevant documents
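As a minimal sketch, the i.i.d. estimate above can be computed directly from smoothed document language models. The toy documents and the smoothing choice below are illustrative, not from the tutorial:

```python
from collections import Counter

def lm_prob(term, doc, mu=0.5):
    # Simple smoothed unigram document model; the smoothing constant
    # and floor probability are illustrative assumptions.
    tf = Counter(doc)
    return (1 - mu) * tf[term] / len(doc) + mu * 1e-3

def rlm_estimate(w, query, top_docs):
    # P(w|R) ~ sum_{D in M} P(w|D) * prod_{q in Q} P(q|D)
    total = 0.0
    for doc in top_docs:
        p = lm_prob(w, doc)
        for q in query:
            p *= lm_prob(q, doc)
        total += p
    return total

docs = [["cricket", "world", "cup", "final"],
        ["world", "cup", "football"]]
print(rlm_estimate("final", ["world", "cup"], docs) >
      rlm_estimate("banana", ["world", "cup"], docs))  # True
```

Terms that co-occur with the query terms in the top ranked documents receive a higher P(w|R), which is exactly the co-occurrence signal RLM exploits.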
RM3⁴
Drawback of RLM
Does not consider the original query terms when estimating the density function.
RM3
A mixture model of RLM and the query likelihood model:

P′(w|R) = µ P(w|R) + (1 − µ) P(w|Q)

P(w|R) = ∑_{D∈M} P(w|D) ∏_{q∈Q} P(q|D),   P(w|Q) = tf(w, Q) / |Q|

⁴ UMass at TREC 2004: Novelty and HARD - Jaleel et al.
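A minimal sketch of the RM3 interpolation (the P(w|R) values and µ below are hypothetical inputs, e.g. from the RLM step above):

```python
def rm3(p_rlm, query, mu=0.6):
    # P'(w|R) = mu * P(w|R) + (1 - mu) * P(w|Q),
    # with the query model P(w|Q) = tf(w, Q) / |Q|.
    p_q = {w: query.count(w) / len(query) for w in set(query)}
    vocab = set(p_rlm) | set(p_q)
    return {w: mu * p_rlm.get(w, 0.0) + (1 - mu) * p_q.get(w, 0.0)
            for w in vocab}

# hypothetical (normalized) P(w|R) values from the RLM step
scores = rm3({"final": 0.4, "stadium": 0.6}, ["world", "cup", "cup"])
print(round(sum(scores.values()), 6))  # 1.0
```

Since both component distributions sum to one, the interpolated P′(w|R) remains a proper distribution, and the original query terms regain probability mass.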
Kernel Density Estimation (KDE)
Figure: Gaussians centered at the observed data points and their combined density estimate.
Estimate a distribution that generates the given data (data points).
Place a Gaussian centered around each observed data point.
Combine the Gaussians to get a function peaked at the observed data points.

f(x) = (1/(nh)) ∑_{i=1}^{n} K((x − x_i)/h)
KDE with Gaussian Kernel

f(x) = (1/(nh)) ∑_{i=1}^{n} K((x − x_i)/h)

K((x − x_i)/h) = N(x/h | x_i/h, σ²) = (1/(√(2π) σ)) exp(−(1/(2σ²)) ((x − x_i)/h)²)

A normal distribution of the scaled variable x/h, parameterized by mean x_i/h and variance σ².
Substituting the Gaussian kernel into the density estimate:

f(x) = (1/(nh)) ∑_{i=1}^{n} (1/(√(2π) σ)) exp(−(1/(2σ²)) ((x − x_i)/h)²)

h and σ : tunable parameters
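The Gaussian KDE above can be sketched directly from the formula (the 1-D sample points are toy data):

```python
import math

def kde(x, samples, h=1.0, sigma=1.0):
    # f(x) = 1/(n h) * sum_i 1/(sqrt(2 pi) sigma)
    #        * exp(-((x - x_i)/h)^2 / (2 sigma^2))
    n = len(samples)
    return sum(math.exp(-(((x - xi) / h) ** 2) / (2 * sigma ** 2))
               for xi in samples) / (n * h * math.sqrt(2 * math.pi) * sigma)

pts = [1.0, 2.0, 6.0]
print(kde(1.5, pts) > kde(4.0, pts))  # True: density peaks near the data
```

Larger h or σ smooths the estimate; smaller values make it spikier around the observed points.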
In Relevance Feedback Scenario

f(x) = (1/(nh)) ∑_{i=1}^{n} K((x − x_i)/h)

The Points
Data points (x_i): the query terms Q = {q1, . . . , qk}
Sample point (x): w, a candidate expansion term

f(w) = (1/(kh)) ∑_{i=1}^{k} K((w − q_i)/h)
Relevance Feedback with KDE
Figure: density values f(q_2), f(w_1), f(w_2) over a one dimensional projection of the embedded query terms q_1, q_2, q_3 and candidate terms w_1, w_2.
Kernel Density Estimate (Unweighted)

f(w) = (1/(nh)) ∑_{i=1}^{n} K((w − q_i)/h)

n - number of samples
h - the bandwidth
K(·) - kernel function
Kernel Density Estimate (Weighted)

f(w, α) = (1/(nh)) ∑_{i=1}^{n} α_i K((w − q_i)/h)

n - number of samples
h - the bandwidth
K(·) - kernel function
α_i - weight of the local kernel function around the i-th data point
One dimensional KDE

f(w, α) = (1/(kh)) ∑_{i=1}^{k} α_i (1/(√(2π) σ)) exp(−(w − q_i)² / (2σ²h²))

α_i : the weight of the i-th query term
One dimensional KDE
The vector embedding of the words of the collection vocabulary makes it possible to define and compute (w − q_i)² as a distance measure between the two word vectors, e.g. the squared Euclidean distance:

(w − q_i)² = (w⃗ − q⃗_i)^T (w⃗ − q⃗_i)
One dimensional KDE

f(w, α) = ∑_{i=1}^{k} P(w|M) P(q_i|M) (1/(σ√(2π))) exp(−(w⃗ − q⃗_i)^T (w⃗ − q⃗_i) / (2σ²h²))

α_i : P(w|M) P(q_i|M)
M : the set of terms in the union of the M top ranked documents
Two dimensional KDE
In 1D-KDE, the set of all top ranked documents is considered as a single document model (M).
We now generalize the 1D-KDE to two dimensions so as to weight the contributions obtained from different documents in the ranked list separately.
Two dimensional KDE
Figure: query terms q_1, q_2, q_3 (x-axis) plotted against feedback documents D_1, D_2, D_3 (y-axis).
The x-axis corresponds to the query term vectors, as in the one dimensional KDE.
The y-axis represents the normalized term frequency of the query terms in the respective feedback documents.
Two dimensional KDE
x_ij = (q_i, D_j): encapsulates the word vector of the i-th query word q_i and its normalized term frequency in feedback document D_j (P(q_i|D_j)).
Two dimensional KDE

f(x, α) = (1/(kMh²)) ∑_{i=1}^{k} ∑_{j=1}^{M} α_ij K((x − x_ij)/h)
        ∼ ∑_{i=1}^{k} ∑_{j=1}^{M} α_ij N(x/h | x_ij/h, σ²)
        = ∑_{i=1}^{k} ∑_{j=1}^{M} α_ij (1/(2πσ²)) exp(−(1/(2σ²h²)) (x − x_ij)^T (x − x_ij))
        = ∑_{i=1}^{k} ∑_{j=1}^{M} (α_ij/(2πσ²)) exp(((w⃗ − q⃗_i)² + (P(w|D_j) − P(q_i|D_j))²) / (−2σ²h²))
Two dimensional KDE

f(x, α) = ∑_{i=1}^{k} ∑_{j=1}^{M} (P(w|D_j) P(q_i|D_j) / (2πσ²)) exp(((w⃗ − q⃗_i)² + (P(w|D_j) − P(q_i|D_j))²) / (−2σ²h²))

α_ij : P(w|D_j) P(q_i|D_j)
A term w in document D_j, denoted x, gets a high value of the density function if:
(i) w⃗ is close to the query term vectors q⃗_i
(ii) w frequently co-occurs with the query terms in each top ranked document D_j
Vector Addition for Compositionality

⃗German + ⃗Airlines ∼ ⃗Lufthansa

Achieved by performing an addition operation over the embedded vectors.
Term Composition in KDE
Extending the set of pivot points with composed vectors:
(a) Without composition: pivots q_1 and q_2 only
(b) With composition: the composed vector q_1 + q_2 also acts as a pivot for a candidate term w
Importance of Composition
"ISI" on its own is ambiguous: Inter Services Intelligence, Infrared Spatial Interferometer, Indian Statistical Institute, Institute for Scientific Interchange, Islamic State of Iraq, Integral Satcom Initiative, Ice Skating Institute, Islamic School of Irving.
Importance of Composition
Composing the terms "ISI Kolkata" narrows the candidate senses down to the Indian Statistical Institute.
Summary
Given:
Corpus : a set of documents
Query Q : q_1, . . . , q_k
First level retrieval is performed using LM.
ET : all terms of the top ranked M documents, considered as potential expansion terms.
Calculate KDE(w) for each w ∈ ET.
Terms with high estimated values are taken as expansion terms.
Linearly interpolate the derived density function with the underlying query model.
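The expansion-term scoring step can be sketched as follows. The 2-D "embeddings" and feedback-model probabilities below are toy, hypothetical values; in the actual model the vectors come from word2vec and the probabilities from the top ranked documents:

```python
import math

def kde_score(w_vec, p_w, q_vecs, p_qs, h=1.0, sigma=1.0):
    # Weighted 1-D KDE over embedded vectors:
    # f(w) = sum_i P(w|M) P(q_i|M) * Gaussian(dist(w, q_i); sigma, h)
    score = 0.0
    for q_vec, p_q in zip(q_vecs, p_qs):
        d2 = sum((a - b) ** 2 for a, b in zip(w_vec, q_vec))
        score += (p_w * p_q
                  * math.exp(-d2 / (2 * sigma ** 2 * h ** 2))
                  / (sigma * math.sqrt(2 * math.pi)))
    return score

# toy query-term vectors and their probabilities P(q_i|M)
q_vecs, p_qs = [(0.0, 0.0), (1.0, 0.0)], [0.1, 0.1]
near = kde_score((0.5, 0.1), 0.05, q_vecs, p_qs)  # embedded near the query
far = kde_score((5.0, 5.0), 0.05, q_vecs, p_qs)   # embedded far away
print(near > far)  # True
```

Ranking all w ∈ ET by this score and keeping the top few terms yields the expansion terms to interpolate into the query model.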
Evaluation : TREC-678, Robust

Dataset   Method   M   N    MAP       Recall    P@5
TREC 6    RM3      20  70   0.2634    0.5368    0.4360
          1d KDE   10  80   0.2818†   0.5757†   0.4680†
          2d KDE   10  80   0.2850†   0.5724†   0.4600†
TREC 7    RM3      20  70   0.2151    0.5432    0.4160
          1d KDE   10  80   0.2351†   0.6001†   0.4425†
          2d KDE   10  80   0.2380    0.6108†‡  0.4400
TREC 8    RM3      20  70   0.2701    0.6410    0.4760
          1d KDE   10  80   0.2746    0.6749†   0.4888
          2d KDE   10  80   0.2957†‡  0.6887†   0.5120†‡
TREC Rb   RM3      20  70   0.3304    0.8559    0.4949
          1d KDE   10  80   0.3228    0.8725    0.4929
          2d KDE   10  80   0.3456†‡  0.8772†‡  0.5152†‡

Table: † and ‡ denote significance with respect to RLM and 1d KDE respectively.
Evaluation : WT10G

Dataset   Method   M   N    MAP       Recall   P@5
TREC 9    RM3      20  70   0.1930    0.6755   0.3233
          1d KDE   10  80   0.1984    0.6851   0.3360
          2d KDE   10  80   0.2145†‡  0.6878   0.3562†‡
TREC 10   RM3      20  70   0.1759    0.7386   0.3347
          1d KDE   10  80   0.2192†   0.7499   0.4004†
          2d KDE   10  80   0.2213†   0.7502   0.4204†

Table: † and ‡ denote significance with respect to RLM and 1d KDE respectively.
Contributions
Integration of semantic information with co-occurrence statistics
Integration of Kernel Density Estimation in relevance feedback
Incorporation of word compositionality in the model
Word Vector Compositionality based Relevance Feedback using Kernel Density Estimation (CIKM 2016)
Outline
1 Information Retrieval: A Primer
2 Introduction to Word Embedding
3 An overview of Word Embedding applications for IR
4 Generalized Language Model
5 Embedding based Relevance Feedback with Kernel Density Estimate
6 Documents as sets of vectors
Motivation
Multi-modal and cross-modal IR
7 Task specific word vector learning
Documents as term vectors
Terms as dimensions of a document vector (forming an inner product space).
The inner product d · q gives the similarity between the document and the query.
Figure: Sample three term space.
Documents as sets of word embedded vectors
Each document: a set of real-valued vectors in p dimensions, D = {x_i}_{i=1}^{|D|}, x_i ∈ R^p.
Need: generalized distance (inverse similarity) measures d(X, Y), where X, Y are sets of vectors, which satisfy
d(X, X) = 0,  d(X, Y) = d(Y, X),  d(X, Y) + d(Y, Z) ≥ d(X, Z).
Documents as sets of word embedded vectors
Two distance metrics investigated:

Average inter-distance: d(X, Y) = (1/(|X||Y|)) ∑_{x∈X} ∑_{y∈Y} d(x, y), where d(x, y) is the L2 (Euclidean) distance between vectors x and y.

Hausdorff distance: d(X, Y) = max( max_{x∈X} min_{y∈Y} d(x, y), max_{y∈Y} min_{x∈X} d(x, y) )
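Both set distances can be sketched directly from the definitions (pure Python 3.8+, toy 2-D vectors):

```python
import math

def euclid(x, y):
    # L2 distance between two vectors
    return math.dist(x, y)

def avg_distance(X, Y):
    # average inter-distance over all cross-set pairs
    return sum(euclid(x, y) for x in X for y in Y) / (len(X) * len(Y))

def hausdorff(X, Y):
    # max of the two directed nearest-neighbour distances
    d_xy = max(min(euclid(x, y) for y in Y) for x in X)
    d_yx = max(min(euclid(x, y) for x in X) for y in Y)
    return max(d_xy, d_yx)

X = [(0.0, 0.0), (1.0, 0.0)]
Y = [(0.0, 1.0), (1.0, 1.0)]
print(hausdorff(X, Y))  # 1.0
```

The average inter-distance smooths over all word pairs, whereas the Hausdorff distance is driven by the single worst-matched word, making it more sensitive to topical outliers.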
Illustrative Examples
Figure: Two example scenarios of single-topical documents, where the document
on the left has a higher similarity to the query than the document on the right.
Illustrative Examples
Figure: Two example scenarios where documents are multi-topical, i.e. K-means
clustering shows 4 distinct clusters. Document on the right is more similar to the
query.
Method Details
A document is treated as a mixture model of Gaussians of the observed constituent words.
A query is treated as the observed points drawn from the underlying mixture distribution of a document.
The query likelihood is then given by the probability of sampling the observed query points from the mixture distribution.

sim(q, d) = (1/(K|q|)) ∑_i ∑_k q_i · µ_k   (2)

This is combined with the text based query likelihood (language model based) to obtain the final query likelihood.

P(d|q) = α P_LM(d|q) + (1 − α) P_WVEC(d|q)   (3)
Practical Considerations for Implementation
Individually estimating a Gaussian mixture model for each document is time consuming and slows down the indexing process.
Solution: cluster the entire vocabulary with an EM based clustering algorithm such as K-means.
Each term is thus mapped to a cluster id.
Induce the per-document clusters by grouping together the words in a document that share the same cluster id, and find the centre µ_k of each group C_k:

µ_k = (1/|C_k|) ∑_{x∈C_k} x,   C_k = {x_i : c(w_i) = k},   i = 1, . . . , |d|   (4)
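The per-document centroid induction of Eq. (4) can be sketched as follows (the toy vectors and cluster ids are purely illustrative; in practice they come from word2vec and a global K-means pass over the vocabulary):

```python
from collections import defaultdict

def doc_centroids(doc_terms, word_vecs, word_cluster):
    # Group the document's words by their global cluster id and
    # average the vectors in each group (Eq. 4).
    groups = defaultdict(list)
    for w in doc_terms:
        groups[word_cluster[w]].append(word_vecs[w])
    cents = {}
    for k, vecs in groups.items():
        dim = len(vecs[0])
        cents[k] = tuple(sum(v[i] for v in vecs) / len(vecs)
                         for i in range(dim))
    return cents

vecs = {"cat": (1.0, 0.0), "dog": (0.0, 1.0), "car": (4.0, 4.0)}
cl = {"cat": 0, "dog": 0, "car": 1}
print(doc_centroids(["cat", "dog", "car"], vecs, cl))
# {0: (0.5, 0.5), 1: (4.0, 4.0)}
```

Because the expensive clustering is done once over the whole vocabulary, each document only needs this cheap grouping-and-averaging pass at indexing time.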
Results

Dataset  Method                   Clustered  #clusters  α    MAP      GMAP    Recall  P@5
TREC-6   LM                       n/a        n/a        n/a  0.2303   0.0875  0.5011  0.3920
         LM+wvecsim_one_cluster   yes        1          0.4  0.2355   0.0918  0.5058  0.3920
         LM+wvecsim_no_cluster    no         n/a        0.4  0.2259   0.0827  0.5000  0.3600
         LM+wvecsim_kmeans        yes        100        0.4  0.2345   0.0906  0.5027  0.4040
TREC-7   LM                       n/a        n/a        n/a  0.1750   0.0828  0.4803  0.4080
         LM+wvecsim_one_cluster   yes        1          0.4  0.1773   0.0851  0.4897  0.3960
         LM+wvecsim_no_cluster    no         n/a        0.4  0.1664   0.0803  0.4863  0.3640
         LM+wvecsim_kmeans        yes        100        0.4  0.1756   0.0874  0.4916  0.3840
TREC-8   LM                       n/a        n/a        n/a  0.2466   0.1318  0.5835  0.4320
         LM+wvecsim_one_cluster   yes        1          0.4  0.2541†  0.1465  0.6017  0.4440
         LM+wvecsim_no_cluster    no         n/a        0.4  0.2473   0.1396  0.5994  0.4520
         LM+wvecsim_kmeans        yes        100        0.4  0.2558†  0.1468  0.6017  0.4720
Robust   LM                       n/a        n/a        n/a  0.2651   0.1710  0.7803  0.4424
         LM+wvecsim_one_cluster   yes        1          0.4  0.2690   0.1701  0.7905  0.4465
         LM+wvecsim_no_cluster    no         n/a        0.4  0.2642   0.1646  0.7900  0.4485
         LM+wvecsim_kmeans        yes        100        0.4  0.2804†  0.1819  0.8010  0.4687

Table: Results of set-based word vector similarities with different settings. K: #clusters, α: weight of the text based query likelihood.
Observations
Results with word vector based similarities outperform purely text based ones.
K = 100 produces the best results for the TREC 8 and TREC Robust topic sets.
The improvements are consistent in both recall and precision at top ranks.
A very fine-grained representation of documents (each constituent word as its own cluster) is not optimal.
Somewhat surprisingly, K = 1, i.e. each document represented by a single point (the average of all its words), produces results close to those of K = 100.
Embedded Vector based Multi-modal IR
Use joint embeddings of text and other data types (e.g. images) to automatically augment text documents with semantically related 'vectors'.
Example: for a given text document, enhance its representative content (for the purpose of more effective search) by augmenting relevant images from Wikimedia (the Wikipedia image collection).
Embedded Vector based Cross-modal and Cross-lingual IR
Joint embeddings of vectors can be used for cross-lingual search.
Individual word embeddings for different languages can be aligned using parallel corpora.
Document-query similarity can then be measured in this embedded vector space.
Joint embeddings can also be used for cross-modal information access, e.g. searching for text documents with an image query, searching for speech/video with a text query, and so on.
Joint Embedding of Multi-modal data
Data from different modalities need to be embedded in a joint space. Why?
DeViSE (Deep Visual Semantic Embedding)
1 Apply a CNN to images; take the vector from layer f6 or f7.
2 Apply word2vec to obtain pre-trained word vectors.
3 Learn a transformation from the image space to the word space.
DeViSE Transformation
Use existing image captions to learn the transformation.
Captions are manually generated and expected to be good descriptors of the images.

loss(image, label; θ) = ∑_{j≠label} max[0, M − t⃗_label · θ · ⃗image + t⃗_j · θ · ⃗image]
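A minimal sketch of this hinge rank loss (toy 2-D vectors; the identity matrix stands in for the learned transformation θ, and the margin value is an illustrative assumption):

```python
def hinge_rank_loss(img_vec, label_vec, other_vecs, theta, margin=0.1):
    # loss = sum_{j != label} max(0, M - t_label^T (theta v) + t_j^T (theta v))
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    proj = [dot(row, img_vec) for row in theta]  # theta * image
    pos = dot(label_vec, proj)                   # score of the true label
    return sum(max(0.0, margin - pos + dot(t, proj))
               for t in other_vecs)              # penalize close wrong labels

theta = [(1.0, 0.0), (0.0, 1.0)]  # identity "transformation" (toy)
loss = hinge_rank_loss((1.0, 0.0), (1.0, 0.0), [(0.0, 1.0)], theta)
print(loss)  # 0.0
```

The loss is zero once the projected image scores at least `margin` higher against its true label vector than against every other label vector, which is exactly the ranking constraint DeViSE trains for.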
Outline
1 Information Retrieval: A Primer
2 Introduction to Word Embedding
3 An overview of Word Embedding applications for IR
4 Generalized Language Model
5 Embedding based Relevance Feedback with Kernel Density Estimate
6 Documents as sets of vectors
7 Task specific word vector learning
Microblog Retrieval
Problems for short documents
Short documents lack sufficient context for effective training of the RNN.
If document boundaries are ignored, the amount of context increases, but that context is often noisy.
Context driven word2vec training
Given a set S of contextually similar word vector pairs, Φ(w) = {v : (w, v) ∈ S}.
Transform the vector space so that, for each w, the vectors v ∈ Φ(w) become closer to the transformed w⃗ than the vectors u ∉ Φ(w).
Transformation loss function:

l(w⃗; θ) = ∑_{v∈Φ(w)} ∑_{u∉Φ(w)} max(0, m − ((θw⃗)^T v⃗ − (θw⃗)^T u⃗))

Similar to the image-word training. Why?
Context driven word2vec training
The context S depends on the application.
For Microblog retrieval:
1 Other topically similar tweets.
2 Tweets within a time period - tweets are often associated with
temporal bursts.
For Session Modeling:
1 Queries that are close in time by the same user.
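Building the context set S from temporal closeness (the microblog case above) can be sketched as follows; the timestamped tweets and the one-hour window are illustrative assumptions:

```python
from itertools import combinations

def context_pairs(tweets, window=3600):
    # Build the set S of contextually similar word pairs from tweets
    # whose timestamps fall within the same temporal window (burst).
    S = set()
    for (t1, ws1), (t2, ws2) in combinations(tweets, 2):
        if abs(t1 - t2) <= window:
            for w in ws1:
                for v in ws2:
                    if w != v:
                        S.add((w, v))
    return S

# (timestamp in seconds, tokens) - toy data
tweets = [(0, ["quake", "chile"]), (100, ["tsunami", "alert"]),
          (90000, ["football"])]
S = context_pairs(tweets)
print(("quake", "tsunami") in S, ("quake", "football") in S)  # True False
```

The resulting pairs (w, v) ∈ S feed directly into Φ(w) of the transformation loss above; for session modeling, the same construction applies with temporally close queries from the same user in place of bursty tweets.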
5G and 6G refer to generations of mobile network technology, each representin...5G and 6G refer to generations of mobile network technology, each representin...
5G and 6G refer to generations of mobile network technology, each representin...archanaece3
 
Developing a smart system for infant incubators using the internet of things ...
Developing a smart system for infant incubators using the internet of things ...Developing a smart system for infant incubators using the internet of things ...
Developing a smart system for infant incubators using the internet of things ...IJECEIAES
 
SLIDESHARE PPT-DECISION MAKING METHODS.pptx
SLIDESHARE PPT-DECISION MAKING METHODS.pptxSLIDESHARE PPT-DECISION MAKING METHODS.pptx
SLIDESHARE PPT-DECISION MAKING METHODS.pptxCHAIRMAN M
 
Tembisa Central Terminating Pills +27838792658 PHOMOLONG Top Abortion Pills F...
Tembisa Central Terminating Pills +27838792658 PHOMOLONG Top Abortion Pills F...Tembisa Central Terminating Pills +27838792658 PHOMOLONG Top Abortion Pills F...
Tembisa Central Terminating Pills +27838792658 PHOMOLONG Top Abortion Pills F...drjose256
 
Insurance management system project report.pdf
Insurance management system project report.pdfInsurance management system project report.pdf
Insurance management system project report.pdfKamal Acharya
 
CLOUD COMPUTING SERVICES - Cloud Reference Modal
CLOUD COMPUTING SERVICES - Cloud Reference ModalCLOUD COMPUTING SERVICES - Cloud Reference Modal
CLOUD COMPUTING SERVICES - Cloud Reference ModalSwarnaSLcse
 
Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)Ramkumar k
 
Adsorption (mass transfer operations 2) ppt
Adsorption (mass transfer operations 2) pptAdsorption (mass transfer operations 2) ppt
Adsorption (mass transfer operations 2) pptjigup7320
 
Raashid final report on Embedded Systems
Raashid final report on Embedded SystemsRaashid final report on Embedded Systems
Raashid final report on Embedded SystemsRaashidFaiyazSheikh
 
Artificial intelligence presentation2-171219131633.pdf
Artificial intelligence presentation2-171219131633.pdfArtificial intelligence presentation2-171219131633.pdf
Artificial intelligence presentation2-171219131633.pdfKira Dess
 

Recently uploaded (20)

21scheme vtu syllabus of visveraya technological university
21scheme vtu syllabus of visveraya technological university21scheme vtu syllabus of visveraya technological university
21scheme vtu syllabus of visveraya technological university
 
NO1 Best Powerful Vashikaran Specialist Baba Vashikaran Specialist For Love V...
NO1 Best Powerful Vashikaran Specialist Baba Vashikaran Specialist For Love V...NO1 Best Powerful Vashikaran Specialist Baba Vashikaran Specialist For Love V...
NO1 Best Powerful Vashikaran Specialist Baba Vashikaran Specialist For Love V...
 
Instruct Nirmaana 24-Smart and Lean Construction Through Technology.pdf
Instruct Nirmaana 24-Smart and Lean Construction Through Technology.pdfInstruct Nirmaana 24-Smart and Lean Construction Through Technology.pdf
Instruct Nirmaana 24-Smart and Lean Construction Through Technology.pdf
 
Software Engineering Practical File Front Pages.pdf
Software Engineering Practical File Front Pages.pdfSoftware Engineering Practical File Front Pages.pdf
Software Engineering Practical File Front Pages.pdf
 
Artificial Intelligence in due diligence
Artificial Intelligence in due diligenceArtificial Intelligence in due diligence
Artificial Intelligence in due diligence
 
handbook on reinforce concrete and detailing
handbook on reinforce concrete and detailinghandbook on reinforce concrete and detailing
handbook on reinforce concrete and detailing
 
History of Indian Railways - the story of Growth & Modernization
History of Indian Railways - the story of Growth & ModernizationHistory of Indian Railways - the story of Growth & Modernization
History of Indian Railways - the story of Growth & Modernization
 
Final DBMS Manual (2).pdf final lab manual
Final DBMS Manual (2).pdf final lab manualFinal DBMS Manual (2).pdf final lab manual
Final DBMS Manual (2).pdf final lab manual
 
Seizure stage detection of epileptic seizure using convolutional neural networks
Seizure stage detection of epileptic seizure using convolutional neural networksSeizure stage detection of epileptic seizure using convolutional neural networks
Seizure stage detection of epileptic seizure using convolutional neural networks
 
What is Coordinate Measuring Machine? CMM Types, Features, Functions
What is Coordinate Measuring Machine? CMM Types, Features, FunctionsWhat is Coordinate Measuring Machine? CMM Types, Features, Functions
What is Coordinate Measuring Machine? CMM Types, Features, Functions
 
5G and 6G refer to generations of mobile network technology, each representin...
5G and 6G refer to generations of mobile network technology, each representin...5G and 6G refer to generations of mobile network technology, each representin...
5G and 6G refer to generations of mobile network technology, each representin...
 
Developing a smart system for infant incubators using the internet of things ...
Developing a smart system for infant incubators using the internet of things ...Developing a smart system for infant incubators using the internet of things ...
Developing a smart system for infant incubators using the internet of things ...
 
SLIDESHARE PPT-DECISION MAKING METHODS.pptx
SLIDESHARE PPT-DECISION MAKING METHODS.pptxSLIDESHARE PPT-DECISION MAKING METHODS.pptx
SLIDESHARE PPT-DECISION MAKING METHODS.pptx
 
Tembisa Central Terminating Pills +27838792658 PHOMOLONG Top Abortion Pills F...
Tembisa Central Terminating Pills +27838792658 PHOMOLONG Top Abortion Pills F...Tembisa Central Terminating Pills +27838792658 PHOMOLONG Top Abortion Pills F...
Tembisa Central Terminating Pills +27838792658 PHOMOLONG Top Abortion Pills F...
 
Insurance management system project report.pdf
Insurance management system project report.pdfInsurance management system project report.pdf
Insurance management system project report.pdf
 
CLOUD COMPUTING SERVICES - Cloud Reference Modal
CLOUD COMPUTING SERVICES - Cloud Reference ModalCLOUD COMPUTING SERVICES - Cloud Reference Modal
CLOUD COMPUTING SERVICES - Cloud Reference Modal
 
Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)
 
Adsorption (mass transfer operations 2) ppt
Adsorption (mass transfer operations 2) pptAdsorption (mass transfer operations 2) ppt
Adsorption (mass transfer operations 2) ppt
 
Raashid final report on Embedded Systems
Raashid final report on Embedded SystemsRaashid final report on Embedded Systems
Raashid final report on Embedded Systems
 
Artificial intelligence presentation2-171219131633.pdf
Artificial intelligence presentation2-171219131633.pdfArtificial intelligence presentation2-171219131633.pdf
Artificial intelligence presentation2-171219131633.pdf
 

ICON 2017 tutorial : Word Embedding in IR: Present and Beyond