Outline
1 Information Retrieval: A Primer
Bag of Words Representation
Vector space model
Language Model
Pseudo-Relevance Feedback and Query Expansion
IR Evaluation
2 Introduction to Word Embedding
3 An overview of Word Embedding applications for IR
4 Generalized Language Model
5 Embedding based Relevance Feedback with Kernel Density Estimate
6 Documents as sets of vectors
Boolean model
Keywords combined using and, or, (and) not
e.g. (medicine or treatment) and (hypertension or “high blood pressure”)
Efficient and easy to implement (list merging)
and ≡ intersection
or ≡ union
Drawbacks
or — one match as good as many
and — one miss as bad as all
no ranking
queries may be difficult to formulate
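As a concrete illustration of the list-merging point above, here is a minimal sketch (the posting lists and docIDs are hypothetical, purely for illustration):

```python
def intersect(p1, p2):
    """AND: walk two sorted posting lists, keep docIDs present in both."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i]); i += 1; j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out

def union(p1, p2):
    """OR: merge two sorted posting lists, keeping every docID once."""
    return sorted(set(p1) | set(p2))

# (medicine OR treatment) AND (hypertension OR "high blood pressure")
postings = {                      # hypothetical index
    "medicine": [1, 4, 7, 9],
    "treatment": [2, 4, 9, 12],
    "hypertension": [4, 5, 9],
    "high blood pressure": [3, 9],
}
left = union(postings["medicine"], postings["treatment"])
right = union(postings["hypertension"], postings["high blood pressure"])
print(intersect(left, right))     # -> [4, 9]
```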
Outline
1 Information Retrieval: A Primer
Bag of Words Representation
Vector space model
Language Model
Pseudo-Relevance Feedback and Query Expansion
IR Evaluation
2 Introduction to Word Embedding
3 An overview of Word Embedding applications for IR
4 Generalized Language Model
5 Embedding based Relevance Feedback with Kernel Density Estimate
6 Documents as sets of vectors
Vector space model
Any text item (“document”) is represented as list of terms and
associated weights.
D = (⟨t_1, w_1⟩, …, ⟨t_n, w_n⟩)
Term = keywords or content-descriptors
Weight = measure of the importance of a term in representing the
information contained in the document
Term weights
Term frequency (tf): repeated words are strongly related to content
Inverse document frequency (idf): uncommon term is more important
Example: medicine vs. antibiotic
Normalization by document length
long docs. contain many distinct words.
long docs. contain same word many times.
term-weights for long documents should be reduced.
use # bytes, # distinct words, Euclidean length, etc.
Weight = tf × idf / normalization (a small weighting sketch follows below)
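A minimal sketch of such a tf × idf / normalization weight (illustrative only; real systems use many variants of each component):

```python
import math

def tf_idf_weight(tf, df, N, doc_len, avg_doc_len):
    """Illustrative tf-idf weight with simple length normalization.
    tf: term frequency in the document, df: document frequency of the term,
    N: number of documents, doc_len/avg_doc_len: document length statistics."""
    tf_part = 1.0 + math.log(tf) if tf > 0 else 0.0   # dampened term frequency
    idf_part = math.log(N / df)                        # rarer terms weigh more
    norm = doc_len / avg_doc_len                       # discount long documents
    return tf_part * idf_part / norm

print(tf_idf_weight(tf=3, df=100, N=100000, doc_len=1200, avg_doc_len=800))
```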
Term weights: commonly used weighting schemes
Pivoted normalization [Singhal et al., SIGIR 96]:

w(t, d) = \frac{\frac{1 + \log(tf)}{1 + \log(\text{average } tf)} \times \log\frac{N}{df}}{(1.0 - slope) \times pivot + slope \times \#\text{unique terms}}

BM25 (probabilistic model) [Robertson and Zaragoza, FTIR 2009]:

w(t, d) = \frac{tf \times \log\frac{N - df + 0.5}{df + 0.5}}{k_1\left((1 - b) + b\,\frac{dl}{avdl}\right) + tf}

(A BM25 sketch follows below.)
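A minimal sketch of BM25 scoring following the formula above (the k1 and b values are common defaults, not prescribed by the slide; all statistics are hypothetical):

```python
import math

def bm25_weight(tf, df, N, dl, avdl, k1=1.2, b=0.75):
    """BM25 term weight: tf saturation, idf, and document-length normalization."""
    idf = math.log((N - df + 0.5) / (df + 0.5))
    norm = k1 * ((1 - b) + b * dl / avdl)
    return tf * idf / (norm + tf)

def bm25_score(query_terms, doc_tf, df, N, dl, avdl):
    """Score a document: sum BM25 weights over the query terms it contains."""
    return sum(bm25_weight(doc_tf[t], df[t], N, dl, avdl)
               for t in query_terms if t in doc_tf)

print(bm25_score(["space", "satellite"],
                 doc_tf={"space": 3, "satellite": 1},
                 df={"space": 12000, "satellite": 900},
                 N=500000, dl=430, avdl=350))
```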
Outline
1 Information Retrieval: A Primer
Bag of Words Representation
Vector space model
Language Model
Pseudo-Relevance Feedback and Query Expansion
IR Evaluation
2 Introduction to Word Embedding
3 An overview of Word Embedding applications for IR
4 Generalized Language Model
5 Embedding based Relevance Feedback with Kernel Density Estimate
6 Documents as sets of vectors
Converting posteriors to priors during indexing
P(D|Q) is not known during indexing. Why?
Terms are known (from the document content).
Estimate the sampling probability of each term from a document.
P(t|D) = \frac{tf(t, D)}{\sum_{t' \in D} tf(t', D)}   (Maximum Likelihood)

P(D|Q) \propto P(Q|D) \cdot P(D)

P(D|Q) \propto P(D) \prod_{q \in Q} P(q|D)
Smoothing
Problem with Maximum likelihood?
If a query term is absent in a document, the probability becomes zero.
Too harsh.
Consider a complementary event that a term can be generated from
the collection.
Throw a coin. If ‘Heads’ then sample the term t from document D
else if ‘Tails’ sample t from collection.
Let λ be the probability of ‘Heads’. Probability of ‘Tails’ is (1 − λ).
P(t|D) = \lambda \frac{tf(t, D)}{\sum_{t' \in D} tf(t', D)} + (1 - \lambda) P(t|coll)

P(t|coll) = \frac{cf(t)}{\sum_{t' \in V} cf(t')}

(A smoothed query-likelihood sketch follows below.)
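A minimal sketch of Jelinek-Mercer smoothed query likelihood built from the two estimates above (toy counts, illustrative only):

```python
import math

def p_term_given_doc(t, doc_tf, coll_cf, coll_size, lam=0.7):
    """Jelinek-Mercer smoothing: mix the document MLE with the collection model."""
    doc_len = sum(doc_tf.values())
    p_doc = doc_tf.get(t, 0) / doc_len if doc_len else 0.0
    p_coll = coll_cf.get(t, 0) / coll_size
    return lam * p_doc + (1 - lam) * p_coll

def query_log_likelihood(query, doc_tf, coll_cf, coll_size, lam=0.7):
    """log P(Q|D) = sum over query terms of log P(q|D); rank documents by this."""
    return sum(math.log(p_term_given_doc(q, doc_tf, coll_cf, coll_size, lam))
               for q in query)

doc = {"space": 2, "satellite": 1, "launch": 1}
coll = {"space": 5000, "satellite": 800, "launch": 1200, "budget": 3000}
print(query_log_likelihood(["space", "satellite"], doc, coll, coll_size=10_000))
```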
Outline
1 Information Retrieval: A Primer
Bag of Words Representation
Vector space model
Language Model
Pseudo-Relevance Feedback and Query Expansion
IR Evaluation
2 Introduction to Word Embedding
3 An overview of Word Embedding applications for IR
4 Generalized Language Model
5 Embedding based Relevance Feedback with Kernel Density Estimate
6 Documents as sets of vectors
Motivation
Searching depends on matching keywords between user-query and
document.
Searchers and document creators may use different keywords to
denote same “concept”.
Vocabulary mismatch =⇒ poor retrieval quality.
Users may not know what they are looking for, but they’ll know when
they see it.
Effective query formulation may be difficult; simplify the problem
through iteration.
Boost recall: “find me similar documents...”
Solution: expand the query by adding related words/phrases.
Issues:
select which terms to add to query
calculate weights for added terms
Relevance Feedback: Basic Idea
User issues a query.
Search engine returns a set of documents.
User marks some documents as relevant, some as non-relevant.
Search engine uses this feedback information, together with collection
statistics to formulate a better representation of the information need.
Search engine performs retrieval with the reformulated query.
The engine returns a new set of documents, with (hopefully) better recall.
Can iterate this: several rounds of relevance feedback.
Relevance Feedback: Example 2
TREC 2 Topic - 113 : New Space Satellite Applications
Rank  Score  Document title (✓ = marked relevant by the user)
✓ 1   0.539  NASA Hasn't Scrapped Imaging Spectrometer
✓ 2   0.533  NASA Scratches Environment Gear From Satellite Plan
  3   0.528  Science Panel Backs NASA Satellite Plan, But Urges Launches of Smaller Probes
  4   0.526  A NASA Satellite Project Accomplishes Incredible Feat: Staying Within Budget
  5   0.525  Scientist Who Exposed Global Warming Proposes Satellites for Climate Research
  6   0.524  Report Provides Support for the Critics of Using Big Satellites to Study Climate
  7   0.516  Arianespace Receives Satellite Launch Pact From Telesat Canada
✓ 8   0.509  Telecommunications Tale of Two Companies
Relevance Feedback: Example 2
Initial query: new space satellite applications
Expanded query after relevance feedback
Weight   Term          Weight   Term
 2.074   new           15.106   space
30.816   satellite      5.660   application
 5.991   nasa           5.196   eos
 4.196   launch         3.972   aster
 3.516   instrument     3.446   arianespace
 3.004   bundespost     2.806   ss
 2.790   rocket         2.053   scientist
 2.003   broadcast      1.172   earth
 0.836   oil            0.646   measure
Relevance Feedback: Example 2
TREC 2 Topic - 113 : New Space Satellite Applications
Old rank  Rank  Score  Document title
   2        1   0.513  NASA Scratches Environment Gear From Satellite Plan
   1        2   0.500  NASA Hasn't Scrapped Imaging Spectrometer
            3   0.493  When the Pentagon Launches a Secret Satellite, Space Sleuths Do Some Spy Work of Their Own
            4   0.493  NASA Uses 'Warm' Superconductors For Fast Circuit
   8        5   0.492  Telecommunications Tale of Two Companies
            6   0.491  Soviets May Adapt Parts of SS-20 Missile For Commercial Use
            7   0.490  Gaping Gap: Pentagon Lags in Race To Match the Soviets In Rocket Launchers
            8   0.490  Rescue of Satellite By Space Agency To Cost $90 Million
Key concept for relevance feedback: Centroid
The centroid is the center of mass of a set of points.
Recall that we represent documents as points in a high-dimensional
space.
Thus: we can compute centroids of documents.
The centroid of a set of documents D:
\vec{\mu}(D) = \frac{1}{|D|} \sum_{d \in D} \vec{d}
Rocchio’s Algorithm for Relevance Feedback
Standard algorithm used for relevance feedback (SMART, ’70s).
Integrates a measure of relevance feedback into Vector Space Model.
Maximize the difference between average similarities for relevant and
non-relevant documents.
Rocchio’s Algorithm for Relevance Feedback
Chooses the optimal query ⃗Qopt that maximizes:
\vec{Q}_{opt} = \arg\max_{\vec{Q}} \left[ sim(\vec{Q}, \mu(D_R)) - sim(\vec{Q}, \mu(D_{NR})) \right]

D_R: set of relevant documents
D_{NR}: set of non-relevant documents
\mu(D_X): centroid of a set of documents X

That is, find the vector \vec{Q}_{opt} that maximally separates the relevant from the non-relevant documents.
Rocchio’s Algorithm: Parameter Settings
\vec{Q}_m = \alpha \vec{Q}_0 + \beta \frac{1}{|D_R|} \sum_{\vec{d}_i \in D_R} \vec{d}_i - \gamma \frac{1}{|D_{NR}|} \sum_{\vec{d}_j \in D_{NR}} \vec{d}_j   (the new, reformulated query)

Set \beta, \gamma higher if many judged documents are available.
Positive feedback is more valuable than negative feedback.
Set negative term weights to 0 (a negative weight does not make sense in the vector space model); a sketch of the update follows below.
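A minimal sketch of the Rocchio update over term-weight vectors stored as dictionaries (the α, β, γ values are common textbook settings, not prescribed here; documents and weights are illustrative):

```python
from collections import defaultdict

def centroid(docs):
    """Centroid of a set of documents given as {term: weight} dictionaries."""
    c = defaultdict(float)
    if not docs:
        return c
    for d in docs:
        for term, w in d.items():
            c[term] += w / len(docs)
    return c

def rocchio(query, rel_docs, nonrel_docs, alpha=1.0, beta=0.75, gamma=0.15):
    """Q_m = alpha*Q0 + beta*centroid(rel) - gamma*centroid(nonrel); clip negatives to 0."""
    new_q = defaultdict(float)
    for term, w in query.items():
        new_q[term] += alpha * w
    for term, w in centroid(rel_docs).items():
        new_q[term] += beta * w
    for term, w in centroid(nonrel_docs).items():
        new_q[term] -= gamma * w
    return {t: w for t, w in new_q.items() if w > 0}   # negative weights dropped

q = {"space": 1.0, "satellite": 1.0}
rel = [{"space": 0.5, "nasa": 0.8}, {"satellite": 0.6, "launch": 0.4}]
nonrel = [{"telecom": 0.9, "satellite": 0.2}]
print(rocchio(q, rel, nonrel))
```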
Rocchio’s Algorithm: issues
What should be regarded as NR?
Documents marked non-relevant by user?
too little information
All documents not known to be relevant?
Unseen relevant documents may affect term-weights
How should terms be selected? How many?
Rank terms by # relevant documents they occur in.
Add 50-100 terms.
Blind/Pseudo Relevance Feedback
In the absence of feedback, assume top-ranked documents are
relevant.
Obvious danger: if the initial retrieval is poor, ad hoc (blind) feedback can aggravate the problem.
Most groups at TREC found ad hoc feedback useful on average.
Suggestion: find ways to improve the initial retrieval.
Relevance based Language Model (RLM)
Assumes that both query and relevant documents are sampled from a
latent relevance model R.
In absence of training data, top ranked documents considered as set
of (known) relevant documents.
The query serves as the only absolute evidence about the relevance
model.
The task is to find (estimate) the density function for R.
Estimates P(w|R) on the basis of P(w|Q).
[Lavrenko and Croft, SIGIR 2001]
Independent and Identically Distributed (IID) Sampling
Given a query Q = {q_1, . . . , q_k} and M, a set of initially retrieved documents.
Assumption
The query words Q = {q_1, . . . , q_k} and the word w are sampled independently and identically from the (pseudo-)relevant documents.
Pseudo-relevant documents =⇒ top ranked documents
RM3
Drawback of RLM
Does not consider the original query terms for estimation of the density
function
RM3
A mixture model of RLM and query likelihood model
P'(w|R) = \mu P(w|R) + (1 - \mu) P(w|Q)

P(w|R) = \sum_{D \in M} P(w|D) \prod_{q \in Q} P(q|D)

P(w|Q) = \frac{tf(w, Q)}{|Q|}

(An estimation sketch follows below.)
[Jaleel et al., UMass at TREC 2004: Novelty and HARD]
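A minimal sketch of the RM3 estimation under the formulas above (the feedback document models P(w|D) are assumed to be already smoothed, e.g. with Jelinek-Mercer; all numbers are illustrative):

```python
from collections import defaultdict

def rm3_weights(query, feedback_doc_models, mu=0.6):
    """RM3: mix the relevance model P(w|R) estimated from the top-ranked
    (feedback) documents with the maximum-likelihood query model P(w|Q)."""
    # P(w|R) = sum_D P(w|D) * prod_q P(q|D)
    p_w_r = defaultdict(float)
    for doc_model in feedback_doc_models:          # doc_model: {term: P(term|D)}
        query_lik = 1.0
        for q in query:
            query_lik *= doc_model.get(q, 1e-9)    # tiny floor for unseen query terms
        for w, p in doc_model.items():
            p_w_r[w] += p * query_lik
    z = sum(p_w_r.values()) or 1.0                 # normalize P(w|R) to a distribution
    p_w_r = {w: p / z for w, p in p_w_r.items()}
    # P(w|Q) = tf(w, Q) / |Q|
    p_w_q = {w: query.count(w) / len(query) for w in set(query)}
    # P'(w|R) = mu * P(w|R) + (1 - mu) * P(w|Q)
    terms = set(p_w_r) | set(p_w_q)
    return {w: mu * p_w_r.get(w, 0.0) + (1 - mu) * p_w_q.get(w, 0.0) for w in terms}

fb = [{"space": 0.20, "satellite": 0.10, "nasa": 0.05},
      {"space": 0.15, "launch": 0.08, "satellite": 0.12}]
print(sorted(rm3_weights(["space", "satellite"], fb).items(),
             key=lambda kv: -kv[1])[:5])
```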
Outline
1 Information Retrieval: A Primer
Bag of Words Representation
Vector space model
Language Model
Pseudo-Relevance Feedback and Query Expansion
IR Evaluation
2 Introduction to Word Embedding
3 An overview of Word Embedding applications for IR
4 Generalized Language Model
5 Embedding based Relevance Feedback with Kernel Density Estimate
6 Documents as sets of vectors
Cranfield method (Cleverdon et al., 1960s)
Benchmark data
Document collection
Query / topic collection
Relevance judgments - information about which document is relevant
to which query
(Analogy: syllabus, question paper, correct answers.)
Assumptions
relevance of a document to a query is objectively discernible
all relevant documents contribute equally to the performance
measures
relevance of a document is independent of the relevance of other
documents
Metrics for ranked results
(Non-interpolated) average precision
Rank  Type          Recall  Precision
1     Relevant      0.2     1.00
2     Non-relevant
3     Relevant      0.4     0.67
4     Non-relevant
5     Non-relevant
6     Relevant      0.6     0.50
∞     Relevant      0.8     0.00
∞     Relevant      1.0     0.00

AvgP = \frac{1}{5}\left(1 + \frac{2}{3} + \frac{3}{6}\right)   (5 relevant docs in all; relevant documents that are never retrieved contribute 0)

AvgP = \frac{1}{N_{Rel}} \sum_{d_i \in Rel} \frac{i}{Rank(d_i)}
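A minimal sketch of non-interpolated average precision over a ranked list (1 marks a relevant document; relevant documents that are never retrieved simply contribute nothing):

```python
def average_precision(ranked_relevance, total_relevant):
    """ranked_relevance: list of 0/1 flags in rank order; total_relevant: N_Rel."""
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank       # i / Rank(d_i)
    return precision_sum / total_relevant

# the example above: relevant docs at ranks 1, 3 and 6; 5 relevant docs in all
print(average_precision([1, 0, 1, 0, 0, 1], total_relevant=5))   # (1 + 2/3 + 3/6) / 5
```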
Metrics for ranked results
Interpolated average precision at a given recall point
Recall points correspond to multiples of \frac{1}{N_{Rel}}.
N_{Rel} is different for different queries.
[Figure: precision-recall curves for Q1 (3 rel. docs) and Q2 (4 rel. docs)]
Interpolation is required to compute averages across queries.
Metrics for sub-document retrieval
Let pr - document part retrieved at rank r
rsize(pr) - amount of relevant text contained by pr
size(pr) - total number of characters contained by pr
Trel - total amount of relevant text for a given topic
P[r] = \frac{\sum_{i=1}^{r} rsize(p_i)}{\sum_{i=1}^{r} size(p_i)}

R[r] = \frac{1}{Trel} \sum_{i=1}^{r} rsize(p_i)
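A minimal sketch of these character-based precision/recall values at each rank r (the rsize/size character counts are hypothetical):

```python
def subdoc_precision_recall(parts, t_rel):
    """parts: list of (rsize, size) per retrieved document part, in rank order.
    Returns (P[r], R[r]) for every rank r."""
    rel_sum = size_sum = 0
    scores = []
    for rsize, size in parts:
        rel_sum += rsize
        size_sum += size
        scores.append((rel_sum / size_sum, rel_sum / t_rel))
    return scores

# three retrieved parts with hypothetical relevant-text / total-text character counts
print(subdoc_precision_recall([(120, 300), (0, 250), (80, 100)], t_rel=400))
```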
Metrics for ranked results
Precision at k (P@k) - precision after k documents have been
retrieved
easy to interpret
not very stable / discriminatory
does not average well
R precision - precision after NRel documents have been retrieved
TREC data
Documents
Genres:
news (AP, LA Times, WSJ, SJMN, Financial Times, FBIS)
govt. documents (Federal Register, Congressional Records)
technical articles (Ziff Davis, DOE abstracts)
Size: 0.8 million documents – 1.7 million web pages
(cf. Google indexes several billion pages)
Topics
title
description
narrative
Outline
1 Information Retrieval: A Primer
2 Introduction to Word Embedding
3 An overview of Word Embedding applications for IR
4 Generalized Language Model
5 Embedding based Relevance Feedback with Kernel Density Estimate
6 Documents as sets of vectors
7 Task specific word vector learning
What is Word Embedding?
Represent every word as a vector in some abstract space.
What are the characteristics of this space?
Two terms t1 and t2 are close if and only if they share similar contexts.
Paris is close to France. Why?
If Paris is close to France, then Berlin will be close to Germany. Why?
Outline
1 Information Retrieval: A Primer
2 Introduction to Word Embedding
Word2vec: Working principle
Word2vec: Properties and Limitations
3 An overview of Word Embedding applications for IR
4 Generalized Language Model
5 Embedding based Relevance Feedback with Kernel Density Estimate
6 Documents as sets of vectors
7 Task specific word vector learning
Word2vec
Architecture:
Input: One-hot vector representation of words.
Context: Concatenated one-hot vectors of neighboring words.
Output: Softmax of dimension D (the desired dimensionality of the
vector).
Connecting the cells
Current output is a function of current input and previous outputs.
y(w_n) = f(y(w_{n-1}), c(w_n))    (1)

Objective Function (to maximize):

J(W) = \sum_{(w_t, c_t) \in D^+} \sum_{c \in c_t} p(D = 1 \mid \vec{w}_t, \vec{c}) - \sum_{(w_t, c'_t) \in D^-} \sum_{c \in c'_t} p(D = 1 \mid \vec{w}_t, \vec{c})

where D^+ contains observed (word, context) pairs and D^- negatively sampled (noise) pairs.
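In practice this objective is rarely implemented by hand; a minimal usage sketch with the gensim library (an assumption of this write-up, not something named on the slide; gensim ≥ 4 API) trains skip-gram vectors with negative sampling:

```python
from gensim.models import Word2Vec

# toy corpus: one tokenized sentence per list
sentences = [
    ["nasa", "launches", "new", "space", "satellite"],
    ["the", "satellite", "monitors", "climate", "from", "orbit"],
    ["budget", "cuts", "delay", "the", "nasa", "satellite", "plan"],
]

# skip-gram (sg=1) with negative sampling approximates the objective above
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=1, negative=5)

print(model.wv["satellite"][:5])                  # 50-dimensional word vector
print(model.wv.most_similar("satellite", topn=3))
```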
Outline
1 Information Retrieval: A Primer
2 Introduction to Word Embedding
Word2vec: Working principle
Word2vec: Properties and Limitations
3 An overview of Word Embedding applications for IR
4 Generalized Language Model
5 Embedding based Relevance Feedback with Kernel Density Estimate
6 Documents as sets of vectors
7 Task specific word vector learning
FastText - Learning word vectors from character sequences
Word2vec Limitations
1 For less frequent terms, vector representations would be noisy. Why?
2 For morphologically complex languages, the word vector representation
would also be noisy. Why?
Solution: FastText
1 Learn vector representations of character n-grams.
2 Obtain vector representation of a whole word by summing the vectors
of the n-grams.
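A minimal sketch of the FastText idea of composing a word vector from character n-gram vectors (the n-gram vectors here are random stand-ins; real FastText learns them and hashes n-grams into buckets):

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=4):
    """Character n-grams of a word, with boundary markers as in FastText."""
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

def word_vector(word, ngram_vectors, dim=50):
    """Sum the vectors of the word's character n-grams (zero vector if none known)."""
    vecs = [ngram_vectors[g] for g in char_ngrams(word) if g in ngram_vectors]
    return np.sum(vecs, axis=0) if vecs else np.zeros(dim)

# hypothetical "learned" n-gram vectors
rng = np.random.default_rng(0)
ngram_vectors = {g: rng.normal(size=50) for g in char_ngrams("retrieval")}
print(word_vector("retrieval", ngram_vectors)[:5])   # a vector even for a rare word
```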
Outline
1 Information Retrieval: A Primer
2 Introduction to Word Embedding
3 An overview of Word Embedding applications for IR
4 Generalized Language Model
5 Embedding based Relevance Feedback with Kernel Density Estimate
6 Documents as sets of vectors
7 Task specific word vector learning
Outline
1 Information Retrieval: A Primer
2 Introduction to Word Embedding
3 An overview of Word Embedding applications for IR
Problems with the Term Independence Assumption
Word vectors to address term dependence
Multi-modal
4 Generalized Language Model
5 Embedding based Relevance Feedback with Kernel Density Estimate
6 Documents as sets of vectors
Term Dependence: Initial Retrieval
Term association: Has been an intriguing problem in IR.
Vocabulary mismatch: different terms may be used in two documents that are about the same topic, e.g. "atomic" and "nuclear".
Terms used in query are different from those in its relevant
documents.
Standard retrieval models assume term independence.
Term Dependence: Relevance Feedback
Uses statistical co-occurrence of words in top ranked docs with query
terms.
No way to take into account multi-word ‘concepts’, e.g. relating
‘osteoporosis’ to ‘bone disease’ beyond pre-defined phrases.
Noisy expansion terms can lead to ‘query drift’ and hence degraded
IR effectiveness after RF.
Outline
1 Information Retrieval: A Primer
2 Introduction to Word Embedding
3 An overview of Word Embedding applications for IR
Problems with the Term Independence Assumption
Word vectors to address term dependence
Multi-modal
4 Generalized Language Model
5 Embedding based Relevance Feedback with Kernel Density Estimate
6 Documents as sets of vectors
Word Embedding for Relevance Feedback (RF)
Semantic similarity captured by distance measure between word
vectors.
Integrate semantic similarity with statistical co-occurrences between
terms for RF.
Exploit term compositionality to extract meaningful concepts to use
in RF.
Outline
1 Information Retrieval: A Primer
2 Introduction to Word Embedding
3 An overview of Word Embedding applications for IR
Problems with the Term Independence Assumption
Word vectors to address term dependence
Multi-modal
4 Generalized Language Model
5 Embedding based Relevance Feedback with Kernel Density Estimate
6 Documents as sets of vectors
Word Embeddings for Multi-modal IR
Multi-modality: a document composed of text, images, speech and video, e.g. a typical Wikipedia page.
Given a unimodal (e.g. text/image) query or more generally a
multi-modal query, how can one retrieve relevant multi-modal
documents?
Given a Wiki page as a query, what are the other relevant documents?
Word Embeddings for Multi-modal IR
Index the different modalities separately. Compute similarities
individually and fuse.
Problems: Different retrieval strategies. How to combine the scores?
Vector Embedding Approach: Joint embedding of categorical data,
such as text, and continuous data such as image features into vectors
of reals.
What we need: A similarity function between sets of vectors.
Outline
1 Information Retrieval: A Primer
2 Introduction to Word Embedding
3 An overview of Word Embedding applications for IR
4 Generalized Language Model
5 Embedding based Relevance Feedback with Kernel Density Estimate
6 Documents as sets of vectors
7 Task specific word vector learning
Term Transformation Events
If a term t is observed in a query corresponding to a document d, it may have occurred in three possible ways:
Direct term sampling: Standard LM term sampling
Transformation via Document Sampling: Sampling a term t′ from
d which is then transformed to t by a noisy channel
Transformation via Collection Sampling: Sampling the term t′
from the collection which is then transformed to t by the noisy
channel
Term Transformation Events
[Figure: Schematic of generating a query term t in the Generalized Language Model (GLM) as a noisy channel over the document d and the collection C; GLM degenerates to LM when α = β = 0.]
λ: Standard Jelinek Mercer weighting parameter
α: Probability of sampling term t via a transformation through a term
t′ sampled from the document d
β: Probability of sampling term t via a transformation through a term
t′ sampled from collection
Generalized Language Model : Discussion
GLM consistently and significantly outperforms both LM and LDA-smoothed LM for all the query sets.
LDA achieves higher recall for TREC-6 and Robust, but it does not significantly increase MAP, indicating no improvement in precision at top ranks.
For GLM, an increase in recall is always associated with a significant increase in MAP as well, indicating that precision at top ranks remains relatively stable compared to LDA.
Outline
1 Information Retrieval: A Primer
2 Introduction to Word Embedding
3 An overview of Word Embedding applications for IR
4 Generalized Language Model
5 Embedding based Relevance Feedback with Kernel Density Estimate
6 Documents as sets of vectors
7 Task specific word vector learning
Motivation of this work
The vocabulary mismatch problem is a hindrance to effective search.
Standard IR approach: Use relevance feedback along with statistical
co-occurrence of words in top ranked docs with query terms.
Semantic similarity can be captured by distances between word vectors
Research Challenges:
Develop a formal framework to use word embeddings in IR feedback
models
Utilize semantic similarity in conjunction with statistical co-occurrences
in top docs
Relevance based Language Model (RLM)
Assumes that both query and relevant documents are sampled from a
latent relevance model R
In absence of training data, top ranked documents considered as set
of (known) relevant documents
The query serves as the only absolute evidence about the relevance
model
The task is to find (estimate) the density function for R
Estimates P(w|R) on the basis of P(w|Q)
[Lavrenko et al., SIGIR 2001]
RM3
Drawback of RLM
Does not consider the original query terms for estimation of the density
function
RM3
A mixture model of RLM and query likelihood model
P'(w|R) = \mu P(w|R) + (1 - \mu) P(w|Q)

P(w|R) = \sum_{D \in M} P(w|D) \prod_{q \in Q} P(q|D)

P(w|Q) = \frac{tf(w, Q)}{|Q|}
[Jaleel et al., UMass at TREC 2004: Novelty and HARD]
Kernel Density Estimation (KDE)
Estimate a distribution that generates the given data (data points).
Place a Gaussian centered around each observed data point.
Combine the Gaussians to get a function peaked at the observed data points.
f(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)
One dimensional KDE
f(w, \alpha) = \frac{1}{kh} \sum_{i=1}^{k} \alpha_i \frac{1}{\sqrt{2\pi}\sigma} \exp\left(-\frac{(w - q_i)^2}{2\sigma^2 h^2}\right)

The vector embedding of the words of the collection vocabulary makes it possible to define and compute (w - q_i) as a distance measure, e.g. the squared Euclidean distance between the word vectors of w and q_i:

(w - q_i)^2 = (\vec{w} - \vec{q}_i)^T (\vec{w} - \vec{q}_i)

(A sketch follows below.)
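A minimal numpy sketch of this estimate: score a candidate expansion word by Gaussian kernels centered on the embedded query terms (the α_i, σ, h values and the embeddings are illustrative):

```python
import numpy as np

def kde_score(word_vec, query_vecs, alphas, sigma=1.0, h=1.0):
    """f(w) = (1/kh) * sum_i alpha_i * Gaussian(||w - q_i||^2; sigma^2 h^2)."""
    k = len(query_vecs)
    total = 0.0
    for q_vec, alpha in zip(query_vecs, alphas):
        d2 = float(np.dot(word_vec - q_vec, word_vec - q_vec))   # squared Euclidean
        total += alpha * np.exp(-d2 / (2 * sigma**2 * h**2)) / (np.sqrt(2 * np.pi) * sigma)
    return total / (k * h)

# hypothetical embeddings: rank candidate expansion terms against two query terms
rng = np.random.default_rng(1)
emb = {w: rng.normal(size=50) for w in ["space", "satellite", "orbit", "telecom"]}
q_vecs = [emb["space"], emb["satellite"]]
for cand in ["orbit", "telecom"]:
    print(cand, kde_score(emb[cand], q_vecs, alphas=[0.5, 0.5]))
```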
Two dimensional KDE
In 1D-KDE, the set of all top-ranked documents is considered as a single document model (M).
We now generalize the 1D-KDE to two dimensions so as to weight
the contributions obtained from different documents in the ranked list
separately.
Two dimensional KDE
f(x, \alpha) = \sum_{i=1}^{k} \sum_{j=1}^{M} \frac{P(w|D_j) P(q_i|D_j)}{2\pi\sigma^2} \exp\left(-\frac{(\vec{w} - \vec{q}_i)^2 + (P(w|D_j) - P(q_i|D_j))^2}{2\sigma^2 h^2}\right)

\alpha_{ij} = P(w|D_j) P(q_i|D_j)

A word w in document D_j, denoted x, will get a high value of the density function if:
(i) \vec{w} is close to the query term vectors \vec{q}_i
(ii) w frequently co-occurs with the query terms in each top-ranked document D_j
Summary
Given
Corpus: set of documents
Query Q: q_1, . . . , q_k
First-level retrieval is performed using LM.
E_T = all terms of the top-ranked M docs, considered as potential expansion terms.
Calculate KDE(w) for each w ∈ E_T.
Terms with a high estimated value are taken as expansion terms.
Linearly interpolate the derived density function with the underlying query model.
Evaluation : TREC-678, Robust
Dataset   Method   M   N   MAP       Recall    P@5
TREC 6    RM3      20  70  0.2634    0.5368    0.4360
          1d KDE   10  80  0.2818†   0.5757†   0.4680†
          2d KDE   10  80  0.2850†   0.5724†   0.4600†
TREC 7    RM3      20  70  0.2151    0.5432    0.4160
          1d KDE   10  80  0.2351†   0.6001†   0.4425†
          2d KDE   10  80  0.2380    0.6108†‡  0.4400
TREC 8    RM3      20  70  0.2701    0.6410    0.4760
          1d KDE   10  80  0.2746    0.6749†   0.4888
          2d KDE   10  80  0.2957†‡  0.6887†   0.5120†‡
TREC Rb   RM3      20  70  0.3304    0.8559    0.4949
          1d KDE   10  80  0.3228    0.8725    0.4929
          2d KDE   10  80  0.3456†‡  0.8772†‡  0.5152†‡

Table: † and ‡ denote significance with respect to RLM and 1d KDE respectively.
Evaluation : WT10G
Dataset   Method   M   N   MAP       Recall   P@5
TREC 9    RM3      20  70  0.1930    0.6755   0.3233
          1d KDE   10  80  0.1984    0.6851   0.3360
          2d KDE   10  80  0.2145†‡  0.6878   0.3562†‡
TREC 10   RM3      20  70  0.1759    0.7386   0.3347
          1d KDE   10  80  0.2192†   0.7499   0.4004†
          2d KDE   10  80  0.2213†   0.7502   0.4204†

Table: † and ‡ denote significance with respect to RLM and 1d KDE respectively.
Outline
1 Information Retrieval: A Primer
2 Introduction to Word Embedding
3 An overview of Word Embedding applications for IR
4 Generalized Language Model
5 Embedding based Relevance Feedback with Kernel Density Estimate
6 Documents as sets of vectors
7 Task specific word vector learning
Outline
1 Information Retrieval: A Primer
2 Introduction to Word Embedding
3 An overview of Word Embedding applications for IR
4 Generalized Language Model
5 Embedding based Relevance Feedback with Kernel Density Estimate
6 Documents as sets of vectors
Motivation
Multi-modal and cross-modal IR
7 Task specific word vector learning
Documents as term vectors
Terms as dimensions of a document vector (forms an inner product
space).
Inner product d.q gives the similarity between document and query.
[Figure: Sample three-term space]
Documents as sets of word embedded vectors
Each document: a set of real-valued vectors in p dimensions, D = \{x_i\}_{i=1}^{|D|}, x_i \in \mathbb{R}^p.
Need: generalized distance (inverse similarity) measures d(X, Y), where X, Y are sets of vectors, which satisfy d(X, X) = 0, d(X, Y) = d(Y, X) and d(X, Y) + d(Y, Z) \ge d(X, Z).
Documents as sets of word embedded vectors
Two distance metrics investigated:
Average inter-distance: d(X, Y) = \frac{1}{|X||Y|} \sum_{x \in X} \sum_{y \in Y} d(x, y), where d(x, y) is the L2 (Euclidean) distance between vectors x and y.

Hausdorff distance: d(X, Y) = \max\left( \max_{x \in X} \min_{y \in Y} d(x, y),\ \max_{y \in Y} \min_{x \in X} d(x, y) \right)

(A sketch of both follows below.)
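A minimal numpy sketch of the two set distances over documents represented as arrays of word vectors (dimensions and set sizes are illustrative):

```python
import numpy as np

def pairwise_l2(X, Y):
    """|X| x |Y| matrix of Euclidean distances between two sets of vectors."""
    diff = X[:, None, :] - Y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def avg_inter_distance(X, Y):
    """Average of all pairwise distances between the two sets."""
    return pairwise_l2(X, Y).mean()

def hausdorff_distance(X, Y):
    """max( max_x min_y d(x,y), max_y min_x d(x,y) )."""
    d = pairwise_l2(X, Y)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

rng = np.random.default_rng(2)
doc = rng.normal(size=(40, 50))    # 40 word vectors of a document
qry = rng.normal(size=(3, 50))     # 3 word vectors of a query
print(avg_inter_distance(doc, qry), hausdorff_distance(doc, qry))
```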
Method Details
A document is treated as a mixture model of Gaussians of the
observed constituent words.
A query is treated as the observed points drawn from the underlying
mixture distribution of a document.
The query likelihood is then given by the probability of sampling the
observed query points from the mixture distribution.
sim(q, d) = \frac{1}{K|q|} \sum_{i} \sum_{k} q_i \cdot \mu_k    (2)

This is combined with the text-based query likelihood (language-model based) to obtain the final query likelihood:

P(d|q) = \alpha P_{LM}(d|q) + (1 - \alpha) P_{WVEC}(d|q)    (3)
Practical Considerations for Implementation
Individually estimating the Gaussian mixture model for each
document is time consuming, and slows the indexing process.
Solution: Cluster the entire vocabulary with an EM based clustering
algorithm such as K-means.
Each term is thus mapped to a cluster id.
Induce the per-document clusters by grouping together words in a
document with the same cluster id and find the centre of each group
Ck.
\mu_k = \frac{1}{|C_k|} \sum_{x \in C_k} x, \quad C_k = \{x_i : c(w_i) = k\},\ i = 1, \ldots, |d|    (4)
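A minimal sketch of this scheme with scikit-learn: cluster the vocabulary once with K-means, then induce per-document centroids (eq. 4) by grouping a document's word vectors by their global cluster id (K, the vocabulary and the embeddings are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
vocab = ["space", "satellite", "orbit", "telecom", "budget", "nasa"]
emb = {w: rng.normal(size=50) for w in vocab}        # hypothetical word vectors

# one global clustering of the vocabulary (here K = 3)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
kmeans.fit(np.array([emb[w] for w in vocab]))
cluster_id = {w: int(c) for w, c in zip(vocab, kmeans.labels_)}

def doc_centroids(doc_words):
    """Group a document's word vectors by cluster id and average each group."""
    groups = {}
    for w in doc_words:
        groups.setdefault(cluster_id[w], []).append(emb[w])
    return {k: np.mean(v, axis=0) for k, v in groups.items()}

print({k: v[:3] for k, v in doc_centroids(["nasa", "space", "satellite", "budget"]).items()})
```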
Observations
Results with word vector based similarities outperform pure text based
ones.
K = 100 produces best results for the TREC 8 and the TREC Robust
topic sets.
Show consistent improvements in both recall and precision at top
ranks.
Very fine-grained representation of documents (each constituent word
as its own cluster) is not optimal.
Somewhat surprisingly, K = 1, i.e., each document represented by a single point (the average of all its word vectors), produces results close to those for K = 100.
Outline
1 Information Retrieval: A Primer
2 Introduction to Word Embedding
3 An overview of Word Embedding applications for IR
4 Generalized Language Model
5 Embedding based Relevance Feedback with Kernel Density Estimate
6 Documents as sets of vectors
Motivation
Multi-modal and cross-modal IR
7 Task specific word vector learning
Embedded Vector based Multi-modal IR
Use joint embeddings of text and other data types (e.g. images) to automatically augment text documents with semantically related 'vectors'.
Example: For a given text document, enhance its representative
content (for the purpose of more effective search) by augmenting
relevant images from the Wikimedia (Wikipedia image collection).
Embedded Vector based Cross-modal and Cross-lingual IR
Joint embeddings of vectors can be used for cross-lingual search.
Word embeddings for different languages can be aligned using a parallel corpus.
Document-query similarity can then be measured in this embedded vector space.
Joint embeddings can also be used for addressing cross-modal
information access, e.g. searching for text documents with image
query, searching for speech/video with text query and so on.
Joint Embedding of Multi-modal data
Data from different modalities need to be embedded in a joint space.
Why?
DeViSE (Deep Visual Semantic Embedding)
1 Apply CNN on images. Take the vector from layer f6 or f7.
2 Apply word2vec to obtain pre-trained word vectors.
3 Learn transformation from image to word.
Outline
1 Information Retrieval: A Primer
2 Introduction to Word Embedding
3 An overview of Word Embedding applications for IR
4 Generalized Language Model
5 Embedding based Relevance Feedback with Kernel Density Estimate
6 Documents as sets of vectors
7 Task specific word vector learning
Outline
1 Information Retrieval: A Primer
2 Introduction to Word Embedding
3 An overview of Word Embedding applications for IR
4 Generalized Language Model
5 Embedding based Relevance Feedback with Kernel Density Estimate
6 Documents as sets of vectors
7 Task specific word vector learning
Microblog Retrieval
Context driven word2vec training
Given a set S of contextually similar word vector pairs,
Φ(w) = {v : (w, v) ∈ S}
Transform the vector space so that w moves close to each v ∈ Φ(w) (and away from words u ∉ Φ(w)) in the transformed space.
Transformation function:
l(\vec{w}; \theta) = \sum_{\vec{v}: v \in \Phi(w)} \sum_{\vec{u}: u \notin \Phi(w)} \max\left(0,\ m - \left((\theta\vec{w})^T \vec{v} - (\theta\vec{w})^T \vec{u}\right)\right)
Similar to the image-word training. Why?
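A minimal numpy sketch of this margin objective: for a transformation θ, the loss is paid whenever a non-context word u scores within margin m of a context word v (θ, the vectors and the sampled words are all illustrative):

```python
import numpy as np

def context_hinge_loss(theta, w_vec, context_vecs, noise_vecs, m=0.5):
    """sum over (v in context, u not in context) of
    max(0, m - ((theta w)^T v - (theta w)^T u))."""
    tw = theta @ w_vec                       # transformed word vector
    loss = 0.0
    for v in context_vecs:
        for u in noise_vecs:
            loss += max(0.0, m - (tw @ v - tw @ u))
    return loss

rng = np.random.default_rng(4)
dim = 50
theta = np.eye(dim)                                    # start from the identity transform
w = rng.normal(size=dim)
context = [rng.normal(size=dim) for _ in range(3)]     # e.g. words from similar tweets
noise = [rng.normal(size=dim) for _ in range(5)]       # words outside the context set
print(context_hinge_loss(theta, w, context, noise))
```

θ would then be learned by minimizing this loss, e.g. with gradient descent, over all training pairs.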
Context driven word2vec training
The context S depends on the application.
For Microblog retrieval:
1 Other topically similar tweets.
2 Tweets within a time period - tweets are often associated with
temporal bursts.
For Session Modeling:
1 Queries that are close in time by the same user.