Using Word Embedding for
Automatic Query Expansion
Dwaipayan Roy, Debjyoti Paul, Mandar Mitra and Utpal Garain
Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India
Abstract
• Automatic query expansion intuitively requires terms that are semantically similar to the query terms.
• The neural word embedding model Word2Vec captures both semantic and syntactic regularities in the language.
• We explore the Word2Vec framework to find semantically similar terms for query expansion.
• Experimental results show a significant improvement over the baseline, but the method lags behind RM3.
Motivation
• The Word2Vec framework generates word embeddings that capture semantic and syntactic regularities.
• It has shown improved performance in clinical decision support and cross-lingual retrieval.
• We try to answer the following questions:
1. Does QE using the nearest neighbors of query terms improve retrieval effectiveness?
2. If yes, is it possible to characterize the queries for which this QE method does or does not work?
3. How does embedding-based QE perform compared to an established QE technique like RM3 [1]?
Our Contribution
• We improve retrieval performance by finding semantically similar terms for query expansion in ad-hoc retrieval.
• Similar terms are found by computing the K nearest neighbors (K-NN) of each query term.
• Our contribution is two-fold:
(1) Proposing a composition function for multi-term queries for finding K-NN terms
(2) Proposing an incremental K-NN algorithm to reduce query drift during expansion.
Query Expansion Methods
• Pre-retrieval KNN: Let $Q = \{q_1, q_2, \ldots, q_n\}$ be a query of $n$ terms. The set of candidate expansion terms is
$C = \bigcup_{q \in Q} NN(q)$
• $NN(q)$ is the set of the $k$ terms closest to $q$ in the embedding space.
• The mean similarity between a candidate expansion term $t$ and all the terms in $Q$ is computed as
$Sim(t, Q) = \frac{1}{|Q|} \sum_{q_i \in Q} t \cdot q_i$
• Post-retrieval KNN follows a similar approach, except that the search space for expansion terms is reduced to the terms in the pseudo-relevant documents.
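The candidate set and the mean-similarity score can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the toy vectors in `EMB`, the function names, and the choice to exclude a term from its own neighbor list are all assumptions made for the example. Vectors are unit-normalized so that the dot product equals cosine similarity. Post-retrieval KNN is modeled by passing a restricted vocabulary (here a hypothetical `feedback_vocab`).

```python
import numpy as np

def unit(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

# Hypothetical toy embeddings; real vectors would come from a trained Word2Vec model.
EMB = {
    "car":     unit([1.0, 0.1, 0.0]),
    "auto":    unit([0.9, 0.2, 0.0]),
    "vehicle": unit([0.8, 0.3, 0.1]),
    "engine":  unit([0.6, 0.6, 0.0]),
    "repair":  unit([0.1, 1.0, 0.0]),
    "fix":     unit([0.2, 0.9, 0.1]),
    "banana":  unit([0.0, 0.0, 1.0]),
}

def nearest_neighbors(q, k, vocab):
    """NN(q): the k terms of vocab closest to q (q itself is excluded, by assumption)."""
    others = [t for t in vocab if t != q]
    others.sort(key=lambda t: float(EMB[t] @ EMB[q]), reverse=True)
    return others[:k]

def candidate_terms(query, k, vocab):
    """C: the union of NN(q) over all query terms q."""
    c = set()
    for q in query:
        c.update(nearest_neighbors(q, k, vocab))
    return c

def mean_similarity(t, query):
    """Sim(t, Q): average dot product between t and every query term."""
    return sum(float(EMB[t] @ EMB[q]) for q in query) / len(query)

def rank_candidates(query, k, vocab):
    """Candidate expansion terms ordered by decreasing mean similarity to the query."""
    c = candidate_terms(query, k, vocab)
    return sorted(c, key=lambda t: mean_similarity(t, query), reverse=True)

query = ["car", "repair"]

# Pre-retrieval: neighbors are searched over the whole vocabulary.
pre = rank_candidates(query, k=2, vocab=EMB.keys())

# Post-retrieval: neighbors are searched only among terms that occur in the
# pseudo-relevant documents (a made-up vocabulary here).
feedback_vocab = {"car", "repair", "engine", "fix", "banana"}
post = rank_candidates(query, k=2, vocab=feedback_vocab)

print(pre)
print(post)
```

In this toy setup "engine" is only moderately close to each individual query term, yet it ranks first because the score averages over the whole query.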
Incremental KNN
• Let the nearest neighbors of $q$, in order of decreasing similarity, be $t_1, t_2, \ldots, t_N$.
• We prune the $K$ least similar neighbors to obtain $t_1, t_2, \ldots, t_{N-K}$.
• Next, we consider $t_1$ and reorder the terms $t_2, \ldots, t_{N-K}$ in decreasing order of similarity with $t_1$.
• Again, the $K$ least similar neighbors in the reordered list are pruned to obtain $t'_2, t'_3, \ldots, t'_{N-2K}$. Next, we pick $t'_2$ and repeat the same process.
• This continues for $l$ iterations.
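The pruning loop can be sketched as below. This is one reading of the steps, not the authors' code: the function name, the parameter names, and the convention that one iteration means "pick the next term, reorder the rest around it, prune" are assumptions. The toy embeddings are unit vectors in the plane at hand-picked angles, so similarity depends only on the angle between two terms.

```python
import numpy as np

def incremental_knn(q, n, k_prune, iterations, emb):
    """Iteratively re-rank and prune the neighbors of q.

    emb maps each term to a unit vector; similarity is the dot product.
    Returns the terms picked so far followed by the surviving terms.
    """
    def sim(a, b):
        return float(emb[a] @ emb[b])

    # t_1 .. t_N: the n nearest neighbors of q, most similar first.
    remaining = sorted((t for t in emb if t != q),
                       key=lambda t: sim(t, q), reverse=True)[:n]
    # Drop the k_prune least similar neighbors.
    remaining = remaining[:max(len(remaining) - k_prune, 0)]

    picked = []
    for _ in range(iterations):
        if not remaining:
            break
        pivot = remaining.pop(0)          # t_1, then t'_2, and so on
        picked.append(pivot)
        # Reorder the rest by similarity to the pivot, then prune again.
        remaining.sort(key=lambda t: sim(t, pivot), reverse=True)
        remaining = remaining[:max(len(remaining) - k_prune, 0)]
    return picked + remaining

# Hypothetical terms placed at fixed angles (in degrees) around the query "q".
angles = {"q": 0, "a": 10, "b": -20, "c": 30, "d": -40, "e": 50, "f": 80}
emb = {t: np.array([np.cos(np.deg2rad(d)), np.sin(np.deg2rad(d))])
       for t, d in angles.items()}

result = incremental_knn("q", n=6, k_prune=1, iterations=2, emb=emb)
print(result)  # ['a', 'c', 'e']
```

Plain nearest-neighbor search around `q` would rank `b` and `d` above `e`. The incremental variant drops them because they lie on the opposite side from the terms already picked, which is the sense in which it keeps the expansion terms mutually coherent.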
Composition of Query
• Given a query $Q$ consisting of $m$ terms $\{q_1, \ldots, q_m\}$, we first construct $Q_c$, the set of query word bigrams:
$Q_c = \{\langle q_1, q_2 \rangle, \langle q_2, q_3 \rangle, \ldots, \langle q_{m-1}, q_m \rangle\}$
• We define the embedding for a bigram $\langle q_i, q_{i+1} \rangle$ as simply $q_i + q_{i+1}$.
• Next, we define the extended query term set (EQTS) $Q'$ as $Q' = Q \cup Q_c$.
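A small sketch of the extended query term set follows. The function name and the toy vectors are assumptions for illustration. Single terms are keyed by their string and bigrams by a tuple of two strings.

```python
import numpy as np

def extended_query_term_set(query, emb):
    """Build Q' = Q ∪ Q_c as a mapping from each element to its vector.

    Each adjacent bigram (q_i, q_{i+1}) is embedded as the sum of the
    two term vectors.
    """
    vectors = {q: emb[q] for q in query}
    for left, right in zip(query, query[1:]):
        vectors[(left, right)] = emb[left] + emb[right]
    return vectors

# Hypothetical two-dimensional vectors chosen so the sums are easy to check.
emb = {
    "world":        np.array([1.0, 0.0]),
    "trade":        np.array([0.0, 1.0]),
    "organization": np.array([1.0, 1.0]),
}

eqts = extended_query_term_set(["world", "trade", "organization"], emb)
for key, vec in eqts.items():
    print(key, vec)
```

A query of $m$ terms yields $m - 1$ bigrams, so a single-term query is left unchanged.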
Retrieval
$P(w \mid Q_{exp}) = \alpha P(w \mid Q) + (1 - \alpha) \frac{Sim(w, Q)}{\sum_{w' \in Q_{exp}} Sim(w', Q)}$
$\alpha$ is the interpolation parameter used to combine the original unexpanded query with the expansion terms.
Here $Q_K$ represents the set of top $K$ terms from $C$, the set of candidate expansion terms.
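The interpolation can be illustrated with a short sketch. Two points are assumptions, since the poster does not specify them: $P(w \mid Q)$ is taken to be the maximum-likelihood estimate (term count divided by query length), and the similarity scores are assumed to be non-negative. The function name and the similarity values are hypothetical.

```python
from collections import Counter

def expanded_query_model(query, expansion_sims, alpha):
    """Interpolate the original query model with normalized similarity scores.

    query          -- list of original query terms
    expansion_sims -- mapping from expansion term to Sim(w, Q), assumed >= 0
    alpha          -- weight of the original query model, between 0 and 1
    """
    counts = Counter(query)
    total_sim = sum(expansion_sims.values())
    model = {}
    for w in set(query) | set(expansion_sims):
        # Assumed maximum-likelihood estimate of P(w | Q).
        p_orig = counts[w] / len(query)
        # Similarity normalized over all expansion terms.
        p_exp = expansion_sims.get(w, 0.0) / total_sim if total_sim else 0.0
        model[w] = alpha * p_orig + (1 - alpha) * p_exp
    return model

# Hypothetical similarity scores for two expansion terms.
model = expanded_query_model(
    query=["car", "repair"],
    expansion_sims={"engine": 0.6, "vehicle": 0.4},
    alpha=0.5,
)
for term in sorted(model):
    print(term, round(model[term], 3))
```

With `alpha=0.5`, half of the probability mass stays on the original query terms and the other half is shared among the expansion terms in proportion to their similarity.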
[Tables and figures not recovered: Experimental Setup; Dataset Overview; Use Composition or Not?; Benefit of Pairwise Composition]
Discussion
• The semantic and contextual information in Word2Vec embeddings is leveraged here.
• Query expansion intuitively calls for finding terms that are similar to the query and terms that occur frequently in the relevant documents.
• In the proposed expansion method, terms similar to the query terms in the collection-level abstract space are considered for expansion.
• For post-retrieval KNN, the search space for the expansion terms is reduced to the pseudo-relevance feedback documents.
• Incremental KNN reduces query drift beyond semantic similarity, which is not the case for the pre-retrieval or post-retrieval methods. This explains the consistently better performance of the incremental method.
• Experiments show RM3 performing better on the TREC ad-hoc and web collections, which suggests that co-occurrence statistics are more powerful than similarity in the abstract space.
Future work
• The obvious future work is to include co-occurrence statistics at the collection level along with Word2Vec.
• Addressing the generalization effect introduced by the embeddings should further improve the performance of our proposed method.
• Local retraining of the word embeddings might be one possibility.
