Chapter Seven
Query Operations
• Relevance feedback
• Query expansion
Problems with Keywords
• May not retrieve relevant documents that
include synonymous terms.
◦ “restaurant” vs. “café”
◦ “PRC” vs. “China”
• May retrieve irrelevant documents that include
ambiguous terms.
◦ “bat” (baseball vs. mammal)
◦ “Apple” (company vs. fruit)
◦ “bit” (unit of data vs. past tense of “bite”)
Techniques for Intelligent IR
• Take into account the meaning of the words used
• Take into account the order of words in the query
• Adapt to the user based on automatic or semi-
automatic feedback
• Extend search with related terms
• Perform automatic spell checking / diacritics
restoration (a diacritic is a mark added to a letter to
indicate a special pronunciation)
• Take into account the authority of the source.
Query operations
• Users usually have no detailed knowledge of the collection
and the retrieval environment, which makes it difficult to
formulate queries that are well designed for retrieval.
• Effective retrieval therefore needs many query formulations:
– The first formulation is often a naïve attempt to retrieve
relevant information.
– The documents initially retrieved can be examined for
relevance information, by the user or automatically by the system.
– This information is used to improve the query formulation
and retrieve additional relevant documents.
Query reformulation
• Two basic techniques to revise the query to account for
feedback:
– Query expansion: expanding the original query by adding
new terms taken from relevant documents.
– Term reweighting in the expanded query: modifying term
weights based on user relevance judgements.
• Increase the weight of terms in relevant documents.
• Decrease the weight of terms in irrelevant documents.
Approaches for Relevance Feedback
• Approaches based on user relevance feedback
– Relevance feedback with user input.
– Clustering hypothesis: known relevant documents contain terms
that can be used to describe a larger cluster of relevant documents.
– The description of the cluster is built interactively with user
assistance.
• Approaches based on pseudo relevance feedback
– Use relevance feedback methods without explicit user
involvement.
– Obtain the cluster description automatically.
– Identify terms related to the query terms, e.g. synonyms,
stemming variations, terms close to the query terms in the text.
User Relevance Feedback
• Most popular query reformulation strategy.
• Cycle (a minimal code sketch follows this list):
– The user is presented with a list of retrieved documents.
• After the initial retrieval results are presented, the user provides
feedback on the relevance of one or more of the retrieved documents.
– The user marks those which are relevant.
• In practice, the top 10-20 ranked documents are examined.
– This feedback information is used to reformulate the query.
• Important terms are selected from documents assessed relevant by the user.
– The importance of these terms is enhanced in a new query.
• New results are produced based on the reformulated query.
– This allows a more interactive, multi-pass process.
• Expected:
– The new query moves towards the relevant documents and away from the
non-relevant documents.
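One round of this cycle might look like the following minimal Python sketch; `search`, `show`, `ask_user`, and `reformulate` are assumed hooks standing in for engine and UI pieces the slides do not specify (`reformulate` could be the Rocchio update shown later):

```python
def feedback_cycle(query, search, show, ask_user, reformulate, k=10):
    """One pass of user relevance feedback over the top-k results."""
    ranked = search(query)[:k]                      # initial retrieval
    show(ranked)                                    # present top-k documents
    relevant = [d for d in ranked if ask_user(d)]   # user marks relevant docs
    nonrelevant = [d for d in ranked if d not in relevant]
    query = reformulate(query, relevant, nonrelevant)  # e.g. Rocchio, below
    return search(query)                            # results for revised query
```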
User Relevance Feedback Architecture
[Diagram] Query String → IR System (Document corpus) → Ranked Documents (1. Doc1, 2. Doc2, 3. Doc3, …)
User relevance marks on the ranked documents → Feedback → Query Reformulation → Revised Query
Revised Query → IR System → ReRanked Documents (1. Doc2, 2. Doc4, 3. Doc5, …)
Refinement by relevance feedback
• In the vector model, a query is a vector of term weights; hence
reformulation involves reassigning term weights.
• If a document is known to be relevant, the query can be improved
by increasing its similarity to that document.
• If a document is known to be non-relevant, the query can be
improved by decreasing its similarity to that document.
• Problems:
• What if the query has to increase its similarity to two very non-similar
documents (each “pulls” the query in an entirely different direction)?
• What if the query has to decrease its similarity to two very non-similar
documents (each “pushes” the query in an entirely different direction)?
• Critical assumptions that must be made:
• Relevant documents resemble each other (are clustered).
• Non-relevant documents resemble each other (are clustered).
• Non-relevant documents differ from the relevant documents.
Refinement by relevance feedback (cont.)
• For a query q, denote:
– D_R: the set of relevant documents in the answer (as identified by the user).
– D_N: the set of non-relevant documents in the answer.
– C_R: the set of relevant documents in the collection (the ideal answer).
• Assume (unrealistically!) that C_R is known in advance.
• It can then be shown that the best query vector for distinguishing
the relevant documents from the non-relevant documents is:

$$\vec{q}_{opt} \;=\; \frac{1}{|C_R|} \sum_{\vec{d}_j \in C_R} \vec{d}_j \;-\; \frac{1}{n - |C_R|} \sum_{\vec{d}_j \notin C_R} \vec{d}_j$$

• The left expression is the centroid of the relevant documents; the
right expression is the centroid of the non-relevant documents.
• Note: this is vector arithmetic! The d_j are vectors, whereas
|C_R| and n − |C_R| are scalars (n is the collection size).
Refinement by relevance feedback (cont.)
• Since we don’t know C_R, we substitute D_R (and D_N) for it in
the two expressions, and then use them to modify the initial query q:

$$\vec{q}_{new} \;=\; \alpha\,\vec{q} \;+\; \frac{\beta}{|D_R|} \sum_{\vec{d}_j \in D_R} \vec{d}_j \;-\; \frac{\gamma}{|D_N|} \sum_{\vec{d}_j \in D_N} \vec{d}_j$$

• α, β, and γ are tuning constants; for example, 1.0, 0.5, 0.25.
• Note: this is vector arithmetic! q and the d_j are vectors, whereas
|D_R|, |D_N|, α, β, and γ are scalars.
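The slides give no code; as a minimal NumPy sketch of this update (the function name, defaults, and the clipping of negative weights to zero are my assumptions, not the slides’ prescription):

```python
import numpy as np

def rocchio(q, relevant, nonrelevant, alpha=1.0, beta=0.5, gamma=0.25):
    """Rocchio-style update: move q towards the centroid of relevant
    documents and away from the centroid of non-relevant documents.
    `relevant` and `nonrelevant` are lists of term-weight vectors."""
    q_new = alpha * np.asarray(q, dtype=float)
    if relevant:
        q_new += beta * np.mean(np.asarray(relevant, dtype=float), axis=0)
    if nonrelevant:
        q_new -= gamma * np.mean(np.asarray(nonrelevant, dtype=float), axis=0)
    return np.maximum(q_new, 0.0)   # negative term weights are clipped to 0
```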
Refinement by relevance feedback (cont.)
• Positive feedback factor (the β term): uses the user’s judgments
on relevant documents to increase the weights of their terms. Moves
the query towards the relevant documents retrieved (in the direction
of more relevant documents).
• Negative feedback factor (the γ term): uses the user’s judgments
on non-relevant documents to decrease the weights of their terms.
Moves the query away from the non-relevant documents.
• Positive feedback is often weighted significantly more than
negative feedback; sometimes only positive feedback is used.

$$\underbrace{\frac{\beta}{|D_R|} \sum_{\vec{d}_j \in D_R} \vec{d}_j}_{\text{positive feedback}} \qquad\qquad \underbrace{\frac{\gamma}{|D_N|} \sum_{\vec{d}_j \in D_N} \vec{d}_j}_{\text{negative feedback}}$$
Refinement by relevance feedback (cont.)
• Example:
• Assume query q = (3, 0, 0, 2, 0) retrieved three documents: d1, d2, d3.
• Assume d1 and d2 are judged relevant and d3 is judged non-relevant.
• Assume the tuning constants are α = 1.0, β = 0.5, γ = 0.25.

        k1     k2     k3     k4     k5
q        3      0      0      2      0
d1       2      4      0      0      2
d2       1      3      0      0      0
d3       0      0      4      3      3
qnew  3.75   1.75      0   1.25      0

The revised query is:
qnew = 1.0 · (3, 0, 0, 2, 0)
     + 0.5 · ((2+1)/2, (4+3)/2, (0+0)/2, (0+0)/2, (2+0)/2)
     − 0.25 · (0, 0, 4, 3, 3)
     = (3.75, 1.75, −1, 1.25, −0.25)
     = (3.75, 1.75, 0, 1.25, 0)    (negative weights are set to 0)
Refinement by relevance feedback (cont.)
• Using a simplified similarity formula (the numerator of the cosine only):

$$similarity(\vec{d}_j, \vec{q}) \;=\; \sum_{i=1}^{t} w_{i,j} \times w_{i,q}$$

we can compare the similarity of q and qnew to the three documents:

        d1     d2     d3
q        6      3      6
qnew  14.5      9   3.75

• Compared to the original query, the new query is indeed more similar to
d1 and d2 (which were judged relevant), and less similar to d3 (which was
judged non-relevant).
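The slide’s numbers can be checked with the rocchio() sketch above, using the dot product as the simplified similarity (a hypothetical verification, not part of the slides):

```python
q  = [3, 0, 0, 2, 0]
d1 = [2, 4, 0, 0, 2]
d2 = [1, 3, 0, 0, 0]
d3 = [0, 0, 4, 3, 3]

q_new = rocchio(q, relevant=[d1, d2], nonrelevant=[d3])
print(q_new)                      # [3.75 1.75 0.   1.25 0.  ]
for d in (d1, d2, d3):
    print(np.dot(q_new, d))       # 14.5, 9.0, 3.75
```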
Refinement by relevance feedback (cont.)
• Problem: relevance feedback may not operate satisfactorily if the
identified relevant documents do not form a tight cluster.
– Possible solution: cluster the identified relevant documents, then split the
original query into several, by constructing a new query for each cluster.
• Problem: some of the query terms might not be found in any of the
retrieved documents. This leads to a reduction of their relative weight
in the modified query (or even their elimination). This is undesirable,
because these terms might still be matched in future iterations.
– Possible solutions: ensure that the original terms are kept, or present all
modified queries to the user for review.
• Problem: new query terms might be introduced that conflict with the
intention of the user.
– Possible solution: present all modified queries to the user for review.
Refinement by relevance feedback (cont.)
• Conclusion: experimentation showed that user relevance
feedback in the vector model gives good results.
• However:
• Users are sometimes reluctant to provide explicit feedback.
• It results in long queries that need more computation to
process, which is costly for search engines.
• It makes it harder to understand why a particular document was
retrieved.
• “Fully automatic” relevance feedback: The rank values for
the documents in the first answer are used as relevance
feedback to automatically generate the second query (no human
judgment).
• The highest ranking documents are assumed to be relevant
(positive feedback only).
Pseudo Relevance Feedback
• Just assume the top m retrieved documents are relevant,
and use them to reformulate the query.
• Allows for query expansion that includes terms that are
correlated with the query terms.
• Two strategies (one feedback round is sketched in code below):
– Local strategies: approaches based on information derived
from the set of initially retrieved documents (the local set of
documents).
– Global strategies: approaches based on global information
derived from the whole document collection.
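A hedged sketch of one pseudo-feedback round; `search` and the `.vector` attribute are assumed interfaces, and positive-only feedback with weight `beta` is one common choice rather than the slides’ prescription:

```python
import numpy as np

def pseudo_feedback(query_vec, search, m=10, beta=0.5):
    """Assume the top m hits are relevant and expand the query with them."""
    query_vec = np.asarray(query_vec, dtype=float)
    top_docs = search(query_vec)[:m]              # initial retrieval
    centroid = np.mean([d.vector for d in top_docs], axis=0)
    revised = query_vec + beta * centroid         # positive feedback only
    return search(revised)                        # re-rank with revised query
```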
Pseudo Feedback Architecture
[Diagram] Query String → IR System (Document corpus) → Ranked Documents (1. Doc1, 2. Doc2, 3. Doc3, …)
Pseudo Feedback (top documents assumed relevant) → Query Reformulation → Revised Query
Revised Query → IR System → ReRanked Documents (1. Doc2, 2. Doc4, 3. Doc5, …)
Local analysis
• Examine the documents retrieved for the query to determine the
query expansion; no user assistance.
• Synonymy association: terms that frequently co-occur
inside the local set of documents.
• At query time, dynamically determine similar terms
based on an analysis of the top-ranked retrieved documents.
• Base the correlation analysis on only the “local” set of
retrieved documents for a specific query.
• Avoids ambiguity by determining similar (correlated)
terms only within relevant documents.
– “Apple computer” → “Apple computer Powerbook laptop”
Association Matrix

        w1    w2    w3   …    wn
  w1   c11   c12   c13   …   c1n
  w2   c21
  w3   c31
   ⋮     ⋮
  wn   cn1

cij: correlation factor between term i and term j
Clusters
• Synonymy association: terms that frequently co-occur
inside the local set of documents.
• Clustering techniques:
– A term-term association matrix is built over the local set D_l,
where tf(t_i, d) is the frequency of term i in document d and
c_{i,j} is the association factor between term i and term j.
– The term-term (e.g., stem-stem) association matrix is
normalized using m_{i,j}.
– The normalized score m_{i,j} is 1 if two terms have the
same frequency in all documents.
$$c_{i,j} \;=\; \sum_{d \in D_l} tf(t_i, d) \times tf(t_j, d)$$

$$m_{i,j} \;=\; \frac{c_{i,j}}{c_{i,i} + c_{j,j} - c_{i,j}}$$
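A minimal sketch of these two formulas (the function and variable names are my own, and it assumes every vocabulary term occurs in at least one document, so the denominator is never zero):

```python
import numpy as np
from collections import Counter

def association_matrices(docs, vocab):
    """c[i,j] = sum over docs of tf(t_i,d)*tf(t_j,d);
    m[i,j] = c[i,j] / (c[i,i] + c[j,j] - c[i,j])."""
    counts = [Counter(d) for d in docs]            # per-document term frequencies
    tf = np.array([[cnt[t] for t in vocab] for cnt in counts], dtype=float)
    c = tf.T @ tf                                  # raw association factors
    diag = np.diag(c)
    m = c / (diag[:, None] + diag[None, :] - c)    # normalized scores (m[i,i] = 1)
    return c, m
```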
Example
• Given:
Doc 1 = D D A B C A B C
Doc 2 = E C E A A D
Doc 3 = D C B B D A B C A
Doc 4 = A
• Query: A E E
What is the new reformulated query using the synonymy
association matrix? (One possible answer is sketched below.)
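A possible answer using the association_matrices() sketch above, under my own assumption (not stated in the slides) that each distinct query term is expanded with its single most strongly associated other term:

```python
docs = ["D D A B C A B C".split(),
        "E C E A A D".split(),
        "D C B B D A B C A".split(),
        "A".split()]
vocab = ["A", "B", "C", "D", "E"]
c, m = association_matrices(docs, vocab)

expanded = set("A E E".split())
for t in set("A E E".split()):
    row = m[vocab.index(t)].copy()
    row[vocab.index(t)] = -1.0                # ignore the self-association (m=1)
    expanded.add(vocab[int(np.argmax(row))])  # add the best neighbour of t
print(expanded)    # {'A', 'C', 'E'}  (D ties with C as A's best neighbour)
```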
Global analysis
• Expand the query using information from the whole set of
documents in the collection.
• Approach to selecting terms for query expansion:
– Determine term similarity through a pre-computed
statistical analysis of the complete corpus.
– Compute association matrices which quantify term
correlations in terms of how frequently terms co-occur.
– Expand queries with the statistically most similar terms.
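In code terms this is just the earlier association_matrices() sketch run once over the whole collection at build time (collection_docs and vocab are assumed to exist):

```python
# offline, once at system development time:
c_global, m_global = association_matrices(collection_docs, vocab)
# at query time, expansion only consults the precomputed m_global
```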
Problems with Global Analysis
• Term ambiguity may introduce irrelevant
statistically correlated terms.
– “Apple computer” → “Apple red fruit computer”
• Since terms are highly correlated anyway,
expansion may not retrieve many additional
documents.
Global vs. Local Analysis
• Global analysis requires intensive term-correlation
computation only once, at system development time.
• Global: a thesaurus is used to help select terms for
expansion.
• Local analysis requires intensive term-correlation
computation for every query at run time (although the
number of terms and documents is smaller than in global
analysis).
• Local: the documents retrieved are examined to
automatically determine the query expansion; no relevance
feedback is needed.
• Generally, local analysis gives better results.
Query Expansion Conclusions
• Expanding queries with related terms can improve
performance, particularly recall.
• However, similar terms must be selected very carefully to avoid
problems, such as loss of precision.
Thank you