Chapter Seven
Query Operations
• Relevance feedback
• Query expansion
Problems with Keywords
• May not retrieve relevant documents that
include synonymous terms.
◦ “restaurant” vs. “café”
◦ “PRC” vs. “China”
• May retrieve irrelevant documents that include
ambiguous terms.
◦ “bat” (baseball vs. mammal)
◦ “Apple” (company vs. fruit)
◦ “bit” (unit of data vs. act of eating)
Techniques for Intelligent IR
• Take into account the meaning of the words used
• Take into account the order of words in the query
• Adapt to the user based on automatic or semi-
automatic feedback
• Extend search with related terms
• Perform automatic spell checking / diacritics
restoration (a diacritic is a mark added to a letter to
indicate a special pronunciation)
• Take into account the authority of the source.
Query operations
• Users have no detailed knowledge of the collection and
the retrieval environment, which makes it difficult to
formulate queries well designed for retrieval.
• Effective retrieval therefore needs many query formulations:
– The first formulation is often a naïve attempt to retrieve
relevant information.
– The documents initially retrieved can be examined for
relevance information, by the user or automatically by the
system, and this information is used to improve the query
formulation and retrieve additional relevant documents.
Query reformulation
• Two basic techniques to revise the query to account for
feedback:
– Query expansion: expand the original query with new
terms taken from relevant documents.
– Term reweighting in the expanded query: modify term
weights based on the user's relevance judgements:
• increase the weight of terms in relevant documents;
• decrease the weight of terms in irrelevant documents.
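As a rough illustration of these two operations, the sketch below applies both to a query stored as a term-to-weight dictionary; the terms, weights, and step constants are made up for illustration, not taken from the slides:

```python
# A minimal sketch of query expansion and term reweighting on a
# query represented as a term -> weight dictionary (illustrative only).

def expand_query(query, new_terms, weight=0.5):
    """Query expansion: add new terms (e.g. from relevant documents)."""
    for term in new_terms:
        query[term] = query.get(term, 0.0) + weight
    return query

def reweight_query(query, relevant_terms, irrelevant_terms, step=0.25):
    """Term reweighting: boost terms seen in relevant documents,
    penalise terms seen in irrelevant ones (floored at 0)."""
    for term in relevant_terms:
        if term in query:
            query[term] += step
    for term in irrelevant_terms:
        if term in query:
            query[term] = max(0.0, query[term] - step)
    return query

q = {"restaurant": 1.0, "cheap": 1.0}
q = expand_query(q, ["cafe", "bistro"])           # terms from relevant docs
q = reweight_query(q, ["cheap"], ["restaurant"])  # user judgements
print(q)  # {'restaurant': 0.75, 'cheap': 1.25, 'cafe': 0.5, 'bistro': 0.5}
```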
Approaches for Relevance Feedback
• Approaches based on user relevance feedback:
– Relevance feedback with explicit user input.
– Clustering hypothesis: known relevant documents contain terms that
can be used to describe a larger cluster of relevant documents.
– The description of this cluster is built interactively with user
assistance.
• Approaches based on pseudo relevance feedback:
– Use relevance feedback methods without explicit user
involvement.
– Obtain the cluster description automatically: identify terms related
to the query terms, e.g. synonyms, stemming variations, and terms
close to the query terms in the text.
User Relevance Feedback
• Most popular query reformulation strategy.
• Cycle:
– The user is presented with a list of retrieved documents; after the
initial results are shown, the user provides feedback on the relevance
of one or more of them.
– The user marks those which are relevant; in practice, the top 10-20
ranked documents are examined.
– The feedback is used to reformulate the query: important terms are
selected from the documents the user assessed as relevant, and the
importance of these terms is enhanced in the new query.
– New results are produced from the reformulated query, allowing a more
interactive, multi-pass process.
• Expected: the new query moves towards the relevant documents and away
from the non-relevant documents.
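Put as a loop, the cycle might be sketched as follows; `search`, `reformulate`, and `get_user_judgments` are hypothetical stand-ins for the IR system, a reformulation method (e.g. the Rocchio sketch later in this chapter), and the user's judgments:

```python
# A minimal sketch of the user relevance feedback cycle; the three
# callables are hypothetical placeholders, not a specific system's API.

def relevance_feedback_cycle(query, search, reformulate,
                             get_user_judgments, passes=2, top_k=10):
    for _ in range(passes):
        ranked = search(query)               # retrieve and rank documents
        examined = ranked[:top_k]            # user examines top 10-20 docs
        relevant, non_relevant = get_user_judgments(examined)
        query = reformulate(query, relevant, non_relevant)
    return search(query)                     # final re-ranked results
```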
User Relevance Feedback Architecture
[Figure: the user's Query String goes to the IR System, which ranks
documents from the Document corpus and returns Ranked Documents
(1. Doc1, 2. Doc2, 3. Doc3, ...). The user supplies Feedback by marking
the retrieved documents as relevant or non-relevant; Query Reformulation
turns this feedback into a Revised Query, which yields ReRanked
Documents (1. Doc2, 2. Doc4, 3. Doc5, ...).]
Refinement by relevance feedback (cont.)
• In the vector model, a query is a vector of term weights; hence
reformulation involves reassigning term weights.
• If a document is known to be relevant, the query can be improved
by increasing its similarity to that document.
• If a document is known to be non-relevant, the query can be
improved by decreasing its similarity to that document.
• Problems:
• What if the query has to increase its similarity to two very non-similar
documents (each “pulls” the query in an entirely different direction)?
• What if the query has to decrease its similarity to two very non-
similar documents (each “pushes” the query in an entirely different
direction)?
• Critical assumptions that must be made:
• Relevant documents resemble each other (are clustered).
• Non-relevant documents resemble each other (are clustered).
• Non-relevant documents differ from the relevant documents.
Refinement by relevance feedback (cont.)
• For a query q, denote:
– DR: the set of relevant documents in the answer (as identified by the user).
– DN: the set of non-relevant documents in the answer.
– CR: the set of relevant documents in the collection (the ideal answer).
• Assume (unrealistically!) that CR is known in advance.
• It can then be shown that the best query vector for distinguishing
the relevant documents from the non-relevant documents is:

$$\vec{q}_{opt} = \frac{1}{|C_R|} \sum_{\vec{d}_j \in C_R} \vec{d}_j \;-\; \frac{1}{n - |C_R|} \sum_{\vec{d}_j \notin C_R} \vec{d}_j$$

• The left expression is the centroid of the relevant documents; the
right expression is the centroid of the non-relevant documents.
• Note: this expression is vector arithmetic! The dj are vectors, whereas
|CR| and n − |CR| are scalars.
Refinement by relevance feedback (cont.)
• Since we don’t know CR, we substitute it with DR in
each of the two expressions, and then use them to modify
the initial query q:

$$\vec{q}_{new} = \alpha \vec{q} + \frac{\beta}{|D_R|} \sum_{\vec{d}_j \in D_R} \vec{d}_j \;-\; \frac{\gamma}{|D_N|} \sum_{\vec{d}_j \in D_N} \vec{d}_j$$

• α, β, and γ are tuning constants; for example, 1.0, 0.5, 0.25.
• Note: this expression is vector arithmetic! q and the dj are
vectors, whereas |DR|, |DN|, α, β, and γ are scalars.
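As a minimal NumPy sketch of this formula (the vector representation follows the slides; clipping negative weights to zero is an assumption, matching how the worked example below drops them):

```python
import numpy as np

def rocchio(q, relevant, non_relevant, alpha=1.0, beta=0.5, gamma=0.25):
    """q_new = alpha*q + (beta/|D_R|)*sum(D_R) - (gamma/|D_N|)*sum(D_N).
    q is a term-weight vector; relevant and non_relevant are lists of
    document vectors over the same vocabulary."""
    q_new = alpha * np.asarray(q, dtype=float)
    if relevant:                                    # centroid of D_R
        q_new = q_new + beta * np.mean(relevant, axis=0)
    if non_relevant:                                # centroid of D_N
        q_new = q_new - gamma * np.mean(non_relevant, axis=0)
    return np.maximum(q_new, 0.0)                   # drop negative weights
```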
Refinement by relevance feedback (cont.)
• Positive feedback factor:

$$\frac{\beta}{|D_R|} \sum_{\vec{d}_j \in D_R} \vec{d}_j$$

This uses the user's judgments on relevant documents to increase the
values of terms. It moves the query towards the relevant documents
retrieved (in the direction of more relevant documents).
• Negative feedback factor:

$$\frac{\gamma}{|D_N|} \sum_{\vec{d}_j \in D_N} \vec{d}_j$$

This uses the user's judgments on non-relevant documents to decrease the
values of terms. It moves the query away from non-relevant documents.
• Positive feedback is often weighted significantly more than negative
feedback (β > γ); sometimes only positive feedback is used (γ = 0).
Refinement by relevance feedback (cont.)
• Example:
• Assume query q = (3,0,0,2,0) retrieved three documents: d1, d2, d3.
• Assume d1 and d2 are judged relevant and d3 is judged non-relevant.
• Assume the tuning constants used are 1.0, 0.5, 0.25.
k1 k2 k3 k4 k5
q 3 0 0 2 0
d1 2 4 0 0 2
d2 1 3 0 0 0
d3 0 0 4 3 3
qnew 3.75 1.75 0 1.25 0
The revised query is:
qnew = 1.0 * (3, 0, 0, 2, 0)
+ 0.5 * ((2+1)/2, (4+3)/2, (0+0)/2, (0+0)/2, (2+0)/2)
– 0.25 * (0, 0, 4, 3, 3)
= (3.75, 1.75, –1, 1.25, –0.25)
= (3.75, 1.75, 0, 1.25, 0) (negative weights are set to 0)
Refinement by relevance feedback (cont.)
• Using a simplified similarity formula (the numerator of the cosine only):

$$similarity(\vec{d}_j, \vec{q}) = \sum_{i=1}^{t} w_{i,j} \cdot w_{i,q}$$

we can compare the similarity of q and qnew to the three documents:
d1 d2 d3
q 6 3 6
qnew 14.5 9 3.75
• Compared to the original query, the new query is indeed more similar to
d1 and d2 (which were judged relevant), and less similar to d3 (which was
judged non-relevant).
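Both tables can be reproduced in a few lines; the following self-contained snippet recomputes qnew and the dot-product similarities from the example data:

```python
import numpy as np

q  = np.array([3, 0, 0, 2, 0], dtype=float)
d1 = np.array([2, 4, 0, 0, 2], dtype=float)
d2 = np.array([1, 3, 0, 0, 0], dtype=float)
d3 = np.array([0, 0, 4, 3, 3], dtype=float)

# Rocchio with alpha = 1.0, beta = 0.5, gamma = 0.25, negatives clipped.
q_new = np.maximum(1.0 * q + 0.5 * (d1 + d2) / 2 - 0.25 * d3, 0.0)
print(q_new)                       # [3.75 1.75 0.   1.25 0.  ]

# Simplified similarity: the dot product (numerator of the cosine).
for name, d in [("d1", d1), ("d2", d2), ("d3", d3)]:
    print(name, q @ d, q_new @ d)  # d1: 6.0 14.5  d2: 3.0 9.0  d3: 6.0 3.75
```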
Refinement by relevance feedback (cont.)
• Problem: Relevance feedback may not operate satisfactorily if the
identified relevant documents do not form a tight cluster.
• Possible solution: Cluster the identified relevant documents, then split
the original query into several queries, by constructing a new query for
each cluster.
• Problem: Some of the query terms might not be found in any of the
retrieved documents. This leads to a reduction of their relative weight in
the modified query (or even their elimination). This is undesirable,
because these terms might still be matched in future iterations.
• Possible solutions: Ensure that the original terms are kept; or present
all modified queries to the user for review.
• Problem: New query terms might be introduced that conflict with the
intention of the user.
• Possible solution: Present all modified queries to the user for review.
Refinement by relevance feedback (cont.)
• Conclusion: Experimentation showed that user relevance
feedback in the vector model gives good results.
• However:
• Users are sometimes reluctant to provide explicit feedback.
• It results in long queries that need more computation to
process, which is costly for search engines.
• It makes it harder to understand why a particular document was
retrieved.
• “Fully automatic” relevance feedback: The rank values for
the documents in the first answer are used as relevance
feedback to automatically generate the second query (no human
judgment).
• The highest ranking documents are assumed to be relevant
(positive feedback only).
Pseudo Relevance Feedback
• Just assume the top m retrieved documents are relevant,
and use them to reformulate the query.
• Allows for query expansion that includes terms that are
correlated with the query terms.
• Two strategies:
– Local strategies: Approaches based on information derived
from set of initially retrieved documents (local set of
documents)
– Global strategies: Approaches based on global information
derived from document collection
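A minimal sketch of the "assume the top m documents are relevant" idea above; the tokenisation, the frequency-based term scoring, and the parameters m and k are illustrative assumptions (real systems typically use tf-idf-style weights):

```python
from collections import Counter

def pseudo_feedback_expand(query_terms, ranked_docs, m=10, k=5):
    """Assume the top m retrieved documents are relevant; expand the
    query with the k most frequent non-query terms they contain."""
    counts = Counter()
    for doc in ranked_docs[:m]:  # the pseudo-relevant "local set"
        counts.update(t for t in doc.lower().split() if t not in query_terms)
    return list(query_terms) + [t for t, _ in counts.most_common(k)]

docs = ["apple computer powerbook laptop review",
        "apple computer laptop prices",
        "apple pie recipe"]
print(pseudo_feedback_expand(["apple", "computer"], docs, m=2, k=2))
# -> ['apple', 'computer', 'laptop', 'powerbook']
```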
Pseudo Feedback Architecture
[Figure: the same flow as the user relevance feedback architecture,
except that no user is involved: the Query String is ranked by the IR
System against the Document corpus, the top Ranked Documents (1. Doc1,
2. Doc2, 3. Doc3, ...) are automatically treated as relevant (Pseudo
Feedback), and Query Reformulation produces the Revised Query and
ReRanked Documents (1. Doc2, 2. Doc4, 3. Doc5, ...).]
Local analysis
• Examine the documents retrieved for the query to determine
query expansion; no user assistance.
• Synonymy association: terms that frequently co-occur
inside the local set of documents.
• At query time, dynamically determine similar terms
based on an analysis of the top-ranked retrieved documents.
• Base the correlation analysis on only the “local” set of
documents retrieved for the specific query.
• This avoids ambiguity by determining similar (correlated)
terms only within relevant documents:
“Apple computer” → “Apple computer Powerbook laptop”
Association Matrix
      w1   w2   w3   ...  wn
w1   c11  c12  c13  ...  c1n
w2   c21
w3   c31
.     .
.     .
wn   cn1

cij: correlation factor between term i and term j
Clusters
• Synonymy association: terms that frequently co-occur
inside the local set of documents.
• Clustering techniques:
– Build a query-index term association matrix over the local
document set D_l:

$$c_{i,j} = \sum_{d \in D_l} tf(t_i, d) \cdot tf(t_j, d)$$

where tf(t_i, d) is the frequency of term t_i in document d,
and c_{i,j} is the association factor between term t_i and term t_j.
– The term-term (e.g., stem-stem) association matrix is
normalised using m_{i,j}:

$$m_{i,j} = \frac{c_{i,j}}{c_{i,i} + c_{j,j} - c_{i,j}}$$

– The normalised score m_{i,j} is 1 if the two terms have the
same frequency in all documents.
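A small self-contained sketch of both formulas; whitespace tokenisation of the local documents and the nested-dictionary representation are assumptions for illustration:

```python
from collections import Counter

def association_matrices(local_docs):
    """c[i][j] = sum_d tf(ti,d)*tf(tj,d);
    m[i][j] = c[i][j] / (c[i][i] + c[j][j] - c[i][j]),
    so m[i][j] = 1 for terms with identical frequencies in every doc."""
    tfs = [Counter(doc.split()) for doc in local_docs]
    vocab = sorted(set().union(*tfs))
    c = {i: {j: sum(tf[i] * tf[j] for tf in tfs) for j in vocab}
         for i in vocab}
    m = {i: {j: c[i][j] / (c[i][i] + c[j][j] - c[i][j]) for j in vocab}
         for i in vocab}
    return vocab, c, m
```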
Example
• Given:
Doc 1 = D D A B C A B C
Doc 2 = E C E A A D
Doc 3 = D C B B D A B C A
Doc 4 = A
• Query: A E E
• What is the new reformulated query, using the synonymy
association matrix?
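The slide leaves the exercise to the reader. One possible answer sketch, assuming "reformulated" means appending, for each distinct query term, its most strongly associated non-query term under the normalised matrix m (a policy choice, not stated on the slide):

```python
from collections import Counter

docs = ["D D A B C A B C", "E C E A A D", "D C B B D A B C A", "A"]
query = ["A", "E", "E"]

tfs = [Counter(d.split()) for d in docs]
vocab = sorted(set().union(*tfs))              # ['A','B','C','D','E']
c = {i: {j: sum(tf[i] * tf[j] for tf in tfs) for j in vocab} for i in vocab}
m = {i: {j: c[i][j] / (c[i][i] + c[j][j] - c[i][j]) for j in vocab}
     for i in vocab}

# Append, for each distinct query term, its most strongly associated
# non-query term; max() resolves ties to the first maximal candidate.
candidates = [t for t in vocab if t not in query]
expansion = {max(candidates, key=lambda t: m[qt][t]) for qt in set(query)}
print(query + sorted(expansion))               # ['A', 'E', 'E', 'C']
```

Under this policy C and D tie as A's strongest associates (m = 10/12) and also tie for E (m = 2/11 each), so the expanded query becomes A E E C, with D an equally defensible choice.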
Global analysis
• Expand query using information from whole set of
documents in collection
• Approach to select terms for query expansion
– Determine term similarity through a pre-computed
statistical analysis of the complete corpus.
• Compute association matrices which quantify term
correlations in terms of how frequently they co-occur.
• Expand queries with statistically most similar terms.
Problems with Global Analysis
• Term ambiguity may introduce irrelevant
statistically correlated terms.
– “Apple computer” → “Apple red fruit computer”
• Since terms are highly correlated anyway,
expansion may not retrieve many additional
documents.
Global vs. Local Analysis
• Global analysis requires intensive term correlation
computation only once at system development time.
• Global – Thesaurus used to help select terms for
expansion.
• Local analysis requires intensive term correlation
computation for every query at run time (although
number of terms and documents is less than in global
analysis).
• Local – Documents retrieved are examined to
automatically determine query expansion. No relevance
feedback needed.
• Generally local analysis gives better results.
Query Expansion Conclusions
• Expansion of queries with related terms can improve
performance, particularly recall.
• However, must select similar terms very carefully to avoid
problems, such as loss of precision.
Thank you