Why System Evaluation?
Evaluation provides the ability to measure the difference between IR systems:
• How well do our search engines work?
• Is system A better than system B?
• Under what conditions?
Evaluation also drives research on existing IR systems by showing what to improve:
• Identify techniques that work and techniques that do not
• There are many retrieval models/algorithms/systems: which one is the best?
• What is the best choice for each component?
  • Similarity measure (dot product, cosine, …)
  • Index term selection (stop-word removal, stemming, …)
  • Term weighting (TF, TF-IDF, …)
Types of Evaluation Strategies
System-centered evaluation
• Given documents, queries, and relevance judgments
• Try several variations of the system
• Measure which variation returns the "best" matching list of documents
User-centered evaluation
• Given several users and at least two IR systems
• Have each user try the same task on each system
• Measure which system works "best" for the users' information need
• How do we measure user satisfaction? How do we know the users' impression of the IR system?
Major Evaluation Criteria
What are the main measures for evaluating an IR system's performance?
Efficiency: time and space
• Speed in terms of retrieval time and indexing time
• Speed of query processing
• Space taken by the corpus vs. the index
  • Is there a need for compression?
• Index size: index/corpus size ratio
Effectiveness
• How capable is the system of retrieving relevant documents from the collection?
• Is one system better than another?
• User satisfaction: how "good" are the documents returned in response to a user query?
• "Relevance" of results to the information need of users
Difficulties in Evaluating IR Systems
IR systems essentially facilitate communication between a user and document collections.
Relevance is a measure of the effectiveness of that communication:
• Effectiveness is tied to the relevance of the retrieved items.
• Relevance relates an information need (query) to a document or surrogate.
• Relevance is typically not binary but continuous.
• Even if relevance is treated as binary, it is a difficult judgment to make.
• Relevance is the degree of correspondence between a document and a query, as determined by the requester, an information specialist, an external judge, or other users.
Difficulties in Evaluating IR Systems
Relevance judgments are made by:
• The user who posed the retrieval problem
• An external judge, an information specialist, or the system developer
• Are the judgments made by users, information specialists, and external judges the same? Why?
Relevance judgment is usually:
• Subjective: depends on a specific user's judgment.
• Situational: relates to the user's current needs.
• Cognitive: depends on human perception and behavior.
• Dynamic: changes over time.
Retrieval scenario
[Figure: six retrieval scenarios, A–F, for the same query; each shows 13 retrieved results, with the relevant documents marked.]
• Which scenario is best when 13 results are retrieved by different systems for a given query?
Measuring Retrieval Effectiveness
Retrieval of documents may result in:
• False negatives (false drops): some relevant documents are not retrieved. These are Type II errors, or errors of omission.
• False positives: some irrelevant documents are retrieved. These are Type I errors, or errors of commission.
For many applications a good index should not permit any false drops, but may permit a few false positives.
The contingency table below underlies the metrics most often used to evaluate the effectiveness of a system:

                 relevant   irrelevant
  retrieved         A           B
  not retrieved     C           D
Relevant performance metrics
Recall: the ability of the search to find all of the relevant items in the corpus.
• Recall is the percentage of relevant documents retrieved from the database in response to a user's query:

  Recall = (no. of relevant documents retrieved) / (total no. of relevant documents in the database)

Precision: the ability to retrieve top-ranked documents that are mostly relevant.
• Precision is the percentage of retrieved documents that are relevant to the query:

  Precision = (no. of relevant documents retrieved) / (total no. of documents retrieved)
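A minimal sketch of these two ratios in Python; the function and variable names are illustrative, not from any particular library:

```python
def precision(retrieved: set, relevant: set) -> float:
    """Fraction of the retrieved documents that are relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

def recall(retrieved: set, relevant: set) -> float:
    """Fraction of all relevant documents that were retrieved."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0

# Toy collection: 10 documents retrieved, 4 of which are among the
# 14 relevant documents in the database.
retrieved = {f"d{i}" for i in range(1, 11)}                         # d1..d10
relevant = {"d1", "d5", "d6", "d8"} | {f"r{i}" for i in range(10)}  # 14 docs

print(precision(retrieved, relevant))  # 4/10 = 0.4
print(recall(retrieved, relevant))     # 4/14 ≈ 0.286
```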
Measuring Retrieval Effectiveness
When is precision important? When is recall important?

  Recall = |{Relevant} ∩ {Retrieved}| / |{Relevant}|
  Precision = |{Relevant} ∩ {Retrieved}| / |{Retrieved}|

In contingency-table terms:

                 Relevant   Not relevant
  Retrieved         A            B
  Not retrieved     C            D

  Collection size = A + B + C + D
  Relevant = A + C
  Retrieved = A + B
Example 1
Assume there are 14 relevant documents in the corpus and that the relevant hits appear at ranks 1, 5, 6, 8, 11, and 16 of a 20-document result list. Compute precision and recall at each cutoff point:

  Hits 1–10
  Precision: 1/1  1/2  1/3  1/4  2/5  3/6  3/7  4/8  4/9  4/10
  Recall:    1/14 1/14 1/14 1/14 2/14 3/14 3/14 4/14 4/14 4/14

  Hits 11–20
  Precision: 5/11 5/12 5/13 5/14 5/15 6/16 6/17 6/18 6/19 6/20
  Recall:    5/14 5/14 5/14 5/14 5/14 6/14 6/14 6/14 6/14 6/14
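A short sketch that reproduces the cutoff values above; the ranked result list is assumed to be encoded as booleans, with True marking a relevant hit:

```python
# Relevant hits sit at ranks 1, 5, 6, 8, 11 and 16; 14 relevant docs exist.
ranked = [r in {1, 5, 6, 8, 11, 16} for r in range(1, 21)]
TOTAL_RELEVANT = 14

def precision_at(k: int, ranking: list) -> float:
    """Precision over the top-k hits."""
    return sum(ranking[:k]) / k

def recall_at(k: int, ranking: list, total_relevant: int) -> float:
    """Recall over the top-k hits."""
    return sum(ranking[:k]) / total_relevant

for k in (1, 5, 10, 20):
    print(k, precision_at(k, ranked), recall_at(k, ranked, TOTAL_RELEVANT))
# 1   1.0  0.071...   (= 1/1,  1/14)
# 5   0.4  0.142...   (= 2/5,  2/14)
# 10  0.4  0.285...   (= 4/10, 4/14)
# 20  0.3  0.428...   (= 6/20, 6/14)
```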
Example 2
Given that the total number of relevant documents in the collection is 6, compute recall and precision at each cutoff point n:

  n    doc #   relevant   Recall   Precision
  1    588     x          0.17     1.00
  2    589     x          0.33     1.00
  3    576
  4    590     x          0.50     0.75
  5    986
  6    592     x          0.67     0.667
  7    984
  8    988
  9    578
  10   985
  11   103
  12   591
  13   772     x          0.83     0.38
  14   990
Average precision
Average precision samples precision at each retrieved relevant document:
• Relevant documents that are never retrieved contribute zero to the score.
Example: assume a total of 14 relevant documents, with the retrieved relevant hits at ranks 1, 5, 6, 8, 11, and 16 as in Example 1, and compute the average precision:

  Hits 1–10 precision:  1/1  1/2  1/3  1/4  2/5  3/6  3/7  4/8  4/9  4/10
  Hits 11–20 precision: 5/11 5/12 5/13 5/14 5/15 6/16 6/17 6/18 6/19 6/20

  AP = (1/1 + 2/5 + 3/6 + 4/8 + 5/11 + 6/16) / 14 ≈ 0.231
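A minimal sketch of (non-interpolated) average precision for this example; the 8 relevant documents that are never retrieved contribute zero to the score:

```python
def average_precision(ranking: list, total_relevant: int) -> float:
    """Mean of the precision values at each retrieved relevant document."""
    hits, score = 0, 0.0
    for rank, is_relevant in enumerate(ranking, start=1):
        if is_relevant:
            hits += 1
            score += hits / rank       # precision at this relevant hit
    return score / total_relevant      # missed relevant docs add zero

ranked = [r in {1, 5, 6, 8, 11, 16} for r in range(1, 21)]
print(round(average_precision(ranked, 14), 3))  # 0.231
```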
MAP (Mean Average Precision)
MAP is the mean, over a set of test queries, of the average precision of each query:

  MAP = (1/n) Σᵢ (1/|Rᵢ|) Σⱼ (j / rᵢⱼ)

where
• rᵢⱼ = rank of the j-th relevant document for query Qᵢ
• |Rᵢ| = number of relevant documents for Qᵢ
• n = number of test queries

Example: computing the mean average precision for two queries. Assume that for queries 1 and 2 there are 3 and 2 relevant documents in the collection, respectively, retrieved at the following ranks:

  Relevant docs retrieved   Query 1   Query 2
  1st rel. doc.             1         4
  2nd rel. doc.             5         8
  3rd rel. doc.             10        –

  MAP = 1/2 [ 1/3 (1/1 + 2/5 + 3/10) + 1/2 (1/4 + 2/8) ]
      = 1/2 (17/30 + 1/4) = 49/120 ≈ 0.408
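A sketch of the same computation in code, following the slide's formulation in which the j-th relevant document of a query appears at rank r_ij (avg_precision and rel_ranks are illustrative names):

```python
def avg_precision(rel_ranks: list, total_relevant: int) -> float:
    """Sum of j / r_ij over retrieved relevant docs, divided by |R_i|."""
    return sum(j / r for j, r in enumerate(rel_ranks, start=1)) / total_relevant

queries = [
    ([1, 5, 10], 3),  # query 1: relevant docs retrieved at ranks 1, 5, 10
    ([4, 8], 2),      # query 2: relevant docs retrieved at ranks 4, 8
]
map_score = sum(avg_precision(rr, t) for rr, t in queries) / len(queries)
print(round(map_score, 3))  # (17/30 + 1/4) / 2 = 49/120 ≈ 0.408
```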
R-Precision
Precision at the R-th position in the ranking of results for a query, where R is the total number of relevant documents:
• Calculate precision after R documents are seen
• Can be averaged over all queries

  n    doc #   relevant
  1    588     x
  2    589     x
  3    576
  4    590     x
  5    986
  6    592     x
  7    984
  8    988
  9    578
  10   985
  11   103
  12   591
  13   772     x
  14   990

  R = number of relevant docs = 6
  R-Precision = 4/6 ≈ 0.67
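A minimal sketch for this example; the table's x marks are encoded as booleans by rank:

```python
def r_precision(ranking: list, r: int) -> float:
    """Precision over the top-R hits, R = total number of relevant docs."""
    return sum(ranking[:r]) / r

# Relevant hits (x) sit at ranks 1, 2, 4, 6 and 13 in the table above.
ranking = [n in {1, 2, 4, 6, 13} for n in range(1, 15)]
print(round(r_precision(ranking, 6), 2))  # 4/6 = 0.67
```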
More Examples
• Average precision can also be calculated by averaging precision at the points where recall increases.
• For the two example rankings shown on the original slide, the average precision at those recall values is 62.2% for Ranking #1 and 52.0% for Ranking #2; by this measure, Ranking #1 is better than Ranking #2.
Precision/Recall tradeoff
• Recall can be increased by retrieving many documents (down to a low level of relevance ranking), but many irrelevant documents are then fetched, reducing precision.
• High recall (but low precision) can be obtained trivially by retrieving all documents for all queries.
[Figure: precision (y-axis, 0 to 1) vs. recall (x-axis, 0 to 1). The ideal system achieves both high recall and high precision. High precision with low recall returns relevant documents but misses many useful ones; high recall with low precision returns most relevant documents but includes lots of junk.]
F-Measure
• One measure of performance that takes both recall and precision into account.
• It is the harmonic mean of recall and precision:

  F = 2PR / (P + R) = 2 / (1/R + 1/P)

• Compared to the arithmetic mean, both P and R need to be high for the harmonic mean to be high.
• What if no relevant documents exist?
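A small sketch of the harmonic-mean F-measure; by convention it returns 0 in the degenerate case raised by the last question, where P and R are both zero:

```python
def f_measure(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0

print(f_measure(0.5, 0.5))  # 0.5
print(f_measure(0.9, 0.1))  # 0.18 -- the harmonic mean punishes imbalance
```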
E-Measure
Associated with van Rijsbergen, the E-measure allows the user to specify the relative importance of recall and precision. It is a parameterized variant of the F-measure that weights the emphasis on precision against recall:

  E = (1 + β²)PR / (β²P + R)

The value of β controls the trade-off:
• β = 1: equal weight for precision and recall (E = F).
• β > 1: recall is weighted more heavily (emphasizes recall).
• β < 1: precision is weighted more heavily (emphasizes precision).
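A sketch of the parameterized measure as defined above (e_measure is an illustrative name; beta is the trade-off parameter β):

```python
def e_measure(p: float, r: float, beta: float = 1.0) -> float:
    """(1 + beta^2) * P * R / (beta^2 * P + R); reduces to F at beta = 1."""
    denom = beta**2 * p + r
    return (1 + beta**2) * p * r / denom if denom > 0 else 0.0

p, r = 0.9, 0.3
print(round(e_measure(p, r, 1.0), 3))  # 0.45  (equal weight: E = F)
print(round(e_measure(p, r, 2.0), 3))  # 0.346 (pulled toward R: recall counts more)
print(round(e_measure(p, r, 0.5), 3))  # 0.643 (pulled toward P: precision counts more)
```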
Quiz (5%)
Write a short answer for the following questions:
1. List and explain the two IR system evaluation strategies. (2 pts)
2. Suppose the XY IR system returns 8 relevant documents and 10 irrelevant documents. There are a total of 20 relevant documents in the collection/dataset. What is the precision of the system on this search, and what is its recall? (3 pts)