INTRODUCTION TO INFORMATION
RETRIEVAL EVALUATION
METHODS
Prem Sankar C
IIITM KERALA
INFORMATION RETRIEVAL (IR)
• IR is the activity of obtaining information resources
relevant to an information need from a collection of
information resources.
• IR aims to select mostly relevant documents (precision),
and preferably all the relevant ones (recall), from a
set of documents.
• Ex: search strings in web search engines.
IR EVALUATION
• The evaluation of an IR system is the process of
assessing how well the system meets the information
needs of its users.
• To measure information retrieval effectiveness in
the standard way, we need a test collection
consisting of three things:
1. A document collection
2. A test suite of information needs, expressible as queries
3. A set of relevance judgments: a binary
assessment of either relevant or nonrelevant for
each query-document pair.
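
The three parts of a test collection can be sketched as a small data structure. This is only an illustration: the class name `IRTestCollection`, its fields, and the sample documents are hypothetical and do not come from any standard toolkit.

```python
from dataclasses import dataclass, field


@dataclass
class IRTestCollection:
    """Hypothetical container for the three parts of a test collection."""
    documents: dict[str, str]  # doc_id -> document text
    queries: dict[str, str]    # query_id -> query text
    # (query_id, doc_id) -> True if judged relevant, False if judged nonrelevant
    judgments: dict[tuple[str, str], bool] = field(default_factory=dict)

    def relevant_docs(self, query_id: str) -> set[str]:
        """Return the ids of the documents judged relevant for one query."""
        return {
            doc_id
            for (q_id, doc_id), is_relevant in self.judgments.items()
            if q_id == query_id and is_relevant
        }


# Made-up sample data, for illustration only.
collection = IRTestCollection(
    documents={"d1": "how to clean a black swimming pool", "d2": "pool table rules"},
    queries={"q1": "pool cleaner"},
    judgments={("q1", "d1"): True, ("q1", "d2"): False},
)
print(collection.relevant_docs("q1"))
```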
HUMAN LABELED CORPORA
(GOLD STANDARD)
• Start with a corpus of documents.
• Collect a set of queries for this corpus.
• Have one or more human experts
exhaustively label the relevant documents for
each query.
• Typically assumes binary relevance
judgments.
• Requires considerable human effort for large
document/query corpora.
PRECISION AND RECALL
[Figure: the entire document collection, with the set of relevant documents overlapping the set of retrieved documents]

             | retrieved              | not retrieved
relevant     | retrieved & relevant   | not retrieved but relevant
irrelevant   | retrieved & irrelevant | not retrieved & irrelevant
PRECISION
• Precision is the ability to retrieve top-ranked
documents that are mostly relevant.
\[ \text{precision} = \frac{\text{Number of relevant documents retrieved}}{\text{Total number of documents retrieved}} \]
P(relevant|retrieved)
RECALL
• Recall is the fraction of the documents relevant to
the query that are successfully retrieved.
• It is the ability of the search to find all of the relevant
items in the corpus.
\[ \text{recall} = \frac{\text{Number of relevant documents retrieved}}{\text{Total number of relevant documents}} \]
P(retrieved|relevant)
• F-measure is the harmonic mean of precision and
recall:
\[ F = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \]
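
As a minimal sketch, the three set-based measures can be computed from a set of retrieved document ids and a set of relevant document ids. The function name `precision_recall_f` and the sample ids are illustrative assumptions, and returning 0.0 for empty sets is a convention chosen here, not something the slides specify.

```python
def precision_recall_f(retrieved: set[str], relevant: set[str]) -> tuple[float, float, float]:
    """Return (precision, recall, F-measure) for one query."""
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    # Assumed convention: a measure with an empty denominator is reported as 0.0.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f_measure


# Made-up example: 4 documents retrieved, 3 relevant overall, 2 in common.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d3", "d5"}
p, r, f = precision_recall_f(retrieved, relevant)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

Here precision is 2/4 and recall is 2/3, so the harmonic mean sits between them, closer to the smaller value.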
IR RELEVANCE
• Relevance denotes how well a retrieved document
or set of documents meets the information need of
the user.
• Relevance may include concerns such as
timeliness, authority or novelty of the result.
• Relevance is assessed relative to the user need,
not the query.
• E.g., Information need: My swimming pool is
becoming black and needs to be cleaned.
• Query: pool cleaner
DIFFICULTIES IN EVALUATING IR SYSTEMS
• Effectiveness is related to the relevancy of
retrieved items.
• Relevancy is not typically binary but
continuous.
• Even if relevancy is binary, it can be a
difficult judgment to make.
• Relevancy, from a human standpoint, is:
  - Subjective: Depends upon a specific user’s
    judgment.
  - Situational: Relates to user’s current needs.
  - Cognitive: Depends on human perception and
    behavior.
  - Dynamic: Changes over time.
IR JUDGEMENT
• Evaluating the performance of information retrieval
systems usually takes a lot of human effort.
• Relevance judgement is a laborious task when we
have a large set of retrieved documents.
• One of the most interesting evaluation techniques
used in TREC is the pooling method, employed to
reduce the human effort spent on relevance
judgements.
• In TREC, each participating system reports the
1000 top-ranked documents for each topic. Of
these, only the top 100 from each system are
collected into a pool for human assessment.
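
The pooling step can be sketched as taking the union of the top-ranked documents from every system's run for one topic. The function name `build_pool`, the run names, and the document ids below are hypothetical; the default depth of 100 follows the figure quoted above.

```python
def build_pool(runs: dict[str, list[str]], depth: int = 100) -> set[str]:
    """Union of the top `depth` documents from each system's ranking for one topic."""
    pool: set[str] = set()
    for ranking in runs.values():
        pool.update(ranking[:depth])  # duplicates across systems collapse in the set
    return pool


# Made-up rankings from two systems for a single topic; a small depth keeps it readable.
runs = {
    "system_a": ["d3", "d1", "d7", "d2"],
    "system_b": ["d1", "d9", "d3", "d4"],
}
pool = build_pool(runs, depth=2)
print(sorted(pool))
```

Only documents in the pool are shown to assessors. Documents outside the pool are not judged.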
PRECISION@K
• Set a rank threshold K
• Compute the fraction of relevant documents in the top K
• Ignores documents ranked lower than K
• Ex:
  - Prec@3 = 2/3
  - Prec@4 = 2/4
  - Prec@5 = 3/5
• In similar fashion we have Recall@K
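
A minimal sketch of both cutoff measures follows. The ranking used here (relevant at ranks 1, 3 and 5) is an assumption inferred from the example values above, since the original ranking figure is not in the text. The function names are illustrative.

```python
def precision_at_k(rels: list[bool], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(rels[:k]) / k


def recall_at_k(rels: list[bool], k: int, total_relevant: int) -> float:
    """Fraction of all relevant documents that appear in the top k."""
    if total_relevant <= 0:
        return 0.0  # assumed convention when a query has no relevant documents
    return sum(rels[:k]) / total_relevant


# Assumed ranking: True marks a relevant document at that rank position.
ranking = [True, False, True, False, True]
for k in (3, 4, 5):
    print(k, precision_at_k(ranking, k), recall_at_k(ranking, k, total_relevant=3))
```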
MEAN AVERAGE PRECISION
• Consider the rank position of each relevant doc:
K1, K2, … KR
• Compute Precision@K for each of K1, K2, … KR
• Average precision = average of these P@K values
• Ex: Avg precision = 1/3 (1/1 + 2/3 + 3/5) ≈ 0.76
• MAP is the mean of the average precision scores
across multiple queries/rankings
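
The computation can be sketched as below. It assumes every relevant document appears somewhere in the ranking, so the average is taken over the relevant documents found. The function names and the second ranking are illustrative.

```python
def average_precision(rels: list[bool]) -> float:
    """Average of Precision@K taken at the rank of each relevant document."""
    hits = 0
    precisions = []
    for rank, is_relevant in enumerate(rels, start=1):
        if is_relevant:
            hits += 1
            precisions.append(hits / rank)  # Precision@rank at this relevant doc
    # Assumption: all relevant documents are in the ranking, so R == len(precisions).
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(rankings: list[list[bool]]) -> float:
    """Mean of average precision over several queries/rankings."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


# First ranking matches the slide's example (relevant at ranks 1, 3, 5);
# the second ranking is made up.
rankings = [
    [True, False, True, False, True],
    [False, True],
]
print(average_precision(rankings[0]))
print(mean_average_precision(rankings))
```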
DISCOUNTED CUMULATIVE GAIN
• Popular measure for evaluating web search and
related tasks
• Two assumptions:
1. Highly relevant documents are more useful than
marginally relevant documents
2. The lower the ranked position of a relevant
document, the less useful it is for the user, since it
is less likely to be examined
DISCOUNTED CUMULATIVE GAIN
• Uses graded relevance as a measure of usefulness, or
gain, from examining a document
• Gain is accumulated starting at the top of the ranking
and may be reduced, or discounted, at lower ranks
• Typical discount is 1/log(rank)
• With base 2, the discount at rank 4 is 1/2, and at rank 8
it is 1/3
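
A minimal sketch of DCG with the base-2 discount follows. Because log2(1) is 0, this sketch assumes the common convention that the document at rank 1 is not discounted. The function names and gain values are illustrative.

```python
import math


def discount(rank: int) -> float:
    """Discount factor 1/log2(rank); rank 1 is left undiscounted (assumed convention)."""
    if rank < 1:
        raise ValueError("rank starts at 1")
    return 1.0 if rank == 1 else 1.0 / math.log2(rank)


def dcg(gains: list[float]) -> float:
    """Accumulate graded-relevance gains from the top of the ranking, discounted by rank."""
    return sum(gain * discount(rank) for rank, gain in enumerate(gains, start=1))


print(discount(4), discount(8))  # the two discounts quoted on the slide
# Made-up graded relevance values (0 = not relevant, 3 = highly relevant).
print(dcg([3, 2, 3, 0, 1, 2]))
```

Swapping a highly relevant document toward the bottom of the list lowers the score, which reflects the second assumption behind the measure.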
HUMAN JUDGMENTS ARE
• Expensive
• Inconsistent between raters and over time
• Subject to decay in value as the document/query mix evolves
• Not always representative of “real users”
IR JUDGEMENT
• How fast does it index?
• How fast does it search?
• Does it recommend related info?
• Does it offer facilities for crowdsourcing relevance judgments?