How do Gain and Discount Functions Affect the
Correlation between DCG and User Satisfaction?
Julián Urbano Mónica Marrero
ECIR 2015
Vienna, March 30th
Discount d(i; k)                 Gain g(r)
Zipfian:  $1/i$                  Linear: $r$
Linear:   $(k - i + 1)/k$        Exp(2): $2^r - 1$
Constant: $1$                    Exp(3): $3^r - 1$
Log(2):   $1/\log_2(i + 1)$      Exp(5): $5^r - 1$
Log(3):   $1/\log_3(i + 2)$      Bin(1): $I[r \ge 1]$
Log(5):   $1/\log_5(i + 4)$      Bin(2): $I[r \ge 2]$
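The gain and discount families listed above can be written as plain functions. This is a minimal sketch, not the authors' code: the dictionary names `GAINS` and `DISCOUNTS` are hypothetical, and the linear discount is assumed to take the decreasing form (k − i + 1)/k, so that it equals 1 at rank 1 and 1/k at rank k.

```python
import math

# Gain functions g(r) for a relevance level r (0, 1 or 2 on a 3-point scale).
GAINS = {
    "Linear": lambda r: r,
    "Exp(2)": lambda r: 2 ** r - 1,
    "Exp(3)": lambda r: 3 ** r - 1,
    "Exp(5)": lambda r: 5 ** r - 1,
    "Bin(1)": lambda r: 1 if r >= 1 else 0,
    "Bin(2)": lambda r: 1 if r >= 2 else 0,
}

# Discount functions d(i; k) for a 1-based rank i and a cutoff k.
DISCOUNTS = {
    "Zipfian": lambda i, k: 1 / i,
    "Linear": lambda i, k: (k - i + 1) / k,  # assumption: decreasing form
    "Constant": lambda i, k: 1.0,
    "Log(2)": lambda i, k: 1 / math.log(i + 1, 2),
    "Log(3)": lambda i, k: 1 / math.log(i + 2, 3),
    "Log(5)": lambda i, k: 1 / math.log(i + 4, 5),
}

if __name__ == "__main__":
    k = 5
    for name, d in DISCOUNTS.items():
        print(name, [round(d(i, k), 3) for i in range(1, k + 1)])
    for name, g in GAINS.items():
        print(name, [g(r) for r in (0, 1, 2)])
```

Every discount is 1 at the first rank, and the offsets inside the logarithms (+1, +2, +4) are what make that hold for bases 2, 3 and 5.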
[Figure: Discount functions. Discount d(i), from 0 to 1, against rank i (1 to 5) for the Zipfian, Linear, Constant, Log(2), Log(3) and Log(5) discounts.]
[Figure: Gain functions. Gain g(r), from 0 to 25, against relevance r (0, 1, 2) for the Linear, Exp(2), Exp(3), Exp(5), Bin(1) and Bin(2) gains.]
[Diagram: Real World vs. Cranfield. Real World: Information Need, IR System, Documents, Live Observation. Cranfield: Topic, Relevance Judgments, IR System, Documents, forming a Test Collection plus Effectiveness Measures (GAP, DCG, ERR), with a Static Component and a Dynamic Component.]
Time to complete task, Idle time, Success rate, Frustration, Satisfaction, Ease of use, Ease of learning…
Precision, Average Precision, Reciprocal Rank, Q-measure, Discounted Cumulative Gain, Rank-Biased Precision, Time-Biased Gain…
Which Gain and Discount functions for DCG
better predict user satisfaction?
• First, let’s normalize DCG scores (this is not nDCG!)
• One system with DCG=φ. What does it mean?
• Intuition: φ·100% of users will be satisfied
• P(Sat|DCG= φ)= φ
• Two systems with ΔDCG=Δφ. What does it mean?
• Intuition: users will prefer the (supposedly) better one
• P(Pref|ΔDCG=Δφ)=1
P(Sat) and P(Pref) depend on the systems,
not on how we evaluate them. Yet, there are
many different ways to compute effectiveness
Experiment
• Collect user preferences between two systems
• Map DCG onto P(Sat)
• Map ΔDCG onto P(Pref)
• Music recommendation task
• Ad-hoc, informational, enjoyable by assessors
• Preferences less confounded by interface effects
• All data from MIREX (TREC-like for Music IR)
• Datasets from 2007–2012
• 3-point relevance scale: {0, 1, 2}
• 4115 examples
• Uniformly covering the [0,1] range of ΔDCG
• 432 unique queries
• 5636 unique documents
• Crowdsourced with Crowdflower
• Trap examples for quality control
• 113 subjects
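One simple way to map ΔDCG onto P(Pref) from collected preferences is to bin the examples by ΔDCG and take the fraction of users who agree with the DCG ordering in each bin. The slides do not say how the mapping is estimated, so the binning below is an assumption; the function name `agreement_by_bin` and the toy data are hypothetical.

```python
from collections import defaultdict

def agreement_by_bin(examples, n_bins=10):
    """Estimate P(Pref | ΔDCG) per bin.

    examples: iterable of (delta_dcg, agrees) pairs, with delta_dcg in [0, 1]
    and agrees True when the user preferred the system with the higher DCG.
    Returns {bin_index: fraction of agreeing users} for non-empty bins.
    """
    counts = defaultdict(lambda: [0, 0])  # bin -> [agreements, total]
    for delta, agrees in examples:
        # Clamp so that delta == 1.0 falls in the last bin.
        b = min(int(delta * n_bins), n_bins - 1)
        counts[b][0] += int(agrees)
        counts[b][1] += 1
    return {b: a / n for b, (a, n) in sorted(counts.items())}

if __name__ == "__main__":
    # Toy preferences, not the data from the study.
    toy = [(0.05, True), (0.05, False), (0.95, True), (1.0, True)]
    print(agreement_by_bin(toy))
```

The same binning applies to P(Sat | DCG) if each example is a (DCG, satisfied) pair instead.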
Results (1 system): DCG predicting user satisfaction
[Figure: probability of user satisfaction against DCG, both from 0 to 1. One panel per discount (Zipfian, Linear, Log(2), Log(3), Log(5), Constant), one curve per gain (Linear, Exp(2), Exp(3), Exp(5), Bin(1), Bin(2)).]
Results (2 systems): ΔDCG predicting user preference
[Figure: probability that users agree against difference in DCG, both from 0 to 1. One panel per discount (Zipfian, Linear, Log(2), Log(3), Log(5), Constant), one curve per gain (Linear, Exp(2), Exp(3), Exp(5), Bin(1), Bin(2)).]
Results: bias of Gain and Discount functions
• Diagonal: how far is P(Sat|DCG) from the ideal diagonal?
• Intuitiveness of DCG scores
• Endpoint: how far is P(Sat|DCG) from the ideal 0% and 100%?
• User disagreement and goodness of the DCG user model
• Top: how far is P(Pref|ΔDCG) from the ideal 100%?
• Discriminative power
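The three biases above can be sketched as distances between an estimated curve and its ideal. The slides give no exact formulas, so the definitions below (mean absolute deviations) are an assumption, and the function names are hypothetical.

```python
def diagonal_bias(phis, p_sat):
    """Mean absolute distance between P(Sat | DCG=phi) and the diagonal phi."""
    return sum(abs(p - phi) for phi, p in zip(phis, p_sat)) / len(phis)

def endpoint_bias(p_sat_at_0, p_sat_at_1):
    """Mean distance from the ideal 0% at DCG=0 and the ideal 100% at DCG=1."""
    return (abs(p_sat_at_0 - 0.0) + abs(1.0 - p_sat_at_1)) / 2

def top_bias(p_pref):
    """Mean distance between P(Pref | ΔDCG) and the ideal 100%."""
    return sum(1.0 - p for p in p_pref) / len(p_pref)

if __name__ == "__main__":
    # Toy curves, not results from the study.
    phis = [0.0, 0.5, 1.0]
    p_sat = [0.25, 0.5, 0.75]
    print(diagonal_bias(phis, p_sat))
    print(endpoint_bias(p_sat[0], p_sat[-1]))
    print(top_bias([0.5, 0.75, 1.0]))
```

Under this reading, a lower value on any of the three means the gain and discount combination tracks the user behavior more closely.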
Summary and Implications
• New method to map system effectiveness onto user satisfaction
• Sample application to DCG for a music recommendation task
• Gain functions that emphasize highly relevant documents underestimate
user satisfaction. Linear gain is better than exponential
• All discount functions bias the prediction of user satisfaction
• This task might be too enjoyable to observe a discount effect
• Size (of the DCG difference) does matter
• Non-parametric statistics (e.g. Sign test, Wilcoxon test) and just looking at
the ranking of systems (e.g. Kendall τ) oversimplify the evaluation problem
• Zero-point null hypothesis testing (i.e. H0 : ΔDCG=0) is not reasonable
• Future work will investigate this method for Text IR
• Provide a common framework, based on P(Sat) and P(Pref), to evaluate
with informational and navigational queries using appropriate measures
Data and code
available online
$$\mathrm{DCG}@k = \frac{\sum_{i=1}^{k} g(r_i) \cdot d(i; k)}{\sum_{i=1}^{k} g(r_{max}) \cdot d(i; k)}$$
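The normalized DCG@k above divides the discounted gain of the ranked list by that of a list holding the maximum relevance at every rank, which is why it is not nDCG. A minimal sketch, assuming a hypothetical function name `normalized_dcg`, a maximum relevance of 2 (the 3-point scale), and gains with g(0) = 0 so that missing ranks add nothing:

```python
def normalized_dcg(rels, gain, discount, k, r_max=2):
    """DCG@k divided by the DCG@k of a list with relevance r_max at every rank.

    rels: relevance levels in ranked order (rank 1 first).
    gain: function g(r); discount: function d(i, k) for 1-based rank i.
    """
    num = sum(gain(r) * discount(i, k) for i, r in enumerate(rels[:k], start=1))
    den = sum(gain(r_max) * discount(i, k) for i in range(1, k + 1))
    return num / den

if __name__ == "__main__":
    linear_gain = lambda r: r
    zipfian = lambda i, k: 1 / i
    # Toy ranking, not data from the study.
    print(normalized_dcg([2, 1, 0, 2, 1], linear_gain, zipfian, k=5))
    print(normalized_dcg([2, 2, 2, 2, 2], linear_gain, zipfian, k=5))
```

A list of maximally relevant documents scores 1 and a list with no relevant documents scores 0, so the score φ lies in [0, 1] and can be read against P(Sat).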
[Figure: Diagonal bias (axis 0.06 to 0.14), Endpoint bias (axis 0.14 to 0.22) and Top bias (axis 0.46 to 0.54), each shown per Gain (Bin(2), Bin(1), Exp(5), Exp(3), Exp(2), Linear) and per Discount (Zipfian, Linear, Log(2), Log(3), Log(5), Constant).]

Randomised Optimisation Algorithms in DAPHNE
 
Travis Hills of MN is Making Clean Water Accessible to All Through High Flux ...
Travis Hills of MN is Making Clean Water Accessible to All Through High Flux ...Travis Hills of MN is Making Clean Water Accessible to All Through High Flux ...
Travis Hills of MN is Making Clean Water Accessible to All Through High Flux ...
 
GBSN - Biochemistry (Unit 6) Chemistry of Proteins
GBSN - Biochemistry (Unit 6) Chemistry of ProteinsGBSN - Biochemistry (Unit 6) Chemistry of Proteins
GBSN - Biochemistry (Unit 6) Chemistry of Proteins
 
11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf
 
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
 
aziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobelaziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobel
 
Eukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptxEukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptx
 
molar-distalization in orthodontics-seminar.pptx
molar-distalization in orthodontics-seminar.pptxmolar-distalization in orthodontics-seminar.pptx
molar-distalization in orthodontics-seminar.pptx
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
 
Juaristi, Jon. - El canon espanol. El legado de la cultura española a la civi...
Juaristi, Jon. - El canon espanol. El legado de la cultura española a la civi...Juaristi, Jon. - El canon espanol. El legado de la cultura española a la civi...
Juaristi, Jon. - El canon espanol. El legado de la cultura española a la civi...
 
Pests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdfPests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdf
 
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...
 
Direct Seeded Rice - Climate Smart Agriculture
Direct Seeded Rice - Climate Smart AgricultureDirect Seeded Rice - Climate Smart Agriculture
Direct Seeded Rice - Climate Smart Agriculture
 
The cost of acquiring information by natural selection
The cost of acquiring information by natural selectionThe cost of acquiring information by natural selection
The cost of acquiring information by natural selection
 
The binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defectsThe binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defects
 
Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.
 
Basics of crystallography, crystal systems, classes and different forms
Basics of crystallography, crystal systems, classes and different formsBasics of crystallography, crystal systems, classes and different forms
Basics of crystallography, crystal systems, classes and different forms
 
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
 

How Do Gain and Discount Functions Affect the Correlation between DCG and User Satisfaction?

Julián Urbano, Mónica Marrero
ECIR 2015, Vienna, March 30th

Real World vs. Cranfield
[Diagram: Real World (Information Need, IR System, Documents) vs. Cranfield (Topic, Relevance Judgments, IR System, Documents). Labels: Static Component, Dynamic Component, Test Collection, Effectiveness Measures, GAP, DCG, ERR]
• Live Observation: time to complete task, idle time, success rate, frustration, satisfaction, ease of use, ease of learning…
• Effectiveness Measures: Precision, Average Precision, Reciprocal Rank, Q-measure, Discounted Cumulative Gain, Rank-Biased Precision, Time-Biased Gain…

Which Gain and Discount functions for DCG better predict user satisfaction?

$$\mathrm{DCG@}k = \frac{\sum_{i=1}^{k} g(r_i) \cdot d(i;k)}{\sum_{i=1}^{k} g(r_{max}) \cdot d(i;k)}$$

Discount d(i; k)
• Zipfian: $1/i$
• Linear: $(k - i + 1)/k$
• Constant: $1$
• Log(2): $1/\log_2(i + 1)$
• Log(3): $1/\log_3(i + 2)$
• Log(5): $1/\log_5(i + 4)$

Gain g(r)
• Linear: $r$
• Exp(2): $2^r - 1$
• Exp(3): $3^r - 1$
• Exp(5): $5^r - 1$
• Bin(1): $I[r \geq 1]$
• Bin(2): $I[r \geq 2]$

[Figure: Discount functions. x-axis: rank i (1 to 5); y-axis: discount d(i) (0 to 1); one curve per discount]
[Figure: Gain functions. x-axis: relevance r (0 to 2); y-axis: gain g(r) (0 to 25); one curve per gain]

• First, let's normalize DCG scores (this is not nDCG!)
• One system with DCG=φ. What does it mean?
• Intuition: φ·100% of users will be satisfied
• P(Sat|DCG=φ)=φ
• Two systems with ΔDCG=Δφ. What does it mean?
• Intuition: users will prefer the (supposedly) better one
• P(Pref|ΔDCG=Δφ)=1

P(Sat) and P(Pref) depend on the systems, not on how we evaluate them. Yet, there are many different ways to compute effectiveness.

Experiment
• Collect user preferences between two systems
• Map DCG onto P(Sat)
• Map ΔDCG onto P(Pref)
• Music recommendation task
• Ad-hoc, informational, enjoyable by assessors
• Preferences less confounded by interface effects
• All data from MIREX (TREC-like for Music IR)
• Datasets from 2007–2012
• 3-point relevance scale: {0, 1, 2}
• 4115 examples
• Uniformly covering the [0,1] range of ΔDCG
• 432 unique queries
• 5636 unique documents
• Crowdsourced with Crowdflower
• Trap examples for quality control
• 113 subjects

Results (1 system): DCG predicting user satisfaction
[Figure: six panels, one per discount (Zipfian, Linear, Log(2), Log(3), Log(5), Constant). x-axis: DCG (0 to 1); y-axis: probability of user satisfaction (0 to 1); one curve per gain (Linear, Exp(2), Exp(3), Exp(5), Bin(1), Bin(2))]

Results (2 systems): ΔDCG predicting user preference
[Figure: six panels, one per discount (Zipfian, Linear, Log(2), Log(3), Log(5), Constant). x-axis: difference in DCG (0 to 1); y-axis: probability that users agree (0 to 1); one curve per gain (Linear, Exp(2), Exp(3), Exp(5), Bin(1), Bin(2))]

Results: bias of Gain and Discount functions
• Diagonal: how far is P(Sat|DCG) from the ideal diagonal?
• Intuitiveness of DCG scores
• Endpoint: how far is P(Sat|DCG) from the ideal 0% and 100%?
• User disagreement and goodness of the DCG user model
• Top: how far is P(Pref|ΔDCG) from the ideal 100%?
• Discriminative power
[Figure: Diagonal bias, Endpoint bias and Top bias for each gain (Bin(2), Bin(1), Exp(5), Exp(3), Exp(2), Linear) and each discount (Zipfian, Linear, Log(2), Log(3), Log(5), Constant). Bias axis ticks: 0.06 to 0.14 (Diagonal), 0.14 to 0.22 (Endpoint), 0.46 to 0.54 (Top)]

Summary and Implications
• New method to map system effectiveness onto user satisfaction
• Sample application to DCG for a music recommendation task
• Gain functions that emphasize highly relevant documents underestimate user satisfaction. Linear gain is better than exponential
• All discount functions bias the prediction of user satisfaction
• This task might be too enjoyable to observe a discount effect
• Size (of the DCG difference) does matter
• Non-parametric statistics (e.g. Sign test, Wilcoxon test) and just looking at the ranking of systems (e.g. Kendall τ) oversimplify the evaluation problem
• Zero-point null hypothesis testing (i.e. H0: ΔDCG=0) is not reasonable
• Future work will investigate this method for Text IR
• Provide a common framework, based on P(Sat) and P(Pref), to evaluate with informational and navigational queries using appropriate measures

Data and code available online
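
The normalized DCG@k used throughout the slides can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the names `GAINS`, `DISCOUNTS` and `normalized_dcg` are hypothetical, the cutoff `k=5` is assumed from the rank axis of the discount plot, and the Linear discount is written in its decreasing form (k - i + 1)/k so that rank 1 receives weight 1. The maximum relevance `r_max=2` follows the 3-point scale {0, 1, 2}.

```python
import math

# Discount functions d(i; k): i is the 1-based rank, k is the cutoff.
DISCOUNTS = {
    "Zipfian": lambda i, k: 1.0 / i,
    "Linear": lambda i, k: (k - i + 1) / k,  # assumed decreasing form
    "Constant": lambda i, k: 1.0,
    "Log(2)": lambda i, k: 1.0 / math.log2(i + 1),
    "Log(3)": lambda i, k: 1.0 / math.log(i + 2, 3),
    "Log(5)": lambda i, k: 1.0 / math.log(i + 4, 5),
}

# Gain functions g(r): r is the graded relevance of a document.
GAINS = {
    "Linear": lambda r: float(r),
    "Exp(2)": lambda r: 2.0 ** r - 1,
    "Exp(3)": lambda r: 3.0 ** r - 1,
    "Exp(5)": lambda r: 5.0 ** r - 1,
    "Bin(1)": lambda r: 1.0 if r >= 1 else 0.0,
    "Bin(2)": lambda r: 1.0 if r >= 2 else 0.0,
}


def normalized_dcg(relevances, gain, discount, k=5, r_max=2):
    """Return DCG@k divided by the DCG@k of k documents with relevance r_max.

    Ranks beyond the end of `relevances` contribute zero gain. Note that the
    denominator assumes r_max at every rank, so this is not nDCG.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    top = list(relevances)[:k]
    achieved = sum(gain(r) * discount(i, k) for i, r in enumerate(top, start=1))
    ideal = sum(gain(r_max) * discount(i, k) for i in range(1, k + 1))
    return achieved / ideal


if __name__ == "__main__":
    # The same ranked list scored under every gain, with a Zipfian discount.
    ranking = [2, 1, 0, 1, 2]
    for name, gain in GAINS.items():
        score = normalized_dcg(ranking, gain, DISCOUNTS["Zipfian"])
        print(f"{name:7s} {score:.3f}")
```

Because the denominator uses the maximum relevance at every rank, the score is 1 only when all k documents are maximally relevant, which is what allows a score φ to be read against P(Sat). A list of five documents with relevance 1 scores 0.5 under Linear gain but 1/3 under Exp(2), which illustrates how the choice of gain alone shifts the score for the same results.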