How do Gain and Discount Functions Affect the
Correlation between DCG and User Satisfaction?
Julián Urbano Mónica Marrero
ECIR 2015
Vienna, March 30th
Discount d(i; k)                  Gain g(r)
Zipfian:   1/i                    Linear: r
Linear:    (k + 1 − i)/k          Exp(2): 2^r − 1
Constant:  1                      Exp(3): 3^r − 1
Log(2):    1/log₂(i + 1)          Exp(5): 5^r − 1
Log(3):    1/log₃(i + 2)          Bin(1): I[r ≥ 1]
Log(5):    1/log₅(i + 4)          Bin(2): I[r ≥ 2]
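The functions in the table translate directly into code. A minimal Python sketch, assuming 1-based ranks i, a cutoff k (used only by the Linear discount, written here as (k + 1 − i)/k so that it decreases from 1 at rank 1 to 1/k at rank k) and integer relevance levels on the experiment's {0, 1, 2} scale. The dictionary names DISCOUNTS and GAINS are illustrative and not taken from the authors' released code.

```python
import math

# Discount functions d(i; k): i is the 1-based rank, k the cutoff.
# Only the Linear discount depends on k; every discount equals 1 at i = 1.
DISCOUNTS = {
    "Zipfian": lambda i, k: 1.0 / i,
    "Linear": lambda i, k: (k + 1 - i) / k,
    "Constant": lambda i, k: 1.0,
    "Log(2)": lambda i, k: 1.0 / math.log(i + 1, 2),
    "Log(3)": lambda i, k: 1.0 / math.log(i + 2, 3),
    "Log(5)": lambda i, k: 1.0 / math.log(i + 4, 5),
}

# Gain functions g(r) for a relevance level r in {0, 1, 2}.
GAINS = {
    "Linear": lambda r: float(r),
    "Exp(2)": lambda r: 2.0 ** r - 1,
    "Exp(3)": lambda r: 3.0 ** r - 1,
    "Exp(5)": lambda r: 5.0 ** r - 1,
    "Bin(1)": lambda r: 1.0 if r >= 1 else 0.0,  # indicator I[r >= 1]
    "Bin(2)": lambda r: 1.0 if r >= 2 else 0.0,  # indicator I[r >= 2]
}

if __name__ == "__main__":
    k = 5
    for name, d in DISCOUNTS.items():
        print(f"{name:8s}", [round(d(i, k), 3) for i in range(1, k + 1)])
    for name, g in GAINS.items():
        print(f"{name:8s}", [g(r) for r in (0, 1, 2)])
```

Because every discount is 1 at rank 1, the discounts differ only in how fast they decay; the gains differ in how much more a highly relevant document (r = 2) is worth than a fairly relevant one (r = 1), from 1:1 for Bin(1) up to 24:4 for Exp(5).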
[Figure: Discount functions, discount d(i) vs. rank i (1 to 5) for Zipfian, Linear, Constant, Log(2), Log(3), Log(5).]
[Figure: Gain functions, gain g(r) vs. relevance r (0, 1, 2) for Linear, Exp(2), Exp(3), Exp(5), Bin(1), Bin(2).]
[Figure: Real World vs. Cranfield.
Real World: an information need, documents and an IR system, studied by live observation with user measures such as time to complete task, idle time, success rate, frustration, satisfaction, ease of use, ease of learning…
Cranfield: a test collection (topics, documents, relevance judgments) and an IR system, studied with effectiveness measures such as Precision, Average Precision, Reciprocal Rank, Q-measure, Discounted Cumulative Gain, Rank-Biased Precision, Time-Biased Gain…
A GAP separates the two settings; the diagram also labels a static and a dynamic component, and the measures DCG and ERR.]
Which Gain and Discount functions for DCG
better predict user satisfaction?
• First, let’s normalize DCG scores (this is not nDCG!)
• One system with DCG=φ. What does it mean?
• Intuition: φ·100% of users will be satisfied
• P(Sat | DCG = φ) = φ
• Two systems with ΔDCG = Δφ. What does it mean?
• Intuition: users will prefer the (supposedly) better one
• P(Pref | ΔDCG = Δφ) = 1
P(Sat) and P(Pref) depend on the systems,
not on how we evaluate them. Yet there are
many different ways to compute effectiveness.
Experiment
• Collect user preferences between two systems
• Map DCG onto P(Sat)
• Map ΔDCG onto P(Pref)
• Music recommendation task
• Ad-hoc, informational, enjoyable by assessors
• Preferences less confounded by interface effects
• All data from MIREX (TREC-like for Music IR)
• Datasets from 2007–2012
• 3-point relevance scale: {0, 1, 2}
• 4115 examples
• Uniformly covering the [0,1] range of ΔDCG
• 432 unique queries
• 5636 unique documents
• Crowdsourced with CrowdFlower
• Trap examples for quality control
• 113 subjects
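One simple way to estimate curves such as P(Sat | DCG) and P(Pref | ΔDCG) from crowdsourced judgments is to bin the scores and take the observed proportion of positive outcomes in each bin. This is a minimal sketch under our own assumptions, not the authors' procedure: each observation is a (score, outcome) pair with the score in [0, 1]; for P(Sat) the score is a list's normalized DCG and the outcome is whether the subject was satisfied; for P(Pref) the score is |ΔDCG| and the outcome is whether the subject preferred the list with the higher DCG. The function name empirical_curve and the equal-width binning are hypothetical choices.

```python
from typing import List, Optional, Sequence, Tuple


def empirical_curve(
    scores: Sequence[float],
    outcomes: Sequence[bool],
    n_bins: int = 10,
) -> List[Tuple[float, Optional[float]]]:
    """Estimate P(outcome | score) by equal-width binning of scores in [0, 1].

    Returns one (bin midpoint, observed proportion) pair per bin; the
    proportion is None for bins that received no observations.
    """
    if len(scores) != len(outcomes):
        raise ValueError("scores and outcomes must have the same length")
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    hits = [0] * n_bins
    totals = [0] * n_bins
    for score, outcome in zip(scores, outcomes):
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score {score} is outside [0, 1]")
        b = min(int(score * n_bins), n_bins - 1)  # a score of exactly 1.0 goes in the last bin
        totals[b] += 1
        hits[b] += 1 if outcome else 0
    width = 1.0 / n_bins
    return [
        ((b + 0.5) * width, hits[b] / totals[b] if totals[b] else None)
        for b in range(n_bins)
    ]


if __name__ == "__main__":
    # Toy observations, made up for illustration (not MIREX data).
    dcg = [0.05, 0.15, 0.95, 0.92]
    satisfied = [False, True, True, True]
    print(empirical_curve(dcg, satisfied, n_bins=2))
```

On the toy data this prints [(0.25, 0.5), (0.75, 1.0)]. Sampling examples uniformly over the [0, 1] range of ΔDCG, as in the experiment, keeps every bin populated; with skewed data, wider bins or a smoothing model would be needed.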
Results (1 system): DCG predicting user satisfaction
[Figure: six panels, one per discount (Zipfian, Linear, Log(2), Log(3), Log(5), Constant); x-axis: DCG (0 to 1); y-axis: probability of user satisfaction; one curve per gain (Linear, Exp(2), Exp(3), Exp(5), Bin(1), Bin(2)).]
Results (2 systems): ΔDCG predicting user preference
[Figure: six panels, one per discount (Zipfian, Linear, Log(2), Log(3), Log(5), Constant); x-axis: difference in DCG (0 to 1); y-axis: probability that users agree; one curve per gain (Linear, Exp(2), Exp(3), Exp(5), Bin(1), Bin(2)).]
Results: bias of Gain and Discount functions
• Diagonal: how far is P(Sat|DCG) from the ideal diagonal?
• Intuitiveness of DCG scores
• Endpoint: how far is P(Sat|DCG) from the ideal 0% and 100%?
• User disagreement and goodness of the DCG user model
• Top: how far is P(Pref|ΔDCG) from the ideal 100%?
• Discriminative power
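The poster defines the three biases only informally. The sketch below makes one possible choice for each, which is our assumption and not necessarily the authors' definition: diagonal bias is the mean absolute distance between P(Sat | DCG = x) and x; endpoint bias averages the distance of the lowest-DCG point from 0% and of the highest-DCG point from 100%; top bias is the mean distance between P(Pref | ΔDCG = x) and 100%. Curves are lists of (x, p) pairs, with p = None for points without data; the function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# A curve is a list of (x, p) pairs; p may be None where nothing was observed.
Curve = Sequence[Tuple[float, Optional[float]]]


def _observed(curve: Curve) -> List[Tuple[float, float]]:
    """Drop points without data and sort the rest by x."""
    pts = sorted((x, p) for x, p in curve if p is not None)
    if not pts:
        raise ValueError("curve has no observed points")
    return pts


def diagonal_bias(sat_curve: Curve) -> float:
    """Mean |P(Sat | DCG = x) - x|: how far the curve is from the ideal diagonal."""
    pts = _observed(sat_curve)
    return sum(abs(p - x) for x, p in pts) / len(pts)


def endpoint_bias(sat_curve: Curve) -> float:
    """Average distance of the lowest-DCG point from 0 and the highest-DCG point from 1."""
    pts = _observed(sat_curve)
    low, high = pts[0][1], pts[-1][1]
    return (low + (1.0 - high)) / 2.0


def top_bias(pref_curve: Curve) -> float:
    """Mean 1 - P(Pref | ΔDCG = x): how far the curve is from the ideal 100%."""
    pts = _observed(pref_curve)
    return sum(1.0 - p for _, p in pts) / len(pts)


if __name__ == "__main__":
    # Made-up curves for illustration only.
    sat = [(0.1, 0.3), (0.5, 0.6), (0.9, 0.8)]
    pref = [(0.1, 0.55), (0.5, 0.7), (0.9, 0.9)]
    print(diagonal_bias(sat), endpoint_bias(sat), top_bias(pref))
```

Under these definitions lower is better for all three, and a perfect measure in the sense of the intuitions above (P(Sat | DCG = φ) = φ, P(Pref | ΔDCG = Δφ) = 1) would score 0 on diagonal and top bias.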
Summary and Implications
• New method to map system effectiveness onto user satisfaction
• Sample application to DCG for a music recommendation task
• Gain functions that emphasize highly relevant documents underestimate
user satisfaction. Linear gain is better than exponential gain
• All discount functions bias the prediction of user satisfaction
• This task might be too enjoyable to observe a discount effect
• Size (of the DCG difference) does matter
• Non-parametric statistics (e.g., Sign test, Wilcoxon test) and just looking at
the ranking of systems (e.g., Kendall τ) oversimplify the evaluation problem
• Zero-point null hypothesis testing (i.e., H0: ΔDCG = 0) is not reasonable
• Future work will investigate this method for Text IR
• Provide a common framework, based on P(Sat) and P(Pref), to evaluate
with informational and navigational queries using appropriate measures
Data and code
available online
$$\mathrm{DCG}@k = \frac{\sum_{i=1}^{k} g(r_i) \cdot d(i; k)}{\sum_{i=1}^{k} g(r_{\max}) \cdot d(i; k)}$$
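The DCG@k formula divides by the DCG@k of a hypothetical list whose k documents all have the maximum relevance r_max (2 on the 3-point scale), rather than by the DCG of the ideal reordering of the judged documents as nDCG does, so scores are comparable across queries on a fixed [0, 1] scale. A minimal Python sketch, assuming integer relevance levels, gain and discount passed in as functions, and lists shorter than k padded with non-relevant documents; the name normalized_dcg_at_k and the padding rule are our assumptions.

```python
from typing import Callable, Sequence


def normalized_dcg_at_k(
    rels: Sequence[int],
    k: int,
    gain: Callable[[int], float],
    discount: Callable[[int, int], float],
    r_max: int = 2,
) -> float:
    """DCG@k divided by the DCG@k of a list whose k documents all have relevance r_max."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Assumption: a list shorter than k is padded with non-relevant (r = 0) documents.
    top = list(rels[:k]) + [0] * max(0, k - len(rels))
    numerator = sum(gain(r) * discount(i, k) for i, r in enumerate(top, start=1))
    denominator = sum(gain(r_max) * discount(i, k) for i in range(1, k + 1))
    return numerator / denominator


def linear_gain(r: int) -> float:
    return float(r)


def zipfian_discount(i: int, k: int) -> float:
    return 1.0 / i


if __name__ == "__main__":
    print(normalized_dcg_at_k([2, 0, 1], 3, linear_gain, zipfian_discount))
```

For example, with Linear gain and Zipfian discount, the list [2, 0, 1] at k = 3 scores (2 + 1/3) / (2 · (1 + 1/2 + 1/3)) = 7/11 ≈ 0.636, while a list of three highly relevant documents scores exactly 1.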
[Figure: Results, bias of Gain and Discount functions. Three panels (Diagonal bias, Endpoint bias, Top bias), each showing the bias for every gain (Linear, Exp(2), Exp(3), Exp(5), Bin(1), Bin(2)) and every discount (Zipfian, Linear, Log(2), Log(3), Log(5), Constant). Bias axis ranges: 0.06 to 0.14 (diagonal), 0.14 to 0.22 (endpoint), 0.46 to 0.54 (top).]
