Setting Goals and Choosing Metrics for Recommender System Evaluations
Gunnar Schröder, Maik Thiele, Wolfgang Lehner

Gunnar Schröder
T-Systems Multimedia Solutions
Dresden University of Technology

UCERSTI 2 Workshop at the 5th ACM Conference on Recommender Systems
Chicago, October 23rd, 2011
How Do You Evaluate Recommender Systems?


   RMSE, MAE, Precision, Recall, F1-Measure, ROC Curves,
   Mean Average Precision, Area under the Curve

   Qualitative Techniques, Quantitative Techniques, User-Centric Evaluation

   Accuracy Metrics, Non-Accuracy Metrics



                 But why do you do it exactly this way?


              Setting Goals and Choosing Metrics for Recommender System Evaluation - Gunnar Schröder
Some of the Issues This Paper Addresses


   A large variety of metrics have been published
   Some metrics are highly correlated [Herlocker 2004]
   Little guidance for evaluating recommenders and choosing metrics

   Which aspects of the usage scenario and the data influence the choice?
   Which metrics are applicable?
   What do these metrics express?
   What are the differences among them?
   Which metric represents our use case best?
   How much do the metrics suffer from biases?




Factors That Influence the Choice of Evaluation Metrics


   Objectives for recommender usage
      Business goals, User interests

   Recommender task and interaction
      Prediction, Classification, Ranking, Similarity, Presentation

   Preference data
      Explicit, Implicit, Unary, Binary, Numerical

   Choice of metrics



Major Classes of Evaluation Metrics


     Prediction Accuracy Metrics
     Ranking Accuracy Metrics
     Classification Accuracy Metrics
     Non-Accuracy Metrics




   (Figure: ten example values in descending order: 5.0, 4.8, 4.7, 4.3, 3.8, 3.2, 2.4, 2.1, 1.6, 1.2)




Why Precision, Recall and F1-Measure May Fool You


   Ideal recommender (examples a – f) vs. worst-case recommender (examples g – l)
   Four recommendations (R1 – R4), e.g. Precision@4
   Ten items with a varying ratio of relevant items (1 – 9 relevant items)

   (Figure 3)

   Precision, recall and F1-measure are very sensitive to the ratio of relevant items
   They fail to distinguish between an ideal recommender and a worst-case recommender if
    the ratio of relevant items is varied
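
The effect described on this slide can be reproduced with a few lines of Python. This is a minimal sketch, not the authors' code: it assumes that the ideal recommender fills its four slots with relevant items first and that the worst-case recommender fills them with irrelevant items first. All function and variable names are hypothetical.

```python
# Sketch: precision, recall and F1 at k=4 for an ideal and a worst-case
# recommender on ten items, as the number of relevant items varies.
# Assumption: "ideal" recommends relevant items first, "worst-case"
# recommends irrelevant items first. All names here are illustrative.

N_ITEMS = 10
K = 4


def precision_recall_f1(hits, k, n_relevant):
    """Return (precision, recall, F1) for `hits` relevant items in a top-k list."""
    precision = hits / k
    recall = hits / n_relevant
    if hits == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def ideal_hits(k, n_relevant):
    """An ideal recommender puts as many relevant items as possible in the list."""
    return min(k, n_relevant)


def worst_case_hits(k, n_relevant, n_items):
    """A worst-case recommender shows relevant items only after running out of irrelevant ones."""
    return max(0, k - (n_items - n_relevant))


ideal = {}
worst = {}
for n_relevant in range(1, N_ITEMS):
    ideal[n_relevant] = precision_recall_f1(ideal_hits(K, n_relevant), K, n_relevant)
    worst[n_relevant] = precision_recall_f1(
        worst_case_hits(K, n_relevant, N_ITEMS), K, n_relevant
    )

for n_relevant in range(1, N_ITEMS):
    p_i, r_i, f_i = ideal[n_relevant]
    p_w, r_w, f_w = worst[n_relevant]
    print(f"{n_relevant} relevant: ideal P={p_i:.2f} R={r_i:.2f} F1={f_i:.2f} | "
          f"worst P={p_w:.2f} R={r_w:.2f} F1={f_w:.2f}")
```

With one relevant item the ideal recommender reaches a precision of only 0.25, while with nine relevant items the worst-case recommender reaches 0.75, so the raw values cannot be compared across settings with different ratios of relevant items.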



What is the Ideal Length for a Top-k Recommendation List?


   A typical ranking produced by a recommender on a set of ten items with four items being
    relevant
   The length of the top-k recommendation list is varied in examples a (k=1) to j (k=10)




                                                                                                           Figure 1
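
How the classification metrics react to the list length can be sketched as follows. The concrete ranking is a hypothetical stand-in for the one in Figure 1; only the counts (ten items, four relevant, k from 1 to 10) come from the slide, and the function name is illustrative.

```python
# Sketch: precision, recall and F1 for every list length k = 1..10.
# Assumption: this ranking is made up; True marks a relevant item and
# position 0 is the top of the ranking.
ranking = [True, True, False, True, False, False, True, False, False, False]


def metrics_at_k(ranking, k):
    """Return (precision, recall, F1) for the top-k prefix of a ranking."""
    hits = sum(ranking[:k])
    total_relevant = sum(ranking)
    precision = hits / k
    recall = hits / total_relevant
    if hits == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


table = {k: metrics_at_k(ranking, k) for k in range(1, len(ranking) + 1)}

for k, (precision, recall, f1) in table.items():
    print(f"k={k:2d}  P={precision:.2f}  R={recall:.2f}  F1={f1:.2f}")
```

Recall can only grow with k, while precision eventually drops to the overall ratio of relevant items (0.4 here), so neither metric on its own points to a sensible list length.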




   Markedness = Precision + InvPrecision – 1
   Informedness = Recall + InvRecall – 1
   Matthews correlation = ±sqrt(Markedness × Informedness)
   [Powers 2007]
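
A minimal sketch of the three measures, assuming the usual confusion-matrix definitions: InvPrecision and InvRecall are precision and recall of the irrelevant (negative) class, i.e. tn / (tn + fn) and tn / (tn + fp). The example counts are made up for illustration (ten items, four relevant, a top-4 list with three hits), and the function name is hypothetical.

```python
import math


def markedness_informedness_mcc(tp, fp, fn, tn):
    """Compute markedness, informedness and Matthews correlation.

    Assumes every row and column of the confusion matrix is non-empty;
    otherwise one of the ratios is undefined and a ZeroDivisionError is raised.
    """
    precision = tp / (tp + fp)
    inv_precision = tn / (tn + fn)   # precision of the negative class
    recall = tp / (tp + fn)
    inv_recall = tn / (tn + fp)      # recall of the negative class

    markedness = precision + inv_precision - 1
    informedness = recall + inv_recall - 1
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return markedness, informedness, mcc


# Illustrative counts: 3 hits and 1 miss in the list, 1 relevant item left out.
mk, inf, mcc = markedness_informedness_mcc(tp=3, fp=1, fn=1, tn=5)
print(f"markedness={mk:.3f} informedness={inf:.3f} correlation={mcc:.3f}")
```

For these counts all three values coincide at 14/24 ≈ 0.583, and the squared correlation equals the product of markedness and informedness.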


From Simple Classification Measures to Partial Ranking Measures


   Moving a single relevant item within the recommender's ranking (examples a – j)

   (Figure 2)

   Idea: Consider both classification and ranking for the top-k recommendations


   Area under the Curve => Limited Area under the Curve

   Boolean Kendall’s Tau => Limited Boolean Kendall’s Tau
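
To make the motivation concrete, the sketch below moves a single relevant item through a ranking of ten items and compares Precision@4 with the ordinary (unlimited) area under the curve, computed as the fraction of correctly ordered (relevant, irrelevant) pairs. The list size, k = 4 and the function names are assumptions for illustration; the limited variants proposed on this slide are defined in the paper and are not reproduced here.

```python
# Sketch: one relevant item moved from rank 1 to rank 10.
# Assumptions: ten items, k = 4, plain AUC rather than the limited variant.

N_ITEMS = 10
K = 4


def precision_at_k(ranking, k):
    """Share of relevant items among the first k positions."""
    return sum(ranking[:k]) / k


def auc(ranking):
    """Fraction of (relevant, irrelevant) pairs with the relevant item ranked higher."""
    n_relevant = sum(ranking)
    n_irrelevant = len(ranking) - n_relevant
    if n_relevant == 0 or n_irrelevant == 0:
        raise ValueError("AUC needs at least one relevant and one irrelevant item")
    irrelevant_above = 0
    correct_pairs = 0
    for is_relevant in ranking:
        if is_relevant:
            # Every irrelevant item not yet seen is ranked below this one.
            correct_pairs += n_irrelevant - irrelevant_above
        else:
            irrelevant_above += 1
    return correct_pairs / (n_relevant * n_irrelevant)


results = []
for position in range(N_ITEMS):
    ranking = [index == position for index in range(N_ITEMS)]
    results.append((position + 1, precision_at_k(ranking, K), auc(ranking)))

for rank, precision, area in results:
    print(f"relevant item at rank {rank:2d}: P@{K}={precision:.2f}  AUC={area:.2f}")
```

Precision@4 is 0.25 for ranks 1 to 4 and 0 for ranks 5 to 10, so it cannot tell rank 1 from rank 4, whereas the area under the curve changes with every step, including steps outside the top-k list.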




A Further, More Complex Example to Study at Home




                                                                                                           Figure 4
   Conclusions:
      For classification, use markedness, informedness and Matthews correlation instead
       of precision, recall and F1-measure
      Limited area under the curve and limited Boolean Kendall’s tau are useful metrics for
       top-k recommender evaluations

Conclusion and Contributions


   Important aspects that influence the metric choice
      Objectives for recommender usage
      Recommender task and interaction
      Aspects of preference data


   Some problems of Precision, Recall and F1-Measure
   The advantages of markedness, informedness and Matthews correlation

   Two new metrics that measure the ranking of a limited top-k list
      Limited area under the curve, limited Boolean Kendall’s tau


   Guidelines for choosing a metric (See paper)


Thank You Very Much!


   Do not hesitate to contact me if you have any
    questions, comments or answers!




   Slides are available via e-mail or SlideShare




Setting Goals and Choosing Metrics for Recommender System Evaluations

  • 1. Setting Goals and Choosing Metrics for Recommender System Evaluations Gunnar Schröder, Maik Thiele, Wolfgang Lehner Gunnar Schröder UCERSTI 2 Workshop T-Systems Multimedia Solutions at the 5th ACM Conference on Dresden University of Technology Recommender Systems Chicago, October 23th, 2011
  • 2. How Do You Evaluate Recommender Systems? RMSE Precision F1-Measure Recall MAE ROC Curves Qualitative Techniques Quantitative Techniques User-Centric Evaluation Mean Average Precision Area under the Curve Accuracy Metrics Non-Accuracy Metrics But why do you do it exactly this way? Setting Goals and Choosing Metrics for Recommender System Evaluation - Gunnar Schröder
  • 3. Some of the Issues This Paper Tries to Touch
      - A large variety of metrics have been published
      - Some metrics are highly correlated [Herlocker 2004]
      - Little guidance for evaluating recommenders and choosing metrics
      - Which aspects of the usage scenario and the data influence the choice?
      - Which metrics are applicable?
      - What do these metrics express?
      - What are the differences among them?
      - Which metric represents our use case best?
      - How much do the metrics suffer from biases?
  • 4. Factors That Influence the Choice of Evaluation Metrics
      - Objectives for recommender usage: business goals, user interests
      - Recommender task and interaction: prediction, classification, ranking, similarity, presentation
      - Preference data: explicit, implicit, unary, binary, numerical
      - Together these determine the choice of metrics
  • 5. Major Classes of Evaluation Metrics
      - Prediction Accuracy Metrics
      - Ranking Accuracy Metrics
      - Classification Accuracy Metrics
      - Non-Accuracy Metrics
  • 6. Why Precision, Recall and F1-Measure May Fool You
      - Ideal recommender (examples a–f) vs. worst-case recommender (examples g–l)
      - Four recommendations (R1–R4), e.g. Precision@4
      - Ten items with a varying ratio of relevant items (1–9 relevant items)
      - Precision, recall and F1-measure are very sensitive to the ratio of relevant items
      - They fail to distinguish between an ideal recommender and a worst-case recommender if the ratio of relevant items is varied
      [Figure 3]
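The sensitivity described on slide 6 can be reproduced with a small sketch. It assumes a setup modelled on the slide (ten items, four recommendations, one to nine relevant items); the function names are hypothetical, and "ideal" and "worst-case" are interpreted here as ranking all relevant items first or last, which may differ in detail from Figure 3.

```python
def true_positives(n_relevant, n_items=10, k=4, ideal=True):
    """Number of relevant items among the top-k recommendations."""
    if ideal:
        # Assumed ideal recommender: all relevant items are ranked first.
        return min(k, n_relevant)
    # Assumed worst-case recommender: all irrelevant items are ranked first.
    return max(0, k - (n_items - n_relevant))


def precision_recall_f1(tp, k, n_relevant):
    """Precision@k, recall@k and F1 from the true-positive count."""
    precision = tp / k
    recall = tp / n_relevant
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


def sweep(ideal, n_items=10, k=4):
    """Metrics for every ratio of relevant items (1 .. n_items - 1)."""
    return {
        r: precision_recall_f1(true_positives(r, n_items, k, ideal), k, r)
        for r in range(1, n_items)
    }


if __name__ == "__main__":
    ideal, worst = sweep(ideal=True), sweep(ideal=False)
    for r in ideal:
        print(r, ideal[r], worst[r])
```

Under these assumptions the ideal recommender with one relevant item reaches a Precision@4 of 0.25, while the worst-case recommender with nine relevant items reaches 0.75, so the raw values cannot be compared across different ratios of relevant items.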
  • 7. What is the Ideal Length for a Top-k Recommendation List?
      - A typical ranking produced by a recommender on a set of ten items with four items being relevant
      - The length of the top-k recommendation list is varied in examples a (k=1) to j (k=10)
      [Figure 1]
  • 8. What is the Ideal Length for a Top-k Recommendation List?
      - A typical ranking produced by a recommender on a set of ten items with four items being relevant
      - The length of the top-k recommendation list is varied in examples a (k=1) to j (k=10)
      [Part of Figure 1]
  • 9. What is the Ideal Length for a Top-k Recommendation List?
      - A typical ranking produced by a recommender on a set of ten items with four items being relevant
      - The length of the top-k recommendation list is varied in examples a (k=1) to j (k=10)
      - Markedness = Precision + InvPrecision - 1
      - Informedness = Recall + InvRecall - 1
      - Matthews correlation = [formula] [Powers 2007]
      [Part of Figure 1]
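A minimal sketch of the measures named on slide 9, computed for a top-k list over a binary relevance ranking. The markedness and informedness formulas follow the slide; the Matthews correlation uses the standard confusion-matrix form, which is an assumption here because the slide's own formula is not part of the transcript. The example ranking and all function names are hypothetical, and undefined ratios are returned as NaN by choice.

```python
import math


def confusion_at_k(relevance, k):
    """Confusion matrix when the top-k items of a ranking are recommended.

    relevance: list of 0/1 flags in ranked order (1 = relevant).
    """
    tp = sum(relevance[:k])           # relevant and recommended
    fp = k - tp                       # irrelevant but recommended
    fn = sum(relevance[k:])           # relevant but not recommended
    tn = len(relevance) - k - fn      # irrelevant and not recommended
    return tp, fp, fn, tn


def ratio(numerator, denominator):
    """Division that yields NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def classification_measures(relevance, k):
    """Markedness, informedness and Matthews correlation for a top-k list."""
    tp, fp, fn, tn = confusion_at_k(relevance, k)
    precision = ratio(tp, tp + fp)
    inv_precision = ratio(tn, tn + fn)
    recall = ratio(tp, tp + fn)
    inv_recall = ratio(tn, tn + fp)
    markedness = precision + inv_precision - 1
    informedness = recall + inv_recall - 1
    matthews = ratio(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    return markedness, informedness, matthews


if __name__ == "__main__":
    # Hypothetical ranking: ten items, four of them relevant.
    ranking = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
    for k in range(1, len(ranking)):
        print(k, classification_measures(ranking, k))
```

In this form the squared Matthews correlation equals the product of markedness and informedness, which gives a quick consistency check for any implementation.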
  • 10. From Simple Classification Measures to Partial Ranking Measures
      - Moving a single relevant item through the recommender's ranking (examples a–j)
      - Idea: consider both classification and ranking for the top-k recommendations
      - Area under the Curve => Limited Area under the Curve
      - Boolean Kendall's Tau => Limited Boolean Kendall's Tau
      [Figure 2]
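To illustrate why slide 10 combines classification with ranking, the sketch below contrasts Precision@4 with the standard (unlimited) area under the ROC curve while a single relevant item moves through a ten-item ranking. This is not the limited variant proposed in the paper, whose definition is not given in the transcript; the one-relevant-item setup and the function names are assumptions made for illustration.

```python
def auc(relevance):
    """Standard AUC for a binary ranking: the share of
    (relevant, irrelevant) pairs in which the relevant item is ranked higher."""
    n_rel = sum(relevance)
    n_irr = len(relevance) - n_rel
    if n_rel == 0 or n_irr == 0:
        raise ValueError("AUC needs at least one relevant and one irrelevant item")
    correct_pairs = sum(
        1
        for i, rel in enumerate(relevance) if rel
        for other in relevance[i + 1:] if not other
    )
    return correct_pairs / (n_rel * n_irr)


def precision_at_k(relevance, k):
    """Share of relevant items among the top-k recommendations."""
    return sum(relevance[:k]) / k


def single_relevant_item(position, n_items=10):
    """Ranking with exactly one relevant item at the given 1-based position."""
    return [1 if rank == position else 0 for rank in range(1, n_items + 1)]


if __name__ == "__main__":
    for position in range(1, 11):
        ranking = single_relevant_item(position)
        print(position, precision_at_k(ranking, 4), round(auc(ranking), 3))
```

In this setup Precision@4 is 0.25 for positions 1 to 4 and 0 afterwards, so it ignores the order inside the list, whereas the standard AUC keeps changing below position 4 even though those items are never shown to the user.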
  • 11. A Further, More Complex Example to Study at Home
      [Figure 4]
      Conclusions:
      - For classification, use markedness, informedness and Matthews correlation instead of precision, recall and F1-measure
      - Limited area under the curve and limited boolean Kendall's tau are useful metrics for top-k recommender evaluations
  • 12. Conclusion and Contributions
      - Important aspects that influence the metric choice
          - Objectives for recommender usage
          - Recommender task and interaction
          - Aspects of preference data
      - Some problems of Precision, Recall and F1-Measure
      - The advantages of markedness, informedness and Matthews correlation
      - Two new metrics that measure the ranking of a limited top-k list
          - Limited area under the curve, limited boolean Kendall's tau
      - Guidelines for choosing a metric (see paper)
  • 13. Thank You Very Much!
      - Do not hesitate to contact me if you have any questions, comments or answers!
      - Slides are available via e-mail or SlideShare