Setting Goals and Choosing Metrics for Recommender System Evaluations

Recommender systems have become an important personalization technique
on the web and are widely used especially in e-commerce applications.
However, operators of web shops and other platforms are challenged by
the large variety of available algorithms and the multitude of their
possible parameterizations. Since the quality of the recommendations that are
given can have a significant business impact, the selection
of a recommender system should be made based on well-founded evaluation
data. The literature on recommender system evaluation offers a large
variety of evaluation metrics but provides little guidance on how to choose
among them. The paper which is presented in this presentation focuses on the often neglected aspect of clearly defining the goal of an evaluation and how this goal relates to the
selection of an appropriate metric. We discuss several well-known
accuracy metrics and analyze how these reflect different evaluation goals. Furthermore we present some less well-known metrics as well as a variation of the area under the curve measure that are particularly suitable for the evaluation of
recommender systems in e-commerce applications.



Presentation Transcript

  • Setting Goals and Choosing Metrics for Recommender System Evaluations
    Gunnar Schröder, Maik Thiele, Wolfgang Lehner
    T-Systems Multimedia Solutions / Dresden University of Technology
    UCERSTI 2 Workshop at the 5th ACM Conference on Recommender Systems, Chicago, October 23, 2011
  • How Do You Evaluate Recommender Systems?
    RMSE, MAE, Precision, Recall, F1-Measure, ROC Curves, Mean Average Precision, Area under the Curve
    Qualitative vs. quantitative techniques, user-centric evaluation, accuracy vs. non-accuracy metrics
    But why do you do it exactly this way?
    Setting Goals and Choosing Metrics for Recommender System Evaluation - Gunnar Schröder
  • Some of the Issues This Paper Tries to Touch
    A large variety of metrics has been published; some metrics are highly correlated [Herlocker 2004]; there is little guidance for evaluating recommenders and choosing metrics.
    Which aspects of the usage scenario and the data influence the choice? Which metrics are applicable? What do these metrics express? What are the differences among them? Which metric represents our use case best? How much do the metrics suffer from biases?
  • Factors That Influence the Choice of Evaluation Metrics
    Objectives for recommender usage: business goals, user interests
    Recommender task and interaction: prediction, classification, ranking, similarity, presentation
    Preference data: explicit or implicit; unary, binary, or numerical
  • Major Classes of Evaluation Metrics
    Prediction accuracy metrics, ranking accuracy metrics, classification accuracy metrics, non-accuracy metrics
    [Figure: example list of predicted ratings 5.0, 4.8, 4.7, 4.3, 3.8, 3.2, 2.4, 2.1, 1.6, 1.2]
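To make the "prediction accuracy" class concrete, a minimal sketch of the two standard metrics of that class, MAE and RMSE, computed over a hypothetical list of predicted vs. true ratings (the values are illustrative, not from the paper):

```python
import math

def mae(predicted, actual):
    """Mean absolute error between predicted and true ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root mean squared error; penalizes large errors more strongly than MAE."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

predicted = [4.5, 3.0, 2.0, 5.0]  # hypothetical recommender output
actual    = [5.0, 3.0, 1.0, 4.0]  # hypothetical true ratings
print(mae(predicted, actual))   # 0.625
print(rmse(predicted, actual))  # 0.75
```

Both operate on the same error list; the squaring in RMSE is what makes it the harsher of the two when a few predictions are far off.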
  • Why Precision, Recall and F1-Measure May Fool You
    Ideal recommender (examples a-f) vs. worst-case recommender (examples g-l); four recommendations (R1-R4), e.g. Precision@4; ten items with a varying ratio of relevant items (1-9 relevant items). (Figure 3)
    Precision, recall and F1-measure are very sensitive to the ratio of relevant items: they fail to distinguish between an ideal recommender and a worst-case recommender when the ratio of relevant items is varied.
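The effect described on this slide can be reproduced in a few lines. The sketch below (my own illustration, not the paper's exact setup) compares Precision@4 of an ideal recommender (all relevant items ranked first) and a worst-case recommender (all relevant items ranked last) on ten items, as the number of relevant items grows:

```python
def precision_at_k(ranking, relevant, k=4):
    """Fraction of the top-k recommended items that are relevant."""
    return sum(1 for item in ranking[:k] if item in relevant) / k

n = 10
for n_rel in range(1, 10):
    relevant = set(range(n_rel))
    # Ideal recommender: relevant items first; worst case: relevant items last.
    ideal = sorted(range(n), key=lambda i: i not in relevant)
    worst = sorted(range(n), key=lambda i: i in relevant)
    print(n_rel, precision_at_k(ideal, relevant), precision_at_k(worst, relevant))
```

With 9 of 10 items relevant, even the worst-case recommender scores Precision@4 = 0.75, close to the ideal recommender's 1.0: the metric's value is driven by the ratio of relevant items rather than by ranking quality.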
  • What is the Ideal Length for a Top-k Recommendation List?
    A typical ranking produced by a recommender on a set of ten items with four items being relevant; the length of the top-k recommendation list is varied in examples a (k=1) to j (k=10). (Figure 1)
  • What is the Ideal Length for a Top-k Recommendation List? (cont., part of Figure 1)
    Markedness = Precision + InvPrecision − 1
    Informedness = Recall + InvRecall − 1
    Matthews Correlation = ±√(Informedness × Markedness) [Powers 2007]
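A minimal sketch of these three measures computed from confusion-matrix counts (TP/FP/TN/FN), where "inverse" precision and recall are the same quantities computed on the negative class. The example counts are hypothetical, chosen to match the slide's setting of a top-k list over ten items with four relevant:

```python
import math

def markedness(tp, fp, tn, fn):
    # precision + inverse precision - 1
    return tp / (tp + fp) + tn / (tn + fn) - 1

def informedness(tp, fp, tn, fn):
    # recall + inverse recall - 1
    return tp / (tp + fn) + tn / (tn + fp) - 1

def matthews(tp, fp, tn, fn):
    # Matthews correlation coefficient; its magnitude equals
    # sqrt(informedness * markedness), per Powers 2007.
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den

# Ideal top-4 list over 10 items, 4 relevant: all three measures reach 1.0.
print(markedness(4, 0, 6, 0), informedness(4, 0, 6, 0), matthews(4, 0, 6, 0))
```

Unlike precision, recall and F1, all three measures incorporate the negative class (TN), which is what makes them robust to the ratio of relevant items.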
  • From Simple Classification Measures to Partial Ranking Measures
    Moving a single relevant item through the recommender's ranking (examples a-j). (Figure 2)
    Idea: consider both classification and ranking for the top-k recommendations.
    Area under the Curve => Limited Area under the Curve
    Boolean Kendall's Tau => Limited Boolean Kendall's Tau
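For reference, the ordinary AUC that the "limited" variant starts from can be computed as pairwise concordance over a binary-relevance ranking. The sketch below shows only this baseline; the limited variants restricted to the top-k list are defined in the paper itself. The example ranking is my own illustration:

```python
def auc(ranking, relevant):
    """Probability that a randomly chosen relevant item is ranked above
    a randomly chosen irrelevant one (pairwise-concordance form of AUC)."""
    pos = [i for i, item in enumerate(ranking) if item in relevant]
    neg = [i for i, item in enumerate(ranking) if item not in relevant]
    concordant = sum(1 for p in pos for n in neg if p < n)
    return concordant / (len(pos) * len(neg))

# Ten items, items 0-3 relevant; one irrelevant item (4) displaced into the top ranks:
ranking = [0, 1, 4, 2, 3, 5, 6, 7, 8, 9]
print(auc(ranking, {0, 1, 2, 3}))  # 22/24 ≈ 0.9167
```

Moving a single relevant item down the ranking, as in Figure 2, lowers this pairwise count one concordant pair at a time, which is what makes AUC a natural starting point for a partial ranking measure.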
  • A Further, More Complex Example to Study at Home (Figure 4)
    Conclusions: For classification, use markedness, informedness and Matthews correlation instead of precision, recall and F1-measure. Limited area under the curve and limited boolean Kendall's tau are useful metrics for top-k recommender evaluations.
  • Conclusion and Contributions
    Important aspects that influence the metric choice: objectives for recommender usage, recommender task and interaction, aspects of the preference data
    Some problems of precision, recall and F1-measure
    The advantages of markedness, informedness and Matthews correlation
    Two new metrics that measure the ranking of a limited top-k list: limited area under the curve and limited boolean Kendall's tau
    Guidelines for choosing a metric (see paper)
  • Thank You Very Much! Do not hesitate to contact me if you have any questions, comments or answers! Slides are available via e-mail or SlideShare.