Recommendation System: Session 2
Eng. Maryam Mostafa
Table of contents for today
• Evaluation metrics in recommenders.
• What makes a good recommender system.
• Example on a rank-based recommender.
• Task 1.
• Example on a collaborative filtering recommender.
• Task 2.
• Try Content-Based filtering at home.
• Offline Evaluation: Offline evaluation is done in much the same way we evaluate
machine learning models: we usually have a fixed dataset, collected and
immutable before the evaluation begins, and the dataset is then split into two
parts, a train set and a test set. The RS is trained on the train set and then
evaluated on the test set.
• Online Evaluation: As the name states, online evaluation is performed online,
with real users interacting with different versions or algorithms of an RS, and the
evaluation is performed by collecting metrics associated with user behavior in
real time.
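As a rough illustration (not part of the slides), the sketch below shows the offline workflow described above: a fixed ratings dataset is split into train and test sets, a simple model is fit on the train split, and its predictions are scored on the held-out test split. The column names (user_id, item_id, rating), the toy data, and the item-mean baseline are assumptions made only for this example.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Assumed ratings table: one row per (user, item, rating) interaction.
ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3, 3, 3, 4],
    "item_id": [10, 20, 10, 30, 20, 30, 40, 10],
    "rating":  [4.0, 3.0, 5.0, 2.0, 4.0, 3.5, 1.0, 4.5],
})

# Offline evaluation: split the fixed dataset into train and test sets.
train, test = train_test_split(ratings, test_size=0.25, random_state=42)

# A deliberately simple "model": predict each item's mean training rating,
# falling back to the global mean for items unseen in the train split.
global_mean = train["rating"].mean()
item_means = train.groupby("item_id")["rating"].mean()
predictions = test["item_id"].map(item_means).fillna(global_mean)

# Score the model only on the held-out interactions.
rmse = np.sqrt(((test["rating"] - predictions) ** 2).mean())
print(f"Test RMSE: {rmse:.4f}")
```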
When do I perform one or the other?
Both approaches have their pros and cons.

Offline:
• Pros - This type of evaluation is easier to set up. With many published datasets and their respective
ratings or evaluations already available, people can easily set up and evaluate their algorithms by comparing their
output with the expected output from the already published results. Because the dataset and the user interactions
in it (all existing ratings in the dataset) are fixed, the results of an offline evaluation are also easier to reproduce
than those of an online evaluation.
• Cons - There is some debate about the validity of offline evaluations. The most criticized
aspect is evaluating the performance of the trained algorithm on a held-out test set. The
idea of an RS is to provide new recommendations that the user probably doesn't know yet. The problem with testing on
a test set is that we must already have the user's evaluation for each item/recommendation, i.e. we end up testing
only items we are sure the user knows. Worse, if the RS recommends an item the user
hadn't evaluated yet but that could be a good recommendation, we penalize it because it isn't in our
test set. In the end, we end up penalizing the RS for doing its job.
Online:
• Pros - In contrast to offline evaluations, in an online context we can collect real-time
user interactions with the RS, such as reviews, clicks, and preferences. This gives a
much fuller picture when evaluating the RS's performance. Moreover, because we are evaluating real-time
data instead of a static dataset, we can run further analyses if desired.
• Cons - Dynamic real-time data also brings a downside: the reproducibility of
the experiment is worse than with a static script and dataset. In addition, preparing
(and possibly even creating) the environment to test the RS takes considerably
more time to set up.
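Purely as an illustration of the kind of real-time behavioral signal collected online (the log schema and variant labels below are assumptions, not something defined in the slides), this sketch aggregates logged impressions and clicks per recommender variant into a click-through rate, the sort of metric an online (A/B) evaluation would compare.

```python
import pandas as pd

# Assumed interaction log collected from the live system: each row is one
# recommendation shown to a user, with a flag indicating whether it was clicked.
events = pd.DataFrame({
    "variant": ["A", "A", "A", "B", "B", "B", "B"],
    "user_id": [1, 2, 3, 1, 2, 3, 4],
    "item_id": [10, 20, 30, 10, 40, 20, 30],
    "clicked": [1, 0, 0, 1, 1, 0, 1],
})

# Click-through rate per variant: clicks / impressions.
ctr = events.groupby("variant")["clicked"].agg(clicks="sum", impressions="count")
ctr["ctr"] = ctr["clicks"] / ctr["impressions"]
print(ctr)
```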
DIFFERENT OFFLINE METRICS
The different offline metrics and other measures used to evaluate our Recommendation System are
described below.
Note
• These metrics are not meaningful on their own in the case of Recommendation Systems,
i.e. an RMSE value of 0.8766 for one algorithm doesn't mean anything until there is
another algorithm with another RMSE value against which we can compare our
current RMSE value.
• MSE or RMSE on its own doesn't matter much in the real world. What matters most is which
movies you put in front of a user in the top-N recommendations and how users react
to them.
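To make the top-N point concrete, here is a minimal sketch of precision@k, the fraction of the top-k recommended items the user actually liked. The item names, the relevance set, and the k=3 cutoff are invented for illustration; two models can have similar RMSE yet put very different items in the top N.

```python
def precision_at_k(recommended, relevant, k=3):
    """Fraction of the top-k recommended items that the user actually liked."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k

# Hypothetical example: a ranked recommendation list vs. items the user liked.
recommended = ["movie_7", "movie_2", "movie_9", "movie_4", "movie_1"]
relevant = {"movie_2", "movie_4", "movie_8"}

print(precision_at_k(recommended, relevant, k=3))  # 1 hit out of 3 -> 0.333...
```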
WHAT TO FOCUS ON — WHICH METRIC?
Given that we have covered various metrics and dimensions for
evaluating our Recommendation System, you might be wondering which
metric is the best. Well, it "depends". There are many factors to
consider before giving priority to one metric over another.
Metrics must be looked at together, and we must understand the trade-offs
between them. We must also focus on the requirements and the main
objective for building the Recommendation System.
Python code implementing each of the above metrics can be found in
a Kaggle kernel (here).
Now let's practice
Thank you
