SlideShare a Scribd company logo
Recommender Systems
    Challenges
      Best Practices
     Tutorial & Panel


       ACM RecSys 2012
           Dublin
         September 10, 2012
About us
•   Alan Said - PhD Student @ TU-Berlin
    o   Topics: RecSys Evaluation
    o   @alansaid
    o   URL: www.alansaid.com


•   Domonkos Tikk - CEO @ Gravity R&D
    o   Topics: Machine Learning methods for RecSys
    o   @domonkostikk
    o   http://www.tmit.bme.hu/tikk.domonkos


•   Andreas Hotho - Prof. @ Uni. Würzburg
    o   Topics: Data Mining, Information Retrieval, Web Science
    o   http://www.is.informatik.uni-wuerzburg.de/staff/hotho
General Motivation
"RecSys is nobody's home conference. We
  come from CHI, IUI, SIGIR, etc."
  Joe Konstan - RecSys 2010


RecSys is our home conference - we
should evaluate accordingly!
Outline
•   Tutorial
    o Introduction to concepts in challenges
    o Execution of a challenge
    o Conclusion

•   Panel
      Experiences of participating in and
      organizing challenges
         Yehuda Koren
         Darren Vengroff
         Torben Brodt
What is the motivation
for RecSys Challenges?
          Part 1
Setup - information overload




users


                      content of service
                          provider
        recommender
Motivation of stakeholders
find relevant content
easy navigation
serendipity, discovery

  user                                service

                                    increase revenue
                                    target user with
                    recom               the right content
                                    engage users
 facilitate goals of stakeholders
 get recognized
Evaluation in terms of the business
                           business
                           reporting




Online evaluation
   (A/B test)
                      Casting into a
                    research problem
Context of the contest
•   Selection of metrics
•   Domain dependent
•   Offline vs. online evaluation


•   IR centric evaluation
     o RMSE
     o MAP
     o F1
Latent user needs
Recsys Competition Highlights
                          •   Large scale
                          •   Organization
                          •   RMSE
•   3-stage setup         •   Prize
•   selection by review
•   runtime limits
•   real traffic
•   revenue increase
                          •   offline
                          •   MAP@500
                          •   metadata available
                          •   larger in dimensions
                          •   no ratings
Recurring Competitions
•   ACM KDD Cup (2007, 2011, 2012)
•   ECML/PKDD Discovery Challenge (2008
    onwards)
    o 2008 and 09: tag recommendation in social
      bookmarking (incl. online evaluation task)
    o 2011: video lectures
•   CAMRa (2010, 2011, 2012)
Does size matter?
•   Yes! – real world users
•   In research – to some extent
Research & Industry
Important for both
• Industry has the data and research needs
  data
• Industry needs better approaches but this
  costs
• Research has ideas but has no systems
  and/or data to do the evaluation

Don't exploit participants
Don't be too greedy
Running a Challenge
       Part 2
Standard Challenge Setting
•   organizer defines the recommender setting e.g.
    tag recommendation in BibSonomy
•   provide data
    o   with features or
    o   raw data
    o   construct your own data
•   fix the way to do the evaluation
•   define the goal e.g. reach a certain
    improvement (F1)
•   motivate people to participate:
    e.g. promise a lot of money ;-)
Typical contest settings
 •   offline
     o   everyone gets access to the dataset
     o   in principle it is a prediction task, the user can't be influenced
     o   privacy of the user within the data is a big issue
     o   results from offline experimentation have limited predictive power
         for online user behavior

 •   online
     o   after a first learning phase the recommender is plugged into a real
         system
     o   user can be influenced but only by the selected system
     o   comparison of different system is not completely fair

 •   further ways
     o   user study
Example online setting
(BibSonomy)




BALBY MARINHO, L. ; HOTHO, A. ; JÄSCHKE, R. ; NANOPOULOS, A. ; RENDLE, S. ; SCHMIDT-THIEME, L. ; STUMME, G. ; SYMEONIDIS, P.:
Recommender Systems for Social Tagging Systems : SPRINGER, 2012 (SpringerBriefs in Electrical and Computer Engineering). - ISBN 978-1-
4614-1893-1
Which evaluation measures?
•   Root Mean Squared Error (RMSE)
•   Mean Absolute Error (MAE)
•   Typical IR measures
    o   precision @ n-items
    o   recall @ n-items
    o   False Positive Rate
    o   F1 @ n-items
    o   Area Under the ROC Curve (AUC)
•   non-quality measures
    o   server answer time
    o   understandability of the results
Discussion of measures?
    RMSE - Precision
• RMSE is not necessarily the king of metrics
    as RMSE is easy to optimize on
•   What about Top-n?
•   but RMSE is not influenced by popularity as
    top-n

• What about user-centric stuff?
• Ranking-based measure in KDD Cup 2011,
    Track 2
Results influenced by ...

•   target of the recommendation (user, resources, etc...)
•   evaluation methodology (leave-one-out, time based split, random
    sample, cross validation)
•   evaluation measure
•   design of the application (online setting)
•   the selected part of the data and its preprocessing (e.g.
    p-core vs. long tail)
•   scalability vs. quality of the model
•   feature and content accessible and usable for the
    recommendation
Don't forget..
• the effort to organize a challenge is very big
• preparing data takes time
• answering questions takes even more time
• participants are creative, needs for reaction
• time to compute the evaluation and check the
    results
•   prepare proceedings with the outcome
•   ...
What have we learnt?
    Conclusion
        Part 3
Challenges are good since they...
•   ... are focused on solving a single problem
•   ... have many participants
•   ... create common evaluation criteria
•   ... have comparable results
•   ... bring real-world problems to research
•   ... make it easy to crown a winner
•   ... they are cheap (even with a 1M$ prize)
Is that the complete truth?




           No!
Is that the complete truth?
•   Why?
Because using standard information retrieval metrics we
cannot evaluate recommender system concepts like:
    • user interaction
    • perception
    • satisfaction
    • usefulness
    • any metric not based on accuracy/rating prediction
      and negative predictions
    • scalability
    • engineering
We can't catch everything offline
        Scalability

                      Presentation



                      Interaction
The difference between IR and RS
Information retrieval systems answer to a need


                 A Query
Recommender systems identify the user's needs
Should we organize more
challenges?
•   Yes - but before we do that, think of
    o What is the utility of Yet Another Dataset - aren't
      there enough already?
    o How do we create a real-world like challenge
    o How do we get real user feedback
Take home message
•   Real needs of users and content providers are better
    reflected in online evaluation

•   Consider technical limitations as well

•   Challenges advance the field a lot
    o Matrix factorization & ensemble methods in the
      Netflix Prize
    o Evaluation measure and objective in the KDD Cup
      2011
Related events at RecSys
•   Workshops
    o   Recommender Utility Evaluation
    o   RecSys Data Challenge
•   Paper Sessions
    o Multi-Objective Recommendation and Human
      Factors - Mon. 14:30
    o Implicit Feedback and User Preference - Tue. 11:00
    o Top-N Recommendation - Wed. 14:30

•   More challenges:
    o   www.recsyswiki.com/wiki/Category:Competition
Panel
Part 4
Panel
•   Torben Brodt
    o   Plista
    o   Organizing Plista Contest

•   Yehuda Koren
    o   Google
    o   Member of winning team of the Netflix Prize


•   Darren Vengroff
    o   RichRelevance
    o   Organizer of RecLab Prize
Questions
•   How does recommendation influence the
    user and system?
•   How can we quantify the effects of the UI?
•   How should we translate what we've
    presented into an actual challenge?
•   should we focus on the long tail or the short
    head?
•   Evaluation measures, click rate, wtf@k
•   How to evaluate conversion rate?

More Related Content

What's hot

Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Xavier Amatriain
 
Information Retrieval Models for Recommender Systems - PhD slides
Information Retrieval Models for Recommender Systems - PhD slidesInformation Retrieval Models for Recommender Systems - PhD slides
Information Retrieval Models for Recommender Systems - PhD slides
Daniel Valcarce
 
An Example of Predictive Analytics: Building a Recommendation Engine Using Py...
An Example of Predictive Analytics: Building a Recommendation Engine Using Py...An Example of Predictive Analytics: Building a Recommendation Engine Using Py...
An Example of Predictive Analytics: Building a Recommendation Engine Using Py...
PyData
 
Kdd 2014 Tutorial - the recommender problem revisited
Kdd 2014 Tutorial -  the recommender problem revisitedKdd 2014 Tutorial -  the recommender problem revisited
Kdd 2014 Tutorial - the recommender problem revisited
Xavier Amatriain
 
Aiinpractice2017deepaklongversion
Aiinpractice2017deepaklongversionAiinpractice2017deepaklongversion
Aiinpractice2017deepaklongversion
Deepak Agarwal
 
Social Recommender Systems
Social Recommender SystemsSocial Recommender Systems
Social Recommender Systems
guest77b0cd12
 
Product Recommendations Enhanced with Reviews
Product Recommendations Enhanced with ReviewsProduct Recommendations Enhanced with Reviews
Product Recommendations Enhanced with Reviews
maranlar
 
Recommender system
Recommender systemRecommender system
Recommender system
Nilotpal Pramanik
 
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix ScaleQcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix ScaleXavier Amatriain
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
Girish Khanzode
 
Recommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringRecommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filtering
Viet-Trung TRAN
 
Recsys2016 Tutorial by Xavier and Deepak
Recsys2016 Tutorial by Xavier and DeepakRecsys2016 Tutorial by Xavier and Deepak
Recsys2016 Tutorial by Xavier and Deepak
Deepak Agarwal
 
Collaborative Filtering using KNN
Collaborative Filtering using KNNCollaborative Filtering using KNN
Collaborative Filtering using KNN
Şeyda Hatipoğlu
 
Data Mining and Recommendation Systems
Data Mining and Recommendation SystemsData Mining and Recommendation Systems
Data Mining and Recommendation SystemsSalil Navgire
 
Survey of Recommendation Systems
Survey of Recommendation SystemsSurvey of Recommendation Systems
Survey of Recommendation Systemsyoualab
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
Carlos Castillo (ChaTo)
 
[Final]collaborative filtering and recommender systems
[Final]collaborative filtering and recommender systems[Final]collaborative filtering and recommender systems
[Final]collaborative filtering and recommender systems
Falitokiniaina Rabearison
 
Overview of recommender system
Overview of recommender systemOverview of recommender system
Overview of recommender system
Stanley Wang
 
An introduction to Recommender Systems
An introduction to Recommender SystemsAn introduction to Recommender Systems
An introduction to Recommender Systems
David Zibriczky
 
Recommendation System for Design Patterns in Software Development
Recommendation System for Design Patterns in Software DevelopmentRecommendation System for Design Patterns in Software Development
Recommendation System for Design Patterns in Software Development
Francis Palma
 

What's hot (20)

Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)
 
Information Retrieval Models for Recommender Systems - PhD slides
Information Retrieval Models for Recommender Systems - PhD slidesInformation Retrieval Models for Recommender Systems - PhD slides
Information Retrieval Models for Recommender Systems - PhD slides
 
An Example of Predictive Analytics: Building a Recommendation Engine Using Py...
An Example of Predictive Analytics: Building a Recommendation Engine Using Py...An Example of Predictive Analytics: Building a Recommendation Engine Using Py...
An Example of Predictive Analytics: Building a Recommendation Engine Using Py...
 
Kdd 2014 Tutorial - the recommender problem revisited
Kdd 2014 Tutorial -  the recommender problem revisitedKdd 2014 Tutorial -  the recommender problem revisited
Kdd 2014 Tutorial - the recommender problem revisited
 
Aiinpractice2017deepaklongversion
Aiinpractice2017deepaklongversionAiinpractice2017deepaklongversion
Aiinpractice2017deepaklongversion
 
Social Recommender Systems
Social Recommender SystemsSocial Recommender Systems
Social Recommender Systems
 
Product Recommendations Enhanced with Reviews
Product Recommendations Enhanced with ReviewsProduct Recommendations Enhanced with Reviews
Product Recommendations Enhanced with Reviews
 
Recommender system
Recommender systemRecommender system
Recommender system
 
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix ScaleQcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Recommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringRecommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filtering
 
Recsys2016 Tutorial by Xavier and Deepak
Recsys2016 Tutorial by Xavier and DeepakRecsys2016 Tutorial by Xavier and Deepak
Recsys2016 Tutorial by Xavier and Deepak
 
Collaborative Filtering using KNN
Collaborative Filtering using KNNCollaborative Filtering using KNN
Collaborative Filtering using KNN
 
Data Mining and Recommendation Systems
Data Mining and Recommendation SystemsData Mining and Recommendation Systems
Data Mining and Recommendation Systems
 
Survey of Recommendation Systems
Survey of Recommendation SystemsSurvey of Recommendation Systems
Survey of Recommendation Systems
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
[Final]collaborative filtering and recommender systems
[Final]collaborative filtering and recommender systems[Final]collaborative filtering and recommender systems
[Final]collaborative filtering and recommender systems
 
Overview of recommender system
Overview of recommender systemOverview of recommender system
Overview of recommender system
 
An introduction to Recommender Systems
An introduction to Recommender SystemsAn introduction to Recommender Systems
An introduction to Recommender Systems
 
Recommendation System for Design Patterns in Software Development
Recommendation System for Design Patterns in Software DevelopmentRecommendation System for Design Patterns in Software Development
Recommendation System for Design Patterns in Software Development
 

Similar to Best Practices in Recommender System Challenges

PAS: The Planning Quality Framework
PAS: The Planning Quality FrameworkPAS: The Planning Quality Framework
PAS: The Planning Quality Framework
PAS_Team
 
Establishing best practices to improve usefulness and usability of web interf...
Establishing best practices to improve usefulness and usability of web interf...Establishing best practices to improve usefulness and usability of web interf...
Establishing best practices to improve usefulness and usability of web interf...
DRIscience
 
ECIR Recommendation Challenges
ECIR Recommendation ChallengesECIR Recommendation Challenges
ECIR Recommendation Challenges
Daniel Kohlsdorf
 
Value stream mapping for complex processes (innovation, Lean, service design)
Value stream mapping for complex processes (innovation, Lean, service design) Value stream mapping for complex processes (innovation, Lean, service design)
Value stream mapping for complex processes (innovation, Lean, service design)
Teemu Toivonen
 
Comparative Recommender System Evaluation: Benchmarking Recommendation Frame...
Comparative Recommender System Evaluation: Benchmarking Recommendation Frame...Comparative Recommender System Evaluation: Benchmarking Recommendation Frame...
Comparative Recommender System Evaluation: Benchmarking Recommendation Frame...
Alan Said
 
Building Search and Personalization at Nordstrom Rack | Hautelook
Building Search and Personalization at Nordstrom Rack | HautelookBuilding Search and Personalization at Nordstrom Rack | Hautelook
Building Search and Personalization at Nordstrom Rack | Hautelook
Lucidworks
 
Unlocking the value of customer data
Unlocking the value of customer dataUnlocking the value of customer data
Unlocking the value of customer dataJanessa Lantz
 
Dlf 2012
Dlf 2012Dlf 2012
Dlf 2012
sherriberger
 
7.1 Mapping Your Processes to Deliver an Exceptional Student Experience
7.1 Mapping Your Processes to Deliver an Exceptional Student Experience7.1 Mapping Your Processes to Deliver an Exceptional Student Experience
7.1 Mapping Your Processes to Deliver an Exceptional Student Experience
TargetX
 
Downloads abc 2006 presentation downloads-ramesh_babu
Downloads abc 2006   presentation downloads-ramesh_babuDownloads abc 2006   presentation downloads-ramesh_babu
Downloads abc 2006 presentation downloads-ramesh_babu
Hem Rana
 
Advanced Project Data Analytics for Improved Project Delivery
Advanced Project Data Analytics for Improved Project DeliveryAdvanced Project Data Analytics for Improved Project Delivery
Advanced Project Data Analytics for Improved Project Delivery
Mark Constable
 
Human computation, crowdsourcing and social: An industrial perspective
Human computation, crowdsourcing and social: An industrial perspectiveHuman computation, crowdsourcing and social: An industrial perspective
Human computation, crowdsourcing and social: An industrial perspective
oralonso
 
Knowledge Management in Healthcare Analytics
Knowledge Management in Healthcare AnalyticsKnowledge Management in Healthcare Analytics
Knowledge Management in Healthcare Analytics
Gregory Nelson
 
The art of project estimation
The art of project estimationThe art of project estimation
The art of project estimation
Return on Intelligence
 
PQF Overview
PQF OverviewPQF Overview
PQF Overview
Martin Hutchings
 
Ch 3
Ch   3Ch   3
Crafting a Compelling Data Science Resume
Crafting a Compelling Data Science ResumeCrafting a Compelling Data Science Resume
Crafting a Compelling Data Science Resume
Arushi Prakash, Ph.D.
 
Modern Perspectives on Recommender Systems and their Applications in Mendeley
Modern Perspectives on Recommender Systems and their Applications in MendeleyModern Perspectives on Recommender Systems and their Applications in Mendeley
Modern Perspectives on Recommender Systems and their Applications in Mendeley
Maya Hristakeva
 
Group 1 Report CRISP - DM METHODOLOGY.pptx
Group 1 Report CRISP - DM METHODOLOGY.pptxGroup 1 Report CRISP - DM METHODOLOGY.pptx
Group 1 Report CRISP - DM METHODOLOGY.pptx
ellamangapis2003
 
Usability Testing for Qualitative Researchers - QRCA NYC Chapter event
Usability Testing for Qualitative Researchers - QRCA NYC Chapter eventUsability Testing for Qualitative Researchers - QRCA NYC Chapter event
Usability Testing for Qualitative Researchers - QRCA NYC Chapter event
Kay Aubrey
 

Similar to Best Practices in Recommender System Challenges (20)

PAS: The Planning Quality Framework
PAS: The Planning Quality FrameworkPAS: The Planning Quality Framework
PAS: The Planning Quality Framework
 
Establishing best practices to improve usefulness and usability of web interf...
Establishing best practices to improve usefulness and usability of web interf...Establishing best practices to improve usefulness and usability of web interf...
Establishing best practices to improve usefulness and usability of web interf...
 
ECIR Recommendation Challenges
ECIR Recommendation ChallengesECIR Recommendation Challenges
ECIR Recommendation Challenges
 
Value stream mapping for complex processes (innovation, Lean, service design)
Value stream mapping for complex processes (innovation, Lean, service design) Value stream mapping for complex processes (innovation, Lean, service design)
Value stream mapping for complex processes (innovation, Lean, service design)
 
Comparative Recommender System Evaluation: Benchmarking Recommendation Frame...
Comparative Recommender System Evaluation: Benchmarking Recommendation Frame...Comparative Recommender System Evaluation: Benchmarking Recommendation Frame...
Comparative Recommender System Evaluation: Benchmarking Recommendation Frame...
 
Building Search and Personalization at Nordstrom Rack | Hautelook
Building Search and Personalization at Nordstrom Rack | HautelookBuilding Search and Personalization at Nordstrom Rack | Hautelook
Building Search and Personalization at Nordstrom Rack | Hautelook
 
Unlocking the value of customer data
Unlocking the value of customer dataUnlocking the value of customer data
Unlocking the value of customer data
 
Dlf 2012
Dlf 2012Dlf 2012
Dlf 2012
 
7.1 Mapping Your Processes to Deliver an Exceptional Student Experience
7.1 Mapping Your Processes to Deliver an Exceptional Student Experience7.1 Mapping Your Processes to Deliver an Exceptional Student Experience
7.1 Mapping Your Processes to Deliver an Exceptional Student Experience
 
Downloads abc 2006 presentation downloads-ramesh_babu
Downloads abc 2006   presentation downloads-ramesh_babuDownloads abc 2006   presentation downloads-ramesh_babu
Downloads abc 2006 presentation downloads-ramesh_babu
 
Advanced Project Data Analytics for Improved Project Delivery
Advanced Project Data Analytics for Improved Project DeliveryAdvanced Project Data Analytics for Improved Project Delivery
Advanced Project Data Analytics for Improved Project Delivery
 
Human computation, crowdsourcing and social: An industrial perspective
Human computation, crowdsourcing and social: An industrial perspectiveHuman computation, crowdsourcing and social: An industrial perspective
Human computation, crowdsourcing and social: An industrial perspective
 
Knowledge Management in Healthcare Analytics
Knowledge Management in Healthcare AnalyticsKnowledge Management in Healthcare Analytics
Knowledge Management in Healthcare Analytics
 
The art of project estimation
The art of project estimationThe art of project estimation
The art of project estimation
 
PQF Overview
PQF OverviewPQF Overview
PQF Overview
 
Ch 3
Ch   3Ch   3
Ch 3
 
Crafting a Compelling Data Science Resume
Crafting a Compelling Data Science ResumeCrafting a Compelling Data Science Resume
Crafting a Compelling Data Science Resume
 
Modern Perspectives on Recommender Systems and their Applications in Mendeley
Modern Perspectives on Recommender Systems and their Applications in MendeleyModern Perspectives on Recommender Systems and their Applications in Mendeley
Modern Perspectives on Recommender Systems and their Applications in Mendeley
 
Group 1 Report CRISP - DM METHODOLOGY.pptx
Group 1 Report CRISP - DM METHODOLOGY.pptxGroup 1 Report CRISP - DM METHODOLOGY.pptx
Group 1 Report CRISP - DM METHODOLOGY.pptx
 
Usability Testing for Qualitative Researchers - QRCA NYC Chapter event
Usability Testing for Qualitative Researchers - QRCA NYC Chapter eventUsability Testing for Qualitative Researchers - QRCA NYC Chapter event
Usability Testing for Qualitative Researchers - QRCA NYC Chapter event
 

More from Alan Said

Replication of Recommender Systems Research
Replication of Recommender Systems ResearchReplication of Recommender Systems Research
Replication of Recommender Systems Research
Alan Said
 
The Magic Barrier of Recommender Systems - No Magic, Just Ratings
The Magic Barrier of Recommender Systems - No Magic, Just RatingsThe Magic Barrier of Recommender Systems - No Magic, Just Ratings
The Magic Barrier of Recommender Systems - No Magic, Just Ratings
Alan Said
 
A Top-N Recommender System Evaluation Protocol Inspired by Deployed Systems
A Top-N Recommender System Evaluation Protocol Inspired by Deployed SystemsA Top-N Recommender System Evaluation Protocol Inspired by Deployed Systems
A Top-N Recommender System Evaluation Protocol Inspired by Deployed Systems
Alan Said
 
Information Retrieval and User-centric Recommender System Evaluation
Information Retrieval and User-centric Recommender System EvaluationInformation Retrieval and User-centric Recommender System Evaluation
Information Retrieval and User-centric Recommender System Evaluation
Alan Said
 
User-Centric Evaluation of a K-Furthest Neighbor Collaborative Filtering Reco...
User-Centric Evaluation of a K-Furthest Neighbor Collaborative Filtering Reco...User-Centric Evaluation of a K-Furthest Neighbor Collaborative Filtering Reco...
User-Centric Evaluation of a K-Furthest Neighbor Collaborative Filtering Reco...
Alan Said
 
A 3D Approach to Recommender System Evaluation
A 3D Approach to Recommender System EvaluationA 3D Approach to Recommender System Evaluation
A 3D Approach to Recommender System Evaluation
Alan Said
 
State of RecSys: Recap of RecSys 2012
State of RecSys: Recap of RecSys 2012State of RecSys: Recap of RecSys 2012
State of RecSys: Recap of RecSys 2012
Alan Said
 
RecSysChallenge Opening
RecSysChallenge OpeningRecSysChallenge Opening
RecSysChallenge Opening
Alan Said
 
Estimating the Magic Barrier of Recommender Systems: A User Study
Estimating the Magic Barrier of Recommender Systems: A User StudyEstimating the Magic Barrier of Recommender Systems: A User Study
Estimating the Magic Barrier of Recommender Systems: A User Study
Alan Said
 
Users and Noise: The Magic Barrier of Recommender Systems
Users and Noise: The Magic Barrier of Recommender SystemsUsers and Noise: The Magic Barrier of Recommender Systems
Users and Noise: The Magic Barrier of Recommender Systems
Alan Said
 
Analyzing Weighting Schemes in Collaborative Filtering: Cold Start, Post Cold...
Analyzing Weighting Schemes in Collaborative Filtering: Cold Start, Post Cold...Analyzing Weighting Schemes in Collaborative Filtering: Cold Start, Post Cold...
Analyzing Weighting Schemes in Collaborative Filtering: Cold Start, Post Cold...
Alan Said
 
CaRR 2012 Opening Presentation
CaRR 2012 Opening PresentationCaRR 2012 Opening Presentation
CaRR 2012 Opening Presentation
Alan Said
 
Personalizing Tags: A Folksonomy-like Approach for Recommending Movies
Personalizing Tags: A Folksonomy-like Approach for Recommending MoviesPersonalizing Tags: A Folksonomy-like Approach for Recommending Movies
Personalizing Tags: A Folksonomy-like Approach for Recommending Movies
Alan Said
 
Inferring Contextual User Profiles - Improving Recommender Performance
Inferring Contextual User Profiles - Improving Recommender PerformanceInferring Contextual User Profiles - Improving Recommender Performance
Inferring Contextual User Profiles - Improving Recommender Performance
Alan Said
 
Using Social- and Pseudo-Social Networks to Improve Recommendation Quality
Using Social- and Pseudo-Social Networks to Improve Recommendation QualityUsing Social- and Pseudo-Social Networks to Improve Recommendation Quality
Using Social- and Pseudo-Social Networks to Improve Recommendation Quality
Alan Said
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
Alan Said
 

More from Alan Said (16)

Replication of Recommender Systems Research
Replication of Recommender Systems ResearchReplication of Recommender Systems Research
Replication of Recommender Systems Research
 
The Magic Barrier of Recommender Systems - No Magic, Just Ratings
The Magic Barrier of Recommender Systems - No Magic, Just RatingsThe Magic Barrier of Recommender Systems - No Magic, Just Ratings
The Magic Barrier of Recommender Systems - No Magic, Just Ratings
 
A Top-N Recommender System Evaluation Protocol Inspired by Deployed Systems
A Top-N Recommender System Evaluation Protocol Inspired by Deployed SystemsA Top-N Recommender System Evaluation Protocol Inspired by Deployed Systems
A Top-N Recommender System Evaluation Protocol Inspired by Deployed Systems
 
Information Retrieval and User-centric Recommender System Evaluation
Information Retrieval and User-centric Recommender System EvaluationInformation Retrieval and User-centric Recommender System Evaluation
Information Retrieval and User-centric Recommender System Evaluation
 
User-Centric Evaluation of a K-Furthest Neighbor Collaborative Filtering Reco...
User-Centric Evaluation of a K-Furthest Neighbor Collaborative Filtering Reco...User-Centric Evaluation of a K-Furthest Neighbor Collaborative Filtering Reco...
User-Centric Evaluation of a K-Furthest Neighbor Collaborative Filtering Reco...
 
A 3D Approach to Recommender System Evaluation
A 3D Approach to Recommender System EvaluationA 3D Approach to Recommender System Evaluation
A 3D Approach to Recommender System Evaluation
 
State of RecSys: Recap of RecSys 2012
State of RecSys: Recap of RecSys 2012State of RecSys: Recap of RecSys 2012
State of RecSys: Recap of RecSys 2012
 
RecSysChallenge Opening
RecSysChallenge OpeningRecSysChallenge Opening
RecSysChallenge Opening
 
Estimating the Magic Barrier of Recommender Systems: A User Study
Estimating the Magic Barrier of Recommender Systems: A User StudyEstimating the Magic Barrier of Recommender Systems: A User Study
Estimating the Magic Barrier of Recommender Systems: A User Study
 
Users and Noise: The Magic Barrier of Recommender Systems
Users and Noise: The Magic Barrier of Recommender SystemsUsers and Noise: The Magic Barrier of Recommender Systems
Users and Noise: The Magic Barrier of Recommender Systems
 
Analyzing Weighting Schemes in Collaborative Filtering: Cold Start, Post Cold...
Analyzing Weighting Schemes in Collaborative Filtering: Cold Start, Post Cold...Analyzing Weighting Schemes in Collaborative Filtering: Cold Start, Post Cold...
Analyzing Weighting Schemes in Collaborative Filtering: Cold Start, Post Cold...
 
CaRR 2012 Opening Presentation
CaRR 2012 Opening PresentationCaRR 2012 Opening Presentation
CaRR 2012 Opening Presentation
 
Personalizing Tags: A Folksonomy-like Approach for Recommending Movies
Personalizing Tags: A Folksonomy-like Approach for Recommending MoviesPersonalizing Tags: A Folksonomy-like Approach for Recommending Movies
Personalizing Tags: A Folksonomy-like Approach for Recommending Movies
 
Inferring Contextual User Profiles - Improving Recommender Performance
Inferring Contextual User Profiles - Improving Recommender PerformanceInferring Contextual User Profiles - Improving Recommender Performance
Inferring Contextual User Profiles - Improving Recommender Performance
 
Using Social- and Pseudo-Social Networks to Improve Recommendation Quality
Using Social- and Pseudo-Social Networks to Improve Recommendation QualityUsing Social- and Pseudo-Social Networks to Improve Recommendation Quality
Using Social- and Pseudo-Social Networks to Improve Recommendation Quality
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 

Recently uploaded

Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
CatarinaPereira64715
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 

Recently uploaded (20)

Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 

Best Practices in Recommender System Challenges

  • 1. Recommender Systems Challenges Best Practices Tutorial & Panel ACM RecSys 2012 Dublin September 10, 2012
  • 2. About us • Alan Said - PhD Student @ TU-Berlin o Topics: RecSys Evaluation o @alansaid o URL: www.alansaid.com • Domonkos Tikk - CEO @ Gravity R&D o Topics: Machine Learning methods for RecSys o @domonkostikk o http://www.tmit.bme.hu/tikk.domonkos • Andreas Hotho - Prof. @ Uni. Würzburg o Topics: Data Mining, Information Retrieval, Web Science o http://www.is.informatik.uni-wuerzburg.de/staff/hotho
  • 3. General Motivation "RecSys is nobody's home conference. We come from CHI, IUI, SIGIR, etc." Joe Konstan - RecSys 2010 RecSys is our home conference - we should evaluate accordingly!
  • 4. Outline • Tutorial o Introduction to concepts in challenges o Execution of a challenge o Conclusion • Panel Experiences of participating in and organizing challenges  Yehuda Koren  Darren Vengroff  Torben Brodt
  • 5. What is the motivation for RecSys Challenges? Part 1
  • 6. Setup - information overload users content of service provider recommender
  • 7. Motivation of stakeholders find relevant content easy navigation serendipity, discovery user service increase revenue target user with recom the right content engage users facilitate goals of stakeholders get recognized
  • 8. Evaluation in terms of the business business reporting Online evaluation (A/B test) Casting into a research problem
  • 9. Context of the contest • Selection of metrics • Domain dependent • Offline vs. online evaluation • IR centric evaluation o RMSE o MAP o F1
  • 11. Recsys Competition Highlights • Large scale • Organization • RMSE • 3-stage setup • Prize • selection by review • runtime limits • real traffic • revenue increase • offline • MAP@500 • metadata available • larger in dimensions • no ratings
  • 12. Recurring Competitions • ACM KDD Cup (2007, 2011, 2012) • ECML/PKDD Discovery Challenge (2008 onwards) o 2008 and 09: tag recommendation in social bookmarking (incl. online evaluation task) o 2011: video lectures • CAMRa (2010, 2011, 2012)
  • 13. Does size matter? • Yes! – real world users • In research – to some extent
  • 14. Research & Industry Important for both • Industry has the data and research needs data • Industry needs better approaches but this costs • Research has ideas but has no systems and/or data to do the evaluation Don't exploit participants Don't be too greedy
  • 16. Standard Challenge Setting • organizer defines the recommender setting e.g. tag recommendation in BibSonomy • provide data o with features or o raw data o construct your own data • fix the way to do the evaluation • define the goal e.g. reach a certain improvement (F1) • motivate people to participate: e.g. promise a lot of money ;-)
  • 17. Typical contest settings • offline o everyone gets access to the dataset o in principle it is a prediction task, the user can't be influenced o privacy of the user within the data is a big issue o results from offline experimentation have limited predictive power for online user behavior • online o after a first learning phase the recommender is plugged into a real system o user can be influenced but only by the selected system o comparison of different system is not completely fair • further ways o user study
  • 18. Example online setting (BibSonomy) BALBY MARINHO, L. ; HOTHO, A. ; JÄSCHKE, R. ; NANOPOULOS, A. ; RENDLE, S. ; SCHMIDT-THIEME, L. ; STUMME, G. ; SYMEONIDIS, P.: Recommender Systems for Social Tagging Systems : SPRINGER, 2012 (SpringerBriefs in Electrical and Computer Engineering). - ISBN 978-1- 4614-1893-1
  • 19. Which evaluation measures? • Root Mean Squared Error (RMSE) • Mean Absolute Error (MAE) • Typical IR measures o precision @ n-items o recall @ n-items o False Positive Rate o F1 @ n-items o Area Under the ROC Curve (AUC) • non-quality measures o server answer time o understandability of the results
  • 20. Discussion of measures? RMSE - Precision • RMSE is not necessarily the king of metrics as RMSE is easy to optimize on • What about Top-n? • but RMSE is not influenced by popularity as top-n • What about user-centric stuff? • Ranking-based measure in KDD Cup 2011, Track 2
  • 21. Results influenced by ... • target of the recommendation (user, resources, etc...) • evaluation methodology (leave-one-out, time based split, random sample, cross validation) • evaluation measure • design of the application (online setting) • the selected part of the data and its preprocessing (e.g. p-core vs. long tail) • scalability vs. quality of the model • feature and content accessible and usable for the recommendation
  • 22. Don't forget.. • the effort to organize a challenge is very big • preparing data takes time • answering questions takes even more time • participants are creative, needs for reaction • time to compute the evaluation and check the results • prepare proceedings with the outcome • ...
  • 23. What have we learnt? Conclusion Part 3
  • 24. Challenges are good since they... • ... are focused on solving a single problem • ... have many participants • ... create common evaluation criteria • ... have comparable results • ... bring real-world problems to research • ... make it easy to crown a winner • ... they are cheap (even with a 1M$ prize)
  • 25. Is that the complete truth? No!
  • 26. Is that the complete truth? • Why? Because using standard information retrieval metrics we cannot evaluate recommender system concepts like: • user interaction • perception • satisfaction • usefulness • any metric not based on accuracy/rating prediction and negative predictions • scalability • engineering
  • 27. We can't catch everything offline Scalability Presentation Interaction
  • 28. The difference between IR and RS Information retrieval systems answer to a need A Query Recommender systems identify the user's needs
  • 29. Should we organize more challenges? • Yes - but before we do that, think of o What is the utility of Yet Another Dataset - aren't there enough already? o How do we create a real-world like challenge o How do we get real user feedback
  • 30. Take home message • Real needs of users and content providers are better reflected in online evaluation • Consider technical limitations as well • Challenges advance the field a lot o Matrix factorization & ensemble methods in the Netflix Prize o Evaluation measure and objective in the KDD Cup 2011
  • 31. Related events at RecSys • Workshops o Recommender Utility Evaluation o RecSys Data Challenge • Paper Sessions o Multi-Objective Recommendation and Human Factors - Mon. 14:30 o Implicit Feedback and User Preference - Tue. 11:00 o Top-N Recommendation - Wed. 14:30 • More challenges: o www.recsyswiki.com/wiki/Category:Competition
  • 33. Panel • Torben Brodt o Plista o Organizing Plista Contest • Yehuda Koren o Google o Member of winning team of the Netflix Prize • Darren Vengroff o RichRelevance o Organizer of RecLab Prize
  • 34. Questions • How does recommendation influence the user and system? • How can we quantify the effects of the UI? • How should we translate what we've presented into an actual challenge? • should we focus on the long tail or the short head? • Evaluation measures, click rate, wtf@k • How to evaluate conversion rate?