Best Practices in Recommender System Challenges

Recommender Systems Challenges
Best Practices
Tutorial & Panel

ACM RecSys 2012
Dublin, September 10, 2012
About us
•   Alan Said - PhD Student @ TU-Berlin
    o   Topics: RecSys Evaluation
    o   @alansaid
    o   URL: www.alansaid.com


•   Domonkos Tikk - CEO @ Gravity R&D
    o   Topics: Machine Learning methods for RecSys
    o   @domonkostikk
    o   http://www.tmit.bme.hu/tikk.domonkos


•   Andreas Hotho - Prof. @ Uni. Würzburg
    o   Topics: Data Mining, Information Retrieval, Web Science
    o   http://www.is.informatik.uni-wuerzburg.de/staff/hotho
General Motivation
"RecSys is nobody's home conference. We
  come from CHI, IUI, SIGIR, etc."
  Joe Konstan - RecSys 2010


RecSys is our home conference - we
should evaluate accordingly!
Outline
•   Tutorial
    o Introduction to concepts in challenges
    o Execution of a challenge
    o Conclusion

•   Panel
    o Experiences of participating in and
      organizing challenges
        Yehuda Koren
        Darren Vengroff
        Torben Brodt
What is the motivation
for RecSys Challenges?
          Part 1
Setup - information overload

(Diagram: the recommender mediates between users and the content of the service provider.)
Motivation of stakeholders

(Diagram: user, recommender, and service, with the goals of each.)
•   user: find relevant content, easy navigation, serendipity, discovery
•   service: increase revenue, target users with the right content, engage users
•   recommender: facilitate the goals of both stakeholders, get recognized
Evaluation in terms of the business

(Diagram: business reporting is fed both by online evaluation (A/B tests) and by casting the business problem into a research problem.)
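
A minimal sketch of the A/B-test side of this picture, assuming hash-based bucketing and click-through rate as the business metric (both are illustrative choices, not from the slides); Python:

    import hashlib

    def assign_variant(user_id, variants=("control", "candidate")):
        """Deterministically bucket a user so they always see the same recommender."""
        digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
        return variants[int(digest, 16) % len(variants)]

    def ctr(clicks, impressions):
        """Click-through rate per variant, compared after the test period."""
        return clicks / impressions if impressions else 0.0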
Context of the contest
•   Selection of metrics
•   Domain dependent
•   Offline vs. online evaluation


•   IR-centric evaluation
     o RMSE
     o MAP
     o F1
Latent user needs
RecSys Competition Highlights

(The slide contrasts three competitions side by side; the column headers were images and did not survive extraction, so the bullet groups below are listed per contest without names.)
•   Contest A: large scale, organization, RMSE, prize
•   Contest B: 3-stage setup, selection by review, runtime limits, real traffic, revenue increase
•   Contest C: offline, MAP@500, metadata available, larger in dimensions, no ratings
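
MAP@500, mentioned above, is mean average precision truncated at 500 recommendations per user. A hedged sketch of the usual definition (contests differ in how they normalize AP, so check the organizer's spec):

    def average_precision_at_k(recommended, relevant, k=500):
        """AP@k for one user: precision averaged over the ranks of the hits."""
        hits, score = 0, 0.0
        for rank, item in enumerate(recommended[:k], start=1):
            if item in relevant:
                hits += 1
                score += hits / rank
        # One common normalization; some contests divide by len(relevant) instead.
        return score / min(len(relevant), k) if relevant else 0.0

    def map_at_k(rec_lists, rel_sets, k=500):
        """MAP@k: AP@k averaged over all users."""
        return sum(average_precision_at_k(r, s, k)
                   for r, s in zip(rec_lists, rel_sets)) / len(rec_lists)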
Recurring Competitions
•   ACM KDD Cup (2007, 2011, 2012)
•   ECML/PKDD Discovery Challenge (2008
    onwards)
    o 2008 and 2009: tag recommendation in social
      bookmarking (incl. an online evaluation task)
    o 2011: video lectures
•   CAMRa (2010, 2011, 2012)
Does size matter?
•   Yes! – real-world users
•   In research – to some extent
Research & Industry
Important for both
• Industry has the data; research needs the
  data
• Industry needs better approaches, but
  developing them costs
• Research has the ideas, but lacks the systems
  and/or data to do the evaluation

Don't exploit participants
Don't be too greedy
Running a Challenge
       Part 2
Standard Challenge Setting
•   the organizer defines the recommendation setting, e.g.
    tag recommendation in BibSonomy
•   provide data
    o   with features, or
    o   raw data, or
    o   let participants construct their own data
•   fix the evaluation procedure
•   define the goal, e.g. reach a certain
    improvement (F1)
•   motivate people to participate:
    e.g. promise a lot of money ;-)
Typical contest settings
 •   offline
     o   everyone gets access to the dataset
     o   in principle it is a prediction task; the user can't be influenced
         (see the scoring sketch after this list)
     o   privacy of the users in the data is a big issue
     o   results from offline experimentation have limited predictive power
         for online user behavior

 •   online
     o   after a first learning phase the recommender is plugged into a real
         system
     o   the user can be influenced, but only by the selected system
     o   comparison of different systems is not completely fair

 •   further ways
     o   user study
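
The scoring sketch referenced above: in the offline setting, organizers typically hold out part of the data, participants submit one ranked list per user, and a script scores the submission against the hidden test set. A minimal sketch; the submission format (user<TAB>item1,item2,...) is an assumption for illustration:

    def load_submission(path):
        """Parse the assumed 'user<TAB>item1,item2,...' format into {user: [items]}."""
        submission = {}
        with open(path) as f:
            for line in f:
                user, items = line.rstrip("\n").split("\t")
                submission[user] = items.split(",")
        return submission

    def score(submission, test_set, metric):
        """Average a per-user metric over the held-out relevant items."""
        per_user = [metric(submission.get(user, []), relevant)
                    for user, relevant in test_set.items()]
        return sum(per_user) / len(per_user)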
Example online setting (BibSonomy)

(Figure: the tag recommender integrated in BibSonomy; see the reference below.)

Balby Marinho, L.; Hotho, A.; Jäschke, R.; Nanopoulos, A.; Rendle, S.; Schmidt-Thieme, L.; Stumme, G.; Symeonidis, P.: Recommender Systems for Social Tagging Systems. Springer, 2012 (SpringerBriefs in Electrical and Computer Engineering). ISBN 978-1-4614-1893-1
Which evaluation measures?
•   Root Mean Squared Error (RMSE)
•   Mean Absolute Error (MAE)
•   Typical IR measures
    o   precision @ n-items
    o   recall @ n-items
    o   False Positive Rate
    o   F1 @ n-items
    o   Area Under the ROC Curve (AUC)
•   non-quality measures
    o   server answer time
    o   understandability of the results
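
For concreteness, a sketch of a few of the measures above for a single user; the top-n ones take a ranked list and the set of held-out relevant items (names and conventions are illustrative, e.g. precision@n here divides by n even for shorter lists):

    from math import sqrt

    def rmse(predicted, actual):
        """Root mean squared error over paired rating predictions."""
        return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

    def mae(predicted, actual):
        """Mean absolute error over paired rating predictions."""
        return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

    def precision_at_n(ranked, relevant, n):
        return len(set(ranked[:n]) & relevant) / n

    def recall_at_n(ranked, relevant, n):
        return len(set(ranked[:n]) & relevant) / len(relevant)

    def f1_at_n(ranked, relevant, n):
        p, r = precision_at_n(ranked, relevant, n), recall_at_n(ranked, relevant, n)
        return 2 * p * r / (p + r) if p + r else 0.0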
Discussion of measures:
    RMSE vs. Precision
• RMSE is not necessarily the king of metrics;
  it is merely easy to optimize for
• What about top-n?
• Unlike top-n measures, RMSE is not
  influenced by item popularity
• What about user-centric aspects?
• A ranking-based measure was used in KDD Cup
  2011, Track 2
Results influenced by ...

•   target of the recommendation (user, resources, etc...)
•   evaluation methodology (leave-one-out, time-based split, random
    sample, cross-validation; see the split sketch after this list)
•   evaluation measure
•   design of the application (online setting)
•   the selected part of the data and its preprocessing (e.g.
    p-core vs. long tail)
•   scalability vs. quality of the model
•   features and content accessible and usable for the
    recommendation
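
The split sketch referenced above: a random sample and a time-based split of the same log can rank systems differently, because a random split lets the model "see the future". A sketch under the assumption that each event carries a "timestamp" field:

    import random

    def random_split(events, test_ratio=0.2, seed=42):
        """Random holdout: ignores time ordering."""
        rng = random.Random(seed)
        shuffled = list(events)
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * (1 - test_ratio))
        return shuffled[:cut], shuffled[cut:]

    def time_split(events, test_ratio=0.2):
        """Temporal holdout: train on the past, test on the most recent events."""
        ordered = sorted(events, key=lambda e: e["timestamp"])  # assumed field
        cut = int(len(ordered) * (1 - test_ratio))
        return ordered[:cut], ordered[cut:]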
Don't forget...
• the effort to organize a challenge is very big
• preparing data takes time
• answering questions takes even more time
• participants are creative; organizers need to react
• time to compute the evaluation and check the
    results
•   prepare proceedings with the outcome
•   ...
What have we learnt?
    Conclusion
        Part 3
Challenges are good since they...
•   ... are focused on solving a single problem
•   ... have many participants
•   ... create common evaluation criteria
•   ... have comparable results
•   ... bring real-world problems to research
•   ... make it easy to crown a winner
•   ... are cheap (even with a $1M prize)
Is that the complete truth?




           No!
Is that the complete truth?
•   Why?
Because using standard information retrieval metrics we
cannot evaluate recommender system concepts like:
    • user interaction
    • perception
    • satisfaction
    • usefulness
    • any metric not based on accuracy/rating prediction
      and negative predictions
    • scalability
    • engineering
We can't catch everything offline

(Diagram: scalability, presentation, and interaction all lie beyond what offline evaluation can capture.)
The difference between IR and RS
Information retrieval systems answer an explicit need:
a query.
Recommender systems must identify the user's needs.
Should we organize more
challenges?
•   Yes - but before we do, think of:
    o What is the utility of Yet Another Dataset - aren't
      there enough already?
    o How do we create a real-world-like challenge?
    o How do we get real user feedback?
Take home message
•   Real needs of users and content providers are better
    reflected in online evaluation

•   Consider technical limitations as well

•   Challenges advance the field a lot
    o Matrix factorization & ensemble methods in the
      Netflix Prize
    o Evaluation measure and objective in the KDD Cup
      2011
Related events at RecSys
•   Workshops
    o   Recommender Utility Evaluation
    o   RecSys Data Challenge
•   Paper Sessions
    o Multi-Objective Recommendation and Human
      Factors - Mon. 14:30
    o Implicit Feedback and User Preference - Tue. 11:00
    o Top-N Recommendation - Wed. 14:30

•   More challenges:
    o   www.recsyswiki.com/wiki/Category:Competition
Panel
Part 4
Panel
•   Torben Brodt
    o   Plista
    o   Organizer of the Plista contest

•   Yehuda Koren
    o   Google
    o   Member of winning team of the Netflix Prize


•   Darren Vengroff
    o   RichRelevance
    o   Organizer of RecLab Prize
Questions
•   How does recommendation influence the
    user and system?
•   How can we quantify the effects of the UI?
•   How should we translate what we've
    presented into an actual challenge?
•   Should we focus on the long tail or the short
    head?
•   Evaluation measures, click rate, wtf@k
•   How do we evaluate conversion rate?