Best Practices in Recommender System Challenges

Recommender system challenges such as the Netflix Prize and the KDD Cup have contributed vastly to the development and adoption of recommender systems. Each year a number of challenges or contests are organized, covering different aspects of recommendation. In this tutorial and panel we present some of the factors involved in successfully organizing a challenge, whether purely for research, for industrial purposes, or to widen the scope of recommender systems applications.

1. Recommender Systems Challenges: Best Practices
   Tutorial & Panel
   ACM RecSys 2012, Dublin, September 10, 2012
2. About us
   • Alan Said - PhD Student @ TU-Berlin
     o Topics: RecSys Evaluation
     o @alansaid
     o URL: www.alansaid.com
   • Domonkos Tikk - CEO @ Gravity R&D
     o Topics: Machine Learning methods for RecSys
     o @domonkostikk
     o http://www.tmit.bme.hu/tikk.domonkos
   • Andreas Hotho - Prof. @ Uni. Würzburg
     o Topics: Data Mining, Information Retrieval, Web Science
     o http://www.is.informatik.uni-wuerzburg.de/staff/hotho
3. General Motivation
   "RecSys is nobody's home conference. We come from CHI, IUI, SIGIR, etc."
   Joe Konstan - RecSys 2010
   RecSys is our home conference - we should evaluate accordingly!
4. Outline
   • Tutorial
     o Introduction to concepts in challenges
     o Execution of a challenge
     o Conclusion
   • Panel
     o Experiences of participating in and organizing challenges
       - Yehuda Koren
       - Darren Vengroff
       - Torben Brodt
5. What is the motivation for RecSys challenges?
   Part 1
6. Setup - information overload
   [Diagram: a recommender mediating between users and the content of the service provider]
7. Motivation of stakeholders
   • user: find relevant content, easy navigation, serendipity and discovery
   • service provider: increase revenue, engage users, get recognized
   • recommender: target the user with the right content, facilitate the goals of the stakeholders
8. Evaluation in terms of the business
   • business reporting
   • online evaluation (A/B test)
   • casting into a research problem
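
   How such an online A/B comparison might be read off is sketched below; the two-proportion z-test, the function name ab_test and all numbers are illustrative assumptions, not something prescribed in the slides.

    # Hypothetical sketch: comparing click-through rates of two recommender
    # variants from an A/B test with a pooled two-proportion z-test.
    from math import sqrt
    from statistics import NormalDist

    def ab_test(clicks_a, views_a, clicks_b, views_b):
        """Return the z-score and two-sided p-value for CTR(A) vs CTR(B)."""
        p_a, p_b = clicks_a / views_a, clicks_b / views_b
        p_pool = (clicks_a + clicks_b) / (views_a + views_b)
        se = sqrt(p_pool * (1 - p_pool) * (1 / views_a + 1 / views_b))
        z = (p_b - p_a) / se
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))
        return z, p_value

    # Toy numbers only
    z, p = ab_test(clicks_a=480, views_a=10_000, clicks_b=540, views_b=10_000)
    print(f"z = {z:.2f}, p = {p:.3f}")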
9. Context of the contest
   • Selection of metrics
   • Domain dependent
   • Offline vs. online evaluation
   • IR-centric evaluation
     o RMSE
     o MAP
     o F1
10. Latent user needs
11. RecSys competition highlights
    • Large scale
    • Organization
    • RMSE
    • 3-stage setup
    • Prize
    • selection by review
    • runtime limits
    • real traffic
    • revenue increase
    • offline
    • MAP@500 (see the sketch after this slide)
    • metadata available
    • larger in dimensions
    • no ratings
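
    The MAP@500 entry above refers to mean average precision cut off at 500 recommended items; a rough sketch of how it can be computed is given below (function names and data layout are assumptions for illustration, not any challenge's official scorer).

    # Illustrative sketch of Average Precision @ k and its mean over users (MAP@k).
    def average_precision_at_k(ranked_items, relevant_items, k):
        hits, score = 0, 0.0
        for rank, item in enumerate(ranked_items[:k], start=1):
            if item in relevant_items:
                hits += 1
                score += hits / rank          # precision at this cut-off
        return score / min(len(relevant_items), k) if relevant_items else 0.0

    def map_at_k(rankings_per_user, relevant_per_user, k=500):
        users = list(rankings_per_user)
        return sum(average_precision_at_k(rankings_per_user[u], relevant_per_user[u], k)
                   for u in users) / len(users)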
12. Recurring competitions
    • ACM KDD Cup (2007, 2011, 2012)
    • ECML/PKDD Discovery Challenge (2008 onwards)
      o 2008 and 2009: tag recommendation in social bookmarking (incl. online evaluation task)
      o 2011: video lectures
    • CAMRa (2010, 2011, 2012)
13. Does size matter?
    • Yes! - real-world users
    • In research - to some extent
14. Research & Industry
    Important for both:
    • Industry has the data, and research needs data
    • Industry needs better approaches, but this costs
    • Research has ideas, but no systems and/or data to do the evaluation
    Don't exploit participants. Don't be too greedy.
15. Running a Challenge
    Part 2
16. Standard Challenge Setting
    • the organizer defines the recommender setting, e.g. tag recommendation in BibSonomy
    • provide data
      o with features, or
      o raw data, or
      o construct your own data
    • fix the way the evaluation is done
    • define the goal, e.g. reach a certain improvement (F1)
    • motivate people to participate, e.g. promise a lot of money ;-)
17. Typical contest settings
    • offline
      o everyone gets access to the dataset
      o in principle it is a prediction task; the user can't be influenced
      o privacy of the users in the data is a big issue
      o results from offline experimentation have limited predictive power for online user behavior
    • online
      o after a first learning phase, the recommender is plugged into a real system
      o users can be influenced, but only by the selected system
      o comparison of different systems is not completely fair
    • further ways
      o user study
18. Example online setting (BibSonomy)
    Balby Marinho, L.; Hotho, A.; Jäschke, R.; Nanopoulos, A.; Rendle, S.; Schmidt-Thieme, L.; Stumme, G.; Symeonidis, P.: Recommender Systems for Social Tagging Systems. Springer, 2012 (SpringerBriefs in Electrical and Computer Engineering). ISBN 978-1-4614-1893-1
19. Which evaluation measures?
    • Root Mean Squared Error (RMSE)
    • Mean Absolute Error (MAE)
    • typical IR measures
      o precision @ n items
      o recall @ n items
      o false positive rate
      o F1 @ n items
      o Area Under the ROC Curve (AUC)
    • non-quality measures
      o server answer time
      o understandability of the results
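
    For concreteness, the error- and top-n measures listed above could be computed roughly as follows; this is a minimal sketch with toy data, not the evaluation code of any particular challenge.

    # Illustrative sketch of the error- and ranking-based measures named above.
    from math import sqrt

    def rmse(predicted, actual):
        return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

    def mae(predicted, actual):
        return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

    def precision_recall_at_n(ranked_items, relevant_items, n):
        """Precision@n and recall@n for one user's ranked recommendation list."""
        top_n = ranked_items[:n]
        hits = len(set(top_n) & set(relevant_items))
        return hits / n, hits / len(relevant_items)

    # Toy data only
    print(rmse([3.5, 4.0, 2.0], [4.0, 4.0, 1.0]))
    print(precision_recall_at_n(["a", "b", "c", "d"], {"b", "d", "e"}, n=3))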
20. Discussion of measures: RMSE vs. precision
    • RMSE is not necessarily the king of metrics, as RMSE is easy to optimize for
    • What about top-n?
    • But RMSE is not influenced by popularity the way top-n is
    • What about user-centric measures?
    • Ranking-based measure in KDD Cup 2011, Track 2
21. Results are influenced by ...
    • the target of the recommendation (users, resources, etc.)
    • the evaluation methodology (leave-one-out, time-based split, random sample, cross-validation)
    • the evaluation measure
    • the design of the application (online setting)
    • the selected part of the data and its preprocessing (e.g. p-core vs. long tail)
    • scalability vs. quality of the model
    • the features and content accessible and usable for the recommendation
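
    To illustrate the methodology bullet, a random split and a time-based split of the same interaction log can be produced along these lines; the (user, item, timestamp) layout and function names are assumptions for the sketch.

    # Hypothetical sketch: random vs. time-based splits of an interaction log.
    # Assumes a list of (user, item, timestamp) tuples; names are illustrative.
    import random

    def random_split(interactions, test_ratio=0.2, seed=42):
        shuffled = interactions[:]
        random.Random(seed).shuffle(shuffled)
        cut = int(len(shuffled) * (1 - test_ratio))
        return shuffled[:cut], shuffled[cut:]

    def time_based_split(interactions, test_ratio=0.2):
        """Train on the oldest interactions, test on the most recent ones."""
        ordered = sorted(interactions, key=lambda x: x[2])  # sort by timestamp
        cut = int(len(ordered) * (1 - test_ratio))
        return ordered[:cut], ordered[cut:]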
22. Don't forget ...
    • the effort to organize a challenge is very big
    • preparing the data takes time
    • answering questions takes even more time
    • participants are creative - be prepared to react
    • it takes time to compute the evaluation and check the results
    • proceedings with the outcome have to be prepared
    • ...
23. What have we learnt? (Conclusion)
    Part 3
24. Challenges are good since they ...
    • ... are focused on solving a single problem
    • ... have many participants
    • ... create common evaluation criteria
    • ... have comparable results
    • ... bring real-world problems to research
    • ... make it easy to crown a winner
    • ... are cheap (even with a $1M prize)
25. Is that the complete truth?
    No!
26. Is that the complete truth? Why not?
    Because using standard information retrieval metrics we cannot evaluate recommender system concepts like:
    • user interaction
    • perception
    • satisfaction
    • usefulness
    • any metric not based on accuracy/rating prediction and negative predictions
    • scalability
    • engineering
27. We can't catch everything offline
    • scalability
    • presentation
    • interaction
28. The difference between IR and RS
    • Information retrieval systems answer a need: a query
    • Recommender systems identify the user's needs
29. Should we organize more challenges?
    • Yes - but before we do that, think of:
      o What is the utility of Yet Another Dataset - aren't there enough already?
      o How do we create a real-world-like challenge?
      o How do we get real user feedback?
30. Take home message
    • Real needs of users and content providers are better reflected in online evaluation
    • Consider technical limitations as well
    • Challenges advance the field a lot
      o matrix factorization & ensemble methods in the Netflix Prize (see the sketch after this slide)
      o evaluation measure and objective in the KDD Cup 2011
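
    As a reminder of the kind of model the Netflix Prize popularized, a minimal matrix factorization trained with stochastic gradient descent is sketched below; the factor count, learning rate and regularization are arbitrary illustrative values, not the prize-winning configuration.

    # Minimal rating-prediction matrix factorization trained with SGD.
    # Illustrative only; expects (user, item, rating) triples with integer ids.
    import random

    def train_mf(ratings, n_users, n_items, n_factors=20,
                 lr=0.01, reg=0.05, n_epochs=20, seed=0):
        rng = random.Random(seed)
        P = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
        Q = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
        for _ in range(n_epochs):
            for u, i, r in ratings:
                pred = sum(P[u][f] * Q[i][f] for f in range(n_factors))
                err = r - pred
                for f in range(n_factors):
                    pu, qi = P[u][f], Q[i][f]
                    P[u][f] += lr * (err * qi - reg * pu)  # gradient step with L2 penalty
                    Q[i][f] += lr * (err * pu - reg * qi)
        return P, Q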
31. Related events at RecSys
    • Workshops
      o Recommender Utility Evaluation
      o RecSys Data Challenge
    • Paper Sessions
      o Multi-Objective Recommendation and Human Factors - Mon. 14:30
      o Implicit Feedback and User Preference - Tue. 11:00
      o Top-N Recommendation - Wed. 14:30
    • More challenges:
      o www.recsyswiki.com/wiki/Category:Competition
32. Panel
    Part 4
33. Panel
    • Torben Brodt
      o Plista
      o organizing the Plista Contest
    • Yehuda Koren
      o Google
      o member of the winning team of the Netflix Prize
    • Darren Vengroff
      o RichRelevance
      o organizer of the RecLab Prize
  34. 34. Questions • How does recommendation influence the user and system? • How can we quantify the effects of the UI? • How should we translate what we've presented into an actual challenge? • should we focus on the long tail or the short head? • Evaluation measures, click rate, wtf@k • How to evaluate conversion rate?