
ECIR Recommendation Challenges

  1. Get on with it! Recommender system industry challenges move towards real-world, online evaluation. Padova, March 24th, 2016. Andreas Lommatzsch (TU Berlin, Berlin, Germany), Jonas Seiler (plista, Berlin, Germany), Daniel Kohlsdorf (XING, Hamburg, Germany). CrowdRec: www.crowdrec.eu. Idomaar: http://rf.crowdrec.eu
  2. Andreas Lommatzsch, Andreas.Lommatzsch@tu-berlin.de, http://www.dai-lab.de
  3. Jonas Seiler, Jonas.Seiler@plista.com, http://www.plista.com
  4. Daniel Kohlsdorf, Daniel.Kohlsdorf@xing.com, http://www.xing.com
  5. Where are recommender system challenges headed? Direction 1: use information beyond the user-item matrix. Direction 2: online evaluation plus multiple metrics. Moving towards real-world evaluation. (Flickr credit: rodneycampbell)
  6. Why evaluate? Evaluation is crucial for the success of real-life systems. How should we evaluate? Candidate criteria: precision and recall, technical complexity, influence on sales, required hardware resources, business models, scalability, diversity of the presented results, user satisfaction.
  7. Evaluation Settings. Traditional evaluation in IR ("the Cranfield paradigm"): a static collection of documents, a set of queries, and a list of relevant documents defined by experts for each query. Advantages: a reproducible setting; all researchers have exactly the same information; optimized for measuring precision.
  8. Traditional Evaluation in IR. Weaknesses of traditional IR evaluation: high costs for creating datasets; datasets are not up to date; domain-specific documents; the expert-defined ground truth ignores individual user preferences; context-awareness is not considered; technical aspects are ignored. Context is everything.
  9. Industry and RecSys challenges. Challenges benefit both industry and academic research. We look at how industry challenges have evolved since the Netflix Prize (2009).
  10. Traditional Evaluation in RecSys ("the Netflix paradigm"). Evaluation settings: rating prediction on user-item matrices; a large, sparse dataset; predict personalized ratings; cross-validation with RMSE. Advantages: a reproducible setting; personalization; the dataset is based on real user ratings.
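The Netflix-style protocol above boils down to holding out ratings and scoring predictions with RMSE. A minimal sketch, assuming toy random data and a global-mean baseline (both illustrative assumptions, not the actual challenge setup):

```python
import math
import random

def rmse(predicted, actual):
    """Root mean squared error, the Netflix-prize metric."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

# toy user-item ratings (user, item, rating); real data would be large and sparse
random.seed(0)
ratings = [(u, i, random.randint(1, 5)) for u in range(20) for i in range(10)]
random.shuffle(ratings)
cut = int(0.8 * len(ratings))
train, test = ratings[:cut], ratings[cut:]

# simplest possible predictor: the global mean rating of the training split
global_mean = sum(r for _, _, r in train) / len(train)
error = rmse([global_mean] * len(test), [r for _, _, r in test])
```

In a real cross-validation run this split-and-score step would be repeated over several folds and the per-fold RMSE values averaged.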
  11. Traditional Evaluation in RecSys. Weaknesses of traditional recommender evaluation: static data; only one type of data (user ratings); user ratings are noisy; temporal aspects tend to be ignored; context-awareness is not considered; technical aspects are ignored.
  12. Challenges of Developing Applications. Challenges: data streams with continuous changes; big data; combining knowledge from different sources; context-awareness; users expect personally relevant results; heterogeneous devices; technical complexity and real-time requirements.
  13. How to set up a better evaluation? How to address these challenges in the evaluation? A realistic evaluation setting: heterogeneous data sources, streams, dynamic user feedback. Appropriate metrics: precision and user satisfaction, technical complexity, sales and business models. Online and offline evaluation.
  14. Approaches for a better evaluation: news recommendations @ plista; job recommendations @ XING.
  15. The plista Recommendation Scenario. Setting: 250 ms response time; roughly 350 million ad impressions per day; in 10 countries. Challenges: news change continuously; users do not log in explicitly; seasonality and context-dependent user preferences.
  16. Evaluation @ plista. Offline: cross-validation; the Metric Optimization Engine (MOE, https://github.com/Yelp/MOE); integration into Spark; how well does it correlate with online evaluation?; time complexity. Online: A/B tests, limited by caching, memory, and computational resources; MOE.
  17. Evaluation using MOE. Offline: mean and variance estimation over the parameter space with a Gaussian process; evaluate the parameter with the highest Expected Improvement (EI), Upper Confidence Bound, etc.; a REST API.
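Picking the next parameter to evaluate by Expected Improvement, as MOE does, has a closed form when the Gaussian-process posterior at a candidate is summarized by a mean and standard deviation. A sketch for minimization; the candidate values and posterior estimates below are made up for illustration:

```python
import math

def expected_improvement(mu, sigma, best):
    """EI for minimization: expected amount by which a point with GP
    posterior mean mu and std sigma beats the best observed value."""
    if sigma == 0.0:
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal pdf
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal cdf
    return (best - mu) * cdf + sigma * pdf

# hypothetical parameter candidates: value -> (posterior mean, posterior std)
candidates = {0.1: (0.42, 0.05), 0.5: (0.40, 0.15), 0.9: (0.45, 0.02)}
best_observed = 0.41
next_param = max(candidates, key=lambda p: expected_improvement(*candidates[p], best_observed))
```

Note how the high-variance candidate wins even though its mean is only slightly better: EI explicitly rewards uncertainty, which is what makes it useful for expensive evaluations.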
  18. Evaluation using MOE. Online: A/B tests are expensive; model non-stationarity; integrate out the non-stationarity to get the mean EI.
  19. The CLEF-NewsREEL Challenge. Provide an API enabling researchers to test their own ideas. A challenge in CLEF (Conference and Labs of the Evaluation Forum). Two tasks: online and offline evaluation.
  20. CLEF-NewsREEL Online Task. How does the challenge work? Live streams consisting of impressions, requests, and clicks; 5 publishers; approximately 6 million messages per day. Technical requirement: 100 ms per request. Live evaluation based on click-through rate (CTR).
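Live evaluation then reduces to computing CTR per recommender over the message stream. A minimal sketch; the `(algorithm, kind)` event schema is my assumption, not the actual NewsREEL message format:

```python
from collections import Counter

def ctr_by_algorithm(events):
    """events: iterable of (algorithm, kind) pairs with kind in
    {"impression", "click"}; returns click-through rate per algorithm."""
    impressions, clicks = Counter(), Counter()
    for algorithm, kind in events:
        if kind == "impression":
            impressions[algorithm] += 1
        elif kind == "click":
            clicks[algorithm] += 1
    return {a: clicks[a] / n for a, n in impressions.items() if n}

stream = [("A", "impression"), ("A", "click"), ("A", "impression"),
          ("B", "impression"), ("B", "impression")]
rates = ctr_by_algorithm(stream)
```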
  21. CLEF-NewsREEL Offline Task. Online vs. offline evaluation: technical aspects can be evaluated without user feedback; analyze the required resources and the response time; simulate the online evaluation by replaying a recorded stream.
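Replaying a recorded stream also lets you check the technical side offline, for example whether every request stays within the latency budget. A sketch under an assumed dict-based event format (the real replay uses the Idomaar framework):

```python
import time

def replay(recorded_events, recommender, budget_ms=100.0):
    """Feed recorded request events to a recommender and count how many
    responses arrive within the latency budget (100 ms in NewsREEL)."""
    answered = within_budget = 0
    for event in recorded_events:
        if event.get("type") != "request":
            continue  # impressions/clicks would update the model here
        start = time.perf_counter()
        recommender(event)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        answered += 1
        if elapsed_ms <= budget_ms:
            within_budget += 1
    return answered, within_budget

answered, ok = replay([{"type": "request"}, {"type": "click"}], lambda e: None)
```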
  22. CLEF-NewsREEL Offline Task. Challenge: realistic simulation of streams; a reproducible setup of computing environments. Solution: a framework simplifying the setup of the evaluation environment; the Idomaar framework developed in the CrowdRec project (http://rf.crowdrec.eu).
  23. CLEF-NewsREEL: more information. SIGIR Forum, December 2015 (Vol. 49, No. 2): http://sigir.org/files/forum/2015D/p129.pdf. Evaluate your algorithm online and offline in NewsREEL. Register for the challenge (until the 22nd of April): http://crowdrec.eu/2015/11/clef-newsreel-2016/. Tutorials and templates are provided at orp.plista.com.
  24. XING RecSys Challenge: https://recsys.xing.com/
  25. Job Recommendations @ XING
  26. XING: evaluation based on interactions. On XING, users can give feedback on recommendations. The amount of explicit user feedback is far lower than that of implicit measures. A/B tests focus on click-through rate.
  27. XING RecSys Challenge: scoring and space on the page. Predict 30 items for each user. Score: a weighted combination of precision@2, precision@4, precision@6, and precision@20. The top 6 items fit on the page.
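The scoring rule above combines precision at several cutoffs. A minimal version; the slide only names the cutoffs, so the uniform weights here are a placeholder assumption, not the challenge's actual weighting:

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    return sum(1 for item in recommended[:k] if item in relevant) / k

def challenge_score(recommended, relevant,
                    cutoffs=(2, 4, 6, 20), weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted combination of precision@k values; the cutoffs follow
    the slide, the uniform weights are a placeholder assumption."""
    return sum(w * precision_at_k(recommended, relevant, k)
               for k, w in zip(cutoffs, weights))
```

Mixing a small cutoff like precision@2 with a large one like precision@20 rewards both getting the visible slots right and filling the full 30-item list well.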
  28. XING RecSys Challenge: user data. User ID; job title; educational degree; field of study; location.
  29. XING RecSys Challenge: user data (cont.). Number of past jobs; years of experience; current career level; current discipline; current industry.
  30. XING RecSys Challenge: item data. Job title; desired career level; desired discipline; desired industry.
  31. XING RecSys Challenge: interaction data. Timestamp; user; job; type (deletion, click, or bookmark).
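Interaction logs like these are typically collapsed into per-user-item relevance labels before training or scoring. A sketch with hypothetical per-type weights (the weight values are my assumption, not the challenge's definition):

```python
from collections import defaultdict

# hypothetical weights per interaction type; a deletion is a strong
# negative signal, a bookmark a strong positive one
WEIGHTS = {"click": 1.0, "bookmark": 5.0, "deletion": -10.0}

def relevance_labels(interactions):
    """interactions: iterable of (user, job, kind) tuples.
    Returns a summed relevance label per (user, job) pair."""
    labels = defaultdict(float)
    for user, job, kind in interactions:
        labels[(user, job)] += WEIGHTS.get(kind, 0.0)
    return dict(labels)

log = [(1, "job-a", "click"), (1, "job-a", "bookmark"), (2, "job-a", "deletion")]
labels = relevance_labels(log)
```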
  32. XING RecSys Challenge: anonymization.
  33. XING RecSys Challenge: anonymization (cont.).
  34. XING RecSys Challenge: the future. A live challenge: participants submit predicted future interactions; the solutions are recommended on the platform; participants get points for actual user clicks. Cycle: release to challenge, work on predictions, collect clicks, score.
  35. Concluding: how to set up a better evaluation. Consider different quality criteria (prediction, technical, business models). Aggregate heterogeneous information sources. Consider user feedback. Use online and offline analyses to understand users and their requirements.
  36. Concluding: participate in challenges based on real-life scenarios. The NewsREEL challenge. The RecSys 2016 challenge. => Organize a challenge; focus on real-life data.
  37. More Information: http://www.crowdrec.eu, http://www.clef-newsreel.org, http://orp.plista.com, http://2016.recsyschallenge.com, http://www.xing.com. Thank you! Questions?
