ECIR Recommendation Challenges
Get on with it! Recommender system industry challenges move towards real-world, online evaluation
1. Get on with it! Recommender system industry challenges move towards real-world, online evaluation
   Padova – March 24th, 2016
   Andreas Lommatzsch - TU Berlin, Berlin, Germany
   Jonas Seiler - plista, Berlin, Germany
   Daniel Kohlsdorf - XING, Hamburg, Germany
   CrowdRec - www.crowdrec.eu
   Idomaar - http://rf.crowdrec.eu
2. Andreas Lommatzsch
   Andreas.Lommatzsch@tu-berlin.de
   http://www.dai-lab.de
3. Jonas Seiler
   Jonas.Seiler@plista.com
   http://www.plista.com
4. Daniel Kohlsdorf
   Daniel.Kohlsdorf@xing.com
   http://www.xing.com
5. Where are recommender system challenges headed?
   • Direction 1: Use info beyond the user-item matrix.
   • Direction 2: Online evaluation + multiple metrics.
   Moving towards real-world evaluation
   (Flickr credit: rodneycampbell)
6. Why evaluate?
   • Evaluation is crucial for the success of real-life systems
   • How should we evaluate? Candidate criteria:
     • Precision and recall
     • Technical complexity
     • Influence on sales
     • Required hardware resources
     • Business models
     • Scalability
     • Diversity of the presented results
     • User satisfaction
7. Traditional Evaluation in IR – “The Cranfield paradigm”
   Evaluation settings
   • A static collection of documents
   • A set of queries
   • A list of relevant documents, defined by experts for each query
   Advantages
   • Reproducible setting
   • All researchers have exactly the same information
   • Optimized for measuring precision
8. Traditional Evaluation in IR
   Weaknesses of traditional IR evaluation
   • High costs for creating the dataset
   • Datasets are not up-to-date
   • Domain-specific documents
   • The expert-defined ground truth does not consider individual user preferences
   • Context-awareness is not considered
   • Technical aspects are ignored
   “Context is everything”
9. Industry and RecSys Challenges
   • Challenges benefit both industry and academic research.
   • We look at how industry challenges have evolved since the Netflix Prize (2009).
10. Traditional Evaluation in RecSys – “The Netflix paradigm”
   Evaluation settings
   • Rating prediction on user-item matrices
   • Large, sparse dataset
   • Predict personalized ratings
   • Cross-validation, RMSE
   Advantages
   • Reproducible setting
   • Personalization
   • Dataset is based on real user ratings
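For reference, the RMSE used in this protocol is the root of the mean squared difference between held-out ratings and the model's predictions. A minimal sketch (the example numbers are made up):

```python
import numpy as np

def rmse(true_ratings, predicted_ratings):
    """Root Mean Squared Error over a held-out set of (user, item) ratings."""
    true_ratings = np.asarray(true_ratings, dtype=float)
    predicted_ratings = np.asarray(predicted_ratings, dtype=float)
    return float(np.sqrt(np.mean((true_ratings - predicted_ratings) ** 2)))

# Three held-out ratings and the model's predictions for them.
print(rmse([4.0, 2.0, 5.0], [3.5, 2.5, 4.0]))  # ~0.71
```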
11. Traditional Evaluation in RecSys
   Weaknesses of traditional recommender evaluation
   • Static data
   • Only one type of data: user ratings
   • User ratings are noisy
   • Temporal aspects tend to be ignored
   • Context-awareness is not considered
   • Technical aspects are ignored
12. Challenges of Developing Applications
   • Data streams: continuous changes
   • Big data
   • Combining knowledge from different sources
   • Context-awareness
   • Users expect personally relevant results
   • Heterogeneous devices
   • Technical complexity, real-time requirements
13. How to Set Up a Better Evaluation
   How to address these challenges in the evaluation?
   • Realistic evaluation setting: heterogeneous data sources, streams, dynamic user feedback
   • Appropriate metrics: precision and user satisfaction, technical complexity, sales and business models
   • Online and offline evaluation
14. Approaches for a Better Evaluation
   • News recommendations @ plista
   • Job recommendations @ XING
15. The plista Recommendation Scenario
   Setting
   ● 250 ms response time
   ● 350 million AI/day
   ● In 10 countries
   Challenges
   ● News change continuously
   ● Users do not log in explicitly
   ● Seasonality, context-dependent user preferences
16. Evaluation @ plista
   Offline
   • Cross-validation
   • Metric Optimization Engine (https://github.com/Yelp/MOE)
   • Integration into Spark
   • How well does it correlate with online evaluation?
   • Time complexity
   Online
   • A/B tests
   • Limited by caching memory and computational resources
   • MOE*
17. Evaluation using MOE
   Offline
   • Mean and variance estimation of the parameter space with a Gaussian Process
   • Evaluate the parameter setting with the highest Expected Improvement (EI), Upper Confidence Bound, ...
   • REST API
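The offline loop sketched above fits a Gaussian Process to the parameter settings evaluated so far and then tries the setting with the highest Expected Improvement. A minimal illustration of that idea, using scikit-learn instead of MOE's actual REST API, with made-up observations (a single tuning parameter against a simulated offline metric):

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expected_improvement(candidates, gp, best_so_far):
    """EI acquisition: how much each candidate is expected to beat the best observed value."""
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - best_so_far) / sigma
    return (mu - best_so_far) * norm.cdf(z) + sigma * norm.pdf(z)

# Parameter settings evaluated offline so far and the metric each one achieved (made-up numbers).
X = np.array([[0.1], [0.4], [0.7]])
y = np.array([0.012, 0.019, 0.015])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)

candidates = np.linspace(0.0, 1.0, 101).reshape(-1, 1)
ei = expected_improvement(candidates, gp, best_so_far=y.max())
print(candidates[np.argmax(ei)])  # the parameter value to evaluate next
```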
18. Evaluation using MOE
   Online
   • A/B tests are expensive
   • Model non-stationarity
   • Integrate out non-stationarity to get the mean EI
19. The CLEF-NewsREEL Challenge
   Provide an API enabling researchers to test their own ideas
   • A challenge in CLEF (Conferences and Labs of the Evaluation Forum)
   • 2 tasks: online and offline evaluation
20. CLEF-NewsREEL Online Task
   How does the challenge work?
   • Live streams consisting of impressions, requests, and clicks; 5 publishers; approx. 6 million messages per day
   • Technical requirement: 100 ms per request
   • Live evaluation based on CTR
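The CTR driving the live evaluation is simply clicks divided by delivered recommendations. A toy aggregation over such a message stream might look like the sketch below; the dictionary-based message format is an assumption, not the actual ORP message schema.

```python
from collections import Counter

def ctr_by_algorithm(messages):
    """Aggregate {'algorithm': ..., 'type': 'impression'|'click'} messages into per-algorithm CTRs."""
    impressions, clicks = Counter(), Counter()
    for msg in messages:
        if msg["type"] == "impression":
            impressions[msg["algorithm"]] += 1
        elif msg["type"] == "click":
            clicks[msg["algorithm"]] += 1
    return {alg: clicks[alg] / n for alg, n in impressions.items() if n}

stream = [
    {"algorithm": "most-recent", "type": "impression"},
    {"algorithm": "most-recent", "type": "click"},
    {"algorithm": "most-popular", "type": "impression"},
]
print(ctr_by_algorithm(stream))  # {'most-recent': 1.0, 'most-popular': 0.0}
```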
21. CLEF-NewsREEL Offline Task
   Online vs. offline evaluation
   • Technical aspects can be evaluated without user feedback
   • Analyze the required resources and the response time
   • Simulate the online evaluation by replaying a recorded stream
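A minimal sketch of the replay idea, not the Idomaar implementation (the record fields and the speedup parameter are assumptions): recorded messages are re-emitted in timestamp order so the recommender's response time can be measured under the original load pattern.

```python
import time

def replay(recorded_messages, handle, speedup=60.0):
    """Re-emit recorded messages in timestamp order (time-compressed by `speedup`) and time each response."""
    recorded_messages = sorted(recorded_messages, key=lambda m: m["timestamp"])
    start_wall = time.time()
    start_log = recorded_messages[0]["timestamp"]  # timestamps assumed to be in seconds
    latencies = []
    for msg in recorded_messages:
        # Wait until the scaled original send time of this message is reached.
        due = start_wall + (msg["timestamp"] - start_log) / speedup
        time.sleep(max(0.0, due - time.time()))
        t0 = time.time()
        handle(msg)                      # the recommender under test
        latencies.append(time.time() - t0)
    return latencies

# Hypothetical usage: latencies = replay(load_recorded_stream(), my_recommender.handle)
```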
22. CLEF-NewsREEL Offline Task
   Challenge
   • Realistic simulation of streams
   • Reproducible setup of computing environments
   Solution
   • A framework simplifying the setup of the evaluation environment
   • The Idomaar framework, developed in the CrowdRec project: http://rf.crowdrec.eu
23. CLEF-NewsREEL – More Information
   • SIGIR Forum, Dec 2015 (Vol. 49, No. 2): http://sigir.org/files/forum/2015D/p129.pdf
   Evaluate your algorithm online and offline in NewsREEL
   • Register for the challenge: http://crowdrec.eu/2015/11/clef-newsreel-2016/ (register until 22nd of April)
   • Tutorials and templates are provided at orp.plista.com
24. XING - RecSys Challenge
   https://recsys.xing.com/
25. Job Recommendations @ XING
26. XING - Evaluation Based on Interaction
   ● On XING, users can give feedback on recommendations.
   ● The amount of explicit user feedback is far lower than that of implicit measures.
   ● A/B tests focus on click-through rate.
27. XING - RecSys Challenge: Scoring, Space on Page
   ● Predict 30 items for each user.
   ● Score: weighted combination of precision at several cut-offs (see the sketch below)
     ○ precisionAt(2)
     ○ precisionAt(4)
     ○ precisionAt(6)
     ○ precisionAt(20)
   ● The page has space for the top 6 recommendations.
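A minimal sketch of how such a score could be computed for one user; the equal weights are placeholders, not the official challenge weights.

```python
WEIGHTS = {2: 1.0, 4: 1.0, 6: 1.0, 20: 1.0}  # placeholder weights, not the official ones

def precision_at(k, recommended, relevant):
    """Fraction of the top-k recommended items the user actually interacted with."""
    return sum(1 for item in recommended[:k] if item in relevant) / k

def challenge_score(recommended, relevant, weights=WEIGHTS):
    """Weighted combination of precision@2, @4, @6, and @20 for one user's ranked predictions."""
    return sum(w * precision_at(k, recommended, relevant) for k, w in weights.items())

# A ranked list of 30 predicted job ids scored against the user's true interactions.
print(challenge_score(list(range(30)), relevant={1, 3, 25}))
```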
28. XING - RecSys Challenge: User Data
   • User ID
   • Job title
   • Educational degree
   • Field of study
   • Location
29. XING - RecSys Challenge: User Data (continued)
   • Number of past jobs
   • Years of experience
   • Current career level
   • Current discipline
   • Current industry
30. XING - RecSys Challenge: Item Data
   • Job title
   • Desired career level
   • Desired discipline
   • Desired industry
31. XING - RecSys Challenge: Interaction Data
   • Timestamp
   • User
   • Job
   • Type: deletion, click, or bookmark
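Put together, one interaction record combines exactly these four fields. A hypothetical in-memory representation (names and types are assumptions, not the released file format):

```python
from dataclasses import dataclass
from enum import Enum

class InteractionType(Enum):
    CLICK = "click"
    BOOKMARK = "bookmark"
    DELETION = "deletion"

@dataclass(frozen=True)
class Interaction:
    timestamp: int          # Unix time of the event
    user_id: int            # anonymized user
    item_id: int            # anonymized job posting
    type: InteractionType   # how the user reacted to the recommendation

record = Interaction(timestamp=1457222400, user_id=42, item_id=1337, type=InteractionType.CLICK)
```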
32. XING - RecSys Challenge: Anonymization
33. XING - RecSys Challenge: Anonymization (continued)
34. XING - RecSys Challenge: Future
   • Live challenge
   • Participants submit predicted future interactions
   • The submitted solutions are recommended on the platform
   • Participants get points for actual user clicks
   Cycle: release data to the challenge → work on predictions → collect clicks → score
35. Concluding ...
   How to set up a better evaluation
   • Consider different quality criteria (prediction, technical, business models)
   • Aggregate heterogeneous information sources
   • Consider user feedback
   • Use online and offline analyses to understand users and their requirements
36. Concluding ...
   Participate in challenges based on real-life scenarios
   • NewsREEL challenge
   • RecSys 2016 challenge
   => Organize a challenge. Focus on real-life data.
37. More Information
   • http://www.crowdrec.eu
   • http://www.clef-newsreel.org
   • http://orp.plista.com
   • http://2016.recsyschallenge.com
   • http://www.xing.com
   Thank you! Questions?
