Get on with it!
Recommender system industry
challenges move towards real-world,
online evaluation
Padova – March 24th, 2016
Andreas Lommatzsch - TU Berlin, Berlin, Germany
Jonas Seiler - plista, Berlin, Germany
Daniel Kohlsdorf - XING, Hamburg, Germany
CrowdRec - www.crowdrec.eu
Idomaar - http://rf.crowdrec.eu
• Andreas Lommatzsch
Andreas.Lommatzsch@tu-berlin.de
http://www.dai-lab.de
• Jonas Seiler
Jonas.Seiler@plista.com
http://www.plista.com
• Daniel Kohlsdorf
Daniel.Kohlsdorf@xing.com
http://www.xing.com
Where are recommender
system challenges headed?
Direction 1:
Use info beyond the
user-item matrix.
Direction 2:
Online evaluation +
multiple metrics.
Moving towards real-world evaluation
Flickr credit: rodneycampbell
Why evaluate?
• Evaluation is crucial for the success of real-life systems
• How should we evaluate?
Precision and
Recall
Technical
complexity
Influence
on sales
Required hardware
resources
Business
models
Scalability
Diversity of the
presented results
User
satisfaction
Evaluation Settings
• A static collection of documents
• A set of queries
• A list of relevant documents defined by
experts for each query
Traditional Evaluation in IR
“The Cranfield paradigm”
Advantages
• Reproducible setting
• All researchers have exactly the same
information
• Optimized for measuring precision
Traditional Evaluation in IR
Weaknesses of traditional IR evaluation
• High costs for creating dataset
• Datasets are not up-to-date
• Domain-specific documents
• The expert-defined ground truth does not
consider individual user preferences
• Individual user preferences
• Context-awareness is not considered
• Technical aspects are ignored
Context is
everything
Industry and recsys challenges
• Challenges benefit both industry and academic research.
• We look at how industry challenges have evolved since
the Netflix Prize in 2009.
Traditional Evaluation in RecSys
Evaluation Settings
• Rating prediction on user-item matrices
• Large, sparse dataset
• Predict personalized ratings
• Cross-validation, RMSE
Advantages
• Reproducible setting
• Personalization
• Dataset is based on
real user ratings “The Netflix paradigm”
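The rating-prediction setting above can be illustrated with a minimal sketch of RMSE and a k-fold split. The function names `rmse` and `k_fold_indices` are hypothetical and are not taken from the Netflix Prize tooling; the contiguous fold layout is an assumption made for brevity.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def k_fold_indices(n_items, k):
    """Yield (train, test) index lists for k contiguous folds (assumed layout)."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_items) if i < start or i >= start + size]
        yield train, test
        start += size
```

In practice the ratings would usually be shuffled before splitting, and a model would be fitted on each training fold before `rmse` is computed on the held-out fold.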
Traditional Evaluation in RecSys
Weaknesses of traditional Recommender evaluation
• Static data
• Only one type of data - only user ratings
• User ratings are noisy
• Temporal aspects tend to be ignored
• Context-awareness is not considered
• Technical aspects are ignored
Challenges of Developing Applications
Challenges
• Data streams - continuous changes
• Big data
• Combine knowledge from different sources
• Context-Awareness
• Users expect personally relevant results
• Heterogeneous devices
• Technical complexity, real-time requirements
How to address these challenges in the Evaluation?
• Realistic evaluation setting
• Heterogeneous data sources
• Streams
• Dynamic user feedback
• Appropriate metrics
• Precision and User satisfaction
• Technical complexity
• Sales and Business models
• Online and Offline Evaluation
How to Set Up a Better Evaluation?
Approaches for a better Evaluation
• News recommendations
@ plista
• Job recommendations
@ XING
The plista Recommendation Scenario
Setting
● 250 ms response time
● 350 million AI/day
● In 10 Countries
Challenges
● News change
continuously
● Users do not log in
explicitly
● Seasonality,
context-dependent user
preferences
Offline
• Cross-validation
• Metric Optimization Engine
(https://github.com/Yelp/MOE)
• Integration into Spark
• How well does it correlate with
Online Evaluation?
• Time Complexity
Evaluation @ plista
Online
• A/B Tests
• Limited by
  • Caching Memory
  • Computational Resources
• MOE*
Offline
• Mean and variance estimation of parameter space with
Gaussian Process
• Evaluate the parameter with the highest Expected Improvement (EI),
Upper Confidence Interval …
• REST API
Evaluation using MOE
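The Expected Improvement criterion named above can be sketched as follows, assuming a Gaussian posterior with a given mean and standard deviation for each candidate parameter and a metric that should be maximized. This is the textbook closed form, not MOE's API; `expected_improvement`, `pick_next` and the candidate values are hypothetical.

```python
import math


def expected_improvement(mean, std, best_so_far):
    """Closed-form EI for maximization under a Gaussian posterior."""
    if std <= 0.0:
        # No uncertainty left: the improvement is deterministic.
        return max(mean - best_so_far, 0.0)
    z = (mean - best_so_far) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best_so_far) * cdf + std * pdf


def pick_next(candidates, best_so_far):
    """Return the candidate name with the highest EI.

    candidates maps a name to an assumed (posterior mean, posterior std) pair.
    """
    return max(
        candidates,
        key=lambda name: expected_improvement(*candidates[name], best_so_far),
    )
```

With a current best of 0.030, a candidate with posterior (0.028, 0.010) is preferred over one with (0.030, 0.001): its mean is lower, but the larger uncertainty leaves more room for improvement.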
Online
• A/B Tests are expensive
• Model non-stationarity
• Integrate out non-stationarity
to get mean EI
Evaluation using MOE
Provide an API enabling researchers to test their own ideas
• The CLEF-NewsREEL challenge
• A Challenge in CLEF (Conferences and Labs of the Evaluation Forum)
• 2 Tasks: Online and Offline Evaluation
The CLEF-NewsREEL challenge
How does the challenge work?
• Live streams consisting of impressions, requests, and
clicks from 5 publishers, approx. 6 million messages per day
• Technical requirements: 100 ms per request
• Live evaluation
based on CTR
CLEF-NewsREEL
Online Task
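A CTR-based live evaluation could be computed from such a message stream roughly as sketched below. The message layout (a dict with `type` and `recommender` keys) is an assumption for illustration and is not the actual NewsREEL message format; CTR is taken here to mean clicks divided by answered requests.

```python
from collections import Counter


def click_through_rates(messages):
    """Return clicks / requests per recommender for a sequence of messages."""
    requests = Counter()
    clicks = Counter()
    for msg in messages:
        if msg["type"] == "request":
            requests[msg["recommender"]] += 1
        elif msg["type"] == "click":
            clicks[msg["recommender"]] += 1
        # Impressions carry no recommendation outcome and are skipped here.
    return {name: clicks[name] / count for name, count in requests.items()}
```

Recommenders that never received a request are left out of the result, which avoids a division by zero.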
Online vs. Offline Evaluation
• Technical aspects can be evaluated without user feedback
• Analyze the required resources and the response time
• Simulate the online evaluation by replaying a recorded
stream
CLEF-NewsREEL
Offline Task
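Replaying a recorded stream to check technical aspects might look like the sketch below. It is not the Idomaar or NewsREEL implementation; `replay` and its report keys are hypothetical, and the default deadline of 100 ms mirrors the per-request limit of the online task.

```python
import time


def replay(recorded_requests, recommend, deadline_ms=100.0):
    """Feed recorded requests to a recommender and collect latency statistics."""
    latencies_ms = []
    late = 0
    for request in recorded_requests:
        started = time.perf_counter()
        recommend(request)  # the result is ignored; only timing is measured
        elapsed_ms = (time.perf_counter() - started) * 1000.0
        latencies_ms.append(elapsed_ms)
        if elapsed_ms > deadline_ms:
            late += 1
    return {
        "requests": len(latencies_ms),
        "late": late,
        "max_ms": max(latencies_ms, default=0.0),
    }
```

Because no user feedback is needed, the same recording can be replayed against several algorithms on identical hardware to compare response times and resource use.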
Challenge
• Realistic simulation of streams
• Reproducible setup of computing environments
Solution
• A framework simplifying
the setup of the evaluation
environment
• The Idomaar framework
developed in the CrowdRec project
CLEF-NewsREEL
Offline Task
http://rf.crowdrec.eu
More Information
• SIGIR forum Dec 2015 (Vol 49, #2)
http://sigir.org/files/forum/2015D/p129.pdf
Evaluate your algorithm online and offline in NewsREEL
• Register for the challenge!
http://crowdrec.eu/2015/11/clef-newsreel-2016/
(register until 22nd of April)
• Tutorials and Templates are provided at orp.plista.com
CLEF-NewsREEL
https://recsys.xing.com/
XING - RecSys Challenge
Job Recommendations @ XING
XING - Evaluation based on interaction
● On XING, users can give feedback on recommendations.
● Explicit user feedback is far rarer than implicit signals.
● A/B tests focus on click-through rate.
XING - RecSys Challenge, Scoring,
Space on Page
● Predict 30 items for each user.
● Score: weighted combination of the
precision
○ precisionAt(2)
○ precisionAt(4)
○ precisionAt(6)
○ precisionAt(20)
Top 6
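A weighted combination of precision values can be sketched as below. The slide does not state the weights, so the `weights` argument is an assumption supplied by the caller, and the function names are hypothetical rather than the official challenge scorer.

```python
def precision_at(recommended, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


def weighted_precision_score(recommended, relevant, weights):
    """Sum of weight * precision@k over the cut-offs in weights (k -> weight)."""
    return sum(
        weight * precision_at(recommended, relevant, k)
        for k, weight in weights.items()
    )
```

For example, with `weights = {2: 1.0, 4: 1.0, 6: 1.0, 20: 1.0}` every cut-off counts equally; other weightings would emphasize the first positions more strongly.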
XING - RecSys Challenge, User Data
• User ID
• Job Title
• Educational Degree
• Field of Study
• Location
XING - RecSys Challenge, User Data
• Number of past jobs
• Years of Experience
• Current career level
• Current discipline
• Current industry
XING - RecSys Challenge, Item Data
• Job title
• Desired career level
• Desired discipline
• Desired industry
XING - RecSys Challenge, Interaction Data
• Timestamp
• User
• Job
• Type:
• Deletion
• Click
• Bookmark
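The interaction record listed above could be modeled as in this minimal sketch. The field names, the integer IDs and the Unix-timestamp representation are assumptions; the actual challenge dataset may encode them differently.

```python
from dataclasses import dataclass
from enum import Enum


class InteractionType(Enum):
    DELETION = "deletion"
    CLICK = "click"
    BOOKMARK = "bookmark"


@dataclass(frozen=True)
class Interaction:
    timestamp: int  # assumed: Unix time in seconds
    user_id: int    # assumed: anonymized numeric ID
    job_id: int     # assumed: anonymized numeric ID
    type: InteractionType


def jobs_by_type(interactions, user_id, wanted):
    """Return the set of job IDs a user interacted with in the given way."""
    return {
        i.job_id
        for i in interactions
        if i.user_id == user_id and i.type is wanted
    }
```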
XING - RecSys Challenge, Anonymization
XING - RecSys Challenge, Future
• Live Challenge
• Users submit predicted future interactions
• The solution is recommended on the platform
• Participants get points for actual user clicks
Release to Challenge Collect Clicks
Work On Predictions
Score
How to set up a better Evaluation
• Consider different quality criteria
(prediction, technical, business models)
• Aggregate heterogeneous information sources
• Consider user feedback
• Use online and offline analyses
to understand users and their
requirements
Concluding ...
Participate in challenges based on real-life scenarios
• NewsREEL challenge
Concluding ...
• RecSys 2016 challenge
=> Organize a challenge. Focus on real-life data.
More Information
• http://www.crowdrec.eu
• (http://www.clef-newsreel.org)
• http://orp.plista.com
• http://2016.recsyschallenge.com
• http://www.xing.com
Thank You
Questions?

ECIR Recommendation Challenges
