ECIR Recommendation Challenges
Get on with it!
Recommender system industry challenges move towards real-world, online evaluation
Padova – March 24th, 2016
Andreas Lommatzsch - TU Berlin, Berlin, Germany
Jonas Seiler - plista, Berlin, Germany
Daniel Kohlsdorf - XING, Hamburg, Germany
CrowdRec - www.crowdrec.eu
Idomaar - http://rf.crowdrec.eu
Andreas Lommatzsch
Andreas.Lommatzsch@tu-berlin.de
http://www.dai-lab.de
Jonas Seiler
Jonas.Seiler@plista.com
http://www.plista.com
Daniel Kohlsdorf
Daniel.Kohlsdorf@xing.com
http://www.xing.com
Where are recommender system challenges headed?
Direction 1: Use info beyond the user-item matrix.
Direction 2: Online evaluation + multiple metrics.
Moving towards real-world evaluation
Flickr credit: rodneycampbell
Why evaluate?
• Evaluation is crucial for the success of real-life systems
• How should we evaluate? Possible criteria:
○ Precision and recall
○ Technical complexity
○ Influence on sales
○ Required hardware resources
○ Business models
○ Scalability
○ Diversity of the presented results
○ User satisfaction
Traditional Evaluation in IR: “The Cranfield paradigm”
Evaluation Settings
• A static collection of documents
• A set of queries
• A list of relevant documents defined by experts for each query
Advantages
• Reproducible setting
• All researchers have exactly the same information
• Optimized for measuring precision
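A minimal sketch of how a run is scored under the Cranfield setting described above: each system's ranked list for a query is compared against the fixed, expert-defined relevance judgments. The query and document IDs are hypothetical.

```python
# Cranfield-style scoring: compare a retrieved list with a fixed set of
# expert-judged relevant documents. All IDs below are hypothetical.

def precision_recall(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) of a retrieved list against relevance judgments."""
    if not retrieved or not relevant:
        return 0.0, 0.0
    hits = sum(1 for doc in retrieved if doc in relevant)
    return hits / len(retrieved), hits / len(relevant)


# The judgments are created once and shared by every participant,
# which is what makes the setting reproducible.
qrels = {"q0": {"d1", "d4", "d7"}}
run = {"q0": ["d1", "d2", "d4", "d9"]}

for query, ranking in run.items():
    p, r = precision_recall(ranking, qrels[query])
    print(f"{query}: precision={p:.2f} recall={r:.2f}")  # q0: precision=0.50 recall=0.67
```

Because the judgments never change, any two researchers scoring the same run get the same numbers. That reproducibility is exactly the advantage listed, and it is also the source of the weaknesses that follow.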
Traditional Evaluation in IR
Weaknesses of traditional IR evaluation
• High costs for creating datasets
• Datasets are not up-to-date
• Domain-specific documents
• The expert-defined ground truth does not consider individual user preferences
• Context-awareness is not considered
• Technical aspects are ignored
Context is everything
Industry and recsys challenges
• Challenges benefit both industry and academic research.
• We look at how industry challenges have evolved since the Netflix Prize was awarded in 2009.
Traditional Evaluation in RecSys: “The Netflix paradigm”
Evaluation Settings
• Rating prediction on user-item matrices
• Large, sparse dataset
• Predict personalized ratings
• Cross-validation, RMSE
Advantages
• Reproducible setting
• Personalization
• Dataset is based on real user ratings
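The "Netflix paradigm" protocol (hide ratings, predict them, report RMSE under cross-validation) can be sketched as follows. The predictor is a simple global-mean-plus-user-bias baseline chosen only for illustration. It is an assumption, not a method from the talk.

```python
import math
import random
from collections import defaultdict

# Sketch of k-fold cross-validation with RMSE on a user-item rating list.
# The user-bias baseline predictor is an illustrative assumption.

Rating = tuple[str, str, float]  # (user, item, rating)


def rmse(pairs: list[tuple[float, float]]) -> float:
    """Root mean squared error over (predicted, actual) pairs."""
    return math.sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))


def fit_user_bias(train: list[Rating]):
    """Predict the global mean plus the user's average deviation from it."""
    mu = sum(r for _, _, r in train) / len(train)
    deviations = defaultdict(list)
    for user, _, r in train:
        deviations[user].append(r - mu)
    bias = {u: sum(d) / len(d) for u, d in deviations.items()}
    return lambda user, item: mu + bias.get(user, 0.0)  # unseen users get the mean


def cross_validate(ratings: list[Rating], k: int = 5, seed: int = 0) -> float:
    """Average RMSE over k folds; each rating is held out exactly once."""
    if k < 2 or len(ratings) < k:
        raise ValueError("need k >= 2 and at least k ratings")
    shuffled = ratings[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        predict = fit_user_bias(train)
        scores.append(rmse([(predict(u, it), r) for u, it, r in test]))
    return sum(scores) / k
```

The whole protocol sees only a static list of (user, item, rating) triples. No timestamps, context or latency enter the score, which is what the weaknesses below point out.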
Traditional Evaluation in RecSys
Weaknesses of traditional Recommender evaluation
• Static data
• Only one type of data (user ratings)
• User ratings are noisy
• Temporal aspects tend to be ignored
• Context-awareness is not considered
• Technical aspects are ignored
Challenges of Developing Applications
Challenges
• Data streams - continuous changes
• Big data
• Combine knowledge from different sources
• Context-Awareness
• Users expect personally relevant results
• Heterogeneous devices
• Technical complexity, real-time requirements
How to Set Up a Better Evaluation?
How can the evaluation address these challenges?
• Realistic evaluation setting
○ Heterogeneous data sources
○ Streams
○ Dynamic user feedback
• Appropriate metrics
○ Precision and user satisfaction
○ Technical complexity
○ Sales and business models
• Online and offline evaluation
Approaches for a better Evaluation
• News recommendations @ plista
• Job recommendations @ XING
The plista Recommendation Scenario
Setting
● 250 ms response time
● 350 million AI/day
● In 10 countries
Challenges
● News changes continuously
● Users do not log in explicitly
● Seasonality, context-dependent user preferences
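One common baseline that copes with two of the challenges listed above, anonymous users and continuously changing news, is "most popular within a recent time window". The sketch below assumes a stream of (timestamp, item) click events. The class name and the one-hour default window are hypothetical and are not plista's production system.

```python
from collections import Counter, deque

# Sketch: recommend the items clicked most often within a sliding time window.
# Needs no user login, and old news drops out automatically as the window moves.
# Class name and default window length are hypothetical.

class RecentPopularity:
    def __init__(self, window_seconds: float = 3600.0):
        self.window = window_seconds
        self.events: deque[tuple[float, str]] = deque()  # (timestamp, item_id)
        self.counts: Counter[str] = Counter()

    def observe(self, timestamp: float, item_id: str) -> None:
        """Record a click and evict events that have left the window."""
        self.events.append((timestamp, item_id))
        self.counts[item_id] += 1
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        while self.events and self.events[0][0] <= now - self.window:
            _, old = self.events.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def recommend(self, now: float, k: int, exclude: frozenset[str] = frozenset()) -> list[str]:
        """Return up to k currently popular items, skipping e.g. the article being read."""
        self._evict(now)
        ranked = [item for item, _ in self.counts.most_common() if item not in exclude]
        return ranked[:k]
```

The window length trades freshness against stability. Seasonality and context-dependent preferences would need additional signals that this baseline ignores.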
Evaluation @ plista
Offline
• Cross-validation
• Metric Optimization Engine (https://github.com/Yelp/MOE)
• Integration into Spark
• How well does it correlate with online evaluation?
• Time complexity
Online
• A/B tests
• Limited by caching memory and computational resources
• MOE*
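The question "How well does it correlate with online evaluation?" can be checked by scoring several algorithm variants both offline (for example, cross-validated precision) and online (for example, A/B-test CTR), then computing a rank correlation. The sketch below uses Spearman's coefficient with average ranks for ties. All numbers are made up for illustration.

```python
# Sketch: Spearman rank correlation between an offline metric and online CTR,
# one value per algorithm variant. A value near 1 means the offline metric
# orders the variants the same way users do. The data is hypothetical.

def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for idx in order[i : j + 1]:
            result[idx] = avg
        i = j + 1
    return result


def spearman(x: list[float], y: list[float]) -> float:
    """Pearson correlation of the rank vectors (undefined if either is constant)."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


offline_precision = [0.12, 0.15, 0.11, 0.18]  # hypothetical, one per variant
online_ctr = [0.008, 0.011, 0.009, 0.012]     # hypothetical, one per variant
print(round(spearman(offline_precision, online_ctr), 2))  # 0.8
```

Only the ranking of variants matters for deciding what to A/B test next, which is why a rank correlation is used here rather than a correlation of raw values.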
Evaluation using MOE
Offline
• Estimate the mean and variance over the parameter space with a Gaussian process
• Evaluate the parameter setting with the highest Expected Improvement (EI), Upper Confidence Bound, …
• REST API
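For reference, Expected Improvement has a closed form given the Gaussian-process posterior mean μ and standard deviation σ at a candidate setting and the best value observed so far: EI = (μ − best)·Φ(z) + σ·φ(z), with z = (μ − best)/σ. The sketch below implements this textbook formula for maximization. It does not use MOE's own API, and the candidate values are hypothetical.

```python
import math

# Textbook Expected Improvement (maximization) from a GP posterior at one point.
# This is a standalone sketch, not a call into MOE. Candidate values are hypothetical.

def expected_improvement(mu: float, sigma: float, best: float) -> float:
    if sigma <= 0.0:
        return max(mu - best, 0.0)  # no uncertainty: improvement is deterministic
    z = (mu - best) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))        # Phi(z)
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # phi(z)
    return (mu - best) * cdf + sigma * pdf


# Posterior (mean CTR, std) for three hypothetical parameter settings.
candidates = {"a": (0.010, 0.001), "b": (0.009, 0.004), "c": (0.011, 0.0005)}
best_observed = 0.0105
choice = max(candidates, key=lambda k: expected_improvement(*candidates[k], best_observed))
print(choice)  # b
```

Setting "b" is chosen even though its mean is the lowest. Its large uncertainty gives it the most room to improve, which is how EI balances exploration against exploitation.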
Evaluation using MOE
Online
• A/B tests are expensive
• Model non-stationarity
• Integrate out the non-stationarity to get the mean EI
The CLEF-NewsREEL challenge
Provide an API that enables researchers to test their own ideas
• The CLEF-NewsREEL challenge
• A challenge in CLEF (Conference and Labs of the Evaluation Forum)
• Two tasks: online and offline evaluation
CLEF-NewsREEL Online Task
How does the challenge work?
• Live streams consisting of impressions, requests, and clicks; 5 publishers; approx. 6 million messages per day
• Technical requirements: 100 ms per request
• Live evaluation based on CTR
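Meeting a hard per-request limit such as the 100 ms above usually means degrading gracefully rather than timing out. The sketch below scores candidates until a deadline and then pads the answer with a precomputed fallback list, for example currently popular items. The scoring function, the fallback list and the function name are hypothetical placeholders. In practice the budget would be set below 100 ms to leave room for network overhead.

```python
import time
from typing import Callable, Iterable

# Sketch of an "anytime" recommender: score candidates until the time budget runs
# out, then return the best items found, padded with a fallback list.
# Names and the fallback strategy are hypothetical.

def recommend_within_budget(
    candidates: Iterable[str],
    score: Callable[[str], float],
    fallback: list[str],
    k: int,
    budget_s: float = 0.1,
) -> list[str]:
    deadline = time.monotonic() + budget_s
    scored: list[tuple[float, str]] = []
    for item in candidates:
        if time.monotonic() >= deadline:
            break  # out of time: keep only what has been scored so far
        scored.append((score(item), item))
    scored.sort(reverse=True)
    result = [item for _, item in scored[:k]]
    for item in fallback:  # fill any remaining slots
        if len(result) >= k:
            break
        if item not in result:
            result.append(item)
    return result
```

The design choice is that a slightly worse answer delivered in time still counts toward CTR, whereas a late answer is simply lost.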
CLEF-NewsREEL Offline Task
Online vs. Offline Evaluation
• Technical aspects can be evaluated without user feedback
• Analyze the required resources and the response time
• Simulate the online evaluation by replaying a recorded stream
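Replaying a recorded stream can be sketched as a loop over events in timestamp order. The recommender answers each request using only past events, and a later click by the same user on a recommended item counts as a hit. The event schema and the recommender interface (`recommend(event, k)`, `update(event)`) are hypothetical. They are not the NewsREEL message format or the Idomaar API.

```python
import time
from dataclasses import dataclass

# Sketch of offline replay evaluation. Assumed schema: kind is "impression",
# "request" or "click". The recommender is any object with the hypothetical
# methods recommend(event, k) -> list[str] and update(event) -> None.

@dataclass
class Event:
    timestamp: float
    kind: str
    user: str
    item: str


def replay(events: list[Event], recommender, k: int = 5) -> tuple[int, int, float]:
    """Return (hits, requests, worst response time in seconds)."""
    pending: dict[str, set[str]] = {}  # user -> recommended items not yet clicked
    latencies: list[float] = []
    requests = hits = 0
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.kind == "request":
            start = time.perf_counter()
            recs = recommender.recommend(ev, k)
            latencies.append(time.perf_counter() - start)
            requests += 1
            pending.setdefault(ev.user, set()).update(recs)
        elif ev.kind == "click" and ev.item in pending.get(ev.user, set()):
            hits += 1
            pending[ev.user].discard(ev.item)  # count each recommended item once
        recommender.update(ev)  # the model only ever sees past events
    return hits, requests, max(latencies, default=0.0)
```

Because the stream is fixed, two algorithms replayed on the same recording are directly comparable. The measured latencies also cover the technical aspects that need no user feedback.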
CLEF-NewsREEL Offline Task
Challenge
• Realistic simulation of streams
• Reproducible setup of computing environments
Solution
• A framework simplifying the setup of the evaluation environment
• The Idomaar framework developed in the CrowdRec project
http://rf.crowdrec.eu
CLEF-NewsREEL
More Information
• SIGIR Forum, Dec 2015 (Vol. 49, No. 2) http://sigir.org/files/forum/2015D/p129.pdf
Evaluate your algorithm online and offline in NewsREEL
• Register for the challenge! http://crowdrec.eu/2015/11/clef-newsreel-2016/ (register until 22nd of April)
• Tutorials and templates are provided at orp.plista.com
https://recsys.xing.com/
XING - RecSys Challenge
Job Recommendations @ XING
XING - Evaluation based on interaction
● On XING, users can give explicit feedback on recommendations.
● The amount of explicit user feedback is much lower than that of implicit signals.
● A/B tests focus on click-through rate.
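Whether a CTR difference between two A/B variants is more than noise can be checked with a two-sided two-proportion z-test, a standard choice shown here as a sketch. The click and view counts are hypothetical. A real test also needs a sample size fixed in advance, and the pooled variance is zero (so the test is undefined) if no variant has any clicks.

```python
import math

# Sketch: two-sided two-proportion z-test (pooled variance, normal approximation)
# for the CTR of variants A and B. Counts below are hypothetical.

def ctr_z_test(clicks_a: int, views_a: int, clicks_b: int, views_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for H0: CTR_A == CTR_B."""
    p_a, p_b = clicks_a / views_a, clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


z, p = ctr_z_test(clicks_a=500, views_a=50_000, clicks_b=590, views_b=50_000)
print(f"z={z:.2f}, p={p:.4f}")
```

With CTRs around 1%, tens of thousands of views per variant are needed to detect differences of this size. This is one concrete reason why A/B tests are expensive.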
XING - RecSys Challenge, Scoring, Space on Page
● Predict 30 items for each user.
● Score: weighted combination of precision at several cutoffs:
○ precisionAt(2)
○ precisionAt(4)
○ precisionAt(6)
○ precisionAt(20)
Top 6
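A per-user score of this shape can be sketched as a weighted sum of precision at the listed cutoffs. The slide does not state the weights, so `WEIGHTS` below is a hypothetical placeholder. The official RecSys Challenge 2016 evaluation code defines the actual formula.

```python
# Sketch: weighted combination of precision@k for one user's ranked list
# (up to 30 items). WEIGHTS is a hypothetical placeholder, not the official one.

WEIGHTS = {2: 1.0, 4: 1.0, 6: 1.0, 20: 1.0}


def precision_at(k: int, ranked: list[str], relevant: set[str]) -> float:
    """Fraction of the top-k recommended items the user actually interacted with."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


def user_score(ranked: list[str], relevant: set[str], weights: dict[int, float] = WEIGHTS) -> float:
    return sum(w * precision_at(k, ranked, relevant) for k, w in weights.items())
```

The small cutoffs reflect the limited space on the page. Items ranked in the first few positions are the ones users actually see, so they count at several cutoffs at once.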
XING - RecSys Challenge, User Data
• User ID
• Job Title
• Educational Degree
• Field of Study
• Location
XING - RecSys Challenge, User Data
• Number of past jobs
• Years of Experience
• Current career level
• Current discipline
• Current industry
XING - RecSys Challenge, Item Data
• Job title
• Desired career level
• Desired discipline
• Desired industry
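The user and item attributes above line up pairwise (current vs. desired career level, discipline and industry). This allows a simple content-based signal beyond the user-item matrix. The sketch counts matching attributes. Equal weights, exact matching and the integer encoding of career level are assumptions for illustration.

```python
from dataclasses import dataclass

# Sketch: content-based match between a user profile and a job posting.
# Field names mirror the slides; equal weights and exact matching are assumptions.

@dataclass
class User:
    career_level: int
    discipline: str
    industry: str


@dataclass
class Job:
    desired_career_level: int
    desired_discipline: str
    desired_industry: str


def match_score(user: User, job: Job) -> int:
    """Number of matching attributes (0 to 3)."""
    return (
        (user.career_level == job.desired_career_level)
        + (user.discipline == job.desired_discipline)
        + (user.industry == job.desired_industry)
    )
```

Such a score needs no interaction history, so it can rank jobs for new users or serve as one feature alongside collaborative signals.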
XING - RecSys Challenge, Interaction Data
• Timestamp
• User
• Job
• Type:
• Deletion
• Click
• Bookmark
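The three interaction types can be turned into training labels. In this sketch, clicks and bookmarks are treated as positive feedback and deletions as negative feedback. The tuple layout (timestamp, user, job, type) and the rule that the latest interaction on a (user, job) pair decides the label are assumptions for illustration.

```python
# Sketch: derive (user, job) labels from the interaction log.
# Assumed layout: (timestamp, user, job, type); the latest interaction wins.

POSITIVE = {"click", "bookmark"}
NEGATIVE = {"deletion"}


def build_labels(interactions: list[tuple[int, str, str, str]]) -> dict[tuple[str, str], int]:
    """Map (user, job) -> 1 (positive) or 0 (negative)."""
    labels: dict[tuple[str, str], int] = {}
    for _, user, job, kind in sorted(interactions):  # chronological order
        kind = kind.lower()
        if kind in POSITIVE:
            labels[(user, job)] = 1
        elif kind in NEGATIVE:
            labels[(user, job)] = 0
    return labels
```

Deletions are valuable here because, unlike missing clicks, they are explicit evidence that a recommendation was unwanted.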
XING - RecSys Challenge, Anonymization
XING - RecSys Challenge, Future
• Live challenge
• Participants submit predicted future interactions
• The submitted solutions are recommended on the platform
• Participants get points for actual user clicks
Cycle: Release to challenge → Work on predictions → Collect clicks → Score
Concluding ...
How to set up a better evaluation
• Consider different quality criteria (prediction, technical, business models)
• Aggregate heterogeneous information sources
• Consider user feedback
• Use online and offline analyses to understand users and their requirements
Concluding ...
Participate in challenges based on real-life scenarios
• NewsREEL challenge
• RecSys 2016 challenge
=> Organize a challenge. Focus on real-life data.
More Information
• http://www.crowdrec.eu
• (http://www.clef-newsreel.org)
• http://orp.plista.com
• http://2016.recsyschallenge.com
• http://www.xing.com
Thank You
Questions?