Continuous Evaluation of
Collaborative Recommender Systems in
Data Stream Management Systems
Cornelius A. Ludmann, Marco Grawunder,
Timo Michelsen, H.-Jürgen Appelrath
{cornelius.ludmann, marco.grawunder, timo.michelsen, appelrath}@uni-oldenburg.de
University of Oldenburg · Department of Computer Science
Information Systems Group
DSEP @ BTW 2015, Hamburg, Germany
March 3, 2015
cornelius.ludmann.org · @ludmann
2
Content
● Short introduction to Recommender Systems
● From static and finite rating datasets to
an infinite stream of rating events
● Continuous query plan for collaborative RecSys
● Evaluation methodology for a DSMS-based
RecSys
● Prototypical implementation with Odysseus
3
Problem: Information Overload
Photo: CC BY-NC 2.0, https://www.flickr.com/photos/lyonora/3608656428/, user Leonora Giovanazzi
How to handle the flood of information?
4
From Search to Recommendation
“The Web [...] is leaving the era of search and
entering one of discovery.
What's the difference?
Search is what you do when you're looking for
something. Discovery is when something wonderful
that you didn't know existed, or didn't know how to
ask for, finds you.”
– Jeffrey M. O'Brien: “The race to create a 'smart' Google”,
CNN Money / Fortune Magazine (2006)
http://archive.fortune.com/magazines/fortune/fortune_archive/2006/11/27/8394347/index.htm
5
The Recommender Problem
Estimate a utility function that
predicts how a user will like an item.
Items can be, e. g., products, movies, or music.
The liking is quantified by a rating score, e. g., 1 to 5 stars.
The utility function maps each pair from the set of users and the set of items
to a value in the range of the rating score.
How to estimate the utility function?
A common approach is Collaborative Filtering.
12
Collaborative Filtering (CF)
[Figure: sparse user-item rating matrix; most cells are empty]
Ratings are given by the users:
explicitly or implicitly by their behavior.
14
Collaborative Filtering (CF)
A learner “finds” an approximation
of the true utility function.
Examples of learners are:
● User similarity methods
● Matrix factorization methods
15
Collaborative Filtering (CF)
[Figure: rating matrix with predicted ratings filled in for unrated items]
Predict a rating for all unrated items.
Recommend the items with the highest predicted ratings.
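The two steps above (predict a rating for every unrated item, then recommend the highest-scoring ones) can be sketched with a user-similarity learner. This is a minimal illustration, not the method used in the prototype: the toy ratings, the cosine similarity over co-rated items, and the function names are all assumptions made for this example.

```python
import math

# Hypothetical toy ratings (user -> {item: rating}); not taken from the slides.
RATINGS = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 5, "B": 3, "D": 5},
    "carol": {"A": 1, "C": 2, "D": 1},
}


def cosine(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict(ratings, user, item):
    """Similarity-weighted average of the other users' ratings for `item`."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine(ratings[user], other_ratings)
        num += sim * other_ratings[item]
        den += abs(sim)
    return num / den if den else None


def recommend(ratings, user, k=2):
    """Return up to k unrated items, ordered by predicted rating (highest first)."""
    all_items = {i for r in ratings.values() for i in r}
    unrated = all_items - set(ratings[user])
    scored = [(predict(ratings, user, i), i) for i in unrated]
    scored = [(s, i) for s, i in scored if s is not None]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item for _, item in scored[:k]]


if __name__ == "__main__":
    print(recommend(RATINGS, "alice"))
```

Cosine similarity on raw ratings is a deliberately simple choice; mean-centered variants or matrix factorization are common alternatives.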
16
What about …?
… different situations?
… mood of the users?
… (temporary) changes of user interests?
17
Problem of Traditional RecSys
The interests of users in items can differ
at different points in time.
● Depending on the situation of the user
– Context awareness
– Hidden contexts
● Due to changing user preferences
– Concept drift
Context-Aware RecSys
consider context data.
Time-Aware RecSys
consider temporal effects.
18
From Static Rating Datasets …
Traditional RecSys learners use a static and finite set
of rating data to build a RecSys model.
The model is relearned periodically, after a specific time span,
to incorporate new rating data.
19
… to Rating Events (Real-life RecSys)
[Figure: a rating event (user, item, rating) at time t updates the model (P × Q)]
Rating events occur continuously, which leads to new learning data
that potentially improves the predictions for all users.
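How a single rating event can update a factor model is sketched below. This is a plain incremental matrix factorization with one stochastic gradient step per event, written only for illustration; it is not BRISMF and not the prototype's code. The class name, the hyperparameters, and the initialization range are assumptions.

```python
import random


class IncrementalMF:
    """Matrix factorization without bias terms, updated one rating event at a time."""

    def __init__(self, n_factors=4, lr=0.05, reg=0.02, seed=0):
        self.n_factors, self.lr, self.reg = n_factors, lr, reg
        self.rng = random.Random(seed)
        self.P = {}  # user -> factor vector
        self.Q = {}  # item -> factor vector

    def _vec(self, table, key):
        # Unknown users/items get a small random factor vector on first sight.
        if key not in table:
            table[key] = [self.rng.uniform(-0.1, 0.1) for _ in range(self.n_factors)]
        return table[key]

    def predict(self, user, item):
        p, q = self._vec(self.P, user), self._vec(self.Q, item)
        return sum(a * b for a, b in zip(p, q))

    def update(self, user, item, rating):
        """Apply one regularized SGD step for a single rating event; return the error before the step."""
        p, q = self._vec(self.P, user), self._vec(self.Q, item)
        err = rating - sum(a * b for a, b in zip(p, q))
        for f in range(self.n_factors):
            pf, qf = p[f], q[f]
            p[f] += self.lr * (err * qf - self.reg * pf)
            q[f] += self.lr * (err * pf - self.reg * qf)
        return err


if __name__ == "__main__":
    model = IncrementalMF()
    for event in [("u1", "i1", 5.0), ("u2", "i1", 4.0), ("u1", "i2", 1.0)]:
        model.update(*event)
    print(model.predict("u2", "i2"))
```

Because item factors are shared, an event from one user also changes the predictions for other users who are scored against the same item.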
20
Related Work
● Time-aware Collaborative RecSys
(e. g., Koren 2010)
● Incremental/Online Collaborative RecSys Algorithms
(e. g., BRISMF by Takács et al. 2009)
● Collaborative RecSys with Apache Storm by Ali et al. 2011
– Algorithm for parallel collaborative filtering
● StreamRec by Chandramouli et al. 2011
– Based on Microsoft StreamInsight
– Calculates item/user similarities with DSMS operators
● Massive Online Analysis (MOA) by Bifet et al. 2010
– Framework for evaluating machine learning algorithms
– Implements BRISMF
21
Why DSMS + RecSys?
Characteristics of a (our) DSMS:
● Continuous queries on potentially infinite data
streams
● Stream-based operators (one pass) /
stream-based relational algebra
● Query plan as directed graph of operators
● Logical operators are transformed to
physical operators
● Query optimizations, query sharing
● Time annotation of stream elements
● Operator framework
22
Why DSMS + RecSys?
What is the current interest of a user?
● Continuous processing of rating data
● Models are valid for a specific time interval
● Deterministic temporal matching of model and data
In general:
● Take advantage of DSMS features (like optimizations,
flexible query formulation, …)
● Usage of established standard operators
● Extensibility like pre- and post-processing
(e. g., context reasoning/modeling, normalization, …)
23
RecSys Operators
[Figure: query plan with the operators “train recsys model”, “get unrated items”, “predict rating”, and “recommend”. Edge labels: rating event; model with validity time interval; request for recommendations; recommendation candidates; select top-k; feedback.]
24
RecSys Operators
[Figure: the same query plan (“train recsys model”, “get unrated items”, “predict rating”, “recommend”, feedback), extended by a window operator]
Rating data is potentially infinite in size!
A window operator limits the tuples in memory.
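One way to picture this is a time window that annotates each tuple with a validity interval, so that expired tuples can be dropped from memory. The sketch below is an assumption-laden illustration, not the Odysseus operator: the function and class names are hypothetical, timestamps are assumed to arrive in order, and the window size is assumed to be constant.

```python
from collections import deque


def annotate(timestamp, payload, size):
    """Time window: the tuple is valid in the half-open interval [timestamp, timestamp + size)."""
    return (timestamp, timestamp + size, payload)


class ValidTupleBuffer:
    """Holds only tuples that are still valid at the start time of the newest tuple."""

    def __init__(self):
        self.buffer = deque()

    def insert(self, element):
        start, _end, _payload = element
        # Purge tuples whose validity ended at or before `start`
        # (works because in-order input and a fixed size keep end times sorted).
        while self.buffer and self.buffer[0][1] <= start:
            self.buffer.popleft()
        self.buffer.append(element)

    def payloads(self):
        return [payload for _start, _end, payload in self.buffer]


if __name__ == "__main__":
    buf = ValidTupleBuffer()
    for t, rating_event in [(0, "r0"), (5, "r1"), (12, "r2")]:
        buf.insert(annotate(t, rating_event, size=10))
    print(buf.payloads())
```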
25
RecSys Operators
● “train recsys model” Operator
– Implements a learner to train a model.
– Holds valid rating data.
– Outputs models with a validity time interval.
● “get unrated items” Operator
– Receives requests for recommendations.
– Outputs a recommendation candidate tuple for each item that the
requesting user has not rated.
● “predict rating” Operator
– Predicts the rating for each recommendation candidate.
● “recommend” Operator
– Selects the recommendations for the requesting user.
– Selects items with a min. rating and/or top-K items.
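The three request-side operators can be read as a small pipeline of functions. The following sketch only mirrors the descriptions above; the function signatures, the tuple layout, and the toy model are assumptions for illustration and do not reflect the Odysseus operator interfaces.

```python
def get_unrated_items(user, all_items, rated_by_user):
    """Emit one recommendation candidate per item the requesting user has not rated."""
    for item in sorted(all_items - rated_by_user):
        yield (user, item)


def predict_rating(candidates, model):
    """Attach a predicted rating to each candidate; `model` is any callable (user, item) -> rating."""
    for user, item in candidates:
        yield (user, item, model(user, item))


def recommend(predictions, min_rating=None, top_k=None):
    """Keep candidates with at least `min_rating` and/or only the top-K by predicted rating."""
    kept = [p for p in predictions if min_rating is None or p[2] >= min_rating]
    kept.sort(key=lambda p: -p[2])
    return kept if top_k is None else kept[:top_k]


if __name__ == "__main__":
    # Hypothetical stand-in for a trained model.
    scores = {"B": 4.5, "C": 2.0, "D": 3.5}

    def toy_model(user, item):
        return scores[item]

    candidates = get_unrated_items("u1", {"A", "B", "C", "D"}, {"A"})
    print(recommend(predict_rating(candidates, toy_model), min_rating=3.0, top_k=2))
```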
26
Operators for Continuous Evaluation
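[Figure: query plan extended for evaluation. A “route” operator splits the rating data into learning data for “train recsys model” and test data for a second “predict rating” operator; “test prediction” outputs the prediction error. The window, “get unrated items”, “recommend”, and feedback parts are unchanged.]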
27
Operators for Continuous Evaluation
● “route” Operator
– Distributes incoming data as learning or test data.
– Implements an evaluation methodology.
– e. g., Hold out: route 10 % as test data
● “predict rating” Operator
– Predicts the rating for each test tuple.
● “test prediction” Operator
– Calculates an error value from the true and the predicted rating.
– e. g., Root Mean Square Error (RMSE)
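A hold-out “route” step and an RMSE computation might look as follows. This is a simplified sketch under stated assumptions: the random 10 % split with a fixed seed, the list-based (non-streaming) interface, and the function names are all illustrative choices, not the prototype's implementation.

```python
import math
import random


def route(events, test_fraction=0.1, seed=42):
    """Hold-out routing: send roughly `test_fraction` of the events to the test stream."""
    rng = random.Random(seed)
    learning, test = [], []
    for event in events:
        if rng.random() < test_fraction:
            test.append(event)
        else:
            learning.append(event)
    return learning, test


def rmse(pairs):
    """Root Mean Square Error over (true_rating, predicted_rating) pairs."""
    pairs = list(pairs)
    return math.sqrt(sum((true - pred) ** 2 for true, pred in pairs) / len(pairs))


if __name__ == "__main__":
    events = [("u%d" % i, "item", 3.0) for i in range(1000)]
    learning, test = route(events)
    print(len(learning), len(test))
    print(rmse([(3.0, 1.0), (5.0, 5.0)]))
```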
28
Prototypical Implementation
29
Physical Query Plan
[Figure: physical query plan, first part. Annotations:]
● Inputs: requests for recommendations (RfR), rating data
● Metadata creation (time interval)
● Routes learning and test data
● Joins RfR with temporally matching models
● Joins test data with temporally matching models
● Limits validity of learning data
● Implements learner
● “Now” window
30
Physical Query Plan
[Figure: physical query plan, second part. Annotations:]
● Adds predicted rating to test tuple
● Map operator
● Aggregation size
● Aggregation operator
● Map operator
● Outputs unrated items
● Joins unrated items and models
● Adds predicted rating to unrated item
● Selects items with min rating
● Selects top-K items
31
Physical Operators
● “train recsys model” Operator
– BRISMF (incremental matrix factorization)
– BRISMF implementation of Massive Online Analysis
(MOA) integrated in Odysseus
http://moa.cms.waikato.ac.nz/details/recommender-systems/
– One model for each subset of valid learning tuples.
– Exactly one valid model at every point in time.
– Needs to hold all learning tuples in memory but
does not build models from scratch.
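The property “exactly one valid model at every point in time” can be illustrated with models whose validity intervals are adjacent: each new model closes the interval of its predecessor. The sketch below is an assumption for illustration (class and method names are hypothetical, and start times are assumed to be strictly increasing); it is not the temporal join of the DSMS.

```python
import bisect


class ModelHistory:
    """Models with adjacent validity intervals [start_i, start_{i+1})."""

    def __init__(self):
        self.starts = []
        self.models = []

    def publish(self, start, model):
        """Register a model valid from `start`; the previous model's interval ends at `start`."""
        self.starts.append(start)
        self.models.append(model)

    def valid_at(self, t):
        """Return the single model whose validity interval contains t, or None before the first model."""
        idx = bisect.bisect_right(self.starts, t) - 1
        return self.models[idx] if idx >= 0 else None


if __name__ == "__main__":
    history = ModelHistory()
    history.publish(0, "model after 0 tuples")
    history.publish(5, "model after 1 tuple")
    print(history.valid_at(4), "|", history.valid_at(5))
```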
32
Physical Operators
● “interleaved test-then-train” Operator
– Physical counterpart to “route” operator
– Uses rating tuples for both learning and testing
– Sets the validity interval of a test tuple to [t-1, t) so that it
matches a model that has not used this tuple for learning
● “test prediction” Operator is implemented by
– a map operator → calculates the squared error
– a time window → sets the aggregation time span
– an aggregation → aggregates the errors (avg)
– another map operator → calculates the root of the averaged error
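The evaluation order and the map, aggregate, map chain for the RMSE can be sketched in a sequential loop. Note the assumptions: a trivial running-mean learner stands in for BRISMF, the aggregation spans all tuples seen so far (an unbounded window), and all names are hypothetical.

```python
import math


class RunningMean:
    """Stand-in learner (an assumption, not BRISMF): predicts the mean of all ratings seen so far."""

    def __init__(self, default=3.0):
        self.total, self.count, self.default = 0.0, 0, default

    def predict(self, user, item):
        return self.total / self.count if self.count else self.default

    def train(self, user, item, rating):
        self.total += rating
        self.count += 1


def interleaved_test_then_train(events, learner):
    """Yield the running RMSE after each rating event."""
    squared_error_sum, n = 0.0, 0
    for user, item, rating in events:
        predicted = learner.predict(user, item)          # test first: model has not seen this tuple
        squared_error_sum += (rating - predicted) ** 2   # map: squared error
        n += 1
        learner.train(user, item, rating)                # then train on the same tuple
        yield math.sqrt(squared_error_sum / n)           # aggregate (avg), then map (root)


if __name__ == "__main__":
    stream = [("u1", "i1", 5.0), ("u2", "i1", 5.0), ("u1", "i2", 4.0)]
    for value in interleaved_test_then_train(stream, RunningMean()):
        print(value)
```

In the DSMS the same ordering is not enforced by a loop but by the [t-1, t) validity interval of the test tuple and the temporal join with the models.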
33
Plot of Root Mean Square Error
34
Prototype Evaluation
● Comparison of RMSE after every learning tuple with
Massive Online Analysis (MOA)
– MovieLens dataset, ordered by timestamp, read line-by-line as
rating data
– No decay of learning tuples (unbounded window)
– Aggregation of RMSE over the whole dataset
– (Random users for requests for recommendations)
● Same results as MOA
– MOA operates sequentially (first test, then train),
we ensure the correct order by the time annotation and the
temporal join
– Temporal matching works as expected
35
Summary and Future Work
● Generic, extendable and modular structure for a RecSys based on
DSMS operators
● Logical operators allow different physical implementations
● Time annotations ensure deterministic temporal matching of models
and data
● Prototypical implementation with BRISMF and Interleaved Test-Then-Train
● Future Work:
– Implementation of learners that consider temporal aspects
– Impact of decay of tuples (different windows) on accuracy, latency,
throughput, memory consumption
– Optimizations of algorithms, query plan, transformations …
Thank you for your attention!

Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 

Continuous Evaluation of Collaborative Recommender Systems in Data Stream Management Systems

  • 1. Continuous Evaluation of Collaborative Recommender Systems in Data Stream Management Systems Cornelius A. Ludmann, Marco Grawunder, Timo Michelsen, H.-Jürgen Appelrath {cornelius.ludmann, marco.grawunder, timo.michelsen, appelrath}@uni-oldenburg.de University of Oldenburg · Department of Computer Science Information Systems Group DSEP @ BTW 2015, Hamburg, Germany March 3, 2015 cornelius.ludmann.org · @ludmann
  • 2. 2 Content ● Short introduction to Recommender Systems ● From static and finite rating datasets to an infinite stream of rating events ● Continuous query plan for collaborative RecSys ● Evaluation methodology for a DSMS-based RecSys ● Prototypical implementation with Odysseus
  • 4. 4 From Search to Recommendation “The Web [...] is leaving the era of search and entering one of discovery. What's the difference? Search is what you do when you're looking for something. Discovery is when something wonderful that you didn't know existed, or didn't know how to ask for, finds you.” – Jeffrey M. O'Brien: “The race to create a 'smart' Google”, CNN Money / Fortune Magazine (2006) http://archive.fortune.com/magazines/fortune/fortune_archive/2006/11/27/8394347/index.htm
  • 5. 5 The Recommender Problem Estimate a utility function that predicts how a user will like an item.
  • 6. 6 The Recommender Problem Estimate a utility function that predicts how a user will like an item. products
  • 7. 7 The Recommender Problem Estimate a utility function that predicts how a user will like an item. movies
  • 8. 8 The Recommender Problem Estimate a utility function that predicts how a user will like an item. music
  • 9. 9 The Recommender Problem Estimate a utility function that predicts how a user will like an item. quantified by a rating score e. g., 1 to 5 stars:
  • 10. 10 The Recommender Problem Estimate a utility function that predicts how a user will like an item. The function maps each pair of a user (from the set of users) and an item (from the set of items) to a value in the range of the rating score.
  • 11. 11 The Recommender Problem Estimate a utility function that predicts how a user will like an item. How to estimate the utility function? A common approach is Collaborative Filtering.
  • 12. 12 Collaborative Filtering (CF) [Figure: user-item rating matrix with only some cells filled] Ratings are given by the users, either explicitly or implicitly through their behavior.
  • 13. 13 Collaborative Filtering (CF) [Figure: the same user-item rating matrix] The rating matrix is sparse.
  • 14. 14 Collaborative Filtering (CF) A learner “finds” an approximation to the true utility function. Examples of learners are: ● User similarity methods ● Matrix factorization methods
  • 15. 15 Collaborative Filtering (CF) [Figure: user-item rating matrix with predicted ratings] Predict a rating for all unrated items, then recommend the items with the highest predicted ratings.
  • 16. 16 What about …? … different situations? … mood of the users? … (temporary) changes of user interests?
  • 17. 17 Problem of Traditional RecSys The interests of users in items can differ at different points in time. ● Depending on the situation of the user – Context awareness – Hidden contexts ● Due to changing user preferences – Concept drift Context-Aware RecSys consider context data. Time-Aware RecSys consider temporal effects.
  • 18. 18 From Static Rating Datasets … Traditional RecSys learners use a static and finite set of rating data to build a RecSys model. The model is relearned periodically, after a specific time span, to incorporate new rating data.
  • 19. 19 … to Rating Events (Real-life RecSys) [Figure: a rating event (user, item, rating) at time t feeds the model, drawn with the matrices P and Q] Rating events occur continuously, which leads to new learning data that potentially improves the predictions for all users.
  • 20. 20 Related Work ● Time-aware Collaborative RecSys (e. g., Koren 2010) ● Incremental/Online Collaborative RecSys Algorithms (e. g., BRISMF by Takács et al. 2009) ● Collaborative RecSys with Apache Storm by Ali et al. 2011 – Algorithm for parallel collaborative filtering ● StreamRec by Campos et al. 2011 – Based on Microsoft StreamInsight – Calculates item/user similarities with DSMS operators ● Massive Online Analysis (MOA) by Bifet et al. 2010 – Framework for evaluating machine learning algorithms – Implements BRISMF
  • 21. 21 Why DSMS + RecSys? Characteristics of a (our) DSMS: ● Continuous queries on potentially infinite data streams ● Stream-based operators (one pass) / stream-based relational algebra ● Query plan as directed graph of operators ● Logical operators are transformed to physical operators ● Query optimizations, query sharing ● Time annotation of stream elements ● Operator framework
  • 22. 22 Why DSMS + RecSys? What is the current interest of a user? ● Continuous processing of rating data ● Models are valid for a specific time interval ● Deterministic temporal matching of model and data In general: ● Take advantage of DSMS features (like optimizations, flexible query formulation, …) ● Usage of established standard operators ● Extensibility like pre- and post-processing (e. g., context reasoning/modeling, normalization, …)
  • 23. 23 RecSys Operators [Figure: logical query plan. Operators: train recsys model, get unrated items, predict rating, recommend. Edge labels: rating event, model with validity time interval, request for recommendations, recommendation candidates, select top-k, feedback.]
  • 24. 24 RecSys Operators [Figure: the same query plan with an added window operator] Rating data is potentially infinite in size! A window operator limits the tuples in memory.
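
The effect of the window operator on slide 24 can be sketched in a few lines. This is only an illustration under stated assumptions, not the Odysseus operator: the class name `TimeWindow`, the integer timestamps, and the window size are all hypothetical.

```python
from collections import deque


class TimeWindow:
    """Hypothetical sliding time window: keeps only tuples newer than t - size."""

    def __init__(self, size):
        self.size = size
        self.buffer = deque()  # (timestamp, rating_tuple), oldest first

    def push(self, t, rating_tuple):
        self.buffer.append((t, rating_tuple))
        # Evict everything outside the half-open interval (t - size, t].
        while self.buffer and self.buffer[0][0] <= t - self.size:
            self.buffer.popleft()
        return [r for _, r in self.buffer]


# Toy stream of (user, item, rating) events, one per time unit.
events = [("u1", "i1", 5), ("u2", "i1", 3), ("u1", "i2", 4), ("u3", "i3", 1)]
window = TimeWindow(size=3)
for t, event in enumerate(events):
    valid = window.push(t, event)

print(valid)  # the tuples still held in memory after the last event
```

Memory stays bounded by the number of events per window, no matter how long the stream runs.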
  • 25. 25 RecSys Operators ● “train recsys model” Operator – Implements a learner to train a model. – Holds valid rating data. – Outputs models with a validity time interval. ● “get unrated items” Operator – Receives requests for recommendations. – Outputs one tuple as a recommendation candidate for each item that the requesting user has not rated. ● “predict rating” Operator – Predicts the rating for each recommendation candidate. ● “recommend” Operator – Selects the recommendations for the requesting user. – Selects items with a min. rating and/or top-K items.
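
The chain of the three request-side operators on slide 25 can be sketched as plain Python generators. The function names mirror the operator names, but the signatures, the toy score table, and the `model` callable are assumptions made for this example; they are not the Odysseus operator API.

```python
import heapq


def get_unrated_items(user, all_items, rated):
    """Emit one recommendation candidate per item the user has not rated."""
    seen = rated.get(user, set())
    for item in all_items:
        if item not in seen:
            yield (user, item)


def predict_rating(candidates, model):
    """Attach a predicted rating to every candidate; `model` is any callable."""
    for user, item in candidates:
        yield (user, item, model(user, item))


def recommend(predictions, k, min_rating=None):
    """Keep items at or above min_rating (if given), then select the top-k."""
    kept = [p for p in predictions if min_rating is None or p[2] >= min_rating]
    return heapq.nlargest(k, kept, key=lambda p: p[2])


# Hypothetical stand-in for a trained model: a fixed score table.
toy_scores = {("u1", "i2"): 4.5, ("u1", "i3"): 2.0, ("u1", "i4"): 3.8}
model = lambda user, item: toy_scores[(user, item)]

rated = {"u1": {"i1"}}
items = ["i1", "i2", "i3", "i4"]
candidates = get_unrated_items("u1", items, rated)
top = recommend(predict_rating(candidates, model), k=2, min_rating=3.0)
print(top)
```

Keeping each step a separate function mirrors the modular operator structure: the minimum-rating filter and the top-K selection can be swapped or combined without touching the prediction step.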
  • 26. 26 Operators for Continuous Evaluation [Figure: query plan extended for evaluation. Operators: window, route, train recsys model, get unrated items, predict rating (twice), recommend, test prediction. Edge labels: learning data, test data, prediction error, feedback.]
  • 27. 27 Operators for Continuous Evaluation ● “route” Operator – Distributes incoming data as learning or test data. – Implements an evaluation methodology. – e. g., Hold out: route 10 % as test data ● “predict rating” Operator – Predicts the rating for each test tuple. ● “test prediction” Operator – Calculates an error value from the true and the predicted rating. – e. g., Root Mean Square Error (RMSE)
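
A hold-out “route” step and an RMSE computation can be sketched as follows. The deterministic rule (every tenth tuple becomes test data) is an assumption chosen to keep the example reproducible; the slides only state that 10 % is routed as test data, not how the tuples are picked.

```python
import math


def route(ratings, test_every=10):
    """Hold-out sketch: every `test_every`-th tuple is test data, the rest learning data."""
    learning, test = [], []
    for n, rating in enumerate(ratings, start=1):
        (test if n % test_every == 0 else learning).append(rating)
    return learning, test


def rmse(pairs):
    """Root mean square error over (true_rating, predicted_rating) pairs."""
    pairs = list(pairs)
    return math.sqrt(sum((true - pred) ** 2 for true, pred in pairs) / len(pairs))


# 20 toy rating tuples (user, item, rating).
ratings = [("u%d" % n, "i1", 1 + n % 5) for n in range(1, 21)]
learning, test = route(ratings)
print(len(learning), len(test))

# Two toy (true, predicted) pairs: squared errors 1 and 4, mean 2.5.
error = rmse([(4, 3), (2, 4)])
print(error)
```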
  • 29. 29 Physical Query Plan [Figure: physical query plan, first part. Inputs: requests for recommendations (RfR), rating data. Annotations: metadata creation (time interval); routes learning and test data; joins RfR with temporally matching models; joins test data with temporally matching models; limits validity of learning data; implements learner; “now” window.]
  • 30. 30 Physical Query Plan [Figure: physical query plan, second part. Annotations: adds predicted rating to test tuple; map operator; aggregation size; aggregation operator; map operator; outputs unrated items; joins unrated items and models; adds predicted rating to unrated item; selects items with min rating; selects top-K items.]
  • 31. 31 Physical Operators ● “train recsys model” Operator – BRISMF (incremental matrix factorization) – BRISMF implementation of Massive Online Analysis (MOA) integrated in Odysseus http://moa.cms.waikato.ac.nz/details/recommender-systems/ – One model for each subset of valid learning tuples. – Exactly one valid model at every point in time. – Needs to hold all learning tuples in memory but does not build models from scratch.
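
To make "does not build models from scratch" concrete, here is a sketch of an incremental matrix factorization learner that adjusts only the affected user and item vectors for each new rating. It is a plain regularized stochastic gradient step, not BRISMF itself and not the MOA implementation; the class name and all hyperparameters are assumptions.

```python
import random


class IncrementalMF:
    """Hypothetical incremental learner: one SGD step per incoming rating."""

    def __init__(self, n_factors=8, lr=0.05, reg=0.02, seed=0):
        self.k, self.lr, self.reg = n_factors, lr, reg
        self.rng = random.Random(seed)
        self.P, self.Q = {}, {}  # user factors, item factors

    def _vec(self, table, key):
        # Unknown users/items get a small random factor vector on first sight.
        if key not in table:
            table[key] = [self.rng.uniform(-0.1, 0.1) for _ in range(self.k)]
        return table[key]

    def predict(self, user, item):
        p, q = self._vec(self.P, user), self._vec(self.Q, item)
        return sum(a * b for a, b in zip(p, q))

    def update(self, user, item, rating):
        """Update the model with one rating; returns the error before the step."""
        p, q = self._vec(self.P, user), self._vec(self.Q, item)
        err = rating - sum(a * b for a, b in zip(p, q))
        for f in range(self.k):
            pf, qf = p[f], q[f]  # use the old values for both updates
            p[f] += self.lr * (err * qf - self.reg * pf)
            q[f] += self.lr * (err * pf - self.reg * qf)
        return err


mf = IncrementalMF()
first_error = mf.update("u1", "i1", 4.0)
for _ in range(200):
    last_error = mf.update("u1", "i1", 4.0)
print(abs(first_error), abs(last_error))
```

Each update touches only two factor vectors, so its cost does not depend on how many ratings have already been seen.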
  • 32. 32 Physical Operators ● “interleaved test-then-train” Operator – Physical counterpart to the “route” operator – Uses rating tuples for both learning and testing – Sets the validity interval of a test tuple to [t-1, t) to ensure that it is matched to a model that has not used this tuple for learning ● “test prediction” Operator is implemented by a – map, → calculates squared error – time window, → sets aggregation time span – aggregation, → aggregates errors (avg) – and another map operator → calculates root of error
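
The interleaved evaluation order, together with the map / aggregation / map steps of “test prediction”, can be sketched sequentially. The `MeanModel` learner is a deliberately trivial stand-in, and the aggregation here runs over the whole stream (an unbounded window); both are assumptions for the example. In the DSMS the ordering is enforced by validity intervals and a temporal join rather than by a loop.

```python
import math


class MeanModel:
    """Stand-in learner: predicts the mean of all ratings seen so far."""

    def __init__(self):
        self.total, self.count = 0.0, 0

    def predict(self, user, item):
        return self.total / self.count if self.count else 0.0

    def update(self, user, item, rating):
        self.total += rating
        self.count += 1


def interleaved_test_then_train(stream, model):
    """Yield the running RMSE after every rating tuple."""
    squared_errors = []
    for user, item, rating in stream:
        # Test first: the model has not yet learned from this tuple.
        predicted = model.predict(user, item)
        squared_errors.append((rating - predicted) ** 2)  # map: squared error
        # Then train on the same tuple.
        model.update(user, item, rating)
        mean_error = sum(squared_errors) / len(squared_errors)  # aggregation (avg)
        yield math.sqrt(mean_error)  # map: root of error


stream = [("u1", "i1", 4), ("u2", "i1", 2), ("u1", "i2", 3)]
rmse_series = list(interleaved_test_then_train(stream, MeanModel()))
print(rmse_series)
```

Every tuple serves as both test and learning data, so no rating is withheld from the learner, and no prediction is ever made by a model that has already seen the tested rating.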
  • 33. 33 Plot of Root Mean Square Error
  • 34. 34 Prototype Evaluation ● Comparison of RMSE after every learning tuple with Massive Online Analysis (MOA) – MovieLens dataset, ordered by timestamp, read line-by-line as rating data – No decay of learning tuples (unbounded window) – Aggregation of RMSE over the whole dataset – (Random users for requests for recommendations) ● Same results as MOA – MOA operates sequentially (first test, then train); we ensure the correct order by the time annotation and the temporal join – Temporal matching works as expected
  • 35. 35 Summary and Future Work ● Generic, extendable and modular structure for a RecSys based on DSMS operators ● Logical operators allow different physical implementations ● Time annotations ensure deterministic temporal matching of models and data ● Prototypical implementation with BRISMF and Interleaved Test-Then-Train ● Future Work: – Implementation of learners that consider temporal aspects – Impact of decay of tuples (different windows) on accuracy, latency, throughput, memory consumption – Optimizations of algorithms, query plan, transformations … Thank you for your attention!