RRecommend Roadtrip – Matrix Factorization
DYLAN VALERIO
Recommendation
Systems using R
WEAR SEATBELTS
75% of watched series come from recommender systems (McKinsey)
35% of revenue comes from recommenders (McKinsey)
135.99 billion in revenue (Statista)
DYLAN VALERIO – NOVEMBER 2020
Dylan Valerio
• Software engineer
• Data Scientist
• Kaggler
• Academic
• ADMU, BS CS
• UP Diliman, MS CS
Review: Different Paradigms of Recommendation
Content-based Filtering
watched → similar tags (crime, Robert De Niro, dark, mob) → recommends
Pros: Readily explainable; fast
Cons: Stale and unchanging
Collaborative Filtering
watched → recommends
Pros: More interesting for users
Cons: Items with no usage (cold-start)
• No free lunch
• It’s a quickly growing field with vast literature and domain-specific nuances
Review: Different Paradigms of Recommendation
Recommenders
• Content-Based Filtering
• Collaborative Filtering
  • Neighborhood-based (Memory-based)
  • Model-based (Vector-space models)
• Hybrid
Review: Memory-Based Models
User-based K-Nearest Neighbor Recommendation
Intuition: Find the items most enjoyed by the users closest to me, where closeness is measured by what we have watched (similarity between users).
Item-based K-Nearest Neighbor Recommendation
Intuition: Find the items closest to the ones I enjoyed, where closeness is measured by the users who enjoyed both (similarity between items).
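A minimal sketch of both intuitions, written in Python as a language-neutral illustration (the talk itself uses R). The toy ratings, the helper names, and the choice of cosine similarity are all assumptions made for illustration; the slides do not say which similarity measure is used.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}; not the talk's dataset.
ratings = {
    "ana":  {"Naruto": 5, "Bleach": 4, "Monster": 1},
    "ben":  {"Naruto": 4, "Bleach": 5},
    "cara": {"Monster": 5, "Death Note": 4, "Naruto": 1},
}

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    shared = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def item_vectors(ratings):
    """Transpose user -> item ratings into item -> user ratings."""
    items = {}
    for user, row in ratings.items():
        for item, r in row.items():
            items.setdefault(item, {})[user] = r
    return items

def nearest(target, vectors, k=1):
    """Return the k keys most similar to `target`, excluding itself."""
    scores = [(cosine(vectors[target], v), name)
              for name, v in vectors.items() if name != target]
    return [name for _, name in sorted(scores, reverse=True)[:k]]

# User-based view: who rates most like "ana"?
user_neighbors = nearest("ana", ratings)
# Item-based view: which show is rated most like "Naruto"?
item_neighbors = nearest("Naruto", item_vectors(ratings))
```

With this toy data, `user_neighbors` is `["ben"]` and `item_neighbors` is `["Bleach"]`. The same `nearest` helper serves both views because item-based filtering is simply the user-based computation on the transposed ratings.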
Anime Recommendations
Ratings Matrix
Conceptual View Data View
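The two views can be related with a small pivot. This Python sketch assumes the "data view" is a long table with one row per (user, item, rating) and the "conceptual view" is a user × item grid with gaps for unrated items; the rows and the `to_matrix` helper are hypothetical.

```python
# Hypothetical "data view": one row per (user, item, rating) observation.
rows = [
    ("u1", "Cowboy Bebop", 9),
    ("u1", "Naruto", 7),
    ("u2", "Naruto", 8),
    ("u3", "Cowboy Bebop", 10),
]

def to_matrix(rows):
    """Pivot long-form ratings into a user x item matrix; None marks a missing rating."""
    users = sorted({u for u, _, _ in rows})
    items = sorted({i for _, i, _ in rows})
    row_of = {user: n for n, user in enumerate(users)}
    col_of = {item: n for n, item in enumerate(items)}
    matrix = [[None] * len(items) for _ in users]
    for user, item, rating in rows:
        matrix[row_of[user]][col_of[item]] = rating
    return users, items, matrix

users, items, matrix = to_matrix(rows)
```

Here `matrix` comes out as `[[9, 7], [None, 8], [10, None]]`: most real ratings matrices are dominated by such missing cells, which is what matrix factorization later tries to fill in.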
Collaborative Filtering – Matrix Factorization
IF YOU CAN’T EXPLAIN IT SIMPLY, YOU DON’T UNDERSTAND IT WELL ENOUGH
Factorization
• 32 and 27 have special properties
• Tons of algorithms for factorization
• P and Q produce R
• In some algorithms, P and Q have special properties
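To make "P and Q produce R" concrete, here is a small Python sketch. It assumes the convention R = P Qᵀ with one row of P per user and one row of Q per item; the numbers and the `reconstruct` helper are hypothetical.

```python
# Hypothetical factors: 3 users and 2 items, each described by k = 2 latent features.
P = [[1.0, 0.0],
     [0.0, 2.0],
     [1.0, 1.0]]   # one row per user
Q = [[3.0, 1.0],
     [2.0, 4.0]]   # one row per item

def reconstruct(P, Q):
    """Return R = P Q^T, where R[u][i] is the dot product of user row u and item row i."""
    return [[sum(pu * qi for pu, qi in zip(p_row, q_row)) for q_row in Q]
            for p_row in P]

R = reconstruct(P, Q)
```

The result is `[[3.0, 2.0], [2.0, 8.0], [4.0, 6.0]]`: a 3 × 2 matrix described by two much thinner factors.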
Intuitions
Imagine all movies laid out in 2 dimensions:
Image sources: https://developers.google.com/machine-learning/crash-course/embeddings/motivation-from-collaborative-filtering; Microsoft blog (source image URL missing)
Now imagine users placed in the same space.
Other Areas of Application
• An Overview of Lead and Accompaniment Separation in Music: https://www.researchgate.net/figure/The-approaches-based-on-a-low-rank-assumption-Non-negative-matrix-factorization-NMF-is_fig2_324487531
• EigenFaces and A Simple Face Detector: https://sandipanweb.wordpress.com/2018/01/06/eigenfaces-and-a-simple-face-detector-with-pca-svd-in-python/
• Low Rank plus Sparse Matrix recovery using Randomized Rank Revealing Decomposition: https://www.youtube.com/watch?v=70DMDJUxyl8
Matrix Factorization For Recommenders
• Latent factors describe the structure of the data beyond the noise
• There are two sets of latent variables, the users and the items (the rows and columns of the ratings matrix, respectively)
• Can “recover” the missing values in the ratings matrix
• Similarity is defined as the dot product of a user vector and an item vector
• This yields a value of R even for content not consumed by user i
• Algorithms for MF: Alternating Least Squares, FunkSVD, Weighted Matrix Factorization
• Minimize a loss function over the observed ratings (formula not preserved in this transcript)
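One common form of such a loss, assumed here rather than taken from the slide, is the regularized squared error over the observed ratings. The symbols are assumptions: $\mathcal{K}$ is the set of observed (user, item) pairs, $r_{ij}$ is the rating of item $j$ by user $i$, $p_i$ and $q_j$ are the corresponding rows of P and Q, and $\lambda$ is a regularization strength.

```latex
\min_{P,\,Q} \sum_{(i,j) \in \mathcal{K}}
  \left[ \left( r_{ij} - p_i^{\top} q_j \right)^2
  + \lambda \left( \lVert p_i \rVert^2 + \lVert q_j \rVert^2 \right) \right]
```

The first term rewards dot products that match known ratings; the second discourages large factor values, which helps the model generalize to the unrated cells.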
Recommenderlab: Recommendation library in R
1. Convert data table to matrix of users in rows and items in columns
2. Input recommender algorithm
https://www.kaggle.com/krsnewwave/r-study-factorization
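The library calls shown on this slide are not part of the transcript, so the following is not recommenderlab code. It is a rough Python sketch of what a FunkSVD-style recommender might do internally, assuming stochastic gradient descent on squared error with L2 regularization. The function names, hyperparameters, and ratings are all hypothetical.

```python
import random

def factorize(ratings, n_users, n_items, k=2, steps=2000, lr=0.01, reg=0.02, seed=0):
    """Fit P (users x k) and Q (items x k) by SGD using only the observed ratings."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(steps):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Step against the gradient of the regularized squared error.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    """Predicted rating: dot product of user u's and item i's latent vectors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

# Hypothetical observed ratings as (user index, item index, rating).
observed = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1),
            (2, 1, 1), (2, 2, 5), (3, 0, 1), (3, 2, 4)]
P, Q = factorize(observed, n_users=4, n_items=3)

train_rmse = (sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in observed)
              / len(observed)) ** 0.5
unseen = predict(P, Q, 1, 1)   # a rating user 1 never gave
```

Only observed cells drive the updates, yet `predict` works for any (user, item) pair, which is how the factorization fills in the missing ratings.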
Evaluating Recommendation Algorithms
WHO’S A GOOD BOY?
Evaluating a Good Recommender
We hold out a fraction of each user's watched items, then compare our predicted recommendations against the items the user actually watched.
Ranking Metrics
• Precision
• Recall
• Area under the ROC curve
Error-Based Metrics
• RMSE, MAE
Other metrics
• Diversity
• Novelty
• Serendipity
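The ranking and error-based metrics can be sketched in a few lines of Python. These are the standard textbook definitions, not the library's implementation; the helper names and example values are hypothetical.

```python
from math import sqrt

def precision_recall_at_n(recommended, relevant, n):
    """Precision and recall of the first n recommendations against held-out items."""
    top = recommended[:n]
    hits = sum(1 for item in top if item in relevant)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def rmse(actual, predicted):
    """Root mean squared error between actual and predicted ratings."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error between actual and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical example values.
recommended = ["A", "B", "C", "D", "E"]
held_out = {"B", "E", "Z"}
p3, r3 = precision_recall_at_n(recommended, held_out, 3)
err_rmse = rmse([4, 2, 5], [3, 2, 3])
err_mae = mae([4, 2, 5], [3, 2, 3])
```

In this example one of the top three recommendations is a held-out item, so precision and recall at 3 are both 1/3, and the MAE is 1.0. RMSE penalizes the single error of 2 more heavily than MAE does.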
Recommenderlab Evaluation
• Create evaluation strategy
• Evaluate top N of the list (1, 3, 5… items)
• Competing algorithms: SVD with 10 traits (k = 10)
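The slide's code is not in the transcript, so this Python sketch only illustrates the general protocol under stated assumptions: hide a random fraction of a user's items, build top-N lists of several lengths from predicted scores, and count how many hidden items each list recovers. The helper names, the 20% fraction, and the scores are hypothetical.

```python
import random

def hold_out(items, fraction=0.2, seed=0):
    """Split one user's items into a kept list and a hidden set used for testing."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_hidden = max(1, int(len(shuffled) * fraction))
    return shuffled[n_hidden:], set(shuffled[:n_hidden])

def top_n(scores, seen, n):
    """Rank items the user has not seen by predicted score and return the best n."""
    candidates = [(score, item) for item, score in scores.items() if item not in seen]
    return [item for score, item in sorted(candidates, reverse=True)[:n]]

# Step 1: hide part of a user's history.
watched = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
kept, hidden = hold_out(watched)

# Step 2: with fixed, hypothetical scores, compare list lengths 1, 3 and 5.
scores = {"a": 0.99, "k": 0.9, "b": 0.8, "m": 0.7, "c": 0.6, "n": 0.5}
seen = {"a"}
hidden_example = {"b", "c"}
lists = {n: top_n(scores, seen, n) for n in (1, 3, 5)}
hits = {n: len(set(items) & hidden_example) for n, items in lists.items()}
```

With these scores `hits` is `{1: 0, 3: 1, 5: 2}`: longer lists recover more hidden items, which is why results are usually reported per list length.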
Evaluation Results
Some manipulation of the resulting table and ggplot magic.
Interpreting the Output Matrices
LOGIC WILL GET YOU FROM A TO B. IMAGINATION WILL TAKE YOU EVERYWHERE.
Interpretation / Visualization Techniques Using factoextra
Idea: Run Principal Components Analysis on the latent factors for Factor Analysis (2-stage decomposition)
Each row of P is a user's feature vector and each row of Q is an item's feature vector
1. Import Libraries for Factor Analysis
2. Get Item Latent Vectors
Principal Components’ Contribution
• The 1st component accounts for 12% of the variance
• The next 8 together account for 80%
• The 10th accounts for around 7.5%
• After the 10th, there are only minor contributions
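Taking the rounded figures on this slide at face value, the cumulative share of variance covered by the first ten components works out as:

```latex
\underbrace{12\%}_{\text{1st}} + \underbrace{80\%}_{\text{2nd to 9th}} + \underbrace{7.5\%}_{\text{10th}} = 99.5\%
```

That leaves roughly 0.5% for everything after the 10th component, consistent with the remark that later components contribute little.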
Visualizing Individual Shows
Similarly rated shows are clustered together
Visualizing Individual Shows and the Original Features
V1 forms a small angle with the first principal component, so the two are closely aligned
itstherealdyl.com
• Kaggle Data: Blood Chemistry
• Kaggle Competition: Melanoma
• A Name Across Time and Space

R-Recommenders Matrix Factorization - RUG PH Meetup


Editor's Notes

  1. RecSys: https://miro.medium.com/max/1636/1*Xt2bNHbyHwtjd8Y9m9msYw.png (deezer: https://deezer.io/deezer-research-recsys-2017-11th-acm-recommender-systems-conference-9b92e1d9d0f3)