Demystifying Recommendation Systems
About Rumman
• Senior Data Scientist and Instructor at Metis
• Practicing Data Scientist
• Find me on twitter @ruchowdh
• Visit my website at rummanchowdhury.com
• Check out my jobs page
• …and my blog
About Metis
• Data Science Bootcamp
• Part of Kaplan
• Accredited by ACCET
• 12-weeks, full-time including 60 hours of
online pre-work
• Evening and weekend training courses
• Third party financing options
• $3,000 scholarship for women,
underrepresented minority groups, and
veterans or members of the U.S. military
Overview
• What is a recommendation engine?
• What are the types of recommendation
systems?
• What are the drawbacks of the most
common recommendation engines and how
do I deal with them?
• How do I fine-tune my model?
What are recommendation
systems?
What are recommendation systems?
Automated systems that seek to suggest whether a
given item (product, event, movie, song, etc.) will be
desirable to a user.
Or, more data science-y: predict what a user’s
review will be for items that they have not
reviewed
Where does a recommendation system lie in the
space of data science and analytics?
• Descriptive
• Averages, percentages, etc.
• Explains events afterward or as they happen
• Predictive
• Uses modeling of past behavior to make
predictions about the future
• Prescriptive
• Informed decisions about what actions
should be taken based on data
How do I pick the best kind of recommender system
for my data?
• What is your existing data?
• How quickly does your inventory change?
• How much information can you get on a
user? (explicit and implicit)
• Does your model need to scale well?
What are the kinds of
recommendation systems?
What are the kinds of recommender systems?
• Search (knowledge-based)
• Pros: items will be close matches to
expressed needs, no cold-start issues
• Cons: Static, manual tagging, will not
work well with very similar inventories
or rapidly changing inventories
• Example: Amazon’s basic search
What are the kinds of recommender systems?
• Content-based
• Items are mapped based on characteristics into
an item-feature space, and recommendations
are based on specified characteristics
• Pros: Easier comparison between items
• Cons: Cold start problem, need good content
descriptions, need item ratings
• Example: Search for ‘ai’ vs ‘AI’, ‘mit’ vs ‘MIT’
What are the kinds of recommender systems?
• Collaborative filtering: based on user and
item similarities
• Pros: can provide less-obvious matches
• Cons: cold-start problem for new users and
new items, requires a feedback rating
Limitations, or, Ask yourself, do you really need a
recommendation engine?
• Recommendation systems have to update immediately.
• You have to have a sufficiently inexpensive
model and have the bandwidth to return results
fast.
• You have more information than you think:
• existing item popularity
• geography based on IP address
• cookies
How does Content-Based recommendation work?
• Users and items are represented by vectors
in a feature space
• Approaches:
• Map users and items to the same
feature space, compute distance
between a user and an item.
Example: Content-Based Recommendation
Features = (big box office, aimed at kids, famous actors)
Items (movies):
Finding Nemo = (5, 5, 2)
Mission Impossible = (3, -5, 5)
Jiro Dreams of Sushi = (-4, -5, -5)
Predicted ratings*:
Finding Nemo: (-3*5 + 2*5 + 2*2) = -1
Mission Impossible: (-3*3 + 2*(-5) + 2*5) = -9
Jiro Dreams of Sushi: (-3*(-4) + 2*(-5) + 2*(-5)) = -8
* Ratings for user with a described
preference of (-3, 2, 2) for these features
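The predicted ratings in this example are dot products of the user's preference vector with each movie's feature vector. A minimal Python sketch, using only the vectors stated on the slide (the function and variable names are illustrative assumptions):

```python
# Feature order: (big box office, aimed at kids, famous actors)
movies = {
    "Finding Nemo": (5, 5, 2),
    "Mission Impossible": (3, -5, 5),
    "Jiro Dreams of Sushi": (-4, -5, -5),
}
user_preference = (-3, 2, 2)


def predicted_rating(user, item):
    """Dot product of a user vector and an item vector in the same feature space."""
    return sum(u * i for u, i in zip(user, item))


scores = {title: predicted_rating(user_preference, features)
          for title, features in movies.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

for title in ranked:
    print(f"{title}: {scores[title]}")
# Finding Nemo: -1
# Jiro Dreams of Sushi: -8
# Mission Impossible: -9
```

A higher score means a closer match between what the user says they like and what the item offers, so the ranked list is the recommendation order for this user.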
How does Content Based Recommendation work?
• Another option is to create features from
user+item pairs and use an algorithm
(classifier?) to predict like/dislike
• Each user/item pair has a labeled outcome,
such as purchased/not purchased. You can
train a model to predict purchase behavior.
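One way to picture the user+item pair approach is a small logistic regression trained on purchased/not purchased labels. The sketch below is an assumption-heavy toy: the users, items, purchase history, and the choice of element-wise products as pair features are all hypothetical, and the classifier is hand-rolled only to keep the example self-contained.

```python
import math

# Hypothetical profiles over two features: (aimed at kids, action-heavy).
users = {"ann": (1, -1), "bob": (-1, 1)}
items = {"cartoon": (1, -1), "thriller": (-1, 1)}
# Hypothetical labeled outcomes: (user, item, purchased?)
purchases = [("ann", "cartoon", 1), ("ann", "thriller", 0),
             ("bob", "cartoon", 0), ("bob", "thriller", 1)]


def pair_features(user, item):
    """Assumed encoding: one feature per dimension, user value times item value."""
    return [u * i for u, i in zip(user, item)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, epochs=500, lr=0.1):
    """Plain stochastic gradient ascent on the logistic log-likelihood."""
    weights, bias = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            error = y - sigmoid(bias + sum(w * v for w, v in zip(weights, x)))
            weights = [w + lr * error * v for w, v in zip(weights, x)]
            bias += lr * error
    return weights, bias


def purchase_probability(weights, bias, user, item):
    x = pair_features(user, item)
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


rows = [pair_features(users[u], items[i]) for u, i, _ in purchases]
labels = [y for _, _, y in purchases]
weights, bias = train_logistic(rows, labels)

# A new user with no purchase history but a known profile.
new_user = (1, -1)
p_cartoon = purchase_probability(weights, bias, new_user, items["cartoon"])
p_thriller = purchase_probability(weights, bias, new_user, items["thriller"])
print(p_cartoon > 0.5, p_thriller < 0.5)
```

Because the model scores a pair from its features rather than from past ratings, it can score a user or item it has never seen, as long as the features are available.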
How does Collaborative Filtering work?
• Collaborative filtering refers to a family of
methods for predicting ratings where instead of
thinking about users and items in terms of a
feature space, we are only interested in the
existing user-item ratings themselves.
• In this case, our dataset is a ratings matrix whose
columns correspond to items, and whose rows
correspond to users.
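As a concrete picture of that ratings matrix, the sketch below builds one from (user, item, rating) triples. The names and ratings are hypothetical, and None marks a cell the user has not rated.

```python
# Hypothetical explicit ratings as (user, item, rating) triples.
triples = [
    ("ann", "Finding Nemo", 5),
    ("ann", "Mission Impossible", 2),
    ("bob", "Mission Impossible", 4),
    ("cy", "Jiro Dreams of Sushi", 5),
]

users = sorted({user for user, _, _ in triples})   # row labels
items = sorted({item for _, item, _ in triples})   # column labels

# Rows correspond to users, columns to items; None means "not rated".
matrix = [[None] * len(items) for _ in users]
for user, item, rating in triples:
    matrix[users.index(user)][items.index(item)] = rating

for user, row in zip(users, matrix):
    print(user, row)
```

Most cells stay empty even in this tiny example, which is the normal state of real ratings data.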
Example: Netflix movie recommendations
How does collaborative filtering work?
• Method 1: Item-based CF, a.k.a. neighborhood
methods or memory-based CF
• Ratings data are used to create an item-item
similarity matrix.
• Recommendations are made based on the items
most similar to those a user has already rated
highly.
• This method does not scale well.
• Why? You need a fully populated matrix of
item-item similarity. This doesn’t work well
if you have a lot of items or if your items
change a lot.
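A minimal sketch of item-based CF under stated assumptions: the ratings are hypothetical, similarity is cosine similarity over users who rated both items, and a prediction is the similarity-weighted average of the user's existing ratings. Other similarity measures and weighting schemes are equally plausible.

```python
import math

# Hypothetical ratings: user -> {item: rating}. "dee" has not rated item "B".
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 4, "B": 5, "C": 2},
    "cy": {"A": 1, "B": 2, "C": 5},
    "dee": {"A": 5, "C": 1},
}
items = sorted({item for rated in ratings.values() for item in rated})


def item_similarity(a, b):
    """Cosine similarity between two items over the users who rated both."""
    pairs = [(r[a], r[b]) for r in ratings.values() if a in r and b in r]
    if not pairs:
        return 0.0
    dot = sum(x * y for x, y in pairs)
    norm_a = math.sqrt(sum(x * x for x, _ in pairs))
    norm_b = math.sqrt(sum(y * y for _, y in pairs))
    return dot / (norm_a * norm_b)


# The full item-item matrix has len(items) ** 2 entries, which is the
# scaling concern raised above.
similarity = {a: {b: item_similarity(a, b) for b in items} for a in items}


def predict(user, target):
    """Similarity-weighted average of the ratings the user has already given."""
    rated = ratings[user]
    weights = {item: similarity[target][item] for item in rated if item != target}
    total = sum(weights.values())
    if total == 0:
        return None
    return sum(weights[item] * rated[item] for item in weights) / total


prediction = predict("dee", "B")
print(round(prediction, 2))
```

Here item "B" is rated more like "A" than like "C", so dee's high rating of "A" pulls the prediction for "B" above the midpoint of dee's two ratings.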
How does CF work?
• Method 2: Model-based CF uses matrix
decomposition via singular value
decomposition (SVD) to reduce
dimensionality and extract latent variables.
• We express users and items in terms of
these variables.
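To make "latent variables" concrete, the sketch below extracts only the single strongest latent variable of a small ratings matrix, using power iteration to find the leading singular value and vectors. It assumes a hypothetical, fully observed matrix; real ratings data has missing cells that must be handled first, and in practice a linear-algebra library's SVD routine would likely replace this hand-written loop.

```python
import math


def top_singular_triplet(matrix, iterations=200):
    """Leading singular value and vectors via power iteration on A^T A."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iterations):
        av = [sum(matrix[r][c] * v[c] for c in range(n_cols)) for r in range(n_rows)]
        w = [sum(matrix[r][c] * av[r] for r in range(n_rows)) for c in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    av = [sum(matrix[r][c] * v[c] for c in range(n_cols)) for r in range(n_rows)]
    sigma = math.sqrt(sum(x * x for x in av))
    u = [x / sigma for x in av]
    return u, sigma, v


# Hypothetical, fully observed ratings: rows are users, columns are items.
ratings = [
    [5, 4, 1],
    [4, 5, 1],
    [1, 1, 5],
]
u, sigma, v = top_singular_triplet(ratings)

# Each user and each item is now described by one number (one latent variable).
user_factors = [math.sqrt(sigma) * x for x in u]
item_factors = [math.sqrt(sigma) * x for x in v]

# Rank-1 approximation of the ratings matrix from those factors.
approximation = [[uf * itf for itf in item_factors] for uf in user_factors]
```

Keeping more singular vectors gives each user and item a longer latent vector and a closer approximation of the original matrix.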
Why is model-based CF preferred?
• Scalable, flexible, accurate, domain
independent, and requires no explicit
information.
What are the drawbacks, and
how can I address them?
Let’s discuss the drawbacks
• Cold-start problem!
• Data is typically very sparse
• Need granularity in your data
Drawback: Cold Start problem
• Build an initial profile based on implicit
data, evolve based on explicit feedback as it
comes.
• Sometimes called a ‘hybrid’ filtering
method, you can use content-based
information to ease cold-start and data
sparsity problems.
Drawback: Sparsity of Data
• In the famous Netflix Prize dataset, ~99% of
possible ratings were missing.
• Data is skewed and sparse
• or, most people don’t rate a lot and
most items aren’t rated
• those that are rated are often rated
constantly
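Sparsity and skew are both easy to measure before choosing a model. A small sketch on hypothetical data, where sparsity is the share of user-item cells with no rating:

```python
from collections import Counter

# Hypothetical catalog and ratings; item "D" has never been rated.
catalog = ["A", "B", "C", "D"]
ratings = {
    "ann": {"A": 5, "B": 3},
    "bob": {"A": 4},
    "cy": {"A": 2, "C": 5},
}

observed = sum(len(rated) for rated in ratings.values())
possible = len(ratings) * len(catalog)
sparsity = 1 - observed / possible

# Skew: how many ratings each item received.
ratings_per_item = Counter(item for rated in ratings.values() for item in rated)

print(f"sparsity: {sparsity:.1%}")
for item in catalog:
    print(item, ratings_per_item[item])
```

In this toy data one item collects most of the ratings while another has none, the same pattern described above at a much smaller scale.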
Drawback: Granularity of data
• Traditional model-based CF works well for
non-binary data (e.g., a 5-star rating). It doesn’t
work well for binary data (e.g., click/not click,
purchased/did not purchase)
• You will need to tweak your
measurements of item similarities
Quick overview of measurement
• Non-binary rating:
• Pearson correlation coefficient
• Euclidean distance
• Manhattan distance
• Binary ratings:
• Jaccard similarity
• Cosine similarity
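The five measures listed above can each be written in a few lines. The sketch below uses hypothetical star ratings and click sets; the function names are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def jaccard(a, b):
    """Size of the intersection over size of the union of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(x, y):
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return sum(a * b for a, b in zip(x, y)) / norm if norm else 0.0


# Non-binary: two users' star ratings for the same three items.
ann_stars, bob_stars = [1, 2, 3], [3, 4, 5]
print(pearson(ann_stars, bob_stars))    # same taste, shifted by two stars
print(euclidean(ann_stars, bob_stars))
print(manhattan(ann_stars, bob_stars))

# Binary: which items each user clicked, as sets and as 0/1 vectors.
ann_clicks, bob_clicks = {"A", "B", "C"}, {"B", "C", "D"}
print(jaccard(ann_clicks, bob_clicks))
print(cosine([1, 1, 1, 0], [0, 1, 1, 1]))
```

Note how Pearson treats the two star-rating users as perfectly similar despite the two-star offset, while the distance measures do not.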
How do I refine my
model?
Normalization
• Some items are rated significantly higher
(e.g., blockbuster movies, Oscar winners)
• Some users rate lower (or higher) than
the norm
• Ratings can change over time
Normalization
• Need to offset per user
• Need to offset per item
• Ex: Mean rating across all users for item x is
some value. How does it differ from the mean
rating across all items? How does my rating
differ from the mean rating of that item?
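One common way to apply these offsets is a baseline of the form global mean + user offset + item offset, with the model then working on what is left over. The sketch below assumes that form and uses hypothetical ratings; it is one possible normalization, not necessarily the one intended on the slide.

```python
from collections import defaultdict

# Hypothetical (user, item, rating) triples.
triples = [("ann", "A", 5), ("ann", "B", 3),
           ("bob", "A", 4), ("bob", "B", 2),
           ("cy", "A", 3)]

global_mean = sum(rating for _, _, rating in triples) / len(triples)

by_user, by_item = defaultdict(list), defaultdict(list)
for user, item, rating in triples:
    by_user[user].append(rating)
    by_item[item].append(rating)

# Offset = how far this user's (or item's) mean sits from the global mean.
user_offset = {u: sum(r) / len(r) - global_mean for u, r in by_user.items()}
item_offset = {i: sum(r) / len(r) - global_mean for i, r in by_item.items()}


def baseline(user, item):
    """Expected rating before any personalization."""
    return global_mean + user_offset[user] + item_offset[item]


# Residuals are what a CF model would be trained on after normalization.
residuals = {(u, i): r - baseline(u, i) for u, i, r in triples}

print(round(baseline("cy", "B"), 2))          # cy has not rated B
print(round(residuals[("ann", "A")], 2))
```

The residual answers the last question on the slide: how a particular rating differs once the item's and the user's general tendencies are removed.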
Capturing data trends
• Rating distributions:
• ratings aren’t random, they follow a
distribution - model this distribution
• Feature importance: You can regress on your
feature vectors to get an understanding of what
values impact ratings
• Feature generation: Characterize your users and
create one-hot features (this can save a lot of time,
and help with cold-start problems)
Temporal factors
• There can be an upward trend of ratings
over time
• Seasonal shifts due to holidays, awards, etc
• Anchoring (i.e., an item is rated relative to a
previous iteration or version of that item)
Conclusions
• Think about your data, your capabilities,
and your needs prior to creating a
recommendation system
• Consider the pros and cons of each type
• Refine your model thoughtfully
Questions?
www.rummanchowdhury.com
@ruchowdh
