Demystifying Recommendation Systems
About Rumman
• Senior Data Scientist and Instructor at Metis
• Practicing Data Scientist
• Find me on Twitter @ruchowdh
• Visit my website at rummanchowdhury.com
• Check out my jobs page
• …and my blog
About Metis
• Data Science Bootcamp
• Part of Kaplan
• Accredited by ACCET
• 12 weeks, full-time, including 60 hours of
online pre-work
• Evening and weekend training courses
• Third party financing options
• $3,000 scholarship for women,
underrepresented minority groups, and
veterans or members of the U.S. military
Overview
• What is a recommendation engine?
• What are the types of recommendation
systems?
• What are the drawbacks of the most
common recommendation engines and how
do I deal with them?
• How do I fine-tune my model?
What are recommendation
systems?
What are recommendation systems?
Automated systems that seek to suggest whether a
given item (product, event, movie, song, etc) will be
desirable to a user.
Or, more data science-y: predict what a user’s
review will be for items that they have not
reviewed
Where does a recommendation system lie in the
space of data science and analytics?
• Descriptive
• Averages, percentages, etc.
• Explains what happened, after or during an event
• Predictive
• Uses modeling of past behavior to make
predictions about the future
• Prescriptive
• Informed decision of how actions
should be taken based on data
How do I pick the best kind of recommender system
for my data?
• What is your existing data?
• How quickly does your inventory change?
• How much information can you get on a
user? (explicit and implicit)
• Does your model need to scale well?
What are the kinds of
recommendation systems?
What are the kinds of recommender systems?
• Search (knowledge-based)
• Pros: items will be close matches to
expressed needs, no cold-start issues
• Cons: Static, manual tagging, will not
work well with very similar inventories
or rapidly changing inventories
• Example: Amazon’s basic search
What are the kinds of recommender systems?
• Content-based
• Items are mapped based on characteristics into
an item-feature space, and recommendations
are based on specified characteristics
• Pros: Easier comparison between items
• Cons: Cold start problem, need good content
descriptions, need item ratings
• Example: Search for ‘ai’ vs ‘AI’,
‘mit’ vs ‘MIT’
What are the kinds of recommender systems?
• Collaborative filtering: based on user and
item similarities
• Pros: can provide less-obvious matches
• Cons: cold-start problem for new users and
new items, requires a feedback rating
Limitations, or, Ask yourself, do you really need a
recommendation engine?
• Recommendation systems have to update immediately.
• You have to have a sufficiently inexpensive
model and have the bandwidth to return results
fast.
• You have more information than you think:
• existing item popularity
• geography based on IP address
• cookies
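A minimal sketch of the "existing item popularity" idea: with nothing more than an interaction log, you can rank items by how often they appear. The event data and the function name `most_popular` are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical interaction log of (user, item) pairs
events = [
    ("u1", "A"), ("u2", "A"), ("u3", "B"),
    ("u1", "C"), ("u4", "A"), ("u2", "B"),
]

def most_popular(events, top_n=2):
    """Return the top_n items ranked by number of interactions."""
    counts = Counter(item for _, item in events)
    return [item for item, _ in counts.most_common(top_n)]

print(most_popular(events))
```

A list like this needs no user history at all, so it can serve as a default while a fuller model is being built.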
How does Content-Based recommendation work?
• Users and items are represented by vectors
in a feature space
• Approaches:
• Map users and items to the same
feature space, compute distance
between a user and an item.
Example: Content-Based Recommendation
Features = (big box office, aimed at kids, famous actors)
Items (movies):



Finding Nemo = (5, 5, 2)
Mission Impossible = (3, -5, 5)
Jiro Dreams of Sushi = (-4, -5, -5)
Predicted ratings*:
Finding Nemo: (-3*5 + 2*5 + 2*2) = -1
Mission Impossible: (-3*3 + 2*(-5) + 2*5) = -9
Jiro Dreams of Sushi: (-3*(-4) + 2*(-5) + 2*(-5)) = -8
* Ratings for a user with a described
preference of (-3, 2, 2) for these features
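A minimal sketch of this scoring, assuming the predicted rating is the plain dot product of the user preference vector and each movie vector listed above. The function and variable names are illustrative, not part of the original slides.

```python
# Feature order: (big box office, aimed at kids, famous actors)
movies = {
    "Finding Nemo": (5, 5, 2),
    "Mission Impossible": (3, -5, 5),
    "Jiro Dreams of Sushi": (-4, -5, -5),
}
user_preference = (-3, 2, 2)

def predicted_rating(user, item):
    """Dot product of a user vector and an item vector in the same feature space."""
    return sum(u * i for u, i in zip(user, item))

# Score every movie for this user, then rank from best to worst match
scores = {title: predicted_rating(user_preference, vec) for title, vec in movies.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

for title in ranked:
    print(title, scores[title])
```

Other distance or similarity measures could be swapped in for the dot product without changing the overall approach.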
How does Content Based Recommendation work?
• Another option is to create features from
user+item pairs and use an algorithm
(classifier?) to predict like/dislike
•Each user/item pair has a labeled outcome,
such as purchased/not purchased. You can
train a model to predict purchase behavior.
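One way to sketch the user+item pair approach is a small logistic regression trained by gradient descent. Everything here is an assumption made for illustration: the three features, the six training rows, and the choice of logistic regression as the classifier.

```python
import math

# Hypothetical training data: one row per user/item pair.
# Assumed features: [user likes the genre, item is on sale, user viewed item before]
# Label: 1 = purchased, 0 = not purchased
pairs = [
    ([1, 1, 1], 1),
    ([1, 0, 1], 1),
    ([1, 1, 0], 1),
    ([0, 1, 0], 0),
    ([0, 0, 1], 0),
    ([0, 0, 0], 0),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(data, epochs=2000, lr=0.1):
    """Fit weights and a bias with plain stochastic gradient descent."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            error = p - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

def purchase_probability(model, x):
    weights, bias = model
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

model = train_logistic(pairs)
print(purchase_probability(model, [1, 1, 1]))
print(purchase_probability(model, [0, 0, 0]))
```

In practice you would likely reach for a library classifier and far more data; the point is only that each user/item pair becomes one labeled training row.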
How does Collaborative Filtering work?
• Collaborative filtering refers to a family of
methods for predicting ratings where instead of
thinking about users and items in terms of a
feature space, we are only interested in the
existing user-item ratings themselves.

•In this case, our dataset is a ratings matrix whose
columns correspond to items, and whose rows
correspond to users.
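As a small illustration of that layout, the sketch below builds a ratings matrix from (user, item, rating) triples. The users and ratings are hypothetical, and `None` is used here to mark a missing rating.

```python
# Hypothetical (user, item, rating) triples
ratings = [
    ("ana", "Finding Nemo", 5),
    ("ana", "Mission Impossible", 2),
    ("ben", "Mission Impossible", 4),
    ("cy", "Jiro Dreams of Sushi", 5),
]

users = sorted({u for u, _, _ in ratings})
items = sorted({i for _, i, _ in ratings})

# Rows are users, columns are items; None marks a rating we do not have
matrix = [[None] * len(items) for _ in users]
for user, item, rating in ratings:
    matrix[users.index(user)][items.index(item)] = rating

for user, row in zip(users, matrix):
    print(user, row)
```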
Example: Netflix movie recommendations
How does collaborative filtering work?
• Method 1: Item-based CF, a.k.a. neighborhood
methods or memory-based CF
• Ratings data are used to create an item-item
similarity matrix.
• Recommendations are made based on the items
most similar to those a user has already rated
highly.
•This method does not scale well.
• Why? You need a fully populated matrix of
item-item similarity. This doesn’t work well
if you have a lot of items or if your items
change a lot.
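The item-based method can be sketched as follows, assuming cosine similarity between item rating vectors and a simple "similarity times rating" score. The ratings, the `liked_threshold` cutoff, and all function names are hypothetical.

```python
import math

# Hypothetical ratings: user -> {item: rating}
ratings = {
    "ana": {"A": 5, "B": 1, "C": 5},
    "ben": {"A": 5, "B": 1},
    "cy": {"A": 1, "B": 5, "D": 5},
    "dee": {"B": 4, "D": 5},
}

def item_vectors(ratings):
    """Invert the data to item -> {user: rating}."""
    vectors = {}
    for user, user_ratings in ratings.items():
        for item, r in user_ratings.items():
            vectors.setdefault(item, {})[user] = r
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse rating vectors."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def similarity_matrix(ratings):
    """Every item compared with every other item: this is the costly step."""
    vectors = item_vectors(ratings)
    return {i: {j: cosine(vi, vj) for j, vj in vectors.items() if j != i}
            for i, vi in vectors.items()}

def recommend(user, ratings, sims, liked_threshold=4, top_n=2):
    """Score unseen items by similarity to items the user rated highly."""
    seen = ratings[user]
    scores = {}
    for item, r in seen.items():
        if r < liked_threshold:
            continue
        for other, s in sims[item].items():
            if other not in seen:
                scores[other] = scores.get(other, 0.0) + s * r
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

sims = similarity_matrix(ratings)
print(recommend("ben", ratings, sims))
```

The nested comparison in `similarity_matrix` grows with the square of the number of items, which is the scaling problem described above.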
How does CF work?
• Method 2: Model-based CF uses matrix
decomposition via singular value
decomposition (SVD) to reduce
dimensionality and extract latent variables.
• We express users and items in terms of
these variables.
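A rough sketch of the latent-variable idea. Note the swap: a classical SVD is defined for a fully filled matrix, so this example uses a related technique instead, factorizing the ratings matrix by stochastic gradient descent over the observed ratings only. The matrix, the number of latent variables `k`, and the learning settings are all assumptions for illustration.

```python
import random

# Hypothetical ratings matrix: rows are users, columns are items, 0 = not rated
R = [
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
]

def factorize(R, k=2, epochs=3000, lr=0.01, reg=0.02, seed=0):
    """Learn user factors P and item factors Q so that P[u] . Q[i] approximates R[u][i]."""
    rng = random.Random(seed)
    n_users, n_items = len(R), len(R[0])
    P = [[rng.random() for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.random() for _ in range(k)] for _ in range(n_items)]
    observed = [(u, i) for u in range(n_users) for i in range(n_items) if R[u][i] > 0]
    for _ in range(epochs):
        for u, i in observed:
            err = R[u][i] - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    """Predicted rating: dot product of the user's and the item's latent vectors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

P, Q = factorize(R)
print(predict(P, Q, 0, 2))  # a rating user 0 has not given yet
```

Each row of `P` expresses a user, and each row of `Q` an item, in terms of the same `k` latent variables.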
Why is model-based CF preferred?
• Scalable, flexible, accurate, domain
independent, and requires no explicit
information.
What are the drawbacks, and
how can I address them?
Let’s discuss the drawbacks
• Cold-start problem!
• Data is typically very sparse
•Need granularity in your data
Drawback: Cold Start problem
• Build an initial profile based on implicit
data, evolve based on explicit feedback as it
comes.
• Sometimes called a ‘hybrid’ filtering
method, you can use content-based
information to ease cold-start and data
sparsity problems.
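One possible shape for such a hybrid, sketched under stated assumptions: lean on a content-based score while a user has few ratings, and shift toward the collaborative-filtering score as feedback accumulates. The linear ramp and the `full_trust_at` cutoff are illustrative choices, not something the slides prescribe.

```python
def hybrid_score(content_score, cf_score, n_ratings, full_trust_at=20):
    """Blend a content-based score with a collaborative-filtering score.

    n_ratings is how many ratings the user has given so far. With no
    ratings (or no CF score available) only the content score is used.
    """
    if cf_score is None or n_ratings <= 0:
        return content_score
    weight = min(n_ratings / full_trust_at, 1.0)  # share given to the CF score
    return (1 - weight) * content_score + weight * cf_score

print(hybrid_score(3.0, None, 0))   # brand-new user
print(hybrid_score(3.0, 5.0, 10))   # some feedback collected
print(hybrid_score(3.0, 5.0, 40))   # plenty of feedback
```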
Drawback: Sparsity of Data
• Famous Netflix prize dataset, ~ 99% of
possible ratings were missing.
• Data is skewed and sparse
• or, most people don’t rate a lot and
most items aren’t rated
• items that are rated are often rated
constantly
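Sparsity is easy to measure for your own data. The sketch below computes the fraction of the user-item matrix with no rating; the counts passed in are hypothetical and are not the Netflix figures.

```python
def sparsity(n_users, n_items, n_ratings):
    """Fraction of the user-item matrix that has no rating."""
    return 1.0 - n_ratings / (n_users * n_items)

# Hypothetical counts, chosen only to illustrate the calculation
print(f"{sparsity(1000, 500, 5000):.1%} of possible ratings are missing")
```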
Drawback: Granularity of data
• Traditional model-based CF works well for
non-binary data (e.g., a 5-star rating). It doesn’t
work well for binary data (e.g., click/no click,
purchased/did not purchase)
• You will need to tweak your
measurements of item similarities
Quick overview of measurement
• Non-binary rating:
• Pearson correlation coefficient
• Euclidean distance
• Manhattan distance
• Binary ratings:
• Jaccard similarity
• Cosine similarity
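The measures listed above can each be written in a few lines. This is a plain-Python sketch for equal-length rating vectors (and, for Jaccard, sets of items); the function names are illustrative.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length rating vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def jaccard(a, b):
    """Overlap between two sets, e.g. the items each user clicked or bought."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

print(pearson([1, 2, 3], [2, 4, 6]))
print(jaccard({"a", "b"}, {"b", "c"}))
print(cosine([1, 0, 1], [1, 1, 0]))
```

Pearson and cosine are similarities (higher means closer), while Euclidean and Manhattan are distances (lower means closer), so they are not interchangeable without flipping the sort order.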
How do I refine my
model?
Normalization
• Some items are significantly higher rated
(ie, blockbuster movies, Oscar winners)
• Some users rate lower (or higher) than
the norm
• Ratings can change over time
Normalization
• Need to offset per user
• Need to offset per item
•Ex: Mean rating across all users for item x is
some value. How does it differ from the mean
rating across all items? How does my rating
differ from the mean rating of that item?
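A minimal sketch of these offsets, assuming the simplest possible definition: each item's and each user's offset is its own mean rating minus the global mean. The ratings are hypothetical, and more careful baselines estimate the two offsets jointly.

```python
# Hypothetical ratings keyed by (user, item)
ratings = {
    ("ana", "A"): 5, ("ana", "B"): 3,
    ("ben", "A"): 4, ("ben", "B"): 2,
    ("cy", "A"): 3,
}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

global_mean = mean(ratings.values())

# How far each item's mean rating sits from the global mean
item_offset = {
    item: mean(r for (_, i), r in ratings.items() if i == item) - global_mean
    for item in {i for _, i in ratings}
}

# How far each user's mean rating sits from the global mean
user_offset = {
    user: mean(r for (u, _), r in ratings.items() if u == user) - global_mean
    for user in {u for u, _ in ratings}
}

def normalized(user, item):
    """Rating with the global mean, item offset and user offset removed."""
    baseline = global_mean + item_offset[item] + user_offset[user]
    return ratings[(user, item)] - baseline

print(global_mean, item_offset["A"], user_offset["ana"], normalized("ana", "A"))
```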
Capturing data trends
• Rating distributions:
• ratings aren’t random, they follow a
distribution - model this distribution
• Feature importance: You can regress on your
feature vectors to get an understanding of what
values impact ratings
• Feature generation: Characterize your users and
create one-hot features (this can save a lot of time,
and help with cold-start problems)
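For the feature-generation point, a one-hot encoding can be sketched like this. The user attributes (age band and device) and their categories are hypothetical examples.

```python
def one_hot(value, categories):
    """Return a 0/1 list with a 1 in the position of the matching category."""
    return [1 if value == c else 0 for c in categories]

# Hypothetical user attributes and their allowed values
AGE_BANDS = ["18-24", "25-34", "35+"]
DEVICES = ["mobile", "desktop"]

def user_features(age_band, device):
    """Concatenate the one-hot encodings into a single feature vector."""
    return one_hot(age_band, AGE_BANDS) + one_hot(device, DEVICES)

print(user_features("25-34", "mobile"))
```

Because these features are known as soon as a user signs up, they are available before any ratings exist.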
Temporal factors
• There can be an upward trend of ratings
over time
• Seasonal shifts due to holidays, awards, etc
• Anchoring (ie, an item based on a previous
iteration or version of that item)
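A simple first check for trends or seasonal shifts is the mean rating per time period. The sketch below assumes ratings arrive as (period, rating) pairs; the sample rows and the monthly period labels are hypothetical.

```python
from collections import defaultdict

def mean_rating_by_period(rows):
    """Average rating for each period, returned in sorted period order."""
    totals = defaultdict(lambda: [0, 0])  # period -> [sum of ratings, count]
    for period, rating in rows:
        totals[period][0] += rating
        totals[period][1] += 1
    return {p: s / n for p, (s, n) in sorted(totals.items())}

# Hypothetical (month, rating) pairs
rows = [("2015-01", 3), ("2015-01", 4), ("2015-02", 4), ("2015-02", 5)]
print(mean_rating_by_period(rows))
```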
Conclusions
• Think about your data, your capabilities,
and your needs prior to creating a
recommendation system
• Consider the pros and cons of each type
• Refine your model thoughtfully
Questions?
www.rummanchowdhury.com
@ruchowdh
