Recommendation engines are everywhere these days, telling us which products to buy on Amazon, which movies to watch on Netflix, which courses to take on Coursera, and more. This presentation describes the collaborative filtering and content-based recommendation engines at Jane.com, Inc. magazine's fastest-growing e-commerce company of 2015.
3. Why Recommendations?
Amazon’s share of sales from recommendations: 35% (2006)
Netflix estimates that 75 percent of viewer activity is driven by recommendations (2013 - Wired)
4. How does it work?
[Architecture diagram: Application, User Events, Kinesis, Lambda (three instances), DB]
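Assuming the Lambda functions on this slide consume the Kinesis stream of user events, each invocation receives a batch of records whose payloads are base64-encoded under `record["kinesis"]["data"]`. A minimal sketch of such a handler follows; the JSON event shape (`user_id`, `product_id`, `action`) is a hypothetical example, and the DB write is left out.

```python
import base64
import json


def handler(event, context):
    """Decode a batch of user events delivered to Lambda by a Kinesis trigger.

    Hypothetical payload shape: {"user_id": ..., "product_id": ..., "action": ...}
    """
    decoded = []
    for record in event.get("Records", []):
        # Kinesis delivers each record's data base64-encoded.
        payload = base64.b64decode(record["kinesis"]["data"])
        decoded.append(json.loads(payload))
    # A real function would write these to the DB here (e.g. a batch insert);
    # this sketch just returns them so the decoded shape is easy to inspect.
    return decoded
```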
5. Collaborative Filtering
Amazon’s “Users also Purchased”
Recommend products based on shared activity with other users
Predicts what other product-user mappings are likely based on current ones
www.amazon.com
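The simplest form of "users also purchased" counts how often other products appear in the same users' purchase histories as the product being viewed. The sketch below is an illustrative co-occurrence count, not Jane.com's or Amazon's actual algorithm; `purchases` is assumed to be a list of `(user_id, product_id)` pairs.

```python
from collections import Counter, defaultdict


def also_purchased(purchases, product, n=3):
    """Rank products by how many users bought them alongside `product`."""
    # Group each user's purchases into a basket (a set of product ids).
    baskets = defaultdict(set)
    for user, item in purchases:
        baskets[user].add(item)

    # Count co-occurrences only in baskets that contain the query product.
    counts = Counter()
    for items in baskets.values():
        if product in items:
            counts.update(items - {product})
    return [item for item, _ in counts.most_common(n)]
```

Raw counts favor products that are popular overall, which is one reason production systems score co-occurrences with a statistical test instead.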
7. The Tools: Spark, Mahout, CloudSearch
Spark:
Fast Parallel Data Processing and Machine Learning
Scales to massive amounts of data
Mahout:
Parallel Linear Algebra (Matrix Operations) and Machine Learning
Spark and Mahout together enable fast collaborative filtering on massive datasets
CloudSearch:
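Mahout's co-occurrence recommenders score each item pair with a log-likelihood ratio (LLR) test on a 2x2 contingency table of users, which down-weights pairs that co-occur only because both items are popular. The sketch below is a pure-Python transcription of that score for illustration; it is not Mahout's API, and the function names are hypothetical.

```python
import math


def _x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)


def _entropy(*counts):
    # Unnormalized Shannon entropy of a list of counts.
    return _x_log_x(sum(counts)) - sum(_x_log_x(k) for k in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """LLR score for items A and B.

    k11: users who interacted with both A and B
    k12: users with A but not B
    k21: users with B but not A
    k22: users with neither
    """
    row = _entropy(k11 + k12, k21 + k22)
    col = _entropy(k11 + k21, k12 + k22)
    mat = _entropy(k11, k12, k21, k22)
    if row + col < mat:  # guard against floating-point round-off
        return 0.0
    return 2.0 * (row + col - mat)
```

A score near zero means A and B co-occur about as often as chance predicts; larger scores indicate a stronger association worth recommending.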
9. Jane’s Recommendation Challenges
“Cold Start Problem” To the Max
No long-lived products to use as baseline for new ones
Every day ⅓ of products are brand new
This means we need to use events from as far back as we reasonably can in our calculations
http://www.beautifulonraw.com/raw-food-blog/wp-content/uploads/2010/06/Shivering.jpg
10. Other Types of Recommenders
Content
Popular
User Similarity
"Collaborative Filtering in Recommender Systems" by Moshanin - Own work. Licensed under CC BY-SA 3.0 via Commons -
https://commons.wikimedia.org/wiki/File:Collaborative_Filtering_in_Recommender_Systems.jpg#/media/File:Collaborative_Filtering_in_Recommender_Systems.jpg
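Of the recommender types listed on this slide, "Popular" is the simplest: rank products by how often they were bought. A minimal sketch, assuming `purchases` is a list of `(user_id, product_id)` pairs; the `exclude` parameter is a hypothetical hook for skipping items the user already owns.

```python
from collections import Counter


def most_popular(purchases, n=3, exclude=frozenset()):
    """Return the n most frequently purchased products, skipping `exclude`."""
    counts = Counter(item for _, item in purchases if item not in exclude)
    return [item for item, _ in counts.most_common(n)]
```

Because it needs no history for the individual user, a popularity list is a common fallback for brand-new visitors.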
11. Content Recommendations
Recommend items that are similar to the given item
Based on information contained in the item - title, description, images, etc.
Avoids the “Cold Start” problem
Drawback: a user may not want to buy two very similar items
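To make "based on information contained in the item" concrete, here is a sketch that compares products by the overlap of the words in their text, using Jaccard similarity as a simple stand-in (not necessarily what Jane.com used). `products` is assumed to map a product id to its concatenated title, description, and tags. A brand-new product can be scored immediately, since only its own text is needed.

```python
import re


def tokens(text):
    """Lowercased alphanumeric word set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_items(products, product_id, n=3):
    """Return the n products whose text overlaps most with `product_id`'s."""
    target = tokens(products[product_id])
    scores = [
        (jaccard(target, tokens(text)), pid)
        for pid, text in products.items()
        if pid != product_id
    ]
    scores.sort(reverse=True)
    return [pid for _, pid in scores[:n]]
```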
14. Content Recommendations with Word Embeddings
Calculate word embeddings on text within product (description, title, tags, etc.)
Compute distances between “embedded” product information
Euclidean distance works poorly in such high dimensions; try cosine, Mahalanobis, or other metrics
The N nearest neighbors to the product in question are your recommendations
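The steps on this slide can be sketched as: average the word vectors of a product's text, compute cosine distance to every other product, and keep the N closest. The `word_vectors` dictionary is a hypothetical stand-in for a trained embedding model such as word2vec or GloVe; averaging is one simple way to embed a whole text, assumed here for illustration.

```python
import math


def embed(text, word_vectors):
    """Average the vectors of words in `text` that the model knows."""
    vecs = [word_vectors[w] for w in text.lower().split() if w in word_vectors]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def nearest_neighbors(products, product_id, word_vectors, n=3):
    """products: dict product_id -> text. Returns the n closest product ids."""
    target = embed(products[product_id], word_vectors)
    if target is None:
        return []
    dists = []
    for pid, text in products.items():
        vec = embed(text, word_vectors)
        if pid != product_id and vec is not None:
            dists.append((cosine_distance(target, vec), pid))
    dists.sort()
    return [pid for _, pid in dists[:n]]
```

Unlike plain word overlap, embeddings let "gown" land near "dress" even when two descriptions share no words.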
15. Improving Content Recommendations
Remove meaningless, common stopwords
Weight your embedded vectors according to chosen criteria
Use category information
Get creative with your data - different patterns in each dataset
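A sketch of three of these improvements together: drop stopwords before averaging word vectors, weight each text field separately, and restrict candidates to the query product's category. The stopword list, the field names, and the weights are hypothetical examples, not values from the talk.

```python
# Tiny illustrative stopword list; real lists are much longer.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "or", "the", "with"}


def field_vector(text, word_vectors):
    """Average the vectors of non-stopword words in one text field."""
    words = [w for w in text.lower().split()
             if w not in STOPWORDS and w in word_vectors]
    if not words:
        return None
    dim = len(word_vectors[words[0]])
    return [sum(word_vectors[w][i] for w in words) / len(words) for i in range(dim)]


def product_vector(fields, weights, word_vectors):
    """Weighted average of per-field vectors.

    fields: dict field name -> text, e.g. {"title": ..., "description": ...}
    weights: dict field name -> float (hypothetical tuning knobs)
    """
    acc, total = None, 0.0
    for name, text in fields.items():
        vec = field_vector(text, word_vectors)
        weight = weights.get(name, 0.0)
        if vec is None or weight == 0.0:
            continue
        if acc is None:
            acc = [0.0] * len(vec)
        acc = [a + weight * x for a, x in zip(acc, vec)]
        total += weight
    return [a / total for a in acc] if acc is not None else None


def same_category(candidates, categories, product_id):
    """Keep only candidates sharing the query product's category."""
    target = categories.get(product_id)
    return [pid for pid in candidates
            if pid != product_id and categories.get(pid) == target]
```

Stopwords matter for averaged embeddings because frequent words like "the" pull every product's vector toward the same direction, shrinking the distances between unrelated items.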
16. Improving, cont.
Can “embed” images in a similar fashion using deep networks
Compute distance between embedded images
Combine image distances and text distances into a single combined distance metric
Determine nearest neighbors using the new distance metric
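One simple way to combine the two signals is a weighted sum of text and image distances, then a nearest-neighbor pass over the blended score. The sketch assumes both distances are already computed for each candidate and are on comparable scales; the mixing weight `alpha` is a hypothetical parameter to tune.

```python
def combined_distance(text_dist, image_dist, alpha=0.5):
    """Blend text and image distances; alpha is a hypothetical weight in [0, 1]."""
    return alpha * text_dist + (1 - alpha) * image_dist


def nearest_by_combined(text_dists, image_dists, n=3, alpha=0.5):
    """text_dists / image_dists: dict candidate_id -> distance from the query.

    Only candidates that have both a text and an image distance are ranked.
    """
    shared = text_dists.keys() & image_dists.keys()
    scored = sorted(
        (combined_distance(text_dists[p], image_dists[p], alpha), p) for p in shared
    )
    return [p for _, p in scored[:n]]
```

If the two distances live on different scales, normalize them first (for example, rescale each to [0, 1]); otherwise one signal silently dominates regardless of `alpha`.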
17. Summary
Recommendations are a powerful (and these days, standard and necessary) tool for improving customer interaction, conversion, etc.
Collaborative filtering is a proven algorithm for relevant recommendations (given lots of user data and products)
Great tools for building collaborative filtering recommendation systems exist (AWS, Spark, etc.), but you need to adapt them to your specific needs
Content recommendations can compensate for the weaknesses of collaborative filtering
Get creative to improve the quality of your recommendations