Recommender System
How does it work?
Group 4:
Nguyen Dao Tan Bao
Nguyen Thi Ngoc Phu
Cao Dinh Qui
Pham Huy Thanh
Instructed by Dr. Tim Reichert
Outline
• Introduction
• Collaborative filtering algorithms
o User-based
o Item-based
• Similarity algorithms
• Apache Mahout demo
• Case study – Amazon
• Demo – Music recommender system
Introduction
Source: http://www.cguru.info/information_technology_branch.htm
Technology helps people do many jobs, including searching for information.
Go to web directories
Or use search engines
So what is the problem?
• Find: you must already know what you need
• Search: you must already know the keywords
Recommender system
Predicts and produces the most relevant recommendations for its audience, based on their tastes.
Where is it applied?
Breaking news
"Amazon Wants to Ship Your Package Before You Buy It"
Source: http://blogs.wsj.com/digits/2014/01/17/amazon-wants-to-ship-your-package-before-you-buy-it/
What is the secret behind it?
Recommender Approaches
• Item hierarchy
• Attribute-based recommendations
• Collaborative filtering – user-user similarity
• Collaborative filtering – item-item similarity
• Social + interest graph based
• Model based
What is collaborative filtering?
A method of making automatic predictions about the interests of a user by collecting preference or taste information from many users.
User-based CF
1/25/2014
• General Idea
• Algorithm
• K-Nearest Neighbors
A: I'm gonna rent a film to watch with my boyfriend this week. Do you have any suggestions?
B: What kind of film does he like?
A: I don't know, but his best friend really likes insect collecting.
B: Maybe your boyfriend is similar to him. You guys can watch Spider-Man!?

A: I'm gonna rent a film to watch with my boyfriend this week. Do you have any suggestions?
B: What kind of film does he like?
A: I don't know, but he really enjoyed our last film, Iron Man.
B: Really? My boyfriend also likes it, and his favourite one is The Amazing Spider-Man, so maybe you guys can try it.
General Idea
[Figure labelled "Similar"]
Algorithm
[Figure: user-based CF algorithm, illustrated over four slides]
K-Nearest Neighbors
• The neighborhood of most similar users is computed first
• Only items known to those users are considered
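The two steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the method used in the demo: the toy ratings, the user names, and the choice of 1 / (1 + Euclidean distance) as the user similarity are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}
ratings = {
    "ann": {"iron_man": 5.0, "spider_man": 4.0, "titanic": 1.0},
    "bob": {"iron_man": 4.0, "spider_man": 5.0, "thor": 4.0, "titanic": 2.0},
    "cat": {"iron_man": 1.0, "spider_man": 2.0, "titanic": 5.0, "notebook": 5.0},
    "dan": {"iron_man": 5.0, "spider_man": 5.0, "titanic": 1.0},
}


def similarity(prefs_a, prefs_b):
    """Assumed similarity: 1 / (1 + Euclidean distance) over co-rated items."""
    shared = set(prefs_a) & set(prefs_b)
    if not shared:
        return 0.0
    distance = sqrt(sum((prefs_a[i] - prefs_b[i]) ** 2 for i in shared))
    return 1.0 / (1.0 + distance)


def recommend(user, ratings, k=2, how_many=3):
    target = ratings[user]
    # Step 1: compute the neighborhood of the k most similar users.
    neighbors = sorted(
        ((similarity(target, prefs), other)
         for other, prefs in ratings.items() if other != user),
        reverse=True,
    )[:k]
    # Step 2: consider only items known to those neighbors
    # and not yet rated by the target user.
    totals, weights = {}, {}
    for sim, other in neighbors:
        for item, rating in ratings[other].items():
            if item in target:
                continue
            totals[item] = totals.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    # Estimated preference = similarity-weighted average of neighbor ratings.
    estimates = {i: totals[i] / weights[i] for i in totals if weights[i] > 0}
    return sorted(estimates.items(), key=lambda kv: kv[1], reverse=True)[:how_many]


print(recommend("dan", ratings, k=2))
```

With k=2 only "ann" and "bob" fall in the neighborhood, so "notebook", which only "cat" has rated, is never considered. Raising k to 3 brings it in.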
Item-based CF
• General Idea
• Why do we need it?
• Algorithm
A: I'm gonna rent a film to watch with my boyfriend this week. Do you have any suggestions?
B: What kind of film does he like?
A: I don't know, but he really enjoyed our last film, Iron Man.
B: Really? If you enjoyed Iron Man, then you should try Iron Man 2.
General Idea
Item-based recommendation is derived from how similar items are to other items, instead of how similar users are to other users.
[Figure labelled "Similar"]
Why do we need it?
• So MANY users!
• Humans are COMPLEX
[Figure captions: "10 000 users like", "8 000 users like"]
Algorithm
• Check every item for which the user has no preference
• For each of them, calculate the similarity between it and every item for which the user has a preference
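A minimal Python sketch of this item-based loop follows. The toy ratings, the cosine similarity computed over co-rating users, and the similarity-weighted average used to turn similarities into an estimated preference are assumptions made for illustration. The slides do not say which similarity or weighting the demo uses.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}
ratings = {
    "ann": {"iron_man": 5.0, "iron_man_2": 5.0, "titanic": 1.0},
    "bob": {"iron_man": 4.0, "iron_man_2": 4.0, "titanic": 2.0},
    "cat": {"iron_man": 1.0, "iron_man_2": 2.0, "titanic": 5.0},
    "dan": {"iron_man": 5.0, "titanic": 1.0},
}


def item_vector(item, ratings):
    """Ratings given to one item, keyed by user."""
    return {user: prefs[item] for user, prefs in ratings.items() if item in prefs}


def cosine(vec_a, vec_b):
    """Cosine similarity restricted to users who rated both items (an assumed variant)."""
    shared = set(vec_a) & set(vec_b)
    if not shared:
        return 0.0
    dot = sum(vec_a[u] * vec_b[u] for u in shared)
    norm_a = sqrt(sum(vec_a[u] ** 2 for u in shared))
    norm_b = sqrt(sum(vec_b[u] ** 2 for u in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend_items(user, ratings, how_many=3):
    known = ratings[user]
    all_items = {item for prefs in ratings.values() for item in prefs}
    estimates = {}
    # Check every item for which the user has no preference ...
    for candidate in all_items - set(known):
        cand_vec = item_vector(candidate, ratings)
        weighted, total = 0.0, 0.0
        # ... and compare it with every item the user does have a preference for.
        for item, rating in known.items():
            sim = cosine(cand_vec, item_vector(item, ratings))
            weighted += sim * rating
            total += sim
        if total > 0:
            estimates[candidate] = weighted / total
    return sorted(estimates.items(), key=lambda kv: kv[1], reverse=True)[:how_many]


print(recommend_items("dan", ratings))
```

Because item-to-item similarities do not depend on the target user, they can be computed ahead of time and reused for every request.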
Similarities
• Pearson correlation
• Euclidean distance
• Tanimoto
Pearson Correlation
Example hypothesis: how tall you are affects your self-esteem.
The Pearson correlation is a number between –1 and 1 that measures the strength of the linear association between two variables.
• Doesn’t take into account the number of items on which two users’ preferences overlap
• If two users overlap on only one item, no correlation can be computed
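Both limitations show up directly in a small implementation. The sketch below is plain Python written for illustration. The function name and the choice to return None when no correlation can be computed are assumptions, not Mahout's API.

```python
from math import sqrt


def pearson(prefs_a, prefs_b):
    """Pearson correlation over the items both users have rated.

    Returns None when it is undefined: fewer than two shared items,
    or no variation in one user's ratings on the shared items.
    """
    shared = sorted(set(prefs_a) & set(prefs_b))
    n = len(shared)
    if n < 2:
        return None
    xs = [prefs_a[i] for i in shared]
    ys = [prefs_b[i] for i in shared]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


# Two users overlapping on three items, with perfectly aligned ratings.
three_items = pearson({"a": 1, "b": 2, "c": 3}, {"a": 2, "b": 4, "c": 6})

# Overlap on only two items: the result is always +1 or -1,
# however little evidence there is behind it.
two_items = pearson({"x": 1, "y": 5}, {"x": 2, "y": 3})

# Overlap on a single item: no correlation can be computed.
one_item = pearson({"x": 4}, {"x": 5, "y": 1})

print(three_items, two_items, one_item)
```

The two-item case scores as high as the three-item case, which is what "doesn't take the overlap size into account" means in practice.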
Tanimoto
• Ignores preference values entirely
• It’s the ratio of the size of the intersection to the size of the union of the two users’ preferred items
• When two users’ items completely overlap, the result is 1.0
• When they have nothing in common, it’s 0.0
$T(A, B) = \dfrac{|A \cap B|}{|A| + |B| - |A \cap B|}$, where $A$ and $B$ are the sets of items preferred by the two users.
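As a quick check of the formula, here is a small Python sketch. The function name and the sample item sets are made up for the example. The denominator is written as size of A plus size of B minus the overlap, mirroring the expression above.

```python
def tanimoto(items_a, items_b):
    """Tanimoto coefficient of two users' item sets; ratings are ignored."""
    a, b = set(items_a), set(items_b)
    both = len(a & b)
    union = len(a) + len(b) - both  # same as len(a | b)
    return both / union if union else 0.0


# Hypothetical item sets for three users.
likes_1 = {"iron_man", "thor", "spider_man"}
likes_2 = {"thor", "spider_man", "titanic"}
likes_3 = {"notebook"}

print(tanimoto(likes_1, likes_1))  # complete overlap
print(tanimoto(likes_1, likes_2))  # two shared items out of four distinct ones
print(tanimoto(likes_1, likes_3))  # nothing in common
```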
When to use Tanimoto
• Use it only when the underlying data contains only Boolean preferences
• Or when the preference values contain too much noise
What is Apache Mahout?
• Open Source from Apache
• Mahout is a Java library
o Implements machine learning techniques
• Recommendation
• Clustering
• Classification
Why do we prefer Mahout ?
• Apache License
• Good Community & Documentation
• Scalable
o Based on Hadoop (not mandatory!)
[Figure: layers of a Mahout recommender] Physical storage (database, files …) → Data Model → Recommender → Application
Recommendation in Mahout
• Input: raw data (user preferences)
• Output: preference estimation
• Step 1
o Map raw data into a Mahout-compliant DataModel
• Step 2
o Tune recommender components
• Similarity measure, neighborhood, …
• Step 3
o Recommend
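Step 1 can be pictured with a plain-Python stand-in. This is not Mahout's API: the function name, the `userID,itemID,preference` line layout, and the default of 1.0 for a missing preference are assumptions made for the sketch. Steps 2 and 3 would then operate on the resulting structure.

```python
import csv
import io


def load_data_model(lines):
    """Map raw 'userID,itemID[,preference]' lines to user -> {item: preference}."""
    model = {}
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        user, item = row[0].strip(), row[1].strip()
        # Assumption: a missing preference value means a Boolean "likes" of 1.0.
        has_pref = len(row) > 2 and row[2].strip()
        model.setdefault(user, {})[item] = float(row[2]) if has_pref else 1.0
    return model


# Hypothetical raw data, as it might sit in a small CSV file.
raw = io.StringIO("1,101,5.0\n1,102,3.0\n2,101,2.0\n2,103\n")
model = load_data_model(raw)
print(model)
```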
Recommendation Components
• Five Java interfaces
o DataModel interface:
• MySQLJDBCDataModel, FileDataModel …
o UserSimilarity interface
• Methods to calculate the degree of correlation between two
users
o ItemSimilarity interface
• Methods to calculate the degree of correlation between two
items
o UserNeighborhood interface
• Methods to define the concept of ‘neighborhood’
o Recommender interface
• Methods to implement the recommendation step itself
Similarity Metrics
• SIMILARITY_COOCCURRENCE
• SIMILARITY_LOGLIKELIHOOD
• SIMILARITY_TANIMOTO_COEFFICIENT
• SIMILARITY_CITY_BLOCK
• SIMILARITY_COSINE
• SIMILARITY_PEARSON_CORRELATION
• SIMILARITY_EUCLIDEAN_DISTANCE
CASE STUDY
"Much is made of what the likes of Facebook, Google and Apple know about users. Truth is, Amazon may know more. And the massive retailer proves it every day." - JP Mangalindan, Writer
References: http://tech.fortune.cnn.com/2012/07/30/amazon-5/
• Amazon's recommendation system is based on a number of simple elements:
o what a user has bought in the past and recently viewed
o which items a user has in their virtual shopping cart
o items the user has rated and liked
o what other customers have viewed and purchased
• The retail giant calls this "item-to-item collaborative filtering"
• It uses this algorithm to heavily customize the browsing experience for returning customers
References: http://tech.fortune.cnn.com/2012/07/30/amazon-5/
• The recommendation system worked, and Amazon reported strong results:
o a 29% sales increase to $12.83 billion during its second fiscal quarter (as of July 26, 2012)
o compared to $9.9 billion during the same period the previous year
• Amazon has integrated recommendations into nearly every part of the purchasing process, from product discovery to checkout
• "Our mission is to delight our customers by allowing them to serendipitously discover great products." - an Amazon spokesperson
References: http://tech.fortune.cnn.com/2012/07/30/amazon-5/
CASE STUDY – Amazon recommendation services
References: http://www.google.com/patents/US7921042
CASE STUDY - Generation of Similar Items Table
References: http://www.google.com/patents/US7921042 (Fig.1), http://www.google.com/patents/US7113917 (Fig.3,4)
The recommendation service's components include:
- a recommendation process
- an off-line table generation process
- a similar items table
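To make the division of labor between these three components concrete, here is a rough Python sketch. It is not Amazon's implementation and does not follow the patents' scoring: the purchase data is invented, and plain co-purchase counting is swapped in as the similarity only to keep the example short.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical purchase histories: customer -> set of items bought.
purchases = {
    "c1": {"book", "lamp"},
    "c2": {"book", "lamp", "pen"},
    "c3": {"book", "pen"},
    "c4": {"lamp", "desk"},
}


def build_similar_items_table(purchases, top_n=2):
    """Off-line step: for each item, keep the top_n most co-purchased items."""
    co_counts = defaultdict(lambda: defaultdict(int))
    for items in purchases.values():
        for a, b in combinations(sorted(items), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    table = {}
    for item, others in co_counts.items():
        # Highest count first; ties broken alphabetically for determinism.
        ranked = sorted(others.items(), key=lambda kv: (-kv[1], kv[0]))
        table[item] = ranked[:top_n]
    return table


def recommend(basket, table, how_many=3):
    """On-line step: look up the precomputed table and merge the scores."""
    scores = defaultdict(int)
    for item in basket:
        for other, count in table.get(item, []):
            if other not in basket:
                scores[other] += count
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:how_many]]


table = build_similar_items_table(purchases)
print(table["book"])
print(recommend({"book"}, table))
```

The expensive pairwise counting happens once, off-line. Answering a request only needs a few table lookups.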
CASE STUDY - Generation of Recommendation
References: http://www.google.com/patents/US7921042 (Fig.1), http://www.google.com/patents/US7113917 (Fig.2, 5, 7)
EVALUATION
Experimental Settings
• Offline Experiments
- Performed using a pre-collected data set of users choosing or rating items
- Simulate the behavior of users interacting with a recommendation system
- Assume that user behavior when the data was collected is similar enough to user behavior when the recommender system is deployed
- so that reliable decisions can be made based on the simulation
• User Studies
- Conducted by recruiting a set of test subjects
- and asking them to perform several tasks that require interaction with the recommendation system, while observing them
- We can then check whether the recommendations are used and whether people read different stories with and without recommendations, then ask them whether the recommendations were relevant
• Online Experiments
- Measure the change in user behavior when interacting with different recommendation systems
- If users of one system follow the recommendations more often, or if some utility gathered from users of one system exceeds the utility gathered from users of the other system, then we can conclude that one system is superior to the other
References: Microsoft Research: Evaluating Recommendation Systems - Guy Shani and Asela Gunawardana
References: Microsoft Research: Evaluating Recommendation Systems - Guy Shani and Asela Gunawardana
Reliable conclusions
1. Confidence and p-values
2. Multiple tests
Metrics
1. User preference and prediction accuracy: voting from users
o Root Mean Squared Error (RMSE)
o Measuring usage prediction
2. Coverage: item space and user space
3. Novelty: recommendations for items that the user did not know about
4. Utility: the recommendation engine can be judged by the revenue that it generates for the website
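Two of these metrics are easy to compute in an offline experiment. The Python sketch below is illustrative only: the function names, the sample predictions, and the reading of item-space coverage as "share of catalog items that appear in at least one recommendation list" are assumptions.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root Mean Squared Error over the (user, item) pairs present in both dicts."""
    keys = predicted.keys() & actual.keys()
    if not keys:
        raise ValueError("no common (user, item) pairs to compare")
    return sqrt(sum((predicted[k] - actual[k]) ** 2 for k in keys) / len(keys))


def item_coverage(recommendation_lists, catalog):
    """Share of catalog items that show up in at least one recommendation list."""
    recommended = {item for items in recommendation_lists for item in items}
    return len(recommended & set(catalog)) / len(catalog)


# Hypothetical held-out ratings and the system's predictions for them.
actual = {("u1", "i1"): 5.0, ("u1", "i2"): 2.0}
predicted = {("u1", "i1"): 4.0, ("u1", "i2"): 2.0}
error = rmse(predicted, actual)

# Hypothetical top-N lists for two users, over a four-item catalog.
coverage = item_coverage([["i1", "i2"], ["i2"]], ["i1", "i2", "i3", "i4"])

print(error, coverage)
```

A lower RMSE means predicted ratings are closer to the held-out ones. A higher coverage means more of the catalog can actually reach users.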
References
• Personalized recommendations of items represented within a database, http://www.google.com/patents/US7113917
• Computer processes for identifying related items and generating personalized item recommendations, http://www.google.com/patents/US7921042
• Guy Shani and Asela Gunawardana, Evaluating Recommendation Systems, Microsoft Research
• Amazon Recommendation – Industry report
Summary
• Recommender systems
• User-based vs item-based
• Similarity metrics
o The most suitable one depends on the data
• Evaluation and challenges
• Apache Mahout
o A Java library that implements machine learning techniques
Mahout Music Recommender Demo
IF YOU LIKE BRITNEY, YOU WILL LOVE ….
Architecture of Recommender
DemoFriendLikes.csv DataModel
FacebookRecommender Recommender
FacebookRecommenderSOAP
Glassfish Java 6 EE Server
facebook-recommender-demo.war
SOAP
Q&A
THANK YOU!