Recommender Systems

Published in: Technology
  1. Recommendation Systems, by Usman Sharif
  2. Why recommendation systems?
     • Provide a better experience to your users.
     • Understand the behavior and patterns of users.
     • Create an opportunity to re-engage inactive users.
     • Boost sales.
     • Can serve users better than a search feature.
  3. How some companies are using recommendation systems: Amazon
  4. How some companies are using recommendation systems: Gmail
  5. A simple recommendation system
     Consider the following scenario:
     • A library has books and members.
     • Members can have books issued.
     • The library wants to build a recommender system to recommend books to its members.
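  6. Scoring Matrices
     User-book matrix (X = the user has read the book):

              Book 1   Book 2   Book 3   Book 4
     User 1   X                 X
     User 2   X
     User 3            X                 X
     User 4   X                 X        X
     User 5   X        X

     Book-book matrix (each cell counts the users who have read both books; the diagonal counts the readers of that book):

              Book 1   Book 2   Book 3   Book 4
     Book 1   4        1        2        1
     Book 2   1        2        0        1
     Book 3   2        0        2        1
     Book 4   1        1        1        2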
  7. Using the scoring matrices
     • If a user has read Book 1, recommend Books 3, 2, 4.
     • If a user has read Book 2, recommend Books 1, 4, 3.
     • If a user has read Book 3, recommend Books 1, 4, 2.
     • If a user has read Book 4, recommend Books 1, 2, 3.
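The scoring-matrix approach above can be sketched in a few lines of Python. This is only an illustration: the reading history is one assignment consistent with the book-book counts on the slide, and the names `reads`, `cooccurrence` and `recommend` are made up for this example.

```python
# Reading history (user -> set of book numbers); an assumed assignment that
# reproduces the book-book counts shown on the scoring-matrix slide.
reads = {
    1: {1, 3},
    2: {1},
    3: {2, 4},
    4: {1, 3, 4},
    5: {1, 2},
}
books = [1, 2, 3, 4]


def cooccurrence(reads, books):
    """Count, for every pair of books, how many users have read both."""
    counts = {a: {b: 0 for b in books} for a in books}
    for read in reads.values():
        for a in read:
            for b in read:
                counts[a][b] += 1
    return counts


def recommend(counts, book):
    """Other books sorted by descending co-occurrence with `book`.

    Ties keep ascending book order because sorted() is stable.
    """
    others = [b for b in counts[book] if b != book]
    return sorted(others, key=lambda b: -counts[book][b])


counts = cooccurrence(reads, books)
for book in books:
    print(f"Read Book {book}: recommend {recommend(counts, book)}")
```

With this data the orderings match the four rules on the slide; books with equal counts are simply listed in ascending book number.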
  8. Advantages
     • Very simple to understand and implement.
     • Works well when a single user activity (one book read) is enough to drive further recommendations.
  9. Disadvantages
     • Cannot work for a new user with no history.
     • In a real-world scenario with thousands of books and thousands of members, the matrix is bound to contain too many zeroes (a sparse matrix).
     • Does not consider more than one item of the user's history at a time.
  10. Another Try
      Our Books records might look like this:

      BookId   Title                    Genre           Writer                Language
      1        The Great Gatsby         Classic         F Scott Fitzgerald    English
      2        Nine Stories             Short Stories   J D Salinger          English
      3        The Sun Also Rises       Classic         Ernest Hemingway      English
      4        The Hunger Games         Action          Suzanne Collins       English
      5        The Ambler Warning       Thriller        Robert Ludlum         English
      6        The Catcher in the Rye   Classic         J D Salinger          English
      7        To Kill a Mockingbird    Classic         Harper Lee            English
  11. Create an Item Similarity Matrix

               Book 1   Book 2   Book 3   Book 4   Book 5   Book 6   Book 7
      Book 1   3        1        2        1        1        2        2
      Book 2   1        3        1        1        1        2        1
      Book 3   2        1        3        1        1        2        2
      Book 4   1        1        1        3        1        1        1
      Book 5   1        1        1        1        3        1        1
      Book 6   2        2        2        1        1        3        2
      Book 7   2        1        2        1        1        2        3

      • This is always a square (n x n) matrix.
      • Each cell holds the count of matching attributes (Genre, Writer, Language), excluding the unique attributes BookId and Title.
      • In general, any measure of similarity can be used here.
  12. To Recommend
      • Look at what a user has previously read.
      • Use the values from the similarity matrix and recommend books based on how similar they are to the book the user has already read.
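A minimal sketch of the item similarity matrix and the lookup described above, using the Books records from the slides. The helper names (`similarity`, `similar_books`) and the `top_n` parameter are assumptions for illustration, and counting equal attributes is just one possible similarity measure.

```python
# Book records from the slides: BookId -> (Genre, Writer, Language).
# BookId and Title are left out because they are unique per book.
books = {
    1: ("Classic", "F Scott Fitzgerald", "English"),
    2: ("Short Stories", "J D Salinger", "English"),
    3: ("Classic", "Ernest Hemingway", "English"),
    4: ("Action", "Suzanne Collins", "English"),
    5: ("Thriller", "Robert Ludlum", "English"),
    6: ("Classic", "J D Salinger", "English"),
    7: ("Classic", "Harper Lee", "English"),
}


def similarity(a, b):
    """Number of attributes on which books a and b agree."""
    return sum(x == y for x, y in zip(books[a], books[b]))


# Square n x n matrix; it can be pre-computed once and stored for fast lookups.
matrix = {a: {b: similarity(a, b) for b in books} for a in books}


def similar_books(book, top_n=3):
    """The top_n other books most similar to `book` (ties keep BookId order)."""
    others = [b for b in books if b != book]
    return sorted(others, key=lambda b: -matrix[book][b])[:top_n]


print(similar_books(3))
```

Because the matrix depends only on item attributes, it does not need any user history to produce a list.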
  13. Advantages
      • Recommendations can be pre-computed for a very large item base.
      • Fast lookups can be built to serve recommendations. For example, if a user is viewing the page for Book 3, you can recommend Books 1, 6 and 7.
      • Works for new or unregistered users.
  14. Disadvantage
      • Does not consider the user's history; it looks at a collective trend instead.
  15. Another Approach: The Users
      Our Users records might look like this:

      UserId   Gender   Age   Location
      1        Male     34    Pakistan
      2        Female   28    Pakistan
      3        Male     38    India
      4        Male     32    India
      5        Female   21    Pakistan
      6        Female   24    Pakistan
  16. The User Borrowing

      UserId   BookId
      1        3
      1        7
      2        2
      3        1
      3        5
      3        7
      4        6
      4        7
      5        2
      6        4
      6        6
      6        7
  17. Transforming User Borrowing

               User 1   User 2   User 3   User 4   User 5   User 6
      Book 1                     X
      Book 2            X                          X
      Book 3   X
      Book 4                                                X
      Book 5                     X
      Book 6                              X                 X
      Book 7   X                 X        X                 X

      • Issue: too many zero values.
      • Any solutions?
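The transformation from borrowing records to a book-by-user matrix, and the sparsity it exposes, can be sketched as follows. The variable names are illustrative; the records are the twelve rows of the User Borrowing table.

```python
# (UserId, BookId) pairs from the User Borrowing table.
borrowings = [
    (1, 3), (1, 7), (2, 2), (3, 1), (3, 5), (3, 7),
    (4, 6), (4, 7), (5, 2), (6, 4), (6, 6), (6, 7),
]
users = range(1, 7)
books = range(1, 8)

# Pivot into a books x users matrix of booleans (True plays the role of 'X').
borrowed = set(borrowings)
matrix = {b: {u: (u, b) in borrowed for u in users} for b in books}

# Count the empty cells to see how sparse the matrix is.
zero_cells = sum(not cell for row in matrix.values() for cell in row.values())
total_cells = len(books) * len(users)
print(f"{zero_cells}/{total_cells} cells are zero")

# Print the matrix in the same layout as the slide.
for b in books:
    row = " ".join("X" if matrix[b][u] else "." for u in users)
    print(f"Book {b}: {row}")
```

Even with six users and seven books, most cells are empty, which is the problem the next slides address.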
  18. Transform the Users Records
      Treat Age as a discrete column with ranges like {0-10, 11-20, 21-30, 31-40, …} so that we can create partitions like this:

      PartitionId   Gender   AgeGroup   Location
      1             Male     31-40      Pakistan
      2             Female   21-30      Pakistan
      3             Male     31-40      India
  19. Recreate User Borrowing using Partition Information
      • Fewer zero-valued cells (11/21 compared to 30/42 previously).
      • Far fewer columns than we previously had!
      • The notation has changed from 'X' to a count.

               Partition 1   Partition 2   Partition 3
      Book 1                               1
      Book 2                 2
      Book 3   1
      Book 4                 1
      Book 5                               1
      Book 6                 1             1
      Book 7   1             1             2
  20. To Recommend
      • See which partition a user belongs to.
      • Look at the column of that partition and sort the books in descending order of their frequency count.
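The partition-based approach can be sketched as below, using the Users and User Borrowing records from the slides. The function names and the tie-breaking rule (lower BookId first) are assumptions made for this example; the slides do not specify how ties are ordered.

```python
from collections import Counter

# Users table from the slides: UserId -> (Gender, Age, Location).
users = {
    1: ("Male", 34, "Pakistan"),
    2: ("Female", 28, "Pakistan"),
    3: ("Male", 38, "India"),
    4: ("Male", 32, "India"),
    5: ("Female", 21, "Pakistan"),
    6: ("Female", 24, "Pakistan"),
}
# (UserId, BookId) pairs from the User Borrowing table.
borrowings = [
    (1, 3), (1, 7), (2, 2), (3, 1), (3, 5), (3, 7),
    (4, 6), (4, 7), (5, 2), (6, 4), (6, 6), (6, 7),
]


def age_group(age):
    """Map an age to the ranges 0-10, 11-20, 21-30, 31-40, ..."""
    if age <= 10:
        return "0-10"
    low = (age - 1) // 10 * 10 + 1
    return f"{low}-{low + 9}"


def partition_of(user_id):
    """A partition is identified by (Gender, AgeGroup, Location)."""
    gender, age, location = users[user_id]
    return (gender, age_group(age), location)


# Borrow counts per partition: partition -> Counter(BookId -> count).
counts = {}
for user_id, book_id in borrowings:
    counts.setdefault(partition_of(user_id), Counter())[book_id] += 1


def recommend(user_id):
    """Books in the user's partition column, most frequently borrowed first."""
    column = counts.get(partition_of(user_id), Counter())
    ranked = sorted(column.items(), key=lambda kv: (-kv[1], kv[0]))
    return [book for book, _ in ranked]


print(recommend(3))
```

Note that this sketch does not filter out books the user has already borrowed; a real system would probably want to do that before showing the list.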
  21. Advantages
      • Continues to improve over time.
      • More partitions can be added over time.
      • Instead of using a collective scoring, the technique partitions the user base into 'similar' users.
      • The technique can easily be extended to the item side: rather than having books as rows, we can have book clusters.
  22. Disadvantages
      • Needs some seed data to start.
      • Requires some transformations.
      • Can become very complex as the number of users and items grows.
  23. Evaluating Performance (Metrics)
      Almost any Information Retrieval metric can be used. Three interesting ones:
      • Accuracy
      • Coverage
      • Normalized Distance Based Performance Measure (NDPM)
  24. Accuracy
      • Takes into account the order in which recommendations are shown to users and how users responded to them.
      • Acc(k) = (number of positive responses with rank ≤ k) / (total recommendations with rank ≤ k)
      • For rank position 1: Acc(1) = 1 / 3 = 33.33%
      • Similarly, Acc(2) = 2 / 6 = 33.33%

      UserId   BookId   Rank   Response
      1        3        1      Yes
      1        2        2      No
      2        7        1      No
      2        5        2      Yes
      3        3        1      No
      3        7        2      No
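As a quick check of the accuracy figures above, here is a small sketch that computes Acc(k) from the response table. The tuple layout and the function name `accuracy` are choices made for this example only.

```python
# Rows of the accuracy slide's table: (UserId, BookId, Rank, positive response?).
responses = [
    (1, 3, 1, True),
    (1, 2, 2, False),
    (2, 7, 1, False),
    (2, 5, 2, True),
    (3, 3, 1, False),
    (3, 7, 2, False),
]


def accuracy(responses, k):
    """Positive responses with rank <= k divided by all recommendations with rank <= k."""
    shown = [row for row in responses if row[2] <= k]
    return sum(row[3] for row in shown) / len(shown)


for k in (1, 2):
    print(f"Acc({k}) = {accuracy(responses, k):.2%}")
```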
  25. Coverage
      • Shows the share of items that appear in the recommendations across all users.
      • Cov(k) = (unique items in recommendations with rank ≤ k) / (total items)
      • For rank position 1: Cov(1) = 2 / 7 = 28.57%
      • Similarly, Cov(2) = 4 / 7 = 57.14%

      UserId   BookId   Rank   Response
      1        3        1      Yes
      1        2        2      No
      2        7        1      No
      2        5        2      Yes
      3        3        1      No
      3        7        2      No
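The coverage figures can be reproduced the same way. This sketch assumes the catalogue holds the seven books from the earlier slides; `coverage` and `TOTAL_ITEMS` are names introduced here for illustration.

```python
# Rows of the coverage slide's table: (UserId, BookId, Rank, positive response?).
responses = [
    (1, 3, 1, True),
    (1, 2, 2, False),
    (2, 7, 1, False),
    (2, 5, 2, True),
    (3, 3, 1, False),
    (3, 7, 2, False),
]
TOTAL_ITEMS = 7  # the seven books in the example catalogue


def coverage(responses, k, total_items):
    """Unique items recommended at rank <= k divided by the catalogue size."""
    items = {book for _, book, rank, _ in responses if rank <= k}
    return len(items) / total_items


for k in (1, 2):
    print(f"Cov({k}) = {coverage(responses, k, TOTAL_ITEMS):.2%}")
```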
  26. Normalized Distance Based Performance Measure (NDPM)
      • Assesses the quality of a recommendation system, taking into account the order in which items are shown.
      • NDPM = (C- + 0.5 x C+) / Cu
      • C- is the number of recommended item pairs where the user responded (No, Yes) in rank order.
      • C+ is the number of recommended item pairs where the user responded (Yes, No) in rank order.
      • Cu is the number of all item pairs where the user's responses were not the same.
      • In our example (the index in parentheses is the UserId):
        • C-(1) = 2, C+(1) = 2 and Cu(1) = 4 => NDPM(1) = (2 + 0.5 x 2) / 4 = 75%
        • C-(2) = 0, C+(2) = 1 and Cu(2) = 1 => NDPM(2) = (0 + 0.5 x 1) / 1 = 50%
        • NDPM = (0.75 + 0.5) / 2 = 62.5%

      UserId   BookId   Rank   Response
      1        3        1      Yes
      1        2        2      No
      1        7        3      No
      1        5        4      Yes
      2        3        1      Yes
      2        7        2      No
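The NDPM arithmetic above can be checked with a short sketch. It implements the formula exactly as written on the slide (which may differ from other published formulations of NDPM), and the function name and the choice to return `None` when no pair of responses differs are assumptions of this example.

```python
from itertools import combinations

# Responses per user, ordered by rank (True = Yes), from the NDPM slide's table.
responses_by_user = {
    1: [True, False, False, True],
    2: [True, False],
}


def ndpm_for_user(ranked_responses):
    """(C- + 0.5 * C+) / Cu for one user, following the slide's definition."""
    c_minus = c_plus = 0
    # combinations() yields each pair in rank order (earlier item first).
    for earlier, later in combinations(ranked_responses, 2):
        if earlier and not later:
            c_plus += 1   # (Yes, No) pair
        elif later and not earlier:
            c_minus += 1  # (No, Yes) pair
    c_u = c_minus + c_plus  # pairs whose responses differ
    if c_u == 0:
        return None  # undefined when every response is the same
    return (c_minus + 0.5 * c_plus) / c_u


per_user = {user: ndpm_for_user(r) for user, r in responses_by_user.items()}
scores = [s for s in per_user.values() if s is not None]
overall = sum(scores) / len(scores)
print(per_user, overall)
```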
  27. How to improve results
      • Maintain a list of recommendations each user has already seen, and don't show them again for some time.
      • Give users a mechanism to tell you what they're looking for.
      • Infer the same information from user searches.
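One way to keep already-seen recommendations from reappearing is a simple cooldown filter. The sketch below is hypothetical: the one-week cooldown, the in-memory `last_shown` dictionary and the function name `filter_recent` are all assumptions, since the slide only says "for some time".

```python
import time

COOLDOWN_SECONDS = 7 * 24 * 3600  # assumed cooldown of one week

# (user_id, book_id) -> timestamp of the last time the book was recommended.
last_shown = {}


def filter_recent(user_id, candidates, now=None):
    """Drop candidates shown to this user within the cooldown; record the rest as shown."""
    now = time.time() if now is None else now
    fresh = [
        book for book in candidates
        if now - last_shown.get((user_id, book), float("-inf")) >= COOLDOWN_SECONDS
    ]
    for book in fresh:
        last_shown[(user_id, book)] = now
    return fresh


first = filter_recent(1, [3, 7], now=0)
second = filter_recent(1, [3, 7, 2], now=100)
third = filter_recent(1, [3, 7], now=COOLDOWN_SECONDS)
print(first, second, third)
```

In practice the seen list would live in a persistent store rather than a module-level dictionary.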
  28. Some standard algorithms
      • Item Hierarchy: You bought a printer, you will also need ink.
      • Attribute-based recommendations: You like reading classics written by Salinger, so you might like “The Catcher in the Rye”.
      • Collaborative Filtering, User-User Similarity: People like you who read “The Hunger Games” also read “The Ambler Warning”.
      • Collaborative Filtering, Item-Item Similarity: You like “The Catcher in the Rye” so you will like “Nine Stories”.
      • Social + Interest Graph Based: Your friends like “The Great Gatsby” so you will like “The Great Gatsby” too.
      • Model Based: Training SVM, LDA, SVD for implicit features.
  29. Some Tools
      • Apache Mahout (Java)
      • Crab (Python)
      • Easyrec (RESTful API)
  30. Questions?
  31. Thank you! www.usman-sharif.com @sharif_usman
