Coen Stevens
Lead Recommendation Engineer
How to build a recommender system?
Wakoopa use case
By Coen Stevens, Lead Recommendations Engineer at Wakoopa. Presented at http://recked.org

Published in: Technology
Comments
  • Slide 19: After running multiple tests with different values, the Beta value was set at 0.04. The confidence was basically a popularity score, which was calculated as follows for a particular (software) item:

    (Math.log(num_total_users/num_item_users)) / (Math.log(num_total_users))
  • I think he compares product A vs. B by using Pearson correlation.
  • I know it's an old topic, but can someone maybe explain slide 19? I don't understand what the variables are. Thanks in advance.
How to build a recommender system?

  1. Coen Stevens, Lead Recommendation Engineer
  2. How to build a recommender system? Wakoopa use case
  3. Mission: Discover software & games
  4. Software tracker: Windows, Mac, Linux
  5. Your profile
  6. Updates
  7. Software pages
  8. Recommendations
  9. Building a recommender system: Approach and challenges
  10. Data: what do we have?
      Usage (implicit)          vs.  Ratings (explicit)
      • Noisy                        • Accurate
      • Only positive feedback       • Positive and negative feedback
      • Easy to collect              • Hard to collect
  11. Data: what do we use?
      • Active users (Tracker activity in the past month): ~9,000
      • Actively used software items (in the past month): ~10,000
      • We calculate recommendations separately for each OS, each together with Web applications
  12. Recommender system methods
      Collaborative recommendations: the user will be recommended items that people with similar tastes and preferences liked (used) in the past
      • Item-based collaborative filtering
      • User-based collaborative filtering (used only to calculate user similarities, to find people like you)
      • Combining both methods
  13. Item-Based Collaborative Filtering: user software usage matrix (rows: users, columns: software items)
      Usage values as transcribed: 220 90 180 22 280 12 42 80 175 210 210 45 165 35 195 13 25 100 50 185 35 190 60 65 185
  14. User software usage matrix [0, 1] (rows: users, columns: software items)
      1 1 0 1 0 1 0
      1 1 1 0 1 0 0
      1 1 0 1 0 1 0
      1 0 1 1 1 1 0
      0 1 1 1 0 1 1
      0 1 0 1 0 0 1
  15. How do we predict the probability that I would like to use Gmail? (rows: users, columns: software items; "?" marks the value to predict)
      1 1 0 1 0 1 0
      1 1 1 0 1 0 0
      1 1 ? 1 0 1 0
      1 0 1 1 1 1 0
      0 1 1 1 0 1 1
      0 1 0 1 0 0 1
  16. Calculate the similarities between Gmail and the other software items (binary usage matrix as on slide 14): Cosine Similarity(Firefox, Gmail)
  17. Calculate the similarities between Gmail and the other software items (binary usage matrix as on slide 14): Cosine Similarity(Firefox, Gmail)
  18. Calculate the similarities between Gmail and the other software items (binary usage matrix as on slide 14): Cosine Similarity(Firefox, Gmail)
      Popularity correction: we put less trust in popular software
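The item-to-item comparison on slides 16 to 18 can be sketched in Python as follows. This is a minimal sketch, not the Wakoopa implementation: the column indices used for the two items are placeholders (the transcript does not say which column is Firefox or Gmail), and `popularity_confidence` only restates the formula quoted in the comments on this page. How that score and the Beta value are combined with the cosine similarity is not described in the deck, so it is left out.

```python
import math

# Binary usage matrix from slide 14: rows are users, columns are software items.
USAGE = [
    [1, 1, 0, 1, 0, 1, 0],
    [1, 1, 1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0, 1, 0],
    [1, 0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 0, 1, 1],
    [0, 1, 0, 1, 0, 0, 1],
]


def column(matrix, j):
    """Return column j of the matrix: one usage value per user for item j."""
    return [row[j] for row in matrix]


def cosine_similarity(a, b):
    """Cosine of the angle between two equally long usage vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an item nobody uses is similar to nothing
    return dot / (norm_a * norm_b)


def popularity_confidence(num_total_users, num_item_users):
    """Popularity score from the page comments: log(n / n_i) / log(n).

    Assumes num_total_users > 1 and num_item_users > 0.
    """
    return math.log(num_total_users / num_item_users) / math.log(num_total_users)


# Placeholder columns: index 0 and index 2 stand in for the two items compared.
similarity = cosine_similarity(column(USAGE, 0), column(USAGE, 2))
```

With this formula an item used by every user gets a confidence of 0, and the score rises towards 1 as an item becomes rarer.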
  19. Item-item correlation matrix
      1    0.1  0.6  0.1  0.1  0.1  0.7
      0.2  1    0.8  0.5  0.8  0.1  0.9
      0.1  0.6  1    0.5  0.7  0.2  0.3
      0.2  0.6  0.4  1    0.8  0.2  0.3
      0.5  0.4  0.4  0.4  1    0.1  0.2
      0.5  0.5  0.3  0.5  0.3  1    0.3
      0.2  0.6  0.3  0.8  0.7  0.7  1
  20. Item-item correlation matrix (as on slide 19), shown next to the Gmail similarities: 0.6 0.8 0.4 0.4 0.3 0.3
  21. K-nearest neighbor approach
      • Performance vs quality
      • We take only the ‘K’ most similar items (say 4)
      • Space complexity: O(m + Kn)
      • Computational complexity: O(m + n²)
      Gmail similarities: 0.6 0.8 0.4 0.4 0.3 0.3
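The cut-off on slide 21 amounts to keeping the K largest entries of an item's similarity list. A minimal sketch, assuming the similarities are held in a dictionary; the item names are hypothetical labels, since the transcript only gives the six values:

```python
import heapq


def k_nearest(similarities, k):
    """Return the k (item, similarity) pairs with the highest similarity."""
    return heapq.nlargest(k, similarities.items(), key=lambda pair: pair[1])


# Gmail similarities from slide 21; the item names are made up for illustration.
gmail_similarities = {
    "item_a": 0.6,
    "item_b": 0.8,
    "item_c": 0.4,
    "item_d": 0.4,
    "item_e": 0.3,
    "item_f": 0.3,
}

neighbors = k_nearest(gmail_similarities, 4)
```

Storing only K neighbors per item is what keeps the similarity table small, at the price of ignoring weaker signals (the "performance vs quality" trade-off on the slide).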
  22. Calculate the predicted value for Gmail
      Gmail similarities:  0.6  0.8  0.4  0.4
      User usage:          1    1    1    1
  23. Calculate the predicted value for Gmail
      Usage correction: more usage results in a higher score [0,1]
      Gmail similarities:  0.6  0.8  0.4  0.4
      User usage:          0.9  0.8  0.6  0.2
  24. Calculate the predicted value for Gmail
      Gmail similarities:  0.6  0.8  0.4  0.4
      User usage:          0.9  0.8  0.6  0.2
      ((0.6 * 0.9) + (0.8 * 0.8) + (0.4 * 0.6)) / (0.6 + 0.8 + 0.4 + 0.4) = 0.82 on the slide (the expression as written evaluates to 1.42 / 2.2 ≈ 0.65)
  25. Calculate the predicted value for Gmail (same values and calculation as slide 24), plus:
      • User feedback
      • Contacts usage
      • Commercial vs Free
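The prediction on slides 22 to 25 is a similarity-weighted average of the user's usage over the K neighbors of the target item. The sketch below assumes that reading; the item names are hypothetical, and the fourth neighbor is left out of the usage dictionary because the slide's numerator has only three terms. Evaluated with the usage-corrected values, the expression gives 1.42 / 2.2, roughly 0.65; the 0.82 printed on the slide is what the same expression gives with binary usage, (0.6 + 0.8 + 0.4) / 2.2.

```python
def predict(similarities, usage):
    """Similarity-weighted average of a user's usage over neighbor items.

    similarities: neighbor item -> similarity to the target item
    usage: item -> usage score in [0, 1]; missing items count as 0
    """
    total = sum(similarities.values())
    if total == 0:
        return 0.0
    weighted = sum(sim * usage.get(item, 0.0) for item, sim in similarities.items())
    return weighted / total


# K = 4 neighbors of Gmail (hypothetical item names, values from the slides).
neighbors = {"item_a": 0.6, "item_b": 0.8, "item_c": 0.4, "item_d": 0.4}

corrected_usage = {"item_a": 0.9, "item_b": 0.8, "item_c": 0.6}
binary_usage = {"item_a": 1, "item_b": 1, "item_c": 1}

score_corrected = predict(neighbors, corrected_usage)
score_binary = predict(neighbors, binary_usage)
```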
  26. Calculate all unknown values and show the Top-N recommendations to each user (rows: users, columns: software items; "?" marks an unknown value)
      1 1 ? 1 ? 1 ?
      1 1 1 ? 1 ? ?
      1 1 ? 1 ? 1 ?
      1 ? 1 1 1 1 ?
      ? 1 1 1 ? 1 1
      ? 1 ? 1 ? ? 1
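Slide 26 repeats the prediction for every item a user does not use yet and keeps the N best scores. A minimal sketch under the same weighted-average assumption; the 4-item similarity matrix is made-up illustration data, not taken from the deck, and `top_n_recommendations` is a hypothetical helper name:

```python
def top_n_recommendations(user_usage, item_similarity, n, k=4):
    """Score every unused item and return the indices of the n best ones."""
    items = range(len(user_usage))
    scores = {}
    for target in items:
        if user_usage[target]:
            continue  # already used, nothing to predict
        # Keep the k items most similar to the target (excluding itself).
        neighbors = sorted(
            ((item_similarity[target][other], other) for other in items if other != target),
            reverse=True,
        )[:k]
        total = sum(sim for sim, _ in neighbors)
        if total > 0:
            scores[target] = sum(sim * user_usage[other] for sim, other in neighbors) / total
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Made-up symmetric similarity matrix for four items (illustration only).
SIMILARITY = [
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.3, 0.5],
    [0.1, 0.3, 1.0, 0.9],
    [0.2, 0.5, 0.9, 1.0],
]

# This user uses items 0 and 3; items 1 and 2 are the unknown values.
recommended = top_n_recommendations([1, 0, 0, 1], SIMILARITY, n=2, k=2)
```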
  27. Explainability: Why did I get this recommendation?
      • Overlap between the item’s (K) neighbors and your usage
  28. User-Based Collaborative Filtering: Finding people like you
      1 1 0 1 0 1 0
      1 1 1 0 1 0 0
      1 1 0 1 0 1 0
      1 1 1 1 1 1 0
      0 1 1 1 0 1 1
      0 1 0 1 0 0 1
      Cosine Similarity(Coen, Menno)
  29. Applying inverse user frequency
      log(n/ni): ni is the number of users that use item i and n is the total number of users in the database
      0.1  0.2  0    0.4  0    0.4  0
      0.1  0.2  0.6  0    0.8  0    0
      0.1  0.2  0    0.4  0    0.4  0
      0.1  0.2  0.6  0.4  0.8  0.4  0
      0    0.2  0.6  0.4  0    0.4  0.2
      0    0.2  0    0.4  0    0    0.2
      Cosine Similarity(Coen, Menno)
      The fact that you both use Textmate tells you more than when you both use Firefox
  30. Weighted usage matrix (as on slide 29): Cosine Similarity(Coen, Menno)
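The inverse user frequency weighting on slides 29 and 30 can be sketched as below: each item column is multiplied by log(n/ni) before the user vectors are compared. The three-user matrix is made-up illustration data, and the function names are hypothetical; only the log(n/ni) weight and the use of cosine similarity come from the slides.

```python
import math


def inverse_user_frequency(usage):
    """Weight per item: log(n / n_i), with n users in total and n_i users of item i."""
    n = len(usage)
    weights = []
    for j in range(len(usage[0])):
        n_i = sum(1 for row in usage if row[j])
        weights.append(math.log(n / n_i) if n_i else 0.0)
    return weights


def cosine_similarity(a, b):
    """Cosine of the angle between two equally long vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def weighted_user_similarity(usage, u, v):
    """Cosine similarity of users u and v after inverse user frequency weighting."""
    weights = inverse_user_frequency(usage)
    a = [x * w for x, w in zip(usage[u], weights)]
    b = [x * w for x, w in zip(usage[v], weights)]
    return cosine_similarity(a, b)


# Made-up data: everyone uses item 0, two users share item 1, one user has item 2.
EXAMPLE = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
]

plain = cosine_similarity(EXAMPLE[0], EXAMPLE[2])
weighted = weighted_user_similarity(EXAMPLE, 0, 2)
```

In this example users 0 and 2 overlap only on the item everybody uses. Without weighting they look half similar; with the weighting that shared item gets weight log(3/3) = 0 and no longer counts as evidence, which is the Textmate versus Firefox point made on slide 29.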
  31. User-user correlation matrix
      1    0.8  0.6  0.5  0.7  0.2
      0.8  1    0.4  0.7  0.5  0.5
      0.6  0.4  1    0.4  0.9  0.1
      0.5  0.8  0.4  1    0.6  0.4
      0.8  0.5  0.9  0.6  1    0.2
      0.2  0.5  0.1  0.4  0.2  1
  32. Performance: measure for success
      • Cross-validation: Train-Test split (80-20)
      • Precision and Recall:
        - precision = size(hit set) / size(total given recs)
        - recall = size(hit set) / size(test set)
      • Root mean squared error (RMSE)
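The measures on slide 32 can be written down directly. A minimal sketch, assuming the "hit set" is the overlap between the recommendations given to a user and that user's held-out test items; the item names in the example are made up:

```python
import math


def precision_recall(recommended, test_items):
    """precision = |hits| / |recommended|, recall = |hits| / |test set|."""
    recommended = set(recommended)
    test_items = set(test_items)
    hits = recommended & test_items
    precision = len(hits) / len(recommended) if recommended else 0.0
    recall = len(hits) / len(test_items) if test_items else 0.0
    return precision, recall


def rmse(predicted, actual):
    """Root mean squared error between two equally long, non-empty lists of scores."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Four recommendations, of which two appear in the user's held-out 20%.
precision, recall = precision_recall(["a", "b", "c", "d"], ["b", "d", "e"])
```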
  33. Implementation
      • Ruby Enterprise Edition (garbage collection)
      • MySQL database
      • Built our own C libraries
      • Amazon EC2:
        - Low cost
        - Flexibility
        - Ease of use
      • Open source
  34. Future challenges
      • What is the best algorithm for Wakoopa? (or you)
      • Reducing space-time complexity (scalability):
        - Parallelization (Clojure)
        - Distributed computing (Hadoop)
  35. 1 evening, 3 speakers, 100 developers: www.recked.org