Thesis-presentation: Tuenti Engineering
1. Thesis presentation: Online recommendations using matrix factorisation
   Marcus Ljungblad, marcus@tuenti.com
   Royal Institute of Technology, Stockholm, Sweden
   Instituto Superior Técnico, Lisbon, Portugal
   Universitat Politècnica de Catalunya, Barcelona, Spain
2. 40+ million videos, 13+ million users, 500 requests/second
3. 3 reasons:
   - find good content
   - improve user experience
   - increase revenue
   Great!
4. 3 problems
5. 1: the data
6-7. 2: the model
8. 3: the system. Why so little systems research?
9. 3: the system
10. 3 → 1 problem
11. Question: How do you serve recommendations from millions of items to millions of users?
12. Video ratings (rows: users, columns: videos; ? = unknown rating)
             V1  V2  V3  V4  V5
    User 1    2   4   4   ?   1
    User 2    3   5   ?   ?   1
    User 3    ?   4   2   1   ?
    User 4    1   ?   1   3   3
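13. The matrix_factorization code ("Sorry about the slide"):

    import numpy


    def calculate_mean_squared_error_of_estimate(MatrixToFactorise, UsersPreferences, MoviesFeatures):
        # Assumed definition (the helper is not shown on the slide): mean squared
        # error over the observed (> 0) ratings. MoviesFeatures is features x movies.
        observed = MatrixToFactorise > 0
        residuals = (MatrixToFactorise - numpy.dot(UsersPreferences, MoviesFeatures))[observed]
        return numpy.mean(residuals ** 2)


    def matrix_factorization(MatrixToFactorise, UsersPreferences, MoviesFeatures,
                             NumberOfLatentFeatures, MaxSteps=5000, LearningRate=0.0002,
                             RegularizationConstant=0.02):
        # All arguments are numpy arrays; the factor matrices must be float so the
        # in-place updates are not truncated. A 0 in MatrixToFactorise means "unknown".
        MoviesFeatures = MoviesFeatures.T  # view: features x movies
        for step in range(MaxSteps):
            for user in range(len(MatrixToFactorise)):
                for movie in range(len(MatrixToFactorise[user])):
                    if MatrixToFactorise[user][movie] > 0:
                        # Prediction error for this (user, movie) pair.
                        estimatedUserMovieFactors = MatrixToFactorise[user][movie] - numpy.dot(
                            UsersPreferences[user, :], MoviesFeatures[:, movie])
                        for feature in range(NumberOfLatentFeatures):
                            # Keep the old value so both updates use the same point.
                            oldPreference = UsersPreferences[user][feature]
                            UsersPreferences[user][feature] = oldPreference + LearningRate * (
                                2 * estimatedUserMovieFactors * MoviesFeatures[feature][movie]
                                - RegularizationConstant * oldPreference)
                            MoviesFeatures[feature][movie] = MoviesFeatures[feature][movie] + LearningRate * (
                                2 * estimatedUserMovieFactors * oldPreference
                                - RegularizationConstant * MoviesFeatures[feature][movie])
            # if approximation is good enough, stop iterating
            ApproximationError = calculate_mean_squared_error_of_estimate(
                MatrixToFactorise, UsersPreferences, MoviesFeatures)
            if ApproximationError < 0.001:
                break
        return UsersPreferences, MoviesFeatures.T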
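To see the update rule converge, it can be run on the 4 × 5 matrix from slide 12. The sketch below is a plain-Python (no NumPy) version of the same per-rating SGD update, with `?` stored as 0. The number of latent features (k=3), the learning rate (0.01), the step count, the seed and the uniform random initialisation are assumptions chosen so the toy converges quickly; they are not the thesis's settings.

```python
import math
import random

# Slide 12's ratings (4 users x 5 videos); "?" (unknown) is stored as 0,
# matching the "> 0 means observed" convention of the slide's code.
RATINGS = [
    [2, 4, 4, 0, 1],
    [3, 5, 0, 0, 1],
    [0, 4, 2, 1, 0],
    [1, 0, 1, 3, 3],
]


def factorise(ratings, k=3, steps=5000, lr=0.01, reg=0.02, seed=0):
    """SGD matrix factorisation, ratings ~= P * Q^T, fitted on observed cells only.

    k, steps, lr, reg and seed are illustrative assumptions, not the thesis's values.
    """
    rng = random.Random(seed)
    n_users, n_items = len(ratings), len(ratings[0])
    P = [[rng.random() for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.random() for _ in range(k)] for _ in range(n_items)]
    observed = [(u, i) for u in range(n_users) for i in range(n_items) if ratings[u][i] > 0]
    for _ in range(steps):
        for u, i in observed:
            err = ratings[u][i] - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                p, q = P[u][f], Q[i][f]  # both updates use the pre-update values
                P[u][f] = p + lr * (2 * err * q - reg * p)
                Q[i][f] = q + lr * (2 * err * p - reg * q)
    return P, Q


def predict(P, Q):
    """Full predicted matrix: dot product of every user row with every item row."""
    return [[sum(a * b for a, b in zip(pu, qi)) for qi in Q] for pu in P]


def rmse_observed(ratings, predicted):
    """Root-mean-square error over the known (non-zero) ratings only."""
    sq = [(r - p) ** 2 for row_r, row_p in zip(ratings, predicted)
          for r, p in zip(row_r, row_p) if r > 0]
    return math.sqrt(sum(sq) / len(sq))


P, Q = factorise(RATINGS)
PREDICTED = predict(P, Q)
for row in PREDICTED:
    print(" ".join(f"{x:5.2f}" for x in row))
print("RMSE on known ratings:", round(rmse_observed(RATINGS, PREDICTED), 3))
```

The fitted values on known cells should end up close to the ratings; the filled-in `?` cells depend on the initialisation and need not match slide 15.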
14. Learned factor matrices (rows in the order extracted from the slide):
    [ 0.38   0.91   0.32   0.36   1.22]
    [ 1.52  -0.07   0.66   0.76   0.79]
    [ 0.72   0.98   0.98   1.28   1.75]
    [ 0.79   0.63   0.08   0.90   1.46]
    [ 1.54  -0.19   0.81   0.61   0.72]
    [ 0.56   0.58   0.16   0.43   1.28]
    [ 0.22   0.61   0.95   1.18  -0.09]
    [-0.15   0.70   0.87   1.45  -0.30]
    [-0.13   0.76   0.97   1.04  -0.26]
15. Predicted ratings (rows: users, columns: videos):
    [2.05  3.97  3.96  2.12  1.01]
    [2.93  5.02  3.21  1.61  0.98]
    [2.15  3.95  2.01  1.05  1.10]
    [1.00  4.29  1.01  2.96  2.98]
16. Predicted ratings, with the original ratings from slide 12 in parentheses (? = unknown):
    2.05 (2)  3.97 (4)  3.96 (4)  2.12 (?)  1.01 (1)
    2.93 (3)  5.02 (5)  3.21 (?)  1.61 (?)  0.98 (1)
    2.15 (?)  3.95 (4)  2.01 (2)  1.05 (1)  1.10 (?)
    1.00 (1)  4.29 (?)  1.01 (1)  2.96 (3)  2.98 (3)
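The comparison above can be read two ways: the known cells show how close the fit is, and the cells that were `?` become candidate recommendations. A small sketch using the values from slides 12 and 15, with `?` stored as 0; the helper names are hypothetical.

```python
# Slide 12 ratings (0 = unknown "?") and slide 15 predicted ratings.
RATINGS = [
    [2, 4, 4, 0, 1],
    [3, 5, 0, 0, 1],
    [0, 4, 2, 1, 0],
    [1, 0, 1, 3, 3],
]
PREDICTED = [
    [2.05, 3.97, 3.96, 2.12, 1.01],
    [2.93, 5.02, 3.21, 1.61, 0.98],
    [2.15, 3.95, 2.01, 1.05, 1.10],
    [1.00, 4.29, 1.01, 2.96, 2.98],
]


def max_error_on_known(ratings, predicted):
    """Largest absolute gap between a known rating and its prediction."""
    return max(abs(r - p) for row_r, row_p in zip(ratings, predicted)
               for r, p in zip(row_r, row_p) if r > 0)


def recommend_unseen(ratings, predicted, n=1):
    """For each user, the n unrated videos (0-based column indices) with the highest prediction."""
    result = []
    for row_r, row_p in zip(ratings, predicted):
        unseen = [i for i, r in enumerate(row_r) if r == 0]
        result.append(sorted(unseen, key=lambda i: row_p[i], reverse=True)[:n])
    return result


print(round(max_error_on_known(RATINGS, PREDICTED), 2))  # → 0.07
print(recommend_unseen(RATINGS, PREDICTED))              # → [[3], [2], [0], [1]]
```

Indices are 0-based, so user 1's best unseen video is V4 (2.12) and user 4's is V2 (4.29).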
17. Predicted ratings (same matrix as slide 15)
18. 13x40 million ratings (slide filled with example rating values)
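Back-of-the-envelope arithmetic with the figures from slide 2 (at least 13 million users, 40 million videos and 500 requests/second) shows why a dense rating matrix is out of the question and why scoring every video on every request is expensive; k, the number of latent features, is left symbolic:

```latex
\begin{aligned}
\text{dense matrix cells} &= (13\times10^{6})(40\times10^{6}) = 5.2\times10^{14}\\
\text{factor-matrix parameters} &= (13\times10^{6} + 40\times10^{6})\,k = 5.3\times10^{7}\,k\\
\text{multiply-adds to score all videos, per request} &= 40\times10^{6}\,k = 4\times10^{7}\,k\\
\text{at 500 requests/s} &= 500 \times 4\times10^{7}\,k = 2\times10^{10}\,k \ \text{per second}
\end{aligned}
```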
19. Clustering
20. Millions of items, in clusters 1, 2, 3
21. Clusters 1, 2, 3
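The slides do not name the clustering algorithm. As an illustration only, here is Lloyd's k-means over item (video) feature vectors, such as rows of a learned factor matrix. The deterministic initialisation (first k vectors), the iteration count and the toy data are assumptions.

```python
import math


def kmeans(vectors, k, iterations=20):
    """Lloyd's k-means. Centroids start at the first k vectors, a simple
    deterministic choice assumed here for reproducibility."""
    centroids = [list(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid (Euclidean distance).
        assignment = [min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors]
        # Update step: each centroid moves to the mean of its members; empty clusters keep theirs.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centroids


# Toy 2-feature item vectors forming two obvious groups (illustrative data only).
ITEMS = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0), (1.0, 0.0), (11.0, 10.0)]
labels, centres = kmeans(ITEMS, k=2)
print(labels)  # → [0, 1, 0, 1, 0, 1]
```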
22-23. Recommendation request: clusters 1, 2, 3
24. Compass = last video
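One reading of "compass = last video": the video a user watched last selects the cluster to search, so a request scores only that cluster's videos rather than all of them. A hypothetical sketch; the data layout, `item_cluster`, and scoring by dot product with the user's factor vector are assumptions, not the thesis's code.

```python
import heapq


def recommend_from_compass(user_vector, last_video, item_vectors, item_cluster, watched, n=2):
    """Hypothetical routing: only videos in the cluster of the user's last video are
    scored (dot product with the user's factor vector); returns the top-n video ids."""
    cluster = item_cluster[last_video]
    candidates = [v for v, c in item_cluster.items() if c == cluster and v not in watched]

    def score(video):
        return sum(u * x for u, x in zip(user_vector, item_vectors[video]))

    return heapq.nlargest(n, candidates, key=score)


# Illustrative data: five videos with 2 latent features, split into two clusters.
ITEM_VECTORS = {"a": (1.0, 0.0), "b": (0.9, 0.1), "e": (0.5, 0.5), "c": (0.2, 0.8), "d": (0.1, 0.9)}
ITEM_CLUSTER = {"a": 0, "b": 0, "e": 0, "c": 1, "d": 1}
print(recommend_from_compass((1.0, 0.2), "a", ITEM_VECTORS, ITEM_CLUSTER, watched={"a"}))
# → ['b', 'e']
```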
25. System architecture
    Components: Interface, Delegate, Router, Workers (×3)
    Steps: request, start, route, compute, reply top N, merge / sort top N, to JSON, start
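The diagram describes a scatter-gather request path. Each worker computes a top N over its share of the items, and the replies are merged, sorted into a global top N and converted to JSON. A minimal sketch with a thread pool standing in for the workers; the function names, the partitioning and the JSON shape are assumptions.

```python
import heapq
import json
from concurrent.futures import ThreadPoolExecutor


def worker_top_n(user_vector, partition, n):
    """One worker: score its own (video_id, vector) pairs, reply with its local top n."""
    scored = ((sum(u * x for u, x in zip(user_vector, vec)), vid) for vid, vec in partition)
    return heapq.nlargest(n, scored)


def handle_request(user_vector, partitions, n=3):
    """Delegate: fan out to one worker per partition, merge / sort the replies into a
    global top n, and serialise it to JSON."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        replies = list(pool.map(lambda part: worker_top_n(user_vector, part, n), partitions))
    # The global top n is always contained in the union of the local top-n lists.
    merged = heapq.nlargest(n, (item for reply in replies for item in reply))
    return json.dumps([{"video": vid, "score": round(score, 3)} for score, vid in merged])


# Illustrative partitions: three workers, each holding a few videos with 2 features.
PARTITIONS = [
    [("v1", (1.0, 0.0)), ("v2", (0.0, 1.0))],
    [("v3", (0.5, 0.5)), ("v4", (0.8, 0.2))],
    [("v5", (0.2, 0.2))],
]
print(handle_request((1.0, 0.5), PARTITIONS))
```

Merging only the local top-N lists keeps the delegate's work proportional to workers × N rather than to the number of items.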
26. Did it work?
27. Results:
    - ~600 requests per second
    - latency below 30 ms
    - quality is OK
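Little's law (average requests in flight = arrival rate × average time in the system) gives a rough sense of the concurrency these two figures imply, assuming a steady state and a mean latency of at most 30 ms:

```latex
L = \lambda W \le 600\,\tfrac{\text{requests}}{\text{s}} \times 0.030\,\text{s} = 18\ \text{requests in flight on average}
```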
28. Results: Throughput
29. Results: Throughput. Huh?
30. System architecture, repeated from slide 25 (Interface, Delegate, Router, Workers)
31. Results: Quality
    Queries | Non-zero | MAP
    --------+----------+-----
          1 |       41 | 23%
          2 |       87 | 25%
          3 |      116 | 36%
          4 |      165 | 58%
          5 |      196 | 74%
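MAP (mean average precision) is the mean, over queries, of each query's average precision. The slides do not spell out the exact variant used; the sketch below assumes the common definition, where AP averages precision@i over the ranks i that hold a relevant item and divides by the number of relevant items. The toy queries are made up to show the arithmetic and are not the thesis's evaluation data.

```python
def average_precision(ranked, relevant):
    """AP for one query: precision@i summed over the ranks i that hold a relevant
    item, divided by the total number of relevant items (an assumed, common variant)."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(queries):
    """MAP: the mean of AP over (ranked_list, relevant_set) pairs."""
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)


QUERIES = [
    (["a", "b", "c", "d"], {"a", "c"}),  # hits at ranks 1 and 3 -> AP = (1/1 + 2/3) / 2
    (["x", "y"], {"y"}),                 # hit at rank 2 -> AP = 1/2
]
print(round(mean_average_precision(QUERIES), 4))  # → 0.6667
```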
32. Summary:
    - clustering is data
    - balanced clusters needed
    - scale is OK
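Why balance matters under the routing on slides 22-25: if each request is served from the cluster of the user's last video, the work per request grows with that cluster's size, so one oversized cluster makes its requests slow. One simple, assumed way to quantify this is the ratio of the largest cluster to the mean cluster size, where 1.0 is perfectly balanced.

```python
from collections import Counter


def imbalance(assignment):
    """Largest cluster size divided by the mean cluster size (1.0 = perfectly balanced)."""
    sizes = Counter(assignment).values()
    return max(sizes) / (sum(sizes) / len(sizes))


print(imbalance([0, 1, 2, 0, 1, 2]))  # → 1.0
print(imbalance([0, 0, 0, 0, 1, 2]))  # → 2.0
```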
33. Photos and imagery used in the presentation (except graphs and logos):
    - Amazon recommendations: http://pleated-jeans.com/2010/08/06/amazon-recommendations-for-characters-from-the-office/
    - Pile of books: http://www.paper-pills.com/category/gewgaws/page/2/
    - Function: http://en.wikipedia.org/wiki/File:Graph_of_example_function.svg
    - Server: http://arstechnica.com/gadgets/2007/08/windows-home-server-system-specs-prices-and-launch-date-leaked/
    - Tick: http://ia.wikipedia.org/wiki/File:Tick_green_modern.svg
    - Phone: http://www.foxbusiness.com/technology/2012/05/22/are-carrier-subsidies-hurting-innovation-and-driving-up-mobile-phone-costs/
    - Man in front of computer: http://honesttogawd.blogspot.com.es/