Thesis presentation


Online recommendations
using matrix factorisation


           Marcus Ljungblad
                        marcus@tuenti.com


        Royal Institute of Technology, Stockholm, Sweden
            Instituto Superior Técnico, Lisbon, Portugal
       Universitat Politécnica de Catalunya, Barcelona, Spain
40+ million videos

13+ million users
500 requests/second

                    years
            0 6
3 reasons:
   - find good content
   - improve user experience
   - increase revenue

               great!
3 problems
1: the data
2: the model
3: the system


Why so little systems research?
1 problem
Question:
  How do you serve
  recommendations
  from millions of items
  to millions of users?
Video ratings (rows: users, columns: videos, ?: unknown rating)

        2   4   4   ?   1
        3   5   ?   ?   1
        ?   4   2   1   ?
        1   ?   1   3   3
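import numpy

def calculate_mean_squared_error_of_estimate(MatrixToFactorise, UsersPreferences,
                                             MoviesFeatures, NumberOfLatentFeatures,
                                             RegularizationConstant):
    # Helper not shown on the slide; assumed here to be the mean squared error
    # over the known (> 0) ratings, without a regularisation penalty.
    Ratings = numpy.asarray(MatrixToFactorise, dtype=float)
    Known = Ratings > 0
    Estimate = numpy.dot(UsersPreferences, MoviesFeatures)
    return numpy.mean((Ratings[Known] - Estimate[Known]) ** 2)

def matrix_factorization(MatrixToFactorise, UsersPreferences, MoviesFeatures,
                         NumberOfLatentFeatures, MaxSteps=5000,
                         LearningRate=0.0002, RegularizationConstant=0.02):
    # Transpose to features x movies so that column `movie` holds its features.
    MoviesFeatures = MoviesFeatures.T

    for step in range(MaxSteps):
        for user in range(len(MatrixToFactorise)):
            for movie in range(len(MatrixToFactorise[user])):
                # Only known ratings (> 0) drive the updates.
                if MatrixToFactorise[user][movie] > 0:
                    # Despite its name, this is the prediction error for the pair.
                    estimatedUserMovieFactors = MatrixToFactorise[user][movie] - \
                        numpy.dot(UsersPreferences[user, :], MoviesFeatures[:, movie])
                    for feature in range(NumberOfLatentFeatures):
                        UsersPreferences[user][feature] = UsersPreferences[user][feature] + \
                            LearningRate * (2 * estimatedUserMovieFactors *
                                            MoviesFeatures[feature][movie] -
                                            RegularizationConstant * UsersPreferences[user][feature])
                        MoviesFeatures[feature][movie] = MoviesFeatures[feature][movie] + \
                            LearningRate * (2 * estimatedUserMovieFactors *
                                            UsersPreferences[user][feature] -
                                            RegularizationConstant * MoviesFeatures[feature][movie])

        # if approximation is good enough, stop iterating
        ApproximationError = calculate_mean_squared_error_of_estimate(MatrixToFactorise,
                                                                      UsersPreferences,
                                                                      MoviesFeatures,
                                                                      NumberOfLatentFeatures,
                                                                      RegularizationConstant)
        if ApproximationError < 0.001:
            break

    # Return value not shown on the slide; assumed to be both factor matrices.
    return UsersPreferences, MoviesFeatures.T

Sorry about the slide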
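For readers who want to try the idea on the toy ratings from the "Video ratings" slide, here is a minimal sketch. It is a vectorised variant of the same stochastic gradient descent update, not the code from the thesis: the names `factorise` and `observed_rmse`, the random starting values, the seed and the choice of 5 latent features are all assumptions made for illustration.

```python
import numpy as np

# Ratings from the "Video ratings" slide; 0 stands in for an unknown ("?") rating.
ratings = np.array([
    [2, 4, 4, 0, 1],
    [3, 5, 0, 0, 1],
    [0, 4, 2, 1, 0],
    [1, 0, 1, 3, 3],
], dtype=float)


def observed_rmse(R, users, items):
    """Root mean squared error over the known (> 0) ratings only."""
    known = R > 0
    return float(np.sqrt(np.mean((R[known] - (users @ items.T)[known]) ** 2)))


def factorise(R, k=5, steps=5000, lr=0.0002, reg=0.02, seed=0):
    """SGD over known ratings; k, seed and the random start are assumptions."""
    rng = np.random.default_rng(seed)
    users = rng.random((R.shape[0], k))   # one row of k preferences per user
    items = rng.random((R.shape[1], k))   # one row of k features per video
    start_error = observed_rmse(R, users, items)
    known_pairs = np.argwhere(R > 0)
    for _ in range(steps):
        for u, i in known_pairs:
            err = R[u, i] - users[u] @ items[i]
            user_before = users[u].copy()
            # Both updates use the values from before this step.
            users[u] += lr * (2 * err * items[i] - reg * users[u])
            items[i] += lr * (2 * err * user_before - reg * items[i])
    return users, items, start_error


users, items, start_error = factorise(ratings)
final_error = observed_rmse(ratings, users, items)
print(f"RMSE on known ratings: {start_error:.2f} -> {final_error:.2f}")
```

The product `users @ items.T` then gives an estimate for every cell, including the ones marked "?".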
[ 0.38  0.91  0.32  0.36  1.22]
[ 1.52 -0.07  0.66  0.76  0.79]
[ 0.79  0.63  0.08  0.9   1.46]
[ 0.56  0.58  0.16  0.43  1.28]
[-0.15  0.7   0.87  1.45 -0.3 ]

[ 0.72  0.98  0.98  1.28  1.75]
[ 1.54 -0.19  0.81  0.61  0.72]
[ 0.22  0.61  0.95  1.18 -0.09]
[-0.13  0.76  0.97  1.04 -0.26]
[ 2.05   3.97   3.96   2.12   1.01]
[ 2.93   5.02   3.21   1.61   0.98]
[ 2.15   3.95   2.01   1.05   1.1 ]
[ 1.     4.29   1.01   2.96   2.98]



  2       4      4      ?      1
  3       5      ?      ?      1
  ?       4      2      1      ?
  1       ?      1      3      3
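To make the comparison concrete, the sketch below lines up the estimated matrix with the original ratings, using only the numbers printed on the slides. It checks how close the estimates are on the known ratings and then picks, per user, the unrated video with the highest estimate. The variable names are illustrative, and "recommend the highest estimate" is an assumed reading of how the filled-in matrix would be used.

```python
import numpy as np

nan = np.nan  # marks the "?" cells

original = np.array([
    [2,   4,   4,   nan, 1],
    [3,   5,   nan, nan, 1],
    [nan, 4,   2,   1,   nan],
    [1,   nan, 1,   3,   3],
])

predicted = np.array([
    [2.05, 3.97, 3.96, 2.12, 1.01],
    [2.93, 5.02, 3.21, 1.61, 0.98],
    [2.15, 3.95, 2.01, 1.05, 1.10],
    [1.00, 4.29, 1.01, 2.96, 2.98],
])

known = ~np.isnan(original)

# Largest deviation between estimate and rating on the known cells.
max_abs_error = float(np.abs(predicted[known] - original[known]).max())

# Hide videos the user has already rated, then take the best remaining one.
candidates = np.where(known, -np.inf, predicted)
best_unrated = candidates.argmax(axis=1).tolist()

print(round(max_abs_error, 2))  # 0.07
print(best_unrated)             # [3, 2, 0, 1]
```

Video indices are zero-based, so user 3 (the last row) would be offered video 1, with an estimated rating of 4.29.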
13x40 million ratings
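A quick back-of-the-envelope calculation shows why the full matrix is a problem. It assumes the figure means roughly 13 million users times 40 million videos, 4 bytes per stored value, and 5 latent features as in the toy example; the last two are assumptions, not numbers from the thesis.

```python
USERS = 13_000_000      # "13+ million users"
VIDEOS = 40_000_000     # "40+ million videos"
BYTES_PER_VALUE = 4     # assumption: 32-bit floats
LATENT_FEATURES = 5     # assumption: same width as the toy example

# Every user/video pair is one cell of the full ratings matrix.
cells = USERS * VIDEOS
dense_bytes = cells * BYTES_PER_VALUE

# The two factor matrices only need one row per user and one per video.
factor_bytes = (USERS + VIDEOS) * LATENT_FEATURES * BYTES_PER_VALUE

print(f"cells in the full matrix: {cells:.2e}")              # 5.20e+14
print(f"dense storage: {dense_bytes / 1e15:.2f} PB")         # 2.08 PB
print(f"two factor matrices: {factor_bytes / 1e9:.2f} GB")   # 1.06 GB
```

Under these assumptions the factors fit in the memory of a single machine, while the full matrix does not come close.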
Clustering
Millions of items




[Figure: clusters 1, 2 and 3]
[Figure: Recommendation Request with clusters 1, 2 and 3]
Compass = last video
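The slides do not say how items are assigned to clusters or how a request finds its cluster. One plausible reading, sketched below, is nearest-centroid assignment: every item vector goes to the closest centroid, and a request is routed using the vector of the compass (the last video watched). The centroids, item vectors and the function name `nearest_cluster` are invented for illustration.

```python
import numpy as np


def nearest_cluster(vector, centroids):
    """Index of the centroid closest to `vector` (Euclidean distance)."""
    distances = np.linalg.norm(centroids - vector, axis=1)
    return int(distances.argmin())


# Hypothetical 2-feature centroids for clusters 0, 1 and 2.
centroids = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])

# Hypothetical item vectors (in practice: rows of the video factor matrix).
item_vectors = {
    "a": [0.5, 0.2],
    "b": [4.8, 5.1],
    "c": [9.0, 1.0],
    "d": [5.5, 4.0],
}

# Assign each item to its nearest cluster.
clusters = {}
for item, vec in item_vectors.items():
    clusters.setdefault(nearest_cluster(np.array(vec), centroids), []).append(item)

# Route a request by the vector of the last video the user watched.
compass = np.array([5.2, 4.9])
target = nearest_cluster(compass, centroids)

print(clusters)           # {0: ['a'], 1: ['b', 'd'], 2: ['c']}
print(clusters[target])   # ['b', 'd']
```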
Interface     Delegate        Router          Workers (x3)
request
              start
                              route
                                              compute
                                              reply top N
                              merge / sort
              top N to json
start
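The flow above ends with workers replying their top N, a merge / sort step and "top N to json". A minimal sketch of that scatter-gather pattern follows, assuming each worker scores only its own share of the items. The function names, item ids, scores and the JSON field name are made up for illustration and are not taken from the thesis.

```python
import heapq
import itertools
import json


def worker_top_n(scores, n):
    """One worker's n best (score, item) pairs, best first."""
    return heapq.nlargest(n, ((score, item) for item, score in scores.items()))


def merge_top_n(partial_results, n):
    """Merge per-worker lists (each sorted best first) into a global top n."""
    merged = heapq.merge(*partial_results, reverse=True)
    return [item for _, item in itertools.islice(merged, n)]


# Hypothetical scores held by three workers.
shards = [
    {"v1": 0.9, "v2": 0.1, "v3": 0.4},
    {"v4": 0.8, "v5": 0.7},
    {"v6": 0.95, "v7": 0.2},
]

N = 3
partials = [worker_top_n(shard, N) for shard in shards]   # compute + reply top N
top = merge_top_n(partials, N)                            # merge / sort
payload = json.dumps({"recommendations": top})            # top N to json

print(payload)  # {"recommendations": ["v6", "v1", "v4"]}
```

Each worker has to return as many candidates as the final list needs, since all of the global top N could sit on one worker.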
Did it work?
Results
   - ~600 requests per second
   - latency below 30 ms
   - quality is ok
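As a rough sanity check on these two numbers, Little's law (average requests in the system = arrival rate times time spent in the system) gives an upper bound on how many requests are being processed at once. Treating 30 ms as the time in the system is an assumption, since the slide only says latency stays below that.

```python
REQUESTS_PER_SECOND = 600   # "~600 requests per second"
LATENCY_MS = 30             # "latency below 30 ms", used as an upper bound

# Little's law: average number in the system = arrival rate x time in system.
requests_in_flight = REQUESTS_PER_SECOND * LATENCY_MS / 1000

print(requests_in_flight)  # 18.0
```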
Results: Throughput
Huh?
Results: Quality

     Queries Non-zero   MAP
     1       41         23%
     2       87         25%
     3       116        36%
     4       165        58%
     5       196        74%
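MAP here presumably stands for mean average precision, the usual ranking metric with that abbreviation; the slide does not define it or describe the evaluation protocol. The sketch below shows the standard calculation on made-up data, with illustrative function names, and does not reproduce the numbers in the table.

```python
def average_precision(ranked_items, relevant):
    """Average of precision@rank taken at each rank where a relevant item appears."""
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_items, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(queries):
    """Mean of the average precision over (ranked_items, relevant) pairs."""
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)


# Hypothetical recommendation lists and the items each user actually liked.
queries = [
    (["a", "b", "c", "d"], {"a", "c"}),   # hits at ranks 1 and 3
    (["x", "y"], {"y"}),                  # hit at rank 2
]

ap_first = average_precision(*queries[0])    # (1/1 + 2/3) / 2
ap_second = average_precision(*queries[1])   # (1/2) / 1
map_score = mean_average_precision(queries)

print(f"{ap_first:.3f} {ap_second:.3f} {map_score:.3f}")  # 0.833 0.500 0.667
```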
Summary
  - clustering is data
  - balanced clusters needed
  - scale is ok
Photos and imagery used in the presentation (except graphs and logos).

Amazon recommendations: http://pleated-jeans.com/2010/08/06/amazon-recommendations-for-characters-from-the-office/
Pile of books: http://www.paper-pills.com/category/gewgaws/page/2/
Function: http://en.wikipedia.org/wiki/File:Graph_of_example_function.svg
Server: http://arstechnica.com/gadgets/2007/08/windows-home-server-system-specs-prices-and-launch-date-leaked/
Tick: http://ia.wikipedia.org/wiki/File:Tick_green_modern.svg
Phone: http://www.foxbusiness.com/technology/2012/05/22/are-carrier-subsidies-hurting-innovation-and-driving-up-mobile-phone-costs/
Man in front of computer: http://honesttogawd.blogspot.com.es/

Thesis-presentation: Tuenti Engineering
