Upcoming SlideShare
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Standard text messaging rates apply

Thesis-presentation: Tuenti Engineering

328

Published on

0 Likes
Statistics
Notes
• Full Name
Comment goes here.

Are you sure you want to Yes No
• Be the first to comment

• Be the first to like this

Views
Total Views
328
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
7
0
Likes
0
Embeds 0
No embeds

No notes for slide

Transcript

• 1. Thesis presentationOnline recommendationsusing matrix factorisation Marcus Ljungblad marcus@tuenti.com Royal Institute of Technology, Stockholm, Sweden Instituto Superior Técnico, Lisbon, Portugal Universitat Politécnica de Catalunya, Barcelona, Spain
• 2. 40+ million videos13+ million users500 requests/second y ea rs 0 6
• 3. 3 reasons: - find good content - improve user experience - increase revenue great!
• 4. 3 problems
• 5. 1: the data
• 6. 2: the model
• 7. 2: the model
• 8. 3: the system hy so littleW esearch?syste ms r
• 9. 3: the system
• 10. 3 1 problem
• 11. Question: How do you serve recommendations from millions of items to millions of users?
• 12. Video ratings 2 4 4 ? 1 3 5 ? ? 1Users ? 4 2 1 ? 1 ? 1 3 3
• 13. def matrix_factorization(MatrixToFactorise, UsersPreferences, MoviesFeatures, NumberOfLatentFeatures, MaxSteps=5000, LearningRate=0.0002, RegularizationConstant=0.02): MoviesFeatures = MoviesFeatures.T for step in xrange(MaxSteps): for user in xrange(len(MatrixToFactorise)): for movie in xrange(len(MatrixToFactorise[user])): if MatrixToFactorise[user][movie] > 0: estimatedUserMovieFactors = MatrixToFactorise[user][movie] - numpy.dot(UsersPreferences[user,:], MoviesFeatures[:,movie]) for feature in xrange(NumberOfLatentFeatures): UsersPreferences[user][feature] = UsersPreferences[user][feature] + LearningRate * (2 * estimatedUserMovieFactors * MoviesFeatures[feature][movie] - RegularizationConstant * UsersPreferences[user][feature]) MoviesFeatures[feature][movie] = MoviesFeatures[feature][movie] + LearningRate * (2 * estimatedUserMovieFactors * UsersPreferences[user][feature] - RegularizationConstant * MoviesFeatures[feature][movie]) # if approximation is good enough, stop iterating ApproximationError = calculate_mean_squared_error_of_estimate(MatrixToFactorise, UsersPreferences, MoviesFeatures, out NumberOfLatentFeatures, ab RegularizationConstant) orry if ApproximationError < 0.001: break S the slide
• 14. [ 0.38 0.91 0.32 0.36 1.22][ 1.52 -0.07 0.66 0.76 0.79] [ 0.72 0.98 0.98 1.28 1.75][ 0.79 0.63 0.08 0.9 1.46] [ 1.54 -0.19 0.81 0.61 0.72][ 0.56 0.58 0.16 0.43 1.28] [ 0.22 0.61 0.95 1.18 -0.09][-0.15 0.7 0.87 1.45 -0.3] [-0.13 0.76 0.97 1.04 -0.26]
• 15. [ 2.05 3.97 3.96 2.12 1.01][ 2.93 5.02 3.21 1.61 0.98][ 2.15 3.95 2.01 1.05 1.1 ][ 1. 4.29 1.01 2.96 2.98]
• 16. [ 2.05 3.97 3.96 2.12 1.01][ 2.93 5.02 3.21 1.61 0.98][ 2.15 3.95 2.01 1.05 1.1 ][ 1. 4.29 1.01 2.96 2.98] 2 4 4 ? 1 3 5 ? ? 1 ? 4 2 1 ? 1 ? 1 3 3
• 17. [ 2.05 3.97 3.96 2.12 1.01][ 2.93 5.02 3.21 1.61 0.98][ 2.15 3.95 2.01 1.05 1.1 ][ 1. 4.29 1.01 2.96 2.98]
• 18. 1 23 4 2 42 34 5 234 5 2 24 1 13 3 56 6 23 3 45 23 6 6 2 34 51 24 43 21 1 1 123 4 65 6 14 1 2 4 61 24 6 2 1 34 56 34 2 24 5 6 8 9 65 3 2 111 23 4 2 42 34 5 234 5 2 24 1 13 3 56 6 23 3 45 23 6 6 2 34 51 24 43 21 1 1 123 4 656 14 1 2 4 61 24 6 2 1 34 5 6 34 2 24 5 6 8 9 65 3 2 111 23 4 2 42 34 5 234 5 2 24 1 13 3 56 6 23 3 45 23 6 6 234 51 24 43 21 1 1 123 4 65 6 14 1 2 4 61 24 6 2 1 34 5 6 34 2 24 5 6 8 9 65 3 2 111 23 4 2 42 34 5 234 5 2 24 113 3 56 6 23 3 45 23 6 6 2 34 51 24 43 21 1 1 123 4 65 6 14 1 2 4 61 24 6 2 1 34 5 6 34 2 24 5 6 8 9 65 3 2 11123 4 2 42 34 5 234 5 2 24 1 13 3 56 6 23 3 45 23 6 6 2 34 51 24 43 21 1 1 123 4 65 6 14 1 2 4 61 24 6 2 1 34 5 634 2 24 5 6 8 9 65 3 2 111 23 4 2 42 34 5 234 5 2 24 1 13 3 56 6 23 3 45 23 6 6 2 34 51 24 43 21 1 1 123 4 65 614 1 2 4 61 24 6 2 1 34 5 6 34 2 24 5 6 8 9 65 3 2 111 23 4 2 42 34 5 234 5 2 24 1 13 3 56 6 23 3 45 23 6 6 2 34 13x4051 24 43 21 1 1 123 4 65 6 14 1 2 4 61 24 6 2 1 34 5 6 34 2 24 5 6 8 9 65 3 2 111 23 4 2 42 34 5 234 5 2 24 1 133 56 6 23 3 45 23 6 6 2 34 51 24 43 21 1 1 123 4 65 6 14 1 2 4 61 24 6 2 1 34 5 6 34 2 24 5 6 8 9 65 3 2 111 23 42 42 34 5 234 5 2 24 1 13 3 56 6 23 3 45 23 6 6 2 34 51 24 43 21 1 1 123 4 65 6 14 1 2 4 61 24 6 2 1 34 5 6 34 224 5 6 8 9 65 3 2 111 23 4 2 42 34 5 234 5 2 24 1 13 3 56 6 23 3 45 23 6 6 2 34 51 24 43 21 1 1 123 4 65 6 14 12 4 61 24 6 2 1 34 5 6 34 2 24 5 6 8 9 65 3 2 111 23 4 2 42 34 5 234 5 2 24 1 13 3 56 6 23 3 45 23 6 6 2 34 51 2443 21 1 1 123 4 65 6 14 1 2 4 61 24 6 2 1 34 5 6 34 2 24 5 6 8 9 65 3 2 111 23 4 2 42 34 5 234 5 2 24 1 13 3 56 623 3 45 23 6 6 2 34 51 24 43 21 1 1 123 4 65 6 14 1 2 4 61 24 6 2 1 34 5 6 34 2 24 5 6 8 9 65 3 2 111 23 4 2 4234 5 234 5 2 24 1 13 3 56 6 23 3 45 23 6 6 2 34 51 24 43 21 1 1 123 4 65 6 14 1 2 4 61 24 6 2 1 34 5 6 34 2 24 56 8 9 65 3 2 111 23 4 2 42 34 5 234 5 2 24 1 13 3 56 6 23 3 45 23 6 6 2 34 51 24 43 21 1 1 123 4 65 6 14 1 2 4 million ratings61 24 6 2 1 34 5 6 34 2 24 5 6 8 9 65 3 2 111 23 4 2 42 34 5 234 5 2 24 1 13 3 56 6 23 3 45 23 6 6 2 34 51 24 4321 1 1 123 4 65 6 14 1 2 4 61 24 6 2 1 34 5 6 34 2 24 5 6 8 9 65 3 2 111 23 4 2 42 34 5 234 5 2 24 1 13 3 56 6 233 45 23 6 6 2 34 51 24 43 21 1 1 123 4 65 6 14 1 2 4 61 24 6 2 1 34 5 6 34 2 24 5 6 8 9 65 3 2 111 23 4 2 42 34 5234 5 2 24 1 13 3 56 6 23 3 45 23 6 6 2 34 51 24 43 21 1 1 123 4 65 6 14 1 2 4 61 24 6 2 1 34 5 6 34 2 24 5 6 8 965 3 2 111 23 4 2 42 34 5 234 5 2 24 1 13 3 56 6 23 3 45 23 6 6 2 34 51 24 43 21 1 1 123 4 65 6 14 1 2 4 61 24 62 1 34 5 6 34 2 24 5 6 8 9 65 3 2 111 23 4 2 42 34 5 234 5 2 24 1 13 3 56 6 23 3 45 23 6 6 2 34 51 24 43 21 1 1123 4 65 6 14 1 2 4 61 24 6 2 1 34 5 6 34 2 24 5 6 8 9 65 3 2 111 23 4 2 42 34 5 234 5 2 24 1 13 3 56 6 23 3 4523 6 6 2 34 51 24 43 21 1 1 123 4 65 6 14 1 2 4 61 24 6 2 1 34 5 6 34 2 24 5 6 8 9 65 3 2 111 23 4 2 42 34 5 2345 2 24 1 13 3 56 6 23 3 45 23 6 6 2 34 51 24 43 21 1 1 123 4 65 6 14 1 2 4 61 24 6 2 1 34 5 6 34 2 24 5 6 8 9 653 2 111 23 4 2 42 34 5 234 5 2 24 1 13 3 56 6 23 3 45 23 6 6 2 34 51 24 43 21 1 1 123 4 65 6 14 1 2 4 61 24 6 21 34 5 6 34 2 24 5 6 8 9 65 3 2 111 23 4 2 42 34 5 234 5 2 24 1 13 3 56 6 23 3 45 23 6 6 2 34 51 24 43 21 1 1123 4 65 6 14 1 2 4 61 24 6 2 1 34 5 6 34 2 24 5 6 8 9 65 3 2 111 23 4 2 42 34 5 234 5 2 24 1 13 3 56 6 23 3 4523 6 6 2 34 51 24 43 21 1 1 123 4 65 6 14 1 2 4 61 24 6 2 1 34 5 6 34 2 24 5 6 8 9 65 3 2 111 23 4 2 42 34 5 2345 2 24 1 13 3 56 6 23 3 45 23 6 6 2 34 51 24 43 21 1 1 123 4 65 6 14 1 2 4 61 24 6 2 1 34 5 6 34 2 24 5 6 8 9 653 2 11
• 19. r in g l us teC
• 20. Millions of items1 2 3
• 21. 1 2 3
• 22. Recommendation Request1 2 3
• 23. Recommendation Request1 2 3
• 24. Compass = last video
• 25. Interface Delegate Router Workers Workers Workersrequest start route compute reply top N merge / sort top N to json start
• 26. Did it work?
• 27. Results - ~600 requests per second - latency below 30 ms - quality is ok
• 28. Results: Throughput
• 29. Results: Throughput h uh?
• 30. Interface Delegate Router Workers Workers Workersrequest start route compute reply top N merge / sort top N to json start
• 31. Results: Quality Queries Non-zero MAP 1 41 23% 2 87 25% 3 116 36% 4 165 58% 5 196 74%
• 32. Summary - clustering is data - balanced clusters needed - scale is ok
• 33. Photos and imagery used in the presentation (except graphs and logos).Amazon recommendations: http://pleated-jeans.com/2010/08/06/amazon-recommendations-for-characters-from-the-office/Pile of books: http://www.paper-pills.com/category/gewgaws/page/2/Function: http://en.wikipedia.org/wiki/File:Graph_of_example_function.svgServer: http://arstechnica.com/gadgets/2007/08/windows-home-server-system-specs-prices-and-launch-date-leaked/Tick: http://ia.wikipedia.org/wiki/File:Tick_green_modern.svgPhone: http://www.foxbusiness.com/technology/2012/05/22/are-carrier-subsidies-hurting-innovation-and-driving-up-mobile-phone-costs/Man in front of computer: http://honesttogawd.blogspot.com.es/