Online recommendations at scale using matrix factorisation

This presentation was used for my thesis defense at Universitat Politècnica de Catalunya, Barcelona, Spain, as part of a double-degree master's programme in Distributed Computing. The other two universities participating in the programme are the Royal Institute of Technology, Stockholm, Sweden, and Instituto Superior Técnico, Lisbon, Portugal.

Published in: Technology, Business

Online recommendations at scale using matrix factorisation

  1. Thesis presentation: Online recommendations at scale with matrix factorisation. Marcus Ljungblad, marcus@ljungblad.nu. 22 June 2012. Royal Institute of Technology, Stockholm, Sweden; Instituto Superior Técnico, Lisbon, Portugal; Universitat Politècnica de Catalunya, Barcelona, Spain.
  2. "75% of the 30 million daily movie starts are sourced from recommendations."
  3. "a key differentiating factor"
  4. 3 challenges
  5. How do you serve recommendations from millions of items to millions of users online?
  6. Video ratings (rows: users; ? marks a missing rating)
       2  4  4  ?  1
       3  5  ?  ?  1
       ?  4  2  1  ?
       1  ?  1  3  3
  7. f( )
  8. Video ratings (rows: users)
       2.05  3.97  3.96  2.12  1.01
       2.93  5.02  3.21  1.61  0.98
       2.15  3.95  2.01  1.05  1.10
       1.00  4.29  1.01  2.96  2.98
  9. Video ratings (rows: users; left: values from slide 8, right: values from slide 6)
       2.05  3.97  3.96  2.12  1.01   |   2  4  4  ?  1
       2.93  5.02  3.21  1.61  0.98   |   3  5  ?  ?  1
       2.15  3.95  2.01  1.05  1.10   |   ?  4  2  1  ?
       1.00  4.29  1.01  2.96  2.98   |   1  ?  1  3  3
  10.
       2.05  3.97  3.96  2.12  1.01
       2.93  5.02  3.21  1.61  0.98
       2.15  3.95  2.01  1.05  1.10
       1.00  4.29  1.01  2.96  2.98
  11. 13 x 40 MILLION RATINGS
  12. Components: Interface, Delegate, Router, Worker. Steps: request, start, route, compute top-N, merge, to JSON, reply.
  13. Components: Interface, Delegate, Router, Worker. Steps: request, start, route, compute top-N, merge, to JSON, reply.
  14. Did it work?
  15. Setup: • 1-3 machines • 1 million items • same rack = high-speed • 1 test machine
  16. Performance!
  17. Performance! Huh?!
  18. Did it work? Well
  19. 74% (offline) = 74% (online)
  20. Summary: clustering depends on data; need balanced clusters; memory bound; scales OK.
  21. Thank you!
  22. Photos and pictures borrowed from the Internetz:
       Iron Maiden cover: http://en.wikipedia.org/wiki/File:Iron_Maiden_(album)_cover.jpg
       Cat picture: http://www.lastfm.es/group/Cats
       Coins: http://www.sxc.hu/photo/1235540
       iPhones: http://blog.bayuamus.com/2011/08/user-experience-comparison-between-htc-salsa-and-samsung-galaxy-mini/
       Amazon recommendations: http://mashable.com/2010/08/06/online-retail-facebook-data/
       TV remote: http://www.flickr.com/photos/62337512@N00/2749561795/sizes/z/in/photostream/
       Headphones: http://www.flickr.com/photos/markusschoepke/82957375/sizes/m/in/photostream/
       Function: http://en.wikipedia.org/wiki/File:Graph_of_example_function.svg
       Home servers: http://www.flickr.com/photos/fabrico/477844434/sizes/z/in/photostream/
  23. Extra material...
  24. AXYDBLSZQ (1/2) / 1
       AXYDBLSZQ (1/1) / 1
       AXYDBLSZQ (1/1 + 2/3) / 2
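
Slides 6 to 10 show a sparse user-by-video ratings matrix being turned into a dense matrix of predicted ratings. The slides do not say which factorisation algorithm the thesis uses, so the following is only a minimal sketch, assuming plain stochastic gradient descent on two low-rank factor matrices. The function names (`factorise`, `predict`, `rmse`) and all hyperparameters (rank, learning rate, regularisation, step count) are hypothetical and chosen for illustration.

```python
import random

# Ratings from slide 6: rows are users, columns are videos, None = missing.
RATINGS = [
    [2, 4, 4, None, 1],
    [3, 5, None, None, 1],
    [None, 4, 2, 1, None],
    [1, None, 1, 3, 3],
]


def factorise(ratings, rank=2, steps=5000, lr=0.01, reg=0.02, seed=0):
    """Learn user factors P and item factors Q so that P[u] . Q[i] ~ rating.

    Assumed method: SGD over the known ratings with L2 regularisation.
    """
    rng = random.Random(seed)
    n_users, n_items = len(ratings), len(ratings[0])
    P = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_items)]
    known = [
        (u, i, r)
        for u, row in enumerate(ratings)
        for i, r in enumerate(row)
        if r is not None
    ]
    for _ in range(steps):
        for u, i, r in known:
            err = r - sum(P[u][k] * Q[i][k] for k in range(rank))
            for k in range(rank):
                pu, qi = P[u][k], Q[i][k]
                # Gradient step on the squared error plus the L2 penalty.
                P[u][k] += lr * (err * qi - reg * pu)
                Q[i][k] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q):
    """Dense matrix of predicted ratings, one row per user."""
    return [[sum(pk * qk for pk, qk in zip(p, q)) for q in Q] for p in P]


def rmse(ratings, predicted):
    """Root mean squared error over the known ratings only."""
    errors = [
        (r - predicted[u][i]) ** 2
        for u, row in enumerate(ratings)
        for i, r in enumerate(row)
        if r is not None
    ]
    return (sum(errors) / len(errors)) ** 0.5


if __name__ == "__main__":
    P, Q = factorise(RATINGS)
    for row in predict(P, Q):
        print(" ".join(f"{value:.2f}" for value in row))
```

The printed matrix plays the role of slide 8: every cell is filled, including those that were missing on slide 6. The exact numbers depend on the assumed hyperparameters and will not match the slide.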

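Slide 12 names the steps of a request: it is routed to workers, each worker computes a top-N, the partial results are merged, and the reply is serialised to JSON. The slides give no implementation details, so the sketch below is an assumption about how such a flow could look: each worker scores only its own partition of item vectors against the user vector, and the merge step picks the global top-N from the local lists. All function names and the JSON layout are hypothetical.

```python
import heapq
import json


def dot(a, b):
    """Dot product of two equally long vectors."""
    return sum(x * y for x, y in zip(a, b))


def worker_top_n(user_vector, item_vectors, n):
    """Score one worker's partition; return its local top-N as (score, item_id)."""
    scored = (
        (dot(user_vector, vector), item_id)
        for item_id, vector in item_vectors.items()
    )
    return heapq.nlargest(n, scored)


def merge_top_n(partial_results, n):
    """Merge the workers' local top-N lists into one global top-N.

    Any item in the global top-N is also in its own partition's local
    top-N, so merging the local lists loses nothing.
    """
    return heapq.nlargest(n, (pair for part in partial_results for pair in part))


def recommend(user_vector, partitions, n):
    """Run every (simulated) worker, merge, and serialise the reply as JSON."""
    partials = [worker_top_n(user_vector, part, n) for part in partitions]
    top = merge_top_n(partials, n)
    return json.dumps([{"item": item_id, "score": score} for score, item_id in top])


if __name__ == "__main__":
    # Two hypothetical partitions, as if held by two workers.
    partitions = [
        {"a": [1.0, 0.0], "b": [0.0, 1.0]},
        {"c": [2.0, 2.0], "d": [0.5, 0.5]},
    ]
    print(recommend([1.0, 2.0], partitions, n=2))
```

In this sketch the workers run sequentially in one process. In a distributed setting, each `worker_top_n` call would run on the machine that holds that partition, and only the short local lists would cross the network.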