Thesis presentation:

Online recommendations
at scale with matrix factorisation.




Royal Institute of Technology, Stockholm, Sweden               22 June 2012
Instituto Superior Técnico, Lisbon, Portugal                Marcus Ljungblad
Universitat Politècnica de Catalunya, Barcelona, Spain   marcus@ljungblad.nu
"75% of the 30 million daily movie
starts are sourced from
recommendations."
"a key differentiating factor"
3 challenges
How do you serve
recommendations
from millions of
items to millions
of users online?
Video ratings (rows: users, columns: videos, ? = unknown)

        2   4   4   ?   1
        3   5   ?   ?   1
        ?   4   2   1   ?
        1   ?   1   3   3
f( )
Video ratings (rows: users, columns: videos)

        2.05   3.97   3.96   2.12   1.01
        2.93   5.02   3.21   1.61   0.98
        2.15   3.95   2.01   1.05   1.10
        1.00   4.29   1.01   2.96   2.98
Video ratings (rows: users, columns: videos, ? = unknown)

        2.05   3.97   3.96   2.12   1.01           2   4   4   ?   1
        2.93   5.02   3.21   1.61   0.98           3   5   ?   ?   1
        2.15   3.95   2.01   1.05   1.10           ?   4   2   1   ?
        1.00   4.29   1.01   2.96   2.98           1   ?   1   3   3
2.05   3.97   3.96   2.12   1.01
2.93   5.02   3.21   1.61   0.98
2.15   3.95   2.01   1.05   1.10
1.00   4.29   1.01   2.96   2.98
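The step from the ratings with gaps to the filled-in matrix can be sketched with a small matrix factorisation. This is only an illustration: it assumes stochastic gradient descent on the observed ratings with rank-2 factors, which the slides do not confirm as the method used in the thesis. The names `factorise`, `predict` and `rmse`, and the learning rate, regularisation and step count, are made up for this example.

```python
import random

# Ratings from the slide: rows are users, columns are videos, None marks "?".
RATINGS = [
    [2, 4, 4, None, 1],
    [3, 5, None, None, 1],
    [None, 4, 2, 1, None],
    [1, None, 1, 3, 3],
]


def dot(u, v):
    """Dot product of two equally long factor vectors."""
    return sum(a * b for a, b in zip(u, v))


def rmse(ratings, user_f, item_f):
    """Root mean squared error over the observed ratings only."""
    errs = [
        (r - dot(user_f[u], item_f[i])) ** 2
        for u, row in enumerate(ratings)
        for i, r in enumerate(row)
        if r is not None
    ]
    return (sum(errs) / len(errs)) ** 0.5


def factorise(ratings, rank=2, steps=2000, lr=0.01, reg=0.02, seed=0):
    """Learn user and item factors with plain SGD (assumed method, see lead-in)."""
    rng = random.Random(seed)
    n_users, n_items = len(ratings), len(ratings[0])
    user_f = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_users)]
    item_f = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_items)]
    observed = [
        (u, i, r)
        for u, row in enumerate(ratings)
        for i, r in enumerate(row)
        if r is not None
    ]
    for _ in range(steps):
        rng.shuffle(observed)
        for u, i, r in observed:
            err = r - dot(user_f[u], item_f[i])
            for k in range(rank):
                pu, qi = user_f[u][k], item_f[i][k]
                # Gradient step on squared error with L2 regularisation.
                user_f[u][k] += lr * (err * qi - reg * pu)
                item_f[i][k] += lr * (err * pu - reg * qi)
    return user_f, item_f


def predict(user_f, item_f):
    """Full rating matrix, including the entries that were "?"."""
    return [[dot(pu, qi) for qi in item_f] for pu in user_f]


user_f, item_f = factorise(RATINGS)
for row in predict(user_f, item_f):
    print("  ".join(f"{value:.2f}" for value in row))
```

Every cell of the printed matrix is a dot product of one user vector and one item vector, so the "?" cells get a value as a side effect of fitting the known ones.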
13x40
MILLION
RATINGS
Interface             Delegate                     Router             Worker


request
                      start

                                             route

                                                                      compute




                                                     top-N




                                           merge
                      to json
  reply
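The route, compute, top-N and merge steps in the diagram can be sketched as below. This is a hedged illustration, not the thesis implementation: it assumes each worker holds a share of the item factor vectors, scores them against the user's vector with a dot product, and returns its local top-N to be merged into one list. The function names and the toy vectors are hypothetical.

```python
import heapq


def worker_top_n(user_vec, item_vecs, n):
    """Score one worker's share of the items and keep its n best (score, item) pairs."""
    scored = (
        (sum(a * b for a, b in zip(user_vec, vec)), item_id)
        for item_id, vec in item_vecs.items()
    )
    return heapq.nlargest(n, scored)


def merge_top_n(partials, n):
    """Merge the per-worker lists into one global top-n list."""
    return heapq.nlargest(n, (pair for partial in partials for pair in partial))


# Toy data (assumed): one user vector and two workers with disjoint item sets.
user_vec = [1.0, 0.5]
worker_a = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [2.0, 2.0]}
worker_b = {"d": [4.0, 0.0], "e": [0.0, 0.0]}

partials = [worker_top_n(user_vec, items, 2) for items in (worker_a, worker_b)]
merged = merge_top_n(partials, 2)
print([item_id for _, item_id in merged])
```

Each worker only has to send back N pairs, so the merge step handles (number of workers) x N candidates no matter how many items each worker holds.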
Did it work?
Setup:
 • 1-3 machines

 • 1 million items

 • same rack = high-speed

 • 1 test machine
Performance!
huh?!
Did it work?
well
74% (offline) = 74% (online)
Summary:
... clustering depends on data ...

... need balanced clusters ...

... memory bound ...

... scales ok ...
Thank you!
Photos and pictures borrowed from the Internetz:

Iron Maiden cover: http://en.wikipedia.org/wiki/File:Iron_Maiden_(album)_cover.jpg
Cat picture: http://www.lastfm.es/group/Cats
Coins: http://www.sxc.hu/photo/1235540
iPhones: http://blog.bayuamus.com/2011/08/user-experience-comparison-between-htc-salsa-and-samsung-galaxy-mini/
Amazon recommendations: http://mashable.com/2010/08/06/online-retail-facebook-data/
TV remote: http://www.flickr.com/photos/62337512@N00/2749561795/sizes/z/in/photostream/
Headphones: http://www.flickr.com/photos/markusschoepke/82957375/sizes/m/in/photostream/
Function: http://en.wikipedia.org/wiki/File:Graph_of_example_function.svg
Home servers: http://www.flickr.com/photos/fabrico/477844434/sizes/z/in/photostream/
Extra material...
AXYDBLSZQ   (1/2) / 1


AXYDBLSZQ   (1/1) / 1


AXYDBLSZQ   (1/1 + 2/3) / 2
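The three expressions above have the form of average precision over the ranked list AXYDBLSZQ: the precision at each relevant position, summed and divided by the number of relevant items. The marking of the relevant letters is lost in this text version, so the relevant sets below are assumptions chosen to reproduce the three expressions, and `average_precision` is a name made up for this sketch.

```python
def average_precision(ranked, relevant):
    """Mean of precision-at-rank taken at every rank that holds a relevant item."""
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant) if relevant else 0.0


ranked = list("AXYDBLSZQ")

# Assumed relevant sets, one per line of the slide.
print(average_precision(ranked, {"X"}))       # (1/2) / 1
print(average_precision(ranked, {"A"}))       # (1/1) / 1
print(average_precision(ranked, {"A", "Y"}))  # (1/1 + 2/3) / 2
```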
