Emmys, Oscars, and Machine Learning Algorithms @ Netflix Scale

November 2013

Xavier Amatriain
Director - Algorithms Engineering - Netflix
@xamat
“In a simple Netflix-style item recommender, we would
simply apply some form of matrix factorization (i.e., NMF)”
What we were interested in:
§  High quality recommendations

Proxy question:
§  Accuracy in predicted rating
§  Improve by 10% = $1 million!
From the Netflix Prize
to today

2006

2013
Big Data @Netflix

§  > 40M subscribers
§  Ratings: ~5M/day
§  Searches: >3M/day
§  Plays: >50M/day
§  Streamed hours: 5B hours in Q3 2013

[Data sources: Member Behavior, Ratings, Plays, Time, Geo-information, Impressions, Device Info, Metadata, Social, Demographics]
Smart Models

§  Regression models (Logistic, Linear, Elastic nets)
§  SVD & other MF models
§  Factorization Machines
§  Restricted Boltzmann Machines
§  Markov Chains & other graph models
§  Clustering (from k-means to HDP)
§  Deep ANN
§  LDA
§  Association Rules
§  GBDT/RF
§  …
Deep-dive into
Netflix Algorithms
Rating Prediction
2007 Progress Prize
▪  Top 2 algorithms
▪  SVD - Prize RMSE: 0.8914
▪  RBM - Prize RMSE: 0.8990

▪  Linear blend Prize RMSE: 0.88
▪  Currently in use as part of Netflix’s rating prediction component
▪  Limitations
▪  Designed for 100M ratings; we have 5B ratings
▪  Not adaptable as users add ratings
▪  Performance issues
SVD - Definition

A[n x m] = U[n x r] Λ[r x r] (V[m x r])^T

▪  A: n x m matrix (e.g., n documents, m terms)
▪  U: n x r matrix (n documents, r concepts)
▪  Λ: r x r diagonal matrix (strength of each ‘concept’); r is the rank of the matrix
▪  V: m x r matrix (m terms, r concepts)
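A minimal sketch of this decomposition with numpy; the toy matrix and the truncation rank r here are made-up illustration values, not Netflix data:

```python
import numpy as np

# Toy matrix: n = 4 "documents" (rows), m = 5 "terms" (columns)
A = np.array([[5., 4., 0., 1., 0.],
              [4., 5., 1., 0., 0.],
              [0., 1., 5., 4., 3.],
              [1., 0., 4., 5., 4.]])

# SVD: A = U * diag(s) * Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the top-r "concepts" (truncated SVD)
r = 2
A_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

print("singular values (concept strengths):", np.round(s, 2))
print("rank-%d reconstruction error: %.3f" % (r, np.linalg.norm(A - A_r)))
```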
SVD - Properties
▪  ‘spectral decomposition’ of the matrix: A = λ1 u1 v1^T + λ2 u2 v2^T + …
▪  ‘documents’, ‘terms’ and ‘concepts’:
§  U: document-to-concept similarity matrix
§  V: term-to-concept similarity matrix
§  Λ: its diagonal elements give the ‘strength’ of each concept
SVD for Rating Prediction
§  User factor vectors p_u and item factor vectors q_i
§  Baseline (bias) b_ui = µ + b_u + b_i (user & item deviation from average)
§  Predict rating as r_ui = b_ui + p_u · q_i
§  SVD++ (Koren et al.): asymmetric variation with implicit feedback, where items are described by three item factor vectors and users are not parametrized
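A minimal sketch of this style of biased matrix-factorization prediction; all biases and factor vectors below are made-up illustration values, whereas a real model learns them from the rating data:

```python
import numpy as np

mu = 3.6          # global average rating
b_u = 0.3         # user bias (this user rates above average)
b_i = -0.2        # item bias (this title is rated below average)
p_u = np.array([0.8, -0.1, 0.4])   # user factor vector
q_i = np.array([0.6,  0.2, 0.3])   # item factor vector

# Baseline + user/item factor interaction
r_hat = mu + b_u + b_i + p_u @ q_i
print("predicted rating: %.2f" % r_hat)   # ~4.3 for these made-up values
```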
Restricted Boltzmann Machines
§  Restrict the connectivity in an ANN to make learning easier
§  Only one layer of hidden units (although multiple layers are possible)
§  No connections between hidden units: hidden units are independent given the visible states
§  RBMs can be stacked to form Deep Belief Networks (DBN) – 4th generation of ANNs
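A minimal sketch of the property highlighted above: with no hidden-to-hidden connections, each hidden unit's activation probability is just a sigmoid of its weighted visible inputs and can be computed independently. The weights and the visible vector are made-up illustration values:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 3
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))  # visible-to-hidden weights
b_h = np.zeros(n_hidden)                               # hidden biases

v = np.array([1, 0, 1, 1, 0, 0])   # a binary visible vector (e.g., seen/not-seen items)

# p(h_j = 1 | v) factorizes over hidden units
p_h = sigmoid(b_h + v @ W)
print(np.round(p_h, 3))
```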
What about the final prize ensembles?
▪  Our offline studies showed they were too computationally intensive to scale
▪  Expected improvement not worth the engineering effort
▪  Plus, focus had already shifted to other issues that had more impact than rating prediction…
Ranking

Ranking
Ranking
§  Ranking = Scoring + Sorting + Filtering bags of movies for presentation to a user
§  Goal: find the best possible ordering of a set of videos for a user within a specific context, in real-time
§  Objective: maximize consumption & “enjoyment”
§  Key algorithm; sorts titles in most contexts
§  Factors
§  Accuracy
§  Novelty
§  Diversity
§  Freshness
§  Scalability
§  …
Example: Two features, linear model

[Plot: Predicted Rating (1–5) vs. Popularity for a set of titles]

Linear Model: frank(u,v) = w1 · p(v) + w2 · r(u,v) + b
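A minimal sketch of this two-feature linear ranker; the candidate titles, popularity values, predicted ratings, and weights are all made up for illustration, whereas the real weights would be learned from engagement data:

```python
# Hypothetical candidate videos with a popularity score p(v) in [0, 1]
# and a predicted rating r(u, v) in [1, 5] for the current user u.
candidates = {
    "title_A": {"popularity": 0.9, "pred_rating": 3.2},
    "title_B": {"popularity": 0.4, "pred_rating": 4.6},
    "title_C": {"popularity": 0.7, "pred_rating": 4.1},
}

w1, w2, b = 1.0, 0.5, 0.0   # made-up weights

def frank(feats):
    return w1 * feats["popularity"] + w2 * feats["pred_rating"] + b

# Ranking = scoring + sorting (+ filtering, omitted here)
ranked = sorted(candidates, key=lambda v: frank(candidates[v]), reverse=True)
print(ranked)   # ['title_C', 'title_B', 'title_A'] for these weights
```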
Example: Two features, linear model

[Plot: Final Ranking combining Predicted Rating and Popularity]
Ranking
Learning to Rank
▪  Ranking is a very important problem in many contexts
(search, advertising, recommendations)
▪  Quality of ranking is measured using ranking metrics such as NDCG, MRR, MAP, FCP…
▪  It is hard to optimize machine-learned models directly on these measures
▪  They are not differentiable
▪  We would like to use the same measure for optimizing and for the final evaluation
Learning to Rank Approaches
§  ML problem: construct ranking model from training data
1.  Pointwise (Ordinal regression, Logistic regression, SVM, GBDT, …)
§  Loss function defined on individual relevance judgments
2.  Pairwise (RankSVM, RankBoost, RankNet, FRank…)
§  Loss function defined on pair-wise preferences (see the sketch below)
§  Goal: minimize the number of inversions in the ranking
3.  Listwise
§  Indirect loss functions (RankCosine, ListNet…)
§  Directly optimize IR measures (NDCG, MRR, FCP…)
§  Genetic Programming or Simulated Annealing
§  Boosting to optimize NDCG (AdaRank)
§  Gradient descent on a smoothed version (CLiMF, TFMAP, GAPfm @cikm13)
§  Iterative Coordinate Ascent (DirectRank @kdd13)
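As an illustration of the pairwise family above, here is a minimal RankNet-style logistic loss on the score difference of a single preference pair; the scores are made-up values:

```python
import numpy as np

def pairwise_logistic_loss(score_preferred, score_other):
    """RankNet-style loss: penalizes pairs where the preferred item
    does not out-score the other item."""
    return np.log1p(np.exp(-(score_preferred - score_other)))

# Made-up model scores for two items where the first should rank higher
print(pairwise_logistic_loss(2.0, 0.5))   # small loss: correct order
print(pairwise_logistic_loss(0.5, 2.0))   # large loss: inverted pair
```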
Similarity
Similars
What is similarity?
§  Similarity can refer to different dimensions
§  Similar in metadata/tags
§  Similar in user play behavior
§  Similar in user rating behavior
§  …
§  You can learn a model for each of them and finally learn an ensemble
Graph-based similarities

[Graph of related titles with weighted similarity edges, e.g., 0.8, 0.7, 0.4, 0.3, 0.2]
Example of graph-based similarity: SimRank
▪  SimRank (Jeh & Widom, 02): “two objects are similar if they are referenced by similar objects.”
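A minimal sketch of the SimRank recursion on a tiny hypothetical graph; the graph and the decay constant C are illustration values:

```python
import numpy as np

# Tiny directed graph: node -> list of in-neighbors (who "references" it)
in_neighbors = {
    "A": ["U1", "U2"],
    "B": ["U1", "U2"],
    "C": ["U3"],
    "U1": [], "U2": [], "U3": [],
}
nodes = list(in_neighbors)
idx = {n: i for i, n in enumerate(nodes)}
C = 0.8                       # decay constant
S = np.eye(len(nodes))        # s(a, a) = 1

for _ in range(5):            # a few fixed-point iterations
    S_new = np.eye(len(nodes))
    for a in nodes:
        for b in nodes:
            if a == b or not in_neighbors[a] or not in_neighbors[b]:
                continue
            total = sum(S[idx[i], idx[j]]
                        for i in in_neighbors[a] for j in in_neighbors[b])
            S_new[idx[a], idx[b]] = C * total / (len(in_neighbors[a]) * len(in_neighbors[b]))
    S = S_new

# A and B are referenced by the same objects -> relatively high similarity
print(round(S[idx["A"], idx["B"]], 3))
```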
Final similarity ensemble
§  Come up with a score of play similarity, rating similarity, tag-based similarity…
§  Combine them using an ensemble
§  Weights are learned using regression over existing response
§  The final concept of “similarity” responds to what users vote as similar
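A minimal sketch of learning ensemble weights by regressing the component similarities onto an existing response (e.g., the fraction of users who voted a pair of titles as similar); all arrays here are made-up illustration data:

```python
import numpy as np

# Each row: [play_similarity, rating_similarity, tag_similarity] for a title pair
X = np.array([[0.9, 0.7, 0.8],
              [0.2, 0.3, 0.1],
              [0.6, 0.5, 0.9],
              [0.1, 0.2, 0.3],
              [0.8, 0.9, 0.6]])
# Existing response: fraction of users who voted the pair as "similar"
y = np.array([0.95, 0.10, 0.80, 0.15, 0.90])

# Least-squares regression gives one weight per similarity signal
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print("learned weights:", np.round(w, 2))

def ensemble_similarity(features):
    return features @ w

print(round(ensemble_similarity(np.array([0.7, 0.6, 0.5])), 2))
```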
Row Selection
Row Selection Inputs
§  Visitor data
§  Video history
§  Queue adds
§  Ratings
§  Taste preferences
§  Predicted ratings
§  PVR ranking
§  etc.
§  Global data
§  Metadata
§  Popularity
§  etc.
Row Selection Core Algorithm
[Pipeline: selection constraints + candidate data → Groupify → Score/Select → partial selection → group selection]
Selection Constraints
•  Many business rules and heuristics
–  Relative position of groups
–  Presence/absence of groups
–  Deduping rules
–  Number of groups of a given type
•  Determined by past A/B tests
•  Evolved over time from a simple template
Groupification
•  Ranking of titles according to the ranking algorithm
•  Filtering
•  Deduping
•  Formatting
•  Selection of evidence
•  …
[Inputs: candidate data and the first n selected groups; output: the group for position n+1]
Scoring and Selection
•  Features considered
–  Quality of items
–  Diversity of tags
–  Quality of evidence
–  Freshness
–  Recency
–  …
•  ARO scorer
•  Greedy algorithm (sketched below)
•  Framework supports other methods
–  Backtracking
–  Approximate integer programming
–  etc.
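A minimal sketch of a greedy score-and-select loop in the spirit of this slide: pick the next row group by a score that trades off quality against tag diversity relative to what is already selected. The scoring function and candidate fields are hypothetical, not the actual ARO scorer:

```python
# Hypothetical candidate row groups with a quality score and a set of tags
candidates = [
    {"name": "Top Picks",                    "quality": 0.95, "tags": {"personalized"}},
    {"name": "New Releases",                 "quality": 0.80, "tags": {"recent"}},
    {"name": "Critically Acclaimed Dramas",  "quality": 0.85, "tags": {"drama"}},
    {"name": "More Dramas",                  "quality": 0.84, "tags": {"drama"}},
]

def score(group, selected, diversity_weight=0.3):
    # Reward tags not yet covered by already-selected groups
    seen_tags = set().union(*(g["tags"] for g in selected)) if selected else set()
    novelty = len(group["tags"] - seen_tags) / max(len(group["tags"]), 1)
    return group["quality"] + diversity_weight * novelty

def greedy_select(candidates, n_rows):
    selected, remaining = [], list(candidates)
    for _ in range(n_rows):
        best = max(remaining, key=lambda g: score(g, selected))
        selected.append(best)
        remaining.remove(best)
    return [g["name"] for g in selected]

print(greedy_select(candidates, 3))
# ['Top Picks', 'Critically Acclaimed Dramas', 'New Releases'] -- the second
# drama row is pushed down because it adds no tag diversity.
```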
Search Recommendations
Search Recommendation
Combination of different models
•  Play After Search: transition on Play after Query
•  Play Related to Search: similarity between the user’s (query, play) pairs
•  …
Read this paper for a description of a similar system at LinkedIn
Unavailable Title Recommendations
Gamification Algorithms
Rating Game

Mood Selection Algorithm
1.  Editorially selected moods
2.  Sort them according to the user’s consumption
3.  Take into account freshness, diversity…
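A minimal sketch of the sorting step above: order editorially selected moods by the user's consumption with a small freshness boost. The mood list, consumption counts, and boost weight are made-up illustration values:

```python
# Editorially selected moods with the user's past consumption (hours watched)
# and a freshness flag for recently added moods -- all made-up values.
moods = [
    {"name": "Feel-good",     "consumption": 12.0, "fresh": False},
    {"name": "Dark & Gritty", "consumption": 30.0, "fresh": False},
    {"name": "Mind-bending",  "consumption":  8.0, "fresh": True},
]

FRESHNESS_BOOST = 5.0   # hypothetical weight

def mood_score(mood):
    return mood["consumption"] + (FRESHNESS_BOOST if mood["fresh"] else 0.0)

ranked_moods = sorted(moods, key=mood_score, reverse=True)
print([m["name"] for m in ranked_moods])
```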
More data or
better models?
More data or better models?

Really?

Anand Rajaraman: Former Stanford Prof. &
Senior VP at Walmart
More data or better models?
Sometimes, it’s not
about more data
More data or better models?
[Banko and Brill, 2001]

Norvig: “Google does not
have better Algorithms,
only more Data”

Many features / low-bias models
More data or better models?

Sometimes, it’s not
about more data
“Data without a sound approach = noise”
Conclusion
More data +
Smarter models +
More accurate metrics +
Better system architectures
Lots of room for improvement!
Xavier Amatriain (@xamat)
xavier@netflix.com

Thanks!

We’re hiring!