Recommending for the World
Yves Raimond (@moustaki)
03/16
Research/Engineering Manager
Search & Recommendations Algorithm Engineering
Netflix
Some background
Netflix scale
● > 75M members
● > 190 countries
● > 3.7B hours of content streamed every month
● > 1000 device types
● 36% of peak US downstream traffic
Recommendations @ Netflix
Goal
Help members find content to watch and enjoy
to maximize satisfaction and retention
Models & Algorithms
Going global
● How do we make sure all these algorithms are ready to work on a global scale?
● Investigating this surfaced many challenges and led to many rollouts of new algorithms over the last year
○ Tech blog post
○ Company blog post
Challenge 1: Uneven Video Availability
[Figure: co-occurrences between videos in the US and FR catalogs (1,000 users, 100 users, 0, ...); factorization R ≈ UM]
What would have happened if the two videos were available to the same members?
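One way to approximate that counterfactual is to count co-occurrences only among members who could have watched both videos. The following is a minimal sketch of that idea, not the method used in the talk; the function name, the data layout and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def availability_adjusted_cooccurrence(plays, catalog, country_of):
    """Co-play rate per video pair, restricted to members who had both videos.

    plays:      dict member -> set of videos the member watched
    catalog:    dict country -> set of videos available in that country
    country_of: dict member -> country
    Pairs that no member could watch together are left out (unknown, not zero).
    """
    co_plays = Counter()
    eligible = Counter()
    for member, watched in plays.items():
        available = catalog[country_of[member]]
        for pair in combinations(sorted(available), 2):
            eligible[pair] += 1  # this member could have watched both
            if pair[0] in watched and pair[1] in watched:
                co_plays[pair] += 1
    return {pair: co_plays[pair] / n for pair, n in eligible.items()}


# Hypothetical toy data: video B exists only in the US, video C only in FR.
catalog = {"US": {"A", "B"}, "FR": {"A", "C"}}
country_of = {"u1": "US", "u2": "US", "f1": "FR", "f2": "FR"}
plays = {"u1": {"A", "B"}, "u2": {"A"}, "f1": {"A", "C"}, "f2": {"A"}}

rates = availability_adjusted_cooccurrence(plays, catalog, country_of)
print(rates)
```

In this toy data a raw count over all four members would put the A/B co-play rate at 1 in 4, while the adjusted rate is 1 in 2 because only the two US members could watch B. The pair B/C never appears in the result, which keeps "never available together" distinct from "available together but never co-watched".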
[Figure: co-occurrences in the US and FR catalogs (1,000 users, 100 users, 100,000 users, 10 users) on 2016-01-01 and 2016-01-02, with a newly available title]
What would have happened if the two videos were available to the same members for the same amount of time?
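A simple way to put titles with different availability windows on a comparable footing is to divide plays by exposure, measured in member-days. This is a sketch under that assumption, not the approach described in the talk; the function name and all numbers below are made up for illustration.

```python
from datetime import date


def plays_per_member_day(play_count, members_with_access, available_since, as_of):
    """Plays divided by exposure (members with access x days available).

    Returns None when there has been no exposure yet.
    """
    days_available = (as_of - available_since).days + 1  # count both end days
    exposure = members_with_access * days_available
    if exposure <= 0:
        return None
    return play_count / exposure


as_of = date(2016, 1, 2)

# Hypothetical numbers: a title available for two days to many members,
# and a title that became available to fewer members on the second day.
established = plays_per_member_day(1000, 100_000, date(2016, 1, 1), as_of)
newly_available = plays_per_member_day(10, 1000, date(2016, 1, 2), as_of)
print(established, newly_available)
```

By raw play count the established title looks 100 times more popular, but per member-day of exposure the newly available title has the higher rate in this made-up example.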
Challenge 2: Cultural Awareness
Two questions
1) Similar users, in two different countries. Should they get similar recommendations?
2) Overall, should recommendations be different for users in Japan vs users in Argentina? What about new users?
Regional models
Group countries into regions, and train individual models on each region.
Pros
● Easy!
● Catalog can be constrained to be relatively uniform
● Solves question 2
Cons
● Doesn’t solve question 1
● How to define groupings?
● Number of models to maintain grows as algorithms × A/B model variants × regions
● Biggest country in the region will dominate
● Sparsity
Sparsity and global models
Only a small fraction of users in each country would be interested in these titles, so models trained locally perform poorly for lack of data. Pooling data from all countries uncovers a worldwide community of interest, which makes recommendations better for these users.
Global communities - Anime
Global communities - Bollywood
Local taste vs personal taste
● Personal taste benefits from global algorithms
○ Taste patterns travel globally
● Local taste still needs to be taken into account to answer question 2
● Incorporate signals and priors capturing local taste patterns (e.g. country and language)
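As an illustration of how a local prior and a personal signal can coexist, the sketch below shrinks a personal score towards a country-level prior, leaning on the prior when a member has little history (the new-user case in question 2). This is a generic shrinkage formula chosen for illustration, not the model from the talk; `blended_score` and `prior_strength` are hypothetical names.

```python
def blended_score(personal_score, n_observations, local_prior, prior_strength=5.0):
    """Weighted average of a personal score and a local (e.g. country) prior.

    The weight on the personal score grows with the number of observations;
    with no observations the result is the local prior alone.
    """
    weight = n_observations / (n_observations + prior_strength)
    return weight * personal_score + (1.0 - weight) * local_prior


# Hypothetical values: the global model scores a title 0.9 for this member,
# while the title's prior in the member's country is 0.2.
new_member = blended_score(0.9, n_observations=0, local_prior=0.2)
some_history = blended_score(0.9, n_observations=5, local_prior=0.2)
long_history = blended_score(0.9, n_observations=500, local_prior=0.2)
print(new_member, some_history, long_history)
```

A brand-new member gets the local prior, and the score moves towards the personal estimate as history accumulates.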
Challenge 3: Language
Instant search
● Ranking entities for partial queries
● Optimizing for the minimum number of interactions needed to find something
● Different languages involve very different interaction patterns
● How to automatically detect and adapt to such patterns in newly introduced languages?
Hangul alphabet: 3 syllables, but requires 7 (2 + 3 + 2) interactions
One interaction
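The syllable-versus-interaction gap can be reproduced with Unicode canonical decomposition (NFD), which splits each precomposed Hangul syllable into its letters (jamo). The sketch below assumes one interaction per jamo, which ignores keyboard-specific details such as compound vowels; the word 오징어 is an example chosen here because its syllables split 2 + 3 + 2, not necessarily the word shown on the slide.

```python
import unicodedata


def to_jamo(text):
    """Decompose precomposed Hangul syllables into conjoining jamo (NFD)."""
    return unicodedata.normalize("NFD", text)


def interactions_needed(text):
    """Assumed cost model: one interaction per decomposed code point."""
    return len(to_jamo(text))


def matches_partial_query(partial, title):
    """Prefix match at the jamo level, so half-typed syllables still match."""
    return to_jamo(title).startswith(to_jamo(partial))


word = "오징어"  # hypothetical example title, 3 syllables
per_syllable = [interactions_needed(s) for s in word]
print(per_syllable, interactions_needed(word))

# "오지" is what the screen shows before the final consonant of "징" is typed.
print(word.startswith("오지"), matches_partial_query("오지", word))
```

A plain string prefix test rejects the partial query 오지 because the second syllable is not complete yet, while the jamo-level comparison accepts it. That is one reason a ranking of entities for partial queries may need to work below the syllable level in Korean.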
Language & Recommendations
[Figure: comparison across US, US/AU and FR]
Challenge 4: Does it even work?
Tracking quality
● Objective: build algorithms that work equally well for all our members
● Looking at global metrics might hide issues with small subsets of members
● How to identify sub-optimality for a subset of our members?
○ Language, country, device, …
○ Slicing on all dimensions leads to sparsity and noisiness
○ Automatically group observations so that outliers can be detected automatically
● Metrics, instrumentation and monitoring
○ Detect problems
○ Highlight areas of improvement
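A global average can look healthy while one slice of members is badly served. The sketch below flags slices whose metric sits far below the median of the other slices, using a robust z-score based on the median absolute deviation. It is an illustration only: the minimum sample size is a crude stand-in for the automatic grouping mentioned above, and the function name, thresholds and numbers are assumptions.

```python
from statistics import median


def flag_underperforming_slices(slice_metrics, min_samples=100, z_threshold=3.5):
    """Return slices whose metric is an outlier on the low side.

    slice_metrics: dict slice_key -> (metric_value, n_samples)
    Slices with fewer than min_samples observations are skipped as too noisy.
    """
    reliable = {k: v for k, (v, n) in slice_metrics.items() if n >= min_samples}
    if len(reliable) < 3:
        return []
    center = median(reliable.values())
    mad = median(abs(v - center) for v in reliable.values())
    if mad == 0:
        return []
    flagged = []
    for key, value in reliable.items():
        robust_z = 0.6745 * (value - center) / mad
        if robust_z <= -z_threshold:
            flagged.append(key)
    return sorted(flagged)


# Hypothetical per-slice metric (e.g. a play rate), keyed by (language, device).
slice_metrics = {
    ("en", "tv"): (0.50, 5000),
    ("en", "phone"): (0.52, 4000),
    ("fr", "tv"): (0.51, 1200),
    ("fr", "phone"): (0.49, 900),
    ("ja", "tv"): (0.50, 800),
    ("ko", "tv"): (0.30, 700),    # clearly lower than the rest
    ("ko", "phone"): (0.10, 5),   # too few samples to judge
}

flagged = flag_underperforming_slices(slice_metrics)
print(flagged)
```

The low but well-sampled slice is reported, while the slice with five observations is ignored rather than raising a noisy alert.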
Conclusion
● Catalog differences, cultural awareness, language and metrics
● Worldwide communities of interest for better recommendations
○ Thinking globally actually led us to test and release better algorithms
○ But also need to capture signals and priors related to cultural preferences
● Quickly finding entities in any language
● Detecting issues at a finer grain
● … Still a lot of work to do!
○ Better global algorithms… (Now that we have data)
○ Better cultural/language awareness
○ Better user and item cold start
○ Reactiveness
○ Better algorithms for anomaly detection
Questions?
