The Netflix experience is driven by a number of Machine Learning algorithms: personalized ranking, page generation, search, similarity, ratings, etc. On the 6th of January, we simultaneously launched Netflix in 130 new countries around the world, which brings the total to over 190 countries. Preparing for such a rapid expansion while ensuring each algorithm was ready to work seamlessly created new challenges for our recommendation and search teams. In this post, we highlight the four most interesting challenges we’ve encountered in making our algorithms operate globally and, most importantly, how this improved our ability to connect members worldwide with stories they'll love.
● How do we make sure all these algorithms are ready to work on a global scale?
● This led us to investigate many challenges, resulting in many rollouts of new algorithms over the last year
○ Tech blog post
○ Company blog post
1) Similar users in two different countries: should they get similar recommendations?
2) Overall, should recommendations be different for users in Japan vs. users in Argentina? What about new users?
Group countries into regions, and train individual models on each region.
● Catalog can be constrained to be the same within a region
● Solves question 2
● Doesn’t solve question 1
● How to define the groupings?
● Algorithms x A/B model variants x regions quickly becomes unmanageable
● The biggest country in the region will dominate the model
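The per-region approach can be sketched as follows. The region groupings and the toy "model" (a per-item mean rating within each region) are illustrative assumptions for this sketch, not Netflix's actual system:

```python
# Hypothetical region groupings -- illustrative only.
REGIONS = {
    "JP": "asia", "KR": "asia",
    "AR": "latam", "BR": "latam",
    "FR": "emea", "DE": "emea",
}

def train_region_models(ratings):
    """ratings: list of (country, item, rating) tuples.
    Returns one toy model per region: item -> mean rating."""
    by_region = {}
    for country, item, rating in ratings:
        region = REGIONS.get(country, "global")  # fallback bucket
        by_region.setdefault(region, {}).setdefault(item, []).append(rating)
    return {
        region: {item: sum(rs) / len(rs) for item, rs in items.items()}
        for region, items in by_region.items()
    }

def score(models, country, item):
    """Look up the item's score in the model for the user's region."""
    region = REGIONS.get(country, "global")
    return models.get(region, {}).get(item, 0.0)
```

Note how the sketch exposes the last two problems above: every region multiplies the number of models to train and maintain, and within a region the country contributing the most ratings dominates the per-region averages.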
Sparsity and global models
Only a small fraction of users in any one country would be interested in niche titles. Models trained locally perform poorly on them due to lack of data.
Pooling data from all countries discovers a worldwide community of interest, making recommendations better for these users.
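A toy illustration of the sparsity point, under the assumption of a hypothetical minimum-ratings threshold below which an estimate is unreliable: a niche title has too little data in any single country, but pooling across countries yields a usable estimate:

```python
MIN_RATINGS = 3  # hypothetical reliability threshold, for illustration

def local_estimate(ratings, country, item):
    """Estimate an item's appeal from one country's ratings only."""
    rs = [r for c, i, r in ratings if c == country and i == item]
    return sum(rs) / len(rs) if len(rs) >= MIN_RATINGS else None

def pooled_estimate(ratings, item):
    """Estimate an item's appeal from ratings pooled across all countries."""
    rs = [r for _, i, r in ratings if i == item]
    return sum(rs) / len(rs) if len(rs) >= MIN_RATINGS else None
```

With a single rating per country, every local estimate falls below the threshold, while the pooled estimate over the same data clears it.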
Local taste vs personal taste
● Personal taste benefits from global algorithms
○ Taste patterns travel globally
● Local taste still needs to be taken into account in order to solve question 2
● Incorporate signals and priors capturing local taste patterns (e.g. country and language)
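One common way to incorporate such a prior, sketched here as an assumption rather than Netflix's actual method, is empirical-Bayes-style shrinkage: blend a country-level mean with the global score, trusting the local signal only as local data accumulates:

```python
K = 5.0  # pseudo-count: how much local evidence is needed to trust it (illustrative)

def blended_score(global_score, local_mean, local_count, k=K):
    """Shrink a country-level mean toward the global score.
    With little local data (small local_count) the weight w is near 0
    and the global score dominates; with lots of local data w -> 1."""
    w = local_count / (local_count + k)
    return w * local_mean + (1 - w) * global_score
```

This keeps the benefit of global algorithms (the global score) while still letting strong local taste patterns pull the ranking once there is enough in-country data to support them.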
● Ranking entities for partial queries
● Optimizing for the minimum number of interactions needed to find something
● Different languages involve very different interaction patterns
● How to automatically detect and adapt to such patterns in newly introduced languages?
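Ranking entities for partial queries can be sketched as prefix matching against an entity's titles in any language, ordered by popularity. The catalog entries and popularity scores below are made up for illustration:

```python
# Made-up catalog: each entity can carry titles in several languages.
CATALOG = [
    {"id": 1, "titles": ["Stranger Things"], "popularity": 0.9},
    {"id": 2, "titles": ["Star Trek", "スタートレック"], "popularity": 0.7},
    {"id": 3, "titles": ["Narcos"], "popularity": 0.8},
]

def search(prefix):
    """Return entity ids whose title in any language starts with the
    typed prefix, most popular first."""
    prefix = prefix.casefold()
    hits = [
        e for e in CATALOG
        if any(t.casefold().startswith(prefix) for t in e["titles"])
    ]
    return [e["id"] for e in sorted(hits, key=lambda e: -e["popularity"])]
```

Even this sketch shows why interaction patterns differ across languages: two Latin characters ("st") already narrow the catalog, while scripts with larger alphabets or input methods requiring character composition change how many interactions a member needs to reach a result.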
● Objective: build algorithms that work equally well for all our members
● Looking at global metrics might hide issues affecting small subsets of members
● How to identify sub-optimality for a subset of our members?
○ Language, country, device, …
○ Slicing on all dimensions leads to sparsity and noisiness
○ Automatically group observations to detect outliers
● Metrics, instrumentation and monitoring
○ Detect problems
○ Highlight areas of improvement
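The slicing idea can be sketched as follows, with made-up thresholds: compute the metric per (country, device) slice and flag slices that deviate sharply from the global mean, skipping slices too small to measure without noise:

```python
MIN_OBS = 3    # skip slices with fewer observations (illustrative threshold)
MAX_DEV = 0.2  # flag slices deviating more than this from the global mean

def flag_slices(observations):
    """observations: list of ((country, device), metric_value) pairs.
    Returns the slice keys whose mean metric deviates from the overall
    mean by more than MAX_DEV, ignoring slices below MIN_OBS."""
    overall = sum(v for _, v in observations) / len(observations)
    groups = {}
    for key, v in observations:
        groups.setdefault(key, []).append(v)
    flagged = []
    for key, vs in groups.items():
        if len(vs) >= MIN_OBS:
            mean = vs and sum(vs) / len(vs)
            if abs(mean - overall) > MAX_DEV:
                flagged.append(key)
    return flagged
```

The two thresholds embody the tension in the bullets above: a low minimum slice size catches problems in small member subsets but admits noise, while a high one hides them inside the global average.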
● Catalog differences, cultural awareness, language and metrics
● Worldwide communities of interest for better recommendations
○ Thinking globally actually led us to test and release better algorithms
○ But also need to capture signals and priors related to cultural preferences
● Quickly finding entities in any language
● Detecting issues at a finer grain
● … Still a lot of work to do!
○ Better global algorithms… (Now that we have data)
○ Better cultural/language awareness
○ Better user and item cold start
○ Better algorithms for anomaly detection