Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Netflix’s Recommendation
ML Pipeline using
Apache Spark
DB Tsai
Spark Summit East - Feb 8, 2017
At Netflix, we use ML everywhere
Everything is
a Recommendation
Over 80% of what members watch comes
from our recommendations
Recommendations are driven by Machine
Learning Algorithms
Jan 6th, 2016
#NetflixEverywhere
● 93+ Million Members
● 190+ Countries
● 125+ Million streaming hours / day
● 1000 hours of Original co...
Constantly Innovating
through A/B tests
Try an idea offline using historical data to see
if they would have made better
recommendations
If it does, deploy a live ...
Running an Experiment
Design Experiment
Collect Label Dataset
DeLorean: Offline
Feature Generation
Distributed
Model Train...
We use a standardized data format across
multiple ranking pipelines
This standardized data format is used by
common toolin...
Contexts: The setting for evaluating a set of items (e.g.
tuples of member profiles, country, time, device, etc.)
Items: T...
root
|-- profile_id: long (nullable = false)
|-- country_iso_code: string (nullable = false)
|-- items: array (nullable = ...
The nested data structure avoids an expensive
shuffle when ranking
The features are derived from Netflix data or the
outpu...
Transformer
https://en.wikipedia.org/wiki/Spark_(Transformers)
Transformer takes an input DataFrame and “lazily”
returns an output DataFrame
Item Transformer
● Extends Spark ML’s Transf...
Why DataFrame?
Catalyst Optimizations
Up-front Schema Verification
We found a 4x speedup during feature generation
by migr...
Negative Generator
Facts
root
|-- profile_id: long (nullable = false)
|-- country_iso_code: string (nullable = false)
|-- ...
DeLorean Feature Generator
root
|-- profile_id: long (nullable = false)
|-- country_iso_code: string (nullable = false)
|-...
Creating the Dataset for Algorithms
Multithreading Model Training
For single machine multi-threading algorithms, we
allocate one task to one machine. Multiple...
Distributed Model Training
We use both Spark ML’s algorithms and in-house
ML implementations
We keep the interface similar...
Scoring and Ranking
Scorer is also a Transformer
returned from the Trainer
Multiple models can be scored at
the same time ...
Lessons Learned - Pipeline Abstraction
Pros
● Modularity + Tests
● Plug-N-Play
● Notebook Prototyping
● Customizability
● ...
2017 Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East talk by DB Tsai
Upcoming SlideShare
Loading in …5
×

2017 Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East talk by DB Tsai

289 views

Published on

Netflix is the world’s largest streaming service, with 80 million members in over 250 countries. Netflix uses machine learning to inform nearly every aspect of the product, from the recommendations you get, to the boxart you see, to the decisions made about which TV shows and movies are created.
Given this scale, we utilized Apache Spark to be the engine of our recommendation pipeline. Apache Spark enables Netflix to use a single, unified framework/API – for ETL, feature generation, model training, and validation. With pipeline framework in Spark ML, each step within the Netflix recommendation pipeline (e.g. label generation, feature encoding, model training, model evaluation) is encapsulated as Transformers, Estimators and Evaluators – enabling modularity, composability and testability. Thus, Netflix engineers can build our own feature engineering logics as Transformers, learning algorithms as Estimators, and customized metrics as Evaluators, and with these building blocks, we can more easily experiment with new pipelines and rapidly deploy them to production.

In this talk, we will discuss how Apache Spark is used as a distributed framework we build our own algorithms on top of to generate personalized recommendations for each of our 80+ million subscribers, specific techniques we use at Netflix to scale, and the various pitfalls we’ve found along the way.

Published in: Software
  • Be the first to comment

2017 Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East talk by DB Tsai

  1. 1. Netflix’s Recommendation ML Pipeline using Apache Spark DB Tsai Spark Summit East - Feb 8, 2017
  2. 2. At Netflix, we use ML everywhere
  3. 3. Everything is a Recommendation
  4. 4. Over 80% of what members watch comes from our recommendations Recommendations are driven by Machine Learning Algorithms
  5. 5. Jan 6th, 2016
  6. 6. #NetflixEverywhere ● 93+ Million Members ● 190+ Countries ● 125+ Million streaming hours / day ● 1000 hours of Original content in 2017 ● ⅓ of US internet traffic during evenings
  7. 7. Constantly Innovating through A/B tests
  8. 8. Try an idea offline using historical data to see if they would have made better recommendations If it does, deploy a live A/B test to see if it performs well in Production Data Driven
  9. 9. Running an Experiment Design Experiment Collect Label Dataset DeLorean: Offline Feature Generation Distributed Model Training Parallel training of individual models using different executors Compute Validation Metrics Model Testing Choose best model Design a New Experiment to Test Out Different Ideas Good Metrics Offline Experiment Online A/B Test Online AB Testing Bad Metrics Selected Contexts
  10. 10. We use a standardized data format across multiple ranking pipelines This standardized data format is used by common tooling, libraries, and algorithms
  11. 11. Contexts: The setting for evaluating a set of items (e.g. tuples of member profiles, country, time, device, etc.) Items: The elements to be trained on, scored, and/or ranked (e.g. videos, rows, search entities) Labels: For supervised learning, this will be the label (target) for each item Ranking problems
  12. 12. root |-- profile_id: long (nullable = false) |-- country_iso_code: string (nullable = false) |-- items: array (nullable = true) | |-- element: struct (containsNull = false) | | |-- show_title_id: long (nullable = false) | | |-- label: double (nullable = false) | | |-- weight: double (nullable = false) | | |-- features: struct (nullable = false) | | | |-- feature1: double (nullable = false) | | | |-- feature2: double (nullable = false) | | | |-- feature3: double (nullable = false) DeLorean Data Format a.k.a DMC-12
  13. 13. The nested data structure avoids an expensive shuffle when ranking The features are derived from Netflix data or the output of other trained models The features are persisted in HIVE using Parquet Ensemble methods are used to build rankers
  14. 14. Transformer https://en.wikipedia.org/wiki/Spark_(Transformers)
  15. 15. Transformer takes an input DataFrame and “lazily” returns an output DataFrame Item Transformer ● Extends Spark ML’s Transformer ● Accepts DMC-12 DataFrame with contextual information ● Transforms DataFrame at the item level
  16. 16. Why DataFrame? Catalyst Optimizations Up-front Schema Verification We found a 4x speedup during feature generation by migrating from RDD-based implementation to DataFrame implementation
  17. 17. Negative Generator Facts root |-- profile_id: long (nullable = false) |-- country_iso_code: string (nullable = false) |-- items: array (nullable = true) | |-- element: struct (containsNull = false) | | |-- show_title_id: long (nullable = false) | | |-- label: double (nullable = false) | | |-- weight: double (nullable = false) Facts with synthetic negatives root |-- profile_id: long (nullable = false) |-- country_iso_code: string (nullable = false) |-- items: array (nullable = true) | |-- element: struct (containsNull = false) | | |-- show_title_id: long (nullable = false) | | |-- label: double (nullable = false) | | |-- weight: double (nullable = false) Creating negatives from what member plays for supervised learning
  18. 18. DeLorean Feature Generator root |-- profile_id: long (nullable = false) |-- country_iso_code: string (nullable = false) |-- items: array (nullable = true) | |-- element: struct (containsNull = false) | | |-- show_title_id: long (nullable = false) | | |-- label: double (nullable = false) | | |-- weight: double (nullable = false) root |-- profile_id: long (nullable = false) |-- country_iso_code: string (nullable = false) |-- items: array (nullable = true) | |-- element: struct (containsNull = false) | | |-- show_title_id: long (nullable = false) | | |-- label: double (nullable = false) | | |-- weight: double (nullable = false) | | |-- features: struct (nullable = false) | | | |-- feature1: double (nullable = false) | | | |-- feature2: double (nullable = false) | | | |-- feature3: double (nullable = false) Creating features based on common code base in offline and online system http://techblog.netflix.com/2016/02/distributed-time-travel-for-feature.html
  19. 19. Creating the Dataset for Algorithms
  20. 20. Multithreading Model Training For single machine multi-threading algorithms, we allocate one task to one machine. Multiple tasks are running in Spark for different parameters Broadcast in Spark has datasize limitation, we write data into HDFS, and stream the data into the trainers in executors which run single-machine multi-threading algorithms
  21. 21. Distributed Model Training We use both Spark ML’s algorithms and in-house ML implementations We keep the interface similar for both multi-threading and distributed algorithms, so experimenters can try different ideas easily
  22. 22. Scoring and Ranking Scorer is also a Transformer returned from the Trainer Multiple models can be scored at the same time in parallel The ranks are derived from sorted scores Together with labels, we can compute metrics, NMRR, NDCG, and Recall, etc root |-- profile_id: long (nullable = false) |-- country_iso_code: string (nullable = false) |-- items: array (nullable = true) | |-- element: struct (containsNull = false) | | |-- show_title_id: long (nullable = false) | | |-- label: double (nullable = false) | | |-- weight: double (nullable = false) | | |-- features: struct (nullable = false) | | | |-- feature1: double (nullable = false) | | | |-- feature2: double (nullable = false) | | | |-- feature3: double (nullable = false) | | |-- scores: struct (nullable = false) | | | |-- model1: double false | | | |-- model2: double false
  23. 23. Lessons Learned - Pipeline Abstraction Pros ● Modularity + Tests ● Plug-N-Play ● Notebook Prototyping ● Customizability ● Schema Verification ● Serializability (AC) Cons ● Dependent on Spark platform ● Not easy to bring to production ● Default Metric Evaluator doesn’t support ranking multiple models and type of metrics in one pass

×