
ML Infra for Netflix Recommendations - AI NEXTCon talk

Slides describe the tech stack for personalization infrastructure at Netflix.

Published in: Technology

  1. Machine Learning Infra for Recommendations. Faisal Siddiqi (@faisalzs), 12 Apr 2018
  4. Goal: Create personalized recommendations for discoveries of engaging video content that maximize member joy
  6. Outline for today
      ● Domain Context
      ● ML Infra stack for Personalization
      ● Deeper dives into 2 major ML Infra components
  7. Personalize everything!
      ● Row Selection & Order
      ● Titles ranked by relevance
      ● Artwork!
  8. Personalization Context (diagram labels): Data from millions of users; Training pipelines; Models; Precompute System; Rankings; Online caches; A/B Test Allocation
  9. Joy == Rainbow
  10. The Personalization Rainbow (diagram): Personalization systems & infrastructure
      ● Functions: Training Data Preparation; Training; Feature Engineering; Model Quality; Intent To Treat (Serving); Treatment & Action; Inference & Logging
      ● Other labels: Online; Device; Function; Offline; Control Plane
  11. The Personalization Rainbow (full diagram): Personalization systems & infrastructure
      ● Functions: Training Data Preparation; Training; Feature Engineering; Model Quality; Intent To Treat (Serving); Treatment & Action; Inference & Logging
      ● Components: Label Generation; Fact Store; Hyperparameter Optimization; Notebooks; Caching; Dynamic param management; A/B Testing Platform; Online & Precompute Framework; Personalization Aggregation; Fact Logging; Device Logging; Online Services API; Boson; AlgoCommons; Orchestration
      ● Other labels: Control Plane; Online; Device; Function; Offline
  12. We’ll zoom into Boson & AlgoCommons today (diagram highlights: Training; Feature Engineering; Model Quality; Inference & Logging; Boson; AlgoCommons)
  13. The Context for AlgoCommons & Boson
      ● Machine Learning via ‘loosely-coupled, highly-aligned’ Scala/Java Libraries
      ● Historical context
        ○ Siloed machine learning infrastructure
        ○ Few opportunities for sharing
          ■ Incompatibility
          ■ Dependency concerns
          ■ Improvements in one pipeline not shared across others
  14. Design Principles
      ● Composability
        ○ Ability to put pieces together in novel ways
        ○ Enable construction of generic tools
      ● Portability
        ○ Easily share code online/offline and between applications
        ○ Models, Feature encoders, Common data manipulation
      ● Avoiding Training-Serving Skew
        ○ Serving/Online systems are Java-based, which drives the choice of offline software
        ○ Share code & data between offline/online worlds
  15. AlgoCommons & Boson (diagram): Batch Training over Distributed Spark or Dockerized Containers
      ● Functions: Training; Feature Engineering; Model Quality; Inference & Logging
      ● Component labels: Delorean Time Travel; Feature Generation; Feature Transformers; Label Joins; Feature Schema; Stratification & Sampling; Data Fetchers & utilities; Training API; Model Tuning; Spot Checks (human-in-the-loop); Visualization; Feature Importance; Validation Runs; Training Metrics; Abstractions; Feature Sharing; Component Sets; Data Maps; Feature Encoders; Specification; Common Model Format (JSON); Metrics Framework; Predictions; Inferencing Metrics; Scoring; Model Loading; Inferencing
  16. AlgoCommons
  17. AlgoCommons: Overview
      ● Common abstractions and building blocks for ML
      ● Integrated in Java microservices for online or pre-computed Inferencing
      ● Library > framework (user-focus)
      ● Program to interfaces (composability)
      ● Aggressive modularization to avoid Jar Hell (portability)
      ● Data Access Abstraction (portability, testability)
  18. AlgoCommons: Common abstractions and Building Blocks
      ● Data
        ○ Data Keys
        ○ Data Maps
      ● Modeling
        ○ Component Sets
        ○ Feature Encoders, Predictor, Scorer
        ○ Model Format
      ● Metrics
  19. AlgoCommons: Data Access - Abstractions
      ● DataKey<T>
        ○ Identifies a data value by name/type, e.g. “ViewingHistory”
      ● Data Value
        ○ Preferably an immutable data structure
      ● DataMap
        ○ Map from DataKey<T> to T, plus metadata
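The three abstractions on this slide can be illustrated with a minimal Java sketch. Every class and method name below (`DataKey.of`, `DataMap.with`, `ViewingHistory`) is an assumption made for illustration, not the AlgoCommons API, and the "plus metadata" part of DataMap is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Illustrative sketch only: names are hypothetical, not the AlgoCommons API.

/** Identifies a data value by name and type. */
final class DataKey<T> {
    private final String name;
    private final Class<T> type;

    private DataKey(String name, Class<T> type) {
        this.name = Objects.requireNonNull(name);
        this.type = Objects.requireNonNull(type);
    }

    static <T> DataKey<T> of(String name, Class<T> type) {
        return new DataKey<>(name, type);
    }

    String name() { return name; }
    Class<T> type() { return type; }

    // Two keys are equal when both name and type match.
    @Override public boolean equals(Object o) {
        if (!(o instanceof DataKey)) return false;
        DataKey<?> other = (DataKey<?>) o;
        return name.equals(other.name) && type.equals(other.type);
    }

    @Override public int hashCode() { return Objects.hash(name, type); }
}

/** An immutable data value, as the slide recommends. */
final class ViewingHistory {
    private final List<String> videoIds;

    ViewingHistory(List<String> videoIds) {
        this.videoIds = Collections.unmodifiableList(new ArrayList<>(videoIds));
    }

    List<String> videoIds() { return videoIds; }
}

/** Map from DataKey<T> to T. Metadata is omitted in this sketch. */
final class DataMap {
    private final Map<DataKey<?>, Object> values = new HashMap<>();

    <T> DataMap with(DataKey<T> key, T value) {
        values.put(key, Objects.requireNonNull(value));
        return this;
    }

    <T> T get(DataKey<T> key) {
        Object value = values.get(key);
        if (value == null) {
            throw new IllegalArgumentException("No value for key " + key.name());
        }
        return key.type().cast(value);   // checked cast keeps lookups type-safe
    }
}

final class DataMapDemo {
    static final DataKey<ViewingHistory> VIEWING_HISTORY =
            DataKey.of("ViewingHistory", ViewingHistory.class);

    public static void main(String[] args) {
        DataMap data = new DataMap()
                .with(VIEWING_HISTORY, new ViewingHistory(Arrays.asList("v1", "v2")));
        ViewingHistory history = data.get(VIEWING_HISTORY);   // no cast at the call site
        System.out.println(history.videoIds());
    }
}
```

Carrying the type in the key is one way to get typed lookups from a heterogeneous map; whether AlgoCommons does it this way is not stated in the slides.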
  20. AlgoCommons: Data Access - Lifecycle (sequence diagram between Application, Component Factory, and Component)
      ● Data Retrieval
        ○ Application to Factory: “What DataKeys do you need?”
        ○ Factory to Application: “I need X, Y, and Z”
        ○ Application: Make DataMap w/ X, Y, and Z
      ● Component Instantiation / Data Prep
        ○ f.create(dataMap)
        ○ new Component(X, Y, Z)
        ○ Return comp
      ● Component Application (repeat as needed)
        ○ comp.do(someInput)
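The lifecycle on this slide (ask the factory for its keys, retrieve the data, create the component, apply it repeatedly) might look roughly like the following Java sketch. All names are hypothetical: `ComponentFactory`, `requiredKeys`, `UnseenFilter`, and the in-memory `store` standing in for data retrieval are assumptions, and keys are compared by identity to keep the code short.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; all names here are hypothetical.

final class DataKey<T> {
    final String name;
    final Class<T> type;
    DataKey(String name, Class<T> type) { this.name = name; this.type = type; }
}

final class DataMap {
    private final Map<DataKey<?>, Object> values = new HashMap<>();
    <T> void put(DataKey<T> key, T value) { values.put(key, value); }
    <T> T get(DataKey<T> key) { return key.type.cast(values.get(key)); }
}

/** A factory declares the data it needs, then builds a component from it. */
interface ComponentFactory<C> {
    List<DataKey<?>> requiredKeys();   // "What DataKeys do you need?"
    C create(DataMap data);            // f.create(dataMap)
}

/** Example component: flags videos the member has not watched yet. */
final class UnseenFilter {
    static final DataKey<String[]> VIEWING_HISTORY =
            new DataKey<String[]>("ViewingHistory", String[].class);

    private final Set<String> seen;

    UnseenFilter(String[] history) {
        this.seen = new HashSet<>(Arrays.asList(history));   // data prep happens once
    }

    boolean isUnseen(String videoId) { return !seen.contains(videoId); }

    static final class Factory implements ComponentFactory<UnseenFilter> {
        @Override public List<DataKey<?>> requiredKeys() {
            return Arrays.<DataKey<?>>asList(VIEWING_HISTORY);
        }
        @Override public UnseenFilter create(DataMap data) {
            return new UnseenFilter(data.get(VIEWING_HISTORY));
        }
    }
}

final class LifecycleDemo {
    /** Stand-in for data retrieval: looks raw values up by key name. */
    static DataMap retrieve(List<DataKey<?>> keys, Map<String, Object> store) {
        DataMap data = new DataMap();
        for (DataKey<?> key : keys) {
            load(data, key, store.get(key.name));
        }
        return data;
    }

    // Generic helper so the wildcard key type can be captured as T.
    private static <T> void load(DataMap data, DataKey<T> key, Object raw) {
        data.put(key, key.type.cast(raw));
    }

    public static void main(String[] args) {
        Map<String, Object> store = new HashMap<>();
        store.put("ViewingHistory", new String[] {"v1", "v2"});

        ComponentFactory<UnseenFilter> factory = new UnseenFilter.Factory();
        DataMap data = retrieve(factory.requiredKeys(), store);   // data retrieval
        UnseenFilter filter = factory.create(data);               // instantiation / data prep
        for (String id : new String[] {"v1", "v3"}) {             // application, repeated
            System.out.println(id + " unseen: " + filter.isUnseen(id));
        }
    }
}
```

Because the component never fetches data itself, the same factory can be fed from an online cache or from an offline table, which is presumably what makes the abstraction portable and testable.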
  21. AlgoCommons: DataTransform
      ● DataMap => K/V
      ● Given zero or more key/values, produce a new key/value
      ● Consumable by other data transforms, feature encoders, and components
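As a rough illustration of "DataMap => K/V", the Java sketch below chains two transforms so that the second consumes the output of the first. The names (`DataTransform`, `applyAll`, the keys `HistoryLength` and `IsHeavyViewer`) are invented for this example, and DataMap is simplified to string keys.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Illustrative sketch only; names are hypothetical and DataMap is simplified
// to string keys.

final class DataMap {
    private final Map<String, Object> values;

    DataMap() { this(new HashMap<String, Object>()); }
    private DataMap(Map<String, Object> values) { this.values = values; }

    /** Returns a copy with one more entry, leaving this map unchanged. */
    DataMap with(String key, Object value) {
        Map<String, Object> copy = new HashMap<>(values);
        copy.put(key, value);
        return new DataMap(copy);
    }

    <T> T get(String key, Class<T> type) { return type.cast(values.get(key)); }
}

/** DataMap => key/value: derives one new entry from existing entries. */
final class DataTransform<T> {
    final String outputKey;
    final Function<DataMap, T> compute;

    DataTransform(String outputKey, Function<DataMap, T> compute) {
        this.outputKey = outputKey;
        this.compute = compute;
    }
}

final class DataTransformDemo {
    /** Applies transforms in order so later ones can consume earlier outputs. */
    static DataMap applyAll(DataMap input, List<DataTransform<?>> transforms) {
        DataMap current = input;
        for (DataTransform<?> t : transforms) {
            current = current.with(t.outputKey, t.compute.apply(current));
        }
        return current;
    }

    static List<DataTransform<?>> exampleTransforms() {
        DataTransform<Integer> historyLength = new DataTransform<Integer>(
                "HistoryLength", m -> m.get("ViewingHistory", String[].class).length);
        DataTransform<Boolean> heavyViewer = new DataTransform<Boolean>(
                "IsHeavyViewer", m -> m.get("HistoryLength", Integer.class) >= 3);
        return Arrays.<DataTransform<?>>asList(historyLength, heavyViewer);
    }

    public static void main(String[] args) {
        DataMap input = new DataMap()
                .with("ViewingHistory", new String[] {"v1", "v2", "v3"});
        DataMap output = applyAll(input, exampleTransforms());
        System.out.println(output.get("IsHeavyViewer", Boolean.class));
    }
}
```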
  22. AlgoCommons: Feature Encoder
      ● DataMap ⇒ (T ⇒ FeatureSet)
      ● FeatureEncoder<T> create(DataMap)
        ○ Given a DataMap, initialize a new encoder, doing any required data prep
      ● void encode(T, FeatureSet)
        ○ Given an item (say, a Video), encode features for it into the feature set
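The two-stage shape `DataMap ⇒ (T ⇒ FeatureSet)` can be sketched in Java as a factory that does data prep once and returns an encoder that is then called per item. The `create` and `encode` signatures follow the slide; everything else (`FeatureEncoderFactory`, `FeatureSet`, `Video`, the `WatchedGenres` key, the `genre_match` feature) is a hypothetical stand-in.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; names are hypothetical and DataMap is simplified.

final class DataMap {
    private final Map<String, Object> values = new HashMap<>();
    DataMap with(String key, Object value) { values.put(key, value); return this; }
    <T> T get(String key, Class<T> type) { return type.cast(values.get(key)); }
}

/** Named numeric features for one item. */
final class FeatureSet {
    private final Map<String, Double> features = new LinkedHashMap<>();
    void set(String name, double value) { features.put(name, value); }
    double get(String name) { return features.getOrDefault(name, 0.0); }
}

final class Video {
    final String id;
    final String genre;
    Video(String id, String genre) { this.id = id; this.genre = genre; }
}

/** T => FeatureSet: encodes features for one item. */
interface FeatureEncoder<T> {
    void encode(T item, FeatureSet out);
}

/** DataMap => FeatureEncoder<T>: initializes an encoder from member data. */
interface FeatureEncoderFactory<T> {
    FeatureEncoder<T> create(DataMap data);
}

final class GenreMatchEncoderFactory implements FeatureEncoderFactory<Video> {
    @Override public FeatureEncoder<Video> create(DataMap data) {
        // Data prep runs once per DataMap, not once per item.
        Set<String> watched = new HashSet<>(
                Arrays.asList(data.get("WatchedGenres", String[].class)));
        return (video, out) ->
                out.set("genre_match", watched.contains(video.genre) ? 1.0 : 0.0);
    }
}

final class FeatureEncoderDemo {
    public static void main(String[] args) {
        DataMap data = new DataMap()
                .with("WatchedGenres", new String[] {"comedy", "drama"});
        FeatureEncoder<Video> encoder = new GenreMatchEncoderFactory().create(data);

        FeatureSet features = new FeatureSet();
        encoder.encode(new Video("v1", "comedy"), features);
        System.out.println(features.get("genre_match"));
    }
}
```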
  23. AlgoCommons: Feature Transform
      ● Expression “language” for transforming features to produce new features
        ○ aka Feature Interactions
      ● Many operators available
        ○ log, outer/inner product, arithmetic, logic
      ● Expressions can be arbitrarily “stacked”
      ● Expressions are automatically de-duplicated
  24. AlgoCommons: Predictor
      ● Compute a score for a feature vector
      ● DataMap ⇒ (Vector ⇒ Double)
        ○ Predictor create(DataMap)
          ■ Given a data map, construct a new predictor
        ○ double predict(Vector)
          ■ Given a feature vector, compute a prediction/score
      ● Supports many Predictors:
        ○ LR, RegressionTree, TensorFlow, XGBoost, WeightedAdditiveEnsemble, FeatureWeighted, MultivariatePredictors, BanditPredictor, Sequence-to-sequence, ...
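A minimal Java sketch of `DataMap ⇒ (Vector ⇒ Double)`, using logistic regression (the "LR" entry in the list) as the example predictor. The `create` and `predict` method names follow the slide; `PredictorFactory`, the `Weights` and `Bias` keys, and the use of `double[]` in place of a Vector type are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; names are hypothetical and DataMap is simplified.

final class DataMap {
    private final Map<String, Object> values = new HashMap<>();
    DataMap with(String key, Object value) { values.put(key, value); return this; }
    <T> T get(String key, Class<T> type) { return type.cast(values.get(key)); }
}

/** Vector => Double: scores one feature vector. */
interface Predictor {
    double predict(double[] features);
}

/** DataMap => Predictor: builds a predictor from model data. */
interface PredictorFactory {
    Predictor create(DataMap data);
}

final class LogisticRegressionFactory implements PredictorFactory {
    @Override public Predictor create(DataMap data) {
        // Copy the weights so later changes to the input array cannot affect the model.
        double[] weights = data.get("Weights", double[].class).clone();
        double bias = data.get("Bias", Double.class);

        return features -> {
            if (features.length != weights.length) {
                throw new IllegalArgumentException(
                        "Expected " + weights.length + " features, got " + features.length);
            }
            double z = bias;
            for (int i = 0; i < weights.length; i++) {
                z += weights[i] * features[i];
            }
            return 1.0 / (1.0 + Math.exp(-z));   // logistic (sigmoid) function
        };
    }
}

final class PredictorDemo {
    public static void main(String[] args) {
        DataMap model = new DataMap()
                .with("Weights", new double[] {1.0, -1.0})
                .with("Bias", 0.0);
        Predictor predictor = new LogisticRegressionFactory().create(model);
        System.out.println(predictor.predict(new double[] {3.0, 0.0}));
    }
}
```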
  25. AlgoCommons: Scorer
      ● Compute a score for business objects
      ● DataMap ⇒ (T ⇒ Double)
      ● Scorer<T> create(DataMap)
        ○ Given a data map, construct a new Scorer<T>
      ● double score(T)
        ○ Given an item, compute a score
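One plausible way to read `DataMap ⇒ (T ⇒ Double)` is as the composition of a feature encoder and a predictor, so that callers score business objects directly. The Java sketch below assumes that reading; `ScorerFactory`, `VideoScorerFactory`, the `Weights` key, and the two `Video` fields are hypothetical, and the predictor is a plain weighted sum.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.ToDoubleFunction;

// Illustrative sketch only; names are hypothetical and DataMap is simplified.

final class DataMap {
    private final Map<String, Object> values = new HashMap<>();
    DataMap with(String key, Object value) { values.put(key, value); return this; }
    <T> T get(String key, Class<T> type) { return type.cast(values.get(key)); }
}

final class Video {
    final String id;
    final double popularity;
    final double recency;
    Video(String id, double popularity, double recency) {
        this.id = id;
        this.popularity = popularity;
        this.recency = recency;
    }
}

/** T => Double: scores a business object rather than a raw vector. */
interface Scorer<T> {
    double score(T item);
}

/** DataMap => Scorer<T>. */
interface ScorerFactory<T> {
    Scorer<T> create(DataMap data);
}

final class VideoScorerFactory implements ScorerFactory<Video> {
    @Override public Scorer<Video> create(DataMap data) {
        double[] weights = data.get("Weights", double[].class).clone();

        // Feature encoding step: item => feature vector.
        Function<Video, double[]> encoder =
                v -> new double[] {v.popularity, v.recency};

        // Prediction step: feature vector => score (a weighted sum here).
        ToDoubleFunction<double[]> predictor = x -> {
            double sum = 0.0;
            for (int i = 0; i < weights.length; i++) {
                sum += weights[i] * x[i];
            }
            return sum;
        };

        // The scorer hides both steps behind score(T).
        return video -> predictor.applyAsDouble(encoder.apply(video));
    }
}

final class ScorerDemo {
    public static void main(String[] args) {
        DataMap model = new DataMap().with("Weights", new double[] {2.0, 0.5});
        Scorer<Video> scorer = new VideoScorerFactory().create(model);
        System.out.println(scorer.score(new Video("v1", 3.0, 4.0)));
    }
}
```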
  26. AlgoCommons: Extensible Model Definition
      ● Component abstraction
      ● JSON model serialization
      ● Various “views” of the Model
        ○ Feature gen
        ○ Prediction
        ○ Scoring

      {
        "@id" : "my-model",
        "@schema" : "SimpleFeatureScoringModel",
        "dataTransforms" : [ ... data transforms ...],
        "featureEncoders" : [ ... feature defs ...],
        "featureTransform" : { ... feature interactions ... },
        "predictor" : { ... ML model (weights, etc.) ... }
      }
  27. AlgoCommons: Views of the Feature Scoring Model (diagram)
      ● Pipeline labels: App Data; Data Transform; Feature Encoder; Feature Transform; Predictor
      ● Views: ScoringModelView; DataTransformView; FeatureGeneratorView; PredictorView
  28. AlgoCommons: Metrics
      ● Building blocks
        ○ Accumulators
        ○ Estimators
      ● Ranking
        ○ Precision, Recall
        ○ Recall@Rank, NormalizedMeanReciprocalRank
      ● Regression
        ○ Error Accumulators
        ○ RMSE
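To make the accumulator idea concrete, here is a small Java sketch of three streaming metrics: recall at a rank cutoff, mean reciprocal rank, and RMSE. Class names are hypothetical. The slide's NormalizedMeanReciprocalRank is not defined in the deck, so this sketch implements the plain, unnormalized mean reciprocal rank instead.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; class names are hypothetical.

/** Mean over examples of (relevant items in the top k) / (all relevant items). */
final class RecallAtK {
    private final int k;
    private double sum;
    private long count;

    RecallAtK(int k) {
        if (k <= 0) throw new IllegalArgumentException("k must be positive");
        this.k = k;
    }

    void add(List<String> ranked, Set<String> relevant) {
        if (relevant.isEmpty()) return;   // recall is undefined without relevant items
        int hits = 0;
        for (int i = 0; i < Math.min(k, ranked.size()); i++) {
            if (relevant.contains(ranked.get(i))) hits++;
        }
        sum += (double) hits / relevant.size();
        count++;
    }

    double value() { return count == 0 ? 0.0 : sum / count; }
}

/** Mean over examples of 1 / (rank of the first relevant item), 0 if none. */
final class MeanReciprocalRank {
    private double sum;
    private long count;

    void add(List<String> ranked, Set<String> relevant) {
        double reciprocalRank = 0.0;
        for (int i = 0; i < ranked.size(); i++) {
            if (relevant.contains(ranked.get(i))) {
                reciprocalRank = 1.0 / (i + 1);   // ranks are 1-based
                break;
            }
        }
        sum += reciprocalRank;
        count++;
    }

    double value() { return count == 0 ? 0.0 : sum / count; }
}

/** Root mean squared error, accumulated one prediction at a time. */
final class RmseAccumulator {
    private double sumSquaredError;
    private long count;

    void add(double predicted, double actual) {
        double error = predicted - actual;
        sumSquaredError += error * error;
        count++;
    }

    double value() { return count == 0 ? 0.0 : Math.sqrt(sumSquaredError / count); }
}

final class MetricsDemo {
    public static void main(String[] args) {
        RecallAtK recall = new RecallAtK(2);
        MeanReciprocalRank mrr = new MeanReciprocalRank();

        List<String> ranked = Arrays.asList("a", "b", "c", "d");
        Set<String> relevant = new HashSet<>(Arrays.asList("b", "d"));
        recall.add(ranked, relevant);
        mrr.add(ranked, relevant);

        System.out.println("recall@2 = " + recall.value() + ", MRR = " + mrr.value());
    }
}
```

Keeping only running sums means each accumulator can be updated per example and merged across partitions, which fits the batch and Spark settings the deck describes.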
  29. Motivation: Provide the productivity of this [image], but make it easy to go between prod & experimentation
  30. Overview
      ● A high-level Scala API for ML exploration
      ● Focuses on Offline Training for both
        ○ Ad-hoc exploration
        ○ Production Training
      ● Think “Subset of SKLearn” for the Scala/JVM ecosystem
      ● Spark’s DataFrame is a core data abstraction
  31. Data Utilities
      ● Utilities for data transfer between heterogeneous systems
      ● Leverage Spark for data munging, but need a bridge to Docker Trainers
        ○ Use standalone S3 downloader and Parquet reader
        ○ S3 + s3fs-fuse
        ○ HDFS + hdfs-fuse
      ● On-the-wire format
        ○ Parquet
        ○ Protobuf
  32. Feature Schema
      ● Context: The setting for evaluating a set of items (member profiles, country, etc.)
      ● Items: The elements to be trained on and scored (videos, rows, etc.)
  33. Stratification

      dataframe.stratify(
        samplingRules =
          $("column_foo") == 'US' maxPercent 8.0,
          $("column_bar") > 10 && $("column_qux") > 1 minPercent 0.5,
          …
      )

      ● A generalized API on Spark DataFrames
      ● Native SparkSQL expressions
      ● Emphasis on type-safety
      ● Many stratification attributes: Country, Devices, Searches, ...
  34. Feature Transformers
      ● The feature generation pipeline is a sequence of Transformers
      ● A Transformer takes a DataFrame and, based on contexts, performs computations on it and returns a new DataFrame
      ● Example pipeline: Dataset Type Tagger → Country Tenure Stratified Sampler → Negative Generator → ...
  35. Feature Generation - Putting it together (diagram with numbered steps)
      ● Labels: Fact Store; Label Data; Required Data; Structured Data in DataFrame; Catalyst Expressions; DataMaps; Maps of Data POJO; Feature Encoders; Required Feature; Features; Structured Labeled Features; Feature Model; Model Training; AlgoCommons
  36. Training
      ● Need flexibility and access to trainers in all languages/environments
      ● A simple unified Training API for
        ○ Synchronous & Asynchronous
        ○ Single Docker or Distributed (Spark)
      ● Inputs: Training set as a Spark Dataset, model params
      ● Returns: a Model abstraction wrapper of AlgoCommons PredictorConfig
      ● Can support many popular Trainers: Learning Tools
  37. Metrics
      ● Leverages AlgoCommons Metrics framework
      ● Context Level Metrics
        ○ Supports ranking metrics: nMRR, Recall, nDCG, etc.
        ○ Supports algo-commons models or custom scoring functions
        ○ Users can slice and dice the metrics
        ○ Users can aggregate them using SQL
          ■ Performant implementation using Spark SQL Catalyst expressions
      ● Item Level Metrics
        ○ E.g. row popularity
  38. Visualization: Integrates with [library logo], an open-sourced Scala library for matplotlib-like visualizations
  39. Lessons learnt
      ● Machine learning is an iterative and data-sensitive process
        ○ Make exploration easy, and productionizing robust
        ○ Make it easy to switch between the two
      ● Design components with a general, flexible interface
        ○ Specialize interfaces when you need to
      ● Testing can be hard, but worthwhile
        ○ Unit, Integration, Data Checks, Continuous Integration, @ScaleTesting
        ○ Metric-driven system validations
  40. The Personalization Rainbow (full diagram, repeated from slide 11)
  41. Joy
  42. Thank you! (and yes, we’re hiring) Questions
