11. The Personalization Rainbow
[Architecture diagram: the end-to-end personalization stack, organized into Offline, Online, and Device layers, with Notebooks and Orchestration running alongside. Components include: Fact Store, Label Generation, Training Data Preparation, Feature Engineering, Training, Hyperparameter Optimization, Model Quality, Inference & Logging, Intent To Treat (Serving), Treatment & Action, Caching, Dynamic param management, A/B Testing Platform, Online & Precompute Framework, Personalization Aggregation, Fact Logging, Device Logging, Online Services API, and a Control Plane. Boson and Algo Commons sit beneath these as shared personalization systems & infrastructure.]
13. The Context for AlgoCommons & Boson
● Machine Learning via ‘loosely-coupled, highly-aligned’ Scala/Java Libraries
● Historical context
○ Siloed machine learning infrastructure
○ Few opportunities for sharing
■ Incompatibility
■ Dependency concerns
■ Improvements in one pipeline not shared across others
14. Design Principles
● Composability
○ Ability to put pieces together in novel ways
○ Enable construction of generic tools
● Portability
○ Easily share code online/offline and between applications
○ Models, Feature encoders, Common data manipulation
● Avoiding Training-Serving Skew
○ Serving/online systems are Java-based, which drives the choice of offline software
○ Share code & data between offline/online worlds
15. Boson & AlgoCommons across the training stack
[Diagram mapping the offline stages (Training, Feature Engineering, Model Quality, Inference & Logging) to library components.
Boson: Training API, Model Tuning, Delorean Time Travel, Feature Generation, Feature Transformers, Label Joins, Feature Schema, Stratification & Sampling, Data Fetchers & utilities, Spot Checks (human-in-the-loop), Visualization, Feature Importance, Validation Runs, Training Metrics.
AlgoCommons: Abstractions, Feature Sharing, Component Sets, Data Maps, Feature Encoders, Specification, Common Model Format (JSON), Metrics Framework, Predictions, Inferencing Metrics, Scoring, Model Loading.
Inferencing uses AlgoCommons & Boson; batch training runs over distributed Spark or Dockerized containers.]
17. AlgoCommons Overview
● Common abstractions and building blocks for ML
● Integrated in Java microservices for online or pre-computed inferencing
● Library > framework (user focus)
● Program to interfaces (composability)
● Aggressive modularization to avoid Jar Hell (portability)
● Data Access Abstraction (portability, testability)
18. Common abstractions and Building Blocks
● Data
○ Data Keys
○ Data Maps
● Modeling
○ Component Sets
○ Feature Encoders, Predictor, Scorer
○ Model Format
● Metrics
19. Data Access Abstractions
● DataKey<T>
○ Identifies a data value by name/type, e.g. "ViewingHistory"
● Data Value
○ Preferably an immutable data structure
● DataMap
○ Map from DataKey<T> to T, plus metadata
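A minimal sketch of how these abstractions fit together in plain Java. This is a hypothetical illustration of the idea, not the real AlgoCommons types; all names and signatures here are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the real AlgoCommons API.
// A DataKey identifies a data value by name and type.
final class DataKey<T> {
    final String name;
    final Class<T> type;
    DataKey(String name, Class<T> type) { this.name = name; this.type = type; }
}

// A DataMap maps DataKey<T> to T; the type parameter lets get() return
// the value with its static type, so consumers need no casting.
final class DataMap {
    private final Map<DataKey<?>, Object> values = new HashMap<>();
    <T> DataMap put(DataKey<T> key, T value) { values.put(key, value); return this; }
    <T> T get(DataKey<T> key) { return key.type.cast(values.get(key)); }
}
```

With this shape, `new DataMap().put(country, "US")` stores a value that a later `get(country)` recovers as a `String` without any cast at the call site.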
20. Data Access - Lifecycle
[Sequence diagram between Application, Component Factory, and Component:
1. Data Retrieval - the factory is asked "What DataKeys do you need?", answers "I need X, Y, and Z", and the application makes a DataMap with X, Y, and Z.
2. Component Instantiation / Data Prep - f.create(dataMap) runs new Component(X, Y, Z) and returns the component.
3. Component Application - comp.do(someInput), repeated as needed.]
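The lifecycle above can be sketched as follows, with `Map<String, ?>` standing in for a DataMap. All names here ("weight", "bias", the factory class) are illustrative assumptions, not AlgoCommons APIs.

```java
import java.util.List;
import java.util.Map;
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch of the factory lifecycle: declare keys, receive a
// DataMap holding those values, build the component once, then apply it.
final class LinearComponentFactory {
    // Step 1 (Data Retrieval): "What DataKeys do you need?" -> "weight" and "bias"
    static List<String> requiredKeys() { return List.of("weight", "bias"); }

    // Step 2 (Component Instantiation / Data Prep): f.create(dataMap)
    static DoubleUnaryOperator create(Map<String, ?> dataMap) {
        double w = (Double) dataMap.get("weight");
        double b = (Double) dataMap.get("bias");
        // Step 3 (Component Application): comp.do(someInput), repeat as needed
        return x -> w * x + b;
    }
}
```

The point of the split is that data fetching happens once, up front, and the constructed component is then a pure function that can be applied repeatedly.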
21. DataTransform
● DataMap ⇒ K/V
● Given zero or more key/values, produce a new key/value
● Consumable by other data transforms, feature encoders, and components
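A DataTransform of this shape might look like the sketch below (hypothetical names; `Map<String, ?>` stands in for a DataMap):

```java
import java.util.Map;

// Hypothetical sketch: a DataTransform derives a new key/value from zero
// or more existing key/values, for consumption by later transforms,
// feature encoders, and components.
interface DataTransform {
    String outputKey();
    Object apply(Map<String, ?> dataMap);
}

// Illustrative example: derive member tenure in days from two existing values.
final class TenureDaysTransform implements DataTransform {
    public String outputKey() { return "tenureDays"; }
    public Object apply(Map<String, ?> dataMap) {
        long signup = (Long) dataMap.get("signupEpochDay");
        long today = (Long) dataMap.get("todayEpochDay");
        return today - signup;
    }
}
```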
22. Feature Encoder
● DataMap ⇒ (T ⇒ FeatureSet)
● FeatureEncoder<T> create(DataMap)
○ Given a DataMap, initialize a new encoder doing any required data prep
● void encode(T, FeatureSet)
○ Given an item (say, a Video), encode features for it into the feature set
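The two-phase contract above, sketched as a one-hot genre encoder. This is a hypothetical illustration, not the real API: the class name, the "genreIndex" key, and the use of `Map<String, Double>` as a stand-in FeatureSet are all assumptions.

```java
import java.util.Map;

// Hypothetical sketch: create(DataMap) does any one-time data prep;
// encode(item, featureSet) is then called once per item.
final class GenreOneHotEncoder {
    private final Map<String, Integer> genreIndex;  // prepared once at create()

    private GenreOneHotEncoder(Map<String, Integer> genreIndex) {
        this.genreIndex = genreIndex;
    }

    @SuppressWarnings("unchecked")
    static GenreOneHotEncoder create(Map<String, ?> dataMap) {
        return new GenreOneHotEncoder((Map<String, Integer>) dataMap.get("genreIndex"));
    }

    // Given an item (say, a video's genre), encode features for it
    // into the feature set.
    void encode(String genre, Map<String, Double> featureSet) {
        Integer i = genreIndex.get(genre);
        if (i != null) featureSet.put("genre_" + i, 1.0);
    }
}
```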
23. Feature Transform
● Expression “language” for transforming features to produce new features
○ aka Feature Interactions
● Many operators available
○ log, outer/inner product, arithmetic, logic
● Expressions can be arbitrarily “stacked”
● Expressions are automatically DeDuped
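One way to picture stacked expressions with automatic de-duplication (a hypothetical sketch, not the AlgoCommons expression language): each node has a structural key, and evaluation caches by key, so structurally identical sub-expressions are computed once.

```java
import java.util.Map;
import java.util.function.DoubleBinaryOperator;

// Hypothetical sketch: expression nodes with structural keys; a shared
// cache de-dupes evaluation of identical sub-expressions.
abstract class FeatureExpr {
    abstract String key();   // structural identity used for de-duping
    abstract double eval(Map<String, Double> features, Map<String, Double> cache);
}

final class Raw extends FeatureExpr {
    final String name;
    Raw(String name) { this.name = name; }
    String key() { return name; }
    double eval(Map<String, Double> f, Map<String, Double> cache) { return f.get(name); }
}

final class BinOp extends FeatureExpr {
    final String op; final FeatureExpr l, r; final DoubleBinaryOperator fn;
    BinOp(String op, FeatureExpr l, FeatureExpr r, DoubleBinaryOperator fn) {
        this.op = op; this.l = l; this.r = r; this.fn = fn;
    }
    String key() { return op + "(" + l.key() + "," + r.key() + ")"; }
    double eval(Map<String, Double> f, Map<String, Double> cache) {
        Double hit = cache.get(key());   // de-dupe: reuse a prior result
        if (hit != null) return hit;
        double v = fn.applyAsDouble(l.eval(f, cache), r.eval(f, cache));
        cache.put(key(), v);
        return v;
    }
}
```

With this shape, an outer product like (a+b)*(a+b) evaluates a+b only once, however deeply the expressions are stacked.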
24. Predictor
● Compute a score for a feature vector
● DataMap ⇒ (Vector ⇒ Double)
○ Predictor create(DataMap)
■ Given a data map, construct a new predictor
○ double predict(Vector)
■ Given a feature vector, compute a prediction/score
● Supports many Predictors:
○ LR, RegressionTree, TensorFlow, XGBoost, WeightedAdditiveEnsemble, FeatureWeighted, MultivariatePredictors, BanditPredictor, Sequence-to-sequence, ...
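A sketch of the Predictor contract with a logistic-regression-style implementation as the example. This is hypothetical: `double[]` stands in for the feature Vector, and the class and key names are assumptions, not the real API.

```java
import java.util.Map;

// Hypothetical sketch of DataMap => (Vector => Double).
interface Predictor {
    double predict(double[] vector);
}

final class LinearPredictor implements Predictor {
    private final double[] weights;
    private final double bias;

    LinearPredictor(double[] weights, double bias) {
        this.weights = weights; this.bias = bias;
    }

    // Predictor create(DataMap): given a data map, construct a new predictor.
    static Predictor create(Map<String, ?> dataMap) {
        return new LinearPredictor((double[]) dataMap.get("weights"),
                                   (Double) dataMap.get("bias"));
    }

    // double predict(Vector): given a feature vector, compute a score.
    public double predict(double[] v) {
        double z = bias;
        for (int i = 0; i < weights.length; i++) z += weights[i] * v[i];
        return 1.0 / (1.0 + Math.exp(-z));   // sigmoid, for an LR-style score
    }
}
```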
25. Scorer
● Compute a score for business objects
● DataMap ⇒ (T ⇒ Double)
● Scorer<T> create(DataMap)
○ Given a data map, construct a new Scorer<T>.
● double score(T)
○ Given an item, compute a score
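Conceptually, a Scorer<T> is feature encoding composed with prediction, so callers score business objects directly. The sketch below is a hypothetical illustration of that composition, with plain functional interfaces standing in for the encoder and predictor.

```java
import java.util.function.DoubleUnaryOperator;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch of DataMap => (T => Double) as a composition:
// encode the item to a feature value, then predict on it.
final class ComposedScorer<T> {
    private final ToDoubleFunction<T> encode;     // stand-in for a FeatureEncoder
    private final DoubleUnaryOperator predict;    // stand-in for a Predictor

    ComposedScorer(ToDoubleFunction<T> encode, DoubleUnaryOperator predict) {
        this.encode = encode; this.predict = predict;
    }

    // double score(T): given an item, compute a score.
    double score(T item) { return predict.applyAsDouble(encode.applyAsDouble(item)); }
}
```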
26. Extensible Model Definition
● Component abstraction
● JSON model serialization
● Various “views” of the Model
○ Feature gen
○ Prediction
○ Scoring
{
"@id" : "my-model",
"@schema" : "SimpleFeatureScoringModel",
"dataTransforms" : [ ... data transforms ...],
"featureEncoders" : [ ... feature defs ...],
"featureTransform" : { ... feature interactions ... },
"predictor" : { ... ML model (weights, etc.) ... }
}
31. Boson Overview
● A high-level Scala API for ML exploration
● Focuses on Offline Training for both
○ Ad-hoc exploration
○ Production Training
● Think "a subset of scikit-learn" for the Scala/JVM ecosystem
● Spark's DataFrame is a core data abstraction
32. Data Utilities
● Utilities for data transfer between heterogeneous systems
● Leverage Spark for data munging, but need a bridge to Docker trainers
○ Use standalone s3 downloader and parquet reader
○ S3 + s3fs-fuse
○ HDFS + hdfs-fuse
● On-the-wire formats
○ Parquet
○ Protobuf
33. Feature Schema
● Context: the setting for evaluating a set of items (member profiles, country, etc.)
● Items: the elements to be trained on and scored (videos, rows, etc.)
34. Stratification
dataframe.stratify(samplingRules =
  $("column_foo") == 'US' maxPercent 8.0,
  $("column_bar") > 10 && $("column_qux") > 1 minPercent 0.5,
  ...
)
A generalized API on Spark DataFrames
Native SparkSQL expressions
Emphasis on type-safety
Many stratification attributes: Country, Devices, Searches, ...
35. Feature Transformers
The feature generation pipeline is a sequence of Transformers. A Transformer takes a DataFrame and, based on the contexts, performs computations and returns a new DataFrame.
Dataset Type Tagger → Country Tenure Stratified Sampler → Negative Generator → ...
36. Feature Generation - Putting it together
[Pipeline diagram: structured data from the Fact Store lands in a Spark DataFrame; Catalyst expressions extract the required data and label data into DataMaps (maps of data POJOs); AlgoCommons Feature Encoders consume the required data to produce features; features and labels are assembled into structured labeled features, which feed model training together with the feature model.]
37. Training
● Need flexibility and access to trainers in all languages/environments
● A simple unified Training API for
○ Synchronous & Asynchronous
○ Single Docker or Distributed (Spark)
● Inputs: a training set as a Spark Dataset, plus model params
● Returns: a Model abstraction wrapping an AlgoCommons PredictorConfig
● Can support many popular trainers [image: logos of learning tools]
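A unified sync/async training API of the kind described above might look like this sketch. Everything here is a hypothetical stand-in: `TrainingSet`, `Model`, and the trivial mean-label trainer are illustrative, not the real Boson/AlgoCommons types.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-ins for the real abstractions.
record TrainingSet(double[][] features, double[] labels) {}
record Model(double[] params) {}

// One interface covers both synchronous and asynchronous training;
// async wraps the sync path, so trainers only implement train().
interface Trainer<P> {
    Model train(TrainingSet data, P params);                              // synchronous

    default CompletableFuture<Model> trainAsync(TrainingSet data, P params) {
        return CompletableFuture.supplyAsync(() -> train(data, params));  // asynchronous
    }
}

// Trivial example trainer: a constant model predicting the mean label.
final class MeanTrainer implements Trainer<Void> {
    public Model train(TrainingSet data, Void params) {
        double sum = 0;
        for (double y : data.labels()) sum += y;
        return new Model(new double[]{ sum / data.labels().length });
    }
}
```

The design point being sketched: whether training runs in a single Docker container or distributed on Spark, callers see the same `train`/`trainAsync` surface and get back a Model abstraction.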
38. Metrics
● Leverages AlgoCommons Metrics framework
● Context Level Metrics
○ Supports ranking metrics: nMRR, Recall, nDCG, etc.
○ Supports algo-commons models or custom scoring functions
○ Users can slice and dice the metrics
○ Users can aggregate them using SQL
■ Performant implementation using Spark SQL catalyst expressions
● Item Level Metrics
○ E.g. row popularity
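As a concrete instance of a context-level ranking metric, here is a sketch of Recall@k. This is a generic textbook formulation for illustration, not the AlgoCommons metrics framework.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of one ranking metric from the family above.
final class RankingMetrics {
    // Recall@k: the fraction of relevant items that appear in the
    // top k positions of a ranked list.
    static double recallAtK(List<String> ranked, Set<String> relevant, int k) {
        if (relevant.isEmpty()) return 0.0;
        long hits = ranked.stream().limit(k).filter(relevant::contains).count();
        return (double) hits / relevant.size();
    }
}
```

A per-context value like this is what would then be sliced, diced, and aggregated with SQL across contexts.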
40. Lessons learnt
● Machine learning is an iterative and data sensitive process
○ Make exploration easy, and productionizing robust
○ Make it easy to switch between the two
● Design components with a general flexible interface
○ Specialize interfaces when you need to
● Testing can be hard, but worthwhile
○ Unit, Integration, Data Checks, Continuous Integration, @ScaleTesting
○ Metric driven system validations
41. The Personalization Rainbow (recap)
[Repeat of the architecture diagram from slide 11: the Offline/Online/Device personalization stack with Notebooks and Orchestration alongside, and Boson and Algo Commons as the shared systems & infrastructure layer.]