SlideShare a Scribd company logo
1 of 43
Download to read offline
Faisal Siddiqi (@faisalzs)
12 Apr 2018
Machine Learning Infra for
Recommendations
?
?
Create Personalized recommendations for discoveries
of engaging video content that maximize member joy
Goal
Create Personalized recommendations for discoveries
of engaging video content that maximize member joy
Goal
Outline for today
● Domain Context
● ML Infra stack for Personalization
● Deeper dives into 2 major ML Infra components
Personalize everything!
Row
Selection &
Order
Titles ranked by relevance
Artwork!
Personalization Context
Data from
Millions of
users
Training
pipelines
Models
Precompute System
Rankings
Online
caches
AB Test
Allocation
Joy == Rainbow
Training
Data
Preparation
Training
Feature
Engineering
Model
Quality
Intent To Treat
(Serving)
Treatment &
Action
Inference &
Logging
The
Personalization
Rainbow
Online
Device
Function
Offline
Personalization systems & infrastructure
Control Plane
Label
Generation
Fact Store
Training
Data
Preparation
Training
Feature
Engineering
Model
Quality
Intent To Treat
(Serving)
Treatment &
Action
Hyperparameter
Optimization
N
O
T
E
B
O
O
K
S
Caching
Dynamic param
management
Inference &
Logging
A/B Testing
Platform Online &
Precompute
Framework
Personalization
Aggregation
Fact
Logging
Device
Logging
Online
Services
API
The
Personalization
Rainbow
Control Plane
Online
Device
Function
Offline
Personalization systems & infrastructure
Boson
Algo Commons
O
R
C
H
E
S
T
R
A
T
I
O
N
Training
Feature
Engineering
Model
Quality
Inference &
Logging
Boson
Algo Commons
We’ll zoom into Boson & AlgoCommons today
The Context for AlgoCommons & Boson
● Machine Learning via ‘loosely-coupled, highly-aligned’ Scala/Java Libraries
● Historical context
○ Siloed machine learning infrastructure
○ Few opportunities for sharing
■ Incompatibility
■ Dependency concerns
■ Improvements in one pipeline not shared across others
Design Principles
● Composability
○ Ability to put pieces together in novel ways
○ Enable construction of generic tools
● Portability
○ Easily share code online/offline and between applications
○ Models, Feature encoders, Common data manipulation
● Avoiding Training-Serving Skew
○ Serving/Online systems are Java based, drives choice of offline software
○ Share code & data between offline/online worlds
Training
Feature
Engineering
Model
Quality
Inference &
Logging
Delorean
Time Travel
Feature Generation
Feature Transformers
Label Joins
Feature Schema
Stratification & Sampling
Data Fetchers & utilities
Training
API
Model
Tuning
Boson
AlgoCommons
Spot Checks (human-in-the-loop)
Visualization
Feature Importance
Validation Runs
Training Metrics
Abstractions
Feature Sharing
Component Sets
Data Maps
Feature Encoders
Specification
Common Model Format
(JSON)
Metrics Framework
Predictions
Inferencing Metrics
Scoring
Model Loading
InferencingAlgoCommons
& Boson
Batch Training over Distributed
Spark or Dockerized Containers
AlgoCommons
● Common abstractions and building blocks for ML
● Integrated in Java microservices
for online or pre-computed Inferencing
● Library > framework (user-focus)
● Program to interfaces (composability)
● Aggressive modularization to avoid Jar Hell (portability)
● Data Access Abstraction (portability, testability)
Overview AlgoCommons
Common abstractions and Building Blocks
● Data
○ Data Keys
○ Data Maps
● Modeling
○ Component Sets
○ Feature Encoders, Predictor, Scorer
○ Model Format
● Metrics
AlgoCommons
DataKey<T>
○ Identifies a data value by name/type e.g “ViewingHistory”
Data Value
○ Preferably immutable data structure
DataMap
○ Map from DataKey<T> to T, plus metadata
Data Access - Abstractions AlgoCommons
Data Access - Lifecycle
Application Component
Factory
Component
What DataKeys do
you need?
I need X, Y, and Z
f.create(dataMap)
new Component(X, Y, Z)
Return comp
comp.do(someInput)
Make
DataMap w/
X, Y, and Z
Data Retrieval
Component Instantiation /
Data Prep
Component Application
(repeat as needed)
AlgoCommons
DataTransform
● DataMap => K/V
● Given zero or more key/values, produce a new key/value
● Consumable by other data transforms, feature encoders, and components
AlgoCommons
Feature Encoder
● DataMap ⇒ (T ⇒ FeatureSet)
● FeatureEncoder<T> create(DataMap)
○ Given a DataMap, initialize a new encoder doing any required data prep
● void encode(T, FeatureSet)
○ Given an item (say, a Video), encode features for it into the feature set
AlgoCommons
Feature Transform
● Expression “language” for transforming features to produce new features
○ aka Feature Interactions
● Many operators available
○ log, outer/inner product, arithmetic, logic
● Expressions can be arbitrarily “stacked”
● Expressions are automatically DeDuped
AlgoCommons
Predictor
● Compute a score for a feature vector
● DataMap ⇒ (Vector ⇒ Double)
○ Predictor create(DataMap)
■ Given a data map, construct a new predictor
○ double predict(Vector)
■ Given a feature vector, compute a prediction/score
● Supports many Predictors:
○ LR, RegressionTree, TensorFlow, XGBoost,
WeightedAdditiveEnsemble, FeatureWeighted, MultivariatePredictors,
BanditPredictor, Sequence-to-sequence,...
AlgoCommons
Scorer
● Compute a score for business objects
● DataMap ⇒ (T ⇒ Double)
● Scorer<T> create(DataMap)
○ Given a data map, construct a new Scorer<T>.
● double score(T)
○ Given an item, compute a score
AlgoCommons
Extensible Model Definition
● Component abstraction
● JSON model serialization
● Various “views” of the Model
○ Feature gen
○ Prediction
○ Scoring
{
"@id" : "my-model",
"@schema" : "SimpleFeatureScoringModel",
"dataTransforms" : [ ... data transforms ...],
"featureEncoders" : [ ... feature defs ...],
"featureTransform" : { ... feature interactions ... },
"predictor" : { ... ML model (weights, etc.) ... }
}
AlgoCommons
Data
Transform
Data
Transform
Feature
Encoder
Feature
Transform
App
Data
Feature
Encoder
Feature
Encoder
Predictor
ScoringModelView
DataTransformView
FeatureGeneratorView
PredictorView
Views of the Feature Scoring Model
AlgoCommons
Metrics
● Building blocks
○ Accumulators
○ Estimators
● Ranking
○ Precision, Recall
○ Recall@Rank, NormalizedMeanReciprocalRank
● Regression
○ Error Accumulators
○ RMSE
AlgoCommons
Motivation
Provide the productivity of this
But make it easy to go between
prod & experimentation
Overview
● A high level Scala API for ML exploration
● Focuses on Offline Training for both
○ Ad-hoc exploration
○ Production Training
● Think “Subset of SKLearn” for Scala/JVM ecosystem
● Spark’s dataframe a core data abstraction
Data Utilities
● Utilities for data transfer between heterogeneous systems
● Leverage Spark for data munging, but need bridge to Docker Trainers
○ Use standalone s3 downloader and parquet reader
○ S3 + s3fs-fuse
○ HDFS + hdfs-fuse
● On the wire format
○ Parquet
○ Protobuf
Feature Schema
● Context
The setting for evaluating a set of items (member profiles, country, etc.)
● Items
The elements to be trained on and scored (videos, rows, etc.)
Stratification
dataframe.stratify (samplingRules =
$(“column_foo”) == ‘US’ maxPercent 8.0,
$(“column_bar”) > 10 && $(“column_qux”) > 1 minPercent 0.5,
…
)
A generalized API on Spark Dataframes
Native SparkSQL expressions
Emphasis on type-safety
Many stratification attributes: Country, Devices, Searches,...
Feature Transformers
The feature generation pipeline is a sequence of Transformers
A Transformer takes a dataframe, and based on contexts performs computations
on and returns a new data frame.
Dataset Type Tagger
→ Country Tenure Stratified Sampler
→ Negative Generator
→ ….
Feature Generation - Putting it together
Model Training
Structured Labeled Features
Feature Model
Structured Data in
DataFrame
Feature Encoders
Required
Feature
Maps of Data
POJO
Features
Required Data
Label Data
Catalyst
Expressions
AlgoCommons
Fact Store
Structured Labeled
Features
Required
Feature
DataMaps
Features
Required
Data
1
2
24
5
6
7
Training
● Need flexibility and access to trainers in all languages/environments
● A simple unified Training API for
○ Synchronous & Asynchronous
○ Single Docker or Distributed (Spark)
● Inputs: Trainingset as a Spark Dataset, model params
● Returns: a Model abstraction wrapper of AlgoCommons PredictorConfig
● Can support many popular Trainers:
Learning Tools
Metrics
● Leverages AlgoCommons Metrics framework
● Context Level Metrics
○ Supports ranking metrics: nMRR, Recall, nDCG, etc.
○ Supports algo-commons models or custom scoring functions
○ Users can slice and dice the metrics
○ Users can aggregate them using SQL
■ Performant implementation using Spark SQL catalyst expressions
● Item Level Metrics
○ E.g. row popularity
Visualization
Integrates with
- a Scala library for
matplotlib like visualizations
Open-sourced
Lessons learnt
● Machine learning is an iterative and data sensitive process
○ Make exploration easy, and productionizing robust
○ Make it easy to go switch between the two
● Design components with a general flexible interface
○ Specialize interfaces when you need to
● Testing can be hard, but worthwhile
○ Unit, Integration, Data Checks, Continuous Integration, @ScaleTesting
○ Metric driven system validations
Label
Generation
Fact Store
Training
Data
Preparation
Training
Feature
Engineering
Model
Quality
Intent To Treat
(Serving)
Treatment &
Action
Hyperparameter
Optimization
N
O
T
E
B
O
O
K
S
Caching
Dynamic param
management
Inference &
Logging
A/B Testing
Platform Online &
Precompute
Framework
Personalization
Aggregation
Fact
Logging
Device
Logging
Online
Services
API
The
Personalization
Rainbow
Control Plane
Online
Device
Function
Offline
Personalization systems & infrastructure
Boson
Algo Commons
O
R
C
H
E
S
T
R
A
T
I
O
N
Joy
Thank you!
(and yes, we’re hiring)
Questions

More Related Content

What's hot

Models in Minutes using AutoML
Models in Minutes using AutoMLModels in Minutes using AutoML
Models in Minutes using AutoMLBill Liu
 
“Houston, we have a model...” Introduction to MLOps
“Houston, we have a model...” Introduction to MLOps“Houston, we have a model...” Introduction to MLOps
“Houston, we have a model...” Introduction to MLOpsRui Quintino
 
Managed Feature Store for Machine Learning
Managed Feature Store for Machine LearningManaged Feature Store for Machine Learning
Managed Feature Store for Machine LearningLogical Clocks
 
ML Platform Q1 Meetup: End to-end Feature Analysis, Validation and Transforma...
ML Platform Q1 Meetup: End to-end Feature Analysis, Validation and Transforma...ML Platform Q1 Meetup: End to-end Feature Analysis, Validation and Transforma...
ML Platform Q1 Meetup: End to-end Feature Analysis, Validation and Transforma...Fei Chen
 
The A-Z of Data: Introduction to MLOps
The A-Z of Data: Introduction to MLOpsThe A-Z of Data: Introduction to MLOps
The A-Z of Data: Introduction to MLOpsDataPhoenix
 
Extracting information from images using deep learning and transfer learning ...
Extracting information from images using deep learning and transfer learning ...Extracting information from images using deep learning and transfer learning ...
Extracting information from images using deep learning and transfer learning ...PAPIs.io
 
mlflow: Accelerating the End-to-End ML lifecycle
mlflow: Accelerating the End-to-End ML lifecyclemlflow: Accelerating the End-to-End ML lifecycle
mlflow: Accelerating the End-to-End ML lifecycleDatabricks
 
StreamSQL Feature Store (Apache Pulsar Summit)
StreamSQL Feature Store (Apache Pulsar Summit)StreamSQL Feature Store (Apache Pulsar Summit)
StreamSQL Feature Store (Apache Pulsar Summit)Simba Khadder
 
Accelerating Production Machine Learning with MLflow with Matei Zaharia
Accelerating Production Machine Learning with MLflow with Matei ZahariaAccelerating Production Machine Learning with MLflow with Matei Zaharia
Accelerating Production Machine Learning with MLflow with Matei ZahariaDatabricks
 
Building A Machine Learning Platform At Quora (1)
Building A Machine Learning Platform At Quora (1)Building A Machine Learning Platform At Quora (1)
Building A Machine Learning Platform At Quora (1)Nikhil Garg
 
High Performance Transfer Learning for Classifying Intent of Sales Engagement...
High Performance Transfer Learning for Classifying Intent of Sales Engagement...High Performance Transfer Learning for Classifying Intent of Sales Engagement...
High Performance Transfer Learning for Classifying Intent of Sales Engagement...Databricks
 
Near real-time anomaly detection at Lyft
Near real-time anomaly detection at LyftNear real-time anomaly detection at Lyft
Near real-time anomaly detection at Lyftmarkgrover
 
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...PAPIs.io
 
Seamless End-to-End Production Machine Learning with Seldon and MLflow
 Seamless End-to-End Production Machine Learning with Seldon and MLflow Seamless End-to-End Production Machine Learning with Seldon and MLflow
Seamless End-to-End Production Machine Learning with Seldon and MLflowDatabricks
 
Challenges of Deep Learning in the Automotive Industry and Autonomous Driving
Challenges of Deep Learning in the Automotive Industry and Autonomous DrivingChallenges of Deep Learning in the Automotive Industry and Autonomous Driving
Challenges of Deep Learning in the Automotive Industry and Autonomous DrivingJan Wiegelmann
 
Using Machine Learning & Artificial Intelligence to Create Impactful Customer...
Using Machine Learning & Artificial Intelligence to Create Impactful Customer...Using Machine Learning & Artificial Intelligence to Create Impactful Customer...
Using Machine Learning & Artificial Intelligence to Create Impactful Customer...Costanoa Ventures
 
API, WhizzML and Apps
API, WhizzML and AppsAPI, WhizzML and Apps
API, WhizzML and AppsBigML, Inc
 
What's Next for MLflow in 2019
What's Next for MLflow in 2019What's Next for MLflow in 2019
What's Next for MLflow in 2019Anyscale
 
Hopsworks - The Platform for Data-Intensive AI
Hopsworks - The Platform for Data-Intensive AIHopsworks - The Platform for Data-Intensive AI
Hopsworks - The Platform for Data-Intensive AIQAware GmbH
 

What's hot (20)

Models in Minutes using AutoML
Models in Minutes using AutoMLModels in Minutes using AutoML
Models in Minutes using AutoML
 
“Houston, we have a model...” Introduction to MLOps
“Houston, we have a model...” Introduction to MLOps“Houston, we have a model...” Introduction to MLOps
“Houston, we have a model...” Introduction to MLOps
 
Managed Feature Store for Machine Learning
Managed Feature Store for Machine LearningManaged Feature Store for Machine Learning
Managed Feature Store for Machine Learning
 
ML Platform Q1 Meetup: End to-end Feature Analysis, Validation and Transforma...
ML Platform Q1 Meetup: End to-end Feature Analysis, Validation and Transforma...ML Platform Q1 Meetup: End to-end Feature Analysis, Validation and Transforma...
ML Platform Q1 Meetup: End to-end Feature Analysis, Validation and Transforma...
 
The A-Z of Data: Introduction to MLOps
The A-Z of Data: Introduction to MLOpsThe A-Z of Data: Introduction to MLOps
The A-Z of Data: Introduction to MLOps
 
Extracting information from images using deep learning and transfer learning ...
Extracting information from images using deep learning and transfer learning ...Extracting information from images using deep learning and transfer learning ...
Extracting information from images using deep learning and transfer learning ...
 
mlflow: Accelerating the End-to-End ML lifecycle
mlflow: Accelerating the End-to-End ML lifecyclemlflow: Accelerating the End-to-End ML lifecycle
mlflow: Accelerating the End-to-End ML lifecycle
 
StreamSQL Feature Store (Apache Pulsar Summit)
StreamSQL Feature Store (Apache Pulsar Summit)StreamSQL Feature Store (Apache Pulsar Summit)
StreamSQL Feature Store (Apache Pulsar Summit)
 
Monitoring AI with AI
Monitoring AI with AIMonitoring AI with AI
Monitoring AI with AI
 
Accelerating Production Machine Learning with MLflow with Matei Zaharia
Accelerating Production Machine Learning with MLflow with Matei ZahariaAccelerating Production Machine Learning with MLflow with Matei Zaharia
Accelerating Production Machine Learning with MLflow with Matei Zaharia
 
Building A Machine Learning Platform At Quora (1)
Building A Machine Learning Platform At Quora (1)Building A Machine Learning Platform At Quora (1)
Building A Machine Learning Platform At Quora (1)
 
High Performance Transfer Learning for Classifying Intent of Sales Engagement...
High Performance Transfer Learning for Classifying Intent of Sales Engagement...High Performance Transfer Learning for Classifying Intent of Sales Engagement...
High Performance Transfer Learning for Classifying Intent of Sales Engagement...
 
Near real-time anomaly detection at Lyft
Near real-time anomaly detection at LyftNear real-time anomaly detection at Lyft
Near real-time anomaly detection at Lyft
 
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
 
Seamless End-to-End Production Machine Learning with Seldon and MLflow
 Seamless End-to-End Production Machine Learning with Seldon and MLflow Seamless End-to-End Production Machine Learning with Seldon and MLflow
Seamless End-to-End Production Machine Learning with Seldon and MLflow
 
Challenges of Deep Learning in the Automotive Industry and Autonomous Driving
Challenges of Deep Learning in the Automotive Industry and Autonomous DrivingChallenges of Deep Learning in the Automotive Industry and Autonomous Driving
Challenges of Deep Learning in the Automotive Industry and Autonomous Driving
 
Using Machine Learning & Artificial Intelligence to Create Impactful Customer...
Using Machine Learning & Artificial Intelligence to Create Impactful Customer...Using Machine Learning & Artificial Intelligence to Create Impactful Customer...
Using Machine Learning & Artificial Intelligence to Create Impactful Customer...
 
API, WhizzML and Apps
API, WhizzML and AppsAPI, WhizzML and Apps
API, WhizzML and Apps
 
What's Next for MLflow in 2019
What's Next for MLflow in 2019What's Next for MLflow in 2019
What's Next for MLflow in 2019
 
Hopsworks - The Platform for Data-Intensive AI
Hopsworks - The Platform for Data-Intensive AIHopsworks - The Platform for Data-Intensive AI
Hopsworks - The Platform for Data-Intensive AI
 

Similar to Netflix Machine Learning Infra for Recommendations - 2018

Ml ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science MeetupMl ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science MeetupJim Dowling
 
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...Databricks
 
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...Chetan Khatri
 
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedIn
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedInGrokking TechTalk #29: Building Realtime Metrics Platform at LinkedIn
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedInGrokking VN
 
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scalaAutomate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scalaChetan Khatri
 
GraphQL Summit 2019 - Configuration Driven Data as a Service Gateway with Gra...
GraphQL Summit 2019 - Configuration Driven Data as a Service Gateway with Gra...GraphQL Summit 2019 - Configuration Driven Data as a Service Gateway with Gra...
GraphQL Summit 2019 - Configuration Driven Data as a Service Gateway with Gra...Noriaki Tatsumi
 
Productionalizing ML : Real Experience
Productionalizing ML : Real ExperienceProductionalizing ML : Real Experience
Productionalizing ML : Real ExperienceIhor Bobak
 
Shaping serverless architecture with domain driven design patterns - py web-il
Shaping serverless architecture with domain driven design patterns - py web-ilShaping serverless architecture with domain driven design patterns - py web-il
Shaping serverless architecture with domain driven design patterns - py web-ilAsher Sterkin
 
Machine Learning Pipelines - Joseph Bradley - Databricks
Machine Learning Pipelines - Joseph Bradley - DatabricksMachine Learning Pipelines - Joseph Bradley - Databricks
Machine Learning Pipelines - Joseph Bradley - DatabricksSpark Summit
 
Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...
Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...
Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...Khai Tran
 
Multiple graphs in openCypher
Multiple graphs in openCypherMultiple graphs in openCypher
Multiple graphs in openCypheropenCypher
 
Machine learning pipeline with spark ml
Machine learning pipeline with spark mlMachine learning pipeline with spark ml
Machine learning pipeline with spark mldatamantra
 
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...Amazon Web Services
 
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo MazzaferroRESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo MazzaferroPyData
 
Airframe: Lightweight Building Blocks for Scala @ TD Tech Talk 2018-10-17
Airframe: Lightweight Building Blocks for Scala @ TD Tech Talk 2018-10-17Airframe: Lightweight Building Blocks for Scala @ TD Tech Talk 2018-10-17
Airframe: Lightweight Building Blocks for Scala @ TD Tech Talk 2018-10-17Taro L. Saito
 
Ml programming with python
Ml programming with pythonMl programming with python
Ml programming with pythonKumud Arora
 
Practical Machine Learning Pipelines with MLlib
Practical Machine Learning Pipelines with MLlibPractical Machine Learning Pipelines with MLlib
Practical Machine Learning Pipelines with MLlibDatabricks
 
Performance Optimization of Recommendation Training Pipeline at Netflix DB Ts...
Performance Optimization of Recommendation Training Pipeline at Netflix DB Ts...Performance Optimization of Recommendation Training Pipeline at Netflix DB Ts...
Performance Optimization of Recommendation Training Pipeline at Netflix DB Ts...Databricks
 
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...James Anderson
 
Multiple Graphs: Updatable Views
Multiple Graphs: Updatable ViewsMultiple Graphs: Updatable Views
Multiple Graphs: Updatable ViewsopenCypher
 

Similar to Netflix Machine Learning Infra for Recommendations - 2018 (20)

Ml ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science MeetupMl ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science Meetup
 
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
 
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
 
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedIn
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedInGrokking TechTalk #29: Building Realtime Metrics Platform at LinkedIn
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedIn
 
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scalaAutomate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
 
GraphQL Summit 2019 - Configuration Driven Data as a Service Gateway with Gra...
GraphQL Summit 2019 - Configuration Driven Data as a Service Gateway with Gra...GraphQL Summit 2019 - Configuration Driven Data as a Service Gateway with Gra...
GraphQL Summit 2019 - Configuration Driven Data as a Service Gateway with Gra...
 
Productionalizing ML : Real Experience
Productionalizing ML : Real ExperienceProductionalizing ML : Real Experience
Productionalizing ML : Real Experience
 
Shaping serverless architecture with domain driven design patterns - py web-il
Shaping serverless architecture with domain driven design patterns - py web-ilShaping serverless architecture with domain driven design patterns - py web-il
Shaping serverless architecture with domain driven design patterns - py web-il
 
Machine Learning Pipelines - Joseph Bradley - Databricks
Machine Learning Pipelines - Joseph Bradley - DatabricksMachine Learning Pipelines - Joseph Bradley - Databricks
Machine Learning Pipelines - Joseph Bradley - Databricks
 
Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...
Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...
Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...
 
Multiple graphs in openCypher
Multiple graphs in openCypherMultiple graphs in openCypher
Multiple graphs in openCypher
 
Machine learning pipeline with spark ml
Machine learning pipeline with spark mlMachine learning pipeline with spark ml
Machine learning pipeline with spark ml
 
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
 
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo MazzaferroRESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
 
Airframe: Lightweight Building Blocks for Scala @ TD Tech Talk 2018-10-17
Airframe: Lightweight Building Blocks for Scala @ TD Tech Talk 2018-10-17Airframe: Lightweight Building Blocks for Scala @ TD Tech Talk 2018-10-17
Airframe: Lightweight Building Blocks for Scala @ TD Tech Talk 2018-10-17
 
Ml programming with python
Ml programming with pythonMl programming with python
Ml programming with python
 
Practical Machine Learning Pipelines with MLlib
Practical Machine Learning Pipelines with MLlibPractical Machine Learning Pipelines with MLlib
Practical Machine Learning Pipelines with MLlib
 
Performance Optimization of Recommendation Training Pipeline at Netflix DB Ts...
Performance Optimization of Recommendation Training Pipeline at Netflix DB Ts...Performance Optimization of Recommendation Training Pipeline at Netflix DB Ts...
Performance Optimization of Recommendation Training Pipeline at Netflix DB Ts...
 
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
 
Multiple Graphs: Updatable Views
Multiple Graphs: Updatable ViewsMultiple Graphs: Updatable Views
Multiple Graphs: Updatable Views
 

More from Karthik Murugesan

Rakuten - Recommendation Platform
Rakuten - Recommendation PlatformRakuten - Recommendation Platform
Rakuten - Recommendation PlatformKarthik Murugesan
 
Yahoo's Knowledge Graph - 2014 slides
Yahoo's Knowledge Graph - 2014 slidesYahoo's Knowledge Graph - 2014 slides
Yahoo's Knowledge Graph - 2014 slidesKarthik Murugesan
 
Free servers to build Big Data Systems on: Bing's Approach
Free servers to build Big Data Systems on: Bing's  Approach Free servers to build Big Data Systems on: Bing's  Approach
Free servers to build Big Data Systems on: Bing's Approach Karthik Murugesan
 
Microsoft AI Platform - AETHER Introduction
Microsoft AI Platform - AETHER IntroductionMicrosoft AI Platform - AETHER Introduction
Microsoft AI Platform - AETHER IntroductionKarthik Murugesan
 
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2Karthik Murugesan
 
Lyft data Platform - 2019 slides
Lyft data Platform - 2019 slidesLyft data Platform - 2019 slides
Lyft data Platform - 2019 slidesKarthik Murugesan
 
The Evolution of Spotify Home Architecture - Qcon 2019
The Evolution of Spotify Home Architecture - Qcon 2019The Evolution of Spotify Home Architecture - Qcon 2019
The Evolution of Spotify Home Architecture - Qcon 2019Karthik Murugesan
 
The magic behind your Lyft ride prices: A case study on machine learning and ...
The magic behind your Lyft ride prices: A case study on machine learning and ...The magic behind your Lyft ride prices: A case study on machine learning and ...
The magic behind your Lyft ride prices: A case study on machine learning and ...Karthik Murugesan
 
The journey toward a self-service data platform at Netflix - sf 2019
The journey toward a self-service data platform at Netflix - sf 2019The journey toward a self-service data platform at Netflix - sf 2019
The journey toward a self-service data platform at Netflix - sf 2019Karthik Murugesan
 
2019 Slides - Michelangelo Palette: A Feature Engineering Platform at Uber
2019 Slides - Michelangelo Palette: A Feature Engineering Platform at Uber2019 Slides - Michelangelo Palette: A Feature Engineering Platform at Uber
2019 Slides - Michelangelo Palette: A Feature Engineering Platform at UberKarthik Murugesan
 
Developing a ML model using TF Estimator
Developing a ML model using TF EstimatorDeveloping a ML model using TF Estimator
Developing a ML model using TF EstimatorKarthik Murugesan
 
Production Model Deployment - StitchFix - 2018
Production Model Deployment - StitchFix - 2018Production Model Deployment - StitchFix - 2018
Production Model Deployment - StitchFix - 2018Karthik Murugesan
 
Netflix factstore for recommendations - 2018
Netflix factstore  for recommendations - 2018Netflix factstore  for recommendations - 2018
Netflix factstore for recommendations - 2018Karthik Murugesan
 
Trends in Music Recommendations 2018
Trends in Music Recommendations 2018Trends in Music Recommendations 2018
Trends in Music Recommendations 2018Karthik Murugesan
 
Netflix Ads Personalization Solution - 2017
Netflix Ads Personalization Solution - 2017Netflix Ads Personalization Solution - 2017
Netflix Ads Personalization Solution - 2017Karthik Murugesan
 
Spotify Machine Learning Solution for Music Discovery
Spotify Machine Learning Solution for Music DiscoverySpotify Machine Learning Solution for Music Discovery
Spotify Machine Learning Solution for Music DiscoveryKarthik Murugesan
 
AirBNB - Zipline: Airbnb’s Machine Learning Data Management Platform
AirBNB - Zipline: Airbnb’s Machine Learning Data Management Platform AirBNB - Zipline: Airbnb’s Machine Learning Data Management Platform
AirBNB - Zipline: Airbnb’s Machine Learning Data Management Platform Karthik Murugesan
 
Uber - Building Intelligent Applications, Experimental ML with Uber’s Data Sc...
Uber - Building Intelligent Applications, Experimental ML with Uber’s Data Sc...Uber - Building Intelligent Applications, Experimental ML with Uber’s Data Sc...
Uber - Building Intelligent Applications, Experimental ML with Uber’s Data Sc...Karthik Murugesan
 

More from Karthik Murugesan (20)

Rakuten - Recommendation Platform
Rakuten - Recommendation PlatformRakuten - Recommendation Platform
Rakuten - Recommendation Platform
 
Yahoo's Knowledge Graph - 2014 slides
Yahoo's Knowledge Graph - 2014 slidesYahoo's Knowledge Graph - 2014 slides
Yahoo's Knowledge Graph - 2014 slides
 
Free servers to build Big Data Systems on: Bing's Approach
Free servers to build Big Data Systems on: Bing's  Approach Free servers to build Big Data Systems on: Bing's  Approach
Free servers to build Big Data Systems on: Bing's Approach
 
Microsoft cosmos
Microsoft cosmosMicrosoft cosmos
Microsoft cosmos
 
Microsoft AI Platform - AETHER Introduction
Microsoft AI Platform - AETHER IntroductionMicrosoft AI Platform - AETHER Introduction
Microsoft AI Platform - AETHER Introduction
 
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
 
Lyft data Platform - 2019 slides
Lyft data Platform - 2019 slidesLyft data Platform - 2019 slides
Lyft data Platform - 2019 slides
 
The Evolution of Spotify Home Architecture - Qcon 2019
The Evolution of Spotify Home Architecture - Qcon 2019The Evolution of Spotify Home Architecture - Qcon 2019
The Evolution of Spotify Home Architecture - Qcon 2019
 
The magic behind your Lyft ride prices: A case study on machine learning and ...
The magic behind your Lyft ride prices: A case study on machine learning and ...The magic behind your Lyft ride prices: A case study on machine learning and ...
The magic behind your Lyft ride prices: A case study on machine learning and ...
 
The journey toward a self-service data platform at Netflix - sf 2019
The journey toward a self-service data platform at Netflix - sf 2019The journey toward a self-service data platform at Netflix - sf 2019
The journey toward a self-service data platform at Netflix - sf 2019
 
2019 Slides - Michelangelo Palette: A Feature Engineering Platform at Uber
2019 Slides - Michelangelo Palette: A Feature Engineering Platform at Uber2019 Slides - Michelangelo Palette: A Feature Engineering Platform at Uber
2019 Slides - Michelangelo Palette: A Feature Engineering Platform at Uber
 
Developing a ML model using TF Estimator
Developing a ML model using TF EstimatorDeveloping a ML model using TF Estimator
Developing a ML model using TF Estimator
 
Production Model Deployment - StitchFix - 2018
Production Model Deployment - StitchFix - 2018Production Model Deployment - StitchFix - 2018
Production Model Deployment - StitchFix - 2018
 
Netflix factstore for recommendations - 2018
Netflix factstore  for recommendations - 2018Netflix factstore  for recommendations - 2018
Netflix factstore for recommendations - 2018
 
Trends in Music Recommendations 2018
Trends in Music Recommendations 2018Trends in Music Recommendations 2018
Trends in Music Recommendations 2018
 
Netflix Ads Personalization Solution - 2017
Netflix Ads Personalization Solution - 2017Netflix Ads Personalization Solution - 2017
Netflix Ads Personalization Solution - 2017
 
State Of AI 2018
State Of AI 2018State Of AI 2018
State Of AI 2018
 
Spotify Machine Learning Solution for Music Discovery
Spotify Machine Learning Solution for Music DiscoverySpotify Machine Learning Solution for Music Discovery
Spotify Machine Learning Solution for Music Discovery
 
AirBNB - Zipline: Airbnb’s Machine Learning Data Management Platform
AirBNB - Zipline: Airbnb’s Machine Learning Data Management Platform AirBNB - Zipline: Airbnb’s Machine Learning Data Management Platform
AirBNB - Zipline: Airbnb’s Machine Learning Data Management Platform
 
Uber - Building Intelligent Applications, Experimental ML with Uber’s Data Sc...
Uber - Building Intelligent Applications, Experimental ML with Uber’s Data Sc...Uber - Building Intelligent Applications, Experimental ML with Uber’s Data Sc...
Uber - Building Intelligent Applications, Experimental ML with Uber’s Data Sc...
 

Recently uploaded

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 

Recently uploaded (20)

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 

Netflix Machine Learning Infra for Recommendations - 2018

  • 1. Faisal Siddiqi (@faisalzs) 12 Apr 2018 Machine Learning Infra for Recommendations
  • 2. ?
  • 3. ?
  • 4. Create Personalized recommendations for discoveries of engaging video content that maximize member joy Goal
  • 5. Create Personalized recommendations for discoveries of engaging video content that maximize member joy Goal
  • 6. Outline for today ● Domain Context ● ML Infra stack for Personalization ● Deeper dives into 2 major ML Infra components
  • 8. Personalization Context Data from Millions of users Training pipelines Models Precompute System Rankings Online caches AB Test Allocation
  • 10. Training Data Preparation Training Feature Engineering Model Quality Intent To Treat (Serving) Treatment & Action Inference & Logging The Personalization Rainbow Online Device Function Offline Personalization systems & infrastructure Control Plane
  • 11. Label Generation Fact Store Training Data Preparation Training Feature Engineering Model Quality Intent To Treat (Serving) Treatment & Action Hyperparameter Optimization N O T E B O O K S Caching Dynamic param management Inference & Logging A/B Testing Platform Online & Precompute Framework Personalization Aggregation Fact Logging Device Logging Online Services API The Personalization Rainbow Control Plane Online Device Function Offline Personalization systems & infrastructure Boson Algo Commons O R C H E S T R A T I O N
  • 13. The Context for AlgoCommons & Boson ● Machine Learning via ‘loosely-coupled, highly-aligned’ Scala/Java Libraries ● Historical context ○ Siloed machine learning infrastructure ○ Few opportunities for sharing ■ Incompatibility ■ Dependency concerns ■ Improvements in one pipeline not shared across others
  • 14. Design Principles ● Composability ○ Ability to put pieces together in novel ways ○ Enable construction of generic tools ● Portability ○ Easily share code online/offline and between applications ○ Models, Feature encoders, Common data manipulation ● Avoiding Training-Serving Skew ○ Serving/Online systems are Java based, drives choice of offline software ○ Share code & data between offline/online worlds
  • 15. Training Feature Engineering Model Quality Inference & Logging Delorean Time Travel Feature Generation Feature Transformers Label Joins Feature Schema Stratification & Sampling Data Fetchers & utilities Training API Model Tuning Boson AlgoCommons Spot Checks (human-in-the-loop) Visualization Feature Importance Validation Runs Training Metrics Abstractions Feature Sharing Component Sets Data Maps Feature Encoders Specification Common Model Format (JSON) Metrics Framework Predictions Inferencing Metrics Scoring Model Loading InferencingAlgoCommons & Boson Batch Training over Distributed Spark or Dockerized Containers
  • 17. ● Common abstractions and building blocks for ML ● Integrated in Java microservices for online or pre-computed Inferencing ● Library > framework (user-focus) ● Program to interfaces (composability) ● Aggressive modularization to avoid Jar Hell (portability) ● Data Access Abstraction (portability, testability) Overview AlgoCommons
  • 18. Common abstractions and Building Blocks ● Data ○ Data Keys ○ Data Maps ● Modeling ○ Component Sets ○ Feature Encoders, Predictor, Scorer ○ Model Format ● Metrics AlgoCommons
  • 19. DataKey<T> ○ Identifies a data value by name/type e.g “ViewingHistory” Data Value ○ Preferably immutable data structure DataMap ○ Map from DataKey<T> to T, plus metadata Data Access - Abstractions AlgoCommons
  • 20. Data Access - Lifecycle Application Component Factory Component What DataKeys do you need? I need X, Y, and Z f.create(dataMap) new Component(X, Y, Z) Return comp comp.do(someInput) Make DataMap w/ X, Y, and Z Data Retrieval Component Instantiation / Data Prep Component Application (repeat as needed) AlgoCommons
  • 21. DataTransform ● DataMap => K/V ● Given zero or more key/values, produce a new key/value ● Consumable by other data transforms, feature encoders, and components AlgoCommons
  • 22. Feature Encoder ● DataMap ⇒ (T ⇒ FeatureSet) ● FeatureEncoder<T> create(DataMap) ○ Given a DataMap, initialize a new encoder doing any required data prep ● void encode(T, FeatureSet) ○ Given an item (say, a Video), encode features for it into the feature set AlgoCommons
  • 23. Feature Transform ● Expression “language” for transforming features to produce new features ○ aka Feature Interactions ● Many operators available ○ log, outer/inner product, arithmetic, logic ● Expressions can be arbitrarily “stacked” ● Expressions are automatically DeDuped AlgoCommons
  • 24. Predictor ● Compute a score for a feature vector ● DataMap ⇒ (Vector ⇒ Double) ○ Predictor create(DataMap) ■ Given a data map, construct a new predictor ○ double predict(Vector) ■ Given a feature vector, compute a prediction/score ● Supports many Predictors: ○ LR, RegressionTree, TensorFlow, XGBoost, WeightedAdditiveEnsemble, FeatureWeighted, MultivariatePredictors, BanditPredictor, Sequence-to-sequence,... AlgoCommons
  • 25. Scorer ● Compute a score for business objects ● DataMap ⇒ (T ⇒ Double) ● Scorer<T> create(DataMap) ○ Given a data map, construct a new Scorer<T>. ● double score(T) ○ Given an item, compute a score AlgoCommons
  • 26. Extensible Model Definition ● Component abstraction ● JSON model serialization ● Various “views” of the Model ○ Feature gen ○ Prediction ○ Scoring { "@id" : "my-model", "@schema" : "SimpleFeatureScoringModel", "dataTransforms" : [ ... data transforms ...], "featureEncoders" : [ ... feature defs ...], "featureTransform" : { ... feature interactions ... }, "predictor" : { ... ML model (weights, etc.) ... } } AlgoCommons
  • 28. Metrics ● Building blocks ○ Accumulators ○ Estimators ● Ranking ○ Precision, Recall ○ Recall@Rank, NormalizedMeanReciprocalRank ● Regression ○ Error Accumulators ○ RMSE AlgoCommons
  • 29.
  • 30. Motivation Provide the productivity of this But make it easy to go between prod & experimentation
  • 31. Overview ● A high level Scala API for ML exploration ● Focuses on Offline Training for both ○ Ad-hoc exploration ○ Production Training ● Think “Subset of SKLearn” for Scala/JVM ecosystem ● Spark’s dataframe a core data abstraction
  • 32. Data Utilities ● Utilities for data transfer between heterogeneous systems ● Leverage Spark for data munging, but need bridge to Docker Trainers ○ Use standalone s3 downloader and parquet reader ○ S3 + s3fs-fuse ○ HDFS + hdfs-fuse ● On the wire format ○ Parquet ○ Protobuf
  • 33. Feature Schema ● Context The setting for evaluating a set of items (member profiles, country, etc.) ● Items The elements to be trained on and scored (videos, rows, etc.)
  • 34. Stratification dataframe.stratify (samplingRules = $(“column_foo”) == ‘US’ maxPercent 8.0, $(“column_bar”) > 10 && $(“column_qux”) > 1 minPercent 0.5, … ) A generalized API on Spark Dataframes Native SparkSQL expressions Emphasis on type-safety Many stratification attributes: Country, Devices, Searches,...
  • 35. Feature Transformers The feature generation pipeline is a sequence of Transformers A Transformer takes a dataframe, and based on contexts performs computations on and returns a new data frame. Dataset Type Tagger → Country Tenure Stratified Sampler → Negative Generator → ….
  • 36. Feature Generation - Putting it together Model Training Structured Labeled Features Feature Model Structured Data in DataFrame Feature Encoders Required Feature Maps of Data POJO Features Required Data Label Data Catalyst Expressions AlgoCommons Fact Store Structured Labeled Features Required Feature DataMaps Features Required Data 1 2 24 5 6 7
  • 37. Training ● Need flexibility and access to trainers in all languages/environments ● A simple unified Training API for ○ Synchronous & Asynchronous ○ Single Docker or Distributed (Spark) ● Inputs: Trainingset as a Spark Dataset, model params ● Returns: a Model abstraction wrapper of AlgoCommons PredictorConfig ● Can support many popular Trainers: Learning Tools
  • 38. Metrics ● Leverages AlgoCommons Metrics framework ● Context Level Metrics ○ Supports ranking metrics: nMRR, Recall, nDCG, etc. ○ Supports algo-commons models or custom scoring functions ○ Users can slice and dice the metrics ○ Users can aggregate them using SQL ■ Performant implementation using Spark SQL catalyst expressions ● Item Level Metrics ○ E.g. row popularity
  • 39. Visualization Integrates with - a Scala library for matplotlib like visualizations Open-sourced
  • 40. Lessons learnt ● Machine learning is an iterative and data sensitive process ○ Make exploration easy, and productionizing robust ○ Make it easy to go switch between the two ● Design components with a general flexible interface ○ Specialize interfaces when you need to ● Testing can be hard, but worthwhile ○ Unit, Integration, Data Checks, Continuous Integration, @ScaleTesting ○ Metric driven system validations
  • 41. Label Generation Fact Store Training Data Preparation Training Feature Engineering Model Quality Intent To Treat (Serving) Treatment & Action Hyperparameter Optimization N O T E B O O K S Caching Dynamic param management Inference & Logging A/B Testing Platform Online & Precompute Framework Personalization Aggregation Fact Logging Device Logging Online Services API The Personalization Rainbow Control Plane Online Device Function Offline Personalization systems & infrastructure Boson Algo Commons O R C H E S T R A T I O N
  • 42. Joy
  • 43. Thank you! (and yes, we’re hiring) Questions