SlideShare a Scribd company logo
1 of 35
11
Recommendations for
Building Machine Learning Software
Justin Basilico
Page Algorithms Engineering November 13, 2015
@JustinBasilico
SF 2015
22
Introduction
3
Change of focus
2006 2015
4
Netflix Scale
 > 69M members
 > 50 countries
 > 1000 device types
 > 3B hours/month
 36.4% of peak US
downstream traffic
5
Goal
Help members find content to watch and enjoy
to maximize member satisfaction and retention
6
Everything is a Recommendation
Rows
Ranking
Over 80% of what
people watch
comes from our
recommendations
Recommendations
are driven by
Machine Learning
7
Machine Learning Approach
Problem
Data
AlgorithmModel
Metrics
8
Models & Algorithms
 Regression (linear, logistic, elastic net)
 SVD and other Matrix Factorizations
 Factorization Machines
 Restricted Boltzmann Machines
 Deep Neural Networks
 Markov Models and Graph Algorithms
 Clustering
 Latent Dirichlet Allocation
 Gradient Boosted Decision
Trees/Random Forests
 Gaussian Processes
 …
9
Design Considerations
Recommendations
• Personal
• Accurate
• Diverse
• Novel
• Fresh
Software
• Scalable
• Responsive
• Resilient
• Efficient
• Flexible
10
Software Stack
http://techblog.netflix.com
1111
Recommendations
12
Be flexible about where and when
computation happens
Recommendation 1
13
System Architecture
 Offline: Process data
 Batch learning
 Nearline: Process events
 Model evaluation
 Online learning
 Asynchronous
 Online: Process requests
 Real-time
Netflix.Hermes
Netflix.Manhattan
Nearline
Computation
Models
Online
Data Service
Offline Data
Model
training
Online
Computation
Event Distribution
User Event
Queue
Algorithm
Service
UI Client
Member
Query results
Recommendations
NEARLINE
Machine
Learning
Algorithm
Machine
Learning
Algorithm
Offline
Computation Machine
Learning
Algorithm
Play, Rate,
Browse...
OFFLINE
ONLINE
More details on Netflix Techblog
14
Where to place components?
 Example: Matrix Factorization
 Offline:
 Collect sample of play data
 Run batch learning algorithm like
SGD to produce factorization
 Publish video factors
 Nearline:
 Solve user factors
 Compute user-video dot products
 Store scores in cache
 Online:
 Presentation-context filtering
 Serve recommendations
Netflix.Hermes
Netflix.Manhattan
Nearline
Computation
Models
Online
Data Service
Offline Data
Model
training
Online
Computation
Event Distribution
User Event
Queue
Algorithm
Service
UI Client
Member
Query results
Recommendations
NEARLINE
Machine
Learning
Algorithm
Machine
Learning
Algorithm
Offline
Computation Machine
Learning
Algorithm
Play, Rate,
Browse...
OFFLINE
ONLINE
V
sij=uivj Aui=b
sij
X≈UVt
X
sij>t
15
Think about distribution
starting from the outermost levels
Recommendation 2
16
Three levels of Learning Distribution/Parallelization
1. For each subset of the population (e.g.
region)
 Want independently trained and tuned models
2. For each combination of (hyper)parameters
 Simple: Grid search
 Better: Bayesian optimization using Gaussian
Processes
3. For each subset of the training data
 Distribute over machines (e.g. ADMM)
 Multi-core parallelism (e.g. HogWild)
 Or… use GPUs
17
Example: Training Neural Networks
 Level 1: Machines in different
AWS regions
 Level 2: Machines in same AWS
region
 Spearmint or MOE for parameter
optimization
 Mesos, etc. for coordination
 Level 3: Highly optimized, parallel
CUDA code on GPUs
18
Design application software for
experimentation
Recommendation 3
19
Example development process
Idea Data
Offline
Modeling
(R, Python,
MATLAB, …)
Iterate
Implement in
production
system (Java,
C++, …)
Data
discrepancies
Missing post-
processing
logic
Performance
issues
Actual
output
Experimentation environment
Production environment
(A/B test) Code
discrepancies
Final
model
20
Shared Engine
Avoid dual implementations
Experiment
code
Production
code
ProductionExperiment • Models
• Features
• Algorithms
• …
21
Solution: Share and lean towards production
 Developing machine learning is iterative
 Need a short pipeline to rapidly try ideas
 Want to see output of complete system
 So make the application easy to experiment with
 Share components between online, nearline, and offline
 Use the real code whenever possible
 Have well-defined interfaces and formats to allow you to go
off-the-beaten-path
22
Make algorithms extensible and modular
Recommendation 4
23
Make algorithms and models extensible and modular
 Algorithms often need to be tailored for a
specific application
 Treating an algorithm as a black box is
limiting
 Better to make algorithms extensible and
modular to allow for customization
 Separate models and algorithms
 Many algorithms can learn the same model
(i.e. linear binary classifier)
 Many algorithms can be trained on the same
types of data
 Support composing algorithms
Data
Parameters
Data
Model
Parameters
Model
Algorithm
Vs.
24
Provide building blocks
 Don’t start from scratch
 Linear algebra: Vectors, Matrices, …
 Statistics: Distributions, tests, …
 Models, features, metrics, ensembles, …
 Loss, distance, kernel, … functions
 Optimization, inference, …
 Layers, activation functions, …
 Initializers, stopping criteria, …
 …
 Domain-specific components
Build abstractions on
familiar concepts
Make the software put
them together
25
Example: Tailoring Random Forests
Using Cognitive Foundry: http://github.com/algorithmfoundry/Foundry
Use a custom
tree split
Customize to
run it for an
hour
Report a
custom metric
each iteration
Inspect the
ensemble
26
Describe your input and output
transformations with your model
Recommendation 5
27
Application
Putting learning in an application
Feature
Encoding
Output
Decoding
?
Machine
Learned Model
Rd ⟶ Rk
Application or model code?
28
Example: Simple ranking system
 High-level API: List<Video> rank(User u, List<Video> videos)
 Example model description file:
{
“type”: “ScoringRanker”,
“scorer”: {
“type”: “FeatureScorer”,
“features”: [
{“type”: “Popularity”, “days”: 10},
{“type”: “PredictedRating”}
],
“function”: {
“type”: “Linear”,
“bias”: -0.5,
“weights”: {
“popularity”: 0.2,
“predictedRating”: 1.2,
“predictedRating*popularity”:
3.5
}
}
}
Ranker
Scorer
Features
Linear function
Feature transformations
29
Don’t just rely on metrics for testing
Recommendation 6
30
Machine Learning and Testing
 Temptation: Use validation metrics to test software
 When things work and metrics go up this seems great
 When metrics don’t improve was it the
 code
 data
 metric
 idea
 …?
31
Reality of Testing
 Machine learning code involves intricate math and logic
 Rounding issues, corner cases, …
 Is that a + or -? (The math or paper could be wrong.)
 Solution: Unit test
 Testing of metric code is especially important
 Test the whole system: Just unit testing is not enough
 At a minimum, compare output for unexpected changes across
versions
3232
Conclusions
33
Two ways to solve computational problems
Know
solution
Write code
Compile
code
Test code Deploy code
Know
relevant
data
Develop
algorithmic
approach
Train model
on data using
algorithm
Validate
model with
metrics
Deploy
model
Software Development
Machine Learning
(steps may involve Software Development)
34
Take-aways for building machine learning software
 Building machine learning is an iterative process
 Make experimentation easy
 Take a holistic view of application where you are placing
learning
 Design your algorithms to be modular
 Look for the easy places to parallelize first
 Testing can be hard but is worthwhile
35
Thank You Justin Basilico
jbasilico@netflix.com
@JustinBasilico
We’re hiring

More Related Content

What's hot

MMDS 2014 Talk - Distributing ML Algorithms: from GPUs to the Cloud
MMDS 2014 Talk - Distributing ML Algorithms: from GPUs to the CloudMMDS 2014 Talk - Distributing ML Algorithms: from GPUs to the Cloud
MMDS 2014 Talk - Distributing ML Algorithms: from GPUs to the Cloud
Xavier Amatriain
 
Driver vs Driverless AI - Mark Landry, Competitive Data Scientist and Product...
Driver vs Driverless AI - Mark Landry, Competitive Data Scientist and Product...Driver vs Driverless AI - Mark Landry, Competitive Data Scientist and Product...
Driver vs Driverless AI - Mark Landry, Competitive Data Scientist and Product...
Sri Ambati
 
Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...
Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...
Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...
Sri Ambati
 
MLConf - Emmys, Oscars & Machine Learning Algorithms at Netflix
MLConf - Emmys, Oscars & Machine Learning Algorithms at NetflixMLConf - Emmys, Oscars & Machine Learning Algorithms at Netflix
MLConf - Emmys, Oscars & Machine Learning Algorithms at Netflix
Xavier Amatriain
 

What's hot (20)

Recommendation at Netflix Scale
Recommendation at Netflix ScaleRecommendation at Netflix Scale
Recommendation at Netflix Scale
 
Machine Learning system architecture – Microsoft Translator, a Case Study : ...
Machine Learning system architecture – Microsoft Translator, a Case Study :  ...Machine Learning system architecture – Microsoft Translator, a Case Study :  ...
Machine Learning system architecture – Microsoft Translator, a Case Study : ...
 
MMDS 2014 Talk - Distributing ML Algorithms: from GPUs to the Cloud
MMDS 2014 Talk - Distributing ML Algorithms: from GPUs to the CloudMMDS 2014 Talk - Distributing ML Algorithms: from GPUs to the Cloud
MMDS 2014 Talk - Distributing ML Algorithms: from GPUs to the Cloud
 
A Folksonomy of styles, aka: other stylists also said and Subjective Influenc...
A Folksonomy of styles, aka: other stylists also said and Subjective Influenc...A Folksonomy of styles, aka: other stylists also said and Subjective Influenc...
A Folksonomy of styles, aka: other stylists also said and Subjective Influenc...
 
Driver vs Driverless AI - Mark Landry, Competitive Data Scientist and Product...
Driver vs Driverless AI - Mark Landry, Competitive Data Scientist and Product...Driver vs Driverless AI - Mark Landry, Competitive Data Scientist and Product...
Driver vs Driverless AI - Mark Landry, Competitive Data Scientist and Product...
 
Model Drift Monitoring using Tensorflow Model Analysis
Model Drift Monitoring using Tensorflow Model AnalysisModel Drift Monitoring using Tensorflow Model Analysis
Model Drift Monitoring using Tensorflow Model Analysis
 
Modern Machine Learning Infrastructure and Practices
Modern Machine Learning Infrastructure and PracticesModern Machine Learning Infrastructure and Practices
Modern Machine Learning Infrastructure and Practices
 
[UPDATE] Udacity webinar on Recommendation Systems
[UPDATE] Udacity webinar on Recommendation Systems[UPDATE] Udacity webinar on Recommendation Systems
[UPDATE] Udacity webinar on Recommendation Systems
 
Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...
Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...
Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D...
 
Robust and declarative machine learning pipelines for predictive buying at Ba...
Robust and declarative machine learning pipelines for predictive buying at Ba...Robust and declarative machine learning pipelines for predictive buying at Ba...
Robust and declarative machine learning pipelines for predictive buying at Ba...
 
Machine Learning In Production
Machine Learning In ProductionMachine Learning In Production
Machine Learning In Production
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it Work
 
Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101
 
MLConf - Emmys, Oscars & Machine Learning Algorithms at Netflix
MLConf - Emmys, Oscars & Machine Learning Algorithms at NetflixMLConf - Emmys, Oscars & Machine Learning Algorithms at Netflix
MLConf - Emmys, Oscars & Machine Learning Algorithms at Netflix
 
The Evolution of AutoML
The Evolution of AutoMLThe Evolution of AutoML
The Evolution of AutoML
 
Automated Machine Learning
Automated Machine LearningAutomated Machine Learning
Automated Machine Learning
 
Machine Learning for .NET Developers - ADC21
Machine Learning for .NET Developers - ADC21Machine Learning for .NET Developers - ADC21
Machine Learning for .NET Developers - ADC21
 
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
Strata 2016 -  Lessons Learned from building real-life Machine Learning SystemsStrata 2016 -  Lessons Learned from building real-life Machine Learning Systems
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
 
AutoML - The Future of AI
AutoML - The Future of AIAutoML - The Future of AI
AutoML - The Future of AI
 
General Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsGeneral Tips for participating Kaggle Competitions
General Tips for participating Kaggle Competitions
 

Viewers also liked

Machine Learning
Machine LearningMachine Learning
Machine Learning
Soft Computing
 
Séminaire Expérience Client
Séminaire Expérience ClientSéminaire Expérience Client
Séminaire Expérience Client
Soft Computing
 
Machine Learning et Intelligence Artificielle
Machine Learning et Intelligence ArtificielleMachine Learning et Intelligence Artificielle
Machine Learning et Intelligence Artificielle
Soft Computing
 
Phygital
PhygitalPhygital
Phygital
Soft Computing
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
Lior Rokach
 

Viewers also liked (15)

Past, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry PerspectivePast, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry Perspective
 
BSSML16 L6. Basic Data Transformations
BSSML16 L6. Basic Data TransformationsBSSML16 L6. Basic Data Transformations
BSSML16 L6. Basic Data Transformations
 
BSSML16 L1. Introduction, Models, and Evaluations
BSSML16 L1. Introduction, Models, and EvaluationsBSSML16 L1. Introduction, Models, and Evaluations
BSSML16 L1. Introduction, Models, and Evaluations
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
API, WhizzML and Apps
API, WhizzML and AppsAPI, WhizzML and Apps
API, WhizzML and Apps
 
Web UI, Algorithms, and Feature Engineering
Web UI, Algorithms, and Feature Engineering Web UI, Algorithms, and Feature Engineering
Web UI, Algorithms, and Feature Engineering
 
VSSML16 L2. Ensembles and Logistic Regression
VSSML16 L2. Ensembles and Logistic RegressionVSSML16 L2. Ensembles and Logistic Regression
VSSML16 L2. Ensembles and Logistic Regression
 
BSSML16 L7. Feature Engineering
BSSML16 L7. Feature EngineeringBSSML16 L7. Feature Engineering
BSSML16 L7. Feature Engineering
 
Séminaire Expérience Client
Séminaire Expérience ClientSéminaire Expérience Client
Séminaire Expérience Client
 
Machine Learning et Intelligence Artificielle
Machine Learning et Intelligence ArtificielleMachine Learning et Intelligence Artificielle
Machine Learning et Intelligence Artificielle
 
Données Personnelles
Données PersonnellesDonnées Personnelles
Données Personnelles
 
Recommending for the World
Recommending for the WorldRecommending for the World
Recommending for the World
 
Phygital
PhygitalPhygital
Phygital
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 

Similar to Recommendations for Building Machine Learning Software

Monitoring AI with AI
Monitoring AI with AIMonitoring AI with AI
Monitoring AI with AI
Stepan Pushkarev
 
Danny Bickson - Python based predictive analytics with GraphLab Create
Danny Bickson - Python based predictive analytics with GraphLab Create Danny Bickson - Python based predictive analytics with GraphLab Create
Danny Bickson - Python based predictive analytics with GraphLab Create
PyData
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 

Similar to Recommendations for Building Machine Learning Software (20)

Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at Helixa
 
201906 04 Overview of Automated ML June 2019
201906 04 Overview of Automated ML June 2019201906 04 Overview of Automated ML June 2019
201906 04 Overview of Automated ML June 2019
 
What are the Unique Challenges and Opportunities in Systems for ML?
What are the Unique Challenges and Opportunities in Systems for ML?What are the Unique Challenges and Opportunities in Systems for ML?
What are the Unique Challenges and Opportunities in Systems for ML?
 
Build 2019 Recap
Build 2019 RecapBuild 2019 Recap
Build 2019 Recap
 
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
 
Monitoring AI with AI
Monitoring AI with AIMonitoring AI with AI
Monitoring AI with AI
 
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...
 
Danny Bickson - Python based predictive analytics with GraphLab Create
Danny Bickson - Python based predictive analytics with GraphLab Create Danny Bickson - Python based predictive analytics with GraphLab Create
Danny Bickson - Python based predictive analytics with GraphLab Create
 
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
 
Labview1_ Computer Applications in Control_ACRRL
Labview1_ Computer Applications in Control_ACRRLLabview1_ Computer Applications in Control_ACRRL
Labview1_ Computer Applications in Control_ACRRL
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Presentation Verification & Validation
Presentation Verification & ValidationPresentation Verification & Validation
Presentation Verification & Validation
 
201909 Automated ML for Developers
201909 Automated ML for Developers201909 Automated ML for Developers
201909 Automated ML for Developers
 
Clipper: A Low-Latency Online Prediction Serving System
Clipper: A Low-Latency Online Prediction Serving SystemClipper: A Low-Latency Online Prediction Serving System
Clipper: A Low-Latency Online Prediction Serving System
 
Deploying Data Science Engines to Production
Deploying Data Science Engines to ProductionDeploying Data Science Engines to Production
Deploying Data Science Engines to Production
 
Build, Train, and Deploy ML Models at Scale
Build, Train, and Deploy ML Models at ScaleBuild, Train, and Deploy ML Models at Scale
Build, Train, and Deploy ML Models at Scale
 
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
 
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
 
201908 Overview of Automated ML
201908 Overview of Automated ML201908 Overview of Automated ML
201908 Overview of Automated ML
 
Pitfalls of machine learning in production
Pitfalls of machine learning in productionPitfalls of machine learning in production
Pitfalls of machine learning in production
 

More from Justin Basilico

Artwork Personalization at Netflix
Artwork Personalization at NetflixArtwork Personalization at Netflix
Artwork Personalization at Netflix
Justin Basilico
 

More from Justin Basilico (6)

Recent Trends in Personalization at Netflix
Recent Trends in Personalization at NetflixRecent Trends in Personalization at Netflix
Recent Trends in Personalization at Netflix
 
Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Band...
Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Band...Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Band...
Recap: Designing a more Efficient Estimator for Off-policy Evaluation in Band...
 
Recent Trends in Personalization: A Netflix Perspective
Recent Trends in Personalization: A Netflix PerspectiveRecent Trends in Personalization: A Netflix Perspective
Recent Trends in Personalization: A Netflix Perspective
 
Artwork Personalization at Netflix
Artwork Personalization at NetflixArtwork Personalization at Netflix
Artwork Personalization at Netflix
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 
Déjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender SystemsDéjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender Systems
 

Recently uploaded

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Recently uploaded (20)

Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by Anitaraj
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 

Recommendations for Building Machine Learning Software

  • 1. 11 Recommendations for Building Machine Learning Software Justin Basilico Page Algorithms Engineering November 13, 2015 @JustinBasilico SF 2015
  • 4. 4 Netflix Scale  > 69M members  > 50 countries  > 1000 device types  > 3B hours/month  36.4% of peak US downstream traffic
  • 5. 5 Goal Help members find content to watch and enjoy to maximize member satisfaction and retention
  • 6. 6 Everything is a Recommendation Rows Ranking Over 80% of what people watch comes from our recommendations Recommendations are driven by Machine Learning
  • 8. 8 Models & Algorithms  Regression (linear, logistic, elastic net)  SVD and other Matrix Factorizations  Factorization Machines  Restricted Boltzmann Machines  Deep Neural Networks  Markov Models and Graph Algorithms  Clustering  Latent Dirichlet Allocation  Gradient Boosted Decision Trees/Random Forests  Gaussian Processes  …
  • 9. 9 Design Considerations Recommendations • Personal • Accurate • Diverse • Novel • Fresh Software • Scalable • Responsive • Resilient • Efficient • Flexible
  • 12. 12 Be flexible about where and when computation happens Recommendation 1
  • 13. 13 System Architecture  Offline: Process data  Batch learning  Nearline: Process events  Model evaluation  Online learning  Asynchronous  Online: Process requests  Real-time Netflix.Hermes Netflix.Manhattan Nearline Computation Models Online Data Service Offline Data Model training Online Computation Event Distribution User Event Queue Algorithm Service UI Client Member Query results Recommendations NEARLINE Machine Learning Algorithm Machine Learning Algorithm Offline Computation Machine Learning Algorithm Play, Rate, Browse... OFFLINE ONLINE More details on Netflix Techblog
  • 14. 14 Where to place components?  Example: Matrix Factorization  Offline:  Collect sample of play data  Run batch learning algorithm like SGD to produce factorization  Publish video factors  Nearline:  Solve user factors  Compute user-video dot products  Store scores in cache  Online:  Presentation-context filtering  Serve recommendations Netflix.Hermes Netflix.Manhattan Nearline Computation Models Online Data Service Offline Data Model training Online Computation Event Distribution User Event Queue Algorithm Service UI Client Member Query results Recommendations NEARLINE Machine Learning Algorithm Machine Learning Algorithm Offline Computation Machine Learning Algorithm Play, Rate, Browse... OFFLINE ONLINE V sij=uivj Aui=b sij X≈UVt X sij>t
  • 15. 15 Think about distribution starting from the outermost levels Recommendation 2
  • 16. 16 Three levels of Learning Distribution/Parallelization 1. For each subset of the population (e.g. region)  Want independently trained and tuned models 2. For each combination of (hyper)parameters  Simple: Grid search  Better: Bayesian optimization using Gaussian Processes 3. For each subset of the training data  Distribute over machines (e.g. ADMM)  Multi-core parallelism (e.g. HogWild)  Or… use GPUs
  • 17. 17 Example: Training Neural Networks  Level 1: Machines in different AWS regions  Level 2: Machines in same AWS region  Spearmint or MOE for parameter optimization  Mesos, etc. for coordination  Level 3: Highly optimized, parallel CUDA code on GPUs
  • 18. 18 Design application software for experimentation Recommendation 3
  • 19. 19 Example development process Idea Data Offline Modeling (R, Python, MATLAB, …) Iterate Implement in production system (Java, C++, …) Data discrepancies Missing post- processing logic Performance issues Actual output Experimentation environment Production environment (A/B test) Code discrepancies Final model
  • 20. 20 Shared Engine Avoid dual implementations Experiment code Production code ProductionExperiment • Models • Features • Algorithms • …
  • 21. 21 Solution: Share and lean towards production  Developing machine learning is iterative  Need a short pipeline to rapidly try ideas  Want to see output of complete system  So make the application easy to experiment with  Share components between online, nearline, and offline  Use the real code whenever possible  Have well-defined interfaces and formats to allow you to go off-the-beaten-path
  • 22. 22 Make algorithms extensible and modular Recommendation 4
  • 23. 23 Make algorithms and models extensible and modular  Algorithms often need to be tailored for a specific application  Treating an algorithm as a black box is limiting  Better to make algorithms extensible and modular to allow for customization  Separate models and algorithms  Many algorithms can learn the same model (i.e. linear binary classifier)  Many algorithms can be trained on the same types of data  Support composing algorithms Data Parameters Data Model Parameters Model Algorithm Vs.
  • 24. 24 Provide building blocks  Don’t start from scratch  Linear algebra: Vectors, Matrices, …  Statistics: Distributions, tests, …  Models, features, metrics, ensembles, …  Loss, distance, kernel, … functions  Optimization, inference, …  Layers, activation functions, …  Initializers, stopping criteria, …  …  Domain-specific components Build abstractions on familiar concepts Make the software put them together
  • 25. 25 Example: Tailoring Random Forests Using Cognitive Foundry: http://github.com/algorithmfoundry/Foundry Use a custom tree split Customize to run it for an hour Report a custom metric each iteration Inspect the ensemble
  • 26. 26 Describe your input and output transformations with your model Recommendation 5
  • 27. 27 Application Putting learning in an application Feature Encoding Output Decoding ? Machine Learned Model Rd ⟶ Rk Application or model code?
  • 28. 28 Example: Simple ranking system  High-level API: List<Video> rank(User u, List<Video> videos)  Example model description file: { “type”: “ScoringRanker”, “scorer”: { “type”: “FeatureScorer”, “features”: [ {“type”: “Popularity”, “days”: 10}, {“type”: “PredictedRating”} ], “function”: { “type”: “Linear”, “bias”: -0.5, “weights”: { “popularity”: 0.2, “predictedRating”: 1.2, “predictedRating*popularity”: 3.5 } } } Ranker Scorer Features Linear function Feature transformations
  • 29. 29 Don’t just rely on metrics for testing Recommendation 6
  • 30. 30 Machine Learning and Testing  Temptation: Use validation metrics to test software  When things work and metrics go up this seems great  When metrics don’t improve was it the  code  data  metric  idea  …?
  • 31. 31 Reality of Testing  Machine learning code involves intricate math and logic  Rounding issues, corner cases, …  Is that a + or -? (The math or paper could be wrong.)  Solution: Unit test  Testing of metric code is especially important  Test the whole system: Just unit testing is not enough  At a minimum, compare output for unexpected changes across versions
  • 33. 33 Two ways to solve computational problems Know solution Write code Compile code Test code Deploy code Know relevant data Develop algorithmic approach Train model on data using algorithm Validate model with metrics Deploy model Software Development Machine Learning (steps may involve Software Development)
  • 34. 34 Take-aways for building machine learning software  Building machine learning is an iterative process  Make experimentation easy  Take a holistic view of application where you are placing learning  Design your algorithms to be modular  Look for the easy places to parallelize first  Testing can be hard but is worthwhile
  • 35. 35 Thank You Justin Basilico jbasilico@netflix.com @JustinBasilico We’re hiring

Editor's Notes

  1. http://techblog.netflix.com/2013/03/system-architectures-for.html
  2. http://techblog.netflix.com/2014/02/distributed-neural-networks-with-gpus.html