SlideShare a Scribd company logo
10 Lessons Learned from 
building ML systems 
Xavier Amatriain - Director Algorithms Engineering 
November 2014
Machine Learning 
@ Netflix
10 Lessons Learned from Building Machine Learning Systems
Netflix Scale 
▪ > 50M members 
▪ > 40 countries 
▪ > 1000 device types 
▪ > 7B hours in Q2 2014 
▪ Plays: > 70M/day 
▪ Searches: > 4M/day 
▪ Ratings: > 6M/day 
▪ Log 100B events/day 
▪ 31.62% of peak US downstream 
traffic
Smart Models ■ Regression models (Logistic, 
Linear, Elastic nets) 
■ GBDT/RF 
■ SVD & other MF models 
■ Factorization Machines 
■ Restricted Boltzmann Machines 
■ Markov Chains & other graphical 
models 
■ Clustering (from k-means to 
HDP) 
■ Deep ANN 
■ LDA 
■ Association Rules 
■ …
10 Lessons
1. More data vs. & Better Models
More data or better models? 
Really? 
Anand Rajaraman: Former Stanford Prof. & 
Senior VP at Walmart
More data or better models? 
Sometimes, it’s not 
about more data
More data or better models? 
[Banko and Brill, 2001] 
Norvig: “Google does not 
have better Algorithms, 
only more Data” 
Many features/ 
low-bias models
More data or better models? 
Sometimes, it’s not 
about more data
2. You might not need all your 
Big Data
How useful is Big Data 
■ “Everybody” has Big Data 
■ But not everybody needs it 
■ E.g. Do you need many millions of users if the goal is to 
compute a MF of, say, 100 factors? 
■ Many times, doing some kind of smart (e.g. stratified) 
sampling can produce as good or even better results as 
using it all
3. The fact that a more complex 
model does not improve things 
does not mean you don’t need one
Better Models and features that “don’t work” 
■ Imagine the following scenario: 
■ You have a linear model and for some time you have been 
selecting and optimizing features for that model 
■ If you try a more complex (e.g. non-linear) model with the same 
features you are not likely to see any improvement 
■ If you try to add more expressive features, the existing model is 
likely not to capture them and you are not likely to see any 
improvement
Better Models and features that “don’t work” 
■ More complex features may require 
a more complex model 
■ A more complex model may not 
show improvements with a feature 
set that is too simple
4. Be thoughtful about your 
training data
Defining training/testing data 
■ Imagine you are training a simple binary 
classifier 
■ Defining positive and negative labels -> Non-trivial 
task 
■ E.g. Is this a positive or a negative? 
■ User watches a movie to completion and rates it 1 
star 
■ User watches the same movie again (maybe 
because she can’t find anything else) 
■ User abandons movie after 5 minutes, or 15 
minutes… or 1 hour 
■ User abandons TV show after 2 episodes, or 10 
episode… or 1 season 
■ User adds something to her list but never watches it
Other training data issues: Time traveling 
■ Time traveling: usage of features that originated after 
the event you are trying to predict 
■ E.g. Your rating a movie is a pretty good prediction of you 
watching that movie, especially because most ratings happen 
AFTER you watch the movie 
■ It can get tricky when you have many features that relate to 
each other 
■ Whenever we see an offline experiment with huge wins, the 
first question we ask ourselves is: “Is there time traveling?”
5. Learn to deal with (The curse 
of) Presentation Bias
2D Navigational Modeling 
More likely 
to see 
Less likely
The curse of presentation bias 
■ User can only click on what you decide to show 
■ But, what you decide to show is the result of what your model 
predicted is good 
■ Simply treating things you show as negatives is not 
likely to work 
■ Better options 
■ Correcting for the probability a user will click on a position -> 
Attention models 
■ Explore/exploit approaches such as MAB
6. The UI is the algorithm’s only 
communication channel with that 
which matters most: the users
UI->Algorithm->UI 
■ The UI generates the user 
feedback that we will input into the 
algorithms 
■ The UI is also where the results of 
our algorithms will be shown 
■ A change in the UI might require a 
change in algorithms and 
viceversa
7. Data and Models are great. You 
know what’s even better? The 
right evaluation approach
Offline/Online testing process
Executing A/B tests 
Measure differences in metrics across statistically 
identical populations that each experience a different 
algorithm. 
■ Decisions on the product always data-driven 
■ Overall Evaluation Criteria (OEC) = member retention 
■ Use long-term metrics whenever possible 
■ Short-term metrics can be informative and allow faster 
decisions 
■ But, not always aligned with OEC
Offline testing 
■ Measure model performance, using 
(IR) metrics 
■ Offline performance used as an 
indication to make informed 
decisions on follow-up A/B tests 
■ A critical (and mostly unsolved) 
issue is how offline metrics can 
correlate with A/B test results. 
Problem 
Data 
Metrics 
Algorithm Model
8. Distributing algorithms is 
important, but knowing at what 
level to do it is even more 
important
The three levels of Distribution/Parallelization 
1. For each subset of the population (e.g. 
region) 
2. For each combination of the 
hyperparameters 
3. For each subset of the training data 
Each level has different requirements
ANN Training over GPUS and AWS
9. It pays off to be smart about 
choosing your hyperparameters
Hyperparameter optimization 
■ Automate hyperparameter 
optimization by choosing the right 
metric. 
■ But, is it enough to choose the value 
that maximizes the metric? 
■ E.g. Is a regularization lambda of 0 
better than a lambda = 1000 that 
decreases your metric by only 1%? 
■ Also, think about using Bayesian 
Optimization (Gaussian Processes) 
instead of grid search
10. There are things you can do 
offline and there are things you 
can’t… and there is nearline for 
everything in between
System Overview 
▪ Blueprint for multiple 
personalization algorithm 
services 
▪ Ranking 
▪ Row selection 
▪ Ratings 
▪ … 
▪ Recommendation involving 
multi-layered Machine 
Learning
Matrix Factorization Example
Conclusions 
1. Choose the right metric 
2. Be thoughtful about your data 
3. Understand dependencies 
between data and models 
4. Optimize only what matters
Xavier Amatriain (@xamat) 
xavier@netflix.com 
Thanks! 
(and yes, we are hiring)

More Related Content

What's hot

Machine learning
Machine learning Machine learning
Machine learning
Saurabh Agrawal
 
AI vs Machine Learning vs Deep Learning | Machine Learning Training with Pyth...
AI vs Machine Learning vs Deep Learning | Machine Learning Training with Pyth...AI vs Machine Learning vs Deep Learning | Machine Learning Training with Pyth...
AI vs Machine Learning vs Deep Learning | Machine Learning Training with Pyth...
Edureka!
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
Kumar P
 
2.17Mb ppt
2.17Mb ppt2.17Mb ppt
2.17Mb ppt
butest
 
Deep learning - A Visual Introduction
Deep learning - A Visual IntroductionDeep learning - A Visual Introduction
Deep learning - A Visual Introduction
Lukas Masuch
 
Intro to LLMs
Intro to LLMsIntro to LLMs
Intro to LLMs
Loic Merckel
 
What is Deep Learning?
What is Deep Learning?What is Deep Learning?
What is Deep Learning?
NVIDIA
 
Introduction of Deep Learning
Introduction of Deep LearningIntroduction of Deep Learning
Introduction of Deep Learning
Myungjin Lee
 
Data science and Artificial Intelligence
Data science and Artificial IntelligenceData science and Artificial Intelligence
Data science and Artificial Intelligence
Suman Srinivasan
 
Machine learning with Big Data power point presentation
Machine learning with Big Data power point presentationMachine learning with Big Data power point presentation
Machine learning with Big Data power point presentation
David Raj Kanthi
 
Deep Learning
Deep LearningDeep Learning
Deep Learning
Shaikh Shahzad
 
Hyperparameter Tuning
Hyperparameter TuningHyperparameter Tuning
Hyperparameter Tuning
Jon Lederman
 
Federated Learning
Federated LearningFederated Learning
Federated Learning
DataWorks Summit
 
Locality Sensitive Hashing By Spark
Locality Sensitive Hashing By SparkLocality Sensitive Hashing By Spark
Locality Sensitive Hashing By Spark
Spark Summit
 
Industrial Machine Learning (SIGKDD17)
Industrial Machine Learning (SIGKDD17)Industrial Machine Learning (SIGKDD17)
Industrial Machine Learning (SIGKDD17)
Joshua Bloom
 
Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...
Simplilearn
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
Eng Teong Cheah
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
Girish Khanzode
 
What is Machine Learning | Introduction to Machine Learning | Machine Learnin...
What is Machine Learning | Introduction to Machine Learning | Machine Learnin...What is Machine Learning | Introduction to Machine Learning | Machine Learnin...
What is Machine Learning | Introduction to Machine Learning | Machine Learnin...
Simplilearn
 
Automated Machine Learning
Automated Machine LearningAutomated Machine Learning
Automated Machine Learning
Yuriy Guts
 

What's hot (20)

Machine learning
Machine learning Machine learning
Machine learning
 
AI vs Machine Learning vs Deep Learning | Machine Learning Training with Pyth...
AI vs Machine Learning vs Deep Learning | Machine Learning Training with Pyth...AI vs Machine Learning vs Deep Learning | Machine Learning Training with Pyth...
AI vs Machine Learning vs Deep Learning | Machine Learning Training with Pyth...
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
2.17Mb ppt
2.17Mb ppt2.17Mb ppt
2.17Mb ppt
 
Deep learning - A Visual Introduction
Deep learning - A Visual IntroductionDeep learning - A Visual Introduction
Deep learning - A Visual Introduction
 
Intro to LLMs
Intro to LLMsIntro to LLMs
Intro to LLMs
 
What is Deep Learning?
What is Deep Learning?What is Deep Learning?
What is Deep Learning?
 
Introduction of Deep Learning
Introduction of Deep LearningIntroduction of Deep Learning
Introduction of Deep Learning
 
Data science and Artificial Intelligence
Data science and Artificial IntelligenceData science and Artificial Intelligence
Data science and Artificial Intelligence
 
Machine learning with Big Data power point presentation
Machine learning with Big Data power point presentationMachine learning with Big Data power point presentation
Machine learning with Big Data power point presentation
 
Deep Learning
Deep LearningDeep Learning
Deep Learning
 
Hyperparameter Tuning
Hyperparameter TuningHyperparameter Tuning
Hyperparameter Tuning
 
Federated Learning
Federated LearningFederated Learning
Federated Learning
 
Locality Sensitive Hashing By Spark
Locality Sensitive Hashing By SparkLocality Sensitive Hashing By Spark
Locality Sensitive Hashing By Spark
 
Industrial Machine Learning (SIGKDD17)
Industrial Machine Learning (SIGKDD17)Industrial Machine Learning (SIGKDD17)
Industrial Machine Learning (SIGKDD17)
 
Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
What is Machine Learning | Introduction to Machine Learning | Machine Learnin...
What is Machine Learning | Introduction to Machine Learning | Machine Learnin...What is Machine Learning | Introduction to Machine Learning | Machine Learnin...
What is Machine Learning | Introduction to Machine Learning | Machine Learnin...
 
Automated Machine Learning
Automated Machine LearningAutomated Machine Learning
Automated Machine Learning
 

Viewers also liked

Data By The People, For The People
Data By The People, For The PeopleData By The People, For The People
Data By The People, For The People
Daniel Tunkelang
 
A Statistician's View on Big Data and Data Science (Version 1)
A Statistician's View on Big Data and Data Science (Version 1)A Statistician's View on Big Data and Data Science (Version 1)
A Statistician's View on Big Data and Data Science (Version 1)
Prof. Dr. Diego Kuonen
 
Hadoop and Machine Learning
Hadoop and Machine LearningHadoop and Machine Learning
Hadoop and Machine Learning
joshwills
 
Hands-on Deep Learning in Python
Hands-on Deep Learning in PythonHands-on Deep Learning in Python
Hands-on Deep Learning in Python
Imry Kissos
 
How to Interview a Data Scientist
How to Interview a Data ScientistHow to Interview a Data Scientist
How to Interview a Data Scientist
Daniel Tunkelang
 
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Data Science London
 
A tutorial on deep learning at icml 2013
A tutorial on deep learning at icml 2013A tutorial on deep learning at icml 2013
A tutorial on deep learning at icml 2013
Philip Zheng
 
How to Become a Data Scientist
How to Become a Data ScientistHow to Become a Data Scientist
How to Become a Data Scientist
ryanorban
 
Deep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingDeep Learning for Natural Language Processing
Deep Learning for Natural Language Processing
Devashish Shanker
 
An Introduction to Supervised Machine Learning and Pattern Classification: Th...
An Introduction to Supervised Machine Learning and Pattern Classification: Th...An Introduction to Supervised Machine Learning and Pattern Classification: Th...
An Introduction to Supervised Machine Learning and Pattern Classification: Th...
Sebastian Raschka
 
Introduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningIntroduction to Mahout and Machine Learning
Introduction to Mahout and Machine Learning
Varad Meru
 
Machine Learning and Data Mining: 12 Classification Rules
Machine Learning and Data Mining: 12 Classification RulesMachine Learning and Data Mining: 12 Classification Rules
Machine Learning and Data Mining: 12 Classification Rules
Pier Luca Lanzi
 
Myths and Mathemagical Superpowers of Data Scientists
Myths and Mathemagical Superpowers of Data ScientistsMyths and Mathemagical Superpowers of Data Scientists
Myths and Mathemagical Superpowers of Data Scientists
David Pittman
 
Tutorial on Deep learning and Applications
Tutorial on Deep learning and ApplicationsTutorial on Deep learning and Applications
Tutorial on Deep learning and Applications
NhatHai Phan
 
Tips for data science competitions
Tips for data science competitionsTips for data science competitions
Tips for data science competitions
Owen Zhang
 
Deep neural networks
Deep neural networksDeep neural networks
Deep neural networks
Si Haem
 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine Learning
Lars Marius Garshol
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural network
DEEPASHRI HK
 
10 R Packages to Win Kaggle Competitions
10 R Packages to Win Kaggle Competitions10 R Packages to Win Kaggle Competitions
10 R Packages to Win Kaggle Competitions
DataRobot
 
Artificial Intelligence Presentation
Artificial Intelligence PresentationArtificial Intelligence Presentation
Artificial Intelligence Presentation
lpaviglianiti
 

Viewers also liked (20)

Data By The People, For The People
Data By The People, For The PeopleData By The People, For The People
Data By The People, For The People
 
A Statistician's View on Big Data and Data Science (Version 1)
A Statistician's View on Big Data and Data Science (Version 1)A Statistician's View on Big Data and Data Science (Version 1)
A Statistician's View on Big Data and Data Science (Version 1)
 
Hadoop and Machine Learning
Hadoop and Machine LearningHadoop and Machine Learning
Hadoop and Machine Learning
 
Hands-on Deep Learning in Python
Hands-on Deep Learning in PythonHands-on Deep Learning in Python
Hands-on Deep Learning in Python
 
How to Interview a Data Scientist
How to Interview a Data ScientistHow to Interview a Data Scientist
How to Interview a Data Scientist
 
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
 
A tutorial on deep learning at icml 2013
A tutorial on deep learning at icml 2013A tutorial on deep learning at icml 2013
A tutorial on deep learning at icml 2013
 
How to Become a Data Scientist
How to Become a Data ScientistHow to Become a Data Scientist
How to Become a Data Scientist
 
Deep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingDeep Learning for Natural Language Processing
Deep Learning for Natural Language Processing
 
An Introduction to Supervised Machine Learning and Pattern Classification: Th...
An Introduction to Supervised Machine Learning and Pattern Classification: Th...An Introduction to Supervised Machine Learning and Pattern Classification: Th...
An Introduction to Supervised Machine Learning and Pattern Classification: Th...
 
Introduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningIntroduction to Mahout and Machine Learning
Introduction to Mahout and Machine Learning
 
Machine Learning and Data Mining: 12 Classification Rules
Machine Learning and Data Mining: 12 Classification RulesMachine Learning and Data Mining: 12 Classification Rules
Machine Learning and Data Mining: 12 Classification Rules
 
Myths and Mathemagical Superpowers of Data Scientists
Myths and Mathemagical Superpowers of Data ScientistsMyths and Mathemagical Superpowers of Data Scientists
Myths and Mathemagical Superpowers of Data Scientists
 
Tutorial on Deep learning and Applications
Tutorial on Deep learning and ApplicationsTutorial on Deep learning and Applications
Tutorial on Deep learning and Applications
 
Tips for data science competitions
Tips for data science competitionsTips for data science competitions
Tips for data science competitions
 
Deep neural networks
Deep neural networksDeep neural networks
Deep neural networks
 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine Learning
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural network
 
10 R Packages to Win Kaggle Competitions
10 R Packages to Win Kaggle Competitions10 R Packages to Win Kaggle Competitions
10 R Packages to Win Kaggle Competitions
 
Artificial Intelligence Presentation
Artificial Intelligence PresentationArtificial Intelligence Presentation
Artificial Intelligence Presentation
 

Similar to 10 Lessons Learned from Building Machine Learning Systems

Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Xavier Amatriain
 
BIG2016- Lessons Learned from building real-life user-focused Big Data systems
BIG2016- Lessons Learned from building real-life user-focused Big Data systemsBIG2016- Lessons Learned from building real-life user-focused Big Data systems
BIG2016- Lessons Learned from building real-life user-focused Big Data systems
Xavier Amatriain
 
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
Strata 2016 -  Lessons Learned from building real-life Machine Learning SystemsStrata 2016 -  Lessons Learned from building real-life Machine Learning Systems
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
Xavier Amatriain
 
[UPDATE] Udacity webinar on Recommendation Systems
[UPDATE] Udacity webinar on Recommendation Systems[UPDATE] Udacity webinar on Recommendation Systems
[UPDATE] Udacity webinar on Recommendation Systems
Axel de Romblay
 
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix ScaleQcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Xavier Amatriain
 
MLSEV Virtual. Automating Model Selection
MLSEV Virtual. Automating Model SelectionMLSEV Virtual. Automating Model Selection
MLSEV Virtual. Automating Model Selection
BigML, Inc
 
Udacity webinar on Recommendation Systems
Udacity webinar on Recommendation SystemsUdacity webinar on Recommendation Systems
Udacity webinar on Recommendation Systems
Axel de Romblay
 
Continuous Evaluation of Collaborative Recommender Systems in Data Stream Man...
Continuous Evaluation of Collaborative Recommender Systems in Data Stream Man...Continuous Evaluation of Collaborative Recommender Systems in Data Stream Man...
Continuous Evaluation of Collaborative Recommender Systems in Data Stream Man...
Dr. Cornelius Ludmann
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning Applications
Yalçın Yenigün
 
Cikm 2013 - Beyond Data From User Information to Business Value
Cikm 2013 - Beyond Data From User Information to Business ValueCikm 2013 - Beyond Data From User Information to Business Value
Cikm 2013 - Beyond Data From User Information to Business Value
Xavier Amatriain
 
Introduction to conjoint analysis 2021
Introduction to conjoint analysis 2021Introduction to conjoint analysis 2021
Introduction to conjoint analysis 2021
Ray Poynter
 
laptop price prediction presentation
laptop price prediction presentationlaptop price prediction presentation
laptop price prediction presentation
NeerajNishad4
 
unit 1.2 supervised learning.pptx
unit 1.2 supervised learning.pptxunit 1.2 supervised learning.pptx
unit 1.2 supervised learning.pptx
Dr.Shweta
 
Making Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableMaking Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms Reliable
Justin Basilico
 
ODSC East 2020 : Continuous_learning_systems
ODSC East 2020 : Continuous_learning_systemsODSC East 2020 : Continuous_learning_systems
ODSC East 2020 : Continuous_learning_systems
Anuj Gupta
 
STAT7440StudentIMLPresentationJishan.pptx
STAT7440StudentIMLPresentationJishan.pptxSTAT7440StudentIMLPresentationJishan.pptx
STAT7440StudentIMLPresentationJishan.pptx
JishanAhmed24
 
Machine-Learning-Overview a statistical approach
Machine-Learning-Overview a statistical approachMachine-Learning-Overview a statistical approach
Machine-Learning-Overview a statistical approach
Ajit Ghodke
 
Improving machine learning models unit 5.pptx
Improving machine learning models unit 5.pptxImproving machine learning models unit 5.pptx
Improving machine learning models unit 5.pptx
SomnathMule5
 
Customer Churn Analytics using Microsoft R Open
Customer Churn Analytics using Microsoft R OpenCustomer Churn Analytics using Microsoft R Open
Customer Churn Analytics using Microsoft R Open
Poo Kuan Hoong
 
Product Madness - A/B Testing
Product Madness - A/B TestingProduct Madness - A/B Testing
Product Madness - A/B Testing
GIAF
 

Similar to 10 Lessons Learned from Building Machine Learning Systems (20)

Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
 
BIG2016- Lessons Learned from building real-life user-focused Big Data systems
BIG2016- Lessons Learned from building real-life user-focused Big Data systemsBIG2016- Lessons Learned from building real-life user-focused Big Data systems
BIG2016- Lessons Learned from building real-life user-focused Big Data systems
 
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
Strata 2016 -  Lessons Learned from building real-life Machine Learning SystemsStrata 2016 -  Lessons Learned from building real-life Machine Learning Systems
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
 
[UPDATE] Udacity webinar on Recommendation Systems
[UPDATE] Udacity webinar on Recommendation Systems[UPDATE] Udacity webinar on Recommendation Systems
[UPDATE] Udacity webinar on Recommendation Systems
 
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix ScaleQcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
 
MLSEV Virtual. Automating Model Selection
MLSEV Virtual. Automating Model SelectionMLSEV Virtual. Automating Model Selection
MLSEV Virtual. Automating Model Selection
 
Udacity webinar on Recommendation Systems
Udacity webinar on Recommendation SystemsUdacity webinar on Recommendation Systems
Udacity webinar on Recommendation Systems
 
Continuous Evaluation of Collaborative Recommender Systems in Data Stream Man...
Continuous Evaluation of Collaborative Recommender Systems in Data Stream Man...Continuous Evaluation of Collaborative Recommender Systems in Data Stream Man...
Continuous Evaluation of Collaborative Recommender Systems in Data Stream Man...
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning Applications
 
Cikm 2013 - Beyond Data From User Information to Business Value
Cikm 2013 - Beyond Data From User Information to Business ValueCikm 2013 - Beyond Data From User Information to Business Value
Cikm 2013 - Beyond Data From User Information to Business Value
 
Introduction to conjoint analysis 2021
Introduction to conjoint analysis 2021Introduction to conjoint analysis 2021
Introduction to conjoint analysis 2021
 
laptop price prediction presentation
laptop price prediction presentationlaptop price prediction presentation
laptop price prediction presentation
 
unit 1.2 supervised learning.pptx
unit 1.2 supervised learning.pptxunit 1.2 supervised learning.pptx
unit 1.2 supervised learning.pptx
 
Making Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableMaking Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms Reliable
 
ODSC East 2020 : Continuous_learning_systems
ODSC East 2020 : Continuous_learning_systemsODSC East 2020 : Continuous_learning_systems
ODSC East 2020 : Continuous_learning_systems
 
STAT7440StudentIMLPresentationJishan.pptx
STAT7440StudentIMLPresentationJishan.pptxSTAT7440StudentIMLPresentationJishan.pptx
STAT7440StudentIMLPresentationJishan.pptx
 
Machine-Learning-Overview a statistical approach
Machine-Learning-Overview a statistical approachMachine-Learning-Overview a statistical approach
Machine-Learning-Overview a statistical approach
 
Improving machine learning models unit 5.pptx
Improving machine learning models unit 5.pptxImproving machine learning models unit 5.pptx
Improving machine learning models unit 5.pptx
 
Customer Churn Analytics using Microsoft R Open
Customer Churn Analytics using Microsoft R OpenCustomer Churn Analytics using Microsoft R Open
Customer Churn Analytics using Microsoft R Open
 
Product Madness - A/B Testing
Product Madness - A/B TestingProduct Madness - A/B Testing
Product Madness - A/B Testing
 

More from Xavier Amatriain

Data/AI driven product development: from video streaming to telehealth
Data/AI driven product development: from video streaming to telehealthData/AI driven product development: from video streaming to telehealth
Data/AI driven product development: from video streaming to telehealth
Xavier Amatriain
 
AI-driven product innovation: from Recommender Systems to COVID-19
AI-driven product innovation: from Recommender Systems to COVID-19AI-driven product innovation: from Recommender Systems to COVID-19
AI-driven product innovation: from Recommender Systems to COVID-19
Xavier Amatriain
 
AI for COVID-19 - Q42020 update
AI for COVID-19 - Q42020 updateAI for COVID-19 - Q42020 update
AI for COVID-19 - Q42020 update
Xavier Amatriain
 
AI for COVID-19: An online virtual care approach
AI for COVID-19: An online virtual care approachAI for COVID-19: An online virtual care approach
AI for COVID-19: An online virtual care approach
Xavier Amatriain
 
AI for healthcare: Scaling Access and Quality of Care for Everyone
AI for healthcare: Scaling Access and Quality of Care for EveryoneAI for healthcare: Scaling Access and Quality of Care for Everyone
AI for healthcare: Scaling Access and Quality of Care for Everyone
Xavier Amatriain
 
Towards online universal quality healthcare through AI
Towards online universal quality healthcare through AITowards online universal quality healthcare through AI
Towards online universal quality healthcare through AI
Xavier Amatriain
 
From one to zero: Going smaller as a growth strategy
From one to zero: Going smaller as a growth strategyFrom one to zero: Going smaller as a growth strategy
From one to zero: Going smaller as a growth strategy
Xavier Amatriain
 
Learning to speak medicine
Learning to speak medicineLearning to speak medicine
Learning to speak medicine
Xavier Amatriain
 
ML to cure the world
ML to cure the worldML to cure the world
ML to cure the world
Xavier Amatriain
 
Recommender Systems In Industry
Recommender Systems In IndustryRecommender Systems In Industry
Recommender Systems In Industry
Xavier Amatriain
 
Medical advice as a Recommender System
Medical advice as a Recommender SystemMedical advice as a Recommender System
Medical advice as a Recommender System
Xavier Amatriain
 
Past present and future of Recommender Systems: an Industry Perspective
Past present and future of Recommender Systems: an Industry PerspectivePast present and future of Recommender Systems: an Industry Perspective
Past present and future of Recommender Systems: an Industry Perspective
Xavier Amatriain
 
Staying Shallow & Lean in a Deep Learning World
Staying Shallow & Lean in a Deep Learning WorldStaying Shallow & Lean in a Deep Learning World
Staying Shallow & Lean in a Deep Learning World
Xavier Amatriain
 
Machine Learning for Q&A Sites: The Quora Example
Machine Learning for Q&A Sites: The Quora ExampleMachine Learning for Q&A Sites: The Quora Example
Machine Learning for Q&A Sites: The Quora Example
Xavier Amatriain
 
Past, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspectivePast, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspective
Xavier Amatriain
 
Barcelona ML Meetup - Lessons Learned
Barcelona ML Meetup - Lessons LearnedBarcelona ML Meetup - Lessons Learned
Barcelona ML Meetup - Lessons Learned
Xavier Amatriain
 
10 more lessons learned from building Machine Learning systems - MLConf
10 more lessons learned from building Machine Learning systems - MLConf10 more lessons learned from building Machine Learning systems - MLConf
10 more lessons learned from building Machine Learning systems - MLConf
Xavier Amatriain
 
10 more lessons learned from building Machine Learning systems
10 more lessons learned from building Machine Learning systems10 more lessons learned from building Machine Learning systems
10 more lessons learned from building Machine Learning systems
Xavier Amatriain
 
Machine Learning to Grow the World's Knowledge
Machine Learning to Grow  the World's KnowledgeMachine Learning to Grow  the World's Knowledge
Machine Learning to Grow the World's Knowledge
Xavier Amatriain
 
MLConf Seattle 2015 - ML@Quora
MLConf Seattle 2015 - ML@QuoraMLConf Seattle 2015 - ML@Quora
MLConf Seattle 2015 - ML@Quora
Xavier Amatriain
 

More from Xavier Amatriain (20)

Data/AI driven product development: from video streaming to telehealth
Data/AI driven product development: from video streaming to telehealthData/AI driven product development: from video streaming to telehealth
Data/AI driven product development: from video streaming to telehealth
 
AI-driven product innovation: from Recommender Systems to COVID-19
AI-driven product innovation: from Recommender Systems to COVID-19AI-driven product innovation: from Recommender Systems to COVID-19
AI-driven product innovation: from Recommender Systems to COVID-19
 
AI for COVID-19 - Q42020 update
AI for COVID-19 - Q42020 updateAI for COVID-19 - Q42020 update
AI for COVID-19 - Q42020 update
 
AI for COVID-19: An online virtual care approach
AI for COVID-19: An online virtual care approachAI for COVID-19: An online virtual care approach
AI for COVID-19: An online virtual care approach
 
AI for healthcare: Scaling Access and Quality of Care for Everyone
AI for healthcare: Scaling Access and Quality of Care for EveryoneAI for healthcare: Scaling Access and Quality of Care for Everyone
AI for healthcare: Scaling Access and Quality of Care for Everyone
 
Towards online universal quality healthcare through AI
Towards online universal quality healthcare through AITowards online universal quality healthcare through AI
Towards online universal quality healthcare through AI
 
From one to zero: Going smaller as a growth strategy
From one to zero: Going smaller as a growth strategyFrom one to zero: Going smaller as a growth strategy
From one to zero: Going smaller as a growth strategy
 
Learning to speak medicine
Learning to speak medicineLearning to speak medicine
Learning to speak medicine
 
ML to cure the world
ML to cure the worldML to cure the world
ML to cure the world
 
Recommender Systems In Industry
Recommender Systems In IndustryRecommender Systems In Industry
Recommender Systems In Industry
 
Medical advice as a Recommender System
Medical advice as a Recommender SystemMedical advice as a Recommender System
Medical advice as a Recommender System
 
Past present and future of Recommender Systems: an Industry Perspective
Past present and future of Recommender Systems: an Industry PerspectivePast present and future of Recommender Systems: an Industry Perspective
Past present and future of Recommender Systems: an Industry Perspective
 
Staying Shallow & Lean in a Deep Learning World
Staying Shallow & Lean in a Deep Learning WorldStaying Shallow & Lean in a Deep Learning World
Staying Shallow & Lean in a Deep Learning World
 
Machine Learning for Q&A Sites: The Quora Example
Machine Learning for Q&A Sites: The Quora ExampleMachine Learning for Q&A Sites: The Quora Example
Machine Learning for Q&A Sites: The Quora Example
 
Past, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspectivePast, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspective
 
Barcelona ML Meetup - Lessons Learned
Barcelona ML Meetup - Lessons LearnedBarcelona ML Meetup - Lessons Learned
Barcelona ML Meetup - Lessons Learned
 
10 more lessons learned from building Machine Learning systems - MLConf
10 more lessons learned from building Machine Learning systems - MLConf10 more lessons learned from building Machine Learning systems - MLConf
10 more lessons learned from building Machine Learning systems - MLConf
 
10 more lessons learned from building Machine Learning systems
10 more lessons learned from building Machine Learning systems10 more lessons learned from building Machine Learning systems
10 more lessons learned from building Machine Learning systems
 
Machine Learning to Grow the World's Knowledge
Machine Learning to Grow  the World's KnowledgeMachine Learning to Grow  the World's Knowledge
Machine Learning to Grow the World's Knowledge
 
MLConf Seattle 2015 - ML@Quora
MLConf Seattle 2015 - ML@QuoraMLConf Seattle 2015 - ML@Quora
MLConf Seattle 2015 - ML@Quora
 

Recently uploaded

find out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challengesfind out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challenges
huseindihon
 
Best Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdfBest Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdf
Tatiana Al-Chueyr
 
Tirana Tech Meetup - Agentic RAG with Milvus, Llama3 and Ollama
Tirana Tech Meetup - Agentic RAG with Milvus, Llama3 and OllamaTirana Tech Meetup - Agentic RAG with Milvus, Llama3 and Ollama
Tirana Tech Meetup - Agentic RAG with Milvus, Llama3 and Ollama
Zilliz
 
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyyActive Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
RaminGhanbari2
 
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - MydbopsScaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Mydbops
 
“Deploying Large Language Models on a Raspberry Pi,” a Presentation from Usef...
“Deploying Large Language Models on a Raspberry Pi,” a Presentation from Usef...“Deploying Large Language Models on a Raspberry Pi,” a Presentation from Usef...
“Deploying Large Language Models on a Raspberry Pi,” a Presentation from Usef...
Edge AI and Vision Alliance
 
Observability For You and Me with OpenTelemetry
Observability For You and Me with OpenTelemetryObservability For You and Me with OpenTelemetry
Observability For You and Me with OpenTelemetry
Eric D. Schabell
 
Applying Retrieval-Augmented Generation (RAG) to Combat Hallucinations in GenAI
Applying Retrieval-Augmented Generation (RAG) to Combat Hallucinations in GenAIApplying Retrieval-Augmented Generation (RAG) to Combat Hallucinations in GenAI
Applying Retrieval-Augmented Generation (RAG) to Combat Hallucinations in GenAI
ssuserd4e0d2
 
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-InTrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc
 
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Bert Blevins
 
Using LLM Agents with Llama 3, LangGraph and Milvus
Using LLM Agents with Llama 3, LangGraph and MilvusUsing LLM Agents with Llama 3, LangGraph and Milvus
Using LLM Agents with Llama 3, LangGraph and Milvus
Zilliz
 
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
bhumivarma35300
 
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Muhammad Ali
 
Recent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS InfrastructureRecent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS Infrastructure
KAMAL CHOUDHARY
 
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSECHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
kumarjarun2010
 
How to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptxHow to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptx
Adam Dunkels
 
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
maigasapphire
 
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdfAcumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
BrainSell Technologies
 
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptxDublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
Kunal Gupta
 
How to build a generative AI solution A step-by-step guide (2).pdf
How to build a generative AI solution A step-by-step guide (2).pdfHow to build a generative AI solution A step-by-step guide (2).pdf
How to build a generative AI solution A step-by-step guide (2).pdf
ChristopherTHyatt
 

Recently uploaded (20)

find out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challengesfind out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challenges
 
Best Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdfBest Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdf
 
Tirana Tech Meetup - Agentic RAG with Milvus, Llama3 and Ollama
Tirana Tech Meetup - Agentic RAG with Milvus, Llama3 and OllamaTirana Tech Meetup - Agentic RAG with Milvus, Llama3 and Ollama
Tirana Tech Meetup - Agentic RAG with Milvus, Llama3 and Ollama
 
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyyActive Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
 
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - MydbopsScaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
 
“Deploying Large Language Models on a Raspberry Pi,” a Presentation from Usef...
“Deploying Large Language Models on a Raspberry Pi,” a Presentation from Usef...“Deploying Large Language Models on a Raspberry Pi,” a Presentation from Usef...
“Deploying Large Language Models on a Raspberry Pi,” a Presentation from Usef...
 
Observability For You and Me with OpenTelemetry
Observability For You and Me with OpenTelemetryObservability For You and Me with OpenTelemetry
Observability For You and Me with OpenTelemetry
 
Applying Retrieval-Augmented Generation (RAG) to Combat Hallucinations in GenAI
Applying Retrieval-Augmented Generation (RAG) to Combat Hallucinations in GenAIApplying Retrieval-Augmented Generation (RAG) to Combat Hallucinations in GenAI
Applying Retrieval-Augmented Generation (RAG) to Combat Hallucinations in GenAI
 
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-InTrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
 
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
 
Using LLM Agents with Llama 3, LangGraph and Milvus
Using LLM Agents with Llama 3, LangGraph and MilvusUsing LLM Agents with Llama 3, LangGraph and Milvus
Using LLM Agents with Llama 3, LangGraph and Milvus
 
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
 
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
 
Recent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS InfrastructureRecent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS Infrastructure
 
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSECHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
 
How to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptxHow to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptx
 
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
 
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdfAcumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
 
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptxDublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
 
How to build a generative AI solution A step-by-step guide (2).pdf
How to build a generative AI solution A step-by-step guide (2).pdfHow to build a generative AI solution A step-by-step guide (2).pdf
How to build a generative AI solution A step-by-step guide (2).pdf
 

10 Lessons Learned from Building Machine Learning Systems

  • 1. 10 Lessons Learned from building ML systems Xavier Amatriain - Director Algorithms Engineering November 2014
  • 4. Netflix Scale ▪ > 50M members ▪ > 40 countries ▪ > 1000 device types ▪ > 7B hours in Q2 2014 ▪ Plays: > 70M/day ▪ Searches: > 4M/day ▪ Ratings: > 6M/day ▪ Log 100B events/day ▪ 31.62% of peak US downstream traffic
  • 5. Smart Models ■ Regression models (Logistic, Linear, Elastic nets) ■ GBDT/RF ■ SVD & other MF models ■ Factorization Machines ■ Restricted Boltzmann Machines ■ Markov Chains & other graphical models ■ Clustering (from k-means to HDP) ■ Deep ANN ■ LDA ■ Association Rules ■ …
  • 7. 1. More data vs. & Better Models
  • 8. More data or better models? Really? Anand Rajaraman: Former Stanford Prof. & Senior VP at Walmart
  • 9. More data or better models? Sometimes, it’s not about more data
  • 10. More data or better models? [Banko and Brill, 2001] Norvig: “Google does not have better Algorithms, only more Data” Many features/ low-bias models
  • 11. More data or better models? Sometimes, it’s not about more data
  • 12. 2. You might not need all your Big Data
  • 13. How useful is Big Data ■ “Everybody” has Big Data ■ But not everybody needs it ■ E.g. Do you need many millions of users if the goal is to compute a MF of, say, 100 factors? ■ Many times, doing some kind of smart (e.g. stratified) sampling can produce as good or even better results as using it all
  • 14. 3. The fact that a more complex model does not improve things does not mean you don’t need one
  • 15. Better Models and features that “don’t work” ■ Imagine the following scenario: ■ You have a linear model and for some time you have been selecting and optimizing features for that model ■ If you try a more complex (e.g. non-linear) model with the same features you are not likely to see any improvement ■ If you try to add more expressive features, the existing model is likely not to capture them and you are not likely to see any improvement
  • 16. Better Models and features that “don’t work” ■ More complex features may require a more complex model ■ A more complex model may not show improvements with a feature set that is too simple
  • 17. 4. Be thoughtful about your training data
  • 18. Defining training/testing data ■ Imagine you are training a simple binary classifier ■ Defining positive and negative labels -> Non-trivial task ■ E.g. Is this a positive or a negative? ■ User watches a movie to completion and rates it 1 star ■ User watches the same movie again (maybe because she can’t find anything else) ■ User abandons movie after 5 minutes, or 15 minutes… or 1 hour ■ User abandons TV show after 2 episodes, or 10 episode… or 1 season ■ User adds something to her list but never watches it
  • 19. Other training data issues: Time traveling ■ Time traveling: usage of features that originated after the event you are trying to predict ■ E.g. Your rating a movie is a pretty good prediction of you watching that movie, especially because most ratings happen AFTER you watch the movie ■ It can get tricky when you have many features that relate to each other ■ Whenever we see an offline experiment with huge wins, the first question we ask ourselves is: “Is there time traveling?”
  • 20. 5. Learn to deal with (The curse of) Presentation Bias
  • 21. 2D Navigational Modeling More likely to see Less likely
  • 22. The curse of presentation bias ■ User can only click on what you decide to show ■ But, what you decide to show is the result of what your model predicted is good ■ Simply treating things you show as negatives is not likely to work ■ Better options ■ Correcting for the probability a user will click on a position -> Attention models ■ Explore/exploit approaches such as MAB
  • 23. 6. The UI is the algorithm’s only communication channel with that which matters most: the users
  • 24. UI->Algorithm->UI ■ The UI generates the user feedback that we will input into the algorithms ■ The UI is also where the results of our algorithms will be shown ■ A change in the UI might require a change in algorithms and viceversa
  • 25. 7. Data and Models are great. You know what’s even better? The right evaluation approach
  • 27. Executing A/B tests Measure differences in metrics across statistically identical populations that each experience a different algorithm. ■ Decisions on the product always data-driven ■ Overall Evaluation Criteria (OEC) = member retention ■ Use long-term metrics whenever possible ■ Short-term metrics can be informative and allow faster decisions ■ But, not always aligned with OEC
  • 28. Offline testing ■ Measure model performance, using (IR) metrics ■ Offline performance used as an indication to make informed decisions on follow-up A/B tests ■ A critical (and mostly unsolved) issue is how offline metrics can correlate with A/B test results. Problem Data Metrics Algorithm Model
  • 29. 8. Distributing algorithms is important, but knowing at what level to do it is even more important
  • 30. The three levels of Distribution/Parallelization 1. For each subset of the population (e.g. region) 2. For each combination of the hyperparameters 3. For each subset of the training data Each level has different requirements
  • 31. ANN Training over GPUS and AWS
  • 32. 9. It pays off to be smart about choosing your hyperparameters
  • 33. Hyperparameter optimization ■ Automate hyperparameter optimization by choosing the right metric. ■ But, is it enough to choose the value that maximizes the metric? ■ E.g. Is a regularization lambda of 0 better than a lambda = 1000 that decreases your metric by only 1%? ■ Also, think about using Bayesian Optimization (Gaussian Processes) instead of grid search
  • 34. 10. There are things you can do offline and there are things you can’t… and there is nearline for everything in between
  • 35. System Overview ▪ Blueprint for multiple personalization algorithm services ▪ Ranking ▪ Row selection ▪ Ratings ▪ … ▪ Recommendation involving multi-layered Machine Learning
  • 37. Conclusions 1. Choose the right metric 2. Be thoughtful about your data 3. Understand dependencies between data and models 4. Optimize only what matters
  • 38. Xavier Amatriain (@xamat) xavier@netflix.com Thanks! (and yes, we are hiring)