MMDS 2014 Talk - Distributing ML Algorithms: from GPUs to the Cloud

Xavier Amatriain, Cofounder/CTO at Curai
Distributing Large-scale ML Algorithms: from GPUs to the Cloud
MMDS 2014
June, 2014
Xavier Amatriain
Director - Algorithms Engineering @xamat
Outline
■ Introduction
■ Emmy-winning Algorithms
■ Distributing ML Algorithms in Practice
■ An example: ANN over GPUs & AWS Cloud
What we were interested in:
■ High quality recommendations
Proxy question:
■ Accuracy in predicted rating
■ Improve by 10% = $1 million!
Data size:
■ 100M ratings (back then “almost massive”)
2006 2014
Netflix Scale
▪ > 44M members
▪ > 40 countries
▪ > 1000 device types
▪ > 5B hours in Q3 2013
▪ Plays: > 50M/day
▪ Searches: > 3M/day
▪ Ratings: > 5M/day
▪ Log 100B events/day
▪ 31.62% of peak US downstream traffic
Smart Models
■ Regression models (Logistic, Linear, Elastic nets)
■ GBDT/RF
■ SVD & other MF models
■ Factorization Machines
■ Restricted Boltzmann Machines
■ Markov Chains & other graphical models
■ Clustering (from k-means to modern non-parametric models)
■ Deep ANN
■ LDA
■ Association Rules
■ …
Netflix Algorithms
“Emmy Winning”
Rating Prediction
2007 Progress Prize
▪ Top 2 algorithms
▪ MF/SVD - Prize RMSE: 0.8914
▪ RBM - Prize RMSE: 0.8990
▪ Linear blend - Prize RMSE: 0.88
▪ Currently in use as part of Netflix's rating prediction component
▪ Limitations
▪ Designed for 100M ratings, we have 5B ratings
▪ Not adaptable as users add ratings
▪ Performance issues
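
To make the RMSE and linear-blend bullets concrete, here is a minimal sketch in plain Python. It is not the Prize-winning code: the ratings and both prediction lists are made-up toy values, the function names are hypothetical, and a simple scan over blend weights stands in for whatever fitting procedure was actually used.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual):
        raise ValueError("length mismatch")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))

def linear_blend(pred_a, pred_b, weight_a):
    """Weighted average of two models' predictions (weights sum to 1)."""
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(pred_a, pred_b)]

def best_blend_weight(pred_a, pred_b, actual, steps=100):
    """Scan weights in [0, 1]; return the (weight, rmse) pair with lowest RMSE."""
    scored = []
    for i in range(steps + 1):
        w = i / steps
        scored.append((w, rmse(linear_blend(pred_a, pred_b, w), actual)))
    return min(scored, key=lambda pair: pair[1])

# Toy held-out ratings and two hypothetical models' predictions (made-up numbers).
actual = [4.0, 3.0, 5.0, 2.0, 1.0]
mf_pred = [3.6, 3.4, 4.5, 2.6, 1.5]
rbm_pred = [4.5, 2.5, 4.6, 1.5, 1.6]

weight, blended_rmse = best_blend_weight(mf_pred, rbm_pred, actual)
print("MF RMSE:", rmse(mf_pred, actual))
print("RBM RMSE:", rmse(rbm_pred, actual))
print("best weight on MF:", weight, "blend RMSE:", blended_rmse)
```

Because weights 0 and 1 are among the candidates, the blend can never score worse than the better of the two individual models on the data it was tuned on; whether it generalizes is a separate question.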
Ranking
Page composition
Similarity
Search Recommendations
Postplay
Gamification
Distributing ML algorithms in practice
1. Do I need all that data?
2. At what level should I distribute/parallelize?
3. What latency can I afford?
Do I need all that data?
Really?
Anand Rajaraman: Former Stanford Prof. & Senior VP at Walmart
Sometimes, it’s not about more data
[Banko and Brill, 2001]
Norvig: “Google does not have better Algorithms, only more Data”
Many features/low-bias models
Sometimes, it’s not about more data
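
One generic way to check whether more data would help is a learning curve: train on growing subsets and watch held-out error. The sketch below is an illustration under stated assumptions, not a reconstruction of the charts in the talk. The data is synthetic (a noisy line generated with a fixed seed), the model is a deliberately simple one-parameter fit, and all names are hypothetical.

```python
import random

def fit_slope(points):
    """Least-squares slope for y ~ w * x (no intercept)."""
    return sum(x * y for x, y in points) / sum(x * x for x, _ in points)

def mse(w, points):
    """Mean squared error of the one-parameter model on a set of points."""
    return sum((w * x - y) ** 2 for x, y in points) / len(points)

def learning_curve(train, validation, sizes):
    """Validation error after training on the first n examples, for each n."""
    return [(n, mse(fit_slope(train[:n]), validation)) for n in sizes]

rng = random.Random(0)  # fixed seed so the synthetic data is reproducible

def make_points(n):
    """Synthetic data: y = 2x plus unit-variance Gaussian noise."""
    points = []
    for _ in range(n):
        x = rng.uniform(1.0, 10.0)
        points.append((x, 2.0 * x + rng.gauss(0.0, 1.0)))
    return points

train = make_points(10000)
validation = make_points(1000)
curve = learning_curve(train, validation, [10, 100, 1000, 10000])
for n, err in curve:
    print(n, round(err, 4))
```

For a simple model like this one the error tends to flatten near the noise floor after relatively few examples, which is the situation where sampling the data is reasonable. A model with many features and low bias would be expected to keep improving for longer.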
At what level should I parallelize?
The three levels of Distribution/Parallelization
1. For each subset of the population (e.g. region)
2. For each combination of the hyperparameters
3. For each subset of the training data
Each level has different requirements
Level 1 Distribution
■ We may have subsets of the population for which we need to train an independently optimized model.
■ Training can be fully distributed, requiring no coordination or data communication
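
A minimal sketch of Level 1, assuming one independent training job per region. The thread pool here only stands in for separate machines or AWS regions, the "model" is a toy per-region mean rating, and the region names and function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def train_region_model(ratings):
    """Toy 'model': the mean rating observed in one region."""
    return sum(ratings) / len(ratings)

def train_all_regions(data_by_region, max_workers=4):
    """Train one model per region; jobs share no data and need no coordination."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            region: pool.submit(train_region_model, ratings)
            for region, ratings in data_by_region.items()
        }
        # The only joint step is collecting the finished models.
        return {region: future.result() for region, future in futures.items()}

# Made-up ratings keyed by hypothetical region names.
models = train_all_regions({"us": [4, 5], "eu": [2, 3, 4]})
print(models)
```

Each job reads only its own region's data, so the jobs could run in any order, on any machine, with no messages exchanged during training.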
Level 2 Distribution
■ For a given subset of the population we need to find the “optimal” model
■ Train several models with different hyperparameter values
■ Worst-case: grid search
■ Can do much better than this (e.g. Bayesian Optimization with Gaussian Process Priors)
■ This process *does* require coordination
■ Need to decide on next “step”
■ Need to gather final optimal result
■ Requires data distribution, not sharing
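
The coordination pattern at Level 2 can be sketched as a loop: propose candidates, evaluate them in parallel, gather results, decide the next step. The code below is an assumption-laden illustration. It uses a simple zoom-in grid refinement, not Bayesian optimization, and `validation_loss` is a hypothetical stand-in for training a model and scoring it on held-out data.

```python
from concurrent.futures import ThreadPoolExecutor

def validation_loss(reg):
    """Hypothetical stand-in for 'train with this hyperparameter, then score it'."""
    return (reg - 0.3) ** 2 + 1.0

def evaluate_parallel(candidates, max_workers=4):
    """Workers each train on their own full copy of the data (distributed, not shared)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        losses = list(pool.map(validation_loss, candidates))
    return list(zip(candidates, losses))

def coordinate_search(low, high, rounds=4, points=5):
    """Coordinator: run a round, gather results, then pick the next range to try."""
    best = None
    for _ in range(rounds):
        step = (high - low) / (points - 1)
        candidates = [low + i * step for i in range(points)]
        round_best = min(evaluate_parallel(candidates), key=lambda r: r[1])
        if best is None or round_best[1] < best[1]:
            best = round_best
        # Next step: zoom in around the best value found so far.
        low, high = max(low, best[0] - step), min(high, best[0] + step)
    return best

best_reg, best_loss = coordinate_search(0.0, 1.0)
print("best hyperparameter:", best_reg, "loss:", best_loss)
```

A tool such as Spearmint would replace the zoom-in rule with a model-based choice of the next candidates; the propose/evaluate/gather structure stays the same.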
Level 3 Distribution
■ For each combination of hyperparameters, model training may still be expensive
■ Process requires coordination and data sharing/communication
■ Can distribute computation over machines splitting examples or parameters (e.g. ADMM)
■ Or parallelize on a single multicore machine (e.g. Hogwild)
■ Or… use GPUs
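
To show why Level 3 needs communication, here is a sketch of splitting examples across workers. It is plain synchronous gradient averaging, chosen for brevity; it is neither ADMM nor Hogwild. The model is a one-parameter linear fit, the data is made up, and threads stand in for machines.

```python
from concurrent.futures import ThreadPoolExecutor

def shard_gradient(w, shard):
    """Summed squared-error gradient for y ~ w * x over one shard, plus shard size."""
    grad = sum(2.0 * (w * x - y) * x for x, y in shard)
    return grad, len(shard)

def split(examples, n_shards):
    """Round-robin split of the training examples."""
    return [examples[i::n_shards] for i in range(n_shards)]

def train_data_parallel(examples, n_shards=4, lr=0.01, epochs=200):
    """Each epoch: workers compute shard gradients, the coordinator averages them."""
    shards = [s for s in split(examples, n_shards) if s]
    w = 0.0
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        for _ in range(epochs):
            # Communication step 1: the current parameter goes out to every worker.
            partials = list(pool.map(lambda s: shard_gradient(w, s), shards))
            # Communication step 2: partial gradients come back and are combined.
            total_grad = sum(g for g, _ in partials)
            total_n = sum(n for _, n in partials)
            w -= lr * total_grad / total_n
    return w

# Made-up noiseless data from y = 3x.
examples = [(float(x), 3.0 * x) for x in range(1, 11)]
w = train_data_parallel(examples)
print("learned slope:", w)
```

Every epoch involves a round trip between coordinator and workers. Reducing that cost is what approaches like ADMM, lock-free updates on a single multicore machine, or keeping everything on one GPU aim to do.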
ANN Training over GPUs and AWS
■ Level 1 distribution: machines over different AWS regions
■ Level 2 distribution: machines in AWS and same AWS region
■ Use coordination tools
■ Spearmint or similar for parameter optimization
■ Condor, StarCluster, Mesos… for distributed cluster coordination
■ Level 3 parallelization: highly optimized parallel CUDA code on GPUs
What latency can I afford?
3 shades of latency
▪ Blueprint for multiple algorithm services
▪ Ranking
▪ Row selection
▪ Ratings
▪ Search
▪ …
▪ Multi-layered Machine Learning
Matrix Factorization Example
Xavier Amatriain (@xamat)
xavier@netflix.com
Thanks!
(and yes, we are hiring)