SlideShare a Scribd company logo
1 of 34
Download to read offline
Distributing Large-scale ML
Algorithms: from GPUs to the
Cloud
MMDS 2014
June, 2014
Xavier Amatriain
Director - Algorithms Engineering @xamat
Outline
■ Introduction
■ Emmy-winning Algorithms
■ Distributing ML Algorithms in Practice
■ An example: ANN over GPUs & AWS Cloud
What we were interested in:
■ High quality recommendations
Proxy question:
■ Accuracy in predicted rating
■ Improve by 10% = $1million!
Data size:
■ 100M ratings (back then “almost massive”)
2006 2014
Netflix Scale
▪ > 44M members
▪ > 40 countries
▪ > 1000 device types
▪ > 5B hours in Q3 2013
▪ Plays: > 50M/day
▪ Searches: > 3M/day
▪ Ratings: > 5M/day
▪ Log 100B events/day
▪ 31.62% of peak US downstream
traffic
Smart Models ■ Regression models (Logistic,
Linear, Elastic nets)
■ GBDT/RF
■ SVD & other MF models
■ Factorization Machines
■ Restricted Boltzmann Machines
■ Markov Chains & other graphical
models
■ Clustering (from k-means to
modern non-parametric models)
■ Deep ANN
■ LDA
■ Association Rules
■ …
Netflix Algorithms
“Emmy Winning”
Rating Prediction
2007 Progress Prize
▪ Top 2 algorithms
▪ MF/SVD - Prize RMSE: 0.8914
▪ RBM - Prize RMSE: 0.8990
▪ Linear blend Prize RMSE: 0.88
▪ Currently in use as part of Netflix’ rating prediction component
▪ Limitations
▪ Designed for 100M ratings, we have 5B ratings
▪ Not adaptable as users add ratings
▪ Performance issues
Ranking
Ranking
Page composition
Similarity
Search Recommendations
Postplay
Gamification
Distributing ML algorithms in practice
1. Do I need all that data?
2. At what level should I distribute/parallelize?
3. What latency can I afford?
Do I need all that data?
Really?
Anand Rajaraman: Former Stanford Prof. &
Senior VP at Walmart
Sometimes, it’s not
about more data
[Banko and Brill, 2001]
Norvig: “Google does not
have better Algorithms,
only more Data”
Many features/
low-bias models
Sometimes, it’s not
about more data
At what level should I parallelize?
The three levels of Distribution/Parallelization
1. For each subset of the population (e.g.
region)
2. For each combination of the
hyperparameters
3. For each subset of the training data
Each level has different requirements
Level 1 Distribution
■ We may have subsets of the
population for which we need to
train an independently optimized
model.
■ Training can be fully distributed
requiring no coordination or data
communication
Level 2 Distribution
■ For a given subset of the population we
need to find the “optimal” model
■ Train several models with different
hyperparameter values
■ Worst-case: grid search
■ Can do much better than this (E.g. Bayesian
Optimization with Gaussian Process Priors)
■ This process *does* require coordination
■ Need to decide on next “step”
■ Need to gather final optimal result
■ Requires data distribution, not sharing
Level 3 Distribution
■ For each combination of
hyperparameters, model training may
still be expensive
■ Process requires coordination and
data sharing/communication
■ Can distribute computation over
machines splitting examples or
parameters (e.g. ADMM)
■ Or parallelize on a single multicore
machine (e.g. Hogwild)
■ Or… use GPUs
ANN Training over GPUS and AWS
ANN Training over GPUS and AWS
■ Level 1 distribution: machines over different AWS
regions
■ Level 2 distribution: machines in AWS and same AWS
region
■ Use coordination tools
■ Spearmint or similar for parameter optimization
■ Condor, StarCluster, Mesos… for distributed cluster coordination
■ Level 3 parallelization: highly optimized parallel CUDA
code on GPUs
What latency can I afford?
3 shades of latency
▪ Blueprint for multiple
algorithm services
▪ Ranking
▪ Row selection
▪ Ratings
▪ Search
▪ …
▪ Multi-layered Machine
Learning
Matrix Factorization Example
Xavier Amatriain (@xamat)
xavier@netflix.com
Thanks!
(and yes, we are hiring)

More Related Content

What's hot

Artwork Personalization at Netflix
Artwork Personalization at NetflixArtwork Personalization at Netflix
Artwork Personalization at Netflix
Justin Basilico
 
Factorization Machines with libFM
Factorization Machines with libFMFactorization Machines with libFM
Factorization Machines with libFM
Liangjie Hong
 

What's hot (20)

A Folksonomy of styles, aka: other stylists also said and Subjective Influenc...
A Folksonomy of styles, aka: other stylists also said and Subjective Influenc...A Folksonomy of styles, aka: other stylists also said and Subjective Influenc...
A Folksonomy of styles, aka: other stylists also said and Subjective Influenc...
 
Machine Learning to Grow the World's Knowledge
Machine Learning to Grow  the World's KnowledgeMachine Learning to Grow  the World's Knowledge
Machine Learning to Grow the World's Knowledge
 
Recommending the world's knowledge
Recommending the world's knowledgeRecommending the world's knowledge
Recommending the world's knowledge
 
Introduction To Applied Machine Learning
Introduction To Applied Machine LearningIntroduction To Applied Machine Learning
Introduction To Applied Machine Learning
 
A Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at NetflixA Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at Netflix
 
Explainable AI - making ML and DL models more interpretable
Explainable AI - making ML and DL models more interpretableExplainable AI - making ML and DL models more interpretable
Explainable AI - making ML and DL models more interpretable
 
Replicable Evaluation of Recommender Systems
Replicable Evaluation of Recommender SystemsReplicable Evaluation of Recommender Systems
Replicable Evaluation of Recommender Systems
 
Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial
Building Large-scale Real-world Recommender Systems - Recsys2012 tutorialBuilding Large-scale Real-world Recommender Systems - Recsys2012 tutorial
Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial
 
Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models
 
Understanding Basics of Machine Learning
Understanding Basics of Machine LearningUnderstanding Basics of Machine Learning
Understanding Basics of Machine Learning
 
Artwork Personalization at Netflix
Artwork Personalization at NetflixArtwork Personalization at Netflix
Artwork Personalization at Netflix
 
Recommending for the World
Recommending for the WorldRecommending for the World
Recommending for the World
 
Machine learning basics
Machine learning basics Machine learning basics
Machine learning basics
 
MaxEnt (Loglinear) Models - Overview
MaxEnt (Loglinear) Models - OverviewMaxEnt (Loglinear) Models - Overview
MaxEnt (Loglinear) Models - Overview
 
Factorization Machines with libFM
Factorization Machines with libFMFactorization Machines with libFM
Factorization Machines with libFM
 
Introduction to-machine-learning
Introduction to-machine-learningIntroduction to-machine-learning
Introduction to-machine-learning
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Machine Learning Basics
Machine Learning BasicsMachine Learning Basics
Machine Learning Basics
 
Time, Context and Causality in Recommender Systems
Time, Context and Causality in Recommender SystemsTime, Context and Causality in Recommender Systems
Time, Context and Causality in Recommender Systems
 
Lecture 2 Basic Concepts in Machine Learning for Language Technology
Lecture 2 Basic Concepts in Machine Learning for Language TechnologyLecture 2 Basic Concepts in Machine Learning for Language Technology
Lecture 2 Basic Concepts in Machine Learning for Language Technology
 

Viewers also liked

Viewers also liked (16)

MLConf Seattle 2015 - ML@Quora
MLConf Seattle 2015 - ML@QuoraMLConf Seattle 2015 - ML@Quora
MLConf Seattle 2015 - ML@Quora
 
May: If I Were 22
May: If I Were 22May: If I Were 22
May: If I Were 22
 
Chela stress test
Chela stress testChela stress test
Chela stress test
 
Adding Identity Management and Access Control to your Application
Adding Identity Management and Access Control to your ApplicationAdding Identity Management and Access Control to your Application
Adding Identity Management and Access Control to your Application
 
Cwin16 - Paris - ux design
Cwin16 - Paris - ux designCwin16 - Paris - ux design
Cwin16 - Paris - ux design
 
Ia32 Modo Protegido
Ia32 Modo ProtegidoIa32 Modo Protegido
Ia32 Modo Protegido
 
Past present and future of Recommender Systems: an Industry Perspective
Past present and future of Recommender Systems: an Industry PerspectivePast present and future of Recommender Systems: an Industry Perspective
Past present and future of Recommender Systems: an Industry Perspective
 
Marc Stickdorn & Jakob Schneider – Mobile ethnography and ExperienceFellow, a...
Marc Stickdorn & Jakob Schneider – Mobile ethnography and ExperienceFellow, a...Marc Stickdorn & Jakob Schneider – Mobile ethnography and ExperienceFellow, a...
Marc Stickdorn & Jakob Schneider – Mobile ethnography and ExperienceFellow, a...
 
Netflix Nebula - Gradle Summit 2014
Netflix Nebula - Gradle Summit 2014Netflix Nebula - Gradle Summit 2014
Netflix Nebula - Gradle Summit 2014
 
Cwin16 tls-partner-hpe-digital economy & Hybrid IT
Cwin16 tls-partner-hpe-digital economy & Hybrid ITCwin16 tls-partner-hpe-digital economy & Hybrid IT
Cwin16 tls-partner-hpe-digital economy & Hybrid IT
 
Disciplined agile business analysis
Disciplined agile business analysisDisciplined agile business analysis
Disciplined agile business analysis
 
How Comcast uses Data Science to Improve the Customer Experience
How Comcast uses Data Science to Improve the Customer ExperienceHow Comcast uses Data Science to Improve the Customer Experience
How Comcast uses Data Science to Improve the Customer Experience
 
How to Start a Startup at NYU
How to Start a Startup at NYUHow to Start a Startup at NYU
How to Start a Startup at NYU
 
Introduction to the Innovation Corps (NSF I-Corps)
Introduction to the Innovation Corps (NSF I-Corps)Introduction to the Innovation Corps (NSF I-Corps)
Introduction to the Innovation Corps (NSF I-Corps)
 
[PREMONEY 2014] Mayfield Fund >> Tim Chang, "Mobile Is The Future Of YOU: Why...
[PREMONEY 2014] Mayfield Fund >> Tim Chang, "Mobile Is The Future Of YOU: Why...[PREMONEY 2014] Mayfield Fund >> Tim Chang, "Mobile Is The Future Of YOU: Why...
[PREMONEY 2014] Mayfield Fund >> Tim Chang, "Mobile Is The Future Of YOU: Why...
 
Cwin16 - lyon - faurecia customer cockpit
Cwin16 - lyon - faurecia customer cockpitCwin16 - lyon - faurecia customer cockpit
Cwin16 - lyon - faurecia customer cockpit
 

Similar to MMDS 2014 Talk - Distributing ML Algorithms: from GPUs to the Cloud

Computer Vision for Beginners
Computer Vision for BeginnersComputer Vision for Beginners
Computer Vision for Beginners
Sanghamitra Deb
 
Multi Model Machine Learning by Maximo Gurmendez and Beth Logan
Multi Model Machine Learning by Maximo Gurmendez and Beth LoganMulti Model Machine Learning by Maximo Gurmendez and Beth Logan
Multi Model Machine Learning by Maximo Gurmendez and Beth Logan
Spark Summit
 
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix ScaleQcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Xavier Amatriain
 
IEEE.BigData.Tutorial.2.slides
IEEE.BigData.Tutorial.2.slidesIEEE.BigData.Tutorial.2.slides
IEEE.BigData.Tutorial.2.slides
Nish Parikh
 
Machine Learning Introduction introducing basics of Machine Learning
Machine Learning Introduction introducing basics of Machine LearningMachine Learning Introduction introducing basics of Machine Learning
Machine Learning Introduction introducing basics of Machine Learning
vipulkondekar
 

Similar to MMDS 2014 Talk - Distributing ML Algorithms: from GPUs to the Cloud (20)

The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it Work
 
Machine Learning for Fraud Detection
Machine Learning for Fraud DetectionMachine Learning for Fraud Detection
Machine Learning for Fraud Detection
 
Computer Vision for Beginners
Computer Vision for BeginnersComputer Vision for Beginners
Computer Vision for Beginners
 
Multi Model Machine Learning by Maximo Gurmendez and Beth Logan
Multi Model Machine Learning by Maximo Gurmendez and Beth LoganMulti Model Machine Learning by Maximo Gurmendez and Beth Logan
Multi Model Machine Learning by Maximo Gurmendez and Beth Logan
 
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix ScaleQcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
 
Spark Summit EU talk by Josef Habdank
Spark Summit EU talk by Josef HabdankSpark Summit EU talk by Josef Habdank
Spark Summit EU talk by Josef Habdank
 
Big data 2.0, deep learning and financial Usecases
Big data 2.0, deep learning and financial UsecasesBig data 2.0, deep learning and financial Usecases
Big data 2.0, deep learning and financial Usecases
 
10 Lessons Learned from Building Machine Learning Systems
10 Lessons Learned from Building Machine Learning Systems10 Lessons Learned from Building Machine Learning Systems
10 Lessons Learned from Building Machine Learning Systems
 
Prediction as a service with ensemble model in SparkML and Python ScikitLearn
Prediction as a service with ensemble model in SparkML and Python ScikitLearnPrediction as a service with ensemble model in SparkML and Python ScikitLearn
Prediction as a service with ensemble model in SparkML and Python ScikitLearn
 
Ed Snelson. Counterfactual Analysis
Ed Snelson. Counterfactual AnalysisEd Snelson. Counterfactual Analysis
Ed Snelson. Counterfactual Analysis
 
Understanding Parallelization of Machine Learning Algorithms in Apache Spark™
Understanding Parallelization of Machine Learning Algorithms in Apache Spark™Understanding Parallelization of Machine Learning Algorithms in Apache Spark™
Understanding Parallelization of Machine Learning Algorithms in Apache Spark™
 
A Framework for Scene Recognition Using Convolutional Neural Network as Featu...
A Framework for Scene Recognition Using Convolutional Neural Network as Featu...A Framework for Scene Recognition Using Convolutional Neural Network as Featu...
A Framework for Scene Recognition Using Convolutional Neural Network as Featu...
 
Cooperative Machine Learning Network with Ahmed Masud of saf.ai
Cooperative Machine Learning Network with Ahmed Masud of saf.aiCooperative Machine Learning Network with Ahmed Masud of saf.ai
Cooperative Machine Learning Network with Ahmed Masud of saf.ai
 
C3 w1
C3 w1C3 w1
C3 w1
 
SparkML: Easy ML Productization for Real-Time Bidding
SparkML: Easy ML Productization for Real-Time BiddingSparkML: Easy ML Productization for Real-Time Bidding
SparkML: Easy ML Productization for Real-Time Bidding
 
Making sense of the Graph Revolution
Making sense of the Graph RevolutionMaking sense of the Graph Revolution
Making sense of the Graph Revolution
 
IEEE.BigData.Tutorial.2.slides
IEEE.BigData.Tutorial.2.slidesIEEE.BigData.Tutorial.2.slides
IEEE.BigData.Tutorial.2.slides
 
Large scale Click-streaming and tranaction log mining
Large scale Click-streaming and tranaction log miningLarge scale Click-streaming and tranaction log mining
Large scale Click-streaming and tranaction log mining
 
Machine Learning Introduction introducing basics of Machine Learning
Machine Learning Introduction introducing basics of Machine LearningMachine Learning Introduction introducing basics of Machine Learning
Machine Learning Introduction introducing basics of Machine Learning
 
MongoDB.local Seattle 2019: Global Cluster Topologies in MongoDB Atlas
MongoDB.local Seattle 2019: Global Cluster Topologies in MongoDB AtlasMongoDB.local Seattle 2019: Global Cluster Topologies in MongoDB Atlas
MongoDB.local Seattle 2019: Global Cluster Topologies in MongoDB Atlas
 

More from Xavier Amatriain

More from Xavier Amatriain (20)

Data/AI driven product development: from video streaming to telehealth
Data/AI driven product development: from video streaming to telehealthData/AI driven product development: from video streaming to telehealth
Data/AI driven product development: from video streaming to telehealth
 
AI-driven product innovation: from Recommender Systems to COVID-19
AI-driven product innovation: from Recommender Systems to COVID-19AI-driven product innovation: from Recommender Systems to COVID-19
AI-driven product innovation: from Recommender Systems to COVID-19
 
AI for COVID-19 - Q42020 update
AI for COVID-19 - Q42020 updateAI for COVID-19 - Q42020 update
AI for COVID-19 - Q42020 update
 
AI for COVID-19: An online virtual care approach
AI for COVID-19: An online virtual care approachAI for COVID-19: An online virtual care approach
AI for COVID-19: An online virtual care approach
 
Lessons learned from building practical deep learning systems
Lessons learned from building practical deep learning systemsLessons learned from building practical deep learning systems
Lessons learned from building practical deep learning systems
 
AI for healthcare: Scaling Access and Quality of Care for Everyone
AI for healthcare: Scaling Access and Quality of Care for EveryoneAI for healthcare: Scaling Access and Quality of Care for Everyone
AI for healthcare: Scaling Access and Quality of Care for Everyone
 
Towards online universal quality healthcare through AI
Towards online universal quality healthcare through AITowards online universal quality healthcare through AI
Towards online universal quality healthcare through AI
 
From one to zero: Going smaller as a growth strategy
From one to zero: Going smaller as a growth strategyFrom one to zero: Going smaller as a growth strategy
From one to zero: Going smaller as a growth strategy
 
Learning to speak medicine
Learning to speak medicineLearning to speak medicine
Learning to speak medicine
 
ML to cure the world
ML to cure the worldML to cure the world
ML to cure the world
 
Recommender Systems In Industry
Recommender Systems In IndustryRecommender Systems In Industry
Recommender Systems In Industry
 
Medical advice as a Recommender System
Medical advice as a Recommender SystemMedical advice as a Recommender System
Medical advice as a Recommender System
 
Staying Shallow & Lean in a Deep Learning World
Staying Shallow & Lean in a Deep Learning WorldStaying Shallow & Lean in a Deep Learning World
Staying Shallow & Lean in a Deep Learning World
 
Machine Learning for Q&A Sites: The Quora Example
Machine Learning for Q&A Sites: The Quora ExampleMachine Learning for Q&A Sites: The Quora Example
Machine Learning for Q&A Sites: The Quora Example
 
Past, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspectivePast, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspective
 
10 more lessons learned from building Machine Learning systems - MLConf
10 more lessons learned from building Machine Learning systems - MLConf10 more lessons learned from building Machine Learning systems - MLConf
10 more lessons learned from building Machine Learning systems - MLConf
 
10 more lessons learned from building Machine Learning systems
10 more lessons learned from building Machine Learning systems10 more lessons learned from building Machine Learning systems
10 more lessons learned from building Machine Learning systems
 
Lean DevOps - Lessons Learned from Innovation-driven Companies
Lean DevOps - Lessons Learned from Innovation-driven CompaniesLean DevOps - Lessons Learned from Innovation-driven Companies
Lean DevOps - Lessons Learned from Innovation-driven Companies
 
Recsys 2014 Tutorial - The Recommender Problem Revisited
Recsys 2014 Tutorial - The Recommender Problem RevisitedRecsys 2014 Tutorial - The Recommender Problem Revisited
Recsys 2014 Tutorial - The Recommender Problem Revisited
 
Kdd 2014 Tutorial - the recommender problem revisited
Kdd 2014 Tutorial -  the recommender problem revisitedKdd 2014 Tutorial -  the recommender problem revisited
Kdd 2014 Tutorial - the recommender problem revisited
 

Recently uploaded

"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
mphochane1998
 
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
Epec Engineered Technologies
 
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
AldoGarca30
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdf
Kamal Acharya
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
ssuser89054b
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
jaanualu31
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Kandungan 087776558899
 

Recently uploaded (20)

"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
 
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
 
Moment Distribution Method For Btech Civil
Moment Distribution Method For Btech CivilMoment Distribution Method For Btech Civil
Moment Distribution Method For Btech Civil
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna Municipality
 
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLEGEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.
 
Engineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesEngineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planes
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdf
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdf
 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdf
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
 
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 

MMDS 2014 Talk - Distributing ML Algorithms: from GPUs to the Cloud

  • 1. Distributing Large-scale ML Algorithms: from GPUs to the Cloud MMDS 2014 June, 2014 Xavier Amatriain Director - Algorithms Engineering @xamat
  • 2. Outline ■ Introduction ■ Emmy-winning Algorithms ■ Distributing ML Algorithms in Practice ■ An example: ANN over GPUs & AWS Cloud
  • 3. What we were interested in: ■ High quality recommendations Proxy question: ■ Accuracy in predicted rating ■ Improve by 10% = $1million! Data size: ■ 100M ratings (back then “almost massive”)
  • 5. Netflix Scale ▪ > 44M members ▪ > 40 countries ▪ > 1000 device types ▪ > 5B hours in Q3 2013 ▪ Plays: > 50M/day ▪ Searches: > 3M/day ▪ Ratings: > 5M/day ▪ Log 100B events/day ▪ 31.62% of peak US downstream traffic
  • 6. Smart Models ■ Regression models (Logistic, Linear, Elastic nets) ■ GBDT/RF ■ SVD & other MF models ■ Factorization Machines ■ Restricted Boltzmann Machines ■ Markov Chains & other graphical models ■ Clustering (from k-means to modern non-parametric models) ■ Deep ANN ■ LDA ■ Association Rules ■ …
  • 8.
  • 10. 2007 Progress Prize ▪ Top 2 algorithms ▪ MF/SVD - Prize RMSE: 0.8914 ▪ RBM - Prize RMSE: 0.8990 ▪ Linear blend Prize RMSE: 0.88 ▪ Currently in use as part of Netflix’ rating prediction component ▪ Limitations ▪ Designed for 100M ratings, we have 5B ratings ▪ Not adaptable as users add ratings ▪ Performance issues
  • 18. 1. Do I need all that data? 2. At what level should I distribute/parallelize? 3. What latency can I afford?
  • 19. Do I need all that data?
  • 20. Really? Anand Rajaraman: Former Stanford Prof. & Senior VP at Walmart
  • 22. [Banko and Brill, 2001] Norvig: “Google does not have better Algorithms, only more Data” Many features/ low-bias models
  • 24. At what level should I parallelize?
  • 25. The three levels of Distribution/Parallelization 1. For each subset of the population (e.g. region) 2. For each combination of the hyperparameters 3. For each subset of the training data Each level has different requirements
  • 26. Level 1 Distribution ■ We may have subsets of the population for which we need to train an independently optimized model. ■ Training can be fully distributed requiring no coordination or data communication
  • 27. Level 2 Distribution ■ For a given subset of the population we need to find the “optimal” model ■ Train several models with different hyperparameter values ■ Worst-case: grid search ■ Can do much better than this (E.g. Bayesian Optimization with Gaussian Process Priors) ■ This process *does* require coordination ■ Need to decide on next “step” ■ Need to gather final optimal result ■ Requires data distribution, not sharing
  • 28. Level 3 Distribution ■ For each combination of hyperparameters, model training may still be expensive ■ Process requires coordination and data sharing/communication ■ Can distribute computation over machines splitting examples or parameters (e.g. ADMM) ■ Or parallelize on a single multicore machine (e.g. Hogwild) ■ Or… use GPUs
  • 29. ANN Training over GPUS and AWS
  • 30. ANN Training over GPUS and AWS ■ Level 1 distribution: machines over different AWS regions ■ Level 2 distribution: machines in AWS and same AWS region ■ Use coordination tools ■ Spearmint or similar for parameter optimization ■ Condor, StarCluster, Mesos… for distributed cluster coordination ■ Level 3 parallelization: highly optimized parallel CUDA code on GPUs
  • 31. What latency can I afford?
  • 32. 3 shades of latency ▪ Blueprint for multiple algorithm services ▪ Ranking ▪ Row selection ▪ Ratings ▪ Search ▪ … ▪ Multi-layered Machine Learning