MMDS 2014 Talk - Distributing ML Algorithms: from GPUs to the Cloud

  1. Distributing Large-scale ML Algorithms: from GPUs to the Cloud. MMDS 2014, June 2014. Xavier Amatriain, Director of Algorithms Engineering (@xamat)
  2. Outline ■ Introduction ■ Emmy-winning Algorithms ■ Distributing ML Algorithms in Practice ■ An example: ANN over GPUs & AWS Cloud
  3. What we were interested in: ■ High-quality recommendations. Proxy question: ■ Accuracy in predicted rating ■ Improve by 10% = $1 million! Data size: ■ 100M ratings (back then “almost massive”)
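
The proxy metric behind "accuracy in predicted rating" was RMSE over a held-out set of ratings; a 10% reduction relative to Netflix's own system won the $1M prize. Below is a minimal sketch of how that metric and target are computed; the rating arrays and constant-prediction baseline are made up for illustration.

```python
import numpy as np

def rmse(predicted, actual):
    """Root mean squared error between predicted and true ratings."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return np.sqrt(np.mean((predicted - actual) ** 2))

# Toy example on a 1-5 star scale.
true_ratings = [4, 3, 5, 1, 2]
model_preds = [3.8, 3.4, 4.6, 1.5, 2.2]
print("model RMSE:", rmse(model_preds, true_ratings))

# A "10% improvement" means reaching 0.9 * baseline_rmse on the hidden test set.
baseline_rmse = rmse([3.6] * len(true_ratings), true_ratings)  # toy constant-prediction baseline
print("target:", 0.9 * baseline_rmse)
```
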
  4. 2006 → 2014
  5. Netflix Scale ▪ > 44M members ▪ > 40 countries ▪ > 1000 device types ▪ > 5B hours in Q3 2013 ▪ Plays: > 50M/day ▪ Searches: > 3M/day ▪ Ratings: > 5M/day ▪ Log 100B events/day ▪ 31.62% of peak US downstream traffic
  6. Smart Models ■ Regression models (logistic, linear, elastic nets) ■ GBDT/RF ■ SVD & other MF models ■ Factorization Machines ■ Restricted Boltzmann Machines ■ Markov Chains & other graphical models ■ Clustering (from k-means to modern non-parametric models) ■ Deep ANN ■ LDA ■ Association Rules ■ …
  7. Netflix Algorithms: “Emmy Winning”
  8. Rating Prediction
  9. 2007 Progress Prize ▪ Top 2 algorithms ▪ MF/SVD - Prize RMSE: 0.8914 ▪ RBM - Prize RMSE: 0.8990 ▪ Linear blend - Prize RMSE: 0.88 ▪ Currently in use as part of Netflix’s rating prediction component ▪ Limitations ▪ Designed for 100M ratings; we now have 5B ratings ▪ Not adaptable as users add ratings ▪ Performance issues
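
The "linear blend" on this slide combines the MF/SVD and RBM predictions with a weighted sum fit on held-out data. The exact blending procedure isn't given in the deck; the sketch below is one plausible version using least squares on a probe set, with all data and variable names invented for illustration.

```python
import numpy as np

# Hypothetical held-out probe set: true ratings plus each model's predictions.
probe_true = np.array([4.0, 3.0, 5.0, 2.0, 1.0, 4.0])
pred_svd = np.array([3.7, 3.2, 4.5, 2.4, 1.6, 3.9])   # MF/SVD predictions
pred_rbm = np.array([3.9, 2.8, 4.8, 2.1, 1.2, 4.2])   # RBM predictions

# Fit blend weights (plus an intercept) by least squares on the probe set.
X = np.column_stack([pred_svd, pred_rbm, np.ones_like(pred_svd)])
weights, *_ = np.linalg.lstsq(X, probe_true, rcond=None)

def blend(svd_pred, rbm_pred):
    """Blended rating prediction = w1 * svd + w2 * rbm + bias."""
    return weights[0] * svd_pred + weights[1] * rbm_pred + weights[2]

blended = blend(pred_svd, pred_rbm)
print("blend RMSE:", np.sqrt(np.mean((blended - probe_true) ** 2)))
```
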
  10. Ranking
  11. Page composition
  12. Similarity
  13. Search Recommendations
  14. Postplay
  15. Gamification
  16. Distributing ML algorithms in practice
  17. Three questions: 1. Do I need all that data? 2. At what level should I distribute/parallelize? 3. What latency can I afford?
  18. Do I need all that data?
  19. Really? Anand Rajaraman: former Stanford professor & Senior VP at Walmart
  20. Sometimes, it’s not about more data
  21. [Banko and Brill, 2001] Norvig: “Google does not have better Algorithms, only more Data” Many features / low-bias models
  22. Sometimes, it’s not about more data
  23. At what level should I parallelize?
  24. The three levels of Distribution/Parallelization: 1. For each subset of the population (e.g. region) 2. For each combination of the hyperparameters 3. For each subset of the training data. Each level has different requirements.
  25. Level 1 Distribution ■ We may have subsets of the population for which we need to train an independently optimized model. ■ Training can be fully distributed, requiring no coordination or data communication
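
Because the per-subset (e.g. per-region) models are independent, Level 1 is embarrassingly parallel: each worker trains its own model and never talks to the others. Here is a minimal sketch under that assumption; the `train_region_model` function and per-region datasets are hypothetical, and a local process pool stands in for separate machines.

```python
from concurrent.futures import ProcessPoolExecutor

def train_region_model(region, examples):
    """Hypothetical stand-in for training one independently optimized model."""
    # ...fit whatever model this region needs, using only this region's data...
    return region, {"num_examples": len(examples)}  # placeholder for a fitted model

# Hypothetical per-region datasets; in practice each lives near its own workers.
data_by_region = {
    "US": list(range(1000)),
    "BR": list(range(800)),
    "UK": list(range(600)),
}

if __name__ == "__main__":
    # No coordination or data communication: every subset trains on its own.
    with ProcessPoolExecutor() as pool:
        models = dict(pool.map(train_region_model,
                               data_by_region.keys(),
                               data_by_region.values()))
    print(models)
```
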
  26. Level 2 Distribution ■ For a given subset of the population, we need to find the “optimal” model ■ Train several models with different hyperparameter values ■ Worst case: grid search ■ Can do much better than this (e.g. Bayesian Optimization with Gaussian Process priors) ■ This process *does* require coordination ■ Need to decide on the next “step” ■ Need to gather the final optimal result ■ Requires data distribution, not sharing
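
Level 2 fans the same training data out to many hyperparameter configurations; the only coordination is choosing candidates and gathering the winner. The sketch below shows the worst-case grid-search variant mentioned on the slide (a Bayesian optimization loop would replace the fixed grid with a model-driven proposal step). The objective function and hyperparameter grid are invented for illustration.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import product

def train_and_validate(params):
    """Hypothetical stand-in: train with these hyperparameters, return validation loss."""
    lr, reg = params["lr"], params["reg"]
    return params, (lr - 0.05) ** 2 + (reg - 0.1) ** 2   # fake validation loss

# Worst case: exhaustive grid over the hyperparameter space.
grid = [{"lr": lr, "reg": reg}
        for lr, reg in product([0.01, 0.05, 0.1], [0.0, 0.1, 1.0])]

if __name__ == "__main__":
    # Each worker gets its own copy of the training data (distribution, not sharing).
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(train_and_validate, grid))
    # The coordination step: gather results and keep the best configuration.
    best_params, best_loss = min(results, key=lambda r: r[1])
    print(best_params, best_loss)
```
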
  27. Level 3 Distribution ■ For each combination of hyperparameters, model training may still be expensive ■ Process requires coordination and data sharing/communication ■ Can distribute computation over machines, splitting examples or parameters (e.g. ADMM) ■ Or parallelize on a single multicore machine (e.g. Hogwild) ■ Or… use GPUs
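
Level 3 splits a single training job itself. The sketch below shows one simple flavor of splitting examples across workers: each "machine" computes a gradient on its own shard and a coordinator averages them into a single update. This is a toy synchronous stand-in, not the ADMM, Hogwild, or CUDA approaches named on the slide; the linear model and data are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 20))                              # fake features
y = X @ rng.normal(size=20) + 0.1 * rng.normal(size=10_000)    # fake targets

def shard_gradient(w, X_shard, y_shard):
    """Gradient of the squared-error loss on one worker's shard of examples."""
    err = X_shard @ w - y_shard
    return X_shard.T @ err / len(y_shard)

n_workers = 4
X_shards = np.array_split(X, n_workers)
y_shards = np.array_split(y, n_workers)

w = np.zeros(20)
for step in range(200):
    # Each worker touches only its own examples; only gradients are communicated.
    grads = [shard_gradient(w, Xs, ys) for Xs, ys in zip(X_shards, y_shards)]
    w -= 0.1 * np.mean(grads, axis=0)   # coordinator averages gradients and updates

print("final loss:", np.mean((X @ w - y) ** 2))
```
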
  28. ANN Training over GPUs and AWS
  29. ANN Training over GPUs and AWS ■ Level 1 distribution: machines across different AWS regions ■ Level 2 distribution: machines within the same AWS region ■ Use coordination tools ■ Spearmint or similar for hyperparameter optimization ■ Condor, StarCluster, Mesos… for distributed cluster coordination ■ Level 3 parallelization: highly optimized parallel CUDA code on GPUs
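
Put together, the coordination for this setup amounts to a driver that enumerates (region, hyperparameter) combinations and hands each one to a worker with a GPU; tools like Spearmint (hyperparameter proposals) and Condor/StarCluster/Mesos (cluster scheduling) fill in the pieces that this sketch fakes with a local queue and threads. Every name and value below is hypothetical.

```python
import itertools
import queue
import threading

def train_ann_on_gpu(region, params):
    """Hypothetical stand-in for a CUDA-backed ANN training run on one worker's GPU."""
    return {"region": region, "params": params, "val_error": 0.0}  # placeholder result

regions = ["US", "BR", "UK"]                        # Level 1: independent population subsets
hyperparams = [{"layers": l, "lr": lr}              # Level 2: candidate configurations
               for l, lr in itertools.product([2, 3], [0.01, 0.1])]

work = queue.Queue()
for job in itertools.product(regions, hyperparams):
    work.put(job)
results = []

def worker():
    # Level 3 (parallel CUDA kernels) would live inside train_ann_on_gpu.
    while True:
        try:
            region, params = work.get_nowait()
        except queue.Empty:
            return
        results.append(train_ann_on_gpu(region, params))

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results), "training runs completed")
```
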
  30. What latency can I afford?
  31. 3 shades of latency ▪ Blueprint for multiple algorithm services ▪ Ranking ▪ Row selection ▪ Ratings ▪ Search ▪ … ▪ Multi-layered Machine Learning
  32. Matrix Factorization Example
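
The matrix factorization example appears as a diagram in the deck. As a stand-in, here is a minimal SGD-trained MF sketch in the spirit of the Prize-era SVD models (user/item biases and other refinements omitted), with a tiny made-up rating matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
# Tiny made-up (user, item, rating) triples on a 1-5 scale.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0),
           (2, 1, 2.0), (2, 2, 5.0), (3, 0, 1.0), (3, 2, 4.0)]
n_users, n_items, k = 4, 3, 2

U = 0.1 * rng.normal(size=(n_users, k))   # user latent factors
V = 0.1 * rng.normal(size=(n_items, k))   # item latent factors
lr, reg = 0.05, 0.02

for epoch in range(200):
    for u, i, r in ratings:
        err = r - U[u] @ V[i]
        U[u] += lr * (err * V[i] - reg * U[u])
        V[i] += lr * (err * U[u] - reg * V[i])

pred = U @ V.T
print(np.round(pred, 2))   # reconstructed rating matrix, including unseen cells
```
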
  33. Thanks! Xavier Amatriain (@xamat), xavier@netflix.com (and yes, we are hiring)
