Distributed machine learning

Stanley Wang, Solutions Architect at Honeywell
DISTRIBUTED MACHINE LEARNING
STANLEY WANG
SOLUTION ARCHITECT, TECH LEAD
@SWANG68
http://www.linkedin.com/in/stanley-wang-a2b143b
What is Machine Learning?
Mathematics 101 for Machine Learning
Types of Machine Learning
Types of ML Algorithms
• Clustering
• Association learning
• Parameter estimation
• Recommendation engines
• Classification
• Similarity matching
• Neural networks
• Bayesian networks
• Genetic algorithms
Top Machine Learning Algorithms
Machine Learning Library
Typical Machine Learning Cases
Machine Learning Customers Examples
Machine Learning in Big Data Infrastructure
Big Data Machine Learning Pipeline
Benefits of Big Data Machine Learning
Distributed ML Framework
• Data Centric: Train over large data
– Data is split over multiple machines
– Model replicas train over different parts of the data and periodically communicate model information
• Model Centric: Train over large models
– The model is split over multiple machines
– A single training iteration spans multiple machines
• Graph Centric: Train over large graphs
– Data is partitioned as a graph, with data associated with every vertex and edge;
– Update functions are applied in parallel; each operates on a vertex and transforms the data within the scope of that vertex;
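The data-centric pattern can be illustrated with a minimal single-process sketch, assuming a linear-regression model trained with SGD. The data is split into shards (one per simulated machine), each model replica trains only on its own shard, and the replicas periodically exchange model information by averaging their parameters. The shard count, learning rate, number of rounds and the helper names (`make_data`, `local_sgd`, `train_data_parallel`) are illustrative assumptions; the "machines" are plain Python lists rather than real workers.

```python
import random

def make_data(n, true_w, true_b, seed=0):
    """Generate noise-free samples (x, y) of y = true_w * x + true_b."""
    rng = random.Random(seed)
    return [(x, true_w * x + true_b) for x in (rng.uniform(-1.0, 1.0) for _ in range(n))]

def local_sgd(model, shard, lr, steps, offset):
    """Run `steps` SGD updates of squared-error linear regression on one shard."""
    w, b = model
    for i in range(steps):
        x, y = shard[(offset + i) % len(shard)]
        err = (w * x + b) - y
        w -= lr * err * x
        b -= lr * err
    return (w, b)

def train_data_parallel(data, n_workers=4, rounds=50, local_steps=10, lr=0.1):
    # Data split over multiple "machines": worker k gets every n_workers-th sample.
    shards = [data[k::n_workers] for k in range(n_workers)]
    model = (0.0, 0.0)
    for r in range(rounds):
        # Each replica starts from the shared model and trains on its own shard only.
        replicas = [local_sgd(model, s, lr, local_steps, r * local_steps) for s in shards]
        # Periodic communication: average the replicas' parameters.
        model = (sum(m[0] for m in replicas) / n_workers,
                 sum(m[1] for m in replicas) / n_workers)
    return model

if __name__ == "__main__":
    w, b = train_data_parallel(make_data(400, true_w=2.0, true_b=-1.0))
    print(f"w={w:.3f}, b={b:.3f}")  # expected to be close to w=2, b=-1
```

Averaging after every few local steps trades communication for accuracy: syncing more often keeps replicas closer together, syncing less often reduces network traffic.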
Data Parallel ML - MapReduce
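Batch gradient descent maps naturally onto MapReduce: the map step computes a partial gradient over each data partition, and the reduce step sums the partial results into one global gradient. The sketch below assumes a least-squares linear model and simulates the pattern on one machine with Python's built-in `map` and `functools.reduce`; on Hadoop or Spark the partitions would live on different nodes, and each iteration would be a separate job. Function names and hyperparameters are hypothetical.

```python
from functools import reduce

def partial_gradient(partition, w, b):
    """Map step: summed squared-error gradient (and sample count) for one partition."""
    gw = gb = 0.0
    for x, y in partition:
        err = (w * x + b) - y
        gw += err * x
        gb += err
    return (gw, gb, len(partition))

def add_gradients(g1, g2):
    """Reduce step: element-wise sum of two partial results."""
    return (g1[0] + g2[0], g1[1] + g2[1], g1[2] + g2[2])

def mapreduce_gd(partitions, iterations=200, lr=0.5):
    w, b = 0.0, 0.0
    for _ in range(iterations):
        # One "job" per iteration: map over partitions, then reduce.
        # The lazy map is consumed by reduce before w and b change.
        partials = map(lambda p: partial_gradient(p, w, b), partitions)
        gw, gb, n = reduce(add_gradients, partials)
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

if __name__ == "__main__":
    data = [(x / 10.0, 3.0 * (x / 10.0) + 0.5) for x in range(-10, 11)]
    partitions = [data[0:7], data[7:14], data[14:]]  # three simulated nodes
    print(mapreduce_gd(partitions))  # expected to approach (3.0, 0.5)
```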
Model Parallel ML – Parameter Server
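In the parameter-server model, parameters are sharded across server nodes; workers pull the current values of only the keys they need, compute updates on their local data, and push those updates back. The single-process sketch below, assuming a sparse linear model keyed by feature name, illustrates this pull/push interface only. The names `ParameterServer`, `pull`, `push`, `worker_step` and the toy data are hypothetical and are not the API of CMU's Parameter Server or any other library; real systems add networking, asynchrony and consistency controls.

```python
import zlib
from collections import defaultdict

class ParameterServer:
    """Hypothetical parameter server: keys are sharded over n_shards dicts."""
    def __init__(self, n_shards=3):
        self.shards = [defaultdict(float) for _ in range(n_shards)]

    def _shard(self, key):
        # Stable hash so a key always maps to the same server shard.
        return self.shards[zlib.crc32(key.encode()) % len(self.shards)]

    def pull(self, keys):
        return {k: self._shard(k)[k] for k in keys}

    def push(self, grads, lr):
        # Each shard applies the gradient updates for the keys it owns.
        for k, g in grads.items():
            self._shard(k)[k] -= lr * g

def worker_step(server, examples, lr):
    """One worker iteration: pull needed weights, compute a local gradient, push it."""
    keys = {f for features, _ in examples for f in features}
    w = server.pull(keys)
    grads = defaultdict(float)
    for features, y in examples:
        err = sum(w[f] * v for f, v in features.items()) - y
        for f, v in features.items():
            grads[f] += err * v / len(examples)
    server.push(grads, lr)

# Assumed toy data: each worker holds different examples; true weights a=1, b=-2, c=0.5.
WORKER_DATA = [
    [({"a": 1.0}, 1.0), ({"a": 1.0, "b": 1.0}, -1.0)],
    [({"b": 1.0, "c": 1.0}, -1.5), ({"c": 1.0}, 0.5)],
]

def train(server, rounds=1000, lr=0.2):
    for _ in range(rounds):
        for examples in WORKER_DATA:  # workers take turns in this simulation
            worker_step(server, examples, lr)

if __name__ == "__main__":
    ps = ParameterServer()
    train(ps)
    print(ps.pull(["a", "b", "c"]))
```

Note that the first worker never touches key `c` and the second never touches key `a`; only the keys a worker actually uses travel over the (simulated) network, which is what makes the approach attractive for very large sparse models.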
Graph Parallel ML – BSP, Pregel, GAS
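In the vertex-centric BSP model popularized by Pregel (and implemented by Apache Giraph), computation proceeds in supersteps: in each superstep every vertex reads the messages sent to it in the previous superstep, updates its own value, and sends messages along its out-edges, with a global barrier between supersteps. The sketch below simulates PageRank in this style on one machine. The function name and message dictionaries are illustrative, not the Pregel or Giraph API; the damping factor 0.85, the fixed superstep count, and the assumption that every vertex has at least one out-edge are simplifications.

```python
from collections import defaultdict

def pregel_pagerank(out_edges, supersteps=30, damping=0.85):
    """Vertex-centric PageRank; out_edges maps vertex -> list of out-neighbours."""
    vertices = list(out_edges)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    inbox = defaultdict(list)  # messages delivered at the start of a superstep

    for step in range(supersteps):
        outbox = defaultdict(list)
        for v in vertices:  # conceptually, all vertices run in parallel
            if step > 0:
                # Each vertex uses only its own state and its incoming messages.
                rank[v] = (1 - damping) / n + damping * sum(inbox[v])
            # Send this vertex's rank, split evenly, along its out-edges.
            for u in out_edges[v]:
                outbox[u].append(rank[v] / len(out_edges[v]))
        inbox = outbox  # barrier: messages become visible in the next superstep
    return rank

if __name__ == "__main__":
    print(pregel_pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]}, supersteps=50))
```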
Graph Parallel vs Data Parallel
Graph-parallel processing is a newer technique for partitioning and distributing
data; for iterative, graph-structured machine learning algorithms it can run orders of
magnitude faster than a data-parallel approach!
Efficient Scaling Up
• Businesses need to compute hundreds of distinct tasks on the same graph
o Example: personalized recommendations;
Figure: Parallelize each task (complex, expensive to scale) vs. parallelize across tasks (simple, 2x machines = 2x throughput)
Another Approach: Task Parallelism,
Simple but Practical
• What about scalability? Use a cluster of single-machine systems to solve many tasks in parallel: homogeneous graph data, but heterogeneous algorithms;
• What about learning ability? Use a hybrid data fusion approach;
• What about memory? Use the Parallel Sliding Windows (PSW) algorithm to enable computation on very large graphs stored on disk;
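Task parallelism can be sketched with Python's standard `concurrent.futures`: every task reads the same (homogeneous) graph, each task is a different personalized computation that runs entirely on one worker, and adding workers raises throughput without changing the algorithm. Personalized PageRank by power iteration is used as an assumed example task, and `GRAPH`, `personalized_pagerank` and `run_all_tasks` are hypothetical names; a cluster version would distribute the tasks over machines rather than local processes.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

# Shared, read-only graph: vertex -> out-neighbours (assumed example data).
GRAPH = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}

def personalized_pagerank(source, iterations=50, alpha=0.15):
    """One independent task: PageRank with restarts to a single source vertex."""
    rank = {v: 0.0 for v in GRAPH}
    rank[source] = 1.0
    for _ in range(iterations):
        nxt = {v: 0.0 for v in GRAPH}
        nxt[source] = alpha  # restart probability mass returns to the source
        for v, nbrs in GRAPH.items():
            share = (1 - alpha) * rank[v] / len(nbrs)
            for u in nbrs:
                nxt[u] += share
        rank = nxt
    return source, rank

def run_all_tasks(sources, workers=4, executor_cls=ProcessPoolExecutor):
    # Each task runs entirely on one worker; tasks never communicate.
    with executor_cls(max_workers=workers) as pool:
        return dict(pool.map(personalized_pagerank, sources))
```

Calling `run_all_tasks(list(GRAPH))` from a script's `if __name__ == "__main__":` block runs the four tasks in separate processes; passing `executor_cls=ThreadPoolExecutor` runs the same code in threads, which is convenient for quick tests but, in CPython, does not give true CPU parallelism.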
Parallel Sliding Windows
• PSW processes the graph one sub-graph at a time:
• In one iteration, the whole graph is processed.
– Typically, the next iteration is then started.
Scalable Distributed ML Frameworks
• Yahoo Vowpal Wabbit - Fast, scalable library of out-of-core online ML algorithms; Hadoop-compatible AllReduce;
• Apache Mahout - Scalable Java ML library using the MapReduce paradigm on Hadoop; supports the "3 C" use cases (collaborative filtering, clustering, classification) plus extras;
• Spark MLlib - In-memory distributed ML framework; reported to be about 10 times as fast as disk-based Mahout and to scale better;
• Dato GraphLab - Graph-based parallel ML framework;
• Apache Giraph - Bulk Synchronous Parallel (BSP) based large-scale graph processing framework;
• CMU Parameter Server - Distributed ML framework;
• CMU Petuum - Framework for iterative-convergent big ML;
• 0xdata H2O - Scalable, memory-efficient deep learning system;
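AllReduce, as used by Vowpal Wabbit on Hadoop, leaves every node holding the sum of all nodes' vectors (for example, gradients or weights). One common way to implement it, and the one sketched here, is over a spanning tree: partial sums flow from the leaves up to the root, and the total is then broadcast back down the same tree. The binary-tree layout (node i's parent is (i - 1) // 2) and the in-process simulation are assumptions for illustration; this is not Vowpal Wabbit's code.

```python
def tree_allreduce(local_vectors):
    """Simulated AllReduce: node i holds local_vectors[i]; returns every node's result."""
    n = len(local_vectors)
    partial = [list(v) for v in local_vectors]

    # Reduce phase: visit nodes from the highest index down, so each node has
    # already received its children's sums before passing its own to its parent.
    for i in range(n - 1, 0, -1):
        parent = (i - 1) // 2
        partial[parent] = [a + b for a, b in zip(partial[parent], partial[i])]

    # Broadcast phase: the root's total flows back down the same tree.
    result = [None] * n
    result[0] = partial[0]
    for i in range(1, n):
        result[i] = list(result[(i - 1) // 2])
    return result

if __name__ == "__main__":
    print(tree_allreduce([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]))
```

Dividing each node's result by the number of nodes turns the sum into an average, which is how gradients or weights from data-parallel workers are typically combined.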
ML and Big Data: A Breakthrough