DISTRIBUTED MACHINE LEARNING
STANLEY WANG
SOLUTION ARCHITECT, TECH LEAD
@SWANG68
http://www.linkedin.com/in/stanley-wang-a2b143b
What is Machine Learning?
Mathematics 101 for Machine Learning
Types of Machine Learning
Types of ML Algorithms
• Clustering
• Association learning
• Parameter estimation
• Recommendation engines
• Classification
• Similarity matching
• Neural networks
• Bayesian networks
• Genetic algorithms
Top Machine Learning Algorithms
Machine Learning Library
Typical Machine Learning Cases
Machine Learning Customer Examples
Machine Learning in Big Data Infrastructure
Big Data Machine Learning Pipeline
Benefits of Big Data Machine Learning
Distributed ML Framework
• Data Centric: Train over large data
  – Data is split over multiple machines;
  – Model replicas train over different parts of the data and
    communicate model information periodically;
• Model Centric: Train over large models
  – The model is split over multiple machines;
  – A single training iteration spans multiple machines;
• Graph Centric: Train over large graphs
  – Data is partitioned as a graph, with state associated with every
    vertex/edge;
  – Update functions are applied in parallel; each operates on a vertex
    and transforms the data in that vertex's scope;
Data Parallel ML – MapReduce
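A minimal sketch of the MapReduce flavor of data-parallel learning, in plain Python: each map task computes a partial gradient over its own data shard, a reduce step sums the partials, and the driver updates the shared model once per round. The toy linear model, data shards, and learning rate are illustrative assumptions, not part of the original deck.

    # Data-parallel learning in the MapReduce style; plain Python stands
    # in for a real cluster. Illustrative toy model: y = w * x.
    from functools import reduce

    def partial_gradient(partition, w):
        # map step: each worker computes the squared-error gradient
        # over its own shard of (x, y) pairs
        g = 0.0
        for x, y in partition:
            g += 2 * (w * x - y) * x
        return g

    partitions = [[(1.0, 2.0), (2.0, 4.1)],   # shard on machine 1
                  [(3.0, 5.9), (4.0, 8.2)]]   # shard on machine 2
    w, lr = 0.0, 0.01
    for _ in range(100):                      # one MapReduce round per iteration
        grads = [partial_gradient(p, w) for p in partitions]   # map
        total = reduce(lambda a, b: a + b, grads)              # reduce
        w -= lr * total                       # driver updates the shared model
    print(w)                                  # converges toward ~2.0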
Model Parallel ML – Parameter Server
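A minimal sketch of the parameter-server pattern: a server holds the shared weights, and workers pull them, compute gradients on their local data shard, and push updates back. The class and method names (ParameterServer, pull, push) and the toy data are illustrative assumptions; real systems shard the model across many server nodes and run workers asynchronously over the network.

    # Parameter-server pattern, simulated in one process.
    class ParameterServer:
        def __init__(self, dim):
            self.w = [0.0] * dim
        def pull(self):
            return list(self.w)               # worker fetches current weights
        def push(self, grad, lr=0.05):
            for i, g in enumerate(grad):      # server applies the update
                self.w[i] -= lr * g

    def worker_step(server, shard):
        w = server.pull()
        grad = [0.0] * len(w)
        for x, y in shard:                    # squared-error gradient, model y = w.x
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for i, xi in enumerate(x):
                grad[i] += 2 * err * xi
        server.push(grad)

    server = ParameterServer(dim=2)
    shards = [[([1.0, 0.0], 3.0)], [([0.0, 1.0], -1.0)]]   # one shard per worker
    for _ in range(200):
        for shard in shards:                  # sequential stand-in for parallel workers
            worker_step(server, shard)
    print(server.w)                           # approaches [3.0, -1.0]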
Graph Parallel ML – BSP, Pregel, GAS
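A minimal sketch of a BSP/Pregel-style vertex program, using PageRank as the update function: computation proceeds in synchronized supersteps, and each vertex reads only the messages addressed to it in the previous superstep. The three-vertex graph and the damping factor 0.85 are illustrative assumptions; real engines such as Pregel and Giraph partition vertices across machines and exchange messages over the network.

    # BSP/Pregel-style supersteps over a tiny adjacency-list graph.
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    rank = {v: 1.0 / len(graph) for v in graph}

    for superstep in range(30):
        # message phase: each vertex sends rank/out_degree to its neighbors
        inbox = {v: [] for v in graph}
        for v, out in graph.items():
            for u in out:
                inbox[u].append(rank[v] / len(out))
        # compute phase: every vertex updates from its inbox (PageRank rule)
        rank = {v: 0.15 / len(graph) + 0.85 * sum(msgs)
                for v, msgs in inbox.items()}
    print(rank)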
Graph Parallel vs Data Parallel
Graph parallel is a new technique for partitioning and distributing
data that can execute machine learning algorithms orders of magnitude
faster than the data-parallel approach!
Efficient Scaling Up
• Businesses need to compute hundreds of distinct tasks on the
same graph
o Example: personalized recommendations;
[Diagram: "Parallelize each task" vs. "Parallelize across tasks":
parallelizing each task is complex and expensive to scale; parallelizing
across tasks is simple, and 2x machines = 2x throughput]
Another Approach, Task Parallelism:
Simple but Practical
• What about scalability? Use a cluster of single-machine systems to
solve many tasks in parallel: the graph data is homogeneous, but the
algorithms are heterogeneous (see the sketch after this list);
• What about learning ability? Use a hybrid data fusion approach;
• What about memory? The Parallel Sliding Windows (PSW) algorithm
enables computation on very large graphs stored on disk;
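The sketch promised above: task parallelism runs many independent tasks over the same graph, each task completing on a single worker. Python's multiprocessing pool stands in here for a cluster of single-machine systems, and the 2-hop-neighborhood task is an illustrative assumption standing in for, e.g., one personalized-recommendation computation per user.

    # Task parallelism: same graph, many independent tasks in parallel.
    from multiprocessing import Pool

    GRAPH = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}

    def one_task(source):
        # the whole task runs on one worker: 2-hop neighborhood of `source`
        hop1 = set(GRAPH[source])
        hop2 = {w for v in hop1 for w in GRAPH[v]}
        return source, sorted(hop1 | hop2)

    if __name__ == "__main__":
        with Pool(processes=4) as pool:        # 2x workers ~ 2x throughput
            for source, neighborhood in pool.map(one_task, GRAPH):
                print(source, neighborhood)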
Parallel Sliding Windows
• PSW processes the graph one sub-graph at a time;
• In one iteration, the whole graph is processed
– and typically, the next iteration is then started (a minimal
sketch follows below).
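A minimal sketch of the PSW idea (introduced by GraphChi): edges are stored sorted by destination vertex, the vertex set is cut into intervals, and each interval's sub-graph is processed with only a small window of the edge data in memory. The in-memory edge list below stands in for on-disk shards, and the update function is an illustrative assumption.

    # Parallel Sliding Windows, heavily simplified.
    edges = sorted([(0, 2), (1, 2), (2, 3), (3, 0), (1, 3)],
                   key=lambda e: e[1])         # shard sorted by destination
    intervals = [(0, 1), (2, 3)]               # vertex ranges, one sub-graph each
    value = {v: 1.0 for v in range(4)}

    for lo, hi in intervals:                   # one sub-graph at a time
        # sliding window: only edges whose destination falls in the interval
        window = [(s, d) for s, d in edges if lo <= d <= hi]
        for s, d in window:
            # update function: destination vertex aggregates from its in-edges
            value[d] += 0.5 * value[s]
    # after all intervals, one full iteration over the graph is complete
    print(value)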
Scalable Distributed ML Frameworks
• Yahoo Vowpal Wabbit - Fast and scalable out-of-core online ML
algorithm library; Hadoop-compatible AllReduce;
• Hadoop Mahout - Scalable Java ML library using the map/reduce
paradigm; supports the "3 Cs" (classification, clustering,
collaborative filtering) plus extra use cases;
• Spark MLlib - Memory-based distributed ML framework; about 10
times as fast as Mahout and scales even better;
• Dato GraphLab - Graph-based parallel ML framework;
• Apache Giraph - Bulk Synchronous Parallel (BSP) based large-scale
graph processing framework;
• CMU Parameter Server - Distributed ML framework;
• CMU Petuum - Iterative-convergent Big ML framework;
• 0xdata H2O - Scalable, memory-efficient deep learning system;
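For a concrete taste of one of these frameworks, a minimal sketch using Spark MLlib's original RDD-based API (contemporary with this deck; it was later deprecated in favor of the DataFrame-based spark.ml API). Assumes a local Spark installation; the toy labeled points are illustrative.

    # Distributed logistic regression with Spark MLlib's RDD API.
    from pyspark import SparkContext
    from pyspark.mllib.regression import LabeledPoint
    from pyspark.mllib.classification import LogisticRegressionWithSGD

    sc = SparkContext(appName="mllib-sketch")
    # tiny illustrative training set: label, then feature vector
    data = sc.parallelize([
        LabeledPoint(0.0, [0.0, 1.1]),
        LabeledPoint(1.0, [2.0, 1.0]),
        LabeledPoint(1.0, [2.5, 0.7]),
    ])
    # MLlib distributes the SGD iterations over the cluster
    model = LogisticRegressionWithSGD.train(data, iterations=100)
    print(model.predict([2.2, 0.9]))   # expect class 1
    sc.stop()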
ML and Big Data Is a Breakthrough