Distributed machine learning

  1. Distributed Machine Learning. Stanley Wang, Solution Architect, Tech Lead, @SWANG68, http://www.linkedin.com/in/stanley-wang-a2b143b
  2. What is Machine Learning?
  3. Mathematics 101 for Machine Learning
  4. Types of Machine Learning
  5. Types of ML Algorithms
     • Clustering
     • Association learning
     • Parameter estimation
     • Recommendation engines
     • Classification
     • Similarity matching
     • Neural networks
     • Bayesian networks
     • Genetic algorithms
  6. Top Machine Learning Algorithms
  7. Machine Learning Library
  8. Typical Machine Learning Cases
  9. Machine Learning Customers Examples
  10. Machine Learning in Big Data Infrastructure
  11. Big Data Machine Learning Pipeline
  12. Benefits of Big Data Machine Learning
  13. Distributed ML Framework
      • Data Centric: train over large data
        o Data is split over multiple machines
        o Model replicas train over different parts of the data and communicate model information periodically
      • Model Centric: train over large models
        o Models are split over multiple machines
        o A single training iteration spans multiple machines
      • Graph Centric: train over large graphs
        o Data is partitioned as a graph, with data associated with every vertex and edge
        o Update functions are applied in parallel; each operates on a vertex and transforms the data in that vertex's scope
  14. Data Parallel ML – MapReduce
  15. Model Parallel ML – Parameter Server
  16. Graph Parallel ML – BSP, Pregel, GAS
  17. Graph Parallel vs Data Parallel
      • Graph parallel is a newer technique for partitioning and distributing data that can execute machine learning algorithms orders of magnitude faster than the data-parallel approach.
  18. Efficient Scaling Up
      • Businesses need to compute hundreds of distinct tasks on the same graph
        o Example: personalized recommendations
      • Parallelize each task: complex, expensive to scale
      • Parallelize across tasks: simple, 2x machines = 2x throughput
  19. Another Approach: Task Parallelism (Simple, But Practical)
      • What about scalability? Use a cluster of single-machine systems to solve many tasks in parallel: homogeneous graph data, but heterogeneous algorithms.
      • What about learning ability? Use a hybrid data fusion approach.
      • What about memory? The Parallel Sliding Windows (PSW) algorithm enables computation on very large graphs stored on disk.
  20. Parallel Sliding Windows
      • PSW processes the graph one sub-graph at a time.
      • One iteration processes the whole graph; typically, the next iteration then starts.
  21. Scalable Distributed ML Frameworks
      • Yahoo Vowpal Wabbit: fast and scalable library of out-of-core online ML algorithms; Hadoop-compatible AllReduce
      • Hadoop Mahout: scalable Java ML library using the map/reduce paradigm; supports the 3 "C"s (collaborative filtering, clustering, classification) plus extra use cases
      • Spark MLlib: memory-based distributed ML framework; 10 times as fast as Mahout, and scales better
      • Dato GraphLab: graph-based parallel ML framework
      • Apache Giraph: large-scale graph processing framework based on Bulk Synchronous Parallel (BSP)
      • CMU Parameter Server: distributed ML framework
      • CMU Petuum: iterative-convergent big ML
      • 0xdata H2O: scalable, memory-efficient deep learning system
  22. ML and Big Data Is a Breakthrough
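
The data-centric approach on slide 13 (data split over multiple machines, model replicas that train on different parts of the data and exchange model information periodically) can be illustrated with a small single-process sketch. This is only a toy under stated assumptions, not the method of any framework listed in the deck: the "workers" are simulated in a loop, the model is a one-dimensional least-squares fit, and the names `local_sgd` and `data_parallel_fit`, along with every hyperparameter, are hypothetical choices made for illustration.

```python
import random


def local_sgd(w, b, shard, lr, steps):
    """Run a few full-batch gradient steps of 1-D least squares on one shard."""
    n = len(shard)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in shard:
            err = (w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def data_parallel_fit(data, num_workers=4, rounds=50, local_steps=5, lr=0.5):
    """Simulate data-parallel training with periodic parameter averaging.

    Assumption: workers are simulated sequentially in one process; a real
    system would run them on separate machines and exchange parameters
    over the network.
    """
    # Split the data over the (simulated) machines.
    shards = [data[i::num_workers] for i in range(num_workers)]
    w, b = 0.0, 0.0
    for _ in range(rounds):
        # Each replica starts from the shared model and trains on its own shard.
        replicas = [local_sgd(w, b, shard, lr, local_steps) for shard in shards]
        # Periodic communication step: average the replicas' parameters.
        w = sum(r[0] for r in replicas) / num_workers
        b = sum(r[1] for r in replicas) / num_workers
    return w, b


# Noiseless toy data drawn from y = 3x + 1 (made up for this sketch).
rng = random.Random(0)
data = [(x, 3.0 * x + 1.0) for x in (rng.uniform(-1.0, 1.0) for _ in range(200))]
w, b = data_parallel_fit(data)
```

On this noiseless toy data the averaged parameters `w` and `b` should end up close to 3 and 1. Averaging only every `local_steps` updates, rather than after every gradient step, is the knob that trades communication cost against how far the replicas drift apart between synchronizations.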