Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Advanced Spark and TensorFlow Meetup - Dec 12 2017 - Dong Meng, MapR + Kubernetes + Distributed TensorFlow GPUs

642 views

Published on

https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/events/244971261/

Based on this blog post: https://mengdong.github.io/2017/07/15/distributed-tensorflow-with-gpu-on-kubernetes-and-mapr/

youtube video:
https://www.youtube.com/watch?v=3phz1_B-rR4

http://pipeline.ai

Published in: Software
  • There are over 16,000 woodworking plans that comes with step-by-step instructions and detailed photos, Click here to take a look ♣♣♣ http://tinyurl.com/y3hc8gpw
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Want to preview some of our plans? You can get 50 Woodworking Plans and a 440-Page "The Art of Woodworking" Book... Absolutely FREE  http://tinyurl.com/yy9yh8fu
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Get access to 16,000 woodworking plans, Download 50 FREE Plans... ♥♥♥ http://tinyurl.com/yy9yh8fu
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here

Advanced Spark and TensorFlow Meetup - Dec 12 2017 - Dong Meng, MapR + Kubernetes + Distributed TensorFlow GPUs

  1. 1. © 2017 MapR TechnologiesMapR Confidential 1 Distributed Deep Learning on MapR Converged Data Platform Dong Meng - dmeng@mapr.com Data Scientist, MapR Technologies
  2. 2. © 2017 MapR TechnologiesMapR Confidential 2 Roadmaps •  Enterprise Big data Journey •  Distributed Deep Learning •  Apply Container Technology and NVIDIA GPUs •  Demos
  3. 3. © 2017 MapR TechnologiesMapR Confidential 3 Enterprise Big data Journey
  4. 4. © 2017 MapR TechnologiesMapR Confidential 4 Great buildings have great foundations
  5. 5. © 2017 MapR TechnologiesMapR Confidential 5 MapR Development 2011 Industrial grade data platform for big data analytics 1.0 - MapR File System 2013 Industrial grade NoSQL Key Value Store – MapRDB (Implements Hbase APIs) 2012 Industry’s first visual big data ops dashboard in MapR control system 2014 Global multi datacenter replication Fast Ingest 1.0 2016 Global streaming - MapR Streams JSON Document DB Fast Ingest 2.0 Spyglass Monitoring 2015 Schema free SQL engine for big data – Apache Drill Global table replication 2017 Persisted data access for Docker containers MapR Edge for edge computing
  6. 6. © 2017 MapR TechnologiesMapR Confidential 6 MapR Converged Data Platform Files, Tables, Streams together on same platform Shared Services On-Premise, In the Cloud, Hybrid High Availability Real Time Security & Governance Multi-tenancy Disaster Recovery Global Namespace Converge-X™ Data Fabric Event Data Streams Analytics & Machine Learning Engines Operational Database Cloud-scale Data Store
  7. 7. © 2017 MapR TechnologiesMapR Confidential 7 High Availability Real Time Security & Governance Multi-tenancy Disaster Recovery Global Namespace Converge-X™ Data Fabric On-Premise, In the Cloud, Hybrid HDFS APIPOSIX, NFS HBase API JSON API Kafka API An Enterprise Foundation To Operationalize Data Data Center Next-Gen Applications Event Data Streams Analytics & Machine Learning Engines Operational Database Cloud-scale Data Store Existing Enterprise Applications
  8. 8. © 2017 MapR TechnologiesMapR Confidential 8 From a User Perspective Files Table Streams Directories Cluster Volume mount point
  9. 9. © 2017 MapR TechnologiesMapR Confidential 9 Distributed Deep Learning
  10. 10. © 2017 MapR TechnologiesMapR Confidential 10 More DATA The Unreasonable Effectiveness of Data, published by Google beats complex algorithms
  11. 11. © 2017 MapR TechnologiesMapR Confidential 11 Distributed Machine Learning •  Mahout: Map-Reduce •  GraphLab: Graph-based parallel framework •  Spark: In memory dataflow system •  H2o: Distributed machine learning library and server
  12. 12. © 2017 MapR TechnologiesMapR Confidential 12 Why Deep Learning •  Machine learning algorithm’s performance depend on the representation of the data they are given (feature engineering) •  Difficulty in finding the right representations (features) •  Deep learning –  Learns multiple levels of representations –  Learns high level of abstractions •  Growth of data volumes and computing power (GPUs)
  13. 13. © 2017 MapR TechnologiesMapR Confidential 13 Distributed Deep Learning System From a system design perspective: •  Distributed Data Storage – HDFS, MapRFS, Ceph •  Consistency – Parameter Server •  Fault Tolerance – Checkpoint reload •  Resource management – Yarn, Mesos, Kubernetes •  Programming Mode – TensorFlow, Apache MXNet, Torch, Caffe
  14. 14. © 2017 MapR TechnologiesMapR Confidential 14 Distributed Deep Learning Model Data Parallelism: •  Data Partition •  Train each model on mini-batches of data Model Parallelism: •  Model Partition •  Separate the training task for each layer
  15. 15. © 2017 MapR TechnologiesMapR Confidential 15 Distributed Deep Learning System From a user perspective: •  Ease of expression: for lots of crazy ML ideas/algorithms •  Scalability: can run experiments quickly •  Portability: can run on wide variety of platforms •  Reproducibility: easy to share and reproduce research •  Production readiness: go from research to real products
  16. 16. © 2017 MapR TechnologiesMapR Confidential 16 Apply Container Technology and NVIDIA GPUs
  17. 17. © 2017 MapR TechnologiesMapR Confidential 17 Deep Learning Development Environment Individual Research: •  Laptop/Dev boxes Research Lab Setting: •  HPC clusters, high speed job execution, small teams Enterprise Deep learning System: •  Distributed Data Storage, Consistency, Fault Tolerance, Resource management, Programming Mode, Version and Deployment •  Container, Kubernetes, MapR Data Platform
  18. 18. © 2017 MapR TechnologiesMapR Confidential 18 Containers are Great for Deployment and Research Advantages •  Reproducible work environments •  Ease of deployment •  Isolation of individual workspaces •  Run across heterogeneous environments •  Facilitate collaboration
  19. 19. © 2017 MapR TechnologiesMapR Confidential 19 Stateful Containers for Deep Learning Persistent Storage Audio Data Video Feeds Advantages •  Containerized workspaces •  Keep your work between sessions •  Manage work across many projects •  Work with versioned datasets and models •  Share work across containers, projects and/or teams Sensor data
  20. 20. © 2017 MapR TechnologiesMapR Confidential 20 Microservices for Production Deep Learning Event Streams & DB Advantages •  Deploy models to production as microservices •  Use files, real-time streams and databases in production •  Scales horizontally •  Support both real-time and batch •  May or may not be stateful
  21. 21. © 2017 MapR TechnologiesMapR Confidential 21 Kubernetes for Containers Orchestration •  Deploy once and run many times •  Rollout new versions/rollback to old versions •  Dynamic Scheduling and Elastic Scaling •  Auto Fault Recovery and Scalable Computing •  Isolation and Quota •  Manage GPU resources •  Mount MapR Volume to Persistent Volume
  22. 22. © 2017 MapR TechnologiesMapR Confidential 22 Kubernetes in a Heterogeneous GPU Cluster •  Kubernetes master: •  CPU-only •  Workers: •  NVIDIA driver •  CUDA •  CUDNN •  Kubelet config updated for GPU workers needed --feature gates=Accelerators=true •  Note: Containers also need to be configured for GPU support separatelyDiagram: Frederic Tausch on Github
  23. 23. © 2017 MapR TechnologiesMapR Confidential 23 Architecture Data layer Orchestration layer Application layer
  24. 24. © 2017 MapR TechnologiesMapR Confidential 24 •  Converged Data Platform for both Big Data and Deep Learning •  By design Volume Topology, collocate the data with GPU computation/training. •  Performant POSIX file system, quick adapt to new deep learning technologies and frameworks •  Mirroring feature enables model training/deployment with global data centers •  Snapshot feature enables research reproducibility •  MapRDB and MapR Streams provides infrastructures for Intelligence applications with deep learning models MapR as the Infrastructure for Distributed DL
  25. 25. © 2017 MapR TechnologiesMapR Confidential 25 MapR Volumes •  Volumes are logical groupings of containers, and are mounted on directories (just like Linux). •  No fixed size •  Units of policy management •  No limit on number of volumes •  Containers are replicated to other nodes /projects /project1 /users /jsmith /mjohnson / projects /users Data Nodes
  26. 26. © 2017 MapR TechnologiesMapR Confidential 26 Pattern 1: Separate Clusters MapR Converged Data Platform Tier Dockerized GPU-based NVIDIA Tier, NVIDIA DGX systems as MapR client
  27. 27. © 2017 MapR TechnologiesMapR Confidential 27 Pattern 2: Collocated MapR + Kubernetes MapR Converged Data Platforms powered by NVIDIA GPU Cards
  28. 28. © 2017 MapR TechnologiesMapR Confidential 28 Demos
  29. 29. © 2017 MapR TechnologiesMapR Confidential 29 Demo1: Parameter Server and Worker POD1: TF Parameter Server POD3: TF Worker2POD2: TF Work1 /projects /project1 Persistent Volume MapR Volume
  30. 30. © 2017 MapR TechnologiesMapR Confidential 30 Demo2: Real Time Streams Face Detection GLOBAL DATA PLANE MAPR EDGE Small footprint at the edgeMAPR CONVERGED ENTERPRISE EDITION Send updated model back to edge Aggregate, analyze data at core and refine models Reliable replication MAPR EDGE Camera to capture video feeds Convergence at the edge MAPR EDGE Deploy Model at the edge to score data feed [on-prem, hybrid, cloud]
  31. 31. © 2017 MapR TechnologiesMapR Confidential 31 Demo2: Real Time Streams Face Detection Producers to capture video feeds Consumers to output processed videos DL model process streams MAPR EDGE
  32. 32. © 2017 MapR TechnologiesMapR Confidential 32 Q&A ENGAGE WITH US 孟东 dmeng@mapr.com

×