Slide 1
© 2017 MapR Technologies, MapR Confidential
Converged, Containerized
Distributed Deep Learning With
TensorFlow and Kubernetes
Mathieu Dumoulin
Data Engineer, MapR Professional Services
Advanced Analytics Meetup, NYC, 26th September 2017
Slide 2
About Me: Mathieu Dumoulin
• MapR Data Engineer, Professional Services APAC
• From Montreal, Canada
• M.Sc. in Computer Science from Université Laval, Canada
– Large-scale text classification on Hadoop
• My interests: ML at scale, real-time, Kafka, microservices and containers, Kubernetes
Robot Predictive Maintenance in Action
11:20am–12:00pm Wednesday, September 27, 2017
Mathieu Dumoulin and Mateusz Dymczyk (H2O.ai)
Slide 3
Today’s Menu
1. Enterprise Machine Learning is hard
2. Deep Learning is even harder
3. Containers to the rescue
4. Kubernetes to containers’ rescue
5. Convergence rescues all of the above
6. Example: TensorFlow, Kubernetes and MapR
Slide 4
ML for Enterprise: Who’s Winning and Why
• Massively invested, major business impact
• Core features of main products
• Internal end-to-end expertise
• World-class (purpose-built) infrastructure
Slide 5
“ML is so amazing, every enterprise
must be rushing to implement this
everywhere, right now!!”
—Mathieu Dumoulin, grad student (2012)
Copyright © Disney Enterprises
Slide 6
Fast-forward to 2017: Transformative ML Adoption is Slow
Slide 7
ML is Hard
“Why is Machine Learning Hard?” by S. Zayd Enam
http://ai.stanford.edu/~zayd/why-is-machine-learning-hard.html
The Data Science Venn Diagram, courtesy of Drew Conway
Slide 8
Enterprise ML is Harder
Slide 9
Data Engineering Effort Dominates ML Projects
• Data engineering: ~80%* of the work
• Data scientists do their thing: also ~80%* of the work
* A number I made up
Slide 10
Enter Deep Learning
Slide 11
Autonomous Driving
• Deep learning for autonomous driving
• Convolutional neural networks
• Real-time semantic segmentation
• 2 GB/s
Slide 12
Deep Learning and Enterprise ML: Harder
• All the problems of “normal” enterprise ML
– ETL data flows
– Production deployment
– Supporting multiple data scientists
– Data & model governance
• New problems
– Need lots of compute for training
– Need access to GPUs
– Need new tools & libraries
Slide 13
Containers Help Enterprise ML
Slide 14
What’s so great about a container?
Slide 15
What is Docker? - Before Docker
Diagram: Developer, IT, Sysadmin, Storage Admin and Network Admin exchanging messages
Developer to IT: “Hey, my app is done, can you deploy it?”
IT: “Sure! Give me 2 weeks.”
IT to admins: “Provision stuff please.”
Admins: “Done.” “Done.”
“Sorry, something didn’t work.”
“Didn’t work, can you try again?”
Stick figures: http://www.clipartpanda.com/
Slide 16
What is Docker? - After Docker
Diagram: Developer and IT
Developer: builds a container with the app inside.
Developer to IT: “Hey, my app is done, can you deploy this container?”
IT: “Sure, it’s live!”
Slide 17
Containers are Great for Machine Learning
Advantages
• Easy(er) deployments
• Run across heterogeneous environments (laptop/cluster/cloud)
• Reproducible environments
• Facilitate collaboration
• Better than VMs
• But limited to stateless applications…
Slide 18
Stateful Containers for ML
Diagram: containers backed by persistent storage holding transaction data, clickstream logs and sensor data
Advantages
• Containerized workspaces
• Work with specific versions of tools, datasets and/or models
• Collaborate across projects and/or teams
Slide 19
Production Deployment of ML as Microservices
Diagram: containerized model microservices connected to event streams & DB
Advantages
• Deploy models to production as microservices
• Use files, message streams and DB from containers
• Scale elastically as needed
• Real-time or batch
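As a minimal sketch of “deploy models to production as microservices”, here is a scoring service built only on Python’s standard-library WSGI support. The linear `score` function, its weights, the `/predict` route and the `{"features": [...]}` request shape are hypothetical stand-ins for a real trained model; inside a container this process would be the entry point, and the orchestrator would scale the number of replicas.

```python
import json
from wsgiref.simple_server import make_server

# Hypothetical model: a fixed linear scorer standing in for a trained model
WEIGHTS = [0.5, -0.25, 1.0]
BIAS = 0.1


def score(features):
    """Return a linear score for one feature vector (illustrative only)."""
    if len(features) != len(WEIGHTS):
        raise ValueError(f"expected {len(WEIGHTS)} features, got {len(features)}")
    return BIAS + sum(w * x for w, x in zip(WEIGHTS, features))


def app(environ, start_response):
    """WSGI app exposing POST /predict with a JSON body {"features": [...]}."""
    headers = [("Content-Type", "application/json")]
    if environ.get("PATH_INFO") != "/predict" or environ.get("REQUEST_METHOD") != "POST":
        start_response("404 Not Found", headers)
        return [b'{"error": "not found"}']
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length))
        body, status = {"score": score(payload["features"])}, "200 OK"
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed JSON, missing "features" or wrong vector length
        body, status = {"error": str(exc)}, "400 Bad Request"
    start_response(status, headers)
    return [json.dumps(body).encode("utf-8")]


def serve(port=8080):
    """Block forever serving the app; intended as the container entry point."""
    # Bind to all interfaces so the container port can be exposed
    with make_server("0.0.0.0", port, app) as server:
        server.serve_forever()
```

Calling `serve()` answers `POST /predict` on port 8080. Keeping the model behind a plain HTTP interface means the same image can answer real-time requests or be driven by a batch job.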
Slide 20
Kubernetes* is a Key
Component to Enterprise ML
Success
*Read: “Production-Grade Container Orchestration”
Slide 21
Containers Need a Runtime: My Laptop
Data Scientist
Slide 22
Docker Containers in the Enterprise Don’t Scale
Diagram: Data Science Team, App Dev Team, Other Dev Team
Slide 23
Scaling Up with Container Orchestration
• Serve multiple users each with multiple containers
• Scheduling and resource allocation
• “Data Center OS” – treat data centers like a giant computer
What you get:
• Fault tolerance
• Elastic scaling of services
• Connect to persistent storage
• Handle security
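The “scheduling and resource allocation” point can be illustrated with a toy first-fit placement loop in which GPUs are just one more resource dimension next to CPU and memory. This is a deliberately simplified sketch, not the algorithm the Kubernetes scheduler uses (its filtering and scoring are far richer); the node names and resource figures are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cpu: float        # free CPU cores
    mem_gb: float     # free memory in GB
    gpus: int = 0     # free GPUs
    pods: list = field(default_factory=list)

    def fits(self, req):
        """True if this node still has room for the resource request."""
        return (self.cpu >= req["cpu"] and self.mem_gb >= req["mem_gb"]
                and self.gpus >= req.get("gpus", 0))

    def bind(self, pod_name, req):
        """Reserve the requested resources for the pod on this node."""
        self.cpu -= req["cpu"]
        self.mem_gb -= req["mem_gb"]
        self.gpus -= req.get("gpus", 0)
        self.pods.append(pod_name)


def schedule(pods, nodes):
    """First fit: put each pod on the first node with enough free resources.

    Returns {pod_name: node_name, or None if nothing fits}.
    """
    placement = {}
    for pod_name, req in pods:
        target = next((n for n in nodes if n.fits(req)), None)
        if target is not None:
            target.bind(pod_name, req)
        placement[pod_name] = target.name if target else None
    return placement


if __name__ == "__main__":
    nodes = [Node("cpu-node", cpu=8, mem_gb=32), Node("gpu-node", cpu=8, mem_gb=60, gpus=1)]
    pods = [
        ("notebook", {"cpu": 2, "mem_gb": 4}),
        ("trainer", {"cpu": 4, "mem_gb": 16, "gpus": 1}),
        ("trainer-2", {"cpu": 4, "mem_gb": 16, "gpus": 1}),
    ]
    print(schedule(pods, nodes))
    # {'notebook': 'cpu-node', 'trainer': 'gpu-node', 'trainer-2': None}
```

A pod that fits nowhere stays unplaced (`None`), loosely mirroring a Kubernetes pod that remains `Pending` until resources free up.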
Slide 24
About Kubernetes
• Announced by Google in mid-2014
– Version 1.0 released in 2015
• Inspired by Google’s Borg system
• Open source, very active
– Over 1,000 contributors
• De facto standard for managing application containers
• Master + nodes structure
• Used via its REST API (only!)
Kube is on GitHub: https://github.com/kubernetes/kubernetes
Graph: Kubernetes Cluster Setup by Pieter Jong
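Because every client ultimately talks to the Kubernetes REST API, you can query the cluster directly once `kubectl proxy` is running (it listens on `127.0.0.1:8001` by default). A minimal standard-library sketch that lists pods through the core `/api/v1/pods` endpoint; the helper names are illustrative, and the parsing relies only on the `PodList` fields `items[].metadata.name`, `metadata.namespace` and `status.phase`.

```python
import json
from urllib.request import urlopen

PROXY = "http://127.0.0.1:8001"  # default listen address of `kubectl proxy`


def fetch_pods(base_url=PROXY):
    """GET /api/v1/pods (all namespaces) and return the decoded PodList JSON."""
    with urlopen(f"{base_url}/api/v1/pods", timeout=10) as resp:
        return json.load(resp)


def summarize(pod_list):
    """Turn a PodList into sorted (namespace, name, phase) tuples."""
    return sorted(
        (
            item["metadata"]["namespace"],
            item["metadata"]["name"],
            item.get("status", {}).get("phase", "Unknown"),
        )
        for item in pod_list.get("items", [])
    )
```

With the proxy up, `summarize(fetch_pods())` gives roughly the same view as `kubectl get pods --all-namespaces`.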
Slide 25
Kubernetes Manages GPUs as Resources
• Deep learning needs GPUs
• GPUs are just another resource
• Requires hardware + drivers installed on the OS and in the containers
• Officially still an alpha (experimental) feature, but already works OK
Diagram: Frederic Tausch on Github
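With the Kubernetes 1.7 release used in the demo, a container asks for GPUs through the alpha resource name `alpha.kubernetes.io/nvidia-gpu`, which only takes effect when the kubelet runs with `--feature-gates=Accelerators=true` (later releases moved to device plugins and `nvidia.com/gpu`). The sketch below builds such a pod manifest as JSON, which `kubectl create -f` accepts as readily as YAML. The pod name and the `tensorflow/tensorflow:1.3.0-gpu` image are assumptions, and the mounts that expose the host’s NVIDIA driver libraries inside the container, which the slide notes are required, are left out.

```python
import json

# Alpha GPU resource name used by Kubernetes 1.6/1.7 (Accelerators feature gate)
GPU_RESOURCE = "alpha.kubernetes.io/nvidia-gpu"


def gpu_pod_manifest(name, image, gpus=1):
    """Build a v1 Pod manifest whose single container requests whole GPUs.

    GPUs are requested under resource limits, as in the 1.6/1.7 examples;
    fractional GPUs are not possible.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "resources": {"limits": {GPU_RESOURCE: str(gpus)}},
                }
            ],
        },
    }


if __name__ == "__main__":
    manifest = gpu_pod_manifest("tf-gpu-test", "tensorflow/tensorflow:1.3.0-gpu")
    # Save this output as tf-gpu-test.json, then: kubectl create -f tf-gpu-test.json
    print(json.dumps(manifest, indent=2))
```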
Slide 26
Convergence for ML Pipelines
in a Containerized World
Slide 27
Containers Don’t Just Live in a Bubble
Source: https://wallpaperfx.com bubble world
Slide 28
Machine Learning forms Data Pipelines
Ref: https://eng.uber.com/michelangelo/
Slide 29
Containers Help Manage the Steps
What about the arrows?
Slide 30
Just Throw OSS Software at it Until It Works
Ref: http://advancedspark.com/ , https://github.com/fluxcapacitor/pipeline
Separate
Clusters!
Slide 31
Just Throw OSS Software at it Until It Works
Ref: http://advancedspark.com/ , https://github.com/fluxcapacitor/pipeline
Slide 32
Data Pipelines on One Platform is Converged
Diagram: Converged Platform = Distributed FS + Real-time event streams + Enterprise-grade NoSQL
Slide 33
MapR Converged Data Platform
Diagram layers:
• Analytical and Machine Learning Engines
• Operational Database, Event Data Streams, Cloud Scale Data Store: files, tables and streams together on the same platform
• Converge-X Data Fabric
• Shared Services: High Availability, Real Time, Security & Governance, Multi-tenancy, Disaster Recovery, Global Namespace
• On-Premise, In the Cloud, Hybrid
Slide 34
MapR Data Services for Containers
• Pre-built, certified container image for connecting to MapR services
• Secure authentication at container level, secure connection
• High performance
• Extensible support for application layers
• Available in Docker Hub; Dockerfile provided for customizability
MapR Persistent Application Client Container (PACC)
Diagram: MapR POSIX Client for Containers + MapR Converged Client for Containers + space for customer application
Slide 35
Containers and MapR: Separate Clusters
MapR Converged Data Platform Tier
Dockerized CPU/GPU-based Nvidia Tier
Slide 36
Containers and MapR: Separate Clusters
CPU-based MapR Tier with GPU Cards
Slide 37
Example:
Distributed TensorFlow on
Kubernetes and MapR
Slide 38
Distributed Deep Learning: Parameter Server
• Most popular implementation approach for distributed deep learning (DDL), used by:
– CaffeOnSpark (Yahoo)
– TensorFlowOnSpark
– TensorFlow
– DeepLearning4J
– SparkNet
• Basic idea: iterative model parameter averaging
Li et al. Scaling Distributed Machine Learning with the Parameter Server (link)
Implementations compared: Dong & Cao 2016 on Slideshare
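The “iterative model parameter averaging” idea fits in a few lines of plain Python. This is a single-process simulation, not TensorFlow’s distributed runtime, and it swaps in linear regression as a toy model: each simulated worker runs SGD on its own data shard starting from the shared parameters, then the “parameter server” replaces the shared parameters with the workers’ average. The data, learning rate and round counts are arbitrary illustration values.

```python
import random


def local_sgd(w, b, shard, lr=0.05, epochs=20):
    """Plain SGD for y ~ w*x + b on one worker's shard, starting from (w, b)."""
    for _ in range(epochs):
        for x, y in shard:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b


def parameter_averaging(shards, rounds=10):
    """Simulated parameter server: broadcast, train locally, average, repeat."""
    w, b = 0.0, 0.0  # shared model parameters held by the "server"
    for _ in range(rounds):
        # Every worker starts from the current shared parameters
        results = [local_sgd(w, b, shard) for shard in shards]
        # The server averages the workers' parameters into the new shared model
        w = sum(r[0] for r in results) / len(results)
        b = sum(r[1] for r in results) / len(results)
    return w, b


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data from y = 3x + 1 plus a little noise, split across 4 workers
    xs = [rng.uniform(-1, 1) for _ in range(400)]
    data = [(x, 3 * x + 1 + rng.gauss(0, 0.1)) for x in xs]
    shards = [data[i::4] for i in range(4)]
    w, b = parameter_averaging(shards)
    print(f"w={w:.2f}, b={b:.2f}")  # should land close to w=3, b=1
```

Real systems differ mainly in what travels over the network (gradients or parameters) and whether updates are synchronous, but the broadcast, local work and aggregate loop is the same shape.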
Slide 39
Deep Learning QSS Reference Architecture
Diagram: training images for Category 1 … Category N train the model; a new image to classify goes in and category probabilities come out
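The “category probabilities” output of an image classifier like this is typically produced by a softmax layer, which turns the network’s raw scores (logits) for Category 1 to N into probabilities that sum to 1. A small numerically stable version follows; the logits and category names are invented, and the slide does not state which output layer the reference architecture actually uses.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, categories):
    """Pair each category with its probability, most likely first."""
    probs = softmax(logits)
    return sorted(zip(categories, probs), key=lambda cp: cp[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical scores a trained model produced for one new image
    for category, p in classify([2.0, 1.0, 0.1], ["cat", "dog", "car"]):
        print(f"{category}: {p:.3f}")
    # cat: 0.659, dog: 0.242, car: 0.099
```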
Slide 40
Architecture Layers Explained
• Data layer
• Orchestration layer
• Application layer
Slide 41
MapR + Kube is Already in Production
• At a “very large global consumer electronics firm”
• GPUs on some nodes
• Kubernetes + Docker
• Data input via NFS
• Storage expanding quickly (TB → PB scale)
Slide 42
Conclusion:
Enterprise ML IT’s Future is
Containerized and Converged
Slide 43
I’m Glossing Over Deployment Difficulties
• Integration with external systems
• Performance monitoring
• Upgrading model versions
• HA & elastic scalability
• Yes, Kubernetes and containers help
• BUT there is still a lot left to do…
Diagrams: Standalone Streaming Microservice; Spark Streaming Deployment
Slide 44
Enterprise ML IT’s Future is Containerized
• Huge Opportunity – Organizations are
rapidly moving to containerize
• Radical benefits for ML practitioners
• Huge Gap for stateful application support
• MapR provides a high value, highly
differentiated solution
Containers are for everything, not just ML!
Slide 45
Enterprise ML IT is Containerized and Converged
• It’s not about the boxes, it’s about the arrows
• Kubernetes is already the de facto standard for container orchestration
• Converged platforms radically simplify the required stack
Slide 46
New: Machine Learning Logistics
Model Management in the Real World
O’Reilly book by Ellen Friedman & Ted Dunning © Sept 2017
Get a free PDF copy of the book, courtesy of MapR:
https://mapr.com/ebook/machine-learning-logistics/
Visit MapR booth for free book signings & booth theater
presentations by the authors
Wed schedule:
Book signing: afternoon break 3:35 – 4:20 pm
Booth presentation by Ted Dunning: 3:00 – 3:30 pm
Thur schedule:
Book signing: morning break 10:45 – 11:20 am
Booth presentation by Ellen Friedman: 3:00 – 3:30 pm
Slide 47
New: Microservices and Containers
Mastering the Cloud, Data, and Digital Transformation
MapR book by Jim Scott © Sept 2017
Get free PDF copies of the books, courtesy of MapR:
https://mapr.com/ebooks/
Visit MapR booth for free book signing
Wednesday schedule:
Book signing: morning break 10:50 – 11:20 am
Or until everyone goes to a talk
Slide 48
Q&A
ENGAGE WITH US
mdumoulin@mapr.com
@mapr
MapR Blog: https://mapr.com/blog
Slide 49
Additional Resources
• Overview of the Rendezvous Architecture included in “Non-Flink Machine Learning on Flink”, video of talk by Ted Dunning at Flink Forward conference, 14 April 2017
– https://www.youtube.com/watch?v=fZXQZNKFUVE
• “How Stream-1st Architecture & Emerging Technologies Provide a Competitive Edge”, video of talk by Ellen Friedman at Big Data London conference, 4 November 2016
– https://www.youtube.com/watch?v=FivaG1T11W0
• Dong Meng’s blog post: “Distributed Deep Learning on the MapR Converged Data Platform”, May 2017
– https://mapr.com/blog/distributed-deep-learning-mapr
Slide 50
Humans Still Better in Non-Ideal Conditions (for now…)
Ref: A Study and Comparison of Human and Deep Learning Recognition Performance Under Visual Distortions, Samuel Dodge, Lina Karam, May 2017
• Researchers added slight distortions (noise and blur) to images
• State-of-the-art DL models fail quickly as distortion increases
• Humans win out easily on most distorted images
Slide 51
Bonus:
MapR Unique Features for ML
Slide 52
MapR Supports All Tools Out of the Box
• NFS mount and POSIX file system
– Small-scale Python or R data exploration on the real data
– Keep the raw data; ETL work is easily reused
• Supports the standard big data ecosystem (Spark)
• The NFS mount can ingest data from any enterprise system that can output files
– Even if it doesn’t support Hadoop!
• Much faster than HDFS
– Serve production models directly from MapR
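Because the cluster is exposed as an ordinary POSIX file system over NFS, small-scale exploration needs nothing Hadoop-specific: standard file APIs read the real data in place. A standard-library sketch that summarizes one numeric column of a CSV file; the function name is illustrative, and the example path `/mapr/my.cluster.com/data/sensors.csv` (MapR NFS mounts commonly appear under `/mapr/<cluster name>/`) and the `temperature` column are hypothetical.

```python
import csv
import statistics


def summarize_column(path, column):
    """Read a CSV file with normal file I/O and summarize one numeric column."""
    values = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            cell = (row.get(column) or "").strip()
            if cell:  # skip missing or empty values
                values.append(float(cell))
    if not values:
        return {"count": 0}
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
    }
```

For example, `summarize_column("/mapr/my.cluster.com/data/sensors.csv", "temperature")` would work unchanged on a laptop copy of the file, which is the point of the POSIX interface.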
Slide 53
Support the ML Workflow, Not Just Modeling
Remember that most of the effort in enterprise ML goes into realizing the workflow. This is where MapR shines!
• Operational capabilities (MapR DB, MapR Client)
– Serve production models directly from MapR
• Snapshots and mirrors
– Do A/B testing with almost no coding
– Promote the mirror to go back to the previous state
• Just update the path in the production system: no redeployment!
• MapR ES (Event Streams/Kafka) for real-time predictions
– Zero-configuration Kafka: it just works!
– Kafka REST Proxy for maximum interoperability
– Supports microservices and stateful containers
Slide 54
Technical Details:
- Environment software versions
- Kubernetes setup
- Start deep learning model training
Slide 55
Demo Environment Details
• 4x AWS EC2 g2.2xlarge (GPU)
• Master: m4.2xlarge
• OS: Ubuntu 16.04 LTS + updates
• MapR 5.2.1.42646.GA
• Kubernetes 1.7.3
• TensorFlow 1.3.0 (GPU)
Blog post about it by Dong Meng: instructions and video
Slide 56
Kubernetes Install on Ubuntu
$ clush -aB apt-get update
$ clush -aB apt-get install -qy docker.io
$ clush -aB apt-get install -y apt-transport-https
$ clush -aB 'curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -'
# Add the Kubernetes apt repository on every node, not only the local one
$ clush -aB "echo 'deb http://apt.kubernetes.io/ kubernetes-xenial main' > /etc/apt/sources.list.d/kubernetes.list"
$ clush -aB apt-get -y update
$ clush -aB apt-get install -y kubelet kubeadm kubectl kubernetes-cni
# For all GPU nodes: enable the (alpha) Accelerators feature gate in the kubelet
$ clush -w <GPU NODES> -B "echo 'Environment=\"KUBELET_EXTRA_ARGS=--feature-gates=Accelerators=true\"' >> /etc/systemd/system/kubelet.service.d/10-kubeadm.conf"
$ clush -aB systemctl daemon-reload
$ clush -aB systemctl enable docker
$ clush -aB systemctl start docker
$ clush -aB systemctl enable kubelet
$ clush -aB systemctl start kubelet
Slide 57
Kubernetes Install on Ubuntu 2
$ kubeadm init --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=<MASTER IP>
# Enable use of 'kubectl' to manage the Kubernetes cluster
$ sudo cp /etc/kubernetes/admin.conf $HOME/
$ sudo chown $(id -u):$(id -g) $HOME/admin.conf
$ export KUBECONFIG=$HOME/admin.conf
$ echo "export KUBECONFIG=$HOME/admin.conf" | tee -a ~/.bashrc
# Install the flannel pod network (its default network matches --pod-network-cidr above)
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel-rbac.yml
$ kubectl create -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
# Allow pods to be scheduled on the master node as well
$ kubectl taint nodes --all node-role.kubernetes.io/master-
# On each worker node, using the token printed by 'kubeadm init':
$ kubeadm join --token <TOKEN VALUE> <MASTER IP>:6443
# Done! Kubernetes is up
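After the GPU nodes join, it is worth checking that their kubelets actually advertise GPUs before scheduling TensorFlow pods. A minimal sketch that shells out to `kubectl get nodes -o json` and reads each node’s `status.capacity`; it assumes the Kubernetes 1.7 alpha resource name `alpha.kubernetes.io/nvidia-gpu` exposed through the Accelerators feature gate from the install steps, and the helper names are illustrative.

```python
import json
import subprocess

# Alpha GPU resource name advertised by kubelets in Kubernetes 1.6/1.7
GPU_RESOURCE = "alpha.kubernetes.io/nvidia-gpu"


def read_nodes():
    """Return the NodeList JSON printed by `kubectl get nodes -o json`."""
    result = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True,
        stdout=subprocess.PIPE,
    )
    return json.loads(result.stdout)


def gpu_capacity(node_list):
    """Map each node name to the number of GPUs its kubelet advertises (0 if none)."""
    return {
        node["metadata"]["name"]: int(
            node.get("status", {}).get("capacity", {}).get(GPU_RESOURCE, "0")
        )
        for node in node_list.get("items", [])
    }
```

With `KUBECONFIG` set as above, `gpu_capacity(read_nodes())` should report one GPU for each g2.2xlarge node whose kubelet restarted with the feature gate, and 0 for the m4.2xlarge master.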
Slide 58
Control Kubernetes From your Mac
# Control your Kube cluster from your Mac
$ brew install kubectl
# Copy the admin authentication from the master to your client
$ mkdir -p ~/.kube
$ scp <cluster>:admin.conf ~/.kube/
# Edit ~/.kube/admin.conf and set: server: https://<KUBE MASTER IP/HOST>:6443
$ export KUBECONFIG=~/.kube/admin.conf
$ kubectl get pods --all-namespaces
# Install the dashboard UI (lots of alternatives)
$ kubectl create -f https://git.io/kube-dashboard
$ kubectl proxy &
# open your browser to: http://127.0.0.1:8001/ui
Slide 59
Deep Learning Demonstrates ML is Useful
Image and video
• Object identification
• Motion detection
• Image generation
Sound and text
• Speech recognition
• Sentiment analysis
• Chatbots
Time series & other
• Anomaly detection
• Fraud detection
• Recommenders