SlideShare a Scribd company logo
© 2017 MapR TechnologiesMapR Confidential 1
Distributed Deep Learning on
MapR Converged Data Platform
Dong Meng - dmeng@mapr.com
Data Scientist, MapR Technologies
© 2017 MapR TechnologiesMapR Confidential 2
Roadmaps
•  Enterprise Big data Journey
•  Distributed Deep Learning
•  Apply Container Technology and NVIDIA GPUs
•  Demos
© 2017 MapR TechnologiesMapR Confidential 3
Enterprise Big data
Journey
© 2017 MapR TechnologiesMapR Confidential 4
Great buildings have
great foundations
© 2017 MapR TechnologiesMapR Confidential 5
MapR Development
2011
Industrial
grade data
platform for
big data
analytics 1.0
- MapR File
System
2013
Industrial
grade NoSQL
Key Value
Store –
MapRDB
(Implements
Hbase APIs)
2012
Industry’s first
visual big data
ops
dashboard in
MapR control
system
2014
Global multi
datacenter
replication
Fast Ingest
1.0
2016
Global
streaming
- MapR
Streams
JSON
Document DB
Fast Ingest
2.0
Spyglass
Monitoring
2015
Schema free
SQL engine
for big data –
Apache Drill
Global table
replication
2017
Persisted data
access for
Docker
containers
MapR Edge
for edge
computing
© 2017 MapR TechnologiesMapR Confidential 6
MapR Converged Data Platform
Files, Tables, Streams
together on same platform
Shared Services
On-Premise, In the Cloud, Hybrid
High Availability Real Time Security & Governance Multi-tenancy Disaster Recovery Global Namespace
Converge-X™ Data Fabric
Event Data
Streams
Analytics &
Machine Learning
Engines
Operational
Database
Cloud-scale
Data Store
© 2017 MapR TechnologiesMapR Confidential 7
High Availability Real Time Security & Governance Multi-tenancy Disaster Recovery Global Namespace
Converge-X™ Data Fabric
On-Premise, In the Cloud, Hybrid
HDFS APIPOSIX, NFS HBase API JSON API Kafka API
An Enterprise Foundation To Operationalize Data
Data
Center
Next-Gen
Applications
Event Data
Streams
Analytics &
Machine Learning Engines
Operational
Database
Cloud-scale
Data Store
Existing Enterprise
Applications
© 2017 MapR TechnologiesMapR Confidential 8
From a User Perspective
Files
Table
Streams
Directories
Cluster
Volume mount point
© 2017 MapR TechnologiesMapR Confidential 9
Distributed Deep Learning
© 2017 MapR TechnologiesMapR Confidential 10
More
DATA
The Unreasonable Effectiveness of Data,
published by Google
beats complex
algorithms
© 2017 MapR TechnologiesMapR Confidential 11
Distributed Machine Learning
•  Mahout: Map-Reduce
•  GraphLab: Graph-based parallel framework
•  Spark: In memory dataflow system
•  H2o: Distributed machine learning library and server
© 2017 MapR TechnologiesMapR Confidential 12
Why Deep Learning
•  Machine learning algorithm’s performance depend on the
representation of the data they are given (feature
engineering)
•  Difficulty in finding the right representations (features)
•  Deep learning
–  Learns multiple levels of representations
–  Learns high level of abstractions
•  Growth of data volumes and computing power (GPUs)
© 2017 MapR TechnologiesMapR Confidential 13
Distributed Deep Learning System
From a system design perspective:
•  Distributed Data Storage – HDFS, MapRFS, Ceph
•  Consistency – Parameter Server
•  Fault Tolerance – Checkpoint reload
•  Resource management – Yarn, Mesos, Kubernetes
•  Programming Mode – TensorFlow, Apache MXNet, Torch, Caffe
© 2017 MapR TechnologiesMapR Confidential 14
Distributed Deep Learning Model
Data Parallelism:
•  Data Partition
•  Train each model on mini-batches of data
Model Parallelism:
•  Model Partition
•  Separate the training task for each layer
© 2017 MapR TechnologiesMapR Confidential 15
Distributed Deep Learning System
From a user perspective:
•  Ease of expression: for lots of crazy ML ideas/algorithms
•  Scalability: can run experiments quickly
•  Portability: can run on wide variety of platforms
•  Reproducibility: easy to share and reproduce research
•  Production readiness: go from research to real products
© 2017 MapR TechnologiesMapR Confidential 16
Apply Container Technology and
NVIDIA GPUs
© 2017 MapR TechnologiesMapR Confidential 17
Deep Learning Development Environment
Individual Research:
•  Laptop/Dev boxes
Research Lab Setting:
•  HPC clusters, high speed job execution, small teams
Enterprise Deep learning System:
•  Distributed Data Storage, Consistency, Fault Tolerance,
Resource management, Programming Mode, Version and
Deployment
•  Container, Kubernetes, MapR Data Platform
© 2017 MapR TechnologiesMapR Confidential 18
Containers are Great for Deployment and Research
Advantages
•  Reproducible work environments
•  Ease of deployment
•  Isolation of individual workspaces
•  Run across heterogeneous
environments
•  Facilitate collaboration
© 2017 MapR TechnologiesMapR Confidential 19
Stateful Containers for Deep Learning
Persistent Storage
Audio Data Video Feeds
Advantages
•  Containerized workspaces
•  Keep your work between sessions
•  Manage work across many
projects
•  Work with versioned datasets and
models
•  Share work across containers,
projects and/or teams
Sensor data
© 2017 MapR TechnologiesMapR Confidential 20
Microservices for Production Deep Learning
Event Streams & DB
Advantages
•  Deploy models to production as
microservices
•  Use files, real-time streams and
databases in production
•  Scales horizontally
•  Support both real-time and batch
•  May or may not be stateful
© 2017 MapR TechnologiesMapR Confidential 21
Kubernetes for Containers Orchestration
•  Deploy once and run many times
•  Rollout new versions/rollback to old versions
•  Dynamic Scheduling and Elastic Scaling
•  Auto Fault Recovery and Scalable Computing
•  Isolation and Quota
•  Manage GPU resources
•  Mount MapR Volume to Persistent Volume
© 2017 MapR TechnologiesMapR Confidential 22
Kubernetes in a Heterogeneous GPU Cluster
•  Kubernetes master:
•  CPU-only
•  Workers:
•  NVIDIA driver
•  CUDA
•  CUDNN
•  Kubelet config updated for GPU
workers needed
--feature gates=Accelerators=true
•  Note: Containers also need to be
configured for GPU support
separatelyDiagram: Frederic Tausch on Github
© 2017 MapR TechnologiesMapR Confidential 23
Architecture
Data layer
Orchestration
layer
Application
layer
© 2017 MapR TechnologiesMapR Confidential 24
•  Converged Data Platform for both Big Data and Deep Learning
•  By design Volume Topology, collocate the data with GPU
computation/training.
•  Performant POSIX file system, quick adapt to new deep
learning technologies and frameworks
•  Mirroring feature enables model training/deployment with global
data centers
•  Snapshot feature enables research reproducibility
•  MapRDB and MapR Streams provides infrastructures for
Intelligence applications with deep learning models
MapR as the Infrastructure for Distributed DL
© 2017 MapR TechnologiesMapR Confidential 25
MapR Volumes •  Volumes are logical
groupings of containers, and
are mounted on directories
(just like Linux).
•  No fixed size
•  Units of policy management
•  No limit on number of
volumes
•  Containers are replicated to
other nodes
/projects
/project1
/users
/jsmith
/mjohnson
/
projects
/users
Data Nodes
© 2017 MapR TechnologiesMapR Confidential 26
Pattern 1: Separate Clusters
MapR Converged Data Platform Tier
Dockerized GPU-based NVIDIA Tier, NVIDIA DGX systems as MapR client
© 2017 MapR TechnologiesMapR Confidential 27
Pattern 2: Collocated MapR + Kubernetes
MapR Converged Data Platforms powered by NVIDIA GPU Cards
© 2017 MapR TechnologiesMapR Confidential 28
Demos
© 2017 MapR TechnologiesMapR Confidential 29
Demo1: Parameter Server and Worker
POD1: TF
Parameter Server
POD3: TF Worker2POD2: TF Work1
/projects
/project1
Persistent
Volume
MapR Volume
© 2017 MapR TechnologiesMapR Confidential 30
Demo2: Real Time Streams Face Detection
GLOBAL DATA PLANE
MAPR EDGE
Small footprint
at the edgeMAPR CONVERGED
ENTERPRISE
EDITION
Send updated
model back to
edge
Aggregate, analyze
data at core and
refine models
Reliable
replication
MAPR EDGE
Camera to
capture
video feeds
Convergence
at the edge
MAPR EDGE
Deploy Model
at the edge to
score data feed
[on-prem, hybrid, cloud]
© 2017 MapR TechnologiesMapR Confidential 31
Demo2: Real Time Streams Face Detection
Producers to capture video feeds
Consumers to output processed videos
DL model process streams
MAPR EDGE
© 2017 MapR TechnologiesMapR Confidential 32
Q&A
ENGAGE WITH US
孟东
dmeng@mapr.com

More Related Content

What's hot

High Performance Computing (HPC) in cloud
High Performance Computing (HPC) in cloudHigh Performance Computing (HPC) in cloud
High Performance Computing (HPC) in cloud
Accubits Technologies
 
(CMP202) Engineering Simulation and Analysis in the Cloud
(CMP202) Engineering Simulation and Analysis in the Cloud(CMP202) Engineering Simulation and Analysis in the Cloud
(CMP202) Engineering Simulation and Analysis in the Cloud
Amazon Web Services
 
AI Pipeline Optimization using Kubeflow
AI Pipeline Optimization using KubeflowAI Pipeline Optimization using Kubeflow
AI Pipeline Optimization using Kubeflow
Steve Guhr
 
HPC on Azure for Reserach
HPC on Azure for ReserachHPC on Azure for Reserach
HPC on Azure for Reserach
Jürgen Ambrosi
 
Deep learning at supercomputing scale by Rangan Sukumar from Cray
Deep learning at supercomputing scale  by Rangan Sukumar from CrayDeep learning at supercomputing scale  by Rangan Sukumar from Cray
Deep learning at supercomputing scale by Rangan Sukumar from Cray
Bill Liu
 
High Performance Computing (HPC) on AWS 101
High Performance Computing (HPC) on AWS 101High Performance Computing (HPC) on AWS 101
High Performance Computing (HPC) on AWS 101
Amazon Web Services
 
Very large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDLVery large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDL
DESMOND YUEN
 
High Performance Computing Pitch Deck
High Performance Computing Pitch DeckHigh Performance Computing Pitch Deck
High Performance Computing Pitch Deck
Nicholas Vossburg
 
Apache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the CloudApache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the Cloud
Databricks
 
Performance architecture for cloud connect
Performance architecture for cloud connectPerformance architecture for cloud connect
Performance architecture for cloud connect
Adrian Cockcroft
 
MEW22 22nd Machine Evaluation Workshop Microsoft
MEW22 22nd Machine Evaluation Workshop MicrosoftMEW22 22nd Machine Evaluation Workshop Microsoft
MEW22 22nd Machine Evaluation Workshop Microsoft
Lee Stott
 
What's New in H2O Driverless AI? - Arno Candel - H2O AI World London 2018
What's New in H2O Driverless AI? - Arno Candel - H2O AI World London 2018What's New in H2O Driverless AI? - Arno Candel - H2O AI World London 2018
What's New in H2O Driverless AI? - Arno Candel - H2O AI World London 2018
Sri Ambati
 
GPU Computing With Apache Spark And Python
GPU Computing With Apache Spark And PythonGPU Computing With Apache Spark And Python
GPU Computing With Apache Spark And Python
Jen Aman
 
Hadoop + GPU
Hadoop + GPUHadoop + GPU
Hadoop + GPU
Vladimir Starostenkov
 
Elastify Cloud-Native Spark Application with Persistent Memory
Elastify Cloud-Native Spark Application with Persistent MemoryElastify Cloud-Native Spark Application with Persistent Memory
Elastify Cloud-Native Spark Application with Persistent Memory
Databricks
 
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Mathieu Dumoulin
 
High Performance Computing on AWS: Accelerating Innovation with virtually unl...
High Performance Computing on AWS: Accelerating Innovation with virtually unl...High Performance Computing on AWS: Accelerating Innovation with virtually unl...
High Performance Computing on AWS: Accelerating Innovation with virtually unl...
Amazon Web Services
 
Netflix in the Cloud
Netflix in the CloudNetflix in the Cloud
Netflix in the Cloud
Adrian Cockcroft
 
Scaling out Driverless AI with IBM Spectrum Conductor - Kevin Doyle - H2O AI ...
Scaling out Driverless AI with IBM Spectrum Conductor - Kevin Doyle - H2O AI ...Scaling out Driverless AI with IBM Spectrum Conductor - Kevin Doyle - H2O AI ...
Scaling out Driverless AI with IBM Spectrum Conductor - Kevin Doyle - H2O AI ...
Sri Ambati
 
High Performance Computing on AWS
High Performance Computing on AWSHigh Performance Computing on AWS
High Performance Computing on AWS
Amazon Web Services
 

What's hot (20)

High Performance Computing (HPC) in cloud
High Performance Computing (HPC) in cloudHigh Performance Computing (HPC) in cloud
High Performance Computing (HPC) in cloud
 
(CMP202) Engineering Simulation and Analysis in the Cloud
(CMP202) Engineering Simulation and Analysis in the Cloud(CMP202) Engineering Simulation and Analysis in the Cloud
(CMP202) Engineering Simulation and Analysis in the Cloud
 
AI Pipeline Optimization using Kubeflow
AI Pipeline Optimization using KubeflowAI Pipeline Optimization using Kubeflow
AI Pipeline Optimization using Kubeflow
 
HPC on Azure for Reserach
HPC on Azure for ReserachHPC on Azure for Reserach
HPC on Azure for Reserach
 
Deep learning at supercomputing scale by Rangan Sukumar from Cray
Deep learning at supercomputing scale  by Rangan Sukumar from CrayDeep learning at supercomputing scale  by Rangan Sukumar from Cray
Deep learning at supercomputing scale by Rangan Sukumar from Cray
 
High Performance Computing (HPC) on AWS 101
High Performance Computing (HPC) on AWS 101High Performance Computing (HPC) on AWS 101
High Performance Computing (HPC) on AWS 101
 
Very large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDLVery large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDL
 
High Performance Computing Pitch Deck
High Performance Computing Pitch DeckHigh Performance Computing Pitch Deck
High Performance Computing Pitch Deck
 
Apache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the CloudApache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the Cloud
 
Performance architecture for cloud connect
Performance architecture for cloud connectPerformance architecture for cloud connect
Performance architecture for cloud connect
 
MEW22 22nd Machine Evaluation Workshop Microsoft
MEW22 22nd Machine Evaluation Workshop MicrosoftMEW22 22nd Machine Evaluation Workshop Microsoft
MEW22 22nd Machine Evaluation Workshop Microsoft
 
What's New in H2O Driverless AI? - Arno Candel - H2O AI World London 2018
What's New in H2O Driverless AI? - Arno Candel - H2O AI World London 2018What's New in H2O Driverless AI? - Arno Candel - H2O AI World London 2018
What's New in H2O Driverless AI? - Arno Candel - H2O AI World London 2018
 
GPU Computing With Apache Spark And Python
GPU Computing With Apache Spark And PythonGPU Computing With Apache Spark And Python
GPU Computing With Apache Spark And Python
 
Hadoop + GPU
Hadoop + GPUHadoop + GPU
Hadoop + GPU
 
Elastify Cloud-Native Spark Application with Persistent Memory
Elastify Cloud-Native Spark Application with Persistent MemoryElastify Cloud-Native Spark Application with Persistent Memory
Elastify Cloud-Native Spark Application with Persistent Memory
 
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
 
High Performance Computing on AWS: Accelerating Innovation with virtually unl...
High Performance Computing on AWS: Accelerating Innovation with virtually unl...High Performance Computing on AWS: Accelerating Innovation with virtually unl...
High Performance Computing on AWS: Accelerating Innovation with virtually unl...
 
Netflix in the Cloud
Netflix in the CloudNetflix in the Cloud
Netflix in the Cloud
 
Scaling out Driverless AI with IBM Spectrum Conductor - Kevin Doyle - H2O AI ...
Scaling out Driverless AI with IBM Spectrum Conductor - Kevin Doyle - H2O AI ...Scaling out Driverless AI with IBM Spectrum Conductor - Kevin Doyle - H2O AI ...
Scaling out Driverless AI with IBM Spectrum Conductor - Kevin Doyle - H2O AI ...
 
High Performance Computing on AWS
High Performance Computing on AWSHigh Performance Computing on AWS
High Performance Computing on AWS
 

Similar to Advanced Spark and TensorFlow Meetup - Dec 12 2017 - Dong Meng, MapR + Kubernetes + Distributed TensorFlow GPUs

Enabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureEnabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data Capture
MapR Technologies
 
MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Product Update - Spring 2017
MapR Product Update - Spring 2017
MapR Technologies
 
MapR and Machine Learning Primer
MapR and Machine Learning PrimerMapR and Machine Learning Primer
MapR and Machine Learning Primer
Mathieu Dumoulin
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data Platform
MapR Technologies
 
Predictive Maintenance Using Recurrent Neural Networks
Predictive Maintenance Using Recurrent Neural NetworksPredictive Maintenance Using Recurrent Neural Networks
Predictive Maintenance Using Recurrent Neural Networks
Justin Brandenburg
 
Spark and MapR Streams: A Motivating Example
Spark and MapR Streams: A Motivating ExampleSpark and MapR Streams: A Motivating Example
Spark and MapR Streams: A Motivating Example
Ian Downard
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and Analytics
MapR Technologies
 
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
MapR Technologies
 
Scaling Data Science on Big Data
Scaling Data Science on Big DataScaling Data Science on Big Data
Scaling Data Science on Big Data
DataWorks Summit
 
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
Amazon Web Services
 
Real time-hadoop
Real time-hadoopReal time-hadoop
Real time-hadoop
Ted Dunning
 
Progress for big data in Kubernetes
Progress for big data in KubernetesProgress for big data in Kubernetes
Progress for big data in Kubernetes
Ted Dunning
 
Big Data LDN 2017: How to leverage the cloud for Business Solutions
Big Data LDN 2017: How to leverage the cloud for Business SolutionsBig Data LDN 2017: How to leverage the cloud for Business Solutions
Big Data LDN 2017: How to leverage the cloud for Business Solutions
Matt Stubbs
 
Operationalizing Machine Learning at Scale with Sameer Nori
Operationalizing Machine Learning at Scale with Sameer NoriOperationalizing Machine Learning at Scale with Sameer Nori
Operationalizing Machine Learning at Scale with Sameer Nori
Databricks
 
Using TensorFlow for Machine Learning
Using TensorFlow for Machine LearningUsing TensorFlow for Machine Learning
Using TensorFlow for Machine Learning
Justin Brandenburg
 
Keys for Success from Streams to Queries
Keys for Success from Streams to QueriesKeys for Success from Streams to Queries
Keys for Success from Streams to Queries
DataWorks Summit/Hadoop Summit
 
Map r chicago_advanalytics_oct_meetup
Map r chicago_advanalytics_oct_meetupMap r chicago_advanalytics_oct_meetup
Map r chicago_advanalytics_oct_meetup
Alan Iovine
 
Streaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningStreaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine Learning
Ted Dunning
 
IoT and Big Data - Iot Asia 2014
IoT and Big Data - Iot Asia 2014IoT and Big Data - Iot Asia 2014
IoT and Big Data - Iot Asia 2014
John Berns
 
Big data and cloud computing 9 sep-2017
Big data and cloud computing 9 sep-2017Big data and cloud computing 9 sep-2017
Big data and cloud computing 9 sep-2017
Dr. Anita Goel
 

Similar to Advanced Spark and TensorFlow Meetup - Dec 12 2017 - Dong Meng, MapR + Kubernetes + Distributed TensorFlow GPUs (20)

Enabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureEnabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data Capture
 
MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Product Update - Spring 2017
MapR Product Update - Spring 2017
 
MapR and Machine Learning Primer
MapR and Machine Learning PrimerMapR and Machine Learning Primer
MapR and Machine Learning Primer
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data Platform
 
Predictive Maintenance Using Recurrent Neural Networks
Predictive Maintenance Using Recurrent Neural NetworksPredictive Maintenance Using Recurrent Neural Networks
Predictive Maintenance Using Recurrent Neural Networks
 
Spark and MapR Streams: A Motivating Example
Spark and MapR Streams: A Motivating ExampleSpark and MapR Streams: A Motivating Example
Spark and MapR Streams: A Motivating Example
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and Analytics
 
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
 
Scaling Data Science on Big Data
Scaling Data Science on Big DataScaling Data Science on Big Data
Scaling Data Science on Big Data
 
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
 
Real time-hadoop
Real time-hadoopReal time-hadoop
Real time-hadoop
 
Progress for big data in Kubernetes
Progress for big data in KubernetesProgress for big data in Kubernetes
Progress for big data in Kubernetes
 
Big Data LDN 2017: How to leverage the cloud for Business Solutions
Big Data LDN 2017: How to leverage the cloud for Business SolutionsBig Data LDN 2017: How to leverage the cloud for Business Solutions
Big Data LDN 2017: How to leverage the cloud for Business Solutions
 
Operationalizing Machine Learning at Scale with Sameer Nori
Operationalizing Machine Learning at Scale with Sameer NoriOperationalizing Machine Learning at Scale with Sameer Nori
Operationalizing Machine Learning at Scale with Sameer Nori
 
Using TensorFlow for Machine Learning
Using TensorFlow for Machine LearningUsing TensorFlow for Machine Learning
Using TensorFlow for Machine Learning
 
Keys for Success from Streams to Queries
Keys for Success from Streams to QueriesKeys for Success from Streams to Queries
Keys for Success from Streams to Queries
 
Map r chicago_advanalytics_oct_meetup
Map r chicago_advanalytics_oct_meetupMap r chicago_advanalytics_oct_meetup
Map r chicago_advanalytics_oct_meetup
 
Streaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningStreaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine Learning
 
IoT and Big Data - Iot Asia 2014
IoT and Big Data - Iot Asia 2014IoT and Big Data - Iot Asia 2014
IoT and Big Data - Iot Asia 2014
 
Big data and cloud computing 9 sep-2017
Big data and cloud computing 9 sep-2017Big data and cloud computing 9 sep-2017
Big data and cloud computing 9 sep-2017
 

More from Chris Fregly

AWS reInvent 2022 reCap AI/ML and Data
AWS reInvent 2022 reCap AI/ML and DataAWS reInvent 2022 reCap AI/ML and Data
AWS reInvent 2022 reCap AI/ML and Data
Chris Fregly
 
Pandas on AWS - Let me count the ways.pdf
Pandas on AWS - Let me count the ways.pdfPandas on AWS - Let me count the ways.pdf
Pandas on AWS - Let me count the ways.pdf
Chris Fregly
 
Ray AI Runtime (AIR) on AWS - Data Science On AWS Meetup
Ray AI Runtime (AIR) on AWS - Data Science On AWS MeetupRay AI Runtime (AIR) on AWS - Data Science On AWS Meetup
Ray AI Runtime (AIR) on AWS - Data Science On AWS Meetup
Chris Fregly
 
Amazon reInvent 2020 Recap: AI and Machine Learning
Amazon reInvent 2020 Recap:  AI and Machine LearningAmazon reInvent 2020 Recap:  AI and Machine Learning
Amazon reInvent 2020 Recap: AI and Machine Learning
Chris Fregly
 
Waking the Data Scientist at 2am: Detect Model Degradation on Production Mod...
Waking the Data Scientist at 2am:  Detect Model Degradation on Production Mod...Waking the Data Scientist at 2am:  Detect Model Degradation on Production Mod...
Waking the Data Scientist at 2am: Detect Model Degradation on Production Mod...
Chris Fregly
 
Quantum Computing with Amazon Braket
Quantum Computing with Amazon BraketQuantum Computing with Amazon Braket
Quantum Computing with Amazon Braket
Chris Fregly
 
15 Tips to Scale a Large AI/ML Workshop - Both Online and In-Person
15 Tips to Scale a Large AI/ML Workshop - Both Online and In-Person15 Tips to Scale a Large AI/ML Workshop - Both Online and In-Person
15 Tips to Scale a Large AI/ML Workshop - Both Online and In-Person
Chris Fregly
 
AWS Re:Invent 2019 Re:Cap
AWS Re:Invent 2019 Re:CapAWS Re:Invent 2019 Re:Cap
AWS Re:Invent 2019 Re:Cap
Chris Fregly
 
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
Chris Fregly
 
Swift for TensorFlow - Tanmay Bakshi - Advanced Spark and TensorFlow Meetup -...
Swift for TensorFlow - Tanmay Bakshi - Advanced Spark and TensorFlow Meetup -...Swift for TensorFlow - Tanmay Bakshi - Advanced Spark and TensorFlow Meetup -...
Swift for TensorFlow - Tanmay Bakshi - Advanced Spark and TensorFlow Meetup -...
Chris Fregly
 
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
Chris Fregly
 
Spark SQL Catalyst Optimizer, Custom Expressions, UDFs - Advanced Spark and T...
Spark SQL Catalyst Optimizer, Custom Expressions, UDFs - Advanced Spark and T...Spark SQL Catalyst Optimizer, Custom Expressions, UDFs - Advanced Spark and T...
Spark SQL Catalyst Optimizer, Custom Expressions, UDFs - Advanced Spark and T...
Chris Fregly
 
PipelineAI Continuous Machine Learning and AI - Rework Deep Learning Summit -...
PipelineAI Continuous Machine Learning and AI - Rework Deep Learning Summit -...PipelineAI Continuous Machine Learning and AI - Rework Deep Learning Summit -...
PipelineAI Continuous Machine Learning and AI - Rework Deep Learning Summit -...
Chris Fregly
 
PipelineAI Real-Time Machine Learning - Global Artificial Intelligence Confer...
PipelineAI Real-Time Machine Learning - Global Artificial Intelligence Confer...PipelineAI Real-Time Machine Learning - Global Artificial Intelligence Confer...
PipelineAI Real-Time Machine Learning - Global Artificial Intelligence Confer...
Chris Fregly
 
Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...
Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...
Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...
Chris Fregly
 
PipelineAI Optimizes Your Enterprise AI Pipeline from Distributed Training to...
PipelineAI Optimizes Your Enterprise AI Pipeline from Distributed Training to...PipelineAI Optimizes Your Enterprise AI Pipeline from Distributed Training to...
PipelineAI Optimizes Your Enterprise AI Pipeline from Distributed Training to...
Chris Fregly
 
High Performance Distributed TensorFlow in Production with GPUs - NIPS 2017 -...
High Performance Distributed TensorFlow in Production with GPUs - NIPS 2017 -...High Performance Distributed TensorFlow in Production with GPUs - NIPS 2017 -...
High Performance Distributed TensorFlow in Production with GPUs - NIPS 2017 -...
Chris Fregly
 
PipelineAI + TensorFlow AI + Spark ML + Kuberenetes + Istio + AWS SageMaker +...
PipelineAI + TensorFlow AI + Spark ML + Kuberenetes + Istio + AWS SageMaker +...PipelineAI + TensorFlow AI + Spark ML + Kuberenetes + Istio + AWS SageMaker +...
PipelineAI + TensorFlow AI + Spark ML + Kuberenetes + Istio + AWS SageMaker +...
Chris Fregly
 
PipelineAI + AWS SageMaker + Distributed TensorFlow + AI Model Training and S...
PipelineAI + AWS SageMaker + Distributed TensorFlow + AI Model Training and S...PipelineAI + AWS SageMaker + Distributed TensorFlow + AI Model Training and S...
PipelineAI + AWS SageMaker + Distributed TensorFlow + AI Model Training and S...
Chris Fregly
 
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...
Chris Fregly
 

More from Chris Fregly (20)

AWS reInvent 2022 reCap AI/ML and Data
AWS reInvent 2022 reCap AI/ML and DataAWS reInvent 2022 reCap AI/ML and Data
AWS reInvent 2022 reCap AI/ML and Data
 
Pandas on AWS - Let me count the ways.pdf
Pandas on AWS - Let me count the ways.pdfPandas on AWS - Let me count the ways.pdf
Pandas on AWS - Let me count the ways.pdf
 
Ray AI Runtime (AIR) on AWS - Data Science On AWS Meetup
Ray AI Runtime (AIR) on AWS - Data Science On AWS MeetupRay AI Runtime (AIR) on AWS - Data Science On AWS Meetup
Ray AI Runtime (AIR) on AWS - Data Science On AWS Meetup
 
Amazon reInvent 2020 Recap: AI and Machine Learning
Amazon reInvent 2020 Recap:  AI and Machine LearningAmazon reInvent 2020 Recap:  AI and Machine Learning
Amazon reInvent 2020 Recap: AI and Machine Learning
 
Waking the Data Scientist at 2am: Detect Model Degradation on Production Mod...
Waking the Data Scientist at 2am:  Detect Model Degradation on Production Mod...Waking the Data Scientist at 2am:  Detect Model Degradation on Production Mod...
Waking the Data Scientist at 2am: Detect Model Degradation on Production Mod...
 
Quantum Computing with Amazon Braket
Quantum Computing with Amazon BraketQuantum Computing with Amazon Braket
Quantum Computing with Amazon Braket
 
15 Tips to Scale a Large AI/ML Workshop - Both Online and In-Person
15 Tips to Scale a Large AI/ML Workshop - Both Online and In-Person15 Tips to Scale a Large AI/ML Workshop - Both Online and In-Person
15 Tips to Scale a Large AI/ML Workshop - Both Online and In-Person
 
AWS Re:Invent 2019 Re:Cap
AWS Re:Invent 2019 Re:CapAWS Re:Invent 2019 Re:Cap
AWS Re:Invent 2019 Re:Cap
 
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
 
Swift for TensorFlow - Tanmay Bakshi - Advanced Spark and TensorFlow Meetup -...
Swift for TensorFlow - Tanmay Bakshi - Advanced Spark and TensorFlow Meetup -...Swift for TensorFlow - Tanmay Bakshi - Advanced Spark and TensorFlow Meetup -...
Swift for TensorFlow - Tanmay Bakshi - Advanced Spark and TensorFlow Meetup -...
 
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
 
Spark SQL Catalyst Optimizer, Custom Expressions, UDFs - Advanced Spark and T...
Spark SQL Catalyst Optimizer, Custom Expressions, UDFs - Advanced Spark and T...Spark SQL Catalyst Optimizer, Custom Expressions, UDFs - Advanced Spark and T...
Spark SQL Catalyst Optimizer, Custom Expressions, UDFs - Advanced Spark and T...
 
PipelineAI Continuous Machine Learning and AI - Rework Deep Learning Summit -...
PipelineAI Continuous Machine Learning and AI - Rework Deep Learning Summit -...PipelineAI Continuous Machine Learning and AI - Rework Deep Learning Summit -...
PipelineAI Continuous Machine Learning and AI - Rework Deep Learning Summit -...
 
PipelineAI Real-Time Machine Learning - Global Artificial Intelligence Confer...
PipelineAI Real-Time Machine Learning - Global Artificial Intelligence Confer...PipelineAI Real-Time Machine Learning - Global Artificial Intelligence Confer...
PipelineAI Real-Time Machine Learning - Global Artificial Intelligence Confer...
 
Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...
Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...
Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...
 
PipelineAI Optimizes Your Enterprise AI Pipeline from Distributed Training to...
PipelineAI Optimizes Your Enterprise AI Pipeline from Distributed Training to...PipelineAI Optimizes Your Enterprise AI Pipeline from Distributed Training to...
PipelineAI Optimizes Your Enterprise AI Pipeline from Distributed Training to...
 
High Performance Distributed TensorFlow in Production with GPUs - NIPS 2017 -...
High Performance Distributed TensorFlow in Production with GPUs - NIPS 2017 -...High Performance Distributed TensorFlow in Production with GPUs - NIPS 2017 -...
High Performance Distributed TensorFlow in Production with GPUs - NIPS 2017 -...
 
PipelineAI + TensorFlow AI + Spark ML + Kuberenetes + Istio + AWS SageMaker +...
PipelineAI + TensorFlow AI + Spark ML + Kuberenetes + Istio + AWS SageMaker +...PipelineAI + TensorFlow AI + Spark ML + Kuberenetes + Istio + AWS SageMaker +...
PipelineAI + TensorFlow AI + Spark ML + Kuberenetes + Istio + AWS SageMaker +...
 
PipelineAI + AWS SageMaker + Distributed TensorFlow + AI Model Training and S...
PipelineAI + AWS SageMaker + Distributed TensorFlow + AI Model Training and S...PipelineAI + AWS SageMaker + Distributed TensorFlow + AI Model Training and S...
PipelineAI + AWS SageMaker + Distributed TensorFlow + AI Model Training and S...
 
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...
 

Recently uploaded

Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Mind IT Systems
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
AMB-Review
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Globus
 
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket ManagementUtilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
Georgi Kodinov
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Globus
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
abdulrafaychaudhry
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
Globus
 
Introduction to Pygame (Lecture 7 Python Game Development)
Introduction to Pygame (Lecture 7 Python Game Development)Introduction to Pygame (Lecture 7 Python Game Development)
Introduction to Pygame (Lecture 7 Python Game Development)
abdulrafaychaudhry
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
Globus
 
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptxText-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
ShamsuddeenMuhammadA
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
Globus
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
Globus
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
Globus
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
Juraj Vysvader
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
Donna Lenk
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Globus
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
Adele Miller
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
Boni García
 

Recently uploaded (20)

Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
 
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket ManagementUtilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
 
Introduction to Pygame (Lecture 7 Python Game Development)
Introduction to Pygame (Lecture 7 Python Game Development)Introduction to Pygame (Lecture 7 Python Game Development)
Introduction to Pygame (Lecture 7 Python Game Development)
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
 
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptxText-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
 

Advanced Spark and TensorFlow Meetup - Dec 12 2017 - Dong Meng, MapR + Kubernetes + Distributed TensorFlow GPUs

  • 1. © 2017 MapR TechnologiesMapR Confidential 1 Distributed Deep Learning on MapR Converged Data Platform Dong Meng - dmeng@mapr.com Data Scientist, MapR Technologies
  • 2. © 2017 MapR TechnologiesMapR Confidential 2 Roadmaps •  Enterprise Big data Journey •  Distributed Deep Learning •  Apply Container Technology and NVIDIA GPUs •  Demos
  • 3. © 2017 MapR TechnologiesMapR Confidential 3 Enterprise Big data Journey
  • 4. © 2017 MapR TechnologiesMapR Confidential 4 Great buildings have great foundations
  • 5. © 2017 MapR TechnologiesMapR Confidential 5 MapR Development 2011 Industrial grade data platform for big data analytics 1.0 - MapR File System 2013 Industrial grade NoSQL Key Value Store – MapRDB (Implements Hbase APIs) 2012 Industry’s first visual big data ops dashboard in MapR control system 2014 Global multi datacenter replication Fast Ingest 1.0 2016 Global streaming - MapR Streams JSON Document DB Fast Ingest 2.0 Spyglass Monitoring 2015 Schema free SQL engine for big data – Apache Drill Global table replication 2017 Persisted data access for Docker containers MapR Edge for edge computing
  • 6. © 2017 MapR TechnologiesMapR Confidential 6 MapR Converged Data Platform Files, Tables, Streams together on same platform Shared Services On-Premise, In the Cloud, Hybrid High Availability Real Time Security & Governance Multi-tenancy Disaster Recovery Global Namespace Converge-X™ Data Fabric Event Data Streams Analytics & Machine Learning Engines Operational Database Cloud-scale Data Store
  • 7. © 2017 MapR TechnologiesMapR Confidential 7 High Availability Real Time Security & Governance Multi-tenancy Disaster Recovery Global Namespace Converge-X™ Data Fabric On-Premise, In the Cloud, Hybrid HDFS APIPOSIX, NFS HBase API JSON API Kafka API An Enterprise Foundation To Operationalize Data Data Center Next-Gen Applications Event Data Streams Analytics & Machine Learning Engines Operational Database Cloud-scale Data Store Existing Enterprise Applications
  • 8. © 2017 MapR TechnologiesMapR Confidential 8 From a User Perspective Files Table Streams Directories Cluster Volume mount point
  • 9. © 2017 MapR TechnologiesMapR Confidential 9 Distributed Deep Learning
  • 10. © 2017 MapR TechnologiesMapR Confidential 10 More DATA The Unreasonable Effectiveness of Data, published by Google beats complex algorithms
  • 11. © 2017 MapR TechnologiesMapR Confidential 11 Distributed Machine Learning •  Mahout: Map-Reduce •  GraphLab: Graph-based parallel framework •  Spark: In memory dataflow system •  H2o: Distributed machine learning library and server
  • 12. © 2017 MapR TechnologiesMapR Confidential 12 Why Deep Learning •  Machine learning algorithm’s performance depend on the representation of the data they are given (feature engineering) •  Difficulty in finding the right representations (features) •  Deep learning –  Learns multiple levels of representations –  Learns high level of abstractions •  Growth of data volumes and computing power (GPUs)
  • 13. © 2017 MapR TechnologiesMapR Confidential 13 Distributed Deep Learning System From a system design perspective: •  Distributed Data Storage – HDFS, MapRFS, Ceph •  Consistency – Parameter Server •  Fault Tolerance – Checkpoint reload •  Resource management – Yarn, Mesos, Kubernetes •  Programming Mode – TensorFlow, Apache MXNet, Torch, Caffe
  • 14. © 2017 MapR TechnologiesMapR Confidential 14 Distributed Deep Learning Model Data Parallelism: •  Data Partition •  Train each model on mini-batches of data Model Parallelism: •  Model Partition •  Separate the training task for each layer
  • 15. © 2017 MapR TechnologiesMapR Confidential 15 Distributed Deep Learning System From a user perspective: •  Ease of expression: for lots of crazy ML ideas/algorithms •  Scalability: can run experiments quickly •  Portability: can run on wide variety of platforms •  Reproducibility: easy to share and reproduce research •  Production readiness: go from research to real products
  • 16. © 2017 MapR TechnologiesMapR Confidential 16 Apply Container Technology and NVIDIA GPUs
  • 17. © 2017 MapR TechnologiesMapR Confidential 17 Deep Learning Development Environment Individual Research: •  Laptop/Dev boxes Research Lab Setting: •  HPC clusters, high speed job execution, small teams Enterprise Deep learning System: •  Distributed Data Storage, Consistency, Fault Tolerance, Resource management, Programming Mode, Version and Deployment •  Container, Kubernetes, MapR Data Platform
  • 18. © 2017 MapR TechnologiesMapR Confidential 18 Containers are Great for Deployment and Research Advantages •  Reproducible work environments •  Ease of deployment •  Isolation of individual workspaces •  Run across heterogeneous environments •  Facilitate collaboration
  • 19. © 2017 MapR TechnologiesMapR Confidential 19 Stateful Containers for Deep Learning Persistent Storage Audio Data Video Feeds Advantages •  Containerized workspaces •  Keep your work between sessions •  Manage work across many projects •  Work with versioned datasets and models •  Share work across containers, projects and/or teams Sensor data
  • 20. © 2017 MapR TechnologiesMapR Confidential 20 Microservices for Production Deep Learning Event Streams & DB Advantages •  Deploy models to production as microservices •  Use files, real-time streams and databases in production •  Scales horizontally •  Support both real-time and batch •  May or may not be stateful
  • 21. © 2017 MapR TechnologiesMapR Confidential 21 Kubernetes for Containers Orchestration •  Deploy once and run many times •  Rollout new versions/rollback to old versions •  Dynamic Scheduling and Elastic Scaling •  Auto Fault Recovery and Scalable Computing •  Isolation and Quota •  Manage GPU resources •  Mount MapR Volume to Persistent Volume
  • 22. © 2017 MapR TechnologiesMapR Confidential 22 Kubernetes in a Heterogeneous GPU Cluster •  Kubernetes master: •  CPU-only •  Workers: •  NVIDIA driver •  CUDA •  CUDNN •  Kubelet config updated for GPU workers needed --feature gates=Accelerators=true •  Note: Containers also need to be configured for GPU support separatelyDiagram: Frederic Tausch on Github
  • 23. © 2017 MapR TechnologiesMapR Confidential 23 Architecture Data layer Orchestration layer Application layer
  • 24. © 2017 MapR TechnologiesMapR Confidential 24 •  Converged Data Platform for both Big Data and Deep Learning •  By design Volume Topology, collocate the data with GPU computation/training. •  Performant POSIX file system, quick adapt to new deep learning technologies and frameworks •  Mirroring feature enables model training/deployment with global data centers •  Snapshot feature enables research reproducibility •  MapRDB and MapR Streams provides infrastructures for Intelligence applications with deep learning models MapR as the Infrastructure for Distributed DL
  • 25. © 2017 MapR TechnologiesMapR Confidential 25 MapR Volumes •  Volumes are logical groupings of containers, and are mounted on directories (just like Linux). •  No fixed size •  Units of policy management •  No limit on number of volumes •  Containers are replicated to other nodes /projects /project1 /users /jsmith /mjohnson / projects /users Data Nodes
  • 26. © 2017 MapR TechnologiesMapR Confidential 26 Pattern 1: Separate Clusters MapR Converged Data Platform Tier Dockerized GPU-based NVIDIA Tier, NVIDIA DGX systems as MapR client
  • 27. © 2017 MapR TechnologiesMapR Confidential 27 Pattern 2: Collocated MapR + Kubernetes MapR Converged Data Platforms powered by NVIDIA GPU Cards
  • 28. © 2017 MapR TechnologiesMapR Confidential 28 Demos
  • 29. © 2017 MapR TechnologiesMapR Confidential 29 Demo1: Parameter Server and Worker POD1: TF Parameter Server POD3: TF Worker2POD2: TF Work1 /projects /project1 Persistent Volume MapR Volume
  • 30. © 2017 MapR TechnologiesMapR Confidential 30 Demo2: Real Time Streams Face Detection GLOBAL DATA PLANE MAPR EDGE Small footprint at the edgeMAPR CONVERGED ENTERPRISE EDITION Send updated model back to edge Aggregate, analyze data at core and refine models Reliable replication MAPR EDGE Camera to capture video feeds Convergence at the edge MAPR EDGE Deploy Model at the edge to score data feed [on-prem, hybrid, cloud]
  • 31. © 2017 MapR TechnologiesMapR Confidential 31 Demo2: Real Time Streams Face Detection Producers to capture video feeds Consumers to output processed videos DL model process streams MAPR EDGE
  • 32. © 2017 MapR TechnologiesMapR Confidential 32 Q&A ENGAGE WITH US 孟东 dmeng@mapr.com