SlideShare a Scribd company logo
1 of 17
Download to read offline
Distributed Deep
Learning
Optimizations
GEETA CHAUHAN
5th Annual Global Big Data Conference, Santa Clara
Aug 29th – 31st , 2017
Agenda  Distributed DL Challenges
 @ Scale DL Infrastructure
 Parallelize your models
 Techniques for Optimization
 Look into future
 References
Rise of Deep Learning
• Computer Vision, Language Translation,
Speech Recognition, Question & Answer,
…
Major Advances
in AI
• Latency, Cost, Power consumption issues
• Complexity & size outpacing commodity
“General purpose compute”
• Hyper-parameter tuning
Challenging to
build & deploy
for large scale
applications
Exascale, 15 Watts
3
Shift towards Specialized Compute
 Special purpose Cloud
 Google TPU, Microsoft Brainwave, IBM Power AI, Nvidia v100, Intel Nervana
 Spectrum: CPU, GPU, FPGA, Custom Asics
 Edge Compute: Hardware accelerators, AI SOC
 Intel Neural Compute Stick, Nvidia Jetson, Nvidia Drive PX (Self driving cars)
 Architectures
 Cluster Compute, HPC, Neuromorphic, Quantum compute
 Complexity in Software
 Model tuning/optimizations specific to hardware
 Growing need for compilers to optimize based on deployment hardware
 Workload specific compute: Model training, Inference
4
CPU Optimizations
 Leverage High Performant compute tools
 Intel Python, Intel Math Kernel Library (MKL), MKL-
DNN
 Compile Tensorflow from Source for CPU
Optimizations
 Proper Data Format
 NCHW for CPUs vs Tensorflow default NHWC
 Proper Batch size, using all cores & memory
 Use Queues for Reading Data
Source: Intel Research Blog
5
Tensorflow CPU Optimizations
 Compile from source
 git clone https://github.com/tensorflow/tensorflow.git
 Run ./configure from Tensorflow source directory
 Select option MKL (CPU) Optimization
 Build pip package for install
 bazel build --config=mkl --copt=-DEIGEN_USE_VML -c opt
//tensorflow/tools/pip_package:build_pip_package
 Install the optimized TensorFlow wheel
 bazel-bin/tensorflow/tools/pip_package/build_pip_package
~/path_to_save_wheel
pip install --upgrade --user ~/path_to_save_wheel /wheel_name.whl
6
Parallelize your models
 Data Parallelism
 Tensorflow Estimator + Experiments
 Parameter Server, Worker cluster
 Intel BigDL Spark Cluster
 Baidu’s Ring AllReduce
 HyperTune Google Cloud ML
 Model Parallelism
 Graph too large to fit on one
machine
 Tensorflow Model Towers
7
Optimizations for Training
Source: Amazon MxNET
8
Workload Partitioning
Source: Amazon MxNET
 Minimize communication time
 Place neighboring layers on same GPU
 Balance workload between GPUs
 Different layers have different memory-compute
properties
 Model on left more balanced
 LSTM unrolling: ↓ memory, ↑ compute time
 Encode/Decode: ↑ memory
9
Optimizations for Inferencing
 Graph Transform Tool
 Freeze graph (variables to constants)
 Quantize weights (20 M weights for IV3)
 Quantization (32 bit float → 8 bit float)
 Memory Mapping
 Inception v3 93 MB → 1.5 MB
10
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph 
--in_graph=/tmp/classify_image_graph_def.pb 
--outputs="softmax" --out_graph=/tmp/quantized_graph.pb 
--transforms='add_default_attributes strip_unused_nodes(type=float,
shape="1,299,299,3")
remove_nodes(op=Identity, op=CheckNumerics)
fold_constants(ignore_errors=true)
fold_batch_norms fold_old_batch_norms quantize_weights quantize_nodes
strip_unused_nodes sort_by_execution_order'
Cluster
Optimizations
 Define your ML Container locally
 Evaluate with different parameters in the cloud
 Use EFS / GFS for data storage and sharing across
nodes
 Create separate Data processing container
 Mount EFS/GFS drive on all pods for shared
storage
 Avoid GPU Fragmentation problems by bundling
jobs
 Placement optimizations – Kubernetes Bundle
as pods, Mesos placement constraints
 GPU Drivers bundling in container a problem
 Mount as Readonly volume, or use Nvidia-
docker
11
Future: FPGA Hardware Microservices
Source: Microsoft Research Blog
12
BrainWave Compiler & Runtime
Source: Microsoft Research Blog
13
Future: Neuromorphic Compute
IBM True North: Brain Inspired Chip Neuromorphic memristors
14
Future:
Quantum
Computers
Source: opentranscripts.org
Eg Personalized Medicine for diseases like Cancer
15
Resources
 A Study of Complex Deep Learning Networks on High Performance,
Neuromorphic, and Quantum Computers
https://arxiv.org/pdf/1703.05364.pdf
 Microsoft’s Project Brainwave: https://www.microsoft.com/en-
us/research/blog/microsoft-unveils-project-brainwave/
 Intel Spark BigDL: https://software.intel.com/en-us/articles/bigdl-
distributed-deep-learning-on-apache-spark
 Baidu’s Paddle-Paddle on Kubernetes:
http://blog.kubernetes.io/2017/02/run-deep-learning-with-
paddlepaddle-on-kubernetes.html
 Kubernetes GPU Guide: https://github.com/Langhalsdino/Kubernetes-
GPU-Guide
 Tensorflow Quantization:
https://www.tensorflow.org/performance/quantization
16
Questions?
Contact
http://bit.ly/geeta4c
geeta@svsg.co
@geeta4c

More Related Content

What's hot

Machine learning at scale with Google Cloud Platform
Machine learning at scale with Google Cloud PlatformMachine learning at scale with Google Cloud Platform
Machine learning at scale with Google Cloud PlatformMatthias Feys
 
CPN211 My Datacenter Has Walls That Move - AWS re: Invent 2012
CPN211 My Datacenter Has Walls That Move - AWS re: Invent 2012CPN211 My Datacenter Has Walls That Move - AWS re: Invent 2012
CPN211 My Datacenter Has Walls That Move - AWS re: Invent 2012Amazon Web Services
 
Deep learning: Hardware Landscape
Deep learning: Hardware LandscapeDeep learning: Hardware Landscape
Deep learning: Hardware LandscapeGrigory Sapunov
 
Faster deep learning solutions from training to inference - Michele Tameni - ...
Faster deep learning solutions from training to inference - Michele Tameni - ...Faster deep learning solutions from training to inference - Michele Tameni - ...
Faster deep learning solutions from training to inference - Michele Tameni - ...Codemotion
 
Deep Learning Computer Build
Deep Learning Computer BuildDeep Learning Computer Build
Deep Learning Computer BuildPetteriTeikariPhD
 
Big data app meetup 2016-06-15
Big data app meetup 2016-06-15Big data app meetup 2016-06-15
Big data app meetup 2016-06-15Illia Polosukhin
 
Graph lab in a NutShell
Graph lab in a NutShellGraph lab in a NutShell
Graph lab in a NutShellMohit Ranjan
 
Intro to the Distributed Version of TensorFlow
Intro to the Distributed Version of TensorFlowIntro to the Distributed Version of TensorFlow
Intro to the Distributed Version of TensorFlowAltoros
 
On-device machine learning: TensorFlow on Android
On-device machine learning: TensorFlow on AndroidOn-device machine learning: TensorFlow on Android
On-device machine learning: TensorFlow on AndroidYufeng Guo
 
Affordable AI Connects To A Better Life
Affordable AI Connects To A Better LifeAffordable AI Connects To A Better Life
Affordable AI Connects To A Better LifeNVIDIA Taiwan
 
Transfer Leaning Using Pytorch synopsis Minor project pptx
Transfer Leaning Using Pytorch  synopsis Minor project pptxTransfer Leaning Using Pytorch  synopsis Minor project pptx
Transfer Leaning Using Pytorch synopsis Minor project pptxAnkit Gupta
 
Metta Innovations - Introdução ao Deep Learning aplicado a vídeo analytics
Metta Innovations - Introdução ao Deep Learning aplicado a vídeo analyticsMetta Innovations - Introdução ao Deep Learning aplicado a vídeo analytics
Metta Innovations - Introdução ao Deep Learning aplicado a vídeo analyticsEduardo Gaspar
 
An AI accelerator ASIC architecture
An AI accelerator ASIC architectureAn AI accelerator ASIC architecture
An AI accelerator ASIC architectureKhanh Le
 
ANISH_and_DR.DANIEL_augmented_reality_presentation
ANISH_and_DR.DANIEL_augmented_reality_presentationANISH_and_DR.DANIEL_augmented_reality_presentation
ANISH_and_DR.DANIEL_augmented_reality_presentationAnish Patel
 
Deep learning with Tensorflow in R
Deep learning with Tensorflow in RDeep learning with Tensorflow in R
Deep learning with Tensorflow in Rmikaelhuss
 
GPU and Deep learning best practices
GPU and Deep learning best practicesGPU and Deep learning best practices
GPU and Deep learning best practicesLior Sidi
 

What's hot (20)

Machine learning at scale with Google Cloud Platform
Machine learning at scale with Google Cloud PlatformMachine learning at scale with Google Cloud Platform
Machine learning at scale with Google Cloud Platform
 
CPN211 My Datacenter Has Walls That Move - AWS re: Invent 2012
CPN211 My Datacenter Has Walls That Move - AWS re: Invent 2012CPN211 My Datacenter Has Walls That Move - AWS re: Invent 2012
CPN211 My Datacenter Has Walls That Move - AWS re: Invent 2012
 
Deep learning: Hardware Landscape
Deep learning: Hardware LandscapeDeep learning: Hardware Landscape
Deep learning: Hardware Landscape
 
Deep Learning Update May 2016
Deep Learning Update May 2016Deep Learning Update May 2016
Deep Learning Update May 2016
 
Faster deep learning solutions from training to inference - Michele Tameni - ...
Faster deep learning solutions from training to inference - Michele Tameni - ...Faster deep learning solutions from training to inference - Michele Tameni - ...
Faster deep learning solutions from training to inference - Michele Tameni - ...
 
Deep Learning Computer Build
Deep Learning Computer BuildDeep Learning Computer Build
Deep Learning Computer Build
 
Big data app meetup 2016-06-15
Big data app meetup 2016-06-15Big data app meetup 2016-06-15
Big data app meetup 2016-06-15
 
Graph lab in a NutShell
Graph lab in a NutShellGraph lab in a NutShell
Graph lab in a NutShell
 
Intro to the Distributed Version of TensorFlow
Intro to the Distributed Version of TensorFlowIntro to the Distributed Version of TensorFlow
Intro to the Distributed Version of TensorFlow
 
AI Hardware
AI HardwareAI Hardware
AI Hardware
 
On-device machine learning: TensorFlow on Android
On-device machine learning: TensorFlow on AndroidOn-device machine learning: TensorFlow on Android
On-device machine learning: TensorFlow on Android
 
Affordable AI Connects To A Better Life
Affordable AI Connects To A Better LifeAffordable AI Connects To A Better Life
Affordable AI Connects To A Better Life
 
Transfer Leaning Using Pytorch synopsis Minor project pptx
Transfer Leaning Using Pytorch  synopsis Minor project pptxTransfer Leaning Using Pytorch  synopsis Minor project pptx
Transfer Leaning Using Pytorch synopsis Minor project pptx
 
Metta Innovations - Introdução ao Deep Learning aplicado a vídeo analytics
Metta Innovations - Introdução ao Deep Learning aplicado a vídeo analyticsMetta Innovations - Introdução ao Deep Learning aplicado a vídeo analytics
Metta Innovations - Introdução ao Deep Learning aplicado a vídeo analytics
 
An AI accelerator ASIC architecture
An AI accelerator ASIC architectureAn AI accelerator ASIC architecture
An AI accelerator ASIC architecture
 
Advances in GPU Computing
Advances in GPU ComputingAdvances in GPU Computing
Advances in GPU Computing
 
ANISH_and_DR.DANIEL_augmented_reality_presentation
ANISH_and_DR.DANIEL_augmented_reality_presentationANISH_and_DR.DANIEL_augmented_reality_presentation
ANISH_and_DR.DANIEL_augmented_reality_presentation
 
The Revolution of Deep Learning
The Revolution of Deep LearningThe Revolution of Deep Learning
The Revolution of Deep Learning
 
Deep learning with Tensorflow in R
Deep learning with Tensorflow in RDeep learning with Tensorflow in R
Deep learning with Tensorflow in R
 
GPU and Deep learning best practices
GPU and Deep learning best practicesGPU and Deep learning best practices
GPU and Deep learning best practices
 

Similar to Distributed deep learning optimizations

Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)Lablup Inc.
 
Netflix machine learning
Netflix machine learningNetflix machine learning
Netflix machine learningAmer Ather
 
Cloud Computing ...changes everything
Cloud Computing ...changes everythingCloud Computing ...changes everything
Cloud Computing ...changes everythingLew Tucker
 
Achieving Improved Performance In Multi-threaded Programming With GPU Computing
Achieving Improved Performance In Multi-threaded Programming With GPU ComputingAchieving Improved Performance In Multi-threaded Programming With GPU Computing
Achieving Improved Performance In Multi-threaded Programming With GPU ComputingMesbah Uddin Khan
 
2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetupGanesan Narayanasamy
 
APSys Presentation Final copy2
APSys Presentation Final copy2APSys Presentation Final copy2
APSys Presentation Final copy2Junli Gu
 
Innovation with ai at scale on the edge vt sept 2019 v0
Innovation with ai at scale  on the edge vt sept 2019 v0Innovation with ai at scale  on the edge vt sept 2019 v0
Innovation with ai at scale on the edge vt sept 2019 v0Ganesan Narayanasamy
 
Migrating Existing Open Source Machine Learning to Azure
Migrating Existing Open Source Machine Learning to AzureMigrating Existing Open Source Machine Learning to Azure
Migrating Existing Open Source Machine Learning to AzureRevolution Analytics
 
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...Matej Misik
 
Build, train, and deploy machine learning models at scale
Build, train, and deploy machine learning models at scaleBuild, train, and deploy machine learning models at scale
Build, train, and deploy machine learning models at scaleAmazon Web Services
 
Tesla Accelerated Computing Platform
Tesla Accelerated Computing PlatformTesla Accelerated Computing Platform
Tesla Accelerated Computing Platforminside-BigData.com
 
TECHNICAL OVERVIEW NVIDIA DEEP LEARNING PLATFORM Giant Leaps in Performance ...
TECHNICAL OVERVIEW NVIDIA DEEP  LEARNING PLATFORM Giant Leaps in Performance ...TECHNICAL OVERVIEW NVIDIA DEEP  LEARNING PLATFORM Giant Leaps in Performance ...
TECHNICAL OVERVIEW NVIDIA DEEP LEARNING PLATFORM Giant Leaps in Performance ...Willy Marroquin (WillyDevNET)
 
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...Databricks
 
Migrating existing open source machine learning to azure
Migrating existing open source machine learning to azureMigrating existing open source machine learning to azure
Migrating existing open source machine learning to azureMicrosoft Tech Community
 
Real-world Cloud HPC at Scale, for Production Workloads (BDT212) | AWS re:Inv...
Real-world Cloud HPC at Scale, for Production Workloads (BDT212) | AWS re:Inv...Real-world Cloud HPC at Scale, for Production Workloads (BDT212) | AWS re:Inv...
Real-world Cloud HPC at Scale, for Production Workloads (BDT212) | AWS re:Inv...Amazon Web Services
 
Google Cloud Platform Empowers TensorFlow and Machine Learning
Google Cloud Platform Empowers TensorFlow and Machine LearningGoogle Cloud Platform Empowers TensorFlow and Machine Learning
Google Cloud Platform Empowers TensorFlow and Machine LearningDataWorks Summit/Hadoop Summit
 
Squeezing Deep Learning Into Mobile Phones
Squeezing Deep Learning Into Mobile PhonesSqueezing Deep Learning Into Mobile Phones
Squeezing Deep Learning Into Mobile PhonesAnirudh Koul
 
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors DataWorks Summit/Hadoop Summit
 

Similar to Distributed deep learning optimizations (20)

Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
 
Netflix machine learning
Netflix machine learningNetflix machine learning
Netflix machine learning
 
Cloud Computing ...changes everything
Cloud Computing ...changes everythingCloud Computing ...changes everything
Cloud Computing ...changes everything
 
Achieving Improved Performance In Multi-threaded Programming With GPU Computing
Achieving Improved Performance In Multi-threaded Programming With GPU ComputingAchieving Improved Performance In Multi-threaded Programming With GPU Computing
Achieving Improved Performance In Multi-threaded Programming With GPU Computing
 
2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup
 
APSys Presentation Final copy2
APSys Presentation Final copy2APSys Presentation Final copy2
APSys Presentation Final copy2
 
Innovation with ai at scale on the edge vt sept 2019 v0
Innovation with ai at scale  on the edge vt sept 2019 v0Innovation with ai at scale  on the edge vt sept 2019 v0
Innovation with ai at scale on the edge vt sept 2019 v0
 
System mldl meetup
System mldl meetupSystem mldl meetup
System mldl meetup
 
Migrating Existing Open Source Machine Learning to Azure
Migrating Existing Open Source Machine Learning to AzureMigrating Existing Open Source Machine Learning to Azure
Migrating Existing Open Source Machine Learning to Azure
 
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
 
Build, train, and deploy machine learning models at scale
Build, train, and deploy machine learning models at scaleBuild, train, and deploy machine learning models at scale
Build, train, and deploy machine learning models at scale
 
Tesla Accelerated Computing Platform
Tesla Accelerated Computing PlatformTesla Accelerated Computing Platform
Tesla Accelerated Computing Platform
 
TECHNICAL OVERVIEW NVIDIA DEEP LEARNING PLATFORM Giant Leaps in Performance ...
TECHNICAL OVERVIEW NVIDIA DEEP  LEARNING PLATFORM Giant Leaps in Performance ...TECHNICAL OVERVIEW NVIDIA DEEP  LEARNING PLATFORM Giant Leaps in Performance ...
TECHNICAL OVERVIEW NVIDIA DEEP LEARNING PLATFORM Giant Leaps in Performance ...
 
Open power ddl and lms
Open power ddl and lmsOpen power ddl and lms
Open power ddl and lms
 
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
 
Migrating existing open source machine learning to azure
Migrating existing open source machine learning to azureMigrating existing open source machine learning to azure
Migrating existing open source machine learning to azure
 
Real-world Cloud HPC at Scale, for Production Workloads (BDT212) | AWS re:Inv...
Real-world Cloud HPC at Scale, for Production Workloads (BDT212) | AWS re:Inv...Real-world Cloud HPC at Scale, for Production Workloads (BDT212) | AWS re:Inv...
Real-world Cloud HPC at Scale, for Production Workloads (BDT212) | AWS re:Inv...
 
Google Cloud Platform Empowers TensorFlow and Machine Learning
Google Cloud Platform Empowers TensorFlow and Machine LearningGoogle Cloud Platform Empowers TensorFlow and Machine Learning
Google Cloud Platform Empowers TensorFlow and Machine Learning
 
Squeezing Deep Learning Into Mobile Phones
Squeezing Deep Learning Into Mobile PhonesSqueezing Deep Learning Into Mobile Phones
Squeezing Deep Learning Into Mobile Phones
 
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
 

More from geetachauhan

Profiling PyTorch for Efficiency & Sustainability
Profiling PyTorch for Efficiency & SustainabilityProfiling PyTorch for Efficiency & Sustainability
Profiling PyTorch for Efficiency & Sustainabilitygeetachauhan
 
Building AI with Security Privacy in Mind
Building AI with Security Privacy in MindBuilding AI with Security Privacy in Mind
Building AI with Security Privacy in Mindgeetachauhan
 
Building AI with Security and Privacy in mind
Building AI with Security and Privacy in mindBuilding AI with Security and Privacy in mind
Building AI with Security and Privacy in mindgeetachauhan
 
Scaling AI in production using PyTorch
Scaling AI in production using PyTorchScaling AI in production using PyTorch
Scaling AI in production using PyTorchgeetachauhan
 
Building Interpretable & Secure AI Systems using PyTorch
Building Interpretable & Secure AI Systems using PyTorchBuilding Interpretable & Secure AI Systems using PyTorch
Building Interpretable & Secure AI Systems using PyTorchgeetachauhan
 
Future is private intel dev fest
Future is private   intel dev festFuture is private   intel dev fest
Future is private intel dev festgeetachauhan
 
Decentralized AI Draper
Decentralized AI   DraperDecentralized AI   Draper
Decentralized AI Drapergeetachauhan
 
Decentralized AI: Convergence of AI + Blockchain
Decentralized AI: Convergence of AI + Blockchain Decentralized AI: Convergence of AI + Blockchain
Decentralized AI: Convergence of AI + Blockchain geetachauhan
 
Decentralized AI: Convergence of Blockchain + AI
Decentralized AI: Convergence of Blockchain + AIDecentralized AI: Convergence of Blockchain + AI
Decentralized AI: Convergence of Blockchain + AIgeetachauhan
 
Decentralized AI: Convergence of Blockchain + AI
Decentralized AI: Convergence of Blockchain + AIDecentralized AI: Convergence of Blockchain + AI
Decentralized AI: Convergence of Blockchain + AIgeetachauhan
 
Deep learning for medical imaging
Deep learning for medical imagingDeep learning for medical imaging
Deep learning for medical imaginggeetachauhan
 
NIPS - Deep learning @ Edge using Intel's NCS
NIPS - Deep learning @ Edge using Intel's NCSNIPS - Deep learning @ Edge using Intel's NCS
NIPS - Deep learning @ Edge using Intel's NCSgeetachauhan
 
Deep learning @ Edge using Intel's Neural Compute Stick
Deep learning @ Edge using Intel's Neural Compute StickDeep learning @ Edge using Intel's Neural Compute Stick
Deep learning @ Edge using Intel's Neural Compute Stickgeetachauhan
 
Build Secure IOT Solutions using Blockchain
Build Secure IOT Solutions using BlockchainBuild Secure IOT Solutions using Blockchain
Build Secure IOT Solutions using Blockchaingeetachauhan
 
Data Analytics in Real World (May 2016)
Data Analytics in Real World (May 2016)Data Analytics in Real World (May 2016)
Data Analytics in Real World (May 2016)geetachauhan
 
Data Analytics in Real World
Data Analytics in Real WorldData Analytics in Real World
Data Analytics in Real Worldgeetachauhan
 
Blockchain revolution
Blockchain revolutionBlockchain revolution
Blockchain revolutiongeetachauhan
 

More from geetachauhan (17)

Profiling PyTorch for Efficiency & Sustainability
Profiling PyTorch for Efficiency & SustainabilityProfiling PyTorch for Efficiency & Sustainability
Profiling PyTorch for Efficiency & Sustainability
 
Building AI with Security Privacy in Mind
Building AI with Security Privacy in MindBuilding AI with Security Privacy in Mind
Building AI with Security Privacy in Mind
 
Building AI with Security and Privacy in mind
Building AI with Security and Privacy in mindBuilding AI with Security and Privacy in mind
Building AI with Security and Privacy in mind
 
Scaling AI in production using PyTorch
Scaling AI in production using PyTorchScaling AI in production using PyTorch
Scaling AI in production using PyTorch
 
Building Interpretable & Secure AI Systems using PyTorch
Building Interpretable & Secure AI Systems using PyTorchBuilding Interpretable & Secure AI Systems using PyTorch
Building Interpretable & Secure AI Systems using PyTorch
 
Future is private intel dev fest
Future is private   intel dev festFuture is private   intel dev fest
Future is private intel dev fest
 
Decentralized AI Draper
Decentralized AI   DraperDecentralized AI   Draper
Decentralized AI Draper
 
Decentralized AI: Convergence of AI + Blockchain
Decentralized AI: Convergence of AI + Blockchain Decentralized AI: Convergence of AI + Blockchain
Decentralized AI: Convergence of AI + Blockchain
 
Decentralized AI: Convergence of Blockchain + AI
Decentralized AI: Convergence of Blockchain + AIDecentralized AI: Convergence of Blockchain + AI
Decentralized AI: Convergence of Blockchain + AI
 
Decentralized AI: Convergence of Blockchain + AI
Decentralized AI: Convergence of Blockchain + AIDecentralized AI: Convergence of Blockchain + AI
Decentralized AI: Convergence of Blockchain + AI
 
Deep learning for medical imaging
Deep learning for medical imagingDeep learning for medical imaging
Deep learning for medical imaging
 
NIPS - Deep learning @ Edge using Intel's NCS
NIPS - Deep learning @ Edge using Intel's NCSNIPS - Deep learning @ Edge using Intel's NCS
NIPS - Deep learning @ Edge using Intel's NCS
 
Deep learning @ Edge using Intel's Neural Compute Stick
Deep learning @ Edge using Intel's Neural Compute StickDeep learning @ Edge using Intel's Neural Compute Stick
Deep learning @ Edge using Intel's Neural Compute Stick
 
Build Secure IOT Solutions using Blockchain
Build Secure IOT Solutions using BlockchainBuild Secure IOT Solutions using Blockchain
Build Secure IOT Solutions using Blockchain
 
Data Analytics in Real World (May 2016)
Data Analytics in Real World (May 2016)Data Analytics in Real World (May 2016)
Data Analytics in Real World (May 2016)
 
Data Analytics in Real World
Data Analytics in Real WorldData Analytics in Real World
Data Analytics in Real World
 
Blockchain revolution
Blockchain revolutionBlockchain revolution
Blockchain revolution
 

Recently uploaded

Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsPrecisely
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 

Recently uploaded (20)

E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 

Distributed deep learning optimizations

  • 1. Distributed Deep Learning Optimizations GEETA CHAUHAN 5th Annual Global Big Data Conference, Santa Clara Aug 29th – 31st , 2017
  • 2. Agenda  Distributed DL Challenges  @ Scale DL Infrastructure  Parallelize your models  Techniques for Optimization  Look into future  References
  • 3. Rise of Deep Learning • Computer Vision, Language Translation, Speech Recognition, Question & Answer, … Major Advances in AI • Latency, Cost, Power consumption issues • Complexity & size outpacing commodity “General purpose compute” • Hyper-parameter tuning Challenging to build & deploy for large scale applications Exascale, 15 Watts 3
  • 4. Shift towards Specialized Compute  Special purpose Cloud  Google TPU, Microsoft Brainwave, IBM Power AI, Nvidia v100, Intel Nervana  Spectrum: CPU, GPU, FPGA, Custom Asics  Edge Compute: Hardware accelerators, AI SOC  Intel Neural Compute Stick, Nvidia Jetson, Nvidia Drive PX (Self driving cars)  Architectures  Cluster Compute, HPC, Neuromorphic, Quantum compute  Complexity in Software  Model tuning/optimizations specific to hardware  Growing need for compilers to optimize based on deployment hardware  Workload specific compute: Model training, Inference 4
  • 5. CPU Optimizations  Leverage High Performant compute tools  Intel Python, Intel Math Kernel Library (MKL), MKL- DNN  Compile Tensorflow from Source for CPU Optimizations  Proper Data Format  NCHW for CPUs vs Tensorflow default NHWC  Proper Batch size, using all cores & memory  Use Queues for Reading Data Source: Intel Research Blog 5
  • 6. Tensorflow CPU Optimizations  Compile from source  git clone https://github.com/tensorflow/tensorflow.git  Run ./configure from Tensorflow source directory  Select option MKL (CPU) Optimization  Build pip package for install  bazel build --config=mkl --copt=-DEIGEN_USE_VML -c opt //tensorflow/tools/pip_package:build_pip_package  Install the optimized TensorFlow wheel  bazel-bin/tensorflow/tools/pip_package/build_pip_package ~/path_to_save_wheel pip install --upgrade --user ~/path_to_save_wheel /wheel_name.whl 6
  • 7. Parallelize your models  Data Parallelism  Tensorflow Estimator + Experiments  Parameter Server, Worker cluster  Intel BigDL Spark Cluster  Baidu’s Ring AllReduce  HyperTune Google Cloud ML  Model Parallelism  Graph too large to fit on one machine  Tensorflow Model Towers 7
  • 9. Workload Partitioning Source: Amazon MxNET  Minimize communication time  Place neighboring layers on same GPU  Balance workload between GPUs  Different layers have different memory-compute properties  Model on left more balanced  LSTM unrolling: ↓ memory, ↑ compute time  Encode/Decode: ↑ memory 9
  • 10. Optimizations for Inferencing  Graph Transform Tool  Freeze graph (variables to constants)  Quantize weights (20 M weights for IV3)  Quantization (32 bit float → 8 bit float)  Memory Mapping  Inception v3 93 MB → 1.5 MB 10 bazel build tensorflow/tools/graph_transforms:transform_graph bazel-bin/tensorflow/tools/graph_transforms/transform_graph --in_graph=/tmp/classify_image_graph_def.pb --outputs="softmax" --out_graph=/tmp/quantized_graph.pb --transforms='add_default_attributes strip_unused_nodes(type=float, shape="1,299,299,3") remove_nodes(op=Identity, op=CheckNumerics) fold_constants(ignore_errors=true) fold_batch_norms fold_old_batch_norms quantize_weights quantize_nodes strip_unused_nodes sort_by_execution_order'
  • 11. Cluster Optimizations  Define your ML Container locally  Evaluate with different parameters in the cloud  Use EFS / GFS for data storage and sharing across nodes  Create separate Data processing container  Mount EFS/GFS drive on all pods for shared storage  Avoid GPU Fragmentation problems by bundling jobs  Placement optimizations – Kubernetes Bundle as pods, Mesos placement constraints  GPU Drivers bundling in container a problem  Mount as Readonly volume, or use Nvidia- docker 11
  • 12. Future: FPGA Hardware Microservices Source: Microsoft Research Blog 12
  • 13. BrainWave Compiler & Runtime Source: Microsoft Research Blog 13
  • 14. Future: Neuromorphic Compute IBM True North: Brain Inspired Chip Neuromorphic memristors 14
  • 16. Resources  A Study of Complex Deep Learning Networks on High Performance, Neuromorphic, and Quantum Computers https://arxiv.org/pdf/1703.05364.pdf  Microsoft’s Project Brainwave: https://www.microsoft.com/en- us/research/blog/microsoft-unveils-project-brainwave/  Intel Spark BigDL: https://software.intel.com/en-us/articles/bigdl- distributed-deep-learning-on-apache-spark  Baidu’s Paddle-Paddle on Kubernetes: http://blog.kubernetes.io/2017/02/run-deep-learning-with- paddlepaddle-on-kubernetes.html  Kubernetes GPU Guide: https://github.com/Langhalsdino/Kubernetes- GPU-Guide  Tensorflow Quantization: https://www.tensorflow.org/performance/quantization 16