Umang Sharma
Deep Learning Data Scientist
Author
@umang_sha
Distributed Deep Learning
Training over Clusters:
Parallelizing Everything
Talk Agenda
• Hardware Requirements for Deep Learning
• CPUs vs GPUs for Deep Learning
• The Challenges in using GPUs
• Scaling it up: Using a Cluster of GPUs
• How Does TensorFlow Do It?
• Parameter Sharing
• The Solution: Introducing Horovod
• How Does It Work?
• Questions
Hardware Requirements for Deep Learning
• Neural networks rely primarily on intensive matrix multiplications.
• On top of that, deep learning requires a huge amount of training data.
• Huge data means more memory is required for computation.
• Every layer adds its own parameters (a fully connected layer with n inputs and m outputs has n × m weights plus m biases), so the parameter count grows quickly as networks get deeper and wider.
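To make the growth concrete, here is a minimal sketch that counts parameters and matrix-multiply work for fully connected layers. The layer sizes (784, 512, 10) and the batch size of 32 are hypothetical examples, and counting one multiply-add as 2 FLOPs is a common convention, not something from the slides.

```python
from typing import Sequence

def dense_layer_params(n_in: int, n_out: int) -> int:
    """Weights (an n_in x n_out matrix) plus one bias per output unit."""
    return n_in * n_out + n_out

def mlp_params(layer_sizes: Sequence[int]) -> int:
    """Total parameters of a fully connected network, e.g. [784, 512, 10]."""
    return sum(dense_layer_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))

def matmul_flops(batch: int, n_in: int, n_out: int) -> int:
    """Approximate FLOPs for one dense forward pass: (batch x n_in) @ (n_in x n_out),
    counting each multiply-add as 2 operations."""
    return 2 * batch * n_in * n_out

print(mlp_params([784, 512, 10]))       # one hidden layer
print(mlp_params([784, 512, 512, 10]))  # one more hidden layer of the same width
print(matmul_flops(32, 784, 512))       # first layer, batch of 32
```

Adding a single 512-unit hidden layer adds more than 260,000 parameters in this example, and every one of them takes part in a matrix multiplication on every training step.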
CPUs vs GPUs for Deep Learning
• Deep learning tasks must read large amounts of data from memory, so memory bandwidth becomes an important factor.
• Wait, what is memory bandwidth, though?
• Memory bandwidth is the rate at which a processor can read data from or store data into semiconductor memory.
• A CPU shares system RAM with everything else running on the machine. A standalone GPU, on the other hand, comes with its own dedicated VRAM, so the CPU's memory remains free for other tasks.
• GPUs offer much higher memory bandwidth than CPUs. Let's see!
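Theoretical peak bandwidth is roughly bytes per transfer times transfers per second, which makes the CPU/GPU gap easy to estimate. The sketch below uses two hypothetical configurations (a 256-bit GDDR6 bus at 14 GT/s and dual-channel DDR4-3200); the numbers are illustrative peaks, not measurements of any particular product.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfers_per_sec: float) -> float:
    """Theoretical peak bandwidth in GB/s: bytes per transfer x transfers per second."""
    return bus_width_bits / 8 * transfers_per_sec / 1e9

def read_time_ms(num_bytes: float, bandwidth_gb_s: float) -> float:
    """Lower bound on the time (ms) to stream num_bytes once at the given bandwidth."""
    return num_bytes / (bandwidth_gb_s * 1e9) * 1e3

# Hypothetical example configurations (assumed values, not benchmarks):
gpu_bw = peak_bandwidth_gb_s(256, 14e9)      # 256-bit GDDR6 bus at 14 GT/s -> 448 GB/s
cpu_bw = peak_bandwidth_gb_s(2 * 64, 3.2e9)  # two 64-bit DDR4-3200 channels -> 51.2 GB/s

one_gb = 1e9
print(f"GPU: {read_time_ms(one_gb, gpu_bw):.2f} ms to read 1 GB of weights")
print(f"CPU: {read_time_ms(one_gb, cpu_bw):.2f} ms to read 1 GB of weights")
```

Real workloads reach only part of these peaks, but the ratio between the two illustrates why bandwidth-hungry training favours the GPU.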
Memory Bandwidth Comparison: CPUs vs GPUs over Time
Another Advantage of GPUs over CPUs
• GPUs consist of many more cores than CPUs, so they can perform these calculations faster and more efficiently.
• GPUs parallelise these operations across their large number of small, simple cores.
• To paint a picture: imagine the CPU is a Ferrari and the GPU is a huge truck. The Ferrari, though fast, can transport only 2 people at a time; the truck can transport a large number of people at once, which is what we need here.
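The reason matrix multiplication parallelises so well is that every output row (indeed every output element) can be computed independently of the others. A minimal sketch, assuming plain Python lists as matrices: each row of the result is submitted as a separate task, the way a GPU spreads independent pieces of work across many cores. In CPython the threads will not actually speed up pure-Python arithmetic (because of the GIL); the point is the independence of the tasks, not performance.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

Matrix = List[List[float]]

def row_times_matrix(row: List[float], b: Matrix) -> List[float]:
    """One output row of A @ B: needs only one row of A plus all of B."""
    cols = len(b[0])
    return [sum(row[k] * b[k][j] for k in range(len(b))) for j in range(cols)]

def parallel_matmul(a: Matrix, b: Matrix, workers: int = 4) -> Matrix:
    """Compute every output row as an independent task; results keep row order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda row: row_times_matrix(row, b), a))

print(parallel_matmul([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]))
```

No task waits on another, which is exactly the kind of work a "truck" with thousands of slow seats handles well.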
CPUs vs GPUs Source: NVIDIA YouTube
The Challenge in using GPUs
• Utilising GPUs is tricky, because accessing them has traditionally required writing low-level code.
• CUDA is NVIDIA's parallel computing platform and programming model for its GPUs; NVIDIA's deep learning primitives library (cuDNN) is built on top of it.
• It is not just your own code that matters; it is the entire code path between your concept and the CUDA cores executing it on the GPU.
• But worry not! Things have improved a lot.
• Deep learning frameworks such as TensorFlow and PyTorch take your Python code and translate it into operations (in TensorFlow, a computation graph) that run as CUDA kernels on the GPU.
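A computation graph is just a set of operation nodes wired to their inputs, built first and executed later. The toy sketch below (the Node class and run function are hypothetical, not TensorFlow's API) builds y = relu(w * x + b) as a graph and then evaluates it by visiting each node's inputs first. In a real framework each node would be dispatched to a GPU kernel instead of a Python lambda.

```python
from typing import Callable, Dict, List, Optional

class Node:
    """A toy graph node: an operation plus the nodes it consumes."""
    def __init__(self, name: str, op: Optional[Callable[..., float]] = None,
                 inputs: Optional[List["Node"]] = None):
        self.name = name
        self.op = op                 # None marks a placeholder fed at run time
        self.inputs = inputs or []

def run(node: Node, feeds: Dict[str, float],
        cache: Optional[Dict[str, float]] = None) -> float:
    """Evaluate a node by first evaluating its inputs (depth-first), caching results."""
    cache = {} if cache is None else cache
    if node.name in cache:
        return cache[node.name]
    if node.op is None:
        value = feeds[node.name]
    else:
        value = node.op(*(run(i, feeds, cache) for i in node.inputs))
    cache[node.name] = value
    return value

# y = relu(w * x + b), built as a graph before any number is computed
x, w, b = Node("x"), Node("w"), Node("b")
mul = Node("mul", lambda p, q: p * q, [w, x])
add = Node("add", lambda p, q: p + q, [mul, b])
y = Node("relu", lambda v: max(0.0, v), [add])

print(run(y, {"x": 2.0, "w": 3.0, "b": -1.0}))
```

Separating "describe the computation" from "execute it" is what lets a framework decide where each operation runs.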
Scaling it up: Using a Cluster of GPUs
• So far, what we discussed applies to simple models.
• As models become more and more complex, a single GPU is no longer enough; we need multiple GPUs, i.e. a cluster of GPUs.
• Unfortunately, parallelising tasks across GPUs isn't as simple as on CPUs.
• Fortunately, TensorFlow has built-in support for distributing training across GPUs and machines, known as distributed TensorFlow (tf.train.ClusterSpec and tf.train.Server in TensorFlow 1.x; the tf.distribute API in later versions).
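Multi-machine TensorFlow jobs need every process to know the cluster layout and its own role. One common way is the TF_CONFIG environment variable, a JSON document with a "cluster" map of roles to host:port lists and a "task" entry identifying the current process. The sketch below builds such a value; the hostnames and port are hypothetical, and the helper make_tf_config is only for illustration.

```python
import json
import os
from typing import List

def make_tf_config(workers: List[str], ps: List[str],
                   task_type: str, task_index: int) -> str:
    """Serialise a TF_CONFIG value: shared cluster layout plus this process's task."""
    cluster = {"worker": list(workers)}
    if ps:
        cluster["ps"] = list(ps)  # parameter servers, used by the centralised approach
    return json.dumps({"cluster": cluster,
                       "task": {"type": task_type, "index": task_index}})

# Hypothetical hosts: every process gets the same cluster but its own task entry.
workers = ["worker0.example.com:2222", "worker1.example.com:2222"]
ps = ["ps0.example.com:2222"]
os.environ["TF_CONFIG"] = make_tf_config(workers, ps, "worker", 0)
print(os.environ["TF_CONFIG"])
```

Each process in the job would be launched with the same cluster section and a different task type/index.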
How TensorFlow Does it?
• There are two types of parallelism possible in deep learning training: data parallelism and model parallelism.
• Data parallelism is the most widely used, and it suits deep learning with huge amounts of data.
• The data is divided across multiple GPUs; each GPU runs its own copy of the model on its share of the data, and the training parameters are shared among them.
• In TensorFlow the parameters are shared through dedicated parameter servers, so this approach is called the centralised approach.
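Why does splitting the data work? For a loss that is a mean over examples, averaging the gradients computed on equal-sized shards gives exactly the full-batch gradient. A minimal sketch, assuming a one-parameter linear model y ≈ w·x with mean squared error and a hypothetical four-example batch split across two workers:

```python
from typing import List, Tuple

Batch = List[Tuple[float, float]]

def mse_grad(w: float, data: Batch) -> float:
    """Gradient of mean((w*x - y)^2) with respect to w over one shard."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)

def shard(data: Batch, num_workers: int) -> List[Batch]:
    """Split a batch into equal contiguous shards, one per GPU/worker."""
    if len(data) % num_workers != 0:
        raise ValueError("this sketch assumes equal-sized shards")
    size = len(data) // num_workers
    return [data[i * size:(i + 1) * size] for i in range(num_workers)]

batch = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]  # hypothetical data
w = 0.5

worker_grads = [mse_grad(w, s) for s in shard(batch, 2)]  # each worker: its own shard
averaged = sum(worker_grads) / len(worker_grads)          # the "sharing" step
print(averaged, mse_grad(w, batch))                       # same value (up to rounding)
```

The only thing the workers must exchange is the gradient; how that exchange is organised is what distinguishes the approaches that follow.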
Parameter Sharing
Centralised Approach
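In the centralised approach, a parameter server holds the authoritative copy of the weights: workers pull the current parameters, compute gradients on their data, and push the gradients back, and the server averages them and applies the update. A minimal single-process sketch of one synchronous step, with a hypothetical ParameterServer class and made-up gradient values standing in for real worker computations:

```python
from typing import List

class ParameterServer:
    """Holds the authoritative copy of the parameters (centralised approach)."""
    def __init__(self, params: List[float], lr: float):
        self.params = list(params)
        self.lr = lr
        self._pending: List[List[float]] = []

    def pull(self) -> List[float]:
        """Workers fetch the current parameters before each step."""
        return list(self.params)

    def push(self, grads: List[float]) -> None:
        """Workers send their locally computed gradients."""
        self._pending.append(list(grads))

    def apply(self) -> None:
        """Synchronous update: average all pushed gradients, take one SGD step."""
        if not self._pending:
            return
        n = len(self._pending)
        avg = [sum(g[i] for g in self._pending) / n for i in range(len(self.params))]
        self.params = [p - self.lr * a for p, a in zip(self.params, avg)]
        self._pending.clear()

server = ParameterServer([1.0, 2.0], lr=0.1)
for worker_grads in ([0.2, 0.4], [0.6, 0.0]):  # hypothetical gradients from 2 workers
    _ = server.pull()                          # each worker reads the same parameters
    server.push(worker_grads)
server.apply()
print(server.pull())
```

Every worker talks to the server on every step, so all traffic converges on it; that is the root of the challenges listed next.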
Challenges in this approach
• But this approach comes with its own challenges 😕
Challenges in Centralised Approach
• It is hard to decide the right ratio of parameter servers to workers.
• If multiple parameter servers are used, the communication pattern becomes "all-to-all", which may saturate network interconnects.
• Another key challenge is that, as we implement this, the TensorFlow code becomes more and more complex.
• The data scientist needs to add more parameter-server and worker-specific code.
• Let's look at the performance now.
Scaling with standard distributed TensorFlow (Source: Uber Engineering blog)
Note: "Ideal" is computed by multiplying the single-GPU rate by the number of GPUs.
The Solution?
• Introducing Uber's Horovod
How Does Horovod Do It?
• Horovod uses a different algorithm, ring-allreduce, which Baidu brought from high-performance computing to deep learning.
• The algorithm works in a completely different way from the centralised approach.
• It is a decentralised approach instead.
• It is faster and makes better use of network bandwidth than the parameter-server approach.
• But how does it work? Let's see.
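A quick way to see the bandwidth difference is to count bytes per training step. With one parameter server and N workers exchanging a model of K bytes, the server must receive N·K bytes of gradients and send N·K bytes of parameters. In ring-allreduce each worker sends 2(N−1)/N·K bytes, which stays below 2K no matter how many workers join. The sketch assumes a single parameter server and a hypothetical 100 MB model.

```python
def ps_server_traffic(num_workers: int, model_bytes: float) -> float:
    """Bytes a single parameter server receives plus sends per synchronous step."""
    return 2 * num_workers * model_bytes

def ring_allreduce_per_worker(num_workers: int, model_bytes: float) -> float:
    """Bytes each worker sends per step in ring-allreduce: 2 (N-1)/N * K."""
    return 2 * (num_workers - 1) / num_workers * model_bytes

model_mb = 100.0  # hypothetical model size in MB
for n in (8, 64):
    print(f"N={n}: parameter server moves {ps_server_traffic(n, model_mb):.0f} MB, "
          f"each ring worker sends {ring_allreduce_per_worker(n, model_mb):.1f} MB")
```

The parameter server's load grows linearly with N, while the per-worker ring traffic is essentially constant; this is the sense in which ring-allreduce is bandwidth-optimal.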
Decentralised Approach
Ring-allreduce algorithm (Source: Baidu)
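The ring-allreduce shown in the figure runs in two phases over N workers arranged in a ring, with each buffer split into N chunks. In the scatter-reduce phase, each worker repeatedly sends one chunk to its right-hand neighbour, which adds it to its own copy; after N−1 steps every worker owns one fully summed chunk. In the allgather phase, the finished chunks travel around the ring for another N−1 steps until everyone has every chunk. The sketch below simulates this in a single process; lists stand in for GPU buffers and the `sends` lists stand in for network messages (all names are hypothetical, not Horovod's internals).

```python
from typing import List

def ring_allreduce(buffers: List[List[float]]) -> List[List[float]]:
    """Simulate ring-allreduce (sum) across len(buffers) workers."""
    n = len(buffers)
    length = len(buffers[0])
    # Split [0, length) into n contiguous, nearly equal chunks.
    bounds = [(c * length // n, (c + 1) * length // n) for c in range(n)]
    data = [list(b) for b in buffers]  # each worker's local copy

    # Phase 1: scatter-reduce. All sends in a step happen "simultaneously",
    # so payloads are copied before any worker updates its buffer.
    for step in range(n - 1):
        sends = []
        for i in range(n):
            c = (i - step) % n
            lo, hi = bounds[c]
            sends.append((c, data[i][lo:hi]))
        for i, (c, payload) in enumerate(sends):
            dst, (lo, _) = (i + 1) % n, bounds[c]
            for k, v in enumerate(payload):
                data[dst][lo + k] += v
    # Now worker i holds the fully reduced chunk (i + 1) % n.

    # Phase 2: allgather. Pass completed chunks around the ring, overwriting.
    for step in range(n - 1):
        sends = []
        for i in range(n):
            c = (i + 1 - step) % n
            lo, hi = bounds[c]
            sends.append((c, data[i][lo:hi]))
        for i, (c, payload) in enumerate(sends):
            lo, hi = bounds[c]
            data[(i + 1) % n][lo:hi] = payload
    return data

workers = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
           [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
           [100.0, 200.0, 300.0, 400.0, 500.0, 600.0]]
print(ring_allreduce(workers)[0])  # every worker ends with the element-wise sum
```

To average gradients, each worker simply divides the result by N. Every worker only ever talks to its two neighbours, which is why no parameter server is needed.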
Why Does It Work?
• There is no parameter server, which means less communication overhead.
• The algorithm is bandwidth-optimal: if the buffer is large enough, it makes optimal use of the available network.
• The allreduce approach is much easier to understand and adopt.
• All the user needs to do is modify their program to average gradients using an allreduce() operation.
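The "only change" the slide describes is to insert a gradient average between computing gradients and applying them. A minimal single-process sketch of that idea, with hypothetical helpers: allreduce_mean is a naive stand-in for a real allreduce (ring-allreduce would produce the same result without a central node), and sgd_step is the ordinary single-GPU update the user already has.

```python
from typing import List

Vector = List[float]

def allreduce_mean(per_worker: List[Vector]) -> List[Vector]:
    """Naive stand-in for allreduce: every worker receives the element-wise mean."""
    n = len(per_worker)
    mean = [sum(vals) / n for vals in zip(*per_worker)]
    return [list(mean) for _ in range(n)]

def sgd_step(params: Vector, grads: Vector, lr: float) -> Vector:
    """The unchanged local optimizer update."""
    return [p - lr * g for p, g in zip(params, grads)]

def data_parallel_step(params_per_worker: List[Vector],
                       grads_per_worker: List[Vector], lr: float) -> List[Vector]:
    """The one change: average gradients with allreduce before the usual update."""
    averaged = allreduce_mean(grads_per_worker)
    return [sgd_step(p, g, lr) for p, g in zip(params_per_worker, averaged)]

# Two workers start from identical parameters but see different gradients.
new = data_parallel_step([[1.0, 1.0], [1.0, 1.0]], [[0.2, 0.0], [0.0, 0.4]], lr=0.5)
print(new)  # both replicas receive the same update, so they stay in sync
```

Because every replica applies the same averaged gradient, the model copies never drift apart, which is why the wrapped optimizer can stay otherwise unmodified.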
Implementing Horovod in Your Code
• Implementation is fairly simple, since Horovod is packaged as a Python package.
• Step 1
hvd.init() initializes Horovod.
• Step 2
config.gpu_options.visible_device_list = str(hvd.local_rank()) assigns a GPU to each of the TensorFlow processes.
• Step 3
opt = hvd.DistributedOptimizer(opt) wraps any regular TensorFlow optimizer with the Horovod optimizer, which takes care of averaging gradients using ring-allreduce.
• Step 4
hvd.BroadcastGlobalVariablesHook(0) broadcasts variables from the first process to all other processes to ensure consistent initialization. If the program does not use …
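Step 2 relies on the distinction between a process's global rank (unique across the whole job) and its local rank (unique within one machine), which is why local_rank() is the value used to pick a GPU. A minimal sketch of that bookkeeping, assuming processes are packed host by host and each host has one GPU per process slot; the function assign_ranks and the host names are hypothetical, not Horovod's own code.

```python
from typing import Dict, List, Tuple

def assign_ranks(hosts: List[Tuple[str, int]]) -> List[Dict[str, object]]:
    """Give each process a global rank, a local rank and one visible GPU,
    filling hosts in order (assumed layout: hosts listed as (name, slots))."""
    procs: List[Dict[str, object]] = []
    rank = 0
    for host, slots in hosts:
        for local_rank in range(slots):
            procs.append({
                "host": host,
                "rank": rank,                 # unique across the whole job
                "local_rank": local_rank,     # unique only within this host
                # Step 2: each process sees just the GPU matching its local rank
                "visible_device_list": str(local_rank),
            })
            rank += 1
    return procs

for p in assign_ranks([("server1", 4), ("server2", 4)]):
    print(p)
```

Under this layout, 8 processes on 2 four-GPU machines each get exactly one GPU, and the process with global rank 0 is the one whose variables are broadcast in Step 4.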
But Hey! Does All This Work?
Questions?
My Contact information
Feel free to contact me with any questions
• Twitter: @umang_sha
• LinkedIn: umangsharma-datascience