Distributed training
and frameworks
AI4Media - WP3 Workshop, 2021-05-04
Hannes Fassold, JOANNEUM RESEARCH - DIGITAL
Introduction & Motivation
• Motivation
  • State-of-the-art DL models keep getting bigger, as do the datasets on which they are trained
    • GPT-3 model (SoA text processing / NLP model)
      • model size is 175 billion parameters
      • training corpus of about 500 billion tokens
  • So training time keeps going up ...
• A single training run is usually not sufficient
  • Hyperparameter tuning is usually done
    • Adapt learning rate, momentum, ...
  • Might want to experiment with the network architecture
    • Neural architecture search
Definition
• In distributed training, the workload to train a model is split up and shared among multiple processors (workers / nodes)
  • Can be a "cluster" of a few workers or up to several hundred
  • Usually, each worker is equipped with 2-8 GPUs
• Optimal case is linear scaling
  • Training time is inversely proportional to the number of workers
  • Usually not achieved due to adverse effects in larger clusters
    • Serial parts (which cannot be parallelized) become more prominent
    • Communication cost (between workers) may rise disproportionately
• Distributed training allows training "from scratch" on a huge dataset in minutes
  • E.g. an image classification model can be trained in 1.5 minutes on the ImageNet dataset, employing 512 GPUs
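
Why linear scaling is rarely reached can be illustrated with Amdahl's law, which bounds the speedup by the serial fraction of the work. The sketch below is a simplified model only: it ignores communication cost entirely, and the serial fraction of 5% is an illustrative assumption, not a measured value.

```python
def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Ideal speedup under Amdahl's law (communication cost is ignored)."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


def scaling_efficiency(serial_fraction: float, workers: int) -> float:
    """Fraction of linear scaling that is actually achieved (1.0 = linear)."""
    return amdahl_speedup(serial_fraction, workers) / workers


if __name__ == "__main__":
    # 0.05 is an assumed serial fraction, chosen only for illustration.
    for n in (1, 8, 64, 512):
        speedup = amdahl_speedup(0.05, n)
        efficiency = scaling_efficiency(0.05, n)
        print(f"{n:4d} workers: speedup {speedup:6.2f}, efficiency {efficiency:.2f}")
```

With any non-zero serial fraction the speedup saturates at 1 / serial_fraction, so adding workers beyond a certain point mostly lowers the efficiency.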
Data parallelism versus Model parallelism
• Data parallelism
  • Training data is split into chunks
  • Each worker processes a chunk and updates the model
• Advantages
  • Can be applied to any model
• Disadvantages
  • Each worker must have enough (GPU) memory to hold the whole model
  • Updated model must be communicated regularly to all workers
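
The data-parallel scheme can be sketched on a toy problem. The example below is a minimal illustration, assuming a one-dimensional linear model y = w*x + b, equally sized chunks, and synchronous gradient averaging; all function names are hypothetical, and the "workers" are simulated by a loop rather than run in parallel.

```python
def gradient(w, b, chunk):
    """Gradient of the mean squared error of y = w*x + b over one chunk."""
    n = len(chunk)
    gw = sum(2 * (w * x + b - y) * x for x, y in chunk) / n
    gb = sum(2 * (w * x + b - y) for x, y in chunk) / n
    return gw, gb


def split(data, num_workers):
    """Split the data into equally sized chunks (assumes an even division)."""
    size = len(data) // num_workers
    return [data[i * size:(i + 1) * size] for i in range(num_workers)]


def data_parallel_step(w, b, chunks, lr):
    """One synchronous step: each worker computes a gradient on its chunk,
    the gradients are averaged, and every model replica applies the same update."""
    grads = [gradient(w, b, chunk) for chunk in chunks]  # would run in parallel
    gw = sum(g[0] for g in grads) / len(grads)
    gb = sum(g[1] for g in grads) / len(grads)
    return w - lr * gw, b - lr * gb


def train(data, num_workers, steps=2000, lr=0.05):
    """Train from w = b = 0 with a fixed assignment of chunks to workers."""
    w, b = 0.0, 0.0
    chunks = split(data, num_workers)
    for _ in range(steps):
        w, b = data_parallel_step(w, b, chunks, lr)
    return w, b


if __name__ == "__main__":
    data = [(x / 4, 3.0 * (x / 4) + 1.0) for x in range(8)]  # samples of y = 3x + 1
    print(train(data, num_workers=4))
```

Because the chunks have equal size, the averaged gradient equals the gradient over the whole batch, so the result does not depend on the number of workers. Note that every simulated worker needs the full parameter set (w, b), which mirrors the memory disadvantage listed above.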
Data parallelism versus Model parallelism
• Model parallelism
  • Model is split into several parts
  • Each worker processes its respective model part
• Advantages
  • Support for large models which do not fit in GPU memory (e.g. NLP models)
• Disadvantages
  • One has to find an efficient split of the model, which depends on the model structure and the number of workers
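
Finding a split can be illustrated for the simplest case, a purely sequential model. The sketch below assumes each layer has a known cost (for example its parameter count or runtime; the numbers used here are made up) and cuts the layer sequence into contiguous stages so that the most heavily loaded worker is as light as possible. It is one possible heuristic for illustration, not the method of any particular framework.

```python
from functools import lru_cache


def split_layers(costs, num_workers):
    """Split layers into num_workers contiguous stages, minimizing the largest stage cost.

    Returns (stages, bottleneck), where stages is a list of layer-index lists.
    """
    n = len(costs)
    if not 1 <= num_workers <= n:
        raise ValueError("need between 1 and len(costs) workers")
    prefix = [0]
    for c in costs:
        prefix.append(prefix[-1] + c)

    @lru_cache(maxsize=None)
    def best(i, k):
        # Best (bottleneck, cut points) for layers i..n-1 using k stages.
        if k == 1:
            return prefix[n] - prefix[i], ()
        result = None
        for j in range(i + 1, n - k + 2):  # first stage covers layers i..j-1
            rest, cuts = best(j, k - 1)
            bottleneck = max(prefix[j] - prefix[i], rest)
            if result is None or bottleneck < result[0]:
                result = (bottleneck, (j,) + cuts)
        return result

    bottleneck, cuts = best(0, num_workers)
    bounds = (0,) + cuts + (n,)
    stages = [list(range(bounds[s], bounds[s + 1])) for s in range(num_workers)]
    return stages, bottleneck


def forward(layers, stages, x):
    """Run the model stage by stage; each stage would live on its own worker."""
    for stage in stages:
        for idx in stage:
            x = layers[idx](x)
        # here the activation x would be sent to the worker holding the next stage
    return x


if __name__ == "__main__":
    stages, bottleneck = split_layers([4, 2, 2, 4, 4], num_workers=3)
    print(stages, bottleneck)
```

Models with branches or skip connections do not reduce to a simple sequence, which is one reason why the split depends on the model structure.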
System architecture
• System architecture describes how the model parameter updates of the different workers are performed
• Centralized system architecture
  • Workers periodically report their model updates to one (or more) parameter servers
• Decentralized system architecture
  • Workers exchange the model updates directly via an allreduce operation
  • Topology of the allreduce operation is critical
    • Fully connected => communication cost O(n^2)!
    • Usually uses high-performance topologies like ring, tree, butterfly etc.
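
The ring topology can be made concrete with a small simulation. The sketch below is a single-process illustration of a ring allreduce (a reduce-scatter phase followed by an all-gather phase), assuming the vector length is a multiple of the number of workers; real implementations run the sends concurrently over the network, and the function name is hypothetical.

```python
def ring_allreduce(vectors):
    """Sum equal-length vectors across workers with a simulated ring allreduce.

    vectors[i] is the local vector of worker i. Returns (results, sent), where
    results[i] is the summed vector held by worker i afterwards and sent is the
    number of values each worker transmitted.
    """
    n = len(vectors)
    length = len(vectors[0])
    if any(len(v) != length for v in vectors) or length % n != 0:
        raise ValueError("vectors must have equal length, divisible by worker count")
    size = length // n
    data = [list(v) for v in vectors]
    sent = 0

    def chunk(c):
        return slice(c * size, (c + 1) * size)

    # Phase 1, reduce-scatter: after n-1 steps worker i holds the full sum
    # of chunk (i + 1) % n.
    for step in range(n - 1):
        # Build all messages first, because every worker sends simultaneously.
        messages = [(i, (i - step) % n, data[i][chunk((i - step) % n)])
                    for i in range(n)]
        for sender, c, payload in messages:
            receiver = (sender + 1) % n
            local = data[receiver][chunk(c)]
            data[receiver][chunk(c)] = [a + b for a, b in zip(local, payload)]
        sent += size

    # Phase 2, all-gather: the completed chunks travel once around the ring.
    for step in range(n - 1):
        messages = [(i, (i + 1 - step) % n, data[i][chunk((i + 1 - step) % n)])
                    for i in range(n)]
        for sender, c, payload in messages:
            data[(sender + 1) % n][chunk(c)] = payload
        sent += size

    return data, sent


if __name__ == "__main__":
    results, sent = ring_allreduce([[1, 2, 3, 4], [10, 20, 30, 40]])
    print(results, sent)
```

Each worker sends 2 * (n - 1) chunks of size length / n, so the volume per worker stays below twice the vector length regardless of the number of workers. A fully connected exchange, in contrast, requires every worker to talk to every other one.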
Synchronization strategies
• Different strategies to synchronize the model parameters between all workers
• Synchronous
  • Model parameters are synchronized after each iteration (mini-batch)
  • Prone to the straggler problem (slowest worker delays all workers)
• Bounded asynchronous
  • Workers may train on model parameters which are 'a few iterations' old
• Asynchronous (e.g. Hogwild algorithm)
  • Workers update their model completely independently of the others
  • Difficult to reason about model convergence
  • Lost update problem: new parameters written by worker A could be overwritten by worker B
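
The difference between the strategies can be shown with a deterministic toy example. The sketch below assumes a single scalar parameter and two workers, replays one unlucky interleaving by hand instead of using real threads, and expresses the bounded-asynchronous rule as a simple staleness check. All names and values are illustrative assumptions.

```python
def lost_update(w, grad_a, grad_b, lr):
    """Asynchronous read-modify-write where B overwrites the update of A."""
    read_a = w                  # worker A reads the shared parameter
    read_b = w                  # worker B reads the same value
    w = read_a - lr * grad_a    # A writes its new parameter
    w = read_b - lr * grad_b    # B overwrites it, based on its stale read
    return w


def synchronous_update(w, grad_a, grad_b, lr):
    """All workers wait at a barrier; one averaged update is applied."""
    return w - lr * (grad_a + grad_b) / 2


def may_proceed(completed, worker, max_staleness):
    """Bounded asynchrony: a worker may start its next iteration only if it is
    at most max_staleness iterations ahead of the slowest worker."""
    return completed[worker] - min(completed) <= max_staleness


if __name__ == "__main__":
    print(lost_update(4.0, 1.0, 2.0, lr=0.5))         # contribution of A is gone
    print(synchronous_update(4.0, 1.0, 2.0, lr=0.5))  # both gradients are used
    print(may_proceed([3, 1], worker=0, max_staleness=1))
```

Setting max_staleness to 0 reproduces the synchronous behavior (and its straggler problem), while an unbounded value corresponds to fully asynchronous training.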
Distributed training frameworks
• Main DL frameworks (PyTorch, TensorFlow, MXNet)
  • Mainly provide support for a single node (but using multiple GPUs)
• Horovod (Uber)
  • PyTorch, TensorFlow, Keras, MXNet
  • Data parallelism and limited model parallelism
• FairScale (Facebook)
  • PyTorch
  • Data parallelism and limited model/pipeline parallelism
• DeepSpeed (Microsoft)
  • PyTorch
  • Data parallelism and model/pipeline parallelism
  • Gradient compression (1-bit Adam/LAMB), ...
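
The idea behind 1-bit gradient compression can be sketched in a few lines. The example below illustrates the general technique of sign compression with error feedback and is explicitly not DeepSpeed's implementation or API: the function names are hypothetical, and the single shared scale (the mean absolute value) is an assumption chosen for simplicity.

```python
def compress_1bit(values, error):
    """Compress values to one sign per entry plus a single shared scale.

    error is the residual left over from the previous round (error feedback);
    it is added back before compressing so that nothing is dropped for good.
    Returns (signs, scale, new_error).
    """
    corrected = [v + e for v, e in zip(values, error)]
    scale = sum(abs(c) for c in corrected) / len(corrected)
    signs = [1 if c >= 0 else -1 for c in corrected]
    new_error = [c - s * scale for c, s in zip(corrected, signs)]
    return signs, scale, new_error


def decompress_1bit(signs, scale):
    """Rebuild the approximate values on the receiving worker."""
    return [s * scale for s in signs]


if __name__ == "__main__":
    error = [0.0, 0.0, 0.0]
    signs, scale, error = compress_1bit([0.5, -1.5, 1.0], error)
    print(decompress_1bit(signs, scale), error)
```

Only the signs and one scale value have to be communicated per round, which reduces the traffic between workers at the price of an approximation that the error feedback corrects over subsequent rounds.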
AI4Media WP3 workshop - Distributed training introduction

Editor's Notes

  1. GPT-3 info => https://lambdalabs.com/blog/demystifying-gpt-3/ and https://scilogs.spektrum.de/hlf/an-ai-walks-into-a-bar-and-it-writes-an-awesome-story/
  2. AlexNet in 1.5 minutes: see https://arxiv.org/pdf/1902.06855.pdf
  3. Info and figures from https://arxiv.org/pdf/1903.11314.pdf
  4. Info and figures from https://arxiv.org/pdf/1903.11314.pdf