3. Problem
- Duration & Memory Allocation
A large batch size exhausts GPU memory.
Out-of-memory error from PyTorch -> the Python kernel dies.
Can’t set a large batch size.
Can only afford batch_size = 5, num_workers = 2 (sketched at the end of this section).
Can’t divide the work across the other GPUs.
Elapsed Time : 25m 44s (10 epochs)
Reached 99% accuracy on the training set in 9 epochs.
Training takes too much time.
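A rough sketch of this single-GPU baseline (the dataset and model below are hypothetical stand-ins, not the ones from the experiment):

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, TensorDataset

    device = torch.device("cuda:0")  # everything runs on one GPU

    # Hypothetical stand-in data and model
    dataset = TensorDataset(torch.randn(1000, 10), torch.randint(0, 2, (1000,)))
    model = nn.Linear(10, 2).to(device)

    # The largest configuration that still fits in memory
    loader = DataLoader(dataset, batch_size=5, num_workers=2, shuffle=True)

    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()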
4. Data Parallelism in PyTorch
Implemented using torch.nn.DataParallel()
Can be used to wrap a module or model (see the sketch after the primitives list below).
Also supports primitives (torch.nn.parallel.*):
Replicate : replicate the model on multiple devices (GPUs).
Scatter : distribute the input along the first dimension.
Gather : gather and concatenate inputs along the first dimension.
Apply-Parallel : apply a set of already-distributed inputs to a set of already-distributed models.
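A minimal sketch of the wrapping step (the Net module here is a hypothetical stand-in):

    import torch
    import torch.nn as nn

    class Net(nn.Module):  # hypothetical stand-in model
        def __init__(self):
            super().__init__()
            self.fc = nn.Linear(10, 2)

        def forward(self, x):
            return self.fc(x)

    model = Net()
    if torch.cuda.device_count() > 1:
        model = nn.DataParallel(model)  # replicate across all visible GPUs
    model = model.cuda()

    x = torch.randn(128, 10).cuda()
    out = model(x)  # input scattered along dim 0; outputs gathered on the default GPU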
PyTorch Tutorials – Multi-GPU examples
https://pytorch.org/tutorials/beginner/former_torchies/parallelism_tutorial.html
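The tutorial linked above composes the same behavior from these primitives by hand; roughly:

    import torch.nn as nn

    def data_parallel(module, input, device_ids, output_device=None):
        if output_device is None:
            output_device = device_ids[0]
        replicas = nn.parallel.replicate(module, device_ids)    # Replicate
        inputs = nn.parallel.scatter(input, device_ids)         # Scatter along dim 0
        replicas = replicas[:len(inputs)]
        outputs = nn.parallel.parallel_apply(replicas, inputs)  # Apply-Parallel
        return nn.parallel.gather(outputs, output_device)       # Gather on one device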
5. After Parallelism
- GPU Utilization
Hyperparameters
Batch Size : 128
Number of Workers : 16
High utilization.
Can use a large memory space.
All GPUs are allocated (a quick check is sketched below).
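One quick way to confirm allocation on every GPU (a sketch; note that torch.cuda.memory_allocated only counts memory held by PyTorch tensors):

    import torch

    for i in range(torch.cuda.device_count()):
        mib = torch.cuda.memory_allocated(i) / 1024 ** 2
        print(f"cuda:{i}: {mib:.0f} MiB allocated")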
6. After Parallelism
- Training Performance
Hyperparameters
Batch Size : 128
A large batch size needs more memory space.
Number of Workers : 16
Recommended to set num_workers = 4 * NUM_GPUs – from the forum (sketched below).
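Assuming the machine has 4 GPUs, the heuristic yields the 16 workers used above; as a sketch:

    import torch

    # Forum heuristic: four DataLoader workers per GPU
    num_workers = 4 * torch.cuda.device_count()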
Elapsed Time : 7m 50s (10 epochs)
Reached 99% accuracy on the training set in 4 epochs.
That took just 3m 10s.