3. Problem
- Duration & Memory Allocation
A large batch size exhausts GPU memory.
Out-of-memory error from PyTorch -> the Python kernel dies.
Can’t set a large batch size.
Can only afford batch_size = 5, num_workers = 2 (sketched at the end of this section).
Can’t divide the work across the other GPUs.
Elapsed Time : 25m 44s (10 epochs)
Reached 99% accuracy on the training set in 9 epochs.
Training takes too much time.
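A rough sketch of this single-GPU baseline (the dataset and model below are hypothetical stand-ins, not the ones from the experiment):

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, TensorDataset

    device = torch.device("cuda:0")  # everything runs on one GPU

    # Hypothetical stand-in data and model
    dataset = TensorDataset(torch.randn(1000, 10), torch.randint(0, 2, (1000,)))
    model = nn.Linear(10, 2).to(device)

    # The largest configuration that still fits in memory
    loader = DataLoader(dataset, batch_size=5, num_workers=2, shuffle=True)

    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()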
4. Data Parallelism in PyTorch
Implemented using torch.nn.DataParallel()
Can be used to wrap a module or model (see the sketch after the primitives list below).
Also supports primitives (torch.nn.parallel.*):
Replicate : replicate the model on multiple devices (GPUs).
Scatter : distribute the input along the first dimension.
Gather : gather and concatenate inputs along the first dimension.
Apply-Parallel : apply a set of already-distributed inputs to a set of already-distributed models.
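A minimal sketch of the wrapping step (the Net module here is a hypothetical stand-in):

    import torch
    import torch.nn as nn

    class Net(nn.Module):  # hypothetical stand-in model
        def __init__(self):
            super().__init__()
            self.fc = nn.Linear(10, 2)

        def forward(self, x):
            return self.fc(x)

    model = Net()
    if torch.cuda.device_count() > 1:
        model = nn.DataParallel(model)  # replicate across all visible GPUs
    model = model.cuda()

    x = torch.randn(128, 10).cuda()
    out = model(x)  # input scattered along dim 0; outputs gathered on the default GPU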
PyTorch Tutorials – Multi-GPU examples
https://pytorch.org/tutorials/beginner/former_torchies/parallelism_tutorial.html
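The tutorial linked above composes the same behavior from these primitives by hand; roughly:

    import torch.nn as nn

    def data_parallel(module, input, device_ids, output_device=None):
        if output_device is None:
            output_device = device_ids[0]
        replicas = nn.parallel.replicate(module, device_ids)    # Replicate
        inputs = nn.parallel.scatter(input, device_ids)         # Scatter along dim 0
        replicas = replicas[:len(inputs)]
        outputs = nn.parallel.parallel_apply(replicas, inputs)  # Apply-Parallel
        return nn.parallel.gather(outputs, output_device)       # Gather on one device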
5. After Parallelism
- GPU Utilization
Hyperparameters
Batch Size : 128
Number of Workers : 16
High utilization.
Can use a large memory space.
All GPUs are allocated (a quick check is sketched below).
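One quick way to confirm allocation on every GPU (a sketch; note that torch.cuda.memory_allocated only counts memory held by PyTorch tensors):

    import torch

    for i in range(torch.cuda.device_count()):
        mib = torch.cuda.memory_allocated(i) / 1024 ** 2
        print(f"cuda:{i}: {mib:.0f} MiB allocated")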
6. After Parallelism
- Training Performance
Hyperparameters
Batch Size : 128
A large batch size needs more memory space.
Number of Workers : 16
Recommended to set num_workers = 4 * NUM_GPUs – from the forum (sketched below).
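Assuming the machine has 4 GPUs, the heuristic yields the 16 workers used above; as a sketch:

    import torch

    # Forum heuristic: four DataLoader workers per GPU
    num_workers = 4 * torch.cuda.device_count()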
Elapsed Time : 7m 50s (10 epochs)
Reached 99% accuracy on the training set in 4 epochs.
That took just 3m 10s.