MIG Technical Overview
Adam Tetelman, Solutions Architect
An overview of the MIG feature, configuration caveats, and details around additional features, isolation, communication, value-add, and use-cases.
Why - MIG on the DGX A100
Multi-Instance GPU - Major MIG benefits and goals
● The goal of the MIG feature is to increase GPU utilization. This is done by partitioning a single GPU into multiple fully isolated devices, each efficiently sized for its use-case, specifically smaller use-cases that only require a subset of GPU resources.
● Benefits of MIG on the NVIDIA A100:
○ Physical allocation of resources used by parallel GPU workloads
→ Secure multi-tenant environments with isolation and predictable QoS
○ Versatile profiles with dynamic configuration (152 configurations on a DGX!)
→ Maximized utilization by configuring NVIDIA A100 for specific workloads
○ CUDA Programming model unchanged
→ No code changes required.
When - MIG on the DGX A100
Workloads that don’t utilize the full A100 GPU: HPC, prototyping, inference, and light training
How - MIG on the DGX A100
Optimize GPU Utilization, Expand Access to More Users with Guaranteed Quality of Service
Up To 7 GPU Instances In a Single A100:
Dedicated SM, Memory, L2 cache, Bandwidth for
hardware QoS & isolation
Simultaneous Workload Execution With
Guaranteed Quality Of Service:
All MIG instances run in parallel with predictable
throughput & latency
Right Sized GPU Allocation:
Different sized MIG instances based on target
workloads
Diverse Deployment Environments:
Supported on bare metal, Docker, Kubernetes, and
virtualized environments
MIG User Guide: https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html
MIG Terminology - Instances
A MIG device is made up of a GPU Instance and a Compute Instance (see the sketch after these definitions)
● Multi-Instance GPU (MIG) - The MIG feature allows one or more GPU Instances to be allocated within a
GPU. Making a single GPU appear as if it were many.
● GPU Instance (GI) - A fully isolated collection of physical GPU resources. Contains one or more Compute Instances.
● Compute Instance (CI) - An isolated collection of GPU SMs (CUDA cores) that belongs to a single GPU Instance. Shares GPU memory with other CIs in that GPU Instance.
● Instance Profile - A GPU Instance Profile (GIP) or GPU Compute Instance Profile (CIP) defines the
configuration and available resources in an Instance.
● MIG Device - A MIG Device is the actual “GPU” an application sees: the combination of a GPU, GPU Instance, and Compute Instance.
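To make these terms concrete, here is a hedged sketch of how MIG devices are enumerated on a host; the UUIDs and device list below are illustrative placeholders, not output captured from a real system:

List all GPUs and MIG devices:
# nvidia-smi -L
GPU 0: A100-SXM4-40GB (UUID: GPU-5c89852c-xxxx)
  MIG 3g.20gb Device 0: (UUID: MIG-GPU-5c89852c-xxxx/1/0)
  MIG 3g.20gb Device 1: (UUID: MIG-GPU-5c89852c-xxxx/2/0)

Each MIG device UUID encodes the parent GPU, the GPU Instance ID, and the Compute Instance ID, mirroring the GPU/GI/CI hierarchy above.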
GPU Instances vs. Compute Instances
Full isolation vs. shared resources
One GPU Instance per user; one Compute Instance per process
GPU Instances in MIG
Applications in a single GPU Instance get isolation and guaranteed QoS
Each GPU Instance has physical partitions of GPU
memory, GPU SMs, and all other GPU hardware. This
provides each GPU Instance, and any VMs running on it,
with guaranteed QoS through resource isolation.
Each GPU compute Instance within the GPU instance
provides partial isolation. This allows isolated compute
resources and independent workload scheduling.
Clock speed, MIG profile configuration, and other
settings should be optimized based on the expected MIG
use-cases. There is no “best” or “optimal” combination
of profiles and configurations.
The creation of GPU instances requires root privileges.
Compute Instances on MIG
Processes in a Compute Instance get isolation & flexibility at the expense of QoS
Each GPU compute instance within the GPU instance
provides functional isolation. This allows isolated
compute resources and independent workload
scheduling.
Processes in a CI (such as those in a container) are
executed as applications against Compute Instances.
A process can get full isolation by being the only CUDA
application running on a specific GPU Instance.
If multiple Compute Instances are running on the same
GPU instance, the processes on them do not get the
same level of isolation, but IPC between them is
available.
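As a hedged sketch of carving Compute Instances out of one GPU Instance (the GPU Instance ID and Compute Instance profile IDs below are illustrative; list the real ones first rather than assuming these values):

List the Compute Instance profiles available on GPU Instance 1:
# nvidia-smi mig -lcip -gi 1
Create three single-slice Compute Instances inside it (0 here stands in for the 1c profile ID reported by -lcip):
# nvidia-smi mig -cci 0,0,0 -gi 1

Processes launched against the three resulting MIG devices get independently scheduled SMs but share the GPU Instance’s memory.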
MIG Terminology - Slices
Memory slices, SM slices, and GPU engines
● GPU Slice - A GPU slice is the smallest fraction of the GPU that combines a single GPU memory slice and
a single GPU SM slice.
● GPU Memory Slice - A GPU memory slice is the smallest fraction of the GPU’s memory, including the
corresponding memory controllers and cache. A GPU memory slice is roughly 1/8 of the total GPU
memory resources, including both capacity and bandwidth.
● GPU SM Slice - A GPU SM slice is the smallest fraction of the SMs, roughly 1/7 of the total number of SMs.
● GPU Engine - A GPU Engine is what executes work on the GPU. Different engines are responsible for different actions, such as the Compute engine or the Copy engine. Engines are scheduled independently and work in a GPU context.
Configuration options with MIG
Configuring MIG on the NVIDIA A100
Flexible configurations
► Driver presents available profiles
► 18 combinations possible
Constraints
► Graphics contexts not supported
► P2P not available (e.g. no NVLink)
► For running VMs within each instance, vGPU is required
Setup (sketched below)
► Workflow to enable MIG requires a GPU reset
► Setting is persistent across reboots (stored in InfoROM)
► Reconfiguration is dynamic and can be done from a container
[Figure: MIG instance combinations on an A100-SXM4-40GB]
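As a brief sketch of the setup workflow on GPU 0 (assuming no processes currently hold the GPU):

Enable MIG mode; the setting persists across reboots:
# nvidia-smi -i 0 -mig 1
Reset the GPU so the mode change takes effect:
# nvidia-smi --gpu-reset -i 0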
GPU Instance Profiles
All the options for a single A100-SXM4-40GB GPU

MIG Instance   Instances per GPU   SMs   Memory   Engines
1g.5gb         7                   14    5 GB     0 NVDECs
2g.10gb        3                   28    10 GB    1 NVDEC
3g.20gb        2                   42    20 GB    2 NVDECs
4g.20gb        1                   56    20 GB    2 NVDECs
7g.40gb        1                   98    40 GB    5 NVDECs

Target training workloads: BERT fine-tuning (e.g. SQuAD), multiple chatbots, and Jupyter notebooks on the smaller profiles; training ResNet-50, BERT, and W&D networks on 3g.20gb and larger.
Target inference workloads: multiple inference servers (e.g. Triton) serving ResNet-50, BERT, and W&D networks.

* A small carveout (~0.5 GB) is needed for MIG management and is not usable by the end user
* GPU Instance profile names follow <SM_SLICE_COUNT>g.<GPU_MEMORY_COUNT>gb
MIG Management & Configuration
List/Create/Update/Destroy Instances via NVML and nvidia-smi
► GPU reset required to enable/disable MIG mode; this one-time operation is per-GPU and persists across reboots
► Use NVML/nvidia-smi to manage MIG
► Configuration supported through management containers
► Configure and reconfigure as needed
Example: List created GPU instances with nvidia-smi
# nvidia-smi mig --list-gpu-instances
+------------------------------------------------+
| GPU instances: |
| GPU Name Profile Instance Placement |
| ID ID Start:Size |
|================================================|
| 0 1g.5gb 19 9 2:1 |
+------------------------------------------------+
| 0 1g.5gb 19 10 3:1 |
+------------------------------------------------+
| 0 1g.5gb 19 13 6:1 |
+------------------------------------------------+
| 0 2g.10gb 14 3 0:2 |
+------------------------------------------------+
| 0 2g.10gb 14 5 4:2 |
+------------------------------------------------+
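Building on that, a hedged end-to-end sketch of creating and destroying instances; the profile IDs used (19 for 1g.5gb, 14 for 2g.10gb, 9 for 3g.20gb) should be confirmed against the -lgip output on your driver version:

List the GPU Instance profiles the driver presents:
# nvidia-smi mig -lgip
Create two 3g.20gb GPU Instances, each with its default Compute Instance (-C):
# nvidia-smi mig -cgi 9,9 -C
Verify the resulting instances:
# nvidia-smi mig -lgi
# nvidia-smi mig -lci
Tear everything down, Compute Instances first, then GPU Instances:
# nvidia-smi mig -dci
# nvidia-smi mig -dgi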
Using MIG in Docker Containers
Passing through specific MIG devices

With MIG disabled, the GPU is still specified by the GPU index or GPU UUID. When MIG is enabled, the device is specified by index in the format <gpu-device-index>:<mig-device-index> or by UUID in the format MIG-<gpu-id>/<gpu-instance-id>/<gpu-compute-instance-id>.

Run a container (MIG disabled):
docker run --gpus '"device=0,1"' nvidia/cuda:9.0-base nvidia-smi

Run a container (MIG enabled):
docker run --gpus '"device=0:0,0:1"' nvidia/cuda:9.0-base nvidia-smi

Configure MIG (as root):
docker run --cap-add SYS_ADMIN --gpus '"device=0"' -e NVIDIA_MIG_CONFIG_DEVICES="all" nvidia/cuda:9.0-base nvidia-smi

Configure MIG (non-root):
chown nobody /proc/driver/nvidia/capabilities/mig/config
docker run --cap-add SYS_ADMIN --user nobody --gpus '"device=0"' \
  -e NVIDIA_MIG_CONFIG_DEVICES="all" nvidia/cuda:9.0-base nvidia-smi

MIG monitoring:
docker run --cap-add SYS_ADMIN --gpus '"device=0"' -e NVIDIA_MIG_MONITOR_DEVICES="all" nvidia/cuda:9.0-base nvidia-smi

Note: Commands are slightly different on Docker version 18.x (shown here is 19.03+)
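For example, a container can also be pinned to a single MIG device by UUID; the UUID below is a placeholder, so substitute a real value from nvidia-smi -L:

docker run --gpus '"device=MIG-GPU-5c89852c-xxxx/1/0"' nvidia/cuda:9.0-base nvidia-smi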
MIG for Maximized Resource Utilization

Use Case      GPU Instances (GI)   Description                                                    Example                            GPU Memory per model
Inferencing   7                    Run multiple inference servers simultaneously                 Jarvis chatbots                    5 GB
Development   7                    Serve multiple development environments simultaneously        Prototyping in Jupyter notebooks   5 GB
Fine-tuning   3                    Fine-tune multiple smaller models & datasets simultaneously   BERT fine-tuning                   10 GB
Training      2                    Train multiple larger models simultaneously                   ResNet-50 training                 20 GB

Dynamically change the available cloud GPU instance types to meet demand, as sketched below.
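A hedged sketch of that reconfiguration, run while no workloads are active (profile IDs 19 and 9 correspond to 1g.5gb and 3g.20gb here, but confirm with nvidia-smi mig -lgip):

Tear down the current layout:
# nvidia-smi mig -dci && nvidia-smi mig -dgi
Serve inference demand with seven 1g.5gb instances:
# nvidia-smi mig -cgi 19,19,19,19,19,19,19 -C
Later, switch to two 3g.20gb instances for training:
# nvidia-smi mig -dci && nvidia-smi mig -dgi
# nvidia-smi mig -cgi 9,9 -C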
Mixed workloads with MIG
Running 3 different applications on a single GPU
[Figure: one GPU partitioned into three instances, each with its own compute engines, L2, and framebuffer (FB) memory partition, with the NVDECs divided among them. Instance 0 (4-slice) runs four parallel CUDA processes P0-P3 on four compute engines; Instance 1 (2-slice) runs one CUDA process P0 on a double-width compute engine; Instance 2 (1-slice) runs a debugger.]
When a whole GPU is underutilized, use MIG to better match GPU
resources to users’ needs, thus optimizing overall GPU utilization.
MIG for Optimized Inferencing
Scaling out microservices with Triton and MIG
Horizontally scale out your containers or VMs 7x by using MIG
compute instances instead of whole GPU devices
No updates needed to application code; deployment code
must be updated to target MIG devices rather than GPUs
Continue using microservice best-practice, one server per app,
or allow Triton to manage all MIG devices
Ideal for batch-size 1 inferencing
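A minimal sketch of the scale-out pattern, assuming GPU 0 is split into seven 1g.5gb devices; the image tag, model repository path, and port scheme are placeholders to adapt:

for i in $(seq 0 6); do
  # One Triton server per MIG device, each on its own HTTP port
  docker run -d --gpus "\"device=0:$i\"" -p $((8000+i)):8000 \
    -v /path/to/models:/models \
    nvcr.io/nvidia/tritonserver:<xx.yy>-py3 \
    tritonserver --model-repository=/models
done

Each Triton instance serves one MIG device, so the seven servers run in parallel with the QoS guarantees described above.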
MIG for Many Developers
Using a single DGX A100 as a development platform for 50+ people
Mix user development that requires minimal
resources with resource-intensive large workloads
on the same platform
Give a very large number of users GPUs for
development work, learning, demos, and POCs
Perfect for:
● Dev teams that do not need extensive GPU mem
● Large instructor-led workshops
● Academic coursework and institutions
● Sales organizations with ongoing demos (see the sketch below)
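As an illustrative sketch of the developer-platform pattern (the notebook image is a placeholder; any GPU-enabled Jupyter image should work), one container per 1g.5gb device gives each user an isolated environment:

for i in $(seq 0 6); do
  # One notebook server per MIG device, each on its own port
  docker run -d --gpus "\"device=0:$i\"" -p $((8888+i)):8888 \
    <your-jupyter-image> jupyter lab --ip=0.0.0.0
done

Repeated across the eight GPUs of a DGX A100, this yields up to 56 isolated instances.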
Consolidating Different Workloads on DGX A100
One platform for training, inference, and data analytics
Summary of MIG
Flexibility & Isolation
MIG allows you to configure a single GPU to fit any set of
workloads
A MIG device looks like an ordinary GPU, so no change to
user code is necessary
With the proper MIG configuration, no GPU resources remain
idle
A single A100 can be dynamically configured to best serve the
current workload
MIG integrates into existing monitoring frameworks and works
with virtualization and container orchestration platforms
As your team grows and your clusters scale, the DGX A100 with
MIG will continue to be the right tool for the job