https://www.flickr.com/photos/mwichary/3209181446
Virtual Machines
CPU, GPU, FPGA
Databricks, Batch AI, AKS
Azure Machine Learning SDK
Applications and data processing pipelines
https://www.flickr.com/photos/mwichary/4145891520
https://github.com/Azure/pixel_level_land_classification
Purpose       Graphics          Compute           Compute           Compute           Deep Learning
VM Family     NV v1             NC v1             NC v2             NC v3             ND v1
GPU           NVIDIA M60        NVIDIA K80        NVIDIA P100       NVIDIA V100       NVIDIA P40
GPU Memory    8 GB              12 GB             16 GB             16 GB             24 GB
Sizes         1, 2 or 4 GPU     1, 2 or 4 GPU     1, 2 or 4 GPU     1, 2 or 4 GPU     1, 2 or 4 GPU
Interconnect  PCIe (dual root)  PCIe (dual root)  PCIe (dual root)  PCIe (dual root)  PCIe (dual root)
2nd Network   -                 FDR InfiniBand    FDR InfiniBand    FDR InfiniBand    FDR InfiniBand
VM CPU        Haswell           Haswell           Broadwell         Broadwell         Broadwell
VM RAM        56-224 GB         56-224 GB         112-448 GB        112-448 GB        112-448 GB
Local SSD     ~380-1500 GB      ~380-1500 GB      ~700-3000 GB      ~700-3000 GB      ~700-3000 GB
Storage       Std Storage       Std Storage       Prem Storage      Prem Storage      Prem Storage
Driver        Quadro/Grid PC    Tesla             Tesla             Tesla             Tesla
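As a rough illustration of how the comparison above might be used when picking a VM family, the sketch below encodes the Purpose and 2nd Network rows and filters on them. The dictionary layout and the `families` helper are hypothetical names introduced here; they are not part of any Azure SDK.

```python
from typing import List, Optional

# Purpose and second-network rows transcribed from the comparison table.
# The structure and key names are illustrative assumptions, not an Azure API.
GPU_VM_FAMILIES = {
    "NV v1": {"purpose": "Graphics", "gpu": "NVIDIA M60", "second_network": None},
    "NC v1": {"purpose": "Compute", "gpu": "NVIDIA K80", "second_network": "FDR InfiniBand"},
    "NC v2": {"purpose": "Compute", "gpu": "NVIDIA P100", "second_network": "FDR InfiniBand"},
    "NC v3": {"purpose": "Compute", "gpu": "NVIDIA V100", "second_network": "FDR InfiniBand"},
    "ND v1": {"purpose": "Deep Learning", "gpu": "NVIDIA P40", "second_network": "FDR InfiniBand"},
}


def families(purpose: Optional[str] = None, needs_second_network: bool = False) -> List[str]:
    """Return family names matching the requested purpose and network needs."""
    matches = []
    for name, spec in GPU_VM_FAMILIES.items():
        if purpose is not None and spec["purpose"] != purpose:
            continue
        if needs_second_network and spec["second_network"] is None:
            continue
        matches.append(name)
    return matches


if __name__ == "__main__":
    # Families that list a second network, e.g. for multi-node MPI training.
    print(families(needs_second_network=True))
    print(families(purpose="Deep Learning"))
```

Filtering on the second network is a natural first cut for multi-node jobs, since only the families that list FDR InfiniBand have a dedicated network for inter-node traffic.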
Purpose       Graphics          Deep Learning
VM Family     NV v2             ND v2
GPU           NVIDIA M60        NVIDIA V100
GPU Memory    8 GB              16 GB
Sizes         1, 2 or 4 GPU     8 GPU
Interconnect  PCIe (dual root)  NVLink
2nd Network   -                 -
VM CPU        Broadwell         Skylake
VM RAM        112-448 GB        768 GB
Local SSD     ~700-3000 GB      ~1300 GB
Storage       Prem Storage      Prem Storage
Driver        Quadro/Grid PC    Tesla
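To make the difference between the two families concrete, the short sketch below works out the aggregate GPU memory of the largest size in each. It assumes the GPU Memory row is a per-GPU figure; the helper name is hypothetical.

```python
def total_gpu_memory_gb(gpu_count: int, memory_per_gpu_gb: int) -> int:
    """Aggregate GPU memory of one VM, assuming the listed figure is per GPU."""
    return gpu_count * memory_per_gpu_gb


# Largest size of each family: (GPU count, GB per GPU), taken from the table.
LARGEST_SIZES = {
    "NV v2": (4, 8),
    "ND v2": (8, 16),
}

TOTALS = {
    name: total_gpu_memory_gb(count, per_gpu)
    for name, (count, per_gpu) in LARGEST_SIZES.items()
}

if __name__ == "__main__":
    for name, total in TOTALS.items():
        print(f"{name}: {total} GB of GPU memory in the largest size")
```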
https://www.flickr.com/photos/mwichary/4358127163
Clusters
• Provision GPUs
• Install drivers and software
• Interactive use
Scheduling
• Queue work
• Prioritize jobs
• Start MPI
• Monitor
• Handle failures
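The scheduling duties listed above (queue work, prioritize jobs, handle failures) can be sketched with a small in-process priority queue that retries failed jobs. This is only a conceptual illustration using the Python standard library; `JobQueue` and its methods are hypothetical and do not represent the Batch AI API.

```python
import heapq
import itertools
from typing import Callable, List, Tuple


class JobQueue:
    """Toy scheduler: lower priority numbers run first, failed jobs are retried."""

    def __init__(self, max_retries: int = 2) -> None:
        self._heap: list = []
        # Monotonic counter keeps FIFO order among jobs with equal priority
        # and guarantees the heap never has to compare callables.
        self._counter = itertools.count()
        self.max_retries = max_retries
        self.log: List[Tuple[str, str]] = []

    def submit(self, name: str, run: Callable[[], None], priority: int = 0) -> None:
        """Queue a job; it starts with zero failed attempts."""
        heapq.heappush(self._heap, (priority, next(self._counter), name, run, 0))

    def run_all(self) -> List[Tuple[str, str]]:
        """Run queued jobs in priority order and record each outcome."""
        while self._heap:
            priority, _, name, run, attempts = heapq.heappop(self._heap)
            try:
                run()
                self.log.append((name, "succeeded"))
            except Exception:
                if attempts < self.max_retries:
                    # Re-queue at the same priority with one more attempt used.
                    self.log.append((name, "retrying"))
                    heapq.heappush(
                        self._heap,
                        (priority, next(self._counter), name, run, attempts + 1),
                    )
                else:
                    self.log.append((name, "failed"))
        return self.log
```

A real service would also monitor running jobs and launch MPI processes across nodes; the sketch covers only ordering and retry.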
Data
• Scale access to training data
• Output logs & models
• Secure & compliant
Cost
• Scale up and down
• Share reserved instances
• Use low-priority VMs
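The "scale up and down" idea can be sketched as a function that derives a target node count from the current workload and clamps it between a minimum and a maximum. The function name, parameters, and defaults below are assumptions made for illustration, not settings from any Azure service.

```python
import math


def target_node_count(
    pending_jobs: int,
    running_jobs: int,
    jobs_per_node: int = 1,
    min_nodes: int = 0,
    max_nodes: int = 4,
) -> int:
    """Nodes needed for the current workload, clamped to [min_nodes, max_nodes]."""
    if jobs_per_node < 1:
        raise ValueError("jobs_per_node must be at least 1")
    needed = math.ceil((pending_jobs + running_jobs) / jobs_per_node)
    return max(min_nodes, min(max_nodes, needed))


if __name__ == "__main__":
    print(target_node_count(0, 0))    # idle cluster shrinks to the minimum
    print(target_node_count(10, 0))   # a burst of work is capped at the maximum
```

Setting the minimum to zero lets an idle cluster release every node, which is where most of the saving comes from.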
Workflow
• Efficient hardware
• Tooling integration
• Laptop to cloud
New API coming soon
https://github.com/Azure/BatchAI/tree/master/recipes
Azure Blob FUSE: default in samples going forward
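Because Blob FUSE exposes a blob container as a mounted directory, training code can read it with ordinary file APIs. The sketch below lists training files under such a mount; the function name and the `.tfrecord` suffix are illustrative assumptions, and the mount directory is whatever path the cluster is configured with.

```python
from pathlib import Path
from typing import List


def list_training_files(mount_dir: str, suffix: str = ".tfrecord") -> List[str]:
    """Return sorted file names with the given suffix directly under mount_dir.

    mount_dir is assumed to be a directory where a blob container is mounted;
    any local directory works the same way, which keeps the code testable.
    """
    root = Path(mount_dir)
    return sorted(p.name for p in root.iterdir() if p.is_file() and p.suffix == suffix)
```

The same function runs unchanged against a local folder on a laptop, which fits the "laptop to cloud" workflow goal.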
https://www.flickr.com/photos/mwichary/4358122657
https://github.com/saidbleik/batchai_mm_ad
https://github.com/Azure/doAzureParallel
https://github.com/Azure/aztk/
https://azure.github.io/batch-shipyard/
https://github.com/Azure/batch-shipyard
https://www.flickr.com/photos/mwichary/4148855707
Deep Learning at Scale
