GPU Virtualization: Doing Much More with GPUs

Bitfusion's Supercomputing 2016 (SC16) presentation on GPU virtualization using Bitfusion Boost technology

Published in: Software

  • Why did we start at 64 GPUs?
    Imagine what you could do with 32, 64, or 128 GPUs in a single server.
  • Remove IBM
  • GPU Virtualization: Doing Much More with GPUs

    1. GPU Virtualization: Doing Much More with GPUs. Mazhar Memon, CTO, Bitfusion. SC16, Salt Lake City, UT
    2. Quick Poll: Any GPU users?
    3. GPU Users: Much variety. One size doesn't fit all.
       • Learners, application developers, HPC scientists, CAD/CAE, artists and designers, data analysts
       • Manufacturing, retail and finance, media and entertainment, pharma and healthcare, oil and gas, deep learning
    4. Variety of GPU Sizes: TX1, GTX, Tesla, Quadro
    5. Problem: How to do more with your (static) GPUs?
    6. Virtualization 101
    7. Virtualization: Support More Users or Applications on a Single Server. [Diagram: a server whose hypervisor hosts four VMs, and a server whose hypervisor hosts four VMs, each with a vGPU]
    8. Many users, small problems. [Diagram: grid of GPUs]
    9. One user, one big problem. [Diagram: larger grid of GPUs]
    10. Large data, small device memory: App demand >> GPU memory. [Diagram: app requirement compared with available GPU memory]
    11. GPU Virtualization on Steroids. [Diagram, three layers: use your favorite GPU applications as-is; Bitfusion Boost layer; your existing GPU infrastructure]
    12. Solve Small Problems Cheaply
       • Slice GPUs into arbitrary fractions
       • Memory and process isolation
       • Available on Nimbix today: $0.49 GPU instances
    13. Solve Large Problems Dynamically: Creating the largest virtual GPU machines on demand. [Diagram: a CPU-only node (48 cores, 3 TB memory, 72 TB SSD storage) and racks with GPUs; logically attached GPUs form a Boost massive virtual node]
    14. Solve Large Data Problems Efficiently
       • Dynamic paging of GPU memory backed by host memory
       • Works for non-Pascal GPUs as well
    15. Monitoring and Managing GPUs Easily
       • Use your favorite tools: all common tools, e.g. nvidia-smi, work across virtual clusters
    16. Handling Faults Automatically: Failover to any other available GPU server upon catastrophic, memory, or intermittent faults. [Diagram: an app and four GPUs]
    17. Bitfusion Boost: Software Stack
       • Intercepts applications and applies a variety of rules including automatic scale-out, resource pooling, high availability, etc.
       • Deploy on bare metal, containers, VMs. Secure, Portable, Frictionless
       • [Diagram labels: system view; application; local server; remote servers; user space; per-server stack of Hardware, VM, Hypervisor, Drivers, Operating system, SDI; application layers of Open APIs, Custom APIs, Libraries, Application Core Functions]
    18. App Specific Instance Configurations as Machine Images. Boost: Add broad set of features to your application
       • Resource Pooling: consolidate use of compute resources; increase utilization; lower capital costs
       • Resource Provisioning: enforce CPU, memory, utilization quotas; effect QoS policy and guarantees; maximize utilization and reduce costs
       • High Availability: detect failures at app level; rollback, failover, error detection; events for higher level reporting
       • Heterogeneous Offload: leverage HPC hardware; interpose vendor libraries; retarget hot functions to efficient specialized devices
       • Scale-out: distribute and balance load across systems; scale performance on demand; take advantage of runtime optimizations
       • Advanced Profiling: understand application demands of the datacenter; fine-grained data provides unique insight; precise recommendations for capacity planning
       • Machine images: Deep Learning (Caffe, Torch, TensorFlow), Media Transcoding, Rendering, Scientific Computing
       • http://www.bitfusion.io/boost-machine-images
    19. Boost Available on Nimbix today. Developer-optimized machine configurations.
    20. Learn more about Bitfusion Boost at boost.bitfusion.io
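
Slide 16 describes automatic failover to any other available GPU server when a fault occurs. The slides do not show how Boost implements this, so the following is only a minimal Python sketch of the general try-the-next-server pattern. The names `GpuFault`, `run_with_failover`, `flaky_task`, and the server names are hypothetical illustrations, not part of any Bitfusion API.

```python
from typing import Callable, Sequence


class GpuFault(Exception):
    """Hypothetical error standing in for a catastrophic, memory, or intermittent GPU fault."""


def run_with_failover(task: Callable[[str], str], servers: Sequence[str]) -> str:
    """Try the task on each server in order and return the first successful result."""
    faults = []
    for server in servers:
        try:
            return task(server)
        except GpuFault as fault:
            # Record the fault and fall through to the next available server.
            faults.append(f"{server}: {fault}")
    raise RuntimeError("all GPU servers failed: " + "; ".join(faults))


def flaky_task(server: str) -> str:
    # Simulated workload (assumption): the first server always reports a memory fault.
    if server == "gpu-server-0":
        raise GpuFault("memory fault")
    return f"completed on {server}"


result = run_with_failover(flaky_task, ["gpu-server-0", "gpu-server-1"])
print(result)
```

In this sketch the application code (`flaky_task`) stays unchanged and the retry logic lives in a wrapper, which mirrors the deck's claim that applications run as-is while a separate layer handles faults.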
