Containerizing GPU
Applications with Docker
for Scaling to the Cloud
FUTURE OF PACKAGING APPLICATIONS
SUBBU RAMA
Published in: Technology
  1. Containerizing GPU Applications with Docker for Scaling to the Cloud
     FUTURE OF PACKAGING APPLICATIONS
     SUBBU RAMA
  2. [Figure: discrete CPU, GPU, and memory resources in a data center pooled into a virtual supercomputer]
     Turns Discrete Computing Resources into a Virtual Supercomputer
  3. What problems are we trying to solve?
     Hardware is Stuck:
     ◦ Proper setup and optimization can take days
     ◦ Code portability
  4. Software is Stuck
     ◦ Operating system requirements
     ◦ Library dependencies
     ◦ Drivers
     ◦ Interoperability between tools
     Proper installation can take days
  5. Hardware is Stuck
     ◦ Code portability
     ◦ Performance portability
     ◦ Resource provisioning
     Proper setup and optimization can take days
  6. Goal
     Given:
     ◦ Applications from different vendors
     ◦ Systems of different capabilities
     ◦ Heterogeneous hardware
     Compose a workflow that:
     ◦ Works: individual components work, thus workflow works
     ◦ Is Portable: workload can be migrated across infrastructure
     ◦ Is Performant: has the ability to take advantage of GPU hardware
     ◦ Is Secure: individual components can be easily audited
  7. Current Solutions
     Current solutions revolve around a common denominator:
     ◦ Operating system that works for all tools in chain
     ◦ Compute nodes which satisfy the most memory hungry application
     ◦ Need GPUs? Must deploy on top of GPU only nodes
     ◦ Cost sensitive? Must deploy on low-end CPU only nodes
     Common denominator shortcoming:
     ◦ Inefficiencies
     ◦ Poor utilization / over provisioning
     ◦ Non-performant
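The over-provisioning cost of sizing every node for the most memory-hungry tool can be made concrete with a small calculation. The sketch below is only an illustration: the tool names and memory figures are made up and do not come from the talk.

```python
# Hypothetical peak memory needs per tool, in GB (illustrative numbers only).
tool_memory_gb = {"synthesis": 8, "simulation": 64, "report": 4}


def common_denominator_utilization(needs_gb):
    """Fraction of provisioned memory actually used when every node is
    sized for the most memory-hungry tool in the chain."""
    node_size = max(needs_gb.values())        # every node sized for the worst case
    provisioned = node_size * len(needs_gb)   # one such node per tool
    used = sum(needs_gb.values())
    return used / provisioned


utilization = common_denominator_utilization(tool_memory_gb)
print(f"memory utilization: {utilization:.0%}")
```

With these assumed numbers, 76 GB of 192 GB provisioned is in use, so roughly 60% of the memory sits idle. Sizing each container's host separately is what the next slide proposes instead.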
  8. Solution
     Containerize all applications
     ◦ Create GPU/CPU versions
     Assemble containers into workflow templates
     ◦ To represent particular use cases and pipelines
     Use workflow templates to create virtual clusters
     ◦ Optimize performance / budget via virtual clusters
  9. Containers are nothing new
     Part of Linux for the last 10 years
     LXC, FreeBSD Jails, Solaris Containers, etc.
     What is new are the APIs
     ◦ Docker
     ◦ Rocket
     ◦ Etc.
     Specifically
     ◦ A complete runtime environment: OS, application, libraries, dependencies, binaries, and configuration files
     ◦ Can be quickly deployed on a set of container hosts when needed
  10. Containers vs. VMs (Stack Comparison)
  11. Why Containers?
      Easy Deployment
      ◦ Avoid hours of environment / application setup
      ◦ Fast environment spin-up / tear-down
      Flexibility
      ◦ Applications use their preferred versions of OS, libs, languages, etc.
      ◦ Move data to application, or move application to data
      Reproducibility / Reliability / Scaling
      ◦ Workflow steps start with clean and immutable images
      ◦ Reliability through easy migration and checkpointing
  12. DevOps Hell
  13. GPU Containers the NVIDIA Way
      Much easier than it used to be
      ◦ One no longer has to fully reinstall the NVIDIA driver within the container
      ◦ No more container vs. host system driver matching conflicts
        - container works with host OS driver
        - there is still a driver and toolkit dependency
      ◦ https://github.com/NVIDIA/nvidia-docker
      Requirements
      ◦ Host has NVIDIA Drivers
      ◦ Host has Docker installed
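The two host requirements on this slide can be checked before trying to start a GPU container. The following is a minimal sketch, not part of nvidia-docker itself. It assumes that finding the `nvidia-smi` binary on `PATH` is a good enough proxy for an installed NVIDIA driver and that finding `docker` means Docker is installed; the function name is hypothetical.

```python
import shutil


def missing_host_requirements(which=shutil.which):
    """Return the host prerequisites that cannot be found on PATH.

    Assumption: `nvidia-smi` on PATH stands in for "NVIDIA driver installed"
    and `docker` on PATH stands in for "Docker installed".
    """
    required = {
        "NVIDIA driver (nvidia-smi)": "nvidia-smi",
        "Docker (docker)": "docker",
    }
    # `which` returns None when the binary is not found.
    return [name for name, binary in required.items() if which(binary) is None]


missing = missing_host_requirements()
print("host ready" if not missing else "missing: " + ", ".join(missing))
```

The `which` parameter exists only so the lookup can be replaced in tests. A PATH check does not prove that the driver is loaded or that the Docker daemon is running.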
  14. ◦ Shell Scripts
      ◦ Chef
      ◦ Puppet
      ◦ Ansible
  15. GPU Container Getting Started (CAFFE)
      Create a Dockerfile
      ◦ Very small, easy to re-build/update container if needed
      ◦ Reproducible builds
      ◦ Specify Operating System
      ◦ Install Operating System basics
      ◦ Install Application Dependencies
      ◦ Install Application
      Once Dockerfile is done: Build Container, Test Container, Store Container in Repo
      Quickly spin up a container where and when needed
      Enables “fire-and-forget” GPU applications
      What about data?
      ◦ Long answer: we’ll get to that in a bit
      ◦ Short: Put it somewhere else, keep containers small
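The four Dockerfile steps listed above (operating system, OS basics, dependencies, application) map onto a `FROM` line followed by `RUN` lines. The sketch below renders such a file from Python. It is not the Dockerfile shown in the talk: the base image, package names, and `echo` commands are placeholders, and the use of `apt-get` assumes a Debian or Ubuntu base image.

```python
def render_dockerfile(base_image, os_packages, dependency_cmds, app_cmds):
    """Render Dockerfile text in the order given on the slide:
    OS, OS basics, application dependencies, application."""
    lines = [f"FROM {base_image}"]                       # specify operating system
    if os_packages:                                      # install OS basics
        pkgs = " ".join(os_packages)
        lines.append(
            f"RUN apt-get update && apt-get install -y {pkgs} "
            "&& rm -rf /var/lib/apt/lists/*"
        )
    lines += [f"RUN {cmd}" for cmd in dependency_cmds]   # application dependencies
    lines += [f"RUN {cmd}" for cmd in app_cmds]          # the application itself
    return "\n".join(lines) + "\n"


# All values below are placeholders, not the contents of the talk's Dockerfile.
dockerfile = render_dockerfile(
    base_image="ubuntu:14.04",
    os_packages=["build-essential", "git"],
    dependency_cmds=["echo 'install application dependencies here'"],
    app_cmds=["echo 'build and install the application here'"],
)
print(dockerfile)
```

Keeping each step as its own `RUN` line lets Docker cache the earlier layers, so a change to the application step does not rebuild the OS and dependency layers.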
  16. Dockerfile Code
  17. Demo 1: Deploy GPU Container Across Clouds
      Demo will show:
      ◦ Launching a container on Cloud #1 and executing the application
      ◦ Taking the exact same container, launching it on Cloud #2, and executing the application
      ◦ Containers run on any cloud or datacenter or OS (even Windows)
      ◦ Containers use different types of GPUs and drivers, and everything works transparently
      ◦ “Fire and forget” GPU applications on the GPU hardware you need, wherever it may be
  18. Container Performance
      People are sceptical about container performance vs bare-metal
      ◦ There are special cases where performance can be an issue, but in general performance is on par, and better than VMs
      ◦ Docker versus Bare Metal is within 10% performance
      W. Felter, A. Ferreira, R. Rajamony, and J. Rubio. An Updated Performance Comparison of Virtual Machines and Linux Containers. Technology, 28:32, 2014. (IBM)
  19. So what about Data?
      In general, avoid storing data in containers.
      ◦ Container ought to be immutable
        ◦ bring it up, perform a task, return the result, shut it down
      ◦ Containers ought to be small
        ◦ size of containers impacts startup times
        ◦ size of containers impacts time it takes to pull container from repository
      Use Data Volume Containers
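A data volume container keeps the data in one container while short-lived application containers mount its volumes with `--volumes-from`. The sketch below only builds the two `docker` command lines rather than running them. The container name, paths, image names, and application command are hypothetical, and GPU access (for example via the `nvidia-docker` wrapper) is left out.

```python
import shlex


def data_volume_commands(data_container, data_path, data_image, app_image, app_cmd):
    """Build two docker CLI invocations: one that creates a data volume
    container, and one that runs a throwaway app container against it."""
    create = ["docker", "create", "-v", data_path,
              "--name", data_container, data_image]
    run = ["docker", "run", "--rm", "--volumes-from", data_container,
           app_image, *app_cmd]
    return create, run


create_cmd, run_cmd = data_volume_commands(
    data_container="datastore",           # hypothetical container name
    data_path="/data",
    data_image="ubuntu:14.04",
    app_image="my-gpu-app",               # hypothetical image name
    app_cmd=["process", "/data/input"],   # hypothetical application command
)
for cmd in (create_cmd, run_cmd):
    # Quote each argument so the printed line can be pasted into a shell.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The application container is started with `--rm`, which matches the "bring it up, perform a task, shut it down" pattern, while the data outlives it in the `datastore` container.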
  20. Application Flow Pipelines & Scheduling
      Sophisticated tool flows rarely consist of a single application
      ◦ Some steps may only run on CPUs
      ◦ Some steps may execute on a CPU or a GPU
      The challenge is how to schedule these flows efficiently to obtain either faster turnaround times or better overall throughput, while maintaining reproducible results
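One simple way to think about placing such steps is a greedy rule: put each step on whichever slot would finish it earliest, and never offer a GPU slot to a CPU-only step. The sketch below is an illustration under strong assumptions, not the scheduler used in the talk: the steps are treated as independent (no ordering dependencies), and the step names and runtimes are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    name: str
    cpu_minutes: float
    gpu_minutes: Optional[float] = None   # None means the step is CPU-only


def place_steps(steps, cpu_slots, gpu_slots):
    """Greedy placement: each step goes to the slot where it would finish
    earliest. Returns (placements, makespan_minutes)."""
    free_at = {f"cpu{i}": 0.0 for i in range(cpu_slots)}
    free_at.update({f"gpu{i}": 0.0 for i in range(gpu_slots)})
    placements = {}
    for step in steps:
        best_slot, best_finish = None, float("inf")
        for slot, ready in free_at.items():
            runtime = step.gpu_minutes if slot.startswith("gpu") else step.cpu_minutes
            if runtime is None:            # CPU-only step cannot use a GPU slot
                continue
            if ready + runtime < best_finish:
                best_slot, best_finish = slot, ready + runtime
        if best_slot is None:
            raise ValueError(f"no slot can run step {step.name!r}")
        placements[step.name] = best_slot
        free_at[best_slot] = best_finish
    return placements, max(free_at.values())


# Hypothetical steps and runtimes, for illustration only.
steps = [
    Step("parse", cpu_minutes=5),
    Step("simulate", cpu_minutes=60, gpu_minutes=10),
    Step("extract", cpu_minutes=30, gpu_minutes=12),
    Step("report", cpu_minutes=5),
]
placements, makespan = place_steps(steps, cpu_slots=1, gpu_slots=1)
print(placements, makespan)
```

A greedy rule like this favors turnaround time for the current batch. A throughput-oriented scheduler would weigh other queued work as well, which is the trade-off the slide points at.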
  21. Example: HPC workflow
      Semiconductor Circuit Design
  22. Example: HPC workflow
      Semiconductor Circuit Design
      CPU/GPU App
  23. Containers and Schedulers
      In general, several assumptions can be made about today’s clusters
      ◦ # CPU nodes >> # GPU nodes
      ◦ GPU nodes have a fixed # of GPUs in them
      ◦ Best machine for an application is usually determined by
        ◦ amount of memory
        ◦ amount and type of CPUs
        ◦ amount and type of GPUs
      How can containers help with scheduling given these constraints, compared with regular schedulers?
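The "best machine" criteria above can be read as a fit test on memory, CPUs, and GPUs. The sketch below picks the smallest node that satisfies a request and prefers nodes with the fewest spare GPUs, so that scarce GPU nodes are not spent on CPU-only work. It is a hypothetical illustration: node names and sizes are invented, and only resource amounts are modeled, not CPU or GPU types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    memory_gb: int
    cpus: int
    gpus: int


def fits(need, node):
    """True when the node has at least the requested amount of every resource."""
    return (node.memory_gb >= need.memory_gb
            and node.cpus >= need.cpus
            and node.gpus >= need.gpus)


def pick_node(need, nodes):
    """Return the name of the tightest-fitting node, or None if nothing fits."""
    candidates = [(name, res) for name, res in nodes.items() if fits(need, res)]
    if not candidates:
        return None
    # Fewest spare GPUs first, then least spare memory, then fewest spare CPUs.
    name, _ = min(
        candidates,
        key=lambda item: (item[1].gpus - need.gpus,
                          item[1].memory_gb - need.memory_gb,
                          item[1].cpus - need.cpus),
    )
    return name


# Hypothetical cluster, for illustration only.
nodes = {
    "cpu-small": Resources(memory_gb=16, cpus=4, gpus=0),
    "cpu-large": Resources(memory_gb=128, cpus=32, gpus=0),
    "gpu-node": Resources(memory_gb=64, cpus=16, gpus=4),
}
print(pick_node(Resources(8, 2, 0), nodes))
print(pick_node(Resources(32, 8, 2), nodes))
```

Because a container declares its own resource needs, the same fit test can be applied per workflow step instead of once for the whole tool chain.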
  24. Schedulers
  25. What if we can break Physical Machine Limitations?
      Most cloud service providers and data centers are bound by physical limitations
      ◦ Example: the largest machine has 2 GPUs (Softlayer) or 4 GPUs (AWS)
      ◦ A rack can only hold a maximum number of GPUs due to power constraints
      What if we could create virtual machines and clusters and present them to applications as a single virtual machine?
      How would this change the clusters and schedulers?
      * Elastic Containers or Elastic Machines via Containers (grow or shrink)
  26. Introduce Bitfusion Boost Containers
      We can:
      ◦ Combine Bitfusion Boost and Containers -> create magic!
      What things can we build?
      ◦ Create a machine which has 16 or more virtual GPUs!
      ◦ Run an application across these GPUs without having to setup MPI, SPARK, HADOOP!
      ◦ Run GPU applications on non-GPU machines by automatically offloading to GPU machines in the cluster
      ◦ All of the above can be done WITHOUT CODE CHANGES for GPU enabled applications!
  27. Boost Container Building Blocks
      Boost Server Container
      ◦ Boost provisioned container with Boost Server
      ◦ Runs on any GPU provisioned host
      ◦ Can act as a client at the same time
      Boost Client Container
      ◦ Boost provisioned container with Boost Client and End User Application
      ◦ Runs on any type of instance, including CPU-only instances
  28. Boost Container Architecture
  29. Demo 2: Build Virtual GPU Instances in the Cloud
      Demo will show the following using containers:
      ◦ How in minutes we can create virtual GPU cluster configurations
      ◦ How we can provision GPU machines which don’t exist in the physical world
      ◦ How we can run GPU applications on non-GPU machines
      ◦ How we can execute applications across these configurations without changing a single line of code!
  30. On
  31. cloud.bitfusion.io
  32. Thank You
      Visit Bitfusion Booth #731
      @Bitfusionio | @subburama | subbu@bitfusion.io