Containerizing GPU Applications with Docker for Scaling to the Cloud
FUTURE OF PACKAGING APPLICATIONS
SUBBU RAMA
[Diagram: discrete CPU, GPU, and memory resources in a data center, pooled into a virtual supercomputer]
Turns Discrete Computing Resources into a Virtual Supercomputer
What problems are we trying to solve?
Hardware is Stuck:
proper setup and optimization can take days
code portability
Software is Stuck
Operating system requirements
Library dependencies
Drivers
Interoperability between tools
Proper installation can take days
Hardware is Stuck
Code portability
Performance portability
Resource provisioning
Proper setup and optimization can take days
Goal
Given:
Applications from different vendors
Systems of different capabilities
Heterogeneous hardware
Compose a workflow that:
Works: individual components work, thus workflow works
Is Portable: workload can be migrated across infrastructure
Is Performant: has the ability to take advantage of GPU hardware
Is Secure: individual components can be easily audited
Current Solutions
Current solutions revolve around a common denominator:
Operating system that works for all tools in chain
Compute nodes that satisfy the most memory-hungry application
Need GPUs? Must deploy on top of GPU-only nodes
Cost sensitive? Must deploy on low-end CPU-only nodes
Common denominator shortcoming: Inefficiencies
Poor utilization / over provisioning
Non-performant
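To make the over-provisioning cost concrete, here is a small worked example in Python. The memory figures are made up for illustration and do not come from the talk; the function name `memory_utilization` is likewise hypothetical, and the sketch assumes each application occupies one node on its own.

```python
from typing import List


def memory_utilization(app_mem_gb: List[float], node_mem_gb: List[float]) -> float:
    """Fraction of provisioned memory that the applications actually use.

    Each application is assumed to occupy one node on its own.
    """
    return sum(app_mem_gb) / sum(node_mem_gb)


# Hypothetical memory needs (GB) of three tools in one chain.
APPS = [4.0, 8.0, 64.0]

# Common denominator: every node is sized for the most memory-hungry tool.
common = memory_utilization(APPS, [max(APPS)] * len(APPS))

# Per-application sizing: each tool gets a node that matches its own need.
right_sized = memory_utilization(APPS, APPS)

if __name__ == "__main__":
    print(f"common denominator: {common:.0%}")
    print(f"right-sized:        {right_sized:.0%}")
```

With these assumed numbers, most of the memory on the common-denominator nodes sits idle, which is the inefficiency the slide refers to.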
Solution
Containerize all applications
◦ Create GPU/CPU versions
Assemble containers into workflow templates
◦ To represent particular use cases and pipelines
Use workflow templates to create virtual clusters
◦ Optimize performance / budget via virtual clusters
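The three steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the speaker's implementation: the `Step` and `resolve` names, the image names, and the example pipeline are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Step:
    """One workflow step with a CPU image and an optional GPU image."""
    name: str
    cpu_image: str
    gpu_image: Optional[str] = None  # None means the step is CPU-only


def resolve(template: List[Step], gpu_available: bool) -> Dict[str, str]:
    """Pick the container image for each step of a workflow template.

    A step uses its GPU image only when one exists and the target
    virtual cluster offers GPUs; otherwise it falls back to the CPU image.
    """
    plan = {}
    for step in template:
        if gpu_available and step.gpu_image is not None:
            plan[step.name] = step.gpu_image
        else:
            plan[step.name] = step.cpu_image
    return plan


# Hypothetical template: image names are placeholders, not real repositories.
TEMPLATE = [
    Step("preprocess", cpu_image="example/preprocess:cpu"),
    Step("simulate", cpu_image="example/simulate:cpu",
         gpu_image="example/simulate:gpu"),
]

if __name__ == "__main__":
    print(resolve(TEMPLATE, gpu_available=True))
    print(resolve(TEMPLATE, gpu_available=False))
```

Keeping both image variants in one template is what lets the same workflow be mapped onto a GPU cluster when speed matters or a CPU-only cluster when budget matters.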
Containers are nothing new
Part of Linux for the last 10 years
LXC, FreeBSD Jails, Solaris Containers, etc.
What is new are APIs
◦ Docker
◦ Rocket
◦ Etc.
Specifically
◦ A complete runtime environment: OS, application, libraries, dependencies, binaries, and configuration files
◦ Can be quickly deployed on a set of container hosts when needed
Containers vs. VMs (Stack Comparison)
Why Containers?
Easy Deployment
◦ Avoid hours of environment / application setup
◦ Fast environment spin-up / tear-down
Flexibility
◦ Applications use their preferred versions of OS, libraries, languages, etc.
◦ Move data to the application, or move the application to the data
Reproducibility / Reliability / Scaling
◦ Workflow steps start with clean and immutable images
◦ Reliability through easy migration and checkpointing
DevOps Hell
GPU Containers the NVIDIA Way
Much easier than it used to be
◦ One no longer has to fully reinstall the NVIDIA driver within the container
◦ No more container vs. host system driver matching conflicts: the container works with the host OS driver, although there is still a driver and toolkit dependency
◦ https://github.com/NVIDIA/nvidia-docker
Requirements
◦ Host has NVIDIA Drivers
◦ Host has Docker installed
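A quick way to check the two host requirements is to look for the relevant executables. The sketch below is an assumption-laden shortcut rather than an official check: it treats finding `nvidia-smi` on the PATH as a proxy for an installed NVIDIA driver and finding `docker` as a proxy for a Docker installation. The `check_host` name is hypothetical.

```python
import shutil
from typing import Callable, Dict, Optional


def check_host(which: Callable[[str], Optional[str]] = shutil.which) -> Dict[str, bool]:
    """Report whether the two host requirements appear to be met.

    Assumption: an executable on PATH is only a proxy for a working
    installation; it does not prove the driver or daemon is functional.
    """
    return {
        "nvidia_driver": which("nvidia-smi") is not None,
        "docker": which("docker") is not None,
    }


if __name__ == "__main__":
    for requirement, ok in check_host().items():
        print(f"{requirement}: {'found' if ok else 'missing'}")
```

The lookup function is a parameter so that the check can be exercised on machines that have neither tool installed.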
Shell Scripts
Chef
Puppet
Ansible
GPU Container Getting Started (CAFFE)
Create a Dockerfile
◦ Very small, easy to re-build/update container if needed
◦ Reproducible builds
◦ Specify Operating System
◦ Install Operating System basics
◦ Install Application Dependencies
◦ Install Application
Once Dockerfile is done:
Build Container, Test Container, Store Container in Repo
Quickly spin up a container where and when needed
Enables “fire-and-forget” GPU applications
What about data?
Long answer: we’ll get to that in a bit
Short: Put it somewhere else, keep containers small
Dockerfile Code
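As a rough illustration of the four Dockerfile steps from the Caffe example (specify the operating system, install OS basics, install application dependencies, install the application), the Python sketch below renders a Dockerfile as text. It is not the Dockerfile from the talk. It assumes a Debian or Ubuntu base image (because it uses `apt-get`), and the `render_dockerfile` name, the base image, the package lists, and the install command in the usage example are illustrative choices only.

```python
from typing import List


def render_dockerfile(base_image: str, os_packages: List[str],
                      dependency_packages: List[str],
                      install_commands: List[str]) -> str:
    """Render Dockerfile text laid out in four steps.

    Assumes a Debian/Ubuntu base image, since packages go through apt-get.
    """
    lines = [
        "# 1. Specify operating system",
        f"FROM {base_image}",
        "# 2. Install operating system basics",
        "RUN apt-get update && apt-get install -y " + " ".join(os_packages),
        "# 3. Install application dependencies",
        "RUN apt-get install -y " + " ".join(dependency_packages),
        "# 4. Install application",
    ]
    lines += [f"RUN {command}" for command in install_commands]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative values only; a real Caffe image needs more dependencies
    # and a build step, both omitted here.
    print(render_dockerfile(
        base_image="ubuntu:14.04",
        os_packages=["build-essential", "git"],
        dependency_packages=["libprotobuf-dev", "protobuf-compiler"],
        install_commands=["git clone https://github.com/BVLC/caffe.git /opt/caffe"],
    ))
```

Because the file is short and generated from a few inputs, rebuilding the container after a dependency change stays cheap, which is the point of the "very small, easy to re-build" bullet.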
Demo 1: Deploy GPU Container Across Clouds
Demo will show:
Launching a container on Cloud #1 and executing the application
Taking the exact same container, launching it on Cloud #2, and executing the application
containers run on any cloud or datacenter or OS (even Windows)
containers use different types of GPUs and drivers
and everything works transparently
“fire and forget” GPU applications on GPU hardware you need wherever it may be
Container Performance
People are sceptical about container performance vs bare-metal
◦ There are special cases where performance can be an issue, but in general performance is on par with bare metal, and better than VMs
◦ Docker performance is within 10% of bare metal
W. Felter, A. Ferreira, R. Rajamony, and J. Rubio. An Updated Performance Comparison of Virtual Machines and Linux Containers. Technology, 28:32, 2014. (IBM)
So what about Data?
In general, avoid storing data in containers.
◦ Containers ought to be immutable
◦ bring it up, perform a task, return the result, shut it down
◦ Containers ought to be small
◦ size of containers impacts startup times
◦ size of containers impacts time it takes to pull container from repository
Use Data Volume Containers
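The data volume container pattern can be sketched as two Docker commands: one creates a container that only owns a volume, the other runs a throwaway application container that mounts it. The Python below just builds the argument lists and does not execute them. The helper names, the container name `datastore`, the mount point, and the image names are hypothetical; `docker create -v`, `--name`, `docker run --rm`, and `--volumes-from` are standard Docker CLI options.

```python
from typing import List


def create_data_container(name: str, mount_point: str, image: str) -> List[str]:
    """Command that creates (but does not start) a container owning a volume."""
    return ["docker", "create", "-v", mount_point, "--name", name, image]


def run_with_data(data_container: str, image: str, command: List[str]) -> List[str]:
    """Command that runs a throwaway application container against that volume."""
    return ["docker", "run", "--rm", "--volumes-from", data_container, image] + command


if __name__ == "__main__":
    # Placeholder names; pass these lists to subprocess.run() to execute them.
    print(" ".join(create_data_container("datastore", "/data", "ubuntu:14.04")))
    print(" ".join(run_with_data("datastore", "example/app:gpu", ["ls", "/data"])))
```

The application image stays small and immutable, while the data lives in the volume and outlives any single application container.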
Application Flow Pipelines & Scheduling
Sophisticated tool flows rarely consist of a single application
◦ Some steps may only run on CPUs
◦ Some steps may execute on a CPU or a GPU
The challenge is how to schedule these flows efficiently, to obtain either faster turnaround times or better overall throughput, while maintaining reproducible results
Example: HPC workflow
Semiconductor Circuit Design
CPU/GPU App
Containers and Schedulers
In general several assumptions can be made about today’s clusters
◦ # CPU nodes >> # GPU nodes
◦ GPU nodes have a fixed #GPUs in them
◦ Best machine for an application is usually determined by
◦ amount of memory
◦ amount and type of CPUs
◦ amount and type of GPUs
How can containers help with scheduling given these constraints, compared with regular schedulers?
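To illustrate how memory, CPU, and GPU requirements determine the best machine, here is a minimal best-fit placement sketch in Python. It is a generic heuristic chosen for illustration, not the scheduler discussed in the talk; the `Node`, `Job`, and `best_fit` names and the example cluster are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    name: str
    mem_gb: int
    cpus: int
    gpus: int


@dataclass(frozen=True)
class Job:
    name: str
    mem_gb: int
    cpus: int
    gpus: int = 0


def best_fit(job: Job, nodes: List[Node]) -> Optional[Node]:
    """Return the smallest node that satisfies the job, or None.

    "Smallest" is ordered by GPUs first, then memory, then CPUs, so that
    CPU-only jobs are steered away from the scarce GPU nodes.
    """
    candidates = [
        n for n in nodes
        if n.mem_gb >= job.mem_gb and n.cpus >= job.cpus and n.gpus >= job.gpus
    ]
    return min(candidates, key=lambda n: (n.gpus, n.mem_gb, n.cpus), default=None)


# Hypothetical cluster with more CPU nodes than GPU nodes.
CLUSTER = [
    Node("cpu-small", mem_gb=16, cpus=4, gpus=0),
    Node("cpu-large", mem_gb=64, cpus=16, gpus=0),
    Node("gpu-node", mem_gb=64, cpus=8, gpus=4),
]

if __name__ == "__main__":
    for job in [Job("layout", 32, 8), Job("simulate", 16, 4, gpus=2)]:
        node = best_fit(job, CLUSTER)
        print(job.name, "->", node.name if node else "no fit")
```

A job that asks for more GPUs than any single node has gets no placement here, which is exactly the physical-machine limit the following slides set out to break.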
Schedulers
What if we can break Physical Machine Limitations?
Most cloud service providers and data centers are constrained by physical limitations
◦ Example: Largest machine has 2 GPUs (SoftLayer) or 4 GPUs (AWS)
◦ A rack can hold only a limited number of GPUs due to power constraints
What if we could create virtual machines and clusters and present them to applications as a single virtual machine?
How would this change the clusters and schedulers?
* Elastic Containers or Elastic Machines via Containers (grow or shrink)
Introduce Bitfusion Boost Containers
We can:
◦ Combine Bitfusion Boost and Containers -> create magic!
What things can we build?
◦ Create a machine which has 16 or more virtual GPUs!
◦ Run an application across these GPUs without having to set up MPI, Spark, or Hadoop!
◦ Run GPU applications on non-GPU machines by automatically offloading to GPU machines in the cluster
◦ All of the above can be done WITHOUT CODE CHANGES for GPU enabled applications!
Boost Container Building Blocks
Boost Server Container
◦ Boost provisioned container with Boost Server
◦ Runs on any GPU provisioned host
◦ Can act as a client at the same time
Boost Client Container
◦ Boost provisioned container with Boost Client and End User Application
◦ Runs on any type of instance, including CPU-only instances
Boost Container Architecture
Demo 2: Build Virtual GPU Instances in the Cloud
Demo will show the following using containers:
How in minutes we can create virtual GPU cluster configurations
How we can provision GPU machines which don’t exist in the physical world
How we can run GPU applications on non-GPU machines
How we can execute applications across these configurations without changing a single line of code!
On
cloud.bitfusion.io
Thank You
Visit Bitfusion Booth #731
@Bitfusionio | @subburama | subbu@bitfusion.io

Containerizing GPU Applications with Docker for Scaling to the Cloud

  • 1. Containerizing GPU Applications with Docker for Scaling to the Cloud FUTURE OF PACKAGING APPLICATIONS SUBBU RAMA
  • 2. [Diagram: discrete CPU, GPU, and memory resources in a data center pooled into a virtual supercomputer] Turns Discrete Computing Resources into a Virtual Supercomputer
  • 3. What problems are we trying to solve? Hardware is Stuck: proper setup and optimization can take days; code portability
  • 4. Software is Stuck: operating system requirements; library dependencies; drivers; interoperability between tools. Proper installation can take days.
  • 5. Hardware is Stuck: code portability; performance portability; resource provisioning. Proper setup and optimization can take days.
  • 6. Goal. Given: applications from different vendors; systems of different capabilities; heterogeneous hardware. Compose a workflow that: Works (individual components work, thus the workflow works); Is Portable (workload can be migrated across infrastructure); Is Performant (can take advantage of GPU hardware); Is Secure (individual components can be easily audited).
  • 7. Current Solutions. Current solutions revolve around a common denominator: an operating system that works for all tools in the chain; compute nodes that satisfy the most memory-hungry application. Need GPUs? Must deploy on GPU-only nodes. Cost sensitive? Must deploy on low-end CPU-only nodes. Common-denominator shortcoming: inefficiencies (poor utilization / over-provisioning; non-performant).
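
The inefficiency on slide 7 can be made concrete with a little arithmetic. The sketch below assumes three hypothetical tools with invented memory needs (the names and numbers are not from the talk) and computes how much of the provisioned memory is actually used when every step runs on a node sized for the hungriest tool.

```python
# Hypothetical memory needs (GB) of the tools in one workflow.
# Names and values are illustrative assumptions, not data from the talk.
memory_needed_gb = {"synthesis": 16, "simulation": 64, "report": 4}


def common_denominator_utilization(needs):
    """Fraction of provisioned memory actually used when every step
    runs on a node sized for the most memory-hungry step."""
    node_size = max(needs.values())
    provisioned = node_size * len(needs)
    return sum(needs.values()) / provisioned


utilization = common_denominator_utilization(memory_needed_gb)
print(f"memory utilization: {utilization:.0%}")
```

Sizing each containerized step for its own needs instead would bring this ratio close to 1, which is the motivation for the per-application containers on the next slide.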
  • 8. Solution Containerize all applications ◦ Create GPU/CPU versions Assemble containers into workflow templates ◦ To represent particular use cases and pipelines Use workflow templates to create virtual clusters ◦ Optimize performance / budget via virtual clusters
  • 9. Containers are nothing new. Part of Linux for the last 10 years: LXC, FreeBSD Jails, Solaris Containers, etc. What is new are APIs ◦ Docker ◦ Rocket ◦ Etc. Specifically ◦ A complete runtime environment: OS, application, libraries, dependencies, binaries, and configuration files ◦ Can be quickly deployed on a set of container hosts when needed
  • 10. Containers vs. VMs (Stack Comparison)
  • 11. Why Containers? Easy Deployment ◦ Avoid hours of environment / application setup ◦ Fast environment spin-up / tear-down. Flexibility ◦ Applications use their preferred versions of OS, libs, language, etc. ◦ Move data to the application, or move the application to the data. Reproducibility / Reliability / Scaling ◦ Workflow steps start with clean and immutable images ◦ Reliability through easy migration and checkpointing
  • 14. GPU Containers the NVIDIA Way. Much easier than it used to be ◦ One no longer has to fully reinstall the NVIDIA driver within the container ◦ No more container vs. host system driver matching conflicts: the container works with the host OS driver, though there is still a driver and toolkit dependency ◦ https://github.com/NVIDIA/nvidia-docker Requirements ◦ Host has NVIDIA drivers ◦ Host has Docker installed
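
The two host requirements on slide 14 can be checked before attempting to launch anything. This is a minimal sketch, assuming that finding the `nvidia-smi` and `docker` executables on `PATH` is an acceptable proxy for "driver installed" and "Docker installed"; it does not verify versions or that the Docker daemon is running.

```python
import shutil


def missing_host_requirements(required=("docker", "nvidia-smi")):
    """Return the required executables that cannot be found on PATH.

    Assumption: nvidia-smi being present is used as a stand-in for
    'the host has NVIDIA drivers'; this is a heuristic, not a guarantee.
    """
    return [tool for tool in required if shutil.which(tool) is None]


missing = missing_host_requirements()
if missing:
    print("Missing on this host:", ", ".join(missing))
else:
    print("Found docker and nvidia-smi on PATH")
```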
  • 16. GPU Container Getting Started (CAFFE). Create a Dockerfile ◦ Very small, easy to re-build/update container if needed ◦ Reproducible builds ◦ Specify Operating System ◦ Install Operating System basics ◦ Install Application Dependencies ◦ Install Application. Once the Dockerfile is done: Build Container, Test Container, Store Container in Repo. Quickly spin up a container where and when needed. Enables “fire-and-forget” GPU applications. What about data? Long answer: we’ll get to that in a bit. Short: put it somewhere else, keep containers small
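
The Dockerfile steps listed on slide 16 (specify the OS, install OS basics, install dependencies, install the application) can be sketched as a small generator. Everything passed in below is a hypothetical placeholder: the base image name, the package list, and the install scripts are assumptions for illustration and are not the Dockerfile used in the talk. The sketch also assumes a Debian/Ubuntu style base image, since it emits `apt-get`.

```python
import json


def render_dockerfile(base_image, os_packages, dependency_commands,
                      install_commands, entrypoint):
    """Render Dockerfile text following the order of steps on the slide."""
    lines = [f"FROM {base_image}"]  # specify operating system
    if os_packages:  # install operating system basics
        lines.append("RUN apt-get update && apt-get install -y "
                     + " ".join(os_packages))
    lines += [f"RUN {cmd}" for cmd in dependency_commands]  # dependencies
    lines += [f"RUN {cmd}" for cmd in install_commands]     # application
    lines.append("CMD " + json.dumps(entrypoint))           # exec-form CMD
    return "\n".join(lines) + "\n"


dockerfile = render_dockerfile(
    base_image="example/cuda-base:latest",       # hypothetical image name
    os_packages=["build-essential", "git"],
    dependency_commands=["./install_deps.sh"],   # hypothetical script
    install_commands=["./install_app.sh"],       # hypothetical script
    entrypoint=["run_app"],                      # hypothetical executable
)
print(dockerfile)
```

Writing the result to a file named `Dockerfile` and running `docker build -t <name> .` in that directory would then cover the "Build Container" step.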
  • 18. Demo 1: Deploy GPU Container Across Clouds. Demo will show: launching a container on Cloud #1 and executing the application; taking the exact same container, launching it on Cloud #2, and executing the application; containers run on any cloud, datacenter, or OS (even Windows); containers use different types of GPUs and drivers and everything works transparently; “fire and forget” GPU applications on the GPU hardware you need, wherever it may be
  • 19. Container Performance. People are sceptical about container performance vs. bare metal ◦ There are special cases where performance can be an issue, but in general performance is on par with bare metal, and better than VMs ◦ Docker performance is within 10% of bare metal. W. Felter, A. Ferreira, R. Rajamony, and J. Rubio. An Updated Performance Comparison of Virtual Machines and Linux Containers. Technology, 28:32, 2014. (IBM)
  • 20. So what about Data? In general, avoid storing data in containers. ◦ Containers ought to be immutable ◦ bring it up, perform a task, return the result, shut it down ◦ Containers ought to be small ◦ size of a container impacts startup time ◦ size of a container impacts the time it takes to pull it from the repository. Use Data Volume Containers
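
The data volume container pattern named on slide 20 keeps data in a separate container whose volumes the application container mounts at run time. The sketch below only assembles the `docker run` command line for that pattern using the `--rm` and `--volumes-from` options; the image, container, and command names are hypothetical, and nothing is executed.

```python
import shlex


def docker_run_command(image, command, data_container):
    """Build a 'docker run' argument list that mounts the volumes of an
    existing data container and removes the app container on exit."""
    return ["docker", "run", "--rm",
            "--volumes-from", data_container,
            image, *command]


cmd = docker_run_command(
    image="example/gpu-app:latest",   # hypothetical application image
    command=["run_app", "/data/input"],  # hypothetical command and path
    data_container="datastore",       # hypothetical data volume container
)
print(shlex.join(cmd))
```

Because the application container is removed after each run, it stays immutable and small, while the data outlives it in the data container.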
  • 21. Application Flow Pipelines & Scheduling. Sophisticated tool flows rarely consist of a single application ◦ Some steps may only run on CPUs ◦ Some steps may execute on a CPU or a GPU. The challenge is how to schedule these flows efficiently to obtain either faster turnaround times or better overall throughput, while maintaining reproducible results
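
One way to picture the scheduling challenge on slide 21 is a greedy assignment: give the scarce GPU slots to the steps that save the most time on a GPU and run everything else on CPUs. This is a deliberately simplified sketch that ignores step dependencies and queueing; the step names and runtimes are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    name: str
    cpu_minutes: float
    gpu_minutes: Optional[float] = None  # None means the step is CPU only


def assign_devices(steps, gpu_slots):
    """Greedily place the steps with the largest time savings on GPUs."""
    candidates = [s for s in steps
                  if s.gpu_minutes is not None
                  and s.gpu_minutes < s.cpu_minutes]
    candidates.sort(key=lambda s: s.cpu_minutes - s.gpu_minutes,
                    reverse=True)
    on_gpu = {s.name for s in candidates[:gpu_slots]}
    return {s.name: ("gpu" if s.name in on_gpu else "cpu") for s in steps}


# Illustrative pipeline; all numbers are assumptions.
pipeline = [
    Step("parse", cpu_minutes=5),
    Step("simulate", cpu_minutes=120, gpu_minutes=20),
    Step("extract", cpu_minutes=30, gpu_minutes=25),
]
placement = assign_devices(pipeline, gpu_slots=1)
print(placement)
```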
  • 23. Example: HPC workflow Semiconductor Circuit Design CPU/GPU App
  • 24. Containers and Schedulers. In general, several assumptions can be made about today’s clusters ◦ # CPU nodes >> # GPU nodes ◦ GPU nodes have a fixed # of GPUs in them ◦ Best machine for an application is usually determined by ◦ amount of memory ◦ amount and type of CPUs ◦ amount and type of GPUs. How can containers help with scheduling given these constraints, compared with regular schedulers?
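
The "best machine" criteria on slide 24 (memory, CPUs, GPUs) can be sketched as a simple filter over the available nodes. This is a minimal illustration that picks the cheapest node meeting the requested amounts; it ignores CPU and GPU type, and the node names, sizes, and hourly costs are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    memory_gb: int
    cpus: int
    gpus: int
    hourly_cost: float  # hypothetical cost, used only for ranking


def pick_node(nodes, memory_gb, cpus, gpus=0):
    """Return the cheapest node that satisfies the request, or None."""
    fitting = [n for n in nodes
               if n.memory_gb >= memory_gb
               and n.cpus >= cpus
               and n.gpus >= gpus]
    return min(fitting, key=lambda n: n.hourly_cost, default=None)


# Illustrative cluster with many CPU nodes and few GPU nodes.
cluster = [
    Node("cpu-small", memory_gb=16, cpus=4, gpus=0, hourly_cost=0.2),
    Node("cpu-large", memory_gb=128, cpus=32, gpus=0, hourly_cost=1.5),
    Node("gpu-small", memory_gb=64, cpus=8, gpus=1, hourly_cost=0.9),
    Node("gpu-large", memory_gb=256, cpus=32, gpus=4, hourly_cost=3.6),
]
choice = pick_node(cluster, memory_gb=32, cpus=8, gpus=1)
print(choice.name if choice else "no node fits")
```

With a CPU and a GPU version of each container available, the same step can be offered to either kind of node, which widens the set of nodes that fit.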
  • 26. What if we can break Physical Machine Limitations? Most cloud service providers and data centers are limited by physical constraints ◦ Example: the largest machine has 2 GPUs (Softlayer) or 4 GPUs (AWS) ◦ A rack can only hold a maximum number of GPUs due to power constraints. What if we could create virtual machines and clusters and present them to applications as a single virtual machine? How would this change clusters and schedulers? * Elastic Containers or Elastic Machines via Containers (grow or shrink)
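
The idea on slide 26 of presenting several physical machines as one larger virtual machine can be illustrated with a toy allocator that draws GPUs from multiple hosts until the request is met. This is only a bookkeeping sketch under assumed host names and GPU counts; it is not Bitfusion's implementation and says nothing about how the GPUs are actually exposed to the application.

```python
def compose_virtual_machine(free_gpus_by_host, gpus_wanted):
    """Pick GPUs from physical hosts to back one virtual machine.

    free_gpus_by_host maps a host name to its number of free GPUs.
    Returns a dict of host name -> GPUs taken from that host.
    Raises ValueError if the hosts cannot supply enough GPUs.
    """
    allocation = {}
    remaining = gpus_wanted
    # Prefer hosts with the most free GPUs to keep the host count low.
    for host, free in sorted(free_gpus_by_host.items(),
                             key=lambda item: item[1], reverse=True):
        if remaining == 0:
            break
        take = min(free, remaining)
        if take > 0:
            allocation[host] = take
            remaining -= take
    if remaining > 0:
        raise ValueError(f"short by {remaining} GPUs")
    return allocation


# Five hypothetical hosts with 4 GPUs each, as in the 4-GPU limit above.
hosts = {"host-a": 4, "host-b": 4, "host-c": 4, "host-d": 4, "host-e": 4}
virtual_machine = compose_virtual_machine(hosts, gpus_wanted=16)
print(virtual_machine)
```

Growing or shrinking the virtual machine then amounts to re-running the allocation with a different `gpus_wanted`.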
  • 27. Introduce Bitfusion Boost Containers. We can: ◦ Combine Bitfusion Boost and Containers -> create magic! What things can we build? ◦ Create a machine which has 16 or more virtual GPUs! ◦ Run an application across these GPUs without having to set up MPI, SPARK, HADOOP! ◦ Run GPU applications on non-GPU machines by automatically offloading to GPU machines in the cluster ◦ All of the above can be done WITHOUT CODE CHANGES for GPU-enabled applications!
  • 28. Boost Container Building Blocks. Boost Server Container ◦ Boost provisioned container with Boost Server ◦ Runs on any GPU provisioned host ◦ Can act as a client at the same time. Boost Client Container ◦ Boost provisioned container with Boost Client and End User Application ◦ Runs on any type of instance, including CPU-only instances
  • 30. Demo 2: Build Virtual GPU Instances in the Cloud Demo will show the following using containers: How in minutes we can create virtual GPU cluster configurations How we can provision GPU machines which don’t exist in the physical world How we can run GPU applications on non-GPU machines How we can execute applications across these configurations without changing a single line of code!
  • 31. On
  • 33. Thank You Visit Bitfusion Booth #731 @Bitfusionio | @subburama | subbu@bitfusion.io