#ibmedge © 2016 IBM Corporation
Enabling Cognitive Workloads on the Cloud:
GPUs with Mesos, Docker and Marathon on
POWER
Seetharami Seelam, IBM Research
Indrajit Poddar, IBM Systems
#ibmedge
Please Note:
• IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice
and at IBM’s sole discretion.
• Information regarding potential future products is intended to outline our general product direction and it
should not be relied on in making a purchasing decision.
• The information mentioned regarding potential future products is not a commitment, promise, or legal
obligation to deliver any material, code or functionality. Information about potential future products may not be
incorporated into any contract.
• The development, release, and timing of any future features or functionality described for our products
remains at our sole discretion.
• Performance is based on measurements and projections using standard IBM benchmarks in a controlled
environment. The actual throughput or performance that any user will experience will vary depending upon
many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the
I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be
given that an individual user will achieve results similar to those stated here.
1
#ibmedge
About Seelam
Expertise:
• 10+ years in large-scale, high-performance and
distributed systems
• Built multiple cloud services for IBM Bluemix:
autoscaling, business rules, containers, POWER
containers, and Deep Learning as a service
• Enabled and scaled Docker on POWER/Z for
extreme density (tens of thousands of
containers)
• Enabling GPUs in the cloud for container-based
workloads (Mesos/Kub/Docker)
2
Dr. Seetharami R. Seelam
Research Staff Member
IBM T. J. Watson Research Center
Yorktown Heights, NY
sseelam@us.ibm.com
Twitter: sseelam
#ibmedge
About Indrajit (a.k.a. I.P)
Expertise:
• Accelerated Cloud Data Services, Machine
Learning and Deep Learning
• Apache Spark, TensorFlow… with GPUs
• Distributed Computing (scale out and up)
• Cloud Foundry, Spectrum Conductor, Mesos,
Kubernetes, Docker, OpenStack, WebSphere
• Cloud computing on High Performance Systems
• OpenPOWER, IBM POWER
3
Indrajit Poddar
Senior Technical Staff Member,
Master Inventor, IBM Systems
ipoddar@us.ibm.com
Twitter: @ipoddar
#ibmedge
Agenda
• Introduction to Cognitive workloads and POWER
• Requirements for GPUs in the Cloud
• Mesos/GPU enablement
• Kubernetes/GPU enablement
• Demo of GPU usage with Docker on OpenPOWER to identify dog
breeds
• Machine and Deep Learning Ecosystem on OpenPOWER
• Summary and Next Steps
4
#ibmedge
Cognition
5
What you and I (our brains) do without even thinking about it… we recognize a bicycle
#ibmedge
Now machines are learning the way we learn….
6
From "Texture of the Nervous System
of Man and the Vertebrates" by
Santiago Ramón y Cajal.
Artificial Neural Networks
#ibmedge
But training needs a lot of computational resources
• Deep Learning model training is hard to distribute, even with easy scale-out frameworks
• Training can take hours, days or weeks
• Input data and model sizes are becoming larger than ever (e.g. video input, billions of
features, etc.)
• Real-time analytics adds to the pressure
Resulting in… unprecedented demand for offloaded computation, accelerators, and
higher memory bandwidth systems, just as Moore’s law is dying
#ibmedge
OpenPOWER: Open Hardware for High Performance
8
Systems designed for
big data analytics
and superior cloud economics
Up to:
12 cores per CPU
96 hardware threads per CPU
1 TB RAM
7.6 Tb/s combined I/O bandwidth
GPUs and FPGAs coming…
[Chart comparing OpenPOWER with traditional Intel x86; see
http://www.softlayer.com/bare-metal-search?processorModel[]=9]
#ibmedge
New OpenPOWER Systems with NVLink
9
S822LC-hpc “Minsky”:
2 POWER8 CPUs with 4 NVIDIA® Tesla® P100 GPUs, hooked directly to the CPUs using
NVIDIA’s NVLink high-speed interconnect
http://www-03.ibm.com/systems/power/hardware/s822lc-hpc/index.html
#ibmedge
Transparent acceleration for Deep Learning on
OpenPOWER and GPUs
10
Huge speed-ups
with GPUs and
OpenPOWER!
http://openpower.devpost.com/
Impressive acceleration examples:
• artNet Genre classifier
• Distributed TensorFlow for cancer detection
• Scale up and out genetics bioinformatics
• Full red blood cell modeling
• Accelerated ultrasound imaging
• Emergency service prediction
#ibmedge
Enabling Accelerators/GPUs in the cloud stack
11
[Stack diagram: Deep Learning apps, packaged as containers and images, scheduled by
clustering frameworks on top of accelerators]
#ibmedge
Requirements for GPUs in the Cloud
12
Function/Feature | Comments
GPUs exposed to Dockerized applications | Apps need access to /dev/nvidia* to use the GPUs
Support for NVIDIA GPUs | Both IBM Cloud and POWER systems support NVIDIA GPUs
Support Multiple GPUs per node | IBM Cloud machines have up to 2 K80s (4 GPUs); POWER nodes have many more
Containers require no GPU drivers | GPU drivers are huge, and drivers in a container create portability problems, so we need to support mounting GPU drivers into the container from the host (volume injection)
GPU Isolation | GPUs allocated to a workload should be invisible to other workloads
GPU Auto-discovery | The worker node agent automatically discovers the GPU types and counts and reports them to the scheduler
GPU Usage metrics | GPU utilization is critical for developers, so these metrics need to be exposed
Support for heterogeneous GPUs in a cluster (including app to pick a GPU type) | IBM Cloud has M60, K80, etc., and different workloads need different GPUs
GPU sharing | GPUs should be isolated between workloads, but sharable in some cases
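To make the auto-discovery requirement concrete, here is a minimal sketch (not the actual agent code): on Linux, each NVIDIA GPU appears as a /dev/nvidiaN device node next to control nodes such as /dev/nvidiactl, so an agent can enumerate GPUs by counting those nodes.

```shell
# Count per-GPU device nodes. The pattern /dev/nvidia[0-9]* matches
# /dev/nvidia0, /dev/nvidia1, ... but not /dev/nvidiactl or /dev/nvidia-uvm.
gpu_count=$(ls /dev/nvidia[0-9]* 2>/dev/null | wc -l | tr -d ' ')
echo "GPUs visible on this node: ${gpu_count}"
```

On a machine without GPUs this simply reports 0; real agents additionally query NVML for the GPU types and health.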
#ibmedge
NVIDIA Docker
13
Credit: https://github.com/NVIDIA/nvidia-docker
• A Docker wrapper and tools
to package and run GPU-based apps
• Uses drivers on the host
• Manual GPU assignment
• Good for single node
• Available on POWER
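A typical single-node invocation (nvidia-docker 1.x) selects GPUs manually through the NV_GPU environment variable. The image name and GPU indices below are illustrative, and the echo turns the final command into a dry run so the snippet works even without GPU hardware:

```shell
# Manual GPU assignment: expose only GPUs 0 and 1 to the container.
NV_GPU="0,1"
IMAGE="nvidia/cuda"   # any CUDA-enabled image built per nvidia-docker's guidelines
# Drop the echo on a real GPU host to actually run it.
echo NV_GPU=${NV_GPU} nvidia-docker run --rm ${IMAGE} nvidia-smi
```

Under the covers the wrapper injects the host driver volume and the /dev/nvidia* devices into a plain docker run call.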
#ibmedge
Mesos and Ecosystem
• Open-source cluster manager
• Enables siloed applications to be consolidated on a shared pool of resources, delivering:
• Rich framework ecosystem
• Emerging GPU support
14
#ibmedge
Mesos GPU support
(Joint work between Mesosphere, NVIDIA and IBM)
Credit: Kevin Klues, Mesosphere
Mesos support for GPUs (v1.1)
• Mesos will support GPUs in two different
containerizers
– Unified containerizer
• No docker support initially
• Remove Docker daemon from the node
– Docker containerizer
• Traditional executor for Docker
• Docker container based deployment
• Ongoing work
– Code to allocate GPUs at the node in the Docker
containerizer
– Code to share the state with unified containerizer
– Logic for node recovery (NVIDIA driving this work)
• Limitations
– No GPU sharing between Docker containers
– Limited GPU usage information exposed in the UI
– Slave recovery code will evolve over time
– NVIDIA GPUs only
#ibmedge
Implementation
• GPUs shared between the Mesos containerizer and the Docker containerizer
• mesos-docker-executor extended to handle device isolation/exposure through the Docker daemon
• Native Docker implementation for CPU/memory/GPU/GPU driver volume management
16
[Architecture diagram: the Nvidia GPU Isolator inside the Mesos Agent combines an
Nvidia GPU Allocator and an Nvidia Volume Manager, serving both the Mesos
Containerizer and the Docker Containerizer; the mesos-docker-executor drives the
Docker Daemon, which manages CPU, memory, GPU, and the GPU driver volume; a
Docker image label check looks for com.nvidia.volumes.needed="nvidia_driver"]
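The executor's volume-injection decision can be sketched as follows. The label name comes from the slide above; the driver volume name and version are illustrative assumptions, and the docker inspect command in the comment is how the label would be read on a real host:

```shell
# Inject the host's GPU driver volume only when the image asks for it via
# the com.nvidia.volumes.needed label. Hard-coded here; on a real host:
#   docker inspect --format '{{ index .Config.Labels "com.nvidia.volumes.needed" }}' <image>
needed="nvidia_driver"
extra_args=""
if [ "${needed}" = "nvidia_driver" ]; then
  # Illustrative volume name; the actual one depends on the installed driver version.
  extra_args="--volume=nvidia_driver_361.48:/usr/local/nvidia:ro"
fi
echo docker run ${extra_args} nvidia/cuda nvidia-smi
```

Keeping the drivers in a host-managed volume rather than in the image is what makes the GPU images portable across driver versions.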
#ibmedge
Mesos GPU monitor and Marathon on OpenPOWER
17
#ibmedge
Usage and Progress
• Usage
• Compile Mesos with the flag: ../configure --with-nvml=/nvml-header-path &&
make -j install
• Build GPU images following nvidia-docker:
(https://github.com/NVIDIA/nvidia-docker)
• Run a Docker task with an additional resource such as “gpus=1”
• Release
• Target release: 1.1
• GPU allocator for docker containerizer (code review)
• GPU isolation/exposure support for mesos-docker-executor (code review)
• GPU driver volume injection (under development)
18
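Once Mesos is built with NVML support as above, GPU scheduling can be smoke-tested from the command line. The invocation below follows the Mesos 1.x GPU documentation; the master address is illustrative, frameworks must opt in with the GPU_RESOURCES capability to receive GPU offers, and the echo keeps this a dry run:

```shell
# Ask the master for one GPU and run nvidia-smi in a Docker image.
MASTER="10.0.0.1:5050"   # illustrative master address
echo mesos-execute --master=${MASTER} --name=gpu-test \
  --docker_image=nvidia/cuda --command="nvidia-smi" \
  --framework_capabilities="GPU_RESOURCES" --resources="gpus:1"
```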
#ibmedge
Eco-system Activities
• Marathon
• GPU support for Mesos Containerizer in release 1.3
• GPU support for the Docker Containerizer ready for release (waiting for the
Mesos-side code merge)
19
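With that support in place, requesting a GPU from Marathon is a one-line addition to the app definition. A minimal sketch for the Mesos containerizer path (field layout per Marathon 1.3-era GPU support; the image and resource sizes are illustrative):

```json
{
  "id": "/gpu-test",
  "cmd": "nvidia-smi",
  "cpus": 1,
  "mem": 512,
  "gpus": 1,
  "container": {
    "type": "MESOS",
    "docker": { "image": "nvidia/cuda" }
  }
}
```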
#ibmedge
Kubernetes
• Open source
orchestration system for
Docker containers
• Handles scheduling onto
nodes in a compute
cluster
• Actively manages
workloads to ensure that
their state matches the
user’s declared intentions
• Emerging support for
GPUs
20
[Architecture diagram: a Kubernetes master (API server, Scheduler, Controller Manager;
HA mode supported) backed by an etcd cluster holding the cluster state, managing hosts
that each run a Kubelet/Proxy and a Docker Engine]
#ibmedge
Kubernetes GPU support
• Design Doc for GPU support in K8s has been out for a while
– https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/gpu-support.md
Function/Feature | Kubernetes Community | Our Contribution
GPUs exposed to Dockerized applications | Yes |
Support for NVIDIA GPUs | Yes |
Support Multiple GPUs per node | Yes, a PR is pending* |
Containers require no GPU drivers | No | PoC complete
GPU Isolation | Yes |
GPU Auto-discovery | No | Future
GPU Usage metrics | No | Future
Support for heterogeneous GPUs in a cluster (including app to pick a GPU type) | No | Future
GPU sharing | No | Future
*GPU on Kubernetes updates in community: https://github.com/kubernetes/kubernetes/pull/28216
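In the Kubernetes of this era (v1.3+), a pod asks for a GPU through the alpha resource name shown below. This is a minimal sketch following the community design doc; the image is illustrative, and it assumes the node has GPUs with drivers handled outside the container (the PoC above):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda
    command: ["nvidia-smi"]
    resources:
      limits:
        alpha.kubernetes.io/nvidia-gpu: 1
```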
#ibmedge
Status of GPUs in Mesos and Kubernetes
22
Function/Feature | NVIDIA Docker | Mesos | Kubernetes
GPUs exposed to Dockerized applications | ✔ | ✔ | ✔
Support for NVIDIA GPUs | ✔ | ✔ | ✔
Support Multiple GPUs per node | ✔ | ✔ | ✔
Containers require no GPU drivers | ✔ | ✔ | Future
GPU Isolation | ✔ | ✔ | ✔
GPU Auto-discovery | Future | Future | ✔
GPU Usage metrics | ✔ | Future | Future
Support for heterogeneous GPUs in a cluster (including app to pick a GPU type) | ✘ | Future | Future
GPU sharing | ✔ (not controlled) | Future | Future
© 2016 IBM Corporation #ibmedge
Demo
23
#ibmedge
Machine Learning and Deep Learning analytics on
OpenPOWER
No code changes needed!!
24
ATLAS (Automatically Tuned Linear Algebra
Software)
#ibmedge
Learn More and Get Started…
25
Power-Efficient Machine Learning on
POWER Systems using FPGA Acceleration
Machine and Deep Learning on Power Systems
Register for a SuperVessel account and take the deep learning
notebooks running in Docker containers for a spin!
https://ny1.ptopenlab.com/bigdata_cluster
#ibmedge
Summary and Next Steps
• Cognitive, Machine and Deep Learning workloads are everywhere
• OpenPOWER and Accelerators will help speed up these workloads
• Containers can be leveraged with accelerators for agile deployment of these new
workloads
• Docker, Mesos and Kubernetes are making rapid progress to support accelerators
• OpenPOWER and this emerging cloud stack make it the preferred platform for Cognitive
workloads
#ibmedge
Special Thanks to Collaborators
• Kevin Klues, Mesosphere
• Yu Bo Li, IBM
• Rajat Phull, NVIDIA
• Guangya Liu, IBM
• Qian Zhang, IBM
• Benjamin Mahler, Mesosphere
• Vikrama Ditya, NVIDIA
• Yong Feng, IBM
• Christy L Norman Perez, IBM
• Kubernetes Team
© 2016 IBM Corporation #ibmedge
Thank You
Seelam – sseelam@us.ibm.com
IP - ipoddar@us.ibm.com
© 2016 IBM Corporation #ibmedge
Backup
29
#ibmedge
Notices and Disclaimers
30
Copyright © 2016 by International Business Machines Corporation (IBM). No part of this document may be reproduced or transmitted in any form without written permission
from IBM.
U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM.
Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of the date of
initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. THIS DOCUMENT IS
DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IN NO EVENT SHALL IBM BE LIABLE FOR ANY DAMAGE ARISING FROM THE
USE OF THIS INFORMATION, INCLUDING BUT NOT LIMITED TO, LOSS OF DATA, BUSINESS INTERRUPTION, LOSS OF PROFIT OR LOSS OF OPPORTUNITY.
IBM products and services are warranted according to the terms and conditions of the agreements under which they are provided.
IBM products are manufactured from new parts or new and used parts. In some cases, a product may not be new and may have been previously installed. Regardless, our
warranty terms apply.
Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice.
Performance data contained herein was generally obtained in a controlled, isolated environments. Customer examples are presented as illustrations of how those customers
have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary.
References in this document to IBM products, programs, or services does not imply that IBM intends to make such products, programs or services available in all countries in
which IBM operates or does business.
Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM. All materials
and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute legal or other guidance or advice to any individual participant or
their specific situation.
It is the customer’s responsibility to insure its own compliance with legal requirements and to obtain advice of competent legal counsel as to the identification and
interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any actions the customer may need to take to comply with such
laws. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the customer is in compliance with any law
#ibmedge
Notices and Disclaimers Con’t.
31
Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not
tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products.
Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the
ability of any such third-party products to interoperate with IBM’s products. IBM EXPRESSLY DISCLAIMS ALL WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING BUT
NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual
property right.
IBM, the IBM logo, ibm.com, Aspera®, Bluemix, Blueworks Live, CICS, Clearcase, Cognos®, DOORS®, Emptoris®, Enterprise Document Management System™, FASP®,
FileNet®, Global Business Services ®, Global Technology Services ®, IBM ExperienceOne™, IBM SmartCloud®, IBM Social Business®, Information on Demand, ILOG,
Maximo®, MQIntegrator®, MQSeries®, Netcool®, OMEGAMON, OpenPower, PureAnalytics™, PureApplication®, pureCluster™, PureCoverage®, PureData®,
PureExperience®, PureFlex®, pureQuery®, pureScale®, PureSystems®, QRadar®, Rational®, Rhapsody®, Smarter Commerce®, SoDA, SPSS, Sterling Commerce®,
StoredIQ, Tealeaf®, Tivoli®, Trusteer®, Unica®, urban{code}®, Watson, WebSphere®, Worklight®, X-Force® and System z® Z/OS, are trademarks of International Business
Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM
trademarks is available on the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml.