Red Hat and the NVIDIA DGX: Tried, Tested, Trusted
NVIDIA GTC 2019
Jeremy Eder, Andre Beausoleil, Red Hat
NVIDIA GTC 2019: Red hat and the NVIDIA DGX Tried, Tested, Trusted
Agenda
● Red Hat + NVIDIA Partnership Overview
● Announcements / What’s New
● OpenShift + GPU Integration Details
● GPU accelerated workloads in the enterprise
○ AI/ML and HPC
● Deploy and manage NGC containers
○ On-prem or public cloud
● Managing virtualized resources in the data center
○ vGPU for technical workstation
● Fast deployment of GPU resources with Red Hat
○ Easy to use driver framework
Where Red Hat Partners with NVIDIA
Red Hat/NVIDIA Technology Partnership Timeline
● May ’17: NVIDIA GTC 2017: Red Hat vGPU Roadmap Update
● Nov ’17: STAC-A2 Benchmark (NVIDIA/HPE/RHEL - STAC Conf NYC, RH & NVIDIA blogs)
● Nov ’17: SC2017: RH/NVIDIA (booth demos, talks)
● Mar ’18: 2018 Rice Oil & Gas HPC Conf (vGPU/RHV)
● Mar ’18: NVIDIA GTC 2018 & Kubernetes WG mtg: RH vGPU & Kubernetes sessions, RH sponsorship
● Apr ’18: LSF & MM Summit: Nouveau Driver demo
● May ’18: RH Summit: AI booth, OpenShift Partner Theatre & RH AI/ML Strategy sessions
● May ’18: RHV 4.2 / vGPU 6.1 & CUDA 9.2 announcement
● Jun ’18: vGPU/RHV joint webinar: Oil & Gas use case
● Oct ’18: NVIDIA GTC DC: RHEL & OpenShift Certification on DGX-1
● Dec ’18: OpenShift Commons/KubeCon: Deep Learning on OpenShift w/GPUs
● Mar ’19: NVIDIA GTC 2019: RHEL & OpenShift Certification on DGX-2 / T4 GPU Server Configs, RH sponsorship
● Red Hat Enterprise Linux Certification on DGX-1 & DGX-2 systems
○ Support for the Kubernetes-based OpenShift Container Platform
○ NVIDIA GPU Cloud (NGC) containers to run on RHEL and OpenShift
● Red Hat’s OpenShift provides advanced ways of managing hardware to best
leverage GPUs in container environments
● NVIDIA developed precompiled driver packages to simplify GPU
deployments on Red Hat products
● NVIDIA’s latest T4 GPUs are available on Red Hat Enterprise Linux
○ T4 Server with RHEL support from most major OEM server vendors
○ T4 servers are “NGC-Ready” to run GPU containers
Red Hat + NVIDIA: What’s New?
• Heterogeneous Memory Management (HMM)
• Memory management between device and CPU
• Nouveau Driver
• Graphics device driver for NVIDIA GPU
• Mediated Devices (mdev)
• Enabling vGPU through the Linux kernel framework
• Kubernetes Device Plugins
• Fast and direct access to GPU hardware
• Run GPU enabled containers in Kubernetes cluster
Open Source Projects
Red Hat + NVIDIA: Open Source Collaboration
Red Hat OpenShift Container Platform
OPENSHIFT - CONTAINER PLATFORM FOR AI
Enable Kubernetes clusters to seamlessly run accelerated AI workloads in containers
Red Hat is delivering required functionality to efficiently run
AI/ML workloads on OpenShift
● 3.10, 3.11
○ Device plugins provide access to FPGAs, GPGPUs, SoC and
other specialized HW to applications running in containers
○ CPU Manager provides containers with exclusive access to
compute resources, like CPU cores, for better utilization
○ Huge Pages Support enables containers with large
memory requirements to run more efficiently
● 4.0
○ Multi-network feature allows more than one network
interface per container for better traffic management
[Diagram: GPU-enabled server with Red Hat Enterprise Linux and OpenShift Container Platform (OCP). An OCP master on Red Hat Enterprise Linux (API/authentication, data store, scheduler, health/scaling) manages several RHEL-based OCP nodes, each running containers.]
One Platform to...
OpenShift is the single platform
to run any application:
● Old or new
● Monolithic/Microservice
[Diagram labels: Big Data, NFV, FSI, Animation, ISVs, HPC, Machine Learning]
Data Scientist User Experience (Service Catalog)
● Resource Management Working Group
○ Features Delivered
■ Device Plugins (GPU/Bypass/FPGA)
■ CPU Manager (exclusive cores)
■ Huge Pages Support
○ Extensive Roadmap
● Intel, IBM, Google, NVIDIA, Red Hat, many more...
Upstream First: Kubernetes Working Groups
● Network Plumbing Working Group
○ Formalized Dec 2017
● Implemented a multi-network specification:
https://github.com/K8sNetworkPlumbingWG/multi-net-spec
(collection of CRDs for multiple networks, owned by sig-network)
● Reference Design implemented in Multus CNI by Red Hat
● Separate control- and data-plane, Overlapping IPs, Fast Data-plane
● IBM, Intel, Red Hat, Huawei, Cisco, Tigera...at least.
Upstream First: Kubernetes Working Groups
GPU Cluster Topology
What does an OpenShift (OCP) Cluster look like?
[Diagram: a control plane of three master and etcd nodes, infrastructure consisting of three registry and router nodes plus a load balancer (LB), and a pool of compute and GPU nodes.]
● How to enable software to take advantage of “special”
hardware
● Create Node Pools
○ MachineSets
○ Mark them as “special”
○ Taints/Tolerations
○ Priority/Preemption
○ ExtendedResourceToleration
OpenShift Cluster Topology
● How to enable software to take advantage of “special”
hardware
● Tune/Configure the OS
○ Tuned Profiles
○ CPU Isolation
○ sysctls
OpenShift Cluster Topology
● How to enable software to take advantage of “special”
hardware
● Optimize your workload
○ Dedicate CPU cores
○ Consume hugepages
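A minimal sketch of a pod that does both, assuming the kubelet on the node runs the CPU Manager with the static policy and that 2 MiB huge pages are pre-allocated on the node. The pod name and image are hypothetical placeholders. Exclusive cores are only granted to containers in the Guaranteed QoS class that request a whole number of CPUs, so requests and limits are set to the same values here.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tuned-workload                          # hypothetical name
spec:
  containers:
  - name: app
    image: registry.example.com/my-app:latest   # placeholder image
    resources:
      requests:
        cpu: "4"              # whole number of CPUs, required for exclusive cores
        memory: 8Gi
        hugepages-2Mi: 1Gi    # huge page requests must equal limits
      limits:
        cpu: "4"
        memory: 8Gi
        hugepages-2Mi: 1Gi
    volumeMounts:
    - name: hugepages
      mountPath: /dev/hugepages
  volumes:
  - name: hugepages
    emptyDir:
      medium: HugePages       # backs the mount with the node's pre-allocated huge pages
```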
OpenShift Cluster Topology
● How to enable software to take advantage of “special”
hardware
● Enable the Hardware
○ Install drivers
○ Deploy Device Plugin
○ Deploy monitoring
OpenShift Cluster Topology
● How to enable software to take advantage of “special”
hardware
● Consume the Device
○ Kubeflow template deployment
OpenShift Cluster Topology
Support Components
Cluster Node Tuning Operator (tuned)
OpenShift node-level tuning operator
● Consolidate/Centralize node-level tuning
(openshift-ansible)
● Set tunings for Elastic/Router/SDN
● Add more flexibility to add custom tuning
specified by customers
● NVIDIA DGX-1 & DGX-2 Tuned Profiles
Node Feature Discovery Operator (NFD)
● Git Repos:
○ Upstream
○ Downstream
● Client/Server model
● Customize with “hooks”
Labels: feature.node.kubernetes.io/cpu-hardware_multithreading=true
feature.node.kubernetes.io/cpuid-AVX2=true
feature.node.kubernetes.io/cpuid-SSE4.2=true
feature.node.kubernetes.io/kernel-selinux.enabled=true
feature.node.kubernetes.io/kernel-version.full=3.10.0-957.5.1.el7.x86_64
feature.node.kubernetes.io/pci-0300_10de.present=true
feature.node.kubernetes.io/storage-nonrotationaldisk=true
feature.node.kubernetes.io/system-os_release.ID=rhcos
feature.node.kubernetes.io/system-os_release.VERSION_ID=4.0
feature.node.kubernetes.io/system-os_release.VERSION_ID.major=4
feature.node.kubernetes.io/system-os_release.VERSION_ID.minor=0
Steer workloads based on infrastructure capabilities
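For example, a pod can be steered onto nodes where NFD found an NVIDIA PCI device by selecting on the pci-0300_10de.present label listed above (0300 is the PCI class for a VGA-compatible display controller, 10de is NVIDIA's PCI vendor ID). A minimal sketch; the pod name and image are hypothetical placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: needs-nvidia-node                       # hypothetical name
spec:
  nodeSelector:
    # Label published by Node Feature Discovery; the value is a string, so it is quoted
    feature.node.kubernetes.io/pci-0300_10de.present: "true"
  containers:
  - name: app
    image: registry.example.com/my-app:latest   # placeholder image
```

A nodeSelector only restricts where the pod may land; it does not reserve a GPU. That is the job of the device plugin's extended resource.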
https://github.com/intel/multus-cni
NFV Partner Engineering along with
the Network Plumbing Working
Group is using Multus as part of a
reference implementation.
Multus CNI is a “meta plugin” for
Kubernetes CNI which enables one
to attach multiple network interfaces
on each pod. It allows one to assign
a CNI plugin to each interface
created in the pod.
THE PROBLEM (Today)
#1 Each pod has only one network interface (for example, Pod A with eth0 on flannel)
#2 Each Kubernetes master/node has only one static CNI configuration
THE SOLUTION (Today)
● The static CNI configuration on each Kubernetes master/node points to Multus
● Each subsequent CNI plugin, as called by Multus, has configurations which are defined in CRD objects
● Pod C (eth0 on flannel, net0 on macvlan) asks for its networks through a pod annotation: “I’d like a flannel interface, and a macvlan interface please.”
● Multus answers: “Sure thing bud, I’ll pull up the configurations stored in CRD objects.”
WHAT MULTUS DOES
● Pod without Multus: Kubernetes calls the OpenShift SDN CNI (default), which provides eth0
● Pod with Multus: Kubernetes calls Multus CNI, which calls the OpenShift SDN CNI (default) for eth0 and the macvlan CNI for net0
The specification uses
annotations to call out a list of
intended network attachments
as “sidecar networks”
Pod annotations (these map to the CRD object below):
apiVersion: v1
kind: Pod
metadata:
  name: pod-c
  annotations:
    kubernetes.cni.cncf.io/networks: '[
      { "name": "flannel-conf" },
      { "name": "macvlan-conf" }
    ]'
spec:
  containers: [...]
CRD object (standardized CRD):
Name:         macvlan-conf
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  cni.cncf.io/v1
Args:         [ { "master": "eth0", "mode": "bridge", ...
Kind:         Network
Plugin:       macvlan
Metadata:     [...]
CNI network configurations are packed inside CRD objects.
As currently proposed by Network Plumbing Working Group.
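The annotation key and CRD kind above are the ones proposed at the time of this talk. As a hedged sketch, assuming the version of the specification later published by the working group (CRD kind NetworkAttachmentDefinition in the k8s.cni.cncf.io/v1 API group, pod annotation key k8s.v1.cni.cncf.io/networks), the same macvlan attachment could look like this. The IPAM settings and image are illustrative placeholders, not taken from the slides.

```yaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: macvlan-conf
spec:
  # The CNI configuration is carried as a JSON string inside the CRD object
  config: '{
    "cniVersion": "0.3.1",
    "type": "macvlan",
    "master": "eth0",
    "mode": "bridge",
    "ipam": { "type": "host-local", "subnet": "192.168.1.0/24" }
  }'
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-c
  annotations:
    # Requests one additional interface from the attachment defined above
    k8s.v1.cni.cncf.io/networks: macvlan-conf
spec:
  containers:
  - name: app
    image: registry.example.com/my-app:latest   # placeholder image
```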
Installation and Day 2 Management
of NVIDIA GPUs in OpenShift 4
Roadmap: Operationalizing GPUs on OpenShift 4
Roadmap: Specialized Hardware in OpenShift 4
machine-config-operator
special-resource-operator (NIC)
prometheus/grafana dashboards
openshift-multus daemonset
cluster-network-operator
cluster-node-tuning-operator
cluster-nfd-operator
special-resource-operator (GPU)
machine-api-operator
Roadmap: Special Resource Operator
● Special Resource Operator (DaemonSet) on each OpenShift node (GPU | FPGA | NIC | other):
○ Driver Container (optional)
○ Device Plugin Container
○ Monitoring Container (Prometheus endpoint)
● Node Feature Discovery Operator on each OpenShift node (GPU | FPGA | NIC | other)
● Cluster Node Tuning Operator (next gen tuned)
● Node Object:
    Labels:
      feature.node.kubernetes.io/pci-0300_10de.present=true
    Capacity:
      example.com/gpu: 4
Blue Boxes: owned, supported, shipped by Red Hat
Green Boxes: owned, supported, shipped by Partner
Soft or Hard Shared Cluster Partitioning?
Priority and Preemption
● Create PriorityClasses based on business goals
● Set priorityClassName in pod specs
● If all GPUs are used
○ A high prio pod is queued
○ A low prio pod is running
○ Kube will preempt low prio pod
■ And schedule high prio pod
● Ensures optimal density
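A minimal sketch of these steps. The class names, priority values, pod name and image are hypothetical; the scheduling.k8s.io/v1 API version assumes Kubernetes 1.14 or later (earlier releases served PriorityClass under beta API versions).

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: gpu-high                 # hypothetical class for business-critical jobs
value: 1000000
globalDefault: false
description: "Business-critical GPU jobs"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: gpu-low                  # hypothetical class for jobs that may be preempted
value: 1000
globalDefault: false
description: "Best-effort GPU jobs"
---
apiVersion: v1
kind: Pod
metadata:
  name: training-job             # hypothetical name
spec:
  priorityClassName: gpu-high    # references the PriorityClass above
  containers:
  - name: trainer
    image: registry.example.com/trainer:latest   # placeholder image
    resources:
      limits:
        nvidia.com/gpu: 1
```

If every GPU is held by gpu-low pods when this pod is created, the scheduler can evict one of them to make room for it.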
Taints and Tolerations
● Taints are “node labels with policies”
○ You can taint a node like nvidia.com/gpu=value:NoSchedule
● Then a pod will have to “tolerate” the
nvidia.com/gpu taint, otherwise it won’t run
on that node.
● This allows you to create “node pools”
● Could lead to under-utilized resources
● Might make sense for security or business
rules
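A sketch of a pod that tolerates the taint above, so it may be scheduled onto the GPU node pool. This assumes the nodes were tainted with the key nvidia.com/gpu and the NoSchedule effect; the pod name and image are hypothetical placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-tolerant-pod         # hypothetical name
spec:
  tolerations:
  - key: nvidia.com/gpu
    operator: Exists             # matches the taint regardless of its value
    effect: NoSchedule
  containers:
  - name: app
    image: registry.example.com/my-app:latest   # placeholder image
    resources:
      limits:
        nvidia.com/gpu: 1
```

A toleration only permits scheduling on tainted nodes. It does not attract the pod to them; here the nvidia.com/gpu resource limit is what confines the pod to nodes that advertise GPUs.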
Enforcing Quota on GPUs (per namespace)
Create a quota on a namespace
# cat gpu-quota.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota
  namespace: nvidia
spec:
  hard:
    requests.nvidia.com/gpu: 1
Verify the quota is set
# oc describe quota gpu-quota -n nvidia
Name:                    gpu-quota
Namespace:               nvidia
Resource                 Used  Hard
--------                 ----  ----
requests.nvidia.com/gpu  0     1
Expected message when exceeding quota
# oc create -f gpu-pod.yaml
Error from server (Forbidden): error when creating "gpu-pod.yaml": pods "gpu-pod-f7z2w" is forbidden: exceeded quota: gpu-quota, requested: requests.nvidia.com/gpu=1, used: requests.nvidia.com/gpu=1, limited: requests.nvidia.com/gpu=1
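The gpu-pod.yaml used above is not shown in the slides. A minimal sketch of what such a file might contain, assuming a pod that requests one GPU through the nvidia.com/gpu extended resource. The generateName prefix is inferred from the pod name in the error message; the image and command are hypothetical placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  generateName: gpu-pod-         # server appends a random suffix, e.g. gpu-pod-f7z2w
  namespace: nvidia
spec:
  restartPolicy: OnFailure
  containers:
  - name: gpu-workload
    image: registry.example.com/cuda-sample:latest   # placeholder image
    command: ["nvidia-smi"]                          # assumes the image ships nvidia-smi
    resources:
      limits:
        nvidia.com/gpu: 1        # the request defaults to the limit and counts against the quota
```

With a hard limit of 1, the first such pod is admitted and a second one created while the first still exists is rejected with the message shown.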
● Red Hat and NVIDIA are collaborating to improve the user experience of
NVIDIA's drivers and CUDA Toolkit on RHEL and OpenShift
● Easier install/upgrade through upcoming changes to the driver packaging
(e.g., no DKMS required anymore)
○ Let us know if you are interested in a tech preview!
● Improved coordination between NVIDIA and Red Hat regarding testing,
release processes, and support
● High-level goal is to make NVIDIA's driver feel more like a normal in-box
driver
NVIDIA Driver Packaging
Thank You!
● Come see us @ Booth 716
● Accompanying Announcement blog
plus.google.com/+RedHat
linkedin.com/company/red-hat
youtube.com/user/RedHatVideos
facebook.com/redhatinc
twitter.com/RedHat
Demo
Link
1. Show no driver
2. Show node labels
3. Show nfd operator and node label differences, focus on PCI row (CPU node and GPU node)
4. Show GPU operator create and tail operator logs
5. Show oc describe node (nvidia.com/gpu=X)
6. Taints and Tolerations? Show oc describe node focus on taints (nvidia.com/gpu:NoSchedule)
7. Priority/Preemption Show oc get priorityclasses
8. GPU workload demo…

Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 

NVIDIA GTC 2019: Red Hat and the NVIDIA DGX: Tried, Tested, Trusted

  • 1. Red Hat and the NVIDIA DGX: Tried, Tested, Trusted NVIDIA GTC 2019 Jeremy Eder, Andre Beausoleil, Red Hat
  • 2. Agenda ● Red Hat + NVIDIA Partnership Overview ● Announcements / What’s New ● OpenShift + GPU Integration Details
  • 3. Where Red Hat Partners with NVIDIA ● GPU-accelerated workloads in the enterprise ○ AI/ML and HPC ● Deploy and manage NGC containers ○ On-prem or public cloud ● Managing virtualized resources in the data center ○ vGPU for technical workstations ● Fast deployment of GPU resources with Red Hat ○ Easy-to-use driver framework
  • 4. Red Hat/NVIDIA Technology Partnership Timeline (events and date labels in slide order): Nvidia GTC 2017: Red Hat vGPU Roadmap Update; May’17; STAC-A2 Benchmark (Nvidia/HPE/RHEL-STAC Conf NYC, RH & Nvidia Blogs); Nov’17; 2018 Rice Oil & Gas HPC Conf (vGPU/RHV); Mar’18; Mar’18; Nov’17; SC2017 RH/Nvidia (booth demos, talks); LSF & MM Summit: Nouveau Driver demo; May’18; Nvidia GTC 2018 & Kubernetes WG mtg - RH vGPU & Kubernetes sessions, RH sponsorship; RH Summit - AI booth & OpenShift Partner Theatre & RH AI/ML Strategy sessions; RHV4.2/vGPU 6.1 & CUDA9.2 Annc.; May’18; vGPU/RHV - Joint Webinar - Oil & Gas Use Case; Jun’18; Oct’18; NVIDIA GTC DC; RHEL & OpenShift Certification on DGX-1; Dec’18; OpenShift Commons/KubeCon; Deep Learning on OpenShift w/GPUs; Mar’19; NVIDIA GTC 2019; RHEL & OpenShift Certification on DGX-2 / T4 GPU Server Configs, RH Sponsorship; Apr’18
  • 5. Red Hat + NVIDIA: What’s New? ● Red Hat Enterprise Linux Certification on DGX-1 & DGX-2 systems ○ Support for the Kubernetes-based OpenShift Container Platform ○ NVIDIA GPU Cloud (NGC) containers run on RHEL and OpenShift ● Red Hat’s OpenShift provides advanced ways of managing hardware to best leverage GPUs in container environments ● NVIDIA developed precompiled driver packages to simplify GPU deployments on Red Hat products ● NVIDIA’s latest T4 GPUs are available on Red Hat Enterprise Linux ○ T4 servers with RHEL support from most major OEM server vendors ○ T4 servers are “NGC-Ready” to run GPU containers
  • 6. Red Hat + NVIDIA: Open Source Collaboration. Open Source Projects ● Heterogeneous Memory Management (HMM) ○ Memory management between device and CPU ● Nouveau Driver ○ Graphics device driver for NVIDIA GPU ● Mediated Devices (mdev) ○ Enabling vGPU through the Linux kernel framework ● Kubernetes Device Plugins ○ Fast and direct access to GPU hardware ○ Run GPU-enabled containers in a Kubernetes cluster
  • 7. Red Hat OpenShift Container Platform
  • 8. OPENSHIFT - CONTAINER PLATFORM FOR AI: Enable Kubernetes clusters to seamlessly run accelerated AI workloads in containers. Red Hat is delivering required functionality to efficiently run AI/ML workloads on OpenShift ● 3.10, 3.11 ○ Device plugins provide access to FPGAs, GPGPUs, SoC and other specialized HW to applications running in containers ○ CPU Manager provides containers with exclusive access to compute resources, like CPU cores, for better utilization ○ Huge Pages Support enables containers with large memory requirements to run more efficiently ● 4.0 ○ Multi-network feature allows more than one network interface per container for better traffic management [Diagram: GPU-enabled server with Red Hat Enterprise Linux and OpenShift Container Platform (OCP): an OCP master on Red Hat Enterprise Linux (API/authentication, data store, scheduler, health/scaling) and RHEL OCP nodes running containers]
  • 9. One Platform to... OpenShift is the single platform to run any application: ● Old or new ● Monolithic/Microservice [Workload categories shown: Big Data, NFV, FSI, Animation, ISVs, HPC, Machine Learning]
  • 10. Data Scientist User Experience (Service Catalog)
  • 11. Upstream First: Kubernetes Working Groups ● Resource Management Working Group ○ Features Delivered ■ Device Plugins (GPU/Bypass/FPGA) ■ CPU Manager (exclusive cores) ■ Huge Pages Support ○ Extensive Roadmap ● Intel, IBM, Google, NVIDIA, Red Hat, many more...
  • 12. Upstream First: Kubernetes Working Groups ● Network Plumbing Working Group ○ Formalized Dec 2017 ● Implemented a multi-network specification: https://github.com/K8sNetworkPlumbingWG/multi-net-spec (collection of CRDs for multiple networks, owned by sig-network) ● Reference design implemented in Multus CNI by Red Hat ● Separate control and data planes, overlapping IPs, fast data plane ● IBM, Intel, Red Hat, Huawei, Cisco, Tigera, and others
  • 14. What does an OpenShift (OCP) Cluster look like? [Diagram: Control Plane (three “master and etcd” nodes); Infrastructure (three “registry and router” nodes and an LB); Compute and GPU Nodes (four GPU nodes)]
  • 15. OpenShift Cluster Topology ● How to enable software to take advantage of “special” hardware ● Create Node Pools ○ MachineSets ○ Mark them as “special” ○ Taints/Tolerations ○ Priority/Preemption ○ ExtendedResourceToleration [Diagram: Compute and GPU Nodes]
  • 16. OpenShift Cluster Topology ● How to enable software to take advantage of “special” hardware ● Tune/Configure the OS ○ Tuned Profiles ○ CPU Isolation ○ sysctls [Diagram: Compute and GPU Nodes]
  • 17. OpenShift Cluster Topology ● How to enable software to take advantage of “special” hardware ● Optimize your workload ○ Dedicate CPU cores ○ Consume hugepages [Diagram: Compute and GPU Nodes]
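
The two workload optimizations on this slide can be sketched as a single pod spec. This is a minimal illustration, not something shown in the deck: it assumes the kubelet CPU Manager runs with the static policy and that the node has 2Mi huge pages pre-allocated. The pod name, image, and sizes are hypothetical.

```yaml
# Sketch only: names, image, and sizes are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: pinned-hugepages-example          # hypothetical name
spec:
  containers:
  - name: workload
    image: registry.example.com/ml-workload:latest   # hypothetical image
    resources:
      # Whole-number CPUs with requests == limits place the pod in the
      # Guaranteed QoS class, which the static CPU Manager policy
      # requires before it assigns exclusive cores.
      requests:
        cpu: "4"
        memory: 8Gi
        hugepages-2Mi: 1Gi
      limits:
        cpu: "4"
        memory: 8Gi
        hugepages-2Mi: 1Gi
    volumeMounts:
    - name: hugepages
      mountPath: /dev/hugepages
  volumes:
  - name: hugepages
    emptyDir:
      medium: HugePages                    # backs the mount with huge pages
```

A fractional CPU value such as `3500m` would still schedule, but the container would then share cores rather than own them.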
  • 18. OpenShift Cluster Topology ● How to enable software to take advantage of “special” hardware ● Enable the Hardware ○ Install drivers ○ Deploy Device Plugin ○ Deploy monitoring [Diagram: Compute and GPU Nodes]
  • 19. OpenShift Cluster Topology ● How to enable software to take advantage of “special” hardware ● Consume the Device ○ KubeFlow Template deployment [Diagram: Compute and GPU Nodes]
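
Once a device plugin advertises GPUs, a workload consumes one by naming the extended resource in its limits. The sketch below uses a plain batch Job rather than the KubeFlow template mentioned on the slide, and it assumes the `nvidia.com/gpu` resource name that appears on the quota slide later in the deck. The Job name and image are hypothetical.

```yaml
# Sketch only: Job name and image are illustrative assumptions.
apiVersion: batch/v1
kind: Job
metadata:
  name: gpu-smoke-test                     # hypothetical name
spec:
  backoffLimit: 0                          # do not retry a failed run
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: cuda-workload
        image: registry.example.com/cuda-sample:latest   # hypothetical image
        resources:
          limits:
            # Extended resources are requested in whole units; the
            # scheduler only places the pod on a node with a free GPU.
            nvidia.com/gpu: 1
```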
  • 21. Cluster Node Tuning Operator (tuned): OpenShift node-level tuning operator ● Consolidate/Centralize node-level tuning (openshift-ansible) ● Set tunings for Elastic/Router/SDN ● Add more flexibility to add custom tuning specified by customers ● NVIDIA DGX-1 & DGX-2 Tuned Profiles
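
A custom tuning for GPU nodes might be expressed as a Tuned custom resource along these lines. This is a sketch under assumptions: the API group/version and namespace follow the Node Tuning Operator documentation for later OpenShift 4 releases and may differ in the release discussed here, the profile name is hypothetical, and the sysctl value is a placeholder rather than the contents of the DGX profiles, which the deck does not show.

```yaml
# Sketch only: profile name and sysctl value are illustrative assumptions,
# not the DGX-1/DGX-2 profiles referenced on the slide.
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: gpu-node-example                   # hypothetical name
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - name: gpu-node-example
    data: |
      [main]
      summary=Example custom profile for GPU nodes
      include=openshift-node
      [sysctl]
      vm.swappiness=10
  recommend:
  # Apply the profile to nodes carrying the NFD label for an NVIDIA
  # PCI device (the label shown on the NFD slide).
  - match:
    - label: feature.node.kubernetes.io/pci-0300_10de.present
      value: "true"
    priority: 20
    profile: gpu-node-example
```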
  • 22. Node Feature Discovery Operator (NFD): Steer workloads based on infrastructure capabilities ● Git Repos: ○ Upstream ○ Downstream ● Client/Server model ● Customize with “hooks”
    Labels:
      feature.node.kubernetes.io/cpu-hardware_multithreading=true
      feature.node.kubernetes.io/cpuid-AVX2=true
      feature.node.kubernetes.io/cpuid-SSE4.2=true
      feature.node.kubernetes.io/kernel-selinux.enabled=true
      feature.node.kubernetes.io/kernel-version.full=3.10.0-957.5.1.el7.x86_64
      feature.node.kubernetes.io/pci-0300_10de.present=true
      feature.node.kubernetes.io/storage-nonrotationaldisk=true
      feature.node.kubernetes.io/system-os_release.ID=rhcos
      feature.node.kubernetes.io/system-os_release.VERSION_ID=4.0
      feature.node.kubernetes.io/system-os_release.VERSION_ID.major=4
      feature.node.kubernetes.io/system-os_release.VERSION_ID.minor=0
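
Steering a workload with these labels can be as simple as a `nodeSelector`. The sketch below selects nodes where NFD found a PCI device of class 0300 from vendor 10de (NVIDIA's PCI vendor ID), using the label exactly as listed on the slide. The pod name and image are hypothetical.

```yaml
# Sketch only: pod name and image are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: nfd-steered-example                # hypothetical name
spec:
  nodeSelector:
    # Label values are strings, so "true" must be quoted in YAML.
    feature.node.kubernetes.io/pci-0300_10de.present: "true"
  containers:
  - name: workload
    image: registry.example.com/ml-workload:latest   # hypothetical image
```

A selector only constrains placement; it does not reserve a GPU. Reserving one still requires a resource limit on the extended resource advertised by the device plugin.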
  • 23. Multus CNI (https://github.com/intel/multus-cni): NFV Partner Engineering, along with the Network Plumbing Working Group, is using Multus as part of a reference implementation. Multus CNI is a “meta plugin” for Kubernetes CNI which enables one to attach multiple network interfaces on each pod. It allows one to assign a CNI plugin to each interface created in the pod.
  • 24. THE PROBLEM (Today) #1 Each pod only has one network interface. #2 Each master/node has only one static CNI configuration. So. Static. [Diagram: Pod A on a Kubernetes master/node with a single eth0 interface attached via flannel]
  • 25. THE SOLUTION (Today) The static CNI configuration points to Multus. Each subsequent CNI plugin, as called by Multus, has configurations which are defined in CRD objects. [Diagram: Pod C with eth0 (flannel) and net0 (macvlan). Pod annotation: “I’d like a flannel interface, and a macvlan interface please.” Reply: “Sure thing bud, I’ll pull up the configurations stored in CRD objects.”]
  • 26. WHAT MULTUS DOES [Diagram: Pod without Multus: Kubernetes calls the OpenShift SDN CNI (default) and the pod gets eth0. Pod with Multus: Kubernetes calls Multus CNI, which calls the OpenShift SDN CNI (default) for eth0 and the macvlan CNI for net0.]
  • 27. Standardized CRD (as currently proposed by the Network Plumbing Working Group). The specification uses annotations to call out a list of intended network attachments as “sidecar networks”. CNI network configurations are packed inside CRD objects.
    Pod annotations:
      apiVersion: v1
      kind: Pod
      metadata:
        name: pod_c
        annotations:
          kubernetes.cni.cncf.io/networks: '[
            { "name": "flannel-conf" },
            { "name": "macvlan-conf" }
          ]'
      spec:
        containers: [...]
    Maps to the CRD object:
      Name:         macvlan-conf
      Namespace:    default
      Labels:       <none>
      Annotations:  <none>
      API Version:  cni.cncf.io/v1
      Args:         [ { "master": "eth0", "mode": "bridge", ...
      Kind:         Network
      Plugin:       macvlan
      Metadata:     [...]
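
The slide shows the proposal as it stood at the time. In the specification as later published by the Network Plumbing Working Group and implemented by Multus, the object kind is `NetworkAttachmentDefinition` in the `k8s.cni.cncf.io/v1` group, and the pod annotation key is `k8s.v1.cni.cncf.io/networks`. A rough equivalent of the macvlan example in that form is sketched below. The `ipam` section, pod name, and image are illustrative assumptions; the slide elides the rest of its configuration.

```yaml
# Sketch only: ipam settings, pod name, and image are illustrative assumptions.
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: macvlan-conf
  namespace: default
spec:
  # The CNI configuration is carried as a JSON string.
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "eth0",
      "mode": "bridge",
      "ipam": { "type": "host-local", "subnet": "192.168.1.0/24" }
    }
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-c                              # hypothetical name
  annotations:
    # The default cluster network still provides eth0; this adds net1.
    k8s.v1.cni.cncf.io/networks: macvlan-conf
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest # hypothetical image
```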
  • 28. Installation and Day 2 Management of NVIDIA GPUs in OpenShift 4
  • 29. Roadmap: Operationalizing GPUs on OpenShift 4
  • 30. Roadmap: Specialized Hardware in OpenShift 4 [Components shown: machine-config-operator, special-resource-operator (NIC), prometheus/grafana dashboards, openshift-multus daemonset, cluster-network-operator, cluster-node-tuning-operator, cluster-nfd-operator, special-resource-operator (GPU), machine-api-operator]
  • 31. Roadmap: Special Resource Operator [Diagram components: Node Feature Discovery Operator; Special Resource Operator DaemonSet; Driver Container (optional); Device Plugin Container; Monitoring Container (Prometheus endpoint); Cluster Node Tuning Operator (next gen tuned); OpenShift Node | GPU | FPGA | NIC | OTHER] Node Object: Labels: feature.node.kubernetes.io/pci-0300_10de.present=true Capacity: example.com/gpu: 4. Blue Boxes: owned, supported, shipped by Red Hat. Green Boxes: owned, supported, shipped by Partner.
  • 32. Soft or Hard Shared Cluster Partitioning?
    Priority and Preemption ● Create PriorityClasses based on business goals ● Set priorityClassName in pod specs ● If all GPUs are used ○ A high-priority pod is queued ○ A low-priority pod is running ○ Kube will preempt the low-priority pod ■ And schedule the high-priority pod ● Ensures optimal density
    Taints and Tolerations ● Taints are “node labels with policies” ○ You can taint a node like nvidia.com/gpu=value:NoSchedule ● Then a pod will have to “tolerate” the nvidia.com/gpu taint, otherwise it won’t run on that node ● This allows you to create “node pools” ● Could lead to under-utilized resources ● Might make sense for security or business rules
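
Both approaches on this slide can appear in the same pod spec. The sketch below defines a PriorityClass and a pod that references it while also tolerating the `nvidia.com/gpu` taint. The class name, priority value, pod name, and image are hypothetical. It assumes the `scheduling.k8s.io/v1` API; clusters older than Kubernetes 1.14 serve PriorityClass under a beta version instead.

```yaml
# Sketch only: names, priority value, and image are illustrative assumptions.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: gpu-high-priority                  # hypothetical name
value: 1000000                             # higher value wins during preemption
globalDefault: false
description: "Illustrative class for business-critical GPU jobs."
---
apiVersion: v1
kind: Pod
metadata:
  name: critical-training                  # hypothetical name
spec:
  priorityClassName: gpu-high-priority     # a spec field, not an annotation
  tolerations:
  # Matches a node taint of the form nvidia.com/gpu=<any value>:NoSchedule.
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
  containers:
  - name: trainer
    image: registry.example.com/trainer:latest   # hypothetical image
    resources:
      limits:
        nvidia.com/gpu: 1
```

A toleration only permits scheduling onto tainted nodes. It does not attract the pod to them, so the GPU limit (or a node selector) is what actually lands the pod on a GPU node.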
  • 33. Enforcing Quota on GPUs (per namespace)
    Create a quota on a namespace:
      # cat gpu-quota.yaml
      apiVersion: v1
      kind: ResourceQuota
      metadata:
        name: gpu-quota
        namespace: nvidia
      spec:
        hard:
          requests.nvidia.com/gpu: 1
    Verify the quota is set:
      # oc describe quota gpu-quota -n nvidia
      Name:       gpu-quota
      Namespace:  nvidia
      Resource                  Used  Hard
      --------                  ----  ----
      requests.nvidia.com/gpu   0     1
    Expected message when exceeding quota:
      # oc create -f gpu-pod.yaml
      Error from server (Forbidden): error when creating "gpu-pod.yaml": pods "gpu-pod-f7z2w" is forbidden: exceeded quota: gpu-quota, requested: requests.nvidia.com/gpu=1, used: requests.nvidia.com/gpu=1, limited: requests.nvidia.com/gpu=1
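
The deck does not show `gpu-pod.yaml` itself. A plausible shape, offered only as a sketch, is a pod that uses `generateName`, which would account for the random suffix in the `gpu-pod-f7z2w` name in the error message. The image is hypothetical.

```yaml
# Sketch only: a guess at the shape of gpu-pod.yaml; the image is hypothetical.
apiVersion: v1
kind: Pod
metadata:
  generateName: gpu-pod-                   # server appends a random suffix
  namespace: nvidia                        # the namespace carrying gpu-quota
spec:
  restartPolicy: Never
  containers:
  - name: cuda-workload
    image: registry.example.com/cuda-sample:latest   # hypothetical image
    resources:
      limits:
        # Counts against requests.nvidia.com/gpu in the ResourceQuota:
        # for extended resources the request defaults to the limit.
        nvidia.com/gpu: 1
```

With a hard limit of 1, the first `oc create -f gpu-pod.yaml` succeeds and a second one, issued while the first pod still holds its GPU, is rejected with the Forbidden error shown on the slide.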
  • 34. NVIDIA Driver Packaging ● Red Hat and NVIDIA are collaborating to improve the user experience of NVIDIA's drivers and CUDA Toolkit on RHEL and OpenShift ● Easier install/upgrade through upcoming changes to the driver packaging (e.g., no DKMS required anymore) ○ Let us know if you are interested in a tech preview! ● Improved coordination between NVIDIA and Red Hat regarding testing, release processes, and support ● High-level goal is to make NVIDIA's driver feel more like a normal in-box driver
  • 35. Roadmap: Special Resource Operator (same content as slide 31)
  • 36. Thank You! ● Come see us @ Booth 716 ● Accompanying Announcement blog
  • 38. Demo (Link) 1. Show no driver 2. Show node labels 3. Show NFD operator and node label differences, focus on PCI row (CPU node and GPU node) 4. Show GPU operator create and tail operator logs 5. Show oc describe node (nvidia.com/gpu=X) 6. Taints and Tolerations? Show oc describe node, focus on taints (nvidia.com/gpu:NoSchedule) 7. Priority/Preemption: Show oc get priorityclasses 8. GPU workload demo…