bit.ly/kubemaster1
GPU Enablement for Data
Science on OpenShift
Pete MacKinnon
Red Hat AI Center of Excellence
@pdmackinnon
● pmackinn@redhat.com
● Principal Engineer in the Red Hat AI Center of Excellence
● Kubeflow committer since project formation
● Open Data Hub and NVIDIA GPU Operator contributor
● KubeCon, TensorFlow World, GTC, ODSC, OpenShift
Commons, and SCaLE 17x presenter
● Technical Editor for upcoming Kubeflow publication
● Co-author of “Linux Unleashed”
● Thirty years of distributed computing consulting and
engineering experience
Agenda
• Data science: data and models
• AI/ML lifecycle: training to inference
• Scalars, vectors, and tensors
• CPU and GPU
• Notebooks and frameworks
• The OpenShift GPU operator “family”
• The components of GPU enablement
• Installation and demo
Data
Models
The AI/ML lifecycle
Data: collection, analysis, transformation, validation, splitting, feature extraction, labeling
Training: algorithm selection or development, hyperparameter tuning, model validation
Inference/Serving: monitoring, logging
Data and model in production
Scalars, vectors, and tensors
Scalar - a real number having magnitude that measures
something: volume, density, speed, energy, mass, time, etc.
Vector - a one-dimensional array of scalars: force, velocity,
momentum, etc.
Tensor - a higher-order algebraic object that could be a scalar, a
vector, a multidimensional array, a multilinear map, etc.
Modern CPUs have advanced instruction sets for vector algebra,
but modern GPUs are built specifically to perform complex
tensor operations with a high degree of parallelism
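The scalar/vector/tensor distinction above can be sketched in a few lines of NumPy (the library choice is an assumption; the talk itself does not name one). The rank, or number of dimensions, is what separates the three:

```python
import numpy as np

scalar = np.float32(3.0)             # rank 0: a single magnitude (e.g. mass)
vector = np.array([1.0, 2.0, 3.0])   # rank 1: a 1-d array of scalars (e.g. velocity)
tensor = np.zeros((2, 3, 4))         # rank 3: a higher-order multidimensional array

print(scalar.ndim, vector.ndim, tensor.ndim)  # 0 1 3
```

A scalar and a vector are thus just the rank-0 and rank-1 special cases of a tensor, which is why frameworks like TensorFlow and PyTorch use a single tensor type for all of them.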
Scalars, vectors, and tensors
How many matrix multiplications can be done in one clock cycle?
Image: https://iq.opengenus.org/
10¹ (CPU, scalar) · 10⁴ (CPU/GPU, vector) · 10⁵ (GPU, tensor)
So, in one clock cycle...
CPU (scalar)
CPU/GPU
(vector)
GPU (tensor)
Or, DL with real world data...
Object
(scalar)
Movement
(vector)
Classification, velocity,
bearing, and much more
(tensor)
CPU and GPU
NVIDIA Ampere A100
• 6912 FP32 CUDA Cores
• 432 Gen3 Tensor Cores
but
• FP32 -> 19.5 TFLOPS
AMD EPYC 7702 (Rome)
• 64 CPU Cores
• 128 Threads
• 2.0GHz Base Clock
• FP32 -> 1-2 TFLOPS
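Taking the slide's FP32 figures at face value, the raw throughput gap works out to roughly an order of magnitude. This is a back-of-the-envelope sketch, not a benchmark; 1.5 TFLOPS is an assumed midpoint of the slide's 1-2 TFLOPS range:

```python
# Peak FP32 throughput quoted on the slide, in TFLOPS
a100_tflops = 19.5   # NVIDIA Ampere A100
epyc_tflops = 1.5    # AMD EPYC 7702 (assumed midpoint of 1-2 TFLOPS)

speedup = a100_tflops / epyc_tflops
print(f"~{speedup:.0f}x peak FP32 advantage")  # ~13x
```

Note that this peak-FLOPS ratio understates real deep-learning gains: the A100's 432 Tensor Cores are not counted in its FP32 figure, and tensor workloads exploit the GPU's parallelism far better than general-purpose code does.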
A GPU notebook
Profit
380x speedup over CPU in basic CNN smoke test
(Intel Xeon E5-2686 vs. NVIDIA V100-SXM2-16GB)
GPU operators

Special Resource Operator (SRO)
● Community operator
● Reference implementation for other specialized hardware
  ○ NIC, FPGA
● Provided the code basis for the NVIDIA GPU Operator
● Deployed from OperatorHub

NVIDIA GPU Operator
● Certified and supported on OpenShift by NVIDIA and Red Hat
● Can be deployed from the embedded OperatorHub or with Helm

Both operators require Node Feature Discovery (NFD)
NVIDIA also provides GPU Feature Discovery for enhanced node labeling
Operator components
• Container runtime toolkit: The NVIDIA GPU Operator supports
the Docker and CRI-O container runtimes. This daemonset
ensures the correct runtime setup for the GPU hook.
• Driver: A container deployed as a daemonset that holds all of the
userspace and kernelspace software needed to make the GPU device
work.
• Device plugin: A daemonset that monitors the health and
availability of the GPUs on the node. Vital for pod scheduling.
• DCGM: Data Center GPU Manager - a node exporter that
captures GPU metrics for use by Prometheus.
Operands are scheduled to GPU nodes via the NFD label for NVIDIA's PCI vendor ID (10de):
nodeSelector:
  feature.node.kubernetes.io/pci-10de.present: "true"
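Once the device plugin is running, a workload requests a GPU through the extended resource it advertises. A minimal sketch of such a pod (the name and image tag are illustrative, not from the talk):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test          # illustrative name
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:11.0-base  # illustrative CUDA base image
    command: ["nvidia-smi"]       # prints GPU info if enablement worked
    resources:
      limits:
        nvidia.com/gpu: 1         # extended resource exposed by the device plugin
```

The `nvidia.com/gpu` limit is what makes the scheduler place the pod on a node where the device plugin has reported available GPUs.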
Installation
Demo
Thank You

GPU enablement for data science on OpenShift | DevNation Tech Talk
