Build FAST Deep Learning Apps with Docker on OpenPOWER and GPUs
Accelerated training and inference
Indrajit Poddar (I.P)
Ashwin Srinivas
IBM
Deep Learning
What you and I (our brains) do without even thinking about it: we recognize a bicycle
Now machines are learning the way we learn….
From "Texture of the
Nervous System of Man and the
Vertebrates" by
Santiago Ramón y Cajal.
Artificial Neural Networks
But training needs a lot of computational resources
• Deep Learning model training is not easy to distribute, even with frameworks built for easy scale-out and real-time analytics
• Training can take hours, days, or weeks
• Input data and model sizes are becoming larger than ever (e.g. video input, billions of features, etc.)
• The result: unprecedented demand for offloaded computation, accelerators, and higher memory bandwidth systems, just as Moore’s law is dying
OpenPOWER: Open Hardware for High Performance
Systems designed for big data analytics and superior cloud economics
Up to:
• 12 cores per CPU
• 96 hardware threads per CPU
• 1 TB RAM
• 7.6 Tb/s combined I/O bandwidth
GPUs and FPGAs coming…
(Comparison: OpenPOWER vs. traditional Intel x86)
http://www.softlayer.com/POWER-SERVERS
https://mc.jarvice.com/
Nimbix Cloud Adds IBM “Minsky” S822LC for HPC
PaaS+SaaS
Containerized: the platform delivers industry-best performance and agility at the lowest cost to the customer
“True HPC Cloud Eliminates Virtualization and Embraces Containerization + Acceleration for Native Bare-Metal Performance”
Nimbix Cloud Advantages
• Easier to use
• Highest performance
• Ultra-fast launch times
• Lower cost
• Faster time to value
• Bare-metal acceleration
• Enterprise accounting
• Application marketplace
• Private apps
• Private cloud option
https://mc.jarvice.com/
https://power.jarvice.com
The Cognitive Revolution
New Paradigm, New Chip, New Servers
New Chip: “POWER8 with NVLink” for Accelerated AI
New Power Linux Servers:
• S822LC for High Performance Computing: POWER8 + coherent CAPI + novel NVLink for high-bandwidth coherent CPU/GPU acceleration
• S821LC: High Density 2-Socket 1U
• S822LC for Big Data
M.Gschwind, Bringing the Deep Learning Revolution into the Enterprise
Introducing the S822LC Power System for HPC
First Custom-Built GPU Accelerator Server with NVLink
2.5x Faster CPU-GPU Data Communication via NVLink
(Diagram) POWER8 NVLink Server: NVIDIA P100 Pascal GPUs connected to POWER8 CPUs, and to each other, over NVLink at 80 GB/s.
x86 Servers with PCIe: GPUs connected over PCIe at 32 GB/s; with no NVLink between CPU & GPU on x86 servers, PCIe is the bottleneck.
• Custom-built GPU Accelerator Server
• High-speed NVLink connections between CPUs & GPUs and among GPUs
• Features the novel NVIDIA P100 Pascal GPU accelerator
M.Gschwind, Bringing the Deep Learning Revolution into the Enterprise
TensorFlow on Tesla P100: PowerAI is 30% faster
IBM S822LC: 20 cores @ 2.86 GHz, 512 GB memory, 4 NVIDIA Tesla P100 GPUs, Ubuntu 16.04, CUDA 8.0.44, cuDNN 5.1, TensorFlow 0.12.0, Inception v3 benchmark (64-image minibatch)
Intel Broadwell E5-2640v4: 20 cores @ 2.6 GHz, 512 GB memory, 4 NVIDIA Tesla P100 GPUs, Ubuntu 16.04, CUDA 8.0.44, cuDNN 5.1, TensorFlow 0.12.0, Inception v3 benchmark (64-image minibatch)
Larger value is better
PowerAI vs DGX-1: 1.6x TensorFlow Throughput / Dollar
▪ TensorFlow 0.12 on the IBM PowerAI platform takes advantage of the full capabilities of NVLink
▪ For image classification and analysis, this means a 1.6x price-performance advantage relative to the NVIDIA DGX-1
System                                   Images / Second   List Price   $ / Image / Second
NVIDIA DGX-1 (8 P100 GPUs, 512 GB Mem)   330               $129,000     $390
PowerAI (4 P100 GPUs, 512 GB Mem)        273               $67,000      $241
Lower cost is better
NVLink and P100 advantage
• NVLink reduces communication time and overhead
• Incorporating the fastest GPU for deep learning
• Data moves GPU-to-GPU and memory-to-GPU faster, for shorter training times
ImageNet / AlexNet, minibatch size = 128: x86-based GPU system 170 ms vs. POWER8 + Tesla P100 + NVLink 78 ms
IBM advantage: data communication and GPU performance
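One way to see the CPU-GPU data path difference for yourself is NVIDIA's bandwidthTest sample. A minimal sketch, not part of the deck, assuming a default CUDA 8.0 toolkit install on the host:

    # Copy the CUDA samples somewhere writable, then build and run bandwidthTest
    cuda-install-samples-8.0.sh ~
    cd ~/NVIDIA_CUDA-8.0_Samples/1_Utilities/bandwidthTest
    make
    ./bandwidthTest --memory=pinned   # reports host-to-device and device-to-device copy bandwidth

On an NVLink-attached P100, the host-to-device numbers should land well above what a PCIe Gen3 x16 link can deliver.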
Introducing PowerAI: Get Started Fast with Deep Learning
• Enabled by High Performance Computing infrastructure
• Package of pre-compiled major Deep Learning frameworks
• Easy to install and get started with Deep Learning, with enterprise-class support
• Optimized for performance to take advantage of NVLink
Machine Learning and Deep Learning analytics on OpenPOWER
No code changes needed!!
ATLAS (Automatically Tuned Linear Algebra Software)
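As a quick way to convince yourself that nothing architecture-specific is needed before pointing unchanged framework workloads at the machine, a small sanity check on the POWER host (a sketch, not from the deck; all commands are standard Linux/NVIDIA tools):

    uname -m                  # expect ppc64le on an OpenPOWER system
    nvidia-smi                # confirm the Tesla GPUs and driver are visible
    dpkg -l | grep -i mldl    # list the installed PowerAI (power-mldl) packages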
OpenPOWER: GPU support
Credit: Kevin Klaues, Mesosphere
IBM Spectrum Conductor includes enhanced support for fine-grained GPU and CPU scheduling with Apache Spark and Docker
Mesos supports GPUs
Huge speed-ups with GPUs and OpenPOWER!
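To make "Mesos supports GPUs" concrete, here is a rough sketch of how an agent can advertise its GPUs to frameworks such as Spark. The flag values are illustrative and should be checked against the Mesos 1.x release in use:

    # Sketch: start a Mesos agent with the Nvidia GPU isolator enabled,
    # advertising four GPUs as a schedulable resource
    mesos-agent \
      --master=zk://zk-host:2181/mesos \
      --work_dir=/var/lib/mesos \
      --isolation="filesystem/linux,docker/runtime,cgroups/devices,gpu/nvidia" \
      --resources="gpus:4"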
Enabling Accelerators/GPUs in the Cloud Stack
• Deep Learning Training + Inference
• Containers and images
• Accelerators
• Clustering frameworks
Build Deep Learning Docker Images Using PowerAI Software
Dockerfile

FROM nimbix/ubuntu-cuda-ppc64le

# Add NVIDIA's machine-learning repository for ppc64el (cuDNN, etc.)
RUN wget --no-verbose http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1404/ppc64el/nvidia-machine-learning-repo-ubuntu1404_1.0.0-1_ppc64el.deb && \
    dpkg --install nvidia-*.deb && \
    rm -f nvidia-*.deb && \
    apt-get update

# Add the PowerAI (mldl) local repository and install the deep learning frameworks
RUN wget --no-verbose http://download.boulder.ibm.com/ibmdl/pub/software/server/mldl/mldl-repo-local_3.3.0_ppc64el.deb && \
    dpkg --install mldl*.deb && \
    apt-get update && \
    apt-get -y install power-mldl openmpi-bin numactl libopenmpi-dev && \
    apt-get clean
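To turn the Dockerfile into an image and sanity-check it, something like the following works; the image tag powerai-frameworks and the check command are illustrative, not part of the deck:

    # Build on a POWER (ppc64le) host with Docker installed
    docker build -t powerai-frameworks .

    # Confirm the PowerAI packages were baked into the image
    docker run --rm powerai-frameworks sh -c "dpkg -l | grep mldl"

GPU access from inside the container is handled by nvidia-docker, covered on the next slide.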
NVIDIA Docker
https://github.com/NVIDIA/nvidia-docker
• A Docker wrapper and tools to package and run GPU-based apps
• Uses the NVIDIA drivers on the host
• No need to include drivers in the Docker image
• No GPU scheduling
• Good for single-node use
• Available on POWER
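A hedged usage sketch (nvidia-docker 1.x syntax; the image names are the ones used earlier in this deck): the wrapper mounts the host's driver files and GPU devices into the container at run time:

    # GPUs and nvidia-smi become visible inside an otherwise ordinary container
    nvidia-docker run --rm nimbix/ubuntu-cuda-ppc64le nvidia-smi

    # Run the PowerAI image built earlier, restricted to two GPUs via NV_GPU
    NV_GPU=0,1 nvidia-docker run --rm -it powerai-frameworks bash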
Demo on NIMBIX
Thank you.
ibm.com/systems

Editor's Notes

  • #8 S821LC: built for data-intensive workloads such as databases, data analytics, machine & deep learning, and HPC. S822LC: superior performance and TCO for data workloads; built for accelerated computing with new accelerator interconnect technologies.
  • #11 This becomes even more powerful when you look at it in terms of images/second/dollar. Assuming list price (in USD): PowerAI on S822LC for HPC at $67,000; DGX-1 at $129,000. DGX-1: $129,000 / 330 images/s ≈ $390 per image/s. Minsky: $67,000 / 278 images/s ≈ $241 per image/s. IBM PowerAI has a 1.6x price-performance advantage when you compare image-processing throughput against a fully loaded DGX-1. For the same price as a DGX-1 at 330 images/second, you could purchase 2x PowerAI systems for a total of 546 images/second.
  • #12 Digits01: transfer of images from CPU to GPU at 3-4 GB/s (very low). Minsky: transfer of images from CPU to GPU at 17 GB/s.
  • #15 Supports up to 18 GPUs; tested with up to 24 devices. Exploits the IBM design for big data: the large address space enables rich acceleration, with a 1 TB address space per PCI host interface and standard little-endian (LE) Linux drivers.