Deep Learning using OpenPOWER

Chekuri S. Choudary
Cognitive Systems Specialist, IBM
Chekuri.Choudary@ibm.com
November 10, 2018
Deep Learning On OpenPOWER
Academic Discussion Group Workshop 2018
Dallas TX

Deep Learning In Ocean Engineering
Use Case 1:
Whale Conservation Using AIS and Acoustic Data
Use Case 2:
Satellite Automatic Identification System (S-AIS Data)
HPC Infrastructure Design for AIS Applications
Cognitive Applications of AIS Data

Deep Learning for Whale Conservation
• Fewer than 500 North Atlantic right whales are alive
• 17 right whales died last summer in the Atlantic Ocean
• Acoustic sensors are installed in ocean waters
• Whale sounds are manually classified by marine biologists
A dead North Atlantic right whale,
spotted off the coast of Virginia on
Jan. 22, 2018.

Deep Learning for Whale Conservation
Processing pipeline:
1) Raw Acoustic Files
2) Segmentation and Frequency Domain Conversion
3) Ocean Sound Classification Algorithm
   • CNNs on Spectrograms
   • LSTM for Unclassified Whales
4) AIS Database and Ocean Acoustics Database
5) Ship Collision Detection, Whale Conservation Strategies
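
The "Segmentation and Frequency Domain Conversion" step can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the pipeline used in the talk: the frame length, hop size, Hann window, and the synthetic 200 Hz tone standing in for a hydrophone recording are all assumptions.

```python
import cmath
import math

def spectrogram(samples, frame_len=64, hop=32):
    """Split a signal into overlapping frames and return per-frame DFT magnitudes.

    Each frame yields magnitudes for bins 0..frame_len // 2.
    """
    # Hann window reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT for clarity; a real pipeline would use an FFT library.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames

# Synthetic stand-in for an acoustic clip: 1 second of a 200 Hz tone.
SAMPLE_RATE = 1600  # Hz, assumed for illustration
tone = [math.sin(2 * math.pi * 200 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)]

spec = spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / 64  # bin index times frequency resolution
```

The resulting 2-D array (time frames by frequency bins) is the kind of image-like input a CNN classifier would consume.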

Satellite Automatic Identification System (S-AIS) Data
Reference: MEOPAR – exactEarth – Dalhousie S-AIS Data Initiative, Casey Hilliard, Dr. Stan Matwin, and Dr. Ron Pelot
• Tracking system on ships
• Most ships transmit data to coastal stations and satellites at fixed time intervals
• Unique identity, type, position, speed, etc.
• S-AIS provides a holistic view of the oceans
• Incredibly useful for ocean management and monitoring
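
As a rough sketch of what one S-AIS report carries, the fields listed above can be modeled as a small record. The class name, field names, and sample values are hypothetical and do not follow the exactEarth data schema. The helper derives the speed implied by two successive positions, which is one simple consistency check such data allows.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_NM = 3440.065  # approximate mean Earth radius in nautical miles

@dataclass(frozen=True)
class AisReport:
    # All field names are illustrative assumptions.
    mmsi: int            # unique vessel identity (Maritime Mobile Service Identity)
    ship_type: str
    lat: float           # degrees
    lon: float           # degrees
    speed_knots: float   # speed reported by the vessel
    timestamp_s: int     # seconds since an arbitrary epoch

def haversine_nm(a, b):
    """Great-circle distance between two reports in nautical miles."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a.lat, a.lon, b.lat, b.lon))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(h))

def implied_speed_knots(a, b):
    """Speed implied by the distance and time between two reports."""
    hours = (b.timestamp_s - a.timestamp_s) / 3600
    return haversine_nm(a, b) / hours

# Two made-up reports from the same vessel, one hour apart.
first = AisReport(123456789, "cargo", 40.0, -70.0, 6.0, 0)
second = AisReport(123456789, "cargo", 40.1, -70.0, 6.0, 3600)
implied = implied_speed_knots(first, second)
```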

HPC Infrastructure Design For S-AIS Applications
Workflow:
1) Populating the database using one or more nodes.
2) Querying the database: one node if using PostgreSQL or MapD, multiple nodes if using Kinetica. Optionally outputs HDF5, LMDB, and TFRecord files that can be used by Caffe, TensorFlow, and other distributed deep learning applications. The query results can also be fed directly to applications.
3) Ocean data analytics based on Spark, MapReduce, MPI, etc.
Software:
• IBM PowerAI, IBM Spectrum Conductor (Spark applications with GPU scheduling)
• IBM Spectrum LSF (MPI applications)
Database options:
• PostgreSQL with the PG-Strom GPU plugin (single node) (5 – 10 TB)
• MapD (single node) (5 – 10 TB)
• Kinetica (distributed) (5 – 10 TB)
• Solr (distributed) (10 – 20 TB)
Storage and compute:
• IBM ESS Storage
• Compute nodes (four shown), each with POWER8, GPUs, and local storage
• Output files: HDF5, LMDB, TFRecords
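
Steps 1) and 2) above (populate, then query and feed results to an application) might look roughly like this. It is a sketch only: Python's built-in `sqlite3` stands in for PostgreSQL, MapD, or Kinetica, and the table name, column names, and sample rows are all made up for illustration.

```python
import sqlite3

# Step 1: populate the database (in-memory stand-in for the real store).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ais_positions "
    "(mmsi INTEGER, ts INTEGER, lat REAL, lon REAL, speed_knots REAL)"
)
rows = [
    (111000001, 0, 42.10, -66.50, 11.5),
    (111000001, 600, 42.13, -66.48, 11.8),
    (222000002, 0, 36.90, -75.90, 0.2),
    (333000003, 300, 42.40, -66.10, 14.0),
]
conn.executemany("INSERT INTO ais_positions VALUES (?, ?, ?, ?, ?)", rows)

# Step 2: query a bounding box and stream the results in batches, so a
# downstream application can consume them without loading everything at once.
def positions_in_box(conn, lat_min, lat_max, lon_min, lon_max, batch_size=2):
    cur = conn.execute(
        "SELECT mmsi, ts, lat, lon, speed_knots FROM ais_positions "
        "WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ? "
        "ORDER BY ts, mmsi",
        (lat_min, lat_max, lon_min, lon_max),
    )
    while True:
        batch = cur.fetchmany(batch_size)
        if not batch:
            break
        yield batch

batches = list(positions_in_box(conn, 42.0, 43.0, -67.0, -66.0))
```

In the design on this slide, the same batches could instead be written out as HDF5, LMDB, or TFRecord files for the deep learning frameworks.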

Enterprises Embracing Open-Source AI Software
• Enterprises are building machine learning teams
• Most are using open-source software: TensorFlow is most popular
• Today: proofs of concept/technology
• Tomorrow: congratulations! You’re in production
• 57%: AI developed on-premise (using on-prem resources)
• 42%: cloud-hosted modeling

Integrated AI Platform
Original design: simplify the process of installing and running optimized deep learning on Power
Caffe

PowerAI
Open-Source Based Enterprise AI Platform
• Open Source Frameworks: Supported Distribution (Caffe, SnapML)
• Developer Ease-of-Use Tools
• Faster Training Times via HW & SW Performance Optimizations
• Integrated & Supported AI Platform
• 3-4x Speedup for AI Training
• Ease of Use Tools for Data Scientists
• GPU-Accelerated Power Servers
• Storage

PowerAI Features
• Traditional Machine Learning at Scale (SnapML)
• Hyperparameter Search & Optimization
• Distributed Training Across Multiple GPUs & Servers
• Large Model Support
• Handling Multiple Training Jobs Concurrently
• Data Preparation Tools
• Training Visualization
• Model Selection

Distributed Deep Learning (DDL)
Using the Power of 100s of Servers
16 Days Down to 7 Hours: Near Ideal Scaling to 256 GPUs and Beyond
• 1 System: 16 Days
• 64 Systems: 7 Hours (58x Faster)
ResNet-101, ImageNet-22K, Caffe with PowerAI DDL, running on Minsky (S822LC) Power Systems
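
The scaling claim can be checked with a little arithmetic. The sketch below uses only the figures quoted on this slide; the variable names are illustrative. Note that the rounded durations (16 days, 7 hours) give a ratio of about 55x, slightly below the quoted 58x, presumably because the headline times are rounded.

```python
# Figures quoted on the slide.
baseline_hours = 16 * 24      # 16 days on 1 system
scaled_hours = 7              # 7 hours on 64 systems
systems = 64
gpus = 256
quoted_speedup = 58

gpus_per_system = gpus // systems

# Speedup implied by the rounded durations.
speedup_from_durations = baseline_hours / scaled_hours

# Scaling efficiency: achieved speedup divided by the ideal (linear) speedup,
# which equals the number of systems.
efficiency_quoted = quoted_speedup / systems
efficiency_from_durations = speedup_from_durations / systems

print(f"GPUs per system: {gpus_per_system}")
print(f"Speedup from durations: {speedup_from_durations:.1f}x")
print(f"Efficiency (quoted 58x): {efficiency_quoted:.1%}")
print(f"Efficiency (from durations): {efficiency_from_durations:.1%}")
```

Either way the efficiency lands in the 85 to 91 percent range of linear scaling, which is what "near ideal" refers to.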

What is Snap Machine Learning?
IBM Research - Zurich / Introduction to Snap Machine Learning / May 2018 / © 2018 IBM Corporation
Framework            Models    GPU Acceleration  Distributed Training  Sparse Data Support
scikit-learn         ML        No                No                    Yes
Apache Spark* MLlib  ML        No                Yes                   Yes
TensorFlow**         ML/{DL}   Yes               Yes                   Limited
Snap ML              GLMs      Yes               Yes                   Yes
Categories shown: Machine Learning (ML), Deep Learning (DL), GLMs
Snap ML: A new framework for fast training of GLMs (generalized linear models)
* The Apache Software Foundation (ASF) owns all Apache-related trademarks, service marks, and graphic logos
on behalf of our Apache project communities, and the names of all Apache projects are trademarks of the ASF.
**TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc.
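
For readers unfamiliar with GLMs, logistic regression is a typical example of the model family Snap ML targets. The sketch below trains one with plain batch gradient descent in pure Python. It does not use or imitate the Snap ML API, and the tiny dataset, learning rate, and epoch count are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, lr=0.5, epochs=200):
    """Fit weights and bias by batch gradient descent on the logistic loss."""
    n_features = len(rows[0])
    w = [0.0] * n_features
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Prediction error for this example.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        # Step against the average gradient.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Tiny standardized (zero-mean) one-feature dataset.
rows = [[-1.5], [-0.5], [0.5], [1.5]]
labels = [0, 0, 1, 1]

w, b = train_logistic(rows, labels)
predictions = [1 if sigmoid(w[0] * x[0] + b) >= 0.5 else 0 for x in rows]
```

Frameworks in the table above differ mainly in how they accelerate and distribute this kind of inner loop over large, often sparse, datasets.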

Example: snap-ml-mpi
Describe application using high-level Python code.
Launch application on 4 nodes using mpirun (4 GPUs per node):

Auto Hyper-Parameter Tuning in PowerAI
PowerAI Auto-Tuner (DL Insight)
Workflow: Data and DNN Model, then Job 1, Job 2, ..., Job n, then Monitor & Prune, then Select Best Hyperparameters
Runs on IBM Spectrum Conductor with Spark on GPU-Accelerated POWER9 Servers
• Data scientists run 100s of jobs with different hyper-parameters
• Learning rate, decay rate, batch size, optimizers (GradientDescent, Adadelta, Momentum, RMSProp, ..)
• Auto-Tuner searches for good hyper-parameters by launching 10s of jobs & selecting the best ones
• 3 search approaches: Random, Tree-structured Parzen Estimator (TPE), Bayesian
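
The launch, monitor, prune, and select loop can be illustrated with the simplest of the three approaches, random search. This is a minimal sketch and not the PowerAI Auto-Tuner itself: the search ranges, the pruning fraction, and especially `validation_loss` (a toy stand-in for a real training job) are assumptions.

```python
import math
import random

SEARCH_SPACE = {
    "learning_rate": (1e-4, 1e-1),   # sampled log-uniformly
    "batch_size": [32, 64, 128, 256],
    "optimizer": ["GradientDescent", "Adadelta", "Momentum", "RMSProp"],
}

def sample_config(rng):
    """Draw one random hyper-parameter configuration."""
    lo, hi = SEARCH_SPACE["learning_rate"]
    return {
        "learning_rate": 10 ** rng.uniform(math.log10(lo), math.log10(hi)),
        "batch_size": rng.choice(SEARCH_SPACE["batch_size"]),
        "optimizer": rng.choice(SEARCH_SPACE["optimizer"]),
    }

def validation_loss(config, epochs):
    """Hypothetical stand-in for a training job.

    Pretends the loss is lowest near learning_rate = 1e-2 and improves
    with more epochs. A real tuner would launch an actual training run.
    """
    distance = abs(math.log10(config["learning_rate"]) + 2.0)
    return distance + 1.0 / epochs

def random_search(n_jobs=20, keep_fraction=0.25, seed=0):
    rng = random.Random(seed)
    jobs = [sample_config(rng) for _ in range(n_jobs)]
    # Monitor: score every job after a short run.
    jobs.sort(key=lambda c: validation_loss(c, epochs=1))
    # Prune: keep only the most promising fraction.
    survivors = jobs[: max(1, int(n_jobs * keep_fraction))]
    # Select best: run the survivors longer and pick the winner.
    return min(survivors, key=lambda c: validation_loss(c, epochs=10))

best = random_search()
```

TPE and Bayesian approaches replace the independent random draws in `sample_config` with draws informed by the results of earlier jobs.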

Runtime Training Visualization
Monitor, Analyze, & Optimize

Data Preparation for Deep Learning
• Import from different formats
• Transform, split and shuffle
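
The shuffle and split step can be sketched in a few lines. This is a generic illustration, not the PowerAI data preparation tool: the 80/10/10 ratios, the fixed seed, and the function name are assumptions.

```python
import random

def shuffle_and_split(samples, train=0.8, val=0.1, seed=42):
    """Shuffle samples reproducibly, then cut into train/validation/test sets."""
    items = list(samples)
    # A seeded generator makes the split repeatable across runs.
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train)
    n_val = int(len(items) * val)
    train_set = items[:n_train]
    val_set = items[n_train:n_train + n_val]
    test_set = items[n_train + n_val:]   # whatever remains
    return train_set, val_set, test_set

train_set, val_set, test_set = shuffle_and_split(range(100))
```

Shuffling before splitting matters when the source data is ordered (for example by class or by recording date), since an unshuffled cut would give each set a different distribution.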

IBM Model Library
• ibmcws-deep-learning-caffe-samples
• ibmcws-deep-learning-tensorflow-samples
Talk to us on our Slack channel if you need a new optimized model: http://ibm.biz/joinslackcws

Elastic Distributed Training
Multi-tenant, large-scale distributed deep learning workloads

Train Larger, More Complex Models
Traditional Model Support
• Limited memory on GPU forces tradeoff in model size / data resolution
• CPU with DDR4 system memory connected to GPU graphics memory over PCIe
• System bottleneck here (PCIe)
Large Model Support
• Use system memory and GPU to support more complex and higher resolution data
• POWER CPU with DDR4 system memory connected to GPU graphics memory over NVLink
• POWER NVLink Data Pipe
