Chekuri S. Choudary
Cognitive Systems Specialist, IBM
Chekuri.Choudary@ibm.com
November 10, 2018
Deep Learning On OpenPOWER
Academic Discussion Group Workshop 2018
Dallas TX
Deep Learning In Ocean Engineering
Use Case 1: Whale Conservation Using AIS and Acoustic Data
Use Case 2: Satellite Automatic Identification System (S-AIS) Data
HPC Infrastructure Design for AIS Applications
Cognitive Applications of AIS Data
Deep Learning for Whale Conservation
• Fewer than 500 North Atlantic right whales are alive
• 17 right whales died last summer in the Atlantic Ocean
• Acoustic sensors are installed in ocean waters
• Whale sounds are manually classified by marine biologists
A dead North Atlantic right whale, spotted off the coast of Virginia on Jan. 22, 2018.
Deep Learning for Whale Conservation
Diagram: Raw Acoustic Files → Segmentation and Frequency Domain Conversion → Ocean Sound Classification Algorithm → Ocean Acoustics Database
AIS Database + Ocean Acoustics Database → Ship Collision Detection, Whale Conservation Strategies
• CNNs on Spectrograms
• LSTM for Unclassified Whales
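The "Segmentation and Frequency Domain Conversion" stage can be sketched as a short-time Fourier transform that turns a raw signal into a spectrogram, which is the kind of input a CNN would consume. This is a minimal standard-library sketch, not the pipeline used in the talk; the frame length, hop size, Hann window, and the synthetic 100 Hz tone are all illustrative assumptions.

```python
import cmath
import math

def segment(samples, frame_len, hop):
    """Split a 1-D signal into overlapping frames of frame_len samples."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def magnitude_spectrum(frame):
    """Naive O(N^2) DFT magnitude of one Hann-windowed frame (bins 0..N/2)."""
    n = len(frame)
    windowed = [x * 0.5 * (1 - math.cos(2 * math.pi * k / (n - 1)))
                for k, x in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * b * k / n)
                    for k, x in enumerate(windowed)))
            for b in range(n // 2 + 1)]

def spectrogram(samples, frame_len=64, hop=32):
    """Return one magnitude spectrum per frame (rows = time, columns = frequency)."""
    return [magnitude_spectrum(f) for f in segment(samples, frame_len, hop)]

# Synthetic stand-in for a recording: a 100 Hz tone sampled at 800 Hz (assumed values).
SAMPLE_RATE = 800
FRAME_LEN = 64
tone = [math.sin(2 * math.pi * 100 * t / SAMPLE_RATE) for t in range(256)]

spec = spectrogram(tone, frame_len=FRAME_LEN, hop=32)
peak_bin = max(range(len(spec[0])), key=lambda b: spec[0][b])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_LEN  # frequency of the strongest bin
```

A production pipeline would normally use an FFT library instead of the naive DFT; the loop form is kept here only so each step is visible.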
Satellite Automatic Identification System (S-AIS) Data
Reference: MEOPAR – exactEarth – Dalhousie S-AIS Data Initiative, Casey Hilliard, Dr. Stan Matwin, and Dr. Ron Pelot
• Tracking system on ships
• Most ships transmit data to shore stations and satellites at fixed time intervals
• Each message carries a unique identity, ship type, position, speed, etc.
• S-AIS provides a holistic view of the oceans
• Incredibly useful for ocean management and monitoring
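As an illustration of how the fields listed above could be used, the sketch below models one AIS report and finds ships within a given distance of a point, for example an acoustic whale detection. The class and field names, the sample coordinates, and the 10 km radius are hypothetical; only the haversine great-circle formula is standard.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class AisReport:
    """Illustrative subset of an AIS message (field names are assumptions)."""
    mmsi: int            # unique ship identity
    ship_type: str
    lat: float           # degrees
    lon: float           # degrees
    speed_knots: float

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def ships_near(reports, lat, lon, radius_km):
    """Return the reports whose position lies within radius_km of (lat, lon)."""
    return [r for r in reports
            if haversine_km(r.lat, r.lon, lat, lon) <= radius_km]

# Made-up reports and a made-up detection position.
reports = [
    AisReport(111000001, "cargo", 36.90, -75.50, 14.0),
    AisReport(111000002, "tanker", 40.00, -70.00, 11.5),
]
near = ships_near(reports, lat=36.95, lon=-75.45, radius_km=10.0)
```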
HPC Infrastructure Design For S-AIS Applications
Software: IBM PowerAI, IBM Spectrum Conductor (Spark applications with GPU scheduling), IBM Spectrum LSF (MPI applications)
1) Populating the database using one or more nodes
2) Querying the database. One node if using PostgreSQL or MapD. Multiple nodes if using Kinetica. Optionally outputs HDF5, LMDB, and TFRecord files that can be used by Caffe, TensorFlow, and other distributed deep learning applications. The query results can also be fed directly to applications.
3) Ocean data analytics based on Spark, MapReduce, MPI, etc.
Storage: IBM ESS Storage
Databases:
• PostgreSQL with PG-Strom GPU plugin (single node) (5-10 TB)
• MapD (single node) (5-10 TB)
• Kinetica (distributed) (5-10 TB)
• SOLR (distributed) (10-20 TB)
Output files: HDF5 files, LMDB files, TFRecords files
Compute: four POWER8 GPU nodes, each with local storage
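The populate, query, and export steps above can be sketched end to end. In this illustration an in-memory SQLite database stands in for the PostgreSQL, MapD, or Kinetica back ends, and a list of dictionaries stands in for HDF5, LMDB, or TFRecord output; the table layout, column names, and sample rows are assumptions made only for the example.

```python
import sqlite3

def populate(conn, rows):
    """Step 1: load AIS rows into a table (schema is hypothetical)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ais "
        "(mmsi INTEGER, ts INTEGER, lat REAL, lon REAL, speed REAL)"
    )
    conn.executemany("INSERT INTO ais VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()

def query_region(conn, lat_min, lat_max, lon_min, lon_max):
    """Step 2: select the reports inside a bounding box, ordered per ship by time."""
    cur = conn.execute(
        "SELECT mmsi, ts, lat, lon, speed FROM ais "
        "WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ? "
        "ORDER BY mmsi, ts",
        (lat_min, lat_max, lon_min, lon_max),
    )
    return cur.fetchall()

def to_training_records(rows):
    """Optional export: reshape query rows into records a training job could read."""
    return [{"mmsi": mmsi, "ts": ts, "features": [lat, lon, speed]}
            for mmsi, ts, lat, lon, speed in rows]

conn = sqlite3.connect(":memory:")
populate(conn, [
    (111000001, 1000, 36.90, -75.50, 14.0),
    (111000001, 1060, 36.91, -75.49, 14.2),
    (111000002, 1000, 40.00, -70.00, 11.5),
])
records = to_training_records(query_region(conn, 36.0, 38.0, -76.0, -75.0))
```

The same records could be handed directly to an application instead of being written to files, which mirrors the last sentence of step 2.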
Enterprises Embracing Open-Source AI Software
Enterprises building machine learning teams
Most using open-source software: TensorFlow is most popular
Today: proofs of concept/technology
Tomorrow: congratulations! You're in production
57%: AI developed on-premise (using on-prem resources)
42%: cloud-hosted modeling
Original design: Simplify the process of installing and running optimized deep learning on Power
Integrated AI Platform
Caffe
PowerAI: Open-Source Based Enterprise AI Platform
Open Source Frameworks: Supported Distribution
Developer Ease-of-Use Tools
Faster Training Times via HW & SW Performance Optimizations
Integrated & Supported AI Platform
3-4x Speedup for AI Training
Ease of Use Tools for Data Scientists
GPU-Accelerated Power Servers
Storage
Caffe
SnapML
PowerAI Features
• Traditional Machine Learning at Scale (SnapML)
• Hyperparameter Search & Optimization
• Distributed Training Across Multiple GPUs & Servers
• Large Model Support
• Handling Multiple Training Jobs Concurrently
• Data Preparation Tools
• Training Visualization
• Model Selection
Distributed Deep Learning (DDL)
Using the Power of 100s of Servers
16 Days Down to 7 Hours: Near Ideal Scaling to 256 GPUs and Beyond
1 System: 16 Days
64 Systems: 7 Hours (58x Faster)
ResNet-101, ImageNet-22K, Caffe with PowerAI DDL, Running on Minsky (S822LC) Power System
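The "near ideal scaling" claim can be checked with a little arithmetic on the slide's own figures. The sketch below assumes the 58x speedup is measured against the single-system run and that the 256 GPUs are spread evenly over the 64 systems; the slide's "7 hours" appears to be a rounded value, since 16 days divided by 58 is about 6.6 hours.

```python
SYSTEMS = 64
GPUS = 256
REPORTED_SPEEDUP = 58.0       # from the slide
SINGLE_SYSTEM_DAYS = 16.0     # from the slide

gpus_per_system = GPUS // SYSTEMS                    # GPUs in each system
single_system_hours = SINGLE_SYSTEM_DAYS * 24        # 16 days expressed in hours
scaling_efficiency = REPORTED_SPEEDUP / SYSTEMS      # fraction of ideal linear speedup
implied_hours = single_system_hours / REPORTED_SPEEDUP   # run time implied by 58x
ideal_hours = single_system_hours / SYSTEMS              # run time under perfect scaling

print(f"{gpus_per_system} GPUs per system")
print(f"scaling efficiency: {scaling_efficiency:.1%}")
print(f"implied run time: {implied_hours:.1f} h (ideal: {ideal_hours:.1f} h)")
```

Under these assumptions the run reaches roughly 91% of linear scaling across 64 systems.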
What is Snap Machine Learning?
IBM Research - Zurich / Introduction to Snap Machine Learning / May 2018 / © 2018 IBM Corporation
Framework             Models    GPU Acceleration   Distributed Training   Sparse Data Support
scikit-learn          ML        No                 No                     Yes
Apache Spark* MLlib   ML        No                 Yes                    Yes
TensorFlow**          ML/{DL}   Yes                Yes                    Limited
Snap ML               GLMs      Yes                Yes                    Yes
Diagram labels: Machine Learning (ML), Deep Learning (DL), GLMs
Snap ML: A new framework for fast training of GLMs
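For readers unfamiliar with GLMs, logistic regression is a common example. The sketch below trains one with plain batch gradient descent on sparse rows stored as dictionaries; it uses only the Python standard library, does not use or imitate the Snap ML API, and the toy data, learning rate, and epoch count are arbitrary assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(weights, x):
    """Sparse dot product: x maps feature index -> value."""
    return sum(weights.get(j, 0.0) * v for j, v in x.items())

def train_logistic(rows, labels, lr=0.5, epochs=500):
    """Batch gradient descent on the mean logistic loss; returns (weights, bias)."""
    weights, bias = {}, 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w, grad_b = {}, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(bias + dot(weights, x)) - y   # dLoss/dz for one row
            grad_b += err
            for j, v in x.items():                      # only touch non-zero features
                grad_w[j] = grad_w.get(j, 0.0) + err * v
        bias -= lr * grad_b / n
        for j, g in grad_w.items():
            weights[j] = weights.get(j, 0.0) - lr * g / n
    return weights, bias

def predict(weights, bias, x):
    """Probability of the positive class for one sparse row."""
    return sigmoid(bias + dot(weights, x))

# Toy sparse data: feature 0 marks positives, feature 3 marks negatives.
rows = [{0: 1.0}, {0: 1.0, 2: 0.5}, {3: 1.0}, {3: 1.0, 2: 0.5}]
labels = [1, 1, 0, 0]
weights, bias = train_logistic(rows, labels)
p_pos = predict(weights, bias, {0: 1.0})
p_neg = predict(weights, bias, {3: 1.0})
```

Iterating only over non-zero features is what makes the sparse representation pay off when most feature values are zero.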
* The Apache Software Foundation (ASF) owns all Apache-related trademarks, service marks, and graphic logos
on behalf of our Apache project communities, and the names of all Apache projects are trademarks of the ASF.
**TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc.
Example: snap-ml-mpi
Describe application using high-level Python code.
Launch application on 4 nodes using mpirun (4 GPUs per node):
Auto Hyper-Parameter Tuning in PowerAI
Diagram: Data + DNN Model → Job 1, Job 2, ..., Job n → Monitor & Prune → Select Best Hyperparameters
Runs on IBM Spectrum Conductor with Spark and GPU-Accelerated Power9 Servers
• Data scientists run 100s of jobs with different hyper-parameters
• Learning rate, decay rate, batch size, optimizers (Gradient Descent, Adadelta, Momentum, RMSProp, ...)
• Auto-Tuner searches for good hyper-parameters by launching 10s of jobs & selecting the best ones
• 3 search approaches: Random, Tree-structured Parzen Estimator (TPE), Bayesian
PowerAI Auto-Tuner (DL Insight)
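The launch, monitor-and-prune, and select-best loop can be illustrated with the simplest of the three approaches, random search. This is a sketch under stated assumptions, not the PowerAI Auto-Tuner: `toy_loss` is a made-up stand-in for a real training job, and the search ranges, checkpoint steps, and keep fraction are arbitrary.

```python
import math
import random

def sample_config(rng):
    """Draw one hyper-parameter configuration at random."""
    return {
        "learning_rate": 10 ** rng.uniform(-4, -1),   # log-uniform in [1e-4, 1e-1]
        "batch_size": rng.choice([32, 64, 128, 256]),
        "optimizer": rng.choice(["sgd", "momentum", "adadelta", "rmsprop"]),
    }

def toy_loss(config, step):
    """Hypothetical loss curve: best near learning_rate = 1e-2, improving with steps."""
    gap = abs(math.log10(config["learning_rate"]) + 2)
    return gap + 1.0 / (step + 1)

def random_search(n_jobs=20, check_step=5, final_step=50, keep_fraction=0.5, seed=0):
    rng = random.Random(seed)
    jobs = [sample_config(rng) for _ in range(n_jobs)]           # launch jobs
    ranked = sorted(jobs, key=lambda c: toy_loss(c, check_step))  # monitor early loss
    survivors = ranked[:max(1, int(n_jobs * keep_fraction))]      # prune the rest
    best = min(survivors, key=lambda c: toy_loss(c, final_step))  # select best
    return best, len(survivors)

best, n_survivors = random_search()
```

TPE and Bayesian search differ in how new configurations are proposed: they use the results of earlier jobs instead of sampling independently.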
Runtime Training Visualization
Monitor, Analyze, & Optimize
Data Preparation for Deep Learning
Import from different formats
Transform, split and shuffle
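The "split and shuffle" step is sketched below with the standard library. The function name, the 80/20 ratio, and the fixed seed are illustrative choices, not settings taken from the PowerAI tools.

```python
import random

def shuffle_and_split(items, train_fraction=0.8, seed=42):
    """Shuffle a copy of items reproducibly, then cut it into train and validation lists."""
    rng = random.Random(seed)     # private generator so the global state is untouched
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

train, val = shuffle_and_split(range(10))
```

Shuffling before splitting keeps any ordering in the source files (for example, records sorted by class or by time) from leaking into the train/validation boundary.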
IBM Model Library
ibmcws-deep-learning-caffe-samples ibmcws-deep-learning-tensorflow-samples
Talk to us if you need a new optimized model @ our Slack Channel http://ibm.biz/joinslackcws
Elastic Distributed Training
Multi-tenant large scale distributed deep learning workloads
Train Larger More Complex Models
Traditional Model Support: Limited memory on GPU forces tradeoff in model size / data resolution
Diagram: CPU with DDR4 system memory connected to GPU graphics memory over PCIe (system bottleneck here)
Large Model Support: Use system memory and GPU to support more complex and higher resolution data
Diagram: POWER CPU with DDR4 connected to GPU graphics memory over NVLink (POWER NVLink data pipe)
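The idea of using system memory alongside GPU memory can be illustrated with a toy model in which tensors that do not fit on the GPU are moved to host memory, least recently used first. This is only a conceptual sketch and says nothing about how PowerAI Large Model Support is actually implemented; the class, the sizes, and the eviction policy are all assumptions.

```python
from collections import OrderedDict

class ToyDeviceMemory:
    """Toy GPU memory with least-recently-used offload to host memory."""

    def __init__(self, capacity_mb):
        self.capacity_mb = capacity_mb
        self.on_gpu = OrderedDict()   # tensor name -> size, oldest use first
        self.on_host = {}             # tensors currently swapped out
        self.swaps = 0                # number of GPU -> host transfers

    def used_mb(self):
        return sum(self.on_gpu.values())

    def touch(self, name, size_mb):
        """Make a tensor resident on the GPU, evicting older tensors if needed."""
        if size_mb > self.capacity_mb:
            raise ValueError("tensor is larger than GPU memory")
        if name in self.on_gpu:
            self.on_gpu.move_to_end(name)     # mark as most recently used
            return
        self.on_host.pop(name, None)          # swap back in if it was offloaded
        while self.used_mb() + size_mb > self.capacity_mb:
            victim, victim_size = self.on_gpu.popitem(last=False)
            self.on_host[victim] = victim_size
            self.swaps += 1
        self.on_gpu[name] = size_mb

mem = ToyDeviceMemory(capacity_mb=100)
mem.touch("layer_a", 60)
mem.touch("layer_b", 30)
mem.touch("layer_c", 40)   # does not fit next to a and b, so layer_a is offloaded
```

Every swap crosses the CPU-GPU link, which is why the bandwidth of that link (PCIe versus NVLink on the slide) matters for this approach.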

Deep Learning using OpenPOWER
