8. Core concepts in Machine Learning: Training vs. Inference
Training
• Data intensive: historical data sets
• Compute intensive: 100% accelerated
• Develops a model for use at the edge as inference
Inference
• Enables the computer to act in real time
• Low power
• Runs out at the edge
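The split above can be sketched in a few lines: training is an iterative, data-hungry loop run offline, while inference is a single cheap pass that could run in real time on a low-power edge device. Everything below (the tiny linear model, the data, the numbers) is illustrative, not from PowerAI.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))             # "historical data set"
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=1000)

# --- Training: many passes over the data (compute intensive, done offline) ---
w = np.zeros(3)
for _ in range(500):                       # iterative optimization loop
    grad = X.T @ (X @ w - y) / len(y)      # gradient of mean squared error
    w -= 0.1 * grad

# --- Inference: one matrix-vector product (cheap, real time, at the edge) ---
def predict(sample, weights=w):
    return float(sample @ weights)

print(predict(np.array([1.0, 1.0, 1.0])))  # ~1.5, i.e. 2.0 - 1.0 + 0.5
```

The asymmetry is the point: the loop touches all 1000 samples 500 times, while a prediction is three multiplies and two adds.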
15. Challenges in creating an AI infrastructure
Typical timescales:
• To build a team with deep learning expertise: 2 months ~ 1 year
• To prepare massive training data: ~10 man-months
• To train a new model: 1 hour ~ 1 week
• To give an AI inference result: < 1 s
Time needed to:
• Find skills
• Handle large data sets (hi-res images, video feeds, ...)
• Continuously train models
• Run inferencing at scale
• Handle rapidly evolving open source components
CPUs are not getting faster as rapidly as before: Moore's law is dying.
Resulting in unprecedented demand for:
• Offloaded computation, accelerators, and higher memory bandwidth systems
• Easy-to-use software that works with open source and scales
16. PowerAI: Enterprise Class, Ease of Use, Faster Training
• Enterprise software distribution: binary packages of major deep learning frameworks with enterprise support
• Tools for ease of development: graphical tools to enhance the data scientist and developer experience
• Faster training times for data scientists: performance optimized for single-node and distributed computing scaling
17. [Diagram: offline training vs. production. Offline: images of damaged components flow from a data lake through transform & prep (ETL) into model training, yielding a trained model. Production: live video passes through transform & prep (ETL) and is scored against the trained model.]
23. PowerAI
Stack:
• DL frameworks + libraries (TensorFlow, Caffe, ...)
• IBM Data Science Experience (DSX)
• DL developer tools
• Distributed computing with Spark & MPI
• Spectrum Scale high-speed file system via HDFS APIs
• Cluster of NVLink servers
PowerAI Enterprise (coming soon): adds IBM enterprise support and application dev services.
Highlights:
• Enterprise support & services to augment enterprise expertise
• Packaged, pre-compiled deep learning frameworks (TensorFlow, Caffe, Torch, ...)
• Optimized for scaling & fast training time
• Data scientist productivity tools targeted at DL developers
24. PowerAI: Making AI More Accessible to Developers
Multi-tenant, enterprise-ready deep learning platform for data scientists:
• AI Vision: targeted at application developers
• Data extraction, transformation and preparation tool
• DL Insight
• Distributed Deep Learning
25. Life without PowerAI:
• caffe-bvlc: install CUDA and cuDNN; install OpenBLAS; install protobuf; clone, build and install OpenCV; install python and python-dev; install libgflags, libgoogle-glog-dev and liblmdb-dev; edit the makefile to enable cuDNN; make all; make distribute
• Torch: complicated on Power, as LuaJIT has mixed support for OpenPOWER; we use a LuaJIT fork to build
• caffe-nv: same dependencies as caffe-bvlc; separate upstream repo; specific versions are needed for newer releases of NVIDIA's DIGITS tool
• caffe-ibm: same dependencies as caffe-bvlc; separate build stream, versions, and updates
• TensorFlow: in pip for x86, but it is often recommended to build from source: upgrade pip, install Bazel, install many dependencies (including Java), configure the build, compile, pip install the .whl, upgrade protobuf
• Theano: install python, numpy, scipy, OpenBLAS, python-dev, nose, Sphinx, CUDA and PyCUDA; clone, build and install libgpuarray
• DIGITS: clone DIGITS from its repo, install dependencies (pip)
With PowerAI: install CUDA and cuDNN, then: sudo apt-get install power-mldl
26. [Diagram: PowerAI developer stack]
• DL frameworks (TensorFlow, Caffe, etc.), with distributed training
• Data prep & ETL via Spectrum Conductor with Spark
• Deep learning GUI: data & model management, ETL tools, monitoring, visualization, advice
• DL Insight tuning engine
• AI Vision: computer vision app development toolkit
• IBM Spectrum Conductor with Spark: system management, distributed ETL, distributed training, hyper-parameter optimization
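Hyper-parameter optimization, listed above as a Spectrum Conductor capability, is easy to illustrate in isolation. The sketch below is a generic random search, not Conductor's actual API; `validation_loss` is a hypothetical stand-in for a full training run, and in a cluster each trial could be dispatched to its own node.

```python
import random

def validation_loss(lr, batch_size):
    # Hypothetical stand-in for a real training run's validation loss;
    # pretend the sweet spot is lr = 0.1 and batch_size = 64.
    return (lr - 0.1) ** 2 + ((batch_size - 64) / 64) ** 2

random.seed(0)
best = None
for _ in range(100):                        # each trial is independent work
    lr = 10 ** random.uniform(-4, 0)        # sample learning rate log-uniformly
    batch = random.choice([16, 32, 64, 128, 256])
    loss = validation_loss(lr, batch)
    if best is None or loss < best[0]:
        best = (loss, lr, batch)

print(best)                                 # best (loss, lr, batch_size) found
```

Because trials never communicate, a scheduler can run them fully in parallel, which is what makes this workload a natural fit for a shared, managed cluster.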
27. [Diagram: cognitive solution stack on accelerated infrastructure]
• Applications: segment specific (finance, retail, healthcare, etc.)
• Cognitive APIs (e.g. Watson) and in-house cognitive APIs: speech, vision, NLP, sentiment
• Machine & deep learning libraries & frameworks: TensorFlow, Caffe, SparkML
• Distributed computing: Spark, MPI; transform & prep data (ETL)
• Data lake & data stores: Hadoop HDFS, NoSQL DBs
• Accelerated infrastructure: accelerated servers and storage
29. [Diagram: one shared cluster for new-gen workloads, with workload-aware scheduling and shared resource management for emerging workloads]
• Deep learning training + inference (new): accelerators, clustering frameworks, IBM Data Science Experience
• Dev ops & microservices: containers and images, IBM Cloud Private
• High performance computing: design / simulation / modeling
• High performance analytics: trade / risk analytics
• 'New-gen workloads': Hadoop, Spark, containers; Spectrum Conductor with Spark
31. IBM OpenPOWER Moves on Deep Learning with a Vengeance
"In short, IBM kicked some butt today"
Rob Enderle, Industry Analyst
IBM brings Google's AI tools to its powerful computers: "Google has cool technology to recognize images and speech, and IBM's hardware can diagnose diseases and beat humans in Jeopardy. Combine the two, and you get a powerful computer with serious brains."
32. OpenPOWER: Open Hardware for High Performance
Systems designed for big data analytics and superior cloud economics.
Up to:
• 10 cores per CPU
• 96 hardware threads per CPU
• 1/2 TB RAM
• 7.6 Tb/s combined I/O bandwidth
[Chart comparing OpenPOWER with traditional Intel x86 omitted]
http://www.softlayer.com/POWER-SERVERS
https://power.jarvice.com/landing
33. Accelerated AI: Chip and Servers
POWER8 + coherent CAPI + novel NVLink for high-bandwidth coherent CPU/GPU acceleration.
S822LC-hpc ("POWER8 with NVLink"):
• 2 POWER8 10-core CPUs
• 4 NVIDIA P100 "Pascal" GPUs
• 256 GB system memory
• 2 SSD storage devices
• High-speed interconnect (IB or Ethernet, depending on infrastructure)
• Optional: up to 1 TB system memory; PCIe-attached NVMe storage
Power Linux servers:
• S821LC: high-density 2-socket 1U
• S822LC for Big Data
• S822LC for High Performance Computing
M. Gschwind, Bringing the Deep Learning Revolution into the Enterprise
34. Introducing the S822LC Power System for HPC:
First custom-built GPU accelerator server with NVLink and NVIDIA P100 GPUs
▪ Custom-built GPU accelerator server
▪ High-speed NVLink connections between CPUs & GPUs and among GPUs
▪ Features the novel NVIDIA P100 Pascal GPU accelerator
[Diagram: in the POWER8 NVLink server, each POWER8 CPU connects to its P100 GPUs over NVLink at 80 GB/s, giving 2.5x faster CPU-GPU data communication. x86 servers have no NVLink between CPU and GPU, so CPU-GPU traffic crosses PCIe at 32 GB/s, a bottleneck.]
M. Gschwind, Bringing the Deep Learning Revolution into the Enterprise
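The 2.5x figure follows directly from the two link speeds quoted in the diagram. A quick back-of-envelope check (the 4 GB batch size is an arbitrary illustrative number, not from the slide):

```python
# Link speeds from the diagram above.
NVLINK_GBPS = 80.0   # GB/s, POWER8-to-P100 NVLink
PCIE_GBPS = 32.0     # GB/s, x86 CPU-to-GPU PCIe path

batch_gb = 4.0       # hypothetical 4 GB of training data to move to the GPU

t_nvlink = batch_gb / NVLINK_GBPS   # seconds over NVLink
t_pcie = batch_gb / PCIE_GBPS       # seconds over PCIe

print(f"NVLink: {t_nvlink * 1000:.0f} ms, PCIe: {t_pcie * 1000:.0f} ms, "
      f"speedup: {t_pcie / t_nvlink:.1f}x")   # speedup: 2.5x
```

For workloads that repeatedly stream batches between host and GPU memory, this per-transfer gap compounds over the whole training run.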
35. Higher Performance with POWER8 CPU - P100 GPU NVLink
[Diagram, mirrored for each of the two sockets: a POWER8 CPU connects to two P100 GPUs over NVLink at 80 GB/s; the CPU reaches system memory at 115 GB/s, and each GPU has its own GPU memory.]
37. PowerAI vs. DGX-1: 1.6x TensorFlow Throughput per Dollar
• TensorFlow 0.12 on the IBM PowerAI platform takes advantage of the full capabilities of NVLink
• For image classification and analysis this means a 1.6x price/performance advantage relative to the NVIDIA DGX-1

System                                    Images/Second   List Price   $ per Image/Second (lower is better)
NVIDIA DGX-1 (8 P100 GPUs, 512 GB mem)    330             $129,000     $390
PowerAI (4 P100 GPUs, 512 GB mem)         273             $67,000      $241
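The dollars-per-image/second column and the headline ratio can be reproduced from the raw throughput and list-price figures in the table:

```python
# Throughput and list prices from the comparison table above.
systems = {
    "DGX-1 (8x P100)": {"images_per_sec": 330, "list_price": 129_000},
    "PowerAI (4x P100)": {"images_per_sec": 273, "list_price": 67_000},
}

# Dollars per unit of throughput: lower is better.
cost_per_throughput = {
    name: s["list_price"] / s["images_per_sec"] for name, s in systems.items()
}

advantage = (cost_per_throughput["DGX-1 (8x P100)"]
             / cost_per_throughput["PowerAI (4x P100)"])
print(f"{advantage:.2f}x price/performance advantage")   # ~1.59x, i.e. roughly 1.6x
```

Note the exact quotient is about 1.59, which the slide rounds to 1.6x.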
39. PowerAI trial configurations in a public cloud:
• Docker container builds and comes up in minutes
• Single P100 GPU: 30 days with 60 hrs standard (120 for sales referral); 128 GB RAM, 32 CPU threads, 1 TB shared storage
• Quad P100 GPUs: 30 days with 120 hrs standard (more by request); 512 GB RAM, 128 CPU threads, 1 TB shared storage
Contact: Michael Boros
Nimbix Cloud advantages:
• Easier to use
• Highest performance
• Ultra-fast launch times
• Lower cost
• Faster time to value
• Bare-metal acceleration
• Enterprise accounting
• Application marketplace
• Private apps
https://www.slideshare.net/IndrajitPoddar/fast-scalable-easy-machine-learning-with-openpower-gpus-and-docker
Experience performance with productivity: a superior integrated stack and adequate hardware resources for deep learning insights.
40. [Diagram: end-to-end workflow across roles and tools]
• Solution developer: develops the DL neural network via the interactive GUI (DSX) and injects the designed DL network into AI Vision
• AI Vision: data labeling, then launch deep learning training with one click
• DL Insight: monitor the training progress; the DL engineer gets optimized model parameters
• PowerAI Inference Engine: deploy the inference API to the data center, or generate and deploy the DL inference accelerator onto an FPGA
• Test engineer: error results are looped back to trigger a new training task
"Easier Insights with Data Science Experience and PowerAI Deep Learning":
https://ibm.box.com/s/m7ooeoi738rs7dq9l9v0i9iir79t4xmd
Analytics Signature Moment event in Munich:
https://www.ibm.com/analytics/us/en/events/machine-learning/
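The "deploy the inference API" step can be pictured as a plain HTTP endpoint in front of a trained model. This is a generic stand-alone sketch, not the PowerAI Inference Engine's actual interface; `model` is a hypothetical stand-in for real trained weights.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def model(features):
    # Hypothetical stand-in for a trained model's forward pass, e.g.
    # classifying a component image's features as damaged or not.
    return {"damaged": sum(features) > 1.0}

class InferenceHandler(BaseHTTPRequestHandler):
    # POST a JSON body like {"features": [0.7, 0.9]} to get a prediction.
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        features = json.loads(self.rfile.read(length))["features"]
        body = json.dumps(model(features)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# Port 0 picks a free port; a real deployment would use a fixed one.
server = HTTPServer(("", 0), InferenceHandler)
# server.serve_forever()  # uncomment to start serving requests
```

Keeping the model behind a network API is what lets the test engineer's error results feed back into a new training task without redeploying the clients.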
41. Example value realized by an Asian utility company using PowerAI:
• A 10x increase in inspections/day
• A 90% decrease in inspection time
• A significant reduction in worker accidents