8. Core concepts in Machine Learning: Training vs. Inference
Training
• Data intensive: historical data sets
• Compute intensive: 100% accelerated
• Develops a model for use at the edge as inference
Inference
• Enables the computer to act in real time
• Low power
• Runs out at the edge
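The split above can be sketched in a few lines: training is an iterative, data-hungry loop run offline, while inference is a single cheap pass that could run in real time on a low-power edge device. Everything below (the tiny linear model, the data, the numbers) is illustrative, not from PowerAI.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))             # "historical data set"
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=1000)

# --- Training: many passes over the data (compute intensive, done offline) ---
w = np.zeros(3)
for _ in range(500):                       # iterative optimization loop
    grad = X.T @ (X @ w - y) / len(y)      # gradient of mean squared error
    w -= 0.1 * grad

# --- Inference: one matrix-vector product (cheap, real time, at the edge) ---
def predict(sample, weights=w):
    return float(sample @ weights)

print(predict(np.array([1.0, 1.0, 1.0])))  # ~1.5, i.e. 2.0 - 1.0 + 0.5
```

The asymmetry is the point: the loop touches all 1000 samples 500 times, while a prediction is three multiplies and two adds.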
15. Challenges in creating an AI infrastructure
Typical timescales:
• To build a team with deep learning expertise: 2 months ~ 1 year
• To prepare massive training data: ~10 man-months
• To train a new model: 1 hour ~ 1 week
• To give an AI inference result: < 1 s
Time needed to:
• Find skills
• Handle large data sets (hi-res images, video feeds, ...)
• Continuously train models
• Run inferencing at scale
• Handle rapidly evolving open source components
CPUs are not getting faster as rapidly as before: Moore's law is dying.
Resulting in unprecedented demand for:
• Offloaded computation, accelerators, and higher memory bandwidth systems
• Easy-to-use software that works with open source and scales
16. PowerAI: Enterprise Class, Ease of Use, Faster Training
• Enterprise software distribution: binary packages of major deep learning frameworks with enterprise support
• Tools for ease of development: graphical tools to enhance the data scientist and developer experience
• Faster training times for data scientists: performance optimized for single-node and distributed computing scaling
17. [Diagram: offline training vs. production. Offline: images of damaged components flow from a data lake through transform & prep (ETL) into model training, yielding a trained model. Production: live video passes through transform & prep (ETL) and is scored against the trained model.]
23. PowerAI
Stack:
• DL frameworks + libraries (TensorFlow, Caffe, ...)
• IBM Data Science Experience (DSX)
• DL developer tools
• Distributed computing with Spark & MPI
• Spectrum Scale high-speed file system via HDFS APIs
• Cluster of NVLink servers
PowerAI Enterprise (coming soon): adds IBM enterprise support and application dev services.
Highlights:
• Enterprise support & services to augment enterprise expertise
• Packaged, pre-compiled deep learning frameworks (TensorFlow, Caffe, Torch, ...)
• Optimized for scaling & fast training time
• Data scientist productivity tools targeted at DL developers
24. PowerAI: Making AI More Accessible to Developers
Multi-tenant, enterprise-ready deep learning platform for data scientists:
• AI Vision: targeted at application developers
• Data extraction, transformation and preparation tool
• DL Insight
• Distributed Deep Learning
25. Life without PowerAI:
• caffe-bvlc: install CUDA and cuDNN; install OpenBLAS; install protobuf; clone, build and install OpenCV; install python and python-dev; install libgflags, libgoogle-glog-dev and liblmdb-dev; edit the makefile to enable cuDNN; make all; make distribute
• Torch: complicated on Power, as LuaJIT has mixed support for OpenPOWER; we use a LuaJIT fork to build
• caffe-nv: same dependencies as caffe-bvlc; separate upstream repo; specific versions are needed for newer releases of NVIDIA's DIGITS tool
• caffe-ibm: same dependencies as caffe-bvlc; separate build stream, versions, and updates
• TensorFlow: in pip for x86, but it is often recommended to build from source: upgrade pip, install Bazel, install many dependencies (including Java), configure the build, compile, pip install the .whl, upgrade protobuf
• Theano: install python, numpy, scipy, OpenBLAS, python-dev, nose, Sphinx, CUDA and PyCUDA; clone, build and install libgpuarray
• DIGITS: clone DIGITS from its repo, install dependencies (pip)
With PowerAI: install CUDA and cuDNN, then: sudo apt-get install power-mldl
26. [Diagram: PowerAI developer stack]
• DL frameworks (TensorFlow, Caffe, etc.), with distributed training
• Data prep & ETL via Spectrum Conductor with Spark
• Deep learning GUI: data & model management, ETL tools, monitoring, visualization, advice
• DL Insight tuning engine
• AI Vision: computer vision app development toolkit
• IBM Spectrum Conductor with Spark: system management, distributed ETL, distributed training, hyper-parameter optimization
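Hyper-parameter optimization, listed above as a Spectrum Conductor capability, is easy to illustrate in isolation. The sketch below is a generic random search, not Conductor's actual API; `validation_loss` is a hypothetical stand-in for a full training run, and in a cluster each trial could be dispatched to its own node.

```python
import random

def validation_loss(lr, batch_size):
    # Hypothetical stand-in for a real training run's validation loss;
    # pretend the sweet spot is lr = 0.1 and batch_size = 64.
    return (lr - 0.1) ** 2 + ((batch_size - 64) / 64) ** 2

random.seed(0)
best = None
for _ in range(100):                        # each trial is independent work
    lr = 10 ** random.uniform(-4, 0)        # sample learning rate log-uniformly
    batch = random.choice([16, 32, 64, 128, 256])
    loss = validation_loss(lr, batch)
    if best is None or loss < best[0]:
        best = (loss, lr, batch)

print(best)                                 # best (loss, lr, batch_size) found
```

Because trials never communicate, a scheduler can run them fully in parallel, which is what makes this workload a natural fit for a shared, managed cluster.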
27. [Diagram: cognitive solution stack on accelerated infrastructure]
• Applications: segment specific (finance, retail, healthcare, etc.)
• Cognitive APIs (e.g. Watson) and in-house cognitive APIs: speech, vision, NLP, sentiment
• Machine & deep learning libraries & frameworks: TensorFlow, Caffe, SparkML
• Distributed computing: Spark, MPI; transform & prep data (ETL)
• Data lake & data stores: Hadoop HDFS, NoSQL DBs
• Accelerated infrastructure: accelerated servers and storage
29. [Diagram: one shared cluster for new-gen workloads, with workload-aware scheduling and shared resource management for emerging workloads]
• Deep learning training + inference (new): accelerators, clustering frameworks, IBM Data Science Experience
• Dev ops & microservices: containers and images, IBM Cloud Private
• High performance computing: design / simulation / modeling
• High performance analytics: trade / risk analytics
• 'New-gen workloads': Hadoop, Spark, containers; Spectrum Conductor with Spark
31. IBM OpenPOWER Moves on Deep Learning with a Vengeance
"In short, IBM kicked some butt today"
Rob Enderle, Industry Analyst
IBM brings Google's AI tools to its powerful computers: "Google has cool technology to recognize images and speech, and IBM's hardware can diagnose diseases and beat humans in Jeopardy. Combine the two, and you get a powerful computer with serious brains."
32. OpenPOWER: Open Hardware for High Performance
Systems designed for big data analytics and superior cloud economics.
Up to:
• 10 cores per CPU
• 96 hardware threads per CPU
• 1/2 TB RAM
• 7.6 Tb/s combined I/O bandwidth
[Chart comparing OpenPOWER with traditional Intel x86 omitted]
http://www.softlayer.com/POWER-SERVERS
https://power.jarvice.com/landing
33. Accelerated AI: Chip and Servers
POWER8 + coherent CAPI + novel NVLink for high-bandwidth coherent CPU/GPU acceleration.
S822LC-hpc ("POWER8 with NVLink"):
• 2 POWER8 10-core CPUs
• 4 NVIDIA P100 "Pascal" GPUs
• 256 GB system memory
• 2 SSD storage devices
• High-speed interconnect (IB or Ethernet, depending on infrastructure)
• Optional: up to 1 TB system memory; PCIe-attached NVMe storage
Power Linux servers:
• S821LC: high-density 2-socket 1U
• S822LC for Big Data
• S822LC for High Performance Computing
M. Gschwind, Bringing the Deep Learning Revolution into the Enterprise
34. Introducing the S822LC Power System for HPC:
First custom-built GPU accelerator server with NVLink and NVIDIA P100 GPUs
▪ Custom-built GPU accelerator server
▪ High-speed NVLink connections between CPUs & GPUs and among GPUs
▪ Features the novel NVIDIA P100 Pascal GPU accelerator
[Diagram: in the POWER8 NVLink server, each POWER8 CPU connects to its P100 GPUs over NVLink at 80 GB/s, giving 2.5x faster CPU-GPU data communication. x86 servers have no NVLink between CPU and GPU, so CPU-GPU traffic crosses PCIe at 32 GB/s, a bottleneck.]
M. Gschwind, Bringing the Deep Learning Revolution into the Enterprise
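The 2.5x figure follows directly from the two link speeds quoted in the diagram. A quick back-of-envelope check (the 4 GB batch size is an arbitrary illustrative number, not from the slide):

```python
# Link speeds from the diagram above.
NVLINK_GBPS = 80.0   # GB/s, POWER8-to-P100 NVLink
PCIE_GBPS = 32.0     # GB/s, x86 CPU-to-GPU PCIe path

batch_gb = 4.0       # hypothetical 4 GB of training data to move to the GPU

t_nvlink = batch_gb / NVLINK_GBPS   # seconds over NVLink
t_pcie = batch_gb / PCIE_GBPS       # seconds over PCIe

print(f"NVLink: {t_nvlink * 1000:.0f} ms, PCIe: {t_pcie * 1000:.0f} ms, "
      f"speedup: {t_pcie / t_nvlink:.1f}x")   # speedup: 2.5x
```

For workloads that repeatedly stream batches between host and GPU memory, this per-transfer gap compounds over the whole training run.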
35. Higher Performance with POWER8 CPU - P100 GPU NVLink
[Diagram, mirrored for each of the two sockets: a POWER8 CPU connects to two P100 GPUs over NVLink at 80 GB/s; the CPU reaches system memory at 115 GB/s, and each GPU has its own GPU memory.]
37. PowerAI vs. DGX-1: 1.6x TensorFlow Throughput per Dollar
• TensorFlow 0.12 on the IBM PowerAI platform takes advantage of the full capabilities of NVLink
• For image classification and analysis this means a 1.6x price/performance advantage relative to the NVIDIA DGX-1

System                                    Images/Second   List Price   $ per Image/Second (lower is better)
NVIDIA DGX-1 (8 P100 GPUs, 512 GB mem)    330             $129,000     $390
PowerAI (4 P100 GPUs, 512 GB mem)         273             $67,000      $241
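The dollars-per-image/second column and the headline ratio can be reproduced from the raw throughput and list-price figures in the table:

```python
# Throughput and list prices from the comparison table above.
systems = {
    "DGX-1 (8x P100)": {"images_per_sec": 330, "list_price": 129_000},
    "PowerAI (4x P100)": {"images_per_sec": 273, "list_price": 67_000},
}

# Dollars per unit of throughput: lower is better.
cost_per_throughput = {
    name: s["list_price"] / s["images_per_sec"] for name, s in systems.items()
}

advantage = (cost_per_throughput["DGX-1 (8x P100)"]
             / cost_per_throughput["PowerAI (4x P100)"])
print(f"{advantage:.2f}x price/performance advantage")   # ~1.59x, i.e. roughly 1.6x
```

Note the exact quotient is about 1.59, which the slide rounds to 1.6x.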
39. PowerAI trial configurations in a public cloud:
• Docker container builds and comes up in minutes
• Single P100 GPU: 30 days with 60 hrs standard (120 for sales referral); 128 GB RAM, 32 CPU threads, 1 TB shared storage
• Quad P100 GPUs: 30 days with 120 hrs standard (more by request); 512 GB RAM, 128 CPU threads, 1 TB shared storage
Contact: Michael Boros
Nimbix Cloud advantages:
• Easier to use
• Highest performance
• Ultra-fast launch times
• Lower cost
• Faster time to value
• Bare-metal acceleration
• Enterprise accounting
• Application marketplace
• Private apps
https://www.slideshare.net/IndrajitPoddar/fast-scalable-easy-machine-learning-with-openpower-gpus-and-docker
Experience performance with productivity: a superior integrated stack and adequate hardware resources for deep learning insights.
40. [Diagram: end-to-end workflow across roles and tools]
• Solution developer: develops the DL neural network via the interactive GUI (DSX) and injects the designed DL network into AI Vision
• AI Vision: data labeling, then launch deep learning training with one click
• DL Insight: monitor the training progress; the DL engineer gets optimized model parameters
• PowerAI Inference Engine: deploy the inference API to the data center, or generate and deploy the DL inference accelerator onto an FPGA
• Test engineer: error results are looped back to trigger a new training task
"Easier Insights with Data Science Experience and PowerAI Deep Learning":
https://ibm.box.com/s/m7ooeoi738rs7dq9l9v0i9iir79t4xmd
Analytics Signature Moment event in Munich:
https://www.ibm.com/analytics/us/en/events/machine-learning/
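The "deploy the inference API" step can be pictured as a plain HTTP endpoint in front of a trained model. This is a generic stand-alone sketch, not the PowerAI Inference Engine's actual interface; `model` is a hypothetical stand-in for real trained weights.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def model(features):
    # Hypothetical stand-in for a trained model's forward pass, e.g.
    # classifying a component image's features as damaged or not.
    return {"damaged": sum(features) > 1.0}

class InferenceHandler(BaseHTTPRequestHandler):
    # POST a JSON body like {"features": [0.7, 0.9]} to get a prediction.
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        features = json.loads(self.rfile.read(length))["features"]
        body = json.dumps(model(features)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# Port 0 picks a free port; a real deployment would use a fixed one.
server = HTTPServer(("", 0), InferenceHandler)
# server.serve_forever()  # uncomment to start serving requests
```

Keeping the model behind a network API is what lets the test engineer's error results feed back into a new training task without redeploying the clients.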
41. Example value realized by an Asian utility company using PowerAI:
• A 10x increase in inspections/day
• A 90% decrease in inspection time
• A significant reduction in worker accidents