3. Watson Machine Learning Community Edition
Open Source – Enhanced and delivered as Conda Packages
Curated, tested, and pre-compiled binary software distribution that enables enterprises to quickly and easily deploy deep learning for their data science and analytics development.
4. WML CE (PowerAI) stack
- Deep Learning Impact (DLI) module: data & model management, ETL, visualization, and advice
- IBM Spectrum Conductor with Spark: cluster virtualization, dynamic resource orchestration, multiple frameworks, distributed execution engine
- PowerAI: open source ML frameworks, plus Large Model Support (LMS), Distributed Deep Learning (DDL, to 1000s of nodes), automatic hyper-parameter tuning, and SnapML
- WML Accelerator (PowerAI Enterprise): AI for data scientists and non-data scientists; Distributed Deep Learning (up to 4 nodes); SnapML
- Accelerated infrastructure: accelerated servers and storage
- PowerAI Vision: Auto-DL for images & video (label, train, deploy)
- H2O Driverless AI: Auto-ML for text & numeric data, NLP (import, experiment, deploy)
5. Train larger, more complex models
Large Model Support vs. traditional model support:
- Traditional: limited GPU memory forces a tradeoff between model size and data resolution; data moves from CPU (DDR4) to GPU graphics memory over PCIe, and that PCIe link is the system bottleneck.
- Large Model Support: use system memory alongside GPU memory to support more complex models and higher-resolution data; the POWER CPU (DDR4) connects to the GPU over NVLink, giving a fast data pipe.
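A hedged sketch of the idea behind Large Model Support, using NumPy arrays to stand in for host (DDR4) and device (graphics) memory: the full set of layer weights lives in large "system memory", and only the layer currently being computed is staged into a small "GPU" buffer. The names, sizes, and staging scheme are illustrative only, not WML CE APIs.

```python
import numpy as np

# "System memory": all layer weights live here (large capacity).
rng = np.random.default_rng(0)
layers_host = [rng.standard_normal((8, 8)) for _ in range(6)]

def forward_one_layer_at_a_time(x, layers):
    """Run a forward pass while holding only one layer on the 'device'."""
    for host_weights in layers:
        gpu_weights = host_weights.copy()   # stage this layer over the fast link
        x = np.tanh(x @ gpu_weights)        # compute on the "device"
        del gpu_weights                     # free device memory for the next layer
    return x

x = rng.standard_normal((4, 8))
y = forward_one_layer_at_a_time(x, layers_host)
print(y.shape)  # (4, 8)
```

The tradeoff LMS makes is exactly this one: extra transfers per layer in exchange for models far larger than graphics memory, which is why the speed of the CPU-GPU link matters so much.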
6. IBM AC922 Power System: deep learning server (4-GPU configuration)
- Two POWER9 CPUs, each with 1 TB of system memory at 170 GB/s and a pair of NVIDIA V100 GPUs attached over NVLink at 150 GB/s
- Store large models in system memory; operate on one layer at a time; fast transfer via NVLink
- 5x faster data communication with the unique CPU-GPU NVLink high-speed connection
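The "5x" figure can be sanity-checked with quick arithmetic against the slide's 150 GB/s NVLink number. The PCIe comparison point is our assumption (a PCIe Gen3 x16 link peaks around 32 GB/s); the slide itself gives only the NVLink bandwidth.

```python
# Approximate peak bandwidths, GB/s
pcie_gen3_x16 = 32.0    # assumption: typical PCIe Gen3 x16 peak
nvlink_cpu_gpu = 150.0  # from the slide (AC922 CPU-GPU NVLink)

speedup = nvlink_cpu_gpu / pcie_gen3_x16
print(f"{speedup:.1f}x")  # ~4.7x, which rounds to the slide's "5x"
```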
7. Distributed Deep Learning (DDL)
- Deep learning training takes days to weeks, and scaling across multiple x86 servers is limited.
- PowerAI with DDL enables scaling to 100s of GPUs: going from 1 system to 64 systems took one ResNet-101 / ImageNet-22K training run from 16 days down to 7 hours (58x faster).
- Near-ideal scaling to 256 GPUs: 95% scaling efficiency with 256 GPUs (ResNet-50, ImageNet-1K; Caffe with PowerAI DDL, running on Minsky S822LC Power Systems).
[Chart: speedup vs. number of GPUs, 4 to 256; DDL actual scaling closely tracks ideal scaling.]
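DDL is a data-parallel scheme: each GPU computes gradients on its own shard of the batch, then an allreduce averages them so every replica applies the same update. A minimal NumPy sketch of that averaging step, simulating the workers in one process (our own illustration, not the DDL API):

```python
import numpy as np

rng = np.random.default_rng(42)
n_workers = 4

# Each simulated worker computes a gradient on its own data shard.
local_grads = [rng.standard_normal(8) for _ in range(n_workers)]

# Allreduce (average): afterwards, every worker holds the same gradient.
avg_grad = np.mean(local_grads, axis=0)
synced = [avg_grad.copy() for _ in range(n_workers)]

# All replicas now apply identical updates, keeping their weights in sync.
assert all(np.array_equal(g, synced[0]) for g in synced)
```

The cost of this communication step relative to compute is what limits scaling efficiency, which is why the 95% figure at 256 GPUs is the headline result.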
8. WML CE: conda distribution
What is Conda (and why should I care?)
- It's a packaging format
- It has its own package manager
- It has its own package list
- Integrated dependency solver
- Acquires software from a repository by URL (similar to Git)
- Importantly for us, it allows you to install multiple instances side by side
https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/
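Using the repository URL above, installation follows the usual conda pattern. A hedged sketch only: the `powerai` meta-package name and Python version are assumptions, so check the channel's package list before relying on them.

```shell
# Register the IBM WML CE conda channel (URL from this slide)
conda config --prepend channels \
    https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/

# Separate environments are what let you keep multiple instances side by side
conda create -n wmlce python=3.7 powerai   # 'powerai' meta-package name: assumption
conda activate wmlce
```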
9. We haven't forgotten Docker!
https://hub.docker.com/r/ibmcom/powerai
NEW: images with individual frameworks
- Base repository image (no frameworks installed)
- TensorFlow-based image (py36, py37)
- PyTorch-based image (py36, py37)
- Caffe-ibm-based image (py36, py37)
- SnapML-based image (py36, py37)
- All frameworks (py36, py37)
We now provide Red Hat Universal Base images too!
https://access.redhat.com/containers/#/product/18c03ee6ba6a3657
More choice, more flexibility, more simplicity.
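Pulling one of these images follows the standard Docker Hub pattern. The tag below is a hypothetical example of the framework-specific naming; check the repository's tag list for the real tags.

```shell
# Pull a framework-specific image (tag is an illustrative assumption)
docker pull ibmcom/powerai:1.7.0-tensorflow-ubuntu18.04-py37

# Run it interactively with GPU access (requires the NVIDIA container runtime)
docker run --rm -it --gpus all \
    ibmcom/powerai:1.7.0-tensorflow-ubuntu18.04-py37 bash
```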
10. CE Strategy
- Freely available: https://developer.ibm.com/linuxonpower/deep-learning-powerai/releases/
- Provided as bare metal and containers
- Support upstream CI environments
- Contribute features and bug fixes
- Quarterly releases
- Engage with Conda Forge
- Sample data: https://github.com/IBM/powerai
15. Snap ML: Accelerating Machine Learning
Why Fast? Performance matters for online re-training of models, model selection and hyper-parameter tuning, and fast adaptability to changes.
Why Large-Scale? Large datasets arise in business-critical applications: recommendation, credit fraud, advertising, space exploration, weather, etc.
Why Resource-Savvy? Increased resource utilization means less idle time; lower usage means savings and a higher profit margin.
Why Interpretable? A necessary feature for regulated industries where accountability is critical.
Snap ML is a set of compute libraries that transparently accelerate open source frameworks for training machine learning (ML) models. Its main characteristics are: fast, scalable, consumable, interpretable, and resource-efficient.
Core publication: https://arxiv.org/abs/1803.06333
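Snap ML provides GLM trainers (logistic regression among them) behind a scikit-learn-style fit/predict workflow. To make concrete what such a trainer computes, here is a tiny from-scratch logistic-regression sketch in NumPy; this is our own illustration of the underlying model, not Snap ML code.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linearly separable data: label is 1 when x0 + x1 > 0
X = rng.standard_normal((200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w = np.zeros(2)
b = 0.0
lr = 0.5
for _ in range(300):                          # full-batch gradient descent
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))    # sigmoid probabilities
    grad_w = X.T @ (p - y) / len(y)           # logistic-loss gradients
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

pred = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(float)
accuracy = np.mean(pred == y)
print(f"train accuracy: {accuracy:.2f}")
```

Snap ML's contribution is not the model itself but making this kind of training fast and scalable on GPUs and across nodes, which is why it can accelerate existing frameworks transparently.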
16. Snap ML Features
Snap ML offers today's most popular ML models (source: Kaggle ML & DS Survey, Nov. 2019):
- WML CE 1.6.0 (1Q19): Linear Regression, Logistic Regression, SVM
- WML CE 1.6.1 (2Q19): Decision Trees, Random Forest
- WML CE 1.6.2 (4Q19): Boosting Machine
In February 2020 we released the 7th version of Snap ML with IBM WML CE 1.7.0.