Watson Machine Learning
Community Edition (WML-CE)
—
Ravi Gummadi
IBM Cognitive Systems
March, 2020 / © 2020 IBM Corporation
Agenda
• WML-CE Overview
• Open Source ML/DL frameworks
• LMS
• DDL
• DML (includes SnapML)
• WML-CE Setup
• SnapML Deep Dive
[Stack diagram: the IBM AI portfolio, bottom to top]
• Accelerated Infrastructure: accelerated servers and storage
• WML CE (PowerAI): open source ML/DL frameworks, Large Model Support (LMS), Distributed Deep Learning (DDL, 1000s of nodes), SnapML
• WML Accelerator (PowerAI Enterprise): Deep Learning Impact (DLI) module (data & model management, ETL, visualize, advise); IBM Spectrum Conductor with Spark (cluster virtualization, dynamic resource orchestration, multiple frameworks, distributed execution engine); auto hyper-parameter tuning; Distributed Deep Learning (DDL)
• AI for Data Scientists and non-Data Scientists: PowerAI Vision, Auto-DL for images & video (label, train, deploy); H2O Driverless AI, Auto-ML for text & numeric data and NLP (import, experiment, deploy)
Watson Machine Learning Community Edition
Open source, enhanced and delivered as conda packages: a curated, tested, and pre-compiled binary software distribution that enables enterprises to quickly and easily deploy machine learning for their data science and analytics development. It includes IBM's SnapML alongside the open source frameworks.
IBM Value Addition: LMS (train larger, more complex models)

Traditional model support: limited memory on the GPU forces a tradeoff in model size / data resolution. A conventional CPU's DDR4 system memory feeds the GPU's graphics memory over PCIe, and that PCIe link is the system bottleneck.

Large Model Support: use system memory together with the GPU to support more complex models and higher-resolution data. On POWER, the CPU's DDR4 system memory connects to the GPU's graphics memory over NVLink, a far wider data pipe.
https://developer.ibm.com/linuxonpower/2019/06/11/tensorflow-large-model-support-resources/
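A minimal sketch of how LMS is switched on in the TensorFlow 2.x build shipped with WML-CE, per the resources linked above. Note that set_lms_enabled is specific to IBM's TensorFlow build, not stock TensorFlow, so treat the exact call as an assumption for your release.

```python
# Minimal sketch: enabling TensorFlow Large Model Support (LMS) in the
# TensorFlow build shipped with WML-CE (set_lms_enabled exists only in
# IBM's TF build; assumption based on the linked LMS resources).
import tensorflow as tf

tf.config.experimental.set_lms_enabled(True)  # swap tensors to system memory

# Train a model whose activations would not fit in GPU memory alone;
# LMS pages inactive tensors out to system memory over NVLink.
model = tf.keras.applications.ResNet50(weights=None, classes=1000)
model.compile(optimizer="sgd", loss="categorical_crossentropy")
# model.fit(dataset, epochs=...)  # training then proceeds as usual
```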
IBM AC922 Power System Deep Learning Server (4-GPU config)

[Diagram: two POWER9 CPUs, each with 1 TB of system memory (170 GB/s) and two NVIDIA V100 GPUs attached over 150 GB/s NVLink.]

• Store large models in system memory
• Operate on one layer at a time
• Fast transfer via NVLink

5x faster data communication with the unique CPU-GPU NVLink high-speed connection.
IBM Value Addition: Distributed Deep Learning (DDL)

Deep learning training takes days to weeks, and scaling beyond a few x86 servers is limited. PowerAI with DDL enables scaling to 100s of GPUs: training ResNet-101 on ImageNet-22K dropped from 16 days on 1 system to 7 hours on 64 systems, 58x faster. A sketch of one integration path follows the chart summary below.

[Chart: speedup vs. number of GPUs (4 to 256); DDL's actual scaling tracks ideal scaling, reaching 95% scaling efficiency with 256 GPUs. ResNet-50, ImageNet-1K, Caffe with PowerAI DDL, running on Minsky (S822LC) Power Systems.]
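DDL jobs are launched with WML-CE's ddlrun utility; newer releases also let DDL serve as the communication backend for Horovod-style scripts. The sketch below uses the standard Horovod Keras API purely as an illustration of that path; the launcher wiring and DDL-Horovod pairing are assumptions that vary by release.

```python
# Illustrative data-parallel training script in the Horovod style that
# WML-CE's DDL can back (assumption; launch with ddlrun/mpirun, one
# process per GPU). Not the only DDL integration path.
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()
# Pin each process to a single local GPU.
gpus = tf.config.experimental.list_physical_devices("GPU")
if gpus:
    tf.config.experimental.set_visible_devices(gpus[hvd.local_rank()], "GPU")

model = tf.keras.applications.ResNet50(weights=None)
# Scale the learning rate by world size and wrap the optimizer so
# gradients are all-reduced across GPUs on every step.
opt = hvd.DistributedOptimizer(tf.keras.optimizers.SGD(0.01 * hvd.size()))
model.compile(optimizer=opt, loss="categorical_crossentropy")
# model.fit(..., callbacks=[hvd.callbacks.BroadcastGlobalVariablesCallback(0)])
```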
Distributed Machine Learning (DML)
• IBM Value Addition: SnapML
 pai4sk: scikit-learn compatible ML algorithms API (see the sketch below)
 snapml-spark: SparkML compatible GLMs API
• RAPIDS: cuDF, cuML
• DMLC XGBoost
• dask, dask-cudf, dask-xgboost
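A hedged sketch of the pai4sk path named above: the estimator mirrors scikit-learn's LogisticRegression, with Snap ML extensions such as use_gpu. Parameter names follow the Snap ML documentation; exact signatures may differ across WML-CE releases.

```python
# Hedged sketch: Snap ML logistic regression through pai4sk's
# scikit-learn-compatible API (use_gpu is a Snap ML extension; assumption).
from pai4sk import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=100_000, n_features=50, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

clf = LogisticRegression(use_gpu=True)  # GPU-accelerated GLM training
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```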
WML-CE setup
WML-CE: conda distribution

What is Conda (and why should I care?)
It's a packaging format
It has its own package manager
It has its own package list
Integrated dependency solver
Acquires software from a repository by URL (similar to Git)
Importantly for us, it allows you to install multiple versions side by side (in separate environments)
https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/
https://www.ibm.com/support/knowledgecenter/en/SS5SF7_1.7.0/navigation/wmlce_install.html
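For example, the install docs linked above boil down to adding the WML-CE channel and creating an environment. The sketch below shows those commands as comments (the exact channel and package syntax are assumptions that vary by release) plus a small Python check of what landed in the environment.

```python
# Hedged sketch: verify a WML-CE conda environment. Per the install docs
# linked above, the environment is typically created with something like
# (details are assumptions that vary by release):
#   conda config --prepend channels \
#       https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/
#   conda create -n wmlce python=3.7 powerai
#   conda activate wmlce
import importlib

for pkg in ("tensorflow", "torch", "pai4sk"):  # frameworks named in this deck
    try:
        mod = importlib.import_module(pkg)
        print(f"{pkg}: {getattr(mod, '__version__', 'version unknown')}")
    except ImportError:
        print(f"{pkg}: not installed in this environment")
```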
We haven't forgotten Docker!
https://hub.docker.com/r/ibmcom/powerai
NEW: images with individual frameworks
- Base repository image (no frameworks installed)
- TensorFlow-based image (py36, py37)
- PyTorch-based image (py36, py37)
- Caffe-ibm-based image (py36, py37)
- SnapML-based image (py36, py37)
- All frameworks (py36, py37)
We now provide Red Hat Universal Base images too!
https://access.redhat.com/containers/#/product/18c03ee6ba6a3657
More choice, more flexibility, more simplicity
WML-CE Strategy
 Freely available
https://developer.ibm.com/linuxonpower/deep-learning-powerai/releases/
 Provide as Bare Metal and Container
 Support upstream CI environments
 Contribute features and bug fixes
 Release Quarterly
 Engage with Conda Forge
 Sample data
https://github.com/IBM/powerai
SnapML
What data science methods are used at work?
Source: Kaggle Data Science Survey 2019
[Venn diagram: Artificial Intelligence ⊃ Machine Learning ⊃ Deep Learning, with the share of respondents using each method:]

Machine learning: Logistic Regression (80%), Decision Trees / Random Forests (75%), Gradient Boosting Machines (64%), SVMs, Bayesian Techniques (31%), Evolutionary Approaches (7%)

Deep learning: CNNs (43%), DNNs (32%), RNNs (30%), GANs (7%)
scikit-learn is the most widely-used ML framework
Source: Kaggle Data Science Survey 2018

Why?
• Wide variety of ML models.
• Good documentation.
• Standardized API.

Source: Kaggle Data Science Survey 2019
Snap ML: Accelerating Machine Learning

Snap ML is a set of compute libraries that transparently accelerate open source frameworks for training machine learning (ML) models. Its main characteristics: fast, scalable, consumable, interpretable, resource-efficient.

Why fast? Performance matters for:
• online re-training of models
• model selection and hyper-parameter tuning
• fast adaptability to changes

Why large-scale? Large datasets arise in business-critical applications: recommendation, credit fraud, advertising, space exploration, weather, etc.

Why resource-savvy? Higher resource utilization and less idle time; lower usage means savings and a higher profit margin.

Why interpretable? Interpretability is a necessary feature for regulated industries where accountability is critical.

Core publication: https://arxiv.org/abs/1803.06333
Snap ML Features

Snap ML offers today's most popular ML models (source: Kaggle ML & DS Survey, Nov. 2019):
• Linear Regression, Logistic Regression, SVM: WML-CE 1.6.0 (1Q19)
• Decision Trees, Random Forest: WML-CE 1.6.1 (2Q19)
• Boosting Machine: WML-CE 1.6.2 (4Q19)

In February 2020 we released the 7th version of Snap ML with IBM WML-CE 1.7.0.
Snap Machine Learning (ML) Library
Distributed high-performance machine learning library

• Distributed training: multi-CPU & multi-GPU
• GPU acceleration (multi-core, multi-socket & GPU): Logistic Regression, Linear Regression, Support Vector Machines, SnapBoost
• Multi-core, multi-CPU: Decision Trees, Random Forests
• CPU-GPU memory management
• Sparse data optimization
• APIs for popular ML frameworks
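As a sketch of the tree-based side of the library: a random forest trained through pai4sk's scikit-learn-compatible API. The class export and the use_gpu parameter follow the Snap ML documentation as best recalled; treat both as assumptions that may vary by WML-CE release.

```python
# Hedged sketch: Snap ML random forest via pai4sk (sklearn-compatible).
from pai4sk import RandomForestClassifier  # assumption: exported by pai4sk
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=50_000, n_features=40, random_state=0)

# n_estimators/max_depth mirror scikit-learn; use_gpu is a Snap ML extension.
forest = RandomForestClassifier(n_estimators=100, max_depth=8, use_gpu=True)
forest.fit(X, y)
print(forest.predict(X[:5]))
```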
Accelerated and Distributed ML in WML CE

[Diagram: the Watson Machine Learning CE stack.]
• ML solutions: Snap ML through a scikit-learn fork (pai4sk): Logistic Regression, Ridge/Lasso Regression, Decision Tree, Random Forest, SVM, SnapBoost; RAPIDS (cuDF, cuML); DMLC XGBoost; Dask
• DL solutions: TensorFlow, PyTorch, Caffe, Keras, with DDL and LMS
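As a sketch of the DMLC XGBoost component in the stack above: the gpu_hist tree method offloads histogram construction to the GPU. This uses the standard xgboost API; the parameter choices shown are illustrative.

```python
# Sketch: GPU-accelerated gradient boosting with DMLC XGBoost as shipped
# in WML-CE (standard xgboost API; parameter choices are illustrative).
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.random((10_000, 20), dtype=np.float32)
y = (X[:, 0] + X[:, 1] > 1.0).astype(np.int32)

dtrain = xgb.DMatrix(X, label=y)
params = {
    "objective": "binary:logistic",
    "tree_method": "gpu_hist",  # GPU histogram algorithm
    "max_depth": 6,
    "eta": 0.1,
}
booster = xgb.train(params, dtrain, num_boost_round=100)
print(booster.predict(dtrain)[:5])  # predicted probabilities
```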
Snap ML Value Proposition

Leadership in AI for business:
• Leading performance and response time
• Scalability to multi-TB datasets
• Higher efficiency, translating to lower cost
• Higher accuracy, translating to higher profits
• Explainability for regulated industries

A leading ML framework with strong differentiation in performance, scalability, and accuracy. Applied in use cases across FSS, retail, and advertising: fraud detection, credit default prediction, stock prediction, pricing, sales forecasting, CTR prediction.
>10x faster than scikit-learn
[Chart: training time on the price-prediction dataset. sklearn: 398.9 s; snap-CPU: 17.3 s (23x speedup); snap-1GPU: 12.7 s (31x speedup).]

Handling TB-scale datasets
[Chart: Criteo TB dataset (4.2B rows). TensorFlow on 90 x86 servers (CPU-only): 1.1 hours; Snap ML on 4 Power9 servers with GPUs: 1.53 minutes; 46x faster.]

More accurate than XGBoost
[Chart: head-to-head accuracy of XGBoost vs. SnapBoost on 48 binary classification datasets from OpenML.org, with regions marked "XGBoost better" and "SnapBoost better".]
References
• WML-CE Knowledge Center
https://www.ibm.com/support/knowledgecenter/en/SS5SF7
• WML-CE conda packages
https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/
• IBM Cognitive Systems developer portal
https://developer.ibm.com/linuxonpower/deep-learning-powerai/
• WML-CE FAQ
https://developer.ibm.com/linuxonpower/deep-learning-powerai/faq/
• WML-CE ancillary and supplemental info
https://github.com/IBM/powerai
• Large Model Support
https://developer.ibm.com/linuxonpower/2019/06/11/tensorflow-large-model-support-resources/
• SnapML Research Page
https://www.zurich.ibm.com/snapml/
• SnapML API documentation
https://ibmsoe.github.io/snap-ml-doc/v1.6.0/
Thank You
