3. Watson Machine Learning Community Edition
Open Source – Enhanced and delivered as Conda Packages
Curated, tested, and pre-compiled binary software distribution that enables enterprises to quickly and easily deploy deep learning for their data science and analytics development.
4. WML CE (PowerAI) stack
- Deep Learning Impact (DLI) module: data & model management, ETL, visualization, and advice
- IBM Spectrum Conductor with Spark: cluster virtualization, dynamic resource orchestration, multiple frameworks, distributed execution engine
- PowerAI: open source ML frameworks, plus Large Model Support (LMS), Distributed Deep Learning (DDL, to 1000s of nodes), automatic hyper-parameter tuning, and SnapML
- WML Accelerator (PowerAI Enterprise): AI for data scientists and non-data scientists; Distributed Deep Learning (up to 4 nodes); SnapML
- Accelerated infrastructure: accelerated servers and storage
- PowerAI Vision: Auto-DL for images & video (label, train, deploy)
- H2O Driverless AI: Auto-ML for text & numeric data, NLP (import, experiment, deploy)
5. Train larger, more complex models
Large Model Support vs. traditional model support:
- Traditional: limited GPU memory forces a tradeoff between model size and data resolution; data moves from CPU (DDR4) to GPU graphics memory over PCIe, and that PCIe link is the system bottleneck.
- Large Model Support: use system memory alongside GPU memory to support more complex models and higher-resolution data; the POWER CPU (DDR4) connects to the GPU over NVLink, giving a fast data pipe.
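A hedged sketch of the idea behind Large Model Support, using NumPy arrays to stand in for host (DDR4) and device (graphics) memory: the full set of layer weights lives in large "system memory", and only the layer currently being computed is staged into a small "GPU" buffer. The names, sizes, and staging scheme are illustrative only, not WML CE APIs.

```python
import numpy as np

# "System memory": all layer weights live here (large capacity).
rng = np.random.default_rng(0)
layers_host = [rng.standard_normal((8, 8)) for _ in range(6)]

def forward_one_layer_at_a_time(x, layers):
    """Run a forward pass while holding only one layer on the 'device'."""
    for host_weights in layers:
        gpu_weights = host_weights.copy()   # stage this layer over the fast link
        x = np.tanh(x @ gpu_weights)        # compute on the "device"
        del gpu_weights                     # free device memory for the next layer
    return x

x = rng.standard_normal((4, 8))
y = forward_one_layer_at_a_time(x, layers_host)
print(y.shape)  # (4, 8)
```

The tradeoff LMS makes is exactly this one: extra transfers per layer in exchange for models far larger than graphics memory, which is why the speed of the CPU-GPU link matters so much.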
6. IBM AC922 Power System: deep learning server (4-GPU configuration)
- Two POWER9 CPUs, each with 1 TB of system memory at 170 GB/s and a pair of NVIDIA V100 GPUs attached over NVLink at 150 GB/s
- Store large models in system memory; operate on one layer at a time; fast transfer via NVLink
- 5x faster data communication with the unique CPU-GPU NVLink high-speed connection
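The "5x" figure can be sanity-checked with quick arithmetic against the slide's 150 GB/s NVLink number. The PCIe comparison point is our assumption (a PCIe Gen3 x16 link peaks around 32 GB/s); the slide itself gives only the NVLink bandwidth.

```python
# Approximate peak bandwidths, GB/s
pcie_gen3_x16 = 32.0    # assumption: typical PCIe Gen3 x16 peak
nvlink_cpu_gpu = 150.0  # from the slide (AC922 CPU-GPU NVLink)

speedup = nvlink_cpu_gpu / pcie_gen3_x16
print(f"{speedup:.1f}x")  # ~4.7x, which rounds to the slide's "5x"
```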
7. Distributed Deep Learning (DDL)
- Deep learning training takes days to weeks, and scaling across multiple x86 servers is limited.
- PowerAI with DDL enables scaling to 100s of GPUs: going from 1 system to 64 systems took one ResNet-101 / ImageNet-22K training run from 16 days down to 7 hours (58x faster).
- Near-ideal scaling to 256 GPUs: 95% scaling efficiency with 256 GPUs (ResNet-50, ImageNet-1K; Caffe with PowerAI DDL, running on Minsky S822LC Power Systems).
[Chart: speedup vs. number of GPUs, 4 to 256; DDL actual scaling closely tracks ideal scaling.]
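DDL is a data-parallel scheme: each GPU computes gradients on its own shard of the batch, then an allreduce averages them so every replica applies the same update. A minimal NumPy sketch of that averaging step, simulating the workers in one process (our own illustration, not the DDL API):

```python
import numpy as np

rng = np.random.default_rng(42)
n_workers = 4

# Each simulated worker computes a gradient on its own data shard.
local_grads = [rng.standard_normal(8) for _ in range(n_workers)]

# Allreduce (average): afterwards, every worker holds the same gradient.
avg_grad = np.mean(local_grads, axis=0)
synced = [avg_grad.copy() for _ in range(n_workers)]

# All replicas now apply identical updates, keeping their weights in sync.
assert all(np.array_equal(g, synced[0]) for g in synced)
```

The cost of this communication step relative to compute is what limits scaling efficiency, which is why the 95% figure at 256 GPUs is the headline result.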
8. WML CE: conda distribution
What is Conda (and why should I care?)
- It's a packaging format
- It has its own package manager
- It has its own package list
- Integrated dependency solver
- Acquires software from a repository by URL (similar to Git)
- Importantly for us, it allows you to install multiple instances side by side
https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/
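Using the repository URL above, installation follows the usual conda pattern. A hedged sketch only: the `powerai` meta-package name and Python version are assumptions, so check the channel's package list before relying on them.

```shell
# Register the IBM WML CE conda channel (URL from this slide)
conda config --prepend channels \
    https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/

# Separate environments are what let you keep multiple instances side by side
conda create -n wmlce python=3.7 powerai   # 'powerai' meta-package name: assumption
conda activate wmlce
```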
9. We haven't forgotten Docker!
https://hub.docker.com/r/ibmcom/powerai
NEW: images with individual frameworks
- Base repository image (no frameworks installed)
- TensorFlow-based image (py36, py37)
- PyTorch-based image (py36, py37)
- Caffe-ibm-based image (py36, py37)
- SnapML-based image (py36, py37)
- All frameworks (py36, py37)
We now provide Red Hat Universal Base images too!
https://access.redhat.com/containers/#/product/18c03ee6ba6a3657
More choice, more flexibility, more simplicity.
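Pulling one of these images follows the standard Docker Hub pattern. The tag below is a hypothetical example of the framework-specific naming; check the repository's tag list for the real tags.

```shell
# Pull a framework-specific image (tag is an illustrative assumption)
docker pull ibmcom/powerai:1.7.0-tensorflow-ubuntu18.04-py37

# Run it interactively with GPU access (requires the NVIDIA container runtime)
docker run --rm -it --gpus all \
    ibmcom/powerai:1.7.0-tensorflow-ubuntu18.04-py37 bash
```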
10. CE Strategy
- Freely available: https://developer.ibm.com/linuxonpower/deep-learning-powerai/releases/
- Provided as bare metal and containers
- Support upstream CI environments
- Contribute features and bug fixes
- Quarterly releases
- Engage with Conda Forge
- Sample data: https://github.com/IBM/powerai
15. Snap ML: Accelerating Machine Learning
Why Fast? Performance matters for online re-training of models, model selection and hyper-parameter tuning, and fast adaptability to changes.
Why Large-Scale? Large datasets arise in business-critical applications: recommendation, credit fraud, advertising, space exploration, weather, etc.
Why Resource-Savvy? Increased resource utilization means less idle time; lower usage means savings and a higher profit margin.
Why Interpretable? A necessary feature for regulated industries where accountability is critical.
Snap ML is a set of compute libraries that transparently accelerate open source frameworks for training machine learning (ML) models. Its main characteristics are: fast, scalable, consumable, interpretable, and resource-efficient.
Core publication: https://arxiv.org/abs/1803.06333
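Snap ML provides GLM trainers (logistic regression among them) behind a scikit-learn-style fit/predict workflow. To make concrete what such a trainer computes, here is a tiny from-scratch logistic-regression sketch in NumPy; this is our own illustration of the underlying model, not Snap ML code.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linearly separable data: label is 1 when x0 + x1 > 0
X = rng.standard_normal((200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w = np.zeros(2)
b = 0.0
lr = 0.5
for _ in range(300):                          # full-batch gradient descent
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))    # sigmoid probabilities
    grad_w = X.T @ (p - y) / len(y)           # logistic-loss gradients
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

pred = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(float)
accuracy = np.mean(pred == y)
print(f"train accuracy: {accuracy:.2f}")
```

Snap ML's contribution is not the model itself but making this kind of training fast and scalable on GPUs and across nodes, which is why it can accelerate existing frameworks transparently.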
16. Snap ML Features
Snap ML offers today's most popular ML models (source: Kaggle ML & DS Survey, Nov. 2019):
- WML CE 1.6.0 (1Q19): Linear Regression, Logistic Regression, SVM
- WML CE 1.6.1 (2Q19): Decision Trees, Random Forest
- WML CE 1.6.2 (4Q19): Boosting Machine
In February 2020 we released the 7th version of Snap ML with IBM WML CE 1.7.0.