This presentation gave a deep dive into various machine learning and deep learning algorithms, followed by an overview of the hardware and software technologies for the democratization of AI, including OpenPOWER/POWER9 solutions.
2. Objectives
• Introduce the foundations of data science and artificial intelligence
• Give an overview of state-of-the-art machine learning technologies
• Demonstrate the benefits of leveraging H2O Driverless AI and IBM hardware (Power processors coupled with NVLink) for AI projects in enterprises
3. Agenda
• AI vs machine learning vs deep learning
• Supervised learning vs unsupervised learning
• Training vs inferencing
• Use cases
• Importance of data and data types (structured data, unstructured data)
• Data analysis
• Feature engineering
• Types of machine learning problems (regression, classification etc.)
• Machine learning algorithms
• AI technology landscape
• Role of GPUs and Power systems in AI development
• Best practices in AI development
• Automatic machine learning (AutoML)
5. Machine learning is good for…
1. Complex sets of rules that are impossible to code by hand
2. Long lists of rules
3. Adapting to new data
4. Gaining insights from large amounts of data
6. Training vs. Inference
Training
• Data intensive: historical data sets
• Compute intensive: 100% accelerated
• Develops a model for use at the edge as inference
Inference
• Enables the computer to act in real time
• Low power
• Out at the edge
7. AI Enterprise Use Cases
• Automotive: auto sensors reporting location, problems
• Communications: location-based advertising
• Consumer packaged goods: sentiment analysis of what’s hot, problems
• Financial services: risk & portfolio analysis, new products
• Education & research: experiment sensor analysis
• High technology / industrial mfg.: manufacturing quality, warranty analysis
• Life sciences: clinical trials
• Media/entertainment: viewers / advertising effectiveness
• On-line services / social media: people & career matching
• Health care: patient sensors, monitoring, EHRs
• Oil & gas: drilling exploration sensor analysis
• Retail: consumer sentiment
• Travel & transportation: sensor analysis for optimal traffic flows
• Utilities: smart meter analysis for network capacity
• Law enforcement & defense: threat analysis (social media monitoring, photo analysis)
11. Exploratory Data Analysis
• Seemingly trivial, but critical to the data science process
• Plots
• Time-series data
• Histograms
• Pair-wise scatterplots
• Summary statistics
• Mean, Median, Mode, Maximum, Minimum, Upper and lower quartiles
• Outlier analysis
• Fill missing data
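The EDA steps above can be sketched with pandas. This is an illustrative example on a tiny synthetic dataset (the column names and values are invented for the example), covering summary statistics, simple IQR-based outlier analysis, and filling missing data:

```python
import numpy as np
import pandas as pd

# Small synthetic dataset with one missing value and one likely outlier.
df = pd.DataFrame({
    "age": [23, 31, 27, np.nan, 45, 120],   # NaN is missing, 120 is suspicious
    "income": [40e3, 52e3, 48e3, 61e3, 75e3, 58e3],
})

# Summary statistics: mean, min/max, and quartiles come from describe();
# median and mode are available separately.
print(df["age"].describe())
print("median:", df["age"].median(), "mode:", df["age"].mode().iloc[0])

# Simple outlier analysis using the 1.5 * IQR rule on 'age'.
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["age"] < q1 - 1.5 * iqr) | (df["age"] > q3 + 1.5 * iqr)]
print("outliers:\n", outliers)

# Fill missing values with the median, which is robust to the outlier.
df["age"] = df["age"].fillna(df["age"].median())
```

Histograms and pair-wise scatterplots follow the same pattern via `df.hist()` and `pd.plotting.scatter_matrix(df)`.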
13. (Goodfellow 2016) Learning Multiple Components
Flowchart comparing four approaches (Goodfellow et al., Chapter 1, Introduction):
• Rule-based systems: input → hand-designed program → output
• Classic machine learning: input → hand-designed features → mapping from features → output
• Representation learning: input → features (learned) → mapping from features → output
• Deep learning: input → simple features → additional layers of more abstract features → mapping from features → output
14. (Goodfellow 2016) Depth: Repeated Composition
Figure 1.2: Illustration of a deep learning model (Goodfellow et al., Chapter 1, Introduction). A visible layer (input pixels) feeds a 1st hidden layer (edges), a 2nd hidden layer (corners and contours), and a 3rd hidden layer (object parts), leading to the output (object identity: car, person, animal). It is difficult for a computer to understand the meaning of raw sensory input directly; a deep model breaks the mapping into a series of nested simple mappings, each described by a layer.
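The "edges in the first hidden layer" idea can be made concrete with a single hand-written convolution filter. This NumPy sketch is illustrative only (the image and kernel are invented for the example); it shows why a small local filter responds strongly exactly at an edge:

```python
import numpy as np

# A 6x6 "image": dark left half (0), bright right half (1).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# A hand-written vertical-edge filter (Sobel-like, 3x3).
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

def conv2d_valid(img, k):
    """Naive 2D 'valid' convolution (really cross-correlation, as in CNNs)."""
    kh, kw = k.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

response = conv2d_valid(image, kernel)
# Strong responses appear only where the window straddles the
# dark-to-bright boundary; flat regions produce zero.
print(response)
```

In a real CNN the filter weights are learned rather than hand-written, and many such filters are stacked into the layers shown in the figure.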
16. Machine Learning Algorithms:
• Tree-Based Methods (Decision Trees, Random Forests, and Gradient Boosting)
• Generalized Linear Models
• Linear Regression
• Logistic Regression
• Support Vector Machines
• Unsupervised Learning Techniques (Clustering, Principal Component Analysis)
• Neural Networks
• Neural Network Topologies
• Convolutional Neural Networks (R-CNN, F-CNN, U-Net for Medical Imaging)
• Sequence Models (RNN, LSTM)
• Autoencoders
• Generative Adversarial Networks
Learning Paradigms:
• Transfer Learning
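As a minimal illustration of two of the algorithm families above (a tree-based ensemble and a generalized linear model), here is a sketch assuming scikit-learn is available; the dataset is synthetic:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Toy binary classification problem.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, model.predict(X_te)))
```

The same `fit`/`predict` interface covers most of the listed methods (SVMs, gradient boosting, clustering, PCA), which is part of why scikit-learn appears later in the technology landscape.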
17. AI Solutions Engineering
• Exploratory Data Analysis, Data Visualization
• Data Engineering
• Data Augmentation
• ETL
• Hyperparameter tuning and search algorithms (Random, Bayesian, and TPE Search)
• Data Leakage
• Bias Detection and Mitigation
• Interpretability (Explainability, Fairness, Accountability, Transparency, Ethics)
• LIME, Anchors, TreeInterpreter, Partial Dependency Plots, Deconvolution etc.
• Inference Optimization (NVIDIA TensorRT)
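Of the search algorithms listed above, random search is the simplest to sketch, here with scikit-learn's RandomizedSearchCV on a synthetic dataset (Bayesian and TPE search would typically use other libraries, such as Optuna or hyperopt):

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Random search samples hyperparameter combinations from distributions
# instead of exhaustively enumerating a grid.
param_distributions = {
    "n_estimators": randint(10, 100),
    "max_depth": randint(2, 10),
}
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions,
    n_iter=10,          # number of sampled configurations
    cv=3,               # 3-fold cross-validation per configuration
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Random search often matches grid search at a fraction of the cost because it covers each individual hyperparameter's range more densely.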
18. Elements of Enterprise AI
• Data Security
• Model Deployment and Operationalization
• Inferencing Scenarios (In database, cloud etc.)
• Interoperability issues
• Model Retraining
• Model Maintenance (Versioning, Documentation etc.)
• Regulatory Compliance (HIPAA, SEC, GDPR etc.)
• Model Security (Adversarial Attacks on AI Models)
• Resource Management (Scheduling, Multitenancy etc.)
• Collaboration Tools
19. Why now?
• Data explosion
• GPUs
• Advances in machine learning algorithms
20. Technologies for Democratization of Deep Learning
• HPC, Distributed Computing Clusters, Public and Private Clouds
• Multicore processors (POWER9)
• GPUs
• Storage Technologies
• Open Source Deep Learning Frameworks
• TensorFlow, Keras, PyTorch, FastAI, MXNet
• Traditional Machine Learning Frameworks
• Scikit-Learn, H2O, XGBoost, IBM SnapML
• Automatic Machine Learning Frameworks
• TPOT, auto-sklearn, Auto-PyTorch, AutoKeras
• H2O Driverless AI, DataRobot, IBM AutoAI
• Data Processing Libraries
• pandas, cuDF, Dask, Dask-cuDF
• Open Source Databases
• MongoDB, Cassandra, EnterpriseDB, MariaDB, Redis, Neo4J
21. Cloud Native AI
Pipeline stages: data analysis/engineering/warehousing/mining → feature engineering → model development, testing & validation → deployment & inferencing → retraining, online training & model versioning.
Infrastructure: HPC cluster / public cloud / private cloud / hybrid cloud; distributed storage (storage for AI); IoT data sources.
Platform layers: HPC schedulers, cloud middleware, Kubernetes, Helm, containers, virtualization; databases, big data tools, Pythonic frameworks, HPC libraries, microservices.
22. Applications of Deep Learning
• Computer Vision
• Natural Language Processing
• Speech recognition
• Bioinformatics and Chemistry
• Quantitative Finance
23. Computer Vision Applications of Deep Learning
• Object Detection
• Face Recognition
• Event Recognition
• Human Pose Estimation
• Motion Tracking
24. (Goodfellow 2016) Solving Object Recognition
Figure 1.12: ILSVRC classification error rate by year, 2010–2015 (y-axis 0.00–0.30). Since deep networks reached the scale necessary to compete in the ImageNet Large Scale Visual Recognition Challenge, they have consistently won the competition, and the error rate has fallen steadily.
25. Some Popular CNNs
• LeNet (1990)
• AlexNet (2012)
• ZFNet (2013)
• GoogLeNet (2014)
• VGGNet (2014)
• ResNet (2015)
26. Deep Learning Requires Data
Tasks arranged on a spectrum from little data (more hand engineering) to lots of data (less hand engineering and simpler algorithms): speech recognition, object recognition, object detection. (Source: Andrew Ng)
29. Multi-level Parallelism
• Level 1: parallelism across nodes connected via a network interface.
• Level 2: parallelism across GPUs within the same node, connected via an interconnect (e.g. NVLink).
• Level 3: parallelism across the streaming multiprocessors of the GPU hardware.
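The data-parallel pattern underlying Levels 1 and 2 (shard the batch, compute a partial gradient per worker, then combine) can be sketched in pure Python/NumPy. This is illustrative only: threads stand in for nodes or GPUs, and the model is a plain linear least-squares fit invented for the example:

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def partial_gradient(X, y, w):
    """Mean-squared-error gradient for a linear model on one data shard."""
    return 2 * X.T @ (X @ w - y) / len(y)

def data_parallel_gradient(X, y, w, n_workers=4):
    """Split the batch into shards, compute shard gradients on workers,
    then combine -- the core pattern behind data-parallel training."""
    X_shards = np.array_split(X, n_workers)
    y_shards = np.array_split(y, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as ex:
        grads = list(ex.map(partial_gradient, X_shards, y_shards,
                            [w] * n_workers))
    # Size-weighted averaging reproduces the full-batch gradient exactly,
    # which is what frameworks do in an all-reduce step.
    sizes = [len(s) for s in y_shards]
    return np.average(grads, axis=0, weights=sizes)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.normal(size=100)
w = np.zeros(3)
g = data_parallel_gradient(X, y, w)
```

In real multi-node or multi-GPU training the shards live on different devices and the averaging step is a network/NVLink all-reduce, but the arithmetic is the same.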
30. POWER, NVLink and V100 Advantage
• Shorter training times
• Facilitates distributed machine learning
Diagram: a POWER9 AC922 server with two POWER9 processors and four NVIDIA V100 GPUs; NVLink provides 150 GB/s links between CPU memory and GPU memory, and between GPUs.