This document provides a summary of a presentation on innovating with AI at scale. The presentation discusses:
1. Implementing AI use cases at scale across industries like retail, life sciences, and transportation.
2. Deploying AI models to the edge using tools like TensorFlow and TensorRT for high-performance inference on devices.
3. Best practices and frameworks for distributed deep learning training on large clusters to train models faster.
Build FAST Deep Learning Apps with Docker on OpenPOWER and GPUs - Indrajit Poddar
GPU- and NVLink-accelerated training and inference with TensorFlow and Caffe on OpenPOWER systems. Presented at a meetup prior to DataWorks Summit Munich 2017.
Fast Scalable Easy Machine Learning with OpenPOWER, GPUs and Docker - Indrajit Poddar
Transparently accelerated deep learning workloads on OpenPOWER systems and GPUs, using easy-to-use open source frameworks such as Caffe, Torch, TensorFlow, and Theano.
NVIDIA Deep Learning Inference Platform Performance Study | Technical Overview
Introduction
Artificial intelligence (AI), the dream of computer scientists for over half a century, is no longer science fiction: it is already transforming every industry. AI is the use of computers to simulate human intelligence. AI amplifies our cognitive abilities, letting us solve problems where the complexity is too great, the information is incomplete, or the details are too subtle and require expert training.
While the machine learning field has been active for decades, deep learning (DL) has boomed over the last five years. In 2012, Alex Krizhevsky of the University of Toronto won the ImageNet image recognition competition using a deep neural network trained on NVIDIA GPUs, beating all the human expert algorithms that had been honed for decades. That same year, recognizing that larger networks can learn more, Stanford's Andrew Ng and NVIDIA Research teamed up to develop a method for training networks using large-scale GPU computing systems. These seminal papers sparked the "big bang" of modern AI, setting off a string of "superhuman" achievements. In 2015, Google and Microsoft both beat the best human score in the ImageNet challenge. In 2016, DeepMind's AlphaGo recorded its historic win over Go champion Lee Sedol, and Microsoft achieved human parity in speech recognition.
GPUs have proven to be incredibly effective at solving some of the most complex problems in deep learning, and while the NVIDIA deep learning platform is the standard industry solution for training, its inferencing capability is not as widely understood. Some of the world's leading enterprises, from the data center to the edge, have built their inferencing solutions on NVIDIA GPUs. Some examples include:
In this deck from the 2016 HPC Advisory Council Switzerland Conference, DK Panda from Ohio State University presents: High-Performance and Scalable Designs of Programming Models for Exascale Systems.
"This talk will focus on challenges in designing runtime environments for Exascale systems with millions of processors and accelerators to support various programming models. We will focus on MPI, PGAS (OpenSHMEM, CAF, UPC and UPC++) and Hybrid MPI+PGAS programming models by taking into account support for multi-core, high-performance networks, accelerators (GPUs and Intel MIC) and energy-awareness. Features and sample performance numbers from the MVAPICH2 libraries will be presented."
Watch the video presentation: http://wp.me/p3RLHQ-f7c
See more talks in the Swiss Conference Video Gallery: http://insidehpc.com/2016-swiss-hpc-conference/
Healthcare has become one of the most important aspects of everyone's life. Its importance has surged due to the latest outbreaks, and with this latest pandemic it has become essential to collaborate on improving everyone's healthcare as soon as possible.
IBM has reacted quickly, sharing not only its knowledge but also its artificial intelligence supercomputers all around the world.
Those supercomputers are helping to overcome this outbreak, and future ones as well.
They have completely different features compared to the offerings of other players in the supercomputer market.
We will take a quick look at the differences between these AI-focused supercomputers and how they can help in the R&D of healthcare solutions for everyone, from those with access to a big IBM AI supercomputer to those with access to only a single small IBM AI-focused server.
A Primer on FPGAs - Field Programmable Gate Arrays - Taylor Riggan
A focus on the use of FPGAs by cloud service providers, including Microsoft Azure Catapult, Google Tensor Processing Units, and Amazon EC2 F1 instances. Also includes background info on how to get started with FPGAs.
JMI Techtalk: 한재근 - How to use GPU for developing AI - Lablup Inc.
This Techtalk introduces the variety of methods Nvidia provides for improving performance when using GPUs for AI development, along with technical resources. In particular, it covers in detail the process of improving performance by introducing mixed precision on the Volta architecture.
Harnessing the virtual realm for successful real-world artificial intelligence - Alison B. Lowndes
Artificial intelligence is impacting all areas of society, from healthcare and transportation to smart cities and energy. This talk covers how NVIDIA invests both in internal pure research and in accelerated computation to enable its diverse customer base across gaming and extended reality, graphics, AI, robotics, simulation, high-performance scientific computing, healthcare, and more. You will be introduced to the GPU computing platform and shown real-world, successfully deployed applications, as well as a glimpse into the current state of the art across academia, enterprise, and startups.
Backend.AI Technical Introduction (19.09 / 2019 Autumn) - Lablup Inc.
This slide deck introduces the technical specs and details of Backend.AI 19.09:
* On-premise clustering / container orchestration / scaling on cloud
* Container-level fractional GPU technology to use one GPU as many GPUs across many containers at the same time
* NVIDIA GPU Cloud integrations
* Enterprise features
If you're like most of the world, you're in an aggressive race to implement machine learning applications and on a path to deep learning. If you can give better service at a lower cost, you will be one of the winners in 2030. But infrastructure is a key challenge to getting there. What does the technology infrastructure look like over the next decade as you move from petabytes to exabytes? How are you budgeting for colossal data growth over the next decade? How do your data scientists share data today, and will it scale for 5-10 years? Do you have the appropriate security, governance, backup, and archiving processes in place? This session will address these issues and discuss strategies for customers as they ramp up their AI journey with a long-term view.
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manage - James Serra
Discover, manage, deploy, monitor, rinse and repeat. In this session we show how Azure Machine Learning can be used to create the right AI model for your challenge, and then easily customize it using your development tools while relying on Azure ML to optimize it to run in hardware-accelerated environments for the cloud and the edge using FPGAs and neural network accelerators. We then show you how to deploy the model to highly scalable web services and nimble edge applications that Azure can manage and monitor for you. Finally, we illustrate how you can leverage the model telemetry to retrain and improve your content.
Fórum E-Commerce Brasil | NVIDIA technologies applied to e-commerce: far beyond the hardware - E-Commerce Brasil
NVIDIA technologies applied to e-commerce, far beyond the hardware.
Jomar Silva
Developer Relations Manager for Latin America - NVIDIA
https://eventos.ecommercebrasil.com.br/forum/
Semiconductors are the driving force behind the AI evolution and enable its adoption across various application areas ranging from connected and automated driving to smart healthcare and wearables. Given that, electronics research, design and manufacturing communities around the world are increasingly investing in specialized AI chips providing less latency, greater processing power, higher bandwidth and faster performance. AI also attracts new technology players to invest in making their own specialized AI chips, changing the electronics manufacturing landscape and moving the AI technology towards machine learning, deep learning and neural networks.
RAPIDS – Open GPU-accelerated Data Science - Data Works MD
RAPIDS is an initiative driven by NVIDIA to accelerate the complete end-to-end data science ecosystem with GPUs. It consists of several open source projects that expose familiar interfaces, making it easy to accelerate the entire data science pipeline, from ETL and data wrangling to feature engineering, statistical modeling, machine learning, and graph analysis.
Corey J. Nolet
Corey has a passion for understanding the world through the analysis of data. He is a developer on the RAPIDS open source project focused on accelerating machine learning algorithms with GPUs.
Adam Thompson
Adam Thompson is a Senior Solutions Architect at NVIDIA. With a background in signal processing, he has spent his career participating in and leading programs focused on deep learning for RF classification, data compression, high-performance computing, and managing and designing applications targeting large collection frameworks. His research interests include deep learning, high-performance computing, systems engineering, cloud architecture/integration, and statistical signal processing. He holds a Masters degree in Electrical & Computer Engineering from Georgia Tech and a Bachelors from Clemson University.
How to optimize Hortonworks Apache Spark ML workloads on Power - The POWER8/POWER9 architecture is the latest offering from IBM and the OpenPOWER Foundation, and it is the perfect platform for optimizing Hortonworks Spark performance. During this presentation we will walk the audience through the steps required to optimize YARN, HDFS, and Spark on a Power cluster (a configuration sketch follows the list).
Steps required:
1) Classify the workload as CPU-, memory-, IO-, or mixed-intensive
2) Characterize the "out-of-box" Hortonworks Spark workload to understand its CPU, memory, IO, and network performance characteristics
3) Floor-plan cluster resources
4) Tune the "out-of-box" workload to navigate the "roofline" performance space in the dimensions named above
5) If the workload is memory-, IO-, or network-bound, tune Spark to increase operational intensity (operations/byte) as much as possible to make it CPU-bound
6) Divide the search space into regions and perform an exhaustive search
7) Identify performance bottlenecks through resource monitoring, and tune the system, JVM, or application layer by profiling the application and hardware counters if required
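As a concrete illustration of steps 4-7, below is a minimal PySpark configuration sketch; the property values are placeholders showing which knobs are typically turned, not recommendations from the talk or for any particular Power cluster.

    from pyspark.sql import SparkSession

    # Illustrative tuning knobs for steps 4-7; every value is a placeholder,
    # to be set from the workload characterization in step 2.
    spark = (
        SparkSession.builder
        .appName("power-spark-tuning-sketch")
        .config("spark.executor.cores", "4")            # align with cores/SMT threads per socket
        .config("spark.executor.memory", "24g")         # sized from the step-3 resource floor plan
        .config("spark.sql.shuffle.partitions", "192")  # ease shuffle pressure for IO-bound stages
        .config("spark.serializer",
                "org.apache.spark.serializer.KryoSerializer")  # cheaper serialization
        .getOrCreate()
    )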
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS - Databricks
Abstract: We will introduce RAPIDS, a suite of open source libraries for GPU-accelerated data science, and illustrate how it operates seamlessly with MLflow to enable reproducible training, model storage, and deployment. We will walk through a baseline example that incorporates MLflow locally, with a simple SQLite backend, and briefly introduce how the same workflow can be deployed in the context of GPU-enabled Kubernetes clusters.
HPE and NVIDIA are delivering a leading portfolio of optimized AI solutions that transform business and industry, enabling deeper insights and helping solve the world's greatest challenges. Join this session to learn how the NVIDIA V100, the world's most powerful GPU, powers the HPE 6500 systems, the HPE AI systems, to provide new business insights and outcomes.
End to End Machine Learning Open Source Solution Presented in Cisco Developer... - Manish Harsh
The RAPIDS suite of open source software libraries and APIs gives you the ability to execute end-to-end data science and analytics pipelines entirely on GPUs. Licensed under Apache 2.0, RAPIDS is incubated by NVIDIA® based on extensive hardware and data science experience. RAPIDS utilizes NVIDIA CUDA® primitives for low-level compute optimization, and exposes GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.
infoShare AI Roadshow 2018 - Tomasz Kopacz (Microsoft) - what possibilities does... - Infoshare
During this session we will look at how the Microsoft platform can be used to build so-called "intelligent" solutions. The examples include both Cognitive Services and the use of GPUs (more precisely, Batch AI) for training neural networks. We will also tackle complex design issues, so that algorithms extend human capabilities (rather than replace us). The session assumes that attendees know how to program.
A talk on reducing costs & increasing efficiencies by designing, testing & engineering in simulation first, plus examples of robotics & environmental capability.
Similar to Innovation with ai at scale on the edge vt sept 2019 v0
The Libre-SOC Project aims to create an entirely Libre-licensed, transparently developed, fully auditable hybrid 3D CPU-GPU-VPU, using the supercomputer-class OpenPOWER ISA as the foundation.
Our first test ASIC is a 180nm "Fixed-Point" Power ISA v3.0B processor, 5.1mm x 5.9mm, a proof-of-concept for the team, whose primary expertise is in software engineering. Software engineering training brings a radically different approach to hardware development: extensive unit tests, source code revision control, and automated development tools are normal. Libre project management brings even more: bug trackers, mailing lists, auditable IRC logs, and a wiki are standard fare for Libre projects but are simply not industry-standard practice.
This talk therefore goes through the workflow, from the original HDL through to the GDS-II layout, showing how we were able to keep track of the development that led to the IMEC 180nm tape-out in July 2021. In particular, we show how, by following a parallel development process involving "Real" and "Symbolic" cell libraries developed by Chips4Makers, our developers did not need to sign a foundry NDA but were still able to work side-by-side with a university that did. With this parallel development process, the university upheld its NDA obligations, and Libre-SOC was simultaneously able to honour its transparency objectives.
Workload Transformation and Innovations in POWER Architecture - Ganesan Narayanasamy
The IT industry is going through two major transformations. One is the adoption of AI and its tight integration into commercial applications and enterprise workflows. The other is the transformation of software architecture through concepts like microservices and cloud-native architecture. These transformations, alongside the aggressive adoption of IoT, mobile, and 5G in all our day-to-day activities, are making the world operate in a more real-time manner, which opens up a new challenge: improving hardware architecture to adapt to these requirements. Together they push the boundaries of the entire systems stack, making designers rethink hardware. This talk presents a picture of how the industry-leading enterprise POWER architecture is transforming to meet the performance demands of these newer-generation workloads, with a primary focus on on-chip AI acceleration.
Friday, July 16th 2021: our newest workshop with DoMS, IIT Roorkee, "Concept to Solutions using the OpenPOWER Stack". It's time to discover advances in #DeepLearning tools and techniques from the world's leading innovators across industries, research, and public speakers.
Register here:
https://lnkd.in/ggxMq2N
This presentation covers two use cases using OpenPOWER systems:
1. Diabetic retinopathy using AI on the NVIDIA Jetson Nano: the objective is to classify the diabetic retinopathy level from the retina image alone, in a remote area with minimal doctor intervention. The model uses the VGG16 network architecture, is trained from scratch on POWER9, and was deployed on the Jetson Nano board (a training sketch follows after this list).
2. Classifying COVID positivity using lung X-ray images: the idea is to build ML models to detect positive cases from X-ray images. The model was trained on POWER9, and the application was developed using Python.
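For the retinopathy use case, here is a hedged sketch of what "VGG16 trained from scratch" can look like in Keras; the directory name, image size, class count, and hyperparameters are illustrative assumptions, not details taken from the talk.

    import tensorflow as tf

    # VGG16 with randomly initialized weights (i.e., trained from scratch);
    # one output class per assumed diabetic-retinopathy severity level.
    model = tf.keras.applications.VGG16(weights=None, classes=5,
                                        input_shape=(224, 224, 3))
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    # Hypothetical folder of retina images, one subfolder per severity level
    train_gen = tf.keras.preprocessing.image.ImageDataGenerator(
        rescale=1.0 / 255).flow_from_directory(
        "retina_images/", target_size=(224, 224),
        batch_size=32, class_mode="sparse")

    model.fit(train_gen, epochs=10)
    model.save("retinopathy_vgg16")  # export for deployment on the Jetson Nano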
IBM Bayesian Optimization Accelerator (BOA) is a do-it-yourself toolkit to apply state-of-the-art Bayesian inferencing techniques and obtain optimal solutions for complex, real-world design simulations without requiring deep machine learning skills. This talk will describe IBM BOA, its differentiation and ease of use, and how researchers can take advantage of it for optimizing any arbitrary HPC simulation.
This presentation covers the various partners and collaborators currently working with the OpenPOWER Foundation, use cases of OpenPOWER systems in multiple industries, OpenPOWER workgroups, and OpenCAPI features.
The IBM POWER10 processor represents the 10th generation of the POWER family of enterprise computing engines. Its performance is a result of both powerful processing cores and high-bandwidth intra- and inter-chip interconnect. POWER10 systems can be configured with up to 16 processor chips and 1920 simultaneous threads of execution. Cross-system memory sharing, through the new Memory Inception technology, and 2 Petabytes of addressing space support an expansive memory system. The POWER10 processing core has been significantly enhanced over its POWER9 predecessor, including a doubling of vector units and the addition of an all-new matrix math engine. Throughput gains from POWER9 to POWER10 average 30% at the core level and three-fold at the socket level. Those gains can reach ten- or twenty-fold at the socket level for matrix-intensive computations.
Everything is changing, from healthcare to the automotive markets, without forgetting the financial markets or any type of engineering. Everything has stopped being created by an individual or, at best, a team, and is now developed and perfected using AI and hundreds of computers. And even AI is something we can no longer run on a single computer, no matter how powerful it is. What drives everything today is HPC, or high-performance computing, heavily linked to AI. In this session we will discuss AI, HPC computing, the IBM Power architecture, and how it can help develop better healthcare, better automobiles, better financials, and better everything that we run on them.
Macromolecular crystallography is an experimental technique for exploring the 3D atomic structure of proteins, used by academics for research in biology and by pharmaceutical companies in rational drug design. While development of the technique has so far been limited by the performance of scientific instruments, computing performance has recently become a key limitation. In my presentation I will present the computing challenge of handling an 18 GB/s data stream coming from the new X-ray detector. I will show PSI's experience applying conventional hardware to the task and why this attempt failed. I will then present how the IC 922 server with OpenCAPI-enabled FPGA boards allowed us to build a sustainable and scalable solution for high-speed data acquisition. Finally, I will give a perspective on how advances in hardware development will enable better science for users of the Swiss Light Source.
AI in the healthcare and automobile industries using OpenPOWER/IBM POWER9 systems - Ganesan Narayanasamy
As the adoption of AI technologies increases and matures, the focus will shift from exploration to time to market, productivity, and integration with existing workflows. Governing enterprise data, scaling AI model development, and selecting a complete, collaborative hybrid platform and tools for rapid solution deployment are key focus areas for growing data scientist teams tasked to respond to business challenges. This talk will cover the challenges and innovations for AI at scale in industries such as healthcare and automotive, the AI ladder and AI life cycle, and infrastructure architecture considerations.
This talk gives an introduction to healthcare use cases, the AI ladder, and AI-at-scale life-cycle themes. It discusses the iterative nature of the workflow and some of the important components to be aware of when developing AI healthcare solutions, as well as the different types of algorithms and when machine learning might be more appropriate than deep learning, or the other way around. Example use cases are also shared as part of this presentation.
Moving object recognition (MOR) corresponds to the localization and classification of moving objects in videos. Discriminating moving objects from static objects and background in videos is an essential task for many computer vision applications. MOR has widespread applications in intelligent visual surveillance, intrusion detection, anomaly detection and monitoring, industrial site monitoring, detection-based tracking, autonomous vehicles, etc. In this session, Murari presented a poster on deep learning algorithms that identify both the locations and the corresponding categories of moving objects with a convolutional network, and discussed the challenges in developing such algorithms.
Clarisse Hedglin from IBM presented this as part of a 3-day international summit. She shared the scenarios AI can solve for today using IBM AI infrastructure.
Dr Murari Mandal from NUS presented, as part of the 3-day OpenPOWER Industry Summit, on robustness in deep learning, covering AI breakthroughs, performance improvements in AI models, adversarial attacks, attacks on semantic segmentation, attacks on object detectors, defending against adversarial attacks, and many other areas.
Epistemic Interaction - tuning interfaces to provide information for AI support - Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Essentials of Automations: Optimizing FME Workflows with Parameters - Safe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Smart TV Buyer Insights Survey 2024 - 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Transcript: Selling digital books in 2024: Insights from industry leaders - T... - BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Accelerate your Kubernetes clusters with Varnish Caching - Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... - UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... - James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
PHP Frameworks: I want to break free (IPC Berlin 2024) - Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Innovation with ai at scale on the edge vt sept 2019 v0
1. Innovating with
AI at Scale:
Tools and Tips for
Training and Inference
Presenter: Clarisse Taaffe-Hedglin
clarisse@us.ibm.com
Executive AI Architect
IBM Systems
2. 1. Drivers of the AI explosion
2. Implementing use cases at scale
3. Deploying models to the edge
5. Artificial Intelligence brings
new Cognitive Capabilities
• Computers can be trained to “See”
Example: Airport security inspecting luggage
• Computers can be trained to “Hear”
Example: Maintenance crew listening to railcars
• Computers can be trained to “do”: mimic an expert
Example: Mobile phone provider predicting customer churn
6. Data + Algorithms + Compute
The key triggers rapidly advancing AI. (Slide graphic: open source software running on CPU, GPU, and FPGA.)
9. ML Framework Landscape
Which ML frameworks have you used the most over the last 5 years? (Source: Kaggle Data Science Survey 2018)
scikit-learn is, by far, the most widely-used ML framework. Why?
• Wide variety of ML models
• Good documentation
• Standardized API (illustrated below)
Some downsides of scikit-learn are:
1. Lack of support for deep learning (DL)
2. Slow performance on large datasets
Problem (1) is addressed by the DL frameworks in PowerAI (TensorFlow, PyTorch), recently rebranded as Watson Machine Learning Accelerator. Problem (2) is addressed by Snap ML.
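To make the "standardized API" point concrete, here is a minimal scikit-learn example (the estimator and dataset are illustrative); nearly every estimator in the library follows this same fit/predict pattern, which Snap ML mirrors:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)

    # The same two calls work across scikit-learn estimators
    model = RandomForestClassifier(n_estimators=100).fit(X, y)
    print(model.predict(X[:5]))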
10. Watson Machine Learning Community Edition
A curated, tested, and pre-compiled binary software distribution that enables enterprises to quickly and easily deploy deep learning for their data science and analytics development, including all of the following frameworks:
TensorFlow, TensorFlow Probability, TensorBoard, TensorFlow-Keras, BVLC Caffe, IBM Enhanced Caffe, Caffe2, OpenBLAS, HDF5, and Nvidia RAPIDS.
11. Distributed Deep Learning
IBM adds value to curated, tested, and pre-compiled frameworks with Watson Machine Learning Community Edition:
Distributed Deep Learning: simplifies the process of training deep learning models across a cluster for faster time to results.
Software Libraries: WML CE software and the accelerated Power servers support a host of accelerator libraries like Snap ML and Nvidia RAPIDS.
Large Model Support: use system memory with GPUs to support more complex models and higher-resolution data.
12. Evolving from compute systems to Cognitive Systems
(Slide graphic: P8, P9, P10 processor roadmap; open frameworks; partnerships; industry alignment; dev ecosystem; accelerator roadmaps; open accelerator interfaces.)
It's not just about hardware design: it's about hardware + software, co-optimization, and open innovation, which just work for ML, DL, and AI.
18. Train larger, more complex models
Traditional Model Support: limited memory on the GPU forces tradeoffs in model size / data resolution. (Diagram: CPU and DDR4 system memory feed GPU graphics memory over PCIe; the system bottleneck is here.)
Large Model Support: use system memory and the GPU to support more complex models and higher-resolution data. (Diagram: POWER CPU and DDR4 system memory feed GPU graphics memory over the POWER NVLink data pipe.)
19. Large AI Models Train ~4 Times Faster
POWER9 servers with NVLink to GPUs vs x86 servers with PCIe to GPUs.
(Chart: Caffe with LMS (Large Model Support), runtime of 1000 iterations, GoogleNet model on an enlarged ImageNet dataset (2240x2240). Xeon x86 2640v4 with 4x V100 GPUs: 3.1 hours; Power AC922 with 4x V100 GPUs: 49 minutes. 3.8x faster.)
20. TensorFlow Large Model Support: NVLink 2.0 advantage
3D U-Net segmentation models with higher-resolution images allow for learning and labeling finer details and structures of brain tumors.
https://developer.ibm.com/linuxonpower/2018/07/27/tensorflow-large-model-support-case-study-3d-image-segmentation/
21. Accelerating Machine Learning
Snap ML is a framework for training machine learning (ML) models. It is characterized by high performance, scalability to very large datasets, and high resource efficiency.
Why fast? Speed is crucial in many cases: online re-training of models, model selection and hyper-parameter tuning, and fast adaptability to changes.
Why large-scale? Large datasets arise in numerous business-critical applications: recommendation, credit fraud, advertising, space exploration, weather, etc.
Why resource-savvy? Not everyone can afford on-prem computing. Renting computing in the cloud is billed by usage; less usage means savings and a higher profit margin.
(Slide graphic: nested view of artificial intelligence, machine learning, and deep learning (neural networks).)
22. Which models are supported?
Snap ML (PowerAI 1.6.0) currently supports:
• Generalized Linear Models:
- Logistic Regression
- Ridge Regression
- Lasso Regression
- Support Vector Machines (SVMs)
• Tree-based models:
- Decision Trees
- Random Forest
With more to come…
(Chart: Kaggle Data Science Survey 2017, "Which data science methods are used at work?", with the methods supported by Snap ML highlighted.)
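Below is a minimal sketch of the Snap ML API, assuming the snapml Python package and its scikit-learn-compatible estimators; the dataset and parameters are illustrative.

    from snapml import LogisticRegression  # scikit-learn-style estimator
    from sklearn.datasets import make_classification
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=100_000, n_features=50, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # use_gpu=True offloads training to the GPU; otherwise Snap ML uses CPU threads
    clf = LogisticRegression(use_gpu=True)
    clf.fit(X_train, y_train)
    print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))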
23. Snap ML performance results
(Charts: Decision Tree, shown 5.2x and on average 6.5x faster than sklearn (CPU-only); Random Forest, shown 4.5x and on average 3.8x faster than sklearn (CPU-only).)
Project www: https://www.zurich.ibm.com/snapml/
Core publication: https://arxiv.org/abs/1803.06333
24. Nvidia RAPIDS
RAPIDS is a set of open source libraries for GPU-accelerated data preparation and machine learning.
OSS website: rapids.ai
25. Nvidia RAPIDS cuDF - GPU DataFrames
cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data. It provides a pandas-like API that will be familiar to data engineers and data scientists.
The current version is 0.6; the cuDF tech preview included in PowerAI 1.6.0 is back-level (0.2). Work is in progress to get the latest version into Conda, or you can build it yourself (open source).
Examples of data manipulation in cuDF, like object creation, viewing, selection, merge, concat, etc., can be found here:
https://rapidsai.github.io/projects/cudf/en/latest/10min.html
26. Simple cuDF example
Download a CSV, then use the GPU to parse it into rows and columns and run calculations.
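The slide's snippet and its output did not survive extraction, so here is a minimal cuDF sketch in the same spirit; the file name and columns (a hypothetical tips.csv with total_bill, tip, and size) are illustrative.

    import cudf

    # Parse the CSV into rows and columns on the GPU
    tips = cudf.read_csv("tips.csv")  # hypothetical, downloaded beforehand

    # Run a calculation entirely on the GPU, with a pandas-like API
    tips["tip_pct"] = tips["tip"] / tips["total_bill"] * 100
    print(tips.groupby("size")["tip_pct"].mean())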
27. Nvidia RAPIDS cuML - GPU Machine Learning
cuML is a suite of libraries that implement machine learning algorithms and mathematical primitives. It enables data scientists, researchers, and software engineers to run traditional tabular ML tasks on GPUs.
The current version is 0.6; the cuML tech preview included in PowerAI 1.6.0 is back-level (0.2). Work is in progress to get the latest version into Conda, or you can build it yourself (open source).
Documentation on supported algorithms like KMeans, tSVD, PCA, and DBSCAN can be found here:
https://docs.rapids.ai/api/cuml/stable/
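A minimal cuML sketch using its scikit-learn-like estimator interface; the import path follows current cuML releases (older 0.x releases exposed KMeans directly from cuml), and the data is toy data for illustration.

    import cudf
    from cuml.cluster import KMeans

    # Four 2-D points held in GPU memory
    df = cudf.DataFrame({"x": [1.0, 2.0, 8.0, 9.0],
                         "y": [1.0, 2.0, 8.0, 9.0]})

    km = KMeans(n_clusters=2)
    km.fit(df)
    print(km.labels_)           # cluster assignment per row
    print(km.cluster_centers_)  # learned centroids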
30. The AI Ladder: a prescriptive approach to accelerating the journey to AI
COLLECT - Make data simple and accessible (data of every type, regardless of where it lives)
ORGANIZE - Create a trusted analytics foundation
ANALYZE - Scale AI everywhere with trust & transparency
INFUSE - Operationalize AI across business processes
MODERNIZE your data estate for an AI and multicloud world, on AI-optimized systems infrastructure
32. Introduction to Nvidia TensorRT
NVIDIA TensorRT™ is a platform for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications.
Nvidia website: https://developer.nvidia.com/tensorrt
33. TensorFlow and TensorRT inference
TensorFlow™ integration with TensorRT™ (TF-TRT) optimizes and executes compatible subgraphs, allowing TensorFlow to execute the remaining graph. While you can still use TensorFlow's wide and flexible feature set, TensorRT will parse the model and apply optimizations to the portions of the graph wherever possible.
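As a sketch of what a TF-TRT conversion looks like with the TF 1.x trt_convert API current in the WML CE 1.6.x timeframe; the model paths are placeholders, not paths from the deck.

    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    # Convert compatible subgraphs of a SavedModel into TensorRT engines;
    # TensorFlow keeps executing whatever TensorRT cannot handle.
    converter = trt.TrtGraphConverter(
        input_saved_model_dir="/models/resnet50/1",  # hypothetical SavedModel
        precision_mode="FP16")                       # see the precision notes below
    converter.convert()
    converter.save("/models/resnet50_trt/1")         # serve this optimized model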
36. Note: TensorRT engines are optimized for the currently available GPUs, so conversions should take place on the machine that will be running inference.
37. Calibrating for lower precision with a minimal loss of accuracy reduces bandwidth requirements and allows for faster computation. It also allows for the use of Tensor Cores, which perform matrix multiplication on 4×4 FP16 matrices and add a 4×4 FP16 or FP32 matrix.
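In equation form, the fused multiply-add each Tensor Core performs is

    D = A \times B + C, \qquad A, B \in \mathrm{FP16}^{4 \times 4}, \quad C, D \in \mathrm{FP16}^{4 \times 4} \text{ or } \mathrm{FP32}^{4 \times 4}

which is why calibrating models down to lower precision unlocks these units.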
40. Nvidia TensorRT current version
Version 6, announced on September 16th (current): https://news.developer.nvidia.com/tensorrt6-breaks-bert-record/
Version 5.1.3.6 added as a tech preview to WML CE 1.6.1: https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/
41. Resources
https://developer.ibm.com/linuxonpower/deep-learning-powerai#tab_education
Nvidia TensorRT: https://developer.nvidia.com/tensorrt
WML CE 1.6.1: https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/
TF-TRT documentation: https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/
IBM TensorRT introduction blog: https://developer.ibm.com/linuxonpower/2019/07/29/introducing-tensorflow-with-tensorrt-tf-trt/
IBM TensorFlow Serving blog (includes TensorRT example): https://developer.ibm.com/linuxonpower/2019/08/05/using-tensorrt-models-with-tensorflow-serving-on-wml-ce/
Image classification and object detection: github.com/tensorflow/tensorrt
Nvidia forum: https://devtalk.nvidia.com/default/board/301/deep-learning-training-and-inference-/
Mixed precision and accuracy: https://developer.download.nvidia.com/video/gputechconf/gtc/2019/presentation/s9143-mixed-precision-training-of-deep-neural-networks.pdf
Demo: https://github.com/cheeyauk/tf_to_tensorrt
43. IBM Systems WW Client Experience Centers (IBM Internal Use Only)
IBM Systems Worldwide Client Experience Centers maximize IBM Systems' competitive advantage in the Cloud and Cognitive era by providing access to world-class technical experts and infrastructure services to assist clients with the transformation of their IT implementations. Center offerings enable IBM sellers and Business Partners to progress and expedite systems sales opportunities.
Search Center Offerings in ISCEP: https://ibm.biz/client-experience-portal
Contact Center via
9 worldwide locations (* also Infrastructure Hubs): Austin TX, *Poughkeepsie NY, Rochester MN, Tucson AZ, *Beijing CHINA, Boeblingen GERMANY, Guadalajara MEXICO, *Montpellier FRANCE, Tokyo JAPAN
Client Experience (inbound & outbound): tailored, in-depth technology engagements; Innovation Exchange events; relationship building; demonstrations; meetups; solution workshops; remote options
Infrastructure Solutions (inbound to Centers): benchmarks, MVP & proof-of-technology "test drives"; demonstrations; infrastructure services; certifying ISV solutions; hosting; cloud environment
Architecture & Design (inbound & outbound): advise clients, enable sellers, the "art of the possible"; discovery & design workshops; consulting; showcases; reference architectures; co-creation of assets; includes CSSC
Content: content development; IBM Redbooks; training courses; video courses; "test drives"; demonstrations
NEW: Co-Creation Lab; CEC Cloud; IBM Systems Center of Competency for Red Hat
44. Please note
IBM's statements regarding its plans, directions, and intent are subject to change or withdrawal without notice and at IBM's sole discretion.
Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
46. Notices and disclaimers continued
Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to interoperate with IBM's products. IBM expressly disclaims all warranties, expressed or implied, including but not limited to, the implied warranties of merchantability and fitness for a purpose.
The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual property right.
IBM, the IBM logo, ibm.com and [names of other referenced IBM products and services used in the presentation] are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml.
Editor's Notes
So what is triggering the rapid advancements in AI? It comes from major innovation in three critical categories:
1) Digitization of society is creating an abundance of interesting datasets, inside and outside the enterprise, and that data continues to grow about 40% per year.
2) Algorithm innovation in supervised & unsupervised learning techniques. Especially Deep Learning. Most of which is advancing in open source.
3) Ability to run those algorithms on distributed compute and especially on GPUs.
So together, the developments here have allowed us to employ AI on any problem where a human can get a task done in less than 1 second of thought. It's in this scope of problems that AI is being applied, and it's being wielded to create a flywheel: Data -> Products -> Users. This is why competing on algorithms alone is not a defensible model.
REFERENCE NOTES:
Top trends:
99% of commercial value associated with A->B: 0s or 1s. This is called supervised learning.
Speech Recognition: Audio -> Text
Image Recognition
Types of Deep Learning:
Supervised Learning: Learn from labeled datasets. Most economic value is here and drops off quickly through below.
Transfer Learning: Learn about one topic. Apply to another domain.
Unsupervised Learning. Learning without labeled data
Reinforcement Learning.
The rise of the internet via analogy:
Shopping mall + internet doesn’t make an internet/ecommerce company
What defines whether you are truly an internet company? A) You architect the organizational design to take advantage of the internet: for instance, A/B tests, short cycle times, and decision making pushed down to PM/dev.
The rise of the AI era:
Traditional tech company + deep learning doesn’t make it an AI company.
Although only some patterns exist, Google & Baidu are good examples.
Other patterns: a) strategic data acquisition, b) unified data 'warehouse', c) pervasive automation, d) new job descriptions.
When building an AI company, centrally build an AI group and matrix its members into your business units.
When working with clients, these are the top AI scenarios to look for as you explore their potential AI use cases.
The genesis of IBM PowerAI (now known as Watson Machine Learning Community Edition - WML CE) was to make it simple for data scientists to be more productive, more quickly, by greatly simplifying the tasks necessary to get up and running. WML CE is an enterprise software distribution that combines popular open source deep learning frameworks, efficient AI development tools, and accelerated IBM Power Systems servers to take your deep learning projects to the next level.
For a fee, IBM offers formal support for WML CE components as long as their versions are consistent with the release configuration (NOTE that WML CE is a no charge offering but we do offer support for a fee). If you choose to use a different version of any of the components, no formal support will be available. However, in keeping with industry norms, specific questions can be posted on the WML CE space on DeveloperWorks Answers: https://developer.ibm.com/answers/topics/powerai/. This forum is monitored by the IBM technical team and technical support is provided on a best effort basis.
There are several ways for you to get WML CE.
Order it. WML CE is available as a no charge orderable part number from IBM (called PowerAI until 2H2019).
Download it from here: http://ibm.biz/download-powerai
Get the Docker container from here: https://hub.docker.com/r/ibmcom/powerai/
As of WML CE (PowerAI) 1.5.4, the following frameworks are included in WML CE:
(Make sure to check the Knowledge Center for the latest versions as they change rapidly
https://www.ibm.com/support/knowledgecenter/SS5SF7_1.5.4/navigation/pai_software_pkgs.html):
DDL 1.2.0 - Distributed Deep Learning (with support for up to 4 nodes in WML CE)
TensorFlow 1.12.0
Tensorflow Probability 0.5.0 - TensorFlow Probability is a library for probabilistic reasoning and statistical analysis.
TensorBoard 1.12.0 - a suite of visualization tools for TensorFlow
TensorFlow Keras – NOTE that Keras is supported as part of the TensorFlow core library and as such we can support Keras through TensorFlow
IBM enhanced Caffe 1.0.0
BVLC Caffe 1.0.0 - The Berkeley Vision and Learning Center (BVLC)
Caffe2 1.0rc1 – in technology preview
PyTorch 1.0rc1
Snap ML 1.0.0
Spectrum MPI 10.2
Bazel 0.15.0
OpenBLAS 0.3.3
HDF5 1.10.1
Protobuf 3.6.1
ONNX 1.3.0 – in technology preview
There are three additional capabilities on top of the open source frameworks (and in addition to the performance advantage that Power brings to the table): Large Model Support (LMS), Distributed Deep Learning (DDL), and support by IBM.
Large Model Support
WML CE addresses a fundamental limitation for deep learning: the size of memory available within GPUs. When training complex models or training with high-definition images, the memory available on a GPU can be prohibitively restrictive. Instead of being forced into less complex, shallower deep learning models, customers can develop more accurate models with Large Model Support.
With Large Model Support, enabled by IBM's unique NVLink connection between CPU (memory) and GPU, the entire model and dataset can be loaded into system memory and cached down to the GPU for action. Customers can now address bigger challenges and get much more work done within a cluster of WML CE servers, increasing organizational efficiency. We will cover more details on LMS later in this deck.
Distributed Deep Learning
To accelerate the time dedicated to training a model, the WML CE stack includes function for distributing a single training job across a cluster of servers. IBM’s Distributed Deep Learning brings intelligence about the structure and layout of the underlying hardware cluster (topology). The impact of this is significant! WML CE and WML-A with Distributed Deep Learning can scale jobs across large numbers of cluster resources with very little loss due to communications overhead. There will be more details later in the presentation. WML CE allows for the use of DDL with up to a 4 node cluster. If a client wants to scale beyond 4 nodes, they must purchase WML-A.
Supported by IBM
Although WML CE is available free to download and use, IBM also provides a “for fee” support offering for those clients that want enterprise level support for the features and capabilities within the base offering.
We normally focus on hardware optimization, starting with the processor, the I/O interfaces that processor enables, and the accelerators we align to those interfaces for optimal performance. We are doing that today, but it is not just about the hardware. As mentioned on the previous slide, we co-optimized the software: we took the open source deep learning frameworks and optimized them around this advanced design, added enhancements such as Spectrum Conductor for DDL and Large Model Support, and we support everything in the solution from the hardware to the software. Not only is the AC922 differentiated hardware with many industry-only innovations, but the full software offering on top of it is equally rich in differentiated innovations found only with Power Systems.
It’s estimated that 1.2 trillion photos will be taken in 2017. Even if each photo took someone only one second to organize, tag, and annotate, it would still take over 38,000 years to classify them all!
There is a competition every year, known as ImageNet.
Roughly 500,000 low-resolution images and 200 categories into which to classify them.
We talked about this earlier – it’s all about maximizing accuracy (or minimizing error/loss)
One way to get more accurate models is simply to add more layers.
The more layers, the more complex the model, and the more computationally difficult it becomes to train (see the sketch below).
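To make the cost of depth concrete, here is a minimal sketch using the tf.keras API bundled with the TensorFlow release listed earlier; the layer widths and the 200-class output are illustrative choices, not values from this deck. Each added layer adds weights that must be stored, moved, and updated on every training pass:

    import tensorflow as tf

    def build_mlp(num_hidden_layers):
        # A simple stack of fully connected layers; every extra layer
        # adds roughly 1024*1024 more weights to train.
        model = tf.keras.Sequential()
        model.add(tf.keras.layers.Dense(1024, activation='relu', input_shape=(2048,)))
        for _ in range(num_hidden_layers - 1):
            model.add(tf.keras.layers.Dense(1024, activation='relu'))
        model.add(tf.keras.layers.Dense(200, activation='softmax'))  # e.g. 200 categories
        return model

    for depth in (2, 8, 32):
        print(depth, 'hidden layers ->', build_mlp(depth).count_params(), 'parameters')

Running this shows the parameter count, and hence the compute per training step, growing by millions with each block of added layers.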
Distributed deep learning (DDL) is IBM’s high performance approach to training a single model across an entire cluster of compute nodes. Unlike native model parallelism (such as Google’s gRPC-based distribution for TensorFlow) or Spark-based approaches, the DDL library distributes the model, the training data set, and parameter serving across the defined cluster, and it uses a novel algorithm to improve communication over very low latency fabrics.
The result is extremely efficient performance scaling: less than 5% of ideal efficiency is lost when moving from 4 GPUs to 64 GPUs.
This was available as a technology preview within PowerAI, but is now supported in PowerAI Enterprise.
The outcome of this capability is that data science teams can run larger, more complex models while still reducing training time, allowing more iterations and a faster path to accurate results.
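As a rough illustration of the programming model only (the module and launcher names below follow the WML CE documentation from memory and may differ by release), a TensorFlow script adopts DDL by importing the ddl integration module and sharding its input by learner rank; the job is then launched across hosts with the ddlrun utility:

    import tensorflow as tf
    import ddl  # IBM DDL integration module shipped with WML CE (name assumed)

    # Each process learns its place in the cluster from DDL rather than gRPC.
    rank = ddl.rank()   # this learner's index across all GPUs and nodes
    size = ddl.size()   # total number of learners in the job

    # Shard the training data so every learner sees a distinct slice.
    dataset = tf.data.TFRecordDataset('train.tfrecords').shard(size, rank)

    # ... build the model and optimizer as usual; DDL performs the
    # topology-aware gradient reduction across NVLink and the fabric ...

    # Launched from the shell along the lines of:
    #   ddlrun -H host1,host2,host3,host4 python train.py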
Further reading on ML/DL training at Summit scale, by Junqi Yin, Advanced Data and Workflows Group:
https://www.olcf.ornl.gov/wp-content/uploads/2018/12/summit_training_mldl.pdf
https://vimeo.com/307071617
Watson Machine Learning Accelerator addresses memory constraints within Deep Learning
Large Model Support
Watson Machine Learning Accelerator (WML-A) addresses a very big deep learning scaling challenge: the amount of memory available on GPUs. When data scientists develop a deep learning workload, the structure of matrices in the neural model, and the data elements that train the model (in a batch), must fit within the memory on the GPUs. As models grow in complexity and data sets increase in size, data scientists are forced to make tradeoffs to stay within the constrained 32GB (or even 16GB on older GPU cards) memory limits. Instead of training on web-scale images, WML-A users can train on high definition video. Instead of being forced into less complex, shallower deep learning models, customers can develop more accurate models for better inference capability.
With Large Model Support, enabled by WML-A’s unique NVLink connection between CPU (memory) and GPU, the entire model and dataset can be loaded into system memory and cached down to the GPU as needed. IBM’s capabilities, with the co-optimized WML-A software on Power Systems servers, have enabled increased model size (more layers, larger matrices), increased data element sizes (higher definition images), and larger batch sizes (for faster time to convergence). With Large Model Support, data scientists can load models that span nearly an entire terabyte of system memory across the GPUs. The final impact? Customers can now address bigger challenges and get much more work done within a cluster of WML-A servers, increasing organizational efficiency.
Not only does Large Model Support allow data scientists to work with more complex data; it turns out that certain models, because they pull a significantly larger number of data elements into each training cycle, actually complete training faster with large models. By using the entire system memory resource available, data scientists can operate much more efficiently within each single server. The ability to use larger data and train faster is a significant advantage of PowerAI Enterprise, and it is available at this scale only because of the architectural choices IBM and NVIDIA made in developing this accelerated architecture.
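For illustration, LMS is designed to be close to a one-line change for the data scientist. The sketch below uses the PyTorch LMS toggle as it appears in the WML CE documentation; note that this function is a WML CE addition quoted from memory, not part of stock PyTorch:

    import torch

    # WML CE's PyTorch build exposes an LMS switch (not in stock PyTorch).
    # Once enabled, tensors are paged between system memory and GPU memory
    # over NVLink instead of the job failing with out-of-memory errors.
    torch.cuda.set_enabled_lms(True)

    # From here on, train as usual with models/batches larger than GPU memory.
    model = torch.nn.Sequential(
        torch.nn.Linear(8192, 8192),
        torch.nn.ReLU(),
        torch.nn.Linear(8192, 200),
    ).cuda()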
When you need to retrain models frequently – multiple times per day:
Cybersecurity threats on your critical infrastructure (e.g., the energy grid), credit card fraud detection models
Online retraining: e.g., anomaly detection on your compute or storage infrastructure, where you want to constantly learn from new events to improve the model (see the sketch below)
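As a generic sketch of that online-retraining pattern (using scikit-learn's partial_fit API purely for brevity; this deck does not prescribe a library), the model is refined incrementally as new events arrive instead of being retrained from scratch:

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    # An incremental learner: partial_fit lets each fresh batch of events
    # (transactions, infrastructure metrics, ...) update the model in place.
    model = SGDClassifier()
    classes = np.array([0, 1])  # e.g. normal vs. anomalous/fraudulent

    def on_new_events(features, labels):
        # Called whenever a new batch of labeled events arrives.
        model.partial_fit(features, labels, classes=classes)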
These are all POWER9 results, CPU-only (a Snap ML usage sketch follows the dataset list).
Datasets (examples × features):
Epsilon: 300K × 2,000
Higgs: 8M × 28
Creditcard: 200K × 28
Susy: 3.75M × 18
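To show the programming model behind these numbers, Snap ML exposes scikit-learn-style estimators. The sketch below is hypothetical: the snap_ml package and estimator names follow the Snap ML documentation from memory, and the epsilon file path is illustrative:

    from snap_ml import LogisticRegression  # sklearn-compatible estimator (name assumed)
    from sklearn.datasets import load_svmlight_file

    # e.g. the epsilon benchmark above: 300K examples x 2,000 features
    X, y = load_svmlight_file('epsilon_normalized')

    clf = LogisticRegression(use_gpu=False)  # CPU-only, matching the POWER9 results
    clf.fit(X, y)
    print(clf.predict(X[:10]))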
This is our prescriptive approach to helping clients accelerate their journey to AI, connecting their data and AI capabilities within a unified data and AI lifecycle (or platform). It is also a way to help clients identify where they are and where to focus based on their maturity on the journey to AI. Furthermore, it is an organizing construct for the Data and AI products and services offered by IBM and our business partners, and it is the technology foundation that unifies how those products and services work together.
What we have learned from AI pioneers is that every step of the ladder is critical. AI is not magic and requires a thoughtful and well-architected approach. For example, the vast majority of AI failures are due to data preparation and organization, not the AI models themselves. Success with AI models is dependent on achieving success first with how you COLLECT and ORGANIZE data.
Therefore, we believe clients must:
COLLECT -- Establish a strong foundation of data, making it simple and accessible regardless of where that data resides. Since data used in AI is often very dynamic and fluid, with ever-expanding sources, virtualizing how data is collected is critical for clients.
ORGANIZE – Create a trusted, business-ready analytics foundation that ensures your data is ready for AI. Just because you can access your data doesn’t mean that it’s prepared for AI use cases. Bad data is paralyzing to AI. So clients must integrate, cleanse, catalog, and govern the full lifecycle of their AI data.
ANALYZE – Once your data is accessible and AI-ready, then you are better prepared to apply advanced analytics and AI models. This rung provides the business and planning analytics capabilities that are key for success with AI. It also provides the capabilities needed to build, deploy, and manage AI models within an integrated portfolio of technology.
INFUSE – Many businesses create highly useful AI models but then encounter challenges in operationalizing them to attain broader business value. This rung of the ladder infuses AI to achieve trust and transparency in model-recommended decisions, decision explainability, bias detection, decision audits, etc. For clients with common use cases, the INFUSE rung operationalizes those AI use cases with pre-built application services, speeding time to value.
MODERNIZE – Given the dynamic nature of AI, your data estate needs a highly elastic and extensible multi-cloud infrastructure to unify the aforementioned capabilities within a fully governed team-platform. Clients are also looking to automate their AI lifecycles across an array of contributors through collaborative workflows. Essentially, MODERNIZE means building an information architecture for AI that provides choice and flexibility across your enterprise. As clients modernize their data estates for an AI and multicloud world, they will find that there is less "assembly required" in expanding the impact of AI across the organization.
This is the IBM Cloud Architecture Center high-level reference architecture. A data-centric AI reference architecture needs to support capabilities that address the Collect, Organize, Analyze, and Infuse activities.
This architecture diagram illustrates the need for strong data management capabilities inside a 'multi cloud data platform' (dark blue area), onto which AI capabilities are plugged to support the analysis done by data scientists (machine learning workbench and business analytics).
The data platform addresses data collection and transformation, moving data to a local, highly scalable store. Sometimes it is better to avoid moving data, when no transformations are needed or when adding readers would hurt the performance of the origin data sources; a virtualization capability is therefore necessary to open a view on remote data sources without moving data.
On the AI side, data scientists need to perform data analysis, which includes making sense of the data using data visualization. To build a model they define features, and the AI environment supports feature engineering. The development environment then helps them select and combine the different algorithms and tune the hyperparameters. Training can run on a local cluster or, at big-data scale, be submitted to a machine learning cluster.
Once the model reaches an acceptable accuracy level, it can be published as a service. The model management capability supports metadata definition and the lifecycle management of the model. Once the model is deployed, a monitoring capability ensures it remains accurate and unbiased.
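As an illustrative sketch only (the reference architecture does not mandate a particular framework; Flask and joblib stand in here for the platform's model deployment capability), publishing a trained model as a REST scoring service can look like this:

    from flask import Flask, request, jsonify
    import joblib

    app = Flask(__name__)
    model = joblib.load('model.pkl')  # a previously trained, published model artifact

    @app.route('/score', methods=['POST'])
    def score():
        # The intelligent application posts feature vectors and receives predictions.
        features = request.get_json()['features']
        return jsonify({'prediction': model.predict([features]).tolist()})

    if __name__ == '__main__':
        app.run(host='0.0.0.0', port=8080)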
The intelligent application, represented as a combination of capabilities at the top of the diagram (business process, core application, CRM, ...), can run on cloud, fog, or mist. It accesses the deployed model, accesses data using APIs, and can even consume pre-built models and cognitive services, such as speech to text and text to speech, image recognition, tone analysis, Natural Language Understanding (NLU), and chatbots.