If you're like most of the world, you're in an aggressive race to implement machine learning applications and on a path toward deep learning. Those who can deliver better service at lower cost will be the winners in 2030, but infrastructure is a key challenge to getting there. What does your technology infrastructure look like over the next decade as you move from petabytes to exabytes? How are you budgeting for colossal data growth over that period? How do your data scientists share data today, and will that scale for 5-10 years? Do you have the appropriate security, governance, backup, and archiving processes in place? This session addresses these issues and discusses strategies for customers as they ramp up their AI journey with a long-term view.
7. What is AI?
A typical adoption journey, from concept to scale (a minimal Python sketch of the model-building step follows this outline):
• Concept / Understanding – a laptop with R programming and Python programming; defining a problem; building a model
• Proof of Concept with GPU technology – POC in the cloud or POC on premises. Does it work? Can it do something meaningful for me today? Which AI opportunities are next?
• Expanding – growth in cloud usage; GPU workstations and GPU servers. How to improve? More applications; more GPU workstations/servers; storage accumulating ad hoc: USB drives, disk array #1, disk array #2, etc.
• Challenges – searching for information, sharing information, performance, scaling, backup, archiving, security, governance
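To make the "building a model" step concrete, here is a minimal sketch of the kind of laptop-scale POC this stage implies. The slide names only R and Python as tools; scikit-learn and the iris dataset are assumptions for illustration:

```python
# A toy proof-of-concept model of the kind built at the "Concept" stage:
# small data, one laptop, answering "does it work / is it meaningful today?"
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)                      # stand-in dataset
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("holdout accuracy:", model.score(X_te, y_te))    # the "does it work?" check
```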
8. Scaling Spark is challenging
• Inefficient workload management results in poor server utilization rates and throughput.
• Replicated data
• Difficult to scale
• Lack of security with open-source frameworks and applications introduces risk.
• Multiple Spark teams, each with dedicated servers, mean wasted capacity and high administrative overhead (a configuration sketch follows this list).
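One common remedy for per-team dedicated servers is a shared cluster that flexes resources per job. A minimal sketch, assuming PySpark with an external shuffle service; the app name and executor bounds are illustrative:

```python
# Hypothetical PySpark session showing dynamic allocation, one standard way
# to attack the utilization problem above: executors grow and shrink with
# demand instead of being pinned to each team.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("shared-spark-grid-sketch")
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.dynamicAllocation.minExecutors", "2")
    .config("spark.dynamicAllocation.maxExecutors", "50")
    # External shuffle service keeps shuffle files alive while executors scale down.
    .config("spark.shuffle.service.enabled", "true")
    .getOrCreate()
)
```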
10. The Problem
• Data needs to be labelled
  – It's manual; there is too much data and it is time-consuming
  – Data is scarce and expensive to aggregate
• Developing custom models quickly
  – Data scientists are required
  – Not all data scientists are created equal
• Deploying custom solutions around these models
  – Need application developers with skills in OpenCV
• Leveraging GPUs (a sketch follows this list)
  – Train and infer faster
  – Need to deploy models at the edge – NVIDIA TX2/Xavier/T4
  – Keep up with CUDA drivers
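To make the GPU bullets concrete, here is a minimal sketch of the train-on-GPU, infer-on-GPU pattern. PyTorch is an assumption (the slide names only GPUs, CUDA, and NVIDIA edge parts); the model and data are toys:

```python
# Toy training loop that runs on a CUDA GPU when present, CPU otherwise.
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Synthetic labelled batch; producing real labels is the hard part per the slide.
x = torch.randn(32, 128, device=device)
y = torch.randint(0, 10, (32,), device=device)

for _ in range(10):                # train
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

with torch.no_grad():              # infer, same device
    preds = model(x).argmax(dim=1)
```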
12. 80% of data is either inaccessible, untrusted, or unanalyzed; 81% of users do not understand the data required for AI.
"No amount of AI algorithmic sophistication will overcome a lack of data [architecture] … bad data is simply paralyzing."
There is no AI without an IA (information architecture).
13. Let's Help You Get There
• Software is your friend
• A workflow view
• AI and Big Data Analytics at Scale
15. Is your data stranded?
Complex workflows can lead to data isolation across the pipeline stages (Ingest → Preparation → Training → Inference):
• Replication sprawl
• Time to deliver
• Sync issues
• Chain of custody
• Performance disparities
• High labor cost
• Management complication
• Optimization nightmares
16. The End-to-End Enterprise Data Pipeline
Machine Learning and Deep Learning don't happen in a silo. From edge to insights, the stages run Ingest → Organize → Analyze → Prepare → Train → Inference:
• Ingest – data input via streams, NFS, or S3
• Organize – integrate new data with existing repositories
• Analyze – correlate data from the data lake for newer insights and views
• Prepare / Train / Inference – use select datasets to identify patterns and train models for future decision making
Sample workloads: ETL, tagging, BI, HPC.
Data is the shared asset between the various analytics and AI stages in an E2E enterprise data pipeline (a schematic sketch follows).
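One way to read the slide: each stage is a function and the dataset is the single artifact passed between them. A schematic sketch, assuming pandas and scikit-learn and a hypothetical "label" column (none of these appear on the slide):

```python
# A schematic pass through the slide's stages, with the dataset as the
# one shared asset flowing end to end.
import pandas as pd
from sklearn.linear_model import LogisticRegression

def ingest() -> pd.DataFrame:
    # Stands in for streams/NFS/S3 input; synthetic data keeps this runnable.
    return pd.DataFrame({"f1": [0.1, 0.9, 0.2, 0.8] * 10,
                         "f2": [1.0, 0.0, 1.0, 0.0] * 10,
                         "label": [0, 1, 0, 1] * 10})

def organize(new: pd.DataFrame, existing: pd.DataFrame) -> pd.DataFrame:
    return pd.concat([existing, new], ignore_index=True)  # merge with repositories

def prepare(df: pd.DataFrame) -> pd.DataFrame:
    return df.dropna()                                     # ETL / cleansing / tagging

def train(df: pd.DataFrame) -> LogisticRegression:
    return LogisticRegression().fit(df.drop(columns=["label"]), df["label"])

def infer(model: LogisticRegression, df: pd.DataFrame):
    return model.predict(df.drop(columns=["label"]))

df = organize(ingest(), existing=ingest())   # one dataset flows through every stage
model = train(prepare(df))
print(infer(model, df)[:5])
```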
17. Enabling Agility and Straight-Through Processing for Improved Data Engineering (ETL)
• Data sources: Hadoop HDFS data lake, cloud-based data providers, IoT, deep learning workloads
• HDFS Transparency Connector / ESS bridges the HDFS data lake and shared storage
• Spark AI Grid with Watson ML Accelerator
• Impact: reduce cost and improve service levels
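The connector internals are beyond this slide, but the "straight-through" idea can be sketched: Spark reads the data-lake path directly rather than ETL-ing a private copy. A minimal PySpark sketch; the URI, dataset layout, and column names are illustrative assumptions:

```python
# Read directly from the shared data lake, transform, write back: no
# replicated per-team copy of the data.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("etl-straight-through").getOrCreate()

raw = spark.read.parquet("hdfs:///datalake/events/2019/")   # path is illustrative
cleaned = raw.dropna().filter(raw["status"] == "ok")        # assumed columns
cleaned.write.mode("overwrite").parquet("hdfs:///datalake/events_clean/")
```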
19. Power Systems LC922 – Delivering enhanced price-performance for Apache Spark
Reduce operating costs and deliver results faster compared to tested Intel Xeon systems.
Power LC922 delivers:
• 30% more performance¹: Power LC922 at 29.5 QpH vs. Intel Xeon SP Gold 6140 server at 22.6 QpH
• 18% lower price²,³,⁴: Power LC922 at $43,948 vs. Intel Xeon SP Gold 6150 server at $53,548
• 1.6x price-performance overall
1. Results are based on IBM internal measurements running four concurrent streams of 99 TPC-DS-like queries against a 3 TB dataset. Results are valid as of 4/25/18 and were conducted under laboratory conditions with speculative-execution controls to mitigate user-to-kernel and user-to-user side-channel attacks on both systems; individual results can vary based on workload size, use of storage subsystems, and other conditions.
2. Hardware: 4 nodes of IBM Power LC922 (2x 20-core/2.7 GHz/512 GB memory) using 12x 8 TB HDDs, 10 GbE two-port, RHEL 7.5 LE for POWER9; and 4 nodes of Intel Xeon Gold 6140 (36 cores, 2x 18-core chips, at 2.3 GHz), 512 GB memory, 12x 8 TB HDDs, 10 Gbps NIC, Red Hat Enterprise Linux 7.5.
3. Software: Apache Spark 2.3.0, located at http://spark.apache.org/downloads.html , and open-source Hadoop HDP 2.7.5.
4. Pricing is based on Power LC922 (http://www-03.ibm.com/systems/power/hardware/linux-lc.html) and publicly available x86 pricing.
5. Apache®, Apache Spark®, and associated logos are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.
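As a quick sanity check, the headline 1.6x is roughly the product of the two measured ratios (figures taken from this slide):

```python
# Back-of-envelope check of how the 1.6x price-performance claim composes.
perf_ratio  = 29.5 / 22.6          # ≈ 1.31 → ~30% more queries per hour
price_ratio = 53548 / 43948        # ≈ 1.22 → x86 system costs ~22% more
                                   #   (equivalently, LC922 is ~18% cheaper)
print(perf_ratio * price_ratio)    # ≈ 1.59 → the slide's "1.6x price-performance"
```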
24. Leading the Pack in AI Infrastructure
• IBM Systems Reference Architecture for AI: IBM PowerAI, IBM Spectrum Computing, IBM Storage
• IBM Accelerated Compute Platform: IBM Power Servers, IBM Spectrum Computing, IBM Spectrum Scale & ESS
• IBM Storage Solutions for AI/ML/DL: IBM Spectrum Scale, IBM Cloud Object Storage, IBM Spectrum Discover
All built upon a common data infrastructure.
Wells Fargo: Financial Risk Modeling
"Wells Fargo data scientists build, enhance, and validate hundreds of models each day. Speed is critical, along with scalability, as they deal with greater amounts of data and more complicated models. Academically, people talk about fancy algorithms. But in real life, how efficiently the models run in distributed environments is critical. IBM is a very good partner and we are very pleased with their solution."
– Richard Liu, Quantitative Analytics Manager, Wells Fargo, IBM Think 2018
25. How Do You Get AI Scalability for a Decade?
Plan for the Future
• Where is the biggest pain today?
• What does it look like in 3, 5, and 10 years?
• Think workflow
Software Is Your Friend
• Capabilities, efficiency, balance, cost, scalability
• Usability
Hybrid Multicloud World
• Cloud or on premises? It's both!
• Drive the linkage between Data Science and IT
There is no AI without an IA (information architecture).
27. AI takes flight
"In any business, differentiation is everything. Data is the source of differentiation. How we started on the path to today's Delta: we had to get the basics first. The foundation. We now have the foundation, the data infrastructure, in order to improve the processes we have. In the past, we had the data, but we didn't have the sourcing, or the data infrastructure, to get at the data. So we built this data ocean with billions of data points and turned it into action for Delta to better serve customers."
– Ed Bastian, CEO, Delta Air Lines
"That's a great point. You can't do enterprise-wide analytics and AI (Artificial Intelligence) until you have the right 'data basics' first, i.e. you have to have the foundation information architecture and data infrastructure in place. Most companies have random acts of digital and AI all over. But until you can pull them together, and re-imagine how the work is to be done, you can't scale any of it."
– Ginni Rometty, CEO, IBM
"In 2010, we had 5,600 maintenance cancellations, at least one every day that year. In 2018, we had just 55 maintenance cancellations. That's a 99% improvement."
– Ed Bastian, CEO, Delta Air Lines, IBM Keynote, Jan. 9, 2019