If you're like most of the world, you're in an aggressive race to implement machine learning applications and on a path toward deep learning. Those who can deliver better service at lower cost will be the winners in 2030, but infrastructure is a key challenge to getting there. What does your technology infrastructure look like over the next decade as you move from petabytes to exabytes? How are you budgeting for colossal data growth over that period? How do your data scientists share data today, and will that scale for 5-10 years? Do you have the appropriate security, governance, backup, and archiving processes in place? This session addresses these issues and discusses strategies for customers as they ramp up their AI journey with a long-term view.
7. What is AI?
A typical adoption journey, from concept to scale (a minimal Python sketch of the model-building step follows this outline):
• Concept / Understanding – a laptop with R programming and Python programming; defining a problem; building a model
• Proof of Concept with GPU technology – POC in the cloud or POC on premises. Does it work? Can it do something meaningful for me today? Which AI opportunities are next?
• Expanding – growth in cloud usage; GPU workstations and GPU servers. How to improve? More applications; more GPU workstations/servers; storage accumulating ad hoc: USB drives, disk array #1, disk array #2, etc.
• Challenges – searching for information, sharing information, performance, scaling, backup, archiving, security, governance
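To make the "building a model" step concrete, here is a minimal sketch of the kind of laptop-scale POC this stage implies. The slide names only R and Python as tools; scikit-learn and the iris dataset are assumptions for illustration:

```python
# A toy proof-of-concept model of the kind built at the "Concept" stage:
# small data, one laptop, answering "does it work / is it meaningful today?"
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)                      # stand-in dataset
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("holdout accuracy:", model.score(X_te, y_te))    # the "does it work?" check
```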
8. Scaling Spark is challenging
• Inefficient workload management results in poor server utilization rates and throughput.
• Replicated data
• Difficult to scale
• Lack of security with open-source frameworks and applications introduces risk.
• Multiple Spark teams, each with dedicated servers, mean wasted capacity and high administrative overhead (a configuration sketch follows this list).
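One common remedy for per-team dedicated servers is a shared cluster that flexes resources per job. A minimal sketch, assuming PySpark with an external shuffle service; the app name and executor bounds are illustrative:

```python
# Hypothetical PySpark session showing dynamic allocation, one standard way
# to attack the utilization problem above: executors grow and shrink with
# demand instead of being pinned to each team.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("shared-spark-grid-sketch")
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.dynamicAllocation.minExecutors", "2")
    .config("spark.dynamicAllocation.maxExecutors", "50")
    # External shuffle service keeps shuffle files alive while executors scale down.
    .config("spark.shuffle.service.enabled", "true")
    .getOrCreate()
)
```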
10. The Problem
• Data needs to be labelled
  – It's manual; there is too much data and it is time-consuming
  – Data is scarce and expensive to aggregate
• Developing custom models quickly
  – Data scientists are required
  – Not all data scientists are created equal
• Deploying custom solutions around these models
  – Need application developers with skills in OpenCV
• Leveraging GPUs (a sketch follows this list)
  – Train and infer faster
  – Need to deploy models at the edge – NVIDIA TX2/Xavier/T4
  – Keep up with CUDA drivers
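To make the GPU bullets concrete, here is a minimal sketch of the train-on-GPU, infer-on-GPU pattern. PyTorch is an assumption (the slide names only GPUs, CUDA, and NVIDIA edge parts); the model and data are toys:

```python
# Toy training loop that runs on a CUDA GPU when present, CPU otherwise.
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Synthetic labelled batch; producing real labels is the hard part per the slide.
x = torch.randn(32, 128, device=device)
y = torch.randint(0, 10, (32,), device=device)

for _ in range(10):                # train
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

with torch.no_grad():              # infer, same device
    preds = model(x).argmax(dim=1)
```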
12. 80% of data is either inaccessible, untrusted, or unanalyzed; 81% of users do not understand the data required for AI.
"No amount of AI algorithmic sophistication will overcome a lack of data [architecture] … bad data is simply paralyzing."
There is no AI without an IA (information architecture).
13. Let's Help You Get There
• Software is your friend
• A workflow view
• AI and Big Data Analytics at Scale
15. Is your data stranded?
Complex workflows can lead to data isolation across the pipeline stages (Ingest → Preparation → Training → Inference):
• Replication sprawl
• Time to deliver
• Sync issues
• Chain of custody
• Performance disparities
• High labor cost
• Management complication
• Optimization nightmares
16. The End-to-End Enterprise Data Pipeline
Machine Learning and Deep Learning don't happen in a silo. From edge to insights, the stages run Ingest → Organize → Analyze → Prepare → Train → Inference:
• Ingest – data input via streams, NFS, or S3
• Organize – integrate new data with existing repositories
• Analyze – correlate data from the data lake for newer insights and views
• Prepare / Train / Inference – use select datasets to identify patterns and train models for future decision making
Sample workloads: ETL, tagging, BI, HPC.
Data is the shared asset between the various analytics and AI stages in an E2E enterprise data pipeline (a schematic sketch follows).
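One way to read the slide: each stage is a function and the dataset is the single artifact passed between them. A schematic sketch, assuming pandas and scikit-learn and a hypothetical "label" column (none of these appear on the slide):

```python
# A schematic pass through the slide's stages, with the dataset as the
# one shared asset flowing end to end.
import pandas as pd
from sklearn.linear_model import LogisticRegression

def ingest() -> pd.DataFrame:
    # Stands in for streams/NFS/S3 input; synthetic data keeps this runnable.
    return pd.DataFrame({"f1": [0.1, 0.9, 0.2, 0.8] * 10,
                         "f2": [1.0, 0.0, 1.0, 0.0] * 10,
                         "label": [0, 1, 0, 1] * 10})

def organize(new: pd.DataFrame, existing: pd.DataFrame) -> pd.DataFrame:
    return pd.concat([existing, new], ignore_index=True)  # merge with repositories

def prepare(df: pd.DataFrame) -> pd.DataFrame:
    return df.dropna()                                     # ETL / cleansing / tagging

def train(df: pd.DataFrame) -> LogisticRegression:
    return LogisticRegression().fit(df.drop(columns=["label"]), df["label"])

def infer(model: LogisticRegression, df: pd.DataFrame):
    return model.predict(df.drop(columns=["label"]))

df = organize(ingest(), existing=ingest())   # one dataset flows through every stage
model = train(prepare(df))
print(infer(model, df)[:5])
```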
17. Enabling Agility and Straight-Through Processing for Improved Data Engineering (ETL)
• Data sources: Hadoop HDFS data lake, cloud-based data providers, IoT, deep learning workloads
• HDFS Transparency Connector / ESS bridges the HDFS data lake and shared storage
• Spark AI Grid with Watson ML Accelerator
• Impact: reduce cost and improve service levels
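The connector internals are beyond this slide, but the "straight-through" idea can be sketched: Spark reads the data-lake path directly rather than ETL-ing a private copy. A minimal PySpark sketch; the URI, dataset layout, and column names are illustrative assumptions:

```python
# Read directly from the shared data lake, transform, write back: no
# replicated per-team copy of the data.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("etl-straight-through").getOrCreate()

raw = spark.read.parquet("hdfs:///datalake/events/2019/")   # path is illustrative
cleaned = raw.dropna().filter(raw["status"] == "ok")        # assumed columns
cleaned.write.mode("overwrite").parquet("hdfs:///datalake/events_clean/")
```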
19. Power Systems LC922 – Delivering enhanced price-performance for Apache Spark
Reduce operating costs and deliver results faster compared to tested Intel Xeon systems.
Power LC922 delivers:
• 30% more performance¹: Power LC922 at 29.5 QpH vs. Intel Xeon SP Gold 6140 server at 22.6 QpH
• 18% lower price²,³,⁴: Power LC922 at $43,948 vs. Intel Xeon SP Gold 6150 server at $53,548
• 1.6x price-performance overall
1. Results are based on IBM internal measurements running four concurrent streams of 99 TPC-DS-like queries against a 3 TB dataset. Results are valid as of 4/25/18 and were conducted under laboratory conditions with speculative-execution controls to mitigate user-to-kernel and user-to-user side-channel attacks on both systems; individual results can vary based on workload size, use of storage subsystems, and other conditions.
2. Hardware: 4 nodes of IBM Power LC922 (2x 20-core/2.7 GHz/512 GB memory) using 12x 8 TB HDDs, 10 GbE two-port, RHEL 7.5 LE for POWER9; and 4 nodes of Intel Xeon Gold 6140 (36 cores, 2x 18-core chips, at 2.3 GHz), 512 GB memory, 12x 8 TB HDDs, 10 Gbps NIC, Red Hat Enterprise Linux 7.5.
3. Software: Apache Spark 2.3.0, located at http://spark.apache.org/downloads.html , and open-source Hadoop HDP 2.7.5.
4. Pricing is based on Power LC922 (http://www-03.ibm.com/systems/power/hardware/linux-lc.html) and publicly available x86 pricing.
5. Apache®, Apache Spark®, and associated logos are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.
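As a quick sanity check, the headline 1.6x is roughly the product of the two measured ratios (figures taken from this slide):

```python
# Back-of-envelope check of how the 1.6x price-performance claim composes.
perf_ratio  = 29.5 / 22.6          # ≈ 1.31 → ~30% more queries per hour
price_ratio = 53548 / 43948        # ≈ 1.22 → x86 system costs ~22% more
                                   #   (equivalently, LC922 is ~18% cheaper)
print(perf_ratio * price_ratio)    # ≈ 1.59 → the slide's "1.6x price-performance"
```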
24. Leading the Pack in AI Infrastructure
• IBM Systems Reference Architecture for AI: IBM PowerAI, IBM Spectrum Computing, IBM Storage
• IBM Accelerated Compute Platform: IBM Power Servers, IBM Spectrum Computing, IBM Spectrum Scale & ESS
• IBM Storage Solutions for AI/ML/DL: IBM Spectrum Scale, IBM Cloud Object Storage, IBM Spectrum Discover
All built upon a common data infrastructure.
Wells Fargo: Financial Risk Modeling
"Wells Fargo data scientists build, enhance, and validate hundreds of models each day. Speed is critical, along with scalability, as they deal with greater amounts of data and more complicated models. Academically, people talk about fancy algorithms. But in real life, how efficiently the models run in distributed environments is critical. IBM is a very good partner and we are very pleased with their solution."
– Richard Liu, Quantitative Analytics Manager, Wells Fargo, IBM Think 2018
25. How Do You Get AI Scalability for a Decade?
Plan for the Future
• Where is the biggest pain today?
• What does it look like in 3, 5, and 10 years?
• Think workflow
Software Is Your Friend
• Capabilities, efficiency, balance, cost, scalability
• Usability
Hybrid Multicloud World
• Cloud or on premises? It's both!
• Drive the linkage between Data Science and IT
There is no AI without an IA (information architecture).
27. AI takes flight
"In any business, differentiation is everything. Data is the source of differentiation. How we started on the path to today's Delta: we had to get the basics first. The foundation. We now have the foundation, the data infrastructure, in order to improve the processes we have. In the past, we had the data, but we didn't have the sourcing, or the data infrastructure, to get at the data. So we built this data ocean with billions of data points and turned it into action for Delta to better serve customers."
– Ed Bastian, CEO, Delta Air Lines
"That's a great point. You can't do enterprise-wide analytics and AI (Artificial Intelligence) until you have the right 'data basics' first, i.e. you have to have the foundation information architecture and data infrastructure in place. Most companies have random acts of digital and AI all over. But until you can pull them together, and re-imagine how the work is to be done, you can't scale any of it."
– Ginni Rometty, CEO, IBM
"In 2010, we had 5,600 maintenance cancellations, at least one every day that year. In 2018, we had just 55 maintenance cancellations. That's a 99% improvement."
– Ed Bastian, CEO, Delta Air Lines, IBM Keynote, Jan. 9, 2019