OpenPOWER Webinar Series
AI @ Scale in
the Enterprise
Clarisse Taaffe-Hedglin
clarisse@us.ibm.com
Executive AI Architect
IBM Systems
Please note IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without
notice and at IBM’s sole discretion.
Information regarding potential future products is intended to outline our general product direction and it
should not be relied on in making a purchasing decision.
The information mentioned regarding potential future products is not a commitment, promise, or legal
obligation to deliver any material, code or functionality. Information about potential future products may not
be incorporated into any contract.
The development, release, and timing of any future features or functionality described for our products
remains at our sole discretion.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled
environment. The actual throughput or performance that any user will experience will vary depending upon
many factors, including considerations such as the amount of multiprogramming in the user’s job stream,
the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can
be given that an individual user will achieve results similar to those stated here.
Agenda
Data Analytics Today
The AI Ladder and Lifecycle
AI in the Enterprise Themes
Infrastructure Considerations
Example
How Customers Do Data Analytics Traditionally
Spreadsheets
• No governance
• No collaboration
• Limited complexity
Business Rules
• Broad rules and categories
• Not dynamic
Homegrown Applications
• Hard to maintain
• Pre-set rules and approaches
Other Applications
• Limited use of analytics
• Hard-coded models that do not apply to unique needs
• Slow response
Enterprise Analytics Modernization: From Data to Actions
Descriptive
What Has Happened?
Predictive
What Will Happen?
Prescriptive
What should we do?
Cognitive
Learn Dynamically
DATA → ACTION
HUMAN INPUTS
Predict a
Future Event
Segment Data
/ Detect
Anomalies
Determine
optimal
quantity,
price,
resource
allocation, or
best action
Understand
Past Activity
Discover
Insights in
Content
(text, images,
video)
Interact in
Natural
Language
Forecast
and Budget
based on
past activity
Supervised / Unsupervised
Predictive: What will happen?
Prescriptive: What should we do?
Descriptive: What happened?
Planning: What is our plan?
Deep Learning / NLP / Supervised
Common Patterns of Analytics Business Problems
Solving business problems with Data and AI
will utilize a combination of these analytics patterns
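Two of the patterns above can be sketched in a few lines of Python; this is a toy illustration, assuming made-up revenue and sensor numbers:

```python
# Illustrative only: two analytics patterns from the slide.
# Supervised (predictive): fit a trend from labeled history, forecast ahead.
# Unsupervised (descriptive): flag anomalies with no labels at all.
from statistics import mean, stdev

# Supervised: least-squares line through (month, revenue) pairs.
history = [(1, 100.0), (2, 110.0), (3, 121.0), (4, 133.0)]
xs, ys = [p[0] for p in history], [p[1] for p in history]
mx, my = mean(xs), mean(ys)
slope = sum((x - mx) * (y - my) for x, y in history) / sum((x - mx) ** 2 for x in xs)
forecast_month5 = my + slope * (5 - mx)  # "what will happen?"

# Unsupervised: z-score anomaly detection on sensor readings.
readings = [20.1, 19.8, 20.3, 20.0, 35.7, 19.9]
mu, sigma = mean(readings), stdev(readings)
anomalies = [r for r in readings if abs(r - mu) / sigma > 2]  # "segment / detect"

print(round(forecast_month5, 1), anomalies)
```

Real solutions combine several such patterns in one workflow, as the slide notes.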
Three broad categories of Use Cases
“Structured” Data Use Cases
- Big Data (rows and columns)
- GPU servers
- Available AI software
Computer Vision Use Cases
- More accuracy! This is sort of “magic”: in training, a model learns to detect and classify objects
Natural Language Processing Use Cases
- A model learns to read and hear and “understand” language
Organizations are adopting
AI to solve business problems
Fraud Safety, inspection and
process improvement
Defense and security
“AI is the
fastest-growing
workload”*
*Forrester Research Inc., “AI Deep Learning Workloads Demand a New Approach to Infrastructure,” by Mike Gualtieri, Christopher Voce, Srividya Sridharan, Michele Goetz, Renee Taylor, May 4, 2018.
COLLECT - Make data simple and accessible
ORGANIZE - Create a trusted analytics foundation
ANALYZE - Scale AI everywhere with trust & transparency
Data of every type, regardless of
where it lives
MODERNIZE
your data estate for an
AI and multicloud world
INFUSE – Operationalize AI across business processes
The AI Ladder
A prescriptive approach to accelerating the journey to AI
AI
AI-optimized systems
infrastructure
Unstructured, Landing, Exploration and Archive
Operational Data
Real-time Data Processing & Analytics
Transaction and
application data
Machine,
sensor data
Enterprise
content
Image, geospatial,
video
Social data
Third-party data
Information Integration & Governance
Data is Prerequisite to AI
Risk, Fraud
Chat bots,
personal
assistants
Supply Chain
Optimization
Dynamic
Pricing,
Recommenders
Behavior
Modeling
Vision,
Autonomous
Systems
Enterprise Data Pipeline for AI
Insights Out
Trained Model
Inference
Data In
Transient Storage
SDS/Cloud
Global Ingest
Throughput-oriented,
globally accessible
Cloud
ETL
High throughput, Random
I/O,
SSD/Hybrid
Archive
High scalability, large/sequential I/O
HDD Cloud
Tape
Hadoop / Spark
Data Lakes
Throughput-oriented
Hybrid/HDD
ML / DL
Prep ⇨ Training ⇨ Inference
High throughput, low
latency,
Random I/O
SSD/NVMe
Classification &
Metadata Tagging
High volume, index &
auto-tagging zone
Fast Ingest /
Real-time Analytics
High throughput
SSD
Throughput-oriented,
software defined
temporary landing zone
capacity tier
performance tier
performance & capacity tier
performance & capacity tier
performance tier
capacity tier
Fits Traditional and New Use Cases
EDGE → INGEST → ORGANIZE → ANALYZE → INSIGHTS | ML / DL
IBM Spectrum Scale / Storage for AI / © 2020 IBM Corporation
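The stage-to-tier mapping in the pipeline above can be summarized as a small lookup table. This is a hypothetical sketch; the stage names and tier attributes paraphrase the diagram and are not any real product API:

```python
# A hypothetical sketch of the slide's pipeline-to-storage mapping; stage
# names and tier attributes paraphrase the diagram, not any real IBM API.
PIPELINE_TIERS = {
    "ingest":    {"tier": "performance", "media": "SSD",           "io": "high-throughput"},
    "etl":       {"tier": "performance", "media": "SSD/Hybrid",    "io": "random"},
    "train":     {"tier": "performance", "media": "SSD/NVMe",      "io": "low-latency random"},
    "data_lake": {"tier": "capacity",    "media": "Hybrid/HDD",    "io": "throughput-oriented"},
    "archive":   {"tier": "capacity",    "media": "HDD/Cloud/Tape", "io": "large sequential"},
}

def placement(stage: str) -> str:
    """Return a human-readable placement hint for a pipeline stage."""
    t = PIPELINE_TIERS[stage]
    return f"{stage}: {t['tier']} tier on {t['media']} ({t['io']} I/O)"

print(placement("train"))
# e.g. "train: performance tier on SSD/NVMe (low-latency random I/O)"
```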
AI Model Development Workflow
•Data preparation
•Model development environment
•Runtime environment
•Train, deploy and manage models
•Business KPI and production metrics
•Explainability and fairness
Data Science Team | IT Operations Team
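The workflow stages above can be sketched as a skeleton; every function name and the toy "model" here are illustrative, not part of any product API:

```python
# Hypothetical skeleton of the workflow stages listed above; the function
# names and the trivial "model" are illustrative, not a real product API.
def prepare_data(raw):
    """Data preparation: drop records with missing values."""
    return [r for r in raw if None not in r.values()]

def train_model(rows):
    """Stand-in for model training: 'learn' the mean label."""
    labels = [r["label"] for r in rows]
    return {"mean_label": sum(labels) / len(labels)}

def deploy(model):
    """Hand the trained model to the runtime environment."""
    return {"endpoint": "/score", "model": model}

def monitor(deployment, kpi_threshold=0.5):
    """Business-KPI check: flag the model if its score drifts below threshold."""
    return deployment["model"]["mean_label"] >= kpi_threshold

raw = [{"x": 1, "label": 1}, {"x": 2, "label": None}, {"x": 3, "label": 0}]
dep = deploy(train_model(prepare_data(raw)))
print(monitor(dep))  # mean label 0.5 meets the 0.5 KPI threshold -> True
```

The point of the slide stands in the code: preparation, training, deployment, and monitoring are distinct stages owned by different teams.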
Data Science Exploration
to Production
Use Case Exploration
Data Science Model Build
Use Case Deployment in Production
Requires solution architecture
Deploy
Source: https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
Use Case Exploration
Data Science Model Build
Security, Privacy and Governance
Traditional infrastructure isn’t
suited for AI workloads
Systems don't easily scale
to meet demand
Processor not optimized for
AI workloads
The wrong infrastructure puts AI at risk.
Data pipeline too slow, causing
bottleneck effect
Common AI Data Considerations
Data Compute
Legacy Data
Stores
IoT, Mobile
& Sensors
Collaboration
Partners
New Data
Ingest → Preparation → Training → Inference
Iterative Model training to improve accuracy
Champion / Challenger
- “Data Center”
- At Edge
Trained
Model
§ Ease to Massively Scale
§ High Performance
§ Tiered / Archive
§ Secure
§ High Performance
§ Metadata Tagging
§ Single Name Space
Low Latency
Dev & Inference Stack
- Open Source
- Stable and Supported
- Auditable
Productivity
Performance & Robustness
Considerations
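The champion/challenger iteration above can be sketched as follows; the promotion rule, margin, and model scores are assumed for illustration:

```python
# A minimal champion/challenger sketch under assumed inputs: scores are some
# held-out accuracy metric; the promotion margin is illustrative, not prescriptive.
def pick_champion(champion, challenger, min_gain=0.01):
    """Promote the challenger only if it beats the champion by a real margin."""
    name_c, score_c = champion
    name_x, score_x = challenger
    if score_x - score_c > min_gain:
        return challenger  # challenger becomes the new champion
    return champion        # otherwise keep the incumbent

current = ("resnet_v1", 0.912)
candidate = ("resnet_v2", 0.931)
print(pick_champion(current, candidate))  # ('resnet_v2', 0.931)
```

Requiring a minimum gain avoids churning the production model on noise between training runs.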
Infrastructure Demands for AI
Inference
• Equipped for volumes of data
• Flexible storage for a range of data demands
• Versatile, power-efficient data center accelerators
• Advanced I/O for minimal latency
• Scalability and distributed data center capability
Training
• Powerful data center accelerators with coherence
• Advanced I/O for high bandwidth and low latency
• Proven scalability
• Equipped for volumes of data
Inferencing Considerations
Real-Time (vs Batch): Many AI applications
have response times in milliseconds and in
many cases handle 100K+ IoT events per
second (latency, latency, latency)
Scalability: Ability to scale the inference
engine and manage infrastructure
Data Pipeline: The data fed into models
has to be cleaned and structured to
produce accurate results
Security: Applications running AI models in
the field and back-offices
Multi-Tenancy: Multiple business
applications leveraging shared
infrastructure, Multiple Models per Business
Application
Tools Proliferation: Analytics, Data/Object
Tagging, Model Training and Inferencing
Model Management: Continuous
Training/Re-Training of Models, AI-DevOps,
Ease of Deployment
Transparency: Ability to explain decisions
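A back-of-envelope check of the real-time point above: at 100K+ IoT events per second, the per-event latency budget is tiny. The event rate comes from the slide; the 5 ms model latency is an assumption:

```python
# Illustrative latency arithmetic for real-time inferencing; the 5 ms
# per-inference model latency below is an assumed example value.
import math

def per_event_budget_ms(events_per_second: int) -> float:
    """Milliseconds available per event for a single serial inference engine."""
    return 1000.0 / events_per_second

def engines_needed(events_per_second: int, model_latency_ms: float) -> int:
    """How many parallel inference engines keep up with the event rate."""
    return math.ceil(events_per_second * model_latency_ms / 1000.0)

print(per_event_budget_ms(100_000))  # 0.01 ms per event if fully serial
print(engines_needed(100_000, 5.0))  # 500 engines at 5 ms/inference
```

Hence the slide's emphasis on scaling the inference engine, not just the model.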
ACCURACY
Typical AI Inferencing Scenarios
Inference in Data Center or In-Cloud
• Transaction integration
• Huge scale
• As-a-Service offering
• Multi-tenancy
Near-Edge Inferencing (On-prem or In-Cloud)
• Low latency
• Data movement considerations
Inference at Edge (On-prem/device)
• Stand-alone device
• Low latency
• Data movement considerations
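The choice among these scenarios can be sketched as a simple routing rule; the latency thresholds here are illustrative assumptions, not product guidance:

```python
# Hypothetical routing sketch for the three scenarios above; thresholds and
# scenario names are illustrative assumptions.
def place_inference(latency_budget_ms: float, data_leaves_site_ok: bool) -> str:
    """Pick a deployment scenario from a latency budget and data-movement policy."""
    if latency_budget_ms < 10:
        return "edge device"          # stand-alone, on-prem/device
    if latency_budget_ms < 100 or not data_leaves_site_ok:
        return "near edge"            # on-prem or in-cloud, close to the data
    return "data center / cloud"      # huge scale, as-a-service, multi-tenant

print(place_inference(5, False))    # edge device
print(place_inference(50, True))    # near edge
print(place_inference(500, True))   # data center / cloud
```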
Data and AI Lifecycle in the Enterprise
Quality Inspection
- Very low latency
Equipment Sensors
- low latency
Servers
GPU (IC922)
Storage (ESS)
Optimization
- batch
Factory location 2
A Manufacturing Example
Cloud / IOT
Servers
GPU
Storage
Quality Inspection
- Very low latency
- Device Inference?
Equipment Sensors
- low latency
Servers
GPU / FPGA
Storage
(ESS)
Plant Optimization
- batch
Factory location 1
. . . .
On-Prem
AI
Model
Training
Enterprise
Systems
AI inferencing
In Transaction
Systems
Headquarters
AI Applications
and Data
Hybrid Cloud
- Containers
- Cloud Paks
Data and
meta-data
Archive
A Packaged Goods Quality Inspection Example
Manufacturer Inventory of
Manufactured Goods
Retailer
Shelf Inventory Check | Package Check | Inventory Management
Inspection Points
Quality Check
Images
Videos
Visual Inspector
Industrial Cameras
❶ Upload
images/videos
❷ Train models
❸ Inference
IBM Maximo Visual Inspection
Supply Chain
Solution
Consumer
Packaged Goods
*No Data Scientist required
QUALITY INSPECTION
Best Practice Approach:
Think Solutions!
Gaining insights with machine learning/deep learning requires a solution-first approach
Focus on the business problem and use cases
Data is a prerequisite
ML/DL is just one piece of an overall workflow
Establish a trusted partnership
Infrastructure matters
Work collaboratively toward the solution
In Summary
Thank You