© 2020 IBM Corporation
IBM Cognitive Infrastructure
Relevant Contribution to the COVID-19 Fight
Improve time to results. Impact more lives.
April 2020
IBM Power Systems | IBM Storage
Collaborators
Frank Lee
Terry Leatherland
Linton Ward
Clarisse Taaffe-Hedglin
Ruzhu Chen
Mohamed El-Shanawany
Bill Josko
Steven Meserve
Marisa de Peralta
Jeff Hong
Mike Kowolenko, Novi Systems
+ Many partners, clients, researchers, and IT builders
Infrastructure delivers faster insights with greater efficiency to change the flow of work.
High-performance Data & AI deployed against this problem at massive scale reduces the time spent delivering insights through unique load-balancing and model optimization technologies.
IBM Accelerates Medical Research Tasks
• GENOMICS: Biomarker detection, biodata modeling, and statistical data visualization
• DIAGNOSTICS: Image classification using AI with flexible, targeted models in open frameworks
• MOLECULAR SIMULATION: Drug discovery via modeling of macromolecule receptors and small-molecule ligands
• DATA FUSION: Synthesize and model diverse data using data fusion, natural language processing, and machine learning
• BIOMOLECULAR STRUCTURE: Cryo-EM image restoration and refinement analysis for drug design and discovery
• QUALITY INSPECTION: Quality control and regulatory compliance in medical device or pharmaceutical manufacturing
R&D pipeline
(Figure: genomics, molecular simulation, biomolecular structure, diagnostics, data fusion, and quality inspection mapped along the R&D pipeline through to manufacturing.)
Adapted from https://www.nimh.nih.gov/about/directors/thomas-insel/blog/2012/experimental-medicine.shtml
IBM Systems at Supercomputing 2019 / © 2019 IBM Corporation
Five key challenges to progress remain despite advances
• Data overload: Oceans of data arise from the rapid digitization and instrumentation of healthcare.
• App chaos: Thousands of applications, workflows, and models do not all follow the same rules.
• Adoption: Vertically integrated toolsets with heavy customization and vendor lock-in create work silos.
• Performance: When scaling up or out, most institutions cannot diagnose or analyze the performance problems they face.
• Cost: Demanding workloads require well-orchestrated infrastructure to manage, monitor, and control costs.
From Data to Insight with IBM HPDA Reference Architecture
A framework for designing, deploying, growing, and optimizing infrastructure for HPC, AI, and cloud, created in collaboration with the world's leading healthcare and life sciences institutions, and using Red Hat OpenShift, IBM Power Systems, IBM Storage, and open API endpoints.
• DATAHUB: High-performance data fabric and catalog capable of handling exabytes of data and trillions of objects
• ORCHESTRATION: High-performance computing and AI platform capable of orchestrating thousands of servers and GPUs
• APPS & MODELS: Large-scale and high-throughput workloads such as HPC, AI, and cloud computing
• MEDICAL TASKS: Genomics, molecular simulation, structural analysis, diagnostics, data fusion, and manufacturing quality inspection
High Performance Data & AI (HPDA) Architecture
(Figure: layered stack)
• Orchestrator: carrier and engine for a jungle of apps
• Datahub: dams and maps for an ocean of data
• Applications & tools; frameworks & libraries
• Software-defined infrastructure
• Compute & storage: CPU, GPU, flash, disk, tape
• Data domains: clinical, real-world evidence (RWE), imaging, genomics
Accelerated Computing Platform (ACP)
Hardware building blocks (OpenPOWER), per rack:
• Network: InfiniBand and Ethernet switches and UFM (Mellanox); IBM TOR switch, Ethernet TOR switch, network manager
• Compute nodes (Power9 AC922, Power9 IC922) for deep learning, inference, and computation: 1-18 deep learning nodes for model training; 2+ compute nodes for inferencing and computation
• Management nodes: 1-5 management nodes, plus an ESS management node
• Storage: shared Elastic Storage Server (ESS) per system, 100 TB – xx PB
Software building blocks:
• High performance and scalability: Linux RHEL 7.6, CUDA, ESSL, XL compilers, SMT, GPFS, NVMe …
• Optimized ML/DL frameworks and tools, e.g., Watson Machine Learning CE (PyTorch, TensorFlow, Caffe)
• Life science apps: GATK, RELION, NAMD, Amber, GROMACS, etc.
• Value-add tools: IBM Visual Insights, IBM Visual Inspector, 3D data labeling prep
• Application interoperability and manageability: OpenShift, Docker, Kubernetes, LSF …; Anaconda, Python, Jupyter …; MPI, XL C/C++, GCC, AT …
• Data management and governance: Elastic Storage / Spectrum Discover
To provide enhanced user experiences at enterprise scale
Spark AI Grid: Broader use, better business value
(Figure: an administrator manages a shared grid of Linux compute nodes that hosts a separate instance for each user group, fed by existing Hadoop data lakes on x86 and other sources.)
• Instance #1: Data scientist (customer behavior, trend analysis)
• Instance #2: Researcher (AutoML with Driverless AI)
• Instance #3: Line of business (risk analytics)
• Instance #4: Data engineer/analyst (ETL/reporting)
✓ Logical view: each group has its own environment
✓ The right resources are allocated to each user based on workload characteristics and SLA priority
✓ Multitenancy ensures data and application isolation
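The slide says resources go to each group according to workload characteristics and SLA priority. As a rough illustration only, an approximate weighted max-min fair share of a GPU pool across tenant instances could look like this; the `Tenant` fields, weights, demands, and rounding scheme are hypothetical and do not describe how Spectrum Conductor actually schedules.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Tenant:
    name: str
    weight: int  # hypothetical SLA priority weight (positive; higher = more important)
    demand: int  # GPUs the tenant's workload currently asks for


def fair_share(tenants: list[Tenant], capacity: int) -> dict[str, int]:
    """Approximate weighted max-min fair share of `capacity` whole GPUs.

    Each round splits what is left in proportion to weight, caps every tenant
    at its demand, and hands any surplus to the tenants still waiting.
    """
    alloc = {t.name: 0 for t in tenants}
    active = [t for t in tenants if t.demand > 0]
    remaining = capacity
    while remaining > 0 and active:
        total_weight = sum(t.weight for t in active)
        granted = 0
        # Serve higher-priority tenants first so integer rounding favors them.
        for t in sorted(active, key=lambda t: t.weight, reverse=True):
            share = max(1, remaining * t.weight // total_weight)
            give = min(share, t.demand - alloc[t.name], remaining - granted)
            alloc[t.name] += give
            granted += give
        remaining -= granted
        active = [t for t in active if alloc[t.name] < t.demand]
    return alloc


tenants = [
    Tenant("data_scientist", weight=3, demand=10),
    Tenant("researcher", weight=2, demand=4),
    Tenant("lob", weight=1, demand=8),
    Tenant("etl", weight=2, demand=2),
]
print(fair_share(tenants, capacity=16))
# → {'data_scientist': 7, 'researcher': 4, 'lob': 3, 'etl': 2}
```

GPUs a tenant cannot use (here the ETL instance needs only 2) are redistributed in a later round to the tenants still waiting.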
Journey to Power Analytics Grid
(Figure: three phases of moving Cloudera Hadoop workloads (Hive, HBase, Spark, …) from co-located compute, YARN, and HDFS on x86 Linux to Linux on Power with a POSIX/HDFS shared storage tier and GPU-accelerated ML/DL workloads under Spectrum Conductor and IBM Power WMLA. The existing Cloudera data lake server nodes become read-only.)
Challenges: complexity of managing the Hadoop stack, analytics performance, resource utilization
Phase I: De-couple compute (Hadoop workloads with WMLA at the edge)
• De-couples compute from storage by migrating workloads from Cloudera; reduces dependence on static/batch ETL
Phase II: Shared storage tier (data ingest to shared POSIX/HDFS storage with a secondary storage tier)
• Faster ingest with POSIX; smaller storage footprint and fewer copies of data; storage grows independently of compute, leveraging in-memory Spark instead of ETL
Phase III: AI hub (ML/DL AI workloads such as Spark, TensorFlow, Caffe, … alongside Hadoop workloads with WMLA)
• Improved resource utilization, multitenancy, and analytics performance; Spark GPU acceleration and AI ML/DL frameworks on the same cluster
• Hybrid-cloud ready, evolving to AI ML/DL analytics
• Replace Cloudera with x86/Power and migrate off Hadoop MapReduce to a Spark-centric platform
The Solutions: reduce “Time To Answer”
Molecular Simulation: Accelerate drug discovery through accelerated modeling of macromolecule receptors and their small-molecule ligands. Applications are optimized and accelerated (>2x) on IBM Power9 with advanced CPU and GPU technology, IBM ESS storage, and hybrid cloud solutions.
Applications: NAMD, GROMACS, AMBER
Biomolecular Structure: Cryo-EM image reconstruction and refinement analysis is accelerated by more than 2x compared to its peers with IBM Power9 and ESS storage solutions. High-resolution protein receptors and ligands can be applied in drug design and discovery studies.
Applications: RELION, crYOLO
Medical Diagnostics AI: Accelerate diagnostics with accelerated AI for images (classification, …) using flexible, targeted models in open frameworks. An accelerated computing platform shortens time to results, and systems management and storage solutions provide efficient, trusted curation of models, images, metadata, and inferences.
Applications: Visual Insights, Watson Machine Learning + Accelerator (TensorFlow, PyTorch)
Data Fusion and AI: Rapidly synthesize and model diverse data using data fusion, natural language processing, and machine learning, including fact extraction from papers and EMRs (MedDRA-based dictionaries). Text, structured lab values, and images can be included in AI models and statistical methods in real time.
Partners: Novi Systems, H2O, Spark, Hadoop
Genomics: High performance and efficiency (10x) of software tools and applications for genomic variant and biomarker detection, biodata modeling, and statistical data visualization. High-throughput, optimized workload pipelines accelerate biodata analysis with highly optimized, parallel I/O, memory, CPU, and GPU computation.
Applications: GATK 4, BWA, Parabricks
Medical Manufacturing Quality: Rapidly leverage data fusion and AI for quality control and regulatory compliance in medical device or pharmaceutical manufacturing. Make product accept/reject decisions and maintain compliance records.
Application: IBM Visual Inspector
Partner: Novi Systems
Optimizing Precision Medicine
Reduced time-to-completion for long-running jobs while increasing resource utilization
100,000+: Using IBM, Sidra has completed hundreds of thousands of computing tasks comprising millions of files and directories, without experiencing system downtime.
Biomolecular Structure: Cryogenic Electron Microscope
• Image bucket: 5 TB per day
• “Time to Answer” is now paramount: IBM IC922 = 46 min vs. 77 min on x86
• Storage growth is escalating each day: 1 PB per year
• Image rendering is compute-intensive
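The IC922 timing can be restated as a speedup factor and a time saving. A minimal arithmetic sketch that uses only the two figures on the slide; the helper name is illustrative.

```python
from __future__ import annotations


def speedup(baseline_min: float, new_min: float) -> tuple[float, float]:
    """Return (speedup factor, fraction of wall-clock time saved)."""
    factor = baseline_min / new_min
    saved = (baseline_min - new_min) / baseline_min
    return factor, saved


# Timings from the slide: 77 min on x86 vs. 46 min on IBM IC922.
factor, saved = speedup(baseline_min=77, new_min=46)
print(f"{factor:.2f}x faster, {saved:.0%} less time per run")
# → 1.67x faster, 40% less time per run
```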
AI Image Processing
• IBM is deploying AI techniques to automate the pipeline for high-resolution screening and re-imaging using IBM Visual Insights.
• Molecule samples are frozen in ice. Question: is it feasible to magnify to 100x resolution?
• Each grid image is a data file. Manually selecting high-resolution candidates is labor-intensive.
• AI can automate the high-resolution re-imaging pipeline.
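The slide describes replacing manual selection of high-resolution candidates with an AI model. A minimal sketch of the selection step, assuming a trained classifier (for example, one built with IBM Visual Insights) has already given each grid image a hypothetical `high_res_score` between 0 and 1; the threshold, cap, and file names are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class GridImage:
    path: str              # one data file per grid image
    high_res_score: float  # hypothetical classifier confidence, 0.0-1.0


def select_for_reimaging(images: list[GridImage], threshold: float = 0.8,
                         max_candidates: int = 50) -> list[str]:
    """Keep images the model scores at or above `threshold`, best first,
    capped at `max_candidates` so microscope time goes to the top picks."""
    passing = [img for img in images if img.high_res_score >= threshold]
    passing.sort(key=lambda img: img.high_res_score, reverse=True)
    return [img.path for img in passing[:max_candidates]]


grid = [
    GridImage("grid_001.mrc", 0.95),
    GridImage("grid_002.mrc", 0.42),
    GridImage("grid_003.mrc", 0.88),
]
print(select_for_reimaging(grid, threshold=0.8, max_candidates=2))
# → ['grid_001.mrc', 'grid_003.mrc']
```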
Molecular Simulations: Standard Codes
NAMD, Amber, GROMACS
IBM performance is xx faster than Intel.
NAMD, VMD: 25x performance boost with IBM
Medical Imaging
IBM Visual Insights Radiology Use Case
Using an automated deep learning model built with IBM Visual Insights, radiologists can process lung X-rays to detect COVID-19-induced pneumonia faster and more accurately.
Optimizing Medical Imaging
Enhance image identification with deep learning to assist physicians and benefit patients
20x faster: a model was trained on 1,300 MRI images with IBM Power Systems and IBM Storage in just two hours, compared to forty hours on traditional architectures.
Data Fusion and AI: NoviLens© Data AI Appliance
Easy-to-use Data AI Appliance for Decision Makers.
• A solution, not a bag of tools, targeting the domain expert
• We provide solutions that use AI, not just AI tools
• Targets subject matter experts
• Intuitive interfaces; Excel-level skills
• Easy to understand: data integration, filtering, and analysis on steroids
• Allow users to focus on the problem, not the technology
• Overcome internal and external issues with data fusion
• Provide the ability to explore data to identify issues and opportunities
• Simple to use
• Abstract ML and NLP to identify important features or isolate facts from structured and unstructured data
• Delivered as a subscription (including h/w, support, consulting, training)
• Provide ROI in minutes, not months
• Easy data import and automation
• Data fusion
• Enhance problem solving while de-risking decisions
• Internal data is your unique position; supplement it with external data
• Visual analytics and filtering
• Natural language processing
• Machine learning
• Deployed in the IBM Cloud or on-premises
*Patents pending on the methods employed by NoviLens
** TechData IBM Reseller in Research Triangle Park
© 2020 Novi Systems
Uses of Data Fusion AI and Advanced Analytics: Late-Stage Drug Development and Product Release
Clinical trial:
– Patient selection: read through EMRs to find patients who meet the acceptance criteria (shortens the enrollment period)
– Trial monitoring: pick up adverse events by reading through EMRs, grouping symptoms together with lab data, and then doing regression analysis (the sample size is probably too small for ML)
Production:
– Capacity: monitor the process stream for process control. The biggest issue will be supply. Know when to "kill" a batch to free up production, based on monitoring the process (quantitative data as well as batch records).
– Product release: can integrate with LIMS and, where electronic lab notebooks exist, treat the notebooks like EMRs. This accelerates failure investigations and provides insight into the design space, shortening release times (done with ThermoFisher).
– Regulatory: assist in post-market surveillance by reviewing Phase IV or Medline records automatically
Distribution:
– Depending on the sophistication of the ERP systems, can coordinate specifications with regulatory requirements (specs may differ by country).
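The capacity use case depends on monitoring the process stream, and the product-disposition example lists SPC and control charting. As a generic illustration (not NoviLens's method), an individuals (Shewhart I) chart sets limits at the mean ± 3 sigma of an in-control baseline, with sigma estimated from the average moving range, and flags later readings outside them; the readings below are made up.

```python
from __future__ import annotations

from statistics import fmean

D2 = 1.128  # bias-correction constant d2 for moving ranges of two points


def individuals_limits(baseline: list[float]) -> tuple[float, float, float]:
    """Return (LCL, center line, UCL) for an individuals control chart.

    Sigma is estimated from the average moving range of an in-control baseline.
    """
    center = fmean(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    sigma = fmean(moving_ranges) / D2
    return center - 3 * sigma, center, center + 3 * sigma


def out_of_control(values: list[float], lcl: float, ucl: float) -> list[int]:
    """Indices of readings that fall outside the control limits."""
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]


# Made-up in-process readings (e.g. one batch parameter) for illustration.
baseline = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.1, 10.0]
lcl, center, ucl = individuals_limits(baseline)
new_readings = [10.1, 10.4, 10.9, 9.3]
print(out_of_control(new_readings, lcl, ucl))  # → [2, 3]
```

A flagged reading is a prompt to investigate the batch, not an automatic "kill" decision; in practice that call would also weigh the batch records the slide mentions.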
Data Fusion Example: Manufactured Product Disposition
(Figure: data sources feeding the product disposition decision)
• Manufacturing: supply chain (production planning, BOM, vendor qualification/certification, material specifications); processing; cleaning (air, water, surfaces); facilities (air, water, surfaces); in-process monitoring (process trending, SPC, process documentation)
• Engineering: maintenance (PM, event, shutdown); facilities (surfaces, mechanical, air, water); change control; training
• Quality: lab testing (OOS, investigation); control charting (product, process, materials); doc review; facility testing (water, air, surfaces, equipment); training; change control; audits; deviation management; investigation; product review; lab review; facilities review; training
• Development data: process development, process capability, product attributes; R&D, preclinical, clinical
• Central nodes: Manufacturing, Quality, Investigation, Product Disposition, Deviation Management
Decision
https://novi.systems/covid-19/
NoviLens AI Appliance: End-to-End Workflow
Sources (image, sensor, database, text) → Data fusion → Fact tables → Questions → AI & analytics (language analytics, classification, detection, time series, descriptive statistics) → Actionable decisions
H2O Driverless AI Overview: Simple, Fast, Accurate, Interpretable
Easy Deployment for Low-Latency Models
• Production-ready, stand-alone scoring pipelines that are easy for IT to deploy and manage
• Python and Java
• Streamlined scoring code to deploy on any device: on the edge, mobile, …
• Very fast (milliseconds) to satisfy today’s real-time apps
Fast and Accurate Results
• “Data Scientist in a Box”
• Simple interface
• Automatic feature engineering to increase accuracy
• Automatic recipes for solving a wide variety of use cases
• Automatic tuning to find and tune the right ensemble of models
Industry-Leading Interpretability
• Trusted results with explainability and transparency
• Interpretability for debugging, not just for regulators
• Reason codes and model interpretability in plain English
• K-LIME, LOCO, partial dependence, and more
Automatic Data Visualization
• Automatic generation of visualizations and graphs to explore your data before the model-building process
• Most relevant graphs shown for the given data set
• Identify outliers and missing values
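Among the interpretability methods listed is LOCO (leave one covariate out). The following is a simplified, model-agnostic sketch of a LOCO-style local explanation: re-score one row with a single input replaced by a baseline value (here, its training-set mean) and report how far the prediction moves. It is an illustrative variant, not Driverless AI's implementation; the toy model and data are hypothetical.

```python
from __future__ import annotations

from statistics import fmean
from typing import Callable, Dict, Sequence

Row = Dict[str, float]


def loco_local(model: Callable[[Row], float], row: Row,
               training: Sequence[Row]) -> dict[str, float]:
    """LOCO-style local importance for one row.

    For each feature, replace it with its training-set mean (a stand-in for
    leaving the covariate out), re-score, and record how far the prediction
    moves. Results are ordered by absolute effect, like a list of reason codes.
    """
    full = model(row)
    deltas = {}
    for feature in row:
        baseline = fmean(r[feature] for r in training)
        perturbed = {**row, feature: baseline}
        deltas[feature] = full - model(perturbed)
    return dict(sorted(deltas.items(), key=lambda kv: abs(kv[1]), reverse=True))


def toy_model(r: Row) -> float:
    # Hypothetical linear scorer; feature z has no effect on the output.
    return 2 * r["x"] + 3 * r["y"] + 0 * r["z"]


training = [{"x": 0, "y": 0, "z": 0}, {"x": 2, "y": 2, "z": 2}]
print(loco_local(toy_model, {"x": 3, "y": 4, "z": 5}, training))
# → {'y': 9.0, 'x': 4.0, 'z': 0.0}
```

Replacing a value with a mean is cheaper than refitting the model without the feature, which is what the original LOCO formulation does; the trade-off is that the result depends on the chosen baseline.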
Extending Support for Edge Devices
IBM Visual Inspector for healthcare manufacturing and pharmaceuticals
(Figure: models from AC922/IC922/x86 servers deployed to NVIDIA Jetson TX2/Nano via TensorRT and to iOS devices via Core ML; Power9 IC922/AC922 & x86 targets coming soon.)
IBM Visual Inspector: Executive Summary
• IBM Visual Inspector is a standalone iOS application that can execute AI models trained on PowerAI Vision
• Available on the Apple App Store
• Devices can be handheld or fixed (mounted)
• Works in network-connected or disconnected mode
• Provides model and device management capability to support an end-to-end AI computer vision environment
• Built-in demonstration models
• Custom model development requires Visual Insights
Visual Inspector App Functions: Multiple Ways to Utilize One Application
• Capture data to build a model
• Inference (disconnected or connected)
• Remote management
Where does this fit? Healthcare “things” manufacturing, pharma drug manufacturing
OpenPOWER Servers
IBM Power IC922: Storage Dense Server (DATA)
• NVMe-dense server with an I/O-rich architecture for superior throughput¹
• Enterprise-ready cloud deployment with Red Hat OpenShift and Power Systems reliability
• 2.35x superior price/performance for containerized cloud deployments
IBM Power AC922: Powering the Fastest Supercomputer (TRAIN)
• Best training platform with 4x faster model iteration
• ~6x data throughput with NVLink to GPUs
• Synergistic HW/SW offerings for ease of use and leadership performance
IBM Power IC922: Deploy AI into Production (INFERENCE)
• Superior density (33%) and throughput to inference accelerators
• Open design for accelerator diversity
• Deploy inference at scale with HW and SW solution offerings
IBM Spectrum Storage for the AI Data Pipeline
The fastest path from ingest to insights: putting it all together
(Figure: DATA → Ingest → Organize → Analyze (ETL, ML/DL: Prepare | Train | Inference) → INSIGHTS, with an Archive tier)
• Transient storage: throughput-oriented, software-defined temporary landing zone
• Fast ingest / real-time analytics: high-throughput performance tier
• Classification & metadata tagging: high-volume index and auto-tagging zone
• Archive: high-scalability, large/sequential I/O capacity tier
• Storage capabilities: single namespace; global collaboration / hybrid cloud; software RAID / erasure coding; multi-protocol support
• Large-scale runtime environment: Watson Machine Learning Accelerator, WML CE, SnapML, Spectrum Computing
IBM Elastic Storage Server (ESS) with Spectrum Scale
(Diagram: ESS building blocks of S822L servers and FC5887 enclosures.)
• GLxS (Disk; high performance, capacity): models GL1S, GL2S, GL4S, GL5S, GL6S; 1–6 84-drive disk enclosures; 0.25–6.8 PB usable (0.33–8.9 PB raw); 7–32 GB/s
• GSxS (Flash; high performance, IOPS, random I/O): models GS1S, GS2S, GS4S; 1–4 SSD enclosures; 60 TB–1.1 PB usable (90 TB–1.5 PB raw); 9–37 GB/s
• GHxx (Hybrid; combined high performance, capacity, IOPS, random I/O): models GH12, GH14, GH22, GH24; 2–4 84-drive HDD enclosures and 1–2 24-drive SSD enclosures; disk 0.5–2.5 PB usable, SSD 60–530 TB usable; disk 14–29 GB/s, SSD 13–26 GB/s, up to 36 GB/s maximum*
• GLxC (Disk; high performance, capacity, density): models GL1C, GL2C, GL4C, GL5C, GL6C, GL8C; 1–8 106-drive disk enclosures; 0.78–9.1 PB usable (1–11.8 PB raw); 7–32 GB/s; 9 PB per rack
*Maximum combined disk and SSD performance per ESS unit
8 May 2020/ © 2018 IBM Corporation
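The usable and raw capacities quoted above work out to roughly 67% to 78% of raw capacity being usable. A quick check using only the slide's numbers, set beside the data fraction of two erasure-code layouts that Spectrum Scale RAID offers (8+2p and 8+3p); which layout and how much spare space the quoted figures assume is not stated on the slide.

```python
# (usable, raw) capacity endpoints quoted on the slide, in PB
ESS_RANGES = {
    "GLxS min": (0.25, 0.33),
    "GLxS max": (6.8, 8.9),
    "GSxS min": (0.06, 0.09),
    "GSxS max": (1.1, 1.5),
    "GLxC min": (0.78, 1.0),
    "GLxC max": (9.1, 11.8),
}


def usable_fraction(usable: float, raw: float) -> float:
    """Share of raw capacity left for data after protection and overheads."""
    return usable / raw


for model, (usable, raw) in ESS_RANGES.items():
    print(f"{model}: {usable_fraction(usable, raw):.0%} of raw is usable")

# Data fraction of two erasure-code layouts, for comparison only.
for data, parity in [(8, 2), (8, 3)]:
    print(f"{data}+{parity}p: {data / (data + parity):.1%} of raw holds data")
```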
Metadata: Key to Unlocking Data Value & Improving Management (Spectrum Discover)
• Who, what, when, where, and why of an account, container, object, stream, directory, or file
• Perfect for indexing and searching
• Metadata may be separate from the data, stored with the data, or derived from the data
• System metadata: POSIX inode plus extended attributes (location, size, owner, group, permissions, last-modified, …)
• Standard document headers (doc, ppt, mp3, DICOM, PDF, JPEG, GeoTIFF)
• Custom metadata tags
• AI-derived metadata, e.g. from biomedical natural language processing and image analysis: age, biomarkers, developmental stage, cell surface markers, cell type/cell line, disease state, extract molecule, genetic characteristics, immunoprecipitation antibody, organism, …
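The slide separates system metadata (POSIX inode fields plus extended attributes) from custom and AI-derived tags. A minimal sketch of assembling such a record for one file with the Python standard library; the record layout and the `custom_tags` field are illustrative and are not Spectrum Discover's schema, and extended attributes are read only where the platform provides `os.listxattr` (Linux).

```python
from __future__ import annotations

import os
import stat
import tempfile
from datetime import datetime, timezone


def file_metadata(path: str, custom_tags: dict[str, str] | None = None) -> dict:
    """System metadata from the inode, plus extended attributes and custom tags."""
    st = os.stat(path)
    record = {
        "location": os.path.abspath(path),
        "size": st.st_size,
        "owner_uid": st.st_uid,
        "group_gid": st.st_gid,
        "permissions": stat.filemode(st.st_mode),
        "last_modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat(),
        "xattrs": {},
        "custom_tags": dict(custom_tags or {}),
    }
    # Extended attributes exist only on some platforms and file systems.
    if hasattr(os, "listxattr"):
        try:
            for name in os.listxattr(path):
                record["xattrs"][name] = os.getxattr(path, name)
        except OSError:
            pass  # file system does not support (or deny access to) xattrs
    return record


# Usage: tag a scratch file as if it were an imaging study (hypothetical tag).
with tempfile.NamedTemporaryFile(suffix=".dcm", delete=False) as tmp:
    tmp.write(b"example")
rec = file_metadata(tmp.name, {"modality": "CT"})
os.remove(tmp.name)
print(rec["size"], rec["permissions"], rec["custom_tags"])
```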
IBM Spectrum Archive: Policy-based Cost Optimization
(Figure: storage pools System pool (Flash), Gold pool (SSD), Silver pool (NL-SAS), and a TS4500 tape library via Spectrum Archive, linked by example rules:)
• Small files last accessed > 30 days
• Last accessed > 60 days
• Silver pool is > 60% full: drain it to 20%
• Accessed today and file size < 1 GB
• Send it back to the Silver pool when accessed
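The figure's arrows do not survive in this text, so which rule moves files between which pools is not recoverable. The sketch below evaluates rules of this kind for one file under an assumed mapping, marked as such in the comments; the names are illustrative, and real Spectrum Scale placement and migration rules are written in its SQL-like policy language, not Python. Draining the Silver pool to tape is a pool-level capacity rule and is not modeled here.

```python
from __future__ import annotations

from dataclasses import dataclass

SMALL_FILE_BYTES = 1024 ** 3  # "file size is < 1G" read as < 1 GiB (assumption)


@dataclass
class FileInfo:
    name: str
    size: int               # bytes
    days_since_access: int  # 0 = accessed today
    pool: str               # "system" (Flash), "gold" (SSD), "silver" (NL-SAS), "tape"


def target_pool(f: FileInfo) -> str:
    """Pool a file should move to under an assumed mapping of the slide's rules."""
    small = f.size < SMALL_FILE_BYTES
    # Files on tape are recalled to the Silver pool when accessed.
    if f.pool == "tape":
        return "silver" if f.days_since_access == 0 else "tape"
    # Assumed: small files accessed today are promoted from Silver to Gold.
    if f.pool == "silver" and f.days_since_access == 0 and small:
        return "gold"
    # Assumed: files idle for more than 60 days move down to Silver.
    if f.pool in ("system", "gold") and f.days_since_access > 60:
        return "silver"
    # Assumed: small files idle for more than 30 days move from Flash to Gold.
    if f.pool == "system" and f.days_since_access > 30 and small:
        return "gold"
    return f.pool


for f in [
    FileInfo("scan_001.tif", 10 * 1024 ** 2, 45, "system"),
    FileInfo("run_42.bam", 5 * 1024 ** 3, 90, "gold"),
    FileInfo("notes.csv", 4096, 0, "silver"),
    FileInfo("old_batch.tar", 4096, 0, "tape"),
]:
    print(f.name, f.pool, "->", target_pool(f))
```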
Automation
• Powerful policy engine
- Example: file heat measures how often a file is accessed.
- As a file gets “cold”, move it automatically to a lower-cost storage pool
- Information lifecycle management
- Fast metadata scanning and data movement
- Automated data migration based on thresholds
• Users are not affected by data migration
- Single namespace
- Persistent view of the data
• Tape as an external pool of Spectrum Scale
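File heat is a decaying access count, and a threshold rule starts migration when a pool passes a high-water mark and stops at a low-water mark (60% and 20% for the Silver pool in the figure above). A simplified model of both ideas follows; the fixed percentage loss per idle period is in the spirit of Spectrum Scale's file-heat settings (`fileHeatPeriodMinutes`, `fileHeatLossPercent`) but is not its exact formula, and the file names and sizes are made up.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PoolFile:
    name: str
    size_gb: float
    heat: float  # decayed access count; higher = hotter


def decay_heat(heat: float, idle_periods: int, loss_percent: float = 10.0) -> float:
    """Heat remaining after `idle_periods` periods with no access,
    losing `loss_percent` of its value each period."""
    return heat * (1 - loss_percent / 100) ** idle_periods


def drain(files: list[PoolFile], capacity_gb: float,
          high: float = 0.60, low: float = 0.20) -> list[str]:
    """Threshold migration: once usage exceeds `high`, move the coldest files
    out until usage is at or below `low`. Returns the names to migrate."""
    used = sum(f.size_gb for f in files)
    if used / capacity_gb <= high:
        return []  # below the high-water mark: nothing to do
    to_migrate = []
    for f in sorted(files, key=lambda f: f.heat):  # coldest first
        if used / capacity_gb <= low:
            break
        to_migrate.append(f.name)
        used -= f.size_gb
    return to_migrate


silver = [
    PoolFile("a.dat", 30, heat=0.5),
    PoolFile("b.dat", 20, heat=5.0),
    PoolFile("c.dat", 15, heat=0.1),
    PoolFile("d.dat", 5, heat=2.0),
]
print(drain(silver, capacity_gb=100))  # 70% full → ['c.dat', 'a.dat', 'd.dat']
```

Because users see a single namespace, migrated files keep their paths; only the pool holding their data changes.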
Cognitive Systems Software Offerings
AI for data scientists and non-data scientists
• IBM Visual Insights: Auto-DL for images and video (label, train, deploy)
• H2O Driverless AI: Auto-ML for text and numeric data, NLP (import, experiment, deploy)
• Visual Inspector: iOS inferencing application (capture, inference)
• Distributed Deep Learning (up to 4 nodes)
Watson Machine Learning Accelerator
• Deep Learning Impact (DLI) module: data and model management, ETL, visualize, advise
• Auto hyper-parameter tuning
• IBM Spectrum Conductor with Spark: cluster virtualization, dynamic resource orchestration, multiple frameworks, distributed execution engine
Watson Machine Learning Community Edition (WML CE)
• Open source ML frameworks
• Large Model Support (LMS)
• Distributed Deep Learning (DDL, 1000s of nodes)
Accelerated infrastructure: accelerated servers, storage
Call to Action
• Research organizations are leading the way in introducing new methods and new approaches.
© 2020 IBM Corporation

Covid-19 Response Capability with Power Systems

  • 1.
    © 2020 IBMCorporation IBM Cognitive Infrastructure Relevant Contribution to the COVID-19 Fight Improve time to results. Impact more lives. April 2020 1 IBM Power Systems | IBM Storage
  • 2.
    © 2020 IBMCorporation Collaborators 2 Frank Lee Terry Leatherland Linton Ward Clarisse Taaffe-Hedglin Ruzhu Chen Mohamed El-Shanawany Bill Josko Steven Meserve Marisa de Peralta Jeff Hong Mike Kowolenko, Novisystems + Many Partners, Clients, Researchers IT Builders
  • 3.
    © 2020 IBMCorporation 3 Infrastructure delivers faster insights with greater efficiency to – change the flow of work. High-performance Data & AI deployed against this problem at massive scale reduces time spent delivering insights through unique load balancing and model optimization technologies
  • 4.
    © 2020 IBMCorporation IBM Accelerates Medical Research Tasks GENOMICS Biomarkers detection, biodata modeling and statistics data visualization DIAG N O STIC S Image classification using AI with flexible, targeted models in open frameworks MOLECULAR SIMULATION Drug discovery via modeling of macromolecule receptors and small- molecule ligands DATA FU SIO N Synthesize and model diverse data using data fusion, natural language processing, and machine learning BIOMOLECULAR STRUCTURE Cryo-EM image restoration and refinement analysis for drug design and discovery QU A LIT Y IN SP E C TIO N Quality control and regulatory compliance in medical device or pharmaceutical manufacturing 4
  • 5.
    © 2020 IBMCorporation R&D pipeline 5 Manufacturing Adapted from https://www.nimh.nih.gov/about/directors/thomas-insel/blog/2012/experimental-medicine.shtml GEN O M ICS MOLECULAR SIMULATION BIOMOLECULAR STRUCTURE DIAG N O STIC S DATA FU SIO N QUALITY INSPECTION
  • 6.
    IBM Systems atSupercomputing 2019 / © 2019 IBM Corporation Data Overload Oceans of data arise from rapid digitization and instrumentation of healthcare. App Chaos Thousands of applications, workflows and models are not all following the same rules. Adoption Vertically integrated toolsets with heavy customization and vendor lock- in create work silos. Performance When scaling up or out, most institutions cannot diagnose or analyze the performance problems they face. Cost Demanding workloads require well- orchestrated infrastructure to manage, monitor and control costs. Five key challenges to progress remain despite advances 6 © 2020 IBM Corporation
  • 7.
    © 2020 IBMCorporation A framework for designing, deploying, growing and optimizing infrastructure for HPC, AI and Cloud, created in collaboration with world’s leading healthcare and life sciences institutions, and using Red Hat OpenShift, IBM Power Systems, IBM Storage and open API endpoints. From Data to Insight with IBM HPDA Reference Architecture DATAHUB High Performance Data Fabric & Catalog Capable of Handling Exabytes of Data and Trillions of Objects OR C H E ST R ATI ON High Performance Computing & AI Platform Capable of Orchestrating Thousands of Servers and GPUs APPS & MODELS Large-scale and high-throughput workloads such as HPC, AI and Cloud computing ME D IC A L TA S K S Genomics, molecular simulation, structural analysis, diagnostics, data fusion, manufacturing quality inspection. 7
  • 8.
    8 Orchestrator Carrier & Enginefor Jungles of App Datahub Dams & maps for Ocean of Data Disk TapeFlash Compute & Storage Software- Defined Infrastructure Applications & Tools CPU GPU Framework & Libraries Clinical RWE High Performance Data & AI (HPDA) Architecture ImagingGenomics © 2020 IBM Corporation
  • 9.
    © 2020 IBMCorporation InfiniBand and Ethernet switches & UFM (Mellanox) shared Elastic Storage Server (ESS) per System 100TB – xx PB v High performance & scalability v Linux RHEL7.6, CUDA, ESSL,xl compilers, SMT, GPFS, NVMe … v Optimized ML/DL frameworks and tools, e.g., Watson Machine Learning- CE (Pytorch, TensorFlow, Caffe), v Life Science Apps: GATK, Relion, NAMD, Amber, Gromacs, etc v Value-add tools: v IBM Visual Insights, IBM Visual Inspector , 3D data labeling prep v Application interoperability & manageability: v Openshift, Docker, Kubernetes, LSF .. v Anaconda, python, Jupyter … v MPI, XLC/C++, GCC, AT … v Data management & governance v Elastic Storage /Spectrum Discover Accelerated Computing Platform (ACP) Network IBM TOR Switch Enet TOR Switch Network Manager Compute Nodes • Deep Learning • Inference • Computation Management Nodes ESS Mgmt. Node ESS* Storage 1-18 Deep Learning nodes for model training 2+ compute nodes for inferencing, computation 1-5 Mgmt. nodes Per rack Power9 AC922 Power9 IC922 Hardware Building Blocks So3ware Building Blocks OpenPOWER 9
  • 10.
    To provide enhanceduser experiences at enterprise scale Researcher Auto ML with Driverless AI Instance #2 Data Engineer/Analyst ETL / Reporting Instance #4 Risk Analytics LOB Instance #3 Administrator Compute Nodes Linux Linux Linux Linux Linux Linux Linux Linux Linux Linux Linux Linux Linux Linux Linux Linux Linux Linux Linux Linux Linux Linux Linux Linux Linux Linux Linux Linux Linux Linux Data scientist Customer behavior... Trend analysis... Instance #1 Broader use, better business value ü Logical view - each group has their own environment ü Right resources allocated to each user based on their workload characteristics and SLA priority ü Multitenancy ensures data and application isolation Existing Hadoop Data Lakes Spark AI Grid Data from lake and other sources x86 © 2020 IBM Corporation
  • 11.
    Journey to PowerAnalytics Grid Comput e YARN HDFS Comput e YARN HDFS Linux Hive, HBase, Spark,… Hadoop Workloads Cloudera Hadoop With WMLA at the Edge Compute YARN Compute YARN Linux on Power Hive, HBase, Spark,… Shared Storage POSI X HDF S HDF S Secondar y Storage Tier Data Ingest Compute Spectrum Conductor Linux on Power GPU Spark, TensorFlow, Caffe,.. ML/DL AI Workloads with IBM Power WMLA POSI X Compute Spectrum Conductor Hadoop Workloads With WMLA Phase I: De-Couple Compute Phase II: Shared Storage Tier Phase III, AI Hub Benefits De-Coupled Compute from Storage migrating workloads from Cloudera, Reduce dependence on Static/Batch ETL Faster ingest with POSIX, Reduce storage footprint and copies of data, Grow storage independently of compute leveraging in-memory Spark vs ETL Complexity of managing Hadoop stack, Analytics Performance, Resource utilization Improved resource uPlizaPon, MulP- tenancy, Improved AnalyPcs Performance, Spark GPU acceleraPon & AI ML/DL frameworks on the same cluster Cloudera Data Lake Server Nodes Read Only Hybrid-Cloud ready, evoluPon to AI ML/DL analyPcs Replace Cloudera with x86/Power and evolve to migraPon off of Hadoop MapReduce to Spark centric plaVorm 11 © 2020 IBM Corporation
  • 12.
    The Solutions: Reduce "Time to Answer"
    • Molecular Simulation: Accelerate drug discovery through faster modeling of macromolecule receptors and their small-molecule ligands. Applications are optimized and accelerated (>2x) on IBM Power9 with advanced CPU and GPU technology, as well as IBM ESS storage and hybrid cloud solutions. Applications: NAMD, Gromacs, AMBER.
    • Biomolecular Structure: Cryo-EM image reconstruction and refinement analysis is accelerated by more than 2x on IBM Power9 and ESS storage solutions compared with its peers. The resulting high-resolution protein receptors and ligands can be applied in drug design and discovery studies. Applications: RELION, crYOLO.
    • Medical Diagnostics AI: Accelerate diagnostics with accelerated AI for images (classification…), using flexible, targeted models in open frameworks. An accelerated computing platform shortens time to results, while systems management and storage solutions provide efficient, trusted curation of models, images, metadata, and inferences. Applications: Visual Insights, Watson Machine Learning + Accelerator (TensorFlow, PyTorch).
    • Data Fusion and AI: Rapidly synthesize and model diverse data using data fusion, natural language processing, and machine learning, including fact extraction from papers and EMRs with MedDRA-based dictionaries. Text, structured lab values, and images can be included in AI models and statistical methods in real time. Partners: Novi Systems, H2O, Spark, Hadoop.
    • Genomics: High performance and efficiency (10x) for software tools and applications for genomic variant and biomarker detection, biodata modeling, and statistical data visualization. High-throughput, optimized workload pipelines accelerate biodata analysis with highly parallel I/O, memory, CPU, and GPU computation. Applications: GATK 4, BWA, ParaBricks.
    • Medical Manufacturing Quality: Rapidly leverage data fusion and AI for quality control and regulatory compliance in medical device or pharmaceutical manufacturing. Make product accept/reject decisions and maintain compliance records. Application: IBM Visual Inspector. Partner: Novi Systems.
  • 13.
    Optimizing Precision Medicine: Reduced time-to-completion for long-running jobs while increasing resource utilization. Using IBM, Sidra has completed hundreds of thousands (100,000+) of computing tasks comprising millions of files and directories, without experiencing system downtime.
  • 14.
    Biomolecular Structure: Cryogenic Electron Microscope Images • Image bucket: 5 TB per day • "Time to Answer" is now paramount: 46 min on IBM IC922 vs. 77 min on x86 • Storage growth escalates each day: 1 PB per year • Image rendering is compute intensive
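    The slide's figures can be turned into two quick numbers: the runtime ratio implied by 46 vs. 77 minutes (about 1.7x) and the average ingest rate needed to absorb 5 TB per day (about 58 MB/s). A minimal Python sketch of that arithmetic, assuming decimal units (1 TB = 10^12 bytes) and data arriving evenly over 24 hours:

```python
# Figures come from the slide; the rest is plain arithmetic.
SECONDS_PER_DAY = 24 * 60 * 60
TB = 10**12  # decimal terabyte, the unit storage vendors usually quote


def speedup(baseline_minutes: float, new_minutes: float) -> float:
    """How many times faster the new run is than the baseline run."""
    return baseline_minutes / new_minutes


def sustained_rate_mb_s(bytes_per_day: float) -> float:
    """Average rate (decimal MB/s) needed to absorb a daily data volume."""
    return bytes_per_day / SECONDS_PER_DAY / 10**6


if __name__ == "__main__":
    print(f"IC922 vs x86 speedup: {speedup(77, 46):.2f}x")  # 77 / 46
    print(f"5 TB/day needs about {sustained_rate_mb_s(5 * TB):.1f} MB/s")
```

Peak rates will be higher than this average if images arrive in bursts during instrument sessions.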
  • 15.
    AI Image Processing • IBM is deploying AI techniques, using IBM Visual Insights, to automate the pipeline for high-resolution screening and re-imaging. • Molecule samples are frozen in ice for imaging. Question: is it feasible to magnify to 100x resolution? • Each grid image is a data file, and manually selecting high-resolution candidates is labor intensive. • AI can automate the high-resolution re-imaging pipeline.
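    The candidate-selection step described above can be sketched as "score every grid image, keep the best ones above a cut-off". This is a hypothetical illustration, not the IBM Visual Insights API: `GridImage`, `select_high_res_candidates`, the `ice_quality` feature, and the 0.8 threshold are all assumptions, and `score_fn` stands in for whatever trained model produces a per-image score.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class GridImage:
    path: str  # each grid image is one data file
    features: Dict[str, float] = field(default_factory=dict)  # hypothetical per-image measurements


def select_high_res_candidates(
    images: Iterable[GridImage],
    score_fn: Callable[[GridImage], float],
    threshold: float = 0.8,
    max_candidates: int = 10,
) -> List[str]:
    """Rank images by model score and keep the best ones at or above `threshold`."""
    scored = [(score_fn(img), img.path) for img in images]
    kept = [(score, path) for score, path in scored if score >= threshold]
    kept.sort(key=lambda item: item[0], reverse=True)  # highest score first
    return [path for _, path in kept[:max_candidates]]


def toy_score(img: GridImage) -> float:
    """Stand-in for a trained classifier: here the score is just one feature."""
    return img.features["ice_quality"]


if __name__ == "__main__":
    grid = [
        GridImage("grid_001.mrc", {"ice_quality": 0.91}),
        GridImage("grid_002.mrc", {"ice_quality": 0.42}),
        GridImage("grid_003.mrc", {"ice_quality": 0.97}),
    ]
    print(select_high_res_candidates(grid, toy_score))
```

Only the selected paths would then be queued for high-resolution re-imaging; the rest skip manual review.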
  • 16.
    Molecular Simulations: Standard Codes • NAMD • Amber • Gromacs • IBM performance is xx faster than Intel.
  • 17.
    NAMD, VMD: 25X performance boost with IBM
  • 18.
  • 19.
    Medical Imaging
  • 20.
    IBM Visual Insights: Radiology Use Case. Using an automated deep learning model built with IBM Visual Insights, radiologists can process lung X-rays to detect COVID-19-induced pneumonia faster and more accurately.
  • 21.
    Optimizing Medical Imaging: Enhance image identification with deep learning to assist physicians and benefit patients. A model was trained on 1,300 MRI images using IBM Power Systems and IBM Storage in just two hours, compared to forty hours on traditional architectures: 20x faster.
  • 22.
    Data Fusion and AI: NoviLens© Data AI Appliance
    Easy-to-use Data AI Appliance for Decision Makers**
    • A solution, not a bag of tools, targeting the domain expert
      – We provide solutions that use AI, not just AI tools
      – Targets subject matter experts
      – Intuitive interfaces; Excel-level skills
    • Easy to understand: data integration, filtering, and analysis on steroids
      – Allow users to focus on the problem, not the technology
      – Overcome internal and external issues with data fusion
      – Provide the ability to explore data to identify issues and opportunities
    • Simple to use
      – Abstract ML and NLP to identify important features or isolate facts from structured and unstructured data
      – Delivered as a subscription (including hardware, support, consulting, training)
      – Provide ROI in minutes, not months
    • Easy data import and automation
    • Data fusion
    • Enhance problem solving while de-risking decisions
    • Internal data is your unique position; supplement it with external data
    • Visual analytics and filtering
    • Natural language processing
    • Machine learning
    • Deployed in the IBM Cloud or on-premise
    *Patents pending on the methods employed by NoviLens
    ** TechData IBM Reseller in Research Triangle Park
    © 2020 Novi Systems
  • 23.
    Uses of Data Fusion AI and Advanced Analytics: Late-Stage Drug Development and Product Release
    Clinical trial:
    – Patient selection: read through EMRs to find patients who meet the inclusion criteria (shortens the enrollment period)
    – Trial monitoring: pick up adverse events by reading through EMRs, grouping symptoms together with lab data, then running regression analysis (the sample size is probably too small for ML)
    Production:
    – Capacity: monitor the process stream for process control. The biggest issue will be supply. Know when to "kill" a batch to free up production, based on process monitoring (quantitative data as well as batch records).
    – Product release: integrate with LIMS and, where electronic lab notebooks exist, treat the notebooks like EMRs. This accelerates failure investigations and provides insight into the design space, shortening release times (did this with ThermoFisher).
    – Regulatory: assist post-market surveillance by reviewing Phase IV or Medline records automatically.
    Distribution:
    – Depending on the sophistication of the ERP systems, coordinate specifications with regulatory requirements (specifications may differ by country).
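    The clinical-trial uses above come down to two simple operations once facts have been extracted from EMRs: filtering patients against inclusion criteria, and grouping reported symptoms into adverse-event counts. A minimal Python sketch, assuming the records are already structured; `PatientRecord`, the lab limits, and the toy `synonyms` dictionary are hypothetical (a real system would use a curated vocabulary such as the MedDRA-based dictionaries mentioned on slide 12).

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class PatientRecord:
    patient_id: str
    age: int
    diagnoses: List[str]
    labs: Dict[str, float]  # structured lab values, e.g. {"ALT": 32.0}
    symptoms: List[str] = field(default_factory=list)  # terms already extracted from notes


def is_eligible(rec: PatientRecord, diagnosis: str, min_age: int, max_age: int,
                lab_upper_limits: Dict[str, float]) -> bool:
    """Inclusion check: age window, required diagnosis, and lab values within limits."""
    if not (min_age <= rec.age <= max_age) or diagnosis not in rec.diagnoses:
        return False
    # A missing lab value makes the patient ineligible rather than passing silently.
    return all(lab in rec.labs and rec.labs[lab] <= upper
               for lab, upper in lab_upper_limits.items())


def group_adverse_events(records: Iterable[PatientRecord],
                         synonyms: Dict[str, str]) -> Counter:
    """Count patients per grouped event, mapping synonyms to one preferred term."""
    counts: Counter = Counter()
    for rec in records:
        # A set, so each patient is counted at most once per grouped event.
        grouped = {synonyms.get(s.lower(), s.lower()) for s in rec.symptoms}
        counts.update(grouped)
    return counts


if __name__ == "__main__":
    records = [
        PatientRecord("p1", 45, ["pneumonia"], {"ALT": 30.0}, ["Headache", "cephalalgia"]),
        PatientRecord("p2", 70, ["pneumonia"], {"ALT": 80.0}, ["nausea"]),
    ]
    print([r.patient_id for r in records if is_eligible(r, "pneumonia", 18, 75, {"ALT": 56.0})])
    print(group_adverse_events(records, {"cephalalgia": "headache"}))
```

Counting patients rather than mentions keeps a single verbose note from inflating an event's frequency before any regression is run.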
  • 24.
    Data Fusion Example: Manufactured Product Disposition
    [Diagram: data sources that feed the product disposition decision]
    • Manufacturing: supply chain (production planning, BOM, vendor qualification/certification, material specifications); processing (cleaning; facilities air, water, surfaces; in-process monitoring; process trending, SPC; process documentation); engineering (maintenance, PM, event shutdown; facilities surfaces, mechanical, air, water; change control; training)
    • Quality: lab testing (OOS investigation, control charting; product, process, materials); document review; facility testing (water, air, surfaces); equipment; training; change control; audits; deviation management (investigation; product, lab, facilities, and training review)
    • Development data: process development, process capability, product attributes (R&D, preclinical, clinical)
    • Stages shown: Manufacturing, Quality, Investigation, Deviation Management, Product Disposition, Decision
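    The disposition diagram can be reduced to a small data-fusion pattern: join records from separate systems on a batch ID, then apply a decision rule to the combined evidence. The sketch below is hypothetical; the source names, record fields, and the release/hold/investigate rule are illustrative assumptions, not Novi Systems' method or any regulatory requirement.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BatchEvidence:
    oos_results: int = 0            # out-of-specification lab results
    open_deviations: int = 0        # deviations not yet closed
    env_excursions: int = 0         # facility air/water/surface monitoring excursions
    records_reviewed: bool = False  # batch documentation review completed


def fuse(sources: Dict[str, List[dict]]) -> Dict[str, BatchEvidence]:
    """Merge records from separate systems into one evidence view per batch ID."""
    evidence: Dict[str, BatchEvidence] = {}

    def for_batch(batch_id: str) -> BatchEvidence:
        return evidence.setdefault(batch_id, BatchEvidence())

    for rec in sources.get("lab", []):
        for_batch(rec["batch"]).oos_results += int(rec["result"] == "OOS")
    for rec in sources.get("deviations", []):
        for_batch(rec["batch"]).open_deviations += int(rec["status"] == "open")
    for rec in sources.get("environment", []):
        for_batch(rec["batch"]).env_excursions += int(rec["excursion"])
    for rec in sources.get("documents", []):
        for_batch(rec["batch"]).records_reviewed = bool(rec["reviewed"])
    return evidence


def disposition(ev: BatchEvidence) -> str:
    """Hypothetical rule: any OOS result triggers an investigation; open issues mean hold."""
    if ev.oos_results:
        return "investigate"
    if ev.open_deviations or ev.env_excursions or not ev.records_reviewed:
        return "hold"
    return "release"


if __name__ == "__main__":
    sources = {
        "lab": [{"batch": "B1", "result": "pass"}, {"batch": "B2", "result": "OOS"}],
        "deviations": [{"batch": "B3", "status": "open"}],
        "environment": [{"batch": "B1", "excursion": False}],
        "documents": [{"batch": "B1", "reviewed": True}, {"batch": "B3", "reviewed": True}],
    }
    for batch_id, ev in sorted(fuse(sources).items()):
        print(batch_id, disposition(ev))
```

Run as-is, this prints B1 release, B2 investigate, and B3 hold. Keeping fusion and decision separate means the rule can change without touching the connectors to each source system.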
  • 25.
  • 26.
  • 27.
    H2O Driverless AI Overview
    H2O Driverless AI: Simple, Fast, Accurate, Interpretable
    Easy Deployment for Low-Latency Models
    • Production-ready, stand-alone scoring pipelines that are easy for IT to deploy and manage
    • Python and Java
    • Streamlined scoring code to deploy on any device: on the edge, mobile, …
    • Very fast (milliseconds) to satisfy today's real-time apps
    Fast and Accurate Results
    • "Data Scientist in a Box"
    • Simple interface
    • Automatic feature engineering to increase accuracy
    • Automatic recipes for solving a wide variety of use cases
    • Automatic tuning to find and tune the right ensemble of models
    Industry-Leading Interpretability
    • Trusted results with explainability and transparency
    • Interpretability for debugging, not just for regulators
    • Get reason codes and model interpretability in plain English
    • K-LIME, LOCO, partial dependence, and more
    Automatic Data Visualization
    • Automatic generation of visualizations and graphs to explore your data before the model-building process
    • Most relevant graphs shown for the given data set
    • Identify outliers and missing values
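    LOCO (Leave One Covariate Out), one of the interpretability methods listed above, scores a feature by how much the model's error grows when that feature is removed and the model is refit. A minimal, self-contained Python sketch of that refit-based, global form, assuming an ordinary least-squares model and scoring on the training data for brevity; Driverless AI's own implementation may differ (for example, by computing row-level contributions), and `fit_ols`, `mse`, and `loco` are hypothetical names.

```python
from typing import Callable, List, Sequence

Row = Sequence[float]
Model = Callable[[Row], float]


def fit_ols(X: List[Row], y: List[float]) -> Model:
    """Least squares with an intercept, solved from the normal equations (A^T A) b = A^T y."""
    A = [[1.0, *row] for row in X]  # prepend the intercept column
    k = len(A[0])
    # Augmented matrix [A^T A | A^T y]
    M = [[sum(r[i] * r[j] for r in A) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(A, y))] for i in range(k)]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]  # zero here means the features are collinear
        M[col] = [v / p for v in M[col]]
        for r in range(k):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    coef = [M[i][k] for i in range(k)]
    return lambda row: coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def mse(model: Model, X: List[Row], y: List[float]) -> float:
    return sum((model(r) - t) ** 2 for r, t in zip(X, y)) / len(y)


def loco(X: List[Row], y: List[float],
         fit: Callable[[List[Row], List[float]], Model] = fit_ols) -> List[float]:
    """Leave One Covariate Out: error increase after dropping each feature and refitting."""
    baseline = mse(fit(X, y), X, y)
    scores = []
    for j in range(len(X[0])):
        X_drop = [[v for i, v in enumerate(row) if i != j] for row in X]
        scores.append(mse(fit(X_drop, y), X_drop, y) - baseline)
    return scores


if __name__ == "__main__":
    X = [[1, 0], [2, 1], [3, 0], [4, 1], [5, 0]]
    y = [3 * x0 + 1 for x0, _ in X]  # only the first feature matters
    print([round(s, 3) for s in loco(X, y)])
```

In the toy data only the first feature drives `y`, so its LOCO score is large and the second feature's is roughly zero. Any other `fit` function with the same shape (data in, predictor out) can replace `fit_ols`.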
  • 28.
    Extending Support for Edge Devices: IBM Visual Inspector for Healthcare Manufacturing and Pharmaceuticals
    Platforms shown: AC922/IC922/x86 • NVIDIA Jetson TX2/Nano (TensorRT) • iOS device (Core ML) • Power9 IC922/AC922 & x86 (coming soon)
  • 29.
    IBM Visual Inspector: Executive Summary
    • IBM Visual Inspector is a standalone iOS application that can execute AI models trained on PowerAI Vision
    • Available on the Apple App Store
    • Devices can be handheld or fixed (mounted)
    • Works in network-connected or disconnected mode
    • Provides model and device management capability to support an end-to-end AI computer vision environment
    • Built-in demonstration models
    • Custom model development requires Visual Insights
  • 30.
    Visual Inspector App Functions: Multiple Ways to Utilize One Application
    • Capture data to build a model
    • Inference (disconnected or connected)
    • Remote management
    Where does this fit? Healthcare "things" manufacturing, pharma drug manufacturing
  • 31.
    OpenPOWER Servers
    TRAIN: IBM Power AC922, Powering the Fastest Supercomputer
    • Best training platform with 4x faster model iteration
    • ~6x data throughput with NVLink to GPUs
    • Synergistic HW/SW offerings for ease of use and leadership performance
    DATA: IBM Power IC922, Storage Dense Server
    • NVMe dense server with IO-rich architecture for superior throughput¹
    • Enterprise-ready cloud deployment with Red Hat OpenShift and Power Systems reliability
    • 2.35x superior price/performance for containerized cloud deployments
    INFERENCE: IBM Power IC922, Deploy AI into Production
    • Superior density (33%) and throughput to inference accelerators
    • Open design for accelerator diversity
    • Deploy inference at scale with HW and SW solution offerings
  • 32.
    IBM Spectrum Storage for the AI Data Pipeline: The fastest path from ingest to insights (Putting it all together)
    Flow: DATA → Ingest → Organize → Analyze (ML/DL, ETL) → INSIGHTS; Prepare | Train | Inference
    • Transient storage: throughput-oriented, software-defined temporary landing zone
    • Fast ingest / real-time analytics: high-throughput performance tier
    • Classification & metadata tagging: high-volume, index & auto-tagging zone
    • Archive: high-scalability, large/sequential I/O capacity tier
    Large-scale runtime environment: Watson Machine Learning Accelerator, WML CE, SnapML, Spectrum Computing
    Platform capabilities: single namespace • global collaboration / hybrid cloud • software RAID / erasure coding • multi-protocol support
  • 33.
    IBM Elastic Storage Server (Spectrum Scale ESS)
    • GLxS Disk (high performance, capacity): models GL1S, GL2S, GL4S, GL5S, GL6S; 1–6 84-drive disk enclosures; 7–32 GB/s; 0.25–6.8 PB usable (0.33–8.9 PB raw)
    • GSxS Flash (high performance, IOPS, random I/O): models GS1S, GS2S, GS4S; 1–4 SSD enclosures; 9–37 GB/s; 60 TB–1.1 PB usable (90 TB–1.5 PB raw)
    • GHxx Hybrid (combined high performance, capacity, IOPS, random I/O): models GH12, GH14, GH22, GH24; 2–4 84-drive HDD enclosures plus 1–2 24-drive SSD enclosures; disk 14–29 GB/s, SSD 13–26 GB/s, up to 36 GB/s combined*; disk 0.5–2.5 PB usable, SSD 60–530 TB usable
    • GLxC Disk (high performance, capacity, density): models GL1C, GL2C, GL4C, GL5C, GL6C, GL8C; 1–8 106-drive disk enclosures; 7–32 GB/s; 0.78–9.1 PB usable (1–11.8 PB raw); 9 PB per rack
    *Maximum combined disk and SSD performance per ESS unit
    [Diagram: S822L servers and FC5887 enclosures for each model]
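    The usable and raw figures above imply the share of raw capacity that ends up usable. Spectrum Scale RAID, which ESS uses, offers Reed-Solomon erasure codes such as 8+2p and 8+3p; a k+p code stores data on k of every k+p strips. A short Python sketch comparing those code rates with the ratios from the slide's largest GLxS and GLxC configurations. Which code the slide's figures assume is not stated, and the remaining gap presumably reflects spare space and other overheads (an assumption):

```python
def code_rate(data_strips: int, parity_strips: int) -> float:
    """Fraction of raw capacity that holds data under a k+p erasure code."""
    return data_strips / (data_strips + parity_strips)


def usable_fraction(usable: float, raw: float) -> float:
    """Usable capacity as a share of raw capacity."""
    return usable / raw


# (usable PB, raw PB) for the largest configurations listed on the slide
SLIDE_MAX_CAPACITY_PB = {
    "GLxS": (6.8, 8.9),
    "GLxC": (9.1, 11.8),
}

if __name__ == "__main__":
    for name, (k, p) in {"8+2p": (8, 2), "8+3p": (8, 3)}.items():
        print(f"{name}: {code_rate(k, p):.3f} of raw capacity holds data")
    for model, (usable, raw) in SLIDE_MAX_CAPACITY_PB.items():
        print(f"{model}: usable/raw = {usable_fraction(usable, raw):.3f}")
```

The printed ratios (0.764 and 0.771) fall between the 8+3p rate (0.727) and the 8+2p rate (0.800).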
  • 34.
    8 May 2020/© 2018 IBM Corporation
    Metadata: Key to Unlocking Data Value & Improving Management (Spectrum Discover)
    • Who, what, when, where, and why of account, container, object, stream, dir, file
    • Perfect for indexing and searching
    • Metadata may be separate from the data, stored with the data, or derived from the data
    • POSIX inode plus extended attributes
    • Standard document headers (doc, ppt, mp3, DICOM, PDF, JPEG, GeoTIFF)
    • Custom metadata tags
    • AI-derived metadata (image, biomedical natural language processing)
    System metadata: Location, Size, Owner, Group, Permissions, Last-Modified, ...
    Example tags shown: Age, Biomarkers, Developmental Stage, Cell Surface Markers, Cell Type/Cell Line, Disease State, Extract Molecule, Genetic Characteristics, Immunoprecipitation Antibody, Organism
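    Tag-based search over system, custom, and AI-derived metadata is essentially an inverted index from (tag, value) pairs to files. A minimal Python sketch of that idea; `FileRecord`, `MetadataIndex`, and the example tag names are hypothetical and are not the Spectrum Discover API:

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class FileRecord:
    path: str
    owner: str          # system metadata (POSIX inode fields)
    size: int
    last_modified: str  # ISO-8601 date, e.g. "2020-04-01"
    tags: Dict[str, str] = field(default_factory=dict)  # custom or AI-derived tags


class MetadataIndex:
    """Inverted index from (tag, value) pairs to file paths."""

    def __init__(self) -> None:
        self._records: Dict[str, FileRecord] = {}
        self._by_tag: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def add(self, rec: FileRecord) -> None:
        old = self._records.get(rec.path)
        if old is not None:  # re-indexing a file: drop its stale tag entries first
            for key, value in old.tags.items():
                self._by_tag[(key, value)].discard(rec.path)
        self._records[rec.path] = rec
        for key, value in rec.tags.items():
            self._by_tag[(key, value)].add(rec.path)

    def search(self, **tags: str) -> List[str]:
        """Paths whose tags match every key=value pair given (all files if none)."""
        if not tags:
            return sorted(self._records)
        matches = [self._by_tag.get((k, v), set()) for k, v in tags.items()]
        return sorted(set.intersection(*matches))


if __name__ == "__main__":
    idx = MetadataIndex()
    idx.add(FileRecord("/scans/ct_0001.dcm", "radiology", 524288, "2020-04-01",
                       {"Organism": "Homo sapiens", "DiseaseState": "COVID-19"}))
    idx.add(FileRecord("/scans/ct_0002.dcm", "radiology", 524288, "2020-04-02",
                       {"Organism": "Homo sapiens"}))
    print(idx.search(Organism="Homo sapiens", DiseaseState="COVID-19"))
```

Because lookups only touch the index, a query never has to open the files themselves, which is what makes metadata "perfect for indexing and searching".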
  • 35.
    IBM Spectrum Archive: Policy-based Cost Optimization
    [Diagram: System pool (Flash), Gold pool (SSD), Silver pool (NL-SAS), and TS4500 (Spectrum Archive), with example policies: small files; last accessed > 30 days; last accessed > 60 days; Silver pool is >60% full: drain it to 20%; accessed today and file size is <1G: send it back to Silver pool when accessed]
    Automation
    • Powerful policy engine
      - Example: File Heat measures how often the file is accessed
      - As the file gets "cold", move it automatically to a lower-cost storage pool
      - Information Lifecycle Management
      - Fast metadata "scanning" and data movement
      - Automated data migration based on thresholds
    • Users not affected by data migration
      - Single namespace
      - Persistent view of the data
    • Tape as the external pool of Spectrum Scale
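    The policies in the diagram can be simulated in a few lines. Spectrum Scale expresses such rules in its own SQL-like policy language; the Python below only models their logic under one reading of the diagram: Gold to Silver after 30 days without access, Silver to tape after 60 days, recall of small (<1 GB) files accessed today, and a threshold rule that drains Silver from above 60% full down to 20% by moving the coldest files first. `File`, `apply_age_rules`, `drain_pool`, and the use of days since last access as a stand-in for File Heat are assumptions.

```python
from dataclasses import dataclass
from typing import List

GB = 1024**3


@dataclass
class File:
    name: str
    size: int               # bytes
    days_since_access: int  # 0 means accessed today; a crude stand-in for "file heat"
    pool: str               # "system", "gold", "silver" or "tape"


def apply_age_rules(files: List[File]) -> None:
    """One pass of the age-based rules (one interpretation of the diagram)."""
    for f in files:
        if f.pool == "gold" and f.days_since_access > 30:
            f.pool = "silver"
        elif f.pool == "silver" and f.days_since_access > 60:
            f.pool = "tape"
        elif f.pool == "tape" and f.days_since_access == 0 and f.size < GB:
            f.pool = "silver"  # recall small, freshly accessed files


def drain_pool(files: List[File], pool: str, capacity: int,
               high: float = 0.60, low: float = 0.20, target: str = "tape") -> List[str]:
    """If `pool` is more than `high` full, move its coldest files to `target` until at most `low` full."""
    used = sum(f.size for f in files if f.pool == pool)
    if used <= high * capacity:
        return []
    moved = []
    coldest_first = sorted((f for f in files if f.pool == pool),
                           key=lambda f: f.days_since_access, reverse=True)
    for f in coldest_first:
        if used <= low * capacity:
            break
        f.pool = target
        used -= f.size
        moved.append(f.name)
    return moved


if __name__ == "__main__":
    silver = [File("a", 4 * GB, 50, "silver"), File("b", 1 * GB, 10, "silver"),
              File("c", 2 * GB, 45, "silver")]
    print(drain_pool(silver, "silver", capacity=10 * GB))
```

The gap between the 60% trigger and the 20% target is a hysteresis band: one drain frees enough room that the rule does not fire again on every small write.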
  • 36.
    Cognitive Systems Software Offerings: AI for Data Scientists and non-Data Scientists
    • Watson Machine Learning Accelerator
      - Deep Learning Impact (DLI) Module: data & model management, ETL, visualize, advise
      - IBM Spectrum Conductor with Spark: cluster virtualization, dynamic resource orchestration, multiple frameworks, distributed execution engine
      - Watson Machine Learning Community Edition (WML CE): open source ML frameworks, Large Model Support (LMS), Distributed Deep Learning (DDL, 1000s of nodes), auto hyper-parameter tuning
    • IBM Visual Insights: Auto-DL for images & video (label, train, deploy)
    • H2O Driverless AI: Auto-ML for text & numeric data, NLP (import, experiment, deploy)
    • Distributed Deep Learning (up to 4 nodes)
    • Visual Inspector: iOS inferencing application (capture, inference)
    • Accelerated infrastructure: accelerated servers, storage
  • 37.
    Call to Action • Research organizations are leading the way to introduce new methods and new approaches.