The document discusses the integration of artificial intelligence (AI) in healthcare, emphasizing the rapid advancement of AI technologies and their transformative potential across various medical fields such as genomics, diagnostics, and imaging. It highlights the challenges faced by healthcare institutions, including data overload, application chaos, and cost management, while outlining a framework for AI implementation that leverages high-performance computing and cloud infrastructure. Ultimately, the document stresses the importance of modernizing data analysis processes to enhance patient care and operational efficiency in the healthcare sector.
Introduction to AI's role in healthcare, including agenda, use cases, and infrastructure.
AI as a growing workload; insights from data analytics leading to actionable healthcare outcomes.
Framework for AI infrastructure including orchestration and high-performance computing applications. Categories of AI use cases, including computer vision and natural language processing across various industries.
Key challenges in healthcare data management such as data overload, app chaos, and cost management.
Optimizing medical imaging for enhanced diagnosis accuracy and efficiency using deep learning.
High volume imaging technology needs and accelerated analysis with advanced computational resources.
Large-scale molecular dynamics simulations utilizing HPC for research and drug discovery enhancements.
Optimizing genomics analysis for improved computational efficiency and resource utilization in healthcare.
AI Ladder methodology for modernizing healthcare data systems and enhancing analytics capabilities.
Diverse data sources including public, proprietary, and transaction data, and the role of metadata.
AI software landscape, frameworks, and openness in infrastructure for enhancing AI deployment.
Closing remarks and gratitude for engagement in exploring AI applications in healthcare.
Analytics Modernization: From Data to Actions
Four stages of analytics maturity, progressing from data and human inputs to action:
- Descriptive: What has happened?
- Predictive: What will happen?
- Prescriptive: What should we do?
- Cognitive: Learn dynamically
delivering faster insights with greater efficiency to impact more lives
A framework for designing, deploying, growing and optimizing infrastructure for HPC, AI and Cloud, created in collaboration with the world's leading healthcare and life sciences institutions, using Red Hat OpenShift, IBM Power Systems, IBM Storage and open API endpoints.
From Data to Insight with an Optimal Reference Architecture
- DATAHUB: high-performance data fabric and catalog, capable of handling exabytes of data and trillions of objects.
- ORCHESTRATION: high-performance computing and AI platform, capable of orchestrating thousands of servers and GPUs.
- APPS & MODELS: large-scale, high-throughput workloads such as HPC, AI and cloud computing.
- MEDICAL TASKS: genomics, molecular simulation, structural analysis, diagnostics, data fusion, manufacturing quality inspection.
Three broad categories of AI Use Cases
- "Structured" data use cases: big data (rows and columns); available AI software delivers more accuracy.
- Computer vision use cases: a deep learning model is trained to detect and classify objects; this can seem like "magic."
- Natural language processing use cases: a model learns to read, hear and "understand" language.
Smart loves problems, and there has never been a bigger problem facing our world.
- Biomolecular Structure
- Molecular Simulation
- Genomics
- Medical Diagnostics AI
- Data Fusion and AI
- Bio-Informatics
Artificial intelligence and high-performance computing have already begun to attack the
virus, assisting in molecular drug discovery, genomics and medical image processing.
Five key challenges to progress remain despite advances:
- Data Overload: oceans of data arise from rapid digitization and instrumentation of healthcare.
- App Chaos: thousands of applications, workflows and models are not all following the same rules.
- Adoption: vertically integrated toolsets with heavy customization and vendor lock-in create work silos.
- Performance: when scaling up or out, most institutions cannot diagnose or analyze the performance problems they face.
- Cost: demanding workloads require well-orchestrated infrastructure to manage, monitor and control costs.
Optimizing Medical Imaging
Enhance image identification with deep learning to assist physicians and benefit patients. A model was trained on 1,300 MRI images in just two hours on IBM Power Systems and IBM Storage, compared with forty hours on traditional architectures.
Advances in instrument design, sample preprocessing and mathematical methods have enabled high-volume throughput imaging at atomic scale. Cryogenic electron microscopes generate an average of 5 TB of image data per day.
BIOMOLECULAR STRUCTURE
Massive Data Sets Require Massive Processing Capability
Accelerating Cryo-EM Imaging Analysis
Reduced time-to-completion for high-resolution image analysis jobs while increasing resource utilization. Using an IBM AC922 cluster, more than 100 cryo-EM high-resolution image analysis jobs run in parallel on the Satori cluster.
BIOMOLECULAR STRUCTURE
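As a rough sketch of what running many independent analysis jobs in parallel means at the software level, the toy Python below fans jobs out over a worker pool. The function body and job count are invented stand-ins; on a real cluster this role is played by a workload scheduler (e.g. IBM Spectrum LSF) distributing jobs across nodes and GPUs.

```python
from concurrent.futures import ThreadPoolExecutor

def analyze_micrograph(job_id):
    # Placeholder for one high-resolution image-analysis job; a real job
    # would run a cryo-EM reconstruction step on a GPU compute node.
    checksum = sum((job_id * i) % 97 for i in range(1000))
    return job_id, checksum

# Submit many independent jobs to a pool, as a scheduler would to a cluster.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = dict(pool.map(analyze_micrograph, range(100)))
```

The key property illustrated is that the jobs share no state, so throughput scales with the number of workers until storage or I/O becomes the bottleneck.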
Simulation of millions of atoms requiring large computational resources
Large-scale simulations include millions of atoms:
- Virus molecules
- Ribosomes
- Bioenergy systems and complexes
Solution:
- High-performance computing CPUs and GPUs accelerating performance
- Optimal memory and network bandwidths scaling performance to hundreds of nodes
- Techniques to reduce the number of simulations
Examples: virus molecule simulation, receptor-ligand fit, cryptic binding site prediction, binding energy prediction.
MOLECULAR SIMULATION
Molecular Dynamics Simulation Computational Intensity
A) Using NAMD to simulate influenza virus (left) and Covid-19 (right)
B) Drug discovery: protein receptor-ligand
C) In silico prediction of protein cryptic binding sites
D) Predicting protein receptor-ligand binding energy
Optimizing Molecular Modeling
Accelerated Force Field Tuning and Intelligent Phase Diagram Exploration:
- A Bayesian-optimization-accelerated workflow uses one third of the calculations to achieve a four-orders-of-magnitude resolution increase.
- Achieves human-level performance in days instead of months.
Bayesian Optimization Value (IBM BOA)
- Faster: BOA accelerates time to insight, time to value, and time to design by large factors. Example: IBM EDA, 100x faster than brute force.
- Better: BOA can find new and unknown optima in a design space because of its lack of bias and its exploration algorithm. Example: Infineon, 3x faster than other methods with four orders of magnitude better resolution.
- Cheaper: nothing is cheaper than a simulation that is never run; BOA prevents unnecessary work, which reduces all kinds of costs. Example: GlaxoSmithKline reduced their screening workload from 20,000 experiments to 200.
[Chart: Search Method Comparison for a single-objective drug discovery case (all data, ties removed), comparing hit counts for BOA, Greedy, Similarity and Diversity search.] Conclusion: more than 80% of the time, IBM BOA is the best method with the least regret.
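IBM BOA itself is a product, but the underlying idea of Bayesian optimization (use a cheap surrogate model of past results to decide which expensive simulation to run next) can be sketched in a few lines. The toy below uses an invented objective and a crude distance-weighted surrogate in place of a real Gaussian process; it illustrates the loop, not BOA's actual algorithm.

```python
import math
import random

def expensive_simulation(x):
    # Invented stand-in for a costly run (e.g. one molecular simulation);
    # the optimizer only sees its outputs, never this formula.
    return 5.0 - (x - 2.7) ** 2

def surrogate(x, observed):
    """Crude stand-in for a Gaussian-process surrogate: distance-weighted
    mean of past results, plus an uncertainty that grows away from data."""
    weights = [(math.exp(-(x - xi) ** 2), yi) for xi, yi in observed]
    total = sum(w for w, _ in weights)
    mean = sum(w * yi for w, yi in weights) / total
    uncertainty = min(abs(x - xi) for xi, _ in observed)
    return mean, uncertainty

def bayes_opt(n_iter=30, bounds=(0.0, 5.0), kappa=2.0, seed=1):
    rng = random.Random(seed)
    observed = [(b, expensive_simulation(b)) for b in bounds]  # initial probes
    for _ in range(n_iter):
        candidates = [rng.uniform(*bounds) for _ in range(200)]

        def ucb(x):  # acquisition: upper confidence bound
            mean, unc = surrogate(x, observed)
            return mean + kappa * unc

        x_next = max(candidates, key=ucb)  # most promising untried point
        observed.append((x_next, expensive_simulation(x_next)))
    return max(observed, key=lambda p: p[1])  # best point found

best_x, best_y = bayes_opt()
```

The acquisition function is what "prevents unnecessary work": each new simulation is chosen where the surrogate predicts either a good result or high uncertainty, rather than on a brute-force grid.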
Optimizing Precision Genomics
Reduced time-to-completion for long-running jobs while increasing resource utilization. Using IBM, Sidra has completed hundreds of thousands of computing tasks comprising millions of files and directories, without experiencing system downtime.
The Data: Biological Data Analytics
Biological data analysis spans biomarker identification, biodata modeling and statistical analysis, biodata visualization, medical image data analysis, structural bioinformatics, and genomics sequence data analysis.
- Genomic Sequence Data: an explosive growth of biodata
  - Sequence alignment
  - Variant discovery and characterization
  - Genomic profiling and pattern discovery
- Biomarker Identification: gene expression profiles, RNA-seq, ChIP-seq, microarray identification and validation, etc.
- Structural Bioinformatics: identify and predict 3D biomolecule structures, such as Cryo-EM data refinement, molecular dynamics simulation, NMR, X-ray crystallographic data, etc.
- Biodata Modeling & Statistical Analysis: biological pathway analysis, gene and clinical data cohort studies, data extraction, etc.
- Medical Image Processing: image segmentation, registration, statistical modeling.
- Biodata Visualization: 3D molecule structures, genomic sequence visualization, etc.
Ruzhu Chen @ 2019
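Sequence alignment, the first analysis step listed above, is classically solved by dynamic programming. As a toy illustration (scoring parameters are invented, and this is nothing like a production aligner), a minimal Needleman-Wunsch global alignment score in Python:

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score of sequences a and b via dynamic programming."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Base cases: aligning a prefix against an empty string costs gaps.
    for i in range(rows):
        score[i][0] = i * gap
    for j in range(cols):
        score[0][j] = j * gap
    # Fill the table: best of diagonal (match/mismatch) or a gap in either sequence.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]

# Six matches plus one gap under the toy scoring: 6*1 + 1*(-2) = 4.
score = needleman_wunsch("GATTACA", "GATACA")
```

Real pipelines use heavily optimized heuristic aligners, but they rest on this same recurrence, which is why alignment at genome scale is memory- and I/O-bound.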
The Challenges: Analyzing Explosive Biological Data
- Data Explosion: large volume and variety of data around genomic sequences, gene expression, images, structural biomolecules, clinical and healthcare information, and personalized-medicine data.
- Data Storage: a high-performance, high-throughput storage hierarchy is required for data loading, extraction and computation; tertiary storage is required for archiving, with storage tools for data indexing, discovery and governance.
- Computation: high performance and efficiency of software tools and applications for genomic variant and biomarker analysis, drug discovery, medical image processing, molecular structure modeling and data visualization.
- Solutions: high-throughput, optimized workload pipelines to accelerate biodata analysis with highly optimized, parallel I/O, memory, CPU and GPU computation.
Metadata-Fueled Data Analysis
Large-Scale Data Ingest
- Scan records at high speed
- Live event notifications
- Capture system-level tags
- Automatic indexing
Business-Oriented Data Mapping
- Custom data tagging
- Content inspection via APIs
- Policy-driven workflows
Data Activation
- Data movement via APIs
- Extensible architecture
- Solution blueprints
Data Visualization
- Query billions of records in seconds
- Multi-faceted search
- Drilldown dashboards
- Customizable reports
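The ingest, tagging and multi-faceted search steps above can be sketched as a toy inverted index over record tags. The class, record IDs and tag names below are invented for illustration and do not reflect any product API.

```python
from collections import defaultdict

class MetadataCatalog:
    """Toy metadata catalog: tag records on ingest, query by tag intersection."""

    def __init__(self):
        self.records = {}               # record id -> set of tags
        self.index = defaultdict(set)   # tag -> set of record ids

    def ingest(self, record_id, tags):
        # Capture system-level or custom tags and index them automatically.
        self.records[record_id] = set(tags)
        for tag in tags:
            self.index[tag].add(record_id)

    def query(self, *tags):
        # Multi-faceted search: records matching all of the given tags.
        sets = [self.index[t] for t in tags]
        return set.intersection(*sets) if sets else set()

catalog = MetadataCatalog()
catalog.ingest("scan-001", {"modality:MRI", "anonymized", "cohort:A"})
catalog.ingest("scan-002", {"modality:CT", "anonymized"})
hits = catalog.query("modality:MRI", "anonymized")
```

Because lookups touch only the index, not the record contents, this shape of design is what makes "query billions of records in seconds" plausible at scale.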
Common AI Data Considerations
Data sources: legacy data stores; IoT, mobile and sensors; collaboration partners; new data.
Compute pipeline: ingest, preparation, training and inference, with iterative (champion/challenger) model training to improve accuracy, and the trained model deployed for inference in the data center or at the edge.
Storage: ease of massive scaling, high performance, tiered/archive, secure, metadata tagging, single namespace, low latency.
Dev and inference stack: open source, stable and supported, auditable.
Overall considerations: productivity, performance, robustness.
Anaconda Environment for Applications
- Use Anaconda Enterprise Network (AEN) to manage the cryo-EM software repository on a server.
- Easy to use and update software.
[Diagram: Anaconda Architecture for Cryo-EM Analysis. Users authenticate via a web interface to the Anaconda server, which backs a software repository and database; compute nodes install software from the repository under central control.]
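Anaconda-managed software stacks are declared as conda environment specifications. As an illustrative sketch only (the package list is an assumption, not a verified cryo-EM manifest), such an environment might look like:

```yaml
# Illustrative conda environment; package names are examples, not a
# tested cryo-EM software manifest.
name: cryoem-analysis
channels:
  - conda-forge
dependencies:
  - python=3.10
  - numpy
  - scipy
  - mrcfile   # reader/writer for the MRC image format used in cryo-EM
```

Declaring the stack this way is what makes it "easy to use and update": the same file reproduces the environment on every compute node.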
OpenPOWER is a technical community dedicated to expanding the IBM Power architecture ecosystem.
Open-CE (https://github.com/open-ce):
- Minimize time to value for foundational ML/DL packages.
- Provide a flexible source-to-image solution for a complete and customizable AI environment.