MICCAI - Workshop on High Performance and Distributed Computing for Medical Imaging - Sept 22 2011

Integrative Multi-Scale Biomedical Informatics

Joel Saltz MD, PhD
Director Center for Comprehensive
Informatics

Squeezing Information from Spatial Datasets

• Leverage exascale data and
Center for Comprehensive Informatics

computer resources to
squeeze the most out of
image, sensor or simulation
data
• Run lots of different
algorithms to derive same
features
• Run lots of algorithms to
derive complementary
features
• Data models and data
management infrastructure
to manage data products,
feature sets and results from
classification and machine
learning algorithms

INTEGRATIVE BIOMEDICAL
INFORMATICS ANALYSIS

Reproducible anatomic/functional
characterization at gross level (Radiology) and
fine level (Pathology)
Integration of anatomic/functional
characterization with multiple types of “omic”
information
Create categories of jointly classified data to
describe pathophysiology, predict prognosis,
response to treatment

In Silico Center for
Brain Tumor Research
Specific Aims:

1. Influence of necrosis/
hypoxia on gene expression and
genetic classification.

2. Molecular correlates of high
resolution nuclear morphometry.

3. Gene expression profiles
that predict glioma progression.

4. Molecular correlates of MRI
enhancement patterns.

Integration of heterogeneous multiscale
information

•Coordinated initiatives
Pathology, Radiology,
“omics”
Radiology
•Exploit synergies
Imaging
between all initiatives
to improve ability to
forecast survival & Patient
“Omic” Outco
response. me
Data

Pathologic
Features

Example: Pathology and Gene Expression
Joint Predictors of Recurrence/Survival

Lee Cooper Carlos Moreno

INTEGRATIVE ANALYSIS
OF BRAIN TUMOR DATA

In Silico Center for
Brain Tumor Research
Key Data Sets

REMBRANDT: Gene expression and genomics data
set of all glioma subtypes

The Cancer Genome Atlas (TCGA): Rich “omics” set
of GBM, digitized Pathology and Radiology

Pathology and Radiology Images from Henry Ford
Hospital, Emory, Thomas Jefferson U, MD Anderson
and others

TCGA Research Network

Digital Pathology

Neuroimaging

TCGA and REMBRANDT Datasets

*Some overlap with TCGA main repository but rescanned at 40X Mikkelsen @ H
** Data obtained by generosity of Lisa Scarpace/Tom
** Data obtained by generosity of Flanders/Mark Curtis @ TJU @ HF
***Thanks to Adam Lisa Scarpace/Tom Mikkelsen
***Thanks to Adam Flanders/Mark Curtis @ TJU

Neuroimaging/MRI Data

*Pending Upload to NBIA portal

** Data obtained by efforts of Lisa Scarpace/Tom Mikkelsen
@ HF

Brain Tumor Radiology and Pathology

Anaplastic Astrocytoma
(WHO grade III)

Glioblastoma
(WHO grade IV)

Morphological Tissue Classification
Whole Slide Imaging Cellular Features

Nuclei Segmentation

Jun Kong

Requirements for High Resolution Whole Slide
Feature Characterization

• 1010 pixels for each whole slide image
• 10 whole slide images per patient
• 108 image features per whole slide
image
• 10,000 brain tumor patients
• 1015 pixels
• 1013 features
• Hundreds of algorithms
• Annotations and markups from dozens
of humans

Distinguishing Characteristic in
Gliomas
Nuclear Qualities
Round shaped with Elongated with
smooth regular texture rough, irregular texture

Oligodendroglioma Astrocytoma

 Use image analysis algorithms to segment and classify microanatomic
features (Nuclei, Astrocytoma, Necrosis ...) in whole slide images
 Represent the segmentation and classification in a well defined
structured format that can be used to correlate the pathology with
other data modalities

Astrocytoma vs Oligodendroglima
Overlap in genetics, gene expression, histology

Astrocytoma vs Oligodendroglima
• Assess nuclear size (area and
perimeter), shape (eccentricity,
circularity major axis, minor axis, Fourier
shape descriptor and extent ratio),
intensity (average, maximum, minimum,
standard error) and texture (entropy,
energy, skewness and kurtosis).

Machine-based Classification of TCGA GBMs (J Kong)
Whole slide scans from 14 TCGA GBMS (69 slides)
7 purely astrocytic in morphology; 7 with 2+ oligo component
399,233 nuclei analyzed for astro/oligo features
Cases were categorized based on ratio of oligo/astro cells

TCGA Gene
Expression Query:
c-Met overexpression

Manual: TCGA Neuropathology Attributes
120 TCGA specimens; 3 Reviewers
Presence and Degree of:
Differentiation:
Microvascular hyperplasia  Small cell component
 Complex/glomeruloid  Gemistocytes
 Endothelial hyperplasia  Oligodendroglial
 Multi-nucleated/giant cells
Necrosis  Epithelial metaplasia
 Pseudopalisading pattern  Mesenchymal metaplasia
 Zonal necrosis
Other Features
Inflammation
 Perineuronal/perivascular
 Macrophages/histiocytes satellitosis
 Lymphocytes  Entrapped gray or white matter
 Neutrophils  Micro-mineralization

Nuclear Features Used to Classify GBMs

10

20

30

40
Feature Indices

50

60

70

80

90

100

110

Clustergram of selected features used in consensus
clustering

Nuclear Features Used to Classify GBMs
2 1 3 4

1
20

40

60
2

Cluster
80

100

3
120

140

160 4

Consensus clustering of morphological
50 100 150
0 0.2 0.
signatures Silho

Study includes 200 million nuclei taken from 480
slides corresponding to 167 distinct patients.

1
Cluster 1
0.9 Cluster 2

Cluster 3
0.8
Cluster 4

0.7

0.6
Survival

0.5

0.4

0.3

0.2

0.1

0
0 500 1000 1500 2000 2500 3000
Days

Survival of morphological clusters

1
Proneural
0.9 Neural

Classical
0.8
Mesenchymal

0.7

0.6
Survival

0.5

0.4

0.3

0.2

0.1

0
0 500 1000 1500 2000 2500 3000
Days

Survival of patients by molecular tumor
subtype

Cluster 2:
What does this mean? Gemistocytic Signature

• No association with
• Transcriptional class
• Age

• Molecular correlates:
Wnt signaling, TP53 signaling,
Oxidative phosphorylation,
mRNA processing


High Resolution Methods

Multicore/GPU Implementation

• Nuclear/cell segmentation
• Feature extraction
• Generate clusters of nuclear/cell
Features
• Classify nuclei/cells by feature cluster
• Classify patients by populations of
classified nuclei/cells


Multicore/GPU Feature Extraction

Optimizing CPU/GPU Memory Management

Optimizing CPU/GPU Irregular Problem
Performance

Scheduling for CPU/GPU System

• PRIORITY: assigns tasks to CPUs or GPUs based
on estimate of relative performance gain of each
task on GPU vs on CPU and load of the CPUs and
GPUs
• Sorted queue of (task, data element) tuples based
on the relative speedup expected for each tuple
• As more (task, data element) tuples are created,
tuples are inserted into the queue such that the
queue remains sorted
• First Come First Served: simple low overhead
policy

Aggregation to reduce kernel invocation and data
movement overheads

• Objects are grouped based on which image tile they
are segmented from. Each tuple consists of (objects
in image tile, feature computation operation)
• Perform group of tasks that are applied to the same
data element (or the same chunk of data elements)
• Performance Aware Task Fusion (PATF) -- groups
tasks based on GPU speedup values. Two groups of
tasks: one group containing tasks with good GPU
speedup values and the other group containing the
remaining tasks

Multiscale Systems Biology
Employ multi-resolution methods to
characterize necrosis, angiogenesis and
correlate these with “omics”

No enhancement Rim-enhancement
Normal Vessels ? Vascular Changes
Stable lesion Rapid progression

Correlation of Necrosis, Angiogenesis and “omics”

• GBMs display variable and regionally heterogeneous degrees
of necrosis (asterisk) and angiogenesis
• These factors may impact gene expression profiles

Genes Correlated with Necrosis include Transcription Factors
Identified as Regulators of the Mesenchymal Transition

•Frozen sections from 88 GBM samples marked
to identify regions of necrosis and angiogenesis
•Extent of both necrosis and angiogenesis calculated as a
percentage of total tissue area

Gene SAM q-value
Symbol (Corrected p-value)
C/EBPB < 0.000001
C/EBPD < 0.000001
FOSL2 < 0.000001
Carro MS, et al. Nature 263: 318-25, 2010
STAT3 0.0047
RUNX1 0.0082

VASARI

Feature Set

Chad Holder
Scott Hwang
Adam Flanders

Defining Rich Set of Qualitative and
Quantitative Image Biomarkers

• Community-driven ontology development
project; collaboration with ASNR
• Imaging features (5 categories)
– Location of lesion
– Morphology of lesion margin (definition, thickness,
enhancement, diffusion)

– Morphology of lesion substance (enhancement, PS
characteristics, focality/multicentricity, necrosis, cysts, midline
invasion, cortical involvement, T1/FLAIR ratio)

– Alterations in vicinity of lesion (edema, edema
crossing midline, hemorrhage, pial invasion, ependymal invasion,
satellites, deep WM invasion, calvarial remodeling)

– Resection features (extent of nCE tissue, CE tissue, resected
components)

Imaging Predictors of survival and molecular profiles
in the TCGA Glioblastoma Data set
The TCGA glioma working group

1Emory University Hospital, Atlanta, GA 2National Cancer Institute, Bethesda, MD. 3Thomas Jefferson University Hospital, Philadelphia,
PA. 4Henry Ford University Hospital, Detroit, Michigan. 5National Institute of Health, Bethesda, MD. 6Boston University School of
Medicine, Boston, MA. 7SAIC-Frederick, Inc., Frederick, MD. 8University of Virginia, Charlottesville, VA. 9 Northwestern University
Chicago, IL

Emory TJU/CBIT/NCI UVA/Northwestern Henry Ford

David A Gutman1 Adam Flanders3 Max Wintermark8 Lisa Scarpace4
Lee Cooper1 Eric Huang2 Manal Jilwan8 Tom Mikkelsen4
Scott N Hwang1 Robert J Clifford2 Prashant Raghavan8
Chad A Holder1 Dina Hammoud3 Pat Mongkolwat9
Doris Gao1 John Freymann7
Carlos Moreno1 Justin Kirby7
Arun Krishnan1 Carl Jaffe6
Jun Kong1
Seena Dehkharghani1
Joel Saltz1
Dan Brat1

Correlative Imaging Results

• Minimal enhancing tumor (≤5%)
strongly associated with Proneural
classification (p=0.0006).

• >5% proportion of necrosis and the
presence of microvascular hyperplasia in
pathology slides (p=0.008).

• Greater maximum tumor dimension (T2
signal) associated with
present/abundant microvascular
hyperplasia (p=0.001).

< 5% Enhancement

Correlative Imaging Results

• TP53 mutant tumors had a smaller mean
tumor sizes (p=0.002) on T2-weighted or
FLAIR images.

• EGFR mutant tumors were significantly
larger than TP53 mutant tumors
(p=0.0005).

• High level EGFR amplification was
associated with >5% enhancement and
>5% proportion of necrosis (p < 0.01).

> 5% Necrosis

Automated feature extraction pipeline being
developed for MRI

Large Scale Data Management

 Implemented with IBM DB2 for large scale

pathology image metadata (~million markups per
slide)
 Represented by a complex data model capturing
multi-faceted information including markups,
annotations, algorithm provenance, specimen, etc.
 Support for complex relationships and spatial
query: multi-level granularities, relationships
between markups and annotations, spatial and
nested relationships

Data Models to Represent Feature Sets
and Experimental Metadata

PAIS |pās| : Pathology Analytical Imaging Standards
• Provide semantically enabled data model to
support pathology analytical imaging
• Data objects, comprehensive data types, and
flexible relationships
• Object-oriented design, easily extensible
• Reuse existing standards
– Reuse relevant classes already defined in AIM
– Follow DICOM WG 26 metadata specifications on WSI
reference
– Specimen information in DICOM Supplement 122 and
caTissue
– Use caDSR for CDE and NCI Thesaurus for ontology

Pathology Imaging GIS
class Domain Mo...

WholeSlideImageReference TMAImageReference
Patient

Region
MicroscopyImageReference
Subject
0..1

0..1
DICOMImageReference 1
1 Specimen
ImageReference 1 0..*
0..* 0..1

1 0..1 AnatomicEntity
User 0..* 1
1
0..1

1 1 Equipment
Group 0..1
1
0..1 1 PAIS
1 0..*
Project AnnotationReference
1
0..1

0..* 1 1 1

Collection 0..* 0..*

1 0..*
0..* Annotation 0..*

0..* Markup
0..1
0..*
GeometricShape 1

Image analysis
1 1
0..* 0..* 0..*
1

Observation Calculation Inference
Surface Field
1 1
0..1 0..1 0..1

Provenance
0..*

PAIS model PAIS data management
Modeling and management of markup and annotation for
querying and sharing through parallel RDBMS + spatial DBMS

Segmentation

HDFS data staging MapReduce based queries
On the fly data processing for algorithm validation/algorithm
Feature extraction
sensitivity studies, or discovery of preliminary results

Workflow, Adaptivity and Quality of Service

• In-transit data processing using
filter/stream systems
• Semantic Workflows
• Hierarchical pipeline design with coarse
and fine grained components
• Adaptivity and Quality of Service

Computerized Classification System
for Grading Neuroblastoma
Initialization Yes
Image Tile Background? Label
I=L
• Background Identification
No

Create Image I(L)
• Image Decomposition (Multi-
Training Tiles resolution levels)
Segmentation I = I -1 • Image Segmentation
Down-sampling
(EMLDA)
Segmentation
Feature Construction
• Feature Construction (2nd
Yes

No order statistics, Tonal
Feature Extraction I > 1?
Feature Construction
Features)
Feature Extraction
Classification
• Feature Extraction (LDA) +
Classification (Bayesian)
Classifier Training
No
• Multi-resolution Layer
Within Confidence
Region ? Controller (Confidence
Yes
TRAINING Region)
TESTING

Slides’ Preparation

• 64990 x 59412 pixels in full resolution
• Original Size: 10.8 Gb; Compressed Sized: ≈ 833Mb

8x

40x

Semantic Workflows (Wings)
Collaborative Work with Yolanda Gil, Mary Hall

• A systematic strategy for composing application
components into workflows
• Search for the most appropriate implementation of
both components and workflows
• Component optimization
– Select among implementation variants of the same
computation
– Derive integer values of optimization parameters
– Only search promising code variants and a restricted
parameter space
• Workflow optimization
– Knowledge-rich representation of workflow properties


Adaptivity

Framework

• Description Module (Wings): Describe application workflow
using semantics of workflow components
• Execution module (Pegasus, DataCutter, Condor): Maps to
resources, generates and places fine grained filter/stream
pipelines
• Tradeoff Module: Schedules execution based on application
level QOS

DataCutter Filter Stream Workflow

Conclusions

• Increasingly common task: Integrative analysis of
complementary multi-resolution and “omic” data sources
• Elucidation of mechanisms, identify medically important
disease sub-classes
• Cell level whole slide analysis of whole slide images
requires innovations in parallelization, data management
• Integration of Radiology and Pathology has promise but
pose additional challenges
• Role of quality of service/on demand computing

Thanks to:
• In silico center team: Dan Brat (Science PI), Tahsin Kurc, Ashish Sharma, Tony Pan, David
Gutman, Jun Kong, Sharath Cholleti, Carlos Moreno, Chad Holder, Erwin Van Meir, Daniel
Rubin, Tom Mikkelsen, Adam Flanders, Joel Saltz (Director)
• caGrid Knowledge Center: Joel Saltz, Mike Caliguiri, Steve Langella co-Directors; Tahsin
Kurc, Himanshu Rathod Emory leads
• caBIG In vivo imaging team: Eliot Siegel, Paul Mulhern, Adam Flanders, David Channon,
Daniel Rubin, Fred Prior, Larry Tarbox and many others
• In vivo imaging Emory team: Tony Pan, Ashish Sharma, Joel Saltz
• Emory ATC Supplement team: Tim Fox, Ashish Sharma, Tony Pan, Edi Schreibmann, Paul
Pantalone
• Digital Pathology R01: Foran and Saltz; Jun Kong, Sharath Cholleti, Fusheng Wang, Tony
Pan, Tahsin Kurc, Ashish Sharma, David Gutman (Emory), Wenjin Chen, Vicky Chu, Jun
Hu, Lin Yang, David J. Foran (Rutgers)
• NIH/in silico TCGA Imaging Group: Scott Hwang, Bob Clifford, Erich Huang, Dima
Hammoud, Manal Jilwan, Prashant Raghavan, Max Wintermark, David Gutman, Carlos
Moreno, Lee Cooper, John Freymann, Justin Kirby, Arun Krishnan, Seena Dehkharghani,
Carl Jaffe
• ACTSI Biomedical Informatics Program: Marc Overcash, Tim Morris, Tahsin Kurc,
Alexander Quarshie, Circe Tsui, Adam Davis, Sharon Mason, Andrew Post, Alfredo Tirado-
Ramos
• NSF Scientific Workflow Collaboration: Vijay Kumar, Yolanda Gil, Mary Hall, Ewa Deelman,
Tahsin Kurc, P. Sadayappan, Gaurang Mehta, Karan Vahi

MICCAI - Workshop on High Performance and Distributed Computing for Medical Imaging - Sept 22 2011

More Related Content

What's hot

Similar to MICCAI - Workshop on High Performance and Distributed Computing for Medical Imaging - Sept 22 2011

More from Joel Saltz

MICCAI - Workshop on High Performance and Distributed Computing for Medical Imaging - Sept 22 2011