Integrative Multi-Scale Biomedical Informatics


             Joel Saltz MD, PhD
     Director Center for Comprehensive
                 Informatics
Squeezing Information from Spatial Datasets

                                       • Leverage exascale data and
Center for Comprehensive Informatics




                                         computer resources to
                                         squeeze the most out of
                                         image, sensor or simulation
                                         data
                                       • Run lots of different
                                         algorithms to derive same
                                         features
                                       • Run lots of algorithms to
                                         derive complementary
                                         features
                                       • Data models and data
                                         management infrastructure
                                         to manage data products,
                                         feature sets and results from
                                         classification and machine
                                         learning algorithms
INTEGRATIVE BIOMEDICAL
                                       INFORMATICS ANALYSIS
Center for Comprehensive Informatics




                                       Reproducible anatomic/functional
                                       characterization at gross level (Radiology) and
                                       fine level (Pathology)
                                       Integration of anatomic/functional
                                       characterization with multiple types of “omic”
                                       information
                                       Create categories of jointly classified data to
                                       describe pathophysiology, predict prognosis,
                                       response to treatment
In Silico Center for
Brain Tumor Research
                   Specific Aims:

                   1.   Influence of necrosis/
                   hypoxia on gene expression and
                   genetic classification.

                   2. Molecular correlates of high
                   resolution nuclear morphometry.

                   3.    Gene expression profiles
                   that predict glioma progression.

                   4.  Molecular correlates of MRI
                   enhancement patterns.
Integration of heterogeneous multiscale
                                       information
Center for Comprehensive Informatics




                                       •Coordinated initiatives
                                        Pathology, Radiology,
                                        “omics”
                                                                           Radiology
                                       •Exploit synergies
                                                                            Imaging
                                        between all initiatives
                                        to improve ability to
                                        forecast survival &                             Patient
                                                                  “Omic”                Outco
                                        response.                                         me
                                                                   Data


                                                                           Pathologic
                                                                            Features
Example: Pathology and Gene Expression
Joint Predictors of Recurrence/Survival




        Lee Cooper Carlos Moreno
INTEGRATIVE ANALYSIS
                                       OF BRAIN TUMOR DATA
Center for Comprehensive Informatics
In Silico Center for
Brain Tumor Research
Key Data Sets

REMBRANDT: Gene expression and genomics data
 set of all glioma subtypes

The Cancer Genome Atlas (TCGA): Rich “omics” set
  of GBM, digitized Pathology and Radiology

Pathology and Radiology Images from Henry Ford
  Hospital, Emory, Thomas Jefferson U, MD Anderson
  and others
TCGA Research Network



                    Digital Pathology




                     Neuroimaging
TCGA and REMBRANDT Datasets
Center for Comprehensive Informatics




                                          *Some overlap with TCGA main repository but rescanned at 40X Mikkelsen @ H
                                                        ** Data obtained by generosity of Lisa Scarpace/Tom
                                          ** Data obtained by generosity of Flanders/Mark Curtis @ TJU @ HF
                                                        ***Thanks to Adam Lisa Scarpace/Tom Mikkelsen
                                          ***Thanks to Adam Flanders/Mark Curtis @ TJU
Neuroimaging/MRI Data
Center for Comprehensive Informatics




                                       *Pending Upload to NBIA portal

                                       ** Data obtained by efforts of Lisa Scarpace/Tom Mikkelsen
                                       @ HF
Brain Tumor Radiology and Pathology

                    Anaplastic Astrocytoma
                    (WHO grade III)




                    Glioblastoma
                    (WHO grade IV)
Morphological Tissue Classification
                                           Whole Slide Imaging               Cellular Features
Center for Comprehensive Informatics




                                           Nuclei Segmentation




                                                                                           Jun Kong
Requirements for High Resolution Whole Slide
                                       Feature Characterization
Center for Comprehensive Informatics




                                       • 1010 pixels for each whole slide image
                                       • 10 whole slide images per patient
                                       • 108 image features per whole slide
                                         image
                                       • 10,000 brain tumor patients
                                       • 1015 pixels
                                       • 1013 features
                                       • Hundreds of algorithms
                                       • Annotations and markups from dozens
                                         of humans
Distinguishing Characteristic in
 Gliomas
                                    Nuclear Qualities
           Round shaped with                                 Elongated with
           smooth regular texture                   rough, irregular texture




   Oligodendroglioma                                                           Astrocytoma


 Use image analysis algorithms to segment and classify microanatomic
  features (Nuclei, Astrocytoma, Necrosis ...) in whole slide images
 Represent the segmentation and classification in a well defined
  structured format that can be used to correlate the pathology with
  other data modalities
Astrocytoma vs Oligodendroglima
                                       Overlap in genetics, gene expression, histology
Center for Comprehensive Informatics




                                                                Astrocytoma vs Oligodendroglima
                                                                • Assess nuclear size (area and
                                                                   perimeter), shape (eccentricity,
                                                                   circularity major axis, minor axis, Fourier
                                                                   shape descriptor and extent ratio),
                                                                   intensity (average, maximum, minimum,
                                                                   standard error) and texture (entropy,
                                                                   energy, skewness and kurtosis).
Machine-based Classification of TCGA GBMs (J Kong)
Whole slide scans from 14 TCGA GBMS (69 slides)
7 purely astrocytic in morphology; 7 with 2+ oligo component
399,233 nuclei analyzed for astro/oligo features
Cases were categorized based on ratio of oligo/astro cells




                                                               TCGA Gene
                                                               Expression Query:
                                                               c-Met overexpression
Manual: TCGA Neuropathology Attributes
     120 TCGA specimens; 3 Reviewers
Presence and Degree of:
                             Differentiation:
Microvascular hyperplasia     Small cell component
 Complex/glomeruloid         Gemistocytes
 Endothelial hyperplasia     Oligodendroglial
                              Multi-nucleated/giant cells
Necrosis                      Epithelial metaplasia
 Pseudopalisading pattern    Mesenchymal metaplasia
 Zonal necrosis
                             Other Features
Inflammation
                              Perineuronal/perivascular
 Macrophages/histiocytes       satellitosis
 Lymphocytes                 Entrapped gray or white matter
 Neutrophils                 Micro-mineralization
Nuclear Features Used to Classify GBMs

                                                                 10
Center for Comprehensive Informatics




                                                                 20

                                                                 30

                                                                 40
                                               Feature Indices




                                                                 50

                                                                 60

                                                                 70

                                                                 80

                                                                 90

                                                                 100

                                                                 110

                                       Clustergram of selected features used in consensus
                                       clustering
Nuclear Features Used to Classify GBMs
                                                   2          1          3    4
Center for Comprehensive Informatics




                                                                                                    1
                                              20

                                              40

                                              60
                                                                                                    2




                                                                                          Cluster
                                              80

                                             100

                                                                                                    3
                                             120

                                             140

                                             160                                                    4

                                          Consensus clustering of morphological
                                                         50        100        150
                                                                                                        0   0.2      0.
                                          signatures                                                              Silho


                                           Study includes 200 million nuclei taken from 480
                                           slides corresponding to 167 distinct patients.
1
                                                                                          Cluster 1
                                                  0.9                                     Cluster 2
Center for Comprehensive Informatics



                                                                                          Cluster 3
                                                  0.8
                                                                                          Cluster 4

                                                  0.7

                                                  0.6
                                       Survival




                                                  0.5

                                                  0.4

                                                  0.3

                                                  0.2

                                                  0.1

                                                   0
                                                        0   500   1000   1500   2000   2500       3000
                                                                         Days




                                                        Survival of morphological clusters
1
                                                                                       Proneural
                                                  0.9                                  Neural
Center for Comprehensive Informatics



                                                                                       Classical
                                                  0.8
                                                                                       Mesenchymal

                                                  0.7

                                                  0.6
                                       Survival




                                                  0.5

                                                  0.4

                                                  0.3

                                                  0.2

                                                  0.1

                                                   0
                                                        0   500   1000   1500   2000   2500      3000
                                                                         Days


                                           Survival of patients by molecular tumor
                                           subtype
Cluster 2:
                                       What does this mean?         Gemistocytic Signature
Center for Comprehensive Informatics




                                       • No association with
                                          • Transcriptional class
                                          • Age

                                       • Molecular correlates:
                                         Wnt signaling, TP53 signaling,
                                         Oxidative phosphorylation,
                                         mRNA processing
Center for Comprehensive Informatics

                                       High Resolution Methods
Multicore/GPU Implementation
Center for Comprehensive Informatics




                                       • Nuclear/cell segmentation
                                       • Feature extraction
                                       • Generate clusters of nuclear/cell
                                         Features
                                       • Classify nuclei/cells by feature cluster
                                       • Classify patients by populations of
                                         classified nuclei/cells
Center for Comprehensive Informatics

                                       Multicore/GPU Feature Extraction
Center for Comprehensive Informatics

                                       Multicore/GPU Feature Extraction
Center for Comprehensive Informatics
Center for Comprehensive Informatics
Optimizing CPU/GPU Memory Management
Center for Comprehensive Informatics
Optimizing CPU/GPU Irregular Problem
                                       Performance
Center for Comprehensive Informatics
Scheduling for CPU/GPU System
Center for Comprehensive Informatics




                                       • PRIORITY: assigns tasks to CPUs or GPUs based
                                         on estimate of relative performance gain of each
                                         task on GPU vs on CPU and load of the CPUs and
                                         GPUs
                                       • Sorted queue of (task, data element) tuples based
                                         on the relative speedup expected for each tuple
                                       • As more (task, data element) tuples are created,
                                         tuples are inserted into the queue such that the
                                         queue remains sorted
                                       • First Come First Served: simple low overhead
                                         policy
Aggregation to reduce kernel invocation and data
                                       movement overheads
Center for Comprehensive Informatics




                                       • Objects are grouped based on which image tile they
                                         are segmented from. Each tuple consists of (objects
                                         in image tile, feature computation operation)
                                       • Perform group of tasks that are applied to the same
                                         data element (or the same chunk of data elements)
                                       • Performance Aware Task Fusion (PATF) -- groups
                                         tasks based on GPU speedup values. Two groups of
                                         tasks: one group containing tasks with good GPU
                                         speedup values and the other group containing the
                                         remaining tasks
Center for Comprehensive Informatics
Center for Comprehensive Informatics
Center for Comprehensive Informatics
Multiscale Systems Biology
      Employ multi-resolution methods to
     characterize necrosis, angiogenesis and
          correlate these with “omics”




No enhancement                   Rim-enhancement
Normal Vessels        ?          Vascular Changes
Stable lesion                    Rapid progression
Correlation of Necrosis, Angiogenesis and “omics”
Center for Comprehensive Informatics




                                       • GBMs display variable and regionally heterogeneous degrees
                                         of necrosis (asterisk) and angiogenesis
                                       • These factors may impact gene expression profiles
Genes Correlated with Necrosis include Transcription Factors
                                       Identified as Regulators of the Mesenchymal Transition
Center for Comprehensive Informatics




                                       •Frozen sections from 88 GBM samples marked
                                        to identify regions of necrosis and angiogenesis
                                       •Extent of both necrosis and angiogenesis calculated as a
                                       percentage of total tissue area



                                            Gene       SAM q-value
                                           Symbol   (Corrected p-value)
                                           C/EBPB       < 0.000001
                                           C/EBPD       < 0.000001
                                            FOSL2       < 0.000001
                                                                          Carro MS, et al. Nature 263: 318-25, 2010
                                            STAT3         0.0047
                                           RUNX1          0.0082
VASARI
Center for Comprehensive Informatics




                                                       Feature Set




                                       Chad Holder
                                       Scott Hwang
                                       Adam Flanders
Defining Rich Set of Qualitative and
                                       Quantitative Image Biomarkers
Center for Comprehensive Informatics




                                       • Community-driven ontology development
                                         project; collaboration with ASNR
                                       • Imaging features (5 categories)
                                         – Location of lesion
                                         – Morphology of lesion margin                 (definition, thickness,
                                           enhancement, diffusion)

                                         – Morphology of lesion substance                  (enhancement, PS
                                           characteristics, focality/multicentricity, necrosis, cysts, midline
                                           invasion, cortical involvement, T1/FLAIR ratio)

                                         – Alterations in vicinity of lesion           (edema, edema
                                           crossing midline, hemorrhage, pial invasion, ependymal invasion,
                                           satellites, deep WM invasion, calvarial remodeling)

                                         – Resection features          (extent of nCE tissue, CE tissue, resected
                                           components)
Imaging Predictors of survival and molecular profiles
                                                 in the TCGA Glioblastoma Data set
                                                                         The TCGA glioma working group
Center for Comprehensive Informatics




                                        1Emory University Hospital, Atlanta, GA 2National Cancer Institute, Bethesda, MD. 3Thomas Jefferson University Hospital, Philadelphia,
                                        PA. 4Henry Ford University Hospital, Detroit, Michigan. 5National Institute of Health, Bethesda, MD. 6Boston University School of
                                        Medicine, Boston, MA. 7SAIC-Frederick, Inc., Frederick, MD. 8University of Virginia, Charlottesville, VA. 9 Northwestern University
                                        Chicago, IL

                                              Emory                       TJU/CBIT/NCI                      UVA/Northwestern                          Henry Ford

                                        David A Gutman1                     Adam Flanders3                    Max Wintermark8                       Lisa Scarpace4
                                           Lee Cooper1                       Eric Huang2                        Manal Jilwan8                      Tom Mikkelsen4
                                         Scott N Hwang1                    Robert J Clifford2                Prashant Raghavan8
                                         Chad A Holder1                    Dina Hammoud3                      Pat Mongkolwat9
                                           Doris Gao1                      John Freymann7
                                         Carlos Moreno1                      Justin Kirby7
                                         Arun Krishnan1                       Carl Jaffe6
                                            Jun Kong1
                                       Seena Dehkharghani1
                                            Joel Saltz1
                                             Dan Brat1
Correlative Imaging Results
Center for Comprehensive Informatics




            • Minimal enhancing tumor (≤5%)
              strongly associated with Proneural
              classification (p=0.0006).


            • >5% proportion of necrosis and the
              presence of microvascular hyperplasia in
              pathology slides (p=0.008).


            • Greater maximum tumor dimension (T2
              signal) associated with
              present/abundant microvascular
              hyperplasia (p=0.001).


                                                                     < 5% Enhancement
Correlative Imaging Results
Center for Comprehensive Informatics




                                       • TP53 mutant tumors had a smaller mean
                                         tumor sizes (p=0.002) on T2-weighted or
                                         FLAIR images.


                                       • EGFR mutant tumors were significantly
                                         larger than TP53 mutant tumors
                                         (p=0.0005).


                                       • High level EGFR amplification was
                                         associated with >5% enhancement and
                                         >5% proportion of necrosis (p < 0.01).

                                                                                   > 5% Necrosis
Automated feature extraction pipeline being
                                                   developed for MRI
Center for Comprehensive Informatics
Large Scale Data Management

                                        Implemented with IBM DB2 for large scale
Center for Comprehensive Informatics




                                         pathology image metadata (~million markups per
                                         slide)
                                        Represented by a complex data model capturing
                                         multi-faceted   information   including  markups,
                                         annotations, algorithm provenance, specimen, etc.
                                        Support for complex relationships and spatial
                                         query:    multi-level granularities, relationships
                                         between markups and annotations, spatial and
                                         nested relationships
Data Models to Represent Feature Sets
and Experimental Metadata

PAIS |pās| : Pathology Analytical Imaging Standards
• Provide semantically enabled data model to
  support pathology analytical imaging
• Data objects, comprehensive data types, and
  flexible relationships
• Object-oriented design, easily extensible
• Reuse existing standards
   – Reuse relevant classes already defined in AIM
   – Follow DICOM WG 26 metadata specifications on WSI
     reference
   – Specimen information in DICOM Supplement 122 and
     caTissue
   – Use caDSR for CDE and NCI Thesaurus for ontology
Pathology Imaging GIS
                                                              class Domain Mo...



                                                                  WholeSlideImageReference                         TMAImageReference
                                                                                                                                                                                           Patient


                                                                                                                              Region
                                                                  MicroscopyImageReference
                                                                                                                                                                                           Subject
                                                                                                                                              0..1
Center for Comprehensive Informatics




                                                                                                                                                                           0..1
                                                                     DICOMImageReference                                                                   1
                                                                                                                                         1                                              Specimen
                                                                                                               ImageReference                                  1             0..*
                                                                                                  0..*                                                                                                            0..1

                                                                                                                                                             1             0..1     AnatomicEntity
                                                                              User                                                    0..*                  1
                                                                                                                                                                                                                 1
                                                                                       0..1


                                                                                                                   1              1                                                     Equipment
                                                                             Group                                                                                        0..1
                                                                                                                                                      1
                                                                                        0..1             1             PAIS
                                                                                                                                                      1 0..*
                                                                            Project                                                                                       AnnotationReference
                                                                                                         1
                                                                                        0..1


                                                                                       0..*                    1       1                              1

                                                                          Collection                                                                                0..*                              0..*

                                                                                              1                        0..*
                                                                                                                                       0..*                    Annotation                          0..*

                                                                                       0..*              Markup
                                                                                                                                               0..1
                                                                                       0..*
                                                                  GeometricShape                                                                                                               1




                                        Image analysis
                                                                                                                                                               1                    1
                                                                                                                                                                   0..*             0..*                         0..*
                                                                                                                              1

                                                                                                                                             Observation                      Calculation                    Inference
                                                                          Surface                            Field
                                                                                                                                                          1                             1
                                                                                                                                         0..1              0..1                         0..1

                                                                                                                                                      Provenance
                                                                                                                                                                                               0..*




                                                                                     PAIS model                                                                                                                          PAIS data management
                                                            Modeling and management of markup and annotation for
                                                            querying and sharing through parallel RDBMS + spatial DBMS

                                        Segmentation




                                                                                   HDFS data staging                                                                                                                      MapReduce based queries
                                                            On the fly data processing for algorithm validation/algorithm
                                       Feature extraction
                                                            sensitivity studies, or discovery of preliminary results
Workflow, Adaptivity and Quality of Service

• In-transit data processing using
  filter/stream systems
• Semantic Workflows
• Hierarchical pipeline design with coarse
  and fine grained components
• Adaptivity and Quality of Service
Computerized Classification System
     for Grading Neuroblastoma
                    Initialization                           Yes
Image Tile                              Background?                           Label
                        I=L
                                                                                      • Background Identification
                                            No

                                       Create Image I(L)
                                                                                      • Image Decomposition (Multi-
   Training Tiles                                                                       resolution levels)
                                        Segmentation              I = I -1            • Image Segmentation
  Down-sampling
                                                                                        (EMLDA)
   Segmentation
                                     Feature Construction
                                                                                      • Feature Construction (2nd
                                                             Yes

                                                                             No         order statistics, Tonal
                                      Feature Extraction          I > 1?
Feature Construction
                                                                                        Features)
 Feature Extraction
                                        Classification
                                                                                      • Feature Extraction (LDA) +
                                                                                        Classification (Bayesian)
 Classifier Training
                                                            No
                                                                                      • Multi-resolution Layer
                                      Within Confidence
                                          Region ?                                      Controller (Confidence
                                                            Yes
   TRAINING                                                                             Region)
                                                  TESTING
Slides’ Preparation

• 64990 x 59412 pixels in full resolution
• Original Size: 10.8 Gb; Compressed Sized: ≈ 833Mb



                                                 8x




                                                 40x
Semantic Workflows (Wings)
                                       Collaborative Work with Yolanda Gil, Mary Hall
Center for Comprehensive Informatics




                                       • A systematic strategy for composing application
                                         components into workflows
                                       • Search for the most appropriate implementation of
                                         both components and workflows
                                       • Component optimization
                                          – Select among implementation variants of the same
                                            computation
                                          – Derive integer values of optimization parameters
                                          – Only search promising code variants and a restricted
                                            parameter space
                                       • Workflow optimization
                                          – Knowledge-rich representation of workflow properties
Center for Comprehensive Informatics


                                       Adaptivity
Framework




• Description Module (Wings): Describe application workflow
  using semantics of workflow components
• Execution module (Pegasus, DataCutter, Condor): Maps to
  resources, generates and places fine grained filter/stream
  pipelines
• Tradeoff Module: Schedules execution based on application
  level QOS
DataCutter Filter Stream Workflow
Impact
Conclusions

• Increasingly common task: Integrative analysis of
  complementary multi-resolution and “omic” data sources
• Elucidation of mechanisms, identify medically important
  disease sub-classes
• Cell level whole slide analysis of whole slide images
  requires innovations in parallelization, data management
• Integration of Radiology and Pathology has promise but
  pose additional challenges
• Role of quality of service/on demand computing
Thanks to:
•   In silico center team: Dan Brat (Science PI), Tahsin Kurc, Ashish Sharma, Tony Pan, David
    Gutman, Jun Kong, Sharath Cholleti, Carlos Moreno, Chad Holder, Erwin Van Meir, Daniel
    Rubin, Tom Mikkelsen, Adam Flanders, Joel Saltz (Director)
•   caGrid Knowledge Center: Joel Saltz, Mike Caliguiri, Steve Langella co-Directors; Tahsin
    Kurc, Himanshu Rathod Emory leads
•   caBIG In vivo imaging team: Eliot Siegel, Paul Mulhern, Adam Flanders, David Channon,
    Daniel Rubin, Fred Prior, Larry Tarbox and many others
•   In vivo imaging Emory team: Tony Pan, Ashish Sharma, Joel Saltz
•   Emory ATC Supplement team: Tim Fox, Ashish Sharma, Tony Pan, Edi Schreibmann, Paul
    Pantalone
•   Digital Pathology R01: Foran and Saltz; Jun Kong, Sharath Cholleti, Fusheng Wang, Tony
    Pan, Tahsin Kurc, Ashish Sharma, David Gutman (Emory), Wenjin Chen, Vicky Chu, Jun
    Hu, Lin Yang, David J. Foran (Rutgers)
•   NIH/in silico TCGA Imaging Group: Scott Hwang, Bob Clifford, Erich Huang, Dima
    Hammoud, Manal Jilwan, Prashant Raghavan, Max Wintermark, David Gutman, Carlos
    Moreno, Lee Cooper, John Freymann, Justin Kirby, Arun Krishnan, Seena Dehkharghani,
    Carl Jaffe
•   ACTSI Biomedical Informatics Program: Marc Overcash, Tim Morris, Tahsin Kurc,
    Alexander Quarshie, Circe Tsui, Adam Davis, Sharon Mason, Andrew Post, Alfredo Tirado-
    Ramos
•   NSF Scientific Workflow Collaboration: Vijay Kumar, Yolanda Gil, Mary Hall, Ewa Deelman,
    Tahsin Kurc, P. Sadayappan, Gaurang Mehta, Karan Vahi
Thanks!

MICCAI - Workshop on High Performance and Distributed Computing for Medical Imaging - Sept 22 2011

  • 1.
    Integrative Multi-Scale BiomedicalInformatics Joel Saltz MD, PhD Director Center for Comprehensive Informatics
  • 2.
    Squeezing Information fromSpatial Datasets • Leverage exascale data and Center for Comprehensive Informatics computer resources to squeeze the most out of image, sensor or simulation data • Run lots of different algorithms to derive same features • Run lots of algorithms to derive complementary features • Data models and data management infrastructure to manage data products, feature sets and results from classification and machine learning algorithms
  • 3.
    INTEGRATIVE BIOMEDICAL INFORMATICS ANALYSIS Center for Comprehensive Informatics Reproducible anatomic/functional characterization at gross level (Radiology) and fine level (Pathology) Integration of anatomic/functional characterization with multiple types of “omic” information Create categories of jointly classified data to describe pathophysiology, predict prognosis, response to treatment
  • 4.
    In Silico Centerfor Brain Tumor Research Specific Aims: 1. Influence of necrosis/ hypoxia on gene expression and genetic classification. 2. Molecular correlates of high resolution nuclear morphometry. 3. Gene expression profiles that predict glioma progression. 4. Molecular correlates of MRI enhancement patterns.
  • 5.
    Integration of heterogeneousmultiscale information Center for Comprehensive Informatics •Coordinated initiatives Pathology, Radiology, “omics” Radiology •Exploit synergies Imaging between all initiatives to improve ability to forecast survival & Patient “Omic” Outco response. me Data Pathologic Features
  • 6.
    Example: Pathology andGene Expression Joint Predictors of Recurrence/Survival Lee Cooper Carlos Moreno
  • 7.
    INTEGRATIVE ANALYSIS OF BRAIN TUMOR DATA Center for Comprehensive Informatics
  • 8.
    In Silico Centerfor Brain Tumor Research Key Data Sets REMBRANDT: Gene expression and genomics data set of all glioma subtypes The Cancer Genome Atlas (TCGA): Rich “omics” set of GBM, digitized Pathology and Radiology Pathology and Radiology Images from Henry Ford Hospital, Emory, Thomas Jefferson U, MD Anderson and others
  • 9.
    TCGA Research Network Digital Pathology Neuroimaging
  • 10.
    TCGA and REMBRANDTDatasets Center for Comprehensive Informatics *Some overlap with TCGA main repository but rescanned at 40X Mikkelsen @ H ** Data obtained by generosity of Lisa Scarpace/Tom ** Data obtained by generosity of Flanders/Mark Curtis @ TJU @ HF ***Thanks to Adam Lisa Scarpace/Tom Mikkelsen ***Thanks to Adam Flanders/Mark Curtis @ TJU
  • 11.
    Neuroimaging/MRI Data Center forComprehensive Informatics *Pending Upload to NBIA portal ** Data obtained by efforts of Lisa Scarpace/Tom Mikkelsen @ HF
  • 12.
    Brain Tumor Radiologyand Pathology Anaplastic Astrocytoma (WHO grade III) Glioblastoma (WHO grade IV)
  • 13.
    Morphological Tissue Classification Whole Slide Imaging Cellular Features Center for Comprehensive Informatics Nuclei Segmentation Jun Kong
  • 14.
    Requirements for HighResolution Whole Slide Feature Characterization Center for Comprehensive Informatics • 1010 pixels for each whole slide image • 10 whole slide images per patient • 108 image features per whole slide image • 10,000 brain tumor patients • 1015 pixels • 1013 features • Hundreds of algorithms • Annotations and markups from dozens of humans
  • 15.
    Distinguishing Characteristic in Gliomas Nuclear Qualities Round shaped with Elongated with smooth regular texture rough, irregular texture Oligodendroglioma Astrocytoma  Use image analysis algorithms to segment and classify microanatomic features (Nuclei, Astrocytoma, Necrosis ...) in whole slide images  Represent the segmentation and classification in a well defined structured format that can be used to correlate the pathology with other data modalities
  • 16.
    Astrocytoma vs Oligodendroglima Overlap in genetics, gene expression, histology Center for Comprehensive Informatics Astrocytoma vs Oligodendroglima • Assess nuclear size (area and perimeter), shape (eccentricity, circularity major axis, minor axis, Fourier shape descriptor and extent ratio), intensity (average, maximum, minimum, standard error) and texture (entropy, energy, skewness and kurtosis).
  • 17.
    Machine-based Classification ofTCGA GBMs (J Kong) Whole slide scans from 14 TCGA GBMS (69 slides) 7 purely astrocytic in morphology; 7 with 2+ oligo component 399,233 nuclei analyzed for astro/oligo features Cases were categorized based on ratio of oligo/astro cells TCGA Gene Expression Query: c-Met overexpression
  • 18.
    Manual: TCGA NeuropathologyAttributes 120 TCGA specimens; 3 Reviewers Presence and Degree of: Differentiation: Microvascular hyperplasia  Small cell component  Complex/glomeruloid  Gemistocytes  Endothelial hyperplasia  Oligodendroglial  Multi-nucleated/giant cells Necrosis  Epithelial metaplasia  Pseudopalisading pattern  Mesenchymal metaplasia  Zonal necrosis Other Features Inflammation  Perineuronal/perivascular  Macrophages/histiocytes satellitosis  Lymphocytes  Entrapped gray or white matter  Neutrophils  Micro-mineralization
  • 19.
    Nuclear Features Usedto Classify GBMs 10 Center for Comprehensive Informatics 20 30 40 Feature Indices 50 60 70 80 90 100 110 Clustergram of selected features used in consensus clustering
  • 20.
    Nuclear Features Usedto Classify GBMs 2 1 3 4 Center for Comprehensive Informatics 1 20 40 60 2 Cluster 80 100 3 120 140 160 4 Consensus clustering of morphological 50 100 150 0 0.2 0. signatures Silho Study includes 200 million nuclei taken from 480 slides corresponding to 167 distinct patients.
  • 21.
    1 Cluster 1 0.9 Cluster 2 Center for Comprehensive Informatics Cluster 3 0.8 Cluster 4 0.7 0.6 Survival 0.5 0.4 0.3 0.2 0.1 0 0 500 1000 1500 2000 2500 3000 Days Survival of morphological clusters
  • 22.
    1 Proneural 0.9 Neural Center for Comprehensive Informatics Classical 0.8 Mesenchymal 0.7 0.6 Survival 0.5 0.4 0.3 0.2 0.1 0 0 500 1000 1500 2000 2500 3000 Days Survival of patients by molecular tumor subtype
  • 23.
    Cluster 2: What does this mean? Gemistocytic Signature Center for Comprehensive Informatics • No association with • Transcriptional class • Age • Molecular correlates: Wnt signaling, TP53 signaling, Oxidative phosphorylation, mRNA processing
  • 24.
    Center for ComprehensiveInformatics High Resolution Methods
  • 25.
    Multicore/GPU Implementation Center forComprehensive Informatics • Nuclear/cell segmentation • Feature extraction • Generate clusters of nuclear/cell Features • Classify nuclei/cells by feature cluster • Classify patients by populations of classified nuclei/cells
  • 26.
    Center for ComprehensiveInformatics Multicore/GPU Feature Extraction
  • 27.
    Center for ComprehensiveInformatics Multicore/GPU Feature Extraction
  • 28.
  • 29.
  • 30.
    Optimizing CPU/GPU MemoryManagement Center for Comprehensive Informatics
  • 31.
    Optimizing CPU/GPU IrregularProblem Performance Center for Comprehensive Informatics
  • 32.
    Scheduling for CPU/GPUSystem Center for Comprehensive Informatics • PRIORITY: assigns tasks to CPUs or GPUs based on estimate of relative performance gain of each task on GPU vs on CPU and load of the CPUs and GPUs • Sorted queue of (task, data element) tuples based on the relative speedup expected for each tuple • As more (task, data element) tuples are created, tuples are inserted into the queue such that the queue remains sorted • First Come First Served: simple low overhead policy
  • 33.
    Aggregation to reducekernel invocation and data movement overheads Center for Comprehensive Informatics • Objects are grouped based on which image tile they are segmented from. Each tuple consists of (objects in image tile, feature computation operation) • Perform group of tasks that are applied to the same data element (or the same chunk of data elements) • Performance Aware Task Fusion (PATF) -- groups tasks based on GPU speedup values. Two groups of tasks: one group containing tasks with good GPU speedup values and the other group containing the remaining tasks
  • 34.
  • 35.
  • 36.
  • 37.
    Multiscale Systems Biology Employ multi-resolution methods to characterize necrosis, angiogenesis and correlate these with “omics” No enhancement Rim-enhancement Normal Vessels ? Vascular Changes Stable lesion Rapid progression
  • 38.
    Correlation of Necrosis,Angiogenesis and “omics” Center for Comprehensive Informatics • GBMs display variable and regionally heterogeneous degrees of necrosis (asterisk) and angiogenesis • These factors may impact gene expression profiles
  • 39.
    Genes Correlated withNecrosis include Transcription Factors Identified as Regulators of the Mesenchymal Transition Center for Comprehensive Informatics •Frozen sections from 88 GBM samples marked to identify regions of necrosis and angiogenesis •Extent of both necrosis and angiogenesis calculated as a percentage of total tissue area Gene SAM q-value Symbol (Corrected p-value) C/EBPB < 0.000001 C/EBPD < 0.000001 FOSL2 < 0.000001 Carro MS, et al. Nature 263: 318-25, 2010 STAT3 0.0047 RUNX1 0.0082
  • 40.
    VASARI Center for ComprehensiveInformatics Feature Set Chad Holder Scott Hwang Adam Flanders
  • 41.
    Defining Rich Setof Qualitative and Quantitative Image Biomarkers Center for Comprehensive Informatics • Community-driven ontology development project; collaboration with ASNR • Imaging features (5 categories) – Location of lesion – Morphology of lesion margin (definition, thickness, enhancement, diffusion) – Morphology of lesion substance (enhancement, PS characteristics, focality/multicentricity, necrosis, cysts, midline invasion, cortical involvement, T1/FLAIR ratio) – Alterations in vicinity of lesion (edema, edema crossing midline, hemorrhage, pial invasion, ependymal invasion, satellites, deep WM invasion, calvarial remodeling) – Resection features (extent of nCE tissue, CE tissue, resected components)
  • 42.
    Imaging Predictors ofsurvival and molecular profiles in the TCGA Glioblastoma Data set The TCGA glioma working group Center for Comprehensive Informatics 1Emory University Hospital, Atlanta, GA 2National Cancer Institute, Bethesda, MD. 3Thomas Jefferson University Hospital, Philadelphia, PA. 4Henry Ford University Hospital, Detroit, Michigan. 5National Institute of Health, Bethesda, MD. 6Boston University School of Medicine, Boston, MA. 7SAIC-Frederick, Inc., Frederick, MD. 8University of Virginia, Charlottesville, VA. 9 Northwestern University Chicago, IL Emory TJU/CBIT/NCI UVA/Northwestern Henry Ford David A Gutman1 Adam Flanders3 Max Wintermark8 Lisa Scarpace4 Lee Cooper1 Eric Huang2 Manal Jilwan8 Tom Mikkelsen4 Scott N Hwang1 Robert J Clifford2 Prashant Raghavan8 Chad A Holder1 Dina Hammoud3 Pat Mongkolwat9 Doris Gao1 John Freymann7 Carlos Moreno1 Justin Kirby7 Arun Krishnan1 Carl Jaffe6 Jun Kong1 Seena Dehkharghani1 Joel Saltz1 Dan Brat1
  • 43.
    Correlative Imaging Results Centerfor Comprehensive Informatics • Minimal enhancing tumor (≤5%) strongly associated with Proneural classification (p=0.0006). • >5% proportion of necrosis and the presence of microvascular hyperplasia in pathology slides (p=0.008). • Greater maximum tumor dimension (T2 signal) associated with present/abundant microvascular hyperplasia (p=0.001). < 5% Enhancement
  • 44.
    Correlative Imaging Results Centerfor Comprehensive Informatics • TP53 mutant tumors had a smaller mean tumor sizes (p=0.002) on T2-weighted or FLAIR images. • EGFR mutant tumors were significantly larger than TP53 mutant tumors (p=0.0005). • High level EGFR amplification was associated with >5% enhancement and >5% proportion of necrosis (p < 0.01). > 5% Necrosis
  • 45.
    Automated feature extractionpipeline being developed for MRI Center for Comprehensive Informatics
  • 46.
    Large Scale DataManagement  Implemented with IBM DB2 for large scale Center for Comprehensive Informatics pathology image metadata (~million markups per slide)  Represented by a complex data model capturing multi-faceted information including markups, annotations, algorithm provenance, specimen, etc.  Support for complex relationships and spatial query: multi-level granularities, relationships between markups and annotations, spatial and nested relationships
  • 47.
    Data Models toRepresent Feature Sets and Experimental Metadata PAIS |pās| : Pathology Analytical Imaging Standards • Provide semantically enabled data model to support pathology analytical imaging • Data objects, comprehensive data types, and flexible relationships • Object-oriented design, easily extensible • Reuse existing standards – Reuse relevant classes already defined in AIM – Follow DICOM WG 26 metadata specifications on WSI reference – Specimen information in DICOM Supplement 122 and caTissue – Use caDSR for CDE and NCI Thesaurus for ontology
  • 48.
    Pathology Imaging GIS class Domain Mo... WholeSlideImageReference TMAImageReference Patient Region MicroscopyImageReference Subject 0..1 Center for Comprehensive Informatics 0..1 DICOMImageReference 1 1 Specimen ImageReference 1 0..* 0..* 0..1 1 0..1 AnatomicEntity User 0..* 1 1 0..1 1 1 Equipment Group 0..1 1 0..1 1 PAIS 1 0..* Project AnnotationReference 1 0..1 0..* 1 1 1 Collection 0..* 0..* 1 0..* 0..* Annotation 0..* 0..* Markup 0..1 0..* GeometricShape 1 Image analysis 1 1 0..* 0..* 0..* 1 Observation Calculation Inference Surface Field 1 1 0..1 0..1 0..1 Provenance 0..* PAIS model PAIS data management Modeling and management of markup and annotation for querying and sharing through parallel RDBMS + spatial DBMS Segmentation HDFS data staging MapReduce based queries On the fly data processing for algorithm validation/algorithm Feature extraction sensitivity studies, or discovery of preliminary results
  • 49.
    Workflow, Adaptivity andQuality of Service • In-transit data processing using filter/stream systems • Semantic Workflows • Hierarchical pipeline design with coarse and fine grained components • Adaptivity and Quality of Service
  • 50.
    Computerized Classification System for Grading Neuroblastoma Initialization Yes Image Tile Background? Label I=L • Background Identification No Create Image I(L) • Image Decomposition (Multi- Training Tiles resolution levels) Segmentation I = I -1 • Image Segmentation Down-sampling (EMLDA) Segmentation Feature Construction • Feature Construction (2nd Yes No order statistics, Tonal Feature Extraction I > 1? Feature Construction Features) Feature Extraction Classification • Feature Extraction (LDA) + Classification (Bayesian) Classifier Training No • Multi-resolution Layer Within Confidence Region ? Controller (Confidence Yes TRAINING Region) TESTING
  • 51.
    Slides’ Preparation • 64990x 59412 pixels in full resolution • Original Size: 10.8 Gb; Compressed Sized: ≈ 833Mb 8x 40x
  • 52.
    Semantic Workflows (Wings) Collaborative Work with Yolanda Gil, Mary Hall Center for Comprehensive Informatics • A systematic strategy for composing application components into workflows • Search for the most appropriate implementation of both components and workflows • Component optimization – Select among implementation variants of the same computation – Derive integer values of optimization parameters – Only search promising code variants and a restricted parameter space • Workflow optimization – Knowledge-rich representation of workflow properties
  • 53.
    Center for ComprehensiveInformatics Adaptivity
  • 54.
    Framework • Description Module(Wings): Describe application workflow using semantics of workflow components • Execution module (Pegasus, DataCutter, Condor): Maps to resources, generates and places fine grained filter/stream pipelines • Tradeoff Module: Schedules execution based on application level QOS
  • 56.
  • 57.
  • 58.
    Conclusions • Increasingly commontask: Integrative analysis of complementary multi-resolution and “omic” data sources • Elucidation of mechanisms, identify medically important disease sub-classes • Cell level whole slide analysis of whole slide images requires innovations in parallelization, data management • Integration of Radiology and Pathology has promise but pose additional challenges • Role of quality of service/on demand computing
  • 59.
    Thanks to: • In silico center team: Dan Brat (Science PI), Tahsin Kurc, Ashish Sharma, Tony Pan, David Gutman, Jun Kong, Sharath Cholleti, Carlos Moreno, Chad Holder, Erwin Van Meir, Daniel Rubin, Tom Mikkelsen, Adam Flanders, Joel Saltz (Director) • caGrid Knowledge Center: Joel Saltz, Mike Caliguiri, Steve Langella co-Directors; Tahsin Kurc, Himanshu Rathod Emory leads • caBIG In vivo imaging team: Eliot Siegel, Paul Mulhern, Adam Flanders, David Channon, Daniel Rubin, Fred Prior, Larry Tarbox and many others • In vivo imaging Emory team: Tony Pan, Ashish Sharma, Joel Saltz • Emory ATC Supplement team: Tim Fox, Ashish Sharma, Tony Pan, Edi Schreibmann, Paul Pantalone • Digital Pathology R01: Foran and Saltz; Jun Kong, Sharath Cholleti, Fusheng Wang, Tony Pan, Tahsin Kurc, Ashish Sharma, David Gutman (Emory), Wenjin Chen, Vicky Chu, Jun Hu, Lin Yang, David J. Foran (Rutgers) • NIH/in silico TCGA Imaging Group: Scott Hwang, Bob Clifford, Erich Huang, Dima Hammoud, Manal Jilwan, Prashant Raghavan, Max Wintermark, David Gutman, Carlos Moreno, Lee Cooper, John Freymann, Justin Kirby, Arun Krishnan, Seena Dehkharghani, Carl Jaffe • ACTSI Biomedical Informatics Program: Marc Overcash, Tim Morris, Tahsin Kurc, Alexander Quarshie, Circe Tsui, Adam Davis, Sharon Mason, Andrew Post, Alfredo Tirado- Ramos • NSF Scientific Workflow Collaboration: Vijay Kumar, Yolanda Gil, Mary Hall, Ewa Deelman, Tahsin Kurc, P. Sadayappan, Gaurang Mehta, Karan Vahi
  • 60.