Integrative Multi-Scale Biomedical Informatics             Joel Saltz MD, PhD     Director Center for Comprehensive       ...
Squeezing Information from Spatial Datasets                                       • Leverage exascale data andCenter for C...
INTEGRATIVE BIOMEDICAL                                       INFORMATICS ANALYSISCenter for Comprehensive Informatics     ...
In Silico Center forBrain Tumor Research                   Specific Aims:                   1.   Influence of necrosis/   ...
Integration of heterogeneous multiscale                                       informationCenter for Comprehensive Informat...
Example: Pathology and Gene ExpressionJoint Predictors of Recurrence/Survival        Lee Cooper Carlos Moreno
INTEGRATIVE ANALYSIS                                       OF BRAIN TUMOR DATACenter for Comprehensive Informatics
In Silico Center forBrain Tumor ResearchKey Data SetsREMBRANDT: Gene expression and genomics data set of all glioma subtyp...
TCGA Research Network                    Digital Pathology                     Neuroimaging
TCGA and REMBRANDT DatasetsCenter for Comprehensive Informatics                                          *Some overlap wit...
Neuroimaging/MRI DataCenter for Comprehensive Informatics                                       *Pending Upload to NBIA po...
Brain Tumor Radiology and Pathology                    Anaplastic Astrocytoma                    (WHO grade III)          ...
Morphological Tissue Classification                                           Whole Slide Imaging               Cellular F...
Requirements for High Resolution Whole Slide                                       Feature CharacterizationCenter for Comp...
Distinguishing Characteristic in Gliomas                                    Nuclear Qualities           Round shaped with ...
Astrocytoma vs Oligodendroglima                                       Overlap in genetics, gene expression, histologyCente...
Machine-based Classification of TCGA GBMs (J Kong)Whole slide scans from 14 TCGA GBMS (69 slides)7 purely astrocytic in mo...
Manual: TCGA Neuropathology Attributes     120 TCGA specimens; 3 ReviewersPresence and Degree of:                         ...
Nuclear Features Used to Classify GBMs                                                                 10Center for Compre...
Nuclear Features Used to Classify GBMs                                                   2          1          3    4Cente...
1                                                                                          Cluster 1                      ...
1                                                                                       Proneural                         ...
Cluster 2:                                       What does this mean?         Gemistocytic SignatureCenter for Comprehensi...
Center for Comprehensive Informatics                                       High Resolution Methods
Multicore/GPU ImplementationCenter for Comprehensive Informatics                                       • Nuclear/cell segm...
Center for Comprehensive Informatics                                       Multicore/GPU Feature Extraction
Center for Comprehensive Informatics                                       Multicore/GPU Feature Extraction
Center for Comprehensive Informatics
Center for Comprehensive Informatics
Optimizing CPU/GPU Memory ManagementCenter for Comprehensive Informatics
Optimizing CPU/GPU Irregular Problem                                       PerformanceCenter for Comprehensive Informatics
Scheduling for CPU/GPU SystemCenter for Comprehensive Informatics                                       • PRIORITY: assign...
Aggregation to reduce kernel invocation and data                                       movement overheadsCenter for Compre...
Center for Comprehensive Informatics
Center for Comprehensive Informatics
Center for Comprehensive Informatics
Multiscale Systems Biology      Employ multi-resolution methods to     characterize necrosis, angiogenesis and          co...
Correlation of Necrosis, Angiogenesis and “omics”Center for Comprehensive Informatics                                     ...
Genes Correlated with Necrosis include Transcription Factors                                       Identified as Regulator...
VASARICenter for Comprehensive Informatics                                                       Feature Set              ...
Defining Rich Set of Qualitative and                                       Quantitative Image BiomarkersCenter for Compreh...
Imaging Predictors of survival and molecular profiles                                                 in the TCGA Glioblas...
Correlative Imaging ResultsCenter for Comprehensive Informatics            • Minimal enhancing tumor (≤5%)              st...
Correlative Imaging ResultsCenter for Comprehensive Informatics                                       • TP53 mutant tumors...
Automated feature extraction pipeline being                                                   developed for MRICenter for ...
Large Scale Data Management                                        Implemented with IBM DB2 for large scaleCenter for Com...
Data Models to Represent Feature Setsand Experimental MetadataPAIS |pās| : Pathology Analytical Imaging Standards• Provide...
Pathology Imaging GIS                                                              class Domain Mo...                     ...
Workflow, Adaptivity and Quality of Service• In-transit data processing using  filter/stream systems• Semantic Workflows• ...
Computerized Classification System     for Grading Neuroblastoma                    Initialization                        ...
Slides’ Preparation• 64990 x 59412 pixels in full resolution• Original Size: 10.8 Gb; Compressed Sized: ≈ 833Mb           ...
Semantic Workflows (Wings)                                       Collaborative Work with Yolanda Gil, Mary HallCenter for ...
Center for Comprehensive Informatics                                       Adaptivity
Framework• Description Module (Wings): Describe application workflow  using semantics of workflow components• Execution mo...
DataCutter Filter Stream Workflow
Impact
Conclusions• Increasingly common task: Integrative analysis of  complementary multi-resolution and “omic” data sources• El...
Thanks to:•   In silico center team: Dan Brat (Science PI), Tahsin Kurc, Ashish Sharma, Tony Pan, David    Gutman, Jun Kon...
Thanks!
MICCAI - Workshop on High Performance and Distributed Computing for Medical Imaging - Sept 22 2011
Upcoming SlideShare
Loading in …5
×

MICCAI - Workshop on High Performance and Distributed Computing for Medical Imaging - Sept 22 2011

615 views

Published on

Keynote talk at HP-MICCAI / MICCAI-DCI 2011, The Joint Workshop on High Performance and Distributed Computing for Medical Imaging, featured at MICCAI 2011, held on September 22nd 2011 in Toronto, Canada

  • Be the first to comment

  • Be the first to like this

MICCAI - Workshop on High Performance and Distributed Computing for Medical Imaging - Sept 22 2011

  1. 1. Integrative Multi-Scale Biomedical Informatics Joel Saltz MD, PhD Director Center for Comprehensive Informatics
  2. 2. Squeezing Information from Spatial Datasets • Leverage exascale data andCenter for Comprehensive Informatics computer resources to squeeze the most out of image, sensor or simulation data • Run lots of different algorithms to derive same features • Run lots of algorithms to derive complementary features • Data models and data management infrastructure to manage data products, feature sets and results from classification and machine learning algorithms
  3. 3. INTEGRATIVE BIOMEDICAL INFORMATICS ANALYSISCenter for Comprehensive Informatics Reproducible anatomic/functional characterization at gross level (Radiology) and fine level (Pathology) Integration of anatomic/functional characterization with multiple types of “omic” information Create categories of jointly classified data to describe pathophysiology, predict prognosis, response to treatment
  4. 4. In Silico Center forBrain Tumor Research Specific Aims: 1. Influence of necrosis/ hypoxia on gene expression and genetic classification. 2. Molecular correlates of high resolution nuclear morphometry. 3. Gene expression profiles that predict glioma progression. 4. Molecular correlates of MRI enhancement patterns.
  5. 5. Integration of heterogeneous multiscale informationCenter for Comprehensive Informatics •Coordinated initiatives Pathology, Radiology, “omics” Radiology •Exploit synergies Imaging between all initiatives to improve ability to forecast survival & Patient “Omic” Outco response. me Data Pathologic Features
  6. 6. Example: Pathology and Gene ExpressionJoint Predictors of Recurrence/Survival Lee Cooper Carlos Moreno
  7. 7. INTEGRATIVE ANALYSIS OF BRAIN TUMOR DATACenter for Comprehensive Informatics
  8. 8. In Silico Center forBrain Tumor ResearchKey Data SetsREMBRANDT: Gene expression and genomics data set of all glioma subtypesThe Cancer Genome Atlas (TCGA): Rich “omics” set of GBM, digitized Pathology and RadiologyPathology and Radiology Images from Henry Ford Hospital, Emory, Thomas Jefferson U, MD Anderson and others
  9. 9. TCGA Research Network Digital Pathology Neuroimaging
  10. 10. TCGA and REMBRANDT DatasetsCenter for Comprehensive Informatics *Some overlap with TCGA main repository but rescanned at 40X Mikkelsen @ H ** Data obtained by generosity of Lisa Scarpace/Tom ** Data obtained by generosity of Flanders/Mark Curtis @ TJU @ HF ***Thanks to Adam Lisa Scarpace/Tom Mikkelsen ***Thanks to Adam Flanders/Mark Curtis @ TJU
  11. 11. Neuroimaging/MRI DataCenter for Comprehensive Informatics *Pending Upload to NBIA portal ** Data obtained by efforts of Lisa Scarpace/Tom Mikkelsen @ HF
  12. 12. Brain Tumor Radiology and Pathology Anaplastic Astrocytoma (WHO grade III) Glioblastoma (WHO grade IV)
  13. 13. Morphological Tissue Classification Whole Slide Imaging Cellular FeaturesCenter for Comprehensive Informatics Nuclei Segmentation Jun Kong
  14. 14. Requirements for High Resolution Whole Slide Feature CharacterizationCenter for Comprehensive Informatics • 1010 pixels for each whole slide image • 10 whole slide images per patient • 108 image features per whole slide image • 10,000 brain tumor patients • 1015 pixels • 1013 features • Hundreds of algorithms • Annotations and markups from dozens of humans
  15. 15. Distinguishing Characteristic in Gliomas Nuclear Qualities Round shaped with Elongated with smooth regular texture rough, irregular texture Oligodendroglioma Astrocytoma Use image analysis algorithms to segment and classify microanatomic features (Nuclei, Astrocytoma, Necrosis ...) in whole slide images Represent the segmentation and classification in a well defined structured format that can be used to correlate the pathology with other data modalities
  16. 16. Astrocytoma vs Oligodendroglima Overlap in genetics, gene expression, histologyCenter for Comprehensive Informatics Astrocytoma vs Oligodendroglima • Assess nuclear size (area and perimeter), shape (eccentricity, circularity major axis, minor axis, Fourier shape descriptor and extent ratio), intensity (average, maximum, minimum, standard error) and texture (entropy, energy, skewness and kurtosis).
  17. 17. Machine-based Classification of TCGA GBMs (J Kong)Whole slide scans from 14 TCGA GBMS (69 slides)7 purely astrocytic in morphology; 7 with 2+ oligo component399,233 nuclei analyzed for astro/oligo featuresCases were categorized based on ratio of oligo/astro cells TCGA Gene Expression Query: c-Met overexpression
  18. 18. Manual: TCGA Neuropathology Attributes 120 TCGA specimens; 3 ReviewersPresence and Degree of: Differentiation:Microvascular hyperplasia  Small cell component Complex/glomeruloid  Gemistocytes Endothelial hyperplasia  Oligodendroglial  Multi-nucleated/giant cellsNecrosis  Epithelial metaplasia Pseudopalisading pattern  Mesenchymal metaplasia Zonal necrosis Other FeaturesInflammation  Perineuronal/perivascular Macrophages/histiocytes satellitosis Lymphocytes  Entrapped gray or white matter Neutrophils  Micro-mineralization
  19. 19. Nuclear Features Used to Classify GBMs 10Center for Comprehensive Informatics 20 30 40 Feature Indices 50 60 70 80 90 100 110 Clustergram of selected features used in consensus clustering
  20. 20. Nuclear Features Used to Classify GBMs 2 1 3 4Center for Comprehensive Informatics 1 20 40 60 2 Cluster 80 100 3 120 140 160 4 Consensus clustering of morphological 50 100 150 0 0.2 0. signatures Silho Study includes 200 million nuclei taken from 480 slides corresponding to 167 distinct patients.
  21. 21. 1 Cluster 1 0.9 Cluster 2Center for Comprehensive Informatics Cluster 3 0.8 Cluster 4 0.7 0.6 Survival 0.5 0.4 0.3 0.2 0.1 0 0 500 1000 1500 2000 2500 3000 Days Survival of morphological clusters
  22. 22. 1 Proneural 0.9 NeuralCenter for Comprehensive Informatics Classical 0.8 Mesenchymal 0.7 0.6 Survival 0.5 0.4 0.3 0.2 0.1 0 0 500 1000 1500 2000 2500 3000 Days Survival of patients by molecular tumor subtype
  23. 23. Cluster 2: What does this mean? Gemistocytic SignatureCenter for Comprehensive Informatics • No association with • Transcriptional class • Age • Molecular correlates: Wnt signaling, TP53 signaling, Oxidative phosphorylation, mRNA processing
  24. 24. Center for Comprehensive Informatics High Resolution Methods
  25. 25. Multicore/GPU ImplementationCenter for Comprehensive Informatics • Nuclear/cell segmentation • Feature extraction • Generate clusters of nuclear/cell Features • Classify nuclei/cells by feature cluster • Classify patients by populations of classified nuclei/cells
  26. 26. Center for Comprehensive Informatics Multicore/GPU Feature Extraction
  27. 27. Center for Comprehensive Informatics Multicore/GPU Feature Extraction
  28. 28. Center for Comprehensive Informatics
  29. 29. Center for Comprehensive Informatics
  30. 30. Optimizing CPU/GPU Memory ManagementCenter for Comprehensive Informatics
  31. 31. Optimizing CPU/GPU Irregular Problem PerformanceCenter for Comprehensive Informatics
  32. 32. Scheduling for CPU/GPU SystemCenter for Comprehensive Informatics • PRIORITY: assigns tasks to CPUs or GPUs based on estimate of relative performance gain of each task on GPU vs on CPU and load of the CPUs and GPUs • Sorted queue of (task, data element) tuples based on the relative speedup expected for each tuple • As more (task, data element) tuples are created, tuples are inserted into the queue such that the queue remains sorted • First Come First Served: simple low overhead policy
  33. 33. Aggregation to reduce kernel invocation and data movement overheadsCenter for Comprehensive Informatics • Objects are grouped based on which image tile they are segmented from. Each tuple consists of (objects in image tile, feature computation operation) • Perform group of tasks that are applied to the same data element (or the same chunk of data elements) • Performance Aware Task Fusion (PATF) -- groups tasks based on GPU speedup values. Two groups of tasks: one group containing tasks with good GPU speedup values and the other group containing the remaining tasks
  34. 34. Center for Comprehensive Informatics
  35. 35. Center for Comprehensive Informatics
  36. 36. Center for Comprehensive Informatics
  37. 37. Multiscale Systems Biology Employ multi-resolution methods to characterize necrosis, angiogenesis and correlate these with “omics”No enhancement Rim-enhancementNormal Vessels ? Vascular ChangesStable lesion Rapid progression
  38. 38. Correlation of Necrosis, Angiogenesis and “omics”Center for Comprehensive Informatics • GBMs display variable and regionally heterogeneous degrees of necrosis (asterisk) and angiogenesis • These factors may impact gene expression profiles
  39. 39. Genes Correlated with Necrosis include Transcription Factors Identified as Regulators of the Mesenchymal TransitionCenter for Comprehensive Informatics •Frozen sections from 88 GBM samples marked to identify regions of necrosis and angiogenesis •Extent of both necrosis and angiogenesis calculated as a percentage of total tissue area Gene SAM q-value Symbol (Corrected p-value) C/EBPB < 0.000001 C/EBPD < 0.000001 FOSL2 < 0.000001 Carro MS, et al. Nature 263: 318-25, 2010 STAT3 0.0047 RUNX1 0.0082
  40. 40. VASARICenter for Comprehensive Informatics Feature Set Chad Holder Scott Hwang Adam Flanders
  41. 41. Defining Rich Set of Qualitative and Quantitative Image BiomarkersCenter for Comprehensive Informatics • Community-driven ontology development project; collaboration with ASNR • Imaging features (5 categories) – Location of lesion – Morphology of lesion margin (definition, thickness, enhancement, diffusion) – Morphology of lesion substance (enhancement, PS characteristics, focality/multicentricity, necrosis, cysts, midline invasion, cortical involvement, T1/FLAIR ratio) – Alterations in vicinity of lesion (edema, edema crossing midline, hemorrhage, pial invasion, ependymal invasion, satellites, deep WM invasion, calvarial remodeling) – Resection features (extent of nCE tissue, CE tissue, resected components)
  42. 42. Imaging Predictors of survival and molecular profiles in the TCGA Glioblastoma Data set The TCGA glioma working groupCenter for Comprehensive Informatics 1Emory University Hospital, Atlanta, GA 2National Cancer Institute, Bethesda, MD. 3Thomas Jefferson University Hospital, Philadelphia, PA. 4Henry Ford University Hospital, Detroit, Michigan. 5National Institute of Health, Bethesda, MD. 6Boston University School of Medicine, Boston, MA. 7SAIC-Frederick, Inc., Frederick, MD. 8University of Virginia, Charlottesville, VA. 9 Northwestern University Chicago, IL Emory TJU/CBIT/NCI UVA/Northwestern Henry Ford David A Gutman1 Adam Flanders3 Max Wintermark8 Lisa Scarpace4 Lee Cooper1 Eric Huang2 Manal Jilwan8 Tom Mikkelsen4 Scott N Hwang1 Robert J Clifford2 Prashant Raghavan8 Chad A Holder1 Dina Hammoud3 Pat Mongkolwat9 Doris Gao1 John Freymann7 Carlos Moreno1 Justin Kirby7 Arun Krishnan1 Carl Jaffe6 Jun Kong1 Seena Dehkharghani1 Joel Saltz1 Dan Brat1
  43. 43. Correlative Imaging ResultsCenter for Comprehensive Informatics • Minimal enhancing tumor (≤5%) strongly associated with Proneural classification (p=0.0006). • >5% proportion of necrosis and the presence of microvascular hyperplasia in pathology slides (p=0.008). • Greater maximum tumor dimension (T2 signal) associated with present/abundant microvascular hyperplasia (p=0.001). < 5% Enhancement
  44. 44. Correlative Imaging ResultsCenter for Comprehensive Informatics • TP53 mutant tumors had a smaller mean tumor sizes (p=0.002) on T2-weighted or FLAIR images. • EGFR mutant tumors were significantly larger than TP53 mutant tumors (p=0.0005). • High level EGFR amplification was associated with >5% enhancement and >5% proportion of necrosis (p < 0.01). > 5% Necrosis
  45. 45. Automated feature extraction pipeline being developed for MRICenter for Comprehensive Informatics
  46. 46. Large Scale Data Management  Implemented with IBM DB2 for large scaleCenter for Comprehensive Informatics pathology image metadata (~million markups per slide)  Represented by a complex data model capturing multi-faceted information including markups, annotations, algorithm provenance, specimen, etc.  Support for complex relationships and spatial query: multi-level granularities, relationships between markups and annotations, spatial and nested relationships
  47. 47. Data Models to Represent Feature Setsand Experimental MetadataPAIS |pās| : Pathology Analytical Imaging Standards• Provide semantically enabled data model to support pathology analytical imaging• Data objects, comprehensive data types, and flexible relationships• Object-oriented design, easily extensible• Reuse existing standards – Reuse relevant classes already defined in AIM – Follow DICOM WG 26 metadata specifications on WSI reference – Specimen information in DICOM Supplement 122 and caTissue – Use caDSR for CDE and NCI Thesaurus for ontology
  48. 48. Pathology Imaging GIS class Domain Mo... WholeSlideImageReference TMAImageReference Patient Region MicroscopyImageReference Subject 0..1Center for Comprehensive Informatics 0..1 DICOMImageReference 1 1 Specimen ImageReference 1 0..* 0..* 0..1 1 0..1 AnatomicEntity User 0..* 1 1 0..1 1 1 Equipment Group 0..1 1 0..1 1 PAIS 1 0..* Project AnnotationReference 1 0..1 0..* 1 1 1 Collection 0..* 0..* 1 0..* 0..* Annotation 0..* 0..* Markup 0..1 0..* GeometricShape 1 Image analysis 1 1 0..* 0..* 0..* 1 Observation Calculation Inference Surface Field 1 1 0..1 0..1 0..1 Provenance 0..* PAIS model PAIS data management Modeling and management of markup and annotation for querying and sharing through parallel RDBMS + spatial DBMS Segmentation HDFS data staging MapReduce based queries On the fly data processing for algorithm validation/algorithm Feature extraction sensitivity studies, or discovery of preliminary results
  49. 49. Workflow, Adaptivity and Quality of Service• In-transit data processing using filter/stream systems• Semantic Workflows• Hierarchical pipeline design with coarse and fine grained components• Adaptivity and Quality of Service
  50. 50. Computerized Classification System for Grading Neuroblastoma Initialization YesImage Tile Background? Label I=L • Background Identification No Create Image I(L) • Image Decomposition (Multi- Training Tiles resolution levels) Segmentation I = I -1 • Image Segmentation Down-sampling (EMLDA) Segmentation Feature Construction • Feature Construction (2nd Yes No order statistics, Tonal Feature Extraction I > 1?Feature Construction Features) Feature Extraction Classification • Feature Extraction (LDA) + Classification (Bayesian) Classifier Training No • Multi-resolution Layer Within Confidence Region ? Controller (Confidence Yes TRAINING Region) TESTING
  51. 51. Slides’ Preparation• 64990 x 59412 pixels in full resolution• Original Size: 10.8 Gb; Compressed Sized: ≈ 833Mb 8x 40x
  52. 52. Semantic Workflows (Wings) Collaborative Work with Yolanda Gil, Mary HallCenter for Comprehensive Informatics • A systematic strategy for composing application components into workflows • Search for the most appropriate implementation of both components and workflows • Component optimization – Select among implementation variants of the same computation – Derive integer values of optimization parameters – Only search promising code variants and a restricted parameter space • Workflow optimization – Knowledge-rich representation of workflow properties
  53. 53. Center for Comprehensive Informatics Adaptivity
  54. 54. Framework• Description Module (Wings): Describe application workflow using semantics of workflow components• Execution module (Pegasus, DataCutter, Condor): Maps to resources, generates and places fine grained filter/stream pipelines• Tradeoff Module: Schedules execution based on application level QOS
  55. 55. DataCutter Filter Stream Workflow
  56. 56. Impact
  57. 57. Conclusions• Increasingly common task: Integrative analysis of complementary multi-resolution and “omic” data sources• Elucidation of mechanisms, identify medically important disease sub-classes• Cell level whole slide analysis of whole slide images requires innovations in parallelization, data management• Integration of Radiology and Pathology has promise but pose additional challenges• Role of quality of service/on demand computing
  58. 58. Thanks to:• In silico center team: Dan Brat (Science PI), Tahsin Kurc, Ashish Sharma, Tony Pan, David Gutman, Jun Kong, Sharath Cholleti, Carlos Moreno, Chad Holder, Erwin Van Meir, Daniel Rubin, Tom Mikkelsen, Adam Flanders, Joel Saltz (Director)• caGrid Knowledge Center: Joel Saltz, Mike Caliguiri, Steve Langella co-Directors; Tahsin Kurc, Himanshu Rathod Emory leads• caBIG In vivo imaging team: Eliot Siegel, Paul Mulhern, Adam Flanders, David Channon, Daniel Rubin, Fred Prior, Larry Tarbox and many others• In vivo imaging Emory team: Tony Pan, Ashish Sharma, Joel Saltz• Emory ATC Supplement team: Tim Fox, Ashish Sharma, Tony Pan, Edi Schreibmann, Paul Pantalone• Digital Pathology R01: Foran and Saltz; Jun Kong, Sharath Cholleti, Fusheng Wang, Tony Pan, Tahsin Kurc, Ashish Sharma, David Gutman (Emory), Wenjin Chen, Vicky Chu, Jun Hu, Lin Yang, David J. Foran (Rutgers)• NIH/in silico TCGA Imaging Group: Scott Hwang, Bob Clifford, Erich Huang, Dima Hammoud, Manal Jilwan, Prashant Raghavan, Max Wintermark, David Gutman, Carlos Moreno, Lee Cooper, John Freymann, Justin Kirby, Arun Krishnan, Seena Dehkharghani, Carl Jaffe• ACTSI Biomedical Informatics Program: Marc Overcash, Tim Morris, Tahsin Kurc, Alexander Quarshie, Circe Tsui, Adam Davis, Sharon Mason, Andrew Post, Alfredo Tirado- Ramos• NSF Scientific Workflow Collaboration: Vijay Kumar, Yolanda Gil, Mary Hall, Ewa Deelman, Tahsin Kurc, P. Sadayappan, Gaurang Mehta, Karan Vahi
  59. 59. Thanks!

×