High Dimensional Fused-Informatics


Tools, methods, algorithms, software for integrative analysis of Pathology, Radiology, "omics" and outcome data.

  • Combine with next slide. Graphical representation.
  • 10 billion pixels; 1 million markups, 100 million features; quadrillion pixels; 10 trillion features
  • 1) Metadata about images
    2) Metadata about image targets, how images are derived (patient, specimen, anatomicEntity, etc.)
    3) Metadata about analyses (the purpose of the analysis, who performed the analysis, etc.)
    4) Image markups: a markup delineates a spatial region (e.g., as points, lines, polygons, multi-polygons) in images
    5) Annotation: image features: a type of annotation calculated or derived from the markups
    6) Annotation: observation: an annotation associates semantic meaning to markup entities through coded or free text terms that provide explanatory or descriptive information
    7) Provenance information, i.e., the derivation history of a markup or annotation, including algorithm information, parameters, and inputs
    Native XML database based approach: small sized PAIS documents, e.g., organ, tissue, or region level annotations; no mapping needed; supports standard XML queries
    Relational and spatial database approach: for large scale PAIS documents, e.g., analysis results at cellular or subcellular level; data mapped into relational tables and spatial objects; highly efficient on storage and queries
  • Instead, we develop a system called Hadoop-GIS, and provide a generic framework to support high performance spatial queries and analytics for spatial big data on MapReduce and CPU-GPU hybrid systems. Hadoop-GIS provides: …

    1. High Dimensional Fused-Informatics • Joel Saltz, MD, PhD • Chair, Biomedical Informatics, Stony Brook University • Associate Director for Informatics, Stony Brook Cancer Center
    2. Integrative Biomedical Informatics Analysis • Reproducible anatomic/functional characterization at fine level (Pathology) and gross level (Radiology) • Integration of anatomic/functional characterization, multiple types of “omic” information, outcome • Predict treatment outcome, select, monitor treatments • Integrated analysis and presentation of observations, features • [Figure labels: Radiology Imaging, Patient Outcome, Pathologic Features, “Omic” Data]
    3. Pathology and Radiology imaging have different properties in roles of discovery and aggressiveness potential • Differences – arise from differing capabilities & need not completely correspond – sampling differences & global properties – differing purposes • discovery, staging, IMRT/brachyRx planning – Pathology: high spatial and increasing molecular resolution – Radiology: global view, temporal information, increasing spatial resolution (Carl Jaffe)
    4. Correlating Imaging Phenotypes with Genomic Signatures: Scientific Opportunities (Imaging Genomics Workshop, NCI, June 2013) • Clinical Approach and Use • Development of imaging + analysis methods to characterize heterogeneity • within a tumor at one time point • evolution over time • among different tumor types • Development of imaging metrics that: • can predict and detect emergence of resistance? • correlate with genomic heterogeneity? • correlate with habitat heterogeneity? • can identify more homogeneous sub-types
    5. VASARI Feature Set
    6. Pathology Analytical Imaging • Provide rich information about morphological and functional characteristics • Image analysis, feature extraction on multiple scales • Spatially mapped “omics” • Multiple microscopy modalities • [Figure labels: Glass Slides, Scanning, Whole Slide Images, Image Analysis]
    7. [Figure labels: Morphological Tissue Classification; Nuclei Segmentation; Cellular Features; Whole Slide Imaging] (Lee Cooper, Jun Kong)
    8. Quantitative Feature Analysis in Pathology: Emory In Silico Center for Brain Tumor Research (PI = Dan Brat, PD = Joel Saltz) • NLM/NCI: Integrative Analysis/Digital Pathology R01LM011119, R01LM009239 (Dual PIs Joel Saltz, David Foran)
    9. Millions of Nuclei Defined by n Features • Top-down analysis: analyze features in context of existing diagnostic constructs • Bottom-up analysis: let nuclear features define and drive the analysis
    10. Direct Study of Relationship Between [figure] vs. [figure] (Lee Cooper, Carlos Moreno)
    11. Clustering identifies three morphological groups • Analyzed 200 million nuclei from 162 TCGA GBMs (462 slides) • Named for functions of associated genes: Cell Cycle (CC), Chromatin Modification (CM), Protein Biosynthesis (PB) • Prognostically significant (log-rank p=4.5e-4) • [Figures: feature indices by group (CC, CM, PB); survival vs. days for CC, CM, PB]
    12. Associations
    13. Millions of Nuclei Defined by n Features • Top-down analysis: use the features with existing diagnostic constructs • Bottom-up analysis: let features define and drive the analysis
    14. Nuclear Analysis Workflow • Step 1: Nuclei Segmentation • Step 2: Feature Extraction: describe individual nuclei in terms of size, shape, and texture
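The feature extraction step on this slide can be illustrated with a minimal sketch. It assumes each segmented nucleus is available as a polygon boundary (a list of vertices), which the slide does not state, and it computes only size and shape features (area, perimeter, circularity); texture needs pixel data and is left out. The function names are hypothetical and are not taken from the authors' pipeline.

```python
import math

def polygon_area(points):
    """Area by the shoelace formula; points are (x, y) vertices in order."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def polygon_perimeter(points):
    """Sum of edge lengths, closing the boundary back to the first vertex."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))

def shape_features(points):
    """Size and shape descriptors for one nucleus boundary (hypothetical helper)."""
    area = polygon_area(points)
    perimeter = polygon_perimeter(points)
    # Circularity is 4*pi*A / P^2: 1.0 for a circle, lower for irregular shapes.
    circularity = 4.0 * math.pi * area / perimeter ** 2 if perimeter else 0.0
    return {"area": area, "perimeter": perimeter, "circularity": circularity}

# Toy boundary: a 4 x 4 square standing in for a segmented nucleus.
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
features = shape_features(square)
```

A square yields a circularity of pi/4, which shows how the descriptor falls below 1.0 as a boundary departs from a circle.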
    15. Step 3: Nuclei Classification • [Figure labels: Oligodendroglioma, Astrocytoma; Nuclear Qualities scale, 1 to 10]
    16. Survival Analysis • [Figure panels: Human, Machine]
    17. Gene Expression Correlates of High Oligo-Astro Ratio on Machine-based Classification • Oligo-related genes: Myelin Basic Protein, Proteolipoprotein, HoxD1 • Nuclear features most associated with oligo signature genes: circularity (high), eccentricity (low)
    18. Role of Microenvironment • Necrosis in TCGA GBM tissue samples vs. Verhaak transcriptional class • Mesenchymal transcriptional class: greater levels of necrosis than other classes • Gene expression signatures of non-mesenchymal GBMs became more similar to the mesenchymal signature with increasing levels of necrosis
    19. Microenvironment and Master Regulators • Extent of Necrosis Related Expression of Master Regulators of the Mesenchymal Transition • [Figure: Necrosis and C/EBP-β]
    20. Computation and Data Management: Requirements and Challenges • Explosion of derived data – 10^5 x 10^5 pixels per image – 1 million objects per image – Hundreds to thousands of images per study • High computational complexity – Image analysis, feature extraction, machine learning pipelines – Spatial queries involve heavy-duty geometric computations
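The scale figures on this slide can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch: the per-image numbers come from the slide, while the two study sizes (100 and 1,000 images) are illustrative endpoints chosen here for "hundreds to thousands".

```python
# Per-image figures from the slide.
pixels_per_image = 10**5 * 10**5       # a 10^5 x 10^5 image
objects_per_image = 10**6              # segmented objects (e.g. nuclei)

# Illustrative study sizes for "hundreds to thousands of images per study".
study_sizes = (100, 1000)

objects_per_study = {n: n * objects_per_image for n in study_sizes}
pixels_per_study = {n: n * pixels_per_image for n in study_sizes}

for n in study_sizes:
    print(f"{n} images: {pixels_per_study[n]:.0e} pixels, "
          f"{objects_per_study[n]:.0e} objects")
```

A single image is therefore 10 billion pixels, and a study of this size carries on the order of 10^8 to 10^9 derived objects.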
    21. Projection – 2025 • 100K – 1M pathology slides/hospital/year • 2GB compressed per slide • 1-10 slides used for pathologist computer-aided diagnosis • 100-10K slides used in hospital quality control • Groups of 100K+ slides used for clinical research studies, combined with molecular, outcome data
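The storage implied by this projection follows directly from the slide's two numbers. The sketch below assumes decimal units (1 TB = 1,000 GB) and a hypothetical helper name; it is arithmetic only, not a capacity plan.

```python
GB_PER_SLIDE = 2  # compressed size per slide, from the slide

def yearly_storage_tb(slides_per_year, gb_per_slide=GB_PER_SLIDE):
    """Compressed storage per hospital per year, in decimal terabytes."""
    return slides_per_year * gb_per_slide / 1000

low_tb = yearly_storage_tb(100_000)      # lower bound: 100K slides/year
high_tb = yearly_storage_tb(1_000_000)   # upper bound: 1M slides/year
print(f"{low_tb:.0f} TB to {high_tb:.0f} TB per hospital per year")
```

Under these assumptions the range is roughly 200 TB to 2 PB of compressed slide data per hospital per year, before any derived markups or features are stored.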
    22. HPC: Tools for Image Analysis, Feature Extraction, Machine Learning Pipelines
    23. HPC Whole Slide Segmentation and Feature Extraction Pipeline (Tony Pan, George Teodoro, Tahsin Kurc and Scott Klasky)
    24. Titan – Peak Speed 30,000,000,000,000,000 floating point operations per second!
    25. Large Scale Data Management • Data model capturing multi-faceted information including markups, annotations, algorithm provenance, specimen, etc. • Support for complex relationships and spatial query: multi-level granularities, relationships between markups and annotations, spatial and nested relationships • Highly optimized spatial query and analyses • Implemented in a variety of ways including optimized CPU/GPU, Hadoop/HDFS and IBM DB2
    26. Spatial Centric – Pathology Imaging “GIS” • Point query: human-marked point inside a nucleus • Window query: return markups contained in a rectangle • Spatial join query: algorithm validation/comparison • Containment query: nuclear feature aggregation in tumor regions (Fusheng Wang)
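The four query types on this slide can be sketched in plain Python. This is an illustrative toy, not the authors' system: markups are assumed to be simple polygons keyed by id, the spatial join is only a bounding-box filter (no exact polygon intersection), containment is decided by vertex-mean centroid, and all function names are hypothetical.

```python
def bbox(poly):
    """Axis-aligned bounding box (xmin, ymin, xmax, ymax) of a polygon."""
    xs = [x for x, _ in poly]
    ys = [y for _, y in poly]
    return min(xs), min(ys), max(xs), max(ys)

def point_in_polygon(pt, poly):
    """Ray-casting test: count edge crossings to the right of the point."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def point_query(pt, markups):
    """Point query: which markups contain a human-marked point?"""
    return [m for m, poly in markups.items() if point_in_polygon(pt, poly)]

def window_query(window, markups):
    """Window query: markups whose bounding box lies inside the rectangle."""
    wx0, wy0, wx1, wy1 = window
    hits = []
    for m, poly in markups.items():
        x0, y0, x1, y1 = bbox(poly)
        if wx0 <= x0 and wy0 <= y0 and x1 <= wx1 and y1 <= wy1:
            hits.append(m)
    return hits

def spatial_join(set_a, set_b):
    """Spatial join (filter step only): pairs with overlapping bounding boxes."""
    pairs = []
    for a, pa in set_a.items():
        ax0, ay0, ax1, ay1 = bbox(pa)
        for b, pb in set_b.items():
            bx0, by0, bx1, by1 = bbox(pb)
            if ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1:
                pairs.append((a, b))
    return pairs

def containment_mean(region, markups, feature):
    """Containment query: mean feature of markups whose centroid is in region."""
    values = []
    for m, poly in markups.items():
        cx = sum(x for x, _ in poly) / len(poly)
        cy = sum(y for _, y in poly) / len(poly)
        if point_in_polygon((cx, cy), region):
            values.append(feature[m])
    return sum(values) / len(values) if values else None

# Toy data: two nuclei from one algorithm, one nucleus from another.
nuclei = {"n1": [(0, 0), (2, 0), (2, 2), (0, 2)],
          "n2": [(5, 5), (7, 5), (7, 7), (5, 7)]}
other = {"a1": [(1, 1), (3, 1), (3, 3), (1, 3)]}
tumor_region = [(0, 0), (10, 0), (10, 10), (0, 10)]

hit = point_query((1, 1), nuclei)
in_window = window_query((0, 0, 3, 3), nuclei)
joined = spatial_join(other, nuclei)
mean_area = containment_mean(tumor_region, nuclei, {"n1": 4.0, "n2": 6.0})
```

At the scale described in the talk, a brute-force scan like this would be far too slow; it is meant only to pin down what each query returns.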
    27. PAIS (Pathology Analytical Imaging Standards) • PAIS Logical Model – 62 UML classes – markups, annotations, imageReferences, provenance • PAIS Data Representation – XML (compressed) or HDF5 • PAIS Databases – loading, managing, querying, and sharing data – Native XML DBMS or RDBMS + SDBMS • [UML class diagram; classes shown: Annotation, GeometricShape, Calculation, Observation, Specimen, ImageReference, Provenance, User, PAIS, Equipment, Group, AnatomicEntity, Subject, Field, Project, MicroscopyImageReference, DICOMImageReference, TMAImageReference, Markup, Inference, Region, WholeSlideImageReference, Patient, Surface, Collection, AnnotationReference] (Fusheng Wang)
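To make the relationship between markups, annotations, and provenance concrete, here is a small sketch using Python dataclasses. It borrows four class names from the diagram (Markup, Calculation, Observation, Provenance), but every field and the container class are assumptions made for illustration; this is not the PAIS schema, which has 62 classes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Provenance:
    """Derivation history: algorithm, parameters, inputs (hypothetical fields)."""
    algorithm: str
    parameters: Dict[str, float] = field(default_factory=dict)
    inputs: List[str] = field(default_factory=list)

@dataclass
class Markup:
    """A spatial region delineated in an image, here a single polygon."""
    markup_id: str
    image_id: str
    polygon: List[Tuple[float, float]]
    provenance: Provenance

@dataclass
class Calculation:
    """An annotation holding a feature computed from a markup."""
    markup_id: str
    name: str
    value: float

@dataclass
class Observation:
    """An annotation attaching a coded or free-text term to a markup."""
    markup_id: str
    term: str

@dataclass
class AnalysisDocument:
    """Hypothetical container for one analysis run over one image."""
    markups: List[Markup] = field(default_factory=list)
    calculations: List[Calculation] = field(default_factory=list)
    observations: List[Observation] = field(default_factory=list)

    def features_of(self, markup_id: str) -> Dict[str, float]:
        """Collect all computed features attached to one markup."""
        return {c.name: c.value for c in self.calculations
                if c.markup_id == markup_id}

# Toy document: one nucleus, two features, one observation.
prov = Provenance("toy-segmenter", {"threshold": 0.5}, ["slide-001"])
doc = AnalysisDocument(
    markups=[Markup("m1", "slide-001", [(0, 0), (2, 0), (2, 2), (0, 2)], prov)],
    calculations=[Calculation("m1", "area", 4.0),
                  Calculation("m1", "perimeter", 8.0)],
    observations=[Observation("m1", "nucleus")],
)
m1_features = doc.features_of("m1")
```

Keeping provenance on the markup rather than on the document lets results from different algorithms or parameter settings coexist for the same image.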
    28. High Performance Spatial Queries and Analytics: Hadoop-GIS • General framework to support high performance spatial queries and analytics for spatial big data on MapReduce and CPU-GPU hybrid platforms • Spatial data processing methods and pipelines with spatial partition level parallelism running on MapReduce • Multi-level indexing methods to accelerate spatial data processing • Declarative spatial queries and translation into MapReduce operations • Utilize GPU to parallelize spatial operations and integrate them into MapReduce [VLDB’12, GIS’12, GIS’13, VLDB’13]
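The idea of spatial-partition-level parallelism on MapReduce can be shown with a single-process imitation of the map, shuffle, and reduce phases. This sketch is an assumption-laden illustration and not the Hadoop-GIS implementation: it uses fixed square tiles, bounding boxes instead of polygons, and hypothetical function names and tile size.

```python
from collections import defaultdict
from itertools import product

TILE = 10.0  # hypothetical tile edge length, in pixels

def tiles_for(box, tile=TILE):
    """All tile keys that a bounding box (xmin, ymin, xmax, ymax) touches."""
    xmin, ymin, xmax, ymax = box
    xs = range(int(xmin // tile), int(xmax // tile) + 1)
    ys = range(int(ymin // tile), int(ymax // tile) + 1)
    return [(tx, ty) for tx in xs for ty in ys]

def map_phase(dataset, boxes):
    """Map: emit (tile, record) for every tile an object overlaps."""
    for obj_id, box in boxes.items():
        for tile in tiles_for(box):
            yield tile, (dataset, obj_id, box)

def shuffle(records):
    """Shuffle: group records by tile key, as the framework would."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return groups

def overlaps(a, b):
    """True when two bounding boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def reduce_phase(values):
    """Reduce: join the two datasets inside one tile, independently of others."""
    left = [v for v in values if v[0] == "A"]
    right = [v for v in values if v[0] == "B"]
    for (_, id_a, box_a), (_, id_b, box_b) in product(left, right):
        if overlaps(box_a, box_b):
            yield id_a, id_b

def partitioned_join(a_boxes, b_boxes):
    """Run map, shuffle, reduce, then drop duplicates from boundary objects."""
    records = list(map_phase("A", a_boxes)) + list(map_phase("B", b_boxes))
    pairs = set()
    for values in shuffle(records).values():
        pairs.update(reduce_phase(values))
    return sorted(pairs)

# Toy data: results of two segmentation runs on the same image.
run_a = {"a1": (0, 0, 5, 5), "a2": (8, 8, 12, 12)}
run_b = {"b1": (4, 4, 9, 9), "b2": (30, 30, 31, 31)}
matches = partitioned_join(run_a, run_b)
```

Because each tile is reduced on its own, the reduce calls could run on separate workers; the final set is needed since an object that crosses a tile boundary is replicated to several tiles and may match the same partner more than once.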
    29. MICCAI 2014 Brain Tumor Classification and Segmentation Challenges • TCGA, TCIA • Imaging Challenge • Digital Pathology Challenge • Phase 1: Training, June 20 - July 31 • Phase 2: Leader Board, Aug 1 - Aug 29 • Phase 3: Test, Sept 8 - Sept 12 • For more information about these challenges and a related workshop on September 14, 2014 at MICCAI in Boston, see: cancerimagingarchive.net • MICCAI: Medical Image Computing and Computer Assisted Intervention - MICCAI2014.org • TCGA: The Cancer Genome Atlas - cancergenome.nih.gov • TCIA: The Cancer Imaging Archive - cancerimagingarchive.net
    30. Digital Pathology/Brain Tumor Image Segmentation (BRATS) • Use data currently available through data archive resources of the National Institutes of Health (NIH), namely The Cancer Genome Atlas (TCGA) and The Cancer Imaging Archive (TCIA) • The Digital Pathology challenge will use digital slides related to patients whose genomics data are available from TCGA. Similarly, the BRATS 2014 Challenge will use clinical MRI image data, also from the TCGA study subjects. • Proposed outcome of RSNA/ASCP workshop – Coordinated Pathology/Radiology 2015 challenge – feature selection and statistical/machine learning algorithms to leverage Radiology, Pathology and “omic” features to predict outcome, response to treatment
    31. Thanks!