Indiana 4 2011 Final Final

Presentation at the University of Indiana Spring 2011

    Presentation Transcript

    • Integrative Multi-Scale Biomedical Informatics
      Joel Saltz MD, PhD
      Director Center for Comprehensive Informatics
    • Squeezing Information from Spatial Datasets
      Leverage exascale data and computer resources to squeeze the most out of image, sensor or simulation data
      Run lots of different algorithms to derive same features
      Run lots of algorithms to derive complementary features
      Data models and data management infrastructure to manage data products, feature sets and results from classification and machine learning algorithms
    • Outline
      Integrative biomedical informatics analysis: feature sets obtained from Pathology and Radiology studies
      Techniques, tools and methodologies for derivation, management and analysis of feature sets
    • Integrative Biomedical Informatics Analysis
      Reproducible anatomic/functional characterization at gross level (Radiology) and fine level (Pathology)
      Integration of anatomic/functional characterization with multiple types of “omic” information
      Create categories of jointly classified data to describe pathophysiology, predict prognosis, response to treatment
    • In Silico Center for Brain Tumor Research
      Specific Aims:
      1. Influence of necrosis/hypoxia on gene expression and genetic classification.
      2. Molecular correlates of high resolution nuclear morphometry.
      3. Gene expression profiles that predict glioma progression.
      4. Molecular correlates of MRI enhancement patterns.
    • Integration of heterogeneous multiscale information
      Patient Outcome
      • Coordinated initiatives Pathology, Radiology, “omics”
      • Exploit synergies between all initiatives to improve ability to forecast survival & response.
      Pathologic Features
    • Example: Pathology and Gene Expression Joint Predictors of Recurrence/Survival
      Lee Cooper Carlos Moreno
    • Feature Characterization in Pathology and Radiology
      Role – In silico Brain Tumor Research
      Scaling Requirements
    • In Silico Center for Brain Tumor Research
      Key Data Sets
      REMBRANDT: Gene expression and genomics data set of all glioma subtypes
      The Cancer Genome Atlas (TCGA): Rich “omics” set of GBM, digitized Pathology and Radiology
      Pathology and Radiology Images from Henry Ford Hospital, Emory, Thomas Jefferson U, MD Anderson and others
    • TCGA Research Network
      Digital Pathology
    • Progression to GBM
      Anaplastic Astrocytoma (WHO grade III)
      Glioblastoma (WHO grade IV)
    • TCGA Neuropathology Attributes
      120 TCGA specimens; 3 Reviewers
      Presence and Degree of:
      Microvascular hyperplasia
      • Complex/glomeruloid
      • Endothelial hyperplasia
      • Pseudopalisading pattern
      • Zonal necrosis
      • Macrophages/histiocytes
      • Lymphocytes
      • Neutrophils
      • Small cell component
      • Gemistocytes
      • Oligodendroglial
      • Multi-nucleated/giant cells
      • Epithelial metaplasia          
      • Mesenchymal metaplasia
      Other Features
      • Perineuronal/perivascular satellitosis
      • Entrapped gray or white matter
      • Micro-mineralization  
    • Distinguishing Characteristic in Gliomas
      Nuclear Qualities
      Round shaped with smooth regular texture
      Elongated with rough, irregular texture
      Use image analysis algorithms to segment and classify microanatomic features (Nuclei, Astrocytoma, Necrosis ...) in whole slide images
      Represent the segmentation and classification in a well defined structured format that can be used to correlate the pathology with other data modalities
    • TCGA Whole Slide Images
      Feature Extraction
      Jun Kong
    • Astrocytoma vs Oligodendroglioma: Overlap in genetics, gene expression, histology
      • Assess nuclear size (area and perimeter), shape (eccentricity, circularity, major axis, minor axis, Fourier shape descriptor and extent ratio), intensity (average, maximum, minimum, standard error) and texture (entropy, energy, skewness and kurtosis).
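As a rough illustration of the feature families listed on this slide, the sketch below computes a subset of them for one segmented nucleus with NumPy. It is not the pipeline used in the talk: the function name `nuclear_features`, the moment-based axis estimates, and the histogram-based texture measures are assumptions made for this example, and perimeter, circularity and Fourier shape descriptors are left out. It assumes a nucleus of more than a few pixels that is not a single straight line.

```python
import numpy as np


def nuclear_features(mask, intensity):
    """Size, shape, intensity and texture features for one segmented nucleus.

    mask: 2-D boolean array, True inside the nucleus.
    intensity: 2-D array of the same shape with gray values in [0, 255].
    """
    ys, xs = np.nonzero(mask)
    area = float(xs.size)

    # Shape from second-order moments of the pixel coordinates: for an
    # ellipse, the variance along each principal axis is (semi-axis)^2 / 4.
    small, large = np.linalg.eigvalsh(np.cov(np.vstack([xs, ys])))
    major_axis = 4.0 * np.sqrt(large)
    minor_axis = 4.0 * np.sqrt(max(small, 0.0))
    eccentricity = np.sqrt(max(1.0 - small / large, 0.0))

    # Extent ratio: nucleus area over the area of its bounding box.
    bbox_area = (xs.max() - xs.min() + 1) * (ys.max() - ys.min() + 1)
    extent = area / bbox_area

    values = intensity[mask].astype(float)
    mean, std = values.mean(), values.std()
    z = (values - mean) / std if std > 0 else np.zeros_like(values)

    # Texture from the gray-level histogram (first-order statistics).
    p = np.histogram(values, bins=256, range=(0, 256))[0] / values.size
    p = p[p > 0]

    return {
        "area": area,
        "major_axis": float(major_axis),
        "minor_axis": float(minor_axis),
        "eccentricity": float(eccentricity),
        "extent": float(extent),
        "mean": float(mean),
        "max": float(values.max()),
        "min": float(values.min()),
        "std_error": float(std / np.sqrt(values.size)),
        "entropy": float(-(p * np.log2(p)).sum()),
        "energy": float((p ** 2).sum()),
        "skewness": float((z ** 3).mean()),
        "kurtosis": float((z ** 4).mean()),
    }


# Synthetic elongated "nucleus": an ellipse with semi-axes 20 and 10 pixels.
yy, xx = np.mgrid[:64, :64]
demo_mask = ((xx - 32) / 20.0) ** 2 + ((yy - 32) / 10.0) ** 2 <= 1.0
demo_image = xx * 3.0  # simple intensity ramp, purely illustrative
features = nuclear_features(demo_mask, demo_image)
```

On the synthetic ellipse the eccentricity comes out close to that of an ideal 2:1 ellipse; on real slides the mask would come from the segmentation step rather than a formula.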
    • Machine-based Classification of TCGA GBMs (J Kong)
      Whole slide scans from 14 TCGA GBMs (69 slides)
      7 purely astrocytic in morphology; 7 with 2+ oligo component
      399,233 nuclei analyzed for astro/oligo features
      Cases were categorized based on ratio of oligo/astro cells
      TCGA Gene Expression Query: c-Met overexpression
    • Clustergram of selected features used in consensus clustering
      Nuclear Features Used to Classify GBMs
    • Nuclear Features Used to Classify GBMs
      Consensus clustering of morphological signatures
      Study includes 200 million nuclei taken from 480 slides corresponding to 167 distinct patients.
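The idea behind consensus clustering can be sketched as follows: cluster many random subsamples of the per-patient signatures, then record how often each pair of patients lands in the same cluster. This is a minimal sketch under stated assumptions, not the study's code: the base clusterer (a plain k-means with farthest-first initialisation), the 80% subsampling rate, the number of runs, and the function names are all choices made for this example.

```python
import numpy as np


def kmeans_labels(points, k, n_iter=20):
    """Plain Lloyd's k-means with deterministic farthest-first initialisation."""
    centers = [points[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dist)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old centre if a cluster empties
                centers[j] = points[labels == j].mean(axis=0)
    return labels


def consensus_matrix(features, k, n_runs=30, frac=0.8, seed=0):
    """Fraction of co-sampled runs in which each pair shares a cluster."""
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    together = np.zeros((n, n))
    sampled = np.zeros((n, n))
    size = max(k, int(frac * n))
    for _ in range(n_runs):
        idx = rng.choice(n, size=size, replace=False)
        labels = kmeans_labels(features[idx], k)
        same = (labels[:, None] == labels[None, :]).astype(float)
        sampled[np.ix_(idx, idx)] += 1.0
        together[np.ix_(idx, idx)] += same
    return together / np.maximum(sampled, 1.0)


# Toy "morphological signatures": two well separated groups of 20 patients.
demo_rng = np.random.default_rng(1)
signatures = np.vstack([
    demo_rng.normal(0.0, 0.3, size=(20, 4)),
    demo_rng.normal(5.0, 0.3, size=(20, 4)),
])
consensus = consensus_matrix(signatures, k=2)
```

Entries near 1 mark pairs that cluster together in almost every run, entries near 0 mark pairs that almost never do; the stable blocks of this matrix are what a clustergram displays.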
    • Survival of morphological clusters
    • Survival of patients by molecular tumor subtype
    • Articulate Physical Interpretations of Results
    • Images
    • Multiscale Systems Biology
      Employ multi-resolution methods to characterize necrosis, angiogenesis and correlate these with “omics”
      Vascular Changes
      Rapid progression
      No enhancement
      Normal Vessels
      Stable lesion
    • Correlation of Necrosis, Angiogenesis and “omics”
      GBMs display variable and regionally heterogeneous degrees of necrosis (asterisk) and angiogenesis
      These factors may impact gene expression profiles
    • Genes Correlated with Necrosis include Transcription Factors Identified as Regulators of the Mesenchymal Transition
      • Frozen sections from 88 GBM samples marked to identify regions of necrosis and angiogenesis
      • Extent of both necrosis and angiogenesis calculated as a percentage of total tissue area
      Carro MS, et al. Nature 463: 318-25, 2010
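The percentage-of-tissue-area measurement can be sketched as a pixel count over a labelled mask. The label codes and the function name below are hypothetical; the sketch only assumes that each marked frozen section is available as an integer mask with background, tissue, necrosis and angiogenesis labels.

```python
import numpy as np

# Hypothetical label codes for a marked-up section.
BACKGROUND, TISSUE, NECROSIS, ANGIOGENESIS = 0, 1, 2, 3


def percent_of_tissue(label_mask, region_label):
    """Area of one marked region as a percentage of total tissue area."""
    tissue_pixels = np.count_nonzero(label_mask != BACKGROUND)
    if tissue_pixels == 0:
        return 0.0
    return 100.0 * np.count_nonzero(label_mask == region_label) / tissue_pixels


# Toy section: 36 tissue pixels, 12 of them necrotic, 3 marked as angiogenesis.
section = np.zeros((10, 10), dtype=int)
section[2:8, 2:8] = TISSUE
section[2:4, 2:8] = NECROSIS
section[7, 2:5] = ANGIOGENESIS

necrosis_pct = percent_of_tissue(section, NECROSIS)
angiogenesis_pct = percent_of_tissue(section, ANGIOGENESIS)
```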
    • Feature Sets in Radiology (Adam Flanders, TJU; Dan Rubin, Stanford; Lori Dodd, NCI)
      Require standardized validated feature sets to describe de novo disease.
      Fundamental obstacle to new imaging criteria as treatment biomarkers is lack of standard terminology:
      To define a comprehensive set of imaging features of cancer
      For reporting imaging results
      To provide a more quantitative, reproducible basis for assessing baseline disease and treatment response
    • Defining Rich Set of Qualitative and Quantitative Image Biomarkers
      Community-driven ontology development project; collaboration with ASNR
      Imaging features (5 categories)
      Location of lesion
      Morphology of lesion margin (definition, thickness, enhancement, diffusion)
      Morphology of lesion substance (enhancement, PS characteristics, focality/multicentricity, necrosis, cysts, midline invasion, cortical involvement, T1/FLAIR ratio)
      Alterations in vicinity of lesion (edema, edema crossing midline, hemorrhage, pial invasion, ependymal invasion, satellites, deep WM invasion, calvarial remodeling)
      Resection features (extent of nCE tissue, CE tissue, resected components)
    • Imaging Predictors of survival and molecular profiles in the TCGA Glioblastoma Data set
      The TCGA glioma working group
      (1) Emory University Hospital, Atlanta, GA; (2) National Cancer Institute, Bethesda, MD; (3) Thomas Jefferson University Hospital, Philadelphia, PA; (4) Henry Ford University Hospital, Detroit, MI; (5) National Institutes of Health, Bethesda, MD; (6) Boston University School of Medicine, Boston, MA; (7) SAIC-Frederick, Inc., Frederick, MD; (8) University of Virginia, Charlottesville, VA; (9) Northwestern University, Chicago, IL
    • Assumed Dependence Between Features
      F1: Tumor Location
      F3: Eloq. Brain
      F5: Prop. Enhance
      F6: Prop. nCET
      F7: Prop. Necrosis
      F9: Distribution
      F10: T1/FLAIR
      F11: En. Marg.Thick.
      F13: Def. Non.Marg.
      F14: Prop. Edema
      F16: Hemorrhage
      F18: Pial Invasion
      F19: Ependymal
      F20: Cort. Involve.
      F21: Deep WM Inv.
      F22: nCET Cross. Mid.
      F24: Satellites
      NOTE: each feature omitted from this graph is independent of every other feature.
      Slide thanks to Eric Huang, NIH
    • Estimation Problem Size Reduction
      Can ignore seven of the features, so the size of the contingency table is reduced from 2.64 × 10^12 cells to 1.34 × 10^10 cells.
      Collapsibility reduces size of contingency table even further:
      Any binary feature Fj connected to only one feature on graph (i.e. given the feature Fj is connected to, Fj is independent of all other features) can also be ignored
      Eliminates need to deal with Hemorrhage (F16), Pial Invasion (F18), Cortical Involvement (F20), and Satellites (F24).
      Reduces size of contingency table to 1.68 × 10^9 cells.
      Additional analogous considerations can be used to reduce size of contingency table by more than two additional orders of magnitude
      Slide thanks to Eric Huang, NIH
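The size of a full contingency table is the product of the number of levels of each feature, so every feature that can be ignored divides the table by its level count. The sketch below shows only that arithmetic. The per-feature level counts are hypothetical (the slide does not list them) and the helper names are invented for this example, so the totals are not meant to reproduce the figures quoted above.

```python
from math import prod


def table_cells(levels):
    """Number of cells in the full contingency table over these features."""
    return prod(levels.values())


def drop_features(levels, names):
    """Level counts that remain after ignoring the named features."""
    return {name: n for name, n in levels.items() if name not in names}


# Hypothetical level counts, for illustration only.
levels = {
    "F1 Tumor Location": 6,
    "F5 Prop. Enhance": 5,
    "F14 Prop. Edema": 5,
    "F16 Hemorrhage": 2,
    "F18 Pial Invasion": 2,
}

full_size = table_cells(levels)
reduced = drop_features(levels, {"F16 Hemorrhage", "F18 Pial Invasion"})
reduced_size = table_cells(reduced)
print(full_size, reduced_size)
```

Dropping a binary feature halves the table, which is why ignoring features matters so much when the full table is far larger than the number of cases.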
    • Correlative Imaging Results
      Minimal enhancing tumor (≤5%) strongly associated with Proneural classification (p=0.0006).
      >5% proportion of necrosis associated with the presence of microvascular hyperplasia in pathology slides (p=0.008).
      Greater maximum tumor dimension (T2 signal) associated with present/abundant microvascular hyperplasia (p=0.001).
      < 5% Enhancement
    • Correlative Imaging Results
      TP53 mutant tumors had smaller mean tumor sizes (p=0.002) on T2-weighted or FLAIR images.
      EGFR mutant tumors were significantly larger than TP53 mutant tumors (p=0.0005).
      High level EGFR amplification was associated with >5% enhancement and >5% proportion of necrosis (p < 0.01).
      > 5% Necrosis
    • Squeezing Information from Spatial Datasets
      Leverage exascale data and computer resources to squeeze the most out of image, sensor or simulation data
      Run lots of different algorithms to derive same features
      Run lots of algorithms to derive complementary features
      Data models and data management infrastructure to manage data products, feature sets and results from classification and machine learning algorithms
    • Pipeline for Whole Slide Feature Characterization
      10^10 pixels for each whole slide image
      10 whole slide images per patient
      10^8 image features per whole slide image
      10,000 brain tumor patients
      10^15 pixels
      10^13 features
      Hundreds of algorithms
      Annotations and markups from dozens of humans
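The totals on this slide follow from multiplying the per-slide figures (10^10 pixels and 10^8 features per whole slide image) by the number of slides, as this short check shows. The constant names are chosen for this example.

```python
PIXELS_PER_SLIDE = 10**10
FEATURES_PER_SLIDE = 10**8
SLIDES_PER_PATIENT = 10
PATIENTS = 10_000

total_slides = SLIDES_PER_PATIENT * PATIENTS        # 10^5 slides
total_pixels = PIXELS_PER_SLIDE * total_slides      # 10^15 pixels
total_features = FEATURES_PER_SLIDE * total_slides  # 10^13 features

print(f"{total_pixels:.0e} pixels, {total_features:.0e} features")
```

These counts are for a single algorithm; running hundreds of algorithms multiplies the feature total accordingly.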
    • PAIS Database
      • Implemented with IBM DB2 for large scale pathology image metadata (~million markups per slide)
      • Represented by a complex data model capturing multi-faceted information including markups, annotations, algorithm provenance, specimen, etc.
      • Support for complex relationships and spatial query: multi-level granularities, relationships between markups and annotations, spatial and nested relationships
      • Support for high-level data statistical analysis
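To make the notion of a spatial query over markups concrete, here is a small in-memory sketch that selects the markups whose bounding boxes fall inside a region of interest, optionally restricted to one algorithm. It does not reflect the PAIS schema or DB2 spatial SQL: the `Markup` class, its fields, and the function names are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass(frozen=True)
class Markup:
    markup_id: int
    algorithm: str   # provenance: which algorithm produced the markup
    bbox: Box        # bounding box of the segmented object
    annotation: str  # e.g. a classification label


def contains(outer: Box, inner: Box) -> bool:
    """True if the inner box lies entirely within the outer box."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def markups_within(markups: List[Markup], region: Box,
                   algorithm: Optional[str] = None) -> List[Markup]:
    """Markups fully inside a region, optionally from one algorithm only."""
    return [m for m in markups
            if contains(region, m.bbox)
            and (algorithm is None or m.algorithm == algorithm)]


demo_markups = [
    Markup(1, "seg-A", (10, 10, 20, 20), "nucleus"),
    Markup(2, "seg-B", (12, 11, 21, 19), "nucleus"),
    Markup(3, "seg-A", (500, 500, 520, 515), "nucleus"),
]
hits = markups_within(demo_markups, region=(0, 0, 100, 100), algorithm="seg-A")
```

With about a million markups per slide, a database would answer the same question through a spatial index rather than a linear scan.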
    • Data Models to Represent Feature Sets and Experimental Metadata
      PAIS |pās| : Pathology Analytical Imaging Standards
      Provide semantically enabled data model to support pathology analytical imaging
      Data objects, comprehensive data types, and flexible relationships
      Object-oriented design, easily extensible
      Reuse existing standards
      Reuse relevant classes already defined in AIM
      Follow DICOM WG 26 metadata specifications on WSI reference
      Specimen information in DICOM Supplement 122 and caTissue
      Use caDSR for CDE and NCI Thesaurus for ontology concepts
    • Pathology Imaging GIS
      Image analysis
      PAIS model
      PAIS data management
      Modeling and management of markup and annotation for querying and sharing through parallel RDBMS + spatial DBMS
      HDFS data staging
      MapReduce based queries
      On the fly data processing for algorithm validation/algorithm sensitivity studies, or discovery of preliminary results
      Feature extraction
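A MapReduce-style query over staged markup records can be sketched in plain Python: the map step keys each record by the tile it falls in, and the reduce step aggregates per tile. This is an in-process illustration only, not Hadoop code; the tile size, the `(x, y, area)` record layout, and the function names are assumptions made for this example.

```python
from collections import defaultdict

TILE = 4096  # hypothetical tile edge length in pixels


def map_markup(record):
    """Map step: emit (tile key, nucleus area) for one markup record."""
    x, y, area = record
    yield (x // TILE, y // TILE), area


def reduce_tile(key, values):
    """Reduce step: count and mean area of the markups in one tile."""
    values = list(values)
    return key, {"count": len(values), "mean_area": sum(values) / len(values)}


def run_mapreduce(records):
    """Group mapped values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_markup(record):
            groups[key].append(value)
    return dict(reduce_tile(key, values) for key, values in groups.items())


demo_records = [(100, 200, 50.0), (300, 400, 70.0), (5000, 100, 30.0)]
per_tile = run_mapreduce(demo_records)
```

Because each tile is reduced independently, the same query can be spread across many nodes, which suits quick algorithm validation and sensitivity studies.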
    • Generation and Analysis of Imaging Features
      In-transit data processing using filter/stream systems
      Semantic Workflows
      Hierarchical pipeline design with coarse and fine grained components
      Adaptivity and Quality of Service
    • Same basic story in multiple domains
    • Classification using DataCutter Filter Stream Workflow
    • Slides’ Preparation
      64990 x 59412 pixels in full resolution
      Original Size: 10.8 GB; Compressed Size: ≈ 833 MB
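The quoted original size is consistent with uncompressed 24-bit color, as the arithmetic below shows. The 3 bytes per pixel and the binary (2^30 and 2^20) units are assumptions; the slide does not state the pixel format.

```python
WIDTH, HEIGHT = 64990, 59412
BYTES_PER_PIXEL = 3  # assumption: 8-bit RGB, no alpha channel

pixels = WIDTH * HEIGHT
raw_bytes = pixels * BYTES_PER_PIXEL
raw_gib = raw_bytes / 2**30               # uncompressed size in GiB

compressed_bytes = 833 * 2**20            # the quoted compressed size
compression_ratio = raw_bytes / compressed_bytes

print(f"{pixels:,} pixels, {raw_gib:.1f} GiB raw, ratio {compression_ratio:.1f}:1")
```

Under these assumptions the compression ratio is a little over 13:1.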
    • Computerized Classification System for Grading Neuroblastoma
      [Flowchart box labels: Image Tile; I = L; Create Image I(L); Feature Construction; Feature Extraction; Within Confidence Region?; I > 1?; I = I - 1; Training Tiles; Feature Construction; Feature Extraction; Classifier Training]
      Background Identification
      Image Decomposition (Multi-resolution levels)
      Image Segmentation (EMLDA)
      Feature Construction (2nd order statistics, Tonal Features)
      Feature Extraction (LDA) + Classification (Bayesian)
      Multi-resolution Layer Controller (Confidence Region)
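The multi-resolution layer controller can be sketched as a loop that classifies a tile at one resolution level and moves to the next level only when the result is not confident enough. This is a sketch under assumptions, not the system's code: it assumes level L is the coarsest and level 1 the full resolution, and it replaces the confidence-region test with a scalar threshold. The function names and the toy classifier are hypothetical.

```python
def classify_multires(images_by_level, classify, min_confidence=0.9):
    """Classify one tile, moving to finer resolution only when needed.

    images_by_level: dict mapping level I (L = coarsest, 1 = full resolution)
        to the tile image at that level.
    classify: callable(image, level) -> (label, confidence in [0, 1]).
    Returns (label, level at which the decision was made, confidence).
    """
    level = max(images_by_level)  # start at I = L
    while True:
        label, confidence = classify(images_by_level[level], level)
        if confidence >= min_confidence or level <= 1:
            return label, level, confidence
        level -= 1  # I = I - 1: retry at the next finer level


def toy_classifier(image, level):
    # Stand-in classifier: confidence grows as the resolution gets finer.
    return "undifferentiated", {3: 0.6, 2: 0.95, 1: 0.99}[level]


pyramid = {3: "tile at 1/4 scale", 2: "tile at 1/2 scale", 1: "tile at full scale"}
result = classify_multires(pyramid, toy_classifier)
```

Most tiles would be decided at a coarse level, so full-resolution processing is spent only on the ambiguous ones.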
    • Segmentation
      A typical segmentation result for an image from the undifferentiated class, with components segmented by this method. (a) Original image; (b) Partitioned image shown in color; (c) Nuclei; (d) Cytoplasm; (e) Neuropil; (f) Background component.
    • Semantic Workflows (Wings): Collaborative Work with Yolanda Gil, Mary Hall
      • A systematic strategy for composing application components into workflows
      • Search for the most appropriate implementation of both components and workflows
      • Component optimization
      • Select among implementation variants of the same computation
      • Derive integer values of optimization parameters
      • Only search promising code variants and a restricted parameter space
      • Workflow optimization
      • Knowledge-rich representation of workflow properties
    • Adaptivity
    • Framework
      Description Module (Wings): Describe application workflow using semantics of workflow components
      Execution module (Pegasus, DataCutter, Condor): Maps to resources, generates and places fine grained filter/stream pipelines
      Tradeoff Module: Schedules execution based on application level QoS
    • Impact
    • Image Mining for Comparative Analysis of Expression Patterns in Tissue Microarray
      (PIs: Foran and Saltz)
      • Build reference library of expression signatures, integrate state-of-the-art multi-spectral imaging capability and build a deployable clinical decision support system for analyzing imaged
      • Technologies and computational tools developed during the course of the project to be tested on a Grid-enabled, virtual laboratory established among strategic sites located at CINJ, Emory, RU, UPenn, OSU, and ASU.
      David J. Foran, Ph.D.
      Funded by NIH through grant
    • Center for Comprehensive Informatics Integrative Biomedical Informatics Projects
      • In Silico Study of Brain Tumors
      • Minority Health Genomics and Translational Research Bio-Repository Database (MH-GRID)
      • ACTSI Cardiovascular, Diabetes, Brain Tumor Registry
      • Early Hospital Readmission
      • CFAR (Center for AIDS Research) HIV/Cancer Project
      • Radiation Therapy and Quantitative Imaging
      • Integrative Analysis of Text and Discrete Data Related to Smoking Cessation and Asthma
      • Semantic Query and Analysis of Integrative Datasets in Renal Transplant Clinical Studies (CTOT-C)
    • Atlanta Clinical and Translational Science Institute: Federated Data Warehouse System
      Develop integrative, federated ACTSI information warehouse
      Integrated clinical/imaging/“omic”/biomarker/tissue information should always be available
      A virtually centralized, big Atlanta-wide information warehouse that has all relevant data
      Patients seen and information gathered at any ACTSI site, specimens sent to any affiliated core, imaging carried out at any affiliated site
      E.g. Gene expression, SNP, virtual slide images, hematology studies and CMV serologies for kidney transplant candidates accrued into Study X or Study Y between Feb 2011 and Jan 2012 who were on the kidney transplant waiting list as of November 1, 2010.
      Development efforts
      Security, Web Portal, Common Data Elements & Vocabularies, Identifiers, High-performance Computing middleware, Testing framework.
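The cohort criteria in the example query above can be written as a simple filter. This is only a sketch of the selection logic: the `Candidate` record, its field names, and the toy data are hypothetical and do not describe the ACTSI warehouse schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Candidate:
    patient_id: str
    study: str
    accrued_on: date                     # date accrued into the study
    listed_on: date                      # date added to the waiting list
    delisted_on: Optional[date] = None   # None if still on the list


def on_waiting_list(candidate, as_of):
    """True if the candidate was on the waiting list on the given date."""
    return candidate.listed_on <= as_of and (
        candidate.delisted_on is None or candidate.delisted_on > as_of)


def select_cohort(candidates):
    """Candidates matching the study, accrual window and waiting-list criteria."""
    return [c.patient_id for c in candidates
            if c.study in {"Study X", "Study Y"}
            and date(2011, 2, 1) <= c.accrued_on <= date(2012, 1, 31)
            and on_waiting_list(c, date(2010, 11, 1))]


demo_candidates = [
    Candidate("P1", "Study X", date(2011, 3, 15), date(2010, 6, 1)),
    Candidate("P2", "Study Y", date(2011, 5, 2), date(2010, 12, 1)),
    Candidate("P3", "Study Z", date(2011, 4, 1), date(2010, 1, 10)),
    Candidate("P4", "Study Y", date(2012, 3, 1), date(2010, 2, 20)),
    Candidate("P5", "Study X", date(2011, 9, 9), date(2009, 5, 5), date(2010, 8, 1)),
]
cohort = select_cohort(demo_candidates)
```

In a federated warehouse, the resulting patient identifiers would then be used to pull the gene expression, SNP, imaging and laboratory data held at the different sites.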
    • ACTSI-wide Federated Data Warehouse
    • Thanks to:
      In silico center team: Dan Brat (Science PI), Tahsin Kurc, Ashish Sharma, Tony Pan, David Gutman, Jun Kong, Sharath Cholleti, Carlos Moreno, Chad Holder, Erwin Van Meir, Daniel Rubin, Tom Mikkelsen, Adam Flanders, Joel Saltz (Director)
      caGrid Knowledge Center: Joel Saltz, Mike Caliguiri, Steve Langella co-Directors; Tahsin Kurc, Himanshu Rathod Emory leads
      caBIG In vivo imaging team: Eliot Siegel, Paul Mulhern, Adam Flanders, David Channon, Daniel Rubin, Fred Prior, Larry Tarbox and many others
      In vivo imaging Emory team: Tony Pan, Ashish Sharma, Joel Saltz
      Emory ATC Supplement team: Tim Fox, Ashish Sharma, Tony Pan, Edi Schreibmann, Paul Pantalone
      Digital Pathology R01: Foran and Saltz; Jun Kong, Sharath Cholleti, Fusheng Wang, Tony Pan, Tahsin Kurc, Ashish Sharma, David Gutman (Emory), Wenjin Chen, Vicky Chu, Jun Hu, Lin Yang, David J. Foran (Rutgers)
      NIH/in silico TCGA Imaging Group: Scott Hwang, Bob Clifford, Erich Huang, Dima Hammoud, Manal Jilwan, Prashant Raghavan, Max Wintermark, David Gutman, Carlos Moreno, Lee Cooper, John Freymann, Justin Kirby, Arun Krishnan, Seena Dehkharghani, Carl Jaffe
      ACTSI Biomedical Informatics Program: Marc Overcash, Tim Morris, Tahsin Kurc, Alexander Quarshie, Circe Tsui, Adam Davis, Sharon Mason, Andrew Post, Alfredo Tirado-Ramos
      NSF Scientific Workflow Collaboration: Vijay Kumar, Yolanda Gil, Mary Hall, Ewa Deelman, Tahsin Kurc, P. Sadayappan, Gaurang Mehta, Karan Vahi
    • Thanks!