Indiana 4 2011 Final Final

469 views
370 views

Published on

Presentation at the University of Indiana Spring 2011

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
469
On SlideShare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
4
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Indiana 4 2011 Final Final

  1. 1. Integrative Multi-Scale Biomedical Informatics<br />Joel Saltz MD, PhD<br />Director Center for Comprehensive Informatics<br />
  2. 2. Leverage exascale data and computer resources to squeeze the most out of image, sensor or simulation data<br />Run lots of different algorithms to derive same features<br />Run lots of algorithms to derive complementary features<br />Data models and data management infrastructure to manage data products, feature sets and results from classification and machine learning algorithms<br />Squeezing Information from Spatial Datasets<br />
  3. 3. Outline<br />Integrative biomedical informatics analysis –feature sets obtained from Pathology and Radiology studies<br />Techniques, tools and methodologies for derivation, management and analysis of feature sets<br />
  4. 4. INTEGrative Biomedical Informatics Analysis<br />Reproducible anatomic/functional characterization at gross level (Radiology) and fine level (Pathology)<br />Integration of anatomic/functional characterization with multiple types of “omic” information<br />Create categories of jointly classified data to describe pathophysiology, predict prognosis, response to treatment<br />
  5. 5. In Silico Center for Brain Tumor Research<br />Specific Aims:<br />Influence of necrosis/<br />hypoxia on gene expression and<br />genetic classification.<br />2. Molecular correlates of high<br />resolution nuclear morphometry.<br />Gene expression profiles <br />that predict glioma progression.<br />4. Molecular correlates of MRI<br />enhancement patterns.<br />
  6. 6. Integration of heterogeneous multiscale information<br />Radiology<br />Imaging<br />Patient Outcome<br />“Omic”<br />Data <br /><ul><li>Coordinated initiatives Pathology, Radiology, “omics”
  7. 7. Exploit synergies between all initiatives to improve ability to forecast survival & response.</li></ul>Pathologic Features<br />
  8. 8. Example: Pathology and Gene Expression Joint Predictors of Recurrence/Survival<br />Lee Cooper Carlos Moreno<br />
  9. 9. Feature Characterization in Pathology and Radiology<br />Role – In silico Brain Tumor Research<br />Algorithms<br />Scaling Requirements<br />
  10. 10. In Silico Center for Brain Tumor Research<br />Key Data Sets<br />REMBRANDT: Gene expression and genomics data set of all glioma subtypes<br />The Cancer Genome Atlas (TCGA): Rich “omics” set of GBM, digitized Pathology and Radiology<br />Pathology and Radiology Images from Henry Ford Hospital, Emory, Thomas Jefferson U, MD Anderson and others<br />
  11. 11. TCGA Research Network<br />Digital Pathology<br />Neuroimaging<br />
  12. 12. Progression to GBM<br />Anaplastic Astrocytoma<br />(WHO grade III)<br />Glioblastoma<br />(WHO grade IV)<br />
  13. 13. TCGA Neuropathology Attributes 120 TCGA specimens; 3 Reviewers<br /> Presence and Degree of:<br />Microvascular hyperplasia<br /><ul><li>Complex/glomeruloid
  14. 14. Endothelial hyperplasia</li></ul>Necrosis <br /><ul><li>Pseudopalisading pattern
  15. 15. Zonal necrosis</li></ul>Inflammation<br /><ul><li>Macrophages/histiocytes
  16. 16. Lymphocytes
  17. 17. Neutrophils</li></ul> <br />Differentiation:<br /><ul><li>Small cell component
  18. 18. Gemistocytes
  19. 19. Oligodendroglial
  20. 20. Multi-nucleated/giant cells
  21. 21. Epithelial metaplasia          
  22. 22. Mesenchymal metaplasia</li></ul>Other Features<br /><ul><li>Perineuronal/perivascular satellitosis
  23. 23. Entrapped gray or white matter
  24. 24. Micro-mineralization  </li></li></ul><li>Distinguishing Characteristic in Gliomas<br />Nuclear Qualities<br />Round shaped with<br />smooth regular texture<br />Elongated with rough, irregular texture<br />Oligodendroglioma<br />Astrocytoma<br />Use image analysis algorithms to segment and classify microanatomic features (Nuclei, Astrocytoma, Necrosis ...) in whole slide images<br />Represent the segmentation and classification in a well defined structured format that can be used to correlate the pathology with other data modalities<br />
  25. 25. TCGA Whole Slide Images<br />Feature Extraction<br />Jun Kong<br />
  26. 26. AstrocytomavsOligodendroglimaOverlap in genetics, gene expression, histology<br />AstrocytomavsOligodendroglima<br /><ul><li>Assess nuclear size (area and perimeter), shape (eccentricity, circularity major axis, minor axis, Fourier shape descriptor and extent ratio), intensity (average, maximum, minimum, standard error) and texture (entropy, energy, skewness and kurtosis).</li></li></ul><li>Machine-based Classification of TCGA GBMs (J Kong)<br />Whole slide scans from 14 TCGA GBMS (69 slides)<br />7 purely astrocytic in morphology; 7 with 2+ oligo component<br />399,233 nuclei analyzed for astro/oligo features<br />Cases were categorized based on ratio of oligo/astro cells<br />TCGA Gene <br />Expression Query:<br />c-Met overexpression<br />
  27. 27. Clustergram of selected features used in consensus clustering<br />Nuclear Features Used to Classify GBMs<br />
  28. 28. Nuclear Features Used to Classify GBMs<br />Consensus clustering of morphological signatures<br />Study includes 200 million nuclei taken from 480 slides corresponding to 167 distinct patients. <br />
  29. 29. Survival of morphological clusters<br />
  30. 30. Survival of patients by molecular tumor subtype<br />
  31. 31. Articulate Physical Interpretations of Results<br />
  32. 32. Images<br />
  33. 33. Multiscale Systems Biology<br />Employ multi-resolution methods to characterize necrosis, angiogenesis and correlate these with “omics”<br />Rim-enhancement<br />Vascular Changes<br />Rapid progression<br />No enhancement<br />Normal Vessels<br />Stable lesion<br />?<br />
  34. 34. Correlation of Necrosis, Angiogenesis and “omics”<br />GBMs display variable and regionally heterogeneous degrees of necrosis (asterisk) and angiogenesis <br />These factors may impact gene expression profiles<br />
  35. 35. Genes Correlated with Necrosis include Transcription Factors Identified as Regulators of the Mesenchymal Transition<br /><ul><li>Frozen sections from 88 GBM samples marked to identify regions of necrosis and angiogenesis
  36. 36. Extent of both necrosis and angiogenesis calculated as a percentage of total tissue area</li></ul>Carro MS, et al. Nature 263: 318-25, 2010<br />
  37. 37. Feature Sets in Radiology(Adam Flanders, TJU; Dan Rubin, Stanford, Lori Dodd, NCI)<br />Require standardized validated feature sets to describe de novo disease.<br />Fundamental obstacle to new imaging criteria as treatment biomarkers is lack of standard terminology:<br />To define a comprehensive set of imaging features of cancer<br />For reporting imaging results<br />To provide a more quantitative, reproducible basis for assessing baseline disease and treatment response<br />
  38. 38. Defining Rich Set of Qualitative and Quantitative Image Biomarkers<br />Community-driven ontology development project; collaboration with ASNR<br />Imaging features (5 categories)<br />Locationof lesion<br />Morphology of lesion margin(definition, thickness, enhancement, diffusion)<br />Morphology of lesion substance (enhancement, PS characteristics, focality/multicentricity, necrosis, cysts, midline invasion, cortical involvement, T1/FLAIR ratio)<br />Alterations in vicinity of lesion (edema, edema crossing midline, hemorrhage, pial invasion, ependymal invasion, satellites, deep WM invasion, calvarial remodeling)<br />Resection features (extent of nCE tissue, CE tissue, resected components)<br />
  39. 39. Imaging Predictors of survival and molecular profiles in the TCGA Glioblastoma Data set<br />The TCGA glioma working group<br />1Emory University Hospital, Atlanta, GA 2National Cancer Institute, Bethesda, MD.  3Thomas Jefferson University Hospital, Philadelphia, PA. 4Henry Ford University Hospital, Detroit, Michigan. 5National Institute of Health, Bethesda, MD.  6Boston University School of Medicine, Boston, MA. 7SAIC-Frederick, Inc., Frederick, MD. 8University of Virginia, Charlottesville, VA. 9Northwestern University Chicago, IL<br />
  40. 40. Assumed Dependence Between Features<br />F1: Tumor Location<br />F3: Eloq. Brain<br />F5: Prop. Enhance<br />F6: Prop. nCET<br />F7: Prop. Necrosis<br />F9: Distribution<br />F10: T1/FLAIR<br />F11: En. Marg.Thick.<br />F13: Def. Non.Marg.<br />F14: Prop. Edema<br />F16: Hemorrhage<br />F18: Pial Invasion<br />F19: Ependymal<br />F20: Cort. Involve.<br />F21: Deep WM Inv.<br />F22: nCET Cross. Mid.<br />F24: Satellites<br />F18<br />F20<br />F9<br />F3<br />F16<br />F7<br />F22<br />F21<br />F1<br />F6<br />F11<br />F5<br />F14<br />F19<br />F10<br />F13<br />F24<br />NOTE: each feature omitted from this graph is independent of<br />every other feature.<br />Slide thanks to Eric Huang, NIH<br />
  41. 41. Estimation Problem Size Reduction<br />Can ignore seven of the features  size of contingency table reduced from 2.64 × 1012 cells to 1.34 × 1010 cells.<br />Collapsibility reduces size of contingency table even further: <br />Any binary feature Fj connected to only one feature on graph (i.e. given the feature Fj is connected to, Fj is independent of all other features) can also be ignored<br />Eliminates need to deal with Hemorrhage (F16), Pial Invasion (F18), Cortical Involvement (F20), and Satellites (F24).<br />Reduces size of contingency table to 1.68 × 109 cells.<br />Additional analogous considerations can be used to reduce size of contingency table by more than two additional orders of magnitude<br />Slide thanks to Eric Huang, NIH<br />
  42. 42. Correlative Imaging Results<br />Minimal enhancing tumor (≤5%) strongly associated with Proneural classification (p=0.0006). <br />>5% proportion of necrosis and the presence of microvascular hyperplasia in pathology slides (p=0.008).<br />Greater maximum tumor dimension (T2 signal) associated with present/abundant microvascular hyperplasia (p=0.001).<br />< 5% Enhancement<br />
  43. 43. Correlative Imaging Results<br />TP53 mutant tumors had a smaller mean tumor sizes (p=0.002) on T2-weighted or FLAIR images. <br />EGFR mutant tumors were significantly larger than TP53 mutant tumors (p=0.0005). <br />High level EGFR amplification was associated with >5% enhancement and >5% proportion of necrosis (p < 0.01). <br />> 5% Necrosis<br />
  44. 44. Squeezing Information from Spatial Datasets<br />Leverage exascale data and computer resources to squeeze the most out of image, sensor or simulation data<br />Run lots of different algorithms to derive same features<br />Run lots of algorithms to derive complementary features<br />Data models and data management infrastructure to manage data products, feature sets and results from classification and machine learning algorithms<br />
  45. 45. Pipeline for Whole Slide Feature Characterization<br />1010 pixels for each whole slide image<br />10 whole slide images per patient<br />108 image features per whole slide image<br />10,000 brain tumor patients<br />1015 pixels<br />1013 features<br />Hundreds of algorithms<br />Annotations and markups from dozens of humans<br />
  46. 46. PAIS Database<br /><ul><li>Implemented with IBM DB2 for large scale pathology image metadata (~million markups per slide)
  47. 47. Represented by a complex data model capturing multi-faceted information including markups, annotations, algorithm provenance, specimen, etc.
  48. 48. Support for complex relationships and spatial query: multi-level granularities, relationships between markups and annotations, spatial and nested relationships</li></ul>PAIS Database<br /><ul><li>Implemented with IBM DB2 for large scale pathology image metadata (~million markups per slide)
  49. 49. Represented by a complex data model capturing multi-faceted information including markups, annotations, algorithm provenance, specimen, etc.
  50. 50. Support for complex relationships and spatial query: multi-level granularities, relationships between markups and annotations, spatial and nested relationships
  51. 51. Support for high-level data statistical analysis</li></li></ul><li>Data Models to Represent Feature Sets and Experimental Metadata<br />PAIS |pās| : Pathology Analytical Imaging Standards<br />Provide semantically enabled data model to support pathology analytical imaging<br />Data objects, comprehensive data types, and flexible relationships<br />Object-oriented design, easily extensible<br />Reuse existing standards<br />Reuse relevant classes already defined in AIM<br />Follow DICOM WG 26 metadata specifications on WSI reference<br />Specimen information in DICOM Supplement 122 and caTissue<br />Use caDSR for CDE and NCI Thesaurus for ontology concepts<br />
  52. 52. Pathology Imaging GIS<br />Image analysis<br />PAIS model<br />PAIS data management <br />Modeling and management of markup and annotation for querying and sharing through parallel RDBMS + spatial DBMS<br />Segmentation<br />HDFS data staging<br />MapReduce based queries<br />On the fly data processing for algorithm validation/algorithm sensitivity studies, or discovery of preliminary results<br />Feature extraction<br />
  53. 53. Generation and Analysis of Imaging Features<br />In-transit data processing using filter/stream systems<br />Semantic Workflows<br />Hierarchical pipeline design with coarse and fine grained components<br />Adaptivity and Quality of Service<br />
  54. 54. Same basic story in multiple domains<br />
  55. 55. Classification using DataCutter Filter Stream Workflow<br />
  56. 56. Slides’ Preparation<br />8x<br />40x<br />64990 x 59412 pixels in full resolution<br />Original Size: 10.8 Gb; Compressed Sized: ≈ 833Mb<br />
  57. 57. Computerized Classification System for Grading Neuroblastoma<br />Background?<br />Yes<br />Image Tile<br />Label<br />Initialization<br />I = L<br />No<br />Create Image I(L)<br />Training Tiles<br />Segmentation<br />I = I -1<br />Down-sampling<br />Feature Construction<br />Segmentation<br />Yes<br />No<br />I > 1?<br />Feature Extraction<br />Feature Construction<br />Feature Extraction<br />Classification<br />Classifier Training<br />Within Confidence<br />Region ?<br />No<br />Yes<br />TRAINING<br />TESTING<br />Background Identification<br />Image Decomposition (Multi-resolution levels)<br />Image Segmentation (EMLDA)<br />Feature Construction (2nd order statistics, Tonal Features)<br />Feature Extraction (LDA) + Classification (Bayesian)<br />Multi-resolution Layer Controller (Confidence Region)<br />
  58. 58. Segmentation<br />A typical segmentation result of an image from undifferentiated class with components segmented by this method is shown. (a) Original image; (b) Partitioned image shown in color; (c)Nuclei; (d)Cytoplasm; (e)Neuropil; (f)Background component.<br />
  59. 59. Semantic Workflows (Wings)Collaborative Work with Yolanda Gil, Mary Hall<br /><ul><li>A systematic strategy for composing application components into workflows
  60. 60. Search for the most appropriate implementation of both components and workflows
  61. 61. Component optimization
  62. 62. Select among implementation variants of the same computation
  63. 63. Derive integer values of optimization parameters
  64. 64. Only search promising code variants and a restricted parameter space
  65. 65. Workflow optimization
  66. 66. Knowledge-rich representation of workflow properties</li></li></ul><li>Adaptivity<br />
  67. 67. Framework<br />Description Module (Wings): Describe application workflow using semantics of workflow components<br />Execution module (Pegasus, DataCutter, Condor): Maps to resources, generates and places fine grained filter/stream pipelines<br />Tradeoff Module: Schedules execution based on application level QOS<br />
  68. 68.
  69. 69. Impact<br />
  70. 70.
  71. 71. Image Mining for Comparative Analysis of Expression Patterns in Tissue Microarray<br />(PI’s: Foran and Saltz)<br /><ul><li>Build reference library of</li></ul>expression signatures, integrate <br />state-of-the-art multi-spectral <br />imaging capability and build a <br />deployable clinical decision support <br />system for analyzing imaged <br />specimens. <br /><ul><li>Technologies and computational </li></ul>tools developed during the course of <br />the project to be tested on a <br />Grid-enabled, virtual laboratory <br />established among strategic sites <br />located at CINJ, Emory, RU, UPenn, <br />OSU, and ASU.<br /> <br /> <br />David J. Foran, Ph.D.<br />Funded by NIH through grant<br />#5R01LM009239-02 <br />
  72. 72. Center for Comprehensive Informatics Integrative Biomedical Informatics Projects<br /><ul><li>In Silico Study of Brain Tumors
  73. 73. Minority Health Genomics and Translational Research Bio-Repository Database (MH-GRID)
  74. 74. ACTSI Cardiovascular, Diabetes, Brain Tumor Registry
  75. 75. Early Hospital Readmission
  76. 76. CFAR (Center for AIDS Research) HIV/Cancer Project
  77. 77. Radiation Therapy and Quantitative Imaging
  78. 78. Integrative Analysis of Text and Discrete Data Related to Smoking Cessation and Asthma
  79. 79. Semantic Query and Analysis of Integrative Datasets in Renal Transplant Clinical Studies (CTOT-C)</li></li></ul><li>Atlanta Clinical and Translational Science InstituteFederated Data Warehouse System<br />Develop integrative, federated ACTSI information warehouse <br />Integrated clinical/imaging/”omic”/biomarker/tissue information should always be available<br />A virtually centralized, big Atlanta wide information warehouse that has all relevant data<br />Patients seen and information gathered at any ACTSI site, specimens sent to any affiliated core, imaging carried out at any affiliated site<br />E.g. Gene expression, SNP, virtual slide images, hematology studies and CMV serologies for kidney transplant candidates accrued into Study X or Study Y between Feb 2011 and Jan 2012 who were on the kidney transplant waiting list as of November 1, 2010. <br />Development efforts<br />Security, Web Portal, Common Data Elements & Vocabularies, Identifiers, High-performance Computing middleware, Testing framework. <br />
  80. 80. ACTSI-wide Federated Data Warehouse<br />
  81. 81. Thanks to:<br />In silico center team: Dan Brat (Science PI), Tahsin Kurc, Ashish Sharma, Tony Pan, David Gutman, Jun Kong, Sharath Cholleti, Carlos Moreno, Chad Holder, Erwin Van Meir, Daniel Rubin, Tom Mikkelsen, Adam Flanders, Joel Saltz (Director)<br />caGrid Knowledge Center: Joel Saltz, Mike Caliguiri, Steve Langella co-Directors; Tahsin Kurc, Himanshu Rathod Emory leads<br />caBIG In vivo imaging team: Eliot Siegel, Paul Mulhern, Adam Flanders, David Channon, Daniel Rubin, Fred Prior, Larry Tarbox and many others<br />In vivo imaging Emory team: Tony Pan, Ashish Sharma, Joel Saltz<br />Emory ATC Supplement team: Tim Fox, Ashish Sharma, Tony Pan, Edi Schreibmann, Paul Pantalone<br />Digital Pathology R01: Foran and Saltz; Jun Kong, Sharath Cholleti, Fusheng Wang, Tony Pan, Tahsin Kurc, Ashish Sharma, David Gutman (Emory), Wenjin Chen, Vicky Chu, Jun Hu, Lin Yang, David J. Foran (Rutgers)<br />NIH/in silico TCGA Imaging Group: Scott Hwang, Bob Clifford, Erich Huang, DimaHammoud, ManalJilwan, PrashantRaghavan, Max Wintermark, David Gutman, Carlos Moreno, Lee Cooper, John Freymann, Justin Kirby, Arun Krishnan, Seena Dehkharghani, Carl Jaffe<br />ACTSI Biomedical Informatics Program: Marc Overcash, Tim Morris, Tahsin Kurc, Alexander Quarshie, Circe Tsui, Adam Davis, Sharon Mason, Andrew Post, Alfredo Tirado-Ramos<br />NSF Scientific Workflow Collaboration: Vijay Kumar, Yolanda Gil, Mary Hall, EwaDeelman, Tahsin Kurc, P. Sadayappan, Gaurang Mehta, Karan Vahi<br />
  82. 82. Thanks!<br />

×