Biomedical Informatics and Integrative Cancer Research Joel Saltz MD, PhD Director Center for Comprehensive Informatics
Objectives• Brain Tumor in Silico Center• Whole Exome Sequencing and Hypertension in African American Populations• Biomedical Informatics: caBIG, CTSA Informatics Tools and Infrastructure
Integrative Analysis: Tumor Microenvironment• Structural and functional differentiation within tumor• Tumors are organs consisting of many interdependent cell types• Molecular pathways are time and space dependent• “Field effects” – gradient of genetic, epigenetic changes• Radiology, microscopy, high throughput genetic, genomic, epigenetic studies, flow cytometry, microCT, nanotechnologies …• Create biomarkers to understand disease progression, response to treatment• From John E. Niederhuber, M.D., Director, National Cancer Institute, NIH, presented at Integrating and Leveraging the Physical Sciences to Open a New Frontier in Oncology, Feb 2008
Informatics Requirements• Parallel initiatives: Pathology, Radiology, “omics”• Exploit synergies between all initiatives to improve ability to forecast survival & response.• (Figure labels: Radiology Imaging, Pathologic Features, “Omic” Data, Patient Outcome)
In Silico Center for Translational NeuroOncology Informatics• Director: Joel Saltz, MD, PhD; PI: Dan Brat, MD, PhD• AIMS: 1. Determine genetic and gene expression correlates of high resolution nuclear morphometry in the diffuse gliomas and their relation to MR features using Rembrandt and TCGA datasets. 2. Determine the influence of tumor microenvironment on gene expression profiling and genetic classification using TCGA data. 3. Examine the gene expression profile of low grade gliomas that progress to GBM for predictive clustering, prognostic significance and correlates with pathologic and radiologic features. 4. Identify correlates of MRI enhancement patterns in astrocytic neoplasms with underlying vascular changes and gene expression profiles.
In Silico Program Objectives (from NCI)• In silico is an expression used to mean “performed on computer or via computer simulation.” (Wikipedia)• In silico science centers: support investigator-initiated, hypothesis-driven research in the etiology, treatment, and prevention of cancer using in silico methods• Generating and publishing novel cancer research findings leveraging caBIG tools and infrastructure• Identifying novel bioinformatics processes and tools to exploit existing data resources• Encouraging the development of additional data resources and caBIG analytic services• Assessing the capabilities of current caBIG tools• Emory, Columbia, Georgetown, Fred Hutchinson Cancer, Translational Genomics Research Institute
TCGA: Large Scale Integrative multi-“omic” Cancer Study
TCGA Research Network Digital Pathology Neuroimaging
Distinguish (and maybe redefine) astrocytic, oligodendroglial and oligoastrocytic tumors using TCGA and Rembrandt• Important since treatment and outcome differ• Link nuclear shape, texture to biological and clinical behavior• How is nuclear shape, texture related to gene expression category defined by clustering analysis of Rembrandt data sets?• Relate nuclear morphometry and gene expression to neuroimaging features (Vasari feature set)• Genetic and gene expression correlates of high resolution nuclear morphometry and relation to MR features using Rembrandt and TCGA datasets.
TCGA Brain Pathology Criteria: Attributes that Relate to Entire Specimen• Roughly 200 TCGA specimens; three reviewers with Dan Brat adjudicating• Scoring – Not Present: not detected on any block – Present: detected on any block – Abundant: present in ≥ 50% of 10x fields in ≥ 50% blocks• Attributes – Small cell component – Gemistocytes – “Oligodendroglioma-like” component with perinuclear cytoplasmic halos – Perineuronal and/or perivascular satellitosis – Multi-nucleated/giant cells – Epithelial metaplasia – Mesenchymal metaplasia – Entrapped gray matter – Entrapped white matter – Micro-mineralization• Microvascular hyperplasia elements (1,2) – Complex/glomeruloid – Circumferential endothelial hyperplasia• Necrosis elements (3,4) – Multiple serpentine pseudopalisading pattern – Zonal necrosis• Inflammation – Macrophage/histiocytic infiltrates – Lymphocytic infiltrates – Polymorphonuclear leukocytic infiltrates
Characterization of specific microanatomic structures• Characterization of neoplastic nuclei – Nuclear size (area and perimeter) – Shape (eccentricity, circularity, major axis, minor axis, Fourier shape descriptor and extent ratio) – Intensity (average, maximum, minimum, standard error) and texture (entropy, energy, skewness and kurtosis)• Characterization of regions of angiogenesis – Endothelial hypertrophy – Endothelial hyperplasia – Microvascular hyperplasia – Glomeruloid proliferation – Area of angiogenesis region – Shape (how the region departs from a fitted tubular structure) – Normalized color
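The nuclear size and shape descriptors listed on this slide can be illustrated with a short sketch. This is not the feature-extraction code used on the TCGA slides: it assumes a nucleus has already been segmented into a binary mask, and the function name `shape_features`, the edge-counting perimeter and the moment-based ellipse fit are illustrative choices made here. Fourier shape descriptors are left out.

```python
import math


def shape_features(mask):
    """Shape descriptors for one segmented nucleus.

    mask: list of rows of 0/1 values, where 1 marks a pixel inside the nucleus.
    """
    pixels = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not pixels:
        raise ValueError("mask contains no foreground pixels")
    inside = set(pixels)
    area = len(pixels)

    # Perimeter: number of pixel edges that face the background.
    perimeter = sum(
        1
        for r, c in pixels
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
        if (r + dr, c + dc) not in inside
    )

    # Extent ratio: object area divided by bounding-box area.
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    bbox_area = (max(rows) - min(rows) + 1) * (max(cols) - min(cols) + 1)

    # Second central moments define the best-fit ellipse.
    mean_r, mean_c = sum(rows) / area, sum(cols) / area
    var_r = sum((r - mean_r) ** 2 for r in rows) / area
    var_c = sum((c - mean_c) ** 2 for c in cols) / area
    cov = sum((r - mean_r) * (c - mean_c) for r, c in pixels) / area
    half_trace = (var_r + var_c) / 2
    root = math.sqrt(((var_r - var_c) / 2) ** 2 + cov ** 2)
    lam_major = half_trace + root
    lam_minor = max(half_trace - root, 0.0)

    return {
        "area": area,
        "perimeter": perimeter,
        "circularity": 4 * math.pi * area / perimeter ** 2,
        "extent_ratio": area / bbox_area,
        "major_axis": 4 * math.sqrt(lam_major),
        "minor_axis": 4 * math.sqrt(lam_minor),
        "eccentricity": math.sqrt(1 - lam_minor / lam_major) if lam_major > 0 else 0.0,
    }
```

Under these definitions a round nucleus has eccentricity near 0 and an elongated one approaches 1, which is the kind of contrast the slides draw between oligodendroglioma and astrocytoma nuclei.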
TCGA Whole Slide Images: Feature Extraction (Jun Kong)
Class Assignment: Nuclear Qualities (Figure: scale running from Oligodendroglioma to Astrocytoma, labeled 1 to 10)
Astrocytoma vs Oligodendroglioma• Overlap in genetics, gene expression, histology• Assess nuclear size (area and perimeter), shape (eccentricity, circularity, major axis, minor axis, Fourier shape descriptor and extent ratio), intensity (average, maximum, minimum, standard error) and texture (entropy, energy, skewness and kurtosis).
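A rough sketch of the intensity and texture measures named on this slide follows. The slide does not define them, so the formulas are assumptions: first-order histogram statistics over the pixel intensities of one nucleus, with entropy and energy computed from the normalized histogram. The function name `intensity_texture_features` is hypothetical.

```python
import math
from collections import Counter


def intensity_texture_features(values):
    """First-order intensity and texture statistics for one nucleus.

    values: list of pixel intensities (for example 0-255 gray levels).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two pixels")
    mean = sum(values) / n

    # Central moments of the intensity distribution.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n

    # Standard error of the mean, from the sample standard deviation.
    sample_sd = math.sqrt(m2 * n / (n - 1))
    std_error = sample_sd / math.sqrt(n)

    # Normalized histogram for entropy and energy.
    probs = [count / n for count in Counter(values).values()]

    return {
        "average": mean,
        "maximum": max(values),
        "minimum": min(values),
        "standard_error": std_error,
        "entropy": -sum(p * math.log2(p) for p in probs),
        "energy": sum(p * p for p in probs),
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
        "kurtosis": m4 / m2 ** 2 if m2 > 0 else 0.0,
    }
```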
Machine-based Classification of TCGA GBMs (J Kong)• Whole slide scans from 15 TCGA GBMs (69 slides)• 7 purely astrocytic in morphology; 7 with 2+ oligo components• 399,233 nuclei analyzed for astro/oligo features• Cases were categorized based on ratio of oligo/astro cells• Separation: p = 1.4 × 10^-22• TCGA Gene Expression Query: c-Met overexpression
Examine gene expression profiles of low grade gliomas that progress to GBM for predictive clustering and correlates with pathologic and radiologic features.• (Figure: Imaging, Pathology and Molecular data over time, 1 – 8 yrs)
Hierarchical clustering of 176 Rembrandt samples using TCGA classification genes defines four major subtypes: Proneural, Neural, Mesenchymal, Classical (Lee Cooper and Carlos Moreno)
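To make the clustering step concrete, here is a small sketch of agglomerative hierarchical clustering cut at a fixed number of groups. The slide does not state the distance or linkage used, so Pearson correlation distance and average linkage are assumptions, and `agglomerative_clusters` and the sample IDs are hypothetical. A real analysis of 176 samples would normally use an established statistics library rather than this quadratic pure-Python loop.

```python
import math


def pearson_distance(a, b):
    """1 minus the Pearson correlation between two expression profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 1.0  # treat a flat profile as uncorrelated
    return 1.0 - cov / math.sqrt(var_a * var_b)


def agglomerative_clusters(profiles, k):
    """Merge samples bottom-up (average linkage) until k clusters remain.

    profiles: dict mapping sample ID to a list of expression values
              over the classification genes.
    """
    ids = list(profiles)
    dist = {
        (a, b): pearson_distance(profiles[a], profiles[b])
        for i, a in enumerate(ids)
        for b in ids[i + 1:]
    }

    def d(x, y):
        return dist[(x, y)] if (x, y) in dist else dist[(y, x)]

    def linkage(c1, c2):
        # Average pairwise distance between members of two clusters.
        return sum(d(x, y) for x in c1 for y in c2) / (len(c1) * len(c2))

    clusters = [[s] for s in ids]
    while len(clusters) > k:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]
```

With `k=4` the returned groups would play the role of the four subtypes; assigning subtype names to groups still requires comparison against the TCGA classification genes.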
Predicting Recurrence/Survival• 75 lower-grade gliomas in REMBRANDT (p < 0.0003).• 43 oligodendrogliomas in REMBRANDT (p < 0.0002).• Lee Cooper, Carlos Moreno
Neuroimaging Correlates• Define relationship between contrast-enhancement, perfusion and permeability with vascular changes• Correlate MR characteristics defined by the Vasari Feature Set with pathologic grade, vascular morphology and gene expression profiles
Angiogenesis Segmentation (pipeline figure)• Stages shown: H&E color image; color deconvolution into hematoxylin image and eosin image; eosin intensity image; spatial density calculation; density image; normalization; segmentation; boundary smoothing; object ID; segmented vessels
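The color deconvolution stage of this pipeline can be sketched as follows. The slide does not say which stain matrix was used, so this example assumes the commonly cited Ruifrok and Johnston hematoxylin, eosin and residual vectors and a Beer-Lambert optical density model; the names `STAIN_VECTORS` and `unmix_pixel` are illustrative only.

```python
import math

# Assumed RGB optical-density vectors (Ruifrok and Johnston style values).
STAIN_VECTORS = [
    [0.65, 0.70, 0.29],  # hematoxylin
    [0.07, 0.99, 0.11],  # eosin
    [0.27, 0.57, 0.78],  # residual channel
]


def normalize(v):
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]


def invert_3x3(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("stain matrix is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


STAIN_MATRIX = [normalize(v) for v in STAIN_VECTORS]
INVERSE = invert_3x3(STAIN_MATRIX)


def unmix_pixel(rgb):
    """Return (hematoxylin, eosin, residual) amounts for one RGB pixel."""
    # Beer-Lambert: optical density is -log10 of transmitted light fraction.
    od = [-math.log10(max(ch, 1.0) / 255.0) for ch in rgb]
    # od = amounts x STAIN_MATRIX, so amounts = od x INVERSE.
    return [sum(od[k] * INVERSE[k][j] for k in range(3)) for j in range(3)]
```

Applying `unmix_pixel` to every pixel and keeping the second component gives an eosin image of the kind the figure feeds into the density calculation.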
States of Angiogenesis• Endothelial Hypertrophy• Endothelial Hyperplasia• Complex Microvascular Hyperplasia• Lee Cooper, Sharath Cholleti
Recent Findings from Integrated Analysis of Necrosis, Angiogenesis, Gene Expression in GBM• Lee A.D. Cooper; Carlos S. Moreno; Candace S. Chisolm; Christina Appin; David A. Gutman; Jun Kong; Tahsin Kurc; Joel H. Saltz; Daniel J. Brat• Frozen sections from 88 GBM samples were manually marked to identify regions of necrosis and angiogenic vessels exhibiting endothelial hypertrophy, hyperplasia, or complex microvascular proliferation• Markups were used to calculate extent of both necrosis and angiogenesis as a percentage of total tissue area• Gene expression from the HT-HGU133A platform analyzed using Significance Analysis of Microarrays (SAM); Cox Proportional Hazards modeling to identify mRNAs significantly associated with extent of necrosis and/or angiogenesis using a false discovery rate cutoff of < 5%
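The "extent as a percentage of total tissue area" calculation can be sketched from polygon markups with the shoelace formula. This assumes each markup is a simple, non-overlapping polygon given as (x, y) vertices and that the tissue outline is available in the same form; the function names are hypothetical and the slide does not describe the actual markup format.

```python
def polygon_area(vertices):
    """Area of a simple polygon from its (x, y) vertices (shoelace formula)."""
    n = len(vertices)
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


def percent_extent(region_polygons, tissue_polygons):
    """Marked-up area (necrosis or angiogenesis) as a percent of tissue area."""
    tissue_area = sum(polygon_area(p) for p in tissue_polygons)
    if tissue_area == 0:
        raise ValueError("tissue area is zero")
    marked_area = sum(polygon_area(p) for p in region_polygons)
    return 100.0 * marked_area / tissue_area
```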
Recent Findings from Integrated Analysis of Necrosis, Angiogenesis, Gene Expression in GBM• Associated with necrosis were master regulators of the mesenchymal tumor subtype, including C/EBP-B, C/EBP-D, STAT3, FOSL2, and RUNX1• IPA analysis of genes correlated with necrosis identified significantly enriched canonical pathways, including HIF-1α (p = 3.0e-7), NFκB (p = 1.4e-3), IL-6 (p = 6.9e-6), FGF (p = 2.7e-5), ERK/MAPK (p = 1.2e-4), Protein Kinase A signaling (p = 1.9e-4), Thrombin signaling (p = 5.2e-3), and HGF signaling (p = 0.023).
Vasari Imaging Criteria (Adam Flanders, TJU; Dan Rubin, Stanford, Lori Dodd, NCI)• Require standardized validated feature sets to describe de novo disease.• Fundamental obstacle to new imaging criteria as treatment biomarkers is lack of standard terminology: – To define a comprehensive set of imaging features of cancer – For reporting imaging results – To provide a more quantitative, reproducible basis for assessing baseline disease and treatment response
Classify Imaging Features of Entire Tumor and Resected Specimen• Record features of the entire tumor at baseline.• Distinguish features that comprise tissue in resected specimen.• Imaging Features of Resected Specimen – Extent of resection of enhancing tumor – Extent of resection of nCET – Extent of resection of vasogenic edema
Defining Rich Set of Qualitative and Quantitative Image Biomarkers• Community-driven ontology development project; collaboration with ASNR• Imaging features (5 categories) – Location of lesion – Morphology of lesion margin (definition, thickness, enhancement, diffusion) – Morphology of lesion substance (enhancement, PS characteristics, focality/multicentricity, necrosis, cysts, midline invasion, cortical involvement, T1/FLAIR ratio) – Alterations in vicinity of lesion (edema, edema crossing midline, hemorrhage, pial invasion, ependymal invasion, satellites, deep WM invasion, calvarial remodeling) – Resection features (extent of nCE tissue, CE tissue, resected components)
Results: Reader Agreement• High inter-observer agreement among the three readers – (kappa = 0.68, p<0.001)• Percentage agreement was also high for most features individually – 22 of 30 features (73%) had agreement greater than 50% – Twelve features (40%) had >80% agreement – No feature had less than 20% agreement• Feature agreement rose substantially when used with tolerance (+/- 1).
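How percentage agreement changes with a tolerance of +/- 1 can be shown with a short sketch. The slide does not spell out the exact rule, so this assumes a case counts as agreement when every pair of readers scores the feature within the tolerance; `percent_agreement` and the example scores are hypothetical.

```python
from itertools import combinations


def percent_agreement(ratings, tolerance=0):
    """Percent of cases on which all readers agree within a tolerance.

    ratings: list of tuples, one tuple of ordinal scores per case
             (one score per reader).
    """
    if not ratings:
        raise ValueError("no cases to compare")
    agreed = sum(
        1
        for scores in ratings
        if all(abs(a - b) <= tolerance for a, b in combinations(scores, 2))
    )
    return 100.0 * agreed / len(ratings)


# Four cases scored by three readers on one ordinal feature.
cases = [(1, 1, 1), (2, 3, 2), (4, 4, 5), (1, 3, 1)]
exact = percent_agreement(cases)               # strict agreement
relaxed = percent_agreement(cases, tolerance=1)  # agreement within +/- 1
```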
Preliminary Relationships of Features to Survival • Cox proportional hazards models were fit to each of the thirty features related to overall survival. • Features associated with lower survival included (p<.0001): – Proportion of enhancing tissue at baseline. – Thick or nodular enhancement characteristics. – Contralateral hemisphere invasion. • Proportion of non contrast enhancing tumor (nCET) had positive correlation with survival. • Tumor size at baseline had no relationship to survival.
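For readers unfamiliar with the per-feature Cox fits described here, the following is a minimal one-covariate sketch. It is not the analysis code behind these results: it assumes a single numeric feature, Breslow handling of tied times and plain Newton-Raphson on the partial likelihood, and the name `cox_univariate` is hypothetical. A positive coefficient means higher feature values go with higher hazard, that is, lower survival.

```python
import math


def cox_univariate(times, events, x, iterations=25):
    """Fit a one-covariate Cox model; returns beta (the log hazard ratio).

    times:  follow-up time per patient
    events: 1 if death was observed, 0 if censored
    x:      feature value per patient
    """
    beta = 0.0
    order = sorted(range(len(times)), key=lambda i: times[i])
    for _ in range(iterations):
        grad = 0.0
        info = 0.0
        for i in order:
            if not events[i]:
                continue
            # Risk set: everyone still under observation at this event time.
            risk = [j for j in order if times[j] >= times[i]]
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            s1 = sum(wj * x[j] for wj, j in zip(w, risk))
            s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
            grad += x[i] - s1 / s0
            info += s2 / s0 - (s1 / s0) ** 2
        if info == 0:
            break
        step = grad / info  # Newton-Raphson update on the partial likelihood
        beta += step
        if abs(step) < 1e-9:
            break
    return beta
```

Fitting one such model per feature, as the slide describes, would mean calling this once for each of the thirty feature columns; p-values and confidence intervals are outside this sketch.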
Recent Findings Relating Radiology, Pathology “Omics”• Linear regression models incorporating multiple imaging features or a single VASARI feature (ependymal extension) and tumor gene expression can be used to predict patient survival.• Multiple statistically significant associations between imaging and genomic features in glioblastomas. EGFR mutant tumors were significantly larger than TP53 mutant tumors, and were more likely to demonstrate pial involvement. CDKN2A homozygous deletion associated with an ill-defined nonenhancing tumor margin and enhancing pial involvement.• Significant association between minimal enhancing tumor (≤5% proportion of the overall tumor) and Proneural classification (p=0.0006). Significant association between a >5% proportion of necrosis and the presence of microvascular hyperplasia in pathology slides (p=0.008).
Minority Grid• Grady, Kaiser-Atlanta, MSM-East Point, Jackson-Hinds, Morehouse, Emory, Jackson Heart Study, University of Washington, Baylor• Aim 1: Establish organizational framework as consortium of academic medical centers and minority-serving “safety net” medical care facilities• Aim 2: Establish an EHR-linked bioinformatics/bio-repository infrastructure that facilitates in depth genotyping, phenotypic characterization and longitudinal surveillance of minority patients• Aim 3: Demonstrate utility of MH-GRID with a “use case” project that defines genetic, personal and social-environmental determinants of severe hypertension in African Americans• This platform could also be leveraged to carry out cancer studies
Overall Goals of Minority Grid• Breadth and nature of genomic variation associated with clinical phenotypes among patients of various bio-geographical ancestral groups• Bio-ancestry-specific, low frequency/major effect DNA variants that contribute to racial differences in drug responsiveness, health outcomes and health disparities• Characterization of admixture• Long term outcome of patients with at-risk variants revealed by whole exome sequencing
Approach• Identification of 1200 cases, 1200 controls – Controls have longitudinal followup with BP consistently below 120/80• Whole exome sequencing• Detection of new common variants and rare/low frequency variants• EHR data, interview data: health literacy, perceived stress, dietary intake, physical activity, neighborhood characteristics (via geocoding)• Clinical Laboratory analyses: electrolytes, plasma creatinine, lipid profile, glucose, estimated GFR• Project funded for roughly 2 months and is getting underway
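The control criterion ("BP consistently below 120/80" over longitudinal followup) translates into a simple filter over EHR readings. The sketch below is an assumption about how such a rule might be coded: the function name and the `min_readings` threshold are hypothetical, and the study's actual eligibility logic is not given on the slide.

```python
def is_control_candidate(readings, min_readings=2):
    """True if every blood pressure reading is below 120/80.

    readings:     list of (systolic, diastolic) pairs from followup visits
    min_readings: hypothetical minimum number of visits needed to call
                  the followup longitudinal
    """
    if len(readings) < min_readings:
        return False
    return all(systolic < 120 and diastolic < 80 for systolic, diastolic in readings)
```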
Transcontinental Railway: The Golden Spike - Triumph of Standards
Semantic Interoperability: Same ideas, different words
The ca“BIG” Picture: Challenges• Unprecedented magnitude of change throughout the system• Constant flow of information to manage• Legacy systems• Cultural barriers (from Ken Buetow)
The ca“BIG” Picture• The cancer Biomedical Informatics Grid (caBIG):• Standards-based vocabulary, data elements, data models facilitate information exchange• Common, widely distributed infrastructure permits cancer community to focus on innovation• Collection of interoperable component-based applications developed to common standards• Cancer information is widely available to diverse communities (from Ken Buetow)
Biomedical Informatics and Middleware• Brings in Information: Grid, Information Integration• Translates and Integrates Information: Natural Language Processing, Ontologies• Disseminates Information: Grid, Information Integration
caGrid -- “Octopus middleware”• caGrid Components – Language (metadata, ontologies) – Grid Service Graphical Development Toolkit (Introduce) – DICOM compatibility (IVI middleware) – Security (GAARDS) – Advertisement and Discovery – Workflow
Integrated BIP• Architecture working group to design a common architecture• Collaborative projects – Security infrastructure – Testing framework – Bioinformatics support – Registry implementation at Grady for quality improvement and cardiovascular research – LIMS deployment for biospecimen management – i2b2 deployment for clinical data• Leverage institutional strengths for education and training• Leverage over $3.8M in grant and internal funding this year
ACTSI-wide Federated Data Warehouse System• Develop integrative, federated ACTSI information warehouse• Integrated clinical/imaging/“omic”/biomarker/tissue information should always be available• A virtually centralized, big, Atlanta-wide information warehouse that has all relevant data• Patients seen and information gathered at any ACTSI site, specimens sent to any affiliated core, imaging carried out at any affiliated site• Example query: “Give me all gene expression, SNP, virtual slide images, hematology studies and CMV serologies for kidney transplant candidates accrued into Study X or Study Y between Feb 2011 and Jan 2012 who were on the kidney transplant waiting list as of November 1, 2010.”• Development efforts: Security, Web Portal, Common Data Elements & Vocabularies, Identifiers, High-performance Computing middleware, Testing framework.
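The example request on this slide can be written as a relational query once the data share common identifiers. The sketch below stands in for the federated warehouse with a single in-memory SQLite database; every table name, column name and data value is hypothetical, and a real federation would involve security and cross-site query services that are not shown.

```python
import sqlite3


def build_demo_warehouse():
    """Create a tiny in-memory stand-in for the warehouse (hypothetical schema)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE enrollment (patient_id TEXT, study TEXT, accrued_on TEXT);
        CREATE TABLE waitlist (patient_id TEXT, organ TEXT, listed_on TEXT, removed_on TEXT);
        CREATE TABLE result (patient_id TEXT, data_type TEXT, location TEXT);
    """)
    con.executemany("INSERT INTO enrollment VALUES (?, ?, ?)", [
        ("P1", "Study X", "2011-03-15"),
        ("P2", "Study Y", "2011-05-01"),
        ("P3", "Study Z", "2011-06-01"),
    ])
    con.executemany("INSERT INTO waitlist VALUES (?, ?, ?, ?)", [
        ("P1", "kidney", "2010-06-01", None),
        ("P2", "kidney", "2010-12-15", None),   # listed after the cutoff date
        ("P3", "kidney", "2010-01-10", None),
    ])
    con.executemany("INSERT INTO result VALUES (?, ?, ?)", [
        ("P1", "snp", "snp/P1.vcf"),
        ("P1", "gene_expression", "expr/P1.cel"),
        ("P2", "snp", "snp/P2.vcf"),
        ("P3", "cmv_serology", "lab/P3.txt"),
    ])
    return con


QUERY = """
SELECT DISTINCT r.patient_id, r.data_type, r.location
FROM result r
JOIN enrollment e ON e.patient_id = r.patient_id
JOIN waitlist w ON w.patient_id = r.patient_id
WHERE r.data_type IN ('gene_expression', 'snp', 'virtual_slide',
                      'hematology', 'cmv_serology')
  AND e.study IN ('Study X', 'Study Y')
  AND e.accrued_on BETWEEN '2011-02-01' AND '2012-01-31'
  AND w.organ = 'kidney'
  AND w.listed_on <= '2010-11-01'
  AND (w.removed_on IS NULL OR w.removed_on > '2010-11-01')
ORDER BY r.patient_id, r.data_type
"""

rows = build_demo_warehouse().execute(QUERY).fetchall()
```

Storing dates as ISO 8601 text keeps the string comparisons in the `WHERE` clause chronologically correct in SQLite.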
Crucial to Leverage Institutional Data (Figure: Ohio State Information Warehouse Infrastructure overview)• Stages: Acquisition, Transfer, Information Warehouse, User Access• Source labels include: ADT, Lab, Respiratory, Blood, Endoscopy, Cardiology, Siemens Img, CPOE, OR system, Patient Mgmt, Dictated reports, Pathology reports, Patient Billing, Practice Plans, Pt Satisfaction, Cancer Genetics, Wound Images, Tissue, Wound Center, Pulmonary, Genomic Data, Error Report• Transfer labels: Real time, Daily, Weekly, Monthly, Data Integration, Meta Data• User access labels: Multi-Dimensional Analysis & Data Mining, Ad-hoc Query, Business, Clinical, Research, External, Text Mining/NLP, Image Analysis, Web Scorecards & Dashboards, De-Identification, Honest Broker, Benchmarking
Enhanced Registries • Linked Databases for Research • Leverages common data elements and models and existing standards. Initially for cardiovascular disease, diabetes and co-morbidities. • Derived data elements represent categories of data and temporal patterns of interest. • Linked to source data – initially, the Emory Healthcare Clinical Data Warehouse and the Grady Health System Diabetes Patient Tracking System. • Supports end-user researcher query and analysis.• Research PACS • Federated support for management of image data. • DICOM standards and Grid services for federated access. • Management of image analysis results.
Registry Project Status• Co-morbidity registry prototype completed that exports demographics, encounters, readmissions, discharge diagnoses and diagnosis categories, and medication categories to Excel pivot tables• Has been used by Emory Healthcare to identify co-morbidities associated with readmissions for patient populations at high risk• System development is ongoing
Distinguishing Characteristic in Gliomas: Nuclear Qualities• Oligodendroglioma: round shaped with smooth regular texture• Astrocytoma: elongated with rough, irregular texture• Use image analysis algorithms to segment and classify microanatomic features (Nuclei, Astrocytoma, Necrosis ...) in whole slide images• Represent the segmentation and classification in a well defined structured format that can be used to correlate the pathology with other data modalities
PAIS Database• Implemented with IBM DB2 for large scale pathology image metadata (~million markups per slide)• Represented by a complex data model capturing multi-faceted information including markups, annotations, algorithm provenance, specimen, etc.• Support for complex relationships and spatial query: multi-level granularities, relationships between markups and annotations, spatial and nested relationships
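A spatial query of the kind described here, for example "which segmented nuclei fall inside a human-annotated region", can be illustrated in plain Python. This is only a conceptual sketch and does not use DB2 or the PAIS schema: markups are assumed to be simple polygons, containment is tested on the vertex-average centroid with ray casting, and all names are hypothetical.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: is the point inside the simple polygon?"""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def centroid(polygon):
    """Vertex-average centroid, an approximation adequate for small markups."""
    n = len(polygon)
    return (sum(x for x, _ in polygon) / n, sum(y for _, y in polygon) / n)


def markups_in_region(markups, region):
    """IDs of markups whose centroid lies inside the annotated region.

    markups: dict mapping markup ID to a list of (x, y) vertices
    region:  list of (x, y) vertices of the annotated region
    """
    return sorted(
        markup_id
        for markup_id, polygon in markups.items()
        if point_in_polygon(centroid(polygon), region)
    )
```

At roughly a million markups per slide, a linear scan like this would be too slow, which is one reason to push such queries into a database with spatial indexing.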
PAIS Database and Analysis Pipeline• Suite of analysis algorithms and pipelines that carry out the following tasks: 1. segmentation of cells and nuclei; 2. characterization of shape and texture features of segmented nuclei; 3. storage of nuclei meta-data in relational database; 4. mechanism supporting spatial queries for human-annotated nuclei; 5. machine learning methods that integrate information from features to accomplish classification tasks.
Image Mining for Comparative Analysis of Expression Patterns in Tissue Microarray (PIs: Foran and Saltz)• Build reference library of expression signatures, integrate state-of-the-art multi-spectral imaging capability and build a deployable clinical decision support system for analyzing imaged specimens.• Technologies and computational tools developed during the course of the project to be tested on a Grid-enabled, virtual laboratory established among strategic sites located at CINJ, Emory, RU, UPenn, OSU, and ASU.• Funded by NIH through grant #5R01LM009239-02• David J. Foran, Ph.D.
ACTSI: Example Active Biomedical Informatics Projects• In Silico Study of Brain Tumors• Minority Health Genomics and Translational Research Bio-Repository Database (MH-GRID)• ACTSI Cardiovascular, Diabetes, Brain Tumor Registry• Early Hospital Readmission• CFAR (Center for AIDS Research) HIV/Cancer Project• Radiation Therapy and Quantitative Imaging• Integrative Analysis of Text and Discrete Data Related to Smoking Cessation and Asthma• Metadata Analysis of Glycan Structures• Semantic Query and Analysis of Integrative Datasets in Renal Transplant Clinical Studies (CTOT-C)
Thanks to:• In silico center team: Dan Brat (Science PI), Tahsin Kurc, Ashish Sharma, Tony Pan, David Gutman, Jun Kong, Sharath Cholleti, Carlos Moreno, Chad Holder, Erwin Van Meir, Daniel Rubin, Tom Mikkelsen, Adam Flanders, Joel Saltz (Director)• caGrid Knowledge Center: Joel Saltz, Mike Caliguiri, Steve Langella co-Directors; Tahsin Kurc, Himanshu Rathod Emory leads• caBIG In vivo imaging team: Eliot Siegel, Paul Mulhern, Adam Flanders, David Channon, Daniel Rubin, Fred Prior, Larry Tarbox and many others• In vivo imaging Emory team: Tony Pan, Ashish Sharma, Joel Saltz• Emory ATC Supplement team: Tim Fox, Ashish Sharma, Tony Pan, Edi Schreibmann, Paul Pantalone• Digital Pathology R01: Foran and Saltz; Jun Kong, Sharath Cholleti, Fusheng Wang, Tony Pan, Tahsin Kurc, Ashish Sharma, David Gutman (Emory), Wenjin Chen, Vicky Chu, Jun Hu, Lin Yang, David J. Foran (Rutgers)