Talk given to the Emory Cancer Control and Population Science Program, 2/17/2011, describing biomedical informatics, integrative cancer research, caBIG and CTSA
Wci Pop Sci Feb 2011
1. Biomedical Informatics and Integrative
Cancer Research
Joel Saltz MD, PhD
Director, Center for Comprehensive
Informatics
2. Objectives
• Brain Tumor in Silico Center
• Whole Exome Sequencing and
Hypertension in African American
Populations
• Biomedical Informatics: caBIG, CTSA
Informatics Tools and Infrastructure
3. Integrative Analysis: Tumor Microenvironment
• Tumors are organs consisting of
many interdependent cell types
• Structural and functional
differentiation within tumor
• Molecular pathways are time
and space dependent
• “Field effects” – gradient of
genetic, epigenetic changes
• Radiology, microscopy, high
throughput genetic, genomic,
epigenetic studies, flow
cytometry, microCT,
nanotechnologies …
• Create biomarkers to
understand disease
progression, response to
treatment
• From John E. Niederhuber, M.D. Director
National Cancer Institute, NIH presented at
Integrating and Leveraging the Physical
Sciences to Open a New Frontier in Oncology,
Feb 2008
4. Informatics Requirements
• Parallel initiatives: Pathology, Radiology, “omics”
• Exploit synergies between all initiatives to improve ability to
forecast survival & response
Diagram labels: Radiology Imaging, “Omic” Data, Pathologic Features, Patient Outcome
7. In Silico Center for Translational
NeuroOncology Informatics
Director: Joel Saltz, MD, PhD; PI Dan Brat MD, PhD
AIMS
1. Determine genetic and gene expression
correlates of high resolution nuclear
morphometry in the diffuse gliomas and
their relation to MR features using
Rembrandt and TCGA datasets.
2. Determine the influence of tumor micro-
environment on gene expression profiling
and genetic classification using TCGA data
3. Examine the gene expression profile of low
grade gliomas that progress to GBM for
predictive clustering, prognostic
significance and correlates with pathologic
and radiologic features.
4. Identify correlates of MRI enhancement
patterns in astrocytic neoplasms with
underlying vascular changes and gene
expression profiles.
8. In Silico Program Objectives (from
NCI)
• In silico is an expression used to mean "performed on computer
or via computer simulation." (Wikipedia)
• In silico science centers: support investigator-initiated,
hypothesis-driven research in the etiology, treatment, and
prevention of cancer using in silico methods
• Generating and publishing novel cancer research findings leveraging
caBIG tools and infrastructure
• Identifying novel bioinformatics processes and tools to exploit
existing data resources
• Encouraging the development of additional data resources and
caBIG analytic services
• Assessing the capabilities of current caBIG tools
• Emory, Columbia, Georgetown, Fred Hutchinson Cancer Research Center,
Translational Genomics Research Institute
11. Distinguish (and maybe redefine) astrocytic, oligodendroglial
and oligoastrocytic tumors using TCGA and Rembrandt
Important since treatment and outcome differ
• Link nuclear shape, texture to biological and clinical
behavior
• How is nuclear shape, texture related to gene
expression category defined by clustering analysis of
Rembrandt data sets?
• Relate nuclear morphometry and gene expression to
neuroimaging features (Vasari feature set)
• Genetic and gene expression correlates of high
resolution nuclear morphometry and relation to MR
features using Rembrandt and TCGA datasets.
12. TCGA Brain Pathology Criteria
Attributes that Relate to Entire Specimen
Roughly 200 TCGA specimens; three reviewers, with Dan Brat adjudicating
Scoring scale:
• Not Present: not detected on any block
• Present: detected on any block
• Abundant: present in ≥ 50% of 10x fields in ≥ 50% of blocks
Features scored:
• Small cell component
• Gemistocytes
• “Oligodendroglioma-like” component with perinuclear cytoplasmic halos
• Perineuronal and/or perivascular satellitosis
• Multi-nucleated/giant cells
• Epithelial metaplasia
• Mesenchymal metaplasia
• Entrapped gray matter
• Entrapped white matter
• Micro-mineralization
• Microvascular hyperplasia elements (1, 2): complex/glomeruloid; circumferential endothelial hyperplasia
• Necrosis elements (3, 4): multiple serpentine pseudopalisading pattern; zonal necrosis
• Inflammation: macrophage/histiocytic infiltrates; lymphocytic infiltrates; polymorphonuclear leukocytic infiltrates
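The three-level scale above can be expressed as a small scoring rule. A minimal sketch, assuming each block is reviewed as a list of 10x fields marked True where the feature is seen; the function name `score_feature` and this input layout are hypothetical, not the TCGA review tooling.

```python
from typing import List


def score_feature(blocks: List[List[bool]]) -> str:
    """Score one feature for a specimen (hypothetical helper).

    `blocks` holds one list per block; each inner list has one boolean
    per 10x field, True where the feature is seen in that field.
    """
    if not any(any(fields) for fields in blocks):
        return "Not Present"  # not detected on any block
    # A block qualifies when the feature appears in >= 50% of its 10x fields.
    qualifying = sum(
        1 for fields in blocks if fields and sum(fields) / len(fields) >= 0.5
    )
    if qualifying / len(blocks) >= 0.5:
        return "Abundant"  # >= 50% of 10x fields in >= 50% of blocks
    return "Present"  # detected on at least one block
```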
13. Characterization of specific microanatomic
structures
Characterization of neoplastic nuclei:
• Nuclear size (area and perimeter)
• Shape (eccentricity, circularity, major axis, minor axis, Fourier
shape descriptor and extent ratio)
• Intensity (average, maximum, minimum, standard error) and
texture (entropy, energy, skewness and kurtosis)
Characterization of regions of angiogenesis:
• Endothelial hypertrophy
• Endothelial hyperplasia
• Microvascular hyperplasia
• Glomeruloid proliferation
• Area of angiogenesis region
• Shape (how the region departs from a fitted tubular structure)
• Normalized color
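The nuclear size and shape measures listed above can be computed from a segmented nucleus. A minimal sketch, assuming the nucleus is given as a set of (row, column) pixels; `shape_features` is a hypothetical helper, the perimeter is a simple count of exposed pixel edges (it overestimates diagonal boundaries), the axis lengths follow the common 4·sqrt(eigenvalue) moment convention, and the Fourier shape descriptor is omitted.

```python
import math
from typing import Dict, Set, Tuple

Pixel = Tuple[int, int]


def shape_features(mask: Set[Pixel]) -> Dict[str, float]:
    """Size and shape features of one segmented nucleus (hypothetical helper)."""
    area = len(mask)
    # Perimeter: count pixel edges that border a background pixel.
    perimeter = sum(
        1
        for (r, c) in mask
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
        if (r + dr, c + dc) not in mask
    )
    rows = [r for r, _ in mask]
    cols = [c for _, c in mask]
    mr, mc = sum(rows) / area, sum(cols) / area
    # Second central moments (covariance of the pixel coordinates).
    mu_rr = sum((r - mr) ** 2 for r in rows) / area
    mu_cc = sum((c - mc) ** 2 for c in cols) / area
    mu_rc = sum((r - mr) * (c - mc) for r, c in mask) / area
    # Eigenvalues of the 2x2 covariance matrix, lam1 >= lam2.
    half_trace = (mu_rr + mu_cc) / 2
    root = math.sqrt(((mu_rr - mu_cc) / 2) ** 2 + mu_rc ** 2)
    lam1, lam2 = half_trace + root, half_trace - root
    bbox_area = (max(rows) - min(rows) + 1) * (max(cols) - min(cols) + 1)
    return {
        "area": float(area),
        "perimeter": float(perimeter),
        "circularity": 4 * math.pi * area / perimeter ** 2,
        "eccentricity": math.sqrt(1 - lam2 / lam1) if lam1 > 0 else 0.0,
        "major_axis": 4 * math.sqrt(lam1),
        "minor_axis": 4 * math.sqrt(max(lam2, 0.0)),
        "extent": area / bbox_area,  # area relative to the bounding box
    }
```

A square nucleus scores zero eccentricity and a one-pixel-wide line scores one, which makes the two extremes easy to check.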
15. Class Assignment
Figure: nuclear qualities on a scale from 1 (oligodendroglioma) to 10 (astrocytoma)
16. Astrocytoma vs Oligodendroglioma
Overlap in genetics, gene expression, histology
• Assess nuclear size (area and
perimeter), shape (eccentricity,
circularity, major axis, minor axis, Fourier
shape descriptor and extent ratio),
intensity (average, maximum, minimum,
standard error) and texture (entropy,
energy, skewness and kurtosis).
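The intensity and texture measures above can be sketched as first-order statistics of a nucleus's gray-level values. This assumes histogram-based definitions (entropy in bits, energy as the sum of squared histogram probabilities, non-excess kurtosis); the study may have used other texture definitions, and `intensity_texture_features` is a hypothetical name.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def intensity_texture_features(values: Sequence[int]) -> Dict[str, float]:
    """First-order intensity and texture statistics for one nucleus."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # population variance
    sd = math.sqrt(var)
    # Normalized gray-level histogram: probability of each observed level.
    probs = [count / n for count in Counter(values).values()]
    if sd > 0:
        skewness = sum((v - mean) ** 3 for v in values) / (n * sd ** 3)
        kurtosis = sum((v - mean) ** 4 for v in values) / (n * sd ** 4)
    else:
        skewness = kurtosis = 0.0
    # Standard error of the mean, using the sample (n - 1) variance.
    std_error = math.sqrt(var / (n - 1)) if n > 1 else 0.0
    return {
        "mean": mean,
        "max": float(max(values)),
        "min": float(min(values)),
        "std_error": std_error,
        "entropy": -sum(p * math.log2(p) for p in probs),
        "energy": sum(p * p for p in probs),
        "skewness": skewness,
        "kurtosis": kurtosis,
    }
```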
17. Machine-based Classification of TCGA GBMs (J Kong)
Whole slide scans from 15 TCGA GBMs (69 slides)
7 purely astrocytic in morphology; 7 with 2+ oligo components
399,233 nuclei analyzed for astro/oligo features
Cases were categorized based on ratio of oligo/astro cells
Separation: p = 1.4 × 10^-22
TCGA gene expression query: c-Met overexpression
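The case-level categorization can be sketched by pooling per-nucleus labels across each case's slides. The slide does not state the cut-off, so `OLIGO_RATIO_THRESHOLD`, the function name and the category names below are hypothetical placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical cut-off; the slide does not give the threshold actually used.
OLIGO_RATIO_THRESHOLD = 0.5


def categorize_cases(
    nuclei: Iterable[Tuple[str, str]],
    threshold: float = OLIGO_RATIO_THRESHOLD,
) -> Dict[str, Tuple[float, str]]:
    """Pool (case_id, label) pairs, label 'oligo' or 'astro', across slides."""
    counts = defaultdict(lambda: {"oligo": 0, "astro": 0})
    for case_id, label in nuclei:
        counts[case_id][label] += 1
    result = {}
    for case_id, c in counts.items():
        # Oligo/astro cell ratio; max(..., 1) avoids dividing by zero.
        ratio = c["oligo"] / max(c["astro"], 1)
        category = "oligo component" if ratio >= threshold else "astrocytic"
        result[case_id] = (ratio, category)
    return result
```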
18. Examine gene expression profiles of low grade gliomas that progress to GBM for predictive
clustering and correlates with pathologic and radiologic features.
Figure: imaging, pathology and molecular data across time (1 – 8 yrs)
19. Hierarchical clustering of 176 Rembrandt samples using TCGA classification genes
defines four major subtypes.
Proneural Neural Mesenchymal Classical
(Lee Cooper and Carlos Moreno)
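Hierarchical clustering of expression profiles can be sketched as agglomerative clustering cut at a fixed number of clusters (four, for the subtypes above). The linkage (average) and the distance (1 minus Pearson correlation) are assumptions, not the settings used for the Rembrandt analysis; the naive loop is adequate for a few hundred samples, and profiles with zero variance are not handled.

```python
import math
from typing import List, Sequence


def pearson_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - Pearson correlation between two expression profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return 1.0 - cov / (sa * sb)


def average_linkage(profiles: List[Sequence[float]], k: int) -> List[int]:
    """Agglomerative clustering; returns a cluster index for each sample."""
    n = len(profiles)
    dist = [[pearson_distance(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                # Average pairwise distance between members of the two clusters.
                d = sum(dist[i][j] for i in clusters[x] for j in clusters[y])
                d /= len(clusters[x]) * len(clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x].extend(clusters.pop(y))  # y > x, so index x stays valid
    labels = [0] * n
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels
```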
20. Predicting Recurrence/Survival
• 75 lower-grade gliomas in REMBRANDT (p < 0.0003)
• 43 oligodendrogliomas in REMBRANDT (p < 0.0002)
Lee Cooper
Carlos Moreno
21. Neuroimaging Correlates
Define relationship between
contrast-enhancement,
perfusion and permeability with
vascular changes
Correlate MR characteristics
defined by the Vasari Feature
Set with pathologic grade,
vascular morphology and gene
expression profiles
22. Angiogenesis Segmentation
Figure: angiogenesis segmentation pipeline
• H&E color image → color deconvolution → hematoxylin image and eosin image
• Eosin intensity image → spatial normalization → density calculation → density image
• Density image → object ID → boundary smoothing → segmented vessels (angiogenic segmentation)
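The color deconvolution step separates each H&E pixel into per-stain amounts. A minimal sketch of Ruifrok-Johnston style deconvolution, assuming their published H&E optical-density vectors and a white background of 255; the third vector is taken as the cross product of the other two, and all function and constant names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _unit(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _cross(a: Sequence[float], b: Sequence[float]) -> List[float]:
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


# H&E optical-density vectors published by Ruifrok & Johnston (2001); the
# third "residual" stain is the cross product, an assumption for H&E slides.
HEMATOXYLIN = _unit([0.65, 0.70, 0.29])
EOSIN = _unit([0.07, 0.99, 0.11])
RESIDUAL = _unit(_cross(HEMATOXYLIN, EOSIN))
# Determinant of the stain matrix whose rows are H, E and residual.
_DET = _dot(HEMATOXYLIN, _cross(EOSIN, RESIDUAL))


def deconvolve(rgb: Sequence[float], background: float = 255.0) -> Tuple[float, float, float]:
    """Return (hematoxylin, eosin, residual) stain amounts for one RGB pixel."""
    # Beer-Lambert law: optical density per channel; clamp to avoid log(0).
    od = [-math.log10(max(v, 1.0) / background) for v in rgb]
    # Solve od = h*H + e*E + r*R: each cross product is orthogonal to two of
    # the stain vectors, so the dot product isolates one stain amount.
    h = _dot(od, _cross(EOSIN, RESIDUAL)) / _DET
    e = _dot(od, _cross(RESIDUAL, HEMATOXYLIN)) / _DET
    r = _dot(od, _cross(HEMATOXYLIN, EOSIN)) / _DET
    return h, e, r
```

The eosin channel produced this way is what the density-calculation step would then smooth and threshold.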
24. Recent Findings from Integrated Analysis of
Necrosis, Angiogenesis, Gene Expression in
GBM
• Lee A.D. Cooper; Carlos S. Moreno; Candace S. Chisolm; Christina Appin;
David A. Gutman; Jun Kong; Tahsin Kurc; Joel H. Saltz; Daniel J. Brat
• Frozen sections from 88 GBM samples were manually marked to identify
regions of necrosis and angiogenic vessels exhibiting endothelial
hypertrophy, hyperplasia, or complex microvascular proliferation
• Markups were used to calculate extent of both necrosis and angiogenesis
as a percentage of total tissue area
• Gene expression from the HT-HGU133A platform analyzed using
Significance Analysis of Microarrays (SAM); Cox Proportional Hazards
modeling to identify mRNAs significantly associated with extent of necrosis
and/or angiogenesis using a false discovery rate cutoff of < 5%
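SAM estimates the false discovery rate by permutation. As a simpler stand-in for thresholding at FDR < 5%, the Benjamini-Hochberg step-up procedure below flags which tests pass; it is a different technique from SAM and is shown only to illustrate FDR control. `benjamini_hochberg` is a hypothetical name.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], fdr: float = 0.05) -> List[bool]:
    """Return a flag per test: True if it is significant at the given FDR."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k (1-based) with p_(k) <= k/m * fdr.
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * fdr:
            cutoff_rank = rank
    # Every test ranked at or below k is declared significant.
    significant = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[i] = True
    return significant
```

The step-up rule means a smaller p-value that misses its own threshold can still pass when a larger one further down the ranking qualifies.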
25. Recent Findings from Integrated Analysis of
Necrosis, Angiogenesis, Gene Expression in
GBM
• Associated with necrosis were master regulators of the
mesenchymal tumor subtype, including C/EBPβ, C/EBPδ, STAT3,
FOSL2, and RUNX1
• IPA analysis of genes correlated with necrosis identified significantly
enriched canonical pathways including:
• HIF-1α (p = 3.0e-7), NFκB (p = 1.4e-3),
• IL-6 (p = 6.9e-6), FGF (p = 2.7e-5),
• ERK/MAPK (p = 1.2e-4),
• Protein Kinase A signaling (p = 1.9e-4),
• Thrombin signaling (p = 5.2e-3),
• HGF (p = 0.023) signaling.
26. Vasari Imaging Criteria
(Adam Flanders, TJU; Dan Rubin, Stanford; Lori Dodd, NCI)
• Require standardized validated feature sets to
describe de novo disease.
• Fundamental obstacle to new imaging criteria
as treatment biomarkers is
lack of standard terminology:
– To define a comprehensive set of imaging
features of cancer
– For reporting imaging results
– To provide a more quantitative, reproducible
basis for assessing baseline disease and
treatment response
27. Classify Imaging Features of Entire
Tumor and Resected Specimen
• Record features of the entire tumor at baseline.
• Distinguish features that comprise tissue in resected specimen.
Imaging Features of Resected Specimen
• Extent of resection of enhancing tumor
• Extent of resection of nCET
• Extent of resection of vasogenic edema
28. Defining Rich Set of Qualitative and
Quantitative Image Biomarkers
• Community-driven ontology development
project; collaboration with ASNR
• Imaging features (5 categories)
– Location of lesion
– Morphology of lesion margin (definition, thickness,
enhancement, diffusion)
– Morphology of lesion substance (enhancement, PS
characteristics, focality/multicentricity, necrosis, cysts, midline
invasion, cortical involvement, T1/FLAIR ratio)
– Alterations in vicinity of lesion (edema, edema
crossing midline, hemorrhage, pial invasion, ependymal invasion,
satellites, deep WM invasion, calvarial remodeling)
– Resection features (extent of nCE tissue, CE tissue, resected
components)
29. Results: Reader Agreement
• High inter-observer agreement among
the three readers
– (kappa = 0.68, p<0.001)
• Percentage agreement was also high for most features
individually
– 22 of 30 features (73%) had agreement greater than
50%
– Twelve features (40%) had >80% agreement
– No feature had less than 20% agreement
• Feature agreement rose substantially when a tolerance of
+/- 1 was allowed.
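The agreement statistics above can be sketched as follows. The slide does not say which kappa was used; Fleiss' kappa for three readers is assumed here, and "agreement with tolerance" is assumed to mean that all readers' ordinal scores for a case lie within ±1 of each other. Function names are hypothetical.

```python
from collections import Counter
from typing import Sequence


def fleiss_kappa(ratings: Sequence[Sequence[int]]) -> float:
    """ratings[i] holds every reader's score for case i (same reader count)."""
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    totals = Counter()
    p_bar = 0.0
    for row in ratings:
        counts = Counter(row)
        totals.update(counts)
        # Proportion of agreeing reader pairs for this case.
        p_bar += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1))
    p_bar /= n_cases
    # Chance agreement from the overall share of each score.
    p_e = sum((t / (n_cases * n_raters)) ** 2 for t in totals.values())
    return (p_bar - p_e) / (1 - p_e)  # undefined if every score is identical


def percent_agreement(ratings: Sequence[Sequence[int]], tolerance: int = 0) -> float:
    """Share of cases where all readers' scores lie within `tolerance`."""
    agree = sum(1 for row in ratings if max(row) - min(row) <= tolerance)
    return agree / len(ratings)
```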
30. Preliminary Relationships of Features to Survival
• Cox proportional hazards models were fit relating
each of the thirty features to overall
survival.
• Features associated with lower survival
included (p<.0001):
– Proportion of enhancing tissue at baseline.
– Thick or nodular enhancement characteristics.
– Contralateral hemisphere invasion.
• Proportion of non-contrast-enhancing tumor
(nCET) was positively associated with survival.
• Tumor size at baseline had no relationship to
survival.
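Each per-feature Cox model above has a single covariate, so fitting it means maximizing the partial likelihood in one parameter. A minimal Newton-Raphson sketch with Breslow handling of ties, assuming numeric feature scores; a positive coefficient means higher scores carry a higher hazard (lower survival). `cox_univariate` is a hypothetical name, and a statistics package would normally also report standard errors and p-values.

```python
import math
from typing import Sequence


def cox_univariate(x: Sequence[float], time: Sequence[float],
                   event: Sequence[int], iterations: int = 50) -> float:
    """Log hazard ratio for one covariate, by Newton-Raphson (Breslow ties)."""
    beta = 0.0
    idx = range(len(x))
    for _ in range(iterations):
        score, info = 0.0, 0.0
        for i in idx:
            if not event[i]:
                continue  # censored subjects only contribute to risk sets
            # Risk set: everyone still under observation at this event time.
            risk = [j for j in idx if time[j] >= time[i]]
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            s1 = sum(wj * x[j] for wj, j in zip(w, risk))
            s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
            mean = s1 / s0
            score += x[i] - mean           # first derivative of log-likelihood
            info += s2 / s0 - mean ** 2    # observed information
        step = score / info
        beta += step
        if abs(step) < 1e-10:
            break
    return beta
```

Recoding a binary feature as 1 - x should flip the sign of the estimate, which is a quick sanity check.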
31. Recent Findings Relating Radiology, Pathology
“Omics”
• Linear regression models incorporating multiple imaging features or
a single VASARI feature (ependymal extension) and tumor gene
expression can be used to predict patient survival.
• Multiple statistically significant associations between imaging and
genomic features in glioblastomas. EGFR mutant tumors were
significantly larger than TP53 mutant tumors, and were more likely
to demonstrate pial involvement. CDKN2A homozygous deletion
was associated with an ill-defined nonenhancing tumor margin and
enhancing pial involvement.
• Significant association between minimal enhancing tumor (≤5%
proportion of the overall tumor) and Proneural classification
(p=0.0006). Significant association between a >5% proportion of
necrosis and the presence of microvascular hyperplasia in
pathology slides (p=0.008).
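An association such as minimal enhancing tumor versus Proneural classification reduces to a 2×2 table. The slide does not name the test used; a two-sided Fisher's exact test is one common choice, sketched here with the standard library under a hypothetical function name.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Sum every table no more probable than the observed one
    # (small relative tolerance guards against floating-point ties).
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-7))
```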
32. Minority Grid
Grady, Kaiser-Atlanta, MSM-East Point, Jackson-Hinds
Morehouse, Emory, Jackson Heart Study, University of
Washington, Baylor
• Aim 1: Establish organizational framework as a consortium of
academic medical centers and minority-serving “safety net” medical
care facilities
• Aim 2: Establish an EHR-linked bioinformatics/bio-repository
infrastructure that facilitates in depth genotyping, phenotypic
characterization and longitudinal surveillance of minority patients
• Aim 3: Demonstrate utility of MH-GRID with a “use case” project that
defines genetic, personal and social-environmental determinants of
severe hypertension in African Americans
• This platform could also be leveraged to carry out cancer
studies
33. Overall Goals of Minority Grid
• Breadth and nature of genomic variation associated with
clinical phenotypes among patients of various bio-
geographical ancestral groups
• Bio-ancestry-specific, low frequency/major effect DNA
variants that contribute to racial differences in drug
responsiveness, health outcomes and health disparities
• Characterization of admixture
• Long term outcome of patients with at-risk variants
revealed by whole exome sequencing
34. Approach
• Identification of 1200 cases, 1200 controls
– Controls have longitudinal followup with BP consistently below
120/80
• Whole exome sequencing
• Detection of new common variants and rare/low frequency
variants
• EHR data, interview data: health literacy, perceived stress,
dietary intake, physical activity, neighborhood characteristics
(via geocoding)
• Clinical Laboratory analyses: electrolytes, plasma creatinine,
lipid profile, glucose, estimated GFR
• Project funded for roughly 2 months and is getting
underway
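Separating common variants from rare and low-frequency ones starts from allele frequencies. A minimal sketch, assuming genotypes coded as alternate-allele counts (0, 1 or 2) per person and the conventional minor-allele-frequency cut-offs of 1% and 5%; the project's actual thresholds are not stated, and the names are hypothetical.

```python
from typing import Sequence

# Conventional MAF cut-offs (an assumption, not the project's stated values).
RARE_MAX = 0.01
LOW_FREQ_MAX = 0.05


def minor_allele_frequency(genotypes: Sequence[int]) -> float:
    """genotypes: alternate-allele counts (0, 1 or 2) per diploid individual."""
    freq = sum(genotypes) / (2 * len(genotypes))
    return min(freq, 1 - freq)  # the minor allele may be ref or alt


def frequency_class(maf: float) -> str:
    if maf < RARE_MAX:
        return "rare"
    if maf < LOW_FREQ_MAX:
        return "low frequency"
    return "common"
```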
37. The ca“BIG” Picture
Challenges
• Unprecedented magnitude of change
throughout the system
• Constant flow of information to
manage
• Legacy systems
• Cultural barriers
(from Ken Buetow)
38. The ca“BIG” Picture
The cancer Biomedical Informatics
Grid (caBIG):
• Standards-based vocabulary,
data elements, data models
facilitate information exchange
• Common, widely distributed
infrastructure permits cancer
community to focus on innovation
• Collection of interoperable component-based
applications developed to common
standards
• Cancer information is widely available to
diverse communities (from Ken Buetow)
39. Biomedical Informatics and Middleware
• Translates and integrates information: natural language processing,
ontologies
• Disseminates information: Grid, information integration
• Brings in information: Grid, information integration
40. caGrid -- “Octopus middleware”
caGrid Components
– Language (metadata,
ontologies)
– Grid Service Graphical
Development Toolkit
(Introduce)
– DICOM compatibility (IVI
middleware)
– Security (GAARDS)
– Advertisement and
Discovery
– Workflow
41. Integrated BIP
• Architecture working group to design a common
architecture
• Collaborative projects
– Security infrastructure
– Testing framework
– Bioinformatics support
– Registry implementation at Grady for quality improvement
and cardiovascular research
– LIMS deployment for biospecimen management
– i2b2 deployment for clinical data
• Leverage institutional strengths for education and
training
• Leverage over $3.8M in grant and internal funding this
year
42. ACTSI-wide Federated Data Warehouse System
Develop integrative, federated ACTSI information warehouse
Integrated clinical/imaging/“omic”/biomarker/tissue information
should always be available
A virtually centralized, Atlanta-wide information warehouse that
has all relevant data
Patients seen and information gathered at any ACTSI site, specimens sent
to any affiliated core, imaging carried out at any affiliated site
Give me all gene expression, SNP, virtual slide images, hematology
studies and CMV serologies for kidney transplant candidates accrued
into Study X or Study Y between Feb 2011 and Jan 2012 who were on
the kidney transplant waiting list as of November 1, 2010.
Development efforts
Security, Web Portal, Common Data Elements & Vocabularies,
Identifiers, High-performance Computing middleware, Testing
framework.
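The example query can be sketched as a fan-out to every site followed by a merge on a shared patient identifier. Everything here is hypothetical: the `Record` fields, the in-memory site data, and the reading of "on the waiting list as of" as listed on or before that date (delisting is not modeled).

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional, Set


@dataclass
class Record:
    patient_id: str                   # hypothetical identifier shared across sites
    study: str
    accrual_date: date
    waitlisted_since: Optional[date]
    data_types: Set[str] = field(default_factory=set)


def site_query(records: List[Record], studies: Set[str], start: date,
               end: date, waitlist_as_of: date) -> List[Record]:
    """Runs locally at one site; only matching records leave the site."""
    return [
        r for r in records
        if r.study in studies
        and start <= r.accrual_date <= end
        and r.waitlisted_since is not None
        and r.waitlisted_since <= waitlist_as_of
    ]


def federated_query(sites: Dict[str, List[Record]], wanted_types: Set[str],
                    **criteria) -> Dict[str, Set[str]]:
    """Fan the query out to every site and merge the results per patient."""
    merged: Dict[str, Set[str]] = {}
    for records in sites.values():
        for r in site_query(records, **criteria):
            merged.setdefault(r.patient_id, set()).update(r.data_types & wanted_types)
    return merged
```

Filtering at each site keeps non-matching records local, which is the main point of a federated rather than physically centralized warehouse.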
43. Crucial to Leverage Institutional
Data
Overview (diagram)
• Acquisition: ADT, Lab, Respiratory, Blood, Endoscopy, Cardiology,
Siemens Img, CPOE, OR system, Patient Mgmt, Dictated reports,
Pathology reports, Patient Billing, Practice Plans, Pt Satisfaction,
Cancer Genetics, Wound Images, Tissue, Pulmonary, Genomic Data,
Error Report
• Transfer: data integration in real time, daily, weekly and monthly
• Information Warehouse: Business, Clinical, Research, External and
Meta Data; De-Identification via an Honest Broker
• User Access: Multi-Dimensional Analysis & Data Mining, Ad-hoc Query,
Text Mining/NLP, Image Analysis, Web Scorecards & Dashboards,
Web Center, Research, Benchmarking
Ohio State Information Warehouse Infrastructure
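The diagram names an honest broker and de-identification but not how they work. One common approach, sketched here as an assumption, is keyed hashing: only the broker holds a secret key and maps each medical record number to a stable research ID, while direct identifiers are dropped. The function names and the identifier list are illustrative, not a complete HIPAA Safe Harbor list.

```python
import hashlib
import hmac

# Illustrative subset of direct identifiers, not the full HIPAA list.
DIRECT_IDENTIFIERS = {"name", "mrn", "address", "phone"}


def pseudonymize(mrn: str, secret_key: bytes) -> str:
    """Stable research ID for an MRN; irreversible without the broker's key."""
    digest = hmac.new(secret_key, mrn.encode("utf-8"), hashlib.sha256).hexdigest()
    return "R-" + digest[:16]


def deidentify(record: dict, secret_key: bytes) -> dict:
    """Drop direct identifiers and attach the research ID."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    out["research_id"] = pseudonymize(record["mrn"], secret_key)
    return out
```

Because the same MRN always yields the same ID, records from different source systems can still be linked after de-identification.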
45. Enhanced Registries
• Linked Databases for Research
• Leverages common data elements and models and existing standards.
Initially for cardiovascular disease, diabetes and co-morbidities.
• Derived data elements represent categories of data and temporal
patterns of interest.
• Linked to source data – initially, the Emory Healthcare Clinical Data
Warehouse and the Grady Health System Diabetes Patient Tracking
System.
• Supports end-user researcher query and analysis.
• Research PACS
• Federated support for management of image data.
• DICOM standards and Grid services for federated access.
• Management of image analysis results.
46. Registry Project Status
• Co-morbidity registry prototype completed
that exports demographics, encounters,
readmissions, discharge diagnoses and
diagnosis categories, and medication
categories to Excel pivot tables
• Has been used by Emory Healthcare to identify
co-morbidities associated with readmissions
for patient populations at high risk
• System development is ongoing
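Readmissions are a typical derived data element: a flag computed from a temporal pattern in encounter data. A minimal sketch, assuming encounters as (patient, admit date, discharge date) tuples and a 30-day window, which is a common convention rather than the registry's stated definition; the names are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Iterable, List, Tuple

Encounter = Tuple[str, date, date]  # (patient_id, admit_date, discharge_date)


def readmission_flags(encounters: Iterable[Encounter],
                      window_days: int = 30) -> List[Tuple[str, date, bool]]:
    """For each discharge, was the patient admitted again within window_days?"""
    by_patient = defaultdict(list)
    for pid, admit, discharge in encounters:
        by_patient[pid].append((admit, discharge))
    flags = []
    for pid, stays in by_patient.items():
        stays.sort()  # chronological order by admission date
        for k, (_, discharge) in enumerate(stays):
            next_admit = stays[k + 1][0] if k + 1 < len(stays) else None
            readmitted = (next_admit is not None
                          and (next_admit - discharge).days <= window_days)
            flags.append((pid, discharge, readmitted))
    return flags
```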
48. Distinguishing Characteristics in
Gliomas
Nuclear Qualities
• Oligodendroglioma: round shaped, with smooth, regular texture
• Astrocytoma: elongated, with rough, irregular texture
Use image analysis algorithms to segment and classify microanatomic
features (Nuclei, Astrocytoma, Necrosis ...) in whole slide images
Represent the segmentation and classification in a well-defined
structured format that can be used to correlate the pathology with
other data modalities
49. PAIS Database
Implemented with IBM DB2 for large-scale pathology
image metadata (~1 million markups per slide)
Represented by a complex data model capturing multi-
faceted information including markups, annotations,
algorithm provenance, specimen, etc.
Support for complex relationships and spatial query: multi-
level granularities, relationships between markups and
annotations, spatial and nested relationships
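The kind of spatial query PAIS supports can be illustrated at small scale. This sketch uses SQLite and axis-aligned bounding boxes instead of DB2 and true spatial types; the two-table schema and function names are hypothetical and far simpler than the PAIS data model.

```python
import sqlite3
from typing import List


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE markup (
            markup_id INTEGER PRIMARY KEY,
            slide_id  TEXT NOT NULL,
            algorithm TEXT NOT NULL,          -- provenance of the markup
            min_x REAL, min_y REAL,           -- bounding box in pixels
            max_x REAL, max_y REAL
        );
        CREATE TABLE annotation (
            markup_id INTEGER REFERENCES markup(markup_id),
            name  TEXT,                       -- e.g. 'area', 'eccentricity'
            value REAL
        );
        CREATE INDEX markup_bbox ON markup(slide_id, min_x, min_y);
    """)


def nuclei_in_region(conn: sqlite3.Connection, slide_id: str, x0: float,
                     y0: float, x1: float, y1: float,
                     min_area: float = 0.0) -> List[int]:
    """Markups lying wholly inside the rectangle with area >= min_area."""
    rows = conn.execute("""
        SELECT m.markup_id FROM markup m
        JOIN annotation a ON a.markup_id = m.markup_id AND a.name = 'area'
        WHERE m.slide_id = ? AND m.min_x >= ? AND m.min_y >= ?
          AND m.max_x <= ? AND m.max_y <= ? AND a.value >= ?
        ORDER BY m.markup_id""", (slide_id, x0, y0, x1, y1, min_area))
    return [r[0] for r in rows]
```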
51. PAIS Database and Analysis Pipeline
Suite of analysis algorithms and pipelines that carry out
the following tasks:
1. segmentation of cells and nuclei;
2. characterization of shape and texture features of
segmented nuclei;
3. storage of nuclei meta-data in relational database;
4. mechanism supporting spatial queries for human-
annotated nuclei;
5. machine learning methods that integrate information from
features to accomplish classification tasks.
52. Image Mining for Comparative Analysis of
Expression Patterns in Tissue Microarray
(PIs: Foran and Saltz)
Build reference library of
expression signatures, integrate
state-of-the-art multi-spectral
imaging capability and build a
deployable clinical decision support
system for analyzing imaged specimens.
Technologies and computational
tools developed during the course of
the project to be tested on a
Grid-enabled, virtual laboratory
established among strategic sites
located at CINJ, Emory, RU, UPenn,
OSU, and ASU.
Funded by NIH through grant
#5R01LM009239-02 David J. Foran, Ph.D.
53. ACTSI: Example Active Biomedical Informatics Projects
In Silico Study of Brain Tumors
Minority Health Genomics and Translational Research
Bio-Repository Database (MH-GRID)
ACTSI Cardiovascular, Diabetes, Brain Tumor Registry
Early Hospital Readmission
CFAR (Center for AIDS Research) HIV/Cancer Project
Radiation Therapy and Quantitative Imaging
Integrative Analysis of Text and Discrete Data Related
to Smoking Cessation and Asthma
Metadata Analysis of Glycan Structures
Semantic Query and Analysis of Integrative Datasets in
Renal Transplant Clinical Studies (CTOT-C)
54. Thanks to:
• In silico center team: Dan Brat (Science PI), Tahsin Kurc, Ashish
Sharma, Tony Pan, David Gutman, Jun Kong, Sharath Cholleti, Carlos
Moreno, Chad Holder, Erwin Van Meir, Daniel Rubin, Tom Mikkelsen,
Adam Flanders, Joel Saltz (Director)
• caGrid Knowledge Center: Joel Saltz, Mike Caliguiri, Steve Langella
co-Directors; Tahsin Kurc, Himanshu Rathod Emory leads
• caBIG In vivo imaging team: Eliot Siegel, Paul Mulhern, Adam
Flanders, David Channon, Daniel Rubin, Fred Prior, Larry Tarbox and
many others
• In vivo imaging Emory team: Tony Pan, Ashish Sharma, Joel Saltz
• Emory ATC Supplement team: Tim Fox, Ashish Sharma, Tony Pan, Edi
Schreibmann, Paul Pantalone
• Digital Pathology R01: Foran and Saltz; Jun Kong, Sharath Cholleti,
Fusheng Wang, Tony Pan, Tahsin Kurc, Ashish Sharma, David
Gutman (Emory), Wenjin Chen, Vicky Chu, Jun Hu, Lin Yang, David J.
Foran (Rutgers)