Cardiotoxicity is unfortunately a common side effect of many modern chemotherapeutic agents. The mechanisms that underlie these detrimental effects on heart muscle, however, remain unclear. The Drug Toxicity Signature Generation Center at ISMMS aims to address this unresolved issue by providing a bridge between molecular changes in cells and the prediction of pathophysiological effects. I will discuss ongoing work in which we use next-generation sequencing to quantify changes in gene expression that occur in cardiac myocytes after they are treated with potentially toxic chemotherapeutic agents. I will focus in particular on the computational pipeline we are developing that integrates sophisticated sequence alignment, statistical and network analysis, and dynamical mathematical models to develop novel predictions about the mechanisms underlying drug-induced cardiotoxicity.
Jaehee Shim is a Ph.D. candidate in the Biophysics and Systems Pharmacology Program at the Icahn School of Medicine at Mount Sinai (ISMMS). As part of her Ph.D. studies, she is building dynamical prediction models based on analysis of gene expression data generated by the Drug Toxicity Signature Generation Center at ISMMS. She received her B.S. in Biochemistry from the University of Michigan-Dearborn. Prior to starting her Ph.D., Jaehee worked at the ISMMS Genomics Core with a team of senior scientists and gained experience in improving and troubleshooting RNA sequencing protocols on next-generation sequencing platforms.
2. Big Data in the Field of Biology:
In the Beginning…
Notable Events that Led to the Big Data Era:
Sanger Sequencing (1977)
Roger Tsien et al. Patented “Base-by-Base” Technology (1990)
Pyrosequencing Introduced by Ronaghi & Nyrén (1996)
Human Genome Project (1990-2003)
Big Data Sources:
Genome
Transcriptome (expressed genome)
Proteome
Electronic Medical Records
3. Big Data in the Field of Biology:
In the Beginning…
Human Body:
13 organ systems in the human body with 4 basic tissue types
15-70 trillion cells
Genome: complete set of genetic information; same in every cell
Transcript (messenger RNA): selectively expressed genes; specific to the tissue/organ cell type
Protein: proteins are made from transcripts; multiple versions of a protein can arise from one transcript (post-translational modification)
[Figure: drawing of a woman's torso from the Anatomical Notebooks of Leonardo da Vinci (1452-1519)]
4. Sequencing Data:
How big are they?
Stephens et al.(2015) PLoS Biol 13(7): e1002195
Projected annual storage & computing needs in 2025
…so in 2025, we can expect to see the annual production of
1 × 10^21 bases/year × 1 byte/4 bases = 2.5 × 10^20 bytes
OR
250 exabytes!
Just from sequencing alone!
[Figure: projected annual storage need in bytes (axis from 0 to 5E+19) for Twitter, YouTube, and Genomics]
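The storage estimate on this slide is plain unit arithmetic and can be checked in a few lines. The sketch below assumes the slide's packing of 4 bases per byte (2 bits per base) and decimal exabytes (1 EB = 10^18 bytes); the function name is illustrative only.

```python
def sequencing_storage_exabytes(bases_per_year: float, bases_per_byte: int = 4) -> float:
    """Convert an annual sequencing output in bases to exabytes (1 EB = 1e18 bytes)."""
    bytes_per_year = bases_per_year / bases_per_byte
    return bytes_per_year / 1e18


# Projected 2025 output from the slide: 1 x 10^21 bases per year.
projected_eb = sequencing_storage_exabytes(1e21)
print(f"{projected_eb:.0f} exabytes per year")
```

The figure covers raw base calls only; quality scores and metadata, which real sequencing files also carry, would add to it.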
5. Now that we have covered the basics…
How are we using this BIG DATA
approach to predict drug-induced
cardiotoxicity?
6. Imperfections of Modern Drug Design
Drug Toxicity: Alternative drug targets perturb cellular dynamics and induce adverse events in a patient
How Common are Drug Toxicity Events?
770,000 injuries or deaths in the US per year, per the Agency for Healthcare Research and Quality
By Stephen Jeffrey, The Economist
7. Cancer Drugs and Cardiac Health
Prediction of toxicity requires more investigation.
Underlying mechanisms are not clear.
Albini et al. (2009) J. Natl. Cancer Inst. 102:14–25.
8. Principal Investigators:
Marc Birtwistle
Ravi Iyengar
Eric Sobie
Cellular Signatures for Cardiotoxicity of Targeted Cancer Drugs
(Protein Kinase Inhibitors)
Can we obtain precise and personalized signatures?
Drug Toxicity Signature Generation Center (DToxS)
Protein kinase inhibition → altered gene expression → cardiomyopathy (cardiotoxicity)
9. Why Do We Want to Personalize Medicine?
Before, we had to prescribe the same drugs to EVERYONE…
Now, we can SELECTIVELY prescribe to the ONES who are expected to respond!
Advantages?
Precise, effective delivery of treatment for the individual patient
Lower risk of unnecessary side effects
Reduced medical costs for treatments that may not work
10. Drug-Induced Toxicity Prediction Strategy
1. Electrophysiological abnormality: Arrhythmia
2. Structural abnormality: Dilated Cardiomyopathy (thinning of the walls)
Prediction can be made with mathematical modeling
Strategy components: Transcriptome Data, Gene Perturbation Measurements, Network Analysis, Mathematical Modeling
Prediction of abnormalities is assessed by integrating transcriptome data with dynamical models
[Figure legend: Upregulated, Downregulated]
11. Experimental & Computational Strategy for Years 1-2
(1) Focus on cardiotoxicity caused by cancer therapeutics, e.g. tyrosine kinase inhibitors (TKIs)
(2) Treat cells with clinically relevant doses of FDA-approved TKIs, with mitigating non-cancer drugs as controls.
Mitigators identified from clinical data in the FDA Adverse Event Reporting System (FAERS)
(3) Measure changes in gene expression and protein levels at 48 hours using mRNA-seq
and proteomics
(4) Analyze results to obtain signatures, build biologically-relevant networks, and
integrate network analysis data with predictive dynamical models to obtain
dynamically ranked signatures
12. Candidate Cancer Drugs & Control Drugs
Kinase Inhibitors with Cardiac Risk:
SORAFENIB, DASATINIB, SUNITINIB, PAZOPANIB, TOFACITINIB, RUXOLITINIB, CRIZOTINIB, AFATINIB, ERLOTINIB, REGORAFENIB, GEFITINIB, PONATINIB, IMATINIB, DABRAFENIB, BOSUTINIB, VEMURAFENIB, VANDETANIB, CABOZANTINIB, LAPATINIB, TRAMETINIB, NILOTINIB, CERITINIB, AXITINIB
Control Drugs:
URSODEOXYCHOLIC ACID, PREDNISOLONE, LOPERAMIDE, DOMPERIDONE, ALENDRONATE, APREPITANT, PAROXETINE, DIETHYLPROPION, ESTRADIOL, ENTECAVIR, MONTELUKAST, OLMESARTAN, CYCLOSPORINE, DICLOFENAC, CEFUROXIME, CYTARABINE, METHOTREXATE, GRANISETRON, LOXAPINE
13. Experimental design
Compare cardiotoxic cancer drugs with non-toxic non-cancer drugs and combinations
Treatment groups: Vehicle (CTRL), Cardiotoxic Drug, Non-Cardiotoxic Drug (CTRL Drug), Combination
Treatment duration: 48 HOURS
Measurements: mRNA-seq, Proteomics
Computational analysis to produce precise, personal signatures
15. Mapping/Counting of the Raw Gene Sequences
Raw sequence in text format (FASTQ file)
Reference Seq.
[Figure: schematic representation of how sequence fragments are aligned to a reference sequence]
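To make the mapping and counting step concrete, here is a toy sketch: it reads FASTQ records (four lines each: header, sequence, separator, quality) and counts the reads that occur verbatim in a named reference sequence. This exact-substring matching is a deliberate simplification for illustration and is not how a production aligner works; real aligners use an index and tolerate mismatches and gaps. The read and reference sequences are made up.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_fastq(lines: List[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from FASTQ text, four lines per record."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = (line.strip() for line in lines[i:i + 4])
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"Malformed FASTQ record starting at line {i + 1}")
        yield header[1:], seq, qual


def count_exact_hits(reads: Iterable[Tuple[str, str, str]],
                     references: Dict[str, str]) -> Counter:
    """Count reads found verbatim in each reference (first matching reference wins)."""
    counts: Counter = Counter()
    for _read_id, seq, _qual in reads:
        for name, ref in references.items():
            if seq in ref:
                counts[name] += 1
                break
    return counts


# Hypothetical data: three short reads and two short reference sequences.
fastq_text = """@read1
ACGTAC
+
IIIIII
@read2
TTGACC
+
IIIIII
@read3
GGGGGG
+
IIIIII
"""
references = {"geneA": "TTACGTACGG", "geneB": "CCTTGACCAA"}
counts = count_exact_hits(parse_fastq(fastq_text.splitlines()), references)
print(dict(counts))
```

The per-gene counts produced this way are the kind of table that downstream normalization and differential expression analysis start from.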
17. QC: How to Weed Out the Outliers from
Replicate Samples
To identify outliers, correlate each pair of samples in the same experimental group
We exclude Control Sample 4
as an outlier
Pearson correlation > 0.98
seems to indicate good
reproducibility for this assay;
future results will solidify this
QC standard
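The replicate QC described above can be sketched as follows. The code computes the Pearson correlation for every pair of samples in one experimental group and flags a sample when none of its pairwise correlations reaches the 0.98 threshold. That flagging rule is an assumption made for illustration (the slide does not state the exact criterion), and the expression values are invented.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def flag_outliers(samples, threshold=0.98):
    """Flag samples whose correlation with every other replicate is below threshold."""
    best = {name: -1.0 for name in samples}
    for a, b in combinations(samples, 2):
        r = pearson(samples[a], samples[b])
        best[a] = max(best[a], r)
        best[b] = max(best[b], r)
    return sorted(name for name, r in best.items() if r < threshold)


# Hypothetical expression values for four control replicates (six genes each).
controls = {
    "ctrl1": [10, 200, 35, 80, 5, 150],
    "ctrl2": [11, 195, 36, 82, 6, 148],
    "ctrl3": [9, 205, 33, 79, 5, 152],
    "ctrl4": [60, 90, 120, 20, 70, 40],  # behaves unlike the other replicates
}
outliers = flag_outliers(controls)
print(outliers)
```

With only two replicates in a group this rule cannot tell which sample is at fault, so it is most useful with three or more.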
19. Questions We Can Address With Gene Signatures
What patterns are common amongst potentially cardiotoxic protein kinase inhibitors?
PRECISION IN SIGNATURES
What differences are observed between drugs, and can these be connected to
differences in drug/target structure, dosing, and clinical data?
PERSONALIZED SIGNATURES
Can differences in signature patterns between human subjects (cell lines) help us
to understand inter-individual variability in drug toxicity?
Drug repurposing for cancer chemotherapy?
Can drug combination signatures help us to understand clinically-observed
toxicity mitigation?
20. Cardiotoxic Cancer Drugs Show a More
Consistent Pattern of Differential Expression
[Figure: number of genes plotted against average -log10(p-value) across drug group and against mean log2 fold change, for cancer drugs versus non-cancer (CTRL) drugs]
21. Cancer Drug Cardiotoxicity Processes are Enriched in the Initial Transcriptomic Signature
Cancer Drugs: cardiomyopathy-related GO biological processes (bar length: minus log10(p-value), axis 0 to 50)
collagen fibril organization
cellular localization
regulation of cellular component organization
regulation of apoptotic process
response to organic substance
response to wounding
cellular response to chemical stimulus
regulation of cell death
regulation of programmed cell death
regulation of cell migration
regulation of locomotion
regulation of cellular component movement
regulation of cell motility
cellular component organization
cellular component organization or biogenesis
negative regulation of cellular process
response to stress
negative regulation of biological process
extracellular structure organization
extracellular matrix organization
Themes: extracellular matrix, collagen, response to wounding; apoptosis, cell death; cell migration
Non-Cancer Drugs (CTRL): general GO biological processes (bar length: minus log10(p-value), axis 0 to 50)
protein complex disassembly
establishment of protein localization to membrane
macromolecular complex disassembly
mRNA catabolic process
cellular protein complex disassembly
translational elongation
nuclear-transcribed mRNA catabolic process
translational initiation
translational termination
viral life cycle
protein targeting to membrane
multi-organism metabolic process
protein localization to endoplasmic reticulum
nuclear-transcribed mRNA catabolic process, nonsense-…
viral gene expression
viral transcription
establishment of protein localization to endoplasmic…
protein targeting to ER
cotranslational protein targeting to membrane
SRP-dependent cotranslational protein targeting to…
Themes: co-translational protein targeting, translation, ribosomal proteins; (viral) transcription and mRNA catabolism; protein translation and protein complex assembly/disassembly
22. Differences Between Cancer Drugs: Relationship Between Gene Expression Similarity and Structural Similarity
[Figure: scatter plot of whole-transcriptome correlation coefficient (y-axis, 0.7 to 1) against Tanimoto coefficient for structural similarity (x-axis, 0.25 to 0.75); each point is a pair of drugs drawn from AFA, BOS, DAS, ERL, PAZ, RUX, SOR, SUN, and VAN]
Figure annotations:
High correlation because small changes in expression
Correlated structural and gene expression similarity between drugs
Preliminary efforts to define signature precision
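For readers unfamiliar with the structural similarity measure, the Tanimoto coefficient between two molecular fingerprints is the number of features they share divided by the number of features present in either. The sketch below represents each fingerprint as a set of "on" bit positions; the fingerprint type used for the slide is not stated, and the two example fingerprints are invented.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto coefficient: shared 'on' bits divided by bits set in either fingerprint."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


# Hypothetical fingerprints given as sets of bit positions that are switched on.
drug_x = {1, 4, 7, 9, 12}
drug_y = {1, 4, 8, 9, 15, 20}
similarity = tanimoto(drug_x, drug_y)
print(f"Tanimoto coefficient: {similarity:.3f}")
```

A value of 1 means identical fingerprints and 0 means no shared features, which is the scale of the x-axis in the comparison above.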
23. Next Step:
Prediction of Phenotypic Changes
Based on Gene Expression Data
Using Dynamical Modeling with
Differential Equations
24. Structural Abnormality Prediction: Hypertrophy
Ryall et al. (2012) JBC 287: 42259–42268.
[Figure: hypertrophy signaling network with three layers: Extracellular Stimuli, Interacting Species, Phenotypic Outputs]
Highlighted components:
Beta-adrenergic Receptor
MAP Kinase Pathway: a cascade of phosphorylation reactions that propagates the signal from the stimulus
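25. Structural Abnormality Prediction: Hypertrophy
Kraeutler et al. (2010) BMC Syst Biol. 4:157.
Methods: Model implemented using “normalized Hill” ordinary differential equations. Simulations of dynamics with minimal parameterization.
For a reaction in which species B activates species D:
$$\frac{dD}{dt} = \frac{1}{\tau_D}\left(w_{BD}\, f_{act}(B)\, D_{MAX} - D\right)$$
$$f_{act}(B) = \frac{\beta B^{n}}{K^{n} + B^{n}}, \qquad \beta = \frac{EC_{50}^{n} - 1}{2\,EC_{50}^{n} - 1}, \qquad K = (\beta - 1)^{1/n}$$
Each arrow represents a generic activation or inhibition reaction.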
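A minimal sketch of a single normalized-Hill reaction may help here. It assumes the form published in the cited Kraeutler et al. paper: dD/dt = (w * f_act(B) * D_max - D) / tau, where f_act(B) = beta * B^n / (K^n + B^n) and beta and K are chosen so that f_act(0) = 0, f_act(EC50) = 0.5, and f_act(1) = 1. The parameter values (n = 1.4, EC50 = 0.5, w = tau = D_max = 1) and the forward-Euler integration are assumptions for illustration, not the settings used for the results in this talk.

```python
def hill_constants(ec50: float, n: float):
    """Return (beta, K) such that f_act(0) = 0, f_act(ec50) = 0.5, f_act(1) = 1."""
    beta = (ec50 ** n - 1.0) / (2.0 * ec50 ** n - 1.0)
    k = (beta - 1.0) ** (1.0 / n)
    return beta, k


def f_act(b: float, ec50: float = 0.5, n: float = 1.4) -> float:
    """Normalized Hill activation for an input activity b, clamped to [0, 1]."""
    beta, k = hill_constants(ec50, n)
    b = min(max(b, 0.0), 1.0)
    return beta * b ** n / (k ** n + b ** n)


def simulate_node(b_input: float, w: float = 1.0, tau: float = 1.0,
                  d_max: float = 1.0, d0: float = 0.0,
                  dt: float = 0.01, t_end: float = 10.0) -> float:
    """Integrate dD/dt = (w * f_act(B) * d_max - D) / tau with forward Euler."""
    d = d0
    for _ in range(int(round(t_end / dt))):
        d += dt * (w * f_act(b_input) * d_max - d) / tau
    return d


# Activity of the downstream node D after 10 time constants, for three input levels.
for b in (0.0, 0.5, 1.0):
    print(f"B = {b:.1f} -> D = {simulate_node(b):.3f}")
```

Because every species is scaled between 0 and 1 and each reaction needs only a weight, a Hill coefficient, an EC50, and a time constant, a large network can be simulated with default parameters, which is the "minimal parameterization" mentioned above.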
26. Structural Abnormality Prediction: Hypertrophy
Modeling Strategy:
Transcriptome (~20,000 genes) → Genes in Hypertrophy Network (~106 genes)
Quantitative analysis of gene perturbation in the network
Mathematical simulation: simulate the time course of different pathway activation that leads to hypertrophy
Drugs: Trastuzumab, Sorafenib, Sunitinib
27. Hypertrophy Signaling Model Simulation
[Figure: four panels (NFAT, BNP, GSK3B, CREB) plotting normalized activity (0 to 2.5) against time (50 to 400 minutes) for Control, Sorafenib, Sunitinib, and Trastuzumab. Panel annotations mark “No Stimulus” and the stimulus given: phenylephrine (PE), stretch, isoproterenol (ISO), or fibroblast growth factor (FGF).]
Different Cancer Drugs Induce Different Responses in Gene Species for a Given Stimulus
Next Step: How Does Each Gene Node Contribute to Overall Phenotypic (Structural) Changes?
28. Raw Gene Expression Pattern in
Hypertrophy Network
[Figure: log fold change (Log FC) in gene expression across the hypertrophy network for Sorafenib, Sunitinib, and Trastuzumab]
Noticeable gene expression perturbation with Sorafenib
Mild gene expression changes with Sunitinib and Trastuzumab
Q. Does this noticeable gene perturbation necessarily mean activation of hypertrophy?
Next Step:
Using the Hypertrophy Network Model, simulate the projected changes in hypertrophic phenotypes by integrating the raw gene expression pattern!
29. Predicted Pro-hypertrophic Changes
Per Drug Condition
[Figure: normalized hypertrophic response (-0.4 to 0.4) for each phenotypic output under Sorafenib, Sunitinib, and Trastuzumab; the axis is labeled Pro-Hypertrophic and Anti-Hypertrophic]
Sunitinib is the most hypertrophic drug!
Instead of looking at overall gene change, we need to look at how each gene is affected!
30. Sensitivity Analysis of
Hypertrophy Network Model
Hypertrophy Network has:
106 interacting nodes
17 stimuli
7 phenotypic outputs (Serca, aMHC, CellArea, bMHC, BNP, ANP, sACT)
Strategy for simulating the impact of each of the 106 interacting species (Sensitivity Analysis):
Given no stimulus
Vary each node's default parameter by ±10%
Measure the impact of the variation on each of the 7 phenotypic outputs
[Figure: Sensitivity Analysis of 106 Nodes; most nodes show no significant changes]
Only 5 Nodes are Responsible for Structural Changes!
Sensitive nodes: GSK3B, HDAC, SERCA, aMHC, foxo
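The ±10% perturbation strategy can be sketched generically. The code below perturbs each parameter of a model up and down by 10%, re-evaluates the outputs, and reports a normalized sensitivity (relative change in output per relative change in parameter). The two-output `toy_model`, its parameter values, and the 0.05 cutoff for calling a node sensitive are hypothetical stand-ins; the actual hypertrophy network and the significance criterion used in the talk are not reproduced here.

```python
def sensitivity(model, params, rel_change=0.10):
    """Return {parameter: {output: normalized sensitivity}} using +/- rel_change."""
    baseline = model(params)
    result = {}
    for name, value in params.items():
        up = model({**params, name: value * (1 + rel_change)})
        down = model({**params, name: value * (1 - rel_change)})
        result[name] = {
            out: (up[out] - down[out]) / (2 * rel_change * baseline[out])
            for out in baseline
        }
    return result


def toy_model(p):
    """Hypothetical steady-state outputs; NOT the hypertrophy network model."""
    return {
        "CellArea": p["HDAC"] / p["GSK3B"],
        "BNP": 2.0 * p["HDAC"],
    }


params = {"GSK3B": 1.0, "HDAC": 1.0, "nodeX": 1.0}  # nodeX affects no output
S = sensitivity(toy_model, params)

# Call a node sensitive if any output changes by more than the assumed cutoff.
sensitive_nodes = sorted(
    name for name, outs in S.items() if any(abs(v) > 0.05 for v in outs.values())
)
print(sensitive_nodes)
```

In a dynamical model, `model` would run the simulation to steady state before returning the phenotypic outputs, so each node costs two extra simulations.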
31. Cancer Drug Induced Changes in the Sensitive Nodes
Sunitinib-induced gene expression changes in the sensitive nodes show the completely opposite pattern from the other two drugs
Does drug treatment change the sensitivity of a node within the overall network? (i.e., given the drug treatment, will the sensitivity pattern change?)
32. Drug-Specific Sensitivity of 106 Nodes per Phenotypic Output
Sensitive nodes listed in the three panels:
'aMHC' 'foxo' 'HDAC' 'SERCA'
'aMHC' 'ANP' 'bMHC' 'CellArea' 'CREB' 'foxo' 'GATA4' 'GSK3B' 'HDAC' 'NFAT' 'sACT' 'SERCA'
'aMHC' 'foxo' 'HDAC' 'SERCA'
Noticeable Increase in the Number of Sensitive Nodes in Sunitinib-Treated Cells
Currently in the process of:
1. Expanding sensitivity analysis to all drug conditions
2. Integrating sensitivity metrics with hypertrophy index
33. Conclusions and Future Directions
Summary:
Gene expression data were integrated with existing
network-based models to investigate pathophysiological
mechanisms of drug-induced cardiotoxicity.
Simulations were used to show:
Time-dependent changes in intracellular signaling
Stimulus-dependent phenotypic changes
Changes in sensitive nodes in the network
Current Challenges:
Integrating additional network-based dynamical models
EGF-induced signaling
Apoptosis
Comparing drug classes in depth using simulation results
New predictions for which processes/outputs are most
relevant?
34. Acknowledgements
Dr. Eric Sobie Lab
Megan Cummins
Ryan Devenyi
Elisa Nuñez-Acosta
Jingqi Gong
Marc Birtwistle
Ravi Iyengar
Eric Sobie
Evren Azeloglu
Yi-bang Chen
Sunita D'Souza
James Gallo
Milind Mahajan
Christoph Schaniel
Avner Schlessinger
Pedro Martinez
Tina Hu
Priyanka Dhanan
Rick Koch
Gomathi Jayaraman
Jens Hansen
Yuguang Xiong
The Mount Sinai LINCS DSGC team
38. Statistical Computation of Differentially Expressed Genes (DEGs)
Pipeline:
FASTQ file (raw data from sequencer)
Sequence alignment with BWA
QC: eliminate outlier samples
Consolidate and normalize BWA output with edgeR
Using DEGs, generate a list of statistically important cellular pathways
edgeR (trimmed mean of M-values, TMM): normalizes based on a weighted average instead of a median.
edgeR computes statistical significance from the TMM-normalized data and generates DEGs with p-values.
Differentially expressed genes (up/down):
Trastuzumab: 73/28
Ursodeoxycholic acid: 22/28
Combination: 98/43
[Figure: differential expression for Trastuzumab, colored by log2 fold change (scale -4 to 4)]
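Once a differential expression table is available, the up/down tallies reported on this slide are a simple count. The sketch below assumes a table of per-gene log2 fold changes and p-values and a significance cutoff of p < 0.05; the cutoff, the field names, and the example genes are all hypothetical, since the slide does not state the thresholds that were used.

```python
def count_up_down(deg_table, p_cutoff=0.05):
    """Return (n_up, n_down) among genes whose p-value is below the cutoff."""
    significant = [gene for gene in deg_table if gene["p_value"] < p_cutoff]
    n_up = sum(1 for gene in significant if gene["log2_fc"] > 0)
    n_down = sum(1 for gene in significant if gene["log2_fc"] < 0)
    return n_up, n_down


# Hypothetical differential expression results for one drug condition.
deg_table = [
    {"gene": "geneA", "log2_fc": 2.1, "p_value": 0.001},
    {"gene": "geneB", "log2_fc": -1.4, "p_value": 0.020},
    {"gene": "geneC", "log2_fc": 0.3, "p_value": 0.600},  # not significant
    {"gene": "geneD", "log2_fc": 3.0, "p_value": 0.004},
]
n_up, n_down = count_up_down(deg_table)
print(f"{n_up}/{n_down} (up/down)")
```

In practice the p-values would usually be adjusted for multiple testing before the cutoff is applied.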