More Related Content Similar to Presentation july 31_2015 Similar to Presentation july 31_2015 (20) Presentation july 31_20152. Use cell line data to better match cancer patients and
therapies
3. Cell lines are a rich source of drug sensitivity data
5280 Unique Drug - Cell line
Sensitivity Values
Predict
675 Human cancer cell lines with
RNAseq Gene Expression
5. Ridge regression keeps only a few significant genes
in model
Drug Sensitivity
Linear Model
Ridge Regression
6. Automatic Relevance Determnation Regression tunes
model sparsity from data
Drug Sensitivity
Linear Model
Ridge Regression
Automatic Relevance
Determination Regression
7. Start with only the most highly varying genes
8000 of the highest varying genes in the Genentech data were chosen
as features to reduce noise and for computational tractability
Drug Sensitivity
Linear Model
Ridge Regression
Automatic Relevance
Determination Regression
8. Docetaxel model predicts clinical response in breast
cancer patients
False Positive Rate
0.75 ROC AUC
7/8 patients will be
sensitive
True Positive
Rate
9. False Positive Rate
0.75 ROC AUC
91% of sensitive patients will recieve treatment
64% of resistant patients will avoid unnecessary treatment
True Positive
Rate
Docetaxel model predicts clinical response in
breast cancer patients
10. Model predicts Gemcitabine response in TCGA lung
cancer data
Gemcitabine
logIC50
With Tumor Tumor Free
p<5% by Wilcoxon Rank-Sum Test 0.80 ROC AUC
Top 4 predictions are all sensitive
11. About Me
©2014NatureAmerica,Inc.Allrightsreserved.
T E C H N I C A L R E P O RT S
Functional interpretation of genomic variation is critical
to understanding human disease, but it remains difficult to
predict the effects of specific mutations on protein interaction
networks and the phenotypes they regulate. We describe an
analytical framework based on multiscale statistical mechanics
that integrates genomic and biophysical data to model the
human SH2-phosphoprotein network in normal and cancer
cells. We apply our approach to data in The Cancer Genome
Atlas (TCGA) and test model predictions experimentally.
We find that mutations mapping to phosphoproteins often
create new interactions but that mutations altering SH2
domains result almost exclusively in loss of interactions.
Some of these mutations eliminate all interactions, but many
cause more selective loss, thereby rewiring specific edges
in highly connected subnetworks. Moreover, idiosyncratic
mutations appear to be as functionally consequential as
recurrent mutations. By synthesizing genomic, structural and
biochemical data, our framework represents a new approach to
the interpretation of genetic variation.
TCGA and similar projects have generated extensive data on the muta -
tional landscape of tumors1. To understand the functional consequences
of these mutations, it is necessary to ascertain how they alter the pro-
tein-protein interaction (PPI) networks involved in regulating cellular
Low-throughput methods such as fluorescence polarization spec-
troscopy provide precise interaction data on a few dozen PIDs and
ligands but cannot easily be scaled to the full proteome2, whereas
high-throughput array-based methods provide greater scale but
suffer from systematic artifacts and high false positive and false
negative rates, resulting in data sets that only partly agree2. An
additional challenge is modeling the effects of mutations for pro-
teins with multiple binding domains and/or multiple sites of phos-
phorylation9, a reality for most signaling proteins (for example, the
CRK oncoprotein). Existing methods are either limited to individual
domains10–13 or are insufficiently precise to discern the effects of single-
residue changes14,15.
The MSM framework we have developed combines genomic, bind-
ing and structural data and reconciles inconsistencies within and
among data sets to generate PID networks for normal and cancer
cells. We develop a bottom-up first-principles approach, involving a
single mathematical equation based on statistical mechanical ensem-
bles, that models domains, proteins and networks, and we then apply
this approach to the analysis of SH2 networks and mutations found
in TCGA16. We validate newly predicted interactions experimentally
and demonstrate the sensitivity of MSM to single-residue mutations
that cause subtle changes in binding affinity. Our analysis provides
mechanistic insights into an important PID cancer network and vali-
dates a computational approach to PID networks that can be applied
A multiscale statistical mechanical framework integrates
biophysical and genomic data to assemble cancer networks
Mohammed AlQuraishi1–3, Grigoriy Koytiger1,3, Anne Jenney1, Gavin MacBeath2 & Peter K Sorger1
PTPN11 (SHP2)GRB2
STAT5A
STAT6
STAT1
STAT4
STAT2
STAT3STA
P1TENC1
TNS1
TNS3
TNS4
RIN
1
SH2D5
SH2B1
SH2B2
SH2B3
SHC1
SHC3
PLCG1
PLCG2
BCAR3
SH2D3C
SH2D3A
FER
FES
CRKCRKLMATKHSH2DSH2D2A
SHBSHFSHDSHEFGRYES1
SRC
H
C
K
LCK
LYN
BLK
SLA2
PTK6
PTPN6
PTPN11
ABL1
GRAP
GRB2
NCK1
NCK2
PIK3R1
PIK3R3
PIK3R22NHC
ITK
TXK
BTK
TEC
BMX
SYK
ZAP70
GRB7
GRB14
GRB10
SH2D1A
SH2D1B
IN
PPL1
BLNK
LCP2
DAPP1VAV1VAV3VAV2RASA1SH3BP2
JAK2
JAK3
SUPT6H
CBL
ANKS1A
ANKS1B
GULP1
NUMB
NUMBL
DAB1
DAB2
APBA1
APBA2
APBA3
CCM
2
D
O
K
1
DO
K2
DO
K4
DOK6
DOK5FRS3IRS1
IRS4
APBB1APBB2APBB3
SHC2
APPL1
EPS8L2
STAT5A
STAT6
STAT1
STAT4
STAT2
STAT3STA
P1TENC1
TNS1
TNS3
TNS4
RIN
1
SH2D5
SH2B1
SH2B2
SH2B3
SHC1
SHC3
PLCG1
PLCG2
BCAR3
SH2D3C
SH2D3A
FER
FES
CRKCRKLMATKHSH2DSH2D2A
SHBSHFSHDSHEFGRYES1
SRC
H
C
K
LCK
LYN
BLK
SLA2
PTK6
PTPN6
PTPN11
ABL1
GRAP
GRB2
NCK1
NCK2
PIK3R1
PIK3R3
PIK3R2CHN2
ITK
TXK
BTK
TEC
BMX
SYK
ZAP70
GRB7
GRB14
GRB10
SH2D1A
SH2D1B
IN
PPL1
BLNK
LCP2
DAPP1VAV1VAV3VAV2RASA1SH3BP2
JAK2
JAK3
SUPT6H
CBL
ANKS1A
ANKS1B
GULP1
NUMB
NUMBL
DAB1
DAB2
APBA1
APBA2
APBA3
CCM
2
D
O
K
1
DO
K2
DO
K4
DOK6
DOK5FRS3IRS1
IRS4
APBB1APBB2APBB3
SHC2
APPL1
EPS8L2
Ph.D. Harvard University
Department of Chemistry and Chemical Biology
Postdoc Harvard Medical School
Department of Systems Biology
Management Consultant
Dean & Co.