Stephen Friend Institute for Cancer Research 2011-11-01

Actionable Cancer Network Models
And Open Medical Information Systems

Integrating layers of omics data models and compute spaces

Stephen Friend MD PhD

Sage Bionetworks (Non-Profit Organization)
Seattle/ Beijing/ Amsterdam
ICR Oslo
November 1, 2011

Why not use data intensive science
to build models of disease

Current Reward Structures

Organizational Structures and Tools

Pilots

Opportunities

What is the problem?
•  Regulatory hurdles too high?
•  Low hanging fruit picked?
•  Payers unwilling to pay?
•  Genome has not delivered?
•  Valley of death?
•  Companies not large enough to execute on strategy?
•  Internal research costs too high?
•  Clinical trials in developed countries too expensive?

In fact, all are true but none is the real problem

What
is
the
problem?

We
need
to
rebuild
the
drug
discovery
process
so
that
we

be6er
understand
disease
biology
before
tes8ng

proprietary
compounds
on
sick
pa8ents

What
is
the
problem?

Most
approved
cancer
therapies
assumed
tumor

indica8ons
would
represent
homogenous
popula8ons

Most
new
cancer
therapies
are
in
search
of
single
altered

components

Our
exis8ng
tumor
models
o>en
assume
pathway

knowledge
suﬃcinet
to
infer
correct
therapies

Personalized Medicine 101:
Capturing Single bases pair mutations = ID of responders

The value of appropriate representations/ maps

“Data Intensive” Science- Fourth Scientific Paradigm

Equipment capable of generating
massive amounts of data

IT Interoperability

Open Information System

Host evolving computational models
in a “Compute Space”

WHY
NOT
USE

“DATA
INTENSIVE”
SCIENCE

TO
BUILD
BETTER
DISEASE
MAPS?

what will it take to understand disease?

DNA

RNA
PROTEIN
(dark
maGer)

MOVING
BEYOND
ALTERED
COMPONENT
LISTS

2002 Can one build a “causal” model?

How is genomic data used to understand biology?
RNA amplification

Tumors
Microarray hybirdization

Tumors
Gene Index

Standard GWAS Approaches Profiling Approaches
Identifies Causative DNA Variation but Genome scale profiling provide correlates of disease
provides NO mechanism   Many examples BUT what is cause and effect?

  Provide unbiased view of
molecular physiology as it
relates to disease phenotypes
trait
  Insights on mechanism
  Provide causal relationships
and allows predictions

20
Integrated Genetics Approaches

Integration of Genotypic, Gene Expression & Trait Data
Schadt et al. Nature Genetics 37: 710 (2005)
Millstein et al. BMC Genetics 10: 23 (2009)

Causal Inference

“Global Coherent Datasets”
•  population based
•  100s-1000s individuals

Chen et al. Nature 452:429 (2008) Zhu et al. Cytogenet Genome Res. 105:363 (2004)
Zhang & Horvath. Stat.Appl.Genet.Mol.Biol. 4: article 17 (2005) Zhu et al. PLoS Comput. Biol. 3: e69 (2007)

Gene Co-Expression Network Analysis

Define a Gene Co-expression Similarity

Define a Family of Adjacency Functions

Determine the AF Parameters

Define a Measure of Node Distance

Identify Network Modules (Clustering)

Relate the Network Concepts to
External Gene or Sample Information 22
Zhang B, Horvath S. Stat Appl Genet Mol Biol 2005

Constructing Co-expression Networks

Start with expression measures for genes most variant genes across 100s ++ samples

1 2 3 4 Note: NOT a gene
expression heatmap
1

1 0.8 0.2 -0.8
Establish a 2D correlation matrix 2
for all gene pairs
expression

0.8 1 0.1 -0.6
3

0.2 0.1 1 -0.1
4

-0.8 -0.6 -0.1 1
Brain sample
Correlation Matrix
Define Threshold
eg >0.6 for edge

1 2 4 3 1 2 3 4
1 1
1 4 1 1 1 0 1 1 0 1
2 2
1 1 1 0 1 1 0 1
1 1 1 0 Hierarchically 3
Identify modules 4 0 0 1 0
2 3 cluster
4
3 0 0 0 1 1 1 0 1
Network Module Clustered Connection Matrix Connection Matrix
sets of genes for which many
pairs interact (relative to the
total number of pairs in that
set)

Preliminary Probabalistic Models- Rosetta /Schadt

Networks facilitate direct
identification of genes that are
causal for disease
Evolutionarily tolerated weak spots

Gene symbol Gene name Variance of OFPM Mouse Source
explained by gene model
expression*
Zfp90 Zinc finger protein 90 68% tg Constructed using BAC transgenics
Gas7 Growth arrest specific 7 68% tg Constructed using BAC transgenics
Gpx3 Glutathione peroxidase 3 61% tg Provided by Prof. Oleg
Mirochnitchenko (University of
Medicine and Dentistry at New
Jersey, NJ) [12]

Lactb Lactamase beta 52% tg Constructed using BAC transgenics
Me1 Malic enzyme 1 52% ko Naturally occurring KO
Gyk Glycerol kinase 46% ko Provided by Dr. Katrina Dipple
(UCLA) [13]
Lpl Lipoprotein lipase 46% ko Provided by Dr. Ira Goldberg
(Columbia University, NY) [11]
C3ar1 Complement component 46% ko Purchased from Deltagen, CA
3a receptor 1
Tgfbr2 Transforming growth 39% ko Purchased from Deltagen, CA
Nat Genet (2005) 205:370 factor beta receptor 2

Our ability to integrate compound data into our network analyses

db/db mouse
(p~10E(-30))

= up regulated
= down regulated
db/db mouse
(p~10E(-20)
p~10E(-100))

AVANDIA in db/db mouse

Genomic
Literature
Protein-‐Protein
Complexes

Transcriptiona
l

Signaling

THE EVOLUTION OF SYSTEMS BIOLOGY

Mol.
Proﬁles
Structure

Model
Evolution

Disease
Models

Model
Topology

Physiologic
/

Model
Dynamics
Pathologic

Phenotype
Regulation

Extensive Publications now Substantiating Scientific Approach
Probabilistic Causal Bionetwork Models
• >80 Publications from Rosetta Genetics

Metabolic "Genetics of gene expression surveyed in maize, mouse and man." Nature. (2003)
Disease "Variations in DNA elucidate molecular networks that cause disease." Nature. (2008)
"Genetics of gene expression and its effect on disease." Nature. (2008)
"Validation of candidate causal genes for obesity that affect..." Nat Genet. (2009)
….. Plus 10 additional papers in Genome Research, PLoS Genetics, PLoS Comp.Biology, etc
CVD "Identification of pathways for atherosclerosis." Circ Res. (2007)
"Mapping the genetic architecture of gene expression in human liver." PLoS Biol. (2008)
…… Plus 5 additional papers in Genome Res., Genomics, Mamm.Genome

Bone "Integrating genotypic and expression data …for bone traits…" Nat Genet. (2005)
d
..approach to identify candidate genes regulating BMD…" J Bone Miner Res. (2009)
Methods "An integrative genomics approach to infer causal associations ... Nat Genet. (2005)
"Increasing the power to detect causal associations… PLoS Comput Biol. (2007)
"Integrating large-scale functional genomic data ..." Nat Genet. (2008)
…… Plus 3 additional papers in PLoS Genet., BMC Genet.

List of Influential Papers in Network Modeling

  50 network papers
  http://sagebase.org/research/resources.php

“Data Intensive” Science- Fourth Scientific Paradigm
Score Card for Medical Sciences

Equipment capable of generating
massive amounts of data A-

IT Interoperability D

Open Information System D-

Host evolving computational models
in a “Compute Space F

We still consider much clinical research as if we we
hunter gathers - not sharing
.

TENURE

FEUDAL
STATES

Clinical/genomic data
are accessible but minimally usable

Little incentive to annotate and curate
data for other scientists to use

Mathematical
models of disease
are not built to be
reproduced or
versioned by others

Assumption that genetic alterations
in human conditions should be owned

Lack of standard forms for sharing data
and lack of forms for future rights and consentss

Publication Bias- Where can we find the (negative) clinical data?

sharing as an adoption of common standards..
Clinical Genomics Privacy IP

Sage Mission
Sage Bionetworks is a non-profit organization with a vision to
create a commons where integrative bionetworks are evolved by
contributor scientists with a shared vision to accelerate the
elimination of human disease

Building Disease Maps Data Repository

Commons Pilots Discovery Platform
Sagebase.org

Sage Bionetworks Collaborators

  Pharma Partners
  Merck, Pfizer, Takeda, Astra Zeneca,
Amgen, Johnson &Johnson
  Foundations
  Kauffman CHDI, Gates Foundation

  Government
  NIH, LSDF

  Academic
  Levy (Framingham)
  Rosengren (Lund)
  Krauss (CHORI)

  Federation
  Ideker, Califarno, Butte, Schadt 40

NEW MAPS
Disease Map and Tool Users-
( Scientists, Industry, Foundations, Regulators...)

PLATFORM
Sage Platform and Infrastructure Builders-
( Academic Biotech and Industry IT Partners...)

PILOTS= PROJECTS FOR COMMONS
Data Sharing Commons Pilots-
(Federation, CCSB, Inspire2Live....)

NEW TOOLS
ORM
APS

Data Tool and Disease Map Generators-
(Global coherent data sets, Cytoscape,
M

F
PLAT

Clinical Trialists, Industrial Trialists, CROs…)
NEW

RULES GOVERN RULES AND GOVERNANCE
Data Sharing Barrier Breakers-
(Patients Advocates, Governance
and Policy Makers, Funders...)

Bin Zhang
Integration of Multiple Networks for Jun Zhu
Pathway and Target Identification

CNV
Data
Gene
Expression
Clinical
Traits
Co-Expression
Bayesian Network
Network
Integration of Coexp. &
Bayesian Networks 42

Key Driver Analysis
42

Bin Zhang
Key Driver Analysis Jun Zhu
Justin Guinney

http://sagebase.org/research/tools.php 43

Bin Zhang
Model of Breast Cancer: Co-expression Xudong Dai
Jun Zhu
A) Miller 159 samples B) Christos 189 samples
NKI: N Engl J Med. 2002 Dec 19;347(25):1999.

Wang: Lancet. 2005 Feb 19-25;365(9460):671.

Miller: Breast Cancer Res. 2005;7(6):R953.

Christos: J Natl Cancer Inst. 2006 15;98(4):262.

C) NKI 295 samples

E) Super modules

Cell cycle

Pre-mRNA

ECM
D) Wang 286 samples Blood vessel

Immune
response

44
Zhang B et al., Towards a global picture of breast cancer (manuscript).

Bin Zhang
Model of Breast Cancer: Integration Xudong Dai
Jun Zhu
Conserved Super-modules

mRNA proc.
= predictive
Breast Cancer Bayesian Network

Chromatin
of survival

Extract gene:gene relationships for selected super-modules from BN and define Key Drivers

Pathways & Regulators
(Key drivers=yellow; key drivers validated in siRNA screen=green)
Cell Cycle (Blue) Chromatin Modification (Black) Pre-mRNA proc. (Brown) mRNA proc. (red)

45
Zhang B et al., Key Driver Analysis in Gene Networks (manuscript)

Developing predictive models of genotype specific sensitivity
to Perturbations- Margolin

Predictive model

Developing predictive models of genotype-specific
sensitivity to compound treatment

Gene8c
Feature
Matrix

Expression,
copy
number,

somaQc
mutaQons,
etc.

Predic8ve
Features

(biomarkers)

Cancer
samples
with
varying

degrees
of
response
to
therapy

Sensi8ve
Refractory

(e.g.
EC50)

47

1
20
100
500

Feature
Features
Features
Features

Elastic net regression

48

Bootstrapping retains robust predictive features

49

Our approach identifies mutations in genes upstream of MEK
as top predictors of sensitivity to MEK inhibition

#3
Mut
NRAS

!"#$% &"#$% #1
Mut
BRAF

PD-‐0325901

'"#(%
#312
Mut
NRAS

)*!+,-% #./0-11%
2/345-674+%

#9
Mut
BRAF

50

PD-‐0325901

For 11/12 compounds, the #1 predictive feature in an unbiased
analysis corresponds to the known stratifier of sensitivity
#2
CML
lineage

CML lineage
#1
EGFR
mut

EGFR mut

#1
EGFR
mut

EGFR mut

#1
CML
lineage

#1
EGFR
mut

CML linage
EGFR mut

#1
ERBB2
expr

ERBB2 expr

How
accurate
would
predic8ve

#1
BRAF
mut

models
perform
for
diagnos8cs?

BRAF mut

#1
HGF
expr

HGF expr
#2
NRAS
mut
NRAS mut

BRAF mut
#1
BRAF
mut

#3
KRAS
mut

KRAS mut

#2
NRAS
mut

NRAS mut
BRAF mut

#1
BRAF
mut

#3
KRAS
mut

KRAS mut

#2
NRAS
mut

NRAS mut
BRAF mut

#1
BRAF
mut

#2
TP53
mut

TP53 mut

#3
CDKN2A
copy

CDKN2A copy

#1
MDM2
expr

MDM2 expr

51

Why not share clinical /genomic data and model building in the
ways currently used by the software industry
(power of tracking workflows and versioning

Leveraging Existing Technologies

Addama

Taverna
tranSMART

INTEROPERABILITY
SYNAPSE

Genome Pattern
CYTOSCAPE
tranSMART
I2B2
INTEROPERABILITY

sage bionetworks synapse project
Watch What I Do, Not What I Say Reduce, Reuse, Recycle

My Other Computer is Amazon
Most of the People You Need to Work with
Don’t Work with You

Six
Pilots
at
Sage
Bionetworks

CTCAP

Non-‐Responders

Arch2POCM

ORM
S
MAP
The
FederaQon

F
PLAT
Portable
Legal
Consent

NEW
Sage
Congress
Project
RULES GOVERN

CTCAP

Clinical Trial Comparator Arm Partnership “CTCAP”
Strategic Opportunities For Regulatory Science
Leadership and Action

FDA
September 27, 2011

Clinical Trial Comparator Arm
Partnership (CTCAP)
  Description: Collate, Annotate, Curate and Host Clinical Trial Data
with Genomic Information from the Comparator Arms of Industry and
Foundation Sponsored Clinical Trials: Building a Site for Sharing
Data and Models to evolve better Disease Maps.
  Public-Private Partnership of leading pharmaceutical companies,
clinical trial groups and researchers.
  Neutral Conveners: Sage Bionetworks and Genetic Alliance
[nonprofits].
  Initiative to share existing trial data (molecular and clinical) from
non-proprietary comparator and placebo arms to create powerful
new tool for drug development.

Started Sept 2010

Shared clinical/genomic data sharing and analysis will
maximize clinical impact and enable discovery

•  Graphic
of
curated
to
qced
to
models

Non-‐Responders
Project

To identify Non-Responders to approved
Oncology drug regimens in order to improve
outcomes, spare patients unnecessary toxicities
from treatments that have no benefit to them, and
reduce healthcare costs

The
Non-‐Responder
Cancer
Project
Leadership
Team

Stephen Friend, MD, PhD
Todd Golub, MD
Founding Director Cancer Biology
President and Co-Founder of
Program Broad Institute, Charles Dana
Sage Bionetworks, Head of
Investigator Dana-Farber Cancer
Merck Oncology 01-08,
Institute, Professor of Pediatrics Harvard
Founder of Rosetta
Medical School, Investigator, Howard
Inpharmatics 97-01, co-
Hughes Medical Institute
Founder of the Seattle Project

Richard Schilsky, MD
Garry Nolan, PhD Chief, Hematology- Oncology, Deputy
Professor, Baxter Laboratory of Stem Director, Comprehensive Cancer
Cell Biology, Department of Microbiology Center, University of Chicago; Chair,
and Immunology, Stanford University National Cancer Institute Board of
Director, Proteomics Center at Stanford Scientific Advisors; past-President
University
ASCO, past Chairman CALGB clinical
trials group

11

The
Non-‐Responder
Project
is
an
internaQonal
iniQaQve
with
funding

for
6
iniQal
cancers
anQcipated
from
both
the
public
and
private
sectors

GEOGRAPHY
United
States
China

TARGET

CANCER
Ovarian

Renal
Breast
AML
Colon
Lung

RetrospecQve

FUNDING

SOURCE

Seeking
private
sector
and
study;
likely
to

Funded
by
the
Chinese
government

philanthropic
funding
for
be
funded
by

and
private
sector
partners

the
Federal

prospec8ve
studies
Government

5

Arch2POCM

Restructuring
the
PrecompeQQve

Space
for
Drug
Discovery

How
to
potenQally
De-‐Risk

High-‐Risk
TherapeuQc
Areas

How can we accelerate the pace of scientific discovery?
2008
2009
2010
2011

Ways to move beyond
“traditional” collaborations?

Intra-lab vs Inter-lab
Communication

Colrain/ Industrial PPPs Academic
Unions

human aging:
predicting bioage using whole blood methylation

•  Independent training (n=170) and validation (n=123) Caucasian cohorts
•  450k Illumina methylation array
•  Exom sequencing
•  Clinical phenotypes: Type II diabetes, BMI, gender…

Training Cohort: San Diego (n=170) Validation Cohort: Utah (n=123)
RMSE=3.35 RMSE=5.44

100
100

! ! !!
!
! !
!! ! !
! !
! !
! ! ! ! !! !
! ! ! !!!!
!! ! !
! ! ! !! !
! !! !
! !!
! ! !! !

80
Biological Age

Biological Age
!! !
!!!! !!
! !! !
! ! ! !!! !
80

! !!!
!! ! ! ! !
!!
! !
! !
! !! !
! ! ! !!!!
!!! !
!!!
!!
! !! !!!!
! !!! !! !! ! ! !
! ! ! !
! !
! !!! !
! !! !
! !!
! !!
!! ! !! !
!! !
! !!!! ! !
! ! !! !
!! !
! ! !! !
! !!!
! ! ! !!! ! !!
!!
!! !
!! ! !
! !
!! ! !
!
! !!
!! ! ! !
!

60
!!! ! ! !
! !! !
! ! !
! !! !
60

! ! ! !!
!! ! ! !
! !
!!
! !!!!
! ! !
! ! !! !
! !!
! ! ! !!
! ! !
! ! ! 40
40

! !

40 50 60 70 80 90 100 40 50 60 70 80 90

Chronological Age Chronological Age

sage federation:
model of biological age

Faster Aging
Predicted
Age
(liver
expression)

Slower Aging

Clinical Association
-  Gender
-  BMI
-  Disease
Age Differential Genotype Association
Gene Pathway Expression

Chronological
Age
(years)

Reproducible
science==shareable
science

Sweave: combines programmatic analysis with narrative

Dynamic generation of statistical reports
using literate data analysis

Sweave.Friedrich Leisch. Sweave: Dynamic generation of statistical reports
using literate data analysis. In Wolfgang Härdle and Bernd Rönz,editors, Compstat 2002 –
Proceedings in Computational Statistics,pages 575-580.
Physica Verlag, Heidelberg, 2002. ISBN 3-7908-1517-9

Federated
Aging
Project
:

Combining
analysis
+
narraQve

=Sweave Vignette
Sage Lab
R code + PDF(plots + text + code snippets)
narrative
HTML

Data objects

Califano Lab Ideker Lab Submitted
Paper

Shared
Data
JIRA:
Source
code
repository
&
wiki

Repository

Portable
Legal
Consent

(AcQvaQng
PaQents)

John
Wilbanks

Sage
Congress
Project

April
20
2012

RA

Parkinson’s

Asthma

(Responders
CompeQQons)

Why not use data intensive science
to build models of disease

Current Reward Structures

Organizational Structures and Tools

Six Pilots

Opportunities

Actionable Cancer Network Models
And Open Medical Information Systems

IMPACT
ON
PATIENTS

Stephen Friend Institute for Cancer Research 2011-11-01

Recommended

Recommended

More Related Content

What's hot

What's hot (11)

Similar to Stephen Friend Institute for Cancer Research 2011-11-01

Similar to Stephen Friend Institute for Cancer Research 2011-11-01 (20)

More from Sage Base

More from Sage Base (12)

Recently uploaded

Recently uploaded (20)

Stephen Friend Institute for Cancer Research 2011-11-01