Stephen Friend Molecular Imaging Program at Stanford (MIPS) 2011-08-15

Use of Bionetworks to Build Maps of Diseases

Stephen Friend MD PhD

Sage Bionetworks (Non-Profit Organization)
Seattle/ Beijing/ San Francisco

MIPS Seminar Series
August 15th, 2011

why consider the fourth paradigm- data intensive science

thinking beyond the narrative, beyond pathways

advantages of an open innovation compute space

it is more about how than what

Treating Symptoms vs. Modifying Diseases
Alzheimer’s, Diabetes, Cancer, Obesity
Will it work for me?

WHY NOT USE
“DATA INTENSIVE” SCIENCE
TO BUILD BETTER DISEASE MAPS?

“Data Intensive Science”- “Fourth Scientific Paradigm”
For building: “Better Maps of Human Disease”

Equipment capable of generating
massive amounts of data

IT Interoperability

Open Information System
Evolving Models hosted in a
Compute Space- Knowledge Expert

It is now possible to carry out comprehensive
monitoring of many traits at the population level
Monitor disease and molecular traits in
populations

Putative causal gene

Disease trait

what will it take to understand disease?

DNA RNA PROTEIN (dark matter)

MOVING BEYOND ALTERED COMPONENT LISTS

2002 Can one build a “causal” model?

How is genomic data used to understand biology?
RNA amplification

Tumors
Microarray hybridization

Tumors
Gene Index

"Standard" GWAS Approaches
Identifies causative DNA variation but provides NO mechanism

Profiling Approaches
Genome-scale profiling provides correlates of disease
Many examples BUT what is cause and effect?

"Integrated" Genetics Approaches
  Provide unbiased view of molecular physiology as it relates to disease phenotypes
  Insights on mechanism
  Provide causal relationships and allow predictions

Integration of Genotypic, Gene Expression & Trait Data
Schadt et al. Nature Genetics 37: 710 (2005)
Millstein et al. BMC Genetics 10: 23 (2009)

Causal Inference

“Global Coherent Datasets”
•  population based
•  100s-1000s individuals

Chen et al. Nature 452:429 (2008) Zhu et al. Cytogenet Genome Res. 105:363 (2004)
Zhang & Horvath. Stat.Appl.Genet.Mol.Biol. 4: article 17 (2005) Zhu et al. PLoS Comput. Biol. 3: e69 (2007)

Constructing Co-expression Networks

Start with expression measures for the most variant genes across 100s++ samples (brain samples in this example)
Note: NOT a gene expression heatmap

Establish a 2D correlation matrix for all gene pairs

Correlation Matrix
       1     2     3     4
1      1    0.8   0.2  -0.8
2     0.8    1    0.1  -0.6
3     0.2   0.1    1   -0.1
4    -0.8  -0.6  -0.1    1

Define Threshold, e.g. >0.6 for edge

Connection Matrix
     1  2  3  4
1    1  1  0  1
2    1  1  0  1
3    0  0  1  0
4    1  1  0  1

Hierarchically cluster

Clustered Connection Matrix
     1  2  4  3
1    1  1  1  0
2    1  1  1  0
4    1  1  1  0
3    0  0  0  1

Identify modules

Network Module: sets of genes for which many pairs interact (relative to the total number of pairs in that set)
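The steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used for the published networks: the function names `build_adjacency` and `find_modules` are hypothetical, the threshold is applied to the absolute correlation and treated as inclusive (the slide's connection matrix links genes 2 and 4 at a correlation of -0.6), and connected components stand in for hierarchical clustering.

```python
# Illustrative sketch; function names and the inclusive |r| threshold are assumptions.

# Correlation matrix for genes 1-4, as shown on the slide.
CORR = [
    [1.0, 0.8, 0.2, -0.8],
    [0.8, 1.0, 0.1, -0.6],
    [0.2, 0.1, 1.0, -0.1],
    [-0.8, -0.6, -0.1, 1.0],
]


def build_adjacency(corr, threshold=0.6):
    """Return a binary connection matrix: 1 where |r| reaches the threshold."""
    n = len(corr)
    return [
        [1 if abs(corr[i][j]) >= threshold else 0 for j in range(n)]
        for i in range(n)
    ]


def find_modules(adj):
    """Group genes into connected components (a stand-in for clustering)."""
    n = len(adj)
    seen = set()
    modules = []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], []
        while stack:
            node = stack.pop()
            members.append(node + 1)  # report 1-based gene labels
            for neighbour in range(n):
                if adj[node][neighbour] and neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        modules.append(sorted(members))
    return modules


adjacency = build_adjacency(CORR)
modules = find_modules(adjacency)
print(adjacency)
print(modules)
```

On the four-gene example, genes 1, 2 and 4 fall into one module and gene 3 stands alone, which matches the clustered connection matrix on the slide. Real data sets would need a proper clustering step and a threshold chosen from the data.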

Preliminary Probabilistic Models- Rosetta /Schadt

Networks facilitate direct identification of
genes that are causal for disease
Evolutionarily tolerated weak spots

Gene symbol  Gene name                                    Variance of OFPM explained  Mouse  Source
                                                          by gene expression*         model
Zfp90        Zinc finger protein 90                       68%                         tg     Constructed using BAC transgenics
Gas7         Growth arrest specific 7                     68%                         tg     Constructed using BAC transgenics
Gpx3         Glutathione peroxidase 3                     61%                         tg     Provided by Prof. Oleg Mirochnitchenko (University of Medicine and Dentistry at New Jersey, NJ) [12]
Lactb        Lactamase beta                               52%                         tg     Constructed using BAC transgenics
Me1          Malic enzyme 1                               52%                         ko     Naturally occurring KO
Gyk          Glycerol kinase                              46%                         ko     Provided by Dr. Katrina Dipple (UCLA) [13]
Lpl          Lipoprotein lipase                           46%                         ko     Provided by Dr. Ira Goldberg (Columbia University, NY) [11]
C3ar1        Complement component 3a receptor 1           46%                         ko     Purchased from Deltagen, CA
Tgfbr2       Transforming growth factor beta receptor 2   39%                         ko     Purchased from Deltagen, CA

Nat Genet (2005) 205:370

List of Influential Papers in Network Modeling

  50 network papers
  http://sagebase.org/research/resources.php

Recognition that the benefits of bionetwork based molecular
models of diseases are powerful but that they require
significant resources

Appreciation that it will require decades of evolving
representations as real complexity emerges and needs to be
integrated with therapeutic interventions

Sage Mission
Sage Bionetworks is a non-profit organization with a vision to
create a commons where integrative bionetworks are evolved by
contributor scientists with a shared vision to accelerate the
elimination of human disease

Building Disease Maps Data Repository

Commons Pilots Discovery Platform
Sagebase.org

Sage Bionetworks Collaborators

  Pharma Partners
  Merck, Pfizer, Takeda, AstraZeneca, Amgen, Johnson & Johnson
  Foundations
  Kauffman, CHDI, Gates Foundation

  Government
  NIH, LSDF

  Academic
  Levy (Framingham)
  Rosengren (Lund)
  Krauss (CHORI)

  Federation
  Ideker, Califano, Butte, Schadt

Engaging Communities of Interest
NEW MAPS
Disease Map and Tool Users-
( Scientists, Industry, Foundations, Regulators...)

PLATFORM
Sage Platform and Infrastructure Builders-
( Academic Biotech and Industry IT Partners...)

RULES AND GOVERNANCE
Data Sharing Barrier Breakers-
(Patients Advocates, Governance and Policy Makers, Funders...)

NEW TOOLS
Data Tool and Disease Map Generators-
(Global coherent data sets, Cytoscape, Clinical Trialists, Industrial Trialists, CROs…)

PILOTS= PROJECTS FOR COMMONS
Data Sharing Commons Pilots-
(Federation, CCSB, Inspire2Live....)

[Diagram: Platform, Commons, Research]

Research: Cancer, Neurological Disease, Metabolic Disease
Building Disease Maps: Pfizer, Merck, Takeda, AstraZeneca, CHDI, Gates, NIH
Data Repository: Curation/Annotation, CTCAP, Public Data, Merck Data, TCGA/ICGC
Commons Pilots: Outposts, Federation, CCSB, LSDF-WPP, Inspire2Live, POC
Discovery Platform: Hosting Data, Hosting Tools, Hosting Models (LSDF)
Tools & Methods: Bayesian Models, Co-expression Models, KDA/GSVA

Example 1: Breast Cancer
Coexpression Networks
Module combination

Partition BN

Bayesian Network

Survival Analysis

Zhang B et al., manuscript

Generation of Co-expression & Bayesian Networks from
published Breast Cancer Studies

4 Public Breast Cancer Datasets

NKI (295 samples): van de Vijver et al. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med. 2002 Dec 19;347(25):1999-2009.

Wang (286 samples): Wang Y et al. Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet. 2005 Feb 19-25;365(9460):671-9.

Miller (159 samples): Pawitan Y et al. Gene expression profiling spares early breast cancer patients from adjuvant therapy: derived and validated in two population-based cohorts. Breast Cancer Res. 2005;7(6):R953-64.

Christos (189 samples): Sotiriou C et al. Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. J Natl Cancer Inst. 2006 Feb 15;98(4):262-72.

Recovery of EGFR and Her2 oncoproteins
downstream pathways by super modules

Comparison of Super-modules with EGFR and Her2
signaling and resistance pathways

Key Driver Analysis
•  Identify key regulators for a list of genes h and a network N
•  Check the enrichment of h in the downstream of each node in N
•  The nodes significantly enriched for h are the candidate drivers
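A minimal sketch of the three steps above, assuming a directed network stored as a dictionary of child lists and a one-sided hypergeometric test for enrichment. The names `downstream`, `hypergeom_tail`, `key_drivers` and the toy network are hypothetical; the slide does not say which enrichment statistic, neighbourhood depth or multiple-testing correction the actual analysis uses, so none of those choices should be read as the published method.

```python
from math import comb

# Toy directed network (hypothetical): parent -> list of children.
TOY_NETWORK = {
    "A": ["B", "C", "D"],
    "E": ["F", "G", "H"],
}


def downstream(network, node):
    """Return every node reachable from `node` along directed edges."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for child in network.get(current, ()):
            if child not in seen and child != node:
                seen.add(child)
                stack.append(child)
    return seen


def hypergeom_tail(overlap, drawn, hits, population):
    """P(X >= overlap) when `drawn` nodes are sampled from `population`
    nodes of which `hits` belong to the gene list."""
    upper = min(drawn, hits)
    favourable = sum(
        comb(hits, k) * comb(population - hits, drawn - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / comb(population, drawn)


def key_drivers(network, gene_list, alpha=0.05):
    """Map each candidate driver to its enrichment p-value.

    Assumption: no multiple-testing correction is applied in this sketch.
    """
    nodes = set(network) | {c for kids in network.values() for c in kids}
    h = set(gene_list) & nodes
    results = {}
    for node in sorted(nodes):
        down = downstream(network, node)
        if not down:
            continue  # leaf nodes cannot be drivers
        overlap = len(down & h)
        p_value = hypergeom_tail(overlap, len(down), len(h), len(nodes))
        if p_value < alpha:
            results[node] = p_value
    return results


drivers = key_drivers(TOY_NETWORK, ["B", "C", "D"])
print(drivers)
```

In the toy network only node A is reported, because all three of its downstream nodes are in the gene list h while none of E's are. A real analysis would work on networks with thousands of nodes and would need to correct for the number of nodes tested.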


A) Cell Cycle (blue) B) Chromatin modification (black)

C) Pre-mRNA Processing (brown) D) mRNA Processing (red)

Global driver
Global driver & RNAi
validation


Signaling between Super Modules

(View Poster presented by Bin Zhang)

Example 2. The Sage Non-Responder Project in Cancer

Purpose:
•  To identify Non-Responders to approved drug regimens so we can improve outcomes, spare patients unnecessary toxicities from treatments that have no benefit to them, and reduce healthcare costs

Leadership:
•  Co-Chairs Stephen Friend, Todd Golub, Charles Sawyers & Rich Schilsky

Initial Studies:
•  AML (at first relapse)
•  Non-Small Cell Lung Cancer
•  Ovarian Cancer (at first relapse)
•  Breast Cancer
•  Renal Cell
•  Multiple Myeloma

Sage Bionetworks • Non-Responder Project

Model of Alzheimer’s Disease
Bin Zhang, Jun Zhu

[Figure: three panels comparing AD and normal networks; labeled module: Cell cycle]
http://sage.fhcrc.org/downloads/downloads.php

New Type II Diabetes Disease Models
Anders Rosengren

Global expression data from 64 human islet donors

Blue module: 3000 genes
Associated with
•  Type 2 diabetes
•  Elevated HbA1c
•  Reduced insulin secretion

340 genes in islet-specific open chromatin regions

168 overlapping genes, which have
•  Higher connectivity
•  Markedly stronger association with Type 2 diabetes, elevated HbA1c and reduced insulin secretion
•  Enrichment for beta-cell transcription factors and exocytotic proteins

New Type II Diabetes Disease Models
Anders Rosengren

•  Search across 1300 datasets in MetaGEO at Sage for similar expression profiles
Top hit: Islet dedifferentiation study where the 168 genes were upregulated in
mature islets and downregulated in dedifferentiated islets (Kutlu et al., Phys Gen 2009)

•  Analyses of expression-SNPs and clinical SNPs as well as Causal Inference Test

•  Identification of candidate key genes affecting beta-cell differentiation and chromatin

Working hypothesis:

Normal beta-cell: open chromatin in islet-specific regions,
high expression of beta-cell transcription factors,
differentiated beta-cells and normal insulin secretion

Diabetic beta-cell: lower expression of beta-cell transcription
factors affecting the identified module, dedifferentiation,
reduced insulin secretion and hyperglycemia

Next steps: Validation of hypothesis and suggested key genes in human islets

Clinical Trial Comparator Arm
Partnership (CTCAP)
  Description: Collate, Annotate, Curate and Host Clinical Trial Data
with Genomic Information from the Comparator Arms of Industry and
Foundation Sponsored Clinical Trials: Building a Site for Sharing
Data and Models to evolve better Disease Maps.
  Public-Private Partnership of leading pharmaceutical companies,
clinical trial groups and researchers.
  Neutral Conveners: Sage Bionetworks and Genetic Alliance
[nonprofits].
  Initiative to share existing trial data (molecular and clinical) from
non-proprietary comparator and placebo arms to create powerful
new tool for drug development.

Examples: The Sage Federation

•  Founding Lab Groups

–  Seattle- Sage Bionetworks
–  New York- Columbia: Andrea Califano
–  Palo Alto- Stanford: Atul Butte
–  San Diego- UCSD: Trey Ideker
–  San Francisco: UCSF/Sage: Eric Schadt

•  Initial Projects
–  Aging
–  Diabetes
–  Warburg

•  Goals: Share all datasets, tools, models
Develop interoperability for human data

THE FEDERATION
Butte Califano Friend Ideker Schadt

vs

Federation's Genome-wide Network and
Modeling Approach

Human Aging Project
Califano group at Columbia, Sage Bionetworks, Butte group at Stanford

Data: Brain A (n=363), Brain B (n=145), Brain C (n=400), Blood A (n=~1000), Blood B (n=~1000), Adipose (n=~700)
Transformations: Interactome, TF Activity Profile, Gene Set / Pathway Variation Analysis
Machine Learning: Elastic Net, Network Prior Models, Tree Classifiers
Output: Age Model

Deriving Master Regulators from Transcription Factors
Regulatory Networks Glycolysis & Glycogenesis Metabolism Pathway

Inferring Prostate Cancer Regulatory Modules for Glycolysis & Glycogenesis Metabolism Pathway
Sage Bionetworks approach

1.  Prostate cancer global coherent data set (GSE21032). Taylor BS. et al (2010) Cancer Cell 18(1):11-22
2.  Integrated Bayesian Approach. Zhu J. et al (2008) Nature Genetics 40(7):854-61
3.  Inferred Transcriptional Regulatory Network in Prostate Cancer, combined with the Glycolysis and Glycogenesis Metabolism Gene Set (GGMSE)
4.  Prostate Cancer Regulatory Modules for GGMSE and Other Metabolism Pathways. Duarte N. et al (2006) PNAS 107(6):1777-1782
5.  Cox Proportional-Hazards Regression model based on individual gene for recurrence free survival
6.  Metabolism pathways with regulatory modules enriched by poor prognosis genes for prostate cancer

Genes Associated with Poor Prognosis are disproportionately found among the networks regulating the "glycolysis" Genes

[Figure: two panels, "Inferred regulatory module for GGMSE" and "Inferred regulatory module for Oxidative Phosphorylation and Sphingolipid Metabolism genes". Legend: P-Value<0.005; size of the node proportional to -log10 P value for recurrence free survival.]

>5 fold enrichment of recurrence free prognostic genes within the Glycolysis BN module compared with random selection (p<1e-100)

Federated Aging Project:
Combining analysis + narrative = Sweave Vignette

Sage Lab, Califano Lab, Ideker Lab
R code + narrative
Outputs: PDF (plots + text + code snippets), HTML, Data objects
Submitted Paper

Shared Data Repository
JIRA: Source code repository & wiki

Why not share clinical/genomic data and model building in the
ways currently used by the software industry
(power of tracking workflows and versioning)?

Synapse as a GitHub for building models of disease

Evolution of a Software Project

Biology Tools Support Collaboration

Potential Supporting Technologies

Addama

Taverna
tranSMART

Platform for Modeling

SYNAPSE

Eight Projects Initiated in last year


Group D
LEGAL STACK-ENABLING PATIENTS: John Wilbanks

Arch2POCM

Restructuring Drug Discovery

Absurdity of Current R&D Ecosystem
•  $200B per year in biomedical and drug discovery R&D
•  Handful of new medicines approved each year
•  Productivity in steady decline since 1950
•  90% of novel drugs entering clinical trials fail
•  NIH and EU just started spending billions to duplicate process

•  Significant pharma revenues going off patent in next 5 years
•  >30,000 pharma employees fired in each of last four years
•  Number of R&D sites in Europe down from 29 to 16 since 2009

What is the problem?
•  Regulatory hurdles too high?
•  Low hanging fruit picked?
•  Payers unwilling to pay?
•  Genome has not delivered?
•  Valley of death?
•  Companies not large enough to execute on strategy?
•  Internal research costs too high?
•  Clinical trials in developed countries too expensive?

In fact, all are true but none is the real problem

•  The current system is designed as if every new program is destined to
deliver an approved drug

•  Past 20 years prove this assumption wrong (again and again)

•  Why do promising early results rarely translate into approved drugs?

•  Bottom line: we have poor understanding of biology

•  Lack of early-data sharing within closed information systems dooms
drug discovery to frequent, avoidable failure


We need to rebuild the drug discovery process so that we
better understand disease biology before testing proprietary
compounds on sick patients

The solution – Arch2POCM
1.  Create an Archipelago of clinicians and scientists from public
and private sectors to take projects from ideas to Proof of
Clinical Mechanism (POCM)

2.  Arch2POCM is a collaborative, data-sharing network of
scientists, whose drug discovery objective is to use robust
compounds against new targets to disentangle the complexity
of human biology, not to create a medicine

3.  Success?
•  A compound that provides proof of concept for a novel target,
allowing companies to use this common information to compete,
with dramatically increased chances of success
•  Culling targets with doomed mechanisms before multiple companies
waste money exploring them - at $50M a pop

Why data sharing through to Phase IIb?
•  Most rapidly reveals limitations and opportunities associated with the
target

•  Increases probability of success for internal proprietary programs

•  Scientific decisions are not influenced by market considerations or
biased internal thinking

•  Target mechanism is only properly tested at Phase IIb

Why no IP on “Common Stream” compounds?
•  Allows multiple groups to test diverse indications without funds
from Arch2POCM- crowdsourcing drug discovery

•  Broader and faster data dissemination

•  Far fewer legal agreements to negotiate

•  Generates “freedom to operate” on target because there are
no patent thickets to wade through

•  Efficient way to access world’s top scientists and doctors
without hassle

Existing Team Ready to Execute

First major milestones
2013- First Compound in clinical trials

2014- Go and No-Go Decisions from common stream of targets driving
Proprietary Programs

2014- Full complement of target programs activated

2014- Core Clinical Programs joined by crowdsourced clinical trials

OPPORTUNITIES FOR MIPS COMMUNITY

Data sets, Tools and Models

Joining Synapse Communities

Joining Federation Projects

Joining Arch2POCM

Change reward structures for sharing data
(patients and academics)
