The Network Effect: Integrative Systems Approaches to Modeling Biological Processes - John Quackenbush

Two trends are driving innovation and discovery in biological sciences: technologies that allow holistic surveys of genes, proteins, and metabolites, and the growing realization that analysis and interpretation of the resulting data requires an understanding of the complex factors that mediate the link between genotype and phenotype. The growing body of biological and biomedical information, driven by an exponential drop in the cost of generating genomic data, provides an outstanding opportunity for leveraging what we already “know” in a systematic way to understand the problems we are studying. Here, I will provide an overview of some of the methods we are using to investigate the complexities of human phenotypes and to explore how we can use biological data to uncover the cellular networks and pathways that underlie human disease, building predictive models of those networks that may help to direct therapies, with an emphasis on exploring functional pathways in ovarian cancer.

Published in: Health & Medicine

  1. The Network Effect: Integrative Systems Approaches to Modeling Biological Processes. John Quackenbush, AMATA, October 14, 2013
  2. "Essentially, all models are wrong, but some are useful." – George E. Box
  3. "The purpose of models is not to fit the data but to sharpen the questions." – Samuel Karlin
  4. "Every revolution in science, from the Copernican heliocentric model to the rise of statistical and quantum mechanics, from Darwin’s theory of evolution and natural selection to the theory of the gene, has been driven by one and only one thing: access to data." – John Quackenbush
  5. Disease Progression and Personalized Care [figure labels: Birth; Treatment; Natural History of Disease; Clinical Care; Environment + Lifestyle; Outcomes; Treatment Options; Disease Staging; Patient Stratification; Early Detection; Genetic Risk; Biomarkers; Quality of Life; Death]
  6. Networks
  7. Why we care about networks
      • Biological processes are driven not by genes but by networks
      • We want to understand causal relationships in biological systems wherever possible
      • Correlations in gene expression can be considered to be the result of network interactions
      • We want to find networks using available genomic data (largely expression data)
  8. Networks: what we are not talking about
      • Metabolic pathways – KEGG
      • Signal transduction pathways – BioCarta
      • Biochemical pathways – Roche (Boehringer)
      • Transcription factor networks
      • etc.
  9. When we say “Networks” we mean…
      • Genes are represented as “nodes”
      • Interactions are represented by “edges”
      • Edges can be directed to show “causal” interactions
      • Edges are not necessarily direct interactions
  10. Networks as Models
  11. Phenomenology and Models
      • Ultimately, we look to develop a theory that describes the interactions that drive biological systems
      • The embodiment of the resulting theory should be a model describing the interactions we are seeking to understand
      • Phenomenology, or phenomenological models, describe a body of knowledge that relates empirical observations of phenomena to each other, in a way which is consistent with fundamental theory, but is not directly derived from theory
      • The question is not “Is this model right?” Rather, the question is “Is the model useful?”
  12. Subtypes in Ovarian Cancer
  13. 2004 Estimated US Cancer Deaths*
        Men (290,890)                          Women (272,810)
        Lung & bronchus                 32%    Lung & bronchus         25%
        Prostate                        10%    Breast                  15%
        Colon & rectum                  10%    Colon & rectum          10%
        Pancreas                         5%    Ovary                    6%
        Leukemia                         5%    Pancreas                 6%
        Non-Hodgkin lymphoma             4%    Leukemia                 4%
        Esophagus                        4%    Non-Hodgkin lymphoma     3%
        Liver & intrahepatic bile duct   3%    Uterine corpus           3%
        Urinary bladder                  3%    Multiple myeloma         2%
        Kidney                           3%    Brain/ONS                2%
        All other sites                 21%    All other sites         24%
      ONS = Other nervous system. *Source: American Cancer Society, 2004.
  14. A new subtype of ovarian cancer
      • mRNA/miRNA and DNA were extracted from 132 well-annotated FFPE samples and profiled on arrays
      • We used a technique called ISIS to find robust bi-partitions in the data
      • A major, robust subtype was associated with expression of angiogenesis genes
      • We curated all published gene expression data to validate the split and signature
  15. Identifying modules using ISIS*
      • Module: set of genes supporting a bi-partition
      • ISIS searches for stratifications of samples into two groups that maximize a DLD (diagonal linear discriminant) score
      *ISIS: Identifying splits with clear separation (von Heydebreck et al., Bioinformatics 2001)
  16. Angiogenic Subtype
  17. Survival and Validation [figure panels: 1090 high grade, late stage serous tumors; 1606 published ovarian tumors]
  18. miRNA expression supports the subtypes: mir202 is under-expressed in the poor prognosis set, correlating with up-regulation of its putative targets
  19. Beyond Bayesian Networks
  20. What can we learn from networks? [figure panels: Normal Tissue Network; Chemosensitive Tumor; Chemoresistant Tumor]
  21. Regulation of Transcription [figure labels: regulatory sequences; promoter; specific transcription factors]
  22. Another Idea: Message Passing
      • Transcription Factor: the TF is Responsible for communicating with its Target
      • Downstream Target: the Target must be Available to respond to the TF
  23. Application of PANDA to OvCa
      • Downloaded expression data from 510 OvCa patients from TCGA
      • Normalized data using fRMA and mapped probes to Ensembl IDs using biomaRt
      • Assigned subtypes with a Gaussian mixture model fitted using Mclust: 188 angiogenic, 322 non-angiogenic
      • Anecdotal evidence suggests about 1/3 of patients treated with angiogenesis inhibitors respond
      • Used PANDA to map out networks
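
A minimal sketch of the subtype-assignment step on slide 23. The slides name Mclust (an R package) but do not say what was clustered, so the input here is assumed to be a hypothetical one-dimensional angiogenesis signature score per patient. The pure-Python EM routine is a stand-in that illustrates a two-component Gaussian mixture; it is not the Mclust implementation, and `fit_two_gaussians` and the example scores are invented for illustration.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return exp(-((x - mean) ** 2) / (2 * var)) / sqrt(2 * pi * var)


def fit_two_gaussians(scores, n_iter=100, min_var=1e-6):
    """Fit a two-component 1-D Gaussian mixture by EM.

    Returns the component means and a hard label (0 or 1) per score.
    """
    n = len(scores)
    overall_mean = sum(scores) / n
    overall_var = max(sum((s - overall_mean) ** 2 for s in scores) / n, min_var)
    # Start the two components at the extremes of the data.
    means = [min(scores), max(scores)]
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    resp = [[0.5, 0.5] for _ in scores]
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each score.
        resp = []
        for s in scores:
            dens = [weights[k] * normal_pdf(s, means[k], variances[k]) for k in range(2)]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk == 0:
                continue
            weights[k] = nk / n
            means[k] = sum(r[k] * s for r, s in zip(resp, scores)) / nk
            var_k = sum(r[k] * (s - means[k]) ** 2 for r, s in zip(resp, scores)) / nk
            variances[k] = max(var_k, min_var)
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return means, labels


# Hypothetical signature scores: four low-scoring and four high-scoring patients.
example_scores = [0.0, 0.1, -0.1, 0.05, 5.0, 5.1, 4.9, 5.05]
example_means, example_labels = fit_two_gaussians(example_scores)
```

Here label 1 would be read as the high-scoring (for example, angiogenic) group. With real data the number of components and the variance model would normally be chosen by a criterion such as BIC, which this sketch does not attempt.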
  24. PANDA: Integrative Network Models [diagram: Expression data (Angiogenic), a Genes × Conditions matrix, gives the Network for Angiogenic Subtype; Expression data (Non-angiogenic), a Genes × Conditions matrix, gives the Network for Non-angiogenic Subtype; Compare/Identify Differences]
  25. Message-Passing Networks: PANDA [diagram labels: Motif Data; Network0; PPI0; Expression0; Responsibility; Availability; PPI1; Network1; Expression1]
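
A rough sketch of one message-passing step as named on slides 22 and 25. The slides give no formulas, so everything here is an assumption: Responsibility is taken as a Tanimoto-style similarity between a TF's row of the protein-interaction matrix `P` and a gene's column of the regulatory network `W`; Availability is the same similarity between a TF's row of `W` and a gene's column of the co-expression matrix `C`; and the network is nudged toward their average with a hypothetical step size `alpha`. Normalization and the updates of `P` and `C` are omitted.

```python
from math import sqrt


def tanimoto(x, y):
    """Tanimoto-style similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    denom = sqrt(sum(a * a for a in x) + sum(b * b for b in y) - abs(dot))
    return dot / denom if denom > 0 else 0.0


def column(matrix, j):
    """Return column j of a list-of-rows matrix."""
    return [row[j] for row in matrix]


def update_network(W, P, C, alpha=0.1):
    """One illustrative update of the TF-by-gene network W.

    W: TF x gene edge weights; P: TF x TF interaction weights;
    C: gene x gene co-expression weights.
    """
    n_tf, n_gene = len(W), len(W[0])
    new_W = []
    for i in range(n_tf):
        row = []
        for j in range(n_gene):
            # Do the partners of TF i also point at gene j?
            responsibility = tanimoto(P[i], column(W, j))
            # Do the genes co-expressed with gene j share TF i as a regulator?
            availability = tanimoto(W[i], column(C, j))
            row.append((1 - alpha) * W[i][j] + alpha * 0.5 * (responsibility + availability))
        new_W.append(row)
    return new_W


# Toy example: two TFs, two genes, no cross-talk in any input.
W0 = [[0.5, 0.0], [0.0, 0.5]]
P0 = [[1.0, 0.0], [0.0, 1.0]]
C0 = [[1.0, 0.0], [0.0, 1.0]]
W1 = update_network(W0, P0, C0, alpha=0.5)
```

In this toy case the supported diagonal edges gain weight while the unsupported off-diagonal edges stay at zero, which is the qualitative behavior the Responsibility/Availability picture suggests.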
  26. Network Differences are captured in Edges
      • 15735 unique edges, including 49 TFs targeting 4419 genes
      • 12631 unique edges, including 56 TFs targeting 4081 genes
  27. Ten “Key” Transcription Factors [figure columns: TF differential Expression; Target differential Expression; TF differential Methylation; Target differential Methylation]
        TF       Potential connection with angiogenesis                            Publication(s) PMID
        NFKB1    important chromatin remodeler in angiogenesis                     20203265
        ARID3A   required for hematopoietic development                            21199920
        SOX5     involved in prostate cancer progression, responsive to estrogen   19173284, 16636675
        TFAP2A   increases MMP2 expression and angiogenesis in melanoma            11423987
        NKX2-5   regulates heart development                                       10021345
        PRRX2    deletion causes vascular anomalies                                10664157
        AHR      knock-out impairs angiogenesis                                    19617630
        SPIB     inhibits plasma cell differentiation                              18552212
        MZF1     represses MMP-2 in cervical cancer                                22846578
        BRCA1    inhibits VEGF and represses IGF1 in breast cancer                 12400015, 22739988
  28. Complex Regulatory Patterns Emerge
      • "A+/A-" genes targeted and more highly/lowly expressed in angiogenic subtype
      • "A+;N-" genes, or genes targeted in both subnetworks and more highly expressed in angiogenic subtype
      • "N+;A-" genes, or genes targeted in both subnetworks and more highly expressed in non-angiogenic subtype
      • "N-/N+" genes targeted in the non-angiogenic subnetwork but more highly/lowly expressed in angiogenic subtype
  29. [figure legend]
      • Inner ring: key TFs, colored by edge enrichment (A or N)
      • Outer ring: genes, colored by differential expression (A or N)
      • Inter-ring connections: colored by subnetwork (A or N)
      • Ticks: genes annotated to “angiogenesis” in GO
  30. Complex Regulatory Patterns Emerge: Co-regulatory TF Pairs
        TF1      TF2      sig.       #     Class
        ARID3A   PRRX2    1.16E-23   244   A+
        ARID3A   SOX5     1.01E-14   155   A+
        PRRX2    SOX5     3.83E-12   157   A+
        ARNT     MZF1     5.83E-23   92    N-
        AHR      ARNT     6.13E-16   382   N-
        ETS1     MZF1     9.08E-16   148   N-
  31. Regulatory Patterns suggest Therapies
  32. Inhaled Corticosteroids in Asthma [figure panels: Sham; Dex]
  33. Message-Passing Networks: PANDA 2.0 [diagram labels: miRNA targets; Motif Data; PPI0; Methylation; Network0; Genetics; Expression0; Metabolomics; Responsibility; Availability; PPI1; Network1; Expression1]
  34. Generalizing to Individual Patients (Matthew Tung, Kimberly Glass)
      • Edge probabilities for subtypes are an average over individual patients
      • We can generalize this to a weighted sum over patients
      • We can write this as a matrix equation
      • We can then define a matrix of edge weights
      • And a matrix of observed edges
      • We can solve for the edges for each patient/sample
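
The equations on slide 34 did not survive extraction, and the sketch below is not a reconstruction of them. It only illustrates the underlying arithmetic in the simplest case, assuming equal weights: if an aggregate edge weight is the plain average over N patients, then the contribution of patient q can be recovered from the average over all patients and the average over everyone except q. The function name and the toy edge weights are hypothetical.

```python
def recover_sample_edges(avg_all, avg_without_q, n_samples):
    """Recover one sample's edge weights from two equal-weight averages.

    avg_all:        {edge: mean weight over all n_samples}
    avg_without_q:  {edge: mean weight over the other n_samples - 1}
    Uses e_q = N * mean_all - (N - 1) * mean_without_q.
    """
    return {
        edge: n_samples * avg_all[edge] - (n_samples - 1) * avg_without_q[edge]
        for edge in avg_all
    }


# Toy example with three patients and two TF-gene edges (made-up weights).
patients = [
    {("TF_A", "gene_1"): 0.2, ("TF_B", "gene_2"): 0.7},
    {("TF_A", "gene_1"): 0.4, ("TF_B", "gene_2"): 0.5},
    {("TF_A", "gene_1"): 0.9, ("TF_B", "gene_2"): 0.3},
]
edges = list(patients[0])
avg_all = {e: sum(p[e] for p in patients) / 3 for e in edges}
avg_without_last = {e: sum(p[e] for p in patients[:2]) / 2 for e in edges}
recovered = recover_sample_edges(avg_all, avg_without_last, n_samples=3)
```

For true averages the identity is exact, so `recovered` matches the third patient's weights up to floating-point error. When the aggregate network comes from a nonlinear inference method it can only be an approximation, and unequal patient weights would need the weighted, matrix form that the slide describes.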
  35. Yeast Cell Cycle Data: Edges Oscillate (Matthew Tung, Kimberly Glass)
  36. eQTL Networks: A simple idea (Fah Sathirapongsasuti)
      • eQTLs should group together with core SNPs regulating particular cellular functions
      • Perform a “standard eQTL” analysis: Y = β0 + β1·ADD + ε, where Y is the quantitative trait and ADD is the allele dosage of a genotype
      • Create a bipartite graph where SNPs and genes are nodes and significant eQTL associations are edges
      • Use “leading eigenvector” clustering to find “communities” in the graph
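
A small sketch of the first two steps on slide 36: fitting Y = β0 + β1·ADD + ε by ordinary least squares for each SNP-gene pair, then keeping strong associations as edges of a bipartite graph. The slide does not say how significance was assessed, so the hard cutoff on the absolute t-statistic (`t_cutoff`) is a placeholder assumption standing in for a proper p-value or FDR threshold. The SNP and gene identifiers and the data are invented.

```python
from math import inf, sqrt


def fit_eqtl(dosage, expression):
    """Least-squares fit of expression = beta0 + beta1 * dosage.

    Returns (beta0, beta1, t_stat), where t_stat tests beta1 against zero.
    """
    n = len(dosage)
    mean_x = sum(dosage) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    if sxx == 0:
        raise ValueError("SNP has no variation in allele dosage")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, expression))
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x
    rss = sum((y - beta0 - beta1 * x) ** 2 for x, y in zip(dosage, expression))
    se = sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    if se > 0:
        t_stat = beta1 / se
    else:
        t_stat = inf if beta1 > 0 else (-inf if beta1 < 0 else 0.0)
    return beta0, beta1, t_stat


def bipartite_edges(genotypes, expression, t_cutoff):
    """Return (snp, gene) pairs whose |t| exceeds the cutoff."""
    edges = []
    for snp, dosage in genotypes.items():
        for gene, values in expression.items():
            _, _, t_stat = fit_eqtl(dosage, values)
            if abs(t_stat) > t_cutoff:
                edges.append((snp, gene))
    return edges


# Made-up data: six samples, one SNP, two genes.
genotypes = {"snp_a": [0, 1, 2, 0, 1, 2]}
expression = {
    "GENE1": [1.0, 1.6, 2.0, 1.1, 1.5, 2.1],  # tracks dosage
    "GENE2": [1.0, 1.2, 0.9, 1.1, 1.0, 1.1],  # does not
}
eqtl_edges = bipartite_edges(genotypes, expression, t_cutoff=4.0)
```

The resulting edge list is the input a community-detection step would take; covariates and multiple-testing correction, which a real eQTL analysis needs, are left out.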
  37. eQTL Networks: A simple idea (Fah Sathirapongsasuti)
      • Common QTL SNPs regulate common functions
      • The modularity of a network quantifies the extent to which vertices cluster into community groups
      • Bipartite network clustering was done using the leading eigenvector method (Barber 2007, Physical Review E)
      • We assessed functional enrichment for each cluster using the Bioconductor GOstats package, which takes into account the hierarchical structure of GO annotation
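
To make "modularity" on slide 37 concrete, here is a sketch that scores a given community assignment of a SNP-gene bipartite graph. It assumes the usual bipartite form of modularity, Q = (1/m) Σ over SNP i and gene j in the same community of (A_ij − k_i·d_j/m), where m is the number of edges and k_i, d_j are the SNP and gene degrees. It only evaluates Q; the leading-eigenvector search for the best assignment is not implemented, and the node names are invented.

```python
def bipartite_modularity(edges, community):
    """Bipartite modularity of a community assignment.

    edges:     iterable of (snp, gene) pairs
    community: {node: community label} covering every SNP and gene
    """
    edge_set = set(edges)
    m = len(edge_set)
    snp_degree, gene_degree = {}, {}
    for snp, gene in edge_set:
        snp_degree[snp] = snp_degree.get(snp, 0) + 1
        gene_degree[gene] = gene_degree.get(gene, 0) + 1
    total = 0.0
    for snp, k in snp_degree.items():
        for gene, d in gene_degree.items():
            if community[snp] == community[gene]:
                observed = 1.0 if (snp, gene) in edge_set else 0.0
                expected = k * d / m  # null model preserving degrees
                total += observed - expected
    return total / m


# Two SNPs, each linked to its own pair of genes.
toy_edges = [("snp_1", "g1"), ("snp_1", "g2"), ("snp_2", "g3"), ("snp_2", "g4")]
split = {"snp_1": 0, "g1": 0, "g2": 0, "snp_2": 1, "g3": 1, "g4": 1}
lumped = {node: 0 for node in split}
q_split = bipartite_modularity(toy_edges, split)
q_lumped = bipartite_modularity(toy_edges, lumped)
```

Splitting the toy graph into its two natural groups scores higher than lumping everything into one community, which is the property a clustering method exploits when it searches for the assignment that maximizes Q.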
  38. eQTL Networks: A simple idea
  39. Genomics is here to stay
  40. "Before I came here I was confused about this subject. After listening to your lecture, I am still confused but at a higher level." – Enrico Fermi (1901–1954)
  41. Acknowledgments
      • Array Software Hit Team: Eleanor Howe, John Quackenbush, Dan Schlauch
      • Gene Expression Team: Fieda Abderazzaq, Stefan Bentink, Aedin Culhane, Benjamin Haibe-Kains, Jessica Mar, Melissa Merritt, Megha Padi, Renee Rubio
      • Center for Cancer Computational Biology (http://cccb.dfci.harvard.edu): Mick Correll, Victor Chistyakov, Dustin Holloway, Lan Hui, Lev Kuznetsov, Niall O'Connor, Jerry Papenhausen, Yaoyu Wang, John Quackenbush
      • (Former) Stellar Students: Martin Aryee, Kaveh Maghsoudi, Jess Mar
      • Systems Support: Stas Alekseev, Sys Admin
      • Administrative Support: Joan Coraccio, Julianna Coraccio
      • University of Queensland: Christine Wells, Lizzy Mason
      <johnq@jimmy.harvard.edu> http://compbio.dfci.harvard.edu
