This document summarizes a presentation given by Florian Markowetz on computational methods for analyzing large-scale gene perturbation screens. It discusses how new technologies have generated vast amounts of genomic data and introduces functional genomics approaches that perturb genes on a genome-wide scale to understand biological systems and pathways. Specific methods described include using high-dimensional phenotypes from screens to infer pathway features through nested effects models and probabilistic data integration. The document provides examples of applying these approaches to study the immune response in Drosophila and embryonic stem cell fate regulation.
Analyzing Large-Scale Gene Perturbation Screens Using Computational Methods
1. Computational methods to analyze large-scale and high-dimensional gene perturbation screens
McGill University, Montreal, March 26, 2008
Florian Markowetz
florian@genomics.princeton.edu
http://genomics.princeton.edu/~florian
Lewis-Sigler Institute for Integrative Genomics
Princeton University
2. A wealth of data
New technologies over the last 10–15 years have allowed
genome-wide measurements of . . .
Genome sequences
Gene and protein expression
Protein-Protein interactions
Transcription factor binding
Tissue/cellular localization
Histone modifications
DNA methylation
Figure from fig.cox.miami.edu
Florian Markowetz, Analyzing Phenotyping Screens, March 26, 2008 1
3. How to understand a complex system?
Richard Feynman:
“What I cannot create, I do not understand.”
Functional Genomics:
“What I cannot break, I do not understand.”
5. External perturbations
[Figure: external perturbations (drugs, small molecules, RNAi, stress, knockout) shown against DNA, mRNA, and protein]
6. One- or Low-dimensional Phenotypes
viability or cell death
growth rate
activity of reporter genes
[Figure: plot of size against time]
Example: Finding regulators of Nanog in mouse ES cells
8. Members of Swi/Snf-complex regulate Nanog
Schaniel C, et al., submitted to Nature
Smarcc1 binding targets from ChIP-chip data
Functional targets from microarray after Smarcc1 knockdown
[Figure: sequence logo of the optimized motif; y-axis in bits (0 to 2), positions 1 to 9]
11. Phenotyping screens: what to observe?
One-dimensional Phenotypes:
identify candidate genes on a genome-wide scale
first step for follow-up analysis
hard to relate to specific gene function and pathways
High-dimensional Phenotypes:
13. The information gap
Direct information: effects are visible at other pathway components
Indirect information: effects are only visible at 'downstream reporters'
[Figure: two pathway diagrams, each with components A, B, C, D and a question mark]
- Cell survival or death
- Growth rate
- downstream genes
15. Bridging the information gap
1. Nested Effects Models: pathway features can be inferred from high-dimensional phenotypes.
2. Probabilistic data integration: Physical interactions between proteins must explain perturbation effects.
3. Comprehensive Phenotypes: Dissecting cell fate regulation in mESC by probing four levels of gene regulation.
18. 1. Nested Effects Models
[Figure: pathway of unknown structure ("?") over components A, B, C, D, with high-dimensional phenotypes]
19. Immune response in Drosophila
Response to microbial challenge
(Boutros et al., Dev Cell, 2002)
Columns: silenced genes.
Rows: effects on other genes.
Figures courtesy of Michael Boutros
Results:
1. Silencing tak1 reduces expression of all LPS-inducible transcripts.
2. Silencing rel (key) or mkk4/hep reduces expression of separate sets of induced transcripts.
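The two results describe a subset pattern: the transcripts affected by silencing rel or mkk4/hep each fall inside the set affected by silencing tak1. A minimal Python sketch of that check follows. The transcript identifiers t1 to t6 are invented placeholders for illustration, not data from Boutros et al., and the function name is hypothetical.

```python
# Toy effect sets; transcript identifiers are hypothetical placeholders,
# not measurements from the screen.
effects = {
    "tak1": {"t1", "t2", "t3", "t4", "t5", "t6"},
    "rel": {"t1", "t2", "t3"},
    "mkk4/hep": {"t4", "t5", "t6"},
}


def nested_pairs(effects):
    """Return sorted (a, b) pairs where the effects of silencing b
    form a proper subset of the effects of silencing a."""
    return sorted(
        (a, b)
        for a in effects
        for b in effects
        if a != b and effects[b] < effects[a]
    )


pairs = nested_pairs(effects)
print(pairs)
```

A subset relation of this kind is the signal a nested effects model reads as "tak1 acts upstream of rel and of mkk4/hep", while the disjoint sets for rel and mkk4/hep give no ordering between those two.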
22. Nested Effects Models
Markowetz et al. (2005, 2007), Tresch and Markowetz (2008)
23. NEM: Model formulation
Pathway genes: X, Y, Z
• core topology
• to be reconstructed
= Model M
Effects: E1, . . . , E6
• states are observed = Data D
• connection to pathway unknown = Parameters θ
Likelihood P(D | M) given false positive and false negative rates
Markowetz et al., 2005
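One way to make the likelihood concrete is the sketch below. It rests on assumptions not stated on the slide: the data are binary (effect seen or not seen), each effect attaches to exactly one pathway gene, and the unknown attachment θ is averaged out under a uniform prior. The function name, the matrix layout, and the example values for the false positive rate `alpha` and false negative rate `beta` are all illustrative choices, not the authors' implementation.

```python
import numpy as np


def nem_log_likelihood(D, M, alpha, beta):
    """Log marginal likelihood log P(D | M) for binary effect data.

    D: (n_effects, n_genes) 0/1 array; D[i, k] = 1 if effect i was
       observed when pathway gene k was perturbed.
    M: (n_genes, n_genes) 0/1 array; M[k, j] = 1 if perturbing gene k
       is predicted to disturb gene j (diagonal set to 1).
    alpha: assumed false positive rate; beta: assumed false negative rate.
    """
    D = np.asarray(D)
    M = np.asarray(M)
    # Log-probability of observing 1 or 0, given the model's prediction.
    log_p1 = np.where(M == 1, np.log(1 - beta), np.log(alpha))
    log_p0 = np.where(M == 1, np.log(beta), np.log(1 - alpha))
    # score[i, j] = log P(row i of D | effect i attached to gene j)
    score = D @ log_p1 + (1 - D) @ log_p0
    # Average over the attachment position with a uniform prior.
    n_genes = M.shape[0]
    per_effect = np.logaddexp.reduce(score, axis=1) - np.log(n_genes)
    return float(per_effect.sum())


# Toy example: chain X -> Y -> Z (transitively closed), two effects per gene.
chain = np.array([[1, 1, 1],
                  [0, 1, 1],
                  [0, 0, 1]])
D = np.array([[1, 0, 0], [1, 0, 0],   # effects attached to X
              [1, 1, 0], [1, 1, 0],   # effects attached to Y
              [1, 1, 1], [1, 1, 1]])  # effects attached to Z
ll_chain = nem_log_likelihood(D, chain, alpha=0.05, beta=0.1)
ll_reversed = nem_log_likelihood(D, chain.T, alpha=0.05, beta=0.1)
print(ll_chain > ll_reversed)
```

In this toy setting the chain that generated the data scores higher than the reversed chain, which is the comparison a model search relies on.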
24. NEM: Inference
Exhaustive enumeration: score all subset patterns to find the one fitting the data best
Markowetz et al. Bioinformatics, 2005
MCMC, Simulated Annealing: take small probabilistic steps to explore model space
. . . with A Tresch; in preparation, 2008
Divide and conquer: break a big model into smaller, manageable pieces and then re-assemble
Markowetz et al. ISMB 2007
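The first strategy, exhaustive enumeration, can be sketched for a three-gene pathway as follows. This is an illustration under assumptions rather than the published procedure: candidate models are taken to be transitively closed graphs with self-loops, the score is a binary-noise marginal likelihood with a uniform prior over effect attachment, and all function names and error rates are hypothetical.

```python
import itertools
import numpy as np


def log_lik(D, M, alpha=0.05, beta=0.1):
    """Assumed score: log P(D | M) for binary data, averaging over
    where each effect attaches (uniform prior)."""
    log_p1 = np.where(M == 1, np.log(1 - beta), np.log(alpha))
    log_p0 = np.where(M == 1, np.log(beta), np.log(1 - alpha))
    score = D @ log_p1 + (1 - D) @ log_p0
    per_effect = np.logaddexp.reduce(score, axis=1) - np.log(M.shape[0])
    return float(per_effect.sum())


def is_transitive(M):
    """With 1s on the diagonal, M is transitively closed exactly when
    two-step paths add no new edges."""
    return np.array_equal((M @ M > 0).astype(int), M)


def enumerate_models(n):
    """Yield every transitively closed n x n adjacency matrix with self-loops."""
    off_diag = [(i, j) for i in range(n) for j in range(n) if i != j]
    for bits in itertools.product([0, 1], repeat=len(off_diag)):
        M = np.eye(n, dtype=int)
        for (i, j), b in zip(off_diag, bits):
            M[i, j] = b
        if is_transitive(M):
            yield M


def best_model(D):
    """Score every candidate model and return the highest-scoring one."""
    return max(enumerate_models(D.shape[1]), key=lambda M: log_lik(D, M))


# Toy data generated from the chain X -> Y -> Z, two effects per gene.
D = np.array([[1, 0, 0], [1, 0, 0],
              [1, 1, 0], [1, 1, 0],
              [1, 1, 1], [1, 1, 1]])
n_models = sum(1 for _ in enumerate_models(3))
best = best_model(D)
print(n_models)
print(best)
```

The number of candidates grows very quickly with the number of pathway genes, which is why the slide also lists stochastic search and divide-and-conquer strategies.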
25. Extensions to NEMs
Drop the transitivity requirement
Likelihood based on log-ratios of effects
Tresch and Markowetz (2008)