2. Biological interaction networks
Nodes: genes/proteins or other molecules
Edges based on evidence for interaction
Network types: gene co-expression, protein-protein interaction, genetic interaction
Figure sources: Voineagu et al. 2011, Nature; Breker and Schuldiner 2009
Goal: integrated analysis of different types of networks
3. Integration of networks
Better picture, reduces noise
Traditional approaches:
Look for “conserved” clusters
co-clustering (Hanisch et al. 2002); JointCluster (Narayanan et al. 2011)
Look for clusters with special properties
MATISSE (Ulitsky and Shamir 2008)
4. Analysis of network pairs
Interaction types can differ: within (“positive”) vs. between (“negative”) functional units
Input: networks P, N with same vertex set
Goal: summarize both networks in a module map
Node – module: gene set highly connected in P
Link – two modules highly interconnected in N
Between-pathway models
Kelley and Ideker 2005
Ulitsky et al. 2008
Kelley and Kingsford 2011
Leiserson et al. 2011
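The module-map definition above can be made concrete with a small sketch. It assumes unweighted networks stored as sets of undirected edges. The helper names (`edge`, `density_within`, `density_between`) and the toy data are illustrative and not taken from the cited methods.

```python
from itertools import combinations

def edge(u, v):
    """Canonical key for an undirected edge."""
    return frozenset((u, v))

def density_within(module, P):
    """Fraction of node pairs inside `module` that are connected in P."""
    pairs = list(combinations(sorted(module), 2))
    if not pairs:
        return 0.0
    return sum(edge(u, v) in P for u, v in pairs) / len(pairs)

def density_between(mod_a, mod_b, N):
    """Fraction of cross pairs (one node from each module) connected in N."""
    pairs = [(u, v) for u in mod_a for v in mod_b]
    if not pairs:
        return 0.0
    return sum(edge(u, v) in N for u, v in pairs) / len(pairs)

# Toy input: two networks over the same vertex set {a, b, c, d, e}.
P = {edge("a", "b"), edge("b", "c"), edge("a", "c"), edge("d", "e")}
N = {edge("a", "d"), edge("a", "e"), edge("b", "d"), edge("c", "e")}

# Candidate module map: two nodes (modules) and one candidate link.
modules = {"M1": {"a", "b", "c"}, "M2": {"d", "e"}}
within_m1 = density_within(modules["M1"], P)
between = density_between(modules["M1"], modules["M2"], N)
```

In this toy case both modules are fully connected in P, and four of the six cross pairs are connected in N, so M1 and M2 would be a natural candidate link.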
5. Algorithms
Different definitions for the links and the optimization objective function
The problems are NP-hard
Approximation is also hard (weighted graphs)
Our algorithmic strategy:
Initiators: Find a good initial solution
Improvers: refine by merging/excluding modules
6. Initiators
Cluster P
Hierarchical
Node addition
Find linked module pairs
DICER: local search in P and N (Kelley and Ideker 2005; Amar et al. 2013)
MBC-DICER: find bicliques
Define candidate sets U and V that form bicliques in N
Exhaustive solver (FP-MBC, Li et al. 2007); requires tuning
7. Local Improvement
(DICER algorithm, Amar et al. PLoS CB 2013)
Link: sum of N weights between modules is positive
Goal: enlarge links
Greedy approach
Merge module links or add single nodes to a link
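The "add single nodes" step can be sketched as a greedy loop. This is a simplified illustration, not the DICER implementation: it looks only at the N weights and ignores the P network, and the function names and toy weights are assumptions.

```python
def link_weight(U, V, w):
    """Sum of N weights over all pairs (u, v) with u in U and v in V."""
    return sum(w.get(frozenset((u, v)), 0.0) for u in U for v in V)

def greedy_extend(U, V, candidates, w):
    """Repeatedly add the single node that most increases the link weight.

    Stops when no remaining candidate yields a strictly positive gain.
    """
    U, V = set(U), set(V)
    pool = set(candidates) - U - V
    while pool:
        best_gain, best_node, best_side = 0.0, None, None
        for x in sorted(pool):  # sorted for deterministic tie-breaking
            gain_if_in_u = link_weight({x}, V, w)
            gain_if_in_v = link_weight(U, {x}, w)
            for gain, side in ((gain_if_in_u, "U"), (gain_if_in_v, "V")):
                if gain > best_gain:
                    best_gain, best_node, best_side = gain, x, side
        if best_node is None:
            break
        (U if best_side == "U" else V).add(best_node)
        pool.remove(best_node)
    return U, V

# Toy N weights (hypothetical): x supports the link, y works against it.
w = {
    frozenset(("a", "c")): 1.0,
    frozenset(("b", "c")): 0.5,
    frozenset(("x", "c")): 2.0,
    frozenset(("y", "a")): -1.0,
    frozenset(("y", "c")): -1.0,
}
U_out, V_out = greedy_extend({"a", "b"}, {"c"}, {"x", "y"}, w)
```

Here `x` joins U because it adds weight 2.0 to the link, while `y` is left out because adding it to either side would lower the sum.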
8. Global analysis: node vs. module
Null hypothesis: edges between v and M are drawn randomly (n = deg(v))
Hypergeometric p-value
Options for weighted graphs:
Use the Wilcoxon rank-sum test
Set a threshold and use the same (hypergeometric) test
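A minimal sketch of the node-vs-module test for an unweighted graph. It assumes the graph is given as a dictionary `adj` that maps every node to its set of neighbors. The function names are illustrative, and the tail probability is computed directly with `math.comb` rather than with a statistics library.

```python
from math import comb

def hypergeom_sf(k, total, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(total, successes, draws)."""
    upper = min(successes, draws)
    tail = sum(
        comb(successes, i) * comb(total - successes, draws - i)
        for i in range(max(k, 0), upper + 1)
    )
    return tail / comb(total, draws)

def node_vs_module_pvalue(v, module, adj):
    """p-value for seeing at least the observed number of edges from v into M.

    Null model: the deg(v) neighbors of v are drawn at random from all
    other nodes of the graph.
    """
    others = set(adj) - {v}
    target = set(module) - {v}
    neighbors = set(adj[v]) - {v}
    k = len(neighbors & target)
    return hypergeom_sf(k, len(others), len(target), len(neighbors))

# Toy graph: v has two neighbors and both fall inside M = {a, b}.
adj = {"v": {"a", "b"}, "a": {"v"}, "b": {"v"}, "c": set(), "d": set()}
p = node_vs_module_pvalue("v", {"a", "b"}, adj)
```

With four other nodes, two of them in M, and deg(v) = 2, the chance that both neighbors land in M is 1/6. For weighted graphs, the thresholding option amounts to building `adj` from the edges whose weight passes the chosen cutoff.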
9. Global analysis: module vs. module
Calculate a p-value for each node in V and each node in U
Merge the p-values using Fisher’s method
Under the null hypothesis the combined statistic follows a chi-square distribution (df = 2 × number of p-values)
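A short sketch of the merging step. In its standard form, Fisher's method computes X = -2 Σ ln p_i, which under the null hypothesis (and assuming independent p-values) follows a chi-square distribution with 2k degrees of freedom for k p-values. Because the degrees of freedom are always even, the tail probability has a closed form and no statistics library is needed. The function names are illustrative.

```python
from math import exp, log

def chi2_sf_even_df(x, df):
    """Upper tail P(X >= x) of a chi-square distribution with even df."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # term is (x/2)^i / i!
        total += term
    return exp(-half) * total

def fisher_combine(pvalues):
    """Combine independent p-values into one p-value with Fisher's method."""
    pvalues = list(pvalues)
    if not pvalues:
        raise ValueError("need at least one p-value")
    # Clamp at a tiny positive value so that log(0) cannot occur.
    stat = -2.0 * sum(log(max(p, 1e-300)) for p in pvalues)
    return chi2_sf_even_df(stat, 2 * len(pvalues))

combined = fisher_combine([0.05, 0.05])
```

A single p-value is returned unchanged, which is a quick sanity check of the formula. For the extremely small p-values reported for links, a log-space version would be needed to avoid floating-point underflow.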
10. Global analysis
Given a set of modules M and a set of significant links
L, the solution score:
Improvement steps: merge modules if the score
improves (select the best step iteratively)
Fast and accurate analysis:
Decide when to recalculate p-values
Perform many merges simultaneously
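The improvement loop (merge if the score improves, take the best step each time) can be sketched generically. The solution score itself is not reproduced here, so `score` is a caller-supplied function, and `toy_score` is a hypothetical stand-in used only to make the example runnable. This naive version rescores every candidate merge from scratch, which is the cost the speed-ups listed above are meant to avoid.

```python
from itertools import combinations

def greedy_merge(modules, score):
    """Merge module pairs while the best merge strictly improves the score."""
    modules = [frozenset(m) for m in modules]
    current = score(modules)
    while len(modules) > 1:
        best_score, best_modules = current, None
        for i, j in combinations(range(len(modules)), 2):
            candidate = [m for k, m in enumerate(modules) if k not in (i, j)]
            candidate.append(modules[i] | modules[j])
            s = score(candidate)
            if s > best_score:
                best_score, best_modules = s, candidate
        if best_modules is None:
            break  # no merge improves the score
        modules, current = best_modules, best_score
    return modules, current

def toy_score(modules, edges):
    """Hypothetical score: +1 per connected pair inside a module, -1 otherwise."""
    total = 0
    for m in modules:
        for u, v in combinations(sorted(m), 2):
            total += 1 if frozenset((u, v)) in edges else -1
    return total

# Toy P edges: a triangle {a, b, c} and a separate pair {d, e}.
P_EDGES = {frozenset(p) for p in (("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"))}
start = [{"a"}, {"b"}, {"c"}, {"d"}, {"e"}]
final_modules, final_score = greedy_merge(start, lambda ms: toy_score(ms, P_EDGES))
```

Starting from singletons, the loop recovers the triangle and the pair as two modules and stops, because merging them would add six unconnected pairs.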
14. The yeast module map
Links with p < 10^-50
Chromatin-related hubs, similar to Baryshnikova et al. 2011
15. The top links in the map (p < 10^-70)
Between complexes
Between subcomplexes
16. Comparison to extant methods
Analysis of the Collins et al. 2007 data
Comparing to extant methods that exploit both positive and negative genetic interactions (GIs) and their weights
Algorithm | Number of modules | Gene coverage | Maximal module size | Number of enriched GO terms | Percent enriched modules | Percent enriched links | Number of links
MBC-DICER (Global) | 32 | 238 | 20 | 53 | 84 | 79 | 67
Genecentric (Leiserson et al. 2011) | 116 | 1248 | 25 | 39 | 63 | 43 | 58
Kelley and Kingsford 2011 | 117 | 355 | 17 | 32 | 17 | 6 | 403
17. (2) Arabidopsis PPI & MD networks
P: PPIs. N: metabolic dependencies (Tzfadia et al. 2012)
Discover protein complexes and their metabolic links
18. Using the module map for function prediction
Validated modules by their ability to predict gene functions in MapMan
Function assignment: each gene receives the best assignment of its module
LOOCV: precision and recall > 80%
New predictions
Gene | MapMan term | Module p-value
AT5G48000 | sulfur-containing.glucosinolates | 0.0001
AT5G42590 | sulfur-containing.glucosinolates | 0.0001
AT2G30870 | redox.ascorbate and glutathione.ascorbate | 0.0028
AT4G15440 | isoprenoids.carotenoids | 0.0002
AT1G62830 | isoprenoids.carotenoids | 0.0003
AT4G01690 | isoprenoids.carotenoids | 0.0003
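The leave-one-out validation can be illustrated with a simplified sketch. It swaps in a majority vote over the other annotated genes of the module in place of the enrichment-based "best assignment" used on the slide, and the gene names and terms are hypothetical placeholders.

```python
from collections import Counter

def predict_loocv(modules, annotation):
    """Predict each annotated gene's term from the rest of its module.

    The gene's own label is hidden. The prediction is the most common term
    among the other annotated module members (ties broken alphabetically).
    """
    predictions = {}
    for members in modules:
        for gene in members:
            if gene not in annotation:
                continue
            votes = Counter(
                annotation[h] for h in members if h != gene and h in annotation
            )
            if votes:
                ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
                predictions[gene] = ranked[0][0]
    return predictions

def precision_recall(predictions, annotation):
    """Precision over predicted genes, recall over all annotated genes."""
    correct = sum(predictions[g] == annotation[g] for g in predictions)
    precision = correct / len(predictions) if predictions else 0.0
    recall = correct / len(annotation) if annotation else 0.0
    return precision, recall

# Hypothetical modules and single-term annotations.
modules = [{"g1", "g2", "g3"}, {"g4", "g5"}]
annotation = {"g1": "A", "g2": "A", "g3": "B", "g4": "C", "g5": "C"}
predictions = predict_loocv(modules, annotation)
precision, recall = precision_recall(predictions, annotation)
```

In this toy case four of the five genes are predicted correctly, so precision and recall are both 0.8. Unannotated genes in a module would receive the same module-level prediction, which is how new predictions like those in the table arise.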
19. (3) Human case-control profiles
Data: expression profiles of lung cancer (blood)
P: multi-phenotype co-expression network; N: differential correlation (DC): change in correlation in disease vs. controls
Cross-validation: most links show high DC in the test set
Link example:
Breakage of immune activation in cancer (enrichment q-value < 1E-10)
Enrichment for NSCLC-specific causal miRNA (miR-34 family, p = 0.002, miR2Disease DB)
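The differential correlation used for the N network can be sketched as follows. The exact DC measure is not specified on the slide, so this example assumes the simplest form: Pearson correlation in cases minus Pearson correlation in controls. The function names and expression values are illustrative.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def differential_correlation(gene_a, gene_b, cases, controls):
    """Correlation of two genes in cases minus their correlation in controls."""
    r_cases = pearson(cases[gene_a], cases[gene_b])
    r_controls = pearson(controls[gene_a], controls[gene_b])
    return r_cases - r_controls

# Hypothetical profiles: co-expressed in controls, anti-correlated in cases.
controls = {"g1": [1.0, 2.0, 3.0, 4.0], "g2": [2.0, 4.0, 6.0, 8.0]}
cases = {"g1": [1.0, 2.0, 3.0, 4.0], "g2": [8.0, 6.0, 4.0, 2.0]}
dc = differential_correlation("g1", "g2", cases, controls)
```

A strongly negative value like this one corresponds to a correlation that is present in controls and lost or reversed in disease, the pattern described by the link example above.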
20. Summary
Integration of networks
Considering different interaction types
A summary module-map
Algorithms
Initiators
Improvers
Algorithms perform well in simulations and real data
PPI+GI
PPI+MD
Human disease: correlation and differential correlation
Next steps (?)
Cytoscape app (maybe next year…)
Can we use module maps instead of gene networks for network inference?