NetBioSIG2013-Talk David Amar

David Amar
School of Computer Science
Tel Aviv University
July 2013

Biological interaction networks
 Nodes: genes/proteins or other molecules
 Edges based on evidence for interaction
[Figure: example networks. Panels: gene co-expression, protein-protein interaction, genetic interaction. Credits: Voineagu et al. 2011 (Nature); Breker and Schuldiner 2009]
Goal: Integrated analysis of different types of networks

Integration of networks
 Better picture, reduces noise
 Traditional approaches:
 Look for “conserved” clusters
 Co-clustering (Hanisch et al. 2002); JointCluster (Narayanan et al. 2011)
 Look for clusters with special properties
 MATISSE (Ulitsky and Shamir 2008)

Analysis of network pairs
 Interaction types can differ: within (“positive”) vs. between (“negative”) functional units
 Input: networks P, N with same vertex set
 Goal: summarize both networks in a module map
 Node – module: gene set highly connected in P
 Link – two modules highly interconnected in N
 Between-pathway models: Kelley and Ideker 2005; Ulitsky et al. 2008; Kelley and Kingsford 2011; Leiserson et al. 2011
[Figure: networks P and N]

Algorithms
 Different definitions for the links and the optimization objective function
 Problems are NP-hard
 Approximation is also hard (weighted graphs)
 Our algorithmic strategy:
 Initiators: Find a good initial solution
 Improvers: refine by merging/excluding modules

Initiators
 Cluster P
 Hierarchical
 Node addition
 Find linked module pairs
 DICER: local search in P and N (Kelley and Ideker 2005; Amar et al. 2013)
 MBC-DICER: find bicliques
 Define candidate sets U and V that form bicliques in N
 Exhaustive solver (FP-MBC, Li et al. 2007); requires tuning

Local Improvement
(DICER algorithm, Amar et al. PLoS CB 2013)
 Link: sum of N weights between modules is positive
 Goal: enlarge links
 Greedy approach
 Merge module links or add single nodes to a link
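The link definition above (sum of N weights between two modules is positive) and the greedy single-node step can be sketched as follows. This is a minimal illustration and not the DICER implementation: the dictionary-based representation of N and the names `n_weight`, `link_weight`, `is_link`, and `greedy_extend` are assumptions made for this example, and the sketch ignores the connectivity of the modules in P.

```python
from itertools import product

def n_weight(N, u, v):
    """Weight of the pair (u, v) in N; pairs missing from N count as 0 (assumption)."""
    return N.get(frozenset((u, v)), 0.0)

def link_weight(N, A, B):
    """Sum of N weights over all pairs with one node in module A and one in module B."""
    return sum(n_weight(N, a, b) for a, b in product(A, B))

def is_link(N, A, B):
    """Two modules are linked when their total N weight is positive."""
    return link_weight(N, A, B) > 0

def greedy_extend(N, A, B, candidates):
    """Greedily add single nodes to A while each addition increases the link weight to B."""
    A = set(A)
    remaining = set(candidates) - A - set(B)
    while remaining:
        # The gain of adding node c to A is its total N weight towards B.
        best = max(remaining, key=lambda c: sum(n_weight(N, c, b) for b in B))
        gain = sum(n_weight(N, best, b) for b in B)
        if gain <= 0:
            break
        A.add(best)
        remaining.remove(best)
    return A
```

Because B is fixed during the extension, the gain of each candidate does not depend on the order of additions, so the result is the set of candidates with positive total weight towards B.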

Global analysis: node vs. module
 Null hypothesis: edges between v and M are drawn randomly (n = deg(v))
 Hypergeometric p-value
 Options for weighted graphs:
 Use Wilcoxon rank-sum test
 Set a weight threshold and use the hypergeometric test
[Figure: node v and its edges to M and to the nodes not in M]
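A hypergeometric p-value for a node against a module can be sketched as below, for the unweighted case. The function name `hypergeom_pvalue` and its parameters are assumptions for this example: `total` is the number of candidate neighbours of v, `in_module` is how many of them lie in M, `degree` is deg(v), and `observed` is the number of edges from v into M.

```python
from math import comb

def hypergeom_pvalue(total, in_module, degree, observed):
    """P(X >= observed) when `degree` neighbours are drawn without replacement
    from `total` candidate nodes, of which `in_module` belong to M."""
    denom = comb(total, degree)
    upper = min(degree, in_module)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(in_module, k) * comb(total - in_module, degree - k)
        for k in range(observed, upper + 1)
    )
    return tail / denom
```

For example, `hypergeom_pvalue(10, 3, 2, 1)` is the probability that a node with 2 random neighbours among 10 candidates hits a 3-node module at least once.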

Global analysis: module vs. module
 Calculate a p-value for each node in V and each node in U
 Merge p-values using Fisher’s method
 Under the null hypothesis, the statistic follows a chi-square distribution (dfs = 2 × number of p-values)
[Figure: modules U and V, and other nodes]
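Merging p-values with Fisher's method can be sketched as follows. In the standard formulation the statistic is X = -2 Σ ln p_i, which follows a chi-square distribution with 2k degrees of freedom for k independent p-values. The function name `fisher_combined_pvalue` is an assumption for this example, and the sketch uses the closed-form chi-square tail for even degrees of freedom so that it needs only the standard library.

```python
from math import exp, log

def fisher_combined_pvalue(pvalues):
    """Combine independent p-values (each in (0, 1]) with Fisher's method."""
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    # Fisher's statistic is X = -2 * sum(ln p_i); we keep X / 2 for convenience.
    half_x = -sum(log(p) for p in pvalues)
    # Chi-square survival function for even df = 2k:
    # sf(X) = exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return exp(-half_x) * total
```

With a single p-value the function returns that p-value unchanged, which is a quick sanity check on the degrees of freedom.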

Global analysis
 Given a set of modules M and a set of significant links
L, the solution score:
 Improvement steps: merge modules if the score improves (select the best step iteratively)
 Fast and accurate analysis:
 Decide when to recalculate p-values
 Perform many merges simultaneously
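The iterative "select the best step" loop can be sketched generically as below. The solution score itself is not reproduced here, so `score` is a caller-supplied placeholder, and `greedy_merge` is a hypothetical name. The sketch rescores every candidate merge from scratch and applies one merge per iteration, so it does not include the speed-ups listed above.

```python
from itertools import combinations

def greedy_merge(modules, score):
    """Repeatedly apply the single merge that most improves `score`.

    `modules` is a list of node sets; `score` maps a list of modules to a
    number (a placeholder for the solution score, higher is better).
    """
    modules = [frozenset(m) for m in modules]
    current = score(modules)
    while len(modules) > 1:
        best_score, best_state = current, None
        for i, j in combinations(range(len(modules)), 2):
            # Candidate solution: all other modules plus the union of i and j.
            merged = [m for t, m in enumerate(modules) if t not in (i, j)]
            merged.append(modules[i] | modules[j])
            s = score(merged)
            if s > best_score:
                best_score, best_state = s, merged
        if best_state is None:  # no merge improves the score
            break
        modules, current = best_state, best_score
    return modules, current
```

A toy score such as `lambda ms: -sum((len(m) - 2) ** 2 for m in ms)` makes the loop merge singletons into pairs and then stop.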

(0) Simulations
 Graphs with 500 nodes, edge weight 1, non-edge weight -1
 Plant a tree map with 6 modules (module size 10-20)
 Add random Gaussian noise (mean 0, SD = 1.2), additional modules, bi-cliques
[Figure: Jaccard score (axis from 0 to 1) for Global, Local, and Initiator only]
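The simulation setup and the Jaccard score can be sketched as below. This is an illustration under assumptions, not the code used for the talk: `noisy_weights` and `jaccard` are hypothetical names, the sketch omits the planted tree map and the extra modules and bi-cliques, and comparing a recovered module with a planted one by Jaccard similarity is one plausible reading of the figure.

```python
import random

def jaccard(a, b):
    """Jaccard similarity of two node sets: |a ∩ b| / |a ∪ b|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def noisy_weights(n_nodes, edges, sd=1.2, seed=0):
    """Weights for all node pairs: +1 for edges, -1 for non-edges, plus N(0, sd) noise."""
    rng = random.Random(seed)
    edge_set = {frozenset(e) for e in edges}
    weights = {}
    for u in range(n_nodes):
        for v in range(u + 1, n_nodes):
            base = 1.0 if frozenset((u, v)) in edge_set else -1.0
            weights[(u, v)] = base + rng.gauss(0.0, sd)
    return weights
```

Calling `noisy_weights(500, planted_edges)` with a list of planted edges would give a weighted graph with the noise level stated on the slide.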

(1) Yeast PPI and GI networks
 3979 genes
 P: protein-protein interactions (45,456 edges)
 N: negative genetic interactions (76,267 edges)
 Local improvers: poor results (fewer than 3 links)
 Results for global improver:
Initiator     Modules  Gene coverage  Max module size  Enriched GO terms  Enriched modules (%)  Enriched links (%)  Links
MBC-DICER     100      946            49               243                87                    80                  430
DICER5        103      957            46               249                82                    74                  438
DICER         104      837            34               192                67                    61                  498
Hierarchical  123      877            30               186                68                    59                  394
NodeAddition  102      950            49               240                83                    79                  430

The yeast module map
 Link p < 10^-50
 Chromatin-related hubs similar to Baryshnikova et al. 2011

The top links in the map (p < 10^-70)
[Figure panels: between complexes; between subcomplexes]

Comparison to extant methods
 Analysis of the Collins et al. 2007 data
 Comparing to extant methods that exploit both
positive and negative GIs and their weights
Algorithm                            Number of modules  Gene coverage  Maximal module size  Number of enriched GO terms  Percent enriched modules  Percent enriched links  Number of links
MBC-DICER (Global)                   32                 238            20                   53                           84                        79                      67
Genecentric (Leiserson et al. 2011)  116                1248           25                   39                           63                        43                      58
Kelley and Kingsford 2011            117                355            17                   32                           17                        6                       403

(2) Arabidopsis PPI & MD networks
 P: PPIs. N: metabolic dependencies (Tzfadia et al. 2012)
 Discover protein complexes and their metabolic links

Using the module map for function prediction
 Validated modules by their ability to predict gene functions in MapMan
 Function assignment: each gene receives the best assignment of its module
 LOOCV: precision and recall > 80%
New predictions:
Gene       MapMan term                                Module p-value
AT5G48000  sulfur-containing.glucosinolates           0.0001
AT5G42590  sulfur-containing.glucosinolates           0.0001
AT2G30870  redox.ascorbate and glutathione.ascorbate  0.0028
AT4G15440  isoprenoids.carotenoids                    0.0002
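A leave-one-out evaluation of module-based function prediction can be sketched as below. Several choices here are assumptions rather than the procedure used in the talk: the module's best assignment is approximated by a majority vote over the other annotated members (the talk reports module p-values instead), each gene has a single term, and precision and recall are computed over predictions made and over all annotated genes respectively. The name `loocv_precision_recall` is hypothetical.

```python
from collections import Counter

def loocv_precision_recall(module_of, term_of):
    """Leave-one-out evaluation of module-based function prediction.

    module_of: gene -> module id; term_of: gene -> known term (annotated genes only).
    """
    predicted = correct = 0
    for gene, truth in term_of.items():
        module = module_of.get(gene)
        if module is None:
            continue
        # Hide this gene's own label and vote among the other annotated members.
        votes = Counter(
            term for g, term in term_of.items()
            if g != gene and module_of.get(g) == module
        )
        if not votes:
            continue  # no prediction possible for this gene
        guess = votes.most_common(1)[0][0]
        predicted += 1
        correct += guess == truth
    precision = correct / predicted if predicted else 0.0
    recall = correct / len(term_of) if term_of else 0.0
    return precision, recall
```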

(3) Human case-control profiles
 Data: expression profiles of lung cancer (blood)
 P: multi-phenotype co-expression network; N: differential correlation (DC): change in correlation in disease vs. controls
 Cross-validation: most links show high DC in the test set
Link example: breakage of immune activation in cancer (enrichment q-value < 1E-10)
Enrichment for NSCLC-specific causal miRNA (mir-34 family, p = 0.002, mir2disease DB)
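Differential correlation for one gene pair can be sketched as below. The exact DC measure used in the talk is not given on the slide, so taking the plain difference of Pearson correlations between cases and controls is an assumption, as are the names `pearson` and `differential_correlation`. The sketch expects at least two samples per group and non-constant expression values.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def differential_correlation(gene_a, gene_b, is_case):
    """Correlation of two genes in cases minus their correlation in controls."""
    def split(values, flag):
        return [v for v, c in zip(values, is_case) if c == flag]
    r_case = pearson(split(gene_a, True), split(gene_b, True))
    r_ctrl = pearson(split(gene_a, False), split(gene_b, False))
    return r_case - r_ctrl
```

A pair that is co-expressed in one group and anti-correlated in the other gets a DC of magnitude close to 2, the largest possible change.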

Summary
 Integration of networks
 Considering different interaction types
 A summary module-map
 Algorithms
 Initiators
 Improvers
 Algorithms perform well in simulations and real data
 PPI+GI
 PPI+MD
 Human disease: correlation and differential correlation
 Next steps (?)
 Cytoscape app (maybe next year…)
 Can we use module maps instead of gene networks for network inference?
