Joint gene network inference with multiple
samples: a bootstrapped consensual approach
Nathalie Villa-Vialaneix
http://www.nathalievilla.org
nathalie.villa@univ-paris1.fr
Annual meeting of the NETBIO group
Paris, September 10, 2013
Joint work with Matthieu Vignes, Nathalie Viguerie
and Magali SanCristobal
Consensus LASSO (SAMM, UP1) Nathalie Villa-Vialaneix Paris, 10 septembre 2013 1 / 21
Short overview on network inference with GGM
Outline
1 Short overview on network inference with GGM
2 Inference with multiple samples
3 Simulations
Consensus LASSO (SAMM, UP1) Nathalie Villa-Vialaneix Paris, 10 septembre 2013 2 / 21
Short overview on network inference with GGM
Framework
Data: large-scale gene expression data, i.e., a matrix X = (X_i^j) with n ≃ 30-50 rows (individuals) and p ≃ 10^3-10^4 columns (gene expression variables).
What we want to obtain: a graph/network with
• nodes: genes;
• edges: strong links between gene expressions.
Consensus LASSO (SAMM, UP1) Nathalie Villa-Vialaneix Paris, 10 septembre 2013 3 / 21
Short overview on network inference with GGM
Using partial correlations
correlation is not causality...
Consensus LASSO (SAMM, UP1) Nathalie Villa-Vialaneix Paris, 10 septembre 2013 4 / 21
Short overview on network inference with GGM
Using partial correlations
strong indirect correlation: y ← x → z
set.seed(2807); x <- rnorm(100)
y <- 2*x + 1 + rnorm(100, 0, 0.1); cor(x, y)  # [1] 0.998826
z <- 2*x + 1 + rnorm(100, 0, 0.1); cor(x, z)  # [1] 0.998751
cor(y, z)  # [1] 0.9971105
Consensus LASSO (SAMM, UP1) Nathalie Villa-Vialaneix Paris, 10 septembre 2013 4 / 21
Short overview on network inference with GGM
Using partial correlations
strong indirect correlation: y ← x → z
set.seed(2807); x <- rnorm(100)
y <- 2*x + 1 + rnorm(100, 0, 0.1); cor(x, y)  # [1] 0.998826
z <- 2*x + 1 + rnorm(100, 0, 0.1); cor(x, z)  # [1] 0.998751
cor(y, z)  # [1] 0.9971105
Partial correlation
cor(lm(x ~ z)$residuals, lm(y ~ z)$residuals)  # [1] 0.7801174
cor(lm(x ~ y)$residuals, lm(z ~ y)$residuals)  # [1] 0.7639094
cor(lm(y ~ x)$residuals, lm(z ~ x)$residuals)  # [1] -0.1933699
Consensus LASSO (SAMM, UP1) Nathalie Villa-Vialaneix Paris, 10 septembre 2013 4 / 21
Short overview on network inference with GGM
Theoretical framework
Gaussian Graphical Models (GGM) [Schäfer and Strimmer, 2005,
Meinshausen and Bühlmann, 2006, Friedman et al., 2008]
gene expressions: X ∼ N(0, Σ)
Sparse approach: partial correlations are estimated by using linear models and a sparse penalty: ∀ j,

    X_j = β_j^T X^{-j} + ε,    with    argmax_{(β_{jj'})_{j'}} [ log ML_j − λ Σ_{j'≠j} |β_{jj'}| ].

In the Gaussian framework, β_{jj'} = −S_{jj'}/S_{jj}, where S = Σ^{-1} (the concentration matrix) is related to the partial correlations by π_{jj'} = −S_{jj'}/√(S_{jj} S_{j'j'}).
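As a small illustration (not part of the original slides), the partial correlations of the toy x, y, z example above can also be obtained directly from the inverse of the empirical covariance matrix, using π_{jj'} = −S_{jj'}/√(S_{jj} S_{j'j'}); the values should match, up to rounding, the residual-based ones shown earlier.

# Minimal sketch: partial correlations from the concentration matrix S = Sigma^{-1}
set.seed(2807)
x <- rnorm(100)
y <- 2*x + 1 + rnorm(100, 0, 0.1)
z <- 2*x + 1 + rnorm(100, 0, 0.1)
S <- solve(cov(cbind(x, y, z)))               # concentration matrix
pcor <- -S / sqrt(diag(S) %o% diag(S))        # pi_jj' = -S_jj' / sqrt(S_jj S_j'j')
diag(pcor) <- 1
round(pcor, 3)                                # off-diagonals ~ 0.78, 0.76, -0.19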
Consensus LASSO (SAMM, UP1) Nathalie Villa-Vialaneix Paris, 10 septembre 2013 5 / 21
Inference with multiple samples
Outline
1 Short overview on network inference with GGM
2 Inference with multiple samples
3 Simulations
Consensus LASSO (SAMM, UP1) Nathalie Villa-Vialaneix Paris, 10 septembre 2013 6 / 21
Inference with multiple samples
Motivation for multiple networks inference
Pan-European project Diogenes1
(with Nathalie Viguerie, INSERM):
gene expressions (lipid tissues) from 204 obese women before and after
a low-calorie diet (LCD).
• Assumption: a common functioning exists regardless of the condition;
• Which genes are linked independently of the condition, and which depend on it?
1 http://www.diogenes-eu.org/; see also [Viguerie et al., 2012]
Consensus LASSO (SAMM, UP1) Nathalie Villa-Vialaneix Paris, 10 septembre 2013 7 / 21
Inference with multiple samples
Naive approach: independent estimations
Notations: p genes measured in k samples, each corresponding to a specific condition: (X^c_j)_{j=1,...,p} ∼ N(0, Σ^c), for c = 1, ..., k.
For c = 1, ..., k, n_c independent observations (X^c_{ij})_{i=1,...,n_c}, with Σ_c n_c = n.
Independent inference
∀ c = 1, ..., k and ∀ j = 1, ..., p, the regressions

    X^c_j = X^c_{-j} β^c_j + ε^c_j

are estimated (independently) by maximizing the pseudo-likelihood

    L(S|X) = Σ_c Σ_j log P( X^c_j | X^c_{-j}, S^c_j ),    with S the concentration matrix.
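For concreteness (this is not part of the original slides), this naive condition-by-condition neighborhood selection can be sketched in R with glmnet as an off-the-shelf LASSO solver; the slides do not prescribe a particular solver, so this is only an illustration.

library(glmnet)
# Naive per-condition neighborhood selection: one LASSO regression per gene j
# and per condition c (X: n x p expression matrix, cond: condition labels)
independent_fits <- function(X, cond, lambda) {
  lapply(split(as.data.frame(X), cond), function(Xc) {
    Xc <- as.matrix(Xc)
    sapply(seq_len(ncol(Xc)), function(j) {
      fit <- glmnet(Xc[, -j, drop = FALSE], Xc[, j], lambda = lambda)
      as.numeric(coef(fit))[-1]               # estimated beta_j^c (intercept dropped)
    })
  })
}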
Consensus LASSO (SAMM, UP1) Nathalie Villa-Vialaneix Paris, 10 septembre 2013 8 / 21
Inference with multiple samples
Related papers
Problem: the previous estimation does not use the fact that the different networks should be somehow alike!
Previous proposals
• [Chiquet et al., 2011] replace Σ^c by (1/2) Σ^c + (1/2) Σ and add a sparse penalty;
• [Chiquet et al., 2011] use LASSO and Group-LASSO type penalties to force consistent or sign-coherent edges between conditions;
• [Danaher et al.] add a sparse penalty and the penalty Σ_{c≠c'} ||S^c − S^{c'}||_{L1};
• [Mohan et al., 2012] add a group-LASSO-like penalty Σ_{c≠c'} Σ_j ||S^c_j − S^{c'}_j||_{L2} that focuses on differences due to a small number of nodes only.
Consensus LASSO (SAMM, UP1) Nathalie Villa-Vialaneix Paris, 10 septembre 2013 9 / 21
Inference with multiple samples
Consensus LASSO
Proposal: infer multiple networks by forcing them toward a consensual network, i.e., explicitly constraining the differences between conditions to be under control, but with an L2 penalty so as to allow for more differences than with Group-LASSO type penalties.
Original optimization:

    max_{(β^c_{jk})_{k≠j}, c=1,...,C}  Σ_c [ log ML^c_j − λ Σ_{k≠j} |β^c_{jk}| ].
Consensus LASSO (SAMM, UP1) Nathalie Villa-Vialaneix Paris, 10 septembre 2013 10 / 21
Inference with multiple samples
Consensus LASSO
Proposal: infer multiple networks by forcing them toward a consensual network, i.e., explicitly constraining the differences between conditions to be under control, but with an L2 penalty so as to allow for more differences than with Group-LASSO type penalties.
Original optimization:

    max_{(β^c_{jk})_{k≠j}, c=1,...,C}  Σ_c [ log ML^c_j − λ Σ_{k≠j} |β^c_{jk}| ].

[Ambroise et al., 2009, Chiquet et al., 2011]: this is equivalent to minimizing p problems of dimension k(p − 1):

    (1/2) β_j^T Σ_{-j,-j} β_j + β_j^T Σ_{-j,j} + λ ||β_j||_{L1}

where Σ_{-j,-j} is the block-diagonal matrix Diag(Σ^1_{-j,-j}, ..., Σ^k_{-j,-j}), and similarly for Σ_{-j,j}.
Consensus LASSO (SAMM, UP1) Nathalie Villa-Vialaneix Paris, 10 septembre 2013 10 / 21
Inference with multiple samples
Consensus LASSO
Proposal: infer multiple networks by forcing them toward a consensual network, i.e., explicitly constraining the differences between conditions to be under control, but with an L2 penalty so as to allow for more differences than with Group-LASSO type penalties.
Add a constraint to force the inference toward a “consensus” β^cons:

    (1/2) β_j^T Σ_{-j,-j} β_j + β_j^T Σ_{-j,j} + λ ||β_j||_{L1} + µ Σ_c w_c ||β^c_j − β^cons_j||²_{L2}

with:
• w_c: a real number used to weight the conditions (w_c = 1 or w_c = 1/√n_c);
• µ: a regularization parameter;
• β^cons_j: whatever you want...?
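As a purely illustrative sketch (not the therese implementation), the penalized criterion above can be written as a small R function; the input shapes (stacked per-condition coefficients, block-diagonal Σ_{-j,-j}, the consensus taken as the weighted mean introduced on the next slide) are assumptions made for the example.

# Value of the penalized criterion above for one gene j (to be minimized).
# Assumed inputs: Sigma_jj = block-diagonal Sigma_{-j,-j} (k(p-1) x k(p-1)),
# sigma_j = Sigma_{-j,j} (k(p-1) vector), beta = list of per-condition vectors,
# w = condition weights, n_c = sample sizes, lambda and mu = penalty parameters.
consensus_objective <- function(Sigma_jj, sigma_j, beta, w, n_c, lambda, mu) {
  b <- unlist(beta)                           # stacked beta_j over conditions
  n <- sum(n_c)
  beta_cons <- Reduce(`+`, Map(function(bc, nc) nc / n * bc, beta, n_c))
  quad <- 0.5 * drop(t(b) %*% Sigma_jj %*% b) + drop(crossprod(b, sigma_j))
  pen1 <- lambda * sum(abs(b))                # sparsity (L1) penalty
  pen2 <- mu * sum(mapply(function(bc, wc) wc * sum((bc - beta_cons)^2), beta, w))
  quad + pen1 + pen2
}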
Consensus LASSO (SAMM, UP1) Nathalie Villa-Vialaneix Paris, 10 septembre 2013 10 / 21
Inference with multiple samples
Choice of a consensus
β^cons_j = Σ_c (n_c/n) β^c_j is a good choice because:
• the consensual penalty is then quadratic with respect to β_j;
Consensus LASSO (SAMM, UP1) Nathalie Villa-Vialaneix Paris, 10 septembre 2013 11 / 21
Inference with multiple samples
Choice of a consensus
β^cons_j = Σ_c (n_c/n) β^c_j is a good choice because:
• the consensual penalty is then quadratic with respect to β_j;
• thus, solving the optimization problem is equivalent to minimizing

    (1/2) β_j^T S_j(µ) β_j + β_j^T Σ_{-j,j} + λ Σ_c (1/n_c) ||β^c_j||_1

with S_j(µ) = Σ_{-j,-j} + 2µ A^T A, where A is a matrix that does not depend on j.
Consensus LASSO (SAMM, UP1) Nathalie Villa-Vialaneix Paris, 10 septembre 2013 11 / 21
Inference with multiple samples
Choice of a consensus
β^cons_j = Σ_c (n_c/n) β^c_j is a good choice because:
• the consensual penalty is then quadratic with respect to β_j;
• thus, solving the optimization problem is equivalent to minimizing

    (1/2) β_j^T S_j(µ) β_j + β_j^T Σ_{-j,j} + λ Σ_c (1/n_c) ||β^c_j||_1

with S_j(µ) = Σ_{-j,-j} + 2µ A^T A, where A is a matrix that does not depend on j.
Convex part + L1-norm penalty
Similar to standard LASSO problems: use of an “active set” approach, as described in [Osborne et al., 2000, Chiquet et al., 2011].
Consensus LASSO (SAMM, UP1) Nathalie Villa-Vialaneix Paris, 10 septembre 2013 11 / 21
Inference with multiple samples
Bootstrap estimation
Bootstrapped Consensus Lasso
1: Require: list of genes {1, ..., p}; gene expressions X; condition ids c_i ∈ {1, ..., C}
2: Initialize N^c(j, j') ← 0 for all j, j' ∈ {1, ..., p}; µ fixed
3: for b = 1 → P do
4:   draw a bootstrap sample B_b
5:   estimate (β^{c,b,λ}_j)_{j,c,λ} with the previous method for several λ (in decreasing order)
6:   find λ_max such that Σ_{j,j',c} I{β^{c,λ,b}_{jj'} ≠ 0} > T_1, and set (β^{c,b}_j)_{j,c} := (β^{c,λ_max,b}_j)_{j,c}
7:   if β^{c,b}_{jj'} ≠ 0 then
8:     N^c(j, j') ← N^c(j, j') + 1
9:   end if
10: end for
11: Select edges with N^c(j, j') > T_2 (T_2 chosen)
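A rough R transcription of this loop is sketched below; fit_consensus_lasso() is a hypothetical placeholder for the estimation of step 5 (assumed to return one p × p × C coefficient array per value of λ), and the handling of the T_1 threshold is an assumption rather than the therese implementation.

# Hedged sketch of the bootstrap wrapper (fit_consensus_lasso is hypothetical)
bootstrap_edges <- function(X, cond, lambdas, mu = 1, B = 100, T1 = 500, T2 = 80) {
  p <- ncol(X)
  C <- length(unique(cond))
  counts <- array(0, dim = c(p, p, C))          # N^c(j, j')
  for (b in seq_len(B)) {
    # stratified bootstrap: resample observations within each condition
    idx <- unlist(tapply(seq_len(nrow(X)), cond, sample, replace = TRUE))
    fits <- fit_consensus_lasso(X[idx, ], cond[idx], lambdas, mu)
    # keep the first (largest) lambda whose total support size exceeds T1
    sizes <- vapply(fits, function(f) sum(f != 0), numeric(1))
    chosen <- fits[[which(sizes > T1)[1]]]      # lambdas assumed in decreasing order
    counts <- counts + (chosen != 0)
  }
  counts > T2                                   # selected edges, per condition
}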
Consensus LASSO (SAMM, UP1) Nathalie Villa-Vialaneix Paris, 10 septembre 2013 12 / 21
Simulations
Outline
1 Short overview on network inference with GGM
2 Inference with multiple samples
3 Simulations
Consensus LASSO (SAMM, UP1) Nathalie Villa-Vialaneix Paris, 10 septembre 2013 13 / 21
Simulations
Simulated data
Expression data with known co-expression network
• original network (scale free) taken from
http://www.comp-sys-bio.org/AGN/data.html (100 nodes,
∼ 200 edges, loops removed);
• rewire a ratio r of the edges to generate k “children” networks
(sharing approximately 100(1 − 2r)% of their edges);
• generate “expression data” with a random Gaussian process from
each chid.
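The rewiring step could, for instance, be sketched in R with the igraph package as follows (an illustration under assumptions; the original simulation code may differ, and sample_pa() merely stands in for the AGN parent network).

# Hedged sketch: generate k "children" networks by rewiring a proportion r of edges
library(igraph)
make_children <- function(g0, k = 2, r = 0.05) {
  # each edge of the parent is rewired with probability r in every child
  lapply(seq_len(k), function(c) rewire(g0, with = each_edge(prob = r)))
}
g0 <- sample_pa(100, m = 2, directed = FALSE)   # stand-in scale-free parent network
children <- make_children(g0, k = 2, r = 0.05)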
Consensus LASSO (SAMM, UP1) Nathalie Villa-Vialaneix Paris, 10 septembre 2013 14 / 21
Simulations
An example with k = 2, r = 5%
Consensus LASSO (SAMM, UP1) Nathalie Villa-Vialaneix Paris, 10 septembre 2013 15 / 21
Simulations
Choice for T2
Data: r = 0.05, k = 2 and n1 = n2 = 20
100 bootstrap samples, µ = 1, T1 = 250 or 500
[Two precision-recall curves, one per value of T_1; the marked points correspond to thresholds of roughly 30, 40 and 43 selections at least.]

Dots correspond to the best F = 2 × (precision × recall) / (precision + recall).
⇒ The best F corresponds to selecting a number of edges approximately equal to the number of edges of the original network.
Consensus LASSO (SAMM, UP1) Nathalie Villa-Vialaneix Paris, 10 septembre 2013 16 / 21
Simulations
Choice for T1 and µ
µ ∈ {0.1, 1}; T_1 ∈ {250, 300, 500}; last column: % of improvement of bootstrapping.

rewired edges: 5%
  network sizes      µ     T_1    % improvement
  20-20              1     500    30.69
  20-30              0.1   500    11.87
  30-30              1     300    20.15
  50-50              1     300    14.36
  20-20-20-20-20     1     500    86.04
  30-30-30-30        0.1   500    42.67

rewired edges: 20%
  network sizes      µ     T_1    % improvement
  20-20              0.1   300    -17.86
  20-30              0.1   300    -18.35
  30-30              1     500    -7.97
  50-50              0.1   300    -7.83
  20-20-20-20-20     0.1   500    10.27
  30-30-30-30        1     500    13.48
Consensus LASSO (SAMM, UP1) Nathalie Villa-Vialaneix Paris, 10 septembre 2013 17 / 21
Simulations
Comparisons (best/worst case F for different parameters)
nc                 gLasso   cLasso        gLasso+boot   cLasso+boot
rewired edges: 5%
20-20 0.19 0.22 (0.18) 0.27 (0.26) 0.29 (0.27)
20-30 0.26 0.30 (0.26) 0.31 (0.29) 0.33 (0.32)
30-30 0.28 0.31 (0.27) 0.35 (0.31) 0.38 (0.36)
50-50 0.36 0.43 (0.36) 0.47 (0.46) 0.49 (0.49)
20-20-20-20-20 0.19 0.23 (0.18) 0.39 (0.38) 0.43 (0.40)
30-30-30-30 0.30 0.36 (0.29) 0.49 (0.48) 0.51 (0.50)
rewired edges: 20%
20-20 0.21 0.23 (0.19) 0.18 (0.17) 0.19 (0.17)
20-30 0.26 0.26 (0.25) 0.20 (0.19) 0.22 (0.20)
30-30 0.28 0.31 (0.29) 0.27 (0.27) 0.29 (0.28)
50-50 0.42 0.43 (0.41) 0.38 (0.37) 0.40 (0.38)
20-20-20-20-20 0.20 0.22 (0.20) 0.22 (0.20) 0.24 (0.24)
30-30-30-30 0.27 0.29 (0.27) 0.30 (0.30) 0.33 (0.31)
Not shown here, but when the % of rewired edges is larger (20%), the intertwined Lasso performs better (and its performances are not improved by bootstrapping).
Consensus LASSO (SAMM, UP1) Nathalie Villa-Vialaneix Paris, 10 septembre 2013 18 / 21
Simulations
Real data
204 obese women; expression of 221 genes before and after a LCD
µ = 1; T_1 = 1000 (target density: 4%)
Distribution of the number of times an edge is selected over 100
bootstrap samples
[Histogram of the selection counts: x-axis, number of times an edge is selected (0-100); y-axis, frequency (0-1500).]
(70% of the pairs of nodes are never selected) ⇒ T2 = 80
Consensus LASSO (SAMM, UP1) Nathalie Villa-Vialaneix Paris, 10 septembre 2013 19 / 21
Simulations
Networks
[Two inferred networks, before diet and after diet; highlighted genes in both: AZGP1, CIDEA, FADS1, FADS2, GPD1L, PCK2.]

densities about 1.3% - some interactions (both shared and specific) make sense for the biologist
Consensus LASSO (SAMM, UP1) Nathalie Villa-Vialaneix Paris, 10 septembre 2013 20 / 21
Simulations
Thank you for your attention...
Programs available in the R package therese (on R-Forge). Joint work
with
Magali SanCristobal Matthieu Vignes
(LGC, INRA de Toulouse) (MIAT, INRA de Toulouse)
Nathalie Viguerie
(I2MC, INSERM Toulouse)
Consensus LASSO (SAMM, UP1) Nathalie Villa-Vialaneix Paris, 10 septembre 2013 21 / 21
Simulations
References
Ambroise, C., Chiquet, J., and Matias, C. (2009).
Inferring sparse Gaussian graphical models with latent structure.
Electronic Journal of Statistics, 3:205–238.
Chiquet, J., Grandvalet, Y., and Ambroise, C. (2011).
Inferring multiple graphical structures.
Statistics and Computing, 21(4):537–553.
Danaher, P., Wang, P., and Witten, D.
The joint graphical lasso for inverse covariance estimation across multiple classes.
Preprint arXiv 1111.0324v3. Submitted for publication.
Friedman, J., Hastie, T., and Tibshirani, R. (2008).
Sparse inverse covariance estimation with the graphical lasso.
Biostatistics, 9(3):432–441.
Meinshausen, N. and Bühlmann, P. (2006).
High dimensional graphs and variable selection with the lasso.
Annals of Statistics, 34(3):1436–1462.
Mohan, K., Chung, J., Han, S., Witten, D., Lee, S., and Fazel, M. (2012).
Structured learning of Gaussian graphical models.
In Proceedings of NIPS (Neural Information Processing Systems) 2012, Lake Tahoe, Nevada, USA.
Osborne, M., Presnell, B., and Turlach, B. (2000).
On the LASSO and its dual.
Journal of Computational and Graphical Statistics, 9(2):319–337.
Schäfer, J. and Strimmer, K. (2005).
An empirical Bayes approach to inferring large-scale gene association networks.
Bioinformatics, 21(6):754–764.
Consensus LASSO (SAMM, UP1) Nathalie Villa-Vialaneix Paris, 10 septembre 2013 21 / 21
Simulations
Viguerie, N., Montastier, E., Maoret, J., Roussel, B., Combes, M., Valle, C., Villa-Vialaneix, N., Iacovoni, J., Martinez, J., Holst,
C., Astrup, A., Vidal, H., Clément, K., Hager, J., Saris, W., and Langin, D. (2012).
Determinants of human adipose tissue gene expression: impact of diet, sex, metabolic status and cis genetic regulation.
PLoS Genetics, 8(9):e1002959.
Consensus LASSO (SAMM, UP1) Nathalie Villa-Vialaneix Paris, 10 septembre 2013 21 / 21