This document summarizes and reviews methods for inferring gene co-expression networks from gene expression data, as presented in related articles including Chiquet et al. It describes various statistical approaches implemented in packages like GeneNet and glasso, including graphical Gaussian models using shrinkage and sparse linear regression. It compares the resulting network densities produced by different methods.
Graph Neural Network for Phenotype Prediction (tuxette)
This document describes a study on using graph neural networks (GNNs) for phenotype prediction from gene expression data. The objectives are to determine whether including network information can improve predictions, which network types work best, and whether GNNs can learn the network inference themselves. It provides background on GNNs and how they generalize convolutional layers to graph data. The authors implemented a GNN model from previous work as a starting point and tested it on different network types to see which network information is most useful for predictions. Their methodology involves comparing GNN performance to other methods like random forests using 10-fold cross-validation.
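As a minimal sketch of the graph-convolution idea the summary alludes to (not the authors' model), a single layer of the commonly used GCN form H' = ReLU(D^{-1/2}(A+I)D^{-1/2} H W) can be written in plain NumPy; the toy graph, features, and sizes below are invented for illustration:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution layer: relu(D^-1/2 (A+I) D^-1/2 @ H @ W)."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))    # symmetric normalization
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt
    return np.maximum(A_norm @ H @ W, 0.0)    # ReLU

# toy graph: 3 nodes (e.g. genes), 2 input features, 4 output channels
A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
H = np.random.default_rng(0).normal(size=(3, 2))
W = np.random.default_rng(1).normal(size=(2, 4))
out = gcn_layer(A, H, W)
```

Stacking several such layers, with the adjacency matrix taken from an inferred gene network, gives the kind of model the study compares against random forests.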
Convolutional networks and graph networks through kernels (tuxette)
This presentation discusses how convolutional kernel networks (CKNs) can be used to model sequential and graph-structured data through kernels defined over sequences and graphs. CKNs define feature maps from substructures like n-mers in sequences and paths in graphs into high-dimensional spaces, which are then approximated to obtain low-dimensional representations that can be used for prediction tasks like classification. This approach is analogous to convolutional neural networks and can be extended to multiple layers. The presentation provides examples showing CKNs achieve good performance on problems involving protein sequences and social networks.
Multimodal Residual Networks for Visual QA (Jin-Hwa Kim)
Deep neural networks continue to advance the state of the art in image recognition tasks with various methods. However, applications of these methods to multimodality remain limited. We present Multimodal Residual Networks (MRN) for the multimodal residual learning of visual question-answering, which extends the idea of deep residual learning. Unlike deep residual learning, MRN effectively learns the joint representation from vision and language information. The main idea is to use element-wise multiplication for the joint residual mappings, exploiting the residual learning of the attentional models in recent studies. Various alternative models introduced by multimodality are explored based on our study. We achieve state-of-the-art results on the Visual QA dataset for both Open-Ended and Multiple-Choice tasks. Moreover, we introduce a novel method to visualize the attention effect of the joint representations for each learning block using the back-propagation algorithm, even though the visual features are collapsed without spatial information.
Medical pathology images are visually evaluated by experts for disease diagnosis, but the connection between image features and the state of the cells in an image is typically unknown. To understand this relationship, we describe a multimodal modeling and inference framework that estimates shared latent structure of joint gene expression levels and medical image features. The method is built around probabilistic canonical correlation analysis (PCCA), which is jointly fit to image embeddings that are learned using convolutional neural networks and linear embeddings of paired gene expression data. We finally discuss a set of theoretical and empirical challenges in domain adaptation settings arising from genomics data. (Based on work in collaboration with Gregory Gundersen and Barbara E. Engelhardt.)
Since the advent of the horseshoe priors for regularization, global-local shrinkage methods have proved to be a fertile ground for the development of Bayesian theory and methodology in machine learning. They have achieved remarkable success in computation, and enjoy strong theoretical support. Much of the existing literature has focused on the linear Gaussian case. The purpose of the current talk is to demonstrate that the horseshoe priors are useful more broadly, by reviewing both methodological and computational developments in complex models that are more relevant to machine learning applications. Specifically, we focus on methodological challenges in horseshoe regularization in nonlinear and non-Gaussian models; multivariate models; and deep neural networks. We also outline the recent computational developments in horseshoe shrinkage for complex models along with a list of available software implementations that allows one to venture out beyond the comfort zone of the canonical linear regression problems.
A short and naive introduction to epistasis in association studies (tuxette)
This document provides a short introduction to detecting epistasis, or gene-gene interactions, in genome-wide association studies. It discusses how standard GWAS have limitations and epistasis may help explain missing heritability. Various approaches for detecting epistasis are summarized, including regression-based methods, correlation-based methods, information theory methods, and methods that combine or summarize results across multiple SNPs or genomic regions. Challenges in detecting epistasis like multiple testing and computational complexity are also noted. The goal is to give an overview of epistasis detection rather than precise directions.
The document describes how to compute backpropagation for neural networks. It involves:
1) Calculating the gradients of the objective function with respect to the weights in order to update them.
2) Computing the gradients layer by layer, starting from the output layer and moving backwards.
3) Using, for each weight, the gradients of the layers above together with the activation values of the layers below.
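The steps above can be sketched for a two-layer network in NumPy; the architecture, squared-error loss, and learning rate are illustrative choices, not taken from the document:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))                  # inputs
y = rng.normal(size=(8, 1))                  # targets
W1, W2 = rng.normal(size=(3, 4)), rng.normal(size=(4, 1))

# forward pass: store the activations of every layer
h = np.tanh(X @ W1)                          # hidden activations
y_hat = h @ W2                               # output layer
loss = 0.5 * np.mean((y_hat - y) ** 2)

# backward pass: gradients flow from the output layer downwards
g_out = (y_hat - y) / len(X)                 # dLoss/dy_hat
gW2 = h.T @ g_out                            # activations below (h) x gradient above
g_h = (g_out @ W2.T) * (1 - h ** 2)          # gradient above, times tanh'
gW1 = X.T @ g_h                              # activations below (X) x gradient above

# gradient step updates the weights
lr = 0.1
W1 -= lr * gW1
W2 -= lr * gW2
```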
Estimating Functional Connectomes: Sparsity’s Strength and Limitations (Gael Varoquaux)
Talk given at the OHBM 2017 education course.
I present the challenges and techniques involved in estimating meaningful brain functional connectomes from fMRI: why sparsity in the inverse covariance leads to models that can be interpreted as interactions between regions.
Then I discuss the limitations of sparse estimators and introduce shrinkage as an alternative. Finally, I discuss how to compare multiple functional connectomes.
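As a rough illustration of the two estimators contrasted in the talk, sparse inverse covariance and shrinkage, scikit-learn provides generic implementations; the synthetic data below merely stands in for region-level fMRI signals:

```python
import numpy as np
from sklearn.covariance import GraphicalLassoCV, LedoitWolf

# synthetic "time series": 200 samples of 10 signals with one dependency
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
X[:, 1] += 0.5 * X[:, 0]

sparse_est = GraphicalLassoCV().fit(X)   # sparse inverse covariance (graphical lasso)
precision = sparse_est.precision_        # zero entries = conditional independence

shrunk_est = LedoitWolf().fit(X)         # shrinkage estimator, the alternative discussed
```

Off-diagonal zeros in `precision` are the absent "interactions between regions"; the Ledoit-Wolf estimate is dense but well-conditioned, which motivates the shrinkage alternative.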
Enhancing Partition Crossover with Articulation Points Analysis (jfrchicanog)
This is the presentation of the paper "Enhancing Partition Crossover with Articulation Points Analysis" at the ECOM track of GECCO 2018 (Kyoto), where it received a Best Paper Award.
Connectomics: Parcellations and Network Analysis Methods (Gael Varoquaux)
Simple tutorial on methods for functional connectome analysis: learning regions, extracting functional signal, inferring the network structure, and comparing it across subjects.
This document provides an overview of advanced network modeling and connectivity measures for functional magnetic resonance imaging (fMRI) data. It discusses extracting network structures from fMRI time series data, comparing connectivity across subjects or groups, and interpreting the resulting network structures. Specific topics covered include functional connectivity measures like correlation and partial correlation, estimating inverse covariance matrices, comparing connections between groups, and summarizing networks using graph theoretical measures.
Representation Learning & Generative Modeling with Variational Autoencoder (VA...) (changedaeoh)
This document summarizes the key ideas of auto-encoding variational Bayes. It discusses representation learning using latent variables to model high-dimensional sparse data on low-dimensional manifolds. It then explains generative modeling and the challenge of directly estimating complex data generating distributions. Finally, it introduces variational autoencoders as a way to approximate intractable posterior distributions over latent variables using variational inference and maximize a tractable evidence lower bound objective using the reparameterization trick, allowing end-to-end training of the encoder and decoder networks.
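A minimal sketch of the reparameterization trick and the closed-form KL term of the evidence lower bound, in NumPy; the means and log-variances below are arbitrary example values, not from the document:

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """Sample z ~ N(mu, sigma^2) as mu + sigma * eps with eps ~ N(0, I),
    so gradients can flow through mu and log_var during training."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(q(z|x) || N(0, I)), the regularization term of the ELBO."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))

mu = np.array([0.5, -1.0])
log_var = np.array([0.0, 0.2])
z = reparameterize(mu, log_var)
kl = kl_to_standard_normal(mu, log_var)
```

In a full VAE the encoder outputs `mu` and `log_var`, the decoder reconstructs the input from `z`, and the ELBO is the reconstruction log-likelihood minus this KL term.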
Reading "Bayesian measures of model complexity and fit" (Christian Robert)
This document summarizes a paper on Bayesian measures of model complexity and fit. It discusses using the posterior expected residual information, denoted pD, as a Bayesian measure of model complexity that accounts for the number of effective parameters. pD can be used to compare complex hierarchical models by balancing measures of fit and complexity. It is defined as the deviation of the estimated residual information from the true residual information. The paper also notes some observations about pD, such as that it is not invariant to transformations and depends on choices like the prior and estimator. pD can be easily calculated using MCMC output.
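The remark that pD is easily calculated from MCMC output can be illustrated with the usual deviance-based recipe, pD = posterior mean deviance minus deviance at the posterior mean; the toy normal-mean model below is a made-up example, not from the paper:

```python
import numpy as np
from scipy.stats import norm

# hypothetical setting: normal data with known sd = 1, unknown mean theta;
# theta_draws stands in for MCMC output from the posterior
rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, size=50)
theta_draws = rng.normal(loc=y.mean(), scale=1.0 / np.sqrt(len(y)), size=5000)

def deviance(theta):
    """D(theta) = -2 log-likelihood of the data at theta."""
    return -2.0 * norm.logpdf(y, loc=theta, scale=1.0).sum()

D_bar = np.mean([deviance(t) for t in theta_draws])  # posterior mean deviance
D_at_mean = deviance(theta_draws.mean())             # deviance at posterior mean
pD = D_bar - D_at_mean                               # effective number of parameters
DIC = D_bar + pD
```

For this one-parameter model pD comes out close to 1, matching its reading as an effective number of parameters; the dependence on the chosen estimator (here the posterior mean) is exactly the non-invariance the paper discusses.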
The document discusses machine learning techniques for clustering and segmentation. It introduces Dirichlet process mixtures and the Chinese restaurant process as nonparametric Bayesian models that allow for an infinite number of clusters. It describes how these models can be used for problems like image segmentation, object recognition, population clustering from genetic data, and evolutionary document clustering over time. Approximate inference methods like Markov chain Monte Carlo sampling are used to analyze these models.
Inter-site autism biomarkers from resting state fMRI (Gael Varoquaux)
This document summarizes research predicting autism from resting-state fMRI data using a connectome classification pipeline. The pipeline involves defining regions of interest, extracting time series, computing functional connectivity matrices, and using supervised learning. The authors explore different choices for each step and find that learning regions with MSDL, using tangent-space embedding for connectivity, and standard SVM learning work best across datasets with different heterogeneity levels. The findings suggest connectome structure is less important for prediction than choice of regions and preprocessing.
Reviews on Deep Generative Models in the early days / GANs & VAEs paper review (changedaeoh)
The document summarizes recent developments in deep generative models including GAN, VAE, CGAN, CVAE, DCGAN, and InfoGAN. It explains the objectives and training procedures of these models. GANs use a generator and discriminator in an adversarial training procedure, while VAEs have an encoder-decoder structure to learn an explicit density function. Conditional variants like CGAN and CVAE generate outputs conditioned on input data. DCGAN proposed architectures that improve GAN stability. InfoGAN extends GANs to learn disentangled and interpretable representations by maximizing mutual information between latent variables and observations.
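For reference, the adversarial objective these models build on is the standard minimax game, with G the generator and D the discriminator:

```latex
\min_G \max_D \;
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

InfoGAN augments this with a mutual-information term between a latent code c and the generated output, maximizing I(c; G(z, c)) alongside the adversarial game.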
This document summarizes and compares two popular Python libraries for graph neural networks - Spektral and PyTorch Geometric. It begins by providing an overview of the basic functionality and architecture of each library. It then discusses how each library handles data loading and mini-batching of graph data. The document reviews several common message passing layer types implemented in both libraries. It provides an example comparison of using each library for a node classification task on the Cora dataset. Finally, it discusses a graph classification comparison in PyTorch Geometric using different message passing and pooling layers on the IMDB-binary dataset.
Kernel methods and variable selection for exploratory analysis and multi-omic... (tuxette)
Nathalie Vialaneix
4th course on Computational Systems Biology of Cancer: Multi-omics and Machine Learning Approaches
International course, Curie training
https://training.institut-curie.org/courses/sysbiocancer2021
(remote)
September 29th, 2021
The document summarizes a presentation on applying GANs in medical imaging. It discusses several papers on this topic:
1. A paper that used GANs to reduce noise in low-dose CT scans by training on paired routine-dose and low-dose CT images. This approach generated reconstructed low-dose CT images with improved quality.
2. A paper that used GANs for cross-modality synthesis, specifically generating skin lesion images from other modalities.
3. Additional papers discussed other medical imaging applications of GANs such as vessel-fundus image synthesis and organ segmentation.
Brain reading, compressive sensing, fMRI and statistical learning in Python (Gael Varoquaux)
This document discusses techniques for predictive modeling of brain imaging data using statistical learning methods. It presents an approach that combines sparse recovery, randomized clustering, and total variation regularization to predict stimuli from fMRI data with over 50,000 voxels and around 100 samples. The key steps are clustering spatially correlated voxels, running sparse models on the reduced feature set, and accumulating selected features over multiple runs. Simulations show this approach outperforms other methods at recovering brain patches. The document also discusses disseminating research through open source Python libraries like scikit-learn, which has helped popularize machine learning techniques.
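A minimal sketch of the cluster-then-sparse-model idea (omitting the randomization and total-variation parts of the approach), using generic scikit-learn components and synthetic data in place of fMRI:

```python
import numpy as np
from sklearn.cluster import FeatureAgglomeration
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# stand-in for fMRI: few samples, many "voxels"
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2000))
y = (X[:, :10].sum(axis=1) > 0).astype(int)     # stimulus depends on a small patch

model = make_pipeline(
    FeatureAgglomeration(n_clusters=200),        # cluster correlated voxels
    LogisticRegression(penalty="l1",             # sparse model on reduced features
                       solver="liblinear", C=1.0),
)
model.fit(X, y)
preds = model.predict(X)
```

Clustering first shrinks 2000 voxels to 200 parcels, making the sparse model tractable with only ~100 samples; the full method additionally randomizes the clustering and accumulates selected features over runs.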
Spatially Coherent Latent Topic Model For Concurrent Object Segmentation and ... (Shao-Chuan Wang)
The document summarizes a research paper on spatially coherent latent topic modeling for concurrent object segmentation and classification from images. The proposed model represents images as a collection of regions, each associated with a latent topic. It incorporates spatial relationships between regions by encouraging neighboring regions to take on similar topics. The model is trained using variational message passing to maximize the log likelihood of image data. Experimental results show the model can segment objects even under occlusion and achieve good performance on supervised classification tasks using natural scene images.
Joint Word and Entity Embeddings for Entity Retrieval from Knowledge Graph (FedorNikolaev)
The document proposes a method called KEWER that learns distributed representations of words, entities, and categories from a knowledge graph in the same embedding space. KEWER first generates random walks from entities, replaces some elements with surface forms, and then learns embeddings by maximizing the likelihood of contexts. These embeddings improve entity retrieval over term-based and existing joint embedding models, especially when combined with entity linking.
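A toy sketch of the walk-generation step, with a hypothetical three-entity graph and invented surface forms; the actual KEWER procedure differs in its details:

```python
import random

# toy knowledge graph: entity -> neighboring entities
graph = {
    "Q1": ["Q2", "Q3"],
    "Q2": ["Q1"],
    "Q3": ["Q1", "Q2"],
}
# hypothetical surface forms, used to mix word tokens into the walks
surface_forms = {"Q1": "alan turing", "Q2": "computer science", "Q3": "enigma"}

def random_walk(start, length, p_replace=0.5, rng=random.Random(0)):
    """Generate one walk, replacing some entities with their surface form
    so that word and entity tokens end up sharing contexts."""
    walk, node = [], start
    for _ in range(length):
        if rng.random() < p_replace:
            walk.extend(surface_forms[node].split())
        else:
            walk.append(node)
        node = rng.choice(graph[node])
    return walk

walks = [random_walk(e, 5) for e in graph for _ in range(10)]
# `walks` would then be fed to a skip-gram model to learn the joint embeddings
```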
Goodness-of-fit tests for regression models: the functional data case (NeuroMat)
In this talk the topic of goodness-of-fit for regression models with functional covariates is considered. Although several papers on checking regression models have been published in the last two decades, the case where the covariates are functional is quite recent and has become of interest in the last few years. We will review the very recent advances in this area and propose a new goodness-of-fit test for the null hypothesis of a functional linear model with scalar response. Our test is based on a generalization to the functional framework of a previous one, designed for the goodness-of-fit of regression models with multivariate covariates using random projections. The test statistic is easy to compute using geometrical and matrix arguments, and its distribution is simple to calibrate by a wild bootstrap on the residuals. Some theoretical aspects are derived and the finite-sample properties of the test are illustrated by a simulation study. Finally, the test is applied to real data for checking the assumption of the functional linear model, and a graphical tool is introduced. Lecturer: Wenceslao González-Manteiga, Univ. de Santiago de Compostela, Spain.
Consensual gene co-expression network inference with multiple samples (tuxette)
This document discusses methods for inferring gene co-expression networks from multiple gene expression samples, such as from different breeds or conditions. It describes using graphical Gaussian models and sparse regression approaches like the graphical lasso to learn networks from individual samples. For multiple samples, independent or joint network estimation methods are discussed, including the GroupLasso and CoopLasso approaches implemented in the R package simone, which aim to find consensus networks that are consistent or sign-coherent across conditions. An example dataset with gene expression from two pig breeds is analyzed to compare the methods.
Asynchronous Stochastic Optimization, New Analysis and Algorithms (Fabian Pedregosa)
This document provides an overview of asynchronous stochastic optimization methods and algorithms. It discusses asynchronous parallel stochastic gradient descent (SGD) and how it can minimize idle time. It also introduces asynchronous variance-reduced optimization methods like asynchronous SAGA that provide faster convergence than SGD. The document analyzes the convergence properties of asynchronous optimization methods and presents empirical results demonstrating the speedups achieved by asynchronous proximal SAGA (ProxASAGA) on large datasets.
Estimating Functional Connectomes: Sparsity’s Strength and LimitationsGael Varoquaux
Talk given at the OHBM 2017 education course.
I present the challenges and techniques to estimating meaningful brain functional connectomes from fMRI: why sparsity in inverse covariance leads to models that can interpreted as interactions between regions.
Then I discuss the limitations of sparse estimators and introduce shrinkage as an alternative. Finally, I discuss how to compare multiple functional connectomes.
Enhancing Partition Crossover with Articulation Points Analysisjfrchicanog
This is the presentation of the paper entitled "Enhancing Partition Crossover with Articulation Points Analysis" at the ECOM track in gECCO 2018 (Kyoto). This paper was awarded with a "Best Paper Award"
Connectomics: Parcellations and Network Analysis MethodsGael Varoquaux
Simple tutorial on methods for functional connectome analysis: learning regions, extracting functional signal, inferring the network structure, and comparing it across subjects.
This document provides an overview of advanced network modeling and connectivity measures for functional magnetic resonance imaging (fMRI) data. It discusses extracting network structures from fMRI time series data, comparing connectivity across subjects or groups, and interpreting the resulting network structures. Specific topics covered include functional connectivity measures like correlation and partial correlation, estimating inverse covariance matrices, comparing connections between groups, and summarizing networks using graph theoretical measures.
Representation Learning & Generative Modeling with Variational Autoencoder(VA...changedaeoh
This document summarizes the key ideas of auto-encoding variational Bayes. It discusses representation learning using latent variables to model high-dimensional sparse data on low-dimensional manifolds. It then explains generative modeling and the challenge of directly estimating complex data generating distributions. Finally, it introduces variational autoencoders as a way to approximate intractable posterior distributions over latent variables using variational inference and maximize a tractable evidence lower bound objective using the reparameterization trick, allowing end-to-end training of the encoder and decoder networks.
Reading "Bayesian measures of model complexity and fit"Christian Robert
This document summarizes a paper on Bayesian measures of model complexity and fit. It discusses using the posterior expected residual information, denoted pD, as a Bayesian measure of model complexity that accounts for the number of effective parameters. pD can be used to compare complex hierarchical models by balancing measures of fit and complexity. It is defined as the deviation of the estimated residual information from the true residual information. The paper also notes some observations about pD, such as that it is not invariant to transformations and depends on choices like the prior and estimator. pD can be easily calculated using MCMC output.
The document discusses machine learning techniques for clustering and segmentation. It introduces Dirichlet process mixtures and the Chinese restaurant process as nonparametric Bayesian models that allow for an infinite number of clusters. It describes how these models can be used for problems like image segmentation, object recognition, population clustering from genetic data, and evolutionary document clustering over time. Approximate inference methods like Markov chain Monte Carlo sampling are used to analyze these models.
Inter-site autism biomarkers from resting state fMRIGael Varoquaux
This document summarizes research predicting autism from resting-state fMRI data using a connectome classification pipeline. The pipeline involves defining regions of interest, extracting time series, computing functional connectivity matrices, and using supervised learning. The authors explore different choices for each step and find that learning regions with MSDL, using tangent-space embedding for connectivity, and standard SVM learning work best across datasets with different heterogeneity levels. The findings suggest connectome structure is less important for prediction than choice of regions and preprocessing.
Reviews on Deep Generative Models in the early days / GANs & VAEs paper reviewchangedaeoh
The document summarizes recent developments in deep generative models including GAN, VAE, CGAN, CVAE, DCGAN, and InfoGAN. It explains the objectives and training procedures of these models. GANs use a generator and discriminator in an adversarial training procedure, while VAEs have an encoder-decoder structure to learn an explicit density function. Conditional variants like CGAN and CVAE generate outputs conditioned on input data. DCGAN proposed architectures that improve GAN stability. InfoGAN extends GANs to learn disentangled and interpretable representations by maximizing mutual information between latent variables and observations.
This document summarizes and compares two popular Python libraries for graph neural networks - Spektral and PyTorch Geometric. It begins by providing an overview of the basic functionality and architecture of each library. It then discusses how each library handles data loading and mini-batching of graph data. The document reviews several common message passing layer types implemented in both libraries. It provides an example comparison of using each library for a node classification task on the Cora dataset. Finally, it discusses a graph classification comparison in PyTorch Geometric using different message passing and pooling layers on the IMDB-binary dataset.
Kernel methods and variable selection for exploratory analysis and multi-omic...tuxette
Nathalie Vialaneix
4th course on Computational Systems Biology of Cancer: Multi-omics and Machine Learning Approaches
International course, Curie training
https://training.institut-curie.org/courses/sysbiocancer2021
(remote)
September 29th, 2021
The document summarizes a presentation on applying GANs in medical imaging. It discusses several papers on this topic:
1. A paper that used GANs to reduce noise in low-dose CT scans by training on paired routine-dose and low-dose CT images. This approach generated reconstructed low-dose CT images with improved quality.
2. A paper that used GANs for cross-modality synthesis, specifically generating skin lesion images from other modalities.
3. Additional papers discussed other medical imaging applications of GANs such as vessel-fundus image synthesis and organ segmentation.
Brain reading, compressive sensing, fMRI and statistical learning in PythonGael Varoquaux
This document discusses techniques for predictive modeling of brain imaging data using statistical learning methods. It presents an approach that combines sparse recovery, randomized clustering, and total variation regularization to predict stimuli from fMRI data with over 50,000 voxels and around 100 samples. The key steps are clustering spatially correlated voxels, running sparse models on the reduced feature set, and accumulating selected features over multiple runs. Simulations show this approach outperforms other methods at recovering brain patches. The document also discusses disseminating research through open source Python libraries like scikit-learn, which has helped popularize machine learning techniques.
Spatially Coherent Latent Topic Model For Concurrent Object Segmentation and ...Shao-Chuan Wang
The document summarizes a research paper on spatially coherent latent topic modeling for concurrent object segmentation and classification from images. The proposed model represents images as a collection of regions, each associated with a latent topic. It incorporates spatial relationships between regions by encouraging neighboring regions to take on similar topics. The model is trained using variational message passing to maximize the log likelihood of image data. Experimental results show the model can segment objects even under occlusion and achieve good performance on supervised classification tasks using natural scene images.
Joint Word and Entity Embeddings for Entity Retrieval from Knowledge GraphFedorNikolaev
The document proposes a method called KEWER that learns distributed representations of words, entities, and categories from a knowledge graph in the same embedding space. KEWER first generates random walks from entities, replaces some elements with surface forms, and then learns embeddings by maximizing the likelihood of contexts. These embeddings improve entity retrieval over term-based and existing joint embedding models, especially when combined with entity linking.
Goodness–of–fit tests for regression models: the functional data caseNeuroMat
In this talk the topic of the goodness–of–fit for regression models with functional covariates is considered. Although several papers have been published in the last two decades for the checking of regression models, the case where the covariates are functional is quite recent and has became of interest in the last years. We will review the very recent advances in this area and we will propose a new goodness–of–fit test for the null hypothesis of a functional linear model with scalar response. Our test is based on a generalization to the functional framework of a previous one, designed for the goodness–of–fit of regression models with multivariate covariates using random projections. The test statistic is easy to compute using geometrical and matrix arguments, and simple to calibrate in its distribution by a wild bootstrap on the residuals. Some theoretical aspects are derived and the finite sample properties of the test are illustrated by a simulation study. Finally, the test is applied to real data for checking the assumption of the functional linear model and a graphical tool is introduced. Lecturer: Wenceslao González-Manteiga, Univ. de Santiago de Compostela, Spain.
Consensual gene co-expression network inference with multiple samplestuxette
This document discusses methods for inferring gene co-expression networks from multiple gene expression samples, such as from different breeds or conditions. It describes using graphical Gaussian models and sparse regression approaches like the graphical lasso to learn networks from individual samples. For multiple samples, independent or joint network estimation methods are discussed, including the GroupLasso and CoopLasso approaches implemented in the R package simone, which aim to find consensus networks that are consistent or sign-coherent across conditions. An example dataset with gene expression from two pig breeds is analyzed to compare the methods.
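The single-network building block mentioned here, the graphical lasso, can be sketched with scikit-learn's `GraphicalLasso` (this is not the simone package described in the document; the simulated data, penalty `alpha`, and edge threshold are illustrative assumptions):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# Simulate expression data for one condition: n samples, p genes.
rng = np.random.default_rng(0)
n, p = 60, 10
X = rng.standard_normal((n, p))
X[:, 1] = X[:, 0] + 0.3 * rng.standard_normal(n)  # make genes 0 and 1 co-expressed

model = GraphicalLasso(alpha=0.2)
model.fit(X)

# Nonzero off-diagonal entries of the estimated precision matrix
# are the inferred edges of the conditional-independence graph.
precision = model.precision_
edges = [(i, j) for i in range(p) for j in range(i + 1, p)
         if abs(precision[i, j]) > 1e-8]
```

With this setup the strongly co-expressed pair (genes 0 and 1) is recovered as an edge, while the L1 penalty shrinks most spurious partial correlations to zero.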
Asynchronous Stochastic Optimization, New Analysis and AlgorithmsFabian Pedregosa
This document provides an overview of asynchronous stochastic optimization methods and algorithms. It discusses asynchronous parallel stochastic gradient descent (SGD) and how it can minimize idle time. It also introduces asynchronous variance-reduced optimization methods like asynchronous SAGA that provide faster convergence than SGD. The document analyzes the convergence properties of asynchronous optimization methods and presents empirical results demonstrating the speedups achieved by asynchronous proximal SAGA (ProxASAGA) on large datasets.
Joint gene network inference with multiple samples: a bootstrapped consensual...tuxette
The document describes a method for jointly inferring gene networks from multiple samples using a consensus LASSO approach. It begins with an overview of network inference with Gaussian graphical models and the use of partial correlations. It then discusses challenges with independently estimating networks from multiple samples. The consensus LASSO approach is proposed to infer multiple networks by forcing them towards a consensus network, using an L2 penalty to constrain differences between networks. Simulation results demonstrate that the consensus LASSO approach can accurately recover the shared structure between networks generated from a common original network.
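The partial correlations underlying Gaussian graphical models come directly from the precision matrix Omega via pi_ij = -Omega_ij / sqrt(Omega_ii * Omega_jj). A minimal sketch (the toy covariance matrix is invented for illustration):

```python
import numpy as np

def partial_correlations(precision):
    """Map a precision (inverse covariance) matrix Omega to partial
    correlations: pi_ij = -Omega_ij / sqrt(Omega_ii * Omega_jj)."""
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Toy covariance over three genes; its inverse plays the role of the
# estimated precision matrix from a Gaussian graphical model.
Sigma = np.array([[1.0, 0.8, 0.5],
                  [0.8, 1.0, 0.4],
                  [0.5, 0.4, 1.0]])
pcor = partial_correlations(np.linalg.inv(Sigma))
# pi_ij = 0 exactly when genes i and j are conditionally independent
# given all the others, i.e. when edge (i, j) is absent from the network.
```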
A short and naive introduction to using network in prediction modelstuxette
The document provides an introduction to using network information in prediction models. It discusses representing a network as a graph with a Laplacian matrix. The Laplacian captures properties like random walks on the graph and heat diffusion. Eigenvectors of the Laplacian related to small eigenvalues are strongly tied to graph structure. The document discusses using the Laplacian in prediction models by working in the feature space defined by the Laplacian eigenvectors or directly regularizing a linear model with the Laplacian. This introduces network information and encourages similar contributions from connected nodes. The approaches are applied to problems like predicting phenotypes from gene expression using a known gene network.
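A minimal sketch of the second approach described above, regularizing a linear model with the graph Laplacian (the toy network, simulated data, and value of lambda are assumptions for illustration):

```python
import numpy as np

# Toy gene network on p = 4 genes: adjacency A, Laplacian L = D - A.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 0, 0, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

# Simulated expression data and phenotype (genes 0-2 active, gene 3 not).
rng = np.random.default_rng(1)
n = 50
X = rng.standard_normal((n, 4))
y = X @ np.array([1.0, 1.0, 1.0, 0.0]) + 0.1 * rng.standard_normal(n)

# Network-regularized least squares:
#   beta_hat = argmin ||y - X beta||^2 + lam * beta' L beta
# Since beta' L beta = sum over edges (beta_i - beta_j)^2, the penalty
# encourages connected genes to have similar contributions.
lam = 1.0
beta_hat = np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```

Here the penalty is cheap because the true coefficients are already equal on the connected triangle, which is exactly the kind of structure this regularizer favours.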
Topic of presentation: Variational autoencoders for speech processing
The main points of the presentation: Variational autoencoders (VAEs) have become one of the most popular unsupervised learning techniques for modelling complex data distributions, such as images and audio. In this talk I'll begin with a general introduction to VAEs and then review a recent technique called VQ-VAE, which is capable of learning a rudimentary phoneme-level language model from raw audio without any supervision.
http://dataconf.com.ua/speaker-page/dmytro-bielievtsov.php
https://www.youtube.com/watch?v=euYSAL-aKMI&list=PL5_LBM8-5sLjbRFUtXaUpg84gtJtyc4Pu&t=0s&index=9
Quantitative Propagation of Chaos for SGD in Wide Neural NetworksValentin De Bortoli
The document discusses quantitative analysis of stochastic gradient descent (SGD) for training wide neural networks. It presents two different regimes - a deterministic regime where the limiting dynamics is described by an ordinary differential equation, and a stochastic regime where the limiting dynamics is a stochastic differential equation. Experiments on MNIST classification show that the stochastic regime with larger step sizes exhibits better regularization properties. The analysis provides insights into the behavior of neural network training as the number of neurons becomes large.
Deep dive into the mathematics and algorithms of neural nets. Covers the sigmoid activation function, cross-entropy loss function, gradient descent and the derivatives used in back propagation.
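The pieces listed above fit together in a minimal logistic-regression sketch (toy data; the simplified gradient (p - y) * x follows from combining the sigmoid activation with the cross-entropy loss):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(y, p):
    eps = 1e-12  # avoid log(0)
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

# Tiny dataset: learn y = 1 when x > 0.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 1))
y = (X[:, 0] > 0).astype(float)

w, b, lr = 0.0, 0.0, 0.5
losses = []
for _ in range(200):
    p = sigmoid(X[:, 0] * w + b)          # forward pass
    losses.append(cross_entropy(y, p))
    # Backward pass: d(loss)/dz = p - y, so the chain rule gives
    grad_w = np.mean((p - y) * X[:, 0])
    grad_b = np.mean(p - y)
    w -= lr * grad_w                       # gradient descent step
    b -= lr * grad_b
```

The loss falls steadily from its initial value of ln 2, illustrating why the sigmoid/cross-entropy pairing is popular: the error term in the gradient stays simple and well-scaled.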
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...tuxette
The document summarizes preliminary results from evaluating methods for inferring gene regulatory networks from expression data in Bacillus subtilis. It finds that recall of the known network is generally poor (<20% for random forest), but inferred clusters still retain biological information about common regulators. It plans to confirm results, test restricting edges to sigma factors, and explore other inference methods like Bayesian networks and ARACNE.
Special Plenary Lecture at the International Conference on VIBRATION ENGINEERING AND TECHNOLOGY OF MACHINERY (VETOMAC), Lisbon, Portugal, September 10 - 13, 2018
http://www.conf.pt/index.php/v-speakers
Propagation of uncertainties in complex engineering dynamical systems is receiving increasing attention. When uncertainties are taken into account, the equations of motion of discretised dynamical systems can be expressed by coupled ordinary differential equations with stochastic coefficients. The computational cost for the solution of such a system mainly depends on the number of degrees of freedom and number of random variables. Among various numerical methods developed for such systems, the polynomial chaos based Galerkin projection approach shows significant promise because it is more accurate compared to the classical perturbation based methods and computationally more efficient compared to the Monte Carlo simulation based methods. However, the computational cost increases significantly with the number of random variables and the results tend to become less accurate for a longer length of time. In this talk novel approaches will be discussed to address these issues. Reduced-order Galerkin projection schemes in the frequency domain will be discussed to address the problem of a large number of random variables. Practical examples will be given to illustrate the application of the proposed Galerkin projection techniques.
Hybrid Meta-Heuristic Algorithms For Solving Network Design ProblemAlana Cartwright
This document discusses hybrid meta-heuristic algorithms for solving network design problems. It proposes hybridizing the ant system meta-heuristic with genetic algorithms, simulated annealing, and tabu search. Seven hybrid algorithms are developed and tested on the Sioux Falls network, where the hybrids prove more effective than the base ant system alone. One hybrid combining all four concepts is also applied to the real network of a city with over 2 million people, again proving more effective than the base ant system.
The document discusses building robust machine learning systems that can handle concept drift. It introduces the challenges of concept drift when the underlying data distribution changes over time. It proposes using Gaussian process classifiers with an adaptive training window approach. The approach monitors for concept drift and retrains the model if detected. It tests the approach on artificial data streams with different drift scenarios and finds the adaptive approach performs better than a static model at handling concept drift. Future work could explore other drift detection methods and ensembles of adaptive Gaussian process classifiers.
Inferring networks from multiple samples with consensus LASSOtuxette
This document provides a short overview of network inference using graphical Gaussian models (GGMs). It discusses inferring networks from multiple samples, with the motivation being to identify genes that are linked independently or depending on different conditions. A naive approach of performing independent estimations on each sample is described. Joint network inference using the consensus LASSO method is then introduced to better identify common and condition-specific network structures across multiple related samples.
ERGMs (Exponential Random Graph Models) are statistical models for social networks that specify the probability of a graph as a function of network statistics. Three key points:
1. ERGMs express the probability of a graph as proportional to an exponential family form involving network statistics. This allows modeling dependencies between ties.
2. The conditional probability of a tie is derived from the ERGM and gives insight into how the model parameters influence individual tie formation.
3. Examples of classic network models like Bernoulli graphs and p1 models are shown to be special cases within the ERGM framework, connecting logistic regression approaches to the more general ERGM.
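Point 2 can be made concrete with a small sketch: under an ERGM with edge and triangle statistics, the conditional log-odds of a tie is the inner product of the parameters with the change statistics, and setting the triangle parameter to zero recovers the Bernoulli/logistic special case of point 3 (the graph and parameter values below are invented for illustration):

```python
import math
import numpy as np

def tie_probability(adj, i, j, theta_edges, theta_triangles):
    """Conditional probability of tie (i, j) in an ERGM with edge and
    triangle statistics: logit P = theta_edges * d_edges + theta_tri * d_tri,
    where d_* are the change statistics when the tie is toggled on."""
    d_edges = 1
    # Adding (i, j) closes one triangle per common neighbour of i and j.
    d_tri = int(np.sum(adj[i] * adj[j]))
    logit = theta_edges * d_edges + theta_triangles * d_tri
    return 1.0 / (1.0 + math.exp(-logit))

# 4-node undirected graph; nodes 0 and 1 share neighbour 2.
adj = np.array([[0, 0, 1, 0],
                [0, 0, 1, 0],
                [1, 1, 0, 0],
                [0, 0, 0, 0]])
p = tie_probability(adj, 0, 1, theta_edges=-1.0, theta_triangles=0.5)
# With theta_triangles = 0, p reduces to sigmoid(theta_edges): a
# Bernoulli graph, i.e. plain logistic regression on the edge term.
```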
We approach the screening problem - i.e. detecting which inputs of a computer model significantly impact the output - from a formal Bayesian model selection point of view. That is, we place a Gaussian process prior on the computer model and consider the $2^p$ models that result from assuming that each of the subsets of the $p$ inputs affect the response. The goal is to obtain the posterior probabilities of each of these models. In this talk, we focus on the specification of objective priors on the model-specific parameters and on convenient ways to compute the associated marginal likelihoods. These two problems that normally are seen as unrelated, have challenging connections since the priors proposed in the literature are specifically designed to have posterior modes in the boundary of the parameter space, hence precluding the application of approximate integration techniques based on e.g. Laplace approximations. We explore several ways of circumventing this difficulty, comparing different methodologies with synthetic examples taken from the literature.
Authors: Gonzalo Garcia-Donato (Universidad de Castilla-La Mancha) and Rui Paulo (Universidade de Lisboa)
Statistical inference of network structureTiago Peixoto
The document discusses Bayesian statistical inference for characterizing the structure of large networks. It introduces stochastic blockmodels as a generative model for network structure and describes performing Bayesian inference on these models. This involves defining a likelihood function based on a stochastic blockmodel, placing prior distributions on the model parameters, and computing the posterior distribution over partitions using Bayes' rule. Statistical inference provides a principled means of inferring community structure without overfitting and enables model selection among different partitions. Examples analyzing real networks demonstrate its ability to uncover meaningful structure.
This document discusses network representation and analysis. It defines networks as consisting of nodes (vertices) and edges, and describes different ways to represent networks mathematically using adjacency matrices, incidence matrices, and Laplacian matrices. It also discusses visualizing networks using multidimensional scaling and plotting them in R. Special types of networks like complete graphs and random graphs are briefly introduced.
Learning to discover monte carlo algorithm on spin ice manifoldKai-Wen Zhao
A global-update Monte Carlo sampler can be discovered naturally by a machine trained with a policy gradient method in a topologically constrained environment.
There is now a huge literature on Bayesian methods for variable selection that use spike-and-slab priors. Such methods, in particular, have been quite successful for applications in a variety of different fields. High-throughput genomics and neuroimaging are two of such examples. There, novel methodological questions are being generated, requiring the integration of different concepts, methods, tools and data types. These have in particular motivated the development of variable selection priors that go beyond the independence assumptions of a simple Bernoulli prior on the variable inclusion indicators. In this talk I will describe various prior constructions that incorporate information about structural dependencies among the variables. I will also address extensions of the models to the analysis of count data. I will motivate the development of the models using specific applications from neuroimaging and from studies that use microbiome data.
Similar to Reading revue of "Inferring Multiple Graphical Structures"
Racines en haut et feuilles en bas : les arbres en mathstuxette
1. The document discusses methods for clustering and differential analysis of Hi-C matrices, which represent the 3D organization of DNA.
2. It proposes extending Ward's hierarchical clustering to directly use Hi-C similarity matrices while enforcing adjacency constraints. A fast algorithm was also developed.
3. A new method called "treediff" was created to perform differential analysis of Hi-C matrices based on the Wasserstein distance between hierarchical clusterings. Software implementations of these methods were also developed.
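A rough stand-in for point 2, Ward clustering under adjacency constraints, is scikit-learn's `AgglomerativeClustering` with a connectivity matrix (this is not the authors' Hi-C implementation; the 1D toy signal and the band connectivity are illustrative assumptions):

```python
import numpy as np
from scipy.sparse import diags
from sklearn.cluster import AgglomerativeClustering

# Toy 1D signal standing in for genomic bins along a chromosome:
# two clearly separated segments.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 0.1, 20),
                    rng.normal(3, 0.1, 20)]).reshape(-1, 1)

# Adjacency constraint: each bin may only merge with its chromosomal
# neighbours, so every cluster is a contiguous segment.
n = len(x)
connectivity = diags([1, 1], [-1, 1], shape=(n, n))

model = AgglomerativeClustering(n_clusters=2, linkage="ward",
                                connectivity=connectivity)
labels = model.fit_predict(x)
```

The constraint is what makes the result interpretable genomically: without it, Ward's method could group distant bins into the same cluster.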
Méthodes à noyaux pour l’intégration de données hétérogènestuxette
The document discusses a presentation about multi-omics data integration methods using kernel methods. The presentation introduces kernel methods, how they can be used to integrate heterogeneous omics data, and examples of applications. Specifically, it discusses using kernel methods to perform unsupervised transformation-based integration of multi-omics data. It also presents an application of constrained kernel hierarchical clustering to analyze Hi-C data by directly using Hi-C matrices as kernels.
Méthodologies d'intégration de données omiquestuxette
This document summarizes a presentation on multi-omics data integration methods given by Nathalie Vialaneix on December 13, 2023. The presentation discusses different types of omics data that can be integrated, both vertically across different levels of omics data on the same samples and horizontally across similar types of omics data on different samples. It also discusses different analysis approaches that can be taken, including supervised and unsupervised methods. The rest of the presentation focuses on unsupervised transformation-based integration methods using kernels.
The document discusses current and future work on analyzing Hi-C data and differential analysis of Hi-C matrices. It describes a clustering method developed to partition chromosomes based on Hi-C matrix similarity. It also introduces a new method called treediff for differential analysis of Hi-C data that calculates the distance between hierarchical clusterings. Current work includes reviewing differential analysis methods, investigating differential subtrees with multiple testing control, and inferring chromatin interaction networks.
Can deep learning learn chromatin structure from sequence?tuxette
This document discusses a deep learning model called ORCA that can predict chromatin structure from DNA sequence. The model uses a neural network with an encoder to extract features from sequence and a decoder to predict Hi-C matrices. It was trained on Hi-C data from multiple cell types and can predict interactions between regions at various resolutions. The model accurately captures features like CTCF-mediated loops and can predict effects of structural variants on chromatin structure. It allows for in silico mutagenesis to study how mutations may alter 3D genome organization.
Multi-omics data integration methods: kernel and other machine learning appro...tuxette
The document discusses multi-omics data integration methods, particularly kernel methods. It describes how kernel methods transform data into similarity matrices between samples rather than relying on variable space. Multiple kernel integration approaches are presented that combine multiple similarity matrices into a consensus kernel in an unsupervised manner, such as through a STATIS-like framework that maximizes the similarity between kernels. Examples of applications to datasets from the TARA Oceans expedition are given.
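A minimal sketch of a STATIS-like consensus kernel along the lines described above (the Frobenius normalization and eigenvector weighting follow the general idea of maximizing similarity between kernels; the toy Gram matrices are invented):

```python
import numpy as np

def consensus_kernel(kernels):
    """STATIS-like unsupervised combination: weight each (Frobenius-
    normalised) kernel by the leading eigenvector of their pairwise
    cosine-similarity matrix, then take the weighted sum."""
    K = [k / np.linalg.norm(k) for k in kernels]
    m = len(K)
    # Pairwise similarity between kernels (cosine of the RV type).
    C = np.array([[np.sum(K[a] * K[b]) for b in range(m)] for a in range(m)])
    vals, vecs = np.linalg.eigh(C)
    w = np.abs(vecs[:, -1])          # leading eigenvector, made nonnegative
    w = w / w.sum()
    return sum(wi * Ki for wi, Ki in zip(w, K))

# Two toy Gram matrices computed from two "omics views" of 3 samples.
X1 = np.array([[0.0], [1.0], [2.0]])
X2 = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0]])
K_star = consensus_kernel([X1 @ X1.T, X2 @ X2.T])
```

Because the weights are nonnegative and each input is a valid kernel, the consensus remains symmetric positive semi-definite and can be fed to any downstream kernel method.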
This document provides an overview of the MetaboWean and Idefics projects. MetaboWean aims to study the co-evolution of gut microbiota and epithelium during suckling-to-weaning transition in rabbits, using metabolomics, metagenomics, and single-cell RNA sequencing data. Idefics integrates multiple omics datasets from human skin samples to understand relationships between microorganisms and molecules and how they are structured in patient groups. The datasets include metagenomics, metabolomics, and proteomics from host and microbiota.
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...tuxette
ASTERICS is an interactive and integrative data analysis tool for omics data. It uses Rserve and PyRserve with Flask and Vue.js in a Docker container to integrate omics data. The backend uses Rserve and PyRserve with Flask on the server side, while the frontend uses Vue.js. This architecture was chosen for its open source and light design. Data communication between Rserve and PyRserve is limited, requiring an object database. ASTERICS is deployed using three Docker containers for R, Python, and
Apprentissage pour la biologie moléculaire et l’analyse de données omiquestuxette
This document summarizes a scientific presentation about molecular biology and omics data analysis. The presentation covers topics related to analyzing large omics datasets using methods like kernel methods, graphical models, and neural networks to learn gene regulation networks and predict phenotypes. Key challenges addressed are handling big data, missing values, non-Gaussian data types like counts and compositional data. The goal is to better understand complex biological systems from multi-omics data.
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...tuxette
The document discusses methods for integrating multi-scale omics data using kernel and machine learning approaches. It describes how omics data is large, heterogeneous, and multi-scaled, creating bottlenecks for analysis. Methods discussed for data integration include multiple kernel learning to combine different relational datasets in an unsupervised way. The methods are applied to integrate different datasets from the TARA Oceans expedition to identify patterns in ocean microbial communities. Improving interpretability of the methods and making them more accessible to biological users is discussed.
Journal club: Validation of cluster analysis results on validation datatuxette
This document presents a framework for validating cluster analysis results on validation data. It describes situations where clustering is inferential versus descriptive and recommends using validation data separate from the data used for clustering. A typology of validation methods is provided, including validation based on the clustering method or results, and evaluation using internal validation, external validation, visual properties, or stability measures.
The document discusses the differences between overfitting and overparametrization in machine learning models. It explores how random forests may exhibit a phenomenon known as "double descent" where test error initially decreases then increases with more parameters before decreasing again. While double descent has been observed in other models, the document questions whether it is directly due to model complexity in random forests since very large trees may be unable to fully interpolate extremely large datasets.
Selective inference and single-cell differential analysistuxette
This document discusses selective inference and single-cell differential analysis. It introduces the problem of "double dipping" in the standard single-cell analysis pipeline where the same dataset is used for clustering and differential analysis. Two approaches for addressing this are presented: 1) A method that perturbs clusters before testing for differences, and 2) A test based on a truncated distribution that assumes clusters and genes are given separately. Experiments applying these methods to real single-cell datasets are described. The document outlines challenges in extending these approaches to more complex analyses.
SOMbrero : un package R pour les cartes auto-organisatricestuxette
SOMbrero is an R package that implements self-organizing map (SOM) algorithms. It can handle numeric, non-numeric, and relational data. The package contains functions for training SOMs, diagnosing results, and plotting maps. It also includes tools like a shiny app and vignettes to aid users without programming experience. SOMbrero supports missing data imputation and extends SOM to relational datasets through non-Euclidean distance measures.
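SOMbrero itself is an R package; purely for intuition about what SOM training does, a minimal online SOM update loop might be sketched as follows (grid size, learning-rate and neighbourhood schedules are arbitrary choices, not SOMbrero's):

```python
import numpy as np

def train_som(data, grid=(3, 3), epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Minimal online SOM: for each input, find the best matching unit
    (BMU), then pull it and its grid neighbours towards the input."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    codebook = rng.standard_normal((rows * cols, data.shape[1]))
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], float)
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)                  # decaying learning rate
        sigma = sigma0 * (1 - t / epochs) + 0.1      # shrinking neighbourhood
        for x in rng.permutation(data):
            bmu = np.argmin(((codebook - x) ** 2).sum(axis=1))
            dist2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
            h = np.exp(-dist2 / (2 * sigma ** 2))    # neighbourhood kernel
            codebook += lr * h[:, None] * (x - codebook)
    return codebook

data = np.array([[0, 0], [0, 1], [5, 5], [5, 6.0]])
codebook = train_som(data)
```

After training, prototype vectors near each other on the grid represent similar inputs, which is what makes the map useful for visualization and diagnostics.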
This document summarizes different approaches for structure learning in graph neural networks. It discusses three main classes of methods: 1) metric-based learning which learns a similarity matrix between nodes, 2) probabilistic models which learn the parameters of a distribution over graphs, and 3) direct optimization which directly optimizes the graph adjacency matrix. The document provides examples of methods within each class and notes challenges such as the simplicity of probabilistic models and computational difficulties of direct optimization.
La statistique et le machine learning pour l'intégration de données de la bio...tuxette
This document summarizes a presentation on using statistics and machine learning for integrating high-throughput biological data. It discusses how biological data is large in volume, multi-scaled and heterogeneous in type, creating bottlenecks for analysis. It presents different methods for integrating multiple data tables, including multiple kernel learning to combine similarity matrices. An example application to TARA Oceans data is described, identifying Rhizaria abundance as structuring ocean differences. Interpretability of results is discussed along with prospects for deep learning and predicting phenotypes while understanding relationships.
Authoring a personal GPT for your research and practice: How we created the Q...Leonel Morgado
Thematic analysis in qualitative research is a time-consuming and systematic task, typically done using teams. Team members must ground their activities on common understandings of the major concepts underlying the thematic analysis, and define criteria for its development. However, conceptual misunderstandings, equivocations, and lack of adherence to criteria are challenges to the quality and speed of this process. Given the distributed and uncertain nature of this process, we wondered if the tasks in thematic analysis could be supported by readily available artificial intelligence chatbots. Our early efforts point to potential benefits: not just saving time in the coding process but better adherence to criteria and grounding, by increasing triangulation between humans and artificial intelligence. This tutorial will provide a description and demonstration of the process we followed, as two academic researchers, to develop a custom ChatGPT to assist with qualitative coding in the thematic data analysis process of immersive learning accounts in a survey of the academic literature: QUAL-E Immersive Learning Thematic Analysis Helper. In the hands-on time, participants will try out QUAL-E and develop their ideas for their own qualitative coding ChatGPT. Participants that have the paid ChatGPT Plus subscription can create a draft of their assistants. The organizers will provide course materials and slide deck that participants will be able to utilize to continue development of their custom GPT. The paid subscription to ChatGPT Plus is not required to participate in this workshop, just for trying out personal GPTs during it.
Mending Clothing to Support Sustainable Fashion_CIMaR 2024.pdfSelcen Ozturkcan
Ozturkcan, S., Berndt, A., & Angelakis, A. (2024). Mending clothing to support sustainable fashion. Presented at the 31st Annual Conference by the Consortium for International Marketing Research (CIMaR), 10-13 Jun 2024, University of Gävle, Sweden.
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...Leonel Morgado
Current descriptions of immersive learning cases are often difficult or impossible to compare. This is due to a myriad of different options on what details to include, which aspects are relevant, and on the descriptive approaches employed. Also, these aspects often combine very specific details with more general guidelines or indicate intents and rationales without clarifying their implementation. In this paper we provide a method to describe immersive learning cases that is structured to enable comparisons, yet flexible enough to allow researchers and practitioners to decide which aspects to include. This method leverages a taxonomy that classifies educational aspects at three levels (uses, practices, and strategies) and then utilizes two frameworks, the Immersive Learning Brain and the Immersion Cube, to enable a structured description and interpretation of immersive learning cases. The method is then demonstrated on a published immersive learning case on training for wind turbine maintenance using virtual reality. Applying the method results in a structured artifact, the Immersive Learning Case Sheet, that tags the case with its proximal uses, practices, and strategies, and refines the free text case description to ensure that matching details are included. This contribution is thus a case description method in support of future comparative research of immersive learning cases. We then discuss how the resulting description and interpretation can be leveraged to change immersion learning cases, by enriching them (considering low-effort changes or additions) or innovating (exploring more challenging avenues of transformation). The method holds significant promise to support better-grounded research in immersive learning.
Immersive Learning That Works: Research Grounding and Paths ForwardLeonel Morgado
We will metaverse into the essence of immersive learning, into its three dimensions and conceptual models. This approach encompasses elements from teaching methodologies to social involvement, through organizational concerns and technologies. Challenging the perception of learning as knowledge transfer, we introduce a 'Uses, Practices & Strategies' model operationalized by the 'Immersive Learning Brain' and ‘Immersion Cube’ frameworks. This approach offers a comprehensive guide through the intricacies of immersive educational experiences and spotlighting research frontiers, along the immersion dimensions of system, narrative, and agency. Our discourse extends to stakeholders beyond the academic sphere, addressing the interests of technologists, instructional designers, and policymakers. We span various contexts, from formal education to organizational transformation to the new horizon of an AI-pervasive society. This keynote aims to unite the iLRN community in a collaborative journey towards a future where immersive learning research and practice coalesce, paving the way for innovative educational research and practice landscapes.
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...Advanced-Concepts-Team
Presentation in the Science Coffee of the Advanced Concepts Team of the European Space Agency on the 07.06.2024.
Speaker: Diego Blas (IFAE/ICREA)
Title: Gravitational wave detection with orbital motion of Moon and artificial satellites
Abstract:
In this talk I will describe some recent ideas to find gravitational waves from supermassive black holes or of primordial origin by studying their secular effect on the orbital motion of the Moon or satellites that are laser ranged.
The binding of cosmological structures by massless topological defectsSérgio Sacani
Assuming spherical symmetry and weak field, it is shown that if one solves the Poisson equation or the Einstein field
equations sourced by a topological defect, i.e. a singularity of a very specific form, the result is a localized gravitational
field capable of driving flat rotation (i.e. Keplerian circular orbits at a constant speed for all radii) of test masses on a thin
spherical shell without any underlying mass. Moreover, a large-scale structure which exploits this solution by assembling
concentrically a number of such topological defects can establish a flat stellar or galactic rotation curve, and can also deflect
light in the same manner as an equipotential (isothermal) sphere. Thus, the need for dark matter or modified gravity theory is
mitigated, at least in part.
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...Sérgio Sacani
Context. With a mass exceeding several 10⁴ M⊙ and a rich and dense population of massive stars, supermassive young star clusters
represent the most massive star-forming environment that is dominated by the feedback from massive stars and gravitational interactions
among stars.
Aims. In this paper we present the Extended Westerlund 1 and 2 Open Clusters Survey (EWOCS) project, which aims to investigate
the influence of the starburst environment on the formation of stars and planets, and on the evolution of both low and high mass stars.
The primary targets of this project are Westerlund 1 and 2, the closest supermassive star clusters to the Sun.
Methods. The project is based primarily on recent observations conducted with the Chandra and JWST observatories. Specifically,
the Chandra survey of Westerlund 1 consists of 36 new ACIS-I observations, nearly co-pointed, for a total exposure time of 1 Msec.
Additionally, we included 8 archival Chandra/ACIS-S observations. This paper presents the resulting catalog of X-ray sources within
and around Westerlund 1. Sources were detected by combining various existing methods, and photon extraction and source validation
were carried out using the ACIS-Extract software.
Results. The EWOCS X-ray catalog comprises 5963 validated sources out of the 9420 initially provided to ACIS-Extract, reaching a
photon flux threshold of approximately 2 × 10⁻⁸ photons cm⁻² s⁻¹. The X-ray sources exhibit a highly concentrated spatial distribution,
with 1075 sources located within the central 1 arcmin. We have successfully detected X-ray emissions from 126 out of the 166 known
massive stars of the cluster, and we have collected over 71 000 photons from the magnetar CXO J164710.20-455217.
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...Scintica Instrumentation
Targeting Hsp90 and its pathogen Orthologs with Tethered Inhibitors as a Diagnostic and Therapeutic Strategy for cancer and infectious diseases with Dr. Timothy Haystead.
The debris of the ‘last major merger’ is dynamically youngSérgio Sacani
The Milky Way’s (MW) inner stellar halo contains an [Fe/H]-rich component with highly eccentric orbits, often referred to as the
‘last major merger.’ Hypotheses for the origin of this component include Gaia-Sausage/Enceladus (GSE), where the progenitor
collided with the MW proto-disc 8–11 Gyr ago, and the Virgo Radial Merger (VRM), where the progenitor collided with the
MW disc within the last 3 Gyr. These two scenarios make different predictions about observable structure in local phase space,
because the morphology of debris depends on how long it has had to phase mix. The recently identified phase-space folds in Gaia
DR3 have positive caustic velocities, making them fundamentally different than the phase-mixed chevrons found in simulations
at late times. Roughly 20 per cent of the stars in the prograde local stellar halo are associated with the observed caustics. Based
on a simple phase-mixing model, the observed number of caustics is consistent with a merger that occurred 1–2 Gyr ago.
We also compare the observed phase-space distribution to FIRE-2 Latte simulations of GSE-like mergers, using a quantitative
measurement of phase mixing (2D causticality). The observed local phase-space distribution best matches the simulated data
1–2 Gyr after collision, and certainly not later than 3 Gyr. This is further evidence that the progenitor of the ‘last major merger’
did not collide with the MW proto-disc at early times, as is thought for the GSE, but instead collided with the MW disc within
the last few Gyr, consistent with the body of work surrounding the VRM.
Farming systems analysis: what have we learnt?.pptx
Reading review of "Inferring Multiple Graphical Structures"
1. Reading review of "Inferring Multiple Graphical Structures"
from J. Chiquet et al. (and related articles)
Nathalie Villa-Vialaneix - nathalie.villa@univ-paris1.fr
http://www.nathalievilla.org
Groupe de travail samm-graph - 17/02/2012
Reading review (Chiquet et al., 2011) samm-graph, 17/02/2012 Nathalie Villa-Vialaneix 1 / 18
4. Network inference
Framework
Data: large scale gene expression data, stored as an n × p matrix X with entries X_i^j:
• rows: individuals, n ≈ 30 to 50;
• columns: variables (gene expressions), p ≈ 10^3 to 10^4.
What we want to obtain: a network with
• nodes: genes;
• edges: significant and direct co-expression between two genes (to track
transcription regulations).
7. Network inference
Advantages of inferring a network from large scale transcription data
1 over raw data: focuses on direct links (discards strong but indirect correlations);
2 over raw data (again): focuses on significant links (more robust);
3 over a bibliographic network: can handle interactions with yet
unknown (not annotated) genes.
9. Network inference
Various approaches (and packages) to infer gene co-expression networks
• Graphical Gaussian Model: (X_i)_{i=1,...,n} are i.i.d. Gaussian random
variables N(0, Σ) (gene expressions); then
j ↔ j' (genes j and j' are linked) ⇔ Cor(X^j, X^{j'} | (X^k)_{k≠j,j'}) ≠ 0
and Cor(X^j, X^{j'} | (X^k)_{k≠j,j'}) ∝ Σ⁻¹_{jj'} ⇒ find the partial correlations
by means of (Σ̂_n)⁻¹.
Problem: Σ is a p × p matrix (with p large) and n is small compared to p
⇒ (Σ̂_n)⁻¹ is a poor estimate of Σ⁻¹!
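The link between edges and the inverse covariance can be checked numerically. A minimal NumPy sketch (the simulated data, the injected link, and the n > p setting are illustrative, not from the paper): the partial correlation between genes j and j' given the rest is −Ω_{jj'} / √(Ω_{jj} Ω_{j'j'}), with Ω the precision matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting with n > p, so the empirical covariance is invertible.
n, p = 200, 5
X = rng.standard_normal((n, p))
X[:, 1] += 0.8 * X[:, 0]          # make genes 0 and 1 directly co-expressed

S = np.cov(X, rowvar=False)        # empirical covariance (Sigma_n hat)
Omega = np.linalg.inv(S)           # precision matrix (Sigma_n hat)^{-1}

# Partial correlations: rho_{jj'} = -Omega_{jj'} / sqrt(Omega_jj * Omega_j'j')
d = np.sqrt(np.diag(Omega))
partial_cor = -Omega / np.outer(d, d)
np.fill_diagonal(partial_cor, 1.0)

print(np.round(partial_cor, 2))
```

With n ≪ p, as in the gene expression setting, `np.linalg.inv(S)` fails or is wildly unstable, which is exactly the problem the shrinkage and sparse approaches below address.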
11. Network inference
Various approaches (and packages) to infer gene co-expression networks
• Graphical Gaussian Model
• seminal work:
[Schäfer and Strimmer, 2005a, Schäfer and Strimmer, 2005b]
(with bootstrapping or shrinkage and a proposal for a Bayesian test for
significance); package GeneNet;
• sparse approaches [Friedman et al., 2008]: packages GGMselect
[Giraud et al., 2009] or SIMoNe [Chiquet et al., 2009,
Ambroise et al., 2009, Chiquet et al., 2011] (with unsupervised
clustering, or able to handle data from multiple populations)
13. Network inference
Various approaches (and packages) to infer gene co-expression networks
• Graphical Gaussian Model
• Bayesian network learning [Pearl, 1998, Pearl and Russel, 2002]:
a DAG (Directed Acyclic Graph) and (conditional) probability tables.
Learning: find the conditional probability tables and the DAG.
Standard issues:
• search for unobserved (latent) variable dependencies;
• estimate probabilities by ML optimization (EM algorithm);
• search for the DAG (skeleton, directionality): several DAGs are often
plausible.
Package bnlearn [Scutari, 2010].
14. Network inference
Various approaches (and packages) to infer gene co-expression networks
• Graphical Gaussian Model
• Bayesian network learning [Pearl, 1998, Pearl and Russel, 2002]
• Networks based on mutual information (MI): the MI, I(X^j, X^{j'}),
measures the information gain (related to the KL divergence):
I(X^j, X^{j'}) = H(X^j) + H(X^{j'}) − H(X^j, X^{j'}) = H(X^j) − H(X^j | X^{j'})
where H is the entropy, H(X^j) = −∑_{x∈X_j} p(x) log p(x)
(I is the uncertainty reduction in one variable after removing the uncertainty
in the other variable).
Standard issues:
• estimate I;
• find out which pairs of variables have significant MI.
Package minet [Meyer et al., 2008].
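The MI identity above is easy to verify with a plug-in estimator on discretized data. This is a minimal sketch (the simulated variables and the plug-in entropy estimator are illustrative; minet itself offers several more refined estimators):

```python
import numpy as np
from collections import Counter

def entropy(labels):
    """Plug-in entropy H = -sum p(x) log p(x) over observed symbol frequencies."""
    n = len(labels)
    return -sum((c / n) * np.log(c / n) for c in Counter(labels).values())

rng = np.random.default_rng(1)
x = rng.integers(0, 3, size=5000)                # discretized expression of gene j
# gene j' copies gene j 70% of the time, so the two share information
y = np.where(rng.random(5000) < 0.7, x, rng.integers(0, 3, size=5000))

joint = list(zip(x.tolist(), y.tolist()))
mi = entropy(x.tolist()) + entropy(y.tolist()) - entropy(joint)

# Same value via the second identity: I = H(X) - H(X|Y), with H(X|Y) = H(X,Y) - H(Y)
cond = entropy(joint) - entropy(y.tolist())
assert abs(mi - (entropy(x.tolist()) - cond)) < 1e-12

print(round(mi, 3))
```

Estimating I is the easy part; deciding which pairs have significant MI (the second standard issue above) requires permutation tests or the heuristics implemented in minet.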
15. Network inference / Package GeneNet
GGM: shrinkage approach
The package GeneNet estimates partial correlations in the Gaussian
Graphical Model framework [Schäfer and Strimmer, 2005b]:
• X = (X^1, ..., X^p) (the p gene expressions): random Gaussian vector
with variance Σ;
• j ↔ j' ⇔ Cor(X^j, X^{j'} | (X^k)_{k≠j,j'}) ≠ 0 ⇔ Σ⁻¹_{jj'} ≠ 0.
Shrinkage: use (1 − λ)Σ̂ + λΩ instead of Σ̂ (where Ω is, e.g., the identity
matrix and λ is estimated from the data) to stabilize the estimation of Σ⁻¹
(bagging is also usable [Schäfer and Strimmer, 2005a]).
Significant partial correlations are then selected using a Bayesian test
based on a mixture distribution: the partial correlations fit the mixture model
η₀ f₀(·, κ) + η_A f_A
with η₀ the prior for the null hypothesis, η_A = 1 − η₀, η₀ ≫ η_A (η₀ and κ estimated by EM).
FDR correction: at level α (5% here), keep edges for which p_(i) ≤ iα/(e η₀),
where e is the number of edges and p_(1), p_(2), ..., p_(e) are the ordered p-values.
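The shrinkage step can be sketched in a few lines of NumPy. This is an illustration only: the function name is hypothetical, the shrinkage intensity is fixed here, whereas GeneNet estimates λ analytically from the data, and the Bayesian edge test is omitted.

```python
import numpy as np

def shrink_precision(X, lam):
    """Shrinkage estimate of the precision matrix: invert
    (1 - lam) * Sigma_hat + lam * Omega, with Omega = identity target."""
    S = np.cov(X, rowvar=False)
    target = np.eye(S.shape[0])
    return np.linalg.inv((1 - lam) * S + lam * target)

rng = np.random.default_rng(2)
n, p = 30, 50                      # n << p: the raw covariance is singular
X = rng.standard_normal((n, p))

Omega = shrink_precision(X, lam=0.2)
print(Omega.shape)                 # well-defined inverse despite n < p
```

The point of the shrinkage target: for any λ > 0 the blended matrix is positive definite, so the inverse (and hence the partial correlations) always exists, even when n < p.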
21. Network inference / Package glasso
Sparse linear regression
Linear regression for each node:
∀ j = 1, ..., p:   X^j = S_j X^{−j} + ε_j
with X^{−j} the gene expressions without gene j.
Relation with the network:
j ↔ j' ⇔ S_{jj'} ≠ 0.
Estimation: [Meinshausen and Bühlmann, 2006] LS estimate with
L¹-penalization:
∀ j = 1, ..., p:   argmin_{S_j} ∑_{i=1}^n (X_i^j − S_j X_i^{−j})² + λ ∑_{j'≠j} |S_{jj'}|
Sparse penalization ⇒ only a few j' are such that S_{jj'} ≠ 0 (variable
selection).
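The per-node penalized regressions can be sketched with scikit-learn's Lasso. This is an assumption-laden illustration, not the Meinshausen and Bühlmann implementation: the function name, simulated data, and penalty value are all chosen for the example, and it also previews the AND/OR symmetrization discussed on the next slide.

```python
import numpy as np
from sklearn.linear_model import Lasso

def neighborhood_selection(X, lam, policy="and"):
    """One L1-penalized regression per gene, then symmetrize the support
    of the (asymmetric) coefficient matrix with an AND or OR policy."""
    n, p = X.shape
    S = np.zeros((p, p))
    for j in range(p):
        others = np.delete(np.arange(p), j)          # regress gene j on all others
        model = Lasso(alpha=lam).fit(X[:, others], X[:, j])
        S[j, others] = model.coef_
    support = S != 0
    return support & support.T if policy == "and" else support | support.T

rng = np.random.default_rng(3)
n, p = 100, 10
X = rng.standard_normal((n, p))
X[:, 1] += X[:, 0]                                   # direct link between genes 0 and 1

A = neighborhood_selection(X, lam=0.2)
print(bool(A[0, 1]))                                 # edge 0-1 recovered
```

The AND policy keeps an edge only if both regressions select it (sparser), the OR policy if either does, matching the two densities reported in the summary table below.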
22. Network inference / Package glasso
Sparse linear regression by pseudo-likelihood maximization
Estimation: [Friedman et al., 2008] the Gaussian framework allows us to
use pseudo-ML optimization with a sparse penalization:
L(S | X) − λ‖S‖₁ = ∑_{i=1}^n ∑_{j=1}^p log P(X_i^j | X_i^{−j}, S_j) − λ‖S‖₁
Remark: for [Meinshausen and Bühlmann, 2006], the estimates are
not symmetric ⇒ symmetrization is done by OR or AND policies.
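The graphical lasso of [Friedman et al., 2008] is available outside R as well; a minimal sketch with scikit-learn's GraphicalLasso (the simulated data, penalty value, and nonzero threshold are illustrative):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(4)
n, p = 200, 8
X = rng.standard_normal((n, p))
X[:, 1] += 0.9 * X[:, 0]               # one direct co-expression link

# Sparse L1-penalized estimate of the precision matrix (symmetric by construction)
model = GraphicalLasso(alpha=0.2).fit(X)
Omega = model.precision_

# Edges = nonzero off-diagonal entries of the estimated precision matrix
edges = (np.abs(Omega) > 1e-8) & ~np.eye(p, dtype=bool)
print(bool(edges[0, 1]))
```

Unlike the per-node regressions, the precision matrix is estimated jointly and is symmetric, so no AND/OR post-processing is needed.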
24. Network inference / Package glasso
Summary
Density comparison
Schäfer and Strimmer (shrinkage)        2.24%
Schäfer and Strimmer (bootstrap)        6.36%
Friedman et al.                         3.78%
Meinshausen and Bühlmann (OR policy)    3.24%
Meinshausen and Bühlmann (AND policy)   1.68%
Edges comparison (number of common edges; edge count per method in parentheses)
                              Schäfer Strimmer (883)   Schäfer Strimmer (2345)   Friedman et al. (1425)
Schäfer Strimmer (2345)       883
Friedman et al. (1425)        883                      1425
Meinshausen Bühlmann (1195)   883                      1195                      1195
28. Multiple Graphical Structures
Framework
T samples measuring the expression of the same genes:
X^{1,t}, ..., X^{p,t}
for t = 1, ..., T, where each X^{j,t} is an n_t-dimensional vector (n_t observations
in sample t).
Naive approach: independent inferences
L(S^t | X^t) = ∑_{i=1}^{n_t} ∑_{j=1}^p log P(X_i^{j,t} | X_i^{−j,t}, S_j^t)
and
argmax_{S¹,...,S^T} ∑_t ( L(S^t | X^t) − λ‖S^t‖₁ )
Problem: this doesn't use the fact that the samples are actually related... and
it produces T networks!
29. Multiple Graphical Structures
3 solutions to address this issue
First note that, in the Gaussian framework:
L(S | X) = (n/2) log det(D) − (n/2) Tr(D^{−1/2} S Σ̂ S D^{−1/2}) − (np/2) log(2π)
where D = Diag(S₁₁, ..., S_pp) and Σ̂ is the empirical covariance matrix
⇒ L(S | X) ≡ L(S | Σ̂).
30. Multiple Graphical Structures
3 solutions to address this issue
• Intertwined estimation: use Σ̃^t = α Σ̂^t + (1 − α) Σ̄ instead of Σ̂^t, where
Σ̄ = (1/n) ∑_t n_t Σ̂^t (with n = ∑_t n_t):
argmax_{S¹,...,S^T} ∑_t ( L(S^t | Σ̃^t) − λ‖S^t‖₁ )
Similar to the assumption that each sample is generated from a mixture of
Gaussians(?). In the experiments, α = 1/2.
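The intertwined blending of per-sample and pooled covariances can be sketched directly from the formula above (the function name and the two simulated samples are illustrative; the lasso optimization that would follow is omitted):

```python
import numpy as np

def intertwined_covariances(samples, alpha=0.5):
    """Blend each sample's empirical covariance Sigma_hat^t with the pooled,
    sample-size-weighted covariance Sigma_bar = (1/n) sum_t n_t Sigma_hat^t."""
    covs = [np.cov(X, rowvar=False) for X in samples]
    sizes = [X.shape[0] for X in samples]
    n = sum(sizes)
    pooled = sum(nt * C for nt, C in zip(sizes, covs)) / n
    return [alpha * C + (1 - alpha) * pooled for C in covs]

rng = np.random.default_rng(5)
# Two samples measuring the same 6 genes, with different numbers of observations
samples = [rng.standard_normal((30, 6)), rng.standard_normal((50, 6))]
blended = intertwined_covariances(samples, alpha=0.5)
print(len(blended), blended[0].shape)
```

Each sparse estimation then runs on its blended matrix Σ̃^t, so the T inferred networks share information without being forced to be identical.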
31. Multiple Graphical Structures
3 solutions to address this issue
• Group-LASSO: mixed norm
argmax_{S¹,...,S^T} ∑_t L(S^t | Σ̂^t) − λ ∑_{j≠j'} ( ∑_t (S^t_{jj'})² )^{1/2}
The group norm ‖S_{jj'}‖ ≡ ( ∑_t (S^t_{jj'})² )^{1/2} tends to encourage S^t_{jj'} = 0
for all t simultaneously, hence should lead to very consensual inferred networks.
32. Multiple Graphical Structures
3 solutions to address this issue
• Cooperative-LASSO:
argmax_{S¹,...,S^T} ∑_t L(S^t | Σ̂^t) − λ ∑_{j≠j'} [ ( ∑_t ((S^t_{jj'})₊)² )^{1/2} + ( ∑_t ((−S^t_{jj'})₊)² )^{1/2} ]
where (·)₊ denotes the positive part, so the two terms group (S₊)_{jj'} and (S₋)_{jj'}
separately. Takes into account that sign swaps are unlikely across samples
(down- and up-regulations).
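The difference between the two penalties is concrete enough to compute. A small sketch (function names are mine; this evaluates only the penalty terms, not the full penalized estimation): for sign-consistent coefficients the two norms coincide, while a sign swap across samples costs more under the cooperative norm.

```python
import numpy as np

def group_penalty(S_stack):
    """Group-LASSO mixed norm over a (T, p, p) stack: sum over pairs j != j'
    of the l2 norm of (S^1_{jj'}, ..., S^T_{jj'})."""
    p = S_stack.shape[1]
    off = ~np.eye(p, dtype=bool)
    return np.sqrt((S_stack ** 2).sum(axis=0))[off].sum()

def coop_penalty(S_stack):
    """Cooperative-LASSO norm: positive and negative parts grouped separately,
    so sign-consistent supports across samples are cheaper."""
    p = S_stack.shape[1]
    off = ~np.eye(p, dtype=bool)
    pos = np.sqrt((np.maximum(S_stack, 0) ** 2).sum(axis=0))
    neg = np.sqrt((np.maximum(-S_stack, 0) ** 2).sum(axis=0))
    return (pos + neg)[off].sum()

# Two samples (T = 2), three genes: sign-consistent vs. sign-swapped coefficients
same = np.stack([np.full((3, 3), 0.5), np.full((3, 3), 0.5)])
swap = np.stack([np.full((3, 3), 0.5), np.full((3, 3), -0.5)])

print(group_penalty(same) == coop_penalty(same))   # identical on consistent signs
print(coop_penalty(swap) > group_penalty(swap))    # swaps penalized more
```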
35. Multiple Graphical Structures
Real life experiment
[Figure: inferred networks, with panels for the independent estimations, the true network, and the sum of the intertwined estimations]
36. Multiple Graphical Structures
Open questions
• is the group-lasso type penalty the correct approach to the biological
problem?
• how to combine the networks: to analyze the differences
between networks (distances between graphs?), or to build a unique
consensual network from all samples (mean network, AND network,
OR network...)?
• could it be relevant to penalize the sparse regression problem by an
additional regularization (e.g., the distance between each network and a
consensual network)?
37. Multiple Graphical Structures
References
Ambroise, C., Chiquet, J., and Matias, C. (2009).
Inferring sparse Gaussian graphical models with latent structure.
Electronic Journal of Statistics, 3:205-238.
Chiquet, J., Grandvalet, Y., and Ambroise, C. (2011).
Inferring multiple graphical structures.
Statistics and Computing, 21(4):537-553.
Chiquet, J., Smith, A., Grasseau, G., Matias, C., and Ambroise, C. (2009).
SIMoNe: Statistical Inference for MOdular NEtworks.
Bioinformatics, 25(3):417-418.
Friedman, J., Hastie, T., and Tibshirani, R. (2008).
Sparse inverse covariance estimation with the graphical lasso.
Biostatistics, 9(3):432-441.
Giraud, C., Huet, S., and Verzelen, N. (2009).
Graph selection with GGMselect.
Technical report, preprint arXiv. http://fr.arxiv.org/abs/0907.0619.
Meinshausen, N. and Bühlmann, P. (2006).
High dimensional graphs and variable selection with the lasso.
Annals of Statistics, 34(3):1436-1462.
Meyer, P., Lafitte, F., and Bontempi, G. (2008).
minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information.
BMC Bioinformatics, 9:461.
Pearl, J. (1998).
Probabilistic reasoning in intelligent systems: networks of plausible inference.
In Kaufmann, M., editor, Representation and Reasoning Series (2nd printing ed.). San Francisco, California, USA.
Pearl, J. and Russel, S. (2002).
Bayesian networks.
In Arbib, M., editor, Handbook of Brain Theory and Neural Networks. Bradford Books (MIT Press), Cambridge, Massachusetts, USA.
Schäfer, J. and Strimmer, K. (2005a).
An empirical Bayes approach to inferring large-scale gene association networks.
Bioinformatics, 21(6):754-764.
Schäfer, J. and Strimmer, K. (2005b).
A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics.
Statistical Applications in Genetics and Molecular Biology, 4:1-32.
Scutari, M. (2010).
Learning Bayesian networks with the bnlearn R package.
Journal of Statistical Software, 35(3):1-22.