Joint Estimation of Graphical Models
Across Multiple Classes
Tavis Abrahamsen, Ray Bai,
Syed Rahman, Andrey Skripnikov
Department of Statistics
University of Florida
April 22, 2015
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 1 / 48
Background
Suppose that observations x1, x2, . . . , xn ∈ RP are independent and
identically distributed N(µ, Σ) where µ ∈ Rp and Σ is a positive definite
p × p matrix.
The zeros in the inverse covariance matrix Σ−1 correspond to pairs of
features that are conditionally independent - that is, pairs of variables that
are independent of each other, given all the other variables in the data set.
In a Gaussian graphical model, the conditional dependence relationships
are represented by a graph in which nodes represent the features and edges
connect conditionally dependent pairs of features.
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 2 / 48
Background
A natural way to estimate the precision (or concentration) matrix Σ−1 is
via maximum likelihood. Letting S denote the empirical covariance matrix
of X, the Gaussian log likelihood takes the form (up to a constant)
(n/2) [ log det(Σ−1) − trace(S Σ−1) ]
Maximizing this function with respect to Σ−1 yields the maximum
likelihood estimate S−1.
Problems with using the MLE of Σ−1:
1. In the high dimensional setting where p > n the matrix S is singular
and cannot be inverted to yield an estimate of Σ−1.
2. Even when S−1 exists, this estimator will typically not be sparse.
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 3 / 48
Background
One method for obtaining sparse estimates of Σ−1 is to instead solve the
graphical lasso problem
max_Θ  log det Θ − trace(S Θ) − λ||Θ||1,
where λ is a nonnegative tuning parameter.
Advantages of using the graphical lasso estimator:
1. The l1 penalty term yields sparse estimates of Σ−1 when λ is “large”.
2. This problem can be solved even if p > n.
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 4 / 48
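As a concrete illustration (not part of the original slides), the graphical lasso above can be fit with off-the-shelf software. The minimal sketch below uses scikit-learn's GraphicalLasso on simulated data, where alpha plays the role of λ; it is a stand-in example, not the implementation used in the papers discussed here.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso  # assumes scikit-learn is installed

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))   # n = 200 observations on p = 10 features

# alpha is the l1 penalty (the lambda above); larger alpha -> sparser estimate
gl = GraphicalLasso(alpha=0.2).fit(X)

Theta_hat = gl.precision_            # sparse estimate of Sigma^{-1}
print(np.count_nonzero(np.abs(Theta_hat) > 1e-8), "nonzero entries")
```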
Estimating Models for Heterogeneous Data
Problem: The standard formulation for estimating a Gaussian graphical
model assumes that each observation is drawn from the same distribution.
However, in many datasets the observations may correspond to several
distinct classes.
Estimating a single graphical model would mask the underlying
heterogeneity, while estimating separate models for each category ignores
the common structure.
We want to jointly estimate several graphical models corresponding to
different categories. In this presentation, we will describe three algorithms
that enable us to do this.
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 5 / 48
Motivation - Genetics
Graphical models are of particular interest in the analysis of gene
expression data since they can provide a useful tool for visualizing the
relationships among genes and generating biological hypotheses.
The datasets may correspond to several distinct classes. For example,
suppose a cancer researcher collects gene expression measurements for a
set of cancer tissue samples and a set of normal tissue samples. One might
want to estimate a graphical model for the cancer samples and a graphical
model for the normal samples.
One might expect the two graphical models to be similar to each other
since both are based on the same type of tissue, but also have important
differences resulting from the fact that gene networks are often
dysregulated in cancer.
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 6 / 48
Problem Formulation
Suppose we have a heterogeneous data set with p variables and K
categories.
The k-th category contains nk observations x(k)_1, . . . , x(k)_{nk}, where each
x(k)_i = (x(k)_{i,1}, . . . , x(k)_{i,p}) is a p-dimensional row vector.
Without loss of generality, we assume the observations in the same
category are centered along each variable, i.e., ∑_{i=1}^{nk} x(k)_{i,j} = 0 for all
1 ≤ j ≤ p and 1 ≤ k ≤ K.
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 7 / 48
Problem Formulation
We further assume that x(k)_1, . . . , x(k)_{nk} are independent and identically
distributed, sampled from a p-variate Gaussian distribution with mean zero
(without loss of generality, since we center the data) and covariance matrix
Σ(k).
Let Ω(k) = (Σ(k))−1 = (ω(k)_{j,j′})_{p×p}.
Then the likelihood function for the k-th category takes the form

l(Ω(k)) = −(nk/2) log(2π) + (nk/2) [ log{det(Ω(k))} − trace(ˆΣ(k) Ω(k)) ],

where ˆΣ(k) is the sample covariance matrix for the k-th category.
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 8 / 48
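For concreteness (not from the slides), a minimal numpy sketch of the per-class log-likelihood l(Ω(k)) as written above; the additive constant is kept exactly as on the slide and does not affect the maximization.

```python
import numpy as np

def class_loglik(omega_k, sigma_hat_k, n_k):
    """Per-class Gaussian log-likelihood l(Omega^(k)) as written on the slide,
    given a precision matrix, the class sample covariance, and the class size."""
    _, logdet = np.linalg.slogdet(omega_k)   # log det(Omega^(k)); assumes Omega^(k) is positive definite
    return -n_k / 2 * np.log(2 * np.pi) + n_k / 2 * (logdet - np.trace(sigma_hat_k @ omega_k))

# toy usage on simulated data
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 5))
S = np.cov(X, rowvar=False, bias=True)
print(class_loglik(np.linalg.inv(S), S, n_k=50))
```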
Problem Formulation
The most direct way to deal with such heterogeneous data is to estimate
K individual graphical models. We can compute a separate l1-regularized
estimator for each category k, 1 ≤ k ≤ K, by solving
min_{Ω(k)}  trace(ˆΣ(k) Ω(k)) − log{det(Ω(k))} + λk ∑_{j≠j′} |ω(k)_{j,j′}|,
where the minimum is taken over symmetric positive definite matrices.
This problem can be efficiently solved by existing algorithms such as the glasso
of Friedman et al. (2008). However, we would prefer to estimate the several
graphical models jointly instead of estimating each one separately.
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 9 / 48
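A short sketch (not from the slides) of this separate, per-category estimation, again using scikit-learn's GraphicalLasso as a stand-in for glasso; lambdas is a hypothetical list of per-class tuning parameters λk.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso  # stand-in for the R glasso package

def separate_estimates(X_by_class, lambdas):
    """Fit one l1-penalized precision matrix per category, ignoring shared structure."""
    return [GraphicalLasso(alpha=lam).fit(X_k).precision_
            for X_k, lam in zip(X_by_class, lambdas)]

rng = np.random.default_rng(0)
X_by_class = [rng.standard_normal((100, 10)) for _ in range(3)]   # K = 3 classes
Omega_hats = separate_estimates(X_by_class, lambdas=[0.2, 0.2, 0.2])
```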
Joint Estimation
To improve estimation in cases where graphical models for different
categories may share some common structure, J. Guo, E. Levina, G.
Michailidis, and J. Zhu propose the following joint estimation method in
their paper “Joint Estimation of Multiple Graphical Models” (2011).
First, they reparameterize each ω(k)_{j,j′} as

ω(k)_{j,j′} = θ_{j,j′} γ(k)_{j,j′},  1 ≤ j ≠ j′ ≤ p, 1 ≤ k ≤ K.

For identifiability, they restrict θ_{j,j′} > 0, 1 ≤ j ≠ j′ ≤ p. To
preserve symmetry, they also require θ_{j,j′} = θ_{j′,j} and γ(k)_{j,j′} = γ(k)_{j′,j},
1 ≤ j ≠ j′ ≤ p and 1 ≤ k ≤ K.
This decomposition treats {ω(1)_{j,j′}, . . . , ω(K)_{j,j′}} as a group, with the common
factor θ_{j,j′} controlling the presence of the link between nodes j and j′ in
any of the categories, while γ(k)_{j,j′} reflects the differences between categories.
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 10 / 48
Joint Estimation
Let Θ = (θ_{j,j′})_{p×p} and Γ(k) = (γ(k)_{j,j′})_{p×p}. To estimate this model, Guo et al.
propose the following penalized criterion:

min_{Θ, {Γ(k)}}  ∑_{k=1}^{K} [ trace(ˆΣ(k)Ω(k)) − log{det(Ω(k))} ] + η1 ∑_{j≠j′} θ_{j,j′} + η2 ∑_{j≠j′} ∑_{k=1}^{K} |γ(k)_{j,j′}|,   (1)

subject to  ω(k)_{j,j′} = θ_{j,j′} γ(k)_{j,j′},  θ_{j,j′} > 0,  1 ≤ j, j′ ≤ p;
            θ_{j,j′} = θ_{j′,j},  γ(k)_{j,j′} = γ(k)_{j′,j},  1 ≤ j ≠ j′ ≤ p;  1 ≤ k ≤ K;
            θ_{j,j} = 1,  γ(k)_{j,j} = ω(k)_{j,j},  1 ≤ j ≤ p;  1 ≤ k ≤ K,

where η1 and η2 are two tuning parameters.
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 11 / 48
Joint Estimation
The first parameter, η1, controls the sparsity of the common factors
θ_{j,j′} and it can effectively remove the common zero elements across
Ω(1), . . . , Ω(K); i.e., if θ_{j,j′} is shrunk to zero, there will be no link
between nodes j and j′ in any of the K graphs.
If θ_{j,j′} is not zero, some of the γ(k)_{j,j′} (and hence some of the ω(k)_{j,j′})
can still be set to zero by the second penalty, controlled by η2. This
allows graphs belonging to different categories to have different
structures.
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 12 / 48
Joint Estimation
To simplify the model, the two tuning parameters η1 and η2 in (1) can be
replaced by a single tuning parameter, by letting η = η1η2. It turns out
that solving (1) is equivalent to solving
min_{Θ, {Γ(k)}}  ∑_{k=1}^{K} [ trace(ˆΣ(k)Ω(k)) − log{det(Ω(k))} ] + ∑_{j≠j′} θ_{j,j′} + η ∑_{j≠j′} ∑_{k=1}^{K} |γ(k)_{j,j′}|,   (2)

subject to  ω(k)_{j,j′} = θ_{j,j′} γ(k)_{j,j′},  θ_{j,j′} > 0,  1 ≤ j, j′ ≤ p;
            θ_{j,j′} = θ_{j′,j},  γ(k)_{j,j′} = γ(k)_{j′,j},  1 ≤ j ≠ j′ ≤ p;  1 ≤ k ≤ K;
            θ_{j,j} = 1,  γ(k)_{j,j} = ω(k)_{j,j},  1 ≤ j ≤ p;  1 ≤ k ≤ K.
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 13 / 48
Joint Estimation
First, Guo et al. reformulate the problem (2) in a more convenient form
for computational purposes.
Let {ˆΩ(k)} be a local minimizer of

min_{{Ω(k)}}  ∑_{k=1}^{K} [ trace(ˆΣ(k)Ω(k)) − log{det(Ω(k))} ] + λ ∑_{j≠j′} ( ∑_{k=1}^{K} |ω(k)_{j,j′}| )^{1/2},   (3)

where λ = 2√η. Then there exists a local minimizer of (2), (ˆΘ, {ˆΓ(k)}),
such that ˆω(k)_{j,j′} = ˆθ_{j,j′} ˆγ(k)_{j,j′} for all 1 ≤ k ≤ K. On the other hand, if
(ˆΘ, {ˆΓ(k)}) is a local minimizer of (2), then there also exists a local
minimizer of (3), {ˆΩ(k)}, such that ˆω(k)_{j,j′} = ˆθ_{j,j′} ˆγ(k)_{j,j′} for all 1 ≤ k ≤ K.
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 14 / 48
Joint Estimation
One can see that because of the penalty term in (3), the optimization
criterion is not convex.
To obtain convex subproblems, an iterative optimization approach based on
local linear approximation (LLA) is used (Zou and Li, 2008). If pλ(|ω|)
denotes the penalty term, then LLA uses the approximation

pλ(|ω|) ≈ pλ(|ω0|) + p′λ(|ω0|)(|ω| − |ω0|),  for ω ≈ ω0.

Specifically, letting (ω(k)_{j,j′})(t) denote the estimates from the previous
iteration t, we approximate

( ∑_{k=1}^{K} |ω(k)_{j,j′}| )^{1/2}  ≈  ∑_{k=1}^{K} |ω(k)_{j,j′}| / ( ∑_{k=1}^{K} |(ω(k)_{j,j′})(t)| )^{1/2}.

The LLA approximation helps alleviate the computational burden and mitigate the
potential local-minima problem in minimizing the nonconvex criterion.
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 15 / 48
Joint Estimation Algorithm
Thus, at the (t + 1)-th iteration of our joint estimation algorithm, problem
(3) may be decomposed into K individual optimization problems:
(Ω(k))(t+1) = argmin_{Ω(k)}  trace(ˆΣ(k)Ω(k)) − log{det(Ω(k))} + λ ∑_{j≠j′} τ(k)_{j,j′} |ω(k)_{j,j′}|,   (4)

where τ(k)_{j,j′} = ( ∑_{k=1}^{K} |(ω(k)_{j,j′})(t)| )^{−1/2}.
Note that criterion (4) is exactly the sparse precision matrix estimation
problem with weighted l1 penalty; the solution can be efficiently computed
using glasso.
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 16 / 48
Convexity of the Criterion
trace(ˆΣ(k)Ω(k)) is an affine function of the elements of Ω(k), and is
therefore convex.
−log{det(Ω(k))} is convex (log det is concave, by the restriction-to-a-line
argument from class).
λ ∑_{j≠j′} τ(k)_{j,j′} |ω(k)_{j,j′}| is convex, being a nonnegative weighted sum of
absolute values of elements of Ω(k).
The sum of convex functions is convex, therefore the minimization
criterion in (4) is a convex function of Ω(k).
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 17 / 48
Joint Estimation Algorithm
In summary, the proposed algorithm for solving
min_{{Ω(k)}}  ∑_{k=1}^{K} [ trace(ˆΣ(k)Ω(k)) − log{det(Ω(k))} ] + λ ∑_{j≠j′} ( ∑_{k=1}^{K} |ω(k)_{j,j′}| )^{1/2},
is:
(a) Initialize ˆΩ(k) = (ˆΣ(k) + νIp)−1 for all 1 ≤ k ≤ K, where Ip is the
identity matrix and the constant ν is chosen to guarantee ˆΣ(k) + νIp
is positive definite;
(b) Using glasso, update ˆΩ(k) by (4) for all 1 ≤ k ≤ K.
(c) Repeat the previous step until convergence is achieved.
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 18 / 48
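To make steps (a)-(c) concrete, here is a minimal numpy sketch of the iteration (not the authors' code). The slides use glasso for the weighted subproblem (4); since scikit-learn's solver does not accept element-wise penalty weights, the sketch substitutes a generic ADMM routine for that subproblem, which is my own stand-in rather than the method used by Guo et al. A fixed number of outer iterations stands in for the convergence check of step (c).

```python
import numpy as np

def weighted_glasso_admm(S, penalty, rho=1.0, n_iter=200):
    """Minimize trace(S Theta) - log det(Theta) + sum(penalty * |Theta|) by ADMM;
    `penalty` is a p x p matrix of element-wise l1 weights."""
    p = S.shape[0]
    Theta, Z, U = np.eye(p), np.eye(p), np.zeros((p, p))
    for _ in range(n_iter):
        # Theta-update: eigendecompose rho*(Z - U) - S and solve rho*t - 1/t = d per eigenvalue
        d, Q = np.linalg.eigh(rho * (Z - U) - S)
        t = (d + np.sqrt(d ** 2 + 4 * rho)) / (2 * rho)
        Theta = (Q * t) @ Q.T
        # Z-update: element-wise soft-thresholding with weights penalty / rho
        A = Theta + U
        Z = np.sign(A) * np.maximum(np.abs(A) - penalty / rho, 0.0)
        # U-update: dual ascent
        U = U + Theta - Z
    return Z  # the sparse iterate

def guo_joint_estimation(S_list, lam, n_lla_iter=10, nu=0.1, eps=1e-8):
    """LLA iterations corresponding to steps (a)-(c) on this slide."""
    p = S_list[0].shape[0]
    Omegas = [np.linalg.inv(S + nu * np.eye(p)) for S in S_list]    # step (a)
    for _ in range(n_lla_iter):                                     # steps (b)-(c)
        tau = 1.0 / np.sqrt(sum(np.abs(O) for O in Omegas) + eps)   # LLA weights from (4)
        W = lam * tau
        np.fill_diagonal(W, 0.0)                                    # penalize off-diagonals only
        Omegas = [weighted_glasso_admm(S, W) for S in S_list]
    return Omegas
```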
Model Selection.
The tuning parameter λ in (4) controls the sparsity of the resulting
estimator.
Guo et al. select the optimal tuning parameter by minimizing a Bayesian
information criterion (BIC), which balances the goodness of fit and the
model complexity. Specifically, we define BIC for the proposed joint
estimation method as
BIC(λ) = ∑_{k=1}^{K} [ trace(ˆΣ(k) ˆΩ(k)_λ) − log{det(ˆΩ(k)_λ)} + log(nk) dfk ],

where ˆΩ(1)_λ, . . . , ˆΩ(K)_λ are the estimates from (4) with tuning parameter λ
and the degrees of freedom are dfk = #{(j, j′) : j < j′, ˆω(k)_{j,j′} ≠ 0}.
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 19 / 48
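A small numpy sketch (not from the slides) of the BIC score above, assuming the estimates for one value of λ have already been computed; the λ minimizing this score over a grid would be selected.

```python
import numpy as np

def joint_bic(Sigma_hats, Omega_hats, n_ks):
    """BIC(lambda) from the slide, given per-class sample covariances,
    estimated precision matrices for one value of lambda, and class sizes."""
    bic = 0.0
    for S, O, n_k in zip(Sigma_hats, Omega_hats, n_ks):
        _, logdet = np.linalg.slogdet(O)
        df_k = np.count_nonzero(np.triu(O, k=1))   # nonzero entries with j < j'
        bic += np.trace(S @ O) - logdet + np.log(n_k) * df_k
    return bic
```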
Simulation Study
To assess the performance of the joint estimation algorithm of Guo et al.,
three types of networks are simulated: a chain network, a
5-nearest-neighbors network, and a scale-free (power-law) network. In all
cases, p = 100 features and K = 3. For k = 1, . . . , K, nk = 100 iid
observations are generated from a multivariate normal distribution N(0, (Ω(k))−1), where
Ω(k) is the precision matrix for the k-th category.
A constant ρ is used to add heterogeneity to the common structure by
creating additional individual links (i.e., ρ is the ratio of the number of
individual links to the number of common links).
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 20 / 48
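As an illustration of this kind of setup (a sketch assuming a simple chain structure; the exact edge weights used by Guo et al. are not reproduced here), one can build a chain-network precision matrix and draw nk observations from N(0, (Ω(k))−1):

```python
import numpy as np

def chain_precision(p, off_diag=0.4):
    """Tridiagonal (chain-graph) precision matrix: node j is linked only to j-1 and j+1."""
    Omega = np.eye(p)
    idx = np.arange(p - 1)
    Omega[idx, idx + 1] = Omega[idx + 1, idx] = off_diag
    return Omega

rng = np.random.default_rng(0)
p, n_k = 100, 100
Omega = chain_precision(p)
X_k = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Omega), size=n_k)
```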
Networks
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 21 / 48
Model Diagnostics
We compare the joint estimation method to the separate estimation
method. To assess performance:
Plot ROC curves, which show the average proportion of correctly
detected links against the average false positive rate, over a
range of values of λ.
Compute and compare the following metrics: average entropy loss
(EL), average Frobenius loss (FL), average false positive (FP) and
average false negative (FN) rates, and the average rate of
mis-identified common zeros among the categories (CZ).
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 22 / 48
Simulation Results
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 23 / 48
Simulation Results
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 24 / 48
Simulation Results
As the estimated ROC curves in Figure 2 show, the joint estimation
method dominates the separate estimation method when the proportion of
individual links is low. When the proportion of individual links is high, the
methods perform similarly.
As Table 1 shows, the joint estimation method produces lower entropy and
Frobenius losses, as well as lower false negative rates.
When the proportion of common links is high enough, it produces lower
false positive rates and performs better at identifying common zeros than
the separate estimation method.
Conclusion: In the presence of a common structure, the joint estimation
method outperforms the separate estimation method. In the absence of a
common structure, their performance is fairly similar.
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 25 / 48
Data Example
Webpages from websites at computer science departments in four
universities (Cornell, Texas, Washington, and Wisconsin) are manually
classified into seven categories: student, faculty, staff, department, course,
project, and other. The joint estimation method is applied to n = 1396
documents in the four largest categories and p = 100 terms, and links
between the terms are explored. The resulting common structure is shown
in Figure 3.
The model also allows us to explore the heterogeneity between different
categories. For example, the graphs for the “student” and “faculty”
categories have some links that appear in only one or the other (e.g., the
terms “teach” and “assist” are linked only in the “student” category, while
“assist-professor” and “select-public” are linked only in the “faculty”
category).
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 26 / 48
Data Example
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 27 / 48
Data Example
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 28 / 48
The Joint Graphical Lasso
In their paper “The joint graphical lasso for inverse covariance estimation
across multiple classes,” P. Danaher, P. Wang, & D. Witten propose the
joint graphical lasso as a technique for jointly estimating multiple
graphical models corresponding to distinct but related conditions, such as
cancer and normal tissue.
Suppose we have data from K ≥ 2 distinct classes. Instead of estimating
Σ1−1, Σ2−1, . . . , ΣK−1 separately, the authors propose estimating
these matrices jointly by maximizing the following penalized log-likelihood
function

max_{{Θ}}  ∑_{k=1}^{K} nk [ log det Θ(k) − trace(S(k) Θ(k)) ] − P({Θ}),

subject to the constraint that Θ(1), Θ(2), . . . , Θ(K) are positive definite,
where {Θ} = {Θ(1), Θ(2), . . . , Θ(K)}.
P({Θ}) denotes a convex penalty function, so the objective function is
strictly concave.
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 29 / 48
Joint Graphical Lasso
Danaher et al. propose choosing a penalty function P that will encourage
ˆΘ(1), ˆΘ(2), . . . , ˆΘ(K) to share certain characteristics, such as the locations
or values of the nonzero elements, in addition to providing sparse
estimates of the precision matrices.
Two penalty functions suggested by Danaher et al. are the fused graphical
lasso and group graphical lasso penalties.
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 30 / 48
Fused Graphical Lasso
The fused graphical lasso (FGL) penalty is given by
P({Θ}) = λ1 ∑_{k=1}^{K} ∑_{i≠j} |θ(k)_{ij}| + λ2 ∑_{k<k′} ∑_{i,j} |θ(k)_{ij} − θ(k′)_{ij}|,
where λ1 and λ2 are nonnegative tuning parameters.
1. Like the graphical lasso, FGL results in sparse estimates
ˆΘ(1), ˆΘ(2), . . . , ˆΘ(K) when the tuning parameter λ1 is large.
2. Many of the elements ˆΘ(1), ˆΘ(2), . . . , ˆΘ(K) will be identical across
classes when the tuning parameter λ2 is large.
3. FGL borrows information aggressively across classes, encouraging not
only similar network structure but also similar edge values.
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 31 / 48
Group Graphical Lasso
The group graphical lasso (GGL) penalty function is given by
P({Θ}) = λ1 ∑_{k=1}^{K} ∑_{i≠j} |θ(k)_{ij}| + λ2 ∑_{i≠j} ( ∑_{k=1}^{K} (θ(k)_{ij})² )^{1/2},
where λ1 and λ2 are nonnegative tuning parameters.
1. The group lasso penalty encourages a similar pattern of sparsity across
all of the precision matrices - there will be a tendency for the zeros in
the K estimated precision matrices to occur in the same places.
2. When λ1 = 0 and λ2 > 0, each ˆΘ(k) will have an identical pattern of
non-zero elements. On the other hand, the lasso penalty encourages
further sparsity within each ˆΘ(k).
3. GGL encourages a weaker form of similarity across the K precision
matrices than FGL in that GGL only encourages a shared pattern of
sparsity and not shared edge values.
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 32 / 48
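To make the two penalties concrete, here is a short numpy sketch (not from the paper) that evaluates P({Θ}) for both FGL and GGL on a list of precision matrices, following the formulas on the two preceding slides.

```python
import numpy as np

def fgl_penalty(Thetas, lam1, lam2):
    """Fused graphical lasso penalty: l1 on off-diagonals plus l1 on between-class differences."""
    off = ~np.eye(Thetas[0].shape[0], dtype=bool)
    p1 = lam1 * sum(np.abs(T[off]).sum() for T in Thetas)
    p2 = lam2 * sum(np.abs(Thetas[k] - Thetas[m]).sum()
                    for k in range(len(Thetas)) for m in range(k + 1, len(Thetas)))
    return p1 + p2

def ggl_penalty(Thetas, lam1, lam2):
    """Group graphical lasso penalty: l1 on off-diagonals plus a group l2 norm across classes."""
    off = ~np.eye(Thetas[0].shape[0], dtype=bool)
    p1 = lam1 * sum(np.abs(T[off]).sum() for T in Thetas)
    group = np.sqrt(sum(T ** 2 for T in Thetas))   # element-wise l2 norm over the K classes
    return p1 + lam2 * group[off].sum()
```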
Algorithm for the Joint Graphical Lasso
Danaher et al. solve the joint graphical lasso problem using an alternating
directions method of multipliers (ADMM) algorithm.
The original problem can be rewritten as

min_{{Θ}, {Z}}  − ∑_{k=1}^{K} nk [ log det Θ(k) − trace(S(k) Θ(k)) ] + P({Z}),

subject to the positive-definiteness constraint as well as the constraint that
Z(k) = Θ(k), where {Z} = {Z(1), Z(2), . . . , Z(K)}.
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 33 / 48
Algorithm for the Joint Graphical Lasso
The scaled augmented Lagrangian for this problem is given by

Lρ({Θ}, {Z}, {U}) = − ∑_{k=1}^{K} nk [ log det Θ(k) − trace(S(k) Θ(k)) ] + P({Z})
                    + (ρ/2) ∑_{k=1}^{K} ||Θ(k) − Z(k) + U(k)||²_F,

where {U} = {U(1), U(2), . . . , U(K)}.
The ADMM corresponding to the above problem results in iterating three
simple steps:
(a) {Θ(i)} ← argmin_{{Θ}} { Lρ({Θ}, {Z(i−1)}, {U(i−1)}) }
(b) {Z(i)} ← argmin_{{Z}} { Lρ({Θ(i)}, {Z}, {U(i−1)}) }
(c) {U(i)} ← {U(i−1)} + ({Θ(i)} − {Z(i)}).
The update in step (a) preserves positive-definiteness, thus this constraint
can be satisfied simply by initializing each Θ(k) at the identity, Θ(k)(0) = I.
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 34 / 48
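For illustration only (not the implementation of Danaher et al.), a numpy sketch of the penalty-dependent step: the Z-update in step (b) for the GGL penalty, which reduces to an element-wise soft-threshold followed by a group shrinkage on the off-diagonal entries. The FGL Z-update would instead require a fused-lasso proximal step and is omitted here.

```python
import numpy as np

def ggl_z_update(Thetas, Us, lam1, lam2, rho):
    """Step (b) for the GGL penalty: proximal operator of P({Z}) applied to A^(k) = Theta^(k) + U^(k)."""
    A = np.stack([T + U for T, U in zip(Thetas, Us)])          # K x p x p
    # element-wise soft-threshold by lam1 / rho
    B = np.sign(A) * np.maximum(np.abs(A) - lam1 / rho, 0.0)
    # group shrinkage across classes by lam2 / rho
    norms = np.sqrt((B ** 2).sum(axis=0))                      # p x p, l2 norm over k
    scale = np.maximum(0.0, 1.0 - lam2 / (rho * np.maximum(norms, 1e-12)))
    Z = B * scale                                              # broadcasts over the K classes
    # the penalty involves only off-diagonal entries, so diagonals are left unpenalized
    for k in range(Z.shape[0]):
        np.fill_diagonal(Z[k], np.diagonal(A[k]))
    return [Z[k] for k in range(Z.shape[0])]

# toy usage with K = 2 classes and p = 4 features
rng = np.random.default_rng(0)
Thetas = [np.eye(4) + 0.1 * rng.standard_normal((4, 4)) for _ in range(2)]
Us = [np.zeros((4, 4)) for _ in range(2)]
Z = ggl_z_update(Thetas, Us, lam1=0.1, lam2=0.1, rho=1.0)
```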
Simulation Study
The data for the main simulation study were generated from three networks
with p = 500 features belonging to ten equally sized unconnected
subnetworks, each with a power-law degree distribution, i.e., a scale-free
network. Of the ten subnetworks, eight have the same structure and edge
values in all three classes, one is identical between the first two classes and
missing in the third (i.e., the corresponding features are singletons in the
third network), and one is present only in the first class. The
corresponding graph is shown in Figure 1.
In addition to the 500-feature networks, networks
with p = 1000 features are generated, each of which is block diagonal with 500 × 500
blocks corresponding to two copies of the 500-feature networks just
described.
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 35 / 48
Figure 1
Figure: This shows the graph corresponding to the concentration matrix for the
main simulation study. Black edges are common to all three classes, green edges
are present only in classes 1 and 2, and red edges are present only in class 1.
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 36 / 48
Basic Result
As we will see in the next slides, when p = 500 and n = 150, FGL
performs best in the simulations conducted by Danaher et al.,
although it is much slower than glasso (the fastest) and the group lasso
penalty (GGL). The graphical lasso and the joint estimation method
introduced by Guo et al. (2011) are included in the comparisons as well.
In addition, FGL also seems to do better than GGL asymptotically when
we hold p fixed and let n increase.
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 37 / 48
In terms of parameters:
FGL is presented in terms of λ2, but for GGL the results are reparametrized
in terms of ω2 = (λ2/√2) / (λ1 + λ2/√2). Each line corresponds to a different
value of λ2 (for FGL) or ω2 (for GGL).
As λ1 or ω1 = λ1 + λ2/√2 increases, sparsity increases, or equivalently, the
number of edges selected decreases.
According to the authors, approaches such as AIC, BIC, and
cross-validation tend to choose models too large to be useful. In this
setting, model selection should be guided by practical considerations, such
as network interpretability, stability, and the desire for an edge set with a
low false discovery rate.
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 38 / 48
Some measures of error
SSE = ∑_{k=1}^{K} ∑_{i≠j} ( ˆθ(k)_{ij} − [(Σ(k))−1]_{ij} )²

Differential edges are defined as the edges that differ between classes. For
FGL, the count is the number of pairs k < k′, i < j such that
ˆθ(k)_{ij} ≠ ˆθ(k′)_{ij}. For GGL, the proposal of Guo et al. (2011), and the
graphical lasso, it is the number of pairs k < k′, i < j such
that |ˆθ(k)_{ij} − ˆθ(k′)_{ij}| > 10−2.

The Kullback-Leibler divergence (dKL) from the multivariate normal
model with inverse covariance estimates Θ(1), . . . , Θ(K) to the multivariate
normal model with the true covariance matrices Σ(1), . . . , Σ(K) is

(1/2) ∑_{k=1}^{K} [ − log det(Θ(k)Σ(k)) + trace(Θ(k)Σ(k)) ]
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 39 / 48
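A small numpy sketch (not from the paper) of the two scalar metrics defined above, given estimated precision matrices and the true covariance matrices; the dKL formula is evaluated exactly as written on the slide.

```python
import numpy as np

def sse(Theta_hats, Sigmas):
    """Sum of squared errors over off-diagonal entries, as defined above."""
    total = 0.0
    for T, Sig in zip(Theta_hats, Sigmas):
        diff = T - np.linalg.inv(Sig)
        off = ~np.eye(T.shape[0], dtype=bool)
        total += (diff[off] ** 2).sum()
    return total

def d_kl(Theta_hats, Sigmas):
    """Kullback-Leibler divergence as written on the slide (additive constants omitted)."""
    total = 0.0
    for T, Sig in zip(Theta_hats, Sigmas):
        _, logdet = np.linalg.slogdet(T @ Sig)
        total += -logdet + np.trace(T @ Sig)
    return 0.5 * total
```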
Figure 2 from Danaher et al.
Figure: Simulation results comparing FGL, GGL, the method of Guo et al. (2011), and the graphical lasso. Panels: (a) true positive vs. false positive edges; (b) sum of squared errors vs. total edges selected; (c) true positive vs. false positive differential edges; (d) Kullback-Leibler divergence vs. L1 norm of {Θ}; (e) run time in seconds vs. total edges selected. FGL curves correspond to λ2 ∈ {0.025, 0.05, 0.075, 0.1} and GGL curves to ω2 ∈ {0.15, 0.5, 0.7, 1}.
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 40 / 48
Figure 3 from Danaher et al. - Analysis of lung cancer
microarray data
Identification of block diagonal structure using Theorem 1 and application of
the FGL algorithm took less than two minutes. (Note that this data set is so
large that it would be computationally prohibitive to apply the proposal of
Guo et al. (2011).) FGL estimated 134 edges shared between the two networks,
202 edges present only in the cancer network, and 18 edges present only in
the normal tissue network. The results are displayed in Figure 3.
Figure: Conditional dependency networks inferred from 17,772 genes in normal
and cancerous lung cells. 278 genes have nonzero edges in at least one of the two
networks. Black lines denote edges common to both classes. Red and green lines
denote tumor-specific and normal-specific edges, respectively. The parameters
used were λ1 = 0.95 and λ2 = 0.005.
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 41 / 48
Table 1 from Danaher et al.
Table 1. Performances as a function of n and p. Means over 100 replicates are shown
for dKL, and for sensitivity (Sens.) and false discovery rate (FDR) of detection of
edges (DE) and differential edge detection (DED).

        p      n     dKL      DE Sens.   DE FDR   DED Sens.   DED FDR
FGL     500    50    545.1    0.502      0.966    0.262       0.996
        500    200   517.5    0.570      0.053    0.228       0.485
        500    500   516.6    0.590      0.001    0.192       0.036
        1000   50    1119.3   0.600      0.970    0.245       0.998
        1000   200   1035.0   0.666      0.063    0.223       0.557
        1000   500   1033.3   0.681      0.000    0.194       0.025
GGL     500    50    549.8    0.490      0.973    0.337       0.996
        500    200   520.8    0.505      0.060    0.244       0.903
        500    500   519.7    0.524      0.010    0.194       0.921
        1000   50    1127.9   0.587      0.976    0.316       0.998
        1000   200   1041.7   0.615      0.061    0.239       0.908
        1000   500   1039.4   0.629      0.007    0.197       0.920
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 42 / 48
Table 1 indicates that,
for fixed p, as n increases, FGL does much better than GGL in terms of edge
detection (as expected, sensitivity increases and the false discovery rate
decreases as n increases) and differential edge detection (the false discovery
rate decreases as n increases). Oddly enough, in all cases, DED
sensitivity declines as n increases.
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 43 / 48
Nearest Neighbor Network (n = 100, p = 25)
Group
λ1 = 0.07, λ2 = 0.05 λ1 = 0.05, λ2 = 0.25
F 0.05642315 0.06010037
FP 0.3391197 0.02918924
FN 0.3035487 0.8517427
Table: Nearest Neighbor Network for Group Lasso Penalty
Fused
λ1 = 0.05, λ2 = 0.05 λ1 = 0.2, λ2 = 0.1
F 0.06335103 0.06954624
FP 0.4269327 0.002245857
FN 0.3139659 0.9723261
Table: Nearest Neighbor Network for Fused Lasso Penalty
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 44 / 48
ˆΩ1 for Nearest Neighbor Network (n = 100, p = 25)
Figure: (a) Estimated nearest neighbor graph with fused lasso penalty; (b) estimated nearest neighbor graph with group lasso penalty; (c) true nearest neighbor graph.
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 45 / 48
ˆΩ2 for Nearest Neighbor Network (n = 100, p = 25)
Figure: (a) Estimated nearest neighbor graph with fused lasso penalty; (b) estimated nearest neighbor graph with group lasso penalty; (c) true nearest neighbor graph.
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 46 / 48
ˆΩ3 for Nearest Neighbor Network (n = 100, p = 25)
Figure: (a) Estimated nearest neighbor graph with fused lasso penalty; (b) estimated nearest neighbor graph with group lasso penalty; (c) true nearest neighbor graph.
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 47 / 48
Additional notes
In the appendix, the authors show that with only 2 classes, FGL runs about
as fast as GGL while still achieving the best results overall.
We thought it was odd that the authors were much more interested in
having a low false positive rate at the cost of a high false negative rate, instead
of a balance between the two.
Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 48 / 48
3 Entropy, Relative Entropy and Mutual Information.pdf3 Entropy, Relative Entropy and Mutual Information.pdf
3 Entropy, Relative Entropy and Mutual Information.pdfCharlesAsiedu4
 
Stats For Life Module7 Oc
Stats For Life Module7 OcStats For Life Module7 Oc
Stats For Life Module7 OcN Rabe
 
Data classification sammer
Data classification sammer Data classification sammer
Data classification sammer Sammer Qader
 
[DL輪読会]Generative Models of Visually Grounded Imagination
[DL輪読会]Generative Models of Visually Grounded Imagination[DL輪読会]Generative Models of Visually Grounded Imagination
[DL輪読会]Generative Models of Visually Grounded ImaginationDeep Learning JP
 

Similar to Presentation (20)

Extreme bound analysis based on correlation coefficient for optimal regressio...
Extreme bound analysis based on correlation coefficient for optimal regressio...Extreme bound analysis based on correlation coefficient for optimal regressio...
Extreme bound analysis based on correlation coefficient for optimal regressio...
 
Structured Regularization for conditional Gaussian graphical model
Structured Regularization for conditional Gaussian graphical modelStructured Regularization for conditional Gaussian graphical model
Structured Regularization for conditional Gaussian graphical model
 
7.pdf
7.pdf7.pdf
7.pdf
 
7.pdf
7.pdf7.pdf
7.pdf
 
Factor analysis
Factor analysis Factor analysis
Factor analysis
 
An incremental approach to attribute reduction of dynamic set-valued informat...
An incremental approach to attribute reduction of dynamic set-valued informat...An incremental approach to attribute reduction of dynamic set-valued informat...
An incremental approach to attribute reduction of dynamic set-valued informat...
 
Mining group correlations over data streams
Mining group correlations over data streamsMining group correlations over data streams
Mining group correlations over data streams
 
Lesson 26
Lesson 26Lesson 26
Lesson 26
 
AI Lesson 26
AI Lesson 26AI Lesson 26
AI Lesson 26
 
Estimating Space-Time Covariance from Finite Sample Sets
Estimating Space-Time Covariance from Finite Sample SetsEstimating Space-Time Covariance from Finite Sample Sets
Estimating Space-Time Covariance from Finite Sample Sets
 
Statistical Decision Theory
Statistical Decision TheoryStatistical Decision Theory
Statistical Decision Theory
 
Linear Discriminant Analysis and Its Generalization
Linear Discriminant Analysis and Its GeneralizationLinear Discriminant Analysis and Its Generalization
Linear Discriminant Analysis and Its Generalization
 
call for papers, research paper publishing, where to publish research paper, ...
call for papers, research paper publishing, where to publish research paper, ...call for papers, research paper publishing, where to publish research paper, ...
call for papers, research paper publishing, where to publish research paper, ...
 
Geert van Kollenburg-masterthesis
Geert van Kollenburg-masterthesisGeert van Kollenburg-masterthesis
Geert van Kollenburg-masterthesis
 
3 Entropy, Relative Entropy and Mutual Information.pdf
3 Entropy, Relative Entropy and Mutual Information.pdf3 Entropy, Relative Entropy and Mutual Information.pdf
3 Entropy, Relative Entropy and Mutual Information.pdf
 
Stats For Life Module7 Oc
Stats For Life Module7 OcStats For Life Module7 Oc
Stats For Life Module7 Oc
 
Data classification sammer
Data classification sammer Data classification sammer
Data classification sammer
 
Adensonian classification
Adensonian classificationAdensonian classification
Adensonian classification
 
[DL輪読会]Generative Models of Visually Grounded Imagination
[DL輪読会]Generative Models of Visually Grounded Imagination[DL輪読会]Generative Models of Visually Grounded Imagination
[DL輪読会]Generative Models of Visually Grounded Imagination
 
Basen Network
Basen NetworkBasen Network
Basen Network
 

Presentation

  • 9. Background One method for obtaining sparse estimates of Σ−1 is to instead solve the graphical lasso problem max_Θ log det(Θ) − trace(SΘ) − λ||Θ||_1, where λ is a nonnegative tuning parameter. Advantages of using the graphical lasso estimator: 1. The l1 penalty term yields sparse estimates of Σ−1 when λ is “large”. 2. This problem can be solved even if p > n. Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 4 / 48
  • 10. Estimating Models for Heterogeneous Data Problem: The standard formulation for estimating a Gaussian graphical model assumes that each observation is drawn from the same distribution. However, in many datasets the observations may correspond to several distinct classes. Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 5 / 48
  • 11. Estimating Models for Heterogeneous Data Problem: The standard formulation for estimating a Gaussian graphical model assumes that each observation is drawn from the same distribution. However, in many datasets the observations may correspond to several distinct classes. Estimating a single graphical model would mask the underlying heterogeneity, while estimating separate models for each category ignores the common structure. Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 5 / 48
  • 12. Estimating Models for Heterogeneous Data Problem: The standard formulation for estimating a Gaussian graphical model assumes that each observation is drawn from the same distribution. However, in many datasets the observations may correspond to several distinct classes. Estimating a single graphical model would mask the underlying heterogeneity, while estimating separate models for each category ignores the common structure. We want to jointly estimate several graphical models corresponding to different categories. In this presentation, we will describe three algorithms that enable us to do this. Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 5 / 48
  • 13. Motivation - Genetics Graphical models are of particular interest in the analysis of gene expression data since they can provide a useful tool for visualizing the relationships among genes and generating biological hypotheses. Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 6 / 48
  • 14. Motivation - Genetics Graphical models are of particular interest in the analysis of gene expression data since they can provide a useful tool for visualizing the relationships among genes and generating biological hypotheses. The datasets may correspond to several distinct classes. For example, suppose a cancer researcher collects gene expression measurements for a set of cancer tissue samples and a set of normal tissue samples. One might want to estimate a graphical model for the cancer samples and a graphical model for the normal samples. Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 6 / 48
  • 15. Motivation - Genetics Graphical models are of particular interest in the analysis of gene expression data since they can provide a useful tool for visualizing the relationships among genes and generating biological hypotheses. The datasets may correspond to several distinct classes. For example, suppose a cancer researcher collects gene expression measurements for a set of cancer tissue samples and a set of normal tissue samples. One might want to estimate a graphical model for the cancer samples and a graphical model for the normal samples. One might expect the two graphical models to be similar to each other since both are based on the same type of tissue, but also have important differences resulting from the fact that gene networks are often dysregulated in cancer. Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 6 / 48
  • 18. Problem Formulation Suppose we have a heterogeneous data set with p variables and K categories. The k-th category contains n_k observations (x^(k)_1, ..., x^(k)_{n_k}), where each x^(k)_i = (x^(k)_{i,1}, ..., x^(k)_{i,p}) is a p-dimensional row vector. Without loss of generality, we assume the observations in the same category are centered along each variable, i.e., sum_{i=1}^{n_k} x^(k)_{i,j} = 0 for all 1 ≤ j ≤ p and 1 ≤ k ≤ K. Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 7 / 48
  • 20. Problem Formulation We further assume that x^(k)_1, ..., x^(k)_{n_k} are independent and identically distributed, sampled from a p-variate Gaussian distribution with mean zero (without loss of generality, since we center the data) and covariance matrix Σ^(k). Let Ω^(k) = (Σ^(k))^{−1} = (ω^(k)_{j,j'})_{p×p}. Then the log-likelihood function for the k-th category is l(Ω^(k)) = −(n_k/2) log(2π) + (n_k/2) [log det(Ω^(k)) − trace(Σ̂^(k) Ω^(k))], where Σ̂^(k) is the sample covariance matrix for the k-th category. Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 8 / 48
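To make the quantities above concrete, here is a minimal numpy sketch of this per-class log-likelihood (up to the additive −(n_k/2) log(2π) term); the function name and interface are illustrative, not part of any package.

```python
import numpy as np

def class_log_likelihood(S_hat, Omega, n_k):
    """Gaussian log-likelihood of class k, up to the additive -(n_k/2)*log(2*pi) term.

    S_hat : (p, p) sample covariance matrix of the centered observations in class k
    Omega : (p, p) candidate precision matrix for class k
    n_k   : number of observations in class k
    """
    sign, logdet = np.linalg.slogdet(Omega)   # numerically stable log-determinant
    return 0.5 * n_k * (logdet - np.trace(S_hat @ Omega))
```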
  • 22. Problem Formulation The most direct way to deal with such heterogeneous data is to estimate K individual graphical models. We can compute a separate l1-regularized estimator for each category k, 1 ≤ k ≤ K, by solving min_{Ω^(k)} trace(Σ̂^(k) Ω^(k)) − log det(Ω^(k)) + λ_k sum_{j≠j'} |ω^(k)_{j,j'}|, where the minimum is taken over symmetric positive definite matrices. This problem can be efficiently solved by existing algorithms such as the glasso of Friedman et al. (2008). However, we would prefer to jointly estimate the graphical models instead of estimating each one separately. Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 9 / 48
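As a point of reference for the joint methods that follow, the separate-estimation baseline can be sketched in a few lines. The snippet below uses scikit-learn's GraphicalLasso as a stand-in for the glasso package mentioned above; the wrapper function and the simulated data are illustrative only.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

def separate_estimates(X_by_class, lam):
    """Fit one l1-penalized precision matrix per class, ignoring any shared structure.

    X_by_class : list of (n_k, p) arrays, one per class (columns already centered)
    lam        : l1 penalty level applied to every class
    """
    precisions = []
    for X in X_by_class:
        model = GraphicalLasso(alpha=lam, max_iter=200).fit(X)
        precisions.append(model.precision_)
    return precisions

# example with two small simulated classes
rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(100, 10)), rng.normal(size=(120, 10))
Omega_hats = separate_estimates([X1, X2], lam=0.2)
```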
  • 23. Joint Estimation To improve estimation in cases where graphical models for different categories may share some common structure, J. Guo, E. Levina, G. Michailidis, and J. Zhu propose the following joint estimation method in their paper “Joint Estimation of Multiple Graphical Models” (2009). Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 10 / 48
  • 24. Joint Estimation To improve estimation in cases where graphical models for different categories may share some common structure, J. Guo, E. Levina, G. Michailidis, and J. Zhu propose the following joint estimation method in their paper “Joint Estimation of Multiple Graphical Models” (2009). First, they reparameterize each ω (k) j,j as ω (k) j,j = θj,j γ (k) j,j , 1 ≤ j = j ≤ p, 1 ≤ k ≤ K . Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 10 / 48
  • 25. Joint Estimation To improve estimation in cases where graphical models for different categories may share some common structure, J. Guo, E. Levina, G. Michailidis, and J. Zhu propose the following joint estimation method in their paper “Joint Estimation of Multiple Graphical Models” (2009). First, they reparameterize each ω (k) j,j as ω (k) j,j = θj,j γ (k) j,j , 1 ≤ j = j ≤ p, 1 ≤ k ≤ K . For identifiability purposes, we restrict θj,j > 0, 1 ≤ j = j ≤ p. To preserve symmetry, they also require θj,j = θj ,j and γ (k) j,j = γ (k) j ,j, 1 ≤ j = j ≤ p and 1 ≤ k ≤ K. Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 10 / 48
  • 26. Joint Estimation To improve estimation in cases where graphical models for different categories may share some common structure, J. Guo, E. Levina, G. Michailidis, and J. Zhu propose the following joint estimation method in their paper “Joint Estimation of Multiple Graphical Models” (2009). First, they reparameterize each ω (k) j,j as ω (k) j,j = θj,j γ (k) j,j , 1 ≤ j = j ≤ p, 1 ≤ k ≤ K . For identifiability purposes, we restrict θj,j > 0, 1 ≤ j = j ≤ p. To preserve symmetry, they also require θj,j = θj ,j and γ (k) j,j = γ (k) j ,j, 1 ≤ j = j ≤ p and 1 ≤ k ≤ K. This decomposition treats {ω (1) j,j , ..., ω (K) j,j } as a group, with the common factor θj,j controlling the presence of the link between nodes j and j in any of the categories, and γ (k) j,j reflects the differences between categories. Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 10 / 48
  • 27. Joint Estimation Let Θ = (θ_{j,j'})_{p×p} and Γ^(k) = (γ^(k)_{j,j'})_{p×p}. To estimate this model, Guo et al. propose the penalized criterion min_{Θ, {Γ^(k)}_{k=1}^K} sum_{k=1}^K [trace(Σ̂^(k) Ω^(k)) − log det(Ω^(k))] + η1 sum_{j≠j'} θ_{j,j'} + η2 sum_{j≠j'} sum_{k=1}^K |γ^(k)_{j,j'}|, (1) subject to ω^(k)_{j,j'} = θ_{j,j'} γ^(k)_{j,j'} with θ_{j,j'} > 0 for 1 ≤ j, j' ≤ p; θ_{j,j'} = θ_{j',j} and γ^(k)_{j,j'} = γ^(k)_{j',j} for 1 ≤ j ≠ j' ≤ p and 1 ≤ k ≤ K; and θ_{j,j} = 1, γ^(k)_{j,j} = ω^(k)_{j,j} for 1 ≤ j ≤ p and 1 ≤ k ≤ K, where η1 and η2 are two tuning parameters. Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 11 / 48
  • 28. Joint Estimation The first parameter, η1 , controls the sparsity of the common factors θj,j and it can effectively remove the common zero elements across Ω(1), ..., Ω(K); i.e., if θj,j is shrunk to zero, there will be no link between nodes j and j in any of the K graphs. Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 12 / 48
  • 29. Joint Estimation The first parameter, η1 , controls the sparsity of the common factors θj,j and it can effectively remove the common zero elements across Ω(1), ..., Ω(K); i.e., if θj,j is shrunk to zero, there will be no link between nodes j and j in any of the K graphs. If θj,j is not zero, some of the γ (k) j,j s (and hence some of the ω (k) j,j s) can still be set to zero by the second penalty controlled by η2. This allows graphs belonging to different categories to have different structures. Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 12 / 48
  • 30. Joint Estimation To simplify the model, the two tuning parameters η1 and η2 in (1) can be replaced by a single tuning parameter, by letting η = η1η2. It turns out that solving (1) is equivalent to solving min Θ,{Γ(k)}K k=1 K k=1 [trace(ˆΣΩ(k) ) − log{det(Ω(k) )}] + j=j θj,j + η j=j K k=1 |γ (k) j,j |, (2) subject to ω (k) j,j = θj,j γ (k) j,j , θj,j > 0, 1 ≤ j, j ≤ p θj,j = θj ,j, γ (k) j,j = γ (k) j ,j, 1 ≤ j = j ≤ p; 1 ≤ k ≤ K θj,j = 1, γ (k) j,j = ω (k) j,j , 1 ≤ j ≤ p; 1 ≤ k ≤ K, Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 13 / 48
  • 31. Joint Estimation First, Guo et al. reformulate the problem (2) in a more convenient form for computational purposes. Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 14 / 48
  • 32. Joint Estimation First, Guo et al. reformulate the problem (2) in a more convenient form for computational purposes. Let {ˆΩ(k)}K k=1 be a local minimizer of min {Ω(k)}K k=1 K k=1 [trace(ˆΣΩ(k) ) − log{det(Ω(k) )}] + λ j=j K k=1 |ω (k) j,j |, (3) where λ = 2 √ η. Then, there exists a local minimizer of (2), (ˆΘ, {ˆΓ}K k=1), such that ˆΩ(k) = ˆΘ(k)ˆΓ(k), for all 1 ≤ k ≤ K. On the other hand, if (ˆΘ, {ˆΓ(k)}K k=1) is a local minimizer of (2), then there also exists a local minimizer of (3), {ˆΩ(k)}K k=1, such that ˆΩ(k) = ˆΘ(k)ˆΓ(k) for all 1 ≤ k ≤ K. Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 14 / 48
  • 33. Joint Estimation One can see that because of the penalty term in (3), the optimization criterion is not convex. Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 15 / 48
  • 34. Joint Estimation One can see that because of the penalty term in (3), the optimization criterion is not convex. In order to reach convexity, an iterative optimization approach based on local linear approximation (LLA) is used (Zou and Li, 2008). If pλ(|ω|) denotes the penalty term then LLA does the following: pλ(|ω|) ≈ pλ(|ω0 |) + pλ(|ω0 |)(|ω| − |ω0 |), for ω ≈ ω0 . Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 15 / 48
  • 35. Joint Estimation One can see that because of the penalty term in (3), the optimization criterion is not convex. In order to reach convexity, an iterative optimization approach based on local linear approximation (LLA) is used (Zou and Li, 2008). If pλ(|ω|) denotes the penalty term then LLA does the following: pλ(|ω|) ≈ pλ(|ω0 |) + pλ(|ω0 |)(|ω| − |ω0 |), for ω ≈ ω0 . Specifically, letting (ω (k) j,j )(t) denote the estimates from the previous iteration t, we approximate K k=1 |ω (k) j,j | ∼ K k=1 |ω (k) j,j | K k=1 |(ω (k) j,j )(t)| . Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 15 / 48
  • 36. Joint Estimation One can see that because of the penalty term in (3), the optimization criterion is not convex. In order to reach convexity, an iterative optimization approach based on local linear approximation (LLA) is used (Zou and Li, 2008). If pλ(|ω|) denotes the penalty term then LLA does the following: pλ(|ω|) ≈ pλ(|ω0 |) + pλ(|ω0 |)(|ω| − |ω0 |), for ω ≈ ω0 . Specifically, letting (ω (k) j,j )(t) denote the estimates from the previous iteration t, we approximate K k=1 |ω (k) j,j | ∼ K k=1 |ω (k) j,j | K k=1 |(ω (k) j,j )(t)| . LLA estimate helps alleviate the computation burden and overcome the potential local minima problem in minimizing the nonconvex function. Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 15 / 48
  • 38. Joint Estimation Algorithm Thus, at the (t+1)-th iteration of the joint estimation algorithm, problem (3) may be decomposed into K individual optimization problems: (Ω^(k))^{(t+1)} = arg min_{Ω^(k)} trace(Σ̂^(k) Ω^(k)) − log det(Ω^(k)) + λ sum_{j≠j'} τ^(k)_{j,j'} |ω^(k)_{j,j'}|, (4) where τ^(k)_{j,j'} = (sum_{k'=1}^K |(ω^(k')_{j,j'})^{(t)}|)^{−1/2}. Note that criterion (4) is exactly the sparse precision matrix estimation problem with a weighted l1 penalty; the solution can be efficiently computed using glasso. Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 16 / 48
  • 41. Convexity of the Criterion trace(Σ̂^(k) Ω^(k)) is an affine function of the elements of Ω^(k) and is therefore convex. −log det(Ω^(k)) is convex (log det is concave, by the restriction-to-a-line argument from class). λ sum_{j≠j'} τ^(k)_{j,j'} |ω^(k)_{j,j'}| is convex as a nonnegative weighted sum of absolute values of elements of Ω^(k). The sum of convex functions is convex, therefore the minimization criterion is a convex function of Ω^(k). Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 17 / 48
  • 43. Joint Estimation Algorithm In summary, the proposed algorithm for solving min_{{Ω^(k)}_{k=1}^K} sum_{k=1}^K [trace(Σ̂^(k) Ω^(k)) − log det(Ω^(k))] + λ sum_{j≠j'} sum_{k=1}^K |ω^(k)_{j,j'}| is: (a) Initialize Ω̂^(k) = (Σ̂^(k) + νI_p)^{−1} for all 1 ≤ k ≤ K, where I_p is the identity matrix and the constant ν is chosen to guarantee that Σ̂^(k) + νI_p is positive definite; (b) Using glasso, update Ω̂^(k) by (4) for all 1 ≤ k ≤ K; (c) Repeat step (b) until convergence is achieved. A sketch of this loop is given below. Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 18 / 48
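Below is a minimal Python sketch of steps (a)-(c), assuming access to a solver `weighted_glasso(S, W)` for the weighted-l1 subproblem (4). Such a solver is not part of scikit-learn; the name is a placeholder the user must supply (e.g. by wrapping the glasso R package), so this is only an outline of the loop, not the authors' code.

```python
import numpy as np

def guo_joint_estimation(S_list, lam, weighted_glasso, nu=1e-2, n_iter=20, tol=1e-4):
    """Sketch of the Guo et al. iterative scheme: repeatedly solve K weighted
    graphical lasso problems whose weights couple the classes.

    S_list          : list of K (p, p) sample covariance matrices
    lam             : tuning parameter lambda in problem (3)
    weighted_glasso : placeholder for a solver of
                      min_Omega trace(S Omega) - log det(Omega) + sum_ij W_ij |Omega_ij|
                      (not a standard library routine; must be supplied by the user)
    """
    K, p = len(S_list), S_list[0].shape[0]
    # (a) ridge-regularized inverses give positive definite starting values
    Omegas = [np.linalg.inv(S + nu * np.eye(p)) for S in S_list]
    for _ in range(n_iter):
        prev = [O.copy() for O in Omegas]
        # LLA weights tau_{jj'} = (sum_k |omega^(k)_{jj'}|)^{-1/2} from (4)
        tau = 1.0 / np.sqrt(sum(np.abs(O) for O in Omegas) + 1e-10)
        np.fill_diagonal(tau, 0.0)          # diagonal entries are not penalized
        # (b) update each class with its own weighted l1 problem
        Omegas = [weighted_glasso(S_list[k], lam * tau) for k in range(K)]
        # (c) stop once the estimates stabilize
        if max(np.abs(Omegas[k] - prev[k]).max() for k in range(K)) < tol:
            break
    return Omegas
```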
  • 45. Model Selection The tuning parameter λ in (4) controls the sparsity of the resulting estimator. Guo et al. select the optimal tuning parameter by minimizing a Bayesian information criterion (BIC), which balances goodness of fit against model complexity. Specifically, the BIC for the joint estimation method is BIC(λ) = sum_{k=1}^K [trace(Σ̂^(k) Ω̂^(k)_λ) − log det(Ω̂^(k)_λ) + log(n_k) df_k], where Ω̂^(1)_λ, ..., Ω̂^(K)_λ are the estimates from (4) with tuning parameter λ and the degrees of freedom are df_k = #{(j, j') : j < j', ω̂^(k)_{j,j'} ≠ 0}. Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 19 / 48
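A direct numpy translation of this criterion might look as follows; the helper is illustrative and assumes the precision estimates and sample covariances are already available.

```python
import numpy as np

def joint_bic(S_list, Omega_list, n_list):
    """BIC(lambda) for a set of estimated precision matrices, as defined above."""
    bic = 0.0
    for S, Omega, n_k in zip(S_list, Omega_list, n_list):
        sign, logdet = np.linalg.slogdet(Omega)
        # df_k = number of nonzero off-diagonal entries with j < j'
        df = np.count_nonzero(np.triu(Omega, k=1))
        bic += np.trace(S @ Omega) - logdet + np.log(n_k) * df
    return bic
```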
  • 47. Simulation Study To assess the performance of the joint estimation algorithm of Guo et al., three types of networks are simulated: a chain network, a 5-nearest-neighbors network, and a scale-free (power-law) network. In all cases, p = 100 features and K = 3. For k = 1, ..., K, n_k = 100 iid observations are drawn from a multivariate normal distribution N(0, (Ω^(k))^{−1}), where Ω^(k) is the precision matrix for the k-th category. A constant ρ is used to add heterogeneity to the common structure by creating additional individual links (i.e., ρ is the ratio of the number of individual links to the number of common links). Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 20 / 48
  • 48. Networks Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 21 / 48
  • 49. Model Diagnostics We compare the joint estimation method to the separate estimation method. To assess performance: Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 22 / 48
  • 50. Model Diagnostics We compare the joint estimation method to the separate estimation method. To assess performance: Plot ROC curves which plot the average proportion of correctly detected links against the average false positive range of values over a range of values of λ. Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 22 / 48
  • 51. Model Diagnostics We compare the joint estimation method to the separate estimation method. To assess performance: Plot ROC curves which plot the average proportion of correctly detected links against the average false positive range of values over a range of values of λ. Compute and compare the following metrics: average entropy loss (EL), average Frobenius loss (FL), average false positive (FP) and average false negative (FN) rates, and the average rate of mis-identified common zeros among the categories (CZ). Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 22 / 48
  • 52. Simulation Results Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 23 / 48
  • 53. Simulation Results Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 24 / 48
  • 57. Simulation Results As the estimated ROC curves in Figure 2 show, the joint estimation method dominates the separate estimation method when the proportion of individual links is low. When the proportion of individual links is high, the methods perform similarly. As Table 1 shows, the joint estimation method produces lower entropy and Frobenius losses, as well as lower false negative rates. When the proportion of common links is high enough, it also produces lower false positive rates and performs better at identifying common zeros than the separate estimation method. Conclusion: in the presence of a common structure, the joint estimation method outperforms the separate estimation method; in the absence of a common structure, their performance is fairly similar. Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 25 / 48
  • 59. Data Example Webpages from websites at computer science departments in four universities (Cornell, Texas, Washington, and Wisconsin) are manually classified into seven categories: student, faculty, staff, department, course, project, and other. The joint estimation method is applied to n = 1396 documents in the four largest categories and p = 100 terms, and links between the terms are explored. The resulting common structure is shown in Figure 3. The model also allows us to explore the heterogeneity between different categories. For example, the graphs for the “student” and “faculty” categories have some links that appear in only one or the other (e.g., the terms “teach” and “assist” are linked only in the “student” category, while “assist-professor” and “select-public” are linked only in the “faculty” category). Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 26 / 48
  • 60. Data Example Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 27 / 48
  • 61. Data Example Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 28 / 48
  • 64. The Joint Graphical Lasso In their paper “The joint graphical lasso for inverse covariance estimation across multiple classes,” P. Danaher, P. Wang, & D. Witten propose the joint graphical lasso as a technique for jointly estimating multiple graphical models corresponding to distinct but related conditions, such as cancer and normal tissue. Suppose we have data from K ≥ 2 distinct classes. Instead of estimating Σ_1^{−1}, Σ_2^{−1}, ..., Σ_K^{−1} separately, the authors propose estimating these matrices jointly by maximizing the penalized log-likelihood max_{{Θ}} sum_{k=1}^K n_k [log det(Θ^(k)) − trace(S^(k) Θ^(k))] − P({Θ}), subject to the constraint that Θ^(1), Θ^(2), ..., Θ^(K) are positive definite, where {Θ} = {Θ^(1), Θ^(2), ..., Θ^(K)}. P({Θ}) denotes a convex penalty function, so the objective function is strictly concave. Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 29 / 48
  • 65. Joint Graphical Lasso Danaher et al. propose choosing a penalty function P that will encourage ˆΘ(1), ˆΘ(2), . . . , ˆΘ(K) to share certain characteristics, such as the locations or values of the nonzero elements, in addition to providing sparse estimates of the precision matrices. Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 30 / 48
  • 66. Joint Graphical Lasso Danaher et al. propose choosing a penalty function P that will encourage ˆΘ(1), ˆΘ(2), . . . , ˆΘ(K) to share certain characteristics, such as the locations or values of the nonzero elements, in addition to providing sparse estimates of the precision matrices. Two penalty functions suggested by Danaher et al. are the fused graphical lasso and group graphical lasso penalties. Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 30 / 48
  • 70. Fused Graphical Lasso The fused graphical lasso (FGL) penalty is P({Θ}) = λ1 sum_{k=1}^K sum_{i≠j} |θ^(k)_{ij}| + λ2 sum_{k<k'} sum_{i,j} |θ^(k)_{ij} − θ^(k')_{ij}|, where λ1 and λ2 are nonnegative tuning parameters. 1. Like the graphical lasso, FGL results in sparse estimates Θ̂^(1), Θ̂^(2), ..., Θ̂^(K) when the tuning parameter λ1 is large. 2. Many of the elements of Θ̂^(1), Θ̂^(2), ..., Θ̂^(K) will be identical across classes when the tuning parameter λ2 is large. 3. FGL borrows information aggressively across classes, encouraging not only similar network structure but also similar edge values. A sketch of this penalty is given below. Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 31 / 48
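For concreteness, a small helper that evaluates this penalty for a list of precision matrices might look like the following; it is an illustrative sketch, not code from Danaher et al.

```python
import numpy as np

def fgl_penalty(Thetas, lam1, lam2):
    """Value of the fused graphical lasso penalty for K precision matrices."""
    K, p = len(Thetas), Thetas[0].shape[0]
    off = ~np.eye(p, dtype=bool)                       # off-diagonal mask
    lasso = lam1 * sum(np.abs(T[off]).sum() for T in Thetas)
    fused = lam2 * sum(np.abs(Thetas[k] - Thetas[kk]).sum()
                       for k in range(K) for kk in range(k + 1, K))
    return lasso + fused
```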
  • 74. Group Graphical Lasso The group graphical lasso (GGL) penalty function is P({Θ}) = λ1 sum_{k=1}^K sum_{i≠j} |θ^(k)_{ij}| + λ2 sum_{i≠j} ( sum_{k=1}^K (θ^(k)_{ij})^2 )^{1/2}, where λ1 and λ2 are nonnegative tuning parameters. 1. The group lasso penalty encourages a similar pattern of sparsity across all of the precision matrices - there will be a tendency for the zeros in the K estimated precision matrices to occur in the same places. 2. When λ1 = 0 and λ2 > 0, each Θ̂^(k) will have an identical pattern of non-zero elements. On the other hand, the lasso penalty encourages further sparsity within each Θ̂^(k). 3. GGL encourages a weaker form of similarity across the K precision matrices than FGL, in that GGL only encourages a shared pattern of sparsity and not shared edge values. A sketch of this penalty follows. Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 32 / 48
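The analogous helper for the GGL penalty (again only an illustrative sketch) makes the group structure explicit: the l2 norm is taken across classes for each off-diagonal position.

```python
import numpy as np

def ggl_penalty(Thetas, lam1, lam2):
    """Value of the group graphical lasso penalty for K precision matrices."""
    p = Thetas[0].shape[0]
    off = ~np.eye(p, dtype=bool)                  # off-diagonal mask
    stack = np.stack(Thetas)                      # shape (K, p, p)
    lasso = lam1 * np.abs(stack[:, off]).sum()
    group = lam2 * np.sqrt((stack[:, off] ** 2).sum(axis=0)).sum()
    return lasso + group
```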
  • 76. Algorithm for the Joint Graphical Lasso Danaher et al. solve the joint graphical lasso problem using an alternating directions method of multipliers (ADMM) algorithm. The original problem can be rewritten as min_{{Θ},{Z}} − sum_{k=1}^K n_k [log det(Θ^(k)) − trace(S^(k) Θ^(k))] + P({Z}), subject to the positive-definiteness constraint as well as the constraint that Z^(k) = Θ^(k), where {Z} = {Z^(1), Z^(2), ..., Z^(K)}. Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 33 / 48
  • 79. Algorithm for the Joint Graphical Lasso The scaled augmented Lagrangian for this problem is L_ρ({Θ}, {Z}, {U}) = − sum_{k=1}^K n_k [log det(Θ^(k)) − trace(S^(k) Θ^(k))] + P({Z}) + (ρ/2) sum_{k=1}^K ||Θ^(k) − Z^(k) + U^(k)||_F^2, where {U} = {U^(1), U^(2), ..., U^(K)}. The ADMM for this problem iterates three simple steps: (a) {Θ_(i)} ← arg min_{{Θ}} L_ρ({Θ}, {Z_(i−1)}, {U_(i−1)}); (b) {Z_(i)} ← arg min_{{Z}} L_ρ({Θ_(i)}, {Z}, {U_(i−1)}); (c) {U_(i)} ← {U_(i−1)} + ({Θ_(i)} − {Z_(i)}). The update in step (a) preserves positive definiteness, so this constraint can be satisfied simply by initializing Θ^(k)_(0) = I. A sketch of step (a) is given below. Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 34 / 48
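Step (a) separates across classes and has a closed form via an eigendecomposition, the standard update for log-determinant terms in ADMM. Below is a minimal sketch for one class, assuming the scaled Lagrangian shown above; this is our own illustration, not the authors' implementation.

```python
import numpy as np

def theta_update(S, Z, U, n_k, rho):
    """ADMM step (a) for one class: minimize over positive definite Theta
    -n_k*(log det Theta - trace(S @ Theta)) + (rho/2)*||Theta - Z + U||_F^2."""
    # stationarity condition: rho*Theta - n_k*Theta^{-1} = rho*(Z - U) - n_k*S
    evals, V = np.linalg.eigh(rho * (Z - U) - n_k * S)
    # solve rho*t - n_k/t = eigenvalue for each eigenvalue (take the positive root)
    t = (evals + np.sqrt(evals ** 2 + 4.0 * rho * n_k)) / (2.0 * rho)
    return (V * t) @ V.T                # V @ diag(t) @ V.T, positive definite
```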
  • 80. Simulation Study The data for main simulation study consisted of generating three networks with p = 500 features belonging to ten equally sized unconnected subnetworks, each with a power law degree distribution, i.e. a scale-free network. Of the ten subnetworks, eight have the same structure and edge values in all three classes, one is identical between the first two classes and missing in the third (i.e. the corresponding features are singletons in the third network), and one is present in only the first class. The corresponding graph is shown in Figure 1. Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 35 / 48
  • 81. Simulation Study The data for main simulation study consisted of generating three networks with p = 500 features belonging to ten equally sized unconnected subnetworks, each with a power law degree distribution, i.e. a scale-free network. Of the ten subnetworks, eight have the same structure and edge values in all three classes, one is identical between the first two classes and missing in the third (i.e. the corresponding features are singletons in the third network), and one is present in only the first class. The corresponding graph is shown in Figure 1. In addition to the 500-feature network pair, we generate a pair of networks with p = 1000 features, each of which is block diagonal with 500 × 500 blocks corresponding to two copies of the 500-feature networks just described. Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 35 / 48
  • 82. Figure 1 [network diagram omitted in this text version] Figure: The graph corresponding to the concentration matrices for the main simulation study. Black edges are common to all three classes, green edges are present only in classes 1 and 2, and red edges are present only in class 1. Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 36 / 48
  • 84. Basic Result As we will see in the next slides, when p = 500 and n = 150, FGL performs best in the simulations conducted by Danaher et al., although it is much slower than glasso (the fastest) and than the group lasso penalty. The graphical lasso and the joint estimation method introduced by Guo et al. (2011) are included in the comparisons as well. In addition, FGL also seems to do better than GGL asymptotically when we hold p fixed and let n increase. Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 37 / 48
  • 87. In terms of parameters: FGL is presented in terms of λ2, but for GGL the results are reparametrized in terms of ω2 = (λ2/√2) / (λ1 + λ2/√2). Each line corresponds to a different value of λ2 or ω2. As λ1 or ω1 = λ1 + λ2/√2 increases, sparsity increases, or equivalently, the number of edges selected decreases. According to the authors, approaches such as AIC, BIC, and cross-validation tend to choose models too large to be useful. In this setting, model selection should be guided by practical considerations, such as network interpretability, stability, and the desire for an edge set with a low false discovery rate. Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 38 / 48
  • 90. Some measures of error SSE = sum_{k=1}^K sum_{i≠j} (θ̂^(k)_{ij} − (Σ^(k))^{−1}_{ij})^2. Differential edges are defined as the edges that differ between classes. For FGL, the count is the number of pairs k < k', i < j such that θ̂^(k)_{ij} ≠ θ̂^(k')_{ij}. For GGL, the proposal of Guo et al. (2011), and the graphical lasso, it is the number of pairs k < k', i < j such that |θ̂^(k)_{ij} − θ̂^(k')_{ij}| > 10^{−2}. The Kullback-Leibler divergence (dKL) from the multivariate normal model with inverse covariance estimates Θ^(1), ..., Θ^(K) to the multivariate normal model with the true covariance matrices Σ^(1), ..., Σ^(K) is (1/2) sum_{k=1}^K [− log det(Θ^(k) Σ^(k)) + trace(Θ^(k) Σ^(k))]. Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 39 / 48
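These error measures translate directly into numpy; the helpers below are illustrative and follow the expressions above (including the dKL expression without its additive constant).

```python
import numpy as np

def sse(Theta_hats, Sigmas_true):
    """Sum of squared errors over off-diagonal entries, as defined above."""
    total = 0.0
    for Theta, Sigma in zip(Theta_hats, Sigmas_true):
        diff = Theta - np.linalg.inv(Sigma)
        np.fill_diagonal(diff, 0.0)
        total += (diff ** 2).sum()
    return total

def kl_divergence(Theta_hats, Sigmas_true):
    """dKL between fitted and true Gaussian models, following the expression above."""
    total = 0.0
    for Theta, Sigma in zip(Theta_hats, Sigmas_true):
        M = Theta @ Sigma
        sign, logdet = np.linalg.slogdet(M)
        total += 0.5 * (-logdet + np.trace(M))
    return total
```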
  • 91. Figure 2 from Danaher et al. [plots omitted in this text version] Panels: (a) TP edges vs. FP edges, (b) sum of squared errors vs. total edges selected, (c) TP differential edges vs. FP differential edges, (d) Kullback-Leibler divergence vs. l1 norm of {Θ}, (e) run time in seconds vs. total edges selected. Methods compared: FGL, GGL, Guo et al. (2011), graphical lasso; curves are drawn for λ2 ∈ {0.025, 0.05, 0.075, 0.1} (FGL) and ω2 ∈ {0.15, 0.5, 0.7, 1} (GGL). Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 40 / 48
  • 92. Figure 3 from Danaher et al. - Analysis of lung cancer microarray data [network plot omitted in this text version] Figure: Conditional dependency networks inferred from 17,772 genes in normal and cancerous lung cells. 278 genes have nonzero edges in at least one of the two networks. Black lines denote edges common to both classes. Red and green lines denote tumor-specific and normal-specific edges, respectively. The parameters used were λ1 = 0.95 and λ2 = 0.005. FGL estimated 134 edges shared between the two networks, 202 edges present only in the cancer network, and 18 edges present only in the normal tissue network. Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 41 / 48
  • 93. Table 1 from Danaher et al. Table 1. Performance as a function of n and p. Means over 100 replicates are shown for dKL and for sensitivity (Sens.) and false discovery rate (FDR) of edge detection (DE) and differential edge detection (DED).

Method  p     n    dKL     DE Sens.  DE FDR  DED Sens.  DED FDR
FGL     500   50   545.1   0.502     0.966   0.262      0.996
FGL     500   200  517.5   0.570     0.053   0.228      0.485
FGL     500   500  516.6   0.590     0.001   0.192      0.036
FGL     1000  50   1119.3  0.600     0.970   0.245      0.998
FGL     1000  200  1035.0  0.666     0.063   0.223      0.557
FGL     1000  500  1033.3  0.681     0.000   0.194      0.025
GGL     500   50   549.8   0.490     0.973   0.337      0.996
GGL     500   200  520.8   0.505     0.060   0.244      0.903
GGL     500   500  519.7   0.524     0.010   0.194      0.921
GGL     1000  50   1127.9  0.587     0.976   0.316      0.998
GGL     1000  200  1041.7  0.615     0.061   0.239      0.908
GGL     1000  500  1039.4  0.629     0.007   0.197      0.920

(The same slide reproduces the start of Section 8 of the paper: FGL was applied to 22,283 microarray-derived gene expression measurements from large airway epithelial cells sampled from 97 patients with lung cancer and 90 controls (Spira et al. 2007; GEO accession GDS2771); genes with standard deviations in the bottom 20% were omitted, the remaining genes were normalized within each class, and the classes were weighted equally rather than by sample size.) Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 42 / 48
  • 94. Table 1 indicates that, for fixed p, FGL does much better than GGL in terms of edge detection (as expected, sensitivity increases and false discovery rate decreases as n increases) and differential edge detection (false discovery rate decreases as n increases). Oddly enough, in all cases DED sensitivity declines as n increases. Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 43 / 48
  • 95. Nearest Neighbor Network (n = 100, p = 25)

Group   λ1 = 0.07, λ2 = 0.05   λ1 = 0.05, λ2 = 0.25
F       0.05642315             0.06010037
FP      0.3391197              0.02918924
FN      0.3035487              0.8517427
Table: Nearest Neighbor Network for Group Lasso Penalty

Fused   λ1 = 0.05, λ2 = 0.05   λ1 = 0.2, λ2 = 0.1
F       0.06335103             0.06954624
FP      0.4269327              0.002245857
FN      0.3139659              0.9723261
Table: Nearest Neighbor Network for Fused Lasso Penalty

Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 44 / 48
  • 96. Ω̂_1 for Nearest Neighbor Network (n = 100, p = 25) [graphs omitted in this text version: (a) estimated nearest neighbor graph with fused lasso penalty, (b) estimated nearest neighbor graph with group lasso penalty, (c) true nearest neighbor graph] Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 45 / 48
  • 97. Ω̂_2 for Nearest Neighbor Network (n = 100, p = 25) [graphs omitted in this text version: (a) estimated nearest neighbor graph with fused lasso penalty, (b) estimated nearest neighbor graph with group lasso penalty, (c) true nearest neighbor graph] Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 46 / 48
  • 98. Ω̂_3 for Nearest Neighbor Network (n = 100, p = 25) [graphs omitted in this text version: (a) estimated nearest neighbor graph with fused lasso penalty, (b) estimated nearest neighbor graph with group lasso penalty, (c) true nearest neighbor graph] Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 47 / 48
  • 100. Additional notes In the appendix, the authors show that with only two classes, FGL runs as fast as GGL while still giving the best results overall. We thought it was odd that the authors were much more interested in having a low false positive rate at the cost of a high false negative rate, instead of a balance between the two. Abrahamsen, Bai, Rahman & Skripnikov The Joint Graphical Lasso April 22, 2015 48 / 48