1. Learning the structure of Gaussian Graphical models with unobserved variables
Marina Vinyes, Ph.D.
Paris WiMLDS Organizer, Machine Learning Engineer at Criteo
4th June 2019
2. Why graphical models?
Graphs are a natural way to represent data.
[Figures: a family tree, a social network, and a gene regulatory network.]
Left: photo of the Marie Curie Museum (Muzeum Marii Sklodowskiej-Curie), courtesy of TripAdvisor. Middle: https://en.wikipedia.org/wiki/Social_graph. Right: Emmert-Streib et al. [2014].
3. What are graphical models?
Nodes correspond to random variables.
Edges correspond to statistical dependencies between variables.
There are different kinds of graphical models:
- directed or undirected graphs
- discrete, continuous, or mixed variables
4. Conditional independence
Example 1 (common cause): B is a train strike, A is "Marina is late", C is "Caroline is late". [Graph: A ← B → C]
Are A and C independent? No.
Are A and C conditionally independent given B? Yes.
Example 2 (common effect): B is a traffic jam, A is rain, C is a football match. [Graph: A → B ← C]
Are A and C independent? Yes.
Are A and C conditionally independent given B? No: conditioning on a common effect couples its causes ("explaining away").
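A quick numerical illustration of the second example (a sketch: numpy, the probabilities, and the toy jam rule are illustrative assumptions, not from the talk):

```python
# Common-effect ("explaining away") example: rain (A) and a football
# match (C) are independent causes of a traffic jam (B). Conditioning
# on B makes them dependent.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
rain = rng.random(n) < 0.3        # A
match = rng.random(n) < 0.2       # C
jam = rain | match                # B: toy rule, jam if rain or a match

# Marginally, A and C are (nearly) uncorrelated:
print(np.corrcoef(rain, match)[0, 1])              # ~0.0

# Conditioned on a jam, they become negatively correlated:
print(np.corrcoef(rain[jam], match[jam])[0, 1])    # < 0
```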
5. Learning the structure of a graphical model
Goal: knowledge discovery, a first step towards estimating causal effects, ...
[Figure: observations of variables X1, ..., X6 and the graph structure recovered over X1, ..., X6.]
6. Learning the structure of a graphical model
Easier for undirected Gaussian graphical models:
$\Sigma^{-1}_{ij} = 0$ if and only if there is no edge between $X_i$ and $X_j$
(where $\Sigma^{-1}$ is the inverse covariance matrix).
[Figure: a graph over X1, ..., X6 and the matching sparsity pattern of $\hat\Sigma^{-1}$.]
Clarification: all following slides concern only undirected Gaussian graphical models.
7. Graphical lasso: sparsity assumption
Approximation: with $\hat\Sigma$ the empirical covariance matrix, $\hat\Sigma^{-1} \approx$ sparse.
Formulation:
$$\min_{S} \; f_{\mathrm{nll}}(S) + \lambda \|S\|_1 \quad \text{s.t. } S \succeq 0$$
Negative log-likelihood: $f_{\mathrm{nll}}(M) := -\log\det(M) + \mathrm{tr}(M\hat\Sigma)$.
This is a semidefinite program.
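As a concrete illustration of this program, here is a minimal sketch using scikit-learn's GraphicalLasso estimator (scikit-learn, the chain-graph ground truth, and all parameter values are assumptions for illustration, not from the talk):

```python
# Graphical lasso: estimate a sparse precision (inverse covariance) matrix
# and read the graph off its nonzero off-diagonal entries.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)

# Ground truth: a chain graph X1 - X2 - ... - X6 encoded in the precision K.
p = 6
K = np.eye(p)
for i in range(p - 1):
    K[i, i + 1] = K[i + 1, i] = 0.4

Sigma = np.linalg.inv(K)                                   # true covariance
X = rng.multivariate_normal(np.zeros(p), Sigma, size=2000)

# scikit-learn penalizes the off-diagonal l1 norm of the precision matrix.
model = GraphicalLasso(alpha=0.05).fit(X)
S_hat = model.precision_

edges = [(i + 1, j + 1) for i in range(p) for j in range(i + 1, p)
         if abs(S_hat[i, j]) > 1e-3]
print("estimated edges:", edges)    # ideally the chain (1,2), (2,3), ...
```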
8. What if some variables are unobserved?
Consider a graphical model with 2 latent variables.
[Figure: the complete graph has 12 edges (a sparse structure); the graph marginalized over the latent variables has 22 edges (a not-so-sparse structure).]
Marginalizing out a latent variable connects all of its observed neighbors, so sparsity is lost.
9. Link with the structure of the precision matrix K
$K = \Sigma^{-1}$, where $\Sigma$ is the covariance of the full graph, with observed block $O$ and hidden block $H$.
[Figure: a graph over X1, ..., X11 with observed and latent nodes.]
Inversion formula (Schur complement): $\Sigma^{-1}_{OO} = K_{OO} - U K_{HH}^{-1} U^\top$, where $U := K_{OH}$.
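The formula is easy to verify numerically; a minimal sketch, assuming numpy and made-up block sizes and values:

```python
# Check that marginalizing the latent block H gives
# Sigma_OO^{-1} = K_OO - U K_HH^{-1} U^T  (sparse minus low-rank, rank <= |H|).
import numpy as np

p_obs, p_hid = 6, 2
K_OO = np.eye(p_obs)                 # sparse observed block
K_HH = 2.0 * np.eye(p_hid)           # independent latents: diagonal
U = np.zeros((p_obs, p_hid))         # U = K_OH, latent-observed connectivity
U[:3, 0] = 0.3                       # latent 1 touches X1..X3
U[3:, 1] = 0.3                       # latent 2 touches X4..X6

K = np.block([[K_OO, U], [U.T, K_HH]])
Sigma = np.linalg.inv(K)

lhs = np.linalg.inv(Sigma[:p_obs, :p_obs])          # marginal precision
rhs = K_OO - U @ np.linalg.inv(K_HH) @ U.T          # Schur complement
print(np.allclose(lhs, rhs))                        # True
```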
10. Previous work
Chandrasekaran et al. [2010]
Since $\Sigma^{-1}_{OO} = K_{OO} - U K_{HH}^{-1} U^\top$:
Approximation: with $\hat\Sigma_{OO}$ the empirical covariance matrix, $\hat\Sigma^{-1}_{OO} \approx$ sparse + low rank.
Formulation:
$$\min_{S,L} \; f_{\mathrm{nll}}(S - L) + \lambda(\eta \|S\|_1 + \mathrm{tr}(L)) \quad \text{s.t. } S - L \succeq 0,\; L \succeq 0$$
Negative log-likelihood: $f_{\mathrm{nll}}(M) := -\log\det(M) + \mathrm{tr}(M\hat\Sigma_{OO})$.
This is a semidefinite program.
Limitation: the low-rank component does not recover the connectivity between latent and observed variables.
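This program can be written down directly with a generic convex solver; a minimal sketch, assuming CVXPY (the solver choice and parameter values are illustrative, not from the talk):

```python
# Latent-variable graphical lasso of Chandrasekaran et al. [2010]:
# min_{S,L}  -log det(S - L) + tr((S - L) Sigma_hat)
#            + lam * (eta * ||S||_1 + tr(L))   s.t.  S - L PSD, L PSD
import cvxpy as cp

def latent_glasso(Sigma_hat, lam=0.1, eta=1.0):
    p = Sigma_hat.shape[0]
    S = cp.Variable((p, p), symmetric=True)   # sparse part
    L = cp.Variable((p, p), PSD=True)         # low-rank part
    M = S - L                                 # candidate marginal precision
    nll = -cp.log_det(M) + cp.trace(M @ Sigma_hat)
    penalty = lam * (eta * cp.sum(cp.abs(S)) + cp.trace(L))
    prob = cp.Problem(cp.Minimize(nll + penalty), [M >> 0])
    prob.solve()
    return S.value, L.value
```

Here S.value estimates the precision structure among observed variables, and the rank of L.value estimates the number of latent variables.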
11. Our formulation: more structure on L
Assuming:
- latent variables are independent ($K_{HH}$ is diagonal)
- every latent variable is connected to k observed variables
$\hat\Sigma^{-1}_{OO} \approx$ sparse + $L$, where we impose structure on $L \approx UU^\top$ using an atomic norm on $L$:
$$\min_{S,L} \; f_{\mathrm{nll}}(S - L) + \lambda(\eta \|S\|_1 + \gamma_{\mathcal{A}}(L)) \quad \text{s.t. } S - L \succeq 0,\; L \succeq 0$$
12. Our formulation: more structure on L
$$\Sigma^{-1}_{OO} \approx S + s_1 u_1 u_1^\top + s_2 u_2 u_2^\top + s_3 u_3 u_3^\top$$
(sparse part $S$ plus rank-one atoms $L_1$, $L_2$, $L_3$)
Atomic norm $\gamma_{\mathcal{A}}$: the atomic norm for matrices [Richard et al., 2014] with atoms
$$\mathcal{A} := \{uu^\top \mid u \in \mathbb{R}^p : \|u\|_0 \le k,\ \|u\|_2 = 1\}$$
14. Conclusion and perspectives
- a convex approach with matrix regularization
- perspectives: real datasets, directed graphs
- full paper with algorithm and identifiability results: https://arxiv.org/abs/1807.07754
16. References I
V. Chandrasekaran, P. A. Parrilo, and A. S. Willsky. Latent variable graphical model selection via convex optimization. In Communication, Control, and Computing (Allerton), 2010 48th Annual Allerton Conference on, pages 1610-1613. IEEE, 2010.
V. Chandrasekaran, B. Recht, P. A. Parrilo, and A. S. Willsky. The convex geometry of linear inverse problems. Foundations of Computational Mathematics, 12(6):805-849, 2012.
F. Emmert-Streib, R. De Matos Simoes, P. Mullan, B. Haibe-Kains, and M. Dehmer. The gene regulatory network for breast cancer: integrated regulatory landscape of cancer hallmarks. Frontiers in Genetics, 5:15, 2014.
E. Richard, G. R. Obozinski, and J.-P. Vert. Tight convex relaxations for sparse matrix factorization. In Advances in Neural Information Processing Systems, pages 3284-3292, 2014.
R. Rockafellar. Convex Analysis. Princeton University Press, 1970.
17. Atomic norms for leveraging structure
Rockafellar [1970], Chandrasekaran et al. [2012]
Let $\mathcal{A}$ be a collection of atoms and suppose $x = \sum_{a \in \mathcal{A}} c_a a$.
Atomic norm on $\mathcal{A}$:
$$\gamma_{\mathcal{A}}(x) := \inf_c \Big\{ \sum_{a \in \mathcal{A}} c_a \;\Big|\; c_a \ge 0,\ \sum_{a \in \mathcal{A}} c_a a = x \Big\}$$
Example: the trace norm. Let $M \in \mathbb{R}^{n \times p}$ be of rank $k$, with SVD $M = \sum_{i=1}^k c_i u_i v_i^\top$. Then
$$\|M\|_{\mathrm{tr}} := \sum_{i=1}^k |c_i| = \gamma_{\mathcal{A}}(M)$$
for $\mathcal{A} :=$ the set of rank-one matrices $uv^\top$ with $\|u\|_2 \le 1$, $\|v\|_2 \le 1$.
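A quick numerical sanity check of the trace-norm example (a sketch; numpy and the random low-rank matrix are illustrative assumptions):

```python
# The trace (nuclear) norm equals the sum of singular values, i.e. the
# atomic norm generated by unit-norm rank-one matrices u v^T.
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 4)) @ rng.standard_normal((4, 3))   # rank <= 3

s = np.linalg.svd(M, compute_uv=False)                    # singular values c_i
print(np.isclose(s.sum(), np.linalg.norm(M, ord="nuc")))  # True
```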