# SIMoNe: Statistical Inference for MOdular NEtworks


1. SIMoNe — An R package for inferring Gaussian networks with latent clustering. Julien Chiquet (and Camille, Christophe, Gilles, Catherine, Yves). Laboratoire Statistique et Génome, La génopole – Université d'Évry. SSB – April 13, 2010. SIMoNe: inferring Gaussian networks with latent clustering.
2. Problem — Inference with n ≈ 10s/100s of slides and g ≈ 1000s of genes. Which interactions? O(g²) parameters (edges)! The main statistical issue is the high-dimensional setting.
3. Handling the scarcity of data (1) — By reducing the number of parameters. Assumption: connections only appear between informative genes. Select p key genes via differential analysis, with p "reasonable" compared to n (typically n ∈ [p/5; 5p]). The learning dataset: n size-p expression vectors (X_1, ..., X_n) with X_i ∈ R^p.
4. Handling the scarcity of data (2) — By collecting as many observations as possible: multitask learning. How should we merge the data? (One organism observed under drug 1, drug 2, drug 3.)
5. Handling the scarcity of data (2) — Multitask learning, by inferring each network independently: for each task t = 1, 2, 3, infer a network from its own sample (X_1^(t), ..., X_{n_t}^(t)), X_i^(t) ∈ R^{p_t}.
6. Handling the scarcity of data (2) — Multitask learning, by pooling all the available data: (X_1, ..., X_n), X_i ∈ R^p, with n = n_1 + n_2 + n_3, followed by a single inference.
7. Handling the scarcity of data (2) — Multitask learning, by breaking the separability: keep the samples (X_1^(t), ..., X_{n_t}^(t)), X_i^(t) ∈ R^{p_t}, separate, but perform one joint inference across the three tasks.
8. Handling the scarcity of data (3) — By introducing some prior. Priors should be biologically grounded: 1. few genes effectively interact (sparsity); 2. networks are organized (latent clustering); 3. steady-state or time-course data (directedness relies on the modelling). (Example network with nodes G0–G9.)
10. Handling the scarcity of data (3) — By introducing some prior (continued). The same example network, with nodes relabelled by latent cluster: A1–A3, B1–B5, C1–C2.
13. Outline — Statistical models: steady-state data; time-course data; multitask learning. Algorithms and methods: overall view; network inference; model selection; latent structure. Numerical experiments: performance on simulated data; R package demo on the breast cancer data set.
16. The graphical models: general settings — Assumption: a microarray can be represented as a multivariate Gaussian vector X = (X(1), ..., X(p)) ∈ R^p. Collecting gene expression: 1. steady-state data leads to an i.i.d. sample; 2. time-course data gives a time series. Graphical interpretation: an edge between i and j if and only if there is conditional dependency between X(i) and X(j), i.e. a non-null partial correlation between X(i) and X(j).
18. The graphical models: general settings (time-course case) — Graphical interpretation: an edge j → i if and only if there is conditional dependency between X_t(i) and X_{t−1}(j), i.e. a non-null partial correlation between X_t(i) and X_{t−1}(j).
19. The general statistical approach — Let Θ be the parameters to infer (the edges). A penalized likelihood approach: Θ̂_λ = arg max_Θ L(Θ; data) − λ pen_ℓ1(Θ, Z), where L is the model log-likelihood, Z is a latent clustering of the network, and pen_ℓ1 is a penalty function tuned by λ > 0. It performs: 1. regularization (needed when n ≪ p); 2. selection (sparsity induced by the ℓ1-norm); 3. model-driven inference (penalty adapted according to Z).
21. Outline — next section: Statistical models / Steady-state data.
22. The Gaussian model for an i.i.d. sample — Let X ~ N(0_p, Σ), with X_1, ..., X_n i.i.d. copies of X; let X be the n × p matrix whose kth row is X_k, and Θ = (θ_ij)_{i,j∈P} = Σ⁻¹ the concentration matrix. Graphical interpretation: since cor_{ij|P∖{i,j}} = −θ_ij / √(θ_ii θ_jj) for i ≠ j, we have θ_ij = 0 ⇔ X(i) ⊥⊥ X(j) | X(P∖{i,j}) ⇔ edge (i, j) ∉ network. Θ describes the undirected graph of conditional dependencies.
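The correspondence above between the concentration matrix and the graph can be sketched numerically. This is a minimal numpy illustration (not the SIMoNe API); `partial_correlations` is a hypothetical helper implementing cor_{ij|rest} = −θ_ij / √(θ_ii θ_jj):

```python
import numpy as np

def partial_correlations(theta):
    """Partial correlations from a concentration matrix Theta = Sigma^{-1}."""
    d = np.sqrt(np.diag(theta))
    pcor = -theta / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Toy 3-gene example: genes 0 and 2 are conditionally independent given gene 1,
# so theta[0, 2] = 0 and the graph is the chain 0 - 1 - 2.
theta = np.array([[2.0, 1.0, 0.0],
                  [1.0, 2.0, 1.0],
                  [0.0, 1.0, 2.0]])
pcor = partial_correlations(theta)
adjacency = (np.abs(pcor) > 1e-10) & ~np.eye(3, dtype=bool)
```

Zeros of Θ off the diagonal are exactly the missing edges of the conditional-dependency graph.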
23. Neighborhood selection (1) — Let X_i be the ith column of X and X_∖i be X deprived of X_i; then X_i = X_∖i β + ε, where β_j = −θ_ij / θ_ii. Meinshausen and Bühlmann, 2006: since sign(cor_{ij|P∖{i,j}}) = sign(β_j), select the neighbors of i with arg min_β (1/n) ‖X_i − X_∖i β‖₂² + λ‖β‖₁. The sign pattern of Θ_λ is inferred after a symmetrization step.
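The per-gene regressions above can be sketched with a small coordinate-descent LASSO in numpy. This is an illustrative sketch, not the SIMoNe implementation; `lasso_cd` and `neighborhood_selection` are hypothetical helper names, and the OR-rule symmetrization is one of the standard post-processing choices:

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]   # residual without feature j
            rho = X[:, j] @ r / n
            denom = X[:, j] @ X[:, j] / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / denom
    return beta

def neighborhood_selection(X, lam):
    """Regress each gene on all others; symmetrize supports with an OR rule."""
    n, p = X.shape
    adj = np.zeros((p, p), dtype=bool)
    for i in range(p):
        others = [j for j in range(p) if j != i]
        beta = lasso_cd(X[:, others], X[:, i], lam)
        adj[i, others] = beta != 0.0
    return adj | adj.T

# Chain 0 - 1 - 2: gene 1 drives genes 0 and 2; 0 and 2 are only
# marginally correlated, so the LASSO should drop the 0-2 edge.
rng = np.random.default_rng(0)
x1 = rng.normal(size=2000)
X = np.column_stack([0.7 * x1 + 0.7 * rng.normal(size=2000),
                     x1,
                     0.7 * x1 + 0.7 * rng.normal(size=2000)])
adj = neighborhood_selection(X, lam=0.2)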
24. Neighborhood selection (2) — The pseudo log-likelihood of the i.i.d. Gaussian sample is L̃_iid(Θ; S) = Σ_{i=1}^p Σ_{k=1}^n log P(X_k(i) | X_k(P∖i); Θ_i) = (n/2) log det(D) − (n/2) Trace(D^{−1/2} Θ S Θ D^{−1/2}) − (n/2) log(2π), where D = diag(Θ). Proposition: Θ̂_λ^pseudo = arg max_Θ L̃_iid(Θ; S) − λ‖Θ‖₁ has the same null entries as inferred by neighborhood selection.
25. The Gaussian likelihood for an i.i.d. sample — Let S = n⁻¹ XᵀX be the empirical variance-covariance matrix: S is a sufficient statistic for Θ. The log-likelihood: L_iid(Θ; S) = (n/2) log det(Θ) − (n/2) Trace(SΘ) + c. The MLE Θ̂ = S⁻¹ is not defined for n < p and is never sparse. The need for regularization is huge.
26. Penalized log-likelihood — Banerjee et al., JMLR 2008: Θ̂_λ = arg max_Θ L_iid(Θ; S) − λ‖Θ‖₁, efficiently solved by the graphical LASSO of Friedman et al., 2008. Ambroise, Chiquet, Matias, EJS 2009: use adaptive penalty parameters for different coefficients, L_iid(Θ; S) − λ‖P_Z ⋆ Θ‖₁, where P_Z is a matrix of weights depending on the underlying clustering Z. Also works with the pseudo log-likelihood L̃_iid (computationally efficient).
28. Outline — next section: Statistical models / Time-course data.
29. The Gaussian model for time-course data (1) — Let X_1, ..., X_n be a first-order vector autoregressive process, X_t = Θ X_{t−1} + b + ε_t, t ∈ [1, n], where we look for Θ = (θ_ij)_{i,j∈P} and: X_0 ~ N(0_p, Σ_0); ε_t is a Gaussian white noise with covariance σ² I_p; cov(X_t, ε_s) = 0 for s > t, so that X_t is Markovian. Graphical interpretation: since θ_ij = cov(X_t(i), X_{t−1}(j) | X_{t−1}(P∖j)) / var(X_{t−1}(j) | X_{t−1}(P∖j)), we have θ_ij = 0 ⇔ X_t(i) ⊥⊥ X_{t−1}(j) | X_{t−1}(P∖j) ⇔ edge (j → i) ∉ network.
30. The Gaussian model for time-course data (2) — Let X be the n × p matrix whose kth row is X_k, S = n⁻¹ XₙᵀXₙ the within-time covariance matrix, and V = n⁻¹ XₙᵀX₀ the across-time covariance matrix. The log-likelihood: L_time(Θ; S, V) = n Trace(VΘ) − (n/2) Trace(Θᵀ S Θ) + c. The MLE Θ̂ = S⁻¹V is still not defined for n < p.
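The unpenalized estimator of the slide (the case where it exists, n > p) can be sketched as a plain least-squares fit of X_t on X_{t−1}. A minimal numpy sketch, assuming n ≫ p, a zero intercept b, and a toy hub structure (not the SIMoNe implementation):

```python
import numpy as np

# VAR(1) process X_t = Theta X_{t-1} + eps_t with gene 0 a hub
# driving genes 1 and 2; all other entries of Theta are zero.
rng = np.random.default_rng(1)
p, n = 5, 2000
theta = np.zeros((p, p))
theta[1, 0] = theta[2, 0] = 0.5
X = np.zeros((n + 1, p))
for t in range(1, n + 1):
    X[t] = theta @ X[t - 1] + rng.normal(scale=0.1, size=p)

# Least squares: solve past @ B ~ present, so Theta_hat = B^T.
past, present = X[:-1], X[1:]
B, *_ = np.linalg.lstsq(past, present, rcond=None)
theta_hat = B.T
```

With n < p this system is underdetermined, which is exactly why the next slide adds an ℓ1 penalty.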
31. Penalized log-likelihood — Charbonnier, Chiquet, Ambroise, SAGMB 2010: Θ̂_λ = arg max_Θ L_time(Θ; S, V) − λ‖P_Z ⋆ Θ‖₁, where P_Z is a (non-symmetric) matrix of weights depending on the underlying clustering Z. Major difference with the i.i.d. case — the graph is directed: θ_ij = cov(X_t(i), X_{t−1}(j) | X_{t−1}(P∖j)) / var(X_{t−1}(j) | X_{t−1}(P∖j)) is in general different from θ_ji = cov(X_t(j), X_{t−1}(i) | X_{t−1}(P∖i)) / var(X_{t−1}(i) | X_{t−1}(P∖i)).
32. Outline — next section: Statistical models / Multitask learning.
33. Coupling related problems — Consider T samples concerning the expressions of the same p genes: X_1^(t), ..., X_{n_t}^(t) is the tth sample, drawn from N(0_p, Σ^(t)), with empirical covariance matrix S^(t). Ignoring the relationships between the tasks leads to arg max over (Θ^(t))_{t=1,...,T} of Σ_{t=1}^T L(Θ^(t); S^(t)) − λ pen_ℓ1(Θ^(t), Z). Breaking the separability: either by modifying the objective function or the constraints.
35. Coupling related problems — Remarks: in the sequel, Z is elided for clarity (no loss of generality); multitask learning is easily adapted to time-course data, yet only the steady-state version is presented here.
36. Coupling problems through the objective function — The intertwined LASSO: max over (Θ^(t))_{t=1,...,T} of Σ_{t=1}^T L̃(Θ^(t); S̃^(t)) − λ‖Θ^(t)‖₁, where S̄ = (1/n) Σ_{t=1}^T n_t S^(t) is an "across-task" covariance matrix and S̃^(t) = α S^(t) + (1 − α) S̄ is a mixture of within-/across-task covariance matrices. Setting α = 0 is equivalent to pooling all the data and inferring one common network; setting α = 1 is equivalent to treating T independent problems.
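The covariance mixing above is a one-liner. A minimal sketch (`intertwine` is a hypothetical helper name, not the SIMoNe API):

```python
import numpy as np

def intertwine(S_list, n_list, alpha):
    """Shrink each task covariance S^(t) toward the pooled covariance S_bar."""
    n = sum(n_list)
    S_bar = sum(nt * St for nt, St in zip(n_list, S_list)) / n
    return [alpha * St + (1 - alpha) * S_bar for St in S_list]

# Two tasks: strong correlation in task 1, none in task 2.
S1 = np.array([[1.0, 0.8], [0.8, 1.0]])
S2 = np.array([[1.0, 0.0], [0.0, 1.0]])
mixed = intertwine([S1, S2], [100, 100], alpha=0.5)
```

At α = 0 every task sees the same pooled matrix (one common network); at α = 1 the tasks are fully independent.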
37. Coupling problems by grouping variables (1) — Group definition: groups are the T-tuples composed of the (i, j) entries of each Θ^(t), t = 1, ..., T; most relationships between the genes are kept or removed across all tasks simultaneously. The graphical group-LASSO: max over (Θ^(t)) of Σ_{t=1}^T L̃(Θ^(t); S^(t)) − λ Σ_{i,j∈P, i≠j} ( Σ_{t=1}^T (θ_ij^(t))² )^{1/2}.
38. Group-LASSO penalty — Assume 2 tasks (T = 2) and 2 coefficients (p = 2), and represent the unit ball Σ_{i=1}^2 ( Σ_{t=1}^2 (β_i^(t))² )^{1/2} ≤ 1. (Figure: cross-sections of the ball in the (β_1^(1), β_2^(1)) plane, for second-task coefficients fixed at 0 or 0.3.)
39. Coupling problems by grouping variables (2) — Graphical group-LASSO modification: inside a group, values are most likely sign-consistent. The graphical cooperative-LASSO: max over (Θ^(t)) of Σ_{t=1}^T L̃(Θ^(t); S^(t)) − λ Σ_{i,j∈P, i≠j} [ ( Σ_{t=1}^T ([θ_ij^(t)]_+)² )^{1/2} + ( Σ_{t=1}^T ([θ_ij^(t)]_−)² )^{1/2} ], where [u]_+ = max(0, u) and [u]_− = min(0, u).
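The difference between the two penalties on a single cross-task group is easy to check numerically. A minimal sketch (`group_pen` and `coop_pen` are hypothetical helper names):

```python
import numpy as np

def group_pen(beta):
    """Group-LASSO penalty on one group beta = (beta^(1), ..., beta^(T))."""
    return np.sqrt(np.sum(beta ** 2))

def coop_pen(beta):
    """Coop-LASSO: separate group norms for positive and negative parts."""
    pos, neg = np.maximum(beta, 0.0), np.maximum(-beta, 0.0)
    return np.sqrt(np.sum(pos ** 2)) + np.sqrt(np.sum(neg ** 2))

consistent = np.array([0.3, 0.4])   # same sign of the edge in both tasks
mixed = np.array([0.3, -0.4])       # sign flips between tasks
```

On sign-consistent groups the two penalties coincide; sign-mixed groups are penalized more heavily by the coop-LASSO, which encodes the biological prior that an interaction keeps its sign across conditions.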
40. Coop-LASSO penalty — Assume 2 tasks (T = 2) and 2 coefficients (p = 2), and represent the unit ball Σ_{i=1}^2 ( Σ_{t=1}^2 ([β_i^(t)]_+)² )^{1/2} + Σ_{i=1}^2 ( Σ_{t=1}^2 ([−β_i^(t)]_+)² )^{1/2} ≤ 1. (Figure: cross-sections of the ball in the (β_1^(1), β_2^(1)) plane, for second-task coefficients fixed at 0 or 0.3.)
41. Outline — next section: Algorithms and methods / Overall view.
42. The overall strategy — Our basic criterion is of the form L(Θ; data) − λ‖P_Z ⋆ Θ‖₁. What we are looking for: the edges, through Θ; the correct level of sparsity, λ; the underlying clustering Z, with connectivity matrix π_Z. What SIMoNe does: 1. infer a family of networks G = {Θ̂_λ : λ ∈ [λ_max, 0]}; 2. select the Ĝ that maximizes an information criterion; 3. learn Ẑ on the selected network Ĝ; 4. infer a family of networks with P_Z ∝ 1 − π_Z; 5. select the Ĝ_Z that maximizes an information criterion.
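Step 4 of the strategy, turning an estimated connectivity matrix into penalty weights P_Z ∝ 1 − π_Z, can be sketched directly. This is an illustrative numpy sketch, not the SIMoNe API; `penalty_weights` is a hypothetical helper name:

```python
import numpy as np

def penalty_weights(labels, pi_z):
    """Node-pair penalty weights P_Z = 1 - pi_Z[class(i), class(j)]."""
    labels = np.asarray(labels)
    return 1.0 - pi_z[np.ix_(labels, labels)]

# Two latent classes, e.g. hubs (0) and leaves (1): dense within hubs,
# sparse between hubs and leaves.
pi_z = np.array([[0.8, 0.1],
                 [0.1, 0.4]])
labels = [0, 0, 1, 1, 1]
P = penalty_weights(labels, pi_z)
```

Edges between highly connected classes thus receive a small weight (penalized less), which biases the second inference pass toward the learned modular structure.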
44. SIMoNe — Suppose you want to recover a clustered network (target adjacency matrix and target network).
45. SIMoNe — Start with microarray data.
46. SIMoNe — Run SIMoNe without prior: from the data, obtain the adjacency matrix corresponding to Ĝ.
47. SIMoNe — Feed Ĝ to Mixer to estimate the connectivity matrix π_Z, then apply a decreasing transformation to obtain the penalty matrix P_Z.
48. SIMoNe — Run SIMoNe again with the penalty matrix P_Z: obtain the adjacency matrix corresponding to Ĝ_Z.
49. Outline — next section: Algorithms and methods / Network inference.
50. Monotask framework: problem decomposition — Reorder Θ so that variable i comes last, writing Θ = (Θ_∖i∖i, Θ_∖ii; Θ_i∖i, θ_ii) and Θ_i = (Θ_∖ii; θ_ii), where ∖i denotes removal of variable i. A block coordinate descent algorithm for arg max_Θ L(Θ; data) − λ pen_ℓ1(Θ) relies on p penalized convex optimization problems, arg min_{β ∈ R^{p−1}} f(β; S) + λ pen_ℓ1(β), (1) where f is convex and β = Θ_∖ii for steady-state data.
51. Monotask framework: problem decomposition (time-course case) — The same block coordinate descent applies, with p problems arg min_{β ∈ R^p} f(β; S, V) + λ pen_ℓ1(β), where f is convex and β = Θ_i for time-course data.
52. Monotask framework: algorithms — 1. Steady-state: Covsel/GLasso (L_iid(Θ) − λ‖Θ‖₁): starts from S + λI_p, positive definite; iterates on the columns of Θ⁻¹ until stabilization; both estimation and selection of Θ. 2. Steady-state: neighborhood selection (L̃_iid(Θ) − λ‖Θ‖₁): selects the sign patterns of Θ_∖ii with the LASSO; only one pass per column required; post-symmetrization needed. 3. Time-course: VAR(1) inference (L_time(Θ) − λ‖Θ‖₁): selects and estimates Θ_i with the LASSO; only one pass per column required; both estimation and selection.
55. Multitask framework: problem decomposition (1) — Consider the (pT) × (pT) block-diagonal matrix C composed of the empirical covariance matrices of the tasks, C = diag(S^(1), ..., S^(T)), and define C_∖i∖i = diag(S^(1)_∖i∖i, ..., S^(T)_∖i∖i) and C_∖ii as the stacking of the S^(t)_∖ii (∖i denoting removal of variable i). The (p−1)T × (p−1)T matrix C_∖i∖i is C with every line and column pertaining to variable i removed. Remark: we state the multitask algorithms in the steady-state framework; they are easily adapted to time-course data.
57. Multitask framework: problem decomposition (2) — Estimating the ith columns of the T tasks bound together, arg max over (Θ^(t))_{t=1,...,T} of Σ_{t=1}^T L̃(Θ^(t); S^(t)) − λ pen_ℓ1(Θ^(t)), decomposes into p convex optimization problems, arg min_{β ∈ R^{T(p−1)}} f(β; C) + λ pen_ℓ1(β), where we set β^(t) = Θ^(t)_∖ii and β = (β^(1); ...; β^(T)) ∈ R^{T(p−1)}.
58. Solving the sub-problem — Subdifferential approach: min_{β ∈ R^{T(p−1)}} L(β) = f(β) + λ pen_ℓ1(β); β is a minimizer iff 0 ∈ ∂_β L(β), with ∂_β L(β) = ∇_β f(β) + λ ∂_β pen_ℓ1(β).
59. Solving the sub-problem — For the graphical intertwined LASSO: pen_ℓ1(β) = Σ_{t=1}^T ‖β^(t)‖₁, where the grouping effect is managed by the function f.
60. Solving the sub-problem — For the graphical group-LASSO: pen_ℓ1(β) = Σ_{i=1}^{p−1} ‖β_i^[1:T]‖₂, where β_i^[1:T] = (β_i^(1), ..., β_i^(T)) ∈ R^T is the vector of the ith component across tasks.
61. Solving the sub-problem — For the graphical coop-LASSO: pen_ℓ1(β) = Σ_{i=1}^{p−1} ( ‖[β_i^[1:T]]_+‖₂ + ‖[−β_i^[1:T]]_+‖₂ ), where β_i^[1:T] = (β_i^(1), ..., β_i^(T)) ∈ R^T is the vector of the ith component across tasks.
62. General active set algorithm: yellow belt —
// 0. INITIALIZATION: β ← 0, A ← ∅.
while 0 ∉ ∂_β L(β) do:
// 1. MASTER PROBLEM (optimization with respect to β_A): find a solution h to the smooth problem ∇_h f(β_A + h) + λ ∇_h pen_ℓ1(β_A + h) = 0; then β_A ← β_A + h.
// 2. IDENTIFY NEWLY ZEROED VARIABLES: A ← A ∖ {i}.
// 3. IDENTIFY NEW NON-ZERO VARIABLES: select a candidate i ∈ A^c, i ← arg max_{j ∈ A^c} |v_j|, where v_j = min_{ν ∈ ∂_{β_j} pen_ℓ1} |∂f(β)/∂β_j + λν|.
end
63. General active set algorithm: orange belt — same loop, with step 3 refined:
// 3. IDENTIFY NEW NON-ZERO VARIABLES: select the candidate i ∈ A^c that most violates the optimality conditions, i ← arg max_{j ∈ A^c} |v_j|, where v_j = min_{ν ∈ ∂_{β_j} pen_ℓ1} |∂f(β)/∂β_j + λν|; if such an i exists then A ← A ∪ {i}, else stop and return β, which is optimal.
64. General active set algorithm: green belt — the full loop:
// 2. IDENTIFY NEWLY ZEROED VARIABLES: while ∃ i ∈ A such that β_i = 0 and min_{ν ∈ ∂_{β_i} pen_ℓ1} |∂f(β)/∂β_i + λν| = 0, do A ← A ∖ {i}.
// 3. IDENTIFY NEW NON-ZERO VARIABLES: select the candidate i ∈ A^c such that an infinitesimal change of β_i provides the highest reduction of L, i ← arg max_{j ∈ A^c} |v_j|, where v_j = min_{ν ∈ ∂_{β_j} pen_ℓ1} |∂f(β)/∂β_j + λν|; if v_i ≠ 0 then A ← A ∪ {i}, else stop and return β, which is optimal.
65. Outline — next section: Algorithms and methods / Model selection.
66. Tuning the penalty parameter — What does the literature say? Theory-based penalty choices: 1. optimal order of penalty in the p ≫ n framework, √(n log p) (Bunea et al. 2007, Bickel et al. 2009); 2. control of the probability of connecting two distinct connectivity sets (Meinshausen et al. 2006, Banerjee et al. 2008, Ambroise et al. 2009) — practically much too conservative. Cross-validation: optimal in terms of prediction, not in terms of selection; problematic with small samples, since it changes the sparsity constraint due to sample size.
67. Tuning the penalty parameter: BIC / AIC — Theorem (Zou et al. 2008): df(β̂_λ^lasso) = ‖β̂_λ^lasso‖₀. Straightforward extensions to the graphical framework: BIC(λ) = L(Θ̂_λ; X) − df(Θ̂_λ) (log n)/2 and AIC(λ) = L(Θ̂_λ; X) − df(Θ̂_λ). They rely on asymptotic approximations, but remain relevant for small data sets; easily adapted to L_iid, L̃_iid, L_time and the multitask framework.
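The selection rule above amounts to scoring each estimate on the λ path by its log-likelihood minus a complexity term counting nonzero coefficients. A minimal sketch (`bic_select` is a hypothetical helper; the log-likelihood values are made up for illustration):

```python
import numpy as np

def bic_select(logliks, thetas, n):
    """Index of the path estimate maximizing BIC = loglik - df * log(n)/2,
    with df the number of nonzero coefficients (Zou et al. 2008)."""
    scores = [ll - np.count_nonzero(th) * np.log(n) / 2.0
              for ll, th in zip(logliks, thetas)]
    return int(np.argmax(scores))

n = 50
thetas = [np.diag([1.0, 1.0, 1.0]),                          # very sparse: df = 3
          np.array([[1, .5, 0], [.5, 1, 0], [0, 0, 1.]]),   # medium: df = 5
          np.ones((3, 3))]                                   # dense: df = 9
logliks = [-120.0, -100.0, -98.0]
best = bic_select(logliks, thetas, n)
```

The dense model barely improves the fit over the medium one, so the log(n)/2 complexity term makes BIC prefer the medium model.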
68. Outline — next section: Algorithms and methods / Latent structure.
69. MixNet: Erdős–Rényi Mixture for Networks — The data is now the network itself: consider A = (a_ij)_{i,j∈P}, the adjacency matrix associated to Θ̂, a_ij = 1{θ̂_ij ≠ 0}. Latent structure modelling (Daudin et al., 2008): spread the nodes on a set Q = {1, ..., q, ..., Q} of classes, with α a Q-size vector giving α_q = P(i ∈ q); the Z_iq = 1{i∈q} are independent hidden variables, Z_i ~ M(1, α); π is a Q × Q matrix giving π_qℓ = P(a_ij = 1 | i ∈ q, j ∈ ℓ). Connection probabilities depend on the classes the nodes belong to: a_ij | {Z_iq Z_jℓ = 1} ~ B(π_qℓ).
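The generative model above (a stochastic block model) is easy to simulate, which also shows why the latent classes are recoverable: within- and between-class edge densities differ sharply. A minimal numpy sketch with two well-separated, hypothetical classes:

```python
import numpy as np

# MixNet generative process: Z_i ~ M(1, alpha), then
# a_ij ~ Bernoulli(pi[q, l]) given i in class q and j in class l.
rng = np.random.default_rng(42)
p = 300
alpha = np.array([0.5, 0.5])
pi = np.array([[0.90, 0.05],
               [0.05, 0.90]])        # dense within classes, sparse between

z = rng.choice(len(alpha), size=p, p=alpha)
probs = pi[np.ix_(z, z)]
A = (rng.random((p, p)) < probs).astype(int)
A = np.triu(A, 1)
A = A + A.T                          # undirected, no self-loops

within = A[np.ix_(z == 0, z == 0)].mean()
between = A[np.ix_(z == 0, z == 1)].mean()
```

The empirical densities `within` and `between` concentrate around π_00 = 0.9 and π_01 = 0.05, which is the signal the variational EM of the next slides exploits.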
70. Estimation strategy — Likelihoods: the observed data, P(A | α, π) = Σ_Z P(A, Z | α, π); the complete data, P(A, Z | α, π). The EM criterion, E[log P(A, Z | α, π) | A], requires P(Z | A, α, π), which is not tractable!
71. Variational inference — Principle: approximate P(Z | A, α, π) by R_τ(Z), chosen to minimize KL(R_τ(Z); P(Z | A, α, π)), where R_τ is such that log R_τ(Z) = Σ_{iq} Z_iq log τ_iq and the τ are the variational parameters to optimize. Variational Bayes (Latouche et al.): put appropriate priors on α and π; gives good performance, especially for the choice of Q, and is thus relevant in the SIMoNe context.
72. Outline — next section: Numerical experiments.
73. Network generation — Fix the number p = card(P) of nodes, and whether the graph is directed or not. Affiliation matrix A = (a_ij)_{i,j∈P}: 1. usual MixNet framework: the Q × Q matrix Π, with π_qℓ = P(a_ij = 1 | i ∈ q, j ∈ ℓ), and the Q-size vector α with α_q = P(i ∈ q); 2. constrained MixNet version: the Q × Q matrix Π, with π_qℓ = card{(i, j) ∈ P × P : i ∈ q, j ∈ ℓ}, and the Q-size vector α with α_q = card({i ∈ P : i ∈ q})/p.
74. Gaussian data generation — The Θ matrix: 1. undirected case (Θ is the concentration matrix): compute the normalized Laplacian of A; generate a symmetric pattern of random signs. 2. directed case (Θ represents the VAR(1) parameters): generate random correlations for a_ij ≠ 0; normalize by the eigenvalue with greatest modulus; generate a pattern of random signs. The Gaussian sample X: 1. undirected case: compute Σ̂ by pseudo-inversion of Θ, then generate the multivariate Gaussian sample with a Cholesky decomposition of Σ̂; 2. directed case: Θ permits generating a stable VAR(1) process.
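The undirected generation scheme can be sketched end to end: graph → positive-definite concentration matrix → covariance → Cholesky sampling. A minimal numpy sketch; using the plain Laplacian plus the identity (rather than the normalized Laplacian and random signs of the slide) is an assumption made here to guarantee positive definiteness in a few lines:

```python
import numpy as np

rng = np.random.default_rng(7)

# Chain graph 0 - 1 - 2 as an adjacency matrix.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)

deg = A.sum(axis=1)
theta = np.diag(deg + 1.0) - A       # Laplacian + I: diagonally dominant, PD
sigma = np.linalg.inv(theta)         # covariance by inversion of Theta
L = np.linalg.cholesky(sigma)
X = rng.normal(size=(2000, 3)) @ L.T # rows are draws from N(0, sigma)

# Sanity check: the empirical precision matrix recovers Theta's zeros.
emp_prec = np.linalg.inv(X.T @ X / 2000)
```

By construction the missing edge (0, 2) reappears as a near-zero entry of the empirical precision matrix.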
76. 76. Example 1: time-course data with star-pattern Simulation settings 1. 50 networks with p = 100 nodes, time series of length n = 100, 2. two classes, hubs and leaves, with proportions α = (0.1, 0.9), 3. P(hub to leaf) = 0.3, P(hub to hub) = 0.1, 0 otherwise. SIMoNe: inferring Gaussian networks with latent clustering 46
77. 77. Example 1: time-course data with star-pattern Simulation settings 1. 50 networks with p = 100 nodes, time series of length n = 100, 2. two classes, hubs and leaves, with proportions α = (0.1, 0.9), 3. P(hub to leaf) = 0.3, P(hub to hub) = 0.1, 0 otherwise. Figure: boxplots of precision (TP/(TP+FP)), without (wocl) and with (wcl) structure inference, under BIC and AIC. SIMoNe: inferring Gaussian networks with latent clustering 46
78. 78. Example 1: time-course data with star-pattern Simulation settings 1. 50 networks with p = 100 nodes, time series of length n = 100, 2. two classes, hubs and leaves, with proportions α = (0.1, 0.9), 3. P(hub to leaf) = 0.3, P(hub to hub) = 0.1, 0 otherwise. Figure: boxplots of recall (TP/P, the power), without (wocl) and with (wcl) structure inference, under BIC and AIC. SIMoNe: inferring Gaussian networks with latent clustering 46
79. 79. Example 1: time-course data with star-pattern Simulation settings 1. 50 networks with p = 100 nodes, time series of length n = 100, 2. two classes, hubs and leaves, with proportions α = (0.1, 0.9), 3. P(hub to leaf) = 0.3, P(hub to hub) = 0.1, 0 otherwise. Figure: boxplots of fallout (FP/N, the type I error), without (wocl) and with (wcl) structure inference, under BIC and AIC. SIMoNe: inferring Gaussian networks with latent clustering 46
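The precision, recall, and fallout reported in these boxplots can be computed directly from the true and inferred adjacency matrices. A minimal Python sketch using the slide definitions; `edge_metrics` is a hypothetical helper, not part of the package:

```python
import numpy as np

def edge_metrics(a_true, a_hat):
    """Compare an inferred adjacency matrix against the truth over off-diagonal
    entries: precision = TP/(TP+FP), recall = TP/P (power), fallout = FP/N
    (type I error), as defined on the slides."""
    mask = ~np.eye(a_true.shape[0], dtype=bool)   # ignore the diagonal
    t = a_true[mask].astype(bool)
    h = a_hat[mask].astype(bool)
    tp = np.sum(t & h)                            # true positives
    fp = np.sum(~t & h)                           # false positives
    precision = tp / max(tp + fp, 1)
    recall = tp / max(t.sum(), 1)
    fallout = fp / max((~t).sum(), 1)
    return precision, recall, fallout

# tiny example: one true edge recovered, one spurious edge added
a_true = np.array([[0, 1], [0, 0]])
a_hat = np.array([[0, 1], [1, 0]])
prec, rec, fall = edge_metrics(a_true, a_hat)   # -> (0.5, 1.0, 1.0)
```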
80. 80. Example 2: steady-state, multitask framework Simulating the tasks 1. generate an “ancestor” with p = 20 nodes and K = 20 edges, 2. generate T = 4 children by adding and deleting δ edges, 3. generate T = 4 Gaussian samples. Figure: ancestor and children with δ perturbations SIMoNe: inferring Gaussian networks with latent clustering 47
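The child-generation step above (delete δ edges of the ancestor, add δ new ones) can be sketched as follows in Python; `perturb_graph` is an illustrative helper, not the package's simulator, and the undirected graphs are kept symmetric:

```python
import numpy as np

rng = np.random.default_rng(2)

def perturb_graph(a, delta, rng):
    """Create a 'child' network by deleting delta existing edges of the
    ancestor and adding delta new ones; the edge count is preserved."""
    child = a.copy()
    iu = np.triu_indices_from(a, k=1)
    edges = np.flatnonzero(a[iu])          # present edges (upper triangle)
    holes = np.flatnonzero(a[iu] == 0)     # absent edges
    for idx in rng.choice(edges, size=delta, replace=False):
        i, j = iu[0][idx], iu[1][idx]
        child[i, j] = child[j, i] = 0      # delete an edge
    for idx in rng.choice(holes, size=delta, replace=False):
        i, j = iu[0][idx], iu[1][idx]
        child[i, j] = child[j, i] = 1      # add an edge
    return child

# ancestor with p = 20 nodes and K = 20 edges, then T = 4 children with delta = 3
p, K, delta = 20, 20, 3
anc = np.zeros((p, p), dtype=int)
iu = np.triu_indices(p, k=1)
pick = np.random.default_rng(3).choice(len(iu[0]), size=K, replace=False)
anc[iu[0][pick], iu[1][pick]] = 1
anc = anc + anc.T
children = [perturb_graph(anc, delta, rng) for _ in range(4)]
```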
84. 84. Multitask: simulation results Figure: precision/recall curve (precision = TP/(TP+FP) vs. recall = TP/P, the power) and ROC curve (recall vs. fallout = FP/N, the type I error). SIMoNe: inferring Gaussian networks with latent clustering 48
85.–93. Multitask: simulation results Figure: precision/recall and ROC curves as the penalty decreases from λmax to 0, comparing CoopLasso, GroupLasso, Intertwined, Independent, and Pooled, for every combination of nt ∈ {25, 50, 100} and δ ∈ {1, 3, 5}. SIMoNe: inferring Gaussian networks with latent clustering 48
94. 94. Outline Statistical models Steady-state data Time-course data Multitask learning Algorithms and methods Overall view Network inference Model selection Latent structure Numerical experiments Performance on simulated data R package demo: the breast cancer data set SIMoNe: inferring Gaussian networks with latent clustering 49
95. 95. Breast cancer Prediction of the outcome of preoperative chemotherapy Two types of patients Patient response can be classified as 1. either a pathologic complete response (PCR), 2. or residual disease (not PCR). Gene expression data 133 patients (99 not PCR, 34 PCR) 26 identified genes (differential analysis) SIMoNe: inferring Gaussian networks with latent clustering 50
96. 96. Pooling the data cancer data: pooling approach demo/cancer_pooled.swf SIMoNe: inferring Gaussian networks with latent clustering 51
97. 97. Multitask approach: PCR / not PCR cancer data: graphical cooperative Lasso demo/cancer_mtasks.swf SIMoNe: inferring Gaussian networks with latent clustering 52
98. 98. Conclusions To sum up SIMoNe embeds most state-of-the-art statistical methods for GGM inference based upon ℓ1-penalization, both steady-state and time-course data can be dealt with, (hopefully) biologist-friendly R package. Perspectives Adding transversal tools such as network comparison, bootstrap to limit the number of false positives, more criteria to choose the penalty parameter, interface to Gene Ontology. SIMoNe: inferring Gaussian networks with latent clustering 53
100. 100. Publications Ambroise, Chiquet, Matias, 2009. Inferring sparse Gaussian graphical models with latent structure. Electronic Journal of Statistics, 3, 205-238. Chiquet, Smith, Grasseau, Matias, Ambroise, 2009. SIMoNe: Statistical Inference for MOdular NEtworks. Bioinformatics, 25(3), 417-418. Charbonnier, Chiquet, Ambroise, 2010. Weighted-Lasso for Structured Network Inference from Time Course Data. SAGMB, 9. Chiquet, Grandvalet, Ambroise, arXiv preprint. Inferring multiple Gaussian graphical models. Working paper: Chiquet, Charbonnier, Ambroise, Grasseau. SIMoNe: An R package for inferring Gaussian networks with latent structure, Journal of Statistical Software. Working paper: Chiquet, Grandvalet, Ambroise, Jeanmougin. Biological analysis of breast cancer by multitask learning. SIMoNe: inferring Gaussian networks with latent clustering 54