Inferring networks from multiple samples with consensus LASSO

1. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Joint network inference with the consensual LASSO Nathalie Villa-Vialaneix Joint work with Matthieu Vignes, Nathalie Viguerie and Magali San Cristobal Séminaire de Statistique Parisien Paris, 15 septembre 2014 http://www.nathalievilla.org nathalie.villa@toulouse.inra.fr Nathalie Villa-Vialaneix | Consensus Lasso 1/36

2. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Outline 1 Short overview on biological background 2 Network inference and GGM 3 Inference with multiple samples 4 Simulations Nathalie Villa-Vialaneix | Consensus Lasso 2/36

3. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations DNA DNA (DeoxyriboNucleic Acid (DNA) molecule that encodes the genetic instructions used in the development and functioning of all known living organisms and many viruses double helix made with only four nucleotides (Adenine, Cytosine, Thymine, Guanine): A binds with T and C with G. Nathalie Villa-Vialaneix | Consensus Lasso 3/36

4. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Transcription Transcription of DNA transcription is the process by which a particular segment of DNA (called gene) is copied into a single-strand RNA (which is called message RNA) A is transcripted into U, T into A, C into G and G into C Nathalie Villa-Vialaneix | Consensus Lasso 4/36

5. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations What is mRNA used for? mRNA then moves out of the cell nucleus and is translated into proteins which are made of 20 different amino-acids (using an alphabet: 3 letters of mRNA ! 1 amino-acid) Proteins are used by the cell to function. Nathalie Villa-Vialaneix | Consensus Lasso 5/36

6. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Gene expression In a given cell, at a given time, all the genes are not translated or not translated at the same level. Gene expression is a process by which a given gene is transcripted or/and traducted It depends on the type of cell, the environment, ... Nathalie Villa-Vialaneix | Consensus Lasso 6/36

7. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Gene expression In a given cell, at a given time, all the genes are not translated or not translated at the same level. Gene expression is a process by which a given gene is transcripted or/and traducted It depends on the type of cell, the environment, ... Gene expression can be measured either by: “counting” the number of copies (mRNA) of a given gene in the cell: transcriptomic data; “counting” the quantity of a given protein in the cell: proteomic data. Even though a given mRNA is translated into a unique protein, there is no simple relationship between transcriptomic and proteomic data. Nathalie Villa-Vialaneix | Consensus Lasso 6/36

8. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Transcriptomic data How are transcriptomic data obtained? DNA spots associated to target genes (probes) are attached to a solid surface (array) Nathalie Villa-Vialaneix | Consensus Lasso 7/36

9. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Transcriptomic data How are transcriptomic data obtained? RNA material extracting for cells of interest (blood, lipid tissue, muscle, urine...) are labeled fluorescently and applied on the array When the binding between mRNA and the probes is good, the spot becomes fluorescent. Expression of a given gene is quantified by the intensity of the fluorescent signal (read by a scanner). Nathalie Villa-Vialaneix | Consensus Lasso 7/36

10. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Typical microarray data Data: large scale gene expression data individuals n ' 30=50 8>>><>>>: X = 0BBBBBBBB@ : : : : : : : : Xj i : : : : : : : : : 1CCCCCCCCA | {z } variables (genes expression); p'103=4 Nathalie Villa-Vialaneix | Consensus Lasso 8/36

11. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Typical microarray data Data: large scale gene expression data individuals n ' 30=50 8>>><>>>: X = 0BBBBBBBB@ : : : : : : : : Xj i : : : : : : : : : 1CCCCCCCCA | {z } variables (genes expression); p'103=4 Typical design: two (or more) conditions (treated/control for instance) More and more complicated designs: crossed conditions (treated/control and different breeds), longitudinal data... Nathalie Villa-Vialaneix | Consensus Lasso 8/36

12. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Typical microarray data Data: large scale gene expression data individuals n ' 30=50 8>>><>>>: X = 0BBBBBBBB@ : : : : : : : : Xj i : : : : : : : : : 1CCCCCCCCA | {z } variables (genes expression); p'103=4 Typical design: two (or more) conditions (treated/control for instance) More and more complicated designs: crossed conditions (treated/control and different breeds), longitudinal data... Typical issues: find differentially expressed genes (i.e., genes whose expression is significantly different between the conditions) Nathalie Villa-Vialaneix | Consensus Lasso 8/36

14. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Systems biology Instead of being used to produce proteins, some genes’ expressions activate or repress other genes’ expressions ) understanding the whole cascade helps to comprehend the global functioning of living organisms1 1Picture taken from: Abdollahi A et al., PNAS 2007, 104:12890-12895. c 2007 by National Academy of Sciences Nathalie Villa-Vialaneix | Consensus Lasso 10/36

15. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Model framework Data: large scale gene expression data individuals n ' 30=50 8>>><>>>:X = 0BBBBBBBB@ : : : : : : : : Xj i : : : : : : : : : 1CCCCCCCCA | {z } variables (genes expression); p'103=4 What we want to obtain: a graph/network with nodes: (selected) genes; edges: strong links between gene expressions. Nathalie Villa-Vialaneix | Consensus Lasso 11/36

16. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Advantages of network inference 1 over raw data: focuses on the strongest direct relationships: irrelevant or indirect relations are removed (more robust) and the data are easier to visualize and understand (track transcription relations). Nathalie Villa-Vialaneix | Consensus Lasso 12/36

17. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Advantages of network inference 1 over raw data: focuses on the strongest direct relationships: irrelevant or indirect relations are removed (more robust) and the data are easier to visualize and understand (track transcription relations). Expression data are analyzed all together and not by pairs (systems model). Nathalie Villa-Vialaneix | Consensus Lasso 12/36

18. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Advantages of network inference 1 over raw data: focuses on the strongest direct relationships: irrelevant or indirect relations are removed (more robust) and the data are easier to visualize and understand (track transcription relations). Expression data are analyzed all together and not by pairs (systems model). 2 over bibliographic network: can handle interactions with yet unknown (not annotated) genes and deal with data collected in a particular condition. Nathalie Villa-Vialaneix | Consensus Lasso 12/36

19. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Using correlations Relevance network [Butte and Kohane, 1999] First (naive) approach: calculate correlations between expressions for all pairs of genes, threshold the smallest ones and build the network. Correlations Thresholding Graph Nathalie Villa-Vialaneix | Consensus Lasso 13/36

20. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Using partial correlations x y z strong indirect correlation Nathalie Villa-Vialaneix | Consensus Lasso 14/36

21. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Using partial correlations x y z strong indirect correlation set.seed(2807); x <- rnorm(100) y <- 2*x+1+rnorm(100,0,0.1); cor(x,y) [1] 0.998826 z <- 2*x+1+rnorm(100,0,0.1); cor(x,z) [1] 0.998751 cor(y,z) [1] 0.9971105 Nathalie Villa-Vialaneix | Consensus Lasso 14/36

22. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Using partial correlations x y z strong indirect correlation set.seed(2807); x <- rnorm(100) y <- 2*x+1+rnorm(100,0,0.1); cor(x,y) [1] 0.998826 z <- 2*x+1+rnorm(100,0,0.1); cor(x,z) [1] 0.998751 cor(y,z) [1] 0.9971105 ] Partial correlation cor(lm(xz)$residuals,lm(yz)$residuals) [1] 0.7801174 cor(lm(xy)$residuals,lm(zy)$residuals) [1] 0.7639094 cor(lm(yx)$residuals,lm(zx)$residuals) [1] -0.1933699 Nathalie Villa-Vialaneix | Consensus Lasso 14/36

23. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Partial correlation and GGM (Xi)i=1;:::;n are i.i.d. Gaussian random variables N(0; ) (gene expression); then j ! j0(genes j and j0 are linked) , Cor Xj ; Xj0 j(Xk )k,j;j0 0 Nathalie Villa-Vialaneix | Consensus Lasso 15/36

24. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Partial correlation and GGM (Xi)i=1;:::;n are i.i.d. Gaussian random variables N(0; ) (gene expression); then j ! j0(genes j and j0 are linked) , Cor Xj ; Xj0 j(Xk )k,j;j0 0 If (concentration matrix) S = 1, Cor Xj ; Xj0 j(Xk )k,j;j0 = Sjj0 p SjjSj0 j0 ) Estimate 1 to unravel the graph structure Nathalie Villa-Vialaneix | Consensus Lasso 15/36

25. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Partial correlation and GGM (Xi)i=1;:::;n are i.i.d. Gaussian random variables N(0; ) (gene expression); then j ! j0(genes j and j0 are linked) , Cor Xj ; Xj0 j(Xk )k,j;j0 0 If (concentration matrix) S = 1, Cor Xj ; Xj0 j(Xk )k,j;j0 = Sjj0 p SjjSj0 j0 ) Estimate 1 to unravel the graph structure Problem: : p-dimensional matrix and n p ) (b n)1 is a poor estimate of S)! Nathalie Villa-Vialaneix | Consensus Lasso 15/36

26. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Various approaches for inferring networks with GGM Graphical Gaussian Model seminal work: [Schäfer and Strimmer, 2005a, Schäfer and Strimmer, 2005b] (with shrinkage and a proposal for a Bayesian test of significance) estimate 1 by (b n + I)1 use a Bayesian test to test which coefficients are significantly non zero. Nathalie Villa-Vialaneix | Consensus Lasso 16/36

27. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Various approaches for inferring networks with GGM Graphical Gaussian Model seminal work: [Schäfer and Strimmer, 2005a, Schäfer and Strimmer, 2005b] (with shrinkage and a proposal for a Bayesian test of significance) estimate 1 by (b n + I)1 use a Bayesian test to test which coefficients are significantly non zero. relation with LM 8 j, estimate the linear model: Xj =

28. Tj Xj + ; arg max (

29. jj0 )j0 (logMLj) because

30. jj0 = Sjj0 Sjj . Nathalie Villa-Vialaneix | Consensus Lasso 16/36

31. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Various approaches for inferring networks with GGM Graphical Gaussian Model seminal work: [Schäfer and Strimmer, 2005a, Schäfer and Strimmer, 2005b] (with shrinkage and a proposal for a Bayesian test of significance) estimate 1 by (b n + I)1 use a Bayesian test to test which coefficients are significantly non zero. relation with LM 8 j, estimate the linear model: Xj =

32. Tj Xj + ; arg min (

33. jj0 )j0 Xn i=1 Xij

34. Tj Xj i 2 because

35. jj0 = Sjj0 Sjj . Nathalie Villa-Vialaneix | Consensus Lasso 16/36

36. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Various approaches for inferring networks with GGM Graphical Gaussian Model seminal work: [Schäfer and Strimmer, 2005a, Schäfer and Strimmer, 2005b] (with shrinkage and a proposal for a Bayesian test of significance) estimate 1 by (b n + I)1 use a Bayesian test to test which coefficients are significantly non zero. sparse approaches: [Meinshausen and Bühlmann, 2006, Friedman et al., 2008]: 8 j, estimate the linear model: Xj =

37. Tj Xj + ; arg min (

38. jj0 )j0 Xn i=1 Xij

39. Tj Xj i 2 +k

40. jkL1 with k

41. jkL1 = P j0 j

42. jj0 j L1 penalty yields to

43. jj0 = 0 for most j0 (variable selection) Nathalie Villa-Vialaneix | Consensus Lasso 16/36

45. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Motivation for multiple networks inference Pan-European project Diogenes2 (with Nathalie Viguerie, INSERM): gene expressions (lipid tissues) from 204 obese women before and after a low-calorie diet (LCD). Assumption: A common functioning exists regardless the condition; Which genes are linked independently from/depending on the condition? 2http://www.diogenes-eu.org/; see also [Viguerie et al., 2012] Nathalie Villa-Vialaneix | Consensus Lasso 18/36

46. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Naive approach: independent estimations Notations: p genes measured in k samples, each corresponding to a specific condition: (Xc j )j=1;:::;p N(0;c), for c = 1; : : : ; k. For c = 1; : : : ; k, nc independent observations (Xc P ij )i=1;:::;nc and c nc = n. Independent inference Estimation 8 c = 1; : : : ; k and 8 j = 1; : : : ; p, Xc j = Xc nj

47. cj + c j are estimated (independently) by maximizing pseudo-likelihood: L(SjX) = Xk c=1 Xp j=1 Xnc i=1 log P Xc ij jXci ;nj ; Sc j Nathalie Villa-Vialaneix | Consensus Lasso 19/36

48. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Related papers Problem: previous estimation does not use the fact that the different networks should be somehow alike! Previous proposals [Chiquet et al., 2011] replace c bye c = 12 c + 12 and add a sparse penalty; Nathalie Villa-Vialaneix | Consensus Lasso 20/36

49. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Related papers Problem: previous estimation does not use the fact that the different networks should be somehow alike! Previous proposals [Chiquet et al., 2011] replace c bye c = 12 c + 12 and add a sparse penalty; [Chiquet et al., 2011] LASSO and Group-LASSO type penalties to force identical or sign-coherent edges between conditions: X jj0 sX c (Sc jj0 )2 or X jj0 266666664 sX c (Sc jj0 )2 + + sX c (Sc jj0 )2 377777775 ) Sc jj0 = 0 8 c for most entries OR Sc jj0 can only be of a given sign (positive or negative) whatever c Nathalie Villa-Vialaneix | Consensus Lasso 20/36

50. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Related papers Problem: previous estimation does not use the fact that the different networks should be somehow alike! Previous proposals [Chiquet et al., 2011] replace c bye c = 12 c + 12 and add a sparse penalty; [Chiquet et al., 2011] LASSO and Group-LASSO type penalties to force identical or sign-coherent edges between conditions [Danaher et al., 2013] add the penalty P c,c0 kSc Sc0 kL1 ) (sparse penalty over the concentration matrix entries forces to strong consistency between conditions); Nathalie Villa-Vialaneix | Consensus Lasso 20/36

51. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Related papers Problem: previous estimation does not use the fact that the different networks should be somehow alike! Previous proposals [Chiquet et al., 2011] replace c bye c = 12 c + 12 and add a sparse penalty; [Chiquet et al., 2011] LASSO and Group-LASSO type penalties to force identical or sign-coherent edges between conditions [Danaher et al., 2013] add the penalty P c,c0 kSc Sc0 kL1 ) (sparse penalty over the concentration matrix entries forces to strong consistency between conditions); [PMohan P et al., 2012] add a group-LASSO like penalty c,c0 j kSc j Sc0 j kL2 that focuses on differences due to a few number of nodes only. Nathalie Villa-Vialaneix | Consensus Lasso 20/36

52. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Consensus LASSO Proposal Infer multiple networks by forcing them toward a consensual network: i.e., explicitly keeping the differences between conditions under control but with a L2 penalty (allow for more differences than Group-LASSO type penalties). Original optimization: max (

53. c jk )k,j;c=1;:::;C X c 0BBBBBB@ logMLcj X k,j 1CCCCCCA j

54. c jk j : Nathalie Villa-Vialaneix | Consensus Lasso 21/36

55. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Consensus LASSO Proposal Infer multiple networks by forcing them toward a consensual network: i.e., explicitly keeping the differences between conditions under control but with a L2 penalty (allow for more differences than Group-LASSO type penalties). Original optimization: max (

56. c jk )k,j;c=1;:::;C X c 0BBBBBB@ logMLcj X k,j 1CCCCCCA j

57. c jk j : [Ambroise et al., 2009, Chiquet et al., 2011]: is equivalent to minimize p problems having dimension k(p 1): 1 2

58. Tj b njnj

59. j +

60. Tj b jnj + k

61. jkL1 ;

62. j = (

63. 1j ; : : : ;

64. kj ) withb njnj : block diagonal matrix Diag b 1 njnj ; : : : ;b k njnj and similarly forb jnj . Nathalie Villa-Vialaneix | Consensus Lasso 21/36

65. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Consensus LASSO Proposal Infer multiple networks by forcing them toward a consensual network: i.e., explicitly keeping the differences between conditions under control but with a L2 penalty (allow for more differences than Group-LASSO type penalties). Add a constraint to force inference toward a “consensus”

66. cons 1 2

67. Tj b njnj

68. j +

69. Tj b jnj + k

70. jkL1 + X c wck

71. cj

72. cons j k2 L2 with: wc : real number used to weight the conditions (wc = 1 or wc = 1 p nc ); regularization parameter;

73. cons j whatever you want...? Nathalie Villa-Vialaneix | Consensus Lasso 21/36

74. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Choice of a consensus: set one... Typical case: a prior network is known (e.g., from bibliography); with no prior information, use a fixed prior corresponding to (e.g.) global inference ) given (and fixed)

75. cons Nathalie Villa-Vialaneix | Consensus Lasso 22/36

76. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Choice of a consensus: set one... Typical case: a prior network is known (e.g., from bibliography); with no prior information, use a fixed prior corresponding to (e.g.) global inference ) given (and fixed)

77. cons Proposition Using a fixed

78. cons j , the optimization problem is equivalent to minimizing the p following standard quadratic problem in Rk(p1) with L1-penalty: 1 2

79. Tj B1()

80. j +

81. Tj B2() + k

82. jkL1 ; where B1() =b njnj + 2Ik(p1), with Ik(p1) the k(p 1)-identity matrix B2() =b jnj 2Ik(p1)

83. cons with

84. cons = (

85. cons j )T ; : : : ; (

86. cons j )T T Nathalie Villa-Vialaneix | Consensus Lasso 22/36

87. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Choice of a consensus: adapt one during training... Derive the consensus from the condition-specific estimates:

88. cons j = X c nc n

89. cj Nathalie Villa-Vialaneix | Consensus Lasso 23/36

90. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Choice of a consensus: adapt one during training... Derive the consensus from the condition-specific estimates:

91. cons j = X c nc n

92. cj Proposition Using

93. cons j = Pkc =1 nc n

94. cj , the optimization problem is equivalent to minimizing the following standard quadratic problem with L1-penalty: 1 2

95. Tj Sj()

96. j +

97. Tj b jnj + k

98. jkL1 where Sj() =b njnj + 2AT ()A() where A() is a [k(p 1) k(p 1)]-matrix that does not depend on j. Nathalie Villa-Vialaneix | Consensus Lasso 23/36

99. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Computational aspects: optimization 2j Tj 1j Tj 12 Common framework Objective function can be decomposed into: convex part C(

100. j) =

101. Q() +

102. Q() L1-norm penalty P(

103. j) = k

104. jkL1 Nathalie Villa-Vialaneix | Consensus Lasso 24/36

106. j) =

107. Q() +

109. j) = k

110. jkL1 optimization by “active set” [Osborne et al., 2000, Chiquet et al., 2011] 1: repeat( given) 2: Given A and

111. jj0 st:

112. jj0 , 0, 8 j0 2 A, solve (over h) the smooth minimization problem restricted to A C(

113. j + h) + P(

114. j + h) )

115. j

116. j + h 4: until Nathalie Villa-Vialaneix | Consensus Lasso 24/36

118. j) =

119. Q() +

121. j) = k

123. jj0 st:

125. j + h) + P(

126. j + h) )

127. j

128. j + h 3: Update A by adding most violating variables, i.e., variables st: abs

134. @C(

135. j) + @P(

136. j) 0 with [@P(

137. j )]j0 2 [1; 1] if j0 A 4: until Nathalie Villa-Vialaneix | Consensus Lasso 24/36

139. j) =

140. Q() +

142. j) = k

144. jj0 st:

146. j + h) + P(

147. j + h) )

148. j

155. @C(

156. j) + @P(

157. j) 0 with [@P(

158. j )]j0 2 [1; 1] if j0 A 4: until all conditions satisfied i.e. abs

164. @C(

165. j) + @P(

166. j) 0 Nathalie Villa-Vialaneix | Consensus Lasso 24/36

168. j) =

169. Q() +

171. j) = k

173. jj0 st:

175. j + h) + P(

176. j + h) )

177. j

184. @C(

185. j) + @P(

186. j) 0 with [@P(

187. j )]j0 2 [1; 1] if j0 A 4: until all conditions satisfied i.e. abs

193. @C(

194. j) + @P(

195. j) 0 Repeat: large # small using previous

196. as prior Nathalie Villa-Vialaneix | Consensus Lasso 24/36

197. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Bootstrap estimation ' BOLASSO subsample n observations with replacement x x x x x x x x Nathalie Villa-Vialaneix | Consensus Lasso 25/36

198. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Bootstrap estimation ' BOLASSO subsample n observations with replacement x x x x x x x x cLasso estimation ! for varying , (

199. ;b jj0 )jj0 Nathalie Villa-Vialaneix | Consensus Lasso 25/36

201. ;b jj0 )jj0 # threshold keep the first T1 largest coefficients Nathalie Villa-Vialaneix | Consensus Lasso 25/36

203. ;b jj0 )jj0 # threshold keep the first T1 largest coefficients B Frequency table (1; 2) (1; 3) ... (j; j0) ... 130 25 ... 120 ... Nathalie Villa-Vialaneix | Consensus Lasso 25/36

205. ;b jj0 )jj0 # threshold keep the first T1 largest coefficients B Frequency table (1; 2) (1; 3) ... (j; j0) ... 130 25 ... 120 ... ! Keep the T2 most frequent pairs Nathalie Villa-Vialaneix | Consensus Lasso 25/36

207. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Simulated data Expression data with known co-expression network original network (scale free) taken from http://www.comp-sys-bio.org/AGN/data.html (100 nodes, 200 edges, loops removed); rewire a ratio r of the edges to generate k “children” networks (sharing approximately 100(1 2r)% of their edges); Nathalie Villa-Vialaneix | Consensus Lasso 27/36

208. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Simulated data Expression data with known co-expression network original network (scale free) taken from http://www.comp-sys-bio.org/AGN/data.html (100 nodes, 200 edges, loops removed); rewire a ratio r of the edges to generate k “children” networks (sharing approximately 100(1 2r)% of their edges); generate “expression data” with a random Gaussian process from each chid: use the Laplacian of the graph to generate a putative concentration matrix; use edge colors in the original network to set the edge sign; correct the obtained matrix to make it positive; invert to obtain a covariance matrix...; ... which is used in a random Gaussian process to generate expression data (with noise). Nathalie Villa-Vialaneix | Consensus Lasso 27/36

209. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations An example with k = 2, r = 5% mother network3 first child second child 3actually the parent network. My co-author wisely noted that the mistake was unforgivable for a feminist... Nathalie Villa-Vialaneix | Consensus Lasso 28/36

210. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Choice for T2 Data: r = 0:05, k = 2 and n1 = n2 = 20 100 bootstrap samples, = 1, T1 = 250 or 500 l l 30 selections at least 30 selections at least 1.00 0.75 0.50 0.25 0.00 0.00 0.25 0.50 0.75 precision recall 43 selections at least l l 40 selections at least 1.00 0.75 0.50 0.25 0.00 0.00 0.25 0.50 0.75 1.00 precision recall Dots correspond to best F = 2 precisionrecall precision+recall ) Best F corresponds to selecting a number of edges approximately equal to the number of edges in the original network. Nathalie Villa-Vialaneix | Consensus Lasso 29/36

211. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Choice for T1 and T1 % of improvement 0.1/1 f250; 300; 500g of bootstrapping network sizes rewired edges: 5% 20-20 1 500 30.69 20-30 0.1 500 11.87 30-30 1 300 20.15 50-50 1 300 14.36 20-20-20-20-20 1 500 86.04 30-30-30-30 0.1 500 42.67 network sizes rewired edges: 20% 20-20 0.1 300 -17.86 20-30 0.1 300 -18.35 30-30 1 500 -7.97 50-50 0.1 300 -7.83 20-20-20-20-20 0.1 500 10.27 30-30-30-30 1 500 13.48 Nathalie Villa-Vialaneix | Consensus Lasso 30/36

212. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Comparison with other approaches Method compared (direct and bootstrap approaches) independant Graphical LASSO estimation gLasso methods implementated in the R package simone and described in [Chiquet et al., 2011]: intertwinned LASSO iLasso, cooperative LASSO coopLasso and group LASSO groupLasso fused graphical LASSO as described in [Danaher et al., 2013] as implemented in the R package fgLasso Nathalie Villa-Vialaneix | Consensus Lasso 31/36

213. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Comparison with other approaches Method compared (direct and bootstrap approaches) independant Graphical LASSO estimation gLasso methods implementated in the R package simone and described in [Chiquet et al., 2011]: intertwinned LASSO iLasso, cooperative LASSO coopLasso and group LASSO groupLasso fused graphical LASSO as described in [Danaher et al., 2013] as implemented in the R package fgLasso consensus Lasso with fixed prior (the mother network) cLasso(p) fixed prior (average over the conditions of independant estimations) cLasso(2) adaptative estimation of the prior cLasso(m) Nathalie Villa-Vialaneix | Consensus Lasso 31/36

214. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Comparison with other approaches Method compared (direct and bootstrap approaches) independant Graphical LASSO estimation gLasso methods implementated in the R package simone and described in [Chiquet et al., 2011]: intertwinned LASSO iLasso, cooperative LASSO coopLasso and group LASSO groupLasso fused graphical LASSO as described in [Danaher et al., 2013] as implemented in the R package fgLasso consensus Lasso with fixed prior (the mother network) cLasso(p) fixed prior (average over the conditions of independant estimations) cLasso(2) adaptative estimation of the prior cLasso(m) Parameters set to: T1 = 500, B = 100, = 1 Nathalie Villa-Vialaneix | Consensus Lasso 31/36

215. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Selected results (best F) rewired edges: 5% - conditions: 2 - sample size: 2 30 direct version Method gLasso iLasso groupLasso coopLasso 0.28 0.35 0.32 0.35 Method fgLasso cLasso(m) cLasso(p) cLasso(2) 0.32 0.31 0.86 0.30 bootstrap version Method gLasso iLasso groupLasso coopLasso 0.31 0.34 0.36 0.34 Method fgLasso cLasso(m) cLasso(p) cLasso(2) 0.36 0.37 0.86 0.35 Conclusions bootstraping improves performance (except for iLasso and for large r) joint inference improves performance using a good prior is (as expected) very efficient adaptive approach for cLasso is better than naive approach Nathalie Villa-Vialaneix | Consensus Lasso 32/36

216. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Real data 204 obese women ; expression of 221 genes before and after a LCD = 1 ; T1 = 1000 (target density: 4%) Distribution of the number of times an edge is selected over 100 bootstrap samples 1500 1000 500 0 0 25 50 75 100 Counts Frequency (70% of the pairs of nodes are never selected) ) T2 = 80 Nathalie Villa-Vialaneix | Consensus Lasso 33/36

217. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Networks Before diet l l l l l l l FADS2 l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l AZGP1 CIDEA FADS1 GPD1L PCK2 After diet l l l l l l l FADS2 l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l AZGP1 CIDEA FADS1 GPD1L PCK2 densities about 1:3% - some interactions (both shared and specific) make sense to the biologist Nathalie Villa-Vialaneix | Consensus Lasso 34/36

218. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Thank you for your attention... Programs available in the R package therese (on R-Forge)4. Joint work with Magali SanCristobal Matthieu Vignes (GenPhySe, INRA Toulouse) (Massey University, NZ) Nathalie Viguerie (I2MC, INSERM Toulouse) 4https://r-forge.r-project.org/projects/therese-pkg Nathalie Villa-Vialaneix | Consensus Lasso 35/36

219. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations Questions? Nathalie Villa-Vialaneix | Consensus Lasso 36/36

220. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations References Ambroise, C., Chiquet, J., and Matias, C. (2009). Inferring sparse Gaussian graphical models with latent structure. Electronic Journal of Statistics, 3:205–238. Butte, A. and Kohane, I. (1999). Unsupervised knowledge discovery in medical databases using relevance networks. In Proceedings of the AMIA Symposium, pages 711–715. Chiquet, J., Grandvalet, Y., and Ambroise, C. (2011). Inferring multiple graphical structures. Statistics and Computing, 21(4):537–553. Danaher, P., Wang, P., and Witten, D. (2013). The joint graphical lasso for inverse covariance estimation accross multiple classes. Journal of the Royal Statistical Society Series B. Forthcoming. Friedman, J., Hastie, T., and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432–441. Meinshausen, N. and Bühlmann, P. (2006). High dimensional graphs and variable selection with the lasso. Annals of Statistic, 34(3):1436–1462. Mohan, K., Chung, J., Han, S., Witten, D., Lee, S., and Fazel, M. (2012). Structured learning of Gaussian graphical models. In Proceedings of NIPS (Neural Information Processing Systems) 2012, Lake Tahoe, Nevada, USA. Osborne, M., Presnell, B., and Turlach, B. (2000). Nathalie Villa-Vialaneix | Consensus Lasso 36/36

221. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations On the LASSO and its dual. Journal of Computational and Graphical Statistics, 9(2):319–337. Schäfer, J. and Strimmer, K. (2005a). An empirical bayes approach to inferring large-scale gene association networks. Bioinformatics, 21(6):754–764. Schäfer, J. and Strimmer, K. (2005b). A shrinkage approach to large-scale covariance matrix estimation and implication for functional genomics. Statistical Applications in Genetics and Molecular Biology, 4:1–32. Viguerie, N., Montastier, E., Maoret, J., Roussel, B., Combes, M., Valle, C., Villa-Vialaneix, N., Iacovoni, J., Martinez, J., Holst, C., Astrup, A., Vidal, H., Clément, K., Hager, J., Saris, W., and Langin, D. (2012). Determinants of human adipose tissue gene expression: impact of diet, sex, metabolic status and cis genetic regulation. PLoS Genetics, 8(9):e1002959. Nathalie Villa-Vialaneix | Consensus Lasso 36/36

Inferring networks from multiple samples with consensus LASSO

Recommended

Recommended

More Related Content

What's hot

What's hot (8)

Viewers also liked

Viewers also liked (20)

Similar to Inferring networks from multiple samples with consensus LASSO

Similar to Inferring networks from multiple samples with consensus LASSO (20)

More from tuxette

More from tuxette (20)

Recently uploaded

Recently uploaded (20)

Inferring networks from multiple samples with consensus LASSO