Biological Network Inference via Gaussian Graphical Models


  1. An introduction to Biological Network Inference via Gaussian Graphical Models. Christophe Ambroise, Julien Chiquet (Statistique et Génome, CNRS & Université d'Évry Val d'Essonne). São Paulo School on Advanced Science, October 2012. http://stat.genopole.cnrs.fr/~cambroise
  2. Outline
     Introduction: motivations; background on omics; modeling issues.
     Modeling tools: statistical dependence; graphical models; covariance selection and Gaussian vectors.
     Gaussian graphical models for genomic data: steady-state data; time-course data.
     Statistical inference: penalized likelihood approach; inducing sparsity and regularization; the Lasso.
     Application in post-genomics: modeling time-course data; illustrations; multitask learning.
  5. Real networks. Many scientific fields: the World Wide Web, biology, sociology, physics. Nature of the data under study: interactions between N objects, with O(N^2) possible interactions. Network topology: describes the way nodes interact (structure/function). Example: a sample of 250 blogs (nodes) with their link relationships (edges) from the French political blogosphere.
  6. What the reconstructed networks are expected to be (1): regulatory networks. The E. coli regulatory network describes relationships between genes and their products (inhibition/activation). Such networks are impossible to recover at large scale and are always incomplete (and presumably partly wrong).
  7. What the reconstructed networks are expected to be (2): regulatory networks. Figure: regulatory network identified in mammalian cells, highly structured.
  8. What the reconstructed networks are expected to be (3): protein-protein interaction networks. Figure: yeast PPI network. Do not be misled by the representation; trust the statistics!
  12. What are we looking at? The central dogma of molecular biology: DNA is transcribed into mRNA, which is translated into proteins (DNA also replicates). Proteins are the building blocks of any cellular functionality, are encoded by the genes, and interact with one another (at the protein and at the gene level, through regulation).
  13. What questions in functional genomics? (1) Various levels/scales of study: the genome (sequence analysis), the transcriptome (gene expression levels), the proteome (protein functions and interactions). Questions: 1. biological understanding (mechanisms of diseases, gene/protein functions and interactions); 2. medical/clinical care (diagnosis: type of disease; prognosis: survival analysis; treatment: prediction of response).
  15. What questions in functional genomics? (2) Central dogma of molecular biology: DNA, transcription, mRNA, translation, proteins, plus replication. Basic biostatistical issues: selecting some genes of interest (biomarkers), and looking for interactions between them (pathway analysis).
  16. How is this measured? (1) Microarray technology: parallel measurement of many biological features. After signal processing and pretreatment, the data form a matrix of features with n much smaller than p: X = (x_i^j), i = 1, ..., n, j = 1, ..., p, where the expression levels of p probes are simultaneously monitored for n individuals.
  17. How is this measured? (2) Next-generation sequencing: parallel measurement of even more biological features. After assembly and pretreatment, the data again form an n x p matrix of features X = (k_i^j): expression counts extracted from small repeated sequences and monitored for n individuals.
  18. What questions are we dealing with? (1) Supervised canonical example at the gene level: differential analysis. Leukemia (Golub data, thanks to P. Neuvial): AML (acute myeloblastic leukemia, n1 = 11), ALL (acute lymphoblastic leukemia, n2 = 27), and an outcome vector of length n1 + n2 giving each patient's tumor type. Supervised classification: find genes with significantly different expression levels between groups (biomarkers); the purpose is prediction.
  19. What questions are we dealing with? (2) Unsupervised canonical example at the gene level: hierarchical clustering. Same kind of data, but no outcome is considered. (Unsupervised) clustering: find groups of genes that show statistical dependencies/commonalities, hoping they reflect biological interactions; the purpose is exploratory, towards functional understanding. Can we do better than that? And how do genes interact anyway?
  22. The problem at hand. Inference from roughly tens to hundreds of microarray/sequencing experiments over roughly thousands of probes ("genes"). Modeling questions prior to inference: 1. What do the nodes represent? (the easiest one) 2. What is, or should be, the meaning of an edge? (the toughest one) Biologically? Statistically?
  26. More questions and issues. Modelling: Is the network dynamic or static? How have the data been generated (time-course or steady-state)? Are the edges oriented or not (causality)? What do the edges represent for my particular problem? Statistical challenges: (ultra) high dimensionality; noisy data and lack of reproducibility; heterogeneity of the data (many techniques, various signals).
  29. Canonical model settings: biological microarrays in comparable conditions. Notation: 1. a set P = {1, ..., p} of p variables, typically the genes (could be proteins); 2. a sample N = {1, ..., n} of individuals measured on these variables, typically the microarrays (could be sequence counts). Basic statistical model: view the data as a random vector X in R^p whose j-th entry is the j-th variable, together with an n-sample (X_1, ..., X_n) where X_i is the i-th microarray; the copies may be independent and identically distributed (steady-state data) or dependent in a specified way (time-course data); assume a parametric probability distribution for X (Gaussian). Stacking (X_1, ..., X_n) row-wise gives the usual n x p individual/variable table X, from which inference is carried out.
  33. Modeling relationships between variables (1): independence. Definition (independence of events): two events A and B are independent if and only if P(A, B) = P(A)P(B), denoted A ⊥ B. Equivalently, A ⊥ B iff P(A|B) = P(A), and A ⊥ B iff P(A|B) = P(A|B^c). Example (class vs. party), joint probabilities P(class, party): working/Labour 0.42, working/Tory 0.28, bourgeoisie/Labour 0.06, bourgeoisie/Tory 0.24; conditional probabilities P(party | class): working: Labour 0.60, Tory 0.40; bourgeoisie: Labour 0.20, Tory 0.80.
  35. Modeling relationships between variables (2): conditional independence. Generalizing to more than two events requires strong assumptions (mutual independence); conditional independence is a better handle. Definition (conditional independence of events): two events A and B are conditionally independent given C if and only if P(A, B | C) = P(A|C)P(B|C), denoted A ⊥ B | C. Example (does IQ depend on weight?): consider the events A = "having a low IQ" and B = "having a low weight". Naively estimating P(A, B), P(A) and P(B) in a sample would lead to P(A, B) ≠ P(A)P(B); but introducing C = "having a given age", one finds P(A, B | C) = P(A|C)P(B|C).
  39. Independence of random vectors (1): independence and conditional independence generalize naturally. Definition: consider three random vectors X, Y, Z with densities f_X, f_Y, f_Z and joint densities f_XY, f_XYZ. Then X and Y are independent iff f_XY(x, y) = f_X(x) f_Y(y); and X and Y are conditionally independent given Z, for every z with f_Z(z) > 0, iff f_{XY|Z}(x, y; z) = f_{X|Z}(x; z) f_{Y|Z}(y; z). Proposition (factorization criterion): X and Y are independent (resp. conditionally independent given Z) iff there exist functions g and h such that, for all x and y, f_XY(x, y) = g(x) h(y) (resp. f_XYZ(x, y, z) = g(x, z) h(y, z) for all z with f_Z(z) > 0).
  41. Independence of random vectors (2): independence vs. conditional independence. Figure: graphical situations for (X, Y, Z), from mutual independence (f = f_X f_Y f_Z), through the single conditional independences X ⊥ Y | Z, X ⊥ Z | Y and Y ⊥ Z | X, to full dependence (f = f_XYZ).
  43. Definition. A graphical model gives a graphical (intuitive) representation of the dependence structure of a probability distribution: graphical structure ↔ random variables/random vector. It links 1. a random vector (or a set of random variables) X = {X_1, ..., X_p} with distribution P, and 2. a graph G = (P, E) where P = {1, ..., p} is the set of nodes, one per variable, and E is a set of edges describing the dependence relationships of X ~ P.
  45. Conditional independence graphs: definition. The conditional independence graph of a random vector X is the undirected graph G = (P, E) with node set P = {1, ..., p} and edge set defined by (i, j) ∉ E ⇔ X_i ⊥ X_j | X_{P\{i,j}}. Property: it satisfies the Markov property: any two subsets of variables separated by a third one are independent conditionally on the variables in the third set.
  47. Conditional independence graphs: an example. Let X_1, X_2, X_3, X_4 be four random variables with joint probability density function f_X(x) = exp(u + x_1 + x_1 x_2 + x_2 x_3 x_4), with u a given constant. Applying the factorization criterion, f_X(x) = exp(u) · exp(x_1 + x_1 x_2) · exp(x_2 x_3 x_4). Graphical representation: G = (P, E) with P = {1, 2, 3, 4} and, reading the edges off the factors, E = {(1, 2), (2, 3), (2, 4), (3, 4)}.
  51. Directed acyclic conditional independence graphs (DAG): motivation. Undirected graphs have limitations: sometimes an ordering of the variables is known, which allows one to break the symmetry of the graphical representation and to introduce, in some sense, "causality" into the modeling. Consequences: each element of E has to be directed, and there are no directed cycles in the graph; we thus deal with a directed acyclic graph (DAG).
  52. DAG: definition. Definition (ordering): an ordering ≺ between the variables {1, ..., p} is a relation such that i) for every couple (i, j), either i ≺ j or j ≺ i; ii) ≺ is transitive; iii) ≺ is not reflexive. A natural ordering is obtained when variables are observed across time, and a natural conditioning set for a pair (i, j) with i ≺ j is the past of j, denoted P(j) = {1, ..., j}. Definition (DAG): the directed conditional dependence graph of X is the directed graph G = (P, E) where, for (i, j) such that i ≺ j, (i, j) ∉ E ⇔ X_j ⊥ X_i | X_{P(j)\{i,j}}.
  54. DAG: factorization and Markov property. Another view uses parent/descendant relationships to deal with the ordering of the nodes. Factorization property: f_X(x) = ∏_{k=1}^{p} f_{X_k | pa_k}(x_k | pa_k), where pa_k denotes the parents of node k.
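As an aside, the factorization can be illustrated by ancestral sampling: draw each node given its parents, and the joint log-density is the sum of the conditional log-densities. A minimal R sketch, not from the slides; the four-node linear Gaussian DAG and its coefficients are invented for illustration:

    ## ancestral sampling from a toy linear Gaussian DAG
    set.seed(1)
    n  <- 500
    x1 <- rnorm(n)                          # no parents
    x2 <- rnorm(n)                          # no parents
    x3 <- 0.8 * x1 + rnorm(n)               # pa(3) = {1}
    x4 <- 0.5 * x2 - 0.7 * x3 + rnorm(n)    # pa(4) = {2, 3}
    ## the joint log-density factorizes as on the slide: one conditional term per node
    joint_logdens <- dnorm(x1, log = TRUE) +
                     dnorm(x2, log = TRUE) +
                     dnorm(x3, mean = 0.8 * x1, log = TRUE) +
                     dnorm(x4, mean = 0.5 * x2 - 0.7 * x3, log = TRUE)
    head(joint_logdens)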
  55. DAG: an example. For the seven-node DAG on x_1, ..., x_7, the factorization property gives f_X(x) = f_{X_1} f_{X_2} f_{X_3} f_{X_4 | X_1, X_2, X_3} f_{X_5 | X_1, X_3} f_{X_6 | X_4} f_{X_7 | X_4, X_5}: one factor per node, conditioned on its parents.
  63. DAG: Markov property. Local Markov property: for any Y among the non-descendants of k (Y ∉ de_k, where de_k denotes the descendants of k), X_k ⊥ Y | pa_k; that is, X_k is conditionally independent of its non-descendants given its parents.
  64. Local Markov property: example. In the DAG on x_1, ..., x_5, check that x_4 ⊥ x_5 | {x_2, x_3} using the factorization property: P(x_4 | x_5, x_2, x_3) = P(x_2, x_3, x_4, x_5) / P(x_2, x_3, x_5) = [P(x_2) P(x_3) P(x_4 | x_2, x_3) P(x_5 | x_3)] / [P(x_2) P(x_3) P(x_5 | x_3)] = P(x_4 | x_2, x_3).
  67. Modeling the genomic data: the Gaussian assumption. The data form the n x p matrix X = (x_i^j) from which inference is carried out. Assuming the distribution of X is multivariate Gaussian greatly simplifies the inference: it naturally links independence and conditional independence to covariance and partial covariance, and it gives a straightforward interpretation to the graphical modeling considered previously.
  69. Start gently with the univariate Gaussian distribution. The Gaussian distribution is the natural model for the expression level of a gene (noisy data). We write X ~ N(µ, σ²), so that E X = µ, Var X = σ², f_X(x) = (1/√(2πσ²)) exp{ -(x - µ)² / (2σ²) } and log f_X(x) = -(1/2) log(2πσ²) - (x - µ)² / (2σ²). This is useless, however, for modeling the joint distribution of the expression levels of a whole set of genes.
  71. One step forward: the bivariate Gaussian distribution. We need the concepts of covariance and correlation. Let X, Y be two real random variables. Definitions: cov(X, Y) = E[(X - E X)(Y - E Y)] = E(XY) - E(X)E(Y); ρ_XY = cor(X, Y) = cov(X, Y) / √(Var(X) · Var(Y)). Proposition: cov(X, X) = Var(X) = E[(X - E X)²]; cov(X + Y, Z) = cov(X, Z) + cov(Y, Z); Var(X + Y) = Var(X) + Var(Y) + 2 cov(X, Y); X ⊥ Y ⇒ cov(X, Y) = 0; and X ⊥ Y ⇔ cov(X, Y) = 0 when (X, Y) is Gaussian.
  73. The bivariate Gaussian distribution. f_XY(x, y) = (1 / (2π √det Σ)) exp{ -(1/2) (x - µ_1, y - µ_2) Σ^{-1} (x - µ_1, y - µ_2)^T }, where Σ is the variance/covariance matrix, symmetric and positive definite: Σ = [ Var(X), cov(X, Y); cov(X, Y), Var(Y) ]. If the variables are standardized, Σ = [ 1, ρ_XY; ρ_XY, 1 ] and f_XY(x, y) = (1 / (2π √(1 - ρ_XY²))) exp{ -(x² + y² - 2 ρ_XY x y) / (2 (1 - ρ_XY²)) }, where ρ_XY is the correlation between X and Y and describes the interaction between them.
  75. The bivariate Gaussian distribution: the covariance matrix. Let X ~ N(0, Σ) with unit variances. With ρ_XY = 0, Σ is the identity matrix; with ρ_XY = 0.9, Σ = [ 1, 0.9; 0.9, 1 ]. The shape of the two-dimensional distribution evolves accordingly, from a circular cloud to an elongated one.
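The two covariance matrices above are easy to visualize by simulation. A minimal R sketch, assuming the MASS package is available for mvrnorm (the sample size and plotting choices are illustrative):

    library(MASS)
    set.seed(1)
    Sigma0 <- matrix(c(1, 0,   0,   1), 2, 2)    # rho = 0
    Sigma9 <- matrix(c(1, 0.9, 0.9, 1), 2, 2)    # rho = 0.9
    X0 <- mvrnorm(1000, mu = c(0, 0), Sigma = Sigma0)
    X9 <- mvrnorm(1000, mu = c(0, 0), Sigma = Sigma9)
    par(mfrow = c(1, 2))
    plot(X0, asp = 1, main = "rho = 0")     # roughly circular cloud
    plot(X9, asp = 1, main = "rho = 0.9")   # elongated cloud along the diagonal
    cor(X9)[1, 2]                           # close to 0.9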
  77. Full generalization: the multivariate Gaussian vector. We now need partial covariance and partial correlation. Let X, Y, Z be real random variables. Definitions: cov(X, Y | Z) = cov(X, Y) - cov(X, Z) cov(Y, Z) / Var(Z); ρ_{XY|Z} = (ρ_XY - ρ_XZ ρ_YZ) / √((1 - ρ_XZ²)(1 - ρ_YZ²)). They give the interaction between X and Y once the effect of Z has been removed. Proposition: when X, Y, Z are jointly Gaussian, cov(X, Y | Z) = 0 ⇔ cor(X, Y | Z) = 0 ⇔ X ⊥ Y | Z.
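A small R check, with simulated toy variables not taken from the slides, that the partial-correlation formula above matches the correlation of the residuals once Z has been regressed out:

    set.seed(1)
    n <- 2000
    z <- rnorm(n)
    x <-  1.0 * z + rnorm(n)     # X depends on Z only
    y <- -0.8 * z + rnorm(n)     # Y depends on Z only, so X and Y are independent given Z
    rho_xy <- cor(x, y); rho_xz <- cor(x, z); rho_yz <- cor(y, z)
    ## formula of the slide: should be approximately 0
    (rho_xy - rho_xz * rho_yz) / sqrt((1 - rho_xz^2) * (1 - rho_yz^2))
    ## same quantity via residuals of the two regressions on z
    cor(resid(lm(x ~ z)), resid(lm(y ~ z)))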
  79. The multivariate Gaussian distribution. It provides a model for the expression levels of a whole set P of genes: the Gaussian vector. Let X ~ N(µ, Σ), and consider any block decomposition along a partition {a, b} of P: Σ = [ Σ_aa, Σ_ab; Σ_ba, Σ_bb ]. Then 1. X_a is Gaussian with distribution N(µ_a, Σ_aa), and 2. X_a | X_b = x is Gaussian with distribution N(µ_{a|b}, Σ_{a|b}), both of which are known in closed form.
  82. Steady-state data: scheme. Inference from ≈ 10s of microarrays and ≈ 1000s of probes ("genes"): which interactions?
  83. Modeling the underlying distribution (1). Model for data generation: a microarray can be represented as a multivariate vector X = (X_1, ..., X_p) ∈ R^p; considering n biological replicates in the same condition gives a usual n-sample (X_1, ..., X_n). Consequence: a Gaussian graphical model, X ~ N(µ, Σ) with X_1, ..., X_n i.i.d. copies of X, where Θ = (θ_ij)_{i,j ∈ P} := Σ^{-1} is called the concentration matrix.
  85. Modeling the underlying distribution (2): interpretation as a GGM. Multivariate Gaussian vector and covariance selection: -θ_ij / √(θ_ii θ_jj) = cor(X_i, X_j | X_{P\{i,j}}) = ρ_{ij | P\{i,j}}. Graphical interpretation: the matrix Θ = (θ_ij)_{i,j ∈ P} encodes the network G we are looking for: there is a conditional dependency between X_i and X_j, or equivalently a non-null partial correlation between X_i and X_j, if and only if θ_ij ≠ 0.
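To make the covariance-selection reading concrete, here is a minimal R sketch; the concentration matrix Theta is a hand-made example, not from the slides:

    ## the support of Theta is the conditional independence graph
    Theta <- matrix(c( 2, -1,  0,  0,
                      -1,  2, -1,  0,
                       0, -1,  2, -1,
                       0,  0, -1,  2), 4, 4, byrow = TRUE)
    Sigma <- solve(Theta)          # covariance of the corresponding Gaussian vector (dense!)
    partial_cor <- -cov2cor(Theta) # off-diagonal entries are rho_{ij | rest}
    diag(partial_cor) <- 1
    round(partial_cor, 2)
    ## edges = nonzero off-diagonal entries of Theta; note Sigma itself has no zeros,
    ## so marginal correlations would not reveal the graph
    adjacency <- (abs(Theta) > 1e-12) & !diag(4)
    adjacency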
  88. Time-course data: scheme. Measurements at t_0, t_1, ..., t_n: inference from ≈ 10s of microarrays over time, on ≈ 1000s of probes ("genes"): which interactions?
  89. Modeling time-course data with a DAG. Collecting gene expression: 1. follow-up of one single experiment/individual; 2. time points close enough to ensure dependency between consecutive measurements and homogeneity of the Markov process. The network G then stands for the directed links between the variables X^t_1, ..., X^t_5 at time t and X^{t+1}_1, ..., X^{t+1}_5 at time t+1, repeated along the whole series X^1, X^2, ..., X^n.
  92. DAG: remark. What looks like a cycle in the summary graph G ("argh, there is a cycle") is indeed a DAG in the time-unrolled representation, since edges only go from time t to time t+1. This overcomes the rather restrictive acyclicity requirement.
  93. Modeling the underlying distribution (1). Model for data generation: a microarray is represented as a multivariate vector X = (X_1, ..., X_p) ∈ R^p generated through a first-order vector autoregressive process VAR(1): X^t = Θ X^{t-1} + b + ε^t, t ∈ [1, n], where ε^t is a white noise ensuring the Markov property and X^0 ~ N(0, Σ_0). Consequence: a Gaussian graphical model, with X^t | X^{t-1} ~ N(Θ X^{t-1}, Σ) or, equivalently, X_j^t | X^{t-1} ~ N(Θ_j X^{t-1}, Σ), where Σ is known and Θ_j is the j-th row of Θ.
  95. Written element-wise, the VAR(1) model X^t = Θ X^{t-1} + b + ε^t reads X^t_i = Σ_j θ_ij X^{t-1}_j + b_i + ε^t_i, with X^t = (X^t_1, ..., X^t_p)^T, Θ = (θ_ij) a p x p matrix, b = (b_1, ..., b_p)^T and ε^t = (ε^t_1, ..., ε^t_p)^T. Example with p = 3: Θ = [ θ_11, θ_12, 0; θ_21, 0, 0; 0, θ_32, 0 ].
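A small R sketch, with invented coefficient values, that simulates a VAR(1) with the sparsity pattern of the example Θ above and checks it by per-gene least squares; with n ≪ p one would of course use the penalized estimators discussed later:

    set.seed(1)
    Theta <- matrix(c( 0.5, 0.3, 0,
                      -0.4, 0,   0,
                       0,   0.6, 0), 3, 3, byrow = TRUE)   # same zero pattern as the example
    b <- c(0, 0, 0); n <- 200
    X <- matrix(0, n + 1, 3)
    X[1, ] <- rnorm(3)                                      # X^0
    for (t in 1:n) X[t + 1, ] <- Theta %*% X[t, ] + b + rnorm(3, sd = 0.5)
    ## crude check: row i of the estimate should be close to Theta[i, ]
    round(t(sapply(1:3, function(i) coef(lm(X[-1, i] ~ X[-(n + 1), ] - 1)))), 2)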
  96. Modeling the underlying distribution (3): interpretation as a GGM. The VAR(1) as a covariance selection model: θ_ij = cov(X_i^t, X_j^{t-1} | X^{t-1}_{P\{j}}) / var(X_j^{t-1} | X^{t-1}_{P\{j}}). Graphical interpretation: the matrix Θ = (θ_ij)_{i,j ∈ P} encodes the network G we are looking for: there is a conditional dependency between X_j^{t-1} and X_i^t, or equivalently a non-null partial correlation between X_j^{t-1} and X_i^t, if and only if θ_ij ≠ 0.
  100. The graphical models: a reminder (for goldfish-like memories). Assumption: a microarray can be represented as a multivariate Gaussian vector X. Collecting gene expression: 1. steady-state data lead to an i.i.d. sample; 2. time-course data give a time series. Graphical interpretation: an edge between i and j means a conditional dependency, or equivalently a non-null partial correlation, between X(i) and X(j) in the steady-state case, and between X_t(i) and X_{t-1}(j) in the time-course case. In both cases the graph is encoded in an unknown matrix of parameters Θ.
  103. The maximum likelihood estimator: the natural approach of parametric statistics. Let X be a random vector with distribution f_X(x; Θ), where Θ are the model parameters. Maximum likelihood estimator: Θ̂ = arg max_Θ L(Θ; X), where the log-likelihood L is a function of the parameters, L(Θ; X) = Σ_{k=1}^{n} log f_X(x_k; Θ), with x_k the k-th row of X. Remarks: this is a convex optimization problem, and we just need to detect the nonzero coefficients of Θ.
  104. The penalized likelihood approach. Let Θ be the parameters to infer (the edges). Penalized likelihood: Θ̂ = arg max_Θ L(Θ; X) - λ pen_ℓ1(Θ), where L is the model log-likelihood and pen_ℓ1 is a penalty function tuned by λ > 0. It performs 1. regularization (needed when n ≪ p) and 2. selection (sparsity induced by the ℓ1 norm).
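A quick numerical illustration, with toy dimensions not from the slides, of why regularization is needed when n ≪ p: the empirical covariance is singular, so the unpenalized Gaussian MLE of Θ = Σ^{-1} does not even exist:

    set.seed(1)
    n <- 20; p <- 100
    X <- matrix(rnorm(n * p), n, p)
    S <- cov(X)
    qr(S)$rank        # at most n - 1 = 19, far below p = 100
    ## try(solve(S)) fails: S is not invertible, hence the need for penalized estimators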
  107. A geometric view of sparsity: constrained optimization. We basically want to solve a problem of the form maximize_{β_1, β_2} f(β_1, β_2; X), where f is typically a concave likelihood function; this is strictly equivalent to minimizing the convex function g = -f (for instance the squared loss of ordinary least squares). Adding the constraint Ω(β_1, β_2) ≤ c, where Ω defines a domain that constrains β, is in turn equivalent to solving maximize_{β_1, β_2} f(β_1, β_2; X) - λ Ω(β_1, β_2). How shall we define Ω so as to induce sparsity?
  111. A geometric view of sparsity: supporting hyperplanes. A hyperplane supports a set iff the set is contained in one of the two half-spaces and has at least one point on the hyperplane. Supporting hyperplanes exist at every point of a convex set: they generalize tangents.
  117. A geometric view of sparsity: dual cones. The dual cone generalizes the notion of normal: the shape of the dual cones at the corners of the constraint set determines the sparsity pattern of the solution.
  122. The Lasso. R. Tibshirani, 1996: the Lasso (Least Absolute Shrinkage and Selection Operator); S. Chen, D. Donoho, M. Saunders, 1995: Basis Pursuit; Weisberg, 1980: forward stagewise regression. Constrained form (two-dimensional illustration): minimize_{β ∈ R²} ||y - Xβ||²₂ subject to ||β||₁ = |β_1| + |β_2| ≤ c; equivalently, minimize_{β ∈ R²} (1/2) ||y - Xβ||²₂ + λ ||β||₁. Figure: comparison of the solutions of problems regularized by an ℓ1 norm and by an ℓ2 norm.
  123. Orthogonal case and link to the OLS: shrinkage of the OLS estimate. The Lasso has no analytical solution except in the orthogonal case: when X'X = I (never the case for real data), β̂_j^lasso = sign(β̂_j^ols) max(0, |β̂_j^ols| - λ). Figure: the soft-thresholding function, Lasso estimate against OLS estimate.
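The soft-thresholding rule above is easy to code directly; a two-line R illustration with toy coefficient values (not from the slides):

    soft_threshold <- function(beta_ols, lambda) sign(beta_ols) * pmax(0, abs(beta_ols) - lambda)
    beta_ols <- c(-3, -0.5, 0.2, 1.5, 4)
    soft_threshold(beta_ols, lambda = 1)   # small coefficients are set exactly to zero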
  124. LARS: least angle regression. B. Efron, T. Hastie, I. Johnstone, R. Tibshirani, 2004, Least Angle Regression: an efficient algorithm to compute the Lasso solutions. The LARS solution is a curve giving the solution for each value of λ: it constructs a piecewise-linear path of solutions starting from the null vector and moving towards the OLS estimate, at (almost) the same cost as a single OLS fit, and it is well adapted to cross-validation (which helps us choose λ).
  125. Example: prostate cancer I. Lasso solution path with lars:
    > library(lars)
    > load("prostate.rda")
    > x <- as.matrix(x)
    > x <- scale(as.matrix(x))
    > out <- lars(x, y)
    > plot(out)
  126. Example: prostate cancer II. Figure: Lasso solution path (standardized coefficients against the constraint level, with variables entering the model one at a time).
  127. Choice of the tuning parameter λ, I. Model selection criteria: BIC(λ) = ||y - Xβ̂_λ||²₂ + log(n) · df(β̂_λ) and AIC(λ) = ||y - Xβ̂_λ||²₂ + 2 · df(β̂_λ), where df(β̂_λ) is the number of nonzero entries of β̂_λ. Cross-validation: 1. split the data into K folds; 2. use each fold in turn as the testing set; 3. compute the test error on that fold; 4. average to obtain the CV estimate of the test error; λ is chosen to minimize the CV test error.
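One way to score the lars path with BIC/AIC-type criteria in R; this uses a common Gaussian log-likelihood variant of the criteria (not necessarily the exact scaling of the slide) together with the df and RSS components stored in a lars fit, and assumes x and y from the prostate example above:

    library(lars)
    out <- lars(x, y)
    n   <- length(y)
    bic <- n * log(out$RSS / n) + log(n) * out$df
    aic <- n * log(out$RSS / n) + 2 * out$df
    coef(out)[which.min(bic), ]   # coefficients at the BIC-selected point of the path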
  128. Choice of the tuning parameter λ, II. CV choice of λ:
    > cv.lars(x, y, K = 10)
  129. Choice of the tuning parameter λ, III. Figure: cross-validated MSE along the Lasso path; its minimum indicates the selected amount of penalty.
  130. Many variations. Group-Lasso: activates the variables by groups (given by the user). Adaptive/weighted Lasso: adjusts the penalty level of each variable, according to prior knowledge or with data-driven weights. BoLasso: a bootstrapped version that removes false positives and stabilizes the estimate. And so on, plus many theoretical results.
  133. Problem. Measurements at t_0, t_1, ..., t_n: inference from ≈ 10s of microarrays over time, on ≈ 1000s of probes ("genes"): which interactions? The main statistical issue is the high-dimensional setting.
  134. Handling the scarcity of the data by introducing some prior. Priors should be biologically grounded: 1. few genes effectively interact (sparsity); 2. networks are organized (latent clustering), as in the example graph whose nodes fall into the groups A, B and C.
  137. Penalized log-likelihood. Banerjee et al., JMLR 2008: Θ̂ = arg max_Θ L_iid(Θ; S) - λ ||Θ||_ℓ1, efficiently solved by the graphical Lasso of Friedman et al., 2008. Ambroise, Chiquet, Matias, EJS 2009: use adaptive penalty parameters for the different coefficients, maximizing L̃_iid(Θ; S) - λ ||P_Z ⋆ Θ||_ℓ1, where P_Z is a matrix of weights depending on the underlying clustering Z; this works with the pseudo-log-likelihood (computationally efficient).
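A minimal sketch with the glasso R package (assumed installed), which implements the graphical Lasso of Friedman et al. for the ℓ1-penalized Gaussian log-likelihood above; the data here are random toy numbers standing in for expression profiles:

    library(glasso)
    set.seed(1)
    n <- 50; p <- 20
    X <- matrix(rnorm(n * p), n, p)               # toy data; replace with expression profiles
    S <- cov(X)
    fit <- glasso(S, rho = 0.2)                   # rho plays the role of the penalty lambda
    Theta_hat <- fit$wi                           # estimated concentration matrix
    G_hat <- (abs(Theta_hat) > 1e-8) & !diag(p)   # inferred edges
    sum(G_hat) / 2                                # number of edges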
  139. Neighborhood selection (1). Let X_i be the i-th column of X and X_{\i} be X deprived of its i-th column. Then X_i = X_{\i} β + ε, where β_j = -θ_ij / θ_ii. Meinshausen and Bühlmann, 2006: since sign(cor_{ij | P\{i,j}}) = sign(β_j), select the neighbors of i with arg min_β (1/n) ||X_i - X_{\i} β||²₂ + λ ||β||_ℓ1. The sign pattern of Θ is then inferred after a symmetrization step.
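A rough sketch of the neighborhood-selection idea using the lars package already used above: one lasso regression per gene, then a symmetrization of the estimated edges. The fraction s and the OR symmetrization rule are illustrative choices, not necessarily the authors' exact procedure:

    library(lars)
    neighborhood_selection <- function(X, s = 0.2) {
      p <- ncol(X)
      A <- matrix(FALSE, p, p)
      for (i in 1:p) {
        fit  <- lars(X[, -i, drop = FALSE], X[, i], type = "lasso")
        beta <- predict(fit, s = s, type = "coefficients", mode = "fraction")$coefficients
        A[i, -i] <- abs(beta) > 1e-8       # neighbors selected for node i
      }
      A | t(A)                             # OR rule to symmetrize the edge set
    }
    ## G_hat <- neighborhood_selection(scale(X))   # X: n x p expression matrix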
  140. Neighborhood selection (2). The pseudo-log-likelihood of the i.i.d. Gaussian sample is L̃_iid(Θ; S) = Σ_{i=1}^{p} Σ_{k=1}^{n} log P(X_k(i) | X_k(P\i); Θ_i) = (n/2) log det(D) - (n/2) Trace(D^{-1/2} Θ S Θ D^{-1/2}) - (n/2) log(2π), where D = diag(Θ). Proposition: Θ̂_pseudo, obtained by maximizing L̃_iid(Θ; S) - λ ||Θ||_ℓ1, has the same null entries as those inferred by neighborhood selection.
  141. Structured regularization: introduce prior knowledge by building the weights. 1. Build the weights w from prior biological information: transcription factors vs. regulatees, number of potential binding sites, KEGG pathways, Gene Ontology, and so on. 2. Or build the weight matrix from a clustering algorithm: infer a first network G⁰ with w = 1 for each node, apply a clustering algorithm to G⁰, then re-infer G with w built according to the clustering Z.
