Biological Network Inference via Gaussian Graphical Models

1. An introduction to biological network inference via Gaussian Graphical Models
Christophe Ambroise, Julien Chiquet
Statistique et Génome, CNRS & Université d'Évry Val d'Essonne
São Paulo – School on Advanced Science – October 2012
http://stat.genopole.cnrs.fr/~cambroise
2. Outline
Introduction
  Motivations
  Background on omics
  Modeling issue
Modeling tools
  Statistical dependence
  Graphical models
  Covariance selection and Gaussian vector
Gaussian Graphical Models for genomic data
  Steady-state data
  Time-course data
Statistical inference
  Penalized likelihood approach
  Inducing sparsity and regularization
  The Lasso
Application in Post-genomics
  Modeling time-course data
  Illustrations
  Multitask learning
5. Real networks
- Many scientific fields: World Wide Web; biology, sociology, physics.
- Nature of data under study: interactions between N objects, hence O(N²) possible interactions.
- Network topology: describes the way nodes interact, the structure/function relationship.
Figure: sample of 250 blogs (nodes) with their links (edges) of the French political blogosphere.
6. What the reconstructed networks are expected to be (1)
Regulatory networks (the E. coli regulatory network)
- relationships between genes and their products,
- inhibition/activation,
- impossible to recover at large scale,
- always incomplete¹.
¹ and are presumably wrongly assumed to be
7. What the reconstructed networks are expected to be (2)
Regulatory networks
Figure: Regulatory network identified in mammalian cells: highly structured
8. What the reconstructed networks are expected to be (3)
Protein-protein interaction networks
Figure: yeast PPI network — do not be misled by the representation, trust the statistics!
12. What are we looking at?
Central dogma of molecular biology
DNA –(transcription)→ mRNA –(translation)→ Proteins (DNA also undergoes replication)
Proteins
- are the building blocks of any cellular functionality,
- are encoded by the genes,
- do interact (at the protein and gene level – regulations).
13. What questions in functional genomics? (1)
Various levels/scales of study
- genome: sequence analysis,
- transcriptome: gene expression levels,
- proteome: protein functions and interactions.
Questions
1. Biological understanding
   - mechanisms of diseases,
   - gene/protein functions and interactions.
2. Medical/clinical care
   - diagnosis (type of disease),
   - prognosis (survival analysis),
   - treatment (prediction of response).
15. What questions in functional genomics? (2)
Central dogma of molecular biology
DNA –(transcription)→ mRNA –(translation)→ Proteins (DNA also undergoes replication)
Basic biostatistical issues
- selecting some genes of interest (biomarkers),
- looking for interactions between them (pathway analysis).
16. How is this measured? (1)
Microarray technology: parallel measurement of many biological features (signal processing, then pretreatment).
Matrix of features, with n ≪ p: the expression levels of p probes are simultaneously monitored for n individuals,
$$\mathbf{X} = \begin{pmatrix} x_1^1 & x_1^2 & \cdots & x_1^p \\ \vdots & & & \vdots \\ x_n^1 & x_n^2 & \cdots & x_n^p \end{pmatrix}.$$
17. How is this measured? (2)
Next Generation Sequencing: parallel measurement of even many more biological features (assembling, then pretreatment).
Matrix of features, with n ≪ p: expression counts are extracted from small repeated sequences and monitored for n individuals,
$$\mathbf{X} = \begin{pmatrix} k_1^1 & k_1^2 & \cdots & k_1^p \\ \vdots & & & \vdots \\ k_n^1 & k_n^2 & \cdots & k_n^p \end{pmatrix}.$$
18. What questions are we dealing with? (1)
Supervised canonical example at the gene level: differential analysis
Leukemia (Golub data, thanks to P. Neuvial)
- AML – Acute Myeloblastic Leukemia, n1 = 11,
- ALL – Acute Lymphoblastic Leukemia, n2 = 27,
- an (n1 + n2)-vector of outcomes giving each patient's tumor type.
Supervised classification: find genes with significantly different expression levels between the groups – biomarkers (prediction purpose).
19. What questions are we dealing with? (2)
Unsupervised canonical example at the gene level: hierarchical clustering
Same kind of data, but no outcome is considered.
(Unsupervised) clustering: find groups of genes which show statistical dependencies/commonalities – hoping for biological interactions (exploratory purpose, functional understanding).
Can we do better than that? And how do genes interact anyway?
22. The problem at hand
Inference
- ≈ 10s/100s microarray/sequencing experiments,
- ≈ 1000s probes ("genes").
Modeling questions prior to inference
1. What do the nodes represent? (the easiest one)
2. What is/should be the meaning of an edge? (the toughest one)
   - Biologically?
   - Statistically?
26. More questions/issues
Modelling
- Is the network dynamic or static?
- How has the data been generated? (time-course/steady-state)
- Are the edges oriented or not? (causality)
- What do the edges represent for my particular problem?
Statistical challenges
- (Ultra) high dimensionality,
- noisy data, lack of reproducibility,
- heterogeneity of the data (many techniques, various signals).
29. Canonical model settings
Biological microarrays in comparable conditions
Notation
1. a set P = {1, ..., p} of p variables: these are typically the genes (could be proteins);
2. a sample N = {1, ..., n} of individuals associated with the variables: these are typically the microarrays (could be sequence counts).
Stacking (X^1, ..., X^n), we meet the usual individual/variable table
$$\mathbf{X} = \begin{pmatrix} x_1^1 & x_1^2 & \cdots & x_1^p \\ \vdots & & & \vdots \\ x_n^1 & x_n^2 & \cdots & x_n^p \end{pmatrix}.$$
Basic statistical model
This can be viewed as
- a random vector X in R^p, whose jth entry is the jth variable,
- an n-size sample (X^1, ..., X^n), such that X^i is the ith microarray,
  - either independent identically distributed copies (steady-state data),
  - or dependent in a certain way (time-course data);
- assuming a parametric probability distribution for X (Gaussian).
33. Modeling relationships between variables (1)
Independence
Definition (Independence of events)
Two events A and B are independent if and only if
$$P(A, B) = P(A)\,P(B),$$
which is usually denoted by A ⊥⊥ B. Equivalently,
- A ⊥⊥ B ⇔ P(A|B) = P(A),
- A ⊥⊥ B ⇔ P(A|B) = P(A|Bᶜ).
Example (class vs party)

class         Labour   Tory        class         Labour   Tory
working        0.42    0.28        working        0.60    0.40
bourgeoisie    0.06    0.24        bourgeoisie    0.20    0.80

Table: joint probability (left) vs. conditional probability given class (right).
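A quick check on these numbers: the marginals are P(working) = 0.42 + 0.28 = 0.70 and P(Labour) = 0.42 + 0.06 = 0.48, so under independence we would expect P(working, Labour) = 0.70 × 0.48 = 0.336, not the observed 0.42. Class and party are therefore dependent, which is also visible in the conditional table (0.60 ≠ 0.20).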
35. Modeling relationships between variables (2)
Conditional independence
Generalizing to more than two events requires strong assumptions (mutual independence). This is better handled with
Definition (Conditional independence of events)
Two events A and B are conditionally independent given C if and only if
$$P(A, B \mid C) = P(A|C)\,P(B|C),$$
which is usually denoted by A ⊥⊥ B | C.
Example (Does IQ depend on weight?)
Consider the events A = "having a low IQ" and B = "having a low weight". (Naively) estimating P(A, B), P(A) and P(B) in a sample would lead to P(A, B) ≠ P(A)P(B). But in fact, introducing C = "having a given age",
$$P(A, B \mid C) = P(A|C)\,P(B|C).$$
39. Independence of random vectors (1)
Independence and conditional independence: natural generalization
Definition
Consider three random vectors X, Y, Z with distributions f_X, f_Y, f_Z, and joint distributions f_XY, f_XYZ. Then,
- X and Y are independent iff f_XY(x, y) = f_X(x) f_Y(y);
- X and Y are conditionally independent given Z iff, for all z such that f_Z(z) > 0, f_{XY|Z}(x, y; z) = f_{X|Z}(x; z) f_{Y|Z}(y; z).
Proposition (Factorization criterion)
X and Y are independent (resp. conditionally independent given Z) iff there exist functions g and h such that, for all x and y,
1. f_XY(x, y) = g(x) h(y),
2. f_XYZ(x, y, z) = g(x, z) h(y, z), for all z with f_Z(z) > 0.
41. Independence of random vectors (2)
Independence vs conditional independence
Figure: graphs corresponding to different factorizations of f — mutual independence (f = f_X f_Y f_Z), the three conditional independences (f such that X ⊥⊥ Y | Z, X ⊥⊥ Z | Y, or Y ⊥⊥ Z | X), and full dependence (a general f_XYZ).
43. Definition
Definition
A graphical model gives a graphical (intuitive) representation of the dependence structure of a probability distribution.
Graphical structure ↔ random variables/random vector
It links
1. a random vector (or a set of random variables) X = {X_1, ..., X_p} with distribution P,
2. a graph G = (P, E) where
   - P = {1, ..., p} is the set of nodes, one associated with each variable,
   - E is a set of edges describing the dependence relationships of X ∼ P.
45. Conditional Independence Graphs
Definition
The conditional independence graph of a random vector X is the undirected graph G = (P, E) with set of nodes P = {1, ..., p} and where
$$(i, j) \notin E \iff X_i \perp\!\!\!\perp X_j \mid X_{P \setminus \{i, j\}}.$$
Property
It enjoys the Markov property: any two subsets of variables separated by a third one are independent conditionally on the variables in the third set.
47. Conditional Independence Graphs
An example
Let X_1, X_2, X_3, X_4 be four random variables with joint probability density function f_X(x) = exp(u + x_1 + x_1 x_2 + x_2 x_3 x_4), with u a given constant.
Apply the factorization property:
$$f_X(x) = \exp(u + x_1 + x_1 x_2 + x_2 x_3 x_4) = \exp(u) \cdot \exp(x_1 + x_1 x_2) \cdot \exp(x_2 x_3 x_4).$$
Graphical representation
The factor exp(x_1 + x_1 x_2) couples variables 1 and 2; the factor exp(x_2 x_3 x_4) couples variables 2, 3 and 4 pairwise. Hence G = (P, E) with P = {1, 2, 3, 4} and
$$E = \{(1, 2), (2, 3), (2, 4), (3, 4)\}.$$
51. Directed Acyclic conditional independence Graph (DAG)
Motivation
Limitation of undirected graphs
Sometimes an ordering on the variables is known, which allows us to break the symmetry in the graphical representation and to introduce, in some sense, "causality" in the modeling.
Consequences
- Each element of E has to be directed.
- There is no directed cycle in the graph.
We thus deal with a directed acyclic graph (or DAG).
52. Directed Acyclic conditional independence Graph (DAG)
Definition
Definition (Ordering)
An ordering ≺ between the variables {1, ..., p} is a relation such that: i) for every couple (i, j), either i ≺ j or j ≺ i; ii) ≺ is transitive; iii) ≺ is not reflexive.
- A natural ordering is obtained when the variables are observed across time.
- A natural conditioning set for a pair of variables (i, j) is the past, denoted P(j) = {i : i ≺ j}.
Definition (DAG)
The directed conditional dependence graph of X is the directed graph G = (P, E) where, for (i, j) such that i ≺ j,
$$(i, j) \notin E \iff X_j \perp\!\!\!\perp X_i \mid X_{P(j) \setminus \{i, j\}}.$$
54. Directed Acyclic conditional independence Graph (DAG)
Factorization and Markov property
Another view uses parent/descendant relationships to deal with the ordering of the nodes.
The factorization property
$$f_X(x) = \prod_{k=1}^{p} f_{X_k \mid \mathrm{pa}_k}(x_k \mid \mathrm{pa}_k),$$
where pa_k are the parents of node k.
63. Directed Acyclic conditional independence Graph (DAG)
Markov property
Local Markov property
For any Y ∉ de_k, where de_k are the descendants of k,
$$X_k \perp\!\!\!\perp Y \mid \mathrm{pa}_k,$$
that is, X_k is conditionally independent of its non-descendants given its parents.
67. Modeling the genomic data
Gaussian assumption
The data:
$$\mathbf{X} = \begin{pmatrix} x_1^1 & x_1^2 & \cdots & x_1^p \\ \vdots & & & \vdots \\ x_n^1 & x_n^2 & \cdots & x_n^p \end{pmatrix}.$$
Assuming f_X is multivariate Gaussian greatly simplifies the inference:
- it naturally links independence and conditional independence to the covariance and partial covariance,
- it gives a straightforward interpretation to the graphical modeling previously considered.
69. Start gently with the univariate Gaussian distribution
The Gaussian distribution is the natural model for the expression level of a gene (noisy data).
We note X ∼ N(μ, σ²), so that E X = μ, Var X = σ², and
$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}(x - \mu)^2\right\},$$
and
$$\log f_X(x) = -\log\sqrt{2\pi\sigma^2} - \frac{1}{2\sigma^2}(x - \mu)^2.$$
Useless, though, for modeling the distribution of the expression levels of a whole bunch of genes.
71. One step forward: bivariate Gaussian distribution
Need concepts of covariance and correlation
Let X, Y be two real random variables.
Definitions
$$\mathrm{cov}(X, Y) = \mathbb{E}\left[(X - \mathbb{E}X)(Y - \mathbb{E}Y)\right] = \mathbb{E}(XY) - \mathbb{E}(X)\,\mathbb{E}(Y),$$
$$\rho_{XY} = \mathrm{cor}(X, Y) = \frac{\mathrm{cov}(X, Y)}{\sqrt{\mathrm{Var}(X) \cdot \mathrm{Var}(Y)}}.$$
Proposition
- cov(X, X) = Var(X) = E[(X − EX)²],
- cov(X + Y, Z) = cov(X, Z) + cov(Y, Z),
- Var(X + Y) = Var(X) + Var(Y) + 2 cov(X, Y),
- X ⊥⊥ Y ⇒ cov(X, Y) = 0,
- X ⊥⊥ Y ⇔ cov(X, Y) = 0 when X, Y are Gaussian.
73. The bivariate Gaussian distribution
$$f_{XY}(x, y) = \frac{1}{2\pi\sqrt{\det\Sigma}} \exp\left\{-\frac{1}{2} \begin{pmatrix} x - \mu_1 & y - \mu_2 \end{pmatrix} \Sigma^{-1} \begin{pmatrix} x - \mu_1 \\ y - \mu_2 \end{pmatrix}\right\},$$
where Σ is the variance/covariance matrix, which is symmetric and positive definite:
$$\Sigma = \begin{pmatrix} \mathrm{Var}(X) & \mathrm{cov}(X, Y) \\ \mathrm{cov}(X, Y) & \mathrm{Var}(Y) \end{pmatrix}.$$
If standardized,
$$\Sigma = \begin{pmatrix} 1 & \rho_{XY} \\ \rho_{XY} & 1 \end{pmatrix}
\quad\text{and}\quad
f_{XY}(x, y) = \frac{1}{2\pi\sqrt{1 - \rho_{XY}^2}} \exp\left\{-\frac{1}{2(1 - \rho_{XY}^2)}\left(x^2 + y^2 - 2\rho_{XY}\, x y\right)\right\},$$
where ρ_XY is the correlation between X and Y and describes the interaction between them.
75. The bivariate Gaussian distribution
The covariance matrix
Let X ∼ N(0, Σ) with unit variances. For ρ_XY = 0,
$$\Sigma = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},$$
while for ρ_XY = 0.9,
$$\Sigma = \begin{pmatrix} 1 & 0.9 \\ 0.9 & 1 \end{pmatrix}.$$
The shape of the 2-D distribution evolves accordingly (figure: the density contours stretch along the diagonal as ρ_XY grows).
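To see this concretely, here is a minimal R sketch (the variable names are illustrative; it assumes the MASS package, whose mvrnorm() samples from a multivariate Gaussian):

    library(MASS)
    Sigma0 <- diag(2)                                  # rho = 0
    Sigma9 <- matrix(c(1, 0.9, 0.9, 1), 2, 2)          # rho = 0.9
    x0 <- mvrnorm(1000, mu = c(0, 0), Sigma = Sigma0)  # round cloud
    x9 <- mvrnorm(1000, mu = c(0, 0), Sigma = Sigma9)  # elongated cloud
    par(mfrow = c(1, 2)); plot(x0, asp = 1); plot(x9, asp = 1)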
77. Full generalization: multivariate Gaussian vector
Now need partial covariance and partial correlation
Let X, Y, Z be real random variables.
Definitions
$$\mathrm{cov}(X, Y \mid Z) = \mathrm{cov}(X, Y) - \frac{\mathrm{cov}(X, Z)\,\mathrm{cov}(Y, Z)}{\mathrm{Var}(Z)},$$
$$\rho_{XY|Z} = \frac{\rho_{XY} - \rho_{XZ}\,\rho_{YZ}}{\sqrt{1 - \rho_{XZ}^2}\,\sqrt{1 - \rho_{YZ}^2}}.$$
These give the interaction between X and Y once the effect of Z has been removed.
Proposition
When X, Y, Z are jointly Gaussian, then
$$\mathrm{cov}(X, Y \mid Z) = 0 \iff \mathrm{cor}(X, Y \mid Z) = 0 \iff X \perp\!\!\!\perp Y \mid Z.$$
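The partial-correlation formula translates directly into code; a minimal R sketch (function and input names are illustrative):

    ## partial correlation of X and Y given Z, from the pairwise correlations
    pcor_given_z <- function(r_xy, r_xz, r_yz)
      (r_xy - r_xz * r_yz) / (sqrt(1 - r_xz^2) * sqrt(1 - r_yz^2))

    pcor_given_z(0.70, 0.60, 0.80)  # a strong marginal link largely explained by Z

Here a marginal correlation of 0.70 drops to about 0.46 once Z is accounted for.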
79. The multivariate Gaussian distribution
Allows modeling the expression levels of a whole set of genes P.
Gaussian vector
Let X ∼ N(μ, Σ), and consider any block decomposition with {a, b} a partition of P:
$$\Sigma = \begin{pmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{pmatrix}.$$
Then
1. X_a is Gaussian with distribution N(μ_a, Σ_aa);
2. X_a | X_b = x is Gaussian with distribution N(μ_{a|b}, Σ_{a|b}), with known parameters μ_{a|b} = μ_a + Σ_ab Σ_bb⁻¹ (x − μ_b) and Σ_{a|b} = Σ_aa − Σ_ab Σ_bb⁻¹ Σ_ba.
82. Steady-state data: scheme
Inference: from ≈ 10s microarray experiments and ≈ 1000s probes ("genes"), which interactions?
83. Modeling the underlying distribution (1)
Model for data generation
- A microarray can be represented as a multivariate vector X = (X_1, ..., X_p) ∈ R^p.
- Consider n biological replicates in the same condition, which form a usual n-size sample (X^1, ..., X^n).
Consequence: a Gaussian Graphical Model
- X ∼ N(μ, Σ), with X^1, ..., X^n i.i.d. copies of X;
- Θ = (θ_ij)_{i,j ∈ P} := Σ⁻¹ is called the concentration matrix.
85. Modeling the underlying distribution (2)
Interpretation as a GGM
Multivariate Gaussian vector and covariance selection:
$$-\frac{\theta_{ij}}{\sqrt{\theta_{ii}\,\theta_{jj}}} = \mathrm{cor}\left(X_i, X_j \mid X_{P \setminus \{i, j\}}\right) = \rho_{ij \mid P \setminus \{i, j\}}.$$
Graphical interpretation
The matrix Θ = (θ_ij)_{i,j ∈ P} encodes the network G we are looking for: there is an edge between i and j
if and only if there is a conditional dependency between X_i and X_j (equivalently, a non-null partial correlation between X_i and X_j),
if and only if θ_ij ≠ 0.
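In R, the partial-correlation reading of Θ is one line of linear algebra; a minimal sketch with an illustrative 3 × 3 covariance:

    Sigma <- matrix(c(1.0, 0.5, 0.25,
                      0.5, 1.0, 0.5,
                      0.25, 0.5, 1.0), 3, 3)
    Theta <- solve(Sigma)    # concentration matrix
    P <- -cov2cor(Theta)     # off-diagonal: partial correlations rho_{ij|rest}
    diag(P) <- 1
    P                        # zeros in Theta <=> missing edges in G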
88. Time-course data: scheme
Inference: from ≈ 10s microarrays collected over time (t_0, t_1, ..., t_n) and ≈ 1000s probes ("genes"), which interactions?
89. Modeling time-course data with DAG
Collecting gene expression
1. Follow-up of one single experiment/individual;
2. Close enough time-points to ensure
   - dependency between consecutive measurements,
   - homogeneity of the Markov process.
Figure: the network G over the genes {1, ..., 5} stands for the bipartite directed graph between X^t and X^{t+1}; unrolled over time, (X^1, X^2, ..., X^n) forms a chain in which every transition is governed by the same G.
92. DAG: remark
Figure: the network G may contain a cycle (argh! :'(), yet the corresponding time-unrolled graph between X^t and X^{t+1} is indeed a DAG.
This overcomes the rather restrictive acyclicity requirement.
93. Modeling the underlying distribution (1)
Model for data generation
A microarray can be represented as a multivariate vector X = (X_1, ..., X_p) ∈ R^p, generated through a first-order vector autoregressive process VAR(1):
$$X^t = \Theta X^{t-1} + b + \varepsilon^t, \quad t = 1, \dots, n,$$
where ε^t is a white noise ensuring the Markov property, and X^0 ∼ N(0, Σ_0).
Consequence: a Gaussian Graphical Model
- Each X^t | X^{t−1} ∼ N(Θ X^{t−1}, Σ),
- or, equivalently, X_j^t | X^{t−1} ∼ N(Θ_j X^{t−1}, Σ),
where Σ is known and Θ_j is the jth row of Θ.
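A minimal R sketch of this data-generating process (Θ, the noise level, and the dimensions are illustrative assumptions):

    set.seed(1)
    p <- 5; n <- 20
    Theta <- matrix(0, p, p)
    Theta[2, 1] <- 0.8; Theta[3, 2] <- -0.5; Theta[4, 2] <- 0.6  # a few edges
    X <- matrix(0, n, p)
    X[1, ] <- rnorm(p)
    for (t in 2:n)                    # VAR(1) recursion X^t = Theta X^{t-1} + eps^t
      X[t, ] <- Theta %*% X[t - 1, ] + rnorm(p, sd = 0.3)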
96. Modeling the underlying distribution (3)
Interpretation as a GGM
The VAR(1) as a covariance selection model:
$$\theta_{ij} = \frac{\mathrm{cov}\left(X_i^t, X_j^{t-1} \mid X_{P \setminus j}^{t-1}\right)}{\mathrm{var}\left(X_j^{t-1} \mid X_{P \setminus j}^{t-1}\right)}.$$
Graphical interpretation
The matrix Θ = (θ_ij)_{i,j ∈ P} encodes the network G we are looking for: there is a directed edge from j to i
if and only if there is a conditional dependency between X_j^{t−1} and X_i^t (equivalently, a non-null partial correlation between X_j^{t−1} and X_i^t),
if and only if θ_ij ≠ 0.
100. The graphical models: reminder (for goldfish-like memories)
Assumption
A microarray can be represented as a multivariate Gaussian vector X.
Collecting gene expression
1. Steady-state data leads to an i.i.d. sample.
2. Time-course data gives a time series.
Graphical interpretation
- Steady-state: an edge i – j if and only if there is a conditional dependency (equivalently, a non-null partial correlation) between X(i) and X(j).
- Time-course: a directed edge j → i if and only if there is a conditional dependency (equivalently, a non-null partial correlation) between X_t(i) and X_{t−1}(j).
In both cases the structure is encoded in an unknown matrix of parameters Θ.
103. The Maximum likelihood estimator
The natural approach for parametric statistics
Let X be a random vector with distribution defined by f_X(x; Θ), where Θ are the model parameters.
Maximum likelihood estimator
$$\hat\Theta = \arg\max_{\Theta}\, \mathcal{L}(\Theta; \mathbf{X}),$$
where L is the log-likelihood, a function of the parameters:
$$\mathcal{L}(\Theta; \mathbf{X}) = \log \prod_{k=1}^{n} f_X(x_k; \Theta) = \sum_{k=1}^{n} \log f_X(x_k; \Theta),$$
where x_k is the kth row of X.
Remarks
- This is a convex optimization problem.
- We just need to detect the nonzero coefficients of Θ.
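For the steady-state Gaussian model the unpenalized MLE has a closed form, which makes the difficulty explicit; a one-line R sketch (X is an illustrative n × p expression matrix):

    Theta_mle <- solve(var(X))   # MLE of the concentration matrix: S^{-1}

This requires the empirical covariance S to be invertible, which fails as soon as n < p — precisely the post-genomics regime, hence the penalized approach of the next slide.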
104. The penalized likelihood approach
Let Θ be the parameters to infer (the edges).
A penalized likelihood approach
$$\hat\Theta_\lambda = \arg\max_{\Theta}\, \mathcal{L}(\Theta; \mathbf{X}) - \lambda\, \mathrm{pen}_{\ell_1}(\Theta),$$
where
- L is the model log-likelihood,
- pen_{ℓ1} is a penalty function tuned by λ > 0.
It performs
1. regularization (needed when n ≪ p),
2. selection (sparsity induced by the ℓ1-norm).
107. A Geometric View of Sparsity
Constrained optimization
We basically want to solve a problem of the form
$$\max_{\beta_1, \beta_2} f(\beta_1, \beta_2; \mathbf{X}),$$
where f is typically a concave likelihood function. This is strictly equivalent to solving
$$\min_{\beta_1, \beta_2} g(\beta_1, \beta_2; \mathbf{X}),$$
where g = −f is convex (for instance, the squared loss in OLS). Now add a constraint:
$$\max_{\beta_1, \beta_2} f(\beta_1, \beta_2; \mathbf{X}) \quad \text{s.t. } \Omega(\beta_1, \beta_2) \le c,$$
where Ω defines a domain that constrains β. This is in turn equivalent to the penalized form
$$\max_{\beta_1, \beta_2} f(\beta_1, \beta_2; \mathbf{X}) - \lambda\, \Omega(\beta_1, \beta_2).$$
How shall we define Ω to induce sparsity?
111. A Geometric View of Sparsity
Supporting hyperplane
A hyperplane supports a set iff
- the set is contained in one half-space, and
- the set has at least one point on the hyperplane.
There are supporting hyperplanes at all points of a convex set: they generalize tangents.
Figure: supporting hyperplanes at various boundary points of a smooth convex set and of the ℓ1 ball.
117. A Geometric View of Sparsity
Dual cone
The dual cone generalizes normals.
Figure: dual cones at various boundary points; at a smooth point the cone reduces to a single direction, at a corner it is large.
Shape of dual cones ⇒ sparsity pattern: the larger the dual cone at a corner of the constraint set, the more likely the solution sits exactly at that corner, where some coefficients are exactly zero.
122. The LASSO
R. Tibshirani, 1996. The Lasso: Least Absolute Shrinkage and Selection Operator.
S. Chen, D. Donoho, M. Saunders, 1995. Basis Pursuit.
Weisberg, 1980. Forward Stagewise regression.
$$\min_{\beta \in \mathbb{R}^2} \|y - \mathbf{X}\beta\|_2^2 \quad \text{s.t. } \|\beta\|_1 = |\beta_1| + |\beta_2| \le c
\quad\iff\quad
\min_{\beta \in \mathbb{R}^2} \|y - \mathbf{X}\beta\|_2^2 + \lambda \|\beta\|_1.$$
Figure: comparison of the solutions of ℓ1- and ℓ2-regularized problems — the ℓ1 ball has corners on the axes, so the solution can hit them exactly, unlike the ℓ2 ball (β̂^ls denotes the least-squares solution).
123. Orthogonal case and link to the OLS
OLS shrinkage
The Lasso has no analytical solution except in the orthogonal case: when XᵀX = I (never true for real data),
$$\hat\beta_j^{\text{lasso}} = \mathrm{sign}\left(\hat\beta_j^{\text{ols}}\right) \max\left(0, \left|\hat\beta_j^{\text{ols}}\right| - \lambda\right).$$
Figure: the Lasso estimate as a soft-thresholding of the OLS estimate (flat at zero on [−λ, λ], shifted towards zero elsewhere).
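The soft-thresholding rule is a two-liner; a minimal R sketch (the function name is illustrative):

    ## soft-thresholding of the OLS coefficients: the orthogonal-design Lasso
    soft_threshold <- function(beta_ols, lambda)
      sign(beta_ols) * pmax(0, abs(beta_ols) - lambda)

    soft_threshold(c(-3, -0.5, 0.2, 2), lambda = 1)  # -2  0  0  1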
124. LARS: Least Angle Regression
B. Efron, T. Hastie, I. Johnstone, R. Tibshirani, 2004. Least Angle Regression.
An efficient algorithm to compute the Lasso solutions
The LARS solution consists of a curve giving the solution for each value of λ:
- it constructs a piecewise-linear path of solutions, starting from the null vector and moving towards the OLS estimate,
- at (almost) the same cost as OLS,
- well adapted to cross-validation (helps us choose λ).
125. Example: prostate cancer I
Lasso solution path with LARS
> library(lars)
> load("prostate.rda")        # loads the predictors x and the response y
> x <- scale(as.matrix(x))    # standardize the predictors
> out <- lars(x, y)           # compute the whole Lasso path
> plot(out)                   # coefficient profiles along the path
127. Choice of the tuning parameter I
Model selection criteria
$$\mathrm{BIC}(\lambda) = \|y - \mathbf{X}\hat\beta_\lambda\|_2^2 + \log n \cdot \mathrm{df}(\hat\beta_\lambda),
\qquad
\mathrm{AIC}(\lambda) = \|y - \mathbf{X}\hat\beta_\lambda\|_2^2 + 2\, \mathrm{df}(\hat\beta_\lambda),$$
where df(β̂_λ) is the number of nonzero entries in β̂_λ.
Cross-validation
1. Split the data into K folds,
2. use each of the K folds successively as the testing set,
3. compute the test error on that fold,
4. average to obtain the CV estimate of the test error.
λ is chosen to minimize the CV test error.
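For concreteness, here is a hand-rolled K-fold CV sketch over the Lasso path (variable names are illustrative; cv.lars() on the next slide packages exactly this):

    K <- 10
    folds <- sample(rep(1:K, length.out = nrow(x)))
    s_grid <- seq(0, 1, length.out = 50)    # positions along the l1 path
    cv_err <- sapply(1:K, function(k) {
      fit  <- lars(x[folds != k, ], y[folds != k])
      pred <- predict(fit, x[folds == k, ], s = s_grid, mode = "fraction")$fit
      colMeans((y[folds == k] - pred)^2)    # test error per path position
    })
    best_s <- s_grid[which.min(rowMeans(cv_err))]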
128. Choice of the tuning parameter II
CV choice of λ
> cv.lars(x, y, K=10)   # plots the 10-fold CV error curve along the path
130. Many variations
Group-Lasso: activates the variables by groups (given by the user).
Adaptive/weighted Lasso: adjusts the penalty level of each variable, according to prior knowledge or with data-driven weights.
BoLasso: a bootstrapped version that removes false positives and stabilizes the estimate.
Etc., plus many theoretical results.
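As an illustration of the weighted flavour, a minimal R sketch of adaptive-Lasso-style weights (assuming the glmnet package and its penalty.factor argument; the OLS-based weighting shown is one common choice, not the only one):

    library(glmnet)
    w   <- 1 / abs(coef(lm(y ~ x))[-1])        # data-driven weights from OLS
    fit <- glmnet(x, y, penalty.factor = w)    # heavier penalty on weak variables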
133. Problem
Inference: from ≈ 10s microarrays collected over time (t_0, t_1, ..., t_n) and ≈ 1000s probes ("genes"), which interactions?
The main statistical issue is the high-dimensional setting.
134. Handling the scarcity of the data
By introducing some prior
Priors should be biologically grounded:
1. few genes effectively interact (sparsity),
2. networks are organized (latent clustering).
Figure: a 13-gene network (G1, ..., G13) and the same network with its latent clustering into groups A, B and C.
137. Penalized log-likelihood
Banerjee et al., JMLR 2008
$$\hat\Theta_\lambda = \arg\max_{\Theta}\, \mathcal{L}_{\mathrm{iid}}(\Theta; S) - \lambda \|\Theta\|_{\ell_1},$$
efficiently solved by the graphical Lasso of Friedman et al., 2008.
Ambroise, Chiquet, Matias, EJS 2009
Use adaptive penalty parameters for the different coefficients:
$$\tilde{\mathcal{L}}_{\mathrm{iid}}(\Theta; S) - \lambda \|P_Z \star \Theta\|_{\ell_1},$$
where P_Z is a matrix of weights depending on the underlying clustering Z, applied entrywise to Θ. Works with the pseudo log-likelihood (computationally efficient).
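A minimal R sketch of the plain graphical-Lasso step (assuming the glasso package; S, rho and the threshold are illustrative):

    library(glasso)
    S     <- var(scale(X))          # empirical covariance of the expression matrix
    fit   <- glasso(S, rho = 0.1)   # l1-penalized concentration-matrix estimate
    Theta <- fit$wi                 # estimated Sigma^{-1}
    A     <- abs(Theta) > 1e-8      # adjacency of the inferred network
    diag(A) <- FALSE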
139. Neighborhood selection (1)
Let
- X_i be the ith column of X,
- X_{∖i} be X deprived of X_i.
Then
$$\mathbf{X}_i = \mathbf{X}_{\setminus i}\, \beta + \varepsilon, \quad \text{where } \beta_j = -\frac{\theta_{ij}}{\theta_{ii}}.$$
Meinshausen and Bühlmann, 2006
Since sign(cor_{ij|P∖{i,j}}) = sign(β_j), select the neighbors of i with
$$\arg\min_{\beta}\, \frac{1}{n} \left\|\mathbf{X}_i - \mathbf{X}_{\setminus i}\, \beta\right\|_2^2 + \lambda \|\beta\|_{\ell_1}.$$
The sign pattern of Θ is inferred after a symmetrization step.
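In practice one runs p such Lasso regressions and symmetrizes; a minimal R sketch reusing glmnet (the AND rule shown is one common symmetrization, the penalty level is illustrative):

    library(glmnet)
    p <- ncol(X)
    A <- matrix(FALSE, p, p)
    for (i in 1:p) {
      fit <- glmnet(X[, -i], X[, i], lambda = 0.1)  # neighborhood of node i
      A[i, -i] <- as.vector(coef(fit)[-1]) != 0     # drop the intercept
    }
    A <- A & t(A)   # AND rule: keep an edge only if selected in both directions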
140. Neighborhood selection (2)
The pseudo log-likelihood of the i.i.d. Gaussian sample is
$$\tilde{\mathcal{L}}_{\mathrm{iid}}(\Theta; S) = \sum_{i=1}^{p} \log\left(\prod_{k=1}^{n} P\left(X_k(i) \mid X_k(P \setminus i); \Theta_i\right)\right)
= \frac{n}{2} \log\det(D) - \frac{n}{2} \mathrm{Trace}\left(D^{-1/2}\, \Theta S \Theta\, D^{-1/2}\right) - \frac{n}{2}\log(2\pi),$$
where D = diag(Θ).
Proposition
$$\hat\Theta^{\mathrm{pseudo}} = \arg\max_{\Theta}\, \tilde{\mathcal{L}}_{\mathrm{iid}}(\Theta; S) - \lambda \|\Theta\|_{\ell_1}$$
(penalizing the off-diagonal entries θ_ij, i ≠ j) has the same null entries as those inferred by neighborhood selection.
141. Structured regularization
Introduce prior knowledge
Building the weights
1. Build w from prior biological information:
   - transcription factors vs. regulatees,
   - number of potential binding sites,
   - KEGG pathways, Gene Ontology, ...
2. Build the weight matrix from a clustering algorithm:
   - infer the network G⁰ with w = 1 for each node,
   - apply a clustering algorithm to G⁰,
   - re-infer G with w built according to the clustering Z.