Inferring Gene Regulatory Networks from
         Time-Course Gene Expression Data

  Camille Charbonnier and Julien Chiquet and Christophe Ambroise

                    Laboratoire Statistique et Génome,
                      La Génopole - Université d'Évry


                       JOBIM – 7-9 septembre 2010




Camille Charbonnier and Julien Chiquet and Christophe Ambroise     1
Problem




       Time points t0, t1, …, tn → Inference: which interactions?
            ≈ 10s of microarrays over time
            ≈ 1000s of probes (“genes”)




           The main statistical issue is the high dimensional setting.



Handling the scarcity of the data
By introducing some prior

 Priors should be biologically grounded
  1. few genes effectively interact (sparsity),
  2. networks are organized (latent clustering),

 [Figure: a toy network of 13 genes (G1–G13); successive builds group the nodes into latent clusters A, B and C.]
Outline


 Statistical models
    Gaussian Graphical Model for Time-course data
    Structured Regularization

 Algorithms and methods
    Overall view
    Model selection

 Numerical experiments
   Inference methods
   Performance on simulated data
   E. coli S.O.S DNA repair network



Gaussian Graphical Model for Time-course data
 Collecting gene expression
  1. Follow-up of one single experiment/individual;
  2. Close enough time-points to ensure
            dependency between consecutive measurements;
            homogeneity of the Markov process.

 [Figure: an edge in the network G over genes X1, …, X5 stands for a dependency between X_t(j) and X_{t+1}(i) across consecutive time points.]
Gaussian Graphical Model for Time-Course data

 Assumption
 A microarray can be represented as a multivariate Gaussian
 vector X = (X(1), . . . , X(p)) ∈ Rp, following a first-order vector
 autoregressive process VAR(1):

                     X_t = Θ X_{t−1} + b + ε_t,           t ∈ [1, n]

 where we are looking for Θ = (θ_ij)_{i,j∈P}.

 Graphical interpretation
 An edge j → i belongs to the graph
        if and only if there is a conditional dependency between X_{t−1}(j) and X_t(i),
        if and only if the partial correlation between X_{t−1}(j) and X_t(i) is non-null,
        if and only if θ_ij ≠ 0.
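As a quick illustration of the model, one can simulate a short time course from a hand-picked sparse Θ; a minimal numpy sketch with toy dimensions and a hypothetical regulation matrix (not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(42)

p, n = 5, 50                      # genes, time points (toy sizes)
Theta = np.zeros((p, p))          # hypothetical sparse regulation matrix
Theta[0, 1] = 0.8                 # gene 2 up-regulates gene 1
Theta[2, 0] = -0.5                # gene 1 down-regulates gene 3
b = np.zeros(p)                   # baseline expression

# X_t = Theta @ X_{t-1} + b + eps_t : the VAR(1) process above
X = np.zeros((n + 1, p))
for t in range(n):
    X[t + 1] = Theta @ X[t] + b + 0.1 * rng.standard_normal(p)
```

Each row of X is one (simulated) microarray; with only tens of such rows and thousands of genes, the high-dimensional issue of the opening slide appears immediately.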
Gaussian Graphical Model for Time-course data

 Let
        X be the n × p matrix whose kth row is X_k,
        S = n⁻¹ Xₙᵀ Xₙ be the within-time covariance matrix,
        V = n⁻¹ Xₙᵀ X₀ be the across-time covariance matrix.


 The log-likelihood

           L_time(Θ; S, V) = n Trace(VΘ) − (n/2) Trace(Θᵀ S Θ) + c.


   Maximum Likelihood Estimator  Θ̂_MLE = S⁻¹ V
        not defined for n < p;
        even if n > p, requires multiple testing.

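The closed-form MLE amounts to per-gene least squares of the current time point on the previous one; a numpy sketch on simulated data (exact transpose conventions vary with how the past and current observations are stacked):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 4, 500
Theta_true = np.diag([0.5, -0.3, 0.2, 0.4])   # hypothetical true dynamics

X = np.zeros((n + 1, p))
for t in range(n):
    X[t + 1] = Theta_true @ X[t] + 0.5 * rng.standard_normal(p)

past, curr = X[:-1], X[1:]            # X_{t-1} and X_t, stacked row-wise
S = past.T @ past / n                 # covariance of the past
V = past.T @ curr / n                 # across-time cross-covariance
Theta_hat = np.linalg.solve(S, V).T   # MLE: row i regresses gene i on the past
```

With n ≫ p, as here, S is invertible and Θ̂ recovers Θ; with n < p, as in real microarray data, S is singular and the estimator is not defined, which motivates the regularization of the next slide.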
Structured regularization
ℓ1-penalized log-likelihood


 Charbonnier, Chiquet, Ambroise, SAGMB 2010

                Θ̂_λ = argmax_Θ  L_time(Θ; S, V) − λ · Σ_{i,j∈P} P^Z_ij |Θ_ij|

 where λ is an overall tuning parameter and P^Z is a
 (non-symmetric) matrix of weights depending on the underlying
 clustering Z.
 It performs
     1. regularization (needed when n ≪ p),
     2. selection (a specificity of the ℓ1-norm),
     3. cluster-driven inference (penalty adapted to Z).


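A minimal coordinate-descent sketch of such a weighted ℓ1 penalty in regression form (a generic weighted lasso, not the SIMoNe implementation; the per-coefficient weights play the role of λ·P^Z_ij):

```python
import numpy as np

def weighted_lasso(X, y, weights, lam, n_iter=200):
    """Minimize 0.5 * ||y - X b||^2 + lam * sum_j weights[j] * |b_j|
    by cyclic coordinate descent with soft-thresholding."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    r = y.copy()                              # running residual y - X b
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]               # drop coordinate j from the fit
            rho = X[:, j] @ r
            thr = lam * weights[j]            # per-coefficient penalty
            b[j] = np.sign(rho) * max(abs(rho) - thr, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]
    return b

rng = np.random.default_rng(1)
n, p = 100, 5
Xd = rng.standard_normal((n, p))
y = Xd @ np.array([3.0, 0.0, 0.0, 0.0, 0.0]) + 0.1 * rng.standard_normal(n)

w = np.array([1.0, 1.0, 1.0, 1.0, 1e6])      # hypothetical weights: last gene "forbidden"
b = weighted_lasso(Xd, y, w, lam=1.0)
```

A coefficient with a huge weight is thresholded exactly to zero, while a lightly weighted true effect survives almost unshrunk: this is how the penalty matrix steers edge selection.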
Structured regularization
“Bayesian” interpretation of ℓ1 regularization

 Laplacian prior on Θ depends on the clustering Z

                    P(Θ | Z) ∝ ∏_{i,j} exp( −λ · P^Z_ij · |Θ_ij| )

 P^Z summarizes prior information on the position of edges

 [Figure: Laplace prior density of Θij for λP^Z_ij = 1 and λP^Z_ij = 2; the larger the weight, the more the prior concentrates at zero.]
How to come up with a latent clustering?
 Biological expertise
     Build Z from prior biological information
          transcription factors vs. regulatees,
          number of potential binding sites,
          KEGG pathways, . . .
     Build the weight matrix from Z.

 Inference: Erdős–Rényi Mixture for Networks
 (Daudin et al., 2008)

     Spread the nodes into Q classes;
     Connection probabilities depend upon node classes:

                     P(i → j | i ∈ class q, j ∈ class ℓ) = π_qℓ.

     Build P^Z_ij ∝ 1 − π_qℓ.

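Given a clustering and a connectivity matrix, the weights can be built in one line; a toy numpy sketch (the class labels and π values here are made up):

```python
import numpy as np

z = np.array([0, 0, 0, 1, 1, 1])      # hypothetical classes: 3 hubs, 3 leaves
pi = np.array([[0.15, 0.85],          # assumed connectivity matrix pi[q, l]:
               [0.05, 0.02]])         # P(i -> j | i in class q, j in class l)

# Likely edges (large pi) get a small penalty, unlikely edges a large one
PZ = 1.0 - pi[z[:, None], z[None, :]]
```

Note that PZ is non-symmetric whenever π is, which matches the directed nature of the regulation network.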
Algorithm
SIMoNE

 Suppose you want to recover a clustered network (a target adjacency
 matrix and the corresponding target network). Starting from
 microarray data:

  1. Inference without prior: estimate a first adjacency matrix,
     corresponding to a graph G.
  2. Mixer: from G, infer the latent clustering and its connectivity
     matrix π^Z.
  3. Decreasing transformation: turn π^Z into the penalty matrix P^Z.
  4. Inference with clustered prior: combine the data with P^Z to
     estimate a new adjacency matrix, corresponding to G^Z.
Tuning the penalty parameter
BIC / AIC


 Degrees of freedom of the Lasso (Zou et al. 2008)

                        df(β̂_λ) = Σ_k 1(β̂_λ,k ≠ 0)


 Straightforward extensions to the graphical framework

                  BIC(λ) = L(Θ̂_λ; X) − df(Θ̂_λ) · (log n)/2

                     AIC(λ) = L(Θ̂_λ; X) − df(Θ̂_λ)

     Rely on asymptotic approximations,
     but still relevant on simulated small samples.
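These criteria are trivial to compute once a penalized estimate is available; a short sketch (helper names are ours, not the package's API):

```python
import numpy as np

def df_lasso(theta):
    # Zou et al. (2008): the number of nonzero coefficients estimates
    # the degrees of freedom of the lasso fit
    return np.count_nonzero(theta)

def bic(loglik, theta, n):
    # BIC(lambda) = L(theta_hat; X) - df * log(n) / 2
    return loglik - df_lasso(theta) * np.log(n) / 2.0

def aic(loglik, theta):
    # AIC(lambda) = L(theta_hat; X) - df
    return loglik - df_lasso(theta)
```

Scanning a grid of λ values, one keeps the estimate maximizing BIC(λ) (or AIC(λ)).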

Inference methods

  1. Lasso (Tibshirani)
  2. Adaptive Lasso (Zou et al.)
     Weights inversely proportional to an initial Lasso estimate.

  3. KnwCl
     Weights structured according to true clustering.

  4. InfCl
     Weights structured according to inferred clustering.

  5. Renet-VAR (Shimamura et al.)
     Edge estimation based on a recursive elastic net.

  6. G1DBN (Lèbre et al.)
     Edge estimation based on dynamic Bayesian networks, followed by statistical
     testing of edges.

  7. Shrinkage (Opgen-Rhein et al.)
     Edge estimation based on shrinkage, followed by multiple-testing correction
     via local false discovery rates.

Simulations: time-course data with star-pattern

 Simulation settings
     2 classes, hubs and leaves, with proportions α = (0.1, 0.9),
     K = 2p edges, among which:
         85% from hubs to leaves,
         15% between hubs.

                   p genes    n arrays   samples
                      20         40        500
                      20         20        500
                      20         10        500
                     100        100        200
                     100         50        200
                     800         20        100
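The star-pattern settings above can be mimicked with a few lines of numpy; a rough sketch of the sampling scheme (not the authors' simulator; self-loops among hubs are allowed, standing for auto-regulation):

```python
import numpy as np

rng = np.random.default_rng(0)

p = 20
K = 2 * p                                  # target number of edges
n_hubs = int(0.1 * p)                      # class proportions alpha = (0.1, 0.9)
hubs = np.arange(n_hubs)
leaves = np.arange(n_hubs, p)

A = np.zeros((p, p), dtype=int)            # adjacency matrix, A[i, j]: edge i -> j
n_hub_leaf = int(round(0.85 * K))
for _ in range(n_hub_leaf):                # 85% of edges go from hubs to leaves
    A[rng.choice(hubs), rng.choice(leaves)] = 1
for _ in range(K - n_hub_leaf):            # 15% between hubs
    A[rng.choice(hubs), rng.choice(hubs)] = 1
```

Because only hubs emit edges, the resulting adjacency matrix has the strong row-wise structure that the clustered penalty is designed to exploit.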



Simulations: time-course data with star-pattern

 [Figure: precision and recall bars for each setting, a) p=20, n=40 through f) p=800, n=20, comparing LASSO-BIC/AIC, Adaptive-BIC/AIC, KnwCl-BIC/AIC, InfCl-BIC/AIC, Renet-VAR, G1DBN and Shrinkage.]
Reasonable computing time

 [Figure: CPU time on a log scale (1 sec. to 4 hr.) against log of p/n for G1DBN, Renet-VAR and InfCl.]
 Figure: Computing times on the log-log scale for Renet-VAR, G1DBN
 and InfCl (including inference of classes). Intel Dual Core 3.40 GHz
 processor.
E. coli S.O.S DNA repair network
 E.coli S.O.S data




         [Figure: the E. coli S.O.S DNA repair network.]




E. coli S.O.S DNA repair network
Precision and Recall rates


 [Figure: precision and recall on experiments 1–4 for the Lasso, Adaptive, G1DBN, Renet, KnownCl and InferCl methods.]
E. coli S.O.S DNA repair network
Inferred networks

 [Figure: networks over lexA, umuDC, uvrD, recA, polB, uvrA, ruvA and uvrY as inferred by the Lasso, the Adaptive-Lasso, and the structured penalty under the known vs. the inferred classification.]
Conclusions

 To sum up
     cluster-driven inference of gene regulatory networks from
     time-course data,
     expert-based or inferred latent structure,
     embedded in the SIMoNe R package along with similar
     algorithms dealing with steady-state or multitask data.


 Perspectives
     inference of truly dynamic networks,
     use of additional biological information to refine the
     inference,
     comparison of inferred networks.


Publications

     Ambroise, Chiquet, Matias, 2009.
     Inferring sparse Gaussian graphical models with latent structure
     Electronic Journal of Statistics, 3, 205-238.
     Chiquet, Smith, Grasseau, Matias, Ambroise, 2009.
     SIMoNe: Statistical Inference for MOdular NEtworks Bioinformatics,
     25(3), 417-418.
     Charbonnier, Chiquet, Ambroise, 2010.
     Weighted-Lasso for Structured Network Inference from Time
     Course Data., SAGMB, 9.
     Chiquet, Grandvalet, Ambroise, 2010. Statistics and Computing.
     Inferring multiple Gaussian graphical models.
      Working paper: Chiquet, Charbonnier, Ambroise, Grasseau.
      SIMoNe: An R package for inferring Gaussian networks with
      latent structure, Journal of Statistical Software.
      Working paper: Chiquet, Grandvalet, Ambroise, Jeanmougin.
      Biological analysis of breast cancer by multitask learning.

Sparsity with sign-coherent groups of variables via the cooperative-LassoSparsity with sign-coherent groups of variables via the cooperative-Lasso
Sparsity with sign-coherent groups of variables via the cooperative-Lasso
 

Similar to Weighted Lasso for Network inference

ABC & Empirical Lkd
ABC & Empirical LkdABC & Empirical Lkd
ABC & Empirical LkdDeb Roy
 
New Mathematical Tools for the Financial Sector
New Mathematical Tools for the Financial SectorNew Mathematical Tools for the Financial Sector
New Mathematical Tools for the Financial SectorSSA KPI
 
Statistical Analysis of Neural Coding
Statistical Analysis of Neural CodingStatistical Analysis of Neural Coding
Statistical Analysis of Neural CodingYifei Shea, Ph.D.
 
Framework for the analysis and design of encryption strategies based on d...
Framework for the analysis and design of encryption strategies     based on d...Framework for the analysis and design of encryption strategies     based on d...
Framework for the analysis and design of encryption strategies based on d...darg0001
 
Deep Learning for Cyber Security
Deep Learning for Cyber SecurityDeep Learning for Cyber Security
Deep Learning for Cyber SecurityAltoros
 
Vanilla rao blackwellisation
Vanilla rao blackwellisationVanilla rao blackwellisation
Vanilla rao blackwellisationDeb Roy
 
An investigation of inference of the generalized extreme value distribution b...
An investigation of inference of the generalized extreme value distribution b...An investigation of inference of the generalized extreme value distribution b...
An investigation of inference of the generalized extreme value distribution b...Alexander Decker
 
Univariate Financial Time Series Analysis
Univariate Financial Time Series AnalysisUnivariate Financial Time Series Analysis
Univariate Financial Time Series AnalysisAnissa ATMANI
 
short course at CIRM, Bayesian Masterclass, October 2018
short course at CIRM, Bayesian Masterclass, October 2018short course at CIRM, Bayesian Masterclass, October 2018
short course at CIRM, Bayesian Masterclass, October 2018Christian Robert
 
Time-Evolving Relational Classification and Ensemble Methods
Time-Evolving Relational Classification and Ensemble MethodsTime-Evolving Relational Classification and Ensemble Methods
Time-Evolving Relational Classification and Ensemble MethodsRyan Rossi
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...ijceronline
 
Social Network Analysis
Social Network AnalysisSocial Network Analysis
Social Network Analysisrik0
 

Similar to Weighted Lasso for Network inference (14)

ABC & Empirical Lkd
ABC & Empirical LkdABC & Empirical Lkd
ABC & Empirical Lkd
 
New Mathematical Tools for the Financial Sector
New Mathematical Tools for the Financial SectorNew Mathematical Tools for the Financial Sector
New Mathematical Tools for the Financial Sector
 
Statistical Analysis of Neural Coding
Statistical Analysis of Neural CodingStatistical Analysis of Neural Coding
Statistical Analysis of Neural Coding
 
YSC 2013
YSC 2013YSC 2013
YSC 2013
 
Framework for the analysis and design of encryption strategies based on d...
Framework for the analysis and design of encryption strategies     based on d...Framework for the analysis and design of encryption strategies     based on d...
Framework for the analysis and design of encryption strategies based on d...
 
Deep Learning for Cyber Security
Deep Learning for Cyber SecurityDeep Learning for Cyber Security
Deep Learning for Cyber Security
 
Trondheim, LGM2012
Trondheim, LGM2012Trondheim, LGM2012
Trondheim, LGM2012
 
Vanilla rao blackwellisation
Vanilla rao blackwellisationVanilla rao blackwellisation
Vanilla rao blackwellisation
 
An investigation of inference of the generalized extreme value distribution b...
An investigation of inference of the generalized extreme value distribution b...An investigation of inference of the generalized extreme value distribution b...
An investigation of inference of the generalized extreme value distribution b...
 
Univariate Financial Time Series Analysis
Univariate Financial Time Series AnalysisUnivariate Financial Time Series Analysis
Univariate Financial Time Series Analysis
 
short course at CIRM, Bayesian Masterclass, October 2018
short course at CIRM, Bayesian Masterclass, October 2018short course at CIRM, Bayesian Masterclass, October 2018
short course at CIRM, Bayesian Masterclass, October 2018
 
Time-Evolving Relational Classification and Ensemble Methods
Time-Evolving Relational Classification and Ensemble MethodsTime-Evolving Relational Classification and Ensemble Methods
Time-Evolving Relational Classification and Ensemble Methods
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 
Social Network Analysis
Social Network AnalysisSocial Network Analysis
Social Network Analysis
 

Weighted Lasso for Network inference

  • 1. Inferring Gene Regulatory Networks from Time-Course Gene Expression Data. Camille Charbonnier, Julien Chiquet and Christophe Ambroise. Laboratoire Statistique et Génome, La génopole, Université d'Évry. JOBIM, 7-9 September 2010.
  • 2. Problem. From time-course data (t0, t1, ..., tn), with on the order of tens of microarrays over time and thousands of probes ("genes"), infer which interactions hold. The main statistical issue is the high-dimensional setting.
  • 3. Handling the scarcity of the data by introducing some prior. Priors should be biologically grounded: 1. few genes effectively interact (sparsity); 2. networks are organized (latent clustering). [Figure: an example gene network over nodes G1 to G13.]
  • 5. Handling the scarcity of the data by introducing some prior. Priors should be biologically grounded: 1. few genes effectively interact (sparsity); 2. networks are organized (latent clustering). [Figure: the same network with its nodes grouped into clusters A, B and C.]
  • 6. Outline. Statistical models: Gaussian graphical model for time-course data; structured regularization. Algorithms and methods: overall view; model selection. Numerical experiments: inference methods; performance on simulated data; E. coli S.O.S DNA repair network.
  • 7. Gaussian Graphical Model for time-course data: collecting gene expression. 1. Follow-up of one single experiment/individual; 2. time-points close enough to ensure dependency between consecutive measurements and homogeneity of the Markov process. [Figure: a five-gene network G and the induced dependencies between X_t and X_{t+1}.]
  • 9. Gaussian Graphical Model for time-course data: collecting gene expression. [Figure: the data as successive expression vectors X_1, X_2, ..., X_n of the five genes, each transition governed by G.]
  • 10. Gaussian Graphical Model for Time-Course data. Assumption: a microarray can be represented as a multivariate Gaussian vector X = (X(1), ..., X(p)) ∈ R^p, following a first-order vector autoregressive process VAR(1): X_t = Θ X_{t−1} + b + ε_t, t ∈ [1, n], where we are looking for Θ = (θ_ij)_{i,j ∈ P}. Graphical interpretation: conditional dependency between X_{t−1}(j) and X_t(i), i.e. a non-null partial correlation between X_{t−1}(j) and X_t(i), if and only if θ_ij ≠ 0.
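The VAR(1) model above can be sketched in a few lines of code. This is an illustrative simulation, not the authors' code: the sparse Θ, the intercept b and the noise level are made-up values.

```python
import numpy as np

# Simulate the VAR(1) model X_t = Theta X_{t-1} + b + eps_t described above.
# Theta, b and the noise level are illustrative assumptions.
rng = np.random.default_rng(0)
p, n = 5, 50                       # genes, time points

Theta = np.zeros((p, p))
Theta[0, 1] = 0.5                  # theta_ij != 0  <=>  edge j -> i
Theta[2, 1] = -0.4
b = np.zeros(p)

X = np.zeros((n + 1, p))           # rows are X_0, ..., X_n
for t in range(1, n + 1):
    X[t] = Theta @ X[t - 1] + b + 0.1 * rng.standard_normal(p)
```

Each nonzero θ_ij translates into a directed edge j → i in the graphical interpretation above.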
  • 13. Gaussian Graphical Model for time-course data. Let X be the n × p matrix whose kth row is X_k, S = n^{−1} X_n^T X_n the within-time covariance matrix and V = n^{−1} X_n^T X_0 the across-time covariance matrix. The log-likelihood: L_time(Θ; S, V) = n Trace(VΘ) − (n/2) Trace(Θ^T S Θ) + c. Maximum likelihood estimator: Θ_MLE = S^{−1} V; not defined for n < p, and even if n > p, it requires multiple testing.
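A minimal sketch of the MLE Θ_MLE = S^{−1}V on simulated data, assuming n > p so that S is invertible. S is the within-time covariance and V the across-time covariance built from lagged data, as on the slide; the toy Θ is taken symmetric so that transpose conventions do not matter.

```python
import numpy as np

# Sketch of Theta_MLE = S^{-1} V (assumption: n > p so S is invertible).
rng = np.random.default_rng(1)
p, n = 4, 500
Theta_true = 0.4 * np.eye(p)       # symmetric toy network
X = np.zeros((n + 1, p))
for t in range(1, n + 1):
    X[t] = Theta_true @ X[t - 1] + rng.standard_normal(p)

Xpast, Xnow = X[:-1], X[1:]
S = Xpast.T @ Xpast / n            # within-time covariance
V = Xpast.T @ Xnow / n             # across-time covariance
Theta_mle = np.linalg.solve(S, V)  # S^{-1} V; a transpose may be needed
                                   # depending on the row/column convention
```

With n < p the matrix S is singular and this estimator does not exist, which motivates the regularization introduced next.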
  • 14. Structured regularization: ℓ1-penalized log-likelihood (Charbonnier, Chiquet, Ambroise, SAGMB 2010). Θ̂_λ = argmax_Θ L_time(Θ; S, V) − λ Σ_{i,j ∈ P} P^Z_{ij} |Θ_{ij}|, where λ is an overall tuning parameter and P^Z is a (non-symmetric) matrix of weights depending on the underlying clustering Z. It performs: 1. regularization (needed when n ≪ p); 2. selection (specificity of the ℓ1-norm); 3. cluster-driven inference (penalty adapted to Z).
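A weighted-ℓ1 problem of this form can be solved by proximal gradient descent. The sketch below handles one row of Θ at a time, since estimating gene i in the VAR(1) model reduces to a penalized regression of X_t(i) on X_{t−1}; the weights play the role of the entries of P^Z. The solver (plain ISTA), step size and iteration count are illustrative choices, not the SIMoNe implementation.

```python
import numpy as np

# Weighted-l1 (weighted-Lasso) regression via proximal gradient (ISTA):
# minimize 0.5 * ||y - Z beta||^2 + lam * sum_j weights_j * |beta_j|.
def weighted_lasso(Z, y, weights, lam, n_iter=500):
    _, p = Z.shape
    beta = np.zeros(p)
    step = 1.0 / np.linalg.norm(Z, 2) ** 2          # 1 / Lipschitz constant
    for _ in range(n_iter):
        u = beta - step * Z.T @ (Z @ beta - y)      # gradient step
        thr = step * lam * weights                  # per-coefficient threshold
        beta = np.sign(u) * np.maximum(np.abs(u) - thr, 0.0)  # soft-threshold
    return beta
```

With uniform weights this is the plain Lasso; smaller weights on likely within-cluster edges give the cluster-driven penalty.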
  • 15. Structured regularization: "Bayesian" interpretation of ℓ1 regularization. The Laplacian prior on Θ depends on the clustering Z: P(Θ | Z) ∝ exp(−λ Σ_{i,j} P^Z_{ij} |Θ_{ij}|). P^Z summarizes prior information on the position of edges. [Figure: prior density of Θ_ij for λP^Z_ij = 1 versus λP^Z_ij = 2.]
  • 16. How to come up with a latent clustering? Biological expertise: build Z from prior biological information (transcription factors vs. regulatees, number of potential binding sites, KEGG pathways, ...), then build the weight matrix from Z. Inference: Erdős–Rényi Mixture for Networks (Daudin et al., 2008): spread the nodes into Q classes; connexion probabilities depend upon node classes, P(i → j | i ∈ class q, j ∈ class ℓ) = π_qℓ; build P^Z ∝ 1 − π_qℓ.
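The decreasing transformation P^Z ∝ 1 − π_qℓ can be sketched as follows. The class assignment and π values here are made up for illustration.

```python
import numpy as np

# Turn connexion probabilities pi_ql of a mixture of classes into penalty
# weights P^Z = 1 - pi_ql, so that likely edges are penalized less.
classes = np.array([0, 0, 1, 1, 1])     # class of each of p = 5 genes
pi = np.array([[0.80, 0.50],            # pi[q, l] = P(i -> j | q, l)
               [0.10, 0.05]])

PZ = 1.0 - pi[classes[:, None], classes[None, :]]
# e.g. an edge between two class-0 genes (pi = 0.8) gets weight 0.2,
# while an edge between two class-1 genes (pi = 0.05) gets weight 0.95.
```

Fed into the weighted penalty above, small entries of PZ make edges between highly connected classes cheap to select.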
  • 17. Algorithm (SIMoNe). Suppose you want to recover a clustered network. [Figure: target network and the corresponding target adjacency matrix.]
  • 18. Algorithm (SIMoNe). Start with microarray data. [Figure: data matrix.]
  • 19. Algorithm (SIMoNe). Inference without prior. [Figure: from the data to the adjacency matrix corresponding to Ĝ.]
  • 20. Algorithm (SIMoNe). [Figure: the data are also fed to Mixer, whose connectivity matrix π_Z yields, through a decreasing transformation, the penalty matrix P^Z.]
  • 21. Algorithm (SIMoNe). Inference with clustered prior. [Figure: combining the data with the penalty matrix P^Z yields the adjacency matrix corresponding to Ĝ_Z.]
  • 22. Tuning the penalty parameter: BIC / AIC. Degrees of freedom of the Lasso (Zou et al., 2008): df(β̂_λ) = Σ_k 1(β̂_λ,k ≠ 0). Straightforward extensions to the graphical framework: BIC(λ) = L(Θ̂_λ; X) − df(Θ̂_λ) (log n)/2 and AIC(λ) = L(Θ̂_λ; X) − df(Θ̂_λ). They rely on asymptotic approximations, but remain relevant on simulated small samples.
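The criteria above are cheap to evaluate along a penalty path: the degrees of freedom of a Lasso fit are simply its number of nonzero coefficients. The log-likelihood value and the estimate below are made-up numbers for illustration.

```python
import numpy as np

# BIC/AIC for a penalized fit, using df = number of nonzero coefficients.
def df_lasso(theta_hat):
    return np.count_nonzero(theta_hat)

def bic(loglik, theta_hat, n):
    return loglik - df_lasso(theta_hat) * np.log(n) / 2.0

def aic(loglik, theta_hat):
    return loglik - df_lasso(theta_hat)

theta_hat = np.array([[0.5, 0.0],
                      [0.0, -0.3]])      # 2 nonzero entries -> df = 2
```

Along a grid of λ values one keeps the λ maximizing BIC(λ) or AIC(λ).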
  • 23. Inference methods. 1. Lasso (Tibshirani). 2. Adaptive Lasso (Zou et al.): weights inversely proportional to an initial Lasso estimate. 3. KnwCl: weights structured according to the true clustering. 4. InfCl: weights structured according to the inferred clustering. 5. Renet-VAR (Shimamura et al.): edge estimation based on a recursive elastic net. 6. G1DBN (Lèbre et al.): edge estimation based on dynamic Bayesian networks followed by statistical testing of edges. 7. Shrinkage (Opgen-Rhein et al.): edge estimation based on shrinkage followed by multiple-testing local false discovery rate correction.
  • 24. Simulations: time-course data with star-pattern. Simulation settings: 2 classes, hubs and leaves, with proportions α = (0.1, 0.9); K = 2p edges, among which 85% from hubs to leaves and 15% between hubs.

      p genes   n arrays   samples
         20        40        500
         20        20        500
         20        10        500
        100       100        200
        100        50        200
        800        20        100
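The star-pattern design can be sketched as follows. The exact edge-sampling scheme is an assumption for this sketch (repeated draws may collide, so the realized edge count can be slightly below K); the hub/leaf proportions and edge split follow the slide.

```python
import numpy as np

# Star-pattern network: 10% hubs / 90% leaves, K = 2p directed edges,
# 85% hub -> leaf and 15% hub -> hub.
rng = np.random.default_rng(42)
p = 20
K = 2 * p
n_hubs = max(1, round(0.1 * p))
hubs = rng.choice(p, size=n_hubs, replace=False)
leaves = np.setdiff1d(np.arange(p), hubs)

A = np.zeros((p, p), dtype=int)        # A[i, j] = 1  <=>  edge j -> i
n_hub_leaf = round(0.85 * K)
for _ in range(n_hub_leaf):            # hub -> leaf edges
    A[rng.choice(leaves), rng.choice(hubs)] = 1
for _ in range(K - n_hub_leaf):        # hub -> hub edges
    A[rng.choice(hubs), rng.choice(hubs)] = 1
```

In this scheme all edges originate from hubs, so the columns indexed by leaves stay empty.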
  • 25. Simulations: time-course data with star-pattern. [Figure: precision and recall in the six settings (a) p=20, n=40; (b) p=20, n=20; (c) p=20, n=10; (d) p=100, n=100; (e) p=100, n=50; (f) p=800, n=20, for the LASSO, Adaptive Lasso, KnwCl and InfCl (each tuned by AIC and BIC), Renet-VAR, G1DBN and Shrinkage.]
  • 26. Reasonable computing time. [Figure: computing times on the log-log scale (CPU time vs. p/n) for Renet-VAR, G1DBN and InfCl (including inference of classes). Intel Dual Core 3.40 GHz processor.]
  • 27. E. coli S.O.S DNA repair network. [Figure: the E. coli S.O.S data.]
  • 28. E. coli S.O.S DNA repair network: precision and recall rates. [Figure: precision and recall of the Lasso, Adaptive Lasso, KnwCl, InfCl, Renet and G1DBN on experiments 1 to 4.]
  • 29. E. coli S.O.S DNA repair network: inferred networks. [Figure: networks over lexA, umuDC, uvrD, polB, recA, ruvA, uvrA and uvrY, as inferred by the Lasso, the Adaptive Lasso, and the weighted Lasso with known and with inferred classification.]
  • 30. Conclusions. To sum up: cluster-driven inference of gene regulatory networks from time-course data; expert-based or inferred latent structure; embedded in the SIMoNe R package along with similar algorithms dealing with steady-state or multitask data. Perspectives: inference of truly dynamic networks; use of additional biological information to refine the inference; comparison of inferred networks.
  • 31. Publications.
    Ambroise, Chiquet, Matias (2009). Inferring sparse Gaussian graphical models with latent structure. Electronic Journal of Statistics, 3, 205-238.
    Chiquet, Smith, Grasseau, Matias, Ambroise (2009). SIMoNe: Statistical Inference for MOdular NEtworks. Bioinformatics, 25(3), 417-418.
    Charbonnier, Chiquet, Ambroise (2010). Weighted-Lasso for structured network inference from time course data. SAGMB, 9.
    Chiquet, Grandvalet, Ambroise (2010). Inferring multiple Gaussian graphical models. Statistics and Computing.
    Working paper: Chiquet, Charbonnier, Ambroise, Grasseau. SIMoNe: an R package for inferring Gaussian networks with latent structure. Journal of Statistical Software.
    Working paper: Chiquet, Grandvalet, Ambroise, Jeanmougin. Biological analysis of breast cancer by multitask learning.