Inference of the JAK-STAT Gene Network via Graphical Models


AACIMP 2011 Summer School. Operational Research stream. Lecture by Gerhard-Wilhelm Weber.

Published in: Education, Technology

  1. 6th International Summer School, National University of Technology of the Ukraine, Kiev, Ukraine, August 8-20, 2011
     Kiev Summer School: Appendix
     Inference of the JAK-STAT Gene Network via Graphical Models
     Vilda Purutçuoğlu (1), Tuğba Erdem (2), Gerhard Wilhelm Weber (3)
     (1, 2) Department of Statistics, Middle East Technical University, Ankara, Turkey
     (3) Institute of Applied Mathematics, Middle East Technical University, Ankara, Turkey
  2. Outline
     • Introduction
     • JAK-STAT Pathway under IFN Treatment
     • Simulation of the data via Gillespie Algorithm
     • Graphical models:
       1. Graphical model from shrinkage covariance matrix
       2. Lasso-based graphical model
       3. Graphical lasso with L1 penalized likelihood
     • Application and Conclusion
     InterSymp 2011 - Inference of the JAK-STAT Gene Network via Graphical Models
  3. INTRODUCTION
     • A biological network defines the elements and interactions of biologically linked components in cellular metabolism.
     • In graph theory, a network is represented by nodes, which denote genes, proteins or species, and by edges, i.e., interactions or links, between the nodes. Graphical models define such structures through the concept of conditional independence (Whittaker, 1990).
     • Several methods have been proposed to estimate the links in a network, such as Boolean approaches, differential equations, and stochastic modelling (Bower and Bolouri, 2001).
     • Graphical models are an alternative in which the interactions between the nodes can be estimated and the network itself can be inferred in both static and dynamic frameworks.
     • In this study, we estimate the JAK-STAT biological system with realistic complexity via three major approaches of graphical models:
       – the shrinkage covariance method (Schafer and Strimmer, 2005)
       – the lasso-based graphical model (Meinshausen and Bühlmann, 2006)
       – the lasso with L1-penalized regression method (Friedman et al., 2008)
  4. JAK-STAT PATHWAY UNDER IFN TREATMENT
     • The JAK-STAT (Janus kinase/signal transducer and activator of transcription) pathway is one of the major signal transduction systems. It is activated by Type I interferons and regulates cytokine-dependent gene expression and growth factors in mammals.
     • In this study, we consider the description of the system under the IFN treatment, which was developed against the hepatitis C virus (HCV).
     • Maiwald et al. (2010) represent this system under IFN via 40 nodes and 66 reactions, with the stochastic reaction rate constants compiled from different data sources about this pathway.
     • Figure 1 shows a simple representation of the JAK-STAT system under IFN treatment as described in Maiwald et al. (2010), drawn with the simone R package.
     Figure 1: Simple representation of the IFN-mediated JAK-STAT pathway
  5. SIMULATION OF THE DATA VIA THE GILLESPIE ALGORITHM
     • For inference of the system via the graphical models, we use a time-course dataset generated by the Gillespie algorithm, also known as the Direct method (Gillespie, 1977).
     • This algorithm is the most common and usually the most efficient simulator based on the chemical master equation, which describes the stochastic behaviour of a system.
       – Procedure: in each iteration, the Gillespie algorithm draws the time to the next reaction from an exponential distribution whose rate is the total hazard of the system, $t \sim \mathrm{Exp}(h_0(Y))$ with $h_0(Y) = \sum_j h_j(Y)$. Here $Y$ is the state, i.e., the number of molecules of each species, $t$ is the change in time, and the hazard $h_j(Y)$ of reaction $j$ is the number of distinct molecular reactant combinations available in state $Y$ multiplied by the associated reaction rate constant.
       – Once the next reaction time is determined, the algorithm chooses the reaction type randomly with probability $h_j(Y) / h_0(Y)$, where $h_j(Y)$ is the hazard of the $j$th reaction.
       – The system is updated according to the time to the next event and the event type.
     • For the JAK-STAT system, we run this algorithm until the total time reaches 100 time units, initializing the numbers of molecules and the stochastic reaction rate constants as stated in Maiwald et al. (2010). We then take the values at the integer time points t = 90, ..., 99, which gives a measurement dataset of 10 time points for 40 nodes.
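The procedure on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' simulation: the two-reaction system (A + B -> C and C -> A + B), the rate constants `C1` and `C2`, the initial counts, and the helper names `gillespie` and `sample_at` are all hypothetical. The JAK-STAT model of Maiwald et al. (2010), with its 40 nodes and 66 reactions, is not reproduced here.

```python
import random


def gillespie(state, reactions, t_end, seed=0):
    """Direct-method simulation of a reaction system.

    state:     dict mapping species name -> molecule count (the state Y)
    reactions: list of (hazard_fn, change) pairs; hazard_fn(Y) returns h_j(Y)
               and change maps species name -> integer increment
    Returns the list of (time, state) pairs visited up to t_end.
    """
    rng = random.Random(seed)
    state = dict(state)
    t = 0.0
    path = [(t, dict(state))]
    while True:
        hazards = [hazard(state) for hazard, _ in reactions]
        h0 = sum(hazards)                      # total hazard h_0(Y)
        if h0 <= 0.0:                          # no reaction can fire any more
            break
        t += rng.expovariate(h0)               # time to next reaction ~ Exp(h_0(Y))
        if t > t_end:
            break
        # choose reaction j with probability h_j(Y) / h_0(Y)
        j = rng.choices(range(len(reactions)), weights=hazards)[0]
        for species, delta in reactions[j][1].items():
            state[species] += delta
        path.append((t, dict(state)))
    return path


def sample_at(path, times):
    """State in force at each requested time (times must be increasing)."""
    samples, i = [], 0
    for t in times:
        while i + 1 < len(path) and path[i + 1][0] <= t:
            i += 1
        samples.append(path[i][1])
    return samples


# Hypothetical toy system, NOT the JAK-STAT model.
C1, C2 = 0.01, 0.1
toy_reactions = [
    (lambda y: C1 * y["A"] * y["B"], {"A": -1, "B": -1, "C": +1}),  # A + B -> C
    (lambda y: C2 * y["C"], {"A": +1, "B": +1, "C": -1}),           # C -> A + B
]
path = gillespie({"A": 20, "B": 15, "C": 0}, toy_reactions, t_end=10.0, seed=1)
samples = sample_at(path, range(5, 10))        # values at integer time points
```

The `sample_at` helper mirrors the step of reading off the trajectory at integer time points; for the setting described on the slide one would presumably run to `t_end=100` and sample at `range(90, 100)`.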
  6. GRAPHICAL MODELS
     1. Graphical Model From Shrinkage Covariance Matrix
     • The precision matrix is the inverse of the covariance matrix of the nodes in a network, and shrinking the covariance matrix improves its estimation for a sparse network. In this method, the shrunk estimate of the covariance matrix is $S^* = \lambda T + (1 - \lambda) S$, where $S$ is the unbiased estimate of $\Sigma$, $T = \mathrm{diag}(s_{11}, \ldots, s_{pp})$ is a low-dimensional target, and $\lambda$ is the shrinkage parameter, estimated by minimizing the mean squared error loss function (Schafer and Strimmer, 2005).
     • If $\lambda$ is high, the shrunk estimate $S^*$ is dominated by the low-dimensional target $T$ and has lower variance but higher bias. If $\lambda$ is low, $S^*$ stays close to the high-dimensional $S$, with lower bias but higher variance. The objective is therefore to find the optimal value of $\lambda$, which is achieved by minimizing the associated loss function.
     • In the application of this graphical model to the JAK-STAT pathway, we observe that the strengths of the interactions obtained from the shrinkage estimates mostly agree with the current literature under $\lambda = 0.56$. For instance, the estimated strength between IFN_influx and IFN_free, which is relatively high at 0.43, is checked against the biological knowledge, and these two nodes are indeed found to interact strongly.
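As a small worked illustration of the shrinkage step, the sketch below forms the convex combination of the unbiased covariance estimate S and its diagonal target T = diag(s11, ..., spp). It rests on several assumptions: the function names and the 5 x 3 toy data are hypothetical, and the shrinkage weight `lam` is simply fixed by hand (0.56 is the value reported on the slide), whereas Schafer and Strimmer (2005) estimate it from the data, which is not shown here.

```python
def sample_covariance(data):
    """Unbiased covariance estimate S; rows are observations, columns are nodes."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / (n - 1)
            for j in range(p)
        ]
        for i in range(p)
    ]


def shrink_covariance(S, lam):
    """Shrunk estimate lam * T + (1 - lam) * S with T = diag(s_11, ..., s_pp).

    Because T keeps the diagonal of S, the diagonal is unchanged and only
    the off-diagonal entries are scaled by (1 - lam).
    """
    p = len(S)
    return [
        [S[i][j] if i == j else (1.0 - lam) * S[i][j] for j in range(p)]
        for i in range(p)
    ]


# Hypothetical toy measurements: 5 time points for 3 nodes.
toy = [
    [1.0, 2.1, 0.3],
    [2.0, 3.9, 0.1],
    [3.0, 6.2, 0.4],
    [4.0, 8.1, 0.2],
    [5.0, 9.8, 0.5],
]
S = sample_covariance(toy)
S_star = shrink_covariance(S, lam=0.56)   # weight fixed by hand, not estimated
```

With `lam=1` the result is the diagonal target itself, and with `lam=0` it is the unshrunk S; the precision matrix used for the graphical model would then be obtained by inverting `S_star`.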
  7. GRAPHICAL MODELS
     2. Lasso-based Graphical Model
     • To infer the strength of the interactions when the absent links are already known, we can implement the lasso-based regression model, which regresses each node on the remaining ones via $Y^{(p)} = Y^{(-p)} \beta + \varepsilon$, where $p$ is the last node, $-p$ represents the remaining nodes, $\beta$ is the vector of regression coefficients, and $\varepsilon$ is a normally distributed error with zero mean. In the estimation, $\beta$ is found by imposing an L1 penalty on $\beta$ with penalty parameter $\lambda$.
     • This approach also enables us to infer the whole structure, i.e., the existence of nodes and links, in sparse networks. It is computationally efficient and provides a good approximation to the distribution of the variables, but it can produce a non-symmetric covariance matrix (Meinshausen and Bühlmann, 2006; Wit et al., 2010).
     • For the implementation of the lasso-based approach, we check the number of correctly estimated links for each $\lambda$ and observe that the optimal solution for both the strength and the estimated network structure is obtained under $\lambda = 0.0001$, which enforces sparsity in the network.
     Figure 2: Estimated system via the L1 penalized lasso regression under $\lambda = 0.1$.
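The node-wise regression described on this slide can be sketched as follows. This is an illustrative sketch only: it solves the L1-penalized least-squares problem by plain coordinate descent, which is one standard solver and not necessarily the one used by the authors. The objective scaling (1/(2n) times the residual sum of squares plus `lam` times the L1 norm), the function names, and the simulated three-node data are all assumptions made for the example.

```python
import random


def center(data):
    """Subtract the column means so that no intercept is needed."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in data]


def soft_threshold(z, lam):
    """Soft-thresholding operator, the closed-form one-dimensional lasso solution."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    scale = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if scale[j] == 0.0:
                continue
            # correlation of column j with the residual that excludes column j
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            beta[j] = soft_threshold(rho, lam) / scale[j]
    return beta


def neighbourhood(data, node, lam):
    """Regress one node on all remaining nodes; return the nonzero coefficients."""
    others = [k for k in range(len(data[0])) if k != node]
    y = [row[node] for row in data]
    X = [[row[k] for k in others] for row in data]
    beta = lasso_cd(X, y, lam)
    return {k: b for k, b in zip(others, beta) if b != 0.0}


# Hypothetical data: node 1 depends on node 0, node 2 is unrelated noise.
rng = random.Random(0)
toy = []
for _ in range(50):
    x0, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
    toy.append([x0, 2.0 * x0 + 0.1 * rng.gauss(0, 1), x2])
toy = center(toy)

sparse = neighbourhood(toy, node=1, lam=0.01)   # small penalty keeps the true link
empty = neighbourhood(toy, node=1, lam=1e6)     # huge penalty removes every link
```

Running `neighbourhood` once per node gives one coefficient set per node; since the coefficient of node i in the regression of node j need not match that of node j in the regression of node i, the resulting estimate is generally not symmetric, which is the drawback noted on the slide.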
  8. GRAPHICAL MODELS
     3. Graphical Lasso With L1 Penalized Likelihood Approach
     • Unlike the previous lasso method, this approach penalizes the entries of the precision matrix rather than the regression parameters, which ensures a symmetric and invertible covariance matrix in the regression model. The estimation is conducted via the optimization $\max_{\theta} \left( \log|\theta| - \mathrm{trace}(S\theta) - \lambda \|\theta\|_1 \right)$, where $\theta$ is the precision matrix and $\lambda$ is the penalty parameter. To find an optimal value of $\lambda$, ROC-type curves can be used to compare sensitivity and specificity values. The $\lambda$ which maximizes the sensitivity is then chosen as the optimal penalty parameter (Wit et al., 2010).
     • In the application of the L1-penalized lasso regression, we compute the true positive rate versus the false positive rate as shown in Table 1. The results show that the optimal value is obtained at the penalty parameter $\lambda = 0.1$.

     Table 1: The true positive and false positive rates for the L1-penalized lasso regression.
       $\lambda$   True positive rate   False positive rate
       0.1         0.3925               0.2076
       0.5         0.3738               0.2016
       0.7         0.3551               0.1956
       0.75        0.3551               0.1969
       0.8         0.3551               0.1956
       0.9         0.3551               0.1942
       0.95        0.3551               0.1929

     Figure 3: Estimated system via the L1 penalized lasso regression under $\lambda = 0.1$.
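The comparison behind Table 1 can be illustrated with a short sketch that computes the true positive and false positive rates of an estimated network against a known one, and then picks the penalty value with the highest true positive rate. The helper `edge_rates`, the two 4-node adjacency matrices, and the convention of counting each undirected link once (upper triangle only) are assumptions for illustration. The dictionary `table1` copies the values reported in Table 1.

```python
def edge_rates(true_adj, est_adj):
    """True positive rate and false positive rate of an estimated undirected network.

    Both arguments are symmetric 0/1 adjacency matrices; each pair of nodes
    is counted once (upper triangle), which is an assumption of this sketch.
    """
    p = len(true_adj)
    tp = fp = fn = tn = 0
    for i in range(p):
        for j in range(i + 1, p):
            truth, guess = bool(true_adj[i][j]), bool(est_adj[i][j])
            if truth and guess:
                tp += 1
            elif truth:
                fn += 1
            elif guess:
                fp += 1
            else:
                tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0   # sensitivity
    fpr = fp / (fp + tn) if fp + tn else 0.0   # 1 - specificity
    return tpr, fpr


# Hypothetical 4-node example: true links 0-1, 1-2, 2-3; estimated links 0-1, 1-2, 0-3.
true_adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
est_adj = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0]]
tpr, fpr = edge_rates(true_adj, est_adj)

# Values reported in Table 1: penalty -> (true positive rate, false positive rate).
table1 = {
    0.1: (0.3925, 0.2076),
    0.5: (0.3738, 0.2016),
    0.7: (0.3551, 0.1956),
    0.75: (0.3551, 0.1969),
    0.8: (0.3551, 0.1956),
    0.9: (0.3551, 0.1942),
    0.95: (0.3551, 0.1929),
}
best_penalty = max(table1, key=lambda lam: table1[lam][0])  # maximize sensitivity
```

Selecting on sensitivity alone follows the rule stated on the slide; it ignores the false positive rate, which is also highest at the selected value in Table 1.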
  9. CONCLUSION and FUTURE WORK
     We see that the graphical model is promising for inference in sparse and high-dimensional networks. However, the performance of the estimates fluctuates strongly with the chosen penalty parameter $\lambda$. Therefore, we believe that the final network structure can be inferred under different criteria, including model selection criteria such as AIC and BIC, as proposed in the current study of Wit et al. (2010).
  10. REFERENCES
     • Bower, J. and H. Bolouri (2001); Computational Modelling of Genetic and Biochemical Networks; MIT, 2nd Edition
     • Friedman, J., Hastie, T. and R. Tibshirani (2008); Sparse Inverse Covariance Estimation with the Graphical Lasso; Biostatistics; Vol. 9, No. 3 (pp. 432-441)
     • Gillespie, D.T. (1977); Exact Stochastic Simulation of Coupled Chemical Reactions; Journal of Physical Chemistry; Vol. 81, No. 25 (pp. 2340-2361)
     • Maiwald, T., Schneider, A., Busch, H., Sahle, S., Gretz, N., Weiss, T., Kummer, U. and U. Klingmuller (2010); Combining Theoretical Analysis and Experimental Data Generation Reveals IRF9 as a Crucial Factor for Accelerating Interferon Induced Early Antiviral Signalling; FEBS Journal; Vol. 277 (pp. 4741-4754)
     • Meinshausen, N. and P. Bühlmann (2006); High Dimensional Graphs and Variable Selection with the Lasso; Annals of Statistics; Vol. 34, No. 3 (pp. 1436-1462)
     • Schafer, J. and K. Strimmer (2005); A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics; SAGEM; Vol. 4, No. 1
     • Whittaker, J. (1990); Graphical Models in Applied Multivariate Statistics; John Wiley and Sons
     • Wit, E., Vinciotti, V. and V. Purutçuoğlu (2010); Statistics for Biological Networks; Short Course Notes: 25th International Biometric Conference (IBC); Florianopolis, Brazil (pp. 1-197)