This document discusses methods for inferring gene co-expression networks from multiple gene expression samples, such as from different breeds or conditions. It describes using graphical Gaussian models and sparse regression approaches like the graphical lasso to learn networks from individual samples. For multiple samples, independent or joint network estimation methods are discussed, including the GroupLasso and CoopLasso approaches implemented in the R package simone, which aim to find consensus networks that are consistent or sign-coherent across conditions. An example dataset with gene expression from two pig breeds is analyzed to compare the methods.
Mini useR! in Melbourne https://www.meetup.com/fr-FR/MelbURN-Melbourne-Users-of-R-Network/events/251933078/
MelbURN (Melbourne useR group) https://www.meetup.com/fr-FR/MelbURN-Melbourne-Users-of-R-Network
July 16th, 2018
Melbourne, Australia
In this talk I review the concept of Granger causality and the problematic effects of synergy and redundancy on its estimation.
I will then propose an operative definition of these concepts.
Kernel methods and variable selection for exploratory analysis and multi-omic... (tuxette)
Nathalie Vialaneix
4th course on Computational Systems Biology of Cancer: Multi-omics and Machine Learning Approaches
International course, Curie training
https://training.institut-curie.org/courses/sysbiocancer2021
(remote)
September 29th, 2021
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both algorithmic and computational perspectives.
https://telecombcn-dl.github.io/dlai-2020/
https://telecombcn-dl.github.io/drl-2020/
This course presents the principles of reinforcement learning as an artificial intelligence tool based on the interaction of the machine with its environment, with applications to control tasks (e.g., robotics, autonomous driving) or decision making (e.g., resource optimization in wireless communication networks). It also advances the development of deep neural networks trained with little or no supervision, for both discriminative and generative tasks, with special attention to multimedia applications (vision, language and speech).
https://telecombcn-dl.github.io/idl-2020/
https://telecombcn-dl.github.io/dlai-2019/
Bayes Nets Meetup Sept 29th 2016 - Bayesian Network Modelling by Marco Scutari (Bayes Nets meetup, London)
A talk given at the Bayes Nets meetup on Sept 29th 2016 by Dr Marco Scutari from the University of Oxford. The title of the talk was "Bayesian Network Modelling with examples in Genetics and Systems Biology", with case studies.
Image segmentation is a classic computer vision task that aims at labeling pixels with semantic classes. These slides provide an overview of the basic approaches applied from the deep learning field to tackle this challenge and present the basic subtasks (semantic, instance and panoptic segmentation) and related datasets.
Presented at the International Summer School on Deep Learning (ISSonDL) 2020 held online and organized by the University of Gdansk (Poland) between the 30th August and 2nd September.
http://2020.dl-lab.eu/virtual-summer-school-on-deep-learning/
Visualizing and mining networks - Methods and examples in R (tuxette)
General assembly of the PEPI IBIS, April 1st, 2014
This talk introduces the notion of networks and the basic questions usually associated with them (visualization, identification of important nodes, search for modules). The notions are illustrated with examples on a real network using the R software.
Bayesian inference for mixed-effects models driven by SDEs and other stochast... (Umberto Picchini)
An important, and well studied, class of stochastic models is given by stochastic differential equations (SDEs). In this talk, we consider Bayesian inference based on measurements from several individuals, to provide inference at the "population level" using mixed-effects modelling. We consider the case where dynamics are expressed via SDEs or other stochastic (Markovian) models. Stochastic differential equation mixed-effects models (SDEMEMs) are flexible hierarchical models that account for (i) the intrinsic random variability in the latent states dynamics, as well as (ii) the variability between individuals, and also (iii) account for measurement error. This flexibility gives rise to methodological and computational difficulties.
Fully Bayesian inference for nonlinear SDEMEMs is complicated by the typical intractability of the observed data likelihood, which motivates the use of sampling-based approaches such as Markov chain Monte Carlo. A Gibbs sampler is proposed to target the marginal posterior of all parameters of interest. The algorithm is made computationally efficient through careful use of blocking strategies, particle filters (sequential Monte Carlo) and correlated pseudo-marginal approaches. The resulting methodology is flexible, general and able to deal with a large class of nonlinear SDEMEMs [1]. In more recent work [2], we also explored ways to make inference even more scalable to an increasing number of individuals, while also dealing with state-space models driven by stochastic dynamic models other than SDEs, e.g., Markov jump processes and nonlinear solvers typically used in systems biology.
[1] S. Wiqvist, A. Golightly, AT McLean, U. Picchini (2020). Efficient inference for stochastic differential mixed-effects models using correlated particle pseudo-marginal algorithms, CSDA, https://doi.org/10.1016/j.csda.2020.107151
[2] S. Persson, N. Welkenhuysen, S. Shashkova, S. Wiqvist, P. Reith, G. W. Schmidt, U. Picchini, M. Cvijovic (2021). PEPSDI: Scalable and flexible inference framework for stochastic dynamic single-cell models, bioRxiv doi:10.1101/2021.07.01.450748.
There is now a huge literature on Bayesian methods for variable selection that use spike-and-slab priors. Such methods, in particular, have been quite successful for applications in a variety of different fields; high-throughput genomics and neuroimaging are two such examples. There, novel methodological questions are being generated, requiring the integration of different concepts, methods, tools and data types. These have in particular motivated the development of variable selection priors that go beyond the independence assumptions of a simple Bernoulli prior on the variable inclusion indicators. In this talk I will describe various prior constructions that incorporate information about structural dependencies among the variables. I will also address extensions of the models to the analysis of count data. I will motivate the development of the models using specific applications from neuroimaging and from studies that use microbiome data.
Complex systems are characterized by constituents -- from neurons in the brain to individuals in a social network -- which exhibit special structural organization and nonlinear dynamics. As a consequence, a complex system cannot be understood by studying its units separately because their interactions lead to unexpected emerging phenomena, from collective behavior to phase transitions.
Recently, we have discovered that a new level of complexity characterizes a variety of natural and artificial systems, where units interact, simultaneously, in distinct ways. For instance, this is the case of multimodal transportation systems (e.g., metro, bus and train networks) or of biological molecules, whose interactions might be of different type (e.g., physical, chemical, genetic) or functionality (e.g., regulatory, inhibitory, etc.). The unprecedented newfound wealth of multivariate data allows us to categorize a system's interdependencies by defining distinct "layers", each one encoding a different network representation of the system. The result is a multilayer network model.
Analyzing data from different domains -- including molecular biology, neuroscience, urban transport, telecommunications -- we will show that neglecting or disregarding multivariate information might lead to poor results. Conversely, multilayer models provide a suitable framework for complex data analytics, allowing us to quantify the resilience of a system to perturbations (e.g., localized failures or targeted attacks) and improving the forecasting of spreading processes and the accuracy of classification problems.
Professor Timoteo Carletti presented a seminar titled "A journey in the zoo of Turing patterns: the topology does matter" as part of the SMART Seminar Series on 8th March 2018.
More information: http://www.uoweis.co/event/a-journey-in-the-zoo-of-turing-patterns-the-topology-does-matter/
Keep updated with future events: http://www.uoweis.co/events/category/smart-infrastructure-facility/
We consider the problem of model estimation in episodic Block MDPs. In these MDPs, the decision maker has access to rich observations or contexts generated from a small number of latent states. We are interested in estimating the latent state decoding function (the mapping from the observations to latent states) based on data generated under a fixed behavior policy. We derive an information-theoretical lower bound on the error rate for estimating this function and present an algorithm approaching this fundamental limit. In turn, our algorithm also provides estimates of all the components of the MDP.
We apply our results to the problem of learning near-optimal policies in the reward-free setting. Based on our efficient model estimation algorithm, we show that we can infer a policy converging (as the number of collected samples grows large) to the optimal policy at the best possible asymptotic rate. Our analysis provides necessary and sufficient conditions under which exploiting the block structure yields improvements in the sample complexity for identifying near-optimal policies. When these conditions are met, the sample complexity in the minimax reward-free setting is improved by a multiplicative factor $n$, where $n$ is the number of contexts.
Knowledge of cause-effect relationships is central to the field of climate science, supporting mechanistic understanding, observational sampling strategies, experimental design, model development and model prediction. While the major causal connections in our planet's climate system are already known, there is still potential for new discoveries in some areas. The purpose of this talk is to make this community familiar with a variety of available tools to discover potential cause-effect relationships from observed or simulation data. Some of these tools are already in use in climate science, others are just emerging in recent years. None of them are miracle solutions, but many can provide important pieces of information to climate scientists. An important way to use such methods is to generate cause-effect hypotheses that climate experts can then study further. In this talk we will (1) introduce key concepts important for causal analysis; (2) discuss some methods based on the concepts of Granger causality and Pearl causality; (3) point out some strengths and limitations of these approaches; and (4) illustrate such methods using a few real-world examples from climate science.
“Statistical Physics Studies of Machine Learning Problems” by Lenka Zdeborova, Researcher @CNRS
Abstract: We will offer some insights into the following questions: What makes problems studied in machine learning and statistical physics related? How can this relation be used to better understand the performance and limitations of machine learning systems? What happens when a phase transition is found in a computational problem? How do phase transitions influence algorithmic hardness?
Multi-source connectivity as the driver of solar wind variability in the heli... (Sérgio Sacani)
The ambient solar wind that fills the heliosphere originates from multiple sources in the solar corona and is highly structured. It is often described as high-speed, relatively homogeneous, plasma streams from coronal holes and slow-speed, highly variable, streams whose source regions are under debate. A key goal of ESA/NASA's Solar Orbiter mission is to identify solar wind sources and understand what drives the complexity seen in the heliosphere. By combining magnetic field modelling and spectroscopic techniques with high-resolution observations and measurements, we show that the solar wind variability detected in situ by Solar Orbiter in March 2022 is driven by spatio-temporal changes in the magnetic connectivity to multiple sources in the solar atmosphere. The magnetic field footpoints connected to the spacecraft moved from the boundaries of a coronal hole to one active region (12961) and then across to another region (12957). This is reflected in the in situ measurements, which show the transition from fast to highly Alfvénic then to slow solar wind that is disrupted by the arrival of a coronal mass ejection. Our results describe solar wind variability at 0.5 au but are applicable to near-Earth observatories.
The increased availability of biomedical data, particularly in the public domain, offers the opportunity to better understand human health and to develop effective therapeutics for a wide range of unmet medical needs. However, data scientists remain stymied by the fact that data remain hard to find and to productively reuse because data and their metadata i) are wholly inaccessible, ii) are in non-standard or incompatible representations, iii) do not conform to community standards, and iv) have unclear or highly restricted terms and conditions that preclude legitimate reuse. These limitations require a rethink of how data can be made machine- and AI-ready - the key motivation behind the FAIR Guiding Principles. Concurrently, while recent efforts have explored the use of deep learning to fuse disparate data into predictive models for a wide range of biomedical applications, these models often fail even when the correct answer is already known, and fail to explain individual predictions in terms that data scientists can appreciate. These limitations suggest that new methods to produce practical artificial intelligence are still needed.
In this talk, I will discuss our work in (1) building an integrative knowledge infrastructure to prepare FAIR and "AI-ready" data and services along with (2) neurosymbolic AI methods to improve the quality of predictions and to generate plausible explanations. Attention is given to standards, platforms, and methods to wrangle knowledge into simple, but effective semantic and latent representations, and to make these available into standards-compliant and discoverable interfaces that can be used in model building, validation, and explanation. Our work, and those of others in the field, creates a baseline for building trustworthy and easy to deploy AI models in biomedicine.
Bio
Dr. Michel Dumontier is the Distinguished Professor of Data Science at Maastricht University, founder and executive director of the Institute of Data Science, and co-founder of the FAIR (Findable, Accessible, Interoperable and Reusable) data principles. His research explores socio-technological approaches for responsible discovery science, which includes collaborative multi-modal knowledge graphs, privacy-preserving distributed data mining, and AI methods for drug discovery and personalized medicine. His work is supported through the Dutch National Research Agenda, the Netherlands Organisation for Scientific Research, Horizon Europe, the European Open Science Cloud, the US National Institutes of Health, and a Marie-Curie Innovative Training Network. He is the editor-in-chief for the journal Data Science and is internationally recognized for his contributions in bioinformatics, biomedical informatics, and semantic technologies including ontologies and linked data.
This PDF is about schizophrenia.
For more details, visit the YouTube channel @SELF-EXPLANATORY:
https://www.youtube.com/channel/UCAiarMZDNhe1A3Rnpr_WkzA/videos
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN (Sérgio Sacani)
The return of a sample of near-surface atmosphere from Mars would facilitate answers to several first-order science questions surrounding the formation and evolution of the planet. One of the important aspects of terrestrial planet formation in general is the role that primary atmospheres played in influencing the chemistry and structure of the planets and their antecedents. Studies of the martian atmosphere can be used to investigate the role of a primary atmosphere in its history. Atmosphere samples would also inform our understanding of the near-surface chemistry of the planet, and ultimately the prospects for life. High-precision isotopic analyses of constituent gases are needed to address these questions, requiring that the analyses are made on returned samples rather than in situ.
Brief information about the SCOP protein database used in bioinformatics.
The Structural Classification of Proteins (SCOP) database is a comprehensive and authoritative resource for the structural and evolutionary relationships of proteins. It provides a detailed and curated classification of protein structures, grouping them into families, superfamilies, and folds based on their structural and sequence similarities.
Consensual gene co-expression network inference with multiple samples
1. Consensual gene co-expression network inference with multiple samples
Nathalie Villa-Vialaneix(1,2)
http://www.nathalievilla.org
nathalie.villa@univ-paris1.fr
Joint work with Magali SanCristobal and Laurence Liaubet
Biostatistics working group - March 19th, 2013
Consensus LASSO (INRA de Toulouse, MIAT) Nathalie Villa-Vialaneix Toulouse, 19 mars 2013 1 / 21
2. Overview on network inference
Outline
1 Overview on network inference
2 Graphical Gaussian Models
3 Inference with multiple samples
4 Illustration
3. Overview on network inference
Framework
Data: large scale gene expression data, an n × p matrix X with entries X^j_i (expression of gene j for individual i), whose rows are the individuals (n ≈ 30-50) and whose columns are the variables (gene expressions, p ≈ 10^3-10^4).
What we want to obtain: a graph/network with
• nodes: genes;
• edges: “significant” and direct co-expression between two genes (to track transcription regulations).
4. Overview on network inference
Modeling multiple interactions between genes with a
network
Co-expression networks
• nodes: genes
• edges: “direct” co-expression between two genes
5. Overview on network inference
Modeling multiple interactions between genes with a
network
Co-expression networks
• nodes: genes
• edges: “direct” co-expression between two genes
Method: “correlations” → thresholding → graph
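The “correlations → thresholding → graph” pipeline above can be sketched in a few lines of R (a toy illustration on simulated data; the 0.7 cut-off is an arbitrary choice, not a recommendation):

```r
set.seed(42)
n <- 40; p <- 10
X <- matrix(rnorm(n * p), nrow = n)   # toy expression matrix: n individuals x p genes
C <- cor(X)                           # step 1: "correlations"
A <- abs(C) > 0.7                     # step 2: thresholding (0.7 chosen arbitrarily)
diag(A) <- FALSE                      # no self-loops
edges <- which(A & upper.tri(A), arr.ind = TRUE)  # step 3: edge list of the graph
```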
6. Overview on network inference
Correlations/Partial correlations
(Diagram: x drives both y and z, inducing a strong indirect correlation between y and z.)
set.seed(2807); x <- runif(100)
y <- 2*x + 1 + rnorm(100, 0, 0.1); cor(x, y)  # [1] 0.9870407
z <- -x + 2 + rnorm(100, 0, 0.1); cor(x, z)   # [1] -0.9443082
cor(y, z)                                      # [1] -0.9336924
7. Overview on network inference
Correlations/Partial correlations
Partial correlation
Cor(z, y | x)
Correlation between residuals:
set.seed(2807); x <- runif(100)
y <- 2*x + 1 + rnorm(100, 0, 0.1); cor(x, y)  # [1] 0.9870407
z <- -x + 2 + rnorm(100, 0, 0.1); cor(x, z)   # [1] -0.9443082
cor(y, z)                                      # [1] -0.9336924
cor(lm(y ~ x)$residuals, lm(z ~ x)$residuals)  # [1] -0.03071178
10. Overview on network inference
Advantages of a network approach
1 over raw data and correlation networks (relevance networks, [Butte and Kohane, 1999]): focuses on direct links;
2 over raw data (again): focuses on “significant” links (more robust);
3 over bibliographic networks: can handle interactions with yet unknown (not annotated) genes.
11. Graphical Gaussian Models
Outline
1 Overview on network inference
2 Graphical Gaussian Models
3 Inference with multiple samples
4 Illustration
14. Graphical Gaussian Models
Theoretical framework
Gaussian Graphical Models (GGM): the gene expressions X ∼ N(0, Σ).
Seminal work [Schäfer and Strimmer, 2005], R package GeneNet: estimation of the partial correlations
π_{jj'} = Cor(X^j, X^{j'} | X^k, k ≠ j, j')
from the concentration matrix S = Σ^{-1}:
π_{jj'} = − S_{jj'} / √(S_{jj} S_{j'j'}).
Main issue: p ≫ n ⇒ the empirical Σ is badly conditioned ⇒ estimating S from Σ^{-1} is a bad idea...
Schäfer & Strimmer's proposal:
1 use Σ + λI rather than Σ to estimate S;
2 select only the most significant S_{jj'} (Bayesian test):
S ∼ (1 − η0) f_A + η0 f_0
with f_0 the distribution of the “null” edges and η0 the proportion of null edges among the partial correlation values (close to 1).
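A minimal sketch of this shrinkage approach with the GeneNet package (the toy data and the 0.8 posterior-probability cut-off are illustrative assumptions, not recommendations):

```r
library(GeneNet)                         # Schafer & Strimmer's shrinkage estimator
set.seed(1)
X <- matrix(rnorm(40 * 20), nrow = 40)   # toy data: n = 40 samples, p = 20 genes
pc <- ggm.estimate.pcor(X)               # shrunken partial correlations (Sigma + lambda*I idea)
tests <- network.test.edges(pc)          # empirical-Bayes test of every possible edge
net <- tests[tests$prob > 0.8, ]         # keep only edges with high posterior probability
```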
16. Graphical Gaussian Models
Sparse regression approach
[Meinshausen and Bühlmann, 2006, Friedman et al., 2008] Partial correlations can also be estimated by using linear models: ∀ j,
X^j = β_j^T X^{-j} + ε_j.
In the Gaussian framework: β_{jj'} = − S_{jj'} / S_{jj}.
Independent regressions:
max_{(β_{jj'})_{j'}} log ML_j − λ Σ_{j'≠j} |β_{jj'}|
with log ML_j ∝ − Σ_{i=1}^n ( X^j_i − Σ_{j'≠j} β_{jj'} X^{j'}_i )².
Consequence: the sparse penalty forces β_{jj'} = 0 for most coefficients (“all-in-one” approach: no thresholding step needed).
17. Graphical Gaussian Models
Sparse regression approach
Global approach: Graphical Lasso (R package glasso)
max_{(β_{jj'})_{jj'}} Σ_j log ML_j − λ Σ_{j≠j'} |β_{jj'}|
Consequence: the sparse penalty forces β_{jj'} = 0 for most coefficients (“all-in-one” approach: no thresholding step needed).
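The global approach can be sketched with the glasso package (toy data; the regularization level rho = 0.1 is an arbitrary illustrative choice):

```r
library(glasso)
set.seed(1)
X <- matrix(rnorm(40 * 20), nrow = 40)   # toy data: n = 40, p = 20
S <- cov(X)                              # empirical covariance matrix
fit <- glasso(S, rho = 0.1)              # L1-penalized concentration matrix estimation
Shat <- fit$wi                           # estimated concentration (precision) matrix
A <- abs(Shat) > 1e-8                    # adjacency: nonzero off-diagonal coefficients
diag(A) <- FALSE                         # no self-loops
```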
18. Graphical Gaussian Models
Other methods/packages to infer networks
• relevance (correlation) networks: R package WGCNA
• Bayesian networks: R package bnlearn
[Pearl, 1998, Pearl and Russel, 2002, Scutari, 2010]
• networks based on mutual information: R package minet
[Meyer et al., 2008]
• networks based on random forest [Huynh-Thu et al., 2010]
See also:
• http://cran.r-project.org/web/views/gR.html (CRAN task
view on graphical methods)
• https://www.coursera.org/course/pgm (Daphne Koller's online course
on “Probabilistic Graphical Models”, starts on April 8th)
• https://www.coursera.org/course/netsysbio (online course on
“Network Analysis in Systems Biology”)
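The simplest alternative listed above, a relevance network [Butte and Kohane, 1999], can be sketched in a few lines: threshold the pairwise correlations and keep the strong ones as edges (a Python illustration with invented expression profiles and an arbitrary threshold; the WGCNA package does much more than this):

```python
# Sketch: a relevance (correlation) network built by thresholding
# pairwise Pearson correlations between gene expression profiles.

def pearson(x, y):
    """Pearson correlation of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = sum((v - mx) ** 2 for v in x) ** 0.5
    sy = sum((v - my) ** 2 for v in y) ** 0.5
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)

def relevance_network(profiles, threshold=0.8):
    """Return edges (j, k) whose absolute correlation exceeds the threshold.
    `profiles` is one expression profile (list of samples) per gene."""
    p = len(profiles)
    return [(j, k) for j in range(p) for k in range(j + 1, p)
            if abs(pearson(profiles[j], profiles[k])) > threshold]

genes = [[1.0, 2.0, 3.0, 4.0],   # gene 0
         [2.1, 3.9, 6.2, 7.9],   # gene 1: tracks gene 0
         [5.0, 1.0, 4.0, 2.0]]   # gene 2: unrelated
edges = relevance_network(genes, threshold=0.8)  # only (0, 1) survives
```

Unlike the GGM approaches above, this captures marginal, not partial, correlations, so indirect associations are kept as edges.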
19. Inference with multiple samples
Outline
1 Overview on network inference
2 Graphical Gaussian Models
3 Inference with multiple samples
4 Illustration
20. Inference with multiple samples
Multiple networks inference
Transcriptomic data coming from several different conditions.
Examples:
• gene expression from pig muscle in Landrace and Large white breeds;
• gene expression from obese humans before and after a diet.
• Assumption: a common functioning exists regardless of the condition;
• Which genes are correlated independently of / depending on the condition?
22. Inference with multiple samples
Dataset description
“DeLiSus” dataset
• variables: expression of 81 genes (selected by Laurence)
• conditions: two breeds (33 “Landrace” and 51 “Large white”; 84 pigs)
24. Inference with multiple samples
Multiple networks
Independent estimations: if c = 1, ..., C are different samples (or
“conditions”, e.g., breeds or before/after diet):
\max_{(\beta^c_{jk})_{k \neq j},\, c = 1, \dots, C} \sum_c [ \log ML^c_j - \lambda \sum_{k \neq j} |\beta^c_{jk}| ].
Joint estimations, implemented in the R package simone [Chiquet et al., 2011]:
GroupLasso: consensual network between conditions (enforces identical
edges through a group LASSO penalty);
CoopLasso: sign-coherent network between conditions (prevents edges that
correspond to partial correlations with different signs, thus allowing a
few differences between the conditions);
Intertwined: in the Graphical Lasso, replace \Sigma^c by (1/2) \Sigma^c + (1/2) \bar{\Sigma},
where \bar{\Sigma} = (1/C) \sum_c \Sigma^c.
26. Inference with multiple samples
Consensus LASSO
Proposal: Infer multiple networks by forcing them toward a consensual
network.
Original optimization:
\max_{(\beta^c_{jk})_{k \neq j},\, c = 1, \dots, C} \sum_c [ \log ML^c_j - \lambda \sum_{k \neq j} |\beta^c_{jk}| ].
Add a constraint to force inference toward a consensus \beta^{cons}:
\max_{(\beta^c_{jk})_{k \neq j},\, c = 1, \dots, C} \sum_c [ \log ML^c_j - \lambda \sum_{k \neq j} |\beta^c_{jk}| ] - \mu \sum_c w_c \| \beta^c_j - \beta^{cons}_j \|^2
Examples:
• \beta^{cons}_j = \beta^{c^*}_j with c^* = \arg\min_c \| \beta^c_j \| (network intersection);
• \beta^{cons}_j = \sum_c (n_c / n) \beta^c_j (“average” network).
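The “average” consensus and the quadratic penalty it enters can be computed in a few lines (a Python sketch; the coefficient vectors, sample sizes, and weights below are invented for illustration):

```python
# Sketch: the "average network" consensus beta_cons = sum_c (n_c / n) beta_c,
# and the penalty term mu * sum_c w_c * ||beta_c - beta_cons||^2.

def consensus(betas, sizes):
    """Sample-size weighted average of per-condition coefficient vectors."""
    n = sum(sizes)
    p = len(betas[0])
    return [sum(nc * b[k] for nc, b in zip(sizes, betas)) / n
            for k in range(p)]

def consensus_penalty(betas, cons, weights, mu):
    """mu * sum_c w_c * squared distance of each condition to the consensus."""
    return mu * sum(w * sum((bc - bs) ** 2 for bc, bs in zip(b, cons))
                    for w, b in zip(weights, betas))

betas = [[0.4, 0.0, -0.2],   # condition 1 coefficients (e.g., Landrace)
         [0.6, 0.1, -0.2]]   # condition 2 coefficients (e.g., Large white)
cons = consensus(betas, sizes=[33, 51])
pen = consensus_penalty(betas, cons, weights=[1.0, 1.0], mu=2.0)
```

Coefficients that already agree across conditions (the third entry here) contribute nothing to the penalty, so only genuine between-condition differences are pulled toward the consensus.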
30. Inference with multiple samples
In practice...
\beta^{cons}_j = \sum_c (n_c / n) \beta^c_j is a good choice because:
• \partial \beta^{cons}_j / \partial \beta^c_j exists;
• thus, solving the optimization problem is equivalent to minimizing
(1/2) \beta_j^T S_j(\mu) \beta_j + \beta_j^T \Sigma_{j, \setminus j} + \lambda \sum_c (1/n_c) \| \beta^c_j \|_1
with \Sigma_{j, \setminus j} the jth row of the empirical covariance matrix
deprived of its jth entry, and S_j(\mu) = \Sigma_{\setminus j, \setminus j} + 2\mu A^T A, where
\Sigma_{\setminus j, \setminus j} is the empirical covariance matrix deprived of its jth row
and column and A is a matrix that does not depend on j.
This is a standard LASSO problem that can be solved with a sub-gradient
method (as described in [Chiquet et al., 2011] and already implemented in
the beta R package therese).
32. Illustration
Outline
1 Overview on network inference
2 Graphical Gaussian Models
3 Inference with multiple samples
4 Illustration
33. Illustration
Datasets description
“DeLiSus” dataset
• variables: expression of 26 genes (selected by Laurence)
• conditions: two breeds (33 “Landrace” and 51 “Large white”; 84 pigs)
Methodology
• package GeneNet: networks are estimated independently with a GGM
approach (edges selected based on the p-value of a Bayesian test);
• consensus LASSO: µ is fixed and λ varied along a regularization path; an
instance on the path is selected so that the number of edges is similar to
the GeneNet solution.
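The selection step along the path reduces to picking the network whose edge count is closest to a target (a Python sketch; the `(lambda, edge count)` pairs below are invented, not results from the DeLiSus data):

```python
# Sketch: selecting one instance on a regularization path by matching a
# target edge count (e.g., the edge count of the GeneNet solution).

def select_on_path(path, target_edges):
    """path: list of (lam, edge_count) pairs, one per fitted network.
    Returns the pair whose edge count is closest to the target."""
    return min(path, key=lambda t: abs(t[1] - target_edges))

# hypothetical path: a larger lambda gives a sparser network
path = [(0.9, 3), (0.7, 8), (0.5, 15), (0.3, 27), (0.1, 52)]
lam, n_edges = select_on_path(path, target_edges=14)
```

Matching edge counts makes the two methods' networks comparable edge-for-edge, which is the point of this illustration.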
38. Illustration
Conclusion
... much left to do:
• biological validation,
• selecting λ (AIC and BIC are way too restrictive...),
• tuning µ,
• other comparisons...
39. Illustration
References
Butte, A. and Kohane, I. (1999).
Unsupervised knowledge discovery in medical databases using relevance networks.
In Proceedings of the AMIA Symposium, pages 711–715.
Chiquet, J., Grandvalet, Y., and Ambroise, C. (2011).
Inferring multiple graphical structures.
Statistics and Computing, 21(4):537–553.
Friedman, J., Hastie, T., and Tibshirani, R. (2008).
Sparse inverse covariance estimation with the graphical lasso.
Biostatistics, 9(3):432–441.
Huynh-Thu, V., Irrthum, A., Wehenkel, L., and Geurts, P. (2010).
Inferring regulatory networks from expression data using tree-based methods.
PLoS ONE, 5(9):e12776.
Meinshausen, N. and Bühlmann, P. (2006).
High dimensional graphs and variable selection with the lasso.
Annals of Statistics, 34(3):1436–1462.
Meyer, P., Lafitte, F., and Bontempi, G. (2008).
minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information.
BMC Bioinformatics, 9(461).
Pearl, J. (1998).
Probabilistic reasoning in intelligent systems: networks of plausible inference.
Morgan Kaufmann, San Francisco, California, USA.
Pearl, J. and Russell, S. (2002).
Bayesian Networks.
Bradford Books (MIT Press), Cambridge, Massachusetts, USA.
Schäfer, J. and Strimmer, K. (2005).
An empirical Bayes approach to inferring large-scale gene association networks.
Bioinformatics, 21(6):754–764.
Scutari, M. (2010).
Learning Bayesian networks with the bnlearn R package.
Journal of Statistical Software, 35(3):1–22.