This document provides an introduction to biological network inference using Gaussian graphical models. It discusses motivations for network inference based on the central dogma of molecular biology and common questions in functional genomics. The challenges of modeling high-dimensional omics data are described, including what network nodes and edges represent statistically and biologically. Gaussian graphical models are proposed as a tool for modeling dependencies between biological variables in genomic data, with the goal of reconstructing biological networks from large-scale omics experiments.
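In a Gaussian graphical model, an edge between two variables corresponds to a nonzero partial correlation (equivalently, a nonzero off-diagonal entry of the precision matrix). As a minimal sketch of that idea (not taken from the document; the function name and the numeric values are illustrative), the first-order partial correlation of two genes X and Y controlling for a third gene Z can be computed from their pairwise correlations:

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of X and Y, controlling for Z.

    In a Gaussian graphical model, X and Y share an edge only if this
    value is nonzero: a large marginal correlation r_xy can vanish once
    a common driver Z is accounted for.
    """
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Two genes correlated only through a shared regulator Z: their marginal
# correlation is 0.6, but the partial correlation is essentially zero,
# so the GGM draws no direct edge between them.
pc = partial_corr(0.6, 0.8, 0.75)
```

For more than three variables the same quantities are read off the inverse covariance (precision) matrix; at genomic scale, where variables outnumber samples, sparse estimators such as the graphical lasso are the standard tool.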
Network Biology: A paradigm for modeling biological complex systems (Ganesh Bagler)
These slides are part of two lectures delivered as part of the 'National Workshop on Network Modelling and Graph Theory' (Dec 14-16, 2017) at the Department of Mathematics, Dibrugarh University, Assam, India.
(1) Network Biology: A paradigm for integrative modeling of biological complex systems -- 14 Dec 2017, 3:30pm
(2) Applications of network modeling in biomedicine -- 15 Dec 2017, 9:00pm
Sponsored by UGC under SAP DRS (II)
(1) Workshop link: https://www.dibru.ac.in/upcoming-events/2981-national-workshop-on-network-modelling-and-graph-theory
(2) The Workshop Flyer: https://www.dibru.ac.in/images/uploaded_files/2017/Nov/National_Workshop_on_Network_Modelling_and_Graph_Theory.pdf
Analysis of the Neocognitron Neural Network Method in String Recognition (IDES Editor)
This paper analyses a neural network method for pattern recognition. A neural network is a processing device whose design was inspired by the structure and functioning of the human brain and its components. The proposed solution applies the Neocognitron algorithm to pattern recognition. Its primary function is to retrieve a pattern stored in memory when an incomplete or noisy version of that pattern is presented. An associative memory is a storehouse of associated patterns encoded in some form. In auto-association, an input pattern is associated with itself, and the states of the input and output units coincide. When the storehouse is prompted with a distorted or partial pattern, the associated pattern pair stored in its complete form is recalled. Pattern recognition techniques associate a symbolic identity with the image of a pattern; here the problem of pattern replication by machines (computers) involves machine-printed patterns. There is no idle memory containing data and programs; rather, each neuron is programmed and continuously active.
Diagnosis of Chest Diseases Using a Neural Network and Genetic Hybrid Algorithm (IJERA Editor)
The back-propagation algorithm is the most popular training algorithm for multi-layer feed-forward neural networks. It measures the output error, calculates the gradient of that error, and adjusts the ANN weights along the descending gradient direction. Back-propagation is used to learn and store input-output mapping relations. A genetic algorithm follows a random probability distribution that can be analysed statistically but not predicted precisely; it is an iterative procedure that generates a new population of individuals from the old one. This paper implements both the back-propagation algorithm and a genetic algorithm and compares their output accuracy for medical diagnosis of various chest diseases (asthma, tuberculosis, lung cancer, pneumonia).
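The weight-update rule described above can be sketched in miniature with a single sigmoid neuron trained on squared error (all names and numbers here are illustrative, not from the paper):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def backprop_step(w, b, x, target, lr=0.5):
    """One gradient-descent update: measure the output error, then move
    the weights along the descending gradient direction."""
    y = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)   # forward pass
    delta = (y - target) * y * (1.0 - y)                    # dE/dz by the chain rule
    w = [wi - lr * delta * xi for wi, xi in zip(w, x)]      # weight update
    b = b - lr * delta
    return w, b, 0.5 * (y - target) ** 2                    # squared error

# Repeated steps drive the error down for a fixed input/target pair.
w, b = [0.0, 0.0], 0.0
errors = []
for _ in range(50):
    w, b, e = backprop_step(w, b, [1.0, 1.0], 1.0)
    errors.append(e)
```

A genetic algorithm would instead evolve a population of candidate weight vectors, selecting and recombining the fittest; the paper compares the diagnostic accuracy of the two approaches.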
Functional Genomics Journal Club presentation on the following publication:
Kuzawa, C. W., Chugani, H. T., Grossman, L. I., Lipovich, L., Muzik, O., Hof, P. R., … Lange, N. (2014). Metabolic costs and evolutionary implications of human brain development. Proceedings of the National Academy of Sciences, 111(36), 13010–13015. https://doi.org/10.1073/pnas.1323099111
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Could A Model Of Predictive Voting Explain Many Long-Range Connections? by Subutai Ahmad (Numenta)
These are slides from a workshop Subutai Ahmad hosted on March 5, 2018 at the Computational and Systems Neuroscience Meeting (Cosyne) 2018.
About:
This workshop on long-range cortical circuits is focused on our peer-reviewed paper, “A Theory of How Columns in the Neocortex Enable Learning the Structure of the World.” Subutai discussed the inference mechanism introduced in the paper, our theory of location information, and how long-range connections allow columns to integrate inputs over space to perform object recognition.
Confirming DNA replication origins of Saccharomyces cerevisiae: a deep learning... (Abdelrahman Hosny)
In the past, the study of medicine focused on observing biological processes that take place in organisms, and from these observations biologists drew conclusions that translated into a better understanding of how organism systems work. Recently the approach has shifted to a computational paradigm, in which scientists model these biological processes as mathematical equations or statistical models. In this study, we model an important cell-replication activity in the yeast genome (Saccharomyces cerevisiae) using different deep learning network models. The results suggest that deep learning models can learn representations of DNA sequences and hence predict cell behavior. Source code is available under the MIT license at: http://abdelrahmanhosny.github.io/DL-Cerevesiae/
The proposed method addresses these issues by developing a novel classification algorithm that combines the Gene Expression Graph (GEG) with the Manhattan distance. This representation is used to express the gene expression data; the Gene Expression Graph gives a clear view of the relationship between normal and unhealthy genes. A graph-based representation of gene expression information was first offered by the authors in [1] and [2]; it permits constructing a classifier based on the association between graphs representing well-known classes and graphs representing the samples to be evaluated. Additionally, the Euclidean distance is used to measure the strength of the relationship that exists between the genes.
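The distance-based classification step can be sketched as a toy nearest-template classifier (the function names, vectors, and class labels below are hypothetical, not from the paper):

```python
def manhattan(a, b):
    """Manhattan (L1) distance between two gene-expression vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def classify(sample, class_templates):
    """Assign the sample to the class whose template expression vector
    is closest in Manhattan distance."""
    return min(class_templates, key=lambda c: manhattan(sample, class_templates[c]))

# Hypothetical per-class template vectors and a sample to classify.
templates = {"normal": [1.0, 2.0, 0.5], "tumor": [4.0, 0.2, 3.0]}
label = classify([3.8, 0.4, 2.9], templates)
```

The abstract mentions both Manhattan and Euclidean distances; swapping `manhattan` for an L2 metric changes only the distance function, not the classifier structure.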
Have We Missed Half of What the Neocortex Does? by Jeff Hawkins (12/15/2017, Numenta)
This was a presentation given on December 15, 2017 at the MIT Center for Brains, Minds + Machines as part of their Brains, Minds and Machines Seminar Series.
You can watch the recording of the presentation after Slide 1.
In this talk, Jeff describes a theory that sensory regions of the neocortex process two inputs. One input is the well-known sensory data arriving via thalamic relay cells. We propose the second input is a representation of allocentric location. The allocentric location represents where the sensed feature is relative to the object being sensed, in an object-centric reference frame. As the sensors move, cortical columns learn complete models of objects by integrating sensory features and location representations over time. Lateral projections allow columns to rapidly reach a consensus of what object is being sensed. We propose that the representation of allocentric location is derived locally, in layer 6 of each column, using the same tiling principles as grid cells in the entorhinal cortex. Because individual cortical columns are able to model complete complex objects, cortical regions are far more powerful than currently believed. The inclusion of allocentric location offers the possibility of rapid progress in understanding the function of numerous aspects of cortical anatomy.
Jeff discusses material from these two papers. Others can be found at https://numenta.com/papers
A Theory of How Columns in the Neocortex Enable Learning the Structure of the World
URL: https://doi.org/10.3389/fncir.2017.00081
Why Neurons Have Thousands of Synapses, A Theory of Sequence Memory in the Neocortex
URL: https://doi.org/10.3389/fncir.2016.00023
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academicians, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
HPC-MAQ: A PARALLEL SHORT-READ REFERENCE ASSEMBLER (cscpconf)
Bioinformatics and computational biology are rooted in the life sciences as well as the computer and information sciences and technologies. Bioinformatics applies principles of information science and technology to make the vast, diverse, and complex life-sciences data more understandable and useful, while computational biology uses mathematical and computational approaches to address theoretical and experimental questions in biology. Short-read sequence assembly is one of the most important steps in the analysis of biological data. Many open-source software packages are available for short-read sequence assembly; MAQ is one popularly used by the research community.
In general, the data sets generated by next-generation sequencers are huge and require a tremendous amount of computational resources. The algorithm used for short-read sequence assembly is NP-hard, and therefore computationally expensive and time consuming. MAQ is also single-threaded software that does not exploit multi-core or distributed computing and does not scale. In this paper we report HPC-MAQ, which addresses the NP-hard challenges of genome reference assembly and makes MAQ parallel and scalable through Hadoop, a software framework for distributed computing.
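The parallelization idea can be sketched in miniature: split the read set into chunks, align each chunk independently (the "map" step Hadoop distributes across nodes), and merge the per-chunk results (the "reduce" step). This toy sketch uses exact string matching in place of a real aligner; nothing here reflects HPC-MAQ's actual code.

```python
def partition(reads, n_workers):
    """Split reads into n_workers roughly equal chunks (the map inputs)."""
    return [reads[i::n_workers] for i in range(n_workers)]

def align_chunk(chunk, reference):
    """Toy 'aligner': position of each read's first exact match (-1 if none)."""
    return {read: reference.find(read) for read in chunk}

def merge(per_chunk_results):
    """'Reduce' step: combine the per-chunk alignment dictionaries."""
    merged = {}
    for result in per_chunk_results:
        merged.update(result)
    return merged

reference = "ACGTACGTTTGCA"
reads = ["ACGT", "GTTT", "AAAA"]
hits = merge(align_chunk(c, reference) for c in partition(reads, 2))
```

In a real Hadoop job the chunks would live in HDFS and `align_chunk` would run on separate nodes; the per-chunk independence is what makes the map step embarrassingly parallel.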
A clonal-based algorithm for the reconstruction of genetic networks using S-system (eSAT Journals)
Abstract. Motivation: A gene regulatory network is a network-based representation of the interactions between genes. The DNA microarray is the most widely used technology for extracting relationships between thousands of genes simultaneously; a microarray experiment provides gene expression data for a particular condition over varying time periods, and the expression of a particular gene depends on the biological conditions and on other genes. In this paper we propose a new method for the analysis of microarray data. The proposed method makes use of the S-system, a well-accepted model for gene regulatory network reconstruction. Since the problem has multiple solutions, an optimized solution must be identified, and evolutionary algorithms have been used for this purpose. Although a number of attempts have already been made by various researchers, the solutions are still unsatisfactory with respect to the time taken and the degree of accuracy achieved, so substantial further work is needed to achieve solutions with improved performance. Results: In this work we propose a clonal selection algorithm for identifying an optimal gene regulatory network. The approach is tested on real-life data: the SOS E. coli DNA-repair gene expression data. The proposed algorithm converges much faster and provides better results than existing algorithms. Index Terms: microarray analysis, evolutionary algorithm, artificial immune system, S-system, gene regulatory network, SOS E. coli DNA repair, clonal selection algorithm.
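The S-system model the abstract relies on writes each gene's rate of change as a difference of two power-law terms, dX_i/dt = α_i ∏_j X_j^{g_ij} − β_i ∏_j X_j^{h_ij}, where the exponents g and h encode the network. A minimal Euler-integration sketch (the parameter values are illustrative, not inferred parameters from the paper):

```python
def s_system_step(x, alpha, beta, g, h, dt=0.1):
    """One Euler step of the S-system ODEs:
    dXi/dt = alpha_i * prod_j Xj**g[i][j] - beta_i * prod_j Xj**h[i][j]
    """
    def power_product(exponents):
        p = 1.0
        for xj, e in zip(x, exponents):
            p *= xj ** e
        return p
    return [xi + dt * (alpha[i] * power_product(g[i]) - beta[i] * power_product(h[i]))
            for i, xi in enumerate(x)]

# One self-degrading gene: dX/dt = 1 - X, which relaxes toward X = 1.
x = [0.5]
x = s_system_step(x, alpha=[1.0], beta=[1.0], g=[[0.0]], h=[[1.0]])
```

Network reconstruction then means searching for the exponents g, h and rates α, β that best reproduce the observed microarray time series; that high-dimensional, multi-modal search is where the clonal selection algorithm comes in.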
EEG-Based Classification of Emotions with CNN and RNN (ijtsrd)
Emotions are biological states associated with the nervous system, especially the brain, brought on by neurophysiological changes. They variously relate to thoughts, feelings, behavioural responses, and a degree of pleasure or displeasure, and they exist everywhere in daily life. Evaluating human behaviour, which is primarily based on emotions, is a significant research topic in the development of artificial intelligence. In this paper, deep learning classifiers are applied to the SJTU Emotion EEG Dataset (SEED) to classify human emotions from EEG using Python, and the accuracies of the respective classifiers, a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN), are compared. The experimental results show that the RNN outperforms the CNN on this sequence prediction problem. S. Harshitha | Mrs. A. Selvarani, "EEG Based Classification of Emotions with CNN and RNN", International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN 2456-6470, Volume 4, Issue 4, June 2020. URL: https://www.ijtsrd.com/papers/ijtsrd30374.pdf Paper page: https://www.ijtsrd.com/engineering/electronics-and-communication-engineering/30374/eeg-based-classification-of-emotions-with-cnn-and-rnn/s-harshitha
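One reason an RNN suits EEG, as the result above suggests, is that its hidden state makes the output depend on the order of the samples, not just their values. A minimal single-unit sketch (the weights are arbitrary illustrative numbers, unrelated to the paper's trained models):

```python
import math

def rnn_step(h, x, w_h=0.5, w_x=1.0, b=0.0):
    """Recurrent update h_t = tanh(w_h * h_{t-1} + w_x * x_t + b)."""
    return math.tanh(w_h * h + w_x * x + b)

def encode(sequence):
    """Fold a 1-D signal into a final hidden state."""
    h = 0.0
    for x in sequence:
        h = rnn_step(h, x)
    return h

# The same two samples in a different order yield different encodings:
# exactly the temporal sensitivity a CNN's order-agnostic pooling lacks.
a = encode([1.0, 0.0])
b = encode([0.0, 1.0])
```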
Presentation by Marco Aurélio P. Lima at the "Workshop sobre Procedimentos que Regem o Relacionamento do CTBE com a Indústria"
Date: June 1, 2010
Place: CTBE, Campinas, Brazil
Event website: http://www.bioetanol.org.br/workshop6
Presentation of Martin Junginger for the "2nd Workshop on the Impact of New Technologies on the Sustainability of the Sugarcane/Bioethanol Production Cycle"
Date: November 11-12, 2009
Place: CTBE, Campinas, Brazil
Event website: http://www.bioetanol.org.br/workshop5
Presentation of Manoel Regis Leal for the "2nd Workshop on the Impact of New Technologies on the Sustainability of the Sugarcane/Bioethanol Production Cycle"
Date: November 11-12, 2009
Place: CTBE, Campinas, Brazil
Event website: http://www.bioetanol.org.br/workshop5
Presentation of Marcia Azanha for the "2nd Workshop on the Impact of New Technologies on the Sustainability of the Sugarcane/Bioethanol Production Cycle"
Date: November 11-12, 2009
Place: CTBE, Campinas, Brazil
Event website: http://www.bioetanol.org.br/workshop5
Presentation by Rosana Di Giorgio at the "Workshop sobre Procedimentos que Regem o Relacionamento do CTBE com a Indústria"
Date: June 1, 2010
Place: CTBE, Campinas, Brazil
Event website: http://www.bioetanol.org.br/workshop6
Presentation of Marcel Gomes for the "2nd Workshop on the Impact of New Technologies on the Sustainability of the Sugarcane/Bioethanol Production Cycle"
Date: November 11-12, 2009
Place: CTBE, Campinas, Brazil
Event website: http://www.bioetanol.org.br/workshop5
Event: II Workshop on Sugarcane Physiology for Agronomic Applications
Speaker: Frederick C. Botha (Sugar Research Australia)
Date: October 29-30, 2013
Place: CTBE/CNPEM Campus, Campinas, Brazil
Event website: www.bioetanol.org.br/sugarcanephysiology
Presentation of Arnaldo Walter for the "2nd Workshop on the Impact of New Technologies on the Sustainability of the Sugarcane/Bioethanol Production Cycle"
Date: November 11-12, 2009
Place: CTBE, Campinas, Brazil
Event website: http://www.bioetanol.org.br/workshop5
Event: II Workshop on Sugarcane Physiology for Agronomic Applications
Speaker: Gaspar H. Korndörfer (Federal University of Uberlândia)
Date: October 29-30, 2013
Place: CTBE/CNPEM Campus, Campinas, Brazil
Event website: www.bioetanol.org.br/sugarcanephysiology
Introduction to graph databases and Neo4j for bachelor's students in the life sciences: a hands-on workshop on Neo4j and the Cypher query language. The source material for the hands-on training is: https://neo4j.com/graphacademy/online-training/introduction-to-neo4j/
Single-cell RNA sequencing workshop given at the Ottawa Hospital Research Institute in 2018. Note that the slides contain animations that will not play on SlideShare.
Understanding Protein Function on a Genome-scale through the Analysis of Molecular Networks
Cornell Medical School, Physiology, Biophysics and Systems Biology (PBSB) graduate program, 2009.01.26, 16:00-17:00; [I:CORNELL-PBSB] (Long networks talk, incl. the following topics: why networks w. amsci*, funnygene*, net. prediction intro, memint*, tse*, essen*, sandy*, metagenomics*, netpossel*, tyna*+ topnet*, & pubnet* . Fits easily into 60’ w. 10’ questions. PPT works on mac & PC and has many photos w. EXIF tag kwcornellpbsb .)
Date Given: 01/26/2009
Summary: ENViz performs enrichment analysis for pathways and gene ontology (GO) terms in matched datasets of multiple data types (e.g. gene expression and metabolites or miRNA), then visualizes results as a Cytoscape network that can be navigated to show data overlaid on pathways and GO DAGs.
Background: Modern genomic, metabolomics, and proteomic assays produce multiplexed measurements that characterize molecular composition and biological activity from complimentary angles. Integrative analysis of such measurements remains a challenge to life science and biomedical researchers. We present an enrichment network approach to jointly analyzing two types of sample matched datasets and systematic annotations, implemented as a plugin to the Cytoscape [1] network biology software platform.
Approach: ENViz analyses a primary dataset (e.g. gene expression) with respect to a ‘pivot’ dataset (e.g. miRNA expression, metabolomics or proteomics measurements) and primary data annotation (e.g. pathway or GO). For each pivot entity, we rank elements of the primary data based on the correlation to the pivot across all samples, and compute statistical enrichment of annotation sets in the top of this ranked list based on minimum hypergeometric statistics [2]. Significant results are represented as an enrichment network - a bipartite graph with nodes corresponding to pivot and annotation entities, and edges corresponding to pivot-annotation pairs with statistical enrichmentscores above the user defined threshold. Correlations of primary data and pivot data are visually overlaid on biological pathways for significant pivot-annotation pairs using the WikiPathways resource [3], and on gene ontology terms. Edges of the enrichment network may point to functionally relevant mechanisms. In [4], a significant association between miR-19a and the cell-cycle module was substantiated as an association to proliferation, validated using a high-throughput transfection assay. The figures below show a pathway enrichment network, with pathway nodes green and miRNAs gray (left), network view of the edge between Inflammatory Response Pathway and mir-337-5p (center), and GO enrichment network with red areas indicating high enrichment for immune response and metabolic processes (right).
Personalized medicine via molecular interrogation, data mining and systems bi...Gerald Lushington
One of the major problems in our medical system is the prescription of medicines that, although well validated over a general group of clinical trial patients for specific ailments, may produce unhelpful or even harmful results in some individuals. A major emerging goal in the pharmaceutical and biomedical industries is the ability to tailor medicines to the individual. This can be achieved, but in practice still requires careful analysis of an extensive array of data and thus has not yet entered the mainstream medical practice.
large data set is not available for some disease such as Brain Tumor. This and part2 presentation shows how to find "Actionable solution from a difficult cancer dataset
Event / Evento: II Workshop on Sugarcane Physiology for Agronomic Applications
Speaker / Palestrante: Jorge Donzeli (Sugarcane Research Center - CTC)
Date / Data: Oct, 29-30th 2013 / 29 e 30 de outubro de 2013
Place / Local: CTBE/CNPEM Campus, Campinas, Brazil
Event Website / Website do evento: www.bioetanol.org.br/sugarcanephysiology
Event / Evento: II Workshop on Sugarcane Physiology for Agronomic Applications
Speaker / Palestrante: Renato Vicentini (University of Campinas - Unicamp)
Date / Data: Oct, 29-30th 2013 / 29 e 30 de outubro de 2013
Place / Local: CTBE/CNPEM Campus, Campinas, Brazil
Event Website / Website do evento: www.bioetanol.org.br/sugarcanephysiology
Apresentação de Naldo Dantas realizada no "Workshop sobre Procedimentos que Regem o Relacionamento do CTBE com a Indústria"
Data: 1 de junho de 2010
Local: CTBE, Campinas, Brasil
Website do evento: http://www.bioetanol.org.br/workshop6
Apresentação de Gilson Spanemberg realizada no "Workshop sobre Procedimentos que Regem o Relacionamento do CTBE com a Indústria"
Data: 1 de junho de 2010
Local: CTBE, Campinas, Brasil
Website do evento: http://www.bioetanol.org.br/workshop6
Apresentação de Laercio de Sequeira realizada no "Workshop sobre Procedimentos que Regem o Relacionamento do CTBE com a Indústria"
Data: 1 de junho de 2010
Local: CTBE, Campinas, Brasil
Website do evento: http://www.bioetanol.org.br/workshop6
Presentation of Celso Manzato for the "2nd Workshop on the Impact of New Technologies on the Sustainability of the Sugarcane/Bioethanol Production Cycle"
Apresentação de Celso Manzato realizada no "2nd Workshop on the Impact of New Technologies on the Sustainability of the Sugarcane/Bioethanol Production Cycle"
Date / Data : Novr 11th - 12th 2009/
11 e 12 de novembro de 2009
Place / Local: CTBE, Campinas, Brazil
Event Website / Website do evento: http://www.bioetanol.org.br/workshop5
Presentation of Thelma Krug for the "2nd Workshop on the Impact of New Technologies on the Sustainability of the Sugarcane/Bioethanol Production Cycle"
Apresentação de Thelma Krug realizada no "2nd Workshop on the Impact of New Technologies on the Sustainability of the Sugarcane/Bioethanol Production Cycle "
Date / Data : Novr 11th - 12th 2009/
11 e 12 de novembro de 2009
Place / Local: CTBE, Campinas, Brazil
Event Website / Website do evento: http://www.bioetanol.org.br/workshop5
Presentation of Robert Boddey for the "2nd Workshop on the Impact of New Technologies on the Sustainability of the Sugarcane/Bioethanol Production Cycle"
Apresentação de Robert Boddey realizada no "2nd Workshop on the Impact of New Technologies on the Sustainability of the Sugarcane/Bioethanol Production Cycle "
Date / Data : Novr 11th - 12th 2009/
11 e 12 de novembro de 2009
Place / Local: CTBE, Campinas, Brazil
Event Website / Website do evento: http://www.bioetanol.org.br/workshop5
Presentation of Dr Mairi J Black
for the "2nd Workshop on the Impact of New Technologies on the Sustainability of the Sugarcane/Bioethanol Production Cycle"
Apresentação de Dr Mairi J Black realizada no "2nd Workshop on the Impact of New Technologies on the Sustainability of the Sugarcane/Bioethanol Production Cycle "
Date / Data : Novr 11th - 12th 2009/
11 e 12 de novembro de 2009
Place / Local: CTBE, Campinas, Brazil
Event Website / Website do evento: http://www.bioetanol.org.br/workshop5
Presentation of Joaquim Seabra
for the "2nd Workshop on the Impact of New Technologies on the Sustainability of the Sugarcane/Bioethanol Production Cycle"
Apresentação de Joaquim Bento Ferreira realizada no "2nd Workshop on the Impact of New Technologies on the Sustainability of the Sugarcane/Bioethanol Production Cycle "
Date / Data : Novr 11th - 12th 2009/
11 e 12 de novembro de 2009
Place / Local: CTBE, Campinas, Brazil
Event Website / Website do evento: http://www.bioetanol.org.br/workshop5
Presentation of Joaquim Bento Ferreira for the "2nd Workshop on the Impact of New Technologies on the Sustainability of the Sugarcane/Bioethanol Production Cycle"
Apresentação de Joaquim Bento Ferreira realizada no "2nd Workshop on the Impact of New Technologies on the Sustainability of the Sugarcane/Bioethanol Production Cycle "
Date / Data : Novr 11th - 12th 2009/
11 e 12 de novembro de 2009
Place / Local: CTBE, Campinas, Brazil
Event Website / Website do evento: http://www.bioetanol.org.br/workshop5
Presentation of Gerd Spavorek for the "2nd Workshop on the Impact of New Technologies on the Sustainability of the Sugarcane/Bioethanol Production Cycle"
Apresentação de Gerd Spavorek realizada no "2nd Workshop on the Impact of New Technologies on the Sustainability of the Sugarcane/Bioethanol Production Cycle"
Date / Data : Novr 11th - 12th 2009/
11 e 12 de novembro de 2009
Place / Local: CTBE, Campinas, Brazil
Event Website / Website do evento: http://www.bioetanol.org.br/workshop5
Presentation of Carlos C. Cerri for the "2nd Workshop on the Impact of New Technologies on the Sustainability of the Sugarcane/Bioethanol Production Cycle"
Apresentação de Carlos C. Cerri realizada no "2nd Workshop on the Impact of New Technologies on the Sustainability of the Sugarcane/Bioethanol Production Cycle"
Date / Data : Novr 11th - 12th 2009/
11 e 12 de novembro de 2009
Place / Local: CTBE, Campinas, Brazil
Event Website / Website do evento: http://www.bioetanol.org.br/workshop5
Presentation of Antonio D. Santiago for the "2nd Workshop on the Impact of New Technologies on the Sustainability of the Sugarcane/Bioethanol Production Cycle"
Apresentação de Antonio D. Santiago realizada no "2nd Workshop on the Impact of New Technologies on the Sustainability of the Sugarcane/Bioethanol Production Cycle "
Date / Data : Novr 11th - 12th 2009/
11 e 12 de novembro de 2009
Place / Local: CTBE, Campinas, Brazil
Event Website / Website do evento: http://www.bioetanol.org.br/workshop5
Presentation of Andre Nassar for the "2nd Workshop on the Impact of New Technologies on the Sustainability of the Sugarcane/Bioethanol Production Cycle"
Apresentação de Marcos S. Buckeridge realizada no "2nd Workshop on the Impact of New Technologies on the Sustainability of the Sugarcane/Bioethanol Production Cycle"
Date / Data : Novr 11th - 12th 2009/
11 e 12 de novembro de 2009
Place / Local: CTBE, Campinas, Brazil
Event Website / Website do evento: http://www.bioetanol.org.br/workshop5
Presentation of Semida Silveira for the "2nd Workshop on the Impact of New Technologies on the Sustainability of the Sugarcane/Bioethanol Production Cycle"
Apresentação de Semida Silveira realizada no "2nd Workshop on the Impact of New Technologies on the Sustainability of the Sugarcane/Bioethanol Production Cycle "
Date / Data : Novr 11th - 12th 2009/
11 e 12 de novembro de 2009
Place / Local: CTBE, Campinas, Brazil
Event Website / Website do evento: http://www.bioetanol.org.br/workshop5
Presentation of Rubens Maciel for the "Workshop Virtual Sugarcane Biorefinery"
Apresentação de Rubens Maciel realizada no "Workshop Virtual Sugarcane Biorefinery "
Date / Data : Aug 13 - 14th 2009/
13 e 14 de agosto de 2009
Place / Local: ABTLus, Campinas, Brazil
Event Website / Website do evento: http://www.bioetanol.org.br/workshop4
Presentation of Rosana Ceron Di Giorgio for the "Workshop Virtual Sugarcane Biorefinery"
Apresentação de Rosana Ceron Di Giorgio realizada no "Workshop Virtual Sugarcane Biorefinery "
Date / Data : Aug 13 - 14th 2009/
13 e 14 de agosto de 2009
Place / Local: ABTLus, Campinas, Brazil
Event Website / Website do evento: http://www.bioetanol.org.br/workshop4
Presentation of Paulo A. Soares for the "Workshop Virtual Sugarcane Biorefinery"
Apresentação de Paulo A. Soares realizada no "Workshop Virtual Sugarcane Biorefinery "
Date / Data : Aug 13 - 14th 2009/
13 e 14 de agosto de 2009
Place / Local: ABTLus, Campinas, Brazil
Event Website / Website do evento: http://www.bioetanol.org.br/workshop4
Presentation of José Dilcio Rocha for the "Workshop Virtual Sugarcane Biorefinery"
Apresentação de José Dilcio Rocha realizada no "Workshop Virtual Sugarcane Biorefinery "
Date / Data : Aug 13 - 14th 2009/
13 e 14 de agosto de 2009
Place / Local: ABTLus, Campinas, Brazil
Event Website / Website do evento: http://www.bioetanol.org.br/workshop4
More from CTBE - Brazilian Bioethanol Sci&Tech Laboratory (20)
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Biological Network Inference via Gaussian Graphical Models
1. An introduction to Biological network inference via Gaussian Graphical Models
Christophe Ambroise, Julien Chiquet
Statistique et Génome, CNRS & Université d'Évry Val d'Essonne
São Paulo – School on Advanced Science – October 2012
http://stat.genopole.cnrs.fr/~cambroise
Network inference 1
2. Outline
Introduction
Motivations
Background on omics
Modeling issue
Modeling tools
Statistical dependence
Graphical models
Covariance selection and Gaussian vector
Gaussian Graphical Models for genomic data
Steady-state data
Time-course data
Statistical inference
Penalized likelihood approach
Inducing sparsity and regularization
The Lasso
Application in Post-genomics
Modeling time-course data
Illustrations
Multitask learning
5. Real networks
- Many scientific fields: World Wide Web, biology, sociology, physics.
- Nature of data under study: interactions between N objects, hence O(N²) possible interactions.
- Network topology: describes the way nodes interact, the structure/function relationship.
Figure: Sample of 250 blogs (nodes) with their links (edges) from the French political blogosphere.
6. What the reconstructed networks are expected to be (1)
Regulatory networks
Figure: E. coli regulatory network
- relationships between genes and their products,
- inhibition/activation,
- impossible to recover at large scale,
- always incomplete¹.
¹ and are presumably wrongly assumed to be
7. What the reconstructed networks are expected to be (2)
Regulatory networks
Figure: Regulatory network identified in mammalian cells: highly structured
8. What the reconstructed networks are expected to be (3)
Protein-protein interaction networks
Figure: Yeast PPI network: do not be misled by the representation, trust the statistics!
12. What are we looking at?
Central dogma of molecular biology:
DNA –(transcription)→ mRNA –(translation)→ Proteins, with replication copying DNA.
Proteins
- are the building blocks of any cellular functionality,
- are encoded by the genes,
- do interact (at the protein and gene level – regulations).
13. What questions in functional genomics? (1)
Various levels/scales of study
- genome: sequence analysis,
- transcriptome: gene expression levels,
- proteome: protein functions and interactions.
Questions
1. Biological understanding
- mechanisms of diseases,
- gene/protein functions and interactions.
2. Medical/clinical care
- diagnostic (type of disease),
- prognostic (survival analysis),
- treatment (prediction of response).
15. What questions in functional genomics? (2)
Central dogma of molecular biology:
DNA –(transcription)→ mRNA –(translation)→ Proteins, with replication copying DNA.
Basic biostatistical issues
- Selecting some genes of interest (biomarkers),
- Looking for interactions between them (pathway analysis).
16. How is this measured? (1)
Microarray technology: parallel measurement of many biological features.
Scanned images go through signal processing and pretreatment to yield a matrix of features with n ≪ p: the expression levels of p probes are simultaneously monitored for n individuals,
X = (x_i^j), i = 1, ..., n, j = 1, ..., p (an n × p individuals-by-probes table).
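As a concrete picture of this n ≪ p layout, here is a minimal NumPy sketch; the dimensions and the Gaussian log-expression levels are made-up assumptions for illustration, not real microarray data or the slides' own numbers.

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 40, 1000          # few arrays (individuals), many probes: n << p

# Simulated log-expression levels: rows = individuals, columns = probes.
# A real pipeline would fill X from scanned images after signal processing
# and pretreatment (normalization, log-transform).
X = rng.normal(loc=7.0, scale=1.5, size=(n, p))

print(X.shape)           # the usual n-by-p individuals/variables table
```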
17. How is this measured? (2)
Next Generation Sequencing: parallel measurement of even more biological features.
Reads go through assembling and pretreatment to yield a matrix of features with n ≪ p: expression counts are extracted from short repeated sequences and monitored for n individuals,
X = (k_i^j), i = 1, ..., n, j = 1, ..., p (an n × p count table).
18. What questions are we dealing with? (1)
Supervised canonical example at the gene level: differential analysis.
Leukemia (Golub data, thanks to P. Neuvial)
- AML – Acute Myeloblastic Leukemia, n1 = 11,
- ALL – Acute Lymphoblastic Leukemia, n2 = 27,
- an (n1 + n2)-vector of outcomes with each patient's tumor type.
Supervised classification: find genes with significantly different expression levels between the groups – biomarkers (prediction purpose).
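The differential-analysis step can be sketched as a per-gene two-sample (Welch) t-test. The group sizes match the slide, but the data below are simulated with planted effects, not the actual Golub dataset.

```python
import numpy as np

rng = np.random.default_rng(1)

# Group sizes from the slide; p and the planted effects are illustrative
n1, n2, p = 11, 27, 1000
aml = rng.normal(0.0, 1.0, size=(n1, p))
all_ = rng.normal(0.0, 1.0, size=(n2, p))
all_[:, :50] += 2.0                      # plant 50 truly differential genes

# Welch t-statistic per gene: mean difference scaled by its standard error
m1, m2 = aml.mean(axis=0), all_.mean(axis=0)
v1, v2 = aml.var(axis=0, ddof=1), all_.var(axis=0, ddof=1)
t = (m2 - m1) / np.sqrt(v1 / n1 + v2 / n2)

# Rank genes by |t| and keep the top 50 as biomarker candidates
candidates = np.argsort(-np.abs(t))[:50]
print(np.mean(candidates < 50))          # fraction of planted genes recovered
```

In practice one would convert these statistics to p-values and correct for multiple testing before declaring biomarkers.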
19. What questions are we dealing with? (2)
Unsupervised canonical example at the gene level: hierarchical clustering.
Same kind of data, but no outcome is considered.
(Unsupervised) clustering: find groups of genes which show statistical dependencies/commonalities – hoping for biological interactions (exploratory purpose, functional understanding).
Can we do better than that? And how do genes interact anyway?
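The clustering step can be sketched with SciPy's hierarchical clustering on a correlation distance. The two planted gene modules, the noise level, and the choice of average linkage are assumptions for illustration, not real expression data.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(2)

# Two planted modules of 10 genes each, correlated across 30 samples
base1, base2 = rng.normal(size=30), rng.normal(size=30)
genes = np.vstack([base1 + 0.3 * rng.normal(size=(10, 30)),
                   base2 + 0.3 * rng.normal(size=(10, 30))])

# Correlation distance (1 - Pearson correlation) between genes,
# passed to linkage() in condensed (upper-triangle) form
dist = 1.0 - np.corrcoef(genes)
Z = linkage(dist[np.triu_indices(20, k=1)], method="average")
labels = fcluster(Z, t=2, criterion="maxclust")

print(labels)   # the two planted modules should fall into two groups
```

Note the limitation the slide hints at: clustering groups co-expressed genes but says nothing about which gene interacts with which, which is exactly what network inference tries to add.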
22. The problem at hand
Inference
- ≈ 10s/100s microarray/sequencing experiments,
- ≈ 1000s probes ("genes").
Modeling questions prior to inference
1. What do the nodes represent? (the easiest one)
2. What is/should be the meaning of an edge? (the toughest one)
   - Biologically?
   - Statistically?
26. More questions/issues
Modelling
- Is the network dynamic or static?
- How has the data been generated? (time-course/steady-state)
- Are the edges oriented or not? (causality)
- What do the edges represent for my particular problem?
Statistical challenges
- (Ultra) high dimensionality,
- Noisy data, lack of reproducibility,
- Heterogeneity of the data (many techniques, various signals).
29. Canonical model settings
Biological microarrays in comparable conditions.
Notations
1. a set P = {1, ..., p} of p variables: these are typically the genes (could be proteins);
2. a sample N = {1, ..., n} of individuals associated to the variables: these are typically the microarrays (could be sequence counts).
Basic statistical model
This can be viewed as
- a random vector X in R^p, whose jth entry is the jth variable,
- an n-sized sample (X^1, ..., X^n), such that X^i is the ith microarray,
  - the copies could be independent and identically distributed (steady-state data),
  - or dependent in a certain way (time-course data),
- assuming a parametric probability distribution for X (Gaussian).
31. Canonical model settings (continued)
The data: stacking (X^1, ..., X^n) row-wise, we get the usual individual/variable table
X = (x_i^j), i = 1, ..., n, j = 1, ..., p,
on which inference is carried out.
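Under the steady-state assumption, the rows of X are i.i.d. draws from a multivariate Gaussian. A minimal sketch, where the chain-like covariance matrix is an assumption chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

p, n = 5, 200                       # p variables (genes), n i.i.d. microarrays
idx = np.arange(p)
Sigma = 0.5 ** np.abs(np.subtract.outer(idx, idx))   # assumed covariance

# Steady-state model: each row X[i] is an i.i.d. copy of X ~ N(0, Sigma)
X = rng.multivariate_normal(mean=np.zeros(p), cov=Sigma, size=n)

S = np.cov(X, rowvar=False)         # empirical covariance, estimating Sigma
print(np.round(S, 1))
```

This empirical covariance S is the raw material that the covariance-selection and penalized-likelihood sections of the outline build on.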
33. Modeling relationships between variables (1)
Independence
Definition (Independence of events)
Two events A and B are independent if and only if
P(A, B) = P(A) P(B),
which is usually denoted by A ⊥ B. Equivalently,
- A ⊥ B ⟺ P(A|B) = P(A),
- A ⊥ B ⟺ P(A|B) = P(A|B^c).
Example (class vs party)
Joint probability P(class, party) (left) vs. conditional probability P(party | class) (right):
class        Labour  Tory   |   class        Labour  Tory
working       0.42   0.28   |   working       0.60   0.40
bourgeoisie   0.06   0.24   |   bourgeoisie   0.20   0.80
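The table can be checked numerically: under independence the joint cell for (working, Labour) would equal the product of its marginals, which fails here, while dividing each joint row by the class marginal recovers the conditional table on the right.

```python
# Joint probabilities P(class, party) copied from the slide's left table
joint = {("working", "Labour"): 0.42, ("working", "Tory"): 0.28,
         ("bourgeoisie", "Labour"): 0.06, ("bourgeoisie", "Tory"): 0.24}

# Marginals
p_working = joint[("working", "Labour")] + joint[("working", "Tory")]       # 0.70
p_labour = joint[("working", "Labour")] + joint[("bourgeoisie", "Labour")]  # 0.48

# Independence would require P(working, Labour) = P(working) * P(Labour):
# here 0.42 != 0.70 * 0.48 = 0.336, so class and party are dependent.
independent = abs(joint[("working", "Labour")] - p_working * p_labour) < 1e-9

# Conditional probability P(Labour | working): first cell of the right table
p_labour_given_working = joint[("working", "Labour")] / p_working

print(independent, round(p_labour_given_working, 2))
```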
35. Modeling relationships between variables (2)
Conditional independence
Generalizing to more than two events requires strong assumptions (mutual independence). Better handled with:
Definition (Conditional independence of events)
Two events A and B are conditionally independent given C if and only if
P(A, B | C) = P(A|C) P(B|C),
which is usually denoted by A ⊥ B | C.
Example (Does IQ depend on weight?)
Consider the events A = "having low IQ", B = "having low weight".
36. Modeling relationships between variables (2)
Conditional independence
Generalizing to more than two events requires strong assumptions
(mutual independence). Better handle with
Definition (Conditional independence of events)
Two events A and B are independent if and only if
P(A, B |C ) = P(A|C )P(B |C ),
which is usually denoted by A ? B |C
?
Example (Does QI depends on weight?)
Consider the events A = ”having low QI”, B = ”having low weight”.
Network inference 24
37. Modeling relationships between variables (2)
Conditional independence
Generalizing to more than two events requires strong assumptions
(mutual independence). Better handle with
Definition (Conditional independence of events)
Two events A and B are independent if and only if
P(A, B |C ) = P(A|C )P(B |C ),
which is usually denoted by A ? B |C
?
Example (Does QI depends on weight?)
Consider the events A = ”having low QI”, B = ”having low weight”.
Estimating2 P(A, B ), P(A) and P(B ) in a sample would lead to
P(A, B ) 6= P(A)P(B )
2
stupidly
Network inference 24
38. Modeling relationships between variables (2)
Conditional independence
Generalizing to more than two events requires strong assumptions
(mutual independence). Better handle with
Definition (Conditional independence of events)
Two events A and B are independent if and only if
P(A, B |C ) = P(A|C )P(B |C ),
which is usually denoted by A ? B |C
?
Example (Does QI depends on weight?)
Consider the events A = ”having low QI”, B = ”having low weight”.
But in fact, introducing C = ”having a given age”,
P(A, B |C ) = P(A|C )P(B |C )
Network inference 24
39. Independence of random vectors (1)
Independence and Conditional independence: natural generalization
Definition
Consider 3 random vectors X, Y, Z with distributions f_X, f_Y, f_Z and joint distributions f_XY, f_XYZ. Then,
- X and Y are independent iff f_XY(x, y) = f_X(x) f_Y(y);
- X and Y are conditionally independent given Z iff, for all z with f_Z(z) > 0,
f_{XY|Z}(x, y; z) = f_{X|Z}(x; z) f_{Y|Z}(y; z).
Proposition (Factorization criterion)
X and Y are independent (resp. conditionally independent given Z) iff there exist functions g and h such that, for all x and y,
1. f_XY(x, y) = g(x) h(y),
2. f_XYZ(x, y, z) = g(x, z) h(y, z), for all z with f_Z(z) > 0.
Network inference 25
41. Independence of random vectors (2)
Independence vs Conditional independence
[Figure: graphical models contrasting mutual independence (f = f_X f_Y f_Z), the three conditional independence structures (X ⊥ Y | Z, X ⊥ Z | Y, Y ⊥ Z | X), and full dependence f = f_XYZ.]
Network inference 26
42. Outline
Introduction
Motivations
Background on omics
Modeling issue
Modeling tools
Statistical dependence
Graphical models
Covariance selection and Gaussian vector
Gaussian Graphical Models for genomic data
Steady-state data
Time-course data
Statistical inference
Penalized likelihood approach
Inducing sparsity and regularization
The Lasso
Application in Post-genomics
Modeling time-course data
Illustrations
Multitask learning
Network inference 27
43. Definition
Definition
A graphical model gives a graphical (intuitive) representation of the
dependence structure of a probability distribution.
Graphical structure ↔ Random variables/Random vector
It links
1. a random vector (or a set of random variables) X = {X1 , . . . , Xp }
with distribution P,
2. a graph G = (P, E) where
- P = {1, . . . , p} is the set of nodes associated to each variable,
- E is a set of edges describing the dependence relationships of X ~ P.
Network inference 28
45. Conditional Independence Graphs
Definition
Definition
The conditional independence graph of a random vector X is the undirected graph G = (P, E) with the set of nodes P = {1, . . . , p} and where
(i, j) ∉ E ⇔ X_i ⊥ X_j | X_{P∖{i,j}}.
Property
It satisfies the Markov property: any two subsets of variables separated by a third are independent conditionally on the variables in the third set.
Network inference 29
47. Conditional Independence Graphs
An example
Let X1, X2, X3, X4 be four random variables with joint probability density function f_X(x) = exp(u + x1 + x1 x2 + x2 x3 x4), with u a given constant.
Apply the factorization property
f_X(x) = exp(u + x1 + x1 x2 + x2 x3 x4)
       = exp(u) · exp(x1 + x1 x2) · exp(x2 x3 x4).
Graphical representation
Each factor links the variables it involves: exp(x1 + x1 x2) gives the edge (1, 2), and exp(x2 x3 x4) gives the triangle (2, 3), (2, 4), (3, 4). Hence G = (P, E) with P = {1, 2, 3, 4} and
E = {(1, 2), (2, 3), (2, 4), (3, 4)}.
Network inference 30
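The reading rule used in this example, two nodes are joined iff they co-occur in some factor, can be automated. A small Python sketch (the factor index sets are chosen to match the density on this slide; the code is illustrative):

```python
from itertools import combinations

# Each factor of f_X is represented by the set of variable indices it involves:
# exp(x1 + x1*x2) -> {1, 2}, exp(x2*x3*x4) -> {2, 3, 4}.
factors = [{1, 2}, {2, 3, 4}]

# The conditional independence graph joins i and j iff they co-occur in a factor.
edges = set()
for scope in factors:
    for i, j in combinations(sorted(scope), 2):
        edges.add((i, j))

print(sorted(edges))  # [(1, 2), (2, 3), (2, 4), (3, 4)]
```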
51. Directed Acyclic conditional independence Graph (DAG)
Motivation
Limitation of undirected graphs
Sometimes an ordering on the variables is known, which allows us to break the symmetry in the graphical representation and introduce, in some sense, "causality" into the modeling.
Consequences
- Each element of E has to be directed.
- There is no directed cycle in the graph.
We thus deal with a directed acyclic graph (or DAG).
Network inference 31
52. Directed Acyclic conditional independence Graph (DAG)
Definition
Definition (Ordering)
An ordering ≺ between variables {1, . . . , p} is a relation such that: i) for every couple (i, j), either i ≺ j or j ≺ i; ii) ≺ is transitive; iii) ≺ is not reflexive.
- A natural ordering is obtained when variables are observed across time,
- A natural conditioning set for a pair of variables (i, j) with i ≺ j is the past of j, denoted P(j) = {i : i ≺ j}.
Definition (DAG)
The directed conditional dependence graph of X is the directed graph G = (P, E) where, for (i, j) such that i ≺ j,
(i, j) ∉ E ⇔ X_j ⊥ X_i | X_{P(j)∖{i}}.
Network inference 32
54. Directed Acyclic conditional independence Graph (DAG)
Factorization and Markov property
Another view uses parent/descendant relationships to deal with the ordering of the nodes.
The factorization property
f_X(x) = ∏_{k=1}^p f_{X_k | pa_k}(x_k | pa_k),
where pa_k are the parents of node k.
Network inference 33
63. Directed Acyclic conditional independence Graph (DAG)
Markov property
Local Markov property
For any Y ∉ de_k, where de_k are the descendants of k,
X_k ⊥ Y | pa_k,
that is, X_k is conditionally independent of its non-descendants given its parents.
Network inference 35
66. Outline
Introduction
Motivations
Background on omics
Modeling issue
Modeling tools
Statistical dependence
Graphical models
Covariance selection and Gaussian vector
Gaussian Graphical Models for genomic data
Steady-state data
Time-course data
Statistical inference
Penalized likelihood approach
Inducing sparsity and regularization
The Lasso
Application in Post-genomics
Modeling time-course data
Illustrations
Multitask learning
Network inference 37
67. Modeling the genomic data
Gaussian assumption
The data
X = ( x_1^1 x_1^2 . . . x_1^p ; . . . ; x_n^1 x_n^2 . . . x_n^p ),
one row per microarray, one column per gene.
Inference: assuming f_X multivariate Gaussian
Greatly simplifies the inference:
- it naturally links independence and conditional independence to the covariance and partial covariance,
- it gives a straightforward interpretation to the graphical modeling previously considered.
Network inference 38
69. Start gently with the univariate Gaussian distribution
The Gaussian distribution is the natural model for the expression level of a gene (noisy data).
We write X ~ N(μ, σ²), so that EX = μ, Var X = σ², and
f_X(x) = (1 / √(2πσ²)) exp{ −(x − μ)² / (2σ²) },
and
log f_X(x) = −log √(2πσ²) − (x − μ)² / (2σ²).
Useless, however, for modeling the joint distribution of the expression levels of a whole bunch of genes.
Network inference 39
71. One step forward: bivariate Gaussian distribution
Need concepts of covariance and correlation
Let X , Y be two real random variables.
Definitions
cov(X, Y) = E[(X − EX)(Y − EY)] = E(XY) − E(X) E(Y),
ρ_XY = cor(X, Y) = cov(X, Y) / √(Var(X) · Var(Y)).
Proposition
- cov(X, X) = Var(X) = E[(X − EX)²],
- cov(X + Y, Z) = cov(X, Z) + cov(Y, Z),
- Var(X + Y) = Var(X) + Var(Y) + 2 cov(X, Y),
- X ⊥ Y ⇒ cov(X, Y) = 0,
- X ⊥ Y ⇔ cov(X, Y) = 0 when (X, Y) is jointly Gaussian.
Network inference 40
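These identities are easy to check on simulated data. A minimal Python sketch (the data are simulated for illustration; sample covariances satisfy the algebraic identities exactly, while the independence statement only holds approximately on a finite sample):

```python
import random

random.seed(0)
n = 200_000
# Two independent Gaussian samples.
x = [random.gauss(0, 1) for _ in range(n)]
y = [random.gauss(0, 1) for _ in range(n)]

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance (same denominator for both sides of each identity)."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)

s = [xi + yi for xi, yi in zip(x, y)]

# Var(X + Y) = Var(X) + Var(Y) + 2 cov(X, Y): exact, up to float rounding.
lhs = cov(s, s)
rhs = cov(x, x) + cov(y, y) + 2 * cov(x, y)
print(abs(lhs - rhs) < 1e-6)   # True

# X independent of Y => cov(X, Y) close to 0 on a large sample.
print(abs(cov(x, y)) < 0.05)   # True (only sampling noise remains)
```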
73. The bivariate Gaussian distribution
f_XY(x, y) = (1 / (2π √det Σ)) exp{ −(1/2) (x − μ1, y − μ2) Σ⁻¹ (x − μ1, y − μ2)ᵀ },
where Σ is the variance/covariance matrix, which is symmetric and positive definite:
Σ = ( Var(X)     cov(X, Y)
      cov(Y, X)  Var(Y)    ).
If the variables are standardized,
Σ = ( 1     ρ_XY
      ρ_XY  1    ),
and
f_XY(x, y) = (1 / (2π √(1 − ρ_XY²))) exp{ −(x² − 2 ρ_XY x y + y²) / (2 (1 − ρ_XY²)) },
where ρ_XY is the correlation between X and Y and describes the interaction between them.
Network inference 41
75. The bivariate Gaussian distribution
The Covariance Matrix
Let X ~ N(0, Σ) with unit variances. For ρ_XY = 0,
Σ = ( 1  0
      0  1 ),
while for ρ_XY = 0.9,
Σ = ( 1    0.9
      0.9  1   ).
The shape of the 2-D distribution evolves accordingly: circular level sets when ρ_XY = 0, elongated ellipses when ρ_XY = 0.9.
Network inference 42
77. Full generalization: multivariate Gaussian vector
Now need partial covariance and partial correlation
Let X , Y , Z be real random variables.
Definitions
cov(X, Y | Z) = cov(X, Y) − cov(X, Z) cov(Y, Z) / Var(Z),
ρ_{XY|Z} = (ρ_XY − ρ_XZ ρ_YZ) / ( √(1 − ρ_XZ²) √(1 − ρ_YZ²) ).
These give the interaction between X and Y once the effect of Z has been removed.
Proposition
When X, Y, Z are jointly Gaussian,
cov(X, Y | Z) = 0 ⇔ cor(X, Y | Z) = 0 ⇔ X ⊥ Y | Z.
Network inference 43
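The partial correlation formula translates directly into code. A small Python sketch on a toy example where Z drives both X and Y, so X and Y are correlated only through Z (the simulation setup is illustrative, not from the slides):

```python
import random

random.seed(1)
n = 100_000
# Z drives both X and Y: X and Y are correlated, but only through Z.
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 1) for zi in z]
y = [zi + random.gauss(0, 1) for zi in z]

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)

def cor(a, b):
    return cov(a, b) / (cov(a, a) * cov(b, b)) ** 0.5

# rho_{XY|Z} = (rho_XY - rho_XZ rho_YZ) / sqrt((1 - rho_XZ^2)(1 - rho_YZ^2))
r_xy, r_xz, r_yz = cor(x, y), cor(x, z), cor(y, z)
r_xy_given_z = (r_xy - r_xz * r_yz) / ((1 - r_xz**2) * (1 - r_yz**2)) ** 0.5

print(round(r_xy, 2))             # about 0.5: X and Y look dependent...
print(abs(r_xy_given_z) < 0.03)   # ...but conditionally on Z they are not
```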
79. The multivariate Gaussian distribution
This allows us to model the expression levels of a whole set of genes P.
Gaussian vector
Let X ~ N(μ, Σ), and consider any block decomposition with {a, b} a partition of P:
Σ = ( Σ_aa  Σ_ab
      Σ_ba  Σ_bb ).
Then
1. X_a is Gaussian with distribution N(μ_a, Σ_aa),
2. X_a | X_b = x is Gaussian with distribution N(μ_{a|b}, Σ_{a|b}), whose parameters are known in closed form.
Network inference 44
80. Outline
Introduction
Motivations
Background on omics
Modeling issue
Modeling tools
Statistical dependence
Graphical models
Covariance selection and Gaussian vector
Gaussian Graphical Models for genomic data
Steady-state data
Time-course data
Statistical inference
Penalized likelihood approach
Inducing sparsity and regularization
The Lasso
Application in Post-genomics
Modeling time-course data
Illustrations
Multitask learning
Network inference 45
82. Steady-state data: scheme
[Figure: inference scheme: from ≈ 10s of microarrays (≈ 1000s of probes, "genes") to the network. Which interactions?]
Network inference 47
83. Modeling the underlying distribution (1)
Model for data generation
- A microarray can be represented as a multivariate vector X = (X1, . . . , Xp) ∈ R^p,
- Consider n biological replicates in the same condition, which form a usual n-size sample (X¹, . . . , Xⁿ).
Consequence: a Gaussian Graphical Model
- X ~ N(μ, Σ), with X¹, . . . , Xⁿ i.i.d. copies of X,
- Θ = (θ_ij)_{i,j ∈ P} ≜ Σ⁻¹ is called the concentration matrix.
Network inference 48
85. Modeling the underlying distribution (2)
Interpretation as a GGM
Multivariate Gaussian vector and covariance selection
−θ_ij / √(θ_ii θ_jj) = cor(X_i, X_j | X_{P∖{i,j}}) = ρ_{ij | P∖{i,j}}.
Graphical Interpretation
The matrix Θ = (θ_ij)_{i,j ∈ P} encodes the network G we are looking for: there is an edge between i and j
⇔ conditional dependency between X_i and X_j
⇔ non-null partial correlation between X_i and X_j
⇔ θ_ij ≠ 0.
Network inference 49
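The covariance-selection rule can be illustrated numerically: invert a covariance matrix and read the edges off the nonzero pattern of Θ = Σ⁻¹. A minimal Python sketch with a hand-rolled Gauss-Jordan inverse; the 3-gene concentration matrix below is illustrative:

```python
def inverse(m):
    """Gauss-Jordan inverse of a small square matrix (list of lists)."""
    n = len(m)
    a = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col and a[r][col]:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]

# A concentration matrix Theta with edges (1,2) and (2,3) only: theta_13 = 0.
theta = [[2.0, 1.0, 0.0],
         [1.0, 2.0, 1.0],
         [0.0, 1.0, 2.0]]

sigma = inverse(theta)        # the covariance matrix Sigma = Theta^{-1}
theta_back = inverse(sigma)   # recover Theta from Sigma

print(round(sigma[0][2], 2))  # 0.25: genes 1 and 3 are marginally correlated...

# ...yet conditionally independent: edges are the pairs with theta_ij != 0.
edges = [(i + 1, j + 1)
         for i in range(3) for j in range(i + 1, 3)
         if abs(theta_back[i][j]) > 1e-9]
print(edges)  # [(1, 2), (2, 3)]
```

This is the key point of the slide: the covariance Σ has no zeros here, but its inverse does, and it is the inverse that encodes the graph.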
87. Outline
Introduction
Motivations
Background on omics
Modeling issue
Modeling tools
Statistical dependence
Graphical models
Covariance selection and Gaussian vector
Gaussian Graphical Models for genomic data
Steady-state data
Time-course data
Statistical inference
Penalized likelihood approach
Inducing sparsity and regularization
The Lasso
Application in Post-genomics
Modeling time-course data
Illustrations
Multitask learning
Network inference 50
88. Time-course data: scheme
[Figure: time-course design t0, t1, . . . , tn: from ≈ 10s of microarrays over time (≈ 1000s of probes, "genes") to the network. Which interactions?]
Network inference 51
89. Modeling time-course data with DAG
Collecting gene expression
1. Follow-up of one single experiment/individual;
2. Close enough time-points to ensure
- dependency between consecutive measurements;
- homogeneity of the Markov process.
[Figure: a network G over X1, . . . , X5 unrolled in time as a DAG: each variable X_j^t points to its targets at time t + 1, for t = 1, . . . , n.]
Network inference 52
92. DAG: remark
[Figure: a network G over X1, . . . , X5 containing a cycle, versus its time-unrolled version over (X^t, X^{t+1}).]
Argh, there is a cycle in G :'( yet the time-unrolled graph is indeed a DAG.
This overcomes the rather restrictive acyclic requirement.
Network inference 53
93. Modeling the underlying distribution (1)
Model for data generation
A microarray can be represented as a multivariate vector X = (X1, . . . , Xp) ∈ R^p, generated through a first-order vector autoregressive process VAR(1):
X^t = Θ X^{t−1} + b + ε^t, t ∈ [1, n],
where ε^t is a white noise that ensures the Markov property, and X^0 ~ N(0, Σ_0).
Consequence: a Gaussian Graphical Model
- Each X^t | X^{t−1} ~ N(Θ X^{t−1}, Σ),
- or, equivalently, X_j^t | X^{t−1} ~ N(Θ_j X^{t−1}, σ²),
where Σ is known and Θ_j is the j-th row of Θ.
Network inference 54
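The VAR(1) generation model is easy to simulate. A minimal Python sketch with a 3-gene network; the matrix Θ, the intercept b and the noise level are illustrative choices (Θ is taken with small entries so the process is stable):

```python
import random

random.seed(42)
p, n = 3, 50  # 3 genes, 50 transitions

# Theta encodes the network: gene 1 drives gene 2, gene 2 drives gene 3.
theta = [[0.5, 0.0, 0.0],
         [0.4, 0.5, 0.0],
         [0.0, 0.4, 0.5]]
b = [0.0, 0.0, 0.0]

def step(x_prev):
    """One transition X^t = Theta X^{t-1} + b + eps^t with Gaussian white noise."""
    return [sum(theta[i][j] * x_prev[j] for j in range(p)) + b[i]
            + random.gauss(0, 0.1) for i in range(p)]

x = [[random.gauss(0, 1) for _ in range(p)]]  # X^0 ~ N(0, Sigma_0)
for _ in range(n):
    x.append(step(x[-1]))

print(len(x), len(x[0]))  # 51 3: a time series of p-dimensional observations
```

Inference then amounts to recovering the zero pattern of Θ from the n + 1 observed vectors.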
96. Modeling the underlying distribution (3)
Interpretation as a GGM
The VAR(1) as a covariance selection model
θ_ij = cov(X_i^t, X_j^{t−1} | X_{P∖j}^{t−1}) / var(X_j^{t−1} | X_{P∖j}^{t−1}).
Graphical Interpretation
The matrix Θ = (θ_ij)_{i,j ∈ P} encodes the network G we are looking for: there is an edge from j to i
⇔ conditional dependency between X_j^{t−1} and X_i^t
⇔ non-null partial correlation between X_j^{t−1} and X_i^t
⇔ θ_ij ≠ 0.
Network inference 56
98. Outline
Introduction
Motivations
Background on omics
Modeling issue
Modeling tools
Statistical dependence
Graphical models
Covariance selection and Gaussian vector
Gaussian Graphical Models for genomic data
Steady-state data
Time-course data
Statistical inference
Penalized likelihood approach
Inducing sparsity and regularization
The Lasso
Application in Post-genomics
Modeling time-course data
Illustrations
Multitask learning
Network inference 57
100. The graphical models: reminder (for goldfish-like memories)
Assumption
A microarray can be represented as a multivariate Gaussian vector X.
Collecting gene expression
1. Steady-state data leads to an i.i.d. sample.
2. Time-course data gives a time series.
Graphical interpretation
There is an edge between i and j if and only if there is a conditional dependency (equivalently, a non-null partial correlation) between X(i) and X(j); for time-course data, between X_t(i) and X_{t−1}(j).
Encoded in an unknown matrix of parameters Θ.
Network inference 59
103. The Maximum likelihood estimator
The natural approach for parametric statistics
Let X be a random vector with distribution defined by fX (x ; ⇥), where
⇥ are the model parameters.
Maximum likelihood estimator
Θ̂ = argmax_Θ L(Θ; X),
where L is the log-likelihood, a function of the parameters:
L(Θ; X) = log ∏_{k=1}^n f_X(x_k; Θ) = ∑_{k=1}^n log f_X(x_k; Θ),
where x_k is the k-th row of X.
Remarks
- This is a convex optimization problem,
- We just need to detect the nonzero coefficients of Θ.
Network inference 60
104. The penalized likelihood approach
Let ⇥ be the parameters to infer (the edges).
A penalized likelihood approach
Θ̂ = argmax_Θ L(Θ; X) − λ · pen_ℓ1(Θ),
- L is the model log-likelihood,
- pen_ℓ1 is a penalty function tuned by λ > 0.
It performs
1. regularization (needed when n ≪ p),
2. selection (sparsity induced by the ℓ1-norm).
106. Outline
Introduction
Motivations
Background on omics
Modeling issue
Modeling tools
Statistical dependence
Graphical models
Covariance selection and Gaussian vector
Gaussian Graphical Models for genomic data
Steady-state data
Time-course data
Statistical inference
Penalized likelihood approach
Inducing sparsity and regularization
The Lasso
Application in Post-genomics
Modeling time-course data
Illustrations
Multitask learning
Network inference 62
107. A Geometric View of Sparsity
Constrained Optimization
We basically want to solve a problem of the form
maximize_{β1,β2} f(β1, β2; X),
where f is typically a concave likelihood function. This is strictly equivalent to solving
minimize_{β1,β2} g(β1, β2; X),
where g = −f is convex (for instance the square loss in the OLS).
Adding a constraint,
maximize_{β1,β2} f(β1, β2; X)   s.t. Ω(β1, β2) ≤ c,
where Ω defines a domain that constrains β. This is in turn equivalent to the penalized problem
maximize_{β1,β2} f(β1, β2; X) − λ Ω(β1, β2).
How shall we define Ω to induce sparsity?
Network inference 63
111. A Geometric View of Sparsity
Supporting Hyperplane
A hyperplane supports a set iff
- the set is contained in one half-space,
- the set has at least one point on the hyperplane.
[Figure: supporting hyperplanes at several boundary points of sets in the (β1, β2) plane.]
There are supporting hyperplanes at all points of convex sets: they generalize tangents.
Network inference 64
117. A Geometric View of Sparsity
Dual Cone
Generalizes normals.
[Figure: dual cones at several boundary points of convex sets in the (β1, β2) plane.]
Shape of dual cones ⇒ sparsity pattern.
Network inference 65
121. Outline
Introduction
Motivations
Background on omics
Modeling issue
Modeling tools
Statistical dependence
Graphical models
Covariance selection and Gaussian vector
Gaussian Graphical Models for genomic data
Steady-state data
Time-course data
Statistical inference
Penalized likelihood approach
Inducing sparsity and regularization
The Lasso
Application in Post-genomics
Modeling time-course data
Illustrations
Multitask learning
Network inference 66
122. The LASSO
R. Tibshirani, 1996. The Lasso: Least Absolute Shrinkage and Selection Operator.
S. Chen, D. Donoho, M. Saunders, 1995. Basis Pursuit.
Weisberg, 1980. Forward Stagewise regression.
minimize_{β ∈ R²} ‖y − Xβ‖₂²   s.t. ‖β‖₁ = |β1| + |β2| ≤ c,
⇕
minimize_{β ∈ R²} ‖y − Xβ‖₂² + λ ‖β‖₁.
[Figure: comparison of the solutions of ℓ1- and ℓ2-regularized problems: the corners of the ℓ1 ball produce sparse solutions, the ℓ2 ball does not.]
Network inference 67
123. Orthogonal case and link to the OLS
OLS shrinkage
The Lasso has no analytical solution except in the orthogonal case: when XᵀX = I (never true for real data),
β̂_j^lasso = sign(β̂_j^ols) · max(0, |β̂_j^ols| − λ).
[Figure: the soft-thresholding function: the Lasso estimate as a function of the OLS estimate, shrunk toward zero by λ and exactly zero on [−λ, λ].]
Network inference 68
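This closed form is the soft-thresholding operator. A minimal Python sketch (the OLS values and λ below are illustrative):

```python
def soft_threshold(beta_ols, lam):
    """Lasso solution in the orthogonal case: shrink by lam, set small values to 0."""
    mag = max(0.0, abs(beta_ols) - lam)
    if mag == 0.0:
        return 0.0
    return mag if beta_ols > 0 else -mag

lam = 1.0
for b in [-3.0, -0.5, 0.0, 0.4, 2.5]:
    print(b, "->", soft_threshold(b, lam))
# -3.0 -> -2.0 and 2.5 -> 1.5: every estimate is shrunk by lambda,
# while |b| <= lambda is set exactly to 0, which is where sparsity comes from.
```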
124. LARs: Least angle regression
B. Efron, T. Hastie, I. Johnstone, R. Tibshirani, 2004.
Least Angle Regression.
Efficient algorithm to compute the Lasso solutions
The LARS solution consists of a curve denoting the solution for each value of λ:
- it constructs a piecewise-linear path of solutions, starting from the null vector and moving towards the OLS estimate,
- (almost) the same cost as OLS,
- well adapted to cross-validation (helps us choose λ).
Network inference 69
125. Example: prostate cancer I
Lasso solution path with Lars
> library(lars)
> load("prostate.rda")
> x <- as.matrix(x)
> x <- scale(as.matrix(x))
> out <- lars(x,y)
> plot(out)
Network inference 70
127. Choice of the tuning parameter I
Model selection criteria
BIC(λ) = ‖y − Xβ̂_λ‖₂² + log(n) · df(β̂_λ),
AIC(λ) = ‖y − Xβ̂_λ‖₂² + 2 · df(β̂_λ),
where df(β̂_λ) is the number of nonzero entries in β̂_λ.
Cross-validation
1. split the data into K folds,
2. use successively each of the K folds as the testing set,
3. compute the test error on each fold,
4. average to obtain the CV estimate of the test error.
λ is chosen to minimize the CV test error.
Network inference 72
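Steps 1-4 can be sketched independently of the Lasso itself. In the Python sketch below the "model" is simply the training mean, so the example stays self-contained; the data are simulated for illustration:

```python
import random

random.seed(0)
y = [random.gauss(5.0, 1.0) for _ in range(100)]  # toy data: mean 5, variance 1
K = 10

# 1. split the data into K folds
idx = list(range(len(y)))
random.shuffle(idx)
folds = [idx[k::K] for k in range(K)]

errors = []
for k in range(K):
    # 2. use successively each fold as the testing set
    test_set = set(folds[k])
    train = [i for i in idx if i not in test_set]
    model = sum(y[i] for i in train) / len(train)  # toy "fit": the training mean
    # 3. compute the test error on this fold
    errors.append(sum((y[i] - model) ** 2 for i in test_set) / len(test_set))

# 4. average to obtain the CV estimate of the test error
cv_error = sum(errors) / K
print(round(cv_error, 1))  # close to the noise variance (1.0) on this toy data
```

For the Lasso, the same loop is run on a grid of λ values and the λ with the smallest CV error is kept, which is what `cv.lars` automates.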
128. Choice of the tuning parameter II
CV choice for λ
> cv.lars(x,y, K=10)
Network inference 73
130. Many variations
Group-Lasso
Activate the variables by group (given by the user).
Adaptive/Weighted-Lasso
Adjust the penalty level of each variable, according to prior knowledge or with data-driven weights.
BoLasso
Bootstrapped version that removes false positives/stabilizes the estimate.
etc.
+ many theoretical results.
Network inference 75
131. Outline
Introduction
Motivations
Background on omics
Modeling issue
Modeling tools
Statistical dependence
Graphical models
Covariance selection and Gaussian vector
Gaussian Graphical Models for genomic data
Steady-state data
Time-course data
Statistical inference
Penalized likelihood approach
Inducing sparsity and regularization
The Lasso
Application in Post-genomics
Modeling time-course data
Illustrations
Multitask learning
Network inference 76
133. Problem
[figure: time-course data at t0, t1, …, tn feeding an inference step — which interactions?]
≈ 10s of microarrays over time
≈ 1000s of probes (“genes”)
The main statistical issue is the high-dimensional setting.
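A quick numerical check of why the n ≪ p regime is the core difficulty (a sketch with simulated data): with tens of arrays and thousands of probes, the empirical covariance matrix is rank-deficient, so it cannot be inverted to read off partial correlations, and some prior is needed.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 10, 1000                 # 10s of arrays, 1000s of probes
X = rng.standard_normal((n, p))
S = np.cov(X, rowvar=False)     # p x p empirical covariance

# rank(S) <= n - 1 << p: S is singular, so the precision matrix
# (hence the network) is not identifiable without regularization.
print(S.shape, np.linalg.matrix_rank(S))
```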
134. Handling the scarcity of the data
By introducing some prior
Priors should be biologically grounded
1. few genes effectively interact (sparsity),
2. networks are organized (latent clustering).
[figure: a sparse network over genes G1–G13]
136. Handling the scarcity of the data
By introducing some prior
Priors should be biologically grounded
1. few genes effectively interact (sparsity),
2. networks are organized (latent clustering).
[figure: the same network with nodes grouped into latent clusters A, B and C]
137. Penalized log-likelihood
Banerjee et al., JMLR 2008
$$\hat\Theta = \arg\max_{\Theta} \; L_{\mathrm{iid}}(\Theta; S) - \lambda\,\|\Theta\|_{\ell_1},$$
efficiently solved by the graphical Lasso of Friedman et al., 2008.
Ambroise, Chiquet, Matias, EJS 2009
Use adaptive penalty parameters for the different coefficients:
$$\tilde L_{\mathrm{iid}}(\Theta; S) - \lambda\,\|P_Z \star \Theta\|_{\ell_1},$$
where $P_Z$ is a matrix of weights depending on the underlying clustering $Z$.
Works with the pseudo log-likelihood (computationally efficient).
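The penalized objective can be written down and evaluated directly. The sketch below only scores two hand-picked candidate precision matrices — it does not implement the graphical-lasso solver of Friedman et al. — but it shows how the $\ell_1$ penalty tips the balance towards sparsity:

```python
import numpy as np

def penalized_loglik(Theta, S, lam):
    """log det(Theta) - trace(S Theta) - lam * ||Theta||_1,
    the graphical-lasso objective up to constants."""
    sign, logdet = np.linalg.slogdet(Theta)
    return logdet - np.trace(S @ Theta) - lam * np.abs(Theta).sum()

S = np.array([[1.0, 0.8],
              [0.8, 1.0]])           # empirical covariance, strong correlation
Theta_full = np.linalg.inv(S)        # dense candidate: keeps the edge
Theta_diag = np.eye(2)               # sparse candidate: drops the edge

f0_full = penalized_loglik(Theta_full, S, 0.0)
f0_diag = penalized_loglik(Theta_diag, S, 0.0)
f2_full = penalized_loglik(Theta_full, S, 0.2)
f2_diag = penalized_loglik(Theta_diag, S, 0.2)
```

With no penalty, the dense inverse maximizes the likelihood; with $\lambda = 0.2$ its large entries are taxed enough that the diagonal candidate scores higher.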
139. Neighborhood selection (1)
Let
- $X_i$ be the $i$-th column of $X$,
- $X_{\setminus i}$ be $X$ deprived of $X_i$.
$$X_i = X_{\setminus i}\,\beta + \varepsilon, \quad \text{where } \beta_j = -\frac{\theta_{ij}}{\theta_{ii}}.$$
Meinshausen and Bühlmann, 2006
Since $\operatorname{sign}(\operatorname{cor}_{ij \mid P\setminus\{i,j\}}) = \operatorname{sign}(\beta_j)$, select the neighbors of $i$ with
$$\arg\min_{\beta} \; \frac{1}{n}\,\|X_i - X_{\setminus i}\,\beta\|_2^2 + \lambda\,\|\beta\|_{\ell_1}.$$
The sign pattern of $\Theta$ is inferred after a symmetrization step.
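Neighborhood selection is easy to sketch: one $\ell_1$-penalized regression per node, then a symmetrization step (the "OR" rule here; an "AND" rule is the other common choice). The `lasso_cd` and `neighborhood_selection` helpers and the 3-gene chain graph are illustrative assumptions:

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/(2n))||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return b

def neighborhood_selection(X, lam):
    n, p = X.shape
    A = np.zeros((p, p), dtype=bool)
    for i in range(p):                       # regress node i on the others
        others = [j for j in range(p) if j != i]
        b = lasso_cd(X[:, others], X[:, i], lam)
        A[i, others] = np.abs(b) > 1e-8
    return A | A.T                           # "OR" symmetrization step

# Chain graph 0 - 1 - 2: theta_02 = 0, so no edge between nodes 0 and 2
rng = np.random.default_rng(4)
Theta = np.array([[1.0, 0.4, 0.0],
                  [0.4, 1.0, 0.4],
                  [0.0, 0.4, 1.0]])
X = rng.multivariate_normal(np.zeros(3), np.linalg.inv(Theta), size=500)
A = neighborhood_selection(X, lam=0.15)
```

On this simulated chain, the recovered adjacency should contain the edges (0,1) and (1,2) but not the absent edge (0,2).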
140. Neighborhood selection (2)
The pseudo log-likelihood of the i.i.d. Gaussian sample is
$$\tilde L_{\mathrm{iid}}(\Theta; S) = \sum_{i=1}^{p} \left( \sum_{k=1}^{n} \log \mathbb{P}\big(X_k(i) \mid X_k(P_i); \Theta_i\big) \right)$$
$$= \frac{n}{2}\log\det(D) - \frac{n}{2}\operatorname{Trace}\!\left(D^{-1/2}\,\Theta S \Theta\, D^{-1/2}\right) - \frac{np}{2}\log(2\pi),$$
where $D = \operatorname{diag}(\Theta)$.
Proposition
$$\hat\Theta^{\mathrm{pseudo}} = \arg\max_{\Theta \text{ (not necessarily symmetric)}} \; \tilde L_{\mathrm{iid}}(\Theta; S) - \lambda\,\|\Theta\|_{\ell_1}$$
has the same null entries as those inferred by neighborhood selection.
141. Structured regularization
Introduce prior knowledge
Building the weights
1. Build w from prior biological information:
   - transcription factors vs. regulatees,
   - number of potential binding sites,
   - KEGG pathways, Gene Ontology, …
2. Build the weights matrix from a clustering algorithm:
   - infer the network G0 with w = 1 for each node,
   - apply a clustering algorithm on G0,
   - re-infer G with w built according to the clustering Z.
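The two-step weight-building recipe above can be sketched in the regression setting: fit once with uniform weights, then lower the penalty on coefficients found active, in the spirit of an adaptive/weighted lasso. The `weighted_lasso_cd` helper, the data, and the down-weighting factor are made up for illustration; a real run would build the weights from a clustering of the first network, as the slide describes.

```python
import numpy as np

def weighted_lasso_cd(X, y, lam, w, n_iter=200):
    """Coordinate descent for (1/(2n))||y - Xb||^2 + lam * sum_j w_j |b_j|."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ r / n
            # per-variable threshold lam * w_j
            b[j] = np.sign(rho) * max(abs(rho) - lam * w[j], 0.0) / col_sq[j]
    return b

rng = np.random.default_rng(5)
n, p = 60, 4
X = rng.standard_normal((n, p))
y = X @ np.array([1.0, 0.6, 0.0, 0.0]) + 0.3 * rng.standard_normal(n)

# Step 1: uniform weights, like the first-pass network G0
b0 = weighted_lasso_cd(X, y, lam=0.3, w=np.ones(p))
# Step 2: down-weight the penalty on the variables found active
w = np.where(np.abs(b0) > 1e-8, 0.2, 1.0)
b1 = weighted_lasso_cd(X, y, lam=0.3, w=w)
```

The second pass keeps the same support but shrinks the active coefficients much less, reducing the bias of the first fit.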