Network analysis for computational biology
Upcoming SlideShare
Loading in...5
×
 

Network analysis for computational biology

on

  • 149 views

Séminaire INSERM, I2C, Toulouse

Séminaire INSERM, I2C, Toulouse
October 11th, 2013

Statistics

Views

Total Views
149
Views on SlideShare
116
Embed Views
33

Actions

Likes
0
Downloads
1
Comments
0

2 Embeds 33

http://www.nathalievilla.org 26
http://localhost 7

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

CC Attribution-NonCommercial LicenseCC Attribution-NonCommercial License

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

Network analysis for computational biology Network analysis for computational biology Presentation Transcript

  • Network analysis for computational biology Nathalie Villa-Vialaneix - nathalie.villa@univ-paris1.fr http://www.nathalievilla.org IUT STID (Carcassonne) & SAMM (Université Paris 1) Séminaire INSERM - 11 octobre 2011 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 1 / 19
  • Outline 1 An introduction to networks/graphs 2 Analysis of the eect of low-calorie diet on obese women 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 2 / 19
  • An introduction to networks/graphs Outline 1 An introduction to networks/graphs 2 Analysis of the eect of low-calorie diet on obese women 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 3 / 19
  • An introduction to networks/graphs What is a network/graph? réseau/graphe Mathematical object used to model relational data between entities. 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 4 / 19
  • An introduction to networks/graphs What is a network/graph? réseau/graphe Mathematical object used to model relational data between entities. The entities are called the nodes or the vertices n÷uds/sommets 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 4 / 19
  • An introduction to networks/graphs What is a network/graph? réseau/graphe Mathematical object used to model relational data between entities. A relation between two entities is modeled by an edge arête 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 4 / 19
  • An introduction to networks/graphs Examples Bibliographic network nodes gene, TF, enzyme... edges a relationship between two nodes that is already reported in the litterature 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 5 / 19
  • An introduction to networks/graphs Examples Network inferred from expression data nodes genes edges a strong direct co-expression between two nodes observed in the gene expression data 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 5 / 19
  • An introduction to networks/graphs Standard issues associated with networks Inference Giving expression data, how to build a graph whose edges represent the direct links between genes? Example: co-expression networks built from microarray/RNAseq data (nodes = genes; edges = signicant direct links between expressions of two genes) 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 6 / 19
  • An introduction to networks/graphs Standard issues associated with networks Inference Giving expression data, how to build a graph whose edges represent the direct links between genes? Graph mining (examples) 1 Network visualization: nodes are not a priori associated to a given position. How to represent the network in a meaningful way? Random positions Positions aiming at representing connected nodes closer 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 6 / 19
  • An introduction to networks/graphs Standard issues associated with networks Inference Giving expression data, how to build a graph whose edges represent the direct links between genes? Graph mining (examples) 1 Network visualization: nodes are not a priori associated to a given position. How to represent the network in a meaningful way? 2 Network clustering: identify communities (groups of nodes that are densely connected and share a few links (comparatively) with the other groups) 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 6 / 19
  • An introduction to networks/graphs Network inference Data: large scale gene expression data individuals n 30/50    X =   . . . . . . . . X j i . . . . . . . . .   variables (genes expression), p 103/4 What we want to obtain: a network with • nodes: genes; • edges: signicant and direct co-expression between two genes (track transcription regulations) 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 7 / 19
  • An introduction to networks/graphs Advantages of inferring a network from large scale transcription data 1 over raw data: focuses on the strongest direct relationships: irrelevant or indirect relations are removed (more robust) and the data are easier to visualize and understand. Expression data are analyzed all together and not by pairs. 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 8 / 19
  • An introduction to networks/graphs Advantages of inferring a network from large scale transcription data 1 over raw data: focuses on the strongest direct relationships: irrelevant or indirect relations are removed (more robust) and the data are easier to visualize and understand. Expression data are analyzed all together and not by pairs. 2 over bibliographic network: can handle interactions with yet unknown (not annotated) genes and deal with data collected in a particular condition. 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 8 / 19
  • An introduction to networks/graphs Using correlations: relevance network [Butte and Kohane, 1999, Butte and Kohane, 2000] First (naive) approach: calculate correlations between expressions for all pairs of genes, threshold the smallest ones and build the network. Correlations Thresholding Graph 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 9 / 19
  • An introduction to networks/graphs But correlation is not causality... 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 10 / 19
  • An introduction to networks/graphs But correlation is not causality... strong indirect correlation y z x set.seed(2807); x <- runif(100) y <- 2*x+1+rnorm(100,0,0.1); cor(x,y); [1] 0.9988261 z <- 2*x+1+rnorm(100,0,0.1); cor(x,z); [1] 0.998751 cor(y,z); [1] 0.9971105 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 10 / 19
  • An introduction to networks/graphs But correlation is not causality... strong indirect correlation y z x set.seed(2807); x <- runif(100) y <- 2*x+1+rnorm(100,0,0.1); cor(x,y); [1] 0.9988261 z <- 2*x+1+rnorm(100,0,0.1); cor(x,z); [1] 0.998751 cor(y,z); [1] 0.9971105 Partial correlation cor(lm(y∼x)$residuals,lm(z∼x)$residuals) [1] -0.1933699 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 10 / 19
  • An introduction to networks/graphs But correlation is not causality... strong indirect correlation y z x Networks are built using partial correlations, i.e., correlations between gene expressions knowing the expression of all the other genes (residual correlations). 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 10 / 19
  • An introduction to networks/graphs Visualization tools help understand the graph macro-structure Purpose: How to display the nodes in a meaningful and aesthetic way? 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 11 / 19
  • An introduction to networks/graphs Visualization tools help understand the graph macro-structure Purpose: How to display the nodes in a meaningful and aesthetic way? Standard approach: force directed placement algorithms (FDP) algorithmes de forces (e.g., [Fruchterman and Reingold, 1991]) 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 11 / 19
  • An introduction to networks/graphs Visualization tools help understand the graph macro-structure Purpose: How to display the nodes in a meaningful and aesthetic way? Standard approach: force directed placement algorithms (FDP) algorithmes de forces (e.g., [Fruchterman and Reingold, 1991]) • attractive forces: similar to springs along the edges 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 11 / 19
  • An introduction to networks/graphs Visualization tools help understand the graph macro-structure Purpose: How to display the nodes in a meaningful and aesthetic way? Standard approach: force directed placement algorithms (FDP) algorithmes de forces (e.g., [Fruchterman and Reingold, 1991]) • attractive forces: similar to springs along the edges • repulsive forces: similar to electric forces between all pairs of vertices 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 11 / 19
  • An introduction to networks/graphs Visualization tools help understand the graph macro-structure Purpose: How to display the nodes in a meaningful and aesthetic way? Standard approach: force directed placement algorithms (FDP) algorithmes de forces (e.g., [Fruchterman and Reingold, 1991]) • attractive forces: similar to springs along the edges • repulsive forces: similar to electric forces between all pairs of vertices iterative algorithm until stabilization of the vertex positions. 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 11 / 19
  • An introduction to networks/graphs Visualization software • package igraph1 [Csardi and Nepusz, 2006] (static representation with useful tools for graph mining) 1 http://igraph.sourceforge.net/ 2 http://gephi.org 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 12 / 19
  • An introduction to networks/graphs Visualization software • package igraph1 [Csardi and Nepusz, 2006] (static representation with useful tools for graph mining) • free software Gephi2 (interactive software, supports zooming and panning) 1 http://igraph.sourceforge.net/ 2 http://gephi.org 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 12 / 19
  • An introduction to networks/graphs Extracting important nodes 1 vertex degree degré: number of edges adjacent to a given vertex. Vertices with a high degree are called hubs: measure of the vertex popularity. 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 13 / 19
  • An introduction to networks/graphs Extracting important nodes 1 vertex degree degré: number of edges adjacent to a given vertex. Vertices with a high degree are called hubs: measure of the vertex popularity. 2 vertex betweenness centralité: number of shortest paths between all pairs of vertices that pass through the vertex. Betweenness is a centrality measure (vertices with a large betweenness that are the most likely to disconnect the network if removed). The orange node's degree is equal to 2, its betweenness to 4. 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 13 / 19
  • An introduction to networks/graphs Extracting important nodes 1 vertex degree degré: number of edges adjacent to a given vertex. Vertices with a high degree are called hubs: measure of the vertex popularity. 2 vertex betweenness centralité: number of shortest paths between all pairs of vertices that pass through the vertex. Betweenness is a centrality measure (vertices with a large betweenness that are the most likely to disconnect the network if removed). 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 13 / 19
  • An introduction to networks/graphs Extracting important nodes 1 vertex degree degré: number of edges adjacent to a given vertex. Vertices with a high degree are called hubs: measure of the vertex popularity. 2 vertex betweenness centralité: number of shortest paths between all pairs of vertices that pass through the vertex. Betweenness is a centrality measure (vertices with a large betweenness that are the most likely to disconnect the network if removed). 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 13 / 19
  • An introduction to networks/graphs Extracting important nodes 1 vertex degree degré: number of edges adjacent to a given vertex. Vertices with a high degree are called hubs: measure of the vertex popularity. 2 vertex betweenness centralité: number of shortest paths between all pairs of vertices that pass through the vertex. Betweenness is a centrality measure (vertices with a large betweenness that are the most likely to disconnect the network if removed). 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 13 / 19
  • An introduction to networks/graphs Extracting important nodes 1 vertex degree degré: number of edges adjacent to a given vertex. Vertices with a high degree are called hubs: measure of the vertex popularity. 2 vertex betweenness centralité: number of shortest paths between all pairs of vertices that pass through the vertex. Betweenness is a centrality measure (vertices with a large betweenness that are the most likely to disconnect the network if removed). 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 13 / 19
  • An introduction to networks/graphs Extracting important nodes 1 vertex degree degré: number of edges adjacent to a given vertex. Vertices with a high degree are called hubs: measure of the vertex popularity. 2 vertex betweenness centralité: number of shortest paths between all pairs of vertices that pass through the vertex. Betweenness is a centrality measure (vertices with a large betweenness that are the most likely to disconnect the network if removed). 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 13 / 19
  • An introduction to networks/graphs Vertex clustering classication Cluster vertexes into groups that are densely connected and share a few links (comparatively) with the other groups. Clusters are often called communities communautés (social sciences) or modules modules (biology). Example 1: Natty's facebook network 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 14 / 19
  • An introduction to networks/graphs Find clusters by modularity optimization modularité The modularity [Newman and Girvan, 2004] of the partition (C1, . . . , CK) is equal to: Q(C1, . . . , CK) = 1 2m K k=1xi ,xj∈Ck (Wij − Pij) with Pij: weight of a null model (graph with the same degree distribution but no preferential attachment): Pij = didj 2m with di = 1 2 j=i Wij. 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 15 / 19
  • An introduction to networks/graphs Find clusters by modularity optimization modularité The modularity [Newman and Girvan, 2004] of the partition (C1, . . . , CK) is equal to: Q(C1, . . . , CK) = 1 2m K k=1xi ,xj∈Ck (Wij − Pij) with Pij: weight of a null model (graph with the same degree distribution but no preferential attachment): Pij = didj 2m with di = 1 2 j=i Wij. A good clustering should maximize the modularity: • Q when (xi, xj) are in the same cluster and Wij Pij • Q when (xi, xj) are in two dierent clusters and Wij Pij 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 15 / 19
  • An introduction to networks/graphs Find clusters by modularity optimization modularité The modularity [Newman and Girvan, 2004] of the partition (C1, . . . , CK) is equal to: Q(C1, . . . , CK) = 1 2m K k=1xi ,xj∈Ck (Wij − Pij) with Pij: weight of a null model (graph with the same degree distribution but no preferential attachment): Pij = didj 2m with di = 1 2 j=i Wij. A good clustering should maximize the modularity: • Q when (xi, xj) are in the same cluster and Wij Pij • Q when (xi, xj) are in two dierent clusters and Wij Pij Modularity optimization helps separate hubs (an edge is all the more important in the criterion that it is attached to a vertex with a low degree). 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 15 / 19
  • Analysis of the eect of low-calorie diet on obese women Outline 1 An introduction to networks/graphs 2 Analysis of the eect of low-calorie diet on obese women 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 16 / 19
  • Analysis of the eect of low-calorie diet on obese women Data Experimental protocol 135 obese women and 3 times: before LCD, after a 2-month LCD and 6 months later (between the end of LCD and the last measurement, women are randomized into one of 5 recommended diet groups). At every time step, 221 gene expressions, 28 fatty acids and 15 clinical variables (i.e., weight, HDL, ...) 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 17 / 19
  • Analysis of the eect of low-calorie diet on obese women Data Experimental protocol 135 obese women and 3 times: before LCD, after a 2-month LCD and 6 months later (between the end of LCD and the last measurement, women are randomized into one of 5 recommended diet groups). At every time step, 221 gene expressions, 28 fatty acids and 15 clinical variables (i.e., weight, HDL, ...) Correlations between gene expressions and between a gene expression and a fatty acid levels are not of the same order: inference method must be dierent inside the groups and between two groups. 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 17 / 19
  • Analysis of the eect of low-calorie diet on obese women Data Data pre-processing At CID3, individuals are split into three groups: weight loss, weight regain and stable weight (groups are not correlated to the diet group according to χ2-test). Partition similar to the one obtained with weight increase/decrease ratio. 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 17 / 19
  • Analysis of the eect of low-calorie diet on obese women Method Network inference Clustering Mining 3 inter-dataset networks sparse CCA merge for each time step into one network 3 intra-dataset networks sparse partial correlation 5 networks CID1 CID2 3×CID3 Extract important nodes Study/Compare clusters 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 18 / 19
  • Analysis of the eect of low-calorie diet on obese women Brief overview on results 5 networks inferred with 264 nodes each: CID1 CID2 CID3g1 CID3g2 CID3g3 size LCC 244 251 240 259 258 density 2.3% 2.3% 2.3% 2.3% 2.3% transitivity 17.2% 11.9% 21.6% 10.6% 10.4% nb clusters 14 (2-52) 10 (4-52) 11 (2-46) 12 (2-51) 12 (3-54) 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 19 / 19
  • Analysis of the eect of low-calorie diet on obese women References Butte, A. and Kohane, I. (1999). Unsupervised knowledge discovery in medical databases using relevance networks. In Proceedings of the AMIA Symposium, pages 711715. Butte, A. and Kohane, I. (2000). Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements. In Proceedings of the Pacic Symposium on Biocomputing, pages 418429. Csardi, G. and Nepusz, T. (2006). The igraph software package for complex network research. InterJournal, Complex Systems. Fruchterman, T. and Reingold, B. (1991). Graph drawing by force-directed placement. Software, Practice and Experience, 21:11291164. Newman, M. and Girvan, M. (2004). Finding and evaluating community structure in networks. Physical Review, E, 69:026113. 1 octobre 2013 (Séminaire INSERM) Network Nathalie Villa-Vialaneix 19 / 19