Network analysis for computational biology
Nathalie Villa-Vialaneix - nathalie.villa@univ-paris1.fr
INSERM Seminar, I2C, Toulouse, October 11th, 2013
Published in: Science, Technology

  1. Network analysis for computational biology. Nathalie Villa-Vialaneix - nathalie.villa@univ-paris1.fr - http://www.nathalievilla.org. IUT STID (Carcassonne) & SAMM (Université Paris 1). INSERM Seminar - 11 October 2011. Slide footer: 1 October 2013 (INSERM Seminar), Network, Nathalie Villa-Vialaneix.
  2. Outline
     1. An introduction to networks/graphs
     2. Analysis of the effect of low-calorie diet on obese women
  3. Part 1: An introduction to networks/graphs
  4. What is a network/graph? (French: réseau/graphe)
     A mathematical object used to model relational data between entities.
     • The entities are called the nodes or the vertices (French: nœuds/sommets).
     • A relation between two entities is modeled by an edge (French: arête).
  7. Examples
     • Bibliographic network. Nodes: genes, TFs, enzymes... Edges: a relationship between two nodes that has already been reported in the literature.
     • Network inferred from expression data. Nodes: genes. Edges: a strong direct co-expression between two nodes, observed in the gene expression data.
  9. Standard issues associated with networks
     Inference: given expression data, how to build a graph whose edges represent the direct links between genes? Example: co-expression networks built from microarray/RNA-seq data (nodes = genes; edges = significant direct links between the expressions of two genes).
     Graph mining (examples):
     1. Network visualization: nodes are not a priori associated with a given position. How to represent the network in a meaningful way? [Figures: random positions; positions that aim at placing connected nodes closer together]
     2. Network clustering: identify communities (groups of nodes that are densely connected and share comparatively few links with the other groups).
  12. Network inference
      Data: large-scale gene expression data, stored in a matrix $X = (X_i^j)_{i=1,\dots,n;\ j=1,\dots,p}$ with one row per individual ($n \simeq 30/50$) and one column per variable (gene expression, $p \simeq 10^{3/4}$, i.e., $10^3$ to $10^4$).
      What we want to obtain: a network with
      • nodes: genes;
      • edges: significant and direct co-expression between two genes (to track transcription regulations).
  13. Advantages of inferring a network from large-scale transcription data
      1. Over raw data: the network focuses on the strongest direct relationships. Irrelevant or indirect relations are removed (more robust), and the data are easier to visualize and understand. Expression data are analyzed all together rather than pair by pair.
      2. Over a bibliographic network: it can handle interactions involving as yet unknown (not annotated) genes, and it deals with data collected in a particular condition.
  15. Using correlations: relevance network [Butte and Kohane, 1999, Butte and Kohane, 2000]
      First (naive) approach: calculate the correlations between expressions for all pairs of genes, discard the smallest ones by thresholding, and build the network from the remaining pairs. Pipeline: correlations → thresholding → graph.
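The naive approach above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy expression values, the gene names `g1` to `g3`, the 0.8 threshold and the use of the absolute Pearson correlation are all hypothetical choices, not taken from the slides or from the cited papers.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def relevance_network(expr, threshold):
    """Return the pairs of genes whose |correlation| reaches the threshold.

    expr maps a gene name to its expression values across individuals.
    """
    genes = sorted(expr)
    edges = []
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if abs(pearson(expr[g], expr[h])) >= threshold:
                edges.append((g, h))
    return edges


# Hypothetical toy data: g1 and g2 are co-expressed, g3 is independent noise.
rng = random.Random(0)
base = [rng.random() for _ in range(50)]
expr = {
    "g1": base,
    "g2": [2 * v + rng.gauss(0, 0.05) for v in base],
    "g3": [rng.random() for _ in range(50)],
}
edges = relevance_network(expr, threshold=0.8)
print(edges)
```

The choice of the threshold drives the number of edges, which is one reason why this first approach is called naive.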
  16. But correlation is not causality...
      Strong indirect correlation: in the example below, y and z both depend on x, so y and z are strongly correlated although they are not directly linked.
          set.seed(2807); x <- runif(100)
          y <- 2*x+1+rnorm(100,0,0.1); cor(x,y)
          [1] 0.9988261
          z <- 2*x+1+rnorm(100,0,0.1); cor(x,z)
          [1] 0.998751
          cor(y,z)
          [1] 0.9971105
      Partial correlation:
          cor(lm(y~x)$residuals, lm(z~x)$residuals)
          [1] -0.1933699
      Networks are built using partial correlations, i.e., correlations between gene expressions knowing the expression of all the other genes (residual correlations).
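The same experiment can be sketched in Python with the standard library only. The helper names (`pearson`, `residuals`, `partial_correlation`) are hypothetical, and because Python's random generator differs from R's, the printed values will not match the R output above. The last lines also compare the residual-based value with the usual closed-form expression for a partial correlation given a single variable.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def residuals(y, x):
    """Residuals of the least-squares regression of y on x (with intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
    sxx = sum((u - mean_x) ** 2 for u in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [v - (intercept + slope * u) for u, v in zip(x, y)]


def partial_correlation(y, z, x):
    """Correlation between y and z once the linear effect of x is removed."""
    return pearson(residuals(y, x), residuals(z, x))


rng = random.Random(2807)
x = [rng.random() for _ in range(100)]
y = [2 * u + 1 + rng.gauss(0, 0.1) for u in x]
z = [2 * u + 1 + rng.gauss(0, 0.1) for u in x]

r_yz = pearson(y, z)                        # strong, but indirect
r_yz_given_x = partial_correlation(y, z, x)  # close to zero

# Closed-form first-order partial correlation, for comparison.
r_xy, r_xz = pearson(x, y), pearson(x, z)
closed_form = (r_yz - r_xy * r_xz) / math.sqrt((1 - r_xy**2) * (1 - r_xz**2))

print(round(r_yz, 3), round(r_yz_given_x, 3), round(closed_form, 3))
```

With many genes, conditioning on "all the other genes" requires more than two simple regressions; this sketch only covers the three-variable case of the slide.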
  20. Visualization tools help understand the graph macro-structure
      Purpose: how to display the nodes in a meaningful and aesthetic way?
      Standard approach: force-directed placement (FDP) algorithms (French: algorithmes de forces), e.g., [Fruchterman and Reingold, 1991]:
      • attractive forces: similar to springs along the edges;
      • repulsive forces: similar to electric forces between all pairs of vertices;
      • iterative algorithm, run until the vertex positions stabilize.
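A minimal sketch of such a force-directed layout is given below, in the spirit of [Fruchterman and Reingold, 1991] but simplified. The function name `force_directed_layout`, the fixed number of iterations (instead of a stabilization test), the cooling factor and the toy graph are assumptions made for the illustration; this is not the implementation used by igraph or Gephi.

```python
import math
import random


def force_directed_layout(nodes, edges, iterations=200, size=1.0, seed=0):
    """Simplified force-directed placement; returns {node: (x, y)}."""
    rng = random.Random(seed)
    pos = {v: [rng.uniform(0, size), rng.uniform(0, size)] for v in nodes}
    k = size / math.sqrt(len(nodes))  # ideal distance between vertices
    temperature = size / 10           # maximal displacement per iteration
    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Repulsive forces between all pairs of vertices (magnitude k^2 / d).
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Attractive forces along the edges (magnitude d^2 / k).
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each vertex (capped by the temperature) and stay in the frame.
        for v in nodes:
            d = max(math.hypot(disp[v][0], disp[v][1]), 1e-9)
            step = min(d, temperature)
            pos[v][0] = min(size, max(0.0, pos[v][0] + disp[v][0] / d * step))
            pos[v][1] = min(size, max(0.0, pos[v][1] + disp[v][1] / d * step))
        temperature *= 0.97  # cooling: smaller and smaller moves
    return {v: (p[0], p[1]) for v, p in pos.items()}


# Hypothetical toy graph: two triangles joined by one edge.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
layout = force_directed_layout(nodes, edges)
for v in nodes:
    print(v, round(layout[v][0], 2), round(layout[v][1], 2))
```

The repulsive step loops over all pairs of vertices, so each iteration costs on the order of the squared number of vertices; that is acceptable for a few hundred nodes.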
  25. Visualization software
      • package igraph [1] [Csardi and Nepusz, 2006] (static representation with useful tools for graph mining)
      • free software Gephi [2] (interactive software, supports zooming and panning)
      [1] http://igraph.sourceforge.net/
      [2] http://gephi.org
  27. Extracting important nodes
      1. Vertex degree (French: degré): number of edges adjacent to a given vertex. The degree measures the popularity of a vertex; vertices with a high degree are called hubs.
      2. Vertex betweenness (French: centralité): number of shortest paths between all pairs of vertices that pass through the vertex. Betweenness is a centrality measure: the vertices with a large betweenness are the most likely to disconnect the network if removed.
      [Figure: in the example graph, the orange node's degree is equal to 2 and its betweenness to 4.]
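Both measures can be computed on a small graph as sketched below. The graph is a hypothetical path a-b-c-d-e, not the figure of the slide; its middle vertex happens to have degree 2 and betweenness 4 as well. Betweenness is computed with Brandes' algorithm, an assumption of this sketch: when several shortest paths link the same pair, each one counts for a fraction, which reduces to the plain count of the definition above when shortest paths are unique.

```python
from collections import deque


def degree(adj):
    """Number of edges adjacent to each vertex."""
    return {v: len(neigh) for v, neigh in adj.items()}


def betweenness(adj):
    """Unnormalized betweenness for an undirected, unweighted graph."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting the shortest paths.
        dist = {s: 0}
        n_paths = {v: 0 for v in adj}
        n_paths[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        # Accumulate the dependencies, starting from the farthest vertices.
        dep = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                dep[v] += n_paths[v] / n_paths[w] * (1 + dep[w])
            if w != s:
                score[w] += dep[w]
    # Each pair of vertices was visited from both of its endpoints.
    return {v: b / 2 for v, b in score.items()}


# Hypothetical example: the path a - b - c - d - e.
adj = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
deg = degree(adj)
btw = betweenness(adj)
print(deg["c"], btw["c"])
```

In this path, the four shortest paths through c are a-d, a-e, b-d and b-e, and removing c does disconnect the graph.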
  34. Vertex clustering (French: classification)
      Cluster the vertices into groups that are densely connected and share comparatively few links with the other groups. Clusters are often called communities (French: communautés) in the social sciences or modules (French: modules) in biology.
      [Figure: Example 1: Natty's Facebook network]
  35. Find clusters by modularity optimization (French: modularité)
      The modularity [Newman and Girvan, 2004] of the partition $(C_1, \dots, C_K)$ is equal to:
          $$Q(C_1, \dots, C_K) = \frac{1}{2m} \sum_{k=1}^{K} \sum_{x_i, x_j \in C_k} (W_{ij} - P_{ij})$$
      with $P_{ij}$ the weight of a null model (graph with the same degree distribution but no preferential attachment):
          $$P_{ij} = \frac{d_i d_j}{2m} \quad \text{with} \quad d_i = \sum_{j \neq i} W_{ij} \quad \text{and} \quad m = \frac{1}{2} \sum_i d_i.$$
      A good clustering should maximize the modularity:
      • $Q$ increases when $(x_i, x_j)$ are in the same cluster and $W_{ij} \gg P_{ij}$;
      • $Q$ decreases when $(x_i, x_j)$ are in two different clusters and $W_{ij} \gg P_{ij}$.
      Modularity optimization helps separate hubs: the lower the degree of the vertices an edge is attached to, the more that edge weighs in the criterion.
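As a worked example of the criterion, the sketch below evaluates the modularity of two candidate partitions of a small graph. It assumes the standard Newman-Girvan convention (degree $d_i = \sum_{j \neq i} W_{ij}$, $2m = \sum_i d_i$, no self-loops); the function names and the toy graph (two triangles joined by one edge) are hypothetical. It only evaluates $Q$ for a given partition and does not search for the best one.

```python
def build_weights(edges):
    """Symmetric weight map W[i][j] = 1 for each undirected edge (i, j)."""
    weights = {}
    for i, j in edges:
        weights.setdefault(i, {})[j] = 1.0
        weights.setdefault(j, {})[i] = 1.0
    return weights


def modularity(weights, clusters):
    """Q = 1/(2m) * sum over clusters and pairs (i, j) of (W_ij - d_i d_j / 2m)."""
    deg = {i: sum(w for j, w in row.items() if j != i)
           for i, row in weights.items()}
    two_m = sum(deg.values())
    q = 0.0
    for cluster in clusters:
        for i in cluster:
            for j in cluster:
                w_ij = weights[i].get(j, 0.0)   # 0 when there is no edge
                p_ij = deg[i] * deg[j] / two_m  # null model weight
                q += w_ij - p_ij
    return q / two_m


# Hypothetical toy graph: two triangles (0,1,2) and (3,4,5) joined by 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
weights = build_weights(edges)

q_two_clusters = modularity(weights, [{0, 1, 2}, {3, 4, 5}])
q_one_cluster = modularity(weights, [{0, 1, 2, 3, 4, 5}])
print(round(q_two_clusters, 4), round(q_one_cluster, 4))
```

Here $m = 7$ and each triangle contributes $6 - 7^2/14 = 2.5$ to the double sum, so $Q = 5/14$ for the two-triangle partition, whereas putting all vertices in a single cluster gives $Q = 0$.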
  38. Part 2: Analysis of the effect of low-calorie diet on obese women
  39. Data
      Experimental protocol: 135 obese women, measured at 3 time points: before the low-calorie diet (LCD), after a 2-month LCD, and 6 months later (between the end of the LCD and the last measurement, the women are randomized into one of 5 recommended diet groups). At every time step: 221 gene expressions, 28 fatty acids and 15 clinical variables (e.g., weight, HDL, ...).
      Correlations between gene expressions and correlations between a gene expression and a fatty acid level are not of the same order: the inference method must be different inside the groups and between two groups.
      Data pre-processing: at CID3, individuals are split into three groups: weight loss, weight regain and stable weight (according to a χ² test, these groups are not correlated with the diet group). The partition is similar to the one obtained with the weight increase/decrease ratio.
  42. Method: network inference → clustering → mining
      • Network inference: 3 intra-dataset networks (sparse partial correlation) and 3 inter-dataset networks (sparse CCA), merged into one network for each time step. Result: 5 networks (CID1, CID2, 3×CID3).
      • Clustering and mining: extract important nodes; study/compare clusters.
  43. Brief overview of the results
      5 networks inferred, with 264 nodes each:

                      CID1       CID2       CID3g1     CID3g2     CID3g3
      size LCC        244        251        240        259        258
      density         2.3%       2.3%       2.3%       2.3%       2.3%
      transitivity    17.2%      11.9%      21.6%      10.6%      10.4%
      nb clusters     14 (2-52)  10 (4-52)  11 (2-46)  12 (2-51)  12 (3-54)
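The slide does not define the three summary statistics of the table. Assuming the usual graph-theoretic definitions (LCC read as largest connected component, density as the proportion of possible edges that are present, transitivity as three times the number of triangles divided by the number of connected triples), they could be computed as in the following sketch. The function names and the five-node toy graph are hypothetical and unrelated to the networks of the study.

```python
from collections import deque
from itertools import combinations


def density(adj):
    """Proportion of the n(n-1)/2 possible edges that are present."""
    n = len(adj)
    m = sum(len(neigh) for neigh in adj.values()) / 2
    return 2 * m / (n * (n - 1))


def transitivity(adj):
    """3 x (number of triangles) / (number of connected triples)."""
    triples = 0  # paths a - v - b centered on some vertex v
    closed = 0   # triples whose two ends a and b are themselves linked
    for v, neigh in adj.items():
        for a, b in combinations(sorted(neigh), 2):
            triples += 1
            if b in adj[a]:
                closed += 1
    return closed / triples if triples else 0.0


def largest_component_size(adj):
    """Number of vertices in the largest connected component."""
    seen, best = set(), 0
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        queue, size = deque([s]), 0
        while queue:
            v = queue.popleft()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best


# Hypothetical toy graph: triangle 0-1-2, vertex 3 attached to 2, isolated 4.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}, 4: set()}
print(largest_component_size(adj), density(adj), transitivity(adj))
```

In the toy graph, 4 of the 10 possible edges are present, and 3 of the 5 connected triples are closed by the single triangle.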
  44. References
      • Butte, A. and Kohane, I. (1999). Unsupervised knowledge discovery in medical databases using relevance networks. In Proceedings of the AMIA Symposium, pages 711-715.
      • Butte, A. and Kohane, I. (2000). Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements. In Proceedings of the Pacific Symposium on Biocomputing, pages 418-429.
      • Csardi, G. and Nepusz, T. (2006). The igraph software package for complex network research. InterJournal, Complex Systems.
      • Fruchterman, T. and Reingold, E. (1991). Graph drawing by force-directed placement. Software, Practice and Experience, 21:1129-1164.
      • Newman, M. and Girvan, M. (2004). Finding and evaluating community structure in networks. Physical Review E, 69:026113.
