2022_Meetup_Mazza-Marzo.pptx

Tommaso Mazza, Ph.D.
Exploiting Graph Theory for Systems Biology

OUTLINE
- Introduction to Computational & Systems Biology
- Concepts of Graph Theory
- Bio-interaction networks and visualization
- Data sources of P2P interactions
- Measures of topological importance

What is Systems Biology?
"To understand biology at the system level, we must examine the structure and dynamics of cellular and organismal function, rather than the characteristics of isolated parts of a cell or organism. Properties of systems, such as robustness, emerge as central issues, and understanding these properties may have an impact on the future of medicine."
Hiroaki Kitano

Networks: the starting points
Texts typically trace the origin of graph theory back to the Königsberg Bridge Problem and its solution by Leonhard Euler (1736). Euler's solution, which addressed a problem concerning the geometry of a place, is considered the first paper in graph theory.
Problem of the Königsberg bridges: is it possible to start from a point in the city, cross each of the seven bridges exactly once, and return to the starting point?
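Euler's answer can be checked mechanically. A connected graph in which multiple edges are allowed has a closed walk that uses every edge exactly once if and only if every vertex has even degree. The sketch below encodes Königsberg as a multigraph with four land masses and seven bridges. The labels A to D are illustrative assumptions, not taken from the slides. networkx, which is installed in Hands-on #1, offers a ready-made version of this check as `nx.is_eulerian`.

```python
from collections import Counter

# Hypothetical labels for the four land masses:
# A = Kneiphof island, B = north bank, C = south bank, D = eastern land mass.
# Each tuple is one bridge; repeated pairs are parallel edges (a multigraph).
BRIDGES = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]


def degrees(edges):
    """Count how many edge endpoints touch each vertex."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def has_eulerian_circuit(edges):
    """Euler's criterion: in a connected multigraph, every degree must be even.

    Connectivity is assumed here (it holds for Königsberg), not checked.
    """
    return all(d % 2 == 0 for d in degrees(edges).values())


print(dict(degrees(BRIDGES)))         # {'A': 5, 'B': 3, 'C': 3, 'D': 3}
print(has_eulerian_circuit(BRIDGES))  # False
```

All four land masses have odd degree, so no such walk exists. Two extra bridges, one A-B and one C-D, would make every degree even.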

What’s a Graph?
A graph is a pair G = (V, E), where
V = V(G) is the set of vertices
E = E(G) is the set of edges
[Figure: example graph with vertices v1 to v5 and edges e1 to e6]

Definitions: Graph Type
Simple graph: a graph without loops or parallel edges
Weighted graph: a graph in which each edge is assigned a numerical label, or "weight"

Type | Edges | Multiple edges allowed? | Loops allowed?
Simple graph | undirected | No | No
Multigraph | undirected | Yes | No
Pseudograph | undirected | Yes | Yes
Directed graph | directed | No | Yes
Directed multigraph | directed | Yes | Yes
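The table can be read as a decision rule. Given an edge list, check for loops and for repeated edges, then pick the most restrictive row that still allows what is present. The function name `classify` is hypothetical, and the sketch assumes vertex labels are mutually comparable so that undirected pairs can be normalized by sorting.

```python
def classify(edges, directed):
    """Name the most restrictive type in the table that fits an edge list.

    edges: list of (u, v) pairs; directed: True if each pair is an arrow u -> v.
    Vertex labels are assumed to be mutually comparable (e.g. all ints or all str).
    """
    has_loops = any(u == v for u, v in edges)
    # For undirected graphs, (u, v) and (v, u) are the same edge.
    keys = [(u, v) if directed else tuple(sorted((u, v))) for u, v in edges]
    has_multiple = len(keys) != len(set(keys))
    if directed:
        # The table lists no directed type that forbids loops.
        return "Directed multigraph" if has_multiple else "Directed graph"
    if has_loops:
        return "Pseudograph"  # the only undirected type that allows loops
    return "Multigraph" if has_multiple else "Simple graph"


print(classify([(1, 2), (2, 3)], directed=False))  # Simple graph
print(classify([(1, 2), (2, 1)], directed=False))  # Multigraph
print(classify([(1, 1), (1, 2)], directed=False))  # Pseudograph
print(classify([(1, 2), (2, 1)], directed=True))   # Directed graph
print(classify([(1, 2), (1, 2)], directed=True))   # Directed multigraph
```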

Connected graphs
An undirected graph is connected if every pair of vertices is joined by a path.
Each maximal connected subgraph of a disconnected graph G is called a (connected) component of G.
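Components can be found with breadth-first search. Each search started from a not-yet-visited vertex collects exactly one maximal connected subgraph, and the graph is connected if a single search reaches everything. The example graph and function names are hypothetical. In networkx the equivalents are `nx.connected_components` and `nx.is_connected`.

```python
from collections import deque


def connected_components(adj):
    """Return the components of an undirected graph as a list of vertex sets.

    adj maps each vertex to its neighbors; every unvisited vertex starts a
    breadth-first search that collects one maximal connected subgraph.
    """
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        component, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    component.add(w)
                    queue.append(w)
        components.append(component)
    return components


def is_connected(adj):
    """A non-empty undirected graph is connected iff it has exactly one component."""
    return len(connected_components(adj)) == 1


# Hypothetical disconnected graph with components {a, b, c} and {d, e}.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": ["e"], "e": ["d"]}
print([sorted(c) for c in connected_components(adj)])  # [['a', 'b', 'c'], ['d', 'e']]
print(is_connected(adj))                                # False
```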

Representation
- Incidence matrix
- Adjacency list
- Adjacency matrix
  - rows and columns are labeled with the ordered vertices
  - write a 1 if there is an edge between the row vertex and the column vertex
  - write a 0 if no edge exists between them

  v w x y
v 0 1 0 1
w 1 0 1 1
x 0 1 0 1
y 1 1 1 0
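Assuming the simple undirected graph encoded by the adjacency matrix above, its edges can be read off the 1 entries: v-w, v-y, w-x, w-y and x-y. The sketch below builds all three representations from that single edge list. The function names are hypothetical. The incidence matrix follows the simple-graph convention of one row per vertex and one column per edge, with a 1 at each endpoint.

```python
VERTICES = ["v", "w", "x", "y"]
# Edges read off the 1 entries of the adjacency matrix.
EDGES = [("v", "w"), ("v", "y"), ("w", "x"), ("w", "y"), ("x", "y")]


def adjacency_matrix(vertices, edges):
    """n x n matrix with a 1 wherever two vertices share an edge."""
    index = {u: i for i, u in enumerate(vertices)}
    a = [[0] * len(vertices) for _ in vertices]
    for u, v in edges:
        a[index[u]][index[v]] = 1
        a[index[v]][index[u]] = 1  # undirected, so the matrix is symmetric
    return a


def adjacency_list(vertices, edges):
    """Map each vertex to the list of its neighbors."""
    adj = {u: [] for u in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def incidence_matrix(vertices, edges):
    """Rows are vertices, columns are edges; 1 where the vertex is an endpoint."""
    return [[1 if u in e else 0 for e in edges] for u in vertices]


print(" ", *VERTICES)
for label, row in zip(VERTICES, adjacency_matrix(VERTICES, EDGES)):
    print(label, *row)
print(adjacency_list(VERTICES, EDGES)["y"])  # ['v', 'w', 'x']
```

The adjacency matrix uses n^2 cells regardless of how many edges exist. The adjacency list stores only the existing links, which is why it is usually preferred for sparse biological networks.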

Hands-on #1: Coding environment setup (live)

- Install Miniconda for your OS
  - https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html
- Set up the environment
  - conda create -n meetup python=3.7
  - conda activate meetup
  - conda install notebook pandas networkx matplotlib=2.2.3
- Launch Jupyter Notebook from the meetup code folder
  - jupyter notebook meetup1.ipynb
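Before opening the notebook, it can help to confirm that the activated environment sees the packages installed above. This is a minimal sketch. `missing_packages` is a hypothetical helper, not part of the meetup material, and it only checks that each package can be found, not which version is installed.

```python
import importlib.util
import sys

# Packages installed by the conda command above (import names).
REQUIRED = ["notebook", "pandas", "networkx", "matplotlib"]


def missing_packages(names):
    """Return the names in `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Python", sys.version.split()[0])
missing = missing_packages(REQUIRED)
print("Missing:", ", ".join(missing) if missing else "none")
```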


Types of biological networks
Network abstractions
- Node: biological object; edge: interaction between nodes
Regulatory networks
- Node: gene; edge: regulatory interaction
Metabolic networks
- Node: metabolite; edge: reaction
Protein networks
- Node: protein; edge: interaction
- Node: complex; edge: sharing a protein
- Node: residue; edge: folding neighbors
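The "complex; sharing a protein" abstraction is a projection of complex-protein membership onto the complexes. Two complexes are linked whenever their member sets intersect. The complexes and protein names below are hypothetical, and keeping the shared proteins on each edge is a design choice, since they can serve as an edge label or, by counting them, as a weight.

```python
from itertools import combinations

# Hypothetical complexes and their member proteins (illustrative names only).
COMPLEXES = {
    "C1": {"P1", "P2", "P3"},
    "C2": {"P3", "P4"},
    "C3": {"P5", "P6"},
    "C4": {"P2", "P6"},
}


def complex_network(complexes):
    """Link two complexes when they share at least one protein.

    Returns a dict mapping each edge (c1, c2) to the set of shared proteins.
    """
    edges = {}
    for (c1, members1), (c2, members2) in combinations(complexes.items(), 2):
        shared = members1 & members2
        if shared:
            edges[(c1, c2)] = shared
    return edges


for (c1, c2), shared in complex_network(COMPLEXES).items():
    print(c1, "-", c2, "share", sorted(shared))
# C1 - C2 share ['P3']
# C1 - C4 share ['P2']
# C3 - C4 share ['P6']
```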

Is the organization of biological networks random?
The Scale-Free Model: Preferential Attachment
Preferential attachment means that the more connected a node is, the more likely it is to receive new links.
Growth: new nodes, each bringing m links, are continually added.
Preferential attachment: the probability that a new node connects to an existing node i is proportional to the degree k_i of that node, $\Pi(k_i) = \frac{k_i}{\sum_j k_j}$.
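The two ingredients, growth and preferential attachment, can be simulated directly. The sketch below assumes a complete graph on m + 1 nodes as the starting seed, which is one common choice but not necessarily the one used in the talk. It samples targets in proportion to degree by drawing uniformly from a list in which each node appears once per incident edge. networkx ships a ready-made generator, `nx.barabasi_albert_graph(n, m)`, whose seed graph may differ from this one.

```python
import random


def barabasi_albert(n, m, seed=None):
    """Grow an n-node network by preferential attachment (a BA-style sketch).

    Assumed seed graph: a complete graph on m + 1 nodes. Each later node adds
    m edges to distinct existing nodes, picked with probability proportional
    to their current degree. Returns an undirected edge list.
    """
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Each node appears here once per incident edge, so a uniform draw from
    # `ends` selects a node with probability proportional to its degree.
    ends = [node for edge in edges for node in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:  # redraw until m distinct targets are found
            targets.add(rng.choice(ends))
        for t in targets:
            edges.append((new, t))
            ends.extend((new, t))
    return edges


edges = barabasi_albert(1000, 2, seed=42)
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1
print(len(edges))            # 1997 = 3 seed edges + 997 new nodes * 2 edges
print(max(degree.values()))  # size of the largest hub; depends on the seed
```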

Node Degree / Rank
Degree = number of neighbors: a local characterization!
Node degree in PPI networks correlates with:
- gene essentiality
- conservation rate
- likelihood of causing human disease
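Degree and degree rank need nothing more than an adjacency list. The toy network below is hypothetical, and ties are broken alphabetically only to make the ranking deterministic.

```python
def degree_rank(adj):
    """Return (node, degree) pairs sorted from most to least connected.

    Degree is the number of distinct neighbors, a purely local measure:
    it looks only at each node's immediate surroundings.
    """
    degrees = {node: len(set(nbrs)) for node, nbrs in adj.items()}
    return sorted(degrees.items(), key=lambda item: (-item[1], str(item[0])))


# Hypothetical toy network: "hub" is linked to every other node.
adj = {
    "hub": ["a", "b", "c", "d"],
    "a": ["hub", "b"],
    "b": ["hub", "a"],
    "c": ["hub"],
    "d": ["hub"],
}
for rank, (node, k) in enumerate(degree_rank(adj), start=1):
    print(rank, node, k)
# 1 hub 4
# 2 a 2
# 3 b 2
# 4 c 1
# 5 d 1
```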

The Power-Law Distribution
$P(k) \sim k^{-\gamma}$, where $\gamma$ is the degree exponent
Fat or heavy tail!
It leads to a "scale-free" network, characterized by a small number of highly connected nodes, known as hubs.
Hubs are crucial:
- they affect the error and attack tolerance of complex networks (Albert et al., Nature, 2000)
- party hubs and date hubs
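The role of hubs in error and attack tolerance can be illustrated qualitatively. Deleting the best-connected nodes tends to break a hub-dominated network apart, while random deletions mostly hit low-degree nodes and leave the largest component almost intact. The toy graph and helper names below are hypothetical, and this is not a reproduction of the Albert et al. analysis.

```python
import random
from collections import deque


def largest_component_size(adj, removed):
    """Size of the largest connected component after deleting `removed` nodes."""
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best


def attack(adj, k):
    """Targeted attack: delete the k highest-degree nodes."""
    hubs = sorted(adj, key=lambda u: len(adj[u]), reverse=True)[:k]
    return largest_component_size(adj, hubs)


def failure(adj, k, seed=0):
    """Random error: delete k nodes chosen uniformly at random."""
    rng = random.Random(seed)
    return largest_component_size(adj, rng.sample(list(adj), k))


def toy_hub_graph(n_hubs=3, leaves_per_hub=10):
    """Hypothetical graph: a chain of hubs, each with its own leaves."""
    adj = {}

    def link(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    for h in range(n_hubs):
        for leaf in range(leaves_per_hub):
            link(f"hub{h}", f"leaf{h}_{leaf}")
        if h > 0:
            link(f"hub{h - 1}", f"hub{h}")
    return adj


adj = toy_hub_graph()
print(largest_component_size(adj, []))  # 33: the intact graph
print(attack(adj, 1))                   # 11: removing the central hub splits it
print(failure(adj, 1))                  # random pick; most nodes are leaves
```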

Power-law distribution
- plotted on a log-log scale
- high skew (asymmetry)
- straight line on a log-log plot
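Because $\log P(k) = -\gamma \log k + \text{const}$, the slope of the straight line on the log-log plot gives a rough estimate of the exponent. The sketch fits that slope by ordinary least squares. The input degrees are synthetic, built so that the counts follow $k^{-2}$ exactly, and are not data from the slides. Least-squares fits on log-log plots are known to be biased, so maximum-likelihood fitting is generally preferred for real networks.

```python
import math
from collections import Counter


def degree_distribution(degrees):
    """Empirical P(k): fraction of nodes with degree k (only k >= 1 kept)."""
    counts = Counter(k for k in degrees if k > 0)
    n = sum(counts.values())
    return {k: c / n for k, c in sorted(counts.items())}


def loglog_slope(pk):
    """Least-squares slope of log P(k) versus log k (needs >= 2 distinct k).

    For P(k) ~ k^-gamma the points lie on a line of slope -gamma.
    """
    xs = [math.log(k) for k in pk]
    ys = [math.log(p) for p in pk.values()]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Synthetic degrees whose counts follow k^-2 exactly:
# 64 nodes of degree 1, 16 of degree 2, 4 of degree 4, 1 of degree 8.
degrees = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1
pk = degree_distribution(degrees)
print(round(loglog_slope(pk), 6))  # -2.0
```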

Power laws are seemingly everywhere
[Figure panels: Moby Dick; scientific papers 1981-1997; AOL users visiting sites '97; bestsellers 1895-1965; AT&T customers on 1 day; California 1910-1992]
Source: M. E. J. Newman, "Power laws, Pareto distributions and Zipf's law"

Movie Actor Collaboration Network
Nodes: 212,250 actors
Edges: co-appearance in a movie
$P(k) \sim k^{-2.3}$
Barabási and Albert, Science, 1999
[Image: Tropic Thunder (2008)]

Protein Interaction Networks
Yook et al., Proteomics, 2004
Nodes: proteins
Edges: interactions (yeast)
$P(k) \sim k^{-2.5}$

Metabolic Networks
[Figure panels: C. elegans (eukaryote); E. coli (bacterium); A. fulgidus (archaea); averaged over 43 organisms]
Jeong et al., Nature, 2000
Nodes: metabolites
Edges: reactions
$P(k) \sim k^{-2.2 \pm 2}$
Metabolic networks across all kingdoms of life are scale-free.


Hands-on #2 (recorded): Cytoscape

Data Sources
- Literature
- Your data
- Co-expression-based methods
  - correlation coefficients
  - entropy
  - …
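A correlation-based co-expression network links two genes when their expression profiles move together across samples. This sketch uses the Pearson coefficient with an absolute-value cut-off, so strong anti-correlation also creates an edge. The gene names, expression values and the 0.9 threshold are hypothetical, and real pipelines often add significance testing or soft thresholding on top of such a rule. The entropy-based alternative listed above is not shown.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


def coexpression_edges(profiles, threshold=0.9):
    """Connect gene pairs whose |Pearson r| reaches a (hypothetical) threshold."""
    edges = []
    for (g1, x), (g2, y) in combinations(profiles.items(), 2):
        r = pearson(x, y)
        if abs(r) >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges


# Hypothetical expression profiles across five samples (made-up numbers).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],
    "geneD": [3.0, 1.0, 4.0, 1.0, 5.0],
}
for edge in coexpression_edges(profiles):
    print(edge)  # links among geneA, geneB and geneC; geneD stays isolated
```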

GeneMANIA (#3)
http://genemania.org/

Hunt for causes & mechanisms
1. What are the most functionally important molecules?
2. What are the mechanisms of pathogenesis?

AIDS dataset
[Figure: acquaintance network with numbered nodes]
Acquaintance network. Upward triangles indicate African-Americans, downward triangles indicate Puerto Ricans, and squares identify all others.

Terrorists' dataset
[Figure panels A, B, C]
Figure 7. Terrorist network compiled by Krebs (2001)

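Geometric properties
[Figure panels: betweenness, closeness, degree]
Degree: $C_D(v) = k_v$, the number of neighbors of $v$
Eigenvector: $x_v = \frac{1}{\lambda} \sum_{u \in N(v)} x_u$, where $\lambda$ is the largest eigenvalue of the adjacency matrix
Betweenness: $C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$, where $\sigma_{st}$ counts the shortest paths between $s$ and $t$ and $\sigma_{st}(v)$ those passing through $v$
Closeness: $C_C(v) = \frac{n - 1}{\sum_{u \neq v} d(v, u)}$, where $d$ is the shortest-path distance
Clustering coeff.: $C(v) = \frac{2 T_v}{k_v (k_v - 1)}$, where $T_v$ is the number of links among the neighbors of $v$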
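The measures can be computed on an unweighted, undirected graph with breadth-first search. This sketch uses common textbook definitions, which may differ in normalization from those used in the talk. Closeness is $(n-1)/\sum d$. Betweenness is unnormalized and computed with Brandes' algorithm, counting each unordered pair once. Eigenvector centrality uses power iteration, scaled so that the top score is 1. The example is the four-vertex graph from the adjacency-matrix slide. networkx offers `nx.closeness_centrality`, `nx.betweenness_centrality`, `nx.clustering` and `nx.eigenvector_centrality`, whose default normalizations may differ from this sketch.

```python
from collections import deque
from itertools import combinations


def bfs(adj, s):
    """BFS from s: distances, shortest-path counts (sigma), predecessors, order."""
    dist, sigma = {s: 0}, dict.fromkeys(adj, 0)
    sigma[s] = 1
    preds = {v: [] for v in adj}
    order, queue = [], deque([s])
    while queue:
        u = queue.popleft()
        order.append(u)
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
            if dist[w] == dist[u] + 1:  # u lies on a shortest path to w
                sigma[w] += sigma[u]
                preds[w].append(u)
    return dist, sigma, preds, order


def closeness(adj):
    """C_C(v) = (n - 1) / sum of distances from v; assumes a connected graph."""
    n = len(adj)
    return {v: (n - 1) / sum(bfs(adj, v)[0].values()) for v in adj}


def clustering(adj):
    """C(v) = 2 T_v / (k_v (k_v - 1)), with T_v = links among v's neighbors."""
    result = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        result[v] = 2 * links / (k * (k - 1)) if k > 1 else 0.0
    return result


def betweenness(adj):
    """Brandes' algorithm, unweighted and undirected, without normalization."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        _, sigma, preds, order = bfs(adj, s)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):  # farthest nodes first
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair {s, t} was counted once from s and once from t.
    return {v: b / 2 for v, b in bc.items()}


def eigenvector(adj, max_iter=1000, tol=1e-10):
    """Power iteration with (A + I), scaled so that the largest score is 1.

    Adding I keeps the eigenvectors of A but avoids oscillation on
    bipartite graphs; a connected graph is assumed.
    """
    x = dict.fromkeys(adj, 1.0)
    for _ in range(max_iter):
        new = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        top = max(new.values())
        new = {v: val / top for v, val in new.items()}
        if max(abs(new[v] - x[v]) for v in adj) < tol:
            return new
        x = new
    return x


# Graph from the adjacency-matrix slide: edges v-w, v-y, w-x, w-y, x-y.
adj = {"v": ["w", "y"], "w": ["v", "x", "y"], "x": ["w", "y"], "y": ["v", "w", "x"]}
print(closeness(adj))    # {'v': 0.75, 'w': 1.0, 'x': 0.75, 'y': 1.0}
print(clustering(adj))   # v and x: 1.0; w and y: 2/3
print(betweenness(adj))  # {'v': 0.0, 'w': 0.5, 'x': 0.0, 'y': 0.5}
print(eigenvector(adj))  # w and y score 1.0; v and x about 0.78
```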

Identifying sets of key players
AIMS:
- optimally diffusing something through the network (KPP-Pos): the kp-set is maximally connected to all other nodes
- optimally disrupting or fragmenting the network by removing the key nodes (KPP-Neg): removing the kp-set results in a residual network with the least possible cohesion
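One common way to score KPP-Neg is Borgatti's fragmentation index on the residual graph, $F = 1 - \frac{\sum_k s_k (s_k - 1)}{n (n - 1)}$. Here $s_k$ is the size of the k-th remaining component and n is the number of remaining nodes. F is 0 for a connected residual graph and 1 when every remaining node is isolated. The sketch finds the best kp-set by exhaustive search, which is only feasible for tiny graphs. The toy graph and helper names are hypothetical, and this is not a description of how Pyntacle computes its key-player metrics.

```python
from itertools import combinations


def component_sizes(adj, removed):
    """Sizes of the connected components left after deleting `removed`."""
    seen, sizes = set(removed), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        sizes.append(size)
    return sizes


def fragmentation(adj, removed):
    """F = 1 - sum_k s_k (s_k - 1) / (n (n - 1)) on the residual graph."""
    sizes = component_sizes(adj, removed)
    n = sum(sizes)
    if n < 2:
        return 1.0  # degenerate case: fewer than two nodes remain
    return 1 - sum(s * (s - 1) for s in sizes) / (n * (n - 1))


def best_kpp_neg_set(adj, k):
    """Exhaustively find the k-node set whose removal maximizes F."""
    return max(combinations(adj, k), key=lambda kp: fragmentation(adj, kp))


# Hypothetical graph: two triangles joined through a single "bridge" node.
adj = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "bridge"],
    "bridge": ["c", "d"],
    "d": ["bridge", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
print(best_kpp_neg_set(adj, 1))                   # ('bridge',)
print(round(fragmentation(adj, ["bridge"]), 3))   # 0.6
```

For larger networks, greedy or heuristic searches over candidate sets are the usual alternative to enumerating every combination.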

Pyntacle
http://pyntacle.css-mendel.it

Tommaso Mazza, Ph.D.
t.mazza@css-mendel.it
Bioinformatics Unit, PI
IRCCS Casa Sollievo della Sofferenza
Viale Regina Margherita, 261
00198, Rome (IT)
