Interpretation of the biological knowledge using networks approach

Elena Sügis
elena.sugis@ut.ee
Bioinformatics for bioengineers LTTI.00.016, Spring 2018

Figure: the cycle of science (hypothesis → lots of experiments → analysis → knowledge)
Networks - the language of complex systems
Image 1 is adapted from https://en.wikipedia.org/wiki/Complex_network
Image 2 is adapted from http://www.jillkgregory.com/new-gallery-17/


Networks are powerful tools
Analysis
• Topological properties
• Hubs and subnetworks
• Classify, cluster and diffuse
• Data integration
Visualization
• Data overlays
• Layouts and animation
• Exploratory analysis
• Context and interpretation
Image is adapted from Cassar, EMBO Reports 2015, Fig.8

Benefits of using networks
• Reduce complexity
• More efficient than tables
• Great for data integration
• Intuitive visualization

Networks are graphs
Graphs are mathematical structures composed of a set of objects (NODES) in which pairs of objects are connected by links (EDGES).
Networks can be built for any functional system.
Figure: a small example graph with nodes numbered 1 to 6.

Nodes
The nodes in a network represent related objects, for example:
• Genes
• Proteins
• Metabolites
• Enzymes
• Organisms

Edges
The edges in a network represent the type of relationship between two entities.
Biological relationships:
• Interactions
• Regulations
• Reactions
• Transformations
• Activations
• Inhibitions
etc.
Examples: A activates B; A binds to B; A has a similar sequence to B; A and B are co-cited.

Edges
• Directed: A → B
• Undirected: A - B
• Weighted: A - B with weight 0.8
The architecture (or topology) of a network can be represented as a graph with links between its parts.
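The three edge types map naturally onto a dictionary-of-dictionaries adjacency structure. This is a minimal sketch in plain Python, assuming that representation (our choice, not the format of any particular tool); the node names A and B and the 0.8 weight are illustrative only.

```python
from collections import defaultdict


def add_edge(adj, a, b, weight=1.0, directed=False):
    """Store an edge a -> b; for undirected edges also store b -> a."""
    adj[a][b] = weight
    if not directed:
        adj[b][a] = weight


# Directed: "A activates B" only goes one way.
directed = defaultdict(dict)
add_edge(directed, "A", "B", directed=True)

# Undirected: "A binds to B" is symmetric.
undirected = defaultdict(dict)
add_edge(undirected, "A", "B")

# Weighted: e.g. a similarity score of 0.8 between A and B.
weighted = defaultdict(dict)
add_edge(weighted, "A", "B", weight=0.8)

print(dict(directed))      # {'A': {'B': 1.0}}
print(dict(undirected))    # {'A': {'B': 1.0}, 'B': {'A': 1.0}}
print(weighted["B"]["A"])  # 0.8
```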

Interactome
With networks, we can organize and integrate information at different levels.
Image is adapted from https://www.systemsbiology.org/about/what-is-systems-biology/

Pathways
A pathway is a series of actions among molecules in a cell that leads to a certain product or a change in the cell.
Networks:
• Collection of binary interactions
• Large scale
• Generated from omics data
Pathways:
• Human-curated, detailed
• Small scale
• Constructed from literature/domain expert knowledge

What a network can tell you
Gene list from an experiment: APP, PSEN1, FYN, MAPT, BIN1, EPHA1, EPHA2, PSEN
You want to know:
- Type of relationships between genes
- Strength of relationship
- Functions of the related genes
- Pathways
- etc.

You can:
• Visually identify relationships among the group of biological entities
• Find drug targets
• Identify overrepresented gene/protein functions
• Discover biological pathways
Example: Alzheimer’s disease

Application of networks in research
• Series of molecular cancer profiles
• Clinical, genomic, methylation, RNA and proteomic signatures
• Multiple data types integrated into a signalling network
• Includes patient sample-level data
Image is adapted from TCGA (2013) Comprehensive molecular characterization of clear cell renal cell carcinoma. Nature, 499, Fig. 4

Data comes in different forms
• Raw data: results of the experiments (sequencing technologies, mass spectrometry; DNA, RNA and protein from healthy and cancer cells)
• Computational data: results of the analysis (co-expression, differential expression)

Big hairball
"Nice and clear," they say.
"Reduce complexity," they say.

Making sense of the biological networks
Biological networks rarely tell us anything by themselves.
Analysis involves:
• Understanding the characteristics of the network
• Modularity
• Comparison with other networks (e.g., random networks)
Visualization involves:
• Placing nodes in a meaningful way (layouts)
• Mapping biologically relevant data onto the network
• Changing node size, colour, edge weights, etc., which allows better biological interpretation

Analysis pipeline
Data → analysis tools → awesome result

Network analysis tools
Figure: network analysis tools grouped into intro, medium and advanced levels (hands-on session).

Network properties
Global and local network properties:
• Degree distribution
• Clustering coefficient
• Shortest path
• Centralities
• Network motifs
Figure is adapted from https://cytoscape.github.io/cytoscape-tutorials/presentations/advanced-automation-2017-mpi.html#/11


Degree distribution
The degree of a node is the number of edges incident to the node.
Degree distribution:
• Let P(k) be the fraction of nodes of degree k in the network. The degree distribution is the distribution of P(k) over all k.
• P(k) can be understood as the probability that a randomly chosen node has degree k.
• In a random network, P(k) follows a Poisson distribution:
$P(k) \sim \frac{e^{-\lambda}\lambda^{k}}{k!}$
Image is adapted from E. Ravasz et al., Science, 2002
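P(k) can be estimated directly by counting degrees. This minimal sketch assumes an undirected network given as a list of node pairs; the toy edge list is hypothetical, and isolated nodes (which never appear in an edge) are not counted.

```python
from collections import Counter


def degree_distribution(edges):
    """Return {k: P(k)}, the fraction of nodes that have degree k."""
    degree = Counter()
    for a, b in edges:  # each undirected edge adds 1 to both endpoints
        degree[a] += 1
        degree[b] += 1
    n_nodes = len(degree)
    nodes_per_degree = Counter(degree.values())  # how many nodes have degree k
    return {k: nodes_per_degree[k] / n_nodes for k in sorted(nodes_per_degree)}


edges = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (5, 6)]
dist = degree_distribution(edges)
print(dist)  # P(1) = 2/6, P(2) = 3/6, P(4) = 1/6
```

Plotting such a distribution on log-log axes is a common first check of whether it looks Poisson-like or closer to a power law.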

Degree distribution in scale-free networks
• Networks with power-law degree distributions are called scale-free networks:
$P(k) \sim k^{-\gamma}$
• Most nodes are of low degree, but there is a small number of highly linked nodes (nodes of high degree) called “hubs”.

Clustering coefficient
The clustering coefficient is a measure of the degree to which nodes in a graph tend to cluster together.
$C_i = \frac{2E_i}{k_i(k_i - 1)}$
where the i-th node has $k_i$ neighbours, $E_i$ is the actual number of links between these $k_i$ neighbours, and $k_i(k_i - 1)/2$ is the maximal number of links between them.
The clustering coefficient of a vertex in a graph quantifies how close its neighbours are to being a clique (complete graph).
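The formula $C_i = 2E_i / (k_i(k_i - 1))$ translates directly into code. A minimal sketch, assuming an undirected graph stored as a dict from node to a set of neighbours; the toy network is hypothetical, and nodes with fewer than two neighbours are reported as 0 (a common convention, since the formula is undefined there).

```python
from itertools import combinations


def clustering_coefficient(adj, node):
    """C_i = 2 E_i / (k_i (k_i - 1)) for an undirected graph {node: set(neighbours)}."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # undefined for k < 2; reported as 0 here
    # E_i: number of links that actually exist between the neighbours of node
    e = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2 * e / (k * (k - 1))


adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
print(clustering_coefficient(adj, "A"))  # 1 of 3 possible neighbour links -> 1/3
print(clustering_coefficient(adj, "B"))  # its two neighbours are linked -> 1.0
```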

Hierarchical modularity
Many highly connected small clusters combine into a few larger but less connected clusters, which in turn combine into even larger and even less connected clusters.
The clustering coefficient follows a power-law distribution, where C(k) is the average clustering coefficient of nodes with degree k:
$C(k) \sim k^{-\beta}$

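Comparison of the network properties
• Random network: $P(k) \sim \frac{e^{-\lambda}\lambda^{k}}{k!}$
• Scale-free network: $P(k) \sim k^{-\gamma}$
• Hierarchical network: $C(k) \sim k^{-\beta}$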

Shortest path
• The distance between two nodes is the smallest number of links that have to be traversed to get from one node to the other. The shortest path is the path that achieves that distance.
• A small-world network is characterised by a small average path length:
$l = \frac{2}{N(N-1)} \sum_{i<j} l_{ij}$
where $l_{ij}$ is the shortest path length between nodes i and j, and N is the number of nodes.
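In an unweighted network, breadth-first search (BFS) finds shortest path lengths, and averaging them over all pairs gives $l$. A minimal sketch, assuming a connected, undirected graph stored as a dict of neighbour lists; the path graph used here is a hypothetical example.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Shortest path length (number of links) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:  # first visit is along a shortest path
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_path_length(adj):
    """l = 2 / (N (N - 1)) * sum over i < j of l_ij (connected graph assumed)."""
    nodes = list(adj)
    n = len(nodes)
    dist = {u: bfs_distances(adj, u) for u in nodes}
    total = sum(dist[i][j] for i, j in combinations(nodes, 2))
    return 2 * total / (n * (n - 1))


# Path graph 1 - 2 - 3 - 4
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(bfs_distances(adj, 1))     # {1: 0, 2: 1, 3: 2, 4: 3}
print(average_path_length(adj))  # (1+2+3+1+2+1) * 2 / 12 = 20/12
```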

Defining important nodes in biological networks
How would you define an important node?
• The most connected?
• The one that connects other nodes in the network?
• The closest to other nodes?

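Centrality
Centrality quantifies the topological importance of a node (or edge) in a network.
• Degree centrality is the number of edges incident upon a node (finds hubs).
$C_D(v)$ = degree of node v
• Betweenness centrality indicates how much load is on a node (finds bottlenecks).
$C_B(v)$ counts the shortest paths between other pairs of nodes that go through v; if a pair has several shortest paths, only the fraction passing through v counts:
$C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$
where $\sigma_{st}$ is the number of shortest paths from s to t and $\sigma_{st}(v)$ is the number of those that pass through v.
• Closeness centrality defines how close a node is to all other nodes in the network.
$C_C(v)$ = inverse of the average shortest path length from v to all other nodes.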
https://cytoscape.github.io/cytoscape-tutorials/presentations/modules/network-analysis/index.html#/0/6
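The three centralities can be computed with plain BFS. This is a minimal sketch, assuming a connected, unweighted, undirected graph; betweenness uses Brandes' algorithm without normalisation (tools such as Cytoscape may report normalised values). The two-triangle toy network is hypothetical: nodes 3 and 5 have the highest degree, while node 4 links the two halves.

```python
from collections import deque


def degree_centrality(adj):
    """C_D(v): number of edges incident to v."""
    return {v: len(adj[v]) for v in adj}


def closeness_centrality(adj):
    """C_C(v): inverse of the average shortest path length from v (connected graph)."""
    result = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        result[s] = (len(dist) - 1) / sum(dist.values())
    return result


def betweenness_centrality(adj):
    """C_B(v) = sum over pairs s, t of sigma_st(v) / sigma_st (Brandes, unnormalised)."""
    cb = dict.fromkeys(adj, 0.0)
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back towards s.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    # In an undirected graph every pair (s, t) was counted from both ends.
    return {v: c / 2 for v, c in cb.items()}


adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3, 5],
       5: [4, 6, 7], 6: [5, 7], 7: [5, 6]}
print(degree_centrality(adj))          # nodes 3 and 5 have the highest degree (3)
print(betweenness_centrality(adj)[4])  # 9.0: all paths between the triangles cross node 4
print(max(closeness_centrality(adj).items(), key=lambda kv: kv[1]))  # (4, 0.6)
```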

How different centralities look
Figure labels: hub (high degree centrality); node that connects two sub-networks (high betweenness centrality); closest node to all other nodes (high closeness centrality).
Figure is partially adapted with modifications from original https://cytoscape.github.io/cytoscape-tutorials/presentations/modules/network-analysis/index.html#/0/6

Biological meaning
Degree centrality:
• Nodes with a high degree are also called hub nodes
• Real networks have many nodes with low degree and few nodes with high degree
• Nodes with a high degree tend to be essential nodes
• Regulatory elements like transcription factors often have a high out-degree
Betweenness centrality:
• Amount of control that this node has over the interactions of other nodes in the network
• How much information load is on the node
• Describes connectivity of the network
• Nodes that connect two sub-networks
• Can be calculated for edges as well
Closeness centrality:
• Indication of how fast information spreads from a given node to other reachable nodes in the network
• The more central a node is, the smaller its distance to all other nodes and the higher its closeness
Material is adapted from BioSB 2015 Network Analysis Course

Brain connectivity
• A few regions link the left and the right halves of our brain.
• These regions therefore have a high betweenness.
A.S. Pandit et al., Cerebral Cortex (2014) Whole-brain mapping of structural connectivity in infants reveals altered connection strength associated with growth and preterm birth

Biological networks
• Scale-free networks (tend to have a power-law degree distribution)
• “Small world” networks (small average path length)
• Hierarchical modularity (a high clustering coefficient independent of network size)
• Robustness (strong resistance to random failures, but vulnerable to targeted attacks)

Network motifs
Patterns (sub-networks) that occur more often than in randomised networks.
Different types of networks show different motifs. Gene regulatory networks with transcription factors have typical regulation motifs.

Motifs in yeast regulatory network
Image is adapted from Lee et al. Transcriptional Regulatory Networks in Saccharomyces cerevisiae, Science 2002

Autoregulation
• Consists of a regulator that binds to the promoter region of its own gene
• Reduced response time to environmental stimuli
• Decreased cost of regulation
• Increased stability of gene expression

Multi-component loop
• Consists of a regulatory circuit whose closure involves two or more factors
• Provides the capacity for feedback control
• Offers the potential to produce bistable systems that can switch between two alternative states

Feedforward loop
• Contains a regulator that controls a second regulator, and both regulators bind a common target gene
• Acts as a switch that is designed to be sensitive to sustained inputs
• Provides control of expression of the target gene depending on the accumulation of adequate levels of the master and secondary regulators
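The motif in which a regulator controls a second regulator and both bind a common target (the feed-forward loop in Lee et al.), as well as autoregulation, can be located by simple set operations on a directed network. A minimal sketch, assuming the network is given as (regulator, target) pairs; the regulators TF1, TF2 and gene G are hypothetical. Judging whether a motif is over-represented would additionally require comparing the counts against randomised networks.

```python
def feed_forward_loops(edges):
    """Return (x, y, z) triples with x -> y, x -> z and y -> z.

    `edges` is an iterable of (regulator, target) pairs of a directed network."""
    targets = {}
    for regulator, target in edges:
        targets.setdefault(regulator, set()).add(target)
    loops = []
    for x, x_targets in targets.items():
        for y in x_targets:
            if y == x:
                continue  # a self-loop is autoregulation, not part of a feed-forward loop
            # genes regulated by both x and y (excluding x and y themselves)
            for z in targets.get(y, set()) & x_targets:
                if z not in (x, y):
                    loops.append((x, y, z))
    return loops


def autoregulated(edges):
    """Regulators that bind the promoter of their own gene (self-loops)."""
    return sorted({a for a, b in edges if a == b})


# Hypothetical network: TF1 regulates TF2, both regulate G, and TF2 regulates itself.
edges = {("TF1", "TF2"), ("TF1", "G"), ("TF2", "G"), ("TF2", "TF2")}
print(feed_forward_loops(edges))  # [('TF1', 'TF2', 'G')]
print(autoregulated(edges))       # ['TF2']
```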

Single input motif
• Contains a single regulator that binds a set of genes under a specific condition
• Is responsible for some particular biological function

Multi-input motif
• Set of regulators that bind together to a set of genes
• Coordinates gene expression across a wide variety of biological conditions
• Two different regulators responding to two different inputs allow coordinated expression of the set of genes under two different conditions

Regulator chain
• Consists of chains of three or more regulators in which one regulator binds the promoter for a second regulator, and so on
• Simplest ordering of transcriptional events
• Regulators functioning at one stage of the cell cycle regulate the expression of factors required for entry into the next stage of the cell cycle

Community detection in biological networks

Community detection
Identifying closely related groups of nodes (modules/clusters):
• Based on topology
• Based on shared function(s)
Figure is adapted from original https://cytoscape.github.io/cytoscape-tutorials/presentations/advanced-automation-2017-mpi.html#/11
Hub-based modules
A module contains a node with a high degree and its first neighbours.
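Extracting hub-based modules amounts to selecting high-degree nodes and collecting their first neighbours. A minimal sketch, assuming an undirected graph as a dict of neighbour sets; the `min_degree` cut-off that decides what counts as a hub is a hypothetical parameter you would tune for your network.

```python
def hub_modules(adj, min_degree):
    """One module per hub: each node with degree >= min_degree plus its first neighbours.

    `adj` maps every node to the set of its neighbours (undirected graph)."""
    return {
        node: {node} | set(neighbours)
        for node, neighbours in adj.items()
        if len(neighbours) >= min_degree
    }


# Hypothetical toy network in which A is the hub.
adj = {
    "A": {"B", "C", "D", "E"},
    "B": {"A"},
    "C": {"A", "F"},
    "D": {"A"},
    "E": {"A"},
    "F": {"C"},
}
print(hub_modules(adj, min_degree=3))  # A together with its four first neighbours
```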

Clique modules
A module contains nodes that are all connected to each other.
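Maximal cliques can be enumerated with the Bron-Kerbosch algorithm. This is a minimal sketch with pivoting, assuming an undirected graph without self-loops stored as a dict of neighbour sets; the minimum module size of 3 is an illustrative assumption, not a value from the slides.

```python
def maximal_cliques(adj):
    """Yield every maximal clique (as a set) using Bron-Kerbosch with pivoting.

    `adj` maps every node to the set of its neighbours (undirected, no self-loops)."""
    def expand(r, p, x):
        if not p and not x:
            yield r  # r cannot be extended any further: it is a maximal clique
            return
        pivot = max(p | x, key=lambda u: len(adj[u]))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


def clique_modules(adj, min_size=3):
    """Keep maximal cliques with at least min_size nodes (threshold is illustrative)."""
    return [c for c in maximal_cliques(adj) if len(c) >= min_size]


adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
print(sorted(sorted(c) for c in maximal_cliques(adj)))  # [['A', 'B', 'C'], ['C', 'D']]
print(clique_modules(adj))  # only the triangle A, B, C is kept
```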

MCL-based modules
• Flow-simulation-based method (Markov Cluster algorithm)
• Consider a graph with many links within a cluster and fewer links between clusters.
• This means that if you start at a node and then randomly travel to a connected node, you are more likely to stay within a cluster than to travel between clusters.
• By doing random walks on the graph, it may be possible to discover where the flow tends to gather, and therefore where the clusters are.
• Random walks on a graph are calculated using Markov chains.
Image is adapted from https://micans.org/mcl/
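The flow idea can be sketched as alternating two matrix operations: expansion (squaring the column-stochastic transition matrix, i.e. longer random walks) and inflation (raising entries to a power and renormalising, which strengthens strong flows and weakens weak ones). This is a minimal plain-Python sketch, not the reference `mcl` program; the inflation value 2.0, the added self-loops and the 1e-6 cut-off are assumptions, and the two-triangle network is a hypothetical example.

```python
def normalize_columns(m):
    """Scale each column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Markov Cluster sketch: alternate expansion and inflation until the matrix is stable."""
    n = len(adjacency)
    # Add self-loops (a common MCL convention), then turn links into transition probabilities.
    m = normalize_columns(
        [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    )
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: flow along two-step random walks
        # inflation: raise entries to a power, then renormalise each column
        inflated = normalize_columns([[x ** inflation for x in row] for row in expanded])
        done = all(abs(inflated[i][j] - m[i][j]) < tol for i in range(n) for j in range(n))
        m = inflated
        if done:
            break
    # Rows that keep flow are "attractors"; the columns they attract form one cluster.
    clusters = {frozenset(j for j in range(n) if m[i][j] > 1e-6) for i in range(n)}
    return clusters - {frozenset()}


# Two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = [[0.0] * 6 for _ in range(6)]
for a, b in edges:
    adjacency[a][b] = adjacency[b][a] = 1.0
print(mcl(adjacency))  # the two triangles come out as separate clusters
```

Higher inflation values generally produce more, smaller clusters.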

Betweenness-centrality-based modules
The algorithm iteratively removes the edges (nodes) with the highest betweenness centrality, so that the network falls apart into modules.
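One split step of this edge-removal procedure (in the style of Girvan-Newman) can be sketched as follows: compute edge betweenness, delete the top edge, and repeat until the number of connected components grows. A minimal sketch for unweighted, undirected graphs stored as dicts of neighbour sets; it covers only the edge variant, and the two-triangle network is hypothetical.

```python
from collections import deque


def edge_betweenness(adj):
    """Edge betweenness for an unweighted, undirected graph (Brandes-style accumulation)."""
    eb = {}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, crediting each edge on the way.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1 + delta[w])
                edge = frozenset((v, w))
                eb[edge] = eb.get(edge, 0.0) + share
                delta[v] += share
    return {e: b / 2 for e, b in eb.items()}  # each node pair was counted from both ends


def connected_components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        comp, queue = set(), deque([start])
        while queue:
            u = queue.popleft()
            comp.add(u)
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        comps.append(comp)
    return comps


def split_by_betweenness(adj):
    """Remove the highest-betweenness edge until the network falls apart into more pieces."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    n_before = len(connected_components(adj))
    while True:
        eb = edge_betweenness(adj)
        if not eb:  # no edges left to remove
            return connected_components(adj)
        u, v = tuple(max(eb, key=eb.get))
        adj[u].discard(v)
        adj[v].discard(u)
        comps = connected_components(adj)
        if len(comps) > n_before:
            return comps


# Two triangles joined by the bridge 2-3, which carries every path between them.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(sorted(sorted(c) for c in split_by_betweenness(adj)))  # [[0, 1, 2], [3, 4, 5]]
```

Repeating the split on each resulting component yields a hierarchy of modules.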

Group functional characterisation

Functional enrichment
• Each module contains a list of genes (your gene list).
• You want to know the biological story behind this module.

Functional characterisation
Identify the biological function of the module using:
Gene Ontology
• Cellular component
• Molecular function
• Biological process
Pathways
• KEGG
• Reactome
Regulation
• miRBase miRNAs
• TRANSFAC TF targets
Extra
• BioGRID PPIs
• CORUM protein complexes
• Human Phenotype Ontology

Figure: overlap between your gene list and the genes with known function x.
Does your gene list include more genes with function x than expected by random chance?
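A common way to answer this question is a one-sided hypergeometric test: how likely is it to see at least the observed overlap if the gene list had been drawn at random? A minimal sketch, assuming a single function x and no multiple-testing correction (tools such as g:Profiler additionally correct for testing many functions at once); the gene counts in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(n_total, n_function, n_list, n_overlap):
    """P(X >= n_overlap), X ~ hypergeometric.

    n_total:    genes in the background (e.g. the genome)
    n_function: background genes annotated with function x
    n_list:     genes in your list
    n_overlap:  genes in your list annotated with function x"""
    max_k = min(n_function, n_list)
    hits = sum(
        comb(n_function, k) * comb(n_total - n_function, n_list - k)
        for k in range(n_overlap, max_k + 1)
    )
    return hits / comb(n_total, n_list)


# Hypothetical numbers: 5 of 50 listed genes have function x, versus
# 100 of 20000 genes overall (about 0.25 expected by chance).
p = enrichment_p_value(n_total=20000, n_function=100, n_list=50, n_overlap=5)
print(f"{p:.2e}")  # a very small p-value: function x is over-represented
```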

Tool for functional enrichment
http://biit.cs.ut.ee/gprofiler
J. Reimand, M. Kull, H. Peterson, J. Hansen, J. Vilo: g:Profiler - a web-based toolset for functional profiling of gene lists from large-scale experiments (2007) NAR 35 W193-W200
J. Reimand, T. Arak, P. Adler, L. Kolberg, S. Reisberg, H. Peterson, J. Vilo: g:Profiler - a web server for functional interpretation of gene lists (2016 update) (2016) NAR; doi: 10.1093/nar/gkw199

Example of module functional characterisation
2175 modules found
Enrichment results for an example module
https://biit.cs.ut.ee/graphweb/

Clustering based on enriched function
http://apps.cytoscape.org/apps/cluego

Questions & Answers
https://www.sli.do/ #P783
Ask a question Vote for a question
Open browser Go to www.slido.com Enter code #P783
