Minicourse on Network Science

A Mini-Course on Network Science
Pavel Loskot
p.loskot@swan.ac.uk

Pavel Loskot © 2014 1/3
Course Outline
1. Introduction
• fundamentals of complex systems and graph theory
2. Structure
• sub-graphs, centrality measures, weighted networks, community
3. Random Models
• random, small world and scale free networks
4. Robustness
• some definitions and metrics
5. Processes
• epidemic spreading and information cascades
6. Algorithms
• max flow and min cut, routing, search and navigation
7. Software
• using Matlab and Python, available software, a few demos from YouTube

Used Resources
Ernesto Estrada
The Structure of Complex Networks: Theory and Applications
Oxford University Press, 2011
Cecilia Mascolo
Social and Technological Network Analysis
Course at University of Cambridge, UK
Jari Saramäki
Introduction to Complex Networks
Aalto University, Finland
Animesh Mukherjee
Complex Network Theory
IIT Kharagpur, India

Used Resources
Robert Leese
An Introduction to Clustering
Industrial Mathematics Knowledge Transfer Network
Kevin Wayne
Max Flow, Min Cut
Princeton University, USA
James F. Kurose and Keith W. Ross
Computer Networking, A Top-Down Approach
Pearson Education, 2012
Wikipedia
various topics

Complex Systems
Emergence of complexity:
• locally simple rules, and yet
globally complex behavior
• systems evolve, are dynamic and
adapt to the environment
Modeling:
• infinitely many possibilities
• normally data-driven, but what data
to collect?
Emergence of stochasticity:
God doesn’t play dice with the world.
• many entities, complex interactions
• often useful to describe observations
statistically (joint PDF, correlations)
• human beings live at the edge between the
stochastic and the deterministic worlds

Illustration of Complexity
Simple idea:
• send packets between two nodes
Implementation:
• how to distinguish end-nodes?
• how to find the route?
• how to share network (resources)
among billions of end-nodes?
• how to deal with lost and delayed
packets?
• how to deal with mobility and nodes
leaving and arriving?
Solution:
• evolution - solve problems iteratively
• separation - divide and conquer
• new problems emerge as the network
grows: scalability, stability, security

Emergence of Order
Differing (spatial-temporal) perspectives
• insider: interacting with immediate neighbors (immediate, local)
• outsider: system level perception (average, global)

Description of Networks
1. Complete: everybody connected with everybody else
2. Random: connections selected arbitrarily at random
3. Random tree: connections selected arbitrarily at random, no cycles allowed
4. Real-world networks:
• exponential degree distribution and strongly disassortative
• small average path length and high clustering coefficient
• several nodes with high (degree, closeness and betweenness) centrality
• several main communities
. . . and many other distinctive characteristics
Challenge: how to synthesize real-world networks with all these properties?

Formal Definitions
Network
• graph model of functional and/or structural relationships of a complex system
Time-invariant network
• graph $G = (V, E)$ with the set of nodes $V = \{v_1, \ldots, v_N\}$ and the set of edges (links)
$E \subset V \times V$, $E = \{e_1, \ldots, e_L\}$, i.e., every edge $e_l \in E$ is associated with one pair
$(v_i, v_j) \in V \times V$, or in other words, $E$ is a set of (un)ordered pairs from $V$
• let’s not allow self-edges ($[v_i, v_i] \notin E$) and duplicate edges ($E$ has unique
elements)
• nodes and edges are objects, but for analysis and evaluation purposes, we
need numbers, i.e., we assign numbers (called weights) to nodes and edges
$$v_n \to W_v(n), \qquad e_l \to W_e(l), \qquad e_l = [v_i, v_j] = W_v(i, j)$$
Dynamic networks
• graphs (nodes, edges) as
well as weights can vary
over time

Formal Definitions (cont.)
Graph edges (in structural models)
• only if two nodes communicate; this communication can be implemented in
many different ways (radiation, material transport flows, . . .)
• communicating nodes interact i.e. influence each other’s behavior
• communications are, first of all, information flows:
Two nodes communicate
• if there is enough information delivered (just sent is not enough) over a given
time-window i.e. communication is integral (average) quantity
• delivered information may be ignored, not recognized, or misinterpreted

Fundamentals (of Graph Theory)
(Un)directed graphs
• for directed graphs, $E$ is a set of ordered pairs $[u, v] \in V \times V$
Neighbors, degrees
• u is neighbor of v if (u, v) ∈ E, then u and v are said to be adjacent nodes
• (u, w), (v, w), (y, w) and (x, w) are adjacent edges which are incident at w
• in-degree $k_{in}$, out-degree $k_{out}$ and total degree $k = k_{in} + k_{out}$; their distributions are
important statistics (this assumes all edges are counted with unit weights)

Fundamentals
Isomorphic graphs
• $G_1$ and $G_2$ are isomorphic if there is a one-to-one mapping between their vertices that
preserves the (possibly directed) edges (i.e., they are different visualizations of the same graph)
Edge (or connection) density
$$\rho = \frac{|E|}{\binom{|V|}{2}} = \frac{2|E|}{|V|(|V| - 1)}$$
• $\binom{|V|}{2} = \frac{|V|(|V| - 1)}{2}$ is the maximum possible number of edges
• $\rho = 1$ if fully connected; for real-world networks $\rho \ll 1$ (i.e. sparse)
• graph is sparse if $|E| \approx |V|$, graph is dense if $|E| \approx |V|^2$
Clique
• $\tilde{G} = (\tilde{V}, \tilde{E})$ is a subgraph of $G = (V, E)$ if $\tilde{V} \subseteq V$ and $\tilde{E} \subseteq E$
• clique is a maximal, completely connected subgraph of the graph
• N-clique is a fully connected subgraph with N vertices
• clique number is the size of the largest clique in the graph

Fundamentals
Path, walk, trail
• path from $v_1$ to $v_L$ is an ordered sequence of edges between an ordered list of
vertices such that no vertex is visited twice
• length of a path is the number of its edges (i.e. assuming edges of unit length)
• if there is no path between two vertices, their path length is infinite
• distance between two vertices is the length of their shortest path
• walk of length $L$ from $v_1$ to $v_{L+1}$ is a sequence $[v_1, \ldots, v_{L+1}]$ where only
consecutive vertices are required to be different (and joined by an edge)
• trail is a walk with no repeated edge
• cycle is a path that starts and ends at the same vertex
Diameter (d) and radius (r) of a graph
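• diameter is the longest shortest path; radius is the smallest eccentricity, i.e. the
smallest, over all vertices, of the longest shortest path starting at that vertex:
$$d = \max_{u,v \in V} \mathrm{distance}(u, v), \qquad r = \min_{u \in V} \max_{v \in V} \mathrm{distance}(u, v)$$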

Fundamentals
Average path length
• it is the average shortest path between all pairs of vertices
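$$\bar{d} = \frac{1}{2 \binom{|V|}{2}} \sum_{u,v \in V,\, u \neq v} \mathrm{distance}(u, v)$$
• if some vertices $u$ and $v$ are disconnected (i.e., no path connecting $u$ and $v$),
the average path length is computed as the harmonic mean instead (the reciprocal of the mean reciprocal distance)
$$\bar{d} = \left( \frac{1}{2 \binom{|V|}{2}} \sum_{u,v \in V,\, u \neq v} \frac{1}{\mathrm{distance}(u, v)} \right)^{-1}$$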
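The two averages above can be compared on a small disconnected graph. The following is a minimal sketch, assuming unit-length edges and an adjacency-set dictionary; the function names and the example graph are illustrative only and do not come from the course.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Hop counts from `source` to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_path_lengths(adj):
    """Return (arithmetic mean, harmonic mean) of distances over all vertex pairs."""
    dist = {u: bfs_distances(adj, u) for u in adj}
    pairs = list(combinations(sorted(adj), 2))
    # Unreachable pairs get infinite distance.
    lengths = [dist[u].get(v, float("inf")) for u, v in pairs]
    arithmetic = sum(lengths) / len(pairs)
    inverse_sum = sum(1.0 / d for d in lengths)  # 1/inf contributes 0
    harmonic = len(pairs) / inverse_sum if inverse_sum > 0 else float("inf")
    return arithmetic, harmonic


# Hypothetical example: path a-b-c plus an isolated vertex d.
graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": set()}
arithmetic, harmonic = average_path_lengths(graph)
print("arithmetic mean:", arithmetic)
print("harmonic mean:", harmonic)
```

The arithmetic mean becomes infinite as soon as one pair is disconnected, while the harmonic mean stays finite because unreachable pairs contribute zero to the sum of reciprocals.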
Graph coloring
• assign labels to vertices, so that no adjacent vertices get the same label
• chromatic number is the minimum number of colors to solve the coloring

Fundamentals
Connectivity
• connected component is a subgraph where there is a path between every
pair of vertices; for directed graphs, the directions can be ignored
• connected graph if there is a path between every pair of its vertices; in other
words, the graph contains a single connected component
• (sub)graphs not connected are disconnected
• node connectivity is the smallest number of vertices whose removal
disconnects the graph
• edge connectivity is the smallest number of edges whose removal
disconnects the graph
• a strongly connected component is a subgraph in which every vertex is reachable from
every other vertex (i.e., edge directions matter here)
Cutsets
• vertex cutset is a set of vertices whose removal disconnects the graph (i.e.
increases the number of graph components); such vertices are also known as
articulation points or brokers (in social networks)
• edge cutset is a set of edges whose removal disconnects the graph

Fundamentals
• vertices G and H in graph below are cutset vertices
• bridges: if removed, the number of graph components increases
Basic graphs

Fundamentals
Tree
• connected graph with no cycles (adding only one link creates a cycle)
• becomes disconnected by removing any single link
• any pair of nodes is connected by exactly one path
• spanning tree is a subgraph of a network that includes all its nodes and is a tree
R-regular graph
• all vertices have degree R and there are |E| = R|V|/2 edges
Planar graph
• can be drawn in a 2D plane such that no two edges intersect
• among all complete graphs $K_n$, only $K_1$, $K_2$, $K_3$ and $K_4$ are planar
• example of embeddings of $K_4$

Basic Graphs
Bipartite networks
• two disjoint sets of vertices; edges are allowed only between a vertex in one set and a vertex in the other
• a graph is bipartite if and only if it does not contain any odd cycles
• generalization to more than two sets of vertices is possible
Graph matching
• given graph G = (V, E), a matching M ⊆ E in G is a set of edges not sharing
any common vertex
• maximal matching: any edge
added to M will violate the matching property
• maximum matching: contains the
largest possible number of edges

Adjacency Matrix
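• (binary) adjacency matrix
$$[A]_{ij} = \begin{cases} 1 & \text{if } [v_i, v_j] \in E \\ 0 & \text{otherwise} \end{cases}$$
• for undirected graphs, $A$ is symmetric (i.e. $A = A^T$)
$$k = (\mathbf{1}^T A)^T = A \mathbf{1} \quad \text{(degree distribution)}$$
• for directed graphs, $A$ is asymmetric
$$k^{in} = (\mathbf{1}^T A)^T \quad \text{(in-degree distribution)}, \qquad k^{out} = A \mathbf{1} \quad \text{(out-degree distribution)}$$
• average degree of a graph
$$\bar{k} = \frac{2|E|}{|V|} = \frac{\mathbf{1}^T k}{|V|} = \frac{\mathbf{1}^T A \mathbf{1}}{|V|} = \frac{\sum_{u \in V} k(u)}{|V|}$$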

Adjacency Matrix
• for undirected bipartite graphs with vertices V = V1 ∪ V2, |V1| = n1, |V2| = n2,
$$A = \begin{bmatrix} 0_{n_1 \times n_1} & R_{n_1 \times n_2} \\ R^T_{n_2 \times n_1} & 0_{n_2 \times n_2} \end{bmatrix}$$
• generally, $A^n$, $n = 1, 2, \ldots$ counts the walks of length $n$ in the graph,
i.e., $[A^n]_{ij}$ is the number of distinct $n$-hop walks between vertices $i$ and $j$ (vertices may be revisited)
• $[A^T A]_{ij}$ and $[A A^T]_{ij}$ are the numbers of vertices connected to and from, respectively,
both vertices $v_i$ and $v_j$ at the same time
• $\mathrm{tr}(A^3)/6$ is the number of triangles in the graph
• open and closed triangles
• closed triangle represents 6 closed triplets (starting at each of 3 vertices in
2 directions)
Incidence matrix
• $|V| \times |E|$ matrix,
$$[B]_{ij} = \begin{cases} 1 & \text{if } v_i \in e_j \\ 0 & \text{otherwise} \end{cases}$$
• degree matrix is a diagonal matrix $D = \mathrm{diag}(k_1, \ldots, k_{|V|})$
• adjacency matrix can also be expressed as $A = B B^T - D$
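The matrix identities on this slide can be checked numerically on a small graph. This is a minimal sketch in plain Python (no linear-algebra library), assuming an undirected graph without self-edges; the helper names and the four-node example are hypothetical.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


# Hypothetical example: triangle 0-1-2 plus a pendant edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

A = [[0] * n for _ in range(n)]            # adjacency matrix
B = [[0] * len(edges) for _ in range(n)]   # (unoriented) incidence matrix
for j, (u, v) in enumerate(edges):
    A[u][v] = A[v][u] = 1
    B[u][j] = B[v][j] = 1

degrees = [sum(row) for row in A]          # k = A 1
average_degree = sum(degrees) / n          # equals 2|E| / |V|

A2 = matmul(A, A)                          # [A^2]_ij: 2-hop walks from i to j
A3 = matmul(A2, A)
triangles = sum(A3[i][i] for i in range(n)) // 6   # tr(A^3) / 6

# A = B B^T - D, with D the diagonal degree matrix.
BBt = matmul(B, transpose(B))
A_from_B = [[BBt[i][j] - (degrees[i] if i == j else 0) for j in range(n)]
            for i in range(n)]

print("degrees:", degrees)
print("2-hop walks from 0 to 3:", A2[0][3])
print("triangles:", triangles)
print("A == B B^T - D:", A_from_B == A)
```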

Adjacency Matrix
Graph spectrum
• recall that for undirected graph, adjacency matrix is symmetric, so its
eigenvalues are real-valued and referred to as graph spectrum
• eigenvalue λ and eigenvector v satisfy Av = λv, i.e., (λI − A)v = 0
• characteristic polynomial $p_A(t) = \det(tI - A) = \prod_i (t - \lambda_i)$ of matrix $A$ has
the eigenvalues of $A$ as its roots
• Laplacian matrix of graph G is L = D − A (degree minus adjacency matrix):
$$[L]_{ij} = \begin{cases} [D]_{ii} = k(i) & i = j \\ -1 & i \neq j \text{ and } [v_i, v_j] \in E \\ 0 & \text{otherwise} \end{cases}$$
and the spectrum of graph $G$ is then given by the eigenvalues of $L$ (rather than of $A$)
Properties of Laplacian
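• multiplicity of $\lambda_0 = 0$ of $L$ is the number of connected components of $G$
• eigenvalue $\lambda_0 = 0$ corresponds to eigenvector $v_0 = [1, \ldots, 1]^T$, i.e., $L v_0 = 0$
• $L = B B^T$ where $B$ is the $|V| \times |E|$ oriented incidence matrix of graph $G = (V, E)$, i.e.
each column of $B$ has $+1$ and $-1$ at the two end vertices of its edge (in arbitrary orientation)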
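These Laplacian properties can be illustrated without an eigen-solver. The sketch below assumes an oriented incidence matrix (one +1 and one −1 per column, orientation chosen arbitrarily) and checks that the indicator vector of every connected component is mapped to zero by L; all names and the example graph are hypothetical.

```python
def laplacian(n, edges):
    """L = D - A for an undirected graph on vertices 0..n-1."""
    L = [[0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1
        L[v][v] += 1
        L[u][v] -= 1
        L[v][u] -= 1
    return L


def oriented_incidence(n, edges):
    """Column j has +1 at one end of edge j and -1 at the other."""
    B = [[0] * len(edges) for _ in range(n)]
    for j, (u, v) in enumerate(edges):
        B[u][j] = 1
        B[v][j] = -1
    return B


def components(n, edges):
    """Connected components found by depth-first search."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, result = set(), []
    for start in range(n):
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    stack.append(v)
        seen |= comp
        result.append(comp)
    return result


# Hypothetical example: a triangle (0, 1, 2) and a separate edge (3, 4).
n, edges = 5, [(0, 1), (1, 2), (0, 2), (3, 4)]
L = laplacian(n, edges)
B = oriented_incidence(n, edges)
BBt = [[sum(B[i][e] * B[j][e] for e in range(len(edges))) for j in range(n)]
       for i in range(n)]

# One indicator vector per component; each should satisfy L x = 0.
indicators = [[1 if v in comp else 0 for v in range(n)]
              for comp in components(n, edges)]
residuals = [[sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
             for x in indicators]

print("L == B B^T:", BBt == L)
print("components:", len(indicators))
print("L x for each component indicator:", residuals)
```

The indicator vectors are linearly independent, so the zero eigenvalue has multiplicity of at least the number of components; the sketch does not prove that the multiplicity is exactly that number.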

Power-Law Distribution
• long-tail (right) with many low-connected vertices (left) (80-20 rule)
• many real-world networks exhibit this degree distribution, so they have
star-like topology
• also known as scale-free distribution of scale-free networks; these networks
are self-similar at different (spatial-temporal) scales
$$p(k) = A k^{-\gamma} \ \Rightarrow \ p(c \cdot k) = A (c k)^{-\gamma} = c^{-\gamma} \, p(k)$$
• cumulative degree distribution (CDD)
$$P(k) = \sum_{k'=k}^{\infty} p(k') \propto k^{-(\gamma - 1)}$$
(the probability that the degree is at least $k$)

Analyzing Degree Distributions
Degree-degree correlations
• assortativity coefficient or Pearson correlation coefficient (r)
Assortative mixing (r > 0)
• bias towards connections between nodes with similar characteristics (hubs
tend to connect to each other)
• useful, e.g. to understand spread of diseases and their treatment
Disassortative mixing (r < 0)
• dissimilar nodes tend to connect to each other (hubs avoid each other)
Neutral mixing (r = 0)
• connections follow some probability distribution

Analyzing Degree Distributions
Mathematically:
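• let $p_{ij}$ be the probability that an edge has degrees $k_i$ and $k_j$ at its two ends (and $p_j$ the probability of degree $k_j$)
$$\sum_{i,j} p_{ij} = 1, \qquad \sum_i p_{ij} = q_j = \frac{k_j p_j}{\sum_{j'} k_{j'} p_{j'}}$$
• perfectly assortative networks have $p_{ij} = q_i \delta_{ij}$ (only nodes of the same
degree connect)
• if degrees are independent, then $p_{ij} = q_i q_j$
• Pearson coefficient, $-1 \le r \le 1$
$$r = \frac{E[(k_i - E[k_i])(k_j - E[k_j])]}{\sqrt{\sigma^2(k_i)\, \sigma^2(k_j)}} = \frac{\sum_{i,j} k_i k_j (p_{ij} - q_i q_j)}{\max \sum_{i,j} k_i k_j (p_{ij} - q_i q_j)} = \frac{\sum_{i,j} k_i k_j (p_{ij} - q_i q_j)}{\sum_{i,j} k_i k_j (\delta_{ij} - q_i) q_j}$$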
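The Pearson coefficient can be estimated directly from an edge list by correlating the degrees found at the two ends of every edge. This is a minimal sketch for undirected graphs, assuming each edge is counted in both orientations so that the two end-degree samples have the same mean and variance; the function name and example graphs are illustrative.

```python
def degree_assortativity(edges):
    """Pearson correlation between the degrees at the two ends of an edge."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Each undirected edge contributes both orientations (u, v) and (v, u).
    x = [degree[u] for u, v in edges] + [degree[v] for u, v in edges]
    y = [degree[v] for u, v in edges] + [degree[u] for u, v in edges]
    mean = sum(x) / len(x)  # identical for x and y by symmetry
    covariance = sum((a - mean) * (b - mean) for a, b in zip(x, y)) / len(x)
    variance = sum((a - mean) ** 2 for a in x) / len(x)
    # r is undefined when all end degrees are equal (e.g. regular graphs).
    return covariance / variance if variance > 0 else float("nan")


# Hypothetical examples: a star (hub connects only to leaves) and a 4-node path.
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
path = [("a", "b"), ("b", "c"), ("c", "d")]
print("star:", degree_assortativity(star))
print("path:", degree_assortativity(path))
```

A star is the extreme disassortative case, since every edge joins the highest-degree node to a lowest-degree node.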

Degree Distributions
Degree-degree correlation
• directed graphs (networks)
Summary
• for large graphs, edges (topology) can be considered statistically
• degree distribution is partial statistical description (of topology)
• degree-degree correlation is more informative, but still incomplete info

Take-Home Messages
Complex Systems
• consist of a large number of interacting components
• graphs are very good mathematical models of these systems; they are very
generic objects with many specific instances (trees, lists, tables etc.)
• availability of observations (measurements data) is a strong driving force
• a common systematic framework to study these systems: Network Science
History of modern science
Problems of simplicity (1600-1800): understanding influence of one variable over another
Problems of disorganized complexity (1900-1950): number of variables is very large, but system as a whole has well-defined average behavior
Problems of organized complexity (1950-today): simultaneously dealing with a number of factors forming a whole system
- W. Weaver, 1948

Similarity of Networks
• Nature is built up of complex networks
• there is need to have a common framework for systematically describing,
analyzing and eventually synthesizing networks to mimic the Nature

Comparing Networks
Similarity of (static) networks
1. calculate and compare (a vector of) metrics for each network; N.B. we can
only compare scalar values (e.g. Euclidean distances between vectors)
OR
2. identify distinctive subgraphs at certain granularity, and compare those
Graphlets [Pržulj, 2004]
• pictured right: 30 subgraphs of
2-5 nodes of 73 possible types
• generalizes vector of node
degrees to graphlet degrees; it
is a vector of 73 components of
the number of nodes of given
type in the network
Fragments
• quantitative analysis relies on
correlations between fragment
statistics in the network and the
network properties

Comparing Networks
Motifs [Milo, 2002]
• subgraphs having the statistical significance of occurrence much larger than
if the network was created completely at random
• network randomization:
1. select two links at random, 2. exchange their end-points, 3. repeat
• a motif in the real network occurs much more often than (on average) in an
ensemble of random networks having the same degree distribution
• we require that the probability of the motif appearing in an ensemble of random
networks at least as many times as in the real network is small
• this is quantified by the Z-score ($N$ denotes the number of occurrences)
$$Z = \frac{N_{real} - E[N_{random}]}{\sqrt{E\left[ (N_{random} - E[N_{random}])^2 \right]}}$$
• motifs are network specific, although families of networks can share the
same motifs
• importance of motifs can be evaluated as the significance profile (SP) vector
$$SP = \left[ \frac{Z_1}{\sqrt{\sum_i Z_i^2}}, \ \frac{Z_2}{\sqrt{\sum_i Z_i^2}}, \ \cdots \right]$$
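The randomization procedure and the Z-score can be sketched for the simplest candidate motif, the triangle. The code below is an illustration under stated assumptions: swaps that would create self-edges or duplicate edges are skipped, the ensemble size and number of swaps are arbitrary choices, and all function names are hypothetical.

```python
import random
from statistics import mean, pstdev


def count_triangles(edges):
    """Number of triangles in an undirected simple graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Every triangle is found once from each of its three edges.
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3


def randomize(edges, n_swaps, rng):
    """Degree-preserving rewiring: pick two links and exchange their end-points."""
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4:
            continue  # would create a self-edge
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def z_score(n_real, n_random):
    """Z = (N_real - E[N_random]) / std(N_random); NaN if the ensemble never varies."""
    spread = pstdev(n_random)
    return (n_real - mean(n_random)) / spread if spread > 0 else float("nan")


# Hypothetical "real" network: two triangles joined by a bridge.
real_edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 5), (5, 3)]
rng = random.Random(0)
ensemble = [count_triangles(randomize(real_edges, 100, rng)) for _ in range(200)]
print("triangles in real network:", count_triangles(real_edges))
print("Z-score:", z_score(count_triangles(real_edges), ensemble))
```

Every swap keeps the degree of all four end-points unchanged, which is why the ensemble has the same degree distribution as the real network.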



Comparing Networks
Motif examples
Relative abundance of fragments
• assume that the ensemble of random networks has the same node degrees as
the real-world network
$$\alpha = \frac{N_{real} - E[N_{random}]}{N_{real} + E[N_{random}]}$$

Comparing Networks
Relative abundance examples
• ratios of the number of fragments occurrences are also useful to characterize
the network structure as shown next

Transitivity Measures
Clustering coefficient
• recall that every triangle represents three connected (open or closed) triples
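• let $|\tilde{T}_3|$ be the number of triangles and $|\tilde{P}_2|$ the number of 2-paths (connected
triples with 2 or 3 ties); the clustering coefficient (a.k.a. network transitivity):
$$C_3 = \frac{3 |\tilde{T}_3|}{|\tilde{P}_2|} \quad \text{where} \quad |\tilde{T}_3| = \frac{\mathrm{tr}(A^3)}{6}, \quad |\tilde{P}_2| = \frac{1}{2} \Big( \sum_{i,j} [A^2]_{ij} - \mathrm{tr}(A^2) \Big)$$
(since $\sum_{i,j} [A^2]_{ij} = \sum_i k_i^2$ and $\mathrm{tr}(A^2) = \sum_i k_i$, this gives $|\tilde{P}_2| = \sum_i k_i (k_i - 1)/2$)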
• a network can be highly clustered locally, but not globally (i.e., considering
average of local clusterings across all nodes is not sufficient)
• clustering tends to be much larger for real-world than random networks
Example
• A’s friends: B,C,D and E
• all possible edges among A’s friends: B-C, B-
D, B-E, C-D, C-E, D-E, i.e., 6 in total and out of
which only 1 (C-D) exists
• thus, clustering coefficient of A is 1/6
Generalization
• any subgraph, ratio of actual to maximum
possible number of its occurrences: $C_n = \frac{n |\tilde{T}_n|}{|\tilde{P}_n|}$
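The example with A's friends can be reproduced in a few lines. This is a minimal sketch that computes the local clustering of a node as the fraction of connected neighbor pairs, and the network transitivity as three times the number of triangles over the number of connected triples, counted here as the sum of k(k−1)/2 over all nodes; the function names are illustrative.

```python
from itertools import combinations


def local_clustering(adj, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    neighbors = sorted(adj[v])
    possible = len(neighbors) * (len(neighbors) - 1) // 2
    if possible == 0:
        return 0.0
    actual = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return actual / possible


def transitivity(adj):
    """C3 = 3 * (number of triangles) / (number of connected triples)."""
    # Each triangle is seen three times: once as a closed pair at each corner.
    closed = sum(1 for v in adj
                 for a, b in combinations(sorted(adj[v]), 2) if b in adj[a])
    triangles = closed // 3
    triples = sum(len(adj[v]) * (len(adj[v]) - 1) // 2 for v in adj)
    return 3 * triangles / triples if triples else 0.0


# The example from the slide: A is friends with B, C, D, E; only C-D are connected.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("C", "D")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print("clustering coefficient of A:", local_clustering(adj, "A"))
print("network transitivity C3:", transitivity(adj))
```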

Centrality Measures
Aim
• quantify importance of nodes in a network (so-called positional advantage),
i.e. how nodes contribute to the overall structural properties of the network
• e.g. important nodes disseminate information faster, can stop spreading
epidemics, can protect network from breaking and so on
Degree centrality
• hubs are likely to have the largest influence (e.g. number of friends to help)
• a transitivity measure since it is ratio of single (neighboring) node fragments
• for a network of N nodes, i-th node of degree ki has degree centrality
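$$C_1(i) = \frac{|\tilde{T}_1|}{|\tilde{P}_1|} = C_D(i) = \frac{k_i}{N - 1}$$
Network centralization (centrality) ($\sigma^2_{C_1}$)
$$\sigma^2_{C_1} = \frac{1}{N - 1} \sum_{i=1}^{N} \left( C_1(i) - \bar{C}_1 \right)^2 \quad \text{where} \quad \bar{C}_1 = \frac{1}{N} \sum_{i=1}^{N} C_1(i)$$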
• star topology has the maximum while line
topology has the minimum centralization

Centrality Measures
Freeman’s degree centrality
• quantify variations in node degree centrality in the whole network
$$\bar{C}_D = \frac{\sum_{i=1}^{N} (k^* - k_i)}{(N - 1)(N - 2)}$$
where $k^* = \max_i k_i$, and $\max \sum_{i=1}^{N} (k^* - k_i) = (N - 1)(N - 2)$ for a star network
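Degree centrality and Freeman's centralization are simple enough to check by hand on a star and a ring. The following is a minimal sketch, assuming an undirected graph with at least three nodes given as an adjacency-set dictionary; the names are illustrative.

```python
def degree_centrality(adj):
    """C_D(i) = k_i / (N - 1) for every node."""
    n = len(adj)
    return {v: len(adj[v]) / (n - 1) for v in adj}


def freeman_centralization(adj):
    """Sum of (k* - k_i), normalized by its maximum (N - 1)(N - 2)."""
    n = len(adj)
    degrees = [len(adj[v]) for v in adj]
    k_star = max(degrees)
    return sum(k_star - k for k in degrees) / ((n - 1) * (n - 2))


# Hypothetical examples with N = 5 nodes.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
ring = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}

print("star centralities:", degree_centrality(star))
print("star centralization:", freeman_centralization(star))
print("ring centralization:", freeman_centralization(ring))
```

The star reaches the maximum value of 1, and any regular graph such as the ring gives 0 because all degrees equal the maximum degree.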
Betweenness centrality (beyond nearest neighbors)
• quantify node importance in communications between pairs of other nodes
• ability to broker between groups, likelihood of intercepting information etc.
• thus, it is the likeliness of node w to be involved in communications
$$C_B(w) = \frac{1}{\binom{n-1}{2}} \sum_{u \neq w \neq v} \frac{\rho(u, w, v)}{\rho(u, v)} \quad \text{(normalization optional)}$$
$\rho(u, w, v)$: number of shortest paths between $u$ and $v$ via $w$
$\rho(u, v)$: number of all shortest paths between $u$ and $v$
OR
$\rho(u, w, v)$: maximum flow from $u$ to $v$ through $w$
$\rho(u, v)$: total maximum flow from $u$ to $v$

Centrality Measures
Example (betweenness centrality)
• A and E are not in-between any pairs, B and D are in-between 3 pairs, and
C is in-between 4 pairs
Closeness centrality
• measure of how much the node is in “middle of things”
• let d(u, v) be the shortest path length between nodes u and v
$$C_C(u) = \left( \frac{1}{N - 1} \sum_{v \neq u} d(u, v) \right)^{-1} \quad \text{(normalization optional)}$$
Example (closeness centrality)
$$C_C(A) = \left( \frac{1 + 2 + 3 + 4}{4} \right)^{-1} = 0.4$$
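Both examples can be reproduced on the line network A-B-C-D-E, which is the topology the numbers above are consistent with. This is a minimal brute-force sketch based on breadth-first search with shortest-path counting; betweenness is left unnormalized (a plain sum over unordered pairs) and all function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """Distances from `source` and the number of shortest paths to each vertex."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness(adj, w):
    """Sum over pairs (u, v) of the fraction of shortest u-v paths passing through w."""
    info = {s: bfs_counts(adj, s) for s in adj}
    dist_w, sigma_w = info[w]
    total = 0.0
    for u, v in combinations([x for x in sorted(adj) if x != w], 2):
        dist_u, sigma_u = info[u]
        if v not in dist_u or w not in dist_u or v not in dist_w:
            continue  # u, v or w not mutually reachable
        if dist_u[w] + dist_w[v] == dist_u[v]:
            total += sigma_u[w] * sigma_w[v] / sigma_u[v]
    return total


def closeness(adj, u):
    """Inverse of the average distance from u to all other (reachable) nodes."""
    dist, _ = bfs_counts(adj, u)
    return (len(adj) - 1) / sum(d for v, d in dist.items() if v != u)


line = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C", "E"}, "E": {"D"}}
for node in sorted(line):
    print(node, "betweenness:", betweenness(line, node),
          "closeness:", closeness(line, node))
```

For large networks a dedicated algorithm (for example Brandes' algorithm) would be preferable to this pairwise enumeration.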

Centrality Measures
Information centrality
$$C_{IC}(i) = \left( \frac{1}{N} \sum_j \frac{1}{I_{ij}} \right)^{-1}$$
Eigenvector centrality (xu)
• account for connections that are (or not) isolated; important nodes are likely
connected to other important nodes
• let B(u) be the neighbors of node u
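$$x_u = \frac{1}{\lambda} \sum_{v \in B(u)} x_v = \frac{1}{\lambda} \sum_{v \in V} [A]_{u,v} \, x_v \ \Rightarrow \ A x = \lambda x$$
algorithm: initialize $x_u = 1 \ \forall u$, re-calculate $x_u \ \forall u$, $\lambda = \max_u x_u$, repeat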
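The iterative algorithm above is a power iteration. The sketch below follows those steps, with the added assumption that the vector is rescaled by λ after every update so that the largest entry stays at 1; it also assumes a connected, non-bipartite graph (otherwise the plain iteration may not converge). The names and the example graph are illustrative.

```python
def eigenvector_centrality(adj, iterations=200):
    """Power iteration: x <- A x, then rescale by lambda = max_u x_u."""
    x = {u: 1.0 for u in adj}
    lam = 0.0
    for _ in range(iterations):
        new_x = {u: sum(x[v] for v in adj[u]) for u in adj}
        lam = max(new_x.values())
        x = {u: value / lam for u, value in new_x.items()}
    return x, lam


# Hypothetical example: triangle 0-1-2 with a pendant node 3 attached to node 2.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
centrality, lam = eigenvector_centrality(graph)

# Residual of A x = lambda x; it should be close to zero after convergence.
residual = max(abs(sum(centrality[v] for v in graph[u]) - lam * centrality[u])
               for u in graph)
print("centrality:", centrality)
print("lambda:", lam, "residual:", residual)
```

Node 3 has only one neighbor, yet it inherits some importance from being attached to the most central node, which is the behavior the eigenvector definition is meant to capture.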
Katz centrality
• instead of counting shortest paths (as in closeness centrality), count all paths
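• let $\alpha > \lambda_1$ (largest eigenvalue of $A$), so that the series below converges
$$C_K(i) = [Z \cdot \mathbf{1}]_i \quad \text{where} \quad Z = \sum_{k=1}^{\infty} \alpha^{-k} A^k = \left( I - \frac{1}{\alpha} A \right)^{-1} - I$$
so the values of $C_K(i)$ depend on the choice of $\alpha$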

Centrality Measures
PageRank centrality
• reflects the probabilities that random walk through the network arrives to any
particular node
• intuitively, if there are many links out of node v, one of these links to node u
represents average recommendation of u by v; if the number of links out of v
is reduced, recommendation of u by v increases
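• define the modified adjacency matrix
$$[H]_{ij} = \begin{cases} 1/k_{out}(i) & \text{if } [v_i, v_j] \in E \\ 0 & \text{otherwise} \end{cases}$$
• PageRank vector $C_{PR} = [C_{PR}(1), \ldots, C_{PR}(N)]^T$ at step $k$ is updated as (treating $C_{PR}$ as a row vector in the product)
$$C^{k+1}_{PR} := C^{k}_{PR} \cdot H$$
note that node 4 traps a random walker, and also, the search is often randomly reset (with
probability $1 - \alpha$), so this modified $H$ should be used instead:
$$H' = \alpha H + \frac{\alpha}{N} (a \mathbf{1}^T) + \frac{1 - \alpha}{N} \mathbf{1} \mathbf{1}^T \quad \text{where} \quad [a]_i = \begin{cases} 1 & \text{if } k_{out}(i) = 0 \\ 0 & \text{otherwise} \end{cases}$$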
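The update with the modified matrix can be written without forming the matrix explicitly. The sketch below is one possible reading of the three terms: the walker follows an out-link with probability α, a walker sitting on a node without out-links jumps to a uniformly chosen node, and with probability 1 − α the walk is reset to a uniformly chosen node. The four-node graph (with node 4 having no out-links), α = 0.85, and the function names are assumptions for illustration.

```python
def pagerank_step(rank, out_links, alpha):
    """One update of the rank (row) vector with the modified matrix H'."""
    n = len(rank)
    # Rank mass held by nodes without out-links is spread uniformly.
    dangling = sum(rank[v] for v in rank if not out_links[v])
    new_rank = {v: (1 - alpha) / n + alpha * dangling / n for v in rank}
    for u, targets in out_links.items():
        for v in targets:
            new_rank[v] += alpha * rank[u] / len(targets)
    return new_rank


def pagerank(out_links, alpha=0.85, iterations=200):
    """Start from the uniform vector and iterate a fixed number of times."""
    rank = {v: 1.0 / len(out_links) for v in out_links}
    for _ in range(iterations):
        rank = pagerank_step(rank, out_links, alpha)
    return rank


# Hypothetical directed graph; node 4 has no out-links and would trap the walker.
links = {1: [2, 3], 2: [3], 3: [1, 4], 4: []}
ranks = pagerank(links)
print(ranks)
```

Because every row of the modified matrix sums to one, the ranks remain a probability distribution at every step.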

Centrality Measures
Reciprocity (r)
• in directed networks, link from u to v can be reciprocated as link from v to u;
these are called co-links
$$r = \frac{\sum_{i,j} [A]_{ij} [A]_{ji}}{|E|} \quad \text{(fraction of reciprocated links)}$$
Rich-Club coefficient of degree $k$ ($R(k)$)
• hubs tend to be densely interconnected, which is quantified by $R(k)$
• let subgraph $(V'(k), E') \subseteq (V, E)$ where $V'(k) \subseteq V$ is the subset of nodes with
degree at least $k$, and $E' \subseteq E$ are the corresponding edges among $V'(k)$
$$R(k) = \frac{|E'|}{\binom{|V'(k)|}{2}}$$
Matching index (µij)
• quantify similarity of connectivity of the two end-vertices of an edge
• small value of µij indicates the edge between vi ∈ V and vj ∈ V is a bridge
between two dissimilar regions of the network
$$\mu_{ij} = \frac{\sum_{k \neq i,j} A_{ik} A_{kj}}{\sum_{k \neq i} A_{ik} + \sum_{k \neq j} A_{jk}}$$

Weighted Networks
Graph     Network   System
vertex    node      component
edge      link      interaction
Weights mapping
• weights can be assigned to vertices as well as to (more often) edges; we
assume mapping
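$$W : (V, E) \to (V, W)$$
so the weighted adjacency matrix and the original adjacency matrix are, respectively,
$$[W]_{ij} = w_{ij} \in \mathbb{R}, \qquad [A]_{ij} = \begin{cases} 1 & |w_{ij}| \ge \Delta \\ 0 & |w_{ij}| < \Delta \end{cases}$$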
Vertex strength
• degree distribution is generalized to the strength distribution, which again has
power-law-like tails in many real-world networks
$$s(i) = \sum_j w_{ij}$$
• it was observed that node strength and node degree are related as
$$E[s \mid k] \propto k^{\beta}, \quad \beta > 0$$
for $\beta > 1$, high-degree vertices (hubs) tend to be high-strength vertices

Weighted Networks
Other generalizations
• the edge contributions can be normalized as $w_{ij} / \sum_j w_{ij} = w_{ij} / s(i)$, e.g. the
average nearest (first order) neighbor degree
$$k_{NN}(i) = \sum_j \frac{w_{ij}}{s(i)} [A]_{ij} \, k(j)$$
• importantly, there are no generally agreed definitions of quantities (metrics)
for weighted networks, e.g. the clustering coefficient ([A]ij ≡ aij)
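$$C_3(i) = \frac{1}{s(i)(k(i) - 1)} \sum_{j,k} \frac{w_{ij} + w_{ik}}{2} \, a_{ij} a_{jk} a_{ik} \quad \text{[Barrat, Barthélemy, Vespignani]}$$
$$C_3(i) = \frac{1}{k(i)(k(i) - 1)} \sum_{j,k} (w_{ij} w_{ik} w_{jk})^{1/3} \quad \text{[Onnela, Saramäki, Kaski, Kertész]}$$
$$C_3(i) = \frac{\sum_{j,k} w_{ij} w_{jk} w_{ik}}{\left( \sum_k w_{ik} \right)^2 - \sum_k w_{ik}^2} \quad \text{[Zhang]}$$
$$C_3(i) = \frac{\sum_{j,k} w_{ij} w_{jk} w_{ki}}{(\max_{ij} w_{ij}) \sum_{j,k} w_{ij} w_{ik}} \quad \text{[Holme]}$$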
where for unweighted network we assume wij = 1 if [vi, vj] ∈ E, and 0 otherwise

Weighted Networks
Time-series as graphs
1. pre-processing: reduce measurement noise, reduce amount of data
2. calculate magnitude of correlations (possibly with thresholding)
$$0 \le [W]_{ij} = \frac{\left| E[d_i d_j] - E[d_i] E[d_j] \right|}{\sqrt{\left( E[d_i^2] - E[d_i]^2 \right) \left( E[d_j^2] - E[d_j]^2 \right)}} \le 1$$
3. construct a weighted graph assuming the weight matrix W

Weighted Networks
Spanning tree of a graph
• a tree topology containing all nodes of the graph
• possibly additional requirement to maximize or minimize the sum of edge
weights
• it can be used to emphasize clusters in the graph, but . . . a lot of information
is discarded and is also sensitive to noise and thresholding
Example (NYSE stocks)

Community Structure
Network communities
• so far, we considered local and
global structure and properties;
here, we look at spatial scale
in-between individual nodes and
the whole network - clusters
• clusters are obtained by network
partitioning or clustering
• our objective is to partition the
network using only its topology
Why clustering
• manage complex systems by creating hierarchy, for example,
Big Data analysis and classification such as large databases, customer
recommendations, website ranking, genomics, market evaluations etc.
• identify bridges and weak ties in networks
Formally
• find $P$ disjoint subsets $V_i$, so that $\cup_{i=1}^{P} V_i = V$ and $V_i \cap V_j$ is the empty set for $i \neq j$

Community Structure
Balanced partitioning
• given P, the size of partitions is approximately equal i.e. |Vi| ≈ |V|/P
• possibly also, the cut (the links between subsets) size can be minimized
Community
• there is a path through the community between every pair of nodes
• internal connection density significantly larger than density of external
connections
Cut size
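• assume a weighted network, and the partition $\tilde{V} \subset V$
• the internal and external weights of a node $v_i \in \tilde{V}$ in the partition
$$W_{int}(i) = \sum_{v_j \in \tilde{V}} w_{ij}, \qquad W_{ext}(i) = \sum_{v_j \notin \tilde{V}} w_{ij}$$
• the cut size between $\tilde{V}$ and the rest of the nodes $V \setminus \tilde{V}$
$$C_{cut}(\tilde{V}) = \frac{1}{2} \sum_{v_i \in V} W_{ext}(i)$$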

Community Structure
Reducing cut size
• moving node $v_i$ in or out of the partition $\tilde{V}$ reduces the cut size by
$$g(i) = W_{ext}(i) - W_{int}(i)$$
so the cut size is reduced if $W_{ext}(i) > W_{int}(i)$
• for partitions already balanced, consider replacing one node in the partition
(i.e., move one node out and another node in); the cut size is reduced by
$$g(i, j) = \begin{cases} g(i) + g(j) - 2 w_{ij} & \text{if } v_i \text{ and } v_j \text{ are connected} \\ g(i) + g(j) & \text{otherwise} \end{cases}$$
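The gain formulas can be verified by recomputing the cut before and after a move. This is a minimal sketch, assuming an undirected weighted graph stored as a dictionary of edge weights and a two-way partition described by the node set of one side; internal and external weights of a node are taken relative to the side the node currently belongs to. All names and weights are hypothetical.

```python
def cut_size(weights, part):
    """Total weight of edges with exactly one end inside `part`."""
    return sum(w for (u, v), w in weights.items() if (u in part) != (v in part))


def w_ext(weights, part, i):
    """Weight of i's edges that cross the partition boundary."""
    return sum(w for (u, v), w in weights.items()
               if i in (u, v) and (u in part) != (v in part))


def w_int(weights, part, i):
    """Weight of i's edges that stay on i's own side."""
    return sum(w for (u, v), w in weights.items()
               if i in (u, v) and (u in part) == (v in part))


def gain(weights, part, i):
    """Reduction of the cut size when node i changes sides."""
    return w_ext(weights, part, i) - w_int(weights, part, i)


def swap_gain(weights, part, i, j):
    """Reduction of the cut size when i and j (on opposite sides) are exchanged."""
    w_ij = weights.get((i, j), weights.get((j, i), 0))
    return gain(weights, part, i) + gain(weights, part, j) - 2 * w_ij


# Hypothetical weighted network and partition.
weights = {("a", "b"): 1, ("a", "c"): 2, ("b", "c"): 1, ("c", "d"): 1, ("d", "e"): 1}
part = {"a", "b"}

before = cut_size(weights, part)
after_move = cut_size(weights, part | {"c"})
after_swap = cut_size(weights, (part - {"a"}) | {"c"})
print("cut before:", before)
print("moving c reduces cut by", gain(weights, part, "c"), "->", after_move)
print("swapping a and c reduces cut by", swap_gain(weights, part, "a", "c"), "->", after_swap)
```

A negative gain means that the move or swap would increase the cut, as happens for the swap of a and c in this example.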
Centrality based partitioning
• links connecting nodes in different communities are likely to have large edge
betweenness centrality (defined analogously to node betweenness)
Algorithm [Girvan, Newman, 2002]
1. calculate edge betweenness for all links, remove link with highest such value
2. recalculate edge betweenness for remaining links
3. repeat until all links have been removed

Community Structure
Modularity
• need to compare different partitions to decide which one is the best
• intuitively, cohesion or link density within the community is likely to be
significantly larger than if the community is formed at random
• for partitioning $\cup_{i=1}^{P} V_i = V$, with edges $E_i$ within the partition $V_i$, the
modularity indicator ($c_i$ is the community assignment of vertex $v_i$)
$$Q = \sum_{i=1}^{P} \left[ \frac{|E_i|}{|E|} - \left( \frac{\sum_{v_j \in V_i} k(v_j)}{2|E|} \right)^2 \right] = \ldots = \frac{1}{2|E|} \sum_{i,j} \left( [A]_{ij} - \frac{k(i) k(j)}{2|E|} \right) \delta(c_i, c_j)$$
so it is actual number of edges minus expected number of edges inside the
community for a random subgraph with the same node degree distribution
• Q ≥ −1, and max Q = 1 for strong community structure
• can be used as stopping criterion (Q >> 0) in Girvan-Newman algorithm
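The two expressions for Q can be evaluated side by side. This is a minimal sketch for unweighted, undirected graphs without self-edges; the function names and the example (two triangles joined by one link) are illustrative.

```python
def node_degrees(edges):
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return degree


def modularity_by_community(edges, communities):
    """Q as a sum over communities: internal edge fraction minus its expectation."""
    m = len(edges)
    degree = node_degrees(edges)
    q = 0.0
    for community in communities:
        internal = sum(1 for u, v in edges if u in community and v in community)
        degree_sum = sum(degree.get(v, 0) for v in community)
        q += internal / m - (degree_sum / (2 * m)) ** 2
    return q


def modularity_by_pairs(edges, communities):
    """Q as a sum over all node pairs sharing a community."""
    m = len(edges)
    degree = node_degrees(edges)
    label = {v: c for c, community in enumerate(communities) for v in community}
    adjacent = {(u, v) for u, v in edges} | {(v, u) for u, v in edges}
    total = 0.0
    for i in label:
        for j in label:
            if label[i] == label[j]:
                a_ij = 1 if (i, j) in adjacent else 0
                total += a_ij - degree.get(i, 0) * degree.get(j, 0) / (2 * m)
    return total / (2 * m)


# Hypothetical example: two triangles connected by the single link 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = [{0, 1, 2}, {3, 4, 5}]
print("Q (community form):", modularity_by_community(edges, split))
print("Q (pairwise form): ", modularity_by_pairs(edges, split))
print("Q (single community):", modularity_by_community(edges, [set(range(6))]))
```

Putting all nodes into one community gives Q = 0, so a positive Q indicates more internal links than expected for the given degrees.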
Modularity optimization
• find the partitioning with maximum modularity (exact solution is NP-complete):
$$\text{complexity } O\!\left( \binom{|V|}{|V|/2} \right) \sim \frac{2^{|V|}}{\sqrt{\pi |V| / 2}} \quad \text{for large } |V|$$

Community Structure
Resolution problem
• modularity based clustering may fail
to identify obvious small clusters
close to a large cluster
• modularity is deficient if clusters are
circularly connected (pictured right)
• other similarity measures also
affected (minimum cuts, . . .)
Possible solution
• use multiple similarity metrics
• then choose the best partition by
consensus (e.g. majority vote)

Community Structure
Hierarchical clustering
• complexity O((|E| + |V|)|V|), many networks are sparse (|V| ≈ |E|)
Algorithm
1. Initialize: |V| communities of 1
vertex each
2. Calculate modularity ∆Q for all
pairs of existing communities
3. Merge the community pair
having the largest increase ∆Q
4. Build the dendrogram and
repeat steps 2 and 3 until only
one community remains
Clustering based on Euclidean distance

Community Structure
Merging clusters
• similarity between clusters can be measured as single linkage: minimum
between all pairs of nodes in two clusters
• Complete linkage: maximum between all pairs of nodes in two clusters
• Average linkage: average between all pairs of nodes in two clusters
Limitations of modularity
• appears to be strongly dependent on the density of links in the network
• thus, not a good measure to determine communities in sparse networks
Clustering techniques
1. Agglomerative (bottom-up) techniques: edges are added among nodes to
create communities (e.g. dendrogram)
2. Divisive (top-down) techniques: edges are removed from graph to create
separate communities
3. Spectral techniques: graph splitting based on eigen-analysis
Similarity measures
• quantify (dis)similarity between nodes to decide on communities in all
clustering algorithms
• selection strongly application dependent (modularity, cosine similarity,
Jaccard’s coefficient, . . .)

Community Structure
Louvain method (based on modularity optimization)
• more accurate and more efficient (much faster) than hierarchical clustering
• number of communities decreases quickly in only a few iterations
Algorithm
1. Initialize: every node is in its own community
2. For each node i, consider all its neighbors j, and check if moving i into j’s
community increases the modularity (i.e. ∆Q > 0)
3. Move i into the community for which ∆Q is maximum
4. Repeat steps 2 and 3 until no further improvement is possible (i.e. ∆Q = 0)
5. Collapse the communities into single nodes (merging multiple edges
between these new nodes), and go back to step 2

Community Structure
K-means clustering
• number of clusters $K$ predefined
• minimize e.g. Euclidean distances:
$$\{V_i\}_{i=1,\ldots,K} = \operatorname{argmin} \sum_{i=1}^{K} \sum_{v \in V_i} \| v - \bar{v}_i \|^2, \qquad \bar{v}_i = \frac{1}{|V_i|} \sum_{v \in V_i} v$$
Algorithm
1. Initialize: select K vertices
at random as initial clusters
and assign remaining
vertices to nearest clusters
2. Calculate new centroids $\bar{v}_i$
for each cluster
3. Re-assign all vertices to
the nearest clusters
4. Go to step 2 until some
stopping criterion is met
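The four steps can be sketched in plain Python. The code below assumes that every vertex has a coordinate tuple, uses squared Euclidean distance, and stops when the assignment no longer changes (one possible stopping criterion). The optional `init` argument, which fixes the initial vertices instead of drawing them at random, is an addition for reproducibility and is not part of the algorithm above; all names are hypothetical.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iterations=100, init=None, seed=0):
    """Return (labels, centroids) for a list of coordinate tuples."""
    rng = random.Random(seed)
    # Step 1: pick K points as initial cluster centers (at random unless given).
    chosen = init if init is not None else rng.sample(range(len(points)), k)
    centroids = [points[i] for i in chosen]
    labels = None
    for _ in range(max_iterations):
        # Steps 1 and 3: assign every point to the nearest cluster center.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # Step 4: stop when the assignment no longer changes.
        labels = new_labels
        # Step 2: recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


# Hypothetical data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2, init=[0, 3])
print("labels:", labels)
print("centroids:", centroids)
```

With random initialization the result can depend on the seed, which is the sensitivity to initial conditions noted among the limitations of K-means.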
89% of data correctly classified

Community Structure
Limitations of K-means clustering
• sensitivity to initial conditions and outliers
• sensitivity to non-homogeneous structure, i.e. clusters differ significantly in
size, connection density, non-spherical shape (for Euclidean distance metric)

Community Structure
Gaussian mixture models
• assume there are K clusters, vertex vi has location xi
• locations of vertices in cluster $V_i$ are normally distributed $\sim \mathcal{N}(x | \mu_i, \lambda_i)$
• let $z_i$, $i = 1, \ldots, K$ be binary latent variables such that $z_i = 1$ if the vertex belongs to cluster $i$
$\sim \mathcal{N}(x | \mu_i, \lambda_i)$ and $z_i = 0$ otherwise (exactly one $z_i$ equals 1), so $\Pr(z) = \prod_{i=1}^{K} \pi_i^{z_i}$
• if $z_i$ are known, the data are labeled (parameters of their distribution are
known), otherwise data are un-labeled (unsupervised learning)
• $\pi_i = \Pr(z_i = 1)$ are mixing probabilities (weights), so that $\sum_{i=1}^{K} \pi_i = 1$ and the
distribution of location $x$ of a vertex is
$$p(x) = \sum_z p(x|z)\, p(z) = \sum_{i=1}^{K} \pi_i \, \mathcal{N}(x | \mu_i, \lambda_i), \quad \text{where} \quad p(x|z) = \prod_{i=1}^{K} \mathcal{N}(x | \mu_i, \lambda_i)^{z_i}$$
• unknown parameters: mixing coefficients ($\pi_i$), means ($\mu_i$), covariances ($\lambda_i$)
• using Bayes’ theorem, we can find posterior probabilities (responsibilities)
$\Pr(z_i | x)$ that the $i$-th Gaussian component has in explaining observed data
Algorithm [Expectation Maximization (EM)]
1. E-step: evaluate Pr(zi|x) given current parameters
2. M-step: re-estimate parameters using current Pr(zi|x)

Community Structure
Overlapping communities
• nodes may belong to more than one community (i.e. subsets Vi not disjoint)
Clique percolation method [Palla 2005]
• a K-clique is a set of K fully connected nodes
• K-cliques are adjacent if they share K − 1 nodes
• K-clique community is a set of nodes
connected through adjacent K-cliques
Algorithm
1. identify maximal cliques in the
network (complex problem, but
fortunately many real-world
networks are relatively sparse)
2. consider cliques as single
nodes; interconnect cliques if
they share at least K − 1 nodes
3. identify connected components
in graph created in step 2
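The clique percolation idea can be illustrated on a very small graph. Note that this sketch departs from step 1 above: instead of identifying maximal cliques, it enumerates all K-cliques by brute force over node combinations, which is only feasible for tiny examples. The function name and the example graph are hypothetical.

```python
from itertools import combinations


def k_clique_communities(edges, k):
    """Unions of K-cliques that are chained together by sharing K-1 nodes."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Brute force: every k-subset of nodes whose pairs are all connected.
    cliques = [frozenset(c) for c in combinations(sorted(adj), k)
               if all(b in adj[a] for a, b in combinations(c, 2))]
    # Connected components of the "clique graph" (adjacent = share k-1 nodes).
    unvisited = set(range(len(cliques)))
    communities = []
    while unvisited:
        stack = [unvisited.pop()]
        members = set()
        while stack:
            i = stack.pop()
            members |= cliques[i]
            linked = [j for j in unvisited if len(cliques[i] & cliques[j]) >= k - 1]
            for j in linked:
                unvisited.remove(j)
                stack.append(j)
        communities.append(members)
    return sorted(communities, key=sorted)


# Hypothetical example: triangles 1-2-3 and 2-3-4 share an edge; triangle 4-5-6
# touches them only at node 4.
edges = [(1, 2), (2, 3), (1, 3), (2, 4), (3, 4), (4, 5), (5, 6), (4, 6)]
print(k_clique_communities(edges, 3))
```

Node 4 ends up in both communities, which is the overlapping membership that disjoint partitioning methods cannot express.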

Community Structure
Spectral clustering
• K-means, Gaussian mixtures,
hierarchical method are good for
compact clusters
• spectral clustering transforms
the data into a new basis where
standard algorithms work well
Algorithm
1. Construct similarity matrix,
$[S]_{ij} = \exp\left( -\| x_i - x_j \|^2 / 2\sigma^2 \right)$
2. Construct Laplacian $L = D - S$
where $D$ is the diagonal matrix of
weights, $[D]_{ii} = \sum_j [S]_{ij}$
3. Construct matrix $U$ of $k$
eigenvectors corresponding
to the $k$ smallest eigenvalues of $L$
4. Perform clustering on the
transformed data $x' = U^T x$

Community Structure
Real-time clustering
• (dynamic) re-clustering for every new data arrival is expensive
• (dynamically) varying the number of clusters is confusing
Hierarchical Agglomerative Clustering [HAC, 2004]
1. Initialization: hierarchical clustering (e.g. using dendrogram)
2. new data are either assigned to one of the existing clusters, OR
3. new data form a new cluster, and two existing clusters are merged

Community Structure
Community analysis
• distribution of community sizes
• intra-community edge densities
• number of intra- and inter-community links
• average number of communities per node
• . . .
Community network
• communities → nodes
• edges weighted by number of links
between communities

Take-Home Messages
Network structure analysis
• structure of un-weighted static networks, i.e. knowing only their topology
• subgraphs, graphlets, fragments and motifs are building blocks of large
networks; the statistics of their occurrence is useful to compare network
topology beyond their degree distribution
• network partitioning or clustering to identify (overlapping) communities
Measures of network structure
• centrality (degree, betweenness, closeness, eigenvector, Katz, PageRank,
. . .)
• clustering coefficient, Rich-Club coefficient
• modifications of measures for weighted networks

Statistical Modeling
Objectives
• account for uncertainty in models or parameters, measurement noise etc.
• make (short-to-medium term) predictions from the models
• generate artificial data for verifying models and predictions
• decide how much randomness influences network properties; here we compare
structural and functional properties of random and real-world networks
Milgram’s experiment [1967]
• famous “six degrees of separation”
• 300 randomly chosen people asked to send a letter
to a person in Boston
• repeated in 2003: 18 targets, 60k
senders, communication via email
• new findings in 2003: median of 5 to
7 steps, network structure is not
everything, high impact of incentives
• Facebook: 92% of users only 5 hops
away, 99% at 6 hops away

Random Network Models
Erdős-Rényi (ER) random graph [1959]
• graph $G_{ER}(n, p)$ with n vertices and edges chosen independently with
probability p, “a zero-order approximation” of real-world networks
• thus, vertex degree is random (for large n, binomial distribution approximated
by Poisson distribution)
$\Pr(k) = \binom{n-1}{k} p^k (1-p)^{n-1-k} \approx e^{-\bar{k}} \frac{\bar{k}^k}{k!}, \quad \bar{k} = E[k] = (n-1)p$
• most vertices have an average number of links to other nodes (i.e. degree close to ¯k)
• diameter (d) and average distance (¯l) between two vertices are relatively small
compared to the size of the graph
$d = \frac{\ln(n)}{\ln(p(n-1))} = \frac{\ln(n)}{\ln(\bar{k})} \approx \bar{l}$
• average number of edges $E[|E|] = p \binom{n}{2} = \frac{n\bar{k}}{2}$; the latter, since $\bar{k} = \frac{2E[|E|]}{n}$
Connectivity of ER random graph
• average degree ¯k:
  ¯k < 1: graph disconnected
  ¯k > 1: a giant component appears
  ¯k ≥ ln(n): graph (almost) completely connected
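The connectivity regimes can be checked numerically. The sketch below is a minimal illustration, not an optimized generator: the function names, n = 1000 and the seeds are arbitrary assumptions. It draws every possible edge independently with probability p and measures the largest connected component by breadth-first search.

```python
import random
from collections import deque

def er_graph(n, p, seed=None):
    """G_ER(n, p): each of the n(n-1)/2 possible edges is present with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def largest_component(adj):
    """Size of the largest connected component (breadth-first search)."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            v = queue.popleft()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best

n = 1000
for mean_degree in (0.5, 3.0):          # below and above the threshold k = 1
    g = er_graph(n, mean_degree / (n - 1), seed=1)
    print(mean_degree, largest_component(g))
```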

Average path length and diameter of ER (random) graph
• let l(i, j) be the length of the shortest path between vertices $v_i$ and $v_j$
• the shortest paths can be combined into a single metric as
$\bar{l} = \frac{1}{\binom{n}{2}} \sum_{i,j:\, i \neq j} \frac{l(i,j)}{2}$   (average shortest path)
$d = \max_{i,j} l(i,j)$   (maximum shortest path, i.e. diameter)
• if n is large, the average path length $\bar{l} \propto \ln n$ is relatively small and
grows slowly with network size (this is typical for many large networks);
for comparison, 1D lattice (chain): $\bar{l} \propto n$, 2D lattice: $\bar{l} \propto n^{1/2}$

Clustering coefficient of ER graph
• ratio of neighbors being friends to all possible friendships among neighbors
• probability that two neighbors are connected is p, so clustering coefficient
$C_{ER} = p \approx \frac{\bar{k}}{n}$
which is much smaller than for real-world networks with the same density
• for large networks $\lim_{n\to\infty} C_{ER} = 0$, so large random networks resemble a
tree (i.e. they have no clustering)
Components in ER graphs
• if p (and thus, also ¯k) is small, there
are several disjoint components
• if p is increased, there is one giant
component (of size ≈ n) with the
rest of nodes being in isolated small
components
• the giant component appears when
$p \approx 1/n$, e.g. for $n = 10^3$
(see figure)

Percolation transition (when increasing p)
1. Subcritical: ¯k < 1, many small simple components of size at most ln n
2. Critical: ¯k ≈ 1, size of largest component is $\sim n^{2/3}$, the giant component
appears and starts growing
3. Supercritical: ¯k > 1, there is one giant component of size almost n, the
second largest component has size about ln n
Summary of ER graph
• degree distribution is Poisson (most nodes have degree close to average)
with no correlations of node degrees
• average path length is small and ∝ ln n
• connectivity depends on ¯k with percolation transition

Random geometric models [Penrose, 2003]
• main motivation: some networks can grow subject to geometric constraints
• e.g., place n nodes randomly in (2D) space; two nodes i and j are connected
only if their distance $\|x_i - x_j\| \leq r$
• there exists a critical radius $r_c$ to form a connected giant component (if $r > r_c$):
$r_c = \sqrt{\frac{\ln n + O(1)}{\pi n}}$
Random distance models [Avin, 2008]
• n nodes placed randomly in (2D) space
• links created randomly with the probability $\propto f(d_{ij})$ where $d_{ij} = \|x_i - x_j\|$

Small World Networks
More on Milgram’s experiment
• how accurate is the claim of 6 degrees of separation, and how likely is the chain to be completed?
Findings from real-world social networks
• a sub-optimal choice of the next link in the chain is made about 1/2 of the time
• Facebook measurements: average distance is 4.74
• Twitter measurements: average distance is 4.67 (50% are at 4 steps, nearly
everyone in 5 steps)

Main features
high clustering: $C_{\text{real-world}} \gg C_{\text{random}}$
average path length: $\bar{l}_{\text{real-world}} \approx \bar{l}_{\text{random}}$
Watts-Strogatz (WS) small world network model [Nature, 1998]
• sparked the interest in complex networks (over 3.5k citations)
• single control parameter to generate regular to purely random networks
• the model: 1. generate regular graph, 2. rewire links with probability p
• in all network generators: self-loops and duplicated links not allowed

WS original model
• select a fraction p of edges and rewire one of their end-points
WS model variation
• add a fraction p of edges to the initial regular lattice

Properties of WS model:
Degree distribution
$\Pr(k) = \sum_{i=0}^{\min(k-K,K)} \binom{K}{i} (1-p)^i p^{K-i} \frac{(pK)^{k-K-i}}{(k-K-i)!} e^{-pK}$
∼ Poisson-like distribution
Clustering coefficient
• if node i has K neighbors,
$C = \frac{\#\text{edges among } K \text{ neighbors}}{\binom{K}{2}}$
• probability that a connected triple is still
connected after rewiring is $(1-p)^3$
• $C(p=0) = \frac{3k-3}{4k-2} \approx \frac{3}{4}$ (for large k), $C(p=1) \approx \frac{2k}{n}$
• $C(p) \approx C(p=0) \cdot (1-p)^3$, i.e.,
$C(p)/C(0) \approx (1-p)^3$
Average path length: $\bar{l} \approx \frac{(n-1)(n+2k-1)}{4kn}$
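The clustering values at p = 0 can be verified on a small ring lattice. The sketch below assumes that k counts the neighbors on each side of a node (so every node has degree 2k), which is the reading under which C(0) = (3k − 3)/(4k − 2) holds; the function names and the rewiring details (keep one end-point, pick a new uniformly random end-point, no self-loops or duplicates) are illustrative assumptions.

```python
import random

def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k nearest neighbors on both sides."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k + 1):
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj

def rewire(adj, k, p, seed=None):
    """Rewire the far end-point of each lattice edge with probability p."""
    rng = random.Random(seed)
    n = len(adj)
    adj = {v: set(nbrs) for v, nbrs in adj.items()}   # work on a copy
    for step in range(1, k + 1):
        for v in range(n):
            w = (v + step) % n
            if w in adj[v] and rng.random() < p:
                candidates = [u for u in range(n) if u != v and u not in adj[v]]
                if candidates:
                    u = rng.choice(candidates)
                    adj[v].discard(w)
                    adj[w].discard(v)
                    adj[v].add(u)
                    adj[u].add(v)
    return adj

def average_clustering(adj):
    """Mean over nodes of (#edges among neighbors) / (#possible neighbor pairs)."""
    total = 0.0
    for nbrs in adj.values():
        d = len(nbrs)
        if d < 2:
            continue
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += links / (d * (d - 1) / 2)
    return total / len(adj)

k = 3
print(average_clustering(ring_lattice(20, k)), (3 * k - 3) / (4 * k - 2))
```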

Kleinberg’s geographical small world model [Nature, 2000]
• connectivity derived from geographical distances
• the model: 1. link nearest neighbors 2. add links with the probability
$\Pr(\text{link between } u \text{ and } v) \sim \text{const} \times \|u - v\|^{-r}$
where r is the navigability exponent (e.g., links are purely random for r = 0)
Hierarchical small world model [Science, 2001]
• hierarchically nested groups, link probability $p_{ij} \sim \exp(-\alpha x_{ij})$
Other strategies for generative models of small world networks
• add/rewire links based on chosen properties of current links and edges
• add/rewire links to optimize particular property of the network

Topology trade-offs
(a) commuter rail network
(b) star network
(c) minimum spanning tree network

Bridges
• in social networks, close friends know what you know, and they also know
others who know what you know
• bridge between A and B (if removed, these two nodes become disconnected)
• local bridge between A and B (if removed, distance A-B increased to > 2)

Strong and weak ties [Granovetter, 1974]
• in social networks, links are strong or weak ties (friends vs. acquaintances)
• strong triadic closure: if A-B and A-C are strong ties, then at least weak tie
between B-C exists
• if there are enough strong ties in network, local bridges must be weak ties
Almost local bridges
• neighbor overlap of nodes A and B:
$\frac{\#\text{neighbors of both A and B}}{\#\text{neighbors of at least A or B}} = \frac{N(i,j)}{(k(i)-1)+(k(j)-1)-N(i,j)}$
• almost local bridges are links
whose end-nodes have no common
neighbors (i.e., the overlap of their
neighbors is 0)
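The overlap formula is easy to evaluate directly from adjacency sets. In this minimal sketch the function name and the two toy graphs are hypothetical, and the two nodes are assumed to be linked to each other (hence the "−1" terms, which exclude the end-nodes themselves).

```python
def neighbor_overlap(adj, i, j):
    """Overlap N(i,j) / ((k(i)-1) + (k(j)-1) - N(i,j)) for linked nodes i and j."""
    common = len(adj[i] & adj[j])                               # N(i, j)
    either = (len(adj[i]) - 1) + (len(adj[j]) - 1) - common
    return common / either if either else 0.0

# Hypothetical examples: a triangle, and a path a-b-c-d whose middle link is a bridge.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(neighbor_overlap(triangle, 0, 1))    # link inside a dense neighborhood
print(neighbor_overlap(path, "b", "c"))    # overlap 0: an (almost) local bridge
```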

Removing ties from social networks (percolation analysis)
• removing weak ties breaks down the network
• removing strong ties degrades the network more smoothly
• however, this is specific only to social networks
Removing ties from other networks
• e.g. removing important road (strong tie) is more damaging
• central veins are more important than peripheral veins

Illustration of weak vs. strong ties removal
(a) original network
(b) 80% of strongest links removed, 20% of weak ties remain
(c) 80% of weakest links removed, 20% of strong ties remain
• no evidence of degradation for (b), network clearly fragmented in case (c)
• strong links are within dense neighborhoods (triangles, cliques etc.)
• weak links (and bridges) interconnect these dense neighborhoods

Scale Free Networks
Power-law distribution
$\Pr(k) \sim \text{const} \times k^{-\gamma}$, typically $2 < \gamma < 3$
• i.e., a straight line in log-log domain:
log Pr(k) ∼ −γ log k + log const
• always a few highly connected hubs

Scale Free Networks
Power-law distribution
• some other distributions look like power-laws
• estimating γ may not be so easy
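• 1st and 2nd moments:
$E[k] \propto \int_{k_0}^{\infty} k \cdot k^{-\gamma}\, dk = \lim_{k\to\infty} \frac{1}{2-\gamma} \left[ \frac{1}{k^{\gamma-2}} - \frac{1}{k_0^{\gamma-2}} \right] \begin{cases} = \infty & \text{if } \gamma \leq 2 \\ < \infty & \text{if } \gamma > 2 \end{cases}$
$E[k^2] \propto \int_{k_0}^{\infty} k^2 \cdot k^{-\gamma}\, dk \begin{cases} = \infty & \text{if } \gamma \leq 3 \\ < \infty & \text{if } \gamma > 3 \end{cases}$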
Preferential attachment
• now, the concern is how to generate scale-free networks
• we use the rich-get-richer effect and add new nodes sequentially:
1. with probability p, choose any existing node and link to it
2. with probability 1 − p, link to an existing node with probability proportional to
its current degree

Scale Free Networks
Barabási-Albert (BA) scale-free model
1. Growth: start from seed network of $m_0$ isolated nodes
2. Preferential attachment: add a new node with $m \leq m_0$ edges to existing
nodes that are chosen with the probability $\Pi(i) = k(i) / \sum_j k(j)$
3. after t steps, the network has $n = m_0 + t$ nodes and $mt$ edges, and
$\Pi(i) = \frac{k(i)}{\sum_j k(j)} = \frac{k(i)}{2mt - m} \approx \frac{k(i)}{2mt}$
• this procedure generates the degree distribution
$\Pr(k) = \frac{2m(m+1)}{k(k+1)(k+2)} \approx \frac{2m(m+1)}{k^3} \propto k^{-3}$
so γ = 3 and average degree ¯k = 2m
• average shortest path length: $\bar{l} \propto \frac{\ln n}{\ln(\ln n)}$
• clustering coefficient: $C \propto \frac{(\ln n)^2}{n}$
( . . . too small for real-world networks)
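A minimal sketch of the growth and preferential attachment steps is given below. The function name and parameter values are illustrative. One detail is an assumption not fixed by the description above: since the seed nodes are isolated (all degrees are zero), the very first node picks its targets uniformly at random.

```python
import random

def barabasi_albert(m0, m, t, seed=None):
    """Grow a network for t steps; each new node brings m edges attached
    to distinct existing nodes with probability proportional to degree."""
    if not 1 <= m <= m0:
        raise ValueError("need 1 <= m <= m0")
    rng = random.Random(seed)
    n_nodes = m0
    stubs = []      # node i appears k(i) times, so rng.choice(stubs) ~ k(i)/sum k
    edges = []
    for _ in range(t):
        targets = set()
        while len(targets) < m:
            if stubs:
                targets.add(rng.choice(stubs))
            else:                       # assumption: uniform choice while all degrees are 0
                targets.add(rng.randrange(n_nodes))
        new = n_nodes
        for v in targets:
            edges.append((new, v))
            stubs.extend((new, v))
        n_nodes += 1
    return n_nodes, edges

n, edges = barabasi_albert(m0=3, m=2, t=1000, seed=0)
print(n, len(edges), 2 * len(edges) / n)    # n = m0 + t, mt edges, mean degree close to 2m
```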

Scale Free Networks
Question
• In scale-free networks, how much is “popularity” predictable?
Answer
• if we restart the process, different popular nodes will emerge
Other scale-free network models
• motivation: improve clustering coefficient
and allow the exponent γ to be changed
[Holme, Kim]
• after preferential attachment step,
with probability p, add one more
edge to randomly selected neighbor
• resulting clustering coefficient $C \propto 1/k$
( . . . much more realistic)
[Vazquez et al.]
• random walk instead of preferential attachment (“get to know important
people through people you already know”)
[Kleinberg, Kumar]
• copy a vertex and rewire its edges with certain probability

Scale Free Networks
Network generator with given γ
1. Initialize: seed network with $m_0$ (isolated) nodes
2. Add one node and m links (not necessarily stemming from the new node) at
each time; after t time-steps, the link is added to node i with the probability
$\Pi(i) = \alpha \frac{k(i)}{\sum_j k(j)} + (1-\alpha) \frac{1}{t + m_0}$
where $\sum_j k(j) = 2mt$
3. thus, α = 1 leads to preferential attachment, and α = 0 is for uniform
attachment, and the degree distribution
$\Pr(k) \propto k^{-(1+1/\alpha)}$
Models with non-power-law distribution
• power-law distribution is a good fit for large networks (averaging effect)
• on smaller scales, in more “specialized” sub-networks, power-law may not
be such a good fit
• log-normal distribution has been observed in such cases

Scale Free Networks
Configuration network model
• degrees are pre-assigned to n nodes assuming a degree distribution Pr(k)
• edges are added by randomly selecting pairs of these n nodes
• a family of graphs generated this way will have the same degree distribution
• excess degree is the number of possible outward links of a node that
has been reached during a walk (i.e., one less than the node degree in
undirected graphs)
$\Pr(k_{\text{excess}} = k) = \frac{(k+1)\Pr(k+1)}{\sum_{k'} k' \Pr(k')}$
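As a small worked example of the excess degree formula, the sketch below converts a degree distribution given as a dictionary {k: Pr(k)} into the excess degree distribution. The function name and the toy distribution are hypothetical.

```python
def excess_degree_distribution(pk):
    """Map {k: Pr(k)} to {k: Pr(k_excess = k)} = (k+1) Pr(k+1) / sum_k k Pr(k)."""
    mean = sum(k * p for k, p in pk.items())
    return {k - 1: k * p / mean for k, p in pk.items() if k >= 1}

# Hypothetical network: half of the nodes have degree 1, half have degree 3.
q = excess_degree_distribution({1: 0.5, 3: 0.5})
print(q)    # a walk reaches the degree-3 nodes three times as often
```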
Other models
• many other stochastic models of networks can be devised and then analyzed
• hence, it is important to define quality of such models, e.g. (generally):
– flexibility (design for specific parameter settings)
– mathematical tractability
– accuracy (to fit experimental data, make predictions)

Take-Home Messages
Random networks
• Erdős & Rényi studied a simple model in 1959
• it has Poisson degree distribution with small average path length, but
clustering goes to zero with network size
Small world networks
• the world is small, 6 degree separation (Milgram’s experiment)
• short average path length, but clustering still smaller than in real networks
• real-world networks contain weak and strong ties
• Watts & Strogatz proposed simple model of small world networks (in 1998)
Scale free networks
• main focus is to produce power-law distribution
• Barabási & Albert proposed model based on preferential attachment (in
1999); many modifications of this model can be (and were) devised
Network models
• mostly stochastic; the main motivation is to emulate real-world networks
• find structural properties to explain specific (global) properties of networks
• useful to define quality of these models

Robustness
Percolation
• monitor network metrics while nodes or edges are being removed
Dual problem
• monitor network metrics while nodes or edges are being added
1. What strategy to remove/add nodes or edges?
• no knowledge: nodes and edges removed (uniformly) at random
• knowledge of structure: removing nodes and edges with high centrality
• adding nodes and edges: cf. random network generators
2. Which metrics most relevant and should be monitored?
• rate of decay/growth of: network diameter, average degree, average
distance, size of giant component etc.
3. Which (class of) networks to consider?
• any network, networks with specific degree distribution etc.
4. Why consider robustness?
• in general, network resilience to attacks is a growing concern
• want to design networks that are robust to damage

Robustness
Pragmatic definition
• the network is robust if it can withstand accidental damage, random topology
changes as well as intentional attacks and remain operational
• this requires the remaining nodes and links to be able to carry flows and
perform other tasks without excessive congestion, dead-locks etc.
• observing average decay (e.g. size of giant component) may not be that
useful (e.g. it cannot identify local congestion further impairing the network)
• note also that we are still considering only networks with static topology
Example
• 50 nodes, removing 40 out of 116 edges decreases ¯k from 4.6 to 3.0

Robustness as Stability
Global stability
• system is stable if it returns to equilibrium after any perturbation
Resistance
• ability of a community to resist change in face of potentially perturbing force
Resilience
• ability of a community to recover to normal functioning after disturbance
Variability
• variations in community density over time (measured e.g. as changes in
mean/variance) due to external disturbances

Robustness
Percolation threshold
• if ¯k decreases by removing
edges, network suddenly becomes
disconnected
• if ¯k increases by adding edges, giant
component suddenly emerges
Examples
p is the probability of filling a square; at the critical value of p, a giant connected component appears

Robustness
Experiment [Barabási et al., 2000]
strategy: random failures versus targeted attacks removing nodes
metrics: average or maximum (network diameter) shortest path
networks: exponential versus scale-free (the same |V| and |E|)

Robustness
Experiment (cont.)
effect on the relative size S of the
giant component and on the average
size ⟨s⟩ of the remaining clusters

Robustness
Experiment (cont.)
(Internet and WWW)
effect on the relative size S of the
giant component and on the average
size ⟨s⟩ of the remaining clusters

Robustness of Scale-Free Networks
Random failures vs targeted attack
(a) original network of 574 nodes
(b) removing 20% (115) of nodes randomly
leaves 427 nodes in giant component
(c) removing only 2.8% (22) most connected hubs
leaves 301 nodes in giant component
Bottom line
• scale-free networks are robust against random failures
• they are very vulnerable against targeted attacks

Robustness of Scale-Free Networks
Impact of power-law exponent on robustness ($\sim k^{-\gamma}$)
• γ = 2.5: graceful degradation
• γ = 3.5: giant component disappears
at about f = 40%
• assume e.g. case of γ = 2.7 (square
markers)
• $k_{\max}$ is maximum degree among
remaining nodes
• removing only 1% of nodes discards
giant component (top figure)
• $k_{\max}$ has to be very small to destroy
giant component (bottom figure)

Robustness
Percolation threshold for random failures
• in general, minimum fraction of nodes required (i.e., that cannot be randomly
removed) for giant component to exist
$f_c = \frac{E[k]}{E[k^2] - E[k]}$
• specialized for random networks:
$f_c = \frac{1}{E[k]}$
thus, if $\bar{k} = E[k]$ is large, random network can withstand large losses; e.g. if
$\bar{k} = 4$, then 1/4 of nodes is enough for giant component to exist (i.e., 3/4 of
nodes have to be removed to destroy giant component)
• specialized for scale-free networks:
$f_c \to 0$
as $E[k^2]$ tends to be very large (even infinite), which makes these networks
very robust against random failures (but not against targeted attacks on hubs)
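The step from the general threshold to the random-network case follows from the Poisson moments, $E[k^2] = \bar{k}^2 + \bar{k}$, which gives $f_c = \bar{k} / \bar{k}^2 = 1/\bar{k}$. The sketch below checks this numerically; the function names and the truncation of the Poisson distribution at k = 100 are assumptions made for illustration.

```python
import math

def critical_fraction(pk):
    """f_c = E[k] / (E[k^2] - E[k]) for a degree distribution {k: Pr(k)}."""
    k1 = sum(k * p for k, p in pk.items())
    k2 = sum(k * k * p for k, p in pk.items())
    return k1 / (k2 - k1)

def poisson_pmf(mean, kmax=100):
    """Poisson degree distribution, truncated at kmax (tail is negligible here)."""
    return {k: math.exp(-mean) * mean ** k / math.factorial(k) for k in range(kmax + 1)}

print(critical_fraction(poisson_pmf(4.0)))    # close to 1/4, as in the example above
```

For a heavy-tailed degree distribution the second moment dominates the denominator, which is why the threshold tends to zero.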

Take-Home Messages
Scale-free networks
• very robust against random failures (some suggest that this is the reason
why these networks are found so often in real world)
• but very vulnerable against attacks to highly connected hubs
• since hubs are also responsible (and effective) for spreading messages,
diseases etc. through the network
• in Social Networks, it is not hubs but rather weak ties and bridges that make
these networks vulnerable
Small world networks
• have not been considered here
• one extreme is a ring with one hop neighboring connections without any
shortcut links; such network is not robust at all
• another extreme is a fully connected network which is unbreakable
• small world networks are in-between these two extremes; their robustness
is likely derived from the density of shortcut links
Making networks more robust
• obvious strategy is to guarantee some minimum degree for every node (i.e.,
to achieve connections redundancy)

Epidemic Spreading
Network processes
• strongly influenced by network structure; e.g. shortcuts significantly speed
up spreading (of information, diseases) and synchronization of processes
• hence, understanding of such network(ed) (distributed) processes requires
understanding of the underlying network structures
• e.g. neurons integrate signals from neighbors, if above threshold, the
excitation fires and then fades away; this leads to oscillating cascades
• here, we consider diffusion of diseases characterized by contagion (lack
of choice), unlike information spreading where nodes make decisions to
maximize their pay-offs

Epidemic Spreading
Simple spreading models
• ring topology with shortcuts
• all nodes susceptible
• nodes infected with probability p
• spreading of disease, computer
viruses, . . .
• tree topology of spreading in waves
of k nodes, all nodes susceptible
• nodes infected with probability p
a) p is large, disease spreads out
b) p is small, disease dies out
Reproductive number: $R_0 = kp$
a) if $R_0 < 1$, disease dies out in finite
number of waves
b) if $R_0 > 1$, disease very likely infects
at least 1 person in each wave

Epidemic Spreading
Limitations of simple models
• small changes in k and p can move R0 above or below threshold (R0 ≷ 1)
• network topology not realistic (e.g. no triangles)
• nodes get infected only once and never recover
More realistic: SI model
• two classes of nodes: S (susceptible) and I (infected)
• once infected, the node cannot recover
|V| = |S| + |I|            total number of nodes (V = S ∪ I)
β = λ¯k                    infection rate per node (0 ≤ λ ≤ 1)
β|S|/|V|                   susceptible contacts per unit of time
d|I|/dt = β|S||I|/|V|      overall rate of infection
• let i = |I|/|V| be the fraction of nodes infected, then $\frac{di}{dt} = \beta i (1 - i)$, which yields a
logistic curve:
$i(t) = \frac{i(0)\, e^{\beta t}}{1 - (1 - e^{\beta t})\, i(0)}$
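The logistic solution can be checked against a direct numerical integration of di/dt = β i (1 − i). The sketch below uses a simple forward Euler scheme; the function names, step size and parameter values are illustrative assumptions.

```python
import math

def si_closed_form(i0, beta, t):
    """Logistic solution i(t) = i0 e^{bt} / (1 - (1 - e^{bt}) i0)."""
    e = math.exp(beta * t)
    return i0 * e / (1.0 - (1.0 - e) * i0)

def si_euler(i0, beta, t_end, dt=1e-3):
    """Forward Euler integration of di/dt = beta * i * (1 - i)."""
    i = i0
    for _ in range(int(round(t_end / dt))):
        i += dt * beta * i * (1.0 - i)
    return i

print(si_closed_form(0.01, 1.0, 5.0), si_euler(0.01, 1.0, 5.0))
```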

Epidemic Spreading
More realistic: SIR model
• improve SI model by assuming infected nodes recover at rate υ, i.e., nodes
stay infected only for (average) time τ = 1/υ
• recovered node will become resistant (i.e. cannot be infected again)
• define fractions s = |S|/|V|, i = |I|/|V|, r = |R|/|V|, so s + i + r = 1; rate of
change of these fractions over time:
$\frac{ds}{dt} = -\beta s i, \quad \frac{di}{dt} = \beta s i - \upsilon i, \quad \frac{dr}{dt} = \upsilon i$
the solution again requires initial conditions s(0), i(0) and r(0)
Possible outcomes
a) disease may die out
b) disease may spread to
whole network
c) disease becomes endemic
(does not spread, nor die
out)
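The SIR equations have no simple closed-form solution, so they are usually integrated numerically. Below is a minimal forward Euler sketch; the function name, step size and the two parameter sets are hypothetical and chosen only to show one case where the infection spreads and one where it dies out.

```python
def sir_euler(s0, i0, r0, beta, upsilon, t_end, dt=0.01):
    """Forward Euler integration of ds/dt = -b s i, di/dt = b s i - u i, dr/dt = u i."""
    s, i, r = s0, i0, r0
    for _ in range(int(round(t_end / dt))):
        new_infections = beta * s * i * dt
        recoveries = upsilon * i * dt
        s -= new_infections
        i += new_infections - recoveries
        r += recoveries
    return s, i, r

# Infection rate above and below the recovery rate (illustrative values).
print(sir_euler(0.99, 0.01, 0.0, beta=0.5, upsilon=0.1, t_end=200.0))
print(sir_euler(0.99, 0.01, 0.0, beta=0.05, upsilon=0.1, t_end=200.0))
```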

Epidemic Spreading
More realistic: SIS model
• no (permanent) recovery, but infected node may again become susceptible
• infected to susceptible rate υ:
a) if β > υ, logistic growth (as in SI model), but never infects whole population
b) if β → υ, then i → 0 (infection will slowly die out)
c) if β < υ, then infection dies out exponentially
• mathematical model (assumes r = 0):
$\frac{ds}{dt} = \upsilon i - \beta s i, \quad \frac{di}{dt} = \beta s i - \upsilon i, \quad s + i = 1$
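Substituting s = 1 − i gives di/dt = β i (1 − i) − υ i, whose non-zero equilibrium is i = 1 − υ/β (for β > υ); this is why the infection never reaches the whole population. The forward Euler sketch below illustrates both regimes; the function name and the parameter values are assumptions.

```python
def sis_euler(i0, beta, upsilon, t_end, dt=0.01):
    """Forward Euler integration of di/dt = beta * i * (1 - i) - upsilon * i."""
    i = i0
    for _ in range(int(round(t_end / dt))):
        i += dt * (beta * i * (1.0 - i) - upsilon * i)
    return i

print(sis_euler(0.01, beta=0.5, upsilon=0.1, t_end=200.0))    # approaches 1 - 0.1/0.5
print(sis_euler(0.01, beta=0.05, upsilon=0.1, t_end=200.0))   # dies out
```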

Epidemic Spreading
Prognosis of epidemic
• reproductive number: R0 = β/υ
a) if R0 > 1, infection survives
b) if R0 < 1, infection dies out
• in SI model, υ → 0, so R0 ≫ 1
(a) SI model
(b) SIR model
(c) SIS model
Extensions of SIR model
• rather than assuming recovery after τ time units, let recovery be possible at
each time with some (fixed) probability
• infected state further subdivided (e.g. early, middle and final disease stages)
• non-homogeneous mixing: restrictions how the nodes meet (e.g. travel to
geographical locations, quarantining, . . . )
• other random network models (note that Erdos-Renyi model with
homogeneous mixing was implicitly assumed in SI, SIR and SIS models)

Epidemic Spreading
SIS model in scale-free networks
• experimentally observed that
computer viruses survive
significantly longer than
predicted from SIS model
over random networks
• it was found that there is no
epidemic threshold in scale-free
networks, so infections proliferate
independently of spreading rate
• however, there is critical fraction
of shortcuts in scale-free
networks; if enough shortcuts,
disease suddenly becomes
epidemic
• critical fraction of shortcuts is a
function of rates β and υ

Epidemic Spreading
Network immunization
• random networks: uniformly random immunization is helpful
• scale-free networks: targeted degree-based immunization required as
random immunization does not help
• targeted local immunization: immunize one immediate neighbor for every
node in a randomly selected group (i.e. nodes with higher degree are more
likely to be immunized)
red-circles: random immunization of scale-free network
red-squares: targeted immunization of scale-free network
black-squares: random & targeted immunization of random network

Take-Home Messages
Epidemic spreading
• practical modeling requires extracting model parameters from real data
• knowledge of node mobility is key to accurate modeling of spreading
• epidemic spreading is strongly influenced by information diffusion (i.e. knowing
what is happening and what to do)
• predictive modeling (ideally in real time if the epidemic spreading is ongoing)
is routinely used in practice as prevention
SARS prediction and comparison with real outbreak data

Network Dynamics
Spatial-temporal scales
(a) short: link activation and deactivation
– topology is a snapshot
– connected components must respect time sequences of links
(b) longer: topology change from one structure to another
– communities formation, merging, splitting
– large communities persist in time if there is exchange of their members
– small communities persist if their core is highly connected with strong ties
(c) long: network evolution (birth, growth and decline)
– in scale-free networks, in spite of changes (nodes and links appear and
disappear), degree, weight and strength distributions remain stationary

Information Cascades
Aims
• understand how behaviors, ideas, technology usage etc. are adopted,
influenced and spread through networks
Diffusion model
• two nodes v and w
• two behaviors A and B
• two pay-offs a > 0 and b > 0
(i.e., the larger, the better)
Network implications
• let 0 ≤ p ≤ 1, and there
are d neighbors of v
• pd neighbors of v
choose A
• (1 − p)d neighbors of v
choose B
• A is the better strategy if:
$pd \cdot a \geq (1-p)d \cdot b \;\Rightarrow\; p \geq \frac{b}{a+b}$

Information Cascades
Example diffusion in a network
• let a = 3, b = 2, so $\frac{b}{a+b} = 2/5$
a+b = 2/5
• A: dark circles, B: light circles
(b) only v and w adopt A
(c) nodes r and t switch to A (2/3 of their neighbors chose A); u does not switch (only
1/3 of its neighbors chose A); note also that: 1/3 < 2/5 < 2/3
(d) also nodes s and u switch to A
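The threshold rule can be simulated step by step. The sketch below is illustrative only: the function name is hypothetical, and since the network of the example figure is not reproduced in the text, a simple four-node path is used instead. A node switches to A once the fraction of its neighbors using A reaches b/(a + b), and adopters never switch back.

```python
def run_cascade(adj, initial_adopters, a, b):
    """Repeat the threshold rule p >= b/(a+b) until no more nodes switch to A."""
    threshold = b / (a + b)
    adopters = set(initial_adopters)
    while True:
        switching = {
            v for v, nbrs in adj.items()
            if v not in adopters and nbrs
            and sum(1 for w in nbrs if w in adopters) / len(nbrs) >= threshold
        }
        if not switching:
            return adopters
        adopters |= switching

# Hypothetical path network 0 - 1 - 2 - 3 with a single initial adopter.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(run_cascade(path, {0}, a=3, b=2))    # threshold 2/5: complete cascade
print(run_cascade(path, {0}, a=1, b=2))    # threshold 2/3: the cascade stops at once
```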

Take-Home Messages
Cascades
• initial adoption by few nodes may generate complete cascade
• it is dependent on network structure
• it is also crucially dependent on threshold b/(a+b), so changing pay-offs can
make big difference (e.g. making the product more attractive)
• OR, directly influence key nodes (initial adopters)
• densely inter-connected clusters are difficult to penetrate
• key parameters: clusters connection density and pay-off threshold
Role of weak ties
• very useful in spreading information
• poor in transferring behaviors that are risky and/or costly
Influencing nodes
• in networks with many clusters, users are more easily influenced
• reinforcement is very important in influencing users
• node centrality is crucial for (information, behavior) diffusion

Max Flow and Min Cut
Scenario
• single source node s and single sink node t (for simplicity)
• directed edges between nodes represent flows (information, material, . . . )
• every edge assigned a weight representing max possible flow ≡ capacity

Dual problems (of combinatorial optimization)
1. find minimum cut of a graph G = (V, E) where V is set of nodes and E are
weighted edges (max flows)
2. find maximum possible total flow from s ∈ V to t ∈ V over E while flows at
every other node are equalized (in-flow = out-flow)
Cut (S, T)
• node partitioning V = S ∪ T such that S ∩ T = ∅ and s ∈ S and t ∈ T
Capacity of cut (S, T)
• sum of weights (capacities) leaving set S and entering set T

Minimum cut problem
• find the cut with the minimum capacity
Maximum flow problem
• assign flows to edges not larger than their capacity, so that total flow from s
to t is maximized and flows in all other nodes ($V \setminus \{s, t\}$) are equalized

Observation 1
• flow from S to T is equal to the total flow reaching sink t

Observation 2
• flow from S to T is at most equal to capacity of the cut
• if flow from S to T is equal to capacity of the cut, then we have maximum
possible flow from S to T and (S, T) is minimum cut

Greedy algorithm
1. select a path from s to t and set its flow to be equal to the minimum capacity
among its edges (≡ bottleneck)
2. for every edge, obtain residual capacity ≡ capacity - flow (“undo” flow sent):
i.e. add edge (w, v) to every edge (v, w) with positive residual capacity
3. augment path with strictly positive residual capacities

Ford-Fulkerson algorithm
• greedy algorithm to find a maximum flow
• find augmenting path with strictly positive residual capacities
• if path can no longer be augmented, the flow is maximum
Max-flow min-cut theorem
• The value of maximum flow is equal to the capacity of the minimum cut.
Complexity of Ford-Fulkerson algorithm
• assume capacities are integers 1, . . . , U
• Theorem 1: the algorithm terminates in at most |V| · U iterations.
• Theorem 2: if all edge capacities are integers, then the maximum flow has
integer values of flows on every edge.

Choosing initial augmenting path
• some choices lead to exponential time algorithm, clever choices lead to
polynomial time algorithm (number of iterations):
1. choose path with fewest edges (shortest path, breadth first search)
2. choose path with maximum bottleneck capacity (fattest path, priority or
depth first search)
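The first choice (shortest augmenting path found by breadth first search) is commonly known as the Edmonds-Karp variant of Ford-Fulkerson. The sketch below is a minimal version of it, assuming integer capacities stored in a dictionary keyed by directed edge; the function name and the small test network are hypothetical.

```python
from collections import deque

def max_flow(capacity, s, t):
    """Maximum s-t flow via shortest augmenting paths in the residual graph."""
    residual, adj = {}, {}
    for (u, v), c in capacity.items():
        residual[(u, v)] = residual.get((u, v), 0) + c
        residual.setdefault((v, u), 0)          # reverse edge allows "undoing" flow
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    flow = 0
    while True:
        parent = {s: None}                      # breadth first search from s
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:                     # no augmenting path: flow is maximum
            return flow
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[(parent[v], v)])
            v = parent[v]
        v = t
        while parent[v] is not None:            # push the bottleneck along the path
            u = parent[v]
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
            v = u
        flow += bottleneck

caps = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 3}
print(max_flow(caps, "s", "t"))
```

In this toy network the cut ({s}, rest) has capacity 3 + 2 = 5, which bounds the flow from above in line with the max-flow min-cut theorem.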
Application: Bipartite matching
• find maximum matching of a bipartite graph G
• solve max-flow problem for extended graph G′
• by integer theorem (see above), there exists a maximum flow with 0/1 values

Take-Home Messages
Applications of max-flow and min-cut theorem
• Network connectivity
• Bipartite matching
• Data mining
• Open-pit mining
• Airline scheduling
• Image processing
• Project selection
• Baseball elimination
• Network reliability
• Security of statistical data
• Distributed computing
• Egalitarian stable matching
• . . .
There are many efficient algorithms for solving max-flow min-cut problem.

Network Routing
Routing algorithms
• find the least cost path between any two nodes in the (telecommunication)
network
• link cost: e.g. inverse of capacity, delay, or more simply, all links have a unit
cost
• path cost: sum of link costs along the path
1. Link state routing algorithms
• assume every node has knowledge about network topology and all link costs
• thus, all nodes have the same (global) knowledge (how?)
• so-called centralized or link state algorithms
2. Distance vector routing algorithms
• only local knowledge of link costs to all neighbors
• iterative computations in collaboration with neighbors
• so-called decentralized or distance vector algorithms

Network Routing
Link state routing: Dijkstra algorithm
• every node computes the least cost path to all other nodes in the network
• the computed paths are stored in so-called forwarding table
• after K iterations, the least cost paths known for K destination nodes
Algorithm:
c(x, y) link cost between neighbors x and y (= ∞ if not neighbors)
D(v) current cost of path from source to destination node v
p(v) predecessor node along path from source to node v
V′ set of nodes whose least cost paths are already known
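Using the notation above, Dijkstra's algorithm can be sketched with a priority queue. The function name and the six-node network are assumptions made for illustration; in particular, the link costs are hypothetical and not taken from the slides' figure.

```python
import heapq

def dijkstra(cost, source):
    """Return (D, p): least path costs from source and predecessor of each node."""
    adj = {}
    for (x, y), c in cost.items():              # undirected links with cost c(x, y)
        adj.setdefault(x, []).append((y, c))
        adj.setdefault(y, []).append((x, c))
    D, p = {source: 0}, {source: None}
    done = set()                                # V': nodes with final least cost
    heap = [(0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if v in done:
            continue
        done.add(v)
        for w, c in adj.get(v, []):
            if w not in done and d + c < D.get(w, float("inf")):
                D[w] = d + c
                p[w] = v
                heapq.heappush(heap, (D[w], w))
    return D, p

# Hypothetical link costs.
cost = {("u", "v"): 2, ("u", "x"): 1, ("u", "w"): 5, ("v", "x"): 2, ("v", "w"): 3,
        ("x", "w"): 3, ("x", "y"): 1, ("w", "y"): 1, ("w", "z"): 5, ("y", "z"): 2}
D, p = dijkstra(cost, "u")
print(D)
print(p)    # following predecessors from z back to u gives the least cost path
```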

Network Routing
Dijkstra algorithm example
• the shortest path constructed by tracking predecessors
• if ties encountered, they can be broken arbitrarily

Network Routing
Dijkstra algorithm example
Complexity of Dijkstra algorithm
• at each iteration, need to check N nodes not in V′, i.e., N(N + 1)/2
comparisons ∼ $O(N^2)$
• more efficient implementations devised ∼ $O(N \log N)$

Network Routing
Distance vector algorithm
• fully distributed generation of forwarding tables
• based on Bellman-Ford equation (dynamic programming)
$d_x(y) = \min_{v \in N(x)} \left( c(x,v) + d_v(y) \right)$
$v^* = \arg\min_{v \in N(x)} \left( c(x,v) + d_v(y) \right)$
N(x)      neighbors of node x
c(x, v)   link cost from x to v
$d_v(y)$    cost from neighbor v to destination y
$v^*$       next hop in least cost path from x to y
Example
$d_v(z) = 5, \; d_x(z) = 3, \; d_w(z) = 3$
$d_u(z) = \min\{ c(u,v) + d_v(z), \; c(u,x) + d_x(z), \; c(u,w) + d_w(z) \} = \min\{ 2 + 5, \; 1 + 3, \; 5 + 3 \} = 4$

Network Routing
Distance vector algorithm
• $D_x(y)$ is the least cost from x to y, and it is iteratively estimated
• every node x maintains distance vectors for itself and all its neighbors;
recall that V is set of all nodes and N(x) is set of neighbors of x
$D_x = [D_x(y) : y \in V]$
$D_v = [D_v(y) : y \in V], \; v \in N(x)$
and x also knows the costs c(x, v) to all its neighbors v ∈ N(x)
• key idea is to periodically exchange distance vectors $D_x$ among neighbors;
the vectors are then updated using B-F equation as:
$D_x(y) \leftarrow \min_{v \in N(x)} \left( c(x,v) + D_v(y) \right)$, for all $y \in V$
so (under some minor conditions) the estimate $D_x(y) \to$ true value $d_x(y)$
Distance vector updates (at each node)
1. asynchronous: triggered by change of local link cost, or by update message
from the neighbor
2. distributed: each node notifies its neighbors only if its own distance vector changes
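The iterative update can be simulated as follows. This is a simplified sketch, not the actual protocol: all nodes update in lock-step rounds from a snapshot of their neighbors' vectors (real nodes update asynchronously), the function name is hypothetical, and the link costs are assumptions chosen to agree with the example values c(u, v) = 2, c(u, x) = 1, c(u, w) = 5 and d_u(z) = 4 used earlier.

```python
def distance_vector(cost, nodes):
    """Iterate D_x(y) <- min_v (c(x,v) + D_v(y)) until no vector changes."""
    INF = float("inf")
    nbr = {x: {} for x in nodes}
    for (x, y), c in cost.items():
        nbr[x][y] = c
        nbr[y][x] = c
    D = {x: {y: (0 if x == y else INF) for y in nodes} for x in nodes}
    rounds, changed = 0, True
    while changed:
        changed = False
        received = {x: dict(D[x]) for x in nodes}   # vectors sent in this round
        for x in nodes:
            for y in nodes:
                if x == y:
                    continue
                best = min((c + received[v][y] for v, c in nbr[x].items()), default=INF)
                if best < D[x][y]:
                    D[x][y] = best
                    changed = True
        rounds += 1
    return D, rounds

# Hypothetical link costs, consistent with the example numbers above.
cost = {("u", "v"): 2, ("u", "x"): 1, ("u", "w"): 5, ("v", "x"): 2, ("v", "w"): 3,
        ("x", "w"): 3, ("x", "y"): 1, ("w", "y"): 1, ("w", "z"): 5, ("y", "z"): 2}
D, rounds = distance_vector(cost, ["u", "v", "w", "x", "y", "z"])
print(D["u"]["z"], D["v"]["z"], D["x"]["z"], D["w"]["z"], rounds)
```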

Network Routing
Example updates

Network Routing
“Good news travel fast”
“Bad news travel slow”
Comparison
              Link state                           Distance vector
Messages      O(|V| · |E|) msgs sent               local exchange only
Convergence   O(|V|^2), may have oscillations      time varies, possibly loops, count-to-inf problem
Robustness    may advertise incorrect link cost,   may advertise incorrect path cost,
              each node computes its own table     each node’s table used by others (errors propagate)

Search on Networks
• the aim is to find some source-destination path in reasonable amount of time
• the path cost is not an issue unlike in routing
Surprising observations (from real-world networks)
1. short paths exist between pairs of nodes (6 degree separation)
2. these short paths can be discovered (and used)
Remarks
• both observations closely interrelated
• it is not so clear how to discover (or even create) these short paths
• typical situation is nodes have only local rather than global information;
flooding to discover the destination known to be very inefficient
Decentralized search
• Kleinberg’s small world network
model: n × n grid of nodes with local
connections plus every node v has a
random long range link to node w
$\Pr(v \text{ links to } w) \sim d(v,w)^{-\alpha}$, $\alpha \geq 0$
and distance d(v, w) ≡ #grid steps
• value of α controls how random the
long-range connections are

Search on Networks
Comparing search strategies
• efficiency of a search strategy is expected delivery time (over random long-
range contacts i.e. topology, and random source-destination pairs)
• delivery time ∼ number of hops in the graph (unit-weight links)
Trade-off in the value of α
• α = 0: long-range links are uniformly distributed (∼ WS model); difficult to
navigate with only local knowledge (plus the location of the destination)
• for α = 0, the path actually chosen to the destination is likely to be significantly
longer than the corresponding shortest path
• α > 0: higher clustering, long-range links less random, more realistic scenario
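• lower bounds on the expected delivery time [Kleinberg 2000]
$$\bar{T}_D \ge \begin{cases} \text{const} \times n^{(2-\alpha)/3}, & 0 \le \alpha < 2 \\ \text{const} \times (\log n)^2, & \alpha = 2 \\ \text{const} \times n^{\alpha-2}, & 2 < \alpha < 3 \end{cases}$$
thus, for α = 2 the bound is polynomial in log n, while in the other cases it is polynomial in n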

Search on Networks
Web search
• information retrieval since the 1960s using “textual analysis”
• more recently, information ranked by its score (e.g., #links to it)
Scoring a webpage
• #webpages pointing to it (unit-weight links)
• sum of the scores of neighboring webpages pointing to it

Search on Networks
Authorities
• nodes pointed to by highly ranked nodes
• they offer prominent, highly endorsed answers to queries
Hubs
• nodes that point to highly ranked nodes
Assessing authorities and hubs
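• compute weights h(i) (for hubs) and b(i) (for authorities), where $[A]_{ij} = 1$ if node i points to node j
$h(i) = \sum_j [A]_{ij}\, b(j)$
$b(i) = \sum_j [A]_{ji}\, h(j)$
• the weights are computed iteratively as (in matrix form)
$\mathbf{h}_{t+1} = (A A^T)\, \mathbf{h}_t$
$\mathbf{b}_{t+1} = (A^T A)\, \mathbf{b}_t$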
• main drawback: it requires global knowledge (of A), so it is query-dependent
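The iteration can be sketched in plain Python as follows. Two things here are assumptions rather than part of the slide: the convention that `A[i][j] = 1` means page i links to page j, and the normalization of both weight vectors to unit sum after every step (without it the weights grow without bound). The function name `hits` is hypothetical.

```python
def hits(A, iterations=50):
    """Iterate hub weights h and authority weights b for adjacency matrix A.

    A[i][j] = 1 if page i links to page j (assumed convention).
    The graph must contain at least one link, otherwise the sums are zero.
    """
    n = len(A)
    h = [1.0] * n
    b = [1.0] * n
    for _ in range(iterations):
        # Authority weight of i: sum of hub weights of the pages pointing to i.
        b = [sum(A[j][i] * h[j] for j in range(n)) for i in range(n)]
        # Hub weight of i: sum of authority weights of the pages i points to.
        h = [sum(A[i][j] * b[j] for j in range(n)) for i in range(n)]
        # Normalize to unit sum so the weights stay bounded (assumption).
        total_b, total_h = sum(b), sum(h)
        b = [x / total_b for x in b]
        h = [x / total_h for x in h]
    return h, b

# Page 0 links to pages 1 and 2, page 1 links to page 2.
A = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]
hubs, authorities = hits(A)
print(hubs, authorities)
```

In this small example page 0 ends up as the strongest hub and page 2 as the strongest authority, since page 2 is pointed to by both other pages.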

Search on Networks
PageRank (named after Larry Page, a Google co-founder)
• ranking pages independently of queries
• main idea: page is important if it is linked by other important pages
• every page is assigned a weight
$w(j) = \sum_i [A]_{ij}\, w(i) \cdot \frac{1}{d_{\text{out}}(i)}$
where w(i) are the weights of the in-bound neighboring pages, and
$d_{\text{out}}(i)$ is the out-degree of node i, which dilutes its importance
if it links to many other nodes
• the weight w(i) is the probability that, from any starting page, the page i is
reached via a random walk
• however, if some page does not have out-bound links, the random walker
gets trapped; so with probability s follow a random out-bound link, and with probability
(1 − s) jump randomly to any other node
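The random walk with jumps can be sketched as a simple iteration. Several choices below are assumptions made for illustration: `A[i][j] = 1` means page i links to page j, the default `s = 0.85` is a commonly quoted value rather than one given here, the random jump lands uniformly on any page (including the current one), and a page without out-bound links always jumps. The function name `pagerank` is hypothetical.

```python
def pagerank(A, s=0.85, iterations=100):
    """Iterate page weights for adjacency matrix A (A[i][j] = 1 if i links to j)."""
    n = len(A)
    w = [1.0 / n] * n
    for _ in range(iterations):
        # With probability (1 - s) the walker jumps to a uniformly random page.
        new_w = [(1.0 - s) / n] * n
        for i in range(n):
            d_out = sum(A[i])
            if d_out == 0:
                # Page without out-bound links: spread its weight uniformly (assumption).
                for j in range(n):
                    new_w[j] += s * w[i] / n
            else:
                # With probability s follow one of the d_out links of page i.
                for j in range(n):
                    if A[i][j]:
                        new_w[j] += s * w[i] * A[i][j] / d_out
        w = new_w
    return w

# Page 0 links to pages 1 and 2, page 1 links to page 2, page 2 has no links.
A = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]
print(pagerank(A))
```

The weights always sum to one, so they can be read as the long-run fraction of time the random walker spends on each page.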

Search on Networks
Strategies
• many strategies may be devised, some are more efficient than others
• decentralized search is a practical requirement in large networks
• in social networks, weak (social) ties and hierarchy play a significant role
• visiting the same nodes repeatedly while searching is inefficient, yet there is a tendency
to visit hubs often
To aid the search
• nodes as sources of information are scored (e.g. by level of trust)
• exploiting network structure of (distributed) information helps significantly
• challenge: real-time updates of contents
• ranking (i.e. scoring) algorithms are kept secret and changed (updated)
continuously

Take-Home Messages
Routing
• the goal is not only to find a source-destination path, but to find the one with the least cost
• it is implicitly assumed that each node has an address (identification)
• routing in the Internet evolved over time (i.e., it was not designed from
the beginning)
• it is still unclear why the Internet routing works so well at such large scales
• main issues with the Internet routing are robustness, security and congestion
Search on small world and scale free networks
• small world networks have a short average path length and a high clustering
coefficient; however, the Watts-Strogatz (WS) model does not capture
the navigability of real-world networks
• search is fast and scales well in scale-free networks

Software Requirements for Graph Data
Tasks
• input data in common format (e.g. Excel, CSV, . . . )
• convert (output) data into the desired format (GraphML, Pajek, . . . )
• Social Network Analysis (SNA) of data
• dynamic (temporal) analysis
• data visualization
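As an illustration of the conversion task, the sketch below turns a CSV edge list into GraphML using only the Python standard library. It assumes one `source,target` pair per row, no header row and an undirected graph; the function name `csv_edges_to_graphml` is hypothetical, and a dedicated graph library would normally be used instead.

```python
import csv
import io
import xml.etree.ElementTree as ET

def csv_edges_to_graphml(csv_text):
    """Convert 'source,target' CSV rows into a GraphML string (undirected graph)."""
    root = ET.Element("graphml", xmlns="http://graphml.graphdrawing.org/xmlns")
    graph = ET.SubElement(root, "graph", id="G", edgedefault="undirected")

    # Keep only rows that have at least a source and a target column.
    rows = [row[:2] for row in csv.reader(io.StringIO(csv_text)) if len(row) >= 2]

    # Declare every node once, in order of first appearance.
    seen = []
    for source, target in rows:
        for name in (source, target):
            if name not in seen:
                seen.append(name)
                ET.SubElement(graph, "node", id=name)

    # One <edge> element per CSV row.
    for source, target in rows:
        ET.SubElement(graph, "edge", source=source, target=target)

    return ET.tostring(root, encoding="unicode")

print(csv_edges_to_graphml("a,b\nb,c\n"))
```

The resulting string can be written to a `.graphml` file and opened in tools that read this format.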
Requirements
• gentle learning curve (easy to grasp)
• flexibility (use different formats for input and output)
• scalability (Big Data, application dependent)
• speed (if Big Data or real-time)
• parallel and distributed computing capability
(MapReduce)
• functionality as modules or add-ins
• . . .

Networks in Matlab

Networks with Python

Networks in C, R, Python

Networks Visualization and Analysis

Networks Community Analysis

Social Network Analysis

Popular in Bioinformatics

Networks Online Demos

Networks Data
