A Mini-Course on Network Science
Pavel Loskot
p.loskot@swan.ac.uk
Pavel Loskot © 2014
Course Outline
1. Introduction
• fundamentals of complex systems and graph theory
2. Structure
• sub-graphs, centrality measures, weighted networks, community
3. Random Models
• random, small world and scale free networks
4. Robustness
• some definitions and metrics
5. Processes
• epidemic spreading and information cascades
6. Algorithms
• max flow and min cut, routing, search and navigation
7. Software
• using Matlab and Python, available software, a few demos from YouTube
Used Resources
Ernesto Estrada
The Structure of Complex Networks: Theory and Applications
Oxford University Press, 2011
Cecilia Mascolo
Social and Technological Network Analysis
Course at University of Cambridge, UK
Jari Saramäki
Introduction to Complex Networks
Aalto University, Finland
Animesh Mukherjee
Complex Network Theory
IIT Kharagpur, India
Used Resources
Robert Leese
An Introduction to Clustering
Industrial Mathematics Knowledge Transfer Network
Kevin Wayne
Max Flow, Min Cut
Princeton University, USA
James F. Kurose and Keith W. Ross
Computer Networking, A Top-Down Approach
Pearson Education, 2012
Wikipedia
various topics
Networks: Introduction
Complex Systems
Emergence of complexity:
• locally simple rules, and yet
globally complex behavior
• systems evolve, are dynamic and
adapt to the environment
Modeling:
• infinitely many possibilities
• normally data-driven, but what data
to collect?
Emergence of stochasticity:
God doesn’t play dice with the world.
• many entities, complex interactions
• often useful to describe observations
statistically (joint PDF, correlations)
• human beings are living at the edge
of stochastic and deterministic world
Illustration of Complexity
Simple idea:
• send packets between two nodes
Implementation:
• how to distinguish end-nodes?
• how to find the route?
• how to share network resources
among billions of end-nodes?
• how to deal with lost and delayed
packets?
• how to deal with mobility and nodes
leaving and arriving?
Solution:
• evolution - solve problems iteratively
• separation - divide and conquer
• new problems emerge as the network
grows: scalability, stability, security
Emergence of Order
Differing (spatial-temporal) perspectives
• insider: interacting with immediate neighbors (immediate, local)
• outsider: system level perception (average, global)
Description of Networks
1. Complete: everybody connected with everybody else
2. Random: connections selected arbitrarily at random
3. Random tree: connections selected arbitrarily at random, no cycles allowed
4. Real-world networks:
• exponential degree distribution and strongly disassortative
• small average path length and high clustering coefficient
• several nodes with high (degree, closeness and betweenness) centrality
• several main communities
. . . and many other distinctive characteristics
Challenge: how to synthesize real-world networks with all these properties?
Formal Definitions
Network
• graph model of functional and/or structural relationships of a complex system
Time-invariant network
• graph $G = (V, E)$ with the set of nodes $V = \{v_1, \dots, v_N\}$ and the set of edges (links) $E = \{e_1, \dots, e_L\} \subseteq V \times V$, i.e., every edge $e_l \in E$ is associated with one pair $(v_i, v_j) \in V \times V$; in other words, $E$ is a set of (un)ordered pairs from $V$
• let's not allow self-edges ($[v_i, v_i] \notin E$) and duplicate edges ($E$ has unique elements)
• nodes and edges are objects, but for analysis and evaluation purposes we need numbers, i.e., we assign numbers (called weights) to nodes and edges
$v_n \to W_v(n)$, $e_l \to W_e(l)$, where for $e_l = [v_i, v_j]$ the edge weight can equally be indexed by the node pair, $W_e(l) = W_v(i, j)$
Dynamic networks
• graphs (nodes, edges) as
well as weights can vary
over time
Formal Definitions (cont.)
Graph edges (in structural models)
• only if two nodes communicate; this communication can be implemented in
many different ways (radiation, material transport flows, . . .)
• communicating nodes interact i.e. influence each other’s behavior
• communications are, first of all, information flows:
Two nodes communicate
• if there is enough information delivered (just sent is not enough) over a given
time-window i.e. communication is integral (average) quantity
• delivered information may be ignored, not recognized, or misinterpreted
Fundamentals (of Graph Theory)
(Un)directed graphs
• for directed graphs, E is a set of ordered pairs [u, v] ∈ V ⊗ V
Neighbors, degrees
• u is neighbor of v if (u, v) ∈ E, then u and v are said to be adjacent nodes
• (u, w), (v, w), (y, w) and (x, w) are adjacent edges which are incident at w
• in-degree $k_{in}$, out-degree $k_{out}$ and total degree $k = k_{in} + k_{out}$; their distributions are important statistics (this assumes all edges are counted with unit weights)
Fundamentals
Isomorphic graphs
• G1 and G2 are isomorphic if there is a one-to-one mapping between their vertices that preserves the (possibly directed) edges (i.e., they are different visualizations of the same graph)
Edge (or connection) density
$\rho = \frac{|E|}{\binom{|V|}{2}} = \frac{2|E|}{|V|(|V| - 1)}$
• $\binom{|V|}{2} = \frac{|V|(|V| - 1)}{2}$ is the maximum possible number of edges
• ρ = 1 if fully connected; real-world networks have ρ ≪ 1 (i.e., they are sparse)
• graph is sparse if $|E| \approx |V|$, graph is dense if $|E| \approx |V|^2$
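As a small illustration of the density formula above, the following Python sketch computes ρ for a simple undirected graph given as an edge list. The function name `edge_density` and the two example graphs are assumptions made for this illustration only.

```python
def edge_density(num_nodes, edges):
    """Density of a simple undirected graph: |E| / C(|V|, 2)."""
    if num_nodes < 2:
        raise ValueError("density needs at least two nodes")
    # Drop self-edges and duplicates, as the formal definition does not allow them.
    unique = {frozenset(e) for e in edges if e[0] != e[1]}
    max_edges = num_nodes * (num_nodes - 1) // 2
    return len(unique) / max_edges


# Hypothetical examples: a complete graph and a path, both on 4 nodes.
complete4 = [(i, j) for i in range(4) for j in range(i + 1, 4)]
path4 = [(0, 1), (1, 2), (2, 3)]
print(edge_density(4, complete4))  # 1.0 (fully connected)
print(edge_density(4, path4))      # 0.5 (3 of 6 possible edges)
```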
Clique
• ˜G = ( ˜V, ˜E) is a subgraph of G = (V, E) if ˜V ⊆ V and ˜E ⊆ E
• clique is a maximal, completely connected subgraph of the graph
• N-clique is a fully connected subgraph with N vertices
• clique number is the size of the largest clique in the graph
Fundamentals
Path, walk, trail
• path from v1 to vL is an ordered sequence of edges between an ordered list of vertices such that no vertex is visited twice
• length of a path is the number of its edges (i.e., assuming edges of unit length)
• if there is no path between two vertices, their path length is infinite
• distance between two vertices is the length of their shortest path
• walk of length L from v1 to vL+1 is a sequence [v1, . . . , vL+1] where only subsequent vertices are required to be different
• trail is a walk with no repeated edge
• cycle is a closed path, i.e., it starts and ends at the same vertex and visits no other vertex twice
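Diameter (d) and radius (r) of a graph
• the diameter is the longest shortest path; the radius is the smallest eccentricity, where the eccentricity of a vertex u is its largest distance to any other vertex:
$d = \max_{u,v \in V} \mathrm{distance}(u, v)$, $r = \min_{u \in V} \max_{v \in V} \mathrm{distance}(u, v)$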
Fundamentals
Average path length
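• it is the average shortest path length over all pairs of distinct vertices
$\bar{d} = \frac{1}{2\binom{|V|}{2}} \sum_{u,v \in V} \mathrm{distance}(u, v)$
• if some vertices u and v are disconnected (i.e., no path connects u and v), the harmonic mean is used instead (the reciprocal of the average reciprocal distance, with 1/distance(u, v) = 0 for disconnected pairs)
$\bar{d} = \left( \frac{1}{2\binom{|V|}{2}} \sum_{u,v \in V,\, u \ne v} \frac{1}{\mathrm{distance}(u, v)} \right)^{-1}$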
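The distance-based quantities above can be computed with a breadth-first search from every vertex. The sketch below is a minimal illustration for connected, unweighted, undirected graphs; it assumes the radius is taken as the smallest eccentricity (the usual graph-theoretic definition), and the names `bfs_distances` and `path_statistics` are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def path_statistics(adj):
    """Return (diameter, radius, average path length) of a connected graph."""
    nodes = list(adj)
    n = len(nodes)
    eccentricity = {}
    total = 0
    for u in nodes:
        dist = bfs_distances(adj, u)
        if len(dist) != n:
            raise ValueError("graph is disconnected")
        eccentricity[u] = max(dist.values())
        total += sum(dist.values())  # sums over ordered pairs (u, v)
    # n * (n - 1) ordered pairs equals 2 * C(n, 2) in the formula above.
    return max(eccentricity.values()), min(eccentricity.values()), total / (n * (n - 1))


# Hypothetical example: the path A-B-C-D-E.
path5 = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
diameter, radius, average = path_statistics(path5)
print(diameter, radius, average)  # 4 2 2.0
```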
Graph coloring
• assign labels to vertices, so that no adjacent vertices get the same label
• chromatic number is the minimum number of colors to solve the coloring
Fundamentals
Connectivity
• connected component is a subgraph where there is a path between every
pair of vertices; for directed graphs, the directions can be ignored
• connected graph if there is a path between every pair of its vertices; in other
words, the graph contains a single connected component
• (sub)graphs not connected are disconnected
• node connectivity is the smallest number of vertices whose removal disconnects the graph
• edge connectivity is the smallest number of edges whose removal disconnects the graph
• a component is strongly connected if every vertex in it is reachable from every other vertex in it (i.e., edge directions matter here)
Cutsets
• vertex cutset is a set of vertices whose removal disconnects the graph (i.e., increases the number of graph components); single-vertex cutsets are also known as articulation points or brokers (in social networks)
• edge cutset is a set of edges whose removal disconnects the graph
Fundamentals
• vertices G and H in graph below are cutset vertices
• bridges: if removed, the number of graph components increases
Basic graphs
Fundamentals
Tree
• connected graph with no cycles (adding only one link creates a cycle)
• becomes disconnected by removing any single link
• any pair of nodes is connected by exactly one path
• spanning tree is subgraph of a network including all its nodes and it is a tree
R-regular graph
• all vertices have degree R and there are |E| = R|V|/2 edges
Planar graph
• can be drawn in a 2D plane such that no two edges intersect
• among all complete graphs Cn, only C1, C2, C3 and C4 are planar
• example of embeddings of C4
Basic Graphs
Bipartite networks
• two sets of vertices, only edges between vertices in these two sets allowed
• graph is bipartite if it does not contain any odd cycles
• generalization to more than two sets of vertices is possible
Graph matching
• given graph G = (V, E), a matching M ⊆ E in G is a set of edges not sharing
any common vertex
• maximal matching: any edge added to M would violate the matching property
• maximum matching: contains the largest possible number of edges
Adjacency Matrix
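• (binary) adjacency matrix: $[A]_{ij} = 1$ if $[v_i, v_j] \in E$, and $0$ otherwise
• for undirected graphs, A is symmetric (i.e., $A = A^T$), and the vector of node degrees is
$k = (\mathbf{1}^T A)^T = A\mathbf{1}$
• for directed graphs, A is generally asymmetric:
$k^{in} = (\mathbf{1}^T A)^T$ (vector of in-degrees), $k^{out} = A\mathbf{1}$ (vector of out-degrees)
• average degree of a graph
$\bar{k} = \frac{2|E|}{|V|} = \frac{\mathbf{1}^T k}{|V|} = \frac{\mathbf{1}^T A \mathbf{1}}{|V|} = \frac{\sum_{u \in V} k(u)}{|V|}$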
Adjacency Matrix
• for undirected bipartite graphs with vertices $V = V_1 \cup V_2$, $|V_1| = n_1$, $|V_2| = n_2$,
$A = \begin{bmatrix} 0_{n_1 \times n_1} & R_{n_1 \times n_2} \\ R^T_{n_1 \times n_2} & 0_{n_2 \times n_2} \end{bmatrix}$
• generally, $A^n$, n = 1, 2, . . ., counts the walks of length n in the graph, i.e., $[A^n]_{ij}$ is the number of distinct n-hop walks between vertices i and j
• $[A^T A]_{ij}$ and $[A A^T]_{ij}$ are the numbers of vertices connected to/from the vertices $v_i$ and $v_j$ at the same time, respectively
• $\mathrm{tr}(A^3)/6$ is the number of triangles in the graph
• open and closed triangles
• closed triangle represents 6 closed triplets (starting at each of 3 vertices in 2 directions)
Incidence matrix
• $|V| \times |E|$ matrix, $[B]_{ij} = 1$ if $v_i \in e_j$, and $0$ otherwise
• degree matrix is a diagonal matrix $D = \mathrm{diag}(k_1, \dots, k_{|V|})$
• adjacency matrix can also be expressed as $A = BB^T - D$
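The matrix identities above are easy to check numerically. The sketch below uses plain Python lists (no external libraries) on a hypothetical 4-node graph, a triangle with one pendant node, to compute the degrees as row sums of A, the number of triangles as tr(A³)/6, and to confirm A = BBᵀ − D for the unoriented incidence matrix B. The helper `matmul` is an assumption of this illustration.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


# Hypothetical example: triangle 0-1-2 with a pendant node 3 attached to node 2.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4
A = [[0] * n for _ in range(n)]           # adjacency matrix
B = [[0] * len(edges) for _ in range(n)]  # |V| x |E| incidence matrix
for j, (u, v) in enumerate(edges):
    A[u][v] = A[v][u] = 1
    B[u][j] = B[v][j] = 1

degrees = [sum(row) for row in A]         # k = A 1
A3 = matmul(matmul(A, A), A)
triangles = sum(A3[i][i] for i in range(n)) // 6

Bt = [list(column) for column in zip(*B)]
BBt = matmul(B, Bt)
A_from_B = [[BBt[i][j] - (degrees[i] if i == j else 0) for j in range(n)]
            for i in range(n)]

print(degrees)        # [2, 2, 3, 1]
print(triangles)      # 1
print(A_from_B == A)  # True
```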
Adjacency Matrix
Graph spectrum
• recall that for an undirected graph, the adjacency matrix is symmetric, so its eigenvalues are real-valued; they are referred to as the graph spectrum
• eigenvalue λ and eigenvector v satisfy Av = λv, i.e., (λI − A)v = 0
• the roots of the characteristic polynomial $p_A(t) = \det(tI - A) = \prod_i (t - \lambda_i)$ of matrix A are the eigenvalues of A
• Laplacian matrix of graph G is L = D − A (degree minus adjacency matrix):
$[L]_{ij} = \begin{cases} [D]_{ii} = k(i) & i = j \\ -1 & i \ne j \text{ and } [v_i, v_j] \in E \\ 0 & \text{otherwise} \end{cases}$
and the spectrum of graph G is then given by the eigenvalues of L (rather than of A)
Properties of Laplacian
• multiplicity of the eigenvalue λ0 = 0 of L is the number of connected components of G
• eigenvalue λ0 = 0 corresponds to eigenvector $v_0 = [1, \dots, 1]^T$, i.e., $L v_0 = 0$
• $L = BB^T$ where B is the |V| × |E| oriented incidence matrix of graph G = (V, E), i.e., each column has +1 at one end-vertex of the edge and −1 at the other
Power-Law Distribution
• long-tail (right) with many low-connected vertices (left) (80-20 rule)
• many real-world networks exhibit this degree distribution, so they have a hub-dominated (star-like) topology
• also known as scale-free distribution of scale-free networks; these networks are self-similar at different (spatial-temporal) scales
$p(k) = A k^{-\gamma} \;\Rightarrow\; p(c \cdot k) = A (c k)^{-\gamma} = c^{-\gamma} p(k)$
• cumulative degree distribution (CDD)
$P(k) = \sum_{k'=k}^{\infty} p(k') \propto k^{-(\gamma - 1)}$ (the probability that the degree is at least k)
Analyzing Degree Distributions
Degree-degree correlations
• assortativity coefficient or Pearson correlation coefficient (r)
Assortative mixing (r > 0)
• bias towards connections between nodes with similar characteristics (hubs
tend to connect to each other)
• useful, e.g. to understand spread of diseases and their treatment
Disassortative mixing (r < 0)
• dissimilar nodes tend to connect to each other (hubs avoid each other)
Neutral mixing (r = 0)
• connections follow some probability distribution
Analyzing Degree Distributions
Mathematically:
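• let $p_{ij}$ be the probability that an edge has degrees $k_i$ and $k_j$ at its two ends:
$\sum_{i,j} p_{ij} = 1$, $\sum_i p_{ij} = q_j = \frac{k_j p_j}{\sum_{j'} k_{j'} p_{j'}}$
• perfectly assortative networks have $p_{ij} = q_i \delta_{ij}$ (only nodes of the same degree connect)
• if the degrees are independent, then $p_{ij} = q_i q_j$
• Pearson coefficient, $-1 \le r \le 1$
$r = \frac{E[(k_i - E[k_i])(k_j - E[k_j])]}{\sqrt{\sigma^2(k_i)\, \sigma^2(k_j)}} = \frac{\sum_{i,j} k_i k_j (p_{ij} - q_i q_j)}{\max \sum_{i,j} k_i k_j (p_{ij} - q_i q_j)} = \frac{\sum_{i,j} k_i k_j (p_{ij} - q_i q_j)}{\sum_{i,j} k_i k_j (\delta_{ij} - q_i) q_j}$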
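One way to estimate r in practice is to compute the Pearson correlation of the degrees found at the two ends of every edge, counting each undirected edge in both orientations. The following is a minimal sketch of that estimate, not the slide author's implementation; the function name `degree_assortativity` and the two test graphs are assumptions.

```python
from math import sqrt


def degree_assortativity(edges):
    """Pearson correlation of the degrees at the two ends of undirected edges."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Each undirected edge contributes both orientations (u, v) and (v, u).
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / m
    var_x = sum((x - mean_x) ** 2 for x in xs) / m
    var_y = sum((y - mean_y) ** 2 for y in ys) / m
    if var_x == 0 or var_y == 0:
        raise ValueError("all edge ends have the same degree; r is undefined")
    return cov / sqrt(var_x * var_y)


# Hypothetical examples: a star (hub connected only to leaves) and a short path.
star = [(0, leaf) for leaf in range(1, 6)]
path = [(0, 1), (1, 2), (2, 3)]
r_star = degree_assortativity(star)
r_path = degree_assortativity(path)
print(r_star)  # close to -1: perfectly disassortative
print(r_path)  # close to -0.5
```

The star is the extreme disassortative case, since every edge joins the single high-degree hub to a degree-1 leaf.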
Degree Distributions
Degree-degree correlation
• directed graphs (networks)
Summary
• for large graphs, edges (topology) can be considered statistically
• degree distribution is partial statistical description (of topology)
• degree-degree correlation is more informative, but still incomplete info
Take-Home Messages
Complex Systems
• consist of a large number of interacting components
• graphs are very good mathematical models of these systems; they are very
generic objects with many specific instances (trees, lists, tables etc.)
• availability of observations (measurement data) is a strong driving force
• a common systematic framework to study these systems: Network Science
History of modern science
Problems of simplicity (1600-1800): understanding influence of one variable over another
Problems of disorganized complexity (1900-1950): number of variables is very large, but system as a whole has well-defined average behavior
Problems of organized complexity (1950-today): simultaneously dealing with number of factors forming whole system
- W. Weaver, 1948
Networks: Structure
Similarity of Networks
• Nature is built up of complex networks
• there is a need for a common framework for systematically describing,
analyzing and eventually synthesizing networks to mimic Nature
Comparing Networks
Similarity of (static) networks
1. calculate and compare (a vector of) metrics for each network; N.B. we can
only compare scalar values (e.g. Euclidean distances between vectors)
OR
2. identify distinctive subgraphs at certain granularity, and compare those
Graphlets [Pržulj, 2004]
• pictured right: 30 graphlets (small connected subgraphs of 2-5 nodes) containing 73 topologically distinct node positions
• generalizes node degree to graphlet degrees: for each node, a vector of 73 components counting how many times the node occupies each position type in the network
Fragments
• quantitative analysis relies on
correlations between fragment
statistics in the network and the
network properties
Comparing Networks
Motifs [Milo, 2002]
• subgraphs having the statistical significance of occurrence much larger than
if the network was created completely at random
• network randomization:
1. select two links at random, 2. exchange their end-points, 3. repeat
• a motif in the real network occurs much more often than (on average) in an
ensemble of random networks having the same degree distribution
• we require that the probability of the motif appearing in a random network from the ensemble at least as many times as in the real network is small
• this is quantified by the Z-score (N denotes the number of occurrences)
$Z = \frac{N_{real} - E[N_{random}]}{\sqrt{E[(N_{random} - E[N_{random}])^2]}}$
• motifs are network specific, although families of networks can share the same motifs
• importance of motifs can be evaluated as the significance profile (SP) vector
$SP = \left[ \frac{Z_1}{\sqrt{\sum_i Z_i^2}}, \frac{Z_2}{\sqrt{\sum_i Z_i^2}}, \cdots \right]$
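The randomization procedure (select two links, exchange their end-points, repeat) and the Z-score can be sketched as follows. This is an illustrative version under stated assumptions: it rejects swaps that would create self-edges or duplicate edges, uses the population standard deviation, and the names `double_edge_swap` and `z_score` are hypothetical.

```python
import random
from math import sqrt


def double_edge_swap(edges, num_swaps, rng):
    """Degree-preserving randomization of a simple undirected graph."""
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = attempts = 0
    while done < num_swaps and attempts < 100 * num_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        # Proposed rewiring: a-b, c-d  ->  a-d, c-b
        if len({a, b, c, d}) < 4:
            continue  # the two links share a node: skip to avoid self-edges
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


def z_score(n_real, n_random):
    """Z-score of a count against counts from randomized networks."""
    mean = sum(n_random) / len(n_random)
    var = sum((x - mean) ** 2 for x in n_random) / len(n_random)
    return (n_real - mean) / sqrt(var)  # assumes the random counts are not all equal


rng = random.Random(1)  # fixed seed only to make the sketch repeatable
ring = [(i, (i + 1) % 8) for i in range(8)]
shuffled = double_edge_swap(ring, 10, rng)
print(shuffled)
print(z_score(10, [2, 4, 6]))
```

In a full motif analysis, the subgraph would be counted in the real network and in each randomized copy, and the resulting counts passed to `z_score`.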
Comparing Networks
Motif examples
Relative abundance of fragments
• assume that ensemble of random networks has the same nodes degrees as
the real-world network
$\alpha = \frac{N_{real} - E[N_{random}]}{N_{real} + E[N_{random}]}$
Comparing Networks
Relative abundance examples
• ratios of the number of fragments occurrences are also useful to characterize
the network structure as shown next
Transitivity Measures
Clustering coefficient
• recall that every triangle represents three connected (open or closed) triples
• let $|\tilde{T}_3|$ be the number of triangles and $|\tilde{P}_2|$ the number of 2-paths (connected triples with 2 or 3 ties); the clustering coefficient (a.k.a. network transitivity):
$C_3 = \frac{3|\tilde{T}_3|}{|\tilde{P}_2|}$ where $|\tilde{T}_3| = \mathrm{tr}(A^3)/6$, and $|\tilde{P}_2| = \frac{1}{2} \left( \sum_{i,j} [A^2]_{ij} - \mathrm{tr}(A^2) \right)$
• a network can be highly clustered locally, but not globally (i.e., considering
average of local clusterings across all nodes is not sufficient)
• clustering tends to be much larger for real-world than random networks
Example
• A’s friends: B,C,D and E
• all possible edges among A’s friends: B-C, B-
D, B-E, C-D, C-E, D-E, i.e., 6 in total and out of
which only 1 (C-D) exists
• thus, clustering coefficient of A is 1/6
Generalization
• any subgraph, ratio of actual to maximum possible number of its occurrences: $C_n = \frac{n|\tilde{T}_n|}{|\tilde{P}_n|}$
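The example above (A with friends B, C, D and E, of which only C and D are connected) can be reproduced with a few lines of Python. The sketch computes both the local clustering coefficient of a node and the global transitivity 3|T3|/|P2|; the adjacency-set representation and the function names are assumptions of this illustration.

```python
from itertools import combinations


def local_clustering(adj, node):
    """Fraction of pairs of neighbours of `node` that are themselves linked."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return links / (k * (k - 1) / 2)


def transitivity(adj):
    """Global clustering: 3 * (number of triangles) / (number of connected triples)."""
    closed = 0   # every triangle is counted three times, once per centre node
    triples = 0
    for node, neighbours in adj.items():
        k = len(neighbours)
        triples += k * (k - 1) // 2
        closed += sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return closed / triples if triples else 0.0


# The example network: A is linked to B, C, D, E; only C-D exists among A's friends.
adj = {
    "A": {"B", "C", "D", "E"},
    "B": {"A"},
    "C": {"A", "D"},
    "D": {"A", "C"},
    "E": {"A"},
}
print(local_clustering(adj, "A"))  # 1/6
print(transitivity(adj))           # 3 * 1 triangle / 8 triples = 0.375
```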
Centrality Measures
Aim
• quantify importance of nodes in a network (so-called positional advantage),
i.e. how nodes contribute to the overall structural properties of the network
• e.g. important nodes disseminate information faster, can stop spreading
epidemics, can protect network from breaking and so on
Degree centrality
• hubs are likely to have the largest influence (e.g. number of friends to help)
• a transitivity measure since it is ratio of single (neighboring) node fragments
• for a network of N nodes, i-th node of degree ki has degree centrality
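$C_1(i) = \frac{|\tilde{T}_1|}{|\tilde{P}_1|} = C_D(i) = \frac{k_i}{N - 1}$
Network centralization (centrality) ($\sigma^2_{C_1}$)
$\sigma^2_{C_1} = \frac{1}{N - 1} \sum_{i=1}^{N} \left( C_1(i) - \bar{C}_1 \right)^2$ where $\bar{C}_1 = \frac{1}{N} \sum_{i=1}^{N} C_1(i)$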
• star topology has the maximum while line
topology has the minimum centralization
Centrality Measures
Freeman’s degree centrality
• quantify variations in node degree centrality in the whole network
$\bar{C}_D = \frac{\sum_{i=1}^{N} (k^* - k_i)}{(N - 1)(N - 2)}$
where $k^* = \max_i k_i$, and the maximum $\max \sum_{i=1}^{N} (k^* - k_i) = (N - 1)(N - 2)$ is attained for a star network
Betweenness centrality (beyond nearest neighbors)
• quantify node importance in communications between pairs of other nodes
• ability to broker between groups, likelihood of intercepting information etc.
• thus, it is the likelihood of node w being involved in communications
$C_B(w) = \frac{1}{\binom{n-1}{2}} \sum_{u \ne w \ne v} \frac{\rho(u, w, v)}{\rho(u, v)}$ (normalization optional)
ρ(u, w, v) number of shortest paths between u and v via w
ρ(u, v) number of all shortest paths between u and v
OR
ρ(u, w, v) maximum flow from u to v through w
ρ(u, v) total maximum flow from u to v
Centrality Measures
Example (betweenness centrality)
• A and E are not in-between any pairs, B and D are in-between 3 pairs, and
C is in-between 4 pairs
Closeness centrality
• measure of how much the node is in “middle of things”
• let d(u, v) be the shortest path length between nodes u and v
$C_C(u) = \left( \frac{1}{N - 1} \sum_{v \ne u} d(u, v) \right)^{-1}$ (normalization optional)
Example (closeness centrality)
$C_C(A) = \left( \frac{1 + 2 + 3 + 4}{4} \right)^{-1} = 0.4$
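Both examples are consistent with a five-node line A-B-C-D-E, which is assumed in the sketch below. It counts shortest paths with a breadth-first search and then evaluates the unnormalized betweenness (sum of ρ(u, w, v)/ρ(u, v) over unordered pairs) and the closeness of a node. The function names are hypothetical, and the closeness function assumes a connected graph.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """Distances and numbers of shortest paths from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness(adj):
    """Unnormalized betweenness of every node in an undirected graph."""
    info = {s: bfs_counts(adj, s) for s in adj}
    score = {w: 0.0 for w in adj}
    for u, v in combinations(adj, 2):
        dist_u, sigma_u = info[u]
        if v not in dist_u:
            continue  # u and v are disconnected
        for w in adj:
            if w in (u, v) or w not in dist_u:
                continue
            dist_w, sigma_w = info[w]
            # w lies on a shortest u-v path iff the two legs add up exactly.
            if dist_u[w] + dist_w[v] == dist_u[v]:
                score[w] += sigma_u[w] * sigma_w[v] / sigma_u[v]
    return score


def closeness(adj, u):
    """Inverse of the average distance from u to all other nodes."""
    dist, _ = bfs_counts(adj, u)
    return (len(adj) - 1) / sum(d for node, d in dist.items() if node != u)


path5 = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
scores = betweenness(path5)
closeness_a = closeness(path5, "A")
print(scores)       # A and E: 0, B and D: 3, C: 4
print(closeness_a)  # 0.4
```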
Centrality Measures
Information centrality
$C_{IC}(i) = \left( \frac{1}{N} \sum_j \frac{1}{I_{ij}} \right)^{-1}$
Eigenvector centrality (xu)
• account for connections that are (or not) isolated; important nodes are likely
connected to other important nodes
• let B(u) be the neighbors of node u
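$x_u = \frac{1}{\lambda} \sum_{v \in B(u)} x_v = \frac{1}{\lambda} \sum_{v \in V} [A]_{u,v}\, x_v \;\Rightarrow\; Ax = \lambda x$
algorithm: initialize $x_u = 1\ \forall u$; re-calculate $x_u\ \forall u$; set $\lambda = \max_u x_u$ and divide all $x_u$ by λ; repeat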
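The iterative algorithm above is a power iteration. A minimal sketch follows, assuming a connected, non-bipartite undirected graph (otherwise the iteration may oscillate instead of converging); the function name, the tolerance and the example graph are assumptions.

```python
def eigenvector_centrality(adj, iterations=200, tol=1e-10):
    """Power iteration: x <- A x, rescaled so that the largest entry is 1."""
    x = {u: 1.0 for u in adj}
    lam = 0.0
    for _ in range(iterations):
        new = {u: sum(x[v] for v in adj[u]) for u in adj}
        lam = max(new.values())  # estimate of the largest eigenvalue
        new = {u: value / lam for u, value in new.items()}
        change = max(abs(new[u] - x[u]) for u in adj)
        x = new
        if change < tol:
            break
    return x, lam


# Hypothetical example: triangle 0-1-2 with a pendant node 3 attached to node 2.
paw = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
centrality, lam = eigenvector_centrality(paw)
print(centrality)  # node 2 scores highest, the pendant node 3 lowest
print(lam)         # approximately 2.17
```

Node 3 has the same degree as no other node but still inherits some importance from its well-connected neighbour, which is the point of the measure.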
Katz centrality
• instead of counting shortest paths (as in closeness centrality), count all paths
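• let $\alpha > \lambda_1$ (largest eigenvalue of A), so that the series below converges
$C_K(i) = [Z \cdot \mathbf{1}]_i$ where $Z = \sum_{k=1}^{\infty} \alpha^{-k} A^k = \left( I - \frac{1}{\alpha} A \right)^{-1} - I$
so the values of $C_K(i)$ depend on the choice of α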
Centrality Measures
PageRank centrality
• reflects the probabilities that a random walk through the network arrives at any
particular node
• intuitively, if there are many links out of node v, one of these links to node u
represents average recommendation of u by v; if the number of links out of v
is reduced, recommendation of u by v increases
• define the modified adjacency matrix $[H]_{ij} = 1/k_{out}(i)$ if $[v_i, v_j] \in E$, and $0$ otherwise
• PageRank vector $C_{PR} = [C_{PR}(1), \dots, C_{PR}(N)]^T$ at step k is updated as
$(C_{PR}^{k+1})^T := (C_{PR}^{k})^T \cdot H$
note that node 4 (in the pictured example) traps a random walker, and also, the search is often randomly reset (with probability 1 − α), so this modified H should be used instead:
$H' = \alpha H + \frac{\alpha}{N} (a \mathbf{1}^T) + \frac{1 - \alpha}{N} \mathbf{1}\mathbf{1}^T$ where $[a]_i = 1$ if $k_{out}(i) = 0$, and $0$ otherwise
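The update with the modified matrix H′ can be sketched without forming the matrix explicitly. The code below is an illustration under two assumptions: the reset term spreads probability (1 − α) uniformly over all nodes, and the rank of dangling nodes (no out-links) is redistributed uniformly. The function name, α = 0.85 and the example graphs are not from the slides.

```python
def pagerank(out_links, alpha=0.85, iterations=100):
    """Iterate rank <- rank * H' for a directed graph given as out-link lists."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Rank held by dangling nodes; spread uniformly (the a 1^T term).
        dangling = sum(rank[u] for u in nodes if not out_links[u])
        # Uniform reset with probability 1 - alpha, plus the dangling share.
        new = {u: (1.0 - alpha) / n + alpha * dangling / n for u in nodes}
        for u in nodes:
            for v in out_links[u]:
                new[v] += alpha * rank[u] / len(out_links[u])  # the alpha * H term
        rank = new
    return rank


# Hypothetical example in which node 4 has no out-links and would trap the walker.
web = {1: [2, 3], 2: [3], 3: [1, 4], 4: []}
ranks = pagerank(web)
print(ranks)

# On a directed 3-cycle, every node receives the same rank by symmetry.
cycle = {0: [1], 1: [2], 2: [0]}
cycle_ranks = pagerank(cycle)
print(cycle_ranks)
```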
Centrality Measures
Reciprocity (r)
• in directed networks, link from u to v can be reciprocated as link from v to u;
these are called co-links
$r = \frac{\sum_{i,j} [A]_{ij} [A]_{ji}}{|E|}$ (fraction of reciprocated links)
Rich-Club coefficient of degree k (R(k))
• hubs tend to be densely interconnected which is quantified by R(k)
• let subgraph $(V'(k), E') \subseteq (V, E)$ where $V'(k) \subseteq V$ is the subset of nodes with degree at least k, and $E' \subseteq E$ are the corresponding edges among $V'(k)$
$R(k) = \frac{|E'|}{\binom{|V'(k)|}{2}}$
Matching index (µij)
• quantify similarity of connectivity of the two end-vertices of an edge
• small value of µij indicates the edge between vi ∈ V and vj ∈ V is a bridge
between two dissimilar regions of the network
$\mu_{ij} = \frac{\sum_{k \ne i,j} A_{ik} A_{kj}}{\sum_{k \ne j} A_{ik} + \sum_{k \ne i} A_{jk}}$
Weighted Networks
Graph Network System
vertex node component
edge link interaction
Weights mapping
• weights can be assigned to vertices as well as to (more often) edges; we
assume mapping
W : (V, E) → (V, W)
so weighted adjacency matrix and original adjacency matrix, respectively,
$[W]_{ij} = w_{ij} \in \mathbb{R}$, $[A]_{ij} = 1$ if $|w_{ij}| \ge \Delta$, and $0$ if $|w_{ij}| < \Delta$
Vertex strength
• degree distribution is generalized to the strength distribution having again a
power-law-like tails in many real-world networks
$s(i) = \sum_j w_{ij}$
• it was observed that node strength and node degree are related as
$E[s|k] \propto k^{\beta}$, $\beta > 0$
for β > 1, high-degree vertices (hubs) tend to be high-strength vertices
Weighted Networks
Other generalizations
• the edge contributions can be normalized as $w_{ij} / \sum_j w_{ij} = w_{ij}/s(i)$, e.g., the average nearest (first order) neighbor degree
$k_{NN}(i) = \sum_j \frac{w_{ij}}{s(i)} [A]_{ij}\, k(j)$
• importantly, there are no generally agreed definitions of quantities (metrics) for weighted networks, e.g., the clustering coefficient ($[A]_{ij} \equiv a_{ij}$)
$C_3(i) = \frac{1}{s(i)(k(i) - 1)} \sum_{j,k} \frac{w_{ij} + w_{ik}}{2}\, a_{ij} a_{jk} a_{ik}$ [Barrat, Barthélemy, Vespignani]
$C_3(i) = \frac{1}{k(i)(k(i) - 1)} \sum_{j,k} (w_{ij} w_{ik} w_{jk})^{1/3}$ [Onnela, Saramäki, Kaski, Kertész]
$C_3(i) = \frac{\sum_{j,k} w_{ij} w_{jk} w_{ik}}{\left( \sum_k w_{ik} \right)^2 - \sum_k w_{ik}^2}$ [Zhang]
$C_3(i) = \frac{\sum_{j,k} w_{ij} w_{jk} w_{ki}}{(\max_{ij} w_{ij}) \sum_{j,k} w_{ij} w_{ik}}$ [Holme]
where for unweighted network we assume wij = 1 if [vi, vj] ∈ E, and 0 otherwise
Weighted Networks
Time-series as graphs
1. pre-processing: reduce measurement noise, reduce amount of data
2. calculate magnitude of correlations (possibly with thresholding)
$0 \le [W]_{ij} = \frac{\left| E[d_i d_j] - E[d_i]\, E[d_j] \right|}{\sqrt{\left( E[d_i^2] - E[d_i]^2 \right) \left( E[d_j^2] - E[d_j]^2 \right)}} \le 1$
3. construct a weighted graph assuming the weight matrix W
Weighted Networks
Spanning tree of a graph
• a tree topology containing all nodes of the graph
• possibly additional requirement to maximize or minimize the sum of edge
weights
• it can be used to emphasize clusters in the graph, but . . . a lot of information
is discarded and is also sensitive to noise and thresholding
Example (NYSE stocks)
Community Structure
Network communities
• so far, we considered local and
global structure and properties;
here, we look at spatial scale
in-between individual nodes and
the whole network - clusters
• clusters are obtained by network
partitioning or clustering
• our objective is to partition the
network using only its topology
Why clustering
• manage complex systems by creating hierarchy, for example,
Big Data analysis and classification such as large databases, customer
recommendations, website ranking, genomics, market evaluations etc.
• identify bridges and weak ties in networks
Formally
• find P disjoint subsets $V_i$, so that $\cup_{i=1}^{P} V_i = V$ and $V_i \cap V_j = \emptyset$ for $i \ne j$
Community Structure
Balanced partitioning
• given P, the size of partitions is approximately equal i.e. |Vi| ≈ |V|/P
• possibly also, the cut (the links between subsets) size can be minimized
Community
• there is a path through the community between every pair of nodes
• internal connection density significantly larger than density of external
connections
Cut size
• assume a weighted network, and the partition ˜V ⊂ V
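• the internal and external weights of a node $v_i \in \tilde{V}$ in the partition
$W_{int}(i) = \sum_{v_j \in \tilde{V}} w_{ij}$, $W_{ext}(i) = \sum_{v_j \notin \tilde{V}} w_{ij}$
• the cut size between $\tilde{V}$ and the rest of the nodes $V \setminus \tilde{V}$
$C_{cut}(\tilde{V}) = \frac{1}{2} \sum_{v_i \in V} W_{ext}(i)$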
Community Structure
Reducing cut size
• moving node $v_i$ in or out of the partition $\tilde{V}$ reduces the cut size by
$g(i) = W_{ext}(i) - W_{int}(i)$
so the cut size is reduced if $W_{ext}(i) > W_{int}(i)$
• for partitions already balanced, consider exchanging one node in the partition (i.e., move one node out and another node in); the cut size is then reduced by
$g(i, j) = g(i) + g(j) - 2w_{ij}$ if $v_i$ and $v_j$ are connected, and $g(i, j) = g(i) + g(j)$ otherwise
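The relation between the gain g(i) and the cut size can be verified on a small weighted graph. The sketch below is illustrative only: edge weights are stored in a dictionary keyed by node pairs, `part` is the set of nodes in the partition, and the names `cut_size` and `gain` are assumptions.

```python
def cut_size(weights, part):
    """Total weight of edges with exactly one end inside `part`."""
    return sum(w for (u, v), w in weights.items() if (u in part) != (v in part))


def gain(weights, part, node):
    """g(i) = W_ext(i) - W_int(i), relative to the side the node is currently on."""
    w_int = w_ext = 0
    for (u, v), w in weights.items():
        if node == u:
            other = v
        elif node == v:
            other = u
        else:
            continue
        if (other in part) == (node in part):
            w_int += w  # neighbour on the same side
        else:
            w_ext += w  # neighbour across the cut
    return w_ext - w_int


# Hypothetical weighted network and partition {a, b} versus {c, d}.
weights = {("a", "b"): 1, ("a", "c"): 3, ("b", "c"): 1, ("c", "d"): 1}
part = {"a", "b"}

before = cut_size(weights, part)
g_a = gain(weights, part, "a")
after = cut_size(weights, part - {"a"})  # move node a to the other side
print(before, g_a, after)  # 4 2 2: the cut shrinks by exactly g(a)
```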
Centrality based partitioning
• links connecting nodes in different communities are likely to have large edge
betweenness centrality (defined analogously to node betweenness)
Algorithm [Girvan, Newman, 2002]
1. calculate edge betweenness for all links, remove link with highest such value
2. recalculate edge betweenness for remaining links
3. repeat until all links have been removed
Community Structure
Modularity
• need to compare different partitions to decide which one is the best
• intuitively, cohesion or links density within the community is likely to be
significantly larger than if the community is formed at random
• for partitioning $\cup_{i=1}^{P} V_i = V$, with edges $E_i$ within the partition $V_i$, the modularity indicator ($c_i$ is the community assignment of vertex $v_i$)
$Q = \sum_{i=1}^{P} \left[ \frac{|E_i|}{|E|} - \left( \frac{\sum_{v_j \in V_i} k(v_j)}{2|E|} \right)^2 \right] = \dots = \frac{1}{2|E|} \sum_{i,j} \left( [A]_{ij} - \frac{k(i)\, k(j)}{2|E|} \right) \delta(c_i, c_j)$
so it is the actual number of edges minus the expected number of edges inside the community for a random subgraph with the same node degree distribution
• Q ≥ −1, and max Q = 1 for strong community structure
• can be used as stopping criterion (Q ≫ 0) in the Girvan-Newman algorithm
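Both forms of the modularity formula can be evaluated directly for small networks. The sketch below implements the per-community form and the pairwise form and applies them to a hypothetical network of two triangles joined by a single link; the function names and the example are assumptions of this illustration.

```python
def modularity(edges, community):
    """Per-community form: sum of |E_i|/|E| - (degree sum / 2|E|)^2."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    q = 0.0
    for label in set(community.values()):
        internal = sum(1 for u, v in edges
                       if community[u] == label and community[v] == label)
        degree_sum = sum(k for node, k in degree.items() if community[node] == label)
        q += internal / m - (degree_sum / (2 * m)) ** 2
    return q


def modularity_pairwise(edges, community):
    """Pairwise form: (1/2|E|) * sum over same-community pairs of A_ij - k_i k_j / 2|E|."""
    m = len(edges)
    nodes = sorted(community)
    adjacent = {frozenset(e) for e in edges}
    degree = {u: sum(1 for e in adjacent if u in e) for u in nodes}
    total = 0.0
    for i in nodes:
        for j in nodes:
            if community[i] == community[j]:
                a_ij = 1 if i != j and frozenset((i, j)) in adjacent else 0
                total += a_ij - degree[i] * degree[j] / (2 * m)
    return total / (2 * m)


# Hypothetical example: two triangles (0,1,2) and (3,4,5) joined by the link 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
community = {0: "left", 1: "left", 2: "left", 3: "right", 4: "right", 5: "right"}
q_sum = modularity(edges, community)
q_pair = modularity_pairwise(edges, community)
print(q_sum, q_pair)  # both equal 5/14, about 0.357
```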
Modularity optimization
• find partitioning with maximum modularity (exact solution is NP-complete): complexity $O\binom{|V|}{|V|/2} \sim \frac{2^{|V|}}{\sqrt{\pi |V| / 2}}$ for large |V|
Community Structure
Resolution problem
• modularity based clustering may fail
to identify obvious small clusters
close to a large cluster
• modularity is deficient if clusters are
circularly connected (pictured right)
• other similarity measures also
affected (minimum cuts, . . .)
Possible solution
• use multiple similarity metrics
• then choose the best partition by
consensus (e.g. majority vote)
Community Structure
Hierarchical clustering
• complexity O((|E| + |V|)|V|), many networks are sparse (|V| ≈ |E|)
Algorithm
1. Initialize: |V| communities of 1
vertex each
2. Calculate the modularity change ∆Q for all
pairs of existing communities
3. Merge the community pair
having the largest increase ∆Q
4. Build the dendrogram and
repeat steps 2 and 3 until only
one community remains
Clustering based on Euclidean distance
Community Structure
Merging clusters
• similarity between clusters can be measured as:
• Single linkage: minimum distance between all pairs of nodes in two clusters
• Complete linkage: maximum distance between all pairs of nodes in two clusters
• Average linkage: average distance between all pairs of nodes in two clusters
Limitations of modularity
• appears to be strongly dependent on the density of links in the network
• thus, not good measure to determine communities in sparse networks
Clustering techniques
1. Agglomerative (bottom-up) techniques: edges are added among nodes to
create communities (e.g. dendrogram)
2. Divisive (top-down) techniques: edges are removed from graph to create
separate communities
3. Spectral techniques: graph splitting based on eigen-analysis
Similarity measures
• quantify (dis)similarity between nodes to decide on communities in all
clustering algorithms
• selection strongly application dependent (modularity, cosine similarity,
Jaccard’s coefficient, . . .)
Community Structure
Louvain method (based on modularity optimization)
• more accurate and more efficient (much faster) than hierarchical clustering
• number of communities decreases quickly in only few iterations
Algorithm
1. Initialize: every node is in its own community
2. For each node i, consider all its neighbors j, and check if moving i into j’s
community increases ∆Q
3. Move i into community for which ∆Q is maximum
4. Repeat steps 2 and 3 until no further improvement possible (i.e. ∆Q = 0)
5. Collapse the communities into single nodes (merging multiple edges
between these new nodes), and go back to step 2
Community Structure
K-means clustering
• number of clusters K predefined
• minimize e.g. Euclidean distances:
$\{V_i\}_{i=1,\dots,K} = \arg\min \sum_{i=1}^{K} \sum_{v \in V_i} \| v - \bar{v}_i \|^2$, $\bar{v}_i = \frac{1}{|V_i|} \sum_{v \in V_i} v$
Algorithm
1. Initialize: select K vertices
at random as initial clusters
and assign remaining
vertices to nearest clusters
2. Calculate new centroids ¯vi
for each cluster
3. Re-assign all vertices to
the nearest clusters
4. Go to step 2 until some
stopping criterion is met
89% of data correctly classified
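The K-means steps above can be sketched in a few lines. For repeatability, this illustration takes the initial centroids as an argument instead of drawing them at random, stops when the assignment no longer changes, and uses squared Euclidean distance; the function name and the sample points are assumptions.

```python
def kmeans(points, centroids, max_iter=100):
    """Lloyd's iteration: assign points to the nearest centroid, then update centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [
            min(range(len(centroids)),
                key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])))
            for point in points
        ]
        if new_labels == labels:
            break  # stopping criterion: assignments did not change
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for i in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Hypothetical data: two well-separated groups of 2-D locations.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, centroids=[(0, 0), (10, 10)])
print(labels)     # [0, 0, 0, 1, 1, 1]
print(centroids)  # approximately (0.33, 0.33) and (10.33, 10.33)
```

With a poor choice of initial centroids the same code can converge to a different partition, which illustrates the sensitivity to initial conditions.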
Community Structure
Limitations of K-means clustering
• sensitivity to initial conditions and outliers
• sensitivity to non-homogeneous structure, i.e. clusters differ significantly in
size, connection density, non-spherical shape (for Euclidean distance metric)
Community Structure
Gaussian mixture models
• assume there are K clusters, vertex vi has location xi
• locations of vertices in cluster $V_i$ are normally distributed, $\sim N(x|\mu_i, \lambda_i)$
• let $z_i$, i = 1, . . . , K, be independent latent variables such that $z_i = 1$ if x belongs to cluster i, $\sim N(x|\mu_i, \lambda_i)$, and $z_i = 0$ otherwise, so $\Pr(z) = \prod_{i=1}^{K} \pi_i^{z_i}$
• if $z_i$ are known, the data are labeled (parameters of their distribution are known), otherwise the data are un-labeled (unsupervised learning)
• $\pi_i = \Pr(z_i)$ are mixing probabilities (weights), so that $\sum_{i=1}^{K} \pi_i = 1$ and the distribution of location x of a vertex is
$p(x) = \sum_z p(x|z)\, p(z) = \sum_{i=1}^{K} \pi_i N(x|\mu_i, \lambda_i)$, where $p(x|z) = \prod_{i=1}^{K} N(x|\mu_i, \lambda_i)^{z_i}$
• unknown parameters: mixing coefficients ($\pi_i$), means ($\mu_i$), covariances ($\lambda_i$)
• using Bayes' theorem, we can find posterior probabilities (responsibilities) $\Pr(z_i|x)$ that the i-th Gaussian component has in explaining the observed data
Algorithm [Expectation Maximization (EM)]
1. E-step: evaluate Pr(zi|x) given current parameters
2. M-step: re-estimate parameters using current Pr(zi|x)
Community Structure
Overlapping communities
• nodes may belong to more than one community (i.e. subsets Vi not disjoint)
Clique percolation method [Palla 2005]
• a K-clique is a set of K fully connected nodes
• two K-cliques are adjacent if they share K − 1 nodes
• K-clique community is a set of nodes
connected through adjacent K-cliques
Algorithm
1. identify maximal cliques in the
network (complex problem, but
fortunately many real-world
networks are relatively sparse)
2. consider cliques as single
nodes; interconnect cliques if
they share at least K − 1 nodes
3. identify connected components
in graph created in step 2
Community Structure
Spectral clustering
• K-means, Gaussian mixtures,
hierarchical method are good for
compact clusters
• spectral clustering transforms
the data into a new basis where
standard algorithms work well
Algorithm
1. Construct similarity matrix, $[S]_{ij} = \exp\left(-\|x_i - x_j\|^2 / 2\sigma^2\right)$
2. Construct Laplacian $L = D - S$ where D is the diagonal matrix of weights, $[D]_{ii} = \sum_j [S]_{ij}$
3. Construct matrix U of k eigenvectors corresponding to the k smallest eigenvalues of L (for the unnormalized Laplacian L = D − S, these carry the cluster structure)
4. Perform clustering on the transformed data $x' = U^T x$
Community Structure
Real-time clustering
• (dynamic) re-clustering for every new data arrival is expensive
• (dynamically) varying the number of clusters is confusing
Hierarchical Agglomerative Clustering [HAC, 2004]
1. Initialization: hierarchical clustering (e.g. using dendrogram)
2. new data are either assigned to one of the existing clusters, OR
3. new data form a new cluster, and two existing clusters are merged
Community Structure
Community analysis
• distribution of community sizes
• intra-community edge densities
• number of intra- and inter-community links
• average number of communities per node
• . . .
Community network
• communities → nodes
• edges weighted by number of links
between communities
Take-Home Messages
Network structure analysis
• structure of un-weighted static networks, i.e. knowing only their topology
• subgraphs, graphlets, fragments and motifs are building blocks of large
networks; the statistics of their occurrence is useful to compare network
topology beyond their degree distribution
• network partitioning or clustering to identify (overlapping) communities
Measures of network structure
• centrality (degree, betweenness, closeness, eigenvector, Katz, PageRank,
. . .)
• clustering coefficient, Rich-Club coefficient
• modifications of measures for weighted networks
Networks: Random Models
Statistical Modeling
Objectives
• account for model or parameter uncertainty, measurement noise etc.
• make (short-to-medium term) predictions from the models
• generate artificial data for verifying models and predictions
• decide how much randomness influences properties; here we compare
structural and functional properties of random and real-world networks
Milgram’s experiment [1967]
• famous “six degrees of separation”
• 300 randomly chosen people asked to send a letter
to a person in Boston
• repeated in 2003: 18 targets, 60k
senders, communication via emails
• new findings in 2003: median of 5 to 7 steps,
network structure is not
everything, high impact of incentives
• Facebook: 92% of users only 5 hops
away, 99% at 6 hops away
Random Network Models
Erdos-Renyi (ER) random graph [1959]
• graph GER(n, p) with n vertices and edges chosen independently with
probability p, “a zero-order approximation” of real-world networks
• thus, vertex degree is random (for large n, the binomial distribution is approximated
by a Poisson distribution)
$$\Pr(k) = \binom{n-1}{k} p^k (1-p)^{n-1-k} \approx e^{-\bar{k}} \frac{\bar{k}^k}{k!}, \quad \bar{k} = E[k] = (n-1)p$$
• most vertices have a near-average number of links to other nodes (i.e. degree close to $\bar{k}$)
• diameter (d) and average distance ($\bar{l}$) between two vertices are relatively small
compared to the size of the graph
$$d = \frac{\ln(n)}{\ln(p(n-1))} = \frac{\ln(n)}{\ln(\bar{k})} \approx \bar{l}$$
• average number of edges $E[|E|] = p\binom{n}{2} = \frac{n\bar{k}}{2}$; the latter, since $\bar{k} = \frac{2E[|E|]}{n}$
Connectivity of ER random graph
• average degree
$$\bar{k} \begin{cases} < 1 & \text{graph disconnected} \\ > 1 & \text{a giant component appears} \\ \geq \ln(n) & \text{graph (almost) completely connected} \end{cases}$$
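The three connectivity regimes can be checked numerically. The sketch below generates an ER graph with a chosen mean degree and measures the fraction of nodes in the largest component by breadth-first search. The function names, the seed and the graph size are assumptions made for illustration only.

```python
import math
import random
from collections import deque


def er_graph(n, p, seed=0):
    """G(n, p): every pair of vertices is linked independently with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def largest_component_fraction(adj):
    """Fraction of nodes in the largest connected component (BFS over all nodes)."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best / len(adj)


def connectivity_threshold(n):
    """Mean degree ln(n) above which the graph is (almost) completely connected."""
    return math.log(n)


def giant_fraction_for_mean_degree(n, mean_degree, seed=0):
    """Largest-component fraction of an ER graph with mean degree (n - 1) p."""
    p = mean_degree / (n - 1)
    return largest_component_fraction(er_graph(n, p, seed))
```

For n = 1000, a mean degree of 0.5 should leave only small components, a mean degree of 3 should put most nodes into one giant component, and a mean degree of twice `connectivity_threshold(n)` should connect essentially all nodes.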
Random Network Models
Average path length and diameter of ER (random) graph
• let l(i, j) be the length of the shortest path between vertices vi and vj
• the shortest paths can be combined into a single metric as
$$\bar{l} = \frac{1}{\binom{n}{2}} \sum_{i,j:\, i \neq j} \frac{l(i,j)}{2} \quad \text{(average shortest path)}$$
$$d = \max_{i,j} l(i,j) \quad \text{(maximum shortest path, i.e. diameter)}$$
• if n is large, the average path length $\bar{l} \propto \ln n$ is relatively small and
grows slowly with network size (this is typical for many large networks);
for comparison, 1D lattice (chain): $\bar{l} \propto n$, 2D lattice: $\bar{l} \propto n^{1/2}$
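Both metrics can be computed with one breadth-first search per vertex. The following is a minimal sketch for a connected, unweighted graph given as an adjacency dictionary; `ring` builds a closed chain (1D lattice with periodic boundary) as a hypothetical test case, for which the average path length grows linearly with n.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest path lengths l(source, j) to every reachable vertex j."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def path_metrics(adj):
    """Return (average shortest path, diameter); assumes a connected graph."""
    n = len(adj)
    total, diameter = 0, 0
    for s in adj:
        dist = bfs_distances(adj, s)
        total += sum(dist.values())
        diameter = max(diameter, max(dist.values()))
    # every unordered pair was counted twice, hence the division by 2
    average = (total / 2) / (n * (n - 1) / 2)
    return average, diameter


def ring(n):
    """Closed chain of n vertices, each linked to its two nearest neighbors."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
```

For a ring of 6 vertices the distances from any vertex are 1, 1, 2, 2, 3, so the average is 9/5 and the diameter is 3.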
Random Network Models
Clustering coefficient of ER graph
• ratio of neighbors being friends to all possible friendships among neighbors
• probability that two neighbors are connected is p, so clustering coefficient
$$C_{ER} = p \approx \frac{\bar{k}}{n}$$
which is much smaller than for real-world networks with the same density
• for large networks $\lim_{n\to\infty} C_{ER} = 0$, so large random networks resemble a
tree (i.e. they have no clustering)
Components in ER graphs
• if p (and thus, also ¯k) is small, there
are several disjoint components
• if p is increased, there is one giant
component (of size ≈ n) with the
rest of nodes being in isolated small
components
• the giant component appears when
$p \approx 1/n$, e.g. for $n = 10^3$
(see figure)
Random Network Models
Percolation transition (when increasing p)
1. Subcritical: $\bar{k} < 1$, many small simple components of size at most $\ln n$
2. Critical: $\bar{k} \approx 1$, size of largest component is $\sim n^{2/3}$, the giant component
appears and starts growing
3. Supercritical: $\bar{k} > 1$, there is one giant component of size almost n, the
second largest component has size about $\ln n$
Summary of ER graph
• degree distribution is Poisson (most nodes have degree close to average)
with no correlations of node degrees
• average path length is small and ∝ ln n
• connectivity depends on ¯k with percolation transition
Random Network Models
Random geometric models [Penrose, 2003]
• main motivation: some networks can grow subject to geometric constraints
• e.g., place n nodes randomly in (2D) space; two nodes i and j are connected
only if their distance $\|x_i - x_j\| \leq r$
• there exists a critical radius $r_c$ to form a connected giant component (if $r > r_c$):
$$r_c = \sqrt{\frac{\ln n + O(1)}{\pi n}}$$
Random distance models [Avin, 2008]
• n nodes placed randomly in (2D) space
• links created randomly with the probability $\propto f(d_{ij})$ where $d_{ij} = \|x_i - x_j\|$
Small World Networks
More on Milgram’s experiment
• how accurate is the six degrees of separation, and how likely is the chain to be completed?
Findings from real-world social networks
• a sub-optimal choice of the next link in the chain is made about 1/2 of the time
• Facebook measurements: average distance is 4.74
• Twitter measurements: average distance is 4.67 (50% are at 4 steps, nearly
everyone in 5 steps)
Small World Networks
Main features
high clustering: $C_{\text{real-world}} \gg C_{\text{random}}$
average path length: $\bar{l}_{\text{real-world}} \approx \bar{l}_{\text{random}}$
Watts-Strogatz (WS) small world network model [Nature, 1998]
• launched the interest in complex networks (over 3.5k citations)
• single control parameter to generate regular to purely random networks
• the model: 1. generate regular graph, 2. rewire links with probability p
• in all network generators: self-loops and duplicated links not allowed
Small World Networks
WS original model
• select a fraction p of edges and rewire one of their end-points
WS model alternative
• add a fraction p of edges to the initial regular lattice
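A minimal sketch of the original (rewiring) variant, assuming a ring lattice in which every node is linked to its k nearest neighbors on each side (degree 2k) and n > 2k. Each lattice edge is rewired with probability p by moving one end-point to a uniformly chosen node, avoiding self-loops and duplicated links. The function names and the seed handling are hypothetical.

```python
import random


def ring_lattice(n, k):
    """Regular ring: each node linked to k nearest neighbors on each side."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for j in range(1, k + 1):
            w = (v + j) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj


def watts_strogatz(n, k, p, seed=0):
    """Rewire each lattice edge (v, v + j) with probability p."""
    rng = random.Random(seed)
    adj = ring_lattice(n, k)
    for v in range(n):
        for j in range(1, k + 1):
            w = (v + j) % n
            if w not in adj[v] or rng.random() >= p:
                continue
            # new end-point: no self-loop, no duplicated link
            candidates = [u for u in range(n) if u != v and u not in adj[v]]
            if candidates:
                new = rng.choice(candidates)
                adj[v].remove(w)
                adj[w].remove(v)
                adj[v].add(new)
                adj[new].add(v)
    return adj


def clustering_coefficient(adj):
    """Average over nodes of (#edges among neighbors) / (#possible edges)."""
    total = 0.0
    for nbrs in adj.values():
        d = len(nbrs)
        if d < 2:
            continue
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += links / (d * (d - 1) / 2)
    return total / len(adj)
```

With p = 0 the lattice is returned unchanged; for k = 2 its clustering coefficient is (3k − 3)/(4k − 2) = 1/2, and it drops as p increases while the number of edges stays the same.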
Small World Networks
Properties of WS model:
Degree distribution
$$\Pr(k) = \sum_{i=0}^{\min(k-K,K)} \binom{K}{i} (1-p)^i p^{K-i} \frac{(pK)^{k-K-i}}{(k-K-i)!} e^{-pK}$$
∼ Poisson-like distribution
Clustering coefficient
• if node i has K neighbors,
$$C = \frac{\#\text{edges among K neighbors}}{\binom{K}{2}}$$
• probability that a connected triple is still connected after rewiring is $(1-p)^3$
• $C(p=0) = \frac{3k-3}{4k-2} \to \frac{3}{4}$ for large k, $C(p=1) \approx \frac{2k}{n}$
• $C(p) \approx C(p=0) \cdot (1-p)^3$, i.e., $C(p)/C(0) \approx (1-p)^3$
Average path length: $\bar{l} \approx \frac{(n-1)(n+2k-1)}{4kn}$
Small World Networks
Kleinberg’s geographical small world model [Nature, 2000]
• connectivity derived from geographical distances
• the model: 1. link nearest neighbors 2. add links with the probability
$$\Pr(\text{link between } u \text{ and } v) \sim \text{const} \times \|u - v\|^{-r}$$
where r is navigability exponent (e.g., links are purely random for r = 0)
Hierarchical small world model [Science, 2001]
• hierarchically nested groups, link probability $p_{ij} \sim \exp(-\alpha x_{ij})$
Other strategies for generative models of small world networks
• add/rewire links based on chosen properties of current links and edges
• add/rewire links to optimize particular property of the network
Small World Networks
Topology trade-offs
(a) commuter rail network
(b) star network
(c) minimum spanning tree network
Small World Networks
Bridges
• in social networks, close friends know what you know, and they also know
others who know what you know
• bridge between A and B (if removed, these two nodes become disconnected)
• local bridge between A and B (if removed, distance A-B increased to > 2)
Small World Networks
Strong and weak ties [Granovetter, 1974]
• in social networks, links are strong or weak ties (friends vs. acquaintances)
• strong triadic closure: if A-B and A-C are strong ties, then at least weak tie
between B-C exists
• if there are enough strong ties in network, local bridges must be weak ties
Almost local bridges
• neighbor overlap of nodes A and B:
$$\frac{\#\text{neighbors of both A and B}}{\#\text{neighbors of at least A or B}} = \frac{N(i,j)}{(k(i)-1) + (k(j)-1) - N(i,j)}$$
• almost local bridges are links
whose end-nodes have no common
neighbors (i.e., the overlap of their
neighbors is 0)
Small World Networks
Removing ties from social networks (percolation analysis)
• removing weak ties breaks down the network
• removing strong ties degrades the network more smoothly
• however, this is specific only to social networks
Removing ties from other networks
• e.g. removing an important road (strong tie) is more damaging
• central veins are more important than peripheral veins
Small World Networks
Illustration of weak vs. strong ties removal
(a) original network
(b) 80% of strongest links removed, 20% of weak ties remain
(c) 80% of weakest links removed, 20% of strong ties remain
• no evidence of degradation for (b), network clearly fragmented in case (c)
• strong links are within dense neighborhoods (triangles, cliques etc.)
• weak links (and bridges) interconnect these dense neighborhoods
Scale Free Networks
Power-law distribution
$\Pr(k) \sim \text{const} \times k^{-\gamma}$, typically $2 < \gamma < 3$
• i.e., a straight line in log-log domain:
log Pr(k) ∼ −γ log k + log const
• always a few highly connected hubs
Scale Free Networks
Power-law distribution
• some other distributions look like power-laws
• estimating γ may not be so easy
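• 1st and 2nd moments:
$$E[k] \propto \int_{k_0}^{\infty} k \times k^{-\gamma}\, dk = \lim_{k \to \infty} \frac{1}{2-\gamma} \left[ \frac{1}{k^{\gamma-2}} - \frac{1}{k_0^{\gamma-2}} \right] \begin{cases} = \infty & \text{if } \gamma \leq 2 \\ < \infty & \text{if } \gamma > 2 \end{cases}$$
$$E[k^2] \propto \int_{k_0}^{\infty} k^2 \times k^{-\gamma}\, dk \begin{cases} = \infty & \text{if } \gamma \leq 3 \\ < \infty & \text{if } \gamma > 3 \end{cases}$$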
Preferential attachment
• now, the concern is how to generate scale-free networks
• we use the rich-get-richer effect and add new nodes sequentially:
1. with probability p, choose any existing node and link to it
2. with probability 1 − p, link to an existing node chosen with probability proportional to
its current degree
Scale Free Networks
Barabási-Albert (BA) scale-free model
1. Growth: start from seed network of m0 isolated nodes
2. Preferential attachment: add a new node with m ≤ m0 edges to existing
nodes that are chosen with the probability $\Pi(i) = k(i) / \sum_j k(j)$
3. after t steps, the network has n = m0 + t nodes and mt edges, and
$$\Pi(i) = \frac{k(i)}{\sum_j k(j)} = \frac{k(i)}{2mt - m} \approx \frac{k(i)}{2mt}$$
• this procedure generates degree distribution
$$\Pr(k) = \frac{2m(m+1)}{k(k+1)(k+2)} \approx \frac{2m(m+1)}{k^3} \propto k^{-3}$$
so γ = 3 and average degree $\bar{k} = 2m$
• average shortest path length: $\bar{l} \propto \frac{\ln n}{\ln(\ln n)}$
• clustering coefficient: $C \propto \frac{(\ln n)^2}{n}$
( . . . too small for real-world networks)
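The growth and preferential attachment steps can be sketched as follows. One detail is an assumption of this sketch: since all seed nodes are isolated, the very first new node cannot use degree-proportional probabilities, so it picks its m targets uniformly at random. The function name and return format are hypothetical.

```python
import random


def barabasi_albert(m0, m, steps, seed=0):
    """Grow a BA network; returns (degree list, edge list). Requires m <= m0."""
    rng = random.Random(seed)
    degree = [0] * m0  # seed network of m0 isolated nodes
    edges = []
    for _ in range(steps):
        new = len(degree)  # index of the node being added
        targets = set()
        while len(targets) < m:
            if sum(degree) == 0:
                # assumption: uniform choice while all degrees are still zero
                candidate = rng.randrange(new)
            else:
                # preferential attachment: Pi(i) proportional to k(i)
                candidate = rng.choices(range(new), weights=degree)[0]
            targets.add(candidate)
        degree.append(0)
        for t in targets:
            edges.append((new, t))
            degree[new] += 1
            degree[t] += 1
    return degree, edges
```

After t steps the sketch holds m0 + t nodes and mt edges, matching step 3 above; seed nodes that were not chosen in the first step keep degree zero, which is a known quirk of starting from isolated nodes.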
Scale Free Networks
Question
• In scale-free networks, how much is “popularity” predictable?
Answer
• if we restart the process, different popular nodes will emerge
Other scale-free network models
• motivation: improve clustering coefficient
and allow changing the exponent γ
[Holme, Kim]
• after preferential attachment step,
with probability p, add one more
edge to randomly selected neighbor
• resulting clustering coefficient $C \propto 1/k$
( . . . much more realistic)
[Vazquez et al.]
• random walk instead of preferential attachment (“get to know important
people through people you already know”)
[Kleinberg, Kumar]
• copy a vertex and rewire its edges with certain probability
Scale Free Networks
Network generator with given γ
1. Initialize: seed network with m0 (isolated) nodes
2. Add one node and m links (not necessarily stemming from the new node) at
each time; after t time-steps, the link is added to node i with the probability
$$\Pi(i) = \alpha \frac{k(i)}{\sum_j k(j)} + (1-\alpha) \frac{1}{t + m_0}$$
where $\sum_j k(j) = 2mt$
3. thus, α = 1 leads to preferential attachment, and α = 0 is for uniform
attachment, and the degree distribution is
$$\Pr(k) \propto k^{-(1+1/\alpha)}$$
Models with non-power-law distribution
• power-law distribution is a good fit for large networks (averaging effect)
• on smaller scales, in more “specialized” sub-networks, power-law may not
be such a good fit
• log-normal distribution has been observed in such cases
Scale Free Networks
Configuration network model
• degrees are pre-assigned to n nodes assuming a degree distribution Pr(k)
• edges are added by randomly selecting pairs of these n nodes
• a family of graphs generated this way will have the same degree distribution
• excess degree is the number of possible outward links of a node that
has been reached during a walk (i.e., one less than the node degree in
undirected graphs)
$$\Pr(k_{\text{excess}} = k) = \frac{(k+1)\Pr(k+1)}{\sum_j j \Pr(j)}$$
Other models
• many other stochastic models of networks can be devised and then analyzed
• hence, it is important to define quality of such models, e.g. (generally):
– flexibility (design for specific parameter settings)
– mathematical tractability
– accuracy (to fit experimental data, make predictions)
Take-Home Messages
Random networks
• Erdos & Renyi studied a simple model in 1959
• it has Poisson degree distribution with small average path length, but
clustering goes to zero with network size
Small world networks
• the world is small, six degrees of separation (Milgram’s experiment)
• short average path length, but clustering still smaller than in real networks
• real-world networks contain weak and strong ties
• Watts & Strogatz proposed simple model of small world networks (in 1998)
Scale free networks
• main focus is to produce power-law distribution
• Barabási & Albert proposed a model based on preferential attachment (in
1999); many modifications of this model can be (and were) devised
Network models
• mostly stochastic; the main motivation is to emulate real-world networks
• find structural properties to explain specific (global) properties of networks
• useful to define quality of these models
Networks: Robustness
Robustness
Percolation
• monitor network metrics while nodes or edges are being removed
Dual problem
• monitor network metrics while nodes or edges are being added
1. What strategy to remove/add nodes or edges?
• no knowledge: nodes and edges removed (uniformly) at random
• knowledge of structure: removing nodes and edges with high centrality
• adding nodes and edges: cf. random network generators
2. Which metrics most relevant and should be monitored?
• rate of decay/growth of: network diameter, average degree, average
distance, size of giant component etc.
3. Which (class of) networks to consider?
• any network, networks with specific degree distribution etc.
4. Why consider robustness?
• in general, network resilience to attacks is a growing concern
• want to design networks that are robust to damage
Robustness
Pragmatic definition
• the network is robust if it can withstand accidental damage, random topology
changes as well as intentional attacks and remain operational
• this accounts for the remaining nodes and links to be able to carry flows and
perform other tasks without excessive congestion, dead-locks etc.
• observing average decay (e.g. size of giant component) may not be that
useful (e.g. it cannot identify local congestion further impairing the network)
• note also that we are still considering only networks with static topology
Example
• 50 nodes, removing 40 out of 116 edges decreases ¯k from 4.6 to 3.0
Robustness as Stability
Global stability
• system is stable if it returns to equilibrium after any perturbation
Resistance
• ability of a community to resist change in face of potentially perturbing force
Resilience
• ability of a community to recover to normal functioning after disturbance
Variability
• variations in community density over time (measured e.g. as changes in
mean/variance) due to external disturbances
Robustness
Percolation threshold
• if ¯k decreases by removing
edges, network suddenly becomes
disconnected
• if ¯k increases by adding edges, giant
component suddenly emerges
Examples
p is the probability of filling a square; at the critical p, a giant connected component appears
Robustness
Experiment [Barabási et al., 2000]
strategy: random failures versus targeted attacks removing nodes
metrics: average or maximum (network diameter) shortest path
networks: exponential versus scale-free (the same |V| and |E|)
Robustness
Experiment (cont.)
effect on size of
giant component s
and its average s
Robustness
Experiment (cont.)
(Internet and WWW)
effect on size of
giant component s
and its average s
Robustness of Scale-Free Networks
Random failures vs targeted attack
(a) original network of 574 nodes
(b) removing 20% (115) of nodes randomly
leaves 427 nodes in giant component
(c) removing only 2.8% (22) most connected hubs
leaves 301 nodes in giant component
Bottom line
• scale-free networks are robust against random failures
• they are very vulnerable against targeted attacks
Robustness of Scale-Free Networks
Impact of power-law exponent on robustness ($\sim k^{-\gamma}$)
• γ = 2.5: graceful degradation
• γ = 3.5: giant component disappears
at about f = 40%
• assume e.g. case of γ = 2.7 (square
markers)
• kmax is maximum degree among
remaining nodes
• removing only 1% of nodes discards
giant component (top figure)
• kmax has to be very small to destroy
giant component (bottom figure)
Robustness
Percolation threshold for random failures
• in general, minimum fraction of nodes required (i.e., that cannot be randomly
removed) for giant component to exist
$$f_c = \frac{E[k]}{E[k^2] - E[k]}$$
• specialized for random networks (Poisson degrees, so $E[k^2] = E[k]^2 + E[k]$):
$$f_c = \frac{1}{E[k]}$$
thus, if $\bar{k} = E[k]$ is large, random network can withstand large losses; e.g. if
$\bar{k} = 4$, then 1/4 of nodes is enough for giant component to exist (i.e., 3/4 of
nodes have to be removed to destroy giant component)
• specialized for scale-free networks:
$$f_c \longrightarrow 0$$
as $E[k^2]$ tends to be very large (even infinite) which makes these networks
very robust against random failures (and attacks)
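The threshold can be evaluated from any degree sequence by estimating the first two moments. In the small sketch below, the function name and both example sequences are hypothetical: the first has E[k] = 4 and E[k²] = 20, matching the Poisson relation E[k²] = E[k]² + E[k], and the second has a single dominant hub.

```python
def critical_fraction(degrees):
    """f_c = E[k] / (E[k^2] - E[k]) estimated from a degree sequence."""
    n = len(degrees)
    k1 = sum(degrees) / n                 # first moment E[k]
    k2 = sum(k * k for k in degrees) / n  # second moment E[k^2]
    return k1 / (k2 - k1)


# Hypothetical sequences used for illustration only
POISSON_LIKE = [2] * 50 + [6] * 50  # E[k] = 4, E[k^2] = 20
HUB_DOMINATED = [1] * 99 + [99]     # one hub inflates E[k^2]
```

`critical_fraction(POISSON_LIKE)` gives 1/4, in line with the example for a mean degree of 4, whereas the hub-dominated sequence yields a threshold of about 0.02 because its second moment is large.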
Take-Home Messages
Scale-free networks
• very robust against random failures (some suggest that this is the reason
why these networks are found so often in real world)
• but very vulnerable against attacks to highly connected hubs
• since hubs are also responsible (and effective) for spreading messages,
diseases etc. through the network
• in Social Networks, it is not hubs but rather weak ties and bridges that make
these networks vulnerable
Small world networks
• have not been considered here
• one extreme is a ring with one hop neighboring connections without any
shortcut links; such network is not robust at all
• another extreme is a fully connected network which is unbreakable
• small world networks are in-between these two extremes; their robustness
is likely derived from the density of shortcut links
Making networks more robust
• obvious strategy is to guarantee some minimum degree for every node (i.e.,
to achieve connection redundancy)
Networks: Processes
Epidemic Spreading
Network processes
• strongly influenced by network structure; e.g. shortcuts significantly speed
up spreading (of information, diseases) and synchronization of processes
• hence, understanding of such network(ed) (distributed) processes requires
understanding of the underlying network structures
• e.g. neurons integrate signals from neighbors, if above threshold, the
excitation fires and then fades away; this leads to oscillating cascades
• here, we consider diffusion of diseases characterized by contagion (lack
of choice), unlike information spreading where nodes make decisions to
maximize their pay-offs
Epidemic Spreading
Simple spreading models
• ring topology with shortcuts
• all nodes susceptible
• nodes infected with probability p
• spreading of disease, computer
viruses, . . .
• tree topology of spreading in waves
of k nodes, all nodes susceptible
• nodes infected with probability p
a) p is large, disease spreads out
b) p is small, disease dies out
Reproductive number: R0 = kp
a) if R0 < 1, disease dies out in finite
number of waves
b) if R0 > 1, disease very likely infects
at least 1 person in each wave
Epidemic Spreading
Limitations of simple models
• small changes in k and p can move R0 above or below threshold (R0 ≷ 1)
• network topology not realistic (e.g. no triangles)
• nodes get infected only once and never recover
More realistic: SI model
• two classes of nodes: S (susceptible) and I (infected)
• once infected, the node cannot recover
|V| = |S| + |I|: total number of nodes (V = S ∪ I)
$\beta = \lambda \bar{k}$: infection rate per node (0 ≤ λ ≤ 1)
β|S|/|V|: susceptible contacts per unit of time
$\frac{d|I|}{dt} = \beta |S||I|/|V|$: overall rate of infection
• let i = |I|/|V| be the fraction of nodes infected, then $\frac{di}{dt} = \beta\, i\, (1 - i)$ which yields a
logistic curve:
$$i(t) = \frac{i(0)\, e^{\beta t}}{1 - (1 - e^{\beta t})\, i(0)}$$
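The logistic solution can be cross-checked against a direct numerical integration of di/dt = β i (1 − i). This is a minimal sketch using forward Euler steps; the function names, the step size and the parameter values are assumptions for illustration.

```python
import math


def si_closed_form(i0, beta, t):
    """Logistic curve i(t) for the SI model with initial fraction i0."""
    e = math.exp(beta * t)
    return i0 * e / (1.0 - (1.0 - e) * i0)


def si_euler(i0, beta, t_end, dt=1e-3):
    """Forward Euler integration of di/dt = beta * i * (1 - i)."""
    i = i0
    for _ in range(int(round(t_end / dt))):
        i += dt * beta * i * (1.0 - i)
    return i
```

Both functions should agree closely, e.g. for i(0) = 0.01 and β = 0.5 at t = 10, and the closed form approaches 1 for large t, since nodes never recover in the SI model.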
Epidemic Spreading
More realistic: SIR model
• improve SI model by assuming infected nodes recover at rate υ, i.e., nodes
stay infected only for (average) time τ = 1/υ
• recovered node will become resistant (i.e. cannot be infected again)
• define fractions s = |S|/|V|, i = |I|/|V|, r = |R|/|V|, so s + i + r = 1; rate of
change of these fractions over time:
$$\frac{ds}{dt} = -\beta s i, \quad \frac{di}{dt} = \beta s i - \upsilon i, \quad \frac{dr}{dt} = \upsilon i$$
solution again requires initial conditions s(0), i(0) and r(0)
Possible outcomes
a) disease may die out
b) disease may spread to
whole network
c) disease becomes endemic
(does not spread, nor die
out)
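The SIR equations have no simple closed-form solution, so they are usually integrated numerically. The sketch below uses forward Euler steps and also records the peak infected fraction; the function name, step size and parameter values are hypothetical.

```python
def simulate_sir(beta, upsilon, s0, i0, r0=0.0, t_end=100.0, dt=0.01):
    """Integrate ds/dt = -beta*s*i, di/dt = beta*s*i - upsilon*i, dr/dt = upsilon*i."""
    s, i, r = s0, i0, r0
    peak = i
    for _ in range(int(round(t_end / dt))):
        new_infections = beta * s * i * dt
        new_recoveries = upsilon * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        peak = max(peak, i)
    return s, i, r, peak
```

With β = 0.5 and υ = 0.1 the infection first grows and then burns out, leaving most nodes recovered; with β = 0.05 and υ = 0.1 the infected fraction decreases from the start, so the recorded peak equals i(0).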
Epidemic Spreading
More realistic: SIS model
• no (permanent) recovery, but infected node may again become susceptible
• infected to susceptible rate υ:
a) if β > υ, logistic growth (as in SI model), but never infects whole population
b) if β → υ, then i → 0 (infection will slowly die out)
c) if β < υ, then infection dies out exponentially
• mathematical model (assumes r = 0):
$$\frac{ds}{dt} = \upsilon i - \beta s i, \quad \frac{di}{dt} = \beta s i - \upsilon i, \quad s + i = 1$$
Epidemic Spreading
Prognosis of epidemic
• reproductive number: R0 = β/υ
a) if R0 > 1, infection survives
b) if R0 < 1, infection dies out
• in SI model, υ → 0, so R0 ≫ 1
(a) SI model
(b) SIR model
(c) SIS model
Extensions of SIR model
• rather than assuming recovery after τ time units, let recovery be possible at
each time with some (fixed) probability
• infected state further subdivided (e.g. early, middle and final disease stages)
• non-homogeneous mixing: restrictions how the nodes meet (e.g. travel to
geographical locations, quarantining, . . . )
• other random network models (note that Erdos-Renyi model with
homogeneous mixing was implicitly assumed in SI, SIR and SIS models)
Epidemic Spreading
SIS model in scale-free networks
• experimentally observed that
computer viruses survive
significantly longer than
predicted from SIS model
over random networks
• it was found that there is no
epidemic threshold in scale-free
networks, so infections proliferate
independently of spreading rate
• however, there is critical fraction
of shortcuts in scale-free
networks; if enough shortcuts,
disease suddenly becomes
epidemic
• critical fraction of shortcuts is a
function of rates β and υ
Epidemic Spreading
Network immunization
• random networks: uniformly random immunization is helpful
• scale-free networks: targeted degree-based immunization required as
random immunization does not help
• targeted local immunization: immunize one immediate neighbor for every
node in a randomly selected group (i.e. nodes with higher degree are more
likely to be immunized)
red-circles: random immunization of scale-free network
red-squares: targeted immunization of scale-free network
black-squares: random & targeted immunization of random network
Take-Home Messages
Epidemic spreading
• practical modeling requires extracting model parameters from real data
• knowledge of node mobility is key to accurate modeling of spreading
• epidemic spreading strongly influenced by information diffusion (i.e. knowing
what is happening and what to do)
• predictive modeling (if epidemic spreading on-going, it is desirable to be in
real-time) is routinely used in practice as prevention
SARS prediction and comparison with real outbreak data
Network Dynamics
Spatial-temporal scales
(a) short: link activation and deactivation
– topology is a snapshot
– connected components must respect time sequences of links
(b) longer: topology change from one structure to another
– communities formation, merging, splitting
– large communities persist in time if there is exchange of their members
– small communities persist if their core is highly connected with strong ties
(c) long: network evolution (birth, growth and decline)
– in scale-free networks, in spite of changes (nodes and links appear and
disappear), degree, weight and strength distributions remain stationary
Information Cascades
Aims
• understand how behaviors, ideas, technology usage etc. are adopted,
influenced and spread through networks
Diffusion model
• two nodes v and w
• two behaviors A and B
• two pay-offs a > 0 and b > 0
(i.e., the larger, the better)
Network implications
• let 0 ≤ p ≤ 1, and there
are d neighbors of v
• pd neighbors of v
choose A
• (1 − p)d neighbors of v
choose B
• A is the better strategy if:
$$pd \cdot a \geq (1-p)d \cdot b \;\Rightarrow\; p \geq \frac{b}{a+b}$$
Information Cascades
Example diffusion in a network
• let a = 3, b = 2, so $\frac{b}{a+b} = 2/5$
• A: dark circles, B: light circles
(b) only v and w adopt A
(c) nodes r and t switch to A (i.e. 2/3 of their neighbors chose A); u does not switch (only
1/3 of its neighbors chose A); note also that: 1/3 < 2/5 < 2/3
(d) also nodes s and u switch to A
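The threshold rule p ≥ b/(a + b) can be iterated until no further node switches. The sketch below runs such a cascade on a hypothetical five-node chain (not the network from the figure); exact fractions are used so that comparisons with the threshold are not affected by rounding. The names `run_cascade` and `CHAIN` are illustrative.

```python
from fractions import Fraction


def run_cascade(adj, initial_adopters, a, b):
    """Nodes switch to A once the fraction of A-neighbors reaches b / (a + b)."""
    threshold = Fraction(b, a + b)
    adopters = set(initial_adopters)
    while True:
        switching = {
            v for v in adj
            if v not in adopters
            and Fraction(sum(1 for u in adj[v] if u in adopters), len(adj[v])) >= threshold
        }
        if not switching:
            return adopters
        adopters |= switching


# Hypothetical chain 0 - 1 - 2 - 3 - 4
CHAIN = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
```

With a = 3 and b = 2 (threshold 2/5), starting from adopters {0, 1} the whole chain switches to A; swapping the pay-offs (threshold 3/5) stops the cascade at the initial adopters, which illustrates how strongly the outcome depends on the pay-offs.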
Take-Home Messages
Cascades
• initial adoption by few nodes may generate complete cascade
• it is dependent on network structure
• it is also crucially dependent on threshold b/(a+b), so changing pay-offs can
make big difference (e.g. making the product more attractive)
• OR, directly influence key nodes (initial adopters)
• densely inter-connected clusters are difficult to penetrate
• key parameters: clusters connection density and pay-off threshold
Role of weak ties
• very useful in spreading information
• poor in transferring behaviors that are risky and/or costly
Influencing nodes
• in networks with many clusters, users are more easily influenced
• reinforcement is very important in influencing users
• node centrality is crucial for (information, behavior) diffusion
Networks: Algorithms
Max Flow and Min Cut
Scenario
• single source node s and single sink node t (for simplicity)
• directed edges between nodes represent flows (information, material, . . . )
• every edge assigned a weight representing max possible flow ≡ capacity
Max Flow and Min Cut
Dual problems (of combinatorial optimization)
1. find minimum cut of a graph G = (V, E) where V is set of nodes and E are
weighted edges (max flows)
2. find maximum possible total flow from s ∈ V to t ∈ V over E while flows at
every other node are equalized (in-flow = out-flow)
Cut (S, T)
• node partitioning V = S ∪ T such that S ∩ T = ∅ and s ∈ S and t ∈ T
Capacity of cut (S, T)
• sum of weights (capacities) leaving set S and entering set T
Max Flow and Min Cut
Minimum cut problem
• find the cut with the minimum capacity
Maximum flow problem
• assign flows to edges not larger than their capacity, so that total flow from s
to t is maximized and flows in all other nodes ($V \setminus \{s, t\}$) are equalized
Max Flow and Min Cut
Observation 1
• flow from S to T is equal to the total flow reaching sink t
Max Flow and Min Cut
Observation 2
• flow from S to T is at most equal to capacity of the cut
• if flow from S to T is equal to capacity of the cut, then we have maximum
possible flow from S to T and (S, T) is minimum cut
Max Flow and Min Cut
Greedy algorithm
1. select a path from s to t and set its flow to be equal to the minimum capacity
among its edges (≡ bottleneck)
2. for every edge, obtain residual capacity ≡ capacity - flow (“undo” flow sent):
i.e. add edge (w, v) to every edge (v, w) with positive residual capacity
3. augment path with strictly positive residual capacities
Max Flow and Min Cut
Ford-Fulkerson algorithm
• greedy algorithm to find a maximum flow
• find augmenting path with strictly positive residual capacities
• if path can no longer be augmented, the flow is maximum
Max-flow min-cut theorem
• The value of maximum flow is equal to the capacity of the minimum cut.
Complexity of Ford-Fulkerson algorithm
• assume capacities are integers 1, . . . , U
• Theorem 1: the algorithm terminates in at most |V| · U iterations.
• Theorem 2: if all edge capacities are integers, then the maximum flow has
integer values of flows on every edge.
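A compact sketch of the augmenting-path idea, assuming integer capacities stored as a nested dictionary. It picks each augmenting path by breadth-first search, i.e. the path with the fewest edges (the Edmonds-Karp variant of Ford-Fulkerson), and keeps reverse residual edges so that flow already sent can be undone. The function names and the example network are hypothetical.

```python
from collections import deque


def max_flow(capacity, s, t):
    """Return (max flow value, set of nodes on the source side of a min cut)."""
    # residual capacities, including reverse edges initialised to 0
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # BFS for an augmenting path with strictly positive residual capacities
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break  # no augmenting path left: the flow is maximum
        # bottleneck = minimum residual capacity along the path
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # push the bottleneck flow and update residual capacities
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck
    return flow, set(parent)  # nodes still reachable from s form the set S


def cut_capacity(capacity, source_side):
    """Sum of capacities of edges leaving the set S."""
    return sum(c for u, nbrs in capacity.items() if u in source_side
               for v, c in nbrs.items() if v not in source_side)


# Hypothetical example network
EXAMPLE_CAPACITY = {
    "s": {"a": 10, "b": 5},
    "a": {"b": 15, "t": 5},
    "b": {"t": 10},
    "t": {},
}
```

For `EXAMPLE_CAPACITY` the maximum flow from s to t is 15, and the capacity of the cut returned by the search is also 15, as the max-flow min-cut theorem requires.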
Max Flow and Min Cut
Choosing initial augmenting path
• some choices lead to exponential time algorithm, clever choices lead to
polynomial time algorithm (number of iterations):
1. choose path with fewest edges (shortest path, breadth first search)
2. choose path with maximum bottleneck capacity (fattest path, priority or
depth first search)
Application: Bipartite matching
• find maximum matching of a bipartite graph G
• solve max-flow problem for extended graph G′
• by integer theorem (see above), there exists a maximum flow with 0/1 values
Take-Home Messages
Applications of max-flow and min-cut theorem
• Network connectivity
• Bipartite matching
• Data mining
• Open-pit mining
• Airline scheduling
• Image processing
• Project selection
• Baseball elimination
• Network reliability
• Security of statistical data
• Distributed computing
• Egalitarian stable matching
• . . .
There are many efficient algorithms for solving max-flow min-cut problem.
Network Routing
Routing algorithms
• find the least cost path between any two nodes in the (telecommunication)
network
• link cost: e.g. inverse of capacity, delay, or more simply, all links have a unit
cost
• path cost: sum of link costs along the path
1. Link state routing algorithms
• assume every node has knowledge about network topology and all link costs
• thus, all nodes have the same (global) knowledge (how?)
• so-called centralized or link state algorithms
2. Distance vector routing algorithms
• only local knowledge of link costs to all neighbors
• iterative computations in collaboration with neighbors
• so-called decentralized or distance vector algorithms
Network Routing
Link state routing: Dijkstra's algorithm
• every node computes the least-cost paths to all other nodes in the network
• the computed paths are stored in a so-called forwarding table
• after K iterations, the least-cost paths are known for K destination nodes
Algorithm (notation):
c(x, y): link cost between neighbors x and y (= ∞ if not neighbors)
D(v): current cost of the path from the source to destination node v
p(v): predecessor node along the path from the source to node v
V′: set of nodes whose least-cost paths are already known
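Using the notation above, the algorithm can be sketched as follows. This is a minimal heap-based variant, not necessarily the one shown on the original slide; the six-node network and its link costs are hypothetical and serve only as an illustration.

```python
import heapq
import math


def dijkstra(cost, source):
    """cost[x][y] is the link cost c(x, y) between neighbors; returns (D, p)."""
    D = {v: math.inf for v in cost}   # D(v): current cost of the path to v
    p = {v: None for v in cost}       # p(v): predecessor of v on that path
    D[source] = 0
    known = set()                     # V': nodes with a known least-cost path
    heap = [(0, source)]
    while heap:
        d, x = heapq.heappop(heap)
        if x in known:
            continue                  # stale heap entry, x was already finalized
        known.add(x)
        for y, c in cost[x].items():
            if y not in known and d + c < D[y]:
                D[y] = d + c
                p[y] = x
                heapq.heappush(heap, (D[y], y))
    return D, p


def path_to(p, v):
    """Recover the least-cost path by tracing predecessors back to the source."""
    path = []
    while v is not None:
        path.append(v)
        v = p[v]
    return path[::-1]


# hypothetical undirected network given as (node, node, link cost)
links = [("u", "v", 2), ("u", "x", 1), ("u", "w", 5), ("v", "x", 2),
         ("v", "w", 3), ("x", "w", 3), ("x", "y", 1), ("w", "y", 1),
         ("w", "z", 5), ("y", "z", 2)]
graph = {}
for a, b, c in links:
    graph.setdefault(a, {})[b] = c
    graph.setdefault(b, {})[a] = c

D, p = dijkstra(graph, "u")
print(D["z"], path_to(p, "z"))  # 4 ['u', 'x', 'y', 'z']
```

Each pass through the loop moves one node into V′, so after K passes the least-cost paths to K nodes are final.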
Network Routing
Dijkstra algorithm example
• the shortest path is constructed by tracing back the predecessors
• if ties are encountered, they can be broken arbitrarily
Network Routing
Dijkstra algorithm example
Complexity of Dijkstra algorithm
• at each iteration, all nodes not yet in V′ must be checked; over N iterations this
takes N(N + 1)/2 comparisons, i.e. $O(N^2)$
• more efficient implementations exist (e.g. heap-based, for sparse networks), ∼ $O(N \log N)$
Network Routing
Distance vector algorithm
• fully distributed generation of forwarding tables
• based on Bellman-Ford equation (dynamic programming)
$$d_x(y) = \min_{v \in N(x)} \left( c(x, v) + d_v(y) \right)$$
$$v^* = \operatorname{argmin}_{v \in N(x)} \left( c(x, v) + d_v(y) \right)$$
$N(x)$: neighbors of node x
$c(x, v)$: link cost from x to v
$d_v(y)$: cost from neighbor v to destination y
$v^*$: next hop in the least-cost path from x to y
Example
$d_v(z) = 5$, $d_x(z) = 3$, $d_w(z) = 3$
$$d_u(z) = \min \{ c(u, v) + d_v(z),\; c(u, x) + d_x(z),\; c(u, w) + d_w(z) \} = \min \{ 2 + 5,\; 1 + 3,\; 5 + 3 \} = 4$$
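The same calculation can be checked with a few lines of Python. The helper name `bellman_ford_step` is hypothetical. The link costs c(u, ·) and neighbor costs d·(z) are the values from the example.

```python
def bellman_ford_step(link_cost, neighbor_dist):
    """One Bellman-Ford evaluation at node x for a single destination y.

    link_cost[v] is c(x, v); neighbor_dist[v] is d_v(y).
    Returns the pair (d_x(y), v*), i.e. the least cost and the next hop.
    """
    next_hop = min(link_cost, key=lambda v: link_cost[v] + neighbor_dist[v])
    return link_cost[next_hop] + neighbor_dist[next_hop], next_hop


c_u = {"v": 2, "x": 1, "w": 5}   # c(u, v), c(u, x), c(u, w)
d_z = {"v": 5, "x": 3, "w": 3}   # d_v(z), d_x(z), d_w(z)
cost, hop = bellman_ford_step(c_u, d_z)
print(cost, hop)  # 4 x
```

The minimizing neighbor is the next hop that node u would store in its forwarding table for destination z.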
Network Routing
Distance vector algorithm
• $D_x(y)$ is the estimate of the least cost from x to y; it is refined iteratively
• every node x maintains distance vectors for itself and for all its neighbors;
recall that V is the set of all nodes and N(x) is the set of neighbors of x
$$D_x = \{ D_x(y) : y \in V \}$$
$$D_v = \{ D_v(y) : y \in V \}, \quad v \in N(x)$$
x also knows the costs c(x, v) to all its neighbors v ∈ N(x)
• the key idea is to periodically exchange the distance vectors $D_x$ among neighbors;
the vectors are then updated using the Bellman-Ford (B-F) equation:
$$D_x(y) \leftarrow \min_{v \in N(x)} \left( c(x, v) + D_v(y) \right), \quad \forall y \in V$$
so (under some minor conditions) the estimate $D_x(y)$ converges to the true value $d_x(y)$
Distance vector updates (at each node)
1. asynchronous: each local iteration is triggered by a change of a local link cost, or by an update
message from a neighbor
2. distributed: a node notifies its neighbors only if its own distance vector changes
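The iterative estimation can be sketched as a small simulation. For simplicity this sketch assumes that all nodes exchange their vectors in lock-step rounds, which is a simplification of the message-triggered updates described above. The three-node network and the function name `distance_vector` are hypothetical.

```python
import math


def distance_vector(cost):
    """cost[x][v] is c(x, v) for neighbors v of x; returns (D, rounds).

    D[x][y] is the converged estimate of the least cost from x to y.
    """
    nodes = list(cost)
    # initially every node only knows the zero cost to itself
    D = {x: {y: (0 if x == y else math.inf) for y in nodes} for x in nodes}
    rounds = 0
    changed = True
    while changed:
        changed = False
        rounds += 1
        # vectors as received from the neighbors at the start of this round
        received = {x: dict(D[x]) for x in nodes}
        for x in nodes:
            for y in nodes:
                if x == y:
                    continue
                # Bellman-Ford update over all neighbors v of x
                best = min((cost[x][v] + received[v][y] for v in cost[x]),
                           default=math.inf)
                if best < D[x][y]:
                    D[x][y] = best
                    changed = True   # x would now notify its neighbors
    return D, rounds


# hypothetical network: x-y costs 2, y-z costs 1, x-z costs 7
net = {"x": {"y": 2, "z": 7}, "y": {"x": 2, "z": 1}, "z": {"x": 7, "y": 1}}
D, rounds = distance_vector(net)
print(D["x"]["z"])  # 3
```

The loop stops once no vector changes in a round, which corresponds to the state where no node has anything new to send.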
Network Routing
Example updates
Network Routing
“Good news travels fast”
“Bad news travels slowly”
Comparison
              Link state                           Distance vector
Messages      O(|V| · |E|) messages sent           local exchange only
Convergence   O(|V|^2), may have oscillations      time varies, possibly loops, count-to-infinity problem
Robustness    may advertise incorrect link cost;   may advertise incorrect path cost;
              each node computes its own table     each node's table is used by others (errors propagate)
Search on Networks
• the aim is to find some source-destination path in a reasonable amount of time
• unlike in routing, the path cost is not an issue
Surprising observations (from real-world networks)
1. short paths exist between pairs of nodes (six degrees of separation)
2. these short paths can be discovered (and used)
Remarks
• both observations are closely interrelated
• it is not so clear how to discover (or even create) these short paths
• typically, nodes have only local rather than global information;
flooding to discover the destination is known to be very inefficient
Decentralized search
• Kleinberg’s small-world network model: an n × n grid of nodes with local
connections, plus every node v has a random long-range link to a node w with probability
$$\Pr(v \text{ links to } w) \propto d(v, w)^{-\alpha}, \quad \alpha \ge 0$$
where the distance d(v, w) is the number of grid steps between v and w
• the value of α controls how random the long-range connections are
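A small simulation sketch of this model follows. It assumes greedy forwarding, where each node passes the message to whichever contact is closest to the destination in grid distance. That is one possible decentralized strategy and is not spelled out on the slide. The function names and the grid size are hypothetical.

```python
import random


def grid_distance(a, b):
    """d(v, w): number of grid steps between two nodes."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def build_long_range_links(n, alpha, rng):
    """Give every node one long-range contact w with Pr ~ d(v, w)^(-alpha)."""
    nodes = [(i, j) for i in range(n) for j in range(n)]
    links = {}
    for v in nodes:
        others = [w for w in nodes if w != v]
        weights = [grid_distance(v, w) ** (-alpha) for w in others]
        links[v] = rng.choices(others, weights=weights, k=1)[0]
    return links


def greedy_route(n, links, source, target):
    """Forward to the contact closest to the target; returns the hop count."""
    hops, v = 0, source
    while v != target:
        i, j = v
        contacts = [(i + di, j + dj)
                    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                    if 0 <= i + di < n and 0 <= j + dj < n]
        contacts.append(links[v])          # the single long-range contact
        v = min(contacts, key=lambda w: grid_distance(w, target))
        hops += 1
    return hops


rng = random.Random(0)                     # fixed seed for repeatability
links = build_long_range_links(10, 2, rng)
hops = greedy_route(10, links, (0, 0), (9, 9))
print(hops)
```

A grid neighbor always lies one step closer to the target, so the greedy walk never needs more hops than the grid distance between source and target. Long-range links can only shorten it.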
Search on Networks
Comparing search strategies
• the efficiency of a search strategy is measured by the expected delivery time (averaged over the random
long-range contacts, i.e. the topology, and over random source-destination pairs)
• delivery time ∼ number of hops in the graph (unit-weight links)
Trade-off in the value of α
• α = 0: long-range links are uniformly distributed (∼ WS model); the network is difficult to
navigate with only local knowledge plus the location of the destination
• for α = 0, the path actually chosen to the destination is likely to be significantly
longer than the corresponding shortest path
• α > 0: higher clustering, long-range links are less random, a more realistic scenario
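• lower bounds on the expected delivery time [Kleinberg 2000]
$$\bar{T}_D \ge \begin{cases} \text{const} \times n^{(2-\alpha)/3}, & 0 \le \alpha < 2 \\ \text{const} \times (\log n)^2, & \alpha = 2 \\ \text{const} \times n^{(\alpha-2)}, & 2 < \alpha < 3 \end{cases}$$
thus, for α = 2 the bound is a polynomial in log n, while in the other cases it is a polynomial in n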
Search on Networks
Web search
• information retrieval since the 1960s using “textual analysis”
• more recently, information is ranked by its score (e.g., #links to it)
Scoring a webpage
• #webpages pointing to it (unit-weight links)
• sum of the scores of neighboring webpages pointing to it
Search on Networks
Authorities
• nodes pointed to by highly ranked nodes
• they offer prominent, highly endorsed answers to queries
Hubs
• nodes that point to highly ranked nodes
Assessing authorities and hubs
• compute weights h(i) (for hubs) and b(i) (for authorities), where $[A]_{ij} = 1$ if node i points to node j
$$h(i) = \sum_j [A]_{ij}\, b(j)$$
$$b(i) = \sum_j [A]_{ji}\, h(j)$$
• the weights are computed iteratively as (in matrix form)
$$h_{t+1} = (A A^T)\, h_t$$
$$b_{t+1} = (A^T A)\, b_t$$
• main drawback: it requires global knowledge (of A), and it is query-dependent
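The iteration can be sketched in plain Python as below. Two assumptions are made that the slide does not state: the adjacency matrix uses the convention A[i][j] = 1 when page i links to page j, and the weights are rescaled to sum to one after every step so that they stay bounded. The four-page example graph is hypothetical.

```python
def hits(A, iterations=50):
    """Hub weights h and authority weights b for adjacency matrix A."""
    n = len(A)
    h = [1.0] * n
    b = [1.0] * n
    for _ in range(iterations):
        # authority of i: sum of the hub weights of the pages pointing to i
        b = [sum(A[j][i] * h[j] for j in range(n)) for i in range(n)]
        # hub weight of i: sum of the authority weights of the pages i points to
        h = [sum(A[i][j] * b[j] for j in range(n)) for i in range(n)]
        # rescaling (an assumption of this sketch) keeps the weights bounded
        total_b = sum(b) or 1.0
        total_h = sum(h) or 1.0
        b = [x / total_b for x in b]
        h = [x / total_h for x in h]
    return h, b


# hypothetical graph: page 0 links to pages 2 and 3, page 1 links to page 2
A = [[0, 0, 1, 1],
     [0, 0, 1, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]
h, b = hits(A)
print([round(x, 3) for x in h])
print([round(x, 3) for x in b])
```

In this graph page 2 ends up as the strongest authority and page 0 as the strongest hub, since page 0 points to both pages that receive links.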
Search on Networks
PageRank (named after Google co-founder Larry Page)
• ranking pages independently of queries
• main idea: a page is important if it is linked to by other important pages
• every page j is assigned a weight
$$w(j) = \sum_i [A]_{ij}\, \frac{w(i)}{d_{\text{out}}(i)}$$
$w(i)$: weights of the in-bound neighboring pages
$d_{\text{out}}(i)$: out-degree of node i, which dilutes its importance
if it links to many other nodes
• the weight w(i) is the probability that, from any starting page, page i is
reached via a random walk
• however, if some page does not have out-bound links, the random walker
gets trapped; so with probability s the walker follows a random out-bound link, and with probability
(1 − s) it jumps randomly to any other node
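The weight computation with random jumps can be sketched as a power iteration. Several details here are assumptions rather than part of the slide: the value s = 0.85, a jump that lands uniformly on any page (including the current one), and the treatment of pages without out-bound links as always jumping. The four-page graph is hypothetical.

```python
def pagerank(A, s=0.85, iterations=100):
    """Page weights w for adjacency matrix A (A[i][j] = 1 if i links to j)."""
    n = len(A)
    d_out = [sum(row) for row in A]        # out-degree of every page
    w = [1.0 / n] * n                      # start from a uniform distribution
    for _ in range(iterations):
        new = [(1.0 - s) / n] * n          # random jump with probability 1 - s
        for i in range(n):
            if d_out[i] == 0:
                # page without out-bound links: the walker always jumps
                for j in range(n):
                    new[j] += s * w[i] / n
            else:
                # follow a random out-bound link, diluted by the out-degree
                for j in range(n):
                    if A[i][j]:
                        new[j] += s * w[i] * A[i][j] / d_out[i]
        w = new
    return w


# hypothetical graph: 0 -> 1, 1 -> 2, 2 -> 0, 3 -> 2
A = [[0, 1, 0, 0],
     [0, 0, 1, 0],
     [1, 0, 0, 0],
     [0, 0, 1, 0]]
w = pagerank(A)
print([round(x, 3) for x in w])
```

The weights stay a probability distribution at every step, so they can be read as the long-run fraction of time the random walker spends on each page.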
Search on Networks
Strategies
• many strategies may be devised; some are more efficient than others
• decentralized search is a practical requirement in large networks
• in social networks, weak (social) ties and hierarchy play a significant role
• visiting the same nodes repeatedly while searching is inefficient, yet there is a tendency
to visit hubs often
To aid the search
• nodes as sources of information are scored (e.g. by level of trust)
• exploiting the network structure of (distributed) information helps significantly
• challenge: real-time updates of content
• ranking (i.e. scoring) algorithms are kept secret and are updated
continuously
Take-Home Messages
Routing
• the aim is not only to find a source-destination path, but the one with the least cost
• it is implicitly assumed that each node has an address (identification)
• routing in the Internet has evolved over time (i.e., it was not designed from
the beginning)
• it is still unclear why Internet routing works so well at such large scales
• the main issues with Internet routing are robustness, security and congestion
Search on small world and scale free networks
• small world networks have a small average shortest-path length and a high clustering
coefficient; however, the Watts-Strogatz (WS) model does not capture
the navigability of real-world networks
• search is fast and scales well in scale-free networks
Networks: Software
Software Requirements for Graph Data
Tasks
• input data in common format (e.g. Excel, CSV, . . . )
• convert (output) data into the desired format (GraphML, Pajek, . . . )
• Social Network Analysis (SNA) of data
• dynamic (temporal) analysis
• data visualization
Requirements
• gentle learning curve (easy to grasp)
• flexibility (use different formats for input and
output)
• scalability (Big Data, application dependent)
• speed (if Big Data or real-time)
• parallel and distributed computing capability
(MapReduce)
• functionality as modules or add-ins
• . . .
Networks in Matlab
Networks with Python
Networks in C, R, Python
Networks Visualization and Analysis
Networks Community Analysis
Social Network Analysis
Popular in Bioinformatics
Networks Online Demos
Networks Data

Minicourse on Network Science

  • 1. A Mini-Course on Network Science Pavel Loskot p.loskot@swan.ac.uk
  • 2. Pavel Loskot © 2014 1/3 Course Outline 1. Introduction • fundamentals of complex systems and graph theory 2. Structure • sub-graphs, centrality measures, weighted networks, community 3. Random Models • random, small world and scale free networks 4. Robustness • some definitions and metrics 5. Processes • epidemic spreading and information cascades 6. Algorithms • max flow and min cut, routing, search and navigation 7. Software • using Matlab and Python, available software, a few demos from YouTube
  • 3. Used Resources
    • Ernesto Estrada, The Structure of Complex Networks: Theory and Applications, Oxford University Press, 2011
    • Cecilia Mascolo, Social and Technological Network Analysis, course at University of Cambridge, UK
    • Jari Saramäki, Introduction to Complex Networks, Aalto University, Finland
    • Animesh Mukherjee, Complex Network Theory, IIT Kharagpur, India
  • 4. Pavel Loskot c 2014 3/3 Used Resources Robert Leese An Introduction to Clustering Industrial Mathematics Knowledge Transfer Network Kevin Wayne Max Flow, Min Cut Princeton University, USA James F. Kurose and Keith W. Ross Computer Networking, A Top-Down Approach Pearson Education, 2012 Wikipedia various topics
  • 6. Pavel Loskot c 2014 1/22 Complex Systems Emergence of complexity: • locally simple rules, and yet globally complex behavior • systems evolve, are dynamic and adapt to the environment Modeling: • infinitely many possibilities • normally data-driven, but what data to collect? Emergence of stochasticity: God doesn’t play dice with the world. • many entities, complex interactions • often useful to describe observations statistically (joint PDF, correlations) • human beings are living at the edge of stochastic and deterministic world
  • 7. Illustration of Complexity
    Simple idea:
    • send packets between two nodes
    Implementation:
    • how to distinguish end-nodes?
    • how to find the route?
    • how to share the network (resources) among billions of end-nodes?
    • how to deal with lost and delayed packets?
    • how to deal with mobility and with nodes leaving and arriving?
    Solution:
    • evolution: solve problems iteratively
    • separation: divide and conquer
    • new problems emerge as the network grows: scalability, stability, security
  • 8. Pavel Loskot c 2014 3/22 Emergence of Order Differing (spatial-temporal) perspectives • insider: interacting with immediate neighbors (immediate, local) • outsider: system level perception (average, global)
  • 9. Pavel Loskot c 2014 4/22 Description of Networks 1. Complete: everybody connected with everybody else 2. Random: connections selected arbitrarily at random 3. Random tree: connections selected arbitrarily at random, no cycles allowed 4. Real-world networks: • exponential degree distribution and strongly disassortative • small average path length and high clustering coefficient • several nodes with high (degree, closeness and betweenness) centrality • several main communities . . . and many other distinctive characteristics Challenge: how to synthesize real-world networks with all these properties?
  • 10. Formal Definitions
    Network
    • graph model of functional and/or structural relationships of a complex system
    Time-invariant network
    • graph $G = (V, E)$ with the set of nodes $V = \{v_1, \ldots, v_N\}$ and the set of edges (links) $E = \{e_1, \ldots, e_L\} \subset V \otimes V$, i.e., every edge $e_l \in E$ is associated with one pair $(v_i, v_j) \in V \otimes V$; in other words, $E$ is a set of (un)ordered pairs from $V$
    • self-edges are not allowed ($[v_i, v_i] \notin E$), nor are duplicate edges ($E$ has unique elements)
    • nodes and edges are objects, but for analysis and evaluation purposes we need numbers, i.e., we assign numbers (called weights) to nodes and edges:
      $v_n \to W_v(n)$, $e_l \to W_e(l)$, $e_l = [v_i, v_j] = W_v(i, j)$
    Dynamic networks
    • graphs (nodes, edges) as well as weights can vary over time
  • 11. Pavel Loskot c 2014 6/22 Formal Definitions (cont.) Graph edges (in structural models) • only if two nodes communicate; this communication can be implemented in many different ways (radiation, material transport flows, . . .) • communicating nodes interact i.e. influence each other’s behavior • communications are, first of all, information flows: Two nodes communicate • if there is enough information delivered (just sent is not enough) over a given time-window i.e. communication is integral (average) quantity • delivered information may be ignored, not recognized, or misinterpreted
  • 12. Pavel Loskot c 2014 7/22 Fundamentals (of Graph Theory) (Un)directed graphs • for directed graphs, E is a set of ordered pairs [u, v] ∈ V ⊗ V Neighbors, degrees • u is neighbor of v if (u, v) ∈ E, then u and v are said to be adjacent nodes • (u, w), (v, w), (y, w) and (x, w) are adjacent edges which are incident at w • in-degree kin, out-degree kout and k = kin + kout degree distributions are important statistics (this assumes all edges counted with unit weights)
  • 13. Fundamentals
    Isomorphic graphs
    • $G_1$ and $G_2$ are isomorphic if there is a one-to-one mapping between their vertices that preserves the (possibly directed) edges (i.e., they are different visualizations of the same graph)
    Edge (or connection) density
      $\rho = \frac{|E|}{\binom{|V|}{2}} = \frac{2|E|}{|V|(|V|-1)}$
    • $\binom{|V|}{2} = \frac{|V|(|V|-1)}{2}$ is the maximum possible number of edges
    • $\rho = 1$ if fully connected; real-world networks have $\rho \ll 1$ (i.e., they are sparse)
    • graph is sparse if $|E| \approx |V|$, graph is dense if $|E| \approx |V|^2$
    Clique
    • $\tilde{G} = (\tilde{V}, \tilde{E})$ is a subgraph of $G = (V, E)$ if $\tilde{V} \subseteq V$ and $\tilde{E} \subseteq E$
    • clique is a maximal, completely connected subgraph of the graph
    • N-clique is a fully connected subgraph with $N$ vertices
    • clique number is the size of the largest clique in the graph
  • 14. Fundamentals
    Path, walk, trail
    • path from $v_1$ to $v_L$ is an ordered sequence of edges between an ordered list of vertices such that no vertex is visited twice
    • length of a path is the number of its edges (i.e., assuming edges of unit length)
    • if there is no path between two vertices, their path length is infinite
    • distance of two vertices is the length of their shortest path
    • walk of length $L$ from $v_1$ to $v_{L+1}$ is a sequence $[v_1, \ldots, v_{L+1}]$ where only two subsequent vertices are required to be different
    • trail is a walk with no repeated edge
    • cycle is a path that starts and ends at the same vertex
    Diameter ($d$) and radius ($r$) of a graph
    • the largest and the smallest eccentricity, respectively, where the eccentricity of a vertex is the longest shortest path starting at it:
      $d = \max_{u,v \in V} \mathrm{distance}(u, v)$, $r = \min_{u \in V} \max_{v \in V} \mathrm{distance}(u, v)$
  • 15. Fundamentals
    Average path length
    • it is the average shortest path between all pairs of vertices
      $\bar{d} = \frac{1}{2\binom{|V|}{2}} \sum_{u,v \in V} \mathrm{distance}(u, v)$
    • if some vertices $u$ and $v$ are disconnected (i.e., there is no path connecting $u$ and $v$), the average path length is computed as a harmonic mean instead (the reciprocal of the average reciprocal distance, so that disconnected pairs contribute $1/\infty = 0$)
      $\bar{d} = \left( \frac{1}{2\binom{|V|}{2}} \sum_{u,v \in V} \frac{1}{\mathrm{distance}(u, v)} \right)^{-1}$
    Graph coloring
    • assign labels to vertices so that no adjacent vertices get the same label
    • chromatic number is the minimum number of colors needed to solve the coloring
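The distance-based quantities above (diameter, radius, average path length) can all be obtained from breadth-first search on an unweighted graph. The following is a minimal sketch, assuming a connected undirected graph stored as an adjacency list and taking the radius as the smallest eccentricity; the function names and the five-node path graph are illustrative and not taken from the slides.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop-count distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def path_statistics(adj):
    """Return (diameter, radius, average path length); assumes a connected graph."""
    n = len(adj)
    eccentricities = []
    total = 0
    for u in adj:
        dist = bfs_distances(adj, u)
        eccentricities.append(max(dist.values()))  # longest shortest path from u
        total += sum(dist.values())                # sums over ordered pairs (u, v)
    return max(eccentricities), min(eccentricities), total / (n * (n - 1))

# Hypothetical example: the path graph A-B-C-D-E.
path_graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
              "D": ["C", "E"], "E": ["D"]}
diameter, radius, avg_len = path_statistics(path_graph)
```

The sum runs over ordered pairs, which matches the normalization $2\binom{|V|}{2} = |V|(|V|-1)$ in the formula.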
  • 16. Pavel Loskot c 2014 11/22 Fundamentals Connectivity • connected component is a subgraph where there is a path between every pair of vertices; for directed graphs, the directions can be ignored • connected graph if there is a path between every pair of its vertices; in other words, the graph contains a single connected component • (sub)graphs not connected are disconnected • node connectivity is the smallest number of vertices when (they are) removed, the graph becomes disconnected • edge connectivity is the smallest number of edges when (they are) removed, the graph becomes disconnected • strongly connected component if its every vertex is reachable from any other of its vertex (i.e., edge directions matter here) Cutsets • vertex cutset is a set of vertices when removed disconnect the graph (i.e. increases the number of graph components); they are also known as articulation points or brokers (in social networks) • edge cutset is a set of edges when removed disconnect the graph
  • 17. Pavel Loskot c 2014 12/22 Fundamentals • vertices G and H in graph below are cutset vertices • bridges: if removed, the number of graph components increases Basic graphs
  • 18. Pavel Loskot c 2014 13/22 Fundamentals Tree • connected graph with no cycles (adding only one link creates a cycle) • becomes disconnected by removing any single link • any pair of nodes is connected by exactly one path • spanning tree is subgraph of a network including all its nodes and it is a tree R-regular graph • all vertices have degree R and there are |E| = R|V|/2 edges Planar graph • can be drawn in a 2D plane such that no two edges intersect • among all complete graphs Cn, only C1, C2, C3 and C4 are planar • example of embeddings of C4
  • 19. Pavel Loskot c 2014 14/22 Basic Graphs Bipartite networks • two sets of vertices, only edges between vertices in these two sets allowed • graph is bipartite if it does not contain any odd cycles • generalization to more than two sets of vertices is possible Graph matching • given graph G = (V, E), a matching M ⊆ E in G is a set of edges not sharing any common vertex • maximal matching any edge added to M will violate matching • maximum matching contains the largest number of edges
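  • 20. Adjacency Matrix
    • (binary) adjacency matrix
      $[A]_{ij} = \begin{cases} 1 & \text{if } [v_i, v_j] \in E \\ 0 & \text{otherwise} \end{cases}$
    • for undirected graphs, $A$ is symmetric (i.e., $A = A^T$), and the node degrees (from which the degree distribution follows) are
      $k = (\mathbf{1}^T A)^T = A \mathbf{1}$
    • for directed graphs, $A$ is asymmetric:
      $k_{in} = (\mathbf{1}^T A)^T$ (in-degrees), $k_{out} = A \mathbf{1}$ (out-degrees)
    • average degree of a graph
      $\bar{k} = \frac{2|E|}{|V|} = \frac{\mathbf{1}^T k}{|V|} = \frac{\mathbf{1}^T A \mathbf{1}}{|V|} = \frac{\sum_{u \in V} k(u)}{|V|}$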
  • 21. Adjacency Matrix
    • for undirected bipartite graphs with vertices $V = V_1 \cup V_2$, $|V_1| = n_1$, $|V_2| = n_2$,
      $A = \begin{bmatrix} 0_{n_1 \times n_1} & R_{n_1 \times n_2} \\ R^T_{n_2 \times n_1} & 0_{n_2 \times n_2} \end{bmatrix}$
    • generally, $A^n$, $n = 1, 2, \ldots$ counts the paths of length $n$ in the graph, i.e., $[A^n]_{ij}$ is the number of distinct n-hop paths between vertices $i$ and $j$ (strictly speaking walks, since vertices may be revisited)
    • $[A^T A]_{ij}$ is the number of vertices with edges to both $v_i$ and $v_j$, and $[A A^T]_{ij}$ is the number of vertices with edges from both $v_i$ and $v_j$
    • $\mathrm{tr}(A^3)/6$ is the number of triangles in the graph
    • open and closed triangles (pictured)
    • closed triangle represents 6 closed triplets (starting at each of 3 vertices in 2 directions)
    Incidence matrix
    • $|V| \times |E|$ matrix,
      $[B]_{ij} = \begin{cases} 1 & \text{if } v_i \in e_j \\ 0 & \text{otherwise} \end{cases}$
    • degree matrix is a diagonal matrix $D = \mathrm{diag}(k_1, \ldots, k_{|V|})$
    • adjacency matrix can also be expressed as $A = B B^T - D$
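The matrix identities above are easy to check numerically. The sketch below uses plain Python lists rather than a matrix library; the helper names and the small example graph (a triangle with one pendant vertex) are assumptions made for illustration.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    inner, cols = len(Y), len(Y[0])
    return [[sum(row[k] * Y[k][j] for k in range(inner)) for j in range(cols)]
            for row in X]

def count_triangles(A):
    """Number of triangles of an undirected simple graph, tr(A^3) / 6."""
    A3 = matmul(matmul(A, A), A)
    return sum(A3[i][i] for i in range(len(A))) // 6

# Hypothetical example: triangle 0-1-2 with a pendant vertex 3 attached to 2.
A = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]

degrees = [sum(row) for row in A]   # k = A 1
A2 = matmul(A, A)                   # [A^2]_ij counts 2-hop walks from i to j
triangles = count_triangles(A)
```

For an undirected graph the diagonal of $A^2$ reproduces the degrees, since every edge can be walked out and back.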
  • 22. Adjacency Matrix
    Graph spectrum
    • recall that for an undirected graph the adjacency matrix is symmetric, so its eigenvalues are real-valued and referred to as the graph spectrum
    • eigenvalue $\lambda$ and eigenvector $v$ satisfy $A v = \lambda v$, i.e., $(\lambda I - A) v = 0$
    • the roots of the characteristic polynomial $p_A(t) = \det(tI - A) = \prod_i (t - \lambda_i)$ of matrix $A$ are the eigenvalues of $A$
    • Laplacian matrix of graph $G$ is $L = D - A$ (degree minus adjacency matrix):
      $[L]_{ij} = \begin{cases} [D]_{ii} = k(i) & i = j \\ -1 & i \neq j \text{ and } [v_i, v_j] \in E \\ 0 & \text{otherwise} \end{cases}$
      and the spectrum of graph $G$ is often taken to be the eigenvalues of $L$ (rather than of $A$)
    Properties of Laplacian
    • multiplicity of $\lambda_0 = 0$ of $L$ is the number of connected components of $G$
    • eigenvalue $\lambda_0 = 0$ corresponds to eigenvector $v_0 = [1, \ldots, 1]^T$, i.e., $L v_0 = 0$
    • $L = B B^T$ where $B$ is the $|V| \times |E|$ oriented incidence matrix of graph $G = (V, E)$, i.e., each column has one entry $+1$ and one entry $-1$ at the two end-vertices of the edge (with the unsigned incidence matrix, $B B^T = D + A$ instead)
  • 23. Power-Law Distribution
    • long tail (right) with many low-connected vertices (left) (80-20 rule)
    • many real-world networks exhibit this degree distribution, so they have a star-like topology
    • also known as the scale-free distribution of scale-free networks; these networks are self-similar at different (spatial-temporal) scales
      $p(k) = A k^{-\gamma} \;\Rightarrow\; p(c \cdot k) = c^{-\gamma} \cdot p(k)$
    • cumulative degree distribution (CDD), the probability that the degree is at least $k$:
      $P(k) = \sum_{k'=k}^{\infty} p(k') \approx k^{-(\gamma - 1)}$
  • 24. Analyzing Degree Distributions
    Degree-degree correlations
    • assortativity coefficient or Pearson correlation coefficient ($r$)
    Assortative mixing ($r > 0$)
    • bias towards connections between nodes with similar characteristics (hubs tend to connect to each other)
    • useful, e.g., to understand the spread of diseases and their treatment
    Disassortative mixing ($r < 0$)
    • dissimilar nodes tend to connect to each other (hubs avoid each other)
    Neutral mixing ($r = 0$)
    • connections follow some probability distribution
  • 25. Analyzing Degree Distributions
    Mathematically:
    • let $p_{ij}$ be the probability that an edge has degrees $k_i$ and $k_j$ at its two ends:
      $\sum_{i,j} p_{ij} = 1$, $\sum_i p_{ij} = q_j = \frac{k_j p_j}{\sum_j k_j p_j}$
    • perfectly assortative networks have $p_{ij} = q_i \delta_{ij}$ (only nodes of the same degree connect)
    • if the degrees are independent, then $p_{ij} = q_i q_j$
    • Pearson coefficient, $-1 \le r \le 1$:
      $r = \frac{E[(k_i - E[k_i])(k_j - E[k_j])]}{\sqrt{\sigma^2(k_i)\, \sigma^2(k_j)}} = \frac{\sum_{i,j} k_i k_j (p_{ij} - q_i q_j)}{\max \sum_{i,j} k_i k_j (p_{ij} - q_i q_j)} = \frac{\sum_{i,j} k_i k_j (p_{ij} - q_i q_j)}{\sum_{i,j} k_i k_j (\delta_{ij} - q_i) q_j}$
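In practice the Pearson coefficient $r$ is usually estimated directly from an edge list, as the correlation between the degrees found at the two ends of each edge. The sketch below is one way to do this, assuming an undirected simple graph in which not all degrees are equal (otherwise the variance is zero and $r$ is undefined); the function name and the two example graphs are illustrative.

```python
def assortativity(edges):
    """Pearson correlation of the degrees at the two ends of every edge."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # An undirected edge contributes both (k_u, k_v) and (k_v, k_u),
    # so both ends have the same marginal distribution.
    pairs = [(degree[u], degree[v]) for u, v in edges]
    pairs += [(y, x) for x, y in pairs]
    n = len(pairs)
    mean = sum(x for x, _ in pairs) / n
    cov = sum((x - mean) * (y - mean) for x, y in pairs) / n
    var = sum((x - mean) ** 2 for x, _ in pairs) / n  # zero for regular graphs
    return cov / var

# Hypothetical examples: a star (hub plus three leaves) and a 4-node path.
r_star = assortativity([("hub", 1), ("hub", 2), ("hub", 3)])
r_path = assortativity([(1, 2), (2, 3), (3, 4)])
```

The star is perfectly disassortative because every edge joins the hub to a leaf.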
  • 26. Pavel Loskot c 2014 21/22 Degree Distributions Degree-degree correlation • directed graphs (networks) Summary • for large graphs, edges (topology) can be considered statistically • degree distribution is partial statistical description (of topology) • degree-degree correlation is more informative, but still incomplete info
  • 27. Pavel Loskot c 2014 22/22 Take-Home Messages Complex Systems • consists of large number of interacting components • graphs are very good mathematical models of these systems; they are very generic objects with many specific instances (trees, lists, tables etc.) • availability of observations (measurements data) is a strong driving force • a common systematic framework to study these systems: Network Science History of modern science Problems of simplicity (1600-1800) understanding influence of one variable over another Problems of disorganized (1900-1950) number of variables is very large complexity but system as a whole has well- defined average behavior Problems of organized (1950-today) simultaneously dealing with number complexity of factors forming whole system - W. Weaver, 1948
  • 29. Pavel Loskot c 2014 1/32 Similarity of Networks • the Nature is built up of complex networks • there is need to have a common framework for systematically describing, analyzing and eventually synthesizing networks to mimic the Nature
  • 30. Pavel Loskot c 2014 2/32 Comparing Networks Similarity of (static) networks 1. calculate and compare (a vector of) metrics for each network; N.B. we can only compare scalar values (e.g. Euclidean distances between vectors) OR 2. identify distinctive subgraphs at certain granularity, and compare those Graphlets [Prˇzulj, 2004] • pictured right: 30 subgraphs of 2-5 nodes of 73 possible types • generalizes vector of node degrees to graphlet degrees; it is a vector of 73 components of the number of nodes of given type in the network Fragments • quantitative analysis relies on correlations between fragment statistics in the network and the network properties
  • 31. Comparing Networks
    Motifs [Milo, 2002]
    • subgraphs whose occurrence is statistically much more significant than if the network was created completely at random
    • network randomization: 1. select two links at random, 2. exchange their end-points, 3. repeat
    • a motif in the real network occurs much more often than (on average) in an ensemble of random networks having the same degree distribution
    • we require that the probability of the motif appearing in an ensemble of random networks at least as many times as in the real network is small
    • this is quantified by the Z-score ($N$ denotes the number of occurrences)
      $Z = \frac{N_{real} - E[N_{random}]}{\sqrt{E[(N_{random} - E[N_{random}])^2]}}$
    • motifs are network specific, although families of networks can share the same motifs
    • importance of motifs can be evaluated as the significance profile (SP) vector
      $SP = \left( \frac{Z_1}{\sqrt{\sum_i Z_i^2}}, \frac{Z_2}{\sqrt{\sum_i Z_i^2}}, \cdots \right)$
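The randomization steps and the Z-score above can be sketched as follows, using the triangle as the motif being counted. This is an illustrative sketch rather than the procedure of the cited paper: the function names, the number of swaps per sample, the ensemble size and the example edge list are all assumptions.

```python
import random
from statistics import mean, pstdev

def count_triangles(edges):
    """Triangles in an undirected simple graph given as a list of edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Each triangle is found once from each of its three edges.
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3

def rewire(edges, swaps, rng):
    """Degree-preserving randomization: repeatedly exchange link end-points."""
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4:
            continue  # the swap would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # the swap would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

def z_score(edges, samples=200, seed=0):
    """Z-score of the triangle count against a rewired ensemble."""
    rng = random.Random(seed)
    n_real = count_triangles(edges)
    n_rand = [count_triangles(rewire(edges, 10 * len(edges), rng))
              for _ in range(samples)]
    mu, sigma = mean(n_rand), pstdev(n_rand)
    if sigma == 0:  # degenerate ensemble: Z is undefined, reported as 0 or inf
        return 0.0 if n_real == mu else float("inf")
    return (n_real - mu) / sigma

# Hypothetical example: two triangles linked into a ring by a few extra edges.
example = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3),
           (2, 3), (5, 6), (6, 7), (7, 0)]
```

A large positive Z-score would indicate that triangles occur far more often than the degree sequence alone explains.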
  • 32. Comparing Networks
    Motif examples (pictured)
    Relative abundance of fragments
    • assume that the ensemble of random networks has the same node degrees as the real-world network
      $\alpha = \frac{N_{real} - E[N_{random}]}{N_{real} + E[N_{random}]}$
  • 33. Pavel Loskot c 2014 5/32 Comparing Networks Relative abundance examples • ratios of the number of fragments occurrences are also useful to characterize the network structure as shown next
  • 34. Transitivity Measures
    Clustering coefficient
    • recall that every triangle represents three connected (open or closed) triples
    • let $|\tilde{T}_3|$ be the number of triangles and $|\tilde{P}_2|$ the number of 2-paths (connected triples with 2 or 3 ties); the clustering coefficient (a.k.a. network transitivity) is
      $C_3 = \frac{3 |\tilde{T}_3|}{|\tilde{P}_2|}$, where $|\tilde{T}_3| = \mathrm{tr}(A^3)/6$ and $|\tilde{P}_2| = \frac{1}{2} \left( \sum_{i,j} [A^2]_{ij} - \mathrm{tr}(A^2) \right)$
    • a network can be highly clustered locally, but not globally (i.e., considering the average of local clusterings across all nodes is not sufficient)
    • clustering tends to be much larger for real-world than for random networks
    Example
    • A's friends: B, C, D and E
    • all possible edges among A's friends: B-C, B-D, B-E, C-D, C-E, D-E, i.e., 6 in total, out of which only 1 (C-D) exists
    • thus, the clustering coefficient of A is 1/6
    Generalization
    • for any subgraph, the ratio of the actual to the maximum possible number of its occurrences:
      $C_n = \frac{n |\tilde{T}_n|}{|\tilde{P}_n|}$
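The worked example above (A with friends B, C, D, E and a single link C-D) can be reproduced in a few lines. This is a sketch assuming an undirected graph stored as a dictionary of neighbor sets; the function names are illustrative.

```python
from itertools import combinations

def local_clustering(adj, node):
    """Fraction of pairs of neighbors of `node` that are themselves linked."""
    k = len(adj[node])
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(adj[node], 2) if v in adj[u])
    return links / (k * (k - 1) / 2)

def transitivity(adj):
    """Global clustering coefficient 3 * triangles / number of 2-paths."""
    closed = 0    # closed triples; each triangle is counted at its 3 vertices
    triples = 0   # all connected triples (2-paths), counted at the middle node
    for node in adj:
        k = len(adj[node])
        triples += k * (k - 1) // 2
        closed += sum(1 for u, v in combinations(adj[node], 2) if v in adj[u])
    return closed / triples if triples else 0.0

# The example from the slide: A is linked to B, C, D, E; only C-D are linked.
friends = {
    "A": {"B", "C", "D", "E"},
    "B": {"A"},
    "C": {"A", "D"},
    "D": {"A", "C"},
    "E": {"A"},
}
c_local_A = local_clustering(friends, "A")
c_global = transitivity(friends)
```

Here the graph has one triangle (A-C-D) and eight 2-paths, so the global coefficient is 3/8 while the local coefficient of A is 1/6.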
  • 35. Centrality Measures
    Aim
    • quantify the importance of nodes in a network (so-called positional advantage), i.e., how nodes contribute to the overall structural properties of the network
    • e.g., important nodes disseminate information faster, can stop spreading epidemics, can protect the network from breaking and so on
    Degree centrality
    • hubs are likely to have the largest influence (e.g., number of friends to help)
    • a transitivity measure, since it is a ratio of single (neighboring) node fragments
    • for a network of $N$ nodes, the i-th node of degree $k_i$ has degree centrality
      $C_1(i) = \frac{|\tilde{T}_1|}{|\tilde{P}_1|} = C_D(i) = \frac{k_i}{N - 1}$
    Network centralization (centrality) ($\sigma^2_{C_1}$)
      $\sigma^2_{C_1} = \frac{1}{N - 1} \sum_{i=1}^{N} \left( C_1(i) - \bar{C}_1 \right)^2$, where $\bar{C}_1 = \frac{1}{N} \sum_{i=1}^{N} C_1(i)$
    • star topology has the maximum while line topology has the minimum centralization
  • 36. Centrality Measures
    Freeman's degree centrality
    • quantifies variations in node degree centrality in the whole network
      $\bar{C}_D = \frac{\sum_{i=1}^{N} (k^* - k_i)}{(N - 1)(N - 2)}$, where $k^* = \max_i k_i$, and $\max \sum_{i=1}^{N} (k^* - k_i) = (N - 1)(N - 2)$ for a star network
    Betweenness centrality (beyond nearest neighbors)
    • quantifies node importance in communications between pairs of other nodes
    • ability to broker between groups, likelihood of intercepting information etc.
    • thus, it is the likelihood of node $w$ being involved in communications
      $C_B(w) = \frac{1}{\binom{N-1}{2}} \sum_{u \neq w \neq v} \frac{\rho(u, w, v)}{\rho(u, v)}$ (normalization optional)
    • $\rho(u, w, v)$ is the number of shortest paths between $u$ and $v$ via $w$, and $\rho(u, v)$ is the number of all shortest paths between $u$ and $v$, OR
    • $\rho(u, w, v)$ is the maximum flow from $u$ to $v$ through $w$, and $\rho(u, v)$ is the total maximum flow from $u$ to $v$
  • 37. Centrality Measures
    Example (betweenness centrality)
    • A and E are not in-between any pairs, B and D are in-between 3 pairs, and C is in-between 4 pairs
    Closeness centrality
    • measure of how much the node is in the "middle of things"
    • let $d(u, v)$ be the shortest path length between nodes $u$ and $v$
      $C_C(u) = \left( \frac{1}{N - 1} \sum_{v \neq u} d(u, v) \right)^{-1}$ (normalization optional)
    Example (closeness centrality)
      $C_C(A) = \left( \frac{1 + 2 + 3 + 4}{4} \right)^{-1} = 0.4$
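The two examples above are consistent with a line network A-B-C-D-E, and under that assumption they can be reproduced with shortest-path counting. The sketch below computes unnormalized betweenness over unordered pairs and the normalized closeness; it favors clarity over speed and assumes a connected undirected graph. The function names are illustrative.

```python
from collections import deque
from itertools import combinations

def bfs_counts(adj, source):
    """Distances from source and the number of shortest paths to each vertex."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]  # every shortest path to u extends to v
    return dist, sigma

def betweenness(adj):
    """Sum over unordered pairs (u, v) of rho(u, w, v) / rho(u, v)."""
    info = {s: bfs_counts(adj, s) for s in adj}
    score = {w: 0.0 for w in adj}
    for u, v in combinations(adj, 2):
        dist_u, sigma_u = info[u]
        if v not in dist_u:
            continue
        for w in adj:
            if w == u or w == v or w not in dist_u:
                continue
            dist_w, sigma_w = info[w]
            # w lies on a shortest u-v path iff the two distances add up.
            if v in dist_w and dist_u[w] + dist_w[v] == dist_u[v]:
                score[w] += sigma_u[w] * sigma_w[v] / sigma_u[v]
    return score

def closeness(adj, u):
    """Reciprocal of the average distance from u to all other nodes."""
    dist, _ = bfs_counts(adj, u)
    return (len(adj) - 1) / sum(dist.values())

# Assumed example topology: the line A-B-C-D-E.
line = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
        "D": ["C", "E"], "E": ["D"]}
b = betweenness(line)
c_A = closeness(line, "A")
```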
  • 38. Centrality Measures
    Information centrality
      $C_{IC}(i) = \left( \frac{1}{N} \sum_j \frac{1}{I_{ij}} \right)^{-1}$
    Eigenvector centrality ($x_u$)
    • accounts for connections that are (or are not) isolated; important nodes are likely connected to other important nodes
    • let $B(u)$ be the neighbors of node $u$
      $x_u = \frac{1}{\lambda} \sum_{v \in B(u)} x_v = \frac{1}{\lambda} \sum_{v \in V} [A]_{u,v}\, x_v \;\Rightarrow\; A x = \lambda x$
    • algorithm: initialize $x_u = 1\ \forall u$, re-calculate $x_u\ \forall u$, set $\lambda = \max_u x_u$ and normalize by it, repeat
    Katz centrality
    • instead of counting shortest paths (as in closeness centrality), count all paths
    • let $\alpha > \lambda_1$ (the largest eigenvalue of $A$), which is needed for the series below to converge
      $C_K(i) = [Z \cdot \mathbf{1}]_i$, where $Z = \sum_{k=1}^{\infty} \alpha^{-k} A^k = \left( I - \frac{1}{\alpha} A \right)^{-1} - I$
      so the values of $C_K(i)$ depend on the choice of $\alpha$
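The iterative algorithm for eigenvector centrality is a power iteration and can be sketched in plain Python. The sketch assumes a connected, non-bipartite undirected graph (on bipartite graphs such as a star this simple iteration oscillates instead of converging); the function name, the fixed iteration count and the example graph are illustrative.

```python
def eigenvector_centrality(A, iterations=200):
    """Power iteration: x <- A x, rescaled so that the largest entry is 1."""
    n = len(A)
    x = [1.0] * n
    lam = 1.0
    for _ in range(iterations):
        y = [sum(A[u][v] * x[v] for v in range(n)) for u in range(n)]
        lam = max(y)                 # estimate of the largest eigenvalue
        x = [value / lam for value in y]
    return x, lam

# Hypothetical example: triangle 0-1-2 with a pendant vertex 3 attached to 2.
A = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
x, lam = eigenvector_centrality(A)
```

In this example vertex 2 obtains the largest centrality, and the pendant vertex 3 inherits part of it even though its own degree is only 1.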
  • 39. Centrality Measures
    PageRank centrality
    • reflects the probabilities that a random walk through the network arrives at any particular node
    • intuitively, if there are many links out of node $v$, one of these links to node $u$ represents only an average recommendation of $u$ by $v$; if the number of links out of $v$ is reduced, the recommendation of $u$ by $v$ becomes stronger
    • define the modified adjacency matrix
      $[H]_{ij} = \begin{cases} 1/k_{out}(i) & \text{if } [v_i, v_j] \in E \\ 0 & \text{otherwise} \end{cases}$
    • PageRank vector $C_{PR} = [C_{PR}(1), \ldots, C_{PR}(N)]^T$ at step $k$ is updated as
      $(C_{PR}^{k+1})^T := (C_{PR}^{k})^T \cdot H$
    • note that a node without outgoing links (node 4 in the pictured example) traps a random walker, and also the search is often randomly reset (with probability $1 - \alpha$), so this modified $H$ should be used instead:
      $H' = \alpha H + \frac{\alpha}{N} (a \mathbf{1}^T) + \frac{1 - \alpha}{N} \mathbf{1} \mathbf{1}^T$, where $[a]_i = \begin{cases} 1 & \text{if } k_{out}(i) = 0 \\ 0 & \text{otherwise} \end{cases}$
      (i.e., $(1 - \alpha)/N$ is added to every element)
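The PageRank update with dangling nodes and random resets can be sketched without forming the matrix $H'$ explicitly. The code below is an illustrative sketch: the function name, the damping value 0.85, the iteration count and the four-node example (in which node 4 has no outgoing links) are assumptions, not taken from the slides.

```python
def pagerank(out_links, alpha=0.85, iterations=100):
    """Power iteration for PageRank on a dict: node -> list of successors."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread evenly (term a 1^T).
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - alpha) / n + alpha * dangling / n for v in nodes}
        # Each node passes its rank equally to its successors (term alpha H).
        for v in nodes:
            for u in out_links[v]:
                new_rank[u] += alpha * rank[v] / len(out_links[v])
        rank = new_rank
    return rank

# Hypothetical example: node 4 has no out-links and would trap a walker.
web = {1: [2, 3], 2: [3], 3: [1, 4], 4: []}
ranks = pagerank(web)
```

Each iteration keeps the ranks summing to one, so they can be read as the stationary visiting probabilities of the random walker.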
  • 40. Centrality Measures
    Reciprocity ($r$)
    • in directed networks, a link from $u$ to $v$ can be reciprocated by a link from $v$ to $u$; these are called co-links
      $r = \frac{\sum_{i,j} [A]_{ij} [A]_{ji}}{|E|}$ (fraction of reciprocated links)
    Rich-Club coefficient of degree $k$ ($R(k)$)
    • hubs tend to be densely interconnected, which is quantified by $R(k)$
    • let subgraph $(V'(k), E') \subseteq (V, E)$ where $V'(k) \subseteq V$ is the subset of nodes with degree at least $k$, and $E' \subseteq E$ are the corresponding edges among $V'(k)$
      $R(k) = \frac{|E'|}{\binom{|V'(k)|}{2}}$
    Matching index ($\mu_{ij}$)
    • quantifies the similarity of connectivity of the two end-vertices of an edge
    • a small value of $\mu_{ij}$ indicates that the edge between $v_i \in V$ and $v_j \in V$ is a bridge between two dissimilar regions of the network
      $\mu_{ij} = \frac{\sum_{k \neq i,j} A_{ik} A_{kj}}{\sum_{k \neq i} A_{ik} + \sum_{k \neq j} A_{jk}}$
  • 41. Weighted Networks
    Terminology
      Graph     Network   System
      vertex    node      component
      edge      link      interaction
    Weights mapping
    • weights can be assigned to vertices as well as (more often) to edges; we assume the mapping $W : (V, E) \to (V, W)$, so the weighted adjacency matrix and the original adjacency matrix are, respectively,
      $[W]_{ij} = w_{ij} \in \mathbb{R}$, $[A]_{ij} = \begin{cases} 1 & |w_{ij}| \ge \Delta \\ 0 & |w_{ij}| < \Delta \end{cases}$
    Vertex strength
    • the degree distribution is generalized to the strength distribution, which again has power-law-like tails in many real-world networks
      $s(i) = \sum_j w_{ij}$
    • it was observed that node strength and node degree are related as
      $E[s|k] \propto k^{\beta}$, $\beta > 0$
      for $\beta > 1$, high-degree vertices (hubs) tend to be high-strength vertices
  • 42. Weighted Networks
    Other generalizations
    • the edge contributions can be normalized as $w_{ij} / \sum_j w_{ij} = w_{ij} / s(i)$, e.g., the average nearest (first order) neighbor degree
      $k_{NN}(i) = \sum_j \frac{w_{ij}}{s(i)} [A]_{ij}\, k(j)$
    • importantly, there are no generally agreed definitions of quantities (metrics) for weighted networks, e.g., the clustering coefficient ($[A]_{ij} \equiv a_{ij}$):
      $C_3(i) = \frac{1}{s(i)(k(i) - 1)} \sum_{j,k} \frac{w_{ij} + w_{ik}}{2}\, a_{ij} a_{jk} a_{ik}$ [Barrat, Barthélemy, Vespignani]
      $C_3(i) = \frac{1}{k(i)(k(i) - 1)} \sum_{j,k} (w_{ij} w_{ik} w_{jk})^{1/3}$ [Onnela, Saramäki, Kaski, Kertész]
      $C_3(i) = \frac{\sum_{j,k} w_{ij} w_{jk} w_{ik}}{\left( \sum_k w_{ik} \right)^2 - \sum_k w_{ik}^2}$ [Zhang]
      $C_3(i) = \frac{\sum_{j,k} w_{ij} w_{jk} w_{ki}}{(\max_{ij} w_{ij}) \sum_{j,k} w_{ij} w_{ik}}$ [Holme]
      where for an unweighted network we assume $w_{ij} = 1$ if $[v_i, v_j] \in E$, and 0 otherwise
  • 43. Weighted Networks
    Time-series as graphs
    1. pre-processing: reduce measurement noise, reduce the amount of data
    2. calculate the magnitude of correlations (possibly with thresholding)
      $0 \le [W]_{ij} = \frac{\left| E[d_i d_j] - E[d_i]\, E[d_j] \right|}{\sqrt{\left( E[d_i^2] - E[d_i]^2 \right) \left( E[d_j^2] - E[d_j]^2 \right)}} \le 1$
    3. construct a weighted graph assuming the weight matrix $W$
  • 44. Pavel Loskot c 2014 16/32 Weighted Networks Spanning tree of a graph • a tree topology containing all nodes of the graph • possibly additional requirement to maximize or minimize the sum of edge weights • it can be used to emphasize clusters in the graph, but . . . a lot of information is discarded and is also sensitive to noise and thresholding Example (NYSE stocks)
  • 45. Pavel Loskot c 2014 17/32 Community Structure Network communities • so far, we considered local and global structure and properties; here, we look at spatial scale in-between individual nodes and the whole network - clusters • clusters are obtained by network partitioning or clustering • our objective is to partition the network using only its topology Why clustering • manage complex systems by creating hierarchy, for example, Big Data analysis and classification such as large databases, customer recommendations, website ranking, genomics, market evaluations etc. • identify bridges and weak ties in networks Formally • find P disjoint subsets Vi, so that ∪P i=1Vi = V and Vi ∩ Vj is empty set for i j
  • 46. Community Structure
    Balanced partitioning
    • given $P$, the sizes of the partitions are approximately equal, i.e., $|V_i| \approx |V|/P$
    • possibly also, the cut (the links between subsets) size can be minimized
    Community
    • there is a path through the community between every pair of nodes
    • internal connection density is significantly larger than the density of external connections
    Cut size
    • assume a weighted network, and the partition $\tilde{V} \subset V$
    • the internal and external weights of a node $v_i \in \tilde{V}$ in the partition
      $W_{int}(i) = \sum_{v_j \in \tilde{V}} w_{ij}$, $W_{ext}(i) = \sum_{v_j \notin \tilde{V}} w_{ij}$
    • the cut size between $\tilde{V}$ and the rest of the nodes $V \setminus \tilde{V}$
      $C_{cut}(\tilde{V}) = \frac{1}{2} \sum_{v_i \in V} W_{ext}(i)$
  • 47. Community Structure
    Reducing cut size
    • moving node $v_i$ in or out of the partition $\tilde{V}$ will change the cut size by
      $g(i) = W_{ext}(i) - W_{int}(i)$
      so the cut size is reduced if $W_{ext}(i) > W_{int}(i)$
    • for partitions already balanced, consider replacing one node in the partition (i.e., move one node out and another node in); the cut size is changed by
      $g(i, j) = \begin{cases} g(i) + g(j) - 2 w_{ij} & \text{if } v_i \text{ and } v_j \text{ are connected} \\ g(i) + g(j) & \text{otherwise} \end{cases}$
    Centrality based partitioning
    • links connecting nodes in different communities are likely to have large edge betweenness centrality (defined analogously to node betweenness)
    Algorithm [Girvan, Newman, 2002]
    1. calculate edge betweenness for all links, remove the link with the highest such value
    2. recalculate edge betweenness for the remaining links
    3. repeat until all links have been removed
  • 48. Community Structure
    Modularity
    • need to compare different partitions to decide which one is the best
    • intuitively, cohesion or link density within the community is likely to be significantly larger than if the community is formed at random
    • for partitioning $\cup_{i=1}^{P} V_i = V$, with edges $E_i$ within the partition $V_i$, the modularity indicator ($c_i$ is the community assignment of vertex $v_i$) is
      $Q = \sum_{i=1}^{P} \left[ \frac{|E_i|}{|E|} - \left( \frac{\sum_{v_j \in V_i} k(v_j)}{2|E|} \right)^2 \right] = \ldots = \frac{1}{2|E|} \sum_{i,j} \left( [A]_{ij} - \frac{k(i) k(j)}{2|E|} \right) \delta(c_i, c_j)$
      so it is the actual number of edges minus the expected number of edges inside the community for a random subgraph with the same node degree distribution
    • $Q \ge -1$, and $\max Q = 1$ for strong community structure
    • can be used as a stopping criterion ($Q \gg 0$) in the Girvan-Newman algorithm
    Modularity optimization
    • find the partitioning with maximum modularity (exact solution is NP-complete): complexity
      $O\left( \binom{|V|}{|V|/2} \right) \sim \frac{2^{|V|}}{\sqrt{\pi |V| / 2}}$ for large $|V|$
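The first form of the modularity $Q$ (a sum over communities) translates directly into code. The sketch below assumes an undirected, unweighted simple graph given as an edge list and a dictionary assigning every vertex to a community; the function name and the example of two triangles joined by one link are illustrative.

```python
def modularity(edges, community):
    """Q = sum over communities of |E_i|/|E| - (degree sum / (2|E|))^2."""
    m = len(edges)
    internal = {}      # number of edges inside each community, |E_i|
    degree_sum = {}    # sum of node degrees in each community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

# Hypothetical example: two triangles joined by the single link 2-3.
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {v: "all" for v in range(6)}

q_split = modularity(edges, two_groups)
q_single = modularity(edges, one_group)
```

Splitting along the bridge gives $Q = 5/14$, whereas placing all nodes in one community gives $Q = 0$, which is why $Q$ can serve to compare candidate partitions.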
  • 49. Pavel Loskot c 2014 21/32 Community Structure Resolution problem • modularity based clustering may fail to identify obvious small clusters close to a large cluster • modularity is deficient if clusters are circularly connect (pictured right) • other similarity measures also affected (minimum cuts, . . .) Possible solution • use multiple similarity metrics • then choose the best partition by consensus (e.g. majority vote)
  • 50. Pavel Loskot c 2014 22/32 Community Structure Hierarchical clustering • complexity O((|E| + |V|)|V|), many networks are sparse (|V| ≈ |E|) Algorithm 1. Initialize: |V| communities of 1 vertex each 2. Calculate modularity ∆Q for all pairs of existing communities 3. Merge the community pair having the largest increase ∆Q 4. Build the dendrogram and repeat steps 2 and 3 until only one community remains Clustering based on Euclidean distance
  • 51. Pavel Loskot c 2014 23/32 Community Structure Merging clusters • similarity between clusters can be measured as single linkage: minimum between all pairs of nodes in two clusters • Complete linkage: maximum between all pairs of nodes in two clusters • Average linkage: average between all pairs of nodes in two clusters Limitations of modularity • appears to be strongly dependent on the density of links in the network • thus, not good measure to determine communities in sparse networks Clustering techniques 1. Agglomerative (bottom-up) techniques: edges are added among nodes to create communities (e.g. dendrogram) 2. Divisive (top-down) techniques: edges are removed from graph to create separate communities 3. Spectral techniques: graph splitting based on eigen-analysis Similarity measures • quantify (dis)similarity between nodes to decide on communities in all clustering algorithms • selection strongly application dependent (modularity, cosine similarity, Jaccard’s coefficient, . . .)
  • 52. Pavel Loskot c 2014 24/32 Community Structure Louvain method (based on modularity optimization) • more accurate and more efficient (much faster) than hierarchical clustering • number of communities decreases quickly in only few iterations Algorithm 1. Initialize: every node is in its own community 2. For each node i, consider all its neighbors j, and check if moving i into j’s community increases ∆Q 3. Move i into community for which ∆Q is maximum 4. Repeat steps 2 and 3 until no further improvement possible (i.e. ∆Q = 0) 5. collapse the communities into single nodes (merging multiple edges between these new nodes), and go back to step 2
  • 53. Community Structure
    K-means clustering
    • number of clusters $K$ is predefined
    • minimize, e.g., Euclidean distances:
      $\{V_i\}_{i=1,\ldots,K} = \arg\min \sum_{i=1}^{K} \sum_{v \in V_i} \| v - \bar{v}_i \|^2$, $\bar{v}_i = \frac{1}{|V_i|} \sum_{v \in V_i} v$
    Algorithm
    1. Initialize: select $K$ vertices at random as initial clusters and assign the remaining vertices to the nearest clusters
    2. Calculate new centroids $\bar{v}_i$ for each cluster
    3. Re-assign all vertices to the nearest clusters
    4. Go to step 2 until some stopping criterion is met
    (pictured example: 89% of data correctly classified)
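The four steps of the K-means algorithm can be sketched as follows, assuming that every vertex has been given coordinates (a tuple of numbers) and that squared Euclidean distance is used. The function name, the seed, the stopping rule (labels no longer change) and the sample points are illustrative assumptions.

```python
import random

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm; returns (cluster label per point, list of centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # step 1: K random initial points
    labels = None
    for _ in range(iterations):
        # steps 1 and 3: assign every point to its nearest centroid
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:               # step 4: stop when nothing changes
            break
        labels = new_labels
        # step 2: recompute each centroid as the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:                        # keep old centroid if cluster empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids

# Hypothetical data: two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
labels, centroids = kmeans(points, 2)
```

With less clearly separated data the result depends on the random initialization, so in practice the algorithm is often restarted several times and the partition with the smallest objective is kept.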
  • 54. Pavel Loskot c 2014 26/32 Community Structure Limitations of K-means clustering • sensitivity to initial conditions and outliers • sensitivity to non-homogeneous structure, i.e. clusters differ significantly in size, connection density, non-spherical shape (for Euclidean distance metric)
  • 55. Community Structure
    Gaussian mixture models
    • assume there are $K$ clusters, and vertex $v_i$ has location $x_i$
    • locations of vertices in cluster $V_i$ are normally distributed, $\sim N(x | \mu_i, \lambda_i)$
    • let $z_i$, $i = 1, \ldots, K$ be independent latent variables such that $z_i = 1$ if cluster $i \sim N(x | \mu_i, \lambda_i)$ and $z_i = 0$ otherwise, so
      $\Pr(z) = \prod_{i=1}^{K} \pi_i^{z_i}$
    • if $z_i$ are known, the data are labeled (parameters of their distribution are known), otherwise the data are un-labeled (unsupervised learning)
    • $\pi_i = \Pr(z_i)$ are mixing probabilities (weights), so that $\sum_{i=1}^{K} \pi_i = 1$, and the distribution of location $x$ of a vertex is
      $p(x) = \sum_z p(x | z)\, p(z) = \sum_{i=1}^{K} \pi_i\, N(x | \mu_i, \lambda_i)$, where $p(x | z) = \prod_{i=1}^{K} N(x | \mu_i, \lambda_i)^{z_i}$
    • unknown parameters: mixing coefficients ($\pi_i$), means ($\mu_i$), covariances ($\lambda_i$)
    • using Bayes' theorem, we can find the posterior probabilities (responsibilities) $\Pr(z_i | x)$ that the i-th Gaussian component has in explaining the observed data
    Algorithm [Expectation Maximization (EM)]
    1. E-step: evaluate $\Pr(z_i | x)$ given the current parameters
    2. M-step: re-estimate the parameters using the current $\Pr(z_i | x)$
  • 56. Pavel Loskot c 2014 28/32 Community Structure Overlapping communities • nodes may belong to more than one community (i.e. subsets Vi not disjoint) Clique percolation method [Palla 2005] • K-clique are K fully connected nodes • K-cliques adjacent if share K − 1 nodes • K-clique community is a set of nodes connected through adjacent K-cliques Algorithm 1. identify maximal cliques in the network (complex problem, but fortunately many real-world networks are relatively sparse) 2. consider cliques as single nodes; interconnect cliques if they share at least K − 1 nodes 3. identify connected components in graph created in step 2
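  • 57. Community Structure
    Spectral clustering
    • K-means, Gaussian mixtures and hierarchical methods are good for compact clusters
    • spectral clustering transforms the data into a new basis where standard algorithms work well
    Algorithm
    1. Construct the similarity matrix, $[S]_{ij} = \exp\left( -\| x_i - x_j \|^2 / 2\sigma^2 \right)$
    2. Construct the Laplacian $L = D - S$ where $D$ is the diagonal matrix of weights, $[D]_{ii} = \sum_j [S]_{ij}$
    3. Construct the matrix $U$ of $k$ eigenvectors corresponding to the $k$ smallest eigenvalues of $L$ (for $L = D - S$ the near-zero eigenvalues carry the cluster structure, in line with the multiplicity of $\lambda_0 = 0$ counting connected components)
    4. Perform clustering on the transformed data, i.e., on the rows of $U$ (row $i$ of $U$ is the new representation $x'_i$ of data point $x_i$)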
  • 58. Pavel Loskot c 2014 30/32 Community Structure Real-time clustering • (dynamic) re-clustering for every new data arrival is expensive • (dynamically) varying the number of clusters is confusing Hierarchical Agglomerative Clustering [HAC, 2004] 1. Initialization: hierarchical clustering (e.g. using a dendrogram) 2. new data are either assigned to one of the existing clusters, OR 3. new data form a new cluster, and two existing clusters are merged
  • 59. Pavel Loskot c 2014 31/32 Community Structure Community analysis • distribution of community sizes • intra-community edge densities • number of intra- and inter-community links • average number of communities per node • . . . Community network • communities → nodes • edges weighted by number of links between communities
  • 60. Pavel Loskot c 2014 32/32 Take-Home Messages Network structure analysis • structure of un-weighted static networks, i.e. knowing only their topology • subgraphs, graphlets, fragments and motifs are building blocks of large networks; the statistics of their occurrence is useful to compare network topology beyond their degree distribution • network partitioning or clustering to identify (overlapping) communities Measures of network structure • centrality (degree, betweenness, closeness, eigenvector, Katz, PageRank, . . .) • clustering coefficient, Rich-Club coefficient • modifications of measures for weighted networks
  • 62. Pavel Loskot c 2014 1/23 Statistical Modeling Objectives • account for model or parameter uncertainty, measurement noise etc. • make (short-to-medium term) predictions from the models • generate artificial data for verifying models and predictions • decide how much randomness influences network properties; here we compare structural and functional properties of random and real-world networks Milgram’s experiment [1967] • famous “six degrees of separation” • 300 people chosen at random were asked to send a letter to a person in Boston • repeated in 2003: 18 targets, 60k senders, communication via email • new findings in 2003: median of 5 to 7 steps, network structure is not everything, high impact of incentives • Facebook: 92% of users are only 5 hops away, 99% are 6 hops away
  • 63. Pavel Loskot c 2014 2/23 Random Network Models Erdos-Renyi (ER) random graph [1959] • graph $G_{ER}(n, p)$ with $n$ vertices and edges chosen independently with probability $p$, “a zero-order approximation” of real-world networks • thus, the vertex degree is random (for large $n$, the binomial distribution is approximated by a Poisson distribution): $\Pr(k) = \binom{n-1}{k} p^k (1-p)^{n-1-k} \approx e^{-\bar{k}} \frac{\bar{k}^k}{k!}$, $\bar{k} = E[k] = (n-1)p$ • most vertices have a degree close to the average $\bar{k}$ • the diameter ($d$) and the average distance ($\bar{l}$) between two vertices are relatively small compared to the size of the graph: $d = \frac{\ln n}{\ln(p(n-1))} = \frac{\ln n}{\ln \bar{k}} \approx \bar{l}$ • average number of edges: $E[|E|] = p \binom{n}{2} = \frac{n\bar{k}}{2}$; the latter since $\bar{k} = 2E[|E|]/n$ Connectivity of ER random graph • average degree $\bar{k} < 1$: graph disconnected • $\bar{k} > 1$: a giant component appears • $\bar{k} \geq \ln n$: graph (almost) completely connected
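A minimal sketch of the ER construction, useful for checking that the empirical average degree is close to $(n-1)p$. The function name `er_graph`, the seed and the values of `n` and `p` are assumptions chosen for illustration.

```python
import random
from itertools import combinations

def er_graph(n, p, seed=None):
    """Return an adjacency dict of G_ER(n, p): each pair is linked independently with prob. p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

n, p = 500, 0.02
adj = er_graph(n, p, seed=1)
mean_degree = sum(len(nb) for nb in adj.values()) / n   # empirical average degree
expected_degree = (n - 1) * p                           # theoretical value (n-1)p
```

The pairwise loop costs $O(n^2)$ time, which is fine for a demonstration but not for very large sparse graphs.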
  • 64. Pavel Loskot c 2014 3/23 Random Network Models Average path length and diameter of ER (random) graph • let $l(i, j)$ be the length of the shortest path between vertices $v_i$ and $v_j$ • the shortest paths can be combined into a single metric as $\bar{l} = \frac{1}{\binom{n}{2}} \sum_{i,j:\, i \neq j} \frac{l(i,j)}{2}$ (average shortest path) and $d = \max_{i,j} l(i, j)$ (maximum shortest path, i.e. the diameter) • if $n$ is large, the average path length $\bar{l} \propto \ln n$ is relatively small and grows slowly with the network size (this is typical for many large networks); for comparison, 1D lattice (chain): $\bar{l} \propto n$, 2D lattice: $\bar{l} \propto n^{1/2}$
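Both metrics can be computed with a breadth-first search from every vertex. The following is a small sketch for connected, unweighted, undirected graphs; the function names and the 4-node chain used as input are assumptions.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def path_stats(adj):
    """Return (average shortest path, diameter) of a connected undirected graph."""
    n = len(adj)
    total, diameter = 0, 0
    for s in adj:
        dist = bfs_distances(adj, s)
        total += sum(dist.values())          # every unordered pair is counted twice overall
        diameter = max(diameter, max(dist.values()))
    return total / (n * (n - 1)), diameter   # n(n-1) ordered pairs = 2 * C(n, 2)

# 1D lattice (chain) of 4 nodes
chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
avg_len, diam = path_stats(chain)
```

For the chain the pairwise distances are 1, 2, 3, 1, 2, 1, so the average is 10/6 and the diameter is 3.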
  • 65. Pavel Loskot c 2014 4/23 Random Network Models Clustering coefficient of ER graph • ratio of neighbors being friends to all possible friendships among neighbors • the probability that two neighbors are connected is $p$, so the clustering coefficient is $C_{ER} = p \approx \bar{k}/n$, which is much smaller than for real-world networks with the same density • for large networks $\lim_{n \to \infty} C_{ER} = 0$, so large random networks resemble a tree (i.e. they have no clustering) Components in ER graphs • if $p$ (and thus also $\bar{k}$) is small, there are several disjoint components • if $p$ is increased, there is one giant component (of size $\approx n$) with the rest of the nodes being in isolated small components • the giant component appears when $p \approx 1/n$, e.g. for $n = 10^3$ (see figure)
  • 66. Pavel Loskot c 2014 5/23 Random Network Models Percolation transition (when increasing p) 1. Subcritical: $\bar{k} < 1$, many small simple components of size at most $\ln n$ 2. Critical: $\bar{k} \approx 1$, the size of the largest component is $\sim n^{2/3}$, the giant component appears and starts growing 3. Supercritical: $\bar{k} > 1$, there is one giant component of size almost $n$, the second largest component has size about $\ln n$ Summary of ER graph • the degree distribution is Poisson (most nodes have a degree close to the average) with no correlations of node degrees • the average path length is small and $\propto \ln n$ • connectivity depends on $\bar{k}$ with a percolation transition
  • 67. Pavel Loskot c 2014 6/23 Random Network Models Random geometric models [Penrose, 2003] • main motivation: some networks can grow subject to geometric constraints • e.g., place $n$ nodes randomly in (2D) space; two nodes $i$ and $j$ are connected only if their distance $\|x_i - x_j\| \leq r$ • there exists a critical radius $r_c$ to form a connected giant component (if $r > r_c$): $r_c = \sqrt{\frac{\ln n + O(1)}{\pi n}}$ Random distance models [Avin, 2008] • $n$ nodes placed randomly in (2D) space • links created randomly with probability $\propto f(d_{ij})$ where $d_{ij} = \|x_i - x_j\|$
  • 68. Pavel Loskot c 2014 7/23 Small World Networks More on Milgram’s experiment • how accurate is the 6 degrees of separation, and how likely is the chain to be completed? Findings from real-world social networks • a sub-optimal choice of the next link in the chain is made about 1/2 of the time • Facebook measurements: average distance is 4.74 • Twitter measurements: average distance is 4.67 (50% are at 4 steps, nearly everyone within 5 steps)
  • 69. Pavel Loskot c 2014 8/23 Small World Networks Main features • high clustering: $C_{\text{real-world}} \gg C_{\text{random}}$ • average path length: $\bar{l}_{\text{real-world}} \approx \bar{l}_{\text{random}}$ Watts-Strogatz (WS) small world network model [Nature, 1998] • sparked the interest in complex networks (over 3.5k citations) • a single control parameter generates anything from regular to purely random networks • the model: 1. generate a regular graph, 2. rewire links with probability $p$ • in all network generators: self-loops and duplicated links are not allowed
  • 70. Pavel Loskot c 2014 9/23 Small World Networks WS original model • select a fraction $p$ of the edges and rewire one of their end-points Alternative WS model • add a fraction $p$ of edges to the initial regular lattice
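The original WS rewiring can be sketched as follows. This is one possible reading of the model, under stated assumptions: the initial lattice is a ring where every node links to its `k` nearest neighbours on each side, each lattice edge is rewired independently with probability `p` (so a fraction of about `p` of the edges is rewired), and the function name `watts_strogatz` is hypothetical. Self-loops and duplicated links are excluded, as required above.

```python
import random

def watts_strogatz(n, k, p, seed=None):
    """Ring of n nodes, each linked to k nearest neighbours per side; rewire with prob. p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    # 1. generate the regular ring lattice (assumes n > 2k)
    for v in range(n):
        for step in range(1, k + 1):
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)
    # 2. rewire the far end-point of each lattice edge with probability p
    for v in range(n):
        for step in range(1, k + 1):
            w = (v + step) % n
            if w in adj[v] and rng.random() < p:
                # no self-loops and no duplicated links
                candidates = [u for u in range(n) if u != v and u not in adj[v]]
                if candidates:
                    new_w = rng.choice(candidates)
                    adj[v].discard(w)
                    adj[w].discard(v)
                    adj[v].add(new_w)
                    adj[new_w].add(v)
    return adj

ring = watts_strogatz(30, 2, 0.0, seed=0)      # p = 0: regular lattice
rewired = watts_strogatz(30, 2, 0.3, seed=0)   # intermediate p: small world regime
```

Rewiring moves edges but never creates or deletes them, so the number of edges (and hence the average degree) is the same for every `p`.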
  • 71. Pavel Loskot c 2014 10/23 Small World Networks Properties of WS model: Degree distribution $\Pr(k) = \sum_{i=0}^{\min(k-K,K)} \binom{K}{i} (1-p)^i p^{K-i} \frac{(pK)^{k-K-i}}{(k-K-i)!} e^{-pK}$ ∼ Poisson-like distribution Clustering coefficient • if node $i$ has $K$ neighbors, $C = \frac{\#\text{edges among the } K \text{ neighbors}}{\binom{K}{2}}$ • the probability that a connected triple is still connected after rewiring is $(1-p)^3$ • $C(p = 0) = \frac{3k-3}{4k-2} \approx \frac{3}{4}$, $C(p = 1) \approx \frac{2k}{n}$ • $C(p) \approx C(p = 0) \cdot (1-p)^3$, i.e., $C(p)/C(0) \approx (1-p)^3$ Average path length: $\bar{l} \approx \frac{(n-1)(n+2k-1)}{4kn}$
  • 72. Pavel Loskot c 2014 11/23 Small World Networks Kleinberg’s geographical small world model [Nature, 2000] • connectivity derived from geographical distances • the model: 1. link nearest neighbors 2. add links with the probability $\Pr(\text{link between } u \text{ and } v) \sim \text{const} \times \|u - v\|^{-r}$, where $r$ is the navigability exponent (e.g., links are purely random for $r = 0$) Hierarchical small world model [Science, 2001] • hierarchically nested groups, link probability $p_{ij} \sim \exp(-\alpha x_{ij})$ Other strategies for generative models of small world networks • add/rewire links based on chosen properties of current links and edges • add/rewire links to optimize a particular property of the network
  • 73. Pavel Loskot c 2014 12/23 Small World Networks Topology trade-offs (a) commuter rail network (b) star network (c) minimum spanning tree network
  • 74. Pavel Loskot c 2014 13/23 Small World Networks Bridges • in social networks, close friends know what you know, and they also know others who know what you know • bridge between A and B (if removed, these two nodes become disconnected) • local bridge between A and B (if removed, distance A-B increased to > 2)
  • 75. Pavel Loskot c 2014 14/23 Small World Networks Strong and weak ties [Granovetter, 1974] • in social networks, links are strong or weak ties (friends vs. acquaintances) • strong triadic closure: if A-B and A-C are strong ties, then at least a weak tie exists between B and C • if there are enough strong ties in the network, local bridges must be weak ties Almost local bridges • neighbor overlap of nodes A and B: $\frac{\#\text{neighbors of both A and B}}{\#\text{neighbors of at least A or B}} = \frac{N(i,j)}{(k(i)-1)+(k(j)-1)-N(i,j)}$ • local bridges are links whose end-nodes have no common neighbors (i.e., the overlap of their neighbors is 0); almost local bridges are links with a very small overlap
  • 76. Pavel Loskot c 2014 15/23 Small World Networks Removing ties from social networks (percolation analysis) • removing weak ties breaks down the network • removing strong ties degrades the network more smoothly • however, this is specific only to social networks Removing ties from other networks • e.g. removing an important road (strong tie) is more damaging • central veins are more important than peripheral veins
  • 77. Pavel Loskot c 2014 16/23 Small World Networks Illustration of weak vs. strong ties removal (a) original network (b) 80% of strongest links removed, 20% of weak ties remain (c) 80% of weakest links removed, 20% of strong ties remain • no evidence of degradation for (b), network clearly fragmented in case (c) • strong links are within dense neighborhoods (triangles, cliques etc.) • weak links (and bridges) interconnect these dense neighborhoods
  • 78. Pavel Loskot c 2014 17/23 Scale Free Networks Power-law distribution $\Pr(k) \sim \text{const} \times k^{-\gamma}$, typically $2 < \gamma < 3$ • i.e., a straight line in the log-log domain: $\log \Pr(k) \sim -\gamma \log k + \log \text{const}$ • always a few highly connected hubs
  • 79. Pavel Loskot c 2014 18/23 Scale Free Networks Power-law distribution • some other distributions look like power-laws • estimating $\gamma$ may not be so easy • 1st and 2nd moments: $E[k] \propto \int_{k_0}^{\infty} k \cdot k^{-\gamma}\, dk = \lim_{k \to \infty} \frac{1}{2-\gamma} \left( \frac{1}{k^{\gamma-2}} - \frac{1}{k_0^{\gamma-2}} \right)$, which is $= \infty$ if $\gamma \leq 2$ and $< \infty$ if $\gamma > 2$; $E[k^2] \propto \int_{k_0}^{\infty} k^2 \cdot k^{-\gamma}\, dk$, which is $= \infty$ if $\gamma \leq 3$ and $< \infty$ if $\gamma > 3$ Preferential attachment • now, the concern is how to generate scale-free networks • we use the rich-get-richer effect and add new nodes sequentially: 1. with probability $p$, choose any existing node and link to it 2. with probability $1 - p$, link to an existing node chosen with probability proportional to its current degree
  • 80. Pavel Loskot c 2014 19/23 Scale Free Networks Barabási-Albert (BA) scale-free model 1. Growth: start from a seed network of $m_0$ isolated nodes 2. Preferential attachment: add a new node with $m \leq m_0$ edges to existing nodes that are chosen with probability $\Pi(i) = k(i)/\sum_j k(j)$ 3. after $t$ steps, the network has $n = m_0 + t$ nodes and $mt$ edges, and $\Pi(i) = \frac{k(i)}{\sum_j k(j)} = \frac{k(i)}{2mt - m} \approx \frac{k(i)}{2mt}$ • this procedure generates the degree distribution $\Pr(k) = \frac{2m(m+1)}{k(k+1)(k+2)} \approx \frac{2m(m+1)}{k^3} \propto k^{-3}$, so $\gamma = 3$ and the average degree is $\bar{k} = 2m$ • average shortest path length: $\bar{l} \propto \frac{\ln n}{\ln(\ln n)}$ • clustering coefficient: $C \propto \frac{(\ln n)^2}{n}$ ( . . . too small for real-world networks)
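The growth and preferential attachment steps can be sketched with a list of edge end-points ("stubs"), in which node $i$ appears $k(i)$ times, so that a uniform draw from the list samples node $i$ with probability $k(i)/\sum_j k(j)$. Two assumptions are made here: the function name `barabasi_albert` is hypothetical, and because the seed nodes are isolated ($\Pi(i)$ is undefined when all degrees are zero), the very first node picks its targets uniformly at random.

```python
import random

def barabasi_albert(m0, m, t, seed=None):
    """Grow a BA network for t steps. Returns (number of nodes, list of edges)."""
    rng = random.Random(seed)
    n_nodes = m0
    edges = []
    stubs = []   # node i appears k(i) times, so rng.choice(stubs) samples Pi(i)
    for _ in range(t):
        new = n_nodes
        targets = set()                      # m distinct targets: no duplicated links
        while len(targets) < m:
            if stubs:
                targets.add(rng.choice(stubs))
            else:
                # all degrees are zero in the seed network: uniform pick (assumption)
                targets.add(rng.randrange(n_nodes))
        for v in targets:
            edges.append((new, v))
            stubs.extend([new, v])           # both end-points gain one degree
        n_nodes += 1
    return n_nodes, edges

n_nodes, edges = barabasi_albert(m0=3, m=2, t=200, seed=0)
mean_degree = 2 * len(edges) / n_nodes       # approaches 2m for large t
```

After $t$ steps the sketch reproduces the counts stated above: $n = m_0 + t$ nodes and $mt$ edges.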
  • 81. Pavel Loskot c 2014 20/23 Scale Free Networks Question • In scale-free networks, how predictable is “popularity”? Answer • if we restart the process, different popular nodes will emerge Other scale-free network models • motivation: improve the clustering coefficient and allow the exponent $\gamma$ to be changed [Holme, Kim] • after the preferential attachment step, with probability $p$, add one more edge to a randomly selected neighbor • resulting clustering coefficient $C \propto \frac{1}{k}$ ( . . . much more realistic) [Vazquez et al.] • random walk instead of preferential attachment (“get to know important people through people you already know”) [Kleinberg, Kumar] • copy a vertex and rewire its edges with a certain probability
  • 82. Pavel Loskot c 2014 21/23 Scale Free Networks Network generator with given γ 1. Initialize: seed network with $m_0$ (isolated) nodes 2. Add one node and $m$ links (not necessarily stemming from the new node) at each time step; after $t$ time steps, a link is added to node $i$ with probability $\Pi(i) = \alpha \frac{k(i)}{\sum_j k(j)} + (1-\alpha) \frac{1}{t + m_0}$, where $\sum_j k(j) = 2mt$ 3. thus, $\alpha = 1$ leads to preferential attachment and $\alpha = 0$ to uniform attachment, and the degree distribution is $\Pr(k) \propto k^{-(1+1/\alpha)}$ Models with non-power-law distribution • a power-law distribution is a good fit for large networks (averaging effect) • on smaller scales, in more “specialized” sub-networks, a power-law may not be such a good fit • a log-normal distribution has been observed in such cases
  • 83. Pavel Loskot c 2014 22/23 Scale Free Networks Configuration network model • degrees are pre-assigned to $n$ nodes assuming a degree distribution $\Pr(k)$ • edges are added by randomly selecting pairs of these $n$ nodes • a family of graphs generated this way will have the same degree distribution • the excess degree is the number of possible outward links of a node that has been reached during a walk (i.e., one less than the node degree in undirected graphs): $\Pr(k_{\text{excess}} = k) = \frac{(k+1)\Pr(k+1)}{\sum_{k'} k' \Pr(k')}$ Other models • many other stochastic models of networks can be devised and then analyzed • hence, it is important to define the quality of such models, e.g. (generally): – flexibility (design for specific parameter settings) – mathematical tractability – accuracy (to fit experimental data, make predictions)
  • 84. Pavel Loskot c 2014 23/23 Take-Home Messages Random networks • Erdos & Renyi studied a simple model in 1959 • it has a Poisson degree distribution with a small average path length, but clustering goes to zero with network size Small world networks • the world is small, 6 degrees of separation (Milgram’s experiment) • short average path length, but clustering still smaller than in real networks • real-world networks contain weak and strong ties • Watts & Strogatz proposed a simple model of small world networks (in 1998) Scale free networks • the main focus is to produce a power-law distribution • Barabási & Albert proposed a model based on preferential attachment (in 1999); many modifications of this model can be (and were) devised Network models • mostly stochastic; the main motivation is to emulate real-world networks • find structural properties to explain specific (global) properties of networks • useful to define the quality of these models
  • 86. Pavel Loskot c 2014 1/11 Robustness Percolation • monitor network metrics while nodes or edges are being removed Dual problem • monitor network metrics while nodes or edges are being added 1. What strategy to use to remove/add nodes or edges? • no knowledge: nodes and edges removed (uniformly) at random • knowledge of structure: removing nodes and edges with high centrality • adding nodes and edges: cf. random network generators 2. Which metrics are most relevant and should be monitored? • rate of decay/growth of: network diameter, average degree, average distance, size of giant component etc. 3. Which (class of) networks to consider? • any network, networks with a specific degree distribution etc. 4. Why consider robustness? • in general, network resilience to attacks is a growing concern • we want to design networks that are robust to damage
  • 87. Pavel Loskot c 2014 2/11 Robustness Pragmatic definition • the network is robust if it can withstand accidental damage, random topology changes as well as intentional attacks and remain operational • this accounts for the remaining nodes and links to be able to carry flows and perform other tasks without excessive congestion, dead-locks etc. • observing average decay (e.g. size of giant component) may not be that useful (e.g. it cannot identify local congestion further impairing the network) • note also that we are still considering only networks with static topology Example • 50 nodes, removing 40 out of 116 edges decreases ¯k from 4.6 to 3.0
  • 88. Pavel Loskot c 2014 3/11 Robustness as Stability Global stability • system is stable if it returns to equilibrium after any perturbation Resistance • ability of a community to resist change in face of potentially perturbing force Resilience • ability of a community to recover to normal functioning after disturbance Variability • variations in community density over time (measured e.g. as changes in mean/variance) due to external disturbances
  • 89. Pavel Loskot c 2014 4/11 Robustness Percolation threshold • if $\bar{k}$ decreases by removing edges, the network suddenly becomes disconnected • if $\bar{k}$ increases by adding edges, a giant component suddenly emerges Example • $p$ is the probability of filling a square (figure); at the critical value of $p$, a giant connected component appears
  • 90. Pavel Loskot c 2014 5/11 Robustness Experiment [Barabási et al., 2000] • strategy: random failures versus targeted attacks when removing nodes • metrics: average or maximum (network diameter) shortest path • networks: exponential versus scale-free (with the same |V| and |E|)
  • 91. Pavel Loskot c 2014 6/11 Robustness Experiment (cont.) effect on size of giant component s and its average s
  • 92. Pavel Loskot c 2014 7/11 Robustness Experiment (cont.) (Internet and WWW) effect on size of giant component s and its average s
  • 93. Pavel Loskot c 2014 8/11 Robustness of Scale-Free Networks Random failures vs targeted attack (a) original network of 574 nodes (b) removing 20% (115) of nodes randomly leaves 427 nodes in giant component (c) removing only 2.8% (22) most connected hubs leaves 301 nodes in giant component Bottom line • scale-free networks are robust against random failures • they are very vulnerable against targeted attacks
  • 94. Pavel Loskot c 2014 9/11 Robustness of Scale-Free Networks Impact of power-law exponent on robustness ($\sim k^{-\gamma}$) • $\gamma = 2.5$: graceful degradation • $\gamma = 3.5$: giant component disappears at about $f = 40\%$ • assume e.g. the case of $\gamma = 2.7$ (square markers) • $k_{\max}$ is the maximum degree among the remaining nodes • removing only 1% of nodes discards the giant component (top figure) • $k_{\max}$ has to be very small to destroy the giant component (bottom figure)
  • 95. Pavel Loskot c 2014 10/11 Robustness Percolation threshold for random failures • in general, the minimum fraction of nodes required (i.e., that cannot be randomly removed) for the giant component to exist is $f_c = \frac{E[k]}{E[k^2] - E[k]}$ • specialized for random networks: $f_c = \frac{1}{E[k]}$; thus, if $\bar{k} = E[k]$ is large, a random network can withstand large losses; e.g. if $\bar{k} = 4$, then 1/4 of the nodes is enough for the giant component to exist (i.e., 3/4 of the nodes have to be removed to destroy the giant component) • specialized for scale-free networks: $f_c \to 0$, as $E[k^2]$ tends to be very large (even infinite), which makes these networks very robust against random failures (though not against targeted attacks on hubs)
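The threshold formula can be evaluated directly from an observed degree sequence by replacing the expectations with sample moments. This is a small sketch; the function name `critical_fraction` and the two degree sequences are assumptions chosen to contrast a homogeneous network with one dominated by a hub.

```python
def critical_fraction(degrees):
    """f_c = E[k] / (E[k^2] - E[k]), estimated from a list of node degrees."""
    n = len(degrees)
    k1 = sum(degrees) / n                  # first moment E[k]
    k2 = sum(k * k for k in degrees) / n   # second moment E[k^2]
    return k1 / (k2 - k1)

# Hypothetical degree sequences
fc_mixed = critical_fraction([1, 3, 1, 3])        # E[k] = 2, E[k^2] = 5
fc_hub = critical_fraction([1] * 99 + [99])       # one hub inflates E[k^2]
```

The hub-dominated sequence gives a much smaller required fraction, which is the mechanism behind $f_c \to 0$ for scale-free networks. The function divides by zero when $E[k^2] = E[k]$ (all degrees 0 or 1), a case in which no giant component can form.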
  • 96. Pavel Loskot c 2014 11/11 Take-Home Messages Scale-free networks • very robust against random failures (some suggest that this is the reason why these networks are found so often in the real world) • but very vulnerable to attacks on highly connected hubs • hubs are also responsible for (and effective at) spreading messages, diseases etc. through the network • in social networks, it is not hubs but rather weak ties and bridges that make these networks vulnerable Small world networks • have not been considered here • one extreme is a ring with one-hop neighboring connections without any shortcut links; such a network is not robust at all • another extreme is a fully connected network, which is unbreakable • small world networks are in between these two extremes; their robustness is likely derived from the density of shortcut links Making networks more robust • an obvious strategy is to guarantee some minimum degree for every node (i.e., to achieve connection redundancy)
  • 98. Pavel Loskot c 2014 1/13 Epidemic Spreading Network processes • strongly influenced by network structure; e.g. shortcuts significantly speed up spreading (of information, diseases) and synchronization of processes • hence, understanding of such network(ed) (distributed) processes requires understanding of the underlying network structures • e.g. neurons integrate signals from neighbors, if above threshold, the excitation fires and then fades away; this leads to oscillating cascades • here, we consider diffusion of diseases characterized by contagion (lack of choice), unlike information spreading where nodes make decisions to maximize their pay-offs
  • 99. Pavel Loskot c 2014 2/13 Epidemic Spreading Simple spreading models • ring topology with shortcuts • all nodes susceptible • nodes infected with probability p • spreading of disease, computer viruses, . . . • tree topology of spreading in waves of k nodes, all nodes susceptible • nodes infected with probability p a) p is large, disease spreads out b) p is small, disease dies out Reproductive number: R0 = kp a) if R0 < 1, disease dies out in finite number of waves b) if R0 > 1, disease very likely infects at least 1 person in each wave
  • 100. Pavel Loskot c 2014 3/13 Epidemic Spreading Limitations of simple models • small changes in $k$ and $p$ can move $R_0$ above or below the threshold ($R_0 \gtrless 1$) • network topology not realistic (e.g. no triangles) • nodes get infected only once and never recover More realistic: SI model • two classes of nodes: S (susceptible) and I (infected) • once infected, a node cannot recover • $|V| = |S| + |I|$: total number of nodes ($V = S \cup I$) • $\beta = \lambda \bar{k}$: infection rate per node ($0 \leq \lambda \leq 1$) • $\beta |S|/|V|$: susceptible contacts per unit of time • $\frac{d|I|}{dt} = \beta |S||I|/|V|$: overall rate of infection • let $i = |I|/|V|$ be the fraction of nodes infected, then $\frac{di}{dt} = \beta\, i\, (1 - i)$, which yields a logistic curve: $i(t) = \frac{i(0)\, e^{\beta t}}{1 - (1 - e^{\beta t})\, i(0)}$
  • 101. Pavel Loskot c 2014 4/13 Epidemic Spreading More realistic: SIR model • improve the SI model by assuming that infected nodes recover at rate $\upsilon$, i.e., nodes stay infected only for an (average) time $\tau = 1/\upsilon$ • a recovered node becomes resistant (i.e. it cannot be infected again) • define fractions $s = |S|/|V|$, $i = |I|/|V|$, $r = |R|/|V|$, so $s + i + r = 1$; rate of change of these fractions over time: $\frac{ds}{dt} = -\beta s i$, $\frac{di}{dt} = \beta s i - \upsilon i$, $\frac{dr}{dt} = \upsilon i$; the solution again requires initial conditions $s(0)$, $i(0)$ and $r(0)$ Possible outcomes a) disease may die out b) disease may spread to the whole network c) disease becomes endemic (it neither spreads nor dies out)
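The three SIR equations have no closed-form solution, but they are easy to integrate numerically. The sketch below uses the forward Euler method, a deliberately simple choice; the function name `simulate_sir`, the step size and the parameter values ($\beta = 0.5$, $\upsilon = 0.1$) are assumptions for illustration only.

```python
def simulate_sir(beta, upsilon, s0, i0, r0, dt=0.01, steps=10000):
    """Forward-Euler integration of ds/dt = -b*s*i, di/dt = b*s*i - u*i, dr/dt = u*i."""
    s, i, r = s0, i0, r0
    history = [(s, i, r)]
    for _ in range(steps):
        ds = -beta * s * i
        di = beta * s * i - upsilon * i
        dr = upsilon * i
        s, i, r = s + dt * ds, i + dt * di, r + dt * dr
        history.append((s, i, r))
    return history

# Hypothetical parameters: 1% of nodes initially infected, nobody recovered
history = simulate_sir(beta=0.5, upsilon=0.1, s0=0.99, i0=0.01, r0=0.0)
s_end, i_end, r_end = history[-1]
peak_i = max(i for _, i, _ in history)    # height of the epidemic peak
```

Because the three derivatives sum to zero, $s + i + r$ stays equal to 1 throughout the run, which is a convenient sanity check on any implementation.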
  • 102. Pavel Loskot c 2014 5/13 Epidemic Spreading More realistic: SIS model • no (permanent) recovery, but an infected node may become susceptible again • infected to susceptible rate $\upsilon$: a) if $\beta > \upsilon$, logistic growth (as in the SI model), but the infection never reaches the whole population b) if $\beta \to \upsilon$, then $i \to 0$ (the infection will slowly die out) c) if $\beta < \upsilon$, then the infection dies out exponentially • mathematical model (assumes $r = 0$): $\frac{ds}{dt} = \upsilon i - \beta s i$, $\frac{di}{dt} = \beta s i - \upsilon i$, $s + i = 1$
  • 103. Pavel Loskot c 2014 6/13 Epidemic Spreading Prognosis of epidemic • reproductive number: R0 = β/υ a) if R0 > 1, infection survives b) if R0 < 1, infection dies out • in SI model, υ → 0, so R0 ≫ 1 (a) SI model (b) SIR model (c) SIS model Extensions of SIR model • rather than assuming recovery after τ time units, let recovery be possible at each time with some (fixed) probability • infected state further subdivided (e.g. early, middle and final disease stages) • non-homogeneous mixing: restrictions how the nodes meet (e.g. travel to geographical locations, quarantining, . . . ) • other random network models (note that Erdos-Renyi model with homogeneous mixing was implicitly assumed in SI, SIR and SIS models)
  • 104. Pavel Loskot c 2014 7/13 Epidemic Spreading SIS model in scale-free networks • it was experimentally observed that computer viruses survive significantly longer than predicted from the SIS model over random networks • it was found that there is no epidemic threshold in scale-free networks, so infections proliferate independently of the spreading rate • however, there is a critical fraction of shortcuts in scale-free networks; if there are enough shortcuts, the disease suddenly becomes epidemic • the critical fraction of shortcuts is a function of the rates $\beta$ and $\upsilon$
  • 105. Pavel Loskot c 2014 8/13 Epidemic Spreading Network immunization • random networks: uniformly random immunization is helpful • scale-free networks: targeted degree-based immunization required as random immunization does not help • targeted local immunization: immunize one immediate neighbor for every node in a randomly selected group (i.e. nodes with higher degree are more likely to be immunized) red-circles: random immunization of scale-free network red-squares: targeted immunization of scale-free network black-squares: random & targeted immunization of random network
  • 106. Pavel Loskot c 2014 9/13 Take-Home Messages Epidemic spreading • practical modeling requires extracting model parameters from real data • knowledge of node mobility is key to accurate modeling of spreading • epidemic spreading is strongly influenced by information diffusion (i.e. knowing what is happening and what to do) • predictive modeling (desirably in real time while an epidemic is on-going) is routinely used in practice for prevention Figure: SARS prediction and comparison with real outbreak data
  • 107. Pavel Loskot c 2014 10/13 Network Dynamics Spatial-temporal scales (a) short: link activation and deactivation – topology is a snapshot – connected components must respect time sequences of links (b) longer: topology change from one structure to another – communities formation, merging, splitting – large communities persist in time if there is exchange of their members – small communities persist if their core is highly connected with strong ties (c) long: network evolution (birth, growth and decline) – in scale-free networks, in spite of changes (nodes and links appear and disappear), degree, weight and strength distributions remain stationary
  • 108. Pavel Loskot c 2014 11/13 Information Cascades Aims • understand how behaviors, ideas, technology usage etc. are adopted, influenced and spread through networks Diffusion model • two nodes $v$ and $w$ • two behaviors A and B • two pay-offs $a > 0$ and $b > 0$ (i.e., the larger, the better) Network implications • let $0 \leq p \leq 1$, and let $v$ have $d$ neighbors • $pd$ neighbors of $v$ choose A • $(1 - p)d$ neighbors of $v$ choose B • A is the better strategy if: $pd \cdot a \geq (1-p)d \cdot b \Rightarrow p \geq \frac{b}{a+b}$
  • 109. Pavel Loskot c 2014 12/13 Information Cascades Example diffusion in a network • let $a = 3$, $b = 2$, so $\frac{b}{a+b} = 2/5$ • A: dark circles, B: light circles (b) only $v$ and $w$ adopt A (c) nodes $r$ and $t$ switch to A (i.e. 2/3 of their neighbors chose A); $u$ does not switch (only 1/3 of its neighbors chose A); note also that $1/3 < 2/5 < 2/3$ (d) nodes $s$ and $u$ also switch to A
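The threshold rule $p \geq b/(a+b)$ can be iterated over a network until no further node switches. The sketch below is an illustration under assumptions: the function name `run_cascade` is hypothetical, nodes update in synchronous rounds, adopters of A never switch back, and the six-node graph is invented for the example (it is not the network from the figure).

```python
def run_cascade(adj, initial_adopters, a, b):
    """Spread behaviour A: a node switches when the fraction of its neighbours
    using A is at least b / (a + b). Returns the final set of adopters."""
    adopters = set(initial_adopters)
    while True:
        new = set()
        for v in adj:
            if v in adopters:
                continue
            n_a = sum(1 for u in adj[v] if u in adopters)
            # n_a / d >= b / (a + b), written without division to avoid rounding issues
            if n_a * (a + b) >= b * len(adj[v]):
                new.add(v)
        if not new:
            return adopters
        adopters |= new

# Hypothetical network (an assumption, not the slide's figure)
edges = [("v", "w"), ("v", "r"), ("w", "r"), ("v", "t"),
         ("w", "t"), ("r", "s"), ("t", "u"), ("s", "u")]
adj = {}
for x, y in edges:
    adj.setdefault(x, set()).add(y)
    adj.setdefault(y, set()).add(x)

full = run_cascade(adj, {"v", "w"}, a=3, b=2)      # threshold 2/5
stalled = run_cascade(adj, {"v", "w"}, a=1, b=3)   # threshold 3/4
```

With threshold 2/5 the cascade reaches every node in two rounds, while with threshold 3/4 nobody beyond the initial adopters switches, which shows how sensitive the outcome is to the pay-offs.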
  • 110. Pavel Loskot c 2014 13/13 Take-Home Messages Cascades • initial adoption by few nodes may generate complete cascade • it is dependent on network structure • it is also crucially dependent on threshold b/(a+b), so changing pay-offs can make big difference (e.g. making the product more attractive) • OR, directly influence key nodes (initial adopters) • densely inter-connected clusters are difficult to penetrate • key parameters: clusters connection density and pay-off threshold Role of weak ties • very useful in spreading information • poor in transferring behaviors that are risky and/or costly Influencing nodes • in networks with many clusters, users are more easily influenced • reinforcement is very important in influencing users • node centrality is crucial for (information, behavior) diffusion
  • 112. Pavel Loskot c 2014 1/24 Max Flow and Min Cut Scenario • single source node s and single sink node t (for simplicity) • directed edges between nodes represent flows (information, material, . . . ) • every edge assigned a weight representing max possible flow ≡ capacity
  • 113. Pavel Loskot c 2014 2/24 Max Flow and Min Cut Dual problems (of combinatorial optimization) 1. find minimum cut of a graph G = (V, E) where V is set of nodes and E are weighted edges (max flows) 2. find maximum possible total flow from s ∈ V to t ∈ V over E while flows at every other node are equalized (in-flow = out-flow) Cut (S, T) • node partitioning V = S ∪ T such that S ∩ T = ∅ and s ∈ S and t ∈ T Capacity of cut (S, T) • sum of weights (capacities) leaving set S and entering set T
  • 114. Pavel Loskot c 2014 3/24 Max Flow and Min Cut Minimum cut problem • find the cut with the minimum capacity Maximum flow problem • assign flows to edges not larger than their capacity, so that the total flow from $s$ to $t$ is maximized and the flows in all other nodes ($V \setminus \{s, t\}$) are equalized
  • 115. Pavel Loskot c 2014 4/24 Max Flow and Min Cut Observation 1 • flow from S to T is equal to the total flow reaching sink t
  • 116. Pavel Loskot c 2014 5/24 Max Flow and Min Cut Observation 2 • flow from S to T is at most equal to capacity of the cut • if flow from S to T is equal to capacity of the cut, then we have maximum possible flow from S to T and (S, T) is minimum cut
  • 117. Pavel Loskot c 2014 6/24 Max Flow and Min Cut Greedy algorithm 1. select a path from $s$ to $t$ and set its flow equal to the minimum capacity among its edges (≡ bottleneck) 2. for every edge, obtain the residual capacity ≡ capacity − flow; to be able to “undo” flow already sent, add a reverse edge $(w, v)$ for every edge $(v, w)$ that carries flow 3. augment the flow along further paths with strictly positive residual capacities
  • 118. Pavel Loskot c 2014 7/24 Max Flow and Min Cut Ford-Fulkerson algorithm • greedy algorithm to find a maximum flow • find augmenting path with strictly positive residual capacities • if path can no longer be augmented, the flow is maximum Max-flow min-cut theorem • The value of maximum flow is equal to the capacity of the minimum cut. Complexity of Ford-Fulkerson algorithm • assume capacities are integers 1, . . . , U • Theorem 1: the algorithm terminates in at most |V| · U iterations. • Theorem 2: if all edge capacities are integers, then the maximum flow has integer values of flows on every edge.
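The augmenting-path idea can be sketched as follows. This version always picks an augmenting path with the fewest edges using breadth-first search (the Edmonds-Karp variant of Ford-Fulkerson), which is one specific choice among those the method allows. The function name `max_flow` and the two small example networks are assumptions.

```python
from collections import deque

def max_flow(capacity, s, t):
    """capacity: dict (u, v) -> capacity of directed edge u->v. Returns the max flow value."""
    residual = dict(capacity)
    for (u, v) in capacity:
        residual.setdefault((v, u), 0)     # reverse edges allow sent flow to be "undone"
    adj = {}
    for (u, v) in residual:
        adj.setdefault(u, []).append(v)
    flow = 0
    while True:
        # Breadth-first search for a path with strictly positive residual capacities
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj.get(u, []):
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow                    # no augmenting path left: flow is maximum
        # Bottleneck = minimum residual capacity along the path found
        bottleneck = float("inf")
        v = t
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[(parent[v], v)])
            v = parent[v]
        # Augment: reduce forward residuals, increase reverse residuals
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
            v = u
        flow += bottleneck

# Hypothetical networks
net1 = {("s", "a"): 10, ("s", "b"): 5, ("a", "b"): 15, ("a", "t"): 5, ("b", "t"): 10}
net2 = {("s", "a"): 1, ("s", "b"): 1, ("a", "b"): 1, ("a", "t"): 1, ("b", "t"): 1}
flow1 = max_flow(net1, "s", "t")
flow2 = max_flow(net2, "s", "t")
```

In the first network the cut $(\{s\}, V \setminus \{s\})$ has capacity 15, which equals the flow found, in line with the max-flow min-cut theorem.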
  • 119. Pavel Loskot c 2014 8/24 Max Flow and Min Cut Choosing initial augmenting path • some choices lead to an exponential time algorithm, clever choices lead to a polynomial time algorithm (number of iterations): 1. choose the path with the fewest edges (shortest path, breadth first search) 2. choose the path with the maximum bottleneck capacity (fattest path, priority or depth first search) Application: Bipartite matching • find a maximum matching of a bipartite graph $G$ • solve the max-flow problem for an extended graph $G'$ • by the integer theorem (see above), there exists a maximum flow with 0/1 values
  • 120. Pavel Loskot c 2014 9/24 Take-Home Messages Applications of max-flow and min-cut theorem • Network connectivity • Bipartite matching • Data mining • Open-pit mining • Airline scheduling • Image processing • Project selection • Baseball elimination • Network reliability • Security of statistical data • Distributed computing • Egalitarian stable matching • . . . There are many efficient algorithms for solving the max-flow min-cut problem.
  • 121. Pavel Loskot c 2014 10/24 Network Routing Routing algorithms • find the least cost path between any two nodes in the (telecommunication) network • link cost: e.g. inverse of capacity, delay, or more simply, all links have a unit cost • path cost: sum of link costs along the path 1. Link state routing algorithms • assume every node has knowledge about the network topology and all link costs • thus, all nodes have the same (global) knowledge (how?) • so-called centralized or link state algorithms 2. Distance vector routing algorithms • only local knowledge of link costs to all neighbors • iterative computations in collaboration with neighbors • so-called decentralized or distance vector algorithms
  • 122. Pavel Loskot c 2014 11/24 Network Routing Link state routing: Dijkstra algorithm • every node computes the least cost path to all other nodes in the network • the computed paths are stored in so-called forwarding table • after K iterations, the least cost paths known for K destination nodes Algorithm: c(x, y) link cost between neighbors x and y (= ∞ if not neighbors) D(v) current cost of path from source to destination node v p(v) predecessor node along path from source to node v V′ set of nodes whose least cost paths already known
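Using the notation above ($c(x, y)$, $D(v)$, $p(v)$, $V'$), the algorithm can be sketched with a priority queue. The function name `dijkstra` is hypothetical, and the six-node network is an assumption: its link costs were chosen to agree with the Bellman-Ford example used elsewhere in this course, not copied from a figure.

```python
import heapq

def dijkstra(costs, source):
    """costs: dict node -> {neighbour: link cost c(x, y)}. Returns (D, p)."""
    D = {source: 0}          # D(v): current cost of the path from source to v
    p = {source: None}       # p(v): predecessor of v along that path
    done = set()             # V': nodes whose least cost path is already known
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue         # stale queue entry
        done.add(u)
        for v, c in costs[u].items():
            if v not in done and d + c < D.get(v, float("inf")):
                D[v] = d + c
                p[v] = u
                heapq.heappush(heap, (D[v], v))
    return D, p

# Hypothetical network with symmetric link costs
links = [("u", "v", 2), ("u", "x", 1), ("u", "w", 5), ("v", "x", 2), ("v", "w", 3),
         ("x", "w", 3), ("x", "y", 1), ("w", "y", 1), ("w", "z", 5), ("y", "z", 2)]
costs = {}
for a, b, c in links:
    costs.setdefault(a, {})[b] = c
    costs.setdefault(b, {})[a] = c

D, p = dijkstra(costs, "u")

# Reconstruct the least cost path to z by tracking predecessors
path, node = [], "z"
while node is not None:
    path.append(node)
    node = p[node]
path.reverse()
```

The forwarding table of the source follows from `p`: the next hop towards a destination is the first node after the source on the reconstructed path.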
  • 123. Network Routing
    Dijkstra algorithm example
    • the shortest path is constructed by tracking predecessors
    • if ties are encountered, they can be broken arbitrarily
  • 124. Network Routing
    Dijkstra algorithm example
    Complexity of Dijkstra algorithm
    • at each iteration, all nodes not yet in $V'$ must be checked, i.e., $N(N+1)/2$ comparisons in total $\sim O(N^2)$
    • more efficient (heap-based) implementations have been devised, $\sim O(N \log N)$ for sparse networks
  • 125. Network Routing
    Distance vector algorithm
    • fully distributed generation of forwarding tables
    • based on the Bellman-Ford equation (dynamic programming)
      $d_x(y) = \min_{v \in N(x)} \left( c(x, v) + d_v(y) \right)$
      $v^* = \arg\min_{v \in N(x)} \left( c(x, v) + d_v(y) \right)$
    • $N(x)$: neighbors of node $x$
    • $c(x, v)$: link cost from $x$ to $v$
    • $d_v(y)$: cost from neighbor $v$ to destination $y$
    • $v^*$: next hop in the least cost path from $x$ to $y$
    Example
    • given $d_v(z) = 5$, $d_x(z) = 3$, $d_w(z) = 3$:
      $d_u(z) = \min\{ c(u, v) + d_v(z),\ c(u, x) + d_x(z),\ c(u, w) + d_w(z) \} = \min\{ 2 + 5,\ 1 + 3,\ 5 + 3 \} = 4$
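The example above can be checked with a few lines of Python. The function name `bellman_ford_step` and the dictionary layout are assumptions made for this sketch; the numbers are those of the example.

```python
def bellman_ford_step(link_cost, neighbor_dist):
    """One application of the Bellman-Ford equation at a single node.

    link_cost[v] is c(x, v); neighbor_dist[v] is d_v(y) for a fixed y.
    Returns the least cost d_x(y) and the next hop v*.
    """
    next_hop = min(link_cost, key=lambda v: link_cost[v] + neighbor_dist[v])
    return link_cost[next_hop] + neighbor_dist[next_hop], next_hop


c_u = {"v": 2, "x": 1, "w": 5}     # c(u, v), c(u, x), c(u, w)
d_to_z = {"v": 5, "x": 3, "w": 3}  # d_v(z), d_x(z), d_w(z)
d_u_z, hop = bellman_ford_step(c_u, d_to_z)
print(d_u_z, hop)  # 4 x
```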
  • 126. Network Routing
    Distance vector algorithm
    • $D_x(y)$ is the estimate of the least cost from $x$ to $y$; it is refined iteratively
    • every node $x$ maintains distance vectors for itself and for all its neighbors; recall that $V$ is the set of all nodes and $N(x)$ is the set of neighbors of $x$
      $D_x = [D_x(y) : y \in V]$
      $D_v = [D_v(y) : y \in V], \quad v \in N(x)$
    • in addition, $x$ knows the costs $c(x, v)$ to all its neighbors $v \in N(x)$
    • the key idea is to periodically exchange the distance vectors $D_x$ among neighbors; the vectors are then updated using the Bellman-Ford equation:
      $D_x(y) \leftarrow \min_{v \in N(x)} \left( c(x, v) + D_v(y) \right)$ for all $y \in V$
      so that (under some minor conditions) the estimate $D_x(y)$ converges to the true value $d_x(y)$
    Distance vector updates (at each node)
    1. asynchronous: triggered by a change of local link cost, or by an update message from a neighbor
    2. synchronous: notify all neighbors if own distance vector changes
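The iterative update can be simulated in a few lines. The sketch below is a simplification under stated assumptions: all nodes exchange vectors in lock-step rounds (a real network is asynchronous), link costs are static and symmetric, every node has at least one neighbor, and the three-node topology is made up for illustration.

```python
INF = float("inf")


def distance_vector(cost):
    """Repeat the Bellman-Ford update at every node until nothing changes.

    cost[x][v] is the link cost c(x, v) between neighbors x and v.
    Returns the converged vectors D[x][y] and the number of rounds run.
    """
    nodes = list(cost)
    # initial estimate: direct link cost, infinity for non-neighbors
    D = {x: {y: 0 if x == y else cost[x].get(y, INF) for y in nodes}
         for x in nodes}
    rounds = 0
    changed = True
    while changed:
        changed = False
        # vectors received from neighbors at the start of this round
        received = {x: dict(D[x]) for x in nodes}
        for x in nodes:
            for y in nodes:
                if x == y:
                    continue
                best = min(cost[x][v] + received[v][y] for v in cost[x])
                if best != D[x][y]:
                    D[x][y] = best
                    changed = True
        rounds += 1
    return D, rounds


# Illustrative three-node network (costs are assumptions)
cost = {
    "x": {"y": 2, "z": 7},
    "y": {"x": 2, "z": 1},
    "z": {"x": 7, "y": 1},
}
D, rounds = distance_vector(cost)
print(D["x"]["z"])  # 3, via y instead of the direct link of cost 7
```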
  • 127. Network Routing
    Example updates
  • 128. Network Routing
    "Good news travels fast"
    "Bad news travels slowly"
    Comparison of link state (LS) and distance vector (DV) algorithms
    • Messages
      LS: $O(|V| \cdot |E|)$ messages sent
      DV: local exchange only
    • Convergence
      LS: $O(|V|^2)$, may have oscillations
      DV: time varies, possibly loops, count-to-infinity problem
    • Robustness
      LS: may advertise incorrect link cost; each node computes its own table
      DV: may advertise incorrect path cost; each node's table is used by others (errors propagate)
  • 129. Search on Networks
    • the aim is to find some source-destination path in a reasonable amount of time
    • unlike in routing, the path cost is not an issue
    Surprising observations (from real-world networks)
    1. short paths exist between pairs of nodes (six degrees of separation)
    2. these short paths can be discovered (and used)
    Remarks
    • both observations are closely interrelated
    • it is not so clear how to discover (or even create) these short paths
    • typically, nodes have only local rather than global information; flooding to discover the destination is known to be very inefficient
    Decentralized search
    • Kleinberg's small world network model: an $n \times n$ grid of nodes with local connections, plus every node $v$ has a random long-range link to a node $w$ with
      $\Pr(v \text{ links to } w) \sim d(v, w)^{-\alpha}$, $\alpha \ge 0$
      where the distance $d(v, w)$ is the number of grid steps
    • the value of $\alpha$ controls how random the long-range connections are
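A minimal sketch of how a long-range contact could be drawn in this model is given below. The function names and the use of `random.choices` for the weighted draw are choices made for this illustration; nodes are assumed to be `(row, column)` pairs on an n × n grid.

```python
import random


def grid_distance(v, w):
    """Number of grid steps between nodes v and w."""
    return abs(v[0] - w[0]) + abs(v[1] - w[1])


def long_range_contact(v, n, alpha, rng=random):
    """Draw the long-range contact w of v with Pr proportional to d(v, w)^(-alpha)."""
    candidates = [(i, j) for i in range(n) for j in range(n) if (i, j) != v]
    weights = [grid_distance(v, w) ** -alpha for w in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed for repeatability
for alpha in (0, 2, 4):
    sample = [long_range_contact((5, 5), 10, alpha, rng) for _ in range(1000)]
    mean = sum(grid_distance((5, 5), w) for w in sample) / len(sample)
    print(alpha, round(mean, 2))
```

With alpha = 0 every other node is equally likely; as alpha grows the contacts concentrate near v, so the mean distance printed by the loop shrinks.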
  • 130. Search on Networks
    Comparing search strategies
    • the efficiency of a search strategy is its expected delivery time (averaged over random long-range contacts, i.e. the topology, and over random source-destination pairs)
    • delivery time $\sim$ number of hops in the graph (unit-weight links)
    Trading off the value of $\alpha$
    • $\alpha = 0$: long-range links are uniformly distributed ($\sim$ WS model); difficult to navigate with only local knowledge (plus knowledge of the destination's location)
    • for $\alpha = 0$, the path actually chosen to the destination is likely to be significantly longer than the corresponding shortest path
    • $\alpha > 0$: higher clustering, long-range links less random, more realistic scenario
    • lower bounds on the expected delivery time [Kleinberg 2000]:
      $\bar{T}_D \ge \begin{cases} \mathrm{const} \times n^{(2-\alpha)/3} & 0 \le \alpha < 2 \\ \mathrm{const} \times (\log n)^2 & \alpha = 2 \\ \mathrm{const} \times n^{(\alpha-2)} & 2 < \alpha < 3 \end{cases}$
      thus, for $\alpha = 2$ the bound is polynomial in $\log n$, while in the other cases it is polynomial in $n$
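Delivery time under a decentralized strategy can be explored with a small simulation. The sketch below assumes one particular strategy, greedy forwarding (each node hands the message to whichever of its contacts is closest to the destination on the grid); the function names, the grid size and the seed are illustrative assumptions, and the setup is far too small to reproduce asymptotic bounds.

```python
import random


def dist(v, w):
    """Number of grid steps between v and w."""
    return abs(v[0] - w[0]) + abs(v[1] - w[1])


def build_long_range(n, alpha, rng):
    """Give every node one long-range link with Pr proportional to d^(-alpha)."""
    nodes = [(i, j) for i in range(n) for j in range(n)]
    links = {}
    for v in nodes:
        others = [w for w in nodes if w != v]
        weights = [dist(v, w) ** -alpha for w in others]
        links[v] = rng.choices(others, weights=weights)[0]
    return links


def greedy_search(source, target, n, links):
    """Forward to the contact nearest the target; return the hop count."""
    hops = 0
    current = source
    while current != target:
        i, j = current
        contacts = [(i + di, j + dj)
                    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                    if 0 <= i + di < n and 0 <= j + dj < n]
        contacts.append(links[current])  # the long-range contact
        # some grid neighbor is always one step closer, so this terminates
        current = min(contacts, key=lambda w: dist(w, target))
        hops += 1
    return hops


rng = random.Random(1)
n = 20
for alpha in (0, 2, 4):
    links = build_long_range(n, alpha, rng)
    trials = [greedy_search((rng.randrange(n), rng.randrange(n)),
                            (rng.randrange(n), rng.randrange(n)), n, links)
              for _ in range(500)]
    print(alpha, sum(trials) / len(trials))
```

Because every step strictly reduces the grid distance, the hop count never exceeds the grid distance between source and destination; the long-range links can only shorten the walk.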
  • 131. Search on Networks
    Web search
    • information retrieval has used "textual analysis" since the 1960s
    • more recently, information is ranked by its score (e.g., number of links to it)
    Scoring a webpage
    • number of webpages pointing to it (unit-weight links)
    • sum of the scores of neighboring webpages pointing to it
  • 132. Search on Networks
    Authorities
    • nodes pointed to by highly ranked nodes
    • they offer prominent, highly endorsed answers to queries
    Hubs
    • nodes that point to highly ranked nodes
    Assessing authorities and hubs
    • compute weights $h(i)$ (for hubs) and $b(i)$ (for authorities)
      $h(i) = \sum_j [A]_{ij}\, b(j)$
      $b(i) = \sum_j [A]_{ji}\, h(j)$
    • the weights are computed iteratively as (in matrix form)
      $h_{t+1} = (A A^T)\, h_t$
      $b_{t+1} = (A^T A)\, b_t$
    • main drawback: it requires global knowledge (of $A$), and it is query-dependent
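The iteration in matrix form can be sketched in plain Python as follows. It assumes that `A[i][j] = 1` means page i links to page j, and it rescales the weights after every step so that they sum to one; the rescaling, the function name `hits` and the four-page example are assumptions added for this sketch.

```python
def hits(A, iterations=50):
    """Iterate b <- A^T h and h <- A b, i.e. h <- (A A^T) h up to scaling.

    A is a square 0/1 adjacency matrix given as a list of lists.
    Returns hub weights h and authority weights b, each summing to 1.
    """
    n = len(A)
    h = [1.0] * n
    b = [1.0] * n
    for _ in range(iterations):
        # authority weight: sum of hub weights of the pages pointing to i
        b = [sum(A[j][i] * h[j] for j in range(n)) for i in range(n)]
        # hub weight: sum of authority weights of the pages i points to
        h = [sum(A[i][j] * b[j] for j in range(n)) for i in range(n)]
        # rescale so the weights stay bounded (an added normalization step)
        total_b = sum(b) or 1.0
        total_h = sum(h) or 1.0
        b = [x / total_b for x in b]
        h = [x / total_h for x in h]
    return h, b


# Pages 0 and 1 both link to page 2; page 0 also links to page 3
A = [
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
h, b = hits(A)
print([round(x, 3) for x in h])
print([round(x, 3) for x in b])
```

In this example page 2 ends up with the largest authority weight and page 0 with the largest hub weight.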
  • 133. Search on Networks
    PageRank (named after Google co-founder Larry Page)
    • ranks pages independently of queries
    • main idea: a page is important if it is linked to by other important pages
    • every page is assigned a weight
      $w(j) = \sum_i [A]_{ij}\, w(i) \cdot \frac{1}{d_{\mathrm{out}}(i)}$
      where $w(i)$ are the weights of the in-bound neighboring pages, and $d_{\mathrm{out}}(i)$ is the out-degree of node $i$, which dilutes its importance if it links to many other nodes
    • the weights $w(i)$ are the probabilities that, from any starting page, page $i$ is reached via a random walk
    • however, if some page has no out-bound links, the random walker gets trapped; so with probability $s$ continue the random walk, and with probability $(1 - s)$ jump randomly to any other node
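The random-walk description translates into a short power iteration. The sketch below makes two simplifying assumptions that are not stated on the slide: the random jump lands uniformly on any page (including the current one), and a page without out-bound links spreads its weight uniformly over all pages. The value `s = 0.85`, the function name and the four-page example are likewise illustrative.

```python
def pagerank(A, s=0.85, iterations=100):
    """Power iteration for the weights w; A[i][j] = 1 if page i links to j."""
    n = len(A)
    out_degree = [sum(row) for row in A]
    w = [1.0 / n] * n
    for _ in range(iterations):
        # with probability (1 - s) the walker jumps to a random page
        new_w = [(1.0 - s) / n] * n
        for i in range(n):
            if out_degree[i] == 0:
                # no out-bound links: spread the weight over all pages
                for j in range(n):
                    new_w[j] += s * w[i] / n
            else:
                # with probability s follow one of the out-bound links
                for j in range(n):
                    if A[i][j]:
                        new_w[j] += s * w[i] * A[i][j] / out_degree[i]
        w = new_w
    return w


# Links: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0, 3 -> 2
A = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
]
w = pagerank(A)
print([round(x, 3) for x in w])
```

Page 2, which collects links from three pages, receives the largest weight, while page 3, which nobody links to, keeps only the share contributed by random jumps.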
  • 134. Search on Networks
    Strategies
    • many strategies may be devised; some are more efficient than others
    • decentralized search is a practical requirement in large networks
    • in social networks, weak (social) ties and hierarchy play a significant role
    • visiting the same nodes repeatedly while searching is inefficient, yet there is a tendency to visit hubs often
    To aid the search
    • nodes as sources of information are scored (e.g. by level of trust)
    • exploiting the network structure of (distributed) information helps significantly
    • challenge: real-time updates of contents
    • ranking (i.e. scoring) algorithms are kept secret and are changed (updated) continuously
  • 135. Take-Home Messages
    Routing
    • the task is not only to find a source-destination path, but the one with the least cost
    • it is implicitly assumed that each node has an address (identification)
    • routing in the Internet evolved over time (i.e., it was not designed from the beginning)
    • it is still unclear why Internet routing works so well at such large scales
    • the main issues with Internet routing are robustness, security and congestion
    Search on small world and scale free networks
    • small world networks have a short average path length and a high clustering coefficient; however, the Watts-Strogatz (WS) model does not capture the navigability of real-world networks
    • search is fast and scales well in scale-free networks
  • 137. Software Requirements for Graph Data
    Tasks
    • input data in a common format (e.g. Excel, CSV, . . . )
    • convert (output) data into the desired format (GraphML, Pajek, . . . )
    • Social Network Analysis (SNA) of data
    • dynamic (temporal) analysis
    • data visualization
    Requirements
    • gentle learning curve (easy to grasp)
    • flexibility (use different formats for input and output)
    • scalability (Big Data, application dependent)
    • speed (if Big Data or real-time)
    • parallel and distributed computing capability (MapReduce)
    • functionality as modules or add-ins
    • . . .
  • 138. Networks in Matlab
  • 139. Networks in Matlab
  • 140. Networks with Python
  • 141. Networks in C, R, Python
  • 142. Networks Visualization and Analysis
  • 143. Networks Community Analysis
  • 144. Social Network Analysis
  • 145. Popular in Bioinformatics
  • 146. Networks Online Demos
  • 147. Networks Data