Network Measures: Characterizing networks

Lecture 4:
Network Measures
CS 765: Complex Networks
Slides are modified from Networks: Theory and Application by Lada Adamic

Characterizing networks:
Is everything connected?

Network metrics: components
 If there is a path from every vertex in a network to every
other, the network is connected
 otherwise, it is disconnected
 Component: a subset of vertices such that there exists at
least one path from each member of the subset to every other member,
and no other vertex in the network
is connected to any vertex in the subset
 Maximal subset
 A singleton vertex that is not connected to any other
forms a component of size one
 Every vertex belongs to exactly one component
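The component definition above can be sketched with a breadth-first search. This is a minimal illustration, not part of the original slides: the adjacency-dictionary input format and the small `example` graph are assumptions chosen for the sketch.

```python
from collections import deque

def connected_components(adj):
    """Return the components (sets of vertices) of an undirected graph.

    adj maps each vertex to the set of its neighbours (assumed input format).
    """
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    component.add(w)
                    queue.append(w)
        components.append(component)
    return components

# Hypothetical graph: a path 1-2-3, an edge 4-5, and a singleton vertex 6.
example = {1: {2}, 2: {1, 3}, 3: {2}, 4: {5}, 5: {4}, 6: set()}
components = connected_components(example)
# Fraction of the graph covered by the largest component.
largest_fraction = max(len(c) for c in components) / len(example)
```

Every vertex ends up in exactly one set, and the singleton forms a component of size one.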

network metrics: size of giant component
 if the largest component encompasses a significant fraction of the graph,
it is called the giant component

components in directed networks
 Weakly connected components:
 every node can be reached from every other node by following links in
either direction
 In the example: {A, B, C, D, E} and {F, G, H}
 Strongly connected components:
 each node within the component can be reached from every other node
in the component by following directed links
 In the example: {B, C, D, E}, {A}, {G, H} and {F}
[Figure: example directed network on nodes A to H]

components in directed networks
 Every strongly connected component of more than one
vertex has at least one cycle
 Out-component: set of all vertices that are reachable
via directed paths starting at a specific vertex v
 Out-components of all members of a strongly
connected component are identical
 In-component: set of all vertices from which there is a
directed path to a vertex v
 In-components of all members of a strongly connected
component are identical
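One way to compute these sets is sketched below, assuming a dictionary of out-neighbours. The small directed graph is hypothetical (it is not the A to H example from the figure), and the vertex v is counted as a member of its own out- and in-component, which is a convention assumed here. The strongly connected component of v is then the intersection of the two.

```python
def reachable(adj, v):
    """Vertices reachable from v along directed edges (v itself included)."""
    seen, stack = {v}, [v]
    while stack:
        u = stack.pop()
        for w in adj.get(u, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def reverse(adj):
    """Return the graph with every edge direction flipped."""
    rev = {u: set() for u in adj}
    for u, targets in adj.items():
        for w in targets:
            rev.setdefault(w, set()).add(u)
    return rev

# Hypothetical directed graph: A -> B -> C -> D -> B, and D -> E.
edges = {"A": {"B"}, "B": {"C"}, "C": {"D"}, "D": {"B", "E"}, "E": set()}
out_component = reachable(edges, "B")           # follows edge direction
in_component = reachable(reverse(edges), "B")   # follows edges backwards
strong_component = out_component & in_component
```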

bowtie model of the web
 The Web is a directed graph:
 webpages link to other webpages
 The connected components tell us what set of pages can
be reached from any other just by surfing
 no ‘jumping’ around by typing in a URL or using a search engine
 Broder et al. 1999 – crawl of over 200 million pages and
1.5 billion links.
 SCC – 27.5%
 IN and OUT – 21.5% each
 Tendrils and tubes – 21.5%
 Disconnected – 8%

How far apart are things?

Network metrics: paths
 A path is any sequence of vertices such that every
consecutive pair of vertices in the sequence is
connected by an edge in the network.
 For directed: traversed in the correct direction for the edges.
 A path can visit the same vertex or edge more than once
 Self-avoiding paths do not intersect themselves.
 Path length r is the number of edges on the path
 Also called the number of hops

Paths
 A path between nodes i_0 and i_n is an ordered list
of n links P = {(i_0, i_1), (i_1, i_2), (i_2, i_3), ... , (i_{n-1}, i_n)}.
 The length of the path is n.
 The path shown in orange in (a) follows the route
1→2→5→7→4→6, hence its length is n = 5
 The shortest path between nodes 1 and 7 is the path with the
fewest links connecting them; its length is the
distance d_17.
 There can be multiple shortest paths of the same length, as illustrated by
the two paths shown in orange and grey.
 The network diameter is the largest distance
in the network
 being d_max = 3 here.
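Distances, the diameter and the average path length can all be obtained from breadth-first search. The sketch below assumes an unweighted, connected, undirected graph stored as an adjacency dictionary; the four-node graph is hypothetical and is not the network from the figure.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path length (in hops) from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

# Hypothetical graph: edges 1-2, 2-3, 2-4, 3-4.
adj = {1: {2}, 2: {1, 3, 4}, 3: {2, 4}, 4: {2, 3}}
all_dist = {v: bfs_distances(adj, v) for v in adj}
pairs = [(u, v) for u in adj for v in adj if u < v]
diameter = max(all_dist[u][v] for u, v in pairs)
average_path_length = sum(all_dist[u][v] for u, v in pairs) / len(pairs)
```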

Independent paths
 Edge independent paths: if they share no common edge
 Vertex independent paths: if they share no common
vertex except start and end vertices
 Vertex-independent => Edge-independent
 Also called disjoint paths
 These sets of paths are not necessarily unique
 Connectivity of vertices: the maximal number of
independent paths between a pair of vertices
 Used to identify bottlenecks and resiliency to failures

Cut Sets and Maximum Flow
 A minimum cut set is the smallest cut set that will
disconnect a specified pair of vertices
 Need not be unique
 Menger’s theorem: If there is no cut set of size less than
n between a pair of vertices, then there are at least n
independent paths between the same vertices
 Implies that the size of min cut set is equal to maximum number
of independent paths
 for both edge and vertex independence
 Maximum flow between a pair of vertices is the number
of edge-independent paths times the edge capacity (assuming every edge has the same capacity)
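The link between edge-independent paths and maximum flow can be illustrated with a unit-capacity augmenting-path search. This is a sketch under stated assumptions, not the method used in the lecture: the graph is undirected, every edge has capacity 1, and the function name and edge-list input are hypothetical.

```python
from collections import deque

def edge_independent_paths(edges, s, t):
    """Count edge-independent s-t paths via unit-capacity maximum flow."""
    if s == t:
        raise ValueError("s and t must be different vertices")
    cap = {}
    for u, v in edges:          # an undirected edge can be used in either direction
        cap[(u, v)] = 1
        cap[(v, u)] = 1
    nbrs = {}
    for u, v in cap:
        nbrs.setdefault(u, set()).add(v)
    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in nbrs.get(u, ()):
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        # Push one unit of flow back along the path that was found.
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1

square = [("s", "a"), ("s", "b"), ("a", "t"), ("b", "t"), ("a", "b")]
num_paths = edge_independent_paths(square, "s", "t")
```

In the `square` example the two edges at `s` form a minimum cut set, which matches the two edge-independent paths, as Menger's theorem states.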

Network metrics: shortest paths

[Figure: example network on nodes A to E annotated with shortest-path lengths]

Structural metrics:
Average path length
1 ≤ L ≤ D ≤ N-1 (L: average path length, D: diameter, N: number of nodes)

shortest paths
 <d> = 0.35 + 2.06 log10(N)
 prediction: <d> = 17.5 for 200 million nodes
 actual: <d> = 16 for reachable pairs
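The prediction can be checked with a short calculation. The base of the logarithm is assumed to be 10 here, since that is the choice that reproduces the quoted value of about 17.5; the function name is hypothetical.

```python
from math import log10

def predicted_average_distance(num_pages):
    """Fitted average shortest path, assuming a base-10 logarithm."""
    return 0.35 + 2.06 * log10(num_pages)

prediction = predicted_average_distance(200_000_000)
```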
[Figure: average shortest path versus number of webpages (x-axis in units of 10^4)]

Paths
[Figure panels: path, shortest path, diameter, average path length, cycle, Eulerian path, Hamiltonian path]

Node degree
[Figure: example directed network on nodes 1 to 5]

A =
0 0 0 0 0
0 0 1 1 0
0 1 0 1 0
0 0 0 0 1
1 1 0 0 0

 Outdegree of node i: $k_i^{out} = \sum_{j=1}^{n} A_{ij}$
 example: the outdegree for node 3 is 2, which we obtain by summing
the non-zero entries in the 3rd row, $\sum_{j=1}^{n} A_{3j}$
 Indegree of node j: $k_j^{in} = \sum_{i=1}^{n} A_{ij}$
 example: the indegree for node 3 is 1, which we obtain by summing
the non-zero entries in the 3rd column, $\sum_{i=1}^{n} A_{i3}$

Node degrees

 Undirected network: $L = \frac{1}{2} \sum_{i=1}^{n} k_i$
 Directed network: $L = \sum_{i=1}^{n} k_i^{in} = \sum_{i=1}^{n} k_i^{out}$
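The row and column sums can be checked directly on the 5 × 5 adjacency matrix from the node degree slide. The sketch assumes the convention used there (row i lists the links leaving node i); the variable names are illustrative.

```python
# Adjacency matrix from the slide; A[i][j] == 1 means a link from node i+1 to node j+1.
A = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0],
]
n = len(A)
out_degree = [sum(A[i][j] for j in range(n)) for i in range(n)]  # row sums
in_degree = [sum(A[i][j] for i in range(n)) for j in range(n)]   # column sums
num_links = sum(out_degree)  # every directed link is counted once
```

Node 3 (index 2) has outdegree 2 and indegree 1, and the in-degrees and out-degrees both sum to the number of links.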

Degree sequence and Degree frequency
 Degree sequence: An ordered list of the (in,out) degree of each node
 In-degree sequence:
 [2, 2, 2, 1, 1, 1, 1, 0]
 Out-degree sequence:
 [2, 2, 2, 2, 1, 1, 1, 0]
 (undirected) degree sequence:
 [3, 3, 3, 2, 2, 1, 1, 1]
 Degree frequency: A frequency count of the occurrence of each degree
In-degree frequency:
[(2,3) (1,4) (0,1)]
Out-degree frequency :
[(2,4) (1,3) (0,1)]
(undirected) frequency :
[(3,3) (2,2) (1,3)]
[Figure: in-degree frequency histogram (x-axis: indegree, y-axis: frequency)]

Degree distribution
 The degree distribution is a function P(k), which gives
the probability of a randomly chosen node from the
graph having degree k
In-degree      0     1     2     3
Frequency      1     4     3     0
Distribution   0.13  0.50  0.38  0.00

Out-degree     0     1     2     3
Frequency      1     3     4     0
Distribution   0.13  0.38  0.50  0.00

Degree         0     1     2     3
Frequency      0     3     2     3
Distribution   0.00  0.38  0.25  0.38
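The frequency and distribution rows can be derived from a degree sequence as sketched below, using the in-degree sequence listed on the degree sequence slide. The variable names are illustrative.

```python
from collections import Counter

in_degree_sequence = [2, 2, 2, 1, 1, 1, 1, 0]
frequency = Counter(in_degree_sequence)          # degree -> number of nodes
n = len(in_degree_sequence)
# P(k): probability that a randomly chosen node has degree k.
distribution = {k: frequency[k] / n for k in sorted(frequency)}
```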

Structural Metrics: Degree distribution

Degree Distribution Plot
The x-axis represents the degree and the y-axis
represents the fraction of nodes having that
degree
On social networking sites, there exist many users with few
connections and a handful of users with very large
numbers of friends.
[Figure: Facebook degree distribution]

Degree distributions
 Imagine I have a graph with 1000 nodes, but no links.
Now I start adding links randomly, one by one.
 After 10 random additions, what do you expect the degree
distribution to be?
 What will the average node degree be after 1000 additions?
 This is the standard situation in a network where links are
added completely at random.
 If there are n nodes and m randomly added edges, then the
peak of the degree distribution is at 2m/n, the average degree.
 For a randomly picked node, the most likely degree is the
average one.
 The probabilities then drop quickly on either side.
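The thought experiment can be simulated. This is a rough sketch: the function name, the fixed seed, and the choice to disallow self-loops and duplicate links are assumptions made for the illustration.

```python
import random
from collections import Counter

def random_links(n, m, seed=0):
    """Add m distinct links between n nodes uniformly at random; return the degrees."""
    rng = random.Random(seed)
    degree = [0] * n
    links = set()
    while len(links) < m:
        u, v = rng.sample(range(n), 2)      # two distinct endpoints
        link = (min(u, v), max(u, v))
        if link not in links:
            links.add(link)
            degree[u] += 1
            degree[v] += 1
    return degree

degree = random_links(n=1000, m=1000)
average_degree = sum(degree) / len(degree)   # always 2m/n
histogram = Counter(degree)                  # degree -> number of nodes
after_ten = random_links(n=1000, m=10)
```

After 10 additions at most 20 of the 1000 nodes have a non-zero degree; after 1000 additions the average degree is 2m/n = 2.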
[Figure: degree distribution P(k) versus degree k for a random network]

Degree Distributions
Protein interactions of yeast

degree distribution
 indegree: power-law exponent α ≈ 2.1
 outdegree: power-law exponent α ≈ 2.4
source: Pennock et al.: Winners don't take all: Characterizing the competition for links on the web
PNAS April 16, 2002 vol. 99 no. 8 5207-5211

How dense are they?

network metrics: graph density
 Of the connections that may exist between n nodes
 directed graph
Lmax = n*(n-1)
 undirected graph
Lmax = n*(n-1)/2
 What fraction are present?
 density = L / Lmax
 In real networks L << Lmax
 For example, out of 12 possible connections,
this graph has 7, giving it a density of 7/12 = 0.583
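The density formulas can be written as a small helper. The function name is hypothetical, and the example assumes the pictured graph is a directed graph on 4 nodes, which is the reading consistent with 12 possible connections.

```python
def density(num_nodes, num_links, directed=True):
    """Fraction of the possible links that are present."""
    max_links = num_nodes * (num_nodes - 1)
    if not directed:
        max_links //= 2
    return num_links / max_links

# Assumed example: 4 nodes, directed, 7 of the 12 possible links present.
example_density = density(num_nodes=4, num_links=7, directed=True)
```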

Graph density
 Would this measure be useful for comparing networks of
different sizes (different numbers of nodes)?
 As n → ∞, a graph whose density reaches
 0 is a sparse graph
 a constant is a dense graph

Transitivity
 A relation ∘ is said to be transitive if a ∘ b and b ∘ c together
imply a ∘ c
 Perfect transitivity in network → cliques
 Partial transitivity
 u knows v and v knows w →

Local Clustering Coefficient
 Clustering coefficient measures transitivity in
undirected graphs
 Local clustering coefficient measures transitivity at the
node level
 Commonly employed for undirected graphs
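 Computes how strongly neighbors of a node $v_i$ (nodes
adjacent to $v_i$) are themselves connected:
$C(v_i) = \frac{\text{number of pairs of neighbors of } v_i \text{ that are connected}}{\text{number of pairs of neighbors of } v_i}$
In an undirected graph, the
denominator can be rewritten as: $\binom{d_i}{2} = d_i (d_i - 1) / 2$, where $d_i$ is the degree of $v_i$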

Local Clustering Coefficient: Example
 Thin lines depict connections to neighbors
 Dashed lines are the missing connections among
neighbors
 Solid lines indicate connected neighbors
 When all neighbors are connected, the local clustering coefficient is 1
 When none of the neighbors are connected, it is 0
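A minimal sketch of the local clustering coefficient follows, assuming an undirected graph stored as an adjacency dictionary. Returning 0 for nodes with fewer than two neighbours is a convention assumed here, and the example graph is hypothetical.

```python
from itertools import combinations

def local_clustering(adj, v):
    """Fraction of pairs of neighbours of v that are themselves connected."""
    neighbours = adj[v]
    d = len(neighbours)
    if d < 2:
        return 0.0  # assumed convention: no pairs of neighbours to check
    connected = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return connected / (d * (d - 1) / 2)

# Hypothetical graph: triangle v-a-b plus a pendant node c attached to v.
adj = {"v": {"a", "b", "c"}, "a": {"v", "b"}, "b": {"v", "a"}, "c": {"v"}}
coefficients = {node: local_clustering(adj, node) for node in adj}
```

Node `a` has all of its neighbours connected (coefficient 1), while `v` has one connected pair out of three.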

Structural Metrics:
Clustering coefficient

Clustering Coefficient and Triples
 Triple: an ordered set of three nodes,
 connected by two edges (open triple) or
 three edges (closed triple)
 A triangle can miss any of its three edges
 A triangle has 3 triples
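$v_i v_j v_k$ and $v_j v_k v_i$ are different triples
• The same members
• The first is missing edge $e(v_k, v_i)$ and the second is missing edge $e(v_i, v_j)$
$v_i v_j v_k$ and $v_k v_j v_i$ are the same triple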

[Global] Clustering Coefficient
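 Count paths of length two and check whether the third edge
exists:
$C = \frac{|\text{closed paths of length 2}|}{|\text{paths of length 2}|}$
When counting triangles, since every triangle has 6
closed paths of length 2:
$C = \frac{6 \times \text{number of triangles}}{|\text{paths of length 2}|}$
Or we can rewrite it as
$C = \frac{3 \times \text{number of triangles}}{\text{number of connected triples}}$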

[Global] Clustering Coefficient: Example
• Average clustering coefficient and global clustering
coefficient are different
• In some extreme cases they could differ
considerably
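The difference between the two coefficients can be seen on a small hypothetical graph. The sketch assumes an undirected adjacency dictionary, counts each connected triple once (centred on its middle node), and treats nodes with fewer than two neighbours as having local coefficient 0.

```python
from itertools import combinations

def global_clustering(adj):
    """Closed connected triples divided by all connected triples."""
    closed = triples = 0
    for v, neighbours in adj.items():
        for a, b in combinations(neighbours, 2):
            triples += 1            # a - v - b is a connected triple centred on v
            if b in adj[a]:
                closed += 1         # the third edge a - b exists
    return closed / triples if triples else 0.0

def average_local_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    total = 0.0
    for v, neighbours in adj.items():
        pairs = list(combinations(neighbours, 2))
        if pairs:
            total += sum(1 for a, b in pairs if b in adj[a]) / len(pairs)
    return total / len(adj)

# Hypothetical graph: triangle v-a-b plus a pendant node c attached to v.
adj = {"v": {"a", "b", "c"}, "a": {"v", "b"}, "b": {"v", "a"}, "c": {"v"}}
global_cc = global_clustering(adj)
average_cc = average_local_clustering(adj)
```

Each triangle contributes three closed triples, so the numerator equals 3 × (number of triangles).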

clustering & motifs
 clustering coefficient ~ 0.11 (at the site level)
Source: Milo et al., “Superfamilies of evolved and designed networks”, Science 303 (5663), p. 1538-1542, 2004.

Local Clustering and Redundancy
 Redundancy

Reciprocity
If you become my friend, I’ll be yours
 Reciprocity is a simplified version of
transitivity
 It considers closed loops of length 2
 If node v is connected to node u,
 then u, by connecting to v, exhibits reciprocity

Reciprocity
 How likely is it that the node you point to will point to you
as well?
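One common way to quantify this is the fraction of directed links whose reverse link also exists. The sketch below assumes that definition; the function name and the edge list are hypothetical.

```python
def reciprocity(edges):
    """Fraction of directed links (u, v) for which (v, u) is also a link."""
    edge_set = set(edges)
    reciprocated = sum(1 for (u, v) in edge_set if u != v and (v, u) in edge_set)
    return reciprocated / len(edge_set)

# Hypothetical directed links: A and B point to each other; B -> C and C -> A do not reciprocate.
edges = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "A")]
r = reciprocity(edges)
```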

Reciprocity: Example
Reciprocal nodes:

Cocitation and Bibliographic coupling
 Cocitation of two vertices i and j is the number of
vertices that have outgoing edges to both
 Bibliographic coupling is the number of vertices to
which both point
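Both quantities can be computed from an adjacency matrix. The sketch assumes the row-to-column convention (A[i][j] = 1 means node i points to node j) and reuses the 5 × 5 matrix from the node degree slide purely as sample data.

```python
# A[i][j] == 1 means a link from node i+1 to node j+1 (assumed convention).
A = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0],
]
n = len(A)
# Cocitation of i and j: number of nodes k that point to both i and j.
cocitation = [[sum(A[k][i] * A[k][j] for k in range(n)) for j in range(n)]
              for i in range(n)]
# Bibliographic coupling of i and j: number of nodes k that both i and j point to.
coupling = [[sum(A[i][k] * A[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]
```

With this convention the diagonal of `cocitation` holds the in-degrees and the diagonal of `coupling` holds the out-degrees.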

Signed Edges and Structural balance
 Friends / Enemies
 Friend of friend →
 Enemy of my enemy →
 Structural balance: every loop contains an even number of
“negative links”
 Structurally balanced → can be partitioned into groups where
internal links are positive and between-group links are
negative

Social Balance Theory
 Consistency in friend/foe relationships among individuals
 Informally, friend/foe relationships are consistent when
 In the network
 Positive edges demonstrate friendships ($w_{ij} = 1$)
 Negative edges demonstrate being enemies ($w_{ij} = -1$)
 Triangle of nodes i, j and k is balanced if and only if $w_{ij} w_{jk} w_{ki} > 0$
 $w_{ij}$ denotes the value of the edge between nodes i and j

Social Balance Theory: Possible Combinations
For any cycle, if the product of the edge values is positive,
then the cycle is socially balanced
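The product rule can be checked for the four possible sign combinations of a triangle. The sketch assumes edge values of +1 (friend) and -1 (enemy); the function and dictionary names are illustrative.

```python
from math import prod

def is_balanced(cycle_weights):
    """A cycle is balanced when the product of its edge values is positive."""
    return prod(cycle_weights) > 0

# The four sign patterns of a triangle, up to relabelling of the nodes.
triangles = {
    "+++": (1, 1, 1),     # three mutual friends
    "+--": (1, -1, -1),   # two friends with a common enemy
    "++-": (1, 1, -1),    # a friend of my friend is my enemy
    "---": (-1, -1, -1),  # three mutual enemies
}
balanced = {name: is_balanced(w) for name, w in triangles.items()}
```

A positive product is equivalent to an even number of negative links on the cycle.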

Keith Collins, Loubna Mrie - Quartz

Similarity
 What interaction patterns are common?
 Reciprocity and Transitivity
 Balance and Status
 Who are the like-minded users and how can we
find these similar individuals?
 Similarity
 Who are the central figures (influential nodes) in
the network?
 Centrality

Structural Equivalence
 Structural Equivalence:
 We look at the neighborhood shared by two
nodes;
 The size of this shared neighborhood defines how
similar two nodes are.
 Example:
 Two brothers have in common
 sisters, mother, father, grandparents, etc.
 This shows that they are similar

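Structural Equivalence: Definitions
 Vertex similarity: $\sigma(v_i, v_j) = |N(v_i) \cap N(v_j)|$, where $N(v)$ is the neighborhood of $v$
 Normalize?
 Jaccard Similarity: $\sigma_{Jaccard}(v_i, v_j) = \frac{|N(v_i) \cap N(v_j)|}{|N(v_i) \cup N(v_j)|}$
 Cosine Similarity: $\sigma_{Cosine}(v_i, v_j) = \frac{|N(v_i) \cap N(v_j)|}{\sqrt{|N(v_i)| \, |N(v_j)|}}$
 The neighborhood often excludes the node itself
 Issue?
 Connected nodes not sharing a neighbor will be assigned zero similarity
 Solution:
 We can assume nodes are included in their neighborhoods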
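The two normalized measures can be sketched using their standard definitions (shared neighbours divided by the size of the union, or by the geometric mean of the neighbourhood sizes). Neighbourhoods here exclude the node itself, and the small graph is hypothetical.

```python
from math import sqrt

def jaccard(adj, u, v):
    """Shared neighbours divided by the size of the combined neighbourhood."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0

def cosine(adj, u, v):
    """Shared neighbours divided by the geometric mean of the two degrees."""
    denominator = sqrt(len(adj[u]) * len(adj[v]))
    return len(adj[u] & adj[v]) / denominator if denominator else 0.0

# Hypothetical undirected graph: edges 1-2, 1-3, 1-4, 2-3.
adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
jaccard_2_4 = jaccard(adj, 2, 4)
cosine_2_4 = cosine(adj, 2, 4)
jaccard_2_3 = jaccard(adj, 2, 3)
```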

Similarity Significance
Measuring Similarity Significance: compare the
calculated similarity value with its expected value
where vertices pick their neighbors at random
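 For vertices $v_i$ and $v_j$ with degrees $d_i$ and $d_j$ this
expectation is $d_i d_j / n$
 There is a $d_i / n$ chance of becoming $v_i$’s neighbor
 $v_j$ selects $d_j$ neighbors
 We can rewrite neighborhood overlap as $|N(v_i) \cap N(v_j)| = \sum_k A_{ik} A_{jk}$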

Normalized Similarity, cont.
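The difference between the overlap and its expectation, $\sum_k A_{ik} A_{jk} - d_i d_j / n$
(with $d_i$, $d_j$ the degrees and $n$ the number of vertices), is
$n$ times the covariance between rows $A_i$ and $A_j$ of the adjacency matrix
Normalize the covariance by the square root of the product of the variances
We get the Pearson correlation coefficient
(range: [-1, 1])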

Similarity
 Structural Equivalence: share many of the same
neighbors
 Jaccard Similarity:
 Cosine Similarity:
 Pearson Coefficient: given the degrees of two nodes, how many
common neighbors they have compared with the number expected by chance
 Euclidean Distance:
 Regular Equivalence: neighbors are the same
 Katz Similarity:

Regular Equivalence
 In regular equivalence,
 We do not look at
neighborhoods shared between
individuals, but
 How neighborhoods
themselves are similar
 Example:
 Athletes are similar not because
they know each other in person,
but since they know similar
individuals, such as coaches,
trainers, other players, etc.

Regular Equivalence
• $v_i$, $v_j$ are similar when their neighbors $v_k$ and $v_l$ are
similar

Regular Equivalence
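 $v_i$ and $v_j$ are similar when $v_j$ is similar to $v_i$’s
neighbors $v_k$: $\sigma(v_i, v_j) = \alpha \sum_k A_{ik} \sigma(v_k, v_j)$
 In vector format: $\sigma = \alpha A \sigma$
A vertex is highly similar
to itself; we guarantee
this by adding an identity
matrix to the equation: $\sigma = \alpha A \sigma + I$, so $\sigma = (I - \alpha A)^{-1}$
When $\alpha < 1/\lambda_{max}$ (with $\lambda_{max}$ the largest eigenvalue of $A$), the matrix $I - \alpha A$ is invertible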

Regular Equivalence: Example
 Any row/column of this matrix shows the similarity to other vertices
 Vertex 1 is most similar (other than itself) to vertices 2 and 3
 Nodes 2 and 3 have the highest similarity (regular equivalence)
The largest eigenvalue of A is 2.43
Set
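A regular-equivalence matrix of this kind can be approximated by iterating the relation "similarity = α × (adjacency matrix × similarity) + identity" until it settles. The sketch below is an assumption-laden illustration: the three-node path graph and the value α = 0.5 are hypothetical (not the example from the slide), and the iteration only converges when α is below the reciprocal of the largest eigenvalue of the adjacency matrix.

```python
def regular_equivalence(A, alpha, iterations=200):
    """Iterate sigma = alpha * A * sigma + I, starting from the identity matrix."""
    n = len(A)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    sigma = [row[:] for row in identity]
    for _ in range(iterations):
        sigma = [
            [alpha * sum(A[i][k] * sigma[k][j] for k in range(n)) + identity[i][j]
             for j in range(n)]
            for i in range(n)
        ]
    return sigma

# Hypothetical path graph 1 - 2 - 3; its largest eigenvalue is sqrt(2), so alpha = 0.5 is small enough.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
sigma = regular_equivalence(A, alpha=0.5)
```

The result matches the inverse of (I − αA) for this graph, and each row gives the similarity of one vertex to all others.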

Homophily and Assortative Mixing
 Assortativity: Tendency to be linked with nodes that are
similar in some way
 Humans: age, race, nationality, language, income, education
level, etc.
 Citations: papers cite work in similar fields more than work in others
 Web-pages: Language
 Disassortativity: Tendency to be linked with nodes that
are different in some way
 Network providers: End users vs other providers
 Assortative mixing can be based on
 Enumerative characteristic
 Scalar characteristic

Assortativity: An Example
 The friendship network in a US
high school in 1994
 Colors represent races,
 White: whites
 Grey: blacks
 Light Grey: hispanics
 Black: others
 High assortativity between
individuals of the same race

Assortativity Significance
 Assortativity significance
 The difference between measured assortativity and expected
assortativity
 The higher this difference, the more significant the assortativity
observed
 Example
 In a school, half the population is white and the other half is Hispanic.
 We expect 50% of the connections to be between members of
different races.
 If all connections are between members of different races, then we
have a significant finding

Modularity (enumerative)
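 Extent to which a node is connected to like nodes in the network
 + if there are more edges between nodes of the same type than
the expected value
 - otherwise
$Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j) = \sum_r (e_{rr} - a_r^2)$
$\delta(c_i, c_j)$ is 1 if $c_i$ and $c_j$ are of the same type, and 0 otherwise
$e_{rr}$ is the fraction of edges that join vertices of the same type r
$a_r$ is the fraction of ends of edges attached to vertices of type r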

Assortative coefficient (enumerative)
 Modularity is almost always less than 1, hence we can
normalize it with the Qmax value
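Modularity and its normalized form can be sketched from the fractions of same-type edges and of edge ends per type. The sketch assumes the standard form Q = Σ_r (e_rr − a_r²) with Qmax = 1 − Σ_r a_r²; the function name, the edge list and the node types are hypothetical.

```python
def modularity(edges, node_type):
    """Return (Q, Q / Qmax) for an undirected edge list and a node -> type map."""
    m = len(edges)
    types = set(node_type.values())
    e = {r: 0.0 for r in types}   # fraction of edges joining two vertices of type r
    a = {r: 0.0 for r in types}   # fraction of edge ends attached to vertices of type r
    for u, v in edges:
        a[node_type[u]] += 1 / (2 * m)
        a[node_type[v]] += 1 / (2 * m)
        if node_type[u] == node_type[v]:
            e[node_type[u]] += 1 / m
    q = sum(e[r] - a[r] ** 2 for r in types)
    q_max = 1 - sum(a[r] ** 2 for r in types)
    # With a single type q_max is 0 and the normalized value is undefined.
    return q, (q / q_max if q_max else float("nan"))

# Hypothetical network: two triangles of different types joined by one edge.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
node_type = {1: "x", 2: "x", 3: "x", 4: "y", 5: "y", 6: "y"}
q, assortativity = modularity(edges, node_type)
```

Here six of the seven edges join nodes of the same type, so Q is positive and the normalized coefficient is about 0.71.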

Assortative coefficient (scalar)
 r=1, perfectly assortative
 r=-1, perfectly disassortative
 r=0, non-assortative
 Usually node degree is used as the scalar

Modularity Example
The number of edges between nodes of the same color
is less than the expected number of edges between them

Assortativity Coefficient of Various Networks
M.E.J. Newman. Assortative mixing in networks

Measuring Assortativity for Ordinal Attributes
 A common measure for analyzing the relationship between
ordinal values is covariance
 It describes how two variables change together
 In our case, we have a network
 We are interested in how values assigned to nodes that are
connected (via edges) are correlated

Covariance Variables
 The value assigned to node $v_i$ is $x_i$
 We construct two variables $X_L$ and $X_R$
 For any edge $(v_i, v_j)$, we assume that $x_i$ is observed from variable
$X_L$ and $x_j$ is observed from variable $X_R$
 $X_L$ represents the ordinal values associated with the left node
(the first node) of the edges and $X_R$ represents the values
associated with the right node (the second node) of the
edges
 We need to compute the covariance between variables $X_L$ and $X_R$

Covariance Variables: Example
$X_L$ (left-node values): (18, 21, 21, 20)
$X_R$ (right-node values): (21, 18, 20, 21)
List of edges:
(A, C)
(C, A)
(C, B)
(B, C)

Normalizing Covariance
Pearson correlation is the
normalized version of covariance
In our case:
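The covariance and its normalized form can be computed for the four-edge example. The node values A = 18, B = 20 and C = 21 are inferred from the two value lists and the edge list on the example slide; the helper names are illustrative.

```python
from math import sqrt

# Node values inferred from the example: (A, C) gives 18 on the left and 21 on the right, and so on.
values = {"A": 18, "B": 20, "C": 21}
edges = [("A", "C"), ("C", "A"), ("C", "B"), ("B", "C")]
x_left = [values[u] for u, _ in edges]    # values at the first node of each edge
x_right = [values[v] for _, v in edges]   # values at the second node of each edge

def mean(xs):
    return sum(xs) / len(xs)

def covariance(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

cov = covariance(x_left, x_right)
# Pearson correlation: covariance divided by the product of the standard deviations.
pearson = cov / sqrt(covariance(x_left, x_left) * covariance(x_right, x_right))
```

The negative correlation indicates that, in this small example, connected nodes tend to have different values.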

Measures and Metrics
 Knowing the structure of a network, we can calculate
various useful quantities or measures that capture
particular features of the network topology.
 Most such measures originate in social network analysis
 So far,
 Average path length, Diameter, Degree distribution, Density,
Assortativity, Connectedness, Clustering coefficient,
 Centrality
 Degree, Eigenvector, Katz, PageRank, Hubs, Closeness,
Betweenness, ….
 Several other graph metrics

Outline
 Network metrics can help us characterize networks
 This has its roots in graph theory
 Today there are many network analysis tools to choose
from
