Financial Cartography - Center for Financial Research
Financial Networks III. Centrality and Systemic Importance
1. Center for Financial Studies at the Goethe University
PhD Mini-course
Frankfurt, 25 January 2013
Dr. Kimmo Soramäki
Founder and CEO
FNA, www.fna.fi
2. Agenda for today
• Centrality and Network Core
• Developing SinkRank
• Analyzing and visualizing cross-border banking exposures
4. Common centrality metrics
Centrality aims to summarize some notion of importance.
Operationalizing the concept is more challenging.
Degree: number of links
Closeness: distance from/to other nodes via shortest paths
Betweenness: number of shortest paths going through the node
Eigenvector: nodes that are linked by other important nodes are more central; equivalently, the long-run probability of finding a random process at the node
5. Centrality depends on network process
• Trajectory
  – Geodesic paths (shortest paths)
  – Any path (visit a given node once)
  – Trails (visit a given link once)
  – Walks (free movement)
• Transmission
  – Parallel duplication
  – Serial duplication
  – Transfer
Borgatti (2005). Centrality and network flow. Social Networks 27, pp. 55–71.
7. Closeness
• The Farness of a node is
defined as the sum of its
distances to all other nodes
• The Closeness of a node is
defined as the inverse of the
farness
• Needs a connected graph (or
component)
• Directed/undirected
• Weighted/unweighted
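A minimal Python sketch of farness and closeness for an unweighted graph, assuming the graph is given as an adjacency-list dict (an illustrative input format) and using breadth-first search for the shortest paths; a weighted graph would need Dijkstra's algorithm instead. Closeness is taken as the plain inverse of farness, as defined above; some tools additionally normalise by n − 1.

```python
from collections import deque


def farness_and_closeness(adj, source):
    """Return (farness, closeness) of `source` in an unweighted graph.

    `adj` maps each node to an iterable of neighbours (out-neighbours for a
    directed graph). Raises ValueError if some node cannot be reached,
    because closeness needs a connected graph (or component).
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search yields shortest-path distances
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    if len(dist) != len(adj):
        raise ValueError("some nodes are unreachable from source")
    farness = sum(dist.values())  # the source itself contributes 0
    if farness == 0:
        raise ValueError("closeness is undefined for a single node")
    return farness, 1 / farness


# Path a - b - c: the middle node is closest to everyone else.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(farness_and_closeness(path, "b"))  # (2, 0.5)
print(farness_and_closeness(path, "a"))  # (3, 0.3333333333333333)
```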
8. Betweenness Centrality
• Measures the number of shortest paths going through a vertex or an
arc
• Algorithm
– For each pair of vertices (s,t),
compute the shortest paths
between them
– For each pair of vertices (s,t),
determine the fraction of shortest
paths that pass through the vertex
in question
– Sum this fraction over all pairs of
vertices (s,t).
• Directed/undirected; Weighted/unweighted
Freeman, Linton (1977). "A set of measures of centrality based upon betweenness". Sociometry 40: 35–41.
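The three algorithm steps translate almost directly into Python. This sketch assumes an unweighted graph given as an adjacency-list dict, counts shortest paths with breadth-first search, and sums over ordered pairs (s, t), so on an undirected graph every pair is counted twice (halve the result for the usual undirected convention). Function names are illustrative; Brandes' algorithm computes the same quantity much faster on large networks.

```python
from collections import deque
from itertools import permutations


def bfs_counts(adj, s):
    """Distances from s and the number of shortest paths from s to each node."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:  # first time v is reached
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:  # u is a predecessor of v
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness(adj):
    """Sum over ordered pairs (s, t) of the share of s-t shortest paths via v."""
    info = {s: bfs_counts(adj, s) for s in adj}
    bc = {v: 0.0 for v in adj}
    for s, t in permutations(adj, 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s:
            continue  # no path from s to t
        for v in adj:
            if v in (s, t):
                continue
            dist_v, sigma_v = info[v]
            # v lies on a shortest s-t path iff d(s,v) + d(v,t) == d(s,t)
            if v in dist_s and t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                bc[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return bc


star = {"hub": ["x", "y", "z"], "x": ["hub"], "y": ["hub"], "z": ["hub"]}
print(betweenness(star))  # {'hub': 6.0, 'x': 0.0, 'y': 0.0, 'z': 0.0}
```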
10. Cut Edge/Arc
Cut edge or bridge is an edge
whose deletion increases the
number of connected components
Tarjan ('74) provides a linear time
algorithm
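Tarjan's linear-time idea can be sketched with a single depth-first search that records each vertex's discovery time and its "low-link", the earliest discovery time reachable from its DFS subtree through one back edge. A tree edge u-v is a bridge exactly when low[v] > disc[u]. This Python version assumes a simple undirected graph (no parallel edges) stored as an adjacency-list dict and uses recursion, so very deep graphs would need an iterative rewrite. The example reuses the network from the next slide.

```python
def bridges(adj):
    """Return the bridges (cut edges) of a simple undirected graph."""
    disc, low, found = {}, {}, set()
    counter = [0]

    def dfs(u, parent):
        disc[u] = low[u] = counter[0]
        counter[0] += 1
        for v in adj[u]:
            if v not in disc:  # tree edge u -> v
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:  # v's subtree cannot reach u or above
                    found.add(frozenset((u, v)))
            elif v != parent:  # back edge
                low[u] = min(low[u], disc[v])

    for u in adj:
        if u not in disc:
            dfs(u, None)
    return found


edges = [("v1", "v2"), ("v1", "v5"), ("v2", "v3"), ("v3", "v4"), ("v4", "v5"),
         ("v3", "v6"), ("v6", "v8"), ("v8", "v7"), ("v6", "v7")]
adj = {}
for a, b in edges:
    adj.setdefault(a, []).append(b)
    adj.setdefault(b, []).append(a)
print(bridges(adj))  # a single bridge: the edge between v3 and v6
```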
11. Cut Points/Vertices
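Cut points (cut vertices) are vertices whose deletion increases the number of connected components; in particular, the end vertices of a cut arc are cut points if their degree is not 1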
# Add network 'CutPoint' to database.
addn -n CutPoint -preserve false
# Add vertices and arcs to network.
adda -a v1-v2
adda -a v1-v5
adda -a v2-v3
adda -a v3-v4
adda -a v4-v5
adda -a v3-v6
adda -a v6-v8
adda -a v8-v7
adda -a v6-v7
# Identify cut arc and vertex
cutarc
cutvertex
# Visualize
viz -vcolor cutvertex -vsizedefault 10 -vlabel vertex_id -awidthdefault 2 -arrows false -fontsize 25 -saveas CutPointViz
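The same depth-first search also finds cut points: a non-root vertex u is a cut point when some DFS child v has low[v] >= disc[u], and the DFS root is one when it has more than one child. This Python sketch assumes a simple undirected graph stored as an adjacency-list dict; it illustrates the technique and is not the implementation behind the `cutvertex` command. For the network built by the script above it reports v3 and v6.

```python
def cut_vertices(adj):
    """Return the cut points (articulation points) of a simple undirected graph."""
    disc, low, cuts = {}, {}, set()
    counter = [0]

    def dfs(u, parent):
        disc[u] = low[u] = counter[0]
        counter[0] += 1
        children = 0
        for v in adj[u]:
            if v not in disc:  # tree edge
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                # removing u would cut v's subtree off from the rest
                if parent is not None and low[v] >= disc[u]:
                    cuts.add(u)
            elif v != parent:  # back edge
                low[u] = min(low[u], disc[v])
        if parent is None and children > 1:  # root with several DFS subtrees
            cuts.add(u)

    for u in adj:
        if u not in disc:
            dfs(u, None)
    return cuts


# Same vertices and arcs as the CutPoint script above
edges = [("v1", "v2"), ("v1", "v5"), ("v2", "v3"), ("v3", "v4"), ("v4", "v5"),
         ("v3", "v6"), ("v6", "v8"), ("v8", "v7"), ("v6", "v7")]
adj = {}
for a, b in edges:
    adj.setdefault(a, []).append(b)
    adj.setdefault(b, []).append(a)
print(sorted(cut_vertices(adj)))  # ['v3', 'v6']
```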
13. Sample network
Adjacency matrix:
     A    B    C
A    0    1    2
B    1    0    0
C    0    1    0
Transition matrix, right stochastic (rows sum to 1):
     A    B    C
A    0    1/3  2/3
B    1    0    0
C    0    1    0
Transition matrix, left stochastic (columns sum to 1; the transpose of the right stochastic matrix):
     A    B    C
A    0    1    0
B    1/3  0    1
C    2/3  0    0
adda -a A-C -preserve false
adda -a C-B
adda -a A-B
adda -a B-A
setap -p value -value 1
setap -a A-C -p value -value 2
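Both transition matrices follow mechanically from the weighted adjacency matrix: divide each row by its row sum to get the right stochastic T, then transpose to get the left stochastic T'. A small Python sketch with exact fractions (function names are illustrative); a node without outgoing links would make the division impossible, which is the dangling-node problem that PageRank deals with later.

```python
from fractions import Fraction


def right_stochastic(A):
    """Row-normalise a weighted adjacency matrix: T[i][j] = A[i][j] / sum_k A[i][k]."""
    T = []
    for row in A:
        total = sum(row)
        if total == 0:
            raise ValueError("dangling node: row has no outgoing links")
        T.append([Fraction(a, total) for a in row])
    return T


def transpose(M):
    return [list(col) for col in zip(*M)]


# Sample network (order A, B, C): A->B 1, A->C 2, B->A 1, C->B 1
A = [[0, 1, 2],
     [1, 0, 0],
     [0, 1, 0]]
T = right_stochastic(A)  # rows sum to 1
T_left = transpose(T)    # columns sum to 1
for row in T_left:
    print([str(x) for x in row])
# ['0', '1', '0']
# ['1/3', '0', '1']
# ['2/3', '0', '0']
```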
14. Degree
• Local measure
• Can be calculated for all types of networks
• Undirected, outgoing and incoming direction
• Weighted degree = Strength
              A    B    C
degree        3    3    2
out-degree    2    1    1
in-degree     1    2    1
strength      4    3    3
out-strength  3    1    1
in-strength   1    2    2
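The degree table can be recomputed from the weighted adjacency matrix of the sample network (row = sender, column = receiver). Judging from the table, "degree" and "strength" count links in both directions (in + out), so reciprocal arcs such as A->B and B->A count twice; this Python sketch follows that reading.

```python
# Weighted adjacency matrix of the sample network (order A, B, C; row = from, column = to)
A = [[0, 1, 2],
     [1, 0, 0],
     [0, 1, 0]]
n = len(A)

out_degree = [sum(1 for w in A[i] if w) for i in range(n)]      # number of outgoing arcs
in_degree = [sum(1 for i in range(n) if A[i][j]) for j in range(n)]  # number of incoming arcs
degree = [o + d for o, d in zip(out_degree, in_degree)]

out_strength = [sum(A[i]) for i in range(n)]                     # total outgoing weight
in_strength = [sum(A[i][j] for i in range(n)) for j in range(n)]  # total incoming weight
strength = [o + s for o, s in zip(out_strength, in_strength)]

print(degree, out_degree, in_degree)        # [3, 3, 2] [2, 1, 1] [1, 2, 1]
print(strength, out_strength, in_strength)  # [4, 3, 3] [3, 1, 1] [1, 2, 2]
```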
15. Eigenvector Centrality (EVC)
• Connections are not equal: a connection to a more important node is more important
• We make the centrality x_i proportional to the sum of the centralities of i's network neighbors:
  $x_i = \frac{1}{\lambda} \sum_j A_{ij} x_j$
  where λ is a constant and A is the adjacency matrix (A_ij = 1 if link i-j exists, and 0 otherwise)
• Defining a vector of centralities x = (x_1, x_2, ..., x_n), we can rewrite this as
  $\lambda x = A x$
• We see that x is an eigenvector of the adjacency matrix with eigenvalue λ
16. EVC - Properties
• All entries of x are positive for the eigenvector associated with the largest eigenvalue (Perron–Frobenius theorem). The entry x_i gives the EVC of node i.
• Adjacency matrix A can also contain weights instead of 0-1 links -> weighted EVC
• The graph can be directed (asymmetric A) -> directed EVC
• Can contain loops (self-links, A_ii ≠ 0)
• The graph must be strongly connected!
Can be calculated only for GSCC.
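A minimal power-iteration sketch of EVC in Python, assuming a connected undirected graph given as a symmetric (possibly weighted) adjacency matrix. It iterates with A + I instead of A: the eigenvectors are the same, but the shift keeps the iteration from oscillating on bipartite graphs, where −λ is also an eigenvalue. The vector is normalised to sum to 1; other normalisations (e.g. unit length) only rescale it. The function name and example graph are illustrative.

```python
def eigenvector_centrality(A, iterations=10_000, tol=1e-12):
    """Leading eigenvector of a symmetric non-negative matrix A by power iteration.

    Uses y = (A + I) x, which has the same eigenvectors as A, and normalises
    the entries to sum to 1 after every step.
    """
    n = len(A)
    x = [1.0 / n] * n
    for _ in range(iterations):
        y = [x[i] + sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        total = sum(y)
        y = [v / total for v in y]
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x


# Triangle 0-1-2 with a pendant node 3 attached to node 0
A = [[0, 1, 1, 1],
     [1, 0, 1, 0],
     [1, 1, 0, 0],
     [1, 0, 0, 0]]
x = eigenvector_centrality(A)
print([round(v, 3) for v in x])  # node 0 scores highest, node 3 lowest
```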
17. Markov chains
• Markov chains are memoryless random
processes that undergo transitions from
one state to another
• We describe a Markov chain as follows: we have a set of states, S = {s_1, s_2, ..., s_n}
• A process starts in one of these states and moves successively from one state to another. Each move is called a step.
• If the chain is currently in state s_i, it moves to state s_j at the next step with a probability denoted by p_ij
• The probabilities p_ij are called transition probabilities, and the matrix T specifying the p_ij is called the transition matrix
[Figure: the sample network, with transition probabilities of 33.3%, 66.6% and 100% on its arcs]
18. State probability vector
• Let q(t) = (q_1(t), q_2(t), ..., q_n(t)) be the state probability vector whose ith component is the probability that the chain is in state i at time t.
• A Markov chain is fully defined by q(0) and T:
  $q(t) = q(t-1)\,T = q(0)\,T^t$
• q(t) is also called the distribution of the chain at time t
• Question: with what probabilities do we find the random process in each state s_i when t is large?
• An important node would have the process visit it often
19. Stationary probability vector
• A stationary probability vector π is defined as a vector that
does not change under application of the transition matrix
π= πT
• For any
– irreducible (~ strongly connected component)
– aperiodic (~ process does not visit nodes at determined intervals)
– positive-recurrent (~ process re-enters each node eventually)
Markov Chain there exists a unique stationary probability
vector (Fundamental Theorem of Markov Chains)
20. Simple way of Calculating
• The distribution vector after 1 step is the matrix product q(0)T
• The distribution one step later, obtained by again multiplying by T, is given by $(q(0)T)T = q(0)T^2$
• Similarly, the distribution after t steps can be obtained by multiplying q(0) on the right by T t times, i.e. by multiplying q(0) by $T^t$
• Distribution after t steps: $q(t) = q(0)T^t$
• EVC = elements of q(t) for a large t
• This is the power iteration method
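The power iteration takes a few lines of Python. This sketch applies q(t) = q(t-1)T to the right stochastic matrix of the sample network, starting with all probability on A. The chain is irreducible and aperiodic (it contains cycles of length 2 and 3), so q(t) settles on the stationary vector, which matches the EVC values listed for the sample network on slide 24. The function name is illustrative.

```python
def distribution_after(q0, T, steps):
    """Return q(steps) = q(0) T^steps via repeated row-vector times matrix products."""
    n = len(T)
    q = list(q0)
    for _ in range(steps):
        q = [sum(q[i] * T[i][j] for i in range(n)) for j in range(n)]
    return q


# Right stochastic matrix of the sample network (order A, B, C)
T = [[0.0, 1 / 3, 2 / 3],
     [1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
q = distribution_after([1.0, 0.0, 0.0], T, steps=200)
print([round(v, 3) for v in q])  # [0.375, 0.375, 0.25]
```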
21. Combining iterative and Markov chain
interpretations
• The Perron–Frobenius theorem says that in a stochastic matrix, the largest absolute eigenvalue is always 1
• The transition matrix can be right (T) or left (T') stochastic
• As a result we have:
π=Aπ (Eigenvector)
π=πT (Markov Chain)
π=T'π (eigenvector for the largest eigenvalue, i.e. 1, of the left stochastic transition matrix)
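To see that the Markov chain form and the left stochastic form describe the same vector, one can check a candidate exactly. For the sample network, π = (3/8, 3/8, 1/4) solves π = πT by hand (π_A = π_B, π_C = 2π_A/3, entries sum to 1). The Python sketch below verifies it with exact fractions in both the row-vector form πT and the column-vector form T'π.

```python
from fractions import Fraction as F

# Right stochastic transition matrix of the sample network (order A, B, C)
T = [[F(0), F(1, 3), F(2, 3)],
     [F(1), F(0), F(0)],
     [F(0), F(1), F(0)]]
T_left = [list(col) for col in zip(*T)]  # T': columns sum to 1
pi = [F(3, 8), F(3, 8), F(1, 4)]
n = len(pi)

pi_T = [sum(pi[i] * T[i][j] for i in range(n)) for j in range(n)]           # row vector times T
T_left_pi = [sum(T_left[i][j] * pi[j] for j in range(n)) for i in range(n)]  # T' times column vector

print(pi_T == pi, T_left_pi == pi)  # True True
```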
22. Most networks are not strongly connected
• EVC can be calculated only for “Giant Strongly Connected
Component” (GSCC)
• This is because the Markov chain must be irreducible, aperiodic and positive-recurrent
• Solution: PageRank and the “Random Surfer” model
23. PageRank
• Solves the problem with a “damping factor” α, which is used to modify the transition matrix (S) into
  $G_{ij} = \alpha S_{ij} + \frac{1-\alpha}{N}$
• Effectively allowing the random process out of dead-ends (dangling nodes), but at the cost of introducing error
• Effect of α
  – α = 0: centrality of each node is 1/N
  – α = 1: Eigenvector Centrality
  – α = 0.85 is commonly used
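A power-iteration sketch of PageRank in Python, assuming the form G = αS + (1 − α)/N with α the damping factor (0.85 by default). S is the row-normalised transition matrix; a dangling row is replaced by the uniform distribution, which is one common convention and not necessarily the one FNA uses. On the weighted sample network it gives about 0.368, 0.374 and 0.258, while α = 1 and α = 0 give the EVC values and the uniform 1/N, in line with the results on the next slide. The function name is illustrative.

```python
def pagerank(W, alpha=0.85, iterations=10_000, tol=1e-12):
    """PageRank by power iteration on G = alpha * S + (1 - alpha) / N.

    W[i][j] is the weight of the arc i -> j. S is W with each row divided by
    its sum; rows without out-links (dangling nodes) become uniform.
    """
    n = len(W)
    S = []
    for row in W:
        total = sum(row)
        S.append([w / total for w in row] if total else [1.0 / n] * n)
    q = [1.0 / n] * n
    for _ in range(iterations):
        new = [alpha * sum(q[i] * S[i][j] for i in range(n)) + (1 - alpha) / n
               for j in range(n)]
        if max(abs(a - b) for a, b in zip(q, new)) < tol:
            return new
        q = new
    return q


# Weighted sample network (order A, B, C): A->B 1, A->C 2, B->A 1, C->B 1
W = [[0, 1, 2],
     [1, 0, 0],
     [0, 1, 0]]
for a in (0.85, 1.0, 0.0):
    print(a, [round(v, 3) for v in pagerank(W, alpha=a)])
# 0.85 [0.368, 0.374, 0.258]
# 1.0 [0.375, 0.375, 0.25]
# 0.0 [0.333, 0.333, 0.333]
```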
24. Calculating EVC and PageRank
# Create sample network
adda -a A-C -preserve false
adda -a C-B
adda -a A-B
adda -a B-A
setap -p value -value 1
setap -a A-C -p value -value 2
# Calculate weighted and directed EVC
evc -p value -saveas EVC
# Calculate PageRank (default alpha=0.15)
# Note: alpha here is the probability of a random jump, so alpha=0.15 corresponds to the damping factor 0.85 in the slides
pagerank -p value -saveas PageRank
# Calculate PageRank (alpha=0)
pagerank -p value -alpha 0 -saveas PageRank-0
# Calculate PageRank (alpha=1)
pagerank -p value -alpha 1 -saveas PageRank-1
# Calculate same for CheiRank
cheirank -p value -saveas CheiRank
cheirank -p value -alpha 0 -saveas CheiRank-0
cheirank -p value -alpha 1 -saveas CheiRank-1
# Save results in a csv file
savev -file walkcentrality.csv

Results:
             A      B      C
EVC          0.375  0.375  0.250
PageRank     0.368  0.374  0.258
PageRank-0   0.375  0.375  0.250
PageRank-1   0.333  0.333  0.333
CheiRank     0.397  0.388  0.215
CheiRank-0   0.400  0.400  0.200
CheiRank-1   0.333  0.333  0.333
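CheiRank is PageRank computed on the network with every arc reversed, so importance flows along outgoing rather than incoming links. The self-contained Python sketch below transposes the weight matrix and runs a plain power iteration with α as the damping factor (1 minus the FNA alpha); it should reproduce the CheiRank rows of the table to three decimals. Function names and the uniform treatment of dangling rows are assumptions of this sketch.

```python
def pagerank(W, alpha=0.85, iterations=10_000, tol=1e-12):
    """Power iteration on G = alpha * S + (1 - alpha) / N (S = row-normalised W)."""
    n = len(W)
    S = [[w / sum(row) for w in row] if sum(row) else [1.0 / n] * n for row in W]
    q = [1.0 / n] * n
    for _ in range(iterations):
        new = [alpha * sum(q[i] * S[i][j] for i in range(n)) + (1 - alpha) / n
               for j in range(n)]
        if max(abs(a - b) for a, b in zip(q, new)) < tol:
            return new
        q = new
    return q


def cheirank(W, alpha=0.85):
    """PageRank of the reversed network: transpose the weight matrix first."""
    reversed_W = [list(col) for col in zip(*W)]
    return pagerank(reversed_W, alpha)


W = [[0, 1, 2],   # A->B 1, A->C 2
     [1, 0, 0],   # B->A 1
     [0, 1, 0]]   # C->B 1
print([round(v, 3) for v in cheirank(W)])             # [0.397, 0.388, 0.215]
print([round(v, 3) for v in cheirank(W, alpha=1.0)])  # [0.4, 0.4, 0.2]
```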
25. Final notes on PageRank/EVC
• Undirected vs. Directed
  – PageRank generally in-direction
  – out-direction = CheiRank
• Unweighted vs. Weighted
  – 0/1 or real values in A/T
[Figure: nodes plotted by PageRank against CheiRank, with quadrants labelled Important, Fragile, Important and Fragile, and Unimportant]
27. Maximum Clique
• A graph may contain many complete subgraphs ("cliques"), i.e. sets
of nodes where each pair of nodes is connected
• The largest of these is called 'Maximum Clique'
• One way of finding the 'core'
# Create random network
random -nv 30 -na 120 -preserve false -seed 123
# Identify maximum undirected clique
# 0 - no clique, 1 - maximum clique, 2... smaller cliques
maxclique -direction any
# Set color property of nodes in clique as red
setvp -p color -value red -e maxclique=1
# Visualize
viz -vcolor color -vsizedefault 8 -arrows false
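One standard way to find a maximum clique is the Bron–Kerbosch algorithm, which enumerates maximal cliques (here with pivoting) and keeps the largest. This Python sketch assumes a small undirected graph stored as a dict of neighbour sets; it is exponential in the worst case and is not necessarily how the `maxclique` command works. The example graph is illustrative.

```python
def maximum_clique(adj):
    """Largest clique of an undirected graph (dict: node -> set of neighbours)."""
    best = set()

    def expand(r, p, x):
        nonlocal best
        if not p and not x:  # r cannot be extended: it is a maximal clique
            if len(r) > len(best):
                best = r
            return
        # Pivoting: only branch on candidates that are not neighbours of the pivot
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return best


edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4),  # complete subgraph on 1..4
         (4, 5), (4, 6), (5, 6), (6, 7)]                  # triangle 4-5-6 and link 6-7
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)
print(sorted(maximum_clique(adj)))  # [1, 2, 3, 4]
```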
28. Newman Modularity
• Method for detecting modules (also called groups, clusters or
communities)
• Networks with high modularity have dense connections between the
nodes within modules but sparse connections between nodes in
different modules.
# Create random tree
tree -nv 30 -preserve false -seed 123
# Identify communities with Newman's modularity algorithm
newman
# Visualize
viz -vcolor newman -vsizedefault 8 -arrows false
Newman, M. E. J. (2006). "Modularity and community structure in networks". Proceedings of the National Academy of Sciences USA 103 (23): 8577–8582.
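The quantity behind the method is Newman's modularity Q = (1/2m) Σ_ij [A_ij − k_i k_j / (2m)] δ(c_i, c_j), where m is the number of links, k_i the degree of node i, and δ(c_i, c_j) = 1 when i and j are in the same module. The Python sketch below only scores a given partition of a simple undirected graph; Newman's algorithm searches for a partition with high Q, which this sketch does not do. Names and the example are illustrative.

```python
def modularity(edges, community):
    """Newman's modularity Q of a partition of a simple undirected graph.

    edges: list of (u, v) pairs; community: dict node -> module label.
    """
    m = len(edges)
    adj = {v: set() for v in community}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    k = {v: len(adj[v]) for v in community}  # degrees
    q = 0.0
    for i in community:
        for j in community:
            if community[i] == community[j]:  # delta(c_i, c_j)
                a_ij = 1.0 if j in adj[i] else 0.0
                q += a_ij - k[i] * k[j] / (2 * m)
    return q / (2 * m)


# Two triangles joined by a single link
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
split = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
print(round(modularity(edges, split), 3))  # 0.357 (= 5/14)
```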
29. Craig - von Peter Core
• Interbank markets are tiered in
a core-periphery structure
• Determines the optimal set of
core banks that achieves the
best structural match between
observed structure and
perfectly tiered structure
# Create network with core-periphery structure
complete -nv 3 -preserve false -directed false
adda -a 00001-00004
adda -a 00002-00005
adda -a 00003-00006
# Calculate core
cvpcore
# Set color property of nodes in core as red
setvp -p color -value red -e cvpcore=true
# Visualize
viz -vcolor color -vsizedefault 8 -arrows false
Ben Craig and Goetz von Peter (2010). Interbank tiering and money center banks, BIS Working Papers No 322.
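A brute-force Python sketch of the Craig–von Peter idea, following our reading of the paper: for each candidate core, count deviations from a perfectly tiered structure (missing core-core links, existing periphery-periphery links, and core banks with no link to or from the periphery) and keep the core with the fewest errors. Dividing by the number of links, as a normalised error score would, does not change which core wins. Exhaustive search over all 2^n subsets is only feasible for tiny networks such as the example above; function names are hypothetical.

```python
from itertools import combinations


def cvp_error(A, core):
    """Error count of a core/periphery split relative to a perfectly tiered pattern.

    A[i][j] = 1 if bank i has a link (e.g. lends) to bank j, else 0.
    """
    n = len(A)
    periphery = [i for i in range(n) if i not in core]
    # Core-core block should be complete: count missing links
    e_cc = sum(1 for i in core for j in core if i != j and not A[i][j])
    # Periphery-periphery block should be empty: count existing links
    e_pp = sum(1 for i in periphery for j in periphery if i != j and A[i][j])
    # Every core bank should link to at least one periphery bank ...
    e_cp = sum(1 for i in core if not any(A[i][j] for j in periphery))
    # ... and receive a link from at least one periphery bank
    e_pc = sum(1 for j in core if not any(A[i][j] for i in periphery))
    return e_cc + e_pp + e_cp + e_pc


def best_core(A):
    """Try every subset as the core; only feasible for small networks."""
    best, best_err = set(), None
    for size in range(len(A) + 1):
        for core in combinations(range(len(A)), size):
            err = cvp_error(A, set(core))
            if best_err is None or err < best_err:
                best, best_err = set(core), err
    return best, best_err


# Nodes 00001..00006 from the script above, relabelled 0..5
n = 6
A = [[0] * n for _ in range(n)]
for i, j in [(0, 1), (0, 2), (1, 2), (0, 3), (1, 4), (2, 5)]:
    A[i][j] = A[j][i] = 1  # undirected: store both directions
core, errors = best_core(A)
print(sorted(core), errors)  # [0, 1, 2] 0
```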
30. Developing a centrality metric for Payment Systems
SinkRank
Discussion Paper, No. 2012-43 | September 3, 2012
http://www.economics-ejournal.org/economics/discussionpapers/2012-43
31. Interbank Payment Systems
• Provide the backbone of all
economic transactions
• Banks settle claims arising from customers' transfers, own securities/FX trades and liquidity management
• TARGET2 settled 839 trillion in 2010
32. Systemic Risk in Payment Systems
• Credit risk has been virtually eliminated by system design (real-time
gross settlement)
• Liquidity risk remains
– “Congestion”
– “Liquidity Dislocation”
• Trigger may be
– Operational/IT event
– Liquidity event
– Solvency event
• Time scale is intraday, spillovers possible
34. Distance to Sink
• Markov chains are well-suited to model transfers along walks
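• Absorbing Markov chains give distances, i.e. the expected number of steps a walk needs to reach the sink (absorbing node)
[Figure: the sample network with transition probabilities (33.3%, 66.6%, 100%) and the distances to each sink; e.g. to A the distance is 1 from B and 2 from C, and to B it is 1 from C]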
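Distance to a sink can be computed as an expected absorption time: make the sink absorbing, let Q be the transition matrix restricted to the other (transient) nodes, and solve (I − Q)h = 1. This Python sketch uses exact fractions and plain Gauss–Jordan elimination, assumes the sink is reachable from every other node, and illustrates the technique rather than reproducing the SinkRank paper's code. For the sample network it gives 1 and 2 to A (from B and C) and 1 to B from C, plus the remaining distances.

```python
from fractions import Fraction


def distances_to_sink(T, sink):
    """Expected number of steps until a walk on T first reaches `sink`.

    Solves (I - Q) h = 1, where Q is T restricted to the non-sink nodes,
    by Gauss-Jordan elimination in exact arithmetic. Assumes `sink` is
    reachable from every other node (otherwise I - Q is singular).
    """
    others = [i for i in range(len(T)) if i != sink]
    k = len(others)
    # Augmented matrix [I - Q | 1]
    M = [[(1 if r == c else 0) - Fraction(T[others[r]][others[c]]) for c in range(k)]
         + [Fraction(1)] for r in range(k)]
    for col in range(k):
        pivot = next(r for r in range(col, k) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(k):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return {others[r]: M[r][k] for r in range(k)}


# Right stochastic matrix of the sample network (order A, B, C)
T = [[Fraction(0), Fraction(1, 3), Fraction(2, 3)],
     [Fraction(1), Fraction(0), Fraction(0)],
     [Fraction(0), Fraction(1), Fraction(0)]]
names = "ABC"
for sink in range(3):
    d = distances_to_sink(T, sink)
    print("to", names[sink], {names[i]: str(h) for i, h in d.items()})
# to A {'B': '1', 'C': '2'}
# to B {'A': '5/3', 'C': '1'}
# to C {'A': '2', 'B': '3'}
```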
35. SinkRank
• SinkRank is the average distance to a node via (weighted) walks from other nodes
• We need an assumption on the distribution of liquidity in the network at the time of failure
  – Assume uniform -> unweighted average
  – Estimate distribution -> PageRank-weighted average
  – Use real distribution -> the real distribution is used as weights
[Figure: SinkRanks on unweighted networks]
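Given those distances, SinkRank is a (weighted) average over the other nodes. The Python sketch below computes distances by value iteration, h_i ← 1 + Σ_{j≠sink} T_ij h_j, and takes an optional `weights` list (a hypothetical parameter) for the assumed liquidity distribution: uniform weights give the unweighted average, and PageRank scores would give the PageRank-weighted one. Any further normalisation used in the published SinkRank is not reproduced here.

```python
def distances_to_sink(T, sink, iterations=100_000, tol=1e-12):
    """Expected steps to first reach `sink`, by value iteration h = 1 + Q h."""
    n = len(T)
    h = [0.0] * n
    for _ in range(iterations):
        new = [0.0 if i == sink else
               1.0 + sum(T[i][j] * h[j] for j in range(n) if j != sink)
               for i in range(n)]
        if max(abs(a - b) for a, b in zip(h, new)) < tol:
            return new
        h = new
    return h


def sinkrank(T, sink, weights=None):
    """Average distance to `sink` from the other nodes, weighted by `weights`."""
    n = len(T)
    if weights is None:
        weights = [1.0] * n  # uniform liquidity -> unweighted average
    others = [i for i in range(n) if i != sink]
    d = distances_to_sink(T, sink)
    return sum(weights[i] * d[i] for i in others) / sum(weights[i] for i in others)


# Right stochastic matrix of the sample network (order A, B, C)
T = [[0.0, 1 / 3, 2 / 3],
     [1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
print([round(sinkrank(T, s), 3) for s in range(3)])  # [1.5, 1.333, 2.5]
```

A lower value means liquidity reaches the sink faster; passing, for example, the PageRank scores as `weights` gives the PageRank-weighted variant.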
38. Experiments
• Design issues
– Real vs. artificial networks?
– Real vs. simulated failures?
– How to measure disruption?
• Approach taken
1. Create artificial data with close resemblance to the US Fedwire
system (BA-type, Soramäki et al 2007)
2. Simulate failure of a bank: the bank can only receive but not send
any payments for the whole day
3. Measure “Liquidity Dislocation” and “Congestion” experienced by non-failing banks
4. Correlate 3. (the “Disruption”) with SinkRank of the failing bank
39. Barabási–Albert (BA) model
• Based on Barabási–Albert (BA) model
• The BA algorithm generates random scale-free networks and is based on two forces, growth and preferential attachment:
– The network begins with an initial network of m0 (>2) nodes.
– New nodes are added to the network one at a time.
– Each new node is connected to existing nodes with a probability that is
proportional to the number of links that the existing nodes already have.
• Instead of links, we generate payments (multiple links between pairs
of nodes)
• We use lower preferential attachment accumulation than the BA
model
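The growth and preferential-attachment steps can be sketched in Python as below. This is the textbook BA model with m links per new node, started from a complete graph on m0 nodes; it does not include the modifications described above (payments as repeated links, weaker preferential attachment), and `m`, `m0` and `seed` are illustrative parameters. Drawing uniformly from a list in which every node appears once per link end is what makes the choice proportional to degree.

```python
import random


def barabasi_albert(n, m, m0=3, seed=None):
    """Return the undirected edge list of a BA network with n nodes."""
    if not (1 <= m <= m0 < n and m0 >= 2):
        raise ValueError("need 1 <= m <= m0 < n and m0 >= 2")
    rng = random.Random(seed)
    # Initial network: complete graph on nodes 0 .. m0-1
    edges = [(i, j) for i in range(m0) for j in range(i + 1, m0)]
    # Each node appears once per link end, so a uniform draw from this list
    # picks an existing node with probability proportional to its degree.
    ends = [v for edge in edges for v in edge]
    for new in range(m0, n):  # growth: add nodes one at a time
        targets = set()
        while len(targets) < m:  # preferential attachment, m distinct targets
            targets.add(rng.choice(ends))
        for t in targets:
            edges.append((new, t))
            ends.extend((new, t))
    return edges


edges = barabasi_albert(1000, m=2, seed=42)
degree = {}
for a, b in edges:
    degree[a] = degree.get(a, 0) + 1
    degree[b] = degree.get(b, 0) + 1
print(len(edges))            # 1997 = 3 initial links + 2 * 997 new ones
print(max(degree.values()))  # degree of the largest hub
```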
41. Measures
• Congestion: duration of delays in the system aggregated
over all banks
• Liquidity Dislocation: the average reduction in available
funds of the other banks due to the failing bank
• Disruption: duration-weighted sum of Congestion and
Liquidity Dislocation
• -> Carry out counterfactual simulations with generated data: fail individual banks and measure the impact
42. Distance from Sink vs. Disruption
[Figure: relationship between Failure Distance and Disruption when the most central bank fails]
Highest disruption to banks whose liquidity is absorbed first (low Distance to Sink)
43. SinkRank vs. Disruption
[Figure: relationship between SinkRank and Disruption]
Highest disruption by banks that absorb liquidity quickly from the system (low SinkRank)