ICPSR - Complex Systems Models in the Social Sciences - Lecture 3 - Professor Daniel Martin Katz
1. Complex Systems Models
in the Social Sciences
(Lecture 3)
daniel martin katz
illinois institute of technology
chicago kent college of law
@computational danielmartinkatz.com computationallegalstudies.com
3. Stanley Milgram’s
Other Experiment
Milgram was interested in the
structure of society
Including the social distance
between individuals
While the term “six degrees” is often
attributed to Milgram, it can be traced to ideas
from Hungarian author Frigyes Karinthy
What is the average distance
between two individuals in
society?
5. Six Degrees of Separation?
[Map: NE (Nebraska) and MA (Massachusetts)]
Target person worked in Boston as a stockbroker
296 senders from Boston and Omaha.
20% of senders reached target.
Average chain length = 6.5.
And So the term ...
“Six degrees of Separation”
6. Six Degrees
Six Degrees is a claim that “average path
length” between two individuals in society
is ~ 6
The idea of ‘Six Degrees’ was popularized
through plays/movies and the Kevin Bacon
game
http://oracleofbacon.org/
9. But What is Wrong
with Milgram’s Logic?
150^2 = 22,500
150^3 = 3,375,000
150^4 = 506,250,000
150^5 = 75,937,500,000
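The arithmetic above can be reproduced with a minimal sketch. It assumes, as the naive argument does, that every person has exactly 150 acquaintances and that no two acquaintance circles overlap; the function name `naive_reach` is hypothetical.

```python
def naive_reach(acquaintances: int, steps: int) -> int:
    """People reachable in `steps` hops if no acquaintance circles overlap."""
    return acquaintances ** steps

# Reproduce the slide's figures for 2 to 5 steps with 150 acquaintances each.
for steps in range(2, 6):
    print(steps, f"{naive_reach(150, steps):,}")
```

The no-overlap assumption is exactly what clustering violates: if my friends' friends are mostly my friends too, each extra step reaches far fewer new people.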
10. The Strength of ‘Weak’ Ties
Does Milgram get
it right? (Mark Granovetter)
Visualization Source: Early Friendster – MIT Network
www.visualcomplexity.com
Strong and Weak Ties
(Clustered v. Spanning)
Clustering: my friends’ friends
are also likely to be friends
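One common way to quantify "my friends' friends are also friends" is the local clustering coefficient: the share of a node's neighbor pairs that are themselves linked. The sketch below uses a small hypothetical friendship graph (a triangle a-b-c plus one extra tie c-d); the data and the name `local_clustering` are illustrative assumptions, not taken from the lecture.

```python
from itertools import combinations

# Hypothetical undirected friendship ties.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]

adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def local_clustering(adj, node):
    """Fraction of pairs of node's neighbors that are linked to each other."""
    nbrs = adj[node]
    if len(nbrs) < 2:
        return 0.0  # convention: fewer than two neighbors gives 0
    pairs = list(combinations(sorted(nbrs), 2))
    linked = sum(1 for x, y in pairs if y in adj[x])
    return linked / len(pairs)

for node in sorted(adj):
    print(node, round(local_clustering(adj, node), 3))
```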
11. So Was Milgram Correct?
Small Worlds (i.e. Six Degrees) was a theoretical
and an empirical Claim
The Theoretical Account Was Incorrect
The Empirical Claim was still intact
The question: how could real social networks
display both small worlds and clustering?
At the same time, the Strength of Weak Ties was
also a theoretical and empirical proposition
12. Watts and Strogatz (1998)
A few random links in an otherwise clustered
graph yield the types of small world
properties found by Milgram
“Randomness” is the key bridge between the small
world result and the clustering that is
commonly observed in real social networks
13. Watts and Strogatz (1998)
A small amount of random rewiring, or
something akin to weak ties, allows for
clustering and small worlds
[Figure: locally clustered graph and random graph]
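The rewiring idea can be sketched in plain Python. This is a simplified reading of the Watts-Strogatz procedure, not the authors' exact code: start from a ring lattice where each node is tied to its k nearest neighbors, then move one end of each edge to a random node with probability p. The function names, parameter values, and seeds are assumptions chosen for illustration.

```python
import random
from collections import deque

def watts_strogatz(n, k, p, seed=0):
    """Ring lattice (k nearest neighbors), each edge rewired with probability p."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    edges = [(i, (i + s) % n) for s in range(1, k // 2 + 1) for i in range(n)]
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    for i, j in edges:
        if rng.random() < p:
            # New endpoint: any node that is not i and not already tied to i.
            candidates = [t for t in range(n) if t != i and t not in adj[i]]
            if candidates:
                t = rng.choice(candidates)
                adj[i].discard(j)
                adj[j].discard(i)
                adj[i].add(t)
                adj[t].add(i)
    return adj

def avg_clustering(adj):
    """Mean share of each node's neighbor pairs that are themselves linked."""
    total = 0.0
    for nbrs in adj.values():
        d = len(nbrs)
        if d >= 2:
            links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
            total += 2 * links / (d * (d - 1))
    return total / len(adj)

def avg_path_length(adj):
    """Mean shortest-path length over all reachable ordered pairs (BFS)."""
    total = pairs = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

lattice = watts_strogatz(100, 4, 0.0)
small_world = watts_strogatz(100, 4, 0.1, seed=1)
for name, g in [("lattice", lattice), ("rewired", small_world)]:
    print(name, round(avg_clustering(g), 3), round(avg_path_length(g), 2))
```

Comparing the two printed rows shows the pattern the slide describes: a little rewiring shortens paths considerably while much of the clustering survives.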
16. Different Forms of
Network Representation
1 mode
Actor to Actor
Could be Binary
(0,1)
Did they
Co-Appear?
17. Different Forms of
Network Representation
1 mode
Actor to Actor
Could also be
Weighted
(i.e. Edge Weights by
Number of
Co-Appearances)
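Both representations can be built from the same raw data. The sketch below assumes a hypothetical list of casts (the names and groupings are invented) and derives a weighted actor-to-actor matrix that counts co-appearances, then a binary matrix that only records whether two actors ever co-appeared.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: each set is the cast of one production.
appearances = [
    {"Ann", "Bob", "Cy"},
    {"Ann", "Bob"},
    {"Bob", "Dee"},
]

actors = sorted(set().union(*appearances))
index = {name: i for i, name in enumerate(actors)}

# Count how many times each pair of actors appears together.
weights = Counter()
for cast in appearances:
    for a, b in combinations(sorted(cast), 2):
        weights[(a, b)] += 1

n = len(actors)
weighted = [[0] * n for _ in range(n)]
for (a, b), w in weights.items():
    weighted[index[a]][index[b]] = w
    weighted[index[b]][index[a]] = w  # undirected: mirror the entry

# Binary version: 1 if the pair ever co-appeared, else 0.
binary = [[1 if w else 0 for w in row] for row in weighted]

print(actors)
for row in weighted:
    print(row)
```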
18. Features of Networks
Mesoscopic Community Structures
We will discuss these next week
Macroscopic Graph Level Properties
We will discuss these today
Microscopic Node Level Properties
We will discuss these next week
20. Shortest Paths
The shortest set of links
connecting two nodes
Also known as the geodesic path
In many graphs, there are multiple
shortest paths
21. Shortest Paths
A and C are connected by
2 shortest paths
A – E – B – C
A – E – D – C
Diameter: the largest geodesic distance
in the graph
The distance between A and C is
the maximum for the graph: 3 links
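Geodesic distances, the number of shortest paths, and the diameter can all be obtained with breadth-first search. The sketch below assumes the graph contains only the five links implied by the two listed paths (the original figure may contain more), and it measures distance in links, so a path such as A – E – B – C has length 3 while passing through 4 nodes.

```python
from collections import deque

# Assumed edge list: just the links implied by the two shortest paths.
edges = [("A", "E"), ("E", "B"), ("E", "D"), ("B", "C"), ("D", "C")]

adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def bfs(source):
    """Return (distance in links, number of shortest paths) from source."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                count[w] = count[u]
                queue.append(w)
            elif dist[w] == dist[u] + 1:
                count[w] += count[u]  # another equally short route to w
    return dist, count

dist_a, count_a = bfs("A")
# Diameter: the largest geodesic distance over all pairs of nodes.
diameter = max(max(bfs(s)[0].values()) for s in adj)

print("A to C:", dist_a["C"], "links via", count_a["C"], "shortest paths")
print("diameter:", diameter)
```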
22. Shortest Paths
In the Watts-Strogatz Model
Shortest Paths are reduced by
increasing levels of random rewiring
24. Density
Density: of the connections
that could exist between n nodes,
what fraction are present?
directed graph: e_max = n*(n-1)
(each of the n nodes can connect to (n-1) other nodes)
undirected graph: e_max = n*(n-1)/2
(since edges are undirected, count each one only once)
25. Density
What fraction are present?
density = e / e_max
For example, out of 12
possible connections,
this graph has 7,
giving it a density of
7/12 = 0.58
A “fully connected” graph has a density = 1
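The density formula translates directly into a small function. This is a minimal sketch; the function name `density` is hypothetical, and the worked call assumes the slide's example is a directed graph on 4 nodes, since 4*(4-1) = 12 matches the 12 possible connections.

```python
def density(num_edges: int, num_nodes: int, directed: bool = False) -> float:
    """Fraction of possible connections that are present."""
    if num_nodes < 2:
        raise ValueError("density needs at least two nodes")
    e_max = num_nodes * (num_nodes - 1)  # directed: n*(n-1)
    if not directed:
        e_max //= 2                      # undirected: count each pair once
    return num_edges / e_max

print(round(density(7, 4, directed=True), 2))  # 7 of 12 possible arcs
print(density(6, 4))                           # all 6 undirected pairs present
```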
26. Connected Components
We are often interested in whether
the graph has a single or multiple
connected components
Strong Components
Giant Component
Weak Components
27. “Largest Weakly Connected Component” in the
SCOTUS Citation Network
There exist cases that are not in this visual as
they are disconnected as of the year 1830
However, by 2009, 99% of SCOTUS Decisions are
in the Largest Weakly Connected Component
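Weakly connected components ignore arc direction: two cases belong together if any chain of citations links them, in either direction. The sketch below uses an invented citation list (the case labels `c1` to `c7` are hypothetical, not SCOTUS data) and reports what share of cases falls in the largest weak component.

```python
from collections import defaultdict

def weak_components(nodes, arcs):
    """Group nodes into components, treating every arc as undirected."""
    adj = defaultdict(set)
    for u, v in arcs:
        adj[u].add(v)
        adj[v].add(u)
    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        component = {start}
        stack = [start]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    component.add(w)
                    stack.append(w)
        components.append(component)
    return components

# Hypothetical citations: (citing case, cited case). c7 cites nothing.
cases = ["c1", "c2", "c3", "c4", "c5", "c6", "c7"]
citations = [("c2", "c1"), ("c3", "c1"), ("c4", "c3"), ("c6", "c5")]

components = weak_components(cases, citations)
largest = max(components, key=len)
share = len(largest) / len(cases)
print(len(components), "components; largest holds", f"{share:.0%}", "of cases")
```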
32. Degree Distributions
outdegree
how many directed edges (arcs)
originate at a node
indegree
how many directed edges (arcs) are
incident on a node
degree (in or out)
number of edges incident on a node
Indegree=3
Outdegree=2
Degree=5
33. Node Degree
from
Matrix Values
Outdegree:
outdegree for node 3 = 2,
which we obtain by counting
the non-zero
entries in the 3rd row
Indegree:
indegree for node 3 = 1,
which we obtain by counting
the non-zero
entries in the 3rd column
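Reading degrees off an adjacency matrix can be sketched as follows. The matrix `A` is a hypothetical 5-node example (the slide's own matrix is not reproduced in the text), constructed so that node 3 has outdegree 2 and indegree 1. Entry `A[i][j]` is non-zero when there is an arc from node i+1 to node j+1.

```python
# Hypothetical directed adjacency matrix; rows are senders, columns receivers.
A = [
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 0, 0, 1, 0],  # node 3 points to nodes 1 and 4
    [0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0],
]

def out_degree(matrix, node):
    """Count non-zero entries in the node's row (nodes numbered from 1)."""
    return sum(1 for x in matrix[node - 1] if x != 0)

def in_degree(matrix, node):
    """Count non-zero entries in the node's column (nodes numbered from 1)."""
    return sum(1 for row in matrix if row[node - 1] != 0)

print("node 3 outdegree:", out_degree(A, 3))
print("node 3 indegree:", in_degree(A, 3))
```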
34. Degree Distributions
These are degree counts for particular nodes,
but we are also interested in the distribution
of arcs (or edges) across all nodes
These distributions are called “degree
distributions”
Degree distribution: A frequency count of
the occurrence of each degree
36. Degree Distributions
Imagine we have this 8 node network
(each pair below is (degree, number of nodes with that degree)):
In-degree distribution:
[(2,3) (1,4) (0,1)]
Out-degree distribution:
[(2,4) (1,3) (0,1)]
(undirected) distribution:
[(3,3) (2,2) (1,3)]
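A degree distribution in this (degree, count) form can be tallied from an arc list. The sketch below uses a small hypothetical 5-node directed network, not the 8-node network from the slide, whose figure is not available in the text.

```python
from collections import Counter

# Hypothetical directed arcs (source, target) on nodes 1..5.
nodes = [1, 2, 3, 4, 5]
arcs = [(1, 2), (1, 3), (2, 3), (4, 3), (5, 4)]

out_deg = Counter({n: 0 for n in nodes})
in_deg = Counter({n: 0 for n in nodes})
for source, target in arcs:
    out_deg[source] += 1
    in_deg[target] += 1

def degree_distribution(degrees):
    """Return [(degree, number of nodes with that degree)], highest first."""
    return sorted(Counter(degrees.values()).items(), reverse=True)

in_dist = degree_distribution(in_deg)
out_dist = degree_distribution(out_deg)
print("in-degree distribution: ", in_dist)
print("out-degree distribution:", out_dist)
```

In any directed network the in-degrees and out-degrees must sum to the same total, the number of arcs, which is a quick sanity check on a tabulated distribution.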
37. Why are Degree
Distributions Useful?
They are the signature of a dynamic process
We will discuss in greater detail tomorrow
Consider several canonical network models
47. Readings on Power law /
Scale free Networks
Check out Lada Adamic’s Power Law Tutorial
Describes distinctions between the Zipf,
power-law and Pareto distributions
http://www.hpl.hp.com/research/idl/papers/ranking/ranking.html
This is the original paper that gave rise to
all of the other power law networks papers:
A.-L. Barabási & R. Albert, Emergence of scaling in random
networks, Science 286, 509–512 (1999)
50. How Do I Know Something
is Actually a Power Law?
51. Clauset, Shalizi & Newman
http://arxiv.org/abs/0706.1062
argue for the use of MLE
instead of linear regression
Demonstrate that a number
of prior papers mistakenly
called their distribution a
power law
Here is why you should use
Maximum Likelihood Estimation
(MLE) instead of linear
regression
You recover the power law
when it's present
Notice the spread between the
yellow and red lines
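For a continuous power law with a known lower cutoff x_min, the maximum likelihood estimate of the exponent has a closed form: alpha_hat = 1 + n / sum(ln(x_i / x_min)). The sketch below draws synthetic samples with a chosen exponent and checks that the estimator recovers it. The sample size, seed, and the assumption that x_min is known are illustrative simplifications; the paper also treats choosing x_min and testing goodness of fit, which this sketch omits.

```python
import math
import random

def sample_power_law(n, alpha, x_min, seed=0):
    """Draw n values from a continuous power law via inverse transform."""
    rng = random.Random(seed)
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]

def mle_alpha(xs, x_min):
    """Closed-form MLE of the exponent, using only values >= x_min."""
    tail = [x for x in xs if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)

true_alpha = 2.5
samples = sample_power_law(20000, true_alpha, x_min=1.0)
alpha_hat = mle_alpha(samples, x_min=1.0)
print("estimated exponent:", round(alpha_hat, 3))
```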
52. Back to the Random Graph
Models for a Moment
Poisson distribution
Erdos-Renyi is the default random
graph model:
randomly draw E edges
between N nodes
There are no hubs in the network
Rather, there exists a narrow
distribution of connectivities
53. Back to the Random Graph
Models for a Moment
let there be n people
p is the probability that any two of them are ‘friends’
Binomial -> Poisson (limit: p small) -> Normal (limit: large n)
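The narrow degree distribution of a random graph can be seen in a quick simulation. The sketch below uses the variant in which each of the possible pairs among n people is linked independently with probability p, so each person's degree is binomial with mean (n-1)*p. The values of n, p, and the seed are arbitrary choices for illustration.

```python
import random
from collections import Counter

def random_graph_degrees(n, p, seed=0):
    """Link each pair independently with probability p; return node degrees."""
    rng = random.Random(seed)
    degree = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                degree[i] += 1
                degree[j] += 1
    return degree

n, p = 500, 0.02
degrees = random_graph_degrees(n, p)
mean_degree = sum(degrees) / n
histogram = Counter(degrees)

print("expected mean degree:", (n - 1) * p)
print("observed mean degree:", round(mean_degree, 2))
print("largest degree:", max(degrees))
for k in sorted(histogram):
    print(k, "#" * histogram[k])
```

The histogram bunches around the mean and the largest degree stays close to it, which is what "no hubs" means in practice.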
55. Generating Power Law
Distributed Networks
Pseudocode for growing power law networks:
Start with a small number of nodes
Add new vertices one by one
Each new edge connects to an existing vertex with
probability proportional to the number of edges that
vertex already has (i.e. preferential attachment)
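The pseudocode above can be turned into a short runnable sketch. The details it leaves open are filled in here as assumptions: the network starts as a small fully connected group of m+1 nodes, and each new vertex arrives with m edges. Sampling uniformly from a list that holds each vertex once per edge endpoint makes the chance of being picked proportional to current degree.

```python
import random
from collections import Counter

def preferential_attachment(n, m=2, seed=0):
    """Grow a network to n nodes; each newcomer adds m degree-biased edges."""
    rng = random.Random(seed)
    m0 = m + 1  # assumed seed network: a clique on m + 1 nodes
    edges = [(i, j) for i in range(m0) for j in range(i + 1, m0)]
    # Each vertex appears in `stubs` once per edge it touches.
    stubs = [v for edge in edges for v in edge]
    for new in range(m0, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))  # degree-proportional pick
        for t in sorted(targets):
            edges.append((new, t))
            stubs.extend((new, t))
    return edges

edges = preferential_attachment(2000, m=2)
degree = Counter(v for edge in edges for v in edge)
mean_degree = sum(degree.values()) / len(degree)
print("mean degree:", round(mean_degree, 2))
print("five largest degrees:", [d for _, d in degree.most_common(5)])
```

Unlike the random graph, a few early vertices end up with degrees far above the mean, which is the hub structure a power law degree distribution implies.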
56. Growing Power Law
Distributed Networks
The previous pseudocode is not a unique solution
A variety of other growth dynamics are possible
In the simple case this is a system that is extremely
“sensitive to initial conditions”
Upstarts who garner an early advantage are able to
extend their relative advantage in later periods
For example, imagine you receive a higher interest
rate the more money you have: “rich get richer”