# ICPSR - Complex Systems Models in the Social Sciences - Lecture 3 - Professor Daniel Martin Katz

Published in: Education, Technology

1. 1. Complex Systems Models in the Social Sciences (Lecture 3). Daniel Martin Katz, Illinois Institute of Technology, Chicago-Kent College of Law. @computational, danielmartinkatz.com, computationallegalstudies.com
2. 2. Back to Where We Ended Our Last Class
3. 3. Stanley Milgram’s Other Experiment. Milgram was interested in the structure of society, including the social distance between individuals. While the term “six degrees” is often attributed to Milgram, it can be traced to ideas from the Hungarian author Frigyes Karinthy. What is the average distance between two individuals in society?
4. 4. Stanley Milgram’s Other Experiment. [Figure: map marking NE (Nebraska) and MA (Massachusetts)]
5. 5. Six Degrees of Separation? [Figure: map marking NE and MA.] The target person worked in Boston as a stockbroker. 296 senders from Boston and Omaha. 20% of senders reached the target. Average chain length = 6.5. And so the term “Six Degrees of Separation”.
6. 6. Six Degrees. “Six Degrees” is a claim that the “average path length” between two individuals in society is ~6. The idea of “Six Degrees” was popularized through plays/movies and the Kevin Bacon game: http://oracleofbacon.org/
7. 7. Six Degrees of Kevin Bacon
8. 8. Six Degrees of Kevin Bacon. Visualization source: Duncan J. Watts, Six Degrees
9. 9. But What is Wrong with Milgram’s Logic? $150 \times 150 = 22{,}500$; $150^3 = 3{,}375{,}000$; $150^4 = 506{,}250{,}000$; $150^5 = 75{,}937{,}500{,}000$
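
The arithmetic on slide 9 can be reproduced in a few lines. This is only a sketch: it assumes, as the calculation implicitly does, that every person has 150 acquaintances and that no two circles of acquaintances overlap. The function name `naive_reach` is hypothetical.

```python
def naive_reach(acquaintances: int, steps: int) -> int:
    """People reachable in `steps` hops if acquaintance circles never overlap."""
    return acquaintances ** steps


# Reproduce the four figures on the slide (2 to 5 hops, 150 acquaintances each).
for steps in range(2, 6):
    print(steps, f"{naive_reach(150, steps):,}")
```

The no-overlap assumption is exactly what clustering violates, which is the point the following slides take up.
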
10. 10. The Strength of ‘Weak’ Ties (Mark Granovetter). Does Milgram get it right? Strong and weak ties (clustered v. spanning). Clustering: my friends’ friends are also likely to be friends. Visualization source: Early Friendster – MIT Network, www.visualcomplexity.com
11. 11. So Was Milgram Correct? Small Worlds (i.e., Six Degrees) was both a theoretical and an empirical claim. The theoretical account was incorrect; the empirical claim was still intact. This raises the query: how could real social networks display both small worlds and clustering? At the same time, the Strength of Weak Ties was also a theoretical and empirical proposition.
12. 12. Watts and Strogatz (1998). A few random links in an otherwise clustered graph yield the types of small world properties found by Milgram. “Randomness” is the key bridge between the small world result and the clustering that is commonly observed in real social networks.
13. 13. Watts and Strogatz (1998). A small amount of random rewiring, or something akin to weak ties, allows for both clustering and small worlds. [Figure: a locally clustered graph alongside a random graph]
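
The rewiring idea can be tried out with a small standard-library sketch. It is not the authors' code: it assumes a ring lattice of `n` nodes each linked to its `k` nearest neighbours, rewires each lattice edge with probability `p`, and averages path lengths over reachable pairs only. All function names and the parameter values (200 nodes, k = 4) are illustrative assumptions.

```python
import random
from collections import deque


def ring_lattice(n, k):
    """Each node is linked to its k nearest neighbours (k even) on a ring."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(1, k // 2 + 1):
            adj[i].add((i + j) % n)
            adj[(i + j) % n].add(i)
    return adj


def rewire(adj, k, p, rng):
    """With probability p, move the far end of each lattice edge to a random node."""
    n = len(adj)
    for j in range(1, k // 2 + 1):
        for i in range(n):
            old = (i + j) % n
            if old in adj[i] and rng.random() < p:
                choices = [v for v in range(n) if v != i and v not in adj[i]]
                if choices:
                    new = rng.choice(choices)
                    adj[i].discard(old)
                    adj[old].discard(i)
                    adj[i].add(new)
                    adj[new].add(i)
    return adj


def average_clustering(adj):
    """Mean over nodes of (links among neighbours) / (possible links among neighbours)."""
    total = 0.0
    for nbrs in adj.values():
        d = len(nbrs)
        if d < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += 2.0 * links / (d * (d - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered pairs (BFS from each node)."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


rng = random.Random(0)
for p in (0.0, 0.05, 1.0):
    g = rewire(ring_lattice(200, 4), 4, p, rng)
    print(p, round(average_clustering(g), 3), round(average_path_length(g), 2))
```

Comparing the printed rows for small `p` against `p = 0` and `p = 1` is the quickest way to see whether clustering stays high while path lengths fall.
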
14. 14. Different Forms of Network Representation: 1-mode and 2-mode
15. 15. Different Forms of Network Representation. 2-mode: actors and movies
16. 16. Different Forms of Network Representation. 1-mode: actor to actor. Could be binary (0, 1): did they co-appear?
17. 17. Different Forms of Network Representation. 1-mode: actor to actor. Could also be weighted (i.e., edge weights by number of co-appearances)
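
As a rough illustration of moving from the 2-mode to the 1-mode representation, the sketch below projects a movie-to-cast table onto actor pairs, first weighted and then binary. The movie titles, actor names, and the function `project_to_actors` are placeholders invented for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical 2-mode data: movie -> cast (all names are placeholders).
movies = {
    "Movie 1": ["Actor A", "Actor B", "Actor C"],
    "Movie 2": ["Actor A", "Actor B"],
    "Movie 3": ["Actor C", "Actor D"],
}


def project_to_actors(two_mode):
    """Return weighted 1-mode edges: (actor, actor) -> number of co-appearances."""
    weights = Counter()
    for cast in two_mode.values():
        for a, b in combinations(sorted(set(cast)), 2):
            weights[(a, b)] += 1
    return weights


weighted = project_to_actors(movies)
binary = {pair: 1 for pair in weighted}  # 1 if the pair ever co-appeared

for pair, w in sorted(weighted.items()):
    print(pair, "weight:", w, "binary:", binary[pair])
```

Pairs that never share a movie simply do not appear, which corresponds to a 0 in the binary actor-to-actor matrix.
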
18. 18. Features of Networks. Mesoscopic: community structures (we will discuss these next week). Macroscopic: graph level properties (we will discuss these today). Microscopic: node level properties (we will discuss these next week).
19. 19. Macroscopic Graph Level Properties: degree distributions (outdegree & indegree), clustering coefficients, connected components, shortest paths, density
20. 20. Shortest Paths. The shortest set of links connecting two nodes. Also known as the geodesic path. In many graphs, there are multiple shortest paths.
21. 21. Shortest Paths. A and C are connected by 2 shortest paths: A – E – B – C and A – E – D – C. Diameter: the largest geodesic distance in the graph. The distance between A and C is the maximum for the graph: 3 links (each path visits 4 nodes).
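
A breadth-first search is one simple way to compute geodesic distances, count shortest paths, and find the diameter. The edge list below is an assumption: it is reconstructed only from the two A-to-C paths named on the slide, since the original figure is not available. Distances are counted in links, so the A-to-C distance here is 3 along a path that visits 4 nodes.

```python
from collections import deque

# Assumed edge list, reconstructed from the two paths named on the slide.
edges = [("A", "E"), ("E", "B"), ("E", "D"), ("B", "C"), ("D", "C")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)


def bfs(adj, source):
    """Return (distance, number of shortest paths) from source to each reachable node."""
    dist, count = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                count[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                count[w] += count[u]  # every shortest path to u extends to w
    return dist, count


def diameter(adj):
    """Largest geodesic distance between any pair of nodes (connected graph assumed)."""
    return max(max(bfs(adj, s)[0].values()) for s in adj)


dist, count = bfs(adj, "A")
print("distance A-C:", dist["C"])
print("shortest paths A-C:", count["C"])
print("diameter:", diameter(adj))
```
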
22. 22. Shortest Paths in the Watts-Strogatz Model. Shortest paths are reduced by increasing levels of random rewiring.
23. 23. Clustering Coefficients. A measure of the tendency of nodes in a graph to cluster. There is both a graph level average for clustering and a local version, which measures the cliquishness of the neighborhood around a single node.
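
One common way to make the local and graph-level versions precise is the Watts-Strogatz definition; the slide does not say which definition is intended, so treat this as an assumption. For a node $i$ with $k_i$ neighbours and $e_i$ links among those neighbours:

```latex
C_i = \frac{2 e_i}{k_i (k_i - 1)}, \qquad \bar{C} = \frac{1}{n} \sum_{i=1}^{n} C_i
```

Here $C_i$ is the fraction of possible links among the neighbours of $i$ that are actually present, and $\bar{C}$ is its average over all $n$ nodes, with $C_i$ conventionally taken as 0 when $k_i < 2$.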
24. 24. Density. Of the connections that could exist between n nodes, what fraction are present? Directed graph: $e_{\max} = n(n-1)$ (each of the n nodes can connect to (n-1) other nodes). Undirected graph: $e_{\max} = n(n-1)/2$ (since edges are undirected, count each one only once).
25. 25. Density. What fraction are present? $\text{density} = e / e_{\max}$. For example, out of 12 possible connections, this graph has 7, giving it a density of 7/12 = 0.58. A “fully connected” graph has a density of 1.
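
The density formulas translate directly into code. In this sketch the function names are hypothetical, and reading the slide's "12 possible connections" as a directed graph on 4 nodes (4 × 3 = 12) is an assumption, since the figure is not available.

```python
def max_edges(n: int, directed: bool) -> int:
    """Number of possible edges among n nodes (no self-loops)."""
    return n * (n - 1) if directed else n * (n - 1) // 2


def density(n: int, e: int, directed: bool) -> float:
    """Fraction of the possible edges that are present."""
    return e / max_edges(n, directed)


# Slide example, assuming a directed graph on 4 nodes with 7 arcs present.
print(max_edges(4, directed=True))             # 12
print(round(density(4, 7, directed=True), 2))  # 0.58
```
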
26. 26. Connected Components. We are often interested in whether the graph has a single or multiple connected components. Types: strong components, weak components, giant component.
27. 27. “Largest Weakly Connected Component” in the SCOTUS Citation Network. There exist cases that are not in this visual, as they are disconnected as of the year 1830. However, by 2009, 99% of SCOTUS decisions are in the largest weakly connected component.
28. 28. Connected Components. Open “Giant Component” from the NetLogo Models Library.
29. 29. Connected Components. Model has been advanced 25+ ticks. Notice the fraction of nodes in the giant component. Notice the size of the “Giant Component”.
30. 30. Connected Components. Model has been advanced 80+ ticks. Notice the fraction of nodes in the giant component. Notice the size of the “Giant Component”.
31. 31. Connected Components. Model has been advanced 120+ ticks. Notice the fraction of nodes in the giant component. Notice that the size of the “Giant Component” now equals “num-nodes” in the slider.
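
The growth of the giant component can also be explored outside NetLogo. The sketch below is a stand-in, not the NetLogo model's code: it assumes one edge between a random pair of distinct nodes is added per step (duplicate edges are possible) and tracks the largest component with a union-find structure. The function name and parameters are hypothetical.

```python
import random


def giant_component_growth(num_nodes, num_edges, seed=0):
    """Add random edges one at a time; return the largest-component fraction after each."""
    rng = random.Random(seed)
    parent = list(range(num_nodes))
    size = [1] * num_nodes

    def find(x):
        # Follow parent pointers to the component root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    largest, history = 1, []
    for _ in range(num_edges):
        a, b = rng.sample(range(num_nodes), 2)
        ra, rb = find(a), find(b)
        if ra != rb:
            if size[ra] < size[rb]:
                ra, rb = rb, ra
            parent[rb] = ra          # merge the smaller component into the larger
            size[ra] += size[rb]
            largest = max(largest, size[ra])
        history.append(largest / num_nodes)
    return history


history = giant_component_growth(num_nodes=100, num_edges=400, seed=1)
for step in (25, 80, 120, 400):
    print(step, history[step - 1])
```
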
32. 32. Degree Distributions. Outdegree: how many directed edges (arcs) originate at a node. Indegree: how many directed edges (arcs) are incident on a node. Degree (in or out): number of edges incident on a node. Example from the figure: indegree = 3, outdegree = 2, degree = 5.
33. 33. Node Degree from Matrix Values. Outdegree: the outdegree for node 3 is 2, which we obtain by counting the non-zero entries in the 3rd row. Indegree: the indegree for node 3 is 1, which we obtain by counting the non-zero entries in the 3rd column.
34. 34. Degree Distributions. These are degree counts for particular nodes, but we are also interested in the distribution of arcs (or edges) across all nodes. These distributions are called “degree distributions”. Degree distribution: a frequency count of the occurrence of each degree.
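
The row and column counts are easy to check in code. The adjacency matrix below is hypothetical (the slide's matrix is not reproduced in the transcript); it is chosen so that node 3 has outdegree 2 and indegree 1, and it assumes the convention that entry `A[i][j]` is non-zero when there is an arc from node i+1 to node j+1.

```python
# Hypothetical 4-node adjacency matrix; row = source node, column = target node.
A = [
    [0, 1, 0, 0],  # node 1 -> node 2
    [0, 0, 1, 0],  # node 2 -> node 3
    [1, 0, 0, 1],  # node 3 -> nodes 1 and 4
    [0, 1, 0, 0],  # node 4 -> node 2
]


def outdegree(matrix, node):
    """Count non-zero entries in the node's row (nodes are numbered from 1)."""
    return sum(1 for x in matrix[node - 1] if x != 0)


def indegree(matrix, node):
    """Count non-zero entries in the node's column (nodes are numbered from 1)."""
    return sum(1 for row in matrix if row[node - 1] != 0)


print("outdegree of node 3:", outdegree(A, 3))
print("indegree of node 3:", indegree(A, 3))
```
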
35. 35. Degree Distributions. Imagine we have this 8 node network. In-degree sequence: [2, 2, 2, 1, 1, 1, 1, 0]. Out-degree sequence: [2, 2, 2, 2, 1, 1, 1, 0]. (Undirected) degree sequence: [3, 3, 3, 2, 2, 1, 1, 1].
36. 36. Degree Distributions. Imagine we have this 8 node network. In-degree distribution: [(2,3) (1,4) (0,1)]. Out-degree distribution: [(2,4) (1,3) (0,1)]. (Undirected) distribution: [(3,3) (2,2) (1,3)].
37. 37. Why are Degree Distributions Useful? They are the signature of a dynamic process. We will discuss this in greater detail tomorrow. Consider several canonical network models.
38. 38. Canonical Network Models: Erdős-Rényi random network; highly clustered network; Watts-Strogatz small world network; Barabási-Albert preferential attachment network.
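
Turning a degree sequence into a degree distribution is a frequency count. The sketch below uses the sequences exactly as listed on slide 35 and produces (degree, count) pairs in the same order as slide 36; the function name is hypothetical.

```python
from collections import Counter


def degree_distribution(sequence):
    """Frequency count of each degree as (degree, count) pairs, highest degree first."""
    return sorted(Counter(sequence).items(), reverse=True)


in_sequence = [2, 2, 2, 1, 1, 1, 1, 0]
out_sequence = [2, 2, 2, 2, 1, 1, 1, 0]
undirected_sequence = [3, 3, 3, 2, 2, 1, 1, 1]

print(degree_distribution(in_sequence))          # [(2, 3), (1, 4), (0, 1)]
print(degree_distribution(out_sequence))         # [(2, 4), (1, 3), (0, 1)]
print(degree_distribution(undirected_sequence))  # [(3, 3), (2, 2), (1, 3)]
```
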
39. 39. Why are Degree Distributions Useful? Barabási-Albert Preferential Attachment Network
40. 40. Power Law / Scale Free Networks
41. 41. Barabási-Albert Preferential Attachment. NetLogo Models Library --> Networks --> Preferential Attachment. Watch the changing degree distribution.
42. 42. Barabási-Albert Preferential Attachment. NetLogo Models Library --> Networks --> Preferential Attachment
43. 43. Barabási-Albert Preferential Attachment. NetLogo Models Library --> Networks --> Preferential Attachment
44. 44. Barabási-Albert Preferential Attachment. NetLogo Models Library --> Networks --> Preferential Attachment
45. 45. Barabási-Albert Preferential Attachment. NetLogo Models Library --> Networks --> Preferential Attachment
46. 46. Barabási-Albert Preferential Attachment. NetLogo Models Library --> Networks --> Preferential Attachment
47. 47. Readings on Power Law / Scale Free Networks. Check out Lada Adamic’s Power Law Tutorial, which describes distinctions between the Zipf, power-law and Pareto distributions (http://www.hpl.hp.com/research/idl/papers/ranking/ranking.html). This is the original paper that gave rise to all of the other power law networks papers: A.-L. Barabási & R. Albert, Emergence of scaling in random networks, Science 286, 509–512 (1999).
48. 48. Power Laws Seem to be Everywhere
49. 49. Power Laws Seem to be Everywhere
50. 50. How Do I Know Something is Actually a Power Law?
51. 51. Clauset, Shalizi & Newman (http://arxiv.org/abs/0706.1062) argue for the use of Maximum Likelihood Estimation (MLE) instead of linear regression, and demonstrate that a number of prior papers mistakenly called their distribution a power law. Here is why you should use MLE instead of linear regression: you recover the power law when it is present. Notice the spread between the yellow and red lines.
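
For the continuous case with a known lower cutoff, the maximum likelihood estimate of the exponent has a closed form, $\hat{\alpha} = 1 + n / \sum_i \ln(x_i / x_{\min})$. The sketch below draws synthetic power-law data and recovers the exponent. It assumes `xmin` is known in advance; choosing `xmin` and testing goodness of fit, which the paper also covers, are not shown. The function names and the values 2.5 and 20,000 are illustrative.

```python
import math
import random


def sample_power_law(n, alpha, xmin, rng):
    """Draw n values from a continuous power law p(x) ~ x^(-alpha), x >= xmin."""
    # Inverse-transform sampling; 1 - u lies in (0, 1], so the power is always defined.
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def mle_alpha(xs, xmin):
    """Continuous MLE of the exponent, using only observations with x >= xmin."""
    tail = [x for x in xs if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)


rng = random.Random(42)
data = sample_power_law(20000, alpha=2.5, xmin=1.0, rng=rng)
print("true alpha: 2.5, estimated alpha:", round(mle_alpha(data, 1.0), 3))
```
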
52. 52. Back to the Random Graph Models for a Moment. Erdős-Rényi is the default random graph model: randomly draw E edges between N nodes. The resulting degree distribution is Poisson. There are no hubs in the network; rather, there exists a narrow distribution of connectivities.
53. 53. Back to the Random Graph Models for a Moment. Let there be n people; p is the probability that any two of them are “friends”. The degree distribution is binomial; it approaches a Poisson distribution in the limit of small p, and a normal distribution in the limit of large n.
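
The binomial-to-Poisson limit can be checked numerically. In this sketch each person has n − 1 potential friends, each present with probability p, so the number of friends is Binomial(n − 1, p) with mean (n − 1)p. The values n = 1000 and p = 0.004 are hypothetical, chosen only so that p is small.

```python
import math


def binomial_pmf(k, trials, p):
    """Probability of exactly k successes in `trials` independent tries."""
    return math.comb(trials, k) * p**k * (1 - p) ** (trials - k)


def poisson_pmf(k, lam):
    """Probability of k events under a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


n, p = 1000, 0.004   # hypothetical: many people, small friendship probability
lam = (n - 1) * p    # mean degree

for k in range(11):
    print(k, round(binomial_pmf(k, n - 1, p), 4), round(poisson_pmf(k, lam), 4))
```

The two columns should agree closely for every degree, which is why the Erdős-Rényi degree distribution is usually described as Poisson.
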
54. 54. Random Graphs Power Law networks
55. 55. Generating Power Law Distributed Networks. Pseudocode for growing power law networks: (1) start with a small number of nodes; (2) add new vertices one by one; (3) each new edge connects to an existing vertex with probability proportional to the number of edges that vertex already has (i.e., preferentially attach).
56. 56. Growing Power Law Distributed Networks. The previous pseudocode is not a unique solution; a variety of other growth dynamics are possible. In the simple case this is a system that is extremely “sensitive to initial conditions”: upstarts who garner an early advantage are able to extend their relative advantage in later periods. For example, imagine you receive a higher interest rate the more money you have: the “rich get richer”.
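
The growth rule from slide 55 can be sketched in a few lines. This is one possible reading, not the only one: it assumes the network starts from two linked nodes and that each new node adds exactly one edge. The function names are hypothetical.

```python
import random
from collections import Counter


def preferential_attachment(num_nodes, seed=0):
    """Grow a network in which each new node attaches to one existing node."""
    rng = random.Random(seed)
    edges = [(0, 1)]    # assumed seed network: two linked nodes
    endpoints = [0, 1]  # each node appears once per edge it touches
    for new in range(2, num_nodes):
        # A uniform draw from `endpoints` picks a node with probability
        # proportional to its current degree.
        target = rng.choice(endpoints)
        edges.append((new, target))
        endpoints.extend([new, target])
    return edges


def degrees(edges):
    """Degree of every node, counted from the edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


deg = degrees(preferential_attachment(2000, seed=3))
distribution = sorted(Counter(deg.values()).items())
print("largest degree:", max(deg.values()))
print("first rows of the degree distribution:", distribution[:5])
```

Inspecting which node ids end up with the largest degrees is a quick way to see the early-advantage effect described on slide 56.
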