The document provides an overview of a tutorial on network analysis and the law given by Daniel Martin Katz and Michael J. Bommarito II. It discusses Katz's background in law and network science. The tutorial covers an introduction to network analysis including key concepts like nodes, edges, degree distributions and more. It also discusses applications of network analysis to law including legal elites, diffusion of legal ideas, and judicial citation networks. Advanced topics like community detection algorithms are also outlined.
1. Network Analysis and the Law
Daniel Martin Katz
Illinois Tech
Chicago Kent College of Law
Michael J. Bommarito II
Center for Study of
Complex Systems
Jurix 2011 Tutorial @ Universität Wien
2. My Background
Associate Professor of Law
Illinois Tech - Chicago Kent Law
Former NSF IGERT Fellow,
University of Michigan
Center for the Study of Complex Systems
(2009-2010)
PhD
Political Science & Public Policy
University of Michigan
(2011)
JD
University of Michigan
Law School
(2005)
3. My Background
Former NSF IGERT Fellow,
University of Michigan
Center for the Study of Complex
Systems
PhD Pre-Candidate
Dept. of Political Science
University of Michigan
Masters Degree
Financial Engineering
University of Michigan
4. Outline of Our Session
Network Analysis: An Extended Primer
Network Analysis & Law
The Frontier of Network Analysis & Law
Legal Elites
Diffusion and other Related Processes
Legal Doctrine and Legal Rules
Advanced Network Science Topics
Community Detection
ERGM / P* Models
Social Epidemiology
Distance Measures for Dynamic Citation Networks
Dynamic Community Detection
The Judicial Collaborative Filter (Judge Aided Info Retrieval)
6. Introduction to
Network Analysis
What is a Network?
Mathematical Representation of the
Relationships Between Units such as
Actors, Institutions, Software, etc.
What is a Social Network?
Special Class of Graph Involving
Particular Units and Connections
8. Social Science
For Images and Links to
Underlying projects:
http://jhfowler.ucsd.edu/
3D HiDef SCOTUS Movie
Co-Sponsorship in Congress
Spread of Obesity
Hiring and Placement of
Political Science PhD’s
9. Social Science
The 2004 Political Blogosphere
(Adamic & Glance)
High School Friendship
(Moody)
Roll Call Votes in United States Congress
(Mucha, et al)
16. Example: Nodes in an actor-
based social Network
Alice
Bill
Carrie
David
Ellen
How Can We Represent The
Relevant Social Relationships?
Terminology & Examples
23. A Survey Based Example
“Which of the above individuals
do you consider a close friend?”
Imagine We Surveyed 5 Actors:
(1) Daniel,
(2) Jennifer,
(3) Josh,
(4) Bill,
(5) Larry
24. From an EdgeList to Matrix
*Directed Connections (Arcs): 13
1 2
1 3
1 4
1 5
2 1
2 3
3 4
3 5
3 2
5 1
5 4
5 3
5 2
*How to Read the Edge List: (Person in Column 1 is friends with Person in Column 2)
Matrix (ROWS → COLUMNS)
             1  2  3  4  5
---------------------------
Daniel   (1) 0  1  1  1  1
Jennifer (2) 1  0  1  0  0
Josh     (3) 0  1  0  1  1
Bill     (4) 0  0  0  0  0
Larry    (5) 1  1  1  1  0
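The conversion from the edge list to the matrix can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the function name `edge_list_to_matrix` is an assumption chosen for this example.

```python
# Directed edge list from the survey (sender, receiver), numbered as on the slide.
edges = [(1, 2), (1, 3), (1, 4), (1, 5),
         (2, 1), (2, 3),
         (3, 4), (3, 5), (3, 2),
         (5, 1), (5, 4), (5, 3), (5, 2)]
names = ["Daniel", "Jennifer", "Josh", "Bill", "Larry"]

def edge_list_to_matrix(edges, n):
    """Return an n x n adjacency matrix; row = sender, column = receiver."""
    matrix = [[0] * n for _ in range(n)]
    for source, target in edges:
        # Node labels start at 1, list indices start at 0.
        matrix[source - 1][target - 1] = 1
    return matrix

adjacency = edge_list_to_matrix(edges, len(names))
for name, row in zip(names, adjacency):
    print(f"{name:<9}", *row)
```

Each row lists whom that person named as a close friend, so Bill's row is all zeros even though three others named him.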
25. From a Survey
to a Network
             1  2  3  4  5
---------------------------
Daniel   (1) 0  1  1  1  1
Jennifer (2) 1  0  1  0  0
Josh     (3) 0  1  0  1  1
Bill     (4) 0  0  0  0  0
Larry    (5) 1  1  1  1  0
26. A Quick Law Based
Example of a
Dynamic Network
27. United States Supreme Court
To Play Movie of the Early SCOTUS Jurisprudence:
http://vimeo.com/9427420
Documentation is Available Here:
http://computationallegalstudies.com/2010/02/11/the-development-of-structure-in-the-citation-network-of-the-united-states-supreme-court-now-in-hd/
37. The Origin of Network
Science is Graph Theory
The Königsberg Bridge Problem
the first theorem in graph theory
Is It Possible to cross each bridge
once and only once?
38. The Königsberg Bridge Problem
Leonhard Euler
(Pronounced Oil-er)
proved that this was
not possible
Is It Possible to
cross each bridge
once and only once?
39. Eulerian and
Hamiltonian Paths
Eulerian path: traverse
each edge exactly once
If starting point and end point are the same:
only possible if no nodes have an odd degree
each path must visit and leave each shore
If you don’t need to return to the starting point:
can have 0 or 2 nodes with an odd degree
Hamiltonian path: visit
each vertex exactly once
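The odd-degree rule can be checked directly on the Königsberg bridges. The sketch below assumes a connected multigraph; the land-mass labels (N, S, I, E) and the function names are illustrative choices, not taken from the slides.

```python
from collections import Counter

# The seven Königsberg bridges as an undirected multigraph.
# N and S are the two river banks, I and E the two islands (labels are assumptions).
bridges = [("N", "I"), ("N", "I"), ("S", "I"), ("S", "I"),
           ("N", "E"), ("S", "E"), ("I", "E")]

def odd_degree_nodes(edge_list):
    """Return the nodes touched by an odd number of edges."""
    degree = Counter()
    for u, v in edge_list:
        degree[u] += 1
        degree[v] += 1
    return sorted(node for node, d in degree.items() if d % 2 == 1)

def euler_status(edge_list):
    """Classify a connected multigraph by its number of odd-degree nodes."""
    odd = len(odd_degree_nodes(edge_list))
    if odd == 0:
        return "Eulerian circuit possible"
    if odd == 2:
        return "Eulerian path possible (no return to start)"
    return "no Eulerian path"

print(odd_degree_nodes(bridges))  # ['E', 'I', 'N', 'S']
print(euler_status(bridges))      # no Eulerian path
```

All four land masses have odd degree, which is why Euler's answer was negative.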
41. Moreno, Heider, et al.
and the Early Scholarship
Focused Upon Determining the Manner in
Which Society was Organized
Developed early techniques to represent the
social world Sociogram/ Sociograph
Obviously did not
have access to
modern computing
power
42. Stanley Milgram’s
Other Experiment
Milgram was interested in the
structure of society
Including the social distance
between individuals
While the term “six degrees” is often
attributed to Milgram, it can be traced to ideas
from Hungarian author Frigyes Karinthy
What is the average distance
between two individuals in
society?
44. Six Degrees of Separation?
(Map: Nebraska (NE) to Massachusetts (MA))
Target person worked in Boston as a stockbroker
296 senders from Boston and Omaha.
20% of senders reached target.
Average chain length = 6.5.
And So the term ...
“Six degrees of Separation”
45. Six Degrees
Six Degrees is a claim that “average path
length” between two individuals in society
is ~ 6
The idea of ‘Six Degrees’ was Popularized
through plays/movies and the Kevin Bacon
game
http://oracleofbacon.org/
48. But What is Wrong
with Milgram’s Logic?
150^2 = 22,500
150^3 = 3,375,000
150^4 = 506,250,000
150^5 = 75,937,500,000
49. The Strength of ‘Weak’ Ties
Does Milgram get
it right? (Mark Granovetter)
Visualization Source: Early Friendster – MIT Network
www.visualcomplexity.com
Strong and Weak Ties
(Clustered
v.
Spanning)
Clustering ----
My Friends’ Friends
are also likely to
be friends
50. So Was Milgram Correct?
Small Worlds (i.e. Six Degrees) was a theoretical
and an empirical Claim
The Theoretical Account Was Incorrect
The Empirical Claim was still intact
Query as to how could real social networks
display both small worlds and clustering?
At the Same time, the Strength of Weak Ties was
also a Theoretical and Empirical proposition
51. Watts and Strogatz (1998)
A few random links in an otherwise clustered
graph yields the types of small world
properties found by Milgram
“Randomness” is key bridge between the small
world result and the clustering that is
commonly observed in real social networks
52. Watts and Strogatz (1998)
A Small Amount of Random Rewiring or
Something akin to Weak Ties—Allows for
Clustering and Small Worlds
Random Graph
Locally Clustered
55. The Milgram Experiment
How did the successful subjects actually
succeed?
How did they manage to get the envelope
from Nebraska to Boston?
This is a question regarding how
individuals conduct searches in their
networks
Given most individuals do not know the
path to distantly linked individuals
56. Search in Networks
Most individuals do not know the path to
an individual who is many hops away
Must rely on some sort of heuristic rules
to determine the possible path
57. Search in Networks
What information about the problem might
the individual attempt to leverage?
Visual by Duncan Watts
dimensional data:
send it to a stockbroker
send it to the closest possible city to Boston
58. Follow up to
the original
Experiment
available at:
http://research.yahoo.com/pub/2397
Published in
Science in 2003
65. Shortest Paths
Shortest Paths
The shortest set of links
connecting two nodes
Also known as the geodesic path
In many graphs, there are multiple
shortest paths
66. Shortest Paths
Shortest Paths
A and C are connected by
2 shortest paths
A – E – B - C
A – E – D - C
Diameter: the largest geodesic distance
in the graph
The distance between A and C is
the maximum for the graph: 3
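Geodesic distances, the number of shortest paths, and the diameter can all be computed with breadth-first search. The sketch below assumes the example graph contains only the edges on the two listed paths (A-E, E-B, E-D, B-C, D-C); the pictured graph may differ.

```python
from collections import deque

# Undirected graph assumed from the two shortest paths on the slide.
graph = {
    "A": ["E"],
    "B": ["E", "C"],
    "C": ["B", "D"],
    "D": ["E", "C"],
    "E": ["A", "B", "D"],
}

def bfs(graph, source):
    """Return (distance, number of shortest paths) from source to each node."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                count[v] = 0
                queue.append(v)
            # Every shortest path to u extends to a shortest path to v.
            if dist[v] == dist[u] + 1:
                count[v] += count[u]
    return dist, count

def diameter(graph):
    """Largest geodesic distance between any pair of nodes (connected graph)."""
    return max(max(bfs(graph, s)[0].values()) for s in graph)

dist, count = bfs(graph, "A")
print(dist["C"], count["C"], diameter(graph))  # 3 2 3
```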
67. Shortest Paths
In the Watts-Strogatz Model
Shortest Paths are reduced by
increasing levels of random rewiring
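The effect of rewiring on path length can be illustrated with a small simulation. This is a rough sketch in the spirit of the Watts-Strogatz procedure, not a faithful reproduction of the 1998 paper; the sizes (200 nodes, 4 neighbors), the rewiring probability 0.1 and all function names are assumptions for illustration.

```python
import random
from collections import deque

def ring_lattice(n, k):
    """Each node is linked to its k nearest neighbors (k even) on a ring."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj

def rewire(adj, k, p, rng):
    """Move each original lattice edge to a random new endpoint with probability p."""
    n = len(adj)
    for step in range(1, k // 2 + 1):
        for i in range(n):
            j = (i + step) % n
            if j in adj[i] and rng.random() < p:
                choices = [t for t in range(n) if t != i and t not in adj[i]]
                if choices:
                    new = rng.choice(choices)
                    adj[i].discard(j)
                    adj[j].discard(i)
                    adj[i].add(new)
                    adj[new].add(i)
    return adj

def average_path_length(adj):
    """Mean geodesic distance over all reachable ordered pairs (BFS from each node)."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

rng = random.Random(42)
lattice_length = average_path_length(ring_lattice(200, 4))
small_world_length = average_path_length(rewire(ring_lattice(200, 4), 4, 0.1, rng))
print(lattice_length, small_world_length)
```

In this setup the rewired graph should show a noticeably shorter average path than the pure lattice, even though most edges remain local.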
69. Density
Density = Of the connections
that could exist between n nodes,
what fraction are present?
directed graph: emax = n*(n-1)
(each of the n nodes can connect to (n-1) other nodes)
undirected graph: emax = n*(n-1)/2
(since edges are undirected, count each one only once)
70. Density
What fraction are present?
density = e / emax
For example, out of 12
possible connections,
this graph has 7,
giving it a density of
7/12 = 0.58
A “fully connected” graph has a density = 1
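The density calculation is a one-liner once emax is known. The sketch below assumes the pictured graph is directed with 4 nodes, which is what 12 possible connections implies; the function names are illustrative.

```python
def max_edges(n, directed=True):
    """Largest possible number of edges among n nodes (no self-loops)."""
    return n * (n - 1) if directed else n * (n - 1) // 2

def density(e, n, directed=True):
    """Fraction of possible edges that are actually present."""
    return e / max_edges(n, directed)

print(round(density(7, 4, directed=True), 2))   # 0.58
print(density(6, 4, directed=False))            # 1.0 (fully connected)
```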
71. Connected Components
We are often interested in whether
the graph has a single or multiple
connected components
Strong Components
Giant Component
Weak Components
72. Netlogo
Basic Simulation
Platform for Agent
Based Modeling &
Simple Network
Simulation
http://ccl.northwestern.edu/netlogo/
Wilensky (1999)
HIV / VOTING Hawk/Dove
(A Classic from
Evolutionary Game Theory)
73. Netlogo
Please Download Netlogo as we
will be using it occasionally
throughout this tutorial
http://ccl.northwestern.edu/netlogo/
Wilensky (1999)
78. Degree Distributions
outdegree
how many directed edges (arcs)
originate at a node
indegree
how many directed edges (arcs) are
incident on a node
degree (in or out)
number of edges incident on a node
Indegree=3
Outdegree=2
Degree=5
79. Node Degree
from
Matrix Values
Outdegree:
outdegree for node 3 = 2,
which we obtain by summing
the number of non-zero
entries in the 3rd row
Indegree:
indegree for node 3 = 1,
which we obtain by summing
the number of non-zero
entries in the 3rd column
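Row and column counts translate directly into code. The matrix below is hypothetical (the slide's own matrix is not reproduced in the text); it is chosen so that node 3 has outdegree 2 and indegree 1, as in the example.

```python
# Hypothetical 4-node adjacency matrix (rows send, columns receive).
matrix = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
]

def outdegree(matrix, node):
    """Count non-zero entries in the node's row (node is 1-indexed)."""
    return sum(1 for value in matrix[node - 1] if value != 0)

def indegree(matrix, node):
    """Count non-zero entries in the node's column (node is 1-indexed)."""
    return sum(1 for row in matrix if row[node - 1] != 0)

print(outdegree(matrix, 3), indegree(matrix, 3))  # 2 1
```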
80. Degree Distributions
These are Degree Counts for particular nodes
but we are also interested in the distribution
of arcs (or edges) across all nodes
These Distributions are called “degree
distributions”
Degree distribution: A frequency count of
the occurrence of each degree
82. Degree Distributions
Imagine we have this 8 node network:
In-degree distribution:
[(2,3) (1,4) (0,1)]
Out-degree distribution:
[(2,4) (1,3) (0,1)]
(undirected) distribution:
[(3,3) (2,2) (1,3)]
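A degree distribution is just a frequency count over the degree sequence. The in-degree sequence below is a hypothetical one, chosen only because it is consistent with the in-degree distribution listed above.

```python
from collections import Counter

def degree_distribution(degrees):
    """Return (degree, frequency) pairs, highest degree first."""
    counts = Counter(degrees)
    return sorted(counts.items(), reverse=True)

# Hypothetical in-degree sequence for the 8 nodes.
in_degrees = [2, 2, 2, 1, 1, 1, 1, 0]
print(degree_distribution(in_degrees))  # [(2, 3), (1, 4), (0, 1)]
```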
83. Why are Degree
Distributions Useful?
They are the signature of a dynamic process
We will discuss in greater detail tomorrow
Consider several canonical network models
92. Readings on Power law /
Scale free Networks
Check out Lada Adamic’s Power Law Tutorial
Describes distinctions between the Zipf,
Power-law and Pareto distribution
http://www.hpl.hp.com/research/idl/papers/ranking/ranking.html
This is the original paper that gave rise to
all of the other power law networks papers:
A.-L. Barabási & R. Albert, Emergence of scaling in random
networks, Science 286, 509–512 (1999)
95. How Do I Know Something
is Actually a Power Law?
96. Clauset, Shalizi & Newman
http://arxiv.org/abs/0706.1062
argues for the use of MLE
instead of linear regression
Demonstrates that a number
of prior papers mistakenly
called their distribution a
power law
Here is why you should use
Maximum Likelihood Estimation
(MLE) instead of linear
regression
You recover the power law
when it's present
Notice spread between the
Yellow and red lines
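For continuous data with a known lower cutoff x_min, the maximum likelihood estimate of the exponent is alpha = 1 + n / sum(ln(x_i / x_min)). The sketch below draws synthetic power-law data and recovers the exponent; treating x_min as known is a simplifying assumption (the paper also covers estimating it), and the function names are illustrative.

```python
import math
import random

def sample_power_law(alpha, x_min, n, rng):
    """Draw n values from a continuous power law p(x) ~ x^(-alpha), x >= x_min."""
    # Inverse-transform sampling from the power-law CDF.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

def mle_alpha(data, x_min):
    """Maximum likelihood estimate of the exponent for a known x_min."""
    tail = [x for x in data if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)

rng = random.Random(0)
data = sample_power_law(alpha=2.5, x_min=1.0, n=20000, rng=rng)
alpha_hat = mle_alpha(data, x_min=1.0)
print(round(alpha_hat, 2))
```

With this many samples the estimate should land close to the true exponent of 2.5, without any binning or log-log regression.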
97. Back to the Random Graph
Models for a Moment
Poisson distribution
Erdős-Rényi is the default random
graph model:
randomly draw E edges
between N nodes
There are no hubs in the network
Rather, there exists a narrow
distribution of connectivities
98. Back to the Random Graph
Models for a Moment
let there be n people
p is the probability that any two of them are ‘friends’
Binomial → Poisson → Normal
(Poisson: limit of small p; Normal: limit of large n)
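The "no hubs" property is easy to see by simulation. The sketch below generates a random graph in which each pair of the n people is linked independently with probability p; the values n = 500 and p = 0.02 and the function names are assumptions for illustration.

```python
import random

def gnp_random_graph(n, p, seed=None):
    """Link each of the n*(n-1)/2 pairs independently with probability p."""
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]

def degree_sequence(n, edges):
    """Degree of every node, including isolated ones."""
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return degree

n, p = 500, 0.02
degrees = degree_sequence(n, gnp_random_graph(n, p, seed=7))
mean_degree = sum(degrees) / n
print(mean_degree, max(degrees))
```

The mean degree should sit near (n-1)*p, roughly 10 here, and the largest degree should stay within a small multiple of that mean.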
100. Generating Power Law
Distributed Networks
Pseudocode for the growing power law networks:
Start with small number of nodes
add new vertices one by one
each new edge connects to an existing vertex in
proportion to the number of edges that vertex
already displays (i.e. preferentially attach)
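The pseudocode can be turned into a short runnable sketch. This is one possible reading, not the authors' implementation: it assumes a starting clique of m + 1 nodes, m edges per new node, and no repeated targets; the function name and parameters are illustrative.

```python
import random

def preferential_attachment(n_nodes, m, seed=None):
    """Grow a network where new nodes attach to existing nodes in proportion to degree.

    Starts from a small complete graph on m + 1 nodes, then adds one node at a
    time with m edges each. Returns a list of undirected edges.
    """
    rng = random.Random(seed)
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Each node appears in this list once per incident edge, so a uniform
    # draw from it picks a node with probability proportional to its degree.
    endpoints = [node for edge in edges for node in edge]
    for new_node in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for target in sorted(targets):
            edges.append((new_node, target))
            endpoints.extend((new_node, target))
    return edges

edges = preferential_attachment(n_nodes=1000, m=2, seed=1)
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1
print(len(edges), max(degree.values()))
```

The mean degree is about 4, yet the best-connected node should end up with far more links than that.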
101. Growing Power Law
Distributed Networks
The previous pseudocode is not a unique solution
A variety of other growth dynamics are possible
In the simple case this is a system that is extremely
“sensitive to initial conditions”
upstarts who garner early advantage are able to
extend their relative advantage in later periods
for example, imagine you receive a higher interest
rate the more money you have “rich get richer”
102. Just To Preview The
Application to Positive
Legal Theory ....
103. Power Laws Appear to be a
Common Feature of Legal Systems
Katz, et al (2011)
American Legal Academy
Katz & Stafford (2010)
American Federal Judges
Geist (2009)
Austrian Supreme Court
Smith (2007)
U.S. Supreme Court
Smith (2007)
U.S. Law Reviews
Post & Eisen (2000)
NY Ct of Appeals
106. Node Level Measures
Sociologists have long been interested in roles /
positions that various nodes occupy within
networks
For example various centrality measures
have been developed
Here is a non-exhaustive List:
Degree
Closeness
Betweenness
Hubs/Authorities
107. Degree
Degree is simply a count of the number of
arcs (or edges) incident to a node
Here the nodes are sized by degree:
108. Degree as a measure
of centrality
Please Calculate the “degree” of each of the nodes
109. Degree as a measure
of centrality
ask yourself, in which case does “degree” appear to
capture the most important actors?
110. Degree as a measure
of centrality
what about here, does it capture the “center”?
111. Closeness Centrality
Closeness is based on the inverse of the
distance of each actor to every other actor
in the network
Closeness Formula:
Normalized Closeness Formula:
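The formulas themselves did not survive in this text. A common definition, assumed here, is closeness(i) = 1 / sum over j of d(i, j), with the normalized version multiplying by (N - 1). The sketch below computes both on a hypothetical 5-node star and assumes a connected, undirected graph.

```python
from collections import deque

def distances_from(graph, source):
    """Geodesic distance from source to every reachable node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def closeness(graph, node, normalized=False):
    """Inverse of the summed geodesic distances from node (connected graph assumed)."""
    total = sum(distances_from(graph, node).values())
    raw = 1.0 / total
    return raw * (len(graph) - 1) if normalized else raw

# Hypothetical star: one center linked to four leaves.
star = {"center": ["a", "b", "c", "d"],
        "a": ["center"], "b": ["center"], "c": ["center"], "d": ["center"]}
print(closeness(star, "center", normalized=True))  # 1.0
print(closeness(star, "a", normalized=True))
```

The center is one step from everyone and gets the maximum normalized score; each leaf is two steps from the other leaves and scores lower.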
114. Betweenness Centrality
Idea is related to
bridges, weak ties
This individual may
serve an important
function
Betweenness
centrality counts
the number of
geodesic paths
between i & k that
actor j resides on
116. Betweenness Centrality
Check these yourself:
gjk = the number of
geodesics connecting j & k,
and
gjk(i) = the number that actor
i is on
Note: there is also a normalized
version of the formula
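One way to check such values by machine is to sum gjk(i) / gjk over all pairs j, k. The sketch below does this by brute force for a small undirected, connected graph; the example graph (edges A-E, E-B, E-D, B-C, D-C) and the function names are assumptions, and faster algorithms exist for large networks.

```python
from collections import deque
from itertools import combinations

def bfs_counts(graph, source):
    """Geodesic distance and number of geodesics from source to every node."""
    dist, count = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                count[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                count[v] += count[u]
    return dist, count

def betweenness(graph):
    """Sum over pairs j < k of g_jk(i) / g_jk for an undirected, connected graph."""
    info = {s: bfs_counts(graph, s) for s in graph}
    score = {i: 0.0 for i in graph}
    for j, k in combinations(sorted(graph), 2):
        dist_j, count_j = info[j]
        dist_k, count_k = info[k]
        for i in graph:
            if i in (j, k):
                continue
            # i lies on a j-k geodesic exactly when the two distances add up.
            if dist_j[i] + dist_k[i] == dist_j[k]:
                score[i] += count_j[i] * count_k[i] / count_j[k]
    return score

graph = {"A": ["E"], "B": ["E", "C"], "C": ["B", "D"],
         "D": ["E", "C"], "E": ["A", "B", "D"]}
print(betweenness(graph))
# {'A': 0.0, 'B': 1.0, 'C': 0.5, 'D': 1.0, 'E': 3.5}
```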
117. Betweenness Centrality
Betweenness is a very
powerful concept
We will return when we discuss
community detection in networks ...
If you want to
preview check out this paper:
Michelle Girvan & Mark Newman, Community
structure in social and biological networks, Proc.
Natl. Acad. Sci. USA 99, 7821–7826 (2002)
High Betweenness actors need not
be actors that score high on
other centrality measures (such
as degree, etc.)
[see picture to the right]
118. Hubs and Authorities
The Hubs and Authorities Algorithm
(HITS) was developed by Computer
Scientist Jon Kleinberg
Similar to the Google “PageRank”
Algorithm developed by Larry Page
Kleinberg is a MacArthur Fellow and
has offered a number of major
contributions
119. Hubs and Authorities
We are interested in BOTH:
to whom a webpage links
and
From whom it has received links
In Ranking a Webpage ...
120. Hubs and Authorities
Intuition --
If we are trying to rank a webpage
having a link from the New York Times
is worth more than one from a random
person’s blog
HITS offers a significant improvement
over measuring degree as degree treats
all connections as equally valuable
121. Hubs and Authorities
Relies upon ideas such as recursion
Measure who is important?
Measure who is important to who
is important?
Measure who is important to who
is important to who is important ?
Etc.
122. Hubs and Authorities
Hubs: Hubs are highly-valued lists for
a given query
for example, a directory page from a major encyclopedia or
paper that links to many different highly-linked pages would
typically have a higher hub score than a page that links to
relatively few other sources.
Authority: Authorities are highly
endorsed answers to a query
A page that is particularly popular and linked by many
different directories will typically have a higher authority
score than a page that is unpopular.
Note: A Given WebPage could be both a hub and an authority
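The mutual reinforcement between hubs and authorities can be sketched as a simple iteration. This is a minimal illustration of the idea, not Kleinberg's exact presentation; the page names and the fixed iteration count are assumptions.

```python
import math

def hits(links, iterations=50):
    """Iteratively compute hub and authority scores for a directed graph.

    links maps each page to the list of pages it links to.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of hub scores of the pages that link to you.
        auth = {n: 0.0 for n in nodes}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        # Hub: sum of authority scores of the pages you link to.
        hub = {n: sum(auth[t] for t in links.get(n, [])) for n in nodes}
        # Normalize so the scores stay bounded.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {n: v / a_norm for n, v in auth.items()}
        hub = {n: v / h_norm for n, v in hub.items()}
    return hub, auth

# Hypothetical mini-web: a directory and a blog linking to three papers.
links = {"directory": ["paper1", "paper2", "paper3"], "blog": ["paper1"]}
hub, auth = hits(links)
print(max(hub, key=hub.get), max(auth, key=auth.get))  # directory paper1
```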
123. Hubs and Authorities
Hubs and Authorities has been used in a
wide number of social science articles
There exists some variants of the
Original HITS Algorithm
Here is the Original Article :
Jon Kleinberg, Authoritative sources in a
hyperlinked environment, Journal of the
Association for Computing Machinery, 46 (5): 604–
632 (1999).
Note: there is a 1998 edition as well
124. Calculating Centrality
Measures
Thankfully, centrality measures, etc. need not be
calculated by hand
Lots of software packages ...
in increasing levels of difficulty ... left to right
Difference in functions, etc. across the packages
easy: accepts
microsoft
excel files
Medium: requires
the .net / .paj
file setup
Hard: has lots of
features
(R or Python)
125. Daniel Martin Katz, Eric Provins
Introduction to Computing for
Complex Systems (Session XVII)
Access A Full
Step By Step
Tutorial for Pajek
The Slides From My
Intro to Computing for
Complex Systems
Access Using this Tab
126. Network Analysis Software
Just Download Pajek
and Use the Tutorial
You should download it to your personal machine
Mac Users Note: It is a PC-only Program, so you will need
something like CrossOver or you will have to multiboot
http://pajek.imfm.si/doku.php?id=download
128. Advanced Network Science Topics
Community Detection
ERGM Models
Diffusion /
Social Epidemiology
http://computationallegalstudies.com/2009/10/11/programming-dynamic-models-in-python/
170. Community Detection Review Articles
Some Useful Review Articles:
Mason A. Porter, Jukka-Pekka Onnela and Peter J. Mucha. 2009.
Communities in Networks. Notices of the American Mathematical Society
56: 1082-1166.
Santo Fortunato. 2010. Community detection in graphs. Physics Reports
486: 75-174.
192. Network Analysis & Law
Mapping Social Structure of Legal Elites
(Hustle & Flow Article)
Diffusion, Norm Adoption and other
Related Processes
(JLE Article)
Legal Doctrine and Legal Rules
(Sinks Paper with Application to
Patents, etc.)
199. Network Analysis of
the Federal Judiciary
Collected Nearly 19,000 Law Clerk ‘Events’
1995 - 2005 For All Article III Judges
Relying Upon Data From Staff Directories
200. The Core Claim
In the Aggregate ...
Law Clerk Movements Reveal
Social or Professional Relationships
Between Judicial Actors
201. Network Analysis of
the Federal Judiciary
Judge E
Justice Z
Justice Y
Judge C
Judge D
Judge B
Judge A
210. Reproduction of Hierarchy?
A Social Network Analysis of the
American Law Professoriate
Daniel Martin Katz
Josh Gubler
Jon Zelner
Michael Bommarito
Eric Provins
Eitan Ingall
211. Motivation for Project
Why Do Certain Paradigms, Histories, Ideas Succeed?
Function of the ‘Quality’ of the Idea
Social Factors also Influence the Spread of Ideas
Most Ideas Do Not Persist ....
212. Law Professors are Important Actors
Agents of Socialization
Repositories / Distributors of information
Socialize Future lawyers, Judges & law Professors
Responsible for Developing Particular Legal Ideas
(Brandwein (2007) ; Graber (1991), etc.)
Law Professor Behavior is an Important
Component of Positive Legal Theory
Positive Legal Theory
213. Social Network Analysis
Method for Characterizing Diffusion / Info Flow
Method for Tracking Social Connections, etc.
Method for Ranking Components based
upon Various Graph Based Measures
235. Hub Score
Score Each Institution’s Placements by
Number and Quality of Links
Normalized Score (0, 1]
Similar to the Google PageRank™ Algorithm
Measure who is important?
Measure who is important to who is important?
Run Analysis Recursively...
246. Highly Skewed Nature of
Legal Systems
Smith 2007
Post & Eisen 2000
Katz & Stafford 2010
247. Implications for Rankings
Rankings only Imply Ordering ( >, =, < )
End Users tend to Conflate Ranks with
Linearized Distances Between Units
(Tversky 1977)
Non-Stationary Distances Between Entities
Both Trivial and Large Distances
Linearity Heuristic Often Works
Assuming Linearity Can Prove Misleading
249. Why Computational
Simulation?
History only Provides a Single Model Run
Computational Simulation allows ...
Consideration of Alternative “States of the world”
Evaluation of Counterfactuals
250. Computational Model of
Information Diffusion
We Apply a simple Disease Model to
Consider the Spread of Ideas, etc.
Clear Tradeoff Between Structural Position
in the Network and “Idea Infectiousness”
251. A Basic Description
of the Model
Consider a Hypothetical Idea Released
at a Given Institution
Infectiousness Probability = p
Two Forms of Diffusion...
Direct Socialization
Signal Giving to Former Students
Infect neighbors, neighbors' neighbors, etc.
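A spread process of this kind can be sketched as a simple susceptible-infected simulation, averaged over many runs. This is an illustrative toy, not the authors' model: the network, the school names, the parameter values and the function names are all assumptions.

```python
import random

def simulate_diffusion(neighbors, seed_node, p, steps, rng):
    """Simple susceptible-infected spread; returns cumulative adopters per step."""
    infected = {seed_node}
    history = [len(infected)]
    for _ in range(steps):
        newly = set()
        for node in sorted(infected):
            for other in neighbors.get(node, []):
                # Each exposure succeeds independently with probability p.
                if other not in infected and rng.random() < p:
                    newly.add(other)
        infected |= newly
        history.append(len(infected))
    return history

def consensus_curve(neighbors, seed_node, p, steps, runs, seed=0):
    """Average adoption curve over many independent runs."""
    rng = random.Random(seed)
    totals = [0.0] * (steps + 1)
    for _ in range(runs):
        run = simulate_diffusion(neighbors, seed_node, p, steps, rng)
        for t, count in enumerate(run):
            totals[t] += count
    return [total / runs for total in totals]

# Hypothetical placement network: an arc means "places graduates at".
placements = {"School A": ["School B", "School C"],
              "School B": ["School C", "School D"],
              "School C": ["School D"],
              "School D": []}
curve = consensus_curve(placements, "School A", p=0.3, steps=5, runs=500)
print([round(value, 2) for value in curve])
```

Changing the seed school or p shows the tradeoff between structural position and infectiousness: a well-placed school can spread a weaker idea further than a peripheral one.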
252. Lots of Channels of Information Diffusion
Among Legal Academics
Judicial Decisions, Law Reviews, Other Materials
Academic Conferences, Other Professional Orgs
SSRN, Legal Blogosphere, etc.
Channels of Diffusion
Other Channels of Information Dissemination
Legal Socialization / Training
257. Run a Simulation
on Your Desktop
http://computationallegalstudies.com/2009/04/22/the-revolution-will-not-be-televised-but-will-it-come-from-harvard-or-yale-a-network-analysis-of-the-american-law-professoriate-part-iii/
(Requires Java 5.0 or Higher)
258. From a Single Run to
Consensus Diffusion Plot
Netlogo is Good for Model Demonstration
Regular Programming Language Typically
Required for Full Scale Implementation
We Used Python
http://ccl.northwestern.edu/netlogo/
http://www.python.org/
Object Oriented Programming Language
259. From a Single Run to
Consensus Diffusion Plot
Repeated the Diffusion Simulation
Hundreds of Model Runs Per School
Yielded a Consensus Plot for Each School
Results for Five Emblematic Schools
Exponential, linear and sub-linear
261. Some Potential
Model Improvements?
Differential Host Susceptibility
Countervailing Information / Paradigms
SIR Model: Susceptible-Infected-Recovered
262. Directions for
Future Research
Longitudinal Data
Hiring/Placement/Laterals
Currently Collecting Data
Database Linkage to Articles/Citations
Working with Content Providers
Empirical Evaluation of Simulation
Computational Linguistics
Text Mining, Sentiment Coding
264. Example Project #3:
On the Road to the
Legal Genome Project ...
Dynamic Community Detection
&
Distance Measures for
Dynamic Citation Networks
265. Distance Measures for
Dynamic Citation Networks
Michael J. Bommarito II
Daniel Martin Katz
Jon Zelner
James H. Fowler
271. How Can We
Track the Novel
Combination,
Mutation and
Spread of Ideas?
272. Information Genome Project
The Development, Mutation
and Spread of Ideas
Precedent in Common Law Systems
Patent Citations
Bibliometric Analysis
296. Cases Decided by
the Supreme Court
Citations in the
Current Year
Citations from
prior years
PLAY MOVIE!
http://computationallegalstudies.com/2010/02/11/the-development-of-structure-in-the-citation-network-of-the-united-states-supreme-court-now-in-hd/