Statistical mechanics model of SCOTUS voting behavior

Statistical mechanics of SCOTUS

Edward D. Lee1
Chase P. Broedersz1,2
William Bialek1,2

1Department of Physics, Princeton U.
2Lewis-Sigler Institute for Integrative Genomics, Princeton U.

From complex to simple
Group behavior of social organisms can manifest complex patterns at the
group level even though it may reduce to simple rules for individuals
[1-3]. For example, starlings seem to interact only with local neighbors,
and these local interactions produce global patterns like those in the
figure. We call these emergent collective phenomena because such behavior
is not explicitly encoded in the behavior of individuals, but arises from
interactions.
Is it the case that decisions by groups of people, despite apparent
complexities, are likewise reducible to simpler rules?

[Figure: Flocks of starlings form complex patterns
(http://webodysseum.com/videos/spectacular-starling-flocks-video-murmuration/).]

US Supreme Court (SCOTUS)
We investigate the voting behavior of the SCOTUS. We wish to use the
structure of the decisions to gain insight into how these decisions are
reached. We show results for the second Rehnquist Court (1994-2005,
𝑁 = 895 cases) and discuss other “natural courts” (periods of time when
membership remains constant) where relevant.
SCOTUS facts:
• highest court in the US government
• nine Justices appointed for life
• vote on constitutionality of legislative and executive actions
• usually hears appeals from lower courts' decisions, which Justices
  affirm or reverse
• sometimes Justices are recused for conflict of interest, sickness, etc.
• decisions by majority vote
• write a majority and a minority opinion, legally clarifying their
  decisions, which can be supplemented with separate opinions
• Justices must ultimately render a binary decision
• the second Rehnquist Court is typically considered as 4 liberals and
  5 conservatives

Previous approaches to SCOTUS voting
Although there is a long history of scholarly study of SCOTUS, nearly all approaches rely
on important assumptions, which are mostly justified in some way. Some assume Justices
vote independently given ideological preferences [5]. Others, if including interactions, do
not include them in a general model of voting or posit their structure instead of deriving it
from data [6]. Many draw on an underlying cognitive framework like rationality or
expression of internal beliefs [7-8]. Many rely on an ideological, liberal vs. conservative
axis [5,7-8]. Few consider the entire distribution of votes and so are not predictive outside
the selected subset. The two main classes of models are attitudinal and game theoretical.

Characteristics of previous approaches

[Diagram of the two classes of models: Attitudinal, Game theoretical]
• Independent voters
• Decision-making framework
• Externally posited interaction
• Ideology
• Subset of distribution of votes is relevant
• Not generally predictive

Can we overcome some of these limitations?

Back to the data
The data, published by [4],
immediately allows us to rule
out some hypotheses. For
example, we see that in the
second Rehnquist Court, 44%
of votes were unanimous.
Overall, when considering the
natural courts shown on the
right, 36% of votes are
unanimous on average. Only
10% of votes fall along the
liberal vs. conservative divide.
Does an independent model
support this observation?

[Figure: Distribution of voting data for natural courts starting in the
given year. Blue, 0 dissenting votes; red, 1; yellow, 2; green, 3; black, 4.
Terms are the number of years that the same members remained on the Court.
The number of votes on record for each set of years is in gray.]

The simplest model
Independent voters
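Each Justice 𝑖 has probability 𝑝 𝑖 of voting to affirm. For a common probability
𝑝 𝑖 = 𝑝, the probability of 𝑘 votes in the majority out of 𝑛 Justices is
\[ P(k) = \binom{n}{k} p^{k} (1-p)^{n-k} + \binom{n}{n-k} p^{n-k} (1-p)^{k} \]
where the two terms count majorities of 𝑘 votes to affirm and 𝑘 votes to reverse.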
An independent model fails to explain the distribution of votes in the majority.
This is not so surprising because we’ve
assumed that all higher order behaviors are
encoded within the first moments. For
example, for a unanimous vote (𝑘 = 9) to
occur in the independent model, all Justices
would happen to vote the same way, but this
happens much more rarely than observed,
yielding 0.5% of the observed value. Indeed,
interactions are crucial to SCOTUS voting
behavior.
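
As a rough illustration of the independent model, the sketch below computes the distribution of majority sizes when each Justice votes independently. The function name `majority_size_distribution` and the example probabilities are hypothetical, not taken from the poster; unlike the closed-form expression, the sketch allows a different 𝑝 𝑖 per Justice.

```python
def majority_size_distribution(p):
    """Return {k: P(majority has k votes)} for independent voters.

    p[i] is the (assumed) probability that voter i votes to affirm.
    """
    n = len(p)
    # dp[a] = probability that exactly a voters affirm, built one voter at a time.
    dp = [1.0] + [0.0] * n
    for p_i in p:
        for a in range(n, 0, -1):
            dp[a] = dp[a] * (1.0 - p_i) + dp[a - 1] * p_i
        dp[0] *= 1.0 - p_i
    # A majority of size k arises from k votes to affirm or k votes to reverse.
    dist = {}
    for a, prob in enumerate(dp):
        k = max(a, n - a)
        dist[k] = dist.get(k, 0.0) + prob
    return dist


# Toy example: nine unbiased, independent voters (illustrative values only).
dist = majority_size_distribution([0.5] * 9)
print(dist[9])  # probability of a unanimous vote under independence
```

With unbiased voters the unanimous probability is 2 × 0.5⁹, far below the observed frequency of unanimous decisions, which is the qualitative failure described above.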

Evidence for interaction
Justices take at least two votes for a case: an initial secret vote and a final
decision. Justices may attempt to persuade each other in between, but it is
difficult to measure such interactions partly because the first vote is secret.

According to data available on the Waite Court (1874-1887), 9% of final
votes had at least one dissenting vote while 40% had at least one in the initial
vote [5]. In general, Justices more often switch to the majority than the
reverse, suggestive of consensus-promoting interaction. Maltzmann et al.
show from memos that Justices strategically manipulate their communication
to attempt to influence the vote and written opinions of the Court [6].

Nonetheless, most models either treat the Justices as independent or do not
explicitly include interactions in a predictive framework. Indeed, it is difficult
to devise the right structure for interactions!

How do we account for interactions in a principled fashion?

Including interactions…
The independent model is the simplest model that fits the average
voting record ⟨𝜎 𝑖 ⟩ of each Justice 𝜎 𝑖 . It is the simplest because all higher
order correlations, ⟨𝜎 𝑖 𝜎𝑗 ⟩, ⟨𝜎 𝑖 𝜎𝑗 𝜎 𝑘 ⟩, …, ⟨𝜎 𝑖 𝜎𝑗 … 𝜎 𝑛 ⟩, are reducible to the ⟨𝜎 𝑖 ⟩.
In fact, all higher order statistics are as random as possible given the
first moments. We can generalize this idea to an 𝑚th order model that
fits all correlations up to order 𝑚 yet generates all correlations of
order > 𝑚 randomly. Since these distributions are as random as
possible given what is fit, we make no assumptions beyond the fit
correlations, including about how these distributions are generated.

With SCOTUS, we might expect that we need to account for the bloc
behavior (5 vs. 4) and unanimous behavior by including terms of the
4th, 5th and 9th orders explicitly. However, let us take only the next step
of fitting both ⟨𝜎 𝑖 ⟩ and ⟨𝜎 𝑖 𝜎𝑗 ⟩.

…as maximizing entropy
The formalization for generating these distributions is called the principle of maximum
entropy [9]. Entropy is a measure of the randomness of a distribution. The entropy of
a probability distribution 𝑃(𝜎) of the votes of a set of 𝑛 voters 𝜎 = {𝜎1 , … , 𝜎 𝑛 } is
\[ S[P(\sigma)] = -\sum_{\sigma} P(\sigma) \log P(\sigma) \]
which we maximize while constraining ⟨𝜎 𝑖 ⟩ and ⟨𝜎 𝑖 𝜎𝑗 ⟩
\[ \tilde{S}[P(\sigma), h_i, J_{ij}] = S + \sum_{i=1}^{n} h_i \langle\sigma_i\rangle + \frac{1}{2} \sum_{i,j=1}^{n} J_{ij} \langle\sigma_i \sigma_j\rangle \]
with Lagrange multipliers ℎ 𝑖 , 𝐽 𝑖𝑗 . The resulting model is known as the Ising model
\[ P(\sigma) = \frac{1}{Z} e^{-H(\sigma)} \]
\[ H(\sigma) = -\sum_{i=1}^{n} h_i \sigma_i - \frac{1}{2} \sum_{i,j=1}^{n} J_{ij} \sigma_i \sigma_j \]

with a normalizing constant, the partition function 𝑍, and Hamiltonian 𝐻(𝜎).
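
For readers who want the intermediate step, the following is a sketch of how the exponential form follows from the constrained maximization. It assumes the multipliers ℎ 𝑖 and 𝐽 𝑖𝑗 enter with the sign that reproduces the Hamiltonian above, and it introduces an extra multiplier λ for normalization, which is not written out on the poster.

```latex
\begin{align}
0 &= \frac{\partial}{\partial P(\sigma)} \Big[ S
      + \sum_{i} h_i \langle\sigma_i\rangle
      + \tfrac{1}{2} \sum_{i,j} J_{ij} \langle\sigma_i \sigma_j\rangle
      - \lambda \Big( \sum_{\sigma'} P(\sigma') - 1 \Big) \Big] \\
  &= -\log P(\sigma) - 1
      + \sum_{i} h_i \sigma_i
      + \tfrac{1}{2} \sum_{i,j} J_{ij} \sigma_i \sigma_j - \lambda \\
\Rightarrow\quad P(\sigma) &= e^{-(1+\lambda)} \, e^{-H(\sigma)},
\qquad Z = e^{1+\lambda} = \sum_{\sigma} e^{-H(\sigma)} .
\end{align}
```

Here the averages are taken over 𝑃, for example ⟨𝜎 𝑖 ⟩ = Σ 𝑃(𝜎) 𝜎 𝑖 , so each derivative with respect to 𝑃(𝜎) simply picks out the value of the constrained quantity in state 𝜎.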

Ising model
\[ P(\sigma) = \frac{1}{Z} e^{-H(\sigma)} \]
\[ H(\sigma) = -\sum_{i=1}^{n} h_i \sigma_i - \frac{1}{2} \sum_{i,j=1}^{n} J_{ij} \sigma_i \sigma_j \]

Since Justices have binary votes, we restrict 𝜎 𝑖 ∈ {−1,1}. The ℎ 𝑖
loosely refer to the “mean bias” of each voter 𝜎 𝑖 , and the 𝐽 𝑖𝑗 loosely
refer to the interaction between them, or “couplings.” Since votes, or
“states,” with a smaller 𝐻 are more probable, ℎ 𝑖 > 0 implies that 𝜎 𝑖 is
more likely to take value 1. Also, 𝐽 𝑖𝑗 > 0 implies that 𝜎 𝑖 and 𝜎𝑗 are
more likely to take the same value.

We can solve for the parameters ℎ 𝑖 and 𝐽 𝑖𝑗 such that our model fits the
given moments ⟨𝜎 𝑖 ⟩ and ⟨𝜎 𝑖 𝜎𝑗 ⟩.
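
One way such a fit could be carried out for a system this small is by enumerating all 2ⁿ vote vectors and adjusting the parameters until the model moments match the targets. The sketch below is a minimal illustration under that assumption; the poster does not say which solver was used, and the function names, step size, and toy parameters are hypothetical.

```python
import itertools
import numpy as np


def all_states(n):
    """All 2**n vote vectors with entries in {-1, +1}, shape (2**n, n)."""
    return np.array(list(itertools.product([-1, 1], repeat=n)))


def ising_distribution(h, J, states):
    """P(sigma) = exp(-H)/Z with H = -sum_i h_i s_i - (1/2) sum_ij J_ij s_i s_j."""
    energy = -states @ h - 0.5 * np.einsum("si,ij,sj->s", states, J, states)
    weights = np.exp(-(energy - energy.min()))  # shift energies for stability
    return weights / weights.sum()


def fit_ising(mean_target, corr_target, n_steps=5000, lr=0.05):
    """Gradient ascent on the likelihood: move h, J until model moments match."""
    n = len(mean_target)
    states = all_states(n)
    h = np.zeros(n)
    J = np.zeros((n, n))
    for _ in range(n_steps):
        p = ising_distribution(h, J, states)
        mean_model = p @ states
        corr_model = np.einsum("s,si,sj->ij", p, states, states)
        h += lr * (mean_target - mean_model)
        dJ = lr * (corr_target - corr_model)
        np.fill_diagonal(dJ, 0.0)  # sigma_i**2 = 1 always, so J_ii carries no information
        J += dJ
    return h, J


# Toy check: recover hypothetical parameters from their own exact moments.
h_true = np.array([0.1, -0.1, 0.2])
J_true = np.array([[0.0, 0.3, -0.2], [0.3, 0.0, 0.1], [-0.2, 0.1, 0.0]])
states = all_states(3)
p_true = ising_distribution(h_true, J_true, states)
mean_target = p_true @ states
corr_target = np.einsum("s,si,sj->ij", p_true, states, states)
h_fit, J_fit = fit_ising(mean_target, corr_target)
```

The step size and iteration count here are chosen for the three-voter toy case; strongly correlated data with nine voters may need a smaller step or more iterations.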

Mapping spins
We have yet to define how the values of 𝜎 𝑖 correspond to
actual votes. It is not as simple as calling one value affirm and
the other reverse: the outcome of affirming or reversing
depends on how the case is posed. It is entirely possible that
affirming is a liberal decision in one case and a conservative
one in another. What is the right dimension along which to orient
the 𝜎 𝑖 ? We abstain from making a choice, and from introducing
external bias, by symmetrizing the up and down votes such
that −1 and 1 are equivalent.
This keeps ⟨𝜎 𝑖 𝜎𝑗 ⟩ the same and fixes ⟨𝜎 𝑖 ⟩ = 0.
Correspondingly, ℎ 𝑖 = 0. We find that the absence of a bias is a
reasonable assumption because bias is not the dominant term
for judicial voting behavior.
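
A minimal sketch of one way to symmetrize, assuming the votes are stored as rows of ±1 values: append the sign-flipped copy of every recorded vote. The helper name `symmetrize_votes` and the toy data are hypothetical, and the poster does not state that this is how the symmetrization was implemented.

```python
import numpy as np


def symmetrize_votes(votes):
    """Return the votes together with their sign-flipped copies."""
    votes = np.asarray(votes)
    return np.vstack([votes, -votes])


# Toy record of four cases and three voters (illustrative values only).
votes = np.array([[1, 1, 1], [1, -1, 1], [-1, -1, 1], [1, 1, -1]])
sym = symmetrize_votes(votes)

mean_sym = sym.mean(axis=0)                # every average vote becomes 0
corr_raw = votes.T @ votes / len(votes)    # pairwise averages <s_i s_j> before
corr_sym = sym.T @ sym / len(sym)          # and after symmetrizing
```

Flipping every vote in a case leaves each product 𝜎 𝑖 𝜎𝑗 unchanged, which is why the pairwise averages survive while the means vanish.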

Model fit
Remarkably, the Ising model fits the
data well. One measure of the fit is to
consider the difference in entropy of the
𝑚th order model with the data
𝐼 𝑚 = 𝑆 𝑚 − 𝑆data [2]. As we increase 𝑚,
we capture more correlation and the
entropy of our models monotonically
decreases to that of the data, where
𝑆 𝑛 = 𝑆data . The furthest distance 𝐼1 is
called the multi-information. Our model
captures 90% of the multi-information
(right).

Thus, it captures nearly all the structure in the data. It also follows
− log 𝑃(𝜎) ∝ 𝐸(𝜎) for the most frequent states. The worst-fit states
appear only once or twice on average in a bootstrap sample of the data.

[Figure: The model captures 90% of the multi-information.]
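
The fraction of multi-information captured by an 𝑚th order model can be written as (𝑆1 − 𝑆 𝑚 )/(𝑆1 − 𝑆data ) = 1 − 𝐼 𝑚 /𝐼1 . The sketch below shows that bookkeeping on a hypothetical two-voter distribution; the helper names and probabilities are illustrative, and natural logarithms are assumed.

```python
import numpy as np


def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))


def independent_entropy(states, p):
    """Entropy S_1 of the independent model matching each voter's average."""
    prob_up = p @ (states == 1).astype(float)  # P(sigma_i = +1) for each voter
    return sum(entropy([q, 1.0 - q]) for q in prob_up)


def fraction_captured(S1, S_m, S_data):
    """Fraction of the multi-information I_1 = S_1 - S_data captured at order m."""
    return (S1 - S_m) / (S1 - S_data)


# Toy distribution over two voters who usually agree (illustrative values only).
states = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])
p_data = np.array([0.4, 0.1, 0.1, 0.4])
S_data = entropy(p_data)
S1 = independent_entropy(states, p_data)
I1 = S1 - S_data  # multi-information
```

For the poster's result, `fraction_captured` would be evaluated with the entropy of the fitted pairwise model as `S_m`; those entropies are not reproduced here.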

Implications of Ising model fit
The fit by the Ising model shows that higher order behaviors
like ideological blocs and unanimity can emerge from lower
order behaviors at the level of pairwise interactions between
individuals. Including higher order terms will result in only a
marginal improvement in the fit.

This result is surprising because it suggests that higher level
coordination is not the dominant explanation of voting
behavior. Previously, scholars have pointed to the high level of
consensus in the Court as evidence for a “norm of
consensus,” which seems analogous to an effective ninth
order term for behavior [6].

Found coupling network

[Figure: Graphs of 𝐶 𝑖𝑗 = ⟨𝜎 𝑖 𝜎𝑗 ⟩ − ⟨𝜎 𝑖 ⟩⟨𝜎𝑗 ⟩ and 𝐽 𝑖𝑗 . Justices with a liberal voting record are
colored blue whereas those with a conservative record are colored red. Positive edges
are red and negative edges blue. Widths are proportional to magnitude. All 𝐶 𝑖𝑗 are
positive whereas some 𝐽 𝑖𝑗 are negative. Justices are initialed: John Stevens (JS),
Ruth Ginsburg (RG), David Souter (DS), Stephen Breyer (SB), Sandra O’Connor
(SO), Anthony Kennedy (AK), William Rehnquist (WR), Antonin Scalia (AS),
Clarence Thomas (CT).]

Understanding couplings
As a simple check, we see that the average 𝐽 𝑖𝑗 within
ideological blocs (blue to blue or red to red) is positive
while the average between blocs (blue to red) is negative
(previous slide). The corresponding averages of 𝐶 𝑖𝑗 also
show this relative change although all 𝐶 𝑖𝑗 are positive,
obscuring the antagonistic tendency.

To better understand the distribution of 𝐽 𝑖𝑗 , we consider
the effective field on 𝜎 𝑖 from its neighbors.
\[ h_i^{\mathrm{eff}} = \frac{1}{2} \sum_{j=1}^{n} J_{ij} \sigma_j \]
Note that it depends on the state of neighbors 𝜎𝑗 . Since
this distribution over all 𝜎 is symmetric around 0, we
only show the positive half (right).

We fix 𝜎 𝑖 = 1 and compare the shifts in the distributions
of $h_j^{\mathrm{eff}}$ of its neighbors 𝜎𝑗 , which we measure by taking
the mean over standard deviation 𝜇/Σ 𝑗𝑖 . In the absence
of such perturbation, 𝜇/Σ 𝑗𝑖 = 0.

[Figure: Distributions $P[h_i^{\mathrm{eff}}(\sigma)]$. Red histogram is
distribution of fields from only conservative Justices. Ordered from most
liberal to most conservative record from left to right, top to bottom. The
more conservatively a Justice votes, the more the mean field due to
conservatives marches to the right.]
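
The shift measure can be sketched by exact enumeration, as below. This is only one reading of the procedure: it assumes the symmetric model (ℎ 𝑖 = 0), keeps the factor 1/2 from the definition of the effective field, and takes both the mean and the standard deviation from the distribution conditioned on 𝜎 𝑖 = 1. The function names and the toy couplings are hypothetical.

```python
import itertools
import numpy as np


def all_states(n):
    """All 2**n vote vectors with entries in {-1, +1}."""
    return np.array(list(itertools.product([-1, 1], repeat=n)))


def ising_distribution(J, states):
    """P(sigma) for the symmetric model (h_i = 0), H = -(1/2) sum_ij J_ij s_i s_j."""
    energy = -0.5 * np.einsum("si,ij,sj->s", states, J, states)
    weights = np.exp(-(energy - energy.min()))
    return weights / weights.sum()


def field_shift(J, i, j):
    """Mean over standard deviation of the effective field on j, given sigma_i = +1."""
    states = all_states(len(J))
    p = ising_distribution(J, states)
    field_j = 0.5 * states @ J[j]      # effective field on j in every state
    mask = states[:, i] == 1           # condition on sigma_i = +1
    p_cond = p[mask] / p[mask].sum()
    mu = p_cond @ field_j[mask]
    var = p_cond @ (field_j[mask] - mu) ** 2
    return mu / np.sqrt(var)


# Toy couplings: three mutually agreeing voters (illustrative values only).
J = 0.5 * (np.ones((3, 3)) - np.eye(3))
shift = field_shift(J, i=0, j=1)
```

With all couplings positive, holding one voter at +1 pushes the fields on the others in the positive direction, so `shift` comes out positive in this toy case.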

Shifts in $h_i^{\mathrm{eff}}$

[Figure: Average shifts in distributions of $h_i^{\mathrm{eff}}$ over
ideological blocs when holding one member of a bloc, 𝑖, at 1 at a time.
Diagram nodes: Liberals, Conservatives. Shifts within a bloc: 𝜇/Σ 𝑗𝑖 = 4.7
and 𝜇/Σ 𝑗𝑖 = 4.3. Shifts across blocs, e.g. the average shift in liberals (𝑗)
when holding conservatives fixed (𝑖): 𝜇/Σ 𝑗𝑖 = 0.8 in each direction.]
As expected, ideological neighbors are much more affected by fixing 𝜎 𝑖 = 1,
by a factor of 5-6. Overall, the Court always shifts in the same direction as the
perturbation.

O'Connor and Kennedy shift conservatives (liberals) to 𝜇/Σ 𝑗𝑖 = 2.6 (1.4) and
𝜇/Σ 𝑗𝑖 = 3.1 (1.1), reaffirming their moderate credentials. Stevens, however,
has weaker connections to both groups with 𝜇/Σ 𝑗𝑖 = 0.36 (2.72). Thus, we find
that higher order behaviors such as ideological blocs and general unanimity are
reflected in the couplings.

Caveats with couplings
We must be careful not to interpret the 𝐽 𝑖𝑗 literally as corresponding to
behavioral interaction on the Court. What we cannot do, and what is
indeed impossible with this data set, is explain the underlying
mechanism for correlations. We may find two Justices that vote together
too often for chance, but it could be that either they collaborate to a
large extent or their perspectives have been shaped by a similar
background. The latter involves a hidden third actor, but it is
indistinguishable from the former with only the voting record. In many
ways, possible confounding factors that contribute to 𝐽 𝑖𝑗 reflect
fundamental limitations of the data.

Our guiding principle is that we refrain from assuming anything beyond
what is already given by the data; other models cannot make the
same claim to minimal assumptions.

Probing influence
Now that we have a model of voting behavior, we can use the
model to probe the behavior of the system under
perturbations.

The quantity of interest here is the majority outcome of the
court
\[ \gamma = \frac{\sum_{i=1}^{n} \sigma_i}{\left| \sum_{i=1}^{n} \sigma_i \right|} \]
because this is the decision rendered.

How sensitive is the average decision ⟨𝜸⟩ to small changes
in the average behavior ⟨𝝈 𝒊 ⟩ of a Justice?

Probing influence
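Formally, this is the susceptibility of ⟨𝛾⟩
\[ \psi_i = \frac{1}{\chi_i} \frac{\partial \langle\gamma\rangle}{\partial h_i} = \frac{1}{\chi_i} \left( \langle\gamma \sigma_i\rangle - \langle\gamma\rangle \langle\sigma_i\rangle \right) \]
which we have normalized by
\[ \chi_i = \frac{\partial \langle\sigma_i\rangle}{\partial h_i} \]
to compare the Justices equally with respect to changes in
their averages. The values we find are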

                          SO      AK      WR      DS      SB      AS      RG      CT      JS      mean
𝝍 𝒊                       0.834   0.809   0.719   0.650   0.644   0.623   0.616   0.608   0.421   0.658
𝝍 𝒊 − ⟨𝝍⟩                 0.176   0.151   0.061   -0.008  -0.014  -0.035  -0.042  -0.050  -0.237
95% confidence interval   0.001   0.002   0.002   0.002   0.002   0.002   0.002   0.002   0.002
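
The normalized susceptibility can be evaluated from model averages alone, as in the sketch below. It uses exact enumeration and the standard identity 𝜒 𝑖 = 1 − ⟨𝜎 𝑖 ⟩², which holds because 𝜎 𝑖 ² = 1; this identity, the function names, and the toy parameters are additions for illustration and do not appear on the poster.

```python
import itertools
import numpy as np


def all_states(n):
    """All 2**n vote vectors with entries in {-1, +1}."""
    return np.array(list(itertools.product([-1, 1], repeat=n)))


def ising_distribution(h, J, states):
    """P(sigma) = exp(-H)/Z with H = -sum_i h_i s_i - (1/2) sum_ij J_ij s_i s_j."""
    energy = -states @ h - 0.5 * np.einsum("si,ij,sj->s", states, J, states)
    weights = np.exp(-(energy - energy.min()))
    return weights / weights.sum()


def psi(h, J):
    """Normalized susceptibility of the mean majority outcome to each field h_i."""
    states = all_states(len(h))
    p = ising_distribution(h, J, states)
    gamma = np.sign(states.sum(axis=1))  # majority outcome; n must be odd
    mean_sigma = p @ states
    mean_gamma = p @ gamma
    cov = (p * gamma) @ states - mean_gamma * mean_sigma  # <gamma s_i> - <gamma><s_i>
    chi = 1.0 - mean_sigma ** 2                           # d<s_i>/dh_i, since s_i**2 = 1
    return cov / chi


# Toy case: three independent, unbiased voters (illustrative values only).
psi_independent = psi(np.zeros(3), np.zeros((3, 3)))
print(psi_independent)
```

In the symmetric model ⟨𝜎 𝑖 ⟩ = 0 and ⟨𝛾⟩ = 0, so 𝜓 𝑖 reduces to ⟨𝛾𝜎 𝑖 ⟩, which is twice the probability that Justice 𝑖 votes with the majority, minus one.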

Are ideological medians influential?
The typical wisdom in the political science literature is that the
ideological medians are the most influential for Court decisions.
Basically, the argument is that voters who sit in the middle of a
unidimensional, symmetric space will be predictive of the majority
[10]. The relevant space is liberal vs. conservative as we have
confirmed. The Justices to whom the outcome is most sensitive here
are the ideological medians SO and AK, in agreement with the claim.

However, real systems may be complicated by interactions that
constrain how such a voter may cast her vote or how a majority forms
initially and persists. We do not find that it is the ideological medians
to whom the outcome of the court is most sensitive in general.
Importantly, our results are derived under minimal assumptions. We
do not assume ideological behavior, which has to be imposed by the
observer, and we account for interactions.

Isolating interactions
Given these interactions, a Justice could affect the outcome in two ways:
1. Change own vote
2. Impact on colleagues’ votes through interactions

What is the role of interactions in the 𝝍 𝒊 ?

We isolate the role of interactions in the susceptibility 𝜓 𝑖 . We simulate
how 𝜎 𝑖 increases pressure on colleagues through couplings by
1. increasing the average coupling such that neighbors’ effective fields
break symmetry around 0 to $\langle h_j^{\mathrm{eff}} \rangle = J_{ij} \epsilon$. However, this will also incur a
shift to ⟨𝜎 𝑖 ⟩ ≠ 0, so
2. we add a compensating field ℎ′𝑖 to $h_i^{\mathrm{eff}}$ to fix ⟨𝜎 𝑖 ⟩ = 0.

We denote the resulting change from pushing on 𝜎 𝑖 ’s neighbors 𝛿𝛾 𝑖 .
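
The two steps above can be sketched as follows. This is only one possible reading of the procedure, not the authors' implementation: it assumes the push on each neighbor 𝑗 is an added field 𝐽 𝑖𝑗 𝜖, that the compensating field on 𝜎 𝑖 is found by bisection, and that 𝛿𝛾 𝑖 is the resulting ⟨𝛾⟩ divided by 𝜖. All names and the toy couplings are hypothetical.

```python
import itertools
import numpy as np


def all_states(n):
    """All 2**n vote vectors with entries in {-1, +1}."""
    return np.array(list(itertools.product([-1, 1], repeat=n)))


def averages(h, J, states):
    """Return (<gamma>, <sigma>) under P(sigma) = exp(-H)/Z."""
    energy = -states @ h - 0.5 * np.einsum("si,ij,sj->s", states, J, states)
    p = np.exp(-(energy - energy.min()))
    p /= p.sum()
    gamma = np.sign(states.sum(axis=1))  # majority outcome; n must be odd
    return p @ gamma, p @ states


def delta_gamma(J, i, eps=1e-3):
    """Shift in <gamma> per unit eps from pushing on the neighbors of voter i."""
    states = all_states(len(J))
    h = eps * J[i].astype(float)  # step 1: field J_ij * eps on each neighbor j
    h[i] = 0.0
    lo, hi = -10.0, 10.0          # step 2: bisect for the compensating field on i
    for _ in range(100):
        h[i] = 0.5 * (lo + hi)
        _, mean_sigma = averages(h, J, states)
        if mean_sigma[i] > 0:
            hi = h[i]
        else:
            lo = h[i]
    mean_gamma, mean_sigma = averages(h, J, states)
    return mean_gamma / eps, mean_sigma[i]


# Toy couplings: three mutually agreeing voters (illustrative values only).
J = 0.5 * (np.ones((3, 3)) - np.eye(3))
dg, residual_mean = delta_gamma(J, i=0)
```

Bisection works here because ⟨𝜎 𝑖 ⟩ increases monotonically with the field applied to 𝜎 𝑖 ; `residual_mean` reports how close the compensated average is to zero.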

                          AK      SO      DS      SB      RG      WR      AS      CT      JS      mean
𝜹𝜸 𝒊                      0.348   0.340   0.296   0.276   0.245   0.231   0.195   0.138   0.130   0.244
𝜹𝜸 𝒊 − ⟨𝜹𝜸⟩               0.104   0.095   0.051   0.032   0.001   -0.014  -0.049  -0.106  -0.114
95% confidence interval   0.001   0.001   0.001   0.001   0.001   0.002   0.003   0.003   0.003
Comparing with 𝜓 𝑖 …
AK and SO switch order and are relatively closer. Interactions
may differentiate between Justices for whom interactions are
important. Across all natural courts, the Justices highest by 𝜓 𝑖 are
not always also highest by 𝛿𝛾 𝑖 , although they are
here.

WR falls from 3rd to 6th place. WR is the Chief Justice, who is
responsible for enforcing procedural rules and has the prerogative
of assigning opinions. Interestingly, WR is consistently low by
𝛿𝛾 𝑖 but rises in rank by 𝜓 𝑖 only after being appointed Chief
Justice.

Comparing with 𝜓 𝑖 …
CT and JS are relatively much closer. CT and JS are the most
extreme voters on the conservative and liberal ends of the
spectrum. Fittingly, the outcome is least sensitive to their
couplings. Moreover, CT and AS are similarly biased
ideologically, but AS seems to be more strongly embedded in
the interaction network.

All 𝛿𝛾 𝑖 > 0, reflecting the general tendency to consensus.

Conclusion
We show that SCOTUS voting behavior can be explained as behavior
that emerges from pairwise interaction even though higher order
behaviors are manifest.

We show how one can exploit the model of voting behavior by
considering the susceptibility of ⟨𝛾⟩ to shifts in average voting
behavior. We also isolate the shifts in ⟨𝛾⟩ specific to interactions and
distinguish between Justices similar by 𝜓 𝑖 along that second dimension
of 𝛿𝛾 𝑖 .

However suggestive our results, the correspondence of parameters to
real behavior remains unclear. We hope to soon start a collaboration
with political scientists to investigate whether an interpretable
correspondence can be established.

Works cited
1. W. Bialek, A. Cavagna, I. Giardina, T. Mora, and E. Silvestri, PNAS
109, 4786 (2012).
2. E. Schneidman, M. Berry, and R. Segev, Nature 440, 1007 (2006).
3. I. Couzin, J. Krause, et al., Nature 433, 7025 (2005).
4. H. J. Spaeth, L. Epstein, et al., Supreme Court Database (2011).
5. A.D. Martin and K. M. Quinn, Pol. Anal. 10, 134 (2002).
6. L. Epstein, J. A. Segal, et al., Am. J. of Pol. Sci. 83, 557 (2001).
7. F. Maltzmann, J. F. Spriggs II, et al., Crafting law on the Supreme
Court (2000).
8. E. T. Jaynes, Phy. Rev. 106, 620 (1957).
9. D. Black, J. of Pol. Econ. 56, 23 (1948).

Acknowledgements
Funding from
NSF grant CCF-0939370
Dept. of Physics, Princeton University

Thanks to
Sigma Xi for hosting this showcase.
