Sigma xi showcase_2013_draft0


  1. Statistical mechanics of SCOTUS
     Edward D. Lee (1), Chase P. Broedersz (1,2), William Bialek (1,2)
     (1) Department of Physics, Princeton U.
     (2) Lewis-Sigler Institute for Integrative Genomics, Princeton U.
  2. From complex to simple
     Group behavior of social organisms can manifest complex patterns at the group level even though it might reduce to simple rules for individuals [1-3]. For example, starlings seem to interact only with local neighbors, and these local interactions produce global patterns like those in the figure. We call these emergent collective phenomena because such behavior is not explicitly encoded in the behavior of individuals, but arises from interactions.

     Is it the case that decisions by groups of people, despite apparent complexities, are likewise reducible to simpler rules?

     Figure: Flocks of starlings form complex patterns (http://webodysseum.com/videos/spectacular-starling-flocks-video-murmuration/).
  3. US Supreme Court (SCOTUS)
     We investigate the voting behavior of the SCOTUS. We wish to use the structure of the decisions to gain insight into how these decisions are reached. We show results for the second Rehnquist Court (1994-2005, $N = 895$ cases) and discuss other "natural courts" (periods of time when the membership remains constant) where relevant.

     SCOTUS facts:
     • highest court in the US government
     • nine Justices appointed for life
     • vote on constitutionality of legislative and executive actions
     • usually hears appeals of lower-court decisions, which Justices affirm or reverse
     • sometimes Justices are recused for conflict of interest, sickness, etc.
     • decisions by majority vote
     • write a majority and a minority opinion, legally clarifying their decisions, which can be supplemented with separate opinions
     • Justices must ultimately render a binary decision
     • the second Rehnquist Court is typically considered as 4 liberals and 5 conservatives
  4. Previous approaches to SCOTUS voting
     Although there is a long history of scholarly study of SCOTUS, nearly all approaches rely on important assumptions, which are mostly justified in some way. Some assume Justices vote independently given ideological preferences [5]. Others, if including interactions, do not include them in a general model of voting or posit their structure instead of deriving it from data [6]. Many draw on an underlying cognitive framework like rationality or expression of internal beliefs [7-8]. Many rely on an ideological, liberal vs. conservative axis [5,7-8]. Few consider the entire distribution of votes and so are not predictive outside the selected subset. The two main classes of models are attitudinal and game theoretical.

     Characteristics of previous approaches:
     • Decision-making framework
     • Independent voters
     • Externally posited interaction
     • Ideology
     • Subset of distribution of votes is relevant
     • Not generally predictive

     Can we overcome some of these limitations?

     [Diagram labels: Attitudinal, Game theoretical]
  5. Back to the data
     The data, published by [4], immediately allow us to rule out some hypotheses. For example, we see that in the second Rehnquist Court, 44% of votes were unanimous. Overall, when considering the natural courts shown in the figure, 36% of votes are unanimous on average. Only 10% of votes fall along the liberal vs. conservative divide. Does an independent model support this observation?

     Figure: Distribution of voting data for natural courts starting in given year. Blue, 0 dissenting votes; red, 1; yellow, 2; green, 3; black, 4. Terms are the number of years that the same members remained on the Court. The number of votes on record for each set of years is in gray.
  6. The simplest model: independent voters
     Each Justice $i$ has probability $p_i$ of voting to affirm. For a common probability $p$, the probability of $k$ votes in the majority out of $n$ Justices is

     $$P(k) = \binom{n}{k} p^{k} (1-p)^{n-k} + \binom{n}{n-k} p^{n-k} (1-p)^{k}$$

     An independent model fails to explain the distribution of votes in the majority. This is not so surprising because we have assumed that all higher order behaviors are encoded within the first moments. For example, for a unanimous vote ($k = 9$) to occur in the independent model, all Justices would have to happen to vote the same way. The model predicts this far more rarely than observed, at only 0.5% of the observed rate. Indeed, interactions are crucial to SCOTUS voting behavior.
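As a rough illustration of the independent model, the following Python sketch evaluates $P(k)$ for a common probability $p$. The function name `majority_size_prob` and the choice $p = 0.5$ are assumptions made for illustration; they are not values fitted to the Court data.

```python
from math import comb

def majority_size_prob(n: int, k: int, p: float) -> float:
    """Probability that the majority has exactly k of n independent votes.

    Assumes every voter affirms with the same probability p and that n is odd,
    so the majority (k > n/2) is either k affirm votes or k reverse votes.
    """
    if not n / 2 < k <= n:
        raise ValueError("k must be a majority size: n/2 < k <= n")
    affirm_majority = comb(n, k) * p**k * (1 - p) ** (n - k)
    reverse_majority = comb(n, n - k) * p ** (n - k) * (1 - p) ** k
    return affirm_majority + reverse_majority

n = 9
# Distribution over majority sizes 5..9 for hypothetical unbiased voters.
dist = {k: majority_size_prob(n, k, 0.5) for k in range(5, n + 1)}
unanimous = dist[9]  # 2 * 0.5**9 = 1/256
```

For unbiased voters the unanimous probability is 1/256, roughly 0.4%, which already sits far below the 44% unanimity quoted for the second Rehnquist Court.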
  7. Evidence for interaction
     Justices take at least two votes for a case: an initial secret vote and a final decision. Justices may attempt to persuade each other in between, but it is difficult to measure such interactions, partly because the first vote is secret. According to data available on the Waite Court (1874-1887), 9% of final votes had at least one dissenting vote while 40% had at least one in the initial vote [5]. In general, Justices more often switch to the majority than the reverse, suggestive of consensus-promoting interaction. Maltzmann et al. show from memos that Justices strategically manipulate their communication to attempt to influence the vote and written opinions of the Court [6].

     Nonetheless, most models either treat the Justices as independent or do not explicitly include interactions in a predictive framework. Indeed, it is difficult to devise the right structure for interactions!

     How do we account for interactions in a principled fashion?
  8. Including interactions…
     The independent model is the simplest model that fits the average voting record of a Justice $\sigma_i$, $\langle\sigma_i\rangle$. It is the simplest because all higher order correlations, $\langle\sigma_i\sigma_j\rangle$, $\langle\sigma_i\sigma_j\sigma_k\rangle$, …, $\langle\sigma_i\sigma_j\cdots\sigma_n\rangle$, are reducible to $\langle\sigma_i\rangle$. In fact, all higher order statistics are as random as possible given the first moments. We can generalize this idea to an $m$th order model that fits all correlations up to order $m$ yet generates all correlations of order greater than $m$ randomly. Since these distributions are as random as possible given what is fit, we make no further assumptions beyond the fit correlations, nor about how these distributions are generated.

     With SCOTUS, we might expect that we need to account for the bloc behavior (5 vs. 4) and unanimous behavior by including terms of the 4th, 5th and 9th orders explicitly. However, let us take only the next step of fitting both $\langle\sigma_i\rangle$ and $\langle\sigma_i\sigma_j\rangle$.
  9. …as maximizing entropy
     The formalization for generating these distributions is called the principle of maximum entropy [9]. Entropy is a measure of the randomness of a distribution. The entropy of a probability distribution $P(\sigma)$ of the votes of a set of $n$ voters $\sigma = \{\sigma_1, \ldots, \sigma_n\}$ is

     $$S[P(\sigma)] = -\sum_{\sigma} P(\sigma) \log P(\sigma)$$

     which we maximize while constraining $\langle\sigma_i\rangle$ and $\langle\sigma_i\sigma_j\rangle$:

     $$S[P(\sigma), h_i, J_{ij}] = S[P(\sigma)] + \sum_{i=1}^{n} h_i \langle\sigma_i\rangle + \frac{1}{2} \sum_{i,j=1}^{n} J_{ij} \langle\sigma_i\sigma_j\rangle$$

     with Lagrange multipliers $h_i$, $J_{ij}$, whose signs are chosen to match the Hamiltonian that follows. The resulting model is known as the Ising model

     $$P(\sigma) = \frac{1}{Z} e^{-H(\sigma)}$$

     $$H(\sigma) = -\sum_{i=1}^{n} h_i \sigma_i - \frac{1}{2} \sum_{i,j=1}^{n} J_{ij} \sigma_i \sigma_j$$

     with a normalizing constant, the partition function $Z$, and Hamiltonian $H(\sigma)$.
  10. Ising model

     $$P(\sigma) = \frac{1}{Z} e^{-H(\sigma)}$$

     $$H(\sigma) = -\sum_{i=1}^{n} h_i \sigma_i - \frac{1}{2} \sum_{i,j=1}^{n} J_{ij} \sigma_i \sigma_j$$

     Since Justices have binary votes, we restrict $\sigma_i \in \{-1, 1\}$. The $h_i$ loosely refer to the "mean bias" of each voter $\sigma_i$, and the $J_{ij}$ loosely refer to the interactions between them, or "couplings." Since votes, or "states," with a smaller $H$ are more probable, $h_i > 0$ implies that $\sigma_i$ is more likely to take value 1. Also, $J_{ij} > 0$ implies that $\sigma_i$ and $\sigma_j$ are more likely to take the same value.

     We can solve for the parameters $h_i$ and $J_{ij}$ such that our model fits the given moments $\langle\sigma_i\rangle$ and $\langle\sigma_i\sigma_j\rangle$.
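To make the role of $h_i$ and $J_{ij}$ concrete, here is a minimal sketch that evaluates $P(\sigma)$ by enumerating every state. The function names and the three-voter parameters are hypothetical; they are not the couplings fitted to the Court, and exact enumeration is only practical for small $n$ (for nine Justices there are $2^9 = 512$ states).

```python
from itertools import product
from math import exp

def hamiltonian(sigma, h, J):
    """H(sigma) = -sum_i h_i s_i - (1/2) sum_{i,j} J_ij s_i s_j."""
    n = len(sigma)
    field = sum(h[i] * sigma[i] for i in range(n))
    pair = sum(J[i][j] * sigma[i] * sigma[j] for i in range(n) for j in range(n))
    return -field - 0.5 * pair

def ising_distribution(h, J):
    """Return {state: P(state)} by enumerating all 2**n states."""
    states = list(product((-1, 1), repeat=len(h)))
    weights = [exp(-hamiltonian(s, h, J)) for s in states]
    Z = sum(weights)  # partition function
    return {s: w / Z for s, w in zip(states, weights)}

# Hypothetical example: voters 0 and 1 tend to agree, voter 2 opposes both.
h = [0.0, 0.0, 0.0]
J = [[0.0, 1.0, -0.5],
     [1.0, 0.0, -0.5],
     [-0.5, -0.5, 0.0]]
P = ising_distribution(h, J)
```

In this toy case the state (1, 1, -1) has lower $H$ than (1, -1, 1), so it is more probable, and with all $h_i = 0$ each state has the same probability as its sign-flipped partner.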
  11. Mapping spins
     We have yet to define how the values of $\sigma_i$ correspond to actual votes. It is not as simple as calling one value affirm and the other reverse: the outcome of affirming or reversing depends on how the case is posed. It is entirely possible that affirming is a liberal decision in one case and a conservative decision in another. What is the right dimension along which to orient the $\sigma_i$? We abstain from making a choice, and from introducing external bias, by symmetrizing the up and down votes such that $-1$ and $1$ are equivalent.

     This keeps $\langle\sigma_i\sigma_j\rangle$ the same and fixes $\langle\sigma_i\rangle = 0$. Correspondingly, $h_i = 0$. We find that absence of a bias is a reasonable assumption because bias is not the dominant term for judicial voting behavior.
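One way to read the symmetrization step is to add a sign-flipped copy of every recorded vote. The sketch below takes that reading as an assumption; the helper names and the small voting record are made up for illustration and are not Court data.

```python
def symmetrize(votes):
    """Return the votes together with their sign-flipped copies."""
    return votes + [tuple(-s for s in v) for v in votes]

def mean(votes, i):
    """Average vote <s_i> over the record."""
    return sum(v[i] for v in votes) / len(votes)

def pair_corr(votes, i, j):
    """Pairwise correlation <s_i s_j> over the record."""
    return sum(v[i] * v[j] for v in votes) / len(votes)

# Hypothetical record for three voters (+1 / -1), one tuple per case.
votes = [(1, 1, 1), (1, 1, -1), (1, -1, -1), (1, 1, 1)]
sym = symmetrize(votes)
```

After symmetrizing, every `mean(sym, i)` is zero while `pair_corr(sym, i, j)` matches the value from the original record, which is the property the slide relies on.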
  12. Model fit
     Remarkably, the Ising model fits the data well. One measure of the fit is the difference in entropy between the $m$th order model and the data, $I_m = S_m - S_{\text{data}}$ [2]. As we increase $m$, we capture more correlation and the entropy of our models monotonically decreases to that of the data, where $S_n = S_{\text{data}}$. The furthest distance $I_1$ is called the multi-information. Our model captures 90% of the multi-information (figure).

     Thus, it captures nearly all the structure in the data. It also follows $-\log P(\sigma) \propto E(\sigma)$ for the most frequent states. The worst-fit states appear only once or twice on average in a bootstrap sample of the data.

     Figure: The model captures 90% of the multi-information.
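The entropy comparison can be sketched as follows, under the assumption that "captures 90% of the multi-information" means the ratio $(I_1 - I_2)/I_1 = (S_1 - S_2)/(S_1 - S_{\text{data}})$. The function names and the two-voter distribution are hypothetical and chosen only so the arithmetic is easy to follow.

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits; zero-probability states contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)

def fraction_captured(S1, Sm, S_data):
    """Fraction of the multi-information I_1 = S1 - S_data captured at order m."""
    return (S1 - Sm) / (S1 - S_data)

# Hypothetical two-voter distribution over (++, --, +-, -+), not Court data.
data = [0.4, 0.4, 0.1, 0.1]
S_data = entropy(data)
# Each voter is +1 half the time, so the independent model is uniform.
S1 = entropy([0.25] * 4)
# With only two voters, the pairwise model reproduces the data exactly.
S2 = S_data
frac = fraction_captured(S1, S2, S_data)
```

With two voters the pairwise model is exact, so `frac` is 1; for nine Justices the pairwise entropy $S_2$ lies between $S_1$ and $S_{\text{data}}$ and the ratio falls below 1.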
  13. Implications of Ising model fit
     The fit by the Ising model shows that higher order behaviors like ideological blocs and unanimity can emerge from lower order behaviors at the level of pairwise interactions between individuals. Including higher order terms will result in only a marginal improvement in the fit.

     This result is surprising because it suggests that higher level coordination is not the dominant explanation of voting behavior. Previously, scholars have pointed to the high level of consensus in the Court as evidence for a "norm of consensus," which seems analogous to an effective ninth order term for behavior [6].
  14. Found coupling network
     Figure: Graphs of $C_{ij} = \langle\sigma_i\sigma_j\rangle - \langle\sigma_i\rangle\langle\sigma_j\rangle$ and $J_{ij}$. Justices with a liberal voting record are colored blue whereas those with a conservative record are colored red. Positive edges are red and negative edges blue. Widths are proportional to magnitude. All $C_{ij}$ are positive whereas some $J_{ij}$ are negative. Justices are initialed: John Stevens (JS), Ruth Ginsburg (RG), David Souter (DS), Stephen Breyer (SB), Sandra O'Connor (SO), Anthony Kennedy (AK), William Rehnquist (WR), Antonin Scalia (AS), Clarence Thomas (CT).
  15. Understanding couplings
     As a simple check, we see that the average $J_{ij}$ within ideological blocs (blue to blue or red to red) is positive while the average between blocs (blue to red) is negative (previous slide). The corresponding averages of $C_{ij}$ also show this relative change although all $C_{ij}$ are positive, obscuring the antagonistic tendency.

     To better understand the distribution of $J_{ij}$, we consider the effective field on $\sigma_i$ from its neighbors,

     $$h_i^{\text{eff}} = \frac{1}{2} \sum_{j=1}^{n} J_{ij} \sigma_j$$

     Note that it depends on the state of the neighbors $\sigma_j$. Since this distribution over all $\sigma$ is symmetric around 0, we only show the positive half (figure).

     We fix $\sigma_i = 1$ and compare the shifts in the distributions of $h_j^{\text{eff}}$ of its neighbors $\sigma_j$, which we measure by taking the mean over the standard deviation, $\mu/\Sigma_{ji}$. In the absence of such a perturbation, $\mu/\Sigma_{ji} = 0$.

     Figure: Distributions $P[h_i^{\text{eff}}(\sigma)]$. Red histogram is the distribution of fields from only conservative Justices. Ordered from most liberal to most conservative record from left to right, top to bottom. The more conservatively a Justice votes, the more the mean field due to conservatives marches to the right.
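The shift $\mu/\Sigma_{ji}$ can be sketched on a toy model by conditioning on $\sigma_i = 1$. This is a sketch under stated assumptions: both the mean and the standard deviation are taken under the conditioned distribution, the effective field keeps the factor 1/2 from the definition above, and the function names and three-voter couplings are hypothetical.

```python
from itertools import product
from math import exp, sqrt

def ising_distribution(J):
    """P(state) for H = -(1/2) sum_{i,j} J_ij s_i s_j with all h_i = 0."""
    n = len(J)
    states = list(product((-1, 1), repeat=n))

    def H(s):
        return -0.5 * sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(n))

    w = [exp(-H(s)) for s in states]
    Z = sum(w)
    return {s: x / Z for s, x in zip(states, w)}

def h_eff(J, s, j):
    """Effective field on voter j: (1/2) sum_k J_jk s_k."""
    return 0.5 * sum(J[j][k] * s[k] for k in range(len(s)))

def shift(J, P, i, j):
    """Mean over standard deviation of h_eff on j, conditioned on s_i = +1."""
    cond = {s: p for s, p in P.items() if s[i] == 1}
    norm = sum(cond.values())
    mu = sum(p * h_eff(J, s, j) for s, p in cond.items()) / norm
    var = sum(p * (h_eff(J, s, j) - mu) ** 2 for s, p in cond.items()) / norm
    return mu / sqrt(var)

# Hypothetical couplings: voters 0 and 1 are allies, voter 2 opposes both.
J = [[0.0, 1.0, -0.5],
     [1.0, 0.0, -0.5],
     [-0.5, -0.5, 0.0]]
P = ising_distribution(J)
ally_shift = shift(J, P, 0, 1)
opponent_shift = shift(J, P, 0, 2)
```

In this toy example the ally's field shifts in the positive direction and the opponent's in the negative direction, while the unconditioned mean field is zero by symmetry.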
  16. Shifts in $h_i^{\text{eff}}$
     Figure: Average shifts in distributions of $h_i^{\text{eff}}$ over ideological blocs (Liberals, Conservatives) when holding one member of a bloc, $i$, at 1 at a time. Within-bloc shifts: $\mu/\Sigma_{ji} = 4.7$ and $\mu/\Sigma_{ji} = 4.3$. Cross-bloc shifts: $\mu/\Sigma_{ji} = 0.8$ in each direction, for example the average shift in liberals ($j$) when holding conservatives fixed ($i$).

     As expected, ideological neighbors are much more affected by fixing $\sigma_i = 1$, by a factor of 5-6. Overall, the Court always shifts in the same direction as the perturbation.

     O'Connor and Kennedy shift conservatives (liberals) to $\mu/\Sigma_{ji} = 2.6$ (1.4) and $\mu/\Sigma_{ji} = 3.1$ (1.1), reaffirming their moderate credentials. Stevens, however, has weaker connections to both groups with $\mu/\Sigma_{ji} = 0.36$ (2.72). Thus, we find that higher order behaviors such as ideological blocs and general unanimity are reflected in the couplings.
  17. Caveats with couplings
     We must be careful not to interpret the $J_{ij}$ literally as corresponding to behavioral interaction on the Court. What we cannot do, and what is indeed impossible with this data set, is explain the underlying mechanism for correlations. We may find two Justices that vote together too often for chance, but it could be the case either that they collaborate to a large extent or that their perspectives have been shaped by a similar background. The latter involves a hidden third actor, but it is indistinguishable from the former with only the voting record. In many ways, possible confounding factors that contribute to $J_{ij}$ reflect fundamental limitations of the data.

     Our guiding principle is that we refrain from assuming anything beyond what is already given from the data; other models do not have the same claim to minimal assumptions.
  18. Probing influence
     Now that we have a model of voting behavior, we can use the model to probe the behavior of the system under perturbations.

     The quantity of interest here is the majority outcome of the court

     $$\gamma = \frac{\sum_{i=1}^{n} \sigma_i}{\left|\sum_{i=1}^{n} \sigma_i\right|}$$

     because this is the decision rendered.

     How sensitive is the average decision $\langle\gamma\rangle$ to small changes in the average behavior of a Justice $\langle\sigma_i\rangle$?
  19. Probing influence
     Formally, this is the susceptibility of $\langle\gamma\rangle$

     $$\psi_i = \frac{1}{\chi_i} \frac{\partial\langle\gamma\rangle}{\partial h_i} = \frac{1}{\chi_i} \left( \langle\gamma\sigma_i\rangle - \langle\gamma\rangle\langle\sigma_i\rangle \right)$$

     which we have normalized by

     $$\chi_i = \frac{\partial\langle\sigma_i\rangle}{\partial h_i}$$

     to compare the Justices equally with respect to changes in their averages. The values we find are

     Justice                           SO      AK      WR      DS      SB      AS      RG      CT      JS      mean
     $\psi_i$                          0.834   0.809   0.719   0.650   0.644   0.623   0.616   0.608   0.421   0.658
     $\psi_i - \langle\psi\rangle$     0.176   0.151   0.061   -0.008  -0.014  -0.035  -0.042  -0.050  -0.237
     95% confidence interval           0.001   0.002   0.002   0.002   0.002   0.002   0.002   0.002   0.002
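A minimal sketch of how $\psi_i$ could be evaluated by exact enumeration is given below. It uses the identity $\chi_i = \langle\sigma_i^2\rangle - \langle\sigma_i\rangle^2 = 1 - \langle\sigma_i\rangle^2$ for $\pm 1$ variables; the function names and the three independent, unbiased voters are hypothetical stand-ins rather than the fitted Court model.

```python
from itertools import product
from math import exp

def ising_distribution(h, J):
    """Return {state: P(state)} for the Ising model by full enumeration."""
    n = len(h)
    states = list(product((-1, 1), repeat=n))

    def H(s):
        field = sum(h[i] * s[i] for i in range(n))
        pair = sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(n))
        return -field - 0.5 * pair

    w = [exp(-H(s)) for s in states]
    Z = sum(w)
    return {s: x / Z for s, x in zip(states, w)}

def majority(s):
    """gamma = sum(s) / |sum(s)|; needs an odd number of voters (no ties)."""
    total = sum(s)
    return total // abs(total)

def psi(P, i):
    """Normalized susceptibility (<gamma s_i> - <gamma><s_i>) / chi_i."""
    g = sum(p * majority(s) for s, p in P.items())
    m = sum(p * s[i] for s, p in P.items())
    gs = sum(p * majority(s) * s[i] for s, p in P.items())
    chi = 1.0 - m * m  # d<s_i>/dh_i = <s_i^2> - <s_i>^2, and s_i^2 = 1
    return (gs - g * m) / chi

# Hypothetical check: three independent, unbiased voters (h = 0, J = 0).
P = ising_distribution([0.0] * 3, [[0.0] * 3 for _ in range(3)])
values = [psi(P, i) for i in range(3)]
```

For three independent fair voters each one sides with the majority in 6 of the 8 states, so every entry of `values` is 0.5; couplings change these numbers and break the tie between voters.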
  20. Are ideological medians influential?
     The typical wisdom in the political science literature is that the ideological medians are the most influential for Court decisions. Basically, the argument is that voters who sit in the middle of a unidimensional, symmetric space will be predictive of the majority [10]. The relevant space is liberal vs. conservative, as we have confirmed. The Justices to whom the outcome is most sensitive here are the ideological medians SO and AK, in agreement with the claim.

     However, real systems may be complicated by interactions that constrain how such a voter may cast her vote or how a majority forms initially and persists. In general, we do not find that the outcome of the Court is most sensitive to the ideological medians.

     Importantly, our results are derived under minimal assumptions. We do not assume ideological behavior, which has to be imposed by the observer, and we account for interactions.
  21. Isolating interactions
     Given these interactions, a Justice could affect the outcome in two ways:
       1. Change own vote
       2. Impact on colleagues' votes through interactions

     What is the role of interactions in the $\psi_i$?

     We isolate the role of interactions in the susceptibility $\psi_i$. We simulate how $\sigma_i$ increases pressure on colleagues through couplings by
       1. increasing the average coupling such that neighbors' effective fields break symmetry around 0 to $\langle h_j^{\text{eff}}\rangle = J_{ij}\epsilon$. However, this will also incur a shift to $\langle\sigma_i\rangle \neq 0$, so
       2. we add a compensating field $h'_i$ to $h_i^{\text{eff}}$ to fix $\langle\sigma_i\rangle = 0$.

     We denote the resulting change from pushing on $\sigma_i$'s neighbors $\delta\gamma_i$.
  22. Isolating interactions

     Justice                                       AK      SO      DS      SB      RG      WR      AS      CT      JS      mean
     $\delta\gamma_i$                              0.348   0.340   0.296   0.276   0.245   0.231   0.195   0.138   0.130   0.244
     $\delta\gamma_i - \langle\delta\gamma\rangle$ 0.104   0.095   0.051   0.032   0.001   -0.014  -0.049  -0.106  -0.114
     95% confidence interval                       0.001   0.001   0.001   0.001   0.001   0.002   0.003   0.003   0.003

     Comparing with $\psi_i$…

     AK and SO switch order and are relatively closer. Interactions may differentiate between Justices for whom interactions are important. Justices highest by $\psi_i$ are not also highest by $\delta\gamma_i$ across all natural courts, although they are here.

     WR falls from 3rd to 6th place. WR is the Chief Justice, who is responsible for enforcing procedural rules and has the prerogative of assigning opinions. Interestingly, WR is consistently low by $\delta\gamma_i$ but rises in rank by $\psi_i$ only after being appointed Chief Justice.
  23. Isolating interactions

     Justice                                       AK      SO      DS      SB      RG      WR      AS      CT      JS      mean
     $\delta\gamma_i$                              0.348   0.340   0.296   0.276   0.245   0.231   0.195   0.138   0.130   0.244
     $\delta\gamma_i - \langle\delta\gamma\rangle$ 0.104   0.095   0.051   0.032   0.001   -0.014  -0.049  -0.106  -0.114
     95% confidence interval                       0.001   0.001   0.001   0.001   0.001   0.002   0.003   0.003   0.003

     Comparing with $\psi_i$…

     CT and JS are relatively much closer. CT and JS are the most extreme voters on the conservative and liberal ends of the spectrum. Fittingly, the outcome is least sensitive to their couplings. Moreover, CT and AS are similarly biased ideologically, but AS seems to be more strongly embedded in the interaction network.

     All $\delta\gamma_i > 0$, reflecting the general tendency to consensus.
  24. Conclusion
     We show that SCOTUS voting behavior can be explained as behavior that emerges from pairwise interaction even though higher order behaviors are manifest.

     We show how one can exploit the model of voting behavior by considering the susceptibility of $\langle\gamma\rangle$ to shifts in average voting behavior. We also isolate the shifts in $\langle\gamma\rangle$ specific to interactions and distinguish between Justices similar by $\psi_i$ along the second dimension of $\delta\gamma_i$.

     However suggestive our results, the correspondence of parameters to real behavior remains unclear. We hope to soon start a collaboration with political scientists to investigate whether an interpretable correspondence can be established.
  25. Works cited
       1. W. Bialek, A. Cavagna, I. Giardina, T. Mora, and E. Silvestri, PNAS 109, 4786 (2012).
       2. E. Schneidman, M. Berry, and R. Segev, Nature 440, 1007 (2006).
       3. I. Couzin, J. Krause, et al., Nature 433, 7025 (2005).
       4. H. J. Spaeth, L. Epstein, et al., Supreme Court Database (2011).
       5. A. D. Martin and K. M. Quinn, Pol. Anal. 10, 134 (2002).
       6. L. Epstein, J. A. Segal, et al., Am. J. of Pol. Sci. 83, 557 (2001).
       7. F. Maltzmann, J. F. Spriggs II, et al., Crafting law on the Supreme Court (2000).
       8. E. T. Jaynes, Phys. Rev. 106, 620 (1957).
       9. D. Black, J. of Pol. Econ. 56, 23 (1948).
  26. Further bibliography
  27. Acknowledgements
     Funding from NSF grant CCF-0939370
     Dept. of Physics, Princeton University
     Thanks to Sigma Xi for hosting this showcase.
