- 1. Complex Systems Models in the Social Sciences (Lecture 7). Daniel Martin Katz, Illinois Institute of Technology, Chicago-Kent College of Law. @computational, danielmartinkatz.com, computationallegalstudies.com
- 2. consider the applied case of judicial prediction
- 3. Every year, law reviews, magazine and newspaper articles, television and radio time, conference panels, blog posts, and tweets are devoted to questions such as: How will the Court rule in particular cases?
- 4. Experts, Crowds, Algorithms
- 5. There are 3 Known Ways to Predict Something
- 6. Experts, Crowds, Algorithms
- 7. We could apply this to a wide range of problems
- 8. For today we will apply these approaches to the decisions of the Supreme Court of the United States
- 9. this is an example of what is possible with other data
- 10. Experts
- 11. The Supreme Court Forecasting Project: Legal and Political Science Approaches to Predicting Supreme Court Decision Making. Theodore W. Ruger, Pauline T. Kim, Andrew D. Martin, Kevin M. Quinn. Columbia Law Review, October 2004
- 12. experts
- 13. Case Level Prediction Justice Level Prediction 67.4% experts 58% experts From the 68 Included Cases for the 2002-2003 Supreme Court Term
- 14. these experts probably performed badly because they overfit
- 15. they fit to the noise and not the signal
- 16. we need to evaluate experts and somehow benchmark their expertise
- 17. from a pure forecasting standpoint
- 18. the best known SCOTUS predictor is
- 19. Crowds
- 20. crowds
- 21. Algorithms
- 22. algorithms [Figure: chart covering the justices from Black through Kagan, 1953 to 2013, with series labeled 9-0; 8-1, 7-2, 6-3; and Reverse, each on an axis from 0.00 to 1.00]
- 23. we have developed an algorithm that we call {Marshall}+, which uses extremely randomized trees (ERT)
- 24. Benchmarking since 1953 + Using only data available prior to the decision
  - Justice and Court Background Information (TOTAL 0.04403)
    - Features: Justice [S], Justice Gender [FE], Is Chief [FE], Party President [FE], Natural Court [S], Segal Cover Score [SC], Year of Birth [FE]
    - Values, in slide order: 0.00781, 0.00205, 0.00283, 0.00604, 0.00764, 0.00971, 0.00793
  - Case Information (TOTAL 0.22814)
    - Features: Admin Action [S], Case Origin [S], Case Origin Circuit [S], Case Source [S], Case Source Circuit [S], Law Type [S], Lower Court Disposition Direction [S], Lower Court Disposition [S], Lower Court Disagreement [S], Issue [S], Issue Area [S], Jurisdiction Manner [S], Month Argument [FE], Month Decision [FE], Petitioner [S], Petitioner Binned [FE], Respondent [S], Respondent Binned [FE], Cert Reason [S]
    - Values, in slide order: 0.00978, 0.00971, 0.00845, 0.00953, 0.01015, 0.01370, 0.01190, 0.01125, 0.00706, 0.01541, 0.01469, 0.00595, 0.02014, 0.01349, 0.01406, 0.01199, 0.01490, 0.01179, 0.01408
  - Overall Historic Supreme Court Trends (TOTAL 0.12663)
    - Features: Mean Court Direction [FE], Mean Court Direction 10 [FE], Mean Court Direction Issue [FE], Mean Court Direction Issue 10 [FE], Mean Court Direction Petitioner [FE], Mean Court Direction Petitioner 10 [FE], Mean Court Direction Respondent [FE], Mean Court Direction Respondent 10 [FE], Mean Court Direction Circuit Origin [FE], Mean Court Direction Circuit Origin 10 [FE], Mean Court Direction Circuit Source [FE], Mean Court Direction Circuit Source 10 [FE]
    - Values, in slide order: 0.00988, 0.01997, 0.01546, 0.00938, 0.00863, 0.00904, 0.00875, 0.00925, 0.00791, 0.00864, 0.00951, 0.01017
  - Lower Court Trends (TOTAL 0.07946)
    - Features: Mean Lower Court Direction Circuit Source [FE], Mean Lower Court Direction Circuit Source 10 [FE], Mean Lower Court Direction Issue [FE], Mean Lower Court Direction Issue 10 [FE], Mean Lower Court Direction Petitioner [FE], Mean Lower Court Direction Petitioner 10 [FE], Mean Lower Court Direction Respondent [FE], Mean Lower Court Direction Respondent 10 [FE]
    - Values, in slide order: 0.00962, 0.01017, 0.01334, 0.00933, 0.00949, 0.00874, 0.00973, 0.00900
  - Current Supreme Court Trends (TOTAL 0.14456)
    - Features: Mean Agreement Level of Current Court [FE], Std. Dev. of Agreement Level of Current Court [FE], Mean Current Court Direction Circuit Origin [FE], Std. Dev. Current Court Direction Circuit Origin [FE], Mean Current Court Direction Circuit Source [FE], Std. Dev. Current Court Direction Circuit Source [FE], Mean Current Court Direction Issue [FE], Z-Score Current Court Direction Issue [FE], Std. Dev. Current Court Direction Issue [FE], Mean Current Court Direction [FE], Std. Dev. Current Court Direction [FE], Mean Current Court Direction Petitioner [FE], Std. Dev. Current Court Direction Petitioner [FE], Mean Current Court Direction Respondent [FE], Std. Dev. Current Court Direction Respondent [FE]
    - Values, in slide order: 0.00955, 0.00936, 0.00789, 0.00850, 0.00945, 0.01021, 0.01469, 0.00832, 0.01266, 0.00918, 0.00942, 0.00863, 0.00894, 0.00882, 0.00888
  - Individual Supreme Court Justice Trends (TOTAL 0.14323)
    - Features: Mean Justice Direction [FE], Mean Justice Direction 10 [FE], Mean Justice Direction Z Score [FE], Mean Justice Direction Petitioner [FE], Mean Justice Direction Petitioner 10 [FE], Mean Justice Direction Respondent [FE], Mean Justice Direction Respondent 10 [FE], Mean Justice Direction for Circuit Origin [FE], Mean Justice Direction for Circuit Origin 10 [FE], Mean Justice Direction for Circuit Source [FE], Mean Justice Direction for Circuit Source 10 [FE], Mean Justice Direction by Issue [FE], Mean Justice Direction by Issue 10 [FE], Mean Justice Direction by Issue Z Score [FE]
    - Values, in slide order: 0.01248, 0.01530, 0.00826, 0.00732, 0.01027, 0.00724, 0.01030, 0.00792, 0.00945, 0.00891, 0.00970, 0.01881, 0.00950, 0.00771
  - Differences in Trends (TOTAL 0.23391)
    - Features: Difference Justice Court Direction [FE], Abs. Difference Justice Court Direction [FE], Difference Justice Court Direction Issue [FE], Abs. Difference Justice Court Direction Issue [FE], Z Score Difference Justice Court Direction Issue [FE], Difference Justice Court Direction Petitioner [FE], Abs. Difference Justice Court Direction Petitioner [FE], Difference Justice Court Direction Respondent [FE], Abs. Difference Justice Court Direction Respondent [FE], Z Score Justice Court Direction Difference [FE], Justice Lower Court Direction Difference [FE], Justice Lower Court Direction Abs. Difference [FE], Justice Lower Court Direction Z Score [FE], Z Score Justice Lower Court Direction Difference [FE], Agreement of Justice with Majority [FE], Agreement of Justice with Majority 10 [FE], Difference Court and Lower Ct Direction [FE], Abs. Difference Court and Lower Ct Direction [FE], Z-Score Difference Court and Lower Ct Direction [FE], Z-Score Abs. Difference Court and Lower Ct Direction [FE]
    - Values, in slide order: 0.01210, 0.00929, 0.01167, 0.00968, 0.01055, 0.00705, 0.00708, 0.00690, 0.00699, 0.01280, 0.01922, 0.02494, 0.01126, 0.00992, 0.00866, 0.01483, 0.01522, 0.01199, 0.01217, 0.01150
- 25. Total Cases Predicted: 7,700. Total Votes Predicted: 68,964
- 26. Justice Prediction: 70.9% accuracy. Case Prediction: 69.6% accuracy. From 1953 - 2014
- 27. Relies upon Random Forest, but first let's look at CART
- 28. Classification and Regression Trees (CART)
- 29. Given Some Data: (X1, Y1), ..., (Xn, Yn). Now We Have a New Set of X's and We Want to Predict the Y
- 30. CART (Classification & Regression Trees): Form a Binary Tree that Minimizes the Error in each leaf of the tree
- 31. Observe the Correspondence Between the Data and Trees
- 32. [Figure: scatter plot of training points labeled 0 or 1, with axes Xi1 and Xi2. Adapted from Example By Mathematical Monk]
- 33. We want to build an approach which can lead to the proper classification (labeling) of new data points ( ) that are dropped into this space [Figure: same scatter plot]
- 34. [Figure: same scatter plot]
- 35. Let's Begin to Partition the Space [Figure: same scatter plot]
- 36. Let's Begin to Partition the Space [Figure: same scatter plot with split 1 drawn, creating zone (a); axis ticks at 1 and 2]
- 37. This Split Will Be Memorialized in the Tree [Figure: scatter plot with split 1 and zone (a)]
- 38. We Ask the Question: is Xi1 > 1 ? - with a binary (yes or no) response [Figure: scatter plot with split 1 and zone (a), beside a tree whose root asks Xi1 > 1 ? with No and Yes branches]
- 39. If No - then we are in zone (a) ... we tally the number of zeros and ones. Using Majority Rule we assign a classification to this leaf [Figure: tree root Xi1 > 1 ?; the No branch ends in the leaf for zone (a) with tally (0,5), Classify as 1]
- 40. Here we Classify as a 1 because the tally is (0,5), which is 0 zeros and 5 ones [Figure: same plot and tree]
- 41. Using a Similar Approach Let's Begin to Fill in the Rest of the Tree [Figure: same plot and tree]
- 42. [Figure: the tree gains a second question, Xi2 > 1.45 ?, with No and Yes branches (split 2)]
- 43. [Figure: splits 1 to 3 drawn on the scatter plot, creating zones (a), (b) and (c). Tree: Xi1 > 1 ? leads on No to zone (a), tally (0,5), Classify as 1, and on Yes to Xi2 > 1.45 ?, which is followed by Xi1 < 2 ? with leaf tallies (4,1) and (2,3) for zones (b) and (c), one classified as 0 and the other as 1]
- 44. [Figure: split 4 added, creating zones (d) and (e). The tree adds Xi1 > 2.2 ? with leaf tallies (1,4) and (5,0); zone (d) is classified as 0 and zone (e) as 1]
- 45. Okay, Let's Add Back the ( ), which are new items to be classified
- 46. For simplicity's sake there is one in each zone
- 47. We Will Use the Tree Because the Tree Is Our Prediction Machine
- 48. [Figure: the completed scatter plot (splits 1 to 4, zones (a) to (e)) and tree from slide 44. Adapted from Example By Mathematical Monk]
- 49. [Figure: the same plot and tree, with the labels 1 1 1 0 1 0 added]
- 50. In this simple example, we eyeballed the 2D space, partitioned it and stopped after 4 Splits
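The tree from the worked example can be written down as a handful of nested questions. The sketch below is in Python, which the slides do not use. The thresholds (1, 1.45, 2, 2.2) and the leaf tallies come from the slides, but the transcript does not preserve which Yes/No branch leads to which zone, so the branch layout, the `Leaf` type and the sample points are assumptions made for illustration.

```python
from collections import namedtuple

# A leaf stores its zone name and the tally (zeros, ones) of training points.
Leaf = namedtuple("Leaf", ["zone", "zeros", "ones"])


def majority_label(leaf):
    """Majority rule: classify as 1 if the leaf holds more ones than zeros."""
    return 1 if leaf.ones > leaf.zeros else 0


def find_leaf(x1, x2):
    """Walk the four splits and return the leaf a point falls into.

    ASSUMPTION: the mapping of tallies to branches is illustrative only.
    """
    if not x1 > 1:              # split 1: Xi1 > 1 ?  (No -> zone a)
        return Leaf("a", 0, 5)
    if x2 > 1.45:               # split 2: Xi2 > 1.45 ?
        if x1 < 2:              # split 3: Xi1 < 2 ?
            return Leaf("b", 4, 1)
        return Leaf("c", 2, 3)
    if x1 > 2.2:                # split 4: Xi1 > 2.2 ?
        return Leaf("e", 1, 4)
    return Leaf("d", 5, 0)


def predict(x1, x2):
    """Classify a new point with the tree."""
    return majority_label(find_leaf(x1, x2))


# Hypothetical new points, one per zone.
new_points = [(0.5, 1.0), (1.5, 2.0), (2.5, 2.0), (1.5, 0.5), (2.6, 0.5)]
for x1, x2 in new_points:
    leaf = find_leaf(x1, x2)
    print(f"({x1}, {x2}) -> zone ({leaf.zone}), "
          f"tally ({leaf.zeros},{leaf.ones}), class {predict(x1, x2)}")
```

Once the tree is fixed, prediction is only a lookup: no training data is consulted again.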
- 51. Most Real Problems are Not So Simple ...
- 52. (1) Real problems are n-dimensional (not 2D)
- 53. (2) For real problems, you need to select criteria (or a criterion) for deciding where to partition (split) the data
- 54. (3) For real problems you must develop a stopping condition or pursue recursive partitioning of the space
- 55. Solutions to these 3 Problems are among the core questions in algorithm selection / development
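As an illustration of problem (2), here is a minimal Python sketch of one possible splitting criterion. The slides do not name a criterion; Gini impurity is assumed here because it is a common choice for classification trees. The toy data and the function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 1 - p1^2 - p0^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def best_split(rows, labels):
    """Try every feature and every midpoint between adjacent sorted values.

    Returns (feature_index, threshold, weighted_impurity) for the question
    "feature > threshold ?" with the lowest weighted Gini impurity, or None
    if no feature has two distinct values.
    """
    best = None
    n = len(rows)
    for j in range(len(rows[0])):           # works for any number of features
        values = sorted(set(r[j] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            no_side = [y for r, y in zip(rows, labels) if r[j] <= t]
            yes_side = [y for r, y in zip(rows, labels) if r[j] > t]
            score = (len(no_side) * gini(no_side)
                     + len(yes_side) * gini(yes_side)) / n
            if best is None or score < best[2]:
                best = (j, t, score)
    return best


# Hypothetical toy data (not from the slides): columns are (Xi1, Xi2).
rows = [(0.2, 0.5), (0.6, 1.8), (0.9, 1.1), (1.4, 0.4), (1.8, 1.9), (2.5, 0.7)]
labels = [1, 1, 1, 0, 0, 0]

feature, threshold, impurity = best_split(rows, labels)
print(f"split on feature {feature} at {threshold:.2f}, "
      f"weighted Gini {impurity:.3f}")
```

The search loops over every feature, so the same code covers the n-dimensional case in (1); only the cost grows.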
- 56. From an Algorithmic Perspective - The Task is to Develop a Method to Partition the Trees
- 57. Must Do So Without Knowing the Specific Contours of the Data / Problem in Question
- 58. So How Do We Traverse Through The Data?
- 59. Optimal Partitioning of Trees is NP-Complete
- 60. “Although any given solution to an NP-complete problem can be verified quickly (in polynomial time), there is no known efficient way to locate a solution in the first place; indeed, the most notable characteristic of NP-complete problems is that no fast solution to them is known. That is, the time required to solve the problem using any currently known algorithm increases very quickly as the size of the problem grows”
- 61. key implication is that one cannot in advance determine the “optimal tree”
- 62. Breiman et al. (1984) use a Greedy Optimization Method
- 63. Greedy Optimization Method is used to calculate the MLE (maximum-likelihood estimation)
- 64. Greedy is a Heuristic: “makes the locally optimal choice at each stage with the hope of finding a global optimum. In many problems, a greedy strategy does not in general produce an optimal solution, but nonetheless a greedy heuristic may yield locally optimal solutions that approximate a global optimal solution in a reasonable time.”
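A greedy tree builder can be sketched in a few lines of Python: at each node it takes the locally best split and then recurses, never revisiting earlier choices. This is an illustration rather than the procedure in Breiman et al.; the cost function (misclassification count), the stopping rule (pure leaf or `max_depth`) and the toy data are all assumptions.

```python
from collections import Counter


def majority(labels):
    """Label assigned to a leaf by majority rule."""
    return Counter(labels).most_common(1)[0][0]


def errors(labels):
    """Number of points a majority-rule leaf would misclassify."""
    if not labels:
        return 0
    return len(labels) - Counter(labels).most_common(1)[0][1]


def grow(rows, labels, depth=0, max_depth=3):
    """Greedy recursive partitioning; returns a nested dict or a leaf label."""
    # Stopping condition: the node is pure or the depth limit is reached.
    if errors(labels) == 0 or depth == max_depth:
        return majority(labels)
    best = None
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            no_idx = [i for i, r in enumerate(rows) if r[j] <= t]
            yes_idx = [i for i, r in enumerate(rows) if r[j] > t]
            cost = (errors([labels[i] for i in no_idx])
                    + errors([labels[i] for i in yes_idx]))
            if best is None or cost < best[0]:   # locally optimal choice
                best = (cost, j, t, no_idx, yes_idx)
    if best is None:                             # identical rows: cannot split
        return majority(labels)
    _, j, t, no_idx, yes_idx = best
    return {
        "feature": j,
        "threshold": t,
        "no": grow([rows[i] for i in no_idx], [labels[i] for i in no_idx],
                   depth + 1, max_depth),
        "yes": grow([rows[i] for i in yes_idx], [labels[i] for i in yes_idx],
                    depth + 1, max_depth),
    }


def predict(tree, row):
    """Follow the questions down to a leaf label."""
    while isinstance(tree, dict):
        branch = "yes" if row[tree["feature"]] > tree["threshold"] else "no"
        tree = tree[branch]
    return tree


# Hypothetical toy data: columns are (Xi1, Xi2).
rows = [(0.5, 0.5), (0.5, 2.0), (0.8, 1.0), (1.5, 0.5),
        (2.0, 0.8), (1.5, 2.0), (2.5, 1.8)]
labels = [1, 1, 1, 0, 0, 1, 1]

tree = grow(rows, labels)
print(tree)
```

Because each split is chosen without looking ahead, the resulting tree is not guaranteed to be the smallest or most accurate one, which is the trade-off the quoted definition describes.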
- 65. CART Approach to Decision Trees
- 66. Get the Data Here: http://www.stat.cmu.edu/~cshalizi/350/hw/06/cadata.dat
- 67. Get the Data Here: http://www.stat.cmu.edu/~cshalizi/350/hw/06/cadata.dat Load the DataSet: x <- read.table("http://www.stat.cmu.edu/~cshalizi/350/hw/06/cadata.dat")
- 68. Get the Data Here: http://www.stat.cmu.edu/~cshalizi/350/hw/06/cadata.dat Load the DataSet: x <- read.table("http://www.stat.cmu.edu/~cshalizi/350/hw/06/cadata.dat", header=TRUE) Follow Example on Page 4-7 (example 2.1): http://www.stat.cmu.edu/~cshalizi/350/lectures/22/lecture-22.pdf
- 69. Replicate this On Your Own: http://www3.nd.edu/~mclark19/learn/ML.pdf
- 70. Applications of Classification Trees in Law
- 71. http://wusct.wustl.edu/media/man2.pdf
- 72. Random Forest
- 73. One well-known problem with standard classification trees is their tendency toward overfitting
- 74. This is because standard decision trees are weak learners
- 75. Random forest is an approach to aggregate weak learners into collective strong learners (think of it as statistical crowd sourcing)
- 76. Random Forest: Group of Decision Trees Outperforms and is more Robust (i.e. is less likely to overfit) than a Single Decision Tree
- 77. Random Forest: Ensemble method that leverages bagging (bootstrap aggregation), Breiman (1996), with random subspaces, Breiman (2001)
- 78. Two Layers of Randomness: bootstrap aggregation is applied to the training data; random subspaces are applied to the variables
- 79. Two Layers of Randomness: bootstrap aggregation (row) is applied to the training data; random subspaces (column) are applied to the variables
- 80. What is Bagging?
- 81. bagging = bootstrap aggregation
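Bagging can be illustrated with a short Python sketch: draw several bootstrap samples (rows sampled with replacement), fit one model per sample, and aggregate the models by majority vote. The base learner here is a hypothetical one-question "stump" on a single variable, chosen only to keep the example small; the data, the seed and the number of models are likewise assumptions.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items at random WITH replacement."""
    return [rng.choice(data) for _ in data]


def fit_stump(sample):
    """Hypothetical base learner: the single question "x > t ?".

    Tries each observed x as threshold t and both labelings, and keeps
    the combination with the fewest errors on the sample.
    """
    best = None
    for t in sorted({x for x, _ in sample}):
        for label_above in (0, 1):
            wrong = sum(
                (label_above if x > t else 1 - label_above) != y
                for x, y in sample
            )
            if best is None or wrong < best[0]:
                best = (wrong, t, label_above)
    _, t, label_above = best
    return lambda x: label_above if x > t else 1 - label_above


def bagged_predict(models, x):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical one-dimensional training data: (x, label).
data = [(0.1, 0), (0.4, 0), (0.7, 0), (1.1, 0),
        (1.3, 1), (1.6, 1), (2.0, 1), (2.4, 1)]

rng = random.Random(0)          # fixed seed so the run is repeatable
models = [fit_stump(bootstrap_sample(data, rng)) for _ in range(25)]

print(bagged_predict(models, 0.2), bagged_predict(models, 2.2))
```

Each stump sees a slightly different dataset, so individual thresholds vary; the vote smooths out that variation.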
- 82. https://www.youtube.com/watch?v=Rm6s6gmLTdg
- 83. Single Decision Tree: “if the outlook is sunny and the humidity is less than or equal to 70, then it’s probably OK to play.” http://bit.ly/1icRlmE
- 84. Single Decision Tree (http://bit.ly/1icRlmE) and Random Forest (Blackwell 2012)
- 85. STEP 1: Sample N cases at random with replacement to create a subset of the data (Blackwell 2012)
- 86. STEP 2: “At each node: m predictor variables are selected at random from all the predictor variables. The predictor variable that provides the best split, according to some objective function, is used to do a binary split on that node. At the next node, choose another m variables at random from all predictor variables and do the same.”
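Step 2 can be sketched in Python as a function that handles a single node: it samples m feature indices and searches for the best split among those features only. The quoted text leaves the objective function open ("some objective function"), so the misclassification count used here is an assumption, as are the function names and the toy data.

```python
import random


def misclassified(labels):
    """Errors made by majority rule on a group of 0/1 labels."""
    ones = sum(labels)
    return min(ones, len(labels) - ones)


def split_on_random_features(rows, labels, m, rng):
    """At one node: pick m features at random, return the best split among them.

    Returns ((feature_index, threshold, errors), sorted_candidate_features).
    The first element is None if none of the candidates can be split.
    """
    candidates = rng.sample(range(len(rows[0])), m)
    best = None
    for j in candidates:
        values = sorted(set(r[j] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            no_side = [y for r, y in zip(rows, labels) if r[j] <= t]
            yes_side = [y for r, y in zip(rows, labels) if r[j] > t]
            cost = misclassified(no_side) + misclassified(yes_side)
            if best is None or cost < best[2]:
                best = (j, t, cost)
    return best, sorted(candidates)


# Hypothetical toy data with four predictor variables.
rows = [(0.2, 5.0, 1.0, 3.3), (0.4, 4.1, 0.0, 2.9), (0.9, 5.5, 1.0, 3.1),
        (1.6, 4.8, 0.0, 3.0), (1.9, 5.2, 1.0, 2.8), (2.3, 4.4, 0.0, 3.2)]
labels = [1, 1, 1, 0, 0, 0]

rng = random.Random(42)
best, candidates = split_on_random_features(rows, labels, m=2, rng=rng)
print("candidate features:", candidates, "chosen split:", best)
```

Calling the function again for the next node draws a fresh set of m features, which is what makes the trees in the forest differ from one another even on similar data.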
- 87. http://www.stat.berkeley.edu/~breiman/RandomForests/
- 88. https://www.youtube.com/watch?v=ngaQrYqxtoM#t=18
- 89. Additional Notes For Random Forest: Trees are not pruned, as potentially overfit individual trees combine to yield well-fit ensembles
- 90. Trees (particularly with optimization) have proven to be unreasonably effective http://machinelearning202.pbworks.com/w/file/fetch/37597425/performanceCompSupervisedLearning-caruana.pdf
- 91. 10 Different Binary Classification Methods on 11 Different Datasets (w/ 5000 training cases each): Trees and Forest were surprisingly effective
- 92. http://videolectures.net/solomon_caruana_wslmw/
- 93. http://www.r-bloggers.com/a-brief-tour-of-the-trees-and-forests/
- 94. http://www.r-bloggers.com/classification-tree-models/
- 95. Experts, Crowds, Algorithms
- 96. For most problems ... ensembles of these streams outperform any single stream
- 97. Humans + Machines
- 98. Humans + Machines >
- 99. Humans + Machines > Humans or Machines
- 100. Ensembles come in various forms
- 101. Here is a well known example
- 102. Poll Aggregation is one form of ensemble where the learning question is to determine how much weight (if any) to assign to each individual poll
- 103. poll weighting
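Mechanically, poll weighting is a weighted average. The Python sketch below uses hypothetical polls and weights; a weight of 0 corresponds to the "if any" case, where a poll is ignored entirely. How the weights should be chosen is the learning question and is not addressed here.

```python
def weighted_average(estimates, weights):
    """Combine estimates using non-negative weights."""
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return sum(e * w for e, w in zip(estimates, weights)) / total


# Hypothetical polls: (name, estimated vote share in percent, weight).
polls = [("Poll A", 52.0, 3.0), ("Poll B", 48.0, 1.0), ("Poll C", 50.0, 0.0)]

estimate = weighted_average([share for _, share, _ in polls],
                            [weight for _, _, weight in polls])
print(f"aggregated estimate: {estimate:.1f}%")
```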
- 104. A Visual Depiction of How to build an ensemble method in our judicial prediction example
- 105. ensemble method: expert + crowd + algorithm. The learning problem is to discover when to use a given stream of intelligence
- 106. ensemble method: expert + crowd + algorithm. Via back testing we can learn the weights to apply for particular problems. The learning problem is to discover when to use a given stream of intelligence
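One simple way to learn weights by back testing is sketched below in Python. It is not the method used for {Marshall}+; the weighting rule (accuracy above chance, normalised to sum to 1), the stream names and the back-test records are all hypothetical and chosen only to show the mechanics.

```python
def backtest_weights(history):
    """Derive one weight per stream from past performance.

    history: list of (predictions, actual) pairs, where predictions maps
    a stream name to its 0/1 prediction for that past case.
    ASSUMED rule: weight = max(accuracy - 0.5, 0), normalised to sum to 1.
    """
    streams = list(history[0][0])
    raw = {}
    for s in streams:
        hits = sum(preds[s] == actual for preds, actual in history)
        raw[s] = max(hits / len(history) - 0.5, 0.0)
    total = sum(raw.values())
    if total == 0:                       # no stream beats chance: equal weights
        return {s: 1 / len(streams) for s in streams}
    return {s: w / total for s, w in raw.items()}


def ensemble_predict(preds, weights):
    """Weighted vote over binary predictions (1 = reverse, 0 = affirm)."""
    score = sum(weights[s] * preds[s] for s in weights)
    return 1 if score > 0.5 else 0


# Hypothetical back-test record for eight past cases.
actual = [1, 1, 0, 1, 0, 1, 1, 0]
expert = [1, 0, 0, 1, 1, 1, 0, 0]
crowd = [1, 1, 0, 1, 0, 0, 1, 1]
algorithm = [1, 1, 1, 1, 0, 1, 1, 1]

history = [
    ({"expert": e, "crowd": c, "algorithm": a}, y)
    for e, c, a, y in zip(expert, crowd, algorithm, actual)
]

weights = backtest_weights(history)
new_case = {"expert": 1, "crowd": 0, "algorithm": 1}
print(weights, ensemble_predict(new_case, weights))
```

Running the same back test separately for each type of case would yield a different weight vector per case type.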
- 107. {Marshall}+ algorithm
- 108. expert crowd algorithm
- 109. {Marshall}+ improvement will likely come from determining the optimal weighting of experts, crowds and algorithms for various types of cases
- 110. Thus ERISA cases might look like this
- 111. Patent cases perhaps might look like this
- 112. while Search/Seizure cases could look like this
- 113. this is one slice of our research effort ...
- 114. and we are working on a series of improvements to the model
- 115. including structuring previously unstructured datasets
- 116. and using natural language processing tools (where appropriate)