Quantitative Legal Prediction - Presentation @ Santa Clara Law - By Daniel Martin Katz

Published in: Education

  1. 1. Quantitative Legal Prediction (-Or- How I Learned to Stop Worrying and Start Preparing for the Future of the Legal Services Industry). Professor Daniel Martin Katz, Illinois Tech - Chicago Kent College of Law. @computational, danielmartinkatz.com, computationallegalstudies.com
  2. 2. My Lab is Focused Upon Supporting the R&D for the Legal Services Industry
  3. 3. There are many Research & Development Questions in the Legal Services Industry
  4. 4. My Research Group is Actively Engaged in Research that is Relevant to Developing Future Legal/Govt Information Products
  5. 5. Here Are a Few Things from Our Lab: 3D HD Visualization of Supreme Court Citation Network; Campaign Contributions and Legislative Ecosystems; Six Degrees of Marbury v. Madison; Electronic World Treaty Index; The United States Code
  6. 6. Here Are a Few Things from Our Lab: American Federal Judiciary; American Law Professoriate; Building New Algorithms; Large Scale Judicial Studies
  7. 7. Legal Language Explorer: Indexing 450,000+ Cases
  8. 8. ComputationalLegalStudies.com BLOG
  9. 9. before providing some concrete examples - some broad thoughts ...
  10. 10. three faces of innovation in legal
  11. 11. (1) lawyers for innovators / entrepreneurs
  12. 12. (1) lawyers for innovators / entrepreneurs: what most lawyers and law schools think of as “Law+Entrepreneurship”
  13. 13. (2) lawyers as innovators - substance
  14. 14. (2) lawyers as innovators - substance: poison pill - “the most important innovation in corporate law since Samuel Calvin Tate Dodd invented the trust for John D. Rockefeller and Standard Oil in 1879”
  15. 15. (2) lawyers as innovators - substance: emerging areas - 3D Printing, Driverless Cars, Augmented Reality, Data Breach, Big Data+Privacy, Drones, Internet of Things, CyberSecurity, etc.
  16. 16. (3) lawyers as innovators - business/process
  17. 17. (3) lawyers as innovators - business/process: innovation directed toward transforming the practice of law
  18. 18. there are different ways that organizations are innovating on the third face
  19. 19. {Law + Tech + Design + Delivery}TM: Substantive Legal Expertise, Analytics, Platform, AI, Computing, Process Mapping, User Experience, Design Thinking, Business Models, Regulation, Marketing
  20. 20. some traditional law firms have been very aggressive
  21. 21. but most of the innovation is Lex.Startup
  22. 22. Lex.Startup is beginning to take hold
  23. 23. Lex.Startup: 15 (2009)
  24. 24. Lex.Startup: 15 (2009)
  25. 25. Lex.Startup: 15 (2009), 425+ (2014) Law or Legal Related Companies, as highlighted by Josh Kubicki @ ReInventLaw London 2013
  26. 26. So what are these folks doing?
  27. 27. R + D Function in the Legal Industry
  28. 28. We Could Imagine a World Where Law Firms Did the R+D for the Industry
  29. 29. But That Has (Mostly) Proven Elusive
  30. 30. Lex.Startup is undertaking that function
  31. 31. Here are the specific approaches that are being undertaken
  32. 32. Some organizations are doing more than one
  33. 33. labor arbitrage
  34. 34. labor arbitrage, process / tech arbitrage
  35. 35. labor arbitrage, process / tech arbitrage, regulatory arbitrage
  36. 36. labor arbitrage, process / tech arbitrage, regulatory arbitrage, design as the ultimate bespoke
  37. 37. labor arbitrage, process / tech arbitrage, regulatory arbitrage, design as the ultimate bespoke, predictive analytics
  38. 38. could do an individual talk on any of these topics...
  39. 39. labor arbitrage, process / tech arbitrage, regulatory arbitrage, design as the ultimate bespoke, predictive analytics
  40. 40. Quantitative Legal Prediction (-Or- How I Learned to Stop Worrying and Start Preparing for the Future of the Legal Services Industry). Daniel Martin Katz, Michigan State University - College of Law
  41. 41. Today I Would Like to Sketch (In Part) Where I Believe the Legal Industry is Heading
  42. 42. Simply Put
  43. 43. Data Driven Law Practice
  44. 44. Before Talking About the Law Business
  45. 45. Some Broad Trends
  46. 46. This is the Era of “Big Data”: Decreasing Data Storage Costs and Increasing Computing Power are Fundamentally Altering the Scope of Scientific Inquiry and Technical Possibility
  47. 47. Highlighting the Data Deluge 2008 2009 2010
  48. 48. 2011 2011
  49. 49. What is Driving the Big Data Revolution?
  50. 50. Moore’s law !
  51. 51. And How Big is ‘Big’?
  52. 52. How Much Data Is a Petabyte?
  53. 53. How Much Data Is a Petabyte?
  54. 54. Kryder’s law !
  55. 55. How Much Data Is a Petabyte?
  56. 56. How Much Data Is a Petabyte?
  57. 57. How Much Data Is a Petabyte?
  58. 58. How Much Data Is a Petabyte?
  59. 59. Erik Brynjolfsson is the Schussel Family Professor at the MIT Sloan School of Management, Director of the MIT Center for Digital Business, Chair of the MIT Sloan Management Review, and the Editor of the Information Systems Network. Andrew McAfee, a principal research scientist at MIT’s Center for Digital Business, studies the ways that information technology (IT) affects business.
  60. 60. [Figure: successive doublings] 128 256 512 1024 2048 4096 8192 16384 32768 65536 131k 262k 524k 1M 2M 4M 8M 16M 33M 67M 134M 268M 536M 1B 2B 4B 8B 17B 34B 68B 137B 274B 549B 1T 2T 4T 8T 17T 35T 70T 140T 281T 562T 1Q 2Q 4Q 9Q 18Q 36Q 72Q 144Q 288Q 576Q 1QT 2QT 4QT 9QT
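The labels on slide 60 read as a count that doubles at every step. A minimal sketch that regenerates them, assuming 64 doublings starting from 1 and labels truncated to a short suffix (the step count, the truncation rule, and the helper name `label` are assumptions made for illustration; the slide only shows the labels):

```python
# Suffixes as they appear on the slide: k, M, B, T, Q (quadrillion), QT (quintillion).
SUFFIXES = [
    (10**18, "QT"), (10**15, "Q"), (10**12, "T"),
    (10**9, "B"), (10**6, "M"), (10**3, "k"),
]

def label(n):
    """Format n the way the slide appears to: full digits when small, truncated suffix when large."""
    if n < 100_000:
        return str(n)
    for scale, suffix in SUFFIXES:
        if n >= scale:
            return f"{n // scale}{suffix}"

# Assumed: one value per step, doubling from 1 (2**0) up to 2**63.
doublings = [2**k for k in range(64)]
labels = [label(n) for n in doublings]

# The slide text starts at 128, which is 2**7.
print(" ".join(labels[7:]))
```

The point of the figure survives in the code: the last few steps add more than all earlier steps combined, which is why steady doubling in storage and compute feels sudden.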
  61. 61. Okay But ...
  62. 62. Data is only half the story...
  63. 63. Computation and Artificial Intelligence
  64. 64. The Artificial Intelligence Revolution is On
  65. 65. The Artificial Intelligence Revolution is On
  66. 66. The Artificial Intelligence Revolution is On
  67. 67. The Artificial Intelligence Revolution is On .... But it is not what we thought
  68. 68. ‘Soft’ Artificial Intelligence
  69. 69. “Practically every financial transaction, from someone buying a cup of coffee to someone trading a trillion dollars of credit default derivatives, is done in software .... Health care and education, in my view, are next up for fundamental software-based transformation. My venture capital firm is backing aggressive start-ups in both of these gigantic and critical industries. We believe both of these industries, which historically have been highly resistant to entrepreneurial change, are primed for tipping by great new software-centric entrepreneurs ... Companies in every industry need to assume that a software revolution is coming.”
  70. 70. The First Response I Typically Encounter
  71. 71. You Cannot Replace The Things I Do ...
  72. 72. With a Computer
  73. 73. It is Useful
  74. 74. To Consider Industries
  75. 75. Where Human Reasoning
  76. 76. Was Paramount
  77. 77. Finance was an Industry
  78. 78. Where Qualitative Human Reasoning
  79. 79. Reigned Supreme
  80. 80. But Not Anymore ...
  81. 81. The Rise of the Quants...
  82. 82. 50%+ of Trades on NYSE
  83. 83. http://www.cbsnews.com/video/watch/?id=6945451n
  84. 84. How About This One ...
  85. 85. 2004 DARPA Grand Challenge
  86. 86. Goal: Build a Driverless Car that Could Travel 150 miles
  87. 87. Best Vehicle Traveled Fewer than Eight Miles (None Finished)
  88. 88. Fast Forward to 2012 ...
  89. 89. Now The Business of Law
  90. 90. Remember Industry It is Not A Binary Proposition
  91. 91. Computers CANNOT Do Everything
  92. 92. But
  93. 93. Displacing 20%-30% of the Work Load is Pretty Damn Significant
  94. 94. 100 Lawyers: 70 Lawyers in ‘Safe’ Employment, 30 Lawyers in Employment Susceptible to Automation. 70 Lawyers, 10 Law + Tech, 5 Tech + Law (85 Lawyers/Legal Service Jobs). 30% Reduction in Traditional Law Jobs; 15% Reduction in Law Related Employment; Arbitrage Opportunities For Helping Move Across the Spectrum
  95. 95. 100 Lawyers: 70 Lawyers in ‘Safe’ Employment, 30 Lawyers in Employment Susceptible to Automation. 70 Lawyers, 10 Law + Tech, 5 Tech + Law (85 Lawyers/Legal Service Jobs). 30% Reduction in Traditional Law Jobs; 15% Reduction in Law Related Employment; Arbitrage Opportunities For Helping Move Across the Spectrum
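The arithmetic behind slides 94 and 95 can be checked in a few lines. This sketch only restates the slide's own numbers; the variable names are made up for illustration:

```python
# Numbers taken directly from the slide.
lawyers_before = 100
susceptible_to_automation = 30
safe_lawyers = lawyers_before - susceptible_to_automation   # 70 in 'safe' employment

# New hybrid roles the slide adds back.
law_plus_tech = 10
tech_plus_law = 5
jobs_after = safe_lawyers + law_plus_tech + tech_plus_law   # 85 lawyers/legal service jobs

traditional_reduction = susceptible_to_automation / lawyers_before     # share of traditional jobs lost
overall_reduction = (lawyers_before - jobs_after) / lawyers_before     # net share of law related jobs lost

print(f"traditional law jobs: -{traditional_reduction:.0%}")
print(f"law related employment: -{overall_reduction:.0%}")
```

The gap between the two percentages is the 15 hybrid jobs, which is where the slide's "arbitrage opportunities" for moving across the spectrum come from.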
  96. 96. There is Potential For Growing Other Parts of the Market ...
  97. 97. Technology Aided Access to Justice
  98. 98. Some Applicable Terms That Will Drive The Future of the Industry
  99. 99. And Will Be Part of What it Means to “Think Like a Lawyer”
  100. 100. Natural Language Processing, Clustering, Knowledge Representation, Dimension Reduction, Feature Selection, Feature Extraction, Classification
  101. 101. http://www.drewconway.com/zia/?p=2378
  102. 102. So What is the Next Big Thing... ?
  103. 103. Quantitative Legal Prediction
  104. 104. 2011 The Age of Quantitative Legal Prediction
  105. 105. 2011 The Age of Quantitative Legal Prediction
  106. 106. 2011 The Age of Quantitative Legal Prediction
  107. 107. 2012 The Age of Quantitative Legal Prediction
  108. 108. 2013 The Age of Quantitative Legal Prediction
  109. 109. 2013 The Age of Quantitative Legal Prediction
  110. 110. 2013
  111. 111. 2013
  112. 112. 2013
  113. 113. 2013
  114. 114. 2013
  115. 115. Three Key Ideas About Prediction
  116. 116. (1) Inverse Problem (2) System Dynamics (3) How Machines Learn
  117. 117. Hypothesis Testing is the Core of Mainstream Science
  118. 118. Deduction
  119. 119. Popperian Falsification
  120. 120. Partial or Complete Induction
  121. 121. Is the Alternative
  122. 122. In Case You Did not Know
  123. 123. This is an Inductive Age
  124. 124. This is the Age of Aspirational Spelling (Spelling is 1.0 Thinking)
  125. 125. (a) Induce a Plausible Model from Existing Data
  126. 126. (b) Validate Model
  127. 127. Either: Out of Sample, Forward Prediction, Or Both
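Steps (a) and (b) can be sketched as follows. This is a minimal illustration on synthetic data, assuming a straight-line model fit by ordinary least squares; the data, the 70/30 split, and every function name here are assumptions for illustration, not anything taken from the talk:

```python
import random

def fit_line(xs, ys):
    """(a) Induce a model from existing data: ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def mean_abs_error(model, xs, ys):
    """(b) Validate: average absolute error on data the model was not fit on."""
    slope, intercept = model
    return sum(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)) / len(xs)

# Synthetic, time-ordered observations (invented for this sketch).
rng = random.Random(0)
xs = list(range(100))
ys = [2.0 * x + 1.0 + rng.uniform(-1.0, 1.0) for x in xs]

# Forward prediction: fit on the earlier 70 observations, test on the later 30.
split = 70
forward_model = fit_line(xs[:split], ys[:split])
forward_mae = mean_abs_error(forward_model, xs[split:], ys[split:])

# Out of sample: fit on a random 70, test on the held-out 30.
indices = list(range(len(xs)))
rng.shuffle(indices)
train_idx, test_idx = indices[:split], indices[split:]
holdout_model = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
holdout_mae = mean_abs_error(
    holdout_model, [xs[i] for i in test_idx], [ys[i] for i in test_idx]
)

print(f"forward-prediction error: {forward_mae:.3f}")
print(f"random hold-out error:    {holdout_mae:.3f}")
```

Forward prediction is the stricter test for legal data, because a random hold-out lets the model see cases decided after the ones it is asked to predict.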
  128. 128. (2) System Dynamics
  129. 129. Imagine Two Different Complex Systems
  130. 130. Weather
  131. 131. Tides
  132. 132. Tides (Easy / Predictable: TIDES ALMANAC) vs. Weather (Difficult / Chaotic)
  133. 133. Formal Treatment of the question of prediction in alternative Domains
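The tides versus weather contrast can be made concrete with two toy systems. This sketch assumes a sine wave as a stand-in for a periodic system and the logistic map as a stand-in for a chaotic one; neither is a model of real tides or weather, and all names are illustrative:

```python
import math

def logistic_map(x, r=4.0):
    """One step of the logistic map, a textbook chaotic system when r = 4."""
    return r * x * (1.0 - x)

def max_gap_periodic(delta, steps=100):
    """Largest difference between two sine waves whose phases differ by delta."""
    return max(abs(math.sin(0.5 * t + delta) - math.sin(0.5 * t)) for t in range(steps))

def max_gap_chaotic(delta, steps=100, start=0.2):
    """Largest difference between two logistic-map runs whose starting points differ by delta."""
    a, b = start, start + delta
    gap = 0.0
    for _ in range(steps):
        a, b = logistic_map(a), logistic_map(b)
        gap = max(gap, abs(a - b))
    return gap

delta = 1e-10  # a tiny measurement error in the starting state
periodic_gap = max_gap_periodic(delta)
chaotic_gap = max_gap_chaotic(delta)

print(f"periodic system, worst-case drift: {periodic_gap:.2e}")
print(f"chaotic system, worst-case drift:  {chaotic_gap:.2e}")
```

In the periodic system the measurement error stays the size it started; in the chaotic one it grows until the two runs are unrelated. Where a legal question sits between those poles determines how far ahead prediction is worth attempting.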
  134. 134. (3) Machine Learning is the Heart of Predictive Analytics
  135. 135. Cause and Effect vs. Quantitative Legal Prediction
  136. 136. Cause and Effect vs. Quantitative Legal Prediction
  137. 137. Quantitative Methods for Lawyers Professor Daniel Martin Katz
  138. 138. Legal Analytics Professor Daniel Martin Katz Professor Michael J Bommarito II http://www.legalanalyticscourse.com/
  139. 139. Some Machine Learning Methods. Supervised: Statistical models (Bayesian, e.g., Naïve Bayes Classification; Frequentist, e.g., Ordinary Least Squares), Neural Networks (NN), Support Vector Machines (SVM), Random Forests (RF), Genetic Algorithms (GA). Semi/Unsupervised: Neural Networks (NN), Clustering (K-means, Hierarchical), Radial Basis (RBF), Graph
  140. 140. http://scikit-learn.org/stable/tutorial/machine_learning_map/index.html
  141. 141. the family of machine learning methods: classification, clustering, regression, dimension reduction
  142. 142. Quick Example of Some of the Methods
  143. 143. Take A Look At These 12 (Adapted from Slides By Victor Lavrenko and Nigel Goddard @ University of Edinburgh)
  144. 144. 72 Female Human; 3 Female Horse; 36 Male Human; 21 Male Human; 67 Male Human; 29 Female Human; 54 Male Human; 44 Male Human; 50 Male Human; 42 Female Human; 6 Male Dog; 7 Female Human
  145. 145. Classification (Supervised Learning): f( ) → Gender? (female / male, separated by a decision boundary)
  146. 146. Classification (Supervised Learning): f( ) → Gender? (female / male, separated by a decision boundary). Regression (Supervised Learning): f( ) → Age? (a number)
  147. 147. Classification (Supervised Learning): f( ) → Gender? (female / male, separated by a decision boundary). Multi Class Classification (Supervised Learning): f( ) → Loan Application? (Yes / Maybe / Perhaps / No; Multiclass = Boundary Hyperplane). Regression (Supervised Learning): f( ) → Age? (a number)
  148. 148. Classification (Supervised Learning): f( ) → Gender? (female / male, separated by a decision boundary). Multi Class Classification (Supervised Learning): f( ) → Loan Application? (Yes / Maybe / Perhaps / No; Multiclass = Boundary Hyperplane). Regression (Supervised Learning): f( ) → Age? (a number). Clustering (Unsupervised Learning): f( ) → Group? (a cluster)
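The three task types differ mainly in what the function f( ) is asked to return. A minimal sketch using the 12 records from slide 144: the slides feed pictures into f( ), which are not available here, so this only shows the targets for the supervised tasks and, as a stand-in for the unsupervised case, a small one-dimensional k-means on age (the function `kmeans_1d` and its deterministic starting centroids are assumptions for illustration):

```python
# The 12 records from slide 144: (age, gender, species).
records = [
    (72, "Female", "Human"), (3, "Female", "Horse"), (36, "Male", "Human"),
    (21, "Male", "Human"), (67, "Male", "Human"), (29, "Female", "Human"),
    (54, "Male", "Human"), (44, "Male", "Human"), (50, "Male", "Human"),
    (42, "Female", "Human"), (6, "Male", "Dog"), (7, "Female", "Human"),
]

# Supervised learning: a human supplies the answer f( ) should reproduce.
gender_targets = [gender for _, gender, _ in records]  # classification: a category
age_targets = [age for age, _, _ in records]           # regression: a number

def kmeans_1d(values, k=2, iterations=20):
    """Unsupervised learning: group values without labels (requires k >= 2)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted values.
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        new_centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return clusters, centroids

clusters, centroids = kmeans_1d(age_targets, k=2)
print("younger group:", sorted(clusters[0]))
print("older group:  ", sorted(clusters[1]))
```

Nobody told the clustering step where "young" ends and "old" begins; the boundary comes from the data, which is the sense in which the slide calls it unsupervised.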
  149. 149. MY THESIS
  150. 150. Human Prediction is One Hallmark of the Legal Services Industry
  151. 151. The Race for the Future of this Industry
  152. 152. To Use Applied Machine Learning to Mimic the Behavior of “Expert Reasoners”
  153. 153. but do so at highly aggregated scale
  154. 154. What Do I Mean?
  155. 155. Humans are Amazing Pattern Detectors
  156. 156. But Aggregation is the Question ...
  157. 157. How Does a Human Reasoner Evaluate 10,000 100,000 1,000,000 Data Points?
  158. 158. Truth is They Don’t
  159. 159. Truth is They Don’t #Heuristics
  160. 160. Quantitative Legal Prediction
  161. 161. It Has Already Begun ...
  162. 162. And It Will Continue to Move Up The Value Chain ...
  163. 163. E-Discovery
  164. 164. E-Discovery
  165. 165. Some Pieces of the Previous Entry Level BigLaw Jobs
  166. 166. Have Been Displaced In Part By
  167. 167. Just Remember there was a Day When E-Discovery was not mainstream
  168. 168. And Now It Dominates Our Industry
  169. 169. And Now It Dominates The Industry
  170. 170. It Is About to Be Reset
  171. 171. But Another Form of Predictive Technology...
  172. 172. Predictive Coding
  173. 173. Predictive Coding Will Move Across the Machine Learning Spectrum: Supervised (Present), Semi-Supervised, Unsupervised (Future)
  174. 174. Machine Learning Methods 2 x 2 (Supervised / Unsupervised by Informed / Naive): Predictive Coding (Classification), Basic Clustering Algorithm, The Future © daniel martin katz michael j bommarito
  175. 175. Predictive Coding as an example of applied machine learning ...
  176. 176. predictive coding = ~ binary classification
  177. 177. Learning Task = Determine Whether a Given Document is Relevant. Binary Classification (Supervised Learning): f( ) → relevance? (Relevant / Not Relevant)
  178. 178. take the sample set as a training set and use human experts
  179. 179. the use of the human experts is called “supervised learning”
  180. 180. in the simple binary case, ask humans to assign objects to two piles
  181. 181. Apply Human Coders
  182. 182. yellow = relevant, white = non-relevant, and return this
  183. 183. Non-Relevant / Relevant
  184. 184. Key Insight ...
  185. 185. What Allows A Human To Separate These Two Classes of Documents?
  186. 186. that precise human process is what “predictive coding” is trying to mimic
  187. 187. most vendors are selling a largely undifferentiated product
  188. 188. Humans are selecting upon some “features” of the documents
  189. 189. to place those documents in their respective bins (i.e. relevant, non-relevant)
  190. 190. features =? text, author, date, other metadata
  191. 191. the machine learning task is trying to recover (learn) what separates the relevant from the non-relevant documents
  192. 192. once we learn the rule / boundary we can apply it to separate the remaining documents into the two classes
  193. 193. we want to take what we learn here
  194. 194. we want to take what we learn here
  195. 195. we want to take what we learn here and apply it here
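The predictive coding loop described above (humans code a sample, the machine learns what separates the piles, then the rule is applied to the remaining documents) can be sketched as a small binary classifier. This is a toy under stated assumptions: the documents are invented, the only feature is the words in the text, and Naive Bayes is one possible learner, not a claim about what any vendor actually uses:

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def train_naive_bayes(coded_docs):
    """Learn word frequencies per pile from (text, label) pairs produced by human coders."""
    word_counts = {"relevant": Counter(), "non-relevant": Counter()}
    doc_counts = Counter()
    for text, label in coded_docs:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = set(word_counts["relevant"]) | set(word_counts["non-relevant"])
    return word_counts, doc_counts, vocabulary

def classify(text, model):
    """Apply the learned rule to a document no human has coded."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in word_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word in vocabulary:
                # Add-one (Laplace) smoothing so unseen words do not zero out a pile.
                count = word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)

# Invented training sample, coded by (hypothetical) human reviewers.
coded_sample = [
    ("merger agreement draft attached for review", "relevant"),
    ("revised merger terms and indemnity clause", "relevant"),
    ("board approval of merger agreement", "relevant"),
    ("lunch menu for friday", "non-relevant"),
    ("office holiday party schedule", "non-relevant"),
    ("parking garage closed friday", "non-relevant"),
]
model = train_naive_bayes(coded_sample)

# "Take what we learn here and apply it here": the uncoded remainder.
remaining = ["final merger agreement terms", "friday lunch schedule"]
predictions = [classify(doc, model) for doc in remaining]
for doc, label in zip(remaining, predictions):
    print(f"{label:>12}: {doc}")
```

Swapping in author, date, or other metadata as extra features changes only `tokenize` and the counting step; the train-then-apply shape stays the same.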
  196. 196. Legal Procurement & Legal Supply Chain Mgmt.
  197. 197. Legal Supply Chain Mgmt. (High End of Market): Data and Logistics = General Counsels as Maestros managing the global legal supply chain
  198. 198. Legal Procurement (High End of Market): General Counsels as Legal Procurement Specialists. TyMetrix - Using $50 billion+ in Legal Spend Data to Help GCs Look for Arbitrage Opportunities, Value Propositions in Hiring Law Firms
  199. 199. Legal Procurement (High End of Market): Driving Down your Legal Bills? Yeah, there is an App for That (inputs: City, Firm Size, Partner Experience; Calculate) http://tymetrix.com/mobile_apps/
  200. 200. Predicting Judicial Decisions
  201. 201. Model Leverages Classification Tree (Tool from Machine Learning)
  202. 202. Here is a Technical Paper that I Am Currently Finishing
  203. 203. Predicting the Behavior of the United States Supreme Court: A General Approach By Daniel Martin Katz Michael J. Bommarito II Josh Blackman
  204. 204. From Classification Trees to Random Forest
  205. 205. Random forest is an approach to aggregate weak learners into collective strong learners (think of it as “the wisdom of the statistical crowds”)
  206. 206. Left Hand Side (Y): Affirm or Reverse Lower Court Decision?
  207. 207. Right Hand Side (X’s). Case Information: issue, issueArea, lawType, certReason, respondent, respondent_dk, petitioner, petitioner_dk, caseOrigin, caseSource, monthArgument, timesince_arg. Court / Justice Information: party_president, segal_cover, year_of_birth, naturalCourt. Historical Justice / Court Information: court_direction_mean, court_direction_std, justice_direction_mean, justice_direction_std, justice_court_difference_z, lcDisposition_Direction, lcDispositionDirection_difference_abs, lcDispositionDirection_difference, lcDispositionDirection_difference_z
  208. 208. SCOTUS Random Forest
  209. 209. SCOTUS Random Forest
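The "wisdom of the statistical crowds" intuition from slide 205 can be shown with a short calculation. This is not a random forest and not the model from the paper; it is a simplified sketch that assumes every weak learner votes independently and is right with the same probability (real trees in a forest are correlated, so the real gain is smaller). The function name is illustrative:

```python
from math import comb

def majority_vote_accuracy(n_voters, p_correct):
    """Probability that a strict majority of n independent voters is right (n must be odd)."""
    if n_voters % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    needed = n_voters // 2 + 1
    return sum(
        comb(n_voters, k) * p_correct**k * (1 - p_correct) ** (n_voters - k)
        for k in range(needed, n_voters + 1)
    )

# Each weak learner is only slightly better than a coin flip (assumed 60%).
for n in (1, 11, 101):
    print(f"{n:>3} weak learners -> majority is right {majority_vote_accuracy(n, 0.6):.3f} of the time")
```

A random forest tries to approximate this independence by training each tree on a different resample of the cases and a different subset of the features.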
  210. 210. This is a Great Example
  211. 211. But This is Merely the Tip of the Iceberg
  212. 212. Predicting Outcomes in Disputes
  213. 213. Disputes v. Decisions
  214. 214. Disputes, Filings, etc.
  215. 215. Bargaining in the Shadow of the Law
  216. 216. What is the Client’s First Question?
  217. 217. Do I have a case?
  218. 218. How is that assessment generated?
  219. 219. How Does the Human Reasoner Arrive at Their Conclusion?
  220. 220. Pattern Detection, High Dimensional Similarity Matching, Analogical Reasoning, etc.
  221. 221. Mental Models { vs or + } Aggregated Data
  222. 222. The Immediate Future =
  223. 223. Humans + Machines > Humans or Machines
  224. 224. Standard Client Memo + Statistical Portrait of Lots of ‘similar’ cases
  225. 225. Examples
  226. 226. IP LITIGATION / M & A Valuation
  227. 227. https://lexmachina.com/
  228. 228. https://lexmachina.com/ June 2012
  229. 229. November 2013
  230. 230. Securities Fraud Litigation
  231. 231. Predictive Model of Securities Fraud Class Action Lawsuits
  232. 232. Predicts both the likelihood of settlement and the expected settlement amount
  233. 233. Uses only variables that are known at the day of filing
  234. 234. There Are Many Other Additional Examples ...
  235. 235. Negotiations
  236. 236. Transactional Work
  237. 237. “The software identifies standard and terms in contracts, and its benchmarking tools show lawyers how their current document compares to the standard.”
  238. 238. Due Diligence
  239. 239. The system comes pre-trained for provisions including: Title, Parties, Date, Term, Change of Control, Assignment, Indemnity, Confidentiality, Governing Law, License Grant, Bankruptcy, Notice, Amendment, Non-Solicit, and more.
  240. 240. Based on testing, we know our system finds 90% or more of the instances of nearly every substantive provision it covers. This 90% number is our system’s recall; its precision differs provision by provision but is consistently very manageable.
  241. 241. We are able to build custom provisions on request. Thanks to our highly customized training algorithms, this process is easy and relatively automated. We are also engaged in adding more provisions.
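Recall and precision, as used in the vendor statement on slide 240, are two different ratios. A minimal sketch with invented counts (the numbers below are hypothetical and chosen only so that recall comes out at the quoted 90%; the vendor does not publish its precision):

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision: share of flagged passages that are real. Recall: share of real passages that were flagged."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall

# Hypothetical review: 100 real indemnity clauses exist in the contract set.
# The system flags 120 passages; 90 of them are real clauses, 30 are not,
# and 10 real clauses are missed.
precision, recall = precision_recall(
    true_positives=90, false_positives=30, false_negatives=10
)
print(f"recall:    {recall:.0%}")
print(f"precision: {precision:.0%}")
```

For due diligence the two errors cost different things: a missed clause (low recall) is a risk to the client, while a false flag (low precision) only costs reviewer time, which is presumably why the vendor leads with recall.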
  242. 242. Lawyer Quality & Performance
  243. 243. The VC Community Is Turning to Legal
  244. 244. Reported sale price between $35 million and $40 million. Final Number was likely between $80 - $100 million. A number of venture capitalists have invested in the company, including Silicon Valley’s Sequoia Capital, which invested $7 million in 2007 ....
  245. 245. And There is Lots More in this Space ...
  246. 246. Okay There are Serious Technical Questions In Here ...
  247. 247. But this is Also Very Practical
  248. 248. Lawyers are in the Prediction Business (in Part)
  249. 249. Technology Has Already Disrupted Law ...
  250. 250. There Is Going to Be Lots More
  251. 251. In Other Words,
  252. 252. Welcome To Law’s Information Revolution
  253. 253. And Yes There Is Going to Be Math (& Computing) on the Exam
  254. 254. Daniel Martin Katz Illinois Institute of Technology Associate Professor of Law @ computational computationallegalstudies.com danielmartinkatz.com
