Legal Analytics - Introduction to the Course - Professor Daniel Martin Katz + Professor Michael J Bommarito II

Published in: Education

  1. Legal Analytics: Class 1, Introduction to the Course. Professor Daniel Martin Katz, Professor Michael J Bommarito II. legalanalyticscourse.com
  2. This Course is Called “Legal Analytics” access more at legalanalyticscourse.com
  3. < As We Define it ... “Legal Analytics” is about deriving substantively meaningful insight from some sort of legal data >
  4. < Let us start with a general description of the overall landscape >
  5. Statistical Models for Causal Inference versus Statistical Models for Prediction
  6. A Few Words About Causal Inference: Causal inference is at the core of the “empirical turn” that has taken hold in law as well as the social sciences
  7. A Few Words About Causal Inference: Such approaches are best suited to problems/questions where identifying/linking cause and effect is key
  8. A Few Words About Causal Inference: Here are just some of the methods/topics associated with causal inference: Instrumental Variables, Propensity Score Matching, Rubin Causal Model, Regression Discontinuity, Difference in Differences, etc.
  9. However, the methods associated with Causal Inference will not be the focus of this course
  10. We are focused upon prediction
  11. We are focused upon machine learning
  12. We are focused upon data science
  13. We are going to learn data management skills
  14. Skills to Be Taught: Collecting, cleaning and processing data; Exploring and analyzing data to produce knowledge and insights, including machine learning (i.e., classification, regression, and clustering), visualization, and natural language processing (time permitting); Communicating data and knowledge to clients, colleagues, or courts.
  15. Books For This Class
  16. < The Theoretical Orientation >
  17. Deduction versus Induction
  18. “Long before machine learning came into existence, philosophers knew that generalizing from particular cases to general rules is not a well-posed problem” (Flach, p. 20)
  19. David Hume
  20. Two Core Issues: Black Swans? Uniformity of Nature?
  21. Black Swan Problem: Even if we observe white swan after white swan, we cannot induce that all swans are white
  22. “[T]here are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns – there are things we do not know we don't know.” (United States Secretary of Defense Donald Rumsfeld)
  23. Uniformity of Nature: It is a mistake to presuppose that a sequence of events in the future will occur as it always has in the past
  24. < Regression as a Prediction Tool >
  25. Standard Linear Regression Can Be Used to Predict a Quantity
  26. Task = Predict the Expected Hourly Rate of a Lawyer: f( ) → Cost? (#)
  27. Build a (Regression) Model from Existing Billing Data
  28. General form: $Y = \beta_0 \pm \beta_1 X_1 \pm \beta_2 X_2 \pm \beta_3 X_3 \pm \beta_4 X_4 \pm \beta_5 X_5 + \varepsilon$. Fitted example (in dollars): $Y = 151 + 15 X_1 + 161 X_2 + 95 X_3 + 34 X_4 \pm \beta_5 X_5 + \varepsilon$, where $X_1$ is counted per 100 lawyers, $X_2$ = 1 if Tier 1 market is true, $X_3$ = 1 if partner status is true, $X_4$ is counted per 10 years, and $X_5$ is practice area
  29. Turn Around and Use This Model to Predict a New Lawyer (also Matters, etc.)
  30. This Requires a Method to Deal With Changes in Dynamics
  31. This Requires a Method to Update the Model as Time Moves Forward
  32. Must Deal With Underfitting / Overfitting the Existing Data
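
The fitted hourly-rate equation on slide 28 can be turned into a small prediction function. What follows is a minimal sketch, assuming the slide's coefficients attach to its labels in the order shown (15 per 100 lawyers, 161 for a Tier 1 market, 95 for partner status, 34 per 10 years) and that "years" means years of experience. The function name, the argument names, and the `practice_area_adjustment` parameter (a stand-in for the β5 term the slide leaves unspecified) are hypothetical.

```python
def predict_hourly_rate(firm_lawyers, tier1_market, is_partner,
                        years_experience, practice_area_adjustment=0.0):
    """Predict an hourly rate (USD) from the coefficients shown on the slide.

    Assumption: coefficients map to the slide labels in order. The slide
    leaves the practice-area coefficient unspecified, so it is passed in
    as a hypothetical adjustment that defaults to zero.
    """
    rate = 151.0                                # intercept (beta_0)
    rate += 15.0 * (firm_lawyers / 100)         # per 100 lawyers
    rate += 161.0 if tier1_market else 0.0      # Tier 1 market indicator
    rate += 95.0 if is_partner else 0.0         # partner status indicator
    rate += 34.0 * (years_experience / 10)      # per 10 years (assumed: experience)
    rate += practice_area_adjustment            # unspecified beta_5 term
    return rate


# A hypothetical new lawyer: partner, 20 years, 500-lawyer firm, Tier 1 market.
example_rate = predict_hourly_rate(500, True, True, 20)
print(example_rate)  # 550.0
```

The error term ε is left out because a point prediction uses the expected value; keeping the model current as billing data changes would mean re-estimating the coefficients, which this sketch does not attempt.
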
  33. < Machine Learning: A High-Level Overview >
  34. < Machine Learning: High-Level Overview > See Flach Textbook Page 11
  35. See Flach Textbook Page 11: Here we have the main ingredients of machine learning: tasks, models and features.
  36. “A task (red box) requires an appropriate mapping – a model – from data described by features to outputs.” “Obtaining such a mapping from training data is what constitutes a learning problem (blue box).”
  37. Key Point: “tasks are addressed by models, whereas learning problems are solved by learning algorithms that produce models”
  38. < The Family of ML Methods >
  39. http://scikit-learn.org/stable/tutorial/machine_learning_map/index.html
  40. Take a Look at These 12 (adapted from slides by Victor Lavrenko and Nigel Goddard @ University of Edinburgh)
  41. (Age, gender, species): 72, Female, Human; 3, Female, Horse; 36, Male, Human; 21, Male, Human; 67, Male, Human; 29, Female, Human; 54, Male, Human; 44, Male, Human; 50, Male, Human; 42, Female, Human; 6, Male, Dog; 7, Female, Human
  42. Task = Determine the Gender of the Respective Agents: f( ) → Gender? (female / male)
  43. Task = Determine the Gender of the Respective Agents: f( ) → Gender? (female / male). Binary Classification (Supervised Learning)
  44. Classification (Supervised Learning): f( ) → Gender? (female / male)
  45. Classification (Supervised Learning): f( ) → Gender? (female / male), with decision boundary
  46. Task = Determine Whether the Agents Will Obtain Employment: f( ) → Job? (Yes / No). Binary Classification (Supervised Learning)
  47. Classification (Supervised Learning): f( ) → Job? (Yes / No)
  48. Classification (Supervised Learning): f( ) → Job? (Yes / No), with decision boundary
  49. Task = Determine Whether the Agents Will Obtain a Loan: f( ) → Loan? (Yes / Perhaps / No). Multi Class Classification (Supervised Learning)
  50. Multi Class Classification (Supervised Learning): f( ) → Loan? (Yes / Perhaps / No)
  51. Multi Class Classification (Supervised Learning): f( ) → Loan? (Yes / Perhaps / No); plot labels: Yes, Maybe, No
  52. Multiclass = Hyperplane
  53. Task = Determine the Age of the Respective Agents: f( ) → Age? (#). Regression (Supervised Learning)
  54. Regression (Supervised Learning): f( ) → Age? (#): 72 3 21 36 67 54 29 42 44 50 7 6
  55. Regression (Supervised Learning): f( ) → Age? (#): 72 3 21 36 67 54 29 42 44 50 7 6; 27 44 53 37 68 22 48 10 6 74 3 44
  56. Task = Can We Determine to Which Group the Agent Belongs? f( ) → Group? (Cluster). Clustering (Unsupervised Learning) relies upon some notion of “similarity”
  57. Clustering (Unsupervised Learning): f( ) → Group? (Cluster)
  58. Clustering (Unsupervised Learning): f( ) → Group? (Cluster)
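
The idea that clustering "relies upon some notion of similarity" can be illustrated with the ages of the 12 agents listed on slide 41. This is only a sketch under stated assumptions: age is used as the sole feature, similarity is taken to be absolute distance, and the two-cluster 1-D k-means below (with the hypothetical name `two_means_1d`) is a stand-in technique chosen for illustration, not the method the slides use.

```python
def two_means_1d(values, max_iter=100):
    """Split numbers into two clusters with a simple 1-D k-means (k = 2).

    Similarity is absolute distance between values. The centers start at
    the minimum and maximum, so the result is deterministic.
    """
    centers = [float(min(values)), float(max(values))]
    clusters = ([], [])
    for _ in range(max_iter):
        clusters = ([], [])
        for v in values:
            # Assign each value to the nearer center (ties go to the lower one).
            nearer = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            clusters[nearer].append(v)
        # Recompute each center as its cluster mean; keep it if the cluster is empty.
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:   # centers, and so assignments, are stable
            break
        centers = new_centers
    return clusters, centers


# Ages of the 12 agents listed on slide 41.
ages = [72, 3, 36, 21, 67, 29, 54, 44, 50, 42, 6, 7]
(younger, older), centers = two_means_1d(ages)
print(sorted(younger))  # [3, 6, 7, 21, 29]
print(sorted(older))    # [36, 42, 44, 50, 54, 67, 72]
```

No labels are supplied at any point, which is what makes this unsupervised; a different similarity measure or a different number of clusters would produce different groups.
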
  59. < Loss Functions >
  60. “In statistics, typically a loss function is used for parameter estimation, and the event in question is some function of the difference between estimated and true values for an instance of data.”
  61. Take a set of predictors X and some response Y. Obtain a function f(X) to make predictions of Y from those input variables. In order to identify f(X) we need another function to penalize errors in prediction. This is called a loss function, L(Y, f(X))
  62. Once Again Remember Linear Regression
  63. [Figure: scatter plot of Y against X with fitted values; both axes run from 0 to 20]
  64. Notice that the prediction line does not really pass through the middle of any particular observation. There is an error term called “epsilon” which attempts to capture the amount of error in the model: $Y = \alpha + \beta x + \varepsilon$. A large error term means that the regression line does not really “fit” the data particularly well. [Figure: scatter plot of Y against X with fitted values; both axes run from 0 to 20]
  65. Standard Linear Regression = minimize the sum of squared residuals. A residual is the difference between the observed value and the fitted value
  66. Regression Analysis Involves a Loss Function
  67. Linear Regression Squared Error Loss Function: $L(Y, f(X)) = \sum (y - f(X))^2$
  68. Linear Regression: $Y = \alpha + \beta X$, where $\alpha$ and $\beta$ are both in the reals
  69. General form: $Y = \beta_0 \pm \beta_1 X_1 \pm \beta_2 X_2 \pm \beta_3 X_3 \pm \beta_4 X_4 \pm \beta_5 X_5 + \varepsilon$. Fitted example (in dollars): $Y = 151 + 15 X_1 + 161 X_2 + 95 X_3 + 34 X_4 \pm \beta_5 X_5 + \varepsilon$, where $X_1$ is counted per 100 lawyers, $X_2$ = 1 if Tier 1 market is true, $X_3$ = 1 if partner status is true, $X_4$ is counted per 10 years, and $X_5$ is practice area
  70. Linear Regression: $Y = \alpha + \beta X$, where $\alpha$ and $\beta$ are both in the reals. Minimizing our SSE loss function helps us identify the "best" alpha and beta that define an actual function out of the family defined above.
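
Slide 70's point, that minimizing the SSE loss picks one "best" alpha and beta out of the family Y = α + βX, can be sketched in a few lines. The example below uses the standard closed-form least-squares estimates for a single predictor; the data points and the function names `sse` and `fit_simple_ols` are hypothetical and invented purely for illustration.

```python
def sse(alpha, beta, xs, ys):
    """Sum of squared residuals for the line y = alpha + beta * x."""
    return sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))


def fit_simple_ols(xs, ys):
    """Closed-form least-squares estimates for a single predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    beta = s_xy / s_xx
    alpha = y_bar - beta * x_bar
    return alpha, beta


# Hypothetical toy data, not taken from the slides.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
alpha_hat, beta_hat = fit_simple_ols(xs, ys)
best = sse(alpha_hat, beta_hat, xs, ys)

# Nudging either parameter away from the fitted values increases the loss.
worse_alpha = sse(alpha_hat + 0.1, beta_hat, xs, ys)
worse_beta = sse(alpha_hat, beta_hat + 0.1, xs, ys)
print(round(alpha_hat, 3), round(beta_hat, 3), round(best, 3))
print(best < worse_alpha, best < worse_beta)
```

Every (alpha, beta) pair defines a different member of the family; the loss function is what ranks them, and the fitted pair is the one with the smallest sum of squared residuals.
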
  71. Why Is the Squared Error Loss Function Correct?
  72. There are many other loss functions
  73. Note: Many models are defined by a functional form or family, e.g., logistic regression, linear regression, SVM+kernel. Most often, the geometric category that Flach discusses is tied to these forms, and the "loss" functions are essentially "distance" or "spatial" metrics.
  74. Misclassification is one common loss function
  75. “Imagine we are trying to predict a binary outcome (0,1). Now swap the (0,1) for [-1,1]. $L(Y, f(X)) = \sum I(y \neq \operatorname{sign}(f))$. I is the indicator function where we are summing up misclassifications” (example drawn from Michael Clark @ Notre Dame)
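
The misclassification loss on slide 75 can be written out directly: recode the 0/1 outcome as -1/+1, take the sign of the model's score, and count the disagreements. This is a minimal sketch; the outcomes and scores are hypothetical, as are the function names, and treating a score of exactly zero as +1 is an assumption the slide does not address.

```python
def sign(score):
    """Return +1 or -1; mapping a score of exactly 0 to +1 is an assumption."""
    return 1 if score >= 0 else -1


def misclassification_loss(labels, scores):
    """Sum the indicator I(y != sign(f)) over all observations."""
    return sum(1 for y, f in zip(labels, scores) if y != sign(f))


# Hypothetical binary outcomes coded 0/1, swapped to -1/+1 as the slide suggests.
binary_outcomes = [1, 0, 1, 0]
labels = [2 * b - 1 for b in binary_outcomes]   # 0 -> -1, 1 -> +1

# Hypothetical model scores f(X); only their signs matter for this loss.
scores = [0.8, -0.3, -0.2, 0.5]
total_errors = misclassification_loss(labels, scores)
print(total_errors)  # 2
```

Unlike squared error, this loss ignores how far a score is from the boundary: a score of -0.2 and a score of -20 both count as one error when the true label is +1.
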
  76. < Okay, a Few Words About Implementation >
  77. < We Will Use >
  78. < Review These As Needed > http://computationallegalstudies.com/quantitative-methods-for-lawyers-course/
  79. < Review These As Needed > http://computationallegalstudies.com/quantitative-methods-for-lawyers-course/
  80. < More to Come in the Next Class >
  81. Legal Analytics Class 1 - Introduction to the Course. Daniel Martin Katz: blog | ComputationalLegalStudies; corp | LexPredict; twitter | @computational; site | danielmartinkatz.com. Michael J Bommarito: blog | ComputationalLegalStudies; corp | LexPredict; twitter | @mjbommar; site | bommaritollc.com. More content available at legalanalyticscourse.com
