Legal Informatics Research Today: Implications for Legal Prediction, 3D Printing, & eDiscovery


Presentation at CICL 2013: Conference on Innovation and Communications Law, 16 May 2013, Glen Arbor, Michigan.

  1. Legal Informatics Research Today: Implications for Legal Prediction, 3D Printing & eDiscovery
     Robert Richards, Penn State University
     CICL 2013: Conference on Innovation and Communications Law
  2. Agenda
     • Legal Informatics: Overview
     • eDiscovery: Methods, Recent Research
     • 3D Printing: How legal tech could apply
     • Legal Prediction: Methods, Recent Research
  3. Legal Informatics: Definition
     Legal informatics is:
     (1) the study of legal information / communication systems
     (2) the application of ICT (information / communication technology) to legal information
  4. [Diagram: ICT and Legal Information]
  5. What is legal information?
     Structured data that express:
     (1) Legal rules
     (2) Information about legal rules (1st-, 2nd-, 3rd-, etc. order legal metadata)
     (3) Evidence: non-legal data used to support an assertion about a legal rule
  6. What is a legal information / communication system?
     A set of interrelated entities that receive, process, or output legal information. Examples:
     • A law office time/billing system
     • A database of court decisions
     • A statistical model predicting a legal outcome
  7. Legal Informatics Viewpoint: 4 Levels
     • In a domain,
     • addressing an application area,
     • from one or more sub-disciplines,
     • by employing one or more methodologies
  8. Legal Informatics: Domains
     Law Practice, Courts, Legislature, Regulatory, Politics / Civic, Computing, Legal Education, Business, Consumers
  9. Legal Informatics: Application Areas
     Litigation, Compliance, Planning, Interviewing / Counseling, Negotiation, Education, Governance / Policy Making
  10. Legal Informatics: Sub-Disciplines
     • Artificial Intelligence
     • Information Retrieval
     • Text Processing / NLP
     • Metadata / Knowledge Representation
     • Databases / Storage
     • Linguistics / Communication
     • Human-Computer Interaction / Information Behavior
     • Management / Sociology of Information
  11. Legal Informatics: Methodologies
     • Prototyping
     • Statistics / Probability
     • Experimentation
     • Network Analysis
     • Survey Research
     • Case Study
     • Cost-Benefit Analysis
     • Ethnography
     • Interviewing
     • Doctrinal Analysis
  12. Example
     Much eDiscovery research involves…
     • Law Practice (Domain)
     • Litigation / Evidence (Application Area)
     • Information retrieval + text analysis + knowledge representation / metadata + management (Sub-Disciplines)
     • Prototyping + experimentation + statistical analysis + cost-benefit analysis (Methodologies)
  13. 4-Level Approach Reveals Relationships Between (Apparently) Dissimilar Research Activities
     Scherer, S., Wimmer, M. A., & Markisic, S. (2013). Bridging narrative scenario texts and formal policy modeling through conceptual policy modeling. Artificial Intelligence and Law. doi:10.1007/s10506-013-9142-2
  14. Scherer et al. (2013)
     [Diagram: ICT; Citizen's Legal Narrative; Doctrine / Rule]
  15. Scherer et al.: Public Policy Domain
     • Methodologies: Prototyping + Case study
     • Sub-Disciplines: Artificial Intelligence + Linguistics + Text Analysis + Knowledge Representation
     • Application area: Translating non-legal language to legal concepts
     • Domain: Public policy (e-Participation)
  16. Scherer et al.: Law Practice Domain
     • Methodologies: Prototyping + Case study
     • Sub-Disciplines: Artificial Intelligence + Linguistics + Text Analysis + Knowledge Representation
     • Application area: Translating non-legal language to legal concepts
     • Domain: Law practice (Counseling, Interviewing)
  17. Functions of Legal Informatics Approach
     • Analyze: Processes
     • Define: Problems
     • Explain: Causation
     • Predict: Outcomes
  18. Functions of Legal Informatics Approach (cont'd)
     • Evaluate: Processes, Outcomes
     • Apply: Diverse approaches and methods
  19. eDiscovery
     • Definition
     • Goals and Motivation
     • Models
     • Research Results
     • Predictive Coding
     • Future Areas of Research
  20. eDiscovery: Definition
     In litigation, the request for and production of electronically stored information (ESI) relevant to a claim or count
  21. eDiscovery: Goals
     • Increase effectiveness of methods
     • Lower costs
  22. Cost Motivation
     • Big Data → prohibitive costs of traditional relevance and privilege review
     • With data sets of more than 10^6 objects, linear manual review and privilege review become unsustainably expensive
  23. EDRM (Electronic Discovery Reference Model)
     [Diagram: EDRM model]
  24. New Models Emerging: Informatics-Based, Elaborating EDRM
     [Diagram: EDRM compared with the Oard & Webber model]
  25. Oard & Webber (2013)
     [Diagram labels: Production request, Collection, Responsive ESI, Production → Insight; Formulation, Acquisition, Review for Relevance, Review for Privilege, Sense-making]
     © Copyright 2013 Douglas W. Oard and William Webber
  26. TREC & EDI: Key Findings
     • Initial search & second-step relevance feedback: automated relevance ranking > Boolean query with respect to recall
     • Interactive evaluation: technology-assisted review > manual review with respect to overall results + precision
     • High precision + high recall are possible for certain topics
  27. TREC Key Findings (cont'd)
     • Predictive coding produced high recall
        ◦ But most machine learning systems could not choose the correct sample size to maximize precision and recall
     • Machine learning systems that yielded highly relevant results also yielded highly material documents
     • Privilege review remains a key cost driver & is under-automated (Pace & Zakaras, 2012)
        ◦ Automated privilege review yielded high recall in one study (but the method was not disclosed)
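The TREC and EDI findings above are stated in terms of precision (the share of produced documents that are actually relevant) and recall (the share of relevant documents that were actually produced). As a minimal sketch, assuming documents are identified by hypothetical string IDs and that relevance judgments are already available, the two measures and their harmonic mean (F1, a common single-number summary) can be computed like this:

```python
def precision_recall_f1(produced: set[str], relevant: set[str]) -> tuple[float, float, float]:
    """Score a production against the set of documents judged relevant."""
    true_pos = len(produced & relevant)                      # relevant documents actually produced
    precision = true_pos / len(produced) if produced else 0.0
    recall = true_pos / len(relevant) if relevant else 0.0
    # F1 is the harmonic mean of precision and recall (0 if both are 0).
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical example: doc1..doc8 produced, doc3..doc12 judged relevant (6 overlap).
produced = {f"doc{i}" for i in range(1, 9)}
relevant = {f"doc{i}" for i in range(3, 13)}
print(precision_recall_f1(produced, relevant))
```

Here precision is 6/8 and recall is 6/10: producing more documents would tend to raise recall at the cost of precision, which is why choosing a cutoff matters.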
  28. eDiscovery: Measurement Error
     • Low rates of inter-assessor agreement were found in TREC & EDI studies
     • Cooperation between parties on evaluation in technology-assisted review is likely to lower measurement error
        ◦ This is an emerging best practice (see, e.g., Da Silva Moore)
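One common chance-corrected statistic for inter-assessor agreement is Cohen's kappa. The sketch below illustrates that statistic in general; it is not necessarily the measure used in the TREC or EDI studies, and the labels `R` (responsive) and `N` (non-responsive) and the judgments are hypothetical.

```python
from collections import Counter

def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa: agreement between two assessors, corrected for chance agreement."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both assessors must judge the same non-empty list of documents")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: each assessor picks category c independently at their own observed rate.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if expected == 1.0:
        # Both assessors used one identical label throughout; kappa is undefined, report 1.0.
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical judgments on ten documents.
assessor_1 = list("RRRNNNNNNN")
assessor_2 = list("RRNRNNNNNN")
print(cohens_kappa(assessor_1, assessor_2))
```

The two assessors agree on 8 of 10 documents, yet kappa is only about 0.52, because most documents are non-responsive and much of that agreement would be expected by chance.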
  29. eDiscovery: Recent Emphases (Baron, 2011)
     • Process quality standards & best practices
        ◦ Metrics & certification (DESI IV, 2011)
     • Cooperation between parties
        ◦ Sedona Conference (2009)
     • Improved search, including predictive coding
        ◦ DESI V, 2013
        ◦ Results of TREC & EDI research
     Courts are implementing all of these
  30. eDiscovery: Recent Emphases: Sub-Disciplines
     • Process quality standards & best practices
        ◦ Management
     • Cooperation between parties
        ◦ Management, Information Retrieval, Knowledge Representation
     • Improved search, including predictive coding
        ◦ Information Retrieval, Text Analysis, Knowledge Representation
  31. Predictive Coding: Definition
     Machine learning applied to the classification of information, e.g., as responsive / non-responsive
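To make the definition concrete, here is a minimal sketch of one method commonly used for this kind of classification, a multinomial naïve Bayes classifier with add-one smoothing. The whitespace tokenizer, the class name `NaiveBayesReviewer`, and the tiny seed set are hypothetical illustrations; production predictive-coding tools use far richer features, larger training sets, and iterative training rounds.

```python
import math
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Deliberately naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()

class NaiveBayesReviewer:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: list[str], labels: list[str]) -> "NaiveBayesReviewer":
        self.classes = sorted(set(labels))
        # Log prior: share of the seed set carrying each label.
        self.log_priors = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_score(self, doc: str, c: str) -> float:
        denom = self.totals[c] + len(self.vocab)
        # Words never seen in the seed set carry no evidence and are skipped.
        return self.log_priors[c] + sum(
            math.log((self.word_counts[c][w] + 1) / denom)
            for w in tokenize(doc) if w in self.vocab
        )

    def predict(self, doc: str) -> str:
        return max(self.classes, key=lambda c: self.log_score(doc, c))

# Hypothetical seed set coded by a human reviewer.
seed_docs = [
    "merger pricing agreement with competitor",
    "pricing discussion competitor meeting",
    "lunch menu for friday",
    "office holiday party schedule",
]
seed_labels = ["responsive", "responsive", "non-responsive", "non-responsive"]
model = NaiveBayesReviewer().fit(seed_docs, seed_labels)
print(model.predict("competitor pricing memo"))
```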
  32. Predictive Coding: Diverse Methods
     • Support Vector Machines
     • Latent Semantic Analysis
     • Naïve Bayesian Classifiers
     • Decision Trees
     • Neural Networks
     • Association Rule Learning
     • Rule Induction
     • Genetic Algorithms
  33. Predictive Coding: Courts Reading, Citing, & Applying Legal Informatics Research
     • Da Silva Moore v. Publicis Groupe
     • EORHB v. HOA Holdings
     • Global Aerospace Inc. v. Landow Aviation
     • Kleen Products v. Packaging Corp. of America
  34. eDiscovery: Future Research Directions
     • Evaluation standards & certification
        ◦ Threshold point estimates: relevance threshold; sample size threshold; confidence level, confidence intervals
     • Typology of production requests
     • The Electronic Discovery Institute plans a second study on real e-discovery materials, testing TREC conclusions with higher ecological validity
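The sample-size and confidence-level thresholds listed above are linked by a standard textbook formula: to estimate a proportion (for example, the share of responsive documents in a collection) within a margin of error E at a given confidence level, a simple random sample needs about n = z²p(1 − p)/E² documents, where z is the two-sided normal critical value. The sketch below assumes simple random sampling from a large collection (no finite-population correction) and defaults to the conservative worst case p = 0.5; it is one common approach, not an established eDiscovery standard.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(confidence: float, margin: float, p: float = 0.5) -> int:
    """Documents to sample so a proportion estimate falls within +/- margin at the given confidence.

    Normal approximation, simple random sampling, large collection (no finite-population
    correction). p = 0.5 is the conservative choice when the true rate is unknown.
    """
    if not (0 < confidence < 1 and 0 < margin < 1 and 0 <= p <= 1):
        raise ValueError("confidence and margin must lie in (0, 1), p in [0, 1]")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value, ~1.96 at 95%
    return math.ceil(z * z * p * (1 - p) / (margin * margin))

print(sample_size_for_proportion(0.95, 0.02))
```

For 95% confidence and a ±2% margin this gives 2,401 documents; widening the margin to ±5% drops the requirement to 385, which shows how strongly the agreed-upon margin drives review cost.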
  35. eDiscovery: Future Research Directions (cont'd)
     • Measurement error: modeling it & correcting for it
     • Designing re-usable test collections
     • Automated privilege review
        ◦ Identifying effective methods
        ◦ Designing test collections to evaluate those methods
  36. eDiscovery: Future Research Directions (cont'd)
     • Evaluating de-duplication methods
     • Improved privacy measures to enable experiments on real-life data sets
     • Applying other sub-disciplines, including information behavior
     • Diversifying methods, including social network analysis
     • More research on Early Case Assessment
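Evaluating de-duplication methods presupposes a baseline to compare against. A minimal sketch of the simplest baseline, exact de-duplication after light normalization, hashes each document's case- and whitespace-normalized text with SHA-256 and keeps only the first document per hash. The normalization rules and the document IDs are assumptions; detecting near-duplicates (e.g., edited drafts) would need a different technique such as shingling.

```python
import hashlib
import re

def fingerprint(text: str) -> str:
    """SHA-256 of whitespace- and case-normalized text, so trivially reformatted copies collide."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def deduplicate(docs: dict[str, str]) -> tuple[list[str], dict[str, str]]:
    """Return (IDs to review, mapping of each suppressed duplicate ID to the ID it duplicates)."""
    first_seen: dict[str, str] = {}
    keep: list[str] = []
    duplicates: dict[str, str] = {}
    for doc_id, text in docs.items():
        fp = fingerprint(text)
        if fp in first_seen:
            duplicates[doc_id] = first_seen[fp]
        else:
            first_seen[fp] = doc_id
            keep.append(doc_id)
    return keep, duplicates

# Hypothetical collection: B is A with different case and spacing.
docs = {"A": "Please review the draft.", "B": "please  review the\ndraft.", "C": "Different email."}
print(deduplicate(docs))
```

Recording which original each duplicate maps to, rather than silently dropping it, keeps the suppression auditable if the other party questions it.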
  37. 37. 3D Printing Definition Expected Effects Lawyers’ Value-Add Short-Term Application of LegalTechnology Long-Term Application of Legal Technology
  38. 38. 3D Printing: Definition The generation of physical objectsfrom computer models, by a layeringprocess Also called Additive Manufacturing(Gibson, Rosen, & Stucker, 2010)
  39. 39. 3D Printing: Some Expected Effects Democratizing manufacturing More inventors More innovation More infringement More demand for legal complianceservices More demand for patent legal
  40. 40. Patent Lawyers’ Value-Add forEntrepreneurs / New Inventors Patent Search Claim Interpretation Currency of Information Customization of Information toClient’s Circumstances Strategic Advice (Law + Business)
  41. 41. How Might Legal Informatics Affect 3DPrinting? Legal Informatics is likely to interactwith 3D Printing in two ways: Short-Term: Unbundling of patentlegal services (Mosten, 1994) Long-Term: Automated patentsearch & Modeling of claiminterpretations incorporated intoCAD software
  42. 42. Unbundling of Patent Legal Services Selling (outdated) patent searchresults Selling (outdated) memorandacontaining claim interpretations Offering (remotely) updated &customized search results andcounseling for an extra fee
  43. 43. Patent Legal Services Unbundling: 4-LevelsDomain: BusinessApplication Areas: Compliance, CounselingSub-Disciplines: Management, Information Retrieval, KnowledgeRepresentationMethodologies: Prototyping, Case Studies, DoctrinalAnalysis, Cost-Benefit Analysis
  44. 44. Automated patent search & modeling of claiminterpretations (Hulicki, 2013; Mulligan & Lee, 2012) User inputs simulation/design/imageof invention CAD software analyzesinput, determines domain & patentsearch parameters CAD Software executes patentsearch, retrieves relevant patents inforce CAD software analyzes claims of
  45. 45. Automated patent search & modeling of claiminterpretations (cont’d) CAD Software translates claims intosimulation parameters For each simulation model, CAD softwarecalculates probability of liability for patentinfringement & possible exposure Output displays liability probability +potential exposure Lawyer offers (remote) legal counseling foran extra fee
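The "probability of liability" and "possible exposure" outputs in this scenario could be combined in many ways. A minimal sketch, assuming the hypothetical CAD software has already produced one infringement probability and one damages estimate per relevant patent, and assuming those risks are independent across patents (a strong simplification), reports the chance of any liability, the expected exposure, and the worst case. `PatentRisk` and `summarize` are illustrative names, not part of any existing tool.

```python
from dataclasses import dataclass

@dataclass
class PatentRisk:
    """Hypothetical per-patent output of the CAD software's claim simulation."""
    patent_id: str
    p_liability: float   # estimated probability the design infringes this patent
    exposure: float      # estimated damages if it does

def summarize(risks: list[PatentRisk]) -> dict[str, float]:
    for r in risks:
        if not 0.0 <= r.p_liability <= 1.0:
            raise ValueError(f"{r.patent_id}: probability out of range")
    # ASSUMPTION: infringement risks are independent across patents.
    p_no_liability = 1.0
    for r in risks:
        p_no_liability *= 1.0 - r.p_liability
    return {
        "p_any_liability": 1.0 - p_no_liability,
        # Expected exposure is a plain probability-weighted sum (no independence needed).
        "expected_exposure": sum(r.p_liability * r.exposure for r in risks),
        "worst_case_exposure": sum(r.exposure for r in risks),
    }

risks = [PatentRisk("P1", 0.2, 100_000.0), PatentRisk("P2", 0.5, 40_000.0)]
print(summarize(risks))
```

Showing both the expected and the worst-case figure mirrors the slide's pairing of liability probability with potential exposure, and leaves the judgment about acceptable risk to the client and the lawyer.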
  46. Automated Patent Search & Modeling of Claim Interpretations: 4 Levels
     • Domain: Business
     • Application Areas: Compliance, Counseling
     • Sub-Disciplines: Artificial Intelligence, Information Retrieval, Knowledge Representation, Human-Computer Interaction
     • Methodologies: Prototyping, Statistical Modeling, Case Studies, Experimentation, Ethnography, Interviewing
  47. Implications of Both Scenarios
     • More small-scale inventors/entrepreneurs will have access to legal compliance information at an affordable price
     • Clients can choose to pay more for higher levels of service
     • Reform of legal ethics rules may be required to implement either scenario
  48. Legal Prediction
     • Definition
     • 4-Level View
     • Temporal Dimensions
     • Research Results
     • Possible Effects
     • Future Research Directions
  49. Legal Prediction: Definitions
     (1) Methods for calculating the probability of the occurrence or non-occurrence of law-related events or circumstances at a point in time, on the basis of data acquired at an earlier point in time
     (2) Methods for inferring law-related attributes of a population from a sample
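Definition (2) covers estimates such as the prevalence of responsive documents in a collection, inferred from a reviewed random sample. One standard way to attach a confidence interval to such an estimate is the Wilson score interval; the sketch below assumes a simple random sample, and the counts in the usage line are hypothetical.

```python
import math
from statistics import NormalDist

def wilson_interval(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a population proportion estimated from a simple random sample."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return center - half_width, center + half_width

# Hypothetical: 60 of 400 randomly sampled documents were coded responsive.
low, high = wilson_interval(60, 400)
print(f"{low:.3f} to {high:.3f}")
```

The point estimate is 15%, and the 95% interval runs from roughly 11.8% to 18.8%. Unlike the simpler normal-approximation interval, the Wilson interval stays within [0, 1] even when very few sampled documents are responsive.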
  50. Legal Prediction: Application Areas
     • Case outcome / litigation management (Blackman et al., 2012; Ruger et al., 2004; Ribstein, 2012)
     • Imputing default terms in contracts & wills (Porat & Strahilevitz, 2013)
     • Legislative bill passage (Tauberer, 2012; Yano et al., 2012)
  51. Legal Prediction: Application Areas (cont'd)
     • Document relevance (eDiscovery, legal research) (Katz, 2013)
     • Legal spend (in-house counsel) (Katz, 2013)
     • Lawyer hiring (law firms) (Katz, 2013)
     • Legal compliance (clients, in-house counsel) (Ribstein, 2012)
  52. Legal Prediction: Sub-Disciplines
     • Artificial Intelligence
     • Information Retrieval
     • Metadata / Knowledge Representation
     • Text Processing
  53. Legal Prediction: Diverse Methods
     • Bayesian inference (McShane et al., 2012; Guimerà & Sales-Pardo, 2011)
     • Stochastic block modeling (Guimerà & Sales-Pardo, 2011)
     • Classification / decision trees (Ribstein, 2012; Ruger et al., 2004)
     • Crowdsourced prediction markets (Blackman et al., 2012; Ribstein, 2012)
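As a minimal illustration of Bayesian inference, far simpler than the hierarchical model of McShane et al. or the stochastic block models of Guimerà & Sales-Pardo, a beta-binomial model updates a prior belief about an outcome rate (say, how often a particular court reverses) with observed outcomes. The uniform prior and the counts below are hypothetical.

```python
def beta_binomial_update(prior_a: float, prior_b: float, successes: int, failures: int) -> tuple[float, float]:
    """Conjugate update: a Beta(a, b) prior plus binomial data gives a Beta(a + s, b + f) posterior."""
    return prior_a + successes, prior_b + failures

def posterior_mean(a: float, b: float) -> float:
    """Mean of Beta(a, b); also the posterior predictive probability that the next case is a 'success'."""
    return a / (a + b)

# Hypothetical: uniform Beta(1, 1) prior on how often a court reverses,
# then 7 reversals and 3 affirmances are observed.
a, b = beta_binomial_update(1, 1, successes=7, failures=3)
print(posterior_mean(a, b))  # 8 / 12
```

Because the prior counts act like pseudo-observations, a court with few recorded decisions is pulled toward the prior rather than toward an extreme estimate. That stabilizing effect is one reason Bayesian methods suit sparse legal data.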
  54. Legal Prediction: Diverse Methods (cont'd)
     • Machine learning (Katz, 2013)
     • Case-based reasoning (Ribstein, 2012)
     • Surveys (Dimmock & Gerken, 2012; Porat & Strahilevitz, 2013)
     • Regression, maximum likelihood (Dimmock & Gerken, 2012)
  55. Legal Prediction: Model vs. Crowdsourcing
     • Blackman's FantasySCOTUS vs. Martin, Ruger et al.
     • Complementary approaches
  56. Legal Prediction: Three Temporal Dimensions
     • Synchronic: inference from a sample to the parameters of a static population
        ◦ Predictive coding, machine learning
        ◦ Used to collect the data set for a model
     • Diachronic future: inference from a sample at time t to observations at t + 1, where t + 1 is later than today
        ◦ Forward prediction (Katz)
        ◦ Often performed on the data set gathered using synchronic prediction
     • Diachronic past: retrospective prediction
        ◦ Inference from a sample at time t to observations at t + 1, where t + 1 is earlier than today
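The difference between synchronic and diachronic-future prediction shows up in how a model is evaluated. A minimal sketch, assuming each case is a dict with a hypothetical `decided` date field: a synchronic evaluation draws its test cases at random from the whole (static) collection, while a diachronic-future evaluation trains only on cases decided before a cutoff date and tests on later ones, so the model never sees the "future".

```python
import random
from datetime import date

def synchronic_split(cases: list[dict], test_fraction: float, seed: int = 0) -> tuple[list[dict], list[dict]]:
    """Random split: training and test cases come from the same static population."""
    shuffled = cases[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def diachronic_split(cases: list[dict], cutoff: date) -> tuple[list[dict], list[dict]]:
    """Forward split: train on cases decided before the cutoff, test on those decided on or after it."""
    train = [c for c in cases if c["decided"] < cutoff]
    test = [c for c in cases if c["decided"] >= cutoff]
    return train, test

# Hypothetical case records.
cases = [
    {"name": "Case A", "decided": date(2009, 3, 1)},
    {"name": "Case B", "decided": date(2010, 6, 15)},
    {"name": "Case C", "decided": date(2011, 1, 20)},
    {"name": "Case D", "decided": date(2012, 11, 5)},
]
train, test = diachronic_split(cases, cutoff=date(2011, 1, 1))
print([c["name"] for c in train], [c["name"] for c in test])
```

A random split can leak information from later decisions into training, so accuracy measured that way may overstate how well a model would forecast cases that have not yet been decided.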
  57. Legal Prediction: Some Research Results
     • Decision tree > domain experts (Ruger et al.)
     • Crowdsourcing > domain experts (Blackman et al.)
     • Crowdsourcing = decision tree (Blackman et al.)
     • Stochastic block models > case-content-based algorithms (Guimerà & Sales-Pardo)
     • Stochastic block models > domain experts (Guimerà & Sales-Pardo)
  58. Legal Prediction: Possible Effects
     • Lawyer disintermediation (Katz, 2013; Ribstein, 2012)
     • Client empowerment (Ribstein, 2012)
     • Reduction in legal costs (Katz, 2013; Ribstein, 2012)
     • Within businesses, distribution of legal tasks to non-legal personnel (Ribstein, 2012)
  59. Legal Prediction: Future Research Directions
     • Analogical reasoning: development of improved models (Katz)
     • Crowdsourced prediction markets for lower-level courts (Blackman et al.)
     • Automated prediction engines for lower-level courts (Blackman et al.)
  60. References
     Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., & Verkamo, A. I. (1996). Fast discovery of association rules. Advances in Knowledge Discovery and Data Mining, 12:307–328.
     Ashley, K. D., & Brüninghaus, S. (2009). Automatically classifying case texts and predicting outcomes. Artificial Intelligence and Law, 17, 125-165. doi:10.1007/s10506-009-9077-9
     Ashley, K. D., & Bridewell, W. (2010). Emerging AI & Law approaches to automating analysis and retrieval of electronically stored information in discovery proceedings. Artificial Intelligence and Law, 18, 311-320. doi:10.1007/s10506-010-9098-4
     Barnett, T., Godjevac, S., Renders, J.-M., Privault, C., Schneider, J., & Wickstrom, R. (2009, June). Machine learning classification for document review. Paper presented at the DESI III Global E-Discovery/E-Disclosure Workshop: A Pre-Conference Workshop at the twelfth International Conference on Artificial Intelligence and Law, ICAIL 2009, Barcelona, Spain.
     Baron, J. (2011). Law in the age of exabytes: Some further thoughts on 'information inflation' and current issues in e-discovery search. Richmond Journal of Law and Technology, 17(3), Article 9. Retrieved from
     Blackman, J., Aft, A., & Carpenter, C. (2012). FantasySCOTUS: Crowdsourcing a prediction market for the Supreme Court. Northwestern Journal of Technology and Intellectual Property, 10(3), Article 3. Retrieved from
     Cohen, W. W. (1995). Fast effective rule induction. In Machine learning: Proceedings of the twelfth international conference, ML95.
  61. References (cont'd)
     Conrad, J. (2010). E-discovery revisited: The need for artificial intelligence beyond information retrieval. Artificial Intelligence and Law, 18, 321-345. doi:10.1007/s10506-010-9096-6
     Cormack, G. V., Grossman, M. R., Hedin, B., & Oard, D. W. (2011). Overview of the TREC 2010 legal track. In The Nineteenth Text Retrieval Conference (TREC 2010) Proceedings. N.p.: NIST.
     Da Silva Moore v. Publicis Groupe, 287 F.R.D. 182 (S.D.N.Y. 2012).
     DESI IV (2011). [Call for papers:] ICAIL 2011 workshop on setting standards for searching electronically stored information in discovery proceedings (DESI IV Workshop), June 6, 2011, University of Pittsburgh, Pittsburgh, PA.
     DESI V (2013). [Call for papers:] ICAIL 2013 workshop on standards for using predictive coding, machine learning, and other advanced search and review methods in e-discovery (DESI V workshop), June 14, 2013, Consiglio Nazionale delle Ricerche, Rome, Italy.
     Dimmock, S. G., & Gerken, W. C. (2012). Predicting fraud by investment managers. Journal of Financial Economics, 105, 153-173. doi:10.1016/j.jfineco.2012.01.002
     EORHB, Inc. v. HOA Holdings LLC, Civ. Ac. No. 7409-VCL (Del. Ch. Oct. 15, 2012).
     Friedman, N., Geiger, D., & Goldszmidt, M. (1997). Bayesian network classifiers. Machine Learning, 29, 131-163.
     Gibson, I., Rosen, D. W., & Stucker, B. (2010). Additive manufacturing technologies: Rapid prototyping to direct digital manufacturing. New York: Springer.
     Global Aerospace, Inc. v. Landow Aviation, L.P., No. CL 61040 (Va. Cir., Apr. 23, 2012).
     Grossman, M. R., & Cormack, G. V. (2011). Technology-assisted review in e-discovery can be more effective and more efficient than exhaustive manual review. Richmond Journal of Law and Technology, 17(3), Article 11. Retrieved from
     Grossman, M. R., Cormack, G. V., Hedin, B., & Oard, D. W. (2011). Overview of the TREC 2011 legal track. In The Twentieth Text Retrieval Conference (TREC 2011) Proceedings. N.p.: NIST.
  62. References (cont'd)
     Guimerà, R., & Sales-Pardo, M. (2011). Justice blocks and predictability of U.S. Supreme Court votes. PLOS ONE, 6(11), e27188. doi:10.1371/journal.pone.0027188
     Hulicki, M. (2013, May). Recent judgments of the highest court as a step towards objectification of patentability. Paper presented at CICL 2013: Conference on Innovation and Communication Law, Glen Arbor, MI.
     In re Actos (Pioglitazone) Products, No. 6:11-md-2299 (M.D. La., July 27, 2012).
     Joachims, T. (1998). Text categorization with support vector machines: Learning with many relevant features. In C. Nédellec & C. Rouveirol (Eds.), Proceedings of the 10th European Conference on Machine Learning (pp. 137–142).
     Katz, D. M. (2013). Quantitative legal prediction—Or—How I learned to stop worrying and start preparing for the data-driven future of the legal service industry. Emory Law Journal, 62, 101-158.
     Kleen Prods. LLC v. Packaging Corp. of Am., No. 10 C 5711 (N.D. Ill., Sept. 28, 2012).
     LexMachina. (n.d.). About, technology. Retrieved from
     Martin, A. D., & Quinn, K. M. (2002). Dynamic ideal point estimation via Markov chain Monte Carlo for the U.S. Supreme Court, 1953–1999. Political Analysis, 10, 134-153. doi:10.1093/pan/10.2.134
     McShane, B. B., Watson, O. P., Baker, T., & Griffith, S. J. (2012). Predicting securities fraud settlements and amounts: A hierarchical Bayesian model of federal securities class action lawsuits. Journal of Empirical Legal Studies, 9, 482-510. doi:10.1111/j.1740-1461.2012.01260.x
     Mosten, F. S. (1994). Unbundling of legal services and the family lawyer. Family Law Quarterly, 28, 421-449.
     Mulligan, C., & Lee, T. B. (forthcoming). Scaling the patent system. N.Y.U. Annual Survey of American Law. Retrieved from
     Oard, D. W., Baron, J. R., Hedin, B., Lewis, D. D., & Tomlinson, S. (2010). Evaluation of information retrieval for e-discovery. Artificial Intelligence and Law, 18, 347-386. doi:10.1007/s10506-010-9093-9
  63. References (cont'd)
     Oard, D. W., & Webber, W. (2013). Information retrieval for e-discovery. Foundations and Trends in Information Retrieval, 7, 1-141. Retrieved from
     Pace, N. M., & Zakaras, L. (2012). Where the money goes: Understanding litigant expenditures for producing electronic discovery. Santa Monica, CA: Rand Institute for Civil Justice.
     Porat, A., & Strahilevitz, L. J. (2013). Personalizing default rules and disclosure with big data (University of Chicago Coase-Sandor Institute for Law and Economics working paper no. 634, 2nd series). Retrieved from
     Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1, 81-106.
     Ribstein, L. (2012). Delawyering the corporation. Wisconsin Law Review, 2012, 305-332.
     Richards, R. (2009, June). What is legal information? Paper presented at the Conference on Legal Information: Scholarship and Teaching, at the University of Colorado School of Law, Boulder, CO. Retrieved from
  64. References (cont'd)
     Roitblat, H. L., Kershaw, A., & Oot, P. (2010). Document categorization in legal electronic discovery: Computer classification vs. manual review. Journal of the American Society for Information Science and Technology, 61, 70-80. doi:10.1002/asi.21233
     Ruger, T. W., Kim, P. T., Martin, A. D., & Quinn, K. M. (2004). The Supreme Court forecasting project: Legal and political science approaches to predicting Supreme Court decisionmaking. Columbia Law Review, 104, 1150-1210.
     Scherer, S., Wimmer, M. A., & Markisic, S. (2013). Bridging narrative scenario texts and formal policy modeling through conceptual policy modeling. Artificial Intelligence and Law. doi:10.1007/s10506-013-9142-2
     The Sedona Conference. (2009). Commentary on achieving quality in e-discovery. N.p.: The Sedona Conference.
     Tauberer, J. (2012, December 7). Bill prognosis gets a few improvements. GovTrack Blog [web log post]. Retrieved from
     Webber, W. (2011, July). Re-examining the effectiveness of manual review. Paper presented at SIGIR 2011 Information Retrieval for E-Discovery (SIRE) Workshop, Beijing, China.
     Yano, T., Smith, N. A., & Wilkerson, J. D. (2012, October). Textual predictors of bill survival in congressional committees. Paper presented at New Directions in Analyzing Text as Data 2012, Harvard University, Cambridge, MA. Retrieved from