Predictive Coding and E-Discovery in 2015 and Beyond - LegalTechNYC 2013 (Daniel Martin Katz + Michael J. Bommarito II)

Published in: Technology, Education

  1. 1. Predictive Coding and E-Discovery in 2015 and Beyond. Daniel Martin Katz, Michael J. Bommarito II. @computational, computationallegalstudies.com
  2. 2. The 2x2 Machine Learning Spectrum; Info Viz & Pattern Detection; Rates of Scaling
  3. 3. The 2x2 Machine Learning Spectrum
  4. 4. In Order to Understand Where We Are Heading ...
  5. 5. in 2015 and Beyond ...
  6. 6. it is necessary to have insight regarding how predictive coding actually works
  7. 7. Predictive Coding Relies Upon a Particular Class of Machine Learning Methods
  8. 8. Predictive Coding Relies Upon a Particular Class of Machine Learning Methods
  9. 9. The Current Approach is drawn from the family of so-called “supervised methods”
  10. 10. What is the difference between supervised and unsupervised?
  11. 11. As you have likely seen ...
  12. 12. Predictive Coding
  13. 13. Develop a Training Set using human experts
  14. 14. In the simple case, assign objects to two piles
  15. 15. Take This Document Set ...
  16. 16. Apply Human Coders
  17. 17. And Return This ... (yellow = relevant, white = non-relevant)
  18. 18. Non-Relevant / Relevant
  19. 19. Key Insight ...
  20. 20. What Allows A Human To Separate These Two Classes of Documents?
  21. 21. that precise human process is what predictive coding is trying to mimic
  22. 22. Humans are selecting upon features of documents
  23. 23. to place those documents in their respective bins (i.e. relevant, non-relevant)
  24. 24. features = ? (text, author, date, other metadata)
  25. 25. supervised methods “learn” from the training data
  26. 26. but there are different forms of learning by machines ...
  27. 27. There Is Learning Within a Matter (i.e. learning from a specific training set)
  28. 28. But what about using prior matters to inform both feature selection and the weighting of those features?
  29. 29. In other words, it is possible to learn from the experience of having processed documents in the past
  30. 30. both inside a given company and across companies ...
  31. 31. It comes from data aggregation / reusing data
  32. 32. This is Learning and Rule Propagation Across Matters
  33. 33. feedback loops are the best friends of algorithms
  34. 34. feedback loops can make algorithms much smarter ...
  35. 35. Machine Learning Methods 2 x 2. Axes: Supervised vs. Unsupervised; Informed vs. Naive. Cells: Predictive Coding; The Future; Basic Clustering Algorithm
  36. 36. Some Machine Learning Algorithms. Supervised: Statistical models (Bayesian, e.g., Naïve Bayes Classification; Frequentist, e.g., Ordinary Least Squares); Neural Networks (NN); Support Vector Machines (SVM); Random Forests (RF); Genetic Algorithms (GA). Semi/unsupervised: Neural Networks (NN); Clustering (K-means; Hierarchical; Radial Basis (RBF); Graph)
  37. 37. Info Viz & Pattern Detection
  38. 38. Think about the task faced by the intelligence community ...
  39. 39. mountains of information to process
  40. 40. how are those intelligence analysts aided?
  41. 41. Information Visualization
  42. 42. The Visual Cortex is a very powerful CPU ...
  43. 43. We are very good pattern detectors ...
  44. 44. We need a mix of analytics and viz ...
  45. 45. because there are significant efficiency gains to be obtained from applications of sophisticated data visualization techniques
  46. 46. This Next Generation of E-Discovery Software is viz intensive ...
  47. 47. but this is only the beginning ...
  48. 48. including an even more enriched notion of time dynamics ...
  49. 49. Rates of Scaling
  50. 50. Will Discovery Costs Eventually Be Reduced?
  51. 51. Two Scaling Relationships that are in question ...
  52. 52. Cost Per Gig
  53. 53. “[I]n 2001, a 300 Gb legal matter would take 200 attorneys a full year to review, at a cost of about $15 million. In 2003, a similar-sized matter took 100 attorneys 3 weeks to complete, at a cost of $6 million. And in 2006, a 300 Gb investigation took 65 attorneys only 2.5 days to complete, at a cost of $2 million. And now, cases with several hundreds of Gbs are routine.” (Improving Document Review in E-Discovery, FTI Consulting)
  54. 54. Past Rate of ESI Creation
  55. 55. Long Term Rate of ESI Creation ?
  56. 56. Daniel Martin Katz, Associate Professor of Law, Michigan State University. @computational, computationallegalstudies.com, reinventlaw.com, http://about.me/daniel.martin.katz
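
Slides 13 to 25 describe the supervised workflow: human experts code a training set into two piles, and the method "learns" which document features separate them. The sketch below is one hedged illustration of that idea, not the presenters' implementation. It assumes a small Naive Bayes text classifier (one of the supervised methods named on slide 36) with add-one smoothing, uses only word tokens as features, and relies on made-up example documents; the function names `tokenize`, `train`, and `predict` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: features are lowercase whitespace-separated words only.
    return text.lower().split()


def train(labeled_docs):
    """Build per-label word counts from (text, label) pairs coded by humans."""
    word_counts = {}        # label -> Counter of token frequencies
    doc_counts = Counter()  # label -> number of training documents
    for text, label in labeled_docs:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the label with the highest Naive Bayes log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # Log prior: share of training documents carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are skipped
                # Add-one smoothing avoids a zero probability.
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training set standing in for the human-coded piles.
training_set = [
    ("merger pricing agreement with competitor", "relevant"),
    ("pricing discussion before the merger", "relevant"),
    ("lunch menu for friday", "non-relevant"),
    ("office holiday party schedule", "non-relevant"),
]

model = train(training_set)
label_a = predict(model, "draft merger pricing memo")
label_b = predict(model, "friday party lunch")
print(label_a, label_b)
```

Learning across matters, as raised on slides 28 to 32, would correspond to carrying counts or feature weights from prior matters into `train` rather than starting from an empty model each time; that extension is not shown here.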
