Demystifying Predictive Coding Technology
Date: Wednesday, August 13, 2014
Time: 1 p.m. ET / Noon CT / 11 a.m. MT / 10 a.m. PT
Anita Engles, VP Products and Marketing
Daegis
Doug Stewart, VP Sales Support
Daegis
TAR Defined
A process for prioritizing or coding a collection of electronic documents using a computerized system that harnesses human judgments of one or more Subject Matter Expert(s) on a smaller set of documents and then extrapolates those judgments to the remaining Document Population.
* Grossman & Cormack 2012
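To make the definition concrete, here is a minimal sketch of the idea in Python, assuming scikit-learn's TfidfVectorizer and LogisticRegression as stand-ins for whatever engine a given TAR product actually uses; the documents and labels are invented for illustration.
    # Minimal sketch of the core TAR idea: learn from SME-coded documents,
    # then extrapolate those judgments to the rest of the population.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    sme_docs = ["throttle stuck during the test ride", "lunch order for the team"]  # hypothetical
    sme_labels = [1, 0]                       # 1 = responsive, 0 = not responsive
    population = ["rider reported the throttle sticking", "parking garage invoice"]

    vectorizer = TfidfVectorizer()
    X_train = vectorizer.fit_transform(sme_docs)
    X_rest = vectorizer.transform(population)

    model = LogisticRegression().fit(X_train, sme_labels)
    scores = model.predict_proba(X_rest)[:, 1]   # likelihood each document is responsive
    print(list(zip(population, scores.round(2))))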
The TAR Frontlines
• Evaluation of Machine-Learning Protocols for Technology-Assisted Review in Electronic Discovery (2014)
• Maura R. Grossman and Gordon V. Cormack
• http://cormack.uwaterloo.ca/cormack/calstudy/
Key Findings
• Non-Random Selection Methods Work Best for Seed Set
• Active Learning Better than Passive Learning
• Senior-Level Subject Matter Experts are NOT Required to Train System
TAR Steps
Process Overview
• Identifying the Population
• Relatedness Scoring
• Keyword Searching
• Creating the Seed Set
• Assessing Results
• Training
• Producing
Relatedness Scoring
Building the Map
• Step: Build the Map
• Purpose: Measure Relationships
• Variations: Algorithms
• Why It Matters: Core to Predictive Functionality
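The slide does not name the algorithm behind the relatedness map, so the sketch below assumes one common choice, cosine similarity over TF-IDF vectors, purely as an illustration of "measuring relationships" between documents.
    # One common way to "build the map": score pairwise document relatedness
    # with cosine similarity over TF-IDF vectors. The deck leaves the actual
    # algorithm unspecified, so treat this as an illustrative stand-in.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    docs = [
        "throttle stuck while accelerating on the bike",   # hypothetical documents
        "motorcycle throttle cable replacement invoice",
        "quarterly marketing budget review",
    ]
    tfidf = TfidfVectorizer().fit_transform(docs)
    relatedness_map = cosine_similarity(tfidf)    # N x N matrix of relationship scores
    print(relatedness_map.round(2))               # higher values = more closely related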
Keyword Searching
Tried and True
• Step: Validated & Iterative Keyword Searching
• Purpose: Inexpensive Training
• Variations: Not Used in All Approaches
• Why It Matters: Drives Efficiency
Example query: motorcycle or bike AND ((throttle or accel*) w/10 stick)
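As a rough illustration, the snippet below approximates the example query in Python: a motorcycle/bike term combined with "throttle" or an "accel*" wildcard within ten words of "stick". It is not the product's query parser, and real search engines may apply different operator precedence.
    # Rough Python rendering of the slide's example query
    #   motorcycle or bike AND ((throttle or accel*) w/10 stick)
    import re

    def matches(doc: str) -> bool:
        words = re.findall(r"[a-z]+", doc.lower())
        has_vehicle = any(w in ("motorcycle", "bike") for w in words)
        hits = [i for i, w in enumerate(words) if w == "throttle" or w.startswith("accel")]
        sticks = [i for i, w in enumerate(words) if w == "stick"]
        within_10 = any(abs(i - j) <= 10 for i in hits for j in sticks)
        return has_vehicle and within_10

    print(matches("The bike throttle seemed to stick near full accel."))   # True
    print(matches("Bike rental receipt attached."))                        # False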
Seed Set
Building the Seed Set
• Step: Review Strategically Sampled Docs
• Purpose: Generates High-Level Relevancy “Heat Map”
• Variations: Random, Strategic, Judgmental Samples
• Why It Matters: Drives Efficiency
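A minimal sketch of one way to assemble a seed set from the variations listed above, combining a simple random sample with a judgmental sample drawn from keyword hits; the sample sizes and document IDs are invented.
    # Sketch of assembling a seed set: a random sample plus a judgmental
    # sample of keyword hits, queued for SME review.
    import random

    population = [f"doc_{i}" for i in range(10_000)]            # hypothetical doc IDs
    keyword_hits = {f"doc_{i}" for i in range(0, 10_000, 250)}  # stand-in for search hits

    random.seed(42)
    random_sample = random.sample(population, k=100)            # simple random sample
    judgmental_sample = random.sample(sorted(keyword_hits), k=25)

    seed_set = sorted(set(random_sample) | set(judgmental_sample))
    print(len(seed_set), "documents queued for SME review")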
Predicting Responsiveness
The Prediction Engine
[Diagram: the Relatedness Map, the Seed Set / Search results, and Training feed the Prediction Engine, which sorts documents into predictive calls: Definitely responsive, Responsive?, or Definitely Not.]
The three categories of information we know are fed into the system’s algorithm, which evaluates the data to score the likelihood of each document’s being responsive.
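A small sketch of what the prediction engine's output might look like once documents are scored: each score falls into "Definitely", "Responsive?", or "Definitely Not". The 0.8 / 0.2 cutoffs are assumptions made for illustration, not actual product settings.
    # Sort each document's responsiveness score into the three predictive-call
    # bins shown on the slide. Thresholds and scores are illustrative.
    def predictive_call(score: float) -> str:
        if score >= 0.8:
            return "Definitely"
        if score <= 0.2:
            return "Definitely Not"
        return "Responsive?"

    scores = {"doc_1": 0.93, "doc_2": 0.55, "doc_3": 0.04}   # hypothetical scores
    for doc, score in scores.items():
        print(doc, predictive_call(score))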
Assessing the Results
Building the Answer Key
• Step: Assess Accuracy Based on Industry-Standard Metrics
• Purpose: Informs Decision to Stop TAR
• Variations: Simple and Stratified Sampling; Sample Once or Multiple Times
• Why It Matters: Defensibility
[Diagram: predictive calls grouped as Definitely responsive, Responsive?, or Definitely Not.]
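The industry-standard metrics referenced above commonly include precision, recall, and F1, computed against a reviewed sample that serves as the answer key. A minimal worked example with fabricated labels:
    # Build the "answer key" from a reviewed sample, then compare the system's
    # calls against it with precision, recall, and F1.
    truth      = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]   # reviewer coding of the sample
    prediction = [1, 0, 0, 0, 1, 1, 1, 0, 0, 1]   # system's predictive calls

    tp = sum(1 for t, p in zip(truth, prediction) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, prediction) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, prediction) if t == 1 and p == 0)

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")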
Training / Learning
Continual Refinement
[Diagram: predictive calls grouped as Definitely responsive, Responsive?, or Definitely Not.]
Refining keyword searches and manually reviewing the documents with the highest levels of uncertainty moves docs from the middle toward the endpoints.
• Step: Reviewers Train and System Learns
• Purpose: Transfer Subject Matter Expertise to TAR System
• Variations: Active Learning; Passive Learning
• Why It Matters: Dramatic Cost Savings
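To show the active-versus-passive distinction concretely, the sketch below selects the next training batch two ways: active learning takes the documents whose scores sit closest to 0.5 (highest uncertainty), while passive learning takes a random batch. Scores are hypothetical.
    # Active learning routes the most uncertain documents to reviewers;
    # passive learning picks the next batch at random.
    import random

    scores = {"doc_1": 0.97, "doc_2": 0.51, "doc_3": 0.48, "doc_4": 0.12, "doc_5": 0.60}

    def active_batch(scores, k=2):
        # most uncertain first: smallest distance from 0.5
        return sorted(scores, key=lambda d: abs(scores[d] - 0.5))[:k]

    def passive_batch(scores, k=2):
        random.seed(0)
        return random.sample(list(scores), k)

    print("active:", active_batch(scores))    # ['doc_2', 'doc_3']
    print("passive:", passive_batch(scores))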
Post-TAR
Producing the Responsive Documents
• Terminate TAR Review
  • Decision based on Accuracy and Cost Metrics
  • “Stabilization”
• Harvest Predicted Calls
• Review Responsive Docs
• Sample Non-Responsive Docs
• Document Entire Process
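"Stabilization" is not defined precisely in the deck; one plausible reading is that the review stops once the recall estimate from successive assessment rounds stops moving. The sketch below checks that condition with an invented tolerance and history.
    # One possible stopping rule: declare the model stabilized when the last
    # few recall estimates vary by no more than a small tolerance.
    def stabilized(recall_history, tolerance=0.01, window=3):
        if len(recall_history) < window:
            return False
        recent = recall_history[-window:]
        return max(recent) - min(recent) <= tolerance

    recall_by_round = [0.52, 0.68, 0.77, 0.80, 0.805, 0.81]   # hypothetical estimates
    print(stabilized(recall_by_round))   # True: the last three rounds barely move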
Accuracy Metrics
How Accuracy Is Measured
TAR improves the F1 score by moving documents from the false (incorrect) bins to the true bins where they belong.
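A worked example of that claim: with fabricated counts, moving documents out of the false-positive and false-negative bins into the true bins raises F1.
    # F1 before and after training moves documents from the false bins
    # (false positives / false negatives) to the true bins.
    def f1(tp, fp, fn):
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        return 2 * precision * recall / (precision + recall)

    before = f1(tp=600, fp=300, fn=400)   # many docs still in the wrong bins
    after  = f1(tp=900, fp=100, fn=100)   # most have moved to the true bins
    print(f"F1 before: {before:.2f}  F1 after: {after:.2f}")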
Selected TAR Bibliography
TAR Resources
1. Search, Forward: Will Manual Document Review and Keyword Searches be Replaced by Computer-Assisted Coding? (2011)
   • Judge Andrew Peck
   • http://www.law.com/jsp/lawtechnologynews/PubArticleLTN.jsp?id=1202516530534
2. Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient than Exhaustive Manual Review (2011)
   • Maura R. Grossman and Gordon V. Cormack
   • http://jolt.richmond.edu/v17i3/article11.pdf
3. Where the Money Goes: Understanding Litigant Expenditures for Producing Electronic Discovery (2012)
   • RAND Institute for Civil Justice: Nicholas M. Pace and Laura Zakaras
   • http://www.rand.org/pubs/monographs/MG1208.html#abstract
Thank You!
Q&A

Editor's Notes

  • #6 Doug transitions to this slide while explaining the TAR process without senior SMEs doing the training. Anita asks Doug to explain the process with an example (DP NMSIC); this may have to hold until all the deep-dive slides have been explained. Move into the deep dive either when it is clear Doug is walking through the process in depth or at his verbal request to advance to the next slide.
  • #7 Doug explains the process. Anita asks clarifying questions if appropriate.
  • #8 Doug: key point, spend some time. Anita will ask clarifying questions only to steer direction.
  • #9 Doug: segue from the previous slide. Anita: ask clarifying questions if needed.
  • #10 Doug
  • #11 Doug explains the process. Anita asks clarifying questions if appropriate.
  • #12 Doug explains the process. Anita asks clarifying questions if appropriate.
  • #13 Questions to ask Doug: How do you know when to stop? What is harvesting predicted calls, exactly? How would you sample non-responsive docs? How would you document the whole process? If we have not already described how successfully this worked for DP NMSIC, then this is the time to do so briefly and plug for judicial acceptance.
  • #14 Just in case you didn’t have enough info on TAR, let’s dive into what the accuracy measurements mean to you and your review. Time permitting.
  • #15 To get more info….