Getting to a Manageable Review Set

            Intake Data: 100%
              Duplicates: 25%
              Junk/Spam/Porn: 20%
              NR/Priv: 20%
              Non-Responsive: 20%
              Responsive & Priv: 15%
              Produced: 12.25%

            Focus on finding, reviewing & using the “right” data, not just filtering data.

            These figures vary based upon the data set received.

12/5/2011                                                                       1
Review risks
       Failure to collect the right data
       Failure to find responsive documents
       Failure to recognize responsive documents
       Failure to recognize privileged documents
       Inconsistent treatment of documents (e.g., duplicates)
       Failure to complete project in a timely manner

       Sophisticated Tools
            – Understand What They Do and Don’t Do Well
            – Inform Yourself, Speak to References, Consultants
Search Methodologies

                 Visualization, Measurement

Context          Relationship Analysis: documents with causal or sequential relationship
                 Social Network Analysis: relationships among relevant people

Concept          Clustering: similarity of salient features
                 Ontology: generalized words or phrases

Content          Keyword: specific exact words, proximity searches, stemming

Myth
            Keyword Searching is the Way to Go

               If I agree to keyword terms, I am OK
               Missing in Action (Under-inclusive)
               Unwanted Extras (Over-inclusive)
               Multiple subjects/persons (Failure to disambiguate)

            Reality: Keyword Search is one tool among many!




"simple keyword searches end up being both over- and under-inclusive."
     Judge Paul Grimm, Victor Stanley, Inc. v. Creative Pipe, Inc., No. MJG-06-2662, 2008 U.S. Dist. LEXIS 42025 (D. Md. May 29, 2008).




Keyword culling
Keyword Accuracy Example
    Keyword search reduced the document set by only 47%.

    88% of the documents returned by keyword search were not responsive (over-inclusive).

    8,553 responsive documents were missed by keyword search (almost 8% of all responsive documents, under-inclusive).
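Both failure modes on this slide reduce to simple ratios. The sketch below is a minimal illustration; the counts (10,000 documents, 5,300 keyword hits, 636 responsive hits, 690 responsive in total) are hypothetical, chosen only to mirror the slide's percentages, and are not the case data.

```python
from dataclasses import dataclass


@dataclass
class CullingCounts:
    total: int                 # documents in the collection before culling
    retrieved: int             # documents returned by the keyword search
    responsive_retrieved: int  # returned documents judged responsive
    responsive_total: int      # all responsive documents in the collection


def culling_metrics(c: CullingCounts) -> dict:
    """Summarize how much a keyword cull removed and how it erred."""
    missed = c.responsive_total - c.responsive_retrieved
    return {
        # Fraction of the collection removed by the cull.
        "reduction": 1 - c.retrieved / c.total,
        # Fraction of returned documents that were not responsive (over-inclusive).
        "non_responsive_share": 1 - c.responsive_retrieved / c.retrieved,
        # Fraction of responsive documents never returned (under-inclusive).
        "miss_rate": missed / c.responsive_total,
        "missed": missed,
    }


# Hypothetical counts chosen to mirror the slide's percentages.
example = CullingCounts(total=10_000, retrieved=5_300,
                        responsive_retrieved=636, responsive_total=690)
print(culling_metrics(example))
```

In a real matter `responsive_total` is not known in advance; it would have to be estimated, for example by reviewing a random sample of the documents the keyword search excluded.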



Under-Inclusive - Missing in Action
        Missing abbreviations / acronyms / clippings:
            – incentive stock option but not ISO

        Missing inflectional variants:
            – grant but not grants, granted, granting

        Missing spellings or common misspellings:
            – gray but not grey

            – privileged but not priviliged, priviledged, privilidged,
               priveliged, privelidged, priveledged, …

        Missing syntactic variants:
            – board of directors meeting but not meeting of the board of
              directors, BOD meeting, board meeting, BOD mtg…

        Missing synonyms/paraphrases:
            – hire date but not start date
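One way to narrow these gaps is to expand each search term before matching. The sketch below is a minimal illustration, assuming a hand-built expansion table (the `EXPANSIONS` entries are hypothetical) for abbreviations, syntactic variants, and synonyms, and using Python's standard `difflib.SequenceMatcher` similarity ratio as a stand-in for misspelling detection; the 0.8 cutoff is an arbitrary assumption that would need tuning.

```python
import difflib
import re

# Hypothetical expansion table; keys and variants are lowercase.
EXPANSIONS = {
    "incentive stock option": ["iso"],
    "board of directors meeting": ["meeting of the board of directors",
                                   "bod meeting", "board meeting", "bod mtg"],
    "hire date": ["start date"],
}


def tokenize(text):
    """Lowercase word tokens (letters only)."""
    return re.findall(r"[a-z]+", text.lower())


def misspelling_hits(term, text, cutoff=0.8):
    """Distinct tokens whose similarity ratio to `term` is at least `cutoff`."""
    return sorted({tok for tok in tokenize(text)
                   if difflib.SequenceMatcher(None, term, tok).ratio() >= cutoff})


def phrase_hits(phrase, text):
    """The phrase and any listed variants that occur as whole words in `text`."""
    lowered = text.lower()
    variants = [phrase] + EXPANSIONS.get(phrase, [])
    return [v for v in variants
            if re.search(r"\b" + re.escape(v) + r"\b", lowered)]


doc = "Marked priviliged and priviledged. Her start date moved; see BOD mtg notes."
print(misspelling_hits("privileged", doc))
print(phrase_hits("hire date", doc))
print(phrase_hits("board of directors meeting", doc))
```

Inflectional variants (grant, grants, granted) would typically be handled by a stemmer, which this sketch does not include.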
Over-Inclusive - Unwanted Extras (a)
            Options

            Target: Sheila was granted 100,000 options at $10
             Match: What are our options for lunch?
             Match in a signature line:
                     Amanda Wacz
                     Acme Stock Options Administrator
            Destroy
             Target: destroy evidence
             Match in a disclaimer: The information in this email, and any
               attachments, may contain confidential and/or privileged
               information and is intended solely for the use of the named
               recipient(s). Any disclosure or dissemination in whatever form, by
               anyone other than the recipient is strictly prohibited. If you have
               received this transmission in error, please contact the sender
               and destroy this message and any attachments. Thank you.
Over-Inclusive - Unwanted Extras (b)
       alter*

       Target: alter, alters, altered, altering
       Matches: alternate, alternative, alternation, altercate,
                altercation, alterably, …


       grant

       Target: stock option grant
       Matches (names): Grant Woods, Howard Grant
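The `alter*` problem comes from letting a wildcard absorb any suffix. The sketch below contrasts a trailing-wildcard term, translated to a whole-word regular expression, with an explicit list of inflections. The translation function is a hypothetical helper for illustration, not the syntax or behavior of any particular review platform.

```python
import re


def wildcard_to_regex(term):
    """Translate a trailing-* wildcard term such as 'alter*' into a whole-word regex."""
    stem = re.escape(term.rstrip("*"))
    suffix = r"\w*" if term.endswith("*") else ""
    return re.compile(r"\b" + stem + suffix + r"\b", re.IGNORECASE)


# Explicit inflections: only alter, alters, altered, altering.
ALTER_FORMS = re.compile(r"\balter(?:s|ed|ing)?\b", re.IGNORECASE)

text = ("Do not alter the file. She altered it anyway. "
        "Consider an alternative. The altercation was brief.")

print(wildcard_to_regex("alter*").findall(text))  # ['alter', 'altered', 'alternative', 'altercation']
print(ALTER_FORMS.findall(text))                  # ['alter', 'altered']
```

The "grant" name problem is not addressed here; separating the surname from the stock-option sense would likely need context, such as capitalization or nearby terms.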


Failure to Disambiguate
               Words that Relate to Multiple Subjects
            Example: “refund” is used to refer to:
             – FERC-ordered refunds owed by Enron for
               overcharging
             – Tax refunds (both corporate and personal)
             – Mundane business matters

            In a given matter, one sense might be of interest
            while the others are not.
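A keyword alone cannot tell these senses apart, but the words around it often can. The sketch below is a minimal illustration of context-based disambiguation, assuming hypothetical cue-word lists for each sense; a real review would derive such cues from the documents in the matter.

```python
import re

# Hypothetical context cues for each sense of "refund" in this matter.
SENSE_CUES = {
    "ferc": {"ferc", "overcharge", "overcharged", "overcharging", "tariff"},
    "tax": {"tax", "irs", "return", "withholding"},
}


def refund_sense(text):
    """Guess which sense of 'refund' a document uses from co-occurring cue words.

    Returns None if the document does not mention a refund, the best-scoring
    sense if any cue word appears, and "other" otherwise.
    """
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    if not tokens & {"refund", "refunds", "refunded"}:
        return None
    scores = {sense: len(tokens & cues) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"


print(refund_sense("FERC ordered refunds for overcharging in California."))  # ferc
print(refund_sense("Please refund the lunch deposit."))                       # other
```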




Technology Enhanced Review:
            Speed, Predictable Costs, and Accuracy
       Automate any portion of the review

              Example from a real case:
                Source Data: 100%
                Eliminate Duplicates & System Files: 30%
                Non-Responsive Isolation (ontologies): 30%
                NR by Technology Enhanced Review (removed another 18%): 22%
                Responsive by Technology Enhanced Review (removed another 7%): 15%
                Priv by High-Speed Manual Review: 3%


Example: “priv” ontology


                   Valuable, re-usable work product
                   Combines classifiers into concepts, and concepts
                   into bigger concepts
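The slide does not say how the ontology's classifiers are built. As a minimal sketch, assuming simple regular-expression leaf classifiers (all patterns below are hypothetical), the "classifiers into concepts, concepts into bigger concepts" idea can be expressed as composable predicates:

```python
import re


def term(pattern):
    """Leaf classifier: true if the regex pattern occurs in the text."""
    rx = re.compile(pattern, re.IGNORECASE)
    return lambda text: rx.search(text) is not None


def any_of(*concepts):
    """Broader concept: true if any member concept matches."""
    return lambda text: any(c(text) for c in concepts)


def all_of(*concepts):
    """Narrower concept: true only if every member concept matches."""
    return lambda text: all(c(text) for c in concepts)


# Hypothetical leaf classifiers.
lawyer_role = term(r"\b(attorney|counsel|esq)\b")
advice_request = term(r"\b(legal advice|your advice|please advise)\b")
litigation_prep = term(r"\b(work product|anticipation of litigation)\b")

# Leaves combine into concepts, and concepts into a bigger "priv" concept.
attorney_client = all_of(lawyer_role, advice_request)
work_product = litigation_prep
priv = any_of(attorney_client, work_product)

print(priv("Counsel, please advise on the merger terms."))  # True
print(priv("Counsel will join the lunch."))                  # False
```

Because each concept is just a function, a tested concept such as `attorney_client` can be reused in later matters, which is one way to read the slide's "re-usable work product".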




Disclaimer Detection

        Disclaimers can throw off attempts to detect privileged communications
        Prevalent throughout many companies, even on trivial communications
        Detect them automatically, and exclude them from searches
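A minimal sketch of "detect them and exclude them from searches", assuming disclaimers arrive as their own blank-line-separated paragraph and can be recognized by a few hypothetical cue phrases. A production system would curate or learn the disclaimers actually present in the collection.

```python
import re

# Hypothetical cue phrases for recognizing disclaimer paragraphs.
DISCLAIMER_CUES = [
    r"intended solely for the use of the named recipient",
    r"received this (?:transmission|message|e-?mail) in error",
]


def strip_disclaimers(text):
    """Drop blank-line-separated paragraphs that look like email disclaimers."""
    kept = []
    for para in re.split(r"\n\s*\n", text):
        flat = " ".join(para.split())  # normalize line wraps before matching
        if not any(re.search(cue, flat, re.IGNORECASE) for cue in DISCLAIMER_CUES):
            kept.append(para)
    return "\n\n".join(kept)


def mentions_privileged(text):
    """Search for 'privileged' only after disclaimers have been removed."""
    body = strip_disclaimers(text)
    return re.search(r"\bprivileged\b", body, re.IGNORECASE) is not None


DISCLAIMER = ("The information in this email, and any attachments, may contain "
              "confidential and/or privileged\ninformation and is intended solely "
              "for the use of the named\nrecipient(s).")
print(mentions_privileged("Lunch at noon?\n\n" + DISCLAIMER))                    # False
print(mentions_privileged("This is privileged legal advice.\n\n" + DISCLAIMER))  # True
```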



[Figure: overlapping document categories: Responsive; Privileged by Actor Only; Privileged by Actor and Term; Privileged by Term Only; Privileged by Disclaimer Only; Domain of Disclaimer Detection]
Priv Logs
       Expensive - But Do NOT Have to Be
       In re Vioxx Products Liability Litigation (E.D. La. 2007)
       Merck’s Priv Log had 30,000 items on it
            – How to Make a Judge Angry
            – How to Waste Client Money
            – How to Attract Sanctions




Transparency of Process
       Discussing Review Protocols
        – Provide transparent, defensible, sophisticated search
          based on document content
        – Clustering, Ontologies, Analytics, and yes, sometimes
          Keywords too
       Develop search methodologies for each case
        – Use technology experts in consultation with case / legal
          experts
       Results verifiable by Quality Control
        – Defensible sampling
       Sophisticated Tools
        – Understand What They Do and Don’t Do Well
        – Inform Yourself, Speak to References, Consultants
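The slide calls for results verifiable by quality control through defensible sampling but does not name a method. Purely as an illustration of one common approach, the sketch below draws a reproducible simple random sample and puts a Wilson score confidence interval around the responsive rate found in it. The seed, sample size, collection size, and hit count are all hypothetical.

```python
import math
import random


def qc_sample(doc_ids, n, seed=2011):
    """Draw a reproducible simple random sample of documents for QC review."""
    rng = random.Random(seed)
    return rng.sample(list(doc_ids), n)


def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if n == 0:
        raise ValueError("sample size must be positive")
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical example: review 400 documents sampled from the set the process
# marked non-responsive, and count how many are actually responsive.
sample = qc_sample(range(100_000), 400)
low, high = wilson_interval(hits=6, n=len(sample))
print(f"responsive rate in discarded set: {low:.3%} to {high:.3%}")
```

Fixing the seed lets the other side reproduce exactly which documents were sampled, which supports the transparency the slide asks for.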
Blair & Maron: Keyword search is incomplete
[Chart: responsive documents (0% to 100%), Predicted ("What the lawyers thought they were finding") vs. Obtained ("What they actually found")]
                                                 Blair and Maron, Communications of the ACM, 28, 1985, 289-299
Blair and Maron
         “It is impossibly difficult for users to predict the exact words, word combinations, and phrases that are used by all (or most) relevant documents and only (or primarily) by those documents.”




Blair & Maron Study: 20% recall
Lawyers picked 3 key terms; B & M found 26 more
B & M found 26 more
Defense: “Unfortunate incident”
Plaintiff: “Disaster”



                                     Blair and Maron, Communications of the ACM, 28, 1985, 289-299
Predictive Coding
Document categorization in Legal Discovery:
Computer Classification vs. Manual Review
Herbert L. Roitblat, Anne Kershaw, & Patrick Oot
[Chart: agreement with original review (0.5 to 1.0) for Team A and Team B (manual review) and System C and System D (computer classification)]
Roitblat, Kershaw, & Oot, 2010, JASIST
Gold Standard
Turing test




Alan Turing, 1912-1954
Substantial disagreement between Team A & Team B (overlap: 28%)
[Chart: responsive documents identified by Team A only (629), by both teams (580), and by Team B only (858); axis 0 to 2000 responsive documents]
Roitblat, Kershaw, & Oot, 2010, JASIST
Conclusion
The computer systems yielded a level of performance comparable to manual review
Fewer people, less time, less cost
Measure performance to evaluate
Will lawyers lose control?




Computer system amplifies the intelligence of the expert
Will lawyers lose their jobs?
Tap into the mind of an expert
Technology-Enhanced or Automated Review




Setup
    Sample
    Expert judges sample: Responsive or Non-responsive
    Model learns
    Model predicts
    Repeat as needed
    Model categorizes all remaining documents: Responsive or Non-responsive
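The setup above can be sketched in a few dozen lines. This is a minimal illustration, not OrcaTec's method: the source does not name the learning algorithm, so a multinomial Naive Bayes classifier with add-one smoothing stands in for "the model", and the sample documents and labels are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        labels = list(labels)
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels))
                          for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(tokenize(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def _log_score(self, doc, c):
        v = len(self.vocab)
        return self.log_prior[c] + sum(
            math.log((self.counts[c][w] + 1) / (self.totals[c] + v))
            for w in tokenize(doc))

    def predict(self, doc):
        return max(self.classes, key=lambda c: self._log_score(doc, c))


def review_round(sample_docs, expert_labels, remaining_docs):
    """One pass: learn from the expert-judged sample, then categorize the rest."""
    model = NaiveBayes().fit(sample_docs, expert_labels)
    return {doc: model.predict(doc) for doc in remaining_docs}


# Hypothetical expert-judged sample (R = responsive, N = non-responsive).
sample = ["stock option grant to Sheila", "option grant vesting schedule",
          "lunch plans on Friday", "fantasy football picks"]
labels = ["R", "R", "N", "N"]
remaining = ["new option grant approved", "football on Friday"]
print(review_round(sample, labels, remaining))
```

The sketch shows a single round. "Repeat as needed" would mean the expert judges more documents, they are added to the sample, and `review_round` is rerun until performance measured on a sample is acceptable.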
Predictive coding achieves much higher accuracy (Jaccard)

  Team A Only   Team A and Team B              Team B
  0.304         0.281                          0.415

  Humans        Humans and Predictive Coding   Predictive Coding
  0.186         0.688                          0.126

  (Proportions of responsive documents)
  Data from Roitblat, et al. and an Internal OrcaTec Case Study
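The Jaccard index is the overlap of two reviewers' responsive sets divided by their union. Plugging in the Team A / Team B counts from the Roitblat et al. chart (629 Team A only, 580 both, 858 Team B only) reproduces the 0.281 above, and 629/2067 and 858/2067 give the 0.304 and 0.415 values. A minimal sketch:

```python
def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets of document IDs."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def jaccard_from_counts(only_a, both, only_b):
    """Same measure computed from the three regions of a two-set Venn diagram."""
    return both / (only_a + both + only_b)


# Team A only, both teams, Team B only, from the Roitblat et al. chart.
print(round(jaccard_from_counts(629, 580, 858), 3))  # 0.281
```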
Why doesn’t everyone use it?

•   Attorneys don’t understand the
    technology
•   May not be aware of the accuracy data
•   May not understand how to fit it into their workflow
•   Not in everyone’s economic interest
•   Acceptable to judges?
Defensible?

Measure     TREC 2008   Roitblat et al. Team A   Roitblat et al. Team B   Predictive Coding*
Precision   0.210       0.197                    0.183                    0.899
Recall      0.555       0.488                    0.539                    0.873

*OrcaTec internal result
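Precision is the share of retrieved documents that are responsive; recall is the share of responsive documents that were retrieved. The table's figures come from the cited studies; the sketch below only shows how such figures are computed, using toy document IDs. F1 is included as one common way to combine the two into a single number; it is not a measure the slide reports.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved set against the truly relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def f1(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Toy example: 4 documents retrieved, 5 truly relevant, 2 in common.
p, r = precision_recall({1, 2, 3, 4}, {3, 4, 5, 6, 7})
print(p, r)  # 0.5 0.4
```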
Thank you!




               Herb Roitblat: 770-650-7706x229, herb@orcatec.com
               Sonya Sigler: 650-281-8325, sonya@sigler.name




LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 

SF Women in eDiscovery Sept 2011

  • 1. Getting to a Manageable Review Set
      Funnel figure: Intake Data 100%; Duplicates 25%; Junk/Spam/Porn 20%; NR/Priv 20%; Non-Responsive 20%; Responsive & Priv 15%; Produced 12.25%
      Focus on finding, reviewing & using the “right” data, not just filtering data
      These figures vary based upon the data set received. 12/5/2011
  • 2. Review risks
      Failure to collect the right data
      Failure to find responsive documents
      Failure to recognize responsive documents
      Failure to recognize privileged documents
      Inconsistent treatment of documents (e.g., duplicates)
      Failure to complete project in a timely manner
      Sophisticated Tools
        – Understand What They Do and Don’t Do Well
        – Inform Yourself, Speak to References, Consultants
  • 3. Search Methodologies
      Visualization; Measurement
      Context
        – Relationship Analysis: documents with causal or sequential relationship
        – Social Network Analysis: relationships among relevant people
      Concept
        – Clustering: similarity of salient features
        – Ontology: generalized words or phrases
      Content
        – Keyword: specific exact words, proximity searches, stemming
  • 4. Myth: Keyword Searching is the Way to Go
      If I agree to keyword terms, I am OK
        – Missing in Action (Under-inclusive)
        – Unwanted Extras (Over-inclusive)
        – Multiple subjects/persons (Disambiguate)
      Reality: Keyword Search is one tool among many!
  • 5. Keyword culling
      "simple keyword searches end up being both over- and under-inclusive."
      Judge Paul Grimm, Victor Stanley, Inc. v. Creative Pipe, Inc., No. MJG-06-2662, 2008 U.S. Dist. LEXIS 42025 (D. Md. May 29, 2008).
  • 6. Keyword Accuracy Example
      Keyword search reduced the document set by only 47%
      88% of the documents returned by keyword search were not responsive (over-inclusive)
      8,553 responsive documents were missed by keyword search, almost 8% of all responsive documents (under-inclusive)
  • 7. Under-Inclusive: Missing in Action
      Missing abbreviations / acronyms / clippings:
        – incentive stock option but not ISO
      Missing inflectional variants:
        – grant but not grants, granted, granting
      Missing spellings or common misspellings:
        – gray but not grey
        – privileged but not priviliged, priviledged, privilidged, priveliged, privelidged, priveledged, …
      Missing syntactic variants:
        – board of directors meeting but not meeting of the board of directors, BOD meeting, board meeting, BOD mtg, …
      Missing synonyms/paraphrases:
        – hire date but not start date
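To make the under-inclusive failure concrete, here is a minimal Python sketch under stated assumptions: the four example documents are made up, and `VARIANTS`, `exact_hits`, and `variant_hits` are hypothetical names, not part of any review product. An exact-phrase search misses the acronym, inflection, misspelling, and synonym cases listed on the slide; a hand-built table of variant patterns recovers them.

```python
import re

# Made-up example documents (hypothetical, not from a real matter).
DOCS = [
    "The board granted an incentive stock option to Sheila.",
    "ISO grants were approved at the BOD mtg.",
    "Please mark this priviliged and confidential.",
    "Her start date was moved to March.",
]

# Hypothetical hand-built variant patterns for each search term.
VARIANTS = {
    "incentive stock option": r"\b(incentive stock options?|ISOs?)\b",  # acronym
    "grant": r"\bgrant(s|ed|ing)?\b",                                   # inflections
    "privileged": r"\bpriv[ie]l[ie]d?ged\b",                            # common misspellings
    "hire date": r"\b(hire|start) date\b",                              # synonym
}


def exact_hits(term, docs):
    """Indexes of docs containing the literal term as a whole word or phrase."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [i for i, doc in enumerate(docs) if pattern.search(doc)]


def variant_hits(term, docs):
    """Indexes of docs matching any listed variant of the term."""
    pattern = re.compile(VARIANTS[term], re.IGNORECASE)
    return [i for i, doc in enumerate(docs) if pattern.search(doc)]


for term in VARIANTS:
    print(f"{term!r}: exact {exact_hits(term, DOCS)}, variants {variant_hits(term, DOCS)}")
```

Widening a term this way raises recall but can also add unwanted hits (ISO has other meanings), which is the over-inclusive problem on the next two slides.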
  • 8. Over-Inclusive: Unwanted Extras (a)
      Options
        Target: Sheila was granted 100,000 options at $10
        Match: What are our options for lunch?
        Match in a signature line: Amanda Wacz, Acme Stock Options Administrator
      Destroy
        Target: destroy evidence
        Match in a disclaimer: The information in this email, and any attachments, may contain confidential and/or privileged information and is intended solely for the use of the named recipient(s). Any disclosure or dissemination in whatever form, by anyone other than the recipient is strictly prohibited. If you have received this transmission in error, please contact the sender and destroy this message and any attachments. Thank you.
  • 9. Over-Inclusive: Unwanted Extras (b)
      alter*
        Target: alter, alters, altered, altering
        Matches: alternate, alternative, alternation, altercate, altercation, alterably, …
      grant
        Target: stock option grant
        Matches names: Grant Woods, Howard Grant
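A small Python sketch of why a root expander such as `alter*` over-collects, assuming the common convention that a trailing `*` matches any word ending (the `wildcard_to_regex` helper and the sample snippets are hypothetical). Enumerating only the intended inflections, or requiring the surrounding phrase, removes the unwanted extras.

```python
import re

# Made-up snippets echoing the slide's examples (hypothetical).
TEXTS = [
    "Do not alter the ledger.",
    "The ledger was altered last week.",
    "Consider an alternative vendor.",
    "There was an altercation in the lobby.",
    "Please send the stock option grant letter.",
    "Forward this to Grant Woods.",
]


def wildcard_to_regex(query):
    """Translate a simple query into a regex; a trailing '*' matches any word ending."""
    if query.endswith("*"):
        return re.compile(r"\b" + re.escape(query[:-1]) + r"\w*", re.IGNORECASE)
    return re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)


def hits(pattern, texts):
    return [i for i, text in enumerate(texts) if pattern.search(text)]


broad = wildcard_to_regex("alter*")                          # root expander
narrow = re.compile(r"\balter(s|ed|ing)?\b", re.IGNORECASE)  # only the intended inflections
bare_grant = wildcard_to_regex("grant")                      # also hits the name Grant
grant_phrase = wildcard_to_regex("stock option grant")       # requires the phrase context

print(hits(broad, TEXTS), hits(narrow, TEXTS))         # [0, 1, 2, 3] [0, 1]
print(hits(bare_grant, TEXTS), hits(grant_phrase, TEXTS))  # [4, 5] [4]
```

The trade-off runs the other way too: the phrase query would miss "granted 100,000 options", so tightening a term should be checked against the under-inclusive cases.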
  • 10. Failure to Disambiguate Words that Relate to Multiple Subjects
      Example: refund is used to refer to:
        – FERC-ordered refunds owed by Enron for overcharging
        – Tax refunds (both corporate and personal)
        – Mundane business matters
      In a given matter, one might be of interest while the others are not
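One way to disambiguate is a proximity search: keep a "refund" hit only when a subject-specific context word appears nearby. The Python sketch below assumes hypothetical term and context lists and a 10-word window; real review tools express this with their own proximity syntax.

```python
import re

# Hypothetical term forms and context words for the FERC-refund subject.
REFUND_FORMS = {"refund", "refunds", "refunded"}
FERC_CONTEXT = {"ferc", "overcharge", "overcharges", "overcharging"}


def words(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def near(text, term_forms, context, window):
    """True if a term form occurs within `window` words of a context word."""
    tokens = words(text)
    term_pos = [i for i, w in enumerate(tokens) if w in term_forms]
    context_pos = [i for i, w in enumerate(tokens) if w in context]
    return any(abs(i - j) <= window for i in term_pos for j in context_pos)


DOCS = [
    "FERC ordered Enron to pay refunds for overcharging customers.",
    "My tax refund arrived yesterday.",
    "The customer asked for a refund on the printer.",
]

keyword_only = [any(w in REFUND_FORMS for w in words(d)) for d in DOCS]
disambiguated = [near(d, REFUND_FORMS, FERC_CONTEXT, 10) for d in DOCS]
print(keyword_only)   # [True, True, True]
print(disambiguated)  # [True, False, False]
```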
  • 11. Technology Enhanced Review: Speed, Predictable Costs, and Accuracy
      Automate any portion of the review. Example from a real case:
        – Source Data: 100%
        – Eliminate Duplicates & System Files
        – Non-Responsive Isolation with ontologies: NR by 30%
        – Responsive by Technology Enhanced Review (removed another 18%)
        – Priv by Technology Enhanced Review (removed another 7%)
        – High-Speed Manual Review
      Other figure labels: 30%, 22%, 3%, 15%
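The first automated step, eliminating duplicates, is often done by hashing. This Python sketch is an assumption, not the slide's method: it hashes whitespace-normalized text with SHA-256, whereas review platforms often hash native files or selected email fields instead. It also records which kept document each duplicate matches, so one coding decision can be applied consistently to every copy.

```python
import hashlib


def content_hash(text):
    """SHA-256 of whitespace-normalized text, so reflowed copies hash identically."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(docs):
    """Keep the first document per hash; record which kept doc each duplicate matches."""
    first_seen = {}
    kept, duplicate_of = [], {}
    for doc_id, text in docs:
        h = content_hash(text)
        if h in first_seen:
            duplicate_of[doc_id] = first_seen[h]
        else:
            first_seen[h] = doc_id
            kept.append(doc_id)
    return kept, duplicate_of


# Hypothetical collection: D2 is D1 with different spacing.
collection = [
    ("D1", "Q3 option grant memo"),
    ("D2", "Q3  option grant memo\n"),
    ("D3", "Lunch options?"),
]
kept, duplicate_of = deduplicate(collection)
print(kept, duplicate_of)  # ['D1', 'D3'] {'D2': 'D1'}
```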
  • 12. Example: “priv” ontology
      Valuable, re-usable work product
      Combines classifiers into concepts, and concepts into bigger concepts
  • 13. Disclaimer Detection
      Disclaimers can throw off attempts to detect privileged communications
      Prevalent throughout many companies, even on trivial communications
      Detect them automatically, and exclude them from searches
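A minimal Python sketch of "detect them automatically, and exclude them from searches", assuming the simplest possible detector: exact matching against a library of known disclaimer templates after normalizing case and whitespace. The template list and function names are hypothetical; production tools may use fuzzier matching to catch reworded disclaimers.

```python
import re


def normalize(text):
    """Lowercase and collapse whitespace so line-wrapped copies compare equal."""
    return " ".join(text.lower().split())


# Hypothetical disclaimer library, seeded with the closing sentence quoted on slide 8.
DISCLAIMERS = [
    "If you have received this transmission in error, please contact the sender "
    "and destroy this message and any attachments.",
]


def strip_disclaimers(text, templates=DISCLAIMERS):
    """Remove every known disclaimer template from the normalized text."""
    body = normalize(text)
    for template in templates:
        body = body.replace(normalize(template), " ")
    return body


def keyword_hit(text, term):
    """Whole-word search that ignores disclaimer text."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return re.search(pattern, strip_disclaimers(text)) is not None


email_a = ("See you at lunch.\n\nIf you have received this transmission in error, "
           "please contact the sender\nand destroy this message and any attachments.")
email_b = "We should destroy evidence before the audit."
print(keyword_hit(email_a, "destroy"), keyword_hit(email_b, "destroy"))  # False True
```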
  • 14. Venn diagram labels: Responsive; Privileged by Actor Only; Privileged by Actor and Term; Privileged by Term Only; Privileged by Disclaimer Only; Domain of Disclaimer Detection
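One possible reading of the diagram's regions, sketched in Python: a document is flagged by actor (an attorney among its participants), by term (privilege vocabulary in the body), both, or only by a disclaimer. The attorney addresses, term list, and `priv_category` function are hypothetical placeholders, and the body is assumed to have disclaimers stripped first so disclaimer wording cannot trigger the term test.

```python
# Hypothetical inputs: attorney addresses and privilege terms are placeholders.
ATTORNEYS = {"counsel@acme.example", "jdoe@lawfirm.example"}
PRIV_TERMS = ("legal advice", "attorney-client", "privileged")


def priv_category(participants, body, has_disclaimer):
    """Assign one region of the diagram.

    `body` should already have disclaimers stripped, so disclaimer wording
    cannot trigger the term test.
    """
    by_actor = bool(ATTORNEYS & set(participants))
    by_term = any(term in body.lower() for term in PRIV_TERMS)
    if by_actor and by_term:
        return "Privileged by Actor and Term"
    if by_actor:
        return "Privileged by Actor Only"
    if by_term:
        return "Privileged by Term Only"
    if has_disclaimer:
        return "Privileged by Disclaimer Only"
    return "Not flagged"


print(priv_category(["counsel@acme.example", "cfo@acme.example"], "We need legal advice on the grant.", False))
print(priv_category(["cfo@acme.example"], "Lunch on Friday?", True))
```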
  • 15. Priv Logs Are Expensive, But Do NOT Have to Be
      In re Vioxx Products Liability Litigation (E.D. La. 2007): Merck’s priv log had 30,000 items on it
        – How to Make a Judge Angry
        – How to Waste Client Money
        – How to Attract Sanctions
  • 16. Transparency of Process
      Discussing Review Protocols
        – Provide transparent, defensible, sophisticated search based on document content
        – Clustering, Ontologies, Analytics, and yes, sometimes Keywords too
      Develop search methodologies for each case
        – Use technology experts in consultation with case / legal experts
      Results verifiable by Quality Control
        – Defensible sampling
      Sophisticated Tools
        – Understand What They Do and Don’t Do Well
        – Inform Yourself, Speak to References, Consultants
  • 17. Blair & Maron: Keyword search is incomplete
      Bar chart, responsive documents (0 to 100%): "Predicted" (what the lawyers thought they were finding) vs. "Obtained" (what they actually found)
      Blair and Maron, Communications of the ACM, 28, 1985, 289-299
  • 18. Blair and Maron
      “It is impossibly difficult for users to predict the exact words, word combinations, and phrases that are used by all (or most) relevant documents and only (or primarily) by those documents.”
      Blair & Maron Study: 20% recall
      Lawyers picked 3 key terms; B & M found 26 more
        – Defense: “Unfortunate incident”
        – Plaintiff: “Disaster”
      Blair and Maron, Communications of the ACM, 28, 1985, 289-299
  • 20. Document categorization in Legal Discovery: Computer Classification vs. Manual Review
      Herbert L. Roitblat, Anne Kershaw, & Patrick Oot
  • 21. Bar chart: agreement with original review (axis 0.5 to 1) for Team A and Team B (manual review) and System C and System D (computer classification)
      Roitblat, Kershaw, & Oot, 2010, JASIST
  • 24. Substantial disagreement between Team A & Team B
      Responsive documents: Team A only 629; Both 580; Team B only 858 (overlap 28%)
      Roitblat, Kershaw, & Oot, 2010, JASIST
  • 25. Conclusion
      The computer systems yielded a comparable level of performance relative to manual review
      Fewer people, less time, less cost
      Measure performance to evaluate
  • 26. Will lawyers lose control?
      Computer system amplifies the intelligence of the Expert
  • 28. Tap into the mind of an expert
  • 29. Technology-Enhanced or Automated Review
  • 30. Setup
      Sample (Responsive / Non-responsive)
      Expert judges responsive sample
      Model learns
      Model predicts: Responsive / Non-responsive
      Repeat as needed
      Model categorizes all remaining documents
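The setup loop above can be sketched in Python as: an expert labels a sample, a model learns from it, and the model then categorizes the remaining documents. The classifier here is a toy multinomial naive Bayes with add-one smoothing, swapped in purely for illustration; the slide does not say which model OrcaTec used, and all documents and labels are hypothetical. "Repeat as needed" would mean the expert reviews a fresh sample of predictions, the corrections are added to `sample`, and the model is retrained.

```python
import math
import re
from collections import Counter


def tokens(text):
    return re.findall(r"[a-z]+", text.lower())


def train(labeled):
    """Fit word counts per class from expert-labeled (text, is_responsive) pairs."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, label in labeled:
        word_counts[label].update(tokens(text))
        doc_counts[label] += 1
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Multinomial naive Bayes with add-one smoothing; True means responsive."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in (True, False):
        total_words = sum(word_counts[label].values())
        score = math.log((doc_counts[label] + 1) / (total_docs + 2))
        for w in tokens(text):
            if w in vocab:  # ignore words never seen in the training sample
                score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return scores[True] > scores[False]


def categorize(sample, remaining):
    """Learn from the expert-judged sample, then label every remaining document."""
    model = train(sample)
    return [predict(model, text) for text in remaining]


# Hypothetical expert-judged sample and unreviewed documents.
sample = [
    ("stock option grant approved by the board", True),
    ("option grant pricing for executives", True),
    ("lunch options this friday", False),
    ("fantasy football picks", False),
]
remaining = ["board approved the grant of options", "friday lunch plans"]
print(categorize(sample, remaining))  # [True, False]
```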
  • 31. Predictive coding achieves much higher accuracy (Jaccard)
      Responsive documents, human vs. human: Team A only 0.304; Team A and Team B 0.281; Team B only 0.415
      Responsive documents, humans vs. predictive coding: Humans only 0.186; Humans and Predictive Coding 0.688; Predictive Coding only 0.126
      Data from Roitblat, et al. and an Internal OrcaTec Case Study
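The Jaccard index is the size of the overlap divided by the size of the union, which for a two-set Venn diagram is the middle region over the sum of all three regions. The short Python check below applies that formula to the numbers already in this deck: the slide-24 counts give about 28%, matching the 0.281 human-vs-human figure here, while humans vs. predictive coding gives 0.688.

```python
def jaccard(only_a, both, only_b):
    """Jaccard index: size of the intersection divided by size of the union,
    computed from the three regions of a two-set Venn diagram."""
    return both / (only_a + both + only_b)


# Counts from slide 24: Team A only, both teams, Team B only.
print(round(jaccard(629, 580, 858), 3))        # 0.281
# Proportions from slide 31 (each diagram's regions sum to 1).
print(round(jaccard(0.304, 0.281, 0.415), 3))  # 0.281
print(round(jaccard(0.186, 0.688, 0.126), 3))  # 0.688
```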
  • 32. Why doesn’t everyone use it?
      • Attorneys don’t understand the technology
      • May not be aware of the accuracy data
      • May not understand how to fit it into their workflow
      • Not in everyone’s economic interest
      • Acceptable to judges?
  • 33. Defensible?
      Measure     TREC 2008   Roitblat et al. Team A   Roitblat et al. Team B   Predictive Coding*
      Precision   0.210       0.197                    0.183                    0.899
      Recall      0.555       0.488                    0.539                    0.873
      *OrcaTec internal result
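Both measures in the table can be computed from two sets of document IDs: precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. The Python sketch below uses a hypothetical toy review, not data from any of the studies cited.

```python
def precision_recall(retrieved, relevant):
    """Precision: share of retrieved documents that are relevant.
    Recall: share of relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy review: 10 documents retrieved, 8 truly responsive, 4 in common.
retrieved = set(range(1, 11))
relevant = {1, 2, 3, 4, 11, 12, 13, 14}
print(precision_recall(retrieved, relevant))  # (0.4, 0.5)
```

Read this way, the TREC 2008 column means that about 21% of the retrieved documents were relevant and 55.5% of the relevant documents were retrieved.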
  • 34. Thank you!
      Herb Roitblat: 770-650-7706x229, herb@orcatec.com
      Sonya Sigler: 650-281-8325, sonya@sigler.name

Editor's Notes

  1. Keyword and Boolean selection / searching yielded only 20% of the responsive documents.
  2. OrcaTec’s performance compares very favorably to similar measures observed using teams of human reviewers and other predictive coding systems. In the TREC 2008 ad hoc task, the highest recall achieved by a system was 0.555 (i.e., 55.5% of the documents identified as relevant were retrieved; Run “wat7fuse”). The precision corresponding to that level of recall was 0.210, meaning that 21% of the retrieved documents were determined to be relevant. Roitblat, Kershaw, and Oot measured precision and recall for two human teams: Team A yielded precision of 0.197 and recall of 0.488; Team B yielded precision of 0.183 and recall of 0.539.