
Taming Big Data within the Corporate Litigation Lifecycle


Presentation given Sept. 13, 2012, at the Thomson Reuters Managing Litigation Series program in Chicago. Presenters were Chris Toomey, director of Alliance and Channel Development at Catalyst Repository Systems, and Jeremy Greshin, managing director, Legal Solutions, at StoredIQ.



  1. Taming Big Data within the Corporate Litigation Lifecycle. Jeremy Greshin (StoredIQ) and Chris Toomey (Catalyst).
  2. True or False? Every two days we create as much information as we did from the dawn of civilization up until 2003. Unstructured data is the largest and fastest-growing segment of the digital universe, growing at more than 62% a year. On average, more than 50% of data is more than 3 years old. 91% of data is never accessed beyond 90 days after it was created. 69% of data stored has no current business value. Collecting and reviewing data for eDiscovery costs $17,500/gigabyte. Review makes up the largest percentage of e-discovery production costs. (Sources: Eric Schmidt of Google, Gartner, IDC, CGOC, EDRM, Sedona, RAND study)
  3. Info Management Maturity Curve: chart plotting business value against data intelligence (low to high), moving from tactical "Big Data" to strategic "Big Business" through the stages Find & Analyze Data, Classification & Relevance, Business Process Optimization, Corporate Policy Enforcement, and Data-driven Enterprise.
  4. Identify Your Data through a "Data Topology" Map
  5. Classify Data for Relevance, in four phases of defensible ESI deletion: PROTECT (protect high-value data through identification and isolation), DESTROY (identify and remove data that is unauthorized or not related to general business), EXPIRE (delete aged data that is not on permanent or legal hold), and CLASSIFY (an optional phase for planning an information governance project). The slide's example reduces a 100 TB data source to a 50-65 TB retention platform, with roughly 5% on permanent/legal hold and 30-50% defensibly deleted.
  6. Intelligent Approach to ECA Before Collection: "Early Case Assessment" can occur prior to preservation and collection, without moving data from where it natively resides. ECA reduces the amount of downstream data, qualitatively enriches the data for downstream review, enables sooner and more accurate assessment of the matter, and lowers the cost of formal review and analysis. (Shown against the EDRM stages of Information Management, Identification, Preservation, Collection, Processing, Review, Analysis, Production, and Presentation, with data volume falling as relevance rises.)
  7. Transition into Litigation Management: the Electronic Discovery Reference Model, spanning Records Management, Identification, Preservation, Collection, Processing, Analysis, Review, Production, and Trial/Hearing, with actual data volume falling as the percentage of relevant data rises. "Seventy percent of e-discovery costs are spent on processing, analysis, review, and production." (Forrester)
  8. E-Discovery Costs: Collection 8%, Processing 19%, Review 73%.
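The cost split on this slide can be combined with the $17,500-per-gigabyte figure cited on slide 2 for a rough per-matter estimate. A minimal sketch; the function name and the 100 GB example are invented for illustration:

```python
# Illustrative only: combines the deck's $17,500/GB figure (slide 2)
# with this slide's phase split (Collection 8%, Processing 19%, Review 73%).
COST_PER_GB = 17_500
SPLIT = {"Collection": 0.08, "Processing": 0.19, "Review": 0.73}

def cost_breakdown(gigabytes: float) -> dict:
    """Return the estimated dollar cost of each e-discovery phase."""
    total = gigabytes * COST_PER_GB
    return {phase: total * share for phase, share in SPLIT.items()}

for phase, dollars in cost_breakdown(100).items():
    print(f"{phase}: ${dollars:,.0f}")
```

On a hypothetical 100 GB matter, review alone accounts for roughly $1.28M of a $1.75M total, which is why the deck focuses on shrinking the review set.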
  9. Keyword Searching – The Basics: "Attorneys working with experienced paralegals were able only to find about 20% of the relevant documents despite their belief that they had found more than 75% of the relevant documents." David C. Blair & M.E. Maron, An Evaluation of Retrieval Effectiveness for a Full-Text Document-Retrieval System (1985).
  10. Are Lawyers Qualified to Search? "Whether search terms will yield the information sought is a complicated question involving the sciences of computer technology, statistics and linguistics." Magistrate Facciola in U.S. v. O'Keefe (D.D.C. Feb. 2008). "[A]ll keyword searches are not created equal…. The only prudent way to test the reliability of the keyword search is to perform some appropriate sampling." Magistrate Grimm in Victor Stanley v. Creative Pipe (D. Md. May 2008).
  11. Are Key Words Enough? On Federal Rule of Evidence 502(b) and "advanced analytics": "[A] party that uses advanced analytical software applications and linguistic tools in screening for privilege and work product may be found to have taken 'reasonable steps' to prevent inadvertent disclosure." Advisory Committee Notes on the Amended Rule.
  12. Search Tools to Streamline Content: NIST and system files; date ranges, custodians, and file types; keywords (stemming, proximity, Boolean); concept searching; clustering; email and near duplicates; tracked search; language identification.
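Near-duplicate detection, one of the tools listed above, is commonly built on word-shingle overlap. A minimal sketch, not tied to any particular product; the k=3 shingle size, function names, and sample messages are assumptions:

```python
def shingles(text: str, k: int = 3) -> set:
    """Split text into overlapping k-word 'shingles'."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the two documents' shingle sets (0..1)."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

doc1 = "please review the attached quarterly report before the board meeting"
doc2 = "please review the attached quarterly report before the next board meeting"
print(round(jaccard(doc1, doc2), 2))  # high score: likely near duplicates
```

Documents scoring above a chosen threshold can be batched to the same reviewer so similar content is coded consistently.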
  13. Protecting Privilege Workflow, in four steps: Search (attorney names, law firms, legal terms), Categorize (attorney-to-attorney, attorney-to-client, attorney-to-client-plus-other, attorney-to-vendor communications), Review (automated workflow, email thread and near-duplicate analysis, integrated privilege log creation), and Quality Control (coding validation, statistical sampling).
  14. Production Quality Control: Rule 502(b) requires "reasonable precautions to prevent disclosures." Coding validation: a rule-based production module safeguards against producing documents coded as privileged. Email thread / near-duplicate analysis: verifies that similar and related documents are tagged consistently. Statistical sampling: random, statistically valid stratified sampling to QC productions.
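The statistically valid sample sizes mentioned above can be estimated with the standard normal-approximation formula plus a finite population correction. A sketch under assumptions: the 95% confidence / ±5% margin defaults and the function name are illustrative choices, not a quoted standard:

```python
import math

def sample_size(population: int, confidence_z: float = 1.96,
                margin: float = 0.05, p: float = 0.5) -> int:
    """Random sample size for a production QC check
    (normal approximation; p=0.5 is the most conservative choice)."""
    n0 = (confidence_z ** 2) * p * (1 - p) / margin ** 2
    # Finite population correction matters for smaller productions
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

print(sample_size(10_000))  # documents to pull at 95% confidence, ±5%
```

Note how little the sample grows with volume: a 10,000-document production needs about 370 sampled documents, while a million-document production needs only about 385.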
  15. Is Manual Review the Gold Standard? "The idea that exhaustive manual review is the most effective – and therefore the most defensible – approach to document review is strongly refuted. Technology-assisted review can (and does) yield more accurate results than exhaustive manual review, with much lower effort." Maura Grossman and Gordon Cormack, "Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient than Exhaustive Manual Review," Richmond Journal of Law and Technology, Vol. XVII, Issue 3.
  16. Predictive Coding
  17. How does it work?
  18. Predictive Coding Protocol: use on document samples or seed sets; transparency (disclose the searches and results); use the samples to start "training" the computer; a matter expert reviews the first 500 documents selected; the computer refines the search for validation; review continues through a number of rounds; the parties share the document sets; the larger document set is analyzed; non-relevant documents are sampled and reviewed; a cut-off point is determined after sampling.
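The protocol above (seed set, expert training rounds, refinement, sampling, cut-off) can be sketched as a toy active-learning loop. This is an invented illustration, not any vendor's actual algorithm: the keyword-difference "model", the expert function, and the corpus are all assumptions:

```python
import random
from collections import Counter

def train(labeled):
    """Toy relevance model: a word's weight is how much more often it
    appears in relevant vs. non-relevant reviewed documents."""
    rel, irr = Counter(), Counter()
    for text, is_relevant in labeled:
        (rel if is_relevant else irr).update(text.lower().split())
    return {w: rel[w] - irr[w] for w in (rel | irr)}

def score(model, text):
    return sum(model.get(w, 0) for w in text.lower().split())

def predictive_coding(corpus, expert, seed_size=5, rounds=3, batch=5):
    """Iterate: the expert reviews a batch, the model retrains, and the
    next batch is drawn from the documents the model is least sure about."""
    random.seed(0)                                  # reproducible sketch
    unreviewed = list(corpus)
    labeled = []
    sample = random.sample(unreviewed, seed_size)   # seed set
    for _ in range(rounds):
        for doc in sample:                          # expert review round
            labeled.append((doc, expert(doc)))
            unreviewed.remove(doc)
        model = train(labeled)
        # least-confident documents (score closest to zero) go next
        sample = sorted(unreviewed, key=lambda d: abs(score(model, d)))[:batch]
    return model, labeled

# Tiny invented corpus: 8 "relevant" memos mention the merger, 12 do not.
corpus = [f"merger discussion memo {i}" for i in range(8)] + \
         [f"cafeteria lunch menu {i}" for i in range(12)]
model, labeled = predictive_coding(corpus, expert=lambda d: "merger" in d)
```

After the rounds, documents scoring above the sampled cut-off go to review and the rest are set aside, which is where the cost savings on the next slide come from.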
  19. What are the Benefits of Predictive Coding? Cost savings can be significant; decreased review time; increased accuracy and quality; an early case assessment tool; managed risk and exposure; small review teams enabled, eliminating costly outside review; reduced collection and preservation volumes.
  20. Potential Uses of Predictive Coding: a powerful tool with many use cases, each with its own degree of perceived risk, ordered from low to high: parties' agreement, early case assessment, search terms, review prioritization, advanced culling, and automated review. For clients not ready for automated review, it can still be used to achieve significant cost savings.
  21. Predictive Ranking Trends in the Industry: 37% use predictive ranking now; 36% will start using it in the next 12 months; 88% of those who use it now will increase use in the next 12 months; 1% of those who use it now will decrease use. (Source: E-Discovery Journal poll, 2012)
  22. Benefits of a Complete "End to End" Solution. Cost: lowers eDiscovery costs by reducing the amount of downstream data while improving search strategy outcomes. Time: enables quick, accurate assessment of the matter and the data. Strategy: formulate strategy with accurate cost predictability, control the costs of review, target searching effectively, and leverage technology. Risk: a comprehensive approach reduces data touch points, reduces spoliation, targets cost reductions, and improves accuracy.
  23. Jeremy Greshin (StoredIQ) and Chris Toomey (Catalyst)
