AIIM Conference 2014
Orlando, FL
April 2, 2014
Jason R. Baron, Esq.
Information Governance and eDiscovery Group
Drinker Bi...
All Needle, No Haystack: Bring Predictive Analysis to Information Governance

The hottest topic in ediscovery continues to be the use of predictive analytics to make litigation more efficient, but why stop there? Can we take the lessons learned from the Da Silva Moore case about new analytical tools and techniques and apply them to the information governance space? In this session, the founder of the TREC Legal Track and former Co-Chair of Working Group 1 of The Sedona Conference discusses how law firms may use analytics to add value in addressing their clients' information governance issues, including business intelligence, e-record archiving, and record classification, retention, and remediation.

  1. AIIM Conference 2014, Orlando, FL, April 2, 2014. Jason R. Baron, Esq., Information Governance and eDiscovery Group, Drinker Biddle & Reath LLP, Washington, D.C. 20005. © Jason R. Baron 2014. Finding The Signal in the Noise: Bringing Predictive Analytics To the Information Governance Space
  2. [Image slide; no text]
  3. We have entered the era where Big Data is ….
  4. The World Has Changed
     - We are not just managing thousands or millions of paper files
     - We are at an inflection point in history in terms of data volume
     - IDC Report: 1800 new exabytes this year (1 exabyte = data equivalent of 50,000 yrs of continuous movies)
     - Open data policies vs. “the iceberg”: a vast amount of information is “hidden” underneath the web. How is it to be reliably preserved and accessed?
  5. Reality: The era of information inflation and Big Data in litigation has just begun….
     Lehman Brothers Investigation
     - 350 billion page universe (3 petabytes)
     - Examiner narrowed collection by selecting key custodians, using dozens of Boolean searches
     - Reviewed 5 million docs (40 million pages) using 70 contract attorneys
     Source: Report of Anton R. Valukas, Examiner, In re Lehman Brothers Holdings Inc., et al., Chapter 11 Case No. 08-13555 (U.S. Bankruptcy Ct. S.D.N.Y. March 11, 2010), Vol. 7, Appx. 5, at http://lehmanreport.jenner.com/.
  6. Information governance is needed in a world where . . .
     - 80% of enterprise data is unstructured
     - 60% of documents are obsolete
     - 50% of documents are duplicate
     - 80% of documents are not retrieved by traditional search
  7. www.aiim.org/infochaos. Do YOU understand the business challenge of the next 10 years? This ebook from AIIM President John Mancini explains.
  8. Traditional Document Review Processes
     - Labor intensive
     - Linear Review
     - Quality of manual coding for responsiveness open to question (see RAND Study, 2012)
  9. Searching the Haystack….
  10. to find relevant needles…
  11. [Diagram labels] False Positives; Relevant Smoking Policy Emails; OMB; VP Chief of Staff Ron Klain; Office of the U.S. Trade Rep.; White House Counsel
  12. Example of Boolean search string from U.S. v. Philip Morris
     - (((master settlement agreement OR msa) AND NOT (medical savings account OR metropolitan standard area)) OR s. 1415 OR (ets AND NOT educational testing service) OR (liggett AND NOT sharon a. liggett) OR atco OR lorillard OR (pmi AND NOT presidential management intern) OR pm usa OR rjr OR (b&w AND NOT photo*) OR phillip morris OR batco OR ftc test method OR star scientific OR vector group OR joe camel OR (marlboro AND NOT upper marlboro)) AND NOT (tobacco* OR cigarette* OR smoking OR tar OR nicotine OR smokeless OR synar amendment OR philip morris OR r.j. reynolds OR ("brown and williamson") OR ("brown & williamson") OR bat industries OR liggett group)
  13. Emerging New Strategies: “Predictive Analytics”. Improved review and case assessment: cluster docs thru use of software with minimal human intervention at front end to code “seeded” data set. Slide adapted from Gartner Conference, June 23, 2010, Washington, D.C.
  14. Defining “predictive coding” or “TAR”
     - A process for prioritizing or coding a collection of electronic documents using a computerized system that harnesses human judgments of one or more subject matter experts on a smaller set of documents and then extrapolates those judgments to the remaining document population.
     - Also referred to as “supervised or active machine learning,” “computer-assisted review” or “technology-assisted review”
     Source: Adapted from Grossman-Cormack Glossary of Technology Assisted Review, v. 1.0 (Oct 2012)
  15. Judicial endorsement of predictive analytics in document review by Judge Peck in da Silva Moore v. Publicis Groupe (SDNY Feb. 24, 2012): This opinion appears to be the first in which a Court has approved of the use of computer-assisted review. . . . What the Bar should take away from this Opinion is that computer-assisted review is an available tool and should be seriously considered for use in large-data-volume cases where it may save the producing party (or both parties) significant amounts of legal fees in document review. Counsel no longer have to worry about being the ‘first’ or ‘guinea pig’ for judicial acceptance of computer-assisted review . . . Computer-assisted review can now be considered judicially-approved for use in appropriate cases.
  16. The da Silva Moore Protocol
     - Supervised learning
     - Random sampling
     - Establishment of seed set
     - Issue tags
     - Iteration
     - Random sampling of docs deemed irrelevant
  17. The demise of RM…. John Mancini, President of AIIM: “If by traditional records management you mean manual systems - even if they are computerized - then I would say traditional records management is dead. The idea that we could get busy people to care about our complicated retention schedules, and drag and drop documents into folders, and manually apply metadata document by document according to an elaborate taxonomy will soon seem as ridiculous as asking a blacksmith to work on a Ferrari.”
  18. Process Optimization Problem: The transactional toll of user-based recordkeeping schemes (“as is” RM)
  19. …. and the need for better, automated solutions ….
  20. Email is still the 800 lb. gorilla of ediscovery
  21. Archivist/OMB Directive M-12-18, Managing Government Records Directive, dated 8/24/12:
     - 1.1 By 2019, Federal agencies will manage all permanent records in an electronic format.
     - 1.2 By 2016, Federal agencies will manage both permanent and temporary email records in an accessible electronic format.
     http://www.whitehouse.gov/sites/default/files/omb/memoranda/2012/m-12-18.pdf
  22. NARA Moved to the Cloud for Email with Embedded RM/Autocategorization
  23. Capstone Officials. Capstone officials may include:
     - Officials at or near the top of an agency or an organizational subcomponent
     - Key staff members that may be in positions that create or receive presumptively permanent email records
     [Diagram labels] Capstone accounts; Other accounts; Key staff accounts; Other accounts
  24. How To Avoid A Train Wreck With Email Archiving…. Capture E-mail But Utilize Records Management!
  25. Information Governance / Records Analytics: Can advanced analytics techniques and technologies, including Auto-Categorization, Auto-redaction, Auto-indexing, Auto-translation, etc., be applied and leveraged by Records Managers/Information Governance types? Yes, but ….
  26. Homage to Carl Linnaeus (1707-1778)
  27. Linnaean classification of the animal kingdom
     - Kingdom: Animalia
     - Phylum: Chordata
     - Subphylum: Vertebrata
     - Superclass: Tetrapoda
     - Class: Mammalia
     - Subclass: Theria
     - Infraclass: Eutheria
     - Cohort: Unguiculata
     - Order: Primata
     - Suborder: Anthropoidea
     - Superfamily: Hominoidea
     - Family: Hominidae
     - Subfamily: Homininae
     - Genus: Homo
     - Subgenus: Homo (Homo)
     - Specific epithet: sapiens
  28. Which category?
  29. The Coming Age of Dark Archives (and the inability to provide access unless we have smart ways of extracting signal from noise)
  30. We should be leveraging the power of predictive analytics to improve information governance . . .
     - RM: defensible disposal of low value information
     - Regulatory compliance
     - Risk mitigation: segregating sensitive materials (PII, proprietary, etc.)
     - Business intelligence
     - E-discovery
     - Collaboration across enterprise
     - Providing access to dark data & archives
  31. IG & Analytics: True Life Stories “Ripped from the Headlines”
     - The Case of the Wayward Would-Be Whistleblower
     - The Case of the Mistakenly Valued Merger & Acquisition
  32. What is the IGI? The IGI is a cross-disciplinary think tank and consortium dedicated to advancing the adoption of Information Governance practices and technologies through research, publishing, advocacy, and peer-to-peer networking.
     - It provides industry thought leadership and benchmarking designed to foster consensus and conversation.
     - It is a connector among the stakeholders of information governance.
     - It is a promoter of industry best practices and standards.
     www.iginitiative.com
  33. “The future is here. It is just not evenly distributed.” --William Gibson
  34. References
     Sources Referencing Information Governance, Autocategorization & Predictive Coding
     - B. Borden & J.R. Baron, “Finding the Signal in the Noise: Information Governance, Analytics, and The Future of the Law,” 20 Richmond J. Law & Technology 7 (2014), http://jolt.richmond.edu
     - J.R. Baron, “Law in the Age of Exabytes: Some Further Thoughts on ‘Information Inflation’ and Current Issues in E-Discovery Search,” 17 Richmond J. Law & Technology (2011), see http://jolt.richmond.edu
     - N. Pace, “Where The Money Goes: Understanding Litigant Expenditures for Producing E-Discovery,” RAND Publication (2012), see http://www.rand.org/pubs/monographs/MG1208.html
     - TREC Legal Track Home Page, http://trec-legal.umiacs.umd.edu (includes bibliography for further reading)
     - The Sedona Conference®, The Sedona Conference Commentary on Information Governance (2013)
     Latest “Supervised Learning/Predictive Coding” Case Law:
     - Da Silva Moore v. Publicis Groupe, 2012 WL 607412 (S.D.N.Y. Feb. 24, 2012), approved and adopted in Da Silva Moore v. Publicis Groupe, 2012 WL 1446534, at *2 (S.D.N.Y. Apr. 26, 2012)
     - EORHB v. HOA Holdings, Civ. No. 7409-VCL (Del. Ch. Oct. 15, 2012)
     - Global Aerospace Inc., et al. v. Landow Aviation, L.P., et al., 2012 WL 1431215 (Va. Cir. Ct. Apr. 23, 2012)
     - In re Actos (Pioglitazone) Products, 2012 WL 3899669 (W.D. La. July 27, 2012)
     - Kleen Products, LLC v. Packaging Corp. of America, 10 C 5711 (N.D. Ill.) (Nolan, M.J.)
     - In re Biomet M2a Magnum Hip Implant Products Liability Litigation, 3:12-MD-2391 (S.D. Ind.) (April 18, 2013)
  35. www.aiim.org/infochaos. Do YOU understand the business challenge of the next 10 years? This ebook from AIIM President John Mancini explains.
  36. Jason R. Baron, Of Counsel, Drinker Biddle & Reath LLP, 1500 K Street, N.W., Washington, D.C. 20005. (202) 230-5196. Email: jason.baron@dbr.com
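
The predictive coding definition on slide 14 and the protocol steps on slide 16 (random sampling, seed set, iteration, random sampling of documents deemed irrelevant) can be illustrated with a rough sketch. Everything below is an assumption made for illustration and is not taken from the deck or from the da Silva Moore protocol itself: the toy document collection, the `review` function standing in for a human subject matter expert, the choice of a naive Bayes scorer, and all function names and parameter values are hypothetical.

```python
import math
import random
from collections import Counter

def make_toy_collection(n=200, n_relevant=40, rng=None):
    """Build a synthetic collection (hypothetical data, for illustration only)."""
    rng = random.Random(1) if rng is None else rng
    relevant_words = ["marlboro", "nicotine", "settlement", "advertising", "youth"]
    other_words = ["lunch", "parking", "budget", "travel", "schedule"]
    shared = ["meeting", "memo", "please", "review"]
    docs = []
    for i in range(n):
        truth = i < n_relevant
        pool = (relevant_words if truth else other_words) + shared
        docs.append({"id": i, "text": " ".join(rng.choices(pool, k=8)), "truth": truth})
    return docs

def review(doc):
    """Stand-in for a human expert's relevance call (assumption: always correct)."""
    return doc["truth"]

def train_naive_bayes(labeled):
    """Train on (doc, label) pairs; return a function giving log-odds of relevance."""
    counts = {True: Counter(), False: Counter()}
    n_docs = {True: 0, False: 0}
    for doc, label in labeled:
        counts[label].update(doc["text"].lower().split())
        n_docs[label] += 1
    vocab = set(counts[True]) | set(counts[False])
    totals = {c: sum(counts[c].values()) for c in counts}

    def log_odds(text):
        score = math.log((n_docs[True] + 1) / (n_docs[False] + 1))
        for tok in text.lower().split():
            if tok not in vocab:
                continue  # ignore words never seen in training
            p_rel = (counts[True][tok] + 1) / (totals[True] + len(vocab))
            p_irr = (counts[False][tok] + 1) / (totals[False] + len(vocab))
            score += math.log(p_rel / p_irr)
        return score

    return log_odds

def run_protocol(collection, seed_size=20, rounds=3, batch=10, sample_size=30, rng=None):
    rng = random.Random(0) if rng is None else rng
    unreviewed = list(collection)
    rng.shuffle(unreviewed)
    # Random sampling establishes the seed set, which the expert codes.
    labeled = [(d, review(d)) for d in unreviewed[:seed_size]]
    unreviewed = unreviewed[seed_size:]
    for _ in range(rounds):
        # Iteration: retrain, rank the rest, have the expert code the top batch.
        score = train_naive_bayes(labeled)
        unreviewed.sort(key=lambda d: score(d["text"]), reverse=True)
        labeled += [(d, review(d)) for d in unreviewed[:batch]]
        unreviewed = unreviewed[batch:]
    score = train_naive_bayes(labeled)
    deemed_irrelevant = [d for d in unreviewed if score(d["text"]) <= 0]
    # Random sample of documents deemed irrelevant, to estimate what was missed.
    sample = rng.sample(deemed_irrelevant, min(sample_size, len(deemed_irrelevant)))
    missed = sum(review(d) for d in sample)
    missed_rate = missed / len(sample) if sample else 0.0
    return labeled, deemed_irrelevant, missed_rate

collection = make_toy_collection()
labeled, discard, missed_rate = run_protocol(collection)
print(f"expert reviewed {len(labeled)} of {len(collection)} documents")
print(f"sampled miss rate among documents deemed irrelevant: {missed_rate:.2%}")
```

The point of the sketch is the shape of the workflow rather than the particular model: the expert codes only the seed set and a few ranked batches, the system extrapolates to the rest, and a random sample of the discard pile gives a check on how much relevant material was left behind.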
