“Triggers,” Preservation 
& Search 
June 2, 2012 
Georgetown Law 
Sonya L. Sigler 
9/23/2014 1
Overview 
Triggers & Preservation 
• What is it? 
• Why Does it Matter? 
Search 
Keyword Search 
Clustering 
Ontologies 
Technology Enhanced Review - Sampling 
Social Networking Analysis 
Relationship Analysis 
“Triggers” & Preservation 
What is a Trigger? 
– Litigation reasonably anticipated 
– Who decides 
Litigation Hold Continuum 
– Established in hindsight 
– Threat 
– Letter about litigation 
– Filing Suit 
Cases 
– Pippins, Zubulake, Pension Committee 
Pippins v. KPMG 
How much data to Preserve? 
– All hard drives (Pippins’ position) 
– 100 Sample Hard drives (KPMG’s position) 
To Cooperate or NOT to Cooperate? 
How Judges React to Lack of Cooperation 
Zubulake 
Litigation Holds 
– Cannot send a request into the ether 
Preservation 
Have to follow-up 
Take affirmative steps to monitor compliance 
In-house Counsel Duty 
Cannot leave it to employees’ discretion 
Document what was done 
Pension Committee 
No intentional destruction of data 
Careless & indifferent 
No Latchkey Custodians (alone & unsupervised) 
– Identify Custodians 
– Monitor their efforts 
– Including former employees and third parties 
Proactive 
Consistent 
Reasonable Approach 
Triggers 
When does a duty to preserve arise? 
What To Do? 
Who to include? 
– Not about data volume 
– Not about contact with underlying “litigation” 
Key Players (Zubulake opinions) 
– Likely to have relevant information 
– CEO, Board, Committees, employees, etc. 
Produce it from the Key Player (not others) 
– Nursing Home Pension Fund v. Oracle 
– Produce emails from the CEO (15) not others (1,650) 
Spoliation 
Failure to Preserve 
– Didn’t Ask 
• Right person 
• Right Place 
– Didn’t follow up 
Destruction of Data 
– Intentional 
– Inadvertent destruction 
What can happen 
– Sanctions 
– Adverse Inferences 
Search 
How to Use it To Find Information 
How to Use it to Ignore Information 
When to use which search methodology 
Search - Data Assessment 
Where is the Data? 
– Data Mapping - databases, servers, desktops, laptops, 
IMs, smart phones, voicemail, other records 
Defining Process from Collection to Review to 
Production 
Collection Strategy, Process, Approach 
– Scope of collection: custodians, date ranges, topics 
Reports on the Data Processing 
– File types, de-duplication rates, encrypted files, 
password-protected files, etc. 
Not Reasonably Accessible data 
Assessing Risk of Data Loss 
Search - Case Assessment 
Who - Cast of Characters 
What - What the Heck Happened? 
Where - Where did it take place? 
When - What time period are we concerned with? 
How - fraud, antitrust violation, etc. 
WHY - What were the motives involved? 
Data Assessment ≠ Effective Case Assessment 
Keyword Search Under Scrutiny 
United States v. O’Keefe (Facciola) 
– Questioned lawyers’ ability to decide which search terms are more likely to 
produce relevant information 
– Facciola has also suggested that litigants take a look at advanced search 
methodologies 
Victor Stanley, Inc. v. Creative Pipe, Inc. (Grimm) 
– Defensibility of process AND execution lies with the party relying upon the 
search protocol to meet their obligations which needs to be able to explain 
search rationale, appropriateness, and proper implementation 
– Advocates quality assurance, e.g. by sampling 
– Searches should be designed by a competent practitioner 
Keyword Specific Case 
William A. Gross Construction Associates, Inc. v. 
American Manufacturers Mutual Insurance Company 
SDNY, Judge Andrew Peck 
Keyword list was in the thousands 
Use the actual data set and custodians to figure out 
keywords 
“This case is just the latest example of lawyers designing keyword 
searches in the dark, by the seat of the pants, without adequate 
(indeed, here, apparently without any) discussion with those who wrote 
the emails. Prior decisions from Magistrate Judges in the Baltimore- 
Washington Beltway have warned counsel of this problem, but the 
message has not gotten through to the Bar in this District.” 
$6M Keyword Mistake 
In re Fannie Mae Securities Litigation 
3rd Party - OFHEO 
DC Circuit - Judge David Tatel 
Attorney agreed to something he did NOT understand 
Long list of key terms 
Taxpayers suffered the consequence 
What This Means 
• The Courts are finally 
catching up 
• Courts actively ruling on 
Standards of Care and 
Process 
• Lawyers are Getting Wise 
Case Law Effects on Discovery 
Defensibility of Review Process is now a focus 
– Culling now can kill you later 
– Cooperation is a hot topic 
– Tussle between inside & outside counsel 
– Beginning to see planning as a necessity 
Increased focus on Quality 
– Heightened involvement expected from corporate clients 
in the overall process 
– Cases pushing this: Qualcomm, Creative Pipe 
What Else Is There? 
Effort to establish & codify uniform “Best Practices” 
– Quickly becoming roadmap for uneducated industry 
– Increasingly relied upon by judges as measure of reasonable or 
standard behavior 
Publications have addressed: 
– Document retention & production 
– Email management 
– Search & Retrieval 
– Protective orders & confidentiality 
– ESI admissibility 
Getting to a Manageable Review Set 
Focus on finding, reviewing & using the “right” data, not just filtering data 
– Intake Data: 100% 
– Duplicates: 25% 
– Junk/Spam/Porn: 20% 
– Non-Responsive: 20% 
– NR/Priv: 20% 
– Responsive & Priv: 15% 
– Produced: 12.25% 
These figures vary based upon the data set received 
Search Methodologies 
(Continuum from Content to Concept to Context) 
– Keyword: specific exact words, proximity searches, stemming 
– Ontology: generalized words or phrases 
– Clustering: similarity of salient features 
– Social Network Analysis: relationships among relevant people 
– Relationship Analysis: documents with causal or sequential relationship 
Visualization 
Measurement 
Keyword Accuracy Example 
– Keyword search reduced the document set by only 47% 
– 88% of the documents returned by keyword search were not responsive (over-inclusive) 
– 8,553 responsive documents were missed by keyword search: almost 8% of responsive documents (under-inclusive) 
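The two failure modes on this slide correspond to the standard retrieval measures of precision and recall. A minimal sketch follows; the counts are hypothetical (they are not the case data behind the slide) and were chosen only so the results resemble the slide's percentages.

```python
def precision(true_hits: int, returned: int) -> float:
    """Share of returned documents that are actually responsive."""
    return true_hits / returned if returned else 0.0


def recall(true_hits: int, total_responsive: int) -> float:
    """Share of all responsive documents that the search returned."""
    return true_hits / total_responsive if total_responsive else 0.0


# Hypothetical counts for illustration only.
returned = 1000   # documents returned by the keyword search
true_hits = 120   # returned documents that are responsive
missed = 10       # responsive documents the search did not return

p = precision(true_hits, returned)
r = recall(true_hits, true_hits + missed)
print(f"precision = {p:.2f}, so {1 - p:.0%} of hits are over-inclusive noise")
print(f"recall = {r:.2f}, so {1 - r:.0%} of responsive documents were missed")
```

Low precision drives review cost; low recall is the risk of missing relevant documents.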
Myth 
Keyword Searching is the Way to Go 
If I agree to keyword terms, I am OK 
Keyword Search Cases 
Keyword replacement example 
Keyword substitution 
Missing in Action (Under-inclusive) 
Unwanted Extras (Over-inclusive) 
Multiple subject/persons (Disambiguate) 
Fact or Myth? 
Manual review by humans of large amounts of information 
is as accurate and complete as possible - perhaps even 
perfect - and constitutes the gold standard by which all 
searches should be measured 
This is “The reigning Myth of ‘perfect’ retrieval using traditional means” 
Best Practices Commentary on the Use of Search and Information Retrieval Methods in E-Discovery 
The Sedona Conference Journal (2007) p. 199 
Human beings retrieved less than 20% of the relevant documents when they believed they were retrieving over 75% 
An Evaluation of Retrieval Effectiveness for a Full-Text Document Retrieval System 
Blair & Maron (1985) 
IS 240 – Spring 2011 
Blair and Maron 1985 
A classic study of retrieval effectiveness 
– earlier studies were on unrealistically small collections 
Studied an archive of documents for a legal suit 
– ~350,000 pages of text 
– 40 queries 
– focus on high recall 
– Used IBM’s STAIRS full-text system 
Main Result: 
– The system retrieved less than 20% of the relevant 
documents for a particular information need; lawyers 
thought they had 75% 
But many queries had very high precision
Blair and Maron, cont. 
How they estimated recall 
– generated partially random samples of unseen documents 
– had users (unaware these were random) judge them for 
relevance 
Other results: 
– two lawyers’ searches had similar performance 
– lawyers’ recall was not much different from paralegals’
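The recall estimate described above can be sketched as follows. This is a simplified illustration, not the study's actual procedure: it draws a simple random sample (the slide says "partially random"), and `is_relevant` is a hypothetical stand-in for a human reviewer's judgment.

```python
import random


def estimate_recall(retrieved_relevant, unretrieved_ids, is_relevant,
                    sample_size, seed=0):
    """Estimate recall by judging a random sample of unretrieved documents."""
    rng = random.Random(seed)
    sample = rng.sample(unretrieved_ids, min(sample_size, len(unretrieved_ids)))
    if not sample:
        return 1.0 if retrieved_relevant else 0.0
    # Fraction of sampled unseen documents that turn out to be relevant.
    hit_rate = sum(1 for doc_id in sample if is_relevant(doc_id)) / len(sample)
    # Project that rate onto the whole unretrieved set.
    estimated_missed = hit_rate * len(unretrieved_ids)
    total_relevant = retrieved_relevant + estimated_missed
    return retrieved_relevant / total_relevant if total_relevant else 0.0


# Toy collection: 1,000 unretrieved documents, every tenth one relevant.
unretrieved = list(range(1000))
judge = lambda doc_id: doc_id % 10 == 0
estimate = estimate_recall(25, unretrieved, judge, sample_size=200)
print(f"estimated recall: {estimate:.0%}")
```

The searchers found 25 relevant documents in this toy setup, but about 100 more sit in the unretrieved set, so true recall is only 20%.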
Blair and Maron, cont. 
Why recall was low 
– users can’t foresee exact words and phrases that will 
indicate relevant documents 
• “accident” referred to by those responsible as: 
“event,” “incident,” “situation,” “problem,” … 
• differing technical terminology 
• slang, misspellings 
– Perhaps the value of higher recall decreases as the 
number of relevant documents grows, so more detailed 
queries were not attempted once the users were satisfied
Keyword Search Summary 
Pro 
Word Stemming 
– Hous*: house, housemate, household 
Easy to use/explain/agree 
Familiar 
Fast results 
Con 
Over-inclusive 
– Disambiguate 
Under-inclusive 
Word must be present 
Hard to craft 
Ineffective with short messages, IMs 
Keyword Truths 
Under-inclusive - missing relevant or important 
info 
Over-inclusive - costly to review 
“Reasonable Keyword Search” doesn’t exist 
Effective keyword search is difficult/impossible 
– Index Data, Analyze Index 
– Suggest keywords or approach 
Keywords may not be appropriate for the data 
Keyword Search is ONE Tool in Your Arsenal 
Search Methodology Continuum 
Review Methodology - Decided Upfront 
Identify Issues in the Case 
– Formulate Queries and Approaches for Finding 
Responsive Documents 
– Formulate Relevancy and Responsiveness Guidelines 
Identify Primary Participants 
Select or Triage Documents for Review 
Review Tools for Relevancy Assessment 
Keyword Searches, Culling 
– Slices of Data are Reviewed 
Categorization of Data 
– Entire Dataset is Categorized 
– Review Targeted Data 
Automated Review 
– Categorization of Dataset 
– Random Sampling (Statistically Significant) 
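How large a random sample needs to be can be estimated with one common textbook formula (Cochran's formula with a finite population correction). The sketch below is an assumption about what a "statistically significant" sample means here; the defaults (95% confidence via z = 1.96, a ±5% margin of error, and p = 0.5 as the most conservative proportion) are illustrative choices, not values from the slides.

```python
import math


def sample_size(population: int, margin: float = 0.05,
                z: float = 1.96, p: float = 0.5) -> int:
    """Documents to sample for the given margin of error and confidence."""
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Finite population correction shrinks the sample for small collections.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


for docs in (1_000, 100_000, 10_000_000):
    print(docs, "documents ->", sample_size(docs), "to sample")
```

The required sample grows very slowly with collection size, which is why sampling stays practical even for very large data sets.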
Categorization of Data for Review 
Categorize Entire Data Set 
– Spam/Porn/System Files 
– Personal/Private Data 
– Non-relevant Business Data 
Business Data 
– Relevancy Assessment by Topic 
– Privilege Review 
Keyword, Topic Analysis - Overlap, Holes 
Categorization Methods 
Statistical Methods (#s based) 
– Topic Clustering 
• Statistical Similarity 
• Counting #s of words, appearance together 
– Latent Semantic Indexing 
– Supervised v. Unsupervised Clustering 
Linguistic Methods (Word Based) 
– Keyword (Culling Method) 
– Ontologies 
Clustering 
Clustering just means putting documents into groups that have 
something in common. 
Manually (that's what manual review is) 
Keyword Searches 
Ontologies (linguistic filters) 
Automated clustering (using technology) 
– Automated clustering by document type (all the Word 
documents go into one basket) 
– Automated clustering by creation date 
– Automated clustering by Actor 
– Automated clustering by statistical similarity (statistical 
clustering) 
– ... and many other approaches 
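The simpler automated approaches in this list amount to grouping documents by a single attribute. A minimal sketch, assuming each document is a dictionary with hypothetical `name`, `author` and `created` fields:

```python
from collections import defaultdict
from pathlib import PurePosixPath


def cluster_by(documents, key):
    """Group document names by whatever attribute `key` extracts."""
    clusters = defaultdict(list)
    for doc in documents:
        clusters[key(doc)].append(doc["name"])
    return dict(clusters)


# Hypothetical documents for illustration only.
docs = [
    {"name": "budget.xlsx", "author": "ann", "created": "2011-03-02"},
    {"name": "memo.docx", "author": "bob", "created": "2011-03-15"},
    {"name": "minutes.DOCX", "author": "ann", "created": "2011-04-01"},
]

by_type = cluster_by(docs, lambda d: PurePosixPath(d["name"]).suffix.lower())
by_actor = cluster_by(docs, lambda d: d["author"])
by_month = cluster_by(docs, lambda d: d["created"][:7])
print(by_type)
```

Statistical clustering replaces the single attribute with a similarity measure computed over document content.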
Clustering -- “Options” 
1 Cluster or 4 Clusters 
– Financial/energy trading options 
– Email/computer menu-driven options 
– Stock options (ISOs) 
– The generic idea of an available choice of action 
Clustering 
Software implements statistical methods of finding groups of “similar” documents 
– “Similar” must be defined appropriately for the application 
Documents are categorized with very little effort by the user 
May help with document review 
– A single reviewer can look at similar documents together and produce consistent review decisions 
– Tight clustering can be used to detect “near duplicates” caused by OCR errors 
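One way to define "similar" tightly enough to catch OCR near duplicates is to compare overlapping character sequences (shingles) with the Jaccard measure. This is a sketch of one common technique, not necessarily what any particular review tool does; the shingle length and the 0.7 threshold are assumptions.

```python
def shingles(text: str, k: int = 4) -> set:
    """Set of overlapping k-character sequences in the lowercased text."""
    text = " ".join(text.lower().split())
    if len(text) <= k:
        return {text} if text else set()
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(docs: dict, threshold: float = 0.7) -> list:
    """Pairs of document ids whose shingle sets overlap heavily."""
    ids = sorted(docs)
    sets = {doc_id: shingles(docs[doc_id]) for doc_id in ids}
    return [(a, b)
            for i, a in enumerate(ids)
            for b in ids[i + 1:]
            if jaccard(sets[a], sets[b]) >= threshold]


docs = {
    "a": "The board approved the stock option grant on Friday",
    "b": "The board approved the stock optlon grant on Friday",  # OCR error
    "c": "What are our lunch options for the team meeting",
}
print(near_duplicates(docs))
```

A single misread character changes only a few shingles, so the two scanned copies still pair up while the unrelated message does not.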
Clustering vs. queries 
Clustering is unpredictable compared to keywords or 
taxonomies 
The items that look very similar (to the clustering 
algorithm) may not actually be similar in ways that 
matter 
– Relevancy may depend upon fine legal distinctions 
– May vary in the same matter by subpoena and/or 
jurisdiction 
Ontologies 
Implement ontologies for directed searches. 
– Approach searching from a knowledge-representation viewpoint 
– Field is 25 years old, lots of work done 
– Advantages: 
• Disambiguate different meanings of the same word from their 
context 
→ More accurate 
• Encapsulate many ways of saying the same thing 
→ More thorough 
• Search for concepts, not individual words 
→ More intuitive, more reusable, and faster 
Can be combined with other methods (unsupervised 
clustering, discussions). 
Subjectivity 
GOOD WEATHER 
– Sun 
– Calm 
BAD WEATHER 
– Rain 
– Snow 
– Wind 
A More Realistic Ontology 
ROYALTY CONCEPT 
• royalty 
• royalties 
• rty 
• commission 
• commissions 
• comm. 
• honorarium 
• honorariums 
• honoraria 
• usage fee 
• usage charge 
• usg fee 
• use fee 
• fee for use 
• fee for usage 
• incent* 
• insent* 
• earn a fee 
• eam a fee 
• charge for use 
• charged for use 
• charging for use 
• charges for use 
• licence fee 
• license fee 
• lisense fee 
• “take cut”~2 
• “takes cut”~2 
• “took cut”~2 
• “slice pie”~5 
• “piece pie”~5 
• “piece action”~5 
• “slice action”~5 
• -king 
• -queen 
• -prince 
• -princess 
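A small subset of this concept can be evaluated against text as sketched below. The interpretation of the notation is an assumption: a trailing * is treated as a prefix match, "a b"~N as the two words appearing with at most N words between them in either order, and a leading minus as a term that disqualifies the document.

```python
import re


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_match(tokens, phrase):
    words = phrase.split()
    n = len(words)
    return any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1))


def prefix_match(tokens, prefix):
    return any(t.startswith(prefix) for t in tokens)


def proximity_match(tokens, w1, w2, slop):
    pos1 = [i for i, t in enumerate(tokens) if t == w1]
    pos2 = [i for i, t in enumerate(tokens) if t == w2]
    # At most `slop` intervening words, in either order.
    return any(0 < abs(i - j) <= slop + 1 for i in pos1 for j in pos2)


# Subset of the ROYALTY CONCEPT entries, split by entry type.
ROYALTY = {
    "phrases": ["royalty", "royalties", "usage fee", "license fee", "licence fee"],
    "prefixes": ["incent", "insent"],
    "proximity": [("take", "cut", 2), ("slice", "pie", 5)],
    "exclude": ["king", "queen", "prince", "princess"],
}


def matches_concept(text, concept):
    tokens = tokenize(text)
    if any(t in concept["exclude"] for t in tokens):
        return False
    return (any(phrase_match(tokens, p) for p in concept["phrases"])
            or any(prefix_match(tokens, p) for p in concept["prefixes"])
            or any(proximity_match(tokens, a, b, s)
                   for a, b, s in concept["proximity"]))


print(matches_concept("We take a small cut of each sale", ROYALTY))
```

The exclusion terms keep the monarchy sense of "royalty" out of the results, which a bare keyword cannot do.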
Ontology as a Query 
But it can be slightly cumbersome to deal with directly in 
that form 
q ((+(std:%CapacityReports_% std:%DINCapacity_%) +(std:%ACMEEPPlant_% std:%ProductName_%)) (+(std:%ACMEPNPlant_% 
std:%ProductName_%) +(std:%ProductiveCapability_% std:%CapacityReports_%)) (+(std:%CapacityCreep_% 
std:%OperationsImprovement_% std:%CapacityExpansion_% std:%CapacityRestoration_%) +(std:%ACMEPNPlant_% 
std:%ProductName_%)) (+(std:%EquipmentReplacement_% std:%FinishingColumn_%) +(std:%ACMEPNPlant_% 
std:%ProductName_%)) (std:%Audit_% actor:%Audit_%) (+(std:%SettlementNegotiations_% std:%ContractNegotiations_% ) 
+(actor:%ACMEOutsideCounsel_% std:%ACMEOutsideCounsel_% actor:%ACME UBOutsideCounsel_% 
std:%AcmeSubOutsideCounsel_% actor:%AcmeSub_% std:%AcmeSub_%)) (std:%FTC_% actor:%FTC_%) 
((+subject:%ProductName_% +(std:swap std:"supply agreement" std:"exchange agreement" std:"agree to exchange")) std:"name 
(About a quarter of its regular size) 
Ontology Pros & Cons 
Identify acronyms 
Normalize variants 
Disambiguate terms 
Identify overly broad keywords 
Identify and correct keywords with errors 
Create extensive libraries of ontologies 
Can be used as a clustering method 
Topics can appear in more than one language 
Reusable for different types of litigation, e.g. anti-trust, 
product liability etc. (and for both offense and defense) 
As with Keyword - word based 
Labor intensive, upfront 
“Search” Terminology 
Technology-Enhanced Review 
Technology Assisted Review 
Automated Review 
Predictive Coding 
• Process 
• Workflow 
Technology 
People 
• Subject Matter 
• Review 
• Feedback 
• Privilege 
• Production 
Quality Control 
Automated Review Methodology 
1. Setup 
2. Sample 
3. Expert judges sample (Responsive / Non-responsive) 
4. Model learns 
5. Model predicts (Responsive / Non-responsive) 
6. Model categorizes all remaining documents 
7. Repeat as needed 
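The sample, learn, predict cycle on this slide can be sketched with a very small text classifier. The slide does not name a model; the naive Bayes scoring used here is an assumption chosen for brevity, and the documents are invented.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(labeled):
    """labeled: (text, is_responsive) pairs judged by the expert."""
    counts = {True: Counter(), False: Counter()}
    docs = Counter()
    for text, label in labeled:
        counts[label].update(tokenize(text))
        docs[label] += 1
    vocab = set(counts[True]) | set(counts[False])
    return counts, docs, vocab


def predict(model, text):
    """Return True when the model scores the text as responsive."""
    counts, docs, vocab = model
    total_docs = docs[True] + docs[False]
    scores = {}
    for label in (True, False):
        total_words = sum(counts[label].values())
        score = math.log((docs[label] + 1) / (total_docs + 2))
        for word in tokenize(text):
            if word in vocab:  # ignore words never seen in the sample
                score += math.log((counts[label][word] + 1)
                                  / (total_words + len(vocab)))
        scores[label] = score
    return scores[True] > scores[False]


# Expert-judged sample (hypothetical).
sample = [
    ("stock option grant approved by board", True),
    ("backdated option grant memo", True),
    ("lunch menu for friday", False),
    ("fantasy football league update", False),
]
remaining = ["board approved option grant", "friday lunch update"]

model = train(sample)
predictions = [predict(model, doc) for doc in remaining]
print(predictions)
```

"Repeat as needed" corresponds to adding newly judged documents to `sample` and calling `train` again until quality checks are satisfied.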
Technology Enhanced Review: 
Speed, Predictable Costs, and Accuracy 
Example from a real case 
Automate any portion of the review 
[Figure: review funnel starting from Source Data (100%). Stages shown: Eliminate Duplicates & System Files; Non-Responsive Isolation (ontologies); NR by Technology Enhanced Review (removed another 18%); Responsive by Technology Enhanced Review (removed another 7%); Priv by High-Speed Manual Review. Remaining percentage labels: 30%, 30%, 15%, 22%, 3%.] 
From Document Analysis to 
Social Network Analysis 
From Social Network Analysis 
to Discussions 
Analytics are Based on the Model and on Discussions 
Better Answers and Better Questions 
When were customary work practices circumvented? 
When did established norms of behavior change? 
Who knew, or likely knew, what facts? 
Who interacted with whom and how intimately? 
Who was involved in what types of decisions or meetings? 
Who are the real ‘insiders’? 
What data is hidden or missing? 
When were electronically documented conversations 
“taken off line,” possibly in an attempt to avoid detection? 
How did the importance of different actors change over time? 
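Two of these questions, who interacted with whom and how an actor's importance changed over time, reduce to counting over message metadata. A minimal sketch, assuming each message is a dictionary with hypothetical `from`, `to` and `date` fields:

```python
from collections import Counter, defaultdict


def interaction_counts(messages):
    """How often each pair of people exchanged messages (direction ignored)."""
    pairs = Counter()
    for msg in messages:
        for recipient in msg["to"]:
            pairs[tuple(sorted((msg["from"], recipient)))] += 1
    return pairs


def activity_by_period(messages):
    """Messages sent or received per person, grouped by YYYY-MM."""
    activity = defaultdict(Counter)
    for msg in messages:
        period = msg["date"][:7]
        activity[period][msg["from"]] += 1
        for recipient in msg["to"]:
            activity[period][recipient] += 1
    return activity


# Hypothetical message metadata for illustration only.
messages = [
    {"from": "ann", "to": ["bob"], "date": "2007-03-05"},
    {"from": "bob", "to": ["ann", "cy"], "date": "2007-03-09"},
    {"from": "ann", "to": ["bob"], "date": "2007-04-02"},
]

pairs = interaction_counts(messages)
activity = activity_by_period(messages)
print(pairs.most_common(1))
```

A sudden drop in one pair's count while both people remain active elsewhere is the kind of pattern that can suggest a conversation was taken off line.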
Bear Stearns 
Lower Bar For Fraud? 
Two hedge fund managers 
arrested 
Charged with securities and 
wire fraud, and one with 
insider trading 
Internal emails: 
– “I'm fearful of these markets. ... As we discussed it may not be a 
meltdown for the general economy but in our world it will be.” 
– “I think we should close the funds now.” 
External communications: 
– “We are very comfortable with exactly where we are.” 
– “The funds are performing exactly as they were designed to.” 
Sentiment Analysis Visualization 
Analysis of Anomalous Communication Patterns 
Unusual levels relative to a particular type of activity pop out 
Color-coded graphs show relative communication densities for apples-to-apples comparisons 
Spread of Information 
Emotive Tone 
Whistle-blower Scenario 
“Call Me” Events 
Sequence Viewer used for analytics-driven review 
Search Risks 
Failure to find responsive documents 
Failure to recognize responsive documents 
Failure to recognize privileged documents 
Inconsistent treatment of documents (e.g., 
duplicates) 
Failure to complete project in a timely manner 
Sophisticated Tools 
– Understand What They Do and Don’t Do Well 
– Inform Yourself, Speak to References, Consultants 
Transparency of Process 
Discussing Review Protocols 
– Provide transparent, defensible, sophisticated search 
based on document content 
– Clustering, Ontologies, Analytics, and yes, sometimes 
Keywords too 
Develop search methodologies for each case 
– Use technology experts in consultation with case / legal 
experts 
Results verifiable by Quality Control 
– Defensible sampling 
Thank you! 
Sonya L. Sigler 
Vice President, Product Strategy 
SFL Data 
415-321-8385 
sonya@sfldata.com 
www.sfldata.com 
Review Protocol 
≠ Agreeing to Search Terms 
Data Culling (upfront or backend) 
Search Methodologies - Continuum 
– Keyword Positive List 
– Ontologies 
– Clustering 
– Technology Enhanced Review 
– Relationship Analysis 
Quality Control Process & Procedures 
Privilege Review, Sensitivities 
Production Format & Timing 
Search 
The Courts are Finally Starting to Catch up to 
Technology 
Making more aggressive rulings: 
– Forcing attorneys to live with the results of bad 
searches 
– Sanctioning those who screw up, even if no allegation 
of fraud 
– Demanding repeatable, 
demonstrable process – using 
terms like “quality assurance” 
Search Under Scrutiny 
Facciola’s Opinions - United States v. O’Keefe 
“for lawyers and judges to dare opine that a certain 
search term or terms would be more likely to produce 
information than [other] search terms … is truly to go 
where angels fear to tread.” 
He has also suggested that litigants take a good look at 
more advanced search methodologies, including the use 
of computational linguistics and technology assisted 
review 
Reasonableness of Search Methods 
Victor Stanley, Inc. v. Creative Pipe, Inc., 2008 WL 2221841 (D. Md., May 29, 2008). 
"Common sense suggests that even a properly designed and executed 
keyword search may prove to be over-inclusive or under-inclusive...the only 
prudent way to test the reliability of the keyword search is to perform some 
appropriate sampling." 
“Selection of the appropriate search and information retrieval technique 
requires careful advance planning by persons qualified to design effective 
search methodology. The implementation of the methodology selected should 
be tested for quality assurance; and the party selecting the methodology must 
be prepared to explain the rationale for the method chosen to the court, 
demonstrate that it is appropriate for the task, and show that it was properly 
implemented.” 
From Pre-Discovery to Production Completeness 
Henry v. Quicken Loans --> 26(f) consulting 
– Lawyers agreed to keyword lists and process 
– Ran own (unsanctioned) searches with expert 
– Told to live with bad results, and pay for it 
Qualcomm --> Smell Test; Dig Deeper 
– In-house counsel (Qualcomm) v. Outside Counsel (Day Casebeer) 
– Sanctions, Attorney-Client Privilege Problems 
– Associate found docs and was told they weren’t relevant; found out the 
hard way that those and 230,000 other pages were relevant 
Judge Rader’s Protocol in TX for Patent cases 
– 5 custodians 
– 5 search terms (can you say over broad…) 
Under-inclusive - Missing in Action 
Missing abbreviations / acronyms / clippings: 
– incentive stock option but not ISO 
– Board of Directors but not BOD 
– 1998 plan but not 98 plan 
Missing inflectional variants: 
– grant but not grants, granted, granting 
Missing spellings or common misspellings: 
– gray but not grey 
– privileged but not priviliged, priviledged, privilidged, 
priveliged, privelidged, priveledged, … 
Missing in Action II 
Missing syntactic variants: 
board of directors meeting 
but not 
meeting of the board 
of directors 
BOD meeting 
board meeting 
BOD mtg 
board mtg 
directors’ meeting 
directors’ mtg 
mtg of the BOD 
mtg of the directors 
BOD meetings 
board meetings 
BOD mtgs 
board mtgs 
directors’ meetings 
directors’ mtgs 
mtgs of the BOD 
mtgs of the directors 
Missing in Action III 
Missing synonyms / paraphrases: 
hire date but not start date 
approved by Smith 
but not 
Smith’s approval 
the approval of Smith 
Smith’s ok 
Smith’s go-ahead 
Smith’s goahead 
the go-ahead from 
Smith 
the goahead from 
Smith 
the nod from Smith 
Smith’s signature 
Smith’s sign-off 
the sign-off of Smith 
the signoff of Smith 
Missing in Action IV 
As a keyword item, the address 
101 E. Bergen Ave., Temple, CA 90200 
does not match any of: 
101 East Bergen Avenue 
the Bergen site 
the Temple location 
our 90200 outlet 
Over-inclusive - Unwanted Extras 
Options 
Target: Sheila was granted 100,000 options at $10 
Match: What are our options for lunch? 
Match in a signature line: 
Amanda Wacz 
Acme Stock Options Administrator 
Destroy 
Target: destroy evidence 
Match in a disclaimer: The information in this email, and any 
attachments, may contain confidential and/or privileged 
information and is intended solely for the use of the named 
recipient(s). Any disclosure or dissemination in whatever form, by 
anyone other than the recipient is strictly prohibited. If you have 
received this transmission in error, please contact the sender 
and destroy this message and any attachments. Thank you. 
Unwanted Extras II 
alter* 
Target: alter, alters, altered, altering 
Matches: alternate, alternative, alternation, altercate, 
altercation, alterably, … 
grant 
Target: stock option grant 
Matches names: Grant Woods, Howard Grant 
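The alter* problem is easy to reproduce. The sketch below treats a trailing * as "any further word characters", which is an assumption about how a given search tool expands wildcards; the sample sentence is invented.

```python
import re


def wildcard_to_regex(term):
    """Translate a keyword with an optional trailing * into a whole-word regex."""
    stem = re.escape(term.rstrip("*"))
    suffix = r"\w*" if term.endswith("*") else ""
    return re.compile(rf"\b{stem}{suffix}\b", re.IGNORECASE)


def hits(term, text):
    """Lowercased words in `text` matched by the keyword."""
    return [m.group(0).lower() for m in wildcard_to_regex(term).finditer(text)]


TARGETS = {"alter", "alters", "altered", "altering"}
text = "She altered the date. The alternative was an altercation. Do not alter it."

found = hits("alter*", text)
unwanted = [word for word in found if word not in TARGETS]
print("found:", found)
print("unwanted extras:", unwanted)
```

Listing the target forms explicitly, instead of using the wildcard, removes the unwanted extras at the cost of a longer term list.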
Tuning an Ontology 
Linguists briefed as reviewers 
Linguists read the data 
Linguists study complaint and other relevant 
documents 
Linguists analyze the search index 
Legal Team provides input, feedback 
A Simple Linguistic Ontology 
ROYALTY CONCEPT 
– Royalty 
– Commission 
– Honorarium 
– Usage Fee 
– Slice of the Pie 
A Simple Pricing Concept 
PRICING CONCEPT 
– Purchase Order 
– PO 
– Dollar amount 
– Invoice 
Adding Subjective Content 
PRICING CONCEPT 
– Purchase Order 
– PO 
– Dollar amount 
– Invoice 
– Cylinder 
– Canister 
– Bottle 
Ontology Usage 
Identifying Misspellings, Slang, Nicknames, etc. 
Variant Generation – help the user find what he 
meant (names, words, suggestions) 
– Buy* Buying, Buys, Bought, etc. 
– Kenneth Lay, Ken Lay, klay, kenneth.lay 
View variations in context to choose topics 
Document segmentation – text blocks, signatures 
Finding Words in Context, Frequency 
at serious risk of losing 25 
are certain risks inherent in 16 
Identifying misspellings, slang, etc 
1. Match the index against electronic dictionary. 
2. From the remaining material (not in dictionary), remove any 
items that are merely numbers. 
3. Find (in the ontologies) any words that are similar to what 
remains. 
4. Add the similar words to the ontology 
This increases coverage (i.e., ensures 
that we retrieve documents that 
otherwise would have been missed) 
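The four steps above can be sketched with Python's standard `difflib` module. The similarity measure and the 0.8 cutoff are assumptions; the slides do not say how "similar" is computed, and the word lists are invented.

```python
import difflib


def expand_ontology(index_terms, dictionary, ontology, cutoff=0.8):
    """Return the expanded ontology and a map of added variant -> known word."""
    # Step 1: keep only index terms that are not in the dictionary.
    unknown = [t.lower() for t in index_terms if t.lower() not in dictionary]
    # Step 2: drop items that are merely numbers.
    unknown = [t for t in unknown
               if not t.replace(",", "").replace(".", "").isdigit()]
    # Step 3: find ontology words similar to what remains.
    additions = {}
    for term in unknown:
        close = difflib.get_close_matches(term, ontology, n=1, cutoff=cutoff)
        if close:
            additions[term] = close[0]
    # Step 4: add the similar words to the ontology.
    return sorted(set(ontology) | set(additions)), additions


# Hypothetical index, dictionary and ontology for illustration only.
index_terms = ["privileged", "priviledged", "2008", "lisense", "zebra", "license"]
dictionary = {"privileged", "license", "zebra"}
ontology = ["privileged", "license"]

expanded, additions = expand_ontology(index_terms, dictionary, ontology)
print(additions)
```

In practice a reviewer would still confirm each suggested addition, since a close spelling is not always the same word.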
Variant Generation 
Help the user search for what he meant 
Take names, numbers, and other entities for which the user wants to search 
Automatically generate likely synonyms 
Variant Generation 
Show the context of these variations, so the user can 
evaluate them. 
Document Segmentation 
Examples of signatures 
Jean-Louis Koenig 
President GGDA Region 
MegaCorp International SA 
Rue de Concours 2280 
Bern, Switzerland 
Robert Guilliam 
Product Regulatory Affairs & Compliance 
MegaCorp International 
Neuchatel 
Switzerland 
Tél. +41 (31) 125 2366 
Alberto Goreman 
Manager Printing & Packaging, Eastern Region 
+57 3 451 7195, alberto_goreman@megacorp.com 
Finding words in context 
Phrase                                 Total Instances 
risks alienating some                  37 
at serious risk of losing              25 
are certain risks inherent in          16 
are at risk of running                 15 
it be risking anything by              15 
difference a risk o why                14 
and the risks inherent in              12 
without assuming any risk              8 
we could risk losing next              7 
avoid transferring risk to the         5 
requires taking risks and the          4 
can t risk not living                  3 
and unknown risks and uncertainties    2 
a potential risk that was              2 
avoid transfering risk to the          2 
This increases coverage AND precision 
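A table like this can be produced by counting the words around every occurrence of a stem. A minimal sketch, assuming a fixed window of two words on each side (the window used for the slide's table is not stated) and invented sample sentences:

```python
import re
from collections import Counter


def phrases_in_context(texts, stem, window=2):
    """Count each occurrence of `stem*` together with its surrounding words."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        for i, token in enumerate(tokens):
            if token.startswith(stem):
                start = max(0, i - window)
                counts[" ".join(tokens[start:i + window + 1])] += 1
    return counts


# Hypothetical sentences for illustration only.
texts = [
    "We are at serious risk of losing the account.",
    "They are at serious risk of losing share.",
    "There are certain risks inherent in trading.",
]

table = phrases_in_context(texts, "risk")
for phrase, total in table.most_common():
    print(f"{phrase:<35}{total}")
```

Reviewing the phrases by frequency shows which uses of a word matter to the case and which are noise.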
Multi-Lingual Issues 
Does language matter? 
– Lucerne 
– Luzerne 
– Lucerna 
These names all refer to the same city 
Name of city not necessarily expressed in the same 
language as rest of document 
In Europe, many email threads and documents are 
mixed language, and must be properly categorized as 
such 
Automated Ontology Expansion Tools 
Currently implemented expansion modules: 
Spelling variants: 
color >> colour, defense >> defence, labeled >> labelled 
Lemmatization (recovering uninflected form): 
walking >> walk, ate >> eat 
Morphological variants: 
eat >> eats, eating, eaten, ate 
hablar >> hablo, hablas, habla, hablan, habláis, hablamos 
Number expansion: 
$2.5B >> two point five billion dollars 
2,567 >> two thousand five hundred sixty seven 
13 >> 13th, thirteenth 
Name variants: 
Elizabeth Van der Beek >> “Liz Van der Beek”, “Liz Vander Beek”, “Van 
der Beek, Elizabeth”, “Beth Vanderbeek”, etc. 
Email variants (mined from alias clusters file): 
Elizabeth Van der Beek >> evanderbeek, liz.vanderbeek, vanderbeekl, 
emvanderbeek, etc. 
Abbreviations: 
administrative project meeting >> admin project meeting, admin project 
mtg, admin proj mtg, etc. 
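As an illustration of one expansion module, email-style variants of a name can be generated from a few common address patterns. This is a hypothetical pattern-based sketch; the slide says the real module mines variants from an alias clusters file, and the three patterns and the nickname list here are assumptions.

```python
def email_variants(first, last, nicknames=()):
    """Likely mailbox names for a person, built from common address patterns."""
    surname = last.replace(" ", "").lower()
    first_names = [first.lower()] + [n.lower() for n in nicknames]
    variants = set()
    for name in first_names:
        variants.add(f"{name[0]}{surname}")   # initial + surname
        variants.add(f"{name}.{surname}")     # first.surname
        variants.add(f"{surname}{name[0]}")   # surname + initial
    return sorted(variants)


variants = email_variants("Elizabeth", "Van der Beek", nicknames=["Liz"])
print(variants)
```

Generated variants would normally be checked against the addresses actually present in the data before being added to a search.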
Uma história de reconciliaçãoUma história de reconciliação
Uma história de reconciliaçãoAmanda Duarte
 
Discurso De Posse D
Discurso De  Posse  DDiscurso De  Posse  D
Discurso De Posse Drowalino
 
Discurso De Posse D
Discurso De  Posse  DDiscurso De  Posse  D
Discurso De Posse Drowalino
 
Legalização de documentos belo horizonte
Legalização de documentos  belo horizonteLegalização de documentos  belo horizonte
Legalização de documentos belo horizontejuramentado02
 

Viewers also liked (8)

Pericles
PericlesPericles
Pericles
 
Morforegiaona nossa região
Morforegiaona nossa regiãoMorforegiaona nossa região
Morforegiaona nossa região
 
Uma história de reconciliação
Uma história de reconciliaçãoUma história de reconciliação
Uma história de reconciliação
 
025 judas
025 judas025 judas
025 judas
 
Discurso De Posse D
Discurso De  Posse  DDiscurso De  Posse  D
Discurso De Posse D
 
Artigo nadja
Artigo nadjaArtigo nadja
Artigo nadja
 
Discurso De Posse D
Discurso De  Posse  DDiscurso De  Posse  D
Discurso De Posse D
 
Legalização de documentos belo horizonte
Legalização de documentos  belo horizonteLegalização de documentos  belo horizonte
Legalização de documentos belo horizonte
 

Similar to Georgetown lecture 2012 6 2 full

Georgetown Law Guest Lecture 2012 6 2
Georgetown Law Guest Lecture 2012 6 2Georgetown Law Guest Lecture 2012 6 2
Georgetown Law Guest Lecture 2012 6 2Sonya Sigler
 
Electric Insurance ESI Planning
Electric Insurance   ESI PlanningElectric Insurance   ESI Planning
Electric Insurance ESI PlanningJohn Jablonski
 
Data collection methods
Data collection methodsData collection methods
Data collection methodsashima_sodhi
 
Market and Social Research Part 8
Market and Social Research Part 8Market and Social Research Part 8
Market and Social Research Part 8bestsliders
 
Proportionality in Ediscovery
Proportionality in EdiscoveryProportionality in Ediscovery
Proportionality in EdiscoveryJosh Kubicki
 
Research Methodology Module-04
Research Methodology Module-04Research Methodology Module-04
Research Methodology Module-04Kishor Ade
 
Rsearch methodology
Rsearch methodologyRsearch methodology
Rsearch methodologyneeann24
 
Module 3 - Improving Current Business with External Data- Online
Module 3 - Improving Current Business with External Data- Online Module 3 - Improving Current Business with External Data- Online
Module 3 - Improving Current Business with External Data- Online caniceconsulting
 
T3 data collecting techniques
T3 data collecting techniquesT3 data collecting techniques
T3 data collecting techniqueskompellark
 
Data collection methods
Data collection methodsData collection methods
Data collection methodsSourabh Modgil
 
Custodian Interviews - How to Leverage a Valuable Opportunity
Custodian Interviews - How to Leverage a Valuable Opportunity Custodian Interviews - How to Leverage a Valuable Opportunity
Custodian Interviews - How to Leverage a Valuable Opportunity Logikcull.com
 
Computer Assisted Review and Reasonable Solutions under Rule26
Computer Assisted Review and Reasonable Solutions under Rule26Computer Assisted Review and Reasonable Solutions under Rule26
Computer Assisted Review and Reasonable Solutions under Rule26Michael Geske
 
Theres No Crying In Baseball...Or In E Discovery 04.30.10
Theres No Crying In Baseball...Or In E Discovery 04.30.10Theres No Crying In Baseball...Or In E Discovery 04.30.10
Theres No Crying In Baseball...Or In E Discovery 04.30.10knugent
 
Managing data responsibly to enable research interity
Managing data responsibly to enable research interityManaging data responsibly to enable research interity
Managing data responsibly to enable research interityIUPUI
 
Different Methods of Collection of Data
Different Methods of Collection of DataDifferent Methods of Collection of Data
Different Methods of Collection of DataP. Veeresha
 
250 words agree or disagreePlease discuss the various limitation.docx
250 words agree or disagreePlease discuss the various limitation.docx250 words agree or disagreePlease discuss the various limitation.docx
250 words agree or disagreePlease discuss the various limitation.docxvickeryr87
 
Evidence Integrity And Evidence Continuity Essay
Evidence Integrity And Evidence Continuity EssayEvidence Integrity And Evidence Continuity Essay
Evidence Integrity And Evidence Continuity EssayJessica Howard
 
data collection methods
data collection methodsdata collection methods
data collection methodsKingMajanga
 
Streamlining Document Review & Production: Pitfalls and Best Practices
Streamlining Document Review & Production: Pitfalls and Best Practices Streamlining Document Review & Production: Pitfalls and Best Practices
Streamlining Document Review & Production: Pitfalls and Best Practices Osler, Hoskin & Harcourt LLP
 

Similar to Georgetown lecture 2012 6 2 full (20)

Georgetown Law Guest Lecture 2012 6 2
Georgetown Law Guest Lecture 2012 6 2Georgetown Law Guest Lecture 2012 6 2
Georgetown Law Guest Lecture 2012 6 2
 
Electric Insurance ESI Planning
Electric Insurance   ESI PlanningElectric Insurance   ESI Planning
Electric Insurance ESI Planning
 
Data collection methods
Data collection methodsData collection methods
Data collection methods
 
Market and Social Research Part 8
Market and Social Research Part 8Market and Social Research Part 8
Market and Social Research Part 8
 
Proportionality in Ediscovery
Proportionality in EdiscoveryProportionality in Ediscovery
Proportionality in Ediscovery
 
Research Methodology Module-04
Research Methodology Module-04Research Methodology Module-04
Research Methodology Module-04
 
Rsearch methodology
Rsearch methodologyRsearch methodology
Rsearch methodology
 
Module 3 - Improving Current Business with External Data- Online
Module 3 - Improving Current Business with External Data- Online Module 3 - Improving Current Business with External Data- Online
Module 3 - Improving Current Business with External Data- Online
 
T3 data collecting techniques
T3 data collecting techniquesT3 data collecting techniques
T3 data collecting techniques
 
Data collection methods
Data collection methodsData collection methods
Data collection methods
 
Custodian Interviews - How to Leverage a Valuable Opportunity
Custodian Interviews - How to Leverage a Valuable Opportunity Custodian Interviews - How to Leverage a Valuable Opportunity
Custodian Interviews - How to Leverage a Valuable Opportunity
 
Computer Assisted Review and Reasonable Solutions under Rule26
Computer Assisted Review and Reasonable Solutions under Rule26Computer Assisted Review and Reasonable Solutions under Rule26
Computer Assisted Review and Reasonable Solutions under Rule26
 
Theres No Crying In Baseball...Or In E Discovery 04.30.10
Theres No Crying In Baseball...Or In E Discovery 04.30.10Theres No Crying In Baseball...Or In E Discovery 04.30.10
Theres No Crying In Baseball...Or In E Discovery 04.30.10
 
Rm sem-3
Rm sem-3Rm sem-3
Rm sem-3
 
Managing data responsibly to enable research interity
Managing data responsibly to enable research interityManaging data responsibly to enable research interity
Managing data responsibly to enable research interity
 
Different Methods of Collection of Data
Different Methods of Collection of DataDifferent Methods of Collection of Data
Different Methods of Collection of Data
 
250 words agree or disagreePlease discuss the various limitation.docx
250 words agree or disagreePlease discuss the various limitation.docx250 words agree or disagreePlease discuss the various limitation.docx
250 words agree or disagreePlease discuss the various limitation.docx
 
Evidence Integrity And Evidence Continuity Essay
Evidence Integrity And Evidence Continuity EssayEvidence Integrity And Evidence Continuity Essay
Evidence Integrity And Evidence Continuity Essay
 
data collection methods
data collection methodsdata collection methods
data collection methods
 
Streamlining Document Review & Production: Pitfalls and Best Practices
Streamlining Document Review & Production: Pitfalls and Best Practices Streamlining Document Review & Production: Pitfalls and Best Practices
Streamlining Document Review & Production: Pitfalls and Best Practices
 

More from Sonya Sigler

2013 3 27 TAR Webinar Part 4 Getting Started Sigler
2013 3 27 TAR Webinar Part 4 Getting Started Sigler2013 3 27 TAR Webinar Part 4 Getting Started Sigler
2013 3 27 TAR Webinar Part 4 Getting Started SiglerSonya Sigler
 
2013 7 24 TAR Webinar 5 Tips & Myths Sigler
2013 7 24 TAR Webinar 5 Tips & Myths Sigler2013 7 24 TAR Webinar 5 Tips & Myths Sigler
2013 7 24 TAR Webinar 5 Tips & Myths SiglerSonya Sigler
 
2012 6 27 TAR Webinar Part 1 Sigler
2012 6 27 TAR Webinar Part 1 Sigler2012 6 27 TAR Webinar Part 1 Sigler
2012 6 27 TAR Webinar Part 1 SiglerSonya Sigler
 
2012 11 7 TAR Webinar Part 3 Sigler
2012 11 7 TAR Webinar Part 3 Sigler2012 11 7 TAR Webinar Part 3 Sigler
2012 11 7 TAR Webinar Part 3 SiglerSonya Sigler
 
2012 8 29 TAR Webinar Part 2 Sigler
2012 8 29 TAR Webinar Part 2 Sigler2012 8 29 TAR Webinar Part 2 Sigler
2012 8 29 TAR Webinar Part 2 SiglerSonya Sigler
 
SF Women in eDiscovery Sept 2011
SF Women in eDiscovery Sept 2011SF Women in eDiscovery Sept 2011
SF Women in eDiscovery Sept 2011Sonya Sigler
 

More from Sonya Sigler (6)

2013 3 27 TAR Webinar Part 4 Getting Started Sigler
2013 3 27 TAR Webinar Part 4 Getting Started Sigler2013 3 27 TAR Webinar Part 4 Getting Started Sigler
2013 3 27 TAR Webinar Part 4 Getting Started Sigler
 
2013 7 24 TAR Webinar 5 Tips & Myths Sigler
2013 7 24 TAR Webinar 5 Tips & Myths Sigler2013 7 24 TAR Webinar 5 Tips & Myths Sigler
2013 7 24 TAR Webinar 5 Tips & Myths Sigler
 
2012 6 27 TAR Webinar Part 1 Sigler
2012 6 27 TAR Webinar Part 1 Sigler2012 6 27 TAR Webinar Part 1 Sigler
2012 6 27 TAR Webinar Part 1 Sigler
 
2012 11 7 TAR Webinar Part 3 Sigler
2012 11 7 TAR Webinar Part 3 Sigler2012 11 7 TAR Webinar Part 3 Sigler
2012 11 7 TAR Webinar Part 3 Sigler
 
2012 8 29 TAR Webinar Part 2 Sigler
2012 8 29 TAR Webinar Part 2 Sigler2012 8 29 TAR Webinar Part 2 Sigler
2012 8 29 TAR Webinar Part 2 Sigler
 
SF Women in eDiscovery Sept 2011
SF Women in eDiscovery Sept 2011SF Women in eDiscovery Sept 2011
SF Women in eDiscovery Sept 2011
 

Recently uploaded

A SHORT HISTORY OF LIBERTY'S PROGREE THROUGH HE EIGHTEENTH CENTURY
A SHORT HISTORY OF LIBERTY'S PROGREE THROUGH HE EIGHTEENTH CENTURYA SHORT HISTORY OF LIBERTY'S PROGREE THROUGH HE EIGHTEENTH CENTURY
A SHORT HISTORY OF LIBERTY'S PROGREE THROUGH HE EIGHTEENTH CENTURYJulian Scutts
 
Shubh_Burden of proof_Indian Evidence Act.pptx
Shubh_Burden of proof_Indian Evidence Act.pptxShubh_Burden of proof_Indian Evidence Act.pptx
Shubh_Burden of proof_Indian Evidence Act.pptxShubham Wadhonkar
 
一比一原版(UM毕业证书)美国密歇根大学安娜堡分校毕业证如何办理
一比一原版(UM毕业证书)美国密歇根大学安娜堡分校毕业证如何办理一比一原版(UM毕业证书)美国密歇根大学安娜堡分校毕业证如何办理
一比一原版(UM毕业证书)美国密歇根大学安娜堡分校毕业证如何办理A AA
 
Understanding the Role of Labor Unions and Collective Bargaining
Understanding the Role of Labor Unions and Collective BargainingUnderstanding the Role of Labor Unions and Collective Bargaining
Understanding the Role of Labor Unions and Collective Bargainingbartzlawgroup1
 
一比一原版曼彻斯特城市大学毕业证如何办理
一比一原版曼彻斯特城市大学毕业证如何办理一比一原版曼彻斯特城市大学毕业证如何办理
一比一原版曼彻斯特城市大学毕业证如何办理Airst S
 
3 Formation of Company.www.seribangash.com.ppt
3 Formation of Company.www.seribangash.com.ppt3 Formation of Company.www.seribangash.com.ppt
3 Formation of Company.www.seribangash.com.pptseri bangash
 
589308994-interpretation-of-statutes-notes-law-college.pdf
589308994-interpretation-of-statutes-notes-law-college.pdf589308994-interpretation-of-statutes-notes-law-college.pdf
589308994-interpretation-of-statutes-notes-law-college.pdfSUSHMITAPOTHAL
 
一比一原版(USC毕业证书)南加州大学毕业证学位证书
一比一原版(USC毕业证书)南加州大学毕业证学位证书一比一原版(USC毕业证书)南加州大学毕业证学位证书
一比一原版(USC毕业证书)南加州大学毕业证学位证书irst
 
一比一原版悉尼大学毕业证如何办理
一比一原版悉尼大学毕业证如何办理一比一原版悉尼大学毕业证如何办理
一比一原版悉尼大学毕业证如何办理Airst S
 
一比一原版伦敦南岸大学毕业证如何办理
一比一原版伦敦南岸大学毕业证如何办理一比一原版伦敦南岸大学毕业证如何办理
一比一原版伦敦南岸大学毕业证如何办理Airst S
 
一比一原版埃克塞特大学毕业证如何办理
一比一原版埃克塞特大学毕业证如何办理一比一原版埃克塞特大学毕业证如何办理
一比一原版埃克塞特大学毕业证如何办理Airst S
 
Hely-Hutchinson v. Brayhead Ltd .pdf
Hely-Hutchinson v. Brayhead Ltd         .pdfHely-Hutchinson v. Brayhead Ltd         .pdf
Hely-Hutchinson v. Brayhead Ltd .pdfBritto Valan
 
Corporate Sustainability Due Diligence Directive (CSDDD or the EU Supply Chai...
Corporate Sustainability Due Diligence Directive (CSDDD or the EU Supply Chai...Corporate Sustainability Due Diligence Directive (CSDDD or the EU Supply Chai...
Corporate Sustainability Due Diligence Directive (CSDDD or the EU Supply Chai...Dr. Oliver Massmann
 
Navigating Employment Law - Term Project.pptx
Navigating Employment Law - Term Project.pptxNavigating Employment Law - Term Project.pptx
Navigating Employment Law - Term Project.pptxelysemiller87
 
一比一原版(USYD毕业证书)澳洲悉尼大学毕业证如何办理
一比一原版(USYD毕业证书)澳洲悉尼大学毕业证如何办理一比一原版(USYD毕业证书)澳洲悉尼大学毕业证如何办理
一比一原版(USYD毕业证书)澳洲悉尼大学毕业证如何办理A AA
 
ASMA JILANI EXPLAINED CASE PLD 1972 FOR CSS
ASMA JILANI EXPLAINED CASE PLD 1972 FOR CSSASMA JILANI EXPLAINED CASE PLD 1972 FOR CSS
ASMA JILANI EXPLAINED CASE PLD 1972 FOR CSSCssSpamx
 
Interpretation of statute topics for project
Interpretation of statute topics for projectInterpretation of statute topics for project
Interpretation of statute topics for projectVarshRR
 
Independent Call Girls Pune | 8005736733 Independent Escorts & Dating Escorts...
Independent Call Girls Pune | 8005736733 Independent Escorts & Dating Escorts...Independent Call Girls Pune | 8005736733 Independent Escorts & Dating Escorts...
Independent Call Girls Pune | 8005736733 Independent Escorts & Dating Escorts...SUHANI PANDEY
 
ARTICLE 370 PDF about the indian constitution.
ARTICLE 370 PDF about the  indian constitution.ARTICLE 370 PDF about the  indian constitution.
ARTICLE 370 PDF about the indian constitution.tanughoshal0
 
一比一原版(ECU毕业证书)埃迪斯科文大学毕业证如何办理
一比一原版(ECU毕业证书)埃迪斯科文大学毕业证如何办理一比一原版(ECU毕业证书)埃迪斯科文大学毕业证如何办理
一比一原版(ECU毕业证书)埃迪斯科文大学毕业证如何办理Airst S
 

Recently uploaded (20)

A SHORT HISTORY OF LIBERTY'S PROGREE THROUGH HE EIGHTEENTH CENTURY
A SHORT HISTORY OF LIBERTY'S PROGREE THROUGH HE EIGHTEENTH CENTURYA SHORT HISTORY OF LIBERTY'S PROGREE THROUGH HE EIGHTEENTH CENTURY
A SHORT HISTORY OF LIBERTY'S PROGREE THROUGH HE EIGHTEENTH CENTURY
 
Shubh_Burden of proof_Indian Evidence Act.pptx
Shubh_Burden of proof_Indian Evidence Act.pptxShubh_Burden of proof_Indian Evidence Act.pptx
Shubh_Burden of proof_Indian Evidence Act.pptx
 
一比一原版(UM毕业证书)美国密歇根大学安娜堡分校毕业证如何办理
一比一原版(UM毕业证书)美国密歇根大学安娜堡分校毕业证如何办理一比一原版(UM毕业证书)美国密歇根大学安娜堡分校毕业证如何办理
一比一原版(UM毕业证书)美国密歇根大学安娜堡分校毕业证如何办理
 
Understanding the Role of Labor Unions and Collective Bargaining
Understanding the Role of Labor Unions and Collective BargainingUnderstanding the Role of Labor Unions and Collective Bargaining
Understanding the Role of Labor Unions and Collective Bargaining
 
一比一原版曼彻斯特城市大学毕业证如何办理
一比一原版曼彻斯特城市大学毕业证如何办理一比一原版曼彻斯特城市大学毕业证如何办理
一比一原版曼彻斯特城市大学毕业证如何办理
 
3 Formation of Company.www.seribangash.com.ppt
3 Formation of Company.www.seribangash.com.ppt3 Formation of Company.www.seribangash.com.ppt
3 Formation of Company.www.seribangash.com.ppt
 
589308994-interpretation-of-statutes-notes-law-college.pdf
589308994-interpretation-of-statutes-notes-law-college.pdf589308994-interpretation-of-statutes-notes-law-college.pdf
589308994-interpretation-of-statutes-notes-law-college.pdf
 
一比一原版(USC毕业证书)南加州大学毕业证学位证书
一比一原版(USC毕业证书)南加州大学毕业证学位证书一比一原版(USC毕业证书)南加州大学毕业证学位证书
一比一原版(USC毕业证书)南加州大学毕业证学位证书
 
一比一原版悉尼大学毕业证如何办理
一比一原版悉尼大学毕业证如何办理一比一原版悉尼大学毕业证如何办理
一比一原版悉尼大学毕业证如何办理
 
一比一原版伦敦南岸大学毕业证如何办理
一比一原版伦敦南岸大学毕业证如何办理一比一原版伦敦南岸大学毕业证如何办理
一比一原版伦敦南岸大学毕业证如何办理
 
一比一原版埃克塞特大学毕业证如何办理
一比一原版埃克塞特大学毕业证如何办理一比一原版埃克塞特大学毕业证如何办理
一比一原版埃克塞特大学毕业证如何办理
 
Hely-Hutchinson v. Brayhead Ltd .pdf
Hely-Hutchinson v. Brayhead Ltd         .pdfHely-Hutchinson v. Brayhead Ltd         .pdf
Hely-Hutchinson v. Brayhead Ltd .pdf
 
Corporate Sustainability Due Diligence Directive (CSDDD or the EU Supply Chai...
Corporate Sustainability Due Diligence Directive (CSDDD or the EU Supply Chai...Corporate Sustainability Due Diligence Directive (CSDDD or the EU Supply Chai...
Corporate Sustainability Due Diligence Directive (CSDDD or the EU Supply Chai...
 
Navigating Employment Law - Term Project.pptx
Navigating Employment Law - Term Project.pptxNavigating Employment Law - Term Project.pptx
Navigating Employment Law - Term Project.pptx
 
一比一原版(USYD毕业证书)澳洲悉尼大学毕业证如何办理
一比一原版(USYD毕业证书)澳洲悉尼大学毕业证如何办理一比一原版(USYD毕业证书)澳洲悉尼大学毕业证如何办理
一比一原版(USYD毕业证书)澳洲悉尼大学毕业证如何办理
 
ASMA JILANI EXPLAINED CASE PLD 1972 FOR CSS
ASMA JILANI EXPLAINED CASE PLD 1972 FOR CSSASMA JILANI EXPLAINED CASE PLD 1972 FOR CSS
ASMA JILANI EXPLAINED CASE PLD 1972 FOR CSS
 
Interpretation of statute topics for project
Interpretation of statute topics for projectInterpretation of statute topics for project
Interpretation of statute topics for project
 
Independent Call Girls Pune | 8005736733 Independent Escorts & Dating Escorts...
Independent Call Girls Pune | 8005736733 Independent Escorts & Dating Escorts...Independent Call Girls Pune | 8005736733 Independent Escorts & Dating Escorts...
Independent Call Girls Pune | 8005736733 Independent Escorts & Dating Escorts...
 
ARTICLE 370 PDF about the indian constitution.
ARTICLE 370 PDF about the  indian constitution.ARTICLE 370 PDF about the  indian constitution.
ARTICLE 370 PDF about the indian constitution.
 
一比一原版(ECU毕业证书)埃迪斯科文大学毕业证如何办理
一比一原版(ECU毕业证书)埃迪斯科文大学毕业证如何办理一比一原版(ECU毕业证书)埃迪斯科文大学毕业证如何办理
一比一原版(ECU毕业证书)埃迪斯科文大学毕业证如何办理
 

Georgetown lecture 2012 6 2 full

  • 1. “Triggers,” Preservation & Search June 2, 2012 Georgetown Law Sonya L. Sigler 9/23/2014 1
  • 2. Overview Triggers & Preservation • What is it? • Why Does it Matter? Search Keyword Search Clustering Ontologies Technology Enhanced Review - Sampling Social Networking Analysis Relationship Analysis 9/23/2014 2
  • 3. “Triggers” & Preservation What is a Trigger? – Litigation reasonably anticipated – Who decides Litigation Hold Continuum – Established in hindsight – Threat – Letter about litigation – Filing Suit Cases – Pippins, Zubulake, Pension Committee
  • 4. Pippins v. KPMG How much data to Preserve? – All hard drives (Pippins’ position) – 100 Sample Hard drives (KPMG’s position) To Cooperate or NOT to Cooperate? How Judges React to Lack of Cooperation 9/23/2014 4
  • 5. Zubulake Litigation Holds – Cannot send a request into the ether Preservation Have to follow up Take affirmative steps to monitor compliance In-house Counsel Duty Cannot leave it to employees’ discretion Document what was done
  • 6. Pension Committee No intentional destruction of data Careless & indifferent No Latchkey Custodians (alone & unsupervised) – Identify Custodians – Monitor their efforts – Including former employees and third parties Proactive Consistent Reasonable Approach 9/23/2014 6
  • 7. Triggers When does a duty to preserve arise? 9/23/2014 7
  • 8. What To Do? Who to include? – Not about data volume – Not about contact with underlying “litigation” Key Players (Zubulake opinions) – Likely to have relevant information – CEO, Board, Committees, employees, etc. Produce it from the Key Player (not others) – Nursing Home Pension Fund v. Oracle – Produce emails from the CEO (15) not others (1,650) 9/23/2014 8
  • 9. Spoliation Failure to Preserve – Didn’t Ask • Right person • Right Place – Didn’t follow up Destruction of Data – Intentional – Inadvertent destruction What can happen – Sanctions – Adverse Inferences 9/23/2014 9
  • 10. Search How to Use it To Find Information How to Use it to Ignore Information When to use which search methodology 9/23/2014 10
  • 11. Search - Data Assessment Where is the Data? – Data Mapping - databases, servers, desktops, laptops, IMs, smart phones, voicemail, other records Defining Process from Collection to Review to Production Collection Strategy, Process, Approach – Scope of collection: custodians, date ranges, topics Reports on the Data Processing – File types, encrypted files, de-duplication rates, password protected files, etc. Not Reasonably Accessible data Assessing Risk of Data Loss
  • 12. Search - Case Assessment Who - Cast of Characters What - What the Heck Happened? Where - Where did it take place? When - What time period are we concerned with? How - fraud, antitrust violation, etc. WHY - What were the motives involved? Data Assessment ≠ Effective Case Assessment 9/23/2014 12
  • 13. Keyword Search Under Scrutiny United States v. O’Keefe (Facciola) – Questioned lawyers’ ability to decide which search terms are more likely to produce relevant information – Facciola has also suggested that litigants take a look at advanced search methodologies Victor Stanley, Inc. v. Creative Pipe, Inc. (Grimm) – Defensibility of process AND execution lies with the party relying upon the search protocol to meet their obligations which needs to be able to explain search rationale, appropriateness, and proper implementation – Advocates quality assurance, e.g. by sampling – Searches should be designed by a competent practitioner 9/23/2014 13
  • 14. Keyword Specific Case William A. Gross Construction Associates, Inc. v. American Manufacturers Mutual Insurance Company SDNY, Judge Andrew Peck Keyword list was in the thousands Use the actual data set and custodians to figure out keywords “This case is just the latest example of lawyers designing keyword searches in the dark, by the seat of the pants, without adequate (indeed, here, apparently without any) discussion with those who wrote the emails. Prior decisions from Magistrate Judges in the Baltimore-Washington Beltway have warned counsel of this problem, but the message has not gotten through to the Bar in this District.”
  • 15. $6M Keyword Mistake In re Fannie Mae Securities Litigation 3rd Party - OFHEO DC Circuit - Judge David Tatel Attorney agreed to something he did NOT understand Long list of key terms Taxpayers suffered the consequence 9/23/2014 15
  • 16. What This Means • The Courts are finally catching up • Courts actively ruling on Standards of Care and Process • Lawyers are Getting Wise 9/23/2014 16
  • 17. Case Law Effects on Discovery Defensibility of Review Process is now a focus – Culling now can kill you later – Cooperation is a hot topic – Tussle between inside & outside counsel – Beginning to see planning as a necessity Increased focus on Quality – Heightened involvement expected from corporate clients in the overall process – Cases pushing this, Qualcomm, Creative Pipe 9/23/2014 17
  • 18. What Else Is There? Effort to establish & codify uniform “Best Practices” – Quickly becoming roadmap for uneducated industry – Increasingly relied upon by judges as measure of reasonable or standard behavior Publications have addressed: – Document retention & production – Email management – Search & Retrieval – Protective orders & confidentiality – ESI admissibility 9/23/2014 18
  • 19. Getting to a Manageable Review Set Focus on finding, reviewing & using the “right” data, not just filtering data Intake Data 100% Duplicates 25% Non-Responsive 20% Junk/Spam/Porn 20% NR/Priv 20% Responsive & Priv 15% Produced 12.25% These figures vary based upon the data set received
  • 20. Search Methodologies (diagram) Visualization; Measurement. Relationship Analysis: documents with causal or sequential relationship. Social Network Analysis: relationships among relevant people. Clustering: similarity of salient features. Ontology: generalized words or phrases. Keyword: specific exact words, proximity searches, stemming. Labels: Context, Concept, Content
  • 21. Keyword Accuracy Example Keyword search reduced the document set by only 47% And 88% of the documents returned by keyword search were not responsive (Over-inclusive) 8,553 responsive documents missed by keyword search (Almost 8% of responsive documents missed by keyword search - Under-inclusive) 9/23/2014 21
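The over-inclusive and under-inclusive figures on this slide correspond to what information retrieval calls precision and recall. The sketch below is a back-of-the-envelope calculation only: it treats "almost 8%" as exactly 8% (an assumption), and the derived totals are illustrative, not figures from the underlying matter.

```python
def precision(returned_responsive: float, returned_total: float) -> float:
    """Share of returned documents that are actually responsive."""
    return returned_responsive / returned_total


def recall(returned_responsive: float, missed_responsive: float) -> float:
    """Share of all responsive documents that the search returned."""
    return returned_responsive / (returned_responsive + missed_responsive)


# Figures taken from the slide.
share_not_responsive = 0.88   # 88% of keyword hits were not responsive
missed = 8553                 # responsive documents the keywords missed
share_missed = 0.08           # ASSUMPTION: "almost 8%" treated as exactly 8%

# Derived, illustrative quantities.
total_responsive = missed / share_missed
found = total_responsive - missed
returned_total = found / (1 - share_not_responsive)

p = precision(found, returned_total)
r = recall(found, missed)
print(f"precision ~ {p:.2f}, recall ~ {r:.2f}")
```

Under that assumption the keyword search has roughly 12% precision and 92% recall: most responsive documents are found, but reviewers pay for it by reading mostly non-responsive hits.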
  • 22. Myth Keyword Searching is the Way to Go If I agree to keyword terms, I am OK Keyword Search Cases Keyword replacement example Keyword substitution Missing in Action (Under-inclusive) Unwanted Extras (Over-inclusive) Multiple subject/persons (Disambiguate)
  • 23. Fact or Myth? Manual review by humans of large amounts of information is as accurate and complete as possible - perhaps even perfect - and constitutes the gold standard by which all searches should be measured This is “The reigning Myth of ‘perfect’ retrieval using traditional means” Best Practices Commentary on the Use of Search and Information Retrieval Methods in E-Discovery The Sedona Conference Journal (2007) p. 199 Human beings retrieved less than 20% of the relevant documents when they believed they were retrieving over 75% An Evaluation of Retrieval Effectiveness for a Full-Text Document Retrieval System Blair & Maron (1985) 9/23/2014 23
  • 24. IS 240 – Spring 2011 Blair and Maron 1985 A classic study of retrieval effectiveness – earlier studies were on unrealistically small collections Studied an archive of documents for a legal suit – ~350,000 pages of text – 40 queries – focus on high recall – Used IBM’s STAIRS full-text system Main Result: – The system retrieved less than 20% of the relevant documents for a particular information need; lawyers thought they had 75% But many queries had very high precision
  • 25. IS 240 – Spring 2011 Blair and Maron, cont. How they estimated recall – generated partially random samples of unseen documents – had users (unaware these were random) judge them for relevance Other results: – two lawyers’ searches had similar performance – lawyers’ recall was not much different from paralegals’
  • 26. IS 240 – Spring 2011 Blair and Maron, cont. Why recall was low – users can’t foresee exact words and phrases that will indicate relevant documents • “accident” referred to by those responsible as: “event,” “incident,” “situation,” “problem,” … • differing technical terminology • slang, misspellings – Perhaps the value of higher recall decreases as the number of relevant documents grows, so more detailed queries were not attempted once the users were satisfied
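The sampling approach described for Blair and Maron can be sketched as simple arithmetic: judge a random sample of the documents the search did not retrieve, scale the relevant share up to the whole unretrieved set, and compare with what was retrieved. The function name and the example numbers below are hypothetical; this is a simplified illustration, not the study's exact procedure.

```python
def estimate_recall(retrieved_relevant: int,
                    unretrieved_total: int,
                    sample_size: int,
                    sample_relevant: int) -> float:
    """Estimate recall from a random sample of unretrieved documents.

    retrieved_relevant: relevant documents the search actually returned
    unretrieved_total:  documents the search did not return
    sample_size:        unretrieved documents drawn at random and judged
    sample_relevant:    how many of the sampled documents were relevant
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    # Extrapolate the sample's relevant share to the whole unretrieved set.
    estimated_missed = (sample_relevant / sample_size) * unretrieved_total
    denominator = retrieved_relevant + estimated_missed
    return retrieved_relevant / denominator if denominator else 0.0


# Hypothetical numbers: 200 relevant documents found, 10,000 documents
# not retrieved, and 40 of 500 sampled unretrieved documents judged relevant.
print(estimate_recall(200, 10_000, 500, 40))
```

With these made-up inputs the estimate is about 20% recall, which shows how a searcher can feel they have found most of the relevant material while the sample says otherwise.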
  • 27. Keyword Search Summary Pro: Word Stemming – Hous* - house, housemate, household; Easy to use/explain/agree; Familiar; Fast results. Con: Over-inclusive – Disambiguate; Under-inclusive; Word must be present; Hard to craft; Ineffective with short messages, IMs
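A minimal sketch of the Hous* example, showing both the convenience of a wildcard term and the two failure modes listed under Con. The helper name and the three sample documents are invented for illustration; real review platforms implement stemming and wildcards in their own ways.

```python
import re


def wildcard_to_regex(term: str) -> re.Pattern:
    """Turn a keyword with an optional trailing * into a whole-word regex."""
    if term.endswith("*"):
        body = re.escape(term[:-1]) + r"\w*"
    else:
        body = re.escape(term)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


# Hypothetical documents.
docs = {
    "a": "The household budget was revised.",   # intended hit
    "b": "Flight to Houston delayed.",          # over-inclusive hit
    "c": "They sold their home last year.",     # under-inclusive miss
}

pattern = wildcard_to_regex("hous*")
hits = {doc_id for doc_id, text in docs.items() if pattern.search(text)}
print(sorted(hits))
```

Document "b" is returned although it has nothing to do with housing, and document "c" is missed because the word itself must be present.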
  • 28. Keyword Truths Under-inclusive - missing relevant or important info Over-inclusive - costly to review “Reasonable Keyword Search” doesn’t exist Effective keyword search is difficult/impossible – Index Data, Analyze Index – Suggest keywords or approach Keywords may not be appropriate for the data Keyword Search is ONE Tool in Your Arsenal 9/23/2014 28
  • 29. Keyword Accuracy Example Keyword search reduced the document set by only 47% And 88% of the documents returned by keyword search were not responsive (Over-inclusive) 8,553 responsive documents missed by keyword search (Almost 8% of responsive documents missed by keyword search - Under-inclusive) 9/23/2014 29
  • 30. Search Methodology Continuum Review Methodology - Decided Upfront Identify Issues in the Case – Formulate Queries and Approaches for Finding Responsive Documents – Formulate Relevancy and Responsiveness Guidelines Identify Primary Participants Select or Triage Documents for Review 9/23/2014 30
  • 31. Review Tools for Relevancy Assessment Keyword Searches, Culling – Slices of Data are Reviewed Categorization of Data – Entire Dataset is Categorized – Review Targeted Data Automated Review – Categorization of Dataset – Random Sampling (Statistically Significant) 9/23/2014 31
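For the random sampling step, a common textbook way to pick a sample size is the formula for estimating a proportion, n = z²·p(1−p)/e², with an optional finite-population correction. The sketch below assumes that formula; the slides do not say which method was used, and the defaults (95% confidence, worst-case p = 0.5) are assumptions.

```python
import math
from typing import Optional


def sample_size(margin: float, z: float = 1.96, p: float = 0.5,
                population: Optional[int] = None) -> int:
    """Documents to sample to estimate a proportion within +/- margin.

    z = 1.96 corresponds to roughly 95% confidence; p = 0.5 is the
    most conservative guess for the underlying proportion.
    """
    n = (z ** 2) * p * (1 - p) / margin ** 2
    if population is not None:
        # Finite-population correction for small collections.
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)


print(sample_size(0.05))                      # very large collection
print(sample_size(0.05, population=10_000))   # 10,000-document collection
```

The sample size depends far more on the margin of error than on the size of the collection, which is why sampling scales well to large data sets.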
  • 32. Categorization of Data for Review Categorize Entire Data Set – Spam/Porn/System Files – Personal/Private Data – Non-relevant Business Data Business Data – Relevancy Assessment by Topic – Privilege Review Keyword, Topic Analysis - Overlap, Holes 9/23/2014 32
  • 33. Search Methodologies (diagram) Visualization; Measurement. Relationship Analysis: documents with causal or sequential relationship. Social Network Analysis: relationships among relevant people. Clustering: similarity of salient features. Ontology: generalized words or phrases. Keyword: specific exact words, proximity searches, stemming. Labels: Context, Concept, Content
  • 34. Categorization Methods Statistical Methods (#s based) – Topic Clustering • Statistical Similarity • Counting #s of words, appearance together – Latent Semantic Indexing – Supervised v. Unsupervised Clustering Linguistic Methods (Word Based) – Keyword (Culling Method) – Ontologies 9/23/2014 34
  • 35. Clustering Clustering just means putting documents into groups that have something in common. Manually (that's what manual review is) Keyword Searches Ontologies (linguistic filters) Automated clustering (using technology) – Automated clustering by document type (all the Word documents go into one basket) – Automated clustering by creation date – Automated clustering by Actor – Automated clustering by statistical similarity (statistical clustering) – ... and many other approaches
  • 36. Clustering -- “Options” 1 Cluster or 4 Clusters Financial/energy trading options Email/computer menu-driven options Stock options (ISO's) The generic idea of an available choice of action 9/23/2014 36
  • 37. Clustering Software implements statistical methods of finding groups of “similar” documents – “Similar” must be defined appropriately for the application Documents are categorized with very little effort by the user May help with document review – A single reviewer can look at similar documents together, produce consistent review decisions – Tight clustering can be used to detect “near duplicates” caused by OCR errors 9/23/2014 37
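One simple way to make "similar" concrete is cosine similarity over word counts, with a greedy pass that puts each document into the first cluster whose seed it resembles. This is an illustrative sketch, not the method of any particular product; the 0.8 threshold and the sample texts are assumptions.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Bag-of-words vector: lowercase alphanumeric tokens and their counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def cluster(texts, threshold=0.8):
    """Greedy clustering: join the first cluster whose seed is similar enough."""
    clusters = []  # list of (seed_vector, member_indices)
    for i, text in enumerate(texts):
        vec = vectorize(text)
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((vec, [i]))
    return [members for _, members in clusters]


docs = [
    "Please review the supply agreement before Friday",
    "Please review the supp1y agreement before Friday",  # OCR error: l read as 1
    "Lunch menu options for the team offsite",
]
groups = cluster(docs)
print(groups)
```

The first two documents land in the same group despite the OCR error, which is the near-duplicate case mentioned on the slide. A tighter threshold catches only near duplicates; a looser one produces broader topical groups.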
  • 38. Clustering vs. queries Clustering is unpredictable compared to keywords or taxonomies The items that look very similar (to the clustering algorithm) may not actually be similar in ways that matter – Relevancy may depend upon fine legal distinctions – May vary in the same matter by subpoena and/or jurisdiction 9/23/2014 38
  • 39. Ontologies Implement ontologies for directed searches. – Approach searching from a knowledge-representation viewpoint – Field is 25 years old, lots of work done – Advantages: • Disambiguate different meanings of the same word from their context → More accurate • Encapsulate many ways of saying the same thing → More thorough • Search for concepts, not individual words → More intuitive, more reusable, and faster Can be combined with other methods (unsupervised clustering, discussions).
  • 40. Subjectivity GOOD WEATHER – Sun – Calm BAD WEATHER – Rain – Snow – Wind
  • 41. A More Realistic Ontology ROYALTY CONCEPT • royalty • royalties • rty • commission • commissions • comm. • honorarium • honorariums • honoraria • usage fee • usage charge • usg fee • use fee • fee for use • fee for usage • incent* • insent* • earn a fee • eam a fee • charge for use • charged for use • charging for use • charges for use • licence fee • license fee • lisense fee • “take cut”~2 • “takes cut”~2 • “took cut”~2 • “slice pie”~5 • “piece pie”~5 • “piece action”~5 • “slice action”~5 • -king • -queen • -prince • -princess
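The term list above mixes plain phrases, prefix wildcards (incent*), proximity terms (“take cut”~2), and exclusions (-king). The sketch below shows how such a concept could be evaluated against a document. It is an illustration only: the slide does not define the operators, so reading ~N as "at most N other words in between" and -term as "reject the document if the term appears" are assumptions, and the function names are hypothetical.

```python
import re

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def phrase_hit(tokens, phrase):
    """True if the words of `phrase` occur consecutively; a trailing * is a prefix wildcard."""
    parts = re.findall(r"[a-z0-9]+\*?", phrase.lower())

    def word_ok(tok, pat):
        return tok.startswith(pat[:-1]) if pat.endswith("*") else tok == pat

    n = len(parts)
    return any(all(word_ok(tokens[i + j], parts[j]) for j in range(n))
               for i in range(len(tokens) - n + 1))

def proximity_hit(tokens, w1, w2, slop):
    """True if w1 and w2 occur with at most `slop` other words between them (assumed meaning of ~N)."""
    pos1 = [i for i, t in enumerate(tokens) if t == w1]
    pos2 = [i for i, t in enumerate(tokens) if t == w2]
    return any(abs(i - j) - 1 <= slop for i in pos1 for j in pos2 if i != j)

def concept_hit(text, include, proximity, exclude):
    """Apply exclusions first, then accept on any phrase or proximity match."""
    tokens = tokenize(text)
    if any(phrase_hit(tokens, term) for term in exclude):
        return False
    return (any(phrase_hit(tokens, term) for term in include)
            or any(proximity_hit(tokens, a, b, n) for a, b, n in proximity))

# A small subset of the ROYALTY CONCEPT terms from the slide.
ROYALTY_INCLUDE = ["royalty", "royalties", "usage fee", "license fee", "lisense fee", "incent*"]
ROYALTY_PROXIMITY = [("take", "cut", 2), ("slice", "pie", 5)]
ROYALTY_EXCLUDE = ["king", "queen", "prince", "princess"]

def is_royalty(text):
    return concept_hit(text, ROYALTY_INCLUDE, ROYALTY_PROXIMITY, ROYALTY_EXCLUDE)

print(is_royalty("They will take a cut of every sale"))
print(is_royalty("The queen is royalty"))
```

The exclusion list is what separates the payment sense of "royalty" from the monarchy sense, which is the disambiguation advantage claimed for ontologies.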
  • 42. Ontology as a Query But it can be slightly cumbersome to deal with directly in that form q ((+(std:%CapacityReports_% std:%DINCapacity_%) +(std:%ACMEEPPlant_% std:%ProductName_%)) (+(std:%ACMEPNPlant_% std:%ProductName_%) +(std:%ProductiveCapability_% std:%CapacityReports_%)) (+(std:%CapacityCreep_% std:%OperationsImprovement_% std:%CapacityExpansion_% std:%CapacityRestoration_%) +(std:%ACMEPNPlant_% std:%ProductName_%)) (+(std:%EquipmentReplacement_% std:%FinishingColumn_%) +(std:%ACMEPNPlant_% std:%ProductName_%)) (std:%Audit_% actor:%Audit_%) (+(std:%SettlementNegotiations_% std:%ContractNegotiations_% ) +(actor:%ACMEOutsideCounsel_% std:%ACMEOutsideCounsel_% actor:%ACME UBOutsideCounsel_% std:%AcmeSubOutsideCounsel_% actor:%AcmeSub_% std:%AcmeSub_%)) (std:%FTC_% actor:%FTC_%) ((+subject:%ProductName_% +(std:swap std:"supply agreement" std:"exchange agreement" std:"agree to exchange")) std:"name (About a quarter of its regular size) 9/23/2014 42
  • 43. Ontology Pros & Cons Identify acronyms Normalize variants Disambiguate terms Identify overly broad keywords Identify and correct keywords with errors Create extensive libraries of ontologies Can be used as a clustering method Topics can appear in more than one language Reusable for different types of litigation, e.g., anti-trust, product liability, etc. (and for both offense and defense) As with Keyword - word based Labor intensive, upfront
  • 44. “Search” Terminology Technology-Enhanced Review Technology Assisted Review Automated Review Predictive Coding • Process • Workflow Technology People • Subject Matter • Review • Feedback • Privilege • Production Quality Control
  • 45. Setup Sample Expert judges sample Non-responsive Responsive Model learns Model predicts Responsive Non-responsive Model categorizes all remaining documents Repeat as needed
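The loop on this slide (expert judges a sample, model learns, model predicts the rest, repeat) can be sketched in a few lines. The example below uses a small multinomial Naive Bayes classifier purely as a stand-in; the deck does not name the learning method, and the sample documents, labels, and function names are invented for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def train(labeled):
    """labeled: list of (text, label) pairs judged by the expert reviewer."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in labeled:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    return word_counts, doc_counts

def predict(model, text):
    """Pick the label with the highest Laplace-smoothed log-probability."""
    word_counts, doc_counts = model
    vocab = set().union(*word_counts.values())
    total_docs = sum(doc_counts.values())
    tokens = [t for t in tokenize(text) if t in vocab]  # ignore unseen words
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(doc_counts[label] / total_docs)
        denom = sum(counts.values()) + len(vocab)
        for tok in tokens:
            score += math.log((counts[tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Round 1: the expert judges a small sample.
sample = [
    ("stock option grant approved by the board", "responsive"),
    ("backdated option grant date for executives", "responsive"),
    ("lunch options for the team meeting", "non-responsive"),
    ("fantasy football pool reminder", "non-responsive"),
]
model = train(sample)

# The model categorizes the remaining documents.
remaining = ["board approved the option grant", "football lunch on friday"]
for doc in remaining:
    print(doc, "->", predict(model, doc))
```

"Repeat as needed" corresponds to having the expert correct some of these predictions, appending the corrections to `sample`, and calling `train` again.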
  • 47. Technology Enhanced Review: Speed, Predictable Costs, and Accuracy Example from a real case Priv by High-Speed Manual Review Automate any portion of the review Source Data Eliminate Duplicates & System Files Non-Responsive Isolation ontologies Responsive by Technology Enhanced Review (removed another 7%) NR by Technology Enhanced Review (removed another 18%) 30% 30% 15% 22% 100% 3%
  • 48. Search Methodologies [diagram; labels: Visualization, Measurement; Context, Concept, Content]. Relationship Analysis: documents with causal or sequential relationship. Social Network Analysis: relationships among relevant people. Clustering: similarity of salient features. Ontology: generalized words or phrases. Keyword: specific exact words, proximity searches, stemming.
  • 49. From Document Analysis to Social Network Analysis
  • 50. From Social Network Analysis to Discussions
  • 51. Search Methodologies [diagram; labels: Visualization, Measurement; Context, Concept, Content]. Relationship Analysis: documents with causal or sequential relationship. Social Network Analysis: relationships among relevant people. Clustering: similarity of salient features. Ontology: generalized words or phrases. Keyword: specific exact words, proximity searches, stemming.
  • 52. Analytics are Based on the Model and on Discussions
  • 53. Better Answers and Better Questions When were customary work practices circumvented? When did established norms of behavior change? Who knew, or likely knew, what facts? Who interacted with whom and how intimately? Who was involved in what types of decisions or meetings? Who are the real ‘insiders’? What data is hidden or missing? When were electronically documented conversations “taken off line,” possibly in an attempt to avoid detection? How did the importance of different actors change over time?
  • 54. Bear Stearns Lower Bar For Fraud? Two hedge fund managers arrested Charged with securities and wire fraud, and one with insider trading Internal emails: – “I'm fearful of these markets. ... As we discussed it may not be a meltdown for the general economy but in our world it will be.” – “I think we should close the funds now.” External communications: – “We are very comfortable with exactly where we are.” – “The funds are performing exactly as they were designed to.”
  • 56. Analysis of Anomalous Communication Patterns Unusual levels relative to a particular type of activity pop out Color-coded graphs show relative communication densities for apples-to-apples comparisons
  • 57. Spread of Information
  • 58. Emotive Tone Whistle-blower Scenario
  • 59. “Call Me” Events Sequence Viewer used for analytics-driven review
  • 60. Search Risks Failure to find responsive documents Failure to recognize responsive documents Failure to recognize privileged documents Inconsistent treatment of documents (e.g., duplicates) Failure to complete project in a timely manner Sophisticated Tools – Understand What They Do and Don’t Do Well – Inform Yourself, Speak to References, Consultants
  • 61. Transparency of Process Discussing Review Protocols – Provide transparent, defensible, sophisticated search based on document content – Clustering, Ontologies, Analytics, and yes, sometimes Keywords too Develop search methodologies for each case – Use technology experts in consultation with case / legal experts Results verifiable by Quality Control – Defensible sampling
  • 62. Thank you! Sonya L. Sigler Vice President, Product Strategy SFL Data 415-321-8385 sonya@sfldata.com www.sfldata.com
  • 63. Review Protocol ≠ Agreeing to Search Terms Data Culling (upfront or backend) Search Methodologies - Continuum – Keyword Positive List – Ontologies – Clustering – Technology Enhanced Review – Relationship Analysis Quality Control Process & Procedures Privilege Review, Sensitivities Production Format & Timing
  • 64. Search The Courts are Finally Starting to Catch up to Technology Making more aggressive rulings: – Forcing attorneys to live with the results of bad searches – Sanctioning those who screw up, even if no allegation of fraud – Demanding repeatable, demonstrable process – using terms like “quality assurance”
  • 65. Search Under Scrutiny Facciola’s Opinions - United States v. O’Keefe “for lawyers and judges to dare opine that a certain search term or terms would be more likely to produce information than [other] search terms … is truly to go where angels fear to tread.” He has also suggested that litigants take a good look at more advanced search methodologies, including the use of computational linguistics and technology assisted review
  • 66. Reasonableness of Search Methods Victor Stanley, Inc. v. Creative Pipe, Inc., 2008 WL 2221841 (D. Md., May 29, 2008). "Common sense suggests that even a properly designed and executed keyword search may prove to be over-inclusive or under-inclusive...the only prudent way to test the reliability of the keyword search is to perform some appropriate sampling." “Selection of the appropriate search and information retrieval technique requires careful advance planning by persons qualified to design effective search methodology. The implementation of the methodology selected should be tested for quality assurance; and the party selecting the methodology must be prepared to explain the rationale for the method chosen to the court, demonstrate that it is appropriate for the task, and show that it was properly implemented.”
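The "appropriate sampling" the opinion calls for is often done by drawing a random sample from the documents a keyword search did not retrieve, having a reviewer judge them, and estimating how many responsive documents were missed. The sketch below is one hedged way to do that arithmetic; the opinion does not prescribe a method, so the Wilson score interval, the 95% confidence level, the sample size, and the counts used here are assumptions for illustration.

```python
import math
import random

def sample_discard_pile(discarded_ids, sample_size, seed=0):
    """Draw a reproducible random sample of documents the search did NOT retrieve."""
    rng = random.Random(seed)
    return rng.sample(discarded_ids, min(sample_size, len(discarded_ids)))

def wilson_interval(hits, n, z=1.96):
    """Approximate 95% Wilson score interval for the proportion hits/n."""
    if n == 0:
        raise ValueError("empty sample")
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)

# Hypothetical numbers: 400 discarded documents reviewed, 12 found responsive.
discarded = list(range(50_000))
to_review = sample_discard_pile(discarded, 400)
low, high = wilson_interval(hits=12, n=len(to_review))
print(f"Estimated miss rate in the discard pile: {12 / 400:.1%} "
      f"(95% interval {low:.1%} to {high:.1%})")
```

Fixing the random seed and recording the sample makes the test repeatable, which fits the "quality assurance" language the courts have started to use.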
  • 67. From Pre-Discovery to Production Completeness Henry v. Quicken Loans --> 26(f) consulting – Lawyers agreed to keyword lists and process – Ran own (unsanctioned) searches with expert – Told to live with bad results, and pay for it Qualcomm --> Smell Test; Dig Deeper – In-house counsel (Qualcomm) v. Outside Counsel (Day Casebeer) – Sanctions, Attorney-Client Privilege Problems – Associate found docs and was told they weren’t relevant; found out the hard way that those and 230,000 other pages were relevant Judge Rader’s Protocol in TX for Patent cases – 5 custodians – 5 search terms (can you say overbroad…)
  • 68. Under-inclusive - Missing in Action Missing abbreviations / acronyms / clippings: – incentive stock option but not ISO – Board of Directors but not BOD – 1998 plan but not 98 plan Missing inflectional variants: – grant but not grants, granted, granting Missing spellings or common misspellings: – gray but not grey – privileged but not priviliged, priviledged, privilidged, priveliged, privelidged, priveledged, …
  • 69. Missing in Action II Missing syntactic variants: board of directors meeting but not meeting of the board of directors BOD meeting board meeting BOD mtg board mtg directors’ meeting directors’ mtg mtg of the BOD mtg of the directors BOD meetings board meetings BOD mtgs board mtgs directors’ meetings directors’ mtgs mtgs of the BOD mtgs of the directors
  • 70. Missing in Action III Missing synonyms / paraphrases: hire date but not start date approved by Smith but not Smith’s approval the approval of Smith Smith’s ok Smith’s go-ahead Smith’s goahead the go-ahead from Smith the goahead from Smith the nod from Smith Smith’s signature Smith’s sign-off the sign-off of Smith the signoff of Smith
  • 71. Missing in Action IV As a keyword item, the address 101 E. Bergen Ave., Temple, CA 90200 does not match any of: 101 East Bergen Avenue the Bergen site the Temple location our 90200 outlet
  • 72. Over-inclusive - Unwanted Extras Options Target: Sheila was granted 100,000 options at $10 Match: What are our options for lunch? Match in a signature line: Amanda Wacz Acme Stock Options Administrator Destroy Target: destroy evidence Match in a disclaimer: The information in this email, and any attachments, may contain confidential and/or privileged information and is intended solely for the use of the named recipient(s). Any disclosure or dissemination in whatever form, by anyone other than the recipient is strictly prohibited. If you have received this transmission in error, please contact the sender and destroy this message and any attachments. Thank you.
  • 73. Unwanted Extras II alter* Target: alter, alters, altered, altering Matches: alternate, alternative, alternation, altercate, altercation, alterably, … grant Target: stock option grant Matches names: Grant Woods, Howard Grant
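The alter* problem is easy to reproduce. The sketch below compares a prefix wildcard with an explicit list of inflected forms using Python regular expressions; the sample sentence is invented, and real review platforms have their own query syntax, so treat this only as an illustration of the over-inclusion effect.

```python
import re

# alter* as a prefix wildcard: anything that starts with "alter".
PREFIX = re.compile(r"\balter\w*", re.IGNORECASE)
# Only the intended inflections: alter, alters, altered, altering.
INFLECTED = re.compile(r"\balter(?:s|ed|ing)?\b", re.IGNORECASE)

def hits(pattern, text):
    """Return every matched word, lower-cased, in order of appearance."""
    return [m.group(0).lower() for m in pattern.finditer(text)]

text = "Please alter the date. An alternative is the alternate form; he altered nothing."
print(hits(PREFIX, text))     # includes the unwanted extras
print(hits(INFLECTED, text))  # only the targeted forms
```

Enumerating the wanted variants trades a little upfront effort for far fewer false hits, which is the same trade an ontology makes at larger scale.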
  • 74. Tuning an Ontology Linguists briefed as reviewers Linguists read the data Linguists study complaint and other relevant documents Linguists analyze the search index Legal Team provides input, feedback
  • 75. A Simple Linguistic Ontology ROYALTY CONCEPT – Royalty – Commission – Honorarium – Usage Fee – Slice of the Pie
  • 76. A Simple Pricing Concept PRICING CONCEPT – Purchase Order – PO – Dollar amount – Invoice
  • 77. Adding Subjective Content PRICING CONCEPT – Purchase Order – PO – Dollar amount – Invoice – Cylinder – Canister – Bottle
  • 78. Ontology Usage Identifying Misspellings, Slang, Nicknames, etc. Variant Generation – help the user find what he meant (names, words, suggestions) – Buy*: Buying, Buys, Bought, etc. – Kenneth Lay, Ken Lay, klay, kenneth.lay View variations in context to choose topics Document segmentation – text blocks, signatures Finding Words in Context, Frequency (e.g., “at serious risk of losing”: 25; “are certain risks inherent in”: 16)
  • 79. Identifying misspellings, slang, etc. 1. Match the index against an electronic dictionary. 2. From the remaining material (not in dictionary), remove any items that are merely numbers. 3. Find (in the ontologies) any words that are similar to what remains. 4. Add the similar words to the ontology. This increases coverage (i.e., ensures that we retrieve documents that otherwise would have been missed)
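The four steps above can be sketched with the Python standard library. The slide does not say how "similar" is measured, so this example assumes `difflib.get_close_matches` with a 0.8 cutoff; the tiny dictionary, index, ontology, and the function name `propose_variants` are made up for illustration.

```python
import difflib
import re

def propose_variants(index_terms, dictionary, ontology_terms, cutoff=0.8):
    """Return {ontology term: [index terms that look like misspellings of it]}."""
    # Step 1: keep only index terms that are not in the dictionary.
    unknown = [t for t in index_terms if t.lower() not in dictionary]
    # Step 2: drop items that are merely numbers.
    unknown = [t for t in unknown if not re.fullmatch(r"[\d.,]+", t)]
    # Step 3: look for an ontology word similar to each remaining term.
    proposals = {}
    for term in unknown:
        close = difflib.get_close_matches(term.lower(), ontology_terms, n=1, cutoff=cutoff)
        if close:
            proposals.setdefault(close[0], []).append(term)
    return proposals

index_terms = ["privileged", "priviledged", "2567", "lisense", "lunch", "xyzzy"]
dictionary = {"privileged", "lunch", "license"}
ontology_terms = ["privileged", "license", "royalty"]

proposals = propose_variants(index_terms, dictionary, ontology_terms)
# Step 4: add the similar words to the ontology (here, after printing them for review).
print(proposals)
expanded = ontology_terms + [v for variants in proposals.values() for v in variants]
```

In practice a person would review the proposals before step 4, since a permissive cutoff can pull in unrelated words.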
  • 80. Variant Generation Help the user search for what he meant Take names, numbers, and other entities for which the user wants to search Automatically generate likely synonyms
  • 81. Variant Generation Show the context of these variations, so the user can evaluate them.
  • 82. Document Segmentation Examples of signatures Jean-Louis Koenig President GGDA Region MegaCorp International SA Rue de Concours 2280 Bern, Switzerland Robert Guilliam Product Regulatory Affairs & Compliance MegaCorp International Neuchatel Switzerland Tél. +41 (31) 125 2366 Alberto Goreman Manager Printing & Packaging, Eastern Region +57 3 451 7195, alberto_goreman@megacorp.com
  • 83. Finding words in context. Phrase (total instances): “risks alienating some” (37); “at serious risk of losing” (25); “are certain risks inherent in” (16); “are at risk of running” (15); “it be risking anything by” (15); “difference a risk o why” (14); “and the risks inherent in” (12); “without assuming any risk” (8); “we could risk losing next” (7); “avoid transferring risk to the” (5); “requires taking risks and the” (4); “can t risk not living” (3); “and unknown risks and uncertainties” (2); “a potential risk that was” (2); “avoid transfering risk to the” (2). This increases coverage AND precision
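A phrase-frequency table like the one on this slide can be produced with a simple keyword-in-context count. The sketch below is an assumption about how such a table might be built, not a description of the tool shown: it counts the word window around every token that starts with a given stem, and the sample sentences and window width are invented.

```python
import re
from collections import Counter

def context_phrases(texts, stem, before=2, after=2):
    """Count word windows around every token that starts with `stem`."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        for i, tok in enumerate(tokens):
            if tok.startswith(stem):
                window = tokens[max(0, i - before): i + after + 1]
                counts[" ".join(window)] += 1
    return counts

texts = [
    "We are at serious risk of losing the account.",
    "They are at serious risk of losing funding.",
    "There are certain risks inherent in trading.",
]
for phrase, total in context_phrases(texts, "risk").most_common():
    print(f"{phrase}\t{total}")
```

Seeing each variant in context lets the reviewer keep phrases that signal the topic and drop boilerplate uses, which is how the approach can raise both coverage and precision.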
  • 84. Multi-Lingual Issues Does language matter? – Lucerne – Luzerne – Lucerna These places were all the same city Name of city not necessarily expressed in the same language as rest of document In Europe, many email threads and documents are mixed language, and must be properly categorized as such
  • 85. Automated Ontology Expansion Tools Currently implemented expansion modules: Spelling variants: color >> colour, defense >> defence, labeled >> labelled. Lemmatization (recovering uninflected form): walking >> walk, ate >> eat. Morphological variants: eat >> eats, eating, eaten, ate; hablar >> hablo, hablas, habla, hablan, habláis, hablamos. Number expansion: $2.5B >> two point five billion dollars; 2,567 >> two thousand five hundred sixty seven; 13 >> 13th, thirteenth. Name variants: Elizabeth Van der Beek >> “Liz Van der Beek”, “Liz Vander Beek”, “Van der Beek, Elizabeth”, “Beth Vanderbeek”, etc. Email variants (mined from alias clusters file): Elizabeth Van der Beek >> evanderbeek, liz.vanderbeek, vanderbeekl, emvanderbeek, etc. Abbreviations: administrative project meeting >> admin project meeting, admin project mtg, admin proj mtg, etc.
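As one illustration of an expansion module, the sketch below implements the number expansion example (2,567 >> two thousand five hundred sixty seven). It is a hypothetical helper, not the tool's actual code; it only handles whole numbers below one million and leaves currency, decimals, and ordinals out.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def below_thousand(n):
    """Spell out 0..999 as a list of words."""
    words = []
    if n >= 100:
        words += [ONES[n // 100], "hundred"]
        n %= 100
    if n >= 20:
        words.append(TENS[n // 10])
        n %= 10
        if n:
            words.append(ONES[n])
    elif n or not words:
        words.append(ONES[n])
    return words

def number_to_words(n):
    """Spell out an integer in the range 0..999,999."""
    if not 0 <= n < 1_000_000:
        raise ValueError("only 0..999,999 is supported in this sketch")
    thousands, rest = divmod(n, 1000)
    words = []
    if thousands:
        words += below_thousand(thousands) + ["thousand"]
    if rest or not words:
        words += below_thousand(rest)
    return " ".join(words)

def expand_number(token):
    """Return search variants for a numeric token such as '2,567'."""
    digits = token.replace(",", "")
    variants = [token, digits, number_to_words(int(digits))]
    return list(dict.fromkeys(variants))  # drop repeats, keep order

print(expand_number("2,567"))
```

Each variant would be added to the query as an alternative, so a document that spells the amount out in words is still retrieved.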