Guest Lecture on Litigation Holds, Preservation, and Search Methodologies. If you want to download this rather than just view it, please email me at sonya@sonyasigler.com.
2. Overview
Triggers & Preservation
• What is it?
• Why Does it Matter?
Search
Keyword Search
Clustering
Ontologies
Technology Enhanced Review - Sampling
Social Networking Analysis
Relationship Analysis
6/4/2012 2
3. “Triggers” & Preservation
What is a Trigger?
– Litigation reasonably anticipated
– Who decides
Litigation Hold Continuum
– Established in hindsight
– Threat
– Letter about litigation
– Filing Suit
Cases
– Pippins, Zubulake, Pension Committee
4. Pippins v. KPMG
How much data to preserve?
– All hard drives (Pippins' position)
– 100 sample hard drives (KPMG's position)
To Cooperate or NOT to Cooperate?
How Judges React to Lack of Cooperation
5. Zubulake
Litigation Holds
– Cannot send a request into the ether
Preservation
Have to follow up
Take affirmative steps to monitor compliance
In-house Counsel Duty
Cannot leave it to employees' discretion
Document what was done
6. Pension Committee
No intentional destruction of data
Careless & indifferent
No Latchkey Custodians (alone & unsupervised)
– Identify Custodians
– Monitor their efforts
– Including former employees and third parties
Proactive
Consistent
Reasonable Approach
7. Triggers
When does a duty to preserve arise?
8. What To Do?
Who to include?
– Not about data volume
– Not about contact with underlying “litigation”
Key Players (Zubulake opinions)
– Likely to have relevant information
– CEO, Board, Committees, employees, etc.
Produce it from the Key Player (not others)
– Nursing Home Pension Fund v. Oracle
– Produce emails from the CEO (15) not others (1,650)
9. Spoliation
Failure to Preserve
– Didn't Ask
• Right person
• Right Place
– Didn't follow up
Destruction of Data
– Intentional
– Inadvertent destruction
What can happen
– Sanctions
– Adverse Inferences
10. Search
How to Use It to Find Information
How to Use It to Ignore Information
When to Use Which Search Methodology
11. Search - Data Assessment
Where is the Data?
– Data mapping: databases, servers, desktops, laptops, IMs, smartphones, voicemail, other records
Defining Process from Collection to Review to Production
Collection Strategy, Process, Approach
– Scope of collection: custodians, date ranges, topics
Reports on the Data Processing
– File types, de-duplication rates, encrypted files, password-protected files, etc.
Not Reasonably Accessible Data
Assessing Risk of Data Loss
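De-duplication rates in a processing report come from comparing document content. As a minimal sketch, assuming exact duplicates are detected by hashing each file's bytes (SHA-256 here; processing tools commonly use MD5 or SHA-1 for the same purpose), the code below keeps the first copy of each file and reports the share of the collection removed. The file names and contents are hypothetical, and near-duplicate detection and email threading are not modeled.

```python
import hashlib


def dedup_report(files: dict[str, bytes]) -> tuple[list[str], float]:
    """Return the names of unique files and the de-duplication rate.

    `files` maps a (hypothetical) file name to its raw bytes. Two files with
    identical bytes hash to the same digest, so only the first one is kept.
    """
    seen: set[str] = set()
    unique: list[str] = []
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(name)
    # De-duplication rate: share of the collection removed as exact duplicates.
    rate = 1 - len(unique) / len(files) if files else 0.0
    return unique, rate


collection = {
    "memo_v1.docx": b"Q3 pricing memo",
    "memo_copy.docx": b"Q3 pricing memo",
    "email_001.msg": b"Re: pricing",
    "email_002.msg": b"Re: pricing",
}
unique, rate = dedup_report(collection)
print(unique, f"{rate:.0%}")  # ['memo_v1.docx', 'email_001.msg'] 50%
```

Hashing content rather than comparing names means a renamed copy is still caught, which is the property a de-duplication rate is meant to reflect.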
12. Search - Case Assessment
Who - Cast of Characters
What - What the Heck Happened?
Where - Where did it take place?
When - What time period are we concerned with?
How - fraud, antitrust violation, etc.
WHY - What were the motives involved?
Data Assessment ≠ Effective Case Assessment
13. Keyword Search Under Scrutiny
United States v. O'Keefe (Facciola)
– Questioned lawyers' ability to decide which search terms are more likely to produce relevant information
– Facciola has also suggested that litigants look at advanced search methodologies
Victor Stanley, Inc. v. Creative Pipe, Inc. (Grimm)
– The party relying on a search protocol to meet its obligations must defend both the process AND its execution, and must be able to explain the search rationale, its appropriateness, and its proper implementation
– Advocates quality assurance, e.g., by sampling
– Searches should be designed by a competent practitioner
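One common form of the sampling-based quality assurance described above is to draw a random sample from the documents a search protocol excluded and have reviewers check it for missed responsive material. The sketch below is hypothetical (the document IDs, sample size, and seed are invented); recording the seed makes the draw reproducible, so the sampling step itself can later be re-run and explained.

```python
import random


def qa_sample(excluded_ids: list[str], sample_size: int, seed: int) -> list[str]:
    """Draw a reproducible random sample from documents a search excluded.

    A dedicated random.Random instance is used so the draw depends only on
    the recorded seed, not on any other use of the global random state.
    """
    rng = random.Random(seed)
    k = min(sample_size, len(excluded_ids))
    return rng.sample(excluded_ids, k)  # sampling without replacement


excluded = [f"DOC-{i:05d}" for i in range(10_000)]  # hypothetical null set
sample = qa_sample(excluded, sample_size=400, seed=2012)
```

Reviewers would then code the sampled documents; if responsive documents turn up at a meaningful rate, the search terms need revisiting.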
14. Keyword Specific Case
William A. Gross Construction Associates, Inc. v. American Manufacturers Mutual Insurance Company
SDNY, Judge Andrew Peck
Keyword list was in the thousands
Use the actual data set and custodians to figure out keywords
“This case is just the latest example of lawyers designing keyword searches in the dark, by the seat of the pants, without adequate (indeed, here, apparently without any) discussion with those who wrote the emails. Prior decisions from Magistrate Judges in the Baltimore-Washington Beltway have warned counsel of this problem, but the message has not gotten through to the Bar in this District.”
15. $6M Keyword Mistake
In re Fannie Mae Securities Litigation
3rd Party - OFHEO
DC Circuit - Judge David Tatel
Attorney agreed to something he did NOT understand
Long list of key terms
Taxpayers suffered the consequences
16. What This Means
• The courts are finally catching up
• Courts are actively ruling on standards of care and process
• Lawyers are getting wise
17. Case Law Effects on Discovery
Defensibility of Review Process is now a focus
– Culling now can kill you later
– Cooperation is a hot topic
– Tussle between inside & outside counsel
– Beginning to see planning as a necessity
Increased focus on Quality
– Heightened involvement expected from corporate clients in the overall process
– Cases pushing this: Qualcomm, Creative Pipe
18. What Else Is There?
Effort to establish & codify uniform “Best Practices”
– Quickly becoming a roadmap for an uneducated industry
– Increasingly relied upon by judges as a measure of reasonable or standard behavior
Publications have addressed:
– Document retention & production
– Email management
– Search & retrieval
– Protective orders & confidentiality
– ESI admissibility
19. Getting to a Manageable Review Set
Focus on finding, reviewing & using the “right” data, not just filtering data
Intake data: 100%
– Duplicates: 25%
– Junk/Spam/Porn: 20%
– NR/Priv: 20%
– Non-Responsive: 20%
– Responsive & Priv: 15%
Produced: 12.25%
These figures vary based upon the data set received
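To make the percentages above concrete, the sketch below applies them to a hypothetical intake of 1,000,000 documents. The five categories together account for 100% of intake, and the produced share (12.25%) is applied separately to the intake. The intake volume is invented, and the category labels are copied from the slide as-is.

```python
# Percentages as shown on the slide; the slide notes they vary by data set.
FUNNEL = {
    "Duplicates": 25.0,
    "Junk/Spam/Porn": 20.0,
    "NR/Priv": 20.0,
    "Non-Responsive": 20.0,
    "Responsive & Priv": 15.0,
}
PRODUCED_PCT = 12.25


def funnel_counts(intake: int) -> dict[str, int]:
    """Apply the slide's percentages to a hypothetical intake volume."""
    counts = {label: round(intake * pct / 100) for label, pct in FUNNEL.items()}
    counts["Produced"] = round(intake * PRODUCED_PCT / 100)
    return counts


counts = funnel_counts(1_000_000)
for label, n in counts.items():
    print(f"{label:>18}: {n:>9,}")
```

Even at this scale, only about one document in eight of what was collected ends up produced.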
20. Search Methodologies
Figure: search methodologies grouped by Content, Concept, and Context (axis labels: Visualization, Measurement)
– Keyword (Content): specific exact words; proximity searches, stemming
– Ontology (Concept): similarity of generalized words or phrases
– Clustering (Concept): similarity of salient features
– Social Network Analysis (Context): relationships among relevant people
– Relationship Analysis (Context): documents with causal or sequential relationship
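Social network analysis starts from who communicates with whom. As a minimal sketch, assuming messages have already been parsed into hypothetical (sender, recipients) pairs, the code below counts directed sender-to-recipient links; the heaviest links suggest which relationships to examine first. Real tools add time windows, alias resolution, and graph visualization, none of which is modeled here.

```python
from collections import Counter


def communication_links(messages: list[tuple[str, list[str]]]) -> Counter:
    """Count directed sender -> recipient pairs across a set of messages.

    Each message is a (sender, recipients) tuple, e.g. parsed from email
    headers. Messages a sender addresses to themselves are ignored.
    """
    links: Counter = Counter()
    for sender, recipients in messages:
        for recipient in recipients:
            if recipient != sender:
                links[(sender, recipient)] += 1
    return links


emails = [  # hypothetical, pre-parsed headers
    ("ceo@example.com", ["cfo@example.com", "counsel@example.com"]),
    ("cfo@example.com", ["ceo@example.com"]),
    ("ceo@example.com", ["cfo@example.com"]),
]
top = communication_links(emails).most_common(1)
print(top)  # [(('ceo@example.com', 'cfo@example.com'), 2)]
```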
21. Keyword Accuracy Example
Keyword search reduced the document set by only 47%
And 88% of the documents returned by keyword search were not responsive (over-inclusive)
8,553 responsive documents missed by keyword search (almost 8% of responsive documents missed by keyword search: under-inclusive)
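The two failure modes above map onto the standard retrieval measures: precision (how much of what the search returned is responsive) and recall (how much of what is responsive the search returned). The sketch below computes both; the counts are hypothetical, chosen only to mirror the slide's ratios (88% of hits non-responsive, roughly 8% of responsive documents missed), and are not the example's underlying data.

```python
def precision_recall(true_pos: int, false_pos: int, false_neg: int) -> tuple[float, float]:
    """Precision: share of retrieved documents that are responsive.
    Recall: share of all responsive documents that were retrieved."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall


# Hypothetical counts: 100,000 hits of which 12,000 are responsive,
# plus 1,000 responsive documents the keywords never hit.
p, r = precision_recall(true_pos=12_000, false_pos=88_000, false_neg=1_000)
print(f"precision {p:.0%}, recall {r:.1%}, missed {1 - r:.1%}")
# precision 12%, recall 92.3%, missed 7.7%
```

A search can therefore look acceptable on recall while still sending reviewers seven non-responsive documents for every responsive one.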
22. Myth
Keyword Searching is the Way to Go
If I agree to keyword terms, I am OK
Keyword Search Cases
Keyword replacement example
Keyword substitution
Missing in Action (Under-inclusive)
Unwanted Extras (Over-inclusive)
Multiple subject/persons (Disambiguate)
23. Fact or Myth?
Manual review by humans of large amounts of information is as accurate and complete as possible, perhaps even perfect, and constitutes the gold standard by which all searches should be measured
This is “the reigning myth of ‘perfect’ retrieval using traditional means”
– Best Practices Commentary on the Use of Search and Information Retrieval Methods in E-Discovery, The Sedona Conference Journal (2007), p. 199
Human beings retrieved less than 20% of the relevant documents when they believed they were retrieving over 75%
– Blair & Maron (1985), An Evaluation of Retrieval Effectiveness for a Full-Text Document Retrieval System
24. Blair and Maron 1985
A classic study of retrieval effectiveness
– earlier studies were on unrealistically small collections
Studied an archive of documents for a legal suit
– ~350,000 pages of text
– 40 queries
– focus on high recall
– Used IBM's STAIRS full-text system
Main Result:
– The system retrieved less than 20% of the relevant documents for a particular information need; lawyers thought they had 75%
But many queries had very high precision
25. Blair and Maron, cont.
How they estimated recall
– generated partially random samples of unseen documents
– had users (unaware these were random) judge them for relevance
Other results:
– the two lawyers' searches had similar performance
– the lawyers' recall was not much different from the paralegal's
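The recall estimate rests on simple arithmetic: judge a sample of documents the search did not return, extrapolate how many relevant documents were missed, and compare with what was found. The sketch below is a simplified version of that idea; it assumes a simple random sample (Blair and Maron used partially random samples), and all counts in the example are hypothetical.

```python
def estimate_recall(relevant_retrieved: int, unretrieved_total: int,
                    sample_size: int, relevant_in_sample: int) -> float:
    """Estimate recall from a random sample of unretrieved documents.

    The share of relevant documents in the sample is extrapolated to the
    whole unretrieved set to estimate how many relevant documents were missed.
    """
    missed_rate = relevant_in_sample / sample_size
    est_missed = missed_rate * unretrieved_total
    return relevant_retrieved / (relevant_retrieved + est_missed)


# Hypothetical: 400 relevant documents found; 20 of 500 sampled
# unretrieved documents turn out relevant, out of 50,000 unretrieved.
r = estimate_recall(relevant_retrieved=400, unretrieved_total=50_000,
                    sample_size=500, relevant_in_sample=20)
print(f"estimated recall: {r:.1%}")  # estimated recall: 16.7%
```

Because the unretrieved set is usually far larger than the retrieved set, even a low relevance rate in the sample can imply many missed documents, which is how searchers can believe they have 75% while holding under 20%.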
26. Blair and Maron, cont.
Why recall was low
– users can't foresee the exact words and phrases that will indicate relevant documents
• “accident” referred to by those responsible as: “event,” “incident,” “situation,” “problem,” …
• differing technical terminology
• slang, misspellings
– Perhaps the value of higher recall decreases as the number of relevant documents grows, so more detailed queries were not attempted once the users were satisfied
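The vocabulary problem above is easy to reproduce. The sketch below uses four invented documents: a literal search for "accident" finds only the one document that uses that word, while adding the substitutes from the example also finds the others. The expanded list is not free, though: common words like "problem" tend to raise over-inclusiveness on a real collection.

```python
import re

DOCS = {  # hypothetical documents
    1: "The accident at the plant was reported Tuesday.",
    2: "Please keep the incident quiet until the review.",
    3: "That situation with the valve is a problem.",
    4: "Lunch menu for Friday.",
}


def keyword_hits(docs: dict[int, str], terms: list[str]) -> set[int]:
    """Return ids of documents containing any term as a whole word, ignoring case."""
    patterns = [re.compile(rf"\b{re.escape(t)}\b", re.IGNORECASE) for t in terms]
    return {doc_id for doc_id, text in docs.items()
            if any(p.search(text) for p in patterns)}


literal = keyword_hits(DOCS, ["accident"])
expanded = keyword_hits(DOCS, ["accident", "event", "incident", "situation", "problem"])
print(sorted(literal), sorted(expanded))  # [1] [1, 2, 3]
```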
27. Keyword Search Summary
Pro
– Word stemming
• Hous* → house, housemate, household
– Easy to use/explain/agree
– Familiar
– Fast results
Con
– Over-inclusive
• Disambiguate
– Under-inclusive
– Word must be present
– Hard to craft
– Ineffective with short messages, IMs
28. Keyword Truths
Under-inclusive - missing relevant or important info
Over-inclusive - costly to review
“Reasonable Keyword Search” doesn't exist
Effective keyword search is difficult/impossible
– Index Data, Analyze Index
– Suggest keywords or approach
Keywords may not be appropriate for the data
Keyword Search is ONE Tool in Your Arsenal
29. Keyword Accuracy Example
Keyword search reduced the document set by only 47%
And 88% of the documents returned by keyword search were not responsive (Over-inclusive)
8,553 responsive documents missed by keyword search (almost 8% of responsive documents - Under-inclusive)
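The two failure modes on this slide correspond to the standard retrieval measures: precision (the share of returned documents that are responsive) and recall (the share of responsive documents that were returned). A minimal sketch that works from the slide's percentages rather than raw counts; the per-100 counts below are derived from those percentages for illustration, not figures from the case:

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Share of retrieved documents that are actually responsive."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos: int, false_neg: int) -> float:
    """Share of all responsive documents that the search retrieved."""
    return true_pos / (true_pos + false_neg)


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Illustrative per-100 counts derived from the slide's percentages:
# 88% of hits were non-responsive, so 12 of every 100 hits are responsive.
p = precision(true_pos=12, false_pos=88)
# "Almost 8%" of responsive documents were missed, so roughly 92 of 100 were found.
r = recall(true_pos=92, false_neg=8)
print(f"precision ~ {p:.2f}, recall ~ {r:.2f}, F1 ~ {f1(p, r):.2f}")
```

High recall paired with low precision is the "costly to review" outcome: the search found most of what mattered, but reviewers read roughly seven non-responsive documents for every responsive one.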
30. Search Methodology Continuum
Review Methodology - Decided Upfront
Identify Issues in the Case
– Formulate Queries and Approaches for Finding
Responsive Documents
– Formulate Relevancy and Responsiveness Guidelines
Identify Primary Participants
Select or Triage Documents for Review
31. Review Tools for Relevancy Assessment
Keyword Searches, Culling
– Slices of Data are Reviewed
Categorization of Data
– Entire Dataset is Categorized
– Review Targeted Data
Automated Review
– Categorization of Dataset
– Random Sampling (Statistically Significant)
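To make "statistically significant" concrete, one common way to size a simple random sample for estimating a proportion (for example, the share of responsive documents) is the normal-approximation formula with a finite population correction. A minimal sketch; the 95% confidence level (z = 1.96) and ±2% margin are illustrative assumptions, not values from the presentation:

```python
import math


def sample_size(population: int, confidence_z: float = 1.96,
                margin: float = 0.02, p: float = 0.5) -> int:
    """Simple-random-sample size for estimating a proportion.

    confidence_z: z-score for the confidence level (1.96 ~ 95%), an assumed default.
    margin: acceptable margin of error (0.02 = +/- 2 points), an assumed default.
    p: expected proportion; 0.5 is the most conservative choice.
    """
    n0 = (confidence_z ** 2) * p * (1 - p) / margin ** 2
    # Finite population correction: smaller collections need fewer samples.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


print(sample_size(1_000_000))  # 2396
print(sample_size(50_000))
```

At these settings the required sample grows only slowly with collection size, which is why a few thousand reviewed documents can support an estimate even for a very large collection.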
32. Categorization of Data for Review
Categorize Entire Data Set
– Spam/Porn/System Files
– Personal/Private Data
– Non-relevant Business Data
Business Data
– Relevancy Assessment by Topic
– Privilege Review
Keyword, Topic Analysis - Overlap, Holes
33. Search Methodologies
[Figure: search methodologies arranged on a continuum from Content to Context; axis labels: Visualization, Measurement]
– Keyword: specific exact words, proximity searches, stemming
– Ontology: similarity of generalized words or phrases
– Clustering: concept similarity of salient features
– Social Network Analysis: relationships among relevant people
– Relationship Analysis: documents with causal or sequential relationship
35. Clustering
Clustering just means putting documents into groups that have
something in common.
Manually (that's what manual review is)
Keyword Searches
Ontologies (linguistic filters)
Automated clustering (using technology)
– Automated clustering by document type (all the Word documents go into one basket)
– Automated clustering by creation date
– Automated clustering by Actor
– Automated clustering by statistical similarity (statistical
clustering)
– ... and many other approaches
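Clustering by statistical similarity can be illustrated with a very simple approach: represent each document as word counts, measure cosine similarity, and let each document join the first existing group it closely resembles. This single-pass "leader" method and the 0.5 threshold are illustrative choices, not what any particular review platform uses:

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts (lowercased, letters only)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def leader_cluster(docs: list[str], threshold: float = 0.5) -> list[list[int]]:
    """Single pass: each document joins the first cluster whose leader
    (its first member) is at least `threshold` similar, else starts a new one."""
    clusters: list[list[int]] = []
    leaders: list[Counter] = []
    for i, doc in enumerate(docs):
        vec = vectorize(doc)
        for members, leader in zip(clusters, leaders):
            if cosine(vec, leader) >= threshold:
                members.append(i)
                break
        else:  # no sufficiently similar cluster found
            clusters.append([i])
            leaders.append(vec)
    return clusters


docs = [  # invented example documents
    "stock options granted to the board",
    "board granted stock options",
    "lunch options near the office",
    "lunch options near the office today",
]
print(leader_cluster(docs))  # [[0, 1], [2, 3]]
```

Both groups contain the word "options", yet the documents separate by the words around it.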
36. Clustering -- “Options”
1 Cluster or 4 Clusters?
– Financial/energy trading options
– Email/computer menu-driven options
– Stock options (ISOs)
– The generic idea of an available choice of action
37. Clustering
Software implements statistical methods of finding groups of “similar” documents
– “Similar” must be defined appropriately for the application
Documents are categorized with very little effort by the user
May help with document review
– A single reviewer can look at similar documents together and produce consistent review decisions
– Tight clustering can be used to detect “near duplicates” caused by OCR errors
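One common way to spot OCR-induced near duplicates is to compare overlapping character sequences (shingles) with Jaccard similarity. A minimal sketch; the 3-character shingles and 0.6 threshold are assumptions chosen for the example:

```python
from itertools import combinations


def char_shingles(text: str, k: int = 3) -> set[str]:
    """Overlapping k-character sequences of the whitespace-normalized text."""
    norm = " ".join(text.lower().split())
    return {norm[i:i + k] for i in range(max(len(norm) - k + 1, 1))}


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def near_duplicates(docs: list[str], threshold: float = 0.6) -> list[tuple[int, int]]:
    """Index pairs whose shingle sets are at least `threshold` similar."""
    sets = [char_shingles(d) for d in docs]
    return [(i, j) for i, j in combinations(range(len(docs)), 2)
            if jaccard(sets[i], sets[j]) >= threshold]


docs = [
    "Please destroy this message",
    "Please destroy tbis message",       # simulated OCR error
    "What are our options for lunch",
]
print(near_duplicates(docs))  # [(0, 1)]
```

Comparing every pair grows with the square of the collection size; large-scale tools typically use approximations such as MinHash instead.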
38. Clustering vs. queries
Clustering is unpredictable compared to keywords or taxonomies
The items that look very similar (to the clustering algorithm) may not actually be similar in ways that matter
– Relevancy may depend upon fine legal distinctions
– May vary in the same matter by subpoena and/or jurisdiction
39. Ontologies
Implement ontologies for directed searches.
– Approach searching from a knowledge-representation viewpoint
– Field is 25 years old, lots of work done
– Advantages:
• Disambiguate different meanings of the same word from their context: more accurate
• Encapsulate many ways of saying the same thing: more thorough
• Search for concepts, not individual words: more intuitive, more reusable, and faster
Can be combined with other methods (unsupervised clustering, discussions).
40. Subjectivity
GOOD WEATHER
– Sun
– Calm
BAD WEATHER
– Rain
– Snow
– Wind
41. A More Realistic Ontology
ROYALTY CONCEPT
• royalty, royalties, rty
• commission, commissions, comm.
• honorarium, honorariums, honoraria
• usage fee, usage charge, usg fee, use fee, fee for use, fee for usage
• charge for use, charged for use, charging for use, charges for use
• licence fee, license fee, lisense fee
• incent*, insent*
• earn a fee, eam a fee
• “take cut”~2, “takes cut”~2, “took cut”~2
• “slice pie”~5, “piece pie”~5, “piece action”~5, “slice action”~5
• -king, -queen, -prince, -princess
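A concept like this can be modeled as a list of terms and phrases (with trailing-* wildcards), proximity pairs, and exclusion words. The sketch below is a simplified, hypothetical matcher: it treats “a b”~N as "b follows a with at most N words in between" and treats a leading minus as a document-level veto. Real search engines define proximity and negation in their own, more detailed ways.

```python
import re
from dataclasses import dataclass, field


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def word_match(pattern: str, token: str) -> bool:
    """Exact match, or prefix match when the pattern ends in *."""
    return token.startswith(pattern[:-1]) if pattern.endswith("*") else token == pattern


def phrase_hit(phrase: str, tokens: list[str]) -> bool:
    """True if the words of `phrase` appear consecutively in `tokens`."""
    words = phrase.split()
    return any(
        all(word_match(w, tokens[i + j]) for j, w in enumerate(words))
        for i in range(len(tokens) - len(words) + 1)
    )


def near_hit(a: str, b: str, max_gap: int, tokens: list[str]) -> bool:
    """Simplified proximity: `b` follows `a` with at most `max_gap` words between."""
    pos_a = [i for i, t in enumerate(tokens) if word_match(a, t)]
    pos_b = [i for i, t in enumerate(tokens) if word_match(b, t)]
    return any(0 <= pb - pa - 1 <= max_gap for pa in pos_a for pb in pos_b)


@dataclass
class Concept:
    terms: list[str]
    near: list[tuple[str, str, int]] = field(default_factory=list)
    exclude: list[str] = field(default_factory=list)

    def matches(self, text: str) -> bool:
        tokens = tokenize(text)
        if any(phrase_hit(x, tokens) for x in self.exclude):
            return False  # assumed veto semantics for "-king", "-queen", ...
        return (any(phrase_hit(t, tokens) for t in self.terms)
                or any(near_hit(a, b, n, tokens) for a, b, n in self.near))


# A small, hypothetical subset of the ROYALTY CONCEPT above.
ROYALTY = Concept(
    terms=["royalty", "royalties", "license fee", "licence fee", "usage fee", "incent*"],
    near=[("take", "cut", 2), ("slice", "pie", 5)],
    exclude=["king", "queen", "prince", "princess"],
)
print(ROYALTY.matches("They want to take a cut of every sale"))  # True
```

The document-level veto is blunt: a license memo that happens to mention "Burger King" would be dropped. Narrower rules, such as excluding only when the veto word is near the matched term, trade simplicity for precision.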
42. Ontology as a Query
But it can be slightly cumbersome to deal with directly in
that form
q ((+(std:%CapacityReports_% std:%DINCapacity_%) +(std:%ACMEEPPlant_% std:%ProductName_%)) (+(std:%ACMEPNPlant_%
std:%ProductName_%) +(std:%ProductiveCapability_% std:%CapacityReports_%)) (+(std:%CapacityCreep_%
std:%OperationsImprovement_% std:%CapacityExpansion_% std:%CapacityRestoration_%) +(std:%ACMEPNPlant_%
std:%ProductName_%)) (+(std:%EquipmentReplacement_% std:%FinishingColumn_%) +(std:%ACMEPNPlant_%
std:%ProductName_%)) (std:%Audit_% actor:%Audit_%) (+(std:%SettlementNegotiations_% std:%ContractNegotiations_% )
+(actor:%ACMEOutsideCounsel_% std:%ACMEOutsideCounsel_% actor:%ACME UBOutsideCounsel_%
std:%AcmeSubOutsideCounsel_% actor:%AcmeSub_% std:%AcmeSub_%)) (std:%FTC_% actor:%FTC_%)
((+subject:%ProductName_% +(std:swap std:"supply agreement" std:"exchange agreement" std:"agree to exchange")) std:"name
(About a quarter of its regular size)
43. Ontology Pros & Cons
Pro
– Identify acronyms
– Normalize variants
– Disambiguate terms
– Identify overly broad keywords
– Identify and correct keywords with errors
– Create extensive libraries of ontologies
– Can be used as a clustering method
– Topics can appear in more than one language
– Reusable for different types of litigation, e.g. anti-trust, product liability etc. (and for both offense and defense)
Con
– As with keyword search, word based
– Labor intensive, upfront
44. “Search” Terminology
Technology-Enhanced Review
Technology Assisted Review
Automated Review
Predictive Coding
[Figure: People, Process, and Technology joined by Quality Control; labels include workflow, review, feedback, privilege, subject matter, and production]
45. Setup
1. Draw a sample of documents
2. Expert judges the sample: responsive or non-responsive
3. Model learns from the expert's judgments
4. Model predicts: responsive or non-responsive
5. Repeat as needed
6. Model categorizes all remaining documents
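The setup loop above can be sketched with any supervised text classifier. The example below uses multinomial naive Bayes purely as an illustration; the presentation does not say which model its technology uses, and the tiny training sample and labels are invented:

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: list[str], labels: list[str]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(tokens(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc: str) -> str:
        def log_score(c: str) -> float:
            denom = self.totals[c] + len(self.vocab)
            return self.prior[c] + sum(
                math.log((self.counts[c][w] + 1) / denom)
                for w in tokens(doc) if w in self.vocab)  # unseen words ignored
        return max(self.classes, key=log_score)


# Expert-judged sample (hypothetical documents and labels).
sample = ["royalty payment on the license", "license fee and royalty terms",
          "lunch plans for friday", "friday team lunch"]
labels = ["responsive", "responsive", "non-responsive", "non-responsive"]
model = NaiveBayes().fit(sample, labels)

# The model then categorizes the remaining documents.
remaining = ["royalty terms for the license", "lunch on friday"]
categorized = {doc: model.predict(doc) for doc in remaining}
print(categorized)
```

"Repeat as needed" corresponds to having the expert judge more documents (often the ones the model is least sure about), appending them to `sample` and `labels`, and refitting.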
47. Technology Enhanced Review:
Speed, Predictable Costs, and Accuracy
Automate any portion of the review
[Figure: example from a real case, reducing the collection in stages: source data (100%); eliminate duplicates & system files; non-responsive isolation by ontologies; non-responsive by Technology Enhanced Review (removed another 18%); responsive by Technology Enhanced Review (removed another 7%); privilege by high-speed manual review. Other percentages shown: 30%, 30%, 22%, 3%, 15%]
53. Better Answers and Better Questions
When were customary work practices circumvented?
When did established norms of behavior change?
Who knew, or likely knew, what facts?
Who interacted with whom and how intimately?
Who was involved in what types of decisions or meetings?
Who are the real ‘insiders’?
What data is hidden or missing?
When were electronically documented conversations
“taken off line,” possibly in an attempt to avoid detection?
How did the importance of different actors change over time?
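Questions like "who interacted with whom and how intimately?" can start from simple counts over email metadata. A minimal sketch, assuming each message has been reduced to a (sender, recipients) pair; the names are hypothetical:

```python
from collections import Counter


def interaction_counts(emails):
    """Count messages exchanged between each unordered pair of people.

    `emails` is an iterable of (sender, [recipients]) tuples.
    """
    pairs = Counter()
    for sender, recipients in emails:
        for recipient in recipients:
            if recipient != sender:
                pairs[frozenset((sender, recipient))] += 1
    return pairs


def degree(pairs):
    """Number of distinct people each person has exchanged mail with."""
    deg = Counter()
    for pair in pairs:
        for person in pair:
            deg[person] += 1
    return deg


emails = [  # hypothetical metadata
    ("alice", ["bob"]),
    ("bob", ["alice", "carol"]),
    ("alice", ["bob"]),
    ("dave", ["carol"]),
]
pairs = interaction_counts(emails)
print(pairs.most_common(1))  # strongest tie: alice and bob, 3 messages
print(degree(pairs))
```

Degree (the number of distinct contacts) is only one rough signal of who the insiders are; computing the same counts per month would show how the importance of different actors changed over time.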
54. Bear Stearns
Lower Bar For Fraud?
Two hedge fund managers arrested
Charged with securities and wire fraud, and one with insider trading
Internal emails:
– “I'm fearful of these markets. ... As we discussed it may not be a meltdown for the general economy but in our world it will be.”
– “I think we should close the funds now.”
External communications:
– “We are very comfortable with exactly where we are.”
– “The funds are performing exactly as they were designed to.”
56. Analysis of Anomalous Communication Patterns
Unusual levels relative to a particular type of activity pop out
Color-coded graphs show relative communication densities for apples-to-apples comparisons
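The slide does not describe how the graphs decide what counts as "unusual". One simple stand-in is a z-score: flag days whose message count sits far above the mean for that series. A minimal sketch; the daily counts and the threshold of 2 are invented for illustration:

```python
from statistics import mean, pstdev


def flag_anomalies(daily_counts: dict[str, int], z_threshold: float = 2.0) -> list[str]:
    """Return the days whose count is more than `z_threshold` population
    standard deviations above the mean of this one activity series."""
    values = list(daily_counts.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # perfectly flat series: nothing stands out
    return [day for day, n in daily_counts.items() if (n - mu) / sigma > z_threshold]


daily = {  # hypothetical messages per day for one activity type
    "06-01": 10, "06-02": 12, "06-03": 11, "06-04": 9,
    "06-05": 10, "06-06": 48, "06-07": 11,
}
print(flag_anomalies(daily))  # ['06-06']
```

For apples-to-apples comparisons, run the check separately for each activity type (for example, one series per pair of correspondents) rather than pooling everything into one series.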
58. Emotive Tone
Whistle-blower Scenario
59. “Call Me” Events
Sequence Viewer used for analytics-driven review
60. Search Risks
Failure to find responsive documents
Failure to recognize responsive documents
Failure to recognize privileged documents
Inconsistent treatment of documents
(e.g., duplicates)
Failure to complete project in a timely manner
Sophisticated Tools
– Understand What They Do and Don't Do Well
– Inform Yourself, Speak to References, Consultants
61. Transparency of Process
Discussing Review Protocols
– Provide transparent, defensible, sophisticated search
based on document content
– Clustering, Ontologies, Analytics, and yes, sometimes
Keywords too
Develop search methodologies for each case
– Use technology experts in consultation with case / legal
experts
Results verifiable by Quality Control
– Defensible sampling
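One widely used form of defensible sampling is an elusion test: draw a random sample from the documents the search did not retrieve, review it, and project how many responsive documents were missed. A minimal sketch; all counts are hypothetical, and the margin uses a simple normal approximation, which is rough when only a few responsive documents turn up in the sample:

```python
import math


def elusion_test(responsive_found: int, null_set_size: int,
                 sample_size: int, sample_responsive: int,
                 z: float = 1.96) -> dict[str, float]:
    """Estimate missed responsive documents and recall from a null-set sample."""
    elusion = sample_responsive / sample_size        # responsive rate among non-hits
    missed = elusion * null_set_size                 # projected responsive docs missed
    recall = responsive_found / (responsive_found + missed)
    margin = z * math.sqrt(elusion * (1 - elusion) / sample_size)  # on the elusion rate
    return {"elusion": elusion, "missed": missed, "recall": recall,
            "elusion_margin": margin}


# Hypothetical: 9,000 responsive documents confirmed among the hits; 50,000
# non-hits; 1,000 of those sampled and reviewed, 10 found responsive.
result = elusion_test(responsive_found=9_000, null_set_size=50_000,
                      sample_size=1_000, sample_responsive=10)
print({k: round(v, 4) for k, v in result.items()})
```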
62. Thank you!
Sonya L. Sigler
Vice President, Product Strategy
SFL Data
415-321-8385
sonya@sfldata.com
www.sfldata.com
63. Review Protocol ≠ Agreeing to Search Terms
Data Culling (upfront or backend)
Search Methodologies - Continuum
– Keyword Positive List
– Ontologies
– Clustering
– Technology Enhanced Review
– Relationship Analysis
Quality Control Process & Procedures
Privilege Review, Sensitivities
Production Format & Timing
64. Search
The Courts are Finally Starting to Catch up to Technology
Making more aggressive rulings:
– Forcing attorneys to live with the results of bad searches
– Sanctioning those who screw up, even if no allegation of fraud
– Demanding repeatable, demonstrable process – using terms like “quality assurance”
65. Search Under Scrutiny
Facciola’s Opinions - United States v. O’Keefe
“for lawyers and judges to dare opine that a certain
search term or terms would be more likely to produce
information than [other] search terms … is truly to go
where angels fear to tread.”
He has also suggested that litigants take a good look at
more advanced search methodologies, including the use
of computational linguistics and technology assisted
review
66. Reasonableness of Search Methods
Victor Stanley, Inc. v. Creative Pipe, Inc., 2008 WL 2221841 (D. Md., May 29, 2008).
"Common sense suggests that even a properly designed and executed
keyword search may prove to be over-inclusive or under-inclusive...the only
prudent way to test the reliability of the keyword search is to perform some
appropriate sampling."
“Selection of the appropriate search and information retrieval technique
requires careful advance planning by persons qualified to design effective
search methodology. The implementation of the methodology selected should
be tested for quality assurance; and the party selecting the methodology must
be prepared to explain the rationale for the method chosen to the court,
demonstrate that it is appropriate for the task, and show that it was properly
implemented.”
67. From Pre-Discovery to Production Completeness
Henry v. Quicken Loans --> 26(f) consulting
– Lawyers agreed to keyword lists and process
– Ran own (unsanctioned) searches with expert
– Told to live with bad results, and pay for it
Qualcomm --> Smell Test; Dig Deeper
– In-house counsel (Qualcomm) v. Outside Counsel (Day Casebeer)
– Sanctions, Attorney Client-Privilege Problems
– Associate found docs and was told they weren't relevant; found out the hard way that those and 230,000 other pages were relevant
Judge Rader's Protocol in TX for Patent cases
– 5 custodians
– 5 search terms (can you say overbroad…)
68. Under-inclusive - Missing in Action
Missing abbreviations / acronyms / clippings:
– incentive stock option but not ISO
– Board of Directors but not BOD
– 1998 plan but not 98 plan
Missing inflectional variants:
– grant but not grants, granted, granting
Missing spellings or common misspellings:
– gray but not grey
– privileged but not priviliged, priviledged, privilidged, priveliged, privelidged, priveledged, …
69. Missing in Action II
Missing syntactic variants:
board of directors meeting
but not
meeting of the board of directors, BOD meeting, board meeting, BOD mtg, board mtg, directors’ meeting, directors’ mtg, mtg of the BOD, mtg of the directors, BOD meetings, board meetings, BOD mtgs, board mtgs, directors’ meetings, directors’ mtgs, mtgs of the BOD, mtgs of the directors
70. Missing in Action III
Missing synonyms / paraphrases:
hire date but not start date
approved by Smith
but not
Smith’s approval, the approval of Smith, Smith’s ok, Smith’s go-ahead, Smith’s goahead, the go-ahead from Smith, the goahead from Smith, the nod from Smith, Smith’s signature, Smith’s sign-off, the sign-off of Smith, the signoff of Smith
71. Missing in Action IV
As a keyword item, the address
101 E. Bergen Ave., Temple, CA 90200
does not match any of:
101 East Bergen Avenue
the Bergen site
the Temple location
our 90200 outlet
72. Over-inclusive - Unwanted Extras
Options
Target: Sheila was granted 100,000 options at $10
Match: What are our options for lunch?
Match in a signature line:
Amanda Wacz
Acme Stock Options Administrator
Destroy
Target: destroy evidence
Match in a disclaimer: The information in this email, and any
attachments, may contain confidential and/or privileged
information and is intended solely for the use of the named
recipient(s). Any disclosure or dissemination in whatever form, by
anyone other than the recipient is strictly prohibited. If you have
received this transmission in error, please contact the sender
and destroy this message and any attachments. Thank you.
73. Unwanted Extras II
alter*
Target: alter, alters, altered, altering
Matches:
alternate, alternative, alternation, altercate, altercation, alterably, …
grant
Target: stock option grant
Matches names: Grant Woods, Howard Grant
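The alter* problem is easy to reproduce. The sketch below translates a trailing-* keyword into a whole-word regular expression (a simplified stand-in for how a review tool might expand wildcards; real tools differ) and compares it with an explicit list of the intended inflections:

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a keyword with an optional trailing * into a whole-word regex."""
    stem = re.escape(pattern.rstrip("*"))
    suffix = r"\w*" if pattern.endswith("*") else ""
    return re.compile(rf"\b{stem}{suffix}\b", re.IGNORECASE)


text = "They altered the file during an altercation; the alternative was to alter nothing."

wild = wildcard_to_regex("alter*")
print(wild.findall(text))      # ['altered', 'altercation', 'alternative', 'alter']

# Explicit inflections only: alter, alters, altered, altering.
explicit = re.compile(r"\b(?:alter|alters|altered|altering)\b", re.IGNORECASE)
print(explicit.findall(text))  # ['altered', 'alter']
```

The explicit list trades a little recall (it will not catch a misspelling such as "alterd") for much better precision, which is the trade-off an ontology manages term by term.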
74. Tuning an Ontology
Linguists briefed as reviewers
Linguists read the data
Linguists study complaint and other relevant
documents
Linguists analyze the search index
Legal Team provides input, feedback
75. A Simple Linguistic Ontology
ROYALTY CONCEPT
– Royalty
– Commission
– Honorarium
– Usage Fee
– Slice of the Pie
76. A Simple Pricing Concept
PRICING CONCEPT
– Purchase Order
– PO
– Dollar amount
– Invoice
77. Adding Subjective Content
PRICING CONCEPT
– Purchase Order
– PO
– Dollar amount
– Invoice
– Cylinder
– Canister
– Bottle
78. Ontology Usage
Identifying Misspellings, Slang, Nicknames, etc.
Variant Generation – help the user find what he
meant (names, words, suggestions)
– Buy* Buying, Buys, Bought, etc.
– Kenneth Lay, Ken Lay, klay, kenneth.lay
View variations in context to choose topics
Document segmentation – text blocks, signatures
Finding Words in Context, Frequency
– e.g., “at serious risk of losing” (25), “are certain risks inherent in” (16)
79. Identifying misspellings, slang, etc
1. Match the index against electronic dictionary.
2. From the remaining material (not in dictionary), remove any
items that are merely numbers.
3. Find (in the ontologies) any words that are similar to what
remains.
4. Add the similar words to the ontology
This increases coverage (i.e., ensures
that we retrieve documents that
otherwise would have been missed)
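The four steps above can be sketched with Python's standard difflib module, whose get_close_matches function ranks candidates by a similarity ratio. The tiny dictionary, index, and 0.8 cutoff are illustrative assumptions; a real pipeline would use a full electronic dictionary and the actual search index:

```python
import difflib


def misspelling_additions(index_terms, dictionary, ontology_terms, cutoff=0.8):
    """Return {variant: ontology_word} for index terms that look like
    misspellings or variants of words already in the ontology."""
    # Step 1: keep only index terms that are not in the dictionary.
    unknown = [t for t in index_terms if t.lower() not in dictionary]
    # Step 2: drop items that are merely numbers (e.g. "2,567" or "3.5").
    unknown = [t for t in unknown
               if not t.replace(",", "").replace(".", "").isdigit()]
    # Step 3: find the most similar ontology word for each remaining item.
    additions = {}
    for term in unknown:
        close = difflib.get_close_matches(term.lower(), ontology_terms,
                                          n=1, cutoff=cutoff)
        if close:
            additions[term] = close[0]
    # Step 4: the caller adds these variants to the ontology.
    return additions


index = ["privileged", "priviliged", "priviledged", "2,567", "meeting", "xqzt"]
dictionary = {"privileged", "meeting"}   # stand-in for an electronic dictionary
ontology = ["privileged", "meeting"]     # hypothetical ontology words
print(misspelling_additions(index, dictionary, ontology))
```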
80. Variant Generation
Help the user search for what he meant
Take names, numbers, and other entities for which the user wants to search
Automatically generate likely synonyms
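For names, variant generation can be as simple as combining nicknames with a few common formats, as in the slide 78 example (Kenneth Lay, Ken Lay, klay, kenneth.lay). A minimal sketch; the nickname table and the set of formats are hypothetical illustrations, not a complete or standard list:

```python
# Hypothetical nickname table; a real system would use a much larger list.
NICKNAMES = {"kenneth": ["ken", "kenny"], "elizabeth": ["liz", "beth"]}


def name_variants(first: str, last: str) -> list[str]:
    """Generate likely spellings of a person's name for searching."""
    first, last = first.lower(), last.lower()
    variants = set()
    for given in [first] + NICKNAMES.get(first, []):
        variants.add(f"{given} {last}")    # ken lay
        variants.add(f"{last}, {given}")   # lay, ken
        variants.add(f"{given}.{last}")    # ken.lay (email-style)
    variants.add(f"{first[0]}{last}")      # klay (initial + surname)
    return sorted(variants)


print(name_variants("Kenneth", "Lay"))
```

Showing each generated variant in context, as the next slide suggests, lets the user discard variants that pull in the wrong person.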
81. Variant Generation
Show the context of these variations, so the user can evaluate
them.
82. Document Segmentation
Examples of signatures
Jean-Louis Koenig
President GGDA Region
MegaCorp International SA
Robert Guilliam
Rue de Concours 2280
Product Regulatory Affairs & Compliance
Bern, Switzerland
MegaCorp International
Neuchatel
Switzerland
Tél. +41 (31) 125 2366
Alberto Goreman
Manager Printing & Packaging, Eastern Region
+57 3 451 7195, alberto_goreman@megacorp.com
83. Finding words in context
Phrase Total Instances
risks alienating some 37
at serious risk of losing 25
are certain risks inherent in 16
are at risk of running 15
it be risking anything by 15
difference a risk o why 14
and the risks inherent in 12
without assuming any risk 8
we could risk losing next 7
avoid transferring risk to the 5
requires taking risks and the 4
can t risk not living 3
and unknown risks and uncertainties 2
a potential risk that was 2
avoid transfering risk to the 2
This increases coverage AND precision
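A phrase-frequency table like this one can be built by collecting a fixed window of words around each occurrence of a term and counting identical windows. A minimal sketch; the two-words-either-side window and the sample sentences are assumptions for illustration:

```python
import re
from collections import Counter


def keyword_in_context(texts, stem, before=2, after=2):
    """Count the word windows around every token that starts with `stem`."""
    phrases = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        for i, word in enumerate(words):
            if word.startswith(stem):
                window = words[max(i - before, 0): i + after + 1]
                phrases[" ".join(window)] += 1
    return phrases


texts = [  # invented sample sentences
    "We are at serious risk of losing the account.",
    "They are at serious risk of losing it too.",
    "There are certain risks inherent in any deal.",
]
for phrase, count in keyword_in_context(texts, "risk").most_common():
    print(f"{phrase:<35} {count}")
```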
84. Multi-Lingual Issues
Does language matter?
– Lucerne
– Luzerne
– Lucerna
These are all names for the same city
Name of city not necessarily expressed in the same
language as rest of document
In Europe, many email threads and documents are
mixed language, and must be properly categorized as
such
85. Automated Ontology Expansion Tools
Currently implemented expansion modules:
Spelling variants:
color>>colour, defense>>defence, labeled>>labelled
Lemmatization (recovering uninflected form):
walking>>walk, ate>>eat
Morphological variants:
eat>>eats, eating, eaten, ate
hablar>>hablo, hablas, habla, hablan, habláis, hablamos
Number expansion:
$2.5B>>two point five billion dollars
2,567>>two thousand five hundred sixty seven
13>>13th, thirteenth
Name variants:
Elizabeth Van der Beek>>“Liz Van der Beek”, “Liz Vander Beek”, “Van der Beek, Elizabeth”, “Beth Vanderbeek”, etc.
Email variants (mined from alias clusters file):
Elizabeth Van der Beek>>evanderbeek, liz.vanderbeek, vanderbeekl, emvanderbeek, etc.
Abbreviations:
administrative project meeting>>admin project meeting, admin project mtg, admin proj mtg, etc.
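The abbreviation module can be approximated by substituting each word with its known short forms and taking every combination. A minimal sketch; the abbreviation table is a hypothetical illustration, not the tool's actual list:

```python
from itertools import product

# Hypothetical abbreviation table, not a standard or exhaustive list.
ABBREVIATIONS = {
    "administrative": ["admin"],
    "project": ["proj"],
    "meeting": ["mtg"],
}


def abbreviation_variants(phrase: str) -> set[str]:
    """All combinations of full and abbreviated word forms, minus the original."""
    options = [[w] + ABBREVIATIONS.get(w, []) for w in phrase.lower().split()]
    variants = {" ".join(combo) for combo in product(*options)}
    variants.discard(phrase.lower())
    return variants


print(sorted(abbreviation_variants("administrative project meeting")))
```

Because every combination is generated, the number of variants multiplies with each abbreviable word, so a real expansion module would likely need to prune rare combinations.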
Editor's Notes
Investors sued to recover losses from the liquidation of two hedge funds.
2003 – they retained counsel to help file suit; counsel advised them to retain documents to file a complaint.
2004 – filed complaint; the case was stayed until 2007.
2007 – depositions revealed gaps in the plaintiffs' production of documents.
Monetary sanctions against 13 plaintiffs; some also had to process and produce back-up tapes at their own expense.
Pension Committee found gross negligence and willfulness in the failure to include board members or investment committee members, and gross negligence in the failure to collect data from former employees.
Pension Committee – monetary sanctions on 13 plaintiffs; some parties had to process and produce back-up tapes at their own expense