1. HOW NEW AI-BASED ANALYTICS IGNITE A PRODUCTIVITY REVOLUTION IN EDISCOVERY
ACEDS Webinar - August 24th, 2017
2. TODAY’S SPEAKERS
Mary Mack
Executive Director, ACEDS
Paul Starrett
Specialist in electronic evidence and data science in the legal profession
Johannes Scholtes
CSO at ZyLAB
Professor of Text Mining, University of Maastricht
4. TODAY’S AGENDA
Tools from the field of Artificial Intelligence and Data Science accelerate truth-finding missions in regulatory requests and internal investigations.
New AI-based analytics have drastically increased the speed and improved the quality of the eDiscovery process.
But what exactly are these new AI techniques, and how do they compare to all the other analytics we have been using for years?
5. THE BUZZ
eDiscovery & Artificial Intelligence: the new reality
AI becomes good business practice
6. WHAT ARE WE TALKING ABOUT?
“Analytics” is the discovery, interpretation, and communication of meaningful patterns in data. The terms “analytics” or “analysis” describe functions ranging from reporting and review metrics to sophisticated search and advanced data mining, text mining and machine learning applications. Benefits also range across various dimensions.
“Artificial Intelligence (AI) is a broad, complex field of research. AI includes tasks such as reasoning, problem solving, knowledge representation, planning, machine learning, natural language processing, perception, motion, social intelligence, and even creativity. The ultimate goal is the creation of some form of general intelligence.”
7. WHY WE SHOULD CARE
The usual suspects:
Exploding data volumes;
New types of data (multimedia, social, BYOD);
Exploding eDiscovery costs;
New regulations and compliance requirements:
GDPR
Cyber-security requirements
More enthusiastic regulators, especially outside of the US.
8. DEALING WITH THE EDISCOVERY DATA WAVE
In eDiscovery, you never know in advance:
How much data you will have;
What type of data it will be, and thus what type of processing is required;
What workflow and iterations you will have.
Automation, AI and data science are very CPU- and memory-intensive.
So you need intelligent load balancing and resource allocation to prevent bottlenecks and deal effectively with the “Data Wave” in eDiscovery.
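One common way to spread an unpredictable processing load over a fixed pool of workers is greedy least-loaded assignment: sort the jobs by estimated cost, largest first, and hand each one to the worker that currently has the least work (the "longest processing time first" heuristic). The Python sketch below is an illustration of that scheduling idea, not ZyLAB's load balancer; it assumes job costs can be estimated up front (here, hypothetically, as megabytes to process), and the job names are invented.

```python
import heapq


def assign_jobs(job_costs: dict[str, float], n_workers: int) -> dict[int, list[str]]:
    """Greedy longest-processing-time-first (LPT) assignment.

    job_costs maps a hypothetical job id to its estimated cost (e.g. MB to process).
    Returns a mapping worker index -> list of assigned job ids.
    """
    if n_workers < 1:
        raise ValueError("need at least one worker")
    # Min-heap of (current load, worker index): the top is always the least-loaded worker.
    heap = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    assignment: dict[int, list[str]] = {w: [] for w in range(n_workers)}
    # Place the most expensive jobs first so they cannot pile up at the end.
    for job, cost in sorted(job_costs.items(), key=lambda kv: kv[1], reverse=True):
        load, worker = heapq.heappop(heap)
        assignment[worker].append(job)
        heapq.heappush(heap, (load + cost, worker))
    return assignment


def loads(assignment: dict[int, list[str]], job_costs: dict[str, float]) -> dict[int, float]:
    """Total estimated cost assigned to each worker."""
    return {w: sum(job_costs[j] for j in jobs) for w, jobs in assignment.items()}


if __name__ == "__main__":
    jobs = {"pst_01": 900, "zip_02": 400, "mbox_03": 350, "docs_04": 300, "img_05": 250}
    plan = assign_jobs(jobs, n_workers=2)
    print(plan)               # which worker processes which job
    print(loads(plan, jobs))  # resulting load per worker
```

Because cost estimates are often wrong in practice, real systems typically also rebalance while jobs run; the static greedy rule is simply the easiest version to reason about.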
9. WHY ANALYTICS?
Better understand your data: the ability to make better strategic decisions.
Early Case Assessment: build and justify eDiscovery budget, resources and timelines.
Reduce data volumes: cut through the noise and zero in on documents of interest.
Take an investigative approach: organize and prioritize documents.
Reduce your eDiscovery cost: improve productivity and precision of your team.
Better quality: see greater consistency in coding decisions across similar documents.
Speed up litigation.
10. WHY AI-BASED ANALYTICS?
Humans have cognitive limitations when processing and deriving insights from large-scale document sets; humans simply cannot successfully synthesize large volumes of data.
Technology will help lawyers work more efficiently, effectively, and enjoyably.
Grossman & Cormack*: “TAR was not only more effective than human review at finding relevant documents, but also much cheaper … Overall, the myth that exhaustive manual review is the most effective—and therefore the most defensible—approach to document review is strongly refuted.”
* Maura R. Grossman & Gordon V. Cormack, “Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review,” Richmond Journal of Law and Technology, Vol. XVII, Issue 3.
12. WHAT KIND OF ANALYTICS HAVE WE SEEN?
Structural (structure of data), aka syntactic analytics:
File, Document and Forensic Property extraction, Metadata filtering, Saved (full-text) Searches, Email Thread detection, Email Thread reduction, Missing emails in thread, Duplicate and Near-Duplicate detection, Language identification, Communication Analysis, Timeline Visualizations, Geo-mapping, …
Conceptual (meaning of data), aka semantic or meaning-based analytics:
Keyword Expansion (taxonomy), Content Clustering, Content-based Categorization, Conceptual Search, Sentiment & Emotion Mining, Semantic Content Analysis, Word Cloud, Topic Modeling, …
Machine Learning (learn from data), aka data-driven (predictive) analytics:
Technology Assisted Review, Contract clause detection & classification, Privilege detection, …
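Near-duplicate detection, listed under structural analytics, is often done by comparing the overlapping word sequences ("shingles") of two documents. The Python sketch below assumes 3-word shingles, Jaccard similarity and a hypothetical threshold of 0.5; production tools usually add hashing schemes such as MinHash so they do not have to compare every pair, which this sketch deliberately omits.

```python
import re
from itertools import combinations


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles of a lower-cased, tokenized text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(docs: dict[str, str], threshold: float = 0.5, k: int = 3):
    """Yield (doc_id_1, doc_id_2, similarity) for every pair at or above the threshold."""
    sh = {doc_id: shingles(text, k) for doc_id, text in docs.items()}
    # Brute-force pairwise comparison: fine for a sketch, too slow for millions of documents.
    for d1, d2 in combinations(sorted(docs), 2):
        sim = jaccard(sh[d1], sh[d2])
        if sim >= threshold:
            yield d1, d2, round(sim, 3)


if __name__ == "__main__":
    docs = {
        "a": "Please review the attached draft agreement before Friday.",
        "b": "Please review the attached draft agreement before Monday.",
        "c": "Lunch is served in the cafeteria at noon.",
    }
    print(list(near_duplicates(docs)))
```

Documents "a" and "b" share five of their seven distinct shingles, so they are flagged as near-duplicates; "c" shares none with either.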
13. WHAT IS THE RELATION BETWEEN AI AND ANALYTICS?
eDiscovery needs:
• Perception
  • Reading: OCR, handwriting detection, signature recognition
  • Listening: Audio search
  • Vision: Image classification
  • Language: Machine Translation
• Intelligent Search
  • Machine Learning for search
  • Concept Clustering
  • Data Visualization
• Text classification and categorization
  • Document
  • Paragraph (clause)
  • Sentence or phrase
AI provides the algorithms and evaluation methods:
• Machine Learning
  • Decision trees
  • Support Vector Machines
  • Deep Learning (CNN)
• Topic Modeling / Concept Search
  • Hierarchical Clustering
  • LSI
  • LDA
  • NMF
• Natural Language Processing (NLP)
  • Shallow Parsing
  • Deep Parsing
  • Co-reference resolution
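Hierarchical clustering, one of the algorithm families listed above, can be sketched as agglomerative clustering: start with every document in its own cluster and repeatedly merge the two closest clusters until the closest pair is too far apart. This Python sketch assumes single linkage, Jaccard distance over word sets and a hypothetical cutoff; real concept clustering would normally run on richer document vectors (for example, LSI or LDA topic weights) instead of raw word sets.

```python
import re
from itertools import combinations


def word_set(text: str) -> frozenset[str]:
    """Lower-cased set of words in a document."""
    return frozenset(re.findall(r"\w+", text.lower()))


def jaccard_distance(a: frozenset, b: frozenset) -> float:
    """1 - Jaccard similarity; 0.0 means identical word sets."""
    union = a | b
    return 1.0 - (len(a & b) / len(union) if union else 1.0)


def agglomerative(docs: dict[str, str], cutoff: float = 0.8) -> list[set[str]]:
    """Single-linkage agglomerative clustering; stops when the closest pair is farther than cutoff."""
    vecs = {d: word_set(t) for d, t in docs.items()}
    clusters = [{d} for d in sorted(docs)]

    def linkage(c1: set[str], c2: set[str]) -> float:
        # Single linkage: distance between the two closest members of the clusters.
        return min(jaccard_distance(vecs[a], vecs[b]) for a in c1 for b in c2)

    while len(clusters) > 1:
        (i, j), dist = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if dist > cutoff:
            break  # remaining clusters are too dissimilar to merge
        clusters[i] |= clusters[j]  # i < j, so deleting j below does not shift i
        del clusters[j]
    return clusters


if __name__ == "__main__":
    docs = {
        "m1": "merger price negotiation with competitor",
        "m2": "price negotiation with competitor on merger terms",
        "h1": "holiday party planning for the office",
        "h2": "office holiday party menu planning",
    }
    print(agglomerative(docs))
```

Recording the distance at each merge, instead of stopping at a cutoff, would give the full dendrogram that hierarchical clustering tools display.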
18. PERCEPTION: OCR ON BITMAPS
ZyLAB: people often take screenshots or pictures of such information, just in case or to remember it. ZyLAB will pick up such images, OCR them and find them.
19. STRUCTURAL: UNPACK EMBEDDED CONTENT
ZyLAB:
• Every embedded item is extracted and OCR-ed if needed.
• Search & Find
• Show in document family
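Unpacking embedded content is essentially a recursive walk: open every container, record each child together with its parent so the document family can be shown later, and open the children that are containers themselves. The Python sketch below assumes ZIP archives are the only container type and uses a hypothetical "parent/child" path naming; real processing also handles email attachments, embedded Office objects, PST files and more, and routes image-only items to OCR, none of which this sketch does.

```python
import io
import zipfile
from typing import Iterator, Optional


def unpack(name: str, data: bytes, parent: Optional[str] = None,
           depth: int = 0, max_depth: int = 10) -> Iterator[tuple[str, Optional[str], int]]:
    """Recursively yield (item_path, parent_path, size_in_bytes) for a file and everything embedded in it."""
    # Every item, container or not, is recorded with its parent so the family can be rebuilt.
    yield name, parent, len(data)
    if depth >= max_depth or not zipfile.is_zipfile(io.BytesIO(data)):
        return  # leaf item (or nesting limit reached): nothing further to unpack
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            child = f"{name}/{info.filename}"
            yield from unpack(child, archive.read(info), parent=name,
                              depth=depth + 1, max_depth=max_depth)


def make_zip(files: dict[str, bytes]) -> bytes:
    """Build a ZIP archive in memory (used here only to create test evidence)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for filename, content in files.items():
            archive.writestr(filename, content)
    return buffer.getvalue()


if __name__ == "__main__":
    inner = make_zip({"memo.txt": b"pricing memo"})
    outer = make_zip({"email.txt": b"see attachment", "attachments.zip": inner})
    for path, parent, size in unpack("evidence.zip", outer):
        print(f"{path} (parent: {parent}, {size} bytes)")
```

The `max_depth` limit is a simple guard against maliciously deep nesting; the parent pointers are what make a "show in document family" view possible.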
24. Question | Entities or patterns to address this question
Who is it about? | PERSON, COMPANY, ORGANIZATION, EMAIL ADDRESS
What is it about? | Result of Topic Modeling and Concept Clustering
When did it happen? | DATE, TIME, MONTH, DAY, WEEK, YEAR
Where did it happen? | ADDRESS, CITY, COUNTRY, CONTINENT, DEPARTMENT and other geo-locations
Why did it happen? | Sentiments, emotions and cursing
How did it happen? | Combining entities and facts
How much/often did it happen? | Quantitative measures such as amounts, currencies, and other numbers; also frequencies and averages of entity occurrences
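Some of these entity types can be found, at least roughly, with pattern matching. The Python sketch below uses a few deliberately simple regular expressions (assumptions for illustration, not the patterns any product uses) to pull out EMAIL ADDRESS, DATE and currency-amount entities; PERSON, COMPANY and location entities normally need a trained named-entity recognizer rather than regexes.

```python
import re

# Deliberately simple, illustrative patterns; real extractors handle many more formats.
PATTERNS = {
    "EMAIL ADDRESS": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "DATE": re.compile(
        r"\b\d{4}-\d{2}-\d{2}\b"  # ISO dates such as 2017-08-24
        r"|\b\d{1,2} (?:January|February|March|April|May|June|July|August"
        r"|September|October|November|December) \d{4}\b"  # e.g. 24 August 2017
    ),
    # A currency marker followed by a number with optional thousands separators and cents.
    "AMOUNT": re.compile(r"(?:USD|EUR|\$|€)\s?\d{1,3}(?:,\d{3})*(?:\.\d{2})?"),
}


def extract_entities(text: str) -> dict[str, list[str]]:
    """Return every match per entity type, in order of appearance."""
    # All groups in the patterns are non-capturing, so findall returns whole matches.
    return {label: pattern.findall(text) for label, pattern in PATTERNS.items()}


if __name__ == "__main__":
    sample = ("On 2017-03-14 john.doe@example.com wrote to j.smith@corp.example.org: "
              "transfer $1,250,000.00 by 31 March 2017, plus EUR 40,000.")
    for label, values in extract_entities(sample).items():
        print(label, values)
```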
25. MORE DETAILED INSIGHTS
It is even more interesting to combine the W’s: for instance, look for Who is Where, or What happened When.
Who – Who
Who – Why
When – What
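Combining the W’s largely comes down to counting which extracted entities occur together. The Python sketch below assumes entity extraction has already produced, for each document, a set of (type, value) pairs (the example data is invented) and counts in how many documents each Who – Who or Who – When pair appears together.

```python
from collections import Counter
from itertools import combinations, product


def cooccurrences(docs: list[set[tuple[str, str]]], type_a: str, type_b: str) -> Counter:
    """Count how many documents mention each (value_a, value_b) pair of the given entity types."""
    counts: Counter = Counter()
    for entities in docs:
        a_values = sorted(v for t, v in entities if t == type_a)
        if type_a == type_b:
            # Same type (e.g. Who – Who): unordered pairs of distinct values.
            pairs = combinations(a_values, 2)
        else:
            # Different types (e.g. Who – When): every combination of the two.
            b_values = sorted(v for t, v in entities if t == type_b)
            pairs = product(a_values, b_values)
        counts.update(pairs)
    return counts


if __name__ == "__main__":
    docs = [
        {("PERSON", "Alice"), ("PERSON", "Bob"), ("DATE", "2017-03-14")},
        {("PERSON", "Alice"), ("PERSON", "Bob"), ("PERSON", "Carol")},
        {("PERSON", "Carol"), ("DATE", "2017-03-14")},
    ]
    print(cooccurrences(docs, "PERSON", "PERSON").most_common())  # Who – Who
    print(cooccurrences(docs, "PERSON", "DATE").most_common())    # Who – When
```

Pairs with high counts are natural starting points for a communication or timeline view.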
26. MACHINE LEARNING: THE NEW SEARCH
The era of traditional keyword and Boolean search seems to be over. Even the most brilliant query returns too many hits, and reviewing them takes too much time and too many resources.
People do not know exactly what to look for, what keywords to use or how to spell them.
The quality of traditional search is much lower than searchers think (80% perceived versus 20-40% actual quality).
Only highly skilled searchers who master all (advanced) query options are able to get close to 80%. Even then, they cannot be sure that they actually found 80% of all relevant documents. This is the other problem with measuring recall: you never know what you miss.
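"Machine learning as the new search" means letting reviewers’ coding decisions, rather than a hand-crafted query, decide which documents come up next. The Python sketch below illustrates the idea with multinomial Naive Bayes and add-one smoothing, chosen only because it fits in a few lines; it is a stand-in, not necessarily what any TAR product uses. It trains on a few documents a reviewer has coded relevant or not relevant (invented examples) and ranks the unreviewed documents by log-odds of relevance.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def train(coded: list[tuple[str, bool]]):
    """Fit multinomial Naive Bayes on (text, is_relevant) pairs coded by a reviewer."""
    counts = {True: Counter(), False: Counter()}  # word counts per class
    docs = Counter()                              # number of coded documents per class
    for text, label in coded:
        counts[label].update(tokens(text))
        docs[label] += 1
    vocab = set(counts[True]) | set(counts[False])
    return counts, docs, vocab


def log_odds(model, text: str) -> float:
    """log P(relevant | text) - log P(not relevant | text), with add-one smoothing."""
    counts, docs, vocab = model
    total_docs = docs[True] + docs[False]
    score = 0.0
    for label, sign in ((True, 1.0), (False, -1.0)):
        total = sum(counts[label].values())
        logp = math.log((docs[label] + 1) / (total_docs + 2))  # smoothed class prior
        for tok in tokens(text):
            if tok in vocab:  # words never seen in training carry no evidence
                logp += math.log((counts[label][tok] + 1) / (total + len(vocab)))
        score += sign * logp
    return score


def rank(model, unreviewed: dict[str, str]) -> list[str]:
    """Unreviewed document ids, most likely relevant first."""
    return sorted(unreviewed, key=lambda d: log_odds(model, unreviewed[d]), reverse=True)


if __name__ == "__main__":
    coded = [
        ("price fixing agreement with competitor", True),
        ("competitor pricing call notes", True),
        ("holiday party menu", False),
        ("office party planning", False),
    ]
    unreviewed = {
        "d1": "notes from call with competitor about price",
        "d2": "party menu for the office",
        "d3": "quarterly report",
    }
    print(rank(train(coded), unreviewed))
```

In an iterative workflow, reviewers code the top-ranked documents, the model is retrained on the enlarged coded set, and the ranking is refreshed.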
29. HOW CAN WE MEASURE RECALL?
Have we found all relevant information? How complete is the data we sent to the regulator? Machine learning!
During this process, several quantitative measures can be calculated, such as precision, recall, F-values and the precision of the return set. Based on these measurements, one can estimate how much of the relevant information has been found at each moment in the process.
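The measures named above are simple ratios. The Python sketch below computes precision, recall and F1 from a review's counts, and shows one common way to estimate recall when the number of missed documents is unknown: draw a random sample from the documents that were not produced (the "null set"), measure the fraction of that sample that is relevant (the elusion rate), and project it onto the whole null set. All numbers in the example are made up.

```python
from dataclasses import dataclass


@dataclass
class ReviewCounts:
    true_positives: int   # produced and relevant
    false_positives: int  # produced but not relevant
    false_negatives: int  # relevant but not produced (only known exactly in test collections)


def precision(c: ReviewCounts) -> float:
    return c.true_positives / (c.true_positives + c.false_positives)


def recall(c: ReviewCounts) -> float:
    return c.true_positives / (c.true_positives + c.false_negatives)


def f1(c: ReviewCounts) -> float:
    """Harmonic mean of precision and recall (the F-value with equal weights)."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r)


def estimated_recall(relevant_found: int, null_set_size: int,
                     sample_size: int, relevant_in_sample: int) -> float:
    """Point estimate of recall from an elusion sample of the null set.

    A defensible report would also state a confidence interval, which this sketch omits.
    """
    elusion_rate = relevant_in_sample / sample_size
    estimated_missed = elusion_rate * null_set_size
    return relevant_found / (relevant_found + estimated_missed)


if __name__ == "__main__":
    counts = ReviewCounts(true_positives=800, false_positives=200, false_negatives=200)
    print(precision(counts), recall(counts), f1(counts))
    # 15 relevant documents in a 1,500-document sample of a 100,000-document null set:
    # about 1,000 documents missed, so recall is roughly 9,000 / 10,000.
    print(estimated_recall(relevant_found=9000, null_set_size=100000,
                           sample_size=1500, relevant_in_sample=15))
```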
34. BENEFITS TO IN-HOUSE COUNSEL
ZyLAB’s Direct Collection saves a tremendous amount of time in getting data ready for early case assessment and first-pass review. Direct Collection drastically reduces the cost and risks of downloading and uploading data or shipping tapes and hard disks around.
ZyLAB’s Deep Processing allows you to automatically reduce your data volumes before you send them on for review, without getting in trouble or being accused of data spoliation. Only if every component of the data is searchable can one use automated tools to reduce data.
Using ZyLAB’s Review Accelerators (TAR, batch tagging, sampling, redaction, email trails, …), you can minimize the most expensive and time-consuming part of the eDiscovery process.
Litigants use ZyLAB’s Early Case Assessment to quickly understand the facts and merits of a case, identify key custodians and recognize critical information so they can develop an effective and realistic litigation strategy.
35. BENEFITS TO LAW FIRMS
ZyLAB covers multiple eDiscovery use cases. One platform: more cases, more volume, better pricing.
No need to involve any third parties.
Bill the hours for project management and data science (machine learning) as well.
DIY: upload data, start reviewing with your team almost immediately, and bill the hours.
Find out what really happened with ZyLAB’s deep search and analytics.
Expand the review team.
Replace the bottom of the traditional earnings pyramid with “review robots”: make more margin.
Be more competitive.
Do more work with your current team: never have to pass on new opportunities because of capacity problems.
Less risk of errors and of missing key issues, and therefore less risk of liability claims and higher insurance premiums.
37. “ZYLAB TAKES CARE OF THE PROCESS, SUPPORTS THE LAWYER BY
THINKING COMMERCIALLY AND PROVIDES COMFORT WITH THE
USE OF ADVANCED TECHNOLOGY”
Ruben Elkerbout, anti-trust lawyer and partner with Stek Lawyers