TECHNOLOGY FOR EDD
By Sara Emami
WHAT IS KEYWORD
When we use the term “keyword search,” we
are talking about a basic search technique that
involves searching for one or more words within a
collection of documents.
Typically, a keyword search involves a user typing
their search request, or query, into a search engine
such as Google, which then returns only those
documents that contain the search terms entered.
The documents returned by the search engine are
called the search results.
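To make the idea concrete, here is a minimal sketch of a keyword search over a tiny document collection. The function name `keyword_search` and the sample documents are hypothetical and exist only for illustration; real search engines use indexes and ranking rather than a simple scan like this.

```python
import re

def keyword_search(documents, query):
    """Return the ids of documents that contain every word in the query."""
    terms = query.lower().split()
    results = []
    for doc_id, text in documents.items():
        # Break the text into lowercase words so matching ignores case and punctuation.
        words = set(re.findall(r"[a-z0-9']+", text.lower()))
        if all(term in words for term in terms):
            results.append(doc_id)
    return results

# Hypothetical sample collection.
documents = {
    "doc1": "Brake inspection completed on the Honda Civic.",
    "doc2": "Invoice for an oil change and tire rotation.",
    "doc3": "Customer reported the brake pedal felt soft.",
}

search_results = keyword_search(documents, "brake")
print(search_results)
```

Only the documents that contain the search term come back as search results; the oil change invoice is left out.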
KEYWORD SEARCH AND
Keyword searching in the EDRM (Electronic
Discovery Reference Model) can utilize an array of
techniques through a variety of data. Oftentimes,
data are searched within the documents of a
specific case, but even there the documents can take
Understanding the array of forms will benefit not only
the EDD consultant but also their client in choosing
the best approach to pursue their case.
KEY WORD SEARCHES AND
Computer files (known as Electronically Stored Information,
or ESI), including files such as documents created with
Microsoft Word or PowerPoint, email stored as individual
message files or together in an Outlook or Notes data file,
OCR (Optical Character Recognition) files created from
scanned paper documents, or even more exotic files such as
those created by a CAD/CAM program, call for
computer systems to store and manage data in important
KEYWORD SEARCHING AND
WHY IT IS SIGNIFICANT IN EDISCOVERY
Search tools and methodologies are significant because they
have numerous applications during the e-discovery phase of the
litigation lifecycle and yield results that help clients find the
information relevant to their case.
Let us take a real-life example of the processes and challenges
related to using search and how these challenges can be
mitigated. Our example includes an automobile accident and a
maintenance shop or garage which should have documented a
failed brake system, but may have been incompetent.
Let us say that Attorney John Doe is working on a new case
involving a car accident. The plaintiff is claiming that his local
garage failed to spot a failing brake system in his
2004 Honda Civic. As a result, the failing brakes caused not only
a major car accident but also property
damage and bodily injury.
Attorney Jacob Bacon, who is representing the defendant
garage, has a database containing thousands of documents,
including email to and from the plaintiff and the defendant,
email from a mailing list for Honda enthusiasts that both
plaintiff and defendant participated in, and OCR’d documents
including maintenance records and receipts from the garage.
EXAMPLE 1 (continued)
This time Attorney John Doe runs a concept search using the
keywords Honda Civic, brakes, accident, and maintenance. As
John Doe scrolls through the results he doesn’t see anything new,
until he sees the word “stoppies”, which he is unfamiliar with. A
little digging in the result set of documents lets him discover that
“stoppies” is a behavior similar to wheelies that can result in
The documents containing this word reveal that the plaintiff
frequently engaged in this dangerous behavior. Attorney Doe
now has the ammunition he needs to win his case, using a
concept he did not know existed in advance. What exactly is
concept searching? Read on to find out.
We have discussed the notion of keyword searching, but
based on our recent example of the failed brake system
involving the Honda, let us examine what concept or
“conceptual” searching is.
Concept search is an automated method used to search
electronically stored, unstructured text for
information based on “ideas” or “concepts”. As we saw
in our previous example of the automobile accident, the
term “stoppies” was a concept or idea related to the
brake failure. The information retrieved in response to a
concept query should be relevant to the ideas contained in
the text of the query.
CONCEPT SEARCHING Example
Let us say that you are hired by Oil/Gas Company X, which is
in the midst of a lawsuit brought by a terminated employee
who is suing Oil/Gas Company X for wrongful
termination. Now, if we want to perform a search on
the word “termination”, what other concept
words or ideas related to “termination” can you
Here are some random words that might be found in e-mails
related to termination: canned, let go, hosed, fired,
gatorated, sunset and beaches, retired, vacation, etc. With
their advanced capabilities, concept search technologies
can assess trends, evaluate patterns, and produce
results that can help lawyers
and corporations with their litigation.
CONTEMPORARY EXAMPLE (CAN
YOU SPELL ENRON?)
We all may recall the Enron and WorldCom debacle which
highlighted corporate greed and was quite the scandal of the
early 2000s. How would concept searching help incriminate
the big bad wolves?
Let us take an example of a “code” word that Enron used to
hide its activities and prevent authorities or legal entities from
finding its hidden crime.
The term “Rawhide” was found in several of the Enron emails.
“Rawhide” could mean a kind of leather or an old TV show, but
in the context of the Enron emails, “Rawhide” actually refers to
one of its off-books partnerships.
“Raptor” was another of those problematic partnerships. So a
Concept Search query in the Enron emails for “Raptor” would
not net you documents about hawks, but rather about
“Rawhide” and other off-books partnerships, even if the words
“Raptor” and “Rawhide” did not actually appear in any
particular document itself.
BENEFITS OF CONCEPT
Increased likelihood of finding a larger number of
Less time spent perusing irrelevant documents
Less time spent trying to come up with the right
Reduced time, cost and effort overall in
retrieving the best documents in reply to the
concept of your query in the context of the entire
Let us say that we are working with a major oil/gas company (Oil
Company X) and that Oil Company X hires us as a vendor
to assist them with a lawsuit against Oil Company Y. Their
lawsuit concerns intellectual property theft in the year 2009,
and Oil Company X argues that there are certain words or
phrases that would incriminate Oil Company Y. How would we
be able to assist our client in the most cost-efficient and
time-efficient fashion? Keyword searching allows vendors to zoom
in on the collected data and find the relevant documents
that would assist the client with their lawsuit
in the most meaningful fashion.
Understanding the reasoning behind keyword searching allows
us to help our clients.
WHAT QUALIFIES AS
Keyword searches are most often used to identify
documents that are either responsive or privileged. They
are also widely used for large-scale culling and filtering
of documents. Keywords often form a basic building
block for constructing other more complex
compound searches. Such compound searches use
other search elements such as Boolean logic.
The syntax in the search string;
Use of the keywords with or without stemming;
Use of keywords with certain wildcard specifications and the syntax
for said wildcards;
Case-sensitivity of keywords used in searches and whether the
keyword should match both cases;
The target data sources to be searched;
Whether the query can be applied to any specific fields such as email
‘To/From’ or ‘Subject’; and
Whether the query can be applied to any specific date range such as
an email ‘Sent Date’ between January 1, 2001
and December 31, 2001.
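Several of the parameters listed above (target field, date range, case-sensitivity) can be sketched as arguments to a single search function. The `Email` record, the `search_emails` function, and the sample messages below are hypothetical names and data chosen for illustration, and the matching here is a plain substring check rather than any particular product's syntax.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Email:
    sender: str
    subject: str
    sent_date: date
    body: str

def search_emails(emails, keyword, field="body", start=None, end=None,
                  case_sensitive=False):
    """Return emails whose chosen field contains the keyword within the date range."""
    hits = []
    for msg in emails:
        # Apply the optional 'Sent Date' range first.
        if start is not None and msg.sent_date < start:
            continue
        if end is not None and msg.sent_date > end:
            continue
        text, needle = getattr(msg, field), keyword
        if not case_sensitive:
            text, needle = text.lower(), needle.lower()
        if needle in text:  # simple substring match
            hits.append(msg)
    return hits

# Hypothetical sample messages.
emails = [
    Email("plaintiff@example.com", "Brake noise", date(2001, 3, 5),
          "The brakes squeal when I stop."),
    Email("garage@example.com", "Invoice", date(2001, 6, 12),
          "Oil change completed."),
    Email("plaintiff@example.com", "Brakes again", date(2002, 1, 9),
          "The brakes failed yesterday."),
]

hits = search_emails(emails, "brake", field="subject",
                     start=date(2001, 1, 1), end=date(2001, 12, 31))
print([msg.subject for msg in hits])
```

The same keyword returns different results depending on the field, the date range, and the case-sensitivity setting, which is why these parameters need to be agreed on up front.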
Boolean searches are used to combine the results of multiple
searches as well as to designate ambiguity, as when you search for
two or more terms but do not necessarily need both.
Imagine you are at your local university library and want to
perform a search in one of the library databases which houses
many of the scholarly journals. You encounter a database form
which asks you to enter the
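One way to picture Boolean searching is as set operations on the results of single-keyword searches: AND is an intersection, OR is a union, and NOT is a difference. The sketch below assumes a hypothetical helper `docs_with` and three made-up documents; it is not the syntax of any specific search tool.

```python
def docs_with(term, documents):
    """Return the set of document ids whose text contains the term as a word."""
    return {doc_id for doc_id, text in documents.items()
            if term.lower() in text.lower().split()}

# Hypothetical sample collection (no punctuation, to keep word splitting simple).
documents = {
    "a": "brake failure reported after accident",
    "b": "brake pads replaced during maintenance",
    "c": "accident report filed with insurer",
}

brake = docs_with("brake", documents)
accident = docs_with("accident", documents)
maintenance = docs_with("maintenance", documents)

both = brake & accident           # brake AND accident
either = brake | accident         # brake OR accident
excluded = brake - maintenance    # brake NOT maintenance

print(sorted(both), sorted(either), sorted(excluded))
```

The OR search covers the case where you are looking for two terms but do not necessarily need both.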
A wildcard is a character that may be used in a search term to
represent one or more other characters. It also allows you to find
a set of words that fit a pattern, such as the different
forms of a word. The two most commonly used wildcards are:
1) The question mark (“?”) may be used to represent a single
character in a search expression. For example, searching for
“ho?se” would yield results which contain such words as
“house” and “horse”.
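The question mark wildcard can be tried out with Python's standard `fnmatch` module, where “?” stands for exactly one character. The sketch also shows the asterisk (“*”), which in `fnmatch` stands for zero or more characters; treating it this way is an assumption about how a given search tool defines it. The helper `wildcard_match` and the word list are hypothetical.

```python
from fnmatch import fnmatchcase

def wildcard_match(pattern, words):
    """Return the words that match a wildcard pattern, ignoring case."""
    return [w for w in words if fnmatchcase(w.lower(), pattern.lower())]

# Hypothetical word list.
words = ["house", "horse", "hose", "hoarse", "sing", "singing", "sang"]

single_char = wildcard_match("ho?se", words)   # '?' is exactly one character
any_suffix = wildcard_match("sing*", words)    # '*' is zero or more characters

print(single_char)
print(any_suffix)
```

Note that “hose” and “hoarse” are not returned for “ho?se”, because the question mark stands for one character, no more and no fewer.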
Fuzzy search allows searching for word variations
such as in the case of misspellings.
Typically, such searching includes some form of
distance and score computations between the
specified word and the words in the corpus.
Fuzzy search is specified using the operator: fuzzysearch.
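As a rough illustration of the “distance and score” idea, the sketch below computes the edit distance (the number of single-character insertions, deletions, or substitutions) between the search word and each word in a corpus, and keeps the words within a chosen threshold. Edit distance is one common choice, not necessarily the one a given tool uses; the function names and the word list are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute
        prev = curr
    return prev[-1]

def fuzzy_search(term, corpus_words, max_distance=1):
    """Return (word, distance) pairs within max_distance, closest first."""
    scored = [(w, edit_distance(term.lower(), w.lower())) for w in corpus_words]
    return sorted((pair for pair in scored if pair[1] <= max_distance),
                  key=lambda pair: pair[1])

# Hypothetical corpus words, including a misspelling ("brke").
corpus_words = ["brake", "brakes", "brke", "break", "maintenance"]
matches = fuzzy_search("brake", corpus_words, max_distance=1)
print(matches)
```

Raising `max_distance` catches more misspellings but also brings in more unrelated words, so the threshold is a trade-off.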
Synonym search matches words that are determined
to be synonyms of the word being searched. Such
searching includes some form of dictionary or
thesaurus-based lookup (e.g., synonyms of party include
gathering, get-together, festivity, etc.).
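A thesaurus-based lookup can be sketched as expanding the search word into its synonyms before searching. The tiny `THESAURUS` below holds only the “party” example from this slide, and the function names and sample messages are hypothetical; a real tool would rely on a much larger dictionary.

```python
# Toy thesaurus containing only the example from the slide.
THESAURUS = {"party": ["gathering", "get-together", "festivity"]}

def expand_with_synonyms(term, thesaurus=THESAURUS):
    """Return the term followed by any synonyms listed for it."""
    return [term] + thesaurus.get(term.lower(), [])

def synonym_search(term, documents):
    """Return ids of documents containing the term or one of its synonyms."""
    variants = [v.lower() for v in expand_with_synonyms(term)]
    return [doc_id for doc_id, text in documents.items()
            if any(v in text.lower() for v in variants)]

# Hypothetical sample messages.
documents = {
    "m1": "The office gathering is on Friday.",
    "m2": "Quarterly budget attached.",
    "m3": "Thanks for hosting the get-together!",
}

found = synonym_search("party", documents)
print(found)
```

Neither matching message contains the word “party” itself, yet both are returned.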
A proximity search looks for documents where two or more
separately matching term occurrences are within a specified
distance, where distance is the number of intermediate words
In addition to proximity, some implementations may also
impose a constraint on the word order, in that the order in the
searched text must be identical to the order of the search
query. Proximity searching goes beyond the simple matching
of words by adding the constraint of proximity and is generally
regarded as a form of advanced search.
For example, a search could be used to find "red brick house",
and match phrases such as "red house of brick" or "house
made of red brick". By limiting the proximity, these phrases
can be matched while avoiding documents where the words
are scattered or spread across a page or in unrelated articles
in an anthology.
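The “red brick house” example can be sketched by recording the word position of each search term and checking whether all the terms fall close together. In this sketch, closeness is measured as the number of word positions between the first and last matched term, which is an assumption; tools differ in how they count distance, and this version does not enforce word order. The function name `within_proximity` is hypothetical.

```python
import re
from itertools import product

def within_proximity(text, terms, max_span):
    """True if every term occurs and some set of occurrences spans <= max_span words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    # One list of word positions per search term.
    positions = [[i for i, w in enumerate(words) if w == t.lower()] for t in terms]
    if any(not p for p in positions):
        return False  # at least one term is missing entirely
    return any(max(combo) - min(combo) <= max_span
               for combo in product(*positions))

terms = ["red", "brick", "house"]
close_1 = within_proximity("red house of brick", terms, max_span=4)
close_2 = within_proximity("house made of red brick", terms, max_span=4)
scattered = within_proximity(
    "The red car passed. Many pages later, a brick wall stood near the old house.",
    terms, max_span=4)

print(close_1, close_2, scattered)
```

The two phrases match because the three words sit within a few positions of each other, while the last text contains all three words but spread too far apart.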
Truncation specification is one way to match word variations.
Truncation allows for the final few characters to be left
Stemming specification is another method for matching word
variations. Stemming is the process of finding the root form of a
The stemming specification will match all morphological inflections
of the word, so that if you enter the search term sing, the
stemming matches would include singing, sang, and song. Note
that even though a stemming search will return singing for a
search term of sing, this is different from wildcard search. A
wildcard search for sing* will not return sang or song, while it will
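The difference between stemming and wildcard matching can be seen side by side in a small sketch. The `INFLECTIONS` table is a hand-written stand-in for a real stemming dictionary and holds only the forms of “sing” named above; the function `stem_match` and the word list are hypothetical. The wildcard side uses Python's standard `fnmatch` module.

```python
from fnmatch import fnmatchcase

# Toy stand-in for a stemming dictionary: root form -> known inflections.
INFLECTIONS = {"sing": {"sing", "singing", "sang", "song"}}

def stem_match(term, words, inflections=INFLECTIONS):
    """Return the words listed as forms of the search term."""
    forms = inflections.get(term.lower(), {term.lower()})
    return [w for w in words if w.lower() in forms]

# Hypothetical corpus words.
words = ["singing", "sang", "song", "single"]

stemmed = stem_match("sing", words)
wildcard = [w for w in words if fnmatchcase(w, "sing*")]

print(stemmed)
print(wildcard)
```

The stemming match finds the related forms that do not begin with “sing”, while the wildcard match misses them and also picks up the unrelated word “single”.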
WHAT IS METADATA AND
WHY IS IT IMPORTANT?
Software programs embed various categories of
metadata in the documents users create.
Metadata is significant because it describes how, when,
and by whom an electronic document was created,
modified, and transmitted.
Unlike paper documents, electronic documents are
unique because they carry their history with them.
Paper is boring and pertains to dinosaurs as it merely
shows us what a document said or looked like.
Electronic tells where the document went and what it
METADATA AND EMAILS
An e-mail carries information about its author,
creation date, attachments, and the identities of all recipients,
including who was CC’ed or BCC’ed.
Metadata also connects attachments to e-mails.
Information embedded in other file types may
include document names, authors, number of times
printed, etc. Track changes reflects modifications by
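To see the kind of information an e-mail carries, the sketch below reads the header fields of a small made-up message with Python's standard `email` package. The message text and addresses are invented for illustration; a real review platform extracts far more fields than the few shown here.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

# Made-up message used only for illustration.
RAW = """From: Alice Example <alice@example.com>
To: Bob Example <bob@example.com>
Cc: carol@example.com
Date: Mon, 05 Mar 2001 09:30:00 -0600
Subject: Brake inspection

Please see the notes.
"""

msg = message_from_string(RAW)

author = msg["From"]
# Collect the addresses from both the To and Cc headers.
recipients = [addr for _, addr in
              getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))]
sent = parsedate_to_datetime(msg["Date"])

print(author)
print(recipients)
print(sent.isoformat())
```

None of these values appear in the body of the message, which is why printing or imaging an e-mail can lose them.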
Some methods of document review fail to account for and
preserve metadata. If a document is printed in the review
or production process, its metadata is lost.
Many lawyers believe they are conducting EDD when in
fact they are working with electronic images of
documents. The process of scanning and coding
documents into a database does not capture original
Understand the difference between document metadata
versus file system metadata.
When we think of file system metadata, think ‘file timestamps’.
While ‘file metadata’ and ‘timestamps’ are often used
interchangeably, they mean two completely different things.
There are two separate sets of timestamps for office documents and
several other file types. The first set is stored by the operating
system (Windows, Linux, macOS) and is different from the set
stored in the file.
The metadata stored in a file (Date Created, Date Last Saved,
etc.) may also be referred to as the file’s timestamps and
is easily confused with what is stored by the operating system.
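The separation between the two sets of timestamps can be shown with a small experiment using Python's standard `os` and `tempfile` modules. The text line written into the file merely stands in for metadata embedded in a document (real office formats store it in their own internal structures), and the file name and helper function are hypothetical.

```python
import os
import tempfile
from datetime import datetime, timezone

def file_system_timestamps(path):
    """Read the timestamps the operating system keeps for a file."""
    info = os.stat(path)
    return {
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
        "accessed": datetime.fromtimestamp(info.st_atime, tz=timezone.utc),
    }

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "memo.txt")
    with open(path, "w") as handle:
        # Stand-in for a date stored inside the document itself.
        handle.write("Date Created: 2001-03-05\n")

    # Change only the file system timestamps (set both to the Unix epoch).
    os.utime(path, (0, 0))
    stamps = file_system_timestamps(path)

    with open(path) as handle:
        embedded = handle.read().strip()

print(stamps["modified"].isoformat())
print(embedded)
```

The operating system's timestamp changed while the date recorded inside the file stayed the same, which is why the two should be collected and interpreted separately.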