Deep Dive: Advanced Search Technologies


Even with recent advances in predictive coding, tried-and-true search tactics such as keyword searching, concept searching, topic grouping, near de-duplication, and email threading will continue to play an important role in ediscovery filtering, review, and production across the Electronic Discovery Reference Model (EDRM).




Usage Rights

© All Rights Reserved

    Deep Dive: Advanced Search Technologies: Presentation Transcript

    • Discussion Overview
      • Case Law and Industry Guidance: The Role of Searching in Ediscovery
      • Back to the Basics: Keyword Searching Tips
      • Deep Dive: Advanced Searching Technologies
    • Judicial Viewpoints on Keyword Searching
      • Court required parties to “confer on the development of reasonable search terms” instead of compelling production without a list of proposed search terms provided by the requesting party
      • “Common practice governing the discovery of [ESI] requires the use of search terms . . . If the producing party generates the search terms on its own, the inevitable result will be complaints that the search terms were inadequate”
      • EEOC v. McCormick & Schmick’s Seafood Restaurants, Inc., 2012 WL 380048 (D. Md. Feb. 3, 2012).
    • Analyzing Search Methods
      • Keyword searching plays an important role in winnowing document sets for discovery
      • Objective of search: high recall and precision
        » Recall: fraction of all relevant documents that are found during review
        » Precision: fraction of identified documents that actually are relevant
      • In the slide's illustration, fruit is relevant; broccoli is not.
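The two measures above can be made concrete with a small calculation. This is a minimal sketch that borrows the slide's fruit-and-broccoli framing; the function name `recall_precision` and the sample item names are assumptions made for illustration, not part of the presentation.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for a set of retrieved items.

    retrieved: items the search returned
    relevant:  items that are actually relevant (ground truth)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_hits = retrieved & relevant
    # Recall: share of the relevant items that the search found
    recall = len(true_hits) / len(relevant) if relevant else 0.0
    # Precision: share of the returned items that are relevant
    precision = len(true_hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical collection: four relevant "fruit" documents exist.
relevant = {"apple", "banana", "cherry", "grape"}
# The search returned two fruits and one irrelevant "broccoli" hit.
retrieved = {"apple", "banana", "broccoli"}

recall, precision = recall_precision(retrieved, relevant)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

Here the search finds two of the four relevant items (recall 0.50) and two of its three hits are relevant (precision about 0.67), which shows how a search can miss documents and return false hits at the same time.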
    • Designing Effective Keyword Searches
      1. Understand your search engine
        » Learn how each operator works (OR, AND, PROXIMITY, etc.)
        » Be aware of operator precedence (Boolean or left-to-right) and use parentheses to clarify
        » Work with ediscovery provider to create an alternative strategy for lengthy searches that may “time out”
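The point about operator precedence can be illustrated with one query evaluated two ways. This is a sketch only: the sample document, the query, and the helper `has` are hypothetical, and a real search engine's rules should be confirmed in its own documentation.

```python
# Terms present in one hypothetical document
doc_terms = {"merger", "memo"}


def has(term):
    """True if the document contains the term."""
    return term in doc_terms


# Query as typed: merger OR acquisition AND confidential

# Boolean precedence: AND binds tighter than OR
boolean_precedence = has("merger") or (has("acquisition") and has("confidential"))

# Strict left-to-right evaluation: OR is applied first
left_to_right = (has("merger") or has("acquisition")) and has("confidential")

print(boolean_precedence)  # the document is a hit under Boolean precedence
print(left_to_right)       # the same document is not a hit left-to-right
```

The same document is a hit under one reading and a miss under the other, which is why explicit parentheses are the safer way to write the query.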
    • Designing Effective Keyword Searches
      2. Develop a search strategy
        » Run broad searches for date-range culling, etc. then use results as scope for sub-level searches
        » Save searches and search results for future use and reference
        » Find on-point documents and use “similar” documents and concepts to provide additional key terms
        » Know your universe (foreign language requires foreign keywords!)
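The "broad search first, then sub-level searches within the results" strategy can be sketched as two small filters. The document records, field names (`sent`, `text`), and function names below are all hypothetical and stand in for whatever a review platform actually exposes.

```python
from datetime import date

# Hypothetical mini-collection
docs = [
    {"id": 1, "sent": date(2011, 3, 14), "text": "Draft merger agreement attached"},
    {"id": 2, "sent": date(2009, 7, 2), "text": "Old merger rumor"},
    {"id": 3, "sent": date(2011, 5, 9), "text": "Team lunch on Friday"},
]


def cull_by_date(documents, start, end):
    """Broad first pass: keep documents sent within [start, end]."""
    return [d for d in documents if start <= d["sent"] <= end]


def keyword_search(documents, term):
    """Sub-level search: case-insensitive substring match on the text."""
    term = term.lower()
    return [d for d in documents if term in d["text"].lower()]


# Step 1: broad date-range cull defines the scope
scope = cull_by_date(docs, date(2011, 1, 1), date(2011, 12, 31))
# Step 2: keyword search runs only inside that scope
hits = keyword_search(scope, "merger")
print([d["id"] for d in hits])
```

Keeping `scope` as a saved intermediate result mirrors the advice to save searches and results for later reuse.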
    • Designing Effective Keyword Searches
      3. Build smart keyword lists
      • Use a text editor to reduce errors
        » Programs that format text can cause difficulty
        » Use a program like Notepad and place each term on a separate line
        » Spell check
        » Be aware of commonly misspelled keywords or privilege terms
      • Understand the impact of your key terms
        » Be flexible: account for word/phrase permutations – use a “Data Dictionary”
        » Over-inclusive? Under-inclusive?
        » “Noise words” increase likelihood of false hits
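Some of the list-hygiene advice above can be automated before a keyword list is handed to a search engine. The sketch below assumes one term per line, replaces the "smart" punctuation that word processors insert, removes duplicates, and sets aside noise words for a human to review. The function name and the tiny `NOISE_WORDS` set are illustrative assumptions; real noise-word lists vary by search engine.

```python
# Illustrative only: actual noise-word lists depend on the search engine.
NOISE_WORDS = {"the", "and", "of", "a", "to"}

# Formatting characters that word processors substitute for plain ones
SMART_PUNCTUATION = {
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u2013": "-", "\u2014": "-",   # en and em dashes
}


def clean_keyword_list(raw_text):
    """Return (terms, flagged) from text holding one term per line."""
    terms, flagged, seen = [], [], set()
    for line in raw_text.splitlines():
        for bad, good in SMART_PUNCTUATION.items():
            line = line.replace(bad, good)
        term = " ".join(line.split())      # trim and collapse whitespace
        if not term:
            continue                       # skip blank lines
        key = term.lower()
        if key in seen:
            continue                       # skip case-insensitive duplicates
        seen.add(key)
        if key in NOISE_WORDS:
            flagged.append(term)           # likely source of false hits
        else:
            terms.append(term)
    return terms, flagged


raw = "\u201cattorney client\u201d\nprivileged\n  Privileged \n\nthe\n"
terms, flagged = clean_keyword_list(raw)
print(terms, flagged)
```

Spell checking and permutation handling are deliberately left out; those still need a reviewer or a data dictionary.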
    • Advanced Searching Technologies
      What are some “new and evolving” search methods?
      • Will cover in this presentation:
        1. Concept Searching
        2. Topic Grouping
        3. Language Identification
        4. Email Threading
        5. Near De-Duplication
        6. Sampling
      • Will not cover in this presentation (hot, evolving topic!):
        ** Technology-assisted Review
    • 1. Keyword Searching vs. Concept Searching
      • Keyword Searching
        » Uses search terms to retrieve documents that contain those exact terms
        » Standard practice; generally accepted in the courts
      • Concept Searching
        » Allows reviewers to find documents with similar conceptual terms even if they do not contain the exact search terms
        » Seldom used for filtering; increasingly used for review
        » Emerging as a technology alternative
    • 2. Topic Grouping
      • Documents automatically grouped by theme without human input
      • Topic grouping will group similar documents and label them for quick identification
      • Users do not need to “seed” the processing engine by providing keywords
    • 3. Language Identification
      • This technology can identify all languages in a document as well as the primary language and pass this information along via a metadata field
      • A legal team needs to know what languages are in a collection, and the volume of foreign language documents
      • Reports can help determine whether to use machine translations, foreign language reviewers, or a combination
    • 4. Email Threading
      • Identifies and groups for review e-mail conversations based on content
      • Using actual content of the e-mails to identify e-mail threads is the most reliable method, as it will not fail to recognize a thread if the subject line changes or if e-mails are exchanged across different e-mail applications
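To show why content-based threading survives a changed subject line, here is a deliberately simplified sketch: two messages are linked when the normalized body of one appears inside the other (as quoted reply text). The function names, the `min_len` cutoff, and the sample messages are assumptions for illustration; commercial threading engines use more robust techniques than substring matching.

```python
import re


def normalize(body):
    """Strip quote markers ('>') and collapse case and whitespace."""
    lines = [re.sub(r"^[>\s]+", "", ln) for ln in body.splitlines()]
    return " ".join(" ".join(lines).lower().split())


def thread_by_content(emails, min_len=20):
    """Group message ids whose bodies contain one another.

    emails: dict mapping message id -> body text (subject lines are ignored)
    Returns a list of sets of message ids, one set per thread.
    """
    norm = {eid: normalize(body) for eid, body in emails.items()}
    parent = {eid: eid for eid in emails}

    def find(x):
        # Union-find root lookup with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    ids = list(emails)
    for a in ids:
        for b in ids:
            # Very short bodies are skipped to avoid accidental matches
            if a != b and len(norm[a]) >= min_len and norm[a] in norm[b]:
                parent[find(a)] = find(b)

    groups = {}
    for eid in ids:
        groups.setdefault(find(eid), set()).add(eid)
    return list(groups.values())


emails = {
    "m1": "Please send the Q3 contract draft by Friday.",
    "m2": "Will do.\n> Please send the Q3 contract draft by Friday.",
    "m3": "Lunch on Tuesday at the usual place?",
}
threads = thread_by_content(emails)
print(sorted(sorted(t) for t in threads))
```

Because only body text is compared, `m1` and `m2` end up in one thread regardless of their subject lines or the mail client that sent them, while `m3` stays on its own.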
    • 5. Near De-Duplication
      • Reviewers can quickly identify and compare documents that are very similar to one another but are not exact duplicates
      • Technology assesses the document set’s similarities, identifying the most uniquely representative documents as “the core”
        » All related documents are then grouped around the core
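One common way to measure "very similar but not identical" is to compare overlapping word sequences (shingles) with Jaccard similarity. The sketch below uses that measure and a greedy simplification in which the first document seen in each group acts as the core; the slide describes selecting the most representative document, which this sketch does not attempt. The function names, the 0.5 threshold, and the sample texts are assumptions.

```python
def shingles(text, k=3):
    """Set of k-word sequences from the lowercased text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_near_duplicates(docs, threshold=0.5):
    """Greedy grouping: each document joins the first core it resembles.

    docs: dict mapping document id -> text
    Returns a dict mapping core id -> list of member ids (core included).
    """
    cores = []    # (core_id, core_shingles) in order of discovery
    groups = {}
    for doc_id, text in docs.items():
        sh = shingles(text)
        for core_id, core_sh in cores:
            if jaccard(sh, core_sh) >= threshold:
                groups[core_id].append(doc_id)
                break
        else:
            # No similar core found: this document starts a new group
            cores.append((doc_id, sh))
            groups[doc_id] = [doc_id]
    return groups


docs = {
    "d1": "the quarterly report shows revenue growth in all regions",
    "d2": "the quarterly report shows revenue growth in most regions",
    "d3": "lunch menu for the office party next friday",
}
groups = group_near_duplicates(docs)
print(groups)
```

The two report drafts differ by one word and share five of nine distinct shingles, so they group together, while the unrelated document forms its own group.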
    • 6. Sampling: Defensibility & Quality Control
      • Sampling is the practice of looking at a certain % of documents in a data set or particular folder of data
        » Strengthens the defensibility of the process
        » Helps validate what you have (and equally important, do not have) in your production set
        » May take place iteratively throughout the review process or prior to production
          – During ongoing quality control
          – At the end to assess completeness of review
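Choosing how many documents to sample, and drawing them reproducibly, can be sketched as follows. The presentation does not prescribe a formula; this example assumes the standard sample-size calculation for estimating a proportion (95% confidence, 5% margin of error, worst-case p = 0.5) with a finite population correction. The function names and the fixed seed are illustrative.

```python
import math
import random


def sample_size(population, margin=0.05, z=1.96, p=0.5):
    """Documents to review for a given margin of error.

    Assumes simple random sampling; z=1.96 corresponds to 95% confidence
    and p=0.5 is the most conservative guess at the true proportion.
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2      # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)           # finite population correction
    return min(population, math.ceil(n))


def draw_sample(doc_ids, n, seed=0):
    """Draw n distinct ids; a recorded seed makes the draw repeatable."""
    return random.Random(seed).sample(list(doc_ids), n)


population = 10_000
n = sample_size(population)
sample = draw_sample(range(population), n, seed=42)
print(n, len(sample))
```

Recording the seed and the parameters alongside the sample supports the defensibility goal above, since the same draw can be reproduced later if the process is questioned.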