Deep Dive: Advanced Search Technologies



Even with recent advancements in predictive coding, tried-and-true search tactics such as keyword searching, concept searching, topic grouping, near de-duplication, and email threading will continue to play an important role in ediscovery filtering, review, and production across the Electronic Discovery Reference Model (EDRM).



  2. Discussion Overview
     • Case Law and Industry Guidance: The Role of Searching in Ediscovery
     • Back to the Basics: Keyword Searching Tips
     • Deep Dive: Advanced Searching Technologies
  3. Judicial Viewpoints on Keyword Searching
     • The court declined to compel production without a list of proposed search terms from the requesting party and instead required the parties to “confer on the development of reasonable search terms.”
     • “Common practice governing the discovery of [ESI] requires the use of search terms . . . If the producing party generates the search terms on its own, the inevitable result will be complaints that the search terms were inadequate.”
     EEOC v. McCormick & Schmick’s Seafood Restaurants, Inc., 2012 WL 380048 (D. Md. Feb. 3, 2012).
  4. Analyzing Search Methods
     • Keyword searching plays an important role in winnowing document sets for discovery.
     • Objective of search: high recall and high precision
       » Recall: the fraction of all relevant documents in the collection that the search finds
       » Precision: the fraction of documents identified by the search that are actually relevant
     • [Slide graphic: fruit represents relevant documents; broccoli represents non-relevant documents.]
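The recall and precision definitions reduce to two ratios. This minimal sketch computes both for a hypothetical ten-document set in which documents 1 to 4 play the role of the slide's fruit (relevant) and documents 5 to 10 the broccoli; the function name and data are illustrative only.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for a search.

    retrieved: IDs the search returned (hypothetical input)
    relevant:  IDs a reviewer judged relevant (the "fruit")
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_hits = retrieved & relevant  # relevant documents the search actually found
    recall = len(true_hits) / len(relevant) if relevant else 0.0
    precision = len(true_hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical collection: documents 1-4 are fruit (relevant), 5-10 are broccoli.
relevant_docs = {1, 2, 3, 4}
search_hits = {1, 2, 3, 5, 6}  # finds 3 of the 4 fruit, plus 2 broccoli
recall, precision = recall_precision(search_hits, relevant_docs)
print(f"recall={recall:.2f} precision={precision:.2f}")  # recall=0.75 precision=0.60
```

Broadening the search usually raises recall at the cost of precision, and narrowing it does the reverse; the two numbers are worth tracking together.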
  5. Designing Effective Keyword Searches
     1. Understand your search engine
       » Learn how each operator works (OR, AND, PROXIMITY, etc.)
       » Be aware of operator precedence (Boolean or left-to-right) and use parentheses to make the intended grouping explicit
       » Work with your ediscovery provider to develop an alternative strategy for lengthy searches that may “time out”
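To see why precedence matters, the hypothetical `evaluate` function below scores the same unparenthesized query under both conventions the slide mentions. It is a deliberately simplified sketch (single terms, AND/OR only, no parentheses or proximity operators) and does not reproduce any particular review platform's query parser.

```python
def evaluate(query, doc_terms, boolean_precedence=True):
    """Evaluate a flat query such as "contract OR lease AND signed".

    Hypothetical, simplified evaluator: single terms joined by AND/OR only.
    doc_terms is a set of lowercase words found in the document.
    """
    tokens = query.split()
    terms, ops = tokens[0::2], tokens[1::2]
    hits = [term.lower() in doc_terms for term in terms]

    if boolean_precedence:
        # AND binds tighter than OR: evaluate each run of ANDed terms, then OR the runs.
        result, run = False, hits[0]
        for op, hit in zip(ops, hits[1:]):
            if op == "AND":
                run = run and hit
            else:  # "OR" closes the current AND-run
                result = result or run
                run = hit
        return result or run

    # Left-to-right: apply each operator in the order it appears.
    result = hits[0]
    for op, hit in zip(ops, hits[1:]):
        result = (result and hit) if op == "AND" else (result or hit)
    return result


doc = {"unsigned", "contract", "draft"}
query = "contract OR lease AND signed"
print(evaluate(query, doc, boolean_precedence=True))   # True:  contract OR (lease AND signed)
print(evaluate(query, doc, boolean_precedence=False))  # False: (contract OR lease) AND signed
```

Writing the query as `contract OR (lease AND signed)` removes the ambiguity no matter which convention the engine follows.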
  6. Designing Effective Keyword Searches
     2. Develop a search strategy
       » Run broad searches first (for example, date-range culling), then use the results as the scope for narrower sub-searches
       » Save searches and search results for future use and reference
       » Find on-point documents and use “similar” documents and concepts to identify additional key terms
       » Know your universe (foreign-language documents require foreign-language keywords!)
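A minimal sketch of the "broad first, then narrow" strategy, assuming a small in-memory collection: a date-range cull defines the scope, a keyword sub-search runs only within that scope, and each result set is saved under a name for later reference. The document fields, IDs, and helper names are hypothetical.

```python
import re
from datetime import date

# Hypothetical collection; a real platform would hold this in its review database.
DOCS = [
    {"id": "DOC-001", "sent": date(2011, 3, 14), "text": "Pricing memo for the merger"},
    {"id": "DOC-002", "sent": date(2011, 6, 2), "text": "Lunch order for Friday"},
    {"id": "DOC-003", "sent": date(2012, 1, 9), "text": "Revised merger pricing"},
    {"id": "DOC-004", "sent": date(2010, 11, 30), "text": "Merger rumors in the press"},
]

saved_searches = {}  # search name -> list of matching document IDs


def words(text):
    """Lowercase word set, so 'merger' does not match inside 'mergers'."""
    return set(re.findall(r"\w+", text.lower()))


def date_cull(docs, start, end):
    """Broad first pass: keep documents inside the date range (inclusive)."""
    return [d for d in docs if start <= d["sent"] <= end]


def keyword_search(docs, terms):
    """Sub-search that runs only over the scope produced by an earlier search."""
    terms = {t.lower() for t in terms}
    return [d for d in docs if words(d["text"]) & terms]


def run_and_save(name, results):
    """Record the result IDs under a name so the search can be cited or rerun."""
    saved_searches[name] = [d["id"] for d in results]
    return results


scope = run_and_save("date cull 2011", date_cull(DOCS, date(2011, 1, 1), date(2011, 12, 31)))
hits = run_and_save("merger in 2011", keyword_search(scope, ["merger"]))
print(saved_searches)
```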
  7. Designing Effective Keyword Searches
     3. Build smart keyword lists
       • Use a plain-text editor to reduce errors
         » Programs that format text can cause difficulty (for example, by silently converting straight quotes to curly “smart” quotes)
         » Use a program like Notepad and place each term on a separate line
         » Spell check
         » Be aware of commonly misspelled keywords or privilege terms
       • Understand the impact of your key terms
         » Be flexible: account for word and phrase permutations by using a “Data Dictionary”
         » Is the list over-inclusive? Under-inclusive?
         » “Noise words” increase the likelihood of false hits
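Some of these keyword-list hygiene steps can be automated. This sketch assumes the list arrives as plain text with one term per line; it converts curly quotes back to straight quotes, drops blank lines and case-insensitive duplicates, and flags standalone noise words. The noise-word list and function name are hypothetical, and spell checking and permutation planning still need a human.

```python
# Curly quotes and apostrophes that word processors often substitute automatically.
SMART_PUNCTUATION = {"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"}
# Hypothetical noise-word list; each search platform maintains its own.
NOISE_WORDS = {"a", "an", "and", "of", "the", "to", "in"}


def clean_keyword_list(raw_text):
    """Read one term per line; normalize quotes, drop blanks and duplicates,
    and warn about terms that are noise words on their own."""
    cleaned, warnings, seen = [], [], set()
    for line_no, line in enumerate(raw_text.splitlines(), start=1):
        term = line.strip()
        for smart, plain in SMART_PUNCTUATION.items():
            term = term.replace(smart, plain)
        if not term:
            continue  # blank line
        key = term.lower()
        if key in seen:
            warnings.append(f"line {line_no}: duplicate term {term!r}")
            continue
        seen.add(key)
        if key in NOISE_WORDS:
            warnings.append(f"line {line_no}: noise word {term!r} will cause false hits")
        cleaned.append(term)  # noise words are kept but flagged for a human decision
    return cleaned, warnings


raw = "\u201cbreach of contract\u201d\nindemnify\n\nIndemnify\nthe\n"
terms, problems = clean_keyword_list(raw)
print(terms)  # ['"breach of contract"', 'indemnify', 'the']
for problem in problems:
    print(problem)
```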
  8. Advanced Searching Technologies
     What are some “new and evolving” search methods?
     Covered in this presentation:
       1. Concept Searching
       2. Topic Grouping
       3. Language Identification
       4. Email Threading
       5. Near De-Duplication
       6. Sampling
     Not covered in this presentation (a hot, evolving topic!): Technology-Assisted Review
  9. Method 1: Keyword Searching vs. Concept Searching
     • Keyword searching
       » Uses search terms to retrieve documents that contain those exact terms
       » Standard practice; generally accepted in the courts
     • Concept searching
       » Allows reviewers to find documents with similar conceptual terms, even if they do not contain the exact search terms
       » Emerging as a technology alternative; seldom used for filtering, but increasingly used for review
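To make the distinction concrete, the sketch below contrasts exact keyword matching with a very simple form of concept search: co-occurrence query expansion, in which the term that appears most often alongside the query terms is added before matching. This stand-in was chosen for brevity; commercial concept engines typically rely on richer techniques such as latent semantic indexing. All function names and documents are hypothetical.

```python
from collections import Counter


def tokenize(text):
    return set(text.lower().split())


def keyword_search(query, docs):
    """Exact-term matching: only documents that contain a query term."""
    q = tokenize(query)
    return [i for i, doc in enumerate(docs) if tokenize(doc) & q]


def concept_search(query, docs, expand_by=1):
    """Simplified concept search via co-occurrence query expansion (an
    illustrative assumption, not any vendor's method)."""
    q = tokenize(query)
    related = Counter()
    for doc in docs:
        terms = tokenize(doc)
        if terms & q:
            related.update(terms - q)  # count words that co-occur with the query
    expanded = q | {term for term, _ in related.most_common(expand_by)}
    return [i for i, doc in enumerate(docs) if tokenize(doc) & expanded]


docs = [
    "car engine repair invoice",
    "car engine warranty",
    "automobile engine repair estimate",
    "banana smoothie recipe",
]
print(keyword_search("car", docs))  # [0, 1]
print(concept_search("car", docs))  # [0, 1, 2]
```

Document 2 never mentions "car" but is still retrieved because it shares "engine" with the documents that do, which is the behavior the slide attributes to concept searching. The same mechanism can also pull in false hits, one reason concept searching is seldom used for filtering.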
  10. Method 2: Topic Grouping
     • Documents are automatically grouped by theme without human input
     • Topic grouping clusters similar documents and labels each group for quick identification
     • Users do not need to “seed” the processing engine by providing keywords
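As an illustrative sketch only, the single-pass ("leader") clustering below groups documents without any seed keywords: each document joins the most similar existing group by Jaccard word overlap, a new group starts when nothing is similar enough, and each group is labeled with its most frequent terms. The threshold, stop-word list, function name, and sample documents are assumptions; production tools use more robust clustering.

```python
from collections import Counter

STOPWORDS = {"a", "an", "the", "for", "of", "to", "and", "on", "in"}  # hypothetical list


def group_by_topic(docs, threshold=0.2, label_size=2):
    """Single-pass clustering by Jaccard similarity; labels are each group's top terms."""
    groups = []
    for i, doc in enumerate(docs):
        terms = set(doc.lower().split()) - STOPWORDS
        best, best_sim = None, 0.0
        for group in groups:
            union = terms | group["terms"]
            sim = len(terms & group["terms"]) / len(union) if union else 0.0
            if sim > best_sim:
                best, best_sim = group, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(i)
            best["terms"] |= terms          # the group's vocabulary grows as it absorbs documents
            best["counts"].update(terms)
        else:
            groups.append({"terms": set(terms), "members": [i], "counts": Counter(terms)})
    return [
        {"label": [t for t, _ in g["counts"].most_common(label_size)], "members": g["members"]}
        for g in groups
    ]


docs = [
    "quarterly revenue forecast spreadsheet",
    "revenue forecast for next quarter",
    "office holiday party invitation",
    "holiday party catering menu",
    "updated revenue forecast numbers",
]
for group in group_by_topic(docs):
    print(sorted(group["label"]), group["members"])
# ['forecast', 'revenue'] [0, 1, 4]
# ['holiday', 'party'] [2, 3]
```

Single-pass clustering is sensitive to document order, which is one reason real tools favor more robust algorithms.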
  11. Method 3: Language Identification
     • This technology can identify every language in a document, as well as the primary language, and pass this information along in a metadata field
     • A legal team needs to know which languages are in a collection and the volume of foreign-language documents
     • Reports can help determine whether to use machine translation, foreign-language reviewers, or a combination
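A toy version of language identification, assuming a crude stop-word-count heuristic in place of the statistical models real tools use: each line is scored against small per-language word lists, and the results are returned as metadata-style fields listing every language found plus the primary one (weighted by word count). The profiles, function names, and field names `Languages` and `PrimaryLanguage` are hypothetical.

```python
import re
from collections import Counter

# Tiny, hypothetical stop-word profiles; production tools use far richer models.
PROFILES = {
    "en": {"the", "and", "is", "of", "to", "in", "that", "it", "with", "for"},
    "es": {"el", "la", "de", "que", "y", "en", "los", "es", "por", "con"},
    "de": {"der", "die", "und", "ist", "nicht", "das", "mit", "ein", "zu", "den"},
}


def detect_segment(text):
    """Guess a segment's language by counting profile stop words; None if no evidence."""
    tokens = re.findall(r"\w+", text.lower())
    scores = {lang: sum(t in words for t in tokens) for lang, words in PROFILES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def identify_languages(document):
    """Return metadata-style fields: all languages found and the primary language
    (the one covering the most words)."""
    word_counts = Counter()
    for segment in document.splitlines():
        lang = detect_segment(segment)
        if lang:
            word_counts[lang] += len(segment.split())
    primary = word_counts.most_common(1)[0][0] if word_counts else None
    return {"Languages": sorted(word_counts), "PrimaryLanguage": primary}


doc = (
    "Please review the attached contract and send it to the client by Friday.\n"
    "El contrato está en la carpeta de los documentos."
)
print(identify_languages(doc))  # {'Languages': ['en', 'es'], 'PrimaryLanguage': 'en'}
```

Aggregating these fields across a collection gives the per-language volumes a team would use to choose between machine translation and foreign-language reviewers.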
  12. Method 4: Email Threading
     • Identifies e-mail conversations based on their content and groups them for review
     • Using the actual content of the e-mails to identify threads is the most reliable method: it still recognizes a thread when the subject line changes or when e-mails are exchanged across different e-mail applications
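The content-based approach can be sketched under simplifying assumptions: replies quote earlier messages with `>` markers, so once the markers are stripped, an earlier message's body appears inside the reply's body. Messages linked this way are merged into one thread with a union-find structure, and subject lines are never consulted. The message format, the `min_chars` safeguard, and the function names are hypothetical.

```python
def normalize(body):
    """Strip leading quote markers and collapse whitespace so quoted text can be matched."""
    lines = (line.lstrip("> ").strip().lower() for line in body.splitlines())
    return " ".join(" ".join(line for line in lines if line).split())


def thread_emails(emails, min_chars=20):
    """Group messages whose normalized body is contained in another message's body.

    min_chars (an assumed safeguard) skips very short bodies such as "Thanks"
    that would otherwise match almost anything.
    """
    bodies = [normalize(e["body"]) for e in emails]
    parent = list(range(len(emails)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps lookups short
            i = parent[i]
        return i

    for i, earlier in enumerate(bodies):
        if len(earlier) < min_chars:
            continue
        for j, later in enumerate(bodies):
            if i != j and earlier in later:
                parent[find(i)] = find(j)  # merge the two threads

    threads = {}
    for i in range(len(emails)):
        threads.setdefault(find(i), []).append(i)
    return sorted(threads.values())


emails = [
    {"subject": "Budget", "body": "Can you send the Q3 budget?"},
    {"subject": "RE: Budget", "body": "Attached.\n> Can you send the Q3 budget?"},
    {"subject": "Numbers look off",  # subject changed mid-thread
     "body": "Line 4 is wrong.\n> Attached.\n> > Can you send the Q3 budget?"},
    {"subject": "Lunch", "body": "Lunch at noon?"},
]
print(thread_emails(emails))  # [[0, 1, 2], [3]]
```

Message 2 lands in the same thread even though its subject line no longer mentions the budget, which a subject-based grouping would miss.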
  13. Method 5: Near De-Duplication
     • Reviewers can quickly identify and compare documents that are very similar to one another but are not exact duplicates
     • The technology assesses the similarities across a document set and identifies the most uniquely representative documents as “the core”
       » All related documents are then grouped around the core
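One common way to measure near-duplication is w-shingling: break each document into overlapping word sequences and compare the resulting sets with Jaccard similarity. The sketch below links documents at or above a threshold, then picks a "core" for each group. The core rule used here (the member most similar to the rest of its group), the threshold, and the function names are assumptions for illustration, not necessarily how any vendor defines the core.

```python
def shingles(text, k=3):
    """Set of overlapping k-word sequences ("shingles")."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def near_duplicate_groups(docs, threshold=0.5, k=3):
    """Return (core, members) pairs for groups of near-duplicate documents."""
    sets = [shingles(d, k) for d in docs]
    n = len(docs)
    sim = [[jaccard(sets[i], sets[j]) for j in range(n)] for i in range(n)]

    # Link every pair at or above the threshold by relabeling one group into the other.
    group_of = list(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] >= threshold:
                old, new = group_of[j], group_of[i]
                group_of = [new if g == old else g for g in group_of]

    groups = {}
    for i, g in enumerate(group_of):
        groups.setdefault(g, []).append(i)

    result = []
    for members in groups.values():
        # Hypothetical core rule: the member most similar to the rest of its group.
        core = max(members, key=lambda m: sum(sim[m][o] for o in members if o != m))
        result.append((core, members))
    return result


docs = [
    "The parties agree to the terms of this settlement agreement dated March 1",
    "The parties agree to the terms of this settlement agreement dated March 2",
    "Please bring snacks to the team meeting on Friday afternoon",
    "The parties agree to the revised terms of this settlement agreement dated March 1",
]
print(near_duplicate_groups(docs))  # [(0, [0, 1, 3]), (2, [2])]
```

Comparing every pair is quadratic in the number of documents; large collections typically use approximations such as MinHash to avoid that cost.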
  14. Method 6: Sampling (Defensibility & Quality Control)
     • Sampling is the practice of reviewing a certain percentage of the documents in a data set or in a particular folder of data
       » Strengthens the defensibility of the process
       » Helps validate what you have (and, equally important, what you do not have) in your production set
       » May take place iteratively throughout the review process or prior to production:
         – During ongoing quality control
         – At the end, to assess the completeness of the review
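A sketch of two sampling building blocks, assuming simple random sampling: drawing a reproducible percentage sample from a recorded seed, and turning the reviewers' count of relevant documents in that sample into an approximate 95% Wilson score interval. The collection size, 2% rate, seed, hit count, and function names are all hypothetical; the right sample size and confidence level depend on the matter.

```python
import math
import random


def draw_sample(doc_ids, percent, seed):
    """Simple random sample of `percent`% of the documents (rounded up).
    Recording the seed lets the same sample be reproduced later."""
    doc_ids = list(doc_ids)
    size = math.ceil(len(doc_ids) * percent / 100)
    return random.Random(seed).sample(doc_ids, size)


def wilson_interval(hits, n, z=1.96):
    """Approximate Wilson score interval for a sampled proportion (z=1.96 gives ~95%)."""
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical check of "what you do not have": sample the documents not slated
# for production and estimate how many relevant documents remain among them.
withheld = range(10_000)
sample = draw_sample(withheld, percent=2, seed=42)
relevant_found = 6  # hypothetical: reviewers mark 6 sampled documents relevant
low, high = wilson_interval(relevant_found, len(sample))
print(len(sample), f"{relevant_found / len(sample):.1%}", f"{low:.1%} to {high:.1%}")
```

The Wilson interval is used here instead of the simpler normal ("Wald") approximation because it behaves better when the observed rate is close to zero, as it often is when sampling discarded documents.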