Semantic Search in E-Discovery
Research on the application of text mining and information retrieval
for fact finding in regulatory investigations

David Graus
Who’s Involved?

     Prof. dr. Maarten de Rijke
     Director Intelligent Systems Lab, UvA

     Dr. Hans Henseler
     Lector E-Discovery, CREATE-IT applied research

     David Graus, MSc.
     PhD Candidate, Semantic Search in E-Discovery, UvA

     David van Dijk, MSc.
     Researcher E-Discovery, CREATE-IT applied research

     Zhaochun Ren, MSc.
     PhD Candidate, Semantic Search in E-Discovery, UvA

     Menno Israël, MSc.
     Teamleader Knowledge and Expertise Centre for Intelligent Data Analysis
     (Kecida), NFI




                                             Semantic search in e-discovery              2
Introduction

 •   Semantic Search in E-Discovery




What is

 •   Semantic Search in E-Discovery
      -   retrieving and securing digital forensic evidence
      -   from emails, forums, etc.




Challenge

 •   Finding out who knew what, from whom, and when
 •   Generic search is not the answer




Finding evidence for E-Discovery

 •   We don’t know what we’re looking for
 •   What we’re looking for might be deliberately hidden
 •   Communication might be very domain-specific, contextualized or incomplete




Task

 •   Retrieve all relevant traces
 •   Highly iterative search process
 •   Support (re)formulating questions and hypotheses




How do we approach this?

 •   Two subprojects:
      -   Information Retrieval
           *   Finding unstructured material in large collections
      -   Information Extraction/Text Mining
           *   Discovering patterns in data




How do we approach this?

 •   Information Retrieval
      -   Integrating structure/context of data in retrieval models
           *   Capturing forum and email context
           *   Conversational search
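
One way to picture "integrating context in retrieval models" is to let a message's score depend partly on the thread it belongs to. The sketch below is only an illustration under stated assumptions, not the project's retrieval model: the message fields (`id`, `thread`, `text`), the term-overlap scoring, and the `weight` value are all hypothetical choices made for this example.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def overlap_score(query_terms, doc_terms):
    """Fraction of query terms that occur in the document."""
    if not query_terms:
        return 0.0
    doc = set(doc_terms)
    return sum(1 for t in query_terms if t in doc) / len(query_terms)


def rank_with_thread_context(query, messages, weight=0.7):
    """Rank messages by their own text plus the text of their whole thread.

    messages: list of dicts with hypothetical keys 'id', 'thread', 'text'.
    weight: share of the score taken from the message itself (assumed value).
    """
    q = tokenize(query)

    # Pool the terms of every message in the same thread.
    thread_terms = defaultdict(list)
    for m in messages:
        thread_terms[m["thread"]].extend(tokenize(m["text"]))

    scored = []
    for m in messages:
        own = overlap_score(q, tokenize(m["text"]))
        ctx = overlap_score(q, thread_terms[m["thread"]])
        # Linear interpolation of message score and thread-context score.
        scored.append((weight * own + (1 - weight) * ctx, m["id"]))
    return sorted(scored, reverse=True)


messages = [
    {"id": "m1", "thread": "t1", "text": "Can we move the shipment date"},
    {"id": "m2", "thread": "t1", "text": "Yes do it quietly"},
    {"id": "m3", "thread": "t2", "text": "Lunch on Friday"},
]
ranking = rank_with_thread_context("shipment date", messages)
```

Here the reply `m2` shares no terms with the query, yet it ranks above the unrelated `m3` because its thread is about the shipment date. That is the kind of incomplete, contextualized communication a generic keyword search would miss.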




How do we approach this?

 •   Information Extraction/Text Mining
      -   Extracting structured knowledge from user-generated content
           *   Semantic pre-processing
           *   Social network inference
           *   Information maps
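
As a rough illustration of social network inference, a first approximation of "who talked to whom" can be read directly from email headers. The sketch below assumes raw RFC 822 style messages and uses only the Python standard library; the function name `communication_graph` and the example addresses are hypothetical, and this is not presented as the method used in the project.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses


def communication_graph(raw_messages):
    """Count directed sender -> recipient edges from raw email messages."""
    edges = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        senders = [
            addr.lower() for _, addr in getaddresses(msg.get_all("From", []))
        ]
        recipients = [
            addr.lower()
            for _, addr in getaddresses(
                msg.get_all("To", []) + msg.get_all("Cc", [])
            )
        ]
        for s in senders:
            for r in recipients:
                # Skip empty addresses and self-addressed mail.
                if s and r and s != r:
                    edges[(s, r)] += 1
    return edges


raw_messages = [
    "From: alice@example.com\nTo: bob@example.com\nSubject: a\n\nhi",
    "From: alice@example.com\nTo: bob@example.com\nCc: carol@example.com\n"
    "Subject: b\n\nhi",
    "From: bob@example.com\nTo: alice@example.com\nSubject: c\n\nhi",
]
edges = communication_graph(raw_messages)
```

The edge counts give a weighted, directed graph of communication. Adding the `Date` header to each edge would be a natural next step toward answering "from whom, and when".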




How do we approach this?

 •   Information Retrieval <-> Information Extraction




Current work (first steps)

 •   Information Retrieval
      -   Twitter mining (as a form of conversational search)
 •   Information Extraction/Text Mining
      -   Entity linking (for semantic document enrichment)
 •   TREC/TAC benchmarking events
      -   TREC Legal Track 2011 (2013?)



Contributions

 •   xTAS: open-source text analysis toolkit
 •   iColumbo: Internet monitoring framework
 •   Used by:
      -   Internet Recherche Netwerk
      -   Koninklijke Bibliotheek
      -   Beeld en Geluid
      -   ... You?




Semantic Search in E-Discovery

 •   David Graus
 •   d.p.graus@uva.nl




