FreeEed - Open Source eDiscovery

Published in: Technology, Education

  1. FreeEed: open source eDiscovery with Hadoop
  2. Background of eDiscovery
       • Preservation
       • Discovery request
       • Production
  3. EDRM
  4. What does FreeEed do now?
     Processing:
       • Text extraction
       • Metadata extraction
       • Culling
       • Deliver load file
       • Deliver native documents
  5. What will FreeEed do soon?
       • Review
       • Analysis
       • Production
       • Presentation
  6. What else can FreeEed do?
       • Preservation
       • Collection
  7. Why can FreeEed do all of that?
       • Big Data technologies
           • Storage
           • Processing
       • Open source tools
           • Text/metadata extraction
           • OCR
  8. Advantages of open source approach
       • Easy reach
       • Modern technologies
       • Sharing spirit
       • Community support
       • Integrate or use any way you want
  9. Three ways to run FreeEed
       • Standalone on Linux
       • Private Linux cluster
       • Amazon cloud, controlled from your laptop (Windows, Mac, or Linux) - coming soon
  10. FreeEed Architecture
       • Staging (zip files)
       • Text/metadata extraction
       • Culling
       • TIFFing or PDF
       • Post-processing
  11. Staging
       • One zip file per node (computer/server)
       • Size controls load balancing
       • Big enough to make sense
       • Small enough to tolerate failure
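The size trade-off on the staging slide can be illustrated with a small sketch. This is not FreeEed's actual staging code: `StagingPlanner`, `plan`, and the byte limit are hypothetical names chosen for illustration, and the greedy strategy is only one simple way to keep each zip archive under a chosen size.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: groups files into staging batches, one zip archive per batch. */
final class StagingPlanner {

    /**
     * Greedily assigns files (given by size in bytes) to batches so that no batch
     * exceeds maxBatchBytes, unless a single file is itself larger than the limit.
     * Returns, for each batch, the indexes of the files it contains.
     */
    static List<List<Integer>> plan(long[] fileSizes, long maxBatchBytes) {
        if (maxBatchBytes <= 0) {
            throw new IllegalArgumentException("maxBatchBytes must be positive");
        }
        List<List<Integer>> batches = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long currentBytes = 0;
        for (int i = 0; i < fileSizes.length; i++) {
            // Close the current batch when adding this file would exceed the limit.
            if (!current.isEmpty() && currentBytes + fileSizes[i] > maxBatchBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(i);
            currentBytes += fileSizes[i];
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        long[] sizes = {400, 300, 500, 200, 900};
        // Prints [[0, 1], [2, 3], [4]]
        System.out.println(plan(sizes, 1000));
    }
}
```

A larger limit yields fewer, bigger archives (less overhead per node), while a smaller limit means less work is lost when one node fails.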
  12. Text and Metadata
       • Tika
       • Umbrella for extractors
       • Hundreds of file formats
       • Just one line of code:
           String text = tika.parseToString(inputStream, metadata);
  13. Culling
       • Selecting only responsive documents
       • Lucene - open source search
       • Flexible search queries
       • Search in memory
       • Two lines of code:
           Searcher searcher = new IndexSearcher(idx);
           isResponsive = search(searcher, queryString);
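To make the idea of culling concrete without depending on Lucene, here is a deliberately naive stand-in: it marks a document as responsive when every required term occurs in its text. It is not how Lucene evaluates queries and not FreeEed's implementation; `NaiveCuller`, `tokens`, and `isResponsive` are hypothetical names used only for this sketch.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Naive keyword culler: a plain-Java stand-in, not a Lucene query. */
final class NaiveCuller {

    /** Lower-cases the text and splits it into a set of letter/digit tokens. */
    static Set<String> tokens(String text) {
        Set<String> out = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    /** A document is responsive when it contains every required term (an AND query). */
    static boolean isResponsive(String documentText, String... requiredTerms) {
        Set<String> docTokens = tokens(documentText);
        for (String term : requiredTerms) {
            if (!docTokens.contains(term.toLowerCase(Locale.ROOT))) {
                return false;
            }
        }
        return true;
    }
}
```

A real search library adds what this sketch lacks: phrase and boolean queries, stemming, and scoring.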
  14. TIFF/PDF
       • OpenOffice
       • LibreOffice
       • Admittedly, TIFFing is hard
       • Open source answer: it is what it is
       • Perfectionist answer: commercial filters
  15. Database use (HBase or Cassandra)
     For example, find all authors
       • Document -> Author
       • Key = Author, Value = None
       • Author can be overwritten
       • The "Authors" row has all Authors
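The "find all authors" pattern relies on overwrite semantics: writing the same column name twice leaves a single entry, so the row ends up holding each author exactly once. The sketch below models that with an in-memory sorted map; it is not an HBase or Cassandra client, and `AuthorsRow`, `put`, and `authors` are hypothetical names for illustration.

```java
import java.util.Set;
import java.util.TreeMap;

/** In-memory stand-in for one wide row in a column store (not a real database client). */
final class AuthorsRow {
    // Empty marker value, mirroring "Value = None" on the slide.
    private static final byte[] EMPTY = new byte[0];

    // Column name = author; columns are kept sorted by name.
    private final TreeMap<String, byte[]> columns = new TreeMap<>();

    /** Writing the same author again simply overwrites the existing column. */
    void put(String author) {
        columns.put(author, EMPTY);
    }

    /** Reading the row back yields each distinct author once, in sorted order. */
    Set<String> authors() {
        return columns.keySet();
    }

    public static void main(String[] args) {
        AuthorsRow row = new AuthorsRow();
        row.put("Alice");
        row.put("Bob");
        row.put("Alice"); // overwrite, no duplicate
        // Prints [Alice, Bob]
        System.out.println(row.authors());
    }
}
```

Because deduplication happens on write, no separate "distinct" pass over all documents is needed to list the authors.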
  16. So, practically
       • Command-line:
           java -jar dist/FreeEed.jar -param_file my.freeeed.properties
       • or GUI
  17. 1-2-3
       • Install Ubuntu
       • Download FreeEed
       • Run the program
       • Ask for more features
  18. Install Ubuntu
  19. Download and unzip FreeEed
  20. Enjoy