+
Hadoop-based Open Source eDiscovery:
FreeEed
(Easy as popcorn)
+
Business (legal) use case
2
• Duty to disclose information – FRCP Rule 26
• Preserve relevant information
• Produce information on request
• Keep the information for X years
• Sanctions for obstruction
• Sanctions for non-compliance
+
Before the 1930s
3
• The courtroom was full of surprises
+
Civil discovery changes this
4
+
Discovery basics
5
• Obligations of the parties
• At the start of a lawsuit, or when litigation becomes likely, preserve relevant data
• Produce data on request, within timelines
• Review the data before production
• Can request eDiscovery from opponents
• Store and archive
+
Interesting facts about eDiscovery
6
• Most of these are proprietary or under NDA
• Representative case size: 5GB to 500GB
• Cost per GB of processing: $5-200, typically ~$100
• Takes 25-50% of litigation budget
• Days to process and months to review
• Preservation: 3-7 years
• 500 providers, with 10 majors
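
As a rough worked example of the figures above, here is a minimal sketch of the processing-cost arithmetic. The function names are illustrative assumptions, and reading "~$100" as the typical per-GB rate is also an assumption; only the $5-200 range and the 5GB to 500GB case sizes come from the slide.

```python
# Per-GB processing rates in USD, taken from the slide.
# Treating ~$100 as the "typical" rate is an assumption.
RATES = {"low": 5, "typical": 100, "high": 200}


def processing_cost_usd(size_gb, rate_per_gb):
    """Processing cost for a matter of size_gb at a flat per-GB rate."""
    return size_gb * rate_per_gb


def cost_range(size_gb):
    """Return low/typical/high processing cost estimates in USD."""
    return {label: processing_cost_usd(size_gb, rate)
            for label, rate in RATES.items()}


if __name__ == "__main__":
    # Representative case sizes from the slide, in GB.
    for size in (5, 500):
        print(size, "GB:", cost_range(size))
```

This covers processing only; the slide notes that review takes months, and that cost is not modeled here.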
+
Challenges of eDiscovery
7
• Data sizes in the TB
• Seasonal loads, tight deadlines
• Hundreds of file formats
• Heavy read/write load in review
• Text analytics is of paramount importance
• Huge price tags obstruct justice
+
FreeEed main features
8
• Open source Hadoop-based eDiscovery:
• As scalable as Hadoop
• Fast review with NoSQL
• Scales with the lawsuit - time and volume
• Data preservation and archiving with VM
• Only possible with open source license
+
Design goals
9
• Built on open source components
• Big Data scalable
• Preservation, chain of custody, archiving
• Scalable both technically and as a business
• Stable (don’t laugh, people get different results on different runs)
• Closed-source compatible (MS + Azure too)
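
One common way to support chain of custody and repeatable results is a hash manifest. The sketch below is an assumption about how such a check could look, not a description of FreeEed's actual mechanism; all names (`build_manifest`, `manifest_digest`) are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of raw document bytes."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(documents):
    """documents: dict mapping a document name to its raw bytes.

    Returns (name, sha256) pairs sorted by name, so repeated runs over
    the same input produce an identical manifest regardless of the
    order in which documents were read.
    """
    return sorted((name, sha256_hex(content))
                  for name, content in documents.items())


def manifest_digest(manifest):
    """Single digest over the whole manifest; changes if any file changes."""
    h = hashlib.sha256()
    for name, digest in manifest:
        h.update(f"{name}\t{digest}\n".encode("utf-8"))
    return h.hexdigest()
```

Comparing the manifest digest recorded at collection time with one computed later shows whether any document in the set was altered, added, or dropped.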
+
Packaging architecture
10
• Comes as VMs
• Grab as few or as many as you want
• No mixing of matters
• No ethical problems
• Preserve for as many years as you want
• 1 VM = 1 corn, FreeEed = free popcorn
+
FreeEed makes lawyers happy
11
+
FreeEed : Architecture
12
+
FreeEed popcorn is very popular with lawyers, legal techs, IT, etc.
+
FreeEed popcorn
14
• Deploy on laptops, servers or cloud
• One-node or any number of nodes
• Scalable storage
• Different cooking recipes
• No mixing of matters
• Easy archiving
• Easy deletion
+
Processing architecture
15
• Based on golden-image VM
• Controlled cluster start in any environment
• Index / cull on the fly or later
• Immediately searchable
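
To make "cull" concrete: culling narrows a document set by filters such as date range and keywords. The following is a minimal sketch under assumed inputs; the document fields (`date`, `text`) and the `cull` function are hypothetical and not FreeEed's API.

```python
from datetime import date


def cull(documents, start, end, keywords):
    """Keep documents dated within [start, end] whose text contains
    at least one of the keywords (case-insensitive substring match)."""
    wanted = [k.lower() for k in keywords]
    kept = []
    for doc in documents:
        in_range = start <= doc["date"] <= end
        text = doc["text"].lower()
        if in_range and any(k in text for k in wanted):
            kept.append(doc)
    return kept


if __name__ == "__main__":
    docs = [
        {"id": "d1", "date": date(2011, 3, 1), "text": "Merger terms attached"},
        {"id": "d2", "date": date(2009, 1, 5), "text": "Merger rumors"},
        {"id": "d3", "date": date(2011, 6, 9), "text": "Lunch on Friday?"},
    ]
    kept = cull(docs, date(2011, 1, 1), date(2011, 12, 31), ["merger"])
    print([d["id"] for d in kept])
```

The same filter could run while documents are being processed or afterwards against the index, which is the "on the fly or later" choice the slide mentions.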
+
Cluster start-up on EC2
16
+
Cloud integration
17
• Downloadable VMs
• Same VMs on Amazon AWS
• Amazon VMs are very convenient
• Immediate deployment
• Any hardware configuration you need
• Control lots of power from a limited-power laptop
• Azure – working with Microsoft
+
Review architecture
18
• Lucene
• Solr
• HBase
• Lucene indexes created in reducers and combined in Solr
• For small matters, write directly to Solr
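
The idea of building indexes in reducers and combining them afterwards can be illustrated with a plain-Python stand-in. This sketch does not use Lucene or Solr at all; it assumes a simple inverted index (term to document IDs) per reducer and a merge step, and every name in it is hypothetical.

```python
from collections import defaultdict


def build_partial_index(docs):
    """Stand-in for one reducer: docs maps doc_id -> text.
    Returns an inverted index mapping each lowercased term to doc_ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def merge_indexes(partials):
    """Stand-in for the combine step: union the postings of all partials."""
    merged = defaultdict(set)
    for partial in partials:
        for term, ids in partial.items():
            merged[term] |= ids
    return merged


def search(index, term):
    """Return the sorted doc_ids that contain the term."""
    return sorted(index.get(term.lower(), set()))


if __name__ == "__main__":
    shard1 = {"d1": "merger agreement draft", "d2": "lunch plans"}
    shard2 = {"d3": "Signed merger contract"}
    merged = merge_indexes([build_partial_index(shard1),
                            build_partial_index(shard2)])
    print(search(merged, "merger"))
```

For a small matter the merge step is unnecessary, since a single index can be written directly, which mirrors the slide's last bullet.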
+
Review screen
19
+
Review capabilities
20
• Search
• Cull down
• View text and metadata
• Tag documents
• Export as images or as native files
+
Bird’s-eye view – EDRM (Electronic Discovery Reference Model)
21
+
Left of EDRM – Legal Hold
22
• FreeEedCollect
• Architecture: https://github.com/markkerzner/FreeEedCollect
• ZooKeeper/MapReduce/Flume/HDFS
+
Right of EDRM – Org. charts
23
Partnership with Sintelix
+
Analytics – network of actors
24
Partnership with Sintelix
+
FreeEed and data governance
25
• Virtualization for data preservation
• Scalable processing
• Archiving
• Document groups are not mixed
• Data format stored together with the software that understands it
+
Hadoop & Big Data applications
26
• Other related applications
• Financial – text analytics
• Energy – document and procedure analytics
• Actual ongoing projects
+
FreeEed as a learning tool
27
• Hundreds of downloads
• Dozens of active users
• Real-world Hadoop application
• Many developers download to learn
• Complex, real, but manageable
+
FreeEed adoption – who is trying our “popcorn”?
28
• Large law firms
• Small law firms and solos
• Government agencies
• Universities
• Enterprises
• Developers learn Big Data
+
Looking forward
29
• Add
• Collection
• Analytics
• Community
• Integrations
• Implementations
+
How you can use FreeEed
30
• For its intended purpose
• Large law firms
• Small firms and solos
• Pro se
• Integrate into legal IT
• Start a similar document management project
+
Q&A
32
• Thank you!
• People usually ask:
• How can I put my data in the cloud?
• Is it safe?
• Do you do OCR, PST, OST, etc…?
