The Panama Papers and how the EDRM would help review 11.5 million documents
CityDocs, 7th April 2016, News
Recent news coverage includes numerous articles on “the biggest leak in history”: the exposure of
2.6 terabytes of Electronically Stored Information (ESI) and scanned documents, known as the
Panama Papers.
Ramon Fonseca of the law firm Mossack Fonseca has gone on record to state that the leak of these
documents did not come from an inside source, but was the result of a cyber breach. The breach on
its own raises questions about security and about how law firms will have to monitor for network
intrusions.
The Electronic Discovery Reference Model (EDRM) is designed to turn large volumes of data into
relevant documents, as shown in the EDRM diagram:
[Figure: EDRM diagram]
How the data was Identified has not been released, as this was done through a cyber-attack on the
servers of Mossack Fonseca. At this time we do not know how the Collection was handled during the
attack, or whether the data was Preserved.
In an article by Matt Burgess, Carl Barron of Nuix explains how Nuix was able to assist
Süddeutsche Zeitung and the International Consortium of Investigative Journalists in breaking down
the 11.5 million documents and indexing them so that searches could be conducted on the data set.
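To illustrate what indexing a data set for search means in principle, here is a minimal sketch of an inverted index in Python. It is not how Nuix works internally, which the article does not describe. The document IDs and text are hypothetical, as are the function names `tokenize`, `build_index` and `search`.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Map each token to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index, query):
    """Return IDs of documents containing every term in the query (AND search)."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

# Hypothetical sample documents, invented for illustration only.
documents = {
    "DOC-001": "Invoice for shelf company registration",
    "DOC-002": "Email regarding company directors",
    "DOC-003": "Scanned passport copy",
}
index = build_index(documents)
print(search(index, "company directors"))
```

Because the index is built once, each search only looks up the query terms rather than rereading every document, which is what makes searching millions of documents practical.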
The Nuix Processing engine is one of the most powerful in the industry and can work through
exceptionally large data sets in a short period of time. A very useful feature in the Nuix arsenal is its
ability to pull names and companies from documents into a list.
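As a rough illustration of pulling company names into a list, the sketch below uses a simple pattern match on common company suffixes. This is an assumption made for illustration and not Nuix's actual extraction method, which is far more capable. The suffix list, the company names and the function names are all hypothetical, and the pattern will also pick up a capitalised word that happens to precede a company name.

```python
import re
from collections import Counter

# Assumed, non-exhaustive list of company suffixes, chosen for illustration.
COMPANY_SUFFIXES = r"(?:Ltd|Limited|Inc|Corp|LLC|GmbH)"

# One or more capitalised words followed by a company suffix.
COMPANY_PATTERN = re.compile(
    r"\b((?:[A-Z][A-Za-z]*\s+)+" + COMPANY_SUFFIXES + r")\b"
)

def extract_companies(text):
    """Return company-like names found in the text, in order of appearance."""
    return COMPANY_PATTERN.findall(text)

def company_list(texts):
    """Count how many documents mention each company-like name."""
    counts = Counter()
    for text in texts:
        counts.update(set(extract_companies(text)))
    return counts

# Hypothetical document text, invented for illustration only.
texts = [
    "Payment sent to Blue Harbour Holdings Ltd on Friday.",
    "The director of Blue Harbour Holdings Ltd also controls Northgate Trading Inc.",
]
for name, doc_count in company_list(texts).most_common():
    print(name, doc_count)
```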
The document set contained both electronic data and scanned hard copy documents. To make hard
copy documents searchable, a company would traditionally scan and unitise them, then code certain
metadata fields, such as “To”, “From”, “CC”, “BCC”, “Document Title/Subject” and “Date”. The hard
copy documents would then be loaded into a Review platform, such as kCura’s Relativity, while the
electronic documents would follow a traditional processing route through Nuix and then be loaded
into the same review platform.
Using the names and companies identified by Nuix, journalists would be able to put together
searches to find the documents that could be relevant to their investigation. If a review tool such
as Relativity were used, those search terms could be highlighted across the document set to help
Analyse what is there.
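The idea of highlighting search terms can be sketched in a few lines of Python. This is a simplified illustration and not Relativity's implementation. The `highlight` function, the `**` marker and the sample text are assumptions made for the example.

```python
import re

def highlight(text, terms, marker="**"):
    """Wrap every whole-word, case-insensitive match of a term in the marker."""
    if not terms:
        return text
    # Longest terms first so that a longer phrase wins over a shorter overlap.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in ordered) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda match: f"{marker}{match.group(1)}{marker}", text)

# Hypothetical search terms and document text, invented for illustration only.
terms = ["blue harbour", "nominee"]
print(highlight("Transfer to Blue Harbour via a nominee director", terms))
```

The original casing of the document text is preserved, so a reviewer sees the term exactly as it appears in the source.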
Relativity has some very powerful analytics features of its own for dealing with large data sets, such
as email threading and near-duplicate analysis. Email threading cuts down the number of
documents requiring full review by showing only the inclusive emails.
The last email in a thread: The last email in a particular thread will be marked
inclusive, because any text added in this last email (even just a “forwarded”
indication) will be unique to this email and this one alone. If nobody used
attachments, and nobody ever changed the subject line, or went “below the line”
to change text, this would be the only type of inclusiveness (Source).
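The principle behind inclusive emails can be sketched as follows: an email needs no separate review if its full text already appears inside another email in the same thread. This is a deliberately simplified sketch, not Relativity's algorithm. It ignores attachments, subject changes and “below the line” edits, and it assumes quoted text is reproduced verbatim. The thread content and function names are hypothetical.

```python
def normalise(body):
    """Collapse whitespace and case so quoted text compares reliably."""
    return " ".join(body.lower().split())

def inclusive_emails(thread):
    """Return IDs of emails whose text is not fully contained in another email.

    thread is a list of (email_id, body) pairs. When two emails are exact
    duplicates, only the first one is kept.
    """
    bodies = [(email_id, normalise(body)) for email_id, body in thread]
    inclusive = []
    for i, (email_id, body) in enumerate(bodies):
        contained = any(
            j != i and body in other and (body != other or j < i)
            for j, (_, other) in enumerate(bodies)
        )
        if not contained:
            inclusive.append(email_id)
    return inclusive

# Hypothetical thread, invented for illustration only.
thread = [
    ("E1", "Please send the contract."),
    ("E2", "Attached. > Please send the contract."),
    ("E3", "Thanks. > Attached. > Please send the contract."),
    ("E4", "Forwarding to legal. > Please send the contract."),
]
print(inclusive_emails(thread))
```

In this example, reviewing E3 and E4 covers the content of all four emails, because E1 and E2 are quoted in full within E3.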
Manually reviewing the 11.5 million Panama Papers documents would be prohibitively costly. It
would also take a great deal of time and require many reviewers to work through the data set.
Technological advances in review platforms offer an assisted option for reviewing a vast
quantity of documents. Relativity, for example, features an assisted review module that identifies
relevant and irrelevant documents, saving huge amounts of time and money.
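To give a feel for how assisted review can work in principle, the sketch below learns word weights from a small sample that reviewers have already coded, then ranks unreviewed documents by likely relevance. It is a generic, simplified scoring approach chosen for illustration and not Relativity's actual method, which the article does not describe. All sample text, document IDs and function names are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def train(coded_sample):
    """Learn a per-word log-odds weight from (text, is_relevant) pairs."""
    relevant, irrelevant = Counter(), Counter()
    for text, is_relevant in coded_sample:
        (relevant if is_relevant else irrelevant).update(tokenize(text))
    vocabulary = set(relevant) | set(irrelevant)
    rel_total = sum(relevant.values()) + len(vocabulary)
    irr_total = sum(irrelevant.values()) + len(vocabulary)
    # Add-one smoothing keeps words seen in only one class from producing log(0).
    return {
        word: math.log((relevant[word] + 1) / rel_total)
        - math.log((irrelevant[word] + 1) / irr_total)
        for word in vocabulary
    }

def score(weights, text):
    """Sum the weights of known words; a higher score suggests relevance."""
    return sum(weights.get(token, 0.0) for token in tokenize(text))

def rank(weights, documents):
    """Return document IDs ordered from most to least likely relevant."""
    return sorted(documents, key=lambda doc_id: score(weights, documents[doc_id]), reverse=True)

# Hypothetical reviewer-coded sample and unreviewed documents.
coded_sample = [
    ("offshore company nominee director", True),
    ("nominee shareholder offshore trust", True),
    ("office lunch menu friday", False),
    ("printer toner order friday", False),
]
unreviewed = {
    "DOC-A": "offshore nominee arrangement",
    "DOC-B": "friday lunch order",
    "DOC-C": "meeting agenda",
}
weights = train(coded_sample)
print(rank(weights, unreviewed))
```

Reviewers would then start with the highest-ranked documents, and their coding decisions could be fed back in as further training examples.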
For data sets such as the Panama Papers, the EDRM has undoubtedly helped to streamline the review
process and speed up the legal proceedings. Similarly, any project that contains data to process and
review would benefit from eDisclosure solutions over paper disclosure.
Written by James Merritt, Director at CityDocs Forensic Technology & eDisclosure.
