The Panama Papers and how the EDRM would help review 11.5 million documents
CityDocs · 7th April 2016 · News
Read the news at the moment and you will see various articles on “the biggest leak in history”, referring to the exposure of 2.6 terabytes of Electronically Stored Information (ESI) and scanned documents, a.k.a. the Panama Papers.
Ramon Fonseca of the law firm Mossack Fonseca has gone on record to state that the leak of these documents did not come from an inside source, but was the result of a cyber breach. The breach itself will raise questions about security and how law firms will have to monitor for network intrusions.
The Electronic Discovery Reference Model (EDRM) is designed to turn large volumes of data into relevant documents, as shown in the diagram below:

[Diagram: the EDRM workflow — Identification, Preservation, Collection, Processing, Review, Analysis, Production and Presentation]
How the data was Identified has not been released, as this was done through a cyber-attack on the
servers of Mossack Fonseca. At this time we do not know how the Collection was handled during the
attack, or whether the data was Preserved.
If you read this article by Matt Burgess, Carl Barron of Nuix explains how the company was able to assist Süddeutsche Zeitung and the International Consortium of Investigative Journalists in breaking down the 11.5 million documents and indexing them so that searches could be conducted on the data set.
The Nuix Processing engine is one of the most powerful in the industry and can work through exceptionally large data sets in a short period of time. A particularly useful feature in the Nuix arsenal is its ability to pull names and companies out of documents into a list.
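Nuix’s entity extraction is proprietary, but the basic idea can be sketched in a few lines of Python: scan each document’s text for entities of interest and collect the hits into a deduplicated list. The watch list and document texts below are invented for illustration, not real Panama Papers data:

```python
# Minimal sketch of pulling entity names out of a document set into a list.
# The watch list and documents are invented examples.
watch_list = ["Acme Holdings Ltd", "J. Smith", "Oceanic Trust SA"]

documents = [
    "Transfer approved by J. Smith on behalf of Acme Holdings Ltd.",
    "Oceanic Trust SA is the registered shareholder.",
    "Routine correspondence with no named parties.",
]

def extract_entities(docs, entities):
    """Return each watch-list entity found in any document, deduplicated."""
    found = []
    for doc in docs:
        for entity in entities:
            if entity in doc and entity not in found:
                found.append(entity)
    return found

print(extract_entities(documents, watch_list))
# → ['Acme Holdings Ltd', 'J. Smith', 'Oceanic Trust SA']
```

A production engine would of course discover previously unknown names and companies rather than match a fixed list, but the output — a searchable list of entities per document set — is the same shape.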
The document set being dealt with contained both electronic data and scanned hard copy documents. To search the hard copy documents, a company would traditionally scan, unitise and code certain metadata fields, such as “To”, “From”, “CC”, “BCC”, “Document Title/Subject” and “Date”. The hard copy documents would then be loaded into a review platform, such as kCura’s Relativity, while the electronic documents would follow a traditional processing route through Nuix before being loaded into the same review platform.
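Coded metadata of this kind is typically handed to the review platform as a simple delimited load file. A rough sketch of writing one with Python’s csv module — the field names and values are illustrative, not Relativity’s actual load-file specification:

```python
import csv
import io

# Illustrative coded records for two scanned documents; all values invented.
coded_docs = [
    {"DocID": "DOC000001", "To": "A. Jones", "From": "B. Lee",
     "CC": "", "BCC": "", "Subject": "Share transfer", "Date": "2010-03-14"},
    {"DocID": "DOC000002", "To": "C. Wong", "From": "A. Jones",
     "CC": "B. Lee", "BCC": "", "Subject": "Board minutes", "Date": "2011-07-02"},
]

fields = ["DocID", "To", "From", "CC", "BCC", "Subject", "Date"]

# Write the coded fields out as a delimited load file for import.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fields)
writer.writeheader()
writer.writerows(coded_docs)

load_file = buffer.getvalue()
print(load_file)
```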
Using the names and companies identified by Nuix, journalists would be able to put together searches to identify the documents that could be relevant to their investigation. If a review tool such as Relativity were to be used, those search terms could be highlighted across the document set to help Analyse what is there.
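Term highlighting is a standard review-platform feature; the mechanics can be sketched with a regular expression that wraps each hit in markers. The marker style and sample text below are invented:

```python
import re

def highlight(text, terms, marker="**"):
    """Wrap each case-insensitive occurrence of a search term in markers."""
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    return pattern.sub(lambda m: f"{marker}{m.group(0)}{marker}", text)

sample = "Payment routed via Oceanic Trust SA as instructed by J. Smith."
print(highlight(sample, ["Oceanic Trust SA", "J. Smith"]))
# → Payment routed via **Oceanic Trust SA** as instructed by **J. Smith**.
```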
Relativity has some very powerful analytics features of its own for dealing with large data sets, such as email threading and near-duplicate analysis. Email threading helps cut down the number of documents requiring full review by surfacing only the inclusive emails in each thread.
The last email in a thread: The last email in a particular thread will be marked
inclusive, because any text added in this last email (even just a “forwarded”
indication) will be unique to this email and this one alone. If nobody used
attachments, and nobody ever changed the subject line, or went “below the line”
to change text, this would be the only type of inclusiveness (Source).
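The “inclusive” idea quoted above can be sketched directly: an email is non-inclusive if its full text is contained within a later email in the same thread. Real threading engines also account for attachments, changed subject lines and below-the-line edits; this toy thread is invented:

```python
def inclusive_emails(thread):
    """Return the emails whose text is not fully contained in a later email."""
    keep = []
    for i, email in enumerate(thread):
        if not any(email in later for later in thread[i + 1:]):
            keep.append(email)
    return keep

# Toy thread in which each reply quotes the earlier emails in full.
thread = [
    "Please confirm the transfer.",
    "Confirmed, proceeding today.\n> Please confirm the transfer.",
    "Thanks.\n> Confirmed, proceeding today.\n> Please confirm the transfer.",
]
print(inclusive_emails(thread))  # only the final, inclusive email survives
```

In this thread only the last email is inclusive, so a reviewer reads one document instead of three.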
Manually reviewing the 11.5 million Panama Papers documents would be prohibitively costly. It would also take a great deal of time to work through the data set and require a large team of reviewers.
Technological advances in review platforms allow for an assisted option when reviewing a vast quantity of documents. Relativity, for example, features an assisted review module that identifies relevant and irrelevant documents, saving huge amounts of time and money.
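Each platform’s assisted review is proprietary, but the underlying idea — train a text classifier on a small seed set of human-coded documents and let it score the rest — can be sketched with a minimal word-count Naive Bayes in pure Python. The seed documents and labels are invented:

```python
import math
from collections import Counter

# Invented seed set: documents a human reviewer has already coded.
seed = [
    ("offshore shell company share transfer", "relevant"),
    ("beneficial owner hidden behind trust", "relevant"),
    ("office lunch menu for friday", "irrelevant"),
    ("printer maintenance schedule", "irrelevant"),
]

def train(seed_docs):
    """Count words per label to build a tiny Naive Bayes model."""
    counts = {"relevant": Counter(), "irrelevant": Counter()}
    for text, label in seed_docs:
        counts[label].update(text.split())
    return counts

def classify(counts, text):
    """Pick the label with the higher log-likelihood, with add-one smoothing."""
    vocab = set(w for c in counts.values() for w in c)
    best_label, best_score = None, float("-inf")
    for label, words in counts.items():
        total = sum(words.values())
        score = sum(
            math.log((words[w] + 1) / (total + len(vocab)))
            for w in text.split()
        )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(seed)
print(classify(model, "transfer of shares in offshore company"))  # relevant
print(classify(model, "friday printer lunch"))                    # irrelevant
```

Commercial assisted review uses far more sophisticated models and iterative human validation rounds, but the workflow — code a seed set, score the rest, review only what the model flags — is the same.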
For data sets such as the Panama Papers, the EDRM has undoubtedly helped to streamline the process and speed up the investigation. Similarly, any project that involves data to process and review would benefit from eDisclosure solutions over paper disclosure.
Written by James Merritt, Director at CityDocs Forensic Technology & eDisclosure.