This presentation was provided for an ARMA Indianapolis Chapter meeting.
How did I get the term “Dark Data”? Not from Darth Vader, but they do have some things in common.
I copied “Dark Matter”, because it also goes undetected yet still affects things (objects/solar systems) around it. This image was created by observing the gravitational effects on light and objects around the matter. No instrument can actually see the dark matter directly.
Dark Data is in everything digital that we create, yet we don’t see it.
Dark Data is hiding in the most unexpected places.
DCO (Device Configuration Overlay) – Used to reduce the reported disk size to exactly match the size of another hard drive. This makes it easier to clone hard drives.
HPA (Host Protected Area) – Used to store vendor utilities on a hard drive, where a user can’t delete them.
These areas are difficult to access, and difficult to add or remove.
Unformatted Disk Space is the remaining space that has not been allocated to a disk volume that the user can access.
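As a rough illustration of the last point, unformatted space can be estimated by comparing the partition layout against the disk's total sector count. This is a minimal sketch, not a forensic tool: the function name, the disk size, and the partition values are all made up for the example, and 512-byte sectors are assumed. DCO and HPA areas would typically not show up in such a calculation, because the drive leaves them out of the size it reports.

```python
def unallocated_gaps(total_sectors, partitions):
    """Return (start, length) sector ranges not covered by any partition.

    partitions is a list of (start_sector, sector_count) tuples.
    """
    gaps = []
    cursor = 0  # first sector not yet accounted for
    for start, length in sorted(partitions):
        if start > cursor:
            gaps.append((cursor, start - cursor))
        cursor = max(cursor, start + length)
    if cursor < total_sectors:
        gaps.append((cursor, total_sectors - cursor))
    return gaps


# Hypothetical 1,000,000-sector disk with two partitions.
partitions = [(2048, 500000), (600000, 300000)]
for start, length in unallocated_gaps(1_000_000, partitions):
    print(f"sectors {start}..{start + length - 1}: {length * 512} bytes unallocated")
```

Each range it prints is space that no volume covers, so ordinary file listings never show what is stored there.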
Many recovery tools falsely report their recovery success. Many of the files reported as successfully recovered are actually corrupted with fragments of other files.
Most Forensics Tools keep these files in the Exception Bin. Have you ever seen an investigation with an empty Exception Bin? What if the best evidence was hiding in that Exception Bin?!? Ex: A hidden TrueCrypt volume file that looks like random data.
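One common way to flag an unknown file that "looks like random data" is to measure its byte entropy. The sketch below is only a heuristic under stated assumptions: the 7.9 bits-per-byte threshold and the function names are chosen for illustration, and compressed formats (Zip, JPEG) also score high, so a high value is a reason to look closer rather than proof of encryption.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Bits of entropy per byte: 0.0 for constant data, 8.0 for a uniform byte mix."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_random(data: bytes, threshold: float = 7.9) -> bool:
    """Flag data whose entropy is at or above the (assumed) threshold."""
    return shannon_entropy(data) >= threshold


print(looks_random(b"plain text repeats itself " * 200))  # ordinary text
print(looks_random(os.urandom(65536)))                    # stand-in for encrypted data
```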
The list on the left was produced with Windows, as an extreme example. However, many eDiscovery tools don’t do much better than this. The list on the right was produced by a tool that specializes in accurately identifying thousands of file types. Notice the 3 Alternate Data Streams identified on the right. They weren’t just detected, but analyzed to catch any hidden file types.
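The difference between the two lists comes down to identifying files by content instead of by name. A minimal sketch of that idea checks the leading bytes of a file against a few well-known signatures. The table covers only five formats and the function names are illustrative; a specialized tool inspects far more than the first few bytes.

```python
# A few widely documented file signatures ("magic numbers").
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"PK\x03\x04", "Zip container (includes MS Office 2007 files)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE compound file (MS Office 97-2003)"),
]


def identify(header: bytes) -> str:
    """Return a type name based on the file's first bytes, or 'unknown'."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def identify_file(path: str) -> str:
    """Read only the first 16 bytes of a file and identify it by content."""
    with open(path, "rb") as handle:
        return identify(handle.read(16))


print(identify(b"%PDF-1.4\n"))       # content says PDF, whatever the extension
print(identify(b"just some notes"))  # no known signature
```

A file named `notes.txt` whose header matches the Zip signature is the kind of mismatch worth investigating.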
Many tools combine RAM slack with Drive Slack. This causes confusion when file carving for partial files, because these slacks come from different sources.
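To keep the two kinds of slack apart, it helps to compute them separately. The sketch below follows the definitions on the "Between Files" slide: RAM slack runs from the end of the file to the end of its last sector, and drive slack runs from there to the end of the cluster. The 512-byte sector and 8 sectors per cluster are assumed defaults, not values from the presentation.

```python
def slack_sizes(file_size, sector_size=512, sectors_per_cluster=8):
    """Return (ram_slack, drive_slack) in bytes for the file's last cluster."""
    cluster_size = sector_size * sectors_per_cluster
    # Bytes needed to pad the file out to a whole sector.
    ram_slack = -file_size % sector_size
    # Remaining whole sectors up to the end of the cluster.
    drive_slack = -(file_size + ram_slack) % cluster_size
    return ram_slack, drive_slack


ram, drive = slack_sizes(10_000)
print(f"RAM slack: {ram} bytes, drive slack: {drive} bytes")
```

For a 10,000-byte file this gives 240 bytes of RAM slack and 2,048 bytes of drive slack, which together round the file up to three 4,096-byte clusters.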
Common files may contain stowaways. bpp = bits per pixel
Step 1: Rename the file to be smuggled to ‘document.xml’ (I used a simple text file)
Step 2: Rename Word.docx to Word.zip
Step 3: Open Word.zip with WinZip
Step 4: Add the new smuggled ‘document.xml’ to Word.zip (in the root)
Step 5: Rename Word.zip to Word.docx
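The same steps can be approached from the detection side: list the entries of the Zip container and flag anything outside the places a Word 2007 package normally uses. This is a heuristic sketch, not a validator. The allowed names below are an assumption based on a typical .docx layout, and `build_demo_package` creates a stand-in Zip for the demonstration, not a document Word would open.

```python
import io
import zipfile

# Assumed layout of a typical Word 2007 package.
EXPECTED_ROOT_FILES = {"[Content_Types].xml"}
EXPECTED_FOLDERS = ("_rels/", "docProps/", "word/", "customXml/")


def foreign_entries(package_bytes):
    """Return Zip entry names that fall outside the expected package layout."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return [
            name for name in package.namelist()
            if name not in EXPECTED_ROOT_FILES
            and not name.startswith(EXPECTED_FOLDERS)
        ]


def build_demo_package(smuggle=False):
    """Build an in-memory Zip that mimics a .docx layout (illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("[Content_Types].xml", "<Types/>")
        package.writestr("_rels/.rels", "<Relationships/>")
        package.writestr("word/document.xml", "<document/>")
        if smuggle:
            # Mirrors Step 4: an extra 'document.xml' placed in the root.
            package.writestr("document.xml", "smuggled text")
    return buffer.getvalue()


print(foreign_entries(build_demo_package(smuggle=False)))
print(foreign_entries(build_demo_package(smuggle=True)))
```

To check a real file, read it with `open(path, "rb").read()` and pass the bytes to `foreign_entries`.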
This example shows an MS Outlook Form Template that was edited to remove part of a sentence. The deleted content is still there! When the paragraph/object shrank, the Stream Slack inherited the end of the paragraph. Existing Redaction tools use Microsoft libraries that ignore the Stream Slack.
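The effect can be imitated with a toy model: a stream stored in a fixed-size sector, where shortening the text updates the logical length but leaves the old bytes in place. This is a simplified simulation and not the OLE format itself; the 64-byte sector size, the function names, and the sample sentence are assumptions for illustration.

```python
SECTOR_SIZE = 64  # assumed size of the sector that holds the stream


def write_stream(sector: bytearray, data: bytes) -> int:
    """Overwrite the start of the sector and return the new logical length."""
    if len(data) > len(sector):
        raise ValueError("data does not fit in the sector")
    sector[:len(data)] = data
    return len(data)


def stream_slack(sector: bytearray, logical_length: int) -> bytes:
    """Return the bytes between the logical end of the stream and the sector end."""
    return bytes(sector[logical_length:])


sector = bytearray(SECTOR_SIZE)
length = write_stream(sector, b"The merger closes Friday; tell no one.")
length = write_stream(sector, b"The merger closes Friday.")  # the "redacted" edit

print(bytes(sector[:length]))                        # what the application shows
print(stream_slack(sector, length).rstrip(b"\x00"))  # what is left behind
```

A reader that stops at the logical length never sees the leftover text, which is why a redaction tool built on such a reader can miss it.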
Smuggled data is broken down into bits and substituted for picture data that doesn’t affect the visible image enough to be noticed. It may change just 1 bit per pixel, or fill the Field Slack. The smuggled data may also be encrypted before insertion.
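The "1 bit per pixel" case is often done by overwriting the least significant bit of each pixel value. The sketch below assumes each byte of the cover data is one 8-bit pixel value and skips real image file handling and encryption. The function names are illustrative and do not describe any particular steganography tool.

```python
def embed(pixels: bytes, payload: bytes) -> bytearray:
    """Hide payload in the least significant bit of successive pixel bytes."""
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("payload too large for cover data")
    stego = bytearray(pixels)
    for index, bit in enumerate(bits):
        stego[index] = (stego[index] & 0xFE) | bit  # replace only the lowest bit
    return stego


def extract(pixels: bytes, payload_length: int) -> bytes:
    """Rebuild payload_length bytes from the lowest bit of each pixel byte."""
    recovered = bytearray()
    for byte_index in range(payload_length):
        value = 0
        for bit_index in range(8):
            value = (value << 1) | (pixels[byte_index * 8 + bit_index] & 1)
        recovered.append(value)
    return bytes(recovered)


cover = bytes(range(256))  # stand-in for 256 grayscale pixel values
stego = embed(cover, b"hi")
print(extract(stego, 2))
print(max(abs(a - b) for a, b in zip(cover, stego)))  # largest change to any pixel
```

No pixel value changes by more than 1, which is why the altered image looks identical to the eye.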
Here are some methods for cleaning out and preventing Dark Data.
Researching tools that can track & redact metadata and Dark Data artifacts is vital in your fight against misconduct. If your IT department isn’t doing this, then you are your company’s last line of defense.
Dark Data Hiding in your Records: Opportunity or Danger?
Dark Data<br />Hiding in your Records<br />Opportunity or Danger?<br />Rob Zirnstein<br />President<br />Forensic Innovations<br />January 19th, 2011<br />
Darth Vader?<br />No, “Dark Data”, but they both<br />Are often associated with evil<br />Keep secrets (“Luke, I’m your father”)<br />Are potentially harmful<br />
Dark Matter?<br />No, “Dark Data”! But they both<br />Go undetected<br />Are surrounded by detectable stuff<br />Affect things around them<br />
What is Dark Data?<br />Dark Data in our digital devices<br />Everyone creates it (unintentionally)<br />Criminals may hide it (Anti-Forensics)<br />Forensic tools can’t see it<br />But it is there!<br />Data that we can’t see<br />On our hard drives<br />On our flash drives<br />In our computer files<br />
Where is Dark Data?<br />DCO & HPA<br />Unformatted Disk Space<br />Deleted Files<br />Unknown Files<br />Between Files<br />Inside Common Files<br />Deleted Data Objects<br />
Deleted Files<br />Deleted Files aren’t really gone?<br />Unused Disk Space (in a volume)<br />Disk Caches / Swap Files<br />Windows Recycle Bin<br />Are they hard to recover?<br />Fragmentation is deadly<br />Large databases tend to be heavily fragmented<br />Even DFRWS Researchers find that fragmentation can make some file types impossible to recover (http://www.dfrws.org/2007/challenge/results.shtml)<br />
Unknown Files (1)<br />500 types of files handled by eDiscovery, Document Management & Computer Forensics Tools<br />50,000+* types of files in the world<br />5,000 types of files typically in use<br />*http://filext.com<br />
Between Files<br />Alternate Data Streams (ADS)<br />Files hiding behind files (on NTFS)<br />RAM Slack<br />Padding between the end of a file and the end of the current sector<br />Typically zeros, sometimes random content<br />File/Cluster/Residual/Drive Slack<br />Padding between sectors used & the end of the current cluster<br />Previous sector content that should be used in File Carving<br />http://www.forensics-intl.com/def6.html<br />
Inside Common Files<br />Deleted Objects<br />Ex: Adobe PDF & MS Office 2003 (OLE) not removing deleted data (change tracking)<br />Smuggled Objects<br />Ex: MS Office 2007 (Zip) and MS Wave (RIFF) formats ignore foreign objects<br />Object / Stream Slack<br />Ex: OLE objects have sector size issues, just like with disk sectors<br />Field Slack<br />Ex: Image files that don’t use the whole palette, and/or less than 8/16/32/48 bpp<br />Steganography<br />
Smuggled Objects<br />Some formats ignore foreign objects<br />MS Office 2007 (Zip)<br />MS Wave (RIFF)<br />This example<br />I added a file to a Word 2007 document.<br />The document opens without any error.<br />
Deleted Data in Slack<br /> Deleted Data that evades Redaction<br />
Steganography<br />Intentional Data Hiding<br />
Dark Data Can Be Fragile<br />Deleting Files without using the Recycle Bin.<br />SHIFT + DEL<br />Defragmenting a hard drive.<br />Installing Applications.<br />Turning off “Track Changes” & “Fast Save” options.<br />Using Redaction Tools.<br />MS Word - http://redaction.codeplex.com<br />PDF - http://www.appligent.com/redax<br />PDF - http://www.rapidredact.com<br />Using Data Wipers.<br />SafeErase - http://www.oo-software.com<br />CyberScrub - http://www.cyberscrub.com<br />
Dangers<br />You may lose a lawsuit if the other side finds what you missed.<br />Corporate Digital Assets may be walking out the door.<br />Intellectual Property theft can put a company out of business.<br />
Opportunities<br />Protect your company by being Aware of your Digital Assets.<br />Illegal content may be hidden accidentally or intentionally.<br />Recover lost Digital Assets by knowing where to look.<br />Employee misconduct is tracked by the hidden trail of improper acts.<br />Catch Intellectual Property theft before it walks out the door.<br />Identify in-house criminals by detecting their smuggling methods.<br />
What Does FI Do?<br />Create Technologies to Capture Dark Data<br />File Investigator<br />File Expander<br />File Harvester<br />Equip Law Enforcement with Tools<br />FI TOOLS<br />FI Object Explorer<br />FI Data Profiler Portable<br />
FI Technologies<br />File Investigator<br />Discovers Files Masquerading as Other Types<br />Identifies 3,953+ File Types<br />High Accuracy & Speed<br />File Expander<br />Discovers Hidden Data within files<br />Data missed by all forensic tools<br />File Harvester (Under Development)<br />Recovers deleted/lost files the rest of the industry can’t<br />Will eventually rebuild partial files<br />