Here we have placed the string “test\n” in front of a valid 7zip file.
Because the file no longer starts with the 7zip start marker and instead begins with plaintext and a newline, the UNIX ‘file’ utility misidentifies it as a data file. p7zip, on the other hand, begins interpreting the file at the 7zip header, so the file is still a valid 7zip archive.
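The difference between the two tools can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual logic of `file` or p7zip: the helper names are made up, and `fake_archive` is a placeholder that carries the 7z signature but no real archive contents.

```python
# The six-byte signature that opens every 7z archive.
SEVENZIP_MAGIC = b"7z\xbc\xaf\x27\x1c"


def starts_with_magic(data: bytes, magic: bytes = SEVENZIP_MAGIC) -> bool:
    """Strict identification: the signature must sit at offset 0."""
    return data.startswith(magic)


def find_magic(data: bytes, magic: bytes = SEVENZIP_MAGIC) -> int:
    """Lenient identification: offset of the first signature, or -1."""
    return data.find(magic)


# Placeholder archive (assumption): signature plus filler bytes only.
fake_archive = SEVENZIP_MAGIC + b"\x00" * 16

# Prepend the same junk used in the example above.
prefixed = b"test\n" + fake_archive

print(starts_with_magic(prefixed))  # False: offset 0 holds plaintext
print(find_magic(prefixed))         # 5: the signature follows "test\n"
```

A tool that only inspects the first bytes gives up on this file, while one that scans forward for the header still finds the archive.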
Here, while saving a GIF in GIMP, we write a PHP backdoor into a comment. This will be mostly ignored when parsing the file as an image, but as PHP only interprets code between its start and end markers “<?php” and “?>”, the image data will not affect the execution of the script.
The backdoor is written directly into the file.
Here is the combination PDF and 7zip file we’ve created, opened as a PDF.
Then, we change the file extension (though this actually should be unnecessary) and list the contents of the embedded 7zip archive.
This is a JPEG file. It looks ordinary and parses correctly.
When we interpret the same file as a RAR archive, we find that we have a valid archive, too! This RAR archive was simply appended to the end of our original JPEG. While it is possible to append a RAR to the end of a JPEG and get a file which opens as either format, it is not possible to append a JPEG to the end of a RAR and achieve the same results. This is due to the use of absolute offsets in the JPEG format which must be adjusted to point to the correct resources.
Before the fix was put in place, book sharing threads were fairly common on 4chan: people appended RAR files containing ebooks to JPEGs of the matching book covers. Others could download a JPEG, change the extension to .rar, and get an ebook of the book shown on the cover.
Dan Crowley - Jack Of All Formats
Jack of all Formats<br />Daniel “unicornFurnace” Crowley<br />Penetration Tester, Trustwave - SpiderLabs<br />
Introductions<br />How can files be multiple formats?<br />Why is this interesting from a security perspective?<br />What can we do about it?<br />(yodawg we heard you like files so we put files in your files)<br />
Terms<br />File piggybacking<br />Placing one file into another<br />File consumption<br />Parsing a file and interpreting its contents<br />
Scope of this talk<br />Files which can be interpreted as multiple formats<br />…with at most a change of file extension<br />Covert channels<br />Through use of piggybacking<br />Examples are mostly Web-centric<br />Only because it’s my specialty<br />This concept applies to more than Web applications<br />Srsly this applies to more than Web applications<br />GUYS IT’S NOT JUST WEB APPS<br />
Files with multiple formats<br />How to piggyback files<br />
File format flexibility<br />Not always rigidly defined<br />From the PDF specification: “This standard does not specify the following: … methods for validating the conformance of PDF files or readers …”<br />Thank you Julia Wolf for “OMG WTF PDF”<br />CSV comments exist but are not part of the standard<br />Not all data in a file is parsed<br />Metadata<br />Unreferenced blocks of data<br />Data outside start/end markers<br />Reserved, unused fields<br />
File format flexibility<br />Some data can be interpreted multiple ways<br />Method of file consumption often determined by:<br />File extension<br />Multiple file extensions may result in multiple parses<br />Bytes at beginning of file<br />First identified file header<br />
7zip file with junk data at the beginning<br />
Metadata<br />Information about the file itself<br />Not always parsed by the file consumer<br />“Comment” fields, few restrictions on data<br />Files can be inserted into comment fields for one format<br />ID3 tags for mp3 files will be shown in players<br />But not usually interpreted<br />
Unreferenced blocks of data<br />Certain formats define resources with offsets and sizes<br />Unmentioned parts of the file are ignored<br />Other files can occupy unmentioned space<br />Other formats indicate a total size of data to be parsed<br />Any additional data is ignored<br />Other files can simply be appended<br />
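A minimal sketch of the "simply appended" case, under some loud assumptions: it uses ZIP as a stand-in for RAR because Python's standard library can read ZIP but not RAR, and `fake_jpeg` is a placeholder holding only the JPEG start and end markers rather than a real image. ZIP readers find the archive's directory from the end of the file, so data in front of the archive does not stop it from opening.

```python
import io
import zipfile

# Placeholder "image" (assumption): JPEG start-of-image and
# end-of-image markers around filler bytes, not a viewable picture.
fake_jpeg = b"\xff\xd8" + b"\x00" * 32 + b"\xff\xd9"

# Build a small ZIP archive in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("book.txt", "contents of the hidden file")
archive = buf.getvalue()

# Piggyback by plain concatenation: image first, archive after.
polyglot = fake_jpeg + archive

# The combined bytes still begin like a JPEG...
print(polyglot[:2] == b"\xff\xd8")

# ...and still open as an archive, despite the leading image data.
with zipfile.ZipFile(io.BytesIO(polyglot)) as zf:
    names = zf.namelist()
    text = zf.read("book.txt").decode()
print(names, text)
```

Neither half was modified to make this work; the only operation is concatenation.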
Unreferenced PDF object<br />PDF xref table, lists object offsets in the file<br />We first remove one reference<br />Next, we replace part of that object’s content…<br />
Unreferenced PDF object<br />…with a 7zip file.<br />
Start/End markers<br />Many formats use a magic byte sequence to denote the beginning of data<br />Similarly, many have one to denote the end of data<br />Data outside start/end markers is ignored<br />Files can be placed before or after such markers<br />Files must not contain conflicting markers<br />
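The marker rules above can be illustrated with a toy parser. The markers `<<BEGIN>>` and `<<END>>` and the function name are hypothetical, chosen only for this sketch; real formats use their own byte sequences.

```python
from typing import Optional


def extract_between(data: bytes, start: bytes, end: bytes) -> Optional[bytes]:
    """Return the bytes between the first start marker and the next end marker."""
    begin = data.find(start)
    if begin == -1:
        return None
    begin += len(start)
    finish = data.find(end, begin)
    if finish == -1:
        return None
    return data[begin:finish]


START, END = b"<<BEGIN>>", b"<<END>>"  # hypothetical markers

# Data placed before and after the markers is ignored by the parser.
wrapped = b"other file here" + START + b"payload" + END + b"more data"
print(extract_between(wrapped, START, END))  # b'payload'

# If the surrounding data contains its own start marker, the parser
# locks onto the wrong one and the payload is no longer recovered.
conflicted = b"junk" + START + b"oops" + wrapped
print(extract_between(conflicted, START, END))
```

The second call shows why the surrounding file must not contain conflicting markers: the parser has no way to tell which start marker was intended.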
Limitations<br />Some formats use absolute offsets<br />They must be placed at start of file or offsets must be adjusted<br />Examples: JPEG, BMP, PDF<br />Some have headers which indicate the size of each resource to follow<br />Such files are usually easy to work with<br />Other files can be appended without breaking things<br />Examples: RAR<br />
Limitations<br />Some files are simply parsed from start to end<br />Such files require some metadata, unreferenced space, or data which can be manipulated to have multiple meanings<br />Different parsers for the same format operate differently<br />Might implement different non-standard features<br />May interpret format of files in different ways<br />
TrueCrypt volumes<br />No start/end markers<br />No publicly known signature<br />Parsed from start of file to end of file<br />No metadata fields<br />No unused space<br />Data is difficult to manipulate<br />
Security Implications<br />Reasons why file piggybacking must be considered<br />
Security Implications<br />File upload pwnage<br />Checking for well-formed images doesn’t prevent backdoor upload<br />Anti-Virus evasion<br />Some AV products detect the format of the file being scanned, then apply format-specific rules<br />If a file is multiple formats, the wrong rules might be applied<br />Data infiltration/exfiltration<br />Do you care what .mp3 files pass in and out of your network?<br />How about .exe and .doc files?<br />
Security Implications<br />Multiple file consumers<br />Different programs may interpret the file in different ways<br />GIFAR issue<br />Parasitic storage<br />How many file uploads allow only valid images?<br />Disk space exhaustion DoS<br />Some image uploads limit uploads by picture dimensions<br />Size of the file may not actually be checked<br />
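The gap between a dimension check and a size check can be sketched as follows. A GIF header is a six-byte signature followed by the width and height as 16-bit little-endian integers, so the declared dimensions say nothing about how many bytes follow. The limits `MAX_DIM` and `MAX_BYTES` and the function names are assumptions made for this example.

```python
import struct
from typing import Tuple

MAX_DIM = 100          # hypothetical per-side pixel limit
MAX_BYTES = 64 * 1024  # hypothetical upload size limit


def gif_dimensions(data: bytes) -> Tuple[int, int]:
    """Read the declared width and height from a GIF header."""
    if len(data) < 10 or data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF header")
    width, height = struct.unpack("<HH", data[6:10])
    return width, height


def passes_dimension_check(data: bytes) -> bool:
    width, height = gif_dimensions(data)
    return width <= MAX_DIM and height <= MAX_DIM


def passes_size_check(data: bytes) -> bool:
    return len(data) <= MAX_BYTES


# A header declaring a 1x1 image, followed by 1 MiB of extra data.
header = b"GIF89a" + struct.pack("<HH", 1, 1)
padded = header + b"\x00" * (1024 * 1024)

print(gif_dimensions(padded))          # (1, 1)
print(passes_dimension_check(padded))  # True
print(passes_size_check(padded))       # False
```

An upload handler that only runs the first check accepts the padded file; checking the byte length separately closes that gap.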
File upload pwnage<br />Imagine a Web-based image upload utility<br />It confirms that the uploaded file is a valid JPEG<br />It doesn’t check the file extension<br />It uploads the file into the Web root<br />It doesn’t set the permissions to disallow execution<br />Code upload is possible if the file is also a valid JPEG<br />This isn’t hard…<br />
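One possible hardening of the upload utility described above, as a rough sketch rather than a complete defense. It addresses the missing extension check by allowlisting extensions and discarding the client-supplied filename. The allowlist, the marker scan, and the function name are assumptions for illustration; storing uploads outside the Web root and disabling execution there remain necessary and are not shown.

```python
import os
import secrets

ALLOWED_EXTENSIONS = {".jpg", ".jpeg"}  # hypothetical allowlist
PHP_OPEN_TAG = b"<?php"                 # heuristic only, not exhaustive


def safe_upload_name(original_name: str, data: bytes) -> str:
    """Validate an upload and return a server-chosen filename."""
    ext = os.path.splitext(original_name)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError("extension not allowed")
    # JPEG data begins with the start-of-image marker FF D8.
    if not data.startswith(b"\xff\xd8"):
        raise ValueError("not a JPEG")
    # Reject files carrying a PHP open tag anywhere in the bytes.
    if PHP_OPEN_TAG in data.lower():
        raise ValueError("embedded script marker")
    # Never reuse the client's name; generate a random one.
    return secrets.token_hex(16) + ext


clean = b"\xff\xd8" + b"\x00" * 32 + b"\xff\xd9"
print(safe_upload_name("cat.JPG", clean))
```

The extension check is the part that matters most here: a file that is both a valid JPEG and valid PHP is harmless if the server will only ever serve it as an image.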
Anti-Virus evasion exercise<br />Check detection rates on Win32 netcat<br />Place it in an archive and check<br />Put junk data at the beginning of the file and check<br />Piggyback the archive onto the end of a JPEG and check<br />Change the file extension to .JPG and check<br />
Without breaking the piggybacked files’ usability<br />
Parasitic storage<br />Certain sites allow for file upload of specific formats<br />File piggybacking essentially removes this limitation<br />This technique has been used on 4chan (now fixed)<br />Book sharing threads<br />LOIC distribution<br />CP distribution<br />Still works on ImagesHack.Us