nullcon 2011 - Fuzzing with Complexities


Published on

Fuzzing with Complexities by Vishwas Sharma

Published in: Technology

  1. 1. Fuzzing with complexities<br />Vishwas Sharma<br /><br /><br />
  2. 2. Introduction<br />We have all witnessed major threats in recent years, and names like ‘Conficker’ (1), ‘Stuxnet’ (2) and ‘Aurora Project’ (3) are hard to forget. Each of these threats had a distinctive delivery system based on exploiting the host operating system and then taking control of it.<br />These threats are always present. The best we can hope to achieve is to find a vulnerability before an attacker does, and to do something about it.<br />Software companies spend a lot of time and money making their products more stable, more reliable and more secure.<br />Since Vista, Microsoft has made sure that functions like strcpy and sprintf are eliminated as part of its Security Development Lifecycle (SDL).<br /><br /><br />
  3. 3. Introduction<br />In fact, all major vendors have realized the importance of a secure development lifecycle and of testing in their products.<br />Google and Mozilla (for Firefox) have a policy of rewarding any researcher who reports a bug or a resulting exploit.<br /><br /><br />Figure 1: Microsoft Simplified SDL (4)<br />
  4. 4. Software Testing<br />Software testing is any activity aimed at evaluating an attribute or capability of a program or system and determining that it meets its required results.<br />Unlike most physical systems, most of the defects in software are design errors, not manufacturing defects<br /><br /><br />
  5. 5. Code Coverage<br />Code coverage is one of the most important metrics used to judge the completeness of a set of test cases. This metric gives us the relationship between the tests conducted and the instructions executed within the application.<br /><br /><br />
  6. 6. Code Coverage<br />This metric can be broken down into more detailed metrics:<br />Function coverage - Has each function (or subroutine) in the program been called?<br />Statement coverage - Has each node in the program been executed?<br />Decision coverage - Has every edge in the program been executed? For instance, have the requirements of each branch of each control structure (such as in IF and CASE statements) been met as well as not met?<br />Condition coverage - Has each Boolean sub-expression evaluated both to true and false?<br />Condition/decision coverage - Both decision and condition coverage should be satisfied.<br /><br /><br />
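The difference between decision and condition coverage is easiest to see on a single `if` with two conditions. The following is a minimal sketch, not a coverage tool: the function `check` and the helper names are invented for illustration, and short-circuit evaluation is deliberately ignored (each condition is judged by the value supplied for it in the test).

```python
def check(a: bool, b: bool) -> str:
    # One decision built from two Boolean conditions.
    if a and b:
        return "granted"
    return "denied"


def decision_outcomes(tests):
    """Outcomes that the whole decision `a and b` takes over the tests."""
    return {a and b for a, b in tests}


def condition_outcomes(tests):
    """Values that each individual condition takes over the tests."""
    return ({a for a, _ in tests}, {b for _, b in tests})


def full_decision(tests) -> bool:
    return decision_outcomes(tests) == {True, False}


def full_condition(tests) -> bool:
    return all(values == {True, False} for values in condition_outcomes(tests))


# Each condition is both true and false, yet the decision is never true.
only_condition = [(True, False), (False, True)]
# The decision is both true and false, yet `a` is never false.
only_decision = [(True, True), (True, False)]
# Satisfies both criteria at once (condition/decision coverage).
both = [(True, True), (False, False)]
```

Neither criterion implies the other, which is why the combined condition/decision criterion asks for both.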
  7. 7. Code Coverage<br /><br /><br />An example of Code coverage<br />
  8. 8. Code Coverage<br /><br /><br />Tests needed to find bugs<br />Tests needed for coverage<br />The figure shows that even with good coverage, some bugs can still go undetected.<br />
  9. 9. BlackBox Testing<br />When the tester has no knowledge of the inner workings of the software, of the protocol, or of the kind of input expected, the situation is aptly named black-box testing.<br /><br /><br />
  10. 10. Whitebox Testing<br />Information on internal data structures and algorithms is fully shared between the product development team and the testing team.<br />This information can be used for API testing, code coverage, fault injection, mutation testing and more.<br /><br /><br />
  11. 11. Fuzzing<br />Credit for first formulating and working on this technique goes to Barton Miller and his students at the University of Wisconsin-Madison in 1989.<br />In simple words, it is a technique in which invalid, mutated or malformed input is repeatedly supplied to an application with the sole intention of finding bugs in it.<br />Fuzzing is observed to be most effective against applications developed in C/C++, because these languages make the programmer responsible for memory management, whereas managed code (developed in C#, Java, etc.) yields bugs of a very different class.<br /><br /><br />
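To make the point concrete, here is a small sketch in which one test executes every line of a function and a bug still remains. The function `average` and the helper `lines_executed` are made up for this illustration; the helper uses Python's `sys.settrace` hook to record which lines ran, which is only a rough stand-in for a real coverage tool.

```python
import sys


def average(values):
    total = sum(values)
    return total / len(values)  # divides by zero for an empty list


def lines_executed(func, *args):
    """Return the executed line offsets of func, relative to its def line."""
    hit = set()
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is code and event == "line":
            hit.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)
    return hit


# One ordinary test reaches both body lines, i.e. full statement coverage.
covered = lines_executed(average, [1, 2, 3])
```

The input that triggers the bug, `average([])`, adds nothing to coverage, so a coverage-driven test suite has no reason to include it.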
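The idea of repeatedly supplying mutated input can be sketched in a few lines. This is a toy under stated assumptions: `toy_parser` is a hypothetical target (one length byte followed by a payload) and a Python exception stands in for a crash; a real fuzzer would launch the target application and monitor it for faults.

```python
import random


def mutate(seed: bytes, rng: random.Random, max_changes: int = 4) -> bytes:
    """Return a copy of seed with a few randomly chosen bytes overwritten."""
    data = bytearray(seed)
    for _ in range(rng.randint(1, max_changes)):
        pos = rng.randrange(len(data))
        data[pos] = rng.randrange(256)
    return bytes(data)


def toy_parser(packet: bytes) -> bytes:
    """Hypothetical target: one length byte, then that many payload bytes."""
    length = packet[0]
    payload = packet[1:1 + length]
    if len(payload) != length:
        raise ValueError("truncated payload")  # stands in for a crash
    return payload


def fuzz(seed: bytes, iterations: int = 1000, rng_seed: int = 0):
    """Feed mutated inputs to the target and collect the ones that fail."""
    rng = random.Random(rng_seed)
    failures = []
    for _ in range(iterations):
        case = mutate(seed, rng)
        try:
            toy_parser(case)
        except Exception as exc:
            failures.append((case, exc))
    return failures
```

Calling `fuzz(b"\x03abc")` starts from a valid packet and keeps every mutated input that the parser rejects. Seeding the random generator makes a run reproducible, which matters when a failing case has to be replayed.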
  12. 12. Fuzzing<br /><br /><br />
  13. 13. Fuzzing<br /><br /><br />There is an important distinction between fuzzing and other testing activities: the intent.<br />A testing team knows a lot about the program and essentially tests whether it behaves as it is supposed to, whereas a security researcher only cares whether the fuzzer crashes the application under test.<br />
  14. 14. Fuzzer<br /><br /><br />I would like to note two Python-based fuzzing frameworks available in the open source community that I use most extensively.<br />PeachFuzzer - Peach is a SmartFuzzer that is capable of performing both generation and mutation based fuzzing (10).<br />Sulley - Sulley is a fuzzer development and fuzz testing framework consisting of multiple extensible components. Sulley (IMHO) exceeds the capabilities of most previously published fuzzing technologies, commercial and public domain.<br />
  15. 15. Fuzzer<br /><br /><br />Peach Fuzzing Platform<br />
  16. 16. Fuzzer<br />Peach is being improved continuously, and it is the only open source fuzzer that is still maintained apart from the Metasploit fuzzer. Peach is written as a primary data fuzzer, but because it is open source it can be extended into a secondary or even nth-class fuzzer. Peach is also used by Adobe in its testing of Adobe Reader.<br />Sulley is not maintained, but it is as good as you can get when it comes to generation-based fuzzing.<br />Collection of fuzzers<br /><br /><br />
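Generation-based fuzzing builds each test case from a description of the format instead of mutating a captured sample. The sketch below does not use the Peach or Sulley APIs; `MODEL`, `ANOMALIES` and `generate` are hypothetical names, and the model is a simplified IRC-style line chosen only for illustration.

```python
# Hypothetical model of an IRC-style line: COMMAND SP PARAM CRLF.
# The first value of each field is treated as its valid default.
MODEL = [
    ("command", ["NICK", "JOIN", "PRIVMSG"]),
    ("space", [" "]),
    ("param", ["guest", "#chan"]),
    ("crlf", ["\r\n"]),
]

# Values a generation fuzzer might substitute for any field.
ANOMALIES = ["", "A" * 1024, "%n%n%n%n", "\x00", "-1"]


def generate(model, anomalies):
    """Yield (field name, line) pairs with exactly one field made anomalous."""
    defaults = [values[0] for _, values in model]
    for index, (name, _) in enumerate(model):
        for bad in anomalies:
            fields = list(defaults)
            fields[index] = bad
            yield name, "".join(fields).encode("latin-1")


cases = list(generate(MODEL, ANOMALIES))
```

Because every case differs from a valid line in exactly one field, a failure can be attributed to that field directly.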
  17. 17. Complexity<br />“Software bugs will almost always exist in any software module with moderate size: not because programmers are careless or irresponsible, but because the complexity of software is generally intractable -- and humans have only limited ability to manage complexity. It is also true that for any complex systems, design defects can never be completely ruled out” - Jiantao Pan, Carnegie Mellon University<br />In many fuzzers, the test cases produced fail to pass even the basic packet sanitization checks of the target application if the fuzzer has an improper understanding of the input type and structure.<br /><br /><br />
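A small sketch shows why a fuzzer without knowledge of the structure rarely gets past input validation. The packet layout here (magic, length, payload, CRC32 trailer) is a hypothetical format invented for the example, not a real protocol.

```python
import struct
import zlib

MAGIC = b"PKT1"  # hypothetical format marker


def build(payload: bytes) -> bytes:
    """Frame a payload: magic, big-endian length, payload, CRC32 of payload."""
    return (MAGIC + struct.pack(">H", len(payload)) + payload
            + struct.pack(">I", zlib.crc32(payload)))


def accepts(packet: bytes) -> bool:
    """Sanity checks a target typically runs before any deeper parsing."""
    if len(packet) < 10 or packet[:4] != MAGIC:
        return False
    (length,) = struct.unpack(">H", packet[4:6])
    payload, trailer = packet[6:6 + length], packet[6 + length:]
    if len(payload) != length or len(trailer) != 4:
        return False
    return struct.unpack(">I", trailer)[0] == zlib.crc32(payload)


def dumb_mutation(packet: bytes, pos: int, value: int) -> bytes:
    """Overwrite one byte of the finished packet without repairing anything."""
    data = bytearray(packet)
    data[pos] = value
    return bytes(data)


def aware_mutation(payload: bytes, pos: int, value: int) -> bytes:
    """Mutate the payload, then rebuild length and checksum around it."""
    data = bytearray(payload)
    data[pos] = value
    return build(bytes(data))
```

With `good = build(b"hello")`, the packet `dumb_mutation(good, 6, 0xFF)` is rejected by `accepts` because the checksum no longer matches, while `aware_mutation(b"hello", 0, 0xFF)` carries the same mutated byte past the checks and into the code that is actually worth testing.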
  18. 18. Complexity<br />Microsoft conducted a study on 450 lines of code, testing it with various fuzzing combinations to compare how effective each was. The results are shown below:<br /><br /><br />Analysis of the effort required to produce each fuzzer and the defects found, correlated with the kind of fuzzer<br />
  19. 19. Packets<br />An example of an ASCII-based packet (IRC)<br />A few other well-known examples:<br />HTML<br />CSS<br />FTP<br />And many more<br /><br /><br />
  20. 20. Binary based Packets<br /><br /><br />But what happens when a format no longer sticks to one data format? What happens when the data switches from ASCII to binary and then back to ASCII again? To top it off, sections can be encoded differently; even the ASCII portions can be encoded, and content can be imported from other binary or ASCII-based formats.<br />
  21. 21. Example of one such format<br /><br /><br />An example of one such complex format: PDF<br />We see such formats in everyday applications: office documents, Adobe PDF, SMB protocols and more. One cannot fuzz these files randomly, because the applications have fairly good input validation modules that defeat any dumb attempt to fuzz them.<br />
  22. 22. What we know so far<br /><br /><br />What we have gathered so far is summarized here. As we move ahead, you will find answers to these problems.<br />
  23. 23. Some answers<br /><br /><br />Code coverage fails for these applications<br />Protocol awareness can help here: once we have all the information we can gather about a protocol, we can reason that the packet containing the largest number of tags or objects will exercise the most code in the module that handles it. One could object that code coverage still cannot be guaranteed, because we may never find a single packet that contains every tag or object.<br />Testing all cases in one go was never the idea; multiple tests that together cover every tag are what will be fruitful.<br />Data format inconsistency<br />One can easily write a fuzzer for either an ASCII-based packet or a binary-based packet. But when these formats come together in one packet, writing one becomes unusually difficult.<br />The solution lies in visualizing the problem and breaking it into the parts we are most comfortable with. We can separate the data generation capability for the ASCII format from that for the binary format. Note that what I am separating here is the data generation capability, not necessarily the fuzzing itself.<br />
  24. 24. Some answers<br /><br /><br />Multiple files embedded in a single packet<br />Having separated the data types, we can separate further into a secondary-level data production module, i.e. a different level of data generation. This means that if a PDF file has a font and an image embedded inside it, we can write one fuzzer for the font and another for the image, and combine the output of each with the PDF file in a manner similar to the multiple encoding level problem.<br />Multiple encoding levels<br />Because we have separated ASCII from binary within the same format, we can add custom encoding to each part as we like. The parts all fall back together when we combine them later. See the case study for more clarification.<br />For example, if a PDF file has multiple fonts embedded inside it, we can use a different encoder for each font, since each is generated separately.<br />
  25. 25. Strategy <br /><br /><br />Now is the right time to talk about the strategy I used when fuzzing one such format, PDF. You will find different definitions of these terms, but this is how I understand them. The process is typically described in terms of the system under test and directed at an area within that system, whereas in my study I have taken it out of that setting and applied these conditions to the data packet itself.<br />
  26. 26. Attack point selection<br /><br /><br />Attack Point Selection<br />Attack point selection is a simple process in which I specify a particular point within the packet that needs to be tested. The selection of these points depends heavily on gathered intelligence about the system, including previous vulnerabilities, because this eliminates attack points that have already been attacked. For example, consider a simple PDF file containing a U3D file, which is known to have caused a vulnerability in Adobe Reader. After looking at that vulnerability, one can say that this format has already been tested heavily, so finding another vulnerability there would take much more effort. One can instead focus time and energy on finding other routes into the application that security researchers have not yet tested.<br />
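Separating the data generation for the ASCII and binary parts might look like the sketch below. Both generators and the packet layout are hypothetical; the point is only that each part is produced by its own generator and the parts are joined afterwards.

```python
import struct


def ascii_header_cases():
    """ASCII generator: variations of a hypothetical 'Version' header line."""
    for value in ("1", "0", "-1", "9" * 64):
        yield f"Version: {value}\r\n".encode("ascii")


def binary_record_cases():
    """Binary generator: a hypothetical record of uint16 count + uint32 size."""
    for count, size in ((1, 16), (0, 0), (0xFFFF, 16), (1, 0xFFFFFFFF)):
        yield struct.pack(">HI", count, size)


def combine(generators):
    """Vary one part at a time; other parts keep their first (valid) case."""
    cases = [list(gen()) for gen in generators]
    for index, variants in enumerate(cases):
        for variant in variants[1:]:
            chosen = [part[0] for part in cases]
            chosen[index] = variant
            yield b"".join(chosen)


packets = list(combine([ascii_header_cases, binary_record_cases]))
```

Each generator only has to understand its own data format, so adding a third part means adding a generator to the list, not rewriting the existing ones.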
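The embedding and encoding steps can be sketched for PDF as follows. The helper `pdf_stream_object` is a hypothetical name; it wraps a payload that was generated separately (for example by a font fuzzer) in a PDF stream object, optionally applying Flate compression. It is a simplification: a real embedded TrueType font stream carries further dictionary entries such as `/Length1`, and a complete PDF also needs a cross-reference table and trailer.

```python
import zlib


def pdf_stream_object(obj_num: int, payload: bytes, compress: bool = True) -> bytes:
    """Wrap a separately generated payload in a PDF stream object."""
    entries = []
    if compress:
        payload = zlib.compress(payload)  # the encoding level for this part
        entries.append(b"/Filter /FlateDecode")
    # /Length must describe the encoded bytes, so compute it after encoding.
    entries.insert(0, b"/Length %d" % len(payload))
    dictionary = b"<< " + b" ".join(entries) + b" >>"
    return (b"%d 0 obj\n" % obj_num + dictionary
            + b"\nstream\n" + payload + b"\nendstream\nendobj\n")


# Two payloads from two hypothetical fuzzers, each with its own encoding.
font_obj = pdf_stream_object(5, b"FUZZED-FONT-BYTES", compress=True)
image_obj = pdf_stream_object(6, b"FUZZED-IMAGE-BYTES", compress=False)
```

Because the length and filter entries are derived after encoding, the fuzzed payload stays consistent with its container, and each embedded object can use a different encoder.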
  27. 27. Directed Fuzzing<br /><br /><br />To Fuzz<br />When a vulnerability is disclosed, very little information is released with it. One example of such a disclosure:<br /> Adobe Flash Player Multiple Tag JPEG Parsing Remote Code Execution Vulnerability<br />-- Vulnerability Details:<br />This vulnerability allows remote attackers to execute arbitrary code on vulnerable installations of Adobe Flash Player. User interaction is required in that a target must visit a malicious website. The specific flaw exists within the code for parsing embedded image data within SWF files. The DefineBits tag and several of its variations are prone to a parsing issue while handling JPEG data. Specifically, the vulnerability is due to decompression routines that do not validate image dimensions sufficiently before performing operations on heap memory. An attacker can exploit this vulnerability to execute arbitrary code under the context of the user running the browser.<br />Figure 7: An example of Vulnerability disclosure<br />
  28. 28.<br /><br />Figure 7: An example of Vulnerability disclosure<br />Demo<br />CVE-2010-2862<br />Integer overflow in CoolType.dll in Adobe Reader 8.2.3 and 9.3.3, and Acrobat 9.3.3, allows remote attackers to execute arbitrary code via a TrueType font with a large maxCompositePoints value in a Maximum Profile (maxp) table.<br />
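A disclosure like this names the attack point precisely: one field in one table of an embedded font. As a hedged sketch of how a directed test case might target that field (this is not the demo from the talk), the code below locates the `maxp` table through the TrueType table directory and overwrites `maxCompositePoints`, which sits at byte offset 10 in a version 1.0 `maxp` table. The function names and the `synthetic_font` stand-in are assumptions made for illustration, and the table checksum is deliberately not recomputed.

```python
import struct

# Byte offset of maxCompositePoints inside a version 1.0 maxp table.
MAXP_MAX_COMPOSITE_POINTS = 10


def find_table(font: bytes, tag: bytes):
    """Return (offset, length) of a table from the sfnt table directory."""
    (num_tables,) = struct.unpack_from(">H", font, 4)
    for i in range(num_tables):
        # 12-byte header, then 16-byte records: tag, checksum, offset, length.
        rec_tag, _checksum, offset, length = struct.unpack_from(
            ">4sIII", font, 12 + 16 * i)
        if rec_tag == tag:
            return offset, length
    raise KeyError(tag)


def set_max_composite_points(font: bytes, value: int) -> bytes:
    """Return a copy of the font with maxp.maxCompositePoints overwritten."""
    offset, length = find_table(font, b"maxp")
    if length < MAXP_MAX_COMPOSITE_POINTS + 2:
        raise ValueError("maxp table too short to hold this field")
    data = bytearray(font)
    struct.pack_into(">H", data, offset + MAXP_MAX_COMPOSITE_POINTS, value)
    return bytes(data)


def synthetic_font() -> bytes:
    """Hypothetical stand-in: a directory plus a zeroed maxp table, not a usable font."""
    header = struct.pack(">IHHHH", 0x00010000, 1, 16, 0, 0)
    maxp = struct.pack(">IH", 0x00010000, 0) + bytes(26)  # 32-byte v1.0 layout
    record = struct.pack(">4sIII", b"maxp", 0, len(header) + 16, len(maxp))
    return header + record + maxp
```

For example, `set_max_composite_points(synthetic_font(), 0xFFFF)` yields a blob that differs from the input only in that one 16-bit field. With a real font, the result would then be embedded in a PDF and opened in the application under test.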