Joxean Koret - Interactive Static Analysis Tools for Vulnerability Discovery [Rooted CON 2013]



The talk focuses on a static code analysis tool, currently under development, aimed specifically at finding vulnerabilities rather than the typical programming errors targeted by the most popular code analysis tools, such as Coverity or Klocwork. Along the way it gives the background needed to understand how these tools work, the difference between bug-finding and vulnerability-finding tools, and what the speaker considers fundamental: making this kind of tool interactive.

The talk closes with a short demo of the current tool and some of the bugs/vulnerabilities found with it.

Published in: Technology


  1. 1. Interactive Static Analysis Tools for Vulnerability Discovery (Fugue) Joxean Koret
  2. 2. Static Analysis Tools
     ● What are they?
       – Tools to find properties of a given piece of software without actually executing it.
       – The “properties” I look for in this case are bugs/vulnerabilities.
     ● We need good static analysis tools for performing software audits.
  3. 3. Why?
     ● Software is becoming bigger and bigger.
     ● And so, harder to analyze.
       – Examples: Firefox, Google Chrome, MS Office...
     ● Auditing software like this by hand is tedious and takes a long time.
     ● Fuzzing is good for finding vulnerabilities in such big products.
       – But it is not the solution (neither is SA, I think).
       – It is just another useful tool.
     7/04/13
  4. 4. Why?
     ● The typical old vulnerabilities that a quick manual code audit could find are almost gone, bye-bye!
       – strcpy, memcpy, sprintf, syslog, etc...
     ● There are no vulnerabilities like these in highly audited code bases (except maybe sudo or freetype...).
       – Apache, Firefox, Google Chrome...
     ● We need better tools.
       – My approach: static analysis (Fugue).
  6. 6. What do we need tools for?
     ● For highlighting interesting, possibly error-prone areas.
       – Thus reducing the number of areas the auditor needs to focus on.
     ● For "automagically" finding known vulnerabilities.
       – For example, bad usage of API calls.
     ● For matching a vulnerability of type/pattern A found in software B against other software C.
       – Vulnerability extrapolation.
     ● ...
  7. 7. What do we need tools for?
     ● For checking against rules or patterns specific to the software being audited.
       – Different rules apply to each piece of software.
       – Vulnerabilities specific to one product.
     ● For doing all of the above against software in either binary or source code form.
       – Or even both.
     ● For doing all of this interactively.
       – Why is IDA the best disassembler out there?
  8. 8. Interactivity is key
     ● We need automatic tools that can be corrected by a human.
       – The tool will make mistakes a human can recognize.
     ● We need to let the human identify and correct those mistakes “somehow”.
     ● We also need a way to let the auditor decide what (s)he is interested in and what not.
  9. 9. Bug/Vulnerability Finding Tools
     ● There are plenty of bug finding tools:
       – Coverity, Klocwork, Fortify, CodeSonar, etc...
     ● They all find different bugs.
       – There is no tool A that finds a superset of the bugs found by B and/or C.
     ● They're good at finding bugs (and some vulnerabilities).
     ● But they are focused on a different audience...
       – In my opinion, bug and vulnerability finding tools are different because of this.
  10. 10. Bug finding tools → Developers
      ● They try to find any kind of software defect.
      ● They try to minimize the complexity of alerts.
      ● They try to reduce the number of false positives to the minimum possible.
        – Sometimes even dropping checkers that can find awesome bugs because their false positive ratio is “high”.
      ● They tend to remove anything the developers cannot understand or would find too hard to understand.
        – Otherwise every bug would be blindly considered a false positive and the tool would end up being ignored.
  11. 11. Vuln finding tools → Auditors
      ● I'm not interested in every kind of software defect (e.g., div by zero). Only in “theoretically” exploitable ones.
        – Or perhaps yes: vulns in exception handlers...
      ● I don't mind analyzing 100 false positives if for every 100 I get one awesome vulnerability.
      ● I don't mind spending a day or a week understanding what a complex checker said if it's worth it.
        – If it's really a vulnerability, even better.
        – The harder it is to find, the lower the chances that somebody else has found it.
  12. 12. How to do it?
      ● Steps:
        – Identify the source code
        – Parse the source code
        – Translate the source code
        – Understand the program
        – Run checkers against the program
        – Interact with the auditor
        – Go to “Run checkers” or “Parse the source code” again...
  13. 13. Identifying the source
      ● A tool like this must be able to identify the source before anything else.
        – The "source" can be real source code (C/C++/...), disassembly or decompiled code.
      ● If the tool cannot handle both source code and binaries, it will be too restricted.
      ● Identifying the "source" is not as easy as it may sound at first glance...
        – Correct disassembly, for example, is a problem.
        – The auditor's interaction is required.
        – Complete or partial source code.
          ● Include paths, conditional compilation, etc...
  14. 14. Parsing the source
      ● Typical misconception/false statement: “Parsing source code is an already solved problem”
  15. 15. Already solved what???
  16. 16. Parsing source code
      ● Writing a parser for one compiler is a big task, but can be done “easily”.
      ● Writing a parser for the source code accepted by *any* compiler is a huge task.
        – You must accept and parse even malformed code.
        – Example: MS Visual C++ precompiled headers.
          ● You can write whatever you want before the first include.
      ● A parser for just one compiler doesn't have this kind of problem.
        – You just accept what you consider OK.
      ● For finding vulnerabilities, your parser must accept anything you feed it.
  17. 17. Writing a parser
      ● You need to parse “the source” to get the AST.
        – Abstract Syntax Tree. More on this later...
      ● I don't like to reinvent the wheel and I don't recommend that you do.
        – Don't write your own parser.
        – No.
        – Really.
      ● Use an existing parser that can handle as many “dialects” as possible.
  18. 18. “Writing” a parser
      ● For my 1st prototype, I used pycparser.
        – OK for a quick prototype, not for the final tool.
      ● It would be a bad choice for many reasons, like:
        – It only accepts well-formed C.
          ● I wrote “filters” to “clean” the C it did not accept...
        – It only accepts C source for which all types are known.
        – If just one error happens during parsing, it stops and cannot recover from it.
        – I patched it to try to recover from errors. But sometimes it is simply not possible.
  19. 19. “Writing” a parser
      ● Fugue uses libclang. It accepts virtually anything.
        – Very good at recovering from errors.
        – Talking about C source code, it "swallows" almost anything.
        – Also supports C++ and Objective-C.
      ● Proven in real scenarios: e.g., Klocwork uses it.
      ● If you happen to have a rich uncle, the Edison Design Group C++ front end is probably the best choice.
        – Proven in real scenarios: e.g., Coverity uses it.
  20. 20. A “parser” for binaries
      ● You need to parse "disassembly" to get the AST (Abstract Syntax Tree).
      ● Parsing disassembly is, in my opinion, far easier than parsing real source code.
        – The code is not that flexible.
      ● But there are problems:
        – Many different assembly languages: ARM, 8086, 8087, AMD64, MIPS, PPC, etc...
  21. 21. A “parser” for binaries
      ● What to do? Intermediate representations.
        – Translators of assembly.
        – Examples:
          ● REIL (Zynamics).
  22. 22. A “parser” for binaries
      ● My idea: instead of writing a translator for each processor you want, use existing tools.
        – Decompilers. [Public] decompilers for x86 and ARM exist (Hex-Rays).
      ● Using them "could be" a good idea.
        – The Hex-Rays decompilers export an API to get the AST of a function.
        – Just what I want.
      ● Problems:
        – The decompilers are written for humans to understand the code.
        – Not written for programs to find vulnerabilities.
        – A bad decompiler assumption may generate a lot of false positives.
          ● Example: GCC.
  23. 23. GCC and decompiled code
      ● Given this example C source code, my prototype found (only) 3 errors.
  24. 24. GCC and decompiled code
      ● However, running my tool against the decompiled code of this toy program, 4 appeared.
      ● Notice the warning for the “init_proc” function.
  25. 25. GCC and decompiled code
      ● Why this false positive? Because of a bad decompiler assumption:
      ● The function “init_proc” returns void, not int.
  26. 26. More problems with decompilers
      ● This problem is easy to identify and fix.
      ● What about this one?
        – [Figure: Source Code vs. Decompiled Code]
  27. 27. Problems with decompiled code
      ● It is neither a bug in the decompiler nor a bad assumption.
      ● It is a compiler optimization.
      ● It is only noticeable with the real source code.
        – With the source code it is very easy to identify: dead code.
      ● NOTE: With both source code and binaries, this (and other optimizations) can be detected and used.
  28. 28. Translating the “source”
      ● No matter how, we have the AST (Abstract Syntax Tree).
        – What is this?
  29. 29. Abstract Syntax Tree
      ● Extracted from Wikipedia: “In computer science, an abstract syntax tree (AST), or just syntax tree, is a tree representation of the abstract syntactic structure of source code written in a programming language. Each node of the tree denotes a construct occurring in the source code. The syntax is abstract in the sense that it does not represent every detail that appears in the real syntax.”
  30. 30. Example AST
      ● An AST for the following code:
          while b != 0
            if a > b
              a = a - b
            else
              b = b - a
          return a
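
The slide's tree image is lost in this transcript. As a rough stand-in, the sketch below parses a Python transcription of the same loop with Python's standard `ast` module and lists the node types it contains. This is Python's AST, not Fugue's internal format; the function name `gcd` and the helper `node_kinds` are illustrative assumptions.

```python
import ast

# Python transcription of the slide's pseudocode (the name "gcd" is assumed).
SOURCE = """
def gcd(a, b):
    while b != 0:
        if a > b:
            a = a - b
        else:
            b = b - a
    return a
"""


def node_kinds(tree):
    """Return the node type names of the tree in depth-first order."""
    kinds = []

    def visit(node):
        kinds.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(tree)
    return kinds


tree = ast.parse(SOURCE)
kinds = node_kinds(tree)
print(kinds)
```

Each construct of the source (the loop, the branch, the comparisons, the subtractions) shows up as one node type, which is the sense in which "each node of the tree denotes a construct occurring in the source code".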
  31. 31. Translating the source
      ● Every tool I use will have a different AST.
        – Example: libclang and the Hex-Rays decompiler.
      ● We need to translate the different supported ASTs to an internal AST format.
        – Not hard. But tough.
      ● We have it! What's next? Typical error:
        – Why do anything else? Just use the AST for finding bugs! Let's write checkers now!
  32. 32. Using the AST for finding bugs
  33. 33. Using the AST for finding bugs
      ● Do not use the AST for finding bugs.
        – You're using the wrong tool for this task.
      ● Use the AST to build the CFG.
        – Control Flow Graph, more on this later.
      ● However, ASTs are good for:
        – Finding and enforcing specific code styles.
        – Indenting source code.
        – Writing source-to-source translators.
        – ...
  34. 34. Using the AST
      ● You have the AST for every function in either the binary or the code base you want to audit.
      ● Even with the internal representation of the AST, many other things are still needed:
        – The call graph of the program. Sort of easy, but not always: function pointers, virtual functions, constructors/destructors, etc...
        – The control flow graph (CFG) of every function.
          ● Identify basic blocks and the relationships between them.
        – ...
  35. 35. More things...
      ● More things still needed…
        – The super control flow graph of the program.
          ● A call graph where every called function's CFG is expanded in place.
        – The data dependency graph of the program.
          ● How argument A in function B travels through function C and affects var D of function E...
          ● IMO, the hardest task.
      ● Those tasks aren't easy at all.
        – I'll explain some of them in the next slides...
  36. 36. Understanding the program
      ● The call graph of the program is needed.
        – Why? To know every possible function path in the program.
      ● To build it we can simply:
        – Visit every node in every function's AST.
        – Save a list of all functions referenced from every function visited.
      ● That's it. The easiest way.
        – It is not complete... but it is “good enough” to start.
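
The "easiest way" described on this slide can be sketched in a few lines. The example below is only an illustration under assumptions: it uses Python's standard `ast` module on Python source rather than Fugue's internal AST, and the analyzed functions (`recv_data`, `parse_packet`, `load_fonts`, ...) are hypothetical. Like the approach on the slide it is incomplete: calls through attributes or function pointers are not resolved.

```python
import ast


def build_call_graph(source):
    """Map each function name to the set of plain names it calls."""
    tree = ast.parse(source)
    graph = {}
    for func in ast.walk(tree):
        if isinstance(func, ast.FunctionDef):
            callees = set()
            # Visit every node of this function's AST and record direct calls.
            for node in ast.walk(func):
                if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                    callees.add(node.func.id)
            graph[func.name] = callees
    return graph


# Hypothetical program to analyze.
EXAMPLE = """
def recv_data(sock):
    buf = read_packet(sock)
    return parse_packet(buf)

def read_packet(sock):
    return sock.recv(1024)

def parse_packet(buf):
    check_header(buf)
    return decode(buf)

def load_fonts(path):
    return open(path).read()
"""

call_graph = build_call_graph(EXAMPLE)
for name, callees in sorted(call_graph.items()):
    print(name, "->", sorted(callees))
```

Note that `sock.recv(...)` does not appear in the graph: it is a call through an attribute, the Python analogue of the function pointer and virtual function cases that make the real problem "not always" easy.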
  37. 37. Understanding the program
      ● Next thing needed: the CFG (Control Flow Graph).
      ● What is this? Wikipedia to the rescue:
        – “A control flow graph (CFG) in computer science is a representation, using graph notation, of all paths that might be traversed through a program during its execution.”
  38. 38. Control Flow Graph
      ● A CFG for the following code:
          while b != 0
            if a > b
              a = a - b
            else
              b = b - a
          return a
  39. 39. Understanding the program
      ● Let's say, no matter how, that our tool “understands” the program:
        – We know every possible path in the program.
        – We know how a variable X in function Y travels and is used throughout the program.
      ● The next step is to convert the AST of every basic block of the CFG to another form that is easier to analyse.
        – Why?
  40. 40. The AST, again...
      ● We “could” write simple checkers with the CFG and the AST of every instruction of every basic block, but I do not recommend it.
        – An AST can be very complex even for not so complex expressions.
        – Example:
          ● signed int u = (float)x * y + func()
          ● VarDecl → Assignment → Cast → VarRef → BinaryOperator → VarRef → BinaryOperator → CallExpr.
  41. 41. Understanding the program
      ● Something that makes the analysis easier is needed.
      ● Typical forms of code aimed at making analysis easier:
        – 3AC: Three Address Code.
        – SSA: Static Single Assignment form.
      ● What are they?
  42. 42. Three Address Code
      ● Definition by Wikipedia:
        – “In computer science, three-address code (often abbreviated to TAC or 3AC) is a form of representing intermediate code used by compilers to aid in the implementation of code-improving transformations. Each instruction in three-address code can be described as a 4-tuple: (operator, operand1, operand2, result).”
      ● Basically, every instruction is split into “more instructions”, each of which has only one operator, at most 2 operands and a result.
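
To make the 4-tuple form concrete, here is a minimal sketch that lowers a simplified version of the earlier example expression (`u = x * y + func()`, with the cast dropped) into (operator, operand1, operand2, result) tuples. It is an illustration built on Python's `ast` module, not Fugue's translator; the temporary names `t1`, `t2`, ... and the `"call"` pseudo-operator are assumptions, and only names, constants, the four basic arithmetic operators and argument-less calls are handled.

```python
import ast
import itertools

OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def to_3ac(statement):
    """Lower `target = expression` into (operator, operand1, operand2, result) tuples."""
    assign = ast.parse(statement).body[0]
    temps = (f"t{i}" for i in itertools.count(1))
    code = []

    def lower(node):
        # Each call returns the name holding the value of the sub-expression.
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return repr(node.value)
        if isinstance(node, ast.BinOp):
            left = lower(node.left)
            right = lower(node.right)
            result = next(temps)
            code.append((OPS[type(node.op)], left, right, result))
            return result
        if isinstance(node, ast.Call) and not node.args:
            result = next(temps)
            code.append(("call", node.func.id, None, result))
            return result
        raise NotImplementedError(type(node).__name__)

    value = lower(assign.value)
    code.append(("=", value, None, assign.targets[0].id))
    return code


tac = to_3ac("u = x * y + func()")
for instruction in tac:
    print(instruction)
```

The nested expression becomes a flat list in which every instruction has one operator, at most two operands and one result, which is far easier for a checker to walk than the nested AST.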
  43. 43. Three Address Code
  44. 44. Static Single Assignment form
      ● What is SSA?
        – “Static single assignment form (often abbreviated as SSA form or simply SSA) is a property of an intermediate representation (IR), which says that each variable is assigned exactly once. Existing variables in the original IR are split into versions, new variables typically indicated by the original name with a subscript in textbooks, so that every definition gets its own version.”
      ● Pretty similar to 3AC, but creating different versions of the variables instead of temporary ones.
        – There are more differences, though...
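
A small sketch of the "versions" idea, under a strong simplifying assumption: the input is straight-line code (a single basic block) in the (operator, operand1, operand2, result) form, so no phi functions are needed. Real SSA construction over a CFG with branches and loops is considerably more involved. The `name_N` naming scheme is an arbitrary choice for this example.

```python
def to_ssa(instructions):
    """Rename variables in straight-line 4-tuple code so each name is assigned once."""
    version = {}

    def use(name):
        # Operands refer to the latest version; constants and inputs stay as-is.
        if name in version:
            return f"{name}_{version[name]}"
        return name

    ssa = []
    for op, a, b, result in instructions:
        a = use(a) if a is not None else None
        b = use(b) if b is not None else None
        # Every assignment creates a fresh version of the result variable.
        version[result] = version.get(result, 0) + 1
        ssa.append((op, a, b, f"{result}_{version[result]}"))
    return ssa


code = [
    ("=", "1", None, "x"),   # x = 1
    ("+", "x", "2", "x"),    # x = x + 2
    ("*", "x", "x", "y"),    # y = x * x
]
ssa = to_ssa(code)
for instruction in ssa:
    print(instruction)
```

After renaming, every use points at exactly one definition, so a checker can answer "where did this value come from?" without scanning backwards through reassignments.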
  45. 45. Understanding the program
      ● In my opinion, it doesn't matter which form you use:
        – Both are good enough for the task.
      ● We just need that:
        – Every instruction does one and *only* one action.
          ● No side effects.
        – And every instruction has, at most, 2 operands, 1 operator and a result.
  46. 46. Writing checkers to find vulns
      ● A bug finding tool finds software defects in any part of the source.
        – The more code you check, the better.
      ● A vulnerability finding tool should not, in my opinion...
        – Client side: I'm not interested in stack overflows when reading configuration files that I cannot influence remotely.
        – Server side: I'm not interested in bugs related to parsing configuration files, environment variables, etc...
  47. 47. Writing checkers to find vulns
      ● ...however, I may be interested in such bugs if I'm auditing privileged local applications.
        – For example: any suid tool, like sudo.
      ● In short:
        – It depends on the kind of application (or which part of the application) we're auditing.
        – It changes from application to application.
        – The tool must interact with the auditor.
          ● Not the checker itself, but it must know “where”.
  48. 48. Writing checkers to find vulns
      ● In a vulnerability finding tool we need to tell the tool which areas we're interested in.
        – Is this a remote application? Only focus on what can be influenced remotely.
        – Is this a local SUID binary? Focus on whatever area the user can feed input to.
      ● So, what do we need? First of all, a way to tell the tool: this is the area I'm interested in.
        – Interactivity with the auditor.
  49. 49. Writing checkers to find vulns
      ● One example with Evince, a document viewer.
      ● Running some earlier versions of my tool, a curious bug was found:
  50. 50. Writing checkers to find vulns
      ● Big mistake, as "n" comes from a font file and the developer used Max instead of Min.
        – So great. Bravo!
      ● However, we cannot forge a DVI file with an embedded font (this code parses fonts) so, while it is an obvious bug, unfortunately it isn't a vulnerability.
      ● My tool wasted time finding bugs that are not remotely exploitable. This is bad.
      ● Interactivity is needed.
  51. 51. Writing checkers to find vulns
      ● For this, the auditor needs to identify the program's entry point(s).
        – Example: find vulnerabilities starting from function "recv_data" in the call graph.
        – “Oh, BTW, I only control arg1 and arg3, not arg2”.
      ● We need a way to say: analyze all functions called from this "data entry point".
        – And not those completely uninteresting functions that deal with parsing local fonts, environment variables, etc..., as in the Evince example.
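
Restricting the analysis to a "data entry point" can be sketched as a plain reachability query over the call graph. The example below is an assumption-laden illustration: the call graph is a hand-written dictionary, and every function name except `recv_data` (taken from the slide) is hypothetical. It does not model the "I only control arg1 and arg3" part, which would need data flow information.

```python
from collections import deque


def reachable_from(call_graph, entry):
    """Return every function reachable from `entry` by following calls."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        current = queue.popleft()
        for callee in call_graph.get(current, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


# Hypothetical call graph: caller -> list of callees.
CALL_GRAPH = {
    "main": ["load_fonts", "read_config", "recv_data"],
    "recv_data": ["parse_packet"],
    "parse_packet": ["check_header", "decode"],
    "load_fonts": ["parse_font"],
    "read_config": ["getenv"],
}

# The auditor marks recv_data as the data entry point.
scope = reachable_from(CALL_GRAPH, "recv_data")
print(sorted(scope))
```

Checkers would then run only on the functions in `scope`, skipping the font and configuration parsing code that the auditor cannot influence remotely.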
  52. 52. Writing checkers to find vulns
      ● Also, we need a way to let the auditor specify what an external function/function pointer does.
        – Example: it allocates/frees memory, executes code, loads a library, etc...
      ● If not, our tool will fail to find even the simplest bugs in real world scenarios.
        – At Infiltrate 2011, Halvar Flake (Thomas Dullien) showed a bug that in his opinion cannot be handled by today's static analysis tools (because of machine state handling).
        – I'll show you even easier examples of what cannot be handled by any current static analysis tool.
  54. 54. External function pointers
  55. 55. More problems writing checkers
  56. 56. Problems writing checkers
      ● There are 2 types of checkers: intraprocedural and interprocedural.
      ● Intraprocedural ones only check what happens inside one function.
      ● Interprocedural ones check what happens when var A travels to function B and is assigned to var C, and so on, and so on...
  57. 57. “Hello World” checker
      ● Writing a "hello world"-like checker: finding uses of uninitialized variables (intraprocedural).
      ● Seems easy at first. Turns out to be not so easy.
      ● Why?
        – One of the many problems: path explosion.
      ● Suppose we have a function F0 with 10 basic blocks and 20 edges. Analyzing all possible paths is feasible in a reasonable time.
      ● Now let's see a “sort of complex” function...
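
A quick way to get a feel for path explosion is to count the paths instead of walking them. The sketch below assumes an acyclic CFG given as a dictionary of successors (loops would make the count unbounded) and uses a made-up CFG shape, a chain of consecutive if/else constructs, as the worst-case illustration; the block names are arbitrary.

```python
from functools import lru_cache


def count_paths(cfg, entry, exit_block):
    """Count entry-to-exit paths in an acyclic CFG given as {block: [successors]}."""

    @lru_cache(maxsize=None)
    def paths_from(block):
        if block == exit_block:
            return 1
        return sum(paths_from(succ) for succ in cfg.get(block, ()))

    return paths_from(entry)


def chain_of_diamonds(n):
    """Build the CFG of n consecutive if/else constructs."""
    cfg = {}
    for i in range(n):
        cfg[f"cond{i}"] = [f"then{i}", f"else{i}"]
        cfg[f"then{i}"] = [f"cond{i + 1}"]
        cfg[f"else{i}"] = [f"cond{i + 1}"]
    return cfg, "cond0", f"cond{n}"


small = count_paths(*chain_of_diamonds(3))
large = count_paths(*chain_of_diamonds(30))
print(small, large)
```

Every additional if/else doubles the number of paths, so 30 sequential branches, only about 90 basic blocks, already yield more than a billion paths to traverse.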
  58. 58. Some Acrobat Reader function...
  59. 59. The Acrobat Reader function
      ● The number of possible paths in this function is so big that we cannot traverse all of them in an acceptable time.
        – Probably impossible.
      ● We have to find solutions. One of them is “sensitive analysis”.
        – Flow-sensitive, path-sensitive, context-sensitive.
        – Simply put, we need to reduce the number of paths we need to traverse.
      ● For this type of analysis to be possible we need to abstract all predicates in the function (remember 3AC/SSA?).
  60. 60. Sensitive analysis
      ● How to do it? Just my opinion, one idea:
        – Find in which basic blocks "local variables" are used and which predicates depend on them.
          ● I'm not even talking at this point about interprocedural analysis.
        – Find the paths between the entry point, the basic blocks where the local vars are used and the function's exit points.
        – Then remove all the other nodes to generate a smaller CFG. If there are unconnected nodes, add the basic blocks and relations needed to connect them.
        – Hopefully, we will have a shorter version of the CFG with only what we need.
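
One possible reading of this idea, sketched under assumptions: a block is kept if it lies on some path from the entry to an "interesting" block (one that uses the local variable) or from an interesting block to an exit; everything else is dropped. This is an interpretation for illustration only, not Fugue's algorithm, and the CFG and block names (`check`, `use_x`, `log`, ...) are hypothetical.

```python
def reachable(graph, start):
    """Return all blocks reachable from `start`, including `start` itself."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in graph.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def reverse(graph):
    """Return the graph with every edge flipped."""
    rev = {block: [] for block in graph}
    for block, succs in graph.items():
        for succ in succs:
            rev.setdefault(succ, []).append(block)
    return rev


def reduce_cfg(cfg, entry, exits, interesting):
    """Keep only blocks on entry -> interesting or interesting -> exit paths."""
    rcfg = reverse(cfg)
    from_entry = reachable(cfg, entry)
    to_exit = set().union(*(reachable(rcfg, e) for e in exits))
    keep = set()
    for block in interesting:
        keep |= from_entry & reachable(rcfg, block)  # leads to the block
        keep |= reachable(cfg, block) & to_exit      # follows the block
    return {b: [s for s in cfg.get(b, ()) if s in keep] for b in keep}


# Hypothetical CFG: only "use_x" touches the local variable we care about.
CFG = {
    "entry": ["check"],
    "check": ["use_x", "log"],
    "use_x": ["join"],
    "log": ["join"],
    "join": ["exit"],
    "exit": [],
}
reduced = reduce_cfg(CFG, "entry", ["exit"], {"use_x"})
print(sorted(reduced))
```

In this toy case the `log` branch disappears, halving the number of paths; on a real function the saving depends on how many blocks never touch the variables under analysis.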
  61. 61. And even more problems...
      ● Suppose that we finally have our "hello world" intraprocedural checker.
        – Finally! My first one took me a lot of time...
      ● Now we should make it interprocedural.
      ● Very often, a variable is declared in a function A and travels through functions B, C, ..., until it's used in function Y.
      ● We need to control "the machine state".
        – There is no single “state” but “many possible states”.
  62. 62. Problems, problems, problems...
      ● Do you remember the path explosion problem? Think about it in intraprocedural analysis.
        – Horrible.
      ● Think about it while controlling “the state”.
        – Terrible.
      ● Let's talk a bit more about the state...
  63. 63. Problems, problems, problems...
      ● How many possible machine states may we have?
        – We cannot control all of them. Impossible.
        – The possible paths depend on the machine states so, again, we cannot control all the possible paths.
        – We may guess the limits and try partial solutions.
          ● Predicate abstraction, opaque predicates, etc..., and symbolic execution.
  64. 64. Symbolic execution
      ● During symbolic execution we try to find out whether a particular state S0 is possible for function F0 (let's say we're only talking about intraprocedural analysis).
      ● We can abstract the predicates and the computational operations that affect them, and generate formulas to prove satisfiability using a SAT/SMT solver.
        – Some people say it isn't the way to go... (e.g., Coverity).
        – Others do use this approach (Goanna, for example).
        – I really don't know.
  65. 65. Fugue: Current state, future directions and goals
      ● Current state: far from finished.
      ● I don't really know when I'll finish it, if at all. Really.
        – But... I would like to release “something” within a year.
      ● Anyway, even if I finish it... I can't be sure it will find awesome bugs.
        – But it amazes me that even the most rudimentary (current & past) versions of the tool actually find real bugs.
  66. 66. Questions?