Joxean Koret - Interactive Static Analysis Tools for Vulnerability Discovery [Rooted CON 2013]

Interactive Static Analysis Tools for Vulnerability Discovery (Fugue)

Joxean Koret

Static Analysis Tools
● What are they?
– Tools to find properties of a given piece of
software without actually executing it.
– The “properties” I find in this case are
bugs/vulnerabilities.
● We need good static analysis tools for
performing audits in software.

Why?

● Software is becoming bigger and bigger.
● And so, harder to analyze.
– Examples: Firefox, Google Chrome, MS Office...
● Auditing software like this, by hand, is tedious and
takes a long while.
● Fuzzing is good for finding vulnerabilities in such big
products.
– But it is not the solution (neither is static analysis, I think).
– It is just another useful tool.

7/04/13

Why?
● Typical old vulnerabilities easily found by quick
manual code audits are almost gone, bye-bye!
– strcpy, memcpy, sprintf, syslog, etc...
● No vulnerabilities like this in highly audited code
bases (except maybe sudo or freetype...).
– Apache, Firefox, Google Chrome...
● We need better tools.
– My approach: Static analysis (Fugue).

What do we need tools for?
● For highlighting potentially error-prone areas.
– Thus, reducing the number of areas the auditor
needs to focus on.
● For "automagically" finding known vulnerabilities.
– For example, bad usage of API calls.
● For matching a vulnerability of type/pattern A found in
software B against other software C.
– Vulnerability extrapolation.
● ...

What do we need tools for?
● For checking against specific rules or patterns for the
software being audited.
– Different rules apply to each piece of software.
– Vulnerabilities specific to one product.
● For doing all of the previous things against software
in either binary or source code format.
– Or even both.
● For doing all of this interactively.
– Why is IDA the best disassembler out there?

Interactivity is key
● We need automatic tools that can be
corrected by a human.
– The tool will make mistakes a human can
recognize.
● We need to let the human identify and
correct those mistakes “somehow”.
● We also need a way to let the auditor
decide what (s)he is interested in and what
(s)he is not.

Bug/Vulnerability Finding Tools
● There are plenty of bug finding tools:
– Coverity, Klocwork, Fortify, CodeSonar, etc...
● They all find different bugs.
– There is no tool A that finds a superset of bugs found by
B and/or C.
● They're good at finding bugs (and some
vulnerabilities).
● But they are focused on a different audience...
– In my opinion, bug and vulnerability finding tools are
different because of this.

Bug finding tools → Developers

● They try to find any kind of software defect.
● They try to minimize the complexity of alerts.
● They try to minimize the number of false positives.
– Sometimes even dropping checkers that can find awesome
bugs because their false positive ratio is “high”.
● They tend to remove anything the developers cannot
understand or that can be too hard to understand.
– Otherwise, every bug would be, blindly, considered a false
positive and the tool would be, finally, ignored.

Vuln finding tools → Auditors
● I'm not interested in every kind of software defect (e.g., division
by zero). Only “theoretically” exploitable ones.
– Or perhaps yes: vulns in exception handlers...
● I don't mind analyzing 100 false positives if for every 100 I
get one awesome vulnerability.
● I don't mind having to spend a day or a week
understanding what a complex checker said if it's worth it.
– If it's really a vulnerability, it's even better.
– The harder it is to find, the lower the chances that somebody
else has found it.

How to do it?
● Steps:
– Identify the source code
– Parse the source code
– Translate the source code
– Understand the program
– Run checkers against the program
– Interact with the auditor
– Go to “Run checkers” or “Parse the source code” again...

Identifying the source
● A tool like this must be able to identify the source before anything
else.
– The "source" can be either real source code (C/C++/...),
disassembly code or decompiled code.
● If the tool cannot handle both source code and binaries, it will
be too restricted.
● Identifying the "source" is not as easy as it may sound at
first...
– Correct disassembly, for example, is a problem.
– Auditor's interaction is required.
– Complete or partial source code.
● Include paths, conditional compilation, etc...

Parsing the source
● Typical misconception/false statement:

“Parsing source code is an already
solved problem”

Already solved what???

Parsing source code
● Writing a parser for one compiler is a big task, but can be done
“easily”.
● Writing a parser for *any* compiler's accepted source code is a huge
task.
– You must accept and parse even malformed code.
– Example: MS Visual C++ precompiled headers.
● You can write whatever you want before the first include.
● A parser for just one compiler doesn't have this kind of problem.
– You just accept what you consider OK.
● For finding vulnerabilities, your parser must accept anything you feed
it.

Writing a parser
● You need to parse “the source” to get the AST.
– Abstract Syntax Tree. More on this later...
● I don't like to reinvent the wheel and I don't
recommend that you do either.
– Don't write your own parser.
– No.
– Really.
● Use an existing parser that can handle as
many “dialects” as possible.

“Writing” a parser
● For my 1st prototype, I used pycparser.
– OK for a quick prototype, not for the final tool.
● It would be a bad choice for many reasons, like:
– It only accepts well formed C.
● I wrote “filters” to “clean” the C it does not accept...
– It only accepts C source for which all types are known.
– If just one error happens during parsing, it stops and cannot
recover from it.
– I patched it to try to recover from errors. But sometimes it is
simply not possible.

“Writing” a parser
● Fugue uses libclang. It accepts virtually anything.
– Very good at recovering from errors.
– Talking about C source code, it "swallows" almost
anything.
– Supports also C++ and Objective-C.
● Proven in real scenarios: e.g., Klocwork uses it.
● If you happen to have a rich uncle, the Edison Design Group
C++ frontend is probably the best choice.
– Proven in real scenarios: e.g., Coverity uses
it.

A “parser” for binaries
● You need to parse "disassembly" to get the
AST (Abstract Syntax Tree).
● Parsing disassembly is, in my opinion, far
easier than parsing real source code.
– The code is not that flexible.
● But there are problems:
– Many different assemblies: ARM, 8086, 8087,
AMD64, MIPS, PPC, etc...

● What to do? Intermediate representations.
– Translators of assembly.
– Examples:
● REIL (Zynamics).

● My idea: instead of writing a translator for the processors you want, use
existing tools.
– Decompilers. [Public] decompilers for x86 and ARM exist (Hex-Rays).
● Using them "could be" a good idea.
– Hex-Rays decompilers export an API to get the AST for a function.
– Just what I want.
● Problems:
– The decompilers are written for humans to understand the code.
– Not written for programs to find vulnerabilities.
– A bad decompiler assumption may generate a lot of false positives.
● Example: GCC.

GCC and decompiled code
● Given this example C source code, my
prototype found (only) 3 errors.

● However, running my tool against the
decompiled code for this toy program, 4
appeared.

● Notice the warning for “init_proc” function.

● Why this false positive? Because of a bad
decompiler assumption:

● The function “init_proc” returns void, not int.

More problems with decompilers
● This problem is easy to identify and fix.
● What about this one?
(Side-by-side comparison: source code vs. decompiled code.)

Problems with decompiled code
● It is neither a bug in the decompiler nor a
bad assumption.
● It is a compiler optimization.
● It is only noticeable with the real source code.
– With the source code it is very easy to identify:
dead code.
● NOTE: With both source code and
binaries, this (and other optimizations) can
be detected and used.

Translating the “source”
● No matter how, we have the AST (Abstract
Syntax Tree).
– What is this?

Abstract Syntax Tree
● Extracted from Wikipedia:
“In computer science, an abstract syntax tree (AST), or
just syntax tree, is a tree representation of the abstract
syntactic structure of source code written in a
programming language. Each node of the tree denotes a
construct occurring in the source code. The syntax is
'abstract' in the sense that it does not represent every
detail that appears in the real syntax.”

Example AST
● An AST for the
following code:
while b != 0
  if a > b
    a = a - b
  else
    b = b - a
return a
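
To get a feel for what such a tree looks like in practice, here is a minimal sketch that uses Python's standard `ast` module as a stand-in parser. This is an assumption made purely for illustration: Fugue works on C/C++ through libclang and on binaries through the Hex-Rays decompiler, not on Python. The pseudocode above is rewritten as a Python function named `gcd` (a hypothetical name) and the node type of every AST node is printed with indentation showing the tree depth.

```python
import ast

# Python rendering of the slide's pseudocode. Wrapping it in a function
# called "gcd" is an assumption made so that "return" is valid.
SOURCE = """
def gcd(a, b):
    while b != 0:
        if a > b:
            a = a - b
        else:
            b = b - a
    return a
"""

def node_kinds(node, depth=0):
    """Yield (depth, node type name) for a node and all of its descendants."""
    yield depth, type(node).__name__
    for child in ast.iter_child_nodes(node):
        yield from node_kinds(child, depth + 1)

tree = ast.parse(SOURCE)
func = tree.body[0]          # the FunctionDef node for gcd
kinds = [kind for _, kind in node_kinds(func)]

for depth, kind in node_kinds(func):
    print("  " * depth + kind)
```

Even for this tiny function the tree contains several dozen nodes, which hints at why checkers written directly against the AST get complicated quickly.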

Translating the source
● Every tool I use will have a different AST.
– Example: libclang and Hex-Rays decompiler.
● Need to translate the different ASTs
supported to an internal AST format.
– Not hard, but tough.
● We have it! What's next? Typical error:
– Why do anything else? Just use the AST for
finding bugs! Let's write checkers now!

Using the AST for finding bugs
● Do not use the AST for finding bugs.
– You're using the wrong tool for this task.
● Use the AST to build the CFG.
– Control Flow Graph, more on this later.
● However, ASTs are good for:
– Finding and enforcing specific code styles.
– Indenting source code.
– Writing source-to-source translators
– ...

Using the AST
● You have the AST for every function in either the
binary or the code base you want to audit.
● With the internal representation of the AST many other
things are still needed:
– The call graph of the program. Sort of easy, but not
always: function pointers, virtual functions,
constructors/destructors, etc...
– The control flow graph (CFG) of every function.
● Identify basic blocks and relationships between them.
– ...

More things...
● More things still needed…
– The super control flow graph of the program.
● A call graph where every called function's CFG is
expanded in the call graph.
– The data dependency graph of the program.
● How argument A in function B travels over function
C and affects var D of function E...
● IMO, the hardest task.
● Those tasks aren't easy at all.
– I'll explain some of them in the next slides...
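
To make the data dependency idea concrete, here is a deliberately tiny sketch of forward taint propagation. It is not Fugue's implementation. It assumes straight-line code inside a single function, already lowered to `(operator, operand1, operand2, result)` tuples, and the names `arg1`, `arg2`, `arg3`, `size` and `limit` are hypothetical. The real problem, following data across function boundaries and through branches, is much harder than this.

```python
def tainted_names(code, controlled):
    """Return the set of names that depend on attacker-controlled inputs.

    code       -- list of (operator, operand1, operand2, result) tuples,
                  in execution order, with no branches
    controlled -- names the auditor says the attacker controls
    """
    tainted = set(controlled)
    for _op, left, right, result in code:
        if left in tainted or right in tainted:
            tainted.add(result)
        else:
            # The result is overwritten with data the attacker cannot influence.
            tainted.discard(result)
    return tainted

# Hypothetical function body: size = (arg1 + 4) + (arg2 * 2); limit = arg2
code = [
    ("add", "arg1", "4", "t1"),
    ("mul", "arg2", "2", "t2"),
    ("add", "t1", "t2", "size"),
    ("copy", "arg2", None, "limit"),
]

result = tainted_names(code, controlled={"arg1", "arg3"})
print(sorted(result))
```

With `arg1` and `arg3` marked as controlled, `size` ends up tainted while `limit` does not, so a checker would only need to look at uses of `size`.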

Understanding the program
● The Call Graph of the program is needed.
– Why? To know every possible function path in
the program.
● To build it we can, simply:
– Visit every node in every function's AST.
– Save a list of all functions referenced from
every function visited.
● That's it. The easiest way.
– It is not complete... but it is “good enough” to start.
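
The two steps above can be sketched in a few lines. This sketch again uses Python's standard `ast` module as a stand-in for the tool's internal AST, which is an assumption for illustration only, and every function name in the sample source is hypothetical. As the slide says, the result is incomplete: only direct calls by name are recorded.

```python
import ast
from collections import defaultdict

# Hypothetical code base to audit.
SOURCE = """
def recv_data(sock):
    buf = read_packet(sock)
    return parse_packet(buf)

def parse_packet(buf):
    check_header(buf)
    return decode_body(buf)

def load_fonts(path):
    return parse_font(path)
"""

def build_call_graph(source):
    """Map every function defined in `source` to the names it calls directly."""
    graph = defaultdict(set)
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, ast.FunctionDef):
            continue
        graph[func.name]  # functions that call nothing still get an entry
        for node in ast.walk(func):
            # Calls through function pointers, methods, etc. are missed here.
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                graph[func.name].add(node.func.id)
    return dict(graph)

call_graph = build_call_graph(SOURCE)
for caller, callees in sorted(call_graph.items()):
    print(caller, "->", sorted(callees))
```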

● Next thing needed: The CFG (Control Flow
Graph).
● What is this? Wikipedia to the rescue:
– “A control flow graph (CFG) in computer
science is a representation, using graph
notation, of all paths that might be traversed
through a program during its execution.“

Control Flow Graph
● A CFG for the
following code:
while b != 0
  if a > b
    a = a - b
  else
    b = b - a
return a
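
As a rough illustration of how basic blocks and edges fall out of the AST, the sketch below builds a CFG for the same code, rewritten as a Python function and parsed with the standard `ast` module (both are assumptions for illustration; `ast.unparse` needs Python 3.9 or later). It only handles assignments, `while`, `if`/`else` and `return`, and it does not merge the empty join blocks a real tool would clean up.

```python
import ast

SOURCE = """
def gcd(a, b):
    while b != 0:
        if a > b:
            a = a - b
        else:
            b = b - a
    return a
"""

class CFG:
    def __init__(self):
        self.blocks = {}    # block id -> list of statements (as text)
        self.edges = set()  # (from block id, to block id)

    def new_block(self):
        block_id = len(self.blocks)
        self.blocks[block_id] = []
        return block_id

def add_statements(cfg, stmts, current, exit_id):
    """Append stmts starting in block `current`. Return the block where
    control continues, or None if every path has already returned."""
    for stmt in stmts:
        if current is None:
            break  # code after a return is unreachable
        if isinstance(stmt, ast.While):
            head = cfg.new_block()
            cfg.blocks[head].append("while " + ast.unparse(stmt.test))
            cfg.edges.add((current, head))
            body = cfg.new_block()
            cfg.edges.add((head, body))
            body_end = add_statements(cfg, stmt.body, body, exit_id)
            if body_end is not None:
                cfg.edges.add((body_end, head))  # back edge of the loop
            current = cfg.new_block()            # block after the loop
            cfg.edges.add((head, current))
        elif isinstance(stmt, ast.If):
            cfg.blocks[current].append("if " + ast.unparse(stmt.test))
            ends = []
            for branch in (stmt.body, stmt.orelse):
                start = cfg.new_block()
                cfg.edges.add((current, start))
                ends.append(add_statements(cfg, branch, start, exit_id))
            live = [end for end in ends if end is not None]
            current = cfg.new_block() if live else None
            for end in live:
                cfg.edges.add((end, current))    # both branches join here
        elif isinstance(stmt, ast.Return):
            cfg.blocks[current].append(ast.unparse(stmt))
            cfg.edges.add((current, exit_id))
            current = None
        else:
            cfg.blocks[current].append(ast.unparse(stmt))
    return current

def build_cfg(source):
    func = ast.parse(source).body[0]
    cfg = CFG()
    entry, exit_id = cfg.new_block(), cfg.new_block()
    end = add_statements(cfg, func.body, entry, exit_id)
    if end is not None:
        cfg.edges.add((end, exit_id))  # falling off the end of the function
    return cfg

cfg = build_cfg(SOURCE)
for block_id, stmts in cfg.blocks.items():
    print(block_id, stmts)
print(sorted(cfg.edges))
```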

● Let's say, no matter how, that our tool
“understands” the program:
– We know every possible path in the program.
– We know how a variable X in function Y travels
and is used in the complete program.
● The next step is to convert the code from
the AST of every basic block of the CFG to
another form that is easier to analyze.
– Why?

The AST, again...
● We “could” write simple checkers with the
CFG and the AST of every instruction of
every basic block, but I do not recommend
it.
– An AST can be very complex even for not so
complex expressions.
– Example:
● signed int u = (float)x * y + func()
● VarDecl → Assignment → Cast → VarRef →
BinaryOperator → VarRef → BinaryOperator →
CallExpr.

● Something that makes the analysis easier
is needed.
● Typical forms of code aimed to make
analysis easier:
– 3AC: Three Address Code.
– SSA: Static Single Assignment form.
● What are they?

Three Address Code
● Definition by Wikipedia:
– “In computer science, three-address code (often
abbreviated to TAC or 3AC) is a form of representing
intermediate code used by compilers to aid in the
implementation of code-improving transformations.
Each instruction in three-address code can be
described as a 4-tuple: (operator, operand1, operand2,
result).“
● Basically, we have every instruction represented in “more
instructions” but all of them will only have one operator, 2
operands at most and a result.
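
A minimal sketch of that lowering, under stated assumptions: the expression is written in Python and parsed with the standard `ast` module, the C cast `(float)x` is modelled as a call to `float`, and only names, constants, binary operators and calls with at most one argument are supported. The temporaries `t1`, `t2`, ... and the `copy` operator are naming choices made here, not something taken from the talk.

```python
import ast
import itertools

def to_three_address(source):
    """Lower one 'name = expression' statement into a list of
    (operator, operand1, operand2, result) tuples."""
    temps = (f"t{i}" for i in itertools.count(1))
    code = []

    def lower(node):
        """Emit code for an expression; return the name holding its value."""
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return repr(node.value)
        if isinstance(node, ast.BinOp):
            left = lower(node.left)
            right = lower(node.right)
            result = next(temps)
            code.append((type(node.op).__name__.lower(), left, right, result))
            return result
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and len(node.args) <= 1 and not node.keywords):
            arg = lower(node.args[0]) if node.args else None
            result = next(temps)
            code.append(("call " + node.func.id, arg, None, result))
            return result
        raise NotImplementedError(type(node).__name__)

    stmt = ast.parse(source).body[0]
    if not (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
            and isinstance(stmt.targets[0], ast.Name)):
        raise NotImplementedError("only 'name = expression' is handled")
    code.append(("copy", lower(stmt.value), None, stmt.targets[0].id))
    return code

# Python stand-in for: signed int u = (float)x * y + func()
three_address = to_three_address("u = float(x) * y + func()")
for instruction in three_address:
    print(instruction)
```

One nested expression becomes five flat instructions, each with a single operator, which is the shape a checker wants to iterate over.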

Static Single Assignment form
● What is SSA?
– “Static single assignment form (often abbreviated as
SSA form or simply SSA) is a property of an
intermediate representation (IR), which says that each
variable is assigned exactly once. Existing variables in
the original IR are split into versions, new variables
typically indicated by the original name with a subscript
in textbooks, so that every definition gets its own
version.”
● Pretty similar to 3AC but creating different versions of the
variables, instead of temporary ones.
– There are more differences, though...
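
The versioning idea can be sketched for the simplest possible case: straight-line code with no branches, given as `(operator, operand1, operand2, result)` tuples. This is an illustration under those assumptions, not a full SSA construction. Once control flow merges, real SSA also needs φ-functions to pick between versions, which this sketch leaves out. The `name_N` suffix format is a choice made here.

```python
from collections import defaultdict

def to_ssa(code):
    """Rename variables so that every name is assigned exactly once.
    Names read before any assignment (inputs) get version 0."""
    version = defaultdict(int)

    def use(name):
        # Constants such as "1" and missing operands are left untouched.
        if name is None or not name.isidentifier():
            return name
        return f"{name}_{version[name]}"

    ssa = []
    for op, left, right, result in code:
        left, right = use(left), use(right)  # read the current versions first
        version[result] += 1                 # then create a new version
        ssa.append((op, left, right, f"{result}_{version[result]}"))
    return ssa

# a = a - b; a = a + 1; c = a * b
code = [
    ("sub", "a", "b", "a"),
    ("add", "a", "1", "a"),
    ("mul", "a", "b", "c"),
]

ssa = to_ssa(code)
for instruction in ssa:
    print(instruction)
```

After renaming, each use of `a` points at exactly one definition, so a checker no longer has to work out which assignment reaches which use.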

● In my opinion, it doesn't matter which form
you use:
– Both are good enough for the task.
● We just need that:
– Every instruction does one and *only* one
action.
● No side effects.
– And every instruction has, at most, 2
operands, 1 operator and a result.

Writing checkers to find vulns
● A bug finding tool finds software defects in any part of
the source.
– The more code you check, the better.
● A vulnerability finding tool should not, in my opinion...
– Client side code: I'm not interested in stack overflows
reading configuration files that I cannot influence
remotely.
– Server side: I'm not interested in bugs related to parsing
configuration files, environment variables, etc...

● ...however, I may be interested in such bugs if
I'm auditing privileged local applications.
– For example: any suid tool, like sudo.
● In short:
– It will depend on the kind of application (or which
part of the application) we're auditing.
– It changes from application to application.
– The tool must interact with the auditor.
● Not the checker itself, but the tool must know “where”.

● In a vulnerability finding tool we need to tell
the tool which areas we're interested in.
– Is this a remote application? Only focus on what
can be influenced remotely.
– Is this a local SUID binary? Focus on whatever area
the user can feed input to.
● So, what do we need? First of all, a way to tell
the tool: this is the area I'm interested in.
– Interactivity with the auditor.

● One example with Evince, a document viewer.
● Running an earlier version of my tool, a
curious bug was found:

● Big mistake, as "n" comes from a font file and, instead
of using Min, the developer used Max.
– So great. Bravo!
● However, we cannot forge a DVI file with an embedded
font (this code parses fonts) so, while an obvious bug,
unfortunately, it isn't a vulnerability.
● My tool wasted time finding bugs that are not remotely
exploitable. This is bad.
● Interactivity is needed.

● For this, the auditor needs to identify the program's entry
point(s).
– Example: Find vulnerabilities starting from function
"recv_data" in the call graph.
– “Oh, BTW, I only control arg1 and arg3, not arg2”.
● We need a way to say: Analyze all functions called from
this "data entry point".
– And not those completely uninteresting functions that
deal with parsing local fonts, environment variables,
etc... As with the Evince example.
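
One simple way to express "analyze only what is called from this data entry point" is a reachability search over the call graph. The sketch below assumes the call graph is already available as a plain dictionary, and all function names in it (apart from `recv_data`, taken from the example above) are hypothetical.

```python
from collections import deque

# Hypothetical call graph: caller -> functions it calls directly.
call_graph = {
    "main": {"load_config", "load_fonts", "serve"},
    "serve": {"recv_data"},
    "recv_data": {"parse_packet"},
    "parse_packet": {"check_header", "decode_body"},
    "load_fonts": {"parse_font"},
    "load_config": {"getenv"},
}

def reachable_from(call_graph, entry_points):
    """Return every function reachable from the auditor's entry points."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        func = queue.popleft()
        for callee in call_graph.get(func, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen

in_scope = reachable_from(call_graph, ["recv_data"])
print(sorted(in_scope))
```

Checkers would then run only on the functions in `in_scope`, so font and configuration parsing never produce alerts in the first place.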

● Also, we need a way to let the auditor determine what
an external function/function pointer does.
– Example: It allocates/frees memory, executes code, loads a
library, etc...
● If not, our tool will fail to find even the simplest bugs in
real world scenarios.
– At Infiltrate 2011, Halvar Flake (Thomas Dullien) showed a
bug that in his opinion cannot be handled by today's static
analysis tools (because of machine state handling).
– I'll show you even easier examples of what cannot be
handled by any current static analysis tool.
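
To illustrate why the auditor's knowledge about external functions matters, here is a small sketch of a double-free check that depends entirely on annotations. Everything in it is hypothetical: the allocator wrappers `pool_get` and `pool_put`, the annotation format, and the simplification that a single execution path is given as a list of `(function, variable)` calls. It is not how Fugue models this.

```python
# Hypothetical annotations supplied interactively by the auditor:
# what each external function does to its argument or result.
ANNOTATIONS = {"pool_get": "alloc", "pool_put": "free"}

def find_double_frees(calls, annotations):
    """Scan one execution path for variables that are freed twice.

    calls -- list of (function name, variable) pairs in execution order
    Returns a list of (index, function name, variable) findings.
    """
    freed = set()
    findings = []
    for index, (func, var) in enumerate(calls):
        kind = annotations.get(func)  # None: the tool knows nothing about it
        if kind == "alloc":
            freed.discard(var)        # the variable holds fresh memory again
        elif kind == "free":
            if var in freed:
                findings.append((index, func, var))
            freed.add(var)
    return findings

path = [
    ("pool_get", "p"),
    ("process", "p"),
    ("pool_put", "p"),
    ("pool_put", "p"),  # second release of the same object
]

without_help = find_double_frees(path, annotations={})
with_help = find_double_frees(path, ANNOTATIONS)
print(without_help, with_help)
```

Without the annotations the checker sees four opaque calls and reports nothing. With them, the second `pool_put` is flagged.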

External function pointers
