Symbolic Execution And KLEE

Symbolic execution
Overview of work done by Dawson Engler’s group at Stanford (EGT/EXE/KLEE*)
by Shauvik Roy Choudhary, http://cc.gatech.edu/~shauvik
Some slides adapted from the EXE and KLEE presentations, plus slides from Saswat

Old research area but still active
- First introduced in 1975 (source: Saswat); 1976 by James King, IBM T.J. Watson
- Very active area of research, e.g.:
  - EGT / EXE / KLEE [Stanford]
  - DART [Bell Labs]
  - CUTE [UIUC]
  - SAGE, Pex [MSR Redmond]
  - Vigilante [MSR Cambridge]
  - BitScope [Berkeley/CMU]
  - CatchConv [Berkeley]
  - JPF [NASA Ames]

Symbolic Execution
- Symbolic execution refers to executing a program with symbols, rather than concrete values, as arguments.
- Unlike concrete execution, in symbolic execution the program can take any feasible path (limitation: the constraint solver).
- During symbolic execution, program state consists of:
  - symbolic values for some memory locations
  - a path condition
- The path condition is a conjunction of constraints on the symbolic input values.
- A solution of the path condition is a test input that covers the respective path.
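To make the path-condition idea concrete, here is a minimal sketch in C. It is not how any of the tools in this deck work internally: the program under test, the function names, and the brute-force "solver" over a small input range are all assumptions made for illustration. Each path is identified by which of two branches were taken, its path condition is the conjunction of the corresponding branch constraints, and any solution of that condition is a test input that drives the program down that path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical program under test: two branches, so up to four paths.
 * The return value encodes which branches were taken (bit 0 = A, bit 1 = B). */
int path_taken(int x, int y) {
    int path = 0;
    if (x > y)       path |= 1;  /* branch A */
    if (x + y == 10) path |= 2;  /* branch B */
    return path;
}

/* Path condition for a given path id: the conjunction of one constraint
 * per branch, negated when the path takes the false side. */
bool path_condition(int path, int x, int y) {
    assert(path >= 0 && path <= 3);
    bool a = (x > y);
    bool b = (x + y == 10);
    return ((path & 1) ? a : !a) && ((path & 2) ? b : !b);
}

/* Toy stand-in for a constraint solver: exhaustive search over a small
 * domain. Real tools use decision procedures such as CVCL or STP. */
bool solve(int path, int *x, int *y) {
    for (int i = -20; i <= 20; i++)
        for (int j = -20; j <= 20; j++)
            if (path_condition(path, i, j)) {
                *x = i;
                *y = j;
                return true;
            }
    return false;
}
```

Running path_taken on the values returned by solve(p, ...) reproduces path p, which is the sense in which a solution of the path condition "covers" the path.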

Implementation of Symbolic Execution
- Transformation approach
  - transform the program into another program that operates on symbolic values, such that execution of the transformed program is equivalent to symbolic execution of the original program
  - difficult to implement, portable solution, suitable for Java, .NET
- Instrumentation approach
  - callback hooks are inserted in the program such that symbolic execution is done in the background during normal execution of the program
  - easy to implement for C
- Customized runtime approach
  - customize the runtime (e.g., JVM) to support symbolic execution
  - applicable to Java, .NET; difficult to implement, flexible, not portable
- Example tools: CUTE, KLEE, JPF

Limitations of Symbolic Execution
- Limited by the power of the constraint solver
  - cannot handle non-linear and very complex constraints
- Does not scale when the number of paths is large (subject of ongoing research in this area)
- Source code, or an equivalent (e.g., Java class files), is required for precise symbolic execution

EGT & EXE
Slides based on D. Engler’s slides

Goal: find many bugs in systems code
- Generic features:
  - Baroque interfaces, tricky input, rat's nest of conditionals.
  - Enormous undertaking to hit with manual testing.
- Random “fuzz” testing
  - Charm: no manual work
  - Blind generation makes it hard to hit errors that occur only for a narrow input range
  - Also hard to hit errors that require structure
- This talk: a simple trick to finesse.

EGT: Execution Generated Testing [SPIN’05]
How to make system code crash itself!
- Basic idea: use the code itself to construct its input!
- Basic algorithm: symbolic execution + constraint solving.
  - Run code on symbolic inputs, initial value = “anything”
  - As code observes inputs, it tells us the values they can be.
  - At conditionals that use symbolic input, fork
    - On the true branch, add the constraint that the input satisfies the check
    - On the false branch, add the constraint that it does not.
  - Then generate inputs by solving these constraints and re-run the code using them.

The toy example
- Initialize x to be “any int”
- Code will run 3 times.
- Solve constraints at each to get our 3 test cases.
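The fork-and-solve loop can be sketched as follows. This is an illustrative toy, not EGT's implementation and not the slide's own example: it assumes a hypothetical program consisting of a sequence of checks of the form "if (x < threshold)" on one symbolic int, keeps the constraint set as an interval, and "solves" by picking the smallest value in the interval. All names are invented for the sketch.

```c
#include <assert.h>
#include <limits.h>

#define MAX_TESTS 16

/* Constraints on the single symbolic int x, kept as lo <= x <= hi. */
struct state { long long lo, hi; };

/* Symbolically run a program of the form
 *   if (x < thr[0]) ...; if (x < thr[1]) ...; ...
 * forking at every conditional. */
static void explore(const int *thr, int n, int idx, struct state s,
                    int *tests, int *ntests) {
    if (s.lo > s.hi)            /* current constraints are impossible: stop */
        return;
    if (idx == n) {             /* path ended: solve and emit a test input */
        assert(*ntests < MAX_TESTS);
        tests[(*ntests)++] = (int)s.lo;
        return;
    }
    struct state t = s, f = s;
    if (t.hi > (long long)thr[idx] - 1)
        t.hi = (long long)thr[idx] - 1;   /* true side: x < thr[idx] */
    if (f.lo < thr[idx])
        f.lo = thr[idx];                  /* false side: x >= thr[idx] */
    explore(thr, n, idx + 1, t, tests, ntests);
    explore(thr, n, idx + 1, f, tests, ntests);
}

/* x starts as "any int"; returns the number of generated test inputs. */
int generate_tests(const int *thr, int n, int *tests) {
    int ntests = 0;
    struct state any = { INT_MIN, INT_MAX };
    explore(thr, n, 0, any, tests, &ntests);
    return ntests;
}
```

For thresholds {0, 100} there are four syntactic paths, but "x < 0 and x >= 100" is infeasible and is pruned, so three test inputs come out: INT_MIN, 0 and 100.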

The big picture
- Implementation prototype
  - Do source-to-source transformation using CIL
  - Use the CVCL decision procedure to solve constraints, then re-run code on concrete values
  - Robustness: use mixed symbolic and concrete execution
- 3 ways to look at what’s going on
  - Grammar extraction
  - Turn code inside out, from input consumer to generator
  - Sort-of Heisenberg effect: observations perturb symbolic inputs into increasingly concrete ones. More definite observation = more definite perturbation

Mixed execution
- Basic idea: given an operation:
  - If all of its operands are concrete, just do it.
  - If any are symbolic, add a constraint.
  - If current constraints are impossible, stop.
  - If current path causes something to blow up, solve & emit.
  - If current path calls an unmodelled function, solve & call.
  - If program exits, solve & emit.
- How to track?
  - Use variable addresses to determine if symbolic or concrete
  - Note: symbolic assignment is not destructive. It creates a new symbol.

Example transformation “+”
- Each var v has v.concrete and v.symbolic fields
- If v is concrete, symbol = <invalid> and vice versa
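The slide's transformed code is not reproduced in this transcript, so the following is only a sketch of what a mixed "+" could look like, under stated assumptions: the struct layout, the function names, and the choice to keep symbolic expressions as text are all hypothetical and not taken from EGT. It shows the two cases from the mixed-execution rules: concrete operands are simply added, and a symbolic operand yields a new symbolic expression.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define EXPR_MAX 128

/* Hypothetical shadow value: exactly one of the two fields is meaningful. */
struct value {
    bool is_symbolic;
    int  concrete;             /* valid only when !is_symbolic */
    char symbolic[EXPR_MAX];   /* valid only when is_symbolic */
};

struct value make_concrete(int c) {
    struct value v = { false, c, "" };
    return v;
}

struct value make_symbolic(const char *name) {
    struct value v = { true, 0, "" };
    assert(name != NULL);
    strncpy(v.symbolic, name, EXPR_MAX - 1);  /* buffer is zero-filled */
    return v;
}

/* Render an operand as text for use inside a larger expression. */
static void render(const struct value *v, char *out, size_t n) {
    if (v->is_symbolic)
        snprintf(out, n, "%s", v->symbolic);
    else
        snprintf(out, n, "%d", v->concrete);
}

/* x = y + z under mixed execution. */
struct value mixed_add(struct value y, struct value z) {
    if (!y.is_symbolic && !z.is_symbolic)       /* all concrete: just do it */
        return make_concrete(y.concrete + z.concrete);

    char ys[EXPR_MAX], zs[EXPR_MAX];
    render(&y, ys, sizeof ys);
    render(&z, zs, sizeof zs);

    struct value x = { true, 0, "" };           /* new symbol, operands untouched */
    int n = snprintf(x.symbolic, sizeof x.symbolic, "(%s + %s)", ys, zs);
    assert(n >= 0 && (size_t)n < sizeof x.symbolic);  /* sketch: must fit */
    return x;
}
```

Adding concrete 2 and 3 gives concrete 5, while adding symbolic y and concrete 3 gives the symbolic expression "(y + 3)". A real tool would hand a structured expression to the solver instead of building a string.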

Results
- Mutt versions <= 1.4 have a buffer overflow (OSDI paper)
  - Input size 4: took 34 minutes to generate 458 tests with 98% statement coverage
- printf (3 implementations: PintOS, gccfast, embedded)
  - Made format strings symbolic
  - Two bugs
    - Incorrect grouping of integers
    - Incorrect handling of plus flags (“%” followed by space)

More..
- WsMP3 server case study
  - 2000 LOC
  - Technique: make recv input symbolic
  - Found known security hole + 2 new bugs
    - Network-controlled infinite loop
    - Buffer overflow

EXE: EXecution generated Executions [CCS’06]
Automatically Generating Inputs of Death!
- Same ideas as EGT
- Main contributions
  - More practical tool: can test any code path
  - Generates actual attacks
- Constraint solver: STP
  - Decision procedure for bitvectors and arrays.
  - If solvable, passes constraints to MiniSAT
  - About four times less code than CVCL and an order of magnitude faster
  - Array optimizations (substitution, refinements, simplification)

The mechanics
- User marks input to treat symbolically using either:
- Compile with the EXE compiler, exe-cc. It uses CIL to:
  - Insert checks around every expression: if operands are all concrete, run as normal. Otherwise, add as constraint
  - Insert fork calls when a symbolic value could cause multiple actions
- ./a.out: forks at each decision point.
  - When a path terminates, use STP to solve constraints.
  - Terminates when: (1) exit, (2) crash, (3) EXE detects an error
- Rerun concrete inputs through uninstrumented code.

Isn’t exponential expensive?
- Only fork on symbolic branches.
  - Most are concrete (linear).
- Loops? Heuristics.
  - Default: DFS. Linear processes with chain depth. Can get stuck.
  - “Best first” search: choose a branch, backtrack to the point that will run the code hit the fewest times.
  - Can do better…
- However:
  - Happy to let it run for weeks as long as it is generating interesting test cases. Competition is manual and random.
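The difference between the two search orders can be sketched as a choice of which pending (forked) state to resume next. This is only an illustration under assumptions: the pending-state struct, the per-line hit counter, and both function names are hypothetical, and EXE's actual heuristic is described on the slide only at this level of detail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical forked state waiting to resume at a given source line. */
struct pending { size_t next_line; };

/* DFS: always resume the most recently forked state. */
size_t pick_dfs(size_t n) {
    assert(n > 0);
    return n - 1;
}

/* "Best first": resume the state whose next line has been executed the
 * fewest times so far (ties go to the earliest state). */
size_t pick_best_first(const struct pending *states, size_t n,
                       const unsigned *hits, size_t nlines) {
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 0; i < n; i++) {
        assert(states[i].next_line < nlines);
        if (hits[states[i].next_line] < hits[states[best].next_line])
            best = i;
    }
    return best;
}
```

With hit counts {5, 0, 3} for lines 0, 1 and 2 and one pending state per line, DFS resumes the last state while best-first resumes the one about to run the never-executed line 1.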

Mixed execution
- Basic idea: given an expression (e.g., deref, ALU op)
  - If all of its operands are concrete, just do it.
  - If any are symbolic, add as constraint.
  - If current constraints are impossible, stop.
  - If current path hits error or exit(), solve + emit.
  - If it calls uninstrumented code: do call, or solve and do call
- Example: “x = y + z”
  - If y, z are both concrete, execute. Record x = concrete.
  - Otherwise set “x = y + z”, record x = symbolic.
- Result:
  - Most code runs concretely: a small slice deals with symbolics.
  - Robust: do not need all source code (e.g., OS). Just run

Limits
- Missed constraints:
  - If the code calls asm, or CIL cannot eat the file.
  - STP cannot do div/mod: constrain the divisor to be a power of 2, then use shift and mask respectively.
  - Cannot handle **p where “p” is symbolic: must concretize *p. (Note: **p is still symbolic.)
  - Stops path if it cannot solve; can get lost in exponentials.
- Missing:
  - No symbolic function pointers; symbolics passed to varargs are not tracked.
  - No floating point.
  - long long support is erratic.
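The div/mod workaround relies on a standard identity: for unsigned values and a divisor d = 2^k, division is a right shift by k and modulo is a mask with d - 1. The sketch below illustrates only that identity, assuming unsigned 32-bit operands; the function names are made up here and it does not show how EXE rewrites constraints.

```c
#include <assert.h>
#include <stdint.h>

/* Returns k if d == 2^k, or -1 if d is not a power of two. */
int log2_exact(uint32_t d) {
    if (d == 0 || (d & (d - 1)) != 0)
        return -1;
    int k = 0;
    while ((d >> k) != 1u)
        k++;
    return k;
}

/* x / d for a power-of-two divisor, expressed as a shift. */
uint32_t div_pow2(uint32_t x, uint32_t d) {
    int k = log2_exact(d);
    assert(k >= 0);
    return x >> k;
}

/* x % d for a power-of-two divisor, expressed as a mask. */
uint32_t mod_pow2(uint32_t x, uint32_t d) {
    assert(log2_exact(d) >= 0);
    return x & (d - 1);
}
```

For signed operands the shift and the division round differently on negative values, which is why the sketch restricts itself to unsigned integers.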

EXE Results
- Berkeley Packet Filter
  - Two buffer overflow exploits
- udhcpd: well-tested user-level DHCP server
  - Five memory errors
- PCRE: Perl Compatible Regular Expressions
  - Many out-of-bounds writes leading to abort in glibc on free
- Disks of death: file systems
  - Four bugs in the ext2 and ext3 file systems.
  - Null pointer dereference in JFS

A galactic view [Oakland’06]

KLEE
Thanks to Cristian Cadar for the slides

Writing Systems Code Is Hard
- Code complexity
  - Tricky control flow
  - Complex dependencies
  - Abusive use of pointer operations
- Environmental dependencies
  - Code has to anticipate all possible interactions
  - Including malicious ones

KLEE [OSDI 2008, Best Paper Award]
- Based on symbolic execution and constraint solving techniques
- Automatically generates high coverage test suites
- Finds deep bugs in complex systems programs

Toy Example

    int bad_abs(int x) {
        if (x < 0)
            return -x;
        if (x == 1234)
            return -x;
        return x;
    }

Execution tree (x starts unconstrained):
- x < 0?
  - TRUE: constraint x < 0, return -x; test1.out: x = -2
  - FALSE: constraint x ≥ 0, then x = 1234?
    - TRUE: constraint x = 1234, return -x; test2.out: x = 1234
    - FALSE: constraint x ≠ 1234, return x; test3.out: x = 3
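The three generated inputs can be replayed concretely to see which one exposes the bug. The sketch below is an illustration, not KLEE output: bad_abs is the slide's function, while the oracle abs_result_ok and the replay helper are hypothetical names added here. With KLEE itself, x would instead be marked symbolic (for example with klee_make_symbolic) and the tool would produce the inputs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The function from the slide. */
int bad_abs(int x) {
    if (x < 0)
        return -x;
    if (x == 1234)
        return -x;   /* the bug: a positive input is negated */
    return x;
}

/* Hypothetical oracle: an absolute value is non-negative and equals
 * x or -x. Assumes x != INT_MIN, whose negation overflows. */
bool abs_result_ok(int x) {
    int r = bad_abs(x);
    return r >= 0 && (r == x || r == -x);
}

/* Replays generated inputs; returns the number of failures and stores
 * the first failing input in *first_failure. */
int replay(const int *inputs, size_t n, int *first_failure) {
    assert(inputs != NULL && first_failure != NULL);
    int failures = 0;
    for (size_t i = 0; i < n; i++) {
        if (!abs_result_ok(inputs[i])) {
            if (failures == 0)
                *first_failure = inputs[i];
            failures++;
        }
    }
    return failures;
}
```

Replaying {-2, 1234, 3} reports exactly one failure, for x = 1234, the single input out of roughly four billion that random testing would have to guess.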

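KLEE Architecture
[Figure: C code is compiled by LLVM to LLVM bytecode, which KLEE runs inside a symbolic environment. KLEE sends path constraints (e.g., x ≥ 0, x ≠ 1234) to the constraint solver (STP), which returns concrete values (e.g., x = 3). Generated tests: x = -2, x = 1234, x = 3.]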

Outline
- Motivation
- Example and Basic Architecture
- Scalability Challenges
- Experimental Evaluation

Three Big Challenges
- Motivation
- Example and Basic Architecture
- Scalability Challenges
  - Exponential search space
  - Constraint solving
  - Interaction with environment
- Experimental Evaluation

Exponential Search Space
- Naïve exploration can easily get “stuck”
- Use search heuristics:
  - Coverage-optimized search
    - Favor paths that recently hit new code
  - Random path search

Constraint Solving
- Dominates runtime
  - Invoked at every branch
- Two simple and effective optimizations
  - Eliminating irrelevant constraints
  - Caching solutions
- Dramatic speedup on our benchmarks

Eliminating Irrelevant Constraints
- In practice, each branch usually depends on a small number of variables
- Example: at the branch

      …
      …
      if (x < 10) {
          …
      }

  the current constraints are x + y > 10 and z & -z = z, and the query is x < 10?
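One way to decide which constraints are relevant to a branch query is to keep only those that share variables with it, directly or transitively. The sketch below assumes each constraint is summarized by a bitmask of the variables it mentions; the encoding and the function name are illustrative assumptions, not KLEE's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Marks in keep[] the constraints that share variables with the query,
 * directly or through other kept constraints. vars[i] is a bitmask of
 * the variables constraint i mentions. Returns how many were kept. */
int select_relevant(const unsigned *vars, size_t n, unsigned query_vars,
                    bool *keep) {
    assert(vars != NULL && keep != NULL);
    unsigned reach = query_vars;   /* variables known to influence the query */
    int kept = 0;
    for (size_t i = 0; i < n; i++)
        keep[i] = false;

    bool changed = true;
    while (changed) {              /* iterate to a fixed point */
        changed = false;
        for (size_t i = 0; i < n; i++) {
            if (!keep[i] && (vars[i] & reach) != 0) {
                keep[i] = true;
                reach |= vars[i];
                kept++;
                changed = true;
            }
        }
    }
    return kept;
}
```

With x, y, z as bits 0, 1, 2, the constraint set {x + y > 10, z & -z = z} and the query x < 10 keep only x + y > 10; the constraint on z never reaches the solver.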

Caching Solutions
- Static set of branches: lots of similar constraint sets
- Cached: {2 * y < 100, x > 3, x + y > 10} has solution {x = 5, y = 15}
- Eliminating constraints cannot invalidate a solution:
  {2 * y < 100, x + y > 10} is still satisfied by {x = 5, y = 15}
- Adding constraints often does not invalidate a solution:
  {2 * y < 100, x > 3, x + y > 10, x < 10} is still satisfied by {x = 5, y = 15}
- UBTree data structure [Hoffmann and Koehler, IJCAI ’99]
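The two cache rules can be sketched with the slide's own constraints. This is a simplified illustration under assumptions: constraint sets are bitmasks over a fixed table of four constraints, the cache is a plain array scanned linearly instead of a UBTree, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct assignment { int x, y; };
typedef bool (*constraint_fn)(struct assignment);

/* The four constraints from the slide, indexed 0..3. */
static bool c0(struct assignment a) { return 2 * a.y < 100; }
static bool c1(struct assignment a) { return a.x > 3; }
static bool c2(struct assignment a) { return a.x + a.y > 10; }
static bool c3(struct assignment a) { return a.x < 10; }
static const constraint_fn CONSTRAINTS[] = { c0, c1, c2, c3 };
#define NUM_CONSTRAINTS 4

/* A cached constraint set (bit i set = constraint i present) and a
 * solution known to satisfy it. */
struct cache_entry { unsigned set; struct assignment solution; };

static bool satisfies(unsigned set, struct assignment a) {
    for (int i = 0; i < NUM_CONSTRAINTS; i++)
        if ((set & (1u << i)) != 0 && !CONSTRAINTS[i](a))
            return false;
    return true;
}

/* Returns true and fills *out if the cache answers the query without
 * calling the solver. */
bool cache_lookup(const struct cache_entry *cache, size_t n,
                  unsigned query, struct assignment *out) {
    assert(cache != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        bool query_is_subset   = (query & ~cache[i].set) == 0;
        bool query_is_superset = (cache[i].set & ~query) == 0;
        /* Subset: dropping constraints cannot invalidate the solution.
         * Superset: the old solution often still works, so try it. */
        if (query_is_subset ||
            (query_is_superset && satisfies(query, cache[i].solution))) {
            *out = cache[i].solution;
            return true;
        }
    }
    return false;
}
```

A cache holding {c0, c1, c2} with {x = 5, y = 15} answers both the subset query {c0, c2} and the superset query that adds x < 10. Had the cached solution been {x = 12, y = 15}, the superset check would fail and the solver would have to be called.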

Dramatic Speedup
[Figure: aggregated data over 73 applications; axes: time (s) and executed instructions (normalized).]

Environment: Calling Out Into OS

    int fd = open("t.txt", O_RDONLY);

- If all arguments are concrete, forward to OS

    int fd = open(sym_str, O_RDONLY);

- Otherwise, provide models that can handle symbolic files

Environmental Modeling

    // actual implementation: ~50 LOC
    ssize_t read(int fd, void *buf, size_t count) {
        exe_file_t *f = get_file(fd);
        …
        memcpy(buf, f->contents + f->off, count);
        f->off += count;
        …
    }

- Plain C code run by KLEE
- Currently: effective support for symbolic command line arguments, files, links, pipes, ttys, environment vars
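A self-contained version of such a model might look like the sketch below. It is not KLEE's implementation: the struct, its fields, and the function name model_read are hypothetical, and the elided parts of the slide's code are not reconstructed. It only fills in the obvious bookkeeping (clamping to the end of the file and returning the byte count) so that the fragment compiles and runs on its own.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t */

/* Hypothetical in-memory file. Under a symbolic executor the bytes in
 * contents could be symbolic; here they are ordinary concrete bytes. */
struct model_file {
    const unsigned char *contents;
    size_t size;   /* total bytes in the file */
    size_t off;    /* current read offset */
};

/* Model of read(): copy up to count bytes from the current offset and
 * advance it; returns the number of bytes copied (0 at end of file). */
ssize_t model_read(struct model_file *f, void *buf, size_t count) {
    assert(f != NULL && buf != NULL && f->off <= f->size);
    size_t remaining = f->size - f->off;
    if (count > remaining)
        count = remaining;
    memcpy(buf, f->contents + f->off, count);
    f->off += count;
    return (ssize_t)count;
}
```

Because the model is plain C, the symbolic executor can run it like any other code, and reads from a file with symbolic contents simply return symbolic bytes to the program under test.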

Does KLEE work?
- Motivation
- Example and Basic Architecture
- Scalability Challenges
- Evaluation

GNU Coreutils Suite
- Core user-level apps installed on many UNIX systems
- 89 stand-alone (i.e., excluding wrappers) apps (v6.10)
  - Management of system properties: hostname, printenv, etc.
  - Text file processing: sort, wc, od, etc.
  - …
- Variety of functions, different authors, intensive interaction with environment
- Heavily tested, mature code
