Symbolic execution
Overview of work done by Dawson Engler’s group at Stanford (EGT/EXE/KLEE*)
by Shauvik Roy Choudhary, http://cc.gatech.edu/~shauvik
Some slides adapted from the EXE and KLEE presentations, plus slides from Saswat
Old research area but still active
  • First introduced in the mid-1970s: 1975 (source: Saswat); 1976 by James King, IBM T.J. Watson
  • Very active area of research, e.g.:
      • EGT / EXE / KLEE [Stanford]
      • DART [Bell Labs]
      • CUTE [UIUC]
      • SAGE, Pex [MSR Redmond]
      • Vigilante [MSR Cambridge]
      • BitScope [Berkeley/CMU]
      • CatchConv [Berkeley]
      • JPF [NASA Ames]
Symbolic Execution
  • Symbolic execution refers to executing a program with symbols, rather than concrete values, as arguments.
  • Unlike concrete execution, which follows one path per run, symbolic execution can take any feasible path of the program (limitation: the constraint solver).
  • During symbolic execution, program state consists of:
      • symbolic values for some memory locations
      • a path condition
  • The path condition is a conjunction of constraints on the symbolic input values.
  • A solution of the path condition is a test input that covers the respective path.
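A minimal sketch of the idea that each path has a path condition whose solution is a test input for that path. The toy two-input program, the path names A/B/C, and the brute-force `solve` function standing in for a real constraint solver are all assumptions made for illustration; none of them come from the slides.

```python
from itertools import product

# Toy program: if x > y: (if x > 10: A else B) else: C
# Each path is described by its path condition, a conjunction of predicates.
PATHS = {
    "A": [lambda x, y: x > y, lambda x, y: x > 10],
    "B": [lambda x, y: x > y, lambda x, y: not (x > 10)],
    "C": [lambda x, y: not (x > y)],
}

def solve(path_condition, domain=range(-20, 21)):
    """Stand-in for a constraint solver: brute-force search of a small domain."""
    for x, y in product(domain, repeat=2):
        if all(constraint(x, y) for constraint in path_condition):
            return x, y
    return None  # no solution found inside the searched domain

def run(x, y):
    """Concrete execution of the same toy program; returns the path taken."""
    if x > y:
        return "A" if x > 10 else "B"
    return "C"

# One test input per path, obtained by solving that path's condition.
tests = {name: solve(pc) for name, pc in PATHS.items()}
```

Each entry of `tests` is an input that drives `run` down the path it was solved for, which is the sense in which a solution of the path condition "covers" the path.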
Implementation of Symbolic Execution
  • Transformation approach
      • Transform the program into another program that operates on symbolic values, such that executing the transformed program is equivalent to symbolic execution of the original program
      • Difficult to implement, portable solution, suitable for Java, .NET
  • Instrumentation approach
      • Callback hooks are inserted in the program such that symbolic execution is done in the background during normal execution of the program
      • Easy to implement for C
  • Customized runtime approach
      • Customize the runtime (e.g., JVM) to support symbolic execution
      • Applicable to Java, .NET; difficult to implement, flexible, not portable
  • Example tools named on the slide: CUTE, KLEE, JPF
Limitations of Symbolic Execution
  • Limited by the power of the constraint solver
      • Cannot handle non-linear and very complex constraints
  • Does not scale when the number of paths is large (subject of ongoing research in this area)
  • Source code, or an equivalent (e.g., Java class files), is required for precise symbolic execution
EGT & EXE Slides based on D. Engler’s slides
Goal: find many bugs in systems code
  • Generic features: baroque interfaces, tricky input, rat's nest of conditionals.
  • Enormous undertaking to hit with manual testing.
  • Random “fuzz” testing
      • Charm: no manual work
      • Blind generation makes it hard to hit errors that occur only for a narrow input range
      • Also hard to hit errors that require structure
  • This talk: a simple trick to finesse.
EGT: Execution Generated Testing [SPIN’05]
How to make system code crash itself!
  • Basic idea: use the code itself to construct its input!
  • Basic algorithm: symbolic execution + constraint solving.
      • Run code on symbolic inputs, initial value = “anything”
      • As code observes inputs, it tells us the values they can be.
      • At conditionals that use symbolic input, fork
          • On the true branch, add the constraint that the input satisfies the check
          • On the false branch, add the constraint that it does not
      • Then solve these constraints to generate concrete inputs and re-run the code on them.
The toy example
  • Initialize x to be “any int”
  • Code will run 3 times.
  • Solve constraints at each … to get our 3 test cases.
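The fork-at-every-symbolic-branch loop described above can be sketched as follows. This is a sketch under stated assumptions: forking is simulated by re-running the program with a list of forced branch outcomes, the constraint solver is a brute-force search over a small integer range, and `toy_program` is a hypothetical stand-in because the slide's own example code is not preserved. Unlike a real engine, it does not stop a run as soon as its constraints become impossible; infeasible paths are simply dropped when no solution is found.

```python
DOMAIN = range(-2000, 2001)  # assumed small input range for the brute-force solver

def solve(constraints):
    """Stand-in for a constraint solver over a single int input x."""
    for x in DOMAIN:
        if all(c(x) for c in constraints):
            return x
    return None

class Ctx:
    """Records branch outcomes and the path condition for one run."""
    def __init__(self, forced):
        self.forced = list(forced)  # branch outcomes to replay on this run
        self.taken = []             # outcomes actually taken
        self.constraints = []       # path condition collected so far

    def branch(self, pred):
        i = len(self.taken)
        outcome = self.forced[i] if i < len(self.forced) else True
        self.taken.append(outcome)
        # True side: input satisfies the check; False side: it does not.
        self.constraints.append(pred if outcome else (lambda x, p=pred: not p(x)))
        return outcome

def toy_program(ctx):
    """Hypothetical program with two checks on the symbolic input x."""
    if ctx.branch(lambda x: x < 0):
        return "negative"
    if ctx.branch(lambda x: x == 1234):
        return "magic"
    return "other"

def explore(program):
    tests, worklist = [], [[]]
    while worklist:
        forced = worklist.pop()
        ctx = Ctx(forced)
        program(ctx)
        # "Fork": schedule the False side of every branch first decided on this run.
        for i in range(len(forced), len(ctx.taken)):
            worklist.append(ctx.taken[:i] + [False])
        x = solve(ctx.constraints)
        if x is not None:
            tests.append(x)
    return tests

tests = explore(toy_program)
```

The program runs three times, once per path, and each run contributes one solved input.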
The big picture
  • Implementation prototype
      • Do source-to-source transformation using CIL
      • Use the CVCL decision procedure to solve constraints, then re-run code on concrete values
      • Robustness: use mixed symbolic and concrete execution
  • 3 ways to look at what’s going on
      • Grammar extraction
      • Turn code inside out, from input consumer to generator
      • Sort-of Heisenberg effect: observations perturb symbolic inputs into increasingly concrete ones. More definite observation = more definite perturbation
Mixed execution
  • Basic idea: given an operation:
      • If all of its operands are concrete, just do it.
      • If any are symbolic, add a constraint.
      • If the current constraints are impossible, stop.
      • If the current path causes something to blow up, solve & emit.
      • If the current path calls an unmodelled function, solve & call.
      • If the program exits, solve & emit.
  • How to track? Use variable addresses to determine whether a value is symbolic or concrete.
  • Note: symbolic assignment is not destructive. It creates a new symbol.
Example transformation “+”
  • Each variable v has v.concrete and v.symbolic fields
  • If v is concrete, v.symbolic = <invalid>, and vice versa
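A minimal sketch of what a transformed “+” might look like when every value carries a concrete field and a symbolic field. The `Value` class, the use of `None` for the invalid field, and the string representation of symbolic expressions are assumptions for illustration; the slide's actual generated code is not preserved.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Value:
    concrete: Optional[int] = None  # valid only when the value is concrete
    symbolic: Optional[str] = None  # expression text, valid only when symbolic

    @property
    def is_symbolic(self):
        return self.symbolic is not None

def expr(v):
    """Render a value as an expression for use inside a symbolic result."""
    return v.symbolic if v.is_symbolic else str(v.concrete)

def add(a, b):
    """Instrumented '+': run concretely when possible, else build an expression."""
    if not a.is_symbolic and not b.is_symbolic:
        return Value(concrete=a.concrete + b.concrete)
    # At least one operand is symbolic: the result is a new symbolic value;
    # the operands themselves are left untouched.
    return Value(symbolic=f"({expr(a)} + {expr(b)})")
```

For example, `add(Value(concrete=2), Value(concrete=3))` stays concrete, while `add(Value(symbolic="y"), Value(concrete=3))` yields a new symbolic value with expression `(y + 3)`.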
Results
  • Mutt versions <= 1.4 have a buffer overflow (OSDI paper)
      • Input size 4: took 34 minutes to generate 458 tests with 98% statement coverage
  • printf (3 implementations: Pintos, gccfast, embedded)
      • Made format strings symbolic
      • Two bugs
          • Incorrect grouping of integers
          • Incorrect handling of plus flags (“%” followed by space)
More results: WsMP3 server case study
  • 2000 LOC
  • Technique: make recv input symbolic
  • Found known security hole + 2 new bugs
      • Network-controlled infinite loop
      • Buffer overflow
EXE: EXecution generated Executions [CCS’06]
Automatically generating inputs of death!
  • Same ideas as EGT
  • Main contributions
      • More practical tool:
          • Can test any code path
          • Generates actual attacks
      • Constraint solver: STP
          • Decision procedure for bitvectors and arrays.
          • If solvable, passes constraints to MiniSAT
          • About four times less code than CVCL and an order of magnitude faster
          • Array optimizations (substitution, refinements, simplification)
The mechanics
  • User marks the input to treat symbolically.
  • Compile with the EXE compiler, exe-cc. It uses CIL to:
      • Insert checks around every expression: if the operands are all concrete, run as normal. Otherwise, add as a constraint.
      • Insert fork calls where a symbolic value could cause multiple actions
  • ./a.out: forks at each decision point.
      • When a path terminates, use STP to solve the constraints.
      • Terminates when: (1) exit, (2) crash, (3) EXE detects an error
  • Rerun the concrete inputs through the uninstrumented code.
Isn’t exponential expensive?
  • Only fork on symbolic branches.
      • Most are concrete (linear).
  • Loops? Heuristics.
      • Default: DFS. The number of processes grows linearly with chain depth. Can get stuck.
      • “Best first” search: choose a branch, then backtrack to the point that will run the code hit the fewest times.
      • Can do better…
  • However: happy to let it run for weeks as long as it is generating interesting test cases. The competition is manual and random testing.
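The difference between the default DFS choice and the “best first” choice can be sketched as two selection functions over the pending forked states. The state representation (a dict with a `next_line` key) and the per-line hit counter are assumptions for illustration, not EXE's actual data structures.

```python
from collections import Counter

def pick_dfs(pending):
    """Depth-first: continue with the most recently forked state."""
    return pending[-1]

def pick_best_first(pending, hit_counts):
    """Best-first: resume the state whose next line has run the fewest times."""
    return min(pending, key=lambda state: hit_counts[state["next_line"]])

# How often each source line has been executed so far (hypothetical numbers).
hit_counts = Counter({10: 5, 20: 1, 30: 3})
pending = [
    {"id": "a", "next_line": 10},
    {"id": "b", "next_line": 20},
    {"id": "c", "next_line": 30},
]
```

With these numbers DFS resumes state `c`, while best-first resumes state `b`, whose next line has been executed least often.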
Mixed execution
  • Basic idea: given an expression (e.g., deref, ALU op)
      • If all of its operands are concrete, just do it.
      • If any are symbolic, add as a constraint.
      • If the current constraints are impossible, stop.
      • If the current path hits an error or exit(), solve + emit.
      • If it calls uninstrumented code: do the call, or solve and do the call
  • Example: “x = y + z”
      • If y, z are both concrete, execute. Record x = concrete.
      • Otherwise set “x = y + z”, record x = symbolic.
  • Result:
      • Most code runs concretely: a small slice deals with symbolics.
      • Robust: do not need all source code (e.g., OS). Just run.
Limits
  • Missed constraints:
      • If the code calls asm, or CIL cannot eat the file.
      • STP cannot do div/mod: constrain the divisor to be a power of 2, then use shift and mask respectively.
      • Cannot handle **p where “p” is symbolic: must concretize *p. (Note: **p is still symbolic.)
      • Stops the path if it cannot solve; can get lost in exponentials.
  • Missing:
      • No symbolic function pointers; symbolics passed to varargs are not tracked.
      • No floating point.
      • long long support is erratic.
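The div/mod workaround relies on a standard identity: for a non-negative x and a divisor that is a power of two, division is a right shift and remainder is a bit mask. A small sketch, assuming unsigned (non-negative) operands; the function names are illustrative and how EXE encodes this internally is not shown on the slide.

```python
def is_power_of_two(d):
    return d > 0 and d & (d - 1) == 0

def div_pow2(x, d):
    """x // d for non-negative x and power-of-two d, using only a shift."""
    if not is_power_of_two(d):
        raise ValueError("divisor must be a power of two")
    return x >> (d.bit_length() - 1)

def mod_pow2(x, d):
    """x % d for non-negative x and power-of-two d, using only a mask."""
    if not is_power_of_two(d):
        raise ValueError("divisor must be a power of two")
    return x & (d - 1)
```

For example, with d = 8 the shift amount is 3 and the mask is 0b111, so `div_pow2(29, 8)` and `mod_pow2(29, 8)` agree with `29 // 8` and `29 % 8`.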
EXE Results
  • Berkeley Packet Filter
      • Two buffer overflow exploits
  • udhcpd: well-tested user-level DHCP server
      • Five memory errors
  • PCRE: Perl Compatible Regular Expressions
      • Many out-of-bounds writes leading to an abort in glibc on free
  • Disks of death: file systems
      • Four bugs in the ext2 and ext3 file systems
      • Null pointer dereference in JFS
A galactic view [Oakland’06]
KLEE
Thanks to Cristian Cadar for the slides
Writing Systems Code Is Hard
  • Code complexity
      • Tricky control flow
      • Complex dependencies
      • Abusive use of pointer operations
  • Environmental dependencies
      • Code has to anticipate all possible interactions
      • Including malicious ones
KLEE [OSDI 2008, Best Paper Award]
  • Based on symbolic execution and constraint solving techniques
  • Automatically generates high coverage test suites
  • Finds deep bugs in complex systems programs
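Toy Example
    int bad_abs(int x) {
        if (x < 0)
            return -x;
        if (x == 1234)
            return -x;
        return x;
    }
Execution tree, starting with x unconstrained:
  • x < 0?
      • TRUE (x < 0): return -x
      • FALSE (x ≥ 0): x = 1234?
          • TRUE (x = 1234): return -x
          • FALSE (x ≠ 1234): return x
  • Generated tests test1.out, test2.out and test3.out, one per path, with inputs x = -2, x = 1234 and x = 3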
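KLEE Architecture (diagram)
  • C code is compiled by LLVM into LLVM bytecode, which KLEE runs against a symbolic environment.
  • KLEE hands path constraints (e.g., x ≥ 0, x ≠ 1234) to the constraint solver (STP), which returns a concrete value (x = 3).
  • Test inputs shown: x = -2, x = 1234, x = 3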
Outline
  • Motivation
  • Example and Basic Architecture
  • Scalability Challenges
  • Experimental Evaluation
Three Big Challenges
  • Motivation
  • Example and Basic Architecture
  • Scalability Challenges
      • Exponential search space
      • Expensive constraint solving
      • Interaction with environment
  • Experimental Evaluation
Exponential Search Space
  • Naïve exploration can easily get “stuck”
  • Use search heuristics:
      • Coverage-optimized search
          • Favor paths that recently hit new code
      • Random path search
Constraint Solving
  • Dominates runtime
      • Invoked at every branch
  • Two simple and effective optimizations
      • Eliminating irrelevant constraints
      • Caching solutions
  • Dramatic speedup on our benchmarks
Eliminating Irrelevant Constraints
In practice, each branch usually depends on a small number of variables.
    …
    if (x < 10) {
        …
    }
  • Path constraints so far: x + y > 10, z & -z = z
  • Branch query: x < 10 ?
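One way to picture this optimization: before calling the solver for a branch, keep only the constraints that share variables, directly or transitively, with the branch condition. The sketch below is an illustration under assumptions; constraints are represented as (text, variable set) pairs and the function name is hypothetical, not KLEE's API.

```python
def relevant_constraints(constraints, query_vars):
    """Return the constraints transitively connected to the query's variables.

    constraints: list of (text, set_of_variable_names) pairs.
    query_vars:  variables mentioned by the branch condition.
    """
    needed = set(query_vars)
    remaining = list(constraints)
    selected = []
    changed = True
    while changed:
        changed = False
        for item in remaining[:]:
            _, variables = item
            if variables & needed:
                needed |= variables      # its variables become relevant too
                selected.append(item)
                remaining.remove(item)
                changed = True
    return [text for text, _ in selected]

path_constraints = [
    ("x + y > 10", {"x", "y"}),
    ("z & -z = z", {"z"}),
]
kept = relevant_constraints(path_constraints, {"x"})
```

For the slide's example, the query `x < 10` mentions only x, so `x + y > 10` is kept (it shares x) and `z & -z = z` is dropped before the solver is invoked.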
Caching Solutions
  • Static set of branches: lots of similar constraint sets
  • Cached: {2 * y < 100, x > 3, x + y > 10} has solution x = 5, y = 15
  • Eliminating constraints cannot invalidate a solution: {2 * y < 100, x + y > 10} is still satisfied by x = 5, y = 15
  • Adding constraints often does not invalidate a solution: {2 * y < 100, x > 3, x + y > 10, x < 10} is still satisfied by x = 5, y = 15
  • UBTree data structure [Hoffmann and Koehler, IJCAI ’99]
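The two cache rules on this slide can be sketched with a simple cache keyed by constraint sets. This is an illustration under assumptions: it uses a linear scan instead of the UBTree the slide cites, constraints are named strings backed by Python lambdas, and the extra constraint `x > 7` is made up to show a cache miss.

```python
CONSTRAINTS = {
    "2*y < 100": lambda x, y: 2 * y < 100,
    "x > 3": lambda x, y: x > 3,
    "x + y > 10": lambda x, y: x + y > 10,
    "x < 10": lambda x, y: x < 10,
    "x > 7": lambda x, y: x > 7,  # hypothetical, used to show a miss
}

def holds(name, solution):
    """Check one named constraint against a concrete assignment."""
    return CONSTRAINTS[name](**solution)

class SolutionCache:
    def __init__(self):
        self.entries = []  # (frozenset of constraint names, solution dict)

    def store(self, constraints, solution):
        self.entries.append((frozenset(constraints), solution))

    def lookup(self, constraints):
        query = frozenset(constraints)
        for cached, solution in self.entries:
            # Rule 1: the query is a subset of a solved set, so the cached
            # solution still satisfies it (removing constraints is safe).
            if query <= cached:
                return solution
            # Rule 2: the query adds constraints to a solved set; reuse the
            # cached solution if it happens to satisfy the additions too.
            if cached <= query and all(holds(c, solution) for c in query - cached):
                return solution
        return None  # miss: a real engine would now call the solver

cache = SolutionCache()
cache.store(["2*y < 100", "x > 3", "x + y > 10"], {"x": 5, "y": 15})
```

A lookup for the smaller set or for the set extended with `x < 10` returns the cached assignment, while extending it with `x > 7` misses because x = 5 violates the new constraint.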
Dramatic Speedup
[Chart: aggregated data over 73 applications; axes: Time (s), Executed instructions (normalized)]
Environment: Calling Out Into OS
  • Concrete call: int fd = open("t.txt", O_RDONLY);
      • If all arguments are concrete, forward to OS
  • Symbolic call: int fd = open(sym_str, O_RDONLY);
      • Otherwise, provide models that can handle symbolic files
Environmental Modeling
    // actual implementation: ~50 LOC
    ssize_t read(int fd, void *buf, size_t count) {
        exe_file_t *f = get_file(fd);
        …
        memcpy(buf, f->contents + f->off, count);
        f->off += count;
        …
    }
  • Plain C code run by KLEE
  • Currently: effective support for symbolic command line arguments, files, links, pipes, ttys, environment vars
Does KLEE work?
  • Motivation
  • Example and Basic Architecture
  • Scalability Challenges
  • Evaluation
      • Bug finding
      • Crosschecking
GNU Coreutils Suite
  • Core user-level apps installed on many UNIX systems
  • 89 stand-alone (i.e. excluding wrappers) apps (v6.10)
      • Management of system properties: hostname, printenv, etc.
      • Text file processing: sort, wc, od, etc.
      • …
  • Variety of functions, different authors, intensive interaction with environment
  • Heavily tested, mature code

Symbolic Execution And KLEE

  • 1. Symbolic execution: overview of work done by Dawson Engler’s group at Stanford (EGT/EXE/KLEE*), by Shauvik Roy Choudhary (http://cc.gatech.edu/~shauvik). Some slides adapted from the EXE and KLEE presentations, plus slides from Saswat.
  • 2. Old research area but still active. First introduced in 1975 (source: Saswat) / 1976 by James King, IBM T.J. Watson. Very active area of research, e.g.: EGT / EXE / KLEE [Stanford], DART [Bell Labs], CUTE [UIUC], SAGE, Pex [MSR Redmond], Vigilante [MSR Cambridge], BitScope [Berkeley/CMU], CatchConv [Berkeley], JPF [NASA Ames].
  • 3. Symbolic Execution. Symbolic execution refers to executing a program with symbols as arguments. Unlike concrete execution, in symbolic execution the program can take any feasible path (limitation: the constraint solver). During symbolic execution, the program state consists of (1) symbolic values for some memory locations and (2) a path condition. The path condition is a conjunction of constraints on the symbolic input values. A solution of the path condition is a test input that covers the respective path.
  • 4. Implementation of Symbolic Execution. Transformation approach: transform the program into another program that operates on symbolic values, such that executing the transformed program is equivalent to symbolically executing the original; difficult to implement, portable, suitable for Java and .NET. Instrumentation approach: callback hooks are inserted into the program so that symbolic execution is done in the background during normal execution; easy to implement for C. Customized runtime approach: customize the runtime (e.g., the JVM) to support symbolic execution; applicable to Java and .NET, difficult to implement, flexible, not portable. Example tools named on the slide: CUTE, KLEE, JPF.
  • 5. Limitations of Symbolic Execution. Limited by the power of the constraint solver: cannot handle non-linear and very complex constraints. Does not scale when the number of paths is large (a subject of ongoing research in this area). Source code, or an equivalent (e.g., Java class files), is required for precise symbolic execution.
  • 6. EGT & EXE Slides based on D. Engler’s slides
  • 7. Goal: find many bugs in systems code. Generic features: baroque interfaces, tricky input, rat’s nest of conditionals; an enormous undertaking to hit with manual testing. Random “fuzz” testing: the charm is that it needs no manual work, but blind generation makes it hard to hit errors that occur only for a narrow input range, and also hard to hit errors that require structured input. This talk: a simple trick to finesse this.
  • 8. EGT: Execution Generated Testing [SPIN’05]. How to make system code crash itself! Basic idea: use the code itself to construct its input. Basic algorithm: symbolic execution + constraint solving. Run the code on symbolic inputs whose initial value is “anything”. As the code observes its inputs, it tells us what values they can be. At each conditional that uses symbolic input, fork: on the true branch, add the constraint that the input satisfies the check; on the false branch, that it does not. Then solve these constraints to generate concrete inputs and re-run the code on them.
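The fork-and-constrain loop described on this slide can be sketched in a few lines of C. This is a minimal illustration and not EGT's implementation: the `Constraint` type, the `generate_tests` function, and the brute-force `solve` routine (a stand-in for a real decision procedure, searching only the range [-5000, 5000]) are all hypothetical names introduced here. The "program" is modelled as a chain of early-return checks on a single symbolic int.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical constraint over one symbolic int x: "x OP rhs". */
typedef enum { LT, GE, EQ, NE } Op;
typedef struct { Op op; int rhs; } Constraint;

/* Negation of a constraint: used on the false side of a branch. */
static Constraint negate(Constraint c) {
    static const Op opposite[] = { GE, LT, NE, EQ };
    c.op = opposite[c.op];
    return c;
}

static bool holds(Constraint c, int x) {
    switch (c.op) {
    case LT: return x < c.rhs;
    case GE: return x >= c.rhs;
    case EQ: return x == c.rhs;
    default: return x != c.rhs;
    }
}

/* Stand-in for a real solver: brute-force search over [-5000, 5000]. */
static bool solve(const Constraint *pc, size_t n, int *witness) {
    for (int x = -5000; x <= 5000; x++) {
        size_t i = 0;
        while (i < n && holds(pc[i], x)) i++;
        if (i == n) { *witness = x; return true; }
    }
    return false;
}

#define MAX_BRANCHES 16

/* Walks a chain of early-return checks. At check i the true side ends the
   path and the false side continues. Every feasible path condition is
   solved into one concrete test input; `tests` needs room for n + 1 ints.
   Returns the number of tests written. */
size_t generate_tests(const Constraint *branches, size_t n, int *tests) {
    Constraint pc[MAX_BRANCHES];   /* path condition built so far */
    size_t count = 0;
    int x;
    assert(n <= MAX_BRANCHES);
    for (size_t i = 0; i < n; i++) {
        pc[i] = branches[i];                 /* true side: add the check */
        if (solve(pc, i + 1, &x)) tests[count++] = x;
        pc[i] = negate(branches[i]);         /* false side: add its negation */
    }
    if (solve(pc, n, &x)) tests[count++] = x;   /* every check failed */
    return count;
}
```

For the two checks `x < 0` and `x == 1234`, the sketch yields three inputs, one per path. Paths whose constraints have no solution in the searched range are dropped, which a real solver would decide exactly.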
  • 9. The toy example: initialize x to be “any int”. The code will run 3 times; solve the constraints at each run to get our 3 test cases.
  • 10. The big picture. Implementation prototype: do source-to-source transformation using CIL; use the CVCL decision procedure to solve constraints, then re-run the code on concrete values. Robustness: use mixed symbolic and concrete execution. Three ways to look at what’s going on: (1) grammar extraction; (2) turning code inside out, from input consumer to input generator; (3) a sort-of Heisenberg effect: observations perturb symbolic inputs into increasingly concrete ones, and a more definite observation means a more definite perturbation.
  • 11. Mixed execution. Basic idea, given an operation: if all of its operands are concrete, just do it; if any are symbolic, add a constraint; if the current constraints are impossible, stop; if the current path causes something to blow up, solve & emit; if the current path calls an unmodelled function, solve & call; if the program exits, solve & emit. How to track? Use variable addresses to determine whether a value is symbolic or concrete. Note: symbolic assignment is not destructive; it creates a new symbol.
  • 12. Example transformation of “+”: each variable v has v.concrete and v.symbolic fields. If v is concrete, v.symbolic = <invalid>, and vice versa.
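The slide's code for the transformed “+” did not survive in the transcript. As a rough sketch of the idea only, the C below pairs each value with a concrete field and a symbolic field, and dispatches on which one is valid. The `Value` layout, the restriction of symbolic expressions to the linear form `coef * x + offset` over a single input x, and all function names are assumptions made for illustration, not EGT's actual representation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical shadow value: either a concrete int, or a symbolic
   expression of the form coef * x + offset over one symbolic input x. */
typedef struct {
    bool symbolic;   /* false: `concrete` is valid; true: coef/offset are */
    int concrete;
    int coef;
    int offset;
} Value;

Value concrete_value(int n) { Value v = { false, n, 0, 0 }; return v; }
Value symbolic_input(void)  { Value v = { true, 0, 1, 0 }; return v; }

/* Instrumented "+": runs concretely when it can, otherwise builds
   a new symbolic expression instead of computing a number. */
Value add(Value a, Value b) {
    if (!a.symbolic && !b.symbolic)
        return concrete_value(a.concrete + b.concrete);
    /* Lift a concrete operand n to the expression 0 * x + n. */
    int a_coef = a.symbolic ? a.coef : 0;
    int a_off  = a.symbolic ? a.offset : a.concrete;
    int b_coef = b.symbolic ? b.coef : 0;
    int b_off  = b.symbolic ? b.offset : b.concrete;
    Value r = { true, 0, a_coef + b_coef, a_off + b_off };
    return r;
}

/* Usage: 2 + 3 stays concrete, x + 3 becomes the expression 1 * x + 3. */
void demo(void) {
    Value five = add(concrete_value(2), concrete_value(3));
    assert(!five.symbolic && five.concrete == 5);
    Value x_plus_3 = add(symbolic_input(), concrete_value(3));
    assert(x_plus_3.symbolic && x_plus_3.coef == 1 && x_plus_3.offset == 3);
}
```

The result of a symbolic addition is a fresh value, which matches the earlier note that symbolic assignment is not destructive.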
  • 14. Results. Mutt: versions <= 1.4 have a buffer overflow (OSDI paper); with input size 4, it took 34 minutes to generate 458 tests with 98% statement coverage. printf (3 implementations: Pintos, gccfast, embedded): made format strings symbolic; found two bugs: incorrect grouping of integers, and incorrect handling of plus flags (“%” followed by space).
  • 15. More: WsMP3 server case study, 2000 LOC. Technique: make recv input symbolic. Found a known security hole + 2 new bugs. Slide callouts: network-controlled infinite loop; buffer overflow.
  • 16. EXE: EXecution generated Executions [CCS’06]. Automatically generating inputs of death! Same ideas as EGT. Main contributions: a more practical tool that can test any code path and generates actual attacks. Constraint solver: STP, a decision solver for bitvectors and arrays; if solvable, passes constraints to MiniSAT; about four times less code than CVCL and an order of magnitude faster; array optimizations (substitution, refinements, simplification).
  • 17. The mechanics. User marks input to treat symbolically using either: Compile with the EXE compiler, exe-cc, which uses CIL to (1) insert checks around every expression: if the operands are all concrete, run as normal; otherwise, add as a constraint; and (2) insert fork calls when a symbolic value could cause multiple actions. Run ./a.out: it forks at each decision point. When a path terminates, use STP to solve its constraints. A path terminates on (1) exit, (2) crash, or (3) an error detected by EXE. Rerun the concrete inputs through uninstrumented code.
  • 18. Isn’t exponential expensive? Only fork on symbolic branches; most branches are concrete (linear). Loops? Heuristics. Default: DFS, where the number of processes is linear in chain depth, but it can get stuck. “Best first” search: choose a branch, then backtrack to the point that will run the code hit the fewest times. Can do better… However: happy to let it run for weeks as long as it is generating interesting test cases; the competition is manual and random testing.
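One possible reading of the “best first” heuristic is a selection rule over pending execution states: pick the state whose next code location has been executed the fewest times so far. The sketch below assumes that reading; `PendingState`, `pick_best_first`, and the per-line `hits` table are hypothetical names, and EXE's real scheduler is not shown in the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pending state: the line it would execute next. */
typedef struct { size_t next_line; } PendingState;

/* Returns the index of the pending state whose next line has the lowest
   hit count (ties go to the earliest state). `hits` is indexed by line
   and must cover every next_line value; n must be at least 1. */
size_t pick_best_first(const PendingState *pending, size_t n,
                       const unsigned *hits) {
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (hits[pending[i].next_line] < hits[pending[best].next_line])
            best = i;
    return best;
}
```

A depth-first scheduler would instead always resume the most recently forked state, which is why it can get stuck in one region of the code.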
  • 19. Mixed execution. Basic idea, given an expression (e.g., deref, ALU op): if all of its operands are concrete, just do it; if any are symbolic, add as a constraint; if the current constraints are impossible, stop; if the current path hits an error or exit(), solve + emit; if it calls uninstrumented code, do the call, or solve and do the call. Example: “x = y + z”. If y and z are both concrete, execute and record x as concrete; otherwise set “x = y + z” and record x as symbolic. Result: most code runs concretely; only a small slice deals with symbolics. Robust: do not need all source code (e.g., OS). Just run
  • 20. Limits. Missed constraints: if the code calls asm, or CIL cannot eat the file. STP cannot do div/mod: constrain the divisor to be a power of 2, then use shift and mask respectively. Cannot handle **p where “p” is symbolic: must concretize *p. (Note: **p is still symbolic.) Stops a path if it cannot solve; can get lost in exponentials. Missing: no symbolic function pointers; symbolics passed to varargs are not tracked; no floating point; long long support is erratic.
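The div/mod workaround rests on a standard identity for unsigned integers: when the divisor is a power of two, d = 2^k, division is a right shift by k and the remainder is a mask with d - 1. A small C sketch of that identity follows; the function names are illustrative and are not taken from EXE.

```c
#include <assert.h>
#include <stdbool.h>

bool is_power_of_two(unsigned d) {
    return d != 0 && (d & (d - 1)) == 0;
}

/* For d = 2^k, returns k. */
static unsigned shift_for(unsigned d) {
    unsigned k = 0;
    while (d > 1) { d >>= 1; k++; }
    return k;
}

/* x / d rewritten as a shift; valid only when d is a power of two. */
unsigned div_pow2(unsigned x, unsigned d) {
    assert(is_power_of_two(d));
    return x >> shift_for(d);
}

/* x % d rewritten as a mask; valid only when d is a power of two. */
unsigned mod_pow2(unsigned x, unsigned d) {
    assert(is_power_of_two(d));
    return x & (d - 1);
}
```

Constraining the divisor this way means only shift and mask operations reach the solver, at the cost of missing paths that need other divisors.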
  • 21. EXE Results. Berkeley Packet Filter: two buffer overflow exploits. udhcpd (well-tested user-level DHCP server): five memory errors. PCRE (Perl Compatible Regular Expressions): many out-of-bounds writes leading to abort in glibc on free. Disks of death (file systems): four bugs in the ext2 & ext3 file systems; null pointer dereference in JFS.
  • 22. A galactic view [Oakland’06]
  • 23. KLEE. Thanks to Cristian Cadar for the slides
  • 24. Writing Systems Code Is Hard. Code complexity: tricky control flow, complex dependencies, abusive use of pointer operations. Environmental dependencies: code has to anticipate all possible interactions, including malicious ones.
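  • 26. Toy Example: int bad_abs(int x) { if (x < 0) return -x; if (x == 1234) return -x; return x; }
      Execution tree, starting from an unconstrained x:
      x < 0 is TRUE: path condition x < 0, return -x; example input x = -2.
      x < 0 is FALSE: path condition x ≥ 0, continue to the check x == 1234.
      x == 1234 is TRUE: path condition x = 1234, return -x; example input x = 1234.
      x == 1234 is FALSE: path condition x ≥ 0 and x ≠ 1234, return x; example input x = 3.
      Three generated test files (test1.out, test2.out, test3.out) hold the inputs x = -2, x = 1234 and x = 3, one per path.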
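The toy example is small enough to run concretely. The sketch below restates `bad_abs` from the slide and adds two helpers that are not in the deck: `path_taken`, which reports which of the three paths a concrete input follows, and `check_abs`, an assertion that an absolute value is never negative. Under KLEE the input would instead be marked symbolic (for example with `klee_make_symbolic` from `klee/klee.h`), which is omitted here so the code compiles without KLEE.

```c
#include <assert.h>

/* bad_abs from the slide: the x == 1234 branch returns a negative value. */
int bad_abs(int x) {
    if (x < 0) return -x;
    if (x == 1234) return -x;
    return x;
}

/* Hypothetical helper: which of the three paths does input x follow? */
int path_taken(int x) {
    if (x < 0) return 1;        /* path condition: x < 0 */
    if (x == 1234) return 2;    /* path condition: x >= 0 and x == 1234 */
    return 3;                   /* path condition: x >= 0 and x != 1234 */
}

/* Hypothetical property check: fails for the input that takes path 2. */
void check_abs(int x) {
    assert(bad_abs(x) >= 0);
}
```

Random testing would need to guess 1234 exactly to reach the second path, while solving that path's condition produces it directly.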
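  • 27. KLEE Architecture: C code is compiled with LLVM to LLVM bytecode, which KLEE runs against a symbolic environment. KLEE passes path constraints (e.g., x ≥ 0, x ≠ 1234) to the constraint solver (STP) and gets back a satisfying value (e.g., x = 3). Resulting test inputs: x = -2, x = 1234, x = 3.
  • 28. Outline: Motivation; Example and Basic Architecture; Scalability Challenges; Experimental Evaluation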
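  • 39. Caching solutions. Dramatic speedup on our benchmarks.
  • 40. Eliminating Irrelevant Constraints. In practice, each branch usually depends on a small number of variables. Example: the path constraints are x + y > 10 and z & -z = z, and the branch if (x < 10) { … } raises the query “x < 10?”. The constraint z & -z = z shares no variable with the query, directly or through x + y > 10, so it need not be sent to the solver.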
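A simple way to approximate this filtering is to summarise each constraint by the set of variables it mentions and keep only the constraints connected to the query through shared variables. The C sketch below does that with bitmasks. The names `relevant_constraints` and `VAR_X`/`VAR_Y`/`VAR_Z` are invented for illustration, and KLEE's own implementation is more refined than a variable-name check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One bit per variable; each constraint is summarised by the variables
   it mentions. */
enum { VAR_X = 1, VAR_Y = 2, VAR_Z = 4 };

/* Marks in keep[] the constraints that share a variable with the query,
   directly or through other kept constraints. Returns how many are kept. */
size_t relevant_constraints(const unsigned *vars, size_t n,
                            unsigned query_vars, bool *keep) {
    size_t kept = 0;
    bool changed = true;
    for (size_t i = 0; i < n; i++) keep[i] = false;
    while (changed) {                 /* repeat until nothing new is pulled in */
        changed = false;
        for (size_t i = 0; i < n; i++) {
            if (!keep[i] && (vars[i] & query_vars) != 0) {
                keep[i] = true;
                query_vars |= vars[i];   /* its variables become relevant too */
                kept++;
                changed = true;
            }
        }
    }
    return kept;
}

/* The slide's example: constraints x + y > 10 and z & -z = z, query x < 10. */
void slide_example(void) {
    unsigned vars[] = { VAR_X | VAR_Y, VAR_Z };
    bool keep[2];
    assert(relevant_constraints(vars, 2, VAR_X, keep) == 1);
    assert(keep[0] && !keep[1]);
}
```

The loop runs to a fixpoint because relevance is transitive: a constraint on y and z would become relevant once x + y > 10 has been kept.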
  • 41. Caching Solutions. Static set of branches: lots of similar constraint sets. Cached: { 2 * y < 100, x > 3, x + y > 10 } has the solution x = 5, y = 15. Eliminating constraints cannot invalidate a solution: { 2 * y < 100, x + y > 10 } is still satisfied by x = 5, y = 15. Adding constraints often does not invalidate a solution: { 2 * y < 100, x > 3, x + y > 10, x < 10 } is still satisfied by x = 5, y = 15. UBTree data structure [Hoffmann and Koehler, IJCAI ’99].
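The check behind this cache is cheap: before calling the solver on a new constraint set, evaluate the set under a previously found assignment. The C sketch below shows only that check, using the slide's numbers. It reads the garbled first constraint as 2 * y < 100, which is an assumption, and it leaves out the UBTree lookup that finds candidate subsets and supersets; `Assignment`, `Constraint`, and `cached_solution_fits` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { int x, y; } Assignment;
typedef bool (*Constraint)(Assignment);

/* The slide's constraints, with the first one read as 2 * y < 100. */
bool two_y_lt_100(Assignment a) { return 2 * a.y < 100; }
bool x_gt_3(Assignment a)       { return a.x > 3; }
bool sum_gt_10(Assignment a)    { return a.x + a.y > 10; }
bool x_lt_10(Assignment a)      { return a.x < 10; }

/* True if the cached assignment satisfies every constraint in the set,
   in which case the solver call can be skipped. */
bool cached_solution_fits(const Constraint *set, size_t n, Assignment cached) {
    for (size_t i = 0; i < n; i++)
        if (!set[i](cached)) return false;
    return true;
}

/* x = 5, y = 15 was found for { 2*y < 100, x > 3, x + y > 10 }. */
void slide_example(void) {
    const Assignment cached = { 5, 15 };
    const Constraint subset[]   = { two_y_lt_100, sum_gt_10 };
    const Constraint superset[] = { two_y_lt_100, x_gt_3, sum_gt_10, x_lt_10 };
    assert(cached_solution_fits(subset, 2, cached));    /* always holds */
    assert(cached_solution_fits(superset, 4, cached));  /* holds here */
}
```

For a subset the check is guaranteed to succeed, so it could be skipped. For a superset it may fail, and the query then falls through to the solver.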
  • 42. Dramatic Speedup: aggregated data over 73 applications. [Plot: time (s) against executed instructions (normalized)]
  • 52. Management of system properties: hostname, printenv, etc.
  • 53. Text file processing: sort, wc, od, etc.
  • 54. … Variety of functions, different authors, intensive interaction with environment. Heavily tested, mature code.
  • 55. Coreutils ELOC (incl. called lib). [Histogram: number of applications by executable lines of code (ELOC)]
  • 57. High Line Coverage (Coreutils, non-lib, 1h/utility = 89 h). Overall: 84%, average 91%, median 95%; 16 at 100%. [Plot: coverage (ELOC %), apps sorted by KLEE coverage]
  • 58. Beats 15 Years of Manual Testing. Average per utility: KLEE 91%, manual 68%. Manual tests also check correctness. [Plot: KLEE coverage minus manual coverage, apps sorted by that difference]
  • 59. Busybox Suite for Embedded Devices. Overall: 91%, average 94%, median 98%; 31 at 100%. [Plot: coverage (ELOC %), apps sorted by KLEE coverage]
  • 60. Busybox: KLEE vs. Manual. Average per utility: KLEE 94%, manual 44%. [Plot: KLEE coverage minus manual coverage, apps sorted by that difference]
  • 65. KLEE generates actual command lines exposing crashes
  • 66. Ten command lines of death:
      md5sum -c t1.txt
      mkdir -Z a b
      mkfifo -Z a b
      mknod -Z a b p
      seq -f %0 1
      pr -e t2.txt
      tac -r t3.txt t3.txt
      paste -d abcdefghijklmnopqrstuvwxyz
      ptx -F abcdefghijklmnopqrstuvwxyz
      ptx x t4.txt
      Input files: t1.txt: MD5( t2.txt: t3.txt: t4.txt: A
  • 71. An assert is just a branch, and KLEE proves feasibility/infeasibility of each branch it reaches
  • 72. If KLEE determines infeasibility of the false side of an assert, the assert was proven on the current path
  • 73. Crosschecking. Assume f(x) and f’(x) implement the same interface. Make input x symbolic and run KLEE on assert(f(x) == f’(x)). For each explored path: if KLEE terminates without error, the paths are equivalent; if KLEE terminates with an error, a mismatch was found. Coreutils vs. Busybox: UNIX utilities should conform to IEEE Std 1003.1. Crosschecked pairs of Coreutils and Busybox apps: verified paths, found mismatches.
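A crosscheck harness can be very small. The sketch below uses two made-up implementations of a lower-case test, `is_lower_ref` and `is_lower_fast`, which are illustrative and not taken from Coreutils or Busybox. Under KLEE the argument would be made symbolic (for example with `klee_make_symbolic` from `klee/klee.h`) and `crosscheck` would be explored path by path. Here an exhaustive loop over 0..255 stands in for that exploration so the code runs without KLEE.

```c
#include <assert.h>

/* Two hypothetical implementations of the same interface. */
int is_lower_ref(int c)  { return c >= 'a' && c <= 'z'; }
int is_lower_fast(int c) { return (unsigned)(c - 'a') < 26u; }

/* The crosscheck itself: an assert comparing both implementations. */
void crosscheck(int c) {
    assert(is_lower_ref(c) == is_lower_fast(c));
}

/* Stand-in for symbolic exploration: try every value in 0..255 and
   count the inputs on which the two implementations disagree. */
int count_mismatches(void) {
    int mismatches = 0;
    for (int c = 0; c <= 255; c++)
        if (is_lower_ref(c) != is_lower_fast(c))
            mismatches++;
    return mismatches;
}
```

Exhaustive enumeration only works for tiny input spaces. The point of the symbolic version is to cover all inputs on a path with a single solver query.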
  • 74. Mismatches Found (t1.txt: a, t2.txt: b, no newlines!). Each row gives the input, then the Busybox and Coreutils behaviour:
      tee "" <t1.txt → Busybox: [infinite loop]; Coreutils: [terminates]
      tee - → Busybox: [copies once to stdout]; Coreutils: [copies twice]
      comm t1.txt t2.txt → Busybox: [doesn’t show diff]; Coreutils: [shows diff]
      cksum / → Busybox: "4294967295 0 /"; Coreutils: "/: Is a directory"
      split / → "/: Is a directory"
      tr → Busybox: [duplicates input]; Coreutils: "missing operand"
      [ 0 "<" 1 ] → "binary op. expected"
      tail -2l → Busybox: [rejects]; Coreutils: [accepts]
      unexpand -f → Busybox: [accepts]; Coreutils: [rejects]
      split - → Busybox: [rejects]; Coreutils: [accepts]
  • 76. KLEE DEMO. Tool available at http://klee.llvm.org/ Experiments: tool examples (isLower(), RegExp); more experimentation.
  • 77. Discussion. Questions / ideas? Thanks for listening!