A Survey on Dynamic Symbolic Execution for Automatic Test Generation

  1. A Survey on Dynamic Symbolic Execution for Automatic Test Generation
     Jan. 6, 2014
     PQE
     Hyunmin Seo
  2. Motivation
     •  Testing is a practical way to verify software
     •  Testing accounts for more than 50% of total software development costs [Tassey ’02]
     •  Effective, efficient, and scalable automatic testing is required [Bounimova ’13, Kim ’12]
  3. Outline
     •  Automatic Test Generation
        –  Random Testing
        –  Combinatorial Testing
        –  Search-Based Testing
        –  Symbolic Execution-Based Testing
        –  Dynamic Symbolic Execution
     •  Challenges in DSE (SE)
        –  Imprecision
        –  Constraint Solving
        –  Path Explosion
  5. Random Testing
     •  Random Testing
        –  Randomly generate test inputs
     •  Adaptive Random Testing (ART)
        –  Spread test cases evenly over the input domain [Chen ’04]
        –  Failure-causing inputs form contiguous regions [White ’80, Chan ’96]
     •  Feedback-Directed Random Testing
        –  Randoop [Pacheco ’07]
        –  Unit testing
  6. Random Testing Summary
     •  One of the most fundamental and well-studied approaches [Hamlet ’94, Loo ’88]
        –  Many variations
     •  Pros
        –  Efficient, scalable
        –  No source code requirement
     •  Cons
        –  Low coverage [Burnim ’08]
  8. Combinatorial Testing
     •  Find a subset of input parameters satisfying a certain property [Cohen ’13]
     •  Mathematical property
  9. Vertical Ruler | Ruler Units | Default View | SS Navigation | End with Black | Always Mirror | Warn Before
     Visible        | Inches      | Normal       | Pop-up        | Yes            | Yes           | Yes
     Invisible      | Centimeters | Slide        | None          | No             | No            | No
                    | Points      | Outline      |               |                |               |
                    | Picas       |              |               |                |               |
     Total number of configuration settings = 2 × 4 × 3 × 2 × 2 × 2 × 2 = 384
  10. N-way Covering Array
      •  A subset of configurations that contains every possible value combination of any N factors at least once [Cohen ’13]
  11. 2-Way Covering Array: CA(12; 2, 2^5 3^1 4^1)
      No | Vertical Ruler | Ruler Units | Default View | SS Navigation | End with Black | Always Mirror | Warn Before
      1  | Visible        | Centimeters | Outline      | Pop-up        | No             | No            | Yes
      2  | Invisible      | Inches      | Outline      | Pop-up        | No             | No            | No
      3  | Invisible      | Centimeters | Slide        | None          | Yes            | Yes           | Yes
      4  | Visible        | Picas       | Outline      | Pop-up        | Yes            | Yes           | No
      5  | Invisible      | Centimeters | Normal       | Pop-up        | Yes            | Yes           | No
      6  | Visible        | Points      | Outline      | None          | Yes            | No            | Yes
      7  | Invisible      | Points      | Slide        | Pop-up        | No             | No            | No
      8  | Invisible      | Picas       | Slide        | Pop-up        | No             | Yes           | Yes
      9  | Invisible      | Points      | Normal       | None          | No             | Yes           | No
      10 | Visible        | Inches      | Normal       | None          | Yes            | No            | Yes
      11 | Visible        | Inches      | Slide        | Pop-up        | No             | Yes           | Yes
      12 | Invisible      | Picas       | Normal       | None          | Yes            | No            | No
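The covering property of the 12-row array can be checked mechanically. The sketch below transcribes the factor values and rows from the two tables and lists every N-way value combination that no row contains; the helper names `uncovered` and `total_configurations` are illustrative and not taken from the slides.

```python
from itertools import combinations, product

# Factor values transcribed from the configuration table (7 factors).
FACTORS = [
    ["Visible", "Invisible"],                      # Vertical Ruler
    ["Inches", "Centimeters", "Points", "Picas"],  # Ruler Units
    ["Normal", "Slide", "Outline"],                # Default View
    ["Pop-up", "None"],                            # SS Navigation
    ["Yes", "No"],                                 # End with Black
    ["Yes", "No"],                                 # Always Mirror
    ["Yes", "No"],                                 # Warn Before
]

# The 12 rows of the 2-way covering array.
ROWS = [
    ("Visible", "Centimeters", "Outline", "Pop-up", "No", "No", "Yes"),
    ("Invisible", "Inches", "Outline", "Pop-up", "No", "No", "No"),
    ("Invisible", "Centimeters", "Slide", "None", "Yes", "Yes", "Yes"),
    ("Visible", "Picas", "Outline", "Pop-up", "Yes", "Yes", "No"),
    ("Invisible", "Centimeters", "Normal", "Pop-up", "Yes", "Yes", "No"),
    ("Visible", "Points", "Outline", "None", "Yes", "No", "Yes"),
    ("Invisible", "Points", "Slide", "Pop-up", "No", "No", "No"),
    ("Invisible", "Picas", "Slide", "Pop-up", "No", "Yes", "Yes"),
    ("Invisible", "Points", "Normal", "None", "No", "Yes", "No"),
    ("Visible", "Inches", "Normal", "None", "Yes", "No", "Yes"),
    ("Visible", "Inches", "Slide", "Pop-up", "No", "Yes", "Yes"),
    ("Invisible", "Picas", "Normal", "None", "Yes", "No", "No"),
]


def uncovered(rows, factors, n):
    """Return every n-way value combination that no row contains."""
    missing = []
    for cols in combinations(range(len(factors)), n):
        seen = {tuple(row[c] for c in cols) for row in rows}
        for combo in product(*(factors[c] for c in cols)):
            if combo not in seen:
                missing.append((cols, combo))
    return missing


def total_configurations(factors):
    """Size of the exhaustive configuration space."""
    count = 1
    for values in factors:
        count *= len(values)
    return count
```

With these definitions, `uncovered(ROWS, FACTORS, 2)` is empty, so 12 rows cover all pairs out of 384 exhaustive configurations, while `uncovered(ROWS, FACTORS, 3)` is not empty.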
  12. Combinatorial Testing Summary
      •  Research Direction
         –  How to find the minimum-size array
            •  Greedy [Tung ’00, Colbourn ’04]
            •  Meta-heuristics [Cohen ’03, Stardom ’01]
         –  Application to different domains
            •  Software Product Line [McGregor ’01, Perrouin ’10]
      •  Pros
         –  Systematic testing with a mathematical property [Cohen ’13]
         –  Sample configurations to be tested [Qu ’08]
      •  Cons
         –  Too many combinations for program inputs
  14. Search-Based Testing
      •  A branch of search-based software engineering (SBSE) in which meta-heuristics are used to guide the search [McMinn ’04]
      •  Typical process
         –  Start with a random input
         –  Search nearby locations for a better solution
         –  Evaluate candidates with a fitness function
         –  Replace the current solution with a better one
         –  The search is guided by meta-heuristics
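The typical process above can be sketched as a plain hill climber. This is a minimal illustration, not any particular tool's algorithm: the `fitness` and `neighbours` functions are hypothetical and target an input `count` in the range 4 to 10, with lower fitness meaning closer to the goal.

```python
import random


def hill_climb(fitness, start, neighbours, max_steps=1000):
    """Minimise `fitness` by repeatedly moving to the best neighbour."""
    current = start
    for _ in range(max_steps):
        best = min(neighbours(current), key=fitness)
        if fitness(best) >= fitness(current):
            return current  # no neighbour improves: local optimum
        current = best
    return current


def fitness(count):
    """Hypothetical fitness: distance of `count` from the interval [4, 10]."""
    if count < 4:
        return 4 - count
    if count > 10:
        return count - 10
    return 0


def neighbours(count):
    """Nearby locations in the input domain."""
    return [count - 1, count + 1]


# Start with a random input, as in the first step of the process.
start = random.Random(0).randint(0, 100)
solution = hill_climb(fitness, start, neighbours)
```

Hill climbing stops at the first local optimum; simulated annealing and genetic algorithms differ mainly in how they escape such optima.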
  15. Meta-Heuristics [McMinn ’11]
      [Figure: fitness value plotted over the input domain for (a) Hill Climbing, (b) Simulated Annealing, (c) Genetic Algorithm]
  16. Search-Based Testing Example [McMinn ’11]
      Input: a string
      count: the number of digits in the string
          if (count >= 4)
              if (count <= 10)
                  if (checksum % 10 == checkdigit)
                      // Target
      [Figure: control-flow graph of the three nested branches with TRUE/FALSE edges and paths π1, π2, π3; π2: count = 20, π3: count = 11]
  17. Fitness Function
      •  Combination of approach level and branch distance
      •  Approach level
         –  The number of nodes on which the target is control-dependent that the current input did not execute
      •  Branch distance [Tracey ’98]
      Element | Value
      Boolean | if TRUE then 0 else K
      a = b   | if abs(a-b) = 0 then 0 else abs(a-b) + K
      a ≠ b   | if abs(a-b) ≠ 0 then 0 else K
      a < b   | if a-b < 0 then 0 else (a-b) + K
      a ≤ b   | if a-b ≤ 0 then 0 else (a-b) + K
      a > b   | if b-a < 0 then 0 else (b-a) + K
      a ≥ b   | if b-a ≤ 0 then 0 else (b-a) + K
      a ∨ b   | min(cost(a), cost(b))
      a ∧ b   | cost(a) + cost(b)
      !a      | move negation inward and propagate
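The relational rows of the branch-distance table translate directly into code. In the sketch below, the constant `K = 1` and the way `fitness` combines approach level with a normalised distance (d / (d + 1)) are assumptions for illustration; the slide only says the two are combined.

```python
K = 1  # assumed positive constant added when the predicate is false


def branch_distance(op, a, b):
    """Branch distance for the relational predicates in the table."""
    if op == "==":
        return 0 if a == b else abs(a - b) + K
    if op == "!=":
        return 0 if a != b else K
    if op == "<":
        return 0 if a < b else (a - b) + K
    if op == "<=":
        return 0 if a <= b else (a - b) + K
    if op == ">":
        return 0 if a > b else (b - a) + K
    if op == ">=":
        return 0 if a >= b else (b - a) + K
    raise ValueError(f"unsupported operator: {op}")


def fitness(approach_level, distance):
    """Assumed combination: the distance is squashed into [0, 1) so that
    the approach level always dominates. Lower is better."""
    return approach_level + distance / (distance + 1)


# Both inputs from the example fail the branch `count <= 10`.
fitness_pi2 = fitness(1, branch_distance("<=", 20, 10))  # count = 20
fitness_pi3 = fitness(1, branch_distance("<=", 11, 10))  # count = 11
```

Under these assumptions count = 11 scores better than count = 20, which is the gradient the search follows toward the target.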
  18. Search-Based Testing Summary
      •  A branch of SBSE
         –  Different search heuristics
         –  Different domains [Harman ’13]
      •  Pros
         –  Guides the execution toward a specific branch
         –  Non-functional testing (e.g., longest execution time) [Wegener ’98]
      •  Cons
         –  Search space challenge
         –  Design of fitness functions [Arcuri ’10]
  20. Symbolic Execution-Based Testing
      •  Use symbolic values to represent program variables and path conditions [King ’76, Clarke ’76]
      •  Find precise constraints for each execution path and generate test inputs by solving the constraints
  21. Symbolic Execution
          x = sym_input();
          y = sym_input();
          z = sym_input();

          a = x + y;
          if (z > a)
              b = x - y;
          else
              b = 2 * y;
          ...

      Var | Value (PC: s3 > s1 + s2) | Value (PC: s3 <= s1 + s2)
      x   | s1                       | s1
      y   | s2                       | s2
      z   | s3                       | s3
      a   | s1 + s2                  | s1 + s2
      b   | s1 - s2                  | 2s2
  22. Test Generation
      Path conditions  →  SMT solver  →  Test inputs
      π1 : PC1                           π1 : x = 1, y = 2, ...
      π2 : PC2                           π2 : x = 1, y = 5, ...
      π3 : PC3                           π3 : x = -5, y = 0, ...
      ...                                ...
      πn : PCn                           πn : x = …, y = …
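The pipeline from path conditions to test inputs can be sketched for the three-input program from the symbolic execution slide. The path conditions are written by hand rather than derived automatically, and `solve` is a brute-force stand-in for an SMT solver over a small assumed domain; both are simplifications for illustration.

```python
from itertools import product

# One entry per path: (path condition, symbolic value of b), both expressed
# over the symbolic inputs s1, s2, s3 bound to x, y, z.
PATHS = {
    "then": (lambda s1, s2, s3: s3 > s1 + s2, lambda s1, s2, s3: s1 - s2),
    "else": (lambda s1, s2, s3: s3 <= s1 + s2, lambda s1, s2, s3: 2 * s2),
}


def solve(condition, domain=range(-3, 4)):
    """Stand-in for an SMT solver: the first satisfying assignment found
    by exhaustive search over a small domain, or None."""
    for values in product(domain, repeat=3):
        if condition(*values):
            return values
    return None


def program(x, y, z):
    """Concrete version of the program; reports the branch taken and b."""
    a = x + y
    if z > a:
        return "then", x - y
    return "else", 2 * y


def generate_tests():
    """One concrete test input per path condition."""
    return {name: solve(pc) for name, (pc, _) in PATHS.items()}


tests = generate_tests()
```

Each generated input drives the concrete program down the path whose condition produced it, so no two inputs exercise the same path.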
  23. Symbolic Execution-Based Testing Summary
      •  Pros
         –  No redundant inputs taking the same path
         –  High coverage
      •  Cons
         –  Low efficiency
         –  Depends on constraint solving techniques
         –  External library calls
         –  State explosion
         –  Imprecision
  25. Limitations of SE
          void foo(int x, int y) {
              if (external(x) == y) {       // → no source code available
                  // branch 1
              }
              else if (hash(x) > y) {       // → hash() is complex arithmetic
                  // branch 2
              }
          }
  26. Dynamic Symbolic Execution
      •  Perform symbolic execution dynamically along the execution path of a concrete input [DART ’05, CUTE ’05, PEX ’08]
      •  Apply concretization
         –  External library calls
         –  Complex constraints
  27. DSE
      [Figure: execution tree with branch conditions pc1 to pc4 and paths π1, π2, π3]
      PC   = pc1 ∧ pc2 ∧ pc3 ∧ … ∧ pcn
      PC’  = pc1 ∧ pc2 ∧ !pc3
      PC’’ = pc1 ∧ !pc2
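The negate-and-solve loop behind PC’ and PC’’ can be sketched as follows. The program under test, the `branch` recording helper, and the brute-force `solve` over a small domain are all hypothetical stand-ins; real DSE tools instrument the program and call an SMT solver.

```python
from itertools import product


def program(x, y, trace):
    """Hypothetical program under test. Every branch records its condition
    and outcome in `trace`, which plays the role of the path condition."""
    def branch(cond):
        outcome = cond(x, y)
        trace.append((cond, outcome))
        return outcome

    if branch(lambda x, y: x > 5):
        if branch(lambda x, y: y == x * 2):
            return "deep"
        return "mid"
    return "shallow"


def solve(constraints, domain=range(-10, 21)):
    """Stand-in for an SMT solver: exhaustive search over a small domain."""
    for x, y in product(domain, repeat=2):
        if all(cond(x, y) == want for cond, want in constraints):
            return x, y
    return None


def dse(seed, max_paths=20):
    """Run concretely, then negate each recorded branch (keeping the prefix
    before it) to obtain inputs for paths not yet explored."""
    worklist = [seed]
    seen_paths = set()
    results = {}
    while worklist and len(seen_paths) < max_paths:
        x, y = worklist.pop()
        trace = []
        result = program(x, y, trace)
        path = tuple(outcome for _, outcome in trace)
        if path in seen_paths:
            continue
        seen_paths.add(path)
        results[result] = (x, y)
        for i, (cond, outcome) in enumerate(trace):
            new_input = solve(trace[:i] + [(cond, not outcome)])
            if new_input is not None:
                worklist.append(new_input)
    return results


results = dse((0, 0))
```

Starting from the seed (0, 0), which only reaches the shallow path, the loop discovers inputs for all three paths, including the one guarded by `y == x * 2` that random inputs rarely hit.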
  28. Benefit
      •  Based on symbolic execution
         –  No redundant inputs taking the same path
         –  High coverage
      •  Reaches deep program states by starting from well-formed, user-provided input
      •  Uses concrete values to overcome limitations
         –  External library calls
         –  Complicated constraints
      •  Many tools
         –  CREST, CUTE, JCUTE, PEX, SAGE, EXE, KLEE
  29. Comparison
      Technique          | Efficiency | Coverage | Source Code Requirement | Etc.
      Random             |            |          | No                      |
      Combinatorial      |            |          | No                      | Combine with other techniques
      Search-Based       |            |          | Yes/No                  | Non-functional testing
      Symbolic Execution |            |          | Yes                     |
      DSE                |            |          | Yes                     | Concretization
  31. Imprecision
      •  Occurs when symbolic execution cannot represent the exact semantics of the program [Elkarablieh ’09]
         –  Example: modeling a 4-byte integer with a mathematical integer
      •  Imprecision may manifest as divergence [Godefroid ’08]
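A small example of the 4-byte integer mismatch: the condition x + 1 > x holds for every mathematical integer but fails at the largest 32-bit value once arithmetic wraps around. The sketch assumes two's-complement wraparound (as for Java `int`); the helper names are illustrative.

```python
INT32_MAX = 2**31 - 1


def to_int32(value):
    """Wrap a mathematical integer into the two's-complement 32-bit range."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def branch_math(x):
    """The branch as a solver over mathematical integers sees it."""
    return x + 1 > x


def branch_int32(x):
    """The same branch under assumed 32-bit wraparound arithmetic."""
    return to_int32(x + 1) > x
```

A solver reasoning over mathematical integers would declare the false side of this branch infeasible, although the input `INT32_MAX` takes it under wraparound; this is the gap that bitvector theories close.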
  32. Divergence
      [Figure: execution tree with branch conditions pc1 to pc5 and the path condition pc1 ∧ pc2 ∧ !pc3]
  33. Proposed Solutions
      •  Integer size, bit operations
         –  BitVector [SAGE ’08]
      •  Symbolic pointer dereferencing
         –  Array Theory of SMT solvers [Elkarablieh ’09]
      •  Floating-point operations
         –  Combined static and dynamic analysis [Godefroid ’10]
      •  Interaction with environment
         –  Modeling [KLEE ’08]
         –  Reporting [Xiao ’11]
  34. BitVector
      •  Use the bitvector theory of SMT solvers
         –  Fixed-size integers
         –  Bit operations on integer variables
            •  a & b
            •  a << 4
      •  Slower than integer arithmetic
  35. Symbolic Pointer Dereferencing
      •  Symbolic values are used to calculate the addresses of pointer values
         –  Array index
         –  a[S0]
  36. Symbolic Pointer Dereferencing Example [Elkarablieh ’09]
          void single_array(BYTE x, BYTE y) {
              BYTE *a = new BYTE[4];
              a[0] = x;
              a[1] = 0;
              a[2] = 1;
              a[3] = 2;

              if (a[x] == a[y] + 2)
                  assert(false);

              delete[] a;
          }

           | Con | Sym | Con
      x    | 0   | S0  | 2
      y    | 1   | S1  | 1
      a[0] | 0   | S0  | 2
      a[1] | 0   | 0   | 0
      a[2] | 1   | 1   | 1
      a[3] | 2   | 2   | 2
      a[x] | 0   | S0  | 1
      a[y] | 0   | 0   | 0

      Branch condition a[x] == a[y] + 2, in the same column order:
      Con:  0 != 0 + 2
      Sym:  S0 != 0 + 2
      Con:  1 != 0 + 2
  37. Array Theory of SMT Solver [Elkarablieh ’09]
      (same code and value table as slide 36)
      a[x] :  0 ≤ x ≤ 3 ∧ a[x] ∈ {0, 1, 2}
      a[y] :  0 ≤ y ≤ 3 ∧ a[y] ∈ {0, 1, 2, x}
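The difference between the two treatments of `a[x]` can be reproduced with a brute-force search standing in for the solver. The sketch restricts x and y to 0..3 so that every index stays in bounds (an assumption; a BYTE has a wider range), and the function names are illustrative.

```python
from itertools import product


def single_array_asserts(x, y):
    """True when the assert(false) in single_array would fire."""
    a = [x, 0, 1, 2]
    return a[x] == a[y] + 2


def solve_with_concrete_indices(x0, y0, domain=range(4)):
    """Keep the cells a[x0] and a[y0] chosen by the concrete run and treat
    only their contents as symbolic; y is left at its old value."""
    for x in domain:
        a = [x, 0, 1, 2]
        if a[x0] == a[y0] + 2:
            return x, y0
    return None


def solve_with_symbolic_indices(domain=range(4)):
    """Let the index itself vary, as an array theory allows."""
    for x, y in product(domain, repeat=2):
        if single_array_asserts(x, y):
            return x, y
    return None


guess = solve_with_concrete_indices(0, 1)  # from the first run x = 0, y = 1
exact = solve_with_symbolic_indices()
```

The concretized query yields x = 2, y = 1, which does not trigger the assertion when executed (a divergence), while reasoning over the index finds x = 3, y = 1, which does.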
  38. Floating-Point Operations [Godefroid ’10]
      •  FP code should only perform memory-safe data processing
         –  Payload of an image or video file
      •  Non-FP code should deal with buffer allocations and memory address computations
      •  Lightweight local path-insensitive “may” analysis + precise “must” dynamic analysis
  39. Interaction With Environment
      •  Modeling [KLEE ’08]
         –  System calls
         –  int fd = open(argv[1], O_RDONLY);
      •  Precise identification and report
         –  [Xiao ’11]
  40. Imprecision Summary
      Reason                         | Proposed Solutions
      Fixed-size integer             | BitVector [SAGE ’08]
      Symbolic pointer dereferencing | Array Theory [Elkarablieh ’09]
      Floating-point operations      | Combined static and dynamic analysis [Godefroid ’10]
      Interaction with environment   | Modeling [KLEE ’08]; precise identification and report [Xiao ’11]
      Remaining challenges: precise reasoning about floating point, interaction with environment, external library calls, concurrent programs
  42. Constraint Solving
      •  Path constraints must be solved to obtain test inputs
      •  The major bottleneck
         –  Solving can take a long time
         –  Some constraints cannot be solved at all
  43. Proposed Solutions
      •  Optimization [KLEE ’08]
         –  Expression rewriting
         –  Implied value concretization
         –  Irrelevant constraint elimination
         –  Constraint caching
      •  Meta-heuristic-based constraint solving [Borges ’12, Souza ’11, Lakhotia ’10]
      •  Hybrid approach [Garg ’13]
  44. Optimization
      •  Irrelevant constraint elimination [KLEE ’08]
      •  Constraint caching [KLEE ’08]
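Both optimizations can be sketched over constraints represented as (variables, text) pairs. This representation, the function `relevant_constraints`, and the exact-match `CachingSolver` are assumptions for illustration; they are much simpler than what KLEE implements.

```python
def relevant_constraints(constraints, target):
    """Keep only constraints that transitively share variables with
    `target` (typically the negated branch). Each constraint is a pair
    (frozenset_of_variable_names, text)."""
    relevant = [target]
    variables = set(target[0])
    remaining = list(constraints)
    changed = True
    while changed:
        changed = False
        for constraint in list(remaining):
            if constraint[0] & variables:
                relevant.append(constraint)
                variables |= constraint[0]
                remaining.remove(constraint)
                changed = True
    return relevant


class CachingSolver:
    """Wrap a solver function with an exact-match cache on constraint sets."""

    def __init__(self, solve):
        self._solve = solve
        self._cache = {}
        self.hits = 0

    def check(self, constraints):
        key = frozenset(text for _, text in constraints)
        if key in self._cache:
            self.hits += 1
        else:
            self._cache[key] = self._solve(constraints)
        return self._cache[key]


path_condition = [
    (frozenset({"x", "y"}), "x + y > 10"),
    (frozenset({"z"}), "z < 5"),
    (frozenset({"y", "w"}), "y == w"),
]
negated_branch = (frozenset({"x"}), "x < 3")
kept = relevant_constraints(path_condition, negated_branch)
```

Here `z < 5` is dropped because it shares no variable, directly or indirectly, with the negated branch. Elimination also helps caching: queries that differ only in irrelevant constraints reduce to the same key.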
  45. Meta-Heuristic Approach
      •  SMT solvers may not support
         –  Non-linear constraints
         –  Floating-point expressions
         –  Very complex constraints
      •  Use meta-heuristic approaches [Borges ’12, Souza ’11, Lakhotia ’10]
  46. Hybrid Approach [Garg ’13]
      •  Apply concretization first and solve the simplified constraints quickly with an off-the-shelf SMT solver
      •  If divergence occurs, use ICP (Interval Constraint Propagation) to solve the constraints
  47. Constraint Solving Summary
      Target                 | Proposed Solutions
      Time overhead          | Irrelevant constraint elimination, constraint caching [KLEE ’08]
      Complex constraints    | Meta-heuristic approach [Borges ’12, Souza ’11, Lakhotia ’10]
      Non-linear constraints | ICP [Garg ’13]
      Remaining challenges: floating point, complex constraints, non-linear constraints
  49. Path Explosion
      •  The number of paths in a program increases exponentially with the number of branches in the program
  50. Path Explosion
      [Figure: execution tree with branch conditions pc1 to pc4 and paths π1, π2, π3; new paths from pc1 ∧ pc2 ∧ !pc3 and pc1 ∧ !pc2]
  51. Proposed Solutions
      •  Pruning Redundant Paths
         –  RWset [Boonstoppel ’08]
         –  Interpolation [Jaffar ’13]
      •  Function Summary
         –  Compositional [Godefroid ’07, ’10]
         –  Demand-driven compositional [Anand ’08]
      •  Search Heuristics
         –  CFG [Burnim ’08]
         –  Generational [Godefroid ’08]
         –  CarFast [Park ’12]
         –  Hybrid [Majumdar ’07]
  52. Pruning Redundant Paths
      •  RWset [Boonstoppel ’08]
         –  If an execution reaches a program point in the same state as some previous execution, it will produce the same results
         –  If two states differ only in program values that are not subsequently read, the two states will produce the same results
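The second rule can be sketched as a state cache keyed on the values that are still read later. This is a loose illustration only: it uses concrete values and a caller-supplied `live_vars` set (both assumptions), whereas RWset works on symbolic states and computes the read and write sets itself.

```python
def make_pruner():
    """Return a function that reports whether an execution can be pruned
    at a program point because an equivalent state was already explored."""
    seen = set()

    def should_prune(point, state, live_vars):
        # Only variables that are subsequently read take part in the key,
        # so states differing in dead variables are treated as equal.
        key = (point, tuple(sorted((v, state[v]) for v in live_vars)))
        if key in seen:
            return True
        seen.add(key)
        return False

    return should_prune


should_prune = make_pruner()
first = should_prune("L10", {"x": 1, "tmp": 7}, {"x"})
second = should_prune("L10", {"x": 1, "tmp": 99}, {"x"})  # only tmp differs
third = should_prune("L10", {"x": 2, "tmp": 7}, {"x"})    # x differs
```

The second execution is pruned because it differs from the first only in `tmp`, which is never read again; the third is kept because the live variable `x` differs.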
  53. Pruning Redundant Paths
      •  Interpolant [Jaffar ’13]
      •  A succinct representation of the core reason why a branch cannot be covered
  54. Interpolant Example [Jaffar ’13]
      [Figure: execution tree with an UNSAT branch; full interpolant (x < 3z + 2)]
  55. Function Summary
      •  A function summary [Godefroid ’07, ’10] is a disjunction of formulas of the form pre_w ∧ post_w
      •  pre_w is a conjunction of constraints on the inputs to the function
      •  post_w, the effect, is a conjunction of constraints on the outputs of the function
  56. Function Summary
      Assume foo has 10 execution paths
      Without summary:  N paths before the call to foo(x, y)  →  N × 10 paths after it
      With summary:     N paths before the call to foo(x, y)  →  N paths after it
  57. Search Heuristics
      •  Prioritize branches and explore relevant branches only
  58. Search Heuristics
      [Figure: execution trees explored by (a) DFS, (b) BFS, (c) Heuristic Search]
  59. Search Heuristics
      •  Coverage-Optimized
         –  CFG-directed [Burnim ’08]
         –  CarFast [Park ’12]
         –  Generational [Godefroid ’08]
         –  Hybrid [Majumdar ’07]
      •  Patch-Optimized
         –  KATCH [Cadar ’13]
  60. CFG-Directed Search [Burnim ’08]
      [Figure: execution tree with path π1 and branch conditions pc1 to pc4]
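One way to read CFG-directed search is as a distance-based priority: among the branches that could be negated next, prefer the one closest in the control-flow graph to code that is still uncovered. The sketch below is a simplified illustration with a hypothetical graph, not the published algorithm.

```python
from collections import deque


def distance_to_uncovered(cfg, start, uncovered):
    """Length of the shortest CFG path from `start` to any uncovered node,
    or None if no uncovered node is reachable (breadth-first search)."""
    queue = deque([(start, 0)])
    visited = {start}
    while queue:
        node, dist = queue.popleft()
        if node in uncovered:
            return dist
        for succ in cfg.get(node, ()):
            if succ not in visited:
                visited.add(succ)
                queue.append((succ, dist + 1))
    return None


def pick_branch(cfg, candidates, uncovered):
    """Choose the candidate branch nearest to uncovered code."""
    scored = []
    for candidate in candidates:
        dist = distance_to_uncovered(cfg, candidate, uncovered)
        if dist is not None:
            scored.append((dist, candidate))
    return min(scored)[1] if scored else None


# Hypothetical CFG: adjacency lists from node to successors.
cfg = {
    "b1": ["b2", "b3"],
    "b2": ["target"],
    "b3": ["b4"],
    "b4": ["b5"],
    "b5": ["target"],
    "exit": [],
}
choice = pick_branch(cfg, ["b3", "b2", "exit"], {"target"})
```

In this graph `b2` is one edge away from the uncovered node and `b3` three, so `b2` is negated first; `exit` cannot reach uncovered code and is never chosen.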
  61. Limitations of Search Heuristics
      •  Do not consider how execution reached a branch
      •  Do not handle non-symbolic path constraints
         –  pc = 3 > 0
         –  pc’ = !(3 > 0) = (3 ≤ 0) = UNSAT
  62. Guiding Execution Toward a Branch
      [Figure: execution tree with a branch marked UNSAT]
  63. Path Explosion Summary
      Approach                | Proposed Solutions
      Pruning Redundant Paths | RWset [Boonstoppel ’08]; Interpolation [Jaffar ’13]
      Function Summary        | Compositional [Godefroid ’07, ’10]; Demand-Driven Compositional [Anand ’08]
      Search Heuristics       | CFG-Directed [Burnim ’08]; Generational [Godefroid ’08]; CarFast [Park ’12]; Hybrid [Majumdar ’07]; KATCH [Cadar ’13]
      Remaining challenges: better search strategies, guiding execution toward a specific branch
  64. Conclusion
      •  DSE is a promising automatic test generation technique that achieves high coverage
      •  DSE relies on symbolic execution and constraint solving
      •  Challenges
         –  Imprecision, constraint solving, path explosion
         –  GUI application testing, concurrent programs, the object creation problem
  65. Challenges and Proposed Solutions
      Challenge          | Approach                       | Proposed Solutions
      Imprecision        | Integer Size                   | BitVector [SAGE ’08]
                         | Symbolic Pointer Dereferencing | Array Theory [Elkarablieh ’09]
                         | Floating-points                | Combined Static and Dynamic Analysis [Godefroid ’10]
                         | Environments                   | Modeling [KLEE ’08]; Precise identification and report [Xiao ’11]
      Constraint Solving | Optimization                   | Irrelevant Constraint Elimination, Constraint Caching [KLEE ’08]
                         | Meta-Heuristics                | [Borges ’12, Souza ’11, Lakhotia ’10]
                         | Hybrid                         | ICP [Garg ’13]
      Path Explosion     | Pruning Redundant Paths        | RWset [Boonstoppel ’08]; Interpolation [Jaffar ’13]
                         | Function Summary               | Compositional [Godefroid ’07, ’10]; Demand-Driven Compositional [Anand ’08]
                         | Search Heuristics              | CFG-Directed [Burnim ’08]; Generational [Godefroid ’08]; CarFast [Park ’12]; KATCH [Cadar ’13]; Hybrid [Majumdar ’07]
