A Survey on Dynamic Symbolic Execution for Automatic Test Generation

PQE at HKUST

Published in: Technology, Business


  1. A Survey on Dynamic Symbolic Execution for Automatic Test Generation
     Jan. 6, 2014; PQE; Hyunmin Seo
  2. Motivation
     • Testing is a practical way to verify software
     • Testing accounts for more than 50% of total software development costs [Tassey ’02]
     • Effective, efficient, and scalable automatic testing is required [Bounimova ’13, Kim ’12]
  3. Outline
     • Automatic Test Generation
       – Random Testing
       – Combinatorial Testing
       – Search-Based Testing
       – Symbolic Execution-Based Testing
       – Dynamic Symbolic Execution
     • Challenges in DSE (SE)
       – Imprecision
       – Constraint Solving
       – Path Explosion
  5. Random Testing
     • Random Testing
       – Randomly generate test inputs
     • Adaptive Random Testing (ART)
       – Spread test cases evenly over the input domain [Chen ’04]
       – Motivation: failure-causing inputs tend to form contiguous regions [White ’80, Chan ’96]
     • Feedback-Directed Random Testing
       – Randoop [Pacheco ’07]
       – Unit testing
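The ART idea of spreading tests over the input domain can be sketched as follows. This is a minimal illustration of one common variant (pick, from a fixed-size set of random candidates, the one farthest from all tests chosen so far), assuming a one-dimensional integer input domain. The function name, the candidate-set size, and the seed are illustrative assumptions, not taken from the slides.

```python
import random

def adaptive_random_tests(n_tests, lo, hi, n_candidates=10, seed=0):
    """Candidate-set ART sketch over a 1-D integer input domain [lo, hi]."""
    rng = random.Random(seed)
    chosen = [rng.randint(lo, hi)]  # the first test is purely random
    while len(chosen) < n_tests:
        candidates = [rng.randint(lo, hi) for _ in range(n_candidates)]
        # Keep the candidate whose nearest already-chosen test is farthest away.
        best = max(candidates, key=lambda c: min(abs(c - t) for t in chosen))
        chosen.append(best)
    return chosen

tests = adaptive_random_tests(n_tests=8, lo=0, hi=1000)
print(sorted(tests))
```

Compared with plain random testing, each new test costs extra distance computations, which is the price paid for the more even spread.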
  6. Random Testing Summary
     • One of the most fundamental and well-studied approaches [Hamlet ’94, Loo ’88]
       – Many variations
     • Pros
       – Efficient, scalable
       – No source code requirement
     • Cons
       – Low coverage [Burnim ’08]
  8. Combinatorial Testing
     • Find a subset of input parameters satisfying a certain property [Cohen ’13]
     • Mathematical property
  9. Configuration options (factors and their values)
     Factor          Values
     Vertical Ruler  Visible, Invisible
     Ruler Units     Inches, Centimeters, Points, Picas
     Default View    Normal, Slide, Outline
     SS Navigation   Pop-up, None
     End with Black  Yes, No
     Always Mirror   Yes, No
     Warn Before     Yes, No
     Total number of configuration settings = 2 × 4 × 3 × 2 × 2 × 2 × 2 = 384
  10. N-way Covering Array
      • A subset of configurations in which every combination of values of any N factors appears at least once [Cohen ’13]
  11. 2-Way Covering Array: CA(12; 2, 2^5 3^1 4^1)
      No  Vertical Ruler  Ruler Units  Default View  SS Navigation  End with Black  Always Mirror  Warn Before
      1   Visible         Centimeters  Outline       Pop-up         No              No             Yes
      2   Invisible       Inches       Outline       Pop-up         No              No             No
      3   Invisible       Centimeters  Slide         None           Yes             Yes            Yes
      4   Visible         Picas        Outline       Pop-up         Yes             Yes            No
      5   Invisible       Centimeters  Normal        Pop-up         Yes             Yes            No
      6   Visible         Points       Outline       None           Yes             No             Yes
      7   Invisible       Points       Slide         Pop-up         No              No             No
      8   Invisible       Picas        Slide         Pop-up         No              Yes            Yes
      9   Invisible       Points       Normal        None           No              Yes            No
      10  Visible         Inches       Normal        None           Yes             No             Yes
      11  Visible         Inches       Slide         Pop-up         No              Yes            Yes
      12  Invisible       Picas        Normal        None           Yes             No             No
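The covering property of the 12-row array can be checked mechanically. The sketch below transcribes the factors and rows from the two slides above and lists every value pair that no row covers. The helper name `uncovered` is an illustrative assumption.

```python
from itertools import combinations, product
from math import prod

FACTORS = {
    "Vertical Ruler": ["Visible", "Invisible"],
    "Ruler Units": ["Inches", "Centimeters", "Points", "Picas"],
    "Default View": ["Normal", "Slide", "Outline"],
    "SS Navigation": ["Pop-up", "None"],
    "End with Black": ["Yes", "No"],
    "Always Mirror": ["Yes", "No"],
    "Warn Before": ["Yes", "No"],
}

ROWS = [
    ("Visible", "Centimeters", "Outline", "Pop-up", "No", "No", "Yes"),
    ("Invisible", "Inches", "Outline", "Pop-up", "No", "No", "No"),
    ("Invisible", "Centimeters", "Slide", "None", "Yes", "Yes", "Yes"),
    ("Visible", "Picas", "Outline", "Pop-up", "Yes", "Yes", "No"),
    ("Invisible", "Centimeters", "Normal", "Pop-up", "Yes", "Yes", "No"),
    ("Visible", "Points", "Outline", "None", "Yes", "No", "Yes"),
    ("Invisible", "Points", "Slide", "Pop-up", "No", "No", "No"),
    ("Invisible", "Picas", "Slide", "Pop-up", "No", "Yes", "Yes"),
    ("Invisible", "Points", "Normal", "None", "No", "Yes", "No"),
    ("Visible", "Inches", "Normal", "None", "Yes", "No", "Yes"),
    ("Visible", "Inches", "Slide", "Pop-up", "No", "Yes", "Yes"),
    ("Invisible", "Picas", "Normal", "None", "Yes", "No", "No"),
]

def uncovered(rows, factors, n=2):
    """Return every n-way value combination that no row contains."""
    names = list(factors)
    missing = []
    for idxs in combinations(range(len(names)), n):
        seen = {tuple(row[i] for i in idxs) for row in rows}
        for combo in product(*(factors[names[i]] for i in idxs)):
            if combo not in seen:
                missing.append((tuple(names[i] for i in idxs), combo))
    return missing

total = prod(len(values) for values in FACTORS.values())
print(total)                          # 384 exhaustive configurations
print(len(uncovered(ROWS, FACTORS)))  # 0: every pair is covered by the 12 rows
```

Dropping any single row breaks the property here, because Ruler Units × Default View alone has 4 × 3 = 12 pairs and each row covers exactly one of them.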
  12. Combinatorial Testing Summary
      • Research directions
        – How to find the minimum-size array
          • Greedy [Tung ’00, Colbourn ’04]
          • Meta-heuristics [Cohen ’03, Stardom ’01]
        – Application to different domains
          • Software Product Line [McGregor ’01, Perrouin ’10]
      • Pros
        – Systematic testing with a mathematical property [Cohen ’13]
        – Sample configurations to be tested [Qu ’08]
      • Cons
        – Too many combinations for program inputs
  14. Search-Based Testing
      • A branch of search-based software engineering (SBSE) in which meta-heuristics guide the search [McMinn ’04]
      • Typical process
        – Start with a random input
        – Search nearby locations for a better solution
        – Evaluate candidates with a fitness function
        – Replace the current solution when a better one is found
        – The search is guided by meta-heuristics
  15. Meta-Heuristics [McMinn ’11]
      Figure: fitness value plotted over the input domain for (a) hill climbing, (b) simulated annealing, (c) genetic algorithm.
  16. Search-Based Testing Example [McMinn ’11]
      • Input: a string
      • count: the number of digits in the string
      Figure: control flow with three nested conditions, each with TRUE and FALSE edges, leading to the target:
        if (count >= 4)
          if (count <= 10)
            if (checksum % 10 == checkdigit)
      Paths π1, π2, π3 are shown; π2: count = 20, π3: count = 11.
  17. Fitness Function
      • Combination of approach level and branch distance
      • Approach level
        – The number of the target’s control-dependent nodes not executed by the current input
      • Branch distance [Tracey ’98]
      Element   Value
      Boolean   if TRUE then 0 else K
      a = b     if abs(a-b) = 0 then 0 else abs(a-b) + K
      a ≠ b     if abs(a-b) ≠ 0 then 0 else K
      a < b     if a-b < 0 then 0 else (a-b) + K
      a ≤ b     if a-b ≤ 0 then 0 else (a-b) + K
      a > b     if b-a < 0 then 0 else (b-a) + K
      a ≥ b     if b-a ≤ 0 then 0 else (b-a) + K
      a ∨ b     min(cost(a), cost(b))
      a ∧ b     cost(a) + cost(b)
      !a        move negation inward and propagate
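A fitness function of this kind can drive a simple hill climber. The sketch below covers only the first two conditions of the example (count >= 4 and count <= 10) and omits the checksum condition. The value K = 1, the normalization d / (d + 1), and the ±1 neighborhood are assumptions made for illustration; the slides do not fix them.

```python
K = 1  # failure constant from the branch-distance table (value assumed)

def dist_ge(a, b):
    """Branch distance for a >= b."""
    return 0 if b - a <= 0 else (b - a) + K

def dist_le(a, b):
    """Branch distance for a <= b."""
    return 0 if a - b <= 0 else (a - b) + K

def normalize(d):
    """Map a distance into [0, 1) so it never outweighs one approach level."""
    return d / (d + 1)

def fitness(count):
    """Approach level plus normalized branch distance; 0 means target reached."""
    d1 = dist_ge(count, 4)
    if d1 > 0:
        return 1 + normalize(d1)  # diverged at the first condition
    return normalize(dist_le(count, 10))  # diverged (or not) at the second

def hill_climb(start, max_steps=1000):
    current = start
    for _ in range(max_steps):
        if fitness(current) == 0:
            break
        best = min((current - 1, current + 1), key=fitness)
        if fitness(best) >= fitness(current):
            break  # local optimum: no neighbor improves the fitness
        current = best
    return current

print(hill_climb(20))  # 10
print(hill_climb(-5))  # 4
```

Starting from 20 the search walks down until count <= 10 holds, and starting from -5 it walks up until count >= 4 holds.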
  18. Search-Based Testing Summary
      • A branch of SBSE
        – Different search heuristics
        – Different domains [Harman ’13]
      • Pros
        – Guides the execution toward a specific branch
        – Non-functional testing (e.g., longest execution time) [Wegener ’98]
      • Cons
        – Search space challenge
        – Design of fitness functions [Arcuri ’10]
  20. Symbolic Execution-Based Testing
      • Use symbolic values to represent program variables and path conditions [King ’76, Clarke ’76]
      • Find precise constraints for each execution path and generate test inputs by solving the constraints
  21. Symbolic Execution
      Program:
          x = sym_input();
          y = sym_input();
          z = sym_input();
          a = x + y;
          if (z > a)
              b = x - y;
          else
              b = 2 * y;
          ...
      Symbolic state on each path:
          Var   PC: s3 > s1 + s2   PC: s3 <= s1 + s2
          x     s1                 s1
          y     s2                 s2
          z     s3                 s3
          a     s1 + s2            s1 + s2
          b     s1 - s2            2 * s2
  22. Test Generation
      Figure: the path conditions are passed to an SMT solver, which returns one test input per path.
      Path Conditions   Test Inputs
      π1 : PC1          π1 : x = 1, y = 2, ...
      π2 : PC2          π2 : x = 1, y = 5, ...
      π3 : PC3          π3 : x = -5, y = 0, ...
      ...               ...
      πn : PCn          πn : x = …, y = …
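The two steps above (collect one path condition per path, then solve each for a test input) can be sketched for the example program with inputs x, y, z. Symbolic values are modeled as coefficient triples over (s1, s2, s3), and a brute-force search over a tiny domain stands in for the SMT solver. Both modeling choices and all function names are assumptions made to keep the sketch self-contained.

```python
from itertools import product

# A symbolic value is a linear form: coefficients of (s1, s2, s3).
S1, S2, S3 = (1, 0, 0), (0, 1, 0), (0, 0, 1)

def add(u, v):
    return tuple(a + b for a, b in zip(u, v))

def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def scale(k, u):
    return tuple(k * a for a in u)

def evaluate(u, inputs):
    return sum(c * s for c, s in zip(u, inputs))

def symbolic_paths():
    """Symbolically execute the example; return path condition and final b per path."""
    x, y, z = S1, S2, S3
    a = add(x, y)
    # The branch `z > a` forks execution: each side gets its own path condition.
    return [
        {"pc": lambda s: evaluate(z, s) > evaluate(a, s), "b": sub(x, y)},
        {"pc": lambda s: evaluate(z, s) <= evaluate(a, s), "b": scale(2, y)},
    ]

def solve(pc, domain=range(-2, 3)):
    """Stand-in for an SMT solver: brute-force search over a tiny domain."""
    for s in product(domain, repeat=3):
        if pc(s):
            return s
    return None

for path in symbolic_paths():
    print("test input:", solve(path["pc"]), "symbolic b:", path["b"])
```

Each returned triple is one concrete test input (x, y, z) that drives execution down the corresponding path, so no two generated inputs take the same path.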
  23. Symbolic Execution-Based Testing Summary
      • Pros
        – No redundant inputs taking the same path
        – High coverage
      • Cons
        – Low efficiency
        – Depends on constraint solving techniques
        – External library calls
        – State explosion
        – Imprecision
  25. Limitations of SE
          void foo(int x, int y) {
              if (external(x) == y) {
                  // branch 1
              }
              else if (hash(x) > y) {
                  // branch 2
              }
          }
      → external(): no source code available
      → hash(): complex arithmetic
  26. Dynamic Symbolic Execution
      • Perform symbolic execution dynamically along the execution path of a concrete input [DART ’05, CUTE ’05, PEX ’08]
      • Apply concretization
        – External library calls
        – Complex constraints
  27. DSE
      Figure: execution tree with branch conditions pc1 to pc4 and paths π1, π2, π3.
      • PC = pc1 ∧ pc2 ∧ pc3 ∧ … ∧ pcn
      • PC’ = pc1 ∧ pc2 ∧ !pc3
      • PC’’ = pc1 ∧ !pc2
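The DSE loop (run a concrete input, record the branch conditions on its path, negate one condition while keeping the prefix, solve for a new input) can be sketched as below. The program under test, the `branch` instrumentation helper, and the brute-force `solve` that stands in for an SMT solver are all hypothetical, chosen only to make the loop runnable.

```python
from itertools import product

def program(x, y, trace):
    """Instrumented program under test: each branch records (predicate, outcome)."""
    def branch(pred):
        taken = pred(x, y)
        trace.append((pred, taken))
        return taken

    if branch(lambda x, y: x > 0):
        if branch(lambda x, y: y == x + 3):
            return "deep"
        return "mid"
    return "shallow"

def solve(constraints, domain=range(-5, 6)):
    """Stand-in for an SMT solver: find (x, y) giving every predicate its wanted outcome."""
    for x, y in product(domain, repeat=2):
        if all(pred(x, y) == want for pred, want in constraints):
            return (x, y)
    return None

def dse(seed):
    worklist, seen, results = [seed], set(), {}
    while worklist:
        inputs = worklist.pop()
        trace = []
        outcome = program(*inputs, trace)
        path = tuple(taken for _, taken in trace)
        if path in seen:
            continue  # this path was already explored
        seen.add(path)
        results[path] = (inputs, outcome)
        # Negate each branch in turn while keeping the prefix before it fixed.
        for i in range(len(trace)):
            flipped = trace[:i] + [(trace[i][0], not trace[i][1])]
            new_inputs = solve(flipped)
            if new_inputs is not None:
                worklist.append(new_inputs)
    return results

for path, (inputs, outcome) in dse((0, 0)).items():
    print(path, inputs, outcome)
```

Starting from the seed (0, 0), which only reaches the shallow branch, the loop discovers inputs for all three paths of the toy program.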
  28. Benefit
      • Based on symbolic execution
        – No redundant inputs taking the same path
        – High coverage
      • Reach deep program states by starting from well-formed, user-provided input
      • Use concrete values to overcome limitations
        – External library calls
        – Complicated constraints
      • Many tools
        – CREST, CUTE, JCUTE, PEX, SAGE, EXE, KLEE
  29. Comparison
      Technique           Source code requirement  Etc.
      Random              No
      Combinatorial       No                       Combine with other techniques
      Search-Based        Yes/No                   Non-functional testing
      Symbolic Execution  Yes
      DSE                 Yes                      Concretization
      (The slide also has Efficiency and Coverage columns; their ratings are not preserved in this transcript.)
  31. Imprecision
      • Occurs when symbolic execution cannot represent the exact semantics of the program [Elkarablieh ’09]
        – Example: modeling a 4-byte integer with a mathematical integer
      • Imprecision may manifest as divergence [Godefroid ’08]
  32. Divergence
      Figure: execution tree with branch conditions pc1 to pc5; the input generated for pc1 ∧ pc2 ∧ !pc3 follows a different path than predicted.
  33. Proposed Solutions
      • Integer size, bit operations
        – BitVector [SAGE ’08]
      • Symbolic pointer dereferencing
        – Array Theory of SMT solvers [Elkarablieh ’09]
      • Floating-point operations
        – Combined static and dynamic analysis [Godefroid ’10]
      • Interaction with environment
        – Modeling [KLEE ’08]
        – Reporting [Xiao ’11]
  34. BitVector
      • Use the bit-vector theory of SMT solvers
        – Fixed-size integers
        – Bit operations on integer variables
          • a & b
          • a << 4
      • Slower than integer arithmetic
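Why fixed-size integers need bit-vector reasoning can be seen with a small experiment. The sketch below assumes 32-bit two's-complement wraparound (the helper `to_int32` is illustrative) and shows a property that holds for mathematical integers but fails for 4-byte integers.

```python
def to_int32(v):
    """Wrap a Python integer to a signed 32-bit two's-complement value."""
    v &= 0xFFFFFFFF
    return v - 0x100000000 if v & 0x80000000 else v

INT_MAX = 2**31 - 1

# With mathematical integers, x + 1 > x holds for every x.
print(INT_MAX + 1 > INT_MAX)            # True

# With 32-bit wraparound the same property fails at INT_MAX.
print(to_int32(INT_MAX + 1))            # -2147483648
print(to_int32(INT_MAX + 1) > INT_MAX)  # False

# Bit operations such as a << 4 and a & b are also defined on the fixed-size encoding.
print(to_int32(3 << 4), to_int32(0b1100 & 0b1010))  # 48 8
```

A solver that models the variable as an unbounded integer would declare `x + 1 <= x` unsatisfiable and miss the input INT_MAX, which is the kind of imprecision that leads to divergence.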
  35. Symbolic Pointer Dereferencing
      • Symbolic values are used to calculate the addresses of pointer values
        – Array index
        – a[S0]
  36. Symbolic Pointer Dereferencing Example [Elkarablieh ’09]
          void single_array(BYTE x, BYTE y) {
              BYTE *a = new BYTE[4];
              a[0] = x;
              a[1] = 0;
              a[2] = 1;
              a[3] = 2;

              if (a[x] == a[y] + 2)
                  assert(false);

              delete[] a;
          }
      Values in the first concrete run (Con), the symbolic state (Sym), and the second concrete run (Con):
                Con   Sym   Con
          x     0     S0    2
          y     1     S1    1
          a[0]  0     S0    2
          a[1]  0     0     0
          a[2]  1     1     1
          a[3]  2     2     2
          a[x]  0     S0    1
          a[y]  0     0     0
      Branch condition a[x] == a[y] + 2 in each column:
          first concrete run:   0 != 0 + 2
          symbolic:             S0 != 0 + 2
          second concrete run:  1 != 0 + 2
  37. Array Theory of SMT Solver [Elkarablieh ’09]
      (Same single_array code and value table as on the previous slide.)
      Constraints on the array reads:
          a[x] : 0 ≤ x ≤ 3 ∧ a[x] ∈ {0, 1, 2}
          a[y] : 0 ≤ y ≤ 3 ∧ a[y] ∈ {0, 1, 2, x}
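What precise reasoning about the array reads buys can be checked by exhaustive enumeration of the in-bounds indices. The sketch below re-implements the branch condition of single_array in Python and uses enumeration as a stand-in for array-theory solving; the function name `violates` is an illustrative assumption.

```python
def violates(x, y):
    """Branch condition of single_array for in-bounds indices x and y."""
    a = [x, 0, 1, 2]  # a[0] = x, a[1] = 0, a[2] = 1, a[3] = 2
    return a[x] == a[y] + 2

solutions = [(x, y) for x in range(4) for y in range(4) if violates(x, y)]
print(solutions)  # [(3, 1)]
```

The only in-bounds input that reaches assert(false) is x = 3, y = 1. Neither concrete run in the table (x = 0, y = 1 and x = 2, y = 1) reaches it, which illustrates why treating a[x] as a fixed concrete cell can miss the failing input.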
  38. Floating-Point Operations [Godefroid ’10]
      • FP code should only perform memory-safe data processing
        – e.g., the payload of an image or video file
      • Non-FP code should deal with buffer allocations and memory address computations
      • Lightweight, local, path-insensitive “may” analysis + precise “must” dynamic analysis
  39. Interaction With Environment
      • Modeling [KLEE ’08]
        – System calls
        – int fd = open(argv[1], O_RDONLY);
      • Precise identification and report [Xiao ’11]
  40. Imprecision Summary
      Reason                          Proposed Solutions
      Fixed-size integer              BitVector [SAGE ’08]
      Symbolic pointer dereferencing  Array Theory [Elkarablieh ’09]
      Floating-point operations       Combined static and dynamic analysis [Godefroid ’10]
      Interaction with environment    Modeling [KLEE ’08]; Precise identification and report [Xiao ’11]
      Remaining challenges: precise reasoning about floating points, interaction with environment, external library calls, concurrent programs
  42. Constraint Solving
      • Path constraints must be solved to obtain test inputs
      • The major bottleneck
        – Solving can take a long time
        – Some constraints cannot be solved at all
  43. Proposed Solutions
      • Optimization [KLEE ’08]
        – Expression rewriting
        – Implied value concretization
        – Irrelevant constraint elimination
        – Constraint caching
      • Meta-heuristic-based constraint solving [Borges ’12, Souza ’11, Lakhotia ’10]
      • Hybrid approach [Garg ’13]
  44. Optimization
      • Irrelevant constraint elimination [KLEE ’08]
      • Constraint caching [KLEE ’08]
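Both optimizations can be illustrated with a small sketch. Constraints are modeled as (text, variables) pairs, which is an assumption for illustration; elimination keeps only constraints that share variables, directly or transitively, with the query, and the cache here is a plain exact-match dictionary, simpler than what a real tool would use. The solver is a placeholder that only counts calls.

```python
def relevant_constraints(constraints, target_vars):
    """Keep constraints transitively sharing variables with target_vars."""
    relevant_vars = set(target_vars)
    kept, remaining, changed = [], list(constraints), True
    while changed:
        changed = False
        for c in list(remaining):
            if c[1] & relevant_vars:
                relevant_vars |= c[1]
                kept.append(c)
                remaining.remove(c)
                changed = True
    return kept

cache = {}

def cached_solve(constraints, solver):
    """Exact-match cache keyed by the set of constraint texts."""
    key = frozenset(text for text, _ in constraints)
    if key not in cache:
        cache[key] = solver(constraints)
    return cache[key]

calls = []

def fake_solver(constraints):
    calls.append(constraints)
    return "sat"  # placeholder answer; a real solver would return a model

path_condition = [("i < j", {"i", "j"}), ("j < 20", {"j"}), ("k > 0", {"k"})]
kept = relevant_constraints(path_condition, {"i"})  # query is about i only
print([text for text, _ in kept])  # ['i < j', 'j < 20']

cached_solve(kept, fake_solver)
cached_solve(kept, fake_solver)
print(len(calls))  # 1: the second query is answered from the cache
```

The constraint on k shares no variable with the query, so it is dropped before the solver is called, and the repeated query never reaches the solver.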
  45. Meta-Heuristic Approach
      • SMT solvers may not support
        – Non-linear constraints
        – Floating-point expressions
        – Very complex constraints
      • Use meta-heuristic approaches [Borges ’12, Souza ’11, Lakhotia ’10]
  46. Hybrid Approach [Garg ’13]
      • Apply concretization first and solve the simplified constraints quickly with an off-the-shelf SMT solver
      • If divergence occurs, use ICP (Interval Constraint Propagation) to solve the constraints
  47. Constraint Solving Summary
      Target                  Proposed Solutions
      Time overhead           Irrelevant Constraint Elimination, Constraint Caching [KLEE ’08]
      Complex constraints     Meta-heuristic approach [Borges ’12, Souza ’11, Lakhotia ’10]
      Non-linear constraints  ICP [Garg ’13]
      Remaining challenges: floating points, complex constraints, non-linear constraints
  49. Path Explosion
      • The number of paths in a program increases exponentially with the number of branches in the program
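The exponential growth is easy to see for the simplest case. The sketch below assumes n independent if-statements executed one after another, so that each path is one sequence of TRUE/FALSE outcomes; the function name is illustrative.

```python
from itertools import product

def enumerate_paths(n_branches):
    """All outcome sequences for n independent, sequential if-statements."""
    return list(product((True, False), repeat=n_branches))

for n in (1, 2, 3, 10):
    print(n, len(enumerate_paths(n)))  # 2, 4, 8, 1024 paths
```

Loops make the situation worse, since every additional iteration adds another branch decision to the path.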
  50. Path Explosion
      Figure: execution tree with branch conditions pc1 to pc4 and paths π1, π2, π3; new paths are obtained from pc1 ∧ pc2 ∧ !pc3 and pc1 ∧ !pc2.
  51. Proposed Solutions
      • Pruning Redundant Paths
        – RWset [Boonstoppel ’08]
        – Interpolation [Jaffar ’13]
      • Function Summary
        – Compositional [Godefroid ’07, ’10]
        – Demand-driven compositional [Anand ’08]
      • Search Heuristics
        – CFG [Burnim ’08]
        – Generational [Godefroid ’08]
        – CarFast [Park ’12]
        – Hybrid [Majumdar ’07]
  52. Pruning Redundant Paths
      • RWset [Boonstoppel ’08]
        – If an execution reaches a program point in the same state as some previous execution, then it will produce the same results
        – If two states differ only in program values that are not subsequently read, then the two states will produce the same results
  53. Pruning Redundant Paths
      • Interpolant [Jaffar ’13]
      • A succinct representation of the core reason why a branch cannot be covered
  54. Interpolant Example [Jaffar ’13]
      Figure: execution tree with an UNSAT branch; the full interpolant is (x < 3z + 2).
  55. Function Summary
      • A function summary [Godefroid ’07, ’10] pairs a precondition pre_w with a postcondition post_w
      • pre_w is a conjunction of constraints on the inputs to the function
      • post_w, the effect, is a conjunction of constraints on the outputs of the function
  56. Function Summary
      Assume foo has 10 execution paths and N paths reach the call foo(x, y).
      • Without summary: N paths before the call, N × 10 paths after it
      • With summary: N paths before the call, N paths after it
  57. Search Heuristics
      • Prioritize branches and explore relevant branches only
  58. Search Heuristics
      Figure: exploration order of the execution tree under (a) DFS, (b) BFS, (c) heuristic search.
  59. Search Heuristics
      • Coverage-Optimized
        – CFG-directed [Burnim ’08]
        – CarFast [Park ’12]
        – Generational [Godefroid ’08]
        – Hybrid [Majumdar ’07]
      • Patch-Optimized
        – KATCH [Cadar ’13]
  60. CFG-Directed Search [Burnim ’08]
      Figure: execution tree with path π1 and branch conditions pc1 to pc4.
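One way to read CFG-directed search is: among the branches on the current path, negate the one whose untaken successor is closest, in the control-flow graph, to still-uncovered code. The sketch below is a simplified illustration under that assumption, not the exact algorithm of [Burnim ’08]. The graph, node names, and helper function are hypothetical.

```python
from collections import deque

def distances_to_uncovered(cfg, uncovered):
    """BFS on the reversed CFG: shortest distance from each node to an uncovered node."""
    reverse = {n: [] for n in cfg}
    for src, dsts in cfg.items():
        for dst in dsts:
            reverse[dst].append(src)
    dist = {n: 0 for n in uncovered}
    queue = deque(uncovered)
    while queue:
        node = queue.popleft()
        for pred in reverse[node]:
            if pred not in dist:
                dist[pred] = dist[node] + 1
                queue.append(pred)
    return dist

# Hypothetical CFG: each pcN has a taken successor (next pc) and an untaken one (a..d).
cfg = {
    "pc1": ["pc2", "a"], "pc2": ["pc3", "b"], "pc3": ["pc4", "c"], "pc4": ["end", "d"],
    "a": ["end"], "b": ["target"], "c": ["end"], "d": ["end"],
    "target": ["end"], "end": [],
}
dist = distances_to_uncovered(cfg, ["target"])

# Branches on the current path, each with the successor it did NOT take.
candidates = [("pc1", "a"), ("pc2", "b"), ("pc3", "c"), ("pc4", "d")]
chosen = min(candidates, key=lambda c: dist.get(c[1], float("inf")))
print(chosen)  # ('pc2', 'b'): flipping pc2 leads most directly to the uncovered node
```

A depth-first strategy would flip pc4 first, although nothing uncovered is reachable from its other side in this graph.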
  61. Limitations of Search Heuristics
      • They do not consider how execution reached the branch
      • They do not handle non-symbolic path constraints
        – pc = (3 > 0)
        – pc’ = !(3 > 0) = (3 ≤ 0), which is UNSAT
  62. Guiding Execution Toward a Branch
      Figure: execution tree in which the branch leading toward the target is UNSAT.
  63. Path Explosion Summary
      Approach                 Proposed Solutions
      Pruning Redundant Paths  RWset [Boonstoppel ’08]; Interpolation [Jaffar ’13]
      Function Summary         Compositional [Godefroid ’07, ’10]; Demand-Driven Compositional [Anand ’08]
      Search Heuristics        CFG-Directed [Burnim ’08]; Generational [Godefroid ’08]; CarFast [Park ’12]; Hybrid [Majumdar ’07]; KATCH [Cadar ’13]
      Remaining challenges: better search strategies, guiding execution toward a specific branch
  64. Conclusion
      • DSE is a promising automatic test generation technique that achieves high coverage
      • DSE relies on symbolic execution and constraint solving
      • Challenges
        – Imprecision, constraint solving, path explosion
        – GUI application testing, concurrent programs, the object creation problem
  65. Challenges and Proposed Solutions
      • Imprecision
        – Integer size: BitVector [SAGE ’08]
        – Symbolic pointer dereferencing: Array Theory [Elkarablieh ’09]
        – Floating points: Combined static and dynamic analysis [Godefroid ’10]
        – Environments: Modeling [KLEE ’08]; Precise identification and report [Xiao ’11]
      • Constraint Solving
        – Optimization: Irrelevant Constraint Elimination, Constraint Caching [KLEE ’08]
        – Meta-Heuristics [Borges ’12, Souza ’11, Lakhotia ’10]
        – Hybrid: ICP [Garg ’13]
      • Path Explosion
        – Pruning Redundant Paths: RWset [Boonstoppel ’08]; Interpolation [Jaffar ’13]
        – Function Summary: Compositional [Godefroid ’07, ’10]; Demand-Driven Compositional [Anand ’08]
        – Search Heuristics: CFG-Directed [Burnim ’08]; Generational [Godefroid ’08]; CarFast [Park ’12]; KATCH [Cadar ’13]; Hybrid [Majumdar ’07]
