STAR: Stack Trace based Automatic Crash Reproduction


Ning's PhD Thesis Defense

  • Therefore, failure reproduction is a very important part of the software development process.
  • Techniques have been introduced to try to reproduce crashes, the current state-of…
  • However, they have several limitations:
  • No runtime data collection. No runtime performance overhead. Does not rely on memory dumps or developer-written contracts. Optimizations to greatly improve the crash reproduction process. Capable of reproducing non-trivial crashes from object-oriented programs.
  • STAR implements a backward symbolic execution engine that can compute the crash preconditions.
  • The number of potential backward paths grows exponentially with the number of branches.
  • Mostly minimal
  • The underlying challenge is that the constraint solver does not have any background knowledge about the subject program.
  • We found that,
  • Given an input model (object state), the approach can effectively construct a method sequence which generates objects satisfying this model.
  • If there are subsequent invocations, effects from the invocation target method will also be included.
  • STAR implements a deductive engine. The input model returned by the SMT solver indicates the required object states for the method inputs. A method path can produce the target object state if Φ_path ∧ Φ_target is satisfiable.
  • … In addition, because of our various efficiency improvements, we are able to apply STAR to much larger subjects compared to previous studies.
  • The improvement from ACC is comparatively smaller since it has relatively fewer paths than ANT and LOG.
  • Because STAR needs a precondition to reproduce a crash, the precondition column shows the upper bound of the number of crashes we can reproduce.
  • Our settings favor Randoop over STAR.
  • 14 more crashes by STAR

    1. STAR: STACK TRACE BASED AUTOMATIC CRASH REPRODUCTION. PhD Thesis Defence, Ning Chen. Advisor: Sunghun Kim. November 05, 2013
    2. Outline 1. Motivation & Related Work 2. Approaches of STAR 1) Crash Precondition Computation 2) Input Model Generation 3) Test Input Generation 3. Evaluation Study 4. Challenges & Future Work 5. Contributions
    3. Motivation  Failure reproduction is a difficult and time-consuming task, but it is necessary for fixing the corresponding bug.  For example: https://issues.apache.org/jira/browse/COLLECTIONS-70  It had not been fixed for five months due to difficulties in reproducing the bug.  After a test case was submitted, it was soon fixed with a comment: “As always, a good test case makes all the difference.”
    4. Problem Statement  The intention of this research is to propose a stack trace based automatic crash reproduction framework which is efficient and applicable to real-world object-oriented programs.  Sub-problem 1: Propose an efficient crash precondition computation approach which is applicable to non-trivial real-world programs.  Sub-problem 2: Propose a novel method sequence composition approach which can generate crash-reproducing test cases for object-oriented programs.
    5. Contributions  Study the scalability challenge of automatic crash reproduction, and propose approaches to improve its efficiency.  Study the object creation challenge for reproducing object-oriented crashes, and propose a novel method sequence composition approach to address it.  A novel framework, STAR, which combines the proposed approaches to achieve automatic crash reproduction using only the crash stack trace.  A detailed empirical evaluation to investigate the usefulness of STAR.
    6. Related Work
    7. Related Work  Record-and-replay approaches:  Jrapture, 2000  BugNet, 2005  ReCrash/ReCrashJ, 2008  LEAP/LEAN, 2010  Post-failure-process approaches:  Microsoft PSE, 2004  IBM SnuggleBug, 2009  XyLem, 2009  ESD, 2010  BugRedux, 2012
    8. Record-and-replay Approaches  Approach:  Monitoring Phase: Captures/stores runtime heap & stack objects.  Test Generation Phase: Generates tests that load the correct objects with the crashed methods. Original Program Execution, Store from heap & stack, Stored Objects, Load as crashed method params, Recreated Test Case
    9. Record-and-replay Approaches
        Frameworks  | Instrumentation     | Data Collections           | Memory Overhead | Performance Overhead
        Jrapture’00 | Required            | All Interactions           | N/A             | N/A
        BugNet’05   | Required / Hardware | All Inputs / Executed Code | N/A             | N/A
        ReCrash’08  | Required            | Stack Objects              | 7% - 90%        | 31% - 60%
        LEAP’10     | Required            | SPE Access / Thread Info   | N/A             | 7% - 600%
        Limitations:  Require up-front instrumentation or special hardware deployment.  Collect client-side data, which may raise privacy concerns. [Clause et. al, 2010]  Non-trivial memory and runtime overheads.
    10. Post-failure-process Approaches  Perform analyses on crashes only after they have occurred.  Advantages  Usually do not record runtime data.  Incur no or very little performance overhead.
    11. Post-failure-process Approaches  Crash Explanation Approaches  Microsoft PSE [Manevich et. al, 2004]  IBM SnuggleBug [Chandra et. al, 2009]  XyLem [Nanda et. al, 2009]  Assist crash debugging by providing hints on the target crashes:  Potential crash traces  Potential crash conditions  Could not reproduce the target crashes.
    12. Post-failure-process Approaches  Crash Reproduction Approaches  Core dump-based approaches  Cdd [Leitner et. al, 2009]  RECORE [Roßler et. al, 2013]  Symbolic execution-based approaches  ESD [Zamfir et. al, 2009]  BugRedux [Jin et. al, 2012]  Aim to reproduce crashes using only post-failure data such as  Crash stack traces  Memory core dump at the time of the crash
    13. Crash Reproduction Approaches  Core dump-based approaches  E.g. Cdd [Leitner et. al, 2009] and RECORE [Roßler et. al, 2013]  Leverage the memory core dump and even some developer-written contracts to guide the crash reproduction process.  Advantage  Higher chance of reproducing a crash, as more data is provided.  Limitations  Require not just the stack trace, but the entire memory core dump at the time of the crash.  Less applicable in practice due to the frequent lack of a memory core dump.
    14. Crash Reproduction Approaches  Symbolic execution-based approaches  E.g. ESD [Zamfir et. al, 2009] and BugRedux [Jin et. al, 2012]  Perform symbolic execution-based analysis to identify crash paths and generate crash-reproducing test cases.
    15. Crash Reproduction Approaches  Advantages:  Use only the crash stack trace to achieve crash reproduction.  No runtime overhead is incurred at the client side.  Limitations:  Existing approaches rely on forward symbolic execution to compute crash preconditions, which is less efficient.  Cannot be fully optimized due to the nature of forward symbolic execution.  Cannot reproduce non-trivial crashes from object-oriented programs due to the object-creation challenge.
    16. Crash Reproduction Approaches  STAR: Stack Trace based Automatic crash Reproduction  Advantages:
        Approaches           | Limitations                                      | Advantages of STAR
        Record-replay        | Data collection                                  | No runtime data collection
        Record-replay        | Performance overhead                             | No performance overhead
        Core dump-based      | Memory core dump and developer-written contracts | Crash stack trace only
        Symbolic Exec.-based | Lack of optimizations                            | Optimizations to greatly improve the crash reproduction process
        Symbolic Exec.-based | Lack of support for object-oriented programs     | Capable of reproducing non-trivial crashes for object-oriented programs
    17. Overview of STAR 1 stack trace Crash Precondition Computation Crash Preconditions 2 Input Model Generation program Crash Models test cases 3 Test Input Generation
    18. Crash Precondition Computation
    19. Crash Precondition Computation 1 stack trace Crash Precondition Computation Crash Preconditions 2 Input Model Generation program Crash Models test cases 3 Test Input Generation
    20. Crash Precondition Computation  Crash Precondition  The conditions on inputs at a method entry that can trigger the crash.  It specifies in what kind of memory state the crash can be reproduced.
    21. Crash Precondition Computation  Existing approaches such as ESD and BugRedux use forward symbolic execution to compute the crash preconditions.  Program is executed in the same direction as normal executions.  Inputs and variables are represented as symbolic values instead of concrete values.  Limitations of forward symbolic execution  Non-demand-driven: Need to execute many paths not related to the crash  Limited optimization: Difficult to perform optimizations using the crash information
    22. Crash Precondition Computation  STAR performs a backward symbolic execution to compute the crash precondition.  Program is executed from the crash location to the method entry.  Advantages of backward symbolic execution  Demand-driven: Only paths related to the crash are executed.  Optimizations: Optimizations can be performed using the crash information.
    23. Backward Symbolic Execution  Given a program P, a crash location L and the crash condition C at L, we execute P from L to a method entry with C as the initial crash precondition.  The precondition is updated along the execution path according to the executed statements.  E.g. int var3 = var1 + var2; -> all occurrences of var3 are replaced by var1 + var2  E.g. if (var1 != null) -> Coming from the true branch: var1 != null is added to the precondition -> Coming from the false branch: var1 == null is added to the precondition  The preconditions at method entries are saved as the final crash preconditions.
    24. Crash Precondition Computation: Backward Symbolic Execution (example) Precondition Method Entry If (i < buffer.length) T buffer[i] = 0; Symbolic Execution int i = this.last; {buffer != null} {last < 0 or last >= buffer.length} {last < buffer.length} {buffer != null} {i < 0 or i >= buffer.length} {i < buffer.length} {buffer != null} {i < 0 or i >= buffer.length} TRUE AIOBE
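The example on this slide can be reconstructed as a small Java method. This is a hypothetical reconstruction: only the three statements come from the slide, and the class name `Target` and field declarations are assumptions. Backward execution starts at `buffer[i] = 0` with the crash condition {i < 0 or i >= buffer.length}, passes through the true branch (adding i < buffer.length) and the assignment (substituting this.last for i), ending with the entry precondition {buffer != null, last < 0 or last >= buffer.length, last < buffer.length}, which simplifies to last < 0.

```java
// Hypothetical reconstruction of the crashing method on the slide.
public class Target {
    int[] buffer;
    int last;

    void reset() {
        int i = this.last;       // substitution: occurrences of i become this.last
        if (i < buffer.length) { // true branch contributes i < buffer.length
            buffer[i] = 0;       // crash location: AIOBE when i < 0
        }
    }
}
```

Any state satisfying the computed precondition (e.g. a non-null buffer with last == -1) triggers the ArrayIndexOutOfBoundsException.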
    25. Crash Precondition Computation: Challenge – Path Explosion isDebugging() F T print(…) debugLog(…) buffer = new int[16] index >= buffer.length T F i=0 i = index buffer[i] = 0 AIOBE …
    26. Crash Precondition Computation: Optimizations  STAR introduces three different approaches to improve the crash precondition computation process:  Static Path Reduction  Heuristic Backtracking  Early Detection of Inner Contradictions
    27. Static Path Reduction  Observation: Only a subset of the conditional branches and method calls contribute to the target crash.  E.g. Methods that perform runtime logging can be safely skipped  E.g. Branches which do not modify the crash-related variables can be safely skipped.  Optimization: STAR detects and skips branches or method calls that do not contribute to the target crash during symbolic execution.
    28. Static Path Reduction (example) isDebugging() F T print(…) debugLog(…) buffer = new int[16] index >= buffer.length T F i=0 i = index buffer[i] = 0 AIOBE The method isDebugging() does not contribute to the crash
    29. Static Path Reduction (example) isDebugging() F T print(…) debugLog(…) buffer = new int[16] index >= buffer.length T F i=0 i = index buffer[i] = 0 AIOBE The conditional branch does not contribute to the crash either
    30. Static Path Reduction (example) isDebugging() F T print(…) debugLog(…) buffer = new int[16] index >= buffer.length T F i=0 i = index buffer[i] = 0 AIOBE STAR can detect and skip over methods and branches not contributing to the crash
    31. Static Path Reduction  A conditional branch or a method call is contributive to the crash if:  It can modify any stack location referenced in the current crash precondition formula.  It can modify any heap location referenced in the current crash precondition formula.  However, in backward execution, the actual heap locations may not be decidable until they are explicitly defined.
    32. Static Path Reduction  For any reference whose heap location cannot be decided:  Check whether the modified heap location and the reference have compatible data types.  Check whether the modified heap location and the reference have the same field name (arrays are the exception)  If both of the above criteria are satisfied, the heap locations are considered the same.  In Java, the same heap location can only be accessed through the same field name, except for array fields.
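The compatibility check above can be sketched as a small predicate. This is an illustrative sketch, not STAR's actual implementation; the method and parameter names are assumptions. The idea: a write is conservatively treated as touching the same heap location as a precondition reference when the declared types are compatible and the field names match, with array accesses always treated as potential aliases since array elements are not distinguished by field name.

```java
// Minimal sketch of the heap-location compatibility check described above.
public class AliasCheck {
    static boolean mayBeSameHeapLocation(Class<?> typeA, String fieldA,
                                         Class<?> typeB, String fieldB,
                                         boolean isArrayAccess) {
        if (isArrayAccess) {
            return true; // array elements: any index may alias, be conservative
        }
        boolean compatibleTypes =
                typeA.isAssignableFrom(typeB) || typeB.isAssignableFrom(typeA);
        return compatibleTypes && fieldA.equals(fieldB);
    }
}
```

For example, two `size` accesses on compatible list types are treated as the same location, while same-named fields on unrelated types are not.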
    33. Heuristic Backtracking  Observation: Backtracking execution to the most recent branching point is likely inefficient, as the contradictions are usually introduced much earlier.  Optimization: STAR can efficiently backtrack to the most relevant branches where contradictions may still be avoided.
    34. Heuristic Backtracking (example) An executed path is not satisfiable according to the SMT solver. isDebugging() F T print(…) debugLog(…) buffer = new int[16] index >= buffer.length T F i=0 i = index buffer[i] = 0 AIOBE
    35. Heuristic Backtracking (example) Typical backtracking is not efficient. isDebugging() F T print(…) debugLog(…) buffer = new int[16] index >= buffer.length T F i=0 i = index buffer[i] = 0 AIOBE
    36. Heuristic Backtracking (example) STAR can quickly backtrack to the most relevant branches. isDebugging() F T print(…) debugLog(…) buffer = new int[16] index >= buffer.length T F i=0 i = index buffer[i] = 0 AIOBE
    37. Heuristic Backtracking  The unsatisfiable core of the last unsatisfiable path conditions:  A subset of the path conditions which is still unsatisfiable by itself  A branching point is considered relevant to the last unsatisfiable result and will be backtracked to only if:  A condition in the unsatisfiable core was added at this branch, or  A variable’s concrete value in the unsatisfiable core was decided at this branch, or  A variable’s actual heap location in the unsatisfiable core was decided at this branch.
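The backtracking-target selection can be sketched as follows. This is a hypothetical sketch (the class, method, and the depth-based bookkeeping are assumptions): assuming we record, for each path condition, the branching point at which it was added, we can jump straight back to the deepest branch that contributed a condition to the solver's unsatisfiable core, instead of merely the most recent branch.

```java
import java.util.Map;
import java.util.Set;

// Sketch: pick the deepest branch relevant to the unsatisfiable core.
public class HeuristicBacktrack {
    static int backtrackTarget(Map<String, Integer> conditionToBranchDepth,
                               Set<String> unsatCore) {
        int target = -1; // -1: no relevant branch remains, abandon the path
        for (String condition : unsatCore) {
            Integer depth = conditionToBranchDepth.get(condition);
            if (depth != null && depth > target) {
                target = depth; // deepest branch that introduced a core condition
            }
        }
        return target;
    }
}
```

Branches beyond the returned depth cannot resolve the contradiction, so skipping them prunes the exponential backtracking space.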
    38. Inner Contradiction Detection isDebugging() F T print(…) debugLog(…) buffer = new int[16] index >= buffer.length T F i=0 i = index buffer[i] = 0 AIOBE STAR quickly discovers inner contradictions in the current precondition during execution.
    39. Inner Contradiction Detection isDebugging() F T print(…) debugLog(…) buffer = new int[16] index >= buffer.length T F i=0 i = index buffer[i] = 0 AIOBE STAR quickly discovers inner contradictions in the current precondition during execution. Crash Precondition: index < 0 or index >= 16; index < 16
    40. Other Details  Loops and recursive calls  Options for the maximum loop unrolling and maximum recursive call depth  Call graph construction  User can specify a pointer analysis algorithm to use  Option for maximum call targets  String operations  Strings are treated as arrays of characters.  Complex string operations/regular expressions are not supported: they require the use of more specialized constraint solvers: Z3-str, HAMPI
    41. Input Model Generation
    42. Input Model Generation 1 stack trace Crash Precondition Computation Crash Preconditions 2 Input Model Generation program Crash Models test cases 3 Test Input Generation
    43. Input Model Generation  After computing the crash precondition, we need to compute a model (object state) which satisfies this precondition.  However, for one precondition, there can be many models that satisfy it. • E.g. For the precondition {ArrayList.size != 0}, there is an infinite number of models satisfying it.
    44. Generating Feasible Input Models  Object Creation Challenge [Xiao et. al, 2011]  Not every model satisfying a precondition is feasible to generate.  For the precondition ArrayList.size != 0, the input model ArrayList.size == -1 satisfies it, but such an object can never be generated.  Therefore, we want to obtain input models whose objects are actually feasible to generate.
    45. Generating Practical Input Models  For different input models, the difficulty of generating the corresponding objects can be very different. Model 1: ArrayList.size == 100, requires calling add() 100 times. Model 2: ArrayList.size == 1, requires calling add() 1 time.  Therefore, we also want to obtain input models whose values are as close to the initial values as possible.
    46. Class Information  STAR has an input model generation approach that can  Generate feasible models  Generate practical models  It extracts and uses class semantic information to guide the input model generation process:  The initial value of each class member field.  The potential value range of each numerical field: • e.g. ArrayList.size >= 0
    47. Input Model Generation Crash Precondition: ArrayList.size != 0. Class Information: Value Range: ArrayList.size >= 0; Initial Value: ArrayList.size starts from 0. SMT Solver: ArrayList.size == 1, a feasible and practical model
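The "feasible and practical" choice for a single integer field can be illustrated without a solver. This is an illustrative sketch, not STAR's actual implementation (STAR queries an SMT solver with the precondition, value range, and initial value; the class and parameter names here are assumptions): search outward from the field's initial value for the first value that satisfies both the class value range and the crash precondition, which yields a model that is feasible and as close to the initial state as possible.

```java
import java.util.OptionalInt;
import java.util.function.IntPredicate;

// Sketch: pick a feasible, practical model for one integer field
// (e.g. ArrayList.size) by scanning outward from its initial value.
public class ModelPicker {
    static OptionalInt pickModel(int initialValue, IntPredicate valueRange,
                                 IntPredicate precondition, int maxDistance) {
        for (int d = 0; d <= maxDistance; d++) {
            for (int candidate : new int[] { initialValue + d, initialValue - d }) {
                if (valueRange.test(candidate) && precondition.test(candidate)) {
                    return OptionalInt.of(candidate); // closest satisfying value
                }
            }
        }
        return OptionalInt.empty(); // no feasible model within the search bound
    }
}
```

For the slide's example, initial value 0, range size >= 0, and precondition size != 0 yield size == 1, ruling out the infeasible size == -1 and the impractical size == 100.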
    48. Test Input Generation
    49. Test Input Generation 1 stack trace Crash Precondition Computation Crash Preconditions 2 Input Model Generation program Crash Models test cases 3 Test Input Generation
    50. Test Input Generation  Given a crashing model, it is necessary to generate test inputs that can satisfy it.  However, it can be challenging to generate object test inputs [Xiao et. al, 2011]  Non-public fields are not assignable  Class invariants are easily broken if objects are generated using reflection.  We need a legitimate method sequence that can create and mutate an object to satisfy the target model (target object state).
    51. Test Input Generation  Randomized techniques  Randoop [Pacheco et. al, 2007]  Dynamic analysis  Palulu [Artzi et. al, 2009]  Palus [Zhang et. al, 2011]  Codebase mining  MSeqGen [Thummalapenta et. al, 2009]  Not efficient, as their input generation processes are not demand-driven, and they may rely on existing code bases.
    52. Test Input Generation  STAR proposes a novel demand-driven test input generation approach.
    53. Summary Extraction  Forward symbolic execution to obtain the summary of each method.
    54. Summary Extraction
    55. Summary Extraction We perform a forward symbolic execution on the target method. Method Entry obj != null T list[size] = obj e = new Exception() size += 1 Path Effect F throw e Path Condition Path 1 obj != null list[size] = obj size += 1 Path 2 Method Exit obj == null throw new Exception
    56. Method Sequence Deduction  STAR introduces a deductive-style approach to construct method sequences that can achieve the target object state
    57. Method Sequence Deduction Recursive deduction for parameters. Deductive Engine Input Parameter’s Object States satisfies Candidate Method Constraint Solver By taking this path, the target object state can be achieved.
    58. Example public class Container { public Container() public void add(Object); public void remove(Object); public void clear(); } Desired object state (input model): Container.size == 10
    59. Example – Summary Extraction Container() Path 1 TRUE TRUE remove all in list size = 0 size = 0 Path 1 Path 2 obj != null add(obj) obj == null list[size] = obj size += 1 remove(obj) Path 1 clear() throw an exception Path 1 Path 2 obj in list obj not in list remove from list size -= 1 No effect
    60. Example – Sequence Deduction Method Deduction Can add() produce target state? Container.size == 10 Select add(obj) Yes, this.size == 9 && obj != null Deductive Engine Can clear() produce target state? Container.size == 9 Select clear() No, not satisfiable Constraint Solver
    61. Example – Sequence Deduction Method Deduction Can add() produce target state? Container.size == 10 Select add(obj) Yes, this.size == 9 && obj != null Deductive Engine Can add() produce target state? Container.size == 9 Select add(obj) Yes, this.size == 8 && obj != null Constraint Solver … Can Container() produce target state? Container.size == 0 Select Container() Yes, no parameter requirement
    62. Example – Final Sequence  Combine in reverse direction to form the whole sequence:
        void sequence() {
            Container container = new Container();
            Object o1 = new Object();
            container.add(o1);
            … (10 times)
        }
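The deduced sequence can be made runnable under a hypothetical Container whose methods match the path summaries on the earlier slides. The internal `list` field, the `size()` accessor, and the exception type thrown by add(null) are assumptions; the slides specify only the path conditions and effects.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Container matching the slide's method summaries, plus the
// deduced sequence: one constructor call followed by ten add() calls
// reaches the target object state Container.size == 10.
public class Demo {
    static class Container {
        private final List<Object> list = new ArrayList<>();
        public void add(Object obj) {
            if (obj == null) throw new IllegalArgumentException(); // path 2: obj == null
            list.add(obj);                                         // path 1: size += 1
        }
        public void remove(Object obj) { list.remove(obj); }       // size -= 1 if present
        public void clear() { list.clear(); }                      // size = 0
        public int size() { return list.size(); }
    }

    static Container sequence() {
        Container container = new Container();
        for (int i = 0; i < 10; i++) {
            container.add(new Object()); // each call satisfies obj != null
        }
        return container;
    }
}
```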
    63. Other Details  The forward symbolic execution in method summary extraction follows similar settings as precondition computation  E.g. Loops and recursive calls are expanded only a limited number of times / to a limited depth. (So the extracted path summaries ≤ total method paths) • The incompleteness of the method path summaries does not affect the precision of the method sequence composition.  Generated method sequences are still correct.  Some method sequences may not be generated due to missing path summaries.  Optimizations have been applied to reduce the number of methods and method paths to examine.
    64. Evaluation
    65. Research Questions  Research Question 1: For how many crashes can STAR compute the crash-triggering preconditions?  Research Question 2: How many crashes can STAR reproduce based on the crash-triggering preconditions?  Research Question 3: How many crash reproductions by STAR are useful for revealing the actual cause of the crashes?
    66. Evaluation Setup  Subjects:  Apache Commons Collections (ACC): data container library that implements additional data structures over the JDK. 60 kLOC.  Ant (ANT): Java build tool that supports a number of built-in and extension tasks such as compiling, testing and running Java applications. 100 kLOC.  Log4j (LOG): logging package for printing log output to different local and remote destinations. 20 kLOC.
    67. Evaluation Setup  Crash Report Collection:  Collected from the issue tracking system of each subject.  Only confirmed and fixed crashes were collected.  Crashes with no or incorrect stack trace information were discarded.  Three major types of crashes: custom thrown exceptions, NPE and AIOBE (covering 80% of crashes, Nam et. al, 2009).
        Subject | # of Crashes | Versions       | Avg. Fix Time | Report Period
        ACC     | 12           | 2.0 – 4.0      | 42 days       | Oct. 03 – Jun. 12
        ANT     | 21           | 1.6.1 – 1.8.3  | 25 days       | Apr. 04 – Aug. 12
        LOG     | 19           | 1.0.0 – 1.2.16 | 77 days       | Jan. 01 – Oct. 09
        52 crashes were obtained from the three subjects.
    68. Evaluation Setup  Our evaluation study has the largest number of crashes compared to previous studies:
        Subject  | Number of Crashes
        RECRASH  | 11
        ESD      | 6
        BugRedux | 17
        RECORE   | 7
        STAR     | 52
    69. Research Question 1  For how many crashes can STAR compute the crash preconditions?  For how many crashes can STAR compute the crash precondition without the optimization approaches?  For how many with the optimization approaches?  We applied STAR to compute the preconditions for each crash.
    70. Research Question 1 Percentage of crashes whose preconditions were computed by STAR:
        Subject | Without Optimizations | With Optimizations
        ACC     | 66.7%                 | 75.0%
        ANT     | 14.3%                 | 71.4% (+57.1)
        LOG     | 36.8%                 | 73.7% (+36.9)
        Overall | 34.6%                 | 73.1% (+38.5)
    71. Research Question 1 Average time to compute the crash preconditions, in seconds (the lower the better):
        Subject | Without Optimizations | With Optimizations
        ACC     | 18.5                  | 2.1
        ANT     | 90.4                  | 4.9
        LOG     | 59.3                  | 2.4
        Overall | 55.1                  | 3.3
    72. Research Question 1 Percentage of crashes whose preconditions were computed by STAR, broken down by each optimization (No Optimization, Static Path Reduction, Heuristic Backtracking, Contradiction Detection, All Optimizations). [Bar chart; the per-optimization bars fall between the No Optimization results (ACC 66.7%, ANT 14.3%, LOG 36.8%, Overall 34.6%) and the All Optimizations results (ACC 75%, ANT 71.4%, LOG 73.7%, Overall 73.1%).]
    73. Research Question 1  STAR successfully computed crash preconditions for 38 (73.1%) of the 52 crashes.  STAR’s optimization approaches significantly improved the overall result, by 20 (38.5%) crashes.  Static path reduction is the most effective single optimization, but applying all three optimizations together achieves a much higher improvement.
    74. Research Question 2  How many crashes can STAR reproduce based on the crash preconditions?  Criterion of Reproduction [ReCrash, 2008]: A crash is considered reproduced if the generated test case can trigger the same type of exception at the same crash line.  We applied STAR to generate crash-reproducing test cases for each computed crash precondition.
    75. Research Question 2  Overall crash reproductions achieved by STAR for each subject:
        Subject | # of Crashes | # of Preconditions | # Reproduced | Ratio
        ACC     | 12           | 9                  | 8            | 66.7% (88.9%)
        ANT     | 21           | 15                 | 12           | 57.1% (80.0%)
        LOG     | 19           | 14                 | 11           | 57.9% (78.6%)
        Total   | 52           | 38                 | 31           | 59.6% (81.6%)
    76. Research Question 2  More statistics on the test case generation process of STAR:
        Subject | Average # of Objects | Avg. Candidate Methods | Min – Max Sequence | Average Sequence
        ACC     | 1.5                  | 35.5                   | 2 – 19             | 9.4
        ANT     | 1.4                  | 11.7                   | 2 – 14             | 6.2
        LOG     | 1.5                  | 21.8                   | 2 – 17             | 8.1
        Total   | 1.5                  | 21.4                   | 2 – 19             | 7.7
    77. Research Question 3  The Criterion of Reproduction does not require a crash reproduction to match the complete stack trace frames.  A partial match of only the top stack frames is still considered a valid reproduction of the target crash according to the criterion.  The root causes of more than 60% of crashes lie in the top three stack frames [Schroter et. al, 2010]  It is not necessary to reproduce the complete stack trace to reveal the root cause of a crash.
    78. Research Question 3  Drawbacks of the Criterion of Reproduction  The crash reproduction may not be the same crash.  The crash reproduction may not be useful for revealing the crash-triggering bug. Reproduced Buggy frame
    79. Research Question 3  How many crash reproductions by STAR are useful for revealing the actual causes of the crashes?  Criterion of useful crash reproduction: A crash reproduction is considered useful if it can trigger the same incorrect behavior at the buggy location, and eventually causes the crash to re-appear.  We manually examined the original and fixed versions of the program to identify the actual buggy location for each crash.
    80. Research Question 3  Overall useful crash reproductions achieved by STAR for each subject:
        Subject | # Reproduced | # Useful | Ratio (Total)
        ACC     | 8            | 7        | 87.5% (58.3%)
        ANT     | 12           | 7        | 58.3% (33.3%)
        LOG     | 11           | 8        | 72.7% (42.1%)
        Total   | 31           | 22       | 71.0% (42.3%)
    81. Comparison Study  We compared STAR with two different crash reproduction frameworks:  Randoop: feedback-directed test input generation framework. It is capable of generating thousands of test inputs that may reproduce the target crashes.  Maximum of 1000 seconds to generate test cases (10 times that of STAR).  We manually provided the crash-related class list to increase its probability of success.  BugRedux: a state-of-the-art crash reproduction framework. It can compute crash preconditions and generate crash-reproducing test cases.  We applied the two frameworks to the same set of crashes used in our evaluation.
    82. Comparison Study The number of crashes handled by the three approaches at each stage:
        Stage        | Randoop | BugRedux | STAR
        Precondition | 0       | 18       | 38
        Reproduction | 12      | 10       | 31
        Usefulness   | 8       | 7        | 22
    83. Comparison Study [Venn diagram of the reproduced crashes for STAR, Randoop and BugRedux; labels: 12 crashes, 5 crashes, 10 crashes]
    84. Comparison Study  STAR outperformed Randoop because:  Randoop uses a randomized search technique to generate method sequences: it can generate many method sequences, but they are not guided.  Due to the large search space of real-world programs, the probability of generating crash-reproducing sequences is low.  STAR outperformed BugRedux because:  Several effective optimizations improve the efficiency of the crash precondition computation process.  A method sequence composition approach can generate complex input objects satisfying the crash preconditions.
    85. Case Study  https://issues.apache.org/jira/browse/collections-411  An IndexOutOfBoundsException could be raised in method ListOrderedMap.putAll() due to an incorrect index increment.
        public void putAll(int index, Map map) {
            for (Map.Entry entry : map.entrySet()) {
                put(index, entry.getKey(), entry.getValue());
                ++index; // buggy increment
            }
        }
        This bug was soon fixed by the developers by adding checks to make sure index is incremented only in certain cases.
    86. Case Study  STAR was applied to generate a crash-reproducing test case for this crash:  Surprisingly, it successfully generated a test case that could crash both the original and the fixed (latest) version of the program.  We reported this potential issue discovered by STAR to the project developers:  https://issues.apache.org/jira/browse/collections-474  We also attached STAR’s auto-generated test case to our bug report.
    87. Case Study  Developers quickly confirmed:  The original patch for bug ACC-411 was actually incomplete. It missed a corner case that could still crash the program.  Neither the developers nor the original bug reporter identified this corner case in over a year.  It took the developers only a few hours to confirm and fix the bug after STAR’s test case demonstrated this corner case.  The crash-reproducing test case by STAR was added to the official test suite of the Apache Commons Collections project by the developers.  http://svn.apache.org/r1496168
    88. Case Study  STAR is capable of identifying and reproducing crashes that are difficult even for experienced developers.  STAR can be used to confirm the completeness of bug fixes.  If a bug fix is incomplete, STAR may generate a crash-reproducing test case to demonstrate the missing corner case.
    89. Challenges & Future Work
    90. Challenges  We manually examined each crash that was not reproduced to identify the major challenges of reproduction:  Environment dependency (36.7%)  File input.  Network input.  SMT Solver Limitations (23.3%)  Complex string constraints (e.g. regular expressions)  Non-linear arithmetic  Concurrency & Non-determinism (16.7%)  Some crashes are only reproducible non-deterministically or under concurrent execution.  Path Explosion (6.7%)
    91. Future Work  Improving reproducibility  Support for environment simulation, e.g. file inputs  Incorporate specialized SMT solvers: string solvers like Z3-str  Automatic fault localization  Existing fault localization approaches require both passing and failing test cases to locate faulty statements.  STAR’s ability to generate failing test cases can help automate the fault localization process.  Crash reproduction for mobile applications  Android applications are similar to desktop Java programs in many aspects.
    92. Conclusions  We proposed STAR, an automatic crash reproduction framework using stack traces.  It successfully reproduced 31 (59.6%) of 52 real-world crashes from three non-trivial programs.  The reproduced crashes can effectively help developers reveal the underlying crash-triggering bugs, or even identify unknown bugs.  A comparison study demonstrates that STAR significantly outperforms existing crash reproduction approaches.
    93. Thank You!
    94. Appendix
    95. Subject Sizes  Our evaluation study has one of the largest subject sizes compared to previous studies:
        Subject  | Subject Sizes (LOC) | Average Subject Size
        RECRASH  | 200 – 86,000        | 47,000
        ESD      | 100 – 100,000       | N/A
        BugRedux | 500 – 241,000       | 27,000
        RECORE   | 68 – 62,000         | 35,000
        STAR     | 20,000 – 100,000    | 60,000
    96. Research Question 1 Average time to compute the crash preconditions (the lower the better), broken down by each optimization (No Optimization, Static Path Reduction, Heuristic Backtracking, Contradiction Detection, All Optimizations). [Bar chart over ACC, ANT, LOG and Overall; the extremes match the earlier results: No Optimization 18.5 s / 90.4 s / 59.3 s / 55.1 s and All Optimizations 2.1 s / 4.9 s / 2.4 s / 3.3 s.]
    97. Comparison Study Average time to reproduce crashes (the lower the better), considering only the commonly reproduced crashes. [Bar chart comparing BugRedux and STAR on ACC, ANT, LOG and Overall; plotted values include 29.9, 10.8, 8.7, 4.6, 4.275, 3.75, 2.4 and 2.3 seconds.]
    98. User Survey
        Survey Sent | Responses | Confirmed Correctness | Confirmed Usefulness
        31          | 6 (19%)   | 5                     | 3
         ACC-53: “The auto-generated test case would reproduce the bug. . . I think that having such a test case would have been useful.”
    99. Comparison Study Branch coverage achieved by different test case generation approaches (Sample Execution, Randoop, Palulu, RecGen, Palus, STAR) on the subjects ACC, JSAP and SAT4J. [Bar chart; plotted coverage values range from 0% to 74%.]
