STAR: Stack Trace based Automatic Crash Reproduction
1. 1
PhD Thesis Defence
Ning Chen
Advisor: Sunghun Kim
November 05, 2013
2. 2
Outline
1. Motivation & Related Work
2. Approaches of STAR
1) Crash Precondition Computation
2) Input Model Generation
3) Test Input Generation
3. Evaluation Study
4. Challenges & Future Work
5. Contributions
3. 3
Motivation
Failure reproduction is a difficult and time consuming task.
But it is necessary for fixing the corresponding bug.
For example: https://issues.apache.org/jira/browse/COLLECTIONS-70
The bug had not been fixed for five months because it was difficult
to reproduce.
After a test case was submitted, it was soon fixed with the comment:
“As always, a good test case makes all the difference.”
4. 4
Problem Statement
The intention of this research is to propose a stack trace based
automatic crash reproduction framework, which is efficient and
applicable to real world object-oriented programs.
Sub-problem 1:
Propose an efficient crash precondition computation approach which
is applicable to non-trivial real world programs.
Sub-problem 2:
Propose a novel method sequence composition approach which can
generate crash reproducible test cases for object-oriented programs.
5. 5
Contributions
Study the scalability challenge of automatic crash reproduction, and
propose approaches to improve its efficiency.
Study the object creation challenge for reproducing object-oriented
crashes, and propose a novel method sequence composition
approach to address it.
A novel framework, STAR, which combines the proposed approaches
to achieve automatic crash reproduction using only crash stack trace.
A detailed empirical evaluation to investigate the usefulness of STAR.
7. 7
Related Work
Record-and-replay approaches:
Jrapture, 2000,
BugNet, 2005,
ReCrash/ReCrashJ, 2008
LEAP/LEAN, 2010
Post-failure-process approaches:
Microsoft PSE, 2004
IBM SnuggleBug, 2009
XyLem, 2009
ESD, 2010
BugRedux, 2012
8. 8
Record-and-replay Approaches
Approach:
Monitoring Phase: Captures/Stores runtime heap & stack objects.
Test Generation Phase: Generates tests that loads the correct
objects with the crashed methods.
Original Program Execution
Store from heap & stack
Stored
Objects
Load as crashed method params
Recreated Test Case
10. 10
Post-failure-process Approaches
Perform analyses on crashes only after they have
occurred.
Advantages
Usually do not record runtime data.
Incur no or very little performance overhead.
11. 11
Post-failure-process Approaches
Crash Explanation Approaches
Microsoft PSE [Manevich et al., 2004]
IBM SnuggleBug [Chandra et al., 2009]
XyLem [Nanda et al., 2009]
Assist crash debugging by providing hints on the target
crashes:
Potential crash traces
Potential crash conditions
Limitation: these approaches cannot reproduce the target crashes.
12. 12
Post-failure-process Approaches
Crash Reproduction Approaches
Core dump-based Approaches
Cdd [Leitner et al., 2009]
RECORE [Roßler et al., 2013]
Symbolic execution-based approaches
ESD [Zamfir et al., 2009]
BugRedux [Jin et al., 2012]
Aim to reproduce crashes using only post-failure data
such as
Crash stack traces
Memory core dump at the time of the crash
13. 13
Crash Reproduction Approaches
Core dump-based approaches
E.g. Cdd [Leitner et al., 2009] and RECORE [Roßler et al., 2013]
Leverage the memory core dump and even some developer written
contracts to guide the crash reproduction process.
Advantage
Higher chance of reproducing a crash as more data is provided.
Limitations
Require not just the stack trace, but the entire memory core dump at
the time of the crash.
Less applicable in practice because memory core dumps are often unavailable.
14. 14
Crash Reproduction Approaches
Symbolic execution-based approaches
E.g. ESD [Zamfir et al., 2009] and
BugRedux [Jin et al., 2012]
Perform symbolic execution-based analysis to identify crash paths
and generate crash reproducible test cases.
15. 15
Crash Reproduction Approaches
Advantages:
Use only crash stack trace to achieve crash reproduction.
No runtime overhead is incurred at client-side.
Limitations:
Existing approaches rely on forward symbolic execution to
compute crash preconditions, which is less efficient.
They cannot be fully optimized due to the nature of forward symbolic
execution.
They cannot reproduce non-trivial crashes from object-oriented
programs due to the object-creation challenge.
16. 16
Crash Reproduction Approaches
STAR: Stack Trace based Automatic crash Reproduction
Approach | Limitation | Advantage of STAR
Record-replay | Data collection | No runtime data collection
Record-replay | Performance overhead | No performance overhead
Core dump based | Requires memory core dump and developer-written contracts | Requires only the crash stack trace
Symbolic exec.-based | Lack of optimizations | Optimizations to greatly improve the crash reproduction process
Symbolic exec.-based | Lack of support for object-oriented programs | Capable of reproducing non-trivial crashes for object-oriented programs
17. 17
Overview of STAR
[Figure: STAR pipeline. Inputs: the crash stack trace and the program. (1) Crash Precondition Computation produces crash preconditions; (2) Input Model Generation produces crash models; (3) Test Input Generation produces test cases.]
19. 19
Crash Precondition Computation
Crash Precondition Computation
20. Crash Precondition Computation
20
Crash Precondition Computation
Crash Precondition
The conditions on the inputs at a method entry that can trigger the crash.
It specifies the memory states in which the crash can be
reproduced.
21. Crash Precondition Computation
21
Crash Precondition Computation
Existing approaches such as ESD and BugRedux use forward
symbolic executions to compute the crash preconditions.
Program is executed in the same direction as normal executions.
Inputs and variables are represented as symbolic values instead of
concrete values.
Limitations of forward symbolic execution
Non-demand-driven: Needs to execute many paths not related to
the crash
Limited optimization: Difficult to perform optimizations using the
crash information
22. Crash Precondition Computation
22
Crash Precondition Computation
STAR performs a backward symbolic execution to compute the
crash precondition.
Program is executed from crash location to method entry.
Advantages of backward symbolic execution
Demand-driven: Only paths related to the crash are executed.
Optimizations: Optimizations can be performed using the crash
information.
23. Crash Precondition Computation
23
Backward Symbolic Execution
Given a program P, a crash location L and the crash condition
C at L, we execute P from L to a method entry with C as the
initial crash precondition.
The precondition is updated along the execution path
according to the executed statements.
E.g. int var3 = var1 + var2;
-> all occurrences of var3 are replaced by var1 + var2
E.g. if (var1 != null)
-> Coming from true branch: var1 != null is added to precondition
-> Coming from false branch: var1 == null is added to precondition
The preconditions at method entries are saved as the final crash
preconditions.
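The two update rules above can be sketched in a few lines of Java. This is a minimal illustration, not STAR's implementation: conditions are kept as plain strings, and the `Stmt` class and the `precondition` method are hypothetical names introduced only for this sketch. A real engine would work on a symbolic formula representation and pass it to an SMT solver.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy backward symbolic executor over string conditions (illustrative only). */
class BackwardExecSketch {

    /** Hypothetical statement model: an assignment "target = expr" or a branch condition. */
    static final class Stmt {
        final String target;  // non-null for assignments, null for branches
        final String expr;    // assigned expression, or the branch condition
        final boolean taken;  // for branches: true if the true branch was taken

        private Stmt(String target, String expr, boolean taken) {
            this.target = target;
            this.expr = expr;
            this.taken = taken;
        }

        static Stmt assign(String target, String expr) { return new Stmt(target, expr, false); }

        static Stmt branch(String condition, boolean taken) { return new Stmt(null, condition, taken); }
    }

    /** Walks the path backward from the crash location to the method entry. */
    static List<String> precondition(List<Stmt> pathInProgramOrder, List<String> crashCondition) {
        List<String> pre = new ArrayList<>(crashCondition);
        for (int k = pathInProgramOrder.size() - 1; k >= 0; k--) {
            Stmt s = pathInProgramOrder.get(k);
            if (s.target != null) {
                // Assignment: replace each whole-word occurrence of the target by its expression.
                Pattern word = Pattern.compile("\\b" + Pattern.quote(s.target) + "\\b");
                String replacement = Matcher.quoteReplacement("(" + s.expr + ")");
                pre.replaceAll(c -> word.matcher(c).replaceAll(replacement));
            } else {
                // Branch: add the condition, negated when coming from the false branch.
                pre.add(s.taken ? s.expr : "!(" + s.expr + ")");
            }
        }
        return pre;
    }

    public static void main(String[] args) {
        List<Stmt> path = Arrays.asList(
                Stmt.assign("i", "this.last"),
                Stmt.branch("i < buffer.length", true));
        System.out.println(precondition(path,
                Arrays.asList("buffer != null", "i < 0 || i >= buffer.length")));
    }
}
```

The `main` method replays a small path of the same shape as the buffer example used in this talk: the crash condition at `buffer[i] = 0` is pushed back over the branch and the assignment to `i`.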
24. 24
Crash Precondition Computation
Backward Symbolic Execution
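[Figure: backward symbolic execution of the example, from the crash location up to the method entry]
Statements in program order:
Method entry
int i = this.last;
if (i < buffer.length) (true branch taken)
buffer[i] = 0; (throws AIOBE)
Preconditions computed in backward order:
Initial: TRUE
At buffer[i] = 0: {buffer != null} {i < 0 or i >= buffer.length}
Before the if statement: {buffer != null} {i < 0 or i >= buffer.length} {i < buffer.length}
Before int i = this.last (method entry): {buffer != null} {last < 0 or last >= buffer.length} {last < buffer.length}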
25. 25
Crash Precondition Computation
Challenge – Path explosion
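[Figure: control-flow graph of the running example. A branch on isDebugging() leads to print(…) or debugLog(…); then buffer = new int[16]; a branch on index >= buffer.length leads to i = 0 or i = index; finally buffer[i] = 0 throws the AIOBE. Paths continue backward from there (…).]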
26. Crash Precondition Computation
Optimizations
STAR introduces three different approaches to improve
crash precondition computation process:
Static Path Reduction
Heuristic backtracking
Early detection of inner contradictions
26
27. Crash Precondition Computation
27
Static Path Reduction
Observation:
Only a subset of the conditional branches and method calls
contribute to the target crash.
E.g. Methods that perform runtime logging can be safely skipped
E.g. Branches which do not modify the crash related variables can
be safely skipped.
Optimization:
STAR detects and skips branches or method calls that do not
contribute to the target crash during symbolic execution.
28. 28
Crash Precondition Computation
Static Path Reduction
method isDebugging() does
not contribute to the crash
29. 29
Crash Precondition Computation
Static Path Reduction
the conditional branch does not
contribute to the crash as well.
30. 30
Crash Precondition Computation
Static Path Reduction
STAR can detect and skip
over methods and branches
not contributing to the crash
31. Crash Precondition Computation
31
Static Path Reduction
A conditional branch or a method call is contributive to the
crash if:
It can modify any stack location referenced in the current crash
precondition formula.
It can modify any heap location referenced in the current crash
precondition formula.
However, in backward execution, the actual heap
locations may not be decidable until they are explicitly
defined.
32. Crash Precondition Computation
32
Static Path Reduction
For any reference whose heap location cannot be decided:
Check whether the modified heap location and the reference have
compatible data types.
Check whether the modified heap location and the reference have
the same field name (except for arrays).
If both of the above criteria are satisfied, the heap locations are
considered the same.
In Java, the same heap location can only be accessed
through the same field name, except for array fields.
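A minimal sketch of this check is shown below. It is not STAR's code: the `HeapRef` class, the use of the declaring type as the "data type", and the convention that a `null` field name stands for an array element are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

/** Decides whether a branch or call may affect the current crash precondition. */
class PathReductionSketch {

    /** Hypothetical heap reference: declaring type plus field name (null = array element). */
    static final class HeapRef {
        final Class<?> type;
        final String field;

        HeapRef(Class<?> type, String field) {
            this.type = type;
            this.field = field;
        }
    }

    /** Conservative test used while the concrete heap location is still undecided. */
    static boolean mayBeSameLocation(HeapRef written, HeapRef referenced) {
        boolean compatibleTypes = written.type.isAssignableFrom(referenced.type)
                || referenced.type.isAssignableFrom(written.type);
        boolean sameField = Objects.equals(written.field, referenced.field);
        return compatibleTypes && sameField;
    }

    /** A statement is contributive if it may write a stack or heap location used in the precondition. */
    static boolean isContributive(Set<String> writtenVars, List<HeapRef> writtenHeap,
                                  Set<String> preVars, List<HeapRef> preHeap) {
        if (!Collections.disjoint(writtenVars, preVars)) {
            return true;
        }
        for (HeapRef w : writtenHeap) {
            for (HeapRef r : preHeap) {
                if (mayBeSameLocation(w, r)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> preVars = new HashSet<>(Arrays.asList("index", "buffer"));
        // A logging call that only writes a local "msg" can be skipped.
        System.out.println(isContributive(Collections.singleton("msg"),
                Collections.<HeapRef>emptyList(), preVars, Collections.<HeapRef>emptyList()));
    }
}
```

The check is deliberately conservative: it may keep a statement that does not actually touch the location, but under these assumptions it does not skip a statement that writes a field with a matching name on a compatible type.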
33. Crash Precondition Computation
33
Heuristic Backtracking
Observation:
Backtracking execution to the most recent branching point is likely
inefficient, as the contradictions are usually introduced much earlier.
Optimization:
STAR can efficiently backtrack to the most relevant branches where
contradictions may still be avoided.
34. 34
Crash Precondition Computation
Heuristic Backtracking
An executed path is not satisfiable
according to the SMT solver.
35. 35
Crash Precondition Computation
Heuristic Backtracking
Typical backtracking is not
efficient.
36. 36
Crash Precondition Computation
Heuristic Backtracking
STAR can quickly backtrack to
the most relevant branches
37. Crash Precondition Computation
37
Heuristic Backtracking
STAR uses the unsatisfiable core of the last unsatisfiable path
condition:
A subset of the path conditions that is still unsatisfiable by
itself.
A branching point is considered relevant to the last
unsatisfiable result, and will be backtracked to, only if:
A condition in the unsatisfiable core was added in this branch, or
A variable’s concrete value in the unsatisfiable core was decided in
this branch, or
A variable’s actual heap location in the unsatisfiable core was
decided in this branch.
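The relevance rule can be sketched as a search for the most recent branching point that shares something with the unsatisfiable core. In this hypothetical sketch, each branching point is summarised by the set of conditions and decided variables it introduced, and the unsatisfiable core is assumed to be already available from the SMT solver; the names `backtrackTarget` and `decidedAtBranch` are illustrative and not taken from STAR.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Picks the branching point to backtrack to, based on an unsatisfiable core. */
class BacktrackSketch {

    /**
     * decidedAtBranch.get(k) holds the conditions and decided variables introduced at the
     * k-th branching point, in the order the points were visited (most recent last).
     * Returns the index of the most recent relevant point, or -1 if none is relevant.
     */
    static int backtrackTarget(List<Set<String>> decidedAtBranch, Set<String> unsatCore) {
        for (int k = decidedAtBranch.size() - 1; k >= 0; k--) {
            if (!Collections.disjoint(decidedAtBranch.get(k), unsatCore)) {
                return k;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Set<String>> visited = Arrays.asList(
                Collections.singleton("index >= buffer.length"),
                Collections.singleton("isDebugging()"));
        Set<String> core = new HashSet<>(Arrays.asList("index >= buffer.length", "index < 0"));
        // The isDebugging() branch is skipped because it added nothing to the core.
        System.out.println(backtrackTarget(visited, core));
    }
}
```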
38. 38
Crash Precondition Computation
Inner Contradiction Detection
STAR quickly discovers inner contradictions in the current
precondition during execution.
39. 39
Crash Precondition Computation
Inner Contradiction Detection
STAR quickly discovers inner contradictions in the current
precondition during execution.
Crash Precondition:
index < 0 or index >= 16
index < 16
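The idea of stopping a path as soon as its precondition contradicts itself can be illustrated with a very small bound tracker. This is a simplified stand-in under stated assumptions: it handles only conjunctions of integer lower and upper bounds, it does not model the disjunction shown on the slide, and the class and method names are hypothetical. The slides do not say how STAR implements the check.

```java
import java.util.HashMap;
import java.util.Map;

/** Tracks integer bounds per variable and reports a contradiction as soon as one appears. */
class InnerContradictionSketch {

    // variable name -> {inclusive lower bound, inclusive upper bound}
    private final Map<String, long[]> bounds = new HashMap<>();

    private long[] boundsOf(String var) {
        return bounds.computeIfAbsent(var, v -> new long[] {Long.MIN_VALUE, Long.MAX_VALUE});
    }

    /** Adds "var >= lower"; returns false if the precondition has become contradictory. */
    boolean addLowerBound(String var, long lower) {
        long[] b = boundsOf(var);
        b[0] = Math.max(b[0], lower);
        return b[0] <= b[1];
    }

    /** Adds "var <= upper"; returns false if the precondition has become contradictory. */
    boolean addUpperBound(String var, long upper) {
        long[] b = boundsOf(var);
        b[1] = Math.min(b[1], upper);
        return b[0] <= b[1];
    }

    public static void main(String[] args) {
        InnerContradictionSketch pre = new InnerContradictionSketch();
        System.out.println(pre.addLowerBound("index", 16)); // index >= 16
        System.out.println(pre.addUpperBound("index", 15)); // index < 16 contradicts it
    }
}
```

When a call returns `false`, the current backward path can be abandoned immediately instead of being executed up to the method entry.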
40. Crash Precondition Computation
40
Other Details
Loops and recursive calls
Options for the maximum loop unrolling and maximum recursive
call depth
Call graph construction
User can specify a pointer analysis algorithm to use
Option for maximum call targets
String operations
Strings are treated as arrays of characters.
Complex string operations/regular expressions are not supported:
they require more specialized constraint solvers such as Z3-str or
HAMPI
42. 42
Input Model Generation
Input Model Generation
43. Input Model Generation
43
Input Model Generation
After computing the crash precondition, we need to
compute a model (object state) which satisfies this
precondition.
However, for one precondition, there could be many
models that can satisfy it.
• E.g. For the precondition {ArrayList.size != 0}, there could be an infinite
number of models satisfying it.
44. Input Model Generation
44
Generating Feasible Input Models
Object Creation Challenge [Xiao et. al, 2011]
Not every model satisfying a precondition is feasible to be
generated.
For the precondition ArrayList.size != 0, the input model ArrayList.size
== -1 can satisfy it, but such an object can never be generated.
Therefore, we want to obtain input models whose objects
are actually feasible to generate.
45. 45
Input Model Generation
Generating Practical Input Models
For different input models, the difficulties in generating the
corresponding objects can be very different.
Model 1: ArrayList.size == 100 (requires add() 100 times)
Model 2: ArrayList.size == 1 (requires add() 1 time)
Therefore, we also want to obtain input models whose
values are as close to the initial values as possible.
46. Input Model Generation
46
Class Information
STAR has an input model generation approach that can
Generate feasible models
Generate practical models
It extracts and uses class semantic information to
guide the input model generation process:
The initial value of each class member field.
The potential value range of each numerical field:
• e.g. ArrayList.size >= 0
47. 47
Input Model Generation
Input Model Generation
[Figure: input model generation. The crash precondition (ArrayList.size != 0) and the class information (value range: ArrayList.size >= 0; initial value: ArrayList.size starts from 0) are passed to the SMT solver, which returns ArrayList.size == 1, a feasible and practical model.]
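The selection of a feasible and practical model can be sketched without a solver for a single integer field. In this hypothetical sketch, the precondition and the value range are plain predicates, and the search simply widens outward from the initial value; STAR itself delegates this to an SMT solver, so the `closestModel` method below is only an illustration of the goal, not of the mechanism.

```java
import java.util.OptionalInt;
import java.util.function.IntPredicate;

/** Finds the value closest to a field's initial value that satisfies precondition and range. */
class InputModelSketch {

    static OptionalInt closestModel(int initial, IntPredicate precondition,
                                    IntPredicate valueRange, int maxDistance) {
        for (int d = 0; d <= maxDistance; d++) {
            // Try both directions at the same distance from the initial value.
            for (int candidate : new int[] {initial + d, initial - d}) {
                if (precondition.test(candidate) && valueRange.test(candidate)) {
                    return OptionalInt.of(candidate);
                }
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        // Precondition: size != 0; class information: size >= 0, initial value 0.
        OptionalInt model = closestModel(0, size -> size != 0, size -> size >= 0, 100);
        System.out.println(model);
    }
}
```

With the precondition `size != 0`, the range `size >= 0` and the initial value 0, the search returns 1, which matches the model shown for the ArrayList example.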
49. 49
Test Input Generation
Test Input Generation
50. Test Input Generation
50
Test Input Generation
Given a crashing model, it is necessary to generate test
inputs that can satisfy it.
However, it can be challenging to generate object test
inputs [Xiao et al., 2011]
Non-public fields are not assignable.
Class invariants are easily broken if objects are generated using reflection.
What is needed is a legitimate method sequence that can create and
mutate an object to satisfy the target model (target object
state).
51. Test Input Generation
51
Test Input Generation
Randomized techniques
Randoop [Pacheco et al., 2007]
Dynamic analysis
Palulu [Artzi et al., 2009]
Palus [Zhang et al., 2011]
Codebase mining
MSeqGen [Thummalapenta et al., 2009]
These techniques are not efficient because their input generation processes are not
demand-driven, and they may rely on existing code bases.
52. Test Input Generation
Test Input Generation
STAR proposes a novel demand-driven test input
generation approach.
52
55. 55
Test Input Generation
Summary Extraction
We perform a forward symbolic execution on the target method.
[Figure: from the method entry, the target method branches on obj != null. True branch: list[size] = obj; size += 1. False branch: e = new Exception(); throw e. Both branches lead to the method exit.]
Path | Path Condition | Path Effect
Path 1 | obj != null | list[size] = obj; size += 1
Path 2 | obj == null | throw new Exception
56. Test Input Generation
Method Sequence Deduction
STAR introduced a deductive-style approach to construct
method sequences that can achieve the target object state
56
57. 57
Test Input Generation
Method Sequence Deduction
[Figure: the deductive engine selects a candidate method and asks the constraint solver whether one of its paths satisfies the target object state. If so, the target object state can be achieved by taking this path, and the input parameters' object states are deduced recursively.]
58. Test Input Generation
Example
public class Container {
public Container()
public void add(Object);
public void remove(Object);
public void clear();
}
Desired object state (Input model): Container.size == 10
58
59. 59
Test Input Generation
Example – Summary Extraction
Method | Path | Path Condition | Path Effect
Container() | Path 1 | TRUE | size = 0
add(obj) | Path 1 | obj != null | list[size] = obj; size += 1
add(obj) | Path 2 | obj == null | throw an exception
remove(obj) | Path 1 | obj in list | remove from list; size -= 1
remove(obj) | Path 2 | obj not in list | No effect
clear() | Path 1 | TRUE | remove all in list; size = 0
60. 60
Test Input Generation
Example – Sequence Deduction
Step 1: target state Container.size == 10. The deductive engine selects add(obj) and asks the constraint solver: can add() produce the target state? Yes, if this.size == 9 && obj != null.
Step 2: target state Container.size == 9. The deductive engine selects clear(): can clear() produce the target state? No, not satisfiable.
61. 61
Test Input Generation
Example – Sequence Deduction
Step 1: target state Container.size == 10. Select add(obj): can add() produce the target state? Yes, if this.size == 9 && obj != null.
Step 2: target state Container.size == 9. Select add(obj): can add() produce the target state? Yes, if this.size == 8 && obj != null.
…
Final step: target state Container.size == 0. Select Container(): can Container() produce the target state? Yes, no parameter requirement.
62. Test Input Generation
62
Example – Final Sequence
Combine in reverse direction to form the whole sequence
void sequence() {
Container container = new Container();
Object o1 = new Object();
container.add(o1);
// … (add() is called 10 times in total)
}
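The deduction for the Container example can be sketched as a backward search over method path summaries. Everything in this sketch is an assumption made for illustration: the object state is reduced to a single `size` value, each summary is a pair of hand-written lambdas rather than something extracted by symbolic execution, and the candidate order (constructor first) stands in for the constraint solver queries. The summaries mirror the Container(), clear() and add(obj) paths from the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.function.IntUnaryOperator;

/** Deduces a method sequence backward from a target Container.size value. */
class SequenceDeductionSketch {

    /** Hypothetical path summary: which target sizes it can produce, and from which pre-state. */
    static final class Summary {
        final String call;
        final boolean isConstructor;
        final IntPredicate canProduce;
        final IntUnaryOperator requiredPreSize;

        Summary(String call, boolean isConstructor,
                IntPredicate canProduce, IntUnaryOperator requiredPreSize) {
            this.call = call;
            this.isConstructor = isConstructor;
            this.canProduce = canProduce;
            this.requiredPreSize = requiredPreSize;
        }
    }

    static final List<Summary> CANDIDATES = Arrays.asList(
            new Summary("new Container()", true, t -> t == 0, t -> 0),  // effect: size = 0
            new Summary("clear()", false, t -> t == 0, t -> 0),         // effect: size = 0
            new Summary("add(obj)", false, t -> t >= 1, t -> t - 1));   // effect: size += 1

    /** Returns the sequence in execution order, or an empty list if none is found. */
    static List<String> deduce(int targetSize, int maxLength) {
        Deque<String> sequence = new ArrayDeque<>();
        int target = targetSize;
        while (sequence.size() < maxLength) {
            Summary chosen = null;
            for (Summary s : CANDIDATES) {
                if (s.canProduce.test(target)) {
                    chosen = s;
                    break;
                }
            }
            if (chosen == null) {
                return Collections.emptyList();
            }
            sequence.addFirst(chosen.call); // later calls were deduced first
            if (chosen.isConstructor) {
                return new ArrayList<>(sequence);
            }
            target = chosen.requiredPreSize.applyAsInt(target);
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        System.out.println(deduce(10, 20));
    }
}
```

For the target `size == 10`, the sketch selects `add(obj)` ten times, each time lowering the required size by one, and finishes with the constructor once the required size is 0. The calls are prepended as they are deduced, which corresponds to combining them in reverse direction.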
63. Test Input Generation
63
Other Details
The forward symbolic execution in method summary extraction
follows settings similar to those of precondition computation.
E.g. loops and recursive calls are expanded only a limited number of
times or to a limited depth. (So the extracted path summaries ≤ total method paths.)
The incompleteness of the method path summaries does not affect
the precision of the method sequence composition.
Generated method sequences are still correct.
Some method sequences may not be generated because of missing path summaries.
Optimizations have been applied to reduce the number of
methods and method paths to examine.
65. 65
Research Questions
Research Question 1
For how many crashes can STAR compute the crash-triggering
preconditions?
Research Question 2
How many crashes can STAR reproduce based on the crash
triggering preconditions?
Research Question 3
How many crash reproductions by STAR are useful for revealing the
actual cause of the crashes?
66. 66
Evaluation Setup
Subjects:
Apache Commons Collections (ACC):
data container library that implements additional data structures over
JDK. 60kLOC.
Ant (ANT)
Java build tool that supports a number of built-in and extension tasks such
as compile, test and run Java applications. 100kLOC.
Log4j (LOG)
logging package for printing log output to different local and remote
destinations. 20kLOC.
67. 67
Evaluation Setup
Crash Report Collection:
Collect from the issue tracking system of each subject.
Only confirmed and fixed crashes were collected.
Crashes with no or incorrect stack trace information were discarded.
Three major types of crashes: custom thrown exceptions, NPE and
AIOBE (these cover 80% of crashes [Nam et al., 2009]).
Subject | # of Crashes | Versions | Avg. Fix Time | Report Period
ACC | 12 | 2.0 – 4.0 | 42 days | Oct. 03 – Jun. 12
ANT | 21 | 1.6.1 – 1.8.3 | 25 days | Apr. 04 – Aug. 12
LOG | 19 | 1.0.0 – 1.2.16 | 77 days | Jan. 01 – Oct. 09
52 crashes were obtained from the three subjects.
68. 68
Evaluation Setup
Our evaluation study has the largest number of crashes
compared to previous studies
Approach | Number of Crashes
RECRASH | 11
ESD | 6
BugRedux | 17
RECORE | 7
STAR | 52
69. 69
Research Question 1
For how many crashes can STAR compute the crash
preconditions?
For how many crashes can STAR compute the crash precondition without
the optimization approaches?
For how many crashes can STAR compute the crash precondition with the
optimization approaches?
We applied STAR to compute the preconditions for each
crash.
70. 70
Research Question 1
Percentage of crashes whose preconditions were computed by STAR
Subject | Without Optimizations (%) | With Optimizations (%)
ACC | 66.7 | 75.0
ANT | 14.3 | 71.4
LOG | 36.8 | 73.7
Overall | 34.6 | 73.1
Improvements labeled in the chart: ANT +57.1, LOG +36.9, Overall +38.5.
71. 71
Research Question 1
[Figure: bar chart of the average time (seconds, lower is better) to compute the crash preconditions, without and with optimizations, for ACC, ANT, LOG and overall. With optimizations: ACC 2.1, ANT 4.9, LOG 2.4, Overall 3.3. Without optimizations, the bars range from 18.5 to 90.4.]
72. 72
Research Question 1
[Figure: bar chart of the percentage of crashes whose preconditions were computed by STAR, broken down by optimization (No Optimization, Static Path Reduction, Heuristic Backtracking, Contradiction Detection, All Optimizations) for ACC, ANT, LOG and overall.]
73. 73
Research Question 1
STAR successfully computed crash preconditions for 38
(73.1%) out of the 52 crashes.
STAR’s optimization approaches have significantly
improved the overall result by 20 (38.5%) crashes.
Static path reduction is the most effective optimization, but
the application of all three optimizations together can
achieve a much higher improvement.
74. 74
Research Question 2
How many crashes can STAR reproduce based on the
crash preconditions?
Criterion of Reproduction [ReCrash, 2008]
A crash is considered reproduced if the generated test case can
trigger the same type of exception at the same crash line.
We applied STAR to generate crash reproducible test
cases for each computed crash precondition.
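The reproduction criterion can be expressed as a small check on the exception thrown by a generated test case. This is a sketch under assumptions: the expected exception type and crash location would come from the top frame of the original stack trace, and the method name `reproduces` and the `demo.Buffer` frame used in `main` are hypothetical.

```java
/** Checks the criterion: same exception type, thrown at the same crash line. */
class ReproductionCriterionSketch {

    static boolean reproduces(Throwable thrown, String expectedType,
                              String expectedClass, String expectedMethod, int expectedLine) {
        StackTraceElement[] frames = thrown.getStackTrace();
        if (frames.length == 0) {
            return false;
        }
        StackTraceElement top = frames[0]; // the frame where the exception was raised
        return thrown.getClass().getName().equals(expectedType)
                && top.getClassName().equals(expectedClass)
                && top.getMethodName().equals(expectedMethod)
                && top.getLineNumber() == expectedLine;
    }

    public static void main(String[] args) {
        // Simulate an exception raised at demo.Buffer.clear, line 42 (hypothetical location).
        Throwable t = new ArrayIndexOutOfBoundsException();
        t.setStackTrace(new StackTraceElement[] {
                new StackTraceElement("demo.Buffer", "clear", "Buffer.java", 42)});
        System.out.println(reproduces(t, "java.lang.ArrayIndexOutOfBoundsException",
                "demo.Buffer", "clear", 42));
    }
}
```

Only the top frame is compared here, which is why a reproduction that satisfies this criterion may still differ from the original crash in its lower stack frames.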
75. 75
Research Question 2
Overall crash reproductions achieved by STAR for each
subject:
Subject | # of Crashes | # of Preconditions | # of Reproduced | Ratio of Crashes (Ratio of Preconditions)
ACC | 12 | 9 | 8 | 66.7% (88.9%)
ANT | 21 | 15 | 12 | 57.1% (80.0%)
LOG | 19 | 14 | 11 | 57.9% (78.6%)
Total | 52 | 38 | 31 | 59.6% (81.6%)
76. 76
Research Question 2
More statistics for the test case generation process by
STAR
Subject | Average # of Objects | Avg. Candidate Methods | Min – Max Sequence | Average Sequence
ACC | 1.5 | 35.5 | 2 - 19 | 9.4
ANT | 1.4 | 11.7 | 2 - 14 | 6.2
LOG | 1.5 | 21.8 | 2 - 17 | 8.1
Total | 1.5 | 21.4 | 2 - 19 | 7.7
77. 77
Research Question 3
Criterion of Reproduction does not require a crash
reproduction to match the complete stack trace frames.
A partial match of only the top stack frames is still considered as a
valid reproduction of the target crash according to the criterion.
The root causes of more than 60% of crashes lie in the
top three stack frames [Schroter et al., 2010]
It is not necessary to reproduce the complete stack trace to reveal
the root cause of a crash.
78. 78
Research Question 3
Drawbacks of Criterion of Reproduction
The crash reproduction may not be the same crash.
The crash reproduction may not be useful for revealing the crash
triggering bug.
[Figure: a stack trace with the reproduced frames and the buggy frame marked.]
79. 79
Research Question 3
How many crash reproductions by STAR are useful for
revealing the actual causes of the crashes?
Criterion of useful crash reproduction
A crash reproduction is considered useful if it can trigger the same
incorrect behaviors at the buggy location, and eventually causes the
crash to re-appear.
We manually examined the original and fixed versions of
the program to identify the actual buggy location for each
crash.
80. 80
Research Question 3
Overall useful crash reproductions achieved by STAR for
each subject:
Subject | # of Reproduced | # of Useful | Ratio of Reproduced (Ratio of Total Crashes)
ACC | 8 | 7 | 87.5% (58.3%)
ANT | 12 | 7 | 58.3% (33.3%)
LOG | 11 | 8 | 72.7% (42.1%)
Total | 31 | 22 | 71.0% (42.3%)
81. 81
Comparison Study
We compared STAR with two different crash reproduction
frameworks:
Randoop: feedback-directed test input generation framework. It is
capable of generating thousands of test inputs that may reproduce
the target crashes.
Randoop is given a maximum of 1000 seconds to generate test cases (10 times that of STAR).
The crash related class list is provided manually to increase its chance of success.
BugRedux: a state-of-the-art crash reproduction framework. It can
compute crash preconditions and generate crash reproducible test
cases.
We apply the two frameworks to the same set of crashes
used in our evaluation.
82. 82
Comparison Study
The number of crashes reproduced by the three approaches
[Figure: bar chart of the number of crashes with a computed precondition, a reproduction, and a useful reproduction for Randoop, BugRedux and STAR. STAR: 38 preconditions, 31 reproductions, 22 useful reproductions.]
84. 84
Comparison Study
STAR outperformed Randoop because:
Randoop uses a randomized search technique to generate method
sequences. It can generate many method sequences, but the search is not guided.
Due to the large search space of real world programs, the
probabilities to generate crash reproducible sequences are low.
STAR outperformed BugRedux because:
STAR applies several effective optimizations that improve the efficiency of the
crash precondition computation process.
STAR has a method sequence composition approach that can generate
complex input objects satisfying the crash preconditions.
85. 85
Case Study
https://issues.apache.org/jira/browse/collections-411
An IndexOutOfBoundsException could be raised in method
ListOrderedMap.putAll() due to incorrect index increment.
public void putAll(int index, Map<? extends K, ? extends V> map) {
    for (Map.Entry<? extends K, ? extends V> entry : map.entrySet()) {
        put(index, entry.getKey(), entry.getValue());
        ++index; // buggy increment
    }
}
This bug was soon fixed by the developers by adding checkers
to make sure index is incremented only in certain cases.
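One plausible way an unconditional increment can lead to this exception is sketched below. The `PutAllSketch` class and the behaviour of its `put()` method (re-inserting an existing key moves it without growing the map) are assumptions made for illustration; they are not taken from the slides or from the Apache Commons Collections sources.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, list-backed stand-in for an ordered map that tracks only key order. */
class PutAllSketch {

    private final List<String> order = new ArrayList<>();

    /** Assumed behaviour: an existing key is moved, so the map does not grow. */
    void put(int index, String key) {
        if (index < 0 || index > order.size()) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + order.size());
        }
        int pos = order.indexOf(key);
        if (pos >= 0) {
            order.remove(pos);
            if (pos < index) {
                index--; // the removal shifted the insertion point
            }
        }
        order.add(index, key);
    }

    /** Mirrors the loop on the slide: index is incremented after every put. */
    void putAll(int index, List<String> keys) {
        for (String key : keys) {
            put(index, key);
            ++index; // unconditional increment
        }
    }

    public static void main(String[] args) {
        PutAllSketch map = new PutAllSketch();
        map.put(0, "a");
        try {
            map.putAll(1, Arrays.asList("a", "b")); // "a" is already present
        } catch (IndexOutOfBoundsException e) {
            System.out.println("crash: " + e.getMessage());
        }
    }
}
```

In this sketch, re-inserting "a" leaves the size at 1 while the index advances to 2, so the next `put` is out of bounds. Incrementing the index only when the map has actually grown would avoid the exception in this sketch.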
86. 86
Case Study
STAR was applied to generate a crash reproducible test case
for this crash:
Surprisingly, it successfully generated a test case that could crash both the
original and fixed (latest) version of the program.
We reported this potential issue discovered by STAR to the
project developers
https://issues.apache.org/jira/browse/collections-474
We also attached the auto-generated test case by STAR in our bug
report.
87. 87
Case Study
Developers quickly confirmed:
The original patch for bug ACC-411 was actually incomplete. It
missed a corner case that can still crash the program.
Neither the developers nor the original bug reporter identified this
corner case in over a year.
It only took developers a few hours to confirm and fix the bug
after STAR’s test case demonstrated this corner case.
The crash reproducible test case by STAR was added to the
official test suite of the Apache Commons Collections project by
the developers.
http://svn.apache.org/r1496168
88. 88
Case Study
STAR is capable of identifying and reproducing crashes that
are even difficult for experienced developers.
STAR can be used to confirm the completeness of bug fixes.
If a bug fix is incomplete, STAR may generate a crash reproducible
test case to demonstrate the missing corner case.
90. 90
Challenges
We manually examined each crash that was not reproduced to
identify the major challenges of reproduction:
Environment dependency (36.7%)
File input.
Network input.
SMT Solver Limitation (23.3%)
Complex string constraints (e.g. regular expressions)
Non-linear arithmetic
Concurrency & Non-determinism (16.7%)
Some crashes are only reproducible non-deterministically or under
concurrent execution.
Path Explosion (6.7%)
91. 91
Future Work
Improving reproducibility
Support for environment simulation, e.g. file inputs
Incorporate specialized SMT solver: string solver like Z3-str
Automatic fault localization
Existing fault localization approaches require both passing and
failing test cases to locate faulty statements.
STAR’s ability to generate failing test cases can help automate the
fault localization process.
Crash reproduction for mobile applications
Android applications are similar to desktop Java programs in many
aspects.
92. 92
Conclusions
We proposed STAR, an automatic crash reproduction
framework using stack trace.
Successfully reproduced 31 (59.6%) out of 52 real world crashes
from three non-trivial programs.
The reproduced crashes can effectively help developers reveal the
underlying crash triggering bugs, or even identify unknown bugs.
A comparison study demonstrates that STAR can significantly
outperform existing crash reproduction approaches.
95. 95
Subject Sizes
Our evaluation study has one of the largest subject sizes
compared to previous studies
Approach | Subject Sizes (LOC) | Average Subject Size (LOC)
RECRASH | 200 – 86,000 | 47,000
ESD | 100 – 100,000 | N/A
BugRedux | 500 – 241,000 | 27,000
RECORE | 68 – 62,000 | 35,000
STAR | 20,000 – 100,000 | 60,000
96. 96
Research Question 1
[Figure: bar chart of the average time (seconds, lower is better) to compute the crash preconditions, broken down by optimization (No Optimization, Static Path Reduction, Heuristic Backtracking, Contradiction Detection, All Optimizations) for ACC, ANT, LOG and overall.]
97. 97
Comparison Study
[Figure: bar chart of the average time (seconds, lower is better) to reproduce crashes for BugRedux and STAR on ACC, ANT, LOG and overall, counting only the common reproductions.]
99. Comparison Study
[Figure: bar chart of the branch coverage (%) achieved by different test case generation approaches (Sample Execution, Randoop, Palulu, RecGen, Palus, STAR) on ACC, JSAP and SAT4J.]
Editor's Notes
Therefore, failure reproduction is a very important part of the software development process.
Techniques have been introduced to try to reproduce crashes, the current state-of…
However, they have several limitations:
No runtime data collection. No runtime performance overhead. Does not rely on memory dumps or developer-written contracts. Optimizations to greatly improve the crash reproduction process. Capable of reproducing non-trivial crashes from object-oriented programs.
STAR implements a backward symbolic execution engine that can compute the crash preconditions.
The number of potential backward paths grows exponentially with the number of branches.
Mostly minimal
The underlying challenge is that the constraint solver does not have any background knowledge about the subject program.
We found that,
Given an input model (object state), the approach can effectively construct a method sequence that generates objects satisfying this model.
If there are subsequent invocations, the effects of the invoked target methods will also be included.
STAR implements a deductive engine. The input model returned by the SMT solver indicates the required object states for the method inputs. A method path can produce the target object state if Φ_path ∧ Φ_target is satisfiable.
… In addition, because of our various efficiency improvements, we are able to apply STAR to much larger subjects compared to previous studies.
The improvement from ACC is comparatively smaller since it has relatively fewer paths than ANT and LOG.
Because STAR needs a precondition to reproduce a crash, the precondition column shows the upper bound of the number of crashes we can reproduce.
Our settings favor Randoop over STAR.
14 more crashes by STAR