Puzzle-Based Automatic Testing: Bringing Humans Into the Loop by Solving Puzzles (ASE 2012)
1.
Puzzle-Based Automatic Testing:
Bringing Humans Into the Loop by
Solving Puzzles
Ning Chen and Sunghun Kim
The Hong Kong University of Science and Technology
ASE 2012, Essen
2.
Motivation
• Many automatic test generation techniques have been introduced:
• Randoop (C. Pacheco, ICSE 2007)
• Pex (N. Tillmann, TAP 2008)
• OCAT (H. Jaygarl, ISSTA 2010)
• However, their coverage results are still not satisfactory
when applied to complex object-oriented programs.
3.
Coverage by Randoop
Subject Branches Coverage
Commons Math 7707 61.6%
Commons Collections 5242 53.0%
4.
Coverage by Pex
Subject LOC Coverage
SvnBridge 17.1K 56.2%
xUnit 11.4K 15.5%
Math.Net 3.5K 62.8%
QuickGraph 8.3K 53.2%
Total 40.3K 49.8%
- by Xiao et al. (ICSE 2011)
5.
Major Challenges
• The Constraint Solving challenge:
Test generators fail to solve path conditions to cover
certain branches.
• The Object Creation/Mutation challenge:
Test generators cannot create and/or mutate test
inputs into desired object states.
6.
Challenge 1 : The Constraint Solving problem
void foo(int x, int y, int n) {
    int value = x << n;
    if (value < y && n > 2) {
        // not covered branch
        …
    }
}
What’s a model (solution) for the path condition:
(x << n) < y && n > 2
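Although an SMT solver without non-linear or bitvector support may reject this condition, a human can read off a model quickly. One valid model (a worked example of ours, not from the slides):

x = 0, n = 3, y = 1   // (0 << 3) == 0, 0 < 1, and 3 > 2

A call that exercises the branch:

foo(0, 1, 3);   // drives execution into the previously uncovered branch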
7.
Challenge 2: The Object Creation/Mutation Challenge
void foo(Container container) {
    if (container.size() >= 10) {
        // not covered branch
        …
    }
}

Given model: container.size() == 10

How do we create and mutate a Container object so that
size() == 10?
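Absent a direct setter, the state must be reached through public methods. A minimal sketch, assuming Container exposes an add method that grows size() by one (an assumption; the slide does not show Container's API):

Container container = new Container();
for (int i = 0; i < 10; i++) {
    container.add(new Object());   // each call increases size() by one
}
foo(container);   // container.size() == 10 now satisfies size() >= 10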
22.
Path Computation
• The final results of this phase:
• The models that will be used to generate the object mutation
puzzles.
• The constraints not solvable by the SMT solver, which will be
used to generate the constraint solving puzzles.
(A hand-worked illustration of the path computation follows.)
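Path computation is a backward symbolic execution that propagates path conditions from the target branch to public method entries. As a hand-worked illustration (our own example, not from the slides):

void bar(int a) {
    int b = a + 1;
    if (b * b > 100) { /* target branch */ }
}

// Propagating the branch condition backward through the assignment:
//   branch condition:       b * b > 100
//   substitute b = a + 1:   (a + 1) * (a + 1) > 100   <- path condition at bar's entry
// A model at entry is a = 10, since 11 * 11 = 121 > 100. If the solver
// rejects the non-linear condition, it becomes a constraint solving puzzle.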
24.
Extracting Sub-models
• A complete model can contain object states for different
inputs, so PAT divides it into sub-models.
Complete Model:
in == not null
in.readInt() == 1
this.currentState == null
Sub-model 1 (for input in):
in == not null
in.readInt() == 1
Sub-model 2 (for input this):
this.currentState == null
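A rough sketch of this split (our illustration, not PAT's implementation): group each constraint under the input object it refers to, i.e. its root variable.

import java.util.*;

class SubModelExtraction {
    // Split a complete model into sub-models, one per input object,
    // by grouping constraints on their root variable ("in", "this", ...).
    static Map<String, List<String>> split(List<String> model) {
        Map<String, List<String>> subModels = new LinkedHashMap<>();
        for (String constraint : model) {
            String root = constraint.split("[.\\s]")[0];  // "in.readInt() == 1" -> "in"
            subModels.computeIfAbsent(root, k -> new ArrayList<>()).add(constraint);
        }
        return subModels;
    }
}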
25.
Prioritizing Sub-models
• One sub-model may appear in many models.
Model 1:
in == not null
in.readInt() == 1
this.currentState == null
Model 2:
in == not null
in.readInt() == 0
this.currentState == null
Shared sub-model: this.currentState == null
• Sub-models are prioritized so that frequently occurring
sub-models are ranked higher (sketched below).
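A minimal sketch of the frequency-based ranking (our illustration, not the paper's code): count how many complete models each sub-model appears in, then sort in descending order of that count.

import java.util.*;

class SubModelRanking {
    // Rank sub-models so that ones shared by many models come first:
    // solving a frequently shared sub-model benefits many target branches.
    static List<Set<String>> rank(List<List<Set<String>>> allModels) {
        Map<Set<String>, Integer> frequency = new HashMap<>();
        for (List<Set<String>> subModels : allModels)
            for (Set<String> subModel : subModels)
                frequency.merge(subModel, 1, Integer::sum);
        List<Set<String>> ranked = new ArrayList<>(frequency.keySet());
        ranked.sort((a, b) -> frequency.get(b) - frequency.get(a));  // descending
        return ranked;
    }
}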
28.
Extracting Error-Related Constraints
• From up-front testing runs, we can obtain many branches
whose path conditions are not solvable by the SMT solver.
Path Conditions:
1. this == not null
2. this.sums == not null
3. this.sums.length * this.sums.length <= 4096
4. this.sums.length > 0
5. this.n > 1
Error: feature not supported: non-linear problem.
29.
Extracting Error-Related Constraints (cont.)
• From the path conditions above, PAT identifies and extracts the
constraints that cause the solver error:
Error-related constraints:
1. this.sums.length * this.sums.length <= 4096
2. this.sums.length > 0
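The slides do not spell out how the error-related subset is isolated; one plausible rule that reproduces this example (an assumption on our part, not PAT's documented algorithm) is to keep the constraint the solver choked on plus every constraint sharing a variable with it:

import java.util.*;
import java.util.regex.*;

class ErrorConstraintExtraction {
    // Keep the error-causing constraint and every constraint that shares
    // a variable with it; on the example above this selects constraints 3 and 4.
    static List<String> errorRelated(List<String> pathCondition, String errorCausing) {
        Set<String> vars = variablesOf(errorCausing);
        List<String> related = new ArrayList<>();
        for (String c : pathCondition)
            if (!Collections.disjoint(vars, variablesOf(c)))
                related.add(c);
        return related;
    }

    static Set<String> variablesOf(String constraint) {
        Set<String> vars = new HashSet<>();
        Matcher m = Pattern.compile("[A-Za-z_][A-Za-z0-9_.()]*").matcher(constraint);
        while (m.find()) vars.add(m.group());
        return vars;
    }
}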
30.
Grouping Constraint Sets
• Error-related constraint sets can be identical except for their
variable names, so PAT puts semantically identical constraint sets
into the same group.
[Figure: constraint sets clustered into Group 1, Group 2, and Group 3]
31.
Grouping Constraint Sets (cont.)
• One representative constraint set per group is presented as a
puzzle (Puzzles 1-3); its solution applies to every constraint set in
the group after renaming variables (a grouping sketch follows).
[Figure: the same grouping, with one puzzle generated per group]
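A rough sketch of grouping up to variable renaming (our illustration; the paper's implementation may differ): canonicalize variable names in order of first occurrence, then bucket constraint sets by their canonical form.

import java.util.*;
import java.util.regex.*;

class ConstraintGrouping {
    // Rewrite identifiers as v0, v1, ... in order of first occurrence, so
    // two sets differing only in variable names get the same canonical key.
    // (A real implementation would skip keywords/literals such as "null".)
    static String canonicalize(List<String> constraintSet) {
        Map<String, String> renaming = new HashMap<>();
        Pattern identifier = Pattern.compile("[A-Za-z_][A-Za-z0-9_.]*");
        StringBuilder key = new StringBuilder();
        for (String constraint : constraintSet) {
            Matcher m = identifier.matcher(constraint);
            StringBuffer sb = new StringBuffer();
            while (m.find()) {
                renaming.putIfAbsent(m.group(), "v" + renaming.size());
                m.appendReplacement(sb, renaming.get(m.group()));
            }
            m.appendTail(sb);
            key.append(sb).append(';');
        }
        return key.toString();
    }

    static Map<String, List<List<String>>> group(List<List<String>> constraintSets) {
        Map<String, List<List<String>>> groups = new HashMap<>();
        for (List<String> set : constraintSets)
            groups.computeIfAbsent(canonicalize(set), k -> new ArrayList<>()).add(set);
        return groups;
    }
}

Groups can then be ranked by size, matching the prioritization described above: larger groups are more likely to be presented as puzzles.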
34.
Test Case Generation from Solutions
• Solutions from Constraint Solving Puzzles:
• More models for not covered branches
• Solutions from Object Mutation Puzzles:
• Method call sequences to achieve the goal state
• Each solution can be translated directly into source code that
generates one test input.
• A test case can be generated once all of its test inputs are
generated (see the sketch below).
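As a concrete illustration, the call sequence from the earlier Container sketch would be emitted as a JUnit test like the following (a hypothetical rendering; Container, add, and Example.foo are assumed names, not taken from the paper):

import org.junit.Test;

public class GeneratedTest {
    @Test
    public void coversContainerBranch() {
        // Setup code: the player's recorded mutation sequence.
        Container container = new Container();
        for (int i = 0; i < 10; i++)
            container.add(new Object());
        // Invocation: the method whose branch was previously uncovered.
        Example.foo(container);
    }
}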
37.
Research Question 1
• How many PAT puzzles are solvable by humans?
• Participants:
• Eight computer science graduate students.
• Subject:
Subject Version Branches
Commons Math 2.1 7707
38.
Research Question 1
• Presenting Puzzles:
• The same top 100 object mutation puzzles.
• The same top 100 constraint solving puzzles.
• Repeated solutions are counted only once.
• Result:
Puzzle Total Presented Solved Avg. Time
Mutation 100 51 1 minute
Constraint 100 72 1 minute
39.
Research Question 2
• How many people would play PAT voluntarily?
• Participants:
• We posted links to the PAT puzzles on Twitter and encouraged
people to participate.
• Subject:
Subject Version Branches
Commons Collections 3.2.1 5242
40.
Research Question 2
• Puzzles:
• The same top 100 object mutation puzzles.
• The same top 100 constraint solving puzzles.
• Repeated solutions are counted only once.
• Result:
• In total, 120 people volunteered to play the puzzles.
Puzzle Total Presented Solved Avg. Time
Mutation 100 24 1 minute
Constraint 100 84 1 minute
41.
Research Question 3
• How much is the test coverage improved by the puzzle
solutions of PAT?
• We executed the test cases generated from the puzzle solutions
of RQ1 and RQ2.
• We measured the number of additional branches covered by
PAT over the baseline techniques (Randoop + symbolic execution).
43.
Research Question 4
• How much manual test case writing effort can be saved
with the help of PAT?
44.
Research Question 4 (cont.)
• We randomly selected 20 branches from the two subjects and
manually wrote test cases to cover them.
• On average, constructing one test case to cover a branch took
about 8 to 9 minutes of manual effort.
45.
Conclusion
• A novel framework to support software testing through
puzzle solving by Internet users.
• Two prototype puzzles have been introduced:
• Constraint solving puzzles
• Object mutation puzzles
• More kinds of puzzles could be developed in the future.
• Evaluations show that PAT puzzles are solvable and can deliver
non-trivial branch coverage improvements on complex OO
programs.
Editor's Notes
Hello everyone, I am Ning Chen from HKUST. Today I will present the paper: Puzzle-Based Automatic Testing: Bringing Humans Into the Loop by Solving Puzzles.
In recent years, many automatic test generation approaches have been introduced, such as Randoop, Pex, and OCAT. However, their test coverage is still not satisfactory when applied to complex object-oriented programs.
For example, we applied an automatic test generation approach, Randoop, to two complex object-oriented programs, Apache Commons Math and Apache Commons Collections. The branch coverage achieved by Randoop is only 61% and 53%, respectively. From the bottom figure, we can see that the coverage by Randoop had essentially saturated after 1000 seconds.
Also, in another recent study, researchers found that the branch coverage achieved by Pex is around 15-60% when applied to complex object-oriented programs.
Two major challenges have been identified as blocking higher coverage in these automatic test generation approaches. The constraint solving challenge: test generators fail to solve the path conditions needed to cover certain branches, due to SMT solver limitations. And the object creation and mutation challenge: test generators cannot create and mutate test inputs into the object states that would cover the target branches.
I will now present a simple motivating example to show these challenges in automatic test generation approaches. Given a foo method which takes three parameters, our goal is to cover the uncovered branch shown in green. The first challenge we face is: what is a model (a solution) for the path condition in bold? SMT solvers leveraged by test generation approaches may return an error if they do not support non-linear or bitwise operations.
In the second example, assume we are able to retrieve a model for the path condition of the uncovered branch: the Container object's size is equal to 10. The second challenge we face is: how can we create and mutate the Container object into a size of 10? Sometimes there are setter methods available for modifying the necessary object state, but in many cases there are no direct setters. In such cases, specific method call sequences must be constructed to create and mutate the object into the desired state. However, constructing these method sequences can be a non-trivial task.
After struggling to handle these challenges manually, we raised a question: can we leverage humans to help solve these challenges in the form of puzzles? In our motivating examples, especially the one on constraint solving, even though SMT solvers may fail to provide a result, it is in fact not difficult for humans to solve them.
That is why we propose PAT, a puzzle-based testing environment which incorporates help from humans to handle challenges like the ones just presented. Before going into the technical details of PAT, I will first present a brief overview of PAT puzzles.
The first kind of PAT puzzle we designed is the object mutation puzzle. Its purpose is to enlist humans to help us find out how to construct the method call sequences that can create and mutate an object into a target object state.
Initially, PAT presents a goal object state to the human players. The goal state shows the object state which we want generated objects to satisfy. Human players are asked to mutate a randomly selected object so that it satisfies this goal state.
Under the goal state, the current state is also displayed, representing the state of the object currently under mutation. An object mutation puzzle is considered solved when each condition of the current object exactly matches the corresponding condition in the goal state. Of course, players can always load another object to play with when they think the current object is unlikely to be mutated into the goal state.
The panel in the bottom half of the puzzle lists a set of available methods: the public methods of the object's class. Human players can select methods from the list to execute, and the execution results are immediately shown on screen. Furthermore, PAT can heuristically identify and recommend the methods in the list most likely to satisfy the goal state.
The second kind of PAT puzzle we designed is the constraint solving puzzle. Its purpose is to enlist humans to help us find models for constraints that the SMT solver could not solve.
The left panel of the puzzle shows the set of constraints currently being solved. Each line in the panel represents one constraint to be satisfied. A constraint is highlighted in green if it is satisfied under the current set of concrete values; otherwise it is shown in red.
The panel on the right shows the concrete values currently assigned to the variables in the constraints. Human players are expected to provide concrete values for the variables in this panel. The final goal of the constraint solving puzzle is to provide a set of concrete values that satisfies all of the constraints presented in the left panel.
So far I have presented an overview of the PAT puzzles. But how does PAT construct these puzzles from program code? Next, I will present PAT's detailed approach to creating puzzles from a program.
The architectural design of the PAT framework is as follows. PAT consists of five modules.
Initially, in the up-front testing runs, we run existing automatic test generation techniques on the target code. We obtain two categories of information in this phase. The first one is…
…a complete coverage report from these automatic techniques. Since state-of-the-art test generation approaches such as Randoop can already achieve 50-60% coverage, PAT focuses only on the remaining difficult-to-cover areas, so that human effort is not wasted. The second kind of artifact we collect is dynamic runtime information, such as (a) object instances and (b) method call sequences. We need this information in the later phases to generate the PAT puzzles.
After performing the up-front testing runs, PAT conducts path computation on the branches not covered in those runs. This is a backward symbolic execution process, during which PAT propagates path conditions from the target branch to public method entries. The purpose of this phase is to find models which satisfy these path conditions and thus cover the uncovered branches.
Two outputs are generated in this phase. First, PAT retrieves the models that must be satisfied in order to cover the uncovered branches; these models are used to construct the object mutation puzzles. Second, PAT saves the complex path conditions which the SMT solver could not solve during the computation; PAT uses these to construct the constraint solving puzzles.
After obtaining the models for covering branches, PAT uses them to construct the object mutation puzzles.
Complete models extracted from the path computation process are not used directly to form puzzles. The reason is that a complete model can contain several object states related to different inputs. For example, the complete model in the upper figure consists of two object states related to different test inputs. Therefore, PAT first divides models into sub-models instead of using complete models to construct puzzles.
After extracting sub-models, PAT prioritizes them. A sub-model can be shared by many models; for example, the two models on this page share a common sub-model, so solving that common sub-model may benefit both models. Therefore, PAT prioritizes sub-models so that frequently occurring sub-models are always ranked higher.
Finally, each sub-model is presented as an object mutation puzzle, where human players interact with the interface to satisfy the presented sub-model.
The second kind of puzzle PAT creates is the constraint solving puzzle.
The first step in creating a constraint solving puzzle is to extract the error-related constraints. Given a path condition that caused the SMT solver to fail in the up-front testing runs, PAT identifies and extracts the constraints that cause the SMT error.
For instance, in this example, constraints 3 and 4 are the error-causing constraints and are extracted from the path condition.
After extracting the error-related constraint sets, PAT groups them into categories. The underlying reason is that many constraint sets are identical except for their variable names; to avoid solving the same constraint sets again and again, PAT puts semantically identical constraint sets into the same group.
Once the constraint sets are grouped, each group can be prioritized according to the number of constraint sets it contains. Groups with more constraint sets are ranked higher and are more likely to be chosen for presentation as a constraint solving puzzle.
Finally, one representative constraint set is selected from each group and presented to players as a constraint solving puzzle. The solution of this puzzle can be applied to all constraint sets in the same group by simply changing the variable names.
Finally, in the last phase, test cases are generated by analyzing the puzzle solutions from players.
Solutions from constraint solving puzzles are used as additional models for the uncovered branches. For solutions from object mutation puzzles, PAT translates the human actions into method call sequences to generate test inputs. Once all test inputs of a complete model are generated, a test case can be constructed by combining the corresponding method call sequences.
Next, I will present the evaluation results of PAT.
For the evaluation, we used two subject programs, Apache Commons Math and Apache Commons Collections. Both are complex object-oriented programs, with 7707 and 5242 branches respectively. For the up-front testing runs, we used a state-of-the-art random test generation framework, Randoop, plus a symbolic execution module. The bottom table shows that about 55-65% of the total branches are coverable by these baseline approaches, so PAT focuses only on the remaining roughly 40% of branches not coverable in the up-front testing runs.
In research question 1, we want to study how many PAT puzzles are solvable by humans. We generated both kinds of puzzles for the subject program Apache Commons Math and invited eight computer science graduate students to play the puzzles.
Each participant was presented with the top 100 object mutation puzzles and the top 100 constraint solving puzzles. We measured the total number of puzzles solved by the participants; repeated solutions were counted only once. In total, 51 of the 100 object mutation puzzles and 72 of the 100 constraint solving puzzles were solved. The average time a participant spent on a puzzle was around 1 minute, before either solving it or moving on to the next puzzle.
The next research question we investigated is: how many people would play PAT voluntarily? We generated both kinds of puzzles for the subject program Apache Commons Collections, posted the links to the puzzles on Twitter, and encouraged people to participate.
In total, 120 people volunteered to play the PAT puzzles. As in research question 1, we measured the total number of puzzles solved by the participants. In total, 24 of the 100 object mutation puzzles and 84 of the 100 constraint solving puzzles were solved by these 120 participants. The average time spent by a participant on a puzzle was also around 1 minute.
After collecting puzzle solutions from the first two research questions, we went on to study how much test coverage could be improved by leveraging those solutions. To do so, we generated test cases from the PAT puzzle solutions and measured the number of additional branches coverable by PAT over our baseline techniques.
In this figure, the highlighted areas are the additional branches covered by test cases generated from PAT puzzle solutions. In total, 534 and 308 additional branches were covered by these test cases on the two subjects. Considering the saturated coverage already achieved by the baseline test generation techniques, and the relatively small scale of the study, the additional improvement by PAT is non-trivial.
From the last research question, we saw that with only 200 PAT puzzles played by humans, more than 500 additional branches could be covered. In research question 4, we study how much manual test case writing effort could potentially be saved by PAT. We randomly selected 20 branches from the two subjects and manually constructed test cases to cover them.
On average, it took us about 8 to 9 minutes of manual effort to construct one test case covering a branch. So PAT has the potential to save hours or even days of developers' manual test-writing work.
In conclusion, in this paper we proposed a novel framework to support software testing through puzzle solving by Internet users. Two types of puzzles have been introduced in the PAT framework, and more kinds of puzzles may be introduced in the future. Evaluations show that PAT puzzles are solvable and can deliver non-trivial branch coverage improvements on complex object-oriented programs.