1. CARFAST: ACHIEVING HIGHER STATEMENT COVERAGE FASTER
Sangmin Park,
Ishtiaque Hussain,
Christoph Csallner,
Kunal Taneja,
B. M. Mainul Hossain,
Mark Grechanik,
Chen Fu, Qing Xie
2. CarFast Implementation Evaluation Conclusion
Motivation - Achieving High Coverage
Coverage
Degree to which a program has been tested
Measure of confidence
Widely used in industry
Avionics industry standard, DO-254 and DO-178B
Automotive industry standard, IEC 61508
Other organizations
3. Motivation - Achieving Coverage Fast
Current approaches
Timeout
Goal: Achieve high coverage faster
Achieving high coverage fast is difficult
Complex programs
Too many test inputs
(e.g., Renters Insurance Program with 78M customer profiles)
4. High level approach
Observation (study we performed)
80% of statements are covered by 20% of branches
(we call those branches "profitable")
Intuition
Covering profitable branches first leads to
high statement coverage quickly
High level approach
Use static analysis to find profitable branches
Select inputs that direct program execution towards
profitable branches
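The 80/20 observation can be checked on any list of per-branch statement counts. The sketch below is a minimal illustration, not the analysis used in the study: the class `ProfitableBranches`, the method `topShare`, and the counts in `main` are all hypothetical.

```java
import java.util.Arrays;

final class ProfitableBranches {
    /** Fraction of all statements held by the largest `fraction` of branches. */
    static double topShare(int[] stmtsPerBranch, double fraction) {
        int[] sorted = stmtsPerBranch.clone();
        Arrays.sort(sorted); // ascending, so the largest branches are at the end
        int k = (int) Math.ceil(sorted.length * fraction);
        long top = 0;
        long total = 0;
        for (int i = 0; i < sorted.length; i++) {
            total += sorted[i];
            if (i >= sorted.length - k) {
                top += sorted[i];
            }
        }
        return total == 0 ? 0.0 : (double) top / total;
    }

    public static void main(String[] args) {
        // Made-up counts: 2 of 10 branches hold 800 of 1000 statements.
        int[] counts = {400, 400, 25, 25, 25, 25, 25, 25, 25, 25};
        System.out.println(topShare(counts, 0.2));
    }
}
```

A share close to 0.8 for `fraction = 0.2` would match the observation on the slide; a flat distribution would give a share close to 0.2.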
5. CarFast – Illustrative Example
i1 = 20 and i2 = 20
void foo (int i1, int i2) {
  if (i1 == 10) {
    … // branch 1: 300 statements
  } else if (i2 == 50) {
    … // branch 2: 600 statements
  } else {
    … // branch 3: 100 statements
    if (i1 == 20) {
      if (i2 == 30) { … }
    }
  }
}
[Diagram: decision tree. i1==10? T: 300 stmts; F: i2==50? T: 600 stmts; F: 100 stmts]
6. CarFast – Illustrative Example
i1 = 20 and i2 = 20
void foo (int i1, int i2) {
  if (i1 == 10) {
    … // branch 1: 300 statements
  } else if (i2 == 50) {
    … // branch 2: 600 statements
  } else {
    … // branch 3: 100 statements
    if (i1 == 20) {
      if (i2 == 30) { … }
    }
  }
}
[Diagram: decision tree. i1==10? T: 300 stmts; F: i2==50? T: 600 stmts; F: 100 stmts]
DFS search: up to 10%
Branch 2: up to 70%
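The two percentages follow from the statement counts, under the assumption (not stated on the slide) that the program's statements are exactly the 300 + 600 + 100 in the three branches. The hypothetical `CoverageExample` below computes an upper bound on coverage from the set of outer branches the inputs reach.

```java
import java.util.HashSet;
import java.util.Set;

final class CoverageExample {
    // Statement counts of branches 1..3, taken from the slide.
    static final int[] STMTS = {300, 600, 100};

    /** Which outer branch of foo(i1, i2) an input takes. */
    static int branchTaken(int i1, int i2) {
        if (i1 == 10) return 1;
        if (i2 == 50) return 2;
        return 3;
    }

    /** Upper bound on statement coverage after running all given inputs. */
    static double coverage(int[][] inputs) {
        Set<Integer> covered = new HashSet<>();
        for (int[] in : inputs) {
            covered.add(branchTaken(in[0], in[1]));
        }
        int hit = 0;
        int total = 0;
        for (int s : STMTS) total += s;
        for (int b : covered) hit += STMTS[b - 1];
        return (double) hit / total;
    }

    public static void main(String[] args) {
        // A depth-first search stays inside branch 3 for its next input.
        System.out.println(coverage(new int[][] {{20, 20}, {20, 30}}));
        // Steering the next input to branch 2 adds its 600 statements.
        System.out.println(coverage(new int[][] {{20, 20}, {5, 50}}));
    }
}
```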
7. CarFast – Algorithm
i1 = 20 and i2 = 20
void foo (int i1, int i2) {
  if (i1 == 10) {
    … // branch 1: 300 statements
  } else if (i2 == 50) {
    … // branch 2: 600 statements
  } else {
    … // branch 3: 100 statements
    if (i1 == 20) {
      if (i2 == 30) { … }
    }
  }
}
[Diagram: decision tree. i1==10? T: 300 stmts; F: i2==50? T: 600 stmts; F: 100 stmts]
Step 1: Rank Branches
Step 2: Select Initial Input
Step 3: Select Next Input
8. CarFast – Algorithm
Step 1: Rank branches
• Counts (transitively) the number of statements each branch contains
• Resolves method calls
• Ranks branches by statements
void foo (int i1, int i2) {
  if (i1 == 10) {
    … // branch 1: 300 statements
  } else if (i2 == 50) {
    … // branch 2: 600 statements
  } else {
    … // branch 3: 100 statements
    if (i1 == 20) {
      if (i2 == 30) { … }
    }
  }
}
[Diagram: decision tree. i1==10? T: 300 stmts; F: i2==50? T: 600 stmts; F: 100 stmts]
Rank  Branch  # Stmt
1     2       600
2     1       300
3     3       100
4     …       …
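Step 1 can be sketched as a recursive count over a tree of branches followed by a sort. This is a minimal illustration under stated assumptions, not the tool's static analysis: the `Branch` model is hypothetical, a resolved method call would simply appear as another nested node, and the 98/1/1 split inside branch 3 is made up so that its transitive total matches the 100 statements on the slide.

```java
import java.util.ArrayList;
import java.util.List;

final class BranchRanking {
    /** Hypothetical model of a branch: its own statements plus nested branches. */
    static final class Branch {
        final String name;
        final int ownStmts;
        final List<Branch> nested = new ArrayList<>();

        Branch(String name, int ownStmts) {
            this.name = name;
            this.ownStmts = ownStmts;
        }

        Branch with(Branch child) {
            nested.add(child);
            return this;
        }

        /** Transitive count: own statements plus everything nested inside. */
        int totalStmts() {
            int n = ownStmts;
            for (Branch b : nested) n += b.totalStmts();
            return n;
        }
    }

    /** Branch names in decreasing order of transitive statement count. */
    static List<String> rank(List<Branch> branches) {
        List<Branch> sorted = new ArrayList<>(branches);
        sorted.sort((a, b) -> Integer.compare(b.totalStmts(), a.totalStmts()));
        List<String> names = new ArrayList<>();
        for (Branch b : sorted) names.add(b.name);
        return names;
    }

    /** The three outer branches of foo; the split inside branch 3 is assumed. */
    static List<Branch> example() {
        List<Branch> bs = new ArrayList<>();
        bs.add(new Branch("branch 1", 300));
        bs.add(new Branch("branch 2", 600));
        bs.add(new Branch("branch 3", 98)
                .with(new Branch("i1==20", 1).with(new Branch("i2==30", 1))));
        return bs;
    }

    public static void main(String[] args) {
        System.out.println(rank(example()));
    }
}
```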
9. CarFast – Algorithm
Step 2: Select a random input
• Selects a random input from the input database
void foo (int i1, int i2) {
  if (i1 == 10) {
    … // branch 1: 300 statements
  } else if (i2 == 50) {
    … // branch 2: 600 statements
  } else {
    … // branch 3: 100 statements
    if (i1 == 20) {
      if (i2 == 30) { … }
    }
  }
}
[Diagram: decision tree. i1==10? T: 300 stmts; F: i2==50? T: 600 stmts; F: 100 stmts]
Input 1: i1 = 20 and i2 = 20
Rank  Branch  # Stmt
1     2       600
2     1       300
3     3       100
4     …       …
Input database:
i1  i2
5   50
20  20
30  30
40  40
10. CarFast – Algorithm
Step 3: Select next input from trace
• Executes the program with the input to collect the path condition
• Modifies the path condition to cover higher ranked branches
• Queries the database with the modified condition
• Selects a random input if no input satisfies the condition
void foo (int i1, int i2) {
  if (i1 == 10) {
    … // branch 1: 300 statements
  } else if (i2 == 50) {
    … // branch 2: 600 statements
  } else {
    … // branch 3: 100 statements
    if (i1 == 20) {
      if (i2 == 30) { … }
    }
  }
}
[Diagram: decision tree. i1==10? T: 300 stmts; F: i2==50? T: 600 stmts; F: 100 stmts]
Input 1: i1 = 20 and i2 = 20
Rank  Branch  # Stmt
1     2       600
2     1       300
3     3       100
4     …       …
Input database:
i1  i2
5   50
20  20
30  30
40  40
11. CarFast – Algorithm
Step 3: Select next input from trace
• Executes the program with the input to collect the path condition
• Modifies the path condition to cover higher ranked branches
• Queries the database with the modified condition
• Selects a random input if no input satisfies the condition
void foo (int i1, int i2) {
  if (i1 == 10) {
    … // branch 1: 300 statements
  } else if (i2 == 50) {
    … // branch 2: 600 statements
  } else {
    … // branch 3: 100 statements
    if (i1 == 20) {
      if (i2 == 30) { … }
    }
  }
}
[Diagram: decision tree. i1==10? T: branch 1; F: i2==50? T: 600 stmts; F: 100 stmts]
Input 1: i1 = 20 and i2 = 20
C: (i1!=10)&&(i2!=50)&&(i1==20)&&(i2!=30)
Rank  Branch  # Stmt
1     2       600
2     1       300
3     3       100
4     …       …
Input database:
i1  i2
5   50
20  20
30  30
40  40
12. CarFast – Algorithm
Step 3: Select next input from trace
• Executes the program with the input to collect the path condition
• Modifies the path condition to cover higher ranked branches
• Queries the database with the modified condition
• Selects a random input if no input satisfies the condition
void foo (int i1, int i2) {
  if (i1 == 10) {
    … // branch 1: 300 statements
  } else if (i2 == 50) {
    … // branch 2: 600 statements
  } else {
    … // branch 3: 100 statements
    if (i1 == 20) {
      if (i2 == 30) { … }
    }
  }
}
[Diagram: decision tree. i1==10? T: branch 1; F: i2==50? T: 600 stmts; F: 100 stmts]
Input 1: i1 = 20 and i2 = 20
C: (i1!=10)&&(i2!=50)&&(i1==20)&&(i2!=30)
C’: (i1!=10)&&(i2==50)
Input 2: i1 = 5 and i2 = 50
Rank  Branch  # Stmt
1     2       600
2     1       300
3     3       100
4     …       …
Input database:
i1  i2
5   50
20  20
30  30
40  40
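The step from C to C’ and then to Input 2 can be replayed in a few lines. The sketch below is a simplified illustration, not the Dsc-based implementation: `trace` hard-codes the conditions of `foo` instead of using a concolic engine, `query` is a linear scan standing in for the database, and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class NextInputSketch {
    /** One conjunct of a path condition: input[var] == value, or its negation. */
    static final class Constraint {
        final int var;       // 0 for i1, 1 for i2
        final int value;
        final boolean equal; // true for ==, false for !=

        Constraint(int var, int value, boolean equal) {
            this.var = var;
            this.value = value;
            this.equal = equal;
        }

        boolean holds(int[] input) {
            return (input[var] == value) == equal;
        }

        Constraint negate() {
            return new Constraint(var, value, !equal);
        }

        @Override
        public String toString() {
            return "(i" + (var + 1) + (equal ? "==" : "!=") + value + ")";
        }
    }

    /** Path condition that foo(i1, i2) from the slide follows for this input. */
    static List<Constraint> trace(int i1, int i2) {
        List<Constraint> pc = new ArrayList<>();
        pc.add(new Constraint(0, 10, i1 == 10));
        if (i1 == 10) return pc;
        pc.add(new Constraint(1, 50, i2 == 50));
        if (i2 == 50) return pc;
        pc.add(new Constraint(0, 20, i1 == 20));
        if (i1 == 20) pc.add(new Constraint(1, 30, i2 == 30));
        return pc;
    }

    /** Keeps the conjuncts before position k and negates the k-th one. */
    static List<Constraint> flip(List<Constraint> pc, int k) {
        List<Constraint> out = new ArrayList<>(pc.subList(0, k));
        out.add(pc.get(k).negate());
        return out;
    }

    /** First database row satisfying every conjunct, or null if none does. */
    static int[] query(List<int[]> db, List<Constraint> pc) {
        for (int[] row : db) {
            boolean ok = true;
            for (Constraint c : pc) ok &= c.holds(row);
            if (ok) return row;
        }
        return null;
    }

    /** The four rows of the input database shown on the slide. */
    static List<int[]> exampleDb() {
        return Arrays.asList(new int[] {5, 50}, new int[] {20, 20},
                new int[] {30, 30}, new int[] {40, 40});
    }

    public static void main(String[] args) {
        List<Constraint> c = trace(20, 20);   // C for Input 1
        List<Constraint> cPrime = flip(c, 1); // target branch 2 (i2 == 50)
        System.out.println(c + " -> " + cPrime);
        System.out.println(Arrays.toString(query(exampleDb(), cPrime)));
    }
}
```

When `query` returns null, the caller would try the next-ranked branch and finally fall back to a random row, as the last bullet on the slide describes.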
13. Implementation
Scalability challenges in large applications: up to 1MLOC
Large constraints of size up to 5MB
Existing tools run out of memory
Execution Engine
Initial tool: Concolic execution engine (Dsc)
Solution: DSC-Dumper mode
Uses disk instead of memory
Removes memory overhead
Test Input Database
Initial tool: MSSQL server 2008
Solution: Constraint-based selector
Uses B+ tree based index
Provides API to process queries
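To illustrate why an ordered index helps a constraint-based selector, the sketch below keeps one sorted map per input column. It is only a stand-in: Java's `TreeMap` is a red-black tree, not the B+ tree the tool uses, and the class `InputIndexSketch` and its methods are hypothetical rather than the selector's actual API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

final class InputIndexSketch {
    private final List<int[]> rows = new ArrayList<>();
    // One ordered index per column: value -> ids of the rows holding that value.
    private final List<TreeMap<Integer, List<Integer>>> index = new ArrayList<>();

    InputIndexSketch(int columns) {
        for (int c = 0; c < columns; c++) index.add(new TreeMap<>());
    }

    /** Stores a row and registers it in every column index. */
    void insert(int... row) {
        int id = rows.size();
        rows.add(row.clone());
        for (int c = 0; c < row.length; c++) {
            index.get(c).computeIfAbsent(row[c], k -> new ArrayList<>()).add(id);
        }
    }

    /** Ids of rows where column == value, found without scanning all rows. */
    List<Integer> equalTo(int column, int value) {
        return index.get(column).getOrDefault(value, Collections.emptyList());
    }

    /** Ids of rows with lo <= column <= hi, via an ordered range scan. */
    List<Integer> between(int column, int lo, int hi) {
        List<Integer> out = new ArrayList<>();
        for (List<Integer> ids : index.get(column).subMap(lo, true, hi, true).values()) {
            out.addAll(ids);
        }
        return out;
    }

    public static void main(String[] args) {
        InputIndexSketch db = new InputIndexSketch(2);
        db.insert(5, 50);
        db.insert(20, 20);
        db.insert(30, 30);
        db.insert(40, 40);
        System.out.println(db.equalTo(1, 50)); // rows with i2 == 50
    }
}
```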
14. Experiment – Approaches
Random Testing
• Random selection of inputs
• Black-box approach
Adaptive Random Testing (ART)
• Random selection of evenly distributed inputs
• Black-box approach
DART
• Concolic execution approach
• Depth-first path exploration
• White-box approach
CarFast
• Our approach
• Static ranking based path exploration
• White-box approach
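"Evenly distributed" selection in ART is commonly done by drawing a few random candidates and keeping the one farthest from the inputs already executed. The sketch below shows that selection step for one common variant with Euclidean distance; it is an assumption that the experiment used this variant, and the class `ArtSketch` is hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class ArtSketch {
    /** Euclidean distance between two inputs of equal length. */
    static double distance(int[] a, int[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Returns the candidate whose nearest executed input is farthest away,
     * or null if there are no candidates.
     */
    static int[] pickNext(List<int[]> candidates, List<int[]> executed) {
        int[] best = null;
        double bestScore = -1;
        for (int[] c : candidates) {
            double minDist = Double.MAX_VALUE;
            for (int[] e : executed) {
                minDist = Math.min(minDist, distance(c, e));
            }
            if (minDist > bestScore) {
                bestScore = minDist;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<int[]> executed = Collections.singletonList(new int[] {20, 20});
        List<int[]> candidates = Arrays.asList(new int[] {21, 21}, new int[] {40, 40});
        System.out.println(Arrays.toString(pickNext(candidates, executed)));
    }
}
```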
15. Experiment – Subject Programs
Challenges in selecting programs
Programs with various sizes
Programs with complex properties
Programs without external dependencies
RugRat program generator [WODA 2012]
Stochastic-parse-tree based program generation approach
Highly configurable option parameters
Used in generating 12 programs from 1KLOC to 1MLOC
Test inputs
Each program has up to 20 integer inputs
Complete combination of inputs for 20 integers = 100^20
Pairwise combination of inputs for 20 integers = 1M
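The size of the complete input space shows why exhaustive selection is out of reach. A minimal sketch, assuming 100 possible values per input as the slide's figure implies; the class `InputSpace` is hypothetical.

```java
import java.math.BigInteger;

final class InputSpace {
    /** Number of complete combinations: valuesPerInput raised to the number of inputs. */
    static BigInteger completeCombinations(int valuesPerInput, int inputs) {
        return BigInteger.valueOf(valuesPerInput).pow(inputs);
    }

    public static void main(String[] args) {
        // 100^20 equals 10^40, far beyond what any test database can hold.
        System.out.println(completeCombinations(100, 20));
    }
}
```

The pairwise set of about 1M inputs reported on the slide is smaller by many orders of magnitude, which is what makes a stored input database practical.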
16. Experiment – Setup
Study Protocol
For statistical significance, ran each approach on each program 30 times
Total time = 4 approaches * 12 programs * 30 runs * 24 hours
= 34,560 hours
Baseline coverage = min(cov_i)
where i ∈ {Random, ART, DART, CarFast}
Measurement (to achieve baseline coverage)
Number of iterations (1 iteration = 1 selection)
Elapsed time
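The two formulas in the study protocol can be written out directly. In this sketch the class `ExperimentSetup` is hypothetical and the coverage values passed to `baseline` in `main` are made up for illustration; only the factors 4, 12, 30 and 24 come from the slide.

```java
final class ExperimentSetup {
    /** Total machine time: every approach on every program, repeated, each with a time limit. */
    static int totalHours(int approaches, int programs, int runs, int hoursPerRun) {
        return approaches * programs * runs * hoursPerRun;
    }

    /** Baseline coverage: the lowest coverage reached by any approach. */
    static double baseline(double... coverages) {
        if (coverages.length == 0) {
            throw new IllegalArgumentException("need at least one coverage value");
        }
        double min = coverages[0];
        for (double c : coverages) min = Math.min(min, c);
        return min;
    }

    public static void main(String[] args) {
        System.out.println(totalHours(4, 12, 30, 24));
        // Hypothetical coverages for Random, ART, DART, CarFast on one program.
        System.out.println(baseline(0.50, 0.45, 0.62, 0.70));
    }
}
```

Using the minimum as the target means every approach can reach it, so iterations and elapsed time are comparable across all four.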
17. Experiment – Results
Programs   Baseline Coverage  Approaches  Iterations (mean)  Elapsed Time (mean)
3 (1.2K)   45%                Random      17.1               522.2
                              ART         17.8               59.8
                              DART        693.5              1447.0
                              CarFast     5.9                571.0
5 (2.1K)   78%                Random      1023.2             3162.5
                              ART         1615.6             5157.7
                              CarFast     463.9              20040.9
7 (7.8K)   79%                Random      543.1              1736.8
                              ART         684.1              2217.6
                              CarFast     380.0              18829
• DART doesn't scale
* Complete results are in the paper.
18. Future Work
Bottleneck
Current: Identified modules causing bottlenecks
Future: Improve the runtime of CarFast
Fault-detection ability
Current: Does not measure fault-detection ability
Future: Investigate fault-detection ability
Other test coverage metrics
Current: Used static measure on statements
Future: Use static measure on branches
19. Contributions
CarFast
The first approach to select inputs for achieving
statement coverage fast
Implementation
The tool scales up to 1MLOC
Experiment
The study shows limitations in popular testing
techniques with statistical significance
Tool, subjects, experimental data are available
www.carfast.org
21. Related Work
Test-case prioritization
Test case prioritization: empirical studies
[Elbaum, 2002]
Dynamic symbolic execution
DART [Godefroid, 2005]
Hybrid concolic testing [Majumdar, 2007]
Heuristics for dynamic test generation [Burnim, 2008]
Search-based testing
Fitness-guided path exploration [Xie, 2009]
22. CarFast – Preliminary Study
Study
Performed on Apache programs
Investigated branches and statements
Observed power law in results –
20% of branches contain
80% of statements
Hypothesis
Assuming the observation holds,
we can steer execution to cover
those 20% of branches
Editor's Notes
Testing is an important part of the software-engineering process. Test coverage is a measure used in software testing.
Coverage is the degree to which a program has been tested. It is important because it provides a measure of how well the program is tested. Hence, achieving high coverage is essential. Widely….
Current approaches focused on achieving high coverage are slow and often run out of available resources. Achieving high coverage fast is difficult because …. Hence, the goal of our technique is to achieve high coverage faster.
To achieve the goal, we designed a high-level approach. The approach starts with an observation. From the observation, we got an intuition. Based on the intuition, we designed the high-level approach. First, the approach uses static analysis to find profitable branches. Then, the approach selects inputs using the analysis.
Let me explain the approach with an illustrative example. Here is a Java-like example program. The function foo takes two integers, i1 and i2. It has three outer branches, branches 1 to 3, and two inner branches under branch 3. If the input takes branch 1, it covers 300 statements. If the input takes branch 2 or 3, it covers 600 or 100 statements. Our goal is to select an input that takes branch 2 fast. However, before developing a technique, we had a question about program characteristics. Do real programs have branches like branch 2? That is, do real programs have branches containing many more statements than other branches? To answer the question, we performed a preliminary study.
The input takes branch 3. Clearly the input is not good w.r.t. increasing statement coverage. However, none of the existing approaches systematically steers execution to branch 2. One existing approach is DART. As the next input, it selects one inside branch 3, so the total coverage will remain at up to 10%. An ideal approach will try to get an input that covers branch 2 and
Finally, if the input takes branch 3, it can cover up to 100 statements. Assume that we have an input, i1 = 3 and i2 = 7. Then, this input will lead execution down the path to branch 3. This is clearly a bad input because it covers at most 10% of the statements.
The algorithm consists of three steps. Step 1 statically ranks branches before program execution. To do so, it counts the number of statements per branch. It works transitively: if there are branches or method calls inside a branch, it includes their statements. Then, Step 1 ranks branches in decreasing order of statement count. For our example code, the algorithm generates the ranking table. For example, branch 2 is ranked first because it contains 600 statements, the largest number of statements in the program.
Step 2 of the algorithm selects an input randomly from the test input database. Then, it executes the program with the input to collect the path condition. Here, a path condition is a conjunction of constraints. Let’s go back to the example. We assume that we have a test input database, just as the Renters Insurance Program has user profiles. The input database has two columns, i1 and i2. Step 2 selects a random input, i1 = 20 and i2 = 20.
Step 3 is the main part of the algorithm. Step 3 actually executes the program with inputs and selects the next input. First, it executes the program with the current input to collect the path condition. Here, a path condition is a conjunction of constraints. Then, it modifies the path condition to cover high-ranked branches. To do so, it uses the static ranking table. If Step 3 creates a new path condition, it queries the input database with the condition to find an input. This modification of path conditions continues in a loop. If no input satisfies the new path conditions, Step 3 randomly selects a new input from the database. Let’s go back to the example.
Step 3 executes the program with Input 1. Then, it covers four if-statements and collects a path condition, C. The first two constraints are from the two outer if-statements, corresponding to branches 1 and 2. The last two constraints are from the two inner if-statements, corresponding to the two branches inside branch 3.
Then, Step 3 modifies the path condition, C. It examines the ranking table from the top rank. Branch 2 corresponds to the if-statement checking i2 == 50, and the corresponding constraint is in C. Thus, Step 3 modifies C to create a new path condition, C’, to cover branch 2. Then, it queries the input database to find an input satisfying C’. It finds a new input, i1 = 5 and i2 = 50, and uses it as Input 2. In our example, Step 3 found an input on the first try. However, if there is no corresponding input in the database, Step 3 can search for a new input by investigating other branches and constraints. Then, if there is no such test in the database, Step 3 can finally select a random input from the database. Step 3 works in this way until it reaches a target coverage limit. In summary, CarFast guides the choice of the next input based on information from the current input. To evaluate CarFast, we implemented it for Java programs.
The goal of the implementation is to apply CarFast to large applications, up to 1MLOC programs. To do so, we implemented our technique in three different modules: the CarFast main module that executes the main algorithm, the execution engine that executes the program with a test input and generates an execution trace, and the input database. We found and addressed several challenges in the modules. The first challenge was in the execution engine. Initially, we used a Java concolic execution engine for our purpose. We used Dsc, which was developed by our co-authors. However, we observed several exceptions caused by memory-related problems. The problems occurred because the concolic execution engine could not hold the data for large programs. Thus, we developed a new mode, the Dsc-dumper mode. Instead of saving the execution engine states in memory, Dsc dumps its constraints to disk and passes them to CarFast. With this modification, we no longer observed memory exceptions.
In our experiment, we compared four approaches with each other. The first two approaches are random testing techniques. The first one is a pure random testing approach: it selects inputs at random. The second one is an adaptive random testing approach: it selects inputs randomly, but it computes the distance among inputs and selects evenly distributed inputs. The last two approaches are white-box approaches. They collect path constraints and explore program paths using the constraints. The third one is DART. It explores program paths in a depth-first-search manner. The final approach is CarFast. In contrast to DART, it uses a static-ranking-based approach to select inputs.
Finally, let me explain the subject programs. Selecting subject programs was challenging for several reasons. The programs should vary in size and should have complex logic. However, to run the execution engines, we needed programs without any external dependencies. Finding such programs was challenging. Thus, we created and used a program generator, RugRat, which was published in this year’s WODA. It uses a stochastic-parse-tree-based program generation approach to create random programs. It provides highly configurable options to express different program properties. We used RugRat to generate 12 programs from 1KLOC to 1MLOC. The programs take up to 20 integers as inputs. For input data, we used integer inputs with ranges -50 to 50. The complete combination of 20 integers in that range is 100 to the power of 20. Instead of using the complete input set, we used a pairwise combinatorial testing technique to obtain a reduced set of combinations.
Then, let me explain the experimental setting. Because all four approaches are randomized, for statistical significance we ran each approach on each program 30 times. This made it a large-scale experiment. The total time is the product of 4 approaches, 12 programs, 30 runs, and 24 hours (the time limit). So, it is about 34 thousand hours. We ran the experiments on the Amazon EC2 cloud. We used two measurement criteria to compare approaches: the number of iterations and the elapsed time to reach the target coverage. To determine the target coverage, we used the lowest coverage reached by any approach within the 24-hour time limit.
Random had less runtime than CarFast. However, CarFast performs a more sophisticated analysis to select fewer test cases that achieve the same amount of test coverage.
There are several classes of related work. The first is test case prioritization. That work is in the context of regression testing and requires prior knowledge. The second is DSE. Systematic. The final one is search-based testing. Not scalable.
In conclusion, there are several important contributions of our work. First, we presented a new technique, called CarFast, that has high potential to achieve high statement coverage faster. Second, we implemented the technique in a tool that scales to 1MLOC programs. Moreover, we made all the data used in the project publicly available at www.carfast.org. Finally, we performed the first large-scale experiment that shows limitations and advantages of popular testing techniques with statistical significance.
The study was performed on three popular Apache programs: log4j, ant, and jMeter. The study examines the relationship between branches and the statements they contain. Specifically, we counted the number of statements for each control-dependent branch. As a result, we observed a power law. That is, 20% of the branches contain 80% of the statements. Assuming this observation holds for other programs, we hypothesized that if we can select inputs that cover those 20% of branches first, we can get higher coverage faster. We developed such an algorithm, called CarFast.