1.
Model based Software Testing Test Assessment and Enhancement Aditya P. Mathur Purdue University Fall 2005 Last update: August 18, 2005
2.
Learning Objectives <ul><li>To understand the relevance and importance of test assessment. </li></ul><ul><li>To learn the fundamental principle underlying test assessment. </li></ul><ul><li>To learn various methods and tools for test assessment. </li></ul><ul><li>To understand the relative strengths/weaknesses of test assessment methods. </li></ul><ul><li>To learn how to improve tests based on a test assessment procedure. </li></ul>
3.
What is Test Assessment? <ul><li>Once a test set T, a collection of test inputs, has been developed, we ask: </li></ul><ul><li>How good is T? </li></ul><ul><li>It is the measurement of the goodness of T which is known as test assessment . </li></ul><ul><li>Test assessment is carried out based on one or more criteria . </li></ul>
4.
Test Assessment (contd.) <ul><li>These criteria are known as test adequacy criteria. </li></ul><ul><li>Test assessment is also known as test adequacy assessment . </li></ul>
5.
Test assessment (contd.) <ul><li>Test assessment provides the following information: </li></ul><ul><ul><li>A metric, also known as the adequacy score or coverage , usually between 0 and 1. </li></ul></ul><ul><ul><li>A list of all the weaknesses found in T, which when removed, will raise the score to 1. </li></ul></ul><ul><ul><li>The weaknesses depend on the criteria used for assessment. </li></ul></ul>
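As a rough illustration of the two outputs listed above, the sketch below computes an adequacy score as the fraction of covered elements and reports the uncovered elements as weaknesses. The names `adequacy_score` and `list_weaknesses`, and the representation of a coverage domain as an array of flags, are assumptions made for this example and do not come from the slides.

```c
#include <stddef.h>

/* covered[i] is nonzero if coverage element i was exercised by some test in T.
   The flag-array representation is an assumption made for this sketch. */
double adequacy_score(const int *covered, size_t n)
{
    size_t hit = 0;
    if (n == 0)
        return 1.0;              /* empty domain: nothing left to cover */
    for (size_t i = 0; i < n; i++)
        if (covered[i])
            hit++;
    return (double)hit / (double)n;
}

/* Writes the indices of uncovered elements (the weaknesses of T) into out,
   which must have room for n entries, and returns how many were written. */
size_t list_weaknesses(const int *covered, size_t n, size_t *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (!covered[i])
            out[k++] = i;
    return k;
}
```

Covering every element reported by `list_weaknesses` raises the score to 1, which matches the description of the weakness list above.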
6.
Test assessment (contd.) <ul><li>Once the coverage has been computed, and the weaknesses identified, one can improve T. </li></ul><ul><li>Improvement of T is done by examining one or more weaknesses and constructing new test requirements designed to overcome the weakness(es). </li></ul><ul><li>The new test requirements lead to new test specifications and to further testing of the program. </li></ul>
7.
Test Assessment (contd.) <ul><li>This is continued until all weaknesses are overcome, i.e. the adequacy criterion is satisfied (coverage=1) . </li></ul><ul><li>In some instances it may not be possible to satisfy the adequacy criteria for one or more of the following reasons: </li></ul><ul><ul><ul><li>Lack of sufficient manpower </li></ul></ul></ul><ul><ul><ul><li>Weaknesses that cannot be removed because they are infeasible. </li></ul></ul></ul>
8.
Test Assessment (contd.) <ul><ul><ul><li>The cost of removing the weaknesses is not justified. </li></ul></ul></ul><ul><li>While improving T by removing its weaknesses, one usually tests the program more thoroughly than it has been tested so far. </li></ul><ul><li>This additional testing is likely to result in the discovery of remaining errors . </li></ul>
9.
Test Assessment (contd.) <ul><li>Test assessment and improvement is applicable throughout the testing process and during all stages of software development. </li></ul><ul><li>Hence we say that test assessment and improvement helps in the improvement of software reliability . </li></ul>
10.
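Test Assessment Procedure <ul><li>0. Develop T. </li></ul><ul><li>1. Select an adequacy criterion C. </li></ul><ul><li>2. Measure adequacy of T w.r.t. C. </li></ul><ul><li>3. Is T adequate? If no, go to step 4; if yes, go to step 5. </li></ul><ul><li>4. Improve T, then return to step 2. </li></ul><ul><li>5. Is more testing warranted? If yes, return to step 1; if no, go to step 6. </li></ul><ul><li>6. Done. </li></ul>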
11.
Principle Underlying Test Assessment <ul><li>There is a uniform principle that underlies test assessment throughout the testing process. </li></ul><ul><li>This principle is referred to as the coverage principle . </li></ul><ul><li>It has come about as a result of intensive research at Purdue and other research groups in software testing . </li></ul>
12.
The Coverage Principle <ul><li>To formulate and understand the coverage principle, we need to understand: </li></ul><ul><ul><li>coverage domains </li></ul></ul><ul><ul><li>coverage elements </li></ul></ul><ul><li>A coverage domain is a finite domain , related to the program under test, that we want to cover . Coverage elements are the individual elements of this domain </li></ul>
13.
The Coverage Principle (contd.) [Figure: coverage domains and their coverage elements. Labels shown: requirements, classes, functions, interface mutations, exceptions.]
14.
The Coverage Principle (contd.) <ul><li>Measuring test adequacy and improving a test set against a sequence of well defined, increasingly strong, coverage domains leads to improved confidence in the reliability of the system under test. </li></ul>
15.
The Coverage Principle (contd.) <ul><li>Note the following properties of a coverage domain: </li></ul><ul><ul><li>It is related to the program under test. </li></ul></ul><ul><ul><li>It is finite . </li></ul></ul><ul><ul><li>It may come from program requirements, related to the inputs and outputs . </li></ul></ul>
16.
The Coverage Principle (contd.) <ul><ul><li>It may come from program code . Can you think of a coverage domain that comes from the program code? </li></ul></ul><ul><ul><li>It aids in measuring test adequacy as well as the progress made in testing. How ? </li></ul></ul>
17.
The Coverage Principle (contd.) <ul><li>Example: </li></ul><ul><ul><li>It is required to write a program that takes in the name of a person as a string and searches for the name in a file of names. The program must output the record ID which matches the given name. In case of no match a -1 is returned. </li></ul></ul>What coverage domains can be identified from this requirement?
18.
The Coverage Principle (contd.) <ul><li>As we learned earlier, improving coverage improves our confidence in the correct functioning of the program under test. </li></ul><ul><li>Given a program P and a test T suppose that T is adequate w.r.t. a coverage criterion C. </li></ul><ul><li>Does this mean that P is error free? </li></ul>Obviously……???
19.
Test Effort <ul><li>There are several measures of test effort . </li></ul><ul><li>One measure is the size of T. By this measure a test set with a larger number of test cases corresponds to higher effort than one with a lesser number of test cases. </li></ul>
20.
Error Detection Effectiveness <ul><li>Each coverage criterion has its error detection ability. This is also known as the error detection effectiveness or simply effectiveness of the criterion. </li></ul><ul><li>One measure of the effectiveness of criterion C is the fraction of faults guaranteed to be revealed by a test T that satisfies C. </li></ul>
21.
Effectiveness (contd.) <ul><li>Another measure is the probability that at least fraction f of the faults in P will be revealed by test T that satisfies C. </li></ul><ul><li>Unfortunately there is no absolute measure of the effectiveness of any given coverage criterion for a general class of programs and for arbitrary test sets. </li></ul>
22.
Effectiveness (contd.) <ul><li>One coverage criterion results in an exception to this rule: What is it? </li></ul><ul><li>Empirical studies conducted by researchers give us an idea of the relative goodness of various coverage criteria. </li></ul><ul><li>Thus, for a variety of criteria we can make a statement like: Criterion C1 is definitely better than criterion C2. </li></ul>
23.
Effectiveness-continued <ul><li>In some cases we may be able to say: Criterion C1 is probably better than criterion C2 . </li></ul><ul><li>Such information allows us to construct a hierarchy of coverage criteria. </li></ul><ul><li>This hierarchy is helpful in organizing and managing testing. How ? </li></ul>
24.
Strength of a coverage criterion <ul><li>The effectiveness of a coverage criterion is also referred to as its strength. </li></ul><ul><li>Strength is a measure of the criterion’s ability to reveal faults in a program. </li></ul><ul><li>Criterion C1 is considered stronger than criterion C2 if C1 is capable of revealing more faults than C2. </li></ul>
25.
The Saturation Effect <ul><li>The rate at which new faults are discovered reduces as test adequacy with respect to a finite coverage domain increases; it reduces to zero when the coverage domain has been exhausted. </li></ul>[Figure: horizontal axis labeled coverage, running from 0 to 1.]
26.
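Saturation Effect: Fault View [Figure: remaining faults (axis from 0 to N, with M marked) plotted against testing effort. The effort axis marks the start (s) and end (e) times of the functional (t_f), decision (t_d), dataflow (t_df), and mutation (t_m) testing phases.]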
27.
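Saturation Effect: Reliability View <ul><li>Functional, decision, dataflow, and mutation testing provide various test assessment criteria. </li></ul>[Figure: true reliability (R) and estimated reliability (R’) plotted against testing effort, with a saturation region for each technique. Each technique has a start time (s), an end time (e), a true reliability, and an estimated reliability: functional (t_fs, t_fe, R_f, R’_f), decision (t_ds, t_de, R_d, R’_d), dataflow (t_dfs, t_dfe, R_df, R’_df), and mutation (t_ms, t_me, R_m, R’_m).]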
28.
Coverage principle-discussion <ul><li>Discuss: </li></ul><ul><ul><li>How will you use the knowledge of coverage principle and the saturation effect in organizing and managing testing ? </li></ul></ul>Can you think of any other uses of the coverage principle and the saturation effect?
29.
Control flow graph <ul><li>The control flow graph (CFG) of a program is a representation of the flow of execution within the program. </li></ul><ul><li>More formally, a CFG G is: </li></ul><ul><ul><li>G=(N,A) </li></ul></ul><ul><ul><ul><li>where N: set of nodes and A: set of arcs </li></ul></ul></ul><ul><ul><li>There is a unique entry node en in N. </li></ul></ul><ul><ul><li>There is a unique exit node ex in N. A node represents a single statement or a block. </li></ul></ul><ul><ul><li>A block is a single-entry-single-exit sequence of instructions that are always executed in sequence without any diversion of path except at the end of the block. </li></ul></ul>
30.
Control flow graph (contd.) <ul><ul><li>Every statement in a block, except possibly the first one, has exactly one predecessor. </li></ul></ul><ul><ul><li>Similarly, every statement in the block, except possibly the last one, has exactly one successor. </li></ul></ul><ul><ul><li>An arc a in A is a pair (n,m) of nodes from N which represents transfer of control from node n to node m. </li></ul></ul><ul><ul><li>A path of length k in G is an ordered sequence of arcs a1, a2, ..., ak from A such that: </li></ul></ul>
31.
Control flow graph (contd.) <ul><ul><ul><li>The first node of a1 is en. </li></ul></ul></ul><ul><ul><ul><li>The last node of ak is ex. </li></ul></ul></ul><ul><ul><ul><li>For any two adjacent arcs ai = (n,m) and aj = (p,q), with j = i+1, m=p. </li></ul></ul></ul><ul><ul><li>A path is considered executable or feasible if there exists a test case which causes this path to be traversed during program execution; otherwise the path is unexecutable or infeasible. </li></ul></ul>
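The conditions in the path definition can be checked mechanically. The sketch below assumes arcs are stored as pairs of integer node identifiers; the type `arc` and the function `is_path` are names invented for this example. It checks only the structural conditions (start at en, end at ex, adjacent arcs connect) and does not check that each arc belongs to A or that the path is feasible.

```c
#include <stddef.h>

/* An arc (n, m): transfer of control from node n to node m. */
typedef struct { int from; int to; } arc;

/* Returns 1 if arcs[0..k-1] form a path from node en to node ex:
   the first arc leaves en, the last arc enters ex, and each arc starts
   at the node where the previous one ended. Returns 0 otherwise. */
int is_path(const arc *arcs, size_t k, int en, int ex)
{
    if (k == 0)
        return 0;
    if (arcs[0].from != en || arcs[k - 1].to != ex)
        return 0;
    for (size_t i = 0; i + 1 < k; i++)
        if (arcs[i].to != arcs[i + 1].from)
            return 0;
    return 1;
}
```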
32.
Control flow graph-example <ul><ul><ul><li>Exercise: </li></ul></ul></ul><ul><ul><ul><li>Draw a CFG for the following program and identify all paths: </li></ul></ul></ul>1. scanf (x,y); if (y<0) 2. pow=0-y; 3. else pow=y; 4. z=1.0; 5. while (pow !=0) 6. {z=z*x; pow=pow-1;} 7. if (y<0) 8. z=1.0/z; 9. printf(z); What does the above program compute?
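For readers who want to run the exercise program, here is one possible rendering in compilable C. The function name `power`, the parameter types (a `double` base and an `int` exponent), and the use of a return value in place of `scanf` and `printf` are assumptions made for this sketch. The comments give the node numbers used in the slide.

```c
/* Sketch of the exercise program as a function. The slide's variable pow
   is renamed to p to avoid clashing with pow() from <math.h>. */
double power(double x, int y)
{
    int p;
    double z;

    if (y < 0)                /* node 1 */
        p = 0 - y;            /* node 2 */
    else
        p = y;                /* node 3 */
    z = 1.0;                  /* node 4 */
    while (p != 0) {          /* node 5 */
        z = z * x;            /* node 6 */
        p = p - 1;
    }
    if (y < 0)                /* node 7 */
        z = 1.0 / z;          /* node 8 */
    return z;                 /* node 9: printf(z) in the slide */
}
```

A call such as `power(2.0, 3)` goes through the loop body three times and skips node 8, while `power(2.0, -2)` also executes nodes 2 and 8.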
33.
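Control-flow Graph [Figure: CFG of the exponentiation program, with entry node en, exit node ex, and the following nodes. 1: scanf (x,y); if (y<0) · 2: pow=0-y; · 3: else pow=y; · 4: z=1.0; · 5: while (pow !=0) · 6: {z=z*x; pow=pow-1;} · 7: if (y<0) · 8: z=1.0/z; · 9: printf(z);]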
34.
Structure-based Test Adequacy <ul><li>Based on the CFG of a program several test adequacy criteria can be defined. </li></ul><ul><li>Some are: </li></ul><ul><ul><ul><li>statement coverage criterion </li></ul></ul></ul><ul><ul><ul><li>branch coverage criterion </li></ul></ul></ul><ul><ul><ul><li>condition coverage criterion </li></ul></ul></ul><ul><ul><ul><li>path coverage criterion </li></ul></ul></ul>
35.
Statement Coverage <ul><li>The coverage domain consists of all statements in the program. Restated, in terms of the control flow graph, it is the set of all nodes in G. </li></ul><ul><li>A test T satisfies the statement coverage criterion if upon execution of P on each element of T, each statement of P has been executed at least once. </li></ul>
36.
Statement coverage (contd.) <ul><li>Restated in terms of G, T is adequate w.r.t. the statement coverage criterion if each node in N is on at least one of the paths traversed when P is executed on each element of T. </li></ul>
37.
Statement Coverage (contd.) <ul><li>Class exercise: </li></ul><ul><ul><li>For the program for which you have drawn the control flow graph, develop a test set that satisfies the statement coverage criterion. </li></ul></ul><ul><ul><li>Follow the procedure for test assessment and improvement suggested earlier. </li></ul></ul>
38.
Statement Coverage-Weakness <ul><li>Consider the following program: </li></ul><ul><ul><ul><li>int abs (x); </li></ul></ul></ul><ul><ul><ul><li>int x; </li></ul></ul></ul><ul><ul><ul><li>{ </li></ul></ul></ul><ul><ul><ul><ul><li>if (x>=0) x=0-x; </li></ul></ul></ul></ul><ul><ul><ul><ul><li>return x; </li></ul></ul></ul></ul><ul><ul><ul><li>} </li></ul></ul></ul>
39.
Statement coverage-weakness <ul><li>Suppose that T= {(x=0)}. </li></ul><ul><li>Clearly, T satisfies the statement coverage criterion. </li></ul><ul><li>But is the program correct? And is the error revealed by T, which is adequate w.r.t. the statement coverage criterion? </li></ul><ul><ul><li>What do you suggest we do to improve T? </li></ul></ul>
40.
Branch (or edge) coverage <ul><li>In G there may be nodes which correspond to conditions in P. Such nodes, also called condition nodes, contain branches in P. </li></ul><ul><li>Each such node is considered covered if during some execution of P, the condition evaluates to true and false ; these executions of P need not be the same. </li></ul>
41.
Branch coverage <ul><li>The coverage domain consists of all branches in G. Restated, in terms of the control flow graph, it is the set of all arcs exiting the condition nodes. </li></ul><ul><li>A test T satisfies the branch coverage criterion if upon execution of P on each element of T, each branch of P has been executed at least once. </li></ul>
42.
Branch coverage <ul><li>Class exercise: </li></ul><ul><ul><ul><li>Identify all condition nodes in the flow graph you have drawn earlier . </li></ul></ul></ul><ul><ul><ul><li>Does T= {(x=0)} satisfy the branch coverage criterion? </li></ul></ul></ul><ul><ul><ul><li>If not, then improve it so that it does . </li></ul></ul></ul>
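One way to explore this exercise is to instrument the abs program with coverage probes. The sketch below is only an illustration: the name `abs_under_test` (chosen because `abs` is already declared in `<stdlib.h>`), the global flags, and the helper `reveals_fault` are all assumptions, and the fault from the slide (`x>=0`) is kept on purpose.

```c
#include <stddef.h>

/* Coverage flags filled in by the instrumented function below. */
int stmt_assign_hit;   /* the statement x = 0 - x was executed */
int branch_true_hit;   /* condition x >= 0 evaluated to true   */
int branch_false_hit;  /* condition x >= 0 evaluated to false  */

/* The abs program from the slides, fault included, with probes added. */
int abs_under_test(int x)
{
    if (x >= 0) {
        branch_true_hit = 1;
        stmt_assign_hit = 1;
        x = 0 - x;
    } else {
        branch_false_hit = 1;
    }
    return x;
}

/* Runs every input in tests[] and returns 1 if some output differs from
   the true absolute value, i.e. if the test set reveals the fault. */
int reveals_fault(const int *tests, size_t n)
{
    int revealed = 0;
    stmt_assign_hit = branch_true_hit = branch_false_hit = 0;
    for (size_t i = 0; i < n; i++) {
        int expected = tests[i] < 0 ? -tests[i] : tests[i];
        if (abs_under_test(tests[i]) != expected)
            revealed = 1;
    }
    return revealed;
}
```

With the single input 0, every statement runs but the false branch is never taken and the output happens to be correct. Adding an input such as -1 covers the false branch and, in this instance, also produces a wrong result.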
43.
Branch Coverage-Weakness <ul><li>Consider the following program that is supposed to check if the input data item is in the range 0 to 100, inclusive: </li></ul><ul><ul><ul><li>int check (x); </li></ul></ul></ul><ul><ul><ul><li>int x; </li></ul></ul></ul><ul><ul><ul><li>{ </li></ul></ul></ul><ul><ul><ul><ul><li>if ((x>=0 )&& (x<= 200 )) </li></ul></ul></ul></ul><ul><ul><ul><ul><li>check=true; </li></ul></ul></ul></ul><ul><ul><ul><ul><li>else check=false; </li></ul></ul></ul></ul><ul><ul><ul><li>} </li></ul></ul></ul>
44.
Branch Coverage-Weakness <ul><li>Class exercise: </li></ul><ul><ul><ul><li>Do you notice the error in this program ? </li></ul></ul></ul><ul><ul><ul><li>Find a test set T which is adequate w.r.t. statement coverage and does not reveal the error. </li></ul></ul></ul><ul><ul><ul><li>Improve T so that it is adequate w.r.t. branch coverage and does not reveal the error . </li></ul></ul></ul><ul><ul><ul><li>What do you conclude about the weakness of the branch coverage criterion ? </li></ul></ul></ul>
45.
Condition Coverage <ul><li>Condition nodes in G might have compound conditions. </li></ul><ul><li>For example, in the check program the condition node contains the condition: </li></ul>((x>=0 ) && (x<= 200 )) <ul><li>This is a compound condition which consists of the elementary conditions x>=0 and x<= 200 . </li></ul>
46.
Condition coverage (contd.) <ul><li>A compound condition is considered covered if each of its constituent elementary conditions evaluates to both true and false during executions of P; the two outcomes need not occur in the same execution. </li></ul><ul><li>A test set T is adequate w.r.t. condition coverage if all conditions in P are covered when P is executed on elements of T. </li></ul>
47.
Condition coverage (contd.) <ul><li>Class exercise: </li></ul><ul><ul><ul><li>Improve T from the previous exercise so that it is adequate w.r.t. the condition coverage criterion for the check function and does not reveal the error . </li></ul></ul></ul><ul><ul><ul><li>Do you find the above possible ? </li></ul></ul></ul>
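The exercise can be tried out with an instrumented version of the check function. In the sketch below the names `check_under_test`, `check_expected`, and `condition_adequate`, the global flags, and the explicit handling of short-circuit evaluation are assumptions made for illustration; the fault from the slide (200 instead of 100) is kept.

```c
/* Flags recording which outcomes each elementary condition has taken. */
int c1_true, c1_false;   /* x >= 0   */
int c2_true, c2_false;   /* x <= 200 */

/* The check program from the slides with its fault, plus probes. */
int check_under_test(int x)
{
    int a = (x >= 0);
    if (a) c1_true = 1; else c1_false = 1;
    if (!a)
        return 0;                 /* short circuit: x <= 200 not evaluated */
    int b = (x <= 200);
    if (b) c2_true = 1; else c2_false = 1;
    return b;
}

/* Intended behaviour: is x in the range 0 to 100, inclusive? */
int check_expected(int x)
{
    return x >= 0 && x <= 100;
}

/* Returns 1 if every elementary condition has been both true and false. */
int condition_adequate(void)
{
    return c1_true && c1_false && c2_true && c2_false;
}
```

Running the inputs 50, -5, and 300 sets all four flags while every output agrees with `check_expected`. Only an input between 101 and 200 makes the two functions disagree.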
48.
Branch coverage-weakness (contd.) <ul><li>Consider the following program: </li></ul>0. int set_z( x,y); { 1. int x,y; 2. if (x!=0) 3. y=5; 4. else z=z-x; 5. if (z>1) 6. z=z/x; 7. else 8. z=y; } What might happen here?
49.
Branch Coverage-Weakness <ul><li>Class exercise: </li></ul><ul><ul><ul><li>Construct T for set_z such that (a) T is adequate w.r.t. the branch coverage criterion and (b) does not reveal the error . </li></ul></ul></ul><ul><ul><ul><li>What do you conclude about the effectiveness of the branch and condition coverage criteria ? </li></ul></ul></ul>
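A sketch for experimenting with this exercise follows. It makes several assumptions that are not in the slides: `z` is treated as a global variable (the slide never declares it), the function returns 1 when line 6 would divide by zero instead of actually dividing, and the array `br` records which branch outcomes have been taken.

```c
/* z is not declared in the slide; treating it as a global is an assumption. */
int z;

/* br[0]: x!=0 true, br[1]: x!=0 false, br[2]: z>1 true, br[3]: z>1 false */
int br[4];

/* set_z from the slides, with a guard added so the sketch can report the
   division by zero instead of crashing. Returns 1 if z = z / x would
   divide by zero, 0 otherwise. */
int set_z(int x, int y)
{
    if (x != 0) {
        br[0] = 1;
        y = 5;
    } else {
        br[1] = 1;
        z = z - x;
    }
    if (z > 1) {
        br[2] = 1;
        if (x == 0)
            return 1;          /* z = z / x would fail here */
        z = z / x;
    } else {
        br[3] = 1;
        z = y;
    }
    return 0;
}
```

Starting with `z = 5` and calling `set_z(1, 0)`, then with `z = 0` and calling `set_z(0, 0)`, takes all four branch outcomes without reaching the division by zero. The failure needs the combination of `x == 0` and `z > 1`, which these two tests never exercise together.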
50.
Path coverage <ul><li>As mentioned before, a path through a program is a sequence of statements such that the entry node of the program CFG is the first node on the path and the exit node is the last one on the path. </li></ul>Is this definition equivalent to the one given earlier ?
51.
Path coverage (contd.) <ul><li>A test set T is considered adequate w.r.t. the path coverage criterion if all paths in P are executed at least once upon execution of P on each element of T. </li></ul><ul><li>Class exercise: </li></ul><ul><ul><ul><li>Construct T for set_z such that T is adequate w.r.t. the path coverage criterion and does not reveal the error. </li></ul></ul></ul><ul><ul><ul><li>Is the above possible? </li></ul></ul></ul>
52.
Path Coverage-Weakness <ul><li>The number of paths in a program is usually very large. </li></ul><ul><li>How many paths in set_z ? </li></ul><ul><li>How many paths in check ? </li></ul><ul><li>How many in the program that computes x^y ? </li></ul>
53.
Path Coverage-Weaknesses <ul><li>It is the infinite or prohibitively large number of paths that prevents the use of this criterion in practice. </li></ul><ul><li>Suppose that a test set T covers all paths. Will it guarantee that all errors in P are revealed? </li></ul><ul><li>Is obtaining 100% path coverage equivalent to exhaustive testing? </li></ul>
54.
Variants of Path Coverage <ul><li>As path coverage is usually impossible to attain, other heuristics have been proposed. </li></ul><ul><li>Loop coverage: </li></ul><ul><ul><li>Make sure that each loop is executed 0, 1, and 2 times. </li></ul></ul><ul><li>Try several combinations of if and switch statements. The combinations must come from requirements. </li></ul>
55.
Hierarchy in Control flow criteria [Figure: subsumption hierarchy, listed in the slide in the order path coverage, condition coverage, branch coverage, statement coverage. An arrow from X to Y means X subsumes Y.]
56.
Exercise <ul><li>Develop a test set T that is adequate w.r.t. the statement, condition, and the loop coverage criteria for the exponentiation program . </li></ul>
57.
Test strategy <ul><li>One can develop a test strategy based on any of the criteria discussed. </li></ul><ul><li>Example: </li></ul><ul><ul><li>A test strategy based on the statement coverage criterion will begin by evaluating a test set T against this criterion. Then new tests will be added to T until all the statements are covered, i.e. T satisfies the criterion. </li></ul></ul>
58.
Definitions <ul><li>Error-sensitive path : a path whose execution might lead to eventual detection of an error. </li></ul><ul><li>Error revealing path : a path whose execution will always cause the program to fail and the error to be detected. </li></ul>
59.
Definitions: Reliable Technique <ul><li>Reliable : A test technique is reliable for an error if it guarantees that the error will always be detected. </li></ul><ul><ul><li>This implies that a reliable testing technique must lead to the exercising of at least one error-revealing path. </li></ul></ul>
60.
Definitions: Weakly Reliable <ul><li>Weakly reliable : A test technique is weakly reliable if it forces the execution of at least one error sensitive path. </li></ul>
61.
Example: Error Detection [1] <ul><li>Let us go over the example in Korel and Laski’s paper. </li></ul><ul><li>It is a sorting program which uses the bubble sort algorithm. </li></ul><ul><li>It sorts an array a[0:N] in descending order. </li></ul><ul><li>There are two, nested, loops in the program. </li></ul><ul><li>The inner loop from i6-i10 finds the largest element of a[R1:N]. </li></ul>
62.
Example: Error Detection (contd.) <ul><li>The largest element is saved in R0 and R3 points to the location of R0 in a. </li></ul><ul><li>The completion of one iteration of the outer loop ensures that the sub-array a[0:R1-1] has been sorted and that a[R1-1] is greater than or equal to any element of a[R1:N]. </li></ul><ul><li>The outer loop swaps a[R1] with a[R3]. </li></ul>
63.
Example: Error Detection (contd.) <ul><li>There is a missing re-initialization of R3 to R1 at the beginning of the inner loop. </li></ul><ul><li>In some cases this will cause the program to fail. </li></ul><ul><ul><ul><li>What are these cases ? </li></ul></ul></ul><ul><li>We will get back to this error later! </li></ul>
64.
Data flow graph <ul><li>The graph is constructed from the control flow graph (CFG) of the program. </li></ul><ul><li>It represents the flow of data in a program. </li></ul><ul><li>A statement that occurs within a node of the CFG might contain variable occurrences. </li></ul><ul><li>Each variable occurrence is classified as a def or a use. </li></ul>
65.
defs and uses <ul><li>A def represents the definition of a variable. Here are some sample defs of variable x : </li></ul><ul><ul><ul><li><i>x</i>=y*x; </li></ul></ul></ul><ul><ul><ul><li>scanf(&<i>x</i>,&y); </li></ul></ul></ul><ul><ul><ul><li>int <i>x</i>; </li></ul></ul></ul><ul><ul><ul><li><i>x</i>[i-1]=y*x; </li></ul></ul></ul>All defs of x are italicized. <ul><li>A use represents the use of a variable in a statement. Here are a few examples of uses of variable x : </li></ul>
66.
def-use (contd.) <ul><ul><ul><li>x=<i>x</i>+1; </li></ul></ul></ul><ul><ul><ul><li>printf ("x is %d, y is %d", <i>x</i>,y); </li></ul></ul></ul><ul><ul><ul><li>cout << <i>x</i> << endl << y </li></ul></ul></ul><ul><ul><ul><li>z=<i>x</i>[i+1] </li></ul></ul></ul><ul><ul><ul><li>if (<i>x</i><y)… </li></ul></ul></ul>All uses of x are italicized. <ul><li>Uses of a variable in output statements and assignments are classified as c-uses. Those in conditions are classified as p-uses. </li></ul>
67.
def-use (contd.) <ul><li>c-use stands for computational use and p-use for predicate-use . </li></ul><ul><li>Both c- and p-uses affect the flow of control: p-uses directly as their values are used in evaluating conditions and c-uses indirectly as their values are used to compute other variables which in turn affect the outcome of condition evaluation. </li></ul>
68.
def-use (contd.) <ul><li>A path from node i to node j is said to be def-clear w.r.t. a variable x if there is no def of x in the nodes along the path from node i to node j . Nodes i and j may have a def of x . </li></ul><ul><li>A def-clear path from node i to edge ( j,k ) is one in which no node on the path has a def of x . </li></ul>
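The def-clear condition for a node-to-node path can be sketched as a simple scan over the interior nodes of the path. The function name `is_def_clear` and the array-based representation (a path as a list of node numbers, and a flag per node saying whether it defines the variable) are assumptions made for this example.

```c
#include <stddef.h>

/* has_def[n] is nonzero if node n contains a def of the variable of interest.
   path[0..len-1] is a sequence of node numbers from node i to node j.
   Returns 1 if no node strictly between the endpoints defines the variable;
   the endpoints themselves may contain a def, as stated in the slide. */
int is_def_clear(const int *path, size_t len, const int *has_def)
{
    for (size_t k = 1; k + 1 < len; k++)
        if (has_def[path[k]])
            return 0;
    return 1;
}
```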
69.
global-def <ul><li>A c-use of x in a block is considered global c-use if there is no def of x preceding this c-use within this block. </li></ul><ul><li>A def of a variable x is considered global to its block if it is the last def of x within that block. </li></ul>
70.
def-use graph: definitions <ul><li>def(i): set of all variables for which there is a global definition at node i . </li></ul><ul><li>c-use(i): set of all variables that have a global c-use at node i . </li></ul><ul><li>p-use(i,j): set of all variables for which there is a p-use for the edge ( i,j ). </li></ul><ul><li>dcu(x,i): set of all nodes j such that x is in c-use(j), x is in def(i), and there is a def-clear path w.r.t. x from node i to node j. </li></ul>
71.
def-use graph: definitions <ul><li>dpu(x,i): set of all edges (j,k) such that x is in p-use(j,k), x is in def(i), and there is a def-clear path w.r.t. x from node i to edge (j,k). </li></ul><ul><li>The def-use graph of program P is constructed by associating def and c-use sets with the nodes, and p-use sets with the edges, of a flow graph. </li></ul>
74.
def-use graph exercise Draw a def-use graph for the following program. 0. int set_z(x,y); { 1. int x,y; 2. if (x!=0) 3. y=5; 4. else z=z-x; 5. if (z>1) 6. z=z/x; 7. else 8. z=y; }
76.
Test generation <ul><li>Exercises: </li></ul><ul><ul><li>For the above graph generate a test set that satisfies </li></ul></ul><ul><ul><ul><li>the branch coverage criterion </li></ul></ul></ul><ul><ul><ul><li>the all-defs criterion: for each definition of each variable, at least one use (c- or p-use) must be exercised. </li></ul></ul></ul><ul><ul><ul><li>the all-uses criterion: all p-uses and all c-uses of all variable definitions must be covered. </li></ul></ul></ul>Develop the tests incrementally, i.e. by modifying the previous test set!
77.
SUDS processing: Phase I [Figure: the program under test, P, is preprocessed, compiled, and instrumented. This generates the .atac files and an instrumented, executable version of P. Executing the instrumented version on test set input produces the program output and a .trace file.]
78.
ATAC processing: phase II [Figure: the coverage analyzer takes the .atac files and the .trace file as input and reports control flow and data flow coverage values.]
79.
Mutation Testing <ul><li>What is mutation testing? </li></ul><ul><ul><li>Mutation testing is a code-based test assessment and improvement technique. </li></ul></ul><ul><ul><li>It relies on the competent programmer hypothesis which is the following assumption: </li></ul></ul><ul><ul><li>Given a specification a programmer develops a program that is either correct or differs from the correct program by a combination of simple errors . </li></ul></ul>
80.
Mutation testing (contd.) <ul><li>The process of program development is considered iterative: an initial version of the program is refined by making a simple change, or a combination of simple changes, towards the final version. </li></ul>
81.
Mutant <ul><li>Given a program P, a mutant of P is obtained by making a simple change in P. </li></ul>Program: 1. int x,y; 2. if (x!=0) 3. y=5; 4. else z=z-x; 5. if (z>1) 6. z=z/x; 7. else 8. z=y; Mutant: 1. int x,y; 2. if (x!=0) 3. y=5; 4. else z=z-x; 5. if (z>1) 6. z=z/zpush(x); 7. else 8. z=y; What is zpush ?
82.
Another mutant Program: 1. int x,y; 2. if (x!=0) 3. y=5; 4. else z=z-x; 5. if (z>1) 6. z=z/x; 7. else 8. z=y; Mutant: 1. int x,y; 2. if (x!=0) 3. y=5; 4. else z=z-x; 5. if (z<1) 6. z=z/x; 7. else 8. z=y;
83.
Mutant <ul><li>A mutant M is considered distinguished by a test case t ∈ T iff: </li></ul><ul><ul><ul><li>P(t) ≠ M(t) </li></ul></ul></ul><ul><ul><ul><li>where P(t) and M(t) denote, respectively, the observed behavior of P and M when executed on test input t. </li></ul></ul></ul><ul><li>A mutant M is considered equivalent to P iff: </li></ul><ul><ul><ul><li>P(t) = M(t) ∀ t ∈ T, where T here ranges over all possible test inputs. </li></ul></ul></ul>
84.
Mutation score <ul><li>During testing a mutant is considered live if it has not been distinguished or proven equivalent. </li></ul><ul><li>Suppose that a total of #M mutants are generated for program P. </li></ul><ul><li>The mutation score of a test set T, designed to test P, is computed as: </li></ul><ul><ul><li>number of distinguished mutants/(#M-number of equivalent mutants) </li></ul></ul>
85.
Test adequacy criterion <ul><li>A test T is considered adequate w.r.t. the mutation criterion if its mutation score is 1. </li></ul><ul><li>The number of mutants generated depends on P and the mutant operators applied on P. </li></ul><ul><li>A mutant operator is a rule that when applied to the program under test generates zero or more mutants. </li></ul>
86.
Mutant Operators <ul><li>Consider the following program: </li></ul><ul><ul><ul><li>int abs (x); </li></ul></ul></ul><ul><ul><ul><li>int x; </li></ul></ul></ul><ul><ul><ul><li>{ </li></ul></ul></ul><ul><ul><ul><ul><li>if (x>=0) x=0-x; </li></ul></ul></ul></ul><ul><ul><ul><ul><li>return x; </li></ul></ul></ul></ul><ul><ul><ul><li>} </li></ul></ul></ul>
87.
Mutation operator <ul><li>Consider the following rule: </li></ul><ul><ul><ul><li>Replace each relational operator in P by all possible relational operators excluding the one that is being replaced. </li></ul></ul></ul><ul><li>Assuming the set of relational operators to be: {<, >, <=, >=, ==, !=}, the above mutant operator will generate a total of 5 mutants of P. </li></ul>
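The relational-operator rule can be tried on the abs program with a small sketch like the one below. It is not how a mutation tool works internally: instead of rewriting source text, it passes the relational operator as a parameter. The names `rel_op`, `apply_op`, `abs_variant`, `count_distinguished`, and `mutation_score` are assumptions for this example, and the score is computed as distinguished mutants divided by non-equivalent mutants, so that a score of 1 corresponds to adequacy.

```c
#include <stddef.h>

/* The six relational operators from the slide. */
typedef enum { OP_LT, OP_GT, OP_LE, OP_GE, OP_EQ, OP_NE, OP_COUNT } rel_op;

int apply_op(rel_op op, int a, int b)
{
    switch (op) {
    case OP_LT: return a < b;
    case OP_GT: return a > b;
    case OP_LE: return a <= b;
    case OP_GE: return a >= b;
    case OP_EQ: return a == b;
    case OP_NE: return a != b;
    default:    return 0;
    }
}

/* abs from the slides with its relational operator made a parameter:
   OP_GE gives the program P, every other operator gives one mutant. */
int abs_variant(rel_op op, int x)
{
    if (apply_op(op, x, 0))
        x = 0 - x;
    return x;
}

/* Counts the mutants distinguished by at least one test input. */
int count_distinguished(const int *tests, size_t n)
{
    int count = 0;
    for (int op = 0; op < OP_COUNT; op++) {
        if (op == OP_GE)
            continue;                       /* this is P itself */
        for (size_t i = 0; i < n; i++) {
            if (abs_variant((rel_op)op, tests[i]) != abs_variant(OP_GE, tests[i])) {
                count++;
                break;
            }
        }
    }
    return count;
}

/* Mutation score: distinguished / (generated - equivalent). */
double mutation_score(int distinguished, int generated, int equivalent)
{
    return (double)distinguished / (double)(generated - equivalent);
}
```

With the test inputs 0, 1, and -1, four of the five mutants are distinguished. The remaining `>` mutant appears to be equivalent to P, since the two differ only at x = 0, where both return 0; under that assumption the score is 4/(5-1) = 1.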
88.
Mutation Operators <ul><li>Mutation operators are language dependent. </li></ul><ul><li>For Fortran a total of 22 operators were proposed. </li></ul><ul><li>For C a total of 77 operators were proposed. None have been proposed for C++ though most of the operators for C are applicable to C++ programs. </li></ul>
89.
Equivalent mutant <ul><li>Consider the following program P: </li></ul><ul><ul><ul><li>int x,y,z; </li></ul></ul></ul><ul><ul><ul><li>scanf(&x,&y); </li></ul></ul></ul><ul><ul><ul><li>if (x>0) { </li></ul></ul></ul><ul><ul><ul><ul><li>x=x+1; z=x*(y-1); </li></ul></ul></ul></ul><ul><ul><ul><li>} else { </li></ul></ul></ul><ul><ul><ul><ul><li>x=x-1; z=x*(y-1); </li></ul></ul></ul></ul><ul><ul><ul><li>} </li></ul></ul></ul><ul><li>Here z is considered the output of P. </li></ul>
90.
Equivalent mutant (contd.) <ul><li>Now suppose that a mutant of P is obtained by changing x=x+1 to x=abs(x)+1 . </li></ul><ul><li>This mutant is equivalent to P: the changed statement is executed only when x>0, where abs(x)=x, so no test case can distinguish it from P. </li></ul>
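A bounded comparison of P and the mutant can serve as a sanity check, though it is not a proof of equivalence. In the sketch below, the function names `p_output`, `m_output`, and `agree_on_range` are assumptions, the inputs are passed as parameters instead of being read with `scanf`, and each branch is treated as a block containing both assignments.

```c
#include <stdlib.h>   /* abs */

/* Program P from the slide, with z returned as the output. */
int p_output(int x, int y)
{
    int z;
    if (x > 0) {
        x = x + 1;
        z = x * (y - 1);
    } else {
        x = x - 1;
        z = x * (y - 1);
    }
    return z;
}

/* Mutant M: x = x + 1 replaced by x = abs(x) + 1. */
int m_output(int x, int y)
{
    int z;
    if (x > 0) {
        x = abs(x) + 1;
        z = x * (y - 1);
    } else {
        x = x - 1;
        z = x * (y - 1);
    }
    return z;
}

/* Returns 1 if P and M agree on every (x, y) with both values in
   [-limit, limit], and 0 as soon as a distinguishing input is found. */
int agree_on_range(int limit)
{
    for (int x = -limit; x <= limit; x++)
        for (int y = -limit; y <= limit; y++)
            if (p_output(x, y) != m_output(x, y))
                return 0;
    return 1;
}
```

Agreement over a finite range only fails to find a distinguishing test. The equivalence itself follows from the observation that the mutated statement runs only when x is positive.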
91.
Mutation Testing Procedure Given P and a test set T: 1. Generate mutants. 2. Compile P and the mutants. 3. Execute P and the mutants on each test case. 4. Determine equivalent mutants. 5. Determine mutation score. 6. If mutation score is not 1 then improve the test set and repeat from step 3.
92.
Mutation Testing Procedure (contd.) <ul><li>In practice the above procedure is implemented incrementally. </li></ul><ul><li>One applies a few selected mutant operators to P and computes the mutation score w.r.t. the mutants generated. </li></ul><ul><li>Once these mutants have been distinguished or proven equivalent, another set of mutant operators is applied. </li></ul>
93.
Mutation Testing Procedure <ul><li>This procedure is repeated until either all the mutants have been exhausted or some external condition forces testing to stop. </li></ul><ul><li>We will not discuss the details of practical application of mutation testing . </li></ul>
94.
Tools for Mutation Testing <ul><li>Mothra : for Fortran, developed at Purdue, 1990 </li></ul><ul><li>Proteum : for C, developed at the University of São Paulo at São Carlos in Brazil. </li></ul>
95.
Uses of Mutation Testing <ul><li>Mutation testing is useful during integration testing to check for integration errors. </li></ul><ul><li>Only the variables that are in the interfaces of the components being integrated are mutated. This reduces the complexity of mutation testing. </li></ul>