An Experimental Evaluation
of the Reliability of Adaptive
Random Testing Methods
Hong Zhu
Department of Computing and Electronics,
Oxford Brookes University,
Oxford OX33 1HX, UK
Email: hzhu@brookes.ac.uk
Aug. 2009
Experimental Study of ART Reliability
Motivation
 General problem:
 How to evaluate and compare software testing
methods?
 Existing works
 Comparison criteria
 Fault detecting ability
 Cost
 Comparison methods
 Formal analysis
• Subsumption relation and other relations
• Axiomatic assessment
 Experimental and empirical studies
A great number of software
testing methods have been
proposed, but few have been
widely used in practice.
What is a good software testing method?
 Goodenough and Gerhart (1975):
 Validity:
 for all software under test (SUT) there is a test set
generated by the method that can detect faults if
any.
 Reliability:
 if one such test set detects a fault, then ideally any
test set of the method will also detect the fault.
J. B. Goodenough and S. L. Gerhart, Toward a theory of test data selection, IEEE TSE, Vol. SE-1, June 1975.
Fault detecting ability
Reliable and stable
Proposed idea
• Research hypothesis
  • Comparing testing methods only on fault-detecting ability is inadequate. They should also be compared on how reliable they are, in the sense of stably producing high fault-detecting rates.
• Research questions:
  • Do testing methods differ from each other in reliability?
  • If so, what factors affect their reliability?
  • How can the reliability of testing methods be compared?
    • How should it be measured?
    • How should the experiments be done?
  • What are the practical implications?
The approach
 Investigation of some specific software testing
methods to test the hypothesis:
 random software testing (RT)
 adaptive random software testing methods (ART)
Adaptive random testing has been proposed to further improve the effectiveness and efficiency of random testing by distributing the test cases evenly over the input space. (T.Y. Chen, et al., 2004, …)
Random testing has been regarded as
• as effective as systematic testing
• more efficient than systematic methods
Background: (1) Random testing
 Random testing (RT) is a software testing
method that selects or generates test cases
through random sampling over the input domain
of the SUT according to a given probability
distribution.
 Representative random testing
 Non-representative random testing
Definitions
Let D be the input domain, i.e. the set of valid input data to the SUT. A test set $T = \langle a_1, a_2, \ldots, a_n \rangle$, $n \ge 0$, on domain D is a finite sequence of elements of D, i.e. $a_i \in D$ for all $i = 1, 2, \ldots, n$.
The number n of elements in a test set is called the size of the test set.
A SUT is tested on a test set $T = \langle a_1, a_2, \ldots, a_n \rangle$ means that the software is executed on the inputs $a_1, a_2, \ldots, a_n$ one by one, with the necessary initialization before each execution.
Strictly speaking, T should not be called a test set because its elements are ordered. However, it is a convention in the ART literature to call such sequences test sets.
When n = 0, the test set is called the empty test, which means the software is not executed at all. Without loss of generality, we assume that the test sets considered in this paper are non-empty.
Definitions
A test set T=<a1, a2, …, an> is a random test set if
the elements are selected or generated at random
independently according to a given probability
distribution on the input domain D.
If the same input is allowed to occur more than
once in a test set, we say that the random test sets
are with replacement; otherwise, the test sets are
called without replacement.
In the experiment reported here, the random
test sets are with replacement.
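A minimal Python sketch of generating a random test set with replacement. The uniform distribution, the helper name `random_test_set` and the small 10 × 10 domain are assumptions made for illustration; the definition above only requires independent sampling according to a given distribution.

```python
import random

def random_test_set(domain, size, seed=None):
    """Draw `size` test cases independently and uniformly from `domain`.

    Sampling is with replacement, so the same input may occur more than once.
    """
    rng = random.Random(seed)
    return [rng.choice(domain) for _ in range(size)]

# Hypothetical two-dimensional input domain, kept small for illustration.
domain = [(x, y) for x in range(1, 11) for y in range(1, 11)]
T = random_test_set(domain, 5, seed=42)
print(T)
```

The result is a list rather than a set because, as noted in the definitions, a test set is an ordered sequence.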
Background: (2) Adaptive random testing
 Basic idea:
 To evenly spread the test cases in the input space so
that it is more likely to find faults in the software.
 First to generate random test cases, then to
manipulate the random test cases to achieve the goal
of even spread.
 It is proposed and investigated by Prof. T.Y. Chen
and his research group. A variety of such
manipulations have been proposed and investigated.
Adaptive Random Testing 1: Mirror
Let $D_0, D_1, D_2, \ldots, D_k$ ($k > 0$) be a disjoint partition of the input domain D. The sub-domains $D_0, \ldots, D_k$ are of equal size and there is a one-to-one mapping $m_i$ from $D_0$ to $D_i$, $i = 1, \ldots, k$. The sub-domain $D_0$ is called the original sub-domain. The other sub-domains are called the mirror sub-domains. The mappings $m_i$ are called the mirror functions.
Let $T_0 = \langle a^0_1, a^0_2, \ldots, a^0_n \rangle$ be a random test set on $D_0$. The mirror of the original test set $T_0$ in a mirror sub-domain $D_i$ through $m_i$ is a test set $T_i$, written $m_i(T_0)$, defined as follows.
$$m_i(\langle a^0_1, a^0_2, \ldots, a^0_n \rangle) = \langle a^i_1, a^i_2, \ldots, a^i_n \rangle,$$
where $m_i(a^0_j) = a^i_j$, $i = 1, \ldots, k$, $j = 1, \ldots, n$. The mirror test set of $T_0$, written $\mathrm{Mirror}(T_0)$, is the test set on D given by
$$\mathrm{Mirror}(T_0) = \langle a^0_1, a^1_1, a^2_1, \ldots, a^k_1,\ a^0_2, a^1_2, \ldots, a^k_2,\ \ldots,\ a^0_n, a^1_n, \ldots, a^k_n \rangle.$$
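The Mirror manipulation can be sketched in Python as follows. The function name `mirror` and the example mapping `shift` are illustrative assumptions; the sketch only reproduces the interleaving order given in the definition of Mirror(T0).

```python
def mirror(t0, mirror_functions):
    """Return Mirror(T0): each original test case followed by its k mirror images."""
    result = []
    for a in t0:
        result.append(a)                               # a^0_j in the original sub-domain
        result.extend(m(a) for m in mirror_functions)  # a^1_j, ..., a^k_j
    return result

# One mirror sub-domain (k = 1), obtained by shifting the first coordinate.
def shift(p):
    return (p[0] + 5000, p[1])

print(mirror([(1, 2), (3, 4)], [shift]))
# → [(1, 2), (5001, 2), (3, 4), (5003, 4)]
```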
Adaptive Random Testing 2: Distance
Let $\|x, y\|$ be a distance measure defined on the input domain D. It is extended to the distance between an element x and a set Y of elements as $\|x, Y\| = \min\{\|x, y\| : y \in Y\}$. A test set $T = \langle a_0, a_1, a_2, \ldots, a_n \rangle$ is maximally distanced if, for all $i = 1, 2, \ldots, n$, $\|a_i, \{a_0, \ldots, a_{i-1}\}\| \ge \|a_j, \{a_0, \ldots, a_{i-1}\}\|$ for all $j > i$.
Let T be any given test set. The distance manipulation of T, written Distance(T), is a re-ordering of T's elements so that the resulting sequence is maximally distanced.
A distance measure on a domain D is a function $\|\cdot, \cdot\|$ from $D^2$ to the real numbers in $[0, \infty)$ such that, for all $x, y, z \in D$,
$\|x, x\| = 0$; $\|x, y\| = \|y, x\|$; and $\|x, y\| \le \|x, z\| + \|z, y\|$.
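One way to obtain a maximally distanced ordering is a greedy farthest-first pass, sketched below in Python. The Euclidean distance, the choice to keep the first element of T as a0, and the function name `distance_manipulation` are assumptions of this sketch; the definition leaves the distance measure and the starting element open.

```python
import math

def distance_manipulation(test_set, dist=math.dist):
    """Reorder `test_set` so that each element is the remaining one farthest
    from the set of elements already placed (a maximally distanced sequence)."""
    if not test_set:
        return []
    remaining = list(test_set)
    ordered = [remaining.pop(0)]  # assumption: keep the first element as a_0
    # min_d[i] is the distance from remaining[i] to the set `ordered`
    min_d = [dist(c, ordered[0]) for c in remaining]
    while remaining:
        idx = max(range(len(remaining)), key=min_d.__getitem__)
        chosen = remaining.pop(idx)
        min_d.pop(idx)
        ordered.append(chosen)
        # placing `chosen` can only shrink the distance-to-set of the others
        min_d = [min(d, dist(c, chosen)) for d, c in zip(min_d, remaining)]
    return ordered

print(distance_manipulation([(0, 0), (1, 0), (10, 0), (5, 0)]))
# → [(0, 0), (10, 0), (5, 0), (1, 0)]
```

The sketch takes quadratic time in the size of the test set, which is acceptable for test sets of around a thousand cases.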
Adaptive Random Testing 3: Filter
Let $\|x, y\|$ be a distance measure defined on the input domain D. Let T be any given non-empty test set. A test set $T' = \langle a_1, a_2, \ldots, a_n \rangle$ is said to be ordered according to the distance to T if, for all $i = 1, 2, \ldots, n-1$, $\|a_i, T\| \ge \|a_{i+1}, T\|$.
Let S and C be two given natural numbers such that $S > C > 0$, and let $T_0, T_1, \ldots, T_k$ be a sequence of test sets of size $S > 1$. Let $U_0 = T_0$. Assume that $U_i = \langle a_1, \ldots, a_{k_i} \rangle$ and $T_{i+1} = \langle c_1, \ldots, c_S \rangle$. We define $T'_{i+1} = \langle b_1, \ldots, b_S \rangle$ to be a test set obtained from $T_{i+1}$ by permuting its elements so that $T'_{i+1}$ is ordered according to the distance to $U_i$. Then, we inductively define $U_{i+1} = \langle a_1, \ldots, a_{k_i}, b_1, \ldots, b_C \rangle$, for $i = 0, 1, \ldots, k-1$. The test set $U_k$ is called the test set obtained by filtering C out of S test cases.
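The inductive definition of the Filter manipulation can be sketched in Python as below. The sketch assumes Euclidean distance and assumes that candidates are ranked farthest-first, so the C candidates kept from each batch are the ones farthest from the test cases selected so far; the names `dist_to_set` and `filter_manipulation` are illustrative.

```python
import math

def dist_to_set(x, ys):
    """Distance from element x to a non-empty set ys: the minimum pairwise distance."""
    return min(math.dist(x, y) for y in ys)

def filter_manipulation(test_sets, c):
    """Start with U_0 = T_0, then from each later T_{i+1} keep the c candidates
    that are farthest from U_i and append them to form U_{i+1}."""
    u = list(test_sets[0])
    for t in test_sets[1:]:
        ranked = sorted(t, key=lambda a: dist_to_set(a, u), reverse=True)
        u.extend(ranked[:c])
    return u

batches = [
    [(0, 0), (1, 0), (0, 1)],   # T_0
    [(1, 1), (9, 9), (5, 5)],   # T_1
]
print(filter_manipulation(batches, c=1))
# → [(0, 0), (1, 0), (0, 1), (9, 9)]
```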
Notes on the definitions of ART techniques
• Strictly speaking, the results of the above manipulations are not random test sets, even if the test sets before manipulation are random.
• These manipulations can be combined to obtain more sophisticated adaptive random testing methods.
• The set of manipulations defined above may not be complete. There are also other manipulations on random test sets.
• In the literature, ART methods are usually described in the form of test case generation algorithms rather than manipulation operators. For example, in mirror adaptive random testing, the input domain is partitioned. A random test set is first generated on the original sub-domain. It is then manipulated by the Distance operation and then mirrored into the mirror sub-domain. Describing ART methods as manipulations makes the descriptions concise and formal, and makes it easy to recognise the various ways in which they can be combined.
• The filter manipulation is not used in the experiments reported in this paper. It is defined here to demonstrate that testing methods defined in the form of an algorithm can also be defined formally as manipulations.
The general framework of experiments with software testing methods
• Select a set of software under test (SUT): S1, S2, …, Sk, k>0. Si can be the same as or different from Sj.
• Select a measurement f of fault-detecting ability on a single test set.
• Apply the test method to generate test sets T1, T2, …, Tk.
• Test the SUTs on the test sets and measure fault-detecting ability:
  • Let m1, m2, …, mk be the fault-detecting abilities of T1, T2, …, Tk as measured by f in testing SUTs S1, S2, …, Sk.
• Calculate the performance of the test method using a fault-detecting ability measure.
Background: (4) Measurements of fault
detecting ability
P-measure: Probability of detecting at least one fault
The P-measure of the k test sets $T_1, T_2, \ldots, T_k$ is defined as follows.
$$P(T_1, T_2, \ldots, T_k) = d/k,$$
where $k > 0$ and $d = |\{T_i \mid \exists a \in T_i\,(S \text{ fails on } a),\ 1 \le i \le k\}|$.
When $k \to \infty$, $P(T_1, T_2, \ldots, T_k)$ approaches the probability that a test set T detects at least one fault.
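A small Python sketch of the P-measure. The predicate `fails` stands in for "S fails on a" and is a hypothetical oracle chosen for illustration (here the SUT is assumed to fail on multiples of 7); it is not part of the original experiment.

```python
def p_measure(test_sets, fails):
    """Fraction of test sets containing at least one failure-causing input (d / k)."""
    d = sum(1 for t in test_sets if any(fails(a) for a in t))
    return d / len(test_sets)

# Hypothetical SUT that fails on multiples of 7.
def fails(a):
    return a % 7 == 0

test_sets = [[1, 2, 3], [7, 8], [14, 21], [5]]
print(p_measure(test_sets, fails))
# → 0.5
```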
Understanding P-measure
When the test cases in each test set T are generated at random independently using the same distribution as the operation profile, the probability that a test set of size w detects at least one fault is $1 - (1 - \theta)^w$, where $\theta$ is the failure probability of the software. Formally, for random testing, we have that
$$P(T) \approx 1 - (1 - \theta)^w, \quad \text{where } w = |T|.$$
In other words, the P-measure of random testing is an estimate of the value $1 - (1 - \theta)^w$, where w is the test size.
Measurements of fault detecting ability (2)
E-measure: Average number of failures
The E-measure of k test sets $T_1, T_2, \ldots, T_k$ is defined as follows.
$$M(T_1, T_2, \ldots, T_k) = \frac{1}{k} \sum_{i=1}^{k} u_i,$$
where $k > 0$ and $u_i = |\{a \in T_i \mid S \text{ fails on } a\}|$, $i = 1, \ldots, k$.
Understanding the E-measure
When the test sets $T_i$ are of the same size, say w, and the test cases are selected at random independently using the same distribution as the operation profile, the E-measure approaches $w\theta$ when $k \to \infty$. Here, $\theta$ is the failure probability of the SUT. Formally, for random testing we have that
$$M(T) \approx w\theta, \quad \text{where } w = |T|.$$
In other words, the E-measure of random testing is an estimate of $w\theta$, where w is the test size.
When the test set size is 1, the P-measure of random testing equals its E-measure.
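The E-measure can be sketched in Python in the same way. The predicate `fails` is a hypothetical oracle used only for illustration (the SUT is assumed to fail on multiples of 7).

```python
def e_measure(test_sets, fails):
    """Average number of failure-causing inputs per test set."""
    counts = [sum(1 for a in t if fails(a)) for t in test_sets]  # u_1, ..., u_k
    return sum(counts) / len(test_sets)

# Hypothetical SUT that fails on multiples of 7.
def fails(a):
    return a % 7 == 0

test_sets = [[1, 2, 3], [7, 8], [14, 21], [5]]
print(e_measure(test_sets, fails))
# → 0.75
```

Unlike the P-measure, a test set with two failing inputs contributes 2 here, so the value can exceed 1.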
Measurements of fault detecting ability (3)
F-measure: The first failure
The F-measure is defined as follows.
$$F(T_1, T_2, \ldots, T_k) = \frac{1}{k} \sum_{i=1}^{k} v_i,$$
where $k > 0$, $T_i = \langle a_1, a_2, \ldots, a_{k_i} \rangle$, and $v_i$ is the natural number such that S fails on the test case $a_{v_i}$ and, for all $0 < n < v_i$, S is correct on test case $a_n$.
It is assumed that the sizes of the test sets $T_i$, $i = 1, \ldots, k$, are large enough so that $v_i$ exists.
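A Python sketch of the F-measure. As before, `fails` is a hypothetical oracle for "S fails on a", and the sketch raises an error when a test set is too short for the first failure to exist, which the definition assumes does not happen.

```python
def first_failure_position(test_set, fails):
    """1-based position v of the first failure-causing input in the sequence."""
    for position, a in enumerate(test_set, start=1):
        if fails(a):
            return position
    raise ValueError("test set contains no failure-causing input")

def f_measure(test_sets, fails):
    """Average number of test cases executed up to and including the first failure."""
    return sum(first_failure_position(t, fails) for t in test_sets) / len(test_sets)

# Hypothetical SUT that fails on multiples of 7.
def fails(a):
    return a % 7 == 0

test_sets = [[1, 7, 3], [7, 8], [2, 3, 14]]
print(f_measure(test_sets, fails))
# → 2.0
```

A smaller F-measure means failures are found sooner, so lower values indicate better fault-detecting ability.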
Understanding the F-measure
• The F-measure is a random variable with a geometric distribution, i.e. the distribution of the number of Bernoulli trials needed to get the first success, under the conditions:
  • the test cases $a_i$ in each test set T are selected at random independently,
  • using the same distribution as the operation profile.
• In other words, when $k \to \infty$, $F(T_1, T_2, \ldots, T_k) \to 1/\theta$, where $\theta$ is the probability of failure:
$$F(T) \approx 1/\theta$$
Summary. For random testing:
$$P(T) \approx 1 - (1 - \theta)^w, \qquad M(T) \approx w\theta, \qquad F(T) \approx 1/\theta,$$
where $w = |T|$.
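The three approximations for random testing can be checked with a short Monte Carlo sketch in Python. Each test case is modelled as an independent Bernoulli trial that fails with probability theta; the values theta = 0.2, w = 10 and k = 20000 are arbitrary choices for illustration, not figures from the experiment.

```python
import random

def simulate(theta, w, k, seed=0):
    """Estimate the P-, E- and F-measures of random testing over k test sets."""
    rng = random.Random(seed)
    detected = failures = first_total = 0
    for _ in range(k):
        # One test set of size w: True marks a failure-causing test case.
        outcomes = [rng.random() < theta for _ in range(w)]
        detected += any(outcomes)
        failures += sum(outcomes)
        # For the F-measure, keep drawing test cases until the first failure.
        n = 1
        while rng.random() >= theta:
            n += 1
        first_total += n
    return detected / k, failures / k, first_total / k

theta, w = 0.2, 10
p, m, f = simulate(theta, w, k=20000)
print(p, 1 - (1 - theta) ** w)  # P-measure against 1 - (1 - theta)^w
print(m, w * theta)             # E-measure against w * theta
print(f, 1 / theta)             # F-measure against 1 / theta
```

With these settings the estimates should land close to 0.893, 2.0 and 5.0 respectively, up to sampling noise.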
Measuring the reliability of test methods
S-measure: The variation of fault detecting ability
Let:
• $T_1, T_2, \ldots, T_k$ be a number of test sets generated according to a testing method,
• f be a given measurement of fault detecting ability.
The S-measure $S(T_1, T_2, \ldots, T_k)$ of the tests $T_1, T_2, \ldots, T_k$ is formally defined as follows.
$$S(T_1, T_2, \ldots, T_k) = \sqrt{\frac{1}{k-1} \sum_{i=1}^{k} (m_i - \bar{m})^2},$$
where $k > 1$, $m_i = f(T_i)$, $i = 1, \ldots, k$, and $\bar{m} = \sum_{i=1}^{k} m_i / k$.
The S-measure is the sample standard deviation of the values $m_1, m_2, \ldots, m_k$ of the f-measures on the test sets.
The smaller the S-measure, the more reliable the testing method is.
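Since the S-measure is the sample standard deviation, it can be computed in Python with `statistics.stdev`. The sketch below also spells out the formula for comparison; the list `values` is made-up illustrative data, not measurements from the experiment.

```python
import math
import statistics

def s_measure(values):
    """Sample standard deviation of the measured values m_1, ..., m_k (k > 1)."""
    return statistics.stdev(values)

def s_measure_by_formula(values):
    """The same quantity written out term by term from the definition."""
    k = len(values)
    mean = sum(values) / k
    return math.sqrt(sum((m - mean) ** 2 for m in values) / (k - 1))

# Hypothetical F-measure values observed on eight test sets.
values = [2, 4, 4, 4, 5, 5, 7, 9]
print(s_measure(values), s_measure_by_formula(values))
```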
The experiment: the goals
 to evaluate ART methods on their reliabilities of
fault-detecting abilities with respect to realistic
faults.
 to identify the factors that influence the
reliability of fault detecting ability.
 two main candidate factors:
 the reliability (or equivalently the failure probability) of
the SUT
 the regularity of failure domains.
This is achieved through an experiment design that repeats the testing on a number of test sets.
Andrews, Briand and Labiche (2005): mutants systematically generated by mutation testing tools can represent real faults very well.
Thus, we use mutants in our experiments rather than simulation.
Experiment process
A. Selection of subject programs (two-dimensional input space on natural numbers)
  • GCD: the greatest common divisor
  • LCM: the least common multiple
B. Generation of mutants
  • Systematically, by using the MuJava testing tool
C. Generation of test sets
  • For each testing method, five test sets were generated
  • Each test set consists of 1000 test cases
D. Testing of the mutants
  • Both F-measures and E-measures are used
E. Analysis of the test results
Distribution of mutants
Types of mutants according to the mutation operators:

Mutant type          GCD   LCM   Total
AOIS                  54    52     106
AOIU                   9     6      15
AORB                   4     4       8
AORS                   0     1       1
LOI                   14    14      28
ROR                   15    15      30
Total                 96    92     188
Equivalent mutants    28    21      49
Trivial mutants       27    29      56
Used in experiment    41    52      93

The trivial mutants fail on every test case.
Test Set Generation Algorithm
Step 1. The input domain is divided into two sub-domains: the original sub-domain $D_0 = \{1..5000\} \times \{1..10000\}$ and the mirror sub-domain $D_1 = \{5001..10000\} \times \{1..10000\}$.
Step 2. Use the pseudo-random function of the Java language with different seeds to generate 500 random test cases in the domain $D_0$. Let $T_1$ be the set of these test cases, where the elements are ordered in the order in which they were generated.
Step 3. The mirror test set $T_2$ is generated by applying the following mirror mapping $m: D_0 \to D_1$.
$$m(\langle x, y \rangle) = \langle x + 5000, y \rangle. \quad (*)$$
The test set RT is obtained by merging $T_1$ with $T_2$, i.e. adding the elements of $T_2$ at the end of $T_1$.
Step 4. Let $T = T_1 \cup T_2$. The test set DMART is obtained by applying the Distance manipulation on T, i.e.
DMART = Distance(T).
Step 5. Let $T_3$ be the result of applying the Distance manipulation on $T_1$, i.e. $T_3$ = Distance($T_1$). The test set MDART is obtained by applying the Mirror manipulation on $T_3$ with the mirror mapping m as defined in (*) above, i.e.
MDART = Mirror($T_3$) = Mirror(Distance($T_1$)).
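Steps 1 to 5 can be sketched in Python as follows. This is an illustration under stated assumptions, not the original implementation: the experiment used Java's pseudo-random function, whereas the sketch uses Python's `random` module, Euclidean distance, and a greedy farthest-first ordering for the Distance manipulation. The function names are hypothetical.

```python
import math
import random

def distance_manipulation(test_set):
    """Greedy farthest-first reordering (assumes Euclidean distance)."""
    remaining = list(test_set)
    ordered = [remaining.pop(0)]
    min_d = [math.dist(c, ordered[0]) for c in remaining]
    while remaining:
        idx = max(range(len(remaining)), key=min_d.__getitem__)
        chosen = remaining.pop(idx)
        min_d.pop(idx)
        ordered.append(chosen)
        min_d = [min(d, math.dist(c, chosen)) for d, c in zip(min_d, remaining)]
    return ordered

def mirror_map(p):
    """The mirror mapping (*): shift x by 5000 into the mirror sub-domain."""
    return (p[0] + 5000, p[1])

def generate_test_sets(n=500, seed=0):
    rng = random.Random(seed)
    # Step 2: n random test cases in D0 = {1..5000} x {1..10000}
    t1 = [(rng.randint(1, 5000), rng.randint(1, 10000)) for _ in range(n)]
    # Step 3: mirror test set T2, appended to T1 to form RT
    t2 = [mirror_map(p) for p in t1]
    rt = t1 + t2
    # Step 4: DMART = Distance(T), where T contains T1 and T2
    dmart = distance_manipulation(rt)
    # Step 5: MDART = Mirror(Distance(T1)), interleaving each case with its mirror
    t3 = distance_manipulation(t1)
    mdart = [q for p in t3 for q in (p, mirror_map(p))]
    return rt, dmart, mdart

rt, dmart, mdart = generate_test_sets()
print(len(rt), len(dmart), len(mdart))
# → 1000 1000 1000
```

All three test sets contain the same 1000 test cases and differ only in the order of execution, which is what the F-measure is sensitive to.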
The characteristics of the mutants
[Figure: Distribution of average E-measures over 5 test sets; callout: trivial mutants]
What’s new in the mutants?
In the existing research on ART techniques, the subject samples (i.e. the SUTs) are always selected with very low failure rates.
• Mayer, Chen, Huang (2006), comparison of RT and ART techniques:
  • mutants with a failure rate higher than 0.05 were dismissed;
  • all the mutants with failure rates of at most 0.05 were treated equally in the statistical analysis.
• Chen et al. (2004), simulation experiments on ART: the performance of ART techniques was evaluated by simulating failure rates of 0.01, 0.005, 0.001 and 0.0005.
• Chen et al. (2007), simulation experiments on ART: most of the simulations are for failure rates less than 25%. There are only three sets of simulation data for failure rates higher than 25%, i.e. $\sqrt{2}/2$, $1/2$ and $\sqrt{2}/4$.
Open question: How does ART perform on SUTs that have higher failure rates?
Distribution of StDev of E-measures over 5 test sets
The relationship between Avg E-measures and their StDev
Main findings : (1) Fault detecting abilities
Average F-measures over all non-trivial mutants
How much does ART improve on RT?
$$\frac{F(RT) - F(MDART)}{F(RT)} = \frac{1.973 - 1.901}{1.973} \approx 3.65\%$$
$$\frac{F(RT) - F(DMART)}{F(RT)} = \frac{1.973 - 1.625}{1.973} \approx 17.64\%$$
DMART improves on RT by 17.64%; MDART improves on RT by 3.65%.
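The percentages follow from the average F-measures reported on this slide (1.973 for RT, 1.901 for MDART, 1.625 for DMART). A short Python check, with `relative_improvement` as an illustrative helper name:

```python
def relative_improvement(baseline, improved):
    """Relative reduction of a lower-is-better measure with respect to the baseline."""
    return (baseline - improved) / baseline

f_rt, f_mdart, f_dmart = 1.973, 1.901, 1.625
print(f"MDART: {relative_improvement(f_rt, f_mdart):.2%}")
print(f"DMART: {relative_improvement(f_rt, f_dmart):.2%}")
# → MDART: 3.65%
# → DMART: 17.64%
```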
Comparison with existing works
• The results we obtained seem inconsistent with existing results obtained by simulation.
  • In existing work, MART and DART had similar fault-detecting abilities in terms of average F-measures. They differed from each other only in the time needed to generate test cases.
  • The scale of improvement is much smaller than that demonstrated in existing work, which could be up to a 40% improvement in F-measures, while we have shown that the average improvement is less than 20%.
Main findings: (2) Factors affecting ability
What are the main factors that affect ART’s fault detecting ability?
The F-measures fall into three levels (>2.5, 1.5 ~ 2.5, <1.5).
The M-measures (i.e. E-measures) can be divided into three areas (240 ~ 340, 340 ~ 680, >680).
This explains why our results are different from existing works
• Our mutants have higher failure rates.
• Those used in other studies have lower failure rates.
ART’s fault detecting ability increases as the SUT’s failure rate decreases.
Main findings: (3) ART’s reliability
DMART is the most reliable testing
method among the three.
How much does ART improve on RT in reliability?
$$\frac{S(RT) - S(MDART)}{S(RT)} = \frac{1.413 - 1.324}{1.413} \approx 6.30\%$$
$$\frac{S(RT) - S(DMART)}{S(RT)} = \frac{1.413 - 0.951}{1.413} \approx 32.70\%$$
DMART made a significant improvement in reliability over RT.
The improvement that MDART made is much smaller.
Main findings: (4) Factors affect reliability
Is SUT’s failure rate a factor that affects ART reliability?
The relationship between the standard deviation of the F-measure
on DMART tests and the average failure rates of the mutants.
The correlation coefficients between the average E-measure and the
average standard deviations of F-measure of RT, MDART and
DMART on all mutants are -0.735, -0.645 and -0.615, respectively.
The reliabilities of these testing methods depend
on SUT’s failure rate.
The variation of DMART tests’ F-measure decreases as the average failure rate increases.
Is the ‘regularity’ of failure domain a factor that
affects reliability?
As the standard deviations of E-measures increase, the standard deviations of F-measures tend to increase.
The correlation coefficients between the standard deviations of F-
measures of RT, MDART and DMART and the standard deviations of
E-measures for all mutants are 0.574, 0.464 and 0.430, respectively.
The average standard deviation is calculated on mutants in various ranges of standard deviations of E-measures.
The reliabilities of testing methods are related to the standard
deviation of E-measures, i.e. the regularity of failure domains.
Conclusion
• We confirmed that ART improves fault detecting ability, even on SUTs with higher failure rates.
• We discovered that ART significantly improves the reliability of fault detecting ability:
  • the variations in fault detecting ability are significantly smaller than those of random testing, where
    • fault detecting ability is measured by F-measures,
    • the variation is measured by the standard deviation of F-measures.
• We discovered two factors that affect ART’s reliability:
  • the failure rate of the software under test: when the SUT’s failure rate increases, the test method’s reliability increases.
  • the regularity of the failure domain: when the regularity increases, the reliability also increases, where
    • regularity is measured by the standard deviation of E-measures.
References
• Yu Liu and Hong Zhu, An Experimental Evaluation of the Reliability of Adaptive Random Testing Methods, Proc. of the Second IEEE International Conference on Secure System Integration and Reliability Improvement (SSIRI 2008), Yokohama, Japan, July 14-17, 2008, pp. 24-31.
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfid
 
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptx
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 

an-experimental-evaluation-of-the-reliability-.ppt

  • 1. An Experimental Evaluation of the Reliability of Adaptive Random Testing Methods Hong Zhu Department of Computing and Electronics, Oxford Brookes University, Oxford OX33 1HX, UK Email: hzhu@brookes.ac.uk
  • 2. Aug. 2009. Experimental Study of ART Reliability. Motivation. A great number of software testing methods have been proposed, but few have been widely used in practice. General problem: how to evaluate and compare software testing methods? Existing work: comparison criteria (fault-detecting ability; cost) and comparison methods (formal analysis, including the subsumption relation, other relations and axiomatic assessment; experimental and empirical studies).
  • 3. What is a good software testing method? Goodenough and Gerhart (1975): Validity (fault-detecting ability): for every software under test (SUT), the method can generate a test set that detects the faults, if there are any. Reliability (reliable and stable): if one such test set detects a fault, then ideally any test set generated by the method also detects that fault. J. B. Goodenough and S. L. Gerhart, Toward a theory of test data selection, IEEE TSE, Vol. SE-1, June 1975.
  • 4. Proposed idea. Research hypothesis: comparing testing methods only on fault-detecting ability is inadequate. They should also be compared on how reliable they are, in the sense of stably producing high fault-detection rates. Research questions: Do testing methods differ from each other in reliability? If so, what factors affect their reliability? How can the reliability of testing methods be compared (how to measure it, and how to run the experiments)? What are the practical implications?
  • 5. The approach. Investigate some specific software testing methods to test the hypothesis: random testing (RT) and adaptive random testing (ART) methods. Random testing has been regarded as being as effective as systematic testing and more efficient than systematic methods. Adaptive random testing has been proposed to further improve the effectiveness and efficiency of random testing by distributing the test cases evenly over the input space (T. Y. Chen et al., 2004, …).
  • 6. Background: (1) Random testing. Random testing (RT) is a software testing method that selects or generates test cases by random sampling over the input domain of the SUT according to a given probability distribution. It comes in two forms: representative random testing and non-representative random testing.
  • 7. Definitions. Let $D$ be the input domain, i.e. the set of valid input data to the SUT. A test set $T = \langle a_1, a_2, \ldots, a_n \rangle$, $n \ge 0$, on domain $D$ is a finite sequence of elements of $D$, i.e. $a_i \in D$ for all $i = 1, 2, \ldots, n$. The number $n$ of elements in a test set is called the size of the test set. Strictly speaking, it should not be called a test set because its elements are ordered, but calling it a test set is the convention in the ART literature. Testing a SUT on a test set $T = \langle a_1, a_2, \ldots, a_n \rangle$ means executing the software on the inputs $a_1, a_2, \ldots, a_n$ one by one, with the necessary initialization before each execution. When $n = 0$, the test set is called the empty test, which means that the software is not executed at all. Without loss of generality, we assume that the test sets considered in this paper are non-empty.
  • 8. Definitions. A test set $T = \langle a_1, a_2, \ldots, a_n \rangle$ is a random test set if its elements are selected or generated at random, independently, according to a given probability distribution on the input domain $D$. If the same input is allowed to occur more than once in a test set, the random test sets are said to be with replacement; otherwise, they are said to be without replacement. In the experiment reported here, the random test sets are with replacement.
  • 9. Background: (2) Adaptive random testing. Basic idea: spread the test cases evenly over the input space so that faults in the software are more likely to be found. Random test cases are generated first and then manipulated to achieve the even spread. The approach was proposed and investigated by Prof. T. Y. Chen and his research group, and a variety of such manipulations have been proposed and investigated.
  • 10. Adaptive Random Testing 1: Mirror. Let $D_0, D_1, D_2, \ldots, D_k$ ($k > 0$) be a disjoint partition of the input domain $D$. The sub-domains $D_0, \ldots, D_k$ are of equal size and there is a one-to-one mapping $m_i$ from $D_0$ to $D_i$, $i = 1, \ldots, k$. The sub-domain $D_0$ is called the original sub-domain, the other sub-domains are called the mirror sub-domains, and the mappings $m_i$ are called the mirror functions. Let $T_0 = \langle a^0_1, a^0_2, \ldots, a^0_n \rangle$ be a random test set on $D_0$. The mirror of the original test set $T_0$ in a mirror sub-domain $D_i$ through $m_i$ is a test set $T_i$, written $m_i(T_0)$, defined by $m_i(\langle a^0_1, a^0_2, \ldots, a^0_n \rangle) = \langle a^i_1, a^i_2, \ldots, a^i_n \rangle$, where $m_i(a^0_j) = a^i_j$, $i = 1, \ldots, k$, $j = 1, \ldots, n$. The mirror test set of $T_0$, written $\mathrm{Mirror}(T_0)$, is the test set on $D$ given by $\mathrm{Mirror}(T_0) = \langle a^0_1, a^1_1, a^2_1, \ldots, a^k_1, a^0_2, a^1_2, \ldots, a^k_2, \ldots, a^0_n, a^1_n, \ldots, a^k_n \rangle$.
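The Mirror manipulation can be illustrated with a minimal Python sketch. The function name `mirror` and the example mapping `shift` are assumptions made for illustration; the slides give only the mathematical definition.

```python
def mirror(t0, mirror_functions):
    """Return Mirror(T0): each original test case followed by its images
    under the mirror functions m_1, ..., m_k, in that order."""
    result = []
    for a in t0:
        result.append(a)              # a^0_j, the original test case
        for m in mirror_functions:
            result.append(m(a))       # a^i_j = m_i(a^0_j)
    return result


# Hypothetical example with one mirror sub-domain (k = 1).
def shift(p):
    return (p[0] + 5000, p[1])

print(mirror([(3, 7), (10, 2)], [shift]))
# [(3, 7), (5003, 7), (10, 2), (5010, 2)]
```

The interleaved order matters: every original test case is immediately followed by its mirrors, exactly as in the definition of Mirror(T0).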
  • 11. Adaptive Random Testing 2: Distance. A distance measure on a domain $D$ is a function $\|\cdot,\cdot\|$ from $D^2$ to the real numbers in $[0, \infty)$ such that for all $x, y, z \in D$: $\|x, x\| = 0$, $\|x, y\| = \|y, x\|$ and $\|x, y\| \le \|x, z\| + \|z, y\|$. Let $\|x, y\|$ be a distance measure defined on the input domain $D$. It is extended to the distance between an element $x$ and a set $Y$ of elements by $\|x, Y\| = \min\{\|x, y\| : y \in Y\}$. A test set $T = \langle a_0, a_1, a_2, \ldots, a_n \rangle$ is maximally distanced if for all $i = 1, 2, \ldots, n$, $\|a_i, \{a_0, \ldots, a_{i-1}\}\| \ge \|a_j, \{a_0, \ldots, a_{i-1}\}\|$ for all $j > i$. Let $T$ be any given test set. The distance manipulation of $T$, written $\mathrm{Distance}(T)$, is a re-ordering of the elements of $T$ so that the resulting sequence is maximally distanced.
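One way to compute a maximally distanced ordering is a greedy farthest-first pass, sketched below in Python. This is an illustrative assumption rather than the authors' implementation: the slides define only the property, not an algorithm. The sketch keeps the first element of T in place, uses the Euclidean distance by default, and breaks ties in favour of the earlier element.

```python
import math


def distance_reorder(tests, dist=math.dist):
    """Return a maximally distanced re-ordering of `tests`."""
    if not tests:
        return []
    ordered = [tests[0]]
    remaining = list(tests[1:])
    # nearest[i] is the distance from remaining[i] to the set of chosen elements.
    nearest = [dist(p, ordered[0]) for p in remaining]
    while remaining:
        # Pick the remaining element that is farthest from the chosen set.
        i = max(range(len(remaining)), key=nearest.__getitem__)
        chosen = remaining.pop(i)
        nearest.pop(i)
        ordered.append(chosen)
        # Update each remaining element's distance to the enlarged chosen set.
        nearest = [min(d, dist(p, chosen)) for d, p in zip(nearest, remaining)]
    return ordered


print(distance_reorder([(0, 0), (1, 0), (10, 0), (5, 0)]))
# [(0, 0), (10, 0), (5, 0), (1, 0)]
```

The incremental update of `nearest` keeps the cost quadratic in the size of the test set, which is affordable for test sets of about a thousand cases.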
  • 12. Adaptive Random Testing 3: Filter. Let $\|x, y\|$ be a distance measure defined on the input domain $D$, and let $T$ be any given non-empty test set. A test set $T' = \langle a_1, a_2, \ldots, a_n \rangle$ is said to be ordered according to the distance to $T$ if for all $i = 1, 2, \ldots, n-1$, $\|a_i, T\| \ge \|a_{i+1}, T\|$. Let $S$ and $C$ be two given natural numbers such that $S > C > 0$, and let $T_0, T_1, \ldots, T_k$ be a sequence of test sets of size $S > 1$. Let $U_0 = T_0$. Assume that $U_i = \langle a_1, \ldots, a_{k_i} \rangle$ and $T_{i+1} = \langle c_1, \ldots, c_S \rangle$. Let $T'_{i+1} = \langle b_1, \ldots, b_S \rangle$ be the test set obtained from $T_{i+1}$ by permuting its elements so that $T'_{i+1}$ is ordered according to the distance to $U_i$. Then we inductively define $U_{i+1} = \langle a_1, \ldots, a_{k_i}, b_1, \ldots, b_C \rangle$ for $i = 0, 1, \ldots, k-1$. The test set $U_k$ is called the test set obtained by filtering $C$ out of $S$ test cases.
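The inductive definition of the Filter manipulation can be sketched in Python as follows. The names `filter_tests` and `batches` are hypothetical, and the sketch assumes that candidates are ranked farthest first, so that the C kept from each batch are those farthest from the test cases selected so far.

```python
def filter_tests(batches, c, dist):
    """Filter C out of S: `batches` is the sequence T_0, T_1, ..., T_k.

    U_0 = T_0; each later batch is ordered by decreasing distance to the
    current U_i and its first `c` elements are appended to form U_{i+1}.
    """
    selected = list(batches[0])
    for batch in batches[1:]:
        ranked = sorted(
            batch,
            key=lambda p: min(dist(p, q) for q in selected),
            reverse=True,
        )
        selected.extend(ranked[:c])
    return selected


# Hypothetical one-dimensional example with S = 3 and C = 1.
batches = [[0, 10, 20], [1, 5, 9], [2, 15, 30]]
print(filter_tests(batches, 1, lambda x, y: abs(x - y)))
# [0, 10, 20, 5, 30]
```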
  • 13. Notes on the definitions of ART techniques. Strictly speaking, the results of the above manipulations are not random test sets, even if the test sets before manipulation are random. These manipulations can be combined to obtain more sophisticated adaptive random testing methods. For example, in mirror adaptive random testing the input domain is partitioned, a random test set is first generated on the original sub-domain, it is then manipulated by the Distance operation, and finally it is mirrored into the mirror sub-domains. The set of manipulations defined above may not be complete; there are other manipulations on random test sets. In the literature, ART methods are usually described as test case generation algorithms rather than as manipulation operators. Defining them as manipulations makes the descriptions concise and formal, and makes it easy to recognise the different ways in which they can be combined. The filter manipulation is not used in the experiments reported in this paper; it is defined here to demonstrate that testing methods described as algorithms can also be defined formally as manipulations.
  • 14. The general framework of experiments with software testing methods. (1) Select a set of software under test (SUT) $S_1, S_2, \ldots, S_k$, $k > 0$, where $S_i$ can be the same as or different from $S_j$. (2) Select a measurement $f$ of fault-detecting ability on a single test set. (3) Apply the test method to generate test sets $T_1, T_2, \ldots, T_k$. (4) Test the SUTs on the test sets and measure the fault-detecting ability; let $m_1, m_2, \ldots, m_k$ be the fault-detecting abilities of $T_1, T_2, \ldots, T_k$ as measured by $f$ in testing $S_1, S_2, \ldots, S_k$. (5) Calculate the performance of the test method from $m_1, m_2, \ldots, m_k$ using a fault-detecting ability measure.
  • 15. Background: (4) Measurements of fault detecting ability. P-measure: the probability of detecting at least one fault. The P-measure of the $k$ tests $T_1, T_2, \ldots, T_k$ is defined as $P(T_1, T_2, \ldots, T_k) = d/k$, where $k > 0$ and $d = |\{ T_i \mid \exists a \in T_i\ (S \text{ fails on } a),\ 1 \le i \le k \}|$. When $k \to \infty$, $P(T_1, T_2, \ldots, T_k)$ approaches the probability that a test set $T$ detects at least one fault.
  • 16. Understanding the P-measure. When the test cases in each test set $T$ are generated at random, independently, using the same distribution as the operation profile, the probability that a test set of size $w$ detects at least one fault is $1 - (1 - \theta)^w$, where $\theta$ is the failure probability of the software. Formally, for random testing we have $P(T) \approx 1 - (1 - \theta)^w$, where $w = |T|$. In other words, the P-measure of random testing is an estimate of the value $1 - (1 - \theta)^w$, where $w$ is the test size.
  • 17. Measurements of fault detecting ability (2). E-measure: the average number of failures. The E-measure of the $k$ tests $T_1, T_2, \ldots, T_k$ is defined as $M(T_1, T_2, \ldots, T_k) = \frac{1}{k} \sum_{i=1}^{k} u_i$, where $k > 0$ and $u_i = |\{ a \in T_i \mid S \text{ fails on } a \}|$, $i = 1, \ldots, k$.
  • 18. Understanding the E-measure. When the test sets $T_i$ are all of the same size, say $w$, and the test cases are selected at random, independently, using the same distribution as the operation profile, the E-measure approaches $w\theta$ when $k \to \infty$. Here, $\theta$ is the failure probability of the SUT. Formally, for random testing we have $M(T) \approx w\theta$, where $w = |T|$. In other words, the E-measure of random testing is an estimate of $w\theta$, where $w$ is the test size. When the test set size is 1, the P-measure of random testing equals its E-measure.
  • 19. Measurements of fault detecting ability (3). F-measure: the first failure. The F-measure is defined as $F(T_1, T_2, \ldots, T_k) = \frac{1}{k} \sum_{i=1}^{k} v_i$, where $k > 0$, $T_i = \langle a_1, a_2, \ldots, a_{k_i} \rangle$, and $v_i$ is the natural number such that $S$ fails on the test case $a_{v_i}$ and $S$ is correct on the test case $a_n$ for all $0 < n < v_i$. It is assumed that the sizes of the test sets $T_i$, $i = 1, \ldots, k$, are large enough for $v_i$ to exist.
  • 20. Understanding the F-measure. The position of the first failure follows a geometric distribution, i.e. the distribution of the first success in a sequence of Bernoulli trials, under two conditions: the test cases $a_i$ in each test set $T$ are selected at random independently, and they are selected using the same distribution as the operation profile. In other words, when $k \to \infty$, $F(T_1, T_2, \ldots, T_k) \to 1/\theta$, where $\theta$ is the probability of failure. Summary for random testing: $P(T) \approx 1 - (1 - \theta)^w$, $M(T) \approx w\theta$ and $F(T) \approx 1/\theta$, where $w = |T|$.
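The three measures can be computed directly from their definitions. The Python sketch below is illustrative only: the predicate `fails` stands in for running the SUT on a test case and checking its output, and the function names are assumptions rather than anything defined in the slides.

```python
def p_measure(test_sets, fails):
    """Fraction of test sets that contain at least one failing test case."""
    detected = sum(1 for t in test_sets if any(fails(a) for a in t))
    return detected / len(test_sets)


def e_measure(test_sets, fails):
    """Average number of failing test cases per test set."""
    return sum(sum(1 for a in t if fails(a)) for t in test_sets) / len(test_sets)


def f_measure(test_sets, fails):
    """Average 1-based position of the first failing test case.

    Assumes, as the slides do, that every test set is long enough to
    contain a failing test case; otherwise next() raises StopIteration.
    """
    firsts = [next(i for i, a in enumerate(t, start=1) if fails(a))
              for t in test_sets]
    return sum(firsts) / len(firsts)


# Hypothetical SUT that fails on every multiple of 4.
def fails(a):
    return a % 4 == 0

sets = [[1, 4, 8], [2, 3, 5], [4, 1, 1]]
print(p_measure(sets, fails))                     # 2 of 3 sets detect a failure
print(e_measure(sets, fails))                     # (2 + 0 + 1) / 3
print(f_measure([[1, 4, 8], [4, 1, 1]], fails))   # (2 + 1) / 2
```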
  • 21. Measuring the reliability of test methods. S-measure: the variation of fault-detecting ability. Let $T_1, T_2, \ldots, T_k$ be a number of test sets generated according to a testing method, and let $f$ be a given measurement of fault-detecting ability. The S-measure of the tests $T_1, T_2, \ldots, T_k$ is formally defined as $S(T_1, T_2, \ldots, T_k) = \sqrt{\frac{1}{k-1} \sum_{i=1}^{k} (m_i - \bar{m})^2}$, where $k > 1$, $m_i = f(T_i)$, $i = 1, \ldots, k$, and $\bar{m} = \frac{1}{k} \sum_{i=1}^{k} m_i$. The S-measure is the sample standard deviation of the values $m_1, m_2, \ldots, m_k$ of the $f$-measures on the test sets. The smaller the S-measure, the more reliable the testing method is.
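As a small illustration, the S-measure can be written out from its definition and checked against the sample standard deviation in Python's standard library. The function name `s_measure` and the input values are made up for the example.

```python
import math
import statistics


def s_measure(values):
    """Sample standard deviation of the per-test-set measurements m_1..m_k (k > 1)."""
    k = len(values)
    mean = sum(values) / k
    return math.sqrt(sum((m - mean) ** 2 for m in values) / (k - 1))


# Hypothetical F-measures observed on five test sets.
measurements = [2.0, 1.5, 2.5, 1.0, 3.0]
print(s_measure(measurements))
print(statistics.stdev(measurements))  # same value, up to floating-point rounding
```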
  • 22. The experiment: the goals. First, to evaluate ART methods on the reliability of their fault-detecting abilities with respect to realistic faults. Andrews, Briand and Labiche (2005) found that mutants systematically generated by mutation testing tools can represent real faults very well, so we use mutants in our experiments rather than simulation. Second, to identify the factors that influence the reliability of fault-detecting ability. There are two main candidate factors: the reliability (or, equivalently, the failure probability) of the SUT, and the regularity of the failure domains. This is achieved by designing the experiment to repeat the testing on a number of test sets.
  • 23. Experiment process. A. Selection of subject programs: GCD (the greatest common divisor) and LCM (the least common multiple), both with a two-dimensional input space on natural numbers. B. Generation of mutants: systematically, using the MuJava testing tool. C. Generation of test sets: for each testing method, five test sets were generated, each consisting of 1000 test cases. D. Testing the mutants: both F-measures and E-measures are used. E. Analysis of the test results.
  • 24. Distribution of mutants. Types of mutants according to the mutation operators:
      Mutant type           GCD   LCM   Total
      AOIS                   54    52     106
      AOIU                    9     6      15
      AORB                    4     4       8
      AORS                    0     1       1
      LOI                    14    14      28
      ROR                    15    15      30
      Total                  96    92     187
      Equivalent mutants     28    21      49
      Trivial mutants        27    29      56
      Used in experiment     41    52      93
    The trivial mutants fail on every test case.
  • 25. Test Set Generation Algorithm. Step 1. The input domain is divided into two sub-domains: the original sub-domain $D_0 = \{1..5000\} \times \{1..10000\}$ and the mirror sub-domain $D_1 = \{5001..10000\} \times \{1..10000\}$. Step 2. Use the pseudo-random function of the Java language with different seeds to generate 500 random test cases in the domain $D_0$. Let $T_1$ be the set of these test cases, where the elements are ordered in the order in which they are generated. Step 3. The mirror test set $T_2$ is generated by applying the following mirror mapping $m: D_0 \to D_1$: $m(\langle x, y \rangle) = \langle x + 5000, y \rangle$ (*). The test set RT is obtained by merging $T_1$ with $T_2$, i.e. adding the elements of $T_2$ at the end of $T_1$. Step 4. Let $T = T_1 \cup T_2$. The test set DMART is obtained by applying the Distance manipulation to $T$, i.e. $\mathrm{DMART} = \mathrm{Distance}(T)$. Step 5. Let $T_3$ be the result of applying the Distance manipulation to $T_1$, i.e. $T_3 = \mathrm{Distance}(T_1)$. The test set MDART is obtained by applying the Mirror manipulation to $T_3$ with the mirror mapping $m$ defined in (*) above, i.e. $\mathrm{MDART} = \mathrm{Mirror}(T_3) = \mathrm{Mirror}(\mathrm{Distance}(T_1))$.
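The five steps can be sketched end to end in Python. This is an illustration under stated assumptions, not the code used in the experiment: the original used Java's pseudo-random generator, whereas the sketch uses Python's `random` module, a Euclidean distance, and a greedy farthest-first re-ordering for the Distance manipulation. The name `generate_test_sets` and its parameters are hypothetical.

```python
import math
import random


def distance_reorder(tests):
    """Greedy farthest-first re-ordering (one way to realise Distance(T))."""
    ordered, remaining = [tests[0]], list(tests[1:])
    nearest = [math.dist(p, ordered[0]) for p in remaining]
    while remaining:
        i = max(range(len(remaining)), key=nearest.__getitem__)
        chosen = remaining.pop(i)
        nearest.pop(i)
        ordered.append(chosen)
        nearest = [min(d, math.dist(p, chosen)) for d, p in zip(nearest, remaining)]
    return ordered


def m(p):
    """Mirror mapping (*) from D0 to D1."""
    return (p[0] + 5000, p[1])


def generate_test_sets(seed, n=500):
    rng = random.Random(seed)
    # Step 2: n random test cases in D0 = {1..5000} x {1..10000}, with replacement.
    t1 = [(rng.randint(1, 5000), rng.randint(1, 10000)) for _ in range(n)]
    # Step 3: mirror every test case into D1; RT is T1 followed by T2.
    t2 = [m(p) for p in t1]
    rt = t1 + t2
    # Step 4: DMART = Distance(T1 merged with T2).
    dmart = distance_reorder(rt)
    # Step 5: MDART = Mirror(Distance(T1)), each case followed by its mirror.
    mdart = [q for p in distance_reorder(t1) for q in (p, m(p))]
    return rt, dmart, mdart


rt, dmart, mdart = generate_test_sets(seed=2009)
print(len(rt), len(dmart), len(mdart))  # 1000 1000 1000
```

All three test sets contain the same 1000 test cases; they differ only in the order of execution, which is what the F-measure is sensitive to.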
  • 26. The characteristics of the mutants. [Figure: distribution of average E-measures over 5 test sets, with the trivial mutants marked.]
  • 27. What’s new in the mutants? In the existing research on ART techniques, the subject samples (i.e. the SUTs) have always been selected with very low failure rates. Mayer, Chen, Huang (2006), comparing RT and ART techniques: mutants with a failure rate higher than 0.05 were dismissed, and all the mutants with failure rates of at most 0.05 were treated equally in the statistical analysis. Simulation experiments on ART: Chen et al. (2004) evaluated the performance of ART techniques by simulating failure rates of 0.01, 0.005, 0.001 and 0.0005. In Chen et al. (2007), most of the simulations are for failure rates of less than 25%; there are only three sets of simulation data for failure rates higher than 25%, i.e. $\sqrt{2}/2$, $1/2$ and $\sqrt{2}/4$. Open question: how does ART perform on SUTs that have higher failure rates?
  • 28. Distribution of StDev of E-measures over 5 test sets
  • 29. The relationship between Avg E-measures and their StDev
  • 30. Main findings: (1) Fault detecting abilities. Average F-measures over all non-trivial mutants
  • 31. How much does ART improve on RT? $\frac{F(RT) - F(MDART)}{F(RT)} = \frac{1.973 - 1.901}{1.973} = 3.65\%$ and $\frac{F(RT) - F(DMART)}{F(RT)} = \frac{1.973 - 1.625}{1.973} = 17.64\%$. DMART improves on RT by 17.64%; MDART improves on RT by 3.65%.
  • 32. Comparison with existing works. The results we obtained seem inconsistent with existing results obtained by simulation. In existing work, MART and DART had similar fault-detecting abilities in terms of average F-measures and differed from each other only in the time needed to generate test cases. The scale of improvement is also much smaller than that demonstrated in existing work, which reported improvements of up to 40% in F-measures, whereas we have shown that the average improvement is less than 20%.
  • 33. Main findings: (2) Factors that affect ability. What are the main factors that affect ART’s fault-detecting ability? Corresponding to three levels of F-measures (>2.5, 1.5 ~ 2.5, <1.5), the M-measures can be divided into three areas (240 ~ 340, 340 ~ 680, >680).
  • 34. This explains why our results are different from existing works. Our mutants have higher failure rates, while those used by others have lower failure rates. ART’s fault-detecting abilities increase as the SUT’s failure rate decreases.
  • 35. Main findings: (3) ART’s reliability. DMART is the most reliable testing method among the three.
  • 36. How much does ART improve on RT in reliability? $\frac{S(RT) - S(MDART)}{S(RT)} = \frac{1.413 - 1.324}{1.413} = 6.30\%$ and $\frac{S(RT) - S(DMART)}{S(RT)} = \frac{1.413 - 0.951}{1.413} = 32.70\%$. DMART made a significant improvement in reliability over RT. The improvement that MDART made is much smaller.
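The reported percentages can be reproduced from the averages quoted on the slides. The short Python check below uses only those quoted figures; the helper name `relative_improvement` is an assumption.

```python
def relative_improvement(baseline, value):
    """Relative reduction of `value` with respect to `baseline`, in percent."""
    return (baseline - value) / baseline * 100


# Average F-measures: RT = 1.973, MDART = 1.901, DMART = 1.625.
print(round(relative_improvement(1.973, 1.901), 2))  # 3.65
print(round(relative_improvement(1.973, 1.625), 2))  # 17.64

# Average S-measures: RT = 1.413, MDART = 1.324, DMART = 0.951.
print(round(relative_improvement(1.413, 1.324), 2))  # 6.3
print(round(relative_improvement(1.413, 0.951), 2))  # 32.7
```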
  • 37. Main findings: (4) Factors that affect reliability. Is the SUT’s failure rate a factor that affects ART’s reliability? [Figure: the relationship between the standard deviation of the F-measure on DMART tests and the average failure rates of the mutants.]
  • 38. The correlation coefficients between the average E-measure and the average standard deviation of the F-measure of RT, MDART and DMART on all mutants are -0.735, -0.645 and -0.615, respectively. The reliabilities of these testing methods therefore depend on the SUT’s failure rate. The variation of the F-measure of DMART tests decreases as the average failure rate increases.
  • 39. Is the ‘regularity’ of the failure domain a factor that affects reliability? As the standard deviations of the E-measures increase, the standard deviations of the F-measures tend to increase.
  • 40. The correlation coefficients between the standard deviations of the F-measures of RT, MDART and DMART and the standard deviations of the E-measures for all mutants are 0.574, 0.464 and 0.430, respectively. The average standard deviation is calculated on the mutants in various ranges of standard deviations of E-measures. The reliabilities of the testing methods are related to the standard deviation of the E-measures, i.e. the regularity of the failure domains.
  • 41. Conclusion. We confirmed that ART improves fault-detecting ability even on SUTs with higher failure rates. We discovered that ART significantly improves the reliability of fault-detecting ability: the variations in fault-detecting ability are significantly smaller than those of random testing, where fault-detecting ability is measured by F-measures and the variation is measured by the standard deviation of F-measures. We discovered two factors that affect ART’s reliability. The first is the failure rate of the software under test: when the SUT’s failure rate increases, the test method’s reliability increases. The second is the regularity of the failure domain: when the regularity increases, the reliability also increases, where regularity is measured by the standard deviation of E-measures.
  • 42. References. Yu Liu and Hong Zhu, An Experimental Evaluation of the Reliability of Adaptive Random Testing Methods, Proc. of the Second IEEE International Conference on Secure System Integration and Reliability Improvement (SSIRI 2008), Yokohama, Japan, July 14-17, 2008, pp. 24-31.