Evolutionary Testing of Stateful Systems:
a Holistic Approach

Matteo Miraz                Advisor: prof. L. Baresi
February 18, 2011         Coadvisor: prof. P. L. Lanzi
Motivations                                                                2




• Software systems permeate (almost) every aspect of our lives
   • Software is buggy
   • In 2002, the cost of software errors was estimated
     at 60 billion USD




     1999: NASA Mars Climate Orbiter ($125 million)   2011: iPhone Alarm

         DEI
Motivations                                                         3




Testing is an effective technique to increase the quality of software
   • It does not guarantee that the system is error-free
   • It is extremely expensive, as it requires up to half of the entire
     development effort


Stringent time-to-market requirements
limit the testing effort
• We spoke with a major software
   consulting company, with major Italian
   banks as customers…
     • The good: they save up to half of
        the entire development effort
     • The bad: they do not test anything
     • The ugly: we had to explain to them
        what testing is

Related Work                                     4




• The research community has proposed several ways
  to generate tests automatically

   • Symbolic Execution
   • Search-Based Software Testing




Symbolic Execution                                                       5




•   The program is executed with
    symbolic values as input parameters
    (instead of concrete inputs)

•   Paths are associated with a
    Path Condition (PC)
     • A PC constrains the symbolic
       inputs so that execution traverses the path
     • It is built incrementally:
       at each condition a formula is
       added to the Path Condition

•   A constraint solver is used
    to find concrete inputs that satisfy a PC

[Figure: the path condition grows from "PC: a = 3" to
 "PC: a = 3 & log10(b) = 7"; the constraint solver then yields
 the concrete call toTest(3, 10 000 000)]
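
The slide's example can be made concrete with a small Java sketch. The method name toTest and the inputs (3, 10 000 000) come from the slide; the method body is an assumption reconstructed from the path condition shown there. The comments trace how the path condition grows at each branch.

```java
// Sketch only: the body of toTest is assumed, reconstructed from the
// path condition "a = 3 & log10(b) = 7" shown on the slide.
class SymbolicExecutionSketch {

    // Returns true when the innermost branch (the target) is reached.
    static boolean toTest(int a, double b) {
        double w = Math.log10(b);   // symbolic: w = log10(b)
        if (a == 3) {               // PC: a = 3
            if (w == 7) {           // PC: a = 3 & log10(b) = 7
                return true;        // target
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A constraint solver would turn the final PC into concrete inputs;
        // here we simply check the solution reported on the slide.
        System.out.println(toTest(3, 10_000_000));
        // An input violating the first conjunct stays off the path.
        System.out.println(toTest(4, 10_000_000));
    }
}
```

The exact comparison w == 7 works here because Java specifies that Math.log10 returns n when its argument equals 10^n.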




Search Based Software Testing: a survey                                          6




• Goal: complete branch coverage
• Paradigm: “divide and conquer”
   • The program is analyzed
     to identify branches
   • Each branch is considered
     separately from the others

[Figure: control-flow graph: entry → w = Math.log(b) → if (a == 3);
 on T → if (w == 7); on T → // target; the F edges lead to exit]



Search Based Software Testing: a survey                                                                   7




Once a target has been selected
• Identify dependent branches
• Search for inputs able to reach the target
• Fitness Function:
   • Approach Level + Normalized distance
   • Example: Fitness(a = 0, b = 0) = 1 + norm(| 0 – 3 |) = 1.003

   Parameters        Fitness
   a=0, b=0          1.003
   a=1, b=0          1.002
   a=3, b=0          0.999
   a=4, b=0          1.001
   a=3, b=10         0.005
   a=3, b=100        0.003

[Figure: the control-flow graph of the previous slide, annotated with
 "distance: | 0 – 3 |" and "Approach Level: 1" at if (a == 3), and
 "Approach Level: 0" at if (w == 7)]
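
The fitness computation for this example can be sketched in Java as follows. The normalization norm(d) = 1 − 1.001^(−d) is an assumption (the slide does not define norm); it is a common choice in search-based testing and matches the values listed above for a ≠ 3. The class and method names are illustrative.

```java
// Sketch only: norm() is an assumed normalization, not taken from the slide.
class BranchFitnessSketch {

    // Maps a non-negative branch distance into [0, 1].
    static double norm(double distance) {
        return 1.0 - Math.pow(1.001, -distance);
    }

    // Fitness for the target guarded by (a == 3) and (w == 7); lower is better.
    static double fitness(int a, double b) {
        if (a != 3) {
            // The first branch is missed: approach level 1.
            return 1 + norm(Math.abs(a - 3));
        }
        double w = Math.log(b);
        if (w != 7) {
            // Only the last branch is missed: approach level 0.
            return 0 + norm(Math.abs(w - 7));
        }
        return 0.0; // target reached
    }

    public static void main(String[] args) {
        System.out.printf("%.3f%n", fitness(0, 0));   // approach level 1
        System.out.printf("%.3f%n", fitness(1, 0));
        System.out.printf("%.3f%n", fitness(3, 10));  // approach level 0
        System.out.printf("%.3f%n", fitness(3, 100));
    }
}
```

The approach level dominates the sum, so any input that passes the first condition scores better than any input that misses it; the normalized distance only ranks inputs within the same level.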



Problems of Search Based Software Testing                   8




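Usage of a single guidance
   • Coarse / Misleading

 void foo(int[] a) {
   int flag = 0;
   for (int i = 0; i < 10; i++)
     if (a[i] == 23)
       flag = 1;

   // The condition only reads flag, so a branch-distance
   // guidance cannot tell how close a[] is to containing 23
   if (flag == 1) {
     // target
   }
 }

Works on isolated functions
   • Modern systems are object-oriented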
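
To see why this guidance is coarse, the sketch below computes a branch distance for the condition flag == 1 on several inputs. The distance |flag − 1| and the helper names are assumptions for illustration. Every array without a 23 gets the same score, so the search has no gradient to follow.

```java
// Sketch only: branchDistance() is an illustrative distance for (flag == 1).
class FlagProblemSketch {

    // Replays foo() up to the guarded condition and returns |flag - 1|.
    static int branchDistance(int[] a) {
        int flag = 0;
        for (int i = 0; i < 10; i++) {
            if (a[i] == 23) {
                flag = 1;
            }
        }
        return Math.abs(flag - 1);
    }

    // Builds a 10-element array filled with the given value.
    static int[] filled(int value) {
        int[] a = new int[10];
        java.util.Arrays.fill(a, value);
        return a;
    }

    public static void main(String[] args) {
        // 22 is "almost" 23, yet it scores exactly like 0 or 1000.
        System.out.println(branchDistance(filled(0)));
        System.out.println(branchDistance(filled(22)));
        System.out.println(branchDistance(filled(1000)));
        System.out.println(branchDistance(filled(23)));
    }
}
```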




Targeting Stateful Systems                          9




• We target stateful systems (e.g., Java classes)

• Feature – State Loop
   • A feature might put objects
     in “particular” states
   • A “particular” state might
     enable other features

[Figure: feature–state loop; one edge is labeled "enables"]
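
A hypothetical Java class illustrates the loop; the class BoundedStack below is not from the talk. Calling push (a feature) moves the object into a non-empty state, and only that state lets pop run its main path, so a test must chain the calls in the right order to exercise it.

```java
// Hypothetical class used only to illustrate the feature-state loop.
class BoundedStack {
    private final int[] items = new int[2];
    private int size = 0;

    // Feature that changes the state: the stack becomes non-empty.
    boolean push(int value) {
        if (size == items.length) {
            return false;          // reachable only after two pushes
        }
        items[size++] = value;
        return true;
    }

    // Feature enabled by the non-empty state.
    int pop() {
        if (size == 0) {
            throw new IllegalStateException("empty stack");
        }
        return items[--size];
    }

    public static void main(String[] args) {
        BoundedStack s = new BoundedStack();
        s.push(1);
        s.push(2);
        System.out.println(s.push(3)); // the "full" branch needs two prior pushes
        System.out.println(s.pop());
    }
}
```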




Our Approach: TestFul                                          10




An individual of the evolutionary
algorithm (a test) is a sequence of
constructor and method calls on the test
cluster (i.e., the class under test + all the
required dependencies)

We use a multi-objective EA, guided by:

•   The coverage of tests
     •   Tests with high coverage exercise
         the class under test more deeply and
         put objects in more interesting states

•   The compactness of tests
     •   Unnecessarily long tests waste
         computational resources during their
         evaluation
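
One way to picture the two objectives is a Pareto-dominance check between tests. The sketch below is not TestFul's actual code: the class TestIndividual, its fields, and the string encoding of calls are assumptions made for illustration. A test dominates another when it is at least as good on both coverage (higher is better) and length (shorter is better) and strictly better on one of them.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: names and representation are illustrative, not TestFul's API.
class TestIndividual {
    final List<String> calls;   // sequence of constructor and method calls
    final int coverage;         // e.g. number of covered branches

    TestIndividual(int coverage, String... calls) {
        this.coverage = coverage;
        this.calls = Arrays.asList(calls);
    }

    int length() {
        return calls.size();
    }

    // Pareto dominance: no worse on both objectives, strictly better on one.
    boolean dominates(TestIndividual other) {
        boolean noWorse = coverage >= other.coverage && length() <= other.length();
        boolean better = coverage > other.coverage || length() < other.length();
        return noWorse && better;
    }

    public static void main(String[] args) {
        TestIndividual shortTest = new TestIndividual(5, "new Stack()", "push(1)");
        TestIndividual longTest = new TestIndividual(5, "new Stack()", "push(1)", "size()");
        TestIndividual deepTest = new TestIndividual(8, "new Stack()", "push(1)", "pop()");
        System.out.println(shortTest.dominates(longTest)); // same coverage, shorter
        System.out.println(deepTest.dominates(shortTest)); // more coverage but longer
    }
}
```

Tests that trade coverage against length do not dominate each other, so a multi-objective algorithm can keep both in the population.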




Improvements                                    11




How can we improve        How can we correctly
the efficiency?           reward tests?
• We use efficiency       • We use
  enhancement               complementary
  techniques:               coverage criteria:
  • Local Search            • Control Flow Graph
  • Seeding                 • Data Flow Graph
  • Fitness Inheritance     • Behavioral




Local Search                                                     12




• We hybridize the evolutionary algorithm with a step of local search

   •   EA works at class level
        • Reaches complex
           state configurations

   •   Local search works on methods
        • It focuses on the easiest
           missing element
        • It adopts a simpler search
           strategy (hill climbing)

   •   Which element to improve?
        • One of the best elements
        • 5% of the population
        • 10% of the population




Local Search                                                       14




• At each generation of the Evolutionary Algorithm, we pick the { 5% /
  10% / best } test and we target the easiest branch to reach among
  those not exercised yet
• We perform a local search by using a simple algorithm (hill climbing)
   • The guidance is provided by the following fitness function:
     [Formula: shown as an image on the original slide]

• The result (if any) is merged in the evolutionary algorithm’s population
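
The local-search step can be pictured with a plain hill climber over a single integer input. This is a minimal sketch, not TestFul's implementation: the neighborhood (±1), the step cap, and the example fitness |x − 3| are assumptions, and the fitness function used by the tool is not reproduced here.

```java
import java.util.function.IntToDoubleFunction;

// Sketch only: a generic hill climber that minimizes a fitness function.
class HillClimbingSketch {

    static int hillClimb(int start, IntToDoubleFunction fitness, int maxSteps) {
        int current = start;
        for (int step = 0; step < maxSteps; step++) {
            int best = current;
            // Explore the two neighbors and keep the better one, if any.
            for (int candidate : new int[] {current - 1, current + 1}) {
                if (fitness.applyAsDouble(candidate) < fitness.applyAsDouble(best)) {
                    best = candidate;
                }
            }
            if (best == current) {
                return current;     // local optimum: no neighbor improves
            }
            current = best;
        }
        return current;
    }

    public static void main(String[] args) {
        // Branch distance of the condition (a == 3), used as fitness.
        IntToDoubleFunction distance = a -> Math.abs(a - 3);
        System.out.println(hillClimb(0, distance, 100));
    }
}
```

Hill climbing stops at the first local optimum, which is why it suits the "easiest missing element": the evolutionary algorithm is left to escape plateaus and reach complex states.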




Contribution of the Local Search                         15




[Figure: results on a simple problem (Fraction)]

[Figure: results on a complex problem (Disjoint Set Fast)]




Seeding                                                                      16




• Problem:
   •   TestFul starts the evolution from a low-quality population
• Solution:
   •   Run an inexpensive technique (random search) for a short time,
       and use its result as the initial population
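
A seeding step along these lines can be sketched as follows. The integer-array encoding of a test, the scoring function, and all names are assumptions for illustration; the sketch only shows the idea of sampling random candidates for a fixed budget and keeping the best ones as the initial population.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;
import java.util.function.ToIntFunction;

// Sketch only: tests are encoded as int arrays purely for illustration.
class SeedingSketch {

    // Samples `budget` random candidates and returns the `size` best ones.
    static List<int[]> seed(int budget, int size, Random rnd, ToIntFunction<int[]> score) {
        List<int[]> sampled = new ArrayList<>();
        for (int i = 0; i < budget; i++) {
            int[] candidate = new int[5];
            for (int j = 0; j < candidate.length; j++) {
                candidate[j] = rnd.nextInt(100);
            }
            sampled.add(candidate);
        }
        // Highest score first, then truncate to the population size.
        sampled.sort(Comparator.comparingInt(score).reversed());
        return new ArrayList<>(sampled.subList(0, Math.min(size, sampled.size())));
    }

    public static void main(String[] args) {
        // Toy score standing in for coverage: how many values are below 10.
        ToIntFunction<int[]> score = t -> {
            int hits = 0;
            for (int v : t) {
                if (v < 10) hits++;
            }
            return hits;
        };
        List<int[]> population = seed(200, 10, new Random(42), score);
        System.out.println(population.size());
    }
}
```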




Fitness Inheritance                                                   17




• Problem:
   •   Executing tests and collecting coverage is expensive
• Solution:
   •   Evaluate the “real” fitness only on part of the population
        • The other individuals inherit the fitness of their parents
   •   Which policy should select the individuals to evaluate?
        • Uniform selection
        • Frontier selection: better tests are evaluated more often
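
Fitness inheritance with uniform selection can be sketched as below. The evaluation probability, the averaging of the parents' fitness, and every name are assumptions; the slide does not state how the inherited value is computed.

```java
import java.util.Random;
import java.util.function.DoubleSupplier;

// Sketch only: inherited fitness is assumed to be the parents' average.
class FitnessInheritanceSketch {

    // With probability pEvaluate runs the expensive evaluation,
    // otherwise returns the average of the parents' fitness.
    static double fitnessOf(double parentA, double parentB, double pEvaluate,
                            Random rnd, DoubleSupplier realEvaluation) {
        if (rnd.nextDouble() < pEvaluate) {
            return realEvaluation.getAsDouble();   // execute the test, collect coverage
        }
        return (parentA + parentB) / 2.0;          // cheap estimate
    }

    public static void main(String[] args) {
        Random rnd = new Random(7);
        int[] evaluations = {0};
        DoubleSupplier expensive = () -> {
            evaluations[0]++;
            return 0.9;
        };
        for (int i = 0; i < 1000; i++) {
            fitnessOf(0.4, 0.6, 0.25, rnd, expensive);
        }
        // Roughly a quarter of the offspring pay the full evaluation cost.
        System.out.println(evaluations[0]);
    }
}
```

A frontier policy would replace the fixed probability with one that grows with the quality of the individual, so that promising tests are re-evaluated more often.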




Improvements                                    18




How can we improve        How can we correctly
the efficiency?           reward tests?
• We use efficiency       • We use
   enhancement              complementary
   techniques:              coverage criteria:
  • Local Search            • Control Flow Graph
  • Seeding                 • Data Flow Graph
  • Fitness Inheritance     • Behavioral




Coverage of the Control-Flow Graph                                19




TestFul aims to maximize:
   • The number of basic blocks executed (~ statement coverage)
   • The number of branches exercised
We compared our approach against:
   • jAutoTest: a Java port of Bertrand Meyer’s random testing approach
   • Randoop: Michael Ernst’s feedback-directed random testing
   • etoc: Paolo Tonella’s Evolutionary Testing of Classes
On a benchmark of classes from:
   • Literature
   • Known software libraries
   • Independent testing benchmarks

[Slide annotation: 10 min]




Coverage of the Data-Flow Graph                                             20




The fault-detection effectiveness of statement and branch coverage has been disputed


•   Criteria on the coverage of the Data-Flow Graph have been proposed
     • They fit object-oriented systems well
     • Data dependency
          • Statements might define the value of a variable (e.g., v = 10)
          • Statements might use the value of some variables (e.g., print(v))
               • P-Use if the use happens in a predicate (e.g., if (v == 3))

•   TestFul leverages Data-Flow information
     • Extend the fitness function with all def-use and all def-puse coverage
     • Improve Local Search and solve data-dependent problems
       (e.g., flag variable)
•   Tracking data-flow coverage is expensive!
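
Def-use tracking can be illustrated with a tiny hand-instrumented example. The tracker below is a sketch under stated assumptions (string labels for definition and use sites, a single tracked variable); it is not how TestFul instruments code. Each use is paired with the definition that last wrote the variable, which lets the flag example distinguish the definition flag = 1 from flag = 0.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Sketch only: manual instrumentation of the flag example for one variable.
class DefUseSketch {
    private String lastDef;                                   // most recent definition site
    private final Set<String> pairs = new LinkedHashSet<>();  // covered def-use pairs

    void def(String site) {
        lastDef = site;
    }

    void use(String site) {
        pairs.add(lastDef + " -> " + site);
    }

    // foo() from the earlier slide, with def/use events recorded for `flag`.
    static Set<String> run(int[] a) {
        DefUseSketch t = new DefUseSketch();
        int flag = 0;
        t.def("d1: flag = 0");
        for (int i = 0; i < a.length; i++) {
            if (a[i] == 23) {
                flag = 1;
                t.def("d2: flag = 1");
            }
        }
        t.use("p-use: if (flag == 1)");
        if (flag == 1) {
            // target
        }
        return t.pairs;
    }

    public static void main(String[] args) {
        System.out.println(run(new int[] {1, 2, 3}));
        System.out.println(run(new int[] {1, 23, 3}));
    }
}
```

The pair that starts at d2 is a separate coverage goal, so a search rewarded for def-puse coverage has a reason to look for inputs containing 23 even though the branch distance on flag is flat.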




Coverage of the Data-Flow Graph                           21




We performed an extensive empirical comparison against
• Java PathFinder (symbolic execution)
• TestFul guided only by Control-Flow information

Results:
• The structural coverage remains high,
  in spite of the higher monitoring cost
• TestFul outperforms JPF
  (e.g., statement coverage: 89% vs. 35%)
• The data-flow coverage increases a little,
  and its standard deviation decreases significantly



Some insights from our empirical evaluation…                 22




             «container classes are the de facto benchmark
                 for testing software with internal state»
                              [ Arcuri 2010 ]




Some insights from our empirical evaluation…                      23




S. Mouchawrab, L. C. Briand, Y. Labiche, and M. Di Penta.
Assessing, Comparing, and Combining State Machine-Based Testing and
Structural Testing: A Series of Experiments.
IEEE Transactions on Software Engineering, 2010

Coverage of the Behavioral Model                                      24




White-Box coverage criteria judge tests by considering the covered code
    • If we execute the code that contains an error, we’ll likely spot it!
    • What if there is a high-level problem? (e.g., a feature is missing)

Black-Box testing derives tests from the specification of the system
    • Adopts an external perspective and is more resilient to high-level errors
    • Requires the specification of the system, often not available
    • We can both infer the behavioral model of the system, and reward
      tests according to their ability to thoroughly exercise the system




Behavioral Coverage: Preliminary Results   25




Conclusions and Future Work                                      26




• Contributions:
   • Search-Based Software Testing
   • Holistic Approach
   • Efficiency Enhancement Techniques
   • Complementary Coverage Criteria
   • Extensive Empirical Validation

• Most interesting research directions
   • Use more sources to seed the evolutionary algorithm
   • Consider coarse-grained tests (e.g. integration testing)
   • Consider other types of stateful systems (e.g., services)




Thank you for your attention…
                    Questions?




