Improving Effectiveness of Automated Software Testing
                               in the Absence of Specifications

Tao Xie
Department of Computer Science
North Carolina State University
Raleigh, NC 27695

Abstract

Program specifications can be valuable in improving the effectiveness of automated software testing in generating test inputs and checking test executions for correctness. Unfortunately, specifications are often absent from programs in practice. We present a framework for improving the effectiveness of automated testing in the absence of specifications. The framework supports a set of related techniques, including redundant-test detection, non-redundant-test generation, test selection, test abstraction, and program-spectra comparison. The framework has been implemented, and empirical results have shown that the developed techniques within the framework improve the effectiveness of automated testing by detecting a high percentage of redundant tests among test inputs generated by existing tools, generating non-redundant test inputs to achieve high structural coverage, reducing inspection effort for detecting problems in the program, and exposing behavioral differences during regression testing.

1 Introduction

To reduce the laborious human effort in software testing, developers can conduct automated software testing by using tools to automate some testing activities. Software testing activities typically include generating test inputs, running test inputs, and verifying test executions. Developers can use existing frameworks or tools such as the JUnit testing framework [5] to write unit-test inputs and their expected outputs. The JUnit framework can then automate running test inputs and verifying actual outputs against the manually written assertions. To reduce the burden of manually creating test inputs, developers can use existing test-input generation tools [1, 3, 11] to generate test inputs automatically. After developers modify a program, they can conduct regression testing by rerunning the existing test inputs in order to assure that no regression faults are introduced. Even when expected outputs are not created for the existing test inputs, the actual outputs produced by the new version can be automatically compared with the ones produced by the old version in order to detect behavioral differences. There are two major challenges in automated software testing:

Generate test inputs effectively. In testing object-oriented programs, a test input typically consists of a sequence of method calls on the objects of the class; inputs for method calls consist of not only method arguments but also receiver-object states, which are sometimes structurally complex inputs, such as linked data structures that must satisfy complex properties. Method sequences can be generated to construct desired object states indirectly [11]; however, it is generally expensive to enumerate all possible method sequences even given a small number of argument values and a small bound on the maximum sequence length.

Verify test executions effectively. Test inputs can be automatically generated, but the expected outputs for generated test inputs often cannot be generated without specifications. When no expected outputs are available, developers often rely on program crashes or uncaught exceptions [3] as symptoms of unexpected behavior; however, verifying only whether the program crashes or throws uncaught exceptions exploits these generated test inputs in a limited way. In regression testing, developers can compare the actual outputs of a new version of the program with the actual outputs of a previous version; however, behavioral differences between versions often cannot be propagated to the observable outputs that are compared between versions.

Although specifications can be used to improve the effectiveness of generating test inputs and checking program correctness when running test inputs without expected outputs, specifications often do not exist in practice. Our research focuses on developing a framework for improving the effectiveness of automated testing in the absence of specifications.
The framework includes techniques and tools for improving the effectiveness of generating test inputs and inspecting their executions for correctness, the two major challenges in automated software testing.

2 Framework

Figure 1 shows our framework for improving the effectiveness of automated testing. The framework consists of two groups of components. The first group of components (the redundant-test detector [14] and the non-redundant-test generator [14, 15]) addresses the issues in generating test inputs. The second group of components (the test selector [17], test abstractor [18, 19], and program-spectra comparator [20]) infers program behavior dynamically in order to address the issues in checking the correctness of test executions. The second group of components further sends feedback information to the first group to guide test generation [16].

[Figure 1. Framework for improving effectiveness of automated testing]

2.1 Redundant-Test Detector

Existing unit-test-generation tools generate a large number of test inputs to exercise different sequences of method calls in the interface of the class under test. Different combinations of method calls on the class under test result in a combinatorial explosion of tests. Because of resource constraints, existing test-generation tools often generate different sequences of method calls whose lengths range from one [3] to three [11]. However, sequences of up to three method calls are often insufficient for detecting faults or satisfying test adequacy criteria. In fact, a large portion of these different sequences of method calls exercises no new method behavior; in other words, the tests formed by this large portion of sequences are redundant tests. We have defined redundant tests by using method-input values (including both argument values and receiver-object states). When the method-input values of each method call in a test have been exercised by the existing tests, the test is considered a redundant test even if its sequence of method calls differs from that of any existing test. We have developed a redundant-test detector, which can post-process a test suite generated by existing test-generation tools and output a reduced test suite containing no redundant tests. Our approach not only provides a foundation for existing tools that generate non-redundant tests [2, 10] but also enables any other test-generation tool [3, 11] to avoid generating redundant tests by incorporating redundant-test detection in its test generation process. Our experimental results have shown the effectiveness of the redundant-test detection tool: about 90% of the tests generated by a commercial testing tool [11] are detected and removed by our tool as redundant tests.

2.2 Non-Redundant-Test Generator

Based on the notion of avoiding the generation of redundant tests, we have developed a non-redundant-test generator, which explores the concrete or symbolic receiver-object state space by using method calls (through normal program execution or symbolic execution). The test generator based on concrete-state exploration faces the state-explosion problem. Recently, symbolic execution [8] has been used to directly construct symbolic states for receiver objects [7, 13]; however, this application of symbolic execution requires the user to provide specially constructed class invariants [9]. Without requiring any class invariant, our test generator can also use symbolic execution of method sequences to explore the symbolic receiver-object states and prune this exploration based on novel state comparisons (comparing both heap representations and symbolic representations). Our extension and application of symbolic execution in state exploration not only alleviates the state-explosion problem but also generates relevant method arguments for method sequences automatically by using a constraint solver. Our experimental results have shown the effectiveness of test generation based on symbolic-state exploration: it can achieve higher branch coverage faster than test generation based on concrete-state exploration.
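The redundancy notion underlying both the detector and this generator can be sketched as follows. The string-based state encoding and the class name here are illustrative assumptions for the sketch, not the actual Rostra [14] implementation: a test is redundant if the method-input values (receiver-object state plus arguments) of every call in it have already been exercised.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of redundant-test detection based on method-input values.
public class RedundantTestDetector {
    // Method inputs exercised so far, each encoded as a string.
    private final Set<String> exercised = new HashSet<>();

    // Encode one method call: the receiver state is assumed to be some
    // linearized representation (e.g., a heap walk), here just a string.
    static String encode(String method, String receiverState, List<Object> args) {
        return method + "|" + receiverState + "|" + args;
    }

    /**
     * Returns true if every call in the test has already-exercised
     * method-input values, i.e., the test adds no new method behavior.
     * Also records the test's inputs for checking later tests.
     */
    boolean isRedundant(List<String> encodedCalls) {
        boolean allSeen = exercised.containsAll(encodedCalls);
        exercised.addAll(encodedCalls);
        return allSeen;
    }
}
```

Note that a second test whose call sequence is new but whose method inputs have all been seen before is still flagged as redundant, which is exactly what distinguishes this definition from sequence-based comparison.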
2.3 Test Selector

Because it is infeasible for developers to inspect the actual outputs of a large number of generated tests, we have developed a test selector to select a small, valuable subset of generated tests for inspection. These selected tests exercise new behavior that has not been exercised by the existing test suite. In particular, we use Daikon [4] to infer program behavior dynamically from the execution of the existing (manually constructed) test suite. We next feed the inferred behavior, in the form of specifications, to an existing specification-based test-generation tool [11]. The tool generates tests that violate the inferred behavior. These violating tests are selected for inspection, because they exhibit behavior different from the behavior exhibited by the existing tests. Developers can inspect these violating tests together with the violated properties, equip the tests with expected outputs, and add them to the existing test suite. Our experimental results have shown that the selected tests have a high probability of exposing anomalous program behavior (either faults or failures) in the program.

2.4 Test Abstractor

Instead of selecting a subset of generated tests for inspection, our test abstractor summarizes and abstracts the receiver-object-state transition behavior exercised by all the generated tests. Because the concrete-state transition diagram for receiver objects is too complicated for developers to inspect, the test abstractor uses a state-abstraction technique based on the observers in a class interface; these observers are the public methods whose return types are not void. An abstract state for a concrete state is represented by the concrete state's observable behavior, consisting of the return values of observer-method calls on the concrete state. The abstract states and the transitions among them are used to construct succinct state transition diagrams for developers to inspect. Our evaluation has shown that the abstract-state transition diagrams can help discover anomalous behavior, debug exception-throwing behavior, and understand normal behavior in the class interface.

2.5 Program-Spectra Comparator

In regression testing, comparing the actual outputs of two program versions is limited in exposing the internal behavioral differences during program execution, because internal behavioral differences often cannot be propagated to observable outputs. A program spectrum is used to characterize a program's behavior [12]. We propose a new class of program spectra, called value spectra, to enrich the existing program-spectra family, which primarily includes structural spectra (such as path spectra [6, 12]). Value spectra capture internal program states during a test execution. A deviation is the difference between the value of a variable in a new program version and the corresponding value in an old version. We have developed a program-spectra comparator that compares the value spectra from an old version and a new version, and uses the spectra differences to detect behavior deviations in the new version. Furthermore, value spectra differences can be used to locate deviation roots, which are program locations that trigger the behavior deviations. Inspecting value spectra differences can allow developers to determine whether program changes introduce intended behavioral differences or regression faults. Our experimental results have shown that comparing value spectra can effectively expose behavioral differences between versions even when their actual outputs are the same, and that value spectra differences can be used to locate deviation roots with high accuracy.

2.6 Feedback Loop

Dynamic behavior inference requires a good-quality test suite to infer behavior that is close to what would be described by a specification (if one were manually constructed). On the other hand, specification-based test generation can help produce a good-quality test suite but requires specifications, which often do not exist in practice. There is thus a circular dependency between dynamic behavior inference and (specification-based) test generation. To exploit this circular dependency and alleviate the problem, we propose a feedback loop between behavior inference and test generation. The feedback loop starts with an existing test suite (constructed manually or automatically) or some existing program runs. By using one of the behavior-inference components (the test selector, test abstractor, or program-spectra comparator), we first infer behavior based on the existing test suite or program runs. We then feed the inferred behavior to a specification-based test-generation tool, or to a test-generation tool that can exploit the inferred behavior to improve its test generation. The newly generated tests can be used to infer new behavior, and the new behavior can in turn guide test generation in the subsequent iteration. Iteration terminates when a user-defined maximum iteration number has been reached or no new behavior has been inferred from new tests. This feedback loop provides a means of producing better tests and better-approximated specifications automatically and incrementally. In addition, a by-product of the feedback loop is a set of selected tests for inspection; these selected tests exhibit new behavior that has not been exercised by the existing tests.
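A minimal sketch of this iteration scheme is shown below; the BehaviorInference and TestGenerator interfaces are hypothetical stand-ins for tools such as Daikon [4] and a specification-based test generator [11], and the string encodings of tests and behavior are assumptions for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the behavior-inference / test-generation feedback loop.
public class FeedbackLoop {
    interface BehaviorInference { Set<String> infer(Set<String> tests); }
    interface TestGenerator { Set<String> generate(Set<String> inferredBehavior); }

    /**
     * Iterate behavior inference and test generation, stopping when no
     * new behavior is inferred or a maximum iteration count is reached.
     * The tests set is grown in place with newly generated tests.
     */
    static Set<String> run(Set<String> tests, BehaviorInference infer,
                           TestGenerator gen, int maxIterations) {
        Set<String> behavior = new HashSet<>();
        for (int i = 0; i < maxIterations; i++) {
            Set<String> newBehavior = infer.infer(tests);
            if (behavior.containsAll(newBehavior)) break; // fixed point: nothing new
            behavior.addAll(newBehavior);
            tests.addAll(gen.generate(newBehavior));      // e.g., violating tests
        }
        return behavior;
    }
}
```

The loop stops either at a fixed point (no new inferred behavior) or after the user-defined maximum number of iterations, matching the two termination conditions described above.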
3 Conclusion

We have proposed a framework for improving the effectiveness of automated testing in the absence of specifications. A set of techniques and tools has been developed within the framework. First, we have defined redundant tests based on method-input values and developed a tool for detecting redundant tests among automatically generated tests; these identified redundant tests increase testing time without increasing the ability to detect faults or increasing developers' confidence in the program under test. Second, we have developed a tool that generates only non-redundant tests by executing method calls symbolically to explore the symbolic-state space. Symbolic execution not only allows us to reduce the state space for exploration but also generates relevant method arguments automatically. Third, we have used Daikon [4] to infer behavior exercised by the existing tests and fed the inferred behavior, in the form of specifications, to a specification-based test-generation tool [11]. Developers can inspect those generated tests that violate the inferred behavior, instead of inspecting all of the large number of generated tests. Fourth, we have used the return values of observer methods to group concrete states into abstract states, from which we construct succinct observer abstractions for inspection. Fifth, we have defined value spectra to characterize program behavior, compared the value spectra from an old version and a new version, and used the spectra differences to detect behavior deviations in the new version. We have further used value spectra differences to locate deviation roots. Finally, putting behavior inference and test generation together, we can construct a feedback loop between these two types of dynamic analysis, starting with an existing test suite (constructed manually or automatically) or some existing program runs. The feedback loop produces better tests and better-approximated specifications automatically and incrementally.

4 Acknowledgments

I would like to thank my Ph.D. advisor, David Notkin, and my collaborators in my Ph.D. research: Darko Marinov, Amir Michail, Wolfram Schulte, and Jianjun Zhao.

References

[1] Agitar Agitator 3.0, 2005.
[2] C. Boyapati, S. Khurshid, and D. Marinov. Korat: automated testing based on Java predicates. In Proc. International Symposium on Software Testing and Analysis, pages 123-133, 2002.
[3] C. Csallner and Y. Smaragdakis. JCrasher: an automatic robustness tester for Java. Software: Practice and Experience, 34:1025-1050, 2004.
[4] M. D. Ernst, J. Cockrell, W. G. Griswold, and D. Notkin. Dynamically discovering likely program invariants to support program evolution. IEEE Trans. Softw. Eng., 27(2):99-123, 2001.
[5] E. Gamma and K. Beck. JUnit, 2003. http://www.
[6] M. J. Harrold, G. Rothermel, K. Sayre, R. Wu, and L. Yi. An empirical investigation of the relationship between spectra differences and regression faults. Journal of Software Testing, Verification and Reliability, 10(3):171-194, 2000.
[7] S. Khurshid, C. S. Pasareanu, and W. Visser. Generalized symbolic execution for model checking and testing. In Proc. 9th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pages 553-568, April 2003.
[8] J. C. King. Symbolic execution and program testing. Commun. ACM, 19(7):385-394, 1976.
[9] B. Liskov and J. Guttag. Program Development in Java: Abstraction, Specification, and Object-Oriented Design. Addison-Wesley, 2000.
[10] D. Marinov and S. Khurshid. TestEra: A novel framework for automated testing of Java programs. In Proc. 16th IEEE International Conference on Automated Software Engineering, pages 22-31, 2001.
[11] Parasoft Jtest 4.5, 2003. http://www.parasoft.com/.
[12] T. Reps, T. Ball, M. Das, and J. Larus. The use of program profiling for software maintenance with applications to the year 2000 problem. In Proc. 6th ESEC/FSE, pages 432-449, 1997.
[13] W. Visser, C. S. Pasareanu, and S. Khurshid. Test input generation with Java PathFinder. In Proc. 2004 ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 97-107, 2004.
[14] T. Xie, D. Marinov, and D. Notkin. Rostra: A framework for detecting redundant object-oriented unit tests. In Proc. 19th IEEE International Conference on Automated Software Engineering, pages 196-205, Sept. 2004.
[15] T. Xie, D. Marinov, W. Schulte, and D. Notkin. Symstra: A framework for generating object-oriented unit tests using symbolic execution. In Proc. 11th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pages 365-381, April 2005.
[16] T. Xie and D. Notkin. Mutually enhancing test generation and specification inference. In Proc. 3rd International Workshop on Formal Approaches to Testing of Software, volume 2931 of LNCS, pages 60-69, 2003.
[17] T. Xie and D. Notkin. Tool-assisted unit test selection based on operational violations. In Proc. 18th IEEE International Conference on Automated Software Engineering, pages 40-48, 2003.
[18] T. Xie and D. Notkin. Automatic extraction of object-oriented observer abstractions from unit-test executions. In Proc. 6th International Conference on Formal Engineering Methods, pages 290-305, Nov. 2004.
[19] T. Xie and D. Notkin. Automatic extraction of sliced object state machines for component interfaces. In Proc. 3rd Workshop on Specification and Verification of Component-Based Systems, pages 39-46, October 2004.
[20] T. Xie and D. Notkin. Checking inside the black box: Regression testing by comparing value spectra. IEEE Transactions on Software Engineering, 31(10):869-883, October 2005.