Scaling Regression Testing to Large Software Systems


Published on

  • Be the first to comment

  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Scaling Regression Testing to Large Software Systems

  1. 1. Scaling Regression Testing to Large Software Systems Alessandro Orso, Nanjuan Shi, and Mary Jean Harrold College of Computing Georgia Institute of Technology Atlanta, Georgia {orso|clarenan|harrold} ABSTRACT When software is modified, during development and main- sion of the software product, P , and needs to regression test tenance, it is regression tested to provide confidence that it before committing the changes to a repository or before the changes did not introduce unexpected errors and that release. new features behave as expected. One important prob- One important problem that D must face is how to select lem in regression testing is how to select a subset of test an appropriate subset T of T to rerun on P . This process cases, from the test suite used for the original version of the is called regression test selection (RTS hereafter). A simple software, when testing a modified version of the software. approach to RTS is to rerun all test cases in T on P , that Regression-test-selection techniques address this problem. is, to select T = T . However, rerunning all test cases in T Safe regression-test-selection techniques select every test case can be expensive and, when there are limited changes be- in the test suite that may behave differently in the origi- tween P and P , may involve unnecessary effort. Therefore, nal and modified versions of the software. Among existing RTS techniques (e.g., [2, 4, 6, 7, 9, 15, 17, 20, 22, 23]) use safe regression testing techniques, efficient techniques are of- information about P , P , and T to select a subset of T with ten too imprecise and achieve little savings in testing effort, which to test P , thus reducing the testing effort. An im- whereas precise techniques are too expensive when used on portant property for RTS techniques is safety. A safe RTS large systems. This paper presents a new regression-test- technique selects, under certain assumptions, every test case selection technique for Java programs that is safe, precise, in the test suite that may behave differently in the original and yet scales to large systems. It also presents a tool that and modified versions of the software [17]. Safety is impor- implements the technique and studies performed on a set tant for RTS techniques because it guarantees that T will of subjects ranging from 70 to over 500 KLOC. The studies contain all test cases that may reveal regression faults in P . show that our technique can efficiently reduce the regression In this paper, we are interested in safe RTS techniques only. testing effort and, thus, achieve considerable savings. Safe RTS techniques (e.g., [6, 7, 15, 17, 20, 22] differ in Categories and Subject Descriptors: D.2.5 [Software efficiency and precision. Efficiency and precision, for an RTS Engineering]: Testing and Debugging—Testing tools; technique, are generally related to the level of granularity at General Terms: Algorithms, Experimentation, Verifica- which the technique operates [3]. Techniques that work at a tion high level of abstraction—for instance, by analyzing change and coverage information at the method or class level—are Keywords: Regression testing, testing, test selection, soft- more efficient, but generally select more test cases (i.e., they ware evolution, software maintenance are less precise) than their counterparts that operate at a fine-grained level of abstraction. Conversely, techniques that 1. INTRODUCTION work at a fine-grained level of abstraction—for instance, by As software evolves, regression testing is applied to mod- analyzing change and coverage information at the statement ified versions of the software to provide confidence that the level—are precise, but often sacrifice efficiency. changed parts behave as intended and that the changes did In general, for an RTS technique to be cost-effective, the not introduce unexpected faults, also known as regression cost of performing the selection plus the cost of rerunning faults. In the typical regression testing scenario, D is the the selected subset of test cases must be less than the overall developer of a software product P , whose latest version has cost of rerunning the complete test suite [23]. For regression- been tested using a test suite T and then released. During testing cost models (e.g., [10, 11]), the meaning of the term maintenance, D modifies P to add new features and to fix cost depends on the specific scenario considered. For exam- faults. After performing the changes, D obtains a new ver- ple, for a test suite that requires human intervention (e.g., to check the outcome of the test cases or to setup some machin- ery), the savings must account for the human effort that is saved. For another example, in a cooperative environment, Permission to make digital or hard copies of all or part of this work for in which developers run an automated regression test suite personal or classroom use is granted without fee provided that copies are before committing their changes to a repository, reducing not made or distributed for profit or commercial advantage and that copies the number of test cases to rerun may result in early avail- bear this notice and the full citation on the first page. To copy otherwise, to ability of updated code and improve the efficiency of the republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. development process. Although empirical studies show that SIGSOFT’04/FSE-12, Oct. 31–Nov. 6, 2004, Newport Beach, CA, USA. existing safe RTS techniques can be cost-effective [6, 7, 19], Copyright 2004 ACM 1-58113-855-5/04/0010 ...$5.00.
  2. 2. such studies were performed using subjects of limited size. formation about which classes and interfaces have syntac- In our preliminary studies we found that, in many cases, ex- tically changed, to identify the parts of the program that isting safe techniques are not cost-effective when applied to may be affected by the changes between P and P . (Without large software systems: efficient techniques tend to be too loss of generality, we assume information about syntactically imprecise and often achieve little or no savings in testing changed classes and interfaces to be available. This informa- effort; precise techniques are generally too expensive to be tion can be easily gathered, for example, from configuration- used on large systems. For example, for one of the subjects management systems, IDEs that use versioning, or by com- studied, it took longer to perform RTS than to run the whole parison of the two versions of the program.) The output of test suite. this phase is a subset of the classes and interfaces in the pro- In this paper, we present a new RTS algorithm for Java gram (hereafter, referred to as the partition). In the selec- programs that handles the object-oriented features of the tion phase, the technique takes as input the partition identi- language, is safe and precise, and still scales to large sys- fied by the first phase and performs edge-level test selection tems. The algorithm consists of two phases: partitioning on the classes and interfaces in the partition. Edge-level test and selection. The partitioning phase builds a high-level selection selects test cases by analyzing change and cover- graph representation of programs P and P and performs age information at the level of the flow of control between a quick analysis of the graphs. The goal of the analysis is statements (see Section 2.2 for details). To perform edge- to identify, based on information on changed classes and in- level selection, we leverage an approach previously defined terfaces, the parts of P and P to be further analyzed. The by some of the authors [7]. selection phase of the algorithm builds a more detailed graph Because of the partitioning performed in the first phase, representation of the identified parts of P and P , analyzes the low-level, expensive analysis is generally performed on the graphs to identify differences between the programs, and only a small fraction of the whole program. Although only a selects for rerun test cases in T that traverse the changes. part of the program is analyzed, the approach is still—under Although the technique is defined for the Java language, certain assumptions—safe because (1) the partitioning iden- it can be adapted to work with other object-oriented (and tifies all classes and interfaces whose behavior may change traditional procedural) languages. as a consequence of the modifications to P , and (2) the edge- We also present DejaVOO: a tool that we developed and level technique we use on the selection phase is safe [7]. The that implements our RTS technique. assumptions for safety are discussed in Section 2.3. Finally, we present a set of empirical studies performed In the next two sections, we illustrate the two phases in using DejaVOO on a set of Java subjects ranging from 70 detail, using the example provided in Figures 1 and 2. The to over 500 KLOC. The studies show that, for the subjects example consists of a program P (Figure 1) and a modified considered, our technique is considerably more efficient than version of P , P (Figure 2). The differences between the two an existing precise technique that operates at a fine-grained programs are highlighted in the figures. Note that, for ease level of abstraction (89% on average for the largest subject). of presentation, we align and use the same line numbering The studies also show that the selection achieves consider- for corresponding lines of code in P and P . Also for ease of able savings in overall regression testing time. For the three presentation, in the rest of the paper we use the term type to subjects, our technique saved, on average, 19%, 36%, and indicate both classes and interfaces and the terms super-type 63% of the regression testing time. and sub-type to refer to type-hierarchical relation (involving The main contributions of this paper are: two classes, a class and an interface, or two interfaces). • The definition of a new technique for RTS that is ef- 2.1 Partitioning fective and can scale to large systems. The first phase of the approach performs a high-level anal- • The description of a prototype tool, DejaVOO, that ysis of P and P to identify the parts of the program that implements the technique. may be affected by the changes to P . The analysis is based • A set of empirical studies that show and discuss the on purely syntactic changes between P and P and on the effectiveness and efficiency of our technique. These are relationships among classes and interfaces in the program. the first studies that apply safe RTS to real systems of 2.1.1 Accounting for Syntactic Changes these sizes. Without loss of generality, we classify program changes into two groups: statement-level changes and declaration- 2. TWO-PHASE RTS TECHNIQUE level changes. A statement-level change consists of the mod- The basic idea behind our technique is to combine the ification, addition, or deletion of an executable statement. effectiveness of RTS techniques that are precise but may These changes are easily handled by RTS: each test case that be inefficient on large systems (e.g., [7, 17, 18]) with the traverses the modified part of the code must be re-executed. efficiency of techniques that work at a higher-level of ab- Figures 1 and 2, at line 9, show an example of statement- straction and may, thus, be imprecise (e.g., [9, 22]). We do level change from P to P . Any execution that exercises this using a two-phase approach that performs (1) an ini- that statement, will behave differently for P and P . tial high-level analysis, which identifies parts of the system A declaration-level change consists of the modification of to be further analyzed, and (2) an in-depth analysis of the a declaration. Examples of such changes are the modifica- identified parts, which selects the test cases in T to rerun. tion of the type of a variable, the addition or removal of a We call these two phases partitioning and selection. In method, the modification of an inheritance relationship, the the partitioning phase, the technique analyzes the program change of type in a catch clause, or the change of a modi- to identify hierarchical, aggregation, and use relationships fiers list (e.g., the addition of modifier “synchronized” to a among classes and interfaces [13]. Then, the technique uses method). These changes are more problematic for RTS than the information about these relationships, together with in- statement-level changes because they affect the behavior of
  3. 3. Program P Program P ’ 1: p u b l i c c l a s s SuperA { 1: p u b l i c c l a s s SuperA { 2: i n t i =0; 2: i n t i =0; 3: public void foo ( ) { 3: public void foo ( ) { 4: System . o u t . p r i n t l n ( i ) ; 4: System . o u t . p r i n t l n ( i ) ; 5: } 5: } 6: } 6: } 7: p u b l i c c l a s s A e x t e n d s SuperA { 7: p u b l i c c l a s s A e x t e n d s SuperA { 8: p u b l i c v o i d dummy ( ) { 8: p u b l i c v o i d dummy ( ) { 9: i −−; 9: i ++; 10: System . o u t . p r i n t l n (− i ) ; 10: System . o u t . p r i n t l n (− i ) ; 11: } 11: } 12 a : public void foo ( ) { 12 b : System . o u t . p r i n t l n ( i + 1 ) ; 12 c : } 12: } 12 d : } 13: public c l a s s SubA e x t e n d s A { } 13: public c l a s s SubA e x t e n d s A { } 14: public class B { 14: public class B { 15: p u b l i c void bar ( ) { 15: p u b l i c void bar ( ) { 16: SuperA a=L i b C l a s s . getAnyA ( ) ; 16: SuperA a=L i b C l a s s . getAnyA ( ) ; 17: a . foo ( ) ; 17: a . foo ( ) ; 18: } 18: } 19: } 19: } 20: public c l a s s SubB e x t e n d s B { } 20: public c l a s s SubB e x t e n d s B { } 21: public class C { 21: public class C { 22: p u b l i c v o i d b a r (B b ) { 22: p u b l i c v o i d b a r (B b ) { 23: b . bar ( ) ; 23: b . bar ( ) ; 24: } 24: } 25: } 25: } Figure 1: Example program P . Figure 2: Modified version of program P , P . the program only indirectly, often in non-obvious ways. Fig- However, such a straightforward approach does not work ures 1 and 2 show an example of a declaration-level change: in general for declaration-level changes. Assume now that a new method f oo is added to class A, in P (lines 12a–12c). the only change between P and P is the addition of method In this case, the change indirectly affects the statement that f oo to A (12a–12c in P ). As we discussed above, this change calls a.f oo (line 17). Assume that LibClass is a class in the leads to a possibly different behavior for test cases that tra- library and that LibClass.getAnyA() is a static method of verse statement 17 in P , which belongs to class B. There- such class that returns an instance of SuperA, A, or SubA. fore, all such test cases must be included in T , the set of After the static call at line 16, the dynamic type of a can test cases to rerun on P . Conversely, any test case that does be SuperA, A, or SubA. Therefore, due to dynamic bind- not execute that statement can be safely excluded from T . ing, the subsequent call to a.f oo can be bound to different Unfortunately, the straightforward approach would still methods in P and in P : in P , a.f oo is always bound to select only class A. The edge-level analysis of A would then SuperA.f oo, whereas in P , a.f oo can be bound to A.f oo show that the change between A in P and A in P is the or SuperA.f oo, depending on the dynamic type of a. This addition of the overriding method A.f oo, a declaration-level difference in binding may cause test cases that traverse the change that does not affect directly any other statement in statement at line 17 in P to behave differently in P . A. Therefore, the only way to select test cases that may be Declaration-level changes have generally more complex ef- affected by the change would be to select all test cases that fects than statement-level changes and, if not suitably han- instantiate class A1 because these test cases may execute dled, can cause an RTS technique to be imprecise, unsafe, A.f oo in P . Such an approach is clearly imprecise: some or both. We will show how our technique suitably handles test cases may instantiate class A and never traverse a.f oo, declaration-level changes in both phases. but the approach would still select them. Moreover, this selection is also unsafe. If, in the second-phase, we analyze 2.1.2 Accounting for Relationships Between Classes only class A, we will miss the fact that class SubA inherits In this section, we use the example in Figures 1 and 2 to from A. Without this information, we will not select test describe, intuitively, how our partitioning algorithm works cases that traverse a.f oo when the dynamic type of a is and the rationale behind it. To this end, we illustrate dif- SubA. Because these test cases may also behave differently ferent alternative approaches to partitioning, discuss their in P and P , not selecting them is unsafe. shortcomings, and motivate our approach. Because polymorphism and dynamic binding make RTS One straightforward approach for partitioning based on performed only on the changed types both unsafe and im- changes is to select just the changed types (classes or in- precise, a possible improvement is to select, when a type terfaces). Assume that the change at line 9 in Figures 1 C has changed, the whole type hierarchy that involves C and 2 is the only change between P and P . In this case, (i.e., all super- and sub-types of C. Considering our exam- any test case that behaves differently when run on P and P ple, this strategy will select a partition that contains classes must necessarily traverse statement 9 in P . Therefore, the SuperA, A, and SubA. By analyzing SuperA, A, and SubA, straightforward approach, which selects only class A, would the edge-level technique would (1) identify the inheritance be safe and precise: in the second phase, the edge-level anal- ysis of class A in P and P would identify the change at 1 In this case, static calls are non-relevant because dynamic statement 8 and select all and only test cases traversing it. binding can occur only on actual instances.
  4. 4. algorithm buildIRG SuperA B C input: program P output: IRG G for P begin buildIRG 1: create empty IRG G 2: for each class and interface e ∈ P do A SubB 3: create node ne 4: GN = GN ∪ {ne } inheritance edge 5: end for 6: for each class and interface e ∈ P do use edge 7: get direct super-type of e, s SubA 8: GIE = GIE ∪ { ne , ns } Figure 4: IRG for program P of Figure 1. 9: for each type r ∈ P that e references do 10: GU E = GU E ∪ { ne , nr } 11: end for 12: end for 2.1.3 Partitioning Algorithm 13: return G Before describing the details of the partitioning algorithm, end buildIRG we introduce the program representation on which the algo- Figure 3: Algorithm for building an IRG. rithm operates: the interclass relation graph. The Interclass Relation Graph (IRG) for a program is a triple {N, IE, U E}: relationships correctly, (2) discover that a call to a.f oo may • N is the set of nodes, one for each type. be bound to different methods in P and P if the type of • IE ⊂ N × N is the set of inheritance edges. An inher- a is A or SubA, and (3) consequently select for rerun all itance edge between a node for type e1 and a node for test cases that instantiate A, SubA, or both. Thus, such a type e2 indicates that e1 is a direct sub-type of e2 . partitioning would lead to safe results. • U E ⊂ N ×N is the set of use edges. A use edge between Although considering whole hierarchies solves the safety a node for type e1 and a node for type e2 indicates that issue, the approach is still imprecise. Again, some test cases e1 contains an explicit reference to e2 .2 may instantiate class A or SubA and never invoke a.f oo. Figure 3 shows the algorithm for building an IRG, buildIRG. Such test cases behave identically in P and P , but they For simplicity, in defining the algorithms, we use the follow- would still be selected for rerun. ing syntax: ne indicates the node for a type e (class or To improve the precision of the selection, our partition- interface); GN , GIE , and GU E indicate the set of nodes N , ing technique considers, in addition to the whole hierarchy inheritance edges IE, and use edges U E for a graph G, re- that involves a changed type, all types that explicitly refer- spectively. ence types in the hierarchy. For our example, the partition Algorithm buildIRG first creates a node for each type in would include SuperA, A, SubA, and B. By analyzing these the program (lines 2–5). Then, for each type e, the algo- four classes, an edge-level selection technique would be able rithm (1) connects ne to the node of its direct super-type to compute the hierarchy relationships, as discussed above, through an inheritance edge (lines 7–8), and (2) creates a and also to identify the call site to a.f oo in as the use edge from each nd to ne , such that d contains a reference point in the program where the program’s behavior may be to e (lines 9–11). Figure 4 shows the IRG for program P in affected by the change (details on how this is actually done Figure 1. The IRG represents the six classes in P and their are provided in Section 2.2). Therefore, the edge-level tech- inheritance and use relationships. nique can select all and only the test cases that call a.f oo Figures 5, 6, and 7 show our partition algorithm, compute- when the dynamic type of a is either A or SubA. This selec- Partition. The algorithm inputs the set of syntactically- tion is safe and as precise as the most precise existing RTS changed types C and two IRGs, one for the original version techniques (e.g., [2, 7, 17]). of the program and one for the modified version of the pro- It is important to note that no other type, besides the gram. As we mentioned above, change information can be ones in the partition, must be analyzed by the edge-level easily computed automatically. technique. Because of the way the system is partitioned, First, algorithm computePartition adds to partition P art any test case that behaves differently in P and P must all hierarchies that involve changed types. For each type e necessarily traverse one or more types in the partition, and in the changed-type set C, the algorithm adds to P art e would therefore be selected. Consider, for instance, class C itself (line 3) and all sub- and super-types of e, by calling of our example, which is not included in the partition. If a procedures addSubTypes and addSuperTypes (lines 4 and 5). test case that exercises class C shows a different behavior If s, the super-type of e in P , and s , the super-type of e in in P and P , it can only be because of the call to in P , differ (lines 6–8), the algorithm also adds all super-types Therefore, the test case would be selected even if we of s to the partition (line 9). This operation is performed consider only B. to account for cases in which type e is moved to another In summary, our partitioning technique selects, for each inheritance hierarchy. In these cases, changes to e may af- changed type (class or interface), (1) the type itself, (2) the fect not only types in e’s old hierarchy, but also types in e’s hierarchy of the changed type, and (3) the types that ex- new hierarchy. Consider our example program in Figure 1 plicitly reference any type in such hierarchy. Note that it and its corresponding IRG in Figure 4. Because the only may be possible to reduce the size of the partition identi- changed type is class A, at this point in the algorithm P art fied by the algorithm by performing additional analysis (e.g., would contain classes A, SubA, and SuperA. by distinguishing different kinds of use relationships among classes). However, doing so would increase the cost of the 2 The fact that e1 contains an explicit reference to e2 means partitioning and, as the studies in Section 3 show, our cur- that e1 uses e2 (e.g., by using e2 in a cast operation, by rent approach is effective in practice. In the next section, invoking one of e2 ’s methods, by referencing one of e2 ’s field, we present the algorithm that performs the selection. or by using e2 as an argument to instanceof).
  5. 5. algorithm computePartition procedure addSubTypes input: set of changed types C, input: IRG G, IRG for P , G type e IRG for P , G output: set of all sub-types of e, S declare: set of types T mp, initially empty begin addSubTypes output: set of types in the partition, P art 19: S = ∅ begin computePartition 20: for each node ns ∈ G, ns , ne ∈ GIE do 1: P art = ∅ 21: S = S ∪ {s} 2: for each type e ∈ C do 22: S = S ∪ addSubTypes(G, s) 3: P art = P art ∪ {e} 23: end for 4: P art = P art ∪ addSubTypes(G, e) 24: return S end addSubTypes 5: P art = P art ∪ addSuperTypes(G, e) 6: ns = n ∈ G, ne , n ∈ GIE Figure 6: Procedure addSubTypes. 7: ns = n ∈ G , ne , n ∈ GIE 8: if s = s then 9: P art = P art ∪ {s } procedure addSuperTypes 10: P art = P art ∪ addSuperTypes(G , s ) input: IRG G, 11: end if type e 12: end for output: set of all super-types of e, S 13: for each type p ∈ P art do begin addSuperTypes 14: for each edge v = nd , np ∈ GU E do 25: S = ∅ 15: T mp = T mp ∪ {d} 26: while ∃ns ∈ G, ne , ns ∈ GIE do 16: end for 27: S = S ∪ {s} 17: end for 28: e = s 18: P art = P art ∪ T mp 29: end while 19: return P art 30: return S end computePartition end addSuperTypes Figure 5: Algorithm computePartition. Figure 7: Procedure addSuperTypes. Second, for each type p currently in the partition, compute- following we provide an overview of this part of the tech- Partition adds to a temporary set T mp all types that ref- nique. Reference [7] provides additional details. erence p directly (lines 13–17). In our example, this part of the algorithm would add class B to T . 2.2.1 Computing Change Information Finally, the algorithm adds the types in T mp to P art and To compute change information, our technique first con- returns P art. structs two graphs, G and G , that represent the parts of P Procedure addSubTypes inputs an IRG G and a type e, and P in the partition identified by the partitioning phase. and returns the set of all types that are sub-types of e. The To adequately handle all Java language constructs, we de- procedure performs, through recursion, a backward traversal fined a new representation: the Java Interclass Graph. A of all inheritance edges whose target is node ne . Procedure Java Interclass Graph (JIG) extends a traditional control addSuperTypes also inputs an IRG G and a type e, and flow graph, in which nodes represent program statements returns the set of all types that are super-types of e. The and edges represent the flow of control between statements. procedure identifies super-types through reachability over The extensions account for various aspects of the Java lan- inheritance edges, starting from e. guage, such as inheritance, dynamic binding, exception han- dling, and synchronization. For example, all occurrences Complexity. Algorithm buildIRG makes a single pass over of class and interface names in the graph are fully quali- each type t, to identify t’s direct super-class and classes ref- fied, which accounts for possible changes of behavior due to erenced by t. Therefore, the worst-case time complexity of changes in the type hierarchy. The extensions also allow for the algorithm is O(m), where m is the size of program P . Al- safely analyzing subsystems if the part of the system that is gorithm computePartition performs reachability from each not analyzed is unchanged (and unaffected by the changes), changed type. The worst case time complexity of the algo- as described in detail in Reference [7]. rithm is, thus, |C|O(n2 ), where C is the set of changed types For the sake of space, instead of discussing the JIG in de- and n is the number of nodes in the IRG for P (i.e., the num- tail, we illustrate how we model dynamic binding in the JIG ber of classes and interfaces in the program). However, this using our example programs P and P (see Figures 1 and 2). complexity corresponds to the degenerate case of programs The top part of Figure 8 shows two partial JIGs, G and G , with n types and an inheritance tree of depth O(n). In prac- that represent method in P and P . In the graph, tice, the depth of the inheritance tree can be approximated each call site (lines 16 and 17) is expanded into a call and a with a constant, and the overall complexity of the partition- return node (nodes 16a and 16b, and 17a and 17b). Call and ing is linear in the size of the program. In fact, our ex- return nodes are connected by a path edge (edges (16a,16b) perimental results show that our partition algorithm is very and (17a,17b)), which represents the path through the called efficient in practice; for the largest of our subjects, JBoss, method. Call nodes are connected to the entry node(s) of which contains over 2,400 classes and over 500 KLOC, our the called method(s) with a call edge. If the call is static partitioning process took less than 15 seconds to complete. (i.e., not virtual), such as the call to LibClass.getAnyA(), the call node has only one outgoing call edge, labeled call. 2.2 Selection If the call is virtual, the call node is connected to the entry The second phase of the technique (1) computes change node of each method that can be bound to the call. Each information by analyzing the types in the partition identi- call edge from the call node to the entry node of a method m fied by the first phase, and (2) performs test selection by is labeled with the type of the receiver instance that causes matching the computed change information with coverage m to be bound to the call. In the example considered, the information. To perform edge-level selection, we use an ap- call to a.f oo at node 17a is a virtual call. In P , although proach previously defined by some of the authors [7]. In the a can have dynamic type SuperA, A, or SubA, the call is