Specialization topics: Compiler Design / Program Analysis / Pointer analysis / Java program optimization

- 1. A Static Slicing Tool for Sequential Java Programs A Thesis Submitted For the Degree of Master of Science (Engineering) in the Faculty of Engineering by Arvind Devaraj Computer Science and Automation Indian Institute of Science BANGALORE – 560 012 March 2007
- 3. Abstract
A program slice consists of a subset of the statements of a program that can potentially affect values computed at some point of interest. Such a point of interest along with a set of variables is called a slicing criterion. Slicing tools are useful for several applications, such as program understanding, testing, program integration, and so forth. Slicing object-oriented programs poses some special problems that need to be addressed due to features like inheritance, polymorphism and dynamic binding. Alias analysis is important for the precision of slices. In this thesis we implement a slicing tool for sequential Java programs in the SOOT framework. SOOT is a front-end for Java developed at McGill University, and it provides several forms of intermediate code. We have integrated the slicer into the framework. We also propose an improved technique for intraprocedural points-to analysis. We have implemented this technique and compare the results of the analysis with those for a flow-insensitive scheme in SOOT. Performance results of the slicer are reported for several benchmarks.
- 4. Contents
Abstract
1 Introduction
    1.1 Slicing
    1.2 The SOOT Framework
    1.3 Contributions of the thesis
2 Slicing
    2.1 Intraprocedural Slicing using PDG
        2.1.1 Program Dependence Graph
        2.1.2 Slicing using the Program Dependence Graph
        2.1.3 Construction of the Data Dependence Graph
        2.1.4 Control Dependence Graph
        2.1.5 Slicing in presence of unstructured control flow
        2.1.6 Reconstructing CFG from the sliced PDG
    2.2 Interprocedural Slicing using SDG
        2.2.1 System Dependence Graph
        2.2.2 Calling context problem
        2.2.3 Computing Summary Edges
        2.2.4 The Two Phase Slicing Algorithm
        2.2.5 Handling Shared Variables
    2.3 Slicing Object Oriented Programs
        2.3.1 Dependence Graph for Object Oriented Programs
        2.3.2 Handling Inheritance
        2.3.3 Handling Polymorphism
        2.3.4 Case Study - Elevator Class and its Dependence Graph
3 Points to Analysis
    3.1 Need for Points to Analysis
    3.2 Pointer Analysis using Constraints
    3.3 Dimensions of Precision
    3.4 Andersen's Algorithm for C
    3.5 Andersen's Algorithm for Java
        3.5.1 Model for references and heap objects
        3.5.2 Computation of points to sets in SPARK
    3.6 CallGraph Construction
        3.6.1 Handling Virtual Methods
    3.7 Improvements to Points to Analysis
    3.8 Improving Flow Sensitivity
        3.8.1 Computing Valid Subgraph at each Program Point
        3.8.2 Computation of Access Expressions
        3.8.3 Checking for Satisfiability
4 Implementation and Experimental Results
    4.1 Soot - A bytecode analysis framework
    4.2 Steps in performing slicing in Soot
    4.3 Points to Analysis and Call Graph
    4.4 Computing Required Classes
    4.5 Side effect computation
    4.6 Preprocessing
    4.7 Computing the Class Dependence Graph
    4.8 Experimental Results
5 Conclusion and Future Work
Bibliography
- 6. List of Tables
    3.1 Constraints for C
    3.2 Constraints for Java
    3.3 Data flow equations for computing valid edges
    3.4 Computation of Valid edges
    4.1 Benchmarks Description
    4.2 Number of Edges in the Class Dependence Graph
    4.3 Timing Requirements
    4.4 Program Statistics - Partial Flow Sensitive
    4.5 Precision Comparison
- 7. List of Figures
    1.1 A program and its slice
    2.1 A Control Flow Graph
    2.2 Post Dominator Tree for the CFG in Figure 2.1
    2.3 Dominance Frontiers
    2.4 A program and its PDG (taken from [39])
    2.5 Augmented CFG and PDG for the program in Figure 2.4 (taken from [39])
    2.6 A program with function calls
    2.7 System Dependence Graph for an interprocedural program
    2.8 Slicing the System Dependence Graph
    2.9 Program
    2.10 The Dependence Graph for the main function (from [67])
    2.11 The Dependence Graphs for functions C() and D() (from [67])
    2.12 Interface Dependence Graph (from [58])
    2.13 The Elevator program
    2.14 Dependence Graph for Elevator program
    3.1 Need for Points to Analysis
    3.2 Points to Graphs
    3.3 Imprecision due to context insensitive analysis
    3.4 Object Flow Graph
    3.5 An example program
    3.6 Access Expressions
    3.7 OFG Subgraph
    3.8 Access Expressions (for a DAG)
    3.9 Access Expressions (for general graph)
    3.10 Simplified Access Expressions
    3.11 Dominator Tree
    4.1 Soot Framework Overview
    4.2 Computation of the class dependence graph
    4.3 Jimple code and its slice
- 8. Chapter 1. Introduction
1.1 Slicing
A program slice consists of the parts of a program that can potentially affect the values of variables computed at some point of interest. Such a point, together with the variables of interest, is called the slicing criterion and is specified by a pair (program point, set of variables). The original concept of a program slice was proposed by Mark Weiser [61]. According to his definition:
    A slice s of program p is a subset of the statements of p that retains some specified behavior of p. The desired behavior is detailed by means of a slicing criterion c. Generally, a slicing criterion c is a set of variables V and a program point l. When the slice s is executed, it must always have the same values as program p for the variables in V at point l.
Weiser claimed that a program slice was the abstraction that users had in mind as they debugged programs. There have been variations in the definitions of program slices depending on the application in mind. Weiser's original definition required a slice S of a program to be an executable subset of the program, whereas another common definition defines a slice as a subset of statements that directly or indirectly affect the values computed at the point of interest but are not necessarily an executable segment. Figure 1.1 shows a program sliced with respect to the slicing criterion (print(product),
- 9. product). Since the transformed program is expected to be much smaller than the original, it is hoped that dependencies between statements in the program will be more explicit.
    Original program              Slice
    read(n);                      read(n);
    i = 1;                        i = 1;
    sum = 0;
    product = 1;                  product = 1;
    while (i<=n) {                while (i<=n) {
        sum = sum + i;
        product = product * i;        product = product * i;
        i = i + 1;                    i = i + 1;
    }                             }
    print(sum);
    print(product);               print(product);
Figure 1.1: A program and its slice
Surveys on program slicing are presented in [45], [73]. Slicing tools have been used for several applications, such as program understanding [82], testing [74] [75], program integration [78], model checking [79] and so forth.
1. Program Understanding: Software engineers are often asked to understand a large body of code and modify parts of it. When modifying a program, we need to comprehend a section of the program rather than the whole program. Backward and forward slicing can be used to browse the code and understand the interdependence between various parts of the program.
2. Testing: In the context of testing, a problem that is often encountered is that of finding the set of program statements that are affected by a change in the program. This analysis is termed impact analysis. To determine which tests need to be re-run to test a modified statement S, a backward slice on S will give the statements that actually influence the behavior of the program.
3. Debugging: Quite often the statement that is actually responsible for a bug that shows up at some program point P is statically far away from P. To reduce the search space of possible causes for the error, the programmer can use a backward
- 10. slice to eliminate parts of the code that could not have been the cause of the problem.
4. Model Checking: Model checking is a verification technique that performs an exhaustive exploration of a program's state space. Typically the execution of a program is simulated, and the paths and states encountered in the simulation are checked against correctness specifications phrased as temporal logic formulae. The use of slicing here is to reduce the size of a program P being checked for a property by eliminating statements and variables that are irrelevant to the formula.
There is an essential difference between static and dynamic slices. A static slice disregards the actual inputs to a program, whereas a dynamic slice relies on a specific test case and is therefore, in general, more precise.
When slicing a program P we are concerned with both correctness and precision. For correctness we demand that the slice S produced by the tool is a superset of the actual slice S(p) for the slicing criterion p. Precision has to do with the size of the slice. For two correct slices S1 and S2, S1 is more precise than S2 if the statements of S1 are a subset of the statements of S2. Obtaining the most precise slice is, in general, not computable; hence our aim is to compute a correct slice that is as precise as possible.
The slicing problem can be addressed by viewing it as a reachability problem in a Program Dependence Graph (PDG) [54]. A PDG is a directed graph with vertices corresponding to statements and predicates and edges corresponding to data and control dependences. For the sequential intraprocedural case, the backward slice with respect to a node in the PDG is the set of all nodes in the PDG on which this node is transitively dependent. Thus, given the PDG, a simple reachability algorithm on the PDG will construct the slice. However, when considering interprocedural slices, the process is more complicated, as mere reachability will produce imprecise slices.
One needs to track only interprocedurally realizable paths, where a realizable path corresponds to legal call/return pairs in which a procedure always returns to the call site where it was invoked. The structure on which interprocedural slicing is generally implemented is the System Dependence Graph [63] (SDG). This graph is a collection of graphs corresponding to
- 11. PDGs for individual procedures, augmented with some extra edges that capture the interaction between them. Slicing of interprocedural programs is described by Horwitz et al. [63]. They use the SDG to track dependencies in a program and use a two-phase algorithm to ensure that only feasible paths are tracked, that is, those in which procedure calls are matched with the correct return statements.
Slicing object-oriented programs adds yet another dimension of complexity to the slicing problem. Object-oriented concepts such as classes, objects, inheritance, polymorphism and dynamic binding make representation and analysis techniques used for imperative programming languages inadequate for object-oriented programs. The Class Dependence Graph, which can represent class hierarchy, data members and polymorphism, was introduced by Larsen and Harrold [66]. Some more features were added by Liang and Harrold [67].
The resolution of aliases is required for the correct computation of data dependencies. To compute the dependence graph, it is necessary to build a call graph. The computation of the call graph becomes complicated in the presence of dynamic binding, i.e. when the target of a method call depends on the runtime type of a variable. Algorithms like Rapid Type Analysis (RTA) [26] compute call graphs using type information.
A key analysis for object-oriented languages is alias analysis. The objective here is to follow an object O from its point of allocation to find out which objects reference O and which other objects are referenced by the fields of O. Resolving aliasing becomes important for the correct computation of data dependencies in the dependence graph. The precision of the analysis depends on various factors like flow sensitivity, context sensitivity and handling of field references. Andersen [64] gives a flow-insensitive method for finding aliases using subset constraints. Lhotak [70] describes the method adapted for Java programs.
In this thesis we implement a slicing tool for sequential Java programs and integrate it into the SOOT framework. We briefly describe the framework and the contributions of the thesis.
- 12. 1.2 The SOOT Framework
The SOOT analysis and transformation framework [69] is a Java optimization framework developed by the Sable Research Group at McGill University, and it is intended to be a robust, easy-to-use research framework. It has been used extensively for program analysis, instrumentation, and optimization. It provides several forms of intermediate code for analyzing and optimizing Java bytecode. Jimple is a typed three-address representation, which we have used in our implementation.
Our objective is to implement a slicing tool within the Soot framework [69] and make it publicly available. At the time this work was begun there was no publicly available slicing infrastructure for Java. The Indus [81] project addresses the slicing problem for Java programs, and its source code was made available in February 2007.
1.3 Contributions of the thesis
The following are the contributions of this thesis:
1. We have implemented the routines for creating the program dependence graphs and the class dependence graph for an input Java program that is represented in the form of Jimple intermediate code.
2. We have integrated a slicer into the framework. For interprocedural slicing we have implemented the two-phase slicing algorithm of [63].
3. We propose an improved technique for intraprocedural points-to analysis. This uses path expressions to track paths that encode valid points-to information. A simple data-flow analysis formulation collects valid edges, i.e. those that are added to the object flow graph. Reachability queries are handled in a reasonable amount of time. We have implemented this technique and compare the results of the analysis with those for a flow-insensitive scheme in SOOT.
4. The slicing tool has been run on several benchmarks and we report on times taken
- 13. to build the class dependence graph, its size, slice sizes for some given slicing criteria and slicing times.
- 14. Chapter 2. Slicing
In this chapter, we discuss techniques for slicing a program and, in particular, issues that arise when slicing object-oriented programs. The first part of the chapter describes the Program Dependence Graph (PDG), its construction and the algorithm for intraprocedural slicing. For slicing programs with function calls, the System Dependence Graph (SDG) is used. The SDG is a collection of the PDGs of individual procedures with additional edges for modeling procedure calls and parameter bindings. The second part of the chapter describes the construction of the SDG and the algorithm for interprocedural slicing. The third part of the chapter describes dependence graph computation for object-oriented programs, which is complicated because objects can be passed as parameters and methods can be invoked upon objects. We also need the results of points-to analysis to determine which objects each reference variable may point to. Then we describe the extension of the algorithm for computing the dependence graph in the presence of inheritance and polymorphic function calls.
2.1 Intraprocedural Slicing using PDG
Weiser's approach [61] to program slicing is based on dataflow equations. In his approach, the set of relevant variables is iteratively computed until a fixed point is reached. Slicing via graph reachability was introduced by Ottenstein [54]. In this approach a dependence
- 15. graph of the program is constructed and the problem of slicing reduces to computing reachability on the dependence graph. We adopt this approach in our implementation.
2.1.1 Program Dependence Graph
A program dependence graph (PDG) represents the data and control dependencies in the program. Nodes of the PDG represent statements and predicates in a source program, and its edges denote dependence relations. The PDG can be constructed as follows.
1. Build the program's CFG, and use it to compute data and control dependencies: Node N is data dependent on node M iff M defines a variable x, N uses x, and there is an x-definition-free path in the CFG from M to N. Node N is control dependent on node M iff M is a predicate node whose evaluation to true or false determines whether N will be executed.
2. Build the PDG. The nodes of the PDG are almost the same as the nodes of the CFG. However, in addition, there is a special enter node, and a node for each predicate. The PDG does not include the CFG's exit node. The edges of the PDG represent the data and control dependencies computed using the CFG.
2.1.2 Slicing using the Program Dependence Graph
To compute the slice from statement (or predicate) S, start from the PDG node that represents S and follow the data- and control-dependence edges backwards in the PDG. The components of the slice are all of the nodes reached in this manner.
The computation of the data dependence graph is described in Section 2.1.3. Computing the control dependence graph is described in Section 2.1.4. Figure 2.4 shows an example program and its corresponding PDG. Solid lines represent control dependencies while dashed lines represent data dependencies.
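The backward traversal of Section 2.1.2 can be illustrated with a minimal sketch on the program of Figure 1.1. The PDG below is written out by hand (an assumption of this sketch; a slicer would derive the edges from the CFG), and the names `STATEMENTS`, `DEPS` and `backward_slice` are hypothetical.

```python
from collections import deque

# Statements of the Figure 1.1 program, numbered in source order.
STATEMENTS = {
    1: "read(n)", 2: "i = 1", 3: "sum = 0", 4: "product = 1",
    5: "while (i <= n)", 6: "sum = sum + i", 7: "product = product * i",
    8: "i = i + 1", 9: "print(sum)", 10: "print(product)",
}

# Hand-built PDG: DEPS[n] holds the nodes that n is data or control
# dependent on. "enter" is the special enter node of the PDG.
DEPS = {
    1: {"enter"}, 2: {"enter"}, 3: {"enter"}, 4: {"enter"},
    5: {"enter", 1, 2, 8},      # uses n and i
    6: {5, 3, 6, 2, 8},         # controlled by the loop; uses sum and i
    7: {5, 4, 7, 2, 8},         # controlled by the loop; uses product and i
    8: {5, 2, 8},               # controlled by the loop; uses i
    9: {"enter", 3, 6},         # uses sum
    10: {"enter", 4, 7},        # uses product
}

def backward_slice(deps, criterion):
    """Collect every node reachable from the criterion along dependence edges."""
    seen = {criterion}
    work = deque([criterion])
    while work:
        node = work.popleft()
        for pred in deps.get(node, ()):
            if pred not in seen:
                seen.add(pred)
                work.append(pred)
    return seen

slice_nodes = backward_slice(DEPS, 10)   # criterion: print(product)
sliced_program = [STATEMENTS[n] for n in sorted(STATEMENTS) if n in slice_nodes]
```

Under these assumed edges, `sliced_program` contains the seven statements shown as the slice in Figure 1.1; the statements that only touch `sum` are never reached.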
- 16. 2.1.3 Construction of the Data Dependence Graph
A data dependence graph represents the association between definitions and uses of a variable. There is an association (d, u) between a definition of variable v at d and a use of variable v at u iff there is at least one control flow path from d to u with no intervening definition of v.
Each node represents a statement. An edge represents a flow dependency between statements. Though there are many kinds of data dependencies between statements, only flow dependencies are necessary for the purpose of slicing, as only flow dependence needs to be traced back in order to compute the PDG nodes comprising the slice. Output and anti dependence edges do not represent true data dependence. Instead they encode a partial order on program statements, which must be preserved when there is no explicit control flow relation between PDG nodes. However, PDG slices are normally mapped back to high-level source code, where control flow is explicitly represented. Thus there is no need for any such control flow information to be present in the computed PDG slice.
Flow dependencies are computed by solving the reaching definitions problem, a classical bit-vector problem solvable within the monotone dataflow framework. The solution associates each program point with the set of definitions reaching that point. The definitions of a variable that reach a program point, together with a use of that variable at the point, form the flow dependencies.
Dependence in presence of arrays and records
In the presence of composite data types like arrays, records and pointers, the most conservative method is to treat a definition of any part of a composite object as a definition of the entire object [83]. A definition (or use) of an element of an array can be considered as a definition (or use) of the entire array. For example, consider the statement
    a[i] = x
- 17. Here the variable a is defined and the variables i and x are used, which suggests DEF = {a} and REF = {i, x}. However, the value of a is used in computing the address of a[i], and thus a must also be included in the REF set. The correct value for REF is {a, i, x} [45]. This approach is conservative and leads to large slices because of spurious dependencies. Our current implementation handles composite data types in this manner, though more refined methods have been proposed in the literature. Agrawal et al. [53] propose a modified algorithm for computing reaching definitions that determines the memory locations defined and used in statements and computes whether the intersection among those locations is complete, partial, or statically indeterminable. Another method to avoid spurious dependencies is to use array index tests like the GCD test, which can determine that there is no dependence between two array access expressions.
Data dependencies in presence of aliasing
When computing data dependencies, the major problem is the presence of aliasing. Consider the following example.
    void fun() {
        obj x, y;
        x = new obj();   // o1 is the object created
        y = x;
        x.a = ...;
        ... = y.a;
    }
Here there is a data dependency between x.a = ... and ... = y.a since both x and y point to the object o1. Without alias analysis this dependency is missed because the syntactic expressions x.a and y.a are different. Thus resolving aliases is necessary for the correct computation of data dependencies. Also, if worst-case assumptions are made for field loads and stores, many spurious dependencies are created.
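The effect of aliasing on field dependencies can be sketched as follows. This is only an illustration under simplifying assumptions: the code is straight-line, intervening stores are not treated as kills, and the points-to sets are written by hand rather than computed. The variable `z`, the statement `s3` and all function names are hypothetical additions to the fun() example.

```python
# Hand-written points-to sets: x and y may both refer to object o1,
# while the hypothetical variable z refers to a different object o2.
POINTS_TO = {"x": {"o1"}, "y": {"o1"}, "z": {"o2"}}

# Straight-line statements as (label, kind, base variable, field).
STATEMENTS = [
    ("s1", "store", "x", "a"),   # x.a = ...
    ("s2", "load", "y", "a"),    # ... = y.a
    ("s3", "load", "z", "a"),    # ... = z.a (hypothetical extra statement)
]

def may_alias(p, q, points_to):
    """Two references may alias if their points-to sets intersect."""
    return bool(points_to[p] & points_to[q])

def field_flow_dependences(statements, points_to, use_aliases=True):
    """Return (store, load) pairs where the later load may read the stored field."""
    deps = []
    for i, (s_label, s_kind, s_base, s_field) in enumerate(statements):
        if s_kind != "store":
            continue
        for l_label, l_kind, l_base, l_field in statements[i + 1:]:
            if l_kind != "load" or l_field != s_field:
                continue
            same_name = l_base == s_base
            if same_name or (use_aliases and may_alias(s_base, l_base, points_to)):
                deps.append((s_label, l_label))
    return deps

with_aliases = field_flow_dependences(STATEMENTS, POINTS_TO)
syntactic_only = field_flow_dependences(STATEMENTS, POINTS_TO, use_aliases=False)
```

With the assumed points-to sets, `with_aliases` reports the dependence from `s1` to `s2` and nothing for `s3`, whereas `syntactic_only` is empty: comparing the names x.a and y.a alone misses the dependence.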
- 18. 2.1.4 Control Dependence Graph
Another kind of dependence between statements arises due to the presence of control structure. For example, consider the following code.
    P:   if (x > y)
    S1:      max = x;
         else
    S2:      max = y;
Here the execution of S1 depends on the predicate x > y. Thus S1 is said to be control dependent on P. A slice with respect to S1 has to include P, because the execution of S1 depends on the outcome of the predicate node P.
Two nodes Y and Z should be identified as having identical control conditions if, in every run of the program, Y is executed if and only if Z is executed. In Figure 2.1, nodes 2 and 5 are said to be control dependent on the true branch of node 1, since their execution depends on the outcome of node 1. The original method for computing control dependence information using postdominators is presented by Ferrante et al. [47]. Cytron et al. [46] give an improved method for constructing control dependence information by using dominance frontiers.
Finding control dependence using the postdominator relationship
A node X is said to be a postdominator of node Y if all possible paths from Y to the exit node must pass through X. A node N is said to be control dependent on edge a → b if
1. N postdominates b
2. N does not postdominate a
In Figure 2.1, to find the nodes that are control dependent on edge 1 → 2, we find nodes that postdominate node 2 but not node 1. Nodes 2 and 5 are such nodes. So nodes 2 and 5 are control dependent on the edge 1 → 2.
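The two postdominator conditions can be checked mechanically. The sketch below is an illustration rather than the thesis implementation: it uses a small CFG for the if/else example (the join node `J` and the `entry`/`exit` nodes are assumptions added here), computes postdominator sets by a simple iterative fixed point, and treats every node as postdominating itself.

```python
# CFG of the if/else example: P branches to S1 or S2, which meet at J.
CFG = {
    "entry": ["P"],
    "P": ["S1", "S2"],
    "S1": ["J"],
    "S2": ["J"],
    "J": ["exit"],
    "exit": [],
}

def postdominators(succ, exit_node):
    """pdom[n] = set of nodes that lie on every path from n to the exit."""
    nodes = set(succ)
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            new = set(nodes)
            for s in succ[n]:
                new &= pdom[s]          # must postdominate every successor
            new |= {n}                  # a node postdominates itself
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom

def control_dependences(succ, exit_node):
    """Map each node N to the CFG edges (a, b) it is control dependent on."""
    pdom = postdominators(succ, exit_node)
    deps = {}
    for a in succ:
        for b in succ[a]:
            for n in pdom[b]:
                if n not in pdom[a]:    # N postdominates b but not a
                    deps.setdefault(n, set()).add((a, b))
    return deps

PDOM = postdominators(CFG, "exit")
CONTROL_DEPS = control_dependences(CFG, "exit")
```

For this graph, `CONTROL_DEPS` maps `S1` to the edge `("P", "S1")` and `S2` to `("P", "S2")`. The join node `J` postdominates `P` and therefore depends on neither branch.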
- 19. This observation suggests that to find the nodes that are control dependent on the edge X → Y, we can traverse the postdominator tree upwards from Y and mark every node visited as control dependent on the edge X → Y; we stop when we reach the immediate postdominator of X.
[Figure 2.1: A Control Flow Graph]
[Figure 2.2: Post Dominator Tree for the CFG in Figure 2.1]
Using Dominance Frontiers to compute Control Dependence
Control dependencies between statements can be computed in an efficient manner using the dominance frontier information. Cytron et al. [46] describe the method for computing dominance frontiers.
A dominance frontier for vertex v_i contains all vertices v_j such that v_i dominates an immediate predecessor of v_j, but v_i does not strictly dominate v_j [62]:
    DF(v_i) = { v_j ∈ V | ∃ v_k ∈ Pred(v_j) : (v_i dom v_k) ∧ ¬(v_i sdom v_j) }
Informally, the set of nodes lying just outside the dominated region of Y is said to
- 20. be in the dominance frontier of Y. In the example in Figure 2.3, Y dominates nodes Y', Y'' and Y''', and X lies just outside the dominated region. So X is said to be in the dominance frontier of Y.
[Figure 2.3: Dominance Frontiers]
Note that if X is in the dominance frontier of Y, then there are at least two incoming paths to X, one of which contains Y and another of which does not. If the CFG is reversed, then we have two outgoing paths from X, one containing Y and another not containing Y. This is the same as the condition for Y to be control dependent on X. Thus, to find control dependence, it is enough to find the dominance frontiers on the reversed control flow graph. Algorithm 1 computes the control dependence information.
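As a rough illustration of the dominance frontier definition, the sketch below evaluates it directly (quadratically, not with the efficient method of Cytron et al.) on the reversed CFG of a small if/else program. The graph and all names are assumptions made for this example.

```python
# Forward CFG of a small if/else program (assumed for illustration).
CFG = {
    "entry": ["P"],
    "P": ["S1", "S2"],
    "S1": ["J"],
    "S2": ["J"],
    "J": ["exit"],
    "exit": [],
}

def reverse(succ):
    """Reverse every edge of the graph."""
    rev = {n: [] for n in succ}
    for n, targets in succ.items():
        for t in targets:
            rev[t].append(n)
    return rev

def dominators(succ, root):
    """dom[n] = set of nodes that dominate n; also return predecessor sets."""
    nodes = set(succ)
    preds = {n: set() for n in nodes}
    for n, targets in succ.items():
        for t in targets:
            preds[t].add(n)
    dom = {n: set(nodes) for n in nodes}
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for n in nodes - {root}:
            new = set(nodes)
            for p in preds[n]:
                new &= dom[p]
            new |= {n}
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom, preds

def dominance_frontiers(succ, root):
    """Direct evaluation of DF(v_i) from the definition."""
    dom, preds = dominators(succ, root)
    df = {n: set() for n in succ}
    for vi in succ:
        for vj in succ:
            strictly_dominates = vi in dom[vj] and vi != vj
            dominates_a_pred = any(vi in dom[vk] for vk in preds[vj])
            if dominates_a_pred and not strictly_dominates:
                df[vi].add(vj)
    return df

# Frontiers on the reversed CFG, rooted at the exit node.
RDF = dominance_frontiers(reverse(CFG), "exit")
```

Here `RDF["S1"]` and `RDF["S2"]` both come out as `{"P"}`, matching the intuition that the two branches are control dependent on the predicate, while the frontier of the join node `J` is empty.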
- 21. Algorithm 1 Algorithm to compute the Control Dependence Graph
    compute dominance frontiers of the reversed CFG G
    for all N in G do
        let RDF(N) be the reverse dominance frontier of N
        if RDF(N) is empty then
            N is made control dependent on method entry node
        end if
        for all node P in RDF(N) do
            for all node S in CFG successor of P do
                if S = N or N postdominates S then
                    N is made control dependent on P
                end if
            end for
        end for
    end for
2.1.5 Slicing in presence of unstructured control flow
In the presence of unstructured control flow caused by jump statements like goto, break, continue and return, the algorithm for slicing can produce an incorrect slice. While Java does not have goto statements, break and continue statements cause unstructured control flow. Consider computing the slice with respect to the statement print(prod) in Figure 2.4. When the slicing algorithm discussed in Section 2.1.2 is applied, the statement break is not included, which is incorrect.
This was discovered by Choi and Ferrante [38] and by Ball and Horwitz [37], who present a method to compute a correct slice in the presence of unstructured control flow statements. Their method to correct for such statements is based on the observation that jumps are similar to predicate nodes in that both affect the flow of control. Thus jumps are also made to be sources of control dependence edges. A jump vertex has an outgoing true edge to the target of the jump, and an outgoing false edge to the statement that would execute if the jump were a no-op. A jump vertex is considered a pseudo-predicate since the outgoing false edge is non-executable. The original CFG augmented with these non-executable edges is called the Augmented Control Flow Graph (ACFG).
Kumar and Horwitz [39] describe the following algorithm for slicing in presence of jump statements.
- 22.
    prod = 1;
    k = 1;
    while (k <= 10) {
        if (MAXINT/k < prod) break;
        prod = prod * k;
        k++;
    }
    print(k);
    print(prod);
[Panels: (a) Example Program, (b) CFG, (c) PDG]
Figure 2.4: A program and its PDG (taken from [39])
- 23. [Panels: (a) ACFG, (b) Corresponding APDG]
Figure 2.5: Augmented CFG and PDG for the program in Figure 2.4 (taken from [39])
- 24. 1. Build the program's augmented control flow graph described previously. Labels are treated as separate statements; i.e., each label is represented in the ACFG by a node with one outgoing edge to the statement that it labels.
2. Build the program's augmented PDG. Ignore the non-executable ACFG edges when computing data-dependence edges; do not ignore them when computing control-dependence edges. (This way, the nodes that are executed only because a jump is present, as well as those that are not executed but would be if the jump were removed, are control dependent on the jump node, and therefore the jump will be included in their slices.)
3. To compute the slice from node S, follow data- and control-dependence edges backwards from S. A label L is included in a slice iff a statement "goto L" is in the slice.
2.1.6 Reconstructing CFG from the sliced PDG
Reconstructing the CFG from the PDG is described in [71]. From the CFG and the PDG slice, a sliced CFG is constructed by walking through all nodes. For each node n, we execute the following.
1. If n is a goto statement or return statement, leave it in the slice.
2. If n is a conditional statement, there are three cases:
    (a) If n is not in the PDG slice, it can be removed.
    (b) If n is in the PDG slice, but one of the branches is not, replace the jump to that branch with a jump to the convergence node of the branch (the node where the two branches reconnect). If that node doesn't exist, replace the jump with a jump to the return statement of the program.
    (c) If n is present in the PDG slice and both branches are present, leave n in the CFG.
- 25. Chapter 2. Slicing 18
    main() {
        sum=0;
        i=1;
        while(i<11) {
            sum=add(sum,i);
            i=add(i,1);
        }
        print(sum);
        print(i);
    }
    int add(int a,int b) {
        result=a+b;
        return result;
    }
Figure 2.6: A program with function calls

3. Otherwise, check whether n is present in the PDG slice; if it is not, remove it.

We next describe the interprocedural slicing algorithm implemented in this thesis.

2.2 Interprocedural Slicing using SDG
2.2.1 System Dependence Graph
For interprocedural slicing, Horwitz et al. [63] introduce the System Dependence Graph (SDG). A system dependence graph is a collection of program dependence graphs, one for each procedure, with additional edges for modeling parameter passing. Figure 2.6 shows a program with function calls. Figure 2.7 displays its SDG. Each PDG contains an entry node that represents entry to the procedure. To model procedure calls and parameter passing, an SDG introduces additional nodes and edges. Accesses to global variables are modeled via additional parameters of the procedure. They assume parameters are passed by value-result, and introduce additional nodes in
- 26. Chapter 2. Slicing 19 [Figure 2.7: System Dependence Graph for an interprocedural program. Nodes of main: sum=0, i=1, while(i<11), print(sum), print(i), and two call add sites, one with a_in=sum, b_in=i, sum=r_out and one with a_in=i, b_in=1, i=r_out. Nodes of add: enter add, a=a_in, b=b_in, result=a+b, r_out=result. Legend: control edge, parameter edge, data edge, call edge, summary edge.]
- 27. Chapter 2. Slicing 20 the interprocedural case. The following additional nodes are introduced.
1. Call-site nodes representing the call sites.
2. Actual-in and actual-out nodes representing the input and output parameters at the call sites. They are control dependent on the call-site node.
3. Formal-in and formal-out nodes representing the input and output parameters at the called procedure. They are control dependent on the procedure’s entry node.
They also introduce additional edges to link the program dependence graphs together:
1. Call edges link the call-site nodes with the procedure entry nodes.
2. Parameter-in edges link the actual-in nodes with the formal-in nodes.
3. Parameter-out edges link the formal-out nodes with the actual-out nodes.

2.2.2 Calling context problem
For computing an intraprocedural slice, a simple reachability algorithm on the PDG is sufficient. In the interprocedural case, however, simple reachability over the SDG doesn’t work, since not all paths are valid. For example, in Figure 2.7, the path a_in = sum → a = a_in → result = a + b → r_out = result → i = r_out is not valid interprocedurally: it enters add from the first call site and returns to the second. In an interprocedurally valid path, a call edge must be matched with its corresponding return edge.
To address this problem, Horwitz et al. [63] introduce the concept of summary edges. There is a summary edge between an actual-in and an actual-out node of a call site if there is a dependency between the corresponding formal-in and formal-out nodes of the called procedure. Thus a summary edge summarizes the effect of a procedure call.
- 28. Chapter 2. Slicing 21
2.2.3 Computing Summary Edges
We describe the computation of summary edges in Algorithm 2. The algorithm takes the given SDG and adds summary edges. P is the set of path edges. Each edge in P of the form (n, m) encodes the information that there is a realizable path in the SDG from n to m. The worklist contains path edges that need to be processed. The algorithm begins by asserting that there is a realizable path from each formal-out node to itself. The set of realizable paths P is extended by traversing backwards through dependence edges. If a formal-in node is encountered during the traversal, then we have a realizable path from a formal-in to a formal-out node. Therefore a summary edge is added between the actual-in and actual-out nodes of the corresponding call sites. Because the insertion of summary edges makes more paths feasible, this process is continued iteratively until no more summary edges can be added. Computing the summary edges is equivalent to the functional approach suggested by Sharir and Pnueli [41].

2.2.4 The Two Phase Slicing Algorithm
Horwitz et al. [63] describe the two phase algorithm. The interprocedural backward slicing algorithm consists of two phases. The first phase traverses backwards from the node in the SDG that represents the slicing criterion along all edges except parameter-out edges, and marks those nodes that are reached. The second phase traverses backwards from all nodes marked during the first phase along all edges except call and parameter-in edges, and marks the reached nodes. The slice is the union of the marked nodes. Let s be the slicing criterion in procedure P.
1. Phase 1 identifies vertices that can reach s and are either in P itself or in a procedure that calls P (either directly or transitively). Because parameter-out edges are not followed, the traversal in Phase 1 does not descend into procedures
- 29. Chapter 2. Slicing 22
Algorithm 2 Computing Summary Information
    W = ∅    (W is the worklist)
    P = ∅    (P is the set of path edges)
    for all n ∈ N which is a formal-out node do
        W = W ∪ {(n, n)}
        P = P ∪ {(n, n)}
    end for
    while W ≠ ∅ (worklist is not empty) do
        remove one element (n, m) from W
        if n is a formal-in node then
            for all n' → n which is a parameter-in edge do
                for all m → m' which is a parameter-out edge do
                    if n' and m' belong to the same call site then
                        E = E ∪ {n' → m'}    (add a new summary edge)
                        for all (m', x) ∈ P do
                            P = P ∪ {(n', x)}
                            W = W ∪ {(n', x)}
                        end for
                    end if
                end for
            end for
        else
            for all n' → n do
                if (n', m) ∉ P then
                    P = P ∪ {(n', m)}
                    W = W ∪ {(n', m)}
                end if
            end for
        end if
    end while
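The worklist computation of Algorithm 2 can be sketched in Java as follows. This is a minimal illustration rather than the thesis implementation: the class name `SummaryEdges`, the string node labels, and the helper methods `dep`, `bindIn` and `bindOut` are assumptions made for the example, and the graph built in `figure27` is a hand transcription of the `add` procedure and its two call sites from Figure 2.7.

```java
import java.util.*;

final class SummaryEdges {
    // Edges followed backwards inside a procedure (control, data and, once
    // discovered, summary edges), stored as target -> sources.
    private final Map<String, Set<String>> preds = new HashMap<>();
    private final Map<String, Set<String>> paramIn = new HashMap<>();  // formal-in  -> actual-ins
    private final Map<String, Set<String>> paramOut = new HashMap<>(); // formal-out -> actual-outs
    private final Map<String, String> callSite = new HashMap<>();      // actual node -> call site

    void dep(String from, String to) {
        preds.computeIfAbsent(to, k -> new HashSet<>()).add(from);
    }

    void bindIn(String site, String actualIn, String formalIn) {
        paramIn.computeIfAbsent(formalIn, k -> new HashSet<>()).add(actualIn);
        callSite.put(actualIn, site);
    }

    void bindOut(String site, String formalOut, String actualOut) {
        paramOut.computeIfAbsent(formalOut, k -> new HashSet<>()).add(actualOut);
        callSite.put(actualOut, site);
    }

    /** Returns the summary edges, each rendered as "actualIn -> actualOut". */
    Set<String> compute() {
        Set<String> summary = new TreeSet<>();
        Set<List<String>> pathEdges = new HashSet<>();
        Deque<List<String>> work = new ArrayDeque<>();
        // Every formal-out node trivially reaches itself.
        for (String formalOut : paramOut.keySet()) add(pathEdges, work, formalOut, formalOut);
        while (!work.isEmpty()) {
            List<String> edge = work.remove();
            String n = edge.get(0), m = edge.get(1);
            if (paramIn.containsKey(n)) {  // n is a formal-in node
                for (String actualIn : paramIn.get(n)) {
                    for (String actualOut : paramOut.getOrDefault(m, Set.of())) {
                        if (!callSite.get(actualIn).equals(callSite.get(actualOut))) continue;
                        summary.add(actualIn + " -> " + actualOut);
                        dep(actualIn, actualOut);  // later traversals may follow the new edge
                        // Extend path edges that already start at the actual-out node.
                        for (List<String> q : new ArrayList<>(pathEdges)) {
                            if (q.get(0).equals(actualOut)) add(pathEdges, work, actualIn, q.get(1));
                        }
                    }
                }
            } else {
                for (String p : preds.getOrDefault(n, Set.of())) add(pathEdges, work, p, m);
            }
        }
        return summary;
    }

    private static void add(Set<List<String>> pathEdges, Deque<List<String>> work, String n, String m) {
        List<String> e = List.of(n, m);
        if (pathEdges.add(e)) work.add(e);
    }

    /** Hand-built fragment of Figure 2.7: procedure add and its two call sites. */
    static SummaryEdges figure27() {
        SummaryEdges g = new SummaryEdges();
        for (String n : List.of("a=a_in", "b=b_in", "result=a+b", "r_out=result")) g.dep("enter add", n);
        g.dep("a=a_in", "result=a+b");
        g.dep("b=b_in", "result=a+b");
        g.dep("result=a+b", "r_out=result");
        g.bindIn("call 1", "a_in=sum", "a=a_in");
        g.bindIn("call 1", "b_in=i", "b=b_in");
        g.bindOut("call 1", "r_out=result", "sum=r_out");
        g.bindIn("call 2", "a_in=i", "a=a_in");
        g.bindIn("call 2", "b_in=1", "b=b_in");
        g.bindOut("call 2", "r_out=result", "i=r_out");
        return g;
    }
}
```

Calling `SummaryEdges.figure27().compute()` yields one summary edge per actual-in node, each ending at the actual-out node of the same call site; the same-call-site check is what prevents an edge such as a_in=sum -> i=r_out from being created.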
- 30. Chapter 2. Slicing 23 called by P. Though the algorithm doesn’t descend into the called procedures, the effects of such procedures are not ignored, due to the presence of summary edges.
2. Phase 2 identifies vertices that reach s from procedures (transitively) called by P or from procedures called by procedures that (transitively) call P. Because call edges and parameter-in edges are not followed, the traversal in Phase 2 doesn’t ascend into calling procedures; the transitive flow dependence edges from actual-in to actual-out vertices make such ascents unnecessary.
We implemented a variation of the two phase slicing algorithm as described by Krinke [49]. Figure 2.8 shows the vertices in the SDG marked during phase 1 and phase 2 when the statement print(i) is given as the slicing criterion. The first phase traverses backwards along all edges except the parameter-out edge r_out = result → i = r_out. Thus the first phase does not descend into the procedure add. The second phase traverses backwards along all edges except the parameter-in edges and call edges. Thus in the second phase neither the edge a_in = sum → a = a_in nor the edge call add → a = a_in is traversed.

2.2.5 Handling Shared Variables
This section deals with handling variables that are shared across procedures. Shared variables include global variables in imperative languages. Though Java does not have global variables, instance members of a class can be treated as global variables that are accessible by the member functions.
Shared variables are handled by passing them as additional parameters to every function. Considering every shared variable as a parameter is correct but inefficient, as it increases the number of nodes. We can reduce the number of parameters passed by doing interprocedural analysis and using the GMOD and GREF information [42].
1. GMOD(P): the set of variables that might be modified by P itself or by a procedure (transitively) called from P.
2. GREF(P): the set of variables that might be referenced by P itself or by a procedure (transitively) called from P.
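The two sets defined above can be obtained by propagating each procedure's directly modified (or directly referenced) variables along the call graph until nothing changes. The sketch below is an assumption-laden illustration, not the analysis of [42]: it ignores parameter binding and aliasing, the class and method names (`ModRef`, `closeOverCalls`, `exampleGmod`) are hypothetical, and the direct sets are transcribed by hand from class Base and Derived.m3/m4 of Figure 2.9, treating fields as shared variables and resolving vm() to Base.vm().

```java
import java.util.*;

final class ModRef {
    /**
     * Fixed-point propagation over the call graph:
     * result(P) = direct(P) united with result(Q) for every Q called by P.
     * The same routine serves for GMOD (direct = variables assigned in P)
     * and for GREF (direct = variables read in P).
     */
    static Map<String, Set<String>> closeOverCalls(Map<String, Set<String>> direct,
                                                   Map<String, Set<String>> calls) {
        Map<String, Set<String>> result = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : direct.entrySet()) {
            result.put(e.getKey(), new TreeSet<>(e.getValue()));
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<String, Set<String>> e : calls.entrySet()) {
                Set<String> mine = result.computeIfAbsent(e.getKey(), k -> new TreeSet<>());
                for (String callee : e.getValue()) {
                    if (callee.equals(e.getKey())) continue;  // self-recursion adds nothing new
                    changed |= mine.addAll(result.getOrDefault(callee, Set.of()));
                }
            }
        }
        return result;
    }

    /** GMOD for a few methods of Figure 2.9 (direct sets are hand-written assumptions). */
    static Map<String, Set<String>> exampleGmod() {
        Map<String, Set<String>> directMod = Map.of(
            "vm", Set.of("a"),   // a = a + b
            "m1", Set.of("b"),   // b = b + 1
            "m2", Set.of("b"),   // b = b + i
            "m3", Set.of("d"));  // d = d + 1
        Map<String, Set<String>> calls = Map.of(
            "m1", Set.of("vm"),
            "m3", Set.of("m2"),
            "m4", Set.of("m1"));
        return closeOverCalls(directMod, calls);
    }
}
```

In this example m4 assigns no field itself, yet its set contains both a and b because it calls m1, which in turn calls vm; m2 keeps only b, so a call site of m2 would need an actual-out vertex for b alone.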
- 31. Chapter 2. Slicing 24 [Figure 2.8: Slicing the System Dependence Graph. The figure repeats the SDG of Figure 2.7 and distinguishes the nodes marked in phase 1 from those marked in phase 2. Legend: control edge, parameter edge, data edge, call edge, summary edge.]
- 32. Chapter 2. Slicing 25
Algorithm 3 Two phase slicing algorithm (Krinke’s version)
    input: G = (N, E), the given SDG; s ∈ N, the slicing criterion
    output: S ⊆ N, the slice
    S = {s}
    W_up = {s}
    W_down = ∅
    First phase:
    while W_up ≠ ∅ (worklist is not empty) do
        remove one element n from W_up
        for all m → n ∈ E do
            if m ∉ S then
                if m → n is a parameter-out edge then
                    W_down = W_down ∪ {m}
                    S = S ∪ {m}
                else
                    W_up = W_up ∪ {m}
                    S = S ∪ {m}
                end if
            end if
        end for
    end while
    Second phase:
    while W_down ≠ ∅ (worklist is not empty) do
        remove one element n from W_down
        for all m → n ∈ E do
            if m ∉ S then
                if m → n is neither a parameter-in edge nor a call edge then
                    W_down = W_down ∪ {m}
                    S = S ∪ {m}
                end if
            end if
        end for
    end while
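As a rough illustration of Algorithm 3, the following Java sketch runs the two phases on a hand-built version of the SDG for the program of Figure 2.6 and compares the result with plain backward reachability. The class `TwoPhaseSlicer`, the `Kind` enum, the string node labels, and the edge list in `figure27` are all assumptions made for the example (the edges were derived by hand from the program, with summary edges supplied explicitly); this is not the slicer built on SOOT.

```java
import java.util.*;

final class TwoPhaseSlicer {
    enum Kind { CONTROL, DATA, CALL, PARAM_IN, PARAM_OUT, SUMMARY }

    // incoming.get(n) holds every edge m -> n as a (source, kind) pair.
    private final Map<String, List<Map.Entry<String, Kind>>> incoming = new HashMap<>();

    /** Adds one edge of the given kind from 'from' to each target. */
    TwoPhaseSlicer edges(Kind kind, String from, String... targets) {
        for (String to : targets) {
            incoming.computeIfAbsent(to, k -> new ArrayList<>()).add(Map.entry(from, kind));
        }
        return this;
    }

    /** Two phase backward slice in the style of Algorithm 3. */
    Set<String> slice(String criterion) {
        Set<String> slice = new LinkedHashSet<>(List.of(criterion));
        Deque<String> up = new ArrayDeque<>(List.of(criterion));
        Deque<String> down = new ArrayDeque<>();
        while (!up.isEmpty()) {  // phase 1: defer nodes reached over parameter-out edges
            for (Map.Entry<String, Kind> e : incoming.getOrDefault(up.remove(), List.of())) {
                if (slice.add(e.getKey())) {
                    (e.getValue() == Kind.PARAM_OUT ? down : up).add(e.getKey());
                }
            }
        }
        while (!down.isEmpty()) {  // phase 2: never ascend over call or parameter-in edges
            for (Map.Entry<String, Kind> e : incoming.getOrDefault(down.remove(), List.of())) {
                Kind k = e.getValue();
                if (k != Kind.PARAM_IN && k != Kind.CALL && slice.add(e.getKey())) {
                    down.add(e.getKey());
                }
            }
        }
        return slice;
    }

    /** Plain backward reachability that ignores edge kinds (context-insensitive). */
    Set<String> naive(String criterion) {
        Set<String> seen = new LinkedHashSet<>(List.of(criterion));
        Deque<String> work = new ArrayDeque<>(List.of(criterion));
        while (!work.isEmpty()) {
            for (Map.Entry<String, Kind> e : incoming.getOrDefault(work.remove(), List.of())) {
                if (seen.add(e.getKey())) work.add(e.getKey());
            }
        }
        return seen;
    }

    /** Hand-built SDG for the program of Figure 2.6 (an assumption, not the thesis data). */
    static TwoPhaseSlicer figure27() {
        String c1 = "call add (1)", c2 = "call add (2)";
        return new TwoPhaseSlicer()
            .edges(Kind.CONTROL, "main", "sum=0", "i=1", "while", "print(sum)", "print(i)")
            .edges(Kind.CONTROL, "while", c1, c2)
            .edges(Kind.CONTROL, c1, "a_in=sum", "b_in=i", "sum=r_out")
            .edges(Kind.CONTROL, c2, "a_in=i", "b_in=1", "i=r_out")
            .edges(Kind.CONTROL, "enter add", "a=a_in", "b=b_in", "result=a+b", "r_out=result")
            .edges(Kind.DATA, "sum=0", "a_in=sum", "print(sum)")
            .edges(Kind.DATA, "sum=r_out", "a_in=sum", "print(sum)")
            .edges(Kind.DATA, "i=1", "while", "b_in=i", "a_in=i", "print(i)")
            .edges(Kind.DATA, "i=r_out", "while", "b_in=i", "a_in=i", "print(i)")
            .edges(Kind.DATA, "a=a_in", "result=a+b")
            .edges(Kind.DATA, "b=b_in", "result=a+b")
            .edges(Kind.DATA, "result=a+b", "r_out=result")
            .edges(Kind.CALL, c1, "enter add")
            .edges(Kind.CALL, c2, "enter add")
            .edges(Kind.PARAM_IN, "a_in=sum", "a=a_in")
            .edges(Kind.PARAM_IN, "b_in=i", "b=b_in")
            .edges(Kind.PARAM_IN, "a_in=i", "a=a_in")
            .edges(Kind.PARAM_IN, "b_in=1", "b=b_in")
            .edges(Kind.PARAM_OUT, "r_out=result", "sum=r_out", "i=r_out")
            .edges(Kind.SUMMARY, "a_in=sum", "sum=r_out")
            .edges(Kind.SUMMARY, "b_in=i", "sum=r_out")
            .edges(Kind.SUMMARY, "a_in=i", "i=r_out")
            .edges(Kind.SUMMARY, "b_in=1", "i=r_out");
    }
}
```

With print(i) as the criterion, `slice` keeps the body of add and the second call site but leaves out sum=0 and the first call site, whereas `naive` pulls them in through the unmatched path that enters add from the first call and returns to the second.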
- 33. Chapter 2. Slicing 26 GMOD and GREF sets are used to determine which parameter vertices are included in procedure dependence graphs. At procedure entry, these nodes are inserted:
1. Formal-in for each variable in GMOD(P) ∪ GREF(P)
2. Formal-out for each variable in GMOD(P)
Similarly, at a call site, the following nodes are inserted:
1. Actual-in for each variable in GMOD(P) ∪ GREF(P)
2. Actual-out for each variable in GMOD(P)

2.3 Slicing Object Oriented Programs
The System Dependence Graph (SDG) is not sufficient to represent all dependencies for object oriented programs. An efficient graph representation of an object oriented program should employ a class representation that can be reused in the construction of other classes and applications that use the class. Section 2.3.1 discusses the dependence graph representation for object oriented programs. Sections 2.3.2 and 2.3.3 discuss inheritance and polymorphism respectively.

2.3.1 Dependence Graph for Object Oriented Programs
The dependencies within a single method are represented using a Method Dependence Graph (MDG), which is composed of a data dependence subgraph and a control dependence subgraph. The MDG has a method entry node which represents the start of a method. The method entry vertex has a formal-in vertex for every formal parameter and a formal-out vertex for each formal parameter that may be modified. Each call site has a call vertex and a set of actual parameter vertices: an actual-in vertex for each actual parameter at the call site and an actual-out vertex for each actual parameter that may be modified by the called procedure. Parameter-out edges are added from each formal-out node to the corresponding actual-out node. The effects of return statements are modeled by
- 34. Chapter 2. Slicing 27 connecting the return statement to its corresponding call vertex using a parameter-out edge. Summary edges are added from actual-in to actual-out nodes as described in Section 2.2.3.
Larsen and Harrold [66] represent the dependencies in a class using the class dependence graph (ClDG). A ClDG is a collection of MDGs constructed for the individual methods in the program. In addition, it contains a class entry vertex that is connected to the method entry vertex of each method in the class by a class member edge. Class entry vertices and class member edges let us track dependencies that arise due to interaction among classes.
In the presence of multiple classes, additional dependence edges are required to record the interaction between classes. For example, when a class C1 creates an object of class C2, there is an implicit call to C2’s constructor. When there is a call site in method m1 of class C1 to method m2 of class C2, there is a call dependence edge from the call site in m1 to the method start vertex of m2. Parameter-in edges are added from each actual-in node to the corresponding formal-in node, and parameter-out edges are added from each formal-out node to the corresponding actual-out node.
In object oriented programs, data dependence computation is complicated by the fact that statements can read from and write to fields of objects, i.e. a statement can have side effects. Computation of side effect information requires points-to analysis and is further discussed in Chapter 3. Also, methods can be invoked on objects, and objects can be passed as parameters. An algorithm for computing data dependence must take this into account.

Handling objects at callsites
In the presence of a function call invoked on an object, such as o.m1(), the function call can modify the data members of o. Larsen and Harrold observe that data member variables of a class are accessible to all methods in the class and hence can be treated as global variables. They use additional parameters to represent the data members referenced by a method. Thus the data dependence introduced by two consecutive method calls via data
- 35. Chapter 2. Slicing 28
    class Base {
        int a,b;
        protected void vm() { a=a+b; }
        public Base() { a=0; b=0; }
        public void m2(int i) { b=b+i; }
        public void m1() { if(b0) vm(); b=b+1; }
        public void main1() {
            Base o = new Base();
            Base ba = new Base();
            ba.m1();
            ba.m2(1);
            o.m2(1);
        }
        public void C(Base ba) { ba.m1(); ba.m2(1); }
        public void D() { Base o = new Base(); C(o); o.m1(); }
    }
    class Derived extends Base {
        long d;
        public void vm() { d=d+b; }
        public Derived() { super(); d=0; }
        public void m3() { d=d+1; m2(1); }
        public void m4() { m1(); }
        public void main2() {
            int i=read();
            Base p;
            if(i0) p=new Base();
            else p=new Derived();
            C(p);
            p.m1();
        }
    }
Figure 2.9: Program
- 36. Chapter 2. Slicing 29 Figure 2.10: The Dependence Graph for the main function (from [67]) Figure 2.11: The Dependence Graphs for functions C() and D() (from [67])
- 37. Chapter 2. Slicing 30 member variables can be represented as data dependence between the actual parameters at the method callsites. Figure 2.10 shows the dependence graph constructed for the main program of Figure 2.9. Variables a and b are considered as global variables shared across the methods m1(), m2() and Base(). The data member variables are considered as additional parameters that are passed to the function. This method of slicing includes only those statements that are necessary for the data members at the slicing criterion to receive correct values. For example, slicing with respect to the node b = b_out associated with the statement o.m2() will exclude statements that assign to data member a.
One source of imprecision of this method is that it does not consider the fact that data members may belong to different objects, and so it creates spurious dependencies between data members of different objects. In the above example, the slice wrongly includes the statements ba.m1() and ba.m2(). Liang and Harrold [67] give an improved algorithm for object sensitive slicing.
In the dependence graph representation of [67], the constructor has no formal-in vertices for the instance variables, since these variables cannot be referenced before they are allocated by the class constructor. In the approaches of [67] and [66], the data members of the class are treated as additional parameters to be passed to the function. This increases the number of parameter nodes. The number of additional nodes can be reduced using GMOD/GREF information. Actual-out and formal-out vertices are needed only for those data members that are modified by the member function. Actual-in and formal-in vertices are needed for those data members accessed by the function.

Handling Parameter Objects
Tonella [59] represents an object as a single vertex when the object is used as a parameter. This representation can lead to imprecise slices because it considers modification (or access) of an individual field in an object to be a modification (or access) of the entire object. For example, if the slicing criterion is o.b at the end of D() (in Figure 2.9), then C(o) must be included. This in turn causes the slicer to include the parameter ba,
- 38. Chapter 2. Slicing 31 which causes ba.a and ba.b to be included, though ba.a does not affect o.b. To overcome this limitation, Liang and Harrold [67] expand the parameter object as a tree. Figure 2.11 shows the parameter ba being expanded into a tree. At the first level, the node representing ba is expanded into two nodes, Base and Derived, each representing a type that ba can possibly have. At the next level, each node is expanded into its constituent data members. Since data members can themselves be objects, the expansion is done recursively until we reach primitive data types. In the presence of recursive data types, where the tree height can be infinite, k-limiting is used to limit the height of the tree to k. At the call statement C(o) in Figure 2.9, the parameter object o is expanded into its data members. At the function call, actual-in and actual-out vertices are created for the data members of o. Summary edges are added between the actual-in and actual-out vertices if a dependence is possible through the called procedure.

2.3.2 Handling Inheritance
Java provides a single inheritance model, which means that a new Java class can be designed that inherits state variables and functionality from an existing class. The functionality of base class methods can be overridden by simply redefining the methods in the derived class. Larsen and Harrold [66] construct dependence graph representations for the methods defined by the derived class. The representations of all methods that are inherited from superclasses are simply reused. To construct the dependence graph representation of class Derived (Figure 2.9), new representations are constructed for methods such as m3() and m4(). The representation of m1() is reused from class Base.
Liang and Harrold [67] illustrate that in the presence of virtual methods, it is not possible to directly reuse the representations of the methods of the superclass. For example, we cannot directly reuse the representation for m1() in class Base when we construct the representation for class Derived. In the Base class, the call statement vm() in m1() resolves to Base::vm(). If a class derived from Base redefines vm(), then the call statement vm() no longer resolves to Base::vm(), but to the newly defined vm() of the derived class. The callsites in the representation of m1() for class Derived have to be
- 39. Chapter 2. Slicing 32 changed. A method needs a new representation if
1. the method is declared in the new class, or
2. the method is declared in a lower class in the hierarchy (an ancestor of the new class) and calls a newly redefined virtual method directly or indirectly.
For example, methods declared in Derived need a new representation because these methods satisfy (1). Base.m1() also needs a new representation because it satisfies (2): Base.m1() calls Derived.vm(), which is redefined in class Derived.

Handling Interfaces
In Java, interfaces declare methods but leave the responsibility of defining the methods to the concrete classes implementing the interface. Interfaces allow the programmer to work with objects by using the interface behavior that they implement, rather than their class definition.
Single Interfaces. We use the interface representation graph [58] to represent a Java interface and the corresponding classes that implement it. There is a unique vertex, called the interface start vertex, for the entry of the interface. Each method declaration in the interface can be regarded as a call to its corresponding method in a class that implements it, and therefore a call vertex is created for each method declaration in the interface. The interface start vertex is connected to each call vertex of a method declaration by interface membership dependence arcs. If more than one class implements the interface, we connect a method call in the interface to every corresponding method that implements it in those classes.
Interface Extending. Similar to extending classes, the representation of an extended interface is constructed by reusing the representations of all methods that are inherited from superinterfaces. For newly defined methods in the extended interface, new representations are created.
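Condition (2) of the reuse rule in Section 2.3.2 amounts to a reachability question on the calls made inside the base class. The sketch below is a simplified illustration under stated assumptions: the class `ReuseCheck` and its method names are hypothetical, only calls on the receiver itself are modeled, and the call map is transcribed by hand from class Base of Figure 2.9.

```java
import java.util.*;

final class ReuseCheck {
    /**
     * Returns the inherited methods whose representation cannot be reused:
     * those that call, directly or indirectly, a virtual method that the
     * derived class redefines (condition 2 of the rule).
     */
    static Set<String> needsNewRepresentation(Map<String, Set<String>> baseCalls,
                                              Set<String> redefined) {
        Set<String> result = new TreeSet<>();
        for (String method : baseCalls.keySet()) {
            if (!redefined.contains(method)
                    && reaches(method, redefined, baseCalls, new HashSet<>())) {
                result.add(method);
            }
        }
        return result;
    }

    // Depth-first search over the call map; 'seen' guards against recursion cycles.
    private static boolean reaches(String method, Set<String> targets,
                                   Map<String, Set<String>> calls, Set<String> seen) {
        if (!seen.add(method)) return false;
        for (String callee : calls.getOrDefault(method, Set.of())) {
            if (targets.contains(callee) || reaches(callee, targets, calls, seen)) return true;
        }
        return false;
    }

    /** Base.m1() calls vm(); Base.m2() and Base.vm() call nothing (hand transcription). */
    static Set<String> example() {
        Map<String, Set<String>> baseCalls = Map.of(
            "vm", Set.<String>of(),
            "m2", Set.<String>of(),
            "m1", Set.of("vm"));
        return needsNewRepresentation(baseCalls, Set.of("vm"));  // Derived redefines vm()
    }
}
```

For this input only m1 is reported: m2 never reaches vm(), so its representation from Base can be reused as is when building the representation of Derived.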
- 40. Chapter 2. Slicing 33
    ie1   interface A {
    c1        void method1(int h);
    c2        void method2(int v);
          }
    ie3   interface B extends A {
    c4        void method3(int u);
          }
    ce5   class C1 implements A {
    s6        int h, v;
    e7        public void method1(int h1) {
    s8            this.h = h1;
              }
    e9        public void method2(int v1) {
    s10           this.v = v1;
              }
          }
    ce11  class C2 implements A {
    s12       int h, v;
    e13       public void method1(int h2) {
    s14           this.h = h2+1;
              }
    e16       public void method2(int v2) {
    s17           this.v = v2+1;
              }
          }
    ce18  class C3 implements B {
    s19       int h, v, u;
    e20       public void method1(int h1) {
    s21           this.h = h1+2;
              }
    e22       public void method2(int v1) {
    s23           this.v = v1+2;
              }
    e24       public void method3(int u1) {
    s25           this.u = u1+2;
              }
          }
Key for parameter vertices: f1_in: this.h=this.h_in; f2_in: this.v=this.v_in; f3_in: this.u=this.u_in; f4_in: h1=h1_in; f5_in: v1=v1_in; f6_in: u1=u1_in; f7_in: h2=h2_in; f8_in: v2=v2_in; a1_in: h1_in=h; a2_in: v1_in=v; a3_in: u1_in=u.
[Graph panels: (a) interface ie1 with call vertices c1 and c2 connected to the methods of C1 and C2; (b) interface ie3 with call vertices c1, c2 and c4 connected to the methods of C3. Legend: interface-membership dependence arc, control dependence arc, call dependence arc, parameter dependence arc.]
Figure 2.12: Interface Dependence Graph (from [58])
- 41. Chapter 2. Slicing 34
2.3.3 Handling Polymorphism
In Java, method calls are bound to their implementation at runtime. Method invocation expressions such as o.m(args) are executed as follows:
1. The runtime type T of o is determined.
2. T.class is loaded.
3. T is checked for an implementation of method m. If T does not define an implementation, its superclass is checked, and then that class’s superclass, until an implementation is found.
4. Method m is invoked with the argument list args, and o is also passed to the method, where it becomes the this value for method m.
A polymorphic reference can refer to instances of more than one class. A class dependence graph represents such a polymorphic method call by using a polymorphic choice vertex [66]. A polymorphic choice vertex represents the selection of a particular call given a set of possible destinations. In this method, a message sent to a polymorphic object is represented as a set of callsites, one for each candidate message handling method, connected to a polymorphic choice vertex with polymorphic choice edges. This approach may give incorrect results: in function main(), Larsen’s approach uses only one callsite to represent the statement p.m1(), because m1() is declared only in Base. However, when m1() is called on objects of class Derived, it invokes Derived.vm() to modify d, and when m1() is called on objects of class Base, it invokes Base.vm() to modify a. One callsite cannot precisely represent both cases. This approach also computes spurious dependences: it is equivalent to using several objects, each belonging to a different type, to represent a polymorphic object. The data dependence construction algorithm cannot distinguish data members with the same names in these different objects.
Liang and Harrold [67] give an improved method of representing polymorphism to overcome this limitation. A polymorphic object is represented as a tree: the root of the tree represents the polymorphic object and the children of the root represent objects of
- 42. Chapter 2. Slicing 35 the possible types. When the polymorphic object is used as a parameter, the children are further expanded into trees; when the polymorphic object receives a message, the children are further expanded into callsites. In Figure 2.11 the callsite ba.m1() can have the receiver types Base and Derived. Thus the call site is expanded into one callsite for each receiver type.

2.3.4 Case Study - Elevator Class and its Dependence Graph
Figure 2.13 shows the elevator program and the slice with respect to line 59. Figure 2.14 shows the class dependence graph constructed for the program. The C++ Elevator class discussed in [72] has been modified for Java.
- 43. Chapter 2. Slicing 36
     1  class Elevator {
     2    static int UP=1, DOWN=-1;
     3    public Elevator(int t) {
     4      current_floor=1;
     5      current_direction = UP;
     6      top_floor = t;
     7    }
     8    public void up() {
     9      current_direction=UP;
    10    }
    11    public void down() {
    12      current_direction=DOWN;
    13    }
    14    int which_floor() {
    15      return current_floor;
    16    }
    17    public int direction() {
    18      return current_direction;
    19    }
    20    public void go (int floor) {
    21      if(current_direction==UP) {
    22        while (current_floor != floor &&
    23               current_floor <= top_floor)
    24          current_floor= current_floor+1;
    25      }
    26      else {
    27        while (current_floor != floor &&
    28               current_floor > 0)
    29          current_floor= current_floor-1;
    30      }
    31    int current_floor;
    32    int current_direction;
    33    int top_floor;
    34  }
    35  class AlarmElevator extends Elevator {
    36    public AlarmElevator(int top_floor) {
    37      super(top_floor);
    38      alarm_on=0;
    39    }
    40    public void set_alarm() {
    41      alarm_on=1;
    42    }
    43    public void reset_alarm() {
    44      alarm_on=0; }
    45    public void go(int floor) {
    46      if(!alarm_on)
    47        super.go(floor);
    48    }
    49    protected int alarm_on;
    50  }
    51  class Test {
    52    public static void main(String args[]) {
    53      Elevator e;
    54      if(condition)
    55        e=new Elevator(10);
    56      else
    57        e=new AlarmElevator(10);
    58      e.go(5);
    59      System.out.print(e.which_floor());
    60    }
    61  }
Figure 2.13: The Elevator program
- 44. Chapter 2. Slicing 37 [Figure 2.14: Dependence Graph for Elevator program. Statement vertices are labeled with the line numbers of Figure 2.13; line 59 is the slice point. Legend: control dependence edge, data dependence edge, summary edge, call edge, parameter edge.]
Key for parameter vertices:
    A1_in: a_in = current_floor               A1_out: current_floor = a_out
    A2_in: b_in = 1                           A3_in: b_in = -1
    A4_in: current_floor_in = current_floor   A4_out: current_floor = current_floor_out
    A5_in: current_dirn_in = current_dirn     A5_out: current_dirn = current_dirn_out
    A6_in: top_floor_in = top_floor           A6_out: top_floor = top_floor_out
    A7_in: alarm_on_in = alarm_on             A7_out: alarm_on = alarm_on_out
    A8_in: 1_top_floor_in = 1_top_floor       A9_in: floor_in = 5
    A10_in: top_floor = 10                    A11_in: 1_top_floor = 10
    F1_in: current_floor = current_floor_in   F1_out: current_floor_out = current_floor
    F2_in: current_dirn = current_dirn_in     F2_out: current_dirn_out = current_dirn
    F3_in: top_floor = top_floor_in           F3_out: top_floor_out = top_floor
    F4_in: 1_top_floor = 1_top_floor_in       F5_in: floor = floor_in
    F6_in: a = a_in                           F6_out: a_out = a
    F7_in: b = b_in
    F8_in: alarm_on = alarm_on_in             F8_out: alarm_on_out = alarm_on
- 45. Chapter 3. Points to Analysis 38
In this chapter we first discuss the need for points-to analysis. In the context of slicing, points-to analysis is essential for the correct computation of data dependencies and for the construction of the call graph. We summarize some issues related to computing points-to sets, including the methods for their computation and the various factors that affect precision. We next describe Andersen’s algorithm for pointer analysis for C and its adaptation for Java. We then describe a new method for intraprocedural alias analysis which is an improvement over flow insensitive analysis but not as precise as a flow sensitive analysis.

3.1 Need for Points to Analysis
The goal of pointer analysis is to statically determine the set of memory locations that can be pointed to by a pointer variable. If two variables can access the same memory location, the variables are said to be aliased. Alias analysis is necessary for program analysis, for optimizations, and for the correct computation of data dependence, which in turn is necessary for slicing. Consider the computation of data dependence in Figure 3.1. Here the statement print(y.a) is dependent on x.a=..., since x and y are aliased due to the execution of the statement y=x. Without alias analysis, it is not possible to infer that statement 7 is dependent on statement 4.
A points-to graph gives information about the set of memory locations pointed at by
- 46. Chapter 3. Points to Analysis 39
    1 void fun() {
    2   obj x,y;
    3   x=new obj();   // O1 represents the allocated object
    4   x.a = ....;
    5   ... = y.a;
    6   y = x;
    7   print(y.a);
    8 }
Figure 3.1: Need for Points to Analysis

each variable. Figure 3.2 shows two programs and their associated points-to graphs. In C a variable can point to another stack variable or to dynamically allocated memory on the heap, whereas in Java a reference variable can point only to objects allocated on the heap, since stack variables cannot be pointed to due to the lack of an address-of operator (&). Dynamically allocated memory locations on the heap are not named. One convention is to refer to objects (memory locations) by the statement at which they are created. A statement can be executed many times and can therefore create a new object each time. Thus approximations are introduced in the points-to graph if the above convention is used. Another cause for approximation is the presence of recursion and dynamic allocation of memory, which leads to a statically unbounded number of memory locations.

3.2 Pointer Analysis using Constraints
Our aim is to derive the points-to graph from the program text. One method to derive the points-to graph is to use constraints [64]. If pts(q) denotes the set of objects initially pointed to by q, then after an assignment such as p = q, p can additionally point to those objects that are initially pointed to by q. Thus we have the constraint pts(p) ⊇ pts(q). Every statement in the program has an associated constraint. A solution to the constraints gives the points-to sets associated with every variable. Constraints such as pts(p) ⊇ pts(q) are also called subset constraints or inclusion based constraints. Andersen uses subset constraints for analyzing C programs, and his algorithm is described in Section 3.4.
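To make the constraint formulation concrete, here is a minimal Java sketch that collects subset constraints for allocations and copy assignments and solves them by iterating to a fixed point. It is a deliberately simplified, flow insensitive illustration (no fields, no calls), not Andersen's full algorithm; the class `SubsetConstraints` and its methods are hypothetical names introduced for the example.

```java
import java.util.*;

final class SubsetConstraints {
    private final Map<String, Set<String>> pts = new TreeMap<>();
    // Each pair {p, q} records the constraint "pts(p) includes pts(q)".
    private final List<String[]> copies = new ArrayList<>();

    /** p = new ... at allocation site obj: pts(p) includes {obj}. */
    void alloc(String p, String obj) {
        pts.computeIfAbsent(p, k -> new TreeSet<>()).add(obj);
    }

    /** p = q: pts(p) includes pts(q). */
    void copy(String p, String q) {
        copies.add(new String[] {p, q});
    }

    /** Re-applies every copy constraint until no points-to set grows. */
    Map<String, Set<String>> solve() {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] c : copies) {
                Set<String> source = new TreeSet<>(pts.getOrDefault(c[1], Set.of()));
                changed |= pts.computeIfAbsent(c[0], k -> new TreeSet<>()).addAll(source);
            }
        }
        return pts;
    }

    /** Constraints for the two pointer statements of Figure 3.1. */
    static Map<String, Set<String>> figure31() {
        SubsetConstraints c = new SubsetConstraints();
        c.alloc("x", "O1");  // line 3: x = new obj()
        c.copy("y", "x");    // line 6: y = x
        return c.solve();
    }
}
```

For Figure 3.1 the solution gives both x and y the set {O1}, which is what lets a slicer relate statement 7 to statement 4. Because statement order plays no role in the solution, the same result also applies at line 5, before y = x has executed.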
- 47. Chapter 3. Points to Analysis 40
Points to graph for a C program:
    int a=1, b=2;
    int *p, *q;
    void *r, *s;
    p = &a;
    q = &b;
    h1: r = malloc
    h2: s = malloc
Points to graph for a Java program:
    class Obj { int f; }
    Obj r,s,t;
    h1: r = new Obj();
    h2: s = new Obj();
    h3: r.f = new Obj();
    t = s;
[Graph nodes for the C program: p, q, r, s, a, b, heap1, heap2. Graph nodes for the Java program: r, s, t, heap1, heap2, heap3, with edges labeled by the field f.]
Figure 3.2: Points to Graphs

Subset vs Unification Constraints
The constraints generated can be either subset based or equality based. A subset constraint such as p ⊇ q says that the points-to set of p contains the points-to set of q. Instead of using subset constraints, Steensgaard [13] uses equality based constraints, where after each assignment like p = q the points-to sets of p and q are unified, i.e. the points-to sets of both variables are made identical.
Steensgaard’s approach is based on a non-standard type system, where type does not refer to the declared type in the program source. Instead, the type of a variable describes a set of locations possibly pointed to by the variable at runtime. At initialization each variable is described by a different type. When two variables can point to the same memory location, the types represented by the variables are merged. However, the stronger constraints make the analysis less precise. The equality based approach is also called unification because it treats assignments as bidirectional. This unification merges the
- 48. Chapter 3. Points to Analysis 41 points-to sets of both sides of the assignment and is essentially computing an equivalence relation defined by assignments, which is done by the fast union-find algorithm [22].
If all the variables can be assigned types subject to the constraints, then the system of constraints is said to be satisfiable or well typed. Points-to analysis reduces to the problem of assigning types to all locations (variables) in a program such that the variables in the program are well typed. At the end of the analysis, two locations are assigned different types unless they have to be described by the same type in order for the system of constraints to be well typed.

3.3 Dimensions of Precision
The various factors that contribute to the precision of the analysis are flow sensitivity, field sensitivity, context sensitivity and heap modelling. Ryder [17] discusses various parameters that contribute to the precision of the analysis.

Flow Sensitive vs Flow Insensitive approach
A flow sensitive analysis takes into account the control flow structure of the program. Thus the points-to set associated with a variable depends on the program point. It computes the mapping variable ⊗ program point → memory location. This is precise but requires a large amount of memory, since the points-to sets of the same variable at two different program points may be different and have to be recorded separately. Flow sensitive analysis allows us to take advantage of strong updates, where after a statement x = ..., the points-to information about x prior to that statement can be removed.
A flow insensitive approach computes conservative information that is valid at all program points. It considers the program as a set of statements and computes points-to information ignoring control flow. Flow insensitive analysis computes a single points-to relation that holds regardless of the order in which assignment statements are actually
- 49. executed.

A flow insensitive analysis can produce imprecise results. Consider the computation of data dependence for the program in Figure 3.1. If we apply flow insensitive alias analysis, then the analysis will conclude that x and y can both point to O1, and thus the statement ... = y.a (line 5) is made dependent on x.a = ... . But y can point to O1 only after the statement y = x. Thus flow insensitive analysis leads to spurious data dependence.

Field Sensitivity

Aggregate objects such as structures can be handled by one of three approaches: field-insensitive, where field information is discarded by modeling each aggregate with a single constraint variable; field-based, where one constraint variable models all instances of a field; and finally, field-sensitive, where a unique variable models each field instance of an object. The following table describes these approaches for the code segment x.a = new object(); y.b = x.a;

Approach             Constraint
field based          pts(b) ⊇ pts(a)
field insensitive    pts(y) ⊇ pts(x)
field sensitive      pts(y.b) ⊇ pts(x.a)

Heap Abstraction

Two variables are aliased if they can refer to the same object in memory. Thus we need to keep track of objects that can be present at runtime. The objects created at runtime cannot be determined statically and have to be conservatively approximated. The least precise manner is to consider the entire heap as a single object. The most common manner of abstraction is to have one abstract object per program point. This abstract object is a representative of all the objects that can be created at runtime due to that program
- 50. point. A more precise abstraction is to take context sensitivity into account, using the calling context to distinguish between various objects created at the same program point.

    main() {
        object a, b, c, d;
        a = new object();     pts(a) ⊇ {o1}
        b = new object();     pts(b) ⊇ {o2}
        c = id(a);            pts(r) ⊇ pts(a), pts(c) ⊇ pts(r)
        d = id(b);            pts(r) ⊇ pts(b), pts(d) ⊇ pts(r)
    }
    object id(object r) { return r; }

Figure 3.3: Imprecision due to context insensitive analysis

Context Sensitivity

A context sensitive analysis distinguishes between different calling contexts and does not merge data flow information from multiple contexts. In Figure 3.3, a and b point to o1 and o2 respectively. Due to the function calls, c is made to point to o1 and d is made to point to o2. So the actual points-to sets are a → o1, b → o2, c → o1 and d → o2. A context insensitive analysis models parameter bindings as explicit assignments. Thus r points to both the objects o1 and o2. This leads to smearing of information, making c and d point to both o1 and o2.

One method to incorporate context sensitivity is to summarize each procedure and embed that information at the call sites. A method can change the points-to sets of all data reachable through static variables, incoming parameters and all objects created by the method and its callees. A method's summary must include the effect of all the updates that the method and all its callees can make, in terms of incoming parameters. Thus summaries are huge. There is also a further difficulty due to the callback mechanism.
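
As a rough illustration of the constraints in Figure 3.3, the following Java sketch solves them in three ways: subset based and context insensitive, subset based with the formal parameter cloned per call site, and unification based. The class name PointsToDemo, the helper names, and the cloned variables r1 and r2 are assumptions made for this example; they are not taken from the thesis or from SOOT. Call-site cloning is used here only as a simple stand-in for context sensitivity, in place of the procedure summaries described above. The unification solver is a simplified reading of the equality based scheme (it merges the two sides of each copy directly) and should not be taken as Steensgaard's full algorithm.

```java
import java.util.*;

public class PointsToDemo {
    // Allocation sites of Figure 3.3: {variable, abstract object}.
    static final String[][] ALLOCS = {{"a", "o1"}, {"b", "o2"}};
    // Context insensitive: both calls bind the single formal r. Each copy is {lhs, rhs}.
    static final String[][] INSENSITIVE = {{"r", "a"}, {"c", "r"}, {"r", "b"}, {"d", "r"}};
    // Hypothetical call-site cloning: r1 and r2 stand for r in each calling context.
    static final String[][] CLONED = {{"r1", "a"}, {"c", "r1"}, {"r2", "b"}, {"d", "r2"}};

    /** Subset based: apply "pts(lhs) contains pts(rhs)" for every copy until a fixed point. */
    static Map<String, Set<String>> solveSubset(String[][] allocs, String[][] copies) {
        Map<String, Set<String>> pts = new TreeMap<>();
        for (String[] a : allocs) {
            pts.computeIfAbsent(a[0], k -> new TreeSet<>()).add(a[1]);
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] c : copies) {
                Set<String> src = pts.computeIfAbsent(c[1], k -> new TreeSet<>());
                Set<String> dst = pts.computeIfAbsent(c[0], k -> new TreeSet<>());
                changed |= dst.addAll(src);   // true only if dst grew
            }
        }
        return pts;
    }

    /** Union-find lookup with path compression; unseen variables start as their own class. */
    static String find(Map<String, String> parent, String x) {
        parent.putIfAbsent(x, x);
        String p = parent.get(x);
        if (!p.equals(x)) {
            p = find(parent, p);
            parent.put(x, p);
        }
        return p;
    }

    /** Merge the equivalence classes of x and y. */
    static void union(Map<String, String> parent, String x, String y) {
        parent.put(find(parent, x), find(parent, y));
    }

    /** Unification based (simplified): every copy merges both sides into one class. */
    static Map<String, Set<String>> solveUnify(String[][] allocs, String[][] copies) {
        Map<String, String> parent = new HashMap<>();
        for (String[] c : copies) {
            union(parent, c[0], c[1]);
        }
        // One shared points-to set per equivalence class, keyed by the class root.
        Map<String, Set<String>> classPts = new HashMap<>();
        for (String[] a : allocs) {
            classPts.computeIfAbsent(find(parent, a[0]), k -> new TreeSet<>()).add(a[1]);
        }
        Map<String, Set<String>> pts = new TreeMap<>();
        for (String v : new ArrayList<>(parent.keySet())) {
            pts.put(v, classPts.getOrDefault(find(parent, v), new TreeSet<>()));
        }
        return pts;
    }

    public static void main(String[] args) {
        System.out.println("subset, insensitive: " + solveSubset(ALLOCS, INSENSITIVE));
        System.out.println("subset, cloned:      " + solveSubset(ALLOCS, CLONED));
        System.out.println("unify,  insensitive: " + solveUnify(ALLOCS, INSENSITIVE));
    }
}
```

Under the context insensitive subset constraints, c and d each end up pointing to both o1 and o2, which is the smearing described for Figure 3.3. With the formal parameter cloned per call site, c points only to o1 and d only to o2. Under unification, a and b are also merged into the same class and share {o1, o2}, which reflects the precision loss noted for equality based constraints.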
