Thesis: A Static Slicing Tool for sequential Java programs
Masters thesis at IISc

Specialization topics: Compiler Design / Program Analysis / Pointer analysis / Java program optimization







A Static Slicing Tool for Sequential Java Programs

A Thesis Submitted for the Degree of Master of Science (Engineering) in the Faculty of Engineering

by Arvind Devaraj

Computer Science and Automation
Indian Institute of Science
Bangalore – 560 012
March 2007
Abstract

A program slice consists of a subset of the statements of a program that can potentially affect values computed at some point of interest. Such a point of interest, along with a set of variables, is called a slicing criterion. Slicing tools are useful for several applications, such as program understanding, testing and program integration. Slicing object-oriented programs raises special problems that need to be addressed due to features like inheritance, polymorphism and dynamic binding. Alias analysis is important for the precision of slices. In this thesis we implement a slicing tool for sequential Java programs in the SOOT framework. SOOT is a front-end for Java developed at McGill University, and it provides several forms of intermediate code. We have integrated the slicer into the framework. We also propose an improved technique for intraprocedural points-to analysis. We have implemented this technique and compare the results of the analysis with those for a flow-insensitive scheme in SOOT. Performance results of the slicer are reported for several benchmarks.
Contents

Abstract
1 Introduction
    1.1 Slicing
    1.2 The SOOT Framework
    1.3 Contributions of the thesis
2 Slicing
    2.1 Intraprocedural Slicing using PDG
        2.1.1 Program Dependence Graph
        2.1.2 Slicing using the Program Dependence Graph
        2.1.3 Construction of the Data Dependence Graph
        2.1.4 Control Dependence Graph
        2.1.5 Slicing in presence of unstructured control flow
        2.1.6 Reconstructing CFG from the sliced PDG
    2.2 Interprocedural Slicing using SDG
        2.2.1 System Dependence Graph
        2.2.2 Calling context problem
        2.2.3 Computing Summary Edges
        2.2.4 The Two Phase Slicing Algorithm
        2.2.5 Handling Shared Variables
    2.3 Slicing Object Oriented Programs
        2.3.1 Dependence Graph for Object Oriented Programs
        2.3.2 Handling Inheritance
        2.3.3 Handling Polymorphism
        2.3.4 Case Study - Elevator Class and its Dependence Graph
3 Points to Analysis
    3.1 Need for Points to Analysis
    3.2 Pointer Analysis using Constraints
    3.3 Dimensions of Precision
    3.4 Andersen’s Algorithm for C
    3.5 Andersen’s Algorithm for Java
        3.5.1 Model for references and heap objects
        3.5.2 Computation of points to sets in SPARK
    3.6 CallGraph Construction
        3.6.1 Handling Virtual Methods
    3.7 Improvements to Points to Analysis
    3.8 Improving Flow Sensitivity
        3.8.1 Computing Valid Subgraph at each Program Point
        3.8.2 Computation of Access Expressions
        3.8.3 Checking for Satisfiability
4 Implementation and Experimental Results
    4.1 Soot - A bytecode analysis framework
    4.2 Steps in performing slicing in Soot
    4.3 Points to Analysis and Call Graph
    4.4 Computing Required Classes
    4.5 Side effect computation
    4.6 Preprocessing
    4.7 Computing the Class Dependence Graph
    4.8 Experimental Results
5 Conclusion and Future Work
Bibliography
List of Tables

3.1 Constraints for C
3.2 Constraints for Java
3.3 Data flow equations for computing valid edges
3.4 Computation of Valid edges
4.1 Benchmarks Description
4.2 Number of Edges in the Class Dependence Graph
4.3 Timing Requirements
4.4 Program Statistics - Partial Flow Sensitive
4.5 Precision Comparison
List of Figures

1.1 A program and its slice
2.1 A Control Flow Graph
2.2 Post Dominator Tree for the CFG in Figure 2.1
2.3 Dominance Frontiers
2.4 A program and its PDG (taken from [39])
2.5 Augmented CFG and PDG for the program in Figure 2.4 (taken from [39])
2.6 A program with function calls
2.7 System Dependence Graph for an interprocedural program
2.8 Slicing the System Dependence Graph
2.9 Program
2.10 The Dependence Graph for the main function (from [67])
2.11 The Dependence Graphs for functions C() and D() (from [67])
2.12 Interface Dependence Graph (from [58])
2.13 The Elevator program
2.14 Dependence Graph for Elevator program
3.1 Need for Points to Analysis
3.2 Points to Graphs
3.3 Imprecision due to context insensitive analysis
3.4 Object Flow Graph
3.5 An example program
3.6 Access Expressions
3.7 OFG Subgraph
3.8 Access Expressions (for a DAG)
3.9 Access Expressions (for general graph)
3.10 Simplified Access Expressions
3.11 Dominator Tree
4.1 Soot Framework Overview
4.2 Computation of the class dependence graph
4.3 Jimple code and its slice
Chapter 1

Introduction

1.1 Slicing

A program slice consists of the parts of a program that can potentially affect the value of variables computed at some point of interest. Such a point is called the slicing criterion and is specified by a pair (program point, set of variables). The original concept of a program slice was proposed by Mark Weiser [61]. According to his definition:

    A slice s of program p is a subset of the statements of p that retains some specified behavior of p. The desired behavior is detailed by means of a slicing criterion c. Generally, a slicing criterion c is a set of variables V and a program point l. When the slice s is executed, it must always have the same values as program p for the variables in V at point l.

Weiser claimed that a program slice was the abstraction that users had in mind as they debugged programs. There have been variations in the definitions of program slices depending on the application in mind. Weiser’s original definition required a slice S of a program to be an executable subset of the program, whereas another common definition defines a slice as a subset of statements that directly or indirectly affect the values computed at the point of interest but are not necessarily an executable segment. Figure 1.1 shows a program (left) sliced with respect to the slicing criterion (print(product), product), together with the resulting slice (right).

    read(n);                          read(n);
    i = 1;                            i = 1;
    sum = 0;
    product = 1;                      product = 1;
    while (i <= n) {                  while (i <= n) {
        sum = sum + i;
        product = product * i;            product = product * i;
        i = i + 1;                        i = i + 1;
    }                                 }
    print(sum);
    print(product);                   print(product);

    Figure 1.1: A program and its slice

Since the transformed program is expected to be much smaller than the original, it is hoped that dependencies between statements in the program will be more explicit. Surveys on program slicing are presented in [45], [73]. Slicing tools have been used for several applications, such as program understanding [82], testing [74] [75], program integration [78], model checking [79] and so forth.

1. Program Understanding: Software engineers are often assigned to understand a massive piece of code and modify parts of it. When modifying a program, we need to comprehend a section of the program rather than the whole program. Backward and forward slicing can be used to browse the code and understand the interdependence between various parts of the program.

2. Testing: In the context of testing, a problem that is often encountered is that of finding the set of program statements that are affected by a change in the program. This analysis is termed impact analysis. To determine what tests need to be re-run to test a modified statement S, a backward slice on S will get the statements that actually influence the behavior of the program.

3. Debugging: Quite often the statement that is actually responsible for a bug that shows up at some program point P is statically far away from P. To reduce the search space of possible causes for the error, the programmer can use a backward slice to eliminate parts of the code that could not have been the cause of the problem.

4. Model Checking: Model checking is a verification technique that performs an exhaustive exploration of a program’s state space. Typically the execution of a program is simulated, and the paths and states encountered in the simulation are checked against correctness specifications phrased as temporal logic formulas. The use of slicing here is to reduce the size of a program P being checked for a property by eliminating statements and variables that are irrelevant to the formula.

There is an essential difference between static and dynamic slices. A static slice disregards the actual inputs to a program, whereas a dynamic slice relies on a specific test case and therefore is, in general, more precise.

When slicing a program P we are concerned with both correctness and precision. For correctness we demand that the slice S produced by the tool is a superset of the actual slice S(p) for the slicing criterion p. Precision has to do with the size of the slice. For two correct slices S1 and S2, S1 is more precise than S2 if the statements of S1 are a subset of the statements of S2. Obtaining the most precise slice is in general not computable, hence our aim is to compute a correct slice that is as precise as possible.

The slicing problem can be addressed by viewing it as a reachability problem in a Program Dependence Graph (PDG) [54]. A PDG is a directed graph with vertices corresponding to statements and predicates, and edges corresponding to data and control dependences. For the sequential intraprocedural case, the backward slice with respect to a node in the PDG is the set of all nodes in the PDG on which this node is transitively dependent. Thus, given the PDG, a simple reachability algorithm on the PDG will construct the slice.
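The reachability view can be illustrated on the program of Figure 1.1. The following is a minimal sketch, not the implementation described in this thesis: the statements are numbered 1 to 10 in program order (an assumption made here for illustration), the dependence lists were derived by hand rather than by a PDG builder, and the class and method names are hypothetical. Control dependences on the enter node are omitted.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class PdgSliceSketch {
    // Statement numbering (assumed for this sketch):
    //  1 read(n)   2 i = 1   3 sum = 0   4 product = 1   5 while (i <= n)
    //  6 sum = sum + i   7 product = product * i   8 i = i + 1
    //  9 print(sum)   10 print(product)
    // Each key maps to the statements it is data or control dependent on (hand-derived).
    static final Map<Integer, List<Integer>> DEPENDS_ON = Map.of(
        5, List.of(1, 2, 8),
        6, List.of(5, 3, 6, 2, 8),
        7, List.of(5, 4, 7, 2, 8),
        8, List.of(5, 2, 8),
        9, List.of(3, 6),
        10, List.of(4, 7));

    // Backward slice: every node from which the criterion is reachable,
    // found by following dependence edges backwards with a worklist.
    static Set<Integer> backwardSlice(int criterion) {
        Set<Integer> slice = new TreeSet<>();
        Deque<Integer> worklist = new ArrayDeque<>();
        worklist.push(criterion);
        while (!worklist.isEmpty()) {
            int node = worklist.pop();
            if (slice.add(node)) { // true only the first time a node is visited
                for (int dep : DEPENDS_ON.getOrDefault(node, List.of())) {
                    worklist.push(dep);
                }
            }
        }
        return slice;
    }

    public static void main(String[] args) {
        System.out.println(backwardSlice(10)); // prints [1, 2, 4, 5, 7, 8, 10]
    }
}
```

Under this numbering, the result for statement 10 corresponds to the right-hand column of Figure 1.1: the three statements that mention only sum (3, 6 and 9) are not reachable and drop out of the slice.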
For interprocedural slices, however, the process is more complicated, as mere reachability will produce imprecise slices. One needs to track only interprocedurally realizable paths, where a realizable path corresponds to legal call/return pairs in which a procedure always returns to the call site where it was invoked. The structure on which interprocedural slicing is generally implemented is the System Dependence Graph (SDG) [63]. This graph is a collection of PDGs for individual procedures, augmented with some extra edges that capture the interaction between them. Slicing of interprocedural programs is described by Horwitz et al. [63]. They use the SDG to track dependencies in a program and use a two-phase algorithm to ensure that only feasible paths are tracked, that is, those in which procedure calls are matched with the correct return statements.

Slicing object-oriented programs adds yet another dimension of complexity to the slicing problem. Object-oriented concepts such as classes, objects, inheritance, polymorphism and dynamic binding make the representation and analysis techniques used for imperative programming languages inadequate for object-oriented programs. The Class Dependence Graph, introduced by Larsen and Harrold [66], can represent class hierarchy, data members and polymorphism. Some more features were added by Liang and Harrold [67].

To compute the dependence graph, it is necessary to build a call graph. The computation of the call graph becomes complicated in the presence of dynamic binding, i.e. when the target of a method call depends on the runtime type of a variable. Algorithms like Rapid Type Analysis (RTA) [26] compute call graphs using type information.

A key analysis for object-oriented languages is alias analysis. The objective here is to follow an object O from its point of allocation to find out which objects reference O and which other objects are referenced by the fields of O. Resolving aliases is required for the correct computation of data dependencies in the dependence graph. The precision of the analysis depends on various factors like flow sensitivity, context sensitivity and the handling of field references. Andersen [64] gives a flow-insensitive method for finding aliases using subset constraints. Lhotak [70] describes the method adapted for Java programs.
In this thesis we implement a slicing tool for sequential Java programs and integrate it into the SOOT framework. We briefly describe the framework and the contributions of the thesis.
1.2 The SOOT Framework

The SOOT analysis and transformation framework [69] is a Java optimization framework developed by the Sable Research Group at McGill University, and it is intended to be a robust, easy-to-use research framework. It has been used extensively for program analysis, instrumentation, and optimization. It provides several forms of intermediate code for analyzing and optimizing Java bytecode. Jimple is a typed three-address representation, which we have used in our implementation.

Our objective is to implement a slicing tool within the Soot framework [69] and make it publicly available. At the time this work was begun there was no publicly available slicing infrastructure for Java. The Indus [81] project addresses the slicing problem for Java programs, and its source code was made available in February 2007.

1.3 Contributions of the thesis

The following are the contributions of this thesis:

1. We have implemented the routines for creating the program dependence graphs and the class dependence graph for an input Java program that is represented in the form of Jimple intermediate code.

2. We have integrated a slicer into the framework. For interprocedural slicing we have implemented the two-phase slicing algorithm of [63].

3. We propose an improved technique for intraprocedural points-to analysis. This uses path expressions to track paths that encode valid points-to information. A simple data-flow analysis formulation collects valid edges, i.e. those that are added to the object flow graph. Reachability queries are handled in a reasonable amount of time. We have implemented this technique and compare the results of the analysis with those for a flow-insensitive scheme in SOOT.

4. The slicing tool has been run on several benchmarks, and we report on the times taken to build the class dependence graph, its size, slice sizes for some given slicing criteria, and slicing times.
Chapter 2

Slicing

In this chapter, we discuss techniques for slicing a program and in particular issues that arise when slicing object-oriented programs. The first part of the chapter describes the Program Dependence Graph (PDG), its construction and the algorithm for intraprocedural slicing. For slicing programs with function calls, the System Dependence Graph (SDG) is used. The SDG is a collection of the PDGs of individual procedures with additional edges for modeling procedure calls and parameter bindings. The second part of the chapter describes the construction of the SDG and the algorithm for interprocedural slicing. The third part of the chapter describes dependence graph computation for object-oriented programs, which is complicated because objects can be passed as parameters and methods can be invoked upon objects. We also need the results of points-to analysis to determine what objects are pointed to by each reference variable. Then we describe the extension of the algorithm for computing the dependence graph in the presence of inheritance and polymorphic function calls.

2.1 Intraprocedural Slicing using PDG

Weiser’s approach [61] to program slicing is based on dataflow equations. In his approach, the set of relevant variables is iteratively computed till a fixed point is reached. Slicing via graph reachability was introduced by Ottenstein [54]. In this approach a dependence graph of the program is constructed and the problem of slicing reduces to computing reachability on the dependence graph. We adopt this in our implementation.

2.1.1 Program Dependence Graph

A program dependence graph (PDG) represents the data and control dependencies in the program. Nodes of the PDG represent statements and predicates in a source program, and its edges denote dependence relations. The PDG can be constructed as follows.

1. Build the program’s CFG, and use it to compute data and control dependencies: Node N is data dependent on node M iff M defines a variable x, N uses x, and there is an x-definition-free path in the CFG from M to N. Node N is control dependent on node M iff M is a predicate node whose evaluation to true or false determines whether N will be executed.

2. Build the PDG. The nodes of the PDG are almost the same as the nodes of the CFG. However, in addition, there is a special enter node, and a node for each predicate. The PDG does not include the CFG’s exit node. The edges of the PDG represent the data and control dependencies computed using the CFG.

2.1.2 Slicing using the Program Dependence Graph

To compute the slice from statement (or predicate) S, start from the PDG node that represents S and follow the data- and control-dependence edges backwards in the PDG. The components of the slice are all of the nodes reached in this manner.

The computation of the data dependence graph is described in Section 2.1.3. Computing the control dependence graph is described in Section 2.1.4. Figure 2.4 shows an example program and its corresponding PDG. Solid lines represent control dependencies while dashed lines represent data dependencies.
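The data dependence condition of step 1 in Section 2.1.1 (M defines x, N uses x, and an x-definition-free path leads from M to N) can be made concrete with a small sketch. It is illustrative only and is not the routine implemented in the tool: the five-node CFG, the array encoding and all names are assumptions introduced here, and each node is assumed to define at most one variable. The sketch propagates the set of definitions reaching each node until nothing changes, then pairs every use with the reaching definitions of the same variable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class FlowDependenceSketch {
    // Hypothetical CFG, one entry per node:
    //   0: read(n)   1: i = 1   2: while (i <= n)   3: i = i + 1   4: print(i)
    static final String[] DEF = {"n", "i", null, "i", null};          // variable defined, or null
    static final String[][] USES = {{}, {}, {"i", "n"}, {"i"}, {"i"}}; // variables used
    static final int[][] SUCC = {{1}, {2}, {3, 4}, {2}, {}};           // CFG successors

    // For each node, returns the set of nodes it is flow (data) dependent on.
    static List<Set<Integer>> flowDependences() {
        int n = DEF.length;
        List<Set<Integer>> in = new ArrayList<>(); // definitions reaching the entry of each node
        for (int k = 0; k < n; k++) in.add(new TreeSet<>());
        boolean changed = true;
        while (changed) { // iterate to a fixed point
            changed = false;
            for (int k = 0; k < n; k++) {
                Set<Integer> out = new TreeSet<>(in.get(k));
                String defined = DEF[k];
                if (defined != null) {
                    out.removeIf(d -> DEF[d].equals(defined)); // kill older definitions
                    out.add(k);                                // generate this definition
                }
                for (int s : SUCC[k]) {
                    if (in.get(s).addAll(out)) changed = true;
                }
            }
        }
        List<Set<Integer>> deps = new ArrayList<>();
        for (int k = 0; k < n; k++) {
            Set<Integer> d = new TreeSet<>();
            for (int def : in.get(k)) {
                if (Arrays.asList(USES[k]).contains(DEF[def])) d.add(def);
            }
            deps.add(d);
        }
        return deps;
    }

    public static void main(String[] args) {
        System.out.println(flowDependences()); // prints [[], [], [0, 1, 3], [1, 3], [1, 3]]
    }
}
```

In this example the loop predicate (node 2) depends on both definitions of i, the one before the loop and the one in the loop body, because each reaches it along a path with no intervening definition of i.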
  16. 16. Chapter 2. Slicing 92.1.3 Construction of the Data Dependence GraphA data dependence graph represents the association between definitions and uses of avariable. There is an association (d, u) between a definition of variable v at d and a useof variable v at u iff there is at least one control flow path from d to u with no interveningdefinition of v. Each node represent a statement. An edge represents a flow dependency betweenstatements. Though there are many kinds of data dependencies between statements,only flow dependencies are necessary for the purpose of slicing as only flow dependenceneeds to be traced back in order to compute the PDG nodes comprising the slice. Outputand anti dependence edges do not represent true data dependence. Instead they encodea partial order on program statements, which is necessary to preserve since there is noexplicit control flow relation between PDG nodes. However, PDG slices are normallymapped back to high-level source code, where control flow is explicitly represented. Thusthere is no need for any such control flow information to be present in the computedPDG slice. Computation of flow dependencies is done by computing the problem of reachingdefinitions. The problem of reaching definitions is a classical bitvector problem solvableby monotone dataflow framework. This associates a program point with the set ofdefinitions reaching that point. The definitions reaching a program point along with theuse of a variable form flow dependencies.Dependence in presence of arrays and recordsIn the presence of composite data types like arrays, records and pointers, the mostconservative method is to assume a definition of a variable to be the definition of theentire composite object [83]. A definition (or use) of an element of an array can beconsidered as definition (or use) of the entire array. For example, consider the statement a[i] = x
Here the variable a is defined and the variables i and x are used, which suggests DEF = {a} and REF = {i, x}. However, the value of a is used in computing the address of a[i], and thus a must also be included in the REF set. The correct value for REF is {a, i, x} [45]. This approach is conservative and leads to large slices because of spurious dependencies. Our current implementation handles composite data types in this manner, though more refined methods have been proposed in the literature. Agrawal et al. [53] propose a modified algorithm for computing reaching definitions that determines the memory locations defined and used in statements and computes whether the intersection among those locations is complete, partial, or statically indeterminable. Another way to avoid spurious dependencies is to use array index tests, such as the GCD test, which can determine that there is no dependence between two array access expressions.

Data dependencies in presence of aliasing

A major problem in computing data dependencies is the presence of aliasing. Consider the following example.

    void fun() {
        obj x, y;
        x = new obj();   // o1 is the object created
        y = x;
        x.a = ...;
        ... = y.a;
    }

Here there is a data dependency between x.a = ... and ... = y.a, since both x and y point to the object o1. Without alias analysis this dependency is missed because the syntactic expressions x.a and y.a are different. Thus resolving aliases is necessary for the correct computation of data dependencies. Conversely, if worst case assumptions are made for field loads and stores, many spurious dependencies are created.
2.1.4 Control Dependence Graph

Another kind of dependence between statements arises due to the presence of control structure. For example, consider the following code.

    P     if (x > y)
    S1        max = x;
          else
    S2        max = y;

The execution of S1 is dependent on the predicate x > y. Thus S1 is said to be control dependent on P. A slice with respect to S1 has to include P, because the execution of S1 depends on the outcome of the predicate node P.

Two nodes Y and Z should be identified as having identical control conditions if, in every run of the program, node Y is executed if and only if Z is executed. In Figure 2.1, nodes 2 and 5 are said to be control dependent on the true branch of node 1, since their execution depends on the outcome of node 1. The original method for computing control dependence information using postdominators is presented by Ferrante et al. [47]. Cytron et al. [46] give an improved method for constructing control dependence information by using dominance frontiers.

Finding control dependence using the postdominator relationship

A node X is said to be a postdominator of node Y if all possible paths from Y to the exit node must pass through X. A node N is said to be control dependent on edge a → b if

  1. N postdominates b
  2. N does not postdominate a

In Figure 2.1, to find the nodes that are control dependent on edge 1 → 2, we find the nodes that postdominate node 2 but not node 1. Nodes 2 and 5 are such nodes. So nodes 2 and 5 are control dependent on the edge 1 → 2.
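The two conditions above can be checked mechanically once postdominator sets are available. The following Java sketch is an illustration only: the class and method names are hypothetical, postdominators are computed by a simple iterative fixed point rather than by the method of [47], and postdominance is taken to be reflexive (every node postdominates itself). The example CFG is the if-else fragment shown at the start of this section, not the graph of Figure 2.1.

```java
import java.util.*;

final class PostDomControlDependence {
    /** Iterative postdominator sets; succ must contain every node as a key. */
    static Map<Integer, Set<Integer>> postDominators(Map<Integer, List<Integer>> succ, int exit) {
        Map<Integer, Set<Integer>> pdom = new HashMap<>();
        for (Integer n : succ.keySet()) {
            Set<Integer> init = new HashSet<>();
            if (n == exit) {
                init.add(exit);                 // exit is postdominated only by itself
            } else {
                init.addAll(succ.keySet());     // optimistic start: all nodes
            }
            pdom.put(n, init);
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Integer n : succ.keySet()) {
                if (n == exit) continue;
                Set<Integer> next = null;
                for (Integer s : succ.get(n)) { // intersect over all CFG successors
                    if (next == null) next = new HashSet<>(pdom.get(s));
                    else next.retainAll(pdom.get(s));
                }
                if (next == null) next = new HashSet<>();
                next.add(n);                    // reflexive postdominance
                if (!next.equals(pdom.get(n))) {
                    pdom.put(n, next);
                    changed = true;
                }
            }
        }
        return pdom;
    }

    /** Nodes N that postdominate b but do not postdominate a, for the CFG edge a -> b. */
    static Set<Integer> controlDependentOn(Map<Integer, Set<Integer>> pdom, int a, int b) {
        Set<Integer> result = new TreeSet<>(pdom.get(b));
        result.removeAll(pdom.get(a));
        return result;
    }

    /** CFG of the if-else fragment: 0 = P, 1 = S1, 2 = S2, 3 = exit. */
    static Map<Integer, List<Integer>> maxExample() {
        Map<Integer, List<Integer>> succ = new HashMap<>();
        succ.put(0, Arrays.asList(1, 2));
        succ.put(1, Arrays.asList(3));
        succ.put(2, Arrays.asList(3));
        succ.put(3, Collections.emptyList());
        return succ;
    }

    public static void main(String[] args) {
        Map<Integer, Set<Integer>> pdom = postDominators(maxExample(), 3);
        System.out.println(controlDependentOn(pdom, 0, 1));   // nodes controlled by edge P -> S1
    }
}
```

For the edge P → S1 the sketch reports S1 alone, since the exit node postdominates P as well. Some formulations use strict postdominance in the second condition, which additionally makes a loop predicate control dependent on itself; the sketch follows the conditions as stated above.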
This observation suggests that to find the nodes that are control dependent on the edge X → Y, we can traverse the postdominator tree upwards from Y and mark every node visited as control dependent on that edge, stopping when we reach the immediate postdominator of X.

[Figure 2.1: A Control Flow Graph]

[Figure 2.2: Post Dominator Tree for the CFG in Figure 2.1]

Using Dominance Frontiers to compute Control Dependence

Control dependencies between statements can be computed efficiently using dominance frontier information. Cytron et al. [46] describe the method for computing dominance frontiers.

The dominance frontier of a vertex v_i contains all vertices v_j such that v_i dominates an immediate predecessor of v_j, but v_i does not strictly dominate v_j [62]:

    DF(v_i) = { v_j ∈ V | ∃ v_k ∈ Pred(v_j) : (v_i dom v_k) ∧ ¬(v_i sdom v_j) }

Informally, the set of nodes lying just outside the dominated region of Y is said to be in the dominance frontier of Y. In the example in Figure 2.3, Y dominates nodes Y', Y'' and Y''', and X lies just outside the dominated region. So X is said to be in the dominance frontier of Y.

[Figure 2.3: Dominance Frontiers]

Note that if X is in the dominance frontier of Y, then there are at least two incoming paths to X, of which one contains Y and another does not. If the CFG is reversed, then we have two outgoing paths from X, one containing Y and another not containing Y. This is the same as the condition for Y to be control dependent on X. Thus to find control dependence it is enough to find the dominance frontiers on the reverse control flow graph. Algorithm 1 computes the control dependence information.
Algorithm 1 Algorithm to compute the Control Dependence Graph
  compute the dominance frontiers of the reversed CFG G
  for all N in G do
    let RDF(N) be the reverse dominance frontier of N
    if RDF(N) is empty then
      N is made control dependent on the method entry node
    end if
    for all nodes P in RDF(N) do
      for all nodes S that are CFG successors of P do
        if S = N or N postdominates S then
          N is made control dependent on P
        end if
      end for
    end for
  end for

2.1.5 Slicing in presence of unstructured control flow

In the presence of unstructured control flow caused by jump statements like goto, break, continue and return, the algorithm for slicing can produce an incorrect slice. While Java does not have goto statements, break and continue statements cause unstructured control flow. Consider computing the slice with respect to the statement print(prod) in Figure 2.4. When the slicing algorithm discussed in Section 2.1.2 is applied, the statement break is not included, which is incorrect.

This was discovered by Choi and Ferrante [38] and by Ball and Horwitz [37], who present a method to compute a correct slice in the presence of unstructured control flow statements. Their method is based on the observation that jumps are similar to predicate nodes in that both affect the flow of control. Thus jumps are also made to be sources of control dependence edges. A jump vertex has an outgoing true edge to the target of the jump, and an outgoing false edge to the statement that would execute if the jump were a no-op. A jump vertex is considered a pseudo predicate, since the outgoing false edge is non-executable. The original CFG augmented with these non-executable edges is called the Augmented Control Flow Graph (ACFG).

Kumar and Horwitz [39] describe the following algorithm for slicing in the presence of jump statements.
    prod = 1;
    k = 1;
    while (k <= 10) {
        if (MAXINT/k < prod) break;
        prod = prod * k;
        k++;
    }
    print(k);
    print(prod);

[Figure 2.4: A program and its PDG (taken from [39]). Panels: (a) Example Program, listed above; (b) CFG; (c) PDG]
[Figure 2.5: Augmented CFG and PDG for the program in Figure 2.4 (taken from [39]). Panels: (a) ACFG; (b) Corresponding APDG]
  1. Build the program's augmented control flow graph described previously. Labels are treated as separate statements; i.e., each label is represented in the ACFG by a node with one outgoing edge to the statement that it labels.

  2. Build the program's augmented PDG. Ignore the non-executable ACFG edges when computing data-dependence edges; do not ignore them when computing control-dependence edges. (This way, the nodes that are executed only because a jump is present, as well as those that are not executed but would be if the jump were removed, are control dependent on the jump node, and therefore the jump will be included in their slices.)

  3. To compute the slice from node S, follow data- and control-dependence edges backwards from S. A label L is included in a slice iff a statement "goto L" is in the slice.

2.1.6 Reconstructing CFG from the sliced PDG

Reconstructing the CFG from the PDG is described in [71]. From the CFG and the PDG slice, a sliced CFG is constructed by walking through all nodes. For each node n, we execute the following.

  1. If n is a goto statement or return statement, leave it in the slice.

  2. If n is a conditional statement, there are three cases:

     (a) If n is not in the PDG slice, it can be removed.

     (b) If n is in the PDG slice, but one of the branches is not, replace the jump to that branch with a jump to the convergence node of the branch (the node where the two branches reconnect). If that node does not exist, replace the jump with a jump to the return statement of the program.

     (c) If n is present in the PDG slice and both branches are present, leave n in the CFG.
  3. Otherwise, check whether n is present in the PDG slice; if not, remove it.

We next describe the interprocedural slicing algorithm implemented in this thesis.

2.2 Interprocedural Slicing using SDG

2.2.1 System Dependence Graph

For interprocedural slicing, Horwitz et al. [63] introduce the System Dependence Graph (SDG). A system dependence graph is a collection of program dependence graphs, one for each procedure, with additional edges for modeling parameter passing. Figure 2.6 shows a program with function calls. Figure 2.7 displays its SDG.

    main() {
        sum = 0;
        i = 1;
        while (i < 11) {
            sum = add(sum, i);
            i = add(i, 1);
        }
        print(sum);
        print(i);
    }

    int add(int a, int b) {
        result = a + b;
        return result;
    }

    Figure 2.6: A program with function calls

[Figure 2.7: System Dependence Graph for an interprocedural program. The figure distinguishes control, data, parameter, call and summary edges.]

Each PDG contains an entry node that represents entry to the procedure. To model procedure calls and parameter passing, an SDG introduces additional nodes and edges. Accesses to global variables are modeled via additional parameters of the procedure. Horwitz et al. assume parameters are passed by value-result, and introduce additional nodes in the interprocedural case. The following additional nodes are introduced.

  1. Call-site nodes representing the call sites.

  2. Actual-in and actual-out nodes representing the input and output parameters at the call sites. They are control dependent on the call-site node.

  3. Formal-in and formal-out nodes representing the input and output parameters at the called procedure. They are control dependent on the procedure's entry node.

They also introduce additional edges to link the program dependence graphs together:

  1. Call edges link the call-site nodes with the procedure entry nodes.

  2. Parameter-in edges link the actual-in nodes with the formal-in nodes.

  3. Parameter-out edges link the formal-out nodes with the actual-out nodes.

2.2.2 Calling context problem

For computing an intraprocedural slice, a simple reachability algorithm on the PDG is sufficient. However, in the interprocedural case, simple reachability over the SDG does not work, since not all paths are valid. For example, in Figure 2.7, the path

    a_in = sum → a = a_in → result = a + b → r_out = result → i = r_out

is not valid interprocedurally: it enters add from the first call site but returns to the second. In an interprocedurally valid path, a call edge must be matched with its corresponding return edge.

To address this problem, Horwitz et al. [63] introduce the concept of summary edges. These edges summarize the effect of a procedure call. There is a summary edge between an actual-in and an actual-out node of a call site if there is a dependency between the corresponding formal-in and formal-out nodes of the called procedure.
2.2.3 Computing Summary Edges

We describe the computation of summary edges in Algorithm 2. The algorithm takes the given SDG and adds summary edges. P is the set of path edges. Each edge in P of the form (n, m) encodes the information that there is a realizable path in the SDG from n to m. The worklist contains path edges that need to be processed. The algorithm begins by asserting that there is a realizable path from each formal-out node to itself. The set of realizable paths P is extended by traversing backwards through dependence edges. If a formal-in node is encountered during the traversal, then we have a realizable path from a formal-in node to a formal-out node. Therefore a summary edge is added between the actual-in and actual-out nodes of the corresponding call sites. Because the insertion of summary edges makes more paths feasible, this process is continued iteratively, until no more summary edges can be added.

Algorithm 2 Computing Summary Information
  W = ∅    (W is the worklist)
  P = ∅    (P is the set of path edges)
  for all n ∈ N which is a formal-out node do
    W = W ∪ {(n, n)}
    P = P ∪ {(n, n)}
  end for
  while W ≠ ∅ (worklist is not empty) do
    remove one element (n, m) from W
    if n is a formal-in node then
      for all n' → n which is a parameter-in edge do
        for all m → m' which is a parameter-out edge do
          if n' and m' belong to the same call site then
            E = E ∪ {n' → m'}    (add a new summary edge)
            for all (m', x) ∈ P do
              P = P ∪ {(n', x)}
              W = W ∪ {(n', x)}
            end for
          end if
        end for
      end for
    else
      for all n' → n do
        if (n', m) ∉ P then
          P = P ∪ {(n', m)}
          W = W ∪ {(n', m)}
        end if
      end for
    end if
  end while

Computing the summary edges is equivalent to the functional approach suggested by Sharir and Pnueli [41].

2.2.4 The Two Phase Slicing Algorithm

Horwitz et al. [63] describe the two phase algorithm. The interprocedural backward slicing algorithm consists of two phases. The first phase traverses backwards from the node in the SDG that represents the slicing criterion along all edges except parameter-out edges, and marks those nodes that are reached. The second phase traverses backwards from all nodes marked during the first phase along all edges except call and parameter-in edges, and marks the reached nodes. The slice is the union of the marked nodes. Let s be the slicing criterion in procedure P.

  1. Phase 1 identifies vertices that can reach s, and are either in P itself or in a procedure that calls P (either directly or transitively). Because parameter-out edges are not followed, the traversal in Phase 1 does not descend into procedures called by P. Though the algorithm does not descend into the called procedures, the effects of such procedures are not ignored, due to the presence of summary edges.

  2. Phase 2 identifies vertices that reach s from procedures (transitively) called by P or from procedures called by procedures that (transitively) call P. Because call edges and parameter-in edges are not followed, the traversal in Phase 2 does not ascend into calling procedures; the transitive flow dependence edges from actual-in to actual-out vertices make such ascents unnecessary.

We implemented a variation of the two phase slicing algorithm as described by Krinke [49]. Figure 2.8 shows the vertices in the SDG marked during phase 1 and phase 2 when the statement print(i) is given as the slicing criterion. The first phase traverses backwards along all edges except the parameter-out edge r_out = result → i = r_out. Thus the first phase does not descend into the procedure add. The second phase traverses backwards along all edges except the parameter-in edges and call edges. Thus in the second phase neither the parameter-in edge a_in = sum → a = a_in nor the call edge call add → enter add is traversed.

2.2.5 Handling Shared Variables

This section deals with variables that are shared across procedures. Shared variables include global variables in imperative languages. Though Java does not have global variables, instance members of a class can be treated as global variables that are accessible by the member functions.

Shared variables are handled by passing them as additional parameters to every function. Treating every shared variable as a parameter is correct but inefficient, as it increases the number of nodes. We can reduce the number of parameters passed by performing interprocedural analysis and using the GMOD and GREF information [42].

  1. GMOD(P): the set of variables that might be modified by P itself or by a procedure (transitively) called from P

  2. GREF(P): the set of variables that might be referenced by P itself or by a procedure (transitively) called from P
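The definition of GMOD can be read as a fixed point over the call graph: a procedure's set starts with the variables it modifies directly and grows by the sets of its callees until nothing changes. The Java sketch below illustrates this reading under simplifying assumptions; it is not the algorithm of [42]. The class GmodSketch, the input maps, and the treatment of every variable as shared are hypothetical. The example data is loosely based on the methods of class Base in Figure 2.9, with the call to vm() resolved statically to Base.vm(). GREF can be computed the same way from the directly referenced variables.

```java
import java.util.*;

final class GmodSketch {
    /**
     * imod:  variables each procedure modifies directly (must list every procedure).
     * calls: call graph, procedure -> directly called procedures.
     * Returns GMOD, assuming every variable is shared (no locals to filter out).
     */
    static Map<String, Set<String>> gmod(Map<String, Set<String>> imod,
                                         Map<String, Set<String>> calls) {
        Map<String, Set<String>> gmod = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : imod.entrySet()) {
            gmod.put(e.getKey(), new TreeSet<>(e.getValue()));
        }
        boolean changed = true;
        while (changed) {                       // iterate until a fixed point is reached
            changed = false;
            for (String p : gmod.keySet()) {
                for (String q : calls.getOrDefault(p, Collections.emptySet())) {
                    Set<String> callee = gmod.getOrDefault(q, Collections.emptySet());
                    if (gmod.get(p).addAll(callee)) {
                        changed = true;         // p inherited a new variable from q
                    }
                }
            }
        }
        return gmod;
    }

    /** Direct modifications: vm() writes a, m1() and m2() write b, m4() writes nothing. */
    static Map<String, Set<String>> exampleImod() {
        Map<String, Set<String>> imod = new HashMap<>();
        imod.put("vm", new HashSet<>(Arrays.asList("a")));
        imod.put("m1", new HashSet<>(Arrays.asList("b")));
        imod.put("m2", new HashSet<>(Arrays.asList("b")));
        imod.put("m4", new HashSet<>());
        return imod;
    }

    /** Call graph: m1() calls vm(), m4() calls m1(). */
    static Map<String, Set<String>> exampleCalls() {
        Map<String, Set<String>> calls = new HashMap<>();
        calls.put("m1", new HashSet<>(Arrays.asList("vm")));
        calls.put("m4", new HashSet<>(Arrays.asList("m1")));
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(gmod(exampleImod(), exampleCalls()));
    }
}
```

In this example m1() and m4() both end up with GMOD = {a, b}, while m2() keeps {b}, so a call to m2() would need no actual-out node for a.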
[Figure 2.8: Slicing the System Dependence Graph. The figure shows the SDG of Figure 2.7 and distinguishes the vertices marked in phase 1 from those marked in phase 2, together with the control, data, parameter, call and summary edges.]
Algorithm 3 Two phase slicing algorithm (Krinke's version)
  input: G = (N, E), the given SDG; s ∈ N, the slicing criterion
  output: S ⊆ N, the slice
  S = {s}
  W_up = {s}
  W_down = ∅
  (first phase)
  while W_up ≠ ∅ (worklist is not empty) do
    remove one element n from W_up
    for all m → n ∈ E do
      if m ∉ S then
        if m → n is a parameter-out edge then
          W_down = W_down ∪ {m}
          S = S ∪ {m}
        else
          W_up = W_up ∪ {m}
          S = S ∪ {m}
        end if
      end if
    end for
  end while
  (second phase)
  while W_down ≠ ∅ (worklist is not empty) do
    remove one element n from W_down
    for all m → n ∈ E do
      if m ∉ S then
        if m → n is not a parameter-in edge or call edge then
          W_down = W_down ∪ {m}
          S = S ∪ {m}
        end if
      end if
    end for
  end while
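Algorithm 3 translates almost line by line into Java. The sketch below is illustrative only and is not the code of our implementation: the class TwoPhaseSlicer, the edge kinds, and the node labels are hypothetical, and the SDG for the program of Figure 2.6 is written down by hand, with the summary edges supplied rather than computed by Algorithm 2. Prefixes c1: and c2: distinguish the parameter nodes of the two call sites.

```java
import java.util.*;

final class TwoPhaseSlicer {
    enum Kind { CONTROL, DATA, SUMMARY, CALL, PARAM_IN, PARAM_OUT }

    private static final class Edge {
        final String from; final Kind kind;
        Edge(String from, Kind kind) { this.from = from; this.kind = kind; }
    }

    // incoming.get(n) lists the edges m -> n, stored at their target for backward traversal.
    private final Map<String, List<Edge>> incoming = new HashMap<>();

    void add(Kind kind, String from, String... targets) {
        for (String to : targets) {
            incoming.computeIfAbsent(to, k -> new ArrayList<>()).add(new Edge(from, kind));
        }
    }

    Set<String> slice(String criterion) {
        Set<String> slice = new HashSet<>();
        Deque<String> up = new ArrayDeque<>();
        Deque<String> down = new ArrayDeque<>();
        slice.add(criterion);
        up.push(criterion);
        while (!up.isEmpty()) {                  // phase 1: do not descend into callees
            String n = up.pop();
            for (Edge e : incoming.getOrDefault(n, Collections.emptyList())) {
                if (!slice.add(e.from)) continue;
                if (e.kind == Kind.PARAM_OUT) down.push(e.from); else up.push(e.from);
            }
        }
        while (!down.isEmpty()) {                // phase 2: do not ascend into callers
            String n = down.pop();
            for (Edge e : incoming.getOrDefault(n, Collections.emptyList())) {
                if (e.kind == Kind.CALL || e.kind == Kind.PARAM_IN) continue;
                if (slice.add(e.from)) down.push(e.from);
            }
        }
        return slice;
    }

    /** Hand-built SDG for the program of Figure 2.6 (edges are an assumption of this sketch). */
    static TwoPhaseSlicer figure26() {
        TwoPhaseSlicer g = new TwoPhaseSlicer();
        g.add(Kind.CONTROL, "main", "sum=0", "i=1", "while", "print(sum)", "print(i)");
        g.add(Kind.CONTROL, "while", "call1", "call2");
        g.add(Kind.CONTROL, "call1", "c1:a_in=sum", "c1:b_in=i", "c1:sum=r_out");
        g.add(Kind.CONTROL, "call2", "c2:a_in=i", "c2:b_in=1", "c2:i=r_out");
        g.add(Kind.CONTROL, "enter add", "a=a_in", "b=b_in", "result=a+b", "r_out=result");
        g.add(Kind.DATA, "sum=0", "c1:a_in=sum", "print(sum)");
        g.add(Kind.DATA, "c1:sum=r_out", "c1:a_in=sum", "print(sum)");
        g.add(Kind.DATA, "i=1", "while", "c1:b_in=i", "c2:a_in=i", "print(i)");
        g.add(Kind.DATA, "c2:i=r_out", "while", "c1:b_in=i", "c2:a_in=i", "print(i)");
        g.add(Kind.DATA, "a=a_in", "result=a+b");
        g.add(Kind.DATA, "b=b_in", "result=a+b");
        g.add(Kind.DATA, "result=a+b", "r_out=result");
        g.add(Kind.CALL, "call1", "enter add");
        g.add(Kind.CALL, "call2", "enter add");
        g.add(Kind.PARAM_IN, "c1:a_in=sum", "a=a_in");
        g.add(Kind.PARAM_IN, "c2:a_in=i", "a=a_in");
        g.add(Kind.PARAM_IN, "c1:b_in=i", "b=b_in");
        g.add(Kind.PARAM_IN, "c2:b_in=1", "b=b_in");
        g.add(Kind.PARAM_OUT, "r_out=result", "c1:sum=r_out", "c2:i=r_out");
        g.add(Kind.SUMMARY, "c1:a_in=sum", "c1:sum=r_out");
        g.add(Kind.SUMMARY, "c1:b_in=i", "c1:sum=r_out");
        g.add(Kind.SUMMARY, "c2:a_in=i", "c2:i=r_out");
        g.add(Kind.SUMMARY, "c2:b_in=1", "c2:i=r_out");
        return g;
    }

    public static void main(String[] args) {
        System.out.println(figure26().slice("print(i)"));
    }
}
```

With print(i) as the criterion, the body of add is reached only in the second phase, through the parameter-out edge into the second call site. Because parameter-in and call edges are skipped there, the traversal never climbs back into the first call site, so sum=0 and the nodes of the first call stay out of the slice.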
GMOD and GREF sets are used to determine which parameter vertices are included in procedure dependence graphs. At procedure entry, the following nodes are inserted:

  1. A formal-in node for each variable in GMOD(P) ∪ GREF(P)

  2. A formal-out node for each variable in GMOD(P)

Similarly, at a call site, the following nodes are inserted:

  1. An actual-in node for each variable in GMOD(P) ∪ GREF(P)

  2. An actual-out node for each variable in GMOD(P)

2.3 Slicing Object Oriented Programs

The System Dependence Graph (SDG) is not sufficient to represent all dependencies in object oriented programs. An efficient graph representation of an object oriented program should employ a class representation that can be reused in the construction of other classes and applications that use the class. Section 2.3.1 discusses the dependence graph representation for object oriented programs. Sections 2.3.2 and 2.3.3 discuss inheritance and polymorphism respectively.

2.3.1 Dependence Graph for Object Oriented Programs

The dependencies within a single method are represented using a Method Dependence Graph (MDG), which is composed of a data dependence subgraph and a control dependence subgraph. The MDG has a method entry node which represents the start of a method. The method entry vertex has a formal-in vertex for every formal parameter and a formal-out vertex for each formal parameter that may be modified. Each call site has a call vertex and a set of actual parameter vertices: an actual-in vertex for each actual parameter at the call site and an actual-out vertex for each actual parameter that may be modified by the called procedure. Parameter-out edges are added from each formal-out node to the corresponding actual-out node. The effects of return statements are modeled by
connecting the return statement to its corresponding call vertex using a parameter-out edge. Summary edges are added from actual-in to actual-out nodes as described in Section 2.2.3.

Larsen and Harrold [66] represent the dependencies in a class using the class dependence graph (ClDG). A ClDG is a collection of MDGs constructed for the individual methods in the program. In addition, it contains a class entry vertex that is connected to the method entry vertex of each method in the class by a class member edge. Class entry vertices and class member edges let us track dependencies that arise due to interaction among classes.

In the presence of multiple classes, additional dependence edges are required to record the interaction between classes. For example, when a class C1 creates an object of class C2, there is an implicit call to C2's constructor. When there is a call site in method m1 of class C1 to method m2 of class C2, there is a call dependence edge from the call site in m1 to the method start vertex of m2. Parameter-in edges are added from each actual-in node to the corresponding formal-in node, and parameter-out edges are added from each formal-out node to the corresponding actual-out node.

In object oriented programs, data dependence computation is complicated by the fact that statements can read from and write to fields of objects, i.e. a statement can have side effects. Computation of side effect information requires points-to analysis and is further discussed in Chapter 3. Also, methods can be invoked on objects and objects can be passed as parameters. An algorithm for computing data dependence must take this into account.

Handling objects at callsites

When a function call is invoked on an object, as in o.m1(), the call can modify the data members of o. Larsen and Harrold observe that the data member variables of a class are accessible to all methods in the class and hence can be treated as global variables.
They use additional parameters to represent the data members referenced by a method. Thus the data dependence introduced by two consecutive method calls via data member variables can be represented as data dependence between the actual parameters at the method callsites. Figure 2.10 shows the dependence graph constructed for the main program of Figure 2.9. Variables a and b are considered as global variables shared across the methods m1(), m2() and Base(). The data member variables are considered as additional parameters that are passed to the function. This method of slicing includes only those statements that are necessary for the data members at the slicing criterion to receive correct values. For example, slicing with respect to the node b = b_out associated with the statement o.m2(1) will exclude statements that assign to data member a.

    class Base {
        int a, b;
        protected void vm() {
            a = a + b;
        }
        public Base() {
            a = 0;
            b = 0;
        }
        public void m2(int i) {
            b = b + i;
        }
        public void m1() {
            if (b > 0) vm();   // comparison operator lost in the source; '>' is assumed
            b = b + 1;
        }
        public void main1() {
            Base o = new Base();
            Base ba = new Base();
            ba.m1();
            ba.m2(1);
            o.m2(1);
        }
        public void C(Base ba) {
            ba.m1();
            ba.m2(1);
        }
        public void D() {
            Base o = new Base();
            C(o);
            o.m1();
        }
    }

    class Derived extends Base {
        long d;
        public void vm() {
            d = d + b;
        }
        public Derived() {
            super();
            d = 0;
        }
        public void m3() {
            d = d + 1;
            m2(1);
        }
        public void m4() {
            m1();
        }
        public void main2() {
            int i = read();
            Base p;
            if (i > 0)         // comparison operator lost in the source; '>' is assumed
                p = new Base();
            else
                p = new Derived();
            C(p);
            p.m1();
        }
    }

    Figure 2.9: Program

[Figure 2.10: The Dependence Graph for the main function (from [67])]

[Figure 2.11: The Dependence Graphs for functions C() and D() (from [67])]

One source of imprecision of this method is that it does not consider the fact that data members may belong to different objects, and so it creates spurious dependencies between data members of different objects. In the above example, the slice wrongly includes the statements ba.m1() and ba.m2(1). Liang and Harrold [67] give an improved algorithm for object sensitive slicing.

In the dependence graph representation of [67], the constructor has no formal-in vertices for the instance variables, since these variables cannot be referenced before they are allocated by the class constructor. Thus the algorithm omits formal-in vertices for instance variables in the class constructor. In the approaches of [67] and [66], the data members of the class are treated as additional parameters to be passed to the function. This increases the number of parameter nodes. The number of additional nodes can be reduced using GMOD/GREF information. Actual-out and formal-out vertices are needed only for those data members that are modified by the member function.
Actual-in and formal-in vertices are needed for those data members accessed by the function.

Handling Parameter Objects

Tonella [59] represents an object as a single vertex when the object is used as a parameter. This representation can lead to imprecise slices because it considers modification (or access) of an individual field in an object to be a modification (or access) of the entire object. For example, if the slicing criterion is o.b at the end of D() (in Figure 2.9), then C(o) must be included. This in turn causes the slicer to include the parameter ba,
which causes ba.a and ba.b to be included, though ba.a does not affect o.b. To overcome this limitation, Liang and Harrold [67] expand the parameter object as a tree. Figure 2.11 shows the parameter ba being expanded into a tree. At the first level, the node representing ba is expanded into two nodes, Base and Derived, each representing a type that ba can possibly have. At the next level, each node is expanded into its constituent data members. Since data members can themselves be objects, the expansion is done recursively until we reach primitive data types. In the presence of recursive data types, where the tree height can be infinite, k-limiting is used to limit the height of the tree to k. At the call statement C(o) in Figure 2.9, the parameter object o is expanded into its data members. At the function call, actual-in and actual-out vertices are created for the data members of o. Summary edges are added between the actual-in and actual-out vertices if there is a dependence possible through the called procedure.

2.3.2 Handling Inheritance

Java provides a single inheritance model, which means that a new Java class can be designed that inherits state variables and functionality from an existing class. The functionality of base class methods can be overridden by simply redefining the methods in the derived class. Larsen and Harrold [66] construct dependence graph representations for the methods defined by the derived class. The representations of all methods that are inherited from superclasses are simply reused. To construct the dependence graph representation of class Derived (Figure 2.9), new representations are constructed for methods such as m3() and m4().
The representation of m1() is reused from class Base.

Liang and Harrold [67] illustrate that in the presence of virtual methods, it is not possible to directly reuse the representations of the methods of the superclass. For example, we cannot directly reuse the representation for m1() in class Base when we construct the representation for class Derived. In the Base class, the call statement vm() in m1() resolves to Base::vm(). If a class derived from Base redefines vm(), then the call statement vm() no longer resolves to Base::vm(), but to the newly defined vm() of the derived class. The callsites in the representation of m1() for class Derived have to be
changed. A method needs a new representation if

  1. the method is declared in the new class, or

  2. the method is declared in a lower class in the hierarchy and calls a newly redefined virtual method directly or indirectly.

For example, the methods declared in Derived need a new representation because they satisfy (1). Base.m1() also needs a new representation because it satisfies (2): Base.m1() calls Derived.vm(), which is redefined in class Derived.

Handling Interfaces

In Java, interfaces declare methods but leave the responsibility of defining the methods to the concrete classes implementing the interface. Interfaces allow the programmer to work with objects through the interface behavior that they implement, rather than through their class definition.

Single Interfaces. We use the interface representation graph [58] to represent a Java interface and the classes that implement it. There is a unique vertex, called the interface start vertex, for the entry of the interface. Each method declaration in the interface can be regarded as a call to its corresponding method in a class that implements it, and therefore a call vertex is created for each method declaration in the interface. The interface start vertex is connected to each call vertex of a method declaration by interface membership dependence arcs. If more than one class implements the interface, we connect a method call in the interface to every corresponding method that implements it in those classes.

Interface Extending. Similar to extending classes, the representation of an extended interface is constructed by reusing the representations of all methods that are inherited from superinterfaces. For newly defined methods in the extended interface, new representations are created.
    ie1   interface A {
    c1        void method1(int h);
    c2        void method2(int v);
          }
    ie3   interface B extends A {
    c4        void method3(int u);
          }
    ce5   class C1 implements A {
    s6        int h, v;
    e7        public void method1(int h1) {
    s8            this.h = h1;
              }
    e9        public void method2(int v1) {
    s10           this.v = v1;
              }
          }
    ce11  class C2 implements A {
    s12       int h, v;
    e13       public void method1(int h2) {
    s14           this.h = h2+1;
              }
    e16       public void method2(int v2) {
    s17           this.v = v2+1;
              }
          }
    ce18  class C3 implements B {
    s19       int h, v, u;
    e20       public void method1(int h1) {
    s21           this.h = h1+2;
              }
    e22       public void method2(int v1) {
    s23           this.v = v1+2;
              }
    e24       public void method3(int u1) {
    s25           this.u = u1+2;
              }
          }

    Key for parameter vertices:
      f1_in: this.h = this.h_in      f5_in: v1 = v1_in      a1_in: h1_in = h
      f2_in: this.v = this.v_in      f6_in: u1 = u1_in      a2_in: v1_in = v
      f3_in: this.u = this.u_in      f7_in: h2 = h2_in      a3_in: u1_in = u
      f4_in: h1 = h1_in              f8_in: v2 = v2_in

[Figure 2.12: Interface Dependence Graph (from [58]). Panel (a) shows the graph for interface A (start vertex ie1) and panel (b) the graph for interface B (start vertex ie3). The figure distinguishes interface-membership dependence arcs, control dependence arcs, call dependence arcs and parameter dependence arcs.]
2.3.3 Handling Polymorphism

In Java, method calls are bound to their implementation at runtime. Method invocation expressions such as o.m(args) are executed as follows.

  1. The runtime type T of o is determined.

  2. T.class is loaded.

  3. T is checked for an implementation of method m. If T does not define an implementation, the superclass of T is checked, and then its superclass in turn, until an implementation is found.

  4. Method m is invoked with the argument list args, and o is also passed to the method, where it becomes the this value for method m.

A polymorphic reference can refer to instances of more than one class. A class dependence graph represents such a polymorphic method call by using a polymorphic choice vertex [66]. A polymorphic choice vertex represents the selection of a particular call given a set of possible destinations. In this method, a message sent to a polymorphic object is represented as a set of callsites, one for each candidate message handling method, connected to a polymorphic choice vertex with polymorphic choice edges. This approach may give incorrect results: in function main2() of Figure 2.9, Larsen's approach uses only one callsite to represent the statement p.m1(), because m1() is declared only in Base. However, when m1() is called on objects of class Derived, it invokes Derived.vm() to modify d, and when m1() is called on objects of class Base, it invokes Base.vm() to modify a. One callsite cannot precisely represent both cases. This approach also computes spurious dependences: it is equivalent to using several objects, each belonging to a different type, to represent a polymorphic object. The data dependence construction algorithm cannot distinguish data members with the same names in these different objects.

Liang and Harrold [67] give an improved method of representing polymorphism that overcomes this limitation. A polymorphic object is represented as a tree: the root of the tree represents the polymorphic object and the children of the root represent objects of
the possible types. When the polymorphic object is used as a parameter, the children are further expanded into trees; when the polymorphic object receives a message, the children are further expanded into callsites. In Figure 2.11 the callsite ba.m1() can have receiver types Base and Derived. Thus the call site is expanded into one call site for each type of receiver.

2.3.4 Case Study - Elevator Class and its Dependence Graph

Figure 2.13 shows the elevator program and the slice with respect to line 59. Figure 2.14 shows the class dependence graph constructed for the program. The C++ Elevator class discussed in [72] has been modified for Java.
 1  class Elevator {
 2    static int UP=1, DOWN=-1;
 3    public Elevator(int t) {
 4      current_floor=1;
 5      current_direction = UP;
 6      top_floor = t;
 7    }
 8    public void up() {
 9      current_direction=UP;
10    }
11    public void down() {
12      current_direction=DOWN;
13    }
14    int which_floor() {
15      return current_floor;
16    }
17    public int direction() {
18      return current_direction;
19    }
20    public void go (int floor) {
21      if(current_direction==UP) {
22        while (current_floor != floor
23               && current_floor <= top_floor)
24          current_floor = current_floor+1;
25      }
26      else {
27        while (current_floor != floor
28               && current_floor > 0)
29          current_floor = current_floor-1;
30      } }
31    int current_floor;
32    int current_direction;
33    int top_floor;
34  }
35  class AlarmElevator extends Elevator {
36    public AlarmElevator(int top_floor) {
37      super(top_floor);
38      alarm_on=0;
39    }
40    public void set_alarm() {
41      alarm_on=1;
42    }
43    public void reset_alarm() {
44      alarm_on=0; }
45    public void go(int floor) {
46      if(alarm_on==0)
47        super.go(floor);
48    }
49    protected int alarm_on;
50  }
51  class Test {
52    public static void main(String args[]) {
53      Elevator e;
54      if(condition)
55        e=new Elevator(10);
56      else
57        e=new AlarmElevator(10);
58      e.go(5);
59      System.out.print(e.which_floor());
60    }
61  }

Figure 2.13: The Elevator program
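The call e.go(5) at line 58 of Figure 2.13 is polymorphic, since e may refer to an Elevator or an AlarmElevator. The following is a minimal sketch of how the lookup steps of Section 2.3.3 yield one candidate target per possible receiver type. The class DispatchSketch, its maps and its method names are hypothetical and exist only for illustration; they are not part of the slicing tool or of SOOT.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of virtual method lookup; class and method names follow Figure 2.13.
class DispatchSketch {
    // Maps a class to its direct superclass (classes without an entry have none in this model).
    static final Map<String, String> superclass = new HashMap<>();
    // Maps a class to the names of the methods it declares itself.
    static final Map<String, Set<String>> declared = new HashMap<>();

    static {
        superclass.put("AlarmElevator", "Elevator");
        declared.put("Elevator", Set.of("up", "down", "which_floor", "direction", "go"));
        declared.put("AlarmElevator", Set.of("set_alarm", "reset_alarm", "go"));
    }

    // Walk up from runtime type t until a class that declares m is found.
    static String resolve(String t, String m) {
        for (String c = t; c != null; c = superclass.get(c)) {
            if (declared.getOrDefault(c, Set.of()).contains(m)) {
                return c + "." + m;
            }
        }
        throw new IllegalArgumentException("no implementation of " + m + " for " + t);
    }

    // Expand a polymorphic call site: one target per possible receiver type.
    static Set<String> targets(List<String> receiverTypes, String m) {
        Set<String> result = new LinkedHashSet<>();
        for (String t : receiverTypes) {
            result.add(resolve(t, m));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> types = List.of("Elevator", "AlarmElevator");
        System.out.println(targets(types, "go"));          // two distinct targets
        System.out.println(targets(types, "which_floor")); // both types share one target
    }
}
```

In this sketch the call to go resolves to Elevator.go and AlarmElevator.go, so the call site needs two expansions, whereas which_floor resolves to Elevator.which_floor for both receiver types.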
[Graph: class dependence graph for the Elevator program. Vertices are labelled with the statement numbers of Figure 2.13 and with the parameter vertices listed below; vertex 59 is marked as the slice point. Edge types: control dependence edge; data dependence edge; summary edge; call edge, parameter edge.]

Key for parameter vertices:
  A1_in:  a_in = current_floor                F1_in:  current_floor = current_floor_in
  A1_out: current_floor = a_out               F1_out: current_floor_out = current_floor
  A2_in:  b_in = 1                            F2_in:  current_dirn = current_dirn_in
  A3_in:  b_in = -1                           F2_out: current_dirn_out = current_dirn
  A4_in:  current_floor_in = current_floor    F3_in:  top_floor = top_floor_in
  A4_out: current_floor = current_floor_out   F3_out: top_floor_out = top_floor
  A5_in:  current_dirn_in = current_dirn      F4_in:  1_top_floor = 1_top_floor_in
  A5_out: current_dirn = current_dirn_out     F5_in:  floor = floor_in
  A6_in:  top_floor_in = top_floor            F6_in:  a = a_in
  A6_out: top_floor = top_floor_out           F6_out: a_out = a
  A7_in:  alarm_on_in = alarm_on              F7_in:  b = b_in
  A7_out: alarm_on = alarm_on_out             F8_in:  alarm_on = alarm_on_in
  A8_in:  1_top_floor_in = 1_top_floor        F8_out: alarm_on_out = alarm_on
  A9_in:  floor_in = 5
  A10_in: top_floor = 10
  A11_in: 1_top_floor = 10

Figure 2.14: Dependence Graph for Elevator program
Chapter 3

Points to Analysis

In this chapter we first discuss the need for points to analysis. In the context of slicing, points to analysis is essential for the correct computation of data dependencies and the construction of the call graph. We summarize some issues related to computing points to sets, including the methods for their computation and various factors that affect precision.

We next describe Andersen’s algorithm for pointer analysis for C and its adaptation for Java. We then describe a new method for intra-procedural alias analysis which is an improvement over flow insensitive analysis but not as precise as a flow sensitive analysis.

3.1 Need for Points to Analysis

The goal of pointer analysis is to statically determine the set of memory locations that can be pointed to by a pointer variable. If two variables can access the same memory location, the variables are said to be aliased. Alias analysis is necessary for program analysis, optimizations and the correct computation of data dependence, which is necessary for slicing. Consider the computation of data dependence in Figure 3.1. Here the statement print(y.a) is dependent on x.a=..., since x and y are aliased due to the execution of the statement y=x. Without alias analysis, it is not possible to infer that statement 7 is dependent on statement 4.

A points to graph gives information about the set of memory locations pointed at by
each variable. Figure 3.2 shows two programs and their associated points to graphs. In C a variable can point to another stack variable or to dynamically allocated memory on the heap, whereas in Java a reference variable can point only to objects allocated on the heap, as stack variables cannot be pointed to due to the lack of an address-of operator (&).

1 void fun() {
2   obj x,y;
3   x=new obj(); // O1 represents the object allocated
4   x.a = ....;
5   ... = y.a;
6   y = x;
7   print(y.a);
8 }

Figure 3.1: Need for Points to Analysis

Dynamically allocated memory locations on the heap are not named. One convention is to refer to objects (memory locations) by the statement at which they are created. A statement can be executed many times and can therefore create a new object each time. Thus approximations are introduced in the points to graph if this convention is used. Another cause of approximation is the presence of recursion and dynamic allocation of memory, which leads to a statically unbounded number of memory locations.

3.2 Pointer Analysis using Constraints

Our aim is to derive the points to graph from the program text. One method to derive the points to graph is to use constraints [64]. If pts(q) denotes the set of objects initially pointed to by q, then after an assignment such as p = q, p can additionally point to the objects initially pointed to by q. Thus we have the constraint pts(p) ⊇ pts(q). Every statement in the program has an associated constraint. A solution to the constraints gives the points to set associated with every variable. Constraints such as pts(p) ⊇ pts(q) are also called subset constraints or inclusion based constraints. Andersen uses subset constraints for analyzing C programs and his algorithm is described in Section 3.4.
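The idea of solving subset constraints can be illustrated with a small sketch. It generates the constraints for the reference statements of Figure 3.1 and repeatedly applies pts(p) ⊇ pts(q) until no set grows. The class SubsetConstraints and its methods are hypothetical names chosen for this illustration; the naive fixpoint loop is an assumption made for brevity and is not Andersen’s algorithm as described in Section 3.4.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration: a naive fixpoint solver for subset constraints.
class SubsetConstraints {
    // Maps each variable name to its points-to set (names of abstract objects).
    private final Map<String, Set<String>> pts = new HashMap<>();
    // Each entry {p, q} records the constraint "pts(p) includes pts(q)".
    private final List<String[]> subsetEdges = new ArrayList<>();

    // Returns the (possibly empty) points-to set of var, creating it on first use.
    Set<String> pts(String var) {
        return pts.computeIfAbsent(var, k -> new TreeSet<>());
    }

    // p = new ...  generates "pts(p) includes {obj}".
    void addAllocation(String p, String obj) {
        pts(p).add(obj);
    }

    // p = q  generates "pts(p) includes pts(q)".
    void addAssignment(String p, String q) {
        subsetEdges.add(new String[] {p, q});
    }

    // Re-apply every constraint until no points-to set grows.
    void solve() {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] e : subsetEdges) {
                if (pts(e[0]).addAll(pts(e[1]))) {
                    changed = true;
                }
            }
        }
    }

    public static void main(String[] args) {
        SubsetConstraints c = new SubsetConstraints();
        // Reference statements of Figure 3.1: x = new obj(); y = x;
        c.addAllocation("x", "O1");
        c.addAssignment("y", "x");
        c.solve();
        System.out.println("pts(x) = " + c.pts("x"));
        System.out.println("pts(y) = " + c.pts("y"));
    }
}
```

After solving, both x and y point to O1, which is the aliasing fact needed to relate print(y.a) to x.a = ... in Figure 3.1. Because the constraints carry no ordering, the same result is reported for every program point.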
Points to graph for a C program:

    int a=1, b=2;
    int *p, *q;
    void *r, *s;
    p = &a;
    q = &b;
    h1: r = malloc
    h2: s = malloc

Points to graph for a Java program:

    class Obj { Obj f; }
    Obj r,s,t;
    h1: r = new Obj();
    h2: s = new Obj();
    h3: r.f = new Obj();
    t = s;

[Graph panels: the points to graph of the C program has nodes p, q, r, s, a, b, heap1 and heap2; the points to graph of the Java program has nodes r, s, t, heap1, heap2 and heap3, with field edges labelled f.]

Figure 3.2: Points to Graphs

Subset vs Unification Constraints

The constraints generated can be either subset based or equality based. A subset constraint such as p ⊇ q says that the points-to set of p contains the points-to set of q. Instead of subset constraints, Steensgaard [13] uses equality based constraints, where after each assignment such as p = q, the points to sets of p and q are unified, i.e. the points to sets of both variables are made identical.

Steensgaard’s approach is based on a non-standard type system, where a type does not refer to the declared type in the program source. Instead, the type of a variable describes a set of locations possibly pointed to by the variable at runtime. At initialization each variable is described by a different type. When two variables can point to the same memory location, the types represented by the variables are merged. However, the stronger constraints make the analysis less precise. The equality based approach is also called unification because it treats assignments as bidirectional. This unification merges the
points to sets of both sides of the assignment and essentially computes an equivalence relation defined by the assignments, which is done using the fast union-find algorithm [22].

If all the variables can be assigned types subject to the constraints, then the system of constraints is said to be satisfiable or well typed. Points-to analysis reduces to the problem of assigning types to all locations (variables) in a program such that the variables in the program are well-typed. At the end of the analysis, two locations are assigned different types unless they have to be described by the same type in order for the system of constraints to be well-typed.

3.3 Dimensions of Precision

The factors that contribute to the precision of the analysis are flow sensitivity, field sensitivity, context sensitivity and heap modelling. Ryder [17] discusses various parameters that contribute to the precision of the analysis.

Flow Sensitive vs Flow Insensitive approach

A flow sensitive analysis takes into account the control flow structure of the program. Thus the points-to set associated with a variable depends on the program point. It computes the mapping variable × program point → memory locations. This is precise but requires a large amount of memory, since the points-to sets of the same variable at two different program points may differ and have to be recorded separately. Flow sensitive analysis allows us to take advantage of strong updates, where after a statement x = ..., the points to information about x prior to that statement can be removed.

A flow insensitive approach computes conservative information that is valid at all program points. It considers the program as a set of statements and computes points-to information ignoring control flow. Flow insensitive analysis computes a single points to relation that holds regardless of the order in which assignment statements are actually
executed.

A flow insensitive analysis produces imprecise results. Consider the computation of data dependence for the program in Figure 3.1. If we apply flow insensitive alias analysis, then the analysis will conclude that x and y can both point to O1, and thus the statement ... = y.a (line 5) is made dependent on x.a = ... . But y can point to O1 only after the statement y = x. Thus flow insensitive analysis leads to spurious data dependence.

Field Sensitivity

Aggregate objects such as structures can be handled by one of three approaches: field-insensitive, where field information is discarded by modeling each aggregate with a single constraint variable; field-based, where one constraint variable models all instances of a field; and finally, field-sensitive, where a unique variable models each field instance of an object. The following table describes these approaches for the code segment

    x.a = new object();
    y.b = x.a;

    field based          pts(b) ⊇ pts(a)
    field insensitive    pts(y) ⊇ pts(x)
    field sensitive      pts(y.b) ⊇ pts(x.a)

Heap Abstraction

Two variables are aliased if they can refer to the same object in memory. Thus we need to keep track of the objects that can be present at runtime. The objects created at runtime cannot be determined statically and have to be conservatively approximated. The least precise approach is to consider the entire heap as a single object. The most common abstraction is to have one abstract object per program point. This abstract object is a representative of all the objects that can be created at runtime due to that program
point. A more precise abstraction is to take context sensitivity into account, using the calling context to distinguish between various objects created at the same program point.

main() {
  object a,b,c,d;
  a=new object();      pts(a) ⊇ {o1}
  b=new object();      pts(b) ⊇ {o2}
  c=id(a);             pts(r) ⊇ pts(a), pts(c) ⊇ pts(r)
  d=id(b);             pts(r) ⊇ pts(b), pts(d) ⊇ pts(r)
}
object id(object r) {
  return r;
}

Figure 3.3: Imprecision due to context insensitive analysis

Context Sensitivity

A context sensitive analysis distinguishes between different calling contexts and does not merge data flow information from multiple contexts. In Figure 3.3, a and b point to o1 and o2 respectively. Due to the function calls, c is made to point to o1 and d is made to point to o2. So the actual points to sets are a → o1, b → o2, c → o1 and d → o2. A context insensitive analysis models parameter bindings as explicit assignments. Thus r points to both the objects o1 and o2. This leads to smearing of information, making c and d point to both o1 and o2.

One method to incorporate context sensitivity is to summarize each procedure and embed that information at the call sites. A method can change the points to sets of all data reachable through static variables, incoming parameters and all objects created by the method and its callees. A method’s summary must include the effect of all the updates that the method and all its callees can make, in terms of incoming parameters. Thus summaries can be very large. A further difficulty arises from the call back mechanism.
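The smearing described for Figure 3.3 can be reproduced with a small sketch that solves the constraints listed in the figure twice: once with a single variable r shared by both calls, and once with a separate copy of r per call site. Note that the second variant uses call-site cloning of the formal parameter, which is a simpler technique than the procedure summaries discussed above and is used here only as an assumption for illustration. The class ContextDemo and the names r@call1 and r@call2 are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration of Figure 3.3; not the thesis implementation.
class ContextDemo {
    // Solve constraints of the form "pts(s[0]) includes pts(s[1])" by fixpoint iteration.
    static Map<String, Set<String>> solve(Map<String, Set<String>> initial, List<String[]> subsets) {
        Map<String, Set<String>> pts = new HashMap<>();
        initial.forEach((v, objs) -> pts.put(v, new TreeSet<>(objs)));
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] s : subsets) {
                Set<String> lhs = pts.computeIfAbsent(s[0], k -> new TreeSet<>());
                Set<String> rhs = pts.computeIfAbsent(s[1], k -> new TreeSet<>());
                if (lhs.addAll(rhs)) {
                    changed = true;
                }
            }
        }
        return pts;
    }

    // useClones = false: one variable r shared by both calls (context insensitive).
    // useClones = true: a separate copy of r per call site (call-site cloning).
    static Map<String, Set<String>> analyze(boolean useClones) {
        Map<String, Set<String>> alloc = new HashMap<>();
        alloc.put("a", Set.of("o1"));
        alloc.put("b", Set.of("o2"));
        String r1 = useClones ? "r@call1" : "r";
        String r2 = useClones ? "r@call2" : "r";
        List<String[]> subsets = new ArrayList<>();
        subsets.add(new String[] {r1, "a"});  // c = id(a): pts(r) includes pts(a)
        subsets.add(new String[] {"c", r1});  //            pts(c) includes pts(r)
        subsets.add(new String[] {r2, "b"});  // d = id(b): pts(r) includes pts(b)
        subsets.add(new String[] {"d", r2});  //            pts(d) includes pts(r)
        return solve(alloc, subsets);
    }

    public static void main(String[] args) {
        System.out.println("insensitive: " + new TreeMap<>(analyze(false)));
        System.out.println("cloned:      " + new TreeMap<>(analyze(true)));
    }
}
```

With the shared r, both c and d end up pointing to o1 and o2. With one copy of r per call site, c points only to o1 and d only to o2, which matches the actual points to sets.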
