A Static Slicing Tool for Sequential Java Programs



                         A Thesis
              Submitted For the Degree of
             Master of Science (Engineering)
             in the Faculty of Engineering




                             by

                    Arvind Devaraj




              Computer Science and Automation
                 Indian Institute of Science
                  BANGALORE – 560 012

                        March 2007
Abstract

A program slice consists of a subset of the statements of a program that can potentially
affect values computed at some point of interest. Such a point of interest along with a set
of variables is called a slicing criterion. Slicing tools are useful for several applications,
such as program understanding, testing, program integration, and so forth. Slicing object
oriented programs poses some special problems that need to be addressed due to features
like inheritance, polymorphism and dynamic binding. Alias analysis is important for the
precision of slices. In this thesis we implement a slicing tool for sequential Java programs
in the SOOT framework. SOOT is a front-end for Java developed at McGill University
and it provides several forms of intermediate code. We have integrated the slicer into
the framework. We also propose an improved technique for intraprocedural points-to
analysis. We have implemented this technique and compare the results of the analysis
with those for a flow-insensitive scheme in SOOT. Performance results of the slicer are
reported for several benchmarks.




Contents

Abstract                                                          ii

1 Introduction                                                     1
  1.1 Slicing                                                      1
  1.2 The SOOT Framework                                           5
  1.3 Contributions of the thesis                                  5

2 Slicing                                                          7
  2.1 Intraprocedural Slicing using PDG                            7
      2.1.1 Program Dependence Graph                               8
      2.1.2 Slicing using the Program Dependence Graph             8
      2.1.3 Construction of the Data Dependence Graph              9
      2.1.4 Control Dependence Graph                              11
      2.1.5 Slicing in presence of unstructured control flow      14
      2.1.6 Reconstructing CFG from the sliced PDG                17
  2.2 Interprocedural Slicing using SDG                           18
      2.2.1 System Dependence Graph                               18
      2.2.2 Calling context problem                               20
      2.2.3 Computing Summary Edges                               21
      2.2.4 The Two Phase Slicing Algorithm                       21
      2.2.5 Handling Shared Variables                             23
  2.3 Slicing Object Oriented Programs                            26
      2.3.1 Dependence Graph for Object Oriented Programs         26
      2.3.2 Handling Inheritance                                  31
      2.3.3 Handling Polymorphism                                 34
      2.3.4 Case Study - Elevator Class and its Dependence Graph  35

3 Points to Analysis                                              38
  3.1 Need for Points to Analysis                                 38
  3.2 Pointer Analysis using Constraints                          39
  3.3 Dimensions of Precision                                     41
  3.4 Andersen's Algorithm for C                                  44
  3.5 Andersen's Algorithm for Java                               45
      3.5.1 Model for references and heap objects                 45
      3.5.2 Computation of points to sets in SPARK                47
  3.6 CallGraph Construction                                      48
      3.6.1 Handling Virtual Methods                              49
  3.7 Improvements to Points to Analysis                          50
  3.8 Improving Flow Sensitivity                                  51
      3.8.1 Computing Valid Subgraph at each Program Point        53
      3.8.2 Computation of Access Expressions                     55
      3.8.3 Checking for Satisfiability                           60

4 Implementation and Experimental Results                         62
  4.1 Soot - A bytecode analysis framework                        62
  4.2 Steps in performing slicing in Soot                         65
  4.3 Points to Analysis and Call Graph                           65
  4.4 Computing Required Classes                                  67
  4.5 Side effect computation                                     68
  4.6 Preprocessing                                               69
  4.7 Computing the Class Dependence Graph                        70
  4.8 Experimental Results                                        71

5 Conclusion and Future Work                                      75

Bibliography                                                      77

List of Tables

 3.1 Constraints for C                                            44
 3.2 Constraints for Java                                         47
 3.3 Data flow equations for computing valid edges                53
 3.4 Computation of Valid edges                                   54

 4.1 Benchmarks Description                                       72
 4.2 Number of Edges in the Class Dependence Graph                72
 4.3 Timing Requirements                                          72
 4.4 Program Statistics - Partial Flow Sensitive                  73
 4.5 Precision Comparison                                         73
List of Figures

 1.1  A program and its slice                                      2

 2.1  A Control Flow Graph                                        12
 2.2  Post Dominator Tree for the CFG in Figure 2.1               12
 2.3  Dominance Frontiers                                         13
 2.4  A program and its PDG (taken from [39])                     15
 2.5  Augmented CFG and PDG for the program in Figure 2.4 (taken from [39])  16
 2.6  A program with function calls                               18
 2.7  System Dependence Graph for an interprocedural program      19
 2.8  Slicing the System Dependence Graph                         24
 2.9  Program                                                     28
 2.10 The Dependence Graph for the main function (from [67])      29
 2.11 The Dependence Graphs for functions C() and D() (from [67]) 29
 2.12 Interface Dependence Graph (from [58])                      33
 2.13 The Elevator program                                        36
 2.14 Dependence Graph for Elevator program                       37

 3.1  Need for Points to Analysis                                 39
 3.2  Points to Graphs                                            40
 3.3  Imprecision due to context insensitive analysis             43
 3.4  Object Flow Graph                                           53
 3.5  An example program                                          54
 3.6  Access Expressions                                          54
 3.7  OFG Subgraph                                                56
 3.8  Access Expressions (for a DAG)                              58
 3.9  Access Expressions (for general graph)                      60
 3.10 Simplified Access Expressions                               60
 3.11 Dominator Tree                                              60

 4.1  Soot Framework Overview                                     64
 4.2  Computation of the class dependence graph                   66
 4.3  Jimple code and its slice                                   74
Chapter 1

Introduction

1.1     Slicing
A program slice consists of the parts of a program that can potentially affect the value of
variables computed at some point of interest. Such a point is called the slicing criterion
and is specified by a pair (program point, set of variables). The original concept of a
program slice was proposed by Mark Weiser [61]. According to his definition:

      A slice s of program p is a subset of the statements of p that retains some
      specified behavior of p. The desired behavior is detailed by means of a slicing
      criterion c. Generally, a slicing criterion c is a set of variables V and a
      program point l. When the slice s is executed, it must always have the same
      values as program p for the variables in V at point l.

   Weiser claimed that a program slice was the abstraction that users had in mind as
they debugged programs. There have been variations in the definitions of program slices
depending on the application in mind. Weiser’s original definition required a slice S of
a program to be an executable subset of the program, whereas another common defini-
tion defines a slice as a subset of statements that directly or indirectly affect the values
computed at the point of interest but are not necessarily an executable segment. Fig-
ure 1.1 shows a program sliced with respect to the slicing criterion (print(product),
product).


 read(n);                                read(n);
 i = 1;                                  i = 1;
 sum = 0;
 product = 1;                            product = 1;
 while (i<=n) {                          while (i<=n) {
    sum = sum + i;
    product = product * i;                  product = product * i;
    i = i + 1;                              i = i + 1;
 }                                       }
 print(sum);
 print(product);                         print(product);


                          Figure 1.1: A program and its slice

Since the transformed program is expected to be much smaller than the
original, it is hoped that dependencies between statements in the program will be more
explicit. Surveys on program slicing are presented in [45], [73]. Slicing tools have been
used for several applications, such as program understanding [82], testing [74] [75], pro-
gram integration [78], model checking [79] and so forth.

  1. Program Understanding: Software engineers are assigned to understand a mas-
     sive piece of code and modify parts of it. When modifying a program, we need
     to comprehend a section of the program rather than the whole program. Backward
     and forward slicing can be used to browse the code and understand the interde-
     pendence between various parts of the program.

  2. Testing: In the context of testing, a problem that is often encountered is that of
     finding the set of program statements that are affected by a change in the program.
     This analysis is termed impact analysis. To determine what tests need to be re-run
     to test a modified statement S, a backward slice on S will get the statements
     that actually influence the behavior of the program.

  3. Debugging: Quite often the statement that is actually responsible for a bug that
      shows up at some program point P is statically far away from P. To reduce the
      search space of possible causes for the error, the programmer can use a backward
      slice to eliminate parts of the code that could not have been the cause of the
      problem.

  4. Model Checking: Model checking is a verification technique that performs an
      exhaustive exploration of a program’s state space. Typically the execution of a
      program is simulated, and the paths and states encountered in the simulation are checked
      against correctness specifications phrased as temporal logic formulas. The use of
      slicing here is to reduce the size of a program P being checked for a property
      by eliminating statements and variables that are irrelevant to the formula.

   There is an essential difference between static and dynamic slices. A static slice
disregards the actual inputs to a program, whereas a dynamic slice relies on a specific test case
and is therefore, in general, more precise.
   When slicing a program P we are concerned with both correctness and precision.
For correctness we demand that the slice S produced by the tool is a superset of the
actual slice S(p) for the slicing criterion p. Precision has to do with the size of the slice.
For two correct slices S1 and S2, S1 is more precise than S2 if the statements of S1
are a subset of the statements of S2. Obtaining the most precise slice is in general not
computable; hence our aim is to compute a correct slice that is as precise as possible.
   The slicing problem can be addressed by viewing it as a reachability problem in a
Program Dependence Graph (PDG) [54]. A PDG is a directed graph with vertices cor-
responding to statements and predicates and edges corresponding to data and control
dependences. For the sequential intraprocedural case, the backward slice with respect
to a node in the PDG is the set of all nodes in the PDG on which this node is tran-
sitively dependent. Thus given the PDG, a simple reachability algorithm on the PDG
will construct the slice. However when considering interprocedural slices, the process
is more complicated as mere reachability will produce imprecise slices. One needs to
track only interprocedural realizable paths, where a realizable path corresponds to legal
call/return pairs where a procedure always returns to the call site where it was invoked.
The structure on which interprocedural slicing is generally implemented is the System
Dependence Graph [63] (SDG). This graph is a collection of graphs corresponding to
PDGs for individual procedures, augmented with some extra edges that capture the
interaction between them. Slicing of interprocedural programs is described by Horwitz
et al. [63]. They use the SDG to track dependencies in a program and use a two phase
algorithm to ensure that only feasible paths are tracked, that is, those in which procedure
calls are matched with the correct return statements.
   Slicing object oriented programs adds yet another dimension of complexity to the
slicing problem. Object-oriented concepts such as classes, objects, inheritance, poly-
morphism and dynamic binding make representation and analysis techniques used for
imperative programming languages inadequate for object-oriented programs. Larsen and
Harrold [66] introduced the Class Dependence Graph, which can represent
class hierarchy, data members and polymorphism. Some more features were added by
Liang and Harrold [67].
   The resolution of aliases is required for the correct computation of data dependencies.
To compute the dependence graph, it is necessary to build a call graph. The computation
of the call graph becomes complicated in the presence of dynamic binding, i.e. when the target
of a method call depends on the runtime type of a variable. Algorithms like Rapid Type
Analysis (RTA) [26] compute call graphs using type information.
   A key analysis for object oriented languages is alias analysis. The objective here is
to follow an object O from its point of allocation to find out which objects reference
O and which other objects are referenced by the fields of O. Resolving aliases becomes
important for the correct computation of data dependencies in the dependence graph.
The precision of the analysis depends on various factors like flow sensitivity, context
sensitivity and handling of field references. Andersen [64] gives a flow insensitive method
for finding aliases using subset constraints. Lhotak [70] describes the method adapted
for Java programs.
   In this thesis we implement a slicing tool for sequential Java programs and integrate
it into the SOOT framework. We briefly describe the framework and the contributions
of the thesis.



1.2     The SOOT Framework
The SOOT analysis and transformation framework [69] is a Java optimization framework
developed by the Sable Research Group at McGill University and it is intended to be a
robust, easy-to-use research framework. It has been used extensively for program analy-
sis, instrumentation, and optimization. It provides several forms of intermediate code for
analyzing and optimizing Java bytecode. Jimple is a typed three address representation,
which we have used in our implementation.
   Our objective is to implement a slicing tool within the Soot framework [69] and make
it publicly available. At the time this work was begun there was no publicly available
slicing infrastructure for Java. The Indus [81] project addresses the slicing problem for
Java programs, and its source code was made available in February 2007.



1.3     Contributions of the thesis
The following are the contributions of this thesis:

  1. We have implemented the routines for creating the program dependence graphs
      and the class dependence graph for an input Java program that is represented in
      the form of Jimple intermediate code.

  2. We have integrated a slicer into the framework. For inter-procedural slicing we
      have implemented the two-phase slicing algorithm of [63].

  3. We propose an improved technique for intraprocedural points-to analysis. This uses
      path expressions to track paths that encode valid points-to information. A simple
      data-flow analysis formulation collects valid edges, i.e. those that are added to
      the object flow graph. Reachability queries are handled in a reasonable amount of
      time. We have implemented this technique and compare the results of the analysis
      with those for a flow-insensitive scheme in SOOT.

  4. The slicing tool has been run on several benchmarks and we report on times taken
    to build the class dependence graph, its size, slice sizes for some given slicing criteria
    and slicing times.
Chapter 2

Slicing

In this chapter, we discuss techniques for slicing a program and in particular issues that
arise when slicing object oriented programs. The first part of the chapter describes the
Program Dependence Graph (PDG), its construction and the algorithm for intraproce-
dural slicing. For slicing programs with function calls, the System Dependence Graph
(SDG) is used. The SDG is a collection of PDGs for individual procedures with additional
edges for modeling procedure calls and parameter bindings. The second part of the
chapter describes the construction of SDG and the algorithm for interprocedural slicing.
The third part of the chapter describes dependence graph computation of object ori-
ented programs, which is complicated because objects can be passed as parameters and
methods can be invoked upon objects. Also we need the results of points to analysis to
determine what objects are pointed to by each reference variable. Then we describe the ex-
tension of the algorithm for computing the dependence graph in presence of inheritance
and polymorphic function calls.



2.1     Intraprocedural Slicing using PDG
Weiser’s approach [61] to program slicing is based on dataflow equations. In his approach,
the set of relevant variables is iteratively computed till a fixed point is reached. Slicing
via graph reachability was introduced by Ottenstein [54]. In this approach a dependence


graph of the program is constructed and the problem of slicing reduces to computing
reachability on the dependence graph. We adopt this in our implementation.


2.1.1    Program Dependence Graph

A program dependence graph (PDG) represents the data and control dependencies in
the program. Nodes of PDG represent statements and predicates in a source program,
and its edges denote dependence relations. The PDG can be constructed as follows.

  1. Build the program’s CFG, and use it to compute data and control dependencies:
     Node N is data dependent on node M iff M defines a variable x, N uses x, and
     there is an x-definition-free path in the CFG from M to N . Node N is control
     dependent on node M iff M is a predicate node whose evaluation to true or false
     determines whether N will be executed.

  2. Build the PDG. The nodes of the PDG are almost the same as the nodes of the
     CFG. However, in addition, there is a special enter node, and a node for each
     predicate. The PDG does not include the CFG’s exit node. The edges of the PDG
     represent the data and control dependencies computed using the CFG.


2.1.2    Slicing using the Program Dependence Graph

To compute the slice from statement (or predicate) S, start from the PDG node that
represents S and follow the data- and control-dependence edges backwards in the PDG.
The components of the slice are all of the nodes reached in this manner.
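
As an illustration, the following sketch shows this backward reachability as a simple worklist traversal in Java. It is only a sketch and not the thesis implementation (which operates on Jimple within SOOT); node identifiers are plain strings and the PDG is assumed to be given as a map from each node to the sources of its incoming dependence edges.

import java.util.*;

// Minimal sketch of intraprocedural slicing as backward reachability on a PDG.
// "deps" maps each PDG node to the sources of its incoming data- and
// control-dependence edges; node identifiers are plain strings for simplicity.
public class PdgSlicer {
    public static Set<String> backwardSlice(Map<String, Set<String>> deps, String criterion) {
        Set<String> slice = new HashSet<>();
        Deque<String> worklist = new ArrayDeque<>();
        slice.add(criterion);
        worklist.push(criterion);
        while (!worklist.isEmpty()) {
            String n = worklist.pop();
            for (String m : deps.getOrDefault(n, Collections.emptySet())) {
                if (slice.add(m)) {          // each node is visited at most once
                    worklist.push(m);
                }
            }
        }
        return slice;                        // nodes on which the criterion transitively depends
    }
}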
   The computation of the data dependence graph is described in Section 2.1.3. Com-
puting the control dependence graph is described in Section 2.1.4. Figure 2.4 shows an
example program and its corresponding PDG. Solid lines represent control dependencies
while dashed lines represent data dependencies.



2.1.3     Construction of the Data Dependence Graph

A data dependence graph represents the association between definitions and uses of a
variable. There is an association (d, u) between a definition of variable v at d and a use
of variable v at u iff there is at least one control flow path from d to u with no intervening
definition of v.
   Each node represents a statement. An edge represents a flow dependency between
statements. Though there are many kinds of data dependencies between statements,
only flow dependencies are necessary for the purpose of slicing as only flow dependence
needs to be traced back in order to compute the PDG nodes comprising the slice. Output
and anti dependence edges do not represent true data dependence. Instead they encode
a partial order on program statements, which is necessary to preserve since there is no
explicit control flow relation between PDG nodes. However, PDG slices are normally
mapped back to high-level source code, where control flow is explicitly represented. Thus
there is no need for any such control flow information to be present in the computed
PDG slice.
   Computation of flow dependencies is done by solving the reaching definitions problem.
Reaching definitions is a classical bitvector problem solvable in a monotone dataflow
framework. The analysis associates a program point with the set of
definitions reaching that point. The definitions reaching a program point along with the
use of a variable form flow dependencies.
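
For illustration, a standard round-robin formulation of this analysis is sketched below. This is not the SOOT or thesis code; the per-statement GEN and KILL bit sets and the CFG predecessor map are assumed to be precomputed.

import java.util.*;

// Classical reaching-definitions analysis: OUT[n] = GEN[n] ∪ (IN[n] − KILL[n]),
// IN[n] = union of OUT[p] over all CFG predecessors p, iterated to a fixed point.
// Definitions are numbered 0..numDefs-1; GEN/KILL per node and the predecessor
// map are assumed to be precomputed.
public class ReachingDefinitions {
    public static Map<String, BitSet> solve(List<String> nodes,
                                            Map<String, List<String>> preds,
                                            Map<String, BitSet> gen,
                                            Map<String, BitSet> kill,
                                            int numDefs) {
        Map<String, BitSet> in = new HashMap<>();
        Map<String, BitSet> out = new HashMap<>();
        for (String n : nodes) {
            in.put(n, new BitSet(numDefs));
            out.put(n, new BitSet(numDefs));
        }
        boolean changed = true;
        while (changed) {                               // iterate to a fixed point
            changed = false;
            for (String n : nodes) {
                BitSet newIn = new BitSet(numDefs);
                for (String p : preds.getOrDefault(n, List.of())) newIn.or(out.get(p));
                BitSet newOut = (BitSet) newIn.clone();
                newOut.andNot(kill.get(n));
                newOut.or(gen.get(n));
                if (!newIn.equals(in.get(n)) || !newOut.equals(out.get(n))) {
                    in.put(n, newIn);
                    out.put(n, newOut);
                    changed = true;
                }
            }
        }
        // IN[n] gives the definitions reaching n; a pair (d, n) where n uses the
        // variable defined at d is a flow dependence edge of the PDG.
        return in;
    }
}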


Dependence in presence of arrays and records

In the presence of composite data types like arrays, records and pointers, the most
conservative method is to assume a definition of a variable to be the definition of the
entire composite object [83]. A definition (or use) of an element of an array can be
considered as definition (or use) of the entire array. For example, consider the statement


                                         a[i] = x



Here the variable a is defined and variables i, x are used. Thus DEF = {a} and
REF = {i, x}. The value of a is used in computing the address of a[i] and thus a must
also be included in the REF set. The correct value for REF is {a, i, x} [45]. This
approach is conservative, leading to large slices due to spurious dependencies.
Our current implementation handles composite data types in this manner, though more
refined methods have been proposed in the literature. Agrawal et al. [53] propose a
modified algorithm for computing reaching definitions that determines the memory loca-
tions defined and used in statements and computes whether the intersection among those
locations is complete, partial or statically indeterminable. Another method to avoid
spurious dependencies is to use array index tests, like the GCD test, which can determine
that there is no dependence between two array access expressions.
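
As an illustration of such a test (a sketch, not part of the thesis implementation, and assuming two references to the same array with affine subscripts c1*i + d1 and c2*i + d2), the GCD test reports independence when gcd(c1, c2) does not divide d2 - d1.

// GCD-based independence test for two references to the same array with
// affine subscripts a[c1*i + d1] and a[c2*i + d2]: the subscripts can only be
// equal for some iterations if gcd(c1, c2) divides (d2 - d1).
public final class GcdTest {
    private static int gcd(int a, int b) {
        a = Math.abs(a);
        b = Math.abs(b);
        while (b != 0) { int t = a % b; a = b; b = t; }
        return a;
    }

    // Returns true when the two accesses are provably independent.
    public static boolean provablyIndependent(int c1, int d1, int c2, int d2) {
        int g = gcd(c1, c2);
        if (g == 0) {               // both subscripts are constants d1 and d2
            return d1 != d2;
        }
        return (d2 - d1) % g != 0;  // no integer solution to c1*i1 + d1 = c2*i2 + d2
    }
}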


Data dependencies in presence of aliasing

When computing data dependencies, the major problem occurs due to the presence of aliasing.
Consider the following example. Here there is a data dependency between x.a = ... and ...
= y.a since both x and y point to the object o1. Without alias analysis this dependency
is missed because the syntactic expressions x.a and y.a are different. Thus resolving
aliases is necessary for the correct computation of data dependencies. Also if worst case
assumptions are made for field loads and stores, many spurious dependencies are created.

void fun() {
    obj x, y;
    x = new obj();   // o1 is the object created
    y = x;
    x.a = ...;
    ... = y.a;
}




P:   if (x > y)
S1:      max = x;
     else
S2:      max = y;



2.1.4    Control Dependence Graph

Another kind of dependence between statements arises due to the presence of control
structure.
   For example, in the above code, the execution of S1 is dependent on the predicate
x > y. Thus S1 is said to be control dependent on P. A slice with respect to S1 has to
include P, because the execution of S1 depends on the outcome of the predicate node P.
   Two nodes Y and Z should be identified as having identical control conditions if, in
every run of the program, node Y is executed if and only if Z is executed. In Figure
2.1, nodes 2 and 5 are said to be control dependent on the true branch of node 1,
since their execution depends conditionally on the outcome of node 1. The original
method for computing control dependence information using postdominators was presented
by Ferrante et al. [47]. Cytron et al. [46] give an improved method for constructing
control dependence information using dominance frontiers.


Finding control dependence using postdominators relationship

A node X is said to be a postdominator of node Y if all possible paths from Y to the exit
node must pass through X. A node N is said to be control dependent on an edge a → b if

  1. N postdominates b

  2. N does not postdominate a

   In Figure 2.1, to find the nodes that are control dependent on edge 1 → 2, we find
nodes that postdominate node 2 but not node 1. Nodes 2 and 5 are such nodes. So
nodes 2 and 5 are control dependent on the edge 1 → 2.



     This observation suggests that to find the nodes that are control dependent on the
edge X → Y, we can walk up the postdominator tree from Y, marking every node visited as
control dependent on the edge; we stop when we reach the postdominator of X.
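
The following sketch (an illustration only, not the thesis code) performs this walk for a single CFG edge X → Y, assuming the immediate postdominator of every node is available as a map; applied to the edge 1 → 2 of Figure 2.1 it marks nodes 2 and 5.

import java.util.*;

// Given the immediate postdominator of every CFG node, mark the nodes that are
// control dependent on a branch edge X -> Y by walking up the postdominator
// tree from Y and stopping at the postdominator of X.
public class ControlDependence {
    // node -> predicate nodes it is control dependent on
    private final Map<String, Set<String>> controlDeps = new HashMap<>();

    public void markEdge(String x, String y, Map<String, String> ipdom) {
        String stop = ipdom.get(x);                         // where the walk ends
        for (String n = y; n != null && !n.equals(stop); n = ipdom.get(n)) {
            controlDeps.computeIfAbsent(n, k -> new HashSet<>()).add(x);
        }
    }

    public Map<String, Set<String>> result() {
        return controlDeps;
    }
}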




              Figure 2.1: A Control Flow Graph (diagram omitted)

              Figure 2.2: Post Dominator Tree for the CFG in Figure 2.1 (diagram omitted)



Using Dominance Frontiers to compute Control Dependence

Control dependencies between statements can be computed in an efficient manner us-
ing the dominance frontier information. Cytron et al. [46] describe the method for
computing dominance frontiers.
     A dominance frontier for vertex vi contains all vertices vj such that vi dominates an
immediate predecessor of vj, but vi does not strictly dominate vj [62]:
     DF(vi) = { vj ∈ V | (∃ vk ∈ Pred(vj)) ((vi dom vk) ∧ ¬(vi sdom vj)) }
     Informally, the set of nodes lying just outside the dominated region of Y is said to
be in the dominance frontier of Y.

                              Figure 2.3: Dominance Frontiers (diagram omitted)

In the example in Figure 2.3, Y dominates nodes
Y', Y'' and Y''', and X lies just outside the dominated region. So X is said to be in the
dominance frontier of Y.
   Note that if X is in the dominance frontier of Y, then there are at least two
incoming paths to X, of which one contains Y and another does not. If the CFG is
reversed, then we have two outgoing paths from X, one containing Y and another not
containing Y. This is the same as the condition for Y to be control dependent on X. Thus
to find control dependence it is enough to find the dominance frontiers on the reverse
control flow graph. Algorithm 1 computes the control dependence information.



Algorithm 1 Algorithm to compute the Control Dependence Graph
 compute dominance frontiers of the reversed CFG G
 for all N in G do
   let RDF (N ) be reverse dominator frontiers of N
   if RDF (N ) is empty then
      N is made control dependent on method entry node
   end if
   for all node P in RDF (N ) do
      for all node S in CFG successor of P do
        if S = N or N postdominates S then
          N is made control dependent on P
        end if
      end for
   end for
 end for


2.1.5     Slicing in presence of unstructured control flow

In the presence of unstructured control flow caused by jump statements like goto,
break, continue and return, the algorithm for slicing can produce an incorrect slice. While
Java does not have goto statements, break and continue statements cause unstructured
control flow. Consider computing a slice with respect to the statement print(prod) in
Figure 2.4. When the slicing algorithm discussed in Section 2.1.2 is applied, the state-
ment break is not included, which is incorrect.
   This was discovered by Choi and Ferrante [38] and by Ball and Horwitz [37] who
present a method to compute a correct slice in the presence of unstructured control flow
statements. Their method to correct for such statements is based on the observation
that jumps are similar to predicate nodes in that both affect the flow of control. Thus
jumps are also made to be sources of control dependence edges. A jump vertex has an
outgoing true edge to the target of the jump, and an outgoing false edge to the statement
that would execute if the jump were a no-op. A jump vertex is considered as a pseudo
predicate since the outgoing false edge is non-executable. The original CFG augmented
with these non-executable edges is called the Augmented Control Flow Graph (ACFG).
   Kumar and Horwitz [39] describe the following algorithm for slicing in presence of
jump statements.
            prod = 1;
            k = 1;
            while (k <= 10) {
              if (MAXINT/k < prod) break;
              prod = prod * k;
              k++;
            }
            print(k);
            print(prod);

            (a) Example Program; panels (b) and (c), the CFG and the PDG, are omitted

               Figure 2.4: A program and its PDG (taken from [39])
Figure 2.5: Augmented CFG and PDG for the program in Figure 2.4 (taken from [39]) (diagrams omitted)



  1. Build the program’s augmented control flow graph described previously. Labels
     are treated as separate statements; i.e., each label is represented in the ACFG by
     a node with one outgoing edge to the statement that it labels.

  2. Build the program’s augmented PDG. Ignore the non-executable ACFG edges when
     computing data-dependence edges; do not ignore them when computing control-
     dependence edges. (This way, the nodes that are executed only because a jump
     is present, as well as those that are not executed but would be if the jump were
     removed, are control dependent on the jump node, and therefore the jump will be
     included in their slices.)

  3. To compute the slice from node S, follow data- and control-dependence edges back-
     wards from S. A label L is included in a slice iff a statement “goto L” is in the
     slice.


2.1.6      Reconstructing CFG from the sliced PDG

Reconstructing the CFG from the PDG is described in [71]. From the CFG and the
PDG slice, a sliced CFG is constructed by walking through all nodes. For each node n,
we execute the following.

  1. If n is a goto statement or return statement, leave it in the slice

  2. If n is a conditional statement, there are three cases:

        (a) If n is not in the PDG slice, it can be removed

        (b) If n is in the PDG slice, but one of the branches is not, replace the jump to
             that branch with a jump to the convergence node of the branch (the node
              where two branches reconnect). If that node doesn’t exist, replace the jump
              with a jump to the return statement of the program.

        (c) If n is present in the PDG slice and both branches are present leave n in the
             CFG
  3. Otherwise, check whether n is present in the PDG slice; if not, remove it.

   We next describe the interprocedural slicing algorithm implemented in this thesis.


main() {
    sum=0;
    i=1;
    while(i<11)
    {
        sum=add(sum,i);
        i=add(i,1);
    }
    print(sum);
    print(i);
}

int add(int a,int b) {
    result=a+b;
    return result;
}


                      Figure 2.6: A program with function calls



2.2     Interprocedural Slicing using SDG

2.2.1    System Dependence Graph

For interprocedural slicing, Horwitz et al. [63] introduce the System Dependence Graph
(SDG). A system-dependence graph is a collection of program-dependence graphs, one
for each procedure, with additional edges for modeling parameter passing. Figure 2.6
shows a program with function calls. Figure 2.7 displays its SDG.
   Each PDG contains an entry node that represents entry to the procedure. To model
procedure calls and parameter passing, an SDG introduces additional nodes and edges.
Accesses to global variables are modeled via additional parameters of the procedure.
They assume parameters are passed by value-result, and introduce additional nodes in
       Figure 2.7: System Dependence Graph for an interprocedural program (diagram omitted)



the interprocedural case. The following additional nodes are introduced.

  1. Call-site nodes representing the call sites.

  2. Actual-in and actual-out nodes representing the input and output parameters at
     the call sites. They are control dependent on the call-site node.

  3. Formal-in and formal-out nodes representing the input and output parameters at
     the called procedure. They are control dependent on the procedure’s entry node.

   They also introduce additional edges to link the program dependence graphs together:

  1. Call edges link the call-site nodes with the procedure entry nodes.

  2. Parameter-in edges link the actual-in nodes with the formal-in nodes.

  3. Parameter-out edges link the formal-out nodes with the actual-out nodes.
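
For illustration only, the vertex and edge kinds introduced above could be modeled with a small data structure such as the following sketch; the class and method names here are assumptions for exposition and do not correspond to the thesis or SOOT classes.

import java.util.*;

// Illustrative data model for SDG vertices and edges.
public class Sdg {
    enum VertexKind { ENTRY, STATEMENT, PREDICATE, CALL_SITE,
                      ACTUAL_IN, ACTUAL_OUT, FORMAL_IN, FORMAL_OUT }
    enum EdgeKind { CONTROL, DATA, CALL, PARAM_IN, PARAM_OUT, SUMMARY }

    record Vertex(String id, VertexKind kind) {}
    record Edge(Vertex source, Vertex target, EdgeKind kind) {}

    final Set<Vertex> vertices = new HashSet<>();
    final Set<Edge> edges = new HashSet<>();

    // call edge: call-site vertex to the entry vertex of the called procedure
    void addCallEdge(Vertex callSite, Vertex entry) {
        edges.add(new Edge(callSite, entry, EdgeKind.CALL));
    }
    // parameter-in edge: actual-in vertex to the matching formal-in vertex
    void addParameterIn(Vertex actualIn, Vertex formalIn) {
        edges.add(new Edge(actualIn, formalIn, EdgeKind.PARAM_IN));
    }
    // parameter-out edge: formal-out vertex back to the matching actual-out vertex
    void addParameterOut(Vertex formalOut, Vertex actualOut) {
        edges.add(new Edge(formalOut, actualOut, EdgeKind.PARAM_OUT));
    }
}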


2.2.2    Calling context problem

For computing an intraprocedural slice, a simple reachability algorithm on the PDG is
sufficient. However, in the interprocedural case, a simple reachability over the SDG doesn’t
work since not all the paths are valid. For example, in Figure 2.7, the path a_in = sum →
a = a_in → result = a + b → r_out = result → i = r_out is not valid interprocedurally.
In an interprocedurally valid path, a call edge must be matched with its corresponding
return edge.
   To address this problem, Horwitz et al. [63] introduce the concept of summary edges.
These edges summarize the effect of a procedure call. There is a summary edge between
an actual in and an actual out node of a call site, if there is a dependency between the
corresponding formal in and formal out node of the called procedure. Thus a summary
edge summarizes the effect of a procedure call.



2.2.3    Computing Summary Edges

We describe computation of summary edges in Algorithm 2. The algorithm takes the
given SDG and adds summary edges. P is the set of path edges. Each edge in P of
the form (n, m) encodes the information that there is a realizable path in the SDG from
n to m. The worklist contains path edges that need to be processed. The algorithm
begins by asserting that there is a realizable path from each formal out node to itself.
The set of realizable paths P is extended by traversing backwards through dependence
edges. If, during the traversal, a formal in node is encountered, then we have a realizable
path from formal-in to formal-out node. Therefore a summary edge is added between
the actual in and actual out nodes of the corresponding call sites. Because the insertion
of summary edges makes more paths feasible, this process is continued iteratively, till no
more summary edges can be added. The algorithm for computing summary information
is displayed in Algorithm 2.
   Computing the summary edges is equivalent to the functional approach suggested by
Sharir and Pnueli [41].




2.2.4    The Two Phase Slicing Algorithm

Horwitz et al. [63] describe the two phase algorithm. The interprocedural backward slicing
algorithm consists of two phases. The first phase traverses backwards from the node in
the SDG that represents the slicing criterion along all edges except parameter-out edges,
and marks those nodes that are reached. The second phase traverses backwards from all
nodes marked during the first phase along all edges except call and parameter-in edges,
and marks reached nodes. The slice is the union of the marked nodes. Let s be the
slicing criterion in procedure P.

  1. Phase 1 identifies vertices that can reach s, and are either in P itself or in a
     procedure that calls P (either directly or transitively). Because parameter out
      edges are not followed, the traversal in Phase 1 does not descend into procedures




Algorithm 2 Computing Summary Information
 W = ∅, W is the worklist
 P = ∅, P is the set of path edges
 for all n ∈ N which is a formal out node do
   W = W ∪ {(n, n)}
   P = P ∪ {(n, n)}
 end for

 while W ≠ ∅, worklist is not empty do
   remove one element (n, m) from the worklist
   if n is a formal in node then
     for all n′ → n which is a parameter in edge do
       for all m → m′ which is a parameter out edge do
         if n′ and m′ belong to the same call site then
           E = E ∪ {n′ → m′}, add a new summary edge
           for all (m′, x) ∈ P do
             P = P ∪ {(n′, x)}
             W = W ∪ {(n′, x)}
           end for
         end if
       end for
     end for
   else
     for all edges n′ → n do
       if (n′, m) ∉ P then
         P = P ∪ {(n′, m)}
         W = W ∪ {(n′, m)}
       end if
     end for
   end if
 end while



      called by P. Though the algorithm does not descend into the called procedures, the
      effects of such procedures are not ignored, thanks to the presence of summary edges.

  2. Phase 2 identifies vertices that reach s from procedures (transitively) called by P
     or from procedures called by procedures that (transitively) call P. Because call
     edges and parameter in edges are not followed, the traversal in phase 2 doesn’t
     ascend into calling procedures; the transitive flow dependence edges from actual in
     to actual out vertices make such ascents unnecessary.

   We implemented a variation of the two phase slicing algorithm as described by Krinke
[49]. Figure 2.8 shows the vertices in the SDG marked during phase 1 and phase 2, when
the statement print(i) is given as the slicing criterion. The first phase traverses backwards
along all edges except the parameter out edge r_out = result → i = r_out. Thus the
first phase does not descend into the procedure add. The second phase traverses backwards
along all edges except the parameter in edges and call edges. Thus in the second phase neither
the edge a_in = sum → a = a_in nor the edge call add → a = a_in is traversed.
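
The two phases can also be summarized by the following compact sketch (an illustration with assumed types, not the thesis code); incoming edges are stored per target node and are tagged with their kind, so that parameter-out edges can be deferred during the first phase and call and parameter-in edges skipped during the second.

import java.util.*;

// Sketch of the two-phase interprocedural slicer over an SDG whose incoming
// edges are stored per target node and tagged with their kind.
public class TwoPhaseSlicer {
    enum Kind { CONTROL, DATA, CALL, PARAM_IN, PARAM_OUT, SUMMARY }
    record Edge(String source, Kind kind) {}

    public static Set<String> slice(Map<String, List<Edge>> incoming, String criterion) {
        Set<String> slice = new HashSet<>();
        slice.add(criterion);
        Deque<String> phase2 = new ArrayDeque<>();

        // Phase 1: traverse backwards, but do not follow parameter-out edges;
        // their sources are remembered for the second phase instead.
        Deque<String> work = new ArrayDeque<>();
        work.push(criterion);
        while (!work.isEmpty()) {
            String n = work.pop();
            for (Edge e : incoming.getOrDefault(n, List.of())) {
                if (slice.add(e.source())) {
                    if (e.kind() == Kind.PARAM_OUT) phase2.push(e.source());
                    else work.push(e.source());
                }
            }
        }
        // Phase 2: traverse backwards from the deferred nodes, skipping
        // call and parameter-in edges so the walk never re-ascends to callers.
        while (!phase2.isEmpty()) {
            String n = phase2.pop();
            for (Edge e : incoming.getOrDefault(n, List.of())) {
                if (e.kind() == Kind.CALL || e.kind() == Kind.PARAM_IN) continue;
                if (slice.add(e.source())) phase2.push(e.source());
            }
        }
        return slice;
    }
}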


2.2.5    Handling Shared Variables

This section deals with handling variables that are shared across procedures. Shared
variables include global variables in imperative languages. Though Java does not have
global variables, instance members of a class can be treated as global variables that are
accessible by the member functions.
   Shared variables are handled by passing them as additional parameters to every
function. Considering every shared variable as a parameter is correct but inefficient, as
it increases the number of nodes. We can reduce the number of parameters passed by
doing interprocedural analysis and using the GMOD and GREF information [42].

  1. GMOD(P) : The set of variables that might be modified by P itself or by a proce-
     dure (transitively) called from P

  2. GREF(P) : The set of variables that might be referenced by P itself or by a pro-
     cedure (transitively) called from P
                  Figure 2.8: Slicing the System Dependence Graph (diagram omitted)




Algorithm 3 Two phase slicing algorithm (Krinke’s version)
  input G = (N, E), the given SDG; s ∈ N , the slicing criterion
  output S ⊆ N , the slice
  S = {s}
  W_up = {s}
  W_down = ∅
  First phase
  while W_up ≠ ∅ (the worklist is not empty) do
    remove one element n from W_up
    for all m → n ∈ E do
      if m ∉ S then
        if m → n is a parameter-out edge then
          W_down = W_down ∪ {m}
          S = S ∪ {m}
        else
          W_up = W_up ∪ {m}
          S = S ∪ {m}
        end if
      end if
    end for
  end while

  Second phase
  while W_down ≠ ∅ (the worklist is not empty) do
    remove one element n from W_down
    for all m → n ∈ E do
      if m ∉ S then
        if m → n is not a parameter-in edge or a call edge then
          W_down = W_down ∪ {m}
          S = S ∪ {m}
        end if
      end if
    end for
  end while
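
   A minimal Java sketch of this two-phase traversal is given below. The SdgNode, SdgEdge
and EdgeKind types are hypothetical stand-ins for whatever graph representation the slicer
uses; they are not part of the SOOT framework or of the implementation described later.

    import java.util.*;

    enum EdgeKind { CONTROL, DATA, CALL, PARAM_IN, PARAM_OUT, SUMMARY }

    class SdgEdge { SdgNode src, dst; EdgeKind kind; }
    class SdgNode { List<SdgEdge> incoming = new ArrayList<>(); }

    class TwoPhaseSlicer {
        // Backward slice of the SDG with respect to the given criterion vertex.
        static Set<SdgNode> slice(SdgNode criterion) {
            Set<SdgNode> slice = new HashSet<>();
            Deque<SdgNode> up = new ArrayDeque<>(), down = new ArrayDeque<>();
            slice.add(criterion);
            up.add(criterion);
            // Phase 1: follow all incoming edges except parameter-out edges,
            // so the traversal never descends into called procedures.
            while (!up.isEmpty()) {
                SdgNode n = up.remove();
                for (SdgEdge e : n.incoming) {
                    if (slice.add(e.src)) {
                        if (e.kind == EdgeKind.PARAM_OUT) down.add(e.src);
                        else up.add(e.src);
                    }
                }
            }
            // Phase 2: follow all incoming edges except parameter-in and call edges,
            // so the traversal never ascends into calling procedures.
            while (!down.isEmpty()) {
                SdgNode n = down.remove();
                for (SdgEdge e : n.incoming) {
                    if (e.kind == EdgeKind.PARAM_IN || e.kind == EdgeKind.CALL) continue;
                    if (slice.add(e.src)) down.add(e.src);
                }
            }
            return slice;
        }
    }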



   GMOD and GREF sets are used to determine which parameter vertices are included
in procedure dependence graphs. At procedure entry, the following vertices are inserted:

  1. Formal in for each variable in GM OD(P ) ∪ GREF (P )

  2. Formal out for each variable in GM OD(P )

   Similarly, at a call site, the following vertices are inserted:

  1. Actual in for each variable in GM OD(P ) ∪ GREF (P )

  2. Actual out for each variable in GM OD(P )



2.3     Slicing Object Oriented Programs
The System Dependence Graph (SDG) is not sufficient to represent all dependencies
in object oriented programs. An efficient graph representation of an object oriented
program should employ a class representation that can be reused in the construction of
other classes and of applications that use the class. Section 2.3.1 discusses the dependence
graph representation for object oriented programs. Sections 2.3.2 and 2.3.3 discuss
inheritance and polymorphism respectively.


2.3.1    Dependence Graph for Object Oriented Programs

The dependencies within a single method are represented using a Method Dependence
Graph (MDG), which is composed of a data dependence subgraph and a control dependence
subgraph. The MDG has a method entry node which represents the start of a method.
The method entry vertex has a formal in vertex for every formal parameter and a formal
out vertex for each formal parameter that may be modified. Each call site has a call vertex
and a set of actual parameter vertices: an actual-in vertex for each actual parameter at
the call site and an actual-out vertex for each actual parameter that may be modified
by the called procedure. Parameter out edges are added from each formal-out node to
the corresponding actual-out node. The effects of return statements are modeled by



connecting the return statement to its corresponding call vertex using a parameter-out
edge. Summary edges are added from actual in to actual out nodes as described in
Section 2.2.3.
   Larsen and Harrold [66] represent the dependencies in a class using the class de-
pendence graph (ClDG). A ClDG is a collection of MDGs constructed for individual
methods in the program. In addition it contains a class entry vertex that is connected to
the method entry vertex for each method in the class by a class member edge. Class entry
vertices and class member edges let us track dependencies that arise due to interaction
among classes.
   In the presence of multiple classes, additional dependence edges are required to record
the interaction between classes. For example, when a class C1 creates an object of class
C2, there is an implicit call to C2's constructor. When there is a call site in method m1
of class C1 to method m2 of class C2, there is a call dependence edge from the call site
in m1 to the method start vertex of m2. Parameter-in edges are added from each actual-in
vertex to the corresponding formal-in vertex, and parameter-out edges are added from each
formal-out vertex to the corresponding actual-out vertex.
   In object oriented programs, data dependence computation is complicated by the
fact that statements can read from and write to fields of objects, i.e. a statement can
have side effects. Computation of side effect information requires points to analysis and is
discussed further in Chapter 3. Also, methods can be invoked on objects and objects can
be passed as parameters. An algorithm for computing data dependence must take this
into account.


Handling objects at callsites

In the presence of a function call invoked on an object, such as o.m1(), the call can
modify the data members of o. Larsen and Harrold observe that the data member variables
of a class are accessible to all methods in the class and hence can be treated as global
variables. They use additional parameters to represent the data members referenced by a
method. Thus the data dependence introduced by two consecutive method calls via data




class Base {
    int a, b;
    protected void vm()   { a = a + b; }
    public Base()         { a = 0; b = 0; }
    public void m2(int i) { b = b + i; }
    public void m1() {
        if (b > 0) vm();
        b = b + 1;
    }
    public void main1() {
        Base o = new Base();
        Base ba = new Base();
        ba.m1();
        ba.m2(1);
        o.m2(1);
    }
    public void C(Base ba) {
        ba.m1();
        ba.m2(1);
    }
    public void D() {
        Base o = new Base();
        C(o);
        o.m1();
    }
}

class Derived extends Base {
    long d;
    public void vm()      { d = d + b; }
    public Derived()      { super(); d = 0; }
    public void m3()      { d = d + 1; m2(1); }
    public void m4()      { m1(); }
    public void main2() {
        int i = read();
        Base p;
        if (i > 0)
            p = new Base();
        else
            p = new Derived();
        C(p);
        p.m1();
    }
}




                                 Figure 2.9: Program




      Figure 2.10: The Dependence Graph for the main function (from [67])




    Figure 2.11: The Dependence Graphs for functions C() and D() (from [67])



member variables can be represented as data dependence between the actual parameters
at the method callsites. Figure 2.10 shows the dependence graph constructed for the
main program of Figure 2.9. Variables a and b are considered as global variables shared
across methods m1(), m2() and Base(). The data member variables are considered as
additional parameters that are passed to the function. This method of slicing includes
only those statements that are necessary for the data members at the slicing criterion to
receive correct values. For example, slicing with respect to the node b = b_out associated
with the statement o.m2() will exclude statements that assign to data member a.
   One source of imprecision in this method is that it does not distinguish data members
belonging to different objects, and it therefore creates spurious dependencies between
data members of different objects. In the above example, the slice wrongly includes the
statements ba.m1() and ba.m2(). Liang and Harrold [67] give an improved algorithm for
object sensitive slicing.
   In the dependence graph representation of [67], the constructor has no formal-in
vertices for the instance variables, since these variables cannot be referenced before they
are allocated by the class constructor. Thus the algorithm omits formal-in vertices
for instance variables in the class constructor. In the approaches of [67] and [66], the data
members of the class are treated as additional parameters to be passed to the function.
This increases the number of parameter nodes. The number of additional nodes can
be reduced using GMOD/GREF information: actual-out and formal-out vertices are
needed only for those data members that are modified by the member function, and actual-in
and formal-in vertices are needed only for those data members accessed by the function.


Handling Parameter Objects

Tonella [59] represents an object as a single vertex when the object is used as a parameter.
This representation can lead to imprecise slices because it considers modification (or
access) of an individual field in an object to be a modification (or access) of the entire
object. For example, if the slicing criterion is o.b at the end of D() (in Figure 2.9), then
C(o) must be included. This in turn causes the slicer to include the parameter ba,



which causes ba.a and ba.b to be included, though ba.a does not affect o.b. To overcome
this limitation, Liang and Harrold [67] expand the parameter object as a tree. Figure
2.11 shows the parameter ba being expanded into a tree. At the first level, the node
representing ba is expanded into two nodes, Base and Derived each representing the type
ba can possibly have. At the next level, each node is expanded into its constituent data
members. Since data members can themselves be objects, the expansion is done recursively
until we reach primitive data types. In the presence of recursive data types, where the tree
height can be infinite, k-limiting is used to limit the height of the tree to k. At the call
statement C(o) in Figure 2.9, the parameter object o is expanded into its data members.
At the function call, actual in and actual out vertices are created for the data members
of o. Summary edges are added between the actual in and actual out vertices if there is
a dependence possible through the called procedure.
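   For instance, the expansion of the parameter ba of Figure 2.9 to two levels can be
pictured as follows (a sketch; the actual graphs appear in Figure 2.11):

    ba
    +-- Base:     ba.a, ba.b
    +-- Derived:  ba.a, ba.b, ba.d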


2.3.2     Handling Inheritance

Java provides a single inheritance model, which means that a new Java class can be
designed that inherits state variables and functionality from an existing class. The
functionality of base class methods can be overridden by simply redefining the methods
in the derived class. Larsen and Harrold [66] construct dependence graph representations
for the methods defined by the derived class. The representations of all methods that
are inherited from superclasses are simply reused. To construct the dependence graph
representation of class Derived (Figure 2.9), new representations are constructed for
methods such as m3() and m4(); the representation of m1() is reused from class Base.
   Liang and Harrold [67] illustrate that in the presence of virtual methods, it is not pos-
sible to directly reuse the representations of the methods of the superclass. For example,
we cannot directly reuse the representation of m1() in class Base when we construct
the representation for class Derived. In the Base class, the call statement vm() in
m1() resolves to Base::vm(). If a class derived from Base redefines vm(), then the call
statement vm() no longer resolves to Base::vm(), but to the newly defined vm() of the
derived class. The callsites in the representation of m1() for class Derived have to be



changed. A method needs a new representation if

   1. the method is declared in the new class

   2. the method is declared in a lower class in the hierarchy and calls a newly redefined
      virtual method directly or indirectly.

For example, methods declared in Derived need a new representation because they
satisfy (1). Base.m1() also needs a new representation because it satisfies (2):
Base.m1() calls Derived.vm(), which is redefined in class Derived.


Handling Interfaces

In Java, interfaces declare methods but leave the responsibility of defining the methods to
the concrete classes implementing the interface. Interfaces allow the programmer to work
with objects through the interface behavior that they implement, rather than through their
class definition.
   Single Interfaces
   We use the interface representation graph [58] to represent a Java interface and its
corresponding classes that implement it. There is a unique vertex called interface start
vertex for the entry of the interface. Each method declaration in the interface can be
regarded as a call to its corresponding method in a class that implements it and therefore
a call vertex is created for each method declaration in the interface. The interface start
vertex is connected to each call vertex of the method declaration by interface membership
dependence arcs. If more than one class implements the interface, we connect each
method call in the interface to every corresponding method in the implementing classes.
   Extending Interfaces
   Similar to extending classes, the representation of an extended interface is constructed
by reusing the representations of all methods that are inherited from superinterfaces. For
newly defined methods in the extended interface, new representations are created.




ie1 interface A {                                                                  ie1
c1     void method1(int h);              (a)
c2     void method2(int v);                                                                                                  f1_in:   this.h=this.h_in
    }                                                                                                                        f2_in:   this.v=this.v_in
ie3 interface B extends A {                                         c1                                   c2                  f3_in:   this.u=this.u_in
c4     void method3(int u);                                                                                                  f4_in:   h1=h1_in
    }                                                                                                                        f5_in:   v1=v1_in
                                                  a1_in                                  a2_in                               f6_in:   u1=u1_in
ce5 class C1 implements A {
                                                                                                                             f7_in:   h2=h2_in
s6     int h, v;                                                                                       e9       e16          f8_in:   v2=v2_in
e7     public void method1(int   h1) {                       e7             e13
                                                                                                                             a1_in:   h1_in=h
s8          this.h = h1;                                                                                                     a2_in:   v1_in=v
       }                                                                                                                     a3_in:   u1_in=u
e9     public void method2(int   v1) {
s10         this.v = v1;                 f4_in               f7_in                  f5_in                      f8_in
       }                                                                                         s10                   s17
                                                  s8                  s14
    }
ce11 class C2 implements A {
s12    int h, v;
e13    public void method1(int   h2) {                                             ie3
s14         this.h = h2+1;
                                         (b)
       }
e16    public void method2(int   v2) {
s17         this.v = v2+1;
       }
                                                        c1                         c2                             c4
    }
ce18 class C3 implements B {                                                                                                 interface-membership
s19    int h, v, u;                       a1_in                    a2_in                               a3_in                 dependence arc
e20    public void method1(int   h1) {
s21         this.h = h1+2;                                                                                                   control dependence arc
                                                       e20                         e22                          e24
       }
e22    public void method2(int   v1) {
s23         this.v = v1+2;                                                                                                   call dependence arc

       }
                                         f4_in                    f5_in                          f6_in
e24    public void method3(int   u1) {                                                                                       parameter dependence arc
s25         this.u = u1+2;                        s21                        s23                                 s25
       }
    }




                       Figure 2.12: Interface Dependence Graph (from [58])



2.3.3     Handling Polymorphism

In Java, method calls are bound to the implementation at runtime. Method invocation
expressions such as o.m(args) are executed as follows:

  1. The runtime type T of o is determined.

  2. Load T.class

  3. Check T for an implementation of method m. If T does not define an implementation,
      the lookup proceeds to T's superclass, and then to that class's superclass, until an
      implementation is found.

  4. Invoke method m with the argument list, args, and also pass o to the method,
      which will become the this value for method m.
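
   As a small sketch using the classes of Figure 2.9:

    Base p = new Derived();   // static type Base, runtime type Derived
    p.m1();                   // steps 1-4 use the runtime type Derived, so the call
                              // to vm() inside m1() resolves to Derived.vm()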

   A polymorphic reference can refer to instances of more than one class. A class
dependence graph represents such a polymorphic method call by using a polymorphic
choice vertex [66]. A polymorphic choice vertex represents the selection of a particular
call from a set of possible destinations. In this method, a message sent to a polymorphic
object is represented as a set of callsites, one for each candidate message handling method,
connected to a polymorphic choice vertex with polymorphic choice edges. This approach
may give incorrect results: in function main2() of Figure 2.9, Larsen's approach uses only
one callsite to represent the statement p.m1() because m1() is declared only in Base.
However, when m1() is called on an object of class Derived, it invokes Derived.vm() to
modify d, and when m1() is called on an object of class Base, it invokes Base.vm() to
modify a. One callsite cannot precisely represent both cases. This approach also computes
spurious dependences: it is equivalent to using several objects, each belonging to a different
type, to represent a polymorphic object. The data dependence construction algorithm cannot
distinguish data members with the same names in these different objects.
   Liang and Harrold [67] give an improved method of representing polymorphism to
overcome this limitation. A polymorphic object is represented as a tree: the root of the
tree represents the polymorphic object and the children of the root represent objects of



the possible types. When the polymorphic object is used as a parameter, the children
are further expanded into trees; when the polymorphic object receives a message, the
children are further expanded into callsites. In Figure 2.11 the callsite ba.m1() can have
receiver types Base and Derived. Thus the call site is expanded into two callsites, one for
each type of receiver.


2.3.4        Case Study - Elevator Class and its Dependence Graph

Figure 2.13 shows the Elevator program and the slice with respect to line 59. Figure
2.14 shows the class dependence graph constructed for the program. The C++ Elevator
class discussed in [72] has been modified for Java.




 1 class Elevator {
 2     static int UP = 1, DOWN = -1;
 3     public Elevator(int t) {
 4         current_floor = 1;
 5         current_direction = UP;
 6         top_floor = t;
 7     }
 8     public void up() {
 9         current_direction = UP;
10     }
11     public void down() {
12         current_direction = DOWN;
13     }
14     int which_floor() {
15         return current_floor;
16     }
17     public int direction() {
18         return current_direction;
19     }
20     public void go(int floor) {
21         if (current_direction == UP) {
22             while (current_floor != floor
23                    && current_floor <= top_floor)
24                 current_floor = current_floor + 1;
25         }
26         else {
27             while (current_floor != floor
28                    && current_floor > 0)
29                 current_floor = current_floor - 1;
30         }   }
31     int current_floor;
32     int current_direction;
33     int top_floor;
34 }
35 class AlarmElevator extends Elevator {
36     public AlarmElevator(int top_floor) {
37         super(top_floor);
38         alarm_on = 0;
39     }
40     public void set_alarm() {
41         alarm_on = 1;
42     }
43     public void reset_alarm() {
44         alarm_on = 0;    }
45     public void go(int floor) {
46         if (!alarm_on)
47             super.go(floor);
48     }
49     protected int alarm_on;
50 }
51 class Test {
52     public static void main(String args[]) {
53         Elevator e;
54         if (condition)
55             e = new Elevator(10);
56         else
57             e = new AlarmElevator(10);
58         e.go(5);
59         System.out.print(e.which_floor());
60     }
61 }

                             Figure 2.13: The Elevator program




[Figure: the class dependence graph for the Elevator program. Vertices represent the numbered
statements of Figure 2.13 together with their actual-in/actual-out and formal-in/formal-out
parameter vertices for current_floor, current_direction, top_floor, floor and alarm_on; they
are connected by control dependence, data dependence, summary, call and parameter edges, and
the vertex for statement 59 is marked as the slice point. A key maps each parameter vertex to
the copy assignment it represents, for example F1_in: current_floor = current_floor_in,
F1_out: current_floor_out = current_floor, A4_in: current_floor_in = current_floor,
A4_out: current_floor = current_floor_out, and A9_in: floor_in = 5.]




                                               Figure 2.14: Dependence Graph for Elevator program
Chapter 3

Points to Analysis

In this chapter we first discuss the need for points to analysis. In the context of slicing,
points to analysis is essential for the correct computation of data dependencies and the
construction of the call graph. We summarize some issues related to computing points to
sets, including the methods for their computation and the various factors that affect precision.
We next describe Andersen's algorithm for pointer analysis for C and its adaptation
for Java. We then describe a new method for intra-procedural alias analysis which is an
improvement over flow insensitive analysis but not as precise as a flow sensitive analysis.



3.1      Need for Points to Analysis
The goal of pointer analysis is to statically determine the set of memory locations that
can be pointed to by a pointer variable. If two variables can access the same memory
location, the variables are said to be aliased. Alias analysis is necessary for program
analysis, for optimizations, and for the correct computation of data dependence, which in
turn is necessary for slicing. Consider the computation of data dependence in Figure 3.1.
Here the statement print(y.a) is dependent on x.a = ..., since x and y are aliased due to
the execution of the statement y = x. Without alias analysis, it is not possible to infer
that statement 7 is dependent on statement 4.
   A points to graph gives information about the set of memory locations pointed at by





1 void fun() {
2     obj x,y;
3     x=new obj();     // O1 represents the object allocated here
4     x.a = ....;
5     ... = y.a;
6     y = x;
7     print(y.a);
8 }


                           Figure 3.1: Need for Points to Analysis

  each variable. Figure 3.1 shows a program and its associated points to graph.
     In C a variable can point to another stack variable or to dynamically allocated memory
  on the heap, whereas in Java a reference variable can point only to objects allocated on
  the heap, as stack variables cannot be pointed to due to the lack of an address-of operator (&).
  Dynamically allocated memory locations on the heap are not named. One convention is to
  refer to objects (memory locations) by the statement at which they are created. A statement
  can be executed many times and can therefore create a new object each time. Thus
  approximations are introduced in the points to graph if the above convention is used.
  Another cause for approximation is the presence of recursion and dynamic allocation of
  memory, which leads to a statically unbounded number of memory locations.



  3.2      Pointer Analysis using Constraints
  Our aim is to derive the points to graph from the program text. One method to derive
  the points to graph is using constraints [64]. If pts(q) denotes the set of objects initially
  pointed to by q, then after an assignment such as p = q, p can additionally point to those
  objects which are initially pointed to by q. Thus we have the constraint pts(p) ⊇ pts(q).
  Every statement in the program has an associated constraint. A solution to the constraints
  gives the points to set associated with every variable.
     Constraints such as pts(p) ⊇ pts(q) are also called subset constraints or inclusion
  based constraints. Andersen uses subset constraints for analyzing C programs; his
  algorithm is described in Section 3.4.
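
     For instance, a minimal sketch of the constraints generated for a few Java-style
  assignments (Obj is the class of Figure 3.2; o1 and o2 name the objects allocated at the
  respective statements):

      p = new Obj();    // o1 ∈ pts(p)
      q = new Obj();    // o2 ∈ pts(q)
      p = q;            // pts(p) ⊇ pts(q)
      r = p;            // pts(r) ⊇ pts(p)

  The least solution of these constraints is pts(q) = {o2}, pts(p) = {o1, o2} and
  pts(r) = {o1, o2}.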




[Figure: a C program and a Java program with their points to graphs.]

  C program:
        int a = 1, b = 2;
        int *p, *q;
        void *r, *s;
        p = &a;
        q = &b;
    h1: r = malloc(...);
    h2: s = malloc(...);
    Points to graph: p → a, q → b, r → heap1, s → heap2.

  Java program:
        class Obj { int f; }
        Obj r, s, t;
    h1: r = new Obj();
    h2: s = new Obj();
    h3: r.f = new Obj();
        t = s;
    Points to graph: r → heap1, s → heap2, t → heap2, heap1.f → heap3.

                                     Figure 3.2: Points to Graphs

Subset vs Unification Constraints

The constraints generated can be either subset based or equality based. A subset con-
straint such as pts(p) ⊇ pts(q) says that the points-to set of p contains the points-to set of
q. Instead of having subset constraints, Steensgaard [13] uses equality based constraints,
where after each assignment such as p = q, the points to sets of p and q are unified, i.e. the
points to sets of both variables are made identical.
   Steensgaard's approach is based on a non-standard type system, where a type does not
refer to a declared type in the program source. Instead, the type of a variable describes
the set of locations possibly pointed to by the variable at runtime. At initialization each
variable is described by a different type. When two variables can point to the same mem-
ory location, the types of the variables are merged. However, the stronger constraints
make the analysis less precise. The equality based approach is also called unification
because it treats assignments as bidirectional. This unification merges the



points to sets of both sides of the assignment and essentially computes an equivalence
relation defined by assignments, which can be maintained with the fast union-find algorithm [22].
   If all the variables can be assigned types, subject to the constraints, then the sys-
tem of constraints is said to be satisfiable or well typed. Points-to analysis reduces to
the problem of assigning types to all locations (variables) in a program, such that the
variables in the program are well-typed. At the end of the analysis, two locations are
assigned different types, unless they have to be described by the same type in order for
the system of constraints to be well-typed.
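
   A small sketch of the difference between the two kinds of constraints:

    a = b;    // subset (Andersen):  pts(a) ⊇ pts(b)     unification (Steensgaard):  pts(a) = pts(b)
    c = b;    // subset (Andersen):  pts(c) ⊇ pts(b)     unification (Steensgaard):  pts(c) = pts(b)

With subset constraints, objects pointed to only by a never enter pts(c); with unification,
a, b and c all end up with the same merged points-to set.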



3.3     Dimensions of Precision
The various factors that contribute to the precision of the computed analysis are flow
sensitivity, field sensitivity, context sensitivity and heap modelling. Ryder [17] discusses
these and other parameters that affect the precision of the analysis.




Flow Sensitive vs Flow Insensitive approach

A flow sensitive analysis takes into account the control flow structure of the program.
Thus the points-to set associated with a variable depends on the program point; the
analysis computes a mapping variable × program point → memory locations. This is precise
but requires a large amount of memory, since the points to sets of the same variable at
two different program points may differ and have to be recorded separately. Flow sensitive
analysis also allows us to take advantage of strong updates: after a statement x = ...,
the points to information about x prior to that statement can be removed.
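
   For example (o1 and o2 name the two allocation sites):

    x = new Obj();    // o1
    x = new Obj();    // o2
    // flow sensitive with a strong update: after the second statement, pts(x) = {o2}
    // flow insensitive:                    pts(x) = {o1, o2} at every program point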
   A flow insensitive approach computes conservative information that is valid at all
program points. It considers the program as a set of statements and computes points-to
information ignoring control flow. Flow insensitive analysis computes a single points to
relation that holds regardless of the order in which assignment statements are actually



executed.
   A flow insensitive analysis produces imprecise results. Consider the computation of
data dependence for the program in Figure 3.1. If we apply flow insensitive alias analysis,
then the analysis will conclude that x and y can both point to O1, and thus the
statement ... = y.a (line 5) is made dependent on x.a = ... (line 4). But y can point to O1
only after the statement y = x. Thus flow insensitive analysis leads to spurious data
dependence.




Field Sensitivity

Aggregate objects such as structures can be handled by one of three approaches: field-
insensitive, where field information is discarded by modeling each aggregate with a single
constraint variable; field-based, where one constraint variable models all instances of a
field; and finally, field-sensitive, where a unique variable models each field instance of an
object. The following table describes these approaches for the code segment

 x.a = new object();
 y.b = x.a ;


   field based         pts(b) ⊇ pts(a)
 field insensitive     pts(y) ⊇ pts(x)
  field sensitive    pts(y.b) ⊇ pts(x.a)



Heap Abstraction

Two variables are aliased if they can refer to the same object in memory. Thus we need
to keep track of objects that can be present at runtime. The objects created at runtime
cannot be determined statically and have to be conservatively approximated. The least
precise manner is to consider the entire heap as a single object. The most common
abstraction is to have one abstract object per program point. This abstract object
is a representative of all the objects that can be created at runtime due to that program



main() {
    object a,b,c,d;
    a=new object();              pts(a)   ⊇   {o1}
    b=new object();              pts(b)   ⊇   {o2}
    c=id(a);                     pts(r)   ⊇   pts(a), pts(c) ⊇ pts(r)
    d=id(b);                     pts(r)   ⊇   pts(b), pts(d) ⊇ pts(r)
}

object id(object r) {

    return r;
}




                Figure 3.3: Imprecision due to context insensitive analysis

point. A more precise abstraction is to take context sensitivity into account using the
calling context to distinguish between various objects created at the same program point.
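
   For example, in the sketch below (the names list, n and Obj are illustrative), all objects
created by the allocation in the loop body are represented by the single abstract object of
that program point:

    for (int i = 0; i < n; i++)
        list[i] = new Obj();    // allocation site h1: every iteration creates a new concrete
                                // object, but the analysis sees only one abstract object o_h1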




Context Sensitivity

A context sensitive analysis distinguishes between different calling contexts and does not
merge data flow information from different contexts. In Figure 3.3, a and b point to o1
and o2 respectively. Due to the function calls, c is made to point to o1 and d is made
to point to o2. So the actual points to sets are a → o1, b → o2, c → o1 and d → o2. A
context insensitive analysis models parameter bindings as explicit assignments. Thus r
points to both the objects o1 and o2. This leads to smearing of information, making c
and d point to both o1 and o2.
    One method to incorporate context sensitivity is to summarize each procedure and
embed that information at the call sites. A method can change the points to sets of
all data reachable through static variables, incoming parameters and all objects created
by the method and its callees. A method's summary must include the effect of all the
updates that the method and all its callees can make, expressed in terms of the incoming
parameters. Thus summaries can be huge. A further difficulty arises from the callback mechanism.



In the presence of dynamic binding, we do not know which method will be called, making
it difficult to summarize the method [1].
   Another method to incorporate context sensitivity is the cloning based approach.
Cloning based approaches expand the call graph for each calling context, so that there
is a separate path for each calling context. A context insensitive algorithm can then be
run on the expanded graph. This leads to an exponential blowup. Whaley and Lam
[18] use Binary Decision Diagrams (BDDs) to handle the exponential increase in
complexity caused by cloning. BDDs were first used for pointer analysis by Berndl
et al. [31]. Milanova et al. [20] introduce object sensitivity, which is a form of context
sensitivity: instead of using the call stack to distinguish different contexts, they use the
receiver object.
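
   A minimal sketch of the distinction (Box and Item are hypothetical classes with a
set() method):

    Box b1 = new Box();      // o1
    Box b2 = new Box();      // o2
    b1.set(new Item());      // analyzed in the context of receiver o1
    b2.set(new Item());      // analyzed in the context of receiver o2

Because the two calls to set() are analyzed under different receiver contexts, the item
stored in b1 is not confused with the item stored in b2, whereas a context insensitive
analysis would merge them.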



3.4     Andersen’s Algorithm for C
Andersen proposed a flow insensitive, context insensitive version of points to analysis
for C. His analysis modeled the heap using a separate concrete location to represent all
memory allocated at a given dynamic allocation site. The implementation expressed the
analysis using subset constraints and then solved the constraints.
   Andersen's algorithm [64] models the points to relations as subset constraints. After a
statement such as p = q, p additionally points to those objects which are initially pointed
to by q. Thus we have the constraint pts(p) ⊇ pts(q). The list of constraints for C is given
in Table 3.1.


                        p = &x       x ∈ pts(p)
                        p = q        pts(p) ⊇ pts(q)
                        p = ∗q       ∀x ∈ pts(q), pts(p) ⊇ pts(x)
                        ∗p = q       ∀x ∈ pts(p), pts(x) ⊇ pts(q)

                              Table 3.1: Constraints for C
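
   For Java references, where indirection occurs through field accesses rather than pointer
dereferences, the analogous constraints take the following form (a sketch of a field sensitive
formulation; o_i names the abstract object of allocation site i):

                        p = new Obj()     o_i ∈ pts(p)
                        p = q             pts(p) ⊇ pts(q)
                        p = q.f           ∀o ∈ pts(q), pts(p) ⊇ pts(o.f)
                        p.f = q           ∀o ∈ pts(p), pts(o.f) ⊇ pts(q)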

   Constraints are represented using a constraint graph. Each node N in the constraint
graph represents a variable and is annotated with pts(N), the set of objects the variable
Broadcast Receiver
 
AIDL - Android Interface Definition Language
AIDL  - Android Interface Definition LanguageAIDL  - Android Interface Definition Language
AIDL - Android Interface Definition Language
 
NDK Programming in Android
NDK Programming in AndroidNDK Programming in Android
NDK Programming in Android
 
Google Cloud Messaging
Google Cloud MessagingGoogle Cloud Messaging
Google Cloud Messaging
 
OpenGLES - Graphics Programming in Android
OpenGLES - Graphics Programming in Android OpenGLES - Graphics Programming in Android
OpenGLES - Graphics Programming in Android
 
Operating system
Operating systemOperating system
Operating system
 
Sorting (introduction)
 Sorting (introduction) Sorting (introduction)
Sorting (introduction)
 
Data structures (introduction)
 Data structures (introduction) Data structures (introduction)
Data structures (introduction)
 
Graphics programming in open gl
Graphics programming in open glGraphics programming in open gl
Graphics programming in open gl
 
Computer Systems
Computer SystemsComputer Systems
Computer Systems
 

Recently uploaded

The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...BookNet Canada
 
Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Karmanjay Verma
 
QMMS Lesson 2 - Using MS Excel Formula.pdf
QMMS Lesson 2 - Using MS Excel Formula.pdfQMMS Lesson 2 - Using MS Excel Formula.pdf
QMMS Lesson 2 - Using MS Excel Formula.pdfROWELL MARQUINA
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfAarwolf Industries LLC
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Mark Simos
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFMichael Gough
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialJoão Esperancinha
 
WomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyoneWomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyoneUiPathCommunity
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 

Recently uploaded (20)

The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
 
Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#
 
QMMS Lesson 2 - Using MS Excel Formula.pdf
QMMS Lesson 2 - Using MS Excel Formula.pdfQMMS Lesson 2 - Using MS Excel Formula.pdf
QMMS Lesson 2 - Using MS Excel Formula.pdf
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdf
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDF
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorial
 
WomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyoneWomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyone
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 

Chapter 1

Introduction

1.1 Slicing

A program slice consists of the parts of a program that can potentially affect the values of variables computed at some point of interest. Such a point is called the slicing criterion and is specified by a pair (program point, set of variables). The original concept of a program slice was proposed by Mark Weiser [61]. According to his definition:

    A slice s of program p is a subset of the statements of p that retains some specified
    behavior of p. The desired behavior is detailed by means of a slicing criterion c.
    Generally, a slicing criterion c is a set of variables V and a program point l. When
    the slice s is executed, it must always have the same values as program p for the
    variables in V at point l.

Weiser claimed that a program slice was the abstraction that users had in mind as they debugged programs. There have been variations in the definition of a program slice, depending on the application in mind. Weiser's original definition required a slice S of a program to be an executable subset of the program, whereas another common definition regards a slice as a subset of statements that directly or indirectly affect the values computed at the point of interest but that need not form an executable program. Figure 1.1 shows a program sliced with respect to the slicing criterion (print(product), product).

The original program:

    read(n);
    i = 1;
    sum = 0;
    product = 1;
    while (i <= n) {
        sum = sum + i;
        product = product * i;
        i = i + 1;
    }
    print(sum);
    print(product);

Its slice with respect to (print(product), product):

    read(n);
    i = 1;
    product = 1;
    while (i <= n) {
        product = product * i;
        i = i + 1;
    }
    print(product);

Figure 1.1: A program and its slice

Since the sliced program is expected to be much smaller than the original, it is hoped that the dependences between its statements will be more apparent. Surveys on program slicing are presented in [45], [73]. Slicing tools have been used for several applications, such as program understanding [82], testing [74] [75], program integration [78], model checking [79], and so forth.

1. Program Understanding: Software engineers are often asked to understand and modify large bodies of code. When modifying a program, it is usually enough to comprehend the relevant section of the program rather than the whole program. Backward and forward slicing can be used to browse the code and understand the interdependence between various parts of the program.

2. Testing: In the context of testing, a problem that is often encountered is that of finding the set of program statements that are affected by a change in the program. This analysis is termed impact analysis. To determine what tests need to be re-run for a modified statement S, a backward slice on S yields the statements that actually influence the behavior of the program at S.

3. Debugging: Quite often the statement that is actually responsible for a bug that shows up at some program point P is statically far away from P. To reduce the search space of possible causes of the error, the programmer can use a backward slice to eliminate parts of the code that could not have been the cause of the problem.

4. Model Checking: Model checking is a verification technique that performs an exhaustive exploration of a program's state space. Typically the execution of a program is simulated, and the paths and states encountered in the simulation are checked against correctness specifications phrased as temporal logic formulas. The use of slicing here is to reduce the size of a program P being checked for a property by eliminating statements and variables that are irrelevant to the formula.

There is an essential difference between static and dynamic slices. A static slice disregards the actual inputs to a program, whereas a dynamic slice relies on a specific test case and is therefore, in general, more precise. When slicing a program P we are concerned with both correctness and precision. For correctness we demand that the slice S produced by the tool is a superset of the actual slice S(p) for the slicing criterion p. Precision has to do with the size of the slice: for two correct slices S1 and S2, S1 is more precise than S2 if the statements of S1 are a subset of the statements of S2. Obtaining the most precise slice is, in general, not computable; hence our aim is to compute a correct slice that is as precise as possible.

The slicing problem can be addressed by viewing it as a reachability problem on a Program Dependence Graph (PDG) [54]. A PDG is a directed graph with vertices corresponding to statements and predicates and edges corresponding to data and control dependences. For the sequential intraprocedural case, the backward slice with respect to a node in the PDG is the set of all nodes in the PDG on which this node is transitively dependent. Thus, given the PDG, a simple reachability algorithm will construct the slice. However, when considering interprocedural slices, the process is more complicated, as mere reachability produces imprecise slices. One needs to track only interprocedurally realizable paths, where a realizable path corresponds to legal call/return pairs in which a procedure always returns to the call site where it was invoked. The structure on which interprocedural slicing is generally implemented is the System Dependence Graph (SDG) [63]. This graph is a collection of the PDGs of the individual procedures, augmented with extra edges that capture the interaction between them. Slicing of interprocedural programs is described by Horwitz et al. [63]. They use the SDG to track dependences in a program and use a two-phase algorithm to ensure that only feasible paths are tracked, that is, those in which procedure calls are matched with the correct return statements.

Slicing object oriented programs adds yet another dimension of complexity to the slicing problem. Object-oriented concepts such as classes, objects, inheritance, polymorphism and dynamic binding make the representation and analysis techniques used for imperative programming languages inadequate for object-oriented programs. The Class Dependence Graph, introduced by Larsen and Harrold [66], can represent the class hierarchy, data members and polymorphism. Further features were added by Liang and Harrold [67]. The resolution of aliases is required for the correct computation of data dependences. To compute the dependence graph, it is also necessary to build a call graph. The computation of the call graph becomes complicated in the presence of dynamic binding, i.e. when the target of a method call depends on the runtime type of a variable. Algorithms like Rapid Type Analysis (RTA) [26] compute call graphs using type information.

A key analysis for object oriented languages is alias analysis. The objective here is to follow an object O from its point of allocation to find out which objects reference O and which other objects are referenced by the fields of O. Resolving aliasing is important for the correct computation of data dependences in the dependence graph. The precision of the analysis depends on various factors such as flow sensitivity, context sensitivity and the handling of field references. Andersen [64] gives a flow insensitive method for finding aliases using subset constraints. Lhotak [70] describes the method adapted for Java programs.

In this thesis we implement a slicing tool for sequential Java programs and integrate it into the SOOT framework. We briefly describe the framework and the contributions of the thesis.
1.2 The SOOT Framework

The SOOT analysis and transformation framework [69] is a Java optimization framework developed by the Sable Research Group at McGill University. It is intended to be a robust, easy-to-use research infrastructure and has been used extensively for program analysis, instrumentation, and optimization. It provides several forms of intermediate code for analyzing and optimizing Java bytecode. Jimple, a typed three-address representation, is the form we use in our implementation. Our objective is to implement a slicing tool within the SOOT framework [69] and make it publicly available. At the time this work was begun there was no publicly available slicing infrastructure for Java. The Indus [81] project addresses the slicing problem for Java programs, and its source code was made available in February 2007.

1.3 Contributions of the thesis

The following are the contributions of this thesis:

1. We have implemented the routines for creating the program dependence graphs and the class dependence graph for an input Java program that is represented in the form of Jimple intermediate code.

2. We have integrated a slicer into the framework. For interprocedural slicing we have implemented the two-phase slicing algorithm of [63].

3. We propose an improved technique for intraprocedural points-to analysis. This uses path expressions to track paths that encode valid points-to information. A simple data-flow formulation collects valid edges, i.e. those that are added to the object flow graph. Reachability queries are handled in a reasonable amount of time. We have implemented this technique and compare the results of the analysis with those for a flow-insensitive scheme in SOOT.

4. The slicing tool has been run on several benchmarks, and we report the time taken to build the class dependence graph, its size, slice sizes for some given slicing criteria, and slicing times.
Chapter 2

Slicing

In this chapter, we discuss techniques for slicing a program and, in particular, issues that arise when slicing object oriented programs. The first part of the chapter describes the Program Dependence Graph (PDG), its construction and the algorithm for intraprocedural slicing. For slicing programs with function calls, the System Dependence Graph (SDG) is used. The SDG is a collection of the PDGs of individual procedures with additional edges for modeling procedure calls and parameter bindings. The second part of the chapter describes the construction of the SDG and the algorithm for interprocedural slicing. The third part of the chapter describes dependence graph computation for object oriented programs, which is complicated because objects can be passed as parameters and methods can be invoked upon objects. We also need the results of points-to analysis to determine which objects may be pointed to by each reference variable. We then describe the extension of the algorithm for computing the dependence graph in the presence of inheritance and polymorphic function calls.

2.1 Intraprocedural Slicing using PDG

Weiser's approach [61] to program slicing is based on dataflow equations. In his approach, the set of relevant variables is iteratively computed until a fixed point is reached. Slicing via graph reachability was introduced by Ottenstein [54]. In this approach a dependence graph of the program is constructed and the problem of slicing reduces to computing reachability on the dependence graph. We adopt this approach in our implementation.

2.1.1 Program Dependence Graph

A program dependence graph (PDG) represents the data and control dependences in the program. Nodes of the PDG represent statements and predicates in a source program, and its edges denote dependence relations. The PDG can be constructed as follows.

1. Build the program's CFG, and use it to compute data and control dependences: Node N is data dependent on node M iff M defines a variable x, N uses x, and there is an x-definition-free path in the CFG from M to N. Node N is control dependent on node M iff M is a predicate node whose evaluation to true or false determines whether N will be executed.

2. Build the PDG. The nodes of the PDG are almost the same as the nodes of the CFG. In addition, there is a special enter node, and a node for each predicate. The PDG does not include the CFG's exit node. The edges of the PDG represent the data and control dependences computed using the CFG.

2.1.2 Slicing using the Program Dependence Graph

To compute the slice from statement (or predicate) S, start from the PDG node that represents S and follow the data- and control-dependence edges backwards in the PDG. The components of the slice are all of the nodes reached in this manner, as sketched below. The computation of the data dependence graph is described in Section 2.1.3; the computation of the control dependence graph is described in Section 2.1.4. Figure 2.4 shows an example program and its corresponding PDG. Solid lines represent control dependences while dashed lines represent data dependences.
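The backward traversal itself is plain graph reachability. The following is a minimal Java sketch, assuming the PDG has been flattened into a map from each node to the set of nodes it is data- or control-dependent on; the class and method names are illustrative and do not correspond to the representation used in our implementation.

    import java.util.*;

    /**
     * Backward slicing as graph reachability over dependence edges.
     * deps.get(n) holds the nodes that n is data- or control-dependent on.
     */
    final class PdgSlicer<N> {
        private final Map<N, Set<N>> deps;

        PdgSlicer(Map<N, Set<N>> deps) { this.deps = deps; }

        Set<N> backwardSlice(N criterion) {
            Set<N> slice = new HashSet<>();
            Deque<N> worklist = new ArrayDeque<>();
            slice.add(criterion);
            worklist.push(criterion);
            while (!worklist.isEmpty()) {
                N n = worklist.pop();
                // every node that n depends on belongs to the slice
                for (N m : deps.getOrDefault(n, Collections.emptySet())) {
                    if (slice.add(m)) {      // true only the first time m is seen
                        worklist.push(m);
                    }
                }
            }
            return slice;
        }
    }

A worklist is used so that each node is pushed at most once, which makes the traversal linear in the number of dependence edges.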
2.1.3 Construction of the Data Dependence Graph

A data dependence graph represents the association between definitions and uses of a variable. There is an association (d, u) between a definition of variable v at d and a use of variable v at u iff there is at least one control flow path from d to u with no intervening definition of v. Each node represents a statement, and an edge represents a flow dependence between statements. Though there are many kinds of data dependences between statements, only flow dependences are necessary for the purpose of slicing, as only flow dependence needs to be traced back in order to compute the PDG nodes comprising the slice. Output and anti dependence edges do not represent true data dependence; instead they encode a partial order on program statements, which needs to be preserved only when there is no explicit control flow relation between PDG nodes. However, PDG slices are normally mapped back to high-level source code, where control flow is explicitly represented, so there is no need for any such control flow information to be present in the computed PDG slice.

Flow dependences are computed by solving the reaching definitions problem, a classical bitvector problem solvable in a monotone dataflow framework. This associates each program point with the set of definitions reaching that point. The definitions reaching a program point, together with the uses of variables at that point, yield the flow dependences.
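For illustration, the following is a minimal sketch of the standard iterative reaching-definitions solver, written with explicit sets rather than bitvectors; the CFG, GEN and KILL sets are assumed to be given, and the names are illustrative rather than those of our implementation.

    import java.util.*;

    /**
     * Iterative reaching-definitions analysis over a CFG.
     * N is the node type, D identifies individual definitions.
     */
    final class ReachingDefinitions<N, D> {
        final Map<N, Set<D>> in = new HashMap<>();
        final Map<N, Set<D>> out = new HashMap<>();

        void solve(List<N> nodes, Map<N, Set<N>> preds,
                   Map<N, Set<D>> gen, Map<N, Set<D>> kill) {
            for (N n : nodes) {
                in.put(n, new HashSet<>());
                out.put(n, new HashSet<>());
            }
            boolean changed = true;
            while (changed) {                        // iterate until a fixed point
                changed = false;
                for (N n : nodes) {
                    Set<D> inN = new HashSet<>();
                    for (N p : preds.getOrDefault(n, Collections.emptySet())) {
                        inN.addAll(out.get(p));      // IN[n] = union of OUT[p] over predecessors p
                    }
                    Set<D> outN = new HashSet<>(inN);
                    outN.removeAll(kill.getOrDefault(n, Collections.emptySet()));
                    outN.addAll(gen.getOrDefault(n, Collections.emptySet()));
                    if (!outN.equals(out.get(n))) {  // OUT[n] = GEN[n] ∪ (IN[n] − KILL[n])
                        out.put(n, outN);
                        changed = true;
                    }
                    in.put(n, inN);
                }
            }
        }
    }

Flow dependence edges are then obtained by pairing each definition d in the IN set of a node with the uses, at that node, of the variable defined by d.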
Dependence in the presence of arrays and records

In the presence of composite data types like arrays, records and pointers, the most conservative method is to treat a definition of any part of a composite object as a definition of the entire object [83]. A definition (or use) of an element of an array can be considered a definition (or use) of the entire array. For example, consider the statement

    a[i] = x

Here the variable a is defined and the variables i and x are used, so DEF = {a} and REF = {i, x}. However, the value of a is also used in computing the address of a[i], so a must be included in the REF set as well; the correct value of REF is {a, i, x} [45]. This approach is conservative and leads to large slices because of spurious dependences. Our current implementation handles composite data types in this manner, though more refined methods have been proposed in the literature. Agrawal et al. [53] propose a modified algorithm for computing reaching definitions that determines the memory locations defined and used by each statement and computes whether the intersection among those locations is complete, partial, or statically indeterminable. Another way to avoid spurious dependences is to use array index tests, such as GCD tests, which can determine that there is no dependence between two array access expressions.

Data dependences in the presence of aliasing

When computing data dependences, the major problem arises from aliasing. Consider the following example. Here there is a data dependence between x.a = ... and ... = y.a, since both x and y point to the object o1. Without alias analysis this dependence is missed, because the syntactic expressions x.a and y.a are different. Thus resolving aliases is necessary for the correct computation of data dependences. Also, if worst-case assumptions are made for field loads and stores, many spurious dependences are created.

    void fun() {
        obj x, y;
        x = new obj();   // o1 is the object created
        y = x;
        x.a = ....;
        ... = y.a;
    }
2.1.4 Control Dependence Graph

Another kind of dependence between statements arises due to the presence of control structure. Consider the following code:

    P:  if (x > y)
    S1:     max = x;
        else
    S2:     max = y;

The execution of S1 depends on the predicate x > y, so S1 is said to be control dependent on P. A slice with respect to S1 has to include P, because the execution of S1 depends on the outcome of the predicate node P. Two nodes Y and Z should be identified as having identical control conditions if, in every run of the program, Y is executed if and only if Z is executed. In Figure 2.1, nodes 2 and 5 are said to be control dependent on the true branch of node 1, since their execution depends on the outcome of node 1. The original method for computing control dependence information using postdominators is presented by Ferrante et al. [47]. Cytron et al. [46] give an improved method for constructing control dependence information using dominance frontiers.

Finding control dependence using the postdominator relationship

A node X is said to be a postdominator of node Y if all paths from Y to the exit node pass through X. A node N is said to be control dependent on an edge a → b if

1. N postdominates b, and
2. N does not postdominate a.

In Figure 2.1, to find the nodes that are control dependent on the edge 1 → 2, we find the nodes that postdominate node 2 but not node 1. Nodes 2 and 5 are such nodes, so nodes 2 and 5 are control dependent on the edge 1 → 2. This observation suggests that, to find the nodes that are control dependent on an edge X → Y, we can walk up the postdominator tree starting from Y, marking every node visited as control dependent on the edge, and stop when we reach the postdominator of X.

[Figure 2.1: A Control Flow Graph. Figure 2.2: Post Dominator Tree for the CFG in Figure 2.1. The drawings are not reproduced here.]

Using dominance frontiers to compute control dependence

Control dependences between statements can be computed efficiently using dominance frontier information. Cytron et al. [46] describe the method for computing dominance frontiers. The dominance frontier of a vertex vi contains all vertices vj such that vi dominates an immediate predecessor of vj, but vi does not strictly dominate vj [62]:

    DF(vi) = { vj ∈ V | ∃ vk ∈ Pred(vj) : (vi dom vk) ∧ ¬(vi sdom vj) }

Informally, the set of nodes lying just outside the region dominated by Y is the dominance frontier of Y. In the example in Figure 2.3, Y dominates the nodes Y', Y'' and Y''', and X lies just outside the dominated region, so X is in the dominance frontier of Y.

[Figure 2.3: Dominance Frontiers. The drawing is not reproduced here.]

Note that if X is in the dominance frontier of Y, then there are at least two incoming paths to X, one of which contains Y and one of which does not. If the CFG is reversed, then there are two outgoing paths from X, one containing Y and another not containing Y. This is the same as the condition for Y to be control dependent on X. Thus, to find control dependences it is enough to compute dominance frontiers on the reverse control flow graph.
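The dominance frontiers themselves can be computed in a single bottom-up pass over the (post)dominator tree, following Cytron et al. [46]. The sketch below assumes the immediate dominator relation, the dominator-tree children and the successor relation of the (reversed) CFG are already available; the names are illustrative and are not part of the SOOT API.

    import java.util.*;

    /**
     * Dominance frontiers via one bottom-up pass over the dominator tree.
     * Run on the reverse CFG (with the postdominator tree) to obtain the
     * information needed for control dependence.
     */
    final class DominanceFrontiers<N> {
        /**
         * bottomUp : dominator-tree nodes ordered children-before-parents
         * idom     : immediate dominator of each node
         * succ     : successors in the (possibly reversed) CFG
         * children : children in the dominator tree
         */
        Map<N, Set<N>> compute(List<N> bottomUp, Map<N, N> idom,
                               Map<N, Set<N>> succ, Map<N, Set<N>> children) {
            Map<N, Set<N>> df = new HashMap<>();
            for (N x : bottomUp) {
                Set<N> dfx = new HashSet<>();
                // DF_local: successors of x that x does not immediately dominate
                for (N y : succ.getOrDefault(x, Collections.emptySet())) {
                    if (!x.equals(idom.get(y))) dfx.add(y);
                }
                // DF_up: frontier nodes of x's children that x does not immediately dominate
                for (N z : children.getOrDefault(x, Collections.emptySet())) {
                    for (N y : df.getOrDefault(z, Collections.emptySet())) {
                        if (!x.equals(idom.get(y))) dfx.add(y);
                    }
                }
                df.put(x, dfx);
            }
            return df;
        }
    }

When this is run on the reverse CFG with the postdominator tree, the computed frontier of a node N is exactly the set RDF(N) used in Algorithm 1, i.e. the nodes on which N is control dependent.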
Algorithm 1 computes the control dependence information using the reverse dominance frontiers.

Algorithm 1: Computing the Control Dependence Graph

    compute the dominance frontiers of the reversed CFG G
    for all N in G do
        let RDF(N) be the reverse dominance frontier of N
        if RDF(N) is empty then
            make N control dependent on the method entry node
        end if
        for all nodes P in RDF(N) do
            for all nodes S that are CFG successors of P do
                if S = N or N postdominates S then
                    make N control dependent on P
                end if
            end for
        end for
    end for

2.1.5 Slicing in the presence of unstructured control flow

In the presence of unstructured control flow caused by jump statements like goto, break, continue and return, the slicing algorithm described so far can produce an incorrect slice. While Java does not have goto statements, break and continue statements do cause unstructured control flow. Consider computing the slice with respect to the statement print(prod) in Figure 2.4. When the slicing algorithm discussed in Section 2.1.2 is applied, the statement break is not included, which is incorrect. This was discovered by Choi and Ferrante [38] and by Ball and Horwitz [37], who present a method to compute a correct slice in the presence of unstructured control flow statements. Their correction is based on the observation that jumps are similar to predicate nodes in that both affect the flow of control. Thus jumps are also made sources of control dependence edges. A jump vertex has an outgoing true edge to the target of the jump, and an outgoing false edge to the statement that would execute if the jump were a no-op. A jump vertex is considered a pseudo predicate, since the outgoing false edge is non-executable. The original CFG augmented with these non-executable edges is called the Augmented Control Flow Graph (ACFG). Kumar and Horwitz [39] describe the algorithm below for slicing in the presence of jump statements.
Figure 2.4(a) shows the example program; its CFG and PDG (Figures 2.4(b) and (c)) are not reproduced here.

    prod = 1;
    k = 1;
    while (k <= 10) {
        if (MAXINT/k < prod)
            break;
        prod = prod * k;
        k++;
    }
    print(k);
    print(prod);

Figure 2.4: A program and its PDG (taken from [39])

[Figure 2.5: Augmented CFG and PDG for the program in Figure 2.4 (taken from [39]). Panels: (a) ACFG, (b) corresponding APDG; the drawings are not reproduced here.]
1. Build the program's augmented control flow graph, as described previously. Labels are treated as separate statements; i.e., each label is represented in the ACFG by a node with one outgoing edge to the statement that it labels.

2. Build the program's augmented PDG. Ignore the non-executable ACFG edges when computing data-dependence edges; do not ignore them when computing control-dependence edges. (This way, the nodes that are executed only because a jump is present, as well as those that are not executed but would be if the jump were removed, are control dependent on the jump node, and therefore the jump will be included in their slices.)

3. To compute the slice from node S, follow data- and control-dependence edges backwards from S. A label L is included in a slice iff a statement "goto L" is in the slice.

2.1.6 Reconstructing the CFG from the sliced PDG

Reconstructing the CFG from the PDG is described in [71]. From the CFG and the PDG slice, a sliced CFG is constructed by walking through all nodes. For each node n, we do the following (a sketch of this walk is given after the list):

1. If n is a goto statement or return statement, leave it in the slice.

2. If n is a conditional statement, there are three cases:
   (a) If n is not in the PDG slice, it can be removed.
   (b) If n is in the PDG slice, but one of its branches is not, replace the jump to that branch with a jump to the convergence node of the branch (the node where the two branches reconnect). If that node does not exist, replace the jump with a jump to the return statement of the program.
   (c) If n is in the PDG slice and both branches are present, leave n in the CFG.

3. Otherwise, check whether n is present in the PDG slice; if not, remove it.
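The following is a minimal Java sketch of this reconstruction walk. The Cfg interface, the convergence-node lookup and the redirect operation are hypothetical helpers introduced only for this illustration (they are not SOOT API), and membership of a branch in the slice is approximated here by membership of the first node of that branch.

    import java.util.*;

    /** Sketch of the CFG reconstruction walk over a sliced PDG. */
    final class CfgSlicer<N> {

        interface Cfg<N> {
            List<N> nodes();
            boolean isJump(N n);              // goto or return statement
            boolean isConditional(N n);
            List<N> branchTargets(N n);       // first node of each branch of a conditional
            N convergenceNode(N branchStart); // node where the branches reconnect, or null
            N returnNode();                   // the procedure's return statement
            void remove(N n);
            void redirect(N cond, N oldTarget, N newTarget);
        }

        void reconstruct(Cfg<N> cfg, Set<N> pdgSlice) {
            for (N n : new ArrayList<>(cfg.nodes())) {
                if (cfg.isJump(n)) {
                    continue;                                   // rule 1: jumps are kept
                }
                if (cfg.isConditional(n)) {                     // rule 2
                    if (!pdgSlice.contains(n)) {
                        cfg.remove(n);                          // case (a)
                        continue;
                    }
                    for (N target : cfg.branchTargets(n)) {
                        if (!pdgSlice.contains(target)) {       // case (b): branch not in the slice
                            N conv = cfg.convergenceNode(target);
                            cfg.redirect(n, target, conv != null ? conv : cfg.returnNode());
                        }
                    }
                    // case (c): both branches in the slice -- nothing to do
                } else if (!pdgSlice.contains(n)) {
                    cfg.remove(n);                              // rule 3
                }
            }
        }
    }

Rule 2(b) is the only case that rewrites edges; every other case either keeps or deletes the node.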
We next describe the interprocedural slicing algorithm implemented in this thesis.

2.2 Interprocedural Slicing using SDG

2.2.1 System Dependence Graph

For interprocedural slicing, Horwitz et al. [63] introduce the System Dependence Graph (SDG). A system dependence graph is a collection of program dependence graphs, one for each procedure, with additional edges for modeling parameter passing. Figure 2.6 shows a program with function calls; Figure 2.7 displays its SDG.

    main() {
        sum = 0;
        i = 1;
        while (i < 11) {
            sum = add(sum, i);
            i = add(i, 1);
        }
        print(sum);
        print(i);
    }

    int add(int a, int b) {
        result = a + b;
        return result;
    }

Figure 2.6: A program with function calls

[Figure 2.7: System Dependence Graph for an interprocedural program. The graph drawing, showing control, data, call, parameter and summary edges, is not reproduced here.]

Each PDG contains an entry node that represents entry to the procedure. To model procedure calls and parameter passing, an SDG introduces additional nodes and edges. Accesses to global variables are modeled via additional parameters of the procedure. Parameters are assumed to be passed by value-result, and the following additional nodes are introduced in the interprocedural case:

1. Call-site nodes representing the call sites.
2. Actual-in and actual-out nodes representing the input and output parameters at the call sites. They are control dependent on the call-site node.
3. Formal-in and formal-out nodes representing the input and output parameters at the called procedure. They are control dependent on the procedure's entry node.

Additional edges link the program dependence graphs together:

1. Call edges link the call-site nodes with the procedure entry nodes.
2. Parameter-in edges link the actual-in nodes with the formal-in nodes.
3. Parameter-out edges link the formal-out nodes with the actual-out nodes.

2.2.2 Calling context problem

For computing an intraprocedural slice, a simple reachability algorithm on the PDG is sufficient. In the interprocedural case, however, simple reachability over the SDG does not work, since not all paths are valid. For example, in Figure 2.7, the path a_in = sum → a = a_in → result = a + b → r_out = result → i = r_out is not valid interprocedurally. In an interprocedurally valid path, a call edge must be matched with its corresponding return edge. To address this problem, Horwitz et al. [63] introduce the concept of summary edges, which summarize the effect of a procedure call: there is a summary edge between an actual-in node and an actual-out node of a call site if there is a dependence between the corresponding formal-in and formal-out nodes of the called procedure.
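For the discussion that follows, it is convenient to think of every SDG edge as carrying a kind tag. The small sketch below shows one possible representation; the names are illustrative and do not correspond to the data structures of our implementation.

    /** Kinds of edges that can occur in an SDG (illustrative names). */
    enum EdgeKind { CONTROL, DATA, CALL, PARAM_IN, PARAM_OUT, SUMMARY }

    /** A directed SDG edge from source to target, tagged with its kind. */
    final class SdgEdge<N> {
        final N source;
        final N target;
        final EdgeKind kind;

        SdgEdge(N source, N target, EdgeKind kind) {
            this.source = source;
            this.target = target;
            this.kind = kind;
        }
    }

A summary edge is then simply an edge of kind SUMMARY from an actual-in node to an actual-out node of the same call site. Algorithm 2 below shows how such edges are computed, and the two-phase slicer sketched after Algorithm 3 consults the kind tag to decide which edges to follow in each phase.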
2.2.3 Computing Summary Edges

The computation of summary edges is described in Algorithm 2. The algorithm takes the given SDG and adds summary edges. P is the set of path edges; each element (n, m) of P encodes the information that there is a realizable path in the SDG from n to m. The worklist W contains path edges that still need to be processed. The algorithm begins by asserting that there is a realizable path from each formal-out node to itself. The set of realizable paths P is then extended by traversing dependence edges backwards. If a formal-in node is encountered during the traversal, then there is a realizable path from a formal-in node to a formal-out node, so a summary edge is added between the actual-in and actual-out nodes at the corresponding call sites. Because the insertion of summary edges makes more paths realizable, this process is continued iteratively until no more summary edges can be added. Computing the summary edges in this way is equivalent to the functional approach suggested by Sharir and Pnueli [41].

Algorithm 2: Computing Summary Information

    W = ∅        (W is the worklist)
    P = ∅        (P is the set of path edges)
    for all formal-out nodes n do
        W = W ∪ {(n, n)}
        P = P ∪ {(n, n)}
    end for
    while W ≠ ∅ do
        remove one element (n, m) from W
        if n is a formal-in node then
            for all parameter-in edges n' → n do
                for all parameter-out edges m → m' do
                    if n' and m' belong to the same call site then
                        E = E ∪ {n' → m'}          (add a new summary edge)
                        for all (m', x) ∈ P do
                            P = P ∪ {(n', x)}
                            W = W ∪ {(n', x)}
                        end for
                    end if
                end for
            end for
        else
            for all edges n' → n do
                if (n', m) ∉ P then
                    P = P ∪ {(n', m)}
                    W = W ∪ {(n', m)}
                end if
            end for
        end if
    end while

2.2.4 The Two Phase Slicing Algorithm

Horwitz et al. [63] describe the two-phase interprocedural backward slicing algorithm. The first phase traverses backwards from the node in the SDG that represents the slicing criterion along all edges except parameter-out edges, and marks the nodes that are reached. The second phase traverses backwards from all nodes marked during the first phase along all edges except call and parameter-in edges, and marks the reached nodes. The slice is the union of the marked nodes. Let s be the slicing criterion in procedure P.

1. Phase 1 identifies vertices that can reach s and are either in P itself or in a procedure that calls P (directly or transitively). Because parameter-out edges are not followed, the traversal in Phase 1 does not descend into procedures called by P. Though the traversal does not descend into the called procedures, the effects of such procedures are not ignored, thanks to the presence of summary edges.

2. Phase 2 identifies vertices that reach s from procedures (transitively) called by P or from procedures called by procedures that (transitively) call P. Because call edges and parameter-in edges are not followed, the traversal in Phase 2 does not ascend into calling procedures; the transitive dependence (summary) edges from actual-in to actual-out vertices make such ascents unnecessary.

We implemented a variation of the two-phase slicing algorithm as described by Krinke [49] (Algorithm 3). Figure 2.8 shows the vertices of the SDG marked during Phase 1 and Phase 2 when the statement print(i) is given as the slicing criterion. The first phase traverses backwards along all edges except the parameter-out edge r_out = result → i = r_out, so it does not descend into the procedure add. The second phase traverses backwards along all edges except the parameter-in and call edges, so neither the edge a_in = sum → a = a_in nor the edge call add → a = a_in is traversed.

2.2.5 Handling Shared Variables

This section deals with variables that are shared across procedures. Shared variables include global variables in imperative languages. Though Java does not have global variables, the instance members of a class can be treated as global variables that are accessible to the member functions. Shared variables are handled by passing them as additional parameters to every function. Treating every shared variable as a parameter is correct but inefficient, as it increases the number of nodes. We can reduce the number of parameters passed by doing interprocedural analysis and using the GMOD and GREF information [42]:

1. GMOD(P): the set of variables that might be modified by P itself or by a procedure (transitively) called from P.
2. GREF(P): the set of variables that might be referenced by P itself or by a procedure (transitively) called from P.
[Figure 2.8: Slicing the System Dependence Graph. The SDG of Figure 2.7 with the vertices marked in Phase 1 and Phase 2 highlighted; the drawing is not reproduced here.]
Algorithm 3: Two-phase slicing algorithm (Krinke's version)

    input:  G = (N, E), the given SDG; s ∈ N, the slicing criterion
    output: S ⊆ N, the slice

    S = {s}
    W_up = {s}
    W_down = ∅

    (First phase)
    while W_up ≠ ∅ do
        remove one element n from W_up
        for all m → n ∈ E do
            if m ∉ S then
                if m → n is a parameter-out edge then
                    W_down = W_down ∪ {m}
                    S = S ∪ {m}
                else
                    W_up = W_up ∪ {m}
                    S = S ∪ {m}
                end if
            end if
        end for
    end while

    (Second phase)
    while W_down ≠ ∅ do
        remove one element n from W_down
        for all m → n ∈ E do
            if m ∉ S then
                if m → n is not a parameter-in edge or a call edge then
                    W_down = W_down ∪ {m}
                    S = S ∪ {m}
                end if
            end if
        end for
    end while
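The same two-phase traversal can be written compactly in Java. The sketch below mirrors Algorithm 3 and reuses the illustrative SdgEdge/EdgeKind representation from Section 2.2.2, with the SDG given as a map from each node to its incoming edges; it is an illustration, not the code of our implementation.

    import java.util.*;

    /** Two-phase backward slicing over an SDG, mirroring Algorithm 3. */
    final class TwoPhaseSlicer<N> {
        private final Map<N, List<SdgEdge<N>>> incoming;

        TwoPhaseSlicer(Map<N, List<SdgEdge<N>>> incoming) { this.incoming = incoming; }

        Set<N> slice(N criterion) {
            Set<N> slice = new HashSet<>();
            Deque<N> up = new ArrayDeque<>();     // worklist W_up
            Deque<N> down = new ArrayDeque<>();   // worklist W_down
            slice.add(criterion);
            up.push(criterion);

            // Phase 1: follow every incoming edge except parameter-out edges,
            // so the traversal never descends into called procedures.
            while (!up.isEmpty()) {
                N n = up.pop();
                for (SdgEdge<N> e : incoming.getOrDefault(n, Collections.emptyList())) {
                    if (slice.contains(e.source)) continue;
                    slice.add(e.source);
                    if (e.kind == EdgeKind.PARAM_OUT) {
                        down.push(e.source);      // processed only in phase 2
                    } else {
                        up.push(e.source);
                    }
                }
            }

            // Phase 2: follow every incoming edge except call and parameter-in edges,
            // so the traversal never ascends into calling procedures.
            while (!down.isEmpty()) {
                N n = down.pop();
                for (SdgEdge<N> e : incoming.getOrDefault(n, Collections.emptyList())) {
                    if (slice.contains(e.source)) continue;
                    if (e.kind != EdgeKind.CALL && e.kind != EdgeKind.PARAM_IN) {
                        slice.add(e.source);
                        down.push(e.source);
                    }
                }
            }
            return slice;
        }
    }

Both phases push each node at most once, so once the summary edges are available the slice is computed in time linear in the number of SDG edges.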
  • 33. Chapter 2. Slicing 26 GMOD and GREF sets are used to determine which parameter vertices are included in procedure dependence graphs . At procedure entry, these nodes are inserted 1. Formal in for each variable in GM OD(P ) ∪ GREF (P ) 2. Formal out for each variable in GM OD(P ) Similarly at a call site, the following nodes are inserted 1. Actual in for each variable in GM OD(P ) ∪ GREF (P ) 2. Actual out for each variable in GM OD(P ) 2.3 Slicing Object Oriented Programs The System Dependence Graph (SDG) is not sufficient to represent all dependencies for object oriented programs. An efficient graph representation of an object oriented program should employ a class representation that can be reused in the construction of other classes and applications that use the class. Section 2.3.1 discuss about dependence graph representation for object oriented programs. Sections 2.3.2 and 2.3.3 discuss about inheritance and polymorphism respectively. 2.3.1 Dependence Graph for Object Oriented Programs The dependencies within a single method are represented using a Method Dependence Graph (MDG), which is composed of data dependence subgraph and control dependence subgraph. The MDG has a method entry node which represents the start of a method. The method entry vertex has a formal in vertex for every formal parameter and a formal out vertex for each formal parameter that may be modified. Each call site has a call vertex and a set of actual parameter vertices: an actual-in vertex for each actual parameter at the call site and an actual-out vertex for each actual parameter that may be modified by the called procedure. Parameter out edges are added from each formal-out node to the corresponding actual-out node. The effects of return statements are modeled by
• 34. Chapter 2. Slicing 27 connecting the return statement to its corresponding call vertex using a parameter-out edge. Summary edges are added from actual-in to actual-out nodes as described in Section 2.2.3. Larsen and Harrold [66] represent the dependencies in a class using the class dependence graph (ClDG). A ClDG is a collection of MDGs constructed for the individual methods in the program. In addition, it contains a class entry vertex that is connected to the method entry vertex of each method in the class by a class member edge. Class entry vertices and class member edges let us track dependencies that arise due to interaction among classes. In the presence of multiple classes, additional dependence edges are required to record the interaction between classes. For example, when a class C1 creates an object of class C2, there is an implicit call to C2's constructor. When there is a call site in method m1 of class C1 to method m2 of class C2, there is a call dependence edge from the call site in m1 to the method entry vertex of m2. Parameter-in edges are added from each actual-in node to the corresponding formal-in node, and parameter-out edges are added from each formal-out node to the corresponding actual-out node. In object oriented programs, data dependence computation is complicated by the fact that statements can read from and write to fields of objects, i.e. a statement can have side effects. Computation of side effect information requires points to analysis and is discussed further in Chapter 3. Also, methods can be invoked on objects and objects can be passed as parameters. An algorithm for computing data dependence must take this into account. Handling objects at callsites In the presence of a call invoked on an object, such as o.m1(), the call can modify the data members of o. Larsen and Harrold observe that the data member variables of a class are accessible to all methods in the class and hence can be treated as global variables. They use additional parameters to represent the data members referenced by a method. Thus the data dependence introduced by two consecutive method calls via data
• 35. Chapter 2. Slicing 28

    class Base {
        int a, b;
        protected void vm() {
            a = a + b;
        }
        public Base() {
            a = 0;
            b = 0;
        }
        public void m2(int i) {
            b = b + i;
        }
        public void m1() {
            if (b > 0) vm();
            b = b + 1;
        }
        public void main1() {
            Base o = new Base();
            Base ba = new Base();
            ba.m1();
            ba.m2(1);
            o.m2(1);
        }
        public void C(Base ba) {
            ba.m1();
            ba.m2(1);
        }
        public void D() {
            Base o = new Base();
            C(o);
            o.m1();
        }
    }

    class Derived extends Base {
        long d;
        public void vm() {
            d = d + b;
        }
        public Derived() {
            super();
            d = 0;
        }
        public void m3() {
            d = d + 1;
            m2(1);
        }
        public void m4() {
            m1();
        }
        public void main2() {
            int i = read();
            Base p;
            if (i > 0)
                p = new Base();
            else
                p = new Derived();
            C(p);
            p.m1();
        }
    }

Figure 2.9: Program
  • 36. Chapter 2. Slicing 29 Figure 2.10: The Dependence Graph for the main function (from [67]) Figure 2.11: The Dependence Graphs for functions C() and D() (from [67])
• 37. Chapter 2. Slicing 30 member variables can be represented as data dependence between the actual parameters at the method callsites. Figure 2.10 shows the dependence graph constructed for the main program of Figure 2.9. Variables a and b are considered as global variables shared across methods m1(), m2() and Base(). The data member variables are considered as additional parameters that are passed to the function. This method of slicing includes only those statements that are necessary for the data members at the slicing criterion to receive correct values. For example, slicing with respect to the node b = b_out associated with the statement o.m2() will exclude statements that assign to data member a. One source of imprecision of this method is that it does not consider the fact that data members may belong to different objects, and it therefore creates spurious dependencies between data members of different objects. In the above example, the slice wrongly includes the statements ba.m1() and ba.m2(). Liang and Harrold [67] give an improved algorithm for object sensitive slicing. In the dependence graph representation of [67], the constructor has no formal-in vertices for the instance variables, since these variables cannot be referenced before they are allocated by the class constructor. Thus the algorithm omits formal-in vertices for instance variables in the class constructor. In the approaches of [67] and [66], the data members of the class are treated as additional parameters to be passed to the function. This increases the number of parameter nodes. The number of additional nodes can be reduced using GMOD/GREF information: actual-out and formal-out vertices are needed only for those data members that are modified by the member function, and actual-in and formal-in vertices are needed only for those data members accessed by the function. Handling Parameter Objects Tonella [59] represents an object as a single vertex when the object is used as a parameter. This representation can lead to imprecise slices because it considers modification (or access) of an individual field in an object to be a modification (or access) of the entire object. For example, if the slicing criterion is o.b at the end of D() (in Figure 2.9), then C(o) must be included. This in turn causes the slicer to include the parameter ba,
• 38. Chapter 2. Slicing 31 which causes ba.a and ba.b to be included, though ba.a does not affect o.b. To overcome this limitation, Liang and Harrold [67] expand the parameter object as a tree. Figure 2.11 shows the parameter ba being expanded into a tree. At the first level, the node representing ba is expanded into two nodes, Base and Derived, each representing a type that ba can possibly have. At the next level, each node is expanded into its constituent data members. Since data members can themselves be objects, the expansion is done recursively until primitive data types are reached. In the presence of recursive data types, where the tree height can be infinite, k-limiting is used to limit the height of the tree to k. At the call statement C(o) in Figure 2.9, the parameter object o is expanded into its data members. At the function call, actual-in and actual-out vertices are created for the data members of o. Summary edges are added between the actual-in and actual-out vertices if a dependence is possible through the called procedure. 2.3.2 Handling Inheritance Java provides a single inheritance model, which means that a new Java class can be designed that inherits state variables and functionality from an existing class. The functionality of base class methods can be overridden by simply redefining the methods in the derived class. Larsen and Harrold [66] construct dependence graph representations for the methods defined by the derived class. The representations of all methods that are inherited from superclasses are simply reused. To construct the dependence graph representation of class Derived (Figure 2.9), new representations are constructed for methods such as m3() and m4(). The representation of m1() is reused from class Base. Liang and Harrold [67] illustrate that in the presence of virtual methods, it is not possible to directly reuse the representations of the methods of the superclass. For example, we cannot directly reuse the representation for m1() in class Base when we construct the representation for class Derived. In the Base class, the call statement vm() in m1() resolves to Base::vm(). If a class derived from Base redefines vm(), then the call statement vm() no longer resolves to Base::vm(), but to the newly defined vm() of the derived class. The callsites in the representation of m1() for class Derived have to be
• 39. Chapter 2. Slicing 32 changed. A method needs a new representation if: 1. the method is declared in the new class, or 2. the method is inherited from a class higher up in the hierarchy and calls a newly redefined virtual method, directly or indirectly. For example, the methods declared in Derived need a new representation because they satisfy (1); Base.m1() also needs a new representation because it satisfies (2): Base.m1() calls vm(), which is redefined in class Derived. Handling Interfaces In Java, interfaces declare methods but leave the responsibility of defining them to the concrete classes that implement the interface. Interfaces allow the programmer to work with objects through the interface behavior that they implement, rather than through their class definition. Single Interfaces We use the interface representation graph [58] to represent a Java interface and the classes that implement it. There is a unique vertex, called the interface start vertex, for the entry of the interface. Each method declaration in the interface can be regarded as a call to its corresponding method in a class that implements it, and therefore a call vertex is created for each method declaration in the interface. The interface start vertex is connected to each of these call vertices by interface-membership dependence arcs. If more than one class implements the interface, we connect each method call in the interface to every corresponding method in those classes. Interface Extending Similar to extending classes, the representation of an extended interface is constructed by reusing the representations of all methods that are inherited from superinterfaces. For newly defined methods in the extended interface, new representations are created.
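Returning to the two conditions above, the decision of whether a method needs a fresh representation can be sketched as a simple reachability check over resolved call targets. The Method type and the set of methods redefined by the new class are illustrative stand-ins, not the representation the tool actually uses.

    import java.util.*;

    // Sketch of the reuse rule: a method m needs a new representation for
    // class c if it is declared in c itself, or if it directly or transitively
    // calls a virtual method that c redefines.
    class ReuseCheck {
        static class Method {
            String name;
            String declaringClass;
            List<Method> callees = new ArrayList<>();   // resolved call targets
        }

        static boolean needsNewRepresentation(Method m, String c, Set<String> redefinedInC) {
            if (m.declaringClass.equals(c)) return true;              // condition (1)
            return callsRedefined(m, redefinedInC, new HashSet<>());  // condition (2)
        }

        private static boolean callsRedefined(Method m, Set<String> redefined, Set<Method> seen) {
            if (!seen.add(m)) return false;             // avoid cycles in the call graph
            for (Method callee : m.callees) {
                if (redefined.contains(callee.name)) return true;
                if (callsRedefined(callee, redefined, seen)) return true;
            }
            return false;
        }
    }

For the example of Figure 2.9, checking Base.m1() against class Derived with redefinedInC = {vm} returns true, so m1() gets a fresh representation, while which-ever methods never reach vm() can be reused unchanged.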
• 40. Chapter 2. Slicing 33

    interface A {
        void method1(int h);
        void method2(int v);
    }
    interface B extends A {
        void method3(int u);
    }
    class C1 implements A {
        int h, v;
        public void method1(int h1) { this.h = h1; }
        public void method2(int v1) { this.v = v1; }
    }
    class C2 implements A {
        int h, v;
        public void method1(int h2) { this.h = h2 + 1; }
        public void method2(int v2) { this.v = v2 + 1; }
    }
    class C3 implements B {
        int h, v, u;
        public void method1(int h1) { this.h = h1 + 2; }
        public void method2(int v1) { this.v = v1 + 2; }
        public void method3(int u1) { this.u = u1 + 2; }
    }

[Panels (a) and (b) show the interface dependence graphs for A and B: each interface start vertex is connected to one call vertex per declared method by interface-membership dependence arcs, and call, control and parameter dependence arcs link these call vertices to the implementing methods in C1, C2 and C3 together with their formal-in and actual-in vertices.]

Figure 2.12: Interface Dependence Graph (from [58])
• 41. Chapter 2. Slicing 34 2.3.3 Handling Polymorphism In Java, method calls are bound to an implementation at runtime. A method invocation expression such as o.m(args) is executed as follows: 1. The runtime type T of o is determined. 2. The class T is loaded. 3. T is checked for an implementation of method m. If T does not define an implementation, its superclass is checked, and then that class's superclass, until an implementation is found. 4. Method m is invoked with the argument list args, and o is also passed to the method, where it becomes the this value for m. A polymorphic reference can refer to instances of more than one class. A class dependence graph represents such a polymorphic method call by using a polymorphic choice vertex [66]. A polymorphic choice vertex represents the selection of a particular call from a set of possible destinations. In this method, a message sent to a polymorphic object is represented as a set of callsites, one for each candidate message-handling method, connected to a polymorphic choice vertex with polymorphic choice edges. This approach may give incorrect results: in function main2() of Figure 2.9, Larsen's approach uses only one callsite to represent the statement p.m1(), because m1() is declared only in Base. However, when m1() is called on objects of class Derived, it invokes Derived.vm() to modify d, and when m1() is called on objects of class Base, it invokes Base.vm() to modify a. One callsite cannot precisely represent both cases. This approach also computes spurious dependences: it is equivalent to using several objects, each of a different type, to represent a polymorphic object, and the data dependence construction algorithm cannot distinguish data members with the same names in these different objects. Liang and Harrold [67] give an improved representation of polymorphism to overcome this limitation. A polymorphic object is represented as a tree: the root of the tree represents the polymorphic object and the children of the root represent objects of
• 42. Chapter 2. Slicing 35 the possible types. When the polymorphic object is used as a parameter, the children are further expanded into trees; when the polymorphic object receives a message, the children are further expanded into callsites. In Figure 2.11 the callsite ba.m1() can have receiver types Base and Derived. Thus the callsite is expanded into one callsite for each possible receiver type. 2.3.4 Case Study - Elevator Class and its Dependence Graph Figure 2.13 shows the Elevator program and the slice with respect to line 59. Figure 2.14 shows the class dependence graph constructed for the program. The C++ Elevator class discussed in [72] has been modified for Java.
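To connect the polymorphism handling above with the case study that follows, the sketch below expands a polymorphic call such as e.go(5) into one callsite per possible receiver type. The types and the dispatchTable map are hypothetical; in the tool the possible receiver types would come from class hierarchy or points-to information.

    import java.util.*;

    // Sketch: expanding a polymorphic call site into one callsite per possible
    // receiver type, following the tree representation of [67].
    class PolymorphicExpansion {
        static class CallSite {
            String receiverType;    // the assumed runtime type of the receiver
            String resolvedTarget;  // the method that type would actually invoke
            CallSite(String t, String m) { receiverType = t; resolvedTarget = m; }
        }

        // dispatchTable maps a class name to (method name -> implementing method)
        // and is assumed to contain an entry for every possible receiver type.
        static List<CallSite> expand(String methodName, Set<String> possibleTypes,
                                     Map<String, Map<String, String>> dispatchTable) {
            List<CallSite> sites = new ArrayList<>();
            for (String type : possibleTypes) {
                String target = dispatchTable.get(type).get(methodName);
                sites.add(new CallSite(type, target));
            }
            return sites;
        }
    }

For e.go(5) in the Elevator program the possible types are Elevator and AlarmElevator, yielding separate callsites for Elevator.go and AlarmElevator.go, each with its own actual-in and actual-out vertices in the dependence graph.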
• 43. Chapter 2. Slicing 36

     1  class Elevator {
     2      static int UP=1, DOWN=-1;
     3      public Elevator(int t) {
     4          current_floor = 1;
     5          current_direction = UP;
     6          top_floor = t;
     7      }
     8      public void up() {
     9          current_direction = UP;
    10      }
    11      public void down() {
    12          current_direction = DOWN;
    13      }
    14      int which_floor() {
    15          return current_floor;
    16      }
    17      public int direction() {
    18          return current_direction;
    19      }
    20      public void go(int floor) {
    21          if (current_direction == UP) {
    22              while (current_floor != floor &&
    23                     current_floor <= top_floor)
    24                  current_floor = current_floor + 1;
    25          }
    26          else {
    27              while (current_floor != floor &&
    28                     current_floor > 0)
    29                  current_floor = current_floor - 1;
    30      }
    31      int current_floor;
    32      int current_direction;
    33      int top_floor;
    34  }
    35  class AlarmElevator extends Elevator {
    36      public AlarmElevator(int top_floor) {
    37          super(top_floor);
    38          alarm_on = 0;
    39      }
    40      public void set_alarm() {
    41          alarm_on = 1;
    42      }
    43      public void reset_alarm() {
    44          alarm_on = 0; }
    45      public void go(int floor) {
    46          if (!alarm_on)
    47              super.go(floor);
    48      }
    49      protected int alarm_on;
    50  }
    51  class Test {
    52      public static void main(String args[]) {
    53          Elevator e;
    54          if (condition)
    55              e = new Elevator(10);
    56          else
    57              e = new AlarmElevator(10);
    58          e.go(5);
    59          System.out.print(e.which_floor());
    60      }
    61  }

Figure 2.13: The Elevator program
• 44. Chapter 2. Slicing 37 [Class dependence graph for the Elevator program, with the slice point at vertex 59. The graph contains control dependence, data dependence, summary, call and parameter edges, and a key mapping the actual-in/actual-out and formal-in/formal-out vertices for current_floor, current_direction, top_floor, alarm_on and floor to their parameter-copy assignments.] Figure 2.14: Dependence Graph for Elevator program
• 45. Chapter 3 Points to Analysis In this chapter we first discuss the need for points to analysis. In the context of slicing, points to analysis is essential for the correct computation of data dependencies and for the construction of the call graph. We summarize some issues related to computing points to sets, including the methods for their computation and the various factors that affect precision. We next describe Andersen's algorithm for pointer analysis for C and its adaptation for Java. We then describe a new method for intra-procedural alias analysis which is an improvement over flow insensitive analysis but not as precise as a flow sensitive analysis. 3.1 Need for Points to Analysis The goal of pointer analysis is to statically determine the set of memory locations that can be pointed to by a pointer variable. If two variables can access the same memory location, the variables are said to be aliased. Alias analysis is necessary for program analysis, for optimizations, and for the correct computation of data dependence, which in turn is necessary for slicing. Consider the computation of data dependence in Figure 3.1. Here the statement print(y.a) is dependent on x.a = ..., since x and y are aliased due to the execution of the statement y = x. Without alias analysis, it is not possible to infer that statement 7 is dependent on statement 4. A points to graph gives information about the set of memory locations pointed at by
• 46. Chapter 3. Points to Analysis 39

    1  void fun() {
    2      obj x, y;
    3      x = new obj();   // O1 represents the allocated object
    4      x.a = ....;
    5      ... = y.a;
    6      y = x;
    7      print(y.a);
    8  }

Figure 3.1: Need for Points to Analysis

each variable. Figure 3.1 shows a program; Figure 3.2 shows points to graphs for example C and Java programs. In C a variable can point to another stack variable or to dynamically allocated memory on the heap, whereas in Java a reference variable can point only to objects allocated on the heap, as stack variables cannot be pointed to due to the lack of an address-of operator (&). Dynamically allocated memory locations on the heap are not named. One convention is to refer to objects (memory locations) by the statement at which they are created. A statement can be executed many times and can therefore create a new object each time. Thus approximations are introduced in the points to graph if this convention is used. Another cause for approximation is the presence of recursion and dynamic allocation of memory, which lead to a statically unbounded number of memory locations. 3.2 Pointer Analysis using Constraints Our aim is to derive the points to graph from the program text. One method to derive the points to graph is to use constraints [64]. If pts(q) denotes the set of objects initially pointed to by q, then after an assignment such as p = q, p can additionally point to the objects initially pointed to by q. Thus we have the constraint pts(p) ⊇ pts(q). Every statement in the program has an associated constraint, and a solution to the constraints gives the points to set associated with every variable. Constraints such as pts(p) ⊇ pts(q) are also called subset constraints or inclusion based constraints. Andersen uses subset constraints for analyzing C programs, and his algorithm is described in Section 3.4.
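As an illustration of constraint-based analysis, the sketch below generates and solves subset constraints for the two statement forms that occur in Figure 3.1: allocations and copy assignments. Field accesses, calls and dereferences are omitted, and the class is an illustrative sketch rather than the analysis implemented in this thesis.

    import java.util.*;

    // Sketch: subset-constraint generation and a naive fixed-point solver for
    // simple reference assignments (p = new ... and p = q).
    class SubsetConstraints {
        Map<String, Set<String>> pts = new HashMap<>();        // variable -> abstract objects
        Map<String, Set<String>> subsetOf = new HashMap<>();   // q -> {p | pts(p) ⊇ pts(q)}

        void allocation(String p, String site) {               // p = new T(), object named by site
            pts.computeIfAbsent(p, k -> new HashSet<>()).add(site);
        }
        void copy(String p, String q) {                        // p = q  gives  pts(p) ⊇ pts(q)
            subsetOf.computeIfAbsent(q, k -> new HashSet<>()).add(p);
        }
        void solve() {                                         // propagate until nothing changes
            boolean changed = true;
            while (changed) {
                changed = false;
                for (Map.Entry<String, Set<String>> e : subsetOf.entrySet()) {
                    Set<String> from = pts.getOrDefault(e.getKey(), Collections.emptySet());
                    for (String p : e.getValue()) {
                        changed |= pts.computeIfAbsent(p, k -> new HashSet<>()).addAll(from);
                    }
                }
            }
        }
    }

For Figure 3.1, calling allocation("x", "O1") and copy("y", "x") and then solve() yields pts(y) = {O1}, which is exactly the alias fact needed to make statement 7 dependent on statement 4.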
• 47. Chapter 3. Points to Analysis 40

C program:
    int a = 1, b = 2;
    int *p, *q;
    void *r, *s;
    p = &a;
    q = &b;
    h1: r = malloc
    h2: s = malloc
(points to graph: p → a, q → b, r → heap1, s → heap2)

Java program:
    class Obj { int f; }
    Obj r, s, t;
    h1: r = new Obj();
    h2: s = new Obj();
    h3: r.f = new Obj();
    t = s;
(points to graph: r → heap1, s → heap2, t → heap2, heap1.f → heap3)

Figure 3.2: Points to Graphs

Subset vs Unification Constraints The constraints generated can be either subset based or equality based. A subset constraint such as pts(p) ⊇ pts(q) says that the points-to set of p contains the points-to set of q. Instead of using subset constraints, Steensgaard [13] uses equality based constraints, where after each assignment like p = q the points to sets of p and q are unified, i.e. the points to sets of both variables are made identical. Steensgaard's approach is based on a non-standard type system, where a type does not refer to a declared type in the program source. Instead, the type of a variable describes the set of locations possibly pointed to by the variable at runtime. At initialization, each variable is described by a different type. When two variables can point to the same memory location, the types represented by the variables are merged. However, these stronger constraints make the analysis less precise. The equality based approach is also called unification because it treats assignments as bidirectional. This unification merges the
• 48. Chapter 3. Points to Analysis 41 points to sets of both sides of the assignment and essentially computes an equivalence relation defined by the assignments, which is done by the fast union-find algorithm [22]. If all the variables can be assigned types, subject to the constraints, then the system of constraints is said to be satisfiable or well-typed. Points-to analysis reduces to the problem of assigning types to all locations (variables) in a program, such that the variables in the program are well-typed. At the end of the analysis, two locations are assigned different types, unless they have to be described by the same type in order for the system of constraints to be well-typed. 3.3 Dimensions of Precision The factors that contribute to the precision of the computed analysis are flow sensitivity, field sensitivity, context sensitivity and heap modelling. Ryder [17] discusses various parameters that contribute to the precision of the analysis. Flow Sensitive vs Flow Insensitive approach A flow sensitive analysis takes into account the control flow structure of the program. Thus the points-to set associated with a variable depends on the program point; the analysis computes a mapping variable × program point → set of memory locations. This is precise but requires a large amount of memory, since the points to sets of the same variable at two different program points may differ and have to be recorded separately. Flow sensitive analysis allows us to take advantage of strong updates, where after a statement x = ..., the points to information about x prior to that statement can be removed. A flow insensitive approach computes conservative information that is valid at all program points. It considers the program as a set of statements and computes points-to information ignoring control flow. Flow insensitive analysis computes a single points to relation that holds regardless of the order in which assignment statements are actually
• 49. Chapter 3. Points to Analysis 42 executed. A flow insensitive analysis produces imprecise results. Consider the computation of data dependence for the program in Figure 3.1. If we apply flow insensitive alias analysis, then the analysis will conclude that x and y can both point to O1, and thus the statement ... = y.a (line 5) is made dependent on x.a = .... But y can point to O1 only after the statement y = x. Thus flow insensitive analysis leads to spurious data dependence. Field Sensitivity Aggregate objects such as structures can be handled by one of three approaches: field-insensitive, where field information is discarded by modeling each aggregate with a single constraint variable; field-based, where one constraint variable models all instances of a field; and finally, field-sensitive, where a unique variable models each field instance of an object. The following table describes these approaches for the code segment x.a = new object(); y.b = x.a;

    field based          pts(b) ⊇ pts(a)
    field insensitive    pts(y) ⊇ pts(x)
    field sensitive      pts(y.b) ⊇ pts(x.a)

Heap Abstraction Two variables are aliased if they can refer to the same object in memory. Thus we need to keep track of objects that can be present at runtime. The objects created at runtime cannot be determined statically and have to be conservatively approximated. The least precise manner is to consider the entire heap as a single object. The most common manner of abstraction is to have one abstract object per program point. This abstract object is a representative of all the objects that can be created at runtime due to that program
• 50. Chapter 3. Points to Analysis 43

    main() {
        object a, b, c, d;
        a = new object();     pts(a) ⊇ {o1}
        b = new object();     pts(b) ⊇ {o2}
        c = id(a);            pts(r) ⊇ pts(a), pts(c) ⊇ pts(r)
        d = id(b);            pts(r) ⊇ pts(b), pts(d) ⊇ pts(r)
    }
    object id(object r) { return r; }

Figure 3.3: Imprecision due to context insensitive analysis

point. A more precise abstraction is to take context sensitivity into account, using the calling context to distinguish between objects created at the same program point. Context Sensitivity A context sensitive analysis distinguishes between different calling contexts and does not merge data flow information from multiple contexts. In Figure 3.3, a and b point to o1 and o2 respectively. Due to the function calls, c is made to point to o1 and d is made to point to o2. So the actual points to sets are a → o1, b → o2, c → o1 and d → o2. A context insensitive analysis models parameter bindings as explicit assignments. Thus r points to both the objects o1 and o2. This leads to smearing of information, making c and d point to both o1 and o2. One method to incorporate context sensitivity is to summarize each procedure and embed that information at the call sites. A method can change the points to sets of all data reachable through static variables, incoming parameters and all objects created by the method and its callees. A method's summary must include the effect of all the updates that the method and all its callees can make, in terms of the incoming parameters. Thus summaries are huge. There is also another difficulty, due to the callback mechanism.
• 51. Chapter 3. Points to Analysis 44 In the presence of dynamic binding, we do not know which method will be called, making it difficult to summarize the method [1]. Another method to incorporate context sensitivity is the cloning based approach. Cloning based approaches expand the call graph for each calling context, so that there is a separate path for each calling context. A context insensitive algorithm can then be run on the expanded graph. This leads to an exponential blowup. Whaley and Lam [18] use Binary Decision Diagrams (BDDs) to handle the exponential increase in complexity caused by cloning. BDDs were first used for pointer analysis by Berndl et al. [31]. Milanova et al. [20] introduce object sensitivity, which is a form of context sensitivity. Instead of using the call stack to distinguish different contexts, they use the receiver object to distinguish between different contexts. 3.4 Andersen's Algorithm for C Andersen proposed a flow insensitive, context insensitive version of points to analysis for C. His analysis modeled the heap using a separate concrete location to represent all memory allocated at a given dynamic allocation site. The implementation expressed the analysis using subset constraints and then solved the constraints. Andersen's algorithm [64] models the points to relations as subset constraints. After a statement such as p = q, p additionally points to those objects which are initially pointed to by q. Thus we have the constraint pts(p) ⊇ pts(q). The list of constraints for C is given in Table 3.1.

    p = &x      x ∈ pts(p)
    p = q       pts(p) ⊇ pts(q)
    p = *q      ∀x ∈ pts(q): pts(p) ⊇ pts(x)
    *p = q      ∀x ∈ pts(p): pts(x) ⊇ pts(q)

    Table 3.1: Constraints for C

Constraints are represented using a constraint graph. Each node N in the constraint graph represents a variable and is annotated with pts(N), the set of objects the variable