20101017 program analysis_for_security_livshits_lecture02_compilers
Presentation Transcript

  • Introduction to Compilers
    Ben Livshits
    Based in part on Stanford class slides from
    http://infolab.stanford.edu/~ullman/dragon/w06/w06.html
  • Organization
    Really basic stuff
    Flow Graphs
    Constant Folding
    Global Common Subexpressions
    Induction Variables/Reduction in Strength
    Data-flow analysis
    Proving Little Theorems
    Data-Flow Equations
    Major Examples
    Pointer analysis
  • Compiler Organization
  • Dataflow Analysis Basics
    L2:
    Compiler Organization
    Dataflow analysis basics
    L3:
    Dataflow lattices
    Iterative dataflow solution
    Gen/kill frameworks
  • Pointer Analysis
    L10:
    Pointer analysis
    L11:
    Pointer analysis and bddbddb
  • 6
    Really Basic Stuff
    • Flow Graphs
    • Constant Folding
    • Global Common Subexpressions
    • Induction Variables/Reduction in Strength
  • 7
    Dawn of Code Optimization
    A never-published Stanford technical report by Fran Allen in 1968
    Flow graphs of intermediate code
    Key things worth doing
  • 8
    Intermediate Code
    for (i=0; i<n; i++)
    A[i] = 1;
    Intermediate code exposes optimizable constructs we cannot see at source-code level.
    Make flow explicit by breaking the code into basic blocks: sequences of steps with entry only at the beginning and exit only at the end.
  • 9
    Basic Blocks
    Source:
    for (i=0; i<n; i++)
    A[i] = 1;
    Intermediate code:
    i = 0
    if i>=n goto …
    t1 = 8*i
    A[t1] = 1
    i = i+1
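    A minimal sketch of how intermediate code might be split into basic blocks, using the usual "leader" rule: the first instruction, every jump target, and every instruction after a jump starts a new block. The tuple encoding, the numeric jump targets, and the explicit back jump are assumptions made for illustration; the slides elide the jump targets.

```python
def basic_blocks(code):
    """Split a list of (text, jump_target_or_None) instructions into basic blocks.

    A leader is the first instruction, any jump target, or any
    instruction that immediately follows a jump.
    """
    leaders = {0}
    for index, (_, target) in enumerate(code):
        if target is not None:
            leaders.add(target)
            if index + 1 < len(code):
                leaders.add(index + 1)
    starts = sorted(leaders)
    ends = starts[1:] + [len(code)]
    return [[text for text, _ in code[s:e]] for s, e in zip(starts, ends)]


# Hypothetical encoding of the loop: indices, targets, and the back jump
# are assumptions, not taken from the slides.
loop = [
    ("i = 0", None),
    ("if i>=n goto 6", 6),
    ("t1 = 8*i", None),
    ("A[t1] = 1", None),
    ("i = i+1", None),
    ("goto 1", 1),
    ("exit", None),
]
blocks = basic_blocks(loop)
```

    Under these assumptions the loop test, the loop body, and the exit each land in their own block, which is the structure a flow graph is built from.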
  • 10
    Induction Variables
    x is an induction variable in a loop if it takes on a linear sequence of values each time through the loop.
    Common case: loop index like i and computed array index like t1.
    Eliminate “superfluous” induction variables.
    Replace multiplication by addition (reduction in strength).
  • 11
    Example
    Before:
    i = 0
    if i>=n goto …
    t1 = 8*i
    A[t1] = 1
    i = i+1
    After:
    t1 = 0
    n1 = 8*n
    if t1>=n1 goto …
    A[t1] = 1
    t1 = t1+8
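    As a rough check that the two versions touch the same array offsets, the loops can be transcribed into Python. Recording the offset in a list stands in for the store A[t1] = 1; the function names are made up for this sketch.

```python
def offsets_before(n):
    """Original loop: recompute the offset with a multiplication each time."""
    offsets = []
    i = 0
    while i < n:
        t1 = 8 * i
        offsets.append(t1)  # stands in for A[t1] = 1
        i = i + 1
    return offsets


def offsets_after(n):
    """Transformed loop: i is eliminated and t1 advances by addition."""
    offsets = []
    t1 = 0
    n1 = 8 * n
    while t1 < n1:
        offsets.append(t1)  # stands in for A[t1] = 1
        t1 = t1 + 8
    return offsets
```

    Both functions return the same list for any n >= 0, including the empty list when n is 0.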
  • 12
    Loop-Invariant Code Motion
    Sometimes, a computation is done each time around a loop.
    Move it before the loop to save n-1 computations.
    Be careful: could n=0? I.e., could the loop execute 0 times? If so, the moved computation runs once even though the original code never ran it.
  • 13
    Example
    Before:
    i = 0
    if i>=n goto …
    t1 = y+z
    x = x+t1
    i = i+1
    After:
    i = 0
    t1 = y+z
    if i>=n goto …
    x = x+t1
    i = i+1
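    The effect of the motion, including the n = 0 caveat, can be sketched in Python. Each function returns the final x together with a count of how many times y+z was evaluated; the counter and the function names are additions for illustration only.

```python
def loop_original(x, y, z, n):
    """y+z is recomputed on every iteration."""
    evaluations = 0
    i = 0
    while i < n:
        t1 = y + z
        evaluations += 1
        x = x + t1
        i = i + 1
    return x, evaluations


def loop_hoisted(x, y, z, n):
    """y+z is computed once, before the loop test."""
    t1 = y + z
    evaluations = 1
    i = 0
    while i < n:
        x = x + t1
        i = i + 1
    return x, evaluations
```

    For n = 4 the hoisted version saves three evaluations; for n = 0 it performs one evaluation that the original never did, which matters if the computation is costly or can fault.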
  • 14
    Constant Folding
    Sometimes a variable has a known constant value at a point.
    If so, replacing the variable by the constant simplifies and speeds up the code.
    Easy within a basic block; harder across blocks.
  • 15
    Example
    Before:
    i = 0
    n = 100
    if i>=n goto …
    t1 = 8*i
    A[t1] = 1
    i = i+1
    After:
    t1 = 0
    if t1>=800 goto …
    A[t1] = 1
    t1 = t1+8
  • 16
    Global Common Subexpressions
    Suppose block B has a computation of x+y.
    Suppose that whenever we reach this computation, we are sure to have:
    Computed x+y, and
    Not subsequently reassigned x or y.
    Then we can hold the value of x+y and use it in B.
  • 17
    Example
    Before (one block per line):
    a = x+y
    b = x+y
    c = x+y
    After (one block per line):
    t = x+y; a = t
    t = x+y; b = t
    c = t
  • 18
    Example --- Even Better
    Before:
    t = x+y
    a = t
    t = x+y
    b = t
    c = t
    After:
    t = x+y
    a = t
    b = t
    t = x+y
    b = t
    c = t
  • 19
    Data-Flow Analysis
    • Proving Little Theorems
    • Data-Flow Equations
    • Major Examples
  • 20
    An Obvious Theorem
    boolean x = true;
    while (x) {
    . . . // no change to x
    }
    Doesn’t terminate.
    Proof: only assignment to x is at top, so x is always true.
  • 21
    As a Flow Graph
    x = true
    if x == true
    “body”
  • 22
    Formulation: Reaching Definitions
    Each place some variable x is assigned is a definition.
    Ask: for this use of x, where could x last have been defined?
    In our example: only at x=true.
  • 23
    Example: Reaching Definitions
    d1: x = true
    if x == true
    d2: a = 10
    [Edge labels in the figure: d1 leaving d1: x = true; d1, d2 on the edges around the loop.]
  • 24
    Clincher
    Since at x == true, d1 is the only definition of x that reaches, it must be that x is true at that point.
    The conditional is not really a conditional and can be replaced by a branch.
  • 25
    Not Always That Easy
    int i = 2; int j = 3;
    while (i != j) {
    if (i < j) i += 2;
    else j += 2;
    }
    We’ll develop techniques for this problem, but later …
  • 26
    The Flow Graph
    d1: i = 2
    d2: j = 3
    if i != j
    if i < j
    d3: i = i+2
    d4: j = j+2
    [Edge labels in the figure: d2, d3, d4 leaving d3; d1, d3, d4 leaving d4; d1, d2, d3, d4 on the other edges of the loop.]
  • 27
    DFA Is Sometimes Insufficient
    In this example, i can be defined in two places, and j in two places.
    No obvious way to discover that i!=j is always true.
    But OK, because reaching definitions is sufficient to catch most opportunities for constant folding (replacement of a variable by its only possible value).
  • 28
    Be Conservative!
    (Code optimization only)
    It’s OK to discover a subset of the opportunities to make some code-improving transformation.
    It’s not OK to think you have an opportunity that you don’t really have.
  • 29
    Example: Be Conservative
    boolean x = true;
    while (x) {
    . . . *p = false; . . .
    }
    Is it possible that p points to x?
  • 30
    As a Flow Graph
    d1: x = true
    if x == true
    d2: *p = false (another def of x)
    [Edge labels in the figure: d1, d2.]
  • 31
    Possible Resolution
    Just as data-flow analysis of “reaching definitions” can tell what definitions of x might reach a point, another DFA can eliminate cases where p definitely does not point to x.
    Example: the only definition of p is p = &y and there is no possibility that y is an alias of x.
  • 32
    Reaching Definitions Formalized
    A definition d of a variable x is said to reach a point p in a flow graph if:
    Some path from the entry of the flow graph to p has d on the path, and
    After the last occurrence of d on that path, x is not surely redefined before p.
  • 33
    Data-Flow Equations --- (1)
    A basic block can generate a definition.
    A basic block can either
    Kill a definition of x if it surely redefines x.
    Transmit a definition if it may not redefine the same variable(s) as that definition.
  • 34
    Data-Flow Equations --- (2)
    Variables:
    IN(B) = set of definitions reaching the beginning of block B.
    OUT(B) = set of definitions reaching the end of B.
  • 35
    Data-Flow Equations --- (3)
    Two kinds of equations:
    Confluence equations: IN(B) in terms of the OUTs of the predecessors of B.
    Transfer equations: OUT(B) in terms of IN(B) and what goes on in block B.
  • 36
    Confluence Equations
    IN(B) = ∪ OUT(P), taken over all predecessors P of B
    Example: B has predecessors P1 and P2, whose OUT sets are {d1, d2} and {d2, d3}, so IN(B) = {d1, d2, d3}.
  • 37
    Transfer Equations
    Generate a definition in the block if its variable is not definitely rewritten later in the basic block.
    Kill a definition if its variable is definitely rewritten in the block.
    An internal definition may be both killed and generated.
  • 38
    Example: Gen and Kill
    IN = {d2(x), d3(y), d3(z), d5(y), d6(y), d7(z)}
    Block:
    d1: y = 3
    d2: x = y+z
    d3: *p = 10
    d4: y = 5
    Kill includes {d1(x), d2(x), d3(y), d5(y), d6(y),…}
    Gen = {d2(x), d3(x), d3(z),…, d4(y)}
    OUT = {d2(x), d3(x), d3(z),…, d4(y), d7(z)}
  • 39
    Transfer Function for a Block
    For any block B:
    OUT(B) = (IN(B) – Kill(B)) ∪ Gen(B)
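    A minimal sketch of Gen, Kill, and the transfer function for one straight-line block. It assumes every assignment names its variable directly (no stores through pointers such as *p), and the definition names e1 to e6 are hypothetical, chosen so they are not confused with the slide examples.

```python
def gen_kill(block, all_defs):
    """Compute (Gen, Kill) for a block.

    block: list of (def_id, variable) pairs in execution order.
    all_defs: dict mapping each variable to the set of all its def_ids
              anywhere in the program.
    """
    last_def = {}
    for def_id, var in block:
        last_def[var] = def_id  # a later definition overwrites an earlier one
    gen = set(last_def.values())
    kill = set()
    for var in last_def:
        kill |= all_defs[var] - gen  # every other definition of var is killed
    return gen, kill


def transfer(in_set, gen, kill):
    """OUT(B) = (IN(B) - Kill(B)) | Gen(B)."""
    return (in_set - kill) | gen


# Hypothetical block: e1: y = 3; e2: x = y+z; e3: y = 5
block = [("e1", "y"), ("e2", "x"), ("e3", "y")]
all_defs = {"x": {"e2", "e4"}, "y": {"e1", "e3", "e5"}, "z": {"e6"}}
gen, kill = gen_kill(block, all_defs)
out_set = transfer({"e4", "e5", "e6"}, gen, kill)
```

    Here e1 is defined inside the block and then overwritten by e3, so it ends up killed rather than generated, while e6 passes through because the block never assigns z.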
  • 40
    Iterative Solution to Equations
    For an n-block flow graph, there are 2n equations in 2n unknowns.
    Alas, the solution is not unique.
    Use iterative solution to get the least fixed-point.
    Identifies any def that might reach a point.
  • 41
    Iterative Solution --- (2)
    IN(entry) = ∅;
    for each block B do OUT(B) = ∅;
    while (changes occur) do
        for each block B do {
            IN(B) = ∪ OUT(P) over all predecessors P of B;
            OUT(B) = (IN(B) – Kill(B)) ∪ Gen(B);
        }
  • 42
    Example: Reaching Definitions
    B1: d1: x = 5
    B2: if x == 10
    B3: d2: x = 15
    IN(B1) = {}
    OUT(B1) = {d1}
    IN(B2) = {d1, d2}
    OUT(B2) = {d1, d2}
    IN(B3) = {d1, d2}
    OUT(B3) = {d2}
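    The iterative solution can be sketched in Python and run on the three-block example. The edges B1→B2, B2→B3, and B3→B2 are an assumption about the figure (they are consistent with d2 reaching B2); the Gen and Kill sets are read off the two definitions of x.

```python
def reaching_definitions(preds, gen, kill):
    """Iterate the confluence and transfer equations to the least fixed point.

    preds: dict block -> list of predecessor blocks
    gen, kill: dict block -> set of definition names
    Returns (IN, OUT) as dicts block -> set of definition names.
    """
    blocks = list(gen)
    in_sets = {b: set() for b in blocks}
    out_sets = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            new_in = set()
            for p in preds[b]:
                new_in |= out_sets[p]  # confluence: union over predecessors
            new_out = (new_in - kill[b]) | gen[b]  # transfer function
            if new_in != in_sets[b] or new_out != out_sets[b]:
                in_sets[b], out_sets[b] = new_in, new_out
                changed = True
    return in_sets, out_sets


# Assumed edges: B1 -> B2, B2 -> B3, B3 -> B2
preds = {"B1": [], "B2": ["B1", "B3"], "B3": ["B2"]}
gen = {"B1": {"d1"}, "B2": set(), "B3": {"d2"}}
kill = {"B1": {"d2"}, "B2": set(), "B3": {"d1"}}
in_sets, out_sets = reaching_definitions(preds, gen, kill)
```

    Starting every set at ∅ and only ever adding definitions is what makes the result the least fixed point; on this graph the sets stop changing after the second pass.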
  • 43
    Aside: Notice the Conservatism
    We make the most conservative assumption about when a def is killed or gen’d.
    We also make the conservative assumption that any path in the flow graph can actually be taken.
  • 44
    Everything Else About Data Flow Analysis
    • Flow- and Context-Sensitivity
    • Logical Representation
    • Pointer Analysis
    • Interprocedural Analysis
  • 45
    Three Levels of Sensitivity
    In DFA so far, we have cared about where in the program we are.
    Called flow-sensitivity.
    But we didn’t care how we got there.
    Caring about that is called context-sensitivity.
    We could even care about neither.
    Example: where could x ever be defined in this program?
  • 46
    Flow/Context Insensitivity
    Not so bad when program units are small (few assignments to any variable).
    Example: Java code often consists of many small methods.
    Remember: you can distinguish variables by their full name, e.g., class.method.block.identifier.
  • 47
    Context Sensitivity
    Can distinguish paths to a given point.
    Example: If we remembered paths, we would not have the problem in the constant-propagation framework where x+y = 5 but neither x nor y is constant over all paths.
  • 48
    The Example Again
    One branch: x = 3; y = 2
    Other branch: x = 2; y = 3
    After the join: z = x+y
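    A small sketch of why remembering paths helps here. Merging values at the join loses x and y as constants, so z looks non-constant; evaluating z separately on each path shows it is always 5. The helper name and the dictionary encoding of the two paths are assumptions for illustration.

```python
def merged_constant(values):
    """Return the single value if every path agrees, else None (not a constant)."""
    distinct = set(values)
    return distinct.pop() if len(distinct) == 1 else None


# The two paths into the join, as variable environments.
paths = [{"x": 3, "y": 2}, {"x": 2, "y": 3}]

# Merge first, then evaluate: x and y are already non-constant at the join.
x_merged = merged_constant(p["x"] for p in paths)
y_merged = merged_constant(p["y"] for p in paths)
z_merged = (
    x_merged + y_merged
    if x_merged is not None and y_merged is not None
    else None
)

# Evaluate on each path, then merge: z is the same on both paths.
z_per_path = merged_constant(p["x"] + p["y"] for p in paths)
```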
  • 49
    An Interprocedural Example
    int id(int x) {return x;}
    void p() {a=2; b=id(a);…}
    void q() {c=3; d=id(c);…}
    If we distinguish p calling id from q calling id, then we can discover b=2 and d=3.
    Otherwise, we think b, d = {2, 3}.
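    The difference can be sketched by summarizing id either once for the whole program or once per caller. The list of calls below is a hand-written model of p and q, not something extracted from real code.

```python
# (caller, constant argument passed to id), modeled by hand from p and q.
calls = [("p", 2), ("q", 3)]

# Context-insensitive: a single summary for id merges the arguments of all
# calls, so every caller sees every possible return value.
merged_args = {arg for _, arg in calls}
insensitive = {caller: set(merged_args) for caller, _ in calls}

# Context-sensitive: one summary per calling context keeps the calls apart.
sensitive = {caller: {arg} for caller, arg in calls}
```

    With the merged summary both b and d may be 2 or 3; with per-caller summaries b is exactly 2 and d is exactly 3.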
  • 50
    Context-Sensitivity --- (2)
    Loops and recursive calls lead to an infinite number of contexts.
    Generally used only for interprocedural analysis, so forget about loops.
    Need to collapse strong components of the calling graph to a single group.
    “Context” becomes the sequence of groups on the calling stack.
  • 51
    Example: Calling Graph
    [Figure: calling graph over the functions main, p, q, r, s, and t, with groups colored green, pink, and yellow.]
    Contexts:
    Green
    Green, pink
    Green, yellow
    Green, pink, yellow
  • 52
    Comparative Complexity
    Insensitive: proportional to size of program (number of variables).
    Flow-Sensitive: size of program, squared (points times variables).
    Context-Sensitive: worst-case exponential in program size (acyclic paths through the code).
  • 53
    Logical Representation
    We have used a set-theoretic formulation of DFA.
    IN = set of definitions, e.g.
    There has been recent success with a logical formulation, involving predicates.
    Example: Reach(d,x,i) = “definition d of variable x can reach point i.”
  • 54
    Comparison: Sets Vs. Logic
    Both have an efficiency enhancement.
    Sets: bit vectors and boolean ops.
    Logic: BDD’s, incremental evaluation.
    Logic allows integration of different aspects of a flow problem.
    Think of PRE as an example. We needed 6 stages to compute what we wanted.
  • 55
    Datalog --- (1)
    Atom = Reach(d,x,i)
    Reach is the predicate; d, x, i are its arguments (variables or constants).
    Literal = Atom or NOT Atom
    Rule = Atom :- Literal & … & Literal
    The atom to the left of :- is the head; the literals to the right are the body.
    For each assignment of values to variables that makes all the body literals true, make the head atom true.
  • 56
    Example: Datalog Rules
    Reach(d,x,j) :- Reach(d,x,i) &
        StatementAt(i,s) &
        NOT Assign(s,x) &
        Follows(i,j)
    Reach(s,x,j) :- StatementAt(i,s) &
        Assign(s,x) &
        Follows(i,j)
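    A minimal sketch of evaluating these two rules over relations stored as Python sets of tuples. The tiny three-statement program (s1 and s3 assign x, s2 assigns y) is hypothetical, and the loops are a naive evaluation rather than anything an actual Datalog engine would do.

```python
def reach(statement_at, assign, follows):
    """Evaluate the two Reach rules to a fixed point.

    statement_at: set of (point, statement)
    assign: set of (statement, variable)
    follows: set of (point, next_point)
    Returns the set of (definition, variable, point) facts.
    """
    facts = set()
    # Rule 2: a statement that assigns x reaches the points that follow it.
    for i, s in statement_at:
        for s2, x in assign:
            if s2 == s:
                for i2, j in follows:
                    if i2 == i:
                        facts.add((s, x, j))
    # Rule 1: a definition passes through statements that do not assign x.
    changed = True
    while changed:
        changed = False
        for d, x, i in list(facts):
            for i2, s in statement_at:
                if i2 == i and (s, x) not in assign:
                    for i3, j in follows:
                        if i3 == i and (d, x, j) not in facts:
                            facts.add((d, x, j))
                            changed = True
    return facts


# Hypothetical straight-line program: points 1, 2, 3, then exit point 4.
statement_at = {(1, "s1"), (2, "s2"), (3, "s3")}
assign = {("s1", "x"), ("s2", "y"), ("s3", "x")}
follows = {(1, 2), (2, 3), (3, 4)}
reach_facts = reach(statement_at, assign, follows)
```

    In this program s1 reaches points 2 and 3 but not 4, because s3 reassigns x.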
  • 57
    Datalog --- (2)
    Intuition: subgoals in the body are combined by “and” (strictly speaking: “join”).
    Intuition: Multiple rules for a predicate (head) are combined by “or.”
  • 58
    Datalog --- (3)
    Predicates can be implemented by relations (as in a database).
    Each tuple, or assignment of values to the arguments, also represents a propositional (boolean) variable.
  • 59
    Iterative Algorithm for Datalog
    Start with the EDB predicates = “whatever the code dictates,” and with all IDB predicates empty.
    Repeatedly examine the bodies of the rules, and see what new IDB facts can be discovered from the EDB and existing IDB facts.
  • 60
    Example: Seminaive
    Path(x,y) :- Arc(x,y)
    Path(x,y) :- Path(x,z) & Path(z,y)
    NewPath(x,y) = Arc(x,y); Path(x,y) = Arc(x,y);
    while (NewPath != ∅) do {
        NewPath(x,y) = {(x,y) | for some z: NewPath(x,z) && Path(z,y) || Path(x,z) && NewPath(z,y)} – Path(x,y);
        Path(x,y) = Path(x,y) ∪ NewPath(x,y);
    }
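    A sketch of seminaive evaluation for the two Path rules, with relations as Python sets of pairs. This version initializes Path to the arcs as well as NewPath, so that the first round has facts to join with; each round then joins only the newly found pairs against everything known so far.

```python
def transitive_closure(arcs):
    """Seminaive evaluation of Path(x,y) :- Arc(x,y) and
    Path(x,y) :- Path(x,z) & Path(z,y)."""
    path = set(arcs)
    new_path = set(arcs)
    while new_path:
        derived = set()
        # NewPath(x,z) joined with Path(z,y)
        for x, z in new_path:
            for z2, y in path:
                if z == z2:
                    derived.add((x, y))
        # Path(x,z) joined with NewPath(z,y)
        for x, z in path:
            for z2, y in new_path:
                if z == z2:
                    derived.add((x, y))
        new_path = derived - path  # keep only facts not seen before
        path |= new_path
    return path


closure = transitive_closure({(1, 2), (2, 3), (3, 4)})
```

    Every derivation in a round uses at least one fact discovered in the previous round, which is what saves work compared with re-joining all of Path with itself.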
  • Pointer analysis
    61
  • 62
    New Topic: Pointer Analysis
    We shall consider Andersen’s formulation of Java object references.
    Flow/context insensitive analysis.
    Cast of characters:
    Local variables, which point to:
    Heap objects, which may have fields that are references to other heap objects.
  • 63
    Representing Heap Objects
    A heap object is named by the statement in which it is created.
    Note many run-time objects may have the same name.
    Example: h: T v = new T; says variable v can point to (one of) the heap object(s) created by statement h.
    [Figure: variable v pointing to heap object h.]
  • 64
    Other Relevant Statements
    v.f = w makes the f field of the heap object h pointed to by v point to what variable w points to.
    [Figure: variables v and w, heap objects h, g, and i, and edges labeled f.]
  • 65
    Other Statements --- (2)
    v = w.f makes v point to what the f field of the heap object h pointed to by w points to.
    [Figure: variables v and w, heap objects h, g, and i, and an edge labeled f.]
  • 66
    Other Statements --- (3)
    v = w makes v point to whatever w points to.
    Interprocedural analysis: this also models copying an actual parameter to the corresponding formal, or a return value to a variable.
    [Figure: variables v and w and heap object h.]
  • 67
    Datalog Rules
    Pts(V,H) :- “H: V = new T”
    Pts(V,H) :- “V=W” & Pts(W,H)
    Pts(V,H) :- “V=W.F” & Pts(W,G) & Hpts(G,F,H)
    Hpts(H,F,G) :- “V.F=W” & Pts(V,H) & Pts(W,G)
  • 68
    Example
    T p(T x) {
    h: T a = new T;
    a.f = x;
    return a;
    }
    void main() {
    g: T b = new T;
    b = p(b);
    b = b.f;
    }
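    A minimal sketch of the four Datalog rules applied to this example, with Pts and Hpts as Python sets of tuples. The quoted statement facts are written out by hand: following the interprocedural remark above, the call b = p(b) is modeled as the copies x = b (actual to formal) and b = a (return value). The function name and the tuple encodings are assumptions for illustration.

```python
def points_to(news, copies, loads, stores):
    """Flow- and context-insensitive points-to analysis.

    news:   (v, h)    for "h: v = new T"
    copies: (v, w)    for "v = w"
    loads:  (v, w, f) for "v = w.f"
    stores: (v, f, w) for "v.f = w"
    Returns (pts, hpts): sets of (variable, heap) and (heap, field, heap).
    """
    pts = set(news)  # Pts(V,H) :- "H: V = new T"
    hpts = set()
    while True:
        new_pts = set()
        new_hpts = set()
        for v, w in copies:  # Pts(V,H) :- "V=W" & Pts(W,H)
            new_pts |= {(v, h) for (w2, h) in pts if w2 == w}
        for v, w, f in loads:  # Pts(V,H) :- "V=W.F" & Pts(W,G) & Hpts(G,F,H)
            for w2, g in pts:
                if w2 == w:
                    new_pts |= {(v, h) for (g2, f2, h) in hpts
                                if g2 == g and f2 == f}
        for v, f, w in stores:  # Hpts(H,F,G) :- "V.F=W" & Pts(V,H) & Pts(W,G)
            for v2, h in pts:
                if v2 == v:
                    new_hpts |= {(h, f, g) for (w2, g) in pts if w2 == w}
        if new_pts <= pts and new_hpts <= hpts:
            return pts, hpts  # nothing new: fixed point reached
        pts |= new_pts
        hpts |= new_hpts


pts, hpts = points_to(
    news={("a", "h"), ("b", "g")},
    copies={("x", "b"), ("b", "a")},  # parameter passing and return
    loads={("b", "b", "f")},          # b = b.f
    stores={("a", "f", "x")},         # a.f = x
)
```

    Because the analysis is flow-insensitive, the order of the statements in p and main plays no role; only the set of statements matters.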
  • 69
    Apply Rules Recursively --- Round 1
    Pts(a,h)
    Pts(b,g)
  • 70
    Apply Rules Recursively --- Round 2
    New: Pts(x,g), Pts(b,h)
    Already known: Pts(a,h), Pts(b,g)
  • 71
    Apply Rules Recursively --- Round 3
    New: Hpts(h,f,g), Pts(x,h)
    Already known: Pts(a,h), Pts(b,g), Pts(x,g), Pts(b,h)
  • 72
    Apply Rules Recursively --- Round 4
    New: Hpts(h,f,h)
    Already known: Pts(a,h), Pts(b,g), Pts(x,g), Pts(b,h), Pts(x,h), Hpts(h,f,g)
  • 73
    Adding Context Sensitivity
    Include a component C = context.
    C doesn’t change within a function.
    Call and return can extend the context if the called function is not mutually recursive with the caller.
  • 74
    Example of Rules: Context Sensitive
    Pts(V,H,B,I+1,C) :- “B,I: V=W” & Pts(W,H,B,I,C)
    Pts(X,H,B0,0,D) :- Pts(V,H,B,I,C) & “B,I: call P(…,V,…)” &
        “X is the formal of P corresponding to the actual V” &
        “B0 is the entry of P” & “context D is C extended by P”