- 1. Code Optimization Consideration for Optimization, Scope of Optimization, Optimization Techniques, Flow graph 1
- 2. • Compiler front-end: lexical analysis, syntax analysis, semantic analysis Tasks: understanding the source code, making sure the source code is written correctly • Compiler back-end: Intermediate code generation/improvement, and Machine code generation/improvement Tasks: translating the program to a semantically the same program (in a different language). Design Of a Compiler 2
- 3. What is Code Optimization? • Optimization is a program transformation technique, which tries to improve the code that consume less resources (i.e. CPU, Memory) and deliver high speed. • In optimization, high-level general programming constructs are replaced by very efficient low- level programming codes. A code optimizing process must follow the three rules given below: 1. The output code must not change the meaning of the program in any way. Should not change the output produced for any input Should not introduce an error 2. Optimization should increase the speed of the program and if possible, the program should demand less resources. 3. Optimization should itself be fast and should not delay the overall compiling process. 3
- 4. • Source Code: -Algorithms transformation can produce spectacular improvements • Intermediate Code: -Compiler can improve loops, procedure calls and address calculations -Typically only optimizing compilers include this phase • Target Code: - Compilers can use registers efficiently Optimized code’s features: • Executes faster • Code size get reduced • Efficient memory usage • Yielding better performance • Reduces the time and space complexity 4 Improvements can be made at various phases:
- 5. Organization of an optimizing compiler 5 Control flow analysis Data flow analysis Transformation Code optimizer Intermediate Code Generator Target Code Generator
- 6. • Flow analysis is a fundamental prerequisite for many important types of code improvement. • Generally control flow analysis precedes data flow analysis. • Control flow analysis (CFA) represents flow of control usually in form of graphs, CFA constructs such as • control flow graph -graphical representation of control flow or computation during the execution. • Call graph - represents calling relationships between subroutines. • Data flow analysis (DFA) is the process of asserting and collecting information prior to program execution about the possible modification, preservation, and use of certain entities (such as values or attributes of variables) in a computer program. 6 Flow analysis - Organization of an optimizing compiler
- 7. Basic Blocks Basic blocks • Basic blocks are sequences of intermediate code in which flow of control enters at the beginning and leaves at the end without any halt or possibility of branching except at the end. Example • Sequence of three-address statements forms a basic block: t1 : = a * a t2 : = a * b t3 : = 2 * t2 t4 : = t1 + t3 t5 : = b * b t6 : = t4 + t5 • Basic blocks are represented as directed acyclic blocks(DAGs), which are in turn represented using the value- numbering method applied on quadruples • Optimization is done on these basic blocks
- 8. A basic block begins in one of the following ways: • the entry point into the function. • the target of a branch (can be a label) • the instruction immediately following a branch or a return A basic block ends in any of the following ways : • a jump statement • a conditional or unconditional branch • a return statement
- 9. Basic Block Representation • A basic block is represented as a record consisting of 1. a count of the number of quadruples in the block 2. a pointer to the leader of the block which is 1. first statement of basic blocks 2. Any statement that is the target of a conditional or unconditional goto is a leader. 3. Any statement that immediately follows a goto or conditional goto statement is a leader. 3. pointers to the predecessors of the block – Previous block of current block 4. pointers to the successors of the block – Next block of current block • Note that jump statements point to basic blocks and not quadruples so as to make code movement easy 9
- 10. Basic Block Example { prod =0; i =1; do { prod =prod+ a[i] * b[i]; i =i+1; } while i < = 20 } 10 • Source code for dot product of two vectors a and b of length 20 prod = 0 i = 1 L1: t1 = 4* i t2 = a[t1] /*compute a[i] */ t3 = 4* i t4 = b[t3] /*compute b[i] */ t5 = t2*t4 t6 = prod+t5 prod = t6 t7 = i+1 i = t7 if i<=20 goto L1 • Three Address code of source code
- 11. CONTROL FLOW GRAPH (CFG) • A graph representation of three-address statements, called a flow graph. • The nodes of the CFG are basic blocks • Nodes in the flow graph represent computations, and the edges represent the flow of control. • Graph showing the different possible paths of program flow. • CFG is constructed by dividing the code into basic blocks • Flow graphs show control flow among basic blocks. • It is useful for understanding code-generation algorithms, even if the graph is not explicitly constructed by a code-generation algorithm.
- 12. 12 L1: t1 = 4* i t2 = a[t1] /*compute a[i] */ t3 = 4* i t4 = b[t3] /*compute b[i] */ t5 = t2*t4 t6 = prod+t5 prod = t6 t7 = i+1 i = t7 if i<=20 goto L1 prod = 0 i = 1 Block B1 Block B2 • One node is distinguished as the initial node (B1). • There is a directed edge B1 → B2, if B2 can immediately follow B1 in some execution sequence; i.e., • There is a conditional or unconditional jump from the last statement of B1 to the first statement of B2, or • B2 immediately follows B1. so there is an edge from B1 (last statement) to B2 (first statement), and B1 does not end in an unconditional jump. • B1 is the predecessor of B2, and B2 is a successor of B1. CONTROL FLOW GRAPH (CFG)
- 13. Global Data Flow Analysis • Collect information about the whole program. • Distribute the information to each block in the flow graph. • Data flow information: Information collected by data flow analysis. • Data flow equations: A set of equations solved by data flow analysis to gather data flow information. • Data flow analysis should never tell us that a transformation is safe. • When doing data flow analysis we must be • Conservative • Do not consider information that may not preserve the behavior of the program • Aggressive • Try to collect information that is as exact as possible, so we can get the greatest benefit from the optimizations. 13
- 14. Global Iterative Data Flow Analysis • Global: • Performed on the flow graph • Goal is to collect information at the beginning and end of each basic block • Iterative: • Construct data flow equations that describe how information flows through each basic block and solve them by iteratively converging on a solution. • Components of data flow equations • Sets containing collected information • in set: information coming into the BB from outside (following flow of data) • gen set: information generated/collected within the BB • kill set: information that, due to action within the BB, will affect what has been collected outside the BB • out set: information leaving the BB • Functions (operations on these sets) • Transfer functions describe how information changes as it flows through a basic block • Meet functions describe how information from multiple paths is combined. 14
- 15. • A Code optimizer locates between the intermediate code generator and code generator. – Can do control flow analysis. – Can do data flow analysis. – Does transformations to improve the intermediate code. 15 The Golden Rules of Optimization The 80/20 Rule In general, 80% percent of a program’s execution time is spent executing 20% of the code
- 16. Major Classifications of Code Optimization techniques Machine Dependent Optimization Machine dependent optimizations are based on register allocation and utilization of special machine-instruction sequences. It involves CPU registers and may have absolute memory references rather than relative references. Peephole optimization Register Allocation and Instruction Selection (Special Hardware features) Machine Independent Optimization It is a program transformation that improve the target code without taking into consideration any properties (CPU registers and/or absolute memory locations) of the target machine. Local optimizations Global Optimizations – Data flow analysis Inter-procedural optimizations Loop optimization 16
- 17. Scope Of Optimization • Peephole analysis • Within one or a few instructions • Local analysis • Within a basic block • Global analysis • Entire procedure or within a certain scope • Inter-procedural analysis • Beyond a procedure, consider the entire program 17
- 18. Classification of optimization There are primarily 3 types of optimizations: (1) Local optimization • Apply to a basic block in isolation (2) Global optimization • Apply across basic blocks (3) peep-hole optimization • Apply across boundaries Most compilers do (1), many do (2) and very few do (3) 18
- 19. Local optimization • Optimization performed within a basic block. • This is simplest form of optimizations • No need to analyze the whole procedure body. • Just analyze the basic blocks of the procedure. The local optimization techniques include: • Constant Folding • Constant Propagation • Algebraic Simplification • Operator Strength Reduction • Copy Propagation • Dead Code Elimination 19
- 20. Constant Folding • Evaluate constant expressions at compile time. 20 c:= 1 + 3 c:= 4 !true false Example: In the code fragment below, the expression (3 + 5) can be evaluated at compile time and replaced with the constant 8. int f () { return 3 + 5; } Below is the code fragment after constant folding. int f () { return 8; } Expressions with constant operands can be evaluated at compile time, thus improving run-time performance and reducing code size by avoiding evaluation at run-time.
- 21. Constant Propagation • Variables that have constant value, e.g. b := 3 • Later uses of b can be replaced by the constant, if no change of b in between. 21 Example: • In the code fragment below, the value of x can be propagated to the use of x. x = 3; y = x + 4; • Below is the code fragment after constant propagation and constant folding. x = 3; y = 7; b := 3 c := 1 + b d := b + c b := 3 c := 1 + 3 d := 3 + c Constants assigned to a variable can be propagated through the flow graph and substituted at the use of the variable.
- 22. Algebraic Simplification • Use algebraic properties to simplify expressions • Some expressions can be simplified by replacing them with an equivalent expression that is more efficient. 22 -(-i) i Example: The code fragment below contains expressions that can be simplified. void f (int i) { a[0] = i + 0; a[1] = i * 0; a[2] = i - i; a[3] = 1 + i + 1; } Below is the code fragment after expression simplification. void f (int i) { a[0] = i; a[1] = 0; a[2] = 0; a[3] = 2 + i; }
- 23. Operator Strength Reduction • Replace expensive operations with simpler ones • Typical cases of strength reduction occurs in address calculation of array references. • Example: Multiplications replaced by additions , 23 y := x * 2 y := x + x Replacement of an operator with a less costly one. temp = 5; for i=1 to 10 for i=1 to 10 { { … … x = i * 5 x = temp … … temp = temp + 5 } }
- 24. Copy Propagation • Given an assignment x = y, replace later uses of x with uses of y, provided there are no intervening assignments to x or y. • Example x[i] = a; x[i] = a; sum = x[i] + a; sum = a + a; • Example x := y; s := y * f(y) s := x * f(x) 24 After y is assigned to x, use y to replace x till x is assigned again reduce the copying. If y is reassigned in between, then this action cannot be performed.
- 25. Dead Code Elimination • Dead Code are portion of the program which will not be executed in any path of the program. It can be removed • Examples: • No control flows into a basic block • A variable is dead at a point (i.e) its value is not used anywhere in the program • An assignment is dead (i.e) assignment assigns a value to a dead variable • Ineffective statements: x := y + 1 (x is immediately redefined in 3rd line without use, therefore eliminate) y := 5 y := 5 x := 2 * z x := 2 * z 25 • A variable is dead if it is never used after last definition • Eliminate assignments to dead variables • Need to do data flow analysis to find dead variables
- 26. • Remove code never reached 26 if (false) {a := 5} if (false) {} Dead Code Elimination
- 27. Other types of Local Optimizations • The following two optimizations can be applied only on DAG or tree representation of basic block • Reordering statements that do not depend on one another • Reordering computations using algebraic laws Optimization of Basic Blocks • Many structure preserving transformations can be implemented by construction of DAGs of basic blocks 27
- 28. Example of a Directed Acyclic Graph (DAG) 28
- 29. DAG representation of Basic Block (BB) • Leaves are labeled with unique identifier (variable name or constant) • Interior nodes are labeled by an operator symbol • Nodes optionally have a list of labels (identifiers) • Edges relates operands to the operator (interior nodes are operator) • Interior node represents computed value • Identifier in the label are deemed to hold the value 29
- 30. Example: DAG for BB 30 t1 := 4 * i t1 * i 4 t1 := 4 * i t3 := 4 * i t2 := t1 + t3 * i 4 + t1, t3 t2 if (i <= 20)goto L1 <= i 20 (L1)
- 31. Construction of DAGs for BB • Input: Basic block, B • Output: A DAG for B containing the following information: 1) A label for each node 2) For leaves the labels are identifiers or constants 3) For interior nodes the labels are operators Data structure and functions: • Node: 1) Label: label of the node 2) Left: pointer to the left child node 3) Right: pointer to the right child node 4) List: list of additional labels (empty for leaves) • Node (id): returns the most recent node created for id. else return undef • Create(id,l,r): create a node with label id with l as left child and r as right child. l and r are optional parameters. 31
- 32. Algorithm: For each 3AC, A in BB if A is any of the following forms: 1. x = y op z 2. x = op y 3. x = y 1. if ((ny = node(y)) == undef) ny = Create (y); if (A = = type 1) and ((nz = node(z)) == undef) nz = Create(z); 32 Construction of DAGs for BB 2. If (A = = type 1) //x = y op z Find a node labelled ‘op’ with left and right as ny and nz respectively [determination of common sub-expression] If (not found) n = Create (op, ny, nz); If (A = = type 2) //x = op y Find a node labelled ‘op’ with a single child as ny If (not found) n = Create (op, ny); If (A = = type 3) n = Node (y); //x = y 2. Remove x from Node(x).list Add x in n.list Node(x) = n;
- 33. Example: DAG construction from BB t1 := 4 * i 33 * i 4 t1
- 34. Example: DAG construction from BB 34 t1 := 4 * i t2 := a [ t1 ] * i 4 t1 [] a t2
- 35. Example: DAG construction from BB 35 t1 := 4 * i t2 := a [ t1 ] t3 := 4 * i * i 4 t1, t3 [] a t2
- 36. Example: DAG construction from BB 36 t1 := 4 * i t2 := a [ t1 ] t3 := 4 * i t4 := b [ t3 ] * i 4 t1, t3 [] a t2 [] b t4
- 37. Example: DAG construction from BB 37 t1 := 4 * i t2 := a [ t1 ] t3 := 4 * i t4 := b [ t3 ] t5 := t2 + t4 * i 4 t1, t3 [] a t2 [] b t4 t5 +
- 38. Example: DAG construction from BB 38 t1 := 4 * i t2 := a [ t1 ] t3 := 4 * i t4 := b [ t3 ] t5 := t2 + t4 i := t5 * i 4 t1, t3 [] a t2 [] b t4 t5,i + • Observations: • A leaf node for the initial value of an id • A node n for each statement s • The children of node n are the last definition (prior to s) of the operands of n
- 39. Optimization of Basic Blocks • Common sub-expression elimination: by construction of DAG • Note: for common sub-expression elimination, we are actually targeting for expressions that compute the same value. 39 a = b + c b = b – d c = c + d e = b + c Common expressions (But do not generate the same result) • DAG representation identifies expressions that yield the same result a = b + c b = b – d c = c + d e = b + c b0 c0 d0 + + + - a b c e
- 40. • Dead code elimination: Code generation from DAG eliminates dead code. 40 a := b + c b := a – d d := a – d c := d + c b is not live c a := b + c d := a - d c := d + c b0 c0 d0 + - + a b, d × Optimization of Basic Blocks
- 41. Global Optimization • Optimization across basic blocks within a procedure/function • Could be restricted to a smaller scope, Example: a loop • Data-flow analysis is done to perform optimization across basic blocks • Each basic block is a node in the flow graph of the program. • These optimizations can be extended to an entire control - flow graph • Most of compiler implement global optimizations with well founded theory and practical gains 41
- 42. Interprocedural Optimizations • Spans multiple procedures, files • Not as popular as global optimizations • No single theory applicable to multiple scenarios • Time consuming 42
- 43. • Optimization technique that operates on the one or few instructions at a time. • Performs machine dependent improvements • Peeps into a single or sequence of two to three instructions (peephole) and replaces it by most efficient alternative (shorter or faster) instructions. • Peephole is a small moving window on the target systems • Characteristics of peep-hole optimizations: Redundant-instruction (loads and stores)elimination Flow-of-control optimizations - - Elimination of multiple jumps Elimination of unreachable code Algebraic simplifications Reducing operator strength Use of machine idioms 43 Peep-hole optimization
- 44. Eliminating Redundant Loads and Stores • If the target code contains the instruction sequence: • Instruction 2 can always be removed if it does not have a label. • If it is labeled, there is no guarantee that step 1 will always be executed before step 2. 44 MOV R, a MOV a, R goto L1 ... ... MOV R, a L1: MOV a, R
- 45. Eliminating Multiple Jumps • If we have jumps to other jumps, then the unnecessary jumps can be eliminated in either intermediate code or the target code. • Example: If we have a jump sequence: 45 if a < b goto L1 ... L1: goto L2 can be replaced by: if a < b goto L2 ... L1: goto L2 If there are no jumps to L1, then it may be possible to eliminate the statement, provided it is preceded by an unconditional jump.
- 46. Eliminating Unreachable Code • An unlabeled instruction that immediately follows an unconditional jump can possibly be removed, and this operation can be repeated in order to eliminate a sequence of instructions. 46 int debug = 0 if (debug) { print debugging information } this may be translated as if debug = = 1 goto L1 goto L2 L1: print debugging information L2: The statements that print the debugging information are unreachable and can be eliminated if debug != 1 goto L2 print debugging information L2:
- 47. Strength reduction • Example: • Replace X^2 computations by X*X • Replace multiplication by left shift • Replace division by right shift 47
- 48. Using Machine Idioms • The target machine may have hardware instructions to implement certain specific operations efficiently. • Detecting situations that permit the use of these instructions can reduce execution time significantly. • For example, some machines have auto-increment and auto-decrement addressing modes. • Using these addressing modes can greatly improve the quality of the code when pushing or popping a stack. • These modes can also be used for implementing statements like a = a + 1. 48 replace Add #1,R by Inc R
- 49. Other types of Code optimization 1. Function preserving Transformations Common Sub-expression Removal Copy propagation, Dead-code elimination, Constant folding 2. Loop Optimization 3. Two types of basic block optimizations I. Structure-Preserving Transformations • Common sub-expression elimination • Dead code elimination • Renaming of temporary variables • Interchange of two independent adjacent statements. II. Algebraic Transformations 49
- 50. Common Sub expression elimination • Common Sub expression elimination is a optimization that searches for instances of identical expressions (i.e. they all evaluate the same value), and • Analyses whether it is worthwhile replacing with a single variable holding the computed value. 50 Identify common sub-expression present in different expression, compute once, and use the result in all the places. Example: a := b * c temp := b * c … a := temp … … x := b * c + 5 x := temp + 5
- 51. Common Sub-expression elimination • Common sub-expression elimination • Example 1: a := b + c a := b + c c := b + c c := a d := b + c d := a • Example 2: in array index calculations • c[i+1] := a[i+1] + b[i+1] • During address computation, i+1 should be reused • Not visible in high level code, but in intermediate code
- 52. 52 Common Sub-expression evaluation z : = a + b + 10 a : = b 1 2 3 4 “a + b” is not a common sub-expression in 1 and 4 x : = a + b
- 53. Dead code Optimization: • Dead Code elimination removes code that does not affect a program. • Removing such code has two benefits. • It shrinks program size. • It avoids the executing irrelevant operations, which reduces its running time. • Two types of Dead Code elimination • Unreachable Code • Redundant statement 53
- 54. Unreachable Code - Dead code Optimization • In Computer Programming, Unreachable Code or dead code is code that exists in the source code of a program but can never be executed. Program Code if (a>b) m=a elseif (a<b) m=b elseif (a==b) m=0 else m=-1 Optimized Code if (a>b) m=a elseif (a<b) m=b else m=0 54
- 55. Redundant Code - Dead code Optimization • Redundant Code is code that is executed but has no effect on the output from a program main() { int a, b, c, r; a=5; b=6; c=a + b; r=2; r++; printf(“ %d” ,c); } Adding time & space complexity 55
- 56. Loop optimization • Loop optimization plays an important role in improving the performance of the source code by reducing overheads associated with executing loops. • The inner loops where programs tend to spend the bulk of their time. • The running time of a program may be improved if we decrease the number of instructions in an inner loop, even if we increase the amount of code outside that loop. • Loop Optimization techniques: • Code Motion - moves code outside a loop • Induction variables elimination - replace variables from inner loop • Reduction in strength - replaces expensive operation by a cheaper one, such as a multiplication by an addition 56
- 57. Code Motion - Loop Optimization • Example - Computation can be moved to outside of the loop i = 1 s= 0 do{ s= s +i a =5 i = i + 1 } while (i < =n) i = 1 s= 0 a =5 do{ s= s + i i = i + 1 } while (i < =n) Bringing a=5 outside the do while loop, is called code motion. 57
- 58. • Example for (i=0; i<n; i++) a[i] = a[i] + x/y; • Three address code for (i=0; i<n; i++) { c = x/y; a[i] = a[i] + c; } c = x/y; for (i=0; i<n; i++) a[i] = a[i] + c; 58 Code Motion - Loop Optimization
- 59. Code hoisting - Loop Optimization • Code Space reduction: Similar to common sub-expression elimination but with the objective to reduce code size. 59 Example: Code hoisting temp : = x ** 2 if (a< b) then if (a< b) then z := x ** 2 z := temp else else y := x ** 2 + 10 y := temp + 10 “x ** 2“ is computed once in both cases, but the code size in the second case reduces.
- 60. Induction variable elimination • If there are multiple induction variables in a loop, can eliminate the ones which are used only in the test condition 60 The code fragment below has three induction variables (i1, i2, and i3) that can be replaced with one induction variable int a[SIZE]; int b[SIZE]; void f (void) { int i1, i2, i3; for (i1 = 0, i2 = 0, i3 = 0; i1 < SIZE; i1++) a[i2++] = b[i3++]; return; } The code fragment below shows the loop after induction variable elimination. int a[SIZE]; int b[SIZE]; void f (void) { int i1; for (i1 = 0; i1 < SIZE; i1++) a[i1] = b[i1]; return; }
- 61. • Example s := 0; for (i=0; i<n; i++) { s := 4 * i; … } 61 Induction variable elimination s := 0; e := 4*n; while (s < e) { s := s + 4; } // i is not referenced in loop
- 62. Loop Fusion - Loop Optimization • Example for (i=0; i<n; i++) { A[i] = B[i] + 1 } for (i=0; i<n; i++) { C[i] = A[i] / 2 } for (i=0; i<n; i++) { D[i] = 1 / C[i+1] } Before Loop Fusion for (i=0; i<n; i++) { A[i] = B[i] + 1 C[i] = A[i] / 2 D[i] = 1 / C[i+1] } Is this correct? Actually, cannot fuse the third loop for (i=0; i<n; i++) { A[i] = B[i] + 1 C[i] = A[i] / 2 } for (i=0; i<n; i++) { D[i] = 1 / C[i+1] }
- 63. Loop unrolling or Loop collapsing - Loop Optimization • Execute loop body multiple times at each iteration • Try to get rid of the conditional branches, if possible • Allow optimization to cross multiple iterations of the loop • Especially for parallel instruction execution 63
- 64. Loop unrolling or Loop collapsing - Loop Optimization 64 Example: In the code fragment below, the double-nested loop on i and j can be collapsed into a single-nested loop. int a[100][300]; for (i = 0; i < 300; i++) for (j = 0; j < 100; j++) a[j][i] = 0; Here is the code fragment after the loop has been collapsed. int a[100][300]; int *p = &a[0][0]; for (i = 0; i < 30000; i++) *p++ = 0;
- 65. Renaming temporary variables • A statement t : = b + c ( t is a temporary ) can be changed to u : = b + c (u is a new temporary) and • All uses of this instance of t can be changed to u without changing the value of the basic block. • Such a block is called a normal-form block. 65 Interchange of two independent adjacent statements • Suppose a block has the following two adjacent statements: t1 : = b + c t2 : = x + y • We can interchange the two statements without affecting the value of the block if and only if neither x nor y is t1 and neither b nor c is t2.
- 66. Reference • A.V. Aho, M.S. Lam, R. Sethi, J. D. Ullman, Compilers Principles, Techniques and Tools, Pearson Edition, 2013. P. Kuppusamy - Lexical Analyzer