Post-compiler Software Optimization for Reducing Energy
Eric Schulte, Jonathan Dorn, et al.
Presented By: Abhishek Abhyankar
MS Computer Science Virginia Tech
08-May-15 Computer Architecture CS 5504 Spring 2015 1
Traditional Way of doing things
• Make a case for reducing energy consumption.
• Traditionally, energy optimization is handled in hardware.
• Voltage scaling, heterogeneous cores, specialized cores, and many others.
• On the software side, optimization is mainly concerned with increasing speed and
reducing the size of the compiled code.
• Extracting instruction-, thread-, and data-level parallelism.
Post Compile Software Optimization
• Handle the optimizations at the software level.
• Take the compiled code output by a standard compiler.
• How can this be achieved? One approach is:
“A Genetic Optimization Algorithm that uses concepts from evolutionary
computation to stochastically mutate the software in search of an optimal
implementation, while preserving strict functional semantics.”
Background Concepts
• Functional vs. non-functional requirements.
• Ongoing debate between the two.
• Functional requirements: adhering to specifications, correctness of the code.
• Non-functional requirements: memory utilization, energy consumption.
• Stochastic Methods
• Used heavily in Evolutionary computation.
• Randomly trying out different combinations.
Background Concepts .. continued
• Profile Guided Optimizations.
• Program is profiled by running it and gathering run time data.
• Call graph generation.
• Enforcing “nearest is the best” policy.
• Software robustness even after mutation.
• Random mutations of the software often preserve its semantics.
• Many implementations are possible that reach the same semantic goal.
Background Concepts .. continued
• Evolutionary Computation.
• Darwinian principles.
• Generally applied in black box approach.
• Steady State Algorithms.
• After each iteration, new candidates are simply inserted back into the population.
• The best among them survive; more precisely, the worst are deleted.
Genetic Optimization Algorithm (GOA)
• “A Genetic Optimization Algorithm that uses concepts from
evolutionary computation to stochastically mutate the software
in search of an optimal implementation, while preserving strict
functional semantics.”
• Takes three inputs to start:
• Benchmark applications or kernels.
• Test suites that validate each mutation.
• A fitness function.
High-level working of GOA
GOA Working .. continued
• Take the program.
• Create many random variants of the program by reordering,
deleting, and editing instructions.
• Test each new variant against the supplied test suite.
• If it passes, check for improvement in the non-functional
requirement measured by the fitness function.
• If it improves, emit the assembly code as the optimized program after applying
the minimization technique.
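The steps above can be sketched as a small steady-state search loop. This is a minimal illustration only, not the authors' implementation: the function `goa_search`, the toy "program" made of `add`/`nop` pseudo-instructions, the delete-only mutation, and the use of program length as a stand-in for energy are all assumptions made for the example. The minimization step is omitted here.

```python
import random

def goa_search(original, mutate, passes_tests, fitness,
               pop_size=16, evaluations=500, seed=0):
    """Steady-state search; lower fitness means less (modelled) energy."""
    rng = random.Random(seed)
    population = [list(original) for _ in range(pop_size)]
    scores = [fitness(original)] * pop_size
    for _ in range(evaluations):
        # Tournament of two: the fitter candidate becomes the parent.
        a, b = rng.randrange(pop_size), rng.randrange(pop_size)
        parent = population[a] if scores[a] <= scores[b] else population[b]
        child = mutate(parent, rng)
        if not passes_tests(child):
            continue  # the variant broke the test suite, so discard it
        # Insert the child, then evict the worst to keep the size constant.
        population.append(child)
        scores.append(fitness(child))
        worst = max(range(len(population)), key=scores.__getitem__)
        population.pop(worst)
        scores.pop(worst)
    best = min(range(len(population)), key=scores.__getitem__)
    return population[best]

# Toy stand-ins (all hypothetical): a "program" is a list of pseudo-instructions.
def run(program):
    """Toy semantics: the output is the sum of all 'add N' operands."""
    return sum(int(line.split()[1]) for line in program if line.startswith("add"))

def delete_one(program, rng):
    """Stand-in mutation: delete one randomly chosen instruction."""
    variant = list(program)
    if variant:
        del variant[rng.randrange(len(variant))]
    return variant

ORIGINAL = ["add 2", "nop", "add 3", "nop", "nop"]
passes = lambda program: run(program) == run(ORIGINAL)

best = goa_search(ORIGINAL, delete_one, passes, fitness=len)
print(best)
```

Every candidate that enters the population has passed the tests, so the returned variant is never worse than the original under the chosen fitness.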
Representation of Assembly code
• A very simple strategy is adopted to represent the assembly code.
• Each line gets a cell in an array.
• One line can also be broken down into multiple cells.
• The augmented instructions are avoided.
• This limits the search space.
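A minimal sketch of the array representation, assuming one cell per assembly line and three illustrative mutation operators (copy, delete, swap). The helper names `parse_asm` and `mutate` and the sample assembly are hypothetical, chosen only for this example. Each cell is treated as an atomic unit, so operands inside a line are never edited.

```python
import random

def parse_asm(text):
    """Return one array cell per non-empty assembly line."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def mutate(genome, rng):
    """Apply one random copy, delete, or swap to a copy of the genome."""
    variant = list(genome)
    if not variant:
        return variant
    i = rng.randrange(len(variant))
    j = rng.randrange(len(variant))
    op = rng.choice(["copy", "delete", "swap"])
    if op == "copy":
        variant.insert(j, variant[i])    # duplicate cell i at position j
    elif op == "delete":
        del variant[i]                   # drop cell i
    else:
        variant[i], variant[j] = variant[j], variant[i]  # exchange two cells
    return variant

ASM = """
    movl  $0, %eax
    addl  $1, %eax
    addl  $2, %eax
    ret
"""
genome = parse_asm(ASM)
variant = mutate(genome, random.Random(1))
print("\n".join(variant))
```

Because the operators only move, duplicate, or remove whole cells, a variant never contains an instruction that was absent from the original program.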
Experimental Setup and Benchmark Kernels:
• An Intel machine is used as an example of a desktop computer.
• i7, 4 cores, 8 GB RAM
• An AMD machine is used as an example of a server-scale machine.
• 48 cores, 128 GB RAM
• 8 kernels from the PARSEC benchmark suite are used.
• blackscholes, bodytrack, ferret, fluidanimate, freqmine, swaptions, vips, and
x264
• The kernels should keep the underlying architecture running for at least 1
second and produce output.
Input Test Suites
• Comprehensive test suites for each kernel.
• The smallest input size of each test suite is used.
• The tests only validate the requirements specification; stress or boundary testing is
not needed at this point.
Fitness Function
• GOA proposes a linear scalar energy model.
• Hardware counters are captured using the “perf” utility in Linux.
• The model is fine-grained and tightly coupled to the underlying architecture.
• It depends heavily on execution time.
$$\text{power} = C_{const} + C_{ins}\,\frac{ins}{cycle} + C_{flops}\,\frac{flops}{cycle} + C_{tca}\,\frac{tca}{cycle} + C_{mem}\,\frac{mem}{cycle}$$
$$\text{energy} = \text{seconds} \times \text{power}$$
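The linear model on this slide (power as a constant plus a weighted sum of per-cycle counter rates, energy as seconds times power) can be sketched as follows. The coefficient values and counter readings below are made-up placeholders, not the empirically derived constants, and the interpretation of `tca` as total cache accesses and `mem` as memory accesses (cache misses) is an assumption.

```python
from dataclasses import dataclass

@dataclass
class Counters:
    """Hardware counter totals for one run (field meanings are assumptions)."""
    seconds: float   # wall-clock run time
    cycles: float    # CPU cycles
    ins: float       # instructions executed
    flops: float     # floating-point operations
    tca: float       # cache accesses (assumed meaning)
    mem: float       # memory accesses / cache misses (assumed meaning)

# Placeholder coefficients; real values must be fitted per machine.
COEFFS = {"const": 30.0, "ins": 20.0, "flops": 10.0, "tca": 5.0, "mem": 100.0}

def power(c, k=COEFFS):
    """Modelled power: constant term plus weighted per-cycle rates."""
    return (k["const"]
            + k["ins"] * c.ins / c.cycles
            + k["flops"] * c.flops / c.cycles
            + k["tca"] * c.tca / c.cycles
            + k["mem"] * c.mem / c.cycles)

def energy(c, k=COEFFS):
    """Modelled energy: run time multiplied by modelled power."""
    return c.seconds * power(c, k)

sample = Counters(seconds=2.0, cycles=1000, ins=500, flops=100, tca=200, mem=10)
print(power(sample), energy(sample))
```

Used as a fitness function, a lower `energy` value marks a better variant; because run time multiplies the whole expression, faster variants are rewarded directly.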
Constants Derived from Empirical Study
Minimization Technique
• Iterations tend to create redundant patterns of code.
• The goal is to get the best energy efficiency with the fewest
changes.
• Delta Debugging is used to compare and remove redundant,
non-influential changes.
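A simplified sketch of how Delta Debugging can shrink a set of changes: it repeatedly tries subsets and complements of the edit list and keeps any smaller list that still preserves the improvement. The function name `ddmin`, the integer "edits", and the `keeps_improvement` predicate are hypothetical stand-ins; in GOA the predicate would apply the edits to the original program and re-check tests and fitness. The sketch assumes the edits are distinct.

```python
def ddmin(deltas, test):
    """Return a smaller list of deltas for which test() still holds.

    `test(subset)` must return True when applying only `subset`
    still yields the desired property. Deltas are assumed distinct.
    """
    n = 2  # current number of chunks
    while len(deltas) >= 2:
        chunk = max(1, len(deltas) // n)
        subsets = [deltas[i:i + chunk] for i in range(0, len(deltas), chunk)]
        reduced = False
        # First try each chunk on its own.
        for subset in subsets:
            if test(subset):
                deltas, n, reduced = subset, 2, True
                break
        # Then try removing each chunk (keeping its complement).
        if not reduced:
            for subset in subsets:
                complement = [d for d in deltas if d not in subset]
                if test(complement):
                    deltas, n, reduced = complement, max(n - 1, 2), True
                    break
        if not reduced:
            if n >= len(deltas):
                break  # already at single-delta granularity
            n = min(len(deltas), n * 2)  # refine the granularity
    return deltas

# Toy example: eight edits, of which only edits 2 and 5 matter.
edits = list(range(8))
needed = {2, 5}
keeps_improvement = lambda subset: needed <= set(subset)
minimal = ddmin(edits, keeps_improvement)
print(minimal)
```

The result is minimal in the sense that removing any single remaining edit makes the predicate fail, which matches the goal of keeping only the changes that influence the measured improvement.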
Code Example of GOA
Post processing the optimized code
• Execute the original code with the held-out test suite.
• Obtain real wall-socket energy measurements.
• Execute the optimized code with the held-out test suite.
• Obtain real wall-socket energy measurements.
• Compare the two results to find which patterns produced
improvements and the percentage improvement in energy consumption.
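The comparison step amounts to simple arithmetic on the two sets of measurements. The sketch below averages repeated readings and reports the relative reduction; the function name and the joule values are hypothetical and do not come from the study.

```python
from statistics import mean

def energy_reduction_percent(original_joules, optimized_joules):
    """Percentage of energy saved, using the mean of repeated readings."""
    base = mean(original_joules)
    return 100.0 * (base - mean(optimized_joules)) / base

# Made-up wall-socket readings (joules) for three runs of each binary.
original_runs = [100.0, 102.0, 98.0]
optimized_runs = [80.0, 81.0, 79.0]
saving = energy_reduction_percent(original_runs, optimized_runs)
print(f"{saving:.1f}% less energy")
```

A negative result would indicate that the "optimized" binary consumed more energy on the held-out inputs than the original.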
Results
• In the blackscholes kernel, GOA caught the induced repetition loop and
found a way around it.
• In the swaptions kernel, GOA gave a 42% energy saving.
• This has to be taken with a pinch of salt, though.
• In the vips kernel, cache misses actually increased, but the number of instructions
decreased, and hence a 20% improvement was observed.
Interesting Observations
• A 7% average error is found in most prediction models, and likewise in GOA's model.
• GOA still works well despite it.
• Empirical studies show that GOA might be better suited to finding
efficient sequences of assembly instructions than efficient memory
access patterns.
• The energy reduction percentage is consistently higher on the AMD machine.
• This is mainly because the bigger machine offers more opportunities.
QoS dependent Optimization
• “Relaxed” preservation of semantics, with more emphasis on QoS.
• The plug-and-play test suite policy gives the developer the option of
making GOA strict or loose on semantics.
• Relaxed functional requirements allow much greater energy
savings, but the developer takes on the risk of ensuring that the program's
semantics do not break.
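One way to picture the strict versus relaxed choice is as two interchangeable test oracles. In this sketch, which is an illustration rather than anything prescribed by GOA, the strict oracle demands identical output and the relaxed oracle accepts numeric output within a relative tolerance. The function names, the 1% tolerance, and the sample values are all assumptions.

```python
import math

def strict_test(expected, actual):
    """Accept a variant only if its output matches exactly."""
    return expected == actual

def relaxed_test(expected, actual, rel_tol=0.01):
    """Accept a variant if every output value is within rel_tol of the reference."""
    return len(expected) == len(actual) and all(
        math.isclose(e, a, rel_tol=rel_tol) for e, a in zip(expected, actual)
    )

reference_output = [1.000, 2.500, 4.000]
variant_output = [1.005, 2.490, 4.000]
print(strict_test(reference_output, variant_output))   # False
print(relaxed_test(reference_output, variant_output))  # True
```

Loosening `rel_tol` admits more variants into the search, which is where the extra savings and the extra risk both come from.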
Key contributions
• Genetic Optimization Algorithm (GOA) combines insights from profile-
guided optimization, superoptimization, evolutionary computation
and mutational robustness.
• This technique gave 20% average energy savings across all
benchmarks.
• Very simple, and mostly leverages already available techniques.
Drawbacks of GOA
• Energy constants are derived empirically over repeated runs on specific
hardware.
• Introducing GOA on a new architecture will take a considerable amount of work.
• The non-deterministic approach makes it almost impossible to restore an
earlier code path after the software is changed even slightly.
• The code paths must be indexed and remembered.
• Very high-quality test suites are “required”.
• Failure to provide them might result in over-optimized code that only appears to work.
Proposed Future Work
• Currently only applied to x86.
• A matrix implementation is proposed as a solution to this problem.
• Indirect selection can optimize one parameter at the cost of
worsening another.
• Should be generalized to Java bytecode and ARM.
• Instead of using one compiler that takes a predefined “agreed” path, the code
should be compiled with multiple compilers using multiple paths, and
then the best result should be selected.
Questions / Discussion
