This document proposes a genetic optimization algorithm (GOA) to optimize software at the post-compiler level to reduce energy consumption. GOA stochastically mutates compiled code while preserving functionality to find lower energy implementations. It takes compiled code, test suites, and an energy model as inputs. GOA generates variants, tests them, and selects lower energy ones using the model. Results showed up to 42% energy savings across benchmarks with some loss of optimization accuracy for specific hardware. Future work aims to generalize GOA to more platforms and compilers.
1. Post-compiler Software Optimization for Reducing Energy
Eric Schulte, Jonathan Dorn, et al.
Presented By: Abhishek Abhyankar
MS Computer Science Virginia Tech
08-May-15 Computer Architecture CS 5504 Spring 2015 1
2. Traditional Way of doing things
• Make a case for reduction in energy consumption.
• Traditionally, energy optimization is handled in hardware.
• Voltage scaling, heterogeneous cores, specialized cores, and many others.
• On the software side, optimization is mainly concerned with increasing speed and
reducing the size of the compiled code.
• Extracting instruction-, thread-, and data-level parallelism.
3. Post Compile Software Optimization
• Handle the optimizations at the software level.
• Take the compiled code output from a standard compiler.
• How can this be achieved? One approach is:
“A Genetic Optimization Algorithm that uses concepts from evolutionary
computation to stochastically mutate the software in search of an optimal
implementation, while preserving strict functional semantics.”
4. Background Concepts
• Functional vs. Non-Functional Requirements.
• Ongoing debate between:
• Functional requirements: adherence to specifications, correctness of the code.
• Non-functional requirements: memory utilization, energy consumption.
• Stochastic Methods
• Used heavily in Evolutionary computation.
• Randomly trying out different combinations.
5. Background Concepts .. continued
• Profile Guided Optimizations.
• The program is profiled by running it and gathering runtime data.
• Call graph generation.
• Enforcing “nearest is the best” policy.
• Software robustness even after mutation.
• Many random mutations of the software preserve its semantic meaning.
• Many implementations are possible that reach the same semantic goal.
6. Background Concepts .. continued
• Evolutionary Computation.
• Darwinian principles.
• Generally applied in black box approach.
• Steady State Algorithms.
• After each iteration, candidates are simply inserted back into the population.
• The best among them is selected, or rather the worst is deleted.
7. Genetic Optimization Algorithm (GOA)
• “A Genetic Optimization Algorithm that uses concepts from
evolutionary computation to stochastically mutate the software
in search of an optimal implementation, while preserving strict
functional semantics.”
• Takes in three inputs to start.
• Benchmark Applications or Kernels.
• Test Suites which validate the mutation.
• Fitness Function
9. GOA Working .. continued
• Take the program.
• Create many random variants of the program by changing the order of
the instructions, deleting some, and editing others.
• Test each new variant against the submitted test suites.
• If it passes, check for improvement in the non-functional
requirements (the fitness function).
• If it improves, emit the assembly code as the optimized program after applying
the minimization technique.
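The steps above can be sketched as a steady-state search loop. This is a minimal sketch under stated assumptions, not the authors' implementation: `mutate`, `passes_tests`, and `energy` are hypothetical callbacks standing in for the mutation operators, the test suite, and the energy model, and the population size, tournament size, and evaluation budget are illustrative values. Crossover and the final minimization step are left out.

```python
import random

def goa_search(original, mutate, passes_tests, energy,
               pop_size=16, evaluations=200, tournament=2, seed=0):
    """Steady-state search for a lower-energy variant of `original`.

    All callbacks and default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Each population entry is a (fitness, program) pair; lower fitness is better.
    base = energy(original)
    population = [(base, list(original)) for _ in range(pop_size)]

    for _ in range(evaluations):
        # Pick a parent: the lowest-energy member among a few random picks.
        _, parent = min(rng.sample(population, tournament), key=lambda p: p[0])
        child = mutate(list(parent), rng)
        # Variants that fail the test suite get infinite energy.
        fitness = energy(child) if passes_tests(child) else float("inf")
        # Steady state: insert the candidate, then delete the worst of a few picks.
        population.append((fitness, child))
        worst = max(rng.sample(range(len(population)), tournament),
                    key=lambda i: population[i][0])
        population.pop(worst)

    return min(population, key=lambda p: p[0])[1]

def delete_one(program, rng):
    """Toy mutation: drop one randomly chosen element."""
    if len(program) > 1:
        del program[rng.randrange(len(program))]
    return program

# Toy stand-ins: the "program" is a list of numbers, the "test suite" checks
# that the sum is preserved, and "energy" is simply the program length.
toy_original = [3, 0, 4, 0, 0, 5]
toy_best = goa_search(toy_original, delete_one, lambda p: sum(p) == 12, len)
```

Because the worst of a random pair is evicted, the best fitness in the population never gets worse, so the returned variant passes the tests and costs no more than the original under the model.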
10. Representation of Assembly code
• A very simple strategy is adopted to represent the assembly code.
• Each line occupies one cell in an array.
• A single line can also be broken down into multiple cells.
• The Augmented instructions are avoided.
• Limits the search space.
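The array representation can be illustrated with a few lines of Python. This is a sketch under assumptions: the operator set (`copy_stmt`, `delete_stmt`, `swap_stmts`) and the sample listing are hypothetical, chosen to match the reordering, deleting, and editing mentioned earlier in the slides. In this sketch, statements are moved whole and their operands are never rewritten.

```python
import random

def parse_asm(text):
    """Represent an assembly listing as a flat list, one statement per cell."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def delete_stmt(program, rng):
    """Return a variant with one randomly chosen statement removed."""
    variant = list(program)
    del variant[rng.randrange(len(variant))]
    return variant

def swap_stmts(program, rng):
    """Return a variant with two randomly chosen statements exchanged."""
    variant = list(program)
    i, j = rng.randrange(len(variant)), rng.randrange(len(variant))
    variant[i], variant[j] = variant[j], variant[i]
    return variant

def copy_stmt(program, rng):
    """Return a variant with one statement duplicated at a random position."""
    variant = list(program)
    source = rng.randrange(len(variant))
    variant.insert(rng.randrange(len(variant) + 1), variant[source])
    return variant

# Hypothetical three-statement listing used only for illustration.
SAMPLE_ASM = """
    movl  $1, %eax
    addl  $2, %eax
    ret
"""
sample_program = parse_asm(SAMPLE_ASM)
```

Each operator returns a new list and leaves its input untouched, so a failed variant can be discarded without affecting its parent.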
11. Experimental Setup and Benchmark Kernels:
• An Intel machine is used as an example of a desktop computer.
• Core i7, 4 cores, 8 GB RAM
• An AMD machine is used as an example of a server-scale machine.
• 48 cores, 128 GB RAM
• 8 kernels from the PARSEC benchmark suite are used.
• blackscholes, bodytrack, ferret, fluidanimate, freqmine, swaptions, vips, and x264
• Each must keep the underlying architecture running for at least 1
second and produce output.
12. Input Test Suites
• Comprehensive test suites for each kernel.
• The smallest input size of the test suite is used.
• The tests only validate the requirements specification; stress or boundary testing
is not needed at this point.
13. Fitness Function
• GOA proposes a linear scalar energy model
• Hardware counters are captured using the “perf” utility on Linux.
• The model is fine-grained and tightly coupled to the underlying architecture.
• It depends heavily on execution time.
$$\text{power} = C_{const} + C_{ins}\,\frac{ins}{cycle} + C_{fpos}\,\frac{fpos}{cycle} + C_{tca}\,\frac{tca}{cycle} + C_{mem}\,\frac{mem}{cycle}$$
$$\text{energy} = \text{seconds} \times \text{power}$$
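The linear model can be evaluated directly from counter values. The sketch below assumes the counters arrive as a dictionary keyed by the same names used in the formula. The coefficient values are made-up placeholders, not the empirically derived constants, which are specific to each machine.

```python
def predicted_energy(counters, coeffs):
    """Evaluate the linear model: energy = seconds * power.

    `counters` holds raw event counts plus elapsed seconds; `coeffs` holds
    the per-machine constants. Both layouts are assumptions of this sketch.
    """
    cycles = counters["cycles"]
    power = coeffs["const"]
    # Each remaining term is a coefficient times an events-per-cycle rate.
    for name in ("ins", "fpos", "tca", "mem"):
        power += coeffs[name] * counters[name] / cycles
    return counters["seconds"] * power

# Placeholder coefficients and counter values, chosen only to show the arithmetic.
example_coeffs = {"const": 10.0, "ins": 2.0, "fpos": 4.0, "tca": 1.0, "mem": 100.0}
example_counters = {"cycles": 1000, "ins": 500, "fpos": 250, "tca": 100,
                    "mem": 10, "seconds": 2.0}
example_energy = predicted_energy(example_counters, example_coeffs)
```

With these placeholder numbers the power term is 10 + 1 + 1 + 0.1 + 1 = 13.1, so the predicted energy over 2 seconds is 26.2.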
14. Constants Derived from Empirical Study
15. Minimization Technique
• Iterations tend to create redundant patterns of code.
• The goal is to get the best energy efficiency with the fewest
changes.
• Delta debugging is used to compare and remove redundant,
non-influential changes.
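A minimal sketch of this minimization, written as the classic ddmin form of delta debugging over a list of edits. The predicate `still_improved` is hypothetical: in practice it would apply the given subset of edits to the original program, then check that the tests pass and the energy gain is kept.

```python
def minimize_edits(edits, still_improved):
    """Shrink `edits` to a smaller list for which `still_improved` holds.

    Assumes `still_improved(edits)` is True for the full input list.
    """
    granularity = 2
    while len(edits) >= 2:
        chunk = -(-len(edits) // granularity)  # ceiling division
        subsets = [edits[i:i + chunk] for i in range(0, len(edits), chunk)]
        reduced = False
        # First try each chunk on its own.
        for subset in subsets:
            if still_improved(subset):
                edits, granularity, reduced = subset, 2, True
                break
        # Then try dropping one chunk at a time.
        if not reduced:
            for skip in range(len(subsets)):
                rest = [e for k, s in enumerate(subsets) if k != skip for e in s]
                if still_improved(rest):
                    edits, granularity, reduced = rest, max(granularity - 1, 2), True
                    break
        if not reduced:
            if granularity >= len(edits):
                break  # already testing single edits; nothing more to remove
            granularity = min(len(edits), granularity * 2)
    return edits

# Toy example: only edits 2 and 5 matter for the improvement.
needed = {2, 5}
minimal = minimize_edits(list(range(8)), lambda subset: needed <= set(subset))
```

In the toy example the eight candidate edits shrink to the two that actually carry the improvement.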
16. Code Example of GOA
17. Post processing the optimized code
• Execute the original code with the held-out test suite.
• Obtain real wall-socket measurements.
• Execute the optimized code with the held-out test suite.
• Obtain real wall-socket measurements.
• Compare the two results to find which patterns produced
improvements and the percentage improvement in energy consumption.
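The comparison in the last step comes down to a percentage reduction between two sets of measurements. A small sketch, assuming each run yields one energy reading in joules. The readings in the test are made up for illustration and are not results from the study.

```python
from statistics import mean

def energy_reduction_percent(original_joules, optimized_joules):
    """Percentage drop in mean measured energy from original to optimized."""
    baseline = mean(original_joules)
    return 100.0 * (baseline - mean(optimized_joules)) / baseline
```

A positive result means the optimized binary used less energy on the held-out workload. A negative result means the gain predicted by the model did not carry over to the real measurement.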
18. Results
• In the blackscholes kernel, GOA caught the artificially induced repetition loop and
found a way around it.
• In the swaptions kernel, GOA gave a 42% energy savings.
• This has to be taken with a pinch of salt, though.
• In the vips kernel, cache misses actually increased, but the number of instructions
decreased, and hence a 20% improvement was observed.
19. Interesting Observations
• Most prediction models show about 7% average error, and so does GOA's.
• GOA still works fine with it.
• Empirical studies show that GOA might be better suited to finding
efficient sequences of assembly instructions than efficient memory
access patterns.
• The energy reduction percentage is consistently higher on the AMD machine.
• This is mainly because the bigger machine offers more opportunities.
20. QoS dependent Optimization
• “Relaxed” preservation of semantics, with more emphasis on QoS.
• The plug-and-play test suite policy gives the developer the option of
making GOA strict or loose on semantics.
• Relaxed functional requirements allow much greater energy
efficiency, but the developer takes on the risk of ensuring that the program's
semantics do not break.
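The strict and loose options can be pictured as two interchangeable test oracles. This is a sketch under assumptions: outputs are taken to be lists of numbers, and the 1% relative tolerance is an arbitrary illustrative value, not one from the slides.

```python
import math

def strict_oracle(expected, actual):
    """Strict semantics: the variant's output must match exactly."""
    return expected == actual

def relaxed_oracle(expected, actual, tolerance=0.01):
    """Relaxed semantics: each value may deviate by a small relative error."""
    return len(expected) == len(actual) and all(
        math.isclose(e, a, rel_tol=tolerance) for e, a in zip(expected, actual)
    )
```

Swapping one oracle for the other changes which variants count as passing, which is how the choice of test suite controls how far the optimization may drift from the original behavior.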
21. Key contributions
• The Genetic Optimization Algorithm (GOA) combines insights from
profile-guided optimization, superoptimization, evolutionary computation,
and mutational robustness.
• This technique gave 20% average energy savings across all
benchmarks.
• Very simple, and mostly leverages already available techniques.
22. Drawbacks of GOA
• Energy constants are derived empirically over repeated runs on specific
hardware.
• Introducing GOA on a new architecture will take a considerable amount of work.
• The non-deterministic approach makes it almost impossible to restore an
earlier code path after the software is changed even slightly.
• Must provide indexing of the code paths and remember them.
• Very high-quality test suites are “required”.
• Failure to provide them might result in over-optimized code that only appears to work.
23. Proposed Future Work
• Currently applied only to x86.
• A matrix implementation is proposed as a solution to this problem.
• Indirect selection can optimize one parameter at the cost of
worsening another.
• Should be generalized to Java bytecode and ARM.
• Instead of relying on one compiler that takes a predefined “agreed” path, the code
should be compiled with multiple compilers using multiple paths, and
the best result should then be selected.