Static Neural Compiler Optimization via Deep Reinforcement Learning
Rahim Mammadli1,2, Ali Jannesari3, Felix Wolf1,2
1 Graduate School of Computational Engineering, TU Darmstadt
2 Laboratory for Parallel Programming, TU Darmstadt
3 Software Analytics and Pervasive Parallelism Lab, Iowa State University
Agenda
• Phase-ordering problem
• Motivation to use reinforcement learning
• Approach
• Evaluation
• Conclusion
Phase-ordering problem
• Which pass to run?
• How to set parameters?
[Diagram: a pipeline of passes (Pass1, Pass2, Pass3, Pass4), each with tunable parameters (Parameter1, Parameter2, Parameter3)]
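To make this concrete, the minimal sketch below runs the same IR file through LLVM's `opt` with the same passes in two different orders; the file names and pass selection are illustrative, and the legacy pass-manager flag syntax of older LLVM releases (as used in this work) is assumed.

```python
# Sketch: apply two different orderings of the same passes to one IR file.
# File names and pass choices are placeholders; legacy `opt` flag syntax assumed.
import subprocess

def run_passes(ir_in, passes, ir_out):
    # Run `opt`, applying the given passes in the given order, and emit textual IR.
    cmd = ["opt", *[f"-{p}" for p in passes], ir_in, "-S", "-o", ir_out]
    subprocess.run(cmd, check=True)

# Same passes, different order: the resulting IR (and its runtime) generally differs.
run_passes("input_ir.ll", ["inline", "loop-unroll", "gvn"], "order_a.ll")
run_passes("input_ir.ll", ["gvn", "loop-unroll", "inline"], "order_b.ll")
```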
Challenges (1/2)
• Large number of optimizations
• 56 transformation passes in LLVM 10
• More when considering parameters
• Example: the loop-unroll pass alone accepts 21 parameters (see below)
• Effectiveness of optimizations depends on:
• Code
• Hardware
• Problem size
Example loop-unroll parameters: unroll-threshold, unroll-count, unroll-max-count, unroll-runtime, …
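These parameters are exposed as `opt` command-line flags; a small sketch below (flag values chosen arbitrarily) shows how a single pass already spans a large configuration space.

```python
# Sketch: tuning a few of the loop-unroll flags; the values here are arbitrary.
import subprocess

unroll_flags = [
    "-unroll-threshold=300",
    "-unroll-count=4",
    "-unroll-max-count=8",
    "-unroll-runtime",
]
subprocess.run(
    ["opt", "-loop-unroll", *unroll_flags, "input_ir.ll", "-S", "-o", "unrolled.ll"],
    check=True,
)
```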
Challenges (2/2)
• A trade-off must be struck between:
• compilation time
• program size
• performance
Current approach in compilers
• Predefined optimization sequences
• -O0, -O1, -O2, -O3, -Ofast, -Os, -Oz, -Og, -O, -O4
• Disadvantages:
• Rigid structure (pre-defined pass ordering)
• Input sensitivity[1] (imperfect cost function)
• Maintenance effort
Motivation for reinforcement learning
• The IR resembles a state
• Passes resemble actions
• The OS + LLVM optimizer resemble the environment
• Rewards come directly from measured runtime (see the sketch below)
[Diagram: IR states with measured runtimes (IR0: 100 ms, IR1: 25 ms, IR2: 200 ms), connected by pass applications (Pass1, Pass2)]
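A minimal sketch of this mapping as an environment interface; the class and helper names are assumptions made for illustration, not the paper's implementation.

```python
# Sketch of the RL view of phase ordering (names and structure are illustrative).
import math
import os
import subprocess

class PhaseOrderingEnv:
    def __init__(self, ir_file, passes, measure_runtime):
        self.ir_file = ir_file          # state: the IR after the passes applied so far
        self.passes = passes            # action space: list of LLVM pass names
        self.measure = measure_runtime  # callable: compiles the IR and returns runtime in seconds

    def step(self, action):
        """Apply one pass (the action) and reward the resulting runtime improvement."""
        t_before = self.measure(self.ir_file)
        tmp = self.ir_file + ".next"
        subprocess.run(["opt", f"-{self.passes[action]}", self.ir_file, "-S", "-o", tmp],
                       check=True)
        os.replace(tmp, self.ir_file)   # the transformed IR becomes the next state
        t_after = self.measure(self.ir_file)
        reward = math.log(t_before / t_after)  # positive if the pass made the program faster
        return self.ir_file, reward
```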
Approach
[Flowchart: starting from input_ir.ll, the agent predicts a value for each action (e.g. action 1: +0.3, action 2: +1.5, action 3: -2.3, action 4: -0.9); if the maximum predicted value is > 0, the LLVM optimizer applies the corresponding pass and the chosen action is recorded in the history together with its observed reward (e.g. +1.52); otherwise the loop ends]
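Read as pseudocode, the loop in the diagram could look roughly as follows; `predict_values`, `apply_pass`, and `measure_reward` are assumed helpers, not part of the paper.

```python
# Sketch of the greedy optimization loop from the diagram: keep applying the action
# with the highest predicted value while that value is positive.
def optimize(ir_file, predict_values, apply_pass, measure_reward, max_steps=16):
    history = []                                   # recorded (action, reward) pairs
    for _ in range(max_steps):
        values = predict_values(ir_file, history)  # one predicted value per action
        best = max(range(len(values)), key=values.__getitem__)
        if values[best] <= 0:                      # no action is expected to help: stop
            break
        apply_pass(ir_file, best)                  # LLVM optimizer applies the chosen pass
        reward = measure_reward(ir_file)           # e.g. log of the runtime improvement
        history.append((best, reward))
    return history
```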
Problem definition
• Learn the action-value function Q:
• The reward is computed from measured runtimes:
$$Q(s_t, a_t, w) = R(s_t, a_t) + \gamma \max_{a \in A} Q(s_{t+1}, a, w)$$

$$R = \ln \frac{T(s_t)}{T(s_{t+1})}$$
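A small worked example of both formulas; the runtimes are taken from the earlier diagram (100 ms before, 25 ms after), while the next-state Q-values are invented for illustration.

```python
# Sketch: reward from runtimes and the one-step Q-learning target, per the equations above.
import math

def reward(t_before, t_after):
    # R = ln(T(s_t) / T(s_{t+1})): positive when the program got faster.
    return math.log(t_before / t_after)

def td_target(r, next_q_values, gamma=0.9):
    # Target for Q(s_t, a_t): R + gamma * max_a Q(s_{t+1}, a)
    return r + gamma * max(next_q_values)

r = reward(0.100, 0.025)                          # ln(4) ≈ 1.386
print(td_target(r, [0.3, -0.2, 1.1], gamma=0.9))  # ≈ 2.376
```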
State
• IR is encoded using NCC[2] embeddings
• Action history is one-hot encoded
[Diagram: the IR statements (statement-1 … statement-n) are embedded with NCC, the action history (action-1 … action-m) is one-hot encoded, and both feed into the state's vector representation]
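A minimal sketch of assembling the state vector; averaging the NCC statement embeddings and padding the history to a fixed length are assumptions.

```python
# Sketch: state = aggregated NCC statement embeddings + one-hot-encoded action history.
import numpy as np

def encode_state(statement_embeddings, action_history, num_actions, max_history=16):
    # statement_embeddings: (n_statements, embedding_dim) array produced by NCC.
    ir_part = np.asarray(statement_embeddings).mean(axis=0)

    # One-hot encode each past action, padded to a fixed history length.
    hist = np.zeros((max_history, num_actions))
    for i, action in enumerate(action_history[:max_history]):
        hist[i, action] = 1.0

    return np.concatenate([ir_part, hist.ravel()])
```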
Action spaces
Three levels of abstraction:
[Diagram: at the high level, one action applies a whole group of passes (e.g. Action #1 runs Pass #1, Pass #2, Pass #3); at the medium level, each action applies a single pass (Action #1: Pass #1, Action #2: Pass #2, Action #3: Pass #3); at the low level, actions additionally select individual parameter settings (Action #1: Pass #1, Action #2: Parameter #1, Action #3: Parameter #2)]
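One way to picture the three granularities in code; the concrete pass names and groupings below are placeholders, not the action spaces used in the evaluation.

```python
# Sketch of the three action-space granularities; pass names and groupings are illustrative.
high_level = {            # Space H: one action applies a whole group of passes
    0: ["sroa", "early-cse", "instcombine"],
    1: ["inline", "gvn", "licm"],
}
medium_level = {          # Space M: one action = one LLVM pass
    0: "inline",
    1: "loop-unroll",
    2: "gvn",
}
low_level = {             # Space L: actions also pick individual parameter settings
    0: ("loop-unroll", None),
    1: ("loop-unroll", "-unroll-count=4"),
    2: ("loop-unroll", "-unroll-threshold=300"),
}
```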
CORL Framework
[Architecture diagram: CORL is split into a server and clients. The server-side Learner covers exploration, exploitation, evaluation, optimization, and visualization, backed by a Replay Memory, Logs, Source Codes, an SQL DB, and a Visualizer; a Manager distributes tasks asynchronously to client-side Workers, which handle benchmarking, logging, and persistence]
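A rough sketch of the asynchronous task distribution suggested by the diagram; the queue-based protocol and all names are assumptions, not the actual CORL implementation.

```python
# Sketch: a manager hands optimization tasks to workers, which benchmark asynchronously.
import asyncio
import random

async def worker(tasks, results):
    # Client-side worker: take a task, benchmark it, report the result.
    while True:
        ir_file, pass_sequence = await tasks.get()
        runtime = random.uniform(0.01, 0.2)   # placeholder for real compilation + benchmarking
        await results.put((ir_file, pass_sequence, runtime))
        tasks.task_done()

async def main():
    tasks, results = asyncio.Queue(), asyncio.Queue()
    workers = [asyncio.create_task(worker(tasks, results)) for _ in range(4)]
    for seq in (["inline"], ["gvn", "licm"]):  # tasks issued by the server-side learner
        await tasks.put(("input_ir.ll", seq))
    await tasks.join()
    print(results.qsize(), "benchmark results ready for the learner")
    for w in workers:
        w.cancel()

asyncio.run(main())
```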
Experimental Setup
• Dataset
• 109 single-source benchmarks from the LLVM test suite
• 4:1 split between training and validation (sketch below)
• Maximum optimization sequence length: 16
• Workers: Lichtenberg HPC cluster, Phases I and II
• LLVM 3.8
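A minimal sketch of the reported split; only the 4:1 ratio and the benchmark count come from the slide, while the shuffle and seed are assumptions.

```python
# Sketch: 4:1 train/validation split over the 109 single-source benchmarks.
import random

benchmarks = [f"benchmark_{i}" for i in range(109)]  # placeholder names
random.Random(0).shuffle(benchmarks)
cut = len(benchmarks) * 4 // 5
train, validation = benchmarks[:cut], benchmarks[cut:]
print(len(train), len(validation))  # 87 22
```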
Metrics
• Aggregate performance of:
• the model
• the highest observed result
• -O3
• Individual (per-benchmark) performance
Space H
• 8 actions
• ɣ = 0.9
• τ = 400
• Very good fit
Action 0: Passes [0-7]
Action 1: Passes [8-19]
Action 2: Passes [20-29]
Action 3: Passes [30-32]
Action 4: Passes [33-37]
Action 5: Passes [38-54]
Action 6: Passes [55-65]
Action 7: Passes [66-73]
Space H – Aggregate Results
[Plots: aggregate results on the training and validation sets]
             Agent    O3
Training     2.24x    2.17x
Validation   2.38x    2.67x
Space M
• 42 actions
• Action = LLVM Pass
• ɣ = 0.5
• τ = 1000
• Average fit after ~6 days of training
Space M – Aggregate Results
[Plots: aggregate results on the training and validation sets]
Space L
• 187 actions
• 42 passes
• 145 parameter selections
• High bias (underfit)
Space L – Aggregate Results
[Plots: aggregate results on the training and validation sets]
Conclusion
• Phase ordering framed as a reinforcement-learning problem
• Fully automatic approach
• CORL framework
• Agent-optimized code up to 1.32x faster than -O3
Shortcomings / Outlook
• Small dataset
• Quality of embeddings
• More suitable NN architecture (Graph NNs)
• Dependence on a particular system configuration
• Data efficiency
References
[1] Gong, Zhangxiaowen, et al. "An empirical study of the effect of source-level loop transformations on compiler stability." Proceedings of the ACM on Programming Languages 2.OOPSLA (2018): 1-29.
[2] Ben-Nun, Tal, Alice Shoshana Jakobovits, and Torsten Hoefler. "Neural code comprehension: A learnable representation of code semantics." Advances in Neural Information Processing Systems. 2018.
Acknowledgements
This work is supported by the Graduate School CE within
the Centre for Computational Engineering at Technische
Universität Darmstadt and by the Hessian LOEWE initiative
within the Software-Factory 4.0 project. The calculations for
this research were conducted on the Lichtenberg Cluster of
TU Darmstadt.
Thank you!
Questions?