SlideShare a Scribd company logo
1 of 27
Download to read offline
04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 1
STATIC NEURAL COMPILER OPTIMIZATION VIA
DEEP REINFORCEMENT LEARNING
Rahim Mammadli1,2, Ali Jannesari3, Felix Wolf1,2
1 Graduate School of Computational Engineering, TU Darmstadt
2 Laboratory for Parallel Programming, TU Darmstadt
3 Software Analytics and Pervasive Parallelism Lab, Iowa State University
04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 2
Agenda
•  Phase-ordering problem
•  Motivation to use reinforcement learning
•  Approach
•  Evaluation
•  Conclusion
04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 3
Phase-ordering problem
•  Which pass to run?
•  How to set parameters?
Parameter1 Parameter2 Parameter3
Pass1 Pass2 Pass3 Pass4
04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 4
Challenges (1/2)
•  Large number of optimizations
•  56 transformation passes in LLVM 10
•  More when considering parameters
•  Example: loop-unroll pass accepts 21
•  Effectiveness of optimizations depends on:
•  Code
•  Hardware
•  Problem size
unroll-threshold
unroll-count
unroll-max-count
unroll-runtime
…
04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 5
Challenges (2/2)
•  Trade-off must be achieved between:
•  compilation time
•  program size
•  performance
04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 6
Current approach in compilers
•  Predefined optimization sequences
•  -O0, -O1, -O2, -O3, -Ofast, -Os, -Oz, -Og, -O, -O4
•  Disadvantages:
•  Rigid structure (pre-defined ordering)
•  Input-sensitive[1] (imperfect cost function)
•  Maintenance
04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 7
Motivation for reinforcement learning
•  IR resembles a state
•  Passes resemble actions
•  OS + LLVM optimizer resemble the environment
•  Rewards come directly from runtime
IR0
100ms
IR1
25ms
IR2
200ms
Pass1
Pass2
04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 8
Approach
Max
input_ir.ll
Action
3
2
1
Reward
-2.3
+1.5
+0.3
4 -0.9
Predictions
Value > 0
LLVM
Optimizer
ENDSTART
false
Agent
RecordAction
History
true
+1.52
Action Reward
04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 9
Problem definition
•  Learn the value function (Q):
•  Reward is calculated based on runtime:
Q tS , tA,w( )= R tS , tA( )+γ max
a∈A
Q t+1S ,a,w( )
R = ln
T tS( )
T t+1S( )
04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 10
State
States
•  IR is encoded using NCC[2] embeddings
•  Action history is one-hot encoded
IR
Action
History
statement-1
statement-2
…
statement-n
action-1
action-2
…
action-m
NCC
One-Hot
Vector
Representation
04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 11
Action spaces
Three levels of abstraction:
Action #2
Pass #2
Action #1
Pass #1
Action #3
Pass #3
Action #2
Parameter #1
Action #3
Parameter #2
Action #1
Pass #1
...
...
Action #1
Pass #1 Pass #2 Pass #3
...High
Level
Medium
Level
Low
Level
04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 12
CORL Framework
SERVER
CLIENT
Learner
Exploration
Exploitation
Evaluation
Optimization
Visualization
Replay
Memory
Logs
Source
Codes
SQL DB
Visualizer
Workers
Manager
BenchmarkingLogging
Persistence
Task 
Distribution
async
async
04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 13
Experimental Setup
•  Dataset
•  109 single-source benchmarks from LLVM test suite
•  4:1 split across training and validation
•  Max optimization sequence length: 16
•  Workers: Lichtenberg HPC Cluster Phases I and II
•  LLVM 3.8
04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 14
Metrics
•  Aggregate performance
•  Model
•  Highest Observed
•  O3
•  Individual performance
04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 15
Space H
•  8 actions
•  ɣ	=	0.9	
•  τ	=	400	
•  Very good fit
0: Passes [0-7]
1: Passes [8-19]
2: Passes [20-29]
3: Passes [30-32]
4: Passes [33-37]
5: Passes [38-54]
6: Passes [55-65]
7: Passes [66-73]
04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 16
Space H – Aggregate Results
Training Validation
04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 17
Space H – Aggregate Results
Agent O3
Training 2.24x 2.17x
Validation 2.38x 2.67x
04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 18
Space H – Individual Programs
•  Training
•  floyd-warshall.c
•  Bubblesort.c
•  Validation
•  recursive.c
•  dynprog.c
4 +0.91 7 +0.25 7 +0.01 6 -0.01
Agent	 O3	
3.19x 4.55x
0 +0.43 4 +0.19 7 +0.32 5 +0.06 2.74x 1.35x
1 +0.85 6 +0.00 6 +0.00 6 +0.00 1.47x 2.96x
4 +0.21 5 +1.02 4 -0.06 5 +0.18 3.85x 2.91x
04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 19
Space M
•  42 actions
•  Action = LLVM Pass
•  ɣ	=	0.5	
•  τ	=	1000	
•  Average fit after ~ 6 days
04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 20
Space M – Aggregate Results
Training Validation
04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 21
Space L
•  187 actions
•  42 passes
•  145 parameter selections
•  High bias (underfit)
04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 22
Space L – Aggregate Results
Training Validation
04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 23
Conclusion
•  Phase-ordering as reinforcement learning
•  Fully automatic approach
•  CORL Framework
•  Agent up to 1.32x faster than O3
04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 24
Shortcomings / Outlook
•  Small dataset
•  Quality of embeddings
•  More suitable NN architecture (Graph NNs)
•  Dependence on particular system configuration
•  Data efficiency
04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 25
References
[1] Gong, Zhangxiaowen, et al. "An empirical study of the effect of source-
level loop transformations on compiler stability." Proceedings of the ACM
on Programming Languages 2.OOPSLA (2018): 1-29.
[2] Ben-Nun, Tal, Alice Shoshana Jakobovits, and Torsten Hoefler. "Neural
code comprehension: A learnable representation of code semantics."
Advances in Neural Information Processing Systems. 2018.
04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 26
Acknowledgements
This work is supported by the Graduate School CE within
the Centre for Computational Engineering at Technische
Universität Darmstadt and by the Hessian LOEWE initiative
within the Software-Factory 4.0 project. The calculations for
this research were conducted on the Lichtenberg Cluster of
TU Darmstadt.
04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 27
Thank you!
Questions?

More Related Content

Similar to Static Neural Compiler Optimization via Deep Reinforcement Learning

Iteratively Learning Data Transformation Programs from Examples
Iteratively Learning Data Transformation Programs from ExamplesIteratively Learning Data Transformation Programs from Examples
Iteratively Learning Data Transformation Programs from ExamplesBo Wu
 
Amin Milani Fard: Directed Model Inference for Testing and Analysis of Web Ap...
Amin Milani Fard: Directed Model Inference for Testing and Analysis of Web Ap...Amin Milani Fard: Directed Model Inference for Testing and Analysis of Web Ap...
Amin Milani Fard: Directed Model Inference for Testing and Analysis of Web Ap...knowdiff
 
Flopsar light-galaxy eng-nl
Flopsar light-galaxy eng-nlFlopsar light-galaxy eng-nl
Flopsar light-galaxy eng-nlAdam Khan
 
Automated Discovery of Performance Regressions in Enterprise Applications
Automated Discovery of Performance Regressions in Enterprise ApplicationsAutomated Discovery of Performance Regressions in Enterprise Applications
Automated Discovery of Performance Regressions in Enterprise ApplicationsSAIL_QU
 
Lessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixLessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixJustin Basilico
 
Testing Machine Learning-enabled Systems: A Personal Perspective
Testing Machine Learning-enabled Systems: A Personal PerspectiveTesting Machine Learning-enabled Systems: A Personal Perspective
Testing Machine Learning-enabled Systems: A Personal PerspectiveLionel Briand
 
Scalable and Cost-Effective Model-Based Software Verification and Testing
Scalable and Cost-Effective Model-Based Software Verification and TestingScalable and Cost-Effective Model-Based Software Verification and Testing
Scalable and Cost-Effective Model-Based Software Verification and TestingLionel Briand
 
Making Model-Driven Verification Practical and Scalable: Experiences and Less...
Making Model-Driven Verification Practical and Scalable: Experiences and Less...Making Model-Driven Verification Practical and Scalable: Experiences and Less...
Making Model-Driven Verification Practical and Scalable: Experiences and Less...Lionel Briand
 
RaDEn : A Scalable and Efficient Platform for Engineering Radiation Data
RaDEn :  A Scalable and Efficient Platform for Engineering Radiation DataRaDEn :  A Scalable and Efficient Platform for Engineering Radiation Data
RaDEn : A Scalable and Efficient Platform for Engineering Radiation DataHadi Fadlallah
 
Topograhical synthesis
Topograhical synthesis   Topograhical synthesis
Topograhical synthesis Deiptii Das
 
Keynote at IWLS 2017
Keynote at IWLS 2017Keynote at IWLS 2017
Keynote at IWLS 2017Manish Pandey
 
Application Performance Management
Application Performance ManagementApplication Performance Management
Application Performance ManagementNoriaki Tatsumi
 
Achieving horizontal scalability in density-based clustering for urls
Achieving horizontal scalability in density-based clustering for urlsAchieving horizontal scalability in density-based clustering for urls
Achieving horizontal scalability in density-based clustering for urlsAndrea Morichetta
 
Task Scheduling Using Firefly algorithm with cloudsim
Task Scheduling Using Firefly algorithm with cloudsimTask Scheduling Using Firefly algorithm with cloudsim
Task Scheduling Using Firefly algorithm with cloudsimAqilIzzuddin
 
Maintaining SLOs of Cloud-native Applications via Self-Adaptive Resource Sharing
Maintaining SLOs of Cloud-native Applications via Self-Adaptive Resource SharingMaintaining SLOs of Cloud-native Applications via Self-Adaptive Resource Sharing
Maintaining SLOs of Cloud-native Applications via Self-Adaptive Resource SharingVladimir Podolskiy
 
Combinatorial optimization and deep reinforcement learning
Combinatorial optimization and deep reinforcement learningCombinatorial optimization and deep reinforcement learning
Combinatorial optimization and deep reinforcement learning민재 정
 

Similar to Static Neural Compiler Optimization via Deep Reinforcement Learning (20)

Iteratively Learning Data Transformation Programs from Examples
Iteratively Learning Data Transformation Programs from ExamplesIteratively Learning Data Transformation Programs from Examples
Iteratively Learning Data Transformation Programs from Examples
 
COCOMO
COCOMOCOCOMO
COCOMO
 
Amin Milani Fard: Directed Model Inference for Testing and Analysis of Web Ap...
Amin Milani Fard: Directed Model Inference for Testing and Analysis of Web Ap...Amin Milani Fard: Directed Model Inference for Testing and Analysis of Web Ap...
Amin Milani Fard: Directed Model Inference for Testing and Analysis of Web Ap...
 
Flopsar light-galaxy eng-nl
Flopsar light-galaxy eng-nlFlopsar light-galaxy eng-nl
Flopsar light-galaxy eng-nl
 
Automated Discovery of Performance Regressions in Enterprise Applications
Automated Discovery of Performance Regressions in Enterprise ApplicationsAutomated Discovery of Performance Regressions in Enterprise Applications
Automated Discovery of Performance Regressions in Enterprise Applications
 
Lessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixLessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at Netflix
 
Testing Machine Learning-enabled Systems: A Personal Perspective
Testing Machine Learning-enabled Systems: A Personal PerspectiveTesting Machine Learning-enabled Systems: A Personal Perspective
Testing Machine Learning-enabled Systems: A Personal Perspective
 
Scalable and Cost-Effective Model-Based Software Verification and Testing
Scalable and Cost-Effective Model-Based Software Verification and TestingScalable and Cost-Effective Model-Based Software Verification and Testing
Scalable and Cost-Effective Model-Based Software Verification and Testing
 
Making Model-Driven Verification Practical and Scalable: Experiences and Less...
Making Model-Driven Verification Practical and Scalable: Experiences and Less...Making Model-Driven Verification Practical and Scalable: Experiences and Less...
Making Model-Driven Verification Practical and Scalable: Experiences and Less...
 
RaDEn : A Scalable and Efficient Platform for Engineering Radiation Data
RaDEn :  A Scalable and Efficient Platform for Engineering Radiation DataRaDEn :  A Scalable and Efficient Platform for Engineering Radiation Data
RaDEn : A Scalable and Efficient Platform for Engineering Radiation Data
 
computer architecture.
computer architecture.computer architecture.
computer architecture.
 
Topograhical synthesis
Topograhical synthesis   Topograhical synthesis
Topograhical synthesis
 
Keynote at IWLS 2017
Keynote at IWLS 2017Keynote at IWLS 2017
Keynote at IWLS 2017
 
Application Performance Management
Application Performance ManagementApplication Performance Management
Application Performance Management
 
Achieving horizontal scalability in density-based clustering for urls
Achieving horizontal scalability in density-based clustering for urlsAchieving horizontal scalability in density-based clustering for urls
Achieving horizontal scalability in density-based clustering for urls
 
Task Scheduling Using Firefly algorithm with cloudsim
Task Scheduling Using Firefly algorithm with cloudsimTask Scheduling Using Firefly algorithm with cloudsim
Task Scheduling Using Firefly algorithm with cloudsim
 
Maintaining SLOs of Cloud-native Applications via Self-Adaptive Resource Sharing
Maintaining SLOs of Cloud-native Applications via Self-Adaptive Resource SharingMaintaining SLOs of Cloud-native Applications via Self-Adaptive Resource Sharing
Maintaining SLOs of Cloud-native Applications via Self-Adaptive Resource Sharing
 
Combinatorial optimization and deep reinforcement learning
Combinatorial optimization and deep reinforcement learningCombinatorial optimization and deep reinforcement learning
Combinatorial optimization and deep reinforcement learning
 
Mps intro
Mps introMps intro
Mps intro
 
Algorithmic Software Cost Modeling
Algorithmic Software Cost ModelingAlgorithmic Software Cost Modeling
Algorithmic Software Cost Modeling
 

Recently uploaded

9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Servicenishacall1
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPirithiRaju
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Monika Rani
 
STS-UNIT 4 CLIMATE CHANGE POWERPOINT PRESENTATION
STS-UNIT 4 CLIMATE CHANGE POWERPOINT PRESENTATIONSTS-UNIT 4 CLIMATE CHANGE POWERPOINT PRESENTATION
STS-UNIT 4 CLIMATE CHANGE POWERPOINT PRESENTATIONrouseeyyy
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learninglevieagacer
 
Unit5-Cloud.pptx for lpu course cse121 o
Unit5-Cloud.pptx for lpu course cse121 oUnit5-Cloud.pptx for lpu course cse121 o
Unit5-Cloud.pptx for lpu course cse121 oManavSingh202607
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learninglevieagacer
 
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICESAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICEayushi9330
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...ssuser79fe74
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)Areesha Ahmad
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .Poonam Aher Patil
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY1301aanya
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...chandars293
 
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flypumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flyPRADYUMMAURYA1
 
Seismic Method Estimate velocity from seismic data.pptx
Seismic Method Estimate velocity from seismic  data.pptxSeismic Method Estimate velocity from seismic  data.pptx
Seismic Method Estimate velocity from seismic data.pptxAlMamun560346
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsSérgio Sacani
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfSumit Kumar yadav
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)Areesha Ahmad
 

Recently uploaded (20)

9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
 
STS-UNIT 4 CLIMATE CHANGE POWERPOINT PRESENTATION
STS-UNIT 4 CLIMATE CHANGE POWERPOINT PRESENTATIONSTS-UNIT 4 CLIMATE CHANGE POWERPOINT PRESENTATION
STS-UNIT 4 CLIMATE CHANGE POWERPOINT PRESENTATION
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
Unit5-Cloud.pptx for lpu course cse121 o
Unit5-Cloud.pptx for lpu course cse121 oUnit5-Cloud.pptx for lpu course cse121 o
Unit5-Cloud.pptx for lpu course cse121 o
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
 
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICESAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
 
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flypumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
 
Seismic Method Estimate velocity from seismic data.pptx
Seismic Method Estimate velocity from seismic  data.pptxSeismic Method Estimate velocity from seismic  data.pptx
Seismic Method Estimate velocity from seismic data.pptx
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 

Static Neural Compiler Optimization via Deep Reinforcement Learning

  • 1. 04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 1 STATIC NEURAL COMPILER OPTIMIZATION VIA DEEP REINFORCEMENT LEARNING Rahim Mammadli1,2, Ali Jannesari3, Felix Wolf1,2 1 Graduate School of Computational Engineering, TU Darmstadt 2 Laboratory for Parallel Programming, TU Darmstadt 3 Software Analytics and Pervasive Parallelism Lab, Iowa State University
  • 2. 04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 2 Agenda •  Phase-ordering problem •  Motivation to use reinforcement learning •  Approach •  Evaluation •  Conclusion
  • 3. 04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 3 Phase-ordering problem •  Which pass to run? •  How to set parameters? Parameter1 Parameter2 Parameter3 Pass1 Pass2 Pass3 Pass4
  • 4. 04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 4 Challenges (1/2) •  Large number of optimizations •  56 transformation passes in LLVM 10 •  More when considering parameters •  Example: loop-unroll pass accepts 21 •  Effectiveness of optimizations depends on: •  Code •  Hardware •  Problem size unroll-threshold unroll-count unroll-max-count unroll-runtime …
  • 5. 04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 5 Challenges (2/2) •  Trade-off must be achieved between: •  compilation time •  program size •  performance
  • 6. 04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 6 Current approach in compilers •  Predefined optimization sequences •  -O0, -O1, -O2, -O3, -Ofast, -Os, -Oz, -Og, -O, -O4 •  Disadvantages: •  Rigid structure (pre-defined ordering) •  Input-sensitive[1] (imperfect cost function) •  Maintenance
  • 7. 04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 7 Motivation for reinforcement learning •  IR resembles a state •  Passes resemble actions •  OS + LLVM optimizer resemble the environment •  Rewards come directly from runtime IR0 100ms IR1 25ms IR2 200ms Pass1 Pass2
  • 8. 04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 8 Approach Max input_ir.ll Action 3 2 1 Reward -2.3 +1.5 +0.3 4 -0.9 Predictions Value > 0 LLVM Optimizer ENDSTART false Agent RecordAction History true +1.52 Action Reward
  • 9. 04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 9 Problem definition •  Learn the value function (Q): •  Reward is calculated based on runtime: Q tS , tA,w( )= R tS , tA( )+γ max a∈A Q t+1S ,a,w( ) R = ln T tS( ) T t+1S( )
  • 10. 04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 10 State States •  IR is encoded using NCC[2] embeddings •  Action history is one-hot encoded IR Action History statement-1 statement-2 … statement-n action-1 action-2 … action-m NCC One-Hot Vector Representation
  • 11. 04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 11 Action spaces Three levels of abstraction: Action #2 Pass #2 Action #1 Pass #1 Action #3 Pass #3 Action #2 Parameter #1 Action #3 Parameter #2 Action #1 Pass #1 ... ... Action #1 Pass #1 Pass #2 Pass #3 ...High Level Medium Level Low Level
  • 12. 04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 12 CORL Framework SERVER CLIENT Learner Exploration Exploitation Evaluation Optimization Visualization Replay Memory Logs Source Codes SQL DB Visualizer Workers Manager BenchmarkingLogging Persistence Task  Distribution async async
  • 13. 04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 13 Experimental Setup •  Dataset •  109 single-source benchmarks from LLVM test suite •  4:1 split across training and validation •  Max optimization sequence length: 16 •  Workers: Lichtenberg HPC Cluster Phases I and II •  LLVM 3.8
  • 14. 04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 14 Metrics •  Aggregate performance •  Model •  Highest Observed •  O3 •  Individual performance
  • 15. 04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 15 Space H •  8 actions •  ɣ = 0.9 •  τ = 400 •  Very good fit 0: Passes [0-7] 1: Passes [8-19] 2: Passes [20-29] 3: Passes [30-32] 4: Passes [33-37] 5: Passes [38-54] 6: Passes [55-65] 7: Passes [66-73]
  • 16. 04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 16 Space H – Aggregate Results Training Validation
  • 17. 04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 17 Space H – Aggregate Results Agent O3 Training 2.24x 2.17x Validation 2.38x 2.67x
  • 18. 04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 18 Space H – Individual Programs •  Training •  floyd-warshall.c •  Bubblesort.c •  Validation •  recursive.c •  dynprog.c 4 +0.91 7 +0.25 7 +0.01 6 -0.01 Agent O3 3.19x 4.55x 0 +0.43 4 +0.19 7 +0.32 5 +0.06 2.74x 1.35x 1 +0.85 6 +0.00 6 +0.00 6 +0.00 1.47x 2.96x 4 +0.21 5 +1.02 4 -0.06 5 +0.18 3.85x 2.91x
  • 19. 04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 19 Space M •  42 actions •  Action = LLVM Pass •  ɣ = 0.5 •  τ = 1000 •  Average fit after ~ 6 days
  • 20. 04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 20 Space M – Aggregate Results Training Validation
  • 21. 04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 21 Space L •  187 actions •  42 passes •  145 parameter selections •  High bias (underfit)
  • 22. 04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 22 Space L – Aggregate Results Training Validation
  • 23. 04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 23 Conclusion •  Phase-ordering as reinforcement learning •  Fully automatic approach •  CORL Framework •  Agent up to 1.32x faster than O3
  • 24. 04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 24 Shortcomings / Outlook •  Small dataset •  Quality of embeddings •  More suitable NN architecture (Graph NNs) •  Dependence on particular system configuration •  Data efficiency
  • 25. 04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 25 References [1] Gong, Zhangxiaowen, et al. "An empirical study of the effect of source- level loop transformations on compiler stability." Proceedings of the ACM on Programming Languages 2.OOPSLA (2018): 1-29. [2] Ben-Nun, Tal, Alice Shoshana Jakobovits, and Torsten Hoefler. "Neural code comprehension: A learnable representation of code semantics." Advances in Neural Information Processing Systems. 2018.
  • 26. 04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 26 Acknowledgements This work is supported by the Graduate School CE within the Centre for Computational Engineering at Technische Universität Darmstadt and by the Hessian LOEWE initiative within the Software-Factory 4.0 project. The calculations for this research were conducted on the Lichtenberg Cluster of TU Darmstadt.
  • 27. 04.12.20 | Department of Computer Science | Laboratory for Parallel Programming | Rahim Mammadli | 27 Thank you! Questions?