Constraint Programming in Compiler
Optimization: Lessons Learned
Peter van Beek
University of Waterloo
Acknowledgements
• Joint work with:
Omer Beg, Alejandro López-Ortiz, Abid Malik, Jim McInnes, Wayne Oldford,
Claude-Guy Quimper, John Tromp, Kent Wilken, Huayue Wu

• Funding:
NSERC, IBM Canada
Application-driven research
• Idea:
• pick an application—a real-world problem—where, if you solve it, there would be a
significant impact

• Along the way, if all goes well, you will also:
• identify and fill gaps in theory
• identify and solve interesting sub-problems whose solutions will have general
applicability
Optimization problems in compilers
• Instruction selection
• Instruction scheduling
• basic-block scheduling

• super-block scheduling
• loop scheduling: tiling, unrolling, fusion

• Memory hierarchy optimizations
• Register allocation
Production compilers
“At the outset, note that basic-block scheduling is an NP-hard
problem, even with a very simple formulation of the
problem, so we must seek an effective heuristic, rather than an
exact approach.”
Steven Muchnick,
Advanced Compiler Design
& Implementation, 1997
Outline
• Introduction
• computer architecture
• superblock scheduling

• Constraint programming approach
• temporal scheduler
• spatial and temporal scheduler

• Experiments
• experimental setup
• experimental results

• Lessons learned
Computer architecture:
Performing instructions in parallel
• Multiple-issue
• multiple functional units;
e.g., ALUs, FPUs, load/store units, branch
units
• multiple instructions can be issued (begin
execution) each clock cycle
• issue width: max number of instructions that
can be issued each clock cycle
• on most architectures issue width less than
number of functional units
Computer architecture:
Performing instructions in parallel
• Pipelining
• overlap execution of instructions on a single
functional unit
• latency of an instruction:
number of cycles before result is available
• execution time of an instruction:
number of cycles before next instruction
can be issued on same functional unit
• serializing instruction:
instruction that requires exclusive use of
entire processor in cycle in which it is issued

Analogy: vehicle assembly line
Superblock instruction scheduling
• Instruction scheduling
• assignment of a clock cycle to each instruction
• needed to take advantage of complex features of
architecture
• sometimes necessary for correctness (VLIW)

• Basic block
• straight-line sequence of code with single entry, single exit

• Superblock
• collection of basic blocks with a unique entrance but multiple exits

• Given a target architecture, find schedule with minimum expected
completion time
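The objective can be made concrete with a small sketch. It assumes the expected completion time is the probability-weighted sum of the cycles in which the exit branches are issued; the function name and the sample exit cycles are illustrative and not taken from the slides.

```python
def expected_completion_time(exit_cycles, exit_probs):
    """Sum over exits of P(exit taken) * cycle in which that exit is issued."""
    if abs(sum(exit_probs.values()) - 1.0) > 1e-9:
        raise ValueError("exit probabilities must sum to 1")
    return sum(exit_probs[e] * exit_cycles[e] for e in exit_probs)


# Hypothetical schedule: exit F issued in cycle 9, exit G in cycle 10.
cost = expected_completion_time({"F": 9, "G": 10}, {"F": 0.4, "G": 0.6})
print(cost)
```

A schedule that moves a likely exit earlier lowers this value even if the last instruction finishes no sooner, which is why superblock scheduling differs from minimizing schedule length.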
Example superblock

[Figure: dependency DAG with nodes A:1, B:3, C:1, D:1, E:1, F:1, G:1 and latency-labeled arcs]

dependency DAG
• nodes
• one for each instruction
• labeled with execution time
• nodes F and G are branch instructions, labeled with probability the exit is taken (F: 40%, G: 60%)

• arcs
• represent precedence
• labeled with latencies
Example superblock

optimal cost schedule for 2-issue processor

[Figure: the dependency DAG (nodes A:1, B:3, C:1, D:1, E:1, F:1, G:1; exits F 40%, G 60%) next to a schedule table with columns cycle, ALU, and FPU covering cycles 1 to 10]
Computer architecture:
General purpose architectures

[Figure: a single processor whose functional units (f, i, b, m) share one register file]
Computer architecture:
Clustered architectures
[Figure: four clusters (cluster 0 to cluster 3), each with units f, i, b, m, and c and its own register file, connected by a cluster interconnect]
Computer architecture:
Clustered architectures
• Current: digital signal processing
• multimedia, audio processing, image processing
• wireless, ADSL modems, …

• Future trend: general purpose multi-core processors
• large numbers of cores
• fast inter-processor communication
Spatial and temporal scheduling
[Figure: dependency DAG over instructions A to H with latency-labeled arcs; exit G is taken with probability 20% and exit H with probability 80%. Two schedules over cycles 1 to 10 are compared:
• one cluster (c0): cost = 9.8
• two clusters (c0, c1): cost = 7.6]
Approaches
• Superblock instruction scheduling is NP-complete
• Heuristic approaches in all commercial and open-source research compilers
• greedy list scheduling algorithm coupled with a priority heuristic

• Here: Optimal approach
• useful when longer compile times are tolerable
• e.g., compiling for software libraries, digital signal processing, embedded
applications, final production build
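For contrast with the optimal approach, the heuristic baseline can be sketched as follows. This is a minimal list scheduler assuming a single type of functional unit, a fixed issue width, and critical-path length as the priority heuristic; production compilers use richer machine models and priorities, and the DAG below is hypothetical.

```python
from functools import lru_cache


def list_schedule(succs, issue_width):
    """Greedy list scheduling.

    succs maps each instruction to a list of (successor, latency) pairs.
    Returns a dict mapping each instruction to its issue cycle (1-based).
    """
    nodes = list(succs)
    preds = {n: [] for n in nodes}
    for u in nodes:
        for v, lat in succs[u]:
            preds[v].append((u, lat))

    @lru_cache(maxsize=None)
    def critical_path(n):
        # Longest latency-weighted path from n to any sink.
        return max((lat + critical_path(v) for v, lat in succs[n]), default=0)

    cycle_of = {}
    cycle = 1
    while len(cycle_of) < len(nodes):
        # Ready: unscheduled, and every predecessor's latency has elapsed.
        ready = [
            n for n in nodes
            if n not in cycle_of
            and all(p in cycle_of and cycle_of[p] + lat <= cycle
                    for p, lat in preds[n])
        ]
        # Highest priority first; ties broken by name for determinism.
        ready.sort(key=lambda n: (-critical_path(n), n))
        for n in ready[:issue_width]:
            cycle_of[n] = cycle
        cycle += 1
    return cycle_of


# Hypothetical DAG: A feeds B and C, which both feed D.
example_dag = {
    "A": [("B", 1), ("C", 1)],
    "B": [("D", 2)],
    "C": [("D", 1)],
    "D": [],
}
print(list_schedule(example_dag, issue_width=2))
```

The scheduler never revisits a decision, so it runs quickly but offers no guarantee that the result is optimal.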
Temporal scheduler:
Basic constraint model

[Figure: dependency DAG of the example superblock with latency-labeled arcs; exit F taken with probability 40%, exit G with probability 60%]

variables
A, B, C, D, E, F, G

domains
{1, …, m}

constraints
B ≥ A + 1, C ≥ A + 1, D ≥ B + 5, …, G ≥ F
gcc(A, B, C, F, G, nALU)
gcc(D, E, nFPU)
gcc(A, …, G, issuewidth)

cost function
40 F + 60 G
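The shape of this model can be illustrated with a brute-force sketch. It is not a constraint solver: it enumerates every assignment of cycles and keeps the cheapest one that satisfies the latency constraints and the gcc-style capacity limits. The four-instruction instance, the unit names, and all function names are hypothetical.

```python
from collections import Counter
from itertools import product


def gcc_ok(cycles, capacity):
    """Cardinality check: no cycle is used by more than `capacity` variables."""
    return all(count <= capacity for count in Counter(cycles).values())


def solve(instrs, latency, unit_of, unit_capacity, issue_width, exit_weight, m):
    """Return (cost, schedule) minimizing the weighted exit cycles, or None."""
    best = None
    for values in product(range(1, m + 1), repeat=len(instrs)):
        s = dict(zip(instrs, values))
        # Latency constraints: v >= u + latency for each arc (u, v).
        if any(s[v] < s[u] + lat for (u, v), lat in latency.items()):
            continue
        # Issue-width constraint over all instructions.
        if not gcc_ok(values, issue_width):
            continue
        # One capacity constraint per functional-unit type.
        if any(not gcc_ok([s[i] for i in instrs if unit_of[i] == u], cap)
               for u, cap in unit_capacity.items()):
            continue
        cost = sum(w * s[e] for e, w in exit_weight.items())
        if best is None or cost < best[0]:
            best = (cost, s)
    return best


# Hypothetical instance: C and D are exits taken 40% and 60% of the time.
best = solve(
    instrs=["A", "B", "C", "D"],
    latency={("A", "B"): 1, ("A", "C"): 1, ("B", "D"): 3, ("C", "D"): 0},
    unit_of={"A": "ALU", "B": "FPU", "C": "ALU", "D": "ALU"},
    unit_capacity={"ALU": 1, "FPU": 1},
    issue_width=2,
    exit_weight={"C": 40, "D": 60},
    m=6,
)
print(best)
```

Enumeration grows as m to the power of the number of instructions, which is why constraint propagation and the model improvements on the following slides matter for real superblocks.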
Temporal scheduler:
Basic constraint model (cont'd)
non-fully pipelined instructions (e.g., B:3, with execution time 3)
• introduce auxiliary variables
P_{B,1}, P_{B,2}
• introduce additional constraints
B + 1 = P_{B,1}
B + 2 = P_{B,2}
gcc(A, B, P_{B,1}, P_{B,2}, C, F, G, nALU)

serializing instructions
• similar technique
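The effect of the auxiliary variables can be sketched as a capacity check. In this illustration each instruction occupies its unit for its whole execution time, so an instruction with execution time 3 contributes its issue cycle plus two further cycles, mirroring the two auxiliary variables for B. The function names and the sample schedules are assumptions.

```python
from collections import Counter


def occupied_cycles(issue_cycle, exec_time):
    """Issue cycle plus one extra cycle per auxiliary variable."""
    return [issue_cycle + k for k in range(exec_time)]


def unit_capacity_ok(schedule, exec_time, capacity):
    """True if no cycle has more than `capacity` instructions on the unit."""
    usage = Counter()
    for instr, cycle in schedule.items():
        usage.update(occupied_cycles(cycle, exec_time[instr]))
    return all(n <= capacity for n in usage.values())


exec_time = {"A": 1, "B": 3, "C": 1}
# B issued in cycle 2 holds the unit in cycles 2, 3, and 4.
print(unit_capacity_ok({"A": 1, "B": 2, "C": 3}, exec_time, capacity=1))
print(unit_capacity_ok({"A": 1, "B": 2, "C": 5}, exec_time, capacity=1))
```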
Temporal scheduler:
Improving the model
• Add constraints to increase constraint propagation (e.g., Smith 2006)
• implied constraints: do not change set of solutions

• dominance constraints: preserve an optimal solution

• Here:
• many constraints added to constraint model in extensive preprocessing stage
that occurs once
• extensive preprocessing effort pays off as model is solved many times
Temporal scheduler:
Improving the solver
• From optimization to satisfaction
• find bounds on cost function

• enumerate solutions to cost function (knapsack constraint; Trick 2001)
• step through in increasing order of cost

• Improved bounds consistency algorithm for gcc constraints
• Use portfolio to improve performance (Gomes et al. 1997)
• increasing levels of constraint propagation

• Impact-based variable ordering (Refalo 2004)
• Structure-based decomposition technique (Freuder 1994)
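The first bullet group, turning optimization into a sequence of satisfaction problems, can be sketched as below. This is a brute-force stand-in, not the knapsack-constraint propagation the slide cites: it lists the exit-cycle assignments whose cost lies within the bounds, sorts them by cost, and returns the first one a feasibility check accepts. The callback `is_satisfiable` and the toy check are hypothetical placeholders for a full satisfaction solve.

```python
from itertools import product


def exit_assignments_by_cost(weights, domains, lower, upper):
    """Return (cost, assignment) pairs with lower <= cost <= upper, cheapest first."""
    names = list(weights)
    candidates = []
    for values in product(*(domains[n] for n in names)):
        cost = sum(weights[n] * v for n, v in zip(names, values))
        if lower <= cost <= upper:
            candidates.append((cost, dict(zip(names, values))))
    candidates.sort(key=lambda item: item[0])
    return candidates


def first_feasible(weights, domains, lower, upper, is_satisfiable):
    """Step through candidate costs in increasing order; stop at the first feasible one."""
    for cost, exits in exit_assignments_by_cost(weights, domains, lower, upper):
        if is_satisfiable(exits):
            return cost, exits
    return None


def toy_check(exits):
    # Stand-in for solving the rest of the model with the exits fixed.
    return exits["F"] >= 9 and exits["G"] >= exits["F"] + 1


weights = {"F": 40, "G": 60}
domains = {"F": range(1, 11), "G": range(1, 11)}
result = first_feasible(weights, domains, 900, 1000, toy_check)
print(result)
```

Because candidates are visited in increasing order of cost, the first feasible assignment is optimal and no branch-and-bound on the objective is needed.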
Spatial and temporal scheduler:
Basic constraint model
[Figure: dependency DAG over instructions A to H with latency-labeled arcs; exit G taken with probability 20%, exit H with probability 80%]

variables
cycle of issue: xA, xB, …, xH
cluster: yA, yB, …, yH

domains
dom(x) = {1, …, m}
dom(y) = {0, …, k−1}

communication constraints
yA ≠ yC → xC ≥ xA + 1 + cost
yA = yC → xC ≥ xA + 1
…

cost function
80 xH + 20 xG
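The pair of communication constraints for one arc can be checked with a short sketch. It assumes the inter-cluster communication cost is a fixed number of cycles added to the arc latency whenever the two endpoints sit on different clusters; the function name and the value 2 for the cost are illustrative.

```python
def edge_satisfied(x, y, u, v, latency, comm_cost):
    """Check x[v] >= x[u] + latency, plus comm_cost if u and v are on different clusters."""
    delay = latency + (comm_cost if y[u] != y[v] else 0)
    return x[v] >= x[u] + delay


x = {"A": 1, "C": 2}
# Same cluster: only the latency of 1 must elapse.
print(edge_satisfied(x, {"A": 0, "C": 0}, "A", "C", latency=1, comm_cost=2))
# Different clusters: C would need to issue in cycle 4 or later.
print(edge_satisfied(x, {"A": 0, "C": 1}, "A", "C", latency=1, comm_cost=2))
```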
Spatial and temporal scheduler:
Improving the model
• Symmetry breaking
• add auxiliary variables: zAC, zBC, …
• dom(z) = {'=', '≠'}
• instead of backtracking on the y's, backtrack on the edges with z's
• preserves at least one optimal solution

[Figure: small dependency DAG over instructions A, B, C, D with latency-labeled arcs]
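Why branching on the z variables removes symmetry can be seen by counting. The sketch below assumes a four-instruction DAG with arcs A to C, B to C, and C to D (the edges named by zAC, zBC, zCD) and four clusters. It maps every cluster assignment to its tuple of edge labels and counts how many distinct tuples exist; the helper names are illustrative.

```python
from itertools import product

EDGES = [("A", "C"), ("B", "C"), ("C", "D")]  # edges named by zAC, zBC, zCD
NODES = ["A", "B", "C", "D"]


def edge_signature(y, edges=EDGES):
    """Label each edge '=' if both endpoints share a cluster, '!=' otherwise."""
    return tuple("=" if y[u] == y[v] else "!=" for u, v in edges)


def count_classes(k):
    """Return (number of cluster assignments, number of distinct edge signatures)."""
    assignments = [dict(zip(NODES, ys))
                   for ys in product(range(k), repeat=len(NODES))]
    return len(assignments), len({edge_signature(y) for y in assignments})


total, classes = count_classes(4)
print(total, classes)
```

On this example the 256 cluster assignments collapse to 8 edge signatures, so the search over z explores far fewer alternatives than the search over y.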
Spatial and temporal scheduler:
Improving the solver
• Preprocess DAG to find instructions which must be on same cluster
• preserve an optimal solution

• Variable ordering
• assign z variables first, in breadth-first order of DAG
• determine assignment for corresponding y variables
• determine cost of temporal schedule for these assignments
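The step "determine assignment for corresponding y variables" can be sketched as follows. This is one possible realization, not necessarily the authors': instructions joined by '=' edges are merged with union-find, and the merged groups are then given cluster numbers by backtracking so that every '≠' edge joins two different clusters. Each group may reuse a cluster already opened or open exactly one new one, which avoids generating renamed copies of the same assignment. All names are hypothetical.

```python
def clusters_from_edge_labels(nodes, z, k):
    """Return one cluster assignment y consistent with edge labels z, or None."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    # Merge the endpoints of every '=' edge into one group.
    for (u, v), label in z.items():
        if label == "=":
            parent[find(u)] = find(v)

    conflicts = {(find(u), find(v)) for (u, v), label in z.items() if label == "!="}
    if any(a == b for a, b in conflicts):
        return None  # a '!=' edge inside a merged group

    groups = []
    for n in nodes:
        root = find(n)
        if root not in groups:
            groups.append(root)

    cluster_of = {}

    def assign(i):
        if i == len(groups):
            return True
        g = groups[i]
        opened = max(cluster_of.values(), default=-1)
        # Try clusters already in use, plus at most one new cluster.
        for c in range(min(opened + 2, k)):
            if all(cluster_of.get(b if a == g else a) != c
                   for a, b in conflicts if g in (a, b)):
                cluster_of[g] = c
                if assign(i + 1):
                    return True
                del cluster_of[g]
        return False

    if not assign(0):
        return None
    return {n: cluster_of[find(n)] for n in nodes}


z = {("A", "C"): "!=", ("B", "C"): "=", ("C", "D"): "!="}
print(clusters_from_edge_labels(["A", "B", "C", "D"], z, k=4))
```

The resulting y would then be handed to the temporal scheduler to obtain the cost of that cluster assignment.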
Experimental setup: Instances
• All 154,651 superblocks from SPEC 2000 integer and floating-point benchmarks
• standard benchmark suite
• consists of software packages chosen to be representative of types of
programming languages and applications
• superblocks generated by IBM's Tobey compiler when compiling the software
packages
• compilations done using Tobey's highest level of optimization
Experimental setup: Target architectures
Realistic architectures:
• not fully pipelined
• issue width not equal to number of functional units
• serializing instructions

architecture   issue width   simple int. units
1-issue        1             1
2-issue        2             1
4-issue        4             2
6-issue        6             2

complex
int. units

branch
units

floating
pt. units

1
1

memory
units

1

1

1

1

1

2

3

2
Experimental results: Temporal scheduler
Total time (hh:mm:ss) to schedule all superblocks and percentage
solved to optimality, for various time limits for solving each instance

                  1 sec.             10 sec.            1 min.              10 min.
architecture   time      %       time      %       time       %       time       %
1-issue        1:30:20   97.34   7:15:46   99.38   10:22:36   99.96   15:08:44   99.98
2-issue        3:57:13   91.83   30:53:83  93.90   108:50:01  97.18   665:31:00  97.70
4-issue        2:17:44   95.47   17:09:48  96.60   61:29:31   98.43   343:04:46  98.87
6-issue        3:04:18   93.59   25:03:44  94.76   87:04:34   97.78   511:19:14  98.29
Spatial and temporal scheduler:
Some related work
• Bottom Up Greedy (BUG) [Ellis. MIT Press '86]
• greedy heuristic algorithm
• localized clustering decisions

• Hierarchical Partitioning (RHOP) [Chu et al. PLDI '03]
• coarsening and refinement heuristic
• weights of nodes and edges updated as algorithm progresses
Experimental results:
Spatial and temporal scheduler
[Figure: average speedup (y-axis, 0.4 to 1.6) per benchmark for the 4-cluster-2-issue-2-cyl configuration, comparing rhop-ls, rhop-opt, and cp]
Experimental results:
Spatial and temporal scheduler
[Figure: average speedup (y-axis, 0.6 to 3) for applu-2-cyl by architecture configuration (#clusters – issue width, from 1–1 to 8–6), comparing rhop-ls, rhop-opt, and cp]
Lessons learned (I)
• Pick problem carefully
• is a new solution needed?
• what is the likelihood of success?

• Existing heuristics may not leave any room for improvement
• examples: basic block scheduling, instruction selection
Lessons learned (II)
• Be prepared for adversity
• significant overhead
• learning domain of application

• significant implementation
• significant engineering

• different research cultures
• researchers are tribal
• different standards of reviewing (number & contentiousness)
• different standards of evaluation, formalization, assumptions
Lessons learned (III)
• Rewards
• can be attractive to students
• can lead to identifying and solving interesting sub-problems whose solutions have
general applicability
• bounds consistency for alldifferent and gcc global constraints
• restarts and portfolios
• machine learning of heuristics
Selected publications
• Applications
A. M. Malik, M. Chase, T. Russell, and P. van Beek. An application of constraint programming to superblock
instruction scheduling. CP-2008.
M. Beg and P. van Beek. A constraint programming approach for integrated spatial and temporal scheduling for
clustered architectures. ACM TECS, To appear.

• Global constraints
C.-G. Quimper, P. van Beek, A. Lopez-Ortiz, A. Golynski, and S. Bashir Sadjad. An efficient bounds consistency
algorithm for the global cardinality constraint. CP-2003.
A. Lopez-Ortiz, C.-G. Quimper, J. Tromp, and P. van Beek. A fast and simple algorithm for bounds consistency of
the alldifferent constraint. IJCAI-2003.

• Portfolios and restarts
H. Wu and P. van Beek. On portfolios for backtracking search in the presence of deadlines. ICTAI-2007.
H. Wu and P. van Beek. On universal restart strategies for backtracking search. CP-2007.

• Heuristics and machine learning
T. Russell, A. M. Malik, M. Chase, and P. van Beek. Learning heuristics for the superblock instruction scheduling
problem. IEEE TKDE, 2009.
M. Chase, A. M. Malik, T. Russell, R. W. Oldford, and P. van Beek. A computational study of heuristic and exact
techniques for superblock instruction scheduling. J. of Scheduling, 2012.
Next project:
Smart water infrastructure / water analytics
Spatial and temporal scheduler:
Search tree of basic model
[Figure: search tree that branches on the cluster variables yA, yB, yC, yD in turn, each over clusters 0 to 3, for a small DAG over A, B, C, D; at each leaf, find temporal schedule for that assignment, e.g., y = (0, 0, 0, 2)]
Spatial and temporal scheduler:
Search tree of improved model
[Figure: search tree that branches on the edge variables zAC, zBC, zCD, each '=' or '≠', for a small DAG over A, B, C, D]

at each leaf: determine y, find temporal schedule
• for y = (0,0,0,0), same as y = (1,1,1,1) etc.
• for y = (0,1,1,0), same as y = (2,3,3,2), y = (0,2,2,3) etc.
Instruction Selection
[Figure: an expression DAG built from +f32 and *f32 nodes over operands X, Y, Z; the available tiles (+f32 and *f32 patterns over rf32 register operands); and two alternative tilings of the DAG as output]
Instruction Selection
• Given
• an expression DAG G
• a set of tiles representing machine instructions

• Find a mapping of tiles to nodes in G of minimal cost (size) that covers G
• Complexity:
• polynomial for trees
• NP-hard for DAGs
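The polynomial tree case can be sketched with a small dynamic program. The tile set here is hypothetical: one tile per single operator, plus a fused multiply-add tile covering add(mul(a, b), c), each with cost 1. Expression trees are nested tuples and leaves are operand names assumed to be in registers already.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def min_cost(node):
    """Minimum number of tiles needed to cover the expression tree rooted at node."""
    if isinstance(node, str):  # operand already in a register
        return 0
    op, left, right = node
    # Option 1: cover the root with a single-operator tile.
    best = 1 + min_cost(left) + min_cost(right)
    # Option 2: cover add(mul(a, b), c) with one fused multiply-add tile.
    if op == "add" and isinstance(left, tuple) and left[0] == "mul":
        _, a, b = left
        best = min(best, 1 + min_cost(a) + min_cost(b) + min_cost(right))
    return best


tree = ("add", ("mul", "X", "Y"), "Z")
print(min_cost(tree))
```

Each subtree is solved once, so the work is linear in the size of the tree for a fixed tile set. On a DAG a shared subexpression may be covered differently by each of its users, which is where this simple recursion stops being sufficient.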
Experimental evaluation
[Figure: code size (KB, y-axis 0 to 90) per benchmark, comparing Burg, DP, and CP]

writing about opinions about Australia the moviewriting about opinions about Australia the movie
writing about opinions about Australia the movie
 
PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.
 
Cognitive Development Adolescence Psychology
Cognitive Development Adolescence PsychologyCognitive Development Adolescence Psychology
Cognitive Development Adolescence Psychology
 
Hindi varnamala | hindi alphabet PPT.pdf
Hindi varnamala | hindi alphabet PPT.pdfHindi varnamala | hindi alphabet PPT.pdf
Hindi varnamala | hindi alphabet PPT.pdf
 
Film vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movieFilm vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movie
 
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UPLAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP
 
A Independência da América Espanhola LAPBOOK.pdf
A Independência da América Espanhola LAPBOOK.pdfA Independência da América Espanhola LAPBOOK.pdf
A Independência da América Espanhola LAPBOOK.pdf
 
Liberal Approach to the Study of Indian Politics.pdf
Liberal Approach to the Study of Indian Politics.pdfLiberal Approach to the Study of Indian Politics.pdf
Liberal Approach to the Study of Indian Politics.pdf
 
The simplified electron and muon model, Oscillating Spacetime: The Foundation...
The simplified electron and muon model, Oscillating Spacetime: The Foundation...The simplified electron and muon model, Oscillating Spacetime: The Foundation...
The simplified electron and muon model, Oscillating Spacetime: The Foundation...
 
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptxC1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
 
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdfবাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
 
The History of Stoke Newington Street Names
The History of Stoke Newington Street NamesThe History of Stoke Newington Street Names
The History of Stoke Newington Street Names
 

Constraint Programming in Compiler Optimization: Lessons Learned

  • 1. Constraint Programming in Compiler Optimization: Lessons Learned Peter van Beek University of Waterloo
  • 2. Acknowledgements • Joint work with: Omer Beg, Alejandro López-Ortiz, Abid Malik, Jim McInnes, Wayne Oldford, Claude-Guy Quimper, John Tromp, Kent Wilken, Huayue Wu • Funding: NSERC, IBM Canada
  • 3. Application-driven research • Idea: • pick an application—a real-world problem—where, if you solve it, there would be a significant impact • Along the way, if all goes well, you will also: • identify and fill gaps in theory • identify and solve interesting sub-problems whose solutions will have general applicability
  • 4. Optimization problems in compilers • Instruction selection • Instruction scheduling • basic-block scheduling • super-block scheduling • loop scheduling: tiling, unrolling, fusion • Memory hierarchy optimizations • Register allocation
  • 5. Optimization problems in compilers • Instruction selection • Instruction scheduling • basic-block scheduling • super-block scheduling • loop scheduling: tiling, unrolling, fusion • Memory hierarchy optimizations • Register allocation
  • 6. Production compilers “At the outset, note that basic-block scheduling is an NP-hard problem, even with a very simple formulation of the problem, so we must seek an effective heuristic, rather than an exact approach.” Steven Muchnick, Advanced Compiler Design & Implementation, 1997
  • 7. Outline • Introduction • computer architecture • superblock scheduling • Constraint programming approach • temporal scheduler • spatial and temporal scheduler • Experiments • experimental setup • experimental results • Lessons learned
  • 8. Computer architecture: Performing instructions in parallel • Multiple-issue • multiple functional units; e.g., ALUs, FPUs, load/store units, branch units • multiple instructions can be issued (begin execution) each clock cycle • issue width: max number of instructions that can be issued each clock cycle • on most architectures issue width less than number of functional units
  • 9. Computer architecture: Performing instructions in parallel • Pipelining • overlap execution of instructions on a single functional unit • latency of an instruction: number of cycles before result is available • execution time of an instruction: number of cycles before next instruction can be issued on same functional unit • serializing instruction: instruction that requires exclusive use of entire processor in cycle in which it is issued • Analogy: vehicle assembly line
  • 10. Superblock instruction scheduling • Instruction scheduling • assignment of a clock cycle to each instruction • needed to take advantage of complex features of architecture • sometimes necessary for correctness (VLIW) • Basic block • straight-line sequence of code with single entry, single exit • Superblock • collection of basic blocks with a unique entrance but multiple exits • Given a target architecture, find schedule with minimum expected completion time
  • 11. Example superblock • dependency DAG • nodes • one for each instruction • labeled with execution time • nodes F and G are branch instructions, labeled with probability the exit is taken • arcs • represent precedence • labeled with latencies [Figure: dependency DAG with nodes A:1, B:3, C:1, D:1, E:1, F:1 (exit taken 40%), G:1 (exit taken 60%)]
  • 12. Example superblock • optimal cost schedule for 2-issue processor [Figure: the dependency DAG from the previous slide next to a schedule of instructions A to G over cycles 1 to 10, with one ALU column and one FPU column]
  • 13. Computer architecture: General purpose architectures [Figure: a processor with one register file and functional units f, i, b, m]
  • 14. Computer architecture: Clustered architectures [Figure: four clusters, cluster 0 to cluster 3, joined by a cluster interconnect through c0 to c3; each cluster has its own register file and functional units f, i, b, m]
  • 15. Computer architecture: Clustered architectures • Current: digital signal processing • multimedia, audio processing, image processing • wireless, ADSL modems, … • Future trend: general purpose multi-core processors • large numbers of cores • fast inter-processor communication
  • 16. Spatial and temporal scheduling [Figure: dependency DAG over instructions A to H with exit probabilities 20% and 80%; schedule over cycles 1 to 10 on one cluster (c0): cost = 9.8; schedule on two clusters (c0, c1): cost = 7.6]
  • 17. Spatial and temporal scheduling [Figure: the same dependency DAG with the two-cluster schedule (c0, c1), cost = 7.6]
  • 18. Approaches • Superblock instruction scheduling is NP-complete • Heuristic approaches in all commercial and open-source research compilers • greedy list scheduling algorithm coupled with a priority heuristic • Here: Optimal approach • useful when longer compile times are tolerable • e.g., compiling for software libraries, digital signal processing, embedded applications, final production build
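The heuristic baseline named on this slide, greedy list scheduling with a priority heuristic, can be sketched as follows. This is a minimal illustration, not the scheduler of any particular compiler: the priority used here (longest latency-weighted path to a sink) is an assumed choice, functional-unit types are ignored, and the small DAG in the usage lines is hypothetical.

```python
from functools import lru_cache

def list_schedule(latency_edges, instructions, issue_width):
    """Greedy list scheduling: in each cycle, issue up to `issue_width`
    ready instructions, highest priority first."""
    succs = {i: [] for i in instructions}
    preds = {i: [] for i in instructions}
    for (u, v), lat in latency_edges.items():
        succs[u].append((v, lat))
        preds[v].append((u, lat))

    @lru_cache(maxsize=None)
    def priority(i):
        # Assumed heuristic: longest latency-weighted path from i to a sink.
        return max((lat + priority(v) for v, lat in succs[i]), default=0)

    schedule = {}
    cycle = 1
    while len(schedule) < len(instructions):
        # Ready: every predecessor was issued in an earlier cycle
        # and its latency has elapsed.
        ready = [i for i in instructions
                 if i not in schedule
                 and all(u in schedule and schedule[u] + lat <= cycle
                         for u, lat in preds[i])]
        ready.sort(key=lambda i: (-priority(i), i))
        for i in ready[:issue_width]:
            schedule[i] = cycle
        cycle += 1
    return schedule

# Hypothetical four-instruction DAG (arc -> latency).
edges = {("A", "B"): 1, ("A", "C"): 1, ("B", "D"): 5, ("C", "D"): 5}
one_issue = list_schedule(edges, ["A", "B", "C", "D"], issue_width=1)
two_issue = list_schedule(edges, ["A", "B", "C", "D"], issue_width=2)
```

A greedy pass like this runs quickly but gives no optimality guarantee, which is the gap the optimal approach on this slide targets.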
  • 19. Outline • Introduction • computer architecture • superblock scheduling • Constraint programming approach • temporal scheduler • spatial and temporal scheduler • Experiments • experimental setup • experimental results • Lessons learned
  • 20. Temporal scheduler: Basic constraint model • variables: A, B, C, D, E, F, G • domains: {1, …, m} • constraints: B ≥ A + 1, C ≥ A + 1, D ≥ B + 5, …, G ≥ F; gcc(A, B, C, F, G, nALU); gcc(D, E, nFPU); gcc(A, …, G, issuewidth) • cost function: 40 F + 60 G [Figure: the example superblock DAG, with exit probabilities 40% at F and 60% at G]
  • 21. Temporal scheduler: Basic constraint model (cont'd) • non-fully pipelined instructions • introduce auxiliary variables P_{B,1}, P_{B,2} • introduce additional constraints: B + 1 = P_{B,1}; B + 2 = P_{B,2}; gcc(A, B, P_{B,1}, P_{B,2}, C, F, G, nALU) • serializing instructions • similar technique [Figure: node B with execution time 3]
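To make the basic model concrete, the sketch below solves a small instance by plain backtracking. It is an illustration under stated assumptions, not the solver described in the talk: only the constraints B ≥ A + 1 and D ≥ B + 5 are legible on the slide, so the remaining arc latencies are hypothetical; the gcc constraints are modelled simply as "at most nALU ALU instructions, nFPU FPU instructions, and issuewidth instructions per cycle", with nALU = nFPU = 1 and issuewidth = 2 assumed.

```python
ORDER = ["A", "B", "C", "D", "E", "F", "G"]   # a topological order of the DAG

# Arc latencies. Hypothetical apart from A->B (1) and B->D (5).
LATENCY = {("A", "B"): 1, ("A", "C"): 1, ("B", "D"): 5, ("C", "D"): 5,
           ("D", "E"): 2, ("D", "F"): 2, ("E", "G"): 0, ("F", "G"): 0}

UNIT = {"A": "ALU", "B": "ALU", "C": "ALU", "F": "ALU", "G": "ALU",
        "D": "FPU", "E": "FPU"}
CAPACITY = {"ALU": 1, "FPU": 1}               # assumed nALU and nFPU
ISSUE_WIDTH = 2                               # assumed issue width
WEIGHT = {"F": 40, "G": 60}                   # cost function 40 F + 60 G

def solve(m):
    """Return (cost, schedule) minimising the cost over cycles 1..m,
    or (None, None) if no schedule fits."""
    best = (None, None)

    def extend(k, sched):
        nonlocal best
        if k == len(ORDER):
            cost = sum(w * sched[i] for i, w in WEIGHT.items())
            if best[0] is None or cost < best[0]:
                best = (cost, dict(sched))
            return
        v = ORDER[k]
        # Latency constraints: v >= u + latency(u, v) for every predecessor u.
        earliest = max([sched[u] + lat
                        for (u, t), lat in LATENCY.items() if t == v],
                       default=1)
        for cycle in range(earliest, m + 1):
            same_cycle = [i for i in sched if sched[i] == cycle]
            same_unit = [i for i in same_cycle if UNIT[i] == UNIT[v]]
            # Per-cycle capacity checks standing in for the gcc constraints.
            if (len(same_cycle) < ISSUE_WIDTH
                    and len(same_unit) < CAPACITY[UNIT[v]]):
                sched[v] = cycle
                extend(k + 1, sched)
                del sched[v]

    extend(0, {})
    return best

cost, schedule = solve(m=11)
```

On this hypothetical instance the search returns cost 1060 with F issued in cycle 10 and G in cycle 11. Exhaustive backtracking like this only scales to toy instances, which is why the later slides add stronger propagation and model improvements.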
  • 22. Temporal scheduler: Improving the model • Add constraints to increase constraint propagation (e.g., Smith 2006) • implied constraints: do not change set of solutions • dominance constraints: preserve an optimal solution • Here: • many constraints added to constraint model in extensive preprocessing stage that occurs once • extensive preprocessing effort pays off as model is solved many times
  • 23. Temporal scheduler: Improving the solver • From optimization to satisfaction • find bounds on cost function • enumerate solutions to cost function (knapsack constraint; Trick 2001) • step through in increasing order of cost • Improved bounds consistency algorithm for gcc constraints • Use portfolio to improve performance (Gomes et al. 1997) • increasing levels of constraint propagation • Impact-based variable ordering (Refalo 2004) • Structure-based decomposition technique (Freuder 1994)
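The "from optimization to satisfaction" idea above can be sketched as a loop that proposes exit-cycle assignments in increasing order of cost and stops at the first one for which the rest of the model is satisfiable. This is a simplified illustration: candidates are enumerated by brute force rather than through the knapsack constraint cited on the slide, and `is_satisfiable` is a hypothetical stand-in for a call to the constraint solver.

```python
from itertools import product

def exit_assignments_by_cost(weights, lower, upper):
    """Yield (cost, assignment) for every assignment of cycles to the exit
    instructions within [lower, upper], in increasing order of cost."""
    names = sorted(weights)
    ranges = [range(lower[n], upper[n] + 1) for n in names]
    candidates = []
    for cycles in product(*ranges):
        cost = sum(weights[n] * c for n, c in zip(names, cycles))
        candidates.append((cost, cycles))
    for cost, cycles in sorted(candidates):
        yield cost, dict(zip(names, cycles))

def minimize_by_satisfaction(weights, lower, upper, is_satisfiable):
    """Return the first (cost, assignment) accepted by is_satisfiable.
    Because candidates arrive cheapest first, that assignment is optimal."""
    for cost, assignment in exit_assignments_by_cost(weights, lower, upper):
        if is_satisfiable(assignment):
            return cost, assignment
    return None

# Toy feasibility check standing in for the full constraint model.
result = minimize_by_satisfaction(
    weights={"F": 40, "G": 60},
    lower={"F": 9, "G": 9},
    upper={"F": 12, "G": 12},
    is_satisfiable=lambda a: a["F"] >= 10 and a["G"] > a["F"],
)
```

Each satisfaction problem fixes the exit cycles, so the solver only has to decide feasibility, and the first feasible answer needs no further proof of optimality.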
  • 24. Spatial and temporal scheduler: Basic constraint model • variables • cycle of issue: xA, xB, …, xH • cluster: yA, yB, …, yH • domains • dom(x) = {1, …, m} • dom(y) = {0, …, k−1} • communication constraints • yA ≠ yC → xC ≥ xA + 1 + cost • yA = yC → xC ≥ xA + 1 • … • cost function: 80 xH + 20 xG [Figure: dependency DAG over instructions A to H, with exit probabilities 20% at G and 80% at H]
  • 25. Spatial and temporal scheduler: Improving the model • Symmetry breaking • add auxiliary variables: zAC, zBC, … • dom(z) = {'=', '≠'} • instead of backtracking on the y's, backtrack on the edges with z's • preserves at least one optimal solution [Figure: DAG over nodes A, B, C, D with edges A→C, B→C, C→D]
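The communication constraints can be read as a single rule per arc: the consumer waits for the producer's latency, plus the inter-cluster communication cost when the two instructions sit on different clusters. The check below is a minimal sketch of that rule; the function name, the dictionary encoding, and the example values are assumptions made for illustration.

```python
def satisfies_communication(x, y, latency, comm_cost):
    """x: instruction -> issue cycle, y: instruction -> cluster,
    latency: (producer, consumer) -> latency in cycles,
    comm_cost: extra cycles when producer and consumer are on different clusters."""
    return all(
        x[v] >= x[u] + lat + (comm_cost if y[u] != y[v] else 0)
        for (u, v), lat in latency.items()
    )

latency = {("A", "C"): 1}                      # the arc used on the slide
same_cluster = satisfies_communication(
    {"A": 1, "C": 2}, {"A": 0, "C": 0}, latency, comm_cost=1)
split_too_early = satisfies_communication(
    {"A": 1, "C": 2}, {"A": 0, "C": 1}, latency, comm_cost=1)
split_delayed = satisfies_communication(
    {"A": 1, "C": 3}, {"A": 0, "C": 1}, latency, comm_cost=1)
```

With A in cycle 1, C may issue in cycle 2 on the same cluster but must wait until cycle 3 on a different one.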
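The effect of branching on the z variables can be illustrated by grouping cluster assignments according to which edge endpoints share a cluster. The sketch below assumes the three edges named by the z variables in this deck (zAC, zBC, zCD) and k = 4 clusters. It only counts the equivalence classes; it does not implement the search.

```python
from itertools import product

NODES = ["A", "B", "C", "D"]
EDGES = [("A", "C"), ("B", "C"), ("C", "D")]   # edges carrying zAC, zBC, zCD

def z_signature(y):
    """Map a cluster assignment y to its edge labels: '=' when both
    endpoints share a cluster, '≠' otherwise."""
    return tuple("=" if y[u] == y[v] else "≠" for u, v in EDGES)

def group_by_signature(k):
    """Group all k**len(NODES) cluster assignments by their z signature."""
    groups = {}
    for clusters in product(range(k), repeat=len(NODES)):
        y = dict(zip(NODES, clusters))
        groups.setdefault(z_signature(y), []).append(clusters)
    return groups

groups = group_by_signature(k=4)
num_assignments = sum(len(g) for g in groups.values())   # 4**4 = 256
num_signatures = len(groups)                             # 2**3 = 8
```

Branching on the z variables therefore explores 8 cases here instead of 256, with one representative y derived per case, assuming that only the same-or-different relation on each edge affects the schedule cost.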
  • 26. Spatial and temporal scheduler: Improving the solver • Preprocess DAG to find instructions which must be on same cluster • preserve an optimal solution • Variable ordering • assign z variables first, in breadth-first order of DAG • determine assignment for corresponding y variables • determine cost of temporal schedule for these assignments
  • 27. Outline • Introduction • computer architecture • superblock scheduling • Constraint programming approach • temporal scheduler • spatial and temporal scheduler • Experiments • experimental setup • experimental results • Lessons learned
  • 28. Experimental setup: Instances • All 154,651 superblocks from SPEC 2000 integer and floating pt. benchmarks • standard benchmark suite • consists of software packages chosen to be representative of types of programming languages and applications • superblocks generated by IBM's Tobey compiler when compiling the software packages • compilations done using Tobey's highest level of optimization
  • 29. Experimental setup: Target architectures Realistic architectures: • not fully pipelined • issue width not equal to number of functional units • serializing instructions architecture issue width simple int. units 1-issue 1 1 2-issue 2 1 4-issue 4 2 6-issue 6 2 complex int. units branch units floating pt. units 1 1 memory units 1 1 1 1 1 2 3 2
  • 30. Experimental results: Temporal scheduler • Total time (hh:mm:ss) to schedule all superblocks and percentage solved to optimality, for various time limits for solving each instance
      architecture | 1 sec.: time, % | 10 sec.: time, % | 1 min.: time, % | 10 min.: time, %
      1-issue | 1:30:20, 97.34 | 7:15:46, 99.38 | 10:22:36, 99.96 | 15:08:44, 99.98
      2-issue | 3:57:13, 91.83 | 30:53:83, 93.90 | 108:50:01, 97.18 | 665:31:00, 97.70
      4-issue | 2:17:44, 95.47 | 17:09:48, 96.60 | 61:29:31, 98.43 | 343:04:46, 98.87
      6-issue | 3:04:18, 93.59 | 25:03:44, 94.76 | 87:04:34, 97.78 | 511:19:14, 98.29
  • 31. Spatial and temporal scheduler: Some related work • Bottom Up Greedy (BUG) [Ellis. MIT Press '86] • greedy heuristic algorithm • localized clustering decisions • Hierarchical Partitioning (RHOP) [Chu et al. PLDI '03] • coarsening and refinement heuristic • weights of nodes and edges updated as algorithm progresses
  • 32. Experimental results: Spatial and temporal scheduler [Figure: average speedup per benchmark for the 4-cluster-2-issue-2-cyl configuration, comparing rhop-ls, rhop-opt, and cp]
  • 33. Experimental results: Spatial and temporal scheduler [Figure: average speedup for applu-2-cyl by architecture configuration (#Clusters – IssueWidth), from 1-1 to 8-6, comparing rhop-ls, rhop-opt, and cp]
  • 34. Outline • Introduction • computer architecture • superblock scheduling • Constraint programming approach • temporal scheduler • spatial and temporal scheduler • Experiments • experimental setup • experimental results • Lessons learned
  • 35. Lessons learned (I) • Pick problem carefully • is a new solution needed? • what is the likelihood of success? • Existing heuristics may not leave any room for improvement • examples: basic block scheduling, instruction selection
  • 36. Lessons learned (II) • Be prepared for adversity • significant overhead • learning domain of application • significant implementation • significant engineering • different research cultures • researchers are tribal • different standards of reviewing (number & contentiousness) • different standards of evaluation, formalization, assumptions
  • 37. Lessons learned (III) • Rewards • can be attractive to students • can lead to identifying and solving interesting sub-problems whose solutions have general applicability • bounds consistency for alldifferent and gcc global constraints • restarts and portfolios • machine learning of heuristics
  • 38. Optimization problems in compilers • Instruction selection • Instruction scheduling • basic-block scheduling • super-block scheduling • loop scheduling: tiling, unrolling, fusion • Memory hierarchy optimizations • Register allocation
  • 39. Selected publications • Applications A. M. Malik, M. Chase, T. Russell, and P. van Beek. An application of constraint programming to superblock instruction scheduling. CP-2008. M. Beg and P. van Beek. A constraint programming approach for integrated spatial and temporal scheduling for clustered architectures. ACM TECS, To appear. • Global constraints C.-G. Quimper, P. van Beek, A. Lopez-Ortiz, A. Golynski, and S. Bashir Sadjad. An efficient bounds consistency algorithm for the global cardinality constraint. CP-2003. A. Lopez-Ortiz, C.-G. Quimper, J. Tromp, and P. van Beek. A fast and simple algorithm for bounds consistency of the alldifferent constraint. IJCAI-2003. • Portfolios and restarts H. Wu and P. van Beek. On portfolios for backtracking search in the presence of deadlines. ICTAI-2007. H. Wu and P. van Beek. On universal restart strategies for backtracking search. CP-2007. • Heuristics and machine learning T. Russell, A. M. Malik, M. Chase, and P. van Beek. Learning heuristics for the superblock instruction scheduling problem. IEEE TKDE, 2009. M. Chase, A. M. Malik, T. Russell, R. W. Oldford, and P. van Beek. A computational study of heuristic and exact techniques for superblock instruction scheduling. J. of Scheduling, 2012.
  • 40. Next project: Smart water infrastructure / water analytics
  • 41. Spatial and temporal scheduler: Search tree of basic model [Figure: DAG over nodes A, B, C, D and a search tree branching on yA, yB, yC, yD over clusters 0 to 3; at each leaf, find temporal schedule, e.g., for y = (0, 0, 0, 2)]
  • 42. Spatial and temporal scheduler: Search tree of improved model [Figure: DAG over nodes A, B, C, D and a search tree branching on zAC, zBC, zCD over {'=', '≠'}; at each leaf, determine y and find temporal schedule: y = (0,0,0,0) is the same as y = (1,1,1,1), etc.; y = (0,1,1,0) is the same as y = (2,3,3,2), y = (0,2,2,3), etc.]
  • 44. Instruction Selection • Given • an expression DAG G • a set of tiles representing machine instructions • Find a mapping of tiles to nodes in G of minimal cost (size) that covers G • Complexity: • polynomial for trees • NP-hard for DAGs
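For the tree case, one standard way to obtain a polynomial-time algorithm is bottom-up dynamic programming over subtrees: the cheapest cover of a node is the cheapest tile matching at that node plus the cheapest covers of the subtrees the tile leaves uncovered. The sketch below illustrates this under assumptions; the tile set, the costs, and the tuple encoding of expressions are hypothetical and do not come from the slides.

```python
from functools import lru_cache

# A tree is a leaf name (str) or a tuple (op, child, ...).
# A tile is (pattern, cost); "_" in a pattern matches any subtree,
# which must then be covered by further tiles.
TILES = (
    ("reg", 0),
    ("const", 1),
    (("add", "_", "_"), 1),
    (("mul", "_", "_"), 2),
    (("add", "_", ("mul", "_", "_")), 2),   # hypothetical fused multiply-add
)

def match(pattern, tree, holes):
    """Return True if pattern matches at the root of tree, appending the
    subtrees matched by "_" to holes."""
    if pattern == "_":
        holes.append(tree)
        return True
    if isinstance(pattern, str) or isinstance(tree, str):
        return pattern == tree
    return (pattern[0] == tree[0] and len(pattern) == len(tree)
            and all(match(p, t, holes)
                    for p, t in zip(pattern[1:], tree[1:])))

@lru_cache(maxsize=None)
def min_cover_cost(tree):
    """Minimum total tile cost needed to cover tree (inf if impossible)."""
    best = float("inf")
    for pattern, cost in TILES:
        holes = []
        if match(pattern, tree, holes):
            best = min(best, cost + sum(min_cover_cost(h) for h in holes))
    return best

expr = ("add", "reg", ("mul", "reg", "const"))
cheapest = min_cover_cost(expr)
```

Here the fused tile covers the add and the mul together for a total of 3, against 4 when the two operators are tiled separately. On a DAG, shared subexpressions break the independence between subtrees that this recurrence relies on, which is consistent with the NP-hardness noted on the slide.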