INTRO TO COMPILER
DEVELOPMENT
LOGAN CHIEN
http://slide.logan.tw/compiler-intro/
LOGAN CHIEN
Software engineer at MediaTek
Amateur LLVM/Clang developer
Integrated LLVM/Clang into Android NDK
WHY DO I LOVE COMPILERS?
I have been a faithful reader of Jserv's blog for ten years.
I was inspired by the compiler and the virtual machine
technologies mentioned in his blog.
Since then, compiler technologies have been my research
topic.
UNDERGRADUATE COMPILER
COURSE
I took the undergraduate compiler course when I was a
sophomore.
MY PROFESSOR SAID ...
“We spent many lectures discussing the parser.
However, when people say they are doing compiler
research, they are most likely not referring to
parsing techniques.”
I AM HERE TO ...
Re-introduce the compiler technologies,
Give a lightning talk on industrial-strength compiler
design,
Explain the connection between compiler technologies and
the industry.
AGENDA
Re-introduction to Compiler (30min)
Industrial-strength Compiler Design (90min)
Compiler and ICT Industry (20min)
RE-INTRODUCTION TO
COMPILER
WHAT IS A COMPILER?
Compilers are tools that translate the programmer's
thoughts into programs that a computer can run.
ANALOGY — Translators who translate from one language to
another, such as those who translate Chinese to English.
WHAT HAVE WE LEARNED IN
UNDERGRADUATE COMPILER
COURSE?
LEXER
Reads the input source code (as a sequence of bytes) and
converts them into a stream of tokens.
unsigned background(unsigned foreground) {
  if ((foreground >> 16) > 0x80) {
    return 0;
  } else {
    return 0xffffff;
  }
}
unsigned background ( unsigned foreground ) { if ( ( foreground >>
16 ) > 128 ) { return 0 ; } else { return 16777215 ; } }
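A minimal sketch of such a lexer in C, for illustration only. The names token_kind, next_token, and count_tokens are hypothetical and do not come from the talk. The sketch treats keywords such as if as identifiers and keeps numbers as text instead of converting 0x80 to 128.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum token_kind { TOK_IDENT, TOK_NUMBER, TOK_PUNCT, TOK_END };

struct token {
    enum token_kind kind;
    const char *start;  /* first character of the token */
    size_t length;      /* number of characters in the token */
};

/* Reads one token starting at *cursor and advances *cursor past it. */
struct token next_token(const char **cursor) {
    assert(cursor != NULL && *cursor != NULL);
    const char *p = *cursor;
    while (isspace((unsigned char)*p)) ++p;  /* skip whitespace */

    struct token tok = {TOK_END, p, 0};
    if (*p == '\0') {
        /* End of input: return TOK_END with length 0. */
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        tok.kind = TOK_IDENT;  /* identifiers and keywords */
        while (isalnum((unsigned char)*p) || *p == '_') ++p;
    } else if (isdigit((unsigned char)*p)) {
        tok.kind = TOK_NUMBER;  /* decimal or 0x-prefixed literal */
        while (isalnum((unsigned char)*p)) ++p;
    } else {
        tok.kind = TOK_PUNCT;
        if (p[0] == '>' && p[1] == '>') p += 2;  /* ">>" is one token */
        else ++p;
    }
    tok.length = (size_t)(p - tok.start);
    *cursor = p;
    return tok;
}

/* Counts the tokens in a NUL-terminated source string. */
int count_tokens(const char *source) {
    int count = 0;
    while (next_token(&source).kind != TOK_END) ++count;
    return count;
}
```

A production lexer would also classify keywords, handle comments and string literals, and report invalid characters.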
PARSER
Reads the tokens and builds an AST according to the syntax.
unsigned background ( unsigned foreground ) { if ( ( foreground >>
16 ) > 128 ) { return 0 ; } else { return 16777215 ; } }
(procedure background
  (args '(foreground))
  (compound-stmt
    (if-stmt
      (bin-expr GT (bin-expr RSHIFT foreground 16) 128)
      (return-stmt 0)
      (return-stmt 16777215))))
CODE GENERATOR
Generates the machine code (or assembly) according to the
AST. In the undergraduate course, we usually just do
syntax-directed translation.
(procedure background
  (args '(foreground))
  (compound-stmt
    (if-stmt
      (bin-expr GT (bin-expr RSHIFT foreground 16) 128)
      (return-stmt 0)
      (return-stmt 16777215))))

  lsr w8, w0, #16
  cmp w8, #128
  b.ls .Lelse
  mov w0, wzr
  ret
.Lelse:
  orr w0, wzr, #0xffffff
  ret
WHAT'S MISSING?
Can a person who can only lex and parse sentences translate
articles well?
COMPILER REQUIREMENTS
A compiler should translate the source code precisely.
A compiler should utilize the device efficiently.
THREE RELATED FIELDS
Programming Language
Computer Architecture
Compiler
PROGRAMMING LANGUAGE THEORY
Essential components of a programming language: type
theory, variable scoping, language semantics, etc.
How do people reason about and compose a program?
Create an abstraction that is understandable to humans and
tractable for computers.
EXAMPLE: SUBTYPE AND MUTABLE
RECORDS
Why can't you perform the following conversion in C++?
void test(int *ptr) {
  int **p = &ptr;
  const int **a = p;  // Compiler rejects this conversion
  // ...
}
This is related to covariance and contravariance of types. From PLT, we know that we can only choose
two of (a) covariant types, (b) mutable records, and (c) type consistency.
void test(int *ptr) {
  const int c = 0;
  int **p = &ptr;
  const int **a = p;  // If it were allowed, bad programs would pass.
  *a = &c;            // ptr now points to the const object c.
  **p = 5;            // No warning here, yet it writes to c.
}
EXAMPLE: DYNAMIC SCOPING IN
BASH
#!/bin/sh
v=1  # Initialize v with 1
foo() {
  echo "foo:v=${v}"  # Which v is referred to?
  v=2                # Which v is assigned?
}
bar() {
  local v=3
  foo
  echo "bar:v=${v}"  # What will be printed?
}
v=4  # Assign 4 to v
bar
echo "v=${v}"  # What will be printed?
Ans: foo:v=3, bar:v=2, v=4. Surprisingly, foo is
accessing the local v in bar instead of the global v.
EXAMPLE: NON-LEXICAL SCOPING IN
JAVASCRIPT
// JavaScript, the bad part
function bad(v) {
  var sum;
  with (v) {
    sum = a + b;
  }
  return sum;
}
console.log(bad({a: 5, b: 10}));
console.log(bad({a: 5, b: 10, sum: 100}));
Ans: The second console.log() prints undefined, because inside the with block sum resolves to v.sum instead of the local variable.
COMPUTER ARCHITECTURE
Instruction set architecture: CISC vs. RISC.
Out-of-Order Execution vs. Instruction Scheduling.
Memory hierarchy
Memory model
QUIZ: DO YOU REALLY KNOW C?
Is it guaranteed that v will always be loaded after pred?
int pred;
int v;
int get(int a, int b) {
  int res;
  if (pred > 0) {
    res = v * a - v / b;
  } else {
    res = v * a + v / b;
  }
  return res;
}
Ans: No. Independent reads/writes can be reordered. The standard only requires that the result
be the same as running from top to bottom (in a single thread).
COMPILER ANALYSIS
Data-flow analysis — Analyze value ranges, check the
conditions or constraints, figure out modifications to
variables, etc.
Control-flow analysis — Analyze the structure of the
program, such as control dependency and loop structure.
Memory dependency analysis — Analyze the memory
access patterns of array element accesses and pointer
dereferences, e.g. alias analysis.
ALIAS ANALYSIS
Determines whether two pointers can refer to the same
object or not.
void move(char *dst, const char *src, int n) {
  for (int i = 0; i < n; ++i) {
    dst[i] = src[i];
  }
}
int sum(int *ptr, const int *val, int n) {
  int res = 0;
  for (int i = 0; i < n; ++i) {
    res += *val;
    *ptr++ = 10;
  }
  return res;
}
QUIZ: DO YOU KNOW C++?
class QMutexLocker {
public:
  union {
    QMutex *mtx_;
    uintptr_t val_;
  };
  void unlock() {
    if (val_) {
      if ((val_ & (uintptr_t)1) == (uintptr_t)1) {
        val_ &= ~(uintptr_t)1;
        mtx_->unlock();
      }
    }
  }
};
Pitfall: Reading from a union field other than the one that was
most recently written results in undefined behavior. Type-Based
Alias Analysis (TBAA) exploits this rule.
COMPILER OPTIMIZATION
Scalar optimization — Fold the constants, remove the
redundancies, change expressions with identities, etc.
Vector optimization — Convert several scalar operations
into one vector operation, e.g. combining four add instructions
into one vector add.
Interprocedural optimization — function inlining,
devirtualization, cross-function analysis, etc.
OTHER COMPILER-RELATED
TECHNOLOGY
Just-in-time compilers
Binary translators
Program profiling and performance measurement
Facilities to run compiled executables, e.g. garbage
collectors
INDUSTRIAL-
STRENGTH COMPILER
DESIGN
What's the difference between your
final project and the industrial-
strength compiler?
KEY DIFFERENCE
Analysis — Reasons about program structures and changes of
values.
Optimization — Applies several provably correct
transformations which should make the program run faster.
Intermediate Representation — Data structure on which
analyses and optimizations are based.
INTERMEDIATE REPRESENTATION 1/4
A data structure for program analyses and optimizations.
High-level enough to capture important properties and
encapsulates hardware limitation.
Low-level enough to be analyzed by analyses and
manipulated by transformations.
An abstraction layer for multiple front-ends and back-ends.
INTERMEDIATE REPRESENTATION 2/4
(Figure: GCC Compiler Pipeline. Front ends: C/C++, Ada, Fortran, Go. Intermediate representations: GENERIC -> GIMPLE -> Tree SSA -> RTL. Back ends: arm, aarch64, mips, nds, x86.)
INTERMEDIATE REPRESENTATION 3/4
(Figure: LLVM Compiler Pipeline. Front ends: C/C++, Swift, Rust, Obj-C. Intermediate representations: LLVM IR -> SelDAG -> MachInstr. Back ends: arm, aarch64, mips, sparc, x86.)
INTERMEDIATE REPRESENTATION 4/4
GCC — GENERIC, GIMPLE, Tree SSA, and RTL.
LLVM — LLVM IR, Selection DAG, and Machine Instructions.
Java HotSpot — HIR, LIR, and MIR.
CONTROL FLOW GRAPH 1/3
Basic block — A sequence of instructions that will only be
entered from the top and exited from the end.
Edge — If the basic block s may branch to t, then we add a
directed edge (s, t).
Predecessor/Successor — If there is an edge (s, t), then s is a
predecessor of t and t is a successor of s.
CONTROL FLOW GRAPH 2/3
// y = a * x + b;
void matmul(double *restrict y,
            unsigned long m, unsigned long n,
            const double *restrict a,
            const double *restrict x,
            const double *restrict b) {
  for (unsigned long r = 0; r < m; ++r) {
    double t = b[r];
    for (unsigned long i = 0; i < n; ++i) {
      t += a[r * n + i] * x[i];
    }
    y[r] = t;
  }
}
Input source program
CONTROL FLOW GRAPH 3/3
B0 (ENTRY):
  r = 0
  cmp = icmp lt, r, m
  br cmp, B1, B4
B1:
  bp = getelementptr b, r
  t = load bp
  i = 0
  cmp = icmp lt, i, n
  br cmp, B2, B3
B2:
  rn = mul r, n
  ai = add rn, i
  ap = getelementptr a, ai
  ad = load ap
  xp = getelementptr x, i
  xd = load xp
  ax = mul ad, xd
  t = add t, ax
  i = add i, 1
  cmp = icmp lt, i, n
  br cmp, B2, B3
B3:
  yp = getelementptr y, r
  store t, yp
  r = add r, 1
  cmp = icmp lt, r, m
  br cmp, B1, B4
B4:
  ret void
VARIABLE DEFINITION
(Figure: the matmul CFG with the definitions of r highlighted: r = 0 in B0 and r = add r, 1 in B3.)
The place where a variable is assigned or defined.
VARIABLE USE
(Figure: the matmul CFG with the definitions and uses of r highlighted. r is used by the compare in B0, the getelementptr in B1, the mul in B2, and the getelementptr, add, and compare in B3.)
The places where a variable is referred to or used.
REACHING DEFINITION 1/2
(Figure: the matmul CFG annotated with the reaching definitions of r, where d0 is r = 0 in B0 and d1 is r = add r, 1 in B3. {d0} leaves B0, {d0, d1} reaches B1, B2, B3, and B4, and {d1} leaves B3.)
The definitions that may reach a use.
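A minimal sketch of how these sets could be computed, assuming the classic iterative data-flow formulation (IN = union of predecessor OUTs, OUT = GEN plus whatever IN does not kill). The CFG edges are those of the matmul example; the names gen_set, kill_set, and reaching_definitions are hypothetical, and only the variable r is tracked.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_BLOCKS 5

/* Bit 0 stands for d0 (r = 0 in B0), bit 1 for d1 (r = add r, 1 in B3). */
enum { D0 = 1u << 0, D1 = 1u << 1 };

/* pred[b][p] is true when the CFG has an edge Bp -> Bb. */
static const bool pred[NUM_BLOCKS][NUM_BLOCKS] = {
    /* B0 */ {0, 0, 0, 0, 0},
    /* B1 */ {1, 0, 0, 1, 0},
    /* B2 */ {0, 1, 1, 0, 0},
    /* B3 */ {0, 1, 1, 0, 0},
    /* B4 */ {1, 0, 0, 1, 0},
};

/* Definitions generated by each block, and definitions each block overwrites. */
static const unsigned gen_set[NUM_BLOCKS]  = {D0, 0, 0, D1, 0};
static const unsigned kill_set[NUM_BLOCKS] = {D1, 0, 0, D0, 0};

/* Iterates until IN and OUT stop changing (a fixed point). */
void reaching_definitions(unsigned in[NUM_BLOCKS], unsigned out[NUM_BLOCKS]) {
    for (int b = 0; b < NUM_BLOCKS; ++b) {
        in[b] = 0;
        out[b] = 0;
    }
    bool changed = true;
    while (changed) {
        changed = false;
        for (int b = 0; b < NUM_BLOCKS; ++b) {
            unsigned new_in = 0;
            for (int p = 0; p < NUM_BLOCKS; ++p) {
                if (pred[b][p]) new_in |= out[p];
            }
            unsigned new_out = gen_set[b] | (new_in & ~kill_set[b]);
            if (new_in != in[b] || new_out != out[b]) {
                in[b] = new_in;
                out[b] = new_out;
                changed = true;
            }
        }
    }
    assert(in[0] == 0);  /* nothing reaches the entry block */
}
```

Bit sets keep the sketch short; a real compiler tracks every definition in the function, usually with one bit vector per basic block.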
REACHING DEFINITION 2/2
Constant propagation is a good example to show the
usefulness of reaching definitions.
void test(int cond) {
  int a = 1;  // d0
  int b = 2;  // d1
  int c;
  if (cond) {
    c = 3;  // d2
  } else {
    // ReachDef[a] = {d0}
    // ReachDef[b] = {d1}
    c = a + b;  // d3
  }
  // ReachDef[c] = {d2, d3}
  use(c);
}
DOMINANCE RELATION 1/2
A basic block s dominates t iff every path from the
entry to t passes through s.
Every basic block in a CFG except the entry has an immediate
dominator, and the immediate dominators form a dominator tree.
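A minimal sketch of the simple iterative dominator computation, assuming the five-block CFG of the matmul example (B0 is the entry). The function names are hypothetical. Production compilers usually use faster algorithms; this version only illustrates the definition.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_BLOCKS 5
#define ALL_BLOCKS ((1u << NUM_BLOCKS) - 1)

/* pred[b][p] is true when the CFG has an edge Bp -> Bb. */
static const bool pred[NUM_BLOCKS][NUM_BLOCKS] = {
    /* B0 */ {0, 0, 0, 0, 0},
    /* B1 */ {1, 0, 0, 1, 0},
    /* B2 */ {0, 1, 1, 0, 0},
    /* B3 */ {0, 1, 1, 0, 0},
    /* B4 */ {1, 0, 0, 1, 0},
};

/* dom[b] is a bit set: bit s is set when Bs dominates Bb. */
void compute_dominators(unsigned dom[NUM_BLOCKS]) {
    dom[0] = 1u << 0;  /* the entry is dominated only by itself */
    for (int b = 1; b < NUM_BLOCKS; ++b) dom[b] = ALL_BLOCKS;

    bool changed = true;
    while (changed) {
        changed = false;
        for (int b = 1; b < NUM_BLOCKS; ++b) {
            /* A dominator of b must dominate every predecessor of b. */
            unsigned meet = ALL_BLOCKS;
            for (int p = 0; p < NUM_BLOCKS; ++p) {
                if (pred[b][p]) meet &= dom[p];
            }
            unsigned next = meet | (1u << b);
            if (next != dom[b]) {
                dom[b] = next;
                changed = true;
            }
        }
    }
}

/* Returns the immediate dominator of Bb, or -1 for the entry block.
 * The immediate dominator is the strict dominator whose own dominator
 * set equals the set of strict dominators of Bb. */
int immediate_dominator(const unsigned dom[NUM_BLOCKS], int b) {
    assert(b >= 0 && b < NUM_BLOCKS);
    unsigned strict = dom[b] & ~(1u << b);
    for (int s = 0; s < NUM_BLOCKS; ++s) {
        if (((strict >> s) & 1u) && dom[s] == strict) return s;
    }
    return -1;
}
```

For this CFG the sketch yields a dominator tree in which B0 is the parent of B1 and B4, and B1 is the parent of B2 and B3.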
DOMINANCE RELATION 2/2
(Figure: a CFG with basic blocks B0 to B4, shown next to its dominator tree.)
DOMINANCE FRONTIER
A basic block t is in the dominance frontier of a basic block s if
s dominates one of the predecessors of t but does not strictly
dominate t.
(Figure: a CFG with basic blocks B0 to B6.)
DF[B2] = {B3, B5}
DF[B4] = {B4}
DF[B1] = {B3, B5}
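A minimal sketch of one common way to compute dominance frontiers from immediate dominators: for every join block, walk up the dominator tree from each predecessor until the join block's immediate dominator is reached. The edges of the slide's seven-block CFG are not recoverable from this transcript, so the sketch assumes the five-block matmul CFG instead, with its immediate dominators given as input. All names are hypothetical.

```c
#include <assert.h>

#define NUM_BLOCKS 5
#define MAX_PREDS 2

/* Predecessor lists of the matmul CFG (B0 is the entry). */
static const int num_preds[NUM_BLOCKS] = {0, 2, 2, 2, 2};
static const int preds[NUM_BLOCKS][MAX_PREDS] = {
    /* B0 */ {-1, -1},
    /* B1 */ {0, 3},
    /* B2 */ {1, 2},
    /* B3 */ {1, 2},
    /* B4 */ {0, 3},
};

/* Immediate dominators, assumed to be computed beforehand. */
static const int idom[NUM_BLOCKS] = {-1, 0, 1, 1, 0};

/* df[s] is a bit set: bit t is set when Bt is in the frontier of Bs. */
void compute_dominance_frontiers(unsigned df[NUM_BLOCKS]) {
    for (int b = 0; b < NUM_BLOCKS; ++b) df[b] = 0;

    for (int b = 0; b < NUM_BLOCKS; ++b) {
        if (num_preds[b] < 2) continue;  /* only join blocks matter */
        for (int k = 0; k < num_preds[b]; ++k) {
            /* Every block on the path from the predecessor up to, but
             * not including, idom[b] dominates a predecessor of b
             * without strictly dominating b. */
            int runner = preds[b][k];
            while (runner != idom[b]) {
                assert(runner >= 0 && runner < NUM_BLOCKS);
                df[runner] |= 1u << b;
                runner = idom[runner];
            }
        }
    }
}
```

On the matmul CFG this gives DF[B1] = {B1, B4}, DF[B2] = {B2, B3}, and DF[B3] = {B1, B4}.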
STATIC SINGLE-ASSIGNMENT FORM
static — A static analysis of the program (not of its execution).
single-assignment — Every variable can only be assigned
once.
SSA form is currently the most popular intermediate
representation.
It is adopted by a wide range of compilers, such as GCC,
LLVM, Java HotSpot, Android ART, etc.
SSA PROPERTIES
Every variable can only be defined once.
Every use can only refer to one definition.
Use phi functions to handle merged control flow.
PHI FUNCTIONS
define void @foo(i1 cond,
                 i32 a, i32 b) {
ent:
  br cond, b1, b2
b1:
  t0 = mul a, 4
  br b3
b2:
  t1 = mul b, 5
  br b3
b3:
  t2 = phi (t0), (t1)
  use(t2)
  ret
}

define void @foo(i32 n) {
ent:
  br loop
loop:
  i0 = phi (0), (i1)
  cmp = icmp ge i0, n
  br cmp, end, cont
cont:
  use(i0)
  i1 = add i0, 1
  br loop
end:
  ret
}
ADVANTAGE OF SSA FORM
Compact — Reduces the def-use chains.
Referential transparency — The properties associated with a
variable do not change, i.e. they are context-free.
REDUCED DEF-USE CHAIN
void foo(int cond1, int cond2,
         int a, int b) {
  int t;
  if (cond1) {
    t = a * 4;  // d0
  } else {
    t = b * 5;  // d1
  }
  if (cond2) {
    // reach-def: {d0, d1}
    use(t);
  } else {
    // reach-def: {d0, d1}
    use(t);
  }
}

void foo(int cond1, int cond2,
         int a, int b) {
  if (cond1) {
    t.0 = a * 4;
  } else {
    t.1 = b * 5;
  }
  t.2 = phi(t.0, t.1);
  if (cond2) {
    use(t.2);
  } else {
    use(t.2);
  }
}
REFERENTIAL TRANSPARENCY 1/2
void foo() {
  int r = 5;  // d0
  // ... SOME CODE ...
  // We can only assume
  // "r == 5" if d0 is the
  // only reaching definition.
  use(r);
}

void foo() {
  r.0 = 5;
  // ... SOME CODE ...
  // No matter what code is
  // skipped above, it is safe
  // to replace the following r.0
  // with 5.
  use(r.0);
}
REFERENTIAL TRANSPARENCY 2/2
void foo(int a, int b) {
  int c = a + b;
  int r = a + b;
  // Can we simply replace
  // all occurrences of r with
  // c? (NO)
  // ... SOME CODE ...
  use(r);
}

void foo(int a, int b) {
  c.0 = a + b;
  r.0 = a + b;
  // ... SOME CODE ...
  // No matter what code is
  // skipped above, it is safe
  // to replace the following r.0
  // with c.0.
  use(r.0);
}
BUILDING SSA FORM
1. Compute dominance relationships between basic blocks
and build the dominator tree.
2. Compute dominance frontiers.
3. Insert phi functions at dominance frontiers.
4. Traverse the dominator tree and rename the variables.
BEFORE SSA CONSTRUCTION
B0 (ENTRY):
  r = 0
  cmp = icmp lt, r, m
  br cmp, B1, B4
B1:
  bp = getelementptr b, r
  t = load bp
  i = 0
  cmp = icmp lt, i, n
  br cmp, B2, B3
B2:
  rn = mul r, n
  ai = add rn, i
  ap = getelementptr a, ai
  ad = load ap
  xp = getelementptr x, i
  xd = load xp
  ax = mul ad, xd
  t = add t, ax
  i = add i, 1
  cmp = icmp lt, i, n
  br cmp, B2, B3
B3:
  yp = getelementptr y, r
  store t, yp
  r = add r, 1
  cmp = icmp lt, r, m
  br cmp, B1, B4
B4:
  ret void
AFTER SSA CONSTRUCTION
B0 (ENTRY):
  cmp = icmp lt, 0, m
  br cmp, B1, B4
B1:
  r.0 = phi (0, B0) (r.1, B3)
  bp = getelementptr b, r.0
  t.0 = load bp
  cmp = icmp lt, 0, n
  br cmp, B2, B3
B2:
  t.1 = phi (t.0, B1) (t.2, B2)
  i.0 = phi (0, B1) (i.1, B2)
  rn = mul r.0, n
  ai = add rn, i.0
  ap = getelementptr a, ai
  ad = load ap
  xp = getelementptr x, i.0
  xd = load xp
  ax = mul ad, xd
  t.2 = add t.1, ax
  i.1 = add i.0, 1
  cmp = icmp lt, i.1, n
  br cmp, B2, B3
B3:
  t.3 = phi (t.0, B1) (t.2, B2)
  yp = getelementptr y, r.0
  store t.3, yp
  r.1 = add r.0, 1
  cmp = icmp lt, r.1, m
  br cmp, B1, B4
B4:
  ret void
OPTIMIZATIONS
CONSTANT PROPAGATION 1/3
Constant propagation, sometimes known as constant
folding, evaluates the instructions with constant
operands and propagates the constant results.
a = add 2, 3
b = a
c = mul a, b

a = 5
b = 5
c = 25
CONSTANT PROPAGATION 2/3
Why do we need constant propagation?
struct queue *create_queue() {
  return (struct queue *)malloc(sizeof(struct queue *) * 16);
}
int process_data(int a, int b, int c) {
  int k[] = {0x13, 0x17, 0x19};
  if (DEBUG) {
    verify_data(k, data);
  }
  return (a * k[1] * k[2] + b * k[0] * k[2] + c * k[0] * k[1]);
}
CONSTANT PROPAGATION 3/3
For each basic block b of the CFG in reverse postorder:
  For each instruction i in basic block b from top to bottom:
    If all of its operands are constants, the operation has no
    side effect, and we know how to evaluate it at compile time,
    then evaluate the result, remove the instruction, and replace
    all uses with the result.
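A minimal sketch of the inner loop for a single basic block, assuming a made-up SSA-like instruction format in which instruction number i defines value number i and operands are either constants or indices of earlier instructions. The types and the function propagate_constants are hypothetical. Folded instructions are rewritten into constants instead of being removed, and integer overflow is not handled.

```c
#include <assert.h>
#include <stdbool.h>

enum opcode { OP_PARAM, OP_CONST, OP_COPY, OP_ADD, OP_MUL };

struct operand {
    bool is_const;
    int value;  /* the constant, or the index of the defining instruction */
};

struct inst {
    enum opcode op;
    struct operand a, b;  /* OP_CONST keeps its value in a; OP_COPY uses only a */
};

/* Replaces a reference to an already-constant definition by its value. */
static void resolve(const struct inst *code, int user, struct operand *o) {
    if (o->is_const) return;
    assert(o->value >= 0 && o->value < user);  /* definitions precede uses */
    if (code[o->value].op == OP_CONST) {
        o->is_const = true;
        o->value = code[o->value].a.value;
    }
}

/* Folds instructions top to bottom; returns how many were folded. */
int propagate_constants(struct inst *code, int n) {
    int folded = 0;
    for (int i = 0; i < n; ++i) {
        struct inst *in = &code[i];
        if (in->op == OP_PARAM || in->op == OP_CONST) continue;

        resolve(code, i, &in->a);
        if (in->op != OP_COPY) resolve(code, i, &in->b);
        if (!in->a.is_const) continue;
        if (in->op != OP_COPY && !in->b.is_const) continue;

        int result = in->a.value;  /* OP_COPY simply forwards its operand */
        if (in->op == OP_ADD) result = in->a.value + in->b.value;
        if (in->op == OP_MUL) result = in->a.value * in->b.value;

        in->op = OP_CONST;
        in->a.is_const = true;
        in->a.value = result;
        ++folded;
    }
    return folded;
}
```

Running it on a = add 2, 3; b = a; c = mul a, b turns the three instructions into the constants 5, 5, and 25. An instruction that depends on a parameter is left untouched.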
GLOBAL VALUE NUMBERING 1/2
Global value numbering assigns numbers to computed
expressions and maps each newly visited expression
to an equivalent expression that was visited earlier.
a = c * d;  // [(*, c, d) -> a]
e = c;      // [(*, c, d) -> a]
f = e * d;  // query: is (*, c, d) available?
use(a, e, f);

a = c * d;
use(a, c, a);
GLOBAL VALUE NUMBERING 2/2
Traverse the basic blocks in depth-first order on the dominator
tree.
Maintain a stack of hash tables. Once we have returned from
a child node on the dominator tree, we have to pop the
stack top.
Visit the instructions of the form t = op a, b in the basic
block and compute the hash for (op, a, b).
If (op, a, b) is already in the hash table, then replace
t = op a, b with t = hash_tab[(op, a, b)].
Otherwise, insert (op, a, b) -> t into the hash table.
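A minimal sketch of the lookup-or-insert step, assuming values are already identified by small integer numbers. To stay short it swaps the stack of hash tables for a single array searched linearly, and it models the push and pop as a saved size that is restored when the traversal leaves a dominator tree node. All names are hypothetical.

```c
#include <assert.h>

#define MAX_EXPRS 64

struct expr {
    char op;     /* operator, e.g. '*' or '+' */
    int a, b;    /* value numbers of the operands */
    int result;  /* value number that holds the result */
};

struct vn_table {
    struct expr entries[MAX_EXPRS];
    int size;
};

/* Returns the value number that already holds (op, a, b), or records
 * t as its holder and returns t when the expression is new. */
int vn_lookup_or_insert(struct vn_table *tab, char op, int a, int b, int t) {
    for (int i = tab->size - 1; i >= 0; --i) {
        const struct expr *e = &tab->entries[i];
        if (e->op == op && e->a == a && e->b == b) return e->result;
    }
    assert(tab->size < MAX_EXPRS);
    tab->entries[tab->size++] = (struct expr){op, a, b, t};
    return t;
}

/* Call when entering a dominator tree node; keep the returned mark. */
int vn_enter_scope(const struct vn_table *tab) {
    return tab->size;
}

/* Call when leaving the node: forgets expressions recorded since the mark. */
void vn_leave_scope(struct vn_table *tab, int mark) {
    assert(mark >= 0 && mark <= tab->size);
    tab->size = mark;
}
```

With c, d, a numbered 0, 1, 2 and e sharing the number of c, looking up f = e * d returns 2, so f can be replaced by a. The scoping matters because an expression computed in one block is only available in the blocks it dominates.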
DEAD CODE ELIMINATION 1/6
Dead code elimination (DCE) removes unreachable
instructions and instructions whose results are ignored.
Constant propagation might reveal more dead code since
branch conditions become constant values.
On the other hand, DCE can expose more constants for
constant propagation because several definitions are
removed from the program.
DEAD CODE ELIMINATION 2/6
Conditional statements with constant condition
if (kIsDebugBuild) {               // Dead code
  check_invariant(a, b, c, d);     // Dead code
}
DEAD CODE ELIMINATION 3/6
Platform-specific constant
void hash_ent_set_val(struct hash_ent *h, int v) {
  if (sizeof(int) <= sizeof(void *)) {
    h->p = (void *)(uintptr_t)(v);
  } else {
    h->p = malloc(sizeof(int));  // Dead code
    *(int *)(h->p) = v;          // Dead code
  }
}
DEAD CODE ELIMINATION 4/6
Computed result ignored
int compute_sum(int a, int b) {
  int sum = (a + b) * (a - b + 1) / 2;  // Dead code: Not used
  return 0;
}
Dead code after dead store optimization
void test(int *p, bool cond, int a, int b, int c) {
  if (cond) {
    int t = a + b;  // Dead code
    *p = t;         // Will be removed by DSE
  }
  *p = c;
}
DEAD CODE ELIMINATION 5/6
Dead code after code specialization
void matmul(double *restrict y, unsigned long m, unsigned long n,
            const double *restrict a,
            const double *restrict x,
            const double *restrict b) {
  if (n == 0) {
    for (unsigned long r = 0; r < m; ++r) {
      double t = b[r];
      for (unsigned long i = 0; i < n; ++i) {  // Dead code
        t += a[r * n + i] * x[i];              // Dead code
      }                                        // Dead code
      y[r] = t;
    }
  } else {
    // ... skipped ...
  }
}
DEAD CODE ELIMINATION 6/6
Traverse the CFG starting from the entry in reverse postorder.
Only traverse the successors that may be visited, i.e. if the
branch condition is a constant, then ignore the other side.
While traversing a basic block, clear the dead instructions
within the basic block.
After the traversal, remove the unvisited basic blocks and
remove the uses that refer to variables
defined in the unvisited basic blocks.
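A minimal sketch of the reachability part only, assuming each block ends in a branch whose condition is either unknown or a known constant. It uses a plain depth-first worklist instead of reverse postorder, which is enough to find the unvisited blocks. It does not remove instructions inside blocks. The struct layout and the name mark_reachable are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BLOCKS 16

enum cond_kind { COND_UNKNOWN, COND_TRUE, COND_FALSE };

struct block {
    enum cond_kind cond;  /* what is known about the branch condition */
    int if_true;          /* successor when the condition holds, or -1 */
    int if_false;         /* successor otherwise, or -1 */
};

/* Marks every block reachable from block 0, skipping branch sides
 * that a constant condition rules out. */
void mark_reachable(const struct block *cfg, int n, bool visited[]) {
    assert(n > 0 && n <= MAX_BLOCKS);
    int stack[MAX_BLOCKS];
    int top = 0;

    for (int i = 0; i < n; ++i) visited[i] = false;
    visited[0] = true;
    stack[top++] = 0;

    while (top > 0) {
        const struct block *b = &cfg[stack[--top]];
        int succ[2];
        int count = 0;
        if (b->cond != COND_FALSE && b->if_true >= 0) succ[count++] = b->if_true;
        if (b->cond != COND_TRUE && b->if_false >= 0) succ[count++] = b->if_false;

        for (int k = 0; k < count; ++k) {
            if (!visited[succ[k]]) {
                visited[succ[k]] = true;  /* each block is pushed at most once */
                stack[top++] = succ[k];
            }
        }
    }
}
```

For example, if the inner loop test of the matmul CFG is known to be false (the n == 0 specialization), B2 is never visited and can be removed.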
LOOP OPTIMIZATIONS
It is reasonable to assume a program spends more time in
the loop body. Thus, loop optimization is an important issue
in the compiler.
LOOP INVARIANT CODE MOTION 1/3
Loop invariant code motion (LICM) is an optimization which
moves loop invariants or loop constants out of the loop.
int test(int n, int a, int b, int c) {
  int sum = 0;
  for (int i = 0; i < n; ++i) {
    sum += i * a * b * c;  // a * b * c is loop invariant
  }
  return sum;
}
LOOP INVARIANT CODE MOTION 2/3
for (unsigned long r = 0; r < m; ++r) {
  double t = b[r];
  for (unsigned long i = 0; i < n; ++i) {
    t += a[(r * n) + i] * x[i];  // "r * n" is an inner loop invariant
  }
  y[r] = t;
}

for (unsigned long r = 0; r < m; ++r) {
  double t = b[r];
  unsigned long k = r * n;  // "r * n" moved out of the inner loop
  for (unsigned long i = 0; i < n; ++i) {
    t += a[k + i] * x[i];
  }
  y[r] = t;
}
LOOP INVARIANT CODE MOTION 3/3
How do we know whether a value is loop invariant?
If the computation of a value does not (transitively) depend
on anything in the following blacklist, we can assume that the
value is loop invariant:
Phi instructions associated with the loop being
considered
Instructions with side effects or non-pure instructions,
e.g. function calls
INDUCTION VARIABLE
If x represents the trip count (iteration count), then we call
expressions of the form a*x+b induction variables.
for (unsigned long r = 0; r < m; ++r) {
  double t = b[r];
  unsigned long k = r * n;  // outer loop IV
  for (unsigned long i = 0; i < n; ++i) {
    unsigned long m = k + i;  // inner loop IV
    t += a[m] * x[i];
  }
  y[r] = t;
}
k and r are induction variables of the outer loop.
i and m are induction variables of the inner loop.
LOOP STRENGTH REDUCTION 1/2
Strength reduction is an optimization that replaces the
multiplication in induction variables with an addition.
for (unsigned long r = 0; r < m; ++r) {
  double t = b[r];
  unsigned long k = r * n;  // outer loop IV
  for (unsigned long i = 0; i < n; ++i) {
    unsigned long m = k + i;  // inner loop IV
    t += a[m] * x[i];
  }
  y[r] = t;
}

for (unsigned long r = 0, k = 0; r < m; ++r, k += n) {  // k
  double t = b[r];
  for (unsigned long i = 0, m = k; i < n; ++i, ++m) {  // m
    t += a[m] * x[i];
  }
  y[r] = t;
}
LOOP STRENGTH REDUCTION 2/2
We can even rewrite the range to eliminate the index
computation.
int sum(const int *a, int n) {
  int res = 0;
  for (int i = 0; i < n; ++i) {
    res += a[i];  // implicit *(a + sizeof(int) * i)
  }
  return res;
}

int sum(const int *a, int n) {
  int res = 0;
  const int *end = a + n;  // implicit a + sizeof(int) * n
  for (const int *p = a; p != end; ++p) {  // range updated
    res += *p;
  }
  return res;
}
LOOP UNROLLING 1/2
Unroll the loop body multiple times.
Purpose: Reduce amortized loop iteration overhead.
Purpose: Reduce load/store stalls and exploit instruction-level
parallelism, e.g. software pipelining.
Purpose: Prepare for vectorization, e.g. SIMD.
LOOP UNROLLING 2/2
for (unsigned long r = 0, k = 0; r < m; ++r, k += n) {
  double t = b[r];
  switch (n & 0x3) {  // Duff's device
  case 3: t += a[k + 2] * x[2];
  case 2: t += a[k + 1] * x[1];
  case 1: t += a[k] * x[0];
  }
  for (unsigned long i = n & 0x3; i < n; i += 4) {
    t += a[k + i] * x[i];
    t += a[k + i + 1] * x[i + 1];
    t += a[k + i + 2] * x[i + 2];
    t += a[k + i + 3] * x[i + 3];
  }
  y[r] = t;
}
INSTRUCTION SELECTION 1/5
Instruction selection is the process of mapping IR to machine
instructions.
Compiler back ends perform pattern matching to select the
best instructions (according to heuristics).
INSTRUCTION SELECTION 2/5
Complex operations, e.g. shift-and-add or multiply-accumulate.
Array load/store instructions, which can be translated to one
shift-add-and-load on some architectures.
IR instructions which are not natively supported by the target
machine.
INSTRUCTION SELECTION 3/5
define i64 @mac(i64 %a, i64 %b, i64 %c) {
ent:
  %0 = mul i64 %a, %b
  %1 = add i64 %0, %c
  ret i64 %1
}

mac:
  madd x0, x1, x0, x2
  ret
INSTRUCTION SELECTION 4/5
define i64 @load_shift(i64* %a, i64 %i) {
ent:
  %0 = getelementptr i64, i64* %a, i64 %i
  %1 = load i64, i64* %0
  ret i64 %1
}

load_shift:
  ldr x0, [x0, x1, lsl #3]
  ret
INSTRUCTION SELECTION 5/5
%struct.A = type { i64*, [16 x i64] }
define i64 @get_val(%struct.A* %p, i64 %i, i64 %j) {
ent:
  %0 = getelementptr %struct.A, %struct.A* %p, i64 %i, i32 1, i64 %j
  %1 = load i64, i64* %0
  ret i64 %1
}

get_val:
  movz w8, #0x88
  madd x8, x1, x8, x0
  add x8, x8, x2, lsl #3
  ldr x0, [x8, #8]
  ret
SSA DESTRUCTION 1/5
How do we deal with the phi functions?
Copy the assignments to the end of each predecessor.
(Must be done carefully.)
SSA DESTRUCTION 2/5
B0:
  cmp = icmp eq, cond, 0
  br cmp, B1, B2
B1:
  br B3
B2:
  br B3
B3:
  a = phi (3, B1) (7, B2)
  print(a)

converts to

B0:
  cmp = icmp eq, cond, 0
  br cmp, B1, B2
B1:
  a = 3
  br B3
B2:
  a = 7
  br B3
B3:
  print(a)

Example to show the assignment copying
SSA DESTRUCTION 3/5
B0:
  br B1
B1:
  i.0 = phi (0, B0) (i.1, B2)
  cmp = icmp lt, i.0, 100
  br cmp, B2, B3
B2:
  print(i.0)
  i.1 = add i.0, 1
  br B1
B3:
  ret

converts to

B0:
  i.0 = 0
  br B1
B1:
  cmp = icmp lt, i.0, 100
  br cmp, B2, B3
B2:
  print(i.0)
  i.1 = add i.0, 1
  i.0 = i.1
  br B1
B3:
  ret

Example to show the assignment copying with loop
SSA DESTRUCTION 4/5
B0:
  br B1
B1:
  i.0 = phi (0, B0) (i.1, B2)
  cmp = icmp lt, i.0, 100
  br cmp, B2, B3
B2:
  print(i.0)
  i.1 = add i.0, 1
  br B1
B3:
  print(i.0)
  ret

converts to (doesn't work!)

B0:
  i.0 = 0
  br B1
B1:
  cmp = icmp lt, i.0, 100
  br cmp, B2, B3
B2:
  print(i.0)
  i.1 = add i.0, 1
  i.0 = i.1
  br B1
B3:
  print(i.0)
  ret

Lost copy problem — The naive de-SSA algorithm doesn't work
due to live range conflicts. Extra assignments and
renaming are needed to avoid conflicts.
SSA DESTRUCTION 5/5
Original program:
B0:
  x = 1
  y = 2
  br B1
B1:
  t = x
  x = y
  y = t
  br cmp, B2, B1
B2:
  print(x)
  print(y)

SSA form:
B0:
  x0 = 1
  y0 = 2
  br B1
B1:
  x1 = phi (x0, y1)
  y1 = phi (y0, x1)
  br cmp, B2, B1
B2:
  print(x1)
  print(y1)

After naive phi removal:
B0:
  x0 = 1
  y0 = 2
  x1 = x0
  y1 = y0
  br B1
B1:
  x1 = y1
  y1 = x1
  br cmp, B2, B1
B2:
  print(x1)
  print(y1)

Swap problem — Conflicts due to the parallel assignment
semantics of phi instructions. A correct algorithm should
detect the cycle and implement the parallel assignment with
swap instructions.
REGISTER ALLOCATION 1/2
We have to replace infinite virtual registers with finite
machine registers.
Register Allocation — Make sure that the maximum number of
simultaneously live registers does not exceed k.
Register Assignment — Assign a register given the fact that
registers are always sufficient.
The classical solution is to compute the lifetime of each
variable, build the interference graph, and spill variables to
the stack if the interference graph is not k-colorable.
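A minimal sketch of the coloring step, using greedy coloring in program order as a simplified stand-in for the classical simplify-and-select algorithm. The interference edges below are an assumption: they were derived by hand from a nine-instruction example (mov a, b, c; add d, e, f, g, y; ret y) and may differ from a graph produced by another liveness convention. The name color_registers is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_VARS 8
enum { A, B, C, D, E, F, G, Y };

/* Hand-derived interference edges: two variables interfere when they
 * are live at the same time. */
static const int edges[][2] = {
    {A, B}, {A, C}, {B, C}, {A, D}, {C, D}, {A, E},
    {C, E}, {D, E}, {A, F}, {C, F}, {C, G},
};
#define NUM_EDGES ((int)(sizeof edges / sizeof edges[0]))

/* Assigns each variable the lowest register not used by an already
 * colored neighbor. reg[v] is -1 when v must be spilled.
 * Returns the number of spilled variables. */
int color_registers(int k, int reg[NUM_VARS]) {
    assert(k > 0);
    bool adj[NUM_VARS][NUM_VARS] = {{false}};
    for (int e = 0; e < NUM_EDGES; ++e) {
        adj[edges[e][0]][edges[e][1]] = true;
        adj[edges[e][1]][edges[e][0]] = true;
    }

    int spilled = 0;
    for (int v = 0; v < NUM_VARS; ++v) {
        reg[v] = -1;
        for (int r = 0; r < k && reg[v] < 0; ++r) {
            bool taken = false;
            for (int u = 0; u < v; ++u) {
                if (adj[v][u] && reg[u] == r) taken = true;
            }
            if (!taken) reg[v] = r;
        }
        if (reg[v] < 0) ++spilled;
    }
    return spilled;
}
```

With four registers nothing is spilled; with three, e is spilled because a, c, and d are all live when e is defined. A real allocator would also insert the spill code and recompute liveness afterwards.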
REGISTER ALLOCATION 2/2
mov a, #1
mov b, #1
mov c, #2
add d, a, b
add e, a, c
add f, d, e
add g, a, f
add y, c, g
ret y
(Figure: the live ranges of a, b, c, d, e, f, g, and y, shown next to their interference graph.)
INSTRUCTION SCHEDULING 1/2
Reorder the instructions according to the number of cycles they
need in order to reduce the execution time of the critical path.
Constraints: (a) Data dependence, (b) Functional units, (c)
Issue width, (d) Datapath forwarding, (e) Instruction cycle
time, etc.
INSTRUCTION SCHEDULING 2/2
0: add r0, r0, r1
1: add r1, r2, r2
2: ldr r4, [r3]
3: sub r4, r4, r0
4: str r4, [r1]

2: ldr r4, [r3]  # load instruction needs more cycles
0: add r0, r0, r1
1: add r1, r2, r2
3: sub r4, r4, r0
4: str r4, [r1]
COMPILER AND ICT
INDUSTRY
WHERE CAN WE FIND COMPILERS?
SMART PHONES
Android ART (Java VM)
OpenGL|ES GLSL Compiler (Graphics)
BROWSERS
JavaScript-related
JavaScript Engine, e.g. V8, IonMonkey, JavaScriptCore,
Chakra
Regular Expression Engine
WebAssembly
WebGL — OpenGL binding for the Web (Graphics)
WebCL — OpenCL binding for the Web (Computing)
DESKTOP APPLICATION RUN-TIME
ENVIRONMENTS
Java run-time (Java, Scala, Clojure, Groovy)
.NET run-time (C#, F#)
Ruby
Python
PHP
VIRTUAL MACHINES AND EMULATORS
Simulate different architecture (using dynamic
recompilation), e.g. QEMU
Simulate special instructions that are not supported by
hypervisor.
DEVELOPMENT ENVIRONMENT
C/C++ compiler, e.g. MSVC, GCC, LLVM
Java compiler, e.g. OpenJDK
DSP compilers for digital signal processors.
VHDL compilers for IC design team.
Profiling and/or instrumentation tools, e.g. Valgrind, Dr.
Memory, Address Sanitizer, Memory Sanitizer
WHAT DOES A COMPILER TEAM DO?
Develop the compiler toolchain for the in-house processors,
e.g. DSP, GPU, CPU, etc.
Tune the performance of the existing compilers or virtual
machines.
Develop a new programming language or model (e.g. CUDA
from NVIDIA).
SOME RESEARCH TOPICS ...
Memory model for concurrency — Designing a good
memory model at the programming language level, which
should be intuitive to humans while open to
compiler/architecture optimization, is still a challenging
problem.
Both Java and C++ have made efforts to standardize one. However,
there are still some unintuitive cases that are allowed and
some desirable cases that are ruled out.
THE END
Q & A

More Related Content

What's hot

Ch03
Ch03Ch03
Ch03
Hankyo
 
Elements of C++11
Elements of C++11Elements of C++11
Elements of C++11
Uilian Ries
 
Ch04
Ch04Ch04
Ch04
Hankyo
 
ALF 5 - Parser Top-Down
ALF 5 - Parser Top-DownALF 5 - Parser Top-Down
ALF 5 - Parser Top-Down
Alexandru Radovici
 
C++ Advanced
C++ AdvancedC++ Advanced
C++ AdvancedVivek Das
 
BBS crawler for Taiwan
BBS crawler for TaiwanBBS crawler for Taiwan
BBS crawler for TaiwanBuganini Chiu
 
C++ Programming - 11th Study
C++ Programming - 11th StudyC++ Programming - 11th Study
C++ Programming - 11th Study
Chris Ohk
 
Tiramisu概要
Tiramisu概要Tiramisu概要
Tiramisu概要
Mr. Vengineer
 
About Go
About GoAbout Go
About Go
Jongmin Kim
 
CodiLime Tech Talk - Grzegorz Rozdzialik: What the java script
CodiLime Tech Talk - Grzegorz Rozdzialik: What the java scriptCodiLime Tech Talk - Grzegorz Rozdzialik: What the java script
CodiLime Tech Talk - Grzegorz Rozdzialik: What the java script
CodiLime
 
C Language - Switch and For Loop
C Language - Switch and For LoopC Language - Switch and For Loop
C Language - Switch and For Loop
Sukrit Gupta
 
Python idiomatico
Python idiomaticoPython idiomatico
Python idiomatico
PyCon Italia
 
6. binary tree
6. binary tree6. binary tree
6. binary tree
Geunhyung Kim
 
Data Structure - 2nd Study
Data Structure - 2nd StudyData Structure - 2nd Study
Data Structure - 2nd Study
Chris Ohk
 
How to master C++
How to master C++How to master C++
How to master C++
Florian Richoux
 
Unit 4
Unit 4Unit 4
Unit 4siddr
 

Introduction to Compiler Development

  • 1. INTRO TO COMPILER DEVELOPMENT LOGAN CHIEN http://slide.logan.tw/compiler-intro/
  • 2. LOGAN CHIEN Software engineer at MediaTek. Amateur LLVM/Clang developer. Integrated LLVM/Clang into Android NDK.
  • 3. WHY I LOVE COMPILERS? I have been a faithful reader of Jserv's blog for ten years. I was inspired by the compiler and virtual machine technologies mentioned in his blog. Since then, compiler technologies have been my research topic.
  • 4. UNDERGRADUATE COMPILER COURSE I took the undergraduate compiler course when I was a sophomore.
  • 5. MY PROFESSOR SAID ... “We spent many lectures discussing the parser. However, when people say they are doing compiler research, they are most likely not referring to parsing techniques.”
  • 6. I AM HERE TO ... Re-introduce the compiler technologies, Give a lightning talk on industrial-strength compiler design, Explain the connection between compiler technologies and the industry.
  • 7. AGENDA Re-introduction to Compiler (30min) Industrial-strength Compiler Design (90min) Compiler and ICT Industry (20min)
  • 9. WHAT IS A COMPILER? Compilers are tools that translate programmers' thoughts into computer-runnable programs. ANALOGY: Translators who turn one language into another, such as those who translate Chinese to English.
  • 10. WHAT HAVE WE LEARNED IN UNDERGRADUATE COMPILER COURSE?
  • 11. LEXER Reads the input source code (as a sequence of bytes) and converts it into a stream of tokens.
      unsigned background(unsigned foreground) {
        if ((foreground >> 16) > 0x80) {
          return 0;
        } else {
          return 0xffffff;
        }
      }
    Token stream:
      unsigned background ( unsigned foreground ) { if ( ( foreground >> 16 ) > 128 ) { return 0 ; } else { return 16777215 ; } }
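To make the lexing step concrete, here is a minimal sketch of a hand-written lexer in C that can tokenize the `background` function above. The token kinds and the names `next_token` and `count_tokens` are hypothetical and not from the talk. The sketch assumes keywords are reported as plain identifiers and that `>>` is the only multi-character punctuator.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical token kinds for this sketch; a real C lexer has many more. */
enum TokenKind { TOK_IDENT, TOK_NUMBER, TOK_PUNCT, TOK_EOF };

struct Token {
  enum TokenKind kind;
  const char *start;   /* points into the source buffer */
  size_t length;       /* number of bytes in the lexeme */
  unsigned long value; /* numeric value, only meaningful for TOK_NUMBER */
};

/* Reads one token starting at *cursor and advances *cursor past it. */
struct Token next_token(const char **cursor) {
  assert(cursor != NULL && *cursor != NULL);
  const char *p = *cursor;
  while (isspace((unsigned char)*p)) ++p; /* skip whitespace */

  struct Token tok = {TOK_EOF, p, 0, 0};
  if (*p == '\0') {
    /* End of input: keep TOK_EOF. */
  } else if (isalpha((unsigned char)*p) || *p == '_') {
    tok.kind = TOK_IDENT; /* keywords are not distinguished in this sketch */
    while (isalnum((unsigned char)*p) || *p == '_') ++p;
  } else if (isdigit((unsigned char)*p)) {
    char *end;
    tok.kind = TOK_NUMBER;
    tok.value = strtoul(p, &end, 0); /* base 0 accepts both 16 and 0x80 */
    p = end;
  } else {
    tok.kind = TOK_PUNCT;
    /* Treat ">>" as one token; any other punctuator is one character. */
    p += (p[0] == '>' && p[1] == '>') ? 2 : 1;
  }
  tok.length = (size_t)(p - tok.start);
  *cursor = p;
  return tok;
}

/* Counts the tokens in src, excluding the final TOK_EOF. */
size_t count_tokens(const char *src) {
  size_t n = 0;
  while (next_token(&src).kind != TOK_EOF) ++n;
  return n;
}
```

A parser would call `next_token` repeatedly until it sees `TOK_EOF`. Note that the lexer already converts `0x80` into the number 128, which is why the token stream on the slide shows decimal values.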
  • 12. PARSER Reads the tokens and builds an AST according to the syntax.
      unsigned background ( unsigned foreground ) { if ( ( foreground >> 16 ) > 128 ) { return 0 ; } else { return 16777215 ; } }
    AST:
      (procedure background
        (args '(foreground))
        (compound-stmt
          (if-stmt
            (bin-expr GT (bin-expr RSHIFT foreground 16) 128)
            (return-stmt 0)
            (return-stmt 16777215))))
  • 13. CODE GENERATOR Generates the machine code (or assembly) according to the AST. In the undergraduate course, we usually simply do syntax-directed translation.
      (procedure background
        (args '(foreground))
        (compound-stmt
          (if-stmt
            (bin-expr GT (bin-expr RSHIFT foreground 16) 128)
            (return-stmt 0)
            (return-stmt 16777215))))
    Generated assembly:
        lsr w8, w0, #16
        cmp w8, #128
        b.ls .Lelse
        mov w0, wzr
        ret
      .Lelse:
        orr w0, wzr, #0xffffff
        ret
  • 14. WHAT'S MISSING? Can a person who can only lex and parse sentences translate articles well?
  • 15. COMPILER REQUIREMENTS A compiler should translate the source code precisely. A compiler should utilize the device efficiently.
  • 16. THREE RELATED FIELDS Programming Language Computer Architecture Compiler
  • 17. PROGRAMMING LANGUAGE THEORY Essential components of a programming language: type theory, variable scoping, language semantics, etc. How do people reason about and compose a program? Create an abstraction that is understandable to humans and tractable for computers.
  • 18. EXAMPLE: SUBTYPE AND MUTABLE RECORDS Why can't you perform the following conversion in C++?
      void test(int *ptr) {
        int **p = &ptr;
        const int **a = p; // Compiler diagnoses this conversion
        // ...
      }
    This is related to covariance and contravariance. With PLT, we know that we can only choose two of (a) covariant types, (b) mutable records, and (c) type consistency.
      void test(int *ptr) {
        const int c = 0;
        int **p = &ptr;
        const int **a = p; // If it is allowed, bad programs will pass.
        *a = &c;           // ptr now points to the const object c.
        **p = 5;           // No warning here, but this writes to c.
      }
  • 19. EXAMPLE: DYNAMIC SCOPING IN BASH
      #!/bin/sh
      v=1 # Initialize v with 1
      foo() {
        echo "foo:v=${v}" # Which v is referred?
        v=2 # Which v is assigned?
      }
      bar() {
        local v=3
        foo
        echo "bar:v=${v}" # What will be printed?
      }
      v=4 # Assign 4 to v
      bar
      echo "v=${v}" # What will be printed?
    Ans: foo:v=3, bar:v=2, v=4. Surprisingly, foo is accessing the local v in bar instead of the global v.
  • 20. EXAMPLE: NON-LEXICAL SCOPING IN JAVASCRIPT
      // Javascript, the bad part
      function bad(v) {
        var sum;
        with (v) {
          sum = a + b;
        }
        return sum;
      }
      console.log(bad({a: 5, b: 10}));
      console.log(bad({a: 5, b: 10, sum: 100}));
    Ans: The second console.log() prints undefined.
  • 21. COMPUTER ARCHITECTURE Instruction set architecture: CISC vs. RISC. Out-of-Order Execution vs. Instruction Scheduling. Memory hierarchy. Memory model.
  • 22. QUIZ: DO YOU REALLY KNOW C? Is it guaranteed that v will always be loaded after pred?
      int pred;
      int v;
      int get(int a, int b) {
        int res;
        if (pred > 0) {
          res = v * a - v / b;
        } else {
          res = v * a + v / b;
        }
        return res;
      }
    Ans: No. Independent reads/writes can be reordered. The standard only requires that the result be the same as running from top to bottom (in a single thread).
  • 23. COMPILER ANALYSIS Data-flow analysis: Analyze value ranges, check the conditions or constraints, figure out modifications to variables, etc. Control-flow analysis: Analyze the structure of the program, such as control dependency and loop structure. Memory dependency analysis: Analyze the memory access patterns of array element accesses or pointer dereferences, e.g. alias analysis.
  • 24. ALIAS ANALYSIS Determine whether two pointers can refer to the same object or not.
      void move(char *dst, const char *src, int n) {
        for (int i = 0; i < n; ++i) {
          dst[i] = src[i];
        }
      }
      int sum(int *ptr, const int *val, int n) {
        int res = 0;
        for (int i = 0; i < n; ++i) {
          res += *val;
          *ptr++ = 10;
        }
        return res;
      }
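The following sketch illustrates why the compiler needs alias information before it can move the load of `*val` out of the loop in `sum`. The function `sum_hoisted` is a hypothetical hand-written version of that optimization, not compiler output. It agrees with `sum` only when no store through `ptr` can modify `*val`.

```c
#include <assert.h>
#include <stddef.h>

/* The loop from the slide: *val is reloaded on every iteration. */
int sum(int *ptr, const int *val, int n) {
  int res = 0;
  for (int i = 0; i < n; ++i) {
    res += *val;
    *ptr++ = 10;
  }
  return res;
}

/* Hypothetical "optimized" variant that loads *val once before the loop.
 * This is only equivalent to sum() if ptr[0..n-1] never overlaps *val. */
int sum_hoisted(int *ptr, const int *val, int n) {
  if (n <= 0) return 0; /* do not touch *val when the loop would not run */
  assert(ptr != NULL && val != NULL);
  const int v = *val;   /* the hoisted load */
  int res = 0;
  for (int i = 0; i < n; ++i) {
    res += v;
    *ptr++ = 10;
  }
  return res;
}
```

When `val` points to a separate variable, both functions return the same value. When `val` points into the array that `ptr` walks over, the store `*ptr++ = 10` eventually overwrites `*val` and the two results diverge, so a compiler may only perform this transformation after proving that the pointers do not alias.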
  • 25. QUIZ: DO YOU KNOW C++?
      class QMutexLocker {
      public:
        union {
          QMutex *mtx_;
          uintptr_t val_;
        };
        void unlock() {
          if (val_) {
            if ((val_ & (uintptr_t)1) == (uintptr_t)1) {
              val_ &= ~(uintptr_t)1;
              mtx_->unlock();
            }
          }
        }
      };
    Pitfall: Reading from union fields that were not written previously results in undefined behavior. Type-Based Alias Analysis (TBAA) exploits this rule.
  • 26. COMPILER OPTIMIZATION Scalar optimization: Fold the constants, remove the redundancies, rewrite expressions using identities, etc. Vector optimization: Convert several scalar operations into one vector operation, e.g. combining four add instructions into one vector add. Interprocedural optimization: Function inlining, devirtualization, cross-function analysis, etc.
  • 27. OTHER COMPILER-RELATED TECHNOLOGY Just-in-time compilers. Binary translators. Program profiling and performance measurement. Facilities to run compiled executables, e.g. garbage collectors.
  • 29. What's the difference between your final project and an industrial-strength compiler?
  • 30. KEY DIFFERENCE Analysis: Reasons about program structures and changes of values. Optimization: Applies several provably correct transformations that should make the program run faster. Intermediate Representation: Data structure on which analyses and optimizations are based.
  • 31. INTERMEDIATE REPRESENTATION 1/4 A data structure for program analyses and optimizations. High-level enough to capture important properties and encapsulate hardware limitations. Low-level enough to be analyzed by analyses and manipulated by transformations. An abstraction layer for multiple front-ends and back-ends.
  • 34. INTERMEDIATE REPRESENTATION 4/4 GCC: GENERIC, GIMPLE, Tree SSA, and RTL. LLVM: LLVM IR, Selection DAG, and Machine Instructions. Java HotSpot: HIR, LIR, and MIR.
  • 35. CONTROL FLOW GRAPH 1/3 Basic block: A sequence of instructions that will only be entered from the top and exited from the end. Edge: If the basic block s may branch to t, then we add a directed edge (s, t). Predecessor/Successor: If there is an edge (s, t), then s is a predecessor of t and t is a successor of s.
  • 36. CONTROL FLOW GRAPH 2/3 Input source program:
      // y = a * x + b;
      void matmul(double *restrict y, unsigned long m, unsigned long n,
                  const double *restrict a, const double *restrict x,
                  const double *restrict b) {
        for (unsigned long r = 0; r < m; ++r) {
          double t = b[r];
          for (unsigned long i = 0; i < n; ++i) {
            t += a[r * n + i] * x[i];
          }
          y[r] = t;
        }
      }
  • 37. CONTROL FLOW GRAPH 3/3
      B0 (ENTRY):
        r = 0
        cmp = icmp lt r, m
        br cmp, B1, B4
      B1:
        bp = getelementptr b, r
        t = load bp
        i = 0
        cmp = icmp lt i, n
        br cmp, B2, B3
      B2:
        rn = mul r, n
        ai = add rn, i
        ap = getelementptr a, ai
        ad = load ap
        xp = getelementptr x, i
        xd = load xp
        ax = mul ad, xd
        t = add t, ax
        i = add i, 1
        cmp = icmp lt i, n
        br cmp, B2, B3
      B3:
        yp = getelementptr y, r
        store t, yp
        r = add r, 1
        cmp = icmp lt r, m
        br cmp, B1, B4
      B4:
        ret void
  • 38. VARIABLE DEFINITION The place where a variable is assigned or defined. [Figure: the CFG from slide 37 with the definitions of r highlighted: "r = 0" in B0 and "r = add r, 1" in B3.]
  • 39. VARIABLE USE The places where a variable is referred to or used. [Figure: the CFG from slide 37 with the definitions and uses of r highlighted.]
  • 40. REACHING DEFINITION 1/2 Definitions that reach a use. [Figure: the CFG from slide 37, with d0 denoting "r = 0" in B0 and d1 denoting "r = add r, 1" in B3. The use of r in B0 is reached by {d0}; the uses in B1, B2, and B3 before the increment are reached by {d0, d1}; the comparison after the increment in B3 is reached by {d1}.]
  • 41. REACHING DEFINITION 2/2 Constant propagation is a good example of the usefulness of reaching definitions.
      void test(int cond) {
        int a = 1; // d0
        int b = 2; // d1
        int c;
        if (cond) {
          c = 3; // d2
        } else {
          // ReachDef[a] = {d0}
          // ReachDef[b] = {d1}
          c = a + b; // d3
        }
        // ReachDef[c] = {d2, d3}
        use(c);
      }
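Reaching definitions are typically computed with an iterative data-flow analysis. The sketch below is one way to do it in C for the variable r in the matmul CFG of slide 37, using bit sets. The encoding (adjacency matrix `succ`, sets `gen_set` and `kill_set`, function `reaching_definitions`) is an assumption made for illustration and does not come from the talk.

```c
#include <assert.h>
#include <stddef.h>

enum { NUM_BLOCKS = 5 }; /* B0..B4 of the matmul CFG */
enum { D0 = 1, D1 = 2 }; /* d0: "r = 0" in B0, d1: "r = add r, 1" in B3 */

/* succ[s][t] != 0 means there is an edge (s, t). */
static const unsigned char succ[NUM_BLOCKS][NUM_BLOCKS] = {
  /* B0 */ {0, 1, 0, 0, 1},
  /* B1 */ {0, 0, 1, 1, 0},
  /* B2 */ {0, 0, 1, 1, 0},
  /* B3 */ {0, 1, 0, 0, 1},
  /* B4 */ {0, 0, 0, 0, 0},
};

/* Definitions generated by each block, and definitions each block kills. */
static const unsigned gen_set[NUM_BLOCKS]  = {D0, 0, 0, D1, 0};
static const unsigned kill_set[NUM_BLOCKS] = {D1, 0, 0, D0, 0};

/* Iterates IN[b]  = union of OUT[p] over all predecessors p of b,
 *          OUT[b] = gen[b] | (IN[b] & ~kill[b])
 * until nothing changes (a fixed point). */
void reaching_definitions(unsigned in[NUM_BLOCKS], unsigned out[NUM_BLOCKS]) {
  assert(in != NULL && out != NULL);
  for (int b = 0; b < NUM_BLOCKS; ++b) {
    in[b] = 0;
    out[b] = 0;
  }
  int changed = 1;
  while (changed) {
    changed = 0;
    for (int b = 0; b < NUM_BLOCKS; ++b) {
      unsigned new_in = 0;
      for (int p = 0; p < NUM_BLOCKS; ++p) {
        if (succ[p][b]) new_in |= out[p];
      }
      unsigned new_out = gen_set[b] | (new_in & ~kill_set[b]);
      if (new_in != in[b] || new_out != out[b]) {
        in[b] = new_in;
        out[b] = new_out;
        changed = 1;
      }
    }
  }
}
```

After the call, `in[b]` holds the definitions of r that reach the top of block b and `out[b]` those that reach its end. A constant propagation pass can substitute a constant for a use only when every reaching definition assigns that same constant.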
  • 42. DOMINANCE RELATION 1/2 A basic block s dominates t iff every path from the entry to t passes through s. Every basic block in a CFG except the entry has an immediate dominator, and the immediate-dominator relation forms a dominator tree.
  • 44. DOMINANCE FRONTIER A basic block t is in the dominance frontier of a basic block s if some predecessor of t is dominated by s but t is not strictly dominated by s. [Figure: example CFG with blocks B0 to B6.] DF[B2] = {B3, B5}, DF[B4] = {B4}, DF[B1] = {B3, B5}.
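The two definitions above can be turned into code almost literally. The following C sketch computes dominator sets with the simple iterative algorithm and then derives dominance frontiers directly from the definition. It runs on the matmul CFG of slide 37, not on the B0..B6 example whose edges are not recoverable from the transcript. The bit-set encoding and the function names are assumptions for illustration, and the sketch assumes every block is reachable from the entry.

```c
#include <assert.h>
#include <stddef.h>

enum { NUM_BLOCKS = 5, ENTRY = 0 }; /* B0..B4 of the matmul CFG */
#define ALL_BLOCKS ((1u << NUM_BLOCKS) - 1u)

/* succ[s][t] != 0 means there is an edge (s, t). */
static const unsigned char succ[NUM_BLOCKS][NUM_BLOCKS] = {
  /* B0 */ {0, 1, 0, 0, 1},
  /* B1 */ {0, 0, 1, 1, 0},
  /* B2 */ {0, 0, 1, 1, 0},
  /* B3 */ {0, 1, 0, 0, 1},
  /* B4 */ {0, 0, 0, 0, 0},
};

/* dom[b] is a bit set: bit s is set iff block s dominates block b.
 * Iterates dom[b] = {b} | intersection of dom[p] over predecessors p. */
void compute_dominators(unsigned dom[NUM_BLOCKS]) {
  assert(dom != NULL);
  for (int b = 0; b < NUM_BLOCKS; ++b) dom[b] = ALL_BLOCKS;
  dom[ENTRY] = 1u << ENTRY;
  int changed = 1;
  while (changed) {
    changed = 0;
    for (int b = 0; b < NUM_BLOCKS; ++b) {
      if (b == ENTRY) continue;
      unsigned meet = ALL_BLOCKS;
      for (int p = 0; p < NUM_BLOCKS; ++p) {
        if (succ[p][b]) meet &= dom[p];
      }
      unsigned new_dom = meet | (1u << b);
      if (new_dom != dom[b]) {
        dom[b] = new_dom;
        changed = 1;
      }
    }
  }
}

/* df[s] is a bit set: bit t is set iff t is in the dominance frontier of s,
 * i.e. s dominates a predecessor of t but does not strictly dominate t. */
void compute_dominance_frontiers(const unsigned dom[NUM_BLOCKS],
                                 unsigned df[NUM_BLOCKS]) {
  assert(dom != NULL && df != NULL);
  for (int s = 0; s < NUM_BLOCKS; ++s) {
    df[s] = 0;
    for (int t = 0; t < NUM_BLOCKS; ++t) {
      int strictly_dominates = (s != t) && ((dom[t] >> s) & 1u);
      if (strictly_dominates) continue;
      for (int p = 0; p < NUM_BLOCKS; ++p) {
        if (succ[p][t] && ((dom[p] >> s) & 1u)) df[s] |= 1u << t;
      }
    }
  }
}
```

Production compilers usually use faster algorithms based on the dominator tree, but this quadratic version mirrors the definitions one to one, which makes it easy to check by hand.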
  • 45. STATIC SINGLE-ASSIGNMENT FORM static: A static analysis of the program (not of its execution). single-assignment: Every variable can only be assigned once. SSA form is currently the most popular intermediate representation. It is adopted by a wide range of compilers, such as GCC, LLVM, Java HotSpot, Android ART, etc.
  • 46. SSA PROPERTIES Every variable can only be defined once. Every use can only refer to one definition. Use phi functions to handle merged control flow.
  • 47. PHI FUNCTIONS
      define void @foo(i1 cond, i32 a, i32 b) {
      ent:
        br cond, b1, b2
      b1:
        t0 = mul a, 4
        br b3
      b2:
        t1 = mul b, 5
        br b3
      b3:
        t2 = phi (t0), (t1)
        use(t2)
        ret
      }

      define void @foo(i32 n) {
      ent:
        br loop
      loop:
        i0 = phi (0), (i1)
        cmp = icmp ge i0, n
        br cmp, end, cont
      cont:
        use(i0)
        i1 = add i0, 1
        br loop
      end:
        ret
      }
  • 48. ADVANTAGE OF SSA FORM Compact: Reduces the def-use chains. Referential transparency: The properties associated with a variable will not change, i.e. they are context-free.
  • 49. REDUCED DEF-USE CHAIN
      void foo(int cond1, int cond2, int a, int b) {
        int t;
        if (cond1) {
          t = a * 4; // d0
        } else {
          t = b * 5; // d1
        }
        if (cond2) {
          // reach-def: {d0, d1}
          use(t);
        } else {
          // reach-def: {d0, d1}
          use(t);
        }
      }
    In SSA form:
      void foo(int cond1, int cond2, int a, int b) {
        if (cond1) {
          t.0 = a * 4;
        } else {
          t.1 = b * 5;
        }
        t.2 = phi(t.0, t.1);
        if (cond2) {
          use(t.2);
        } else {
          use(t.2);
        }
      }
  • 50. REFERENTIAL TRANSPARENCY 1/2
      void foo() {
        int r = 5; // d0
        // ... SOME CODE ...
        // We can only assume
        // "r == 5" if d0 is the
        // only reaching definition.
        use(r);
      }
    In SSA form:
      void foo() {
        r.0 = 5;
        // ... SOME CODE ...
        // No matter what code is
        // skipped above, it is safe
        // to replace the following r.0
        // with 5.
        use(r.0);
      }
  • 51. REFERENTIAL TRANSPARENCY 2/2
      void foo(int a, int b) {
        int c = a + b;
        int r = a + b;
        // Can we simply replace
        // all occurrences of r with
        // c? (NO)
        // ... SOME CODE ...
        use(r);
      }
    In SSA form:
      void foo(int a, int b) {
        c.0 = a + b;
        r.0 = a + b;
        // ... SOME CODE ...
        // No matter what code is
        // skipped above, it is safe
        // to replace the following r.0
        // with c.0.
        use(r.0);
      }
  • 52. BUILDING SSA FORM
      1. Compute the dominance relationships between basic blocks and build the dominator tree.
      2. Compute the dominance frontiers.
      3. Insert phi functions at the dominance frontiers.
      4. Traverse the dominator tree and rename the variables.
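Steps 1 and 2 can be sketched with bit sets. The code below is only a minimal illustration, not what a production compiler would use: it computes dominator sets with the simple iterative data-flow formulation and derives dominance frontiers directly from their definition. The names `compute_dominators`, `compute_dominance_frontiers`, and `MAX_BLOCKS` are hypothetical; block 0 is assumed to be the entry and every block is assumed to be reachable.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BLOCKS 32  /* hypothetical limit: one bit per basic block */

/* pred[b] is a bitmask of the predecessors of block b; block 0 is the entry.
 * On return, dom[b] is the bitmask of blocks that dominate b (b included). */
void compute_dominators(int n, const uint32_t pred[], uint32_t dom[]) {
    assert(n > 0 && n <= MAX_BLOCKS);
    uint32_t all = (n == 32) ? UINT32_MAX : ((UINT32_C(1) << n) - 1);
    dom[0] = UINT32_C(1);                 /* the entry dominates only itself */
    for (int b = 1; b < n; ++b) dom[b] = all;
    int changed = 1;
    while (changed) {                     /* iterate to a fixed point */
        changed = 0;
        for (int b = 1; b < n; ++b) {
            uint32_t meet = all;          /* intersect over all predecessors */
            for (int p = 0; p < n; ++p)
                if (pred[b] & (UINT32_C(1) << p)) meet &= dom[p];
            uint32_t next = meet | (UINT32_C(1) << b);
            if (next != dom[b]) { dom[b] = next; changed = 1; }
        }
    }
}

/* df[x] = set of blocks y such that x dominates a predecessor of y
 * but x does not strictly dominate y. */
void compute_dominance_frontiers(int n, const uint32_t pred[],
                                 const uint32_t dom[], uint32_t df[]) {
    for (int x = 0; x < n; ++x) {
        uint32_t xbit = UINT32_C(1) << x;
        df[x] = 0;
        for (int y = 0; y < n; ++y) {
            int strictly_dominates_y = (x != y) && (dom[y] & xbit) != 0;
            if (strictly_dominates_y) continue;
            for (int p = 0; p < n; ++p)
                if ((pred[y] & (UINT32_C(1) << p)) && (dom[p] & xbit))
                    df[x] |= UINT32_C(1) << y;
        }
    }
}
```

For a CFG with edges B0→B1, B0→B4, B1→B2, B1→B3, B2→B2, B2→B3, B3→B1, B3→B4, the predecessor masks are {0, 9, 6, 6, 9}, and this sketch yields DF(B2) = {B2, B3} and DF(B3) = {B1, B4}, which is where phi functions for variables defined in those blocks would be placed.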
  • 53. BEFORE SSA CONSTRUCTION
        B0 (ENTRY):
          r = 0
          cmp = icmp lt r, m
          br cmp, B1, B4
        B1:
          bp = getelementptr b, r
          t = load bp
          i = 0
          cmp = icmp lt, i, n
          br cmp, B2, B3
        B2:
          rn = mul r, n
          ai = add rn, i
          ap = getelementptr a, ai
          ad = load ap
          xp = getelementptr x, i
          xd = load xp
          ax = mul ad, xd
          t = add t, ax
          i = add i, 1
          cmp = icmp lt, i, n
          br cmp, B2, B3
        B3:
          yp = getelementptr y, r
          store t, yp
          r = add r, 1
          cmp = icmp lt, r, m
          br cmp, B1, B4
        B4:
          ret void
  • 54. AFTER SSA CONSTRUCTION
        B0 (ENTRY):
          cmp = icmp lt 0, m
          br cmp, B1, B4
        B1:
          r.0 = phi (0, B0) (r.1, B3)
          bp = getelementptr b, r.0
          t.0 = load bp
          cmp = icmp lt, 0, n
          br cmp, B2, B3
        B2:
          t.1 = phi (t.0, B1) (t.2, B2)
          i.0 = phi (0, B1) (i.1, B2)
          rn = mul r.0, n
          ai = add rn, i.0
          ap = getelementptr a, ai
          ad = load ap
          xp = getelementptr x, i.0
          xd = load xp
          ax = mul ad, xd
          t.2 = add t.1, ax
          i.1 = add i.0, 1
          cmp = icmp lt, i.1, n
          br cmp, B2, B3
        B3:
          t.3 = phi (t.0, B1) (t.2, B2)
          yp = getelementptr y, r.0
          store t.3, yp
          r.1 = add r.0, 1
          cmp = icmp lt, r.1, m
          br cmp, B1, B4
        B4:
          ret void
  • 56. CONSTANT PROPAGATION 1/3 Constant propagation, often combined with constant folding, evaluates instructions whose operands are all constants and propagates the constant results.
        a = add 2, 3
        b = a
        c = mul a, b
      becomes
        a = 5
        b = 5
        c = 25
  • 57. CONSTANT PROPAGATION 2/3 Why do we need constant propagation?
        struct queue *create_queue() {
          return (struct queue *)malloc(sizeof(struct queue *) * 16);
        }

        int process_data(int a, int b, int c) {
          int k[] = { 0x13, 0x17, 0x19 };
          if (DEBUG) {
            verify_data(k, data);
          }
          return (a * k[1] * k[2] + b * k[0] * k[2] + c * k[0] * k[1]);
        }
  • 58. CONSTANT PROPAGATION 3/3
      For each basic block b of the CFG, in reverse postorder:
        For each instruction i in b, from top to bottom:
          If all of its operands are constants, the operation has no side effects, and we know how to evaluate it at compile time, then evaluate the result, remove the instruction, and replace all uses with the result.
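The inner loop of this procedure can be sketched on a tiny straight-line IR. This is a simplified illustration under stated assumptions: there is a single basic block, instruction `i` defines SSA value `i`, and a folded instruction is rewritten in place into a constant rather than removed, so later users see the constant without a separate replace-all-uses step. The types `Opcode` and `Inst` and the function `fold_constants` are hypothetical names.

```c
#include <assert.h>

typedef enum { OP_PARAM, OP_CONST, OP_ADD, OP_MUL } Opcode;

typedef struct {
    Opcode op;
    int lhs, rhs;     /* indices of the defining instructions (OP_ADD, OP_MUL) */
    unsigned imm;     /* constant value (OP_CONST only) */
} Inst;

/* Folds every add/mul whose operands are both constants.
 * Returns the number of instructions that were folded. */
int fold_constants(Inst code[], int n) {
    int folded = 0;
    for (int i = 0; i < n; ++i) {              /* top to bottom */
        Inst *inst = &code[i];
        if (inst->op != OP_ADD && inst->op != OP_MUL) continue;
        /* Definitions must precede their uses in this straight-line IR. */
        assert(inst->lhs >= 0 && inst->lhs < i);
        assert(inst->rhs >= 0 && inst->rhs < i);
        const Inst *a = &code[inst->lhs];
        const Inst *b = &code[inst->rhs];
        if (a->op != OP_CONST || b->op != OP_CONST) continue;
        /* Unsigned arithmetic wraps, so compile-time evaluation is well defined. */
        unsigned result = (inst->op == OP_ADD) ? a->imm + b->imm
                                               : a->imm * b->imm;
        inst->op = OP_CONST;                   /* the instruction becomes a constant */
        inst->imm = result;
        ++folded;
    }
    return folded;
}
```

Encoding `add 2, 3` followed by a multiplication of that result with itself folds both instructions, giving 5 and 25, while an addition that involves an `OP_PARAM` operand is left untouched.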
  • 59. GLOBAL VALUE NUMBERING 1/2 Global value numbering (GVN) assigns a number to each computed expression and maps a newly visited expression to an equivalent one that has already been visited.
        a = c * d;  // [(*, c, d) -> a]
        e = c;      // [(*, c, d) -> a]
        f = e * d;  // query: is (*, c, d) available?
        use(a, e, f);
      becomes
        a = c * d;
        use(a, c, a);
  • 60. GLOBAL VALUE NUMBERING 2/2
      Traverse the basic blocks in depth-first order on the dominator tree.
      Maintain a stack of hash tables. After returning from a child node of the dominator tree, pop the top of the stack.
      Visit the instructions of the form t = op a, b in the basic block and compute the hash of (op, a, b).
      If (op, a, b) is already in the hash table, replace t = op a, b with t = hash_tab[(op, a, b)].
      Otherwise, insert (op, a, b) -> t into the hash table.
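The lookup step can be sketched for a single basic block. This is a deliberate simplification of the procedure above: there is no dominator-tree traversal and no stack of tables, and a linear search over earlier instructions stands in for the hash table. The names `VnOp`, `VnInst`, `number_values`, and `leader` are hypothetical; instruction `i` is assumed to define value `i`.

```c
#include <assert.h>

typedef enum { VN_PARAM, VN_COPY, VN_ADD, VN_MUL } VnOp;

typedef struct {
    VnOp op;
    int lhs, rhs;   /* indices of the defining instructions; rhs unused for VN_COPY */
} VnInst;

/* leader[i] receives the index of the earliest instruction known to compute
 * the same value as instruction i (i itself if there is none). */
void number_values(const VnInst code[], int n, int leader[]) {
    for (int i = 0; i < n; ++i) {
        const VnInst *inst = &code[i];
        leader[i] = i;
        if (inst->op == VN_PARAM) continue;
        assert(inst->lhs >= 0 && inst->lhs < i);
        if (inst->op == VN_COPY) {             /* e = c: same value as c */
            leader[i] = leader[inst->lhs];
            continue;
        }
        assert(inst->rhs >= 0 && inst->rhs < i);
        int a = leader[inst->lhs];
        int b = leader[inst->rhs];
        /* Query: is (op, a, b) already available? */
        for (int j = 0; j < i; ++j) {
            if (code[j].op == inst->op &&
                leader[code[j].lhs] == a && leader[code[j].rhs] == b) {
                leader[i] = leader[j];
                break;
            }
        }
    }
}
```

With the instructions `c`, `d`, `a = c * d`, `e = c`, `f = e * d` encoded at indices 0 to 4, the sketch maps `e` to `c` and `f` to `a`. Commutativity (treating `d * c` as `c * d`) is not handled here.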
  • 61. DEAD CODE ELIMINATION 1/6 Dead code elimination (DCE) removes unreachable instructions and instructions whose results are ignored. Constant propagation might reveal more dead code, since branch conditions become constant values. Conversely, DCE can expose more constants for constant propagation, because several definitions are removed from the program.
  • 62. DEAD CODE ELIMINATION 2/6 Conditional statements with a constant condition
        if (kIsDebugBuild) {             // Dead code
          check_invariant(a, b, c, d);   // Dead code
        }
  • 63. DEAD CODE ELIMINATION 3/6 Platform-specific constant
        void hash_ent_set_val(struct hash_ent *h, int v) {
          if (sizeof(int) <= sizeof(void *)) {
            h->p = (void *)(uintptr_t)(v);
          } else {
            h->p = malloc(sizeof(int));  // Dead code
            *(int *)(h->p) = v;          // Dead code
          }
        }
  • 64. DEAD CODE ELIMINATION 4/6 Computed result ignored
        int compute_sum(int a, int b) {
          int sum = (a + b) * (a - b + 1) / 2;  // Dead code: Not used
          return 0;
        }
      Dead code after dead store elimination (DSE)
        void test(int *p, bool cond, int a, int b, int c) {
          if (cond) {
            int t = a + b;  // Dead code
            *p = t;         // Will be removed by DSE
          }
          *p = c;
        }
  • 65. DEAD CODE ELIMINATION 5/6 Dead code after code specialization
        void matmul(double *restrict y, unsigned long m, unsigned long n,
                    const double *restrict a, const double *restrict x,
                    const double *restrict b) {
          if (n == 0) {
            for (unsigned long r = 0; r < m; ++r) {
              double t = b[r];
              for (unsigned long i = 0; i < n; ++i) {  // Dead code
                t += a[r * n + i] * x[i];              // Dead code
              }                                        // Dead code
              y[r] = t;
            }
          } else {
            // ... skipped ...
          }
        }
  • 66. DEAD CODE ELIMINATION 6/6 Traverse the CFG starting from the entry in reverse postorder. Only traverse the successors that may be visited, i.e. if the branch condition is a constant, ignore the other side. While traversing a basic block, remove the dead instructions within it. After the traversal, remove the unvisited basic blocks and the variable uses that refer to variables defined in the unvisited basic blocks.
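The block-level part of this traversal can be sketched as a reachability pass that ignores the untaken side of constant branches. This is a minimal illustration with hypothetical names (`Block`, `Cond`, `TermKind`, `mark_reachable`, `DCE_MAX_BLOCKS`); it uses a plain worklist instead of reverse postorder, which visits blocks in a different order but marks the same set, and it does not remove instructions inside blocks.

```c
#include <assert.h>
#include <stdbool.h>

#define DCE_MAX_BLOCKS 64   /* hypothetical limit for this sketch */

typedef enum { COND_UNKNOWN, COND_TRUE, COND_FALSE } Cond;
typedef enum { TERM_RET, TERM_JUMP, TERM_BRANCH } TermKind;

typedef struct {
    TermKind kind;
    Cond cond;        /* TERM_BRANCH only: what is known about the condition */
    int then_succ;    /* TERM_JUMP target, or TERM_BRANCH target when true */
    int else_succ;    /* TERM_BRANCH target when false */
} Block;

/* Marks the blocks that may execute, starting from block 0 (the entry).
 * Blocks left unmarked are dead and can be removed. */
void mark_reachable(const Block cfg[], int n, bool reachable[]) {
    assert(n > 0 && n <= DCE_MAX_BLOCKS);
    int worklist[DCE_MAX_BLOCKS];   /* each block is pushed at most once */
    int top = 0;
    for (int b = 0; b < n; ++b) reachable[b] = false;
    reachable[0] = true;
    worklist[top++] = 0;
    while (top > 0) {
        const Block *blk = &cfg[worklist[--top]];
        int succ[2];
        int count = 0;
        if (blk->kind == TERM_JUMP) {
            succ[count++] = blk->then_succ;
        } else if (blk->kind == TERM_BRANCH) {
            /* A constant condition rules out one of the two successors. */
            if (blk->cond != COND_FALSE) succ[count++] = blk->then_succ;
            if (blk->cond != COND_TRUE) succ[count++] = blk->else_succ;
        }
        for (int i = 0; i < count; ++i) {
            if (!reachable[succ[i]]) {
                reachable[succ[i]] = true;
                worklist[top++] = succ[i];
            }
        }
    }
}
```

For an `if` whose condition is known to be false, the block holding the body stays unmarked, while the join block after it is still reached through the else edge.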
  • 67. LOOP OPTIMIZATIONS It is reasonable to assume that a program spends most of its execution time in loop bodies. Thus, loop optimization is an important part of the compiler.
  • 68. LOOP INVARIANT CODE MOTION 1/3 Loop invariant code motion (LICM) is an optimization which moves loop invariants or loop constants out of the loop.
        int test(int n, int a, int b, int c) {
          int sum = 0;
          for (int i = 0; i < n; ++i) {
            sum += i * a * b * c;  // a * b * c is loop invariant
          }
          return sum;
        }
  • 69. LOOP INVARIANT CODE MOTION 2/3
        for (unsigned long r = 0; r < m; ++r) {
          double t = b[r];
          for (unsigned long i = 0; i < n; ++i) {
            t += a[(r * n) + i] * x[i];  // "r * n" is an inner loop invariant
          }
          y[r] = t;
        }

        for (unsigned long r = 0; r < m; ++r) {
          double t = b[r];
          unsigned long k = r * n;  // "r * n" moved out of the inner loop
          for (unsigned long i = 0; i < n; ++i) {
            t += a[k + i] * x[i];
          }
          y[r] = t;
        }
  • 70. LOOP INVARIANT CODE MOTION 3/3 How do we know whether a value is loop invariant? If the computation of a value does not depend (transitively) on anything in the following blacklist, we can assume that the value is loop invariant:
      Phi instructions associated with the loop being considered.
      Instructions with side effects or non-pure instructions, e.g. function calls.
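The blacklist rule can be sketched as a single forward pass over the instructions. This is a simplified illustration with hypothetical names (`LiKind`, `LiInst`, `mark_loop_invariants`); it assumes one loop, that non-phi instructions appear after the instructions they use, and that every value defined outside the loop counts as invariant with respect to it.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { LI_PHI, LI_CALL, LI_PURE } LiKind;

typedef struct {
    LiKind kind;
    bool in_loop;       /* is the instruction inside the loop being considered? */
    int operands[2];    /* indices of the defining instructions, or -1 if unused */
} LiInst;

/* invariant[i] is set to true when instruction i computes the same value
 * on every iteration of the loop. */
void mark_loop_invariants(const LiInst code[], int n, bool invariant[]) {
    for (int i = 0; i < n; ++i) {
        const LiInst *inst = &code[i];
        if (!inst->in_loop) {           /* defined outside: fixed for the whole loop */
            invariant[i] = true;
            continue;
        }
        invariant[i] = false;
        if (inst->kind != LI_PURE) continue;   /* loop phi or side effect: blacklisted */
        bool all_invariant = true;
        for (int k = 0; k < 2; ++k) {
            int op = inst->operands[k];
            if (op < 0) continue;
            assert(op < i);             /* definitions precede uses */
            if (!invariant[op]) all_invariant = false;   /* transitive dependence */
        }
        invariant[i] = all_invariant;
    }
}
```

For a loop body that computes `t1 = a * b`, `t2 = t1 * c`, and `t3 = i * t2`, with `a`, `b`, `c` defined outside the loop and `i` a loop phi, the sketch marks `t1` and `t2` as invariant and `t3` as variant. This corresponds to the `i * a * b * c` example only after the multiplication has been reassociated, which is an assumption of this example.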
  • 71. INDUCTION VARIABLE If x is the trip count (iteration count), then a variable whose value has the form a*x+b is called an induction variable.
        for (unsigned long r = 0; r < m; ++r) {
          double t = b[r];
          unsigned long k = r * n;      // outer loop IV
          for (unsigned long i = 0; i < n; ++i) {
            unsigned long m = k + i;    // inner loop IV
            t += a[m] * x[i];
          }
          y[r] = t;
        }
      k and r are induction variables of the outer loop. i and m are induction variables of the inner loop.
  • 72. LOOP STRENGTH REDUCTION 1/2 Strength reduction is an optimization to replace the multiplication in induction variables with an addition.
        for (unsigned long r = 0; r < m; ++r) {
          double t = b[r];
          unsigned long k = r * n;      // outer loop IV
          for (unsigned long i = 0; i < n; ++i) {
            unsigned long m = k + i;    // inner loop IV
            t += a[m] * x[i];
          }
          y[r] = t;
        }

        for (unsigned long r = 0, k = 0; r < m; ++r, k += n) {      // k
          double t = b[r];
          for (unsigned long i = 0, m = k; i < n; ++i, ++m) {       // m
            t += a[m] * x[i];
          }
          y[r] = t;
        }
  • 73. LOOP STRENGTH REDUCTION 2/2 We can even rewrite the range to eliminate the index computation.
        int sum(const int *a, int n) {
          int res = 0;
          for (int i = 0; i < n; ++i) {
            res += a[i];  // implicit *(a + sizeof(int) * i)
          }
          return res;
        }

        int sum(const int *a, int n) {
          int res = 0;
          const int *end = a + n;  // implicit a + sizeof(int) * n
          for (const int *p = a; p != end; ++p) {  // range updated
            res += *p;
          }
          return res;
        }
  • 74. LOOP UNROLLING 1/2 Unroll the loop body multiple times. Purpose: Reduce amortized loop iteration overhead. Purpose: Reduce load/store stall and exploit instruction-level parallelism, e.g. software pipelining. Purpose: Prepare for vectorization, e.g. SIMD.
  • 75. LOOP UNROLLING 2/2
        for (unsigned long r = 0, k = 0; r < m; ++r, k += n) {
          double t = b[r];
          switch (n & 0x3) {  // Duff's device
          case 3: t += a[k + 2] * x[2];
          case 2: t += a[k + 1] * x[1];
          case 1: t += a[k] * x[0];
          }
          for (unsigned long i = n & 0x3; i < n; i += 4) {
            t += a[k + i] * x[i];
            t += a[k + i + 1] * x[i + 1];
            t += a[k + i + 2] * x[i + 2];
            t += a[k + i + 3] * x[i + 3];
          }
          y[r] = t;
        }
  • 76. INSTRUCTION SELECTION 1/5 Instruction selection is the process of mapping IR into machine instructions. Compiler back-ends perform pattern matching to select the best instructions (according to a heuristic).
  • 77. INSTRUCTION SELECTION 2/5 Complex operations, e.g. shift-and-add or multiply-accumulate. Array load/store instructions, which can be translated to one shift-add-and-load on some architectures. IR instructions which are not natively supported by the target machine.
  • 78. INSTRUCTION SELECTION 3/5
        define i64 @mac(i64 %a, i64 %b, i64 %c) {
        ent:
          %0 = mul i64 %a, %b
          %1 = add i64 %0, %c
          ret i64 %1
        }

        mac:
          madd x0, x1, x0, x2
          ret
  • 79. INSTRUCTION SELECTION 4/5
        define i64 @load_shift(i64* %a, i64 %i) {
        ent:
          %0 = getelementptr i64, i64* %a, i64 %i
          %1 = load i64, i64* %0
          ret i64 %1
        }

        load_shift:
          ldr x0, [x0, x1, lsl #3]
          ret
  • 80. INSTRUCTION SELECTION 5/5
        %struct.A = type { i64*, [16 x i64] }

        define i64 @get_val(%struct.A* %p, i64 %i, i64 %j) {
        ent:
          %0 = getelementptr %struct.A, %struct.A* %p, i64 %i, i32 1, i64 %j
          %1 = load i64, i64* %0
          ret i64 %1
        }

        get_val:
          movz w8, #0x88
          madd x8, x1, x8, x0
          add x8, x8, x2, lsl #3
          ldr x0, [x8, #8]
          ret
  • 81. SSA DESTRUCTION 1/5 How do we deal with the phi functions? Copy each phi operand, as an assignment, to the end of the corresponding predecessor block. (This must be done carefully.)
  • 82. SSA DESTRUCTION 2/5 Example to show the assignment copying:
        B0:
          cmp = icmp eq, cond, 0
          br cmp, B1, B2
        B1:
          br B3
        B2:
          br B3
        B3:
          a = phi (3, B1) (7, B2)
          print(a)
      converts to
        B0:
          cmp = icmp eq, cond, 0
          br cmp, B1, B2
        B1:
          a = 3
          br B3
        B2:
          a = 7
          br B3
        B3:
          print(a)
  • 83. SSA DESTRUCTION 3/5 Example to show the assignment copying with a loop:
        B0:
          br B1
        B1:
          i.0 = phi (0, B0) (i.1, B2)
          cmp = icmp lt, i.0, 100
          br cmp, B2, B3
        B2:
          print(i.0)
          i.1 = add i.0, 1
          br B1
        B3:
          ret
      converts to
        B0:
          i.0 = 0
          br B1
        B1:
          cmp = icmp lt, i.0, 100
          br cmp, B2, B3
        B2:
          print(i.0)
          i.1 = add i.0, 1
          i.0 = i.1
          br B1
        B3:
          ret
  • 84. SSA DESTRUCTION 4/5
        B0:
          br B1
        B1:
          i.0 = phi (0, B0) (i.1, B2)
          cmp = icmp lt, i.0, 100
          br cmp, B2, B3
        B2:
          print(i.0)
          i.1 = add i.0, 1
          br B1
        B3:
          print(i.0)
          ret
      converts to
        B0:
          i.0 = 0
          br B1
        B1:
          cmp = icmp lt, i.0, 100
          br cmp, B2, B3
        B2:
          print(i.0)
          i.1 = add i.0, 1
          i.0 = i.1
          br B1
        B3:
          print(i.0)
          ret
      Doesn't work! Lost copy problem: the naive de-SSA algorithm doesn't work due to live range conflicts. Extra assignments and renaming are needed to avoid the conflicts.
  • 85. SSA DESTRUCTION 5/5
      Original program:
        B0:
          x = 1
          y = 2
          br B1
        B1:
          t = x
          x = y
          y = t
          br cmp, B2, B1
        B2:
          print(x)
          print(y)
      SSA form:
        B0:
          x0 = 1
          y0 = 2
          br B1
        B1:
          x1 = phi (x0, y1)
          y1 = phi (y0, x1)
          br cmp, B2, B1
        B2:
          print(x1)
          print(y1)
      After naive SSA destruction:
        B0:
          x0 = 1
          y0 = 2
          x1 = x0
          y1 = y0
          br B1
        B1:
          x1 = y1
          y1 = x1
          br cmp, B2, B1
        B2:
          print(x1)
          print(y1)
      Swap problem: conflicts due to the parallel assignment semantics of phi instructions. A correct algorithm should detect the cycle and implement the parallel assignment with swap instructions.
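One way to handle such cycles can be sketched as follows. Note the substitution: instead of swap instructions, this sketch breaks each cycle with a scratch register, which is a common alternative when the target has no swap. The names `Copy`, `sequentialize`, `is_pending_source`, and `PC_MAX` are hypothetical; registers are plain integers, all destinations are assumed to be distinct, and `tmp` is assumed not to appear in the input.

```c
#include <assert.h>
#include <stdbool.h>

#define PC_MAX 16   /* hypothetical limit on the size of one parallel copy */

typedef struct { int dst, src; } Copy;   /* dst <- src */

/* True when `reg` is still read by a pending copy other than `skip`. */
static bool is_pending_source(const Copy c[], const bool done[], int n,
                              int reg, int skip) {
    for (int j = 0; j < n; ++j)
        if (j != skip && !done[j] && c[j].src == reg) return true;
    return false;
}

/* Turns the parallel copy par[0..n) into an equivalent sequence in out[].
 * out[] must have room for 2 * n entries. Returns the sequence length. */
int sequentialize(const Copy par[], int n, int tmp, Copy out[]) {
    assert(n >= 0 && n <= PC_MAX);
    Copy c[PC_MAX];
    bool done[PC_MAX];
    int remaining = 0, emitted = 0;
    for (int i = 0; i < n; ++i) {
        c[i] = par[i];
        done[i] = (c[i].dst == c[i].src);   /* self-copies need no code */
        if (!done[i]) ++remaining;
    }
    while (remaining > 0) {
        bool progress = false;
        for (int i = 0; i < n; ++i) {
            /* A copy is safe once nobody else still needs its destination. */
            if (done[i] || is_pending_source(c, done, n, c[i].dst, i)) continue;
            out[emitted++] = c[i];
            done[i] = true;
            --remaining;
            progress = true;
        }
        if (progress) continue;
        /* Only cycles remain: save one destination in tmp and redirect
         * every pending reader of that register to tmp. */
        int i = 0;
        while (done[i]) ++i;
        out[emitted++] = (Copy){ tmp, c[i].dst };
        for (int j = 0; j < n; ++j)
            if (!done[j] && c[j].src == c[i].dst) c[j].src = tmp;
    }
    return emitted;
}
```

For the parallel copy `x1 <- y1, y1 <- x1`, the sketch emits `tmp <- x1; x1 <- y1; y1 <- tmp`, which is the `t = x; x = y; y = t` sequence of the original program.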
  • 86. REGISTER ALLOCATION 1/2 We have to replace an unbounded number of virtual registers with a finite number of machine registers. Register allocation: make sure that the maximum number of simultaneously used registers does not exceed k. Register assignment: assign a register to each variable, given that the registers are always sufficient. The classical solution is to compute the live range of each variable, build the interference graph, and spill variables to the stack if the interference graph is not k-colorable.
  • 87. REGISTER ALLOCATION 2/2
        mov a, #1
        mov b, #1
        mov c, #2
        add d, a, b
        add e, a, c
        add f, d, e
        add g, a, f
        add y, c, g
        ret y
      [Figure: live ranges of a, b, c, d, e, f, g, y and the corresponding interference graph]
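The coloring step can be sketched with a greedy pass over an interference matrix. This is a simplified stand-in, not the classical simplify-and-select algorithm: variables are colored in index order, and a variable that finds no free register is marked as spilled instead of triggering spill-code insertion and a retry. The names `color_graph`, `RA_MAX_VARS`, and `RA_SPILLED` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define RA_MAX_VARS 16      /* hypothetical limit for this sketch */
#define RA_SPILLED (-1)

/* interferes[u][v] is true when u and v are live at the same time.
 * color[v] receives a register number in [0, k), or RA_SPILLED.
 * Returns the number of spilled variables. */
int color_graph(int n, bool interferes[][RA_MAX_VARS], int k, int color[]) {
    assert(n >= 0 && n <= RA_MAX_VARS);
    assert(k >= 0 && k <= 32);
    int spills = 0;
    for (int v = 0; v < n; ++v) {
        unsigned used = 0;   /* registers taken by already-colored neighbors */
        for (int u = 0; u < v; ++u)
            if (interferes[v][u] && color[u] != RA_SPILLED)
                used |= 1u << color[u];
        color[v] = RA_SPILLED;
        for (int r = 0; r < k; ++r) {
            if (!(used & (1u << r))) { color[v] = r; break; }
        }
        if (color[v] == RA_SPILLED) ++spills;
    }
    return spills;
}
```

Three mutually interfering variables fit into three registers, but with k = 2 the third one is spilled. Greedy coloring depends on the visiting order, so it may spill in cases where a better ordering would not.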
  • 88. INSTRUCTION SCHEDULING 1/2 Reorder the instructions according to their cycle counts in order to reduce the execution time of the critical path. Constraints: (a) Data dependence, (b) Functional units, (c) Issue width, (d) Datapath forwarding, (e) Instruction cycle time, etc.
  • 89. INSTRUCTION SCHEDULING 2/2
        0: add r0, r0, r1
        1: add r1, r2, r2
        2: ldr r4, [r3]
        3: sub r4, r4, r0
        4: str r4, [r1]

        2: ldr r4, [r3]     # load instruction needs more cycles
        0: add r0, r0, r1
        1: add r1, r2, r2
        3: sub r4, r4, r0
        4: str r4, [r1]
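The benefit of the reordering can be checked with a small pipeline model. This is a sketch under strong assumptions: a single-issue, in-order machine, data dependences as the only constraint, and a load latency of 3 cycles versus 1 cycle for the other instructions (these latencies are illustrative, not taken from any real core). The names `SchedInst`, `last_issue_cycle`, and `SCHED_MAX` are hypothetical.

```c
#include <assert.h>

#define SCHED_MAX 16   /* hypothetical limit for this sketch */

typedef struct {
    int latency;    /* cycles until the result is available */
    int deps[2];    /* indices of the producer instructions, or -1 */
} SchedInst;

/* Simulates issuing the instructions in the given order, one per cycle at
 * most, stalling until all operands are ready.
 * Returns the cycle in which the last instruction issues. */
int last_issue_cycle(const SchedInst insts[], const int order[], int n) {
    assert(n > 0 && n <= SCHED_MAX);
    int ready[SCHED_MAX];           /* cycle at which each result is available */
    int issued[SCHED_MAX] = {0};
    int cycle = -1;
    for (int k = 0; k < n; ++k) {
        int id = order[k];
        int start = cycle + 1;      /* in-order: not before the next cycle */
        for (int d = 0; d < 2; ++d) {
            int p = insts[id].deps[d];
            if (p < 0) continue;
            assert(issued[p]);      /* producers must come before consumers */
            if (ready[p] > start) start = ready[p];   /* stall on the operand */
        }
        cycle = start;
        ready[id] = start + insts[id].latency;
        issued[id] = 1;
    }
    return cycle;
}
```

With instruction 3 depending on 0 and 2, and instruction 4 depending on 3 and 1, the original order 0, 1, 2, 3, 4 issues its last instruction in cycle 6 under this model, while the reordered sequence 2, 0, 1, 3, 4 issues it in cycle 4, because the two additions hide the load latency.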
  • 91. WHERE CAN WE FIND COMPILERS?
  • 92. SMART PHONES Android ART (Java VM) OpenGL|ES GLSL Compiler (Graphics)
  • 93. BROWSERS Javascript-related Javascript Engine, e.g. V8, IonMonkey, JavaScriptCore, Chakra Regular Expression Engine WebAssembly WebGL — OpenGL binding for the Web (Graphics) WebCL — OpenCL binding for the Web (Computing)
  • 94. DESKTOP APPLICATION RUN-TIME ENVIRONMENTS Java run-time (Java, Scala, Clojure, Groovy) .NET run-time (C#, F#) Ruby Python PHP
  • 95. VIRTUAL MACHINES AND EMULATORS Simulate different architecture (using dynamic recompilation), e.g. QEMU Simulate special instructions that are not supported by hypervisor.
  • 96. DEVELOPMENT ENVIRONMENT C/C++ compiler, e.g. MSVC, GCC, LLVM Java compiler, e.g. OpenJDK DSP compilers for digital signal processors. VHDL compilers for IC design team. Profiling and/or instrumentation tools, e.g. Valgrind, Dr. Memory, Address Sanitizer, Memory Sanitizer
  • 97. WHAT DOES A COMPILER TEAM DO? Develop the compiler toolchain for the in-house processors, e.g. DSP, GPU, CPU, etc. Tune the performance of the existing compilers or virtual machines. Develop a new programming language or model (e.g. CUDA from NVIDIA).
  • 98. SOME RESEARCH TOPICS ... Memory model for concurrency — Designing a good memory model at programming language level, which should be intuitive to human while open for compiler/architecture optimization, is still a challenging problem. Both Java and C++ took efforts to standardize. However, there are still some unintuitive cases that are allowed and some desirable cases being ruled out.