GCC porting
Using instruction patterns to describe the
target ISA
Shiva Chen
shiva0217@gmail.com
May 2013
Outline
 Compiler structure
 Intermediate languages in GCC
 Optimization pass in GCC
 Define instruction pattern
 Operand constraints
 Match instruction pattern
 Strict RTL
 Target defined constraints
 Emit assembly code
 Target information usage
 Reserved words for describing instruction patterns
 Example of instruction pattern
 Split instruction pattern
 Instruction attribute
 Peephole pattern
 Instruction scheduling
 Three main intermediate language formats in GCC
 GENERIC
Language-independent representation generated by each front
end
Common representation for all the languages supported by
GCC
 GIMPLE
Used for language-independent and target-independent
optimization
 RTL
Used for optimizations that take target features into account
through the porting code
Gimple optimization passes in GCC 4.6.2
004t.gimple
006t.vcg
009t.omplower
010t.lower
012t.eh
013t.cfg
017t.ssa
018t.veclower
019t.inline_param1
020t.einline
021t.early_optimizations
022t.copyrename1
023t.ccp1
024t.forwprop1
025t.ealias
026t.esra
027t.copyprop1
028t.mergephi1
029t.cddce1
030t.eipa_sra
031t.tailr1
032t.switchconv
034t.profile
035t.local-pure-const1
036t.fnsplit
037t.release_ssa
038t.inline_param2
057t.copyrename2
058t.cunrolli
059t.ccp2
060t.forwprop2
062t.alias
063t.retslot
064t.phiprop
065t.fre
066t.copyprop2
067t.mergephi2
068t.vrp1
069t.dce1
070t.cselim
071t.ifcombine
072t.phiopt1
073t.tailr2
074t.ch
076t.cplxlower
077t.sra
078t.copyrename3
079t.dom1
080t.phicprop1
081t.dse1
082t.reassoc1
083t.dce2
084t.forwprop3
085t.phiopt2
086t.objsz
087t.ccp3
088t.copyprop3
090t.bswap
091t.crited
092t.pre
093t.sink
094t.loop
095t.loopinit
096t.lim1
097t.copyprop4
…
143t.optimized
RTL optimization pass in GCC 4.6.2
004t.gimple
144r.expand
Other gimple pass
145r.sibling
147r.initvals
148r.unshare
149r.vregs
150r.into_cfglayout
151r.jump
152r.subreg1
153r.dfinit
154r.cse1
155r.fwprop1
156r.cprop1
158r.hoist
159r.cprop2
162r.ce1
163r.reginfo
164r.loop2
165r.loop2_init
166r.loop2_invariant
170r.loop2_done
172r.cprop3
173r.cse2
174r.dse1
175r.fwprop2
176r.auto_inc_dec
177r.init-regs
178r.dce
179r.combine
180r.ce2
182r.regmove
183r.outof_cfglayout
184r.split1
185r.subreg2
188r.asmcons
190r.sched1
191r.ira
192r.postreload
194r.split2
198r.pro_and_epilogue
199r.dse2
200r.csa
201r.peephole2
202r.ce3
204r.cprop_hardreg
205r.dce
206r.bbro
208r.split4
209r.sched2
212r.alignments
215r.mach
216r.barriers
217r.dbr
218r.split5
220r.shorten
221r.nothrow
222r.final
223r.dfinish
224t.statistics
 Why divide the optimization passes into
gimple passes and RTL passes?
 Gimple passes see more high-level semantics
Ex: switch, array, structure, variable
Some optimizations are easier to design while the
high-level semantics still exist
 However, gimple passes lack target information
Ex: instruction length (size), supported ISA
Therefore, we also need RTL optimization passes
Define instruction pattern
 Every RTL pattern must match the target ISA
 How do we tell GCC to generate RTL that matches the ISA?
Instruction patterns
 Use define_expand and define_insn to describe the instruction
patterns that the target supports
(define_insn "addsi3"
  [(set (match_operand:SI 0 "register_operand" "=r,r")
        (plus:SI (match_operand:SI 1 "register_operand" "%r,r")
                 (match_operand:SI 2 "nonmemory_operand" "r,i")))]
  ... )
Define instruction pattern
 GCC already defines several instruction pattern
names and the semantics of each pattern
 addsi3
Addition with three SI-mode operands
 GCC does not know the operand constraints of the
target
 How do we tell GCC the target's operand constraints for each
instruction?
Predicate
Constraint
Operand Constraints
 Multiple Alternative Constraints
(define_insn "addsi3"
  [(set (match_operand:SI 0 "register_operand" "=r,r")
        (plus:SI (match_operand:SI 1 "register_operand" "%r,r")
                 (match_operand:SI 2 "nonmemory_operand" "r,i")))]
  ... )
Predicate: register_operand, nonmemory_operand
Constraint: r, i
The predicate must accept every operand that the constraints allow
For operand 2 with SI mode
r (register) belongs to nonmemory_operand
i (immediate) belongs to nonmemory_operand
Operand Constraints
 GCC already has predicates to restrict
operands
 Why is a constraint field needed?
It gives GCC the opportunity to change an operand during
optimization
 Ex:
movi $r0, 4;
add $r1, $r1, $r0 {addsi3}
Constant propagation
=> addi $r1, $r1, 4 {addsi3}
Operand Constraints
 GCC uses two levels of operand checking (predicate and constraint)
 This groups instructions with the same semantics into a
single instruction pattern (addsi3)
 Many ISAs have several assembly
instructions with the same semantics but different
operand constraints
 This reduces the number of instruction patterns needed when porting
Operand Constraints
 GCC uses the instruction patterns to check ISA
support whenever it generates a new RTL
pattern
 Does the back end define the pattern with
define_insn?
 Is the operand type supported, according to the
predicate?
 Which alternative does the operand belong to, according to
the constraint?
Operand Constraints
 Multiple Alternative Constraints
(define_insn "addsi3"
  [(set (match_operand:SI 0 "register_operand" "=r,r")
        (plus:SI (match_operand:SI 1 "register_operand" "%r,r")
                 (match_operand:SI 2 "nonmemory_operand" "r,i")))]
  ... )
The first alternative of the constraints
matches "add"
The second alternative of the constraints
matches "addi"
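The two-level check can be sketched in Python. This is only a toy model, not how GCC's generated recognizer is written: the tuple encoding of operands and the helper names are assumptions made for the illustration, and the "=" and "%" modifiers are dropped.

```python
# Toy model of matching operands against the addsi3 pattern.
# Operands are ("reg", n) or ("imm", value); this encoding is hypothetical.

def register_operand(op):
    return op[0] == "reg"

def nonmemory_operand(op):
    return op[0] in ("reg", "imm")

PREDICATES = [register_operand, register_operand, nonmemory_operand]

# One constraint string per operand; alternatives are separated by commas.
CONSTRAINTS = ["r,r", "r,r", "r,i"]

def satisfies(letter, op):
    return (letter == "r" and op[0] == "reg") or (letter == "i" and op[0] == "imm")

def which_alternative(operands):
    """Return the index of the first matching alternative, or None."""
    if not all(pred(op) for pred, op in zip(PREDICATES, operands)):
        return None  # a predicate failed: the insn is not recognized at all
    columns = [c.split(",") for c in CONSTRAINTS]
    for alt in range(len(columns[0])):
        if all(satisfies(col[alt], op) for col, op in zip(columns, operands)):
            return alt
    return None

# Register + register selects alternative 0 ("add"),
# register + immediate selects alternative 1 ("addi").
print(which_alternative([("reg", 88), ("reg", 87), ("reg", 55)]))
print(which_alternative([("reg", 1), ("reg", 1), ("imm", 4)]))
```

The predicates decide whether the pattern applies at all; the constraint columns then decide which assembly form is used.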
Match instruction pattern
 Multiple Alternative Constraints
(define_insn "addsi3"
  [(set (match_operand:SI 0 "register_operand" "=r,r")
        (plus:SI (match_operand:SI 1 "register_operand" "%r,r")
                 (match_operand:SI 2 "nonmemory_operand" "r,i")))]
  ... )
Ex:
(set (reg/f:SI 88)
     (plus:SI (reg:SI 87)
              (reg/v:SI 55)))
1. Parsing the RTL pattern
(set (op0)
     (plus:SI (op1)
              (op2)))
Match instruction pattern
 When is a new RTL pattern generated?
 In the RTL expand phase (GIMPLE to RTL)
 During optimization
Ex: before the combine phase
(set (reg/f:SI 47)
     (lshiftrt:SI (reg:SI 60)
                  (const_int 2)))
(set (reg/f:SI 88)
     (plus:SI (reg:SI 47)
              (reg:SI 55)))
srli $r47, $r60, 2
add $r88, $r47, $r55
After the combine phase
(set (reg/f:SI 88)
     (plus:SI (lshiftrt:SI (reg:SI 60)
                           (const_int 2))
              (reg/v:SI 55)))
add_srli $r88, $r55, $r60, 2
Strict RTL
 Does a newly generated RTL pattern
always satisfy the constraints?
 GCC allows certain kinds of constraint mismatch
that reload can fix later
 The predicate must always be satisfied
[Diagram: without optimization 1, RTL1 goes through reload and becomes RTL3. With optimization 1, RTL1 becomes RTL2, which does not satisfy the constraints; reload then turns it into RTL4.]
1. RTL3 and RTL4
satisfy the constraints
2. RTL4 is better
than RTL3
Strict RTL
 A constraint may be left unmatched before
reload, in the hope that reload will fix it
 Ex: the constraint is m (memory) but the current operand is a
constant; GCC allows this before reload
 The reload phase comes after register allocation
In fact, during register allocation GCC calls reload repeatedly
while an operand does not fit its constraint
 After reload, each operand must satisfy one of its
operand constraints (strict RTL)
Strict RTL
(define_insn "movsi"
  [(set (match_operand:SI 0 "register_operand" "=r,m")
        (match_operand:SI 1 "register_operand" "r,r"))]
  ... )
Before register allocation:
(set (reg/f:SI 47)
     (reg:SI 60))
Assume that after register allocation (RA)
pseudo register r60 is assigned to r3
and the hardware registers are exhausted:
(set (reg/f:SI 47)
     (reg:SI 3))
After reload:
(set (mem:SI (plus (sp) (const)))
     (reg:SI 3))
Target defined constraints
 A target can define its own predicates and
constraints
 Target defined predicate
(define_predicate "index_operand"
  (ior (match_operand 0 "register_operand")
       (and (match_operand 0 "const_int_operand")
            (match_test "INTVAL (op) < 4096
                         && INTVAL (op) > -4096"))))
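The logic of the index_operand predicate, written out as a plain Python function. This is a sketch only; the ("reg", n) / ("const_int", value) operand encoding is an assumption made for the illustration.

```python
def index_operand(op):
    """Accept any register, or a const_int strictly between -4096 and 4096."""
    kind, value = op
    if kind == "reg":            # (match_operand 0 "register_operand")
        return True
    # (and (match_operand 0 "const_int_operand") (match_test ...))
    return kind == "const_int" and -4096 < value < 4096

print(index_operand(("reg", 3)))           # a register is always accepted
print(index_operand(("const_int", 4095)))  # inside the range
print(index_operand(("const_int", 4096)))  # outside the range
```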
Target defined constraints
 Target defined constraint
(define_register_constraint "l"
"LO_REGS"
"registers r0->r7.")
(define_memory_constraint "Uv"
"@internal In ARM/Thumb-2 state a valid VFP load/store address."
(and (match_code "mem")
(match_test "TARGET_32BIT
&& arm_coproc_mem_operand (op, FALSE)")))
Emit assembly code
 Multiple Alternative Constraints
(define_insn "addsi3"
  [(set (match_operand:SI 0 "register_operand" "=r,r")
        (plus:SI (match_operand:SI 1 "register_operand" "%r,r")
                 (match_operand:SI 2 "nonmemory_operand" "r,i")))]
  ""
  "@
   add %0, %1, %2
   addi %0, %1, %2"
)
The first alternative of the constraints matches,
so the "add" template is used
Output assembly code "add $r3, $r4, $r5"
Ex:
(set (reg/f:SI 3)
     (plus:SI (reg:SI 4)
              (reg:SI 5)))
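How a multi-alternative "@" template turns into assembly text can be sketched as follows. This only mimics the idea; the function name emit and the way register names are passed in are assumptions for the illustration, not GCC's output code.

```python
import re

# One output template per constraint alternative, as in the "@" form above.
TEMPLATES = ["add %0, %1, %2", "addi %0, %1, %2"]

def emit(alternative, operands):
    """Substitute %0, %1, ... in the template chosen by the matched alternative."""
    template = TEMPLATES[alternative]
    return re.sub(r"%(\d)", lambda m: operands[int(m.group(1))], template)

print(emit(0, ["$r3", "$r4", "$r5"]))  # add $r3, $r4, $r5
print(emit(1, ["$r1", "$r1", "4"]))    # addi $r1, $r1, 4
```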
Target information usage
 When does GCC use the target information obtained from the
instruction patterns?
 RTL instruction pattern generation
insn-emit.c is generated from the instruction
patterns when GCC is built
 RTL instruction validation (is the insn supported by the target?)
insn-recog.c is generated from the instruction
patterns when GCC is built
 Emitting target assembly code
insn-output.c is generated from the
instruction patterns when GCC is built
Reserved words to describe instruction
patterns
                               RTL generation  RTL validation  Emit assembly
define_insn "named pattern"    yes             yes             yes
define_expand "named pattern"  yes             no              no
define_insn "*.."              no              yes             yes
 GCC defines several "named patterns" and their semantics, and uses them to
generate RTL patterns during the RTL expand phase
 ex: addsi3, subsi3, movsi, movhi …
 Some target instructions have semantics that no GCC named
pattern covers, but an optimization can still generate the matching RTL
 ex: add_slli can be generated by the combine phase
 Define an unnamed pattern so that the instruction validates
 define_insn "*add_slli"
 A define_insn whose name has a * prefix is treated as an unnamed pattern
Example of instruction pattern
;; These control RTL generation for conditional jump insns
(define_expand "cbranchsi4"
  [(set (pc)
        (if_then_else (match_operator 0 "ordered_comparison_operator"
                        [(match_operand:SI 1 "nonmemory_nonsymbol_operand" "")
                         (match_operand:SI 2 "nonmemory_nonsymbol_operand" "")])
                      (label_ref (match_operand 3 "" ""))
                      (pc)))]
  ""
{
  sh_expand_cbranchsi4 (operands);
  DONE;
}
)
Semantics of "cbranchsi4":
compare operand 1 and operand 2 using operator 0,
branch to label 3 if the comparison is true
The predicate "ordered_comparison_operator" includes EQ, NE,
LT, LTU, LE, LEU, GT, GTU, GE, GEU
The porting function sh_expand_cbranchsi4 generates the RTL pattern
Example of instruction pattern
(define_insn "*bcondz"
  [(set (pc)
        (if_then_else (match_operator 0 "bcondz_operator"
                        [(match_operand:SI 1 "register_operand" "r")
                         (const_int 0)])
                      (label_ref (match_operand 2 "" ""))
                      (pc)))]
  ""
{
  switch (GET_CODE (operands[0]))
    {
    case EQ:
      return "beqz %1, %2";
    case NE:
      return "bnez %1, %2";
    case LT:
      return "bltz %1, %2";
    case LE:
      return "blez %1, %2";
    case GT:
      return "bgtz %1, %2";
    case GE:
      return "bgez %1, %2";
    default:
      gcc_unreachable ();
    }
}
Unnamed pattern "*bcondz"
Used to validate RTL and emit
assembly code for a branch that
compares with zero
Example of instruction pattern
(define_insn "one_cmplsi2"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (not:SI (match_operand:SI 1 "register_operand" "r")))]
  ""
  "nor\t%0, %1, %1")
Semantics of "one_cmplsi2":
bitwise NOT of operand 1, stored in operand 0
The named pattern "one_cmplsi2" is used to generate RTL, validate RTL
and output assembly code
Output assembly "nor ra, rb, rb" to match the semantics
Split instruction pattern
 When is it necessary to split an instruction pattern?
 A const_int value is too big for a single assembly
instruction to encode
Split the const_int into a high part and a low part
The constant could be split in define_expand
 But that is not good enough. Why?
 Splitting the constant too early loses the
opportunity to optimize the RTL pattern
Split instruction pattern
 The optimization phase "move2add" can do the following
(assembly code is used to present the RTL semantics for
convenience)
Before move2add:
move $r0, 123456
move $r1, 123457
move $r2, 123458
After move2add:
move $r0, 123456
addi $r1, $r0, 1
addi $r2, $r0, 2
If the const_int is split into high/low parts too early:
sethi $r0, hi20(123456)
ori $r0, lo12(123456)
sethi $r1, hi20(123457)
ori $r1, lo12(123457)
sethi $r2, hi20(123458)
ori $r2, lo12(123458)
move2add will fail to turn the moves
into adds
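The idea behind this transformation can be sketched in Python. This is not GCC's reload_cse_move2add implementation; the function names and the 15-bit immediate width of addi are assumptions made for the illustration.

```python
def fits_signed(value, bits):
    """True if value fits in a signed field of the given width."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))

def move2add(moves, imm_bits=15):
    """moves: list of (dest_reg, constant). Returns assembly-like strings.

    A constant load becomes an add from the last fully loaded constant
    when the difference fits the (assumed) addi immediate field.
    """
    out = []
    base = None  # (register, constant) of the most recent full constant load
    for dest, const in moves:
        if base is not None and fits_signed(const - base[1], imm_bits):
            out.append(f"addi {dest}, {base[0]}, {const - base[1]}")
            if dest == base[0]:          # the base register itself changed
                base = (dest, const)
        else:
            out.append(f"move {dest}, {const}")
            base = (dest, const)
    return out

for line in move2add([("$r0", 123456), ("$r1", 123457), ("$r2", 123458)]):
    print(line)
```

Once each move has been split into a sethi/ori pair, there is no single "load constant" instruction left for such a pass to compare against, which is why the split should happen later.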
Split instruction pattern
 How can an instruction pattern be split later than the RTL
expand phase?
 Use define_split or define_insn_and_split
Split instruction pattern
[RTL pass list of GCC 4.6.2, as shown earlier; the split passes in it are 184r.split1, 194r.split2, 208r.split4 and 218r.split5]
Split instruction pattern
(define_insn_and_split "*movsi_const"
  [(set (match_operand:WORD 0 "register_operand" "=r,r")
        (match_operand:WORD 1 "immediate_operand" "P,i"))]
  ""
{
  if (GET_CODE (operands[1]) == CONST_INT
      && SIGNED_INT_FITS_N_BITS (INTVAL (operands[1]), 20))
    {
      return "movi\t%0, %1";
    }
  else
    return "#";
}
  "reload_completed && GET_CODE (operands[1]) == CONST_INT
   && ! SIGNED_INT_FITS_N_BITS (INTVAL (operands[1]), 20)"
  [(set (match_dup 0) (high:SI (match_dup 1)))
   (set (match_dup 0) (lo_sum:SI (match_dup 0) (match_dup 1)))]
If the const_int does not fit in signed 20 bits,
the output template returns "#",
which means the pattern will be split in a split phase
Split instruction pattern
Split conditions:
reload_completed (after reload)
&& the const_int does not fit in signed 20 bits
Split instruction pattern
The RTL pattern is split into one insn that sets the high part
and one that adds the low part (lo_sum)
match_dup 0 means operand 0 is duplicated in this field
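The arithmetic behind the split can be checked with a few lines of Python. The sketch assumes a 32-bit constant divided into an upper 20-bit part and a lower 12-bit part, following the hi20/lo12 names of the sethi/ori example; the helper names are hypothetical.

```python
def fits_signed(value, bits):
    """Counterpart of the SIGNED_INT_FITS_N_BITS test used in the pattern."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))

def split_constant(value):
    """Return (hi20, lo12) of a 32-bit constant (assumed 20/12 split)."""
    value &= 0xFFFFFFFF
    return value >> 12, value & 0xFFF

value = 0x12345678
print(fits_signed(value, 20))           # too large for the single-insn form
hi, lo = split_constant(value)
print(hex(hi), hex(lo))                 # the two halves
print(hex((hi << 12) | lo))             # recombining gives the original value
```

A constant such as 123456 fits in 20 signed bits, so under this condition it would not be split at all.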
Split instruction pattern
(define_split
  [(set (match_operand:ANY64 0 "register_operand" "")
        (match_operand:ANY64 1 "register_operand" ""))]
  "reload_completed &&
   (! USE_V3_SERISE_ISA)"
  [(set (match_dup 0) (match_dup 1))
   (set (match_dup 2) (match_dup 3))]
  "…
The split condition is
reload_completed && not V3 ISA
V3 has movd44, which can do a
64-bit register move
ANY64: DI, DF
DI: double int
DF: double float
                        Split RTL  RTL validation  Emit assembly
define_split            yes        no              no
define_insn_and_split   yes        yes             yes
Instruction attribute
(define_attr "type"
  "unknown,load,store,bequal, alu, .."
  (const_string "unknown"))
…
(define_insn "cmovn"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (if_then_else (ne:SI (match_operand:SI 1 "register_operand" "r")
                             (const_int 0))
                      (match_operand:SI 2 "register_operand" "r")
                      (match_operand:SI 3 "register_operand" "0")))]
  ""
  "cmovn\t%0, %2, %1"
  [(set_attr "type" "alu")
   (set_attr "length" "4")])
(define_attr "attribute_name" "value domain" (default value))
Instruction attribute
 The "type" attribute divides instructions into
several instruction groups
 This helps when writing the instruction scheduling porting code
 The "length" attribute gives the length (size) of each
instruction so that GCC
can calculate branch distances correctly
Peephole pattern
;; Merge move 0 to bcondz
(define_peephole2
  [(set (match_operand:SI 0 "register_operand" "") (const_int 0))
   (set (pc)
        (if_then_else (match_operator 1 "bcondz_operator"
                        [(match_dup 0)
                         (match_operand:SI 2 "register_operand" "r")])
                      (label_ref (match_operand 3 "" ""))
                      (pc)))]
  "peep2_reg_dead_p (2, operands[0])"
  [(set (pc)
        (if_then_else:SI (match_dup 1)
                         (label_ref (match_dup 3)) (pc)))]
  "
{
  operands[1] = gen_rtx_fmt_ee (swap_condition (GET_CODE (operands[1])),
                                SImode, operands[2], GEN_INT (0));
}")
Old RTL:
movi $r0, 0
bne $r0, $r1, L3
New RTL:
bnez $r1, L3
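The effect of this peephole can be sketched on a list of assembly-like tuples. This is an illustration only: the tuple encoding, the peephole function and the dead_after_branch set (standing in for the peep2_reg_dead_p check) are assumptions, not GCC interfaces.

```python
# Swapping the operands of a comparison changes the condition like this
# (the role swap_condition plays in the pattern above).
SWAP = {"eq": "eq", "ne": "ne", "lt": "gt", "gt": "lt", "le": "ge", "ge": "le"}

def peephole(insns, dead_after_branch):
    """Merge `movi rX, 0` followed by `b<cond> rX, rY, L` into `b<cond'>z rY, L`.

    dead_after_branch: registers known to be unused after the branch.
    """
    out = []
    i = 0
    while i < len(insns):
        cur = insns[i]
        nxt = insns[i + 1] if i + 1 < len(insns) else None
        if (nxt is not None
                and cur[0] == "movi" and cur[2] == 0
                and nxt[0].startswith("b") and nxt[0][1:] in SWAP
                and nxt[1] == cur[1]
                and cur[1] in dead_after_branch):
            # The zero was the first comparison operand, so the condition is swapped.
            out.append(("b" + SWAP[nxt[0][1:]] + "z", nxt[2], nxt[3]))
            i += 2
        else:
            out.append(cur)
            i += 1
    return out

old = [("movi", "$r0", 0), ("bne", "$r0", "$r1", "L3")]
print(peephole(old, dead_after_branch={"$r0"}))
```

If $r0 is still live after the branch, the move must stay, so the sequence is left unchanged.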
Instruction scheduling
 Instruction scheduling is an optimization
pass in GCC
 It reorders instructions without changing the
semantics of the code
 The goal is to reduce pipeline stalls and improve
performance
 Instruction scheduling belongs to the RTL phase
RTL optimization passes in GCC 4.6.2
[Same RTL pass list as shown earlier; the scheduling passes in it are 190r.sched1 and 209r.sched2]
Instruction scheduling
 GCC has two scheduling passes
 Sched1
Does interblock scheduling before register allocation
 Tries to find the innermost loop as a region
 Schedules the instructions in the region
 Improves the performance of hot spots (innermost loops)
 Extends the scope to a region to find more scheduling
opportunities
 Sched2
Does single basic block scheduling after register allocation
Register allocation may produce spill code (load/store)
 so the code needs to be scheduled again
Instruction scheduling
 Instruction scheduling resolves the following
hazards to prevent pipeline stalls
 Structural hazard
A structural hazard occurs when two or more instructions
need the same function unit at the same time
 Data hazard
RAW (read after write): a true dependency
WAR (write after read): an anti-dependency
WAW (write after write): an output dependency
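The three data hazards can be told apart from the registers each instruction defines and uses. A minimal sketch, assuming each instruction is given as a hypothetical (defs, uses) pair of register-name sets:

```python
def dependencies(first, second):
    """Classify the data dependences of `second` on an earlier `first`.

    Each instruction is a (defs, uses) pair of register-name sets.
    """
    kinds = []
    if first[0] & second[1]:
        kinds.append("RAW")  # second reads what first wrote (true dependency)
    if first[1] & second[0]:
        kinds.append("WAR")  # second overwrites what first read (anti-dependency)
    if first[0] & second[0]:
        kinds.append("WAW")  # both write the same register (output dependency)
    return kinds

add = ({"r0"}, {"r0", "r1"})   # add $r0, $r0, $r1
mul = ({"r5"}, {"r0", "r4"})   # mul $r5, $r0, $r4
print(dependencies(add, mul))
```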
Instruction scheduling
 GCC provides several constructs to describe the
pipeline model
 After parsing the pipeline description in the porting
code,
GCC generates an automaton that acts as a pipeline hazard
recognizer
It determines whether the processor can issue an
instruction on a given simulated cycle
(define_automaton "name")
Instruction scheduling
(define_automaton "a1")
(define_cpu_unit "decode1,decode2" "a1")
(define_cpu_unit "div" "a1")
(define_insn_reservation "alu_class" 1
  (eq_attr "type" "alu")
  "decode + alu")
(define_insn_reservation "mult_class" 1
  (eq_attr "type" "mult")
  "decode + mult")
a1: automaton name
decode1, decode2, div: the cpu units
(function units) in the processor
define_insn_reservation: describes the
pipeline rule for each instruction class
alu_class, mult_class:
insn-name (insn class)
(eq_attr "type" "alu"): the rule matches
when the type attribute of the
instruction pattern is alu
"decode + alu": regular expression
describing the function unit usage
1 is the default latency in cycles when a data
dependency occurs
Instruction scheduling
 Multiple Alternative Constraints
(define_insn "addsi3"
  [(set (match_operand:SI 0 "register_operand" "=r,r")
        (plus:SI (match_operand:SI 1 "register_operand" "%r,r")
                 (match_operand:SI 2 "nonmemory_operand" "r,i")))]
  ""
  "@
   add %0, %1, %2
   addi %0, %1, %2"
  [(set_attr "type" "alu")]
)
(define_insn_reservation "alu_class" 1
  (eq_attr "type" "alu")
  "decode + alu")
Instruction scheduling
(define_automaton "a1")
(define_cpu_unit "decode1,decode2" "a1")
(define_cpu_unit "alu" "a1")
(define_cpu_unit "mult" "a1")
(define_insn_reservation "alu_class" 1
  (eq_attr "type" "alu")
  "decode + alu")
(define_insn_reservation "mult_class" 1
  (eq_attr "type" "mult")
  "decode + mult")
[State diagram: each state is the current CPU function unit usage. From state "nothing", issuing alu_class leads to state "decode + alu" and issuing mult_class leads to state "decode + mult"; next_cycle edges lead to the function unit usage of the next cycle.]
State transition:
1. Occupy some function unit
2. Release some function unit
Instruction scheduling
(define_insn_reservation "alu_class" 1
  (eq_attr "type" "alu")
  "decode + alu")
(define_insn_reservation "mult_class" 1
  (eq_attr "type" "mult")
  "decode + mult")
(define_bypass 2 "alu_class" "alu_class")
(define_bypass 3 "mult_class" "mult_class")
producer     consumer     t = 1  2  3  4  5
alu_class    alu_class        1  0  0  0  0
alu_class    mult_class       0  0  0  0  0
mult_class   alu_class        0  0  0  0  0
mult_class   mult_class       1  1  0  0  0
1 means the consumer will stall at cycle t
t is the number of cycles
after the producer
Instruction scheduling
producer     consumer     t = 1  2  3  4  5
alu_class    alu_class        1  0  0  0  0
alu_class    mult_class       0  0  0  0  0
mult_class   alu_class        0  0  0  0  0
mult_class   mult_class       1  1  0  0  0
[Figure: automaton states labelled "Current state", each holding a stall vector over t = 1..5 for the consumers alu_class and mult_class]
Instruction scheduling
1. movi $r0, 0 {alu}
2. movi $r1, 1 {alu}
3. add $r0, $r0, $r1 {alu}
4. lwi $r4, [$sp + 4] {load}
5. mul $r5, $r0, $r4 {mult}
(define_insn_reservation "alu_class" 1
  (eq_attr "type" "alu")
  "decode + alu")
(define_insn_reservation "mult_class" 1
  (eq_attr "type" "mult")
  "decode + mult")
(define_insn_reservation "load_class" 1
  (eq_attr "type" "load")
  "decode + mem")
The priority of each instruction is
calculated bottom-up:
P = latency + max {priority of each successor}
Priorities: insn 1: 3, insn 2: 3, insn 3: 2, insn 4: 2, insn 5: 1
Dataflow graph: 1 -> 3, 2 -> 3, 3 -> 5, 4 -> 5
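The bottom-up priority calculation for this example can be sketched as below. The successor lists are read off the five instructions (3 uses the results of 1 and 2, 5 uses the results of 3 and 4), and a latency of 1 is assumed for every instruction, as in the reservations above.

```python
# Dataflow graph of the example: insn -> insns that consume its result.
SUCCESSORS = {1: [3], 2: [3], 3: [5], 4: [5], 5: []}
LATENCY = {1: 1, 2: 1, 3: 1, 4: 1, 5: 1}   # assumed default latency of 1

def priority(insn, memo=None):
    """Own latency plus the largest priority among the successors."""
    memo = {} if memo is None else memo
    if insn not in memo:
        memo[insn] = LATENCY[insn] + max(
            (priority(s, memo) for s in SUCCESSORS[insn]), default=0)
    return memo[insn]

PRIORITIES = {i: priority(i) for i in SUCCESSORS}
print(PRIORITIES)
```

Instructions at the start of the longest dependence chain get the highest priority, so the scheduler tends to issue them first.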
Instruction scheduling
Ready list: 1 2 4
Pending list: 3 5
Queued list:
Scheduled list:
[Diagram: an instruction moves from Pending to Ready when its dependencies are resolved, from Ready to Queued on a data hazard, and to Scheduled once it is issued]
Pick the highest-priority insn from the Ready list
Instruction scheduling
Ready list: 4
Pending list: 3 5
Queued list: 2
Scheduled list: 1
cycle 1: 1. movi $r0, 0 {alu}
(define_bypass 2 "alu_class" "alu_class")
Instruction scheduling
Ready list: 2
Pending list: 3 5
Queued list:
Scheduled list: 1 4
cycle 1: 1. movi $r0, 0 {alu}
cycle 2: 4. lwi $r4, [$sp + 4] {load}
Instruction scheduling
Ready list:
Pending list: 5
Queued list: 3
Scheduled list: 1 4 2
cycle 1: 1. movi $r0, 0 {alu}
cycle 2: 4. lwi $r4, [$sp + 4] {load}
cycle 3: 2. movi $r1, 1 {alu}
Instruction scheduling
Ready list: 3
Pending list: 5
Queued list:
Scheduled list: 1 4 2
cycle 1: 1. movi $r0, 0 {alu}
cycle 2: 4. lwi $r4, [$sp + 4] {load}
cycle 3: 2. movi $r1, 1 {alu}
cycle 4: (no instruction issued)
Instruction scheduling
Ready list: 5
Pending list:
Queued list:
Scheduled list: 1 4 2 3
cycle 1: 1. movi $r0, 0 {alu}
cycle 2: 4. lwi $r4, [$sp + 4] {load}
cycle 3: 2. movi $r1, 1 {alu}
cycle 4: (no instruction issued)
cycle 5: 3. add $r0, $r0, $r1 {alu}
Instruction scheduling
Ready list:
Pending list:
Queued list:
Scheduled list: 1 4 2 3 5
cycle 1: 1. movi $r0, 0 {alu}
cycle 2: 4. lwi $r4, [$sp + 4] {load}
cycle 3: 2. movi $r1, 1 {alu}
cycle 4: (no instruction issued)
cycle 5: 3. add $r0, $r0, $r1 {alu}
cycle 6: 5. mul $r5, $r0, $r4 {mult}
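The walk-through can be replayed with a small list scheduler. This is a sketch of the simplified model used on these slides, not of GCC's scheduler: one instruction issues per cycle, and the stall table is applied to any earlier issued instruction of the producer class, which is an assumption read off the walk-through (in GCC itself a define_bypass latency is understood to apply to dependent producer/consumer pairs). All names are hypothetical.

```python
# insn number -> (class, predecessors in the dataflow graph)
INSNS = {
    1: ("alu", []),
    2: ("alu", []),
    3: ("alu", [1, 2]),
    4: ("load", []),
    5: ("mult", [3, 4]),
}
# (producer class, consumer class) -> cycles after the producer in which
# the consumer stalls, taken from the stall table of the slides.
STALLS = {("alu", "alu"): {1}, ("mult", "mult"): {1, 2}}

def priorities(insns):
    """Latency 1 per insn plus the largest successor priority."""
    succs = {n: [] for n in insns}
    for n, (_, preds) in insns.items():
        for p in preds:
            succs[p].append(n)
    prio = {}
    def visit(n):
        if n not in prio:
            prio[n] = 1 + max((visit(s) for s in succs[n]), default=0)
        return prio[n]
    for n in insns:
        visit(n)
    return prio

def schedule(insns, stalls):
    """Return {insn: issue cycle}, issuing at most one insn per cycle."""
    prio = priorities(insns)
    issued = {}
    cycle = 0
    while len(issued) < len(insns):
        cycle += 1
        def can_issue(n):
            cls, preds = insns[n]
            if any(p not in issued for p in preds):
                return False                      # still pending
            return all(cycle - c not in stalls.get((insns[m][0], cls), ())
                       for m, c in issued.items())  # otherwise queued
        ready = [n for n in insns if n not in issued and can_issue(n)]
        if ready:
            # highest priority first, lowest insn number on ties
            pick = max(ready, key=lambda n: (prio[n], -n))
            issued[pick] = cycle
    return issued

print(schedule(INSNS, STALLS))
```

Under these assumptions the issue order is 1, 4, 2, 3, 5, with no instruction issued in cycle 4.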
Thank you
Switch initialization conversion in
gimple optimization pass
Before:
int a, b;

switch (argc)
  {
  case 1:
  case 2:
    a = 8;
    b = 6;
    break;
  case 3:
    a = 9;
    b = 5;
    break;
  case 12:
    a = 10;
    b = 4;
    break;
  default:
    a = 16;
    b = 1;
  }
After:
static const int CSWTCH01[] = {6, 6, 5, 1, 1, 1, 1, 1, 1, 1, 1, 4};
static const int CSWTCH02[] = {8, 8, 9, 16, 16, 16, 16, 16, 16, 16,
                               16, 10};

if (((unsigned) argc) - 1 <= 11)
  {
    a = CSWTCH02[argc - 1];
    b = CSWTCH01[argc - 1];
  }
else
  {
    a = 16;
    b = 1;
  }
Tries to convert a switch statement into static array accesses
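That the table lookup is equivalent to the switch can be checked by brute force. The sketch below restates both versions in Python (the function names are hypothetical); the tables have 12 entries, one for each argc from 1 to 12, with the default values filling the gaps.

```python
def with_switch(argc):
    """The original switch statement, returning (a, b)."""
    if argc in (1, 2):
        return 8, 6
    if argc == 3:
        return 9, 5
    if argc == 12:
        return 10, 4
    return 16, 1

CSWTCH01 = [6, 6, 5] + [1] * 8 + [4]      # values of b for argc = 1..12
CSWTCH02 = [8, 8, 9] + [16] * 8 + [10]    # values of a for argc = 1..12

def with_tables(argc):
    """The converted form: one range check, then two array loads."""
    index = argc - 1
    if 0 <= index <= 11:   # what the unsigned comparison tests in C
        return CSWTCH02[index], CSWTCH01[index]
    return 16, 1

print(all(with_switch(n) == with_tables(n) for n in range(-5, 20)))
```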

More Related Content

What's hot

An Introduction to CMake
An Introduction to CMakeAn Introduction to CMake
An Introduction to CMake
ICS
 
BUD17-302: LLVM Internals #2
BUD17-302: LLVM Internals #2 BUD17-302: LLVM Internals #2
BUD17-302: LLVM Internals #2
Linaro
 
LLVM Register Allocation
LLVM Register AllocationLLVM Register Allocation
LLVM Register Allocation
Wang Hsiangkai
 
GCC LTO
GCC LTOGCC LTO
LLVM 總是打開你的心:從電玩模擬器看編譯器應用實例
LLVM 總是打開你的心:從電玩模擬器看編譯器應用實例LLVM 總是打開你的心:從電玩模擬器看編譯器應用實例
LLVM 總是打開你的心:從電玩模擬器看編譯器應用實例
National Cheng Kung University
 
Qemu JIT Code Generator and System Emulation
Qemu JIT Code Generator and System EmulationQemu JIT Code Generator and System Emulation
Qemu JIT Code Generator and System Emulation
National Cheng Kung University
 
Build Programming Language Runtime with LLVM
Build Programming Language Runtime with LLVMBuild Programming Language Runtime with LLVM
Build Programming Language Runtime with LLVM
National Cheng Kung University
 
Debug Information And Where They Come From
Debug Information And Where They Come FromDebug Information And Where They Come From
Debug Information And Where They Come From
Min-Yih Hsu
 
Integrated Register Allocation introduction
Integrated Register Allocation introductionIntegrated Register Allocation introduction
Integrated Register Allocation introduction
Shiva Chen
 
How to write a TableGen backend
How to write a TableGen backendHow to write a TableGen backend
How to write a TableGen backend
Min-Yih Hsu
 
Interpreter, Compiler, JIT from scratch
Interpreter, Compiler, JIT from scratchInterpreter, Compiler, JIT from scratch
Interpreter, Compiler, JIT from scratch
National Cheng Kung University
 
How it's made: C++ compilers (GCC)
How it's made: C++ compilers (GCC)How it's made: C++ compilers (GCC)
How it's made: C++ compilers (GCC)
Sławomir Zborowski
 
Virtual Machine Constructions for Dummies
Virtual Machine Constructions for DummiesVirtual Machine Constructions for Dummies
Virtual Machine Constructions for Dummies
National Cheng Kung University
 
Something About Dynamic Linking
Something About Dynamic LinkingSomething About Dynamic Linking
Something About Dynamic Linking
Wang Hsiangkai
 
icecream / icecc:分散式編譯系統簡介
icecream / icecc:分散式編譯系統簡介icecream / icecc:分散式編譯系統簡介
icecream / icecc:分散式編譯系統簡介
Kito Cheng
 
Effective CMake
Effective CMakeEffective CMake
Effective CMake
Daniel Pfeifer
 
Runtime Symbol Resolution
Runtime Symbol ResolutionRuntime Symbol Resolution
Runtime Symbol Resolution
Ken Kawamoto
 
Understand more about C
Understand more about CUnderstand more about C
Understand more about C
Yi-Hsiu Hsu
 

What's hot (20)

An Introduction to CMake
An Introduction to CMakeAn Introduction to CMake
An Introduction to CMake
 
BUD17-302: LLVM Internals #2
BUD17-302: LLVM Internals #2 BUD17-302: LLVM Internals #2
BUD17-302: LLVM Internals #2
 
LLVM Register Allocation
LLVM Register AllocationLLVM Register Allocation
LLVM Register Allocation
 
GCC LTO
GCC LTOGCC LTO
GCC LTO
 
LLVM 總是打開你的心:從電玩模擬器看編譯器應用實例
LLVM 總是打開你的心:從電玩模擬器看編譯器應用實例LLVM 總是打開你的心:從電玩模擬器看編譯器應用實例
LLVM 總是打開你的心:從電玩模擬器看編譯器應用實例
 
Qemu JIT Code Generator and System Emulation
Qemu JIT Code Generator and System EmulationQemu JIT Code Generator and System Emulation
Qemu JIT Code Generator and System Emulation
 
Build Programming Language Runtime with LLVM
Build Programming Language Runtime with LLVMBuild Programming Language Runtime with LLVM
Build Programming Language Runtime with LLVM
 
Debug Information And Where They Come From
Debug Information And Where They Come FromDebug Information And Where They Come From
Debug Information And Where They Come From
 
LLVM
LLVMLLVM
LLVM
 
Integrated Register Allocation introduction
Integrated Register Allocation introductionIntegrated Register Allocation introduction
Integrated Register Allocation introduction
 
How to write a TableGen backend
How to write a TableGen backendHow to write a TableGen backend
How to write a TableGen backend
 
Interpreter, Compiler, JIT from scratch
Interpreter, Compiler, JIT from scratchInterpreter, Compiler, JIT from scratch
Interpreter, Compiler, JIT from scratch
 
How it's made: C++ compilers (GCC)
How it's made: C++ compilers (GCC)How it's made: C++ compilers (GCC)
How it's made: C++ compilers (GCC)
 
Virtual Machine Constructions for Dummies
Virtual Machine Constructions for DummiesVirtual Machine Constructions for Dummies
Virtual Machine Constructions for Dummies
 
淺談探索 Linux 系統設計之道
淺談探索 Linux 系統設計之道 淺談探索 Linux 系統設計之道
淺談探索 Linux 系統設計之道
 
Something About Dynamic Linking
Something About Dynamic LinkingSomething About Dynamic Linking
Something About Dynamic Linking
 
icecream / icecc:分散式編譯系統簡介
icecream / icecc:分散式編譯系統簡介icecream / icecc:分散式編譯系統簡介
icecream / icecc:分散式編譯系統簡介
 
Effective CMake
Effective CMakeEffective CMake
Effective CMake
 
Runtime Symbol Resolution
Runtime Symbol ResolutionRuntime Symbol Resolution
Runtime Symbol Resolution
 
Understand more about C
Understand more about CUnderstand more about C
Understand more about C
 

Viewers also liked

Graph-Based Source Code Analysis of JavaScript Repositories
Graph-Based Source Code Analysis of JavaScript Repositories Graph-Based Source Code Analysis of JavaScript Repositories
Graph-Based Source Code Analysis of JavaScript Repositories
Dániel Stein
 
Exception handling poirting in gcc
Exception handling poirting in gccException handling poirting in gcc
Exception handling poirting in gccShiva Chen
 
Fix gcc lra bug
Fix gcc lra bugFix gcc lra bug
Fix gcc lra bugShiva Chen
 
Rethinking the debugger
Rethinking the debuggerRethinking the debugger
Rethinking the debugger
Iulian Dragos
 
Network analysis in basketball
Network analysis in basketballNetwork analysis in basketball
Network analysis in basketball
Swaagie
 
Linux Profiling at Netflix
Linux Profiling at NetflixLinux Profiling at Netflix
Linux Profiling at Netflix
Brendan Gregg
 

Viewers also liked (6)

Graph-Based Source Code Analysis of JavaScript Repositories
Graph-Based Source Code Analysis of JavaScript Repositories Graph-Based Source Code Analysis of JavaScript Repositories
Graph-Based Source Code Analysis of JavaScript Repositories
 
Exception handling poirting in gcc
Exception handling poirting in gccException handling poirting in gcc
Exception handling poirting in gcc
 
Fix gcc lra bug
Fix gcc lra bugFix gcc lra bug
Fix gcc lra bug
 
Rethinking the debugger
Rethinking the debuggerRethinking the debugger
Rethinking the debugger
 
Network analysis in basketball
Network analysis in basketballNetwork analysis in basketball
Network analysis in basketball
 
Linux Profiling at Netflix
Linux Profiling at NetflixLinux Profiling at Netflix
Linux Profiling at Netflix
 

Gcc porting

  • 1. GCC porting: use instruction patterns to describe the target ISA. Shiva Chen, shiva0217@gmail.com, May 2013
  • 2. Outline
      • Compiler structure
      • Intermediate languages in GCC
      • Optimization passes in GCC
      • Define instruction patterns
      • Operand constraints
      • Match instruction patterns
      • Strict RTL
      • Target-defined constraints
      • Emit assembly code
      • Target information usage
      • Reserved words for describing instruction patterns
      • Examples of instruction patterns
      • Split instruction patterns
      • Instruction attributes
      • Peephole patterns
      • Instruction scheduling
  • 3.
  • 4. The three main intermediate language formats in GCC
      • GENERIC: a language-independent representation generated by each front end; the common representation for all languages supported by GCC.
      • GIMPLE: used for language-independent and target-independent optimization.
      • RTL: used for optimizations that take target features into account, as described by the porting code.
  • 5. Gimple optimization pass in GCC 4.6.2 004t.gimple 006t.vcg 009t.omplower 010t.lower 012t.eh 013t.cfg 017t.ssa 018t.veclower 019t.inline_param1 020t.einline 021t.early_optimizations 022t.copyrename1 023t.ccp1 024t.forwprop1 025t.ealias 026t.esra 027t.copyprop1 028t.mergephi1 029t.cddce1 030t.eipa_sra 031t.tailr1 032t.switchconv 034t.profile 035t.local-pure-const1 036t.fnsplit 037t.release_ssa 038t.inline_param2 057t.copyrename2 058t.cunrolli 059t.ccp2 060t.forwprop2 062t.alias 063t.retslot 064t.phiprop 065t.fre 066t.copyprop2 067t.mergephi2 068t.vrp1 069t.dce1 070t.cselim 071t.ifcombine 072t.phiopt1 073t.tailr2 074t.ch 076t.cplxlower 077t.sra 078t.copyrename3 079t.dom1 080t.phicprop1 081t.dse1 082t.reassoc1 083t.dce2 084t.forwprop3 085t.phiopt2 086t.objsz 087t.ccp3 088t.copyprop3 090t.bswap 091t.crited 092t.pre 093t.sink 094t.loop 095t.loopinit 096t.lim1 097t.copyprop4 … 143t.optimized
  • 6. RTL optimization pass in GCC 4.6.2 004t.gimple 144r.expand Other gimple pass 145r.sibling 147r.initvals 148r.unshare 149r.vregs 150r.into_cfglayout 151r.jump 152r.subreg1 153r.dfinit 154r.cse1 155r.fwprop1 156r.cprop1 158r.hoist 159r.cprop2 162r.ce1 163r.reginfo 164r.loop2 165r.loop2_init 166r.loop2_invariant 170r.loop2_done 172r.cprop3 173r.cse2 174r.dse1 175r.fwprop2 176r.auto_inc_dec 177r.init-regs 178r.dce 179r.combine 180r.ce2 182r.regmove 183r.outof_cfglayout 184r.split1 185r.subreg2 188r.asmcons 190r.sched1 191r.ira 192r.postreload 194r.split2 198r.pro_and_epilogue 199r.dse2 200r.csa 201r.peephole2 202r.ce3 204r.cprop_hardreg 205r.dce 206r.bbro 208r.split4 209r.sched2 212r.alignments 215r.mach 216r.barriers 217r.dbr 218r.split5 220r.shorten 221r.nothrow 222r.final 223r.dfinish 224t.statistics
  • 7. Why are the optimization passes divided into GIMPLE passes and RTL passes?
      • GIMPLE passes keep more high-level semantics (ex: switch, array, structure, variable). Some optimizations are easier to design while the high-level semantics still exist.
      • However, GIMPLE passes lack target information (ex: instruction length (size), supported ISA). Therefore, RTL optimization passes are needed as well.
  • 8. Define instruction pattern
      • Every RTL pattern must match the target ISA.
      • How do we tell GCC to generate RTL that matches the ISA? With instruction patterns.
      • Use define_expand and define_insn to describe the instruction patterns that the target supports:
          (define_insn "addsi3"
            [(set (match_operand:SI 0 "register_operand" "=r,r")
                  (plus:SI (match_operand:SI 1 "register_operand" "%r,r")
                           (match_operand:SI 2 "nonmemory_operand" "r,i")))]
            ... )
  • 9. Define instruction pattern
      • GCC already defines a number of instruction pattern names and the semantics of each pattern.
      • addsi3: addition with three SI-mode operands.
      • GCC does not know the operand constraints of the target.
      • How do we tell GCC the target's operand constraints for each instruction? With predicates and constraints.
  • 10. Operand constraints
      • Multiple alternative constraints:
          (define_insn "addsi3"
            [(set (match_operand:SI 0 "register_operand" "=r,r")
                  (plus:SI (match_operand:SI 1 "register_operand" "%r,r")
                           (match_operand:SI 2 "nonmemory_operand" "r,i")))]
            ... )
      • Predicates: register_operand, nonmemory_operand.
      • Constraints: r, i.
      • The predicate of an operand should accept everything that the operand's constraints accept. For operand 2 in SI mode, r (register) belongs to nonmemory_operand and i (immediate) belongs to nonmemory_operand.
  • 11. Operand constraints
      • GCC already has predicates to restrict operands. Why is the constraint field needed?
      • It gives the optimizer the opportunity to change an operand while still matching the same pattern.
      • Ex: movi $r0, 4; add $r1, $r1, $r0 {addsi3} becomes, after constant propagation, addi $r1, $r1, 4 {addsi3}.
  • 12. Operand constraints
      • GCC uses two levels of operand constraint (predicate and constraint).
      • This groups instructions with the same semantics under a single instruction pattern (addsi3).
      • Many ISAs have several assembly instructions with the same semantics but different operand constraints.
      • Grouping them reduces the number of instruction patterns needed when porting.
  • 13. Operand constraints
      • When GCC generates a new RTL pattern, it uses the instruction patterns to check whether the ISA supports it:
          • Has the back end defined the pattern with define_insn?
          • Does the predicate accept the operand type?
          • Which alternative does the constraint place the operand in?
  • 14. Operand constraints
      • Multiple alternative constraints:
          (define_insn "addsi3"
            [(set (match_operand:SI 0 "register_operand" "=r,r")
                  (plus:SI (match_operand:SI 1 "register_operand" "%r,r")
                           (match_operand:SI 2 "nonmemory_operand" "r,i")))]
            ... )
      • The first alternative of the constraints matches "add".
      • The second alternative of the constraints matches "addi".
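How an alternative gets picked can be sketched in plain C. This is only an illustration of the idea on slides 13 and 14, not GCC's actual matcher: the `operand_kind` enum, `pick_alternative`, and `mnemonic_for` are hypothetical names, and only the constraint letters `r` and `i` from the slide are modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operand kinds; real GCC operands are RTL expressions. */
enum operand_kind { OP_REG, OP_IMM, OP_MEM };

#define N_OPERANDS 3

/* One string per alternative, one constraint letter per operand,
   transposed from the slide's "=r,r" / "%r,r" / "r,i" columns.
   The '=' and '%' modifiers are ignored in this sketch. */
static const char *const addsi3_alternatives[] = { "rrr", "rri" };
static const char *const addsi3_mnemonics[] = { "add", "addi" };
#define N_ALTERNATIVES \
    (sizeof addsi3_alternatives / sizeof addsi3_alternatives[0])

/* Only the two constraint letters used on the slide are modelled. */
static int constraint_accepts(char letter, enum operand_kind kind)
{
    switch (letter) {
    case 'r': return kind == OP_REG;
    case 'i': return kind == OP_IMM;
    default:  return 0;
    }
}

/* Index of the first alternative that accepts every operand, or -1. */
static int pick_alternative(const enum operand_kind ops[N_OPERANDS])
{
    for (size_t alt = 0; alt < N_ALTERNATIVES; alt++) {
        int all_ok = 1;
        for (size_t i = 0; i < N_OPERANDS; i++) {
            if (!constraint_accepts(addsi3_alternatives[alt][i], ops[i]))
                all_ok = 0;
        }
        if (all_ok)
            return (int)alt;
    }
    return -1;
}

/* Mnemonic emitted for a matched alternative. */
static const char *mnemonic_for(int alt)
{
    assert(alt >= 0 && (size_t)alt < N_ALTERNATIVES);
    return addsi3_mnemonics[alt];
}
```

With three register operands the sketch selects alternative 0 ("add"); with an immediate as operand 2 it selects alternative 1 ("addi"); a memory operand matches neither alternative.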
  • 15. Match instruction pattern
      • Multiple alternative constraints:
          (define_insn "addsi3"
            [(set (match_operand:SI 0 "register_operand" "=r,r")
                  (plus:SI (match_operand:SI 1 "register_operand" "%r,r")
                           (match_operand:SI 2 "nonmemory_operand" "r,i")))]
            ... )
      • Ex: (set (reg/f:SI 88) (plus:SI (reg:SI 87) (reg/v:SI 55)))
      • 1. Parse the RTL pattern: (set (op0) (plus:SI (op1) (op2)))
  • 16. Match instruction pattern
      • When are new RTL patterns generated?
          • In the RTL expand phase (GIMPLE to RTL)
          • During optimization
      • Ex: the combine phase merges
          (set (reg/f:SI 47) (lshiftrt:SI (reg:SI 60) (const_int 2)))    ; srli $r47, $r60, 2
          (set (reg/f:SI 88) (plus:SI (reg:SI 47) (reg:SI 55)))          ; add $r88, $r47, $r55
        into
          (set (reg/f:SI 88) (plus:SI (lshiftrt:SI (reg:SI 60) (const_int 2)) (reg/v:SI 55)))    ; add_srli $r88, $r55, $r60, 2
  • 17. Strict RTL
      • Does a newly generated RTL pattern always satisfy its constraints?
      • GCC allows certain constraint mismatches that reload can fix later.
      • The predicate must always be satisfied.
      • Diagram: without optimization1, RTL1 goes through reload and becomes RTL3. With optimization1, RTL1 becomes RTL2, which does not satisfy the constraint, and reload turns it into RTL4.
          • 1. RTL3 and RTL4 both satisfy the constraint.
          • 2. RTL4 is better than RTL3.
  • 18. Strict RTL
      • A constraint may be left unsatisfied before reload, in the expectation that reload will fix it.
      • Ex: the constraint is m (memory) but the current operand is a constant; GCC allows this before reload.
      • The reload phase comes after register allocation. In fact, during register allocation GCC calls reload whenever an operand does not fit its constraint.
      • After reload, each operand must satisfy one of its constraint alternatives (strict RTL).
  • 19. Strict RTL
          (define_insn "movsi"
            [(set (match_operand:SI 0 "register_operand" "=r,m")
                  (match_operand:SI 1 "register_operand" "r,r"))]
            ... )
      • Before register allocation: (set (reg/f:SI 47) (reg:SI 60))
      • After register allocation (RA): (set (reg/f:SI 47) (reg:SI 3)). Assume pseudo register r60 is assigned to r3 and the hardware registers are exhausted.
      • After reload: (set (mem:SI (plus (sp) (const))) (reg:SI 3))
  • 20. Target-defined constraints
      • A target can define its own predicates and constraints.
      • Target-defined predicate:
          (define_predicate "index_operand"
            (ior (match_operand 0 "register_operand")
                 (and (match_operand 0 "const_int_operand")
                      (match_test "INTVAL (op) < 4096 && INTVAL (op) > -4096"))))
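The logic of the index_operand predicate above can be restated as a small C function. This is a sketch under stated assumptions: `struct operand` and `index_operand_p` are hypothetical stand-ins for GCC's RTL operand and the generated predicate function, and only the register and const_int cases from the slide are modelled.

```c
#include <assert.h>

/* Hypothetical operand model: a register, a constant integer, or memory. */
enum operand_kind { OP_REG, OP_CONST_INT, OP_MEM };

struct operand {
    enum operand_kind kind;
    long value;   /* meaningful only for OP_CONST_INT */
};

/* Mirrors the slide's predicate: accept a register, or a const_int
   strictly between -4096 and 4096. */
static int index_operand_p(const struct operand *op)
{
    assert(op != 0);
    if (op->kind == OP_REG)
        return 1;
    return op->kind == OP_CONST_INT
           && op->value < 4096 && op->value > -4096;
}
```

Both bounds are exclusive, so 4095 and -4095 are accepted while 4096 and -4096 are rejected.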
  • 21. Target-defined constraints
      • Target-defined constraints:
          (define_register_constraint "l" "LO_REGS"
            "registers r0->r7.")
          (define_memory_constraint "Uv"
            "@internal In ARM/Thumb-2 state a valid VFP load/store address."
            (and (match_code "mem")
                 (match_test "TARGET_32BIT && arm_coproc_mem_operand (op, FALSE)")))
  • 22. Emit assembly code
      • Multiple alternative constraints:
          (define_insn "addsi3"
            [(set (match_operand:SI 0 "register_operand" "=r,r")
                  (plus:SI (match_operand:SI 1 "register_operand" "%r,r")
                           (match_operand:SI 2 "nonmemory_operand" "r,i")))]
            ""
            "@
             add %0, %1, %2
             addi %0, %1, %2")
      • Ex: (set (reg/f:SI 3) (plus:SI (reg:SI 4) (reg:SI 5)))
      • This matches the first alternative of the constraints, which matches "add".
      • Output assembly code: "add $r3, $r4, $r5"
  • 23. Target information usage
      • When does GCC use the target information obtained from the instruction patterns?
      • RTL instruction pattern generation: insn-emit.c is generated from the instruction patterns when GCC is built.
      • RTL instruction validation (is the instruction supported by the target?): insn-recog.c is generated from the instruction patterns when GCC is built.
      • Emitting target assembly code: insn-output.c is generated from the instruction patterns when GCC is built.
  • 24. Reserved words for describing instruction patterns
      • define_insn "named pattern": RTL generation, RTL validation, emit assembly
      • define_expand "named pattern": RTL generation
      • define_insn "*..": RTL validation, emit assembly
      • GCC defines a set of named patterns and their semantics, and uses them to generate RTL during the RTL expand phase (ex: addsi3, subsi3, movsi, movhi, …).
      • Some target instructions have semantics that no GCC named pattern covers, but their RTL can still be produced by an optimization (ex: add_slli can be generated by the combine phase).
      • Defining an unnamed pattern makes such an instruction valid: define_insn "*add_slli".
      • A define_insn whose name starts with * is treated as an unnamed pattern.
  • 25. Example of instruction pattern
          ;; These control RTL generation for conditional jump insns
          (define_expand "cbranchsi4"
            [(set (pc)
                  (if_then_else (match_operator 0 "ordered_comparison_operator"
                                  [(match_operand:SI 1 "nonmemory_nonsymbol_operand" "")
                                   (match_operand:SI 2 "nonmemory_nonsymbol_operand" "")])
                                (label_ref (match_operand 3 "" ""))
                                (pc)))]
            ""
          {
            sh_expand_cbranchsi4 (operands);
            DONE;
          }
          )
      • Semantics of "cbranchsi4": compare operand 1 and operand 2 using operator 0, and branch to label 3 if the comparison is true.
      • The predicate "ordered_comparison_operator" includes EQ, NE, LT, LTU, LE, LEU, GT, GTU, GE, GEU.
      • The porting function sh_expand_cbranchsi4 is used to generate the RTL pattern.
  • 26. Example of instruction pattern
          (define_insn "*bcondz"
            [(set (pc)
                  (if_then_else (match_operator 0 "bcondz_operator"
                                  [(match_operand:SI 1 "register_operand" "r")
                                   (const_int 0)])
                                (label_ref (match_operand 2 "" ""))
                                (pc)))]
            ""
          {
            switch (GET_CODE (operands[0]))
              {
              case EQ:
                return "beqz %1, %2";
              case NE:
                return "bnez %1, %2";
              case LT:
                return "bltz %1, %2";
              case LE:
                return "blez %1, %2";
              case GT:
                return "bgtz %1, %2";
              case GE:
                return "bgez %1, %2";
              default:
                gcc_unreachable ();
              }
          }
          )
      • The unnamed pattern "*bcondz" is used to validate RTL and to emit assembly code for branches that compare against zero.
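The output switch in "*bcondz" is ordinary C, so its mapping from comparison code to assembly template can be tried outside GCC. The sketch below is an assumption-laden stand-in: `enum cmp_code` replaces GCC's `rtx_code`, `bcondz_template` and `emits` are hypothetical helpers, and an unsupported code returns NULL where the real pattern calls `gcc_unreachable ()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the comparison codes GET_CODE would return.
   CMP_LTU is included only to show a code the pattern does not handle. */
enum cmp_code { CMP_EQ, CMP_NE, CMP_LT, CMP_LE, CMP_GT, CMP_GE, CMP_LTU };

/* Assembly template for a compare-with-zero branch, or NULL when the
   comparison is not one that "*bcondz" handles. */
static const char *bcondz_template(enum cmp_code code)
{
    switch (code) {
    case CMP_EQ: return "beqz %1, %2";
    case CMP_NE: return "bnez %1, %2";
    case CMP_LT: return "bltz %1, %2";
    case CMP_LE: return "blez %1, %2";
    case CMP_GT: return "bgtz %1, %2";
    case CMP_GE: return "bgez %1, %2";
    default:     return NULL;
    }
}

/* Convenience check: does this comparison emit the expected template? */
static int emits(enum cmp_code code, const char *expected)
{
    const char *actual = bcondz_template(code);
    assert(expected != NULL);
    return actual != NULL && strcmp(actual, expected) == 0;
}
```

In the real pattern the "bcondz_operator" predicate is what keeps unsupported codes from ever reaching the switch, which is why the default case there can be unreachable.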
  • 27. Example of instruction pattern
          (define_insn "one_cmplsi2"
            [(set (match_operand:SI 0 "register_operand" "=r")
                  (not:SI (match_operand:SI 1 "register_operand" "r")))]
            ""
            "nor\t%0, %1, %1")
      • Semantics of "one_cmplsi2": NOT operand 1 and store the result in operand 0.
      • The named pattern "one_cmplsi2" is used to generate RTL, validate RTL, and output assembly code.
      • It outputs the assembly "nor ra, rb, rb" to match the semantics.
  • 28. Split instruction pattern
      • When is splitting an instruction pattern needed?
      • When a const_int value is too big for a single assembly instruction to encode, the const_int is split into a high part and a low part.
      • The constant could be split in define_expand, but that is not good enough. Why?
      • Splitting the constant too early loses opportunities to optimize the RTL pattern.
  • 29. Split instruction pattern
      • The optimization phase "move2add" can do the following (assembly code is used to show the RTL semantics for convenience):
          Before:
            move $r0, 123456
            move $r1, 123457
            move $r2, 123458
          After move2add:
            move $r0, 123456
            addi $r1, $r0, 1
            addi $r2, $r0, 2
      • If the const_int is split into high and low parts too early, move2add fails to turn the moves into adds:
            sethi $r0, hi20(123456)
            ori $r0, lo12(123456)
            sethi $r1, hi20(123457)
            ori $r1, lo12(123457)
            sethi $r2, hi20(123458)
            ori $r2, lo12(123458)
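The arithmetic behind hi20/lo12 and behind the move2add rewrite can be checked with a few lines of C. This sketch assumes that sethi places a 20-bit value in bits 31..12 and that ori fills in the low 12 bits; the slide does not spell out the encoding, and the function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed meaning of the slide's hi20()/lo12() operators. */
static uint32_t hi20(uint32_t value) { return value >> 12; }
static uint32_t lo12(uint32_t value) { return value & 0xFFFu; }

/* Rebuild a constant the way a sethi/ori pair would. */
static uint32_t sethi_ori(uint32_t value)
{
    uint32_t reg = hi20(value) << 12;   /* sethi: upper 20 bits */
    reg |= lo12(value);                 /* ori:   lower 12 bits */
    assert(reg == value);
    return reg;
}

/* What move2add exploits: the distance from a constant that is already
   held in a register, small enough here for an addi immediate. */
static int32_t offset_from(uint32_t base, uint32_t value)
{
    return (int32_t)(value - base);
}
```

For 123456 (0x1E240) the high part is 30 and the low part is 576. The offsets of 123457 and 123458 from 123456 are 1 and 2, which is the relationship move2add can no longer see once each constant has been broken into a sethi/ori pair.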
  • 30. Split instruction pattern
      • How can an instruction pattern be split somewhere other than the RTL expand phase?
      • Use define_split or define_insn_and_split.
  • 31. Split instruction pattern 004t.gimple 144r.expand Other gimple pass 145r.sibling 147r.initvals 148r.unshare 149r.vregs 150r.into_cfglayout 151r.jump 152r.subreg1 153r.dfinit 154r.cse1 155r.fwprop1 156r.cprop1 158r.hoist 159r.cprop2 162r.ce1 163r.reginfo 164r.loop2 165r.loop2_init 166r.loop2_invariant 170r.loop2_done 172r.cprop3 173r.cse2 174r.dse1 175r.fwprop2 176r.auto_inc_dec 177r.init-regs 178r.dce 179r.combine 180r.ce2 182r.regmove 183r.outof_cfglayout 184r.split1 185r.subreg2 188r.asmcons 190r.sched1 191r.ira 192r.postreload 194r.split2 198r.pro_and_epilogue 199r.dse2 200r.csa 201r.peephole2 202r.ce3 204r.cprop_hardreg 205r.dce 206r.bbro 208r.split4 209r.sched2 212r.alignments 215r.mach 216r.barriers 217r.dbr 218r.split5 220r.shorten 221r.nothrow 222r.final 223r.dfinish 224t.statistics
  • 32. Split instruction pattern 486 (define_insn_and_split "*movsi_const" 487 [(set (match_operand:WORD 0 "register_operand" "=r,r") 488 (match_operand:WORD 1 "immediate_operand" "P,i"))] 489 "" 490 { 491 if (GET_CODE (operands[1]) == CONST_INT && SIGNED_INT_FITS_N_BITS (INTVAL (operands[1]), 20)) 492 { 493 return "movit%0, %1"; 494 } 495 else 496 return "#"; 497 } 498 "reload_completed && GET_CODE (operands[1]) == CONST_INT && ! SIGNED_INT_FITS_N_BITS (INTVAL (ope rands[1]), 20)" 499 [(set (match_dup 0) (high:SI (match_dup 1))) 500 (set (match_dup 0) (lo_sum:SI (match_dup 0) (match_dup 1)))] If const_int not fit signed 20 bit return “#” which means the pattern will split in split phase
  • 33. Split instruction pattern
     Split condition of "*movsi_const" (slide 32):
          "reload_completed && GET_CODE (operands[1]) == CONST_INT
           && ! SIGNED_INT_FITS_N_BITS (INTVAL (operands[1]), 20)"
     That is: reload_completed (after reload) && the const_int does not fit in a signed 20-bit immediate
  • 34. Split instruction pattern
     New RTL produced by the split of "*movsi_const" (slide 32):
          [(set (match_dup 0) (high:SI (match_dup 1)))
           (set (match_dup 0) (lo_sum:SI (match_dup 0) (match_dup 1)))]
     The pattern is split into one insn that sets the high part and one that adds the low part (lo_sum)
     (match_dup 0) means: duplicate operand 0 into this field
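The decision made by "*movsi_const" and the arithmetic behind the high/lo_sum split can be sketched as follows. This is a minimal sketch assuming a 32-bit constant split into an upper 20-bit part and a lower 12-bit part, as in the hi20/lo12 notation of slide 29; the function names and the tuple encoding of the emitted instructions are hypothetical.

```python
def signed_int_fits_n_bits(value, bits):
    """Python stand-in for the SIGNED_INT_FITS_N_BITS check used in the pattern."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def split_const(value):
    """Return (hi20, lo12) such that (hi20 << 12) | lo12 equals the 32-bit value."""
    value &= 0xFFFFFFFF
    return value >> 12, value & 0xFFF


def load_constant(reg, value):
    """One movi when the constant fits in 20 signed bits, otherwise a sethi/ori pair."""
    if signed_int_fits_n_bits(value, 20):
        return [("movi", reg, value)]
    hi20, lo12 = split_const(value)
    return [("sethi", reg, hi20), ("ori", reg, reg, lo12)]


hi20, lo12 = split_const(0x12345678)
```

Because the low 12 bits of the sethi result are zero, OR-ing in the low part gives the same value as adding it, which is why the lo_sum can be emitted as an ori here.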
  • 35. Split instruction pattern
        (define_split
          [(set (match_operand:ANY64 0 "register_operand" "")
                (match_operand:ANY64 1 "register_operand" ""))]
          "reload_completed &&
           (! USE_V3_SERISE_ISA)"
          [(set (match_dup 0) (match_dup 1))
           (set (match_dup 2) (match_dup 3))]
          "…
     The split condition is reload_completed && not V3 ISA
     V3 has movd44, which can do a 64-bit register move
     ANY64: DI, DF (DI: double integer, DF: double float)
     define_split: split RTL only
     define_insn_and_split: split RTL, RTL validation, emit assembly
  • 36. Instruction attribute
        (define_attr "type"
          "unknown,load,store,bequal, alu, .."
          (const_string "unknown"))
        …
        (define_insn "cmovn"
          [(set (match_operand:SI 0 "register_operand" "=r")
                (if_then_else (ne:SI (match_operand:SI 1 "register_operand" "r")
                                     (const_int 0))
                              (match_operand:SI 2 "register_operand" "r")
                              (match_operand:SI 3 "register_operand" "0")))]
          ""
          "cmovn\t%0, %2, %1"
          [(set_attr "type" "alu")
           (set_attr "length" "4")])
     General form: (define_attr "attribute_name" "value domain" (default value))
  • 37. Instruction attribute
     The "type" attribute divides instructions into several instruction groups
        This helps when writing the instruction scheduling porting code
     The "length" attribute gives the ISA length (size) of each instruction, so that GCC can calculate branch distances correctly
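To see why the "length" attribute matters for branches, here is a small sketch that derives instruction addresses from per-instruction lengths and measures a branch distance. The mix of 2-byte and 4-byte lengths and the convention of measuring from the address of the branch itself are assumptions for illustration only.

```python
def addresses(lengths):
    """Address of each insn: the sum of the lengths of all insns before it."""
    addr, out = 0, []
    for length in lengths:
        out.append(addr)
        addr += length
    return out


def branch_distance(lengths, branch_index, target_index):
    """Byte distance from the branch insn to its target (negative means backward)."""
    addr = addresses(lengths)
    return addr[target_index] - addr[branch_index]


# Hypothetical function body: per-insn "length" attribute values in bytes.
lengths = [4, 2, 4, 2, 4]
forward = branch_distance(lengths, branch_index=1, target_index=4)
backward = branch_distance(lengths, branch_index=3, target_index=0)
```

If one length were wrong, every address after that instruction would shift, and a branch could be judged to be in range when it is not.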
  • 38. Peephole pattern
        ;; Merge move 0 to bcondz
        (define_peephole2
          [(set (match_operand:SI 0 "register_operand" "") (const_int 0))
           (set (pc)
                (if_then_else (match_operator 1 "bcondz_operator"
                                [(match_dup 0)
                                 (match_operand:SI 2 "register_operand" "r")])
                              (label_ref (match_operand 3 "" ""))
                              (pc)))]
          "peep2_reg_dead_p (2, operands[0])"
          [(set (pc)
                (if_then_else:SI (match_dup 1)
                                 (label_ref (match_dup 3)) (pc)))]
          "
        {
          operands[1] = gen_rtx_fmt_ee (swap_condition (GET_CODE (operands[1])),
                                        SImode, operands[2], GEN_INT (0));
        }")
     Old RTL:
          movi $r0, 0
          bne $r0, $r1, L3
     New RTL:
          bnez $r1, L3
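The effect of this peephole can be imitated on a list of assembly-like tuples. The sketch below is not how GCC matches RTL; it only mirrors the three ingredients of the pattern: a move of zero, a following compare-and-branch that uses that register, and a check that the register is dead afterwards. The tuple encoding, the `reg_dead_after` callback, and the `beq`/`beqz` pair are assumptions; the slide only shows the `bne` to `bnez` case.

```python
# Hypothetical mapping from a two-register branch to its compare-with-zero form.
ZERO_BRANCH = {"beq": "beqz", "bne": "bnez"}


def merge_move_zero(insns, reg_dead_after):
    """Merge ("movi", r, 0) followed by (branch, r, other, label) into a zero branch.

    reg_dead_after(index, reg) plays the role of peep2_reg_dead_p: it must
    return True if reg is no longer needed after the insn at that index.
    """
    out = []
    i = 0
    while i < len(insns):
        cur = insns[i]
        nxt = insns[i + 1] if i + 1 < len(insns) else None
        if (nxt is not None
                and cur[0] == "movi" and cur[2] == 0
                and nxt[0] in ZERO_BRANCH
                and nxt[1] == cur[1]       # the branch compares the zeroed register
                and nxt[2] != cur[1]
                and reg_dead_after(i + 1, cur[1])):
            out.append((ZERO_BRANCH[nxt[0]], nxt[2], nxt[3]))
            i += 2
        else:
            out.append(cur)
            i += 1
    return out


old = [("movi", "$r0", 0), ("bne", "$r0", "$r1", "L3")]
new = merge_move_zero(old, lambda index, reg: True)
kept = merge_move_zero(old, lambda index, reg: False)
```

When the register is still live after the branch, the move cannot be dropped and the sequence is left unchanged.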
  • 39. Instruction scheduling
     Instruction scheduling is an optimization pass in GCC
     It changes the order of instructions without changing the semantics of the code
     The goal is to reduce pipeline stalls and so improve performance
     Instruction scheduling belongs to the RTL phase
  • 40. RTL optimization pass in GCC 4.6.2
     Scheduling passes in the pass list (same list as the earlier slide with this title):
        190r.sched1, which runs before register allocation (191r.ira)
        209r.sched2, which runs after register allocation
  • 41. Instruction scheduling
     GCC has two scheduling passes
     Sched1 does interblock scheduling before register allocation
        Tries to take the innermost loop as a region
        Schedules the instructions in the region
        Improves the performance of hot spots (innermost loops)
        Extends the scope to a region to find more scheduling opportunities
     Sched2 does single-basic-block scheduling after register allocation
        Register allocation may produce spill code (loads/stores), so the code needs to be scheduled again
  • 42. Instruction scheduling
     Instruction scheduling resolves the following hazards to prevent pipeline stalls
     Structural hazard: occurs when two or more instructions need the same function unit at the same time
     Data hazard
        RAW (read after write): a true dependency
        WAR (write after read): an anti-dependency
        WAW (write after write): an output dependency
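The three kinds of data hazard can be told apart from the registers each instruction writes and reads. A minimal sketch, assuming each instruction is described by a hypothetical (defs, uses) pair of register-name sets:

```python
def classify_dependencies(first, second):
    """Return the data dependencies of `second` on `first` (first comes earlier).

    Each insn is a (defs, uses) pair of sets of register names.
    """
    defs1, uses1 = first
    defs2, uses2 = second
    kinds = []
    if defs1 & uses2:
        kinds.append("RAW")   # second reads what first wrote: true dependency
    if uses1 & defs2:
        kinds.append("WAR")   # second overwrites what first read: anti-dependency
    if defs1 & defs2:
        kinds.append("WAW")   # both write the same register: output dependency
    return kinds


# Illustrative insns: movi $r0, 0 ; add $r0, $r0, $r1 ; movi $r1, 1
movi_r0 = ({"$r0"}, set())
add_r0 = ({"$r0"}, {"$r0", "$r1"})
movi_r1 = ({"$r1"}, set())

movi_then_add = classify_dependencies(movi_r0, add_r0)
add_then_movi = classify_dependencies(add_r0, movi_r1)
```

A scheduler may only swap two instructions when no such dependency exists between them.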
  • 43. Instruction scheduling
     GCC provides several constructs to describe the pipeline model
     After parsing the pipeline description in the porting code, GCC generates an automaton that acts as a pipeline hazard recognizer
     The automaton is used to figure out whether the processor can issue an instruction on a given simulated cycle
          (define_automaton "name")
  • 44. Instruction scheduling
        (define_automaton "a1")
        (define_cpu_unit "decode1,decode2" "a1")
        (define_cpu_unit "div" "a1")
        (define_insn_reservation "alu_class" 1 (eq_attr "type" "alu")
          "decode + alu")
        (define_insn_reservation "mult_class" 1 (eq_attr "type" "mult")
          "decode + mult")
     a1: automaton name
     decode1, decode2, div: the CPU units (function units) in the processor
     define_insn_reservation: describes the pipeline rule for each instruction class
     alu_class, mult_class: insn-name (insn class)
     (eq_attr "type" "alu"): the rule matches when the type attribute of the instruction pattern is alu
     "decode + alu": regular expression describing the function unit usage
     1: the default latency in cycles when a data dependency occurs
  • 45. Instruction scheduling
     Multiple alternative constraints
        (define_insn "addsi3"
          [(set (match_operand:SI 0 "register_operand" "=r,r")
                (plus:SI (match_operand:SI 1 "register_operand" "%r,r")
                         (match_operand:SI 2 "nonmemory_operand" "r,i")))]
          ""
          "@
           add %0, %1, %2
           addi %0, %1, %2"
          [(set_attr "type" "alu")])
        (define_insn_reservation "alu_class" 1 (eq_attr "type" "alu")
          "decode + alu")
  • 46. Instruction scheduling
        (define_automaton "a1")
        (define_cpu_unit "decode1,decode2" "a1")
        (define_cpu_unit "alu" "a1")
        (define_cpu_unit "mult" "a1")
        (define_insn_reservation "alu_class" 1 (eq_attr "type" "alu")
          "decode + alu")
        (define_insn_reservation "mult_class" 1 (eq_attr "type" "mult")
          "decode + mult")
     Automaton states (CPU function unit usage): nothing, decode + alu, decode + mult
        nothing --alu_class--> decode + alu
        nothing --mult_class--> decode + mult
        next_cycle releases the occupied units and leads back to nothing
     Each transition goes from the current CPU function unit usage to the function unit usage of the next cycle
     A state transition does one of two things:
        1. Occupy some function units
        2. Release some function units
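The state machine on this slide can be modelled with a set of busy function units. This is a toy model and not the automaton GCC generates: it treats "decode" as a single unit and assumes every reservation lasts exactly one cycle, so advancing a cycle frees everything. All names are illustrative.

```python
# Units each insn class occupies in its issue cycle, following "decode + alu"
# and "decode + mult" (treating "decode" as one unit is an assumption).
RESERVATIONS = {
    "alu_class": frozenset({"decode", "alu"}),
    "mult_class": frozenset({"decode", "mult"}),
}

NOTHING = frozenset()   # state in which no unit is occupied


def issue(state, insn_class):
    """Occupy the units of insn_class; return None on a structural hazard."""
    needed = RESERVATIONS[insn_class]
    if state & needed:
        return None          # some required unit is already busy
    return state | needed


def next_cycle(state):
    """Release all units, since every reservation here lasts one cycle."""
    return NOTHING


after_alu = issue(NOTHING, "alu_class")
conflict = issue(after_alu, "mult_class")      # decode is still busy
after_release = next_cycle(after_alu)
```

A hazard recognizer of this kind answers one question for the scheduler: can this instruction class be issued in the current state, or must the cycle advance first?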
  • 47. Instruction scheduling
        (define_insn_reservation "alu_class" 1 (eq_attr "type" "alu")
          "decode + alu")
        (define_insn_reservation "mult_class" 1 (eq_attr "type" "mult")
          "decode + mult")
        (define_bypass 2 "alu_class" "alu_class")
        (define_bypass 3 "mult_class" "mult_class")
        producer     consumer     t=1  2  3  4  5
        alu_class    alu_class      1  0  0  0  0
        alu_class    mult_class     0  0  0  0  0
        mult_class   alu_class      0  0  0  0  0
        mult_class   mult_class     1  1  0  0  0
     1 means the consumer will stall at cycle t
     t is the number of cycles after the producer
  • 48. Instruction scheduling
     [Figure: the producer/consumer stall table of slide 47 expanded into automaton states. Each state ("Current state") holds one stall vector for t = 1..5 per consumer class (alu_class, mult_class).]
  • 49. Instruction scheduling
        1. movi $r0, 0          {alu}
        2. movi $r1, 1          {alu}
        3. add $r0, $r0, $r1    {alu}
        4. lwi $r4, [$sp + 4]   {load}
        5. mul $r5, $r0, $r4    {mult}
        (define_insn_reservation "alu_class" 1 (eq_attr "type" "alu")
          "decode + alu")
        (define_insn_reservation "mult_class" 1 (eq_attr "type" "mult")
          "decode + mult")
        (define_insn_reservation "load_class" 1 (eq_attr "type" "load")
          "decode + mem")
     Dataflow graph: 1 -> 3, 2 -> 3, 3 -> 5, 4 -> 5
     The priority of each instruction is calculated bottom-up: P = max over its successors of (latency + priority of that successor); an insn without successors has P = latency
     Priorities: insn 5 = 1, insn 3 = 2, insn 4 = 2, insn 1 = 3, insn 2 = 3
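The bottom-up priority calculation can be written as a short recursive function over the dataflow graph. The sketch below assumes the edges implied by the registers in the five instructions (1 and 2 feed 3; 3 and 4 feed 5) and a latency of 1 for every instruction, as in the reservations above; the dictionary layout is only for illustration.

```python
# Dataflow graph of the five insns: insn -> insns that consume its result.
SUCCESSORS = {1: [3], 2: [3], 3: [5], 4: [5], 5: []}
# Default latency from the define_insn_reservation entries (1 for every class).
LATENCY = {1: 1, 2: 1, 3: 1, 4: 1, 5: 1}


def priorities(successors, latency):
    """P(n) = latency(n) + max P(successor); P(n) = latency(n) for a leaf."""
    memo = {}

    def prio(n):
        if n not in memo:
            memo[n] = latency[n] + max((prio(s) for s in successors[n]), default=0)
        return memo[n]

    return {n: prio(n) for n in successors}


PRIORITY = priorities(SUCCESSORS, LATENCY)
```

The priority of an instruction is therefore the length of the longest latency path from it to the end of the block, so instructions on the critical path are picked first.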
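  • 50. Instruction scheduling
     Dataflow graph: 1 -> 3, 2 -> 3, 3 -> 5, 4 -> 5
     Ready list: 1 2 4
     Pending list: 3 5
     Queued list:
     Scheduled list:
     An insn leaves the Pending list when its dependencies are resolved
     An insn waits in the Queued list while a data hazard prevents it from being issued
     The max priority insn is picked from the Ready list and moved to the Scheduled list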
  • 51. Instruction scheduling
     Ready list: 4   Pending list: 3 5   Queued list: 2   Scheduled list: 1
        cycle 1: 1. movi $r0, 0 {alu}
     Insn 2 is queued because of (define_bypass 2 "alu_class" "alu_class")
  • 52. Instruction scheduling
     Ready list: 2   Pending list: 3 5   Queued list:   Scheduled list: 1 4
        cycle 1: 1. movi $r0, 0 {alu}
        cycle 2: 4. lwi $r4, [$sp + 4] {load}
  • 53. Instruction scheduling
     Ready list:   Pending list: 5   Queued list: 3   Scheduled list: 1 4 2
        cycle 1: 1. movi $r0, 0 {alu}
        cycle 2: 4. lwi $r4, [$sp + 4] {load}
        cycle 3: 2. movi $r1, 1 {alu}
  • 54. Instruction scheduling
     Ready list: 3   Pending list: 5   Queued list:   Scheduled list: 1 4 2
        cycle 1: 1. movi $r0, 0 {alu}
        cycle 2: 4. lwi $r4, [$sp + 4] {load}
        cycle 3: 2. movi $r1, 1 {alu}
        cycle 4: (no insn issued)
  • 55. Instruction scheduling
     Ready list: 5   Pending list:   Queued list:   Scheduled list: 1 4 2 3
        cycle 1: 1. movi $r0, 0 {alu}
        cycle 2: 4. lwi $r4, [$sp + 4] {load}
        cycle 3: 2. movi $r1, 1 {alu}
        cycle 4: (no insn issued)
        cycle 5: 3. add $r0, $r0, $r1 {alu}
  • 56. Instruction scheduling
     Ready list:   Pending list:   Queued list:   Scheduled list: 1 4 2 3 5
        cycle 1: 1. movi $r0, 0 {alu}
        cycle 2: 4. lwi $r4, [$sp + 4] {load}
        cycle 3: 2. movi $r1, 1 {alu}
        cycle 4: (no insn issued)
        cycle 5: 3. add $r0, $r0, $r1 {alu}
        cycle 6: 5. mul $r5, $r0, $r4 {mult}
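The walk-through on slides 50 to 56 can be replayed with a small single-issue list scheduler. This is a sketch of the simplified model used on the slides, not of GCC's scheduler: it assumes the stall table of slide 47 applies between any earlier insn and any later insn of the listed classes (in GCC itself, define_bypass only concerns a producer and a consumer that actually depend on each other), it hard-codes the priorities from slide 49, and all names are illustrative.

```python
# insn number -> (text, class, predecessors in the dataflow graph)
INSNS = {
    1: ("movi $r0, 0", "alu", []),
    2: ("movi $r1, 1", "alu", []),
    3: ("add $r0, $r0, $r1", "alu", [1, 2]),
    4: ("lwi $r4, [$sp + 4]", "load", []),
    5: ("mul $r5, $r0, $r4", "mult", [3, 4]),
}
PRIORITY = {1: 3, 2: 3, 3: 2, 4: 2, 5: 1}
# Minimum issue distance in cycles between an earlier and a later insn class,
# taken from the two define_bypass lines; every other pair uses the default 1.
DISTANCE = {("alu", "alu"): 2, ("mult", "mult"): 3}


def list_schedule(insns, priority, distance, default=1):
    """Return [(cycle, insn or None)]; None marks a cycle in which nothing issues."""
    issued = {}      # insn -> cycle in which it was issued (the Scheduled list)
    schedule = []
    cycle = 1
    while len(issued) < len(insns):
        ready = []
        for n, (_, cls, preds) in insns.items():
            if n in issued or any(p not in issued for p in preds):
                continue     # already scheduled, or still pending
            earliest = max(
                (c + distance.get((insns[m][1], cls), default)
                 for m, c in issued.items()),
                default=1,
            )
            if earliest <= cycle:
                ready.append(n)      # otherwise the insn stays queued
        if ready:
            # Highest priority first; ties go to the lower insn number.
            pick = max(ready, key=lambda n: (priority[n], -n))
            issued[pick] = cycle
            schedule.append((cycle, pick))
        else:
            schedule.append((cycle, None))
        cycle += 1
    return schedule


SCHEDULE = list_schedule(INSNS, PRIORITY, DISTANCE)
```

Under these assumptions the function issues insns 1, 4, 2 in cycles 1 to 3, leaves cycle 4 empty, and issues insns 3 and 5 in cycles 5 and 6, matching the slides.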
  • 58. Switch initialization conversion in gimple optimization pass
     Before:
        int a, b;
        switch (argc)
          {
          case 1:
          case 2:
            a = 8;
            b = 6;
            break;
          case 3:
            a = 9;
            b = 5;
            break;
          case 12:
            a = 10;
            b = 4;
            break;
          default:
            a = 16;
            b = 1;
          }
     After:
        static const int CSWTCH01[] = {6, 6, 5, 1, 1, 1, 1, 1, 1, 1, 1, 4};
        static const int CSWTCH02[] = {8, 8, 9, 16, 16, 16, 16, 16, 16, 16,
                                       16, 10};
        if (((unsigned) argc) - 1 <= 11)
          {
            a = CSWTCH02[argc - 1];
            b = CSWTCH01[argc - 1];
          }
        else
          {
            a = 16;
            b = 1;
          }
     The pass tries to turn the switch statement into static array accesses