Gcc porting

Published in Technology, Business
Transcript

  • 1. GCC porting: use instruction patterns to describe the target ISA. Shiva Chen, shiva0217@gmail.com, May 2013
  • 2. Outline
    • Compiler structure
    • Intermediate languages in GCC
    • Optimization passes in GCC
    • Define instruction pattern
    • Operand constraints
    • Match instruction pattern
    • Strict RTL
    • Target defined constraints
    • Emit assembly code
    • Target information usage
    • Reserved words to describe instruction patterns
    • Example of instruction pattern
    • Split instruction pattern
    • Instruction attribute
    • Peephole pattern
    • Instruction scheduling
  • 3. Three main intermediate language formats in GCC
    • GENERIC: language-independent representation generated by each front end; the common representation for all the languages supported by GCC.
    • GIMPLE: used to perform language-independent and target-independent optimization.
    • RTL: used to perform the optimizations that take target features into account, as described by the porting code.
  • 4. GIMPLE optimization passes in GCC 4.6.2: 004t.gimple, 006t.vcg, 009t.omplower, 010t.lower, 012t.eh, 013t.cfg, 017t.ssa, 018t.veclower, 019t.inline_param1, 020t.einline, 021t.early_optimizations, 022t.copyrename1, 023t.ccp1, 024t.forwprop1, 025t.ealias, 026t.esra, 027t.copyprop1, 028t.mergephi1, 029t.cddce1, 030t.eipa_sra, 031t.tailr1, 032t.switchconv, 034t.profile, 035t.local-pure-const1, 036t.fnsplit, 037t.release_ssa, 038t.inline_param2, 057t.copyrename2, 058t.cunrolli, 059t.ccp2, 060t.forwprop2, 062t.alias, 063t.retslot, 064t.phiprop, 065t.fre, 066t.copyprop2, 067t.mergephi2, 068t.vrp1, 069t.dce1, 070t.cselim, 071t.ifcombine, 072t.phiopt1, 073t.tailr2, 074t.ch, 076t.cplxlower, 077t.sra, 078t.copyrename3, 079t.dom1, 080t.phicprop1, 081t.dse1, 082t.reassoc1, 083t.dce2, 084t.forwprop3, 085t.phiopt2, 086t.objsz, 087t.ccp3, 088t.copyprop3, 090t.bswap, 091t.crited, 092t.pre, 093t.sink, 094t.loop, 095t.loopinit, 096t.lim1, 097t.copyprop4, …, 143t.optimized
  • 5. RTL optimization passes in GCC 4.6.2: 004t.gimple, (other GIMPLE passes), 144r.expand, 145r.sibling, 147r.initvals, 148r.unshare, 149r.vregs, 150r.into_cfglayout, 151r.jump, 152r.subreg1, 153r.dfinit, 154r.cse1, 155r.fwprop1, 156r.cprop1, 158r.hoist, 159r.cprop2, 162r.ce1, 163r.reginfo, 164r.loop2, 165r.loop2_init, 166r.loop2_invariant, 170r.loop2_done, 172r.cprop3, 173r.cse2, 174r.dse1, 175r.fwprop2, 176r.auto_inc_dec, 177r.init-regs, 178r.dce, 179r.combine, 180r.ce2, 182r.regmove, 183r.outof_cfglayout, 184r.split1, 185r.subreg2, 188r.asmcons, 190r.sched1, 191r.ira, 192r.postreload, 194r.split2, 198r.pro_and_epilogue, 199r.dse2, 200r.csa, 201r.peephole2, 202r.ce3, 204r.cprop_hardreg, 205r.dce, 206r.bbro, 208r.split4, 209r.sched2, 212r.alignments, 215r.mach, 216r.barriers, 217r.dbr, 218r.split5, 220r.shorten, 221r.nothrow, 222r.final, 223r.dfinish, 224t.statistics
  • 6. Why divide optimization into GIMPLE passes and RTL passes?
    • GIMPLE passes keep more high-level semantics (ex: switch, array, structure, variable). Some optimizations are easier to design while the high-level semantics still exist.
    • However, GIMPLE passes lack target information (ex: instruction length (size), supported ISA).
    • Therefore, RTL optimization passes are needed as well.
  • 7. Define instruction pattern
    • Every RTL pattern must match the target ISA.
    • How do we tell GCC to generate RTL that matches the ISA? With instruction patterns.
    • Use define_expand and define_insn to describe the instruction patterns that the target supports:
        (define_insn "addsi3"
          [(set (match_operand:SI 0 "register_operand" "=r,r")
                (plus:SI (match_operand:SI 1 "register_operand" "%r,r")
                         (match_operand:SI 2 "nonmemory_operand" "r,i")))]
          ... )
  • 8. Define instruction pattern
    • GCC already defines several instruction pattern names and the semantics of each pattern. Ex: addsi3 means addition with 3 SI-mode operands.
    • GCC does not know the operand constraints of the target.
    • How do we tell GCC the target's operand constraints for each instruction? With predicates and constraints.
  • 9. Operand Constraints: multiple alternative constraints
        (define_insn "addsi3"
          [(set (match_operand:SI 0 "register_operand" "=r,r")
                (plus:SI (match_operand:SI 1 "register_operand" "%r,r")
                         (match_operand:SI 2 "nonmemory_operand" "r,i")))]
          ... )
    • Predicates: register_operand, nonmemory_operand
    • Constraints: r, i
    • The predicate of an operand should cover every constraint of that operand. For operand 2 in SI mode: r (register) belongs to nonmemory_operand, and i (immediate) belongs to nonmemory_operand.
  • 10. Operand Constraints
    • GCC already has predicates to restrict operands. Why is a constraint field needed?
    • It gives GCC the opportunity to change an operand during optimization. Ex:
        movi $r0, 4
        add $r1, $r1, $r0      {addsi3}
      after constant propagation becomes
        addi $r1, $r1, 4       {addsi3}
  • 11. Operand Constraints
    • GCC uses two-level operand constraints to group instructions with the same semantics under a single instruction pattern (addsi3).
    • Many ISAs have several assembly instructions with the same semantics but different operand constraints.
    • This reduces the number of instruction patterns needed when porting.
  • 12. Operand Constraints
    • GCC uses instruction patterns to check ISA support whenever it generates a new RTL pattern:
    • Does the back end define the pattern with define_insn?
    • Are the operand types supported, according to the predicates?
    • Which alternative do the operands belong to, according to the constraints?
  • 13. Operand Constraints: multiple alternative constraints (same addsi3 pattern as in slide 9)
    • The first alternative ("=r", "%r", "r") matches "add".
    • The second alternative ("r", "r", "i") matches "addi".
  • 14. Match instruction pattern: multiple alternative constraints (same addsi3 pattern as in slide 9)
    • Ex: (set (reg/f:SI 88) (plus:SI (reg:SI 87) (reg/v:SI 55)))
    • 1. Parse the RTL pattern: (set (op0) (plus:SI (op1) (op2)))
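The two-level check described in slides 9 to 13 (predicate first, then per-alternative constraints) can be modelled in a few lines of Python. This is only a toy sketch, not GCC code: the operand tuples `("reg", n)` and `("imm", v)`, the tables, and the function name `which_alternative` are assumptions made for illustration, and only the `r` and `i` constraint letters are modelled.

```python
# Toy model of predicate + multi-alternative constraint matching for addsi3.
# Operands are hypothetical tuples: ("reg", number) or ("imm", value).

PREDICATES = {
    "register_operand": lambda op: op[0] == "reg",
    "nonmemory_operand": lambda op: op[0] in ("reg", "imm"),
}

CONSTRAINTS = {
    "r": lambda op: op[0] == "reg",
    "i": lambda op: op[0] == "imm",
}

# (predicate, constraint string) per operand, copied from the addsi3 pattern.
ADDSI3 = [
    ("register_operand", "=r,r"),
    ("register_operand", "%r,r"),
    ("nonmemory_operand", "r,i"),
]


def which_alternative(pattern, operands):
    """Return the index of the first matching alternative, or None."""
    # Level 1: every operand must satisfy its predicate.
    for (pred, _), op in zip(pattern, operands):
        if not PREDICATES[pred](op):
            return None
    # Level 2: find an alternative whose constraints all match.
    # The "=" and "%" modifiers are dropped; they do not affect matching here.
    alternatives = [c.lstrip("=%").split(",") for _, c in pattern]
    for alt in range(len(alternatives[0])):
        if all(CONSTRAINTS[alternatives[i][alt]](op)
               for i, op in enumerate(operands)):
            return alt
    return None
```

With three register operands the sketch selects alternative 0 ("add"); with an immediate as operand 2 it selects alternative 1 ("addi").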
  • 15. Match instruction pattern
    • When is a new RTL pattern generated? In the RTL expand phase (GIMPLE to RTL) and during optimization.
    • Ex: the combine phase merges
        (set (reg/f:SI 47) (lshiftrt:SI (reg:SI 60) (const_int 2)))
        (set (reg/f:SI 88) (plus:SI (reg:SI 47) (reg:SI 55)))
      into
        (set (reg/f:SI 88) (plus:SI (lshiftrt:SI (reg:SI 60) (const_int 2)) (reg/v:SI 55)))
    • In assembly terms:
        srli $r47, $r60, 2
        add $r88, $r47, $r55
      becomes
        add_srli $r88, $r55, $r60, 2
  • 16. Strict RTL
    • Does a newly generated RTL pattern always satisfy the constraints?
    • GCC allows certain kinds of constraint mismatch that reload can fix later. The predicate must always be satisfied.
    • Flow from the slide's figure:
        RTL1 -> (optimization1 not done) -> RTL1 -> reload -> RTL3
        RTL1 -> (optimization1 done) -> RTL2 (does not satisfy the constraint) -> reload -> RTL4
      1. RTL3 and RTL4 satisfy the constraints. 2. RTL4 is better than RTL3.
  • 17. Strict RTL
    • A constraint may be left unmatched before reload, in the hope that reload will fix it. Ex: the constraint is m (memory) but the current operand is a constant; GCC allows this before reload.
    • The reload phase comes after register allocation. In fact, during register allocation GCC calls reload repeatedly while an operand does not fit its constraint.
    • After reload, each operand must satisfy one of its operand constraints (strict RTL).
  • 18. Strict RTL
        (define_insn "movsi"
          [(set (match_operand:SI 0 "register_operand" "=r,m")
                (match_operand:SI 1 "register_operand" "r,r"))]
          ... )
    • Before register allocation: (set (reg/f:SI 47) (reg:SI 60))
    • After register allocation (RA), assuming pseudo register r60 is assigned to r3 and the hardware registers are exhausted: (set (reg/f:SI 47) (reg:SI 3))
    • After reload: (set (mem:SI (plus (sp) (const))) (reg:SI 3))
  • 19. Target defined constraints
    • A target can define its own predicates and constraints.
    • Target-defined predicate:
        (define_predicate "index_operand"
          (ior (match_operand 0 "register_operand")
               (and (match_operand 0 "const_int_operand")
                    (match_test "INTVAL (op) < 4096
                                 && INTVAL (op) > -4096"))))
  • 20. Target defined constraints
    • Target-defined constraints:
        (define_register_constraint "l" "LO_REGS"
          "registers r0->r7.")

        (define_memory_constraint "Uv"
          "@internal In ARM/Thumb-2 state a valid VFP load/store address."
          (and (match_code "mem")
               (match_test "TARGET_32BIT
                            && arm_coproc_mem_operand (op, FALSE)")))
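To make the logic of the `index_operand` predicate concrete, here is a minimal Python model of the same test. It is a sketch under stated assumptions: operands are hypothetical `(kind, value)` tuples rather than GCC `rtx` objects, and the function only mirrors the `ior`/`and` structure of the predicate above.

```python
def index_operand(op):
    """Model of the index_operand predicate.

    op is a hypothetical (kind, value) tuple, for example ("reg", 5)
    or ("const_int", 100). Accept any register, or a constant integer
    strictly between -4096 and 4096.
    """
    kind, value = op
    if kind == "reg":                      # (match_operand 0 "register_operand")
        return True
    # (and (match_operand 0 "const_int_operand") (match_test ...))
    return kind == "const_int" and -4096 < value < 4096
```

Note that both bounds are exclusive, so 4095 and -4095 are accepted while 4096 and -4096 are rejected.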
  • 21. Emit assembly code: multiple alternative constraints
        (define_insn "addsi3"
          [(set (match_operand:SI 0 "register_operand" "=r,r")
                (plus:SI (match_operand:SI 1 "register_operand" "%r,r")
                         (match_operand:SI 2 "nonmemory_operand" "r,i")))]
          ""
          "@
           add %0, %1, %2
           addi %0, %1, %2")
    • Ex: (set (reg/f:SI 3) (plus:SI (reg:SI 4) (reg:SI 5)))
    • This matches the first alternative ("add"), so the output assembly code is "add $r3, $r4, $r5".
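The way an "@" output template picks one line per alternative and substitutes `%0`, `%1`, `%2` can be sketched as follows. This is an illustrative model only: the function name `emit` and the plain string substitution are assumptions, and GCC's real output machinery handles many more operand codes.

```python
ADDSI3_TEMPLATE = """@
add %0, %1, %2
addi %0, %1, %2"""


def emit(template, which_alternative, operands):
    """Expand an "@"-style multi-alternative output template.

    operands is a list of already-printed operand strings.
    Only %0..%9 are handled in this sketch.
    """
    lines = template.strip().splitlines()
    if lines[0].strip() == "@":
        # One template line per constraint alternative.
        line = lines[1 + which_alternative].strip()
    else:
        line = lines[0].strip()
    for number, text in enumerate(operands):
        line = line.replace("%" + str(number), text)
    return line
```

For the slide's example, `emit(ADDSI3_TEMPLATE, 0, ["$r3", "$r4", "$r5"])` selects the first line and yields the `add` form, while alternative 1 yields the `addi` form.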
  • 22. Target information usage
    • When does GCC use the target information obtained from instruction patterns?
    • RTL instruction pattern generation: insn-emit.c is generated from the instruction patterns when building GCC.
    • RTL instruction validation (is it supported by the target?): insn-recog.c is generated from the instruction patterns when building GCC.
    • Emitting target assembly code: insn-output.c is generated from the instruction patterns when building GCC.
  • 23. Reserved words to describe instruction patterns
    • define_insn with a named pattern: RTL generation, RTL validation, emit assembly
    • define_expand with a named pattern: RTL generation
    • define_insn "*...": RTL validation, emit assembly
    • GCC defines several named patterns and their semantics, and uses them to generate RTL during the RTL expand phase. Ex: addsi3, subsi3, movsi, movhi, …
    • Some target instructions have semantics that no GCC named pattern covers, but the RTL can still be produced by an optimization. Ex: add_slli can be generated by the combine phase.
    • Define an unnamed pattern to make such an instruction valid: define_insn "*add_slli"
    • A define_insn whose name starts with * is treated as an unnamed pattern.
  • 24. Example of instruction pattern
        ;; These control RTL generation for conditional jump insns
        (define_expand "cbranchsi4"
          [(set (pc)
                (if_then_else (match_operator 0 "ordered_comparison_operator"
                                [(match_operand:SI 1 "nonmemory_nonsymbol_operand" "")
                                 (match_operand:SI 2 "nonmemory_nonsymbol_operand" "")])
                              (label_ref (match_operand 3 "" ""))
                              (pc)))]
          ""
        {
          sh_expand_cbranchsi4 (operands);
          DONE;
        }
        )
    • Semantics of "cbranchsi4": compare operand 1 and operand 2 using operator 0, and branch to label 3 if the comparison is true.
    • The predicate "ordered_comparison_operator" includes EQ, NE, LT, LTU, LE, LEU, GT, GTU, GE, GEU.
    • The porting function sh_expand_cbranchsi4 generates the RTL pattern.
  • 25. Example of instruction pattern
        (define_insn "*bcondz"
          [(set (pc)
                (if_then_else (match_operator 0 "bcondz_operator"
                                [(match_operand:SI 1 "register_operand" "r")
                                 (const_int 0)])
                              (label_ref (match_operand 2 "" ""))
                              (pc)))]
          ""
        {
          switch (GET_CODE (operands[0]))
            {
            case EQ:
              return "beqz %1, %2";
            case NE:
              return "bnez %1, %2";
            case LT:
              return "bltz %1, %2";
            case LE:
              return "blez %1, %2";
            case GT:
              return "bgtz %1, %2";
            case GE:
              return "bgez %1, %2";
            default:
              gcc_unreachable ();
            }
        }
    • The unnamed pattern "*bcondz" is used to validate RTL and to emit assembly code for branches that compare with zero.
  • 26. Example of instruction pattern
        (define_insn "one_cmplsi2"
          [(set (match_operand:SI 0 "register_operand" "=r")
                (not:SI (match_operand:SI 1 "register_operand" "r")))]
          ""
          "nor\t%0, %1, %1")
    • Semantics of "one_cmplsi2": NOT operand 1 and store the result in operand 0.
    • The named pattern "one_cmplsi2" is used to generate RTL, validate RTL, and output assembly code.
    • The output assembly "nor ra, rb, rb" matches these semantics.
  • 27. Split instruction pattern
    • When is splitting an instruction pattern needed? When a const_int value is too big for a single assembly instruction to encode, the const_int is split into a high part and a low part.
    • The constant could be split in define_expand, but that is not good enough. Why? Splitting the constant too early loses opportunities to optimize the RTL pattern.
  • 28. Split instruction pattern
    • The optimization phase "move2add" can do the following (assembly code is used to present the RTL semantics for convenience):
        move $r0, 123456
        move $r1, 123457
        move $r2, 123458
      becomes
        move $r0, 123456
        addi $r1, $r0, 1
        addi $r2, $r0, 2
    • If the const_int is split into high/low parts too early, move2add fails to turn the moves into adds:
        sethi $r0, hi20(123456)
        ori $r0, lo12(123456)
        sethi $r1, hi20(123457)
        ori $r1, lo12(123457)
        sethi $r2, hi20(123458)
        ori $r2, lo12(123458)
  • 29. Split instruction pattern
    • How can an instruction pattern be split outside the RTL expand phase? Use define_split or define_insn_and_split.
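The trade-off in slide 28 can be illustrated with a small Python model. It is a sketch, not the GCC pass: `move2add` here is a simplified stand-in that rewrites a constant load as an `addi` from an earlier register holding a nearby constant, `max_delta` is a hypothetical immediate range, and `split_const` assumes `hi20`/`lo12` mean the upper 20 and lower 12 bits of a 32-bit value.

```python
def move2add(moves, max_delta=2047):
    """moves: list of (register, constant) pairs, in program order.

    Rewrite a move as an addi when an earlier register already holds a
    constant within max_delta (a hypothetical addi immediate range).
    """
    out, known = [], []
    for reg, const in moves:
        for base_reg, base_const in known:
            if abs(const - base_const) <= max_delta:
                out.append(f"addi {reg}, {base_reg}, {const - base_const}")
                break
        else:
            out.append(f"move {reg}, {const}")
        known.append((reg, const))
    return out


def split_const(value):
    """Split a 32-bit constant into (hi20, lo12) parts."""
    value &= 0xFFFFFFFF
    return value >> 12, value & 0xFFF


def rebuild(hi20, lo12):
    """Inverse of split_const: what sethi followed by ori computes."""
    return (hi20 << 12) | lo12
```

While the constants are still whole, the model can see that 123457 is 123456 plus 1. Once each constant has been replaced by a `sethi`/`ori` pair, that relationship is spread across two instructions per register, which is why the slide argues for delaying the split.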
  • 30. Split instruction pattern: split passes in the GCC 4.6.2 RTL pass list (same list as in slide 5): 184r.split1, 194r.split2, 208r.split4, 218r.split5
  • 31. Split instruction pattern
        (define_insn_and_split "*movsi_const"
          [(set (match_operand:WORD 0 "register_operand" "=r,r")
                (match_operand:WORD 1 "immediate_operand" "P,i"))]
          ""
        {
          if (GET_CODE (operands[1]) == CONST_INT
              && SIGNED_INT_FITS_N_BITS (INTVAL (operands[1]), 20))
            {
              return "movi\t%0, %1";
            }
          else
            return "#";
        }
          "reload_completed && GET_CODE (operands[1]) == CONST_INT
           && ! SIGNED_INT_FITS_N_BITS (INTVAL (operands[1]), 20)"
          [(set (match_dup 0) (high:SI (match_dup 1)))
           (set (match_dup 0) (lo_sum:SI (match_dup 0) (match_dup 1)))]
    • If the const_int does not fit in signed 20 bits, the output code returns "#", which means the pattern will be split in a split phase.
  • 32. Split instruction pattern (same "*movsi_const" pattern as in slide 31)
    • Split condition: reload_completed (after reload) && the const_int does not fit in signed 20 bits.
  • 33. Split instruction pattern (same "*movsi_const" pattern as in slide 31)
    • The RTL pattern is split into one insn that sets the high part and one that adds the low part (lo_sum).
    • match_dup 0 means: duplicate operand 0 into this field.
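The decision made by the "*movsi_const" output code can be mirrored in Python as a quick sanity check on the 20-bit boundary. This is a sketch under an assumption: `SIGNED_INT_FITS_N_BITS` is a macro of the port and its definition is not shown in the slides, so the usual two's-complement meaning is assumed here.

```python
def signed_int_fits_n_bits(value, n):
    """Assumed meaning of SIGNED_INT_FITS_N_BITS: value is representable
    as an n-bit two's-complement integer."""
    return -(1 << (n - 1)) <= value <= (1 << (n - 1)) - 1


def movsi_const_output(value):
    """Mirror of the output code: emit movi directly, or "#" to request
    a split into high and lo_sum parts after reload."""
    if signed_int_fits_n_bits(value, 20):
        return "movi\t%0, %1"
    return "#"
```

Under this assumption the signed 20-bit range is -524288 to 524287, so a constant such as 1 << 20 takes the "#" path and is split later.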
  • 34. Split instruction pattern
        (define_split
          [(set (match_operand:ANY64 0 "register_operand" "")
                (match_operand:ANY64 1 "register_operand" ""))]
          "reload_completed &&
           (! USE_V3_SERISE_ISA)"
          [(set (match_dup 0) (match_dup 1))
           (set (match_dup 2) (match_dup 3))]
          "…
    • The split condition is reload_completed && not V3 ISA. V3 has movd44, which can do a 64-bit register move.
    • ANY64: DI, DF. DI: double int. DF: double float.
    • define_split: split RTL
    • define_insn_and_split: split RTL, RTL validation, emit assembly
  • 35. Instruction attribute
        (define_attr "type"
          "unknown,load,store,bequal, alu, .."
          (const_string "unknown"))
        …
        (define_insn "cmovn"
          [(set (match_operand:SI 0 "register_operand" "=r")
                (if_then_else (ne:SI (match_operand:SI 1 "register_operand" "r")
                                     (const_int 0))
                              (match_operand:SI 2 "register_operand" "r")
                              (match_operand:SI 3 "register_operand" "0")))]
          ""
          "cmovn\t%0, %2, %1"
          [(set_attr "type" "alu")
           (set_attr "length" "4")])
    • General form: (define_attr "attribute_name" "value domain" (default value))
  • 36. Instruction attribute
    • The attribute "type" divides instructions into several instruction groups. This helps when writing the instruction scheduling porting code.
    • The attribute "length" gives the length (size) of each instruction, so that GCC can calculate branch distances correctly.
  • 37. Peephole pattern
        ;; Merge move 0 to bcondz
        (define_peephole2
          [(set (match_operand:SI 0 "register_operand" "") (const_int 0))
           (set (pc)
                (if_then_else (match_operator 1 "bcondz_operator"
                                [(match_dup 0)
                                 (match_operand:SI 2 "register_operand" "r")])
                              (label_ref (match_operand 3 "" ""))
                              (pc)))]
          "peep2_reg_dead_p (2, operands[0])"
          [(set (pc)
                (if_then_else:SI (match_dup 1)
                                 (label_ref (match_dup 3)) (pc)))]
          "
        {
          operands[1] = gen_rtx_fmt_ee (swap_condition (GET_CODE (operands[1])),
                                        SImode, operands[2], GEN_INT(0));
        }")
    • Old RTL:
        movi $r0, 0
        bne $r0, $r1, L3
    • New RTL:
        bnez $r1, L3
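The peephole relies on the fact that `0 <op> x` is equivalent to `x <swapped op> 0`, which is what `swap_condition` provides. A small Python model can check that equivalence. It is a sketch: the comparison codes are plain strings rather than GCC `rtx` codes, only the six signed codes used by "*bcondz" are modelled, and `merge_zero_compare` is a hypothetical helper name.

```python
import operator

# Comparison semantics for the six codes handled by "*bcondz".
OPS = {
    "EQ": operator.eq, "NE": operator.ne,
    "LT": operator.lt, "LE": operator.le,
    "GT": operator.gt, "GE": operator.ge,
}

# Code obtained when the two comparison operands are exchanged.
SWAPPED = {"EQ": "EQ", "NE": "NE", "LT": "GT", "LE": "GE", "GT": "LT", "GE": "LE"}


def swap_condition(code):
    """Model of swap_condition for the six codes above."""
    return SWAPPED[code]


def merge_zero_compare(code, reg):
    """Rewrite the comparison `0 <code> reg` as `reg <swapped code> 0`."""
    return (swap_condition(code), reg, 0)
```

For the slide's example, `bne $r0, $r1` with `$r0` equal to 0 becomes a compare of `$r1` against zero with code NE, which "*bcondz" prints as `bnez`.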
  • 38. Instruction scheduling
    • Instruction scheduling is an optimization pass in GCC that reorders instructions without changing the semantics of the code.
    • Its goal is to reduce pipeline stalls and so improve performance.
    • Instruction scheduling belongs to the RTL phase.
  • 39. RTL optimization passes in GCC 4.6.2 (same list as in slide 5); the scheduling passes are 190r.sched1 and 209r.sched2
  • 40. Instruction scheduling
    • GCC has two scheduling passes.
    • sched1: interblock scheduling before register allocation. It tries to find the innermost loop as a region and schedules the instructions in that region. This improves the performance of hot spots (innermost loops), and extending the scope to a region exposes more scheduling opportunities.
    • sched2: single-basic-block scheduling after register allocation. Register allocation may produce spill code (load/store), so the code needs to be scheduled again.
  • 41. Instruction scheduling
    • Instruction scheduling resolves the following hazards to prevent pipeline stalls.
    • Structural hazard: occurs when two or more instructions need the same function unit at the same time.
    • Data hazard: RAW (read after write), a true dependency; WAR (write after read), an anti-dependency; WAW (write after write), an output dependency.
  • 42. Instruction scheduling
    • GCC provides several interfaces to describe the pipeline model.
    • After parsing the pipeline description in the porting code, GCC generates an automaton that acts as a pipeline hazard recognizer. It is used to work out whether the processor can issue an instruction on a given simulated cycle.
    • (define_automaton "name")
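The three data hazards can be detected from the registers each instruction writes and reads. The following Python sketch shows the idea; the representation of an instruction as a pair of register sets is an assumption made for illustration and is not how GCC stores dependencies.

```python
def data_hazards(first, second):
    """Classify data hazards between two instructions in program order.

    Each instruction is a hypothetical pair (writes, reads) of register-name
    sets. Returns the set of hazard kinds that constrain reordering.
    """
    writes1, reads1 = first
    writes2, reads2 = second
    hazards = set()
    if writes1 & reads2:
        hazards.add("RAW")   # second reads what first wrote: true dependency
    if reads1 & writes2:
        hazards.add("WAR")   # second overwrites what first read: anti-dependency
    if writes1 & writes2:
        hazards.add("WAW")   # both write the same register: output dependency
    return hazards
```

For example, `movi $r0, 0` followed by `add $r0, $r0, $r1` has both a RAW and a WAW hazard on `$r0`, so the scheduler must keep the two in order.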
  • 43. Instruction scheduling
        (define_automaton "a1")
        (define_cpu_unit "decode1,decode2" "a1")
        (define_cpu_unit "div" "a1")

        (define_insn_reservation "alu_class" 1
          (eq_attr "type" "alu")
          "decode + alu")

        (define_insn_reservation "mult_class" 1
          (eq_attr "type" "mult")
          "decode + mult")
    • a1: the automaton name
    • decode1, decode2, div: the CPU units (function units) in the processor
    • define_insn_reservation: describes the pipeline rule for each instruction class
    • alu_class, mult_class: the insn name (insn class)
    • (eq_attr "type" "alu"): the rule matches when the type attribute of the instruction pattern is alu
    • "decode + alu": a regular expression that describes the function unit usage
    • 1 is the default latency in cycles when a data dependency occurs
  • 44. Instruction scheduling: multiple alternative constraints
        (define_insn "addsi3"
          [(set (match_operand:SI 0 "register_operand" "=r,r")
                (plus:SI (match_operand:SI 1 "register_operand" "%r,r")
                         (match_operand:SI 2 "nonmemory_operand" "r,i")))]
          ""
          "@
           add %0, %1, %2
           addi %0, %1, %2"
          [(set_attr "type" "alu")])

        (define_insn_reservation "alu_class" 1
          (eq_attr "type" "alu")
          "decode + alu")
  • 45. Instruction scheduling
        (define_automaton "a1")
        (define_cpu_unit "decode1,decode2" "a1")
        (define_cpu_unit "alu" "a1")
        (define_cpu_unit "mult" "a1")

        (define_insn_reservation "alu_class" 1
          (eq_attr "type" "alu")
          "decode + alu")

        (define_insn_reservation "mult_class" 1
          (eq_attr "type" "mult")
          "decode + mult")
    • [Figure: automaton state diagram. States: nothing, decode + alu, decode + mult. Transition labels: alu_class, mult_class, next_cycle.]
    • Each transition goes from the current CPU function unit usage to the next cycle's CPU function unit usage.
    • State transitions: 1. occupy some function units; 2. release some function units.
  • 46. Instruction scheduling
        (define_insn_reservation "alu_class" 1
          (eq_attr "type" "alu")
          "decode + alu")

        (define_insn_reservation "mult_class" 1
          (eq_attr "type" "mult")
          "decode + mult")

        (define_bypass 2 "alu_class" "alu_class")
        (define_bypass 3 "mult_class" "mult_class")

        producer     consumer     t = 1  2  3  4  5
        alu_class    alu_class        1  0  0  0  0
        alu_class    mult_class       0  0  0  0  0
        mult_class   alu_class        0  0  0  0  0
        mult_class   mult_class       1  1  0  0  0
    • 1 means the consumer will stall at cycle t; t is the number of cycles after the producer.
  • 47. Instruction scheduling
    • Same producer/consumer stall table as in slide 46.
    • [Figure: current-state stall tables, indexed by consumer (alu_class, mult_class) and cycle t = 1 to 5. Rows in transcript order: 0 0 0 0 0 / 0 0 0 0 0 / 1 0 0 0 0 / 0 0 0 0 0 / 1 0 0 0 0 / 1 0 0 0 0 / 0 0 0 0 0 / 1 1 0 0 0 / 1 0 0 0 0 / 1 0 0 0 0]
  • 48. Instruction scheduling
        1. movi $r0, 0            {alu}
        2. movi $r1, 1            {alu}
        3. add $r0, $r0, $r1      {alu}
        4. lwi $r4, [$sp + 4]     {load}
        5. mul $r5, $r0, $r4      {mul}

        (define_insn_reservation "alu_class" 1
          (eq_attr "type" "alu")
          "decode + alu")

        (define_insn_reservation "mult_class" 1
          (eq_attr "type" "mult")
          "decode + mult")

        (define_insn_reservation "load_class" 1
          (eq_attr "type" "load")
          "decode + mem")
    • The priority of each instruction is calculated bottom-up: P = max {latency + one successor's latency}
    • [Figure: dataflow graph of instructions 1 to 5, annotated with the priority values 1, 2, 2, 3, 3]
  • 49. Instruction scheduling
    • Dataflow graph: instructions 1 to 5 (as in slide 48)
    • Ready list: 1 2 4
    • Pending list: 3 5
    • Queued list: (empty)
    • Scheduled list: (empty)
    • List transitions: Pending -> Ready when the dependency is resolved; Ready -> Queued on a data hazard; Ready -> Scheduled when the insn is scheduled.
    • Pick the insn with the maximum priority from the Ready list.
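The bottom-up priority calculation can be sketched in Python for the five-instruction example. Two things here are assumptions rather than content of the slides: the dependency edges are derived from the registers in the listing (3 reads `$r0` and `$r1` written by 1 and 2; 5 reads `$r0` and `$r4` written by 3 and 4), and the formula is read as "own latency plus the largest successor priority", using the default latency of 1 from the reservations.

```python
# Successors in the dataflow graph, derived from the register usage above.
SUCCESSORS = {1: [3], 2: [3], 3: [5], 4: [5], 5: []}

# Default latency of 1 cycle for every instruction class (assumed).
LATENCY = {1: 1, 2: 1, 3: 1, 4: 1, 5: 1}


def priorities(successors, latency):
    """Priority of a node = its latency + the largest successor priority.

    Leaves (no successors) get just their own latency.
    """
    memo = {}

    def priority(node):
        if node not in memo:
            best = max((priority(s) for s in successors[node]), default=0)
            memo[node] = latency[node] + best
        return memo[node]

    return {node: priority(node) for node in latency}
```

Under these assumptions instruction 5 gets priority 1, instructions 3 and 4 get 2, and instructions 1 and 2 get 3, which is the same multiset of values (1, 2, 2, 3, 3) that appears on the slide's graph.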
  • 50. Instruction scheduling
    • Ready list: 4. Pending list: 3 5. Queued list: 2. Scheduled list: 1
    • Instruction 2 is queued because of (define_bypass 2 "alu_class" "alu_class").
    • Schedule so far:
        cycle 1: 1. movi $r0, 0 {alu}
  • 51. Instruction scheduling
    • Ready list: 2. Pending list: 3 5. Queued list: (empty). Scheduled list: 1 4
    • Schedule so far:
        cycle 1: 1. movi $r0, 0 {alu}
        cycle 2: 4. lwi $r4, [$sp + 4] {load}
  • 52. Instruction scheduling
    • Ready list: (empty). Pending list: 5. Queued list: 3. Scheduled list: 1 4 2
    • Schedule so far:
        cycle 1: 1. movi $r0, 0 {alu}
        cycle 2: 4. lwi $r4, [$sp + 4] {load}
        cycle 3: 2. movi $r1, 1 {alu}
  • 53. Instruction scheduling
    • Ready list: 3. Pending list: 5. Queued list: (empty). Scheduled list: 1 4 2
    • Schedule so far:
        cycle 1: 1. movi $r0, 0 {alu}
        cycle 2: 4. lwi $r4, [$sp + 4] {load}
        cycle 3: 2. movi $r1, 1 {alu}
        cycle 4: (no instruction issued)
  • 54. Instruction scheduling
    • Ready list: 5. Pending list: (empty). Queued list: (empty). Scheduled list: 1 4 2 3
    • Schedule so far:
        cycle 1: 1. movi $r0, 0 {alu}
        cycle 2: 4. lwi $r4, [$sp + 4] {load}
        cycle 3: 2. movi $r1, 1 {alu}
        cycle 4: (no instruction issued)
        cycle 5: 3. add $r0, $r0, $r1 {alu}
  • 55. Instruction scheduling
    • Ready list: (empty). Pending list: (empty). Queued list: (empty). Scheduled list: 1 4 2 3 5
    • Final schedule:
        cycle 1: 1. movi $r0, 0 {alu}
        cycle 2: 4. lwi $r4, [$sp + 4] {load}
        cycle 3: 2. movi $r1, 1 {alu}
        cycle 4: (no instruction issued)
        cycle 5: 3. add $r0, $r0, $r1 {alu}
        cycle 6: 5. mul $r5, $r0, $r4 {mul}
  • 56. Thank you
  • 57. Switch initialization conversion in the GIMPLE optimization passes: tries to turn a switch statement into static array accesses.
    • Before:
        int a, b;

        switch (argc)
          {
          case 1:
          case 2:
            a = 8;
            b = 6;
            break;
          case 3:
            a = 9;
            b = 5;
            break;
          case 12:
            a = 10;
            b = 4;
            break;
          default:
            a = 16;
            b = 1;
          }
    • After:
        static const int CSWTCH01[] = {6, 6, 5, 1, 1, 1, 1, 1, 1, 1, 1, 4};
        static const int CSWTCH02[] = {8, 8, 9, 16, 16, 16, 16, 16, 16, 16,
                                       16, 10};

        if (((unsigned) argc) - 1 <= 11)
          {
            a = CSWTCH02[argc - 1];
            b = CSWTCH01[argc - 1];
          }
        else
          {
            a = 16;
            b = 1;
          }
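A quick way to convince oneself that the table form is equivalent to the switch is to model both in Python and compare them over a range of inputs. This is a sketch, not compiler output: the tables are written with one entry for each value of `argc` from 1 to 12, and the unsigned range check is modelled as `0 <= argc - 1 <= 11`.

```python
def switch_version(argc):
    """Direct model of the original switch statement; returns (a, b)."""
    if argc in (1, 2):
        return 8, 6
    if argc == 3:
        return 9, 5
    if argc == 12:
        return 10, 4
    return 16, 1          # default case


# One entry per argc value 1..12; gaps are filled with the default values.
TABLE_A = [8, 8, 9, 16, 16, 16, 16, 16, 16, 16, 16, 10]
TABLE_B = [6, 6, 5, 1, 1, 1, 1, 1, 1, 1, 1, 4]


def table_version(argc):
    """Model of the converted code: range check, then array lookups."""
    index = argc - 1
    if 0 <= index <= 11:  # stands in for the unsigned comparison
        return TABLE_A[index], TABLE_B[index]
    return 16, 1
```

Both tables must have exactly 12 entries and the range check must admit index 11, otherwise `argc == 12` would fall through to the default values.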