Gcc porting
Published in: Technology, Business
Transcript

  • 1. GCC porting: use instruction patterns to describe the target ISA
      Shiva Chen, shiva0217@gmail.com, May 2013
  • 2. Outline
      - Compiler structure
      - Intermediate languages in GCC
      - Optimization passes in GCC
      - Define instruction pattern
      - Operand constraints
      - Match instruction pattern
      - Strict RTL
      - Target defined constraints
      - Emit assembly code
      - Target information usage
      - Reserved words for describing instruction patterns
      - Example of instruction pattern
      - Split instruction pattern
      - Instruction attribute
      - Peephole pattern
      - Instruction scheduling
  • 3. The three main intermediate language formats in GCC
      - GENERIC: the language-independent representation generated by each front end; the common representation for all the languages supported by GCC.
      - GIMPLE: used for language-independent and target-independent optimizations.
      - RTL: used for optimizations that take target features into account through the porting code.
  • 4. Gimple optimization passes in GCC 4.6.2
      004t.gimple, 006t.vcg, 009t.omplower, 010t.lower, 012t.eh, 013t.cfg, 017t.ssa, 018t.veclower, 019t.inline_param1, 020t.einline, 021t.early_optimizations, 022t.copyrename1, 023t.ccp1, 024t.forwprop1, 025t.ealias, 026t.esra, 027t.copyprop1, 028t.mergephi1, 029t.cddce1, 030t.eipa_sra, 031t.tailr1, 032t.switchconv, 034t.profile, 035t.local-pure-const1, 036t.fnsplit, 037t.release_ssa, 038t.inline_param2, 057t.copyrename2, 058t.cunrolli, 059t.ccp2, 060t.forwprop2, 062t.alias, 063t.retslot, 064t.phiprop, 065t.fre, 066t.copyprop2, 067t.mergephi2, 068t.vrp1, 069t.dce1, 070t.cselim, 071t.ifcombine, 072t.phiopt1, 073t.tailr2, 074t.ch, 076t.cplxlower, 077t.sra, 078t.copyrename3, 079t.dom1, 080t.phicprop1, 081t.dse1, 082t.reassoc1, 083t.dce2, 084t.forwprop3, 085t.phiopt2, 086t.objsz, 087t.ccp3, 088t.copyprop3, 090t.bswap, 091t.crited, 092t.pre, 093t.sink, 094t.loop, 095t.loopinit, 096t.lim1, 097t.copyprop4, …, 143t.optimized
  • 5. RTL optimization passes in GCC 4.6.2
      004t.gimple, (other gimple passes), 144r.expand, 145r.sibling, 147r.initvals, 148r.unshare, 149r.vregs, 150r.into_cfglayout, 151r.jump, 152r.subreg1, 153r.dfinit, 154r.cse1, 155r.fwprop1, 156r.cprop1, 158r.hoist, 159r.cprop2, 162r.ce1, 163r.reginfo, 164r.loop2, 165r.loop2_init, 166r.loop2_invariant, 170r.loop2_done, 172r.cprop3, 173r.cse2, 174r.dse1, 175r.fwprop2, 176r.auto_inc_dec, 177r.init-regs, 178r.dce, 179r.combine, 180r.ce2, 182r.regmove, 183r.outof_cfglayout, 184r.split1, 185r.subreg2, 188r.asmcons, 190r.sched1, 191r.ira, 192r.postreload, 194r.split2, 198r.pro_and_epilogue, 199r.dse2, 200r.csa, 201r.peephole2, 202r.ce3, 204r.cprop_hardreg, 205r.dce, 206r.bbro, 208r.split4, 209r.sched2, 212r.alignments, 215r.mach, 216r.barriers, 217r.dbr, 218r.split5, 220r.shorten, 221r.nothrow, 222r.final, 223r.dfinish, 224t.statistics
  • 6. Why is optimization divided into gimple passes and RTL passes?
      - Gimple passes keep more high-level semantics (e.g. switch, array, structure, variable). Some optimizations are easier to design while the high-level semantics still exist.
      - However, gimple passes lack target information (e.g. instruction length (size), supported ISA).
      - Therefore RTL optimization passes are needed as well.
  • 7. Define instruction pattern
      - Every RTL pattern must match the target ISA.
      - How do we tell GCC to generate RTL that matches the ISA? With instruction patterns.
      - Use define_expand and define_insn to describe the instruction patterns that the target supports:
        (define_insn "addsi3"
          [(set (match_operand:SI 0 "register_operand" "=r,r")
                (plus:SI (match_operand:SI 1 "register_operand" "%r,r")
                         (match_operand:SI 2 "nonmemory_operand" "r,i")))]
          ... )
  • 8. Define instruction pattern
      - GCC already defines a set of instruction pattern names and the semantics of each pattern. For example, addsi3 means an add with three SI-mode operands.
      - GCC does not know the operand constraints of the target.
      - How do we tell GCC the target's operand constraints for each instruction? With predicates and constraints.
  • 9. Operand Constraints
      Multiple alternative constraints:
        (define_insn "addsi3"
          [(set (match_operand:SI 0 "register_operand" "=r,r")
                (plus:SI (match_operand:SI 1 "register_operand" "%r,r")
                         (match_operand:SI 2 "nonmemory_operand" "r,i")))]
          ... )
      - Predicates: register_operand, nonmemory_operand
      - Constraints: r, i
      - The predicate should accept everything that any constraint of the operand accepts. For operand 2 in SI mode, r (register) belongs to nonmemory_operand and i (immediate) belongs to nonmemory_operand.
  • 10. Operand Constraints
      - GCC already has the predicate to restrict operands. Why is the constraint field needed?
      - It gives optimizations the opportunity to change an operand. Example:
        movi $r0, 4
        add  $r1, $r1, $r0    {addsi3}
        => after constant propagation:
        addi $r1, $r1, 4      {addsi3}
  • 11. Operand Constraints
      - GCC uses two levels of operand restriction (predicate and constraint) so that instructions with the same semantics can be grouped under a single instruction pattern (addsi3).
      - Many ISAs have several assembly instructions with the same semantics but different operand constraints.
      - Grouping them reduces the number of instruction patterns needed when porting.
  • 12. Operand Constraints
      GCC uses the instruction patterns to check ISA support whenever it generates a new RTL pattern:
      - define_insn tells whether the back end defines the pattern.
      - The predicate tells whether the operand type is supported.
      - The constraint tells which alternative the operand belongs to.
  • 13. Operand Constraints
      Multiple alternative constraints:
        (define_insn "addsi3"
          [(set (match_operand:SI 0 "register_operand" "=r,r")
                (plus:SI (match_operand:SI 1 "register_operand" "%r,r")
                         (match_operand:SI 2 "nonmemory_operand" "r,i")))]
          ... )
      - The first alternative (r, r, r) matches "add".
      - The second alternative (r, r, i) matches "addi".
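    The alternative selection described on slides 9 to 13 can be sketched as a toy model in C. This is only an illustration, not GCC's matcher: the operand kinds, the function names and the restriction to the single-letter constraints r, i and m are all assumptions made for the example.

```c
#include <stddef.h>

/* Toy operand kinds; real GCC inspects rtx objects instead. */
enum operand_kind { OP_REG, OP_IMM, OP_MEM };

/* Does one constraint letter accept the operand?  Only 'r', 'i' and 'm'
   are modeled in this sketch. */
static int constraint_accepts(char letter, enum operand_kind kind)
{
    switch (letter) {
    case 'r': return kind == OP_REG;
    case 'i': return kind == OP_IMM;
    case 'm': return kind == OP_MEM;
    default:  return 0;
    }
}

/* Return the constraint letter of alternative ALT in a comma-separated
   constraint string, skipping the modifiers '=' and '%'.
   Returns 0 when the string has no such alternative. */
static char alternative_letter(const char *constraint, int alt)
{
    const char *p = constraint;
    while (alt > 0 && *p) {
        if (*p++ == ',')
            alt--;
    }
    if (alt > 0)
        return 0;
    while (*p == '=' || *p == '%')
        p++;
    return (*p && *p != ',') ? *p : 0;
}

/* Return the index of the first alternative that every operand
   satisfies, or -1 if no alternative matches. */
int which_alternative(const char *const constraints[],
                      const enum operand_kind kinds[], size_t n_operands)
{
    for (int alt = 0; ; alt++) {
        int ok = 1;
        for (size_t i = 0; i < n_operands; i++) {
            char letter = alternative_letter(constraints[i], alt);
            if (letter == 0)
                return -1;          /* ran out of alternatives */
            if (!constraint_accepts(letter, kinds[i]))
                ok = 0;
        }
        if (ok)
            return alt;
    }
}
```

    With the addsi3 constraint strings "=r,r", "%r,r" and "r,i", three register operands select alternative 0 ("add") and a register, register, immediate triple selects alternative 1 ("addi").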
  • 14. Match instruction pattern
      Multiple alternative constraints:
        (define_insn "addsi3"
          [(set (match_operand:SI 0 "register_operand" "=r,r")
                (plus:SI (match_operand:SI 1 "register_operand" "%r,r")
                         (match_operand:SI 2 "nonmemory_operand" "r,i")))]
          ... )
      Example:
        (set (reg/f:SI 88)
             (plus:SI (reg:SI 87)
                      (reg/v:SI 55)))
      1. Parse the RTL pattern:
        (set (op0)
             (plus:SI (op1)
                      (op2)))
  • 15. Match instruction pattern
      When is a new RTL pattern generated?
      - In the RTL expand phase (GIMPLE to RTL)
      - During optimization
      Example (combine phase):
        Before:
          (set (reg/f:SI 47)
               (lshiftrt:SI (reg:SI 60)
                            (const_int 2)))                 srli $r47, $r60, 2
          (set (reg/f:SI 88)
               (plus:SI (reg:SI 47)
                        (reg:SI 55)))                       add $r88, $r47, $r55
        After:
          (set (reg/f:SI 88)
               (plus:SI (lshiftrt:SI (reg:SI 60)
                                     (const_int 2))
                        (reg/v:SI 55)))                     add_srli $r88, $r55, $r60, 2
  • 16. Strict RTL
      - Does a newly generated RTL pattern always satisfy the constraints?
      - GCC allows certain constraint mismatches that reload can fix later.
      - The predicate must always be satisfied.
      [Diagram: without optimization1, RTL1 stays RTL1 and reload turns it into RTL3. With optimization1, RTL1 becomes RTL2, which does not satisfy the constraints, and reload turns it into RTL4.]
      1. RTL3 and RTL4 both satisfy the constraints.
      2. RTL4 is better than RTL3.
  • 17. Strict RTL
      - Constraints may be left unmatched before reload, in the hope that reload fixes them. Example: the constraint is m (memory) but the current operand is a constant; GCC allows this before reload.
      - The reload phase comes after register allocation. In fact, during register allocation GCC calls reload repeatedly while an operand does not fit its constraint.
      - After reload, every operand must satisfy one of its operand constraints (strict RTL).
  • 18. Strict RTL
        (define_insn "movsi"
          [(set (match_operand:SI 0 "register_operand" "=r,m")
                (match_operand:SI 1 "register_operand" "r,r"))]
          ... )
      Before register allocation:
        (set (reg/f:SI 47)
             (reg:SI 60))
      After register allocation (RA), assuming pseudo register r60 is assigned to r3 and the hardware registers are exhausted:
        (set (reg/f:SI 47)
             (reg:SI 3))
      After reload:
        (set (mem:SI (plus (sp) (const)))
             (reg:SI 3))
  • 19. Target defined constraints
      - A target can define its own predicates and constraints.
      - Target-defined predicate:
        (define_predicate "index_operand"
          (ior (match_operand 0 "register_operand")
               (and (match_operand 0 "const_int_operand")
                    (match_test "(INTVAL (op) < 4096
                                  && INTVAL (op) > -4096)"))))
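    The meaning of the index_operand predicate above can be sketched in plain C. The struct and enum below are hypothetical stand-ins for GCC's rtx; only the accepted range (strictly between -4096 and 4096) comes from the slide.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an RTL operand. */
enum toy_operand_kind { TOY_REG, TOY_CONST_INT, TOY_OTHER };

struct toy_operand {
    enum toy_operand_kind kind;
    long value;                 /* only meaningful for TOY_CONST_INT */
};

/* Mirrors (ior register_operand
                (and const_int_operand (-4096 < INTVAL < 4096))). */
bool index_operand_p(const struct toy_operand *op)
{
    if (op->kind == TOY_REG)
        return true;
    return op->kind == TOY_CONST_INT
           && op->value < 4096
           && op->value > -4096;
}
```

    Any register is accepted, and a constant is accepted only when it lies in the open range; both 4096 and -4096 are rejected.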
  • 20. Target defined constraints
      Target-defined constraints:
        (define_register_constraint "l" "LO_REGS"
          "registers r0->r7.")

        (define_memory_constraint "Uv"
          "@internal In ARM/Thumb-2 state a valid VFP load/store address."
          (and (match_code "mem")
               (match_test "TARGET_32BIT
                            && arm_coproc_mem_operand (op, FALSE)")))
  • 21. Emit assembly code
      Multiple alternative constraints:
        (define_insn "addsi3"
          [(set (match_operand:SI 0 "register_operand" "=r,r")
                (plus:SI (match_operand:SI 1 "register_operand" "%r,r")
                         (match_operand:SI 2 "nonmemory_operand" "r,i")))]
          ""
          "@
           add %0, %1, %2
           addi %0, %1, %2")
      Example:
        (set (reg/f:SI 3)
             (plus:SI (reg:SI 4)
                      (reg:SI 5)))
      This matches the first alternative ("add"), so the output assembly code is "add $r3, $r4, $r5".
  • 22. Target information usage
      When does GCC use the target information it gets from the instruction patterns?
      - RTL instruction pattern generation: insn-emit.c is generated by parsing the instruction patterns when GCC is built.
      - RTL instruction validation (is the instruction supported by the target?): insn-recog.c is generated by parsing the instruction patterns when GCC is built.
      - Emitting target assembly code: insn-output.c is generated by parsing the instruction patterns when GCC is built.
  • 23. Reserved words for describing instruction patterns
                                        RTL generation   RTL validation   Emit assembly
        define_insn "named pattern"          yes              yes              yes
        define_expand "named pattern"        yes              no               no
        define_insn "*.."                    no               yes              yes
      - GCC defines several named patterns and their semantics, and uses them to generate RTL during the RTL expand phase. Examples: addsi3, subsi3, movsi, movhi, …
      - Some target instructions have semantics that no GCC named pattern defines, but an optimization can still generate their RTL. Example: add_slli can be generated by the combine phase.
      - Define an unnamed pattern so that such an instruction validates: define_insn "*add_slli".
      - A define_insn whose name has the * prefix is treated as an unnamed pattern.
  • 24. Example of instruction pattern
        ;; These control RTL generation for conditional jump insns
        (define_expand "cbranchsi4"
          [(set (pc)
                (if_then_else (match_operator 0 "ordered_comparison_operator"
                                [(match_operand:SI 1 "nonmemory_nonsymbol_operand" "")
                                 (match_operand:SI 2 "nonmemory_nonsymbol_operand" "")])
                              (label_ref (match_operand 3 "" ""))
                              (pc)))]
          ""
        {
          sh_expand_cbranchsi4 (operands);
          DONE;
        }
        )
      - Semantics of "cbranchsi4": compare operand 1 and operand 2 with operator 0, and branch to label 3 if the comparison is true.
      - The predicate "ordered_comparison_operator" includes EQ, NE, LT, LTU, LE, LEU, GT, GTU, GE, GEU.
      - The porting function sh_expand_cbranchsi4 generates the RTL pattern.
  • 25. Example of instruction pattern
        (define_insn "*bcondz"
          [(set (pc)
                (if_then_else (match_operator 0 "bcondz_operator"
                                [(match_operand:SI 1 "register_operand" "r")
                                 (const_int 0)])
                              (label_ref (match_operand 2 "" ""))
                              (pc)))]
          ""
        {
          switch (GET_CODE (operands[0]))
            {
            case EQ:
              return "beqz %1, %2";
            case NE:
              return "bnez %1, %2";
            case LT:
              return "bltz %1, %2";
            case LE:
              return "blez %1, %2";
            case GT:
              return "bgtz %1, %2";
            case GE:
              return "bgez %1, %2";
            default:
              gcc_unreachable ();
            }
        })
      The unnamed pattern "*bcondz" is used to validate RTL and to emit assembly code for a branch that compares with zero.
  • 26. Example of instruction pattern
        (define_insn "one_cmplsi2"
          [(set (match_operand:SI 0 "register_operand" "=r")
                (not:SI (match_operand:SI 1 "register_operand" "r")))]
          ""
          "nor\t%0, %1, %1")
      - Semantics of "one_cmplsi2": take the bitwise NOT of operand 1 and store it in operand 0.
      - The named pattern "one_cmplsi2" is used to generate RTL, validate RTL and output assembly code.
      - The output assembly "nor ra, rb, rb" matches these semantics.
  • 27. Split instruction pattern
      - When is it necessary to split an instruction pattern? When a const_int value is too big for a single assembly instruction to encode, the const_int is split into a high part and a low part.
      - The constant could be split in define_expand, but that is not good enough. Why?
      - Splitting the constant too early loses opportunities to optimize the RTL pattern.
  • 28. Split instruction pattern
      The optimization phase "move2add" can do the following (assembly code is used to present the RTL semantics for convenience):
        Before move2add:
          move $r0, 123456
          move $r1, 123457
          move $r2, 123458
        After move2add:
          move $r0, 123456
          addi $r1, $r0, 1
          addi $r2, $r0, 2
      If the const_int is split into high and low parts too early, move2add fails to turn the moves into adds:
          sethi $r0, hi20(123456)
          ori   $r0, lo12(123456)
          sethi $r1, hi20(123457)
          ori   $r1, lo12(123457)
          sethi $r2, hi20(123458)
          ori   $r2, lo12(123458)
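    The arithmetic behind slide 28 can be checked with a few lines of C. This is a sketch under assumptions: hi20 and lo12 are taken to mean the upper 20 bits and the lower 12 bits of a 32-bit value, recombined with a bitwise OR as the sethi/ori pair suggests; the helper names are made up for the example.

```c
#include <stdint.h>

/* Upper 20 bits of a 32-bit constant (assumed meaning of hi20). */
uint32_t hi20(uint32_t value)
{
    return value >> 12;
}

/* Lower 12 bits of a 32-bit constant (assumed meaning of lo12). */
uint32_t lo12(uint32_t value)
{
    return value & 0xFFFu;
}

/* What a sethi followed by an ori would leave in the register. */
uint32_t sethi_ori(uint32_t hi, uint32_t lo)
{
    return (hi << 12) | lo;
}

/* The immediate that "addi rd, rbase, imm" needs in order to produce
   `next` from a register that already holds `base`. */
int64_t addi_delta(int32_t base, int32_t next)
{
    return (int64_t)next - (int64_t)base;
}
```

    For 123456 the split gives a high part of 30 and a low part of 576. The deltas from 123456 to 123457 and 123458 are 1 and 2, which is why a single addi suffices as long as the constants are still whole when move2add runs.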
  • 29. Split instruction pattern
      - How can an instruction pattern be split outside the RTL expand phase?
      - Use define_split or define_insn_and_split.
  • 30. Split instruction pattern
      [Pass list: the RTL optimization passes in GCC 4.6.2, identical to slide 5. The split passes in that list are 184r.split1, 194r.split2, 208r.split4 and 218r.split5.]
  • 31. Split instruction pattern
        (define_insn_and_split "*movsi_const"
          [(set (match_operand:WORD 0 "register_operand" "=r,r")
                (match_operand:WORD 1 "immediate_operand" "P,i"))]
          ""
        {
          if (GET_CODE (operands[1]) == CONST_INT
              && SIGNED_INT_FITS_N_BITS (INTVAL (operands[1]), 20))
            {
              return "movi\t%0, %1";
            }
          else
            return "#";
        }
          "reload_completed && GET_CODE (operands[1]) == CONST_INT
           && ! SIGNED_INT_FITS_N_BITS (INTVAL (operands[1]), 20)"
          [(set (match_dup 0) (high:SI (match_dup 1)))
           (set (match_dup 0) (lo_sum:SI (match_dup 0) (match_dup 1)))]
      If the const_int does not fit in a signed 20-bit field, the output template returns "#", which means the pattern will be split in a split phase.
  • 32. Split instruction pattern
      (Same "*movsi_const" pattern as on slide 31.)
      Split condition:
        "reload_completed && GET_CODE (operands[1]) == CONST_INT
         && ! SIGNED_INT_FITS_N_BITS (INTVAL (operands[1]), 20)"
      That is, reload has completed (reload_completed) and the const_int does not fit in a signed 20-bit field.
  • 33. Split instruction pattern
      (Same "*movsi_const" pattern as on slide 31.)
      Replacement RTL:
        [(set (match_dup 0) (high:SI (match_dup 1)))
         (set (match_dup 0) (lo_sum:SI (match_dup 0) (match_dup 1)))]
      - The RTL pattern is split into one insn that sets the high part and one that adds the low part (lo_sum).
      - match_dup 0 means that operand 0 is duplicated into this field.
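    The decision made by the "*movsi_const" output code can be mimicked in standalone C. The slides do not show how SIGNED_INT_FITS_N_BITS is defined, so the range check below is an assumed definition (a value fits in n signed bits when it lies in [-2^(n-1), 2^(n-1) - 1]), and movsi_const_template is a hypothetical helper named for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed meaning of SIGNED_INT_FITS_N_BITS; valid for 1 <= n <= 63. */
bool signed_int_fits_n_bits(int64_t value, int n)
{
    int64_t limit = (int64_t)1 << (n - 1);
    return value >= -limit && value < limit;
}

/* Mirrors the output code of "*movsi_const": emit a movi when the
   constant fits in 20 signed bits, otherwise return "#" so that the
   insn is split later. */
const char *movsi_const_template(int64_t value)
{
    if (signed_int_fits_n_bits(value, 20))
        return "movi\t%0, %1";
    return "#";
}
```

    Under this definition 524287 is the largest constant that still gets a movi; 524288 and above (and anything below -524288) yield "#".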
  • 34. Split instruction pattern
        (define_split
          [(set (match_operand:ANY64 0 "register_operand" "")
                (match_operand:ANY64 1 "register_operand" ""))]
          "reload_completed &&
           (! USE_V3_SERISE_ISA)"
          [(set (match_dup 0) (match_dup 1))
           (set (match_dup 2) (match_dup 3))]
          "…
      - The split condition is reload_completed && not V3 ISA. V3 has movd44, which can do a 64-bit register move.
      - ANY64: DI, DF. DI: double int. DF: double float.
                                   Split RTL   RTL validation   Emit assembly
        define_split                  yes           no               no
        define_insn_and_split         yes           yes              yes
  • 35. Instruction attribute
      General form: (define_attr "attribute_name" "value domain" (default value))
        (define_attr "type"
          "unknown,load,store,bequal, alu, .."
          (const_string "unknown"))
        …
        (define_insn "cmovn"
          [(set (match_operand:SI 0 "register_operand" "=r")
                (if_then_else (ne:SI (match_operand:SI 1 "register_operand" "r")
                                     (const_int 0))
                              (match_operand:SI 2 "register_operand" "r")
                              (match_operand:SI 3 "register_operand" "0")))]
          ""
          "cmovn\t%0, %2, %1"
          [(set_attr "type" "alu")
           (set_attr "length" "4")])
  • 36. Instruction attribute
      - The "type" attribute divides instructions into several instruction groups, which helps when writing the instruction scheduling porting code.
      - The "length" attribute gives the encoded length (size) of each instruction, so that GCC can calculate branch distances correctly.
  • 37. Peephole pattern
        ;; Merge move 0 to bcondz
        (define_peephole2
          [(set (match_operand:SI 0 "register_operand" "") (const_int 0))
           (set (pc)
                (if_then_else (match_operator 1 "bcondz_operator"
                                [(match_dup 0)
                                 (match_operand:SI 2 "register_operand" "r")])
                              (label_ref (match_operand 3 "" ""))
                              (pc)))]
          "peep2_reg_dead_p (2, operands[0])"
          [(set (pc)
                (if_then_else:SI (match_dup 1)
                                 (label_ref (match_dup 3)) (pc)))]
          "
        {
          operands[1] = gen_rtx_fmt_ee (swap_condition (GET_CODE (operands[1])),
                                        SImode, operands[2], GEN_INT (0));
        }")
      Old RTL:
        movi $r0, 0
        bne  $r0, $r1, L3
      New RTL:
        bnez $r1, L3
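    The peephole above moves the register from the right-hand side of the comparison to the left-hand side, which is why it calls swap_condition. The toy C model below illustrates why that keeps the branch meaning intact. The enum, eval_cmp, bcondz_mnemonic and merge_is_sound are hypothetical names for this sketch; only the swapping rule (LT and GT exchange, LE and GE exchange, EQ and NE stay) reflects what swap_condition does for these codes.

```c
#include <stdbool.h>

enum cmp_code { CMP_EQ, CMP_NE, CMP_LT, CMP_LE, CMP_GT, CMP_GE };

/* Comparison code to use after exchanging the two operands. */
enum cmp_code swap_cmp(enum cmp_code code)
{
    switch (code) {
    case CMP_LT: return CMP_GT;
    case CMP_LE: return CMP_GE;
    case CMP_GT: return CMP_LT;
    case CMP_GE: return CMP_LE;
    default:     return code;      /* EQ and NE are symmetric */
    }
}

/* Evaluate "lhs <code> rhs". */
bool eval_cmp(enum cmp_code code, int lhs, int rhs)
{
    switch (code) {
    case CMP_EQ: return lhs == rhs;
    case CMP_NE: return lhs != rhs;
    case CMP_LT: return lhs <  rhs;
    case CMP_LE: return lhs <= rhs;
    case CMP_GT: return lhs >  rhs;
    case CMP_GE: return lhs >= rhs;
    }
    return false;
}

/* Mnemonic that the "*bcondz" pattern on slide 25 emits for a code. */
const char *bcondz_mnemonic(enum cmp_code code)
{
    switch (code) {
    case CMP_EQ: return "beqz";
    case CMP_NE: return "bnez";
    case CMP_LT: return "bltz";
    case CMP_LE: return "blez";
    case CMP_GT: return "bgtz";
    case CMP_GE: return "bgez";
    }
    return "";
}

/* "0 <code> reg" before the peephole must agree with
   "reg <swapped code> 0" after it. */
bool merge_is_sound(enum cmp_code code, int reg_value)
{
    return eval_cmp(code, 0, reg_value)
           == eval_cmp(swap_cmp(code), reg_value, 0);
}
```

    For the slide's example, "bne $r0, $r1" with $r0 equal to 0 keeps the code NE after swapping and becomes "bnez $r1"; a "0 < reg" comparison would become "bgtz".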
  • 38. Instruction scheduling
      - Instruction scheduling is the GCC optimization pass that reorders instructions without changing the semantics of the code.
      - Its goal is to reduce pipeline stalls and so improve performance.
      - Instruction scheduling belongs to the RTL phase.
  • 39. RTL optimization passes in GCC 4.6.2
      [Pass list identical to slide 5. The scheduling passes in that list are 190r.sched1 and 209r.sched2.]
  • 40. Instruction scheduling
      GCC has two scheduling passes:
      - Sched1 does interblock scheduling before register allocation.
          - It tries to find the innermost loop as a region and schedules the instructions in the region.
          - This improves the performance of hot spots (innermost loops).
          - Extending the scope to a region exposes more scheduling opportunities.
      - Sched2 does single-basic-block scheduling after register allocation.
          - Register allocation may produce spill code (load/store), so the code needs to be scheduled again.
  • 41. Instruction scheduling
      Instruction scheduling resolves the following hazards to prevent pipeline stalls:
      - Structural hazard: occurs when two or more instructions need the same function unit at the same time.
      - Data hazard:
          - RAW (read after write): a true dependency
          - WAR (write after read): an anti-dependency
          - WAW (write after write): an output dependency
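    The three data hazards can be told apart from the registers an instruction pair reads and writes. The following C sketch uses a made-up instruction record (one destination and up to two source registers); it is only an illustration of the definitions above, not how GCC's scheduler represents dependencies.

```c
#define NO_REG (-1)

/* Dependency kinds, combinable as bit flags. */
enum { DEP_NONE = 0, DEP_RAW = 1, DEP_WAR = 2, DEP_WAW = 4 };

/* Hypothetical instruction record: register numbers, NO_REG if unused. */
struct toy_insn {
    int dest;
    int src1;
    int src2;
};

/* Does INSN read register REG? */
static int reads(const struct toy_insn *insn, int reg)
{
    return reg != NO_REG && (insn->src1 == reg || insn->src2 == reg);
}

/* Classify the dependencies of SECOND on FIRST, where FIRST comes
   earlier in program order. */
unsigned classify_dependency(const struct toy_insn *first,
                             const struct toy_insn *second)
{
    unsigned deps = DEP_NONE;
    if (reads(second, first->dest))
        deps |= DEP_RAW;            /* second reads what first wrote  */
    if (reads(first, second->dest))
        deps |= DEP_WAR;            /* second overwrites first's input */
    if (first->dest != NO_REG && first->dest == second->dest)
        deps |= DEP_WAW;            /* both write the same register   */
    return deps;
}
```

    For example, "movi $r0, 0" followed by "add $r0, $r0, $r1" is both a RAW and a WAW dependency on $r0, while "add $r0, $r0, $r1" followed by "movi $r1, 1" is a WAR dependency on $r1.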
  • 42. Instruction scheduling
      - GCC provides several interfaces to describe the pipeline model.
      - After parsing the pipeline description in the porting code, GCC generates an automaton that serves as a pipeline hazard recognizer. It determines whether the processor can issue an instruction on a given simulated cycle.
        (define_automaton "name")
  • 43. Instruction scheduling
        (define_automaton "a1")
        (define_cpu_unit "decode1,decode2" "a1")
        (define_cpu_unit "div" "a1")

        (define_insn_reservation "alu_class" 1
          (eq_attr "type" "alu")
          "decode + alu")

        (define_insn_reservation "mult_class" 1
          (eq_attr "type" "mult")
          "decode + mult")
      - a1: the automaton name.
      - decode1, decode2, div: the CPU units (function units) in the processor.
      - define_insn_reservation: describes the pipeline rule for each instruction class.
      - alu_class, mult_class: the insn-name (instruction class).
      - (eq_attr "type" "alu"): the rule matches when the type attribute of the instruction pattern is alu.
      - "decode + alu": a regular expression that describes the function unit usage.
      - 1 is the default latency in cycles when a data dependency occurs.
  • 44. Instruction scheduling
      Multiple alternative constraints:
        (define_insn "addsi3"
          [(set (match_operand:SI 0 "register_operand" "=r,r")
                (plus:SI (match_operand:SI 1 "register_operand" "%r,r")
                         (match_operand:SI 2 "nonmemory_operand" "r,i")))]
          ""
          "@
           add %0, %1, %2
           addi %0, %1, %2"
          [(set_attr "type" "alu")])

        (define_insn_reservation "alu_class" 1
          (eq_attr "type" "alu")
          "decode + alu")
  • 45. Instruction scheduling
        (define_automaton "a1")
        (define_cpu_unit "decode1,decode2" "a1")
        (define_cpu_unit "alu" "a1")
        (define_cpu_unit "mult" "a1")

        (define_insn_reservation "alu_class" 1
          (eq_attr "type" "alu")
          "decode + alu")

        (define_insn_reservation "mult_class" 1
          (eq_attr "type" "mult")
          "decode + mult")
      [State diagram: states "nothing", "decode + alu" and "decode + mult", with transitions labeled alu_class, mult_class and next_cycle. A state is the current CPU function unit usage; a transition leads to the function unit usage of the next cycle.]
      State transition:
      1. Occupy some function unit.
      2. Release some function unit.
  • 46. Instruction scheduling
        (define_insn_reservation "alu_class" 1
          (eq_attr "type" "alu")
          "decode + alu")

        (define_insn_reservation "mult_class" 1
          (eq_attr "type" "mult")
          "decode + mult")

        (define_bypass 2 "alu_class" "alu_class")
        (define_bypass 3 "mult_class" "mult_class")

        producer     consumer     t = 1  2  3  4  5
        alu_class    alu_class        1  0  0  0  0
        alu_class    mult_class       0  0  0  0  0
        mult_class   alu_class        0  0  0  0  0
        mult_class   mult_class       1  1  0  0  0
      A 1 means the consumer will stall at cycle t, where t is the number of cycles after the producer.
  • 47. Instruction scheduling
        producer     consumer     t = 1  2  3  4  5
        alu_class    alu_class        1  0  0  0  0
        alu_class    mult_class       0  0  0  0  0
        mult_class   alu_class        0  0  0  0  0
        mult_class   mult_class       1  1  0  0  0
      [State diagram: each state ("current state") holds, for the consumers alu_class and mult_class, one row of stall bits for t = 1 to 5.]
  • 48. Instruction scheduling
        1. movi $r0, 0            {alu}
        2. movi $r1, 1            {alu}
        3. add  $r0, $r0, $r1     {alu}
        4. lwi  $r4, [$sp + 4]    {load}
        5. mul  $r5, $r0, $r4     {mul}

        (define_insn_reservation "alu_class" 1
          (eq_attr "type" "alu")
          "decode + alu")

        (define_insn_reservation "mult_class" 1
          (eq_attr "type" "mult")
          "decode + mult")

        (define_insn_reservation "load_class" 1
          (eq_attr "type" "load")
          "decode + mem")
      The priority of each instruction is calculated bottom-up: P = max over its successors of {latency + the successor's priority}.
      [Dataflow graph: 1 → 3, 2 → 3, 3 → 5, 4 → 5. Priorities: 3 for instructions 1 and 2, 2 for instructions 3 and 4, 1 for instruction 5.]
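    The bottom-up priority calculation can be sketched in C as follows. This is an illustration under assumptions, not GCC's scheduler code: instructions are numbered from 0 in program order, every dependency edge goes from a lower to a higher index, an instruction with no successors gets its own latency as priority, and the struct and function names are invented for the example.

```c
/* One dependency edge of the dataflow graph. */
struct dep_edge {
    int producer;   /* index of the earlier instruction   */
    int consumer;   /* index of the dependent instruction */
};

/* priority[i] = latency[i] + max priority among i's consumers
   (or just latency[i] when nothing depends on i).  Walking the
   instructions backwards guarantees that every consumer has been
   handled before its producers. */
void compute_priorities(int n_insns, const int latency[],
                        const struct dep_edge edges[], int n_edges,
                        int priority[])
{
    for (int i = n_insns - 1; i >= 0; i--) {
        int best = 0;
        for (int e = 0; e < n_edges; e++) {
            if (edges[e].producer == i
                && priority[edges[e].consumer] > best)
                best = priority[edges[e].consumer];
        }
        priority[i] = latency[i] + best;
    }
}
```

    Feeding in the five instructions of slide 48 with latency 1 each and the edges 1 → 3, 2 → 3, 3 → 5, 4 → 5 (shifted to zero-based indices) yields the priorities 3, 3, 2, 2, 1.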
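  • 49. Instruction scheduling
      [Dataflow graph as on slide 48.]
        Ready list:     1 2 4
        Pending list:   3 5
        Queued list:
        Scheduled list:
      [Diagram: transitions among the Ready, Pending, Queued and Scheduled lists, labeled "Scheduled", "Dependency resolved" and "Data hazard".]
      Pick the instruction with the highest priority from the Ready list.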
  • 50. Instruction scheduling
      [Dataflow graph as on slide 48; node types: 1 {alu}, 2 {alu}, 3 {alu}, 4 {load}, 5 {mult}.]
        Ready list:     4
        Pending list:   3 5
        Queued list:    2
        Scheduled list: 1
        cycle 1:  1. movi $r0, 0    {alu}
        (define_bypass 2 "alu_class" "alu_class")
  • 51. Instruction scheduling
      [Dataflow graph as on slide 48.]
        Ready list:     2
        Pending list:   3 5
        Queued list:
        Scheduled list: 1 4
        cycle 1:  1. movi $r0, 0           {alu}
        cycle 2:  4. lwi  $r4, [$sp + 4]   {load}
  • 52. Instruction scheduling
      [Dataflow graph as on slide 48.]
        Ready list:
        Pending list:   5
        Queued list:    3
        Scheduled list: 1 4 2
        cycle 1:  1. movi $r0, 0           {alu}
        cycle 2:  4. lwi  $r4, [$sp + 4]   {load}
        cycle 3:  2. movi $r1, 1           {alu}
  • 53. Instruction scheduling
      [Dataflow graph as on slide 48.]
        Ready list:     3
        Pending list:   5
        Queued list:
        Scheduled list: 1 4 2
        cycle 1:  1. movi $r0, 0           {alu}
        cycle 2:  4. lwi  $r4, [$sp + 4]   {load}
        cycle 3:  2. movi $r1, 1           {alu}
        cycle 4:
  • 54. Instruction scheduling
      [Dataflow graph as on slide 48.]
        Ready list:     5
        Pending list:
        Queued list:
        Scheduled list: 1 4 2 3
        cycle 1:  1. movi $r0, 0           {alu}
        cycle 2:  4. lwi  $r4, [$sp + 4]   {load}
        cycle 3:  2. movi $r1, 1           {alu}
        cycle 4:
        cycle 5:  3. add  $r0, $r0, $r1    {alu}
  • 55. Instruction scheduling
      [Dataflow graph as on slide 48.]
        Ready list:
        Pending list:
        Queued list:
        Scheduled list: 1 4 2 3 5
        cycle 1:  1. movi $r0, 0           {alu}
        cycle 2:  4. lwi  $r4, [$sp + 4]   {load}
        cycle 3:  2. movi $r1, 1           {alu}
        cycle 4:
        cycle 5:  3. add  $r0, $r0, $r1    {alu}
        cycle 6:  5. mul  $r5, $r0, $r4    {mul}
  • 56. Thank you
  • 57. Switch initialization conversion in the gimple optimization passes
      The pass tries to turn a switch statement into static array accesses.
      Before:
        int a, b;

        switch (argc)
          {
          case 1:
          case 2:
            a = 8;
            b = 6;
            break;
          case 3:
            a = 9;
            b = 5;
            break;
          case 12:
            a = 10;
            b = 4;
            break;
          default:
            a = 16;
            b = 1;
          }
      After:
        static const int CSWTCH01[] = {6, 6, 5, 1, 1, 1, 1, 1, 1, 1, 1, 4};
        static const int CSWTCH02[] = {8, 8, 9, 16, 16, 16, 16, 16, 16, 16,
                                       16, 10};

        if (((unsigned) argc) - 1 <= 11)
          {
            a = CSWTCH02[argc - 1];
            b = CSWTCH01[argc - 1];
          }
        else
          {
            a = 16;
            b = 1;
          }
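    To check that the table form really computes the same values as the switch, both versions can be wrapped in functions and compared over a range of inputs. This is a standalone sketch: the function names are invented, the tables are written with twelve entries each (one per value of argc from 1 to 12), and the range test is written as <= 11 so that case 12 is served from the tables.

```c
/* Original form: a switch that only initializes two variables. */
void with_switch(int argc, int *a, int *b)
{
    switch (argc) {
    case 1:
    case 2:
        *a = 8;  *b = 6;
        break;
    case 3:
        *a = 9;  *b = 5;
        break;
    case 12:
        *a = 10; *b = 4;
        break;
    default:
        *a = 16; *b = 1;
    }
}

/* Converted form: one table per variable, indexed by argc - 1. */
void with_tables(int argc, int *a, int *b)
{
    static const int CSWTCH01[] = {6, 6, 5, 1, 1, 1, 1, 1, 1, 1, 1, 4};
    static const int CSWTCH02[] = {8, 8, 9, 16, 16, 16, 16, 16, 16, 16,
                                   16, 10};

    /* The unsigned subtraction wraps for argc <= 0, so a single
       comparison rejects everything outside 1..12. */
    if ((unsigned) argc - 1 <= 11) {
        *a = CSWTCH02[argc - 1];
        *b = CSWTCH01[argc - 1];
    } else {
        *a = 16;
        *b = 1;
    }
}

/* Return 1 if both forms agree for every argc in [lo, hi]. */
int conversions_agree(int lo, int hi)
{
    for (int argc = lo; argc <= hi; argc++) {
        int a1, b1, a2, b2;
        with_switch(argc, &a1, &b1);
        with_tables(argc, &a2, &b2);
        if (a1 != a2 || b1 != b2)
            return 0;
    }
    return 1;
}
```

    Calling conversions_agree(-5, 20) exercises negative values, every table slot and the default path above 12.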
