Advanced pipelining

Published in: Education

  1. Advanced Pipelining
     • Superpipelining: increase the depth of the pipeline (deep pipeline)
       – to overlap more instructions
     • Multiple issue: start more than one instruction each cycle
       – to achieve CPI < 1
     • Loop unrolling: a technique for better instruction scheduling
       – to expose more instruction-level parallelism (ILP)
     • "Superscalar" processors
       – DEC Alpha 21264: 9-stage pipeline, 6-instruction issue
       – dynamic multiple issue: the processor dynamically chooses which instructions to execute in a given cycle while trying to avoid hazards
     • VLIW: very long instruction word, static multiple issue (relies more on compiler technology for packing instructions and handling hazards)
     ©2004 Morgan Kaufmann Publishers
  2. Advanced Pipelining
     • Static multiple issue
       – the compiler decides which instructions issue together, before execution
     • Dynamic multiple issue
       – the processor decides which instructions issue together, during execution
     • Problems of multiple issue
       – how to package instructions into issue slots
       – how to deal with data and control hazards
     • Speculation
       – the compiler or processor guesses the outcome of an instruction to remove it as a dependence in executing other instructions
  3. Static Multiple Issue
     • Issue packet
       – the set of instructions that issue together in a clock cycle
     • SMI concept
       – regard an issue packet as one large instruction with multiple operations
       – Very Long Instruction Word (VLIW), or Explicitly Parallel Instruction Computer (EPIC) in Intel's IA-64
     • Assume two instructions may be issued per clock cycle:
       – 1 for an integer ALU operation or branch
       – 1 for a load or store
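
The two-slot restriction above can be sketched as a simple packet check. This is a minimal illustration, not part of the original slides: the opcode sets and the function names `slot_of` and `is_legal_packet` are assumptions made for the example, and the check covers only slot types, not data or control hazards.

```python
# Hypothetical opcode classes for the two-issue MIPS pipeline described above.
ALU_OR_BRANCH = {"add", "addu", "addi", "sub", "and", "or", "slt", "beq", "bne"}
LOAD_OR_STORE = {"lw", "sw"}


def slot_of(instr):
    """Return the issue slot an instruction needs, based on its opcode."""
    opcode = instr.split()[0]
    if opcode == "nop":
        return "any"
    if opcode in ALU_OR_BRANCH:
        return "alu"
    if opcode in LOAD_OR_STORE:
        return "mem"
    raise ValueError(f"unknown opcode: {opcode}")


def is_legal_packet(first, second):
    """A packet is legal when slot 1 holds an ALU/branch op (or nop)
    and slot 2 holds a load/store (or nop). Hazards are not checked."""
    return slot_of(first) in ("alu", "any") and slot_of(second) in ("mem", "any")


print(is_legal_packet("bne $s1, $zero, Loop", "sw $t0, 4($s1)"))   # True
print(is_legal_packet("addu $t0, $t0, $s2", "addi $s1, $s1, -4"))  # False
```

A compiler for a static multiple-issue machine would apply a check of this kind, plus hazard analysis, when filling each issue packet, inserting a nop wherever no suitable instruction is available.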
  4. A Static Two-issue Datapath [figure]
  5. Static Multiple Issue
     • Extra resources (for issuing 2 instructions per cycle)
       – another 32 bits fetched from instruction memory
       – extra ports in the register file
       – another ALU to handle address calculation for data transfers
       – without these extra resources ⇒ structural hazards
     • More ambitious compiler or hardware scheduling techniques are needed
       – loads have a use latency of 1 clock cycle in the simple five-stage pipeline
         · in the two-issue pipeline, the next two instructions cannot use the load result without stalling
       – ALU instructions have no use latency in the simple five-stage pipeline
         · in the two-issue pipeline this becomes a 1-instruction use latency (the result cannot be used by the paired instruction)
  6. Example: Multiple-issue Code Scheduling
     • How would this loop be scheduled on a two-issue pipeline for MIPS? Reorder the instructions to avoid as many pipeline stalls as possible.

           Loop: lw   $t0, 0($s1)
                 addu $t0, $t0, $s2
                 sw   $t0, 0($s1)
                 addi $s1, $s1, -4
                 bne  $s1, $zero, Loop

     • Answer: 4 clocks per loop iteration for 5 instructions, so CPI = 4/5 = 0.8

                 ALU or branch inst.       Data transfer inst.    Clock cycle
           Loop: (nop)                     lw   $t0, 0($s1)       1
                 addi $s1, $s1, -4         (nop)                  2
                 addu $t0, $t0, $s2        (nop)                  3
                 bne  $s1, $zero, Loop     sw   $t0, 4($s1)       4

     • The sw offset changes from 0 to 4 because addi has already decremented $s1 by the time sw issues.
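
The reordering above can be checked with a small simulation. This is a sketch under stated assumptions, not the slides' own material: memory is modeled as a Python dict from byte address to word, 32-bit wraparound of addu is ignored, and the names `run_original`, `run_scheduled`, `schedule`, and `cpi` are invented for the illustration.

```python
def run_original(mem, s1, s2):
    """Execute the loop in program order; mem maps byte address -> word."""
    mem = dict(mem)
    while True:
        t0 = mem[s1]          # lw   $t0, 0($s1)
        t0 = t0 + s2          # addu $t0, $t0, $s2
        mem[s1] = t0          # sw   $t0, 0($s1)
        s1 = s1 - 4           # addi $s1, $s1, -4
        if s1 == 0:           # bne  $s1, $zero, Loop (falls through at zero)
            return mem


def run_scheduled(mem, s1, s2):
    """Execute the loop in the reordered two-issue schedule."""
    mem = dict(mem)
    while True:
        t0 = mem[s1]          # cycle 1: lw   $t0, 0($s1)
        s1 = s1 - 4           # cycle 2: addi $s1, $s1, -4
        t0 = t0 + s2          # cycle 3: addu $t0, $t0, $s2
        mem[s1 + 4] = t0      # cycle 4: sw   $t0, 4($s1)  (offset compensates for addi)
        if s1 == 0:           # cycle 4: bne  $s1, $zero, Loop
            return mem


# The four issue packets; None marks an empty (nop) slot.
schedule = [
    (None, "lw $t0, 0($s1)"),
    ("addi $s1, $s1, -4", None),
    ("addu $t0, $t0, $s2", None),
    ("bne $s1, $zero, Loop", "sw $t0, 4($s1)"),
]


def cpi(packets):
    """Clock cycles divided by the number of real (non-nop) instructions."""
    instrs = sum(1 for packet in packets for slot in packet if slot is not None)
    return len(packets) / instrs


memory = {4: 10, 8: 20, 12: 30}
print(run_original(memory, 12, 5) == run_scheduled(memory, 12, 5))  # True
print(cpi(schedule))                                                # 0.8
```

Both versions leave memory in the same state, which is why the scheduler is free to hoist addi above sw as long as it adjusts the store offset.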
  7. Example: Loop Unrolling for Multiple-issue Pipelines
     • Loop unrolling: multiple copies of the loop body are made, and instructions from different iterations are scheduled together
     • Register renaming: removes antidependences (name dependences); here $t0 is renamed to $t1, $t2, $t3 in the extra copies
     • Example: assume the loop index is a multiple of four

           Loop: lw   $t0, 0($s1)
                 addu $t0, $t0, $s2
                 sw   $t0, 0($s1)
                 addi $s1, $s1, -4
                 bne  $s1, $zero, Loop

                 ALU or branch inst.       Data transfer inst.    Clock cycle
           Loop: addi $s1, $s1, -16        lw   $t0, 0($s1)       1
                 (nop)                     lw   $t1, 12($s1)      2
                 addu $t0, $t0, $s2        lw   $t2, 8($s1)       3
                 addu $t1, $t1, $s2        lw   $t3, 4($s1)       4
                 addu $t2, $t2, $s2        sw   $t0, 16($s1)      5
                 addu $t3, $t3, $s2        sw   $t1, 12($s1)      6
                 (nop)                     sw   $t2, 8($s1)       7
                 bne  $s1, $zero, Loop     sw   $t3, 4($s1)       8

     • Answer:
       – 8 clocks for 4 original iterations, i.e. 8/4 = 2 clocks per iteration
       – 14 instructions in 8 clocks, so CPI = 8/14 = 0.57
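
The unrolled schedule above can be checked the same way. This is a minimal sketch under the same assumptions as before (memory as a dict from byte address to word, no 32-bit wraparound); `run_unrolled` and `unrolled_schedule` are names made up for the illustration. It also assumes, as the schedule implies, that the lw in cycle 1 reads the old value of $s1 because it issues in the same packet as the addi.

```python
def run_unrolled(mem, s1, s2):
    """Execute the 4x unrolled loop in schedule order; mem maps address -> word."""
    mem = dict(mem)
    while True:
        t0 = mem[s1]              # cycle 1: lw $t0, 0($s1) reads the old $s1 ...
        s1 = s1 - 16              # ... since addi $s1, $s1, -16 is in the same packet
        t1 = mem[s1 + 12]         # cycle 2: lw $t1, 12($s1)
        t0 = t0 + s2              # cycle 3: addu $t0
        t2 = mem[s1 + 8]          #          lw $t2, 8($s1)
        t1 = t1 + s2              # cycle 4: addu $t1
        t3 = mem[s1 + 4]          #          lw $t3, 4($s1)
        t2 = t2 + s2              # cycle 5: addu $t2
        mem[s1 + 16] = t0         #          sw $t0, 16($s1)
        t3 = t3 + s2              # cycle 6: addu $t3
        mem[s1 + 12] = t1         #          sw $t1, 12($s1)
        mem[s1 + 8] = t2          # cycle 7: sw $t2, 8($s1)
        mem[s1 + 4] = t3          # cycle 8: sw $t3, 4($s1)
        if s1 == 0:               #          bne $s1, $zero, Loop
            return mem


# The eight issue packets; None marks an empty (nop) slot.
unrolled_schedule = [
    ("addi", "lw"), (None, "lw"), ("addu", "lw"), ("addu", "lw"),
    ("addu", "sw"), ("addu", "sw"), (None, "sw"), ("bne", "sw"),
]
cycles = len(unrolled_schedule)
instrs = sum(1 for packet in unrolled_schedule for slot in packet if slot is not None)

memory = {4 * i: i for i in range(1, 9)}          # eight words at addresses 4..32
result = run_unrolled(memory, 32, 100)
print(result == {addr: value + 100 for addr, value in memory.items()})  # True
print(cycles, instrs, round(cycles / instrs, 2))                        # 8 14 0.57
```

Using four distinct temporaries ($t0 to $t3) is what lets the loads of later copies overlap with the adds and stores of earlier ones; with a single $t0 the copies would serialize on the register name.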
  8. The BIG Picture
     • Both pipelining and multiple-issue execution increase peak instruction throughput.
     • Longer pipelines and wider multiple issue put even more pressure on the compiler to deliver on the performance potential of the hardware.
     • Hardware designers must ensure correct execution of all instruction sequences.
     • Compiler writers must understand the pipeline to generate appropriate code and achieve the best performance.
  9. Dynamic Pipeline Scheduling
     • Superscalar processor: the pipeline is divided into three major units
       1. an instruction fetch and decode unit
          – fetches instructions, decodes them, and sends each instruction to the corresponding functional unit
       2. functional units (FUs)
          – reservation station: each FU has buffers that hold the operation and its operands
          – once the buffer contains all the operands and the functional unit is ready to execute, the result is calculated
       3. a commit unit
          – decides when it is safe to write the result to the register file or memory
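
The three units above can be sketched as a toy simulation. This is a rough illustration under strong assumptions, not a model of any real processor: every instruction is issued to a reservation station up front, there are unlimited functional units, register renaming is assumed so only true (read-after-write) dependences matter, and the latencies, the `Instr` class, and the `simulate` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    dest: str
    srcs: tuple
    latency: int  # cycles spent in the functional unit (hypothetical values)


def simulate(program):
    """Return (completion_order, commit_order) for out-of-order execution
    with in-order commit."""
    # For each instruction, find the earlier instructions that produce its operands.
    producers, last_writer = [], {}
    for i, ins in enumerate(program):
        producers.append([last_writer[s] for s in ins.srcs if s in last_writer])
        last_writer[ins.dest] = i

    finish = {}                     # instr index -> cycle its result is available
    completion_order, commit_order = [], []
    next_commit, cycle = 0, 0
    never = float("inf")
    while next_commit < len(program):
        cycle += 1
        # Reservation stations: start any waiting instr whose operands are ready.
        for i, ins in enumerate(program):
            if i not in finish and all(finish.get(p, never) < cycle for p in producers[i]):
                finish[i] = cycle + ins.latency - 1
        # Record results that became available this cycle (possibly out of order).
        for i in sorted(finish):
            if finish[i] == cycle:
                completion_order.append(program[i].name)
        # Commit unit: retire strictly in program order.
        while next_commit < len(program) and finish.get(next_commit, never) <= cycle:
            commit_order.append(program[next_commit].name)
            next_commit += 1
    return completion_order, commit_order


program = [
    Instr("lw", "t0", ("s1",), 3),         # slow load, e.g. a longer memory access
    Instr("addu", "t1", ("t0", "s2"), 1),  # depends on the load
    Instr("sub", "t4", ("s3", "s4"), 1),   # independent of both
]
done, committed = simulate(program)
print(done)       # ['sub', 'lw', 'addu']
print(committed)  # ['lw', 'addu', 'sub']
```

The independent sub finishes first, while the load is still outstanding, yet it is committed last, so the architectural state is updated in program order.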
  10. The Dynamically Scheduled Pipeline
      [Figure: instruction fetch and decode unit (in-order issue) → reservation stations → functional units: Integer, Integer, …, Floating point, Load/Store (out-of-order execute) → commit unit (in-order commit)]
  11. The Dynamically Scheduled Pipeline
      • Motivations for dynamic scheduling:
        – Not all stalls are predictable (e.g., cache misses). (Ch. 7)
        – With dynamic branch prediction, the compiler cannot know the exact execution order of instructions at compile time.
        – Pipeline latency and issue width change from one implementation to another. Dynamic scheduling hides these differences between hardware implementations of the same instruction set, so old code benefits from a new implementation without recompilation.
