Pipelining
2
Outline
• 5 stage pipelining
• Structural and Data Hazards
• Forwarding
• Branch Schemes
• Conclusion
Sequential Laundry
• Sequential laundry takes 6 hours for 4 loads
• If they learned pipelining, how long would laundry take?
A
B
C
D
30 40 20 30 40 20 30 40 20 30 40 20
6 PM 7 8 9 10 11 Midnight
T
a
s
k
O
r
d
e
r
Time
Pipelined Laundry
Start work ASAP
• Pipelined laundry takes 3.5 hours for 4 loads
A
B
C
D
6 PM 7 8 9 10 11 Midnight
T
a
s
k
O
r
d
e
r
Time
30 40 40 40 40 20
Pipelining Lessons
• Pipelining doesn’t help
latency of single task, it
helps throughput of
entire workload
• Pipeline rate limited by
slowest pipeline stage
• Multiple tasks operating
simultaneously
• Potential speedup =
Number pipe stages
• Unbalanced lengths of
pipe stages reduces
speedup
• Time to “fill” pipeline and
time to “drain” it reduces
speedup
A
B
C
D
6 PM 7 8 9
T
a
s
k
O
r
d
e
r
Time
30 40 40 40 40 20
Example: MIPS (Note register location)
Op
31 26 0
15
16
20
21
25
Rs1 Rd immediate
Op
31 26 0
25
Op
31 26 0
15
16
20
21
25
Rs1 Rs2
target
Rd Opx
Register-Register
5
6
10
11
Register-Immediate
Op
31 26 0
15
16
20
21
25
Rs1 Rs2/Opx immediate
Branch
Jump / Call
5 Steps of MIPS Datapath
Memory
Access
Write
Back
Instruction
Fetch
Instr. Decode
Reg. Fetch
Execute
Addr. Calc
ALU
MUX
Memory
Reg
File
MUX
MUX
Data
Memory
MUX
Sign
Extend
4
Adder
Zero?
Next SEQ PC
Address
Next PC
WB Data
Inst
RD
RS1
RS2
Immediate
8
Five-Stage Pipelining
Figure A.2, Page A-8
I
n
s
t
r.
O
r
d
e
r
Time (clock cycles)
Reg
ALU
DMem
Ifetch Reg
Reg
ALU
DMem
Ifetch Reg
Reg
ALU
DMem
Ifetch Reg
Reg
ALU
DMem
Ifetch Reg
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 6 Cycle 7
Cycle 5
9
Pipelining is not quite that easy!
• Limits to pipelining: Hazards prevent next instruction
from executing during its designated clock cycle
– Structural hazards: HW cannot support this combination of
instructions (single person to fold and put clothes away)
– Data hazards: Instruction depends on result of prior instruction still
in the pipeline (missing sock)
– Control hazards: Caused by delay between the fetching of
instructions and decisions about changes in control flow (branches
and jumps).
10
One Memory Port/Structural Hazards
Figure A.4, Page A-14
I
n
s
t
r.
O
r
d
e
r
Time (clock cycles)
Load
Instr 1
Instr 2
Instr 3
Instr 4
Reg
ALU
DMem
Ifetch Reg
Reg
ALU
DMem
Ifetch Reg
Reg
ALU
DMem
Ifetch Reg
Reg
ALU
DMem
Ifetch Reg
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 6 Cycle 7
Cycle 5
Reg
ALU
DMem
Ifetch Reg
11
One Memory Port/Structural Hazards
(Similar to Figure A.5, Page A-15)
I
n
s
t
r.
O
r
d
e
r
Time (clock cycles)
Load
Instr 1
Instr 2
Stall
Instr 3
Reg
ALU
DMem
Ifetch Reg
Reg
ALU
DMem
Ifetch Reg
Reg
ALU
DMem
Ifetch Reg
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 6 Cycle 7
Cycle 5
Reg
ALU
DMem
Ifetch Reg
Bubble Bubble Bubble Bubble
Bubble
How do you “bubble” the pipe?
12
I
n
s
t
r.
O
r
d
e
r
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11
Reg
ALU
DMem
Ifetch Reg
Reg
ALU
DMem
Ifetch Reg
Reg
ALU
DMem
Ifetch Reg
Reg
ALU
DMem
Ifetch Reg
Reg
ALU
DMem
Ifetch Reg
Data Hazard on R1
Figure A.6, Page A-17
Time (clock cycles)
IF ID/RF EX MEM WB
13
• Read After Write (RAW)
InstrJ tries to read operand before InstrI writes it
Three Generic Data Hazards
I: add r1,r2,r3
J: sub r4,r1,r3
14
• Write After Read (WAR)
InstrJ writes operand before InstrI reads it
• Can’t happen in MIPS 5 stage pipeline because:
– All instructions take 5 stages, and
– Reads are always in stage 2, and
– Writes are always in stage 5
I: sub r4,r1,r3
J: add r1,r2,r3
K: mul r6,r1,r7
Three Generic Data Hazards
15
Three Generic Data Hazards
• Write After Write (WAW)
InstrJ writes operand before InstrI writes it.
• Can’t happen in MIPS 5 stage pipeline because:
– All instructions take 5 stages, and
– Writes are always in stage 5
I: sub r1,r4,r3
J: add r1,r2,r3
K: mul r6,r1,r7
16
Outline
• 5 stage pipelining
• Structural and Data Hazards
• Forwarding
• Branch Schemes
• Conclusion
17
Time (clock cycles)
Forwarding to Avoid Data Hazard
Figure A.7, Page A-19
I
n
s
t
r.
O
r
d
e
r
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11
Reg
ALU
DMem
Ifetch Reg
Reg
ALU
DMem
Ifetch Reg
Reg
ALU
DMem
Ifetch Reg
Reg
ALU
DMem
Ifetch Reg
Reg
ALU
DMem
Ifetch Reg
18
HW Change for Forwarding
Figure A.23, Page A-37
MEM/WR
ID/EX
EX/MEM
Data
Memory
ALU
mux
mux
Registers
NextPC
Immediate
mux
What circuit detects and resolves this hazard?
19
Outline
• 5 stage pipelining
• Structural and Data Hazards
• Forwarding
• Branch Schemes
• Conclusion
20
Control Hazard on Branches
Three Stage Stall
10: beq r1,r3,36
14: and r2,r3,r5
18: or r6,r1,r7
22: add r8,r1,r9
36: xor r10,r1,r11
Reg
ALU
DMem
Ifetch Reg
Reg
ALU
DMem
Ifetch Reg
Reg
ALU
DMem
Ifetch Reg
Reg
ALU
DMem
Ifetch Reg
Reg
ALU
DMem
Ifetch
What do you do with the 3 instructions in between?
How do you do it?
Where is the “commit”?
21
Branch Hazard Alternatives
#1: Stall until branch direction is clear
#2: Predict Branch Not Taken
– Execute successor instructions in sequence
– “Squash” instructions in pipeline if branch actually taken
– Advantage of late pipeline state update
– 47% MIPS branches not taken on average
#3: Predict Branch Taken
– 53% MIPS branches taken on average
– But haven’t calculated branch target address in MIPS
» MIPS still incurs 1 cycle branch penalty
» Other machines: branch target known before outcome
22
Outline
• 5 stage pipelining
• Structural and Data Hazards
• Forwarding
• Branch Schemes
• Conclusion
23
And In Conclusion: Control and Pipelining
• Hazards limit performance on computers:
– Structural: need more HW resources
– Data (RAW,WAR,WAW): need forwarding, compiler scheduling
– Control: delayed branch, prediction
Acknowledgements
• Adapted from the slides of Prof. David Patterson
and Prof. Gheith Abandah.
24

Pipelining

  • 1.
  • 2.
    2 Outline • 5 stagepipelining • Structural and Data Hazards • Forwarding • Branch Schemes • Conclusion
  • 3.
    Sequential Laundry • Sequentiallaundry takes 6 hours for 4 loads • If they learned pipelining, how long would laundry take? A B C D 30 40 20 30 40 20 30 40 20 30 40 20 6 PM 7 8 9 10 11 Midnight T a s k O r d e r Time
  • 4.
    Pipelined Laundry Start workASAP • Pipelined laundry takes 3.5 hours for 4 loads A B C D 6 PM 7 8 9 10 11 Midnight T a s k O r d e r Time 30 40 40 40 40 20
  • 5.
    Pipelining Lessons • Pipeliningdoesn’t help latency of single task, it helps throughput of entire workload • Pipeline rate limited by slowest pipeline stage • Multiple tasks operating simultaneously • Potential speedup = Number pipe stages • Unbalanced lengths of pipe stages reduces speedup • Time to “fill” pipeline and time to “drain” it reduces speedup A B C D 6 PM 7 8 9 T a s k O r d e r Time 30 40 40 40 40 20
  • 6.
    Example: MIPS (Noteregister location) Op 31 26 0 15 16 20 21 25 Rs1 Rd immediate Op 31 26 0 25 Op 31 26 0 15 16 20 21 25 Rs1 Rs2 target Rd Opx Register-Register 5 6 10 11 Register-Immediate Op 31 26 0 15 16 20 21 25 Rs1 Rs2/Opx immediate Branch Jump / Call
  • 7.
    5 Steps ofMIPS Datapath Memory Access Write Back Instruction Fetch Instr. Decode Reg. Fetch Execute Addr. Calc ALU MUX Memory Reg File MUX MUX Data Memory MUX Sign Extend 4 Adder Zero? Next SEQ PC Address Next PC WB Data Inst RD RS1 RS2 Immediate
  • 8.
    8 Five-Stage Pipelining Figure A.2,Page A-8 I n s t r. O r d e r Time (clock cycles) Reg ALU DMem Ifetch Reg Reg ALU DMem Ifetch Reg Reg ALU DMem Ifetch Reg Reg ALU DMem Ifetch Reg Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 6 Cycle 7 Cycle 5
  • 9.
    9 Pipelining is notquite that easy! • Limits to pipelining: Hazards prevent next instruction from executing during its designated clock cycle – Structural hazards: HW cannot support this combination of instructions (single person to fold and put clothes away) – Data hazards: Instruction depends on result of prior instruction still in the pipeline (missing sock) – Control hazards: Caused by delay between the fetching of instructions and decisions about changes in control flow (branches and jumps).
  • 10.
    10 One Memory Port/StructuralHazards Figure A.4, Page A-14 I n s t r. O r d e r Time (clock cycles) Load Instr 1 Instr 2 Instr 3 Instr 4 Reg ALU DMem Ifetch Reg Reg ALU DMem Ifetch Reg Reg ALU DMem Ifetch Reg Reg ALU DMem Ifetch Reg Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 6 Cycle 7 Cycle 5 Reg ALU DMem Ifetch Reg
  • 11.
    11 One Memory Port/StructuralHazards (Similar to Figure A.5, Page A-15) I n s t r. O r d e r Time (clock cycles) Load Instr 1 Instr 2 Stall Instr 3 Reg ALU DMem Ifetch Reg Reg ALU DMem Ifetch Reg Reg ALU DMem Ifetch Reg Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 6 Cycle 7 Cycle 5 Reg ALU DMem Ifetch Reg Bubble Bubble Bubble Bubble Bubble How do you “bubble” the pipe?
  • 12.
    12 I n s t r. O r d e r add r1,r2,r3 sub r4,r1,r3 andr6,r1,r7 or r8,r1,r9 xor r10,r1,r11 Reg ALU DMem Ifetch Reg Reg ALU DMem Ifetch Reg Reg ALU DMem Ifetch Reg Reg ALU DMem Ifetch Reg Reg ALU DMem Ifetch Reg Data Hazard on R1 Figure A.6, Page A-17 Time (clock cycles) IF ID/RF EX MEM WB
  • 13.
    13 • Read AfterWrite (RAW) InstrJ tries to read operand before InstrI writes it Three Generic Data Hazards I: add r1,r2,r3 J: sub r4,r1,r3
  • 14.
    14 • Write AfterRead (WAR) InstrJ writes operand before InstrI reads it • Can’t happen in MIPS 5 stage pipeline because: – All instructions take 5 stages, and – Reads are always in stage 2, and – Writes are always in stage 5 I: sub r4,r1,r3 J: add r1,r2,r3 K: mul r6,r1,r7 Three Generic Data Hazards
  • 15.
    15 Three Generic DataHazards • Write After Write (WAW) InstrJ writes operand before InstrI writes it. • Can’t happen in MIPS 5 stage pipeline because: – All instructions take 5 stages, and – Writes are always in stage 5 I: sub r1,r4,r3 J: add r1,r2,r3 K: mul r6,r1,r7
  • 16.
    16 Outline • 5 stagepipelining • Structural and Data Hazards • Forwarding • Branch Schemes • Conclusion
  • 17.
    17 Time (clock cycles) Forwardingto Avoid Data Hazard Figure A.7, Page A-19 I n s t r. O r d e r add r1,r2,r3 sub r4,r1,r3 and r6,r1,r7 or r8,r1,r9 xor r10,r1,r11 Reg ALU DMem Ifetch Reg Reg ALU DMem Ifetch Reg Reg ALU DMem Ifetch Reg Reg ALU DMem Ifetch Reg Reg ALU DMem Ifetch Reg
  • 18.
    18 HW Change forForwarding Figure A.23, Page A-37 MEM/WR ID/EX EX/MEM Data Memory ALU mux mux Registers NextPC Immediate mux What circuit detects and resolves this hazard?
  • 19.
    19 Outline • 5 stagepipelining • Structural and Data Hazards • Forwarding • Branch Schemes • Conclusion
  • 20.
    20 Control Hazard onBranches Three Stage Stall 10: beq r1,r3,36 14: and r2,r3,r5 18: or r6,r1,r7 22: add r8,r1,r9 36: xor r10,r1,r11 Reg ALU DMem Ifetch Reg Reg ALU DMem Ifetch Reg Reg ALU DMem Ifetch Reg Reg ALU DMem Ifetch Reg Reg ALU DMem Ifetch What do you do with the 3 instructions in between? How do you do it? Where is the “commit”?
  • 21.
    21 Branch Hazard Alternatives #1:Stall until branch direction is clear #2: Predict Branch Not Taken – Execute successor instructions in sequence – “Squash” instructions in pipeline if branch actually taken – Advantage of late pipeline state update – 47% MIPS branches not taken on average #3: Predict Branch Taken – 53% MIPS branches taken on average – But haven’t calculated branch target address in MIPS » MIPS still incurs 1 cycle branch penalty » Other machines: branch target known before outcome
  • 22.
    22 Outline • 5 stagepipelining • Structural and Data Hazards • Forwarding • Branch Schemes • Conclusion
  • 23.
    23 And In Conclusion:Control and Pipelining • Hazards limit performance on computers: – Structural: need more HW resources – Data (RAW,WAR,WAW): need forwarding, compiler scheduling – Control: delayed branch, prediction
  • 24.
    Acknowledgements • Adapted fromthe slides of Prof. David Patterson and Prof. Gheith Abandah. 24