SlideShare a Scribd company logo
1 of 65
CS152 / Kubiatowicz
Lec12.1
3/10/99 ©UCB Spring 1999
CS152
Computer Architecture and Engineering
Lecture 12
Introduction to Pipelining
Mar 10, 1999
John Kubiatowicz (http.cs.berkeley.edu/~kubitron)
lecture slides: http://www-inst.eecs.berkeley.edu/~cs152/
CS152 / Kubiatowicz
Lec12.2
3/10/99 ©UCB Spring 1999
Recap: Microprogramming
° Microprogramming is a convenient method for
implementing structured control state diagrams:
• Random logic replaced by microPC sequencer and ROM
• Each line of ROM called a instruction:
contains sequencer control + values for control points
• limited state transitions:
branch to zero, next sequential,
branch to instruction address from displatch ROM
° Horizontal Code: one control bit in Instruction
for every control line in datapath
° Vertical Code: groups of control-lines coded
together in Instruction (e.g. possible ALU dest)
° Control design reduces to Microprogramming
• Part of the design process is to develop a “language” that
describes control and is easy for humans to understand
CS152 / Kubiatowicz
Lec12.3
3/10/99 ©UCB Spring 1999
Recap: Microprogramming
° Microprogramming is a fundamental concept
• implement an instruction set by building a very simple processor
and interpreting the instructions
• essential for very complex instructions and when few register
transfers are possible
• overkill when ISA matches datapath 1-1
sequencer
control
datapath control
micro-PC
-sequencer:
fetch,dispatch,
sequential
microinstruction ()
Dispatch
ROM
Opcode
-Code ROM
Decode
Decode
To DataPath
Decoders implement
our -code language:
For instance:
rt-ALU
rd-ALU
mem-ALU
CS152 / Kubiatowicz
Lec12.4
3/10/99 ©UCB Spring 1999
Recap: Exceptions
° Exception = unprogrammed control transfer
• system takes action to handle the exception
- must record the address of the offending instruction
- record any other information necessary to return afterwards
• returns control to user
• must save & restore user state
° Allows constuction of a “user virtual machine”
normal control flow:
sequential, jumps, branches, calls, returns
user program System
Exception
Handler
Exception:
return from
exception
CS152 / Kubiatowicz
Lec12.5
3/10/99 ©UCB Spring 1999
Recap: Precise Exceptions
° Precise  state of the machine is preserved as if
program executed up to the offending instruction
• All previous instructions completed
• Offending instruction and all following instructions act as if they have
not even started
• MIPS takes this position
° Imprecise  system software has to figure out what is
where and put it all back together
• Precise exceptions particularly important for virtual memory, since we
may need to quickly schedule another user
• Precise exceptions also important for frequent interrupts, where need
low-overhead restart
° Performance goals often lead designers to forsake
precise interrupts
• system software developers, user, markets etc. usually wish they had
not done this
° Modern techniques for out-of-order execution and
branch prediction help implement precise interrupts
CS152 / Kubiatowicz
Lec12.6
3/10/99 ©UCB Spring 1999
Recap: Two Types of Exceptions
° Interrupts
• caused by external events:
- Network, Keyboard, Disk I/O, Timer
• asynchronous to program execution
- Most interrupts can be disabled for brief periods of time
- Some (like “Power Failing”) are non-maskable (NMI)
° Traps
• caused by internal events
- exceptional conditions (overflow)
- errors (parity)
- faults (non-resident page)
• synchronous to program execution
• condition must be remedied by the handler
• instruction may be retried or simulated and program continued
or program may be aborted
CS152 / Kubiatowicz
Lec12.7
3/10/99 ©UCB Spring 1999
Example: How Control Handles Traps in our FSD
° Undefined Instruction–detected when no next state is
defined from state 1 for the op value.
• We handle this exception by defining the next state value for all op
values other than lw, sw, 0 (R-type), jmp, beq, and ori as new state 12.
• Shown symbolically using “other” to indicate that the op field does
not match any of the opcodes that label arcs out of state 1.
° Arithmetic overflow–detected on ALU ops such as
signed add
• Used to save PC and enter exception handler
° External Interrupt – flagged by asserted interrupt line
• Again, must save PC and enter exception handler
° Note: Challenge in designing control of a real machine
is to handle different interactions between instructions
and other exception-causing events such that control
logic remains small and fast.
• Complex interactions makes the control unit the most challenging
aspect of hardware design
CS152 / Kubiatowicz
Lec12.8
3/10/99 ©UCB Spring 1999
How add traps and interrupts to state diagram?
IR <= MEM[PC]
PC <= PC + 4
R-type
S <= A fun B
R[rd] <= S
S <= A op ZX
R[rt] <= S
ORi
S <= A + SX
R[rt] <= M
M <= MEM[S]
LW
S <= A + SX
MEM[S] <= B
SW
“instruction fetch”
“decode”
0000
0001
0100
0101
0110
0111
1000
1001
1010
1011
1100
BEQ
0010
If A = B
then PC <= S
S<= PC +SX
undefined
instruction
EPC <= PC - 4
PC <= exp_addr
cause <= 10 (RI)
other
S <= A - B
overflow
EPC <= PC - 4
PC <= exp_addr
cause <= 12 (Ovf)
EPC <= PC - 4
PC <= exp_addr
cause <= 0(INT)
Handle
Interrupt
Pending INT
CS152 / Kubiatowicz
Lec12.9
3/10/99 ©UCB Spring 1999
But: What has to change in our -sequencer?
° Need concept of branch at micro-code level
R-type
S <= A fun B
0100
overflow
EPC <= PC - 4
PC <= exp_addr
cause <= 12 (Ovf)
µAddress
Select
Logic
Opcode
microPC
1
Adder
Dispatch
ROM
Mux
0
0
1
2
Mux
Mux
0
1
overflow
pending interrupt
4?
N?
Cond Select
Do -branch
-offset Seq Select
CS152 / Kubiatowicz
Lec12.10
3/10/99 ©UCB Spring 1999
Example: Can easily use with for non-ideal memory
IR <= MEM[PC]
R-type
A <= R[rs]
B <= R[rt]
S <= A fun B
R[rd] <= S
PC <= PC + 4
S <= A or ZX
R[rt] <= S
PC <= PC + 4
ORi
S <= A + SX
R[rt] <= M
PC <= PC + 4
M <= MEM[S]
LW
S <= A + SX
MEM[S] <= B
BEQ
PC <=
Next(PC)
SW
“instruction fetch”
“decode / operand fetch”
Execute
Memory
Write-back
~wait wait
~wait wait
PC <= PC + 4
~wait wait
CS152 / Kubiatowicz
Lec12.11
3/10/99 ©UCB Spring 1999
Summary: Microprogramming one inspiration for RISC
° If simple instruction could execute at very high clock
rate…
° If you could even write compilers to produce
microinstructions…
° If most programs use simple instructions and
addressing modes…
° If microcode is kept in RAM instead of ROM so as to
fix bugs …
° If same memory used for control memory could be
used instead as cache for “macroinstructions”…
° Then why not skip instruction interpretation by a
microprogram and simply compile directly into lowest
language of machine? (microprogramming is overkill
when ISA matches datapath 1-1)
CS152 / Kubiatowicz
Lec12.12
3/10/99 ©UCB Spring 1999
The Big Picture: Where are We Now?
° The Five Classic Components of a Computer
° Next Topics:
• Pipelining by Analogy
• Administrivia; Course road map
Control
Datapath
Memory
Processor
Input
Output
CS152 / Kubiatowicz
Lec12.13
3/10/99 ©UCB Spring 1999
Pipelining is Natural!
° Laundry Example
° Ann, Brian, Cathy, Dave
each have one load of clothes
to wash, dry, and fold
° Washer takes 30 minutes
° Dryer takes 40 minutes
° “Folder” takes 20 minutes
A B C D
CS152 / Kubiatowicz
Lec12.14
3/10/99 ©UCB Spring 1999
Sequential Laundry
° Sequential laundry takes 6 hours for 4 loads
° If they learned pipelining, how long would laundry
take?
A
B
C
D
30 40 20 30 40 20 30 40 20 30 40 20
6 PM 7 8 9 10 11 Midnight
T
a
s
k
O
r
d
e
r
Time
CS152 / Kubiatowicz
Lec12.15
3/10/99 ©UCB Spring 1999
Pipelined Laundry: Start work ASAP
° Pipelined laundry takes 3.5 hours for 4 loads
A
B
C
D
6 PM 7 8 9 10 11 Midnight
T
a
s
k
O
r
d
e
r
Time
30 40 40 40 40 20
CS152 / Kubiatowicz
Lec12.16
3/10/99 ©UCB Spring 1999
Pipelining Lessons
° Pipelining doesn’t help
latency of single task, it
helps throughput of entire
workload
° Pipeline rate limited by
slowest pipeline stage
° Multiple tasks operating
simultaneously using
different resources
° Potential speedup =
Number pipe stages
° Unbalanced lengths of
pipe stages reduces
speedup
° Time to “fill” pipeline and
time to “drain” it reduces
speedup
° Stall for Dependences
A
B
C
D
6 PM 7 8 9
T
a
s
k
O
r
d
e
r
Time
30 40 40 40 40 20
CS152 / Kubiatowicz
Lec12.17
3/10/99 ©UCB Spring 1999
Administrative Issues
°Computers in the news:
•Intel decided to settle out of court
•FTC had much narrower case with Intel
than with Microsoft
•Intel choose quieter solution still
stinging from bad publicity due to
Divide bug.
°Moving on to Chapter 6
°Next weeksections in Cory Lab
CS152 / Kubiatowicz
Lec12.18
3/10/99 ©UCB Spring 1999
The Five Stages of Load
° Ifetch: Instruction Fetch
• Fetch the instruction from the Instruction Memory
° Reg/Dec: Registers Fetch and Instruction Decode
° Exec: Calculate the memory address
° Mem: Read the data from the Data Memory
° Wr: Write the data back to the register file
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5
Ifetch Reg/Dec Exec Mem Wr
Load
CS152 / Kubiatowicz
Lec12.19
3/10/99 ©UCB Spring 1999
Note: These 5 stages were there all along!
IR <= MEM[PC]
PC <= PC + 4
R-type
ALUout
<= A fun B
R[rd]
<= ALUout
ALUout
<= A op ZX
R[rt]
<= ALUout
ORi
ALUout
<= A + SX
R[rt] <= M
M <=
MEM[ALUout]
LW
ALUout
<= A + SX
MEM[ALUout]
<= B
SW
0000
0001
0100
0101
0110
0111
1000
1001
1010
1011
1100
BEQ
0010
If A = B then PC
<= ALUout
ALUout
<= PC +SX
Execute
Memory
Write-back
Decode
Fetch
CS152 / Kubiatowicz
Lec12.20
3/10/99 ©UCB Spring 1999
Pipelining
° Improve performance by increasing throughput
Ideal speedup is number of stages in the pipeline.
Do we achieve this?
CS152 / Kubiatowicz
Lec12.21
3/10/99 ©UCB Spring 1999
Basic Idea
° What do we need to add to split the datapath into stages?
CS152 / Kubiatowicz
Lec12.22
3/10/99 ©UCB Spring 1999
Graphically Representing Pipelines
° Can help with answering questions like:
• how many cycles does it take to execute this code?
• what is the ALU doing during cycle 4?
• use this representation to help understand datapaths
CS152 / Kubiatowicz
Lec12.23
3/10/99 ©UCB Spring 1999
Conventional Pipelined Execution Representation
IFetch Dcd Exec Mem WB
IFetch Dcd Exec Mem WB
IFetch Dcd Exec Mem WB
IFetch Dcd Exec Mem WB
IFetch Dcd Exec Mem WB
IFetch Dcd Exec Mem WB
Program Flow
Time
CS152 / Kubiatowicz
Lec12.24
3/10/99 ©UCB Spring 1999
Single Cycle, Multiple Cycle, vs. Pipeline
Clk
Cycle 1
Multiple Cycle Implementation:
Ifetch Reg Exec Mem Wr
Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle 10
Load Ifetch Reg Exec Mem Wr
Ifetch Reg Exec Mem
Load Store
Pipeline Implementation:
Ifetch Reg Exec Mem Wr
Store
Clk
Single Cycle Implementation:
Load Store Waste
Ifetch
R-type
Ifetch Reg Exec Mem Wr
R-type
Cycle 1 Cycle 2
CS152 / Kubiatowicz
Lec12.25
3/10/99 ©UCB Spring 1999
Why Pipeline?
° Suppose we execute 100 instructions
° Single Cycle Machine
• 45 ns/cycle x 1 CPI x 100 inst = 4500 ns
° Multicycle Machine
• 10 ns/cycle x 4.6 CPI (due to inst mix) x 100 inst = 4600 ns
° Ideal pipelined machine
• 10 ns/cycle x (1 CPI x 100 inst + 4 cycle drain) = 1040 ns
CS152 / Kubiatowicz
Lec12.26
3/10/99 ©UCB Spring 1999
Why Pipeline? Because the resources are there!
I
n
s
t
r.
O
r
d
e
r
Time (clock cycles)
Inst 0
Inst 1
Inst 2
Inst 4
Inst 3 ALU
Im Reg Dm Reg
ALU
Im Reg Dm Reg
ALU
Im Reg Dm Reg
ALU
Im Reg Dm Reg
ALU
Im Reg Dm Reg
CS152 / Kubiatowicz
Lec12.27
3/10/99 ©UCB Spring 1999
Can pipelining get us into trouble?
° Yes: Pipeline Hazards
• structural hazards: attempt to use the same resource two
different ways at the same time
- E.g., combined washer/dryer would be a structural hazard
or folder busy doing something else (watching TV)
• data hazards: attempt to use item before it is ready
- E.g., one sock of pair in dryer and one in washer; can’t fold
until get sock from washer through dryer
- instruction depends on result of prior instruction still in the
pipeline
• control hazards: attempt to make a decision before condition is
evaulated
- E.g., washing football uniforms and need to get proper
detergent level; need to see after dryer before next load in
- branch instructions
° Can always resolve hazards by waiting
• pipeline control must detect the hazard
• take action (or delay action) to resolve hazards
CS152 / Kubiatowicz
Lec12.28
3/10/99 ©UCB Spring 1999
Mem
Single Memory is a Structural Hazard
I
n
s
t
r.
O
r
d
e
r
Time (clock cycles)
Load
Instr 1
Instr 2
Instr 3
Instr 4
ALU
Mem Reg Mem Reg
ALU
Mem Reg Mem Reg
ALU
Mem Reg Mem Reg
ALU
Reg Mem Reg
ALU
Mem Reg Mem Reg
Detection is easy in this case! (right half highlight means read, left half write)
CS152 / Kubiatowicz
Lec12.29
3/10/99 ©UCB Spring 1999
Structural Hazards limit performance
° Example: if 1.3 memory accesses per instruction
and only one memory access per cycle then
• average CPI  1.3
• otherwise resource is more than 100% utilized
CS152 / Kubiatowicz
Lec12.30
3/10/99 ©UCB Spring 1999
° Stall: wait until decision is clear
° Impact: 2 lost cycles (i.e. 3 clock cycles per branch
instruction) => slow
° Move decision to end of decode
• save 1 cycle per branch
Control Hazard Solutions
I
n
s
t
r.
O
r
d
e
r
Time (clock cycles)
Add
Beq
Load
ALU
Mem Reg Mem Reg
ALU
Mem Reg Mem Reg
ALU
Reg Mem Reg
Mem
Lost
potential
CS152 / Kubiatowicz
Lec12.31
3/10/99 ©UCB Spring 1999
° Predict: guess one direction then back up if wrong
• Predict not taken
° Impact: 0 lost cycles per branch instruction if right,
1 if wrong (right 50% of time)
• Need to “Squash” and restart following instruction if wrong
• Produce CPI on branch of (1 *.5 + 2 * .5) = 1.5
• Total CPI might then be: 1.5 * .2 + 1 * .8 = 1.1 (20% branch)
° More dynamic scheme: history of 1 branch ( 90%)
Control Hazard Solutions
I
n
s
t
r.
O
r
d
e
r
Time (clock cycles)
Add
Beq
Load ALU
Mem Reg Mem Reg
ALU
Mem Reg Mem Reg
Mem
ALU
Reg Mem Reg
CS152 / Kubiatowicz
Lec12.32
3/10/99 ©UCB Spring 1999
° Redefine branch behavior (takes place after next
instruction) “delayed branch”
° Impact: 0 clock cycles per branch instruction if can
find instruction to put in “slot” ( 50% of time)
° As launch more instruction per clock cycle, less useful
Control Hazard Solutions
I
n
s
t
r.
O
r
d
e
r
Time (clock cycles)
Add
Beq
Misc
ALU
Mem Reg Mem Reg
ALU
Mem Reg Mem Reg
Mem
ALU
Reg Mem Reg
Load Mem
ALU
Reg Mem Reg
CS152 / Kubiatowicz
Lec12.33
3/10/99 ©UCB Spring 1999
Data Hazard on r1
add r1 ,r2,r3
sub r4, r1 ,r3
and r6, r1 ,r7
or r8, r1 ,r9
xor r10, r1 ,r11
CS152 / Kubiatowicz
Lec12.34
3/10/99 ©UCB Spring 1999
• Dependencies backwards in time are hazards
Data Hazard on r1:
I
n
s
t
r.
O
r
d
e
r
Time (clock cycles)
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11
I
F
ID/R
F
E
X
ME
M
W
B
ALU
Im Reg Dm Reg
ALU
Im Reg Dm Reg
ALU
Im Reg Dm Reg
Im
ALU
Reg Dm Reg
ALU
Im Reg Dm Reg
CS152 / Kubiatowicz
Lec12.35
3/10/99 ©UCB Spring 1999
• “Forward” result from one stage to another
• “or” OK if define read/write properly
Data Hazard Solution:
I
n
s
t
r.
O
r
d
e
r
Time (clock cycles)
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11
I
F
ID/R
F
E
X
ME
M
W
B
ALU
Im Reg Dm Reg
ALU
Im Reg Dm Reg
ALU
Im Reg Dm Reg
Im
ALU
Reg Dm Reg
ALU
Im Reg Dm Reg
CS152 / Kubiatowicz
Lec12.36
3/10/99 ©UCB Spring 1999
Reg
• Dependencies backwards in time are hazards
• Can’t solve with forwarding:
• Must delay/stall instruction dependent on loads
Forwarding (or Bypassing): What about Loads
Time (clock cycles)
lw r1,0(r2)
sub r4,r1,r3
I
F
ID/R
F
E
X
ME
M
W
B
ALU
Im Reg Dm
ALU
Im Reg Dm Reg
CS152 / Kubiatowicz
Lec12.37
3/10/99 ©UCB Spring 1999
Reg
• Dependencies backwards in time are hazards
• Can’t solve with forwarding:
• Must delay/stall instruction dependent on loads
Forwarding (or Bypassing): What about Loads
Time (clock cycles)
lw r1,0(r2)
sub r4,r1,r3
I
F
ID/R
F
E
X
ME
M
W
B
ALU
Im Reg Dm
ALU
Im Reg Dm Reg
Stall
CS152 / Kubiatowicz
Lec12.38
3/10/99 ©UCB Spring 1999
Designing a Pipelined Processor
° Go back and examine your datapath and control
diagram
° associated resources with states
° ensure that flows do not conflict, or figure out how
to resolve
° assert control in appropriate stage
CS152 / Kubiatowicz
Lec12.39
3/10/99 ©UCB Spring 1999
Control and Datapath: Split state diag into 5 pieces
IR <- Mem[PC]; PC <– PC+4;
A <- R[rs]; B<– R[rt]
S <– A + B;
R[rd] <– S;
S <– A + SX;
M <– Mem[S]
R[rd] <– M;
S <– A or ZX;
R[rt] <– S;
S <– A + SX;
Mem[S] <- B
If Cond
PC < PC+SX;
Exec
Reg.
File
Mem
Acces
s
Data
Mem
A
B
S
Reg
File
Equal
PC
Next
PC
IR
Inst.
Mem
D
M
CS152 / Kubiatowicz
Lec12.40
3/10/99 ©UCB Spring 1999
Pipelined Processor (almost) for slides
° What happens if we start a new instruction every
cycle?
Exec
Reg.
File
Mem
Acces
s
Data
Mem
A
B
S
M
Reg
File
Equal
PC
Next
PC
IR
Inst.
Mem Valid
IRex
Dcd
Ctrl
IRmem
Ex
Ctrl
IRwb
Mem
Ctrl
WB
Ctrl
D
CS152 / Kubiatowicz
Lec12.41
3/10/99 ©UCB Spring 1999
Pipelined Datapath (as in book); hard to read
CS152 / Kubiatowicz
Lec12.42
3/10/99 ©UCB Spring 1999
Pipelining the Load Instruction
° The five independent functional units in the pipeline
datapath are:
• Instruction Memory for the Ifetch stage
• Register File’s Read ports (bus A and busB) for the Reg/Dec stage
• ALU for the Exec stage
• Data Memory for the Mem stage
• Register File’s Write port (bus W) for the Wr stage
Clock
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7
Ifetch Reg/Dec Exec Mem Wr
1st lw
Ifetch Reg/Dec Exec Mem Wr
2nd lw
Ifetch Reg/Dec Exec Mem Wr
3rd lw
CS152 / Kubiatowicz
Lec12.43
3/10/99 ©UCB Spring 1999
The Four Stages of R-type
° Ifetch: Instruction Fetch
• Fetch the instruction from the Instruction Memory
° Reg/Dec: Registers Fetch and Instruction Decode
° Exec:
• ALU operates on the two register operands
• Update PC
° Wr: Write the ALU output back to the register file
Cycle 1 Cycle 2 Cycle 3 Cycle 4
Ifetch Reg/Dec Exec Wr
R-type
CS152 / Kubiatowicz
Lec12.44
3/10/99 ©UCB Spring 1999
Pipelining the R-type and Load Instruction
° We have pipeline conflict or structural hazard:
• Two instructions try to write to the register file at the same time!
• Only one write port
Clock
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9
Ifetch Reg/Dec Exec Wr
R-type
Ifetch Reg/Dec Exec Wr
R-type
Ifetch Reg/Dec Exec Mem Wr
Load
Ifetch Reg/Dec Exec Wr
R-type
Ifetch Reg/Dec Exec Wr
R-type
Ops! We have a problem!
CS152 / Kubiatowicz
Lec12.45
3/10/99 ©UCB Spring 1999
Important Observation
° Each functional unit can only be used once per
instruction
° Each functional unit must be used at the same stage
for all instructions:
• Load uses Register File’s Write Port during its 5th stage
• R-type uses Register File’s Write Port during its 4th stage
Ifetch Reg/Dec Exec Mem Wr
Load
1 2 3 4 5
Ifetch Reg/Dec Exec Wr
R-type
1 2 3 4
° 2 ways to solve this pipeline hazard.
CS152 / Kubiatowicz
Lec12.46
3/10/99 ©UCB Spring 1999
Solution 1: Insert “Bubble” into the Pipeline
° Insert a “bubble” into the pipeline to prevent 2 writes
at the same cycle
• The control logic can be complex.
• Lose instruction fetch and issue opportunity.
° No instruction is started in Cycle 6!
Clock
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9
Ifetch Reg/Dec Exec Wr
R-type
Ifetch Reg/Dec Exec
Ifetch Reg/Dec Exec Mem Wr
Load
Ifetch Reg/Dec Exec Wr
R-type
Ifetch Reg/Dec Exec Wr
R-type Pipeline
Bubble
Ifetch Reg/Dec Exec Wr
CS152 / Kubiatowicz
Lec12.47
3/10/99 ©UCB Spring 1999
Solution 2: Delay R-type’s Write by One Cycle
° Delay R-type’s register write by one cycle:
• Now R-type instructions also use Reg File’s write port at Stage 5
• Mem stage is a NOOP stage: nothing is being done.
Clock
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9
Ifetch Reg/Dec Mem Wr
R-type
Ifetch Reg/Dec Mem Wr
R-type
Ifetch Reg/Dec Exec Mem Wr
Load
Ifetch Reg/Dec Mem Wr
R-type
Ifetch Reg/Dec Mem Wr
R-type
Ifetch Reg/Dec Exec Wr
R-type Mem
Exec
Exec
Exec
Exec
1 2 3 4 5
CS152 / Kubiatowicz
Lec12.48
3/10/99 ©UCB Spring 1999
Modified Control & Datapath
IR <- Mem[PC]; PC <– PC+4;
A <- R[rs]; B<– R[rt]
S <– A + B;
R[rd] <– M;
S <– A + SX;
M <– Mem[S]
R[rd] <– M;
S <– A or ZX;
R[rt] <– M;
S <– A + SX;
Mem[S] <- B
if Cond PC
< PC+SX;
M <– S
Exec
Reg.
File
Mem
Acces
s
Data
Mem
A
B
S
Reg
File
Equal
PC
Next
PC
IR
Inst.
Mem
D
M
M <– S
CS152 / Kubiatowicz
Lec12.49
3/10/99 ©UCB Spring 1999
The Four Stages of Store
° Ifetch: Instruction Fetch
• Fetch the instruction from the Instruction Memory
° Reg/Dec: Registers Fetch and Instruction Decode
° Exec: Calculate the memory address
° Mem: Write the data into the Data Memory
Cycle 1 Cycle 2 Cycle 3 Cycle 4
Ifetch Reg/Dec Exec Mem
Store Wr
CS152 / Kubiatowicz
Lec12.50
3/10/99 ©UCB Spring 1999
The Three Stages of Beq
° Ifetch: Instruction Fetch
• Fetch the instruction from the Instruction Memory
° Reg/Dec:
• Registers Fetch and Instruction Decode
° Exec:
• compares the two register operand,
• select correct branch target address
• latch into PC
Cycle 1 Cycle 2 Cycle 3 Cycle 4
Ifetch Reg/Dec Exec Mem
Beq Wr
CS152 / Kubiatowicz
Lec12.51
3/10/99 ©UCB Spring 1999
Control Diagram
IR <- Mem[PC]; PC < PC+4;
A <- R[rs]; B<– R[rt]
S <– A + B;
R[rd] <– S;
S <– A + SX;
M <– Mem[S]
R[rd] <– M;
S <– A or ZX;
R[rt] <– S;
S <– A + SX;
Mem[S] <- B
If Cond PC
< PC+SX;
Exec
Reg.
File
Mem
Acces
s
Data
Mem
A
B
S
Reg
File
Equal
PC
Next
PC
IR
Inst.
Mem
D
M <– S M <– S
M
CS152 / Kubiatowicz
Lec12.52
3/10/99 ©UCB Spring 1999
Datapath + Data Stationary Control
Exec
Reg.
File
Mem
Acces
s
Data
Mem
A
B
S
Reg
File
PC
Next
PC
IR
Inst.
Mem
D
Decode
Mem
Ctrl
WB
Ctrl
M
rs rt
op
rs
rt
fun
im
ex
me
wb
rw
v
me
wb
rw
v
wb
rw
v
CS152 / Kubiatowicz
Lec12.53
3/10/99 ©UCB Spring 1999
Let’s Try it Out
10 lw r1, r2(35)
14 addI r2, r2, 3
20 sub r3, r4, r5
24 beq r6, r7, 100
30 ori r8, r9, 17
34 add r10, r11, r12
100 and r13, r14, 15
these addresses are octal
CS152 / Kubiatowicz
Lec12.54
3/10/99 ©UCB Spring 1999
Start: Fetch 10
Exec
Reg.
File
Mem
Acces
s
Data
Mem
A
B
S
Reg
File
PC
Next
PC
IR
Inst.
Mem
D
Decode
Mem
Ctrl
WB
Ctrl
M
rs rt im
10 lw r1, r2(35)
14 addI r2, r2, 3
20 sub r3, r4, r5
24 beq r6, r7, 100
30 ori r8, r9, 17
34 add r10, r11, r12
100 and r13, r14, 15
n n n n
10
CS152 / Kubiatowicz
Lec12.55
3/10/99 ©UCB Spring 1999
Fetch 14, Decode 10
Exec
Reg.
File
Mem
Acces
s
Data
Mem
A
B
S
Reg
File
PC
Next
PC
IR
Inst.
Mem
D
Decode
Mem
Ctrl
WB
Ctrl
M
2 rt im
10 lw r1, r2(35)
14 addI r2, r2, 3
20 sub r3, r4, r5
24 beq r6, r7, 100
30 ori r8, r9, 17
34 add r10, r11, r12
100 and r13, r14, 15
n n n
14
lw
r1,
r2(35)
CS152 / Kubiatowicz
Lec12.56
3/10/99 ©UCB Spring 1999
Fetch 20, Decode 14, Exec 10
Exec
Reg.
File
Mem
Acces
s
Data
Mem
r2
B
S
Reg
File
PC
Next
PC
IR
Inst.
Mem
D
Decode
Mem
Ctrl
WB
Ctrl
M
2 rt
35
10 lw r1, r2(35)
14 addI r2, r2, 3
20 sub r3, r4, r5
24 beq r6, r7, 100
30 ori r8, r9, 17
34 add r10, r11, r12
100 and r13, r14, 15
n n
20
lw
r1
addI
r2,
r2,
3
CS152 / Kubiatowicz
Lec12.57
3/10/99 ©UCB Spring 1999
Fetch 24, Decode 20, Exec 14, Mem 10
Exec
Reg.
File
Mem
Acces
s
Data
Mem
r2
B
r2+35
Reg
File
PC
Next
PC
IR
Inst.
Mem
D
Decode
Mem
Ctrl
WB
Ctrl
M
4 5
3
10 lw r1, r2(35)
14 addI r2, r2, 3
20 sub r3, r4, r5
24 beq r6, r7, 100
30 ori r8, r9, 17
34 add r10, r11, r12
100 and r13, r14, 15
n
24
lw
r1
sub
r3,
r4,
r5
addI
r2,
r2,
3
CS152 / Kubiatowicz
Lec12.58
3/10/99 ©UCB Spring 1999
Fetch 30, Dcd 24, Ex 20, Mem 14, WB 10
Exec
Reg.
File
Mem
Acces
s
Data
Mem
r4
r5
r2+3
Reg
File
PC
Next
PC
IR
Inst.
Mem
D
Decode
Mem
Ctrl
WB
Ctrl
M[r2+35]
6 7
10 lw r1, r2(35)
14 addI r2, r2, 3
20 sub r3, r4, r5
24 beq r6, r7, 100
30 ori r8, r9, 17
34 add r10, r11, r12
100 and r13, r14, 15
30
lw
r1
beq
r6,
r7
100
addI
r2
sub
r3
CS152 / Kubiatowicz
Lec12.59
3/10/99 ©UCB Spring 1999
Fetch 34, Dcd 30, Ex 24, Mem 20, WB 14
Exec
Reg.
File
Mem
Acces
s
Data
Mem
r6
r7
r2+3
Reg
File
PC
Next
PC
IR
Inst.
Mem
D
Decode
Mem
Ctrl
WB
Ctrl
r1=M[r2+35]
9 xx
10 lw r1, r2(35)
14 addI r2, r2, 3
20 sub r3, r4, r5
24 beq r6, r7, 100
30 ori r8, r9, 17
34 add r10, r11, r12
100 and r13, r14, 15
34
beq
addI
r2
sub
r3
r4-r5
100
ori
r8,
r9
17
CS152 / Kubiatowicz
Lec12.60
3/10/99 ©UCB Spring 1999
Fetch 100, Dcd 34, Ex 30, Mem 24, WB 20
Exec
Reg.
File
Mem
Acces
s
Data
Mem
r9
x
Reg
File
PC
Next
PC
IR
Inst.
Mem
D
Decode
Mem
Ctrl
WB
Ctrl
r1=M[r2+35]
11 12
10 lw r1, r2(35)
14 addI r2, r2, 3
20 sub r3, r4, r5
24 beq r6, r7, 100
30 ori r8, r9, 17
34 add r10, r11, r12
100 and r13, r14, 15
100
beq
r2 = r2+3
sub
r3
r4-r5
17
ori
r8
xxx
add
r10,
r11,
r12
ooops, we should have only one delayed instruction
CS152 / Kubiatowicz
Lec12.61
3/10/99 ©UCB Spring 1999
Fetch 104, Dcd 100, Ex 34, Mem 30, WB 24
Exec
Reg.
File
Mem
Acces
s
Data
Mem
r11
r12
Reg
File
PC
Next
PC
IR
Inst.
Mem
D
Decode
Mem
Ctrl
WB
Ctrl
r1=M[r2+35]
14 15
10 lw r1, r2(35)
14 addI r2, r2, 3
20 sub r3, r4, r5
24 beq r6, r7, 100
30 ori r8, r9, 17
34 add r10, r11, r12
100 and r13, r14, 15
104
beq
r2 = r2+3
r3 = r4-r5
xx
ori
r8
xxx
add
r10
and
r13,
r14,
r15
n
Squash the extra instruction in the branch shadow!
r9
|
17
CS152 / Kubiatowicz
Lec12.62
3/10/99 ©UCB Spring 1999
Fetch 108, Dcd 104, Ex 100, Mem 34, WB 30
Exec
Reg.
File
Mem
Acces
s
Data
Mem
r14
r15
Reg
File
PC
Next
PC
IR
Inst.
Mem
D
Decode
Mem
Ctrl
WB
Ctrl
r1=M[r2+35]
10 lw r1, r2(35)
14 addI r2, r2, 3
20 sub r3, r4, r5
24 beq r6, r7, 100
30 ori r8, r9, 17
34 add r10, r11, r12
100 and r13, r14, 15
110
r2 = r2+3
r3 = r4-r5
xx
ori
r8
add
r10
and
r13
n
Squash the extra instruction in the branch shadow!
r9
|
17
r11+r12
CS152 / Kubiatowicz
Lec12.63
3/10/99 ©UCB Spring 1999
Fetch 114, Dcd 110, Ex 104, Mem 100, WB 34
Exec
Reg.
File
Mem
Acces
s
Data
Mem
Reg
File
PC
Next
PC
IR
Inst.
Mem
D
Decode
Mem
Ctrl
WB
Ctrl
r1=M[r2+35]
10 lw r1, r2(35)
14 addI r2, r2, 3
20 sub r3, r4, r5
24 beq r6, r7, 100
30 ori r8, r9, 17
34 add r10, r11, r12
100 and r13, r14, 15
114
r2 = r2+3
r3 = r4-r5
r8 = r9 | 17
add
r10
and
r13
n
Squash the extra instruction in the branch shadow!
r11+r12
NO WB
NO Ovflow
r14
&
R15
CS152 / Kubiatowicz
Lec12.64
3/10/99 ©UCB Spring 1999
Summary: Pipelining
° What makes it easy
• all instructions are the same length
• just a few instruction formats
• memory operands appear only in loads and stores
° What makes it hard?
• structural hazards: suppose we had only one memory
• control hazards: need to worry about branch instructions
• data hazards: an instruction depends on a previous instruction
° We’ll build a simple pipeline and look at these issues
° We’ll talk about modern processors and what really
makes it hard:
• exception handling
• trying to improve performance with out-of-order execution, etc.
CS152 / Kubiatowicz
Lec12.65
3/10/99 ©UCB Spring 1999
Summary
° Pipelining is a fundamental concept
• multiple steps using distinct resources
° Utilize capabilities of the Datapath by pipelined
instruction processing
• start next instruction while working on the current one
• limited by length of longest stage (plus fill/flush)
• detect and resolve hazards

More Related Content

Similar to lec12-pipelining.ppt

Syllabus 3 month pclr
Syllabus 3 month pclrSyllabus 3 month pclr
Syllabus 3 month pclrchiptroniks
 
Microprocessor and Microcontroller Based Systems.ppt
Microprocessor and Microcontroller Based Systems.pptMicroprocessor and Microcontroller Based Systems.ppt
Microprocessor and Microcontroller Based Systems.pptTALHARIAZ46
 
Syllabus 5 month pclr
Syllabus 5 month pclrSyllabus 5 month pclr
Syllabus 5 month pclrchiptroniks
 
PLC Introduction Details
PLC Introduction DetailsPLC Introduction Details
PLC Introduction Detailssuhaskhadake
 
Dealing with exceptions Computer Architecture part 2
Dealing with exceptions Computer Architecture part 2Dealing with exceptions Computer Architecture part 2
Dealing with exceptions Computer Architecture part 2Gaditek
 
Dealing with Exceptions Computer Architecture part 1
Dealing with Exceptions Computer Architecture part 1Dealing with Exceptions Computer Architecture part 1
Dealing with Exceptions Computer Architecture part 1Gaditek
 
digital camera image capturing technique
digital camera image capturing techniquedigital camera image capturing technique
digital camera image capturing techniqueSindhu Mani
 
BMC: Bare Metal Container @Open Source Summit Japan 2017
BMC: Bare Metal Container @Open Source Summit Japan 2017BMC: Bare Metal Container @Open Source Summit Japan 2017
BMC: Bare Metal Container @Open Source Summit Japan 2017Kuniyasu Suzaki
 
technical report presents a comprehensive study. .pptx
technical report presents a comprehensive study. .pptxtechnical report presents a comprehensive study. .pptx
technical report presents a comprehensive study. .pptxMostafaKhaled78
 
Ch2 embedded processors-i
Ch2 embedded processors-iCh2 embedded processors-i
Ch2 embedded processors-iAnkit Shah
 
The history and development of PLC.pdf
The history and development of PLC.pdfThe history and development of PLC.pdf
The history and development of PLC.pdfAbdurahmanWaleed
 

Similar to lec12-pipelining.ppt (20)

Syllabus 3 month pclr
Syllabus 3 month pclrSyllabus 3 month pclr
Syllabus 3 month pclr
 
Pclr syllabus 3 month
Pclr syllabus  3 monthPclr syllabus  3 month
Pclr syllabus 3 month
 
Microprocessor and Microcontroller Based Systems.ppt
Microprocessor and Microcontroller Based Systems.pptMicroprocessor and Microcontroller Based Systems.ppt
Microprocessor and Microcontroller Based Systems.ppt
 
ECI OpenFlow 2.0 the Future of SDN
ECI OpenFlow 2.0 the Future of SDN ECI OpenFlow 2.0 the Future of SDN
ECI OpenFlow 2.0 the Future of SDN
 
Syllabus 5 month pclr
Syllabus 5 month pclrSyllabus 5 month pclr
Syllabus 5 month pclr
 
chapter 4
chapter 4chapter 4
chapter 4
 
PLC Introduction Details
PLC Introduction DetailsPLC Introduction Details
PLC Introduction Details
 
19th Session.pptx
19th Session.pptx19th Session.pptx
19th Session.pptx
 
Dealing with exceptions Computer Architecture part 2
Dealing with exceptions Computer Architecture part 2Dealing with exceptions Computer Architecture part 2
Dealing with exceptions Computer Architecture part 2
 
Dealing with Exceptions Computer Architecture part 1
Dealing with Exceptions Computer Architecture part 1Dealing with Exceptions Computer Architecture part 1
Dealing with Exceptions Computer Architecture part 1
 
Unit-III.pptx
Unit-III.pptxUnit-III.pptx
Unit-III.pptx
 
digital camera image capturing technique
digital camera image capturing techniquedigital camera image capturing technique
digital camera image capturing technique
 
seminar on PIC1684
seminar on PIC1684seminar on PIC1684
seminar on PIC1684
 
BMC: Bare Metal Container @Open Source Summit Japan 2017
BMC: Bare Metal Container @Open Source Summit Japan 2017BMC: Bare Metal Container @Open Source Summit Japan 2017
BMC: Bare Metal Container @Open Source Summit Japan 2017
 
technical report presents a comprehensive study. .pptx
technical report presents a comprehensive study. .pptxtechnical report presents a comprehensive study. .pptx
technical report presents a comprehensive study. .pptx
 
Pclr syllabus 1 month
Pclr syllabus  1 monthPclr syllabus  1 month
Pclr syllabus 1 month
 
Ch2 embedded processors-i
Ch2 embedded processors-iCh2 embedded processors-i
Ch2 embedded processors-i
 
Debate on RISC-CISC
Debate on RISC-CISCDebate on RISC-CISC
Debate on RISC-CISC
 
The history and development of PLC.pdf
The history and development of PLC.pdfThe history and development of PLC.pdf
The history and development of PLC.pdf
 
UNIT-III ES.ppt
UNIT-III ES.pptUNIT-III ES.ppt
UNIT-III ES.ppt
 

Recently uploaded

Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learningmisbanausheenparvam
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Analog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAnalog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAbhinavSharma374939
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝soniya singh
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSCAESB
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidNikhilNagaraju
 

Recently uploaded (20)

Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learning
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
Analog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAnalog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog Converter
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfid
 

lec12-pipelining.ppt

  • 1. CS152 / Kubiatowicz Lec12.1 3/10/99 ©UCB Spring 1999 CS152 Computer Architecture and Engineering Lecture 12 Introduction to Pipelining Mar 10, 1999 John Kubiatowicz (http.cs.berkeley.edu/~kubitron) lecture slides: http://www-inst.eecs.berkeley.edu/~cs152/
  • 2. CS152 / Kubiatowicz Lec12.2 3/10/99 ©UCB Spring 1999 Recap: Microprogramming ° Microprogramming is a convenient method for implementing structured control state diagrams: • Random logic replaced by microPC sequencer and ROM • Each line of ROM called a instruction: contains sequencer control + values for control points • limited state transitions: branch to zero, next sequential, branch to instruction address from displatch ROM ° Horizontal Code: one control bit in Instruction for every control line in datapath ° Vertical Code: groups of control-lines coded together in Instruction (e.g. possible ALU dest) ° Control design reduces to Microprogramming • Part of the design process is to develop a “language” that describes control and is easy for humans to understand
  • 3. CS152 / Kubiatowicz Lec12.3 3/10/99 ©UCB Spring 1999 Recap: Microprogramming ° Microprogramming is a fundamental concept • implement an instruction set by building a very simple processor and interpreting the instructions • essential for very complex instructions and when few register transfers are possible • overkill when ISA matches datapath 1-1 sequencer control datapath control micro-PC -sequencer: fetch,dispatch, sequential microinstruction () Dispatch ROM Opcode -Code ROM Decode Decode To DataPath Decoders implement our -code language: For instance: rt-ALU rd-ALU mem-ALU
  • 4. CS152 / Kubiatowicz Lec12.4 3/10/99 ©UCB Spring 1999 Recap: Exceptions ° Exception = unprogrammed control transfer • system takes action to handle the exception - must record the address of the offending instruction - record any other information necessary to return afterwards • returns control to user • must save & restore user state ° Allows constuction of a “user virtual machine” normal control flow: sequential, jumps, branches, calls, returns user program System Exception Handler Exception: return from exception
  • 5. CS152 / Kubiatowicz Lec12.5 3/10/99 ©UCB Spring 1999 Recap: Precise Exceptions ° Precise  state of the machine is preserved as if program executed up to the offending instruction • All previous instructions completed • Offending instruction and all following instructions act as if they have not even started • MIPS takes this position ° Imprecise  system software has to figure out what is where and put it all back together • Precise exceptions particularly important for virtual memory, since we may need to quickly schedule another user • Precise exceptions also important for frequent interrupts, where need low-overhead restart ° Performance goals often lead designers to forsake precise interrupts • system software developers, user, markets etc. usually wish they had not done this ° Modern techniques for out-of-order execution and branch prediction help implement precise interrupts
  • 6. CS152 / Kubiatowicz Lec12.6 3/10/99 ©UCB Spring 1999 Recap: Two Types of Exceptions ° Interrupts • caused by external events: - Network, Keyboard, Disk I/O, Timer • asynchronous to program execution - Most interrupts can be disabled for brief periods of time - Some (like “Power Failing”) are non-maskable (NMI) ° Traps • caused by internal events - exceptional conditions (overflow) - errors (parity) - faults (non-resident page) • synchronous to program execution • condition must be remedied by the handler • instruction may be retried or simulated and program continued or program may be aborted
  • 7. CS152 / Kubiatowicz Lec12.7 3/10/99 ©UCB Spring 1999 Example: How Control Handles Traps in our FSD ° Undefined Instruction–detected when no next state is defined from state 1 for the op value. • We handle this exception by defining the next state value for all op values other than lw, sw, 0 (R-type), jmp, beq, and ori as new state 12. • Shown symbolically using “other” to indicate that the op field does not match any of the opcodes that label arcs out of state 1. ° Arithmetic overflow–detected on ALU ops such as signed add • Used to save PC and enter exception handler ° External Interrupt – flagged by asserted interrupt line • Again, must save PC and enter exception handler ° Note: Challenge in designing control of a real machine is to handle different interactions between instructions and other exception-causing events such that control logic remains small and fast. • Complex interactions makes the control unit the most challenging aspect of hardware design
  • 8. CS152 / Kubiatowicz Lec12.8 3/10/99 ©UCB Spring 1999 How add traps and interrupts to state diagram? IR <= MEM[PC] PC <= PC + 4 R-type S <= A fun B R[rd] <= S S <= A op ZX R[rt] <= S ORi S <= A + SX R[rt] <= M M <= MEM[S] LW S <= A + SX MEM[S] <= B SW “instruction fetch” “decode” 0000 0001 0100 0101 0110 0111 1000 1001 1010 1011 1100 BEQ 0010 If A = B then PC <= S S<= PC +SX undefined instruction EPC <= PC - 4 PC <= exp_addr cause <= 10 (RI) other S <= A - B overflow EPC <= PC - 4 PC <= exp_addr cause <= 12 (Ovf) EPC <= PC - 4 PC <= exp_addr cause <= 0(INT) Handle Interrupt Pending INT
  • 9. CS152 / Kubiatowicz Lec12.9 3/10/99 ©UCB Spring 1999 But: What has to change in our -sequencer? ° Need concept of branch at micro-code level R-type S <= A fun B 0100 overflow EPC <= PC - 4 PC <= exp_addr cause <= 12 (Ovf) µAddress Select Logic Opcode microPC 1 Adder Dispatch ROM Mux 0 0 1 2 Mux Mux 0 1 overflow pending interrupt 4? N? Cond Select Do -branch -offset Seq Select
  • 10. CS152 / Kubiatowicz Lec12.10 3/10/99 ©UCB Spring 1999 Example: Can easily use with for non-ideal memory IR <= MEM[PC] R-type A <= R[rs] B <= R[rt] S <= A fun B R[rd] <= S PC <= PC + 4 S <= A or ZX R[rt] <= S PC <= PC + 4 ORi S <= A + SX R[rt] <= M PC <= PC + 4 M <= MEM[S] LW S <= A + SX MEM[S] <= B BEQ PC <= Next(PC) SW “instruction fetch” “decode / operand fetch” Execute Memory Write-back ~wait wait ~wait wait PC <= PC + 4 ~wait wait
  • 11. CS152 / Kubiatowicz Lec12.11 3/10/99 ©UCB Spring 1999 Summary: Microprogramming one inspiration for RISC ° If simple instruction could execute at very high clock rate… ° If you could even write compilers to produce microinstructions… ° If most programs use simple instructions and addressing modes… ° If microcode is kept in RAM instead of ROM so as to fix bugs … ° If same memory used for control memory could be used instead as cache for “macroinstructions”… ° Then why not skip instruction interpretation by a microprogram and simply compile directly into lowest language of machine? (microprogramming is overkill when ISA matches datapath 1-1)
  • 12. CS152 / Kubiatowicz Lec12.12 3/10/99 ©UCB Spring 1999 The Big Picture: Where are We Now? ° The Five Classic Components of a Computer ° Next Topics: • Pipelining by Analogy • Administrivia; Course road map Control Datapath Memory Processor Input Output
  • 13. CS152 / Kubiatowicz Lec12.13 3/10/99 ©UCB Spring 1999 Pipelining is Natural! ° Laundry Example ° Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold ° Washer takes 30 minutes ° Dryer takes 40 minutes ° “Folder” takes 20 minutes A B C D
  • 14. CS152 / Kubiatowicz Lec12.14 3/10/99 ©UCB Spring 1999 Sequential Laundry ° Sequential laundry takes 6 hours for 4 loads ° If they learned pipelining, how long would laundry take? A B C D 30 40 20 30 40 20 30 40 20 30 40 20 6 PM 7 8 9 10 11 Midnight T a s k O r d e r Time
  • 15. CS152 / Kubiatowicz Lec12.15 3/10/99 ©UCB Spring 1999 Pipelined Laundry: Start work ASAP ° Pipelined laundry takes 3.5 hours for 4 loads A B C D 6 PM 7 8 9 10 11 Midnight T a s k O r d e r Time 30 40 40 40 40 20
  • 16. CS152 / Kubiatowicz Lec12.16 3/10/99 ©UCB Spring 1999 Pipelining Lessons ° Pipelining doesn’t help latency of single task, it helps throughput of entire workload ° Pipeline rate limited by slowest pipeline stage ° Multiple tasks operating simultaneously using different resources ° Potential speedup = Number pipe stages ° Unbalanced lengths of pipe stages reduces speedup ° Time to “fill” pipeline and time to “drain” it reduces speedup ° Stall for Dependences A B C D 6 PM 7 8 9 T a s k O r d e r Time 30 40 40 40 40 20
  • 17. CS152 / Kubiatowicz Lec12.17 3/10/99 ©UCB Spring 1999 Administrative Issues °Computers in the news: •Intel decided to settle out of court •FTC had much narrower case with Intel than with Microsoft •Intel choose quieter solution still stinging from bad publicity due to Divide bug. °Moving on to Chapter 6 °Next weeksections in Cory Lab
  • 18. CS152 / Kubiatowicz Lec12.18 3/10/99 ©UCB Spring 1999 The Five Stages of Load ° Ifetch: Instruction Fetch • Fetch the instruction from the Instruction Memory ° Reg/Dec: Registers Fetch and Instruction Decode ° Exec: Calculate the memory address ° Mem: Read the data from the Data Memory ° Wr: Write the data back to the register file Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Ifetch Reg/Dec Exec Mem Wr Load
  • 19. CS152 / Kubiatowicz Lec12.19 3/10/99 ©UCB Spring 1999 Note: These 5 stages were there all along! IR <= MEM[PC] PC <= PC + 4 R-type ALUout <= A fun B R[rd] <= ALUout ALUout <= A op ZX R[rt] <= ALUout ORi ALUout <= A + SX R[rt] <= M M <= MEM[ALUout] LW ALUout <= A + SX MEM[ALUout] <= B SW 0000 0001 0100 0101 0110 0111 1000 1001 1010 1011 1100 BEQ 0010 If A = B then PC <= ALUout ALUout <= PC +SX Execute Memory Write-back Decode Fetch
  • 20. CS152 / Kubiatowicz Lec12.20 3/10/99 ©UCB Spring 1999 Pipelining ° Improve performance by increasing throughput Ideal speedup is number of stages in the pipeline. Do we achieve this?
  • 21. CS152 / Kubiatowicz Lec12.21 3/10/99 ©UCB Spring 1999 Basic Idea ° What do we need to add to split the datapath into stages?
  • 22. CS152 / Kubiatowicz Lec12.22 3/10/99 ©UCB Spring 1999 Graphically Representing Pipelines ° Can help with answering questions like: • how many cycles does it take to execute this code? • what is the ALU doing during cycle 4? • use this representation to help understand datapaths
  • 23. CS152 / Kubiatowicz Lec12.23 3/10/99 ©UCB Spring 1999 Conventional Pipelined Execution Representation IFetch Dcd Exec Mem WB IFetch Dcd Exec Mem WB IFetch Dcd Exec Mem WB IFetch Dcd Exec Mem WB IFetch Dcd Exec Mem WB IFetch Dcd Exec Mem WB Program Flow Time
  • 24. CS152 / Kubiatowicz Lec12.24 3/10/99 ©UCB Spring 1999 Single Cycle, Multiple Cycle, vs. Pipeline Clk Cycle 1 Multiple Cycle Implementation: Ifetch Reg Exec Mem Wr Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle 10 Load Ifetch Reg Exec Mem Wr Ifetch Reg Exec Mem Load Store Pipeline Implementation: Ifetch Reg Exec Mem Wr Store Clk Single Cycle Implementation: Load Store Waste Ifetch R-type Ifetch Reg Exec Mem Wr R-type Cycle 1 Cycle 2
  • 25. CS152 / Kubiatowicz Lec12.25 3/10/99 ©UCB Spring 1999 Why Pipeline? ° Suppose we execute 100 instructions ° Single Cycle Machine • 45 ns/cycle x 1 CPI x 100 inst = 4500 ns ° Multicycle Machine • 10 ns/cycle x 4.6 CPI (due to inst mix) x 100 inst = 4600 ns ° Ideal pipelined machine • 10 ns/cycle x (1 CPI x 100 inst + 4 cycle drain) = 1040 ns
  • 26. CS152 / Kubiatowicz Lec12.26 3/10/99 ©UCB Spring 1999 Why Pipeline? Because the resources are there! I n s t r. O r d e r Time (clock cycles) Inst 0 Inst 1 Inst 2 Inst 4 Inst 3 ALU Im Reg Dm Reg ALU Im Reg Dm Reg ALU Im Reg Dm Reg ALU Im Reg Dm Reg ALU Im Reg Dm Reg
  • 27. CS152 / Kubiatowicz Lec12.27 3/10/99 ©UCB Spring 1999 Can pipelining get us into trouble? ° Yes: Pipeline Hazards • structural hazards: attempt to use the same resource two different ways at the same time - E.g., combined washer/dryer would be a structural hazard or folder busy doing something else (watching TV) • data hazards: attempt to use item before it is ready - E.g., one sock of pair in dryer and one in washer; can’t fold until get sock from washer through dryer - instruction depends on result of prior instruction still in the pipeline • control hazards: attempt to make a decision before condition is evaulated - E.g., washing football uniforms and need to get proper detergent level; need to see after dryer before next load in - branch instructions ° Can always resolve hazards by waiting • pipeline control must detect the hazard • take action (or delay action) to resolve hazards
  • 28. CS152 / Kubiatowicz Lec12.28 3/10/99 ©UCB Spring 1999 Mem Single Memory is a Structural Hazard I n s t r. O r d e r Time (clock cycles) Load Instr 1 Instr 2 Instr 3 Instr 4 ALU Mem Reg Mem Reg ALU Mem Reg Mem Reg ALU Mem Reg Mem Reg ALU Reg Mem Reg ALU Mem Reg Mem Reg Detection is easy in this case! (right half highlight means read, left half write)
  • 29. CS152 / Kubiatowicz Lec12.29 3/10/99 ©UCB Spring 1999 Structural Hazards limit performance ° Example: if 1.3 memory accesses per instruction and only one memory access per cycle then • average CPI  1.3 • otherwise resource is more than 100% utilized
  • 30. CS152 / Kubiatowicz Lec12.30 3/10/99 ©UCB Spring 1999 ° Stall: wait until decision is clear ° Impact: 2 lost cycles (i.e. 3 clock cycles per branch instruction) => slow ° Move decision to end of decode • save 1 cycle per branch Control Hazard Solutions I n s t r. O r d e r Time (clock cycles) Add Beq Load ALU Mem Reg Mem Reg ALU Mem Reg Mem Reg ALU Reg Mem Reg Mem Lost potential
  • 31. CS152 / Kubiatowicz Lec12.31 3/10/99 ©UCB Spring 1999 ° Predict: guess one direction then back up if wrong • Predict not taken ° Impact: 0 lost cycles per branch instruction if right, 1 if wrong (right 50% of time) • Need to “Squash” and restart following instruction if wrong • Produce CPI on branch of (1 *.5 + 2 * .5) = 1.5 • Total CPI might then be: 1.5 * .2 + 1 * .8 = 1.1 (20% branch) ° More dynamic scheme: history of 1 branch ( 90%) Control Hazard Solutions I n s t r. O r d e r Time (clock cycles) Add Beq Load ALU Mem Reg Mem Reg ALU Mem Reg Mem Reg Mem ALU Reg Mem Reg
  • 32. CS152 / Kubiatowicz Lec12.32 3/10/99 ©UCB Spring 1999 ° Redefine branch behavior (takes place after next instruction) “delayed branch” ° Impact: 0 clock cycles per branch instruction if can find instruction to put in “slot” ( 50% of time) ° As launch more instruction per clock cycle, less useful Control Hazard Solutions I n s t r. O r d e r Time (clock cycles) Add Beq Misc ALU Mem Reg Mem Reg ALU Mem Reg Mem Reg Mem ALU Reg Mem Reg Load Mem ALU Reg Mem Reg
  • 33. CS152 / Kubiatowicz Lec12.33 3/10/99 ©UCB Spring 1999 Data Hazard on r1 add r1 ,r2,r3 sub r4, r1 ,r3 and r6, r1 ,r7 or r8, r1 ,r9 xor r10, r1 ,r11
  • 34. CS152 / Kubiatowicz Lec12.34 3/10/99 ©UCB Spring 1999 • Dependencies backwards in time are hazards Data Hazard on r1: I n s t r. O r d e r Time (clock cycles) add r1,r2,r3 sub r4,r1,r3 and r6,r1,r7 or r8,r1,r9 xor r10,r1,r11 I F ID/R F E X ME M W B ALU Im Reg Dm Reg ALU Im Reg Dm Reg ALU Im Reg Dm Reg Im ALU Reg Dm Reg ALU Im Reg Dm Reg
  • 35. CS152 / Kubiatowicz Lec12.35 3/10/99 ©UCB Spring 1999 • “Forward” result from one stage to another • “or” OK if define read/write properly Data Hazard Solution: I n s t r. O r d e r Time (clock cycles) add r1,r2,r3 sub r4,r1,r3 and r6,r1,r7 or r8,r1,r9 xor r10,r1,r11 I F ID/R F E X ME M W B ALU Im Reg Dm Reg ALU Im Reg Dm Reg ALU Im Reg Dm Reg Im ALU Reg Dm Reg ALU Im Reg Dm Reg
  • 36. CS152 / Kubiatowicz Lec12.36 3/10/99 ©UCB Spring 1999 Reg • Dependencies backwards in time are hazards • Can’t solve with forwarding: • Must delay/stall instruction dependent on loads Forwarding (or Bypassing): What about Loads Time (clock cycles) lw r1,0(r2) sub r4,r1,r3 I F ID/R F E X ME M W B ALU Im Reg Dm ALU Im Reg Dm Reg
  • 37. CS152 / Kubiatowicz Lec12.37 3/10/99 ©UCB Spring 1999 Reg • Dependencies backwards in time are hazards • Can’t solve with forwarding: • Must delay/stall instruction dependent on loads Forwarding (or Bypassing): What about Loads Time (clock cycles) lw r1,0(r2) sub r4,r1,r3 I F ID/R F E X ME M W B ALU Im Reg Dm ALU Im Reg Dm Reg Stall
  • 38. CS152 / Kubiatowicz Lec12.38 3/10/99 ©UCB Spring 1999 Designing a Pipelined Processor ° Go back and examine your datapath and control diagram ° associated resources with states ° ensure that flows do not conflict, or figure out how to resolve ° assert control in appropriate stage
  • 39. CS152 / Kubiatowicz Lec12.39 3/10/99 ©UCB Spring 1999 Control and Datapath: Split state diag into 5 pieces IR <- Mem[PC]; PC <– PC+4; A <- R[rs]; B<– R[rt] S <– A + B; R[rd] <– S; S <– A + SX; M <– Mem[S] R[rd] <– M; S <– A or ZX; R[rt] <– S; S <– A + SX; Mem[S] <- B If Cond PC < PC+SX; Exec Reg. File Mem Acces s Data Mem A B S Reg File Equal PC Next PC IR Inst. Mem D M
  • 40. CS152 / Kubiatowicz Lec12.40 3/10/99 ©UCB Spring 1999 Pipelined Processor (almost) for slides ° What happens if we start a new instruction every cycle? Exec Reg. File Mem Acces s Data Mem A B S M Reg File Equal PC Next PC IR Inst. Mem Valid IRex Dcd Ctrl IRmem Ex Ctrl IRwb Mem Ctrl WB Ctrl D
  • 41. CS152 / Kubiatowicz Lec12.41 3/10/99 ©UCB Spring 1999 Pipelined Datapath (as in book); hard to read
  • 42. CS152 / Kubiatowicz Lec12.42 3/10/99 ©UCB Spring 1999 Pipelining the Load Instruction ° The five independent functional units in the pipeline datapath are: • Instruction Memory for the Ifetch stage • Register File’s Read ports (bus A and busB) for the Reg/Dec stage • ALU for the Exec stage • Data Memory for the Mem stage • Register File’s Write port (bus W) for the Wr stage Clock Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Ifetch Reg/Dec Exec Mem Wr 1st lw Ifetch Reg/Dec Exec Mem Wr 2nd lw Ifetch Reg/Dec Exec Mem Wr 3rd lw
  • 43. CS152 / Kubiatowicz Lec12.43 3/10/99 ©UCB Spring 1999 The Four Stages of R-type ° Ifetch: Instruction Fetch • Fetch the instruction from the Instruction Memory ° Reg/Dec: Registers Fetch and Instruction Decode ° Exec: • ALU operates on the two register operands • Update PC ° Wr: Write the ALU output back to the register file Cycle 1 Cycle 2 Cycle 3 Cycle 4 Ifetch Reg/Dec Exec Wr R-type
  • 44. CS152 / Kubiatowicz Lec12.44 3/10/99 ©UCB Spring 1999 Pipelining the R-type and Load Instruction ° We have pipeline conflict or structural hazard: • Two instructions try to write to the register file at the same time! • Only one write port Clock Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Ifetch Reg/Dec Exec Wr R-type Ifetch Reg/Dec Exec Wr R-type Ifetch Reg/Dec Exec Mem Wr Load Ifetch Reg/Dec Exec Wr R-type Ifetch Reg/Dec Exec Wr R-type Ops! We have a problem!
  • 45. CS152 / Kubiatowicz Lec12.45 3/10/99 ©UCB Spring 1999 Important Observation ° Each functional unit can only be used once per instruction ° Each functional unit must be used at the same stage for all instructions: • Load uses Register File’s Write Port during its 5th stage • R-type uses Register File’s Write Port during its 4th stage Ifetch Reg/Dec Exec Mem Wr Load 1 2 3 4 5 Ifetch Reg/Dec Exec Wr R-type 1 2 3 4 ° 2 ways to solve this pipeline hazard.
  • 46. CS152 / Kubiatowicz Lec12.46 3/10/99 ©UCB Spring 1999 Solution 1: Insert “Bubble” into the Pipeline ° Insert a “bubble” into the pipeline to prevent 2 writes at the same cycle • The control logic can be complex. • Lose instruction fetch and issue opportunity. ° No instruction is started in Cycle 6! Clock Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Ifetch Reg/Dec Exec Wr R-type Ifetch Reg/Dec Exec Ifetch Reg/Dec Exec Mem Wr Load Ifetch Reg/Dec Exec Wr R-type Ifetch Reg/Dec Exec Wr R-type Pipeline Bubble Ifetch Reg/Dec Exec Wr
  • 47. CS152 / Kubiatowicz Lec12.47 3/10/99 ©UCB Spring 1999 Solution 2: Delay R-type’s Write by One Cycle ° Delay R-type’s register write by one cycle: • Now R-type instructions also use Reg File’s write port at Stage 5 • Mem stage is a NOOP stage: nothing is being done. Clock Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Ifetch Reg/Dec Mem Wr R-type Ifetch Reg/Dec Mem Wr R-type Ifetch Reg/Dec Exec Mem Wr Load Ifetch Reg/Dec Mem Wr R-type Ifetch Reg/Dec Mem Wr R-type Ifetch Reg/Dec Exec Wr R-type Mem Exec Exec Exec Exec 1 2 3 4 5
  • 48. CS152 / Kubiatowicz Lec12.48 3/10/99 ©UCB Spring 1999 Modified Control & Datapath IR <- Mem[PC]; PC <– PC+4; A <- R[rs]; B<– R[rt] S <– A + B; R[rd] <– M; S <– A + SX; M <– Mem[S] R[rd] <– M; S <– A or ZX; R[rt] <– M; S <– A + SX; Mem[S] <- B if Cond PC < PC+SX; M <– S Exec Reg. File Mem Acces s Data Mem A B S Reg File Equal PC Next PC IR Inst. Mem D M M <– S
  • 49. CS152 / Kubiatowicz Lec12.49 3/10/99 ©UCB Spring 1999 The Four Stages of Store ° Ifetch: Instruction Fetch • Fetch the instruction from the Instruction Memory ° Reg/Dec: Registers Fetch and Instruction Decode ° Exec: Calculate the memory address ° Mem: Write the data into the Data Memory Cycle 1 Cycle 2 Cycle 3 Cycle 4 Ifetch Reg/Dec Exec Mem Store Wr
  • 50. CS152 / Kubiatowicz Lec12.50 3/10/99 ©UCB Spring 1999 The Three Stages of Beq ° Ifetch: Instruction Fetch • Fetch the instruction from the Instruction Memory ° Reg/Dec: • Registers Fetch and Instruction Decode ° Exec: • compares the two register operand, • select correct branch target address • latch into PC Cycle 1 Cycle 2 Cycle 3 Cycle 4 Ifetch Reg/Dec Exec Mem Beq Wr
  • 51. CS152 / Kubiatowicz Lec12.51 3/10/99 ©UCB Spring 1999 Control Diagram IR <- Mem[PC]; PC < PC+4; A <- R[rs]; B<– R[rt] S <– A + B; R[rd] <– S; S <– A + SX; M <– Mem[S] R[rd] <– M; S <– A or ZX; R[rt] <– S; S <– A + SX; Mem[S] <- B If Cond PC < PC+SX; Exec Reg. File Mem Acces s Data Mem A B S Reg File Equal PC Next PC IR Inst. Mem D M <– S M <– S M
  • 52. CS152 / Kubiatowicz Lec12.52 3/10/99 ©UCB Spring 1999 Datapath + Data Stationary Control Exec Reg. File Mem Acces s Data Mem A B S Reg File PC Next PC IR Inst. Mem D Decode Mem Ctrl WB Ctrl M rs rt op rs rt fun im ex me wb rw v me wb rw v wb rw v
  • 53. CS152 / Kubiatowicz Lec12.53 3/10/99 ©UCB Spring 1999 Let’s Try it Out 10 lw r1, r2(35) 14 addI r2, r2, 3 20 sub r3, r4, r5 24 beq r6, r7, 100 30 ori r8, r9, 17 34 add r10, r11, r12 100 and r13, r14, 15 these addresses are octal
  • 54. CS152 / Kubiatowicz Lec12.54 3/10/99 ©UCB Spring 1999 Start: Fetch 10 Exec Reg. File Mem Acces s Data Mem A B S Reg File PC Next PC IR Inst. Mem D Decode Mem Ctrl WB Ctrl M rs rt im 10 lw r1, r2(35) 14 addI r2, r2, 3 20 sub r3, r4, r5 24 beq r6, r7, 100 30 ori r8, r9, 17 34 add r10, r11, r12 100 and r13, r14, 15 n n n n 10
  • 55. CS152 / Kubiatowicz Lec12.55 3/10/99 ©UCB Spring 1999 Fetch 14, Decode 10 Exec Reg. File Mem Acces s Data Mem A B S Reg File PC Next PC IR Inst. Mem D Decode Mem Ctrl WB Ctrl M 2 rt im 10 lw r1, r2(35) 14 addI r2, r2, 3 20 sub r3, r4, r5 24 beq r6, r7, 100 30 ori r8, r9, 17 34 add r10, r11, r12 100 and r13, r14, 15 n n n 14 lw r1, r2(35)
  • 56. CS152 / Kubiatowicz Lec12.56 3/10/99 ©UCB Spring 1999 Fetch 20, Decode 14, Exec 10 Exec Reg. File Mem Acces s Data Mem r2 B S Reg File PC Next PC IR Inst. Mem D Decode Mem Ctrl WB Ctrl M 2 rt 35 10 lw r1, r2(35) 14 addI r2, r2, 3 20 sub r3, r4, r5 24 beq r6, r7, 100 30 ori r8, r9, 17 34 add r10, r11, r12 100 and r13, r14, 15 n n 20 lw r1 addI r2, r2, 3
  • 57. CS152 / Kubiatowicz Lec12.57 3/10/99 ©UCB Spring 1999 Fetch 24, Decode 20, Exec 14, Mem 10 Exec Reg. File Mem Acces s Data Mem r2 B r2+35 Reg File PC Next PC IR Inst. Mem D Decode Mem Ctrl WB Ctrl M 4 5 3 10 lw r1, r2(35) 14 addI r2, r2, 3 20 sub r3, r4, r5 24 beq r6, r7, 100 30 ori r8, r9, 17 34 add r10, r11, r12 100 and r13, r14, 15 n 24 lw r1 sub r3, r4, r5 addI r2, r2, 3
  • 58. CS152 / Kubiatowicz Lec12.58 3/10/99 ©UCB Spring 1999 Fetch 30, Dcd 24, Ex 20, Mem 14, WB 10 Exec Reg. File Mem Acces s Data Mem r4 r5 r2+3 Reg File PC Next PC IR Inst. Mem D Decode Mem Ctrl WB Ctrl M[r2+35] 6 7 10 lw r1, r2(35) 14 addI r2, r2, 3 20 sub r3, r4, r5 24 beq r6, r7, 100 30 ori r8, r9, 17 34 add r10, r11, r12 100 and r13, r14, 15 30 lw r1 beq r6, r7 100 addI r2 sub r3
  • 59. CS152 / Kubiatowicz Lec12.59 3/10/99 ©UCB Spring 1999 Fetch 34, Dcd 30, Ex 24, Mem 20, WB 14 Exec Reg. File Mem Acces s Data Mem r6 r7 r2+3 Reg File PC Next PC IR Inst. Mem D Decode Mem Ctrl WB Ctrl r1=M[r2+35] 9 xx 10 lw r1, r2(35) 14 addI r2, r2, 3 20 sub r3, r4, r5 24 beq r6, r7, 100 30 ori r8, r9, 17 34 add r10, r11, r12 100 and r13, r14, 15 34 beq addI r2 sub r3 r4-r5 100 ori r8, r9 17
  • 60. CS152 / Kubiatowicz Lec12.60 3/10/99 ©UCB Spring 1999 Fetch 100, Dcd 34, Ex 30, Mem 24, WB 20 Exec Reg. File Mem Acces s Data Mem r9 x Reg File PC Next PC IR Inst. Mem D Decode Mem Ctrl WB Ctrl r1=M[r2+35] 11 12 10 lw r1, r2(35) 14 addI r2, r2, 3 20 sub r3, r4, r5 24 beq r6, r7, 100 30 ori r8, r9, 17 34 add r10, r11, r12 100 and r13, r14, 15 100 beq r2 = r2+3 sub r3 r4-r5 17 ori r8 xxx add r10, r11, r12 ooops, we should have only one delayed instruction
  • 61. CS152 / Kubiatowicz Lec12.61 3/10/99 ©UCB Spring 1999 Fetch 104, Dcd 100, Ex 34, Mem 30, WB 24 Exec Reg. File Mem Acces s Data Mem r11 r12 Reg File PC Next PC IR Inst. Mem D Decode Mem Ctrl WB Ctrl r1=M[r2+35] 14 15 10 lw r1, r2(35) 14 addI r2, r2, 3 20 sub r3, r4, r5 24 beq r6, r7, 100 30 ori r8, r9, 17 34 add r10, r11, r12 100 and r13, r14, 15 104 beq r2 = r2+3 r3 = r4-r5 xx ori r8 xxx add r10 and r13, r14, r15 n Squash the extra instruction in the branch shadow! r9 | 17
  • 62. CS152 / Kubiatowicz Lec12.62 3/10/99 ©UCB Spring 1999 Fetch 108, Dcd 104, Ex 100, Mem 34, WB 30 Exec Reg. File Mem Acces s Data Mem r14 r15 Reg File PC Next PC IR Inst. Mem D Decode Mem Ctrl WB Ctrl r1=M[r2+35] 10 lw r1, r2(35) 14 addI r2, r2, 3 20 sub r3, r4, r5 24 beq r6, r7, 100 30 ori r8, r9, 17 34 add r10, r11, r12 100 and r13, r14, 15 110 r2 = r2+3 r3 = r4-r5 xx ori r8 add r10 and r13 n Squash the extra instruction in the branch shadow! r9 | 17 r11+r12
  • 63. CS152 / Kubiatowicz Lec12.63 3/10/99 ©UCB Spring 1999 Fetch 114, Dcd 110, Ex 104, Mem 100, WB 34 Exec Reg. File Mem Acces s Data Mem Reg File PC Next PC IR Inst. Mem D Decode Mem Ctrl WB Ctrl r1=M[r2+35] 10 lw r1, r2(35) 14 addI r2, r2, 3 20 sub r3, r4, r5 24 beq r6, r7, 100 30 ori r8, r9, 17 34 add r10, r11, r12 100 and r13, r14, 15 114 r2 = r2+3 r3 = r4-r5 r8 = r9 | 17 add r10 and r13 n Squash the extra instruction in the branch shadow! r11+r12 NO WB NO Ovflow r14 & R15
  • 64. CS152 / Kubiatowicz Lec12.64 3/10/99 ©UCB Spring 1999 Summary: Pipelining ° What makes it easy • all instructions are the same length • just a few instruction formats • memory operands appear only in loads and stores ° What makes it hard? • structural hazards: suppose we had only one memory • control hazards: need to worry about branch instructions • data hazards: an instruction depends on a previous instruction ° We’ll build a simple pipeline and look at these issues ° We’ll talk about modern processors and what really makes it hard: • exception handling • trying to improve performance with out-of-order execution, etc.
  • 65. CS152 / Kubiatowicz Lec12.65 3/10/99 ©UCB Spring 1999 Summary ° Pipelining is a fundamental concept • multiple steps using distinct resources ° Utilize capabilities of the Datapath by pipelined instruction processing • start next instruction while working on the current one • limited by length of longest stage (plus fill/flush) • detect and resolve hazards