2. Overview
Pipelining is widely used in modern processors.
Pipelining improves system performance in terms of throughput (the amount of work completed in a given time).
3. Basic Concepts
The ability of a processor to perform more than one task in the same time period is called pipelining.
6. Pipelining
Pipelining is the technique of decomposing a sequential process into suboperations, with each suboperation completed in a dedicated segment.
A pipeline is commonly compared to an assembly-line operation.
7. Traditional Pipeline Concept
Laundry example: Ann, Brian, Cathy, and Dave each have one load of clothes to wash, dry, and fold.
The washer takes 30 minutes, the dryer takes 40 minutes, and the "folder" takes 20 minutes.
8. Traditional Pipeline Concept
[Figure: sequential laundry timeline from 6 PM to midnight; loads A-D each run washer (30), dryer (40), and folder (20) back to back.]
Sequential laundry takes 6 hours for 4 loads.
If they learned pipelining, how long would the laundry take?
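The laundry arithmetic can be checked with a short calculation (a sketch of the example above; the variable names are my own):

```python
# Compare sequential vs. pipelined laundry time for the 4 loads in the
# example: washer 30 min, dryer 40 min, folder 20 min.
stages = [30, 40, 20]   # washer, dryer, folder durations in minutes
loads = 4

# Sequential: each load runs all three stages before the next load starts.
sequential = loads * sum(stages)            # 4 * 90 = 360 min = 6 hours

# Pipelined: after the first load works through all stages, one load
# finishes per slowest stage (the 40-minute dryer).
pipelined = sum(stages) + (loads - 1) * max(stages)   # 90 + 3*40 = 210 min

print(sequential, pipelined)  # 360 210
```

So pipelined laundry finishes in 3.5 hours instead of 6, which is the point the next slide's figure makes.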
10. Traditional Pipeline Concept
Pipelining does not reduce the latency of a single task; it improves the throughput of the entire workload.
The pipeline rate is limited by the slowest pipeline stage.
Multiple tasks operate simultaneously using different resources.
Unbalanced pipeline-stage lengths reduce the speedup.
The time to "fill" the pipeline and the time to "drain" it also reduce the speedup.
[Figure: pipelined laundry timeline (task order A-D on the vertical axis, time from 6 PM on the horizontal axis); the loads overlap, and after the first wash a load finishes every 40 minutes.]
11. Idea of Pipelining in a Computer
The processor executes a program by fetching and executing instructions, one after the other.
Let Fi and Ei refer to the fetch and execute steps for instruction Ii.
12. Use the Idea of Pipelining in a Computer
[Figure: basic idea of instruction pipelining. (a) Sequential execution: F1 E1 F2 E2 F3 E3 for instructions I1-I3. (b) Hardware organization: an instruction fetch unit and an execution unit connected by interstage buffer B1. (c) Pipelined execution (fetch + execution): the fetch of each instruction overlaps with the execution of the previous one, so clock cycles 1-4 complete all three instructions.]
13. Contd.
Consider a computer that has two separate hardware units, one for fetching instructions and another for executing them.
The instruction fetched by the fetch unit is deposited in an intermediate buffer, B1.
This buffer is needed so that the execution unit can work on one instruction while the fetch unit fetches the next instruction.
14. The computer is controlled by a clock.
Each instruction fetch and execute step is completed in one clock cycle.
15. Use the Idea of Pipelining in a Computer
[Figure: a 4-stage pipeline (fetch + decode + execute + write). (a) Instruction execution divided into four steps: over clock cycles 1-7, instructions I1-I4 each pass through F, D, E, and W, one stage per cycle, overlapped so that I4 completes W4 in cycle 7. (b) Hardware organization: Fetch, Decode, Execute, and Write units separated by interstage buffers B1, B2, and B3.]
F: Fetch instruction
D: Decode instruction and fetch operands
E: Execute operation
W: Write results
16. Fetch (F) – read the instruction from memory.
Decode (D) – decode the instruction and fetch the source operands.
Execute (E) – perform the operation specified by the instruction.
Write (W) – store the result in the destination location.
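The ideal overlap of these four stages can be sketched as a small schedule generator (an illustrative sketch; the function name and dictionary layout are my own, assuming every stage takes exactly one clock cycle):

```python
# Build the ideal 4-stage schedule (F, D, E, W) for n instructions.
STAGES = ["F", "D", "E", "W"]

def schedule(n_instructions):
    # Instruction i (0-based) enters stage j in clock cycle i + j + 1
    # (1-based cycles): each instruction starts one cycle after the previous.
    return {
        f"I{i+1}": {s: i + j + 1 for j, s in enumerate(STAGES)}
        for i in range(n_instructions)
    }

sched = schedule(4)
print(sched["I1"])  # {'F': 1, 'D': 2, 'E': 3, 'W': 4}
print(sched["I4"])  # {'F': 4, 'D': 5, 'E': 6, 'W': 7}
```

This reproduces the figure on the previous slide: four instructions complete in 7 cycles rather than 16.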
17. Role of Cache Memory
Each pipeline stage is expected to complete in one clock cycle.
The clock period should be long enough to let the slowest pipeline stage complete.
Faster stages can only wait for the slowest one to complete.
Main memory is very slow compared to execution: an access can take roughly ten times longer than one pipeline stage. If each instruction had to be fetched from main memory, the pipeline would be almost useless.
Fortunately, we have cache memory.
18. Pipeline Performance
The potential increase in performance resulting from pipelining is proportional to the number of pipeline stages.
However, this increase is achieved only if all pipeline stages require the same time to complete and there is no interruption throughout program execution.
Unfortunately, this is not the case in practice.
A floating-point operation, for example, may take many clock cycles to execute.
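The proportionality claim can be made concrete with a short calculation (a sketch under the ideal assumptions above: k equal one-cycle stages and no stalls, so n instructions take k + n - 1 cycles instead of n * k):

```python
# Ideal pipeline speedup: ratio of unpipelined time (n * k cycles) to
# pipelined time (k cycles to fill, then one completion per cycle).
def speedup(k, n):
    return (n * k) / (k + n - 1)

print(round(speedup(4, 4), 2))     # 2.29 (fill/drain time still dominates)
print(round(speedup(4, 1000), 2))  # 3.99 (approaches the 4-stage ideal)
```

The speedup approaches the number of stages k only as the instruction count grows large, which is why the fill and drain time mentioned on slide 10 reduces the speedup for short workloads.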
19. Pipeline Performance
[Figure: effect of an execution operation taking more than one clock cycle. Over clock cycles 1-9, instruction I2's execute step E2 spans three cycles, so the later steps of I3 and I4 are pushed back by two cycles while the fetches and decodes of I3-I5 proceed.]
20. Pipeline Performance
• The previous pipeline is said to have been stalled for two clock cycles.
• Any condition that causes a pipeline to stall is called a hazard.
• Data hazard – any condition in which either the source or the destination operands of an instruction are not available at the time expected in the pipeline. Some operation then has to be delayed, and the pipeline stalls.
• Instruction (control) hazard – a delay in the availability of an instruction (for example, due to a cache miss) causes the pipeline to stall.
• Structural hazard – the situation in which two instructions require the same hardware resource at the same time.
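As an illustration of the data-hazard definition, the common read-after-write case can be written as a minimal check (a hypothetical sketch; the instruction format and field names are my own, not notation from the slides):

```python
# A consumer instruction has a read-after-write data hazard on a producer
# if it reads a register the producer has not yet written back.
def raw_hazard(producer, consumer):
    # producer/consumer are dicts like {"dest": "R1", "srcs": ["R2", "R3"]}
    return producer["dest"] in consumer["srcs"]

i1 = {"dest": "R1", "srcs": ["R2", "R3"]}   # R1 <- R2 + R3
i2 = {"dest": "R4", "srcs": ["R1", "R5"]}   # R4 <- R1 + R5 (reads R1)
print(raw_hazard(i1, i2))  # True: I2 must wait until I1's result is ready
```

When this check is true, I2's operand-fetch step cannot proceed at the expected cycle, and the pipeline stalls exactly as described above.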
21. Pipeline Performance
[Figure: pipeline stall caused by a cache miss in F2. (a) Instruction execution steps in successive clock cycles: the fetch of I2 spans cycles 2-5, so I1 completes normally in cycle 4 while I2 and I3 finish in cycles 8 and 9.]
(b) Function performed by each processor stage in successive clock cycles:

Clock cycle   1    2    3     4     5     6     7     8     9
F: Fetch      F1   F2   F2    F2    F2    F3
D: Decode          D1   idle  idle  idle  D2    D3
E: Execute              E1    idle  idle  idle  E2    E3
W: Write                      W1    idle  idle  idle  W2    W3

The instruction hazard here is a cache miss during F2. The Decode unit is idle in cycles 3 through 5, the Execute unit is idle in cycles 4 through 6, and the Write unit is idle in cycles 5 through 7. Such idle periods are called stalls (or bubbles).
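The timing in the table can be reproduced with a small completion-time calculation (an illustrative sketch; the function and parameter names are my own, assuming a 4-stage pipeline where a stretched fetch delays every later step of that instruction and those behind it):

```python
# Write-back cycle for instruction i (1-based) in a k-stage pipeline:
# ideally cycle i + k - 1; each extra fetch cycle pushes it back by one.
def completion_cycle(instr_index, stages=4, extra_fetch_cycles=0):
    return instr_index + stages - 1 + extra_fetch_cycles

print(completion_cycle(1))                        # 4: W1 in cycle 4
print(completion_cycle(2, extra_fetch_cycles=3))  # 8: W2 in cycle 8
print(completion_cycle(3, extra_fetch_cycles=3))  # 9: W3 in cycle 9
```

The three extra fetch cycles for I2 (the cache miss) delay I2 and I3 alike, matching the three-cycle bubbles in each stage of the table.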