Lecture 4: Pipelining
Basics & Hazards
Kai Bu
kaibu@zju.edu.cn
Lab Opening Hours:
Mon – Thu 13:00 – 16:00
Thu 9:00 – 12:00 Sun 14:00 – 17:00
Assignment 1 Submission
Appendix C.1-C.2
Outline
• Part 1 Basics
what’s pipelining
pipelining principles
RISC and its five-stage pipeline
• Part 2 Challenges: Pipeline Hazards
structural hazard
data hazard
control hazard
What’s Pipelining
You already know!
Try the laundry example:
Laundry Example
Ann, Brian, Cathy, Dave
Each has one load of clothes to
wash, dry, fold.
washer
30 mins
dryer
40 mins
folder
20 mins
Sequential Laundry
What would you do?
[Figure: task order A, B, C, D along a time axis; each load takes 30 + 40 + 20 mins, one load after another]
Total: 6 Hours
Pipelined Laundry
Observations
• A task has a series of stages;
• Stage dependency: e.g., wash before dry;
• Multi tasks with overlapping stages;
• Simultaneously use diff resources to speed up;
• Slowest stage determines the finish time;
[Figure: task order A, B, C, D with overlapping stages; time axis 30, 40, 40, 40, 40, 20 mins]
Total: 3.5 Hours
Pipelined Laundry
Observations
• No speed up for individual task;
e.g., A still takes 30+40+20 = 90 mins
• But speed up for average task execution time;
e.g., 3.5*60/4 = 52.5 mins < 30+40+20 = 90 mins
[Figure: task order A, B, C, D with overlapping stages; time axis 30, 40, 40, 40, 40, 20 mins]
Total: 3.5 Hours
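The 6-hour and 3.5-hour totals can be checked with a short simulation. This is a minimal sketch in Python, assuming that each machine handles one load at a time and that a finished load may wait until the next machine is free; the function names are illustrative and not from the lecture.

```python
def sequential_time(stage_minutes, num_loads):
    """Total minutes when each load finishes all stages before the next starts."""
    return num_loads * sum(stage_minutes)


def pipelined_time(stage_minutes, num_loads):
    """Total minutes when loads overlap, one load per machine at a time.

    Assumption: a finished load may wait (e.g., in a basket) until the
    next machine is free.
    """
    # free_at[s] = time at which machine s finishes its current load
    free_at = [0] * len(stage_minutes)
    for _ in range(num_loads):
        ready = 0  # time at which this load has finished its previous stage
        for s, minutes in enumerate(stage_minutes):
            start = max(ready, free_at[s])  # wait for the load and the machine
            ready = start + minutes
            free_at[s] = ready
    return free_at[-1]


stages = [30, 40, 20]  # washer, dryer, folder (minutes)
print(sequential_time(stages, 4))  # 360 minutes = 6 hours
print(pipelined_time(stages, 4))   # 210 minutes = 3.5 hours
```

The 40-minute dryer is the bottleneck: after the first wash, one load leaves the dryer every 40 minutes, which gives 30 + 4x40 + 20 = 210 minutes.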
Assembly Line
[Figures: assembly lines for autos and cola]
Pipelining
• An implementation technique
whereby multiple instructions are
overlapped in execution.
e.g., B wash while A dry
• Essence: Start executing one
instruction before completing the
previous one.
• Significance: Make fast CPUs.
Balanced Pipeline
• Equal-length pipe stages
e.g., wash, dry, fold = 40 mins each;
unpipelined laundry time = 40x3 mins per load
3 pipe stages – wash, dry, fold
[Figure: loads A, B, C, D start one 40-min slot apart (T1, T2, T3, T4), so their stages overlap]
• Performance
One task/instruction per 40 mins
Time per instruction by pipeline
= Time per instr on unpipelined machine / Number of pipe stages
Speed up by pipeline
= Number of pipe stages
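The balanced-pipeline formulas can be tried on the laundry numbers. The sketch below is illustrative only (the function names are assumptions). It also shows a point the formula leaves implicit: for a finite number of tasks, the first task must still fill the pipeline, so the measured speedup stays below the number of stages and only approaches it as the task count grows.

```python
def time_per_instruction_pipelined(unpipelined_time, num_stages):
    """Balanced pipeline: one result leaves the pipeline every stage time."""
    return unpipelined_time / num_stages


def speedup_for_n_tasks(num_tasks, num_stages):
    """Speedup over unpipelined execution for a finite number of tasks.

    Assumption: equal-length stages and no stalls. Unpipelined execution
    needs num_tasks * num_stages stage times; pipelined execution needs
    num_stages stage times for the first task plus one for each later task.
    """
    unpipelined = num_tasks * num_stages
    pipelined = num_stages + (num_tasks - 1)
    return unpipelined / pipelined


print(time_per_instruction_pipelined(40 * 3, 3))  # 40.0 minutes per load
print(speedup_for_n_tasks(4, 3))                  # 2.0 for the four loads
print(speedup_for_n_tasks(1000, 3))               # close to 3
```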
Pipelining Terminology
• Latency: the time for an instruction to
complete.
• Throughput of a CPU: the number of
instructions completed per second.
• Clock cycle: everything in CPU moves in
lockstep; synchronized by the clock.
• Processor Cycle: time required between
moving an instruction one step down the
pipeline;
= time required to complete a pipe stage;
= max(times for completing all stages);
= one or two clock cycles, but rarely more.
• CPI: clock cycles per instruction
RISC: Reduced Instruction Set Computer
Properties:
• All operations on data apply to data in
registers and typically change the entire
register (32 or 64 bits per reg);
• Only load and store operations affect
memory;
load: move data from mem to reg;
store: move data from reg to mem;
• Only a few instruction formats; all
instructions typically being one size.
RISC: Reduced Instruction Set Computer
32 registers
3 classes of instructions - 1
• ALU (Arithmetic Logic Unit) instructions
operate on two regs or a reg + a
sign-extended immediate;
store the result into a third reg;
e.g., add (DADD), subtract (DSUB)
logical operations AND, OR
RISC: Reduced Instruction Set Computer
3 classes of instructions - 2
• Load (LD) and store (SD) instructions
operands: base register + offset;
the sum (called effective address) is used as
a memory address;
Load: use a second reg operand as the
destination for the data loaded from memory;
Store: use a second reg operand as the
source of the data stored into memory.
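The effective-address computation can be sketched as follows. This is an illustration under an assumption not stated on the slide: the offset is a 16-bit two's-complement field, as in MIPS load/store encodings. The helper names are hypothetical.

```python
def sign_extend(value, bits=16):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


def effective_address(base, offset_field, bits=16):
    """Effective address = base register value + sign-extended offset."""
    return base + sign_extend(offset_field, bits)


# LD R4, 12(R1) with R1 = 0x1000 reads from 0x100C
print(hex(effective_address(0x1000, 12)))
# An offset field of 0xFFFC encodes -4, so the address is 0x0FFC
print(hex(effective_address(0x1000, 0xFFFC)))
```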
RISC: Reduced Instruction Set Computer
3 classes of instructions - 3
• Branches and jumps
branches are conditional transfers of control;
Branch:
specify the branch condition with a set of
condition bits or comparisons between two
regs or between a reg and zero;
decide the branch destination by adding a
sign-extended offset to the current PC
(program counter);
RISC: Reduced Instruction Set Computer
at most 5 clock cycles per instruction – 1
IF ID EX MEM WB
• Instruction Fetch cycle
send the PC to memory;
fetch the current instruction from mem;
PC = PC + 4; //each instr is 4 bytes
RISC: Reduced Instruction Set Computer
at most 5 clock cycles per instruction – 2
IF ID EX MEM WB
• Instruction Decode/register fetch cycle
decode the instruction;
read the registers (corresponding to
register source specifiers);
RISC: Reduced Instruction Set Computer
at most 5 clock cycles per instruction – 3
IF ID EX MEM WB
• Execution/effective address cycle
ALU operates on the operands from ID:
3 functions depending on the instr type - 1
-Memory reference: ALU adds base register
and offset to form effective address;
RISC: Reduced Instruction Set Computer
at most 5 clock cycles per instruction – 3
IF ID EX MEM WB
• Execution/effective address cycle
ALU operates on the operands from ID:
3 functions depending on the instr type - 2
-Register-Register ALU instruction: ALU
performs the operation specified by opcode
on the values read from the register file;
RISC: Reduced Instruction Set Computer
at most 5 clock cycles per instruction – 3
IF ID EX MEM WB
• EXecution/effective address cycle
ALU operates on the operands from ID:
3 functions depending on the instr type - 3
-Register-Immediate ALU instruction: ALU
operates on the first value read from the
register file and the sign-extended
immediate.
RISC: Reduced Instruction Set Computer
at most 5 clock cycles per instruction – 4
IF ID EX MEM WB
• MEMory access
for load instr: the memory does a read
using the effective address;
for store instr: the memory writes the
data from the second register using the
effective address.
RISC: Reduced Instruction Set Computer
at most 5 clock cycles per instruction – 5
IF ID EX MEM WB
• Write-Back cycle
for Register-Register ALU or load instr;
write the result into the register file,
whether it comes from the memory (for
load) or from the ALU (for ALU instr).
RISC: Reduced Instruction Set Computer
at most 5 clock cycles per instruction
IF ID EX MEM WB
RISC: Five-Stage Pipeline
Simply start a new instruction
on each clock cycle;
Ideal speedup = 5 (one instruction completes per clock cycle once the pipeline is full).
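Starting one instruction per clock cycle can be pictured as a stage-by-cycle chart. The following is a minimal sketch, assuming no hazards so that every instruction advances one stage per cycle; the function name and the "--" filler are illustrative.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_chart(num_instructions, stages=STAGES):
    """Return one row per instruction; row[c] is its stage in clock cycle c + 1.

    Assumption: no hazards, so instruction i enters IF in cycle i + 1 and
    moves one stage per cycle.
    """
    total_cycles = num_instructions + len(stages) - 1
    chart = []
    for i in range(num_instructions):
        row = ["--"] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = name
        chart.append(row)
    return chart


chart = pipeline_chart(4)
for i, row in enumerate(chart):
    print(f"instr i+{i}: " + " ".join(f"{cell:>3}" for cell in row))
print("cycles pipelined:", len(chart[0]), "vs unpipelined:", 4 * len(STAGES))
```

For four instructions the chart spans 8 cycles instead of 20. The factor of 5 is reached only in the limit of a long instruction stream.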
RISC: Five-Stage Pipeline
• How it works
separate instruction and data mems
to eliminate conflicts for a single
memory between instruction fetch
and data memory access.
IF MEM
Instr mem Data mem
RISC: Five-Stage Pipeline
• How it works
use the register file in two stages:
read in ID, write in WB;
each access takes half a clock cycle (CC);
within one clock cycle, write before read
RISC: Five-Stage Pipeline
• How it works
introduce pipeline registers between
successive stages;
pipeline registers store the results of
a stage and use them as the input of
the next stage.
RISC: Five-Stage Pipeline
• How it works
RISC: Five-Stage Pipeline
• How it works - omit pipeline regs
for simplicity
but required in implementation
RISC: Five-Stage Pipeline
• Example
Consider an unpipelined processor.
1 ns clock cycle;
4 cycles for ALU operations and branches;
5 cycles for memory operations;
relative frequencies 40% (ALU), 20% (branches), 40% (memory);
pipelining adds 0.2 ns of overhead to the clock (e.g., due to
stage imbalance, pipeline register setup,
clock skew)
Question: How much speedup by pipeline?
RISC: Five-Stage Pipeline
• Answer
speedup by pipelining
= Avg instr time unpipelined / Avg instr time pipelined
= ?
RISC: Five-Stage Pipeline
• Answer
Avg instr time unpipelined
= clock cycle x avg CPI
= 1 ns x [(0.4+0.2)x4 + 0.4x5]
= 4.4 ns
Avg instr time pipelined
= 1+0.2
= 1.2 ns
RISC: Five-Stage Pipeline
• Answer
speedup by pipelining
= Avg instr time unpipelined / Avg instr time pipelined
= 4.4 ns / 1.2 ns
≈ 3.7 times
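The arithmetic of this example can be reproduced in a few lines. The variable names are illustrative; the numbers are the ones given in the example.

```python
clock_ns = 1.0
overhead_ns = 0.2

# instruction class -> (relative frequency, cycles when unpipelined)
mix = {
    "ALU": (0.40, 4),
    "branch": (0.20, 4),
    "memory": (0.40, 5),
}

avg_cpi = sum(freq * cycles for freq, cycles in mix.values())
unpipelined_ns = clock_ns * avg_cpi    # average instruction time, unpipelined
pipelined_ns = clock_ns + overhead_ns  # one instruction per (slower) clock
speedup = unpipelined_ns / pipelined_ns

print(round(unpipelined_ns, 2), round(pipelined_ns, 2), round(speedup, 1))
```

The pipeline overhead explains why the speedup is about 3.7 rather than the unpipelined average CPI of 4.4.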
That’s it !
That’s it?
When Pipeline Is Stuck
LD R1, 0(R2)
DSUB R4, R1, R5
[Figure: pipeline diagram with R1 marked on both instructions]
Pipeline Hazards
• Hazards: situations that prevent the
next instruction from executing in the
designated clock cycle.
• 3 classes of hazards:
structural hazard – resource conflicts
data hazard – data dependency
control hazard – PC changes
(e.g., branches)
Structural Hazard
• Root Cause: resource conflicts
e.g., a processor with one reg write port
that needs to perform two writes in one CC
• Solution
stall one of the instructions
until required unit is available
Structural Hazard
• Example
1 mem port
mem conflict: data access (MEM of Load)
vs instr fetch (IF of Instr i+3)
[Figure: pipeline diagram of Load, Instr i+1, Instr i+2, Instr i+3]
Structural Hazard
Stall Instr i+3
till CC 5
Structural Hazard
• Example
ideal CPI is 1;
40% data references;
the processor with the structural hazard has a clock rate
1.05 times higher than the one without it;
with the hazard, each data reference stalls for one clock cycle;
Question:
is the pipeline with or without the hazard faster?
by how much?
Structural Hazard
• Answer
avg instr time w/o hazard
= CPI x clock cycle time_ideal
= 1 x clock cycle time_ideal
avg instr time w/ hazard
= (1 + 0.4x1) x clock cycle time_ideal / 1.05
≈ 1.3 x clock cycle time_ideal
So, w/o hazard is about 1.3 times faster.
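The comparison can be checked numerically. In this sketch the ideal clock cycle time is normalized to 1, which is an assumption made only to obtain a ratio; the names are illustrative.

```python
def avg_instr_time(cpi, clock_cycle_time):
    """Average instruction time = CPI x clock cycle time."""
    return cpi * clock_cycle_time


ideal_cycle = 1.0  # normalized clock cycle time of the hazard-free design

no_hazard = avg_instr_time(1.0, ideal_cycle)
# 40% of instructions stall for one cycle; the clock is 1.05 times faster
with_hazard = avg_instr_time(1.0 + 0.4 * 1, ideal_cycle / 1.05)

ratio = with_hazard / no_hazard
print(round(ratio, 2))  # about 1.33
```

The faster clock does not make up for the extra stall cycles, so the design without the hazard wins.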
Data Hazard
• Root Cause: data dependency
when the pipeline changes the order
of read/write accesses to operands;
so that the order differs from the
order seen by sequentially executing
instructions on an unpipelined
processor.
Data Hazard
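DADD R1, R2, R3
DSUB R4, R1, R5
AND R6, R1, R7
OR R8, R1, R9
XOR R10, R1, R11
[Figure: pipeline diagram; DADD writes R1 in WB, the later instructions read R1 in ID]
No hazard when the write (WB) and the read (ID) share a clock cycle:
1st half cycle: w
2nd half cycle: r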
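Which instructions in the DADD sequence read a stale R1 can be worked out mechanically. The sketch below assumes the five-stage pipeline without forwarding and a register file that writes in the first half of a cycle and reads in the second half. Under those assumptions a consumer is safe once it is at least three instructions behind the producer. The tuple layout and function name are illustrative.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: tuple


def raw_hazards(program, safe_distance=3):
    """Return (producer_index, consumer_index, register) for each RAW hazard.

    Assumption: no forwarding; a consumer at distance >= safe_distance reads
    the register file after (or in the same cycle as) the producer's write.
    """
    hazards = []
    for i, producer in enumerate(program):
        for j in range(i + 1, min(i + safe_distance, len(program))):
            if producer.dest in program[j].srcs:
                hazards.append((i, j, producer.dest))
            if program[j].dest == producer.dest:
                break  # a newer write hides this producer from later readers
    return hazards


program = [
    Instr("DADD", "R1", ("R2", "R3")),
    Instr("DSUB", "R4", ("R1", "R5")),
    Instr("AND", "R6", ("R1", "R7")),
    Instr("OR", "R8", ("R1", "R9")),
    Instr("XOR", "R10", ("R1", "R11")),
]
for producer, consumer, reg in raw_hazards(program):
    print(f"{program[consumer].op} reads {reg} too early (written by {program[producer].op})")
```

Only DSUB and AND are reported. OR reads R1 in the same cycle in which DADD writes it, and XOR reads it one cycle later.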
Data Hazard
• Solution: forwarding
directly feed back EX/MEM & MEM/WB
pipeline regs’ results to the ALU inputs;
if forwarding hw detects that a previous
ALU operation has written the reg corresponding
to a source for the current ALU operation,
control logic selects the forwarded
result as the ALU input.
Data Hazard: Forwarding
DADD R1, R2, R3
DSUB R4, R1, R5
AND R6, R1, R7
OR R8, R1, R9
XOR R10, R1, R11
[Figure: pipeline diagram; R1 is forwarded from EX/MEM to the ALU input of DSUB and from MEM/WB to the ALU input of AND]
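The choice the forwarding control logic makes can be sketched as a function of how far the consumer is behind the producing ALU instruction. This is an illustration under the assumptions of the five-stage pipeline with a write-before-read register file; the function name and the tuple layout are hypothetical.

```python
def forwarding_source(distance):
    """Where a consumer's ALU input comes from, by distance to the producer.

    Assumption: the producer is an ALU instruction, so its result is in
    EX/MEM one cycle after its EX stage and in MEM/WB one cycle later.
    """
    if distance < 1:
        raise ValueError("distance must be at least 1")
    if distance == 1:
        return "EX/MEM"
    if distance == 2:
        return "MEM/WB"
    return "register file"  # the value has been written back in time


# (opcode, destination, sources)
program = [
    ("DADD", "R1", ("R2", "R3")),
    ("DSUB", "R4", ("R1", "R5")),
    ("AND", "R6", ("R1", "R7")),
    ("OR", "R8", ("R1", "R9")),
    ("XOR", "R10", ("R1", "R11")),
]

producer_index = 0
producer_dest = program[producer_index][1]
choices = {}
for j in range(producer_index + 1, len(program)):
    op, _, srcs = program[j]
    if producer_dest in srcs:
        choices[op] = forwarding_source(j - producer_index)
print(choices)
```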
Data Hazard: Forwarding
• Generalized forwarding
pass a result directly to the functional
unit that requires it;
forward results to not only ALU inputs
but also other types of functional units;
Data Hazard: Forwarding
• Generalized forwarding
DADD R1, R2, R3
LD R4, 0(R1)
SD R4, 12(R1)
[Figure: pipeline diagram; R1 is forwarded to the ALU inputs of LD and SD, and R4 to the data memory input of SD]
Data Hazard
• Sometimes stall is necessary
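LD R1, 0(R2)
DSUB R4, R1, R5
LD's data reaches MEM/WB only at the end of its MEM cycle,
but DSUB needs it at the start of that same cycle for EX.
Forwarding cannot go backward in time.
Has to stall.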
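Counting these unavoidable stalls is a matter of finding loads whose result is needed by the very next instruction. The sketch below assumes full forwarding in the five-stage pipeline, so only this adjacent load-use case costs a cycle. The field called ex_srcs is a hypothetical name for the registers an instruction needs at the start of EX; data registers of stores are left out by assumption because they can be forwarded later.

```python
def load_use_stalls(program):
    """Count one-cycle stalls caused by a load followed by a dependent instruction.

    Each entry is (opcode, destination, ex_srcs). Assumption: with forwarding,
    a loaded value is available one cycle too late only for the instruction
    directly after the load.
    """
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        if prev_op == "LD" and prev_dest in curr[2]:
            stalls += 1
    return stalls


stuck = [
    ("LD", "R1", ("R2",)),
    ("DSUB", "R4", ("R1", "R5")),
]
# Placing an independent instruction after the load hides the delay
separated = [
    ("LD", "R1", ("R2",)),
    ("DADD", "R6", ("R7", "R8")),
    ("DSUB", "R4", ("R1", "R5")),
]
print(load_use_stalls(stuck), load_use_stalls(separated))
```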
Control Hazard
• branches and jumps
• Branch hazard
a branch may or may not change the PC
to a value other than PC+4;
taken branch: changes PC to its
target address;
untaken branch: falls through;
PC is not changed till the end of ID;
Branch Hazard
• Redo IF
essentially a stall
If the branch is untaken,
the stall is unnecessary.
Branch Hazard: Solutions
4 simple compile time schemes – 1
• Freeze or flush the pipeline
hold or delete any instructions after the
branch till the branch dst is known;
i.e., Redo IF w/o the first IF
Branch Hazard: Solutions
4 simple compile time schemes – 2
• Predicted-untaken
simply treat every branch as untaken;
when the branch is untaken,
pipelining as if no hazard.
Branch Hazard: Solutions
4 simple compile time schemes – 2
• Predicted-untaken
but if the branch is taken:
turn fetched instr into a no-op (idle);
restart the IF at the branch target addr
Branch Hazard: Solutions
4 simple compile time schemes – 3
• Predicted-taken
simply treat every branch as taken;
does not apply to the five-stage pipeline;
applies to scenarios where the branch target
addr is known before the branch outcome.
Branch Hazard: Solutions
4 simple compile time schemes – 4
• Delayed branch
delay the branch execution until after the
next instruction;
pipelining sequence:
branch instruction
sequential successor (the next instruction, in the branch delay slot)
branch target if taken
Branch Hazard: Solutions
• Delayed branch
Branch Hazard: Performance
• Example
a deeper pipeline (e.g., in MIPS R4000)
with the following branch penalties:
and the following branch frequencies:
Question: find the effective addition to
the CPI arising from branches.
Branch Hazard: Performance
• Answer
find the CPI additions by
relative frequency x respective penalty,
e.g., 0.04x2 = 0.08 and 0.10x3 = 0.30,
which sum to 0.08+0.30 = 0.38
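The rule "relative frequency x respective penalty" is a weighted sum. The sketch below uses only the two products that appear on the slide (0.04x2 and 0.10x3). Which branch type each product belongs to is not recoverable from the text, so the list entries are left unlabeled. The function name is illustrative.

```python
def branch_cpi_addition(cases):
    """Sum of frequency x penalty over all branch cases.

    cases: iterable of (relative_frequency, penalty_cycles) pairs.
    """
    return sum(freq * penalty for freq, penalty in cases)


cases = [(0.04, 2), (0.10, 3)]  # the two products shown on the slide
extra_cpi = branch_cpi_addition(cases)
print(round(extra_cpi, 2))      # addition to the CPI from branches
print(round(1 + extra_cpi, 2))  # effective CPI, assuming an ideal CPI of 1
```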
Conclusion
• Pipelining promises fast CPU by
starting the execution of one
instruction before completing the
previous one.
• Classic five-stage pipeline for RISC
IF – ID – EX – MEM – WB
• Pipeline hazards limit ideal pipelining
structural/data/control hazard
Further Readings
• RISC wiki
http://en.wikipedia.org/wiki/Reduced_instruction_set_computing
• MIPS wiki
http://en.wikipedia.org/wiki/MIPS_architecture
• RISC Processors
http://www.scs.carleton.ca/sivarama/org_book/org_book_web/solution_manual/org_soln_one/arch_book_solution_ch14.pdf
• …

Pipelining Lecture on Basics and Hazards

  • 1. Lecture 4: Pipelining Basics & Hazards Kai Bu kaibu@zju.edu.cn
  • 2. Lab Opening Hours: Mon – Thu 13:00 – 16:00 Thu 9:00 – 12:00 Sun 14:00 – 17:00 Assignment 1 Submission
  • 4. Outline • Part 1 Basics what’s pipelining pipelining principles RISC and its five-stage pipeline • Part 2 Challenges: Pipeline Hazards structural hazard data hazard control hazard
  • 5. Outline • Part 1 Basics what’s pipelining pipelining principles RISC and its five-stage pipeline • Part 2 Challenges: Pipeline Hazards structural hazard data hazard control hazard
  • 6. What’s Pipelining You already knew! Try the laundry example:
  • 7. Laundry Example Ann, Brian, Cathy, Dave Each has one load of clothes to wash, dry, fold. washer 30 mins dryer 40 mins folder 20 mins
  • 8. Sequential Laundry What would you do? Task Order A B C D Time 30 40 20 30 40 20 30 40 20 30 40 20 6 Hours
  • 9. Sequential Laundry What would you do? Task Order A B C D Time 30 40 20 30 40 20 30 40 20 30 40 20 6 Hours
  • 10. Pipelined Laundry Observations • A task has a series of stages; • Stage dependency: e.g., wash before dry; • Multi tasks with overlapping stages; • Simultaneously use diff resources to speed up; • Slowest stage determines the finish time; Task Order A B C D Time 30 40 40 40 40 20 3.5 Hours
  • 11. Pipelined Laundry Observations • No speed up for individual task; e.g., A still takes 30+40+20=90 • But speed up for average task execution time; e.g., 3.5*60/4=52.5 < 30+40+20=90 Task Order A B C D Time 30 40 40 40 40 20 3.5 Hours
  • 13. Outline • Part 1 Basics what’s pipelining pipelining principles RISC and its five-stage pipeline • Part 2 Challenges: Pipeline Hazards structural hazard data hazard control hazard
  • 14. Pipelining • An implementation technique whereby multiple instructions are overlapped in execution. e.g., B wash while A dry • Essence: Start executing one instruction before completing the previous one. • Significance: Make fast CPUs. A B
  • 15. Balanced Pipeline • Equal-length pipe stages e.g., Wash, dry, fold = 40 mins per unpipelined laundry time = 40x3 mins 3 pipe stages – wash, dry, fold A T1 40min T2 T3 T4 A A B B B C C D
  • 18. Balanced Pipeline • Equal-length pipe stages; e.g., wash, dry, fold = 40 mins each; unpipelined laundry time = 40x3 mins; 3 pipe stages: wash, dry, fold. • Performance: one task/instruction per 40 mins; Time per instruction by pipeline = (Time per instr on unpipelined machine) / (Number of pipe stages); Speedup by pipeline = Number of pipe stages.
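The ideal speedup equals the number of pipe stages only once the pipeline is full. A minimal sketch, assuming equal-length stages and no stalls (function names are illustrative), shows how the speedup approaches the stage count as more tasks flow through:

```python
def unpipelined_time(num_tasks, num_stages, stage_time):
    """Every task runs all stages alone before the next task starts."""
    return num_tasks * num_stages * stage_time

def balanced_pipeline_time(num_tasks, num_stages, stage_time):
    """The first task needs num_stages slots; each later task adds one slot."""
    if num_tasks == 0:
        return 0
    return (num_stages + num_tasks - 1) * stage_time

def speedup(num_tasks, num_stages, stage_time):
    return (unpipelined_time(num_tasks, num_stages, stage_time)
            / balanced_pipeline_time(num_tasks, num_stages, stage_time))

if __name__ == "__main__":
    # 3 stages of 40 minutes each, as in the balanced laundry example.
    for n in (4, 100, 10**6):
        print(n, round(speedup(n, 3, 40), 3))
```

With only four loads, the fill and drain slots keep the speedup at 2; with many loads it approaches 3, the number of pipe stages.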
  • 19. Pipelining Terminology • Latency: the time for an instruction to complete. • Throughput of a CPU: the number of instructions completed per second. • Clock cycle: everything in CPU moves in lockstep; synchronized by the clock. • Processor Cycle: time required between moving an instruction one step down the pipeline; = time required to complete a pipe stage; = max(times for completing all stages); = one or two clock cycles, but rarely more. • CPI: clock cycles per instruction
  • 20. Outline • Part 1 Basics what’s pipelining pipelining principles RISC and its five-stage pipeline • Part 2 Challenges: Pipeline Hazards structural hazard data hazard control hazard
  • 21. RISC: Reduced Instruction Set Computer Properties: • All operations on data apply to data in registers and typically change the entire register (32 or 64 bits per reg); • Only load and store operations affect memory; load: move data from mem to reg; store: move data from reg to mem; • Only a few instruction formats; all instructions typically being one size.
  • 22. RISC: Reduced Instruction Set Computer 32 registers 3 classes of instructions - 1 • ALU (Arithmetic Logic Unit) instructions operate on two regs or a reg + a sign-extended immediate; store the result into a third reg; e.g., add (DADD), subtract (DSUB), logical operations AND, OR
  • 23. RISC: Reduced Instruction Set Computer 3 classes of instructions - 2 • Load (LD) and store (SD) instructions operands: base register + offset; the sum (called effective address) is used as a memory address; Load: use a second reg operand as the destination for the data loaded from memory; Store: use a second reg operand as the source of the data stored into memory.
  • 24. RISC: Reduced Instruction Set Computer 3 classes of instructions - 3 • Branches and jumps conditional transfers of control; Branch: specify the branch condition with a set of condition bits or comparisons between two regs or between a reg and zero; decide the branch destination by adding a sign-extended offset to the current PC (program counter);
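The branch-destination rule can be sketched as follows. This follows the slide's wording (sign-extended offset added to the current PC); the 16-bit offset width is an assumption for illustration, and real ISAs may additionally scale the offset or add it to the already-incremented PC.

```python
def sign_extend(value, bits=16):
    """Interpret the low `bits` bits of value as a two's-complement number.
    The 16-bit default is an assumption, not taken from the slides."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value

def branch_target(pc, offset_field, bits=16):
    """Branch destination = current PC + sign-extended offset."""
    return pc + sign_extend(offset_field, bits)

if __name__ == "__main__":
    print(hex(branch_target(0x1000, 0x0010)))  # forward branch
    print(hex(branch_target(0x1000, 0xFFFC)))  # backward branch (offset -4)
```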
  • 25. RISC: Reduced Instruction Set Computer at most 5 clock cycles per instruction – 1 IF ID EX MEM WB • Instruction Fetch cycle send the PC to memory; fetch the current instruction from mem; PC = PC + 4; //each instr is 4 bytes
  • 26. RISC: Reduced Instruction Set Computer at most 5 clock cycles per instruction – 2 IF ID EX MEM WB • Instruction Decode/register fetch cycle decode the instruction; read the registers (corresponding to register source specifiers);
  • 27. RISC: Reduced Instruction Set Computer at most 5 clock cycles per instruction – 3 IF ID EX MEM WB • Execution/effective address cycle ALU operates on the operands from ID: 3 functions depending on the instr type - 1 -Memory reference: ALU adds base register and offset to form effective address;
  • 28. RISC: Reduced Instruction Set Computer at most 5 clock cycles per instruction – 3 IF ID EX MEM WB • Execution/effective address cycle ALU operates on the operands from ID: 3 functions depending on the instr type - 2 -Register-Register ALU instruction: ALU performs the operation specified by opcode on the values read from the register file;
  • 29. RISC: Reduced Instruction Set Computer at most 5 clock cycles per instruction – 3 IF ID EX MEM WB • EXecution/effective address cycle ALU operates on the operands from ID: 3 functions depending on the instr type - 3 -Register-Immediate ALU instruction: ALU operates on the first value read from the register file and the sign-extended immediate.
  • 30. RISC: Reduced Instruction Set Computer at most 5 clock cycles per instruction – 4 IF ID EX MEM WB • MEMory access for load instr: the memory does a read using the effective address; for store instr: the memory writes the data from the second register using the effective address.
  • 31. RISC: Reduced Instruction Set Computer at most 5 clock cycles per instruction – 5 IF ID EX MEM WB • Write-Back cycle for Register-Register ALU or load instr; write the result into the register file, whether it comes from the memory (for load) or from the ALU (for ALU instr).
  • 32. RISC: Reduced Instruction Set Computer at most 5 clock cycles per instruction IF ID EX MEM WB
  • 33. RISC: Five-Stage Pipeline Simply start a new instruction on each clock cycle; Speedup = 5.
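Starting one instruction per clock cycle produces the familiar staircase chart. The sketch below builds that chart for an ideal, hazard-free pipeline; the helper names are illustrative.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_chart(num_instructions, stages=STAGES):
    """Return one row per instruction; row[c] is the stage occupied in
    clock cycle c (0-based), or "" if the instruction is not in the pipeline."""
    total_cycles = num_instructions + len(stages) - 1
    rows = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = name        # instruction i enters stage s in cycle i + s
        rows.append(row)
    return rows

def format_chart(rows):
    lines = []
    for i, row in enumerate(rows):
        cells = " ".join(f"{cell:<3}" for cell in row)
        lines.append(f"i+{i}: {cells}".rstrip())
    return "\n".join(lines)

if __name__ == "__main__":
    print(format_chart(pipeline_chart(4)))
```

Four instructions finish in 4 + 5 - 1 = 8 cycles instead of 20, and in the steady state one instruction completes every cycle.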
  • 34. RISC: Five-Stage Pipeline • How it works separate instruction and data mems to eliminate conflicts for a single memory between instruction fetch and data memory access (IF uses the instr mem; MEM uses the data mem).
  • 35. RISC: Five-Stage Pipeline • How it works use the register file in two stages (read in ID, write in WB), each taking half a clock cycle (CC); within one clock cycle, write before read.
  • 36. RISC: Five-Stage Pipeline • How it works introduce pipeline registers between successive stages; pipeline registers store the results of a stage and use them as the input of the next stage.
  • 38. RISC: Five-Stage Pipeline • How it works - omit pipeline regs for simplicity but required in implementation
  • 39. RISC: Five-Stage Pipeline • Example Consider an unpipelined processor: 1 ns clock cycle; 4 cycles for ALU operations and branches; 5 cycles for memory operations; relative frequencies 40%, 20%, 40% (ALU, branch, memory); 0.2 ns pipeline overhead (e.g., due to stage imbalance, pipeline register setup, clock skew) Question: How much speedup by pipeline?
  • 40. RISC: Five-Stage Pipeline • Answer speedup by pipelining = (Avg instr time unpipelined) / (Avg instr time pipelined) = ?
  • 41. RISC: Five-Stage Pipeline • Answer Avg instr time unpipelined = clock cycle x avg CPI = 1 ns x [(0.4+0.2)x4 + 0.4x5] = 4.4 ns Avg instr time pipelined = 1+0.2 = 1.2 ns
  • 42. RISC: Five-Stage Pipeline • Answer speedup by pipelining = (Avg instr time unpipelined) / (Avg instr time pipelined) = 4.4 ns / 1.2 ns = 3.7 times
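The arithmetic of this example can be checked with a few lines. This is a sketch using the numbers from the slides; the dictionary layout and names are illustrative.

```python
CLOCK_NS = 1.0       # unpipelined clock cycle
OVERHEAD_NS = 0.2    # pipeline overhead added to each clock cycle

# instruction class -> (relative frequency, cycles when unpipelined)
MIX = {"alu": (0.4, 4), "branch": (0.2, 4), "memory": (0.4, 5)}

def unpipelined_avg_ns(mix=MIX, clock_ns=CLOCK_NS):
    """Average instruction time = clock cycle x average CPI."""
    return clock_ns * sum(freq * cycles for freq, cycles in mix.values())

def pipelined_avg_ns(clock_ns=CLOCK_NS, overhead_ns=OVERHEAD_NS):
    """Ideal pipeline: one instruction per cycle, cycle stretched by overhead."""
    return clock_ns + overhead_ns

if __name__ == "__main__":
    speedup = unpipelined_avg_ns() / pipelined_avg_ns()
    print(round(unpipelined_avg_ns(), 2), pipelined_avg_ns(), round(speedup, 1))
```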
  • 45. When Pipeline Is Stuck LD R1, 0(R2); DSUB R4, R1, R5 (DSUB needs R1, which LD is still loading)
  • 46. Outline • Part 1 Basics what’s pipelining pipelining principles RISC and its five-stage pipeline • Part 2 Challenges: Pipeline Hazards structural hazard data hazard control hazard
  • 47. Pipeline Hazards • Hazards: situations that prevent the next instruction from executing in the designated clock cycle. • 3 classes of hazards: structural hazard – resource conflicts data hazard – data dependency control hazard – pc changes (e.g., branches)
  • 48. Outline • Part 1 Basics what’s pipelining pipelining principles RISC and its five-stage pipeline • Part 2 Challenges: Pipeline Hazards structural hazard data hazard control hazard
  • 49. Structural Hazard • Root Cause: resource conflicts e.g., a processor with 1 reg write port that needs to perform two writes in a CC • Solution stall one of the instructions until the required unit is available
  • 50. Structural Hazard • Example with 1 mem port, the data access (MEM) of a load conflicts with the instr fetch (IF) of instr i+3 in the same clock cycle
  • 52. Structural Hazard • Example ideal CPI is 1; 40% data references; the processor with the structural hazard has a clock rate 1.05 times higher than the one without; Question: is the pipeline with or without the hazard faster? by how much?
  • 53. Structural Hazard • Answer (each data reference stalls for one clock cycle) avg instr time w/o hazard = CPI x clock cycle time_ideal = 1 x clock cycle time_ideal; avg instr time w/ hazard = (1 + 0.4x1) x (clock cycle time_ideal / 1.05) = 1.3 x clock cycle time_ideal. So, w/o hazard is 1.3 times faster.
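The comparison can be worked through numerically. This sketch expresses both average instruction times in units of the ideal clock cycle time; the function names are illustrative.

```python
def avg_time_without_hazard(ideal_cpi=1.0):
    """Average instruction time in units of the ideal clock cycle time."""
    return ideal_cpi * 1.0

def avg_time_with_hazard(ideal_cpi=1.0, data_ref_fraction=0.4,
                         stall_cycles=1, clock_rate_ratio=1.05):
    """Each data reference stalls for stall_cycles; the clock runs
    clock_rate_ratio times faster, so each cycle is that much shorter."""
    cpi = ideal_cpi + data_ref_fraction * stall_cycles
    return cpi * (1.0 / clock_rate_ratio)

if __name__ == "__main__":
    ratio = avg_time_with_hazard() / avg_time_without_hazard()
    print(round(ratio, 2))  # how many times faster the hazard-free pipeline is
```

The faster clock recovers only a small part of the 40% CPI increase, so the pipeline without the structural hazard still wins.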
  • 54. Outline • Part 1 Basics what’s pipelining pipelining principles RISC and its five-stage pipeline • Part 2 Challenges: Pipeline Hazards structural hazard data hazard control hazard
  • 55. Data Hazard • Root Cause: data dependency when the pipeline changes the order of read/write accesses to operands; so that the order differs from the order seen by sequentially executing instructions on an unpipelined processor.
  • 56. Data Hazard DADD R1, R2, R3; DSUB R4, R1, R5; AND R6, R1, R7; OR R8, R1, R9; XOR R10, R1, R11 (all later instructions read R1). Register file: 1st half cycle: write; 2nd half cycle: read (no hazard).
  • 57. Data Hazard • Solution: forwarding directly feed back the results in the EX/MEM and MEM/WB pipeline regs to the ALU inputs; if the forwarding hw detects that a previous ALU operation has written the reg corresponding to a source for the current ALU operation, control logic selects the forwarded result as the ALU input.
  • 58. Data Hazard: Forwarding DADD R1, R2, R3; DSUB R4, R1, R5; AND R6, R1, R7; OR R8, R1, R9; XOR R10, R1, R11
  • 59. Data Hazard: Forwarding DADD R1, R2, R3; DSUB R4, R1, R5; AND R6, R1, R7; OR R8, R1, R9; XOR R10, R1, R11 (R1 forwarded from EX/MEM)
  • 60. Data Hazard: Forwarding DADD R1, R2, R3; DSUB R4, R1, R5; AND R6, R1, R7; OR R8, R1, R9; XOR R10, R1, R11 (R1 forwarded from MEM/WB)
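The forwarding decision for the DADD sequence can be sketched as a lookup on how far back the producing instruction is. This is a simplified model, assuming a five-stage pipeline, ALU-only producers, and a register file that writes in the first half cycle and reads in the second; the `Instr` type and labels are illustrative, not hardware signal names.

```python
from typing import NamedTuple, Tuple

class Instr(NamedTuple):
    op: str
    dest: str
    srcs: Tuple[str, ...]

# Distance (in instructions) to the producer -> where the operand comes from.
SOURCE_BY_DISTANCE = {
    1: "forward from EX/MEM",
    2: "forward from MEM/WB",
    3: "register file (write 1st half, read 2nd half)",
}

def operand_sources(program):
    """For each instruction, map each source register to where its value is read."""
    result = []
    for i, instr in enumerate(program):
        sources = {}
        for reg in instr.srcs:
            where = "register file"
            for distance in (1, 2, 3):       # nearest earlier writer wins
                j = i - distance
                if j >= 0 and program[j].dest == reg:
                    where = SOURCE_BY_DISTANCE[distance]
                    break
            sources[reg] = where
        result.append(sources)
    return result

PROGRAM = [
    Instr("DADD", "R1", ("R2", "R3")),
    Instr("DSUB", "R4", ("R1", "R5")),
    Instr("AND", "R6", ("R1", "R7")),
    Instr("OR", "R8", ("R1", "R9")),
    Instr("XOR", "R10", ("R1", "R11")),
]

if __name__ == "__main__":
    for instr, sources in zip(PROGRAM, operand_sources(PROGRAM)):
        print(instr.op, sources)
```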
  • 61. Data Hazard: Forwarding • Generalized forwarding pass a result directly to the functional unit that requires it; forward results to not only ALU inputs but also other types of functional units;
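  • 62. Data Hazard: Forwarding • Generalized forwarding DADD R1, R2, R3; LD R4, 0(R1); SD R4, 12(R1) (R1 is forwarded to LD and SD; R4 is forwarded from LD to SD)
  • 63. Data Hazard • Sometimes a stall is necessary: LD R1, 0(R2); DSUB R4, R1, R5. The loaded R1 is available in MEM/WB only after DSUB needs it in EX. Forwarding cannot go backward in time, so the pipeline has to stall.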
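A load followed immediately by an instruction that needs the loaded value in EX costs one stall cycle even with forwarding. The sketch below counts such stalls; it assumes full forwarding otherwise, and the `ex_srcs` field (registers needed at the start of EX) is a modelling choice made here, not something from the slides.

```python
from typing import NamedTuple, Tuple

class Instr(NamedTuple):
    op: str
    dest: str                  # "" when the instruction writes no register
    ex_srcs: Tuple[str, ...]   # registers whose values are needed at the start of EX

def load_use_stalls(program):
    """Count one stall per load whose result the next instruction needs in EX."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        if prev.op == "LD" and prev.dest in cur.ex_srcs:
            stalls += 1
    return stalls

def total_cycles(program, num_stages=5):
    """Cycles to drain the program through the pipeline, including stalls."""
    if not program:
        return 0
    return len(program) + num_stages - 1 + load_use_stalls(program)

if __name__ == "__main__":
    stuck = [Instr("LD", "R1", ("R2",)), Instr("DSUB", "R4", ("R1", "R5"))]
    # SD needs R4 only in MEM, so only its base register R1 is an EX source.
    fine = [Instr("DADD", "R1", ("R2", "R3")),
            Instr("LD", "R4", ("R1",)),
            Instr("SD", "", ("R1",))]
    print(load_use_stalls(stuck), total_cycles(stuck))
    print(load_use_stalls(fine), total_cycles(fine))
```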
  • 64. Outline • Part 1 Basics what’s pipelining pipelining principles RISC and its five-stage pipeline • Part 2 Challenges: Pipeline Hazards structural hazard data hazard control hazard
  • 65. Control Hazard • branches and jumps • Branch hazard a branch may or may not change the PC to a value other than PC+4; taken branch: changes PC to its target address; untaken branch: falls through; PC is not changed till the end of ID;
  • 66. Branch Hazard • Redo IF: essentially a stall. If the branch is untaken, the stall is unnecessary.
  • 67. Branch Hazard: Solutions 4 simple compile time schemes – 1 • Freeze or flush the pipeline hold or delete any instructions after the branch till the branch destination is known; i.e., Redo IF w/o the first IF
  • 68. Branch Hazard: Solutions 4 simple compile time schemes – 2 • Predicted-untaken simply treat every branch as untaken; when the branch is untaken, pipelining as if no hazard.
  • 69. Branch Hazard: Solutions 4 simple compile time schemes – 2 • Predicted-untaken but if the branch is taken: turn fetched instr into a no-op (idle); restart the IF at the branch target addr
  • 70. Branch Hazard: Solutions 4 simple compile time schemes – 3 • Predicted-taken simply treat every branch as taken; does not apply to the five-stage pipeline; applies to scenarios where the branch target addr is known before the branch outcome.
  • 71. Branch Hazard: Solutions 4 simple compile time schemes – 4 • Delayed branch delay the effect of the branch until after the next instruction; pipelining sequence: branch instruction, sequential successor, branch target if taken. Branch delay slot: the next instruction (the sequential successor).
  • 73. Branch Hazard: Performance • Example a deeper pipeline (e.g., in MIPS R4000) with the following branch penalties: [table not included in transcript] and the following branch frequencies: [table not included in transcript] Question: find the effective addition to the CPI arising from branches.
  • 74. Branch Hazard: Performance • Answer find the CPIs by relative frequency x respective penalty: 0.04x2 = 0.08; 0.10x3 = 0.30; total = 0.08+0.30 = 0.38
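The CPI addition is a weighted sum of frequency times penalty. The sketch below uses only the two products that survive in the transcript; the penalty and frequency tables themselves are not reproduced, so the rows are left unlabeled rather than guessed.

```python
def branch_cpi_addition(cases):
    """Sum of (relative frequency x penalty in cycles) over branch cases."""
    return sum(freq * penalty for freq, penalty in cases)

# (relative frequency, penalty cycles) pairs as they appear in the answer slide.
CASES = [(0.04, 2), (0.10, 3)]

if __name__ == "__main__":
    print(round(branch_cpi_addition(CASES), 2))
```

Under these numbers the effective CPI rises from the ideal 1 to about 1.38 before any other hazards are counted.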
  • 75. Conclusion • Pipelining promises fast CPUs by starting the execution of one instruction before completing the previous one. • Classic five-stage pipeline for RISC: IF – ID – EX – MEM – WB • Pipeline hazards limit ideal pipelining: structural/data/control hazards
  • 76. ?
  • 77. Further Readings • RISC wiki http://en.wikipedia.org/wiki/Reduced_instruction_set_computing • MIPS wiki http://en.wikipedia.org/wiki/MIPS_architecture • RISC Processors http://www.scs.carleton.ca/sivarama/org_book/org_book_web/solution_manual/org_soln_one/arch_book_solution_ch14.pdf • …