Aneesh Raveendran
Centre for Development of Advanced
Computing, INDIA
• What is pipelining ?
• Pipeline Taxonomies
• Instruction Pipelines
• MIPS Instruction Pipeline
• Pipeline Hazards
• MIPS Pipelined Datapath
• Load Word Instruction Example
• Pipeline Datapath Example
• Pipeline Control
• Pipeline Instruction Example
• Pipeline Hazards
• Control Hazards
• Data Hazards
• Detecting Data Hazards
• Resolving Data Hazards
• Forwarding Example
• Stalling Example
• Branch Hazards
• Branching Example
• Key terms
• There are two main ways to increase the performance of
a processor through high-level system architecture
• Increasing the memory access speed
• Increasing the number of supported concurrent operations
• Pipelining !
• Parallelism ?
• Pipelining is the process by which instructions are
parallelized over several overlapping stages of
execution, in order to maximize datapath efficiency
• Pipelining is analogous to many everyday scenarios
• Car manufacturing process
• Batch laundry jobs
• Basically, any assembly-line operation applies
• Two important concepts:
• New inputs are accepted at one end before previously
accepted inputs appear as outputs at the other end;
• The number of operations performed per second is increased,
even though the elapsed time needed to perform any one
operation remains the same
Looking at the textbook’s example,
we have a 4-stage pipeline of
laundry tasks:
1. Place one dirty load of clothes
into washer
2. Place the washed clothes into a
dryer
3. Place a dry load on a table and
fold
4. Put the clothes away
Graphically speaking:
• Sequential (top) vs.
• Pipelined (bottom) execution
• There are two types of pipelines used in computer systems
• Arithmetic pipelines
• Used to pipeline data intensive functionalities
• Instruction pipelines
• Used to pipeline the basic instruction fetch and execute sequence
• Other classifications include
• Linear vs. nonlinear pipelines
• Presence (or lack) of feedforward and feedback paths between stages
• Static vs. dynamic pipelines
• Dynamic pipelines are multifunctional, taking on a different form
depending on the function being executed
• Scalar vs. vector pipelines
• Vector pipelines specifically target computations using vector data
• Let us now introduce the pipeline we’re working with
• It’s a 5-stage instruction, linear, static and scalar
pipeline, consisting of the following steps:
• Fetch instruction from Memory (IF)
• Read registers while decoding the instruction (ID)
• Execute the operation or calculate an address (EX)
• Access an operand in data memory (MEM)
• Write the result into a register (WB)
• Again, theoretically, pipeline speedup = number of
stages in pipeline
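The theoretical speedup can be checked with a short calculation. The following is a minimal sketch, assuming every stage takes the same amount of time and no hazards occur; the function names and the `stage_time` parameter are illustrative and do not come from the slides.

```python
def sequential_time(n_tasks, n_stages, stage_time=1):
    """Time when each task finishes every stage before the next task starts."""
    return n_tasks * n_stages * stage_time


def pipelined_time(n_tasks, n_stages, stage_time=1):
    """Time when a new task enters the pipeline every stage_time.

    The first task needs n_stages steps to pass through the pipeline;
    each remaining task completes one step after the previous one.
    """
    return (n_stages + n_tasks - 1) * stage_time


def speedup(n_tasks, n_stages):
    """Ratio of sequential to pipelined execution time."""
    return sequential_time(n_tasks, n_stages) / pipelined_time(n_tasks, n_stages)


if __name__ == "__main__":
    for n in (1, 4, 100, 10_000):
        print(f"{n:>6} tasks, 5 stages: speedup = {speedup(n, 5):.2f}")
```

As the number of tasks grows, the speedup approaches the number of stages without quite reaching it, because some time is always spent filling the pipeline.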
Inst. Fetch (2ns), Reg. read/write (1ns), ALU op. (2ns), Data access (2ns)
[Figure: clock-cycle timing diagrams (Cycle 1 to Cycle 10) comparing the single-cycle, multiple-cycle, and pipelined implementations for Load, Store, and R-type instructions passing through the Ifetch, Reg, Exec, Mem, and Wr stages; the single-cycle diagram shows Load, Store, and a Waste segment]
• Suppose
• 100 instructions are executed
• The single cycle machine has a cycle time of 45 ns
• The multicycle and pipeline machines have cycle times of 10 ns
• The multicycle machine has a CPI of 4.6
• Single Cycle Machine
• 45 ns/cycle x 1 CPI x 100 inst = 4500 ns
• Multicycle Machine
• 10 ns/cycle x 4.6 CPI x 100 inst = 4600 ns
• Ideal pipelined machine
• 10 ns/cycle x (1 CPI x 100 inst + 4 cycle drain) = 1040 ns
• Ideal pipelined vs. single cycle speedup
• 4500 ns / 1040 ns = 4.33
• What has not yet been considered?
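The comparison above can be reproduced with a few lines of arithmetic. This is a minimal sketch using the numbers from the example; the function names are illustrative, and the drain time is assumed to be one cycle per remaining stage of a 5-stage pipeline (4 cycles, as in the example).

```python
def single_cycle_ns(n_inst, cycle_ns):
    """Single-cycle machine: CPI is 1 by construction."""
    return cycle_ns * 1 * n_inst


def multicycle_ns(n_inst, cycle_ns, cpi):
    """Multicycle machine: each instruction takes cpi cycles on average."""
    return cycle_ns * cpi * n_inst


def ideal_pipeline_ns(n_inst, cycle_ns, n_stages):
    """Ideal pipeline: one instruction completes per cycle,
    plus (n_stages - 1) cycles to drain the last instruction."""
    return cycle_ns * (n_inst + n_stages - 1)


if __name__ == "__main__":
    single = single_cycle_ns(100, 45)
    multi = multicycle_ns(100, 10, 4.6)
    piped = ideal_pipeline_ns(100, 10, 5)
    print(f"single cycle: {single:.0f} ns")
    print(f"multicycle:   {multi:.0f} ns")
    print(f"pipelined:    {piped:.0f} ns")
    print(f"speedup vs. single cycle: {single / piped:.2f}")
```

The calculation ignores hazards entirely, which is what the closing question on the slide points to.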
• What makes it easy
• all instructions are the same length
• just a few instruction formats
• memory operands appear only in loads and stores
• What makes it hard?
• structural hazards: suppose we had only one memory
• control hazards: need to worry about branch instructions
• data hazards: an instruction depends on a previous instruction
• We’ll build a simple pipeline and look at these issues
• structural hazards: attempt to use the same resource two
different ways at the same time
• E.g., two instructions try to read the same memory at the same time
• data hazards: attempt to use item before it is ready
• instruction depends on result of prior instruction still in the pipeline
add r1, r2, r3
sub r4, r2, r1
• control hazards: attempt to make a decision before condition is
evaluated
• branch instructions
beq r1, loop
add r1, r2, r3
• Can always resolve hazards by waiting
• pipeline control must detect the hazard
• take action (or delay action) to resolve hazards
What do we need to split the datapath into stages ?
Pipeline registers (buffers) are similar to multicycle processor design
Instruction fetch stage
Instruction decode and register file read stage
Execute or address calculation stage
Memory access stage
Write back stage
Write register number comes from the MEM/WB pipeline register along with the data
Multiple-clock cycle (vs. single-clock cycle) pipelined diagrams
Single-cycle pipeline diagram with one instruction on the pipeline
Single-cycle pipeline diagram with two instructions on the pipeline
• What control signals are required ?
• First, notice that the pipeline registers are written every clock
cycle and hence do not require explicit control signals. For the remaining stages:
• Instruction fetch and PC increment
• Again, asserted at every clock cycle
• Instruction decode and register file read
• Again, asserted at every clock cycle
• Execution and address calculation
• Need to select the result register, the ALU operation, and either Read
data 2 or the sign-extended immediate for the ALU
• Memory access
• Need to read from memory, write to memory or complete branch
• Write back
• Need to send back either ALU result or memory value to the register file
Control lines by stage:
Execution/Address Calculation stage: RegDst, ALUOp1, ALUOp0, ALUSrc
Memory access stage: Branch, MemRead, MemWrite
Write-back stage: RegWrite, MemtoReg

Instruction  RegDst  ALUOp1  ALUOp0  ALUSrc  Branch  MemRead  MemWrite  RegWrite  MemtoReg
R-format     1       1       0       0       0       0        0         1         0
lw           0       0       0       1       0       1        0         1         1
sw           X       0       0       1       0       0        1         0         X
beq          X       0       1       0       1       0        0         0         X
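One way to read the control table above is as a lookup keyed by instruction class, with the nine lines split into the three groups that the EX, MEM, and WB stages consume. The following is a minimal sketch of that idea; the names `CONTROL_TABLE` and `control_for` are hypothetical, and `None` is used here to stand for a don't-care (X) entry.

```python
# Column order follows the table: EX-stage lines, then MEM-stage, then WB-stage.
EX_LINES = ("RegDst", "ALUOp1", "ALUOp0", "ALUSrc")
MEM_LINES = ("Branch", "MemRead", "MemWrite")
WB_LINES = ("RegWrite", "MemtoReg")

# None stands for a don't-care (X) entry in the table.
CONTROL_TABLE = {
    "R-format": (1, 1, 0, 0, 0, 0, 0, 1, 0),
    "lw":       (0, 0, 0, 1, 0, 1, 0, 1, 1),
    "sw":       (None, 0, 0, 1, 0, 0, 1, 0, None),
    "beq":      (None, 0, 1, 0, 1, 0, 0, 0, None),
}


def control_for(instruction):
    """Return the control lines for an instruction class, grouped by stage."""
    names = EX_LINES + MEM_LINES + WB_LINES
    signals = dict(zip(names, CONTROL_TABLE[instruction]))
    return {
        "EX": {name: signals[name] for name in EX_LINES},
        "MEM": {name: signals[name] for name in MEM_LINES},
        "WB": {name: signals[name] for name in WB_LINES},
    }


if __name__ == "__main__":
    for stage, lines in control_for("lw").items():
        print(stage, lines)
```

Grouping the lines by stage mirrors the layout of the table: each stage only needs its own group, so the remaining groups can travel with the instruction until they are used.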
• Structural hazard
• Occurs when a combination of instructions is not supported by the datapath
• For example, a unified memory unit would need to be accessed in stages 1 (IF)
and 4 (MEM), which would cause a contention
• Pipeline outright fails in the presence of structural hazards
• Control hazard
• Occurs when a decision is made based on the result of one instruction, while
others are executing
• For example, a branch instruction is either taken or not
• Solutions that exist are stalling and predicting
• Data hazard
• Occurs when an instruction depends on the results of an instruction resident on
the pipeline
• For example, adding two register contents and storing their result into a third
register, then using that register’s contents for another operation
• Solutions that exist are based on forwarding
• Three major solutions
• Stall
• Predict
• Delayed branch slot
• Stalling involves always waiting for the PC to be updated with the
correct address before moving on
• A pipeline stall (or bubble) allows us to perform this wait
• Quite costly, as we have to stall even if the branch is not taken
• Predicting involves guessing whether the branch is taken or not,
and acting on that guess
• If correct, then proceed with normal pipeline execution
• If incorrect, then stall pipeline execution
• Delayed branch involves executing the next sequential instruction,
with the branch taking place after that delayed branch slot
• The assembler automatically adjusts the instructions to make it
transparent to the programmer
• The instruction has to be safe, as in it shouldn’t affect the branch
• Longer pipelines require the use of more branch delay slots
• Actual MIPS architecture solution
• Forwarding involves providing the inputs to a stage of one
instruction before the completion of another instruction
• Valid if destination stage is later in time than the source stage
• Left diagram shows typical forwarding scenario (add then sub)
• Right diagram shows that we still need a stall in the case of a load-
use data hazard (load then R-type)
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $14, 100($2)
• We could insert “no operation” (nop) instructions to delay the
pipeline execution until the correct result is in the register file
sub $2, $1, $3
nop
nop
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $14, 100($2)
• Too slow as it adds extra useless clock cycles
• In reality, we try to find useful instructions to execute between data-
dependent instructions, but this happens too often to be efficient
• Let us try to formalize detecting a data hazard
1. EX/MEM.RegisterRd = ID/EX.RegisterRs
2. EX/MEM.RegisterRd = ID/EX.RegisterRt
3. MEM/WB.RegisterRd = ID/EX.RegisterRs
4. MEM/WB.RegisterRd = ID/EX.RegisterRt
sub $2, $1, $3
and $12, $2, $5    Data hazard of type #1
or $13, $6, $2     Data hazard of type #4
add $14, $2, $2    No data hazard (register file)
sw $14, 100($2)    No data hazard (correct operation)
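The four conditions can be checked mechanically. The sketch below is a simplified illustration, assuming an ideal pipeline without stalls, so that when an instruction is in EX the previous instruction sits in EX/MEM and the one before that in MEM/WB. The `Instr` record and `classify_hazards` function are hypothetical names. Only the first four instructions of the sequence are modeled, because the store has no destination register and does not fit this simple record.

```python
from collections import namedtuple

# Hypothetical instruction record: rd is the destination, rs and rt the sources.
Instr = namedtuple("Instr", "op rd rs rt")


def classify_hazards(program, i):
    """Return the hazard types (1 to 4) seen when program[i] is in EX.

    program[i - 1] is then held in EX/MEM and program[i - 2] in MEM/WB.
    """
    cur = program[i]
    found = []
    if i >= 1:
        ex_mem_rd = program[i - 1].rd
        if ex_mem_rd == cur.rs:
            found.append(1)  # EX/MEM.RegisterRd = ID/EX.RegisterRs
        if ex_mem_rd == cur.rt:
            found.append(2)  # EX/MEM.RegisterRd = ID/EX.RegisterRt
    if i >= 2:
        mem_wb_rd = program[i - 2].rd
        if mem_wb_rd == cur.rs:
            found.append(3)  # MEM/WB.RegisterRd = ID/EX.RegisterRs
        if mem_wb_rd == cur.rt:
            found.append(4)  # MEM/WB.RegisterRd = ID/EX.RegisterRt
    return found


program = [
    Instr("sub", 2, 1, 3),
    Instr("and", 12, 2, 5),
    Instr("or", 13, 6, 2),
    Instr("add", 14, 2, 2),
]

if __name__ == "__main__":
    for index, instr in enumerate(program):
        print(instr.op, classify_hazards(program, index))
```

For this sequence the sketch reports type 1 for `and`, type 4 for `or`, and nothing for `add`, which agrees with the annotations above.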
• Two modifications are in order
• Firstly, we don’t have to forward all the time!
• Some instructions don’t write registers (e.g. beq)
• Use RegWrite signal in WB control block to determine condition
• Secondly, the $0 register must always return 0
• Can’t prevent the programmer from using it as a destination register
• Use RegisterRd to determine if $0 is being used
1. If (EX/MEM.RegWrite & (EX/MEM.RegisterRd ≠ 0) & (EX/MEM.RegisterRd=ID/EX.RegisterRs)) ForwardA= 10
2. If (EX/MEM.RegWrite & (EX/MEM.RegisterRd ≠ 0) & (EX/MEM.RegisterRd=ID/EX.RegisterRt)) ForwardB= 10
3. If (MEM/WB.RegWrite & (MEM/WB.RegisterRd ≠ 0) & (MEM/WB.RegisterRd=ID/EX.RegisterRs)) ForwardA= 01
4. If (MEM/WB.RegWrite & (MEM/WB.RegisterRd ≠ 0) & (MEM/WB.RegisterRd=ID/EX.RegisterRt)) ForwardB= 01
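The four refined conditions translate almost directly into code. The following is a minimal sketch of a forwarding unit as a pure function; the function and parameter names are illustrative. One detail is an assumption that the conditions above do not state: when both EX/MEM and MEM/WB match the same source register, the sketch lets EX/MEM win, since it holds the more recent result.

```python
def forwarding_unit(id_ex_rs, id_ex_rt,
                    ex_mem_reg_write, ex_mem_rd,
                    mem_wb_reg_write, mem_wb_rd):
    """Return (ForwardA, ForwardB) as 2-bit mux selects.

    0b00: operand comes from the register file (ID/EX)
    0b10: operand forwarded from EX/MEM
    0b01: operand forwarded from MEM/WB
    """
    forward_a = 0b00
    forward_b = 0b00

    # Conditions 3 and 4: result waiting in MEM/WB.
    if mem_wb_reg_write and mem_wb_rd != 0:
        if mem_wb_rd == id_ex_rs:
            forward_a = 0b01
        if mem_wb_rd == id_ex_rt:
            forward_b = 0b01

    # Conditions 1 and 2: result waiting in EX/MEM.
    # Checked last so that the more recent result overrides (assumption).
    if ex_mem_reg_write and ex_mem_rd != 0:
        if ex_mem_rd == id_ex_rs:
            forward_a = 0b10
        if ex_mem_rd == id_ex_rt:
            forward_b = 0b10

    return forward_a, forward_b


if __name__ == "__main__":
    # and $12, $2, $5 in EX while sub $2, $1, $3 is in EX/MEM
    print(forwarding_unit(2, 5, True, 2, False, 0))
    # or $13, $6, $2 in EX while and is in EX/MEM and sub is in MEM/WB
    print(forwarding_unit(6, 2, True, 12, True, 2))
```

Writes to register $0 and instructions that do not assert RegWrite leave both selects at 00, as the two modifications require.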
• Let us examine the hardware changes to our datapath
• Remember that there is no hazard in the WB stage,
because the register file can be written and read in
the same clock cycle
Mux control    Source  Description
ForwardA = 00  ID/EX   First ALU operand comes from RF
ForwardA = 10  EX/MEM  First ALU operand forwarded from prior ALU result
ForwardA = 01  MEM/WB  First ALU operand forwarded from data memory or an earlier ALU result
ForwardB = 00  ID/EX   Second ALU operand comes from RF
ForwardB = 10  EX/MEM  Second ALU operand forwarded from prior ALU result
ForwardB = 01  MEM/WB  Second ALU operand forwarded from data memory or an earlier ALU result
lw $2, 20($1)
and $4, $2, $5
or $8, $2, $6
add $9, $4, $2
slt $1, $6, $7
• Let us try to formalize detecting a stalling data hazard
• If (ID/EX.MemRead & ((ID/EX.RegisterRt = IF/ID.RegisterRs) or (ID/EX.RegisterRt = IF/ID.RegisterRt)))
• On the condition being true, we stall the pipeline!
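The load-use condition is small enough to express as a single predicate. This is a minimal sketch with illustrative names; the arguments stand for the fields of the ID/EX and IF/ID pipeline registers named in the condition above.

```python
def must_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Return True when the instruction in ID needs a value that the
    load currently in EX has not yet read from memory."""
    return bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)


if __name__ == "__main__":
    # lw $2, 20($1) is in EX (ID/EX.RegisterRt = 2),
    # and $4, $2, $5 is in ID (IF/ID.RegisterRs = 2, IF/ID.RegisterRt = 5)
    print(must_stall(True, 2, 2, 5))
    # The same register match without a load in EX does not force a stall
    print(must_stall(False, 2, 2, 5))
```

Only a load in EX can trigger the stall, because any other producer already has its result available for forwarding at the end of EX.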
• Other instructions are on the pipeline when we find out whether
we take the branch or not!
• Two solutions
• Assume branch is not taken
• Dynamic branch prediction
• We’ve already discussed the first solution
• Note that three instruction stages have to be flushed when the
branch is taken
• Done similarly to a data hazard stall (control values set to 0s)
• We can increase branch performance by moving the branch
decision to the ID stage (rather than the MEM stage)
• Branch target address calculated by moving adder into ID stage
• Branch decision done by comparing Rs and Rt
• Flushing the IF stage instruction involves nop instructions
• Store, in a branch prediction buffer, the history of each branch
instruction
• A 1-bit scheme changes its prediction after one wrong prediction
• A 2-bit scheme changes its prediction only after two consecutive wrong predictions
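The difference between the two schemes shows up clearly on a loop branch. The sketch below is an illustration under stated assumptions: it models a single buffer entry as a saturating counter (one common way to realize 1-bit and 2-bit history), starts the counter in the strongest "taken" state, and uses a made-up outcome pattern of nine taken branches followed by one not-taken branch, repeated three times. The function name is hypothetical.

```python
def mispredictions(outcomes, bits):
    """Count wrong guesses of one saturating-counter predictor entry.

    outcomes: iterable of booleans, True meaning the branch was taken.
    bits: width of the counter (1 or 2).
    """
    max_state = (1 << bits) - 1
    threshold = (max_state + 1) // 2   # predict taken when state >= threshold
    state = max_state                  # assumption: start at strongest "taken"
    wrong = 0
    for taken in outcomes:
        predicted_taken = state >= threshold
        if predicted_taken != taken:
            wrong += 1
        # Move the counter toward the actual outcome, saturating at the ends.
        if taken:
            state = min(state + 1, max_state)
        else:
            state = max(state - 1, 0)
    return wrong


# Hypothetical loop branch: taken nine times, then falls through, three times over.
pattern = ([True] * 9 + [False]) * 3

if __name__ == "__main__":
    print("1-bit mispredictions:", mispredictions(pattern, 1))
    print("2-bit mispredictions:", mispredictions(pattern, 2))
```

With one bit, every loop exit costs two mispredictions (the exit itself and the first iteration of the next run); with two bits, the single not-taken outcome is not enough to flip the prediction, so only the exit is mispredicted.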
• Pipelining vs. Parallelism
• Pipeline Stages
• Pipeline Taxonomies
• MIPS Instruction Pipeline
• Structural Hazards
• Control Hazards
• Data Hazards
• Pipeline Registers and Operation
• Pipeline Control
• Pipeline Throughput
• Pipeline Efficiency
• Control Hazard Stalling
• Control Hazard Predicting
• Control Hazard Delayed Branch
• Data Hazard Forwarding
• Data Hazard Detection
• Forwarding Unit
• Data Hazard Stalling
• Branch Prediction Buffer
aneeshr2020@gmail.com

"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
 
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
 
ScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking ReplicationScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking Replication
 
What is an RPA CoE? Session 2 – CoE Roles
What is an RPA CoE?  Session 2 – CoE RolesWhat is an RPA CoE?  Session 2 – CoE Roles
What is an RPA CoE? Session 2 – CoE Roles
 
Introducing BoxLang : A new JVM language for productivity and modularity!
Introducing BoxLang : A new JVM language for productivity and modularity!Introducing BoxLang : A new JVM language for productivity and modularity!
Introducing BoxLang : A new JVM language for productivity and modularity!
 
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
 

Performance Enhancement with Pipelining

  • 1. Aneesh Raveendran Centre for Development of Advanced Computing, INDIA
  • 2. • What is pipelining ? • Pipeline Taxonomies • Instruction Pipelines • MIPS Instruction Pipeline • Pipeline Hazards • MIPS Pipelined Datapath • Load Word Instruction Example • Pipeline Datapath Example • Pipeline Control • Pipeline Instruction Example
  • 3. • Pipeline Hazards • Control Hazards • Data Hazards • Detecting Data Hazards • Resolving Data Hazards • Forwarding Example • Stalling Example • Branch Hazards • Branching Example • Key terms
  • 4. • There are two main ways to increase the performance of a processor through high-level system architecture • Increasing the memory access speed • Increasing the number of supported concurrent operations • Pipelining ! • Parallelism ? • Pipelining is the process by which instructions are parallelized over several overlapping stages of execution, in order to maximize datapath efficiency
  • 5. • Pipelining is analogous to many everyday scenarios • Car manufacturing process • Batch laundry jobs • Basically, any assembly-line operation applies • Two important concepts: • New inputs are accepted at one end before previously accepted inputs appear as outputs at the other end; • The number of operations performed per second is increased, even though the elapsed time needed to perform any one operation remains the same
  • 6. Looking at the textbook’s example, we have a 4-stage pipeline of laundry tasks: 1. Place one dirty load of clothes into washer 2. Place the washed clothes into a dryer 3. Place a dry load on a table and fold 4. Put the clothes away Graphically speaking: • Sequential (top) vs. • Pipelined (bottom) execution
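The laundry analogy can be sketched numerically. This is a minimal illustration, assuming every stage takes exactly one time unit and that a new load can enter the washer as soon as the previous load leaves it; the function names are hypothetical.

```python
def sequential_time(loads: int, stages: int) -> int:
    """Time units when each load finishes all stages before the next one starts."""
    return loads * stages


def pipelined_time(loads: int, stages: int) -> int:
    """Time units when a new load enters the first stage every time unit.

    The first load needs `stages` units; each later load finishes one unit
    after the load in front of it.
    """
    if loads == 0:
        return 0
    return stages + (loads - 1)


if __name__ == "__main__":
    for loads in (1, 4, 100):
        seq = sequential_time(loads, 4)
        pipe = pipelined_time(loads, 4)
        print(f"{loads} loads: sequential={seq}, pipelined={pipe}")
```

With four loads the sequential schedule needs 16 units and the pipelined one 7, while a single load still takes 4 units either way: throughput improves, latency does not.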
  • 7. • There are two types of pipelines used in computer systems • Arithmetic pipelines • Used to pipeline data intensive functionalities • Instruction pipelines • Used to pipeline the basic instruction fetch and execute sequence • Other classifications include • Linear vs. nonlinear pipelines • Presence (or lack) of feedforward and feedback paths between stages • Static vs. dynamic pipelines • Dynamic pipelines are multifunctional, taking on a different form depending on the function being executed • Scalar vs. vector pipelines • Vector pipelines specifically target computations using vector data
  • 8. • Let us now introduce the pipeline we’re working with • It’s a 5-stage instruction, linear, static and scalar pipeline, consisting of the following steps: • Fetch instruction from Memory (IF) • Read registers while decoding the instruction (ID) • Execute the operation or calculate an address (EX) • Access an operand in data memory (MEM) • Write the result into a register (WB) • Again, theoretically, pipeline speedup = number of stages in pipeline
  • 9. Inst. Fetch (2ns), Reg. read/write (1ns), ALU op. (2ns), Data access (2ns)
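The stage latencies above can be turned into per-instruction totals. This is a sketch under an assumption that is not stated on the slide: which functional units each instruction class uses follows the usual textbook treatment (lw uses all five steps, sw skips the register write, R-format skips data access, beq needs neither).

```python
# Latencies from the slide, in nanoseconds.
STAGE_NS = {"IF": 2, "REG_READ": 1, "ALU": 2, "MEM": 2, "REG_WRITE": 1}

# Assumed unit usage per instruction class (not given on the slide).
UNITS_USED = {
    "lw": ["IF", "REG_READ", "ALU", "MEM", "REG_WRITE"],
    "sw": ["IF", "REG_READ", "ALU", "MEM"],
    "R-format": ["IF", "REG_READ", "ALU", "REG_WRITE"],
    "beq": ["IF", "REG_READ", "ALU"],
}

# Total latency of each class if its units are used back to back.
latency_ns = {
    name: sum(STAGE_NS[unit] for unit in units)
    for name, units in UNITS_USED.items()
}

# A single-cycle clock must fit the slowest instruction;
# a pipelined clock must fit the slowest stage.
single_cycle_clock_ns = max(latency_ns.values())
pipelined_clock_ns = max(STAGE_NS.values())

if __name__ == "__main__":
    print(latency_ns)
    print("single-cycle clock:", single_cycle_clock_ns, "ns")
    print("pipelined clock:", pipelined_clock_ns, "ns")
```

Under these assumptions the load is the slowest instruction at 8 ns, while the pipelined clock is bounded by the slowest stage at 2 ns.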
  • 10. [Figure: timing diagrams over clock cycles 1 to 10. Single Cycle Implementation: Load, then Store with wasted time at the end of its cycle. Multiple Cycle Implementation: Load (Ifetch, Reg, Exec, Mem, Wr), Store (Ifetch, Reg, Exec, Mem), R-type (Ifetch, ...). Pipeline Implementation: Load, Store and R-type overlapped, each passing through Ifetch, Reg, Exec, Mem, Wr.]
  • 11. • Suppose • 100 instructions are executed • The single cycle machine has a cycle time of 45 ns • The multicycle and pipeline machines have cycle times of 10 ns • The multicycle machine has a CPI of 4.6 • Single Cycle Machine • 45 ns/cycle x 1 CPI x 100 inst = 4500 ns • Multicycle Machine • 10 ns/cycle x 4.6 CPI x 100 inst = 4600 ns • Ideal pipelined machine • 10 ns/cycle x (1 CPI x 100 inst + 4 cycle drain) = 1040 ns • Ideal pipelined vs. single cycle speedup • 4500 ns / 1040 ns = 4.33 • What has not yet been considered?
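The arithmetic on this slide can be reproduced with a short script. It is only a sketch: the function names are hypothetical, and the pipeline is assumed ideal (no hazards, one instruction completing per cycle once the pipeline is full).

```python
def single_cycle_ns(instructions: int, cycle_ns: float) -> float:
    """Single-cycle machine: CPI is 1 by construction."""
    return instructions * cycle_ns


def multicycle_ns(instructions: int, cycle_ns: float, cpi: float) -> float:
    """Multicycle machine: average CPI depends on the instruction mix."""
    return instructions * cpi * cycle_ns


def ideal_pipeline_ns(instructions: int, cycle_ns: float, stages: int) -> float:
    """Ideal pipeline: one instruction per cycle plus (stages - 1) drain cycles."""
    return (instructions + stages - 1) * cycle_ns


if __name__ == "__main__":
    n = 100
    single = single_cycle_ns(n, 45)
    multi = multicycle_ns(n, 10, 4.6)
    pipe = ideal_pipeline_ns(n, 10, 5)
    print(f"single cycle: {single:.0f} ns")
    print(f"multicycle:   {multi:.0f} ns")
    print(f"pipelined:    {pipe:.0f} ns")
    print(f"speedup vs. single cycle: {single / pipe:.2f}")
```

The drain term is why the speedup (about 4.33) falls short of 4.5, the ratio of the two cycle times; hazards, which the slide asks about next, reduce it further.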
  • 12. • What makes it easy • all instructions are the same length • just a few instruction formats • memory operands appear only in loads and stores • What makes it hard? • structural hazards: suppose we had only one memory • control hazards: need to worry about branch instructions • data hazards: an instruction depends on a previous instruction • We’ll build a simple pipeline and look at these issues
  • 13. • structural hazards: attempt to use the same resource in two different ways at the same time • E.g., two instructions try to read the same memory at the same time • data hazards: attempt to use an item before it is ready • an instruction depends on the result of a prior instruction still in the pipeline, e.g. add r1, r2, r3 followed by sub r4, r2, r1 • control hazards: attempt to make a decision before the condition is evaluated • branch instructions, e.g. beq r1, loop followed by add r1, r2, r3 • Can always resolve hazards by waiting • pipeline control must detect the hazard • take action (or delay action) to resolve hazards
  • 14. What do we need to split the datapath into stages ?
  • 15. Pipeline registers (buffers) are similar to multicycle processor design
  • 17. Instruction decode and register file read stage
  • 18. Execute or address calculation stage
  • 21. Write register number comes from the MEM/WB pipeline register along with the data
  • 22. Multiple-clock cycle (vs. single-clock cycle) pipelined diagrams
  • 23. Single-cycle pipeline diagram with one instruction on the pipeline
  • 24. Single-cycle pipeline diagram with two instructions on the pipeline
  • 25. • What control signals are required ? • First, notice that the pipeline registers are written every clock cycle, hence do not require explicit control signals, otherwise: • Instruction fetch and PC increment • Again, asserted at every clock cycle • Instruction decode and register file read • Again, asserted at every clock cycle • Execution and address calculation • Need to select the result register, the ALU operation, and either Read data 2 or the sign-extended immediate for the ALU • Memory access • Need to read from memory, write to memory or complete branch • Write back • Need to send back either ALU result or memory value to the register file
  • 27. Control lines grouped by stage. Execution/Address Calculation stage: RegDst, ALUOp1, ALUOp0, ALUSrc. Memory access stage: Branch, MemRead, MemWrite. Write-back stage: RegWrite, MemtoReg.
        Instruction  RegDst  ALUOp1  ALUOp0  ALUSrc  Branch  MemRead  MemWrite  RegWrite  MemtoReg
        R-format     1       1       0       0       0       0        0         1         0
        lw           0       0       0       1       0       1        0         1         1
        sw           X       0       0       1       0       0        1         0         X
        beq          X       0       1       0       1       0        0         0         X
  • 38. • Structural hazard • Occurs when a combination of instructions is not supported by the datapath • For example, a unified memory unit would need to be accessed in stages 1 (IF) and 4 (MEM), which would cause contention • Pipeline outright fails in the presence of structural hazards • Control hazard • Occurs when a decision is made based on the result of one instruction while others are executing • For example, a branch instruction is either taken or not • Solutions that exist are stalling and predicting • Data hazard • Occurs when an instruction depends on the result of an instruction still resident in the pipeline • For example, adding two register contents and storing their result into a third register, then using that register’s contents for another operation • Solutions that exist are based on forwarding
  • 39. • Three major solutions • Stall • Predict • Delayed branch slot • Stalling involves always waiting for the PC to be updated with the correct address before moving on • A pipeline stall (or bubble) allows us to perform this wait • Quite costly, as we have to stall even if the branch fails
  • 40. • Predicting involves guessing whether the branch is taken or not, and acting on that guess • If correct, then proceed with normal pipeline execution • If incorrect, then stall pipeline execution
  • 41. • Delayed branch involves executing the next sequential instruction, with the branch taking place after that delayed branch slot • The assembler automatically adjusts the instructions to make it transparent to the programmer • The instruction has to be safe, as in it shouldn’t affect the branch • Longer pipelines require the use of more branch delay slots • Actual MIPS architecture solution
  • 42. • Forwarding involves providing the inputs to a stage of one instruction before the completion of another instruction • Valid if destination stage is later in time than the source stage • Left diagram shows typical forwarding scenario (add then sub) • Right diagram shows that we still need a stall in the case of a load-use data hazard (load then R-type)
  • 43. sub $2, $1, $3
        and $12, $2, $5
        or  $13, $6, $2
        add $14, $2, $2
        sw  $15, 100($2)
  • 44. • We could insert “no operation” (nop) instructions to delay the pipeline execution until the correct result is in the register file
        sub $2, $1, $3
        nop
        nop
        and $12, $2, $5
        or  $13, $6, $2
        add $14, $2, $2
        sw  $15, 100($2)
      • Too slow as it adds extra useless clock cycles • In reality, we try to find useful instructions to execute between data-dependent instructions, but this happens too often to be efficient
  • 45. • Let us try to formalize detecting a data hazard
        1. EX/MEM.RegisterRd = ID/EX.RegisterRs
        2. EX/MEM.RegisterRd = ID/EX.RegisterRt
        3. MEM/WB.RegisterRd = ID/EX.RegisterRs
        4. MEM/WB.RegisterRd = ID/EX.RegisterRt
        sub $2, $1, $3
        and $12, $2, $5     Data hazard of type #1
        or  $13, $6, $2     Data hazard of type #4
        add $14, $2, $2     No data hazard: register file
        sw  $15, 100($2)    No data hazard: correct operation
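The four conditions can be checked mechanically against an instruction sequence. The sketch below is an illustration only: it assumes one instruction enters the pipeline per cycle with no stalls, so that when instruction i is in ID/EX, instruction i-1 is in EX/MEM and instruction i-2 is in MEM/WB. The `Instr` type and `hazard_types` helper are hypothetical names, and the store is written as `sw $15, 100($2)` so that it does not read the register written by the add.

```python
from typing import List, NamedTuple, Optional


class Instr(NamedTuple):
    text: str
    rd: Optional[int]  # register written, or None if the instruction writes none
    rs: int            # first source register
    rt: int            # second source register


def hazard_types(program: List[Instr], i: int) -> List[int]:
    """Return which of conditions 1-4 hold when program[i] is in ID/EX."""
    cur = program[i]
    found = []
    if i >= 1 and program[i - 1].rd is not None:   # instruction in EX/MEM
        ex_mem_rd = program[i - 1].rd
        if ex_mem_rd == cur.rs:
            found.append(1)
        if ex_mem_rd == cur.rt:
            found.append(2)
    if i >= 2 and program[i - 2].rd is not None:   # instruction in MEM/WB
        mem_wb_rd = program[i - 2].rd
        if mem_wb_rd == cur.rs:
            found.append(3)
        if mem_wb_rd == cur.rt:
            found.append(4)
    return found


PROGRAM = [
    Instr("sub $2, $1, $3", rd=2, rs=1, rt=3),
    Instr("and $12, $2, $5", rd=12, rs=2, rt=5),
    Instr("or $13, $6, $2", rd=13, rs=6, rt=2),
    Instr("add $14, $2, $2", rd=14, rs=2, rt=2),
    Instr("sw $15, 100($2)", rd=None, rs=2, rt=15),
]

if __name__ == "__main__":
    for index, instr in enumerate(PROGRAM):
        print(f"{instr.text:<18} hazards: {hazard_types(PROGRAM, index)}")
```

The add reads $2 three instructions after the sub writes it, so none of the four conditions fire; as the slide notes, that case is handled by the register file itself.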
  • 46. • Two modifications are in order • Firstly, we don’t have to forward all the time! • Some instructions don’t write registers (e.g. beq) • Use RegWrite signal in WB control block to determine condition • Secondly, the $0 register must always return 0 • We can’t prevent the programmer from using it as a destination register • Use RegisterRd to determine if $0 is being used
        1. If (EX/MEM.RegWrite & (EX/MEM.RegisterRd ≠ 0) & (EX/MEM.RegisterRd = ID/EX.RegisterRs)) ForwardA = 10
        2. If (EX/MEM.RegWrite & (EX/MEM.RegisterRd ≠ 0) & (EX/MEM.RegisterRd = ID/EX.RegisterRt)) ForwardB = 10
        3. If (MEM/WB.RegWrite & (MEM/WB.RegisterRd ≠ 0) & (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01
        4. If (MEM/WB.RegWrite & (MEM/WB.RegisterRd ≠ 0) & (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01
      • Let us examine the hardware changes to our datapath
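The four forwarding conditions can be modelled as a small pure function. This is a sketch, not the deck's hardware: the function name and argument names are hypothetical, and one choice goes beyond the slide, namely that when both EX/MEM and MEM/WB match the same source register, the EX/MEM value wins because it is the more recent result.

```python
from typing import Tuple


def forwarding_unit(
    ex_mem_regwrite: bool,
    ex_mem_rd: int,
    mem_wb_regwrite: bool,
    mem_wb_rd: int,
    id_ex_rs: int,
    id_ex_rt: int,
) -> Tuple[int, int]:
    """Return (ForwardA, ForwardB) as 2-bit mux selects.

    0b00: operand comes from the register file (ID/EX)
    0b10: operand forwarded from EX/MEM
    0b01: operand forwarded from MEM/WB
    """
    forward_a = 0b00
    forward_b = 0b00

    # Conditions 3 and 4: the instruction two ahead is about to write back.
    if mem_wb_regwrite and mem_wb_rd != 0:
        if mem_wb_rd == id_ex_rs:
            forward_a = 0b01
        if mem_wb_rd == id_ex_rt:
            forward_b = 0b01

    # Conditions 1 and 2 are applied last so the newer EX/MEM result
    # overrides an older MEM/WB match (assumption, see lead-in).
    if ex_mem_regwrite and ex_mem_rd != 0:
        if ex_mem_rd == id_ex_rs:
            forward_a = 0b10
        if ex_mem_rd == id_ex_rt:
            forward_b = 0b10

    return forward_a, forward_b


if __name__ == "__main__":
    # "and $12, $2, $5" in EX while "sub $2, $1, $3" is in MEM.
    print(forwarding_unit(True, 2, False, 0, id_ex_rs=2, id_ex_rt=5))
    # "or $13, $6, $2" in EX: "and" is in MEM, "sub" is in WB.
    print(forwarding_unit(True, 12, True, 2, id_ex_rs=6, id_ex_rt=2))
```

Writes to $0 never trigger forwarding, which matches the second modification on the slide.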
  • 48. • Remember that there is no hazard in the WB stage, because the register file can be written and read in the same clock cycle
        Mux control    Source  Description
        ForwardA = 00  ID/EX   First ALU operand comes from the register file
        ForwardA = 10  EX/MEM  First ALU operand forwarded from prior ALU result
        ForwardA = 01  MEM/WB  First ALU operand forwarded from data memory or an earlier ALU result
        ForwardB = 00  ID/EX   Second ALU operand comes from the register file
        ForwardB = 10  EX/MEM  Second ALU operand forwarded from prior ALU result
        ForwardB = 01  MEM/WB  Second ALU operand forwarded from data memory or an earlier ALU result
  • 55. lw  $2, 20($1)
        and $4, $2, $5
        or  $8, $2, $6
        add $9, $4, $2
        slt $1, $6, $7
  • 56. • Let us try to formalize detecting a stalling data hazard • If (ID/EX.MemRead & ((ID/EX.RegisterRt = IF/ID.RegisterRs) or (ID/EX.RegisterRt = IF/ID.RegisterRt))) • On the condition being true, we stall the pipeline!
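The load-use condition translates directly into a predicate. This is a minimal sketch with hypothetical names; it only evaluates the condition from the slide and does not model how the stall itself is carried out.

```python
def must_stall(
    id_ex_memread: bool,
    id_ex_rt: int,
    if_id_rs: int,
    if_id_rt: int,
) -> bool:
    """True when the instruction in ID needs the value a load in EX has not read yet.

    A load writes the register named in its rt field, so that field is
    compared with both source registers of the instruction being decoded.
    """
    return id_ex_memread and (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt)


if __name__ == "__main__":
    # "lw $2, 20($1)" in EX, "and $4, $2, $5" in ID: stall one cycle.
    print(must_stall(True, id_ex_rt=2, if_id_rs=2, if_id_rt=5))
    # An R-type instruction in EX never sets MemRead: forwarding is enough.
    print(must_stall(False, id_ex_rt=2, if_id_rs=2, if_id_rt=5))
```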
  • 64. • Other instructions are on the pipeline when we find out whether we take the branch or not!
  • 65. • Two solutions • Assume branch is not taken • Dynamic branch prediction • We’ve already discussed the first solution • Note that three instruction stages have to be flushed when the branch is taken • Done similarly to a data hazard stall (control values set to 0s) • We can increase branch performance by moving the branch decision to the ID stage (rather than the MEM stage) • Branch target address calculated by moving adder into ID stage • Branch decision done by comparing Rs and Rt • Flushing the IF stage instruction involves nop instructions
  • 69. • Store, in a branch prediction buffer, the history of each branch instruction • A 1-bit scheme changes its prediction after one wrong prediction • A 2-bit scheme changes its prediction only after two wrong predictions
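The difference between the two schemes shows up on a loop branch. The sketch below is an illustration under stated assumptions: a single branch with its own buffer entry, a 2-bit scheme modelled as a saturating counter (one common realization, not necessarily the one in the deck's figures), and both predictors initialized to predict taken. Function names are hypothetical.

```python
from typing import Iterable


def mispredictions_1bit(outcomes: Iterable[bool], initial: bool = True) -> int:
    """1-bit scheme: predict whatever the branch did last time."""
    last = initial
    misses = 0
    for taken in outcomes:
        if last != taken:
            misses += 1
        last = taken
    return misses


def mispredictions_2bit(outcomes: Iterable[bool], initial: int = 3) -> int:
    """2-bit saturating counter: 0-1 predict not taken, 2-3 predict taken."""
    counter = initial
    misses = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken != taken:
            misses += 1
        # Move one step toward the actual outcome, saturating at 0 and 3.
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return misses


# A loop branch taken nine times and then not taken, executed three times.
LOOP_OUTCOMES = ([True] * 9 + [False]) * 3

if __name__ == "__main__":
    print("1-bit misses:", mispredictions_1bit(LOOP_OUTCOMES))
    print("2-bit misses:", mispredictions_2bit(LOOP_OUTCOMES))
```

The 1-bit scheme mispredicts twice per loop execution after the first (once on exit and once on re-entry), while the 2-bit counter mispredicts only on the exit.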
  • 70. • Pipelining vs. Parallelism • Pipeline Stages • Pipeline Taxonomies • MIPS Instruction Pipeline • Structural Hazards • Control Hazards • Data Hazards • Pipeline Registers and Operation • Pipeline Control • Pipeline Throughput • Pipeline Efficiency
  • 71. • Control Hazard Stalling • Control Hazard Predicting • Control Hazard Delayed Branch • Data Hazard Forwarding • Data Hazard Detection • Forwarding Unit • Data Hazard Stalling • Branch Prediction Buffer
  • 72. aneeshr2020@gmail.com

Editor's Notes

  1. Here are the timing diagrams showing the differences between the single cycle, multiple cycle, and pipeline implementations. For example, in the pipeline implementation, we can finish executing the Load, Store, and R-type instruction sequence in seven cycles. In the multiple cycle implementation, however, we cannot start executing the Store until Cycle 6 because we must wait for the Load instruction to complete. Similarly, we cannot start the execution of the R-type instruction until the Store instruction has completed its execution in Cycle 9. In the single cycle implementation, the cycle time is set to accommodate the longest instruction, the Load instruction. Consequently, the cycle time for the single cycle implementation can be five times longer than that of the multiple cycle implementation. Perhaps more importantly, since the cycle time has to be long enough for the Load instruction, it is too long for the Store instruction, so the last part of the cycle is wasted.