1
Branch Prediction
Aneesh Raveendran
Centre for Development of Advanced Computing, INDIA
aneeshr2020@gmail.com
2
Outline
• What are branches?
• Reducing branch penalties
• Branch prediction
• Why is branch prediction necessary?
• Branch prediction basics
• Issues which affect accurate branch prediction
• Examples of real predictors
3
Branches
• Instructions which can alter the flow of
instruction execution in a program
4
Types of Branches
            Conditional            Unconditional
Direct      if-then-else           procedure calls (jal)
            for loops              goto (j)
            (beqz, bnez, etc.)
Indirect                           return (jr)
                                   virtual function lookup
                                   function pointers (jalr)
5
Techniques for handling branches
[Figure: five-stage pipeline: IF ID EX MEM WB]
• Stalling
• Branch delay slots
– Relies on programmer/compiler to fill
– Depends on being able to find suitable instructions
– Ties resolution delay to a particular pipeline
• Predication
– “if-conversion”: changes a control dependence into a data dependence on the branch condition
6
Why aren’t these techniques acceptable?
• Branches are frequent: 15-25% of instructions
• Today’s pipelines are deeper and wider
– Higher performance penalty for stalling
– Mis-prediction Penalty = issue width * resolution delay (in cycles)
• A lot of cycles can be wasted!
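The penalty formula on this slide can be made concrete with a little arithmetic. The following is a minimal sketch; the issue width, resolution delay, branch fraction and misprediction rate are illustrative assumptions, not figures for any particular processor, and the function names are hypothetical.

```python
def misprediction_penalty(issue_width: int, resolution_delay: int) -> int:
    """Issue slots lost per mispredicted branch (issue width * resolution delay)."""
    return issue_width * resolution_delay


def wasted_slots_per_1000_instructions(issue_width, resolution_delay,
                                        branch_fraction, mispredict_rate):
    """Expected issue slots lost per 1000 instructions."""
    mispredicts = 1000 * branch_fraction * mispredict_rate
    return mispredicts * misprediction_penalty(issue_width, resolution_delay)


# Assumed example: a 4-wide machine that resolves branches 10 cycles after fetch,
# with 20% branches and a 10% misprediction rate.
print(misprediction_penalty(4, 10))
print(wasted_slots_per_1000_instructions(4, 10, 0.20, 0.10))
```

Under these assumptions a deeper or wider pipeline raises the cost linearly, which is why accurate prediction matters more as pipelines grow.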
7
Branch Prediction
• Predicting the outcome of a branch
– Direction:
• Taken / Not Taken
• Direction predictors
– Target Address
• PC+offset (Taken)/ PC+4 (Not Taken)
• Target address predictors
– Branch Target Address Cache (BTAC) or Branch Target Buffer (BTB)
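As a rough illustration of target prediction, a BTB can be modelled as a small tagged table from branch PC to last-seen target. This is a sketch only: the direct-mapped organisation, the 16-entry size, the 4-byte instruction width and the class name are assumptions, not details from the slides.

```python
class BranchTargetBuffer:
    """Direct-mapped BTB sketch: maps a branch PC to its last seen target."""

    def __init__(self, entries: int = 16):
        self.entries = entries
        self.tags = [None] * entries
        self.targets = [0] * entries

    def _index(self, pc: int) -> int:
        # Drop the 2 byte-offset bits (4-byte instructions assumed).
        return (pc >> 2) % self.entries

    def predict(self, pc: int, taken: bool) -> int:
        """Next fetch address, given a direction prediction for the branch at pc."""
        i = self._index(pc)
        if taken and self.tags[i] == pc:
            return self.targets[i]
        return pc + 4  # predicted not taken, or BTB miss: fall through

    def update(self, pc: int, target: int) -> None:
        """Record the resolved target of a taken branch."""
        i = self._index(pc)
        self.tags[i] = pc
        self.targets[i] = target


btb = BranchTargetBuffer()
print(hex(btb.predict(0x100, True)))   # miss: falls through to PC+4
btb.update(0x100, 0x200)
print(hex(btb.predict(0x100, True)))   # hit: predicted target
```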
8
Why do we need branch prediction?
• Branch prediction
– Increases the number of instructions available for the scheduler to issue, which increases instruction-level parallelism (ILP)
– Allows useful work to be completed while waiting for the branch to resolve
9
Branch Prediction Strategies
• Static
– Decided before runtime
– Examples:
• Always-Not Taken
• Always-Taken
• Backwards Taken, Forward Not Taken (BTFNT)
• Profile-driven prediction
• Dynamic
– Prediction decisions may change during the execution of the program
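Of the static strategies, BTFNT is easy to express in code: a branch whose target lies at a lower address is assumed to be a loop back-edge and predicted taken. A minimal sketch, with hypothetical addresses and a hypothetical function name:

```python
def btfnt_predict(pc: int, target: int) -> bool:
    """Backwards Taken, Forward Not Taken: predict taken iff the target precedes the branch."""
    return target < pc


# Assumed example: a loop-closing branch at 0x120 jumping back to 0x100,
# executed for 10 iterations (taken 9 times, then falls out of the loop).
outcomes = [True] * 9 + [False]
prediction = btfnt_predict(0x120, 0x100)
correct = sum(prediction == o for o in outcomes)
print(prediction, correct / len(outcomes))   # True 0.9
```

The prediction never changes at run time, so the single loop exit is always mispredicted.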
10
What happens when a branch is predicted?
• On Mis-prediction:
– No speculative state may commit
– Squash instructions in the pipeline
– Must not allow stores in the pipeline to occur
• Cannot allow stores which would not have happened to commit
– Need to handle exceptions appropriately
11
Bimodal Prediction
• Table of 2-bit saturating counters
– Predict the most common direction
– Advantages: simple, cheap, “good” accuracy
[Figure: state diagram of a 2-bit saturating counter with states 00 (NT), 01 (NT), 10 (T), 11 (T); "Taken" transitions move toward 11 and "Not Taken" transitions move toward 00]
[Figure: the PC indexes the PHT, a table of such counters, and the selected counter gives the T/NT prediction]
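The bimodal scheme on this slide can be sketched as a PC-indexed table of 2-bit saturating counters. The table size, the initial counter value (01, weakly not taken), the index function and the class name are assumptions made for illustration.

```python
class BimodalPredictor:
    """PC-indexed pattern history table (PHT) of 2-bit saturating counters."""

    def __init__(self, entries=1024):
        self.entries = entries
        self.pht = [1] * entries   # assumed initial state: 01 (weakly not taken)

    def _index(self, pc):
        return (pc >> 2) % self.entries   # 4-byte instructions assumed

    def predict(self, pc):
        return self.pht[self._index(pc)] >= 2   # states 10 and 11 predict taken

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.pht[i] = min(3, self.pht[i] + 1)   # saturate at 11
        else:
            self.pht[i] = max(0, self.pht[i] - 1)   # saturate at 00


# A loop branch: taken 9 times, then not taken, for 3 runs of the loop.
bp = BimodalPredictor()
mispredicts = 0
for taken in ([True] * 9 + [False]) * 3:
    mispredicts += bp.predict(0x400) != taken
    bp.update(0x400, taken)
print(mispredicts, "mispredictions out of 30")
```

Because the counter needs two wrong outcomes in a row to flip its prediction, a single loop exit does not disturb the "taken" prediction for the next run of the loop.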
12
Correlation
B1: if (x)
    ...
B2: if (y)
    ...
z = x && y
B3: if (z)
    ...
• B3 can be predicted with 100% accuracy based on the outcomes of B1 and B2
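The correlation in this example can be checked with a small simulation: if the counter used for B3 is chosen by the outcomes of B1 and B2, B3 is mispredicted only while the counters warm up. This is a sketch under assumptions (B1 and B2 are both executed every time, and counters start at 01); the dictionary-based table is purely illustrative.

```python
from itertools import product

counters = {}      # (B1 outcome, B2 outcome) -> 2-bit counter used to predict B3
mispredicts = 0
for _ in range(100):
    for x, y in product([False, True], repeat=2):
        b1, b2 = x, y        # outcomes of B1: if (x) and B2: if (y)
        b3 = x and y         # outcome of B3: if (z), with z = x && y
        ctr = counters.get((b1, b2), 1)
        if (ctr >= 2) != b3:
            mispredicts += 1
        counters[(b1, b2)] = min(3, ctr + 1) if b3 else max(0, ctr - 1)

print(mispredicts, "mispredictions of B3 in 400 executions")
```

The only miss occurs the first time both B1 and B2 are taken; after that the history of the two earlier branches fully determines B3.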
13
Two-Level Prediction
• Uses two levels of information to make a direction prediction
– Branch History Table (BHT)
– PHT
• Captures patterned behavior of branches
– Groups of branches are correlated
– Particular branches have particular behavior
14
Two-level Predictor Classification
• Yeh and Patt 3-letter naming scheme
– Type of history collected
• G (global), P (per branch), S (per set)
• M (merge?)
– added by Skadron, Martonosi, Clark
– PHT type
• A (adaptive), S (static)
– PHT organization
• g (global), p (per branch), s (per set)
15
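Some Two-level Predictors
[Figure: GAs predictor: the PC and a global branch history register (GBHR, e.g. TNTTT) together index the PHT, which gives the T/NT prediction]
[Figure: PAs predictor: the PC selects a per-branch history in the BHT (e.g. TTNTTNT, TTTNTNT, NTNTTTT, NTTTTT); that history and the PC together index the PHT, which gives the T/NT prediction]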
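A GAs-style predictor can be sketched as follows: a global history register records recent outcomes, and a few PC bits select which set of PHT counters that history indexes. The history length, number of sets, initial counter value and class name are assumptions chosen for illustration, not values from the slides.

```python
class GAsPredictor:
    """Two-level sketch: global history (G), adaptive counters (A), per-set PHTs (s)."""

    def __init__(self, history_bits=4, set_bits=2):
        self.history_bits = history_bits
        self.set_bits = set_bits
        self.ghr = 0   # global branch history register; 1 = taken, newest in bit 0
        self.pht = [1] * (1 << (history_bits + set_bits))   # 2-bit counters

    def _index(self, pc):
        pht_set = (pc >> 2) & ((1 << self.set_bits) - 1)    # PC bits pick the set
        return (pht_set << self.history_bits) | self.ghr    # history picks the counter

    def predict(self, pc):
        return self.pht[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        self.pht[i] = min(3, self.pht[i] + 1) if taken else max(0, self.pht[i] - 1)
        mask = (1 << self.history_bits) - 1
        self.ghr = ((self.ghr << 1) | int(taken)) & mask


# A branch that alternates T, NT, T, NT, ... defeats a single 2-bit counter,
# but is fully captured by the history once the predictor has warmed up.
p = GAsPredictor()
late_misses = 0
for n in range(200):
    taken = (n % 2 == 0)
    if n >= 100:
        late_misses += p.predict(0x40) != taken
    p.update(0x40, taken)
print(late_misses, "mispredictions in the last 100 branches")
```

A PAs-style predictor would differ only in the first level: the history would be read from a per-branch BHT entry selected by the PC instead of from one global register.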
16
Hybrid Prediction
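• Two or more predictor components combined
• Different branches benefit from different types of history
[Figure: the PC indexes a Bimodal component, a PAs component, and a Selector; each component produces a T/NT prediction and the selector chooses which one becomes the final T/NT prediction]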
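The selector idea can be sketched with two small components and a table of 2-bit "chooser" counters. This is an illustrative sketch only: for brevity the history component uses a single shared PHT (PAg-style) where the slide shows PAs, the selector is trained only when the components disagree, and all sizes and names are assumptions.

```python
def bump(ctr, up):
    """Move a 2-bit saturating counter up or down."""
    return min(3, ctr + 1) if up else max(0, ctr - 1)


class HybridPredictor:
    """Bimodal + per-branch-history components with a PC-indexed selector."""

    def __init__(self, entries=256, history_bits=4):
        self.entries = entries
        self.hmask = (1 << history_bits) - 1
        self.bimodal = [1] * entries                 # PC-indexed counters
        self.bht = [0] * entries                     # per-branch history registers
        self.local_pht = [1] * (1 << history_bits)   # history-indexed counters
        self.selector = [1] * entries                # >= 2: trust the history component

    def _i(self, pc):
        return (pc >> 2) % self.entries

    def _components(self, pc):
        i = self._i(pc)
        return self.bimodal[i] >= 2, self.local_pht[self.bht[i]] >= 2

    def predict(self, pc):
        bim, loc = self._components(pc)
        return loc if self.selector[self._i(pc)] >= 2 else bim

    def update(self, pc, taken):
        i = self._i(pc)
        bim, loc = self._components(pc)
        if bim != loc:                               # train selector on disagreement
            self.selector[i] = bump(self.selector[i], loc == taken)
        self.bimodal[i] = bump(self.bimodal[i], taken)
        h = self.bht[i]
        self.local_pht[h] = bump(self.local_pht[h], taken)
        self.bht[i] = ((h << 1) | int(taken)) & self.hmask


# An alternating branch: the bimodal component keeps missing, the history
# component learns the pattern, and the selector learns to trust it.
hp = HybridPredictor()
late_misses = 0
for n in range(200):
    taken = (n % 2 == 0)
    if n >= 100:
        late_misses += hp.predict(0x40) != taken
    hp.update(0x40, taken)
print(late_misses, "mispredictions in the last 100 branches")
```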
17
Special Branches
• Procedure calls and returns
– Calls are always taken
– Return address almost always known
• Return Address Stack (RAS)
– On a procedure call, push the address of the instruction after the call onto the stack
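A return address stack can be sketched as a small bounded stack. The push on a call follows the slide; popping on a return to obtain the predicted target is the usual counterpart and is stated here as an assumption, as are the depth, the drop-oldest overflow policy, the 4-byte instruction size (no delay slot) and the class name.

```python
class ReturnAddressStack:
    """Bounded stack of predicted return addresses."""

    def __init__(self, depth=8):
        self.depth = depth
        self.stack = []

    def on_call(self, call_pc, instr_bytes=4):
        """Push the address of the instruction after the call."""
        if len(self.stack) == self.depth:
            self.stack.pop(0)              # assumed policy: overflow drops the oldest entry
        self.stack.append(call_pc + instr_bytes)

    def on_return(self):
        """Pop the predicted return target, or None if the stack is empty."""
        return self.stack.pop() if self.stack else None


ras = ReturnAddressStack(depth=2)
ras.on_call(0x100)       # outer call
ras.on_call(0x200)       # nested call
print(hex(ras.on_return()), hex(ras.on_return()))   # 0x204 0x104
```

With a finite depth, call chains deeper than the stack lose their oldest entries, so the outermost returns are then mispredicted.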
18
Issues Affecting Accurate Branch Prediction
• Aliasing
– More than one branch may use the same BHT/PHT entry
– Constructive: a prediction that would have been incorrect is predicted correctly
– Destructive: a prediction that would have been correct is predicted incorrectly
– Neutral: no change in the accuracy
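Destructive aliasing is easy to reproduce with a PC-indexed table of 2-bit counters: two branches with opposite behaviour that map to the same entry keep undoing each other's training. The addresses, table sizes, initial counter value and function name below are assumptions for illustration.

```python
def run(entries, trace):
    """Count mispredictions of a PC-indexed table of 2-bit saturating counters."""
    pht = [1] * entries
    misses = 0
    for pc, taken in trace:
        i = (pc >> 2) % entries
        misses += (pht[i] >= 2) != taken
        pht[i] = min(3, pht[i] + 1) if taken else max(0, pht[i] - 1)
    return misses


# Branch A at 0x100 is always taken, branch B at 0x110 is never taken; they interleave.
trace = [(0x100, True), (0x110, False)] * 50
print(run(4, trace))    # 100: both branches share entry 0 and every prediction is wrong
print(run(16, trace))   # 1: separate entries, only A's first execution is mispredicted
```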
19
More Issues
• Training time
– Need to see enough branches to uncover pattern
– Need enough time to reach steady state
• “Wrong” history
– Incorrect type of history for the branch
• Stale state
– Predictor is updated after information is needed
• Operating system context switches
– More aliasing caused by branches in different programs
20
“Real” Branch Predictors
• Alpha 21264
– 8-stage pipeline, mispredict penalty 7 cycles
– 64 KB, 2-way instruction cache with line and way prediction bits (Fetch)
• Each 4-instruction fetch block contains a prediction for the next fetch block
– Hybrid predictor (Fetch)
• 12-bit GAg (4K-entry PHT, 2-bit counters)
• 10-bit PAg (1K-entry BHT, 1K-entry PHT, 3-bit counters)
21
UltraSPARC-III
• 14-stage pipeline, branch predictor accessed in instruction fetch stages 2-3
• 16K-entry 2-bit counter Gshare predictor
– Bimodal predictor which XORs PC bits with the global history register (except the 3 low-order bits) to reduce aliasing
• Miss queue
– Halves mispredict penalty by providing instructions for immediate use
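The gshare indexing idea can be sketched generically: the PHT index is the XOR of PC bits and the global history. This is not a model of the UltraSPARC-III itself; in particular it does not reproduce the special handling of the 3 low-order bits, and the 14-bit index (giving 16K counters), the initial counter value and the class name are assumptions.

```python
class GsharePredictor:
    """Generic gshare sketch: PHT index = PC bits XOR global history."""

    def __init__(self, index_bits=14):
        self.mask = (1 << index_bits) - 1
        self.ghr = 0                           # global history, newest outcome in bit 0
        self.pht = [1] * (1 << index_bits)     # 2**14 = 16K two-bit counters

    def _index(self, pc):
        return ((pc >> 2) ^ self.ghr) & self.mask

    def predict(self, pc):
        return self.pht[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        self.pht[i] = min(3, self.pht[i] + 1) if taken else max(0, self.pht[i] - 1)
        self.ghr = ((self.ghr << 1) | int(taken)) & self.mask


g = GsharePredictor()
late_misses = 0
for n in range(200):
    taken = (n % 2 == 0)                       # alternating branch
    if n >= 100:
        late_misses += g.predict(0x40) != taken
    g.update(0x40, taken)
print(len(g.pht), late_misses)
```

XORing spreads branches with the same history across different counters, which is how the scheme reduces aliasing compared with indexing by history alone.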
22
Pentium III
• Dynamic branch prediction
– 512-entry BTB predicts direction and target; 4-bit history used with PC to derive direction
• Static branch predictor for BTB misses
• Return Address Stack (RAS), 4/8 entries
• Branch Penalties:
– Not Taken: no penalty
– Correctly predicted taken: 1 cycle
– Mispredicted: at least 9 cycles, as many as 26, average 10-15 cycles
23
AMD Athlon K7
• 10-stage integer, 15-stage fp pipeline, predictor accessed in fetch
• 2K-entry bimodal, 2K-entry BTAC
• 12-entry RAS
• Branch Penalties:
– Correct Predict Taken: 1 cycle
– Mispredict penalty: at least 10 cycles
Queries
• aneeshr2020@gmail.com
24

 

Branch prediction

  • 1. Branch Prediction • Aneesh Raveendran, Centre for Development of Advanced Computing, INDIA • aneeshr2020@gmail.com
  • 2. Outline • What are branches? • Reducing branch penalties • Branch prediction • Why is branch prediction necessary? • Branch prediction basics • Issues which affect accurate branch prediction • Examples of real predictors
  • 3. Branches • Instructions which can alter the flow of instruction execution in a program
  • 4. Types of Branches • Direct, conditional: if-then-else, for loops (beqz, bnez, etc.) • Direct, unconditional: procedure calls (jal), goto (j) • Indirect, unconditional: return (jr), virtual function lookup, function pointers (jalr)
  • 5. Techniques for handling branches (pipeline stages: IF ID EX MEM WB) • Stalling • Branch delay slots – Relies on programmer/compiler to fill – Depends on being able to find suitable instructions – Ties resolution delay to a particular pipeline • Predication – “if-conversion”: changes a control dependence into a data dependence on the branch condition
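The if-conversion idea on this slide can be illustrated with a small sketch. The function names `select_branchy` and `select_predicated` are hypothetical, and Python is used only to show the dependence structure; an actual predicated ISA would use conditional-move or predicated instructions rather than this mask arithmetic.

```python
def select_branchy(cond, a, b):
    # Control dependence: which value is returned depends on a branch.
    if cond:
        return a
    return b


def select_predicated(cond, a, b):
    # Data dependence: both inputs are read, and a mask derived from the
    # condition selects one of them. No branch on cond is taken.
    mask = -int(bool(cond))  # -1 (all ones) when cond is true, otherwise 0
    return (a & mask) | (b & ~mask)
```

Both functions return the same value for integer inputs; the second form trades a possible misprediction for always doing the work of both paths.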
  • 6. Why aren’t these techniques acceptable? • Branches are frequent: 15-25% of instructions • Today’s pipelines are deeper and wider – Higher performance penalty for stalling – Misprediction penalty = issue width × resolution delay cycles • A lot of cycles can be wasted!
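As a worked example of the penalty formula on this slide, the sketch below computes the issue slots lost per misprediction and a rough per-instruction average. The 4-wide issue width, 7-cycle resolution delay, and 10% misprediction rate are assumed figures chosen for illustration, not numbers from the slides; the 20% branch fraction sits inside the 15-25% range quoted above.

```python
def misprediction_penalty_slots(issue_width, resolution_delay_cycles):
    # Slide formula: penalty = issue width * resolution delay cycles.
    return issue_width * resolution_delay_cycles


def wasted_slots_per_instruction(branch_fraction, mispredict_rate,
                                 issue_width, resolution_delay_cycles):
    # Average issue slots lost per instruction: the fraction of instructions
    # that are mispredicted branches, times the penalty for each one.
    penalty = misprediction_penalty_slots(issue_width, resolution_delay_cycles)
    return branch_fraction * mispredict_rate * penalty


# Assumed machine: 4-wide issue, branches resolve 7 cycles after fetch.
slots = misprediction_penalty_slots(4, 7)                  # 28 issue slots
average = wasted_slots_per_instruction(0.20, 0.10, 4, 7)   # about 0.56
```

Under these assumptions, roughly half an issue slot is lost for every instruction executed, which is why deeper and wider pipelines make accurate prediction more valuable.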
  • 7. Branch Prediction • Predicting the outcome of a branch – Direction: • Taken / Not Taken • Direction predictors – Target Address • PC+offset (Taken) / PC+4 (Not Taken) • Target address predictors – Branch Target Address Cache (BTAC) or Branch Target Buffer (BTB)
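A target address predictor can be sketched as a small tagged table. The class below is a minimal direct-mapped BTB under assumed parameters (512 entries, 4-byte instructions, full-PC tags); real designs typically add associativity and store partial tags.

```python
class BranchTargetBuffer:
    """Minimal direct-mapped BTB sketch (sizes and tagging are assumptions)."""

    def __init__(self, entries=512):
        self.entries = entries
        self.tags = [None] * entries
        self.targets = [0] * entries

    def _index(self, pc):
        # Drop the 2 byte-offset bits, assuming 4-byte aligned instructions.
        return (pc >> 2) % self.entries

    def lookup(self, pc):
        # Return the predicted target, or None on a BTB miss.
        i = self._index(pc)
        return self.targets[i] if self.tags[i] == pc else None

    def update(self, pc, target):
        # Record the resolved target of a taken branch.
        i = self._index(pc)
        self.tags[i] = pc
        self.targets[i] = target
```

A miss returns `None`, in which case fetch would fall through to PC+4 until the branch resolves.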
  • 8. Why do we need branch prediction? • Branch prediction: – Increases the number of instructions available for the scheduler to issue, which increases instruction-level parallelism (ILP) – Allows useful work to be completed while waiting for the branch to resolve
  • 9. Branch Prediction Strategies • Static – Decided before runtime – Examples: Always-Not Taken; Always-Taken; Backwards Taken, Forward Not Taken (BTFNT); Profile-driven prediction • Dynamic – Prediction decisions may change during the execution of the program
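Of the static strategies listed, BTFNT is the easiest to show in code: it needs only the branch address and its target. The function name `predict_btfnt` is hypothetical.

```python
def predict_btfnt(pc, target):
    # Backwards Taken, Forward Not Taken: a branch whose target is at a
    # lower address (typically a loop back-edge) is predicted taken.
    return target < pc


loop_back_edge = predict_btfnt(pc=0x1010, target=0x1000)  # True: predict taken
forward_skip = predict_btfnt(pc=0x1010, target=0x1040)    # False: predict not taken
```

The heuristic works because loop back-edges are usually taken on all but the last iteration, and the decision is fixed before the program runs.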
  • 10. What happens when a branch is predicted? • On misprediction: – No speculative state may commit – Squash instructions in the pipeline – Must not allow stores in the pipeline to occur – Cannot allow stores which would not have happened to commit – Need to handle exceptions appropriately
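The rule that speculative stores must not reach memory can be sketched with a buffer that holds stores until the branch resolves. `SpeculativeStoreBuffer` is a hypothetical, heavily simplified model with a single outstanding branch; real pipelines track many in-flight branches.

```python
class SpeculativeStoreBuffer:
    """Holds stores issued after a predicted branch until it resolves."""

    def __init__(self):
        self.memory = {}    # committed architectural state
        self.pending = []   # speculative (address, value) pairs in program order

    def store(self, address, value):
        # Speculative stores are buffered, never written directly to memory.
        self.pending.append((address, value))

    def resolve_branch(self, predicted_correctly):
        if predicted_correctly:
            # Prediction was right: buffered stores may now commit in order.
            for address, value in self.pending:
                self.memory[address] = value
        # On a misprediction the buffered stores are squashed instead.
        self.pending.clear()
```

After a misprediction, `memory` is unchanged, which matches the requirement that stores which would not have happened never commit.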
  • 11. Bimodal Prediction • Table of 2-bit saturating counters – Predict the most common direction – Advantages: simple, cheap, “good” accuracy [Figure: the PC indexes a pattern history table (PHT) of 2-bit counters that outputs T/NT; counter states are 00 (NT), 01 (NT), 10 (T), 11 (T), with Taken and Not Taken transitions between them]
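A bimodal predictor as described on this slide can be sketched in a few lines. The table size, the initial counter value (01, weakly not taken), and the indexing by PC bits above the byte offset are assumptions made for illustration.

```python
class BimodalPredictor:
    """PC-indexed table of 2-bit saturating counters (sizes are assumptions)."""

    def __init__(self, entries=1024):
        self.entries = entries
        self.table = [1] * entries  # start every counter at 01 (weakly NT)

    def _index(self, pc):
        return (pc >> 2) % self.entries

    def predict(self, pc):
        # States 10 and 11 predict taken; 00 and 01 predict not taken.
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)  # saturate at 11
        else:
            self.table[i] = max(0, self.table[i] - 1)  # saturate at 00


def count_mispredictions(predictor, pc, outcomes):
    # Predict, compare with the real outcome, then train the counter.
    wrong = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            wrong += 1
        predictor.update(pc, taken)
    return wrong
```

For a loop branch that is taken nine times and then falls through, the 2-bit hysteresis means only the loop exit is mispredicted once the counter has warmed up; a single not-taken outcome does not flip the prediction for the next pass.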
  • 12. Correlation • B1: if (x) ... • B2: if (y) ... • z = x && y • B3: if (z) ... • B3 can be predicted with 100% accuracy based on the outcomes of B1 and B2
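The claim on this slide can be checked by enumerating every combination of `x` and `y` and recording which B3 outcomes follow each (B1, B2) history. The function name `b3_outcomes_by_history` is hypothetical, and the sketch assumes nothing modifies `x` or `y` between the branches.

```python
from itertools import product


def b3_outcomes_by_history():
    # Map each (B1 taken, B2 taken) history to the set of B3 outcomes seen.
    seen = {}
    for x, y in product([False, True], repeat=2):
        b1_taken = x          # B1: if (x)
        b2_taken = y          # B2: if (y)
        z = x and y
        b3_taken = z          # B3: if (z)
        seen.setdefault((b1_taken, b2_taken), set()).add(b3_taken)
    return seen


history_table = b3_outcomes_by_history()
```

Every history maps to exactly one B3 outcome, so a predictor indexed by the outcomes of B1 and B2 can predict B3 perfectly.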
  • 13. Two-Level Prediction • Uses two levels of information to make a direction prediction – Branch History Table (BHT) – PHT • Captures patterned behavior of branches – Groups of branches are correlated – Particular branches have particular behavior
  • 14. Two-level Predictor Classification • Yeh and Patt 3-letter naming scheme – Type of history collected • G (global), P (per branch), S (per set) • M (merge?) – added by Skadron, Martonosi, Clark – PHT type • A (adaptive), S (static) – PHT organization • g (global), p (per branch), s (per set)
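In this naming scheme, GAg means global history, an adaptive PHT, and a single global PHT. A minimal sketch, assuming a 4-bit history register and counters initialised to 01:

```python
class GAgPredictor:
    """Global history register indexing one global PHT of 2-bit counters."""

    def __init__(self, history_bits=4):
        self.history_bits = history_bits
        self.history = 0                      # first level: global history
        self.pht = [1] * (1 << history_bits)  # second level: 2-bit counters

    def predict(self):
        return self.pht[self.history] >= 2

    def update(self, taken):
        # Train the counter selected by the current history...
        c = self.pht[self.history]
        self.pht[self.history] = min(3, c + 1) if taken else max(0, c - 1)
        # ...then shift the outcome into the history register.
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


def run_gag(predictor, outcomes):
    # Return a list of booleans: True where the prediction was correct.
    results = []
    for taken in outcomes:
        results.append(predictor.predict() == taken)
        predictor.update(taken)
    return results
```

On a strictly alternating taken/not-taken branch, this sketch mispredicts only during warm-up, because each history pattern selects a counter that learns what follows that pattern. A single 2-bit counter cannot learn such a sequence.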
  • 16. Hybrid Prediction • Two or more predictor components combined • Different branches benefit from different types of history [Figure: the PC indexes a bimodal predictor, a PAs predictor, and a selector; the selector chooses which component’s T/NT prediction is used]
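A hybrid predictor can be sketched as two components plus a table of 2-bit selector counters. The component designs here (a bimodal table and a simplified per-branch-history predictor with one shared PHT), the table sizes, and the selector update rule are assumptions, not the exact organization drawn on the slide.

```python
def bump(counter, up):
    # 2-bit saturating counter update.
    return min(3, counter + 1) if up else max(0, counter - 1)


class Bimodal:
    def __init__(self, entries):
        self.entries = entries
        self.table = [1] * entries

    def predict(self, pc):
        return self.table[pc % self.entries] >= 2

    def update(self, pc, taken):
        i = pc % self.entries
        self.table[i] = bump(self.table[i], taken)


class LocalHistory:
    """Per-branch history (BHT) indexing a shared PHT; a simplified sketch."""

    def __init__(self, entries, history_bits=4):
        self.entries = entries
        self.mask = (1 << history_bits) - 1
        self.bht = [0] * entries
        self.pht = [1] * (1 << history_bits)

    def predict(self, pc):
        return self.pht[self.bht[pc % self.entries]] >= 2

    def update(self, pc, taken):
        i = pc % self.entries
        h = self.bht[i]
        self.pht[h] = bump(self.pht[h], taken)
        self.bht[i] = ((h << 1) | int(taken)) & self.mask


class HybridPredictor:
    def __init__(self, entries=256):
        self.entries = entries
        self.bimodal = Bimodal(entries)
        self.local = LocalHistory(entries)
        self.selector = [1] * entries  # below 2: use bimodal; else use local

    def predict(self, pc):
        if self.selector[pc % self.entries] >= 2:
            return self.local.predict(pc)
        return self.bimodal.predict(pc)

    def update(self, pc, taken):
        bimodal_ok = self.bimodal.predict(pc) == taken
        local_ok = self.local.predict(pc) == taken
        i = pc % self.entries
        # Move the selector only when exactly one component was right.
        if local_ok != bimodal_ok:
            self.selector[i] = bump(self.selector[i], local_ok)
        self.bimodal.update(pc, taken)
        self.local.update(pc, taken)


def run_hybrid(predictor, pc, outcomes):
    results = []
    for taken in outcomes:
        results.append(predictor.predict(pc) == taken)
        predictor.update(pc, taken)
    return results
```

In this sketch the selector drifts toward the history-based component for an alternating branch and stays with the bimodal component for an always-taken branch, which is the behaviour the slide describes: different branches benefit from different types of history.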
  • 17. Special Branches • Procedure calls and returns – Calls are always taken – Return address almost always known • Return Address Stack (RAS) – On a procedure call, push the address of the instruction after the call onto the stack
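The push-on-call behaviour can be sketched as a small fixed-depth stack. The depth of 8, the 4-byte instruction size, the absence of branch delay slots, and the overwrite-oldest overflow policy are all assumptions; the pop on return is the natural counterpart of the push described on the slide.

```python
class ReturnAddressStack:
    """Fixed-depth return address stack (depth and policy are assumptions)."""

    def __init__(self, depth=8):
        self.depth = depth
        self.stack = []

    def on_call(self, call_pc, instruction_size=4):
        # Push the address of the instruction after the call.
        if len(self.stack) == self.depth:
            self.stack.pop(0)  # full: discard the oldest entry
        self.stack.append(call_pc + instruction_size)

    def predict_return(self):
        # Pop the most recent return address, or None if the stack is empty.
        return self.stack.pop() if self.stack else None
```

Nested calls return in last-in, first-out order, so a stack predicts return targets far better than a BTB entry that only remembers the previous caller.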
  • 18. Issues Affecting Accurate Branch Prediction • Aliasing – More than one branch may use the same BHT/PHT entry • Constructive – Prediction that would have been incorrect, predicted correctly • Destructive – Prediction that would have been correct, predicted incorrectly • Neutral – No change in the accuracy
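Destructive aliasing can be reproduced with two branches whose addresses map to the same counter. The addresses, table sizes, and trace below are hypothetical values chosen so that the two branches collide in a 16-entry table and separate in a 64-entry one.

```python
def pht_index(pc, entries):
    # Index with the PC bits above the 2-bit byte offset.
    return (pc >> 2) % entries


def count_mispredictions(entries, trace):
    # Bimodal table of 2-bit counters, all starting at 01 (weakly NT).
    table = [1] * entries
    wrong = 0
    for pc, taken in trace:
        i = pht_index(pc, entries)
        if (table[i] >= 2) != taken:
            wrong += 1
        table[i] = min(3, table[i] + 1) if taken else max(0, table[i] - 1)
    return wrong


BRANCH_A = 0x1000  # always taken
BRANCH_B = 0x1040  # never taken
TRACE = [(BRANCH_A, True), (BRANCH_B, False)] * 20

aliased = count_mispredictions(16, TRACE)   # both branches share one counter
separate = count_mispredictions(64, TRACE)  # each branch has its own counter
```

With the shared counter, each branch keeps undoing the other's training and every prediction in this trace is wrong; with separate counters only the first execution of the taken branch is mispredicted.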
  • 19. More Issues • Training time – Need to see enough branches to uncover pattern – Need enough time to reach steady state • “Wrong” history – Incorrect type of history for the branch • Stale state – Predictor is updated after information is needed • Operating system context switches – More aliasing caused by branches in different programs
  • 20. “Real” Branch Predictors • Alpha 21264 – 8-stage pipeline, mispredict penalty 7 cycles – 64 KB, 2-way instruction cache with line and way prediction bits (Fetch) • Each 4-instruction fetch block contains a prediction for the next fetch block – Hybrid predictor (Fetch) • 12-bit GAg (4K-entry PHT, 2-bit counters) • 10-bit PAg (1K-entry BHT, 1K-entry PHT, 3-bit counters)
  • 21. UltraSPARC-III • 14-stage pipeline, bpred accessed in instruction fetch stages 2-3 • 16K-entry 2-bit counter Gshare predictor – Bimodal predictor which XORs PC bits with the global history register (except the 3 lower-order bits) to reduce aliasing • Miss queue – Halves mispredict penalty by providing instructions for immediate use
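The gshare indexing idea can be sketched generically as follows. This is not the UltraSPARC-III's exact hash (its treatment of the 3 lower-order bits is not modelled); the 14 index bits correspond to a 16K-entry table, and the 4-byte instruction alignment is an assumption.

```python
class GsharePredictor:
    """Generic gshare sketch: PHT index = PC bits XOR global history."""

    def __init__(self, index_bits=14):
        self.mask = (1 << index_bits) - 1
        self.ghr = 0                        # global history register
        self.pht = [1] * (1 << index_bits)  # 2-bit saturating counters

    def _index(self, pc):
        # XOR spreads branches with the same history across the table.
        return ((pc >> 2) ^ self.ghr) & self.mask

    def predict(self, pc):
        return self.pht[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        c = self.pht[i]
        self.pht[i] = min(3, c + 1) if taken else max(0, c - 1)
        self.ghr = ((self.ghr << 1) | int(taken)) & self.mask


def run_gshare(predictor, pc, outcomes):
    results = []
    for taken in outcomes:
        results.append(predictor.predict(pc) == taken)
        predictor.update(pc, taken)
    return results
```

Because the index mixes the branch address with recent outcomes, two branches that see the same history usually land on different counters, which is how the XOR reduces aliasing relative to indexing by history alone.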
  • 22. Pentium III • Dynamic branch prediction – 512-entry BTB predicts direction and target, 4-bit history used with PC to derive direction • Static branch predictor for BTB misses • Return Address Stack (RAS), 4/8 entries • Branch Penalties: – Not Taken: no penalty – Correctly predicted taken: 1 cycle – Mispredicted: at least 9 cycles, as many as 26, average 10-15 cycles
  • 23. AMD Athlon K7 • 10-stage integer, 15-stage fp pipeline, predictor accessed in fetch • 2K-entry bimodal, 2K-entry BTAC • 12-entry RAS • Branch Penalties: – Correct Predict Taken: 1 cycle – Mispredict penalty: at least 10 cycles