The University of Texas at Dallas
Department of Electrical Engineering
EECE/CS 6304: COMPUTER ARCHITECTURE
PROJECT #2
“ANALYSIS OF DIFFERENT TYPES OF BRANCH PREDICTORS”
Submitted by,
Bharat Biyani (2021152193)
Shree Viswa Shamanthan L D (2021180127)
INTRODUCTION
In computer architecture, a branch predictor is a digital circuit that tries to guess
which way a branch will go before the outcome is known for certain (i.e., before the branch
executes). The purpose of the branch predictor is to improve the flow of the instruction
pipeline. Branch predictors play a critical role in achieving high effective performance in
many modern pipelined microprocessor architectures such as x86.
In this project, we analyze the behavior of different branch predictor configurations on
three well-recognized benchmarks, namely GCC, ANAGRAM and GO. We used the SimpleScalar
sim-outorder simulator, which models all the execution aspects of Alpha 21264. The simulations
provide the CPI values, which we use to compare the configurations across the benchmarks.
We used three hardware-based branch prediction strategies:
1) Bimodal Predictor: A simple predictor that uses a table of 2-bit saturating counters to
predict whether a given branch is likely to be taken.
2) Two Level Predictor: A two-level adaptive predictor with an n-bit history can predict
any repetitive sequence with any period, provided all n-bit sub-sequences are different. Its
advantage is that it can quickly learn to predict an arbitrary repetitive pattern.
3) Combined Predictor: A hybrid predictor, also called a combined predictor, implements more
than one prediction mechanism. The final prediction is based either on a meta-predictor that
remembers which of the predictors has made the best predictions in the past, or on a majority
vote among an odd number of different predictors.
Part 1: Performance analysis of different types of branch predictors
The simulation is run for different Return Address Stack (RAS) sizes and different types of
branch predictor.
• Baseline default RAS: Bimodal predictor with the default RAS size (8).
-bpred bimod -bpred:bimod 256 -bpred:ras 8 -bpred:btb 64 2
• 2 Level Predictor: Two-level predictor that uses two bits to define the state of the branch predictor.
-bpred 2lev -bpred:2lev 1 256 4 0 -bpred:ras 8 -bpred:btb 64 2
• Comb: Combines a two-level and a bimodal predictor.
-bpred comb -bpred:comb 256 -bpred:bimod 256 -bpred:2lev 1 256 4 0 -bpred:ras 8 -bpred:btb 64 2
• RAS 4: Change the return address stack (RAS) size to 4.
-bpred bimod -bpred:bimod 256 -bpred:ras 4 -bpred:btb 64 2
• RAS 16: Change the return address stack (RAS) size to 16.
-bpred bimod -bpred:bimod 256 -bpred:ras 16 -bpred:btb 64 2
Performance Analysis based on CPI
                            Benchmarks
Sr. No.  Configuration      GCC     ANAGRAM  GO
1        Baseline           0.95    0.4674   0.7571
2        2 Level Predictor  0.9822  0.4605   0.7893
3        Comb               0.8678  0.4546   0.7516
4        Bimod: RAS 4       0.9538  0.4678   0.7574
5        Bimod: RAS 16      0.9498  0.4674   0.7571
Graphical Representation with above CPI
[Chart: CPI (axis 0 to 1.2) for Baseline, 2 Level Predictor, Comb, RAS 4 and RAS 16; series ANAGRAM, GO, GCC]
The above graph shows the CPI of the different branch predictor configurations.
Analysis: Benchmark – GCC vs BP Configurations
The GCC benchmark has a higher CPI than the other benchmarks. CPI is lowest for the
combination of two-level and bimodal predictors (Comb, 0.8678) and highest for the 2 level
predictor, which uses two bits to define the state of the branch predictor (0.9822).
Analysis: Benchmark – ANAGRAM vs BP Configurations
From the above graph, we can infer that the ANAGRAM benchmark has a lower CPI than the
other two benchmarks. Its CPI is fairly constant across all the branch predictor
configurations (0.4546 to 0.4678) and is lowest for the combination of two-level and
bimodal predictors (Comb).
Analysis: Benchmark – GO vs BP Configurations
The above graph shows that the GO benchmark has a lower CPI than the GCC benchmark. Its
CPI is almost constant across the branch predictor configurations and is lowest for the
combination of two-level and bimodal predictors (Comb, 0.7516). With respect to RAS size,
changing the bimodal configuration from a return address stack of size 4 to size 8 or 16
improves CPI only marginally (0.7574 to 0.7571), so RAS size does not have
much impact on CPI.
Performance Analysis based on Address Hit Rates
                            Benchmarks
Sr. No.  Configuration      GCC     ANAGRAM  GO
1        Baseline           0.6734  0.956    0.7071
2        2 Level Predictor  0.6253  0.9575   0.6484
3        Comb               0.8339  0.9694   0.709
4        Bimod: RAS 4       0.6697  0.9555   0.7067
5        Bimod: RAS 16      0.6736  0.9605   0.7071
Graphical Representation with above Address Hit Rates
The graph shows the address hit rate of the different branch predictor configurations
for each benchmark.
For the ANAGRAM benchmark, the address hit rate is high in every configuration (0.9555 to
0.9694); the bimodal predictor with a Return Address Stack (RAS) of size 4 is marginally the
lowest.
For the GO benchmark, the address hit rate is about 0.71 in every configuration except the
2 level predictor (0.6484).
For the GCC benchmark, the 2 level predictor again gives the lowest address hit rate
(0.6253), while Comb gives the highest (0.8339).
[Chart: address hit rate (axis 0 to 1.2) for Baseline, 2 Level Predictor, Comb, Bimod: RAS 4 and Bimod: RAS 16; series GCC, GO, ANAGRAM]
Performance Analysis based on Direction Hit Rates
                            Benchmarks
Sr. No.  Configuration      GCC     ANAGRAM  GO
1        Baseline           0.6734  0.9605   0.7929
2        2 Level Predictor  0.7919  0.9614   0.7372
3        Comb               0.8617  0.9738   0.7978
4        Bimod: RAS 4       0.8431  0.9605   0.7929
5        Bimod: RAS 16      0.8431  0.9605   0.7929
The graph of the direction hit rates for each benchmark gives more information on how
the branch predictor configuration affects the different benchmarks.
Graphical Representation with above Direction Hit Rates
The direction hit rate stays fairly constant across configurations for ANAGRAM and GO,
and ANAGRAM has a higher direction hit rate than the other two benchmarks. For GO, the
2 level predictor gives the lowest direction hit rate (0.7372). For GCC, the bimodal baseline
has the lowest direction hit rate in the table (0.6734), and the rate is higher (0.8431) when
we change the Return Address Stack size from 8 to 16 or from 8 to 4.
[Chart: direction hit rate (axis 0 to 1.2) for Baseline, 2 Level Predictor, Comb, Bimod: RAS 4 and Bimod: RAS 16; series GCC, GO, ANAGRAM]
Part 2: Modification of the code to accommodate address misses
We modified the following two files in SimpleScalar to add an addr_misses counter.
1) bpred.h
2) bpred.c
1) Changes in file bpred.h:
----------------
/* branch predictor def */
struct bpred_t {
------
  } dirpred;
  struct {
--------
  } retstack;
  /* stats */
  counter_t addr_hits;    /* num correct addr-predictions */
  counter_t dir_hits;     /* num correct dir-predictions (incl addr) */
  counter_t addr_misses;  /* num addr_misses */
  counter_t used_ras;     /* num RAS predictions used */
  counter_t used_bimod;   /* num bimodal predictions used (BPredComb) */
-----------
};
2) Changes in file bpred.c:
-----------
  sprintf(buf, "%s.dir_hits", name);
  stat_reg_counter(sdb, buf,
                   "total number of direction-predicted hits "
                   "(includes addr-hits)",
                   &pred->dir_hits, 0, NULL);
  sprintf(buf, "%s.addr_misses", name);
  stat_reg_counter(sdb, buf, "total number of addr-misses",
                   &pred->addr_misses, 0, NULL);
-----------
  if (bpred == NULL)
    return;
  bpred->dir_hits = 0;
  bpred->addr_misses = 0;
-----------
  /* Have a branch here */
  if (correct)
    pred->addr_hits++;
  if (!!pred_taken == !!taken)
    pred->dir_hits++;
  else
    pred->misses++;
  pred->addr_misses = (pred->misses + pred->dir_hits - pred->addr_hits);
-----------
-----------
}
Part 3: Comparison of BTB Performance
The simulation is done for the following configurations of Branch Target Buffer:
• Baseline BTB configuration: 64 sets, 2 way associativity
-bpred bimod -bpred:bimod 256 -bpred:btb 64 2
• Showing the effect of the number of sets in BTB with the following options
-bpred bimod -bpred:bimod 256 -bpred:btb 32 2
-bpred bimod -bpred:bimod 256 -bpred:btb 128 2
• Showing the effect of associativity when the total size of BTB is fixed with the following options
-bpred bimod -bpred:bimod 256 -bpred:btb 32 4
-bpred bimod -bpred:bimod 256 -bpred:btb 128 1
Performance Analysis based on addr_hits
                         Benchmarks
Sr. No.  Configuration   GCC      ANAGRAM  GO
1        64 sets/2 way   2235498  2771048  1934760
2        32 sets/2 way   2095859  2746365  1832302
3        128 sets/2 way  2389785  2777415  2008597
4        32 sets/4 way   2260256  2775372  1936745
5        128 sets/1 way  2197498  2759944  1893595
Graphical Representation with above addr_hits
[Chart: addr_hits (axis 0 to 3000000) for 64 sets/2 way, 32 sets/2 way, 128 sets/2 way, 32 sets/4 way and 128 sets/1 way; series GO, GCC, ANAGRAM]
The above graph shows the behavior of the various Branch Target Buffer (BTB)
configurations for the different benchmarks. Of the three benchmarks, ANAGRAM has the
highest number of address hits, and its lowest count occurs for the BTB with 32 sets and
2-way set associativity. GCC has a moderate number of address hits, again lowest for the BTB
with 32 sets and 2-way set associativity. GO has the fewest address hits of the three
benchmarks, and its count is also lowest for the BTB with 32 sets and 2-way set
associativity. For all three benchmarks, the BTB with 128 sets and 2-way set associativity
gives the most address hits.
Comparison of BTB Performance based on addr_misses
                         Benchmarks
Sr. No.  Configuration   GCC      ANAGRAM  GO
1        64 sets/2 way   1084176  127541   801464
2        32 sets/2 way   1223815  152224   903922
3        128 sets/2 way  929889   121174   727627
4        32 sets/4 way   1059418  123217   799479
5        128 sets/1 way  1122176  138645   842629
Graphical Representation with above addr_misses
As expected, the number of address misses is lowest for the ANAGRAM benchmark, and GCC
has the most address misses of the three benchmarks. Decreasing the number of sets from 64
to 32 increases the address misses, and increasing the number of sets from 64 to 128
decreases them, because capacity misses are reduced by increasing the number of sets. In the
32 sets/4 way configuration, even though the number of sets is halved from 64 to 32, the
address misses decrease slightly because the higher associativity reduces conflict misses.
In the 128 sets/1 way configuration, the BTB is direct mapped, so the address misses
increase relative to the baseline even though the number of sets is doubled.
[Chart: addr_misses (axis 0 to 1400000) for 64 sets/2 way, 32 sets/2 way, 128 sets/2 way, 32 sets/4 way and 128 sets/1 way; series ANAGRAM, GO, GCC]
Comparison of BTB Performance based on CPI
                         Benchmarks
Sr. No.  Configuration   GCC     ANAGRAM  GO
1        64 sets/2 way   0.9500  0.4674   0.7571
2        32 sets/2 way   0.9664  0.4711   0.7645
3        128 sets/2 way  0.9304  0.4664   0.7496
4        32 sets/4 way   0.9491  0.4670   0.7575
5        128 sets/1 way  0.9528  0.4686   0.7583
Graphical Representation with above CPI
CPI remains fairly constant across the BTB configurations for every benchmark. Among the
benchmarks, ANAGRAM has the lowest CPI and GCC the highest CPI for all the BTB
configurations. The CPI is higher for the 32 sets/2 way configuration than for the
64 sets/2 way configuration, which has twice as many sets. In the 32 sets/4 way and
128 sets/1 way configurations, which keep the total number of entries fixed, the CPI is
almost equal to the 64 sets/2 way CPI. The configuration with 128 sets and associativity 2
has the lowest CPI of all the configurations.
[Chart: CPI (axis 0 to 1.2) for 64 sets/2 way, 32 sets/2 way, 128 sets/2 way, 32 sets/4 way and 128 sets/1 way; series GCC, ANAGRAM, GO]
Comparison of BTB Performance based on Branch Predictor Hit Rates
                         Benchmarks
Sr. No.  Configuration   GCC     ANAGRAM  GO
1        64 sets/2 way   0.6779  0.9546   0.6926
2        32 sets/2 way   0.636   0.9476   0.6527
3        128 sets/2 way  0.7221  0.9557   0.7225
4        32 sets/4 way   0.6852  0.9573   0.6931
5        128 sets/1 way  0.665   0.9518   0.6775
Graphical Representation with above Branch Predictor Hit Rates
The graph shows that the branch predictor hit rate for all the benchmarks drops when the
number of sets in the BTB decreases. Comparing the hit rates of the different
configurations, the BTB with 32 sets and 2-way set associativity has the lowest branch
predictor hit rate for all the benchmarks.
CONCLUSION
For an optimal branch predictor, it is recommended to use a BTB with more sets, but the
trade-off between cost and performance should be taken into consideration.
For high address hit rates and direction hit rates, the simulation results suggest that
the combination of two-level and bimodal predictors (Comb) is the better configuration.
[Chart: branch predictor hit rate (axis 0 to 1.2) for 64 sets/2 way, 32 sets/2 way, 128 sets/2 way, 32 sets/4 way and 128 sets/1 way; series GCC, ANAGRAM, GO]

More Related Content

What's hot

A 128 kbit sram with an embedded energy monitoring circuit and sense amplifie...
A 128 kbit sram with an embedded energy monitoring circuit and sense amplifie...A 128 kbit sram with an embedded energy monitoring circuit and sense amplifie...
A 128 kbit sram with an embedded energy monitoring circuit and sense amplifie...
Grace Abraham
 
SRAM read and write and sense amplifier
SRAM read and write and sense amplifierSRAM read and write and sense amplifier
SRAM read and write and sense amplifier
Soumyajit Langal
 
Final Project Report
Final Project ReportFinal Project Report
Final Project Report
Riddhi Shah
 
Assignement 3 ADV report (1)
Assignement 3 ADV report (1)Assignement 3 ADV report (1)
Assignement 3 ADV report (1)
Riddhi Shah
 
ADS Lab 5 Report
ADS Lab 5 ReportADS Lab 5 Report
ADS Lab 5 Report
Riddhi Shah
 

What's hot (19)

Design and Simulation Low power SRAM Circuits
Design and Simulation Low power SRAM CircuitsDesign and Simulation Low power SRAM Circuits
Design and Simulation Low power SRAM Circuits
 
Static-Noise-Margin Analysis of Modified 6T SRAM Cell during Read Operation
Static-Noise-Margin Analysis of Modified 6T SRAM Cell during Read OperationStatic-Noise-Margin Analysis of Modified 6T SRAM Cell during Read Operation
Static-Noise-Margin Analysis of Modified 6T SRAM Cell during Read Operation
 
Project Report Of SRAM Design
Project Report Of SRAM DesignProject Report Of SRAM Design
Project Report Of SRAM Design
 
250nm Technology Based Low Power SRAM Memory
250nm Technology Based Low Power SRAM Memory250nm Technology Based Low Power SRAM Memory
250nm Technology Based Low Power SRAM Memory
 
Implementation of High Reliable 6T SRAM Cell Design
Implementation of High Reliable 6T SRAM Cell DesignImplementation of High Reliable 6T SRAM Cell Design
Implementation of High Reliable 6T SRAM Cell Design
 
Design of a low power asynchronous SRAM in 45nM CMOS
Design of a low power asynchronous SRAM in 45nM CMOSDesign of a low power asynchronous SRAM in 45nM CMOS
Design of a low power asynchronous SRAM in 45nM CMOS
 
ASIC DESIGN OF MINI-STEREO DIGITAL AUDIO PROCESSOR UNDER SMIC 180NM TECHNOLOGY
ASIC DESIGN OF MINI-STEREO DIGITAL AUDIO PROCESSOR UNDER SMIC 180NM TECHNOLOGYASIC DESIGN OF MINI-STEREO DIGITAL AUDIO PROCESSOR UNDER SMIC 180NM TECHNOLOGY
ASIC DESIGN OF MINI-STEREO DIGITAL AUDIO PROCESSOR UNDER SMIC 180NM TECHNOLOGY
 
Low power sram
Low power sramLow power sram
Low power sram
 
Sram pdf
Sram pdfSram pdf
Sram pdf
 
SRAM- Ultra low voltage operation
SRAM- Ultra low voltage operationSRAM- Ultra low voltage operation
SRAM- Ultra low voltage operation
 
A 128 kbit sram with an embedded energy monitoring circuit and sense amplifie...
A 128 kbit sram with an embedded energy monitoring circuit and sense amplifie...A 128 kbit sram with an embedded energy monitoring circuit and sense amplifie...
A 128 kbit sram with an embedded energy monitoring circuit and sense amplifie...
 
SRAM read and write and sense amplifier
SRAM read and write and sense amplifierSRAM read and write and sense amplifier
SRAM read and write and sense amplifier
 
Final Project Report
Final Project ReportFinal Project Report
Final Project Report
 
Low power sram design using block partitioning
Low power sram design using block partitioningLow power sram design using block partitioning
Low power sram design using block partitioning
 
Assignement 3 ADV report (1)
Assignement 3 ADV report (1)Assignement 3 ADV report (1)
Assignement 3 ADV report (1)
 
SRAM
SRAMSRAM
SRAM
 
Ltu ex-05238-se
Ltu ex-05238-seLtu ex-05238-se
Ltu ex-05238-se
 
ADS Lab 5 Report
ADS Lab 5 ReportADS Lab 5 Report
ADS Lab 5 Report
 
A Novel Architecture Design & Characterization of CAM Controller IP Core with...
A Novel Architecture Design & Characterization of CAM Controller IP Core with...A Novel Architecture Design & Characterization of CAM Controller IP Core with...
A Novel Architecture Design & Characterization of CAM Controller IP Core with...
 

Similar to Evaluation of Branch Predictors

Iaetsd design and implementation of pseudo random number generator
Iaetsd design and implementation of pseudo random number generatorIaetsd design and implementation of pseudo random number generator
Iaetsd design and implementation of pseudo random number generator
Iaetsd Iaetsd
 
Estimation of bitlength of transformed quantized residue
Estimation of bitlength of transformed quantized residueEstimation of bitlength of transformed quantized residue
Estimation of bitlength of transformed quantized residue
IAEME Publication
 
Ccna 4 v5 practice skills assessment – packet tracer
Ccna 4 v5 practice skills assessment – packet tracerCcna 4 v5 practice skills assessment – packet tracer
Ccna 4 v5 practice skills assessment – packet tracer
Đồng Quốc Vương
 

Similar to Evaluation of Branch Predictors (20)

Iaetsd design and implementation of pseudo random number generator
Iaetsd design and implementation of pseudo random number generatorIaetsd design and implementation of pseudo random number generator
Iaetsd design and implementation of pseudo random number generator
 
Estimation of bitlength of transformed quantized residue
Estimation of bitlength of transformed quantized residueEstimation of bitlength of transformed quantized residue
Estimation of bitlength of transformed quantized residue
 
EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGA
EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGAEFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGA
EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGA
 
EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGA
EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGAEFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGA
EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGA
 
EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGA
EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGAEFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGA
EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGA
 
IRJET- Implementation of 16-Bit Pipelined ADC using 180nm CMOS Technology
IRJET-  	  Implementation of 16-Bit Pipelined ADC using 180nm CMOS TechnologyIRJET-  	  Implementation of 16-Bit Pipelined ADC using 180nm CMOS Technology
IRJET- Implementation of 16-Bit Pipelined ADC using 180nm CMOS Technology
 
Module 2 ARM CORTEX M3 Instruction Set and Programming
Module 2 ARM CORTEX M3 Instruction Set and ProgrammingModule 2 ARM CORTEX M3 Instruction Set and Programming
Module 2 ARM CORTEX M3 Instruction Set and Programming
 
BGP Weight Manipulation with Route Map
BGP Weight Manipulation with Route MapBGP Weight Manipulation with Route Map
BGP Weight Manipulation with Route Map
 
IRJET-Hardware Co-Simulation of Classical Edge Detection Algorithms using Xil...
IRJET-Hardware Co-Simulation of Classical Edge Detection Algorithms using Xil...IRJET-Hardware Co-Simulation of Classical Edge Detection Algorithms using Xil...
IRJET-Hardware Co-Simulation of Classical Edge Detection Algorithms using Xil...
 
IRJET- Accuracy Configurable Adder
IRJET- Accuracy Configurable AdderIRJET- Accuracy Configurable Adder
IRJET- Accuracy Configurable Adder
 
Ccna 4 v5 practice skills assessment – packet tracer
Ccna 4 v5 practice skills assessment – packet tracerCcna 4 v5 practice skills assessment – packet tracer
Ccna 4 v5 practice skills assessment – packet tracer
 
Designing of Adders and Vedic Multiplier using Gate Diffusion Input
Designing of Adders and Vedic Multiplier using Gate Diffusion InputDesigning of Adders and Vedic Multiplier using Gate Diffusion Input
Designing of Adders and Vedic Multiplier using Gate Diffusion Input
 
FPGA based JPEG Encoder
FPGA based JPEG EncoderFPGA based JPEG Encoder
FPGA based JPEG Encoder
 
DUAL FIELD DUAL CORE SECURE CRYPTOPROCESSOR ON FPGA PLATFORM
DUAL FIELD DUAL CORE SECURE CRYPTOPROCESSOR ON FPGA PLATFORMDUAL FIELD DUAL CORE SECURE CRYPTOPROCESSOR ON FPGA PLATFORM
DUAL FIELD DUAL CORE SECURE CRYPTOPROCESSOR ON FPGA PLATFORM
 
Rems final
Rems finalRems final
Rems final
 
Queue Size Trade Off with Modulation in 802.15.4 for Wireless Sensor Networks
Queue Size Trade Off with Modulation in 802.15.4 for Wireless Sensor NetworksQueue Size Trade Off with Modulation in 802.15.4 for Wireless Sensor Networks
Queue Size Trade Off with Modulation in 802.15.4 for Wireless Sensor Networks
 
An Energy-Efficient Lut-Log-Bcjr Architecture Using Constant Log Bcjr Algorithm
An Energy-Efficient Lut-Log-Bcjr Architecture Using Constant Log Bcjr AlgorithmAn Energy-Efficient Lut-Log-Bcjr Architecture Using Constant Log Bcjr Algorithm
An Energy-Efficient Lut-Log-Bcjr Architecture Using Constant Log Bcjr Algorithm
 
RR
RRRR
RR
 
Bgp
BgpBgp
Bgp
 
40120140504012
4012014050401240120140504012
40120140504012
 

More from Bharat Biyani

More from Bharat Biyani (6)

Customizable Microprocessor design on Nexys 3 Spartan FPGA Board
Customizable Microprocessor design on Nexys 3 Spartan FPGA BoardCustomizable Microprocessor design on Nexys 3 Spartan FPGA Board
Customizable Microprocessor design on Nexys 3 Spartan FPGA Board
 
Solar Charge Controller
Solar Charge ControllerSolar Charge Controller
Solar Charge Controller
 
Standard cells library design
Standard cells library designStandard cells library design
Standard cells library design
 
Operational Amplifier Design
Operational Amplifier DesignOperational Amplifier Design
Operational Amplifier Design
 
Automated Traffic Density Detection and Speed Monitoring
Automated Traffic Density Detection and Speed MonitoringAutomated Traffic Density Detection and Speed Monitoring
Automated Traffic Density Detection and Speed Monitoring
 
32 bit ALU Chip Design using IBM 130nm process technology
32 bit ALU Chip Design using IBM 130nm process technology32 bit ALU Chip Design using IBM 130nm process technology
32 bit ALU Chip Design using IBM 130nm process technology
 

Recently uploaded

notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.ppt
MsecMca
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
dollysharma2066
 

Recently uploaded (20)

UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
NFPA 5000 2024 standard .
NFPA 5000 2024 standard                                  .NFPA 5000 2024 standard                                  .
NFPA 5000 2024 standard .
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.ppt
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineering
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
 
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
 
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLPVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdf
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
 

Evaluation of Branch Predictors

  • 1. P a g e | 1 The University of Texas at Dallas Department of Electrical Engineering EECE/CS 6304: COMPUTER ARCHITECTURE PROJECT #2 “ ANALYSIS OF DIFFERENT TYPES OF BRANCH PREDICTORS ” Submitted by, Bharat Biyani (2021152193) Shree Viswa Shamanthan L D (2021180127)
  • 2. P a g e | 1 INTRODUCTION In computer architecture, a branch predictor is a digital circuit that tries to speculate which way a branch will go before this is known for sure (i.e., before its execution). The purpose of the branch predictor is to improve the flow in the instruction pipeline. They play a critical role in achieving high effective performance in many modern pipelined microprocessor architectures such as x86. In this project, we analyze the behavior of different branch predictor configurations in three well-recognized benchmarks, especially GCC, ANAGRAM and GO. We used simplescalar sim-outorder, which models all the execution aspects of Alpha 21264. The simulations provide the CPI values, which we use to compare among different benchmarks. We have used three types of hardware based branch prediction strategies, they are: 1) Bimodal Predictor: It is a simple predictor, which uses 2-bit saturating counters to predict if a given branch is likely to be taken or not. 2) Two Level Predictor: A two-level adaptive predictor with an n-bit history is that it can predict any repetitive sequence with any period if all n-bit sub-sequences are different. The advantage of the two-level adaptive predictor is that it can quickly learn to predict an arbitrary repetitive pattern. 3) Combined Predictor: A hybrid predictor also called combined predictor implements more than one prediction mechanism. The final prediction is based either on a meta-predictor that remembers which of the predictors has made the best predictions in the past or a majority vote function based on an odd number of different predictors.
  • 3. P a g e | 2 Part 1: Performance analysis of different types of branch predictors The simulation is done for different configuration of Return Address Space (RAS) and types of branch predictions.  Baseline default RAS: Bimodal predictor with the default value for RAS. -bpred bimod -bpred:bimod 256 -bpred:ras 8 -bpred:btb 64 2  2 Level Predictor: Uses two bit for defining the state for branch predictor. -bpred 2lev -bpred:2lev 1 256 4 0 -bpred:ras 8 -bpred:btb 64 2  Comb: Combines a two levels and bimodal predictor. -bpred comb -bpred:comb 256 -bpred:bimod 256 -bpred:2lev 1 256 4 0 -bpred:ras 8 - bpred:btb 64 2  RAS 4: Change the return address stack (RAS) size to 4. -bpred bimod -bpred:bimod 256 -bpred:ras 4 -bpred:btb 64 2  RAS 16: Change the return address stack (RAS) size to 16. -bpred bimod -bpred:bimod 256 -bpred:ras 16 -bpred:btb 64 2 Performance Analysis based on CPI Sr. No. Configuration Benchmarks GCC ANAGRAM GO 1 Baseline 0.95 0.4674 0.7571 2 2 Level Predictor 0.9822 0.4605 0.7893 3 Comb 0.8678 0.4546 0.7516 4 Bimod: RAS 4 0.9538 0.4678 0.7574 5 Bimod: RAS 16 0.9498 0.4674 0.7571 Graphical Representation with above CPI 0 0.2 0.4 0.6 0.8 1 1.2 Baseline 2 Level Predictor Comb RAS 4 RAS 16 ANAGRAM GO GCC
  • 4. P a g e | 3 Above graph clearly displays the performance of different configurations of branch predictor. Analysis: Benchmark – GCC vs BP Configurations GCC benchmark has more CPI as compared to the other benchmarks. Specifically, CPI improved for combination of two level and bimodal predictor (Comb). It has high CPI for 2 level predictor which uses two bits for defining state of branch predictor. Analysis: Benchmark – ANAGRAM vs BP Configurations From the above graph, we can infer that ANAGRAM benchmark has a less CPI than the other two benchmarks. The performance of ANAGRAM benchmark is fairly constant for all the configurations of branch predictor. Specifically, CPI is optimal for combination of two level and bimodal predictor (Comb). Analysis: Benchmark – GO vs BP Configurations Above graph shows that GO benchmark performs better than the GCC benchmark. The performance of GO benchmark is almost constant for all the configurations of branch predictor. Specifically, CPI is optimal for combination of two level and bimodal predictor (Comb). With respect to bimod size variation, if we change baseline configuration from the default return address space from size of 4 to size of 16, CPI performance gets better. RAS size does not have much impact on CPI.
  • 5. Performance Analysis based on Address Hit Rates

    Sr. No.  Configuration      GCC     ANAGRAM  GO
    1        Baseline           0.6734  0.956    0.7071
    2        2 Level Predictor  0.6253  0.9575   0.6484
    3        Comb               0.8339  0.9694   0.709
    4        Bimod: RAS 4       0.6697  0.9555   0.7067
    5        Bimod: RAS 16      0.6736  0.9605   0.7071

    Graphical Representation with above Address Hit Rates
    [Figure: bar chart of address hit rate by configuration (Baseline, 2 Level Predictor, Comb, Bimod: RAS 4, Bimod: RAS 16) for GCC, GO, and ANAGRAM]

    The graph shows the address hit rate of each branch predictor configuration for the three benchmarks. For ANAGRAM, the address hit rate is high (above 0.95) in every configuration and is lowest for bimod with a Return Address Stack (RAS) size of 4 (0.9555). For GO and GCC, the 2-level predictor gives the lowest address hit rate (0.6484 and 0.6253, respectively). Comb gives the highest address hit rate for all three benchmarks.
  • 6. Performance Analysis based on Direction Hit Rates

    Sr. No.  Configuration      GCC     ANAGRAM  GO
    1        Baseline           0.6734  0.9605   0.7929
    2        2 Level Predictor  0.7919  0.9614   0.7372
    3        Comb               0.8617  0.9738   0.7978
    4        Bimod: RAS 4       0.8431  0.9605   0.7929
    5        Bimod: RAS 16      0.8431  0.9605   0.7929

    The graph of the direction hit rates for each benchmark gives more information on the effect of the branch predictor configurations on the different benchmarks.

    Graphical Representation with above Direction Hit Rates
    [Figure: bar chart of direction hit rate by configuration (Baseline, 2 Level Predictor, Comb, Bimod: RAS 4, Bimod: RAS 16) for GCC, GO, and ANAGRAM]

    The direction hit rates stay fairly constant for each benchmark. ANAGRAM has a higher direction hit rate than the other two benchmarks (about 0.96 to 0.97). For GO, the 2-level predictor gives the lowest direction hit rate (0.7372). Changing the Return Address Stack size from 8 to 4 or from 8 to 16 leaves the direction hit rate unchanged for ANAGRAM and GO; for GCC, the table reports a higher rate for both RAS sizes (0.8431) than for the baseline (0.6734).
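The 2-bit saturating counter scheme behind the bimodal baseline (-bpred:bimod 256) can be sketched as follows. This is a simplified illustration rather than SimpleScalar's bpred.c: the index function, the initial counter value, and all names (bimod_t, bimod_init, bimod_predict, bimod_update) are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BIMOD_SIZE 256u  /* table size, as in -bpred:bimod 256 */

/* One 2-bit counter per entry: 0 and 1 predict not taken, 2 and 3 predict taken. */
typedef struct {
    uint8_t counter[BIMOD_SIZE];
} bimod_t;

/* Assumption: every counter starts at 1 (weakly not taken). */
void bimod_init(bimod_t *p)
{
    assert(p != NULL);
    for (unsigned i = 0; i < BIMOD_SIZE; i++)
        p->counter[i] = 1;
}

/* Hypothetical index function: drop the low 2 address bits
   (4-byte instructions assumed) and wrap to the table size. */
unsigned bimod_index(uint32_t branch_addr)
{
    return (branch_addr >> 2) & (BIMOD_SIZE - 1u);
}

/* Returns 1 if the branch is predicted taken, 0 otherwise. */
int bimod_predict(const bimod_t *p, uint32_t branch_addr)
{
    return p->counter[bimod_index(branch_addr)] >= 2;
}

/* Move the counter one step toward the actual outcome, saturating at 0 and 3. */
void bimod_update(bimod_t *p, uint32_t branch_addr, int taken)
{
    uint8_t *c = &p->counter[bimod_index(branch_addr)];
    if (taken) {
        if (*c < 3)
            (*c)++;
    } else {
        if (*c > 0)
            (*c)--;
    }
}
```

Because the counter must move two steps to flip a strong prediction, a single not-taken outcome in a mostly taken loop branch does not change the next prediction.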
  • 7. Part 2: Modification of the code to accommodate address misses

    We carried out modifications in the following two files in SimpleScalar.
    1) bpred.h
    2) bpred.c

    1) Changes in file bpred.h:

        ----------------
        /* branch predictor def */
        struct bpred_t {
          ------
          } dirpred;

          struct {
            --------
          } retstack;

          /* stats */
          counter_t addr_hits;    /* num correct addr-predictions */
          counter_t dir_hits;     /* num correct dir-predictions (incl addr) */
          counter_t addr_misses;  /* num addr_misses */
          counter_t used_ras;     /* num RAS predictions used */
          counter_t used_bimod;   /* num bimodal predictions used (BPredComb) */
          -----------
        };

    2) Changes in file bpred.c:

        -----------
        sprintf(buf, "%s.dir_hits", name);
        stat_reg_counter(sdb, buf,
                         "total number of direction-predicted hits "
                         "(includes addr-hits)",
                         &pred->dir_hits, 0, NULL);
        sprintf(buf, "%s.addr_misses", name);
        stat_reg_counter(sdb, buf,
                         "total number of addr-misses",
                         &pred->addr_misses, 0, NULL);
        -----------
        if (bpred == NULL)
          return;
        bpred->dir_hits = 0;
        bpred->addr_misses = 0;
        -----------
        /* Have a branch here */
        if (correct)
          pred->addr_hits++;
        if (!!pred_taken == !!taken)
          pred->dir_hits++;
        else
          pred->misses++;
        pred->addr_misses = (pred->misses + pred->dir_hits - pred->addr_hits);
        -----------
        -----------
        }
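The arithmetic behind the new counter can be checked in isolation. Every update increments exactly one of dir_hits or misses, so their sum is the number of branches updated, and addr_misses is that total minus addr_hits. The sketch below mirrors only the update step shown above; bpred_stats_t and stats_update are hypothetical stand-ins for illustration, not SimpleScalar definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the few bpred_t statistics touched in Part 2. */
typedef struct {
    unsigned long addr_hits;
    unsigned long dir_hits;
    unsigned long misses;
    unsigned long addr_misses;
} bpred_stats_t;

/* correct: the predicted target address matched the actual one.
   pred_taken / taken: predicted and actual branch direction. */
void stats_update(bpred_stats_t *s, int correct, int pred_taken, int taken)
{
    assert(s != NULL);
    if (correct)
        s->addr_hits++;
    if (!!pred_taken == !!taken)
        s->dir_hits++;
    else
        s->misses++;
    /* dir_hits + misses counts every update, so this is updates - addr_hits */
    s->addr_misses = s->misses + s->dir_hits - s->addr_hits;
}
```

For example, after one fully correct prediction, one correct direction with a wrong target, and one wrong direction, the counters read addr_hits = 1, dir_hits = 2, misses = 1, and addr_misses = 2.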
  • 8. Part 3: Comparison of BTB Performance

    The simulation is run for the following configurations of the Branch Target Buffer (BTB):

    * Baseline BTB configuration: 64 sets, 2-way associativity
        -bpred bimod -bpred:bimod 256 -bpred:btb 64 2
    * Effect of the number of sets in the BTB:
        -bpred bimod -bpred:bimod 256 -bpred:btb 32 2
        -bpred bimod -bpred:bimod 256 -bpred:btb 128 2
    * Effect of associativity when the total size of the BTB is fixed:
        -bpred bimod -bpred:bimod 256 -bpred:btb 32 4
        -bpred bimod -bpred:bimod 256 -bpred:btb 128 1

    Performance Analysis based on addr_hits

    Sr. No.  Configuration   GCC      ANAGRAM  GO
    1        64 sets/2 way   2235498  2771048  1934760
    2        32 sets/2 way   2095859  2746365  1832302
    3        128 sets/2 way  2389785  2777415  2008597
    4        32 sets/4 way   2260256  2775372  1936745
    5        128 sets/1 way  2197498  2759944  1893595

    Graphical Representation with above addr_hits
    [Figure: bar chart of addr_hits by BTB configuration (64 sets/2 way, 32 sets/2 way, 128 sets/2 way, 32 sets/4 way, 128 sets/1 way) for GO, GCC, and ANAGRAM]
  • 9. The graph above shows the behavior of the various Branch Target Buffer (BTB) configurations for the different benchmarks. Among the three benchmarks, ANAGRAM has the highest address hits, GCC has moderate address hits, and GO has the fewest. For all three benchmarks, the address hits are lowest for the BTB with 32 sets and 2-way set associativity.

    Comparison of BTB Performance based on addr_misses

    Sr. No.  Configuration   GCC      ANAGRAM  GO
    1        64 sets/2 way   1084176  127541   801464
    2        32 sets/2 way   1223815  152224   903922
    3        128 sets/2 way  929889   121174   727627
    4        32 sets/4 way   1059418  123217   799479
    5        128 sets/1 way  1122176  138645   842629

    Graphical Representation with above addr_misses
    [Figure: bar chart of addr_misses by BTB configuration (64 sets/2 way, 32 sets/2 way, 128 sets/2 way, 32 sets/4 way, 128 sets/1 way) for ANAGRAM, GO, and GCC]

    As expected, address misses are lowest for ANAGRAM, and GCC has the most address misses of the three benchmarks. Decreasing the number of sets from 64 to 32 increases the address misses, and increasing the number of sets from 64 to 128 decreases them, because capacity misses are reduced by increasing the number of sets. In the 32 sets/4 way configuration, the address misses decrease even though the number of sets drops from 64 to 32, because the higher associativity reduces conflict misses. In the 128 sets/1 way configuration, the address misses increase despite the larger number of sets, because direct mapping causes more conflict misses.
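The conflict-miss argument can be illustrated with a small set-associative BTB model. This is a minimal sketch under stated assumptions, not SimpleScalar's BTB code: the index function (address shifted right by 2, assuming 4-byte instructions), the use of the full address as the tag, LRU replacement, and all names (btb_t, btb_init, btb_access, btb_alternate_misses) are chosen for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_SETS 128
#define MAX_WAYS 4

typedef struct {
    int valid;
    uint32_t tag;       /* full branch address used as tag, for simplicity */
    uint32_t target;
    unsigned last_use;  /* timestamp for LRU; 0 means never used */
} btb_entry_t;

typedef struct {
    unsigned sets, ways, clock;
    btb_entry_t e[MAX_SETS][MAX_WAYS];
} btb_t;

void btb_init(btb_t *b, unsigned sets, unsigned ways)
{
    /* the number of sets must be a power of two for the mask below */
    assert(sets > 0 && sets <= MAX_SETS && (sets & (sets - 1)) == 0);
    assert(ways > 0 && ways <= MAX_WAYS);
    memset(b, 0, sizeof *b);
    b->sets = sets;
    b->ways = ways;
}

/* Returns 1 on a hit. On a miss, installs the branch over the
   least recently used way of its set and returns 0. */
int btb_access(btb_t *b, uint32_t addr, uint32_t target)
{
    unsigned set = (addr >> 2) & (b->sets - 1);
    btb_entry_t *row = b->e[set];
    unsigned victim = 0;

    b->clock++;
    for (unsigned w = 0; w < b->ways; w++) {
        if (row[w].valid && row[w].tag == addr) {
            row[w].last_use = b->clock;
            return 1;
        }
    }
    /* invalid entries have last_use == 0, so they are filled first */
    for (unsigned w = 1; w < b->ways; w++) {
        if (row[w].last_use < row[victim].last_use)
            victim = w;
    }
    row[victim] = (btb_entry_t){1, addr, target, b->clock};
    return 0;
}

/* Counts misses when two branches execute alternately for `rounds` rounds. */
unsigned btb_alternate_misses(btb_t *b, uint32_t a, uint32_t c, unsigned rounds)
{
    unsigned misses = 0;
    for (unsigned i = 0; i < rounds; i++) {
        misses += !btb_access(b, a, a + 64);
        misses += !btb_access(b, c, c + 64);
    }
    return misses;
}
```

In this model, two branches at 0x1000 and 0x1200 map to the same set in both a 128 sets/1 way and a 64 sets/2 way BTB. Alternating between them for 10 rounds misses on all 20 accesses in the direct-mapped case, because each branch evicts the other, but misses only on the first 2 accesses in the 2-way case.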
  • 10. Comparison of BTB Performance based on CPI

    Sr. No.  Configuration   GCC     ANAGRAM  GO
    1        64 sets/2 way   0.9500  0.4674   0.7571
    2        32 sets/2 way   0.9664  0.4711   0.7645
    3        128 sets/2 way  0.9304  0.4664   0.7496
    4        32 sets/4 way   0.9491  0.4670   0.7575
    5        128 sets/1 way  0.9528  0.4686   0.7583

    Graphical Representation with above CPI
    [Figure: bar chart of CPI by BTB configuration (64 sets/2 way, 32 sets/2 way, 128 sets/2 way, 32 sets/4 way, 128 sets/1 way) for GCC, ANAGRAM, and GO]

    CPI remains fairly constant for every benchmark across the BTB configurations. Among the benchmarks, ANAGRAM has the lowest CPI and GCC has the highest. CPI is higher for 32 sets/2 way than for 64 sets/2 way, which has twice as many sets. For the 32 sets/4 way and 128 sets/1 way configurations, which have the same total number of entries (128) as the baseline, the CPI is almost equal to that of 64 sets/2 way. The configuration with 128 sets and 2-way associativity has the lowest CPI of all configurations.
  • 11. Comparison of BTB Performance based on Branch Predictor Hit Rates

    Sr. No.  Configuration   GCC     ANAGRAM  GO
    1        64 sets/2 way   0.6779  0.9546   0.6926
    2        32 sets/2 way   0.636   0.9476   0.6527
    3        128 sets/2 way  0.7221  0.9557   0.7225
    4        32 sets/4 way   0.6852  0.9573   0.6931
    5        128 sets/1 way  0.665   0.9518   0.6775

    Graphical Representation with above Branch Predictor Hit Rates
    [Figure: bar chart of branch predictor hit rate by BTB configuration (64 sets/2 way, 32 sets/2 way, 128 sets/2 way, 32 sets/4 way, 128 sets/1 way) for GCC, ANAGRAM, and GO]

    The graph shows that the branch predictor hit rate drops for all the benchmarks when the number of sets in the BTB decreases. Comparing the configurations closely, the BTB with 32 sets and 2-way set associativity gives the lowest branch predictor hit rate for all the benchmarks.

    CONCLUSION
    For an optimal branch predictor, a larger number of BTB sets is recommended, but the tradeoff between cost and performance should be taken into consideration. For high address hit rates and direction hit rates, the simulation results suggest that the combination of two-level and bimodal predictors (Comb) is the better configuration.