We compared the performance of several branch predictor types, with different return address stack (RAS) and branch target buffer (BTB) configurations, on three benchmarks (GCC, GO and ANAGRAM) using the SimpleScalar simulator. Cycles per instruction (CPI), address hit rate and direction hit rate were the metrics used to compare the configurations and draw conclusions.
Evaluation of Branch Predictors
The University of Texas at Dallas
Department of Electrical Engineering
EECE/CS 6304: COMPUTER ARCHITECTURE
PROJECT #2
“ANALYSIS OF DIFFERENT TYPES OF BRANCH PREDICTORS”
Submitted by,
Bharat Biyani (2021152193)
Shree Viswa Shamanthan L D (2021180127)
INTRODUCTION
In computer architecture, a branch predictor is a digital circuit that tries to speculate
which way a branch will go before this is known for sure (i.e., before its execution). The purpose
of the branch predictor is to improve the flow in the instruction pipeline. Branch predictors play a
critical role in achieving high effective performance in many modern pipelined microprocessor
architectures such as x86.
In this project, we analyze the behavior of different branch predictor configurations on
three well-recognized benchmarks, namely GCC, ANAGRAM and GO. We used SimpleScalar's
sim-outorder, which models all the execution aspects of Alpha 21264. The simulations provide
the CPI values, which we use to compare the configurations across the benchmarks.
We used three types of hardware-based branch prediction strategies:
1) Bimodal Predictor: It is a simple predictor, which uses 2-bit saturating counters to predict if a
given branch is likely to be taken or not.
2) Two Level Predictor: A two-level adaptive predictor with an n-bit history can predict
any repetitive sequence with any period, provided all n-bit sub-sequences are different. The
advantage of the two-level adaptive predictor is that it can quickly learn to predict an
arbitrary repetitive pattern.
3) Combined Predictor: A hybrid predictor also called combined predictor implements more
than one prediction mechanism. The final prediction is based either on a meta-predictor that
remembers which of the predictors has made the best predictions in the past or a majority
vote function based on an odd number of different predictors.
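The three strategies above can be sketched in a few lines of C. This is a minimal illustration, not SimpleScalar's implementation: the names `pred_t`, `pc_index`, `hist_index` and `run_pattern`, the initial counter values, and the way the history and PC bits are combined into an index are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_SIZE 256u   /* power of two; illustrative table size */
#define HIST_BITS  4u     /* history width of the two-level sketch */

/* 2-bit saturating counter: 0,1 predict not taken; 2,3 predict taken. */
static uint8_t ctr_update(uint8_t c, int up)
{
    if (up) return (uint8_t)(c < 3 ? c + 1 : 3);
    return (uint8_t)(c > 0 ? c - 1 : 0);
}

typedef struct {
    uint8_t bimod[TABLE_SIZE];   /* bimodal counters, indexed by PC */
    uint8_t twolev[TABLE_SIZE];  /* pattern counters, indexed by history and PC */
    uint8_t meta[TABLE_SIZE];    /* chooser: >= 2 trusts the two-level side */
    uint8_t history;             /* last HIST_BITS branch outcomes */
} pred_t;

static void pred_init(pred_t *p)
{
    assert((TABLE_SIZE & (TABLE_SIZE - 1u)) == 0);
    for (unsigned i = 0; i < TABLE_SIZE; i++)
        p->bimod[i] = p->twolev[i] = p->meta[i] = 1;  /* assumed start state */
    p->history = 0;
}

/* Hypothetical index functions: drop the low 2 PC bits, keep table-sized bits. */
static unsigned pc_index(uint32_t pc) { return (pc >> 2) & (TABLE_SIZE - 1u); }

static unsigned hist_index(const pred_t *p, uint32_t pc)
{
    return (p->history | ((pc >> 2) << HIST_BITS)) & (TABLE_SIZE - 1u);
}

static int bimod_predict(const pred_t *p, uint32_t pc)
{
    return p->bimod[pc_index(pc)] >= 2;
}

static int twolev_predict(const pred_t *p, uint32_t pc)
{
    return p->twolev[hist_index(p, pc)] >= 2;
}

/* Combined predictor: the meta counter picks one of the two components. */
static int comb_predict(const pred_t *p, uint32_t pc)
{
    return p->meta[pc_index(pc)] >= 2 ? twolev_predict(p, pc)
                                      : bimod_predict(p, pc);
}

static void pred_update(pred_t *p, uint32_t pc, int taken)
{
    int b = bimod_predict(p, pc), t = twolev_predict(p, pc);
    taken = !!taken;
    if (b != t)  /* train the chooser only when the components disagree */
        p->meta[pc_index(pc)] = ctr_update(p->meta[pc_index(pc)], t == taken);
    p->bimod[pc_index(pc)] = ctr_update(p->bimod[pc_index(pc)], taken);
    p->twolev[hist_index(p, pc)] = ctr_update(p->twolev[hist_index(p, pc)], taken);
    p->history = (uint8_t)(((p->history << 1) | taken) & ((1u << HIST_BITS) - 1u));
}

typedef struct { unsigned bimod, twolev, comb; } miss_t;

/* Replay a repeating taken/not-taken pattern and count mispredictions. */
static miss_t run_pattern(pred_t *p, uint32_t pc, const int *pattern,
                          unsigned len, unsigned rounds)
{
    miss_t m = {0, 0, 0};
    for (unsigned r = 0; r < rounds; r++)
        for (unsigned i = 0; i < len; i++) {
            int taken = !!pattern[i];
            m.bimod  += (unsigned)(bimod_predict(p, pc) != taken);
            m.twolev += (unsigned)(twolev_predict(p, pc) != taken);
            m.comb   += (unsigned)(comb_predict(p, pc) != taken);
            pred_update(p, pc, taken);
        }
    return m;
}
```

On a repeating taken, taken, not-taken pattern, the bimodal side of this sketch keeps mispredicting the not-taken branch once per period, while the two-level side learns the pattern after a short warm-up and the chooser then follows it.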
Part 1: Performance analysis of different types of branch predictors
The simulation is done for different configurations of the Return Address Stack (RAS) and different
types of branch predictors.
Baseline default RAS: Bimodal predictor with the default value for RAS.
-bpred bimod -bpred:bimod 256 -bpred:ras 8 -bpred:btb 64 2
2 Level Predictor: Two-level predictor with one first-level history entry, a 256-entry second-level table and a 4-bit history (no XOR of history and address).
-bpred 2lev -bpred:2lev 1 256 4 0 -bpred:ras 8 -bpred:btb 64 2
Comb: Combines a two-level predictor and a bimodal predictor.
-bpred comb -bpred:comb 256 -bpred:bimod 256 -bpred:2lev 1 256 4 0 -bpred:ras 8 -bpred:btb 64 2
RAS 4: Change the return address stack (RAS) size to 4.
-bpred bimod -bpred:bimod 256 -bpred:ras 4 -bpred:btb 64 2
RAS 16: Change the return address stack (RAS) size to 16.
-bpred bimod -bpred:bimod 256 -bpred:ras 16 -bpred:btb 64 2
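To make the role of the RAS size concrete, the following is a minimal sketch of a return address stack as a circular buffer. It is an assumption-based illustration, not SimpleScalar's code: the names `ras_t`, `ras_push`, `ras_pop` and `ras_correct_returns`, and the synthetic return addresses, are hypothetical. It shows that a call chain deeper than the stack overwrites the oldest entries, so the outermost returns are mispredicted.

```c
#include <assert.h>
#include <stdint.h>

#define RAS_MAX 64u

typedef struct {
    uint32_t entry[RAS_MAX];
    unsigned size;   /* number of usable entries, e.g. 4, 8 or 16 */
    unsigned tos;    /* index of the current top of stack */
} ras_t;

static void ras_init(ras_t *r, unsigned size)
{
    assert(size > 0 && size <= RAS_MAX);
    for (unsigned i = 0; i < RAS_MAX; i++)
        r->entry[i] = 0;
    r->size = size;
    r->tos = size - 1;
}

/* On a call: advance the top of stack (wrapping) and store the return address. */
static void ras_push(ras_t *r, uint32_t ret_addr)
{
    r->tos = (r->tos + 1) % r->size;
    r->entry[r->tos] = ret_addr;
}

/* On a return: use the top entry as the predicted target and step back. */
static uint32_t ras_pop(ras_t *r)
{
    uint32_t predicted = r->entry[r->tos];
    r->tos = (r->tos + r->size - 1) % r->size;
    return predicted;
}

/* Nest `depth` calls, then return from all of them; count correct predictions. */
static unsigned ras_correct_returns(unsigned size, unsigned depth)
{
    ras_t r;
    unsigned correct = 0;
    ras_init(&r, size);
    for (unsigned d = 0; d < depth; d++)
        ras_push(&r, 0x1000u + 4u * d);          /* synthetic return addresses */
    for (unsigned d = depth; d-- > 0;)
        if (ras_pop(&r) == 0x1000u + 4u * d)
            correct++;
    return correct;
}
```

With a call depth of 6, the size-4 stack in this sketch predicts only the 4 innermost returns correctly, while sizes 8 and 16 predict all 6. A larger RAS only helps when the program actually nests calls deeper than the smaller one can hold.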
Performance Analysis based on CPI
                            Benchmarks
Sr. No.  Configuration      GCC     ANAGRAM  GO
1        Baseline           0.95    0.4674   0.7571
2        2 Level Predictor  0.9822  0.4605   0.7893
3        Comb               0.8678  0.4546   0.7516
4        Bimod: RAS 4       0.9538  0.4678   0.7574
5        Bimod: RAS 16      0.9498  0.4674   0.7571
Graphical Representation with above CPI
[Figure: CPI (y-axis, 0 to 1.2) for each configuration (Baseline, 2 Level Predictor, Comb, RAS 4, RAS 16), one series per benchmark (ANAGRAM, GO, GCC)]
The graph above shows the CPI of each branch predictor configuration for the three benchmarks.
Analysis: Benchmark – GCC vs BP Configurations
The GCC benchmark has a higher CPI than the other benchmarks. Its CPI is lowest (0.8678)
for the combination of two-level and bimodal predictor (Comb) and highest (0.9822) for the
2 level predictor.
Analysis: Benchmark – ANAGRAM vs BP Configurations
From the above graph, we can infer that the ANAGRAM benchmark has a lower CPI than the
other two benchmarks. The CPI of the ANAGRAM benchmark is fairly constant across all the
configurations of branch predictor. It is lowest (0.4546) for the combination of two-level and
bimodal predictor (Comb).
Analysis: Benchmark – GO vs BP Configurations
The above graph shows that the GO benchmark has a lower CPI than the GCC benchmark. The
CPI of the GO benchmark is almost constant across all the configurations of branch predictor.
It is lowest (0.7516) for the combination of two-level and bimodal predictor (Comb). For the
bimodal predictor, increasing the return address stack size from 4 to 16 improves CPI only
marginally (0.7574 to 0.7571), so RAS size does not have much impact on CPI.
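The CPI differences discussed above are easier to compare as a percentage change relative to the baseline. The helper below is a small sketch; the function name `cpi_change_percent` is hypothetical, and the values passed to it would be taken from the CPI table.

```c
#include <assert.h>

/* Percentage CPI reduction relative to the baseline.
 * Positive means the configuration is faster than the baseline. */
static double cpi_change_percent(double baseline_cpi, double cpi)
{
    assert(baseline_cpi > 0.0);
    return (baseline_cpi - cpi) / baseline_cpi * 100.0;
}
```

For GCC, `cpi_change_percent(0.95, 0.8678)` gives roughly an 8.7% reduction for Comb, while `cpi_change_percent(0.95, 0.9822)` is negative (about -3.4%), reflecting the higher CPI of the 2 level predictor.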
Performance Analysis based on Address Hit Rates
                            Benchmarks
Sr. No.  Configuration      GCC     ANAGRAM  GO
1        Baseline           0.6734  0.956    0.7071
2        2 Level Predictor  0.6253  0.9575   0.6484
3        Comb               0.8339  0.9694   0.709
4        Bimod: RAS 4       0.6697  0.9555   0.7067
5        Bimod: RAS 16      0.6736  0.9605   0.7071
Graphical Representation with above Address Hit Rates
The above graph shows the address hit rate of each branch predictor configuration for the
different benchmarks.
For the ANAGRAM benchmark, the address hit rate is above 0.95 in every configuration; the
bimodal predictor with a Return Address Stack (RAS) of size 4 is the lowest at 0.9555.
For the GO benchmark, the 2 level predictor has the lowest address hit rate (0.6484); the
other configurations are all close to 0.707.
For the GCC benchmark, the 2 level predictor again has the lowest address hit rate (0.6253),
and Comb has the highest (0.8339).
Performance Analysis based on Direction Hit Rates
                            Benchmarks
Sr. No.  Configuration      GCC     ANAGRAM  GO
1        Baseline           0.6734  0.9605   0.7929
2        2 Level Predictor  0.7919  0.9614   0.7372
3        Comb               0.8617  0.9738   0.7978
4        Bimod: RAS 4       0.8431  0.9605   0.7929
5        Bimod: RAS 16      0.8431  0.9605   0.7929
The graph for the Direction Hit Rates with respect to every benchmark will provide us
more information on the effect of branch prediction configurations on different benchmarks.
Graphical Representation with above Direction Hit Rates
The direction hit rate stays fairly constant for each benchmark. The ANAGRAM benchmark has
higher direction hit rates than the other two benchmarks. For GO, the 2 level predictor gives
the lowest direction hit rate. In the table, changing the return address stack size from 8 to 4
or from 8 to 16 raises the bimodal direction hit rate for GCC (0.6734 to 0.8431) and leaves it
unchanged for ANAGRAM and GO.
Part 2: Modification of the code to accommodate address misses
We carried out modifications in the following two files in simplescalar.
1) bpred.h
2) bpred.c
1) Changes in file bpred.h:
----------------
/* branch predictor def */
struct bpred_t {
  ------
  } dirpred;
  struct {
  --------
  } retstack;
  /* stats */
  counter_t addr_hits;    /* num correct addr-predictions */
  counter_t dir_hits;     /* num correct dir-predictions (incl addr) */
  counter_t addr_misses;  /* num addr_misses */
  counter_t used_ras;     /* num RAS predictions used */
  counter_t used_bimod;   /* num bimodal predictions used (BPredComb) */
  -----------
};
2) Changes in file bpred.c:
-----------
sprintf(buf, "%s.dir_hits", name);
stat_reg_counter(sdb, buf,
                 "total number of direction-predicted hits "
                 "(includes addr-hits)",
                 &pred->dir_hits, 0, NULL);
sprintf(buf, "%s.addr_misses", name);
stat_reg_counter(sdb, buf, "total number of addr-misses",
                 &pred->addr_misses, 0, NULL);
-----------
if (bpred == NULL)
  return;
bpred->dir_hits = 0;
bpred->addr_misses = 0;
-----------
/* Have a branch here */
if (correct)
  pred->addr_hits++;
if (!!pred_taken == !!taken)
  pred->dir_hits++;
else
  pred->misses++;
/* misses + dir_hits is the number of branches seen so far, so
   addr_misses = branches seen - addr_hits */
pred->addr_misses = (pred->misses + pred->dir_hits - pred->addr_hits);
-----------
-----------
}
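The counter update above can be checked in isolation. The following is a stand-alone sketch that mirrors the same logic outside the simulator; `struct bp_stats`, `bp_stats_update` and the local `counter_t` typedef are stand-ins introduced for this example, not SimpleScalar definitions.

```c
#include <assert.h>

typedef unsigned long long counter_t;   /* stand-in for the simulator's counter type */

struct bp_stats {
    counter_t addr_hits;    /* correct target-address predictions */
    counter_t dir_hits;     /* correct direction predictions */
    counter_t misses;       /* wrong direction predictions */
    counter_t addr_misses;  /* derived: branches seen minus addr_hits */
};

/* Apply one resolved branch to the counters, as in the modified update code. */
static void bp_stats_update(struct bp_stats *s, int correct_addr,
                            int pred_taken, int taken)
{
    assert(s != 0);
    if (correct_addr)
        s->addr_hits++;
    if (!!pred_taken == !!taken)
        s->dir_hits++;
    else
        s->misses++;
    /* dir_hits + misses counts every branch exactly once */
    s->addr_misses = s->misses + s->dir_hits - s->addr_hits;
}
```

After any sequence of updates, `addr_hits + addr_misses` equals `dir_hits + misses`, so a branch with the right direction but the wrong target counts as an address miss.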
Part 3: Comparison of BTB Performance
The simulation is done for the following configurations of Branch Target Buffer:
Baseline BTB configuration: 64 sets, 2 way associativity
-bpred bimod -bpred:bimod 256 -bpred:btb 64 2
Showing the effect of the number of sets in BTB with the following options
-bpred bimod -bpred:bimod 256 -bpred:btb 32 2
-bpred bimod -bpred:bimod 256 -bpred:btb 128 2
Showing the effect of associativity when the total size of BTB is fixed with the following options
-bpred bimod -bpred:bimod 256 -bpred:btb 32 4
-bpred bimod -bpred:bimod 256 -bpred:btb 128 1
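How sets and associativity interact can be illustrated with a small set-associative BTB model. This is a sketch under stated assumptions, not the simulator's BTB: the set index is taken from the low PC bits after dropping 2 bits, the full PC is used as the tag, replacement is LRU, every missing branch is inserted, and all names (`btb_t`, `btb_lookup`, `btb_insert`, `btb_count_hits`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BTB_MAX_SETS 128u
#define BTB_MAX_WAYS 4u

typedef struct {
    uint32_t tag;       /* full branch PC, used as the tag for simplicity */
    uint32_t target;    /* predicted target address */
    unsigned last_use;  /* timestamp for LRU replacement */
    int      valid;
} btb_entry_t;

typedef struct {
    btb_entry_t e[BTB_MAX_SETS][BTB_MAX_WAYS];
    unsigned sets, ways, clock;
} btb_t;

static void btb_init(btb_t *b, unsigned sets, unsigned ways)
{
    assert(sets > 0 && sets <= BTB_MAX_SETS && (sets & (sets - 1u)) == 0);
    assert(ways > 0 && ways <= BTB_MAX_WAYS);
    memset(b, 0, sizeof *b);
    b->sets = sets;
    b->ways = ways;
}

static unsigned btb_set(const btb_t *b, uint32_t pc)
{
    return (pc >> 2) & (b->sets - 1u);
}

/* Returns 1 and writes *target on a hit, 0 on a miss. */
static int btb_lookup(btb_t *b, uint32_t pc, uint32_t *target)
{
    btb_entry_t *set = b->e[btb_set(b, pc)];
    for (unsigned w = 0; w < b->ways; w++)
        if (set[w].valid && set[w].tag == pc) {
            set[w].last_use = ++b->clock;
            *target = set[w].target;
            return 1;
        }
    return 0;
}

/* Fill an empty way if there is one, otherwise evict the least recently used. */
static void btb_insert(btb_t *b, uint32_t pc, uint32_t target)
{
    btb_entry_t *set = b->e[btb_set(b, pc)];
    unsigned victim = 0;
    for (unsigned w = 0; w < b->ways; w++) {
        if (!set[w].valid) { victim = w; break; }
        if (set[w].last_use < set[victim].last_use) victim = w;
    }
    set[victim].tag = pc;
    set[victim].target = target;
    set[victim].last_use = ++b->clock;
    set[victim].valid = 1;
}

/* Replay a list of branch PCs `rounds` times and count BTB hits. */
static unsigned btb_count_hits(unsigned sets, unsigned ways,
                               const uint32_t *pcs, unsigned n, unsigned rounds)
{
    btb_t b;
    unsigned hits = 0;
    btb_init(&b, sets, ways);
    for (unsigned r = 0; r < rounds; r++)
        for (unsigned i = 0; i < n; i++) {
            uint32_t t;
            if (btb_lookup(&b, pcs[i], &t))
                hits++;
            else
                btb_insert(&b, pcs[i], pcs[i] + 64u);  /* synthetic target */
        }
    return hits;
}
```

Two branches at `0x1000` and `0x1200` map to the same set in all three geometries here. In the 128 sets/1 way model they evict each other on every access, while the 64 sets/2 way and 32 sets/4 way models hold both after the first two misses. This is the conflict-miss effect that associativity addresses.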
Performance Analysis based on addr_hits
                          Benchmarks
Sr. No.  Configuration    GCC      ANAGRAM  GO
1        64 sets/2 way    2235498  2771048  1934760
2        32 sets/2 way    2095859  2746365  1832302
3        128 sets/2 way   2389785  2777415  2008597
4        32 sets/4 way    2260256  2775372  1936745
5        128 sets/1 way   2197498  2759944  1893595
Graphical Representation with above addr_hits
[Figure: addr_hits (y-axis, 0 to 3000000) for each BTB configuration (64 sets/2 way, 32 sets/2 way, 128 sets/2 way, 32 sets/4 way, 128 sets/1 way), one series per benchmark (GO, GCC, ANAGRAM)]
The above graph shows the behavior of various configurations of Branch Target Buffer
(BTB) for different benchmarks. Among the three benchmarks, ANAGRAM has the highest
number of address hits, GCC is in between, and GO has the fewest. For all three benchmarks,
the table shows the lowest number of address hits for the BTB with 32 sets and 2 way set
associativity, and the highest for the BTB with 128 sets and 2 way set associativity.
Comparison of BTB Performance based on addr_misses
                          Benchmarks
Sr. No.  Configuration    GCC      ANAGRAM  GO
1        64 sets/2 way    1084176  127541   801464
2        32 sets/2 way    1223815  152224   903922
3        128 sets/2 way   929889   121174   727627
4        32 sets/4 way    1059418  123217   799479
5        128 sets/1 way   1122176  138645   842629
Graphical Representation with above addr_misses
From the above graph, as expected, the number of address misses is lowest for the ANAGRAM
benchmark, and GCC has the most address misses among the three benchmarks. Decreasing the
number of sets from 64 to 32 increases the address misses, and increasing the number of sets
from 64 to 128 decreases them. This is because capacity misses are reduced by increasing the
number of sets. In the 32 sets/4 way configuration, even though the number of sets is decreased
from 64 to 32, the address misses decrease because the higher associativity reduces conflict
misses. In the 128 sets/1 way configuration, the address misses increase despite the larger
number of sets, because direct mapping causes more conflict misses.
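The addr_hits and addr_misses counts can be combined into an address hit rate, since every branch is counted in exactly one of the two. The helper below is a small sketch; the function name `addr_hit_rate` is hypothetical.

```c
#include <assert.h>

/* Address hit rate = addr_hits / (addr_hits + addr_misses). */
static double addr_hit_rate(unsigned long hits, unsigned long misses)
{
    assert(hits + misses > 0);
    return (double)hits / (double)(hits + misses);
}
```

For the 64 sets/2 way configuration, the table values give about 0.6734 for GCC, 0.9560 for ANAGRAM and 0.7071 for GO, which agrees with the baseline row of the address hit rate table in Part 1.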
Comparison of BTB Performance based on CPI
                          Benchmarks
Sr. No.  Configuration    GCC     ANAGRAM  GO
1        64 sets/2 way    0.9500  0.4674   0.7571
2        32 sets/2 way    0.9664  0.4711   0.7645
3        128 sets/2 way   0.9304  0.4664   0.7496
4        32 sets/4 way    0.9491  0.4670   0.7575
5        128 sets/1 way   0.9528  0.4686   0.7583
Graphical Representation with above CPI
From the above graph, CPI remains fairly constant for every benchmark. Among the
benchmarks, ANAGRAM has the lowest CPI and GCC has the highest CPI across the various
BTB configurations. The CPI is higher for the 32 sets/2 way configuration than for the
64 sets/2 way configuration, which has twice as many sets. For the 32 sets/4 way and
128 sets/1 way configurations, which have the same total number of entries as the baseline,
the CPI is almost equal to the 64 sets/2 way CPI. The configuration with 128 sets and
associativity 2 has the lowest CPI of all the configurations.
Comparison of BTB Performance based on Branch Predictor Hit Rates
                          Benchmarks
Sr. No.  Configuration    GCC     ANAGRAM  GO
1        64 sets/2 way    0.6779  0.9546   0.6926
2        32 sets/2 way    0.636   0.9476   0.6527
3        128 sets/2 way   0.7221  0.9557   0.7225
4        32 sets/4 way    0.6852  0.9573   0.6931
5        128 sets/1 way   0.665   0.9518   0.6775
Graphical Representation with above Branch Predictor Hit Rates
The above graph shows that, at fixed associativity, the branch predictor hit rate for all
the benchmarks drops when the number of sets in the BTB decreases. Comparing the hit rates
of the different configurations, the BTB with 32 sets and 2 way set associativity has the
lowest branch predictor hit rate for all the benchmarks.
CONCLUSION
For an optimal branch predictor, it is recommended to have a BTB with more sets, but the
tradeoff between cost and performance should be taken into consideration.
The simulation results suggest that the combination of two-level and bimodal predictor gives
the highest address hit rates and direction hit rates.