Lecture 9: Branch Prediction
Basic idea, saturating counter, BHT,
BTB, return address prediction,
correlating prediction

...
Reducing Branch Penalty
Branch penalty in dynamically scheduled processors:
wasted cycles due to pipeline flushing on misp...
What to Use and What to Predict
Available info:



Current predicted PC
Past branch history
(direction and target)

pred...
Mis-prediction Detections and Feedbacks
Detections:
At the end of decoding




Target address known at
decoding, and not...
Branch Direction Prediction
Predict branch direction: taken or not taken
(T/NT)
taken
Not taken

BNE R1, R2, L1
…
L1: …

S...
Predictor for a Single Branch
General Form
1. Access

2. Predict
Output T/NT

state

PC

3. Feedback T/NT

1-bit predictio...
Branch History Table of 1-bit Predictor
BHT also Called Branch
Prediction Buffer in
textbook
Can use only one 1-bit
predic...
1-bit BHT Weakness
Example: in a loop, 1-bit BHT will cause
2 mispredictions
Consider a loop of 9 iterations before exit:
...
2-bit Saturating Counter
Solution: 2-bit scheme where change prediction only if
get misprediction twice: (Figure 3.7, p. 2...
Branch Target Buffer
Branch Target Buffer (BTB): Address of branch index to
get prediction AND branch address (if taken)
...
Return Addresses Prediction
Register indirect branch hard to predict
address



Many callers, one callee
Jump to multipl...
Correlating Branches
Code example showing
the potential

Assemble code

If (d==0)
d=1;
If (d==1)
…

BNEZ R1, L1
DADDIU R1,...
Correlating Branch Predictor
Idea: taken/not taken of
recently executed
branches is related to
behavior of next branch
(as...
Correlating Branch Predictor
General form: (m, n)
predictor
 m bits for global
history, n bits for local
history
 Record...
Accuracy of Different Schemes
(Figure 3.15, p. 206)
Frequencyofof Mispredictions
Frequency
Mispredictions

20%

4096 Entri...
Estimate Branch Penalty
EX: BHT correct rate
is 95%, BTB hit rate
is 95%
Average miss penalty
is 15 cycles
How much is the...
Accuracy of Return Address Predictor

17
Upcoming SlideShare
Loading in …5
×

Like 2014214

579 views

Published on

Published in: Spiritual
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
579
On SlideShare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
4
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide
  • This lecture will cover sections 1.1-1.4: introduction, application, technology trends, cost and price.
  • <number>
  • Like 2014214

    1. 1. Lecture 9: Branch Prediction Basic idea, saturating counter, BHT, BTB, return address prediction, correlating prediction 1
    2. 2. Reducing Branch Penalty Branch penalty in dynamically scheduled processors: wasted cycles due to pipeline flushing on mispredicted branches Reduce branch penalty: 1. Predict branch/jump instructions AND branch direction (taken or not taken) 2. Predict branch/jump target address (for taken branches) 3. Speculatively execute instructions along the predicted path 2
    3. 3. What to Use and What to Predict Available info:   Current predicted PC Past branch history (direction and target) pred_PC PC What to predict:    Conditional branch inst: branch direction and target address Jump inst: target address Procedure call/return: target address May need instruction predecoded Predictors IM PC & Inst pred info feedback P C 3
    4. 4. Mis-prediction Detections and Feedbacks Detections: At the end of decoding   Target address known at decoding, and not match Flush fetch stage At commit (most cases)   Wrong branch direction or target address not match Flush the whole pipeline (at EXE: MIPS R10000) Feedbacks: Any time a mis-prediction is detected At a branch’s commit (at EXE: called speculative update) FETCH predictors RENAME REB/ROB SCHD EXE WB COMMIT 4
    5. 5. Branch Direction Prediction Predict branch direction: taken or not taken (T/NT) taken Not taken BNE R1, R2, L1 … L1: … Static prediction: compilers decide the direction Dynamic prediction: hardware decides the direction using dynamic information 1. 2. 3. 4. 5. 1-bit Branch-Prediction Buffer 2-bit Branch-Prediction Buffer Correlating Branch Prediction Buffer Tournament Branch Predictor and more … 5
    6. 6. Predictor for a Single Branch General Form 1. Access 2. Predict Output T/NT state PC 3. Feedback T/NT 1-bit prediction Feedback T Predict Taken NT 1 NT T 0 Predict Taken 6
    7. 7. Branch History Table of 1-bit Predictor BHT also Called Branch Prediction Buffer in textbook Can use only one 1-bit predictor, but accuracy is low BHT: use a table of simple predictors, indexed by bits from PC Similar to direct mapped cache More entries, more cost, but less conflicts, higher accuracy BHT can contain complex predictors K-bit Branch address 2k Prediction Prediction 7
    8. 8. 1-bit BHT Weakness Example: in a loop, 1-bit BHT will cause 2 mispredictions Consider a loop of 9 iterations before exit: for (…){ for (i=0; i<9; i++) a[i] = a[i] * 2.0; }  End of loop case, when it exits instead of looping as before  First time through loop on next time through code, when it predicts exit instead of looping  Only 80% accuracy even if loop 90% of the time 8
    9. 9. 2-bit Saturating Counter Solution: 2-bit scheme where change prediction only if get misprediction twice: (Figure 3.7, p. 249) T Predict Taken 11 NT T T Predict Not Taken 01 10 Predict Taken NT NT T 00 Predict Not Taken NT Blue: stop, not taken Gray: go, taken Adds hysteresis to decision making process 9
    10. 10. Branch Target Buffer Branch Target Buffer (BTB): Address of branch index to get prediction AND branch address (if taken)  Note: must check for branch match now, since can’t use wrong branch address Example: BTB combined with BHT Branch PC Predicted PC PC of instruction FETCH =? No: branch not predicted, proceed normally (Next PC = PC+4) Extra Yes: instruction is prediction state branch and use bits predicted PC as next PC 10
    11. 11. Return Addresses Prediction Register indirect branch hard to predict address   Many callers, one callee Jump to multiple return addresses from a single address (no PC-target correlation) SPEC89 85% such branches for procedure return Since stack discipline for procedures, save return address in small buffer that acts like a stack: 8 to 16 entries has small miss rate 11
    12. 12. Correlating Branches Code example showing the potential Assemble code If (d==0) d=1; If (d==1) … BNEZ R1, L1 DADDIU R1,R0,#1 L1: DADDIU R3,R1,#-1 BNEZ R3, L2 L2: … Observation: if BNEZ1 is not taken, then BNEZ2 is taken 12
    13. 13. Correlating Branch Predictor Idea: taken/not taken of recently executed branches is related to behavior of next branch (as well as the history of that branch behavior)  Then behavior of recent branches selects between, say, 2 predictions of next branch, updating just that prediction  (1,1) predictor: 1-bit global, 1-bit local Branch address (4 bits) 1-bits per branch local predictors Prediction Prediction 1-bit global branch history (0 = not taken) 13
    14. 14. Correlating Branch Predictor General form: (m, n) predictor  m bits for global history, n bits for local history  Records correlation between m+1 branches  Simple implementation: global history can be store in a shift register  Example: (2,2) predictor, 2-bit global, 2-bit local Branch address (4 bits) 2-bits per branch local predictors Prediction Prediction 2-bit global branch history (01 = not taken then taken) 14
    15. 15. Accuracy of Different Schemes (Figure 3.15, p. 206) Frequencyofof Mispredictions Frequency Mispredictions 20% 4096 Entries 2-bit BHT Unlimited Entries 2-bit BHT 1024 Entries (2,2) BHT 18% 16% 14% 12% 11% 10% 8% 6% 6% 6% 6% 5% 5% 4% 4% 2% 1% 1% 0% 0% nasa7 matrix300 tomcatv doducd 4,096 entries: 2-bits per entry spice fpppp gcc Unlimited entries: 2-bits/entry espresso eqntott 1,024 entries (2,2) 15 li
    16. 16. Estimate Branch Penalty EX: BHT correct rate is 95%, BTB hit rate is 95% Average miss penalty is 15 cycles How much is the branch penalty? 16
    17. 17. Accuracy of Return Address Predictor 17

    ×