Lec Jan22 2009

  1. CSL718: Pipelined Processors. Improving Branch Performance. 22nd Jan, 2009. Anshul Kumar, CSE IITD
  2. Improving Branch Performance
     • Branch Elimination: replace branch with other instructions
     • Branch Speed Up: reduce time for computing CC and TIF
     • Branch Prediction: guess the outcome and proceed, undo if necessary
     • Branch Target Capture: make use of history
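  3. Branch Elimination: use conditional instructions (predicated execution)
     [Flowchart: a test on condition C with T/F exits around statement S, replaced by the single conditional statement C:S]
     With branch:                  With conditional instruction:
       OP1                           OP1
       BC CC = Z, * + 2              ADD R3, R2, R1, NZ
       ADD R3, R2, R1                OP2
       OP2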
  4. Branch Elimination, contd. [Pipeline timing diagram comparing the branch sequence (OP1, BC, ADD/OP2) with the conditional instruction ADD (cond); stages shown: IF, D, AG, DF, TIF, EX.]
  5. Improving Branch Performance (outline repeated from slide 2). Next topic: Branch Speed Up.
  6. Branch Speed Up: early target address generation
     • Assume each instruction is a branch
     • Generate the target address while decoding
     • If the target is in the same page, omit translation
     • After decoding, discard the target address if the instruction is not a branch
     [Pipeline diagram for BC: IF IF IF D TIF TIF TIF, with AG alongside D]
  7. Branch Speed Up: increase the gap between CC and branch
     Increase the gap between condition checking and branching:
     • Early CC setting
     • Delayed branch
  8. Early CC setting: insert n instructions (branch taken), n = 0. [Pipeline timing diagram: instructions I-1, I, T, T+1.] Delay = 6. (The delay can be reduced with a larger target buffer.)
  9. Early CC setting: insert n instructions (branch taken), n = 1. [Pipeline timing diagram: instructions I-1, J, I, T, T+1.] Delay = 5.
  10. Early CC setting: insert n instructions (branch taken), n = 2. [Pipeline timing diagram: instructions I-1, J, K, I, T, T+1.] Delay = 4.
  11. Early CC setting: insert n instructions (branch taken), n = 3. [Pipeline timing diagram: instructions I-1, J, K, L, I, T, T+1.] Delay = 4.
  12. Early CC setting: insert n instructions (branch not taken), n = 0. [Pipeline timing diagram: instructions I-1, I, I+1, I+2.] Delay = 5.
  13. Early CC setting: insert n instructions (branch not taken), n = 1. [Pipeline timing diagram: instructions I-1, J, I, I+1, I+2.] Delay = 4.
  14. Early CC setting: insert n instructions (branch not taken), n = 2. [Pipeline timing diagram: instructions I-1, J, K, I, I+1, I+2.] Delay = 3.
  15. Early CC setting: insert n instructions (branch not taken), n = 3. [Pipeline timing diagram: instructions I-1, J, K, L, I, I+1, I+2.] Delay = 2.
  16. Delayed Branch: insert n instructions (branch taken), n = 0. [Pipeline timing diagram: instructions I-1, I, T, T+1.] Delay = 6.
  17. Delayed Branch: insert n instructions (branch taken), n = 1. [Pipeline timing diagram: instructions I-1, I, J, T, T+1.] Delay = 5.
  18. Delayed Branch: insert n instructions (branch taken), n = 2. [Pipeline timing diagram: instructions I-1, I, J, K, T, T+1.] Delay = 4.
  19. Delayed Branch: insert n instructions (branch taken), n = 3. [Pipeline timing diagram: instructions I-1, I, J, K, L, T, T+1.] Delay = 3.
  20. Delayed Branch: insert n instructions (branch not taken), n = 0. [Pipeline timing diagram: instructions I-1, I, I+1, I+2.] Delay = 5.
  21. Delayed Branch: insert n instructions (branch not taken), n = 1. [Pipeline timing diagram: instructions I-1, I, J, I+1, I+2.] Delay = 4.
  22. Delayed Branch: insert n instructions (branch not taken), n = 2. [Pipeline timing diagram: instructions I-1, I, J, K, I+1, I+2.] Delay = 3.
  23. Delayed Branch: insert n instructions (branch not taken), n = 3. [Pipeline timing diagram: instructions I-1, I, J, K, L, I+1, I+2.] Delay = 2.
  24. Summary: Branch Speed Up (delay for n inserted instructions)
                                   n=0  n=1  n=2  n=3  n=4  n=5
      Early CC setting  uncond      4    4    4    4    4    4
                        cond (T)    6    5    4    4    4    4
                        cond (I)    5    4    3    2    1    0
      Delayed branch    uncond      4    3    2    1    0    0
                        cond (T)    6    5    4    3    2    1
                        cond (I)    5    4    3    2    1    0
      (T: branch goes to target, I: branch goes inline)
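
The summary figures follow a simple pattern that can be written down as closed forms. The sketch below is only a fit to the numbers on this slide, not a pipeline model; the function names and the `kind` labels (`"uncond"`, `"cond_T"`, `"cond_I"`) are made up for illustration.

```python
def early_cc_delay(kind, n):
    """Delay with early CC setting and n inserted instructions (fit to slide 24)."""
    if kind == "uncond":
        return 4                  # unaffected by moving the CC-setting instruction
    if kind == "cond_T":
        return max(4, 6 - n)      # cannot drop below the unconditional delay
    if kind == "cond_I":
        return max(0, 5 - n)
    raise ValueError(f"unknown branch kind: {kind}")


def delayed_branch_delay(kind, n):
    """Delay with a delayed branch and n delay-slot instructions (fit to slide 24)."""
    base = {"uncond": 4, "cond_T": 6, "cond_I": 5}[kind]
    return max(0, base - n)       # each useful delay-slot instruction hides one cycle


if __name__ == "__main__":
    for kind in ("uncond", "cond_T", "cond_I"):
        print("early CC      ", kind, [early_cc_delay(kind, n) for n in range(6)])
    for kind in ("uncond", "cond_T", "cond_I"):
        print("delayed branch", kind, [delayed_branch_delay(kind, n) for n in range(6)])
```

The one qualitative difference is visible in the `cond_T` row: early CC setting saturates at the unconditional-branch delay of 4, while a delayed branch keeps improving as more slots are filled.
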
  25. Improving Branch Performance (outline repeated from slide 2). Next topic: Branch Prediction.
  26. Branch Prediction
      • Treat conditional branches as unconditional branches or as NOPs
      • Undo if necessary
      Strategies:
      • Fixed (always guess inline)
      • Static (guess on the basis of instruction type)
      • Dynamic (guess based on recent history)
  27. Prediction based on statistics (two guessing policies compared)
      Instr      %      Branch   Guess    Correct   Guess    Correct
      uncond     14.5   100%     always   14.5%     always   14.5%
      cond       58     54%      never    27%       always   31%
      loop       9.8    91%      always   9%        always   9%
      call/ret   17.7   100%     always   17.7%     always   17.7%
      Total                               68.2%              72.2%
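
The "Correct" columns can be recomputed from the first two data columns. The sketch below assumes that "%" is each type's share of all branch instructions and that "Branch" is the fraction of that type that actually branches; the policy names `POLICY_1` and `POLICY_2` are labels invented here for the two Guess columns.

```python
# (instruction type, share of branch instructions in %, fraction that branches)
ROWS = [
    ("uncond", 14.5, 1.00),
    ("cond", 58.0, 0.54),
    ("loop", 9.8, 0.91),
    ("call/ret", 17.7, 1.00),
]

# True = guess "always branches", False = guess "never branches"
POLICY_1 = {"uncond": True, "cond": False, "loop": True, "call/ret": True}
POLICY_2 = {"uncond": True, "cond": True, "loop": True, "call/ret": True}


def correct_percentage(policy):
    """Percentage of all branch instructions guessed correctly under a policy."""
    total = 0.0
    for name, share, branch_fraction in ROWS:
        hit_rate = branch_fraction if policy[name] else 1.0 - branch_fraction
        total += share * hit_rate
    return total


if __name__ == "__main__":
    print(round(correct_percentage(POLICY_1), 1))
    print(round(correct_percentage(POLICY_2), 1))
```

The results differ from the slide's totals by a few tenths of a percent, because the slide rounds each row (27%, 31%, 9%) before adding.
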
  28. Branch Prediction (guess inline, go inline). [Pipeline timing diagram: instructions I-1, I, I+1, I+2.] Delay = 0.
  29. Branch Prediction (guess inline, go to target). [Pipeline timing diagram: instructions I-1, I, T, T+1.] Delay = 6.
  30. Branch Prediction (guess target, go inline). [Pipeline timing diagram: instructions I-1, I, T, I+1, I+2.] Delay = 5.
  31. Branch Prediction (guess target, go to target). [Pipeline timing diagram: instructions I-1, I, T, T+1.] Delay = 4, the same as for an unconditional branch.
  32. Static prediction strategy
      Let p = probability of taking the branch.
      guess target: delay_t = 4 p + 5 (1 - p) = 5 - p
      guess inline: delay_i = 6 p + 0 (1 - p) = 6 p
      If delay_t < delay_i, guess target; otherwise guess inline.
      delay_t < delay_i ⇒ 5 - p < 6 p ⇒ p > 5/7 ≈ 0.71
  33. Static prediction strategy: thresholds for different instructions. [Pipeline timing diagram: instructions I-1 and I.]
      Delay by guess and actual outcome:
                   actual T   actual I
      guess T         4          5
      guess I         6          0
      Guess target if 4 p + 5 (1 - p) < 6 p + 0 (1 - p), i.e. p > 0.71
  34. Static prediction strategy: thresholds for different instructions (loop control). [Pipeline timing diagram: instructions I-1 and I.]
      Delay by guess and actual outcome:
                   actual T   actual I
      guess T         4          6
      guess I         7          1
      Guess target if 4 p + 6 (1 - p) < 7 p + 1 (1 - p), i.e. p > 0.62
  35. Static prediction strategy: thresholds for different instructions (register address). [Pipeline timing diagram: instructions I-1 and I.]
      Delay by guess and actual outcome:
                   actual T   actual I
      guess T         3          5
      guess I         6          0
      Guess target if 3 p + 5 (1 - p) < 6 p + 0 (1 - p), i.e. p > 0.62
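
All three thresholds come from the same inequality, so they can be computed by one function. The sketch below solves tt·p + ti·(1 - p) < it·p + ii·(1 - p) for p, where the four delays are the entries of the guess/actual tables; the function and parameter names are illustrative, not from the slides.

```python
from fractions import Fraction


def target_threshold(tt, ti, it, ii):
    """Smallest taken-probability above which guessing 'target' is cheaper.

    tt: delay when guessing target and the branch goes to target
    ti: delay when guessing target and the branch goes inline
    it: delay when guessing inline and the branch goes to target
    ii: delay when guessing inline and the branch goes inline

    Rearranging tt*p + ti*(1-p) < it*p + ii*(1-p) gives
    p > (ti - ii) / ((it - tt) + (ti - ii)), assuming the denominator is positive.
    """
    return Fraction(ti - ii, (it - tt) + (ti - ii))


if __name__ == "__main__":
    print(target_threshold(4, 5, 6, 0))   # general conditional branch
    print(target_threshold(4, 6, 7, 1))   # loop control
    print(target_threshold(3, 5, 6, 0))   # register address
```

Using `Fraction` keeps the thresholds exact: 5/7 for the general case and 5/8 for the other two, which the slides round to 0.71 and 0.62.
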
  36. Delayed Branch with Nullification (also called annulment)
      • The delay slot is used optionally
      • The branch instruction specifies the option
      • The option may be exercised based on the correctness of the branch prediction
      • Helps in better utilization of delay slots
  37. Variants of Nullification
      1. No annulment (branch-with-execute)
      2. Annul if not taken (branch-or-skip)
      3. Annul if taken
      4. Annul always (branch-with-skip)
      [Diagram: for each variant, a branch bc followed by its delay slot D]
      Examples:
      • SPARC: 1, 2
      • MC88100: 1, 4
      • i860: 2, 4
      • HP PA: 1, 2, 3
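
The four variants differ only in when the delay-slot instruction is allowed to complete. A minimal sketch of that rule follows; the enum and function names are invented for illustration and do not correspond to any real ISA encoding.

```python
from enum import Enum


class Annul(Enum):
    NEVER = 1          # variant 1: no annulment
    IF_NOT_TAKEN = 2   # variant 2: annul if the branch is not taken
    IF_TAKEN = 3       # variant 3: annul if the branch is taken
    ALWAYS = 4         # variant 4: annul always


def delay_slot_executes(variant, branch_taken):
    """Return True if the delay-slot instruction takes effect."""
    if variant is Annul.NEVER:
        return True
    if variant is Annul.IF_NOT_TAKEN:
        return branch_taken
    if variant is Annul.IF_TAKEN:
        return not branch_taken
    return False       # Annul.ALWAYS


if __name__ == "__main__":
    for variant in Annul:
        print(variant.name,
              "taken:", delay_slot_executes(variant, True),
              "not taken:", delay_slot_executes(variant, False))
```

Variant 2 suits a delay slot filled from the branch target, and variant 3 one filled from the fall-through path, since in each case the slot instruction is useful only on that path.
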
  38. Annulment illustration. [Two diagrams, each showing a branch bc and its delay slot D: one labelled "use branch-or-skip", the other "use branch-with-skip".]
  39. Dynamic Branch Prediction: basic idea. Predict based on the previous history of the branch. Example: for the loop "loop: xxx; xxx; xxx; xxx; BC loop", predicting from the last outcome alone gives 2 mispredictions for every occurrence of the loop (one at loop exit, one on re-entry).
  40. Dynamic Branch Prediction: 2-bit prediction scheme. [State diagram: four states 0 to 3 with transitions on T (taken) and N (not taken); one group of states predicts not taken, the other predicts taken.]
  41. Dynamic Branch Prediction: bimodal predictor. Maintain saturating counters. [State diagram: counter states 0, 1, 2, 3; each T moves one state up and each N moves one state down, saturating at 3 and 0.]
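
The benefit of a 2-bit saturating counter over last-outcome prediction shows up on a simple loop trace. The sketch below assumes a loop whose closing branch is taken three times and then falls through, and assumes counter values 2 and 3 mean "predict taken" (the usual convention; the slide's state grouping is not recoverable). All names are illustrative.

```python
def last_outcome_misses(outcomes, last=False):
    """1-bit scheme: predict that the branch repeats its previous outcome."""
    misses = 0
    for taken in outcomes:
        if last != taken:
            misses += 1
        last = taken
    return misses


def saturating_counter_misses(outcomes, counter=0):
    """2-bit saturating counter: predict taken when the counter is 2 or 3."""
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        # Count up on taken, down on not taken, saturating at 3 and 0.
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return misses


LOOP = [True, True, True, False]   # one execution of the loop
TRACE = LOOP * 10                  # the loop is entered ten times

if __name__ == "__main__":
    print(last_outcome_misses(TRACE))        # 20: two per loop execution
    print(saturating_counter_misses(TRACE))  # 12: three while warming up, then one each
```

After warm-up the counter mispredicts only the loop exit, because a single not-taken outcome moves it from 3 to 2 and the prediction stays "taken" on re-entry.
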
  42. Dynamic Branch Prediction: history of last n occurrences. Each entry records the outcome of the last three occurrences (0: not taken, 1: taken). Example: current entry 1 1 0; actual outcome of this branch "taken"; updated entry 1 1 1. Prediction uses a majority decision.
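
A history entry with majority voting can be sketched in a few lines. The shift direction below (newest outcome on the left, oldest dropped from the right) is an assumption chosen so that the slide's example, 1 1 0 becoming 1 1 1 after a taken branch, is reproduced; the function names are illustrative.

```python
def predict_taken(history):
    """Majority decision over the recorded outcomes (1 = taken, 0 = not taken)."""
    return 2 * sum(history) > len(history)


def update(history, taken):
    """Shift in the newest outcome on the left and drop the oldest on the right."""
    return (int(taken),) + tuple(history[:-1])


if __name__ == "__main__":
    entry = (1, 1, 0)
    print(predict_taken(entry))      # two of three were taken, so predict taken
    entry = update(entry, True)      # the branch was actually taken
    print(entry)
```

With an odd history length the vote never ties; for an even length this sketch would predict not taken on a tie.
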
  43. Dynamic Branch Prediction: storing prediction counters. Store in a separate buffer or in the cache directory. [Diagram: cache directory and storage, with a counter attached to each cache line.] Use one counter per branch, or one counter per cache line (merging results if the line holds multiple branches).
  44. Correct guesses vs. history length
      n   Compiler   Business   Scientific   Supervisor
      0   64.1       64.4       70.4         54.0
      1   91.9       95.2       86.6         79.7
      2   93.3       96.5       90.8         83.4
      3   93.7       96.6       91.0         83.5
      4   94.5       96.8       91.8         83.7
      5   94.7       97.0       92.0         83.9
  45. Two-Level Prediction
      • Uses two levels of information to make a direction prediction
        – Branch History Table (BHT): last n occurrences
        – Pattern History Table (PHT): saturating 2-bit counters
      • Captures patterned behavior of branches
        – Groups of branches are correlated
        – Particular branches have particular behavior
  46. Correlation between branches
      B1: if (x) ...
      B2: if (y) ...
      z = x && y
      B3: if (z) ...
      B3 can be predicted with 100% accuracy based on the outcomes of B1 and B2.
  47. Some Two-level Predictors. [Two diagrams with sample bit patterns. Local predictor: the PC selects a BHT entry, whose history bits index the PHT to give T/NT. Global predictor: the GBHR indexes the PHT to give T/NT.] Bits from the PC and the BHT can be combined to index the PHT.
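
A global two-level predictor can be sketched as a history register that indexes a table of 2-bit saturating counters. The class below is a minimal illustration under stated assumptions, not a model of any particular processor: the class name, the initial counter value of 1, and indexing by history alone (no PC bits) are all choices made here.

```python
class GlobalTwoLevelPredictor:
    """Global history register (GBHR) indexing a PHT of 2-bit saturating counters."""

    def __init__(self, history_bits):
        self.mask = (1 << history_bits) - 1
        self.history = 0                        # GBHR, newest outcome in the low bit
        self.pht = [1] * (1 << history_bits)    # assumed start: weakly not taken

    def predict(self):
        return self.pht[self.history] >= 2

    def update(self, taken):
        counter = self.pht[self.history]
        self.pht[self.history] = min(3, counter + 1) if taken else max(0, counter - 1)
        self.history = ((self.history << 1) | int(taken)) & self.mask


def count_mispredictions(predictor, outcomes):
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


if __name__ == "__main__":
    pattern = [True, True, False] * 50          # taken, taken, not taken, repeating
    print(count_mispredictions(GlobalTwoLevelPredictor(2), pattern))
```

With two history bits, each history value in this repeating pattern is always followed by the same outcome, so the predictor mispredicts only while the counters warm up; a single saturating counter would keep mispredicting the not-taken outcome in every period.
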
  48. Two-level Predictor Classification
      Yeh and Patt 3-letter naming scheme:
      • Type of history collected: G (global), P (per branch), S (per set)
      • PHT type: A (adaptive), S (static)
      • PHT organization: g (global), p (per branch), s (per set)
      Examples: GAs, PAp, etc.
  49. Improving Branch Performance (outline repeated from slide 2). Next topic: Branch Target Capture.
  50. Branch Target Capture
      • Branch Target Buffer (BTB)
      • Target Instruction Buffer (TIB)
      Entry fields: instr addr | pred stats | target, where target is either the target addr or the target instr.
      Probability of target change < 5%.
  51. BTB Performance
      BTB    prob   decision       result   prob   delay
      miss   0.4    go inline      inline   0.8    0
                                   target   0.2    6
      hit    0.6    go to target   inline   0.2    5
                                   target   0.8    0
      Expected delay = 0.4*0.8*0 + 0.4*0.2*6 + 0.6*0.2*5 + 0.6*0.8*0 = 1.08
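
The expected delay is a probability-weighted sum over the four paths of the decision tree. The sketch below recomputes it with the slide's figures; the tuple layout and names are illustrative.

```python
# (probability of BTB miss/hit, probability of the actual result, delay in cycles)
PATHS = [
    (0.4, 0.8, 0),   # BTB miss, go inline, branch goes inline
    (0.4, 0.2, 6),   # BTB miss, go inline, branch goes to target
    (0.6, 0.2, 5),   # BTB hit, go to target, branch goes inline
    (0.6, 0.8, 0),   # BTB hit, go to target, branch goes to target
]


def expected_delay(paths):
    """Sum of P(BTB outcome) * P(result | BTB outcome) * delay over all paths."""
    return sum(p_btb * p_result * delay for p_btb, p_result, delay in paths)


if __name__ == "__main__":
    print(round(expected_delay(PATHS), 2))
```

Only the two mispredicted paths contribute: 0.4 * 0.2 * 6 = 0.48 and 0.6 * 0.2 * 5 = 0.60.
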
  52. Dynamic information about branch
      • Previous branch decisions, or previous target address / instruction
      • Explicit prediction, or implicit prediction
      • Stored in cache directory, or stored in separate buffer
      Structures: Branch Target Buffer (BTB), Branch History Table (BHT), Br Target Addr Cache (BTAC), Target Instr Buffer (TIB), Br Target Instr Cache (BTIC). [Diagram note: "These two can be combined."]
  53. Storing prediction info. In cache: directory and storage, with a counter per cache line. In separate buffer: entries of the form instr addr | pred stats | target.
  54. Combined prediction mechanism
      • Explicit: use history bits
      • Implicit: use BTB hit/miss (hit ⇒ go to target, miss ⇒ go inline)
      • Combined: BTB hit/miss followed by explicit prediction using history bits
        – commonly used: hit ⇒ go to target, miss ⇒ explicit prediction
        – alternatively: miss ⇒ go inline, hit ⇒ explicit prediction
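
The two combined schemes differ only in which BTB outcome defers to the history bits. A minimal sketch of that decision follows; the function name, the `explicit_on` parameter and its string values are invented for illustration.

```python
def combined_predict(btb_hit, history_says_taken, explicit_on="miss"):
    """Return True to go to the target, False to go inline.

    explicit_on="miss": hit => go to target, miss => explicit prediction
    explicit_on="hit":  miss => go inline,   hit  => explicit prediction
    """
    if explicit_on == "miss":
        return True if btb_hit else history_says_taken
    if explicit_on == "hit":
        return history_says_taken if btb_hit else False
    raise ValueError(f"unknown scheme: {explicit_on}")


if __name__ == "__main__":
    for scheme in ("miss", "hit"):
        for btb_hit in (False, True):
            for history in (False, True):
                print(scheme, btb_hit, history,
                      combined_predict(btb_hit, history, scheme))
```

In the first scheme a BTB hit alone is enough to redirect fetch; in the second, a hit only makes the target available and the history bits still decide.
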
  55. Combined prediction. [Two decision trees, one per combined scheme: BTB miss / BTB hit, then explicit prediction where applicable, then the prediction (T: target, I: inline) against the actual outcome (T: target, I: inline).]
  56. Structure of Tables: instruction fetch path with BHT, BTAC and BTIC.
  57. Compute/fetch scheme (no dynamic branch prediction). [Block diagram; labels: instruction fetch address, IFAR, I-cache, instructions I, I+1, I+2, I+3, compute BTA, next sequential address (+), target instructions BTI, BTI+1, BTI+2, BTI+3.]
  58. BHT (Branch History Table). [Block diagram; labels: instruction fetch address; I-cache of 16 K, 4-way set associative, 8 instr/line, 128 x 4 lines, delivering 4 instr/cycle; BHT of 128 x 4 entries supplying history bits to the prediction logic (taken / not taken); decode queue 4 x 1 instr; issue queue 4 x 1 instr; BTA for a taken guess.]
  59. BTAC scheme. [Block diagram; labels: instruction fetch address, IFAR, I-cache, BTAC looked up by BA and supplying BTA, next sequential address (+), instructions I, I+1, I+2, I+3, target instructions BTI, BTI+1, BTI+2, BTI+3.]
  60. BTIC scheme 1. [Block diagram; labels: instruction fetch address, IFAR, I-cache, BTIC looked up by BA and supplying BTI and BTA+, BTA, next sequential address (+), instruction I, to decoder.]
  61. BTIC scheme 2. [Block diagram; labels: instruction fetch address, IFAR, I-cache, BTIC looked up by BA and supplying BTI and BTI+1, computed BTA+, next sequential address (+), instructions I, I+1, to decoder.]
  62. References
      1. M.J. Flynn, "Computer Architecture: Pipelined and Parallel Processor Design", Narosa Publishing House / Jones and Bartlett, 1996.
      2. D. Sima, T. Fountain, P. Kacsuk, "Advanced Computer Architectures: A Design Space Approach", Addison Wesley, 1997.
      3. D.A. Patterson, J.L. Hennessy, "Computer Architecture: A Quantitative Approach", Morgan Kaufmann Publishers, 2006.
