improved register value pattern generation for branch prediction

  1. Control Dependency
     Problem: dependency tracking in ARVI ignores control dependencies.
       • It cannot obtain the practical available registers.
       • It generates the same pattern for different branch directions.
       • As a result, it cannot predict such a branch correctly.
  2. Example of control dependency
     Practical Available Register Set (ARS) of branch 2: r1, r3
     ARS of branch 2 in ARVI: r3
     When r3 == 1:
       • r1 == 0 -> not taken
       • r1 != 0 -> taken
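The collision in this example can be illustrated with a minimal Python sketch. The `pattern` helper and the concrete value `r1 = 7` are assumptions made for illustration only; the real ARVI pattern is a hash of register values rather than a tuple.

```python
def pattern(regs, ars):
    """Build an illustrative pattern: the values of the registers in the ARS."""
    return tuple(regs[name] for name in sorted(ars))

# Two executions of branch 2 with r3 == 1 (r1 = 7 is an arbitrary non-zero value).
not_taken_case = {"r1": 0, "r3": 1}
taken_case = {"r1": 7, "r3": 1}

arvi_ars = {"r3"}             # ARS of branch 2 as tracked by ARVI
practical_ars = {"r1", "r3"}  # practical ARS of branch 2

# With r3 alone, both executions yield the same pattern although the directions differ.
arvi_collides = pattern(not_taken_case, arvi_ars) == pattern(taken_case, arvi_ars)
# With r1 included, the two executions are distinguishable.
practical_collides = pattern(not_taken_case, practical_ars) == pattern(taken_case, practical_ars)

print(arvi_collides, practical_collides)
```

Under these assumptions the ARVI patterns collide and the practical ones do not, which is the failure this slide describes.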
  3. Dependence Tracking
     [Figure: dependence tracking when Branch 1 is not taken and when Branch 1 is taken]
  4. Improved Data Dependence Tracking
     Resolve control dependency by adding control-flow information to the tracking.
       • Add a logical register to the architecture, called the TA (target address) register.
       • TA holds the target address of the last branch.
       • TA is a hidden source register of instructions.
  5. Behavior of branch instruction
     Example: beq in the MIPS instruction set architecture
  6. Improved Dependence Tracking
     [Figure: improved dependence tracking when Branch 1 is not taken and when Branch 1 is taken]
  7. ARS of Branch 13
     Improved tracking tracks control dependency well.
       • When completed by INST1, the tracked ARS differs from the practical ARS.
       • The branch can still be predicted well because TA carries control-flow information.
     When r3 == 1:
       • TA == 2 -> not taken
       • TA == 4 -> taken
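A minimal sketch of how TA separates the two cases, assuming the pattern is simply the ARS values with TA appended. The helper `pattern_with_ta` is hypothetical, and a real implementation would hash these values.

```python
def pattern_with_ta(regs, ars, ta):
    """Illustrative pattern: values of the ARS registers followed by the hidden TA source."""
    return tuple(regs[name] for name in sorted(ars)) + (ta,)

regs = {"r3": 1}
ars = {"r3"}

pattern_ta2 = pattern_with_ta(regs, ars, ta=2)  # slide: Branch 13 not taken
pattern_ta4 = pattern_with_ta(regs, ars, ta=4)  # slide: Branch 13 taken

# The register values are identical, but TA makes the patterns differ.
print(pattern_ta2, pattern_ta4, pattern_ta2 != pattern_ta4)
```

Because the two patterns differ, a predictor can learn a separate direction for each of them even though r3 is the same.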
  8. Common code problem
     Performance loss in code that is not control dependent (common code).
     ARS of Branch 15, when completed by INST2:
       • Practical ARS: r2 (INST1)
       • Previously proposed tracking: r2 (INST1)
       • Improved tracking: r2 (INST1), TA
  9. Distinguishing control flow in improved tracking
     Here TA is wasted information.
       • This does not mean that the prediction is incorrect.
       • It means that the predictor needs more training.
     Information to train:
       • Previous tracking: r2 == 0 -> taken
       • Improved tracking: r2 == 0, TA == 5 -> taken; r2 == 0, TA == 6 -> taken
  10. “SetTA” instruction
      Add a “SetTA” instruction that saves the next instruction address to TA.
        • The ARS of Branch 15 is still r2 and TA, but TA is always 6.
      Disadvantages:
        • Wasted instructions (INST6).
        • Programs must be recompiled.
        • The start of the common code has to be found at compile time in order to add “SetTA”.
        • This is hard because assembly language is not a structured programming language (it has “goto”).
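The training cost of common code, and the effect of forcing TA to a constant, can be sketched as a count of distinct patterns. The helper `distinct_patterns` is an assumption for illustration; the TA values 5 and 6 are the ones used in the slides.

```python
def distinct_patterns(r2_values, ta_values):
    """All (r2, TA) combinations the predictor would have to learn."""
    return {(r2, ta) for r2 in r2_values for ta in ta_values}

# Common code reached through two control-flow paths: TA is 5 or 6.
without_setta = distinct_patterns(r2_values=[0], ta_values=[5, 6])

# With SetTA at the start of the common code, TA is always 6.
with_setta = distinct_patterns(r2_values=[0], ta_values=[6])

print(len(without_setta), len(with_setta))
```

Both entries in `without_setta` map to the same direction, so the extra entry only adds training time; with the constant TA a single entry remains.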
  11. Encoding
      The amount of information depends on the number of registers in the ARS.
      Assuming each value is 10 bits long:
        • 1 register in the ARS => 10 bits
        • 2 registers in the ARS => 20 bits
        • 3 registers in the ARS => 30 bits
      A fixed-length pattern must be generated from variable-length information -> hash.
      Various encodings are possible.
  12. Encoding of ARVI
      XOR of the physical register values.
      Simple XOR hash implemented with an XOR tree.
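A minimal sketch of a simple XOR hash, assuming 10-bit values as in the encoding example. The function name `xor_hash` and the sample values are illustrative, not taken from ARVI itself.

```python
from functools import reduce
from operator import xor

def xor_hash(values, width=10):
    """Fold any number of register values into one fixed-width pattern by XOR."""
    mask = (1 << width) - 1
    return reduce(xor, (v & mask for v in values), 0)

one_reg = xor_hash([3])           # 10 bits of information
two_regs = xor_hash([1, 2])       # 20 bits of information
three_regs = xor_hash([1, 2, 4])  # 30 bits of information

# Every result fits in 10 bits, but different register sets can collide.
print(one_reg, two_regs, three_regs)
```

Here `one_reg` and `two_regs` are both 3, a small instance of the hash conflicts that arise when most of the information sits in the low bits.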
  13. Reducing hash conflicts
      Programs use the lower bits of registers more than the higher bits.
        • Most of the information is concentrated in the lower bits.
        • Hash conflicts occur because of the lower bits.
      To decentralize the distribution of information:
        • Circularly shift each value by a different amount per logical register number.
        • The logical number is used because the physical number changes at run time.
  14. Percentage of use of each bit
      There are bits that programs use most of the time.
      Hash conflicts occur in those bits.
  15. Degree of centralization
      A high value means that only a small number of register bits are used.
      Information is concentrated in a small number of bits.
      It is decentralized well by the circular shift.
  16. Proposed Encoding
      XOR of the logical register values.
        • Each value is circularly shifted by a different amount according to its logical register number.
        • The physical-to-logical mapping is serialized.
        • Value information is shorter than before (disadvantage).
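A minimal sketch of the shifted encoding, assuming 10-bit values and a shift amount equal to the logical register number modulo the width. The exact shift amount is an assumption; the slides only say that the shift differs per logical register number. The names `rotl` and `shifted_xor_hash` are hypothetical.

```python
def rotl(value, shift, width):
    """Rotate a width-bit value left by shift bits."""
    mask = (1 << width) - 1
    value &= mask
    shift %= width
    return ((value << shift) | (value >> (width - shift))) & mask

def shifted_xor_hash(ars, width=10):
    """ars maps logical register number -> value; rotate each value, then XOR."""
    result = 0
    for logical_num, value in ars.items():
        result ^= rotl(value, logical_num, width)
    return result

# Two register states that a plain XOR cannot tell apart (1 ^ 2 == 2 ^ 1).
state_a = {1: 1, 2: 2}
state_b = {1: 2, 2: 1}

print(shifted_xor_hash(state_a), shifted_xor_hash(state_b))
```

With the rotation, the two states hash to 10 and 0 respectively, so low-bit information from different logical registers no longer cancels in the same bit positions.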
  17. Select Logical Register X
      Select the physical register value that is mapped to logical register X.
  18. Delay
      nPR = number of physical registers
      nLR = number of logical registers
      L = log2(nLR)
      Simple XOR hash: log2(nPR) * XOR2 + AND2
      Proposed hash: log2(nLR) * XOR2 + Select + AND2
        • Select = XOR2 + ANDL + Gate + OR(nPR)
      Since nPR > nLR * 2, log2(nPR) > log2(nLR) + 1.
      The delay is approximately the same or slightly longer.
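The XOR-tree part of the two delay expressions can be compared with a short calculation. The register counts below (128 physical, 32 logical) are assumed values chosen only to satisfy nPR > nLR * 2; the delay of the Select stage is not modeled.

```python
def xor_tree_levels(n_inputs):
    """Levels of 2-input XOR gates needed to reduce n_inputs values to one."""
    return (n_inputs - 1).bit_length()

n_pr = 128  # assumed number of physical registers
n_lr = 32   # assumed number of logical registers

simple_levels = xor_tree_levels(n_pr)    # log2(nPR) term of the simple hash
proposed_levels = xor_tree_levels(n_lr)  # log2(nLR) term of the proposed hash
saved_levels = simple_levels - proposed_levels

print(simple_levels, proposed_levels, saved_levels)
```

The proposed hash saves XOR levels but adds the Select stage, which is why the total delay ends up about the same or slightly longer.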
  19. HW Resource
      Simple XOR hash:
        • nPR * N * AND2 + (nPR - 1) * N * XOR2
        • (nPR - 1) * 3bitADD for the logical number tag
      Proposed hash:
        • nPR * N * AND2 + nLR * Select + (nLR - 1) * N * XOR2
        • Select = nPR * (L * (XOR2 + 2Gate) + ANDL) + N * OR(nPR)
        • No logical number tag is needed; the pattern already carries that information.
  20. Suitable predictor for register-value-pattern
      Characteristics of the register-value pattern:
        • A long pattern length is needed for reliable prediction.
          A PHT is not suitable; tags must be saved for comparing states. A perceptron is not suitable [17][18].
        • Patterns are not linearly separable [17][18].
          Each bit of a value has an AND relation with the others, so a perceptron is not suitable.
        • Branches have many different patterns.
          If a loop changes r1 from 0 to 999, there are 999 not-taken patterns and 1 taken pattern.
        • Pattern generation has a long delay.
          A perceptron is not suitable [17][18]; the predictor must be hybridized with a fast predictor [19][20].
  21. Proposed predictor
      Modified YAGS [21]
        • 1 bimodal table: saves the bias of each branch.
        • 2 caches: save only the patterns that differ from the bias.
            Taken cache: saves not-taken patterns for taken-biased branches.
            Not-taken cache: saves taken patterns for not-taken-biased branches.
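A simplified sketch of the bias-plus-exception idea, applied to the loop from the previous slide. Several things here are assumptions for illustration: the class name `ModifiedYagsSketch`, taking the first observed outcome as the bias instead of using saturating counters, unbounded sets in place of tagged finite caches, and the branch address `0x40`.

```python
class ModifiedYagsSketch:
    def __init__(self):
        self.bias = {}                # branch PC -> biased direction (True = taken)
        self.taken_cache = set()      # not-taken patterns of taken-biased branches
        self.not_taken_cache = set()  # taken patterns of not-taken-biased branches

    def _cache_for(self, bias):
        return self.taken_cache if bias else self.not_taken_cache

    def predict(self, pc, pattern):
        """Return (direction, hit); an unknown branch is assumed not taken."""
        bias = self.bias.get(pc, False)
        if (pc, pattern) in self._cache_for(bias):
            return (not bias), True   # stored exception to the bias
        return bias, False            # miss: fall back to the bias

    def update(self, pc, pattern, taken):
        if pc not in self.bias:
            self.bias[pc] = taken     # assumption: first outcome sets the bias
            return
        bias = self.bias[pc]
        cache = self._cache_for(bias)
        if taken != bias:
            cache.add((pc, pattern))      # store only patterns that differ from the bias
        else:
            cache.discard((pc, pattern))

predictor = ModifiedYagsSketch()
LOOP_BRANCH = 0x40  # hypothetical branch address
for r1 in range(1000):
    # Assumed loop-exit branch: taken only when r1 == 999.
    predictor.update(LOOP_BRANCH, r1, taken=(r1 == 999))

print(len(predictor.not_taken_cache), predictor.predict(LOOP_BRANCH, 999))
```

In this sketch the 999 not-taken patterns are covered by the bias alone and only the single taken pattern occupies a cache entry, which is the capacity argument for storing exceptions only.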
  22. Block diagram
        • A fast predictor predicts the direction in an early cycle.
        • When the modified YAGS hits and the depth tag matches the current state, the fetch direction is updated in a late cycle.
        • When the modified YAGS misses, its predicted direction is the bias, and we cannot tell whether the pattern is untrained or trained but not saved.
        • The selector chooses between the biased direction and the fast predictor's direction.
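The selection logic can be sketched as a single function. The slides do not state the selector's policy on a miss, so the `prefer_bias_on_miss` flag is a hypothetical stand-in for it, and treating a hit with a mismatching depth tag like a miss is also an assumption.

```python
def select_direction(fast_dir, yags_hit, depth_tag_match, yags_dir, bias_dir,
                     prefer_bias_on_miss):
    """Return the final fetch direction (True = taken)."""
    if yags_hit and depth_tag_match:
        # Late-cycle override: the modified YAGS has a trained pattern for this state.
        return yags_dir
    # Miss: the pattern is either untrained or trained but not saved.
    return bias_dir if prefer_bias_on_miss else fast_dir

late_override = select_direction(fast_dir=True, yags_hit=True, depth_tag_match=True,
                                 yags_dir=False, bias_dir=True, prefer_bias_on_miss=False)
miss_uses_fast = select_direction(fast_dir=True, yags_hit=False, depth_tag_match=False,
                                  yags_dir=False, bias_dir=False, prefer_bias_on_miss=False)
miss_uses_bias = select_direction(fast_dir=True, yags_hit=False, depth_tag_match=False,
                                  yags_dir=False, bias_dir=False, prefer_bias_on_miss=True)

print(late_override, miss_uses_fast, miss_uses_bias)
```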
  23. Outline
        • Why we need branch prediction
        • Related work
        • Improved register-value-pattern generation
        • Experiment and evaluation
        • Contribution
        • References
  24. Experimental environment
      SimpleScalar 3.0
        • PISA instruction set architecture, little endian
        • sim-outorder: performance-based, execution-driven, cycle timer
      Benchmarks
        • 10 programs from SPEC 2k
        • Instruction coverage: 150M to 250M instructions
  25. Processor Architecture Configuration
  26. Memory Architecture Configuration
  27. Predictor Configuration
  29. Register-Value-Pattern predictor
      The register-value-pattern predictor predicts the way a human does.
        • If we know that “this branch was taken before when a=3 and b=4”,
          we predict the branch without calculation when a=3 and b=4 occur again.
        • It is a commonsense design.
      Why is 100% accuracy not possible?
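The idea of remembering outcomes by register values can be sketched as a lookup table. The class `ValuePatternTable`, the branch address `0x40`, and the use of `None` for an unseen pattern are all assumptions made for illustration.

```python
class ValuePatternTable:
    """Remember the direction a branch took for each combination of register values."""

    def __init__(self):
        self.table = {}

    @staticmethod
    def _key(branch_pc, values):
        return (branch_pc, tuple(sorted(values.items())))

    def record(self, branch_pc, values, taken):
        self.table[self._key(branch_pc, values)] = taken

    def predict(self, branch_pc, values):
        """Return the remembered direction, or None if this pattern was never seen."""
        return self.table.get(self._key(branch_pc, values))

table = ValuePatternTable()
table.record(0x40, {"a": 3, "b": 4}, taken=True)

seen_again = table.predict(0x40, {"a": 3, "b": 4})  # same values as before
never_seen = table.predict(0x40, {"a": 3, "b": 5})  # new values: nothing to recall

print(seen_again, never_seen)
```

The `None` result for unseen values hints at one reason accuracy stays below 100%: a pattern must be observed before it can be recalled.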
  30. Factors of performance loss
      1. Limitation of dependence tracking
         1.1 Load branch
         1.2 Control dependency
      2. Hash conflict in encoding
      3. Prediction delay
      4. Various patterns for the same direction
         4.1 Pattern capacity of predictor
         4.2 Lack of training
  31. Contribution
      We improve some of the factors of performance loss:
        • 1.2 Control dependency
        • 2 Hash conflict in encoding
        • 4.1 Pattern capacity of predictor
      But some work still remains.
  32. Applications of Register-Value-Pattern
      The register-value pattern reaches its limits on different kinds of branches than the branch history pattern does.
        • A hybrid predictor that also uses the branch history pattern gives higher performance.
      Register-value pattern combined with branch-register-value-based prediction:
        • A dependence chain depth of 0 means that the branch register is already updated.
        • In that case branch-register-value-based prediction can be used.
      Register-value pattern for value prediction:
        • The register-value pattern can be used as information for value prediction.
  34. References
      [1] T. Yeh and Y. Patt. “Two-level Adaptive Branch Prediction.” In Proc. 24th ACM/IEEE Int. Symp. on Microarchitecture, 1991.
      [2] T. Yeh and Y. Patt. “A Comparison of Dynamic Branch Predictors that use Two Levels of Branch History.” In Proc. 20th Ann. Int. Symp. on Computer Architecture, 1993.
      [3] S. Pan, K. So and J. Rahmeh. “Improving the Accuracy of Dynamic Branch Prediction Using Branch Correlation.” In Proc. 5th Annual Intl. Conf. on Architectural Support for Prog. Lang. and Operating Systems, 1992.
      [4] R. Nair. “Dynamic Path-Based Branch Correlation.” In Proc. 28th Ann. Int. Symp. on Microarchitecture, 1995.
      [5] D. Jiménez. “Fast Path-Based Neural Branch Prediction.” In Proc. 36th Ann. IEEE/ACM Int. Symp. on Microarchitecture, 2003.
      [6] F. Gabbay and A. Mendelson. “Speculative Execution Based on Value Prediction.” Technical Report, Technion, 1997.
      [7] J. Gonzalez and A. Gonzalez. “Control-Flow Speculation through Value Prediction for Superscalar Processors.” In Proc. Int. Conf. on Parallel Architectures and Compilation Techniques, 1999.
      [8] T. Heil, Z. Smith and J. E. Smith. “Improving Branch Predictor by Correlating on Data Value.” In Proc. 32nd Int. Symp. on Microarchitecture, 1999.
      [9] K. Wang. “Highly Accurate Data Value Prediction using Hybrid Predictors.” In Proc. 30th Int. Symp. on Microarchitecture, 1997.
      [10] M. Lipasti and J. Shen. “Exceeding the Dataflow Limit via Value Prediction.” In Proc. 29th Int. Symp. on Microarchitecture, 1996.
      [11] W. Mohan and M. Franklin. “Improving Data Value Prediction Accuracy using Path Correlation.” In Proc. 6th Int. Conf. on High Performance Computing, 1999.
      [12] Y. Sazeides and J. Smith. “Implementations of Context Based Value Predictors.” Technical Report #ECE-TR-97-8, University of Wisconsin-Madison, 1997.
      [13] L. N. Vintan, M. Sbera, I. Z. Mihu and A. Florea. “An alternative to branch prediction: pre-computed branches.” ACM SIGARCH Computer Architecture News, vol. 31, 2003.
      [14] L. He and Z. Liu. “A New Value Based Branch Predictor For SMT Processors.” In Proc. 16th IASTED Int. Conf. on Parallel and Distributed Computing and System, 2004.
      [15] Y. Pan, X. Fan, L. He and D. Wang. “A bypass Mechanism to Enhance Branch Predictor for SMT.” In Proc. 12th Asia-Pacific Conf. on Computer Systems Architecture (ACSAC 2007), vol. 4697, 2007.
      [16] L. Chen, S. Dropsho and D. H. Albonesi. “Dynamic Data Dependence Tracking and its Application to Branch Prediction.” In Proc. 9th Int. Symp. on High-Performance Computer Architecture, 2003.
      [17] D. A. Jiménez and C. Lin. “Dynamic Branch Prediction with Perceptrons.” In Proc. 7th Int. Symp. on High-Performance Computer Architecture, 2001.
      [18] D. A. Jiménez and C. Lin. “Neural Methods for Dynamic Branch Prediction.” ACM Transactions on Computer Systems, 2002.
      [19] P. Chang, E. Hao and Y. Patt. “Alternative Implementations of Hybrid Branch Predictors.” In Proc. 28th Ann. Int. Symp. on Microarchitecture, 1995.
      [20] M. Evers, P. Chang and Y. Patt. “Using Hybrid Branch Predictors to Improve Branch Prediction Accuracy in The Presence of Context Switches.” In Proc. 23rd Ann. Int. Symp. on Computer Architecture, 1996.
      [21] A. Eden and T. Mudge. “The YAGS branch prediction scheme.” In Proc. 31st Ann. ACM/IEEE Int. Symp. on Microarchitecture, 1998.
      [22] P. N. Glaskowsky. “Pentium 4 (partially) previewed.” Microprocessor Report, 2000.