Improved Register Value Pattern Generation for Branch Prediction

  • Control Dependency
    Problem
    Dependency tracking in ARVI [16] ignores control dependency
    It cannot find the practically available registers
    It makes the same pattern for different branch directions
    So the branch CANNOT be predicted correctly
  • Example of control dependency
    Practical available register set (ARS) of branch 2
    r1, r3
    Available register set in ARVI of branch 2
    r3
    When r3 == 1
    r1 == 0 -> not taken
    r1 != 0 -> taken
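To make the example concrete, the sketch below replays a hypothetical trace of branch 2 (the register values are invented for illustration) and reports which patterns were seen with both outcomes. With ARVI's ARS {r3}, the single pattern r3 = 1 is ambiguous; with the practical ARS {r1, r3}, every pattern maps to one direction. The function name `conflicting_patterns` is ours, not from the slides.

```python
from collections import defaultdict


def conflicting_patterns(samples, ars):
    """Return the patterns (value tuples over the registers in `ars`) seen with both outcomes."""
    outcomes = defaultdict(set)
    for regs, taken in samples:
        pattern = tuple(regs[r] for r in ars)
        outcomes[pattern].add(taken)
    return sorted(p for p, seen in outcomes.items() if len(seen) > 1)


# Hypothetical trace of branch 2: r3 == 1 throughout, and the
# direction depends only on whether r1 is zero.
trace = [
    ({"r1": 0, "r3": 1}, False),
    ({"r1": 5, "r3": 1}, True),
    ({"r1": 0, "r3": 1}, False),
    ({"r1": 7, "r3": 1}, True),
]

print(conflicting_patterns(trace, ["r3"]))        # ARS found by ARVI -> [(1,)]
print(conflicting_patterns(trace, ["r1", "r3"]))  # practical ARS     -> []
```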
  • Dependence Tracking
    (Figure: dependence tracking when Branch 1 is not taken and when Branch 1 is taken)
  • Improved Data Dependence Tracking
    Resolves the control dependency problem
    Adds control-flow information to tracking
    Adds a logical register to the architecture
    Called the TA register (target address)
    It holds the target address of the last branch
    TA is a hidden source register of instructions
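The tracking rule can be sketched as a toy model in which each logical register records the set of leaf registers its value was computed from, and TA is added as a hidden source of every instruction. This is a simplified, hypothetical model (no physical registers, no depth limit, no availability check), not the actual ARVI hardware tables; the class and method names are illustrative.

```python
class DependenceTracker:
    """Toy dependence tracker: maps each logical register to the set of leaf
    registers its current value was computed from (hypothetical model)."""

    def __init__(self, track_ta):
        self.track_ta = track_ta  # True = improved tracking with the TA register
        self.deps = {}

    def leaves(self, reg):
        # A register not yet written inside the tracked window is its own leaf.
        return self.deps.get(reg, frozenset({reg}))

    def _sources(self, srcs):
        found = frozenset().union(*(self.leaves(r) for r in srcs))
        if self.track_ta:
            # TA is a hidden source of every instruction.
            found |= self.leaves("TA")
        return found

    def execute(self, dst, srcs):
        """Non-branch instruction: dst <- op(srcs)."""
        self.deps[dst] = self._sources(srcs)

    def branch(self, srcs):
        """Return the branch's ARS; afterwards TA holds a fresh target address."""
        ars = self._sources(srcs)
        if self.track_ta:
            self.deps["TA"] = frozenset({"TA"})
        return ars


for track_ta in (False, True):
    t = DependenceTracker(track_ta)
    t.branch(["r3"])            # branch 1 (writes TA when tracking is improved)
    t.execute("r1", ["r2"])     # instruction after branch 1
    print(track_ta, sorted(t.branch(["r1"])))  # ARS of the following branch
```

Without TA the following branch sees only r2; with TA it sees r2 and TA, so the path taken after branch 1 becomes part of the pattern.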
  • Behavior of branch instruction
    Example
    beq in the MIPS instruction set architecture
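The slide's beq figure is not recoverable, so the following is only a sketch of how a MIPS `beq` could update TA. It assumes TA receives the address that is actually fetched next (the taken target or the fall-through), which matches the later slides where TA takes different values for the two directions. The MIPS branch delay slot is ignored, and all names are hypothetical.

```python
def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def beq_with_ta(regs, pc, rs, rt, imm):
    """Resolve `beq rs, rt, imm` and record the resolved next address in TA.

    The branch delay slot is ignored in this sketch.
    """
    fall_through = pc + 4
    target = fall_through + (sign_extend16(imm) << 2)
    next_pc = target if regs[rs] == regs[rt] else fall_through
    regs["TA"] = next_pc  # assumption: TA holds the address actually fetched next
    return next_pc


regs = {"r1": 0, "r2": 0, "TA": 0}
print(hex(beq_with_ta(regs, 0x400100, "r1", "r2", 3)))  # taken:     0x400110
regs["r1"] = 1
print(hex(beq_with_ta(regs, 0x400100, "r1", "r2", 3)))  # not taken: 0x400104
```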
  • Improved Dependence Tracking
    (Figure: improved dependence tracking when Branch 1 is not taken and when Branch 1 is taken)
  • ARS of Branch 13
    Improved tracking
    Tracks control dependency well
    When completed by INST1
    It differs from the practical ARS
    But the branch can still be predicted well
    because TA carries control-flow information
    When r3 == 1
    TA == 2 -> not taken
    TA == 4 -> taken
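A minimal sketch, assuming an exact-match table of 2-bit counters keyed by the (r3, TA) pattern rather than the hashed pattern the real predictor uses, shows why TA is enough to separate the two directions of branch 13. `PatternTable` is a hypothetical name.

```python
from collections import defaultdict


class PatternTable:
    """Toy exact-match table: one 2-bit saturating counter per pattern."""

    def __init__(self):
        self.counters = defaultdict(lambda: 1)  # start weakly not taken

    def predict(self, pattern):
        return self.counters[pattern] >= 2

    def update(self, pattern, taken):
        c = self.counters[pattern]
        self.counters[pattern] = min(3, c + 1) if taken else max(0, c - 1)


# Pattern of branch 13 = (r3, TA), using the values from the slide.
table = PatternTable()
for _ in range(2):
    table.update((1, 2), False)  # r3 == 1, TA == 2 -> not taken
    table.update((1, 4), True)   # r3 == 1, TA == 4 -> taken

print(table.predict((1, 2)), table.predict((1, 4)))  # False True
```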
  • Common code problem
    Performance loss in code without control dependence
    In common code (code reached along both directions of an earlier branch)
    ARS of Branch 15
    When completed by INST2
    Practical ARS
    r2 (INST1)
    Previously proposed tracking
    r2 (INST1)
    Improved tracking
    r2 (INST1), TA
  • Distinguishing control flow in improved tracking
    Here TA is wasted information
    This does not mean that the prediction is incorrect
    It means that the predictor needs more training
    Information to train
    Previous tracking
    r2 = 0 -> taken
    Improved tracking
    r2 = 0, TA = 5 -> taken
    r2 = 0, TA = 6 -> taken
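The extra training cost can be counted directly. The hypothetical trace below (only r2 = 0 and TA = 5/6 come from the slide; the r2 = 1 entries are invented) enters the common code from two paths. Adding TA to the pattern doubles the number of distinct patterns the predictor has to learn, even though the outcome depends only on r2.

```python
def patterns_to_learn(trace, ars):
    """Distinct (pattern, outcome) pairs the predictor must see at least once."""
    return {(tuple(regs[r] for r in ars), taken) for regs, taken in trace}


# Hypothetical trace of branch 15: the common code is entered from two paths,
# so TA is 5 or 6, but the outcome depends only on r2.
trace = [
    ({"r2": 0, "TA": 5}, True),
    ({"r2": 0, "TA": 6}, True),
    ({"r2": 1, "TA": 5}, False),
    ({"r2": 1, "TA": 6}, False),
]

print(len(patterns_to_learn(trace, ["r2"])))        # previous tracking: 2
print(len(patterns_to_learn(trace, ["r2", "TA"])))  # improved tracking: 4
```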
  • “SetTA” Instruction
    Add a “SetTA” instruction
    It saves the next instruction address to TA
    The ARS of branch 15 is still r2 and TA
    But TA is always 6
    Disadvantages
    Wasted instructions (INST6)
    Programs must be recompiled
    The start of common code must be found at compile time to insert “SetTA”
    This is hard because assembly language is not a structured programming language (it has “goto”)
  • Encoding
    The amount of information changes with the number of registers in the ARS
    Amount of information
    Assume each value is 10 bits long
    1 register in ARS => 10 bits
    2 registers in ARS => 20 bits
    3 registers in ARS => 30 bits
    A fixed-length pattern must be generated from variable-length information
    -> HASH
    Various encodings are possible
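As a small illustration of the length problem, assuming 10-bit values as on the slide, concatenating the ARS values yields 10, 20 or 30 bits, while a hash such as an XOR fold always yields 10 bits. The XOR fold here is just one of the possible encodings, and the register values are arbitrary.

```python
WIDTH = 10  # bits per register value, as assumed on the slide
MASK = (1 << WIDTH) - 1


def concat(values):
    """Concatenate the values; returns (bits, length). The length grows with the ARS size."""
    bits = 0
    for v in values:
        bits = (bits << WIDTH) | (v & MASK)
    return bits, WIDTH * len(values)


def xor_fold(values):
    """One possible hash: XOR all values, so the result is always WIDTH bits."""
    h = 0
    for v in values:
        h ^= v & MASK
    return h


for ars_values in ([3], [3, 5], [3, 5, 9]):
    _, length = concat(ars_values)
    print(f"{len(ars_values)} register(s): {length} bits -> {WIDTH}-bit hash {xor_fold(ars_values)}")
```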
  • Encoding of ARVI
    XOR of the values of the physical registers in the ARS
    A simple XOR hash built as an XOR tree
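The AND-then-XOR structure implied by the delay formula on the later slide (log2(nPR) levels of XOR2 after one AND2) can be sketched in software. Each physical register value is gated by whether that register belongs to the ARS, and the gated values are reduced by a binary XOR tree. The register values and ARS selection in the usage line are made up, and the function names are hypothetical.

```python
def xor_tree(values):
    """Pairwise XOR reduction of a non-empty list; returns (result, number of XOR2 levels)."""
    level = list(values)
    depth = 0
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)  # pad odd levels with the XOR identity
        level = [level[i] ^ level[i + 1] for i in range(0, len(level), 2)]
        depth += 1
    return level[0], depth


def arvi_hash(phys_values, in_ars):
    """AND-gate every physical register value with its ARS bit, then XOR-reduce."""
    gated = [v if sel else 0 for v, sel in zip(phys_values, in_ars)]
    return xor_tree(gated)


# 8 physical registers; only p1 and p6 are in the ARS.
print(arvi_hash([11, 3, 7, 0, 2, 9, 5, 1], [0, 1, 0, 0, 0, 0, 1, 0]))  # (6, 3)
```

With nPR = 8 the tree has log2(8) = 3 levels, matching the log2(nPR) * XOR2 term.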
  • Reducing Hash conflict
    Programs use the lower bits of registers more than the higher bits
    Most information is concentrated in the lower bits
    So hash conflicts occur in the lower bits
    To spread the information across the bits
    Each value is circularly shifted by a different amount per logical register number
    (The logical number is used because the physical register number changes at run time)
  • Percentage of use of each bit
    • There are bits that programs use most of the time
    • Hash conflicts occur in those bits
  • Degree of centralization
    A high value means only a small number of register bits are used
    Information is centralized in a small number of bits
    Circular shifting decentralizes it well
  • Proposed Encoding
    XOR of each logical register value
    Each value circularly shifted by a different amount according to its logical number
    Serializes the physical-to-logical mapping
    Value information is shorter than before (disadvantage)
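A sketch of the proposed encoding, assuming a 16-bit pattern and a rotation amount equal to the logical register number (the slides only say the shift differs per logical number). It also shows one kind of conflict the rotation removes: a plain XOR cannot tell r1 = 1, r2 = 2 from r1 = 2, r2 = 1.

```python
WIDTH = 16  # hypothetical pattern width in bits
MASK = (1 << WIDTH) - 1


def rotl(value, amount, width=WIDTH):
    """Circular left shift of a width-bit value."""
    mask = (1 << width) - 1
    amount %= width
    value &= mask
    return ((value << amount) | (value >> (width - amount))) & mask


def proposed_hash(ars):
    """ars maps logical register number -> value; rotate by the logical number, then XOR."""
    h = 0
    for lr, value in ars.items():
        h ^= rotl(value, lr)
    return h


def plain_xor(ars):
    """Baseline: XOR of the values without any rotation."""
    h = 0
    for value in ars.values():
        h ^= value & MASK
    return h


a = {1: 1, 2: 2}  # r1 = 1, r2 = 2
b = {1: 2, 2: 1}  # r1 = 2, r2 = 1
print(plain_xor(a) == plain_xor(b))          # True: both hash to 3
print(proposed_hash(a) == proposed_hash(b))  # False: 10 vs 0
```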
  • Select Logical Register X
    Select the physical register value that is mapped to logical register X
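In software, the select step behaves like the comparator-and-OR structure in the delay formula (Select = XOR2 + AND_L + Gate + OR(nPR)). Each physical register compares its logical tag with X, gates its value on a match, and an OR across all physical registers produces the result. The sketch assumes at most one physical register is currently mapped to X and marks unmapped physical registers with `None`; both conventions are ours.

```python
def select_logical(x, phys_values, phys_to_logical, width=16):
    """Return the value of the physical register currently mapped to logical register x.

    phys_to_logical[i] is the logical register number mapped to physical register i,
    or None if it is unmapped (hypothetical convention).
    """
    out = 0
    for value, tag in zip(phys_values, phys_to_logical):
        match = tag is not None and (tag ^ x) == 0  # XOR compare per tag bit, then AND across bits
        gated = value if match else 0               # gate the value with the match signal
        out |= gated                                # OR across all nPR entries
    return out & ((1 << width) - 1)


phys_values = [10, 20, 30, 40]
phys_to_logical = [None, 1, 3, 2]
print(select_logical(3, phys_values, phys_to_logical))  # 30
```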
  • Delay
    nPR = number of physical registers
    nLR = number of logical registers
    L = log2(nLR)
    Simple XOR hash
    log2(nPR) * XOR2 + AND2
    Proposed hash
    log2(nLR) * XOR2 + Select + AND2
    Select = XOR2 + AND_L + Gate + OR(nPR)
    Since nPR > nLR * 2,
    log2(nPR) > log2(nLR) + 1
    Approximately the same or a little slower
  • HW Resource
    Simple XOR hash
    nPR * N * AND2 + (nPR-1) * N * XOR2
    (nPR-1) * 3-bit ADD for the logical number tag
    Proposed hash
    nPR * N * AND2 + nLR * Select + (nLR-1) * N * XOR2
    Select = nPR * (L * (XOR2 + 2 Gate) + AND_L) + N * OR(nPR)
    No logical number tag
    The pattern already contains that information
  • Suitable predictor for register-value-pattern
    Characteristics of register value patterns
    A long pattern is needed for reliable prediction
    PHT is not suitable
    It must save tags to compare states
    Perceptron is not suitable [17][18]
    Not linearly separable [17][18]
    Each bit of a value is related to the others by AND
    Perceptron is not suitable
    Many different patterns per branch
    If there is a loop in which r1 goes from 0 to 999
    There are 999 not-taken patterns and 1 taken pattern
    Long delay for pattern generation
    Perceptron is not suitable [17][18]
    Must be hybridized with a fast predictor [19][20]
  • Proposed predictor
    Modified YAGS [21]
    1. Bimodal
    Saves a bias for each branch
    2. Caches
    Save only patterns whose outcome differs from the bias
    Taken cache
    Saves not-taken patterns for taken-biased branches
    Not-taken cache
    Saves taken patterns for not-taken-biased branches
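The table organization above can be sketched as follows. This is a simplified, hypothetical model: the table sizes, index and tag functions, counter widths and the bias-update rule (the bias is left unchanged when a cache hit already predicted correctly against it) are all assumptions, and no depth tag is modeled.

```python
class ModifiedYAGS:
    """Toy sketch: a bimodal bias table plus two direct-mapped caches that
    store only patterns whose outcome disagrees with the branch's bias."""

    def __init__(self, bimodal_bits=10, cache_bits=10, tag_bits=8):
        self.bimodal_mask = (1 << bimodal_bits) - 1
        self.cache_bits = cache_bits
        self.cache_mask = (1 << cache_bits) - 1
        self.tag_mask = (1 << tag_bits) - 1
        self.bimodal = [1] * (1 << bimodal_bits)           # 2-bit counters, weakly not taken
        self.taken_cache = [None] * (1 << cache_bits)      # not-taken patterns of taken-biased branches
        self.not_taken_cache = [None] * (1 << cache_bits)  # taken patterns of not-taken-biased branches

    def _lookup(self, pc, pattern):
        bias = self.bimodal[pc & self.bimodal_mask] >= 2
        cache = self.taken_cache if bias else self.not_taken_cache
        idx = (pc ^ pattern) & self.cache_mask
        tag = ((pattern >> self.cache_bits) ^ pc) & self.tag_mask
        return bias, cache, idx, tag

    def predict(self, pc, pattern):
        """Return (direction, hit). On a miss the direction is just the bias."""
        bias, cache, idx, tag = self._lookup(pc, pattern)
        entry = cache[idx]
        if entry is not None and entry[0] == tag:
            return entry[1] >= 2, True
        return bias, False

    def update(self, pc, pattern, taken):
        bias, cache, idx, tag = self._lookup(pc, pattern)
        entry = cache[idx]
        hit = entry is not None and entry[0] == tag
        if hit:
            ctr = entry[1]
            cache[idx] = (tag, min(3, ctr + 1) if taken else max(0, ctr - 1))
        elif taken != bias:
            # Only an outcome that disagrees with the bias is allocated.
            cache[idx] = (tag, 2 if taken else 1)
        cache_was_right = hit and (entry[1] >= 2) == taken
        if not (cache_was_right and taken != bias):
            b = pc & self.bimodal_mask
            self.bimodal[b] = min(3, self.bimodal[b] + 1) if taken else max(0, self.bimodal[b] - 1)


# Hypothetical loop branch from the earlier slide: r1 runs 0..999, taken only at 999.
yags = ModifiedYAGS()
for _ in range(2):
    for r1 in range(1000):
        yags.update(0x400, r1, r1 == 999)
print(yags.predict(0x400, 999), yags.predict(0x400, 5))  # (True, True) (False, False)
```

Only the single taken pattern occupies a cache entry; the 999 not-taken patterns are covered by the bias, which is how storing only exceptions saves pattern capacity.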
  • Block diagram
    1. The fast predictor predicts the direction in an early cycle
    2. When the modified YAGS hits and its depth tag matches the current state
    The fetch direction is updated in a late cycle
    3. When the modified YAGS misses, its predicted direction is the bias, and we do not know whether the pattern is untrained or trained but not saved (patterns that agree with the bias are not stored)
    A selector chooses the biased direction or the fast predictor's direction
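The late-cycle decision can be written as a few lines of logic. The 2-bit chooser and the handling of a hit whose depth tag does not match (treated like a miss here) are assumptions, since the slide does not specify them; the function names are hypothetical.

```python
def late_direction(fast_dir, yags_hit, depth_tag_match, yags_dir, bias, chooser):
    """Direction used in the late cycle.

    chooser is a hypothetical 2-bit counter: values >= 2 favor the bias.
    """
    if yags_hit and depth_tag_match:
        return yags_dir
    # Assumption: a hit whose depth tag does not match is treated like a miss.
    return bias if chooser >= 2 else fast_dir


def update_chooser(chooser, fast_dir, bias, taken):
    """Train the chooser only when the two candidates disagree."""
    if fast_dir == bias:
        return chooser
    return min(3, chooser + 1) if bias == taken else max(0, chooser - 1)


def needs_redirect(fast_dir, late_dir):
    """Fetch is redirected in the late cycle only if the late direction differs."""
    return fast_dir != late_dir


late = late_direction(fast_dir=False, yags_hit=True, depth_tag_match=True,
                      yags_dir=True, bias=False, chooser=0)
print(late, needs_redirect(False, late))  # True True
```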
  • Outlines
    Why do we need branch prediction?
    Related works
    Improved Register-value-pattern generation
    Experiment and Evaluation
    Contribution
    Reference
  • Experimental environment
    SimpleScalar 3.0
    PISA instruction set architecture
    Little-endian
    sim-outorder
    Performance-based
    Execution-driven
    Cycle timer
    Benchmarks
    10 programs of SPEC 2000
    Instruction coverage
    150M ~ 250M instructions
  • Processor Architecture Configuration
  • Memory Architecture Configuration
  • Predictor Configuration
  • Register-Value-Pattern predictor
    A register-value-pattern predictor predicts the way a human does.
    If we know “this branch was taken before when a=3 and b=4”,
    we can predict the branch without calculation when a=3 and b=4 occur again.
    A commonsense design
    So why can't it reach 100% accuracy?
  • Factors of performance loss
    1. Limitation of dependence tracking
    1.1 Load branches
    1.2 Control dependency
    2. Hash conflicts in encoding
    3. Prediction delay
    4. Various patterns for the same direction
    4.1 Pattern capacity of the predictor
    4.2 Lack of training
  • Contribution
    We improve some of the factors of performance loss
    1.2 Control dependency
    2. Hash conflicts in encoding
    4.1 Pattern capacity of the predictor
    But other factors remain to be addressed
  • Applications of Register-Value-Pattern
    Register value patterns fail on different kinds of branches than branch history patterns do
    Higher performance in a hybrid predictor with branch history patterns
    Register value patterns with branch-register-value-based prediction
    A dependence chain depth of 0
    means the branch's register is already updated
    In that case, branch-register-value-based prediction is a good choice
    Register value patterns for value prediction
    Register value patterns can be used as information for value prediction
  • Reference
    [1] T. Yeh and Y. Patt. "Two-Level Adaptive Branch Prediction." In Proc. 24th ACM/IEEE Int. Symp. on Microarchitecture, 1991.
    [2] T. Yeh and Y. Patt. "A Comparison of Dynamic Branch Predictors that Use Two Levels of Branch History." In Proc. 20th Ann. Int. Symp. on Computer Architecture, 1993.
    [3] S. Pan, K. So and J. Rahmeh. "Improving the Accuracy of Dynamic Branch Prediction Using Branch Correlation." In Proc. 5th Ann. Int. Conf. on Architectural Support for Programming Languages and Operating Systems, 1992.
    [4] R. Nair. "Dynamic Path-Based Branch Correlation." In Proc. 28th Ann. Int. Symp. on Microarchitecture, 1995.
    [5] D. Jiménez. "Fast Path-Based Neural Branch Prediction." In Proc. 36th Ann. IEEE/ACM Int. Symp. on Microarchitecture, 2003.
    [6] F. Gabbay and A. Mendelson. "Speculative Execution Based on Value Prediction." Technical Report, Technion, 1997.
    [7] J. Gonzalez and A. Gonzalez. "Control-Flow Speculation through Value Prediction for Superscalar Processors." In Proc. Int. Conf. on Parallel Architectures and Compilation Techniques, 1999.
    [8] T. Heil, Z. Smith and J. E. Smith. "Improving Branch Predictors by Correlating on Data Values." In Proc. 32nd Int. Symp. on Microarchitecture, 1999.
    [9] K. Wang. "Highly Accurate Data Value Prediction using Hybrid Predictors." In Proc. 30th Int. Symp. on Microarchitecture, 1997.
    [10] M. Lipasti and J. Shen. "Exceeding the Dataflow Limit via Value Prediction." In Proc. 29th Int. Symp. on Microarchitecture, 1996.
    [11] W. Mohan and M. Franklin. "Improving Data Value Prediction Accuracy using Path Correlation." In Proc. 6th Int. Conf. on High Performance Computing, 1999.
    [12] Y. Sazeides and J. Smith. "Implementations of Context Based Value Predictors." Technical Report ECE-TR-97-8, University of Wisconsin-Madison, 1997.
    [13] L. N. Vintan, M. Sbera, I. Z. Mihu and A. Florea. "An Alternative to Branch Prediction: Pre-Computed Branches." ACM SIGARCH Computer Architecture News, Vol. 31, 2003.
    [14] L. He and Z. Liu. "A New Value Based Branch Predictor for SMT Processors." In Proc. 16th IASTED Int. Conf. on Parallel and Distributed Computing and Systems, 2004.
    [15] Y. Pan, X. Fan, L. He and D. Wang. "A Bypass Mechanism to Enhance Branch Predictor for SMT." In Proc. 12th Asia-Pacific Conf. on Computer Systems Architecture (ACSAC 2007), vol. 4697, 2007.
    [16] L. Chen, S. Dropsho and D. H. Albonesi. "Dynamic Data Dependence Tracking and its Application to Branch Prediction." In Proc. 9th Int. Symp. on High-Performance Computer Architecture, 2003.
    [17] D. A. Jiménez and C. Lin. "Dynamic Branch Prediction with Perceptrons." In Proc. 7th Int. Symp. on High-Performance Computer Architecture, 2001.
    [18] D. A. Jiménez and C. Lin. "Neural Methods for Dynamic Branch Prediction." ACM Transactions on Computer Systems, 2002.
    [19] P. Chang, E. Hao and Y. Patt. "Alternative Implementations of Hybrid Branch Predictors." In Proc. 28th Ann. Int. Symp. on Microarchitecture, 1995.
    [20] M. Evers, P. Chang and Y. Patt. "Using Hybrid Branch Predictors to Improve Branch Prediction Accuracy in the Presence of Context Switches." In Proc. 23rd Ann. Int. Symp. on Computer Architecture, 1996.
    [21] A. Eden and T. Mudge. "The YAGS Branch Prediction Scheme." In Proc. 31st Ann. ACM/IEEE Int. Symp. on Microarchitecture, 1998.
    [22] P. N. Glaskowsky. "Pentium 4 (Partially) Previewed." Microprocessor Report, 2000.