Branch Prediction Contest: Implementation of Piecewise Linear Prediction Algorithm

Prosunjit Biswas
Department of Computer Science, University of Texas at San Antonio

Abstract
Branch predictor accuracy is essential for harnessing the instruction-level parallelism (ILP) available in programs, and therefore for the performance of today's microprocessors, especially superscalar processors. Among branch predictors, neural predictors such as the Scaled Neural Analog Predictor (SNAP) and the Piecewise Linear Branch Predictor outperform other state-of-the-art predictors. In this final project for the Computer Architecture course (CS-5513), I studied various neural predictors and implemented the Piecewise Linear Branch Predictor, following the algorithm given in a research paper by Dr. Daniel A. Jimenez. The hardware budget for this project is restricted, and I implemented the predictor within a predefined hardware budget of 64K of memory. I am also competing in the branch prediction contest.

Keywords: Piecewise Linear, Neural Network, Branch Prediction.

I. INTRODUCTION
Neural branch predictors are the most accurate predictors in the literature, but they have been impractical because of the high latency of each prediction. This latency comes from the complex computation needed to determine the excitation of an artificial neuron [3].

Piecewise Linear Branch Prediction [1] improved both accuracy and latency over previous neural predictors. The predictor develops a set of linear functions, one for each program path leading to the branch being predicted, and each function separates predicted-taken from predicted-not-taken outcomes.

In the paper Piecewise Linear Branch Prediction [1], Daniel A. Jimenez proposed two versions of the prediction algorithm: i) the idealized piecewise linear branch predictor and ii) a practical piecewise linear branch predictor. In this project, I focused on the idealized predictor.

II. RELATED WORKS
Perceptron prediction [2] is one of the first attempts in the history of branch prediction to use a neural network. On a composite trace of SPEC2000 benchmarks, this predictor improved the misprediction rate by 14.7% [2]. Unfortunately, it was impractical because of its high latency.

Fast Path-Based Neural Branch Prediction [4] is another attempt; it combines path and pattern history to overcome the limitations of earlier neural predictors. It improved accuracy over previous neural predictors and achieved significantly lower latency. This predictor improved the IPC of an aggressively clocked microarchitecture by 16% over the earlier perceptron predictor.

The Scaled Neural Analog Predictor, or SNAP, is another recently proposed neural branch predictor. It uses the concept of piecewise linear branch prediction and relies on a mixed analog/digital implementation, which reduces latency and power consumption compared with other neural predictors [5]. Fig. 1 (courtesy of "An Optimized Scaled Neural Branch Predictor" by Daniel A. Jimenez) compares the performance of notable branch prediction approaches on a set of SPEC CPU 2000 and 2006 integer benchmarks.

Fig. 1: Performance of different branch predictors on SPEC CPU 2000 and 2006 integer benchmarks (courtesy of "An Optimized Scaled Neural Branch Predictor" by Daniel A. Jimenez).

III. THE ALGORITHM
The branch predictor algorithm has two major parts: i) the prediction algorithm and ii) the train/update algorithm. Before presenting these two algorithms, we discuss the state and variables they use. The three-dimensional array W stores the weights of the branches and is used by both the prediction and the update algorithms.
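Read directly off the code in Table I and Table II, the predictor's output y for a conditional branch with index a is a signed sum of the bias weight and one weight per history position, and training nudges each selected weight toward the observed outcome. The symbols x_i and t below are introduced here only for this restatement: x_i is +1 when GHR[i] records a taken branch and -1 otherwise, and t is +1 for a taken outcome and -1 otherwise. The training rule applies to each history position visited by the training loop of Table II, and an update is skipped when the weight has already reached its saturation limit SAT_VAL.

```latex
\begin{aligned}
y &= W[a][0][0] + \sum_{i=0}^{H-1} x_i \, W[a][GA[i]][i],
\qquad x_i = \begin{cases} +1 & \text{if } GHR[i] = \text{true} \\ -1 & \text{otherwise} \end{cases} \\
\text{prediction} &= \begin{cases} \text{taken} & \text{if } y \ge 0 \\ \text{not taken} & \text{otherwise} \end{cases} \\
\text{if } |y| < \theta \ \text{or}\ (y \ge 0) \ne \text{taken}: &\quad
W[a][0][0] \leftarrow W[a][0][0] + t, \qquad
W[a][GA[i]][i] \leftarrow W[a][GA[i]][i] + t \, x_i
\end{aligned}
```

Because t x_i = +1 exactly when GHR[i] equals the current outcome, the second update is the "increment if GHR[i] == taken, otherwise decrement" branch of Table II.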
Fig. 2: The array W with its corresponding indices.

The branch address is generally taken as the last 8 or 10 bits of the instruction address. For each branch being predicted, the algorithm keeps the history of all other branches that precede it on the dynamic path leading to it. The second dimension of W, indexed by the variable GA, keeps track of this per-branch dynamic path history. The third dimension, labeled GHR[i] in Fig. 2, is indexed by i, the position of the address GA[i] in the global branch history register GHR.

Some of the important variables of the algorithm are listed here for clarity:
GA: An array of addresses holding the path history associated with each branch address. When a new branch executes, its address is shifted into the first position of the array.
GHR: An array of Boolean (true/false) values that records the taken/not-taken outcomes of the branches.
H: The length of the history register.
Output: An integer value computed by the prediction algorithm to predict the current branch.

Table I: The prediction algorithm

    branch_update *predict (branch_info &b) {
        bi = b;  // remember the branch for update()
        if (b.br_flags & BR_CONDITIONAL) {
            // 6-bit index built from bits 2..7 of the branch address (Section V)
            address = (((b.address >> 4) & 0x0F) << 2) | ((b.address >> 2) & 0x03);
            output = W[address][0][0];  // bias weight
            for (int i = 0; i < H; i++) {
                // add the weight if the i-th branch in the history was taken, else subtract it
                if (GHR[i] == true)
                    output += W[address][GA[i]][i];
                else
                    output -= W[address][GA[i]][i];
            }
            u.direction_prediction (output >= 0);
        } else {
            u.direction_prediction (false);  // unconditional branches are not predicted here
        }
        u.target_prediction (0);
        return &u;
    }

Table II: The update/train algorithm

    void update (branch_update *u, bool taken, unsigned int target) {
        if (bi.br_flags & BR_CONDITIONAL) {
            // train only on a misprediction or when |output| is below the threshold theta
            if (abs(output) < theta || ((output >= 0) != taken)) {
                // bias weight: saturating increment or decrement toward the outcome
                if (taken == true) {
                    if (W[address][0][0] < SAT_VAL)
                        W[address][0][0]++;
                } else {
                    if (W[address][0][0] > (-1) * SAT_VAL)
                        W[address][0][0]--;
                }
                // history weights: reinforce agreement between GHR[i] and the outcome
                for (int i = 0; i < H-1; i++) {
                    if (GHR[i] == taken) {
                        if (W[address][GA[i]][i] < SAT_VAL)
                            W[address][GA[i]][i]++;
                    } else {
                        if (W[address][GA[i]][i] > (-1) * SAT_VAL + 1)
                            W[address][GA[i]][i]--;
                    }
                }
            }
            // shift the current branch address and outcome into the histories
            shift_update_GA(address);
            shift_update_GHR(taken);
        }
    }

IV. TUNING PERFORMANCE
Besides the algorithm itself, the MPKI (mispredictions per kilo-instruction) of the predictor depends on the size of each dimension of the array W. I measured the MPKI for various dimensions of W; Table III shows the results of this experiment. Every configuration holds the same 65,536 one-byte weights, so only the split among the three dimensions changes.

Table III: MPKI of the piecewise linear algorithm with a limited budget of 64K

    Dimensions of W [address][GA[i]][i]    MPKI
    W[64][16][64]                           3.982
    W[128][16][32]                          4.217
    W[64][8][128]                           4.292
    W[32][16][128]                          5.807
    W[64][64][16]                           4.826

The table shows that the predictor performs best when the three dimensions of W (address, GA[i] and history position i) have 64, 16 and 64 entries, respectively.

V. TWEAKING INSTRUCTION ADDRESS
I found that, rather than taking the last bits of the address, discarding the 2 least significant bits and then taking bits 3 through 8 (counting from 1) makes the predictor more accurate. This reduces aliasing and thus improves the prediction rate slightly.
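The prediction and training steps of Table I and Table II, together with the address index of Section V, can be combined into a small standalone sketch. It is an illustration under stated assumptions, not the contest submission: it does not use the framework's branch_info/branch_update types, the values of H, SAT_VAL and THETA are hypothetical (the report does not state them), and because the report does not show shift_update_GA or shift_update_GHR, the shift logic and the reduction of GA entries modulo 16 are assumptions. Two deliberate differences from the tables: history position i is stored at third index i + 1, so it never overlaps the bias weight W[address][0][0] (in Table I, position 0 with GA[0] = 0 selects that same cell), and training visits all H positions. Note also that the index expression in Table I is equivalent to (pc >> 2) & 0x3F, i.e. bits 2 through 7 of the address counting from 0.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical parameters: the report does not list H, SAT_VAL or theta.
constexpr int N_ADDR = 64;     // rows of W, indexed by the 6-bit branch index
constexpr int N_PATH = 16;     // columns of W, indexed by a path address (mod 16, assumed)
constexpr int H = 63;          // history length (assumed; H + 1 = 64 matches W[64][16][64])
constexpr int SAT_VAL = 127;   // weights are saturating 8-bit counters
constexpr int THETA = 135;     // training threshold (hypothetical value)

// Section V index: bits 2..7 of the branch address. Equivalent to the
// Table I expression ((((pc >> 4) & 0x0F) << 2) | ((pc >> 2) & 0x03)).
inline unsigned branch_index(std::uint64_t pc) {
    return static_cast<unsigned>((pc >> 2) & 0x3F);
}

struct PiecewiseLinear {
    // W[a][0][0] is the bias weight; history position i uses W[a][GA[i]][i + 1],
    // so the bias never shares storage with a history weight.
    std::int8_t W[N_ADDR][N_PATH][H + 1] = {};
    unsigned GA[H] = {};   // path history, most recent branch first (stored mod N_PATH)
    bool GHR[H] = {};      // outcome history, most recent outcome first
    unsigned address = 0;  // index of the branch being predicted
    int output = 0;        // last predictor output, reused by update()

    // Prediction (cf. Table I): signed sum of the selected weights.
    bool predict(std::uint64_t pc) {
        address = branch_index(pc);
        output = W[address][0][0];
        for (int i = 0; i < H; ++i) {
            int w = W[address][GA[i]][i + 1];
            output += GHR[i] ? w : -w;
        }
        return output >= 0;
    }

    // Training (cf. Table II); must follow predict() for the same branch.
    void update(bool taken) {
        if (std::abs(output) < THETA || (output >= 0) != taken) {
            bump(W[address][0][0], taken);
            for (int i = 0; i < H; ++i)
                bump(W[address][GA[i]][i + 1], GHR[i] == taken);
        }
        // Shift both histories and insert the current branch at position 0.
        for (int i = H - 1; i > 0; --i) {
            GA[i] = GA[i - 1];
            GHR[i] = GHR[i - 1];
        }
        GA[0] = address % N_PATH;
        GHR[0] = taken;
    }

private:
    // Saturating increment (up == true) or decrement within [-SAT_VAL, SAT_VAL].
    static void bump(std::int8_t &w, bool up) {
        if (up) {
            if (w < SAT_VAL) ++w;
        } else {
            if (w > -SAT_VAL) --w;
        }
    }
};
```

Because update() reuses address and output from the preceding predict() call, the two must be called in pairs, mirroring how Table II relies on the bi and output values saved by Table I. With H + 1 = 64, the weight array occupies 64 * 16 * 64 = 65,536 bytes, the best-performing configuration in the MPKI experiment.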
Fig. 3: Tweaking the branch address for a performance speed-up.

VI. RESULT
Fig. 4 shows the misprediction rate of each benchmark under the piecewise linear algorithm. Fig. 5 compares the prediction algorithms (piecewise linear, perceptron and gshare) across the given benchmarks.

Fig. 4: Misprediction rate of different benchmarks using the piecewise linear prediction algorithm (benchmarks: 253.perlbmk, 222.mpegaudio, 300.twolf, 205.raytrace, 255.vortex, 227.mtrt, 256.bzip2, 164.gzip, 181.mcf, 197.parser, 201.compress, 209.db, 186.crafty, 176.gcc, 213.javac, 175.vpr, 202.jess, 252.eon, 254.gap, 228.jack).

Fig. 5: Comparison of prediction algorithms across different benchmarks with the given 64K budget.

VII. 64K BUDGET CALCULATION
I limited the implementation of the piecewise linear prediction algorithm to 64K + 256 bytes of memory. The algorithm performs better as the memory limit increases. Table IV shows the calculation for the 64K + 256 byte budget.

Table IV: 64K (65,536 byte) memory budget calculation

    Data structure / array / variable         Memory calculation                 Size
    W[64][16][63], each entry 1 byte          64 * 16 * 63 * 1 byte              64,512 bytes
    Constants (SIZE, H, SAT_VAL, theta, N)    5 * 1 byte (each value < 128)      5 bytes
    GA[63]                                    (63 * 6 bits) / 8, rounded up      48 bytes
    GHR[63]                                   (63 * 1 bit) / 8, rounded up       8 bytes
    Variables (address, output)               2 * 4 bytes                        8 bytes
    Total                                                                        64,581 bytes

VIII. CONCLUSION
In this individual final course project, I implemented the piecewise linear branch prediction algorithm. My implementation achieved an MPKI of 3.988 at best. I believe the performance of this algorithm can be improved further with better implementation tricks. I also compared the piecewise linear prediction algorithm with the perceptron and gshare algorithms: under the same memory limit, piecewise linear prediction performs significantly better than the other two.

REFERENCES
[1] D. A. Jimenez, "Piecewise linear branch prediction," in Proceedings of the 32nd Annual International Symposium on Computer Architecture (ISCA-32), June 2005.
[2] D. A. Jimenez and C. Lin, "Dynamic branch prediction with perceptrons," in Proceedings of the Seventh International Symposium on High Performance Computer Architecture, Jan. 2001.
[3] A. Lakshminarayanan and S. Shriraghavan, "Neural Branch Prediction," available at http://webspace.ulbsibiu.ro/lucian.vintan/html/neuralpredictors.pdf
[4] D. A. Jimenez, "Fast Path-Based Neural Branch Prediction," in Proceedings of the 36th Annual International Symposium on Microarchitecture, pp. 243-252, Dec. 2003.
[5] D. A. Jimenez, "An optimized scaled neural branch predictor," in Proceedings of the 29th IEEE International Conference on Computer Design (ICCD), pp. 113-118, Oct. 2011.