Cs718min1 2008soln View


Published in: Technology
  1. 1. CSL 718 Architecture of High Performance Systems Minor Test I Solution 2008
  2. 2. 1. Consider the following architectural changes in a non-pipelined processor that has a clock period of T ns and executes N instructions to run a particular benchmark, with an average of C cycles per instruction. i) A new instruction is introduced which replaces a sequence of operations occurring at several places in that benchmark. ii) Pipelining is introduced. iii) The stage with maximum propagation delay is split into two stages. For each of these changes, indicate how N, T, C, T*C, and N*T*C are likely to change, giving reasons. Suppose the new instruction in i) is able to replace 75% of the instructions executed; what is the upper bound on the possible performance improvement from this change?
  3. 3. Solution: i) N will decrease because multiple instructions are replaced by a single instruction. T and C are likely to go up because the new instruction performs a more complex task, which needs more cycles and/or requires each cycle to accommodate more work. Assuming that the CPI of the instructions replaced and that of the instructions not replaced is the same, 25% of the execution time remains unaffected. Suppose the remaining 75% of the execution time is reduced by a factor k by using the new instruction. Then the overall speedup is $\frac{1}{0.25 + 0.75/k}$. As k grows without bound this approaches 1/0.25, so the speedup can be at most 4.
  4. 4. ii) Pipelining will lead to overlapped execution of instructions. Therefore, C will decrease. Pipelining will ideally tend to make C = 1, but because of hazards, it would usually be more than 1. If pipeline stages correspond to the original break-up of instructions into cycles, T will remain unchanged. N will certainly remain unchanged as there is no change in the instruction set. iii) The stage with maximum propagation delay determines the clock period T. Therefore, if this stage is split into two stages, T will decrease (provided that there was no other stage with the same propagation delay). This would also introduce an additional cycle for the affected instructions. Therefore, C will go up. N will remain unchanged as there is no change in the instruction set.
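The bound in part i) is an instance of Amdahl's law. The following is a minimal Python sketch of that calculation, assuming (as the solution does) that 75% of the execution time is reduced by a factor k and the remaining 25% is unaffected. The function name `overall_speedup` and the sample values of k are illustrative and not part of the original solution.

```python
def overall_speedup(improved_fraction: float, k: float) -> float:
    """Overall speedup when `improved_fraction` of the execution time
    is reduced by a factor `k` and the remainder is unchanged."""
    unaffected = 1.0 - improved_fraction
    return 1.0 / (unaffected + improved_fraction / k)


if __name__ == "__main__":
    # Sample values of k chosen only for illustration.
    for k in (1, 2, 3, 10, 100, 1_000_000):
        print(f"k = {k:>9}: speedup = {overall_speedup(0.75, k):.4f}")
    # The limit for very large k is 1 / (1 - 0.75).
    print("upper bound:", 1.0 / (1.0 - 0.75))
```

The speedup grows with k but never reaches 4, which is the limit set by the 25% of the time that the new instruction does not touch.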
  5. 5. 2. A processor has a non-linear pipeline with 4 stages A, B, C and D. Each instruction goes through the stages in the following order: A B C B A D C. Find the bounds on the maximum instruction throughput in a static hazard-free schedule. Solution: The reservation table for this pipeline is as follows.

     Cycle | 1 | 2 | 3 | 4 | 5 | 6 | 7
     A     | X |   |   |   | X |   |
     B     |   | X |   | X |   |   |
     C     |   |   | X |   |   |   | X
     D     |   |   |   |   |   | X |

     Intervals which cause collision are: Row A – 4, Row B – 2, Row C – 4, Row D – none. Therefore, the initial collision vector is 001010.
  6. 6. Number of 1’s in the initial collision vector = 2. Therefore, minimum average latency ≤ 2 + 1 = 3. That is, maximum instruction throughput ≥ 1/3 instructions per cycle. Maximum number of marks (X) in any row of the reservation table = 2. Therefore, minimum average latency ≥ 2. That is, maximum instruction throughput ≤ 1/2 instructions per cycle.
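The steps of this solution (reservation table, forbidden latencies, collision vector, bounds on the minimum average latency) can be sketched in Python as follows. This is a rough illustration that assumes each instruction occupies exactly one stage per cycle, as in the given order A B C B A D C. All function names are hypothetical and not taken from the original solution.

```python
def reservation_table(stage_order):
    """Map each stage to the 1-based cycles in which it is used."""
    usage = {}
    for cycle, stage in enumerate(stage_order, start=1):
        usage.setdefault(stage, []).append(cycle)
    return usage


def forbidden_latencies(usage):
    """Distances between any two marks in the same row cause collisions."""
    forbidden = set()
    for cycles in usage.values():
        for i in range(len(cycles)):
            for j in range(i + 1, len(cycles)):
                forbidden.add(cycles[j] - cycles[i])
    return forbidden


def collision_vector(forbidden, n_cycles):
    """Bit string with latency n_cycles-1 on the left and latency 1 on the right."""
    return "".join("1" if lat in forbidden else "0"
                   for lat in range(n_cycles - 1, 0, -1))


def mal_bounds(usage, vector):
    """Lower bound: most marks in any row. Upper bound: number of 1s plus one."""
    lower = max(len(cycles) for cycles in usage.values())
    upper = vector.count("1") + 1
    return lower, upper


if __name__ == "__main__":
    order = "ABCBADC"
    table = reservation_table(order)
    vector = collision_vector(forbidden_latencies(table), len(order))
    low, high = mal_bounds(table, vector)
    print("collision vector:", vector)
    print(f"{low} <= MAL <= {high}, so 1/{high} <= max throughput <= 1/{low}")
```

The sketch only evaluates the two bounds used above; it does not search the state diagram for the actual minimum average latency.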
  7. 7. 3. Compute the number of cycles lost due to a branch hazard in a pipelined processor with 5 stages – instruction fetch (IF), decode (D), execute (EX), memory access (M) and write back (WB). Assume that in a branch instruction, decision-making as well as address calculation are completed in the EX stage, and also assume that branches are taken 70% of the time. Consider the following cases – i) there is no delayed branch and no branch prediction, ii) there is one delayed branch slot which is filled with a useful instruction, iii) the branch is statically predicted to be taken, iv) there is a branch target address buffer which is looked up in the IF stage itself, and a hit (or miss) in this buffer (assume 80% hit) is used for predicting the branch to be taken (or not taken).
  8. 8. Solution: Instruction N is the branch instruction and T is the target instruction. Instructions wrongly started and abandoned are shown in red and those executed correctly are shown in green. Time slots in which an instruction is stalled are shown as ██.

     i) No delayed branch slot, no branch prediction

     (a) branch not taken

     N        IF|D |EX
     N+1         IF|██|D |EX|M |WB
     N+2            ██|IF|D |EX|M |WB
     delay = 1

     (b) branch taken

     N        IF|D |EX
     N+1/T       IF|██|IF|D |EX|M |WB
     T+1            ██|██|IF|D |EX|M |WB
     delay = 2

     Average delay = 1*0.3 + 2*0.7 = 1.7
  9. 9. ii) One delayed branch slot, filled with useful instruction N+1

     (a) branch not taken

     N        IF|D |EX
     N+1         IF|D |EX|M |WB
     N+2            ██|IF|D |EX|M |WB
     delay = 1

     (b) branch taken

     N        IF|D |EX
     N+1         IF|D |EX|M |WB
     T              ██|IF|D |EX|M |WB
     T+1               ██|IF|D |EX|M |WB
     delay = 1

     Average delay = 1*0.3 + 1*0.7 = 1.0
  10. 10. iii) Branch statically predicted to be taken

      (a) branch not taken (prediction incorrect)

      N        IF|D |EX
      N+1/T       IF|██|IF
      N+1            ██|IF|D |EX|M |WB
      delay = 1

      (b) branch taken (prediction correct)

      N        IF|D |EX
      N+1/T       IF|██|IF|D |EX|M |WB
      T+1            ██|██|IF|D |EX|M |WB
      delay = 2

      Average delay = 1*0.3 + 2*0.7 = 1.7

      Here branch prediction offers no advantage, because target address calculation and decision-making happen in the same stage.
  11. 11. iv) Branch target address buffer with 80% hit

      (a) hit and branch not taken (prediction incorrect)

      N        IF|D |EX
      T/N+1       IF|D |IF|D |EX|M |WB
      T+1/N+2        IF|██|IF|D |EX|M |WB
      delay = 2

      (b) hit and branch taken (prediction correct)

      N        IF|D |EX
      T           IF|D |EX|M |WB
      T+1            IF|D |EX|M |WB
      delay = 0
  12. 12. (c) miss and branch not taken (prediction correct)

      N        IF|D |EX
      N+1         IF|D |EX|M |WB
      N+2            IF|D |EX|M |WB
      delay = 0

      (d) miss and branch taken (prediction incorrect)

      N        IF|D |EX
      N+1/T       IF|D |IF|D |EX|M |WB
      N+2/T+1        IF|██|IF|D |EX|M |WB
      delay = 2

      Average delay = 0.8*(2*0.3 + 0*0.7) + 0.2*(0*0.3 + 2*0.7) = 0.8*0.6 + 0.2*1.4 = 0.76
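The four averages in this solution are all weighted sums of the per-case delays. A small Python sketch of that arithmetic follows, assuming the delays read off the diagrams above and, as in the formula for case iv), that a buffer hit is independent of whether the branch is taken. The names `expected_delay`, `P_TAKEN` and `P_HIT` are illustrative only.

```python
P_TAKEN = 0.7  # branches are taken 70% of the time (given in the question)
P_HIT = 0.8    # branch target address buffer hit rate (given in the question)


def expected_delay(delay_not_taken, delay_taken, p_taken=P_TAKEN):
    """Average cycles lost per branch, weighted by the branch outcome."""
    return (1 - p_taken) * delay_not_taken + p_taken * delay_taken


# Delays per outcome are taken from the pipeline diagrams in the solution.
case_i = expected_delay(1, 2)    # no delayed branch, no prediction
case_ii = expected_delay(1, 1)   # one delayed branch slot, usefully filled
case_iii = expected_delay(1, 2)  # statically predicted taken
case_iv = (P_HIT * expected_delay(2, 0)            # hit: predicted taken
           + (1 - P_HIT) * expected_delay(0, 2))   # miss: predicted not taken

if __name__ == "__main__":
    for name, value in [("i", case_i), ("ii", case_ii),
                        ("iii", case_iii), ("iv", case_iv)]:
        print(f"case {name}: average delay = {value:.2f} cycles")
```

Changing `P_TAKEN` or `P_HIT` shows how sensitive case iv) is to the hit rate, since every misprediction there costs two cycles.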
  13. 13. 4. A processor with dynamic scheduling and issue-bound operand fetch has 3 execution units – one LOAD/STORE unit, one ADD/SUB unit and one MUL/DIV unit. It has a reservation station with 1 slot per execution unit and a single register file. Starting with the following instruction sequence in the instruction fetch buffer and empty reservation stations, for each instruction find the cycle in which it will be issued and the cycle in which it will write its result. Assume out-of-order issue and out-of-order execution.

      load R6, 34(R12)
      load R2, 45(R13)
      mul R0, R2, R4
      sub R8, R2, R6
      div R10, R0, R6
      add R6, R8, R2

      Execute cycles taken by different instructions are: LOAD/STORE: 2, ADD/SUB: 1, MUL: 2, DIV: 4.
  14. 14. Solution: The following chart shows the execution of the given instruction sequence cycle by cycle. The stages of instruction execution are annotated as follows:

      IF   Instruction fetch
      D    Decode and issue
      EX1  Execute in LOAD/STORE unit
      EX2  Execute in ADD/SUB unit
      EX3  Execute in MUL/DIV unit
      WB   Write back into register file and reservation stations

      Cycle |   |1  |2  |3  |4  |5  |6  |7  |8  |9  |10 |11 |12 |13 |14 |15 |16
      load  |IF |D  |EX1|EX1|WB |
      load  |IF |•  |•  |•  |D  |EX1|EX1|WB |
      mul   |IF |D  |   |   |   |   |   |   |EX3|EX3|WB |
      sub   |IF |D  |   |   |   |   |   |   |EX2|WB |
      div   |IF |•  |•  |•  |•  |•  |•  |•  |•  |•  |D  |   |EX3|EX3|EX3|EX3|WB
      add   |IF |•  |•  |•  |•  |•  |•  |•  |•  |D  |   |EX2|WB |
  15. 15. Cycles in which an instruction is waiting for a reservation station are marked as •, and the cycles in which an instruction is waiting for one or more operands are marked as [symbol not preserved]. As seen in the time chart, the issue and write-back cycles for the various instructions are as follows.

      Instruction | Issue cycle | Write-back cycle
      load        | 1           | 4
      load        | 4           | 7
      mul         | 1           | 10
      sub         | 1           | 9
      div         | 10          | 16
      add         | 9           | 12