
Advanced computer architecture


  1. CSE 8383 - Advanced Computer Architecture. Week 3, week of Jan 26, 2004. engr.smu.edu/~rewini/8383
  2. Contents: Linear Pipelines; Nonlinear Pipelines; Instruction Pipelines; Arithmetic Operations; Design of Multifunction Pipeline
  3. Linear Pipeline: processing stages are linearly connected; each stage performs a fixed function. Synchronous pipeline: clocked latches between Stage i and Stage i+1; equal delays in all stages. Asynchronous pipeline: handshaking between stages.
  4. Latches: stages S1, S2, S3 are separated by latches L1 and L2. The slowest stage determines the delay. Equal delays → clock period.
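  5. Reservation Table (rows are stages, columns are time steps): S1 X; S2 X; S3 X; S4 X. Each stage is used once, one time step after the stage before it.
  6. 5 tasks on 4 stages (rows are stages, columns are time steps): S1 X X X X X; S2 X X X X X; S3 X X X X X; S4 X X X X X. Each row starts one time step after the row above it.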
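
The space-time diagram on slide 6 can be reproduced with a short sketch. It assumes an ideal linear pipeline with no stalls, so task j (counting from 0) occupies stage i (counting from 0) in cycle i + j + 1, and n tasks on k stages finish in k + n - 1 cycles. The function names `space_time` and `total_cycles` are illustrative and do not come from the slides.

```python
def space_time(num_stages, num_tasks):
    """Return {stage_index: [busy cycles]} for an ideal linear pipeline.

    Assumption of this sketch: no stalls, one task enters per cycle.
    """
    table = {}
    for stage in range(num_stages):
        # Task j reaches stage `stage` in cycle stage + j + 1 (1-based cycles).
        table[stage] = [stage + task + 1 for task in range(num_tasks)]
    return table


def total_cycles(num_stages, num_tasks):
    # The last task leaves the last stage in cycle k + n - 1.
    return num_stages + num_tasks - 1


if __name__ == "__main__":
    k, n = 4, 5
    table = space_time(k, n)
    for stage in range(k):
        row = ["X" if cycle in table[stage] else "."
               for cycle in range(1, total_cycles(k, n) + 1)]
        print(f"S{stage + 1} " + " ".join(row))
```

For 5 tasks on 4 stages this gives 8 cycles in total, with each stage busy for 5 consecutive cycles.
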
  7. Nonlinear Pipelines: variable functions; feed-forward connections; feedback connections
  8. 3 stages & 2 functions (figure: stages S1, S2, S3; functions X and Y)
  9. Reservation Tables for X & Y. X: S1 X X X; S2 X X; S3 X X X. Y: S1 Y Y; S2 Y; S3 Y Y Y.
  10. Linear Instruction Pipelines. Assume the following instruction execution phases: Fetch (F); Decode (D); Operand Fetch (O); Execute (E); Write results (W)
  11. Pipeline Instruction Execution (each phase row starts one cycle after the row above it): F I1 I2 I3; D I1 I2 I3; O I1 I2 I3; E I1 I2 I3; W I1 I2 I3
  12. Dependencies: data dependency (an operand is not ready yet); instruction dependency (branching). Will that cause a problem?
  13. Data Dependency. I1: Add R1, R2, R3; I2: Sub R4, R1, R5. Cycles 1 2 3 4 5 6: F I1 I2; D I1 I2; O I1 I2; E I1 I2; W I1 I2
  14. Solutions: stall; forwarding; write and read in one cycle; ….
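
The data dependency on slide 13 and two of the listed solutions can be checked with a small sketch. It assumes the first register of each instruction is the destination (the slides do not say), that operands are read in the O phase and written in the W phase, and that the pipeline has no forwarding. The helper names are hypothetical.

```python
STAGES = ["F", "D", "O", "E", "W"]  # phases from slide 10


def parse(instr):
    """Split 'Add R1, R2, R3' into (destination, sources).

    Assumption: the first register is the destination.
    """
    _, operands = instr.split(None, 1)
    regs = [r.strip() for r in operands.split(",")]
    return regs[0], regs[1:]


def has_raw_dependency(producer, consumer):
    """True if the consumer reads the register the producer writes."""
    dest, _ = parse(producer)
    _, sources = parse(consumer)
    return dest in sources


def stalls_needed(distance, same_cycle_ok=False):
    """Stall cycles needed so the consumer's O phase sees the producer's W phase.

    distance: how many cycles after the producer the consumer is fetched.
    same_cycle_ok: True models 'write and read in one cycle'.
    """
    write_cycle = STAGES.index("W") + 1            # producer is fetched in cycle 1
    read_cycle = STAGES.index("O") + 1 + distance  # consumer's unstalled O cycle
    earliest_read = write_cycle if same_cycle_ok else write_cycle + 1
    return max(0, earliest_read - read_cycle)


if __name__ == "__main__":
    i1, i2 = "Add R1, R2, R3", "Sub R4, R1, R5"
    if has_raw_dependency(i1, i2):
        print("stall only:", stalls_needed(1))
        print("write and read in one cycle:", stalls_needed(1, same_cycle_ok=True))
```

Under these assumptions, I2 would read R1 in cycle 4 while I1 writes it in cycle 5, so plain stalling costs two cycles and allowing a write and a read in the same cycle costs one.
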
  15. Instruction Dependency. I1: Branch; I2: (not specified). Cycles 1 2 3 4 5 6: F I1 I2; D I1 I2; O I1 I2; E I1 I2; W I1 I2
  16. Solutions: stall; predict branch taken; predict branch not taken; ….
  17. Floating Point Multiplication. Inputs: (Mantissa1, Exponent1), (Mantissa2, Exponent2). Add the two exponents → Exponent-out. Multiply the two mantissas. Normalize the mantissa and adjust the exponent. Round the product mantissa to a single-length mantissa; you may need to adjust the exponent again.
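
The four steps on slide 17 can be sketched on integer mantissas. This is a minimal illustration, not a standards-compliant implementation: it assumes unsigned, normalized p-bit mantissas (value = mantissa × 2^exponent), an illustrative width of 8 bits, and round-half-up rounding, none of which is specified on the slides.

```python
P = 8  # assumed mantissa width in bits (illustrative, not from the slides)


def fp_multiply(m1, e1, m2, e2, p=P):
    """Multiply (m1 * 2**e1) by (m2 * 2**e2); return (mantissa, exponent).

    Assumes both mantissas are normalized p-bit unsigned integers.
    """
    # Step 1: add the two exponents.
    exp_out = e1 + e2
    # Step 2: multiply the two mantissas (double-length product).
    product = m1 * m2
    # Step 3: normalize, i.e. find how many low-order bits must be dropped
    # to get back to p bits, and adjust the exponent by that amount.
    extra = product.bit_length() - p
    if extra > 0:
        # Step 4: round to a single-length mantissa (round half up).
        product = (product + (1 << (extra - 1))) >> extra
        exp_out += extra
    # Rounding can carry out of the top bit; renormalize if so.
    if product.bit_length() > p:
        product >>= 1
        exp_out += 1
    return product, exp_out


if __name__ == "__main__":
    # 1.5 * 1.25 with 8-bit mantissas: 192 * 2**-7 and 160 * 2**-7.
    mantissa, exponent = fp_multiply(192, -7, 160, -7)
    print(mantissa, exponent, mantissa * 2.0 ** exponent)
```

The final carry check corresponds to the slide's remark that the exponent may need adjusting after rounding.
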
  18. Linear Pipeline for floating-point multiplication (figure: a four-stage view with Add Exponents, Multiply Mantissa, Normalize, Round; and a finer view with stage labels Add Exponents, Partial Products, Accumulator, Normalize, Round, Renormalize)
  19. Linear Pipeline for floating-point Addition (figure stage labels: Subtract Exponents, Partial Shift, Add Mantissa, Find Leading 1, Partial Shift, Round, Renormalize)
  20. Combined Adder and Multiplier (figure: stages labeled A to H, with stage labels Partial Products, Exponents Subtract / Add, Partial Shift, Add Mantissa, Find Leading 1, Partial Shift, Round, Renormalize)
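  21. Reservation Table for Multiply (rows are stages A to H, columns are time steps 1 to 7): A X; B X X; C X X; D X X; E X; F, G, H unused
  22. Reservation Table for Addition (rows are stages A to H, columns are time steps 1 to 9): A Y; B unused; C Y; D Y; E Y; F Y Y; G Y; H Y Y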
  23. Nonlinear Pipeline Design. Latency: the number of clock cycles between two initiations of a pipeline. Collision: a resource conflict. Forbidden latencies: latencies that cause collisions.
  24. Nonlinear Pipeline Design (cont.). Latency sequence: a sequence of permissible latencies between successive task initiations. Latency cycle: a latency sequence that repeats the same subsequence. Collision vector: C = (Cm, Cm-1, …, C2, C1), m <= n-1, where n = number of columns in the reservation table; Ci = 1 if latency i causes a collision, 0 otherwise.
  25. Mul-Mul Collision (launch after 1 cycle). Time steps 1 to 7; X marks the first multiply, Z the second: A X Z; B X X Z Z; C X X Z Z; D X Z X; E X Z; F, G, H unused
  26. Mul-Mul Collision (launch after 2 cycles). Time steps 1 to 7: A X Z; B X X Z Z; C X X Z Z; D X X Z; E X; F, G, H unused
  27. Mul-Mul Collision (launch after 3 cycles). Time steps 1 to 7: A X Z; B X X Z Z; C X X Z Z; D X X; E X; F, G, H unused
  28. Collision Vector for Multiply after Multiply. Forbidden latencies: 1, 2. Collision vector: 0 0 0 0 1 1 → 11. Maximum forbidden latency = 2 → m = 2.
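
Turning a set of forbidden latencies into a collision vector follows directly from the definition on slide 24. The sketch below writes the vector as a bit string, Cm first and C1 last; the function name `collision_vector` is illustrative.

```python
def collision_vector(forbidden):
    """Return the collision vector 'Cm ... C2 C1' as a bit string.

    forbidden: iterable of forbidden latencies (positive integers).
    m is the largest forbidden latency, so leading zeros are dropped.
    """
    forbidden = set(forbidden)
    if not forbidden:
        return ""
    m = max(forbidden)
    # Walk from latency m down to 1 so that C1 ends up rightmost.
    return "".join("1" if i in forbidden else "0" for i in range(m, 0, -1))


if __name__ == "__main__":
    print(collision_vector({1, 2}))        # multiply after multiply
    print(collision_vector({2, 4, 5, 7}))  # X after X (slide 35)
    print(collision_vector({2, 4}))        # Y after Y (slide 37)
```

For the multiply pipeline this gives 11, and the same helper reproduces the vectors quoted on slides 35 and 37.
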
  29. Example (figure: stages S1, S2, S3; functions X and Y)
  30. Reservation Tables for X & Y. X: S1 X X X; S2 X X; S3 X X X. Y: S1 Y Y; S2 Y; S3 Y Y Y.
  31. Reservation Tables for X & Y. X: S1 X X X; S2 X X; S3 X X X. Y: S1 Y Y; S2 Y; S3 Y Y Y.
  32. Forbidden Latencies: X after X; X after Y; Y after X; Y after Y
  33. X after X. Latency 2: S1 X1 X2 X1 X2 X1; S2 X1 X2 X1 X2; S3 X1 X2 X1 X2 X1. Latency 5: S1 X1 X2 X1 X1; S2 X1 X1 X2; S3 X1 X1 X1 X2.
  34. X after X. Latency 4: S1 X1 X2 X1 X1; S2 X1 X1 X2 X2; S3 X1 X1 X2 X1. Latency 7: S1 X1 X1 X2 X1; S2 X1 X1; S3 X1 X1 X1.
  35. Collision Vector (X after X). Forbidden latencies: 2, 4, 5, 7. Collision vector = 1011010
  36. Y after Y. Two overlays of a second Y on the first: S1 Y Y Y; S2 Y Y; S3 Y Y Y Y Y. And: S1 Y Y; S2 Y; S3 Y Y Y Y Y.
  37. Collision Vector (Y after Y). Forbidden latencies: 2, 4. Collision vector = 1010
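
Forbidden latencies can be read off a reservation table mechanically: a latency p is forbidden when some stage has two marks exactly p columns apart. The sketch below applies that rule. The column positions in `TABLE_X` and `TABLE_Y` are an assumption, since the transcript keeps only the number of marks per row; they were chosen to match those counts and they reproduce the forbidden latencies stated on slides 35 and 37.

```python
# ASSUMED column positions (1-based time steps); only the mark counts per
# row survive in the slides. These placements reproduce slides 35 and 37.
TABLE_X = {"S1": [1, 6, 8], "S2": [2, 4], "S3": [3, 5, 7]}
TABLE_Y = {"S1": [1, 5], "S2": [3], "S3": [2, 4, 6]}


def forbidden_latencies(first, second=None):
    """Latencies p at which `second`, started p cycles after `first`, collides.

    With one argument, computes the forbidden latencies of a function
    following itself (for example X after X).
    """
    if second is None:
        second = first
    forbidden = set()
    for stage, first_times in first.items():
        for t_first in first_times:
            for t_second in second.get(stage, []):
                # The later task uses the stage at t_second + p; it collides
                # with the earlier task's use at t_first when p = t_first - t_second.
                if t_first > t_second:
                    forbidden.add(t_first - t_second)
    return sorted(forbidden)


if __name__ == "__main__":
    print("X after X:", forbidden_latencies(TABLE_X))
    print("Y after Y:", forbidden_latencies(TABLE_Y))
    print("Y after X:", forbidden_latencies(TABLE_X, TABLE_Y))
    print("X after Y:", forbidden_latencies(TABLE_Y, TABLE_X))
```

The two-argument form covers the cross cases listed on slide 32; its results depend entirely on the assumed column positions.
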
  38. Exercise: find the collision vector. Rows are stages A to D, columns are time steps 1 to 7: A X X X; B X X; C X X; D X
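  39. State Diagram for X. States: 1011010 (initial), 1011011, 1111111. Edges from 1011010: 1* → 1111111; 3 → 1011011; 6 → 1011011; 8+ → 1011010. Edges from 1011011: 3* → 1011011; 6 → 1011011; 8+ → 1011010. Edges from 1111111: 8+ → 1011010. An asterisk marks the minimum latency out of a state.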
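
The state diagram can be generated from the initial collision vector alone. A sketch, assuming the usual construction: a latency p is permissible in a state when bit Cp is 0, and the next state is the current state shifted right by p positions, ORed with the initial collision vector. Latencies larger than m always lead back to the initial state and are stored under the key m + 1, which plays the role of the slide's "8+". The function name `state_diagram` is illustrative.

```python
def state_diagram(initial):
    """Map each reachable state to {latency: next_state}.

    initial: collision vector as a bit string 'Cm ... C1'.
    """
    m = len(initial)
    init_bits = int(initial, 2)
    diagram = {}
    pending = [initial]
    while pending:
        state = pending.pop()
        if state in diagram:
            continue
        bits = int(state, 2)
        edges = {}
        for latency in range(1, m + 1):
            if (bits >> (latency - 1)) & 1:
                continue  # bit C_latency is 1: this latency is forbidden here
            # Shift right by the latency, then OR with the initial vector.
            nxt = format((bits >> latency) | init_bits, f"0{m}b")
            edges[latency] = nxt
            pending.append(nxt)
        edges[m + 1] = initial  # any latency above m returns to the start
        diagram[state] = edges
    return diagram


if __name__ == "__main__":
    for state, edges in sorted(state_diagram("1011010").items()):
        print(state, edges)
```

For 1011010 this yields the three states shown on the slide, with the same outgoing latencies.
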
  40. Cycles. Simple cycles → each state appears only once: (3), (6), (8), (1, 8), (3, 8), and (6, 8). Greedy cycles → simple cycles whose edges are all made with minimum latencies from their respective starting states: (1, 8) and (3) → one of them gives the minimum average latency (MAL).
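
The greedy cycles can be found by always taking the minimum permissible latency from each state until a state repeats. The sketch below does this for every reachable state and then, following the slide's statement that one of the greedy cycles gives the MAL, picks the one with the smallest average latency. The shift-and-OR state transition is the usual construction and the helper names are hypothetical.

```python
from fractions import Fraction


def next_state(bits, latency, init_bits):
    # Shift right by the latency, then OR with the initial collision vector.
    return (bits >> latency) | init_bits


def min_latency(bits, m):
    """Smallest permissible latency in a state (m + 1 if bits C1..Cm are all 1)."""
    for latency in range(1, m + 1):
        if not (bits >> (latency - 1)) & 1:
            return latency
    return m + 1


def greedy_cycles(initial):
    """Return the distinct greedy cycles for a collision vector bit string."""
    m, init_bits = len(initial), int(initial, 2)
    # Collect every state reachable from the initial one.
    states, pending = set(), [init_bits]
    while pending:
        bits = pending.pop()
        if bits in states:
            continue
        states.add(bits)
        for latency in range(1, m + 2):
            if latency == m + 1 or not (bits >> (latency - 1)) & 1:
                pending.append(next_state(bits, latency, init_bits))
    cycles = set()
    for start in states:
        seen, path, bits = {}, [], start
        while bits not in seen:  # follow minimum-latency edges only
            seen[bits] = len(path)
            latency = min_latency(bits, m)
            path.append(latency)
            bits = next_state(bits, latency, init_bits)
        cycle = tuple(path[seen[bits]:])
        # Store one canonical rotation so (8, 1) and (1, 8) count once.
        cycles.add(min(cycle[i:] + cycle[:i] for i in range(len(cycle))))
    return sorted(cycles)


def average_latency(cycle):
    return Fraction(sum(cycle), len(cycle))


if __name__ == "__main__":
    found = greedy_cycles("1011010")
    best = min(found, key=average_latency)
    print(found, "best:", best, "average:", average_latency(best))
```

For the collision vector 1011010 this finds (1, 8) with average latency 9/2 and (3) with average latency 3, so the cycle (3) is the better of the two.
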
