Advanced computer architecture


  1. 1. CSE 8383 - Advanced Computer Architecture. Week-3, Week of Jan 26, 2004. engr.smu.edu/~rewini/8383
  2. 2. Contents: Linear Pipelines; Nonlinear Pipelines; Instruction Pipelines; Arithmetic Operations; Design of Multifunction Pipeline
  3. 3. Linear Pipeline. Processing stages are linearly connected and perform a fixed function. Synchronous Pipeline: clocked latches between Stage i and Stage i+1; equal delays in all stages. Asynchronous Pipeline: handshaking.
  4. 4. Latches (figure: stages S1, S2, S3 separated by latches L1, L2). Slowest stage determines delay. Equal delays -> clock period.
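As a rough illustration of the rule on slide 4, the sketch below derives a clock period from the slowest stage. The stage delays, the latch delay, and the function name `clock_period` are all assumptions made for this example; the slide gives no numbers.

```python
def clock_period(stage_delays, latch_delay=0.0):
    """Clock period of a synchronous pipeline: slowest stage plus latch delay."""
    return max(stage_delays) + latch_delay


# Hypothetical delays in ns for S1, S2, S3 (illustrative values only).
delays = [10.0, 12.0, 9.0]
print(clock_period(delays, latch_delay=1.0))  # 13.0
```

With these assumed values S2 is the slowest stage, so every stage is clocked at its pace.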
  5. 5. Reservation Table
       Time  1 2 3 4
       S1    X
       S2      X
       S3        X
       S4          X
  6. 6. 5 tasks on 4 stages
       Time  1 2 3 4 5 6 7 8
       S1    X X X X X
       S2      X X X X X
       S3        X X X X X
       S4          X X X X X
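The occupancy pattern for 5 tasks on 4 stages can be generated mechanically. The following is a minimal sketch, assuming each stage takes exactly one cycle and a new task enters every cycle; the helper names `occupancy`, `total_cycles`, and `render` are hypothetical.

```python
def occupancy(num_tasks, num_stages):
    """Return {stage: [busy cycles]} for a linear pipeline.

    Task j (0-based) occupies stage i (0-based) in cycle i + j + 1.
    """
    return {
        f"S{i + 1}": [i + j + 1 for j in range(num_tasks)]
        for i in range(num_stages)
    }


def total_cycles(num_tasks, num_stages):
    """Cycles needed to push num_tasks through num_stages: k + n - 1."""
    return num_stages + num_tasks - 1


def render(table, width):
    """Draw one row per stage, 'X' for a busy cycle and '.' for an idle one."""
    lines = []
    for stage, cycles in table.items():
        row = " ".join("X" if c in cycles else "." for c in range(1, width + 1))
        lines.append(f"{stage} {row}")
    return "\n".join(lines)


print(render(occupancy(5, 4), total_cycles(5, 4)))
```

Under these assumptions the pipeline finishes all 5 tasks in 4 + 5 - 1 = 8 cycles.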
  7. 7. Nonlinear Pipelines: variable functions; feed-forward connections; feedback connections
  8. 8. 3 stages & 2 functions (figure: stages S1, S2, S3; functions X and Y)
  9. 9. Reservation Tables for X & Y (marks per row; column positions not preserved)
       Function X   S1: X X X   S2: X X   S3: X X X
       Function Y   S1: Y Y     S2: Y     S3: Y Y Y
  10. 10. Linear Instruction Pipelines. Assume the following instruction execution phases: Fetch (F), Decode (D), Operand Fetch (O), Execute (E), Write results (W)
  11. 11. Pipeline Instruction Execution
        F  I1 I2 I3
        D     I1 I2 I3
        O        I1 I2 I3
        E           I1 I2 I3
        W              I1 I2 I3
  12. 12. Dependencies. Data Dependency (operand is not ready yet). Instruction Dependency (branching). Will that cause a problem?
  13. 13. Data Dependency
        I1 -- Add R1, R2, R3
        I2 -- Sub R4, R1, R5
           1  2  3  4  5  6
        F  I1 I2
        D     I1 I2
        O        I1 I2
        E           I1 I2
        W              I1 I2
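The timing problem on slide 13 can be checked with a small sketch. It assumes the first register named in an instruction is the destination, that operands are read in the O stage and results written in the W stage, and that there is no forwarding; the function names are hypothetical.

```python
STAGES = ["F", "D", "O", "E", "W"]


def parse(instr):
    """Split 'Add R1, R2, R3' into (op, dest, sources); dest-first is an assumption."""
    op, rest = instr.split(None, 1)
    regs = [r.strip() for r in rest.split(",")]
    return op, regs[0], regs[1:]


def raw_stalls(first, second, same_cycle_rw=False):
    """Stall cycles needed by `second` when it is issued one cycle after `first`."""
    _, dest, _ = parse(first)
    _, _, sources = parse(second)
    if dest not in sources:
        return 0                                  # no read-after-write dependency
    write_cycle = 1 + STAGES.index("W")           # first starts in cycle 1 -> W in 5
    read_cycle = 2 + STAGES.index("O")            # second starts in cycle 2 -> O in 4
    earliest_read = write_cycle if same_cycle_rw else write_cycle + 1
    return max(0, earliest_read - read_cycle)


print(raw_stalls("Add R1, R2, R3", "Sub R4, R1, R5"))                      # 2
print(raw_stalls("Add R1, R2, R3", "Sub R4, R1, R5", same_cycle_rw=True))  # 1
```

In this model I2 wants R1 in cycle 4 but I1 writes it in cycle 5, so I2 waits two cycles, or one if a register can be written and read in the same cycle.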
  14. 14. Solutions: stall; forwarding; write and read in one cycle; ….
  15. 15. Instruction Dependency
        I1 -- Branch
        I2 --
           1  2  3  4  5  6
        F  I1 I2
        D     I1 I2
        O        I1 I2
        E           I1 I2
        W              I1 I2
  16. 16. Solutions: stall; predict branch taken; predict branch not taken; ….
  17. 17. Floating Point Multiplication. Inputs: (Mantissa1, Exponent1), (Mantissa2, Exponent2). Add the two exponents -> Exponent-out. Multiply the 2 mantissas. Normalize the mantissa and adjust the exponent. Round the product mantissa to a single-length mantissa; the exponent may need adjusting again.
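The four steps on slide 17 can be sketched on a toy format. The 8-bit mantissa width, the convention that a mantissa `m` stands for the fraction `m / 2**p` with its top bit set, round-half-up, and the names `fp_multiply` and `to_float` are all assumptions for illustration, not the lecture's format.

```python
P = 8  # assumed mantissa width in bits (illustrative)


def fp_multiply(m1, e1, m2, e2, p=P):
    """Multiply two toy floats; each value is (m / 2**p) * 2**e with m normalized."""
    low, high = 1 << (p - 1), 1 << p
    assert low <= m1 < high and low <= m2 < high, "mantissas must be normalized"
    exp = e1 + e2                        # 1. add the two exponents
    prod = m1 * m2                       # 2. multiply mantissas (2p-bit product)
    if prod < (1 << (2 * p - 1)):        # 3. normalize: at most one left shift
        prod <<= 1
        exp -= 1
    mant = (prod + (1 << (p - 1))) >> p  # 4. round to p bits (half up)
    if mant == high:                     #    rounding overflowed: adjust exponent
        mant >>= 1
        exp += 1
    return mant, exp


def to_float(m, e, p=P):
    """Decode the toy representation into a Python float."""
    return m / (1 << p) * 2.0 ** e


m, e = fp_multiply(0b11000000, 2, 0b10100000, 3)   # 3.0 * 5.0
print(m, e, to_float(m, e))                        # 240 4 15.0
```

The second exponent adjustment in step 4 only triggers when rounding carries out of the top bit, which is the case the slide's last sentence alludes to.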
  18. 18. Linear Pipeline for floating-point multiplication (figure). Top level: Add Exponents -> Multiply Mantissa -> Normalize -> Round. Detailed stage labels: Add Exponents, Partial Products, Accumulator, Normalize, Round, Renormalize.
  19. 19. Linear Pipeline for floating-point addition (figure). Stage labels: Subtract Exponents, Partial Shift, Add Mantissa, Find Leading 1, Partial Shift, Round, Renormalize.
  20. 20. Combined Adder and Multiplier (figure). Stages are labeled A to H. Stage labels in the figure: Partial Products, Exponents Subtract / ADD, Partial Shift, Add Mantissa, Find Leading 1, Partial Shift, Round, Renormalize.
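  21. 21. Reservation Table for Multiply (cycles 1 to 7; marks per row, column positions not preserved)
        A: X
        B: X X
        C: X X
        D: X X
        E: X
        F, G, H: no marks
  22. 22. Reservation Table for Addition (cycles 1 to 9; marks per row, column positions not preserved)
        A: Y
        B: no marks
        C: Y
        D: Y
        E: Y
        F: Y Y
        G: Y
        H: Y Y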
  23. 23. Nonlinear Pipeline Design. Latency: the number of clock cycles between two initiations of a pipeline. Collision: a resource conflict. Forbidden Latencies: latencies that cause collisions.
  24. 24. Nonlinear Pipeline Design (cont.). Latency Sequence: a sequence of permissible latencies between successive task initiations. Latency Cycle: a latency sequence that repeats the same subsequence. Collision vector: C = (C_m, C_m-1, …, C_2, C_1), m <= n-1, where n = number of columns in the reservation table and C_i = 1 if latency i causes a collision, 0 otherwise.
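The collision-vector definition on slide 24 translates directly into a few lines. This is a minimal sketch; the function name `collision_vector` is hypothetical, and the latency sets in the usage lines are the ones quoted elsewhere in the deck.

```python
def collision_vector(forbidden):
    """Return the bit string C_m ... C_1, where m is the largest forbidden latency."""
    if not forbidden:
        return ""                      # no forbidden latency, no collision bits
    m = max(forbidden)
    # Leftmost character is C_m, rightmost is C_1.
    return "".join("1" if i in forbidden else "0" for i in range(m, 0, -1))


print(collision_vector({1, 2}))        # 11
print(collision_vector({2, 4, 5, 7}))  # 1011010
print(collision_vector({2, 4}))        # 1010
```

Reading the result right to left gives latencies 1, 2, 3, and so on, so a 0 marks a latency at which a new task may be started.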
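  25. 25. Mul - Mul Collision (launch after 1 cycle); cycles 1 to 7, X = first multiply, Z = second multiply (marks per row, column positions not preserved)
        A: X Z
        B: X X Z Z
        C: X X Z Z
        D: X Z X
        E: X Z
        F, G, H: no marks
  26. 26. Mul - Mul Collision (launch after 2 cycles); cycles 1 to 7 (marks per row, column positions not preserved)
        A: X Z
        B: X X Z Z
        C: X X Z Z
        D: X X Z
        E: X
        F, G, H: no marks
  27. 27. Mul - Mul Collision (launch after 3 cycles); cycles 1 to 7 (marks per row, column positions not preserved)
        A: X Z
        B: X X Z Z
        C: X X Z Z
        D: X X
        E: X
        F, G, H: no marks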
  28. 28. Collision Vector for Multiply after Multiply. Forbidden latencies: 1, 2. Collision vector: 0 0 0 0 1 1 -> 11. Maximum forbidden latency = 2 -> m = 2.
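  29. 29. Example (figure: stages S1, S2, S3; functions X and Y)
  30. 30. Reservation Tables for X & Y (marks per row; column positions not preserved)
        Function X   S1: X X X   S2: X X   S3: X X X
        Function Y   S1: Y Y     S2: Y     S3: Y Y Y
  31. 31. Reservation Tables for X & Y (marks per row; column positions not preserved)
        Function X   S1: X X X   S2: X X   S3: X X X
        Function Y   S1: Y Y     S2: Y     S3: Y Y Y
  32. 32. Forbidden Latencies: X after X; X after Y; Y after X; Y after Y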
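  33. 33. X after X (X1 = first initiation, X2 = second; marks per row, column positions not preserved)
        Latency 2   S1: X1 X2 X1 X2 X1   S2: X1 X2 X1 X2   S3: X1 X2 X1 X2 X1
        Latency 5   S1: X1 X2 X1 X1      S2: X1 X1 X2      S3: X1 X1 X1 X2
  34. 34. X after X (marks per row, column positions not preserved)
        Latency 4   S1: X1 X2 X1 X1      S2: X1 X1 X2 X2   S3: X1 X1 X2 X1
        Latency 7   S1: X1 X1 X2 X1      S2: X1 X1         S3: X1 X1 X1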
  35. 35. Collision Vector (X after X). Forbidden latencies: 2, 4, 5, 7. Collision vector = 1011010.
  36. 36. Y after Y (marks per row, column positions not preserved)
        First table    S1: Y Y Y   S2: Y Y   S3: Y Y Y Y Y
        Second table   S1: Y Y     S2: Y     S3: Y Y Y Y Y
  37. 37. Collision Vector (Y after Y). Forbidden latencies: 2, 4. Collision vector = 1010.
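Forbidden latencies for the same function are the distances between any two marks in one row of its reservation table. The sketch below applies that rule. The mark positions in `X_TABLE` and `Y_TABLE` are an assumption: the transcript keeps only the number of marks per row, so the columns were chosen to match those counts and to reproduce the forbidden latencies stated on slides 35 and 37.

```python
from itertools import combinations

# ASSUMED column positions (cycle numbers) for each stage; not from the transcript.
X_TABLE = {"S1": {1, 6, 8}, "S2": {2, 4}, "S3": {3, 5, 7}}
Y_TABLE = {"S1": {1, 5}, "S2": {3}, "S3": {2, 4, 6}}


def forbidden_latencies(table):
    """Collect the distance between every pair of marks that share a row."""
    return {
        abs(a - b)
        for cycles in table.values()
        for a, b in combinations(cycles, 2)
    }


print(sorted(forbidden_latencies(X_TABLE)))  # [2, 4, 5, 7]
print(sorted(forbidden_latencies(Y_TABLE)))  # [2, 4]
```

Any other placement of the marks with the same row-wise distances would give the same result, so the output supports the stated latencies without confirming the exact columns.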
  38. 38. Exercise - Find the collision vector (cycles 1 to 7; marks per row, column positions not preserved)
        A: X X X
        B: X X
        C: X X
        D: X
  39. 39. State Diagram for X (figure). States: 1011010 (initial), 1011011, 1111111. Edge labels shown: 1*, 3, 3*, 6, 6, and 8+ on three edges.
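A state diagram like the one on slide 39 can be derived from the collision vector with the usual shift-and-OR rule: for a permissible latency p, shift the current state right by p bits and OR it with the initial collision vector. The sketch below is one way to do that; the function names are hypothetical, and the integer key m + 1 stands in for the "8+" edges (any latency larger than m returns to the initial state).

```python
def next_state(state, latency, initial):
    """Shift `state` right by `latency` bits and OR with the initial vector."""
    m = len(initial)
    shifted = int(state, 2) >> latency
    return format(shifted | int(initial, 2), f"0{m}b")


def state_diagram(initial):
    """Return {state: {latency: next_state}} for every reachable state."""
    m = len(initial)
    diagram, todo = {}, [initial]
    while todo:
        state = todo.pop()
        if state in diagram:
            continue
        edges = {}
        for latency in range(1, m + 1):
            # Bit C_latency sits at string index m - latency; 0 means permissible.
            if state[m - latency] == "0":
                edges[latency] = next_state(state, latency, initial)
        edges[m + 1] = initial          # stands for "m+1 or more" (8+ here)
        diagram[state] = edges
        todo.extend(edges.values())
    return diagram


for state, edges in sorted(state_diagram("1011010").items()):
    print(state, edges)
```

For the vector 1011010 this yields the same three states and the same multiset of edge labels (1, 3, 3, 6, 6, and three 8+ edges) that survive in the slide text.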
  40. 40. Cycles. Simple cycles -> each state appears only once: (3), (6), (8), (1, 8), (3, 8), and (6, 8). Greedy cycles -> simple cycles whose edges are all made with minimum latencies from their respective starting states: (1, 8), (3) -> one of them is MAL.
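The greedy cycles can be found by always taking the smallest permissible latency from each state until a state repeats. The sketch below does this for the collision vector 1011010 and reports the smallest average latency among the greedy cycles. The function names are hypothetical, the latency 8 stands for the "8+" edge, and reading MAL as minimum average latency is an assumption, since the slide does not expand the abbreviation.

```python
def successors(state, initial):
    """Map each permissible latency from `state` to the state it leads to."""
    m, base = len(initial), int(initial, 2)
    edges = {
        lat: format((int(state, 2) >> lat) | base, f"0{m}b")
        for lat in range(1, m + 1)
        if state[m - lat] == "0"
    }
    edges[m + 1] = initial               # latency m+1 or more resets the pipeline
    return edges


def reachable(initial):
    """All states reachable from the initial collision vector."""
    seen, todo = set(), [initial]
    while todo:
        state = todo.pop()
        if state not in seen:
            seen.add(state)
            todo.extend(successors(state, initial).values())
    return seen


def greedy_cycle(start, initial):
    """Follow minimum-latency edges from `start`; return the repeating latencies."""
    path, latencies, state = [], [], start
    while state not in path:
        path.append(state)
        edges = successors(state, initial)
        lat = min(edges)
        latencies.append(lat)
        state = edges[lat]
    cycle = latencies[path.index(state):]
    # Canonical rotation so that (8, 1) and (1, 8) count as the same cycle.
    return min(tuple(cycle[i:] + cycle[:i]) for i in range(len(cycle)))


def greedy_cycles(initial):
    return {greedy_cycle(s, initial) for s in reachable(initial)}


cycles = greedy_cycles("1011010")
mal = min(sum(c) / len(c) for c in cycles)
print(sorted(cycles), mal)               # [(1, 8), (3,)] 3.0
```

Under these assumptions the cycle (1, 8) averages 4.5 cycles per initiation and (3) averages 3, so (3) is the one that achieves the minimum.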