
Lec Jan29 2009



Published in: Technology


  1. CSL718: Superscalar Processors, Issue and Despatch. 29th Jan, 2009. Anshul Kumar, CSE IITD
  2. Early proposals/prototypes, 1982 to 1989 (issue rate in parentheses)
     • Term "superscalar" coined
     • IBM: Cheetah, America project (4)
     • DEC: Multititan project (2)
     • Stanford U: Match (2), Torch (4)
     • Kyushu U: SIMP (4), DSNS (4)
  3. Commercial superscalars: RISCs (issue rate in parentheses)
     • Intel: 960KA/KB ⇒ 960CA (3), 1989
     • IBM: Power 1 RS/6000 (4), 1990
     • HP: PA7000 ⇒ PA7100 (2), 1992
     • SUN: SPARC ⇒ SuperSparc (3), 1992
     • DEC: Alpha 21064 (2), 1992
     • Motorola: MC88100 ⇒ MC88110 (2), 1993
     • Motorola: PowerPC 601/603 (3), 1993
     • MIPS: R4000 ⇒ R8000 (4), 1994
  4. Commercial superscalars: CISCs (issue rate in parentheses)
     • Intel: 80486 ⇒ Pentium (2), 1993
     • Motorola: MC68040 ⇒ MC68060 (2), 1993
     • Gmicro: Gmicro/100p ⇒ Gmicro 500 (2), 1993
     • AMD: K5 (2), 4 RISC instr, 1995
     • CYRIX: M1 (2), 1995
  5. Tasks of superscalar processing
     • Parallel decoding and issue
     • Parallel instruction execution
     • Preserving the sequential consistency of instruction execution and exception processing
  6. Superscalar decode and issue
     [Figure: scalar vs. superscalar front end. In both, instructions flow from the I-cache through the instruction buffer to the decode & issue unit. Pipeline stage labels: IF, D/I and IF, D, I.]
  7. Parallel decoding
     • Fetch multiple instructions into the instruction buffer
     • Decode multiple instructions in parallel (the instruction window)
     • Possibly check dependencies among these instructions, as well as with the instructions already under execution
  8. Reducing decoding time: pre-decoding
     • Do partial decoding while instructions are being loaded into the I-cache
     • Decoded information is appended to the instruction
     • This includes instruction class, resources required, etc.
     [Figure: the second level cache or main memory delivers N bits/cycle to the pre-decode unit, which delivers N + n bits/cycle to the I-cache.]
  9. Pre-decoding examples (processor: no. of predecode bits)
     • PA 7200 (1995): 5
     • PA 8000 (1996): 5
     • PowerPC 620 (1996): 7
     • UltraSparc (1995): 4
     • HAL PM1 (1995): 4
     • AMD K5 (1995): 5 (per byte)
     • R 10000 (1996): 4
  10. Blocking during issue
      • Decode and issue instructions directly to EUs
      • Instructions may be blocked due to data dependency
      [Figure: instruction buffer (issue window), then decode, check & issue, then the EUs.]
  11. Non-blocking issue
      • Decode and issue to buffers (reservation stations)
      • From the buffers, dispatch to EUs
      [Figure: instruction buffer, then decode & issue, then one reservation station per EU, each with dependency checking/dispatch in front of its EU.]
  12. Handling of issue blockages
      • Preserving issue order: in-order or out of order
      • Alignment of instruction issue: aligned or unaligned
  13. Issue order
      • Issue in strict program order: with instructions e d c b a in the issue window, only a is issued
      • Out of order issue: with the same issue window, a and c are issued
      • Example: MC 88110, PowerPC 601
      [Figure legend: independent instruction, dependent instruction, issued instruction.]
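The difference between the two issue orders on the slide above can be sketched in a few lines of Python. This is a minimal illustration, not a model of any real processor: the function name `issue_cycle` and the choice of `b`, `d` and `e` as the dependent (blocked) instructions are assumptions made so that the result matches the slide.

```python
def issue_cycle(window, blocked, out_of_order):
    """Return the instructions issued in one cycle.

    window       -- instructions in program order (oldest first)
    blocked      -- instructions that cannot issue yet (assumed dependent)
    out_of_order -- if True, later independent instructions may overtake
    """
    issued = []
    for instr in window:
        if instr in blocked:
            if out_of_order:
                continue  # skip the blocked instruction, keep scanning
            break         # in-order: a blocked instruction stalls all later ones
        issued.append(instr)
    return issued


window = ["a", "b", "c", "d", "e"]
blocked = {"b", "d", "e"}  # hypothetical dependencies, chosen to match the slide

issued_in_order = issue_cycle(window, blocked, out_of_order=False)  # ['a']
issued_ooo = issue_cycle(window, blocked, out_of_order=True)        # ['a', 'c']
```

The sketch treats "blocked" as a given set; a real issue unit would derive it from dependency checks against earlier instructions.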
  14. Alignment
      • Aligned issue: fixed window; the next window is loaded only after the current one is exhausted
      • Unaligned issue: gliding window

      Cycle   Checked            Issued (aligned)   Issued (unaligned)
      1       h g f e d c b a    a                  a
      2       h g f e d c b      c b                c b
      3       h g f e d          d                  f e d
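The alignment example above can be reproduced with a small simulation. This is a sketch under stated assumptions: issue is in-order, the window holds 4 instructions, the issue rate is not limited separately from the window size, and the per-instruction `ready_cycle` values are hypothetical numbers chosen so that the first three cycles match the slide. The function name `simulate_issue` is also an illustration, not taken from the lecture.

```python
def simulate_issue(ready_cycle, window_size, aligned):
    """Simulate in-order issue and return the instructions issued per cycle.

    ready_cycle -- dict mapping instruction -> first cycle in which it may issue
                   (insertion order is program order)
    aligned     -- True: fixed window, refilled only when completely issued
                   False: gliding window starting at the oldest unissued instr
    """
    order = list(ready_cycle)
    schedule = []
    next_idx = 0       # index of the oldest unissued instruction
    window_start = 0   # start of the fixed window (aligned case only)
    cycle = 1
    while next_idx < len(order):
        if aligned:
            # Move to the next fixed window only once the current one is empty.
            if next_idx >= window_start + window_size:
                window_start += window_size
            window_end = window_start + window_size
        else:
            # The gliding window always starts at the oldest unissued instruction.
            window_end = next_idx + window_size
        window_end = min(window_end, len(order))

        issued = []
        while next_idx < window_end and ready_cycle[order[next_idx]] <= cycle:
            issued.append(order[next_idx])
            next_idx += 1
        schedule.append(issued)
        cycle += 1
    return schedule


# Hypothetical readiness: b and c wait for cycle 2, d to f for cycle 3, g and h for cycle 4.
ready_cycle = {"a": 1, "b": 2, "c": 2, "d": 3, "e": 3, "f": 3, "g": 4, "h": 4}

aligned_schedule = simulate_issue(ready_cycle, 4, aligned=True)
# [['a'], ['b', 'c'], ['d'], ['e', 'f', 'g', 'h']]
unaligned_schedule = simulate_issue(ready_cycle, 4, aligned=False)
# [['a'], ['b', 'c'], ['d', 'e', 'f'], ['g', 'h']]
```

In cycle 3 the aligned scheme can issue only `d`, because `e` and `f` lie outside the fixed window, while the gliding window already covers them.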
  15. Design space in instruction issue
      • Issue rate (2-6)
      • Handling of issue blockages
      • Use of RSs: blocking or non-blocking
      • Coping with false data dependencies: no or register renaming
      • Coping with unresolved control dependencies: wait or speculative
  16. Frequently used issue policies in scalar processors
      • Traditional scalar issue: i386, MC68030, R3000, Sparc
      • Traditional scalar issue with RSs: CDC 6600
      • Traditional scalar issue with RSs and renaming: IBM 360/91
      • Traditional scalar issue with spec. execution: i486, MC68040, R4000, MicroSparc
  17. Frequently used issue policies in superscalar processors (speculative execution in all)
      • Straightforward superscalar issue (aligned or unaligned)
      • Straightforward superscalar issue with RSs
      • Straightforward superscalar issue with renaming
      • Advanced superscalar issue (renaming + RSs)
      Processors named: R10000, PentiumPro, MC68060, MC88110, Pentium, PowerPC602, PowerPC601, PA7200, R8000, PA8000, UltraSparc, PA7100, Sparc64, SuperSparc, Am29000, Alpha21164, K5
  18. Design space of reservation stations
      • Scope: partial or full
      • Layout of reservation stations
      • Operand fetch policy
      • Instruction dispatch scheme
  19. Layout of reservation stations
      • Type: stand alone (RS), as individual, group or central RSs; or combined with renaming and reordering
      • Number of buffer entries: individual 2-4, group 6-16, central 20, combined total 15-40
      • Number of read and write ports: depends on the no. of EUs connected
  20. Reservation stations (RS)
      [Figure: three layouts side by side. Individual RSs: one RS in front of each EU. Group RSs: one RS in front of a group of EUs. Central RS: a single RS in front of all EUs.]
  21. Operand fetch policies
      • Issue bound fetch
      • Dispatch bound fetch
  22. Issue bound operand fetch (with single register file)
      [Figure: instructions pass through decode/issue; operand data is read from the RF at issue and held with the instruction in the RSs, which feed the EUs.]
  23. Dispatch bound operand fetch (with single register file)
      [Figure: decode/issue places instructions in the RSs; operand data is read from the RF at dispatch, between the RSs and the EUs.]
  24. Issue bound operand fetch (with multiple register files)
      [Figure: as for the single register file case, but with two RFs between decode/issue and the RSs.]
  25. Dispatch bound operand fetch (with multiple register files)
      [Figure: as for the single register file case, but with two RFs between the RSs and the EUs.]
  26. Updating RFs and RSs
      [Figure: decode/issue, two RFs, four RSs and four EUs, showing the instruction and data paths used to update the RFs and RSs.]
  27. Instruction dispatch scheme
      • Dispatch policy
      • Dispatch rate: single instr/cycle (individual RS) or multiple instr/cycle (group or central RS)
      • Checking operand availability
      • Treatment of empty RS
  28. Dispatch policy
      • Selection rule: rule for identifying instructions which are ready for execution (data dependency check)
      • Arbitration rule: rule for choosing one out of several ready instructions (earlier instruction has priority)
      • Dispatch order
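The selection and arbitration rules above can be sketched as two small functions over the entries of one reservation station. The `Entry` fields and the function names are hypothetical, introduced only for illustration; the sketch assumes each entry already knows whether its source operands are available.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    seq: int          # position in program order; smaller means earlier
    opcode: str
    src_ready: tuple  # one bool per source operand


def select_ready(rs):
    """Selection rule: entries whose source operands are all available."""
    return [e for e in rs if all(e.src_ready)]


def arbitrate(ready):
    """Arbitration rule: among ready entries, the earliest instruction wins."""
    return min(ready, key=lambda e: e.seq) if ready else None


rs = [
    Entry(3, "add", (True, True)),
    Entry(1, "mul", (True, False)),  # still waiting for one operand
    Entry(2, "sub", (True, True)),
]
chosen = arbitrate(select_ready(rs))  # the "sub" entry with seq == 2
```

Because the selection step considers every entry, this corresponds to out of order dispatch; restricting `select_ready` to the oldest entry would give in-order dispatch.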
  29. Dispatch order
      • In-order
      • Partially out of order
      • Out of order
      [Figure: which RS entries are checked for dispatch under each option.]
  30. Checking availability of operands
      • Direct check of score-board bits (usual for dispatch bound operand fetch): control flow approach
      • Check of explicit status bits in RS (usual for issue bound operand fetch): data flow approach
      (control flow / data flow: Flynn's terminology)
  31. Score-board
      • Introduced with CDC6600
      [Figure: a data status bit kept alongside each entry of the register file.]
  32. Checking in dispatch bound fetch
      • Issue: the decoded instruction (OC, Rs1, Rs2, Rd) enters the reservation station; reset the V bit of Rd in the register file
      • Dispatch: check the V bits of the sources; read the operand values Os1, Os2 from the register file and send them with OC to the EU
      • Completion: the EU returns (result, Rd); update Rd and set its V bit
      (OC: opcode; Os: operand value)
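A minimal Python sketch of this dispatch bound scheme follows. All class and function names are hypothetical. The sketch assumes that registers have been renamed, so that no two in-flight instructions write the same register and no instruction overwrites a register that an earlier waiting instruction still has to read; without that assumption the single V bit per register is not sufficient.

```python
class RegisterFile:
    def __init__(self, n):
        self.value = [0] * n
        self.valid = [True] * n  # score-board: one V bit per register


def issue_dispatch_bound(rf, rs, opcode, rs1, rs2, rd):
    """Issue: shelve register numbers only and reset the V bit of Rd."""
    rs.append((opcode, rs1, rs2, rd))
    rf.valid[rd] = False


def try_dispatch(rf, rs):
    """Dispatch the oldest entry whose source V bits are set.

    Operand values are fetched from the register file here, at dispatch time.
    Returns (opcode, Os1, Os2, Rd), or None if no entry is ready.
    """
    for i, (opcode, rs1, rs2, rd) in enumerate(rs):
        if rf.valid[rs1] and rf.valid[rs2]:
            del rs[i]
            return opcode, rf.value[rs1], rf.value[rs2], rd
    return None


def write_back(rf, rd, result):
    """Completion: update Rd and set its V bit."""
    rf.value[rd] = result
    rf.valid[rd] = True


rf = RegisterFile(8)
rf.value[1], rf.value[2] = 5, 7
rs = []
issue_dispatch_bound(rf, rs, "add", 1, 2, 3)  # r3 = r1 + r2
issue_dispatch_bound(rf, rs, "mul", 3, 1, 4)  # r4 = r3 * r1, depends on r3

first = try_dispatch(rf, rs)    # ('add', 5, 7, 3)
stalled = try_dispatch(rf, rs)  # None: V bit of r3 is still reset
write_back(rf, 3, 12)
second = try_dispatch(rf, rs)   # ('mul', 12, 5, 4)
```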
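  33. Checking in issue bound fetch
      • Issue: for the decoded instruction (Rs1, Rs2, Rd), read the operand values Os1, Os2 from the register file and reset the V bit of Rd
      • Reservation station entry: OC, Os1/Is1, Vs1, Os2/Is2, Vs2, Rd (each source field holds an operand value Os or, if the value is not yet available, an identifier Is)
      • Dispatch: check Vs1, Vs2; send OC, Os1, Os2, Rd to the EU
      • Completion: the EU returns (result, Rd); update Rd and set its V bit in the register file; associatively update Is1, Is2 matching Rd and set the Vs bits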
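The issue bound scheme can be sketched in the same style. Here operands are read at issue time, and each reservation station entry carries its own status bits. All names are hypothetical, and the sketch assumes that no two in-flight instructions write the same register (for example because registers are renamed), so a register number is enough to identify the producer of a missing operand.

```python
from dataclasses import dataclass


class RegisterFile:
    def __init__(self, n):
        self.value = [0] * n
        self.valid = [True] * n  # V bit per register


@dataclass
class Operand:
    valid: bool   # Vs bit
    payload: int  # operand value (Os) if valid, else producer register id (Is)


@dataclass
class RSEntry:
    opcode: str
    src1: Operand
    src2: Operand
    rd: int


def read_operand(rf, reg):
    """Fetch the value if the V bit is set, otherwise record the register id."""
    if rf.valid[reg]:
        return Operand(True, rf.value[reg])
    return Operand(False, reg)


def issue_issue_bound(rf, rs, opcode, rs1, rs2, rd):
    """Issue: fetch operands (or identifiers) now, then reset the V bit of Rd."""
    entry = RSEntry(opcode, read_operand(rf, rs1), read_operand(rf, rs2), rd)
    rf.valid[rd] = False
    rs.append(entry)


def try_dispatch(rs):
    """Dispatch the oldest entry whose Vs1 and Vs2 bits are both set."""
    for i, e in enumerate(rs):
        if e.src1.valid and e.src2.valid:
            del rs[i]
            return e.opcode, e.src1.payload, e.src2.payload, e.rd
    return None


def write_back(rf, rs, rd, result):
    """Completion: update the RF and associatively update waiting RS entries."""
    rf.value[rd] = result
    rf.valid[rd] = True
    for e in rs:
        for op in (e.src1, e.src2):
            if not op.valid and op.payload == rd:
                op.valid, op.payload = True, result


rf = RegisterFile(8)
rf.value[1], rf.value[2] = 5, 7
rs = []
issue_issue_bound(rf, rs, "add", 1, 2, 3)  # r3 = r1 + r2
issue_issue_bound(rf, rs, "mul", 3, 1, 4)  # r4 = r3 * r1, r3 not yet valid

first = try_dispatch(rs)    # ('add', 5, 7, 3)
stalled = try_dispatch(rs)  # None: Vs1 of the mul entry is not set
write_back(rf, rs, 3, 12)   # forwards 12 into the waiting mul entry
second = try_dispatch(rs)   # ('mul', 12, 5, 4)
```

Unlike the dispatch bound sketch, `try_dispatch` never touches the register file: everything it needs is already in the reservation station entry.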
  34. Treatment of an empty RS
      • Straightforward approach: at least one cycle stay in RS
      • Bypassing RS if empty
      Processors named: Sparc64, Nx586, PowerPC 604
  35. Approaches in dispatching
      • Straightforward: in order, single instr/cycle, individual RSs (Power1, PPC603, Nx586, Am29000)
      • Enhanced: partially out of order, single instr/cycle, individual RSs (Power2, PPC604, 620)
      • Advanced: out of order, multiple instr/cycle, group/central RSs (PM1, PentiumPro, PA8000, R10000)
  36. Reference
      1. D. Sima, T. Fountain, P. Kacsuk, "Advanced Computer Architectures: A Design Space Approach", Addison Wesley, 1997.