Lec Jan15 2009
Published in: Technology, Business

Transcript

  • 1. CSL718: Pipelined Processors. Pipeline Timings. 15th Jan, 2009. Anshul Kumar, CSE IITD
  • 2. Pipelined Processors. Parallel architectures: function-parallel and data-parallel. Function-parallel: instr level (ILP), thread level, process level. ILP: pipelined processors, VLIWs, superscalar processors. Intel's terminology: intra ILP, inter ILP.
  • 3. Ideal Pipelining. [Figure: instruction time Tinst divided into S stages.]
  • 4. Determining Clock Period. [Figure: Reg, Comb, Reg driven by a common Clock; propagation delay P, clock period Δt.] Δt ≥ P; Δt = Pmax, where P = propagation delay and Pmax = max propagation delay.
  • 5. Ideal Pipelining. Instruction time Tinst, S stages. Pmax = Tinst / S; Δt = Tinst / S; effective CPI = 1; effective time per inst Teff = CPI * Δt = 1 * Tinst / S.
  • 6. Pipelining with hazards. Instruction time Tinst, S stages. Frequency of interruptions = b; Δt = Tinst / S; CPI = 1 + (S - 1) * b; Teff = (1 + (S - 1) * b) * Tinst / S.
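The hazard model of slides 5 and 6 can be tried out numerically. The following is a minimal sketch, not part of the lecture; the function names `cpi_with_hazards` and `teff_with_hazards` are made up for illustration, and the sample values (Tinst = 10, b = 0.1) are borrowed from the plot on slide 7.

```python
def cpi_with_hazards(stages: int, b: float) -> float:
    """CPI = 1 + (S - 1) * b, where b is the frequency of interruptions."""
    return 1.0 + (stages - 1) * b


def teff_with_hazards(t_inst: float, stages: int, b: float) -> float:
    """Teff = CPI * Δt with Δt = Tinst / S (no clocking overhead yet)."""
    return cpi_with_hazards(stages, b) * t_inst / stages


# Illustrative sweep over a few pipeline depths (assumed sample values).
for s in (1, 2, 5, 10):
    print(s, round(teff_with_hazards(10.0, s, 0.1), 3))
```

With b = 0 the second function reduces to the ideal case Teff = Tinst / S from slide 5.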
  • 7. Teff vs. S (Tinst = 10). [Plot: Teff against S for S = 1 to 10, with curves for b = .2, b = .1 and b = .05.]
  • 8. A more realistic view. [Figure: Reg, Comb, Reg driven by a common Clock.] Additional delays: register output delay, register setup time, clock skew.
  • 9. Clocking Overhead. Fixed overhead c: setup time, output delay. Variable overhead (stretching factor) k: clock skew. Δt = Pmax + k * Pmax + c = (1 + k) * Tinst / S + c. Teff = [1 + (S - 1) * b] * [(1 + k) * Tinst / S + c].
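As a rough illustration of the clocking-overhead model on slide 9, the sketch below evaluates Teff over a range of pipeline depths. The helper names `clock_period` and `teff` are hypothetical, and the parameter values (Tinst = 10, c = 1, k = .1, b = .1) are taken from the plot on slide 10. It assumes perfectly balanced stages, so Pmax = Tinst / S.

```python
def clock_period(t_inst: float, stages: int, k: float, c: float) -> float:
    """Δt = (1 + k) * Tinst / S + c, assuming balanced stages."""
    return (1.0 + k) * t_inst / stages + c


def teff(t_inst: float, stages: int, b: float, k: float, c: float) -> float:
    """Teff = [1 + (S - 1) * b] * Δt."""
    return (1.0 + (stages - 1) * b) * clock_period(t_inst, stages, k, c)


# Search integer depths 1..15 for the smallest Teff (assumed sample values).
best = min(range(1, 16), key=lambda s: teff(10.0, s, 0.1, 0.1, 1.0))
print(best, round(teff(10.0, best, 0.1, 0.1, 1.0), 2))
```

For these values the search settles on S = 10, where Teff = 1.9 * 2.1 = 3.99. Unlike the overhead-free model, Teff no longer keeps falling as S grows.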
  • 10. Teff vs. S (Tinst = 10, c = 1, k = .1). [Plot: Teff against S for S = 1 to 15, with curves for b = .2, b = .1 and b = .05.]
  • 11. Pipelining with Clocking Overhead. Teff = [1 + (S - 1) * b] * [(1 + k) * Tinst / S + c]. Sopt = √ [(1 - b) * (1 + k) * Tinst / (b * c)].
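The slide states Sopt without the intermediate steps. One way to arrive at it, treating S as a continuous variable (an assumption made here for the derivation only), is to expand Teff and set its derivative with respect to S to zero:

```latex
\begin{align*}
T_{eff}(S) &= \bigl[1 + (S-1)\,b\bigr]\left[\frac{(1+k)\,T_{inst}}{S} + c\right] \\
           &= \frac{(1-b)(1+k)\,T_{inst}}{S} + b\,(1+k)\,T_{inst} + (1-b)\,c + b\,c\,S \\
\frac{dT_{eff}}{dS} &= -\frac{(1-b)(1+k)\,T_{inst}}{S^{2}} + b\,c = 0 \\
S_{opt} &= \sqrt{\frac{(1-b)(1+k)\,T_{inst}}{b\,c}}
\end{align*}
```

The second derivative is positive for S > 0 when b < 1, so this stationary point is a minimum of Teff.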
  • 12. Partitioning instruction into cycles with non-uniform stage times. One action per pipeline stage ⇒ large quantization overhead. Multiple actions per stage? Multiple stages per action?
  • 13. Example. Actions and delays as listed on the slide: Put Away 2 ns; Execute 7+7+8 ns; Data - ALU 3 ns; Cache Data 10 ns; Cache Dir 6 ns; Addr - MAR 3 ns; Gen Addr 9 ns; Decode 6+6 ns; Data - IR 3 ns; Cache Data 10 ns; Cache Dir 6 ns; PC - MAR 4 ns.
  • 14. Optimal Pipelining. Tinst = 4+6+10+3+12+9+3+6+10+3+22+2 = 90 ns; b = 0.2; c = 4 ns; k = 5%. Sopt = √ [(1 - b) * (1 + k) * Tinst / (b * c)] = 9.7 ⇒ 9. Pmax = 10 ns.
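The Sopt value on slide 14 can be checked with a few lines of Python. This is only a sketch; the function name `s_opt` is chosen here for illustration, and the inputs are the figures stated on the slide.

```python
import math


def s_opt(t_inst: float, b: float, k: float, c: float) -> float:
    """Sopt = sqrt((1 - b) * (1 + k) * Tinst / (b * c))."""
    return math.sqrt((1.0 - b) * (1.0 + k) * t_inst / (b * c))


# Slide 14 values: Tinst = 90 ns, b = 0.2, k = 5%, c = 4 ns.
value = s_opt(90.0, 0.2, 0.05, 4.0)
print(round(value, 2))  # 9.72, i.e. sqrt(94.5)
```

The expression under the root evaluates to 0.8 * 1.05 * 90 / 0.8 = 94.5, which agrees with the 9.7 quoted on the slide.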
  • 15. Example, S = 10. [Figure: the actions of slide 13 grouped into 10 stages.] Pmax = 10 ns; Δt = 14.5 ns; S * Δt = 145 ns.
  • 16. Example, S = 9. [Figure: the actions of slide 13 grouped into 9 stages.] Pmax = 13 ns; Δt = 17.65 ns; S * Δt = 159 ns.
  • 17. Example, S = 5. [Figure: the actions of slide 13 grouped into 5 stages.] Pmax = 20 ns; Δt = 25 ns; S * Δt = 125 ns.
  • 18. Comparison
       S   Pmax (ns)   Δt (ns)   S * Δt (ns)   Teff (ns)
       9   13          17.65     159           45.89
      10   10          14.50     145           40.60
       5   20          25.00     125           45.00
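The comparison figures can be reproduced from the slide 9 formulas, using Δt = (1 + k) * Pmax + c and Teff = [1 + (S - 1) * b] * Δt with b = 0.2, c = 4 ns and k = 5% from slide 14. The sketch below is an illustration only; the names `clock_period` and `row` are assumptions of this example.

```python
B, C, K = 0.2, 4.0, 0.05  # b, c (ns) and k as given on slide 14


def clock_period(p_max: float) -> float:
    """Δt = (1 + k) * Pmax + c for the slowest stage."""
    return (1.0 + K) * p_max + C


def row(stages: int, p_max: float):
    """Return (Δt, S * Δt, Teff) for one partitioning."""
    dt = clock_period(p_max)
    cpi = 1.0 + (stages - 1) * B
    return dt, stages * dt, cpi * dt


# (S, Pmax) pairs from slides 15 to 17.
for stages, p_max in [(9, 13.0), (10, 10.0), (5, 20.0)]:
    dt, total, teff = row(stages, p_max)
    print(f"{stages:>2} {p_max:>5.0f} {dt:>6.2f} {total:>7.2f} {teff:>6.2f}")
```

The 10-stage partitioning gives the lowest Teff even though Sopt rounds down to 9, because the action delays on slide 13 happen to split into ten stages with a smaller Pmax.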
  • 19. Cycle Quantization. Delays are not integral multiples of the clock period. Total overhead = clocking overhead + quantization overhead. Δt ≥ Tinst / S + c (ignoring k) ∴ S * Δt ≥ Tinst + S * c. Quantization overhead = S * (Δt - c) - Tinst. This reduces as the clock period becomes small.
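To make the quantization overhead concrete, the sketch below applies the slide 19 formula to the three partitionings of slides 15 to 17. Following the slide, k is ignored, so Δt is taken as Pmax + c here (an assumption of this example, which is why Δt differs from the values on slides 15 to 17). The function name `quantization_overhead` is hypothetical.

```python
T_INST = 90.0  # ns, from slide 14
C = 4.0        # ns, fixed clocking overhead from slide 14


def quantization_overhead(stages: int, dt: float, c: float, t_inst: float) -> float:
    """Quantization overhead = S * (Δt - c) - Tinst."""
    return stages * (dt - c) - t_inst


# (S, Pmax) pairs from slides 15 to 17; Δt = Pmax + c because k is ignored.
for stages, p_max in [(9, 13.0), (10, 10.0), (5, 20.0)]:
    dt = p_max + C
    print(stages, quantization_overhead(stages, dt, C, T_INST))
```

Under these assumptions the overhead is 27 ns for S = 9 and 10 ns for both S = 10 and S = 5.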
  • 20. Other Timing Approaches. Self-timed circuits: no centralized free-running clock; an operation begins as soon as its inputs are available, that is, when all its predecessors have completed; higher speed, lower power consumption. Wave pipelining: omit inter-stage registers; reduced clocking overhead.
  • 21. Conventional vs Wave Pipelining. Conventional pipeline: registers separate adjoining stages; clock period > max prop delay; inter-stage data stored in registers. Wave pipeline: no registers between adjoining stages; clock period less than max prop delay; waves of data propagate through the combinational network (effectively, data is stored in the combinational circuit delay!).
  • 22. No pipelining. [Figure: circuit between two registers and timing diagram for Clock, X, X' and Y.]
  • 23. Conventional pipelining. [Figure: circuit with inter-stage registers and timing diagram for Clock, X, X', Y, Y', Z, Z' and W.]
  • 24. Wave pipelining. [Figure: circuit without inter-stage registers and timing diagram for Clock, X, Z' and W.]
  • 25. Timing. [Figure: Reg, Comb ckt, Reg with signals X and Y and a common Clock.] T ≥ p + s, where T = clock period, p = propagation delay, s = set-up time.
  • 26. Timing with clock skew. [Figure: Reg, Comb ckt, Reg with signals X and Y and a common Clock.] Clock skew = ±δ. T ≥ p + s + 2δ.
  • 27. Variation in propagation delay. Different delays in different paths; delay variation due to process / temperature / power variations; data-dependent delay variations.
  • 28. Timing for wave pipelining. [Figure: Reg, Comb ckt, Reg with signals X and Y and a common Clock.] Clock skew = ±δ; propagation delay ranges from pmin to pmax, with Δp = pmax - pmin. T ≥ Δp + s + 4δ.
  • 29. Timing for wave pipelining (expanded view). [Timing diagram: X and Y over n clock periods.] pmin ≥ (n - 1) T + 2δ; nT ≥ pmax + s + 2δ ⇒ T ≥ Δp + s + 4δ.
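The final bound on slide 29 follows from combining the two inequalities. A short derivation, taking Δp to mean pmax - pmin (the reading assumed here), might run as follows:

```latex
\begin{align*}
p_{min} \ge (n-1)\,T + 2\delta \;&\Longrightarrow\; (n-1)\,T \le p_{min} - 2\delta \\
n\,T &\ge p_{max} + s + 2\delta \\
T = n\,T - (n-1)\,T &\ge (p_{max} + s + 2\delta) - (p_{min} - 2\delta) \\
                    &= \Delta p + s + 4\delta, \qquad \Delta p = p_{max} - p_{min}
\end{align*}
```

The bound is a necessary condition on T. For a given n, both original inequalities still have to hold at the same time.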
  • 30. Comparison. Conventional pipeline: T ≥ pmax / n + s + 2δ (plus cycle quantization overhead); nT ≥ pmax + ns + 2nδ. Wave pipeline: T ≥ Δp + s + 4δ; nT ≥ pmax + s + 2δ.
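The two sets of timing constraints can be compared numerically. The sketch below is illustrative only: the function names and the sample delays (pmax = 40, pmin = 36, s = 1, δ = 0.5, in arbitrary time units) are assumptions, not values from the lecture. The function `wave_period_range` rearranges the two inequalities of slide 29 into a lower and an upper limit on T for a given n.

```python
def conventional_min_period(p_max: float, n: int, s: float, delta: float) -> float:
    """Conventional pipeline: T >= pmax / n + s + 2δ (quantization ignored)."""
    return p_max / n + s + 2 * delta


def wave_min_period(p_max: float, p_min: float, s: float, delta: float) -> float:
    """Wave pipeline: T >= Δp + s + 4δ with Δp = pmax - pmin."""
    return (p_max - p_min) + s + 4 * delta


def wave_period_range(p_max, p_min, n, s, delta):
    """Return (low, high) limits on T for n waves, or None if none exists.

    low comes from  nT >= pmax + s + 2δ,
    high comes from pmin >= (n - 1) T + 2δ.
    """
    low = (p_max + s + 2 * delta) / n
    if n == 1:
        return (low, float("inf"))  # no upper limit with a single wave
    high = (p_min - 2 * delta) / (n - 1)
    return (low, high) if low <= high else None


# Hypothetical delays in arbitrary time units.
print(conventional_min_period(40.0, 4, 1.0, 0.5))   # 12.0
print(wave_min_period(40.0, 36.0, 1.0, 0.5))        # 7.0
print(wave_period_range(40.0, 36.0, 4, 1.0, 0.5))
print(wave_period_range(40.0, 36.0, 7, 1.0, 0.5))   # None
```

For these sample values, four waves allow a clock period between 10.5 and about 11.67, shorter than the 12.0 needed by a four-stage conventional pipeline, while seven waves cannot be supported at all. The small window between the limits reflects how tightly the clock frequency is constrained.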
  • 31. Problems with wave pipelining. Need to balance delays; narrow range of clock frequencies; control difficult; not very suitable for non-linear pipelines.
  • 32. References. 1. M.J. Flynn, "Computer Architecture: Pipelined and Parallel Processor Design", Narosa Publishing House / Jones and Bartlett, 1996. 2. Wayne P. Burleson, Maciej Ciesielski, Fabian Klass, and Wentai Liu, "Wave-Pipelining: A Tutorial and Research Survey", IEEE Trans. on VLSI Systems, vol. 6, no. 3, September 1998, pp. 464-474.