CPU Performance Enhancements

2,981 views

Introduction to modern concepts that enhance CPU performance and power efficiency

Published in: Engineering

CPU Performance Enhancements

  1. CPU Performance Enhancements. CS2052 Computer Architecture, Computer Science & Engineering, University of Moratuwa. Dilum Bandara, Dilum.Bandara@uom.lk
  2. Pipelining – It’s Natural!
      • Laundry example: Amal, Bimal, Chamal, & Dinal (A, B, C, D) each have one load of clothes to wash, dry, & fold
      • Washer takes 30 minutes
      • Dryer takes 40 minutes
      • Folder takes 20 minutes
  3. Sequential Laundry
      • Sequential laundry takes 6 hours for 4 loads
      • If they learned pipelining, how long would laundry take?
      [Figure: timeline from 6 PM to midnight; loads A to D in task order, each taking 30, 40, and 20 minutes back to back]
  4. Pipelined Laundry – Start Work ASAP
      • Pipelined laundry takes 3.5 hours for 4 loads
      [Figure: timeline from 6 PM; loads A to D overlap, with segments of 30, 40, 40, 40, 40, and 20 minutes]
  5. Pipelining Lessons
      • Pipelining doesn’t reduce the latency of a single task
      • It improves the throughput of the entire workload
      • Pipeline rate is limited by the slowest pipeline stage
      • Multiple tasks operate simultaneously
      • Potential speedup = number of pipeline stages
      • Unbalanced lengths of pipeline stages reduce speedup
      • Time to fill the pipeline & time to drain/flush it reduce speedup
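The laundry timings on slides 2 to 5 can be checked with a short Python sketch. It assumes each stage handles one load at a time and that a load may wait between stages (for example, washed clothes can sit until the dryer is free). The function names are illustrative and do not come from the slides.

```python
def sequential_minutes(stage_minutes, loads):
    """Each load runs through every stage before the next load starts."""
    return loads * sum(stage_minutes)


def pipelined_minutes(stage_minutes, loads):
    """Finish time when a load enters a stage as soon as both are free."""
    stage_free_at = [0] * len(stage_minutes)  # when each stage next becomes idle
    finish = 0
    for _ in range(loads):
        ready = 0  # time at which this load is ready for its next stage
        for i, duration in enumerate(stage_minutes):
            start = max(ready, stage_free_at[i])
            ready = start + duration
            stage_free_at[i] = ready
        finish = ready
    return finish


stages = [30, 40, 20]  # washer, dryer, folder (minutes), as on slide 2
print(sequential_minutes(stages, 4) / 60)  # 6.0 hours
print(pipelined_minutes(stages, 4) / 60)   # 3.5 hours
```

For 4 loads the speedup is 360/210, about 1.7, which is below the 3-stage ideal because the stages are unbalanced and the pipeline needs time to fill and drain.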
  6. Instruction Level Parallelism (ILP). Source: http://mail.humber.ca/~paul.michaud/Pipeline.htm
  7. CPU Pipelines: 5-stage MIPS pipeline. Source: http://en.wikipedia.org/wiki/Classic_RISC_pipeline
  8. (no text on this slide)
  9. Pipeline With a Branch Penalty Due to a Taken Branch. Source: http://mail.humber.ca/~paul.michaud/Pipeline.htm
  10. Superscalar Architectures
      • Executes more than 1 instruction during a clock cycle by simultaneously dispatching multiple instructions to redundant functional units
      Source: http://mail.humber.ca/~paul.michaud/Pipeline.htm
  11. Intel Hyper Threading (HT)
      • Introduced with Intel Pentium 4
      • Allows 2 different resources of the CPU to be used at the same time
      • While the 1st thread (instruction) is working with integers (ALU’s integer unit), the 2nd thread can work on floating point numbers (ALU’s floating point unit)
      • The OS sees 2 logical CPUs
      • Achieved through a mix of shared, replicated, & partitioned chip resources such as:
          • Registers
          • Arithmetic units
          • Cache memory
  12. Amdahl’s Law
      • What is the maximum expected improvement to an overall system when only part of it is improved?
      • Amdahl said this relationship is not linear
  13. Amdahl’s Law (Cont.)
      Best you could ever hope to do:
      $$\text{Speedup}_{\text{maximum}} = \frac{1}{1 - \text{Fraction}_{\text{enhanced}}}$$
  14. Amdahl’s Law – Example
      • Floating point instructions improved to run 2X, but only 10% of actual instructions are FP
      $$\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times (0.9 + 0.1/2) = 0.95 \times \text{ExTime}_{\text{old}}$$
      $$\text{Speedup}_{\text{overall}} = \frac{1}{0.95} = 1.053$$
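The example on slide 14 can be reproduced with a minimal Python sketch of Amdahl's Law. The general form used here, 1 / ((1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced), is the standard statement of the law; the function and parameter names are illustrative and not taken from the slides.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when a fraction of execution time is made faster."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


def amdahl_limit(fraction_enhanced):
    """Best possible speedup: the enhanced part takes no time at all."""
    if not 0.0 <= fraction_enhanced < 1.0:
        raise ValueError("fraction_enhanced must be in [0, 1)")
    return 1.0 / (1.0 - fraction_enhanced)


# Slide 14: FP instructions run 2X faster but make up only 10% of the work.
print(round(amdahl_speedup(0.1, 2), 3))  # 1.053
print(round(amdahl_limit(0.1), 3))       # 1.111
```

Even an infinitely fast floating point unit would give only about 1.11X here, because the remaining 90% of the work is unchanged.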
  15. Moore’s Law – Today’s Status
      • Moore’s Law: the number of transistors on a chip tends to double about every 2 years
      • Transistor count still rising
      • Clock speed flattening sharply
      Source: www.extremetech.com/wp-content/uploads/2012/02/CPU-Scaling.jpg
  16. Dual Core
      • Introduced with the IBM Power4
      • However, AMD brought it to the consumer market
      • Combines 2 independent CPUs & their respective caches onto a single silicon chip
      • Provides a better performance improvement than HT
      • True parallelism
  17. Multi-Core. Source: www.anandtech.com/show/5174/why-ivy-bridge-is-still-quad-core
  18. Multi-Core (Cont.). Source: www.legitreviews.com/intel-core-i7-4770k-haswell-3-5ghz-quad-core-cpu-review_2203
  19. Multi-Core (Cont.). Source: www.hardwarecanucks.com/news/cpu/intel-launch-8-core-xeon-nehalemex/
  20. Multi-Cores + Hyper Threading. Source: www.notebookcheck.net/Intel-Core-i7-Notebook-Processor-Clarksfield.21025.0.html
  21. Many-Cores
      • GPUs
          • Graphics Processing Unit
          • NVIDIA & ATI
          • SIMD – Single Instruction Multiple Data
      • Intel Xeon Phi
          • General purpose
      [Images: NVIDIA Tesla 2070; Intel Xeon Phi]
  22. Example Specifications

                                              GTX 480         Tesla 2070      Tesla K80
      Peak double precision FP performance    650 Gigaflops   515 Gigaflops   2.91 Teraflops
      Peak single precision FP performance    1.3 Teraflops   1.03 Teraflops  8.74 Teraflops
      CUDA cores                              480             448             4992
      Frequency of CUDA cores                 1.40 GHz        1.15 GHz        560/875 MHz
      Memory size (GDDR5)                     1536 MB         6 GB            24 GB
      Memory bandwidth                        177.4 GB/sec    150 GB/sec      480 GB/sec
      ECC memory                              No              Yes             Yes
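The single precision figures on slide 22 are consistent with a simple estimate: cores × clock × floating point operations per core per cycle. The Python sketch below assumes 2 operations per core per cycle (one fused multiply-add counted as two operations) and, for the Tesla K80, that the peak figure refers to the 875 MHz clock. Both are assumptions of this sketch, not statements from the slides, and the function name is illustrative.

```python
def peak_sp_gflops(cuda_cores, clock_ghz, flops_per_core_per_cycle=2):
    """Rough peak single precision throughput in GFLOPS.

    flops_per_core_per_cycle=2 is an assumption: one fused multiply-add
    per core per cycle, counted as two floating point operations.
    """
    return cuda_cores * clock_ghz * flops_per_core_per_cycle


# (CUDA cores, clock in GHz) as listed on slide 22
cards = {
    "GTX 480": (480, 1.40),
    "Tesla 2070": (448, 1.15),
    "Tesla K80 (at 875 MHz)": (4992, 0.875),
}

for name, (cores, ghz) in cards.items():
    print(f"{name}: {peak_sp_gflops(cores, ghz):.0f} GFLOPS")
```

The estimates come to roughly 1.3, 1.03, and 8.74 Teraflops, in line with the single precision row of the table.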
  23. CPU vs. GPU Architecture: a GPU devotes more transistors to computation
  24. Multithreaded SIMD Processor. Source: Computer Architecture by John L. Hennessy and David A. Patterson
  25. NVIDIA CUDA Architecture
  26. Intel Xeon Phi. Source: www.pcgameshardware.de/Xeon-Phi-Hardware-256199/News/Intel-Xeon-Phi-Hardware-Informationen-1040924/
  27. Intel Xeon Phi (Cont.). Source: www.altera.com/technology/system-design/articles/2012/multicore-many-core.html
  28. Power Consumption
      • Dynamic energy
          • Transistor switches from 0 → 1 or 1 → 0
          • ½ × Capacitive load × Voltage²
      • Dynamic power
          • ½ × Capacitive load × Voltage² × Frequency switched
      • Static power consumption
          • Current_static × Voltage
          • Scales with the number of transistors
      • Reducing voltage reduces energy
      • Reducing clock rate reduces power, not energy
      • Use power gating rather than only removing the clock signal
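The formulas on slide 28 can be tried out with a small Python sketch. All numeric values below (capacitive load, voltage, frequency, cycle count) are hypothetical and chosen only for illustration; the function names are not from the slides. The sketch models dynamic energy only when comparing clock rates, so static energy over the longer runtime is left out.

```python
import math


def dynamic_energy(capacitive_load, voltage):
    """Energy per 0 -> 1 or 1 -> 0 transition: 1/2 x C x V^2 (joules)."""
    return 0.5 * capacitive_load * voltage ** 2


def dynamic_power(capacitive_load, voltage, frequency_switched):
    """Dynamic power: 1/2 x C x V^2 x frequency switched (watts)."""
    return dynamic_energy(capacitive_load, voltage) * frequency_switched


def static_power(static_current, voltage):
    """Static power: static current x voltage (watts)."""
    return static_current * voltage


def task_dynamic_energy(capacitive_load, voltage, frequency, cycles):
    """Dynamic energy for a task that needs a fixed number of cycles."""
    runtime = cycles / frequency  # a slower clock means a longer runtime
    return dynamic_power(capacitive_load, voltage, frequency) * runtime


C, V, F, CYCLES = 1e-9, 1.0, 2e9, 1e9  # hypothetical values

# Halving the clock halves dynamic power ...
print(dynamic_power(C, V, F), dynamic_power(C, V, F / 2))
# ... but the dynamic energy for the same task stays the same.
print(math.isclose(task_dynamic_energy(C, V, F, CYCLES),
                   task_dynamic_energy(C, V, F / 2, CYCLES)))
# Lowering the voltage by 15% cuts dynamic energy to about 72% (0.85 squared).
print(dynamic_energy(C, 0.85 * V) / dynamic_energy(C, V))
```

Because energy depends on the square of the voltage while frequency cancels out over a fixed amount of work, lowering the voltage saves energy whereas lowering the clock rate alone does not.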
