Apu13 cp lu-keynote-final-slideshare

How many cores will we need in future SoCs, and what impact will HSA have on that?
By
Chien-Ping Lu, PhD
Sr. Director, MediaTek Inc.

Published in: Technology, Education

  1. How many cores will we need? Chien-Ping Lu, PhD, Sr. Director, MediaTek Inc.
  2. A group of hippos is called … A Crash 2 | how many cores will we need? | December 4, 2013 | Confidential
  3. A group of crows is called … A Murder
  4. A group of giraffes is called … A Tower (from Wikipedia)
  5. So, it is not surprising that we use “A Parade” of elephants, “A Herd” of sheep, “An Army” of ants
  6. From frequency to multicore scaling [Figure labels: Performance, Power, Frequency, Time, Serial Computing, Parallel Computing, Power wall: 2005]
  7. How many cores will we need? [Figure labels: Performance, Time, Moderate, Massive]
  8. Dark silicon (or dark cores)? [Figure labels: Performance, Time; 8x → 4x, 2x, 4x → 3x, 16x → 4x]
  9. Light up the cores. Redefine the cores to be heterogeneous. Dark silicon: a concern on power. Amdahl’s law: an argument against parallel computing. [Figure labels: Power (axis), Degree of parallelism (number of cores) (axis), Power ceiling, Parallelism wall, Big cores, Little cores, GPU-style “cores”, Body tracking, Ray tracing]
  10. The elephants: CPU cores. For multiple-instruction-multiple-data (MIMD) execution. Retrofitted for moderately parallel workloads, and not very efficient for massively parallel workloads. A CPU core runs 1 iteration of the parallel loop. The same color means the same piece of code. [Figure: a Parallel.For (…) loop with an Else branch mapped onto CPU cores, each drawn as a Front End with an ALU]
  11. Army of ants: SIMT cores. For SIMT (single-instruction-multiple-thread) execution. A SIMT core runs 1 iteration of the parallel loop. SIMT is the execution model of HSA and is implemented in modern GPUs, with MIMD flexibility and SIMD efficiency. A cluster of SIMT cores shares one front end in a SIMD manner. Can achieve better power efficiency with more specialized function units given the right workload. A branch is emulated through divergence. [Figure: a Parallel.For (…) loop with an Else branch mapped onto a few Front Ends, each shared by many ALUs, plus SFU 0 and SFU 1]
  12. Properties of massively data-parallel workloads • Problem size N of the parallel workload can keep growing • Visible serial workload s can be kept constant • Communication overhead is proportional to log P (by a factor of r) • Parallel workload is sped up linearly by P, the number of cores • “Embarrassingly” parallel when there is no communication overhead (r = 0) [Figure labels: s, N, r log P, N/P, Time saved by P cores]
  13. Revisiting Amdahl's law for trend prediction: $\mathrm{Speedup} = \dfrac{s + N}{s + r \log P + N/P}$
  14. MediaTek face beautification. When it comes to beauty, there seems to be no limit. [Figure labels: Before, Skin tone adjustment, Wrinkle removal, Thinner face, bigger eyes]
  15. Graphics keeps moving. Pac-Man, 1980: highest-grossing video game of all time, recognized by 94% of American consumers. Mobile 3D graphics: GLBenchmark 2.1 Egypt, 2011; GLBenchmark 2.5 Egypt, 2012; GFXBench 2.7 T-Rex, 2013; GFXBench 3.0 Manhattan, 2013
  16. High-performance computing (HPC) keeps scaling out • HPC from 1993 to 2012 ‒ GFLOPS ~ 130,000x ‒ Cores ~ 11,000x ‒ GHz ~ 10x • More atoms, higher grid resolution, more time steps
  17. Parallel killer apps are just around the corner: completing the positive feedback loop. What bigger problems to solve with bigger data? How does solving bigger problems lead to better user experience? [Loop diagram labels: Moore’s law; Higher frequency, more cores; More complex software; Better user experience; Bigger data-parallel workloads in graphics and HPC; Data; Mining bigger data with machine learning; Bigger problems]
  18. How to distinguish cat photos from dog ones? ASIRRA: Animal Species Image Recognition for Restricting Access (from Microsoft Research)
  19. Why is it hard? Source: training set of the Kaggle.com Dogs vs. Cats competition
  20. Is there a solution to relate photos of the same dog? Prancer, a 5-year-old toy poodle, before and after grooming
  21. Mine the solutions from the data. Theory of the differences between dogs and cats? Learn from many (12,500) photos labeled as dogs or cats. [Diagram labels: Machine Learning, Dog-Cat classifier]
  22. Machine learning: prediction with powerful models • More powerful models have more knobs, which need to be determined with a bigger data set • The explosive growth of data has made very powerful models feasible [Figure: a 6th-order polynomial over-fits the 4 samples]
  23. From data to user experience. Bigger data lead to more powerful models. Powerful models with more knobs lead to better user experience. Examples: x = dog/cat photos, sensor readings, depth images; y = dog or cat; jogging, walking or climbing; body motion. Model: $y = f(x; \{a_i\})$ with knobs $\{a_i\}$. Machine learning: determine $\{a_i\}$ to minimize the error between $f(x_n; \{a_i\})$ and $y_n$ over web-scale data $(x_n, y_n)$. [Diagram labels: Client, Cloud]
  24. Smart clients in the era of data. [Diagram labels: Cloud; Client, Smarter Client; Big Training Set, Bigger Training Set; Powerful Model, More powerful Model; Connectivity, Better Connectivity; User Experience, Better User Experience; Sensing, Better Sensing; Data, Bigger Data; Data Mining; Local Machine Learning; Input data; In the cloud or the clients]
  25. Looking forward • The future is here ‒ There are already massively parallel heterogeneous processors • There is no shame in being data-parallel ‒ One of the smartest things achieved in computing is data-parallel (Source: Le et al., Building High-level Features Using Large Scale Unsupervised Learning) • Go parallel and go heterogeneous to keep ‒ Mobile devices cool in our palms ‒ Data centers clean for our environment (the carbon footprint of US datacenters is at the same level as that of the airline industry)
  26. Disclaimer & Attribution. The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes. AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. ATTRIBUTION © 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names are for informational purposes only and may be trademarks of their respective owners.
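
The speedup model from slides 12 and 13 can be sketched in a few lines of Python. This is a minimal sketch, not the speaker's code: the function name `speedup`, its parameter names, and the base-2 logarithm are assumptions (the slides only say "log P"). It takes the one-core time as s + N and the P-core time as s + r log P + N/P, using the symbols defined on slide 12.

```python
import math


def speedup(s, n, p, r=0.0):
    """Speedup of p cores over one core under the slide-12 workload model.

    s: visible serial workload (kept constant)
    n: size of the parallel workload (can keep growing)
    p: number of cores
    r: communication-overhead factor; overhead is r * log2(p).
       The base-2 logarithm is an assumption, the slides only say "log P".
    """
    if p < 1:
        raise ValueError("p must be at least 1")
    time_one_core = s + n
    time_p_cores = s + r * math.log2(p) + n / p
    return time_one_core / time_p_cores


if __name__ == "__main__":
    # Illustrative values only: they are not taken from the talk.
    for cores in (1, 4, 16, 64, 256):
        print(cores, round(speedup(s=1.0, n=1000.0, p=cores, r=0.5), 2))
```

With r = 0 and s = 0 the model reduces to the embarrassingly parallel case, where the speedup equals P. With N fixed and r = 0 it behaves like classic Amdahl's law and stays below (s + N)/s, while letting N grow raises the achievable speedup for the same P.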

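Slide 11 states that a cluster of SIMT cores shares one front end and that a branch is emulated through divergence. The toy simulation below illustrates that idea under stated assumptions: the helper `simt_branch` and its arguments are hypothetical, lanes are modeled as list elements, and real GPUs apply the mask in hardware rather than in a Python loop.

```python
def simt_branch(values, predicate, then_op, else_op):
    """Emulate an if/else across lanes that share one front end.

    Hypothetical illustration: the shared front end issues the "then" path
    and the "else" path one after the other, and a per-lane mask decides
    which lanes commit a result on each pass.
    Returns the per-lane results and the number of passes issued.
    """
    mask = [bool(predicate(v)) for v in values]
    out = list(values)
    passes = 0

    # Pass 1: issued only if at least one lane takes the "then" path.
    if any(mask):
        passes += 1
        for lane, v in enumerate(values):
            if mask[lane]:
                out[lane] = then_op(v)

    # Pass 2: issued only if at least one lane takes the "else" path.
    if not all(mask):
        passes += 1
        for lane, v in enumerate(values):
            if not mask[lane]:
                out[lane] = else_op(v)

    return out, passes


if __name__ == "__main__":
    # Divergent case: even and odd lanes take different paths.
    print(simt_branch([1, 2, 3, 4], lambda v: v % 2 == 0,
                      lambda v: v * 10, lambda v: -v))
```

When every lane agrees on the branch, only one pass is issued, which is the SIMD-efficient case; when lanes disagree, both paths are issued in sequence, which is the cost of keeping MIMD-like flexibility.
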