Keynote (Dr Chien-Ping Lu) - How Many Cores Will We Need? - by Dr Chien-Ping Lu, Sr Director – Corporate Technology Office, MediaTek USA Inc.
Keynote presentation, How Many Cores Will We Need?, by Dr Chien-Ping Lu, Sr Director – Corporate Technology Office, MediaTek USA Inc., at the AMD Developer Summit (APU13), Nov. 11-13, 2013.

Presentation Transcript

  • 1. HOW MANY CORES WILL WE NEED? IN SEARCH OF PARALLEL KILLER APPS. CHIEN-PING LU, PHD, MEDIATEK INC
  • 2. A GROUP OF HIPPOS IS CALLED … A Crash 2 | HOW MANY CORES WILL WE NEED? | DECEMBER 11, 2013 | CONFIDENTIAL
  • 3. A GROUP OF CROWS IS CALLED … A Murder
  • 4. A GROUP OF GIRAFFES IS CALLED … A Tower (from Wikipedia)
  • 5. SO, IT IS NOT SURPRISING THAT WE USE “A Parade” of elephants, “A Herd” of sheep, “An Army” of ants
  • 6. FROM FREQUENCY TO MULTICORE SCALING [Figure: performance vs. time, with single-core and multi-core regions separated by the power wall (2005); power and frequency are also plotted.]
  • 7. IT SEEMS INEVITABLE THAT WE WILL NEED A MASSIVE NUMBER OF CORES [Figure: performance vs. time; core counts labeled Moderate, then Massive.]
  • 8. DARK SILICON (OR DARK CORES)? [Figure: performance vs. time; labels: 2x, 4x → 3x, 8x → 4x, 16x → 4x.]
  • 9. HOW TO LIGHT UP THE CORES? Redefine the cores to be heterogeneous. Search for parallel killer apps. [Figure: power vs. degree of parallelism, bounded by a power ceiling and a parallelism wall; labels: big cores, little cores, SIMT “cores”, H.264 encoding, ray tracing.]
  • 10. ARMY OF ANTS: SIMT CORES FOR SIMT (SINGLE-INSTRUCTION-MULTIPLE-THREAD) EXECUTION. SIMT is the execution model of HSA and is implemented in modern GPUs, with MIMD flexibility and SIMD efficiency. A SIMT core runs 1 iteration of the parallel loop: Parallel.For (…) If (…) then … Else … A cluster of SIMT cores shares one front end in a SIMD manner. A branch is emulated through divergence. [Figure: front ends, each driving a group of ALUs, alongside Specialized Processing Engines (SPE); a second diagram labeled “Wider SIMT” shows one front end driving many more ALUs.]
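The statement on slide 10 that a branch is emulated through divergence can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the function name `simt_if_else`, the per-lane mask, and the pass counter are hypothetical illustrations, not something shown on the slide, and real GPU hardware is considerably more involved.

```python
def simt_if_else(values, cond, then_op, else_op):
    """Run `if cond(v): then_op(v) else: else_op(v)` on lanes sharing one front end.

    Returns (results, passes). `passes` counts how many branch paths the
    shared front end had to issue to the whole cluster of lanes.
    """
    mask = [cond(v) for v in values]    # one predicate bit per lane
    results = list(values)
    passes = 0
    if any(mask):                       # then-path: lanes with mask off sit idle
        passes += 1
        results = [then_op(v) if m else r
                   for v, m, r in zip(values, mask, results)]
    if not all(mask):                   # else-path: lanes with mask on sit idle
        passes += 1
        results = [r if m else else_op(v)
                   for v, m, r in zip(values, mask, results)]
    return results, passes


is_even = lambda v: v % 2 == 0
halve = lambda v: v // 2
triple_plus_one = lambda v: 3 * v + 1

# Lanes disagree on the condition, so both paths are issued (divergence).
print(simt_if_else([1, 2, 3, 4], is_even, halve, triple_plus_one))
# Lanes agree, so only one path is issued.
print(simt_if_else([2, 4, 6, 8], is_even, halve, triple_plus_one))
```

In this toy model the diverged case costs two passes while the uniform case costs one, which is the efficiency trade-off of sharing a front end.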
  • 11. MASSIVELY PARALLEL WORKLOADS • Problem size N can keep growing • Visible serial workload s can be kept constant • Parallel workload is sped up by P, the number of cores • Reduction overhead is proportional to log P (by a factor of r) • "Embarrassingly" parallel, when there is no reduction overhead (r=0) [Figure: time bars; one core takes s + N, while P cores take s + r log P + N/P; the difference is the time saved by P cores.]
  • 12. REVISITING AMDAHL'S LAW $\text{Speedup} = \frac{s + N}{s + r \log P + N/P}$, plotted for s = 50%, r = 50%. [Figure: speedup vs. degree of parallelism (P), both on logarithmic axes, with curves for N=16, N=64, N=256, and P=N.]
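The workload model of slides 11 and 12 (one core takes s + N; P cores take s + N/P + r log P) can be evaluated numerically. The following is a minimal sketch, assuming a base-2 logarithm (as in a tree reduction) and that s, r, and N share one time unit; neither assumption is stated on the slides, and the function name `speedup` is hypothetical.

```python
import math


def speedup(N, P, s=0.5, r=0.5):
    """Speedup of P cores over one core under the slide-11 workload model.

    Assumptions (not from the slides): log is base 2, and s, r, N are
    expressed in the same time unit.
    """
    t_one_core = s + N                         # serial part plus all N units
    t_p_cores = s + N / P + r * math.log2(P)   # serial + parallel + reduction
    return t_one_core / t_p_cores


for N in (16, 64, 256):
    row = [round(speedup(N, P), 2) for P in (1, 4, 16, 64, 256)]
    print(f"N={N}: {row}")
```

Under these assumptions, a fixed problem size stops benefiting from more cores once the reduction term outweighs the shrinking N/P term, whereas letting N grow with P keeps the speedup rising.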
  • 13. GRAPHICS KEEP MOVING. Pac-man, 1980: highest grossing video game of all time; recognized by 94% of American consumers. Mobile 3D graphics: GLBenchmark 2.1 Egypt, GLBenchmark 2.5 Egypt, GFXBench 2.7 T-Rex, GFXBench 3.0 Manhattan.
  • 14. MEDIATEK FACE BEAUTIFICATION: WHEN IT COMES TO BEAUTY, THERE SEEMS TO BE NO LIMIT. [Images: before; skin tone adjustment; wrinkle removal; thinner face, bigger eyes.]
  • 15. HIGH-PERFORMANCE COMPUTING (HPC) KEEPS SCALING OUT: more atoms, higher grid resolution, more time steps. HPC from 1993 to 2012: GFLOPS ~ 130,000x, cores ~ 11,000x, GHz ~ 10x. [Figure: Top of Top500, 1993-2012; GFLOPS, cores, and GHz relative to 1993, on a logarithmic scale.]
  • 16. THE MISSING LINKS IN SEARCH OF PARALLEL KILLER APPS. What bigger problems to solve with bigger data? How does solving bigger problems lead to a better user experience? [Diagram labels: Moore’s law; higher frequency; more complex software; more cores; bigger problems; bigger data; mining bigger data with machine learning; better user experience.]
  • 17. MACHINE LEARNING: TREND PREDICTION WITH POWERFUL MODELS • Powerful models (with many knobs) tend to overfit the noise if the data set is not sufficiently large • The explosive growth of data has made powerful models feasible • A model with 1 billion knobs, trained with 10 million images from YouTube, was used in the Google Brain experiment to figure out the concepts of cats and human faces by itself. Source: Le et al., Building High-level Features Using Large Scale Unsupervised Learning. [Figure: samples with linear, 2nd-order polynomial, and 6th-order polynomial fits; the 6th-order polynomial undulates excessively with only 4 samples.]
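The overfitting point on slide 17 can be reproduced with a tiny experiment. This sketch deliberately swaps in a different technique from the slide's chart: instead of a 6th-order fit (which 4 samples do not determine uniquely), it uses the cubic Lagrange interpolant that passes exactly through 4 samples, and compares it with a least-squares line. The data, the true trend y = 2x, and all function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = a*x + b (a model with 2 knobs)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return lambda x: a * x + (my - a * mx)


def fit_interpolant(xs, ys):
    """Lagrange polynomial of degree len(xs)-1 through every sample."""
    def f(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return f


# Hypothetical samples: true trend y = 2x plus noise of +1, -1, +1, -1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 1.0, 5.0, 5.0]
line, cubic = fit_line(xs, ys), fit_interpolant(xs, ys)

# Error on the training samples: the more powerful model looks perfect.
train_err_line = max(abs(line(x) - y) for x, y in zip(xs, ys))
train_err_cubic = max(abs(cubic(x) - y) for x, y in zip(xs, ys))
print("training error:", train_err_line, train_err_cubic)

# Error on an unseen point of the true trend: the powerful model is far off.
x_new, y_true = 5.0, 10.0
print("prediction error:", abs(line(x_new) - y_true), abs(cubic(x_new) - y_true))
```

The interpolant has fitted the noise rather than the trend, which is the behavior the slide attributes to powerful models trained on too little data.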
  • 18. HOW TO DISTINGUISH CATS FROM DOGS? ASIRRA: Animal Species Image Recognition for Restricting Access (from Microsoft Research)
  • 19. CAN ASIRRA BE CRACKED?
  • 20. WHY IS IT HARD? Source: training set of Kaggle.com Dogs vs. Cats competition
  • 21. IS THERE A MODEL FINDING OUT THAT THESE ARE THE SAME DOG? Prancer, a 5-year-old toy poodle, before and after grooming
  • 22. MINE THE SOLUTIONS FROM THE DATA. Dog-Cat classifier. Theory of the differences between dogs and cats? Machine Learning: learn from many (12,500) photos labeled as dogs or cats.
  • 23. SMART AND SMARTER CLIENTS IN THE ERA OF BIG DATA [Diagram, two overlaid builds. First build: client, cloud, big data, training set, machine learning, powerful model, sensing, input data, connectivity, answer. Second build: smarter client, bigger data, bigger training set, bigger machine learning (in the cloud or the clients), bigger model, better sensing, better connectivity, better answer, local machine learning.]
  • 24. PARALLEL COMPUTING IN THE CLOUD AND AT THE CLIENTS. Model: $y = f(x; \{a_i\})$, with knobs $\{a_i\}$. Examples: x = dog/cat photos, y = dog or cat; x = sensor readings, y = jogging, walking or driving. Machine learning: given samples $(x_n, y_n)$, tweak $\{a_i\}$ to minimize the error between $f(x_n; \{a_i\})$ and $y_n$. Cloud: parallel computing with more samples. Client: parallel computing with more knobs.
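The training loop described on slide 24 (tweak the knobs to minimize the error between the model output and the labels) can be sketched as follows. The slide names neither a model nor an optimizer, so everything concrete here is an assumption for illustration: a two-knob linear model, mean squared error, plain gradient descent, and synthetic samples drawn from y = 3x + 1.

```python
def predict(x, a):
    """Model f(x; a) with two knobs: a[0] (offset) and a[1] (slope)."""
    return a[0] + a[1] * x


def mean_squared_error(samples, a):
    """Average squared gap between f(x_n; a) and y_n over all samples."""
    return sum((predict(x, a) - y) ** 2 for x, y in samples) / len(samples)


def train(samples, steps=2000, lr=0.05):
    """Tweak the knobs by gradient descent on the mean squared error."""
    a = [0.0, 0.0]
    n = len(samples)
    for _ in range(steps):
        # Each sample contributes independently to these sums, so this is
        # the part that could be spread over many cores and then reduced.
        g0 = sum(2 * (predict(x, a) - y) for x, y in samples) / n
        g1 = sum(2 * (predict(x, a) - y) * x for x, y in samples) / n
        a = [a[0] - lr * g0, a[1] - lr * g1]
    return a


# Hypothetical, noise-free samples (x_n, y_n) from y = 3x + 1.
samples = [(float(x), 3.0 * x + 1.0) for x in range(5)]
knobs = train(samples)
print(knobs, mean_squared_error(samples, knobs))
```

With more samples the per-sample sums grow, and with more knobs the number of gradient components grows; these correspond loosely to the two directions of parallelism the slide assigns to the cloud and the client.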
  • 25. WHY HSA? Machine learning happens in the cloud and at the clients. Models run in the cloud or at the clients. Need same ease of programming and write-once-run-everywhere for heterogeneous cores. MediaTek is one of the cofounders of the HSA Foundation. MediaTek is the first to introduce in mobile SoC • True Octa-Core • Heterogeneous Multiprocessing (HMP)
  • 26. SCALE OUT AND SCALE IN WITH HETEROGENEOUS CORES • Both the cloud and mobile clients are limited by power • Mobile devices need to keep cool in our palms • Data centers need to keep our environment clean • Carbon footprint of US datacenters is at the same level as the airline industry • A 1,000 m² datacenter consumes 1.5 MW, enough to power 1,000 US homes per year. In order to scale out, we need to scale in with heterogeneous cores in the cloud and in our palms. [Figure: typical 1,000 homes in US.]
  • 27. BACKUP
  • 28. THE NEW VIRTUOUS CYCLE: PERHAPS, LEADING TO COMPUTING LIKE OUR BRAIN [Cycle diagram labels: Moore’s law and beyond; more heterogeneous cores; bigger data; mining bigger data with machine learning; better user experience.]
  • 29. MASSIVELY PARALLEL WORKLOADS • Can keep growing the problem size N • The serial workload s can be kept constant • The parallel workload is sped up by P, the number of cores • The reduction overhead is proportional to log P (by a factor of r) • "Embarrassingly" parallel, when there is no reduction overhead (r=0) [Figure: time bars; one core takes s + N, while P cores take s + r log P + N/P; the difference is the time saved by P cores.]
  • 30. THE ELEPHANTS: CPU CORES FOR MULTIPLE-INSTRUCTION-MULTIPLE-DATA (MIMD) EXECUTION. Retrofitted for moderately parallel workloads, and not very efficient for massively parallel workloads. A CPU core runs 1 iteration of the parallel loop: Parallel.For (i) … If (…) … Else … [Figure: each ALU paired with its own front end; the same color means the same piece of code.]
  • 31. DISCLAIMER & ATTRIBUTION The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes. AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. ATTRIBUTION © 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names are for informational purposes only and may be trademarks of their respective owners.