HOW MANY CORES WILL WE NEED?
IN SEARCH OF PARALLEL KILLER APPS
CHIEN-PING LU, PHD
MEDIATEK INC

Keynote (Dr Chien-Ping Lu) - How Many Cores Will We Need? - by Dr Chien-Ping Lu, Sr Director – Corporate Technology Office, MediaTek USA Inc.

Keynote presentation, How Many Cores Will We Need?, by Dr Chien-Ping Lu, Sr Director – Corporate Technology Office, MediaTek USA Inc., at the AMD Developer Summit (APU13), Nov. 11-13, 2013.

Published in: Technology, Education

  1. HOW MANY CORES WILL WE NEED? IN SEARCH OF PARALLEL KILLER APPS. CHIEN-PING LU, PHD, MEDIATEK INC
  2. A GROUP OF HIPPOS IS CALLED … A Crash
     2 | HOW MANY CORES WILL WE NEED? | DECEMBER 11, 2013 | CONFIDENTIAL
  3. A GROUP OF CROWS IS CALLED … A Murder
  4. A GROUP OF GIRAFFES IS CALLED … A Tower (from Wikipedia)
  5. SO, IT IS NOT SURPRISING THAT WE USE “A Parade” of elephants, “A Herd” of sheep, “An Army” of ants
  6. FROM FREQUENCY TO MULTICORE SCALING
     [Figure labels: Power, Frequency, performance, Time, Single-core, Multi-core, Power wall: 2005]
  7. IT SEEMS INEVITABLE THAT WE WILL NEED A MASSIVE NUMBER OF CORES
     [Figure: performance vs. Time; labels: Moderate, Massive]
  8. DARK SILICON (OR DARK CORES)?
     [Figure: performance vs. Time; labels: 2x, 4x → 3x, 8x → 4x, 16x → 4x]
  9. HOW TO LIGHT UP THE CORES?
     • Redefine the cores to be heterogeneous
     • Search for parallel killer apps
     [Figure: power vs. Degree of Parallelism; labels: Power ceiling, Parallelism wall, Big cores, Little cores, SIMT “cores”, H.264 encoding, Ray tracing]
  10. ARMY OF ANTS: SIMT CORES FOR SIMT (SINGLE-INSTRUCTION-MULTIPLE-THREAD) EXECUTION
      • SIMT is the execution model of HSA and is implemented in modern GPUs, with MIMD flexibility and SIMD efficiency
      • A SIMT core runs 1 iteration of the parallel loop
      • A cluster of SIMT cores shares one front end in a SIMD manner
      • A branch is emulated through divergence
      [Figure: Parallel.For (…) with If (…) then … Else …; clusters of ALUs, each cluster sharing one Front End; labels: Specialized Processing Engines (SPE), Wider SIMT]
  11. MASSIVELY PARALLEL WORKLOADS
      • Problem size N can keep growing
      • Visible serial workload s can be kept constant
      • Parallel workload is sped up by P, the number of cores
      • Reduction overhead is proportional to log P (by a factor of r)
      • "Embarrassingly" parallel, when there is no reduction overhead (r=0)
      [Figure: time on one core, s + N, compared with time on P cores, s + r log P + N/P; the difference is labeled "Time saved by P cores"]
  12. REVISITING AMDAHL'S LAW (s=50%, r=50%)
      Speedup = (s + N) / (s + r log P + N/P)
      Speedup = (s + P) / (s + r log P + 1), when P = N
      [Figure: two log-scale plots of Speedup vs. Degree of Parallelism (P), with curves for N=16, N=64, N=256 and P=N]
  13. GRAPHICS KEEP MOVING
      • Pac-man, 1980: highest grossing video game of all time; recognized by 94% of American consumers
      • Mobile 3D Graphics: GL benchmark 2.1 Egypt, GL benchmark 2.5 Egypt, GFX bench 2.7 T-Rex, GFX bench 3.0 Manhattan
  14. MEDIATEK FACE BEAUTIFICATION: WHEN IT COMES TO BEAUTY, THERE SEEMS TO BE NO LIMIT
      [Image labels: Before; Skin tone adjustment; Wrinkle removal; Thinner face, bigger eyes]
  15. HIGH-PERFORMANCE COMPUTING (HPC) KEEPS SCALING OUT
      • More atoms
      • Higher grid resolution
      • More time steps
      • HPC from 1993 to 2012: GFLOPS ~ 130,000x, Cores ~ 11,000x, GHz ~ 10x
      [Figure: Top of Top500 1993-2012, relative to 1993, log scale; series: GFLOPS, Cores, GHz]
  16. THE MISSING LINKS: IN SEARCH OF PARALLEL KILLER APPS
      • What bigger problems to solve with bigger data?
      • How does solving bigger problems lead to better user experience?
      [Diagram labels: Moore’s law, Higher frequency, More cores, More complex software, Bigger data, Mining bigger data with Machine Learning, Bigger problems, Better user experience]
  17. MACHINE LEARNING: TREND PREDICTION WITH POWERFUL MODELS
      • Powerful models (with many knobs) tend to overfit the noise if the data set is not sufficiently large
      • The explosive growth of data has made powerful models feasible
      • A model with 1 billion knobs, trained with 10 million images from YouTube, was used in the Google Brain experiment to figure out the concepts of cats and human faces by itself
      Source: Le et al., Building High-level Features Using Large Scale Unsupervised Learning
      [Figure: Data plotted against Samples with three fits: Linear, Poly. (2nd order), Poly. (6th order). Caption: 6th-order polynomial undulates excessively with only 4 samples]
  18. HOW TO DISTINGUISH CATS FROM DOGS? ASIRRA: Animal Species Image Recognition for Restricting Access (from Microsoft Research)
  19. CAN ASIRRA BE CRACKED?
  20. WHY IS IT HARD? Source: training set of Kaggle.com Dogs vs. Cats competition
  21. IS THERE A MODEL FINDING OUT THAT THESE ARE THE SAME DOG? Prancer, a 5-year-old toy poodle, before and after grooming
  22. MINE THE SOLUTIONS FROM THE DATA
      • Theory of the differences between dogs and cats?
      • Learn from many (12,500) photos labeled as dogs or cats
      [Diagram labels: Machine Learning, Dog-Cat classifier]
  23. SMART AND SMARTER CLIENTS IN THE ERA OF BIG DATA
      [Diagram, first state: Big Data, Client, Cloud, Big Training Set, Machine Learning, Powerful Model, Sensing, Input data, Connectivity, Answer]
      [Diagram, second state: Bigger Data, Smarter Client, Cloud, Bigger Training Set, Bigger Machine Learning (in the cloud or the clients), Bigger Model, Better Sensing, Better Connectivity, Better Answer, Local Machine Learning]
  24. PARALLEL COMPUTING IN THE CLOUD AND AT THE CLIENTS
      • Model: f(x; {a_i}) = y, with knobs {a_i}
      • Examples: x = dog/cat photos, y = dog or cat; x = sensor readings, y = jogging, walking or driving
      • Machine Learning: given samples (x_n, y_n), tweak {a_i} to minimize the error between f(x_n; {a_i}) and y_n
      • Cloud: parallel computing with more samples
      • Client: parallel computing with more knobs
  25. WHY HSA?
      • Machine learning happens in the cloud and at the clients
      • Models run in the cloud or at the clients
      • Need same ease of programming and write-once-run-everywhere for heterogeneous cores
      • MediaTek is one of the cofounders of HSA Foundation
      • MediaTek is the first to introduce in mobile SoC: True Octa-Core, Heterogeneous Multiprocessing (HMP)
  26. SCALE OUT AND SCALE IN WITH HETEROGENEOUS CORES
      • Both the cloud and mobile clients are limited by power
      • Mobile devices need to keep cool in our palms
      • Data centers need to keep our environment clean
      • Carbon footprint of US datacenters is at the same level as the airline industry
      • A 1,000 m² datacenter consumes 1.5 MW, enough to power 1,000 US homes per year
      In order to scale out, we need to scale in with heterogeneous cores in the cloud and in our palms
      [Image label: Typical 1,000 homes in US]
  27. BACKUP
  28. THE NEW VIRTUOUS CYCLE: PERHAPS, LEADING TO COMPUTING LIKE OUR BRAIN
      [Diagram labels: Moore’s law and beyond, More heterogeneous cores, Bigger data, Mining bigger data with Machine Learning, Better user experience]
  29. MASSIVELY PARALLEL WORKLOADS
      • Can keep growing the problem size N
      • The serial workload s can be kept constant
      • The parallel workload is sped up by P, the number of cores
      • The reduction overhead is proportional to log P (by a factor of r)
      • "Embarrassingly" parallel, when there is no reduction overhead (r=0)
      [Figure: time on one core, s + N, compared with time on P cores, s + r log P + N/P; the difference is labeled "Time saved by P cores"]
  30. THE ELEPHANTS: CPU CORES FOR MULTIPLE-INSTRUCTION-MULTIPLE-DATA (MIMD) EXECUTION
      • Retrofitted for moderately parallel workloads, and not very efficient for massively parallel workloads
      • A CPU core runs 1 iteration of the parallel loop
      [Figure: Parallel.For (i) … If (…) … Else …; each ALU is paired with its own Front End; the same color means the same piece of code]
  31. DISCLAIMER & ATTRIBUTION
      The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes. AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
      ATTRIBUTION © 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names are for informational purposes only and may be trademarks of their respective owners.
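
The cost model on slides 11 and 12 can be explored numerically. The following is a minimal sketch, not code from the talk: it assumes the time on one core is s + N and the time on P cores is s + r log P + N/P, and it assumes the logarithm is base 2 (the slides do not state the base). The function names and the power-of-two sweep are illustrative choices.

```python
import math


def speedup(n, p, s=0.5, r=0.5):
    """Speedup of p cores over one core under the assumed cost model.

    n: problem size, p: number of cores, s: serial work,
    r: reduction overhead factor (multiplies log2(p), an assumed base).
    """
    serial_time = s + n
    parallel_time = s + r * math.log2(p) + n / p
    return serial_time / parallel_time


def best_core_count(n, s=0.5, r=0.5, max_p=8192):
    """Power-of-two core count with the highest speedup for a fixed n."""
    candidates = [2 ** k for k in range(int(math.log2(max_p)) + 1)]
    return max(candidates, key=lambda p: speedup(n, p, s, r))


if __name__ == "__main__":
    # Fixed problem sizes: speedup peaks and then falls as log P dominates.
    for n in (16, 64, 256):
        p = best_core_count(n)
        print(f"N={n}: best P={p}, speedup={speedup(n, p):.2f}")
    # Problem size growing with the core count (the P = N curve).
    for p in (16, 256, 4096):
        print(f"P=N={p}: speedup={speedup(p, p):.2f}")
```

Under these assumptions, a fixed N gives a speedup that saturates once the r log P term outweighs N/P, while letting N grow with P keeps the speedup climbing, which is the argument the slides make for ever-larger problems.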

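
Slide 24 describes machine learning as tweaking the knobs {a_i} to minimize the error between f(x_n; {a_i}) and y_n. A minimal sketch of that idea follows, assuming a two-knob linear model, a mean squared error, and plain gradient descent; none of these specific choices come from the talk, and the sample data is made up for illustration.

```python
def model(x, knobs):
    """Hypothetical model f(x; {a_i}) with two knobs: a0 + a1 * x."""
    a0, a1 = knobs
    return a0 + a1 * x


def mean_squared_error(samples, knobs):
    """Average squared gap between f(x_n; {a_i}) and y_n."""
    return sum((model(x, knobs) - y) ** 2 for x, y in samples) / len(samples)


def fit(samples, steps=2000, learning_rate=0.05):
    """Tweak the knobs by gradient descent on the mean squared error."""
    a0, a1 = 0.0, 0.0
    n = len(samples)
    for _ in range(steps):
        # Each sample contributes an independent term, so these sums are
        # the kind of map-then-reduce work that spreads across many cores.
        g0 = sum(2 * (model(x, (a0, a1)) - y) for x, y in samples) / n
        g1 = sum(2 * (model(x, (a0, a1)) - y) * x for x, y in samples) / n
        a0 -= learning_rate * g0
        a1 -= learning_rate * g1
    return a0, a1


# Made-up samples (x_n, y_n) drawn from y = 1 + 2x, with no noise.
samples = [(x, 1.0 + 2.0 * x) for x in (0.0, 1.0, 2.0, 3.0)]
knobs = fit(samples)

if __name__ == "__main__":
    print("knobs:", knobs, "error:", mean_squared_error(samples, knobs))
```

More samples lengthen the per-step sums and more knobs add gradient components; both grow the parallel part of the workload rather than the serial part.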