For decades we were able to take advantage of Moore's Law to improve single-thread performance and reduce power and cost with each generation of semiconductor technology. Technology has continued to advance since the end of Dennard scaling more than 10 years ago, but the advances have slowed, and server performance increases have relied on growing core counts and power budgets.
At the same time, workloads have changed in the era of cloud computing: scale-out is becoming more important than scale-up, and domain-specific architectures have started to emerge to improve the energy efficiency of emerging workloads like deep learning.
This talk will provide a historical perspective and discuss emerging trends driving the development of modern processors.
1. Qualcomm Datacenter Technologies, Inc.
Emerging Computing Trends in the Datacenter
Dileep Bhandarkar, Ph. D.
Vice President, Technology
HiPEAC18 Keynote – 23 January 2018, Manchester, United Kingdom
2. Outline
• Historical Perspective on 40 Years of Moore’s Law
– Single Core Era enabled by Dennard Scaling
• Post Dennard Scaling Drives Multi-Core Era
• The Shift to Energy Efficient Multi-Core Designs for the Cloud
• Heterogeneous Computing Era with Application Specific Accelerators
3. The First 50 Years after Shockley's Transistor Invention
4. 1958: Jack Kilby's Integrated Circuit
Bob Noyce's Integrated Circuit
My 40+ Year Journey From Mainframes to Smartphones: https://www.youtube.com/watch?v=7ptXpNFY3XM
5. From 2,300 to >1 Billion Transistors
Moore’s Law video at http://www.cs.ucr.edu/~gupta/hpca9/HPCA-PDFs/Moores_Law_Video_HPCA9.wmv
6. Dennard Scaling
Device or Circuit Parameter          | Scaling Factor
Device dimension (tox, L, W)         | 1/K
Doping concentration (Na)            | K
Voltage (V)                          | 1/K
Current (I)                          | 1/K
Capacitance (eA/t)                   | 1/K
Delay time per circuit (VC/I)        | 1/K
Power dissipation per circuit (VI)   | 1/K²
Power density (VI/A)                 | 1
The benefits of scaling: as transistors get smaller, they can switch faster and use less power.
Each new generation of process technology was expected to reduce minimum feature size by
approximately 0.7x (K ≈ 1.4). A 0.7x reduction in linear feature size provided roughly a 2x
increase in transistor density.
Dennard scaling broke down around 2004: interconnect delays did not scale, and voltage and
current could no longer be scaled due to reliability concerns.
But increasing transistor density (Moore’s Law) has continued to enable multicore designs.
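The scaling factors in the table compose in a simple way. A minimal sketch of the ideal Dennard-scaling arithmetic, assuming the per-generation factor K ≈ 1.4 (a 0.7x linear shrink) described above:

```python
# A minimal sketch of ideal Dennard scaling, assuming K ~ 1.4 per generation
# (a 0.7x linear shrink), as described in the table above.

def dennard_scale(generations, k=1.4):
    """Return ideal scaling factors after `generations` process shrinks."""
    feature = (1 / k) ** generations         # linear dimensions scale by 1/K
    density = k ** (2 * generations)         # transistors per area scale by K^2
    delay = (1 / k) ** generations           # circuit delay scales by 1/K
    power = (1 / k) ** (2 * generations)     # power per circuit scales by 1/K^2
    power_density = density * power          # K^2 * 1/K^2 = 1 (constant)
    return feature, density, delay, power, power_density

feature, density, delay, power, power_density = dennard_scale(1)
print(f"one generation: {feature:.2f}x feature size, "
      f"{density:.2f}x density, {power_density:.2f}x power density")
```

Constant power density is the key consequence: doubling the transistor count did not increase watts per square millimeter, which is exactly what stopped holding after 2004.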
7. THE MULTICORE ERA
SINGLE THREAD PERFORMANCE IMPROVEMENT SLOWING DOWN
PERFORMANCE DRIVEN BY HIGHER CORE COUNT
Post Dennard Scaling
9. The Last 5 Generations of ~135W Xeon Processors
Slow improvement in IPC, but per-thread performance constrained by power
Performance data from www.spec.org
8 cores – Mar 2012
10 cores – Sep 2013
12 cores – Sep 2014
14 cores – Apr 2016
18 cores – Jul 2017
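The core counts above make a simple point: at a roughly fixed ~135 W socket budget, each added core shrinks the power available per core, which is what constrains per-thread performance. An illustrative sketch (the even per-core power split is a simplifying assumption, not from the slides):

```python
# Core counts and dates are from the slide; the ~135 W socket budget is
# approximate, and an even per-core power split is a simplifying assumption.
SOCKET_TDP_W = 135

generations = [
    ("Mar 2012", 8),
    ("Sep 2013", 10),
    ("Sep 2014", 12),
    ("Apr 2016", 14),
    ("Jul 2017", 18),
]

for date, cores in generations:
    per_core_w = SOCKET_TDP_W / cores
    print(f"{date}: {cores:2d} cores -> ~{per_core_w:4.1f} W per core")
```

Over these five generations the nominal per-core budget falls from roughly 17 W to 7.5 W, so higher core counts buy throughput, not single-thread speed.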
10. No Improvement in Perf/Watt per Core, even with higher power
Performance data from www.spec.org
14. Disruptions Come from Below!
Mainframes
Minicomputers
RISC Systems
Desktop PCs
Notebooks
Smart Phones
[Chart: successive computer classes plotted by volume vs. performance]
Bell's Law: hardware technology, networks, and interfaces allow new, smaller, more specialized computing devices to be introduced to serve a computing need.
15. Mobile Technology Disrupting the Cloud Datacenter
Qualcomm Datacenter Technologies: uniquely positioned to leverage mobile growth and drive datacenter process leadership
[Chart: fab process nodes over time, 2008–2018]
Then (PC driven): fab process tech driven by PCs – 45nm, 32nm, 22nm, 14nm, 10nm (~256M PC units)
Now (mobile driven): fab process tech driven by mobile phones – 65nm, 45nm, 28nm, 20nm, 14nm, 10nm – 1st in the industry at 10nm (~1.5B smartphone units)
A new world in the datacenter: manufacturing process
16. Qualcomm Centriq™ 2400
What Cloud means for Processor Architecture: throughput performance, thread density, quality of service, energy efficiency
Key metrics:
• Perf / thread
• Perf / Watt
• Perf / mm²
The future requires a new approach to CPU design
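The three key metrics are simple ratios of aggregate throughput to thread count, socket power, and die area. A hypothetical sketch of comparing parts this way (the chip and all its figures are invented placeholders, not measurements of any real processor):

```python
# Hypothetical comparison helper; all figures below are invented placeholders.
from dataclasses import dataclass

@dataclass
class ServerChip:
    """A server processor described by the slide's three key metrics' inputs."""
    name: str
    throughput: float  # aggregate performance, arbitrary units
    threads: int       # hardware thread count
    watts: float       # socket power
    area_mm2: float    # die area

    def perf_per_thread(self) -> float:
        return self.throughput / self.threads

    def perf_per_watt(self) -> float:
        return self.throughput / self.watts

    def perf_per_mm2(self) -> float:
        return self.throughput / self.area_mm2

chip = ServerChip("example", throughput=100.0, threads=48,
                  watts=120.0, area_mm2=400.0)
print(chip.perf_per_thread(), chip.perf_per_watt(), chip.perf_per_mm2())
```

A cloud operator buying racks by the megawatt weighs perf/Watt and perf/mm² heavily, whereas a latency-sensitive service still cares about perf/thread; a design optimized for only one of the three loses.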
17. Datacenter Energy Efficiency Considerations
Source: https://eta.lbl.gov/publications/united-states-data-center-energy, http://perspectives.mvdirona.com/
• US datacenters consumed about 70 billion
kilowatt-hours of electricity in 2014
• Datacenters can cost between $10M and $20M
per megawatt
• Unused datacenter capacity can be expensive
• 1W of server power can cost $1 per year in
energy costs at 10 cents per kWh
• Server power related costs can be 30 to 40%
of overall datacenter operating costs
• Servers need to be designed for efficient
average power consumption instead of just
maximizing peak output efficiency
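The $1-per-watt-year figure above is straightforward arithmetic to check; the facility-overhead (PUE) multiplier at the end is an added assumption, not from the slide:

```python
# One watt drawn continuously for a year, priced at $0.10/kWh (the slide's rate).
HOURS_PER_YEAR = 24 * 365            # 8760 hours
RATE_PER_KWH = 0.10                  # dollars per kWh

kwh_per_year = 1 * HOURS_PER_YEAR / 1000      # 1 W for a year = 8.76 kWh
cost_per_year = kwh_per_year * RATE_PER_KWH   # ~ $0.88, i.e. roughly $1

print(f"1 W for a year: {kwh_per_year} kWh -> ${cost_per_year:.2f}")

# Facility overhead multiplies the server draw; PUE = 1.5 is an assumed
# example value, not a figure from the talk.
PUE = 1.5
print(f"With PUE {PUE}: ${cost_per_year * PUE:.2f}")
```

At datacenter scale the same arithmetic is why a few watts saved per server compounds into millions of dollars per year.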
Better Hyper-efficient Designs Needed to Improve Server Energy Efficiency
22. • Energy efficiency must be an explicit design target
• Desktop PC CPU cores are too power hungry and not energy efficient
• Wimpy cores are not good enough for servers
• Servers can be designed by scaling up energy efficient mobile core design philosophy
• Many workloads run best on different kinds of specialized processing engines
• Each processing engine has its own strengths
Lessons from Mobile Computing
23. • An order of magnitude higher computational efficiency than general-purpose processors
• Can accept inefficient implementation to reduce time to market
• Many potential applications
– Machine Learning
– Encryption
– Data Compression
– Video processing
• Need reasonable volume for business case
• Algorithms need to be stable
• Can they be programmable? Where do FPGAs fit?
The Age of Application Specific Accelerators
24. Before the emergence of DNNs, algorithms and rule-based systems were laboriously hand-coded.
But by 2012, the ingredients for change were available:
Sufficiently powerful GPUs
Readily available large data sets on the internet
The Emergence of Deep Neural Networks
Deep Neural Networks are becoming Pervasive
The turning point - ImageNet Competition 2012
“ImageNet Classification with Deep Convolutional Neural Networks”, Neural Information
Processing Systems Conference (NIPS 2012)
Deep Neural Networks enabled a performance breakthrough
Now DNNs are simpler to develop and deploy, ushering in radical change in many fields and
entire industries
25. Deep Learning is Growing Exponentially
Source: Google
30. Deploying DNNs at Datacenter Scale
Training tends toward concentrated, centralized computation: GPUs and large DPUs
Inference tends toward wide distribution: CPUs and small DPUs
31. CPUs are not powerful enough for training, but have free cycles available for
inference – opportunity for add-in accelerator cards
Instruction Set enhancements can improve performance
GPUs have too much "extra baggage" that adds cost and power for features not
needed for AI – opportunity for domain-specific accelerators
FPGAs offer more flexibility, but are difficult to program and expensive
ASICs are energy and product cost efficient, but less flexible
Deep neural networks are making significant strides in many areas
speech, vision, language, search, robotics, medical imaging & treatment, drug discovery …
We have an opportunity to dramatically reshape our computing devices to
better serve this emerging and growing market
Expect to see lots of innovation and excitement in the years to come
Thoughts on Future Silicon for Deep Learning
32. • Single thread general purpose performance improvement is slowing down
• Energy efficiency is extremely important in datacenters
• ARM architecture enables energy efficient designs with good performance
• Typical-use efficiency is becoming more important than peak output efficiency
in enterprise data centers
• Idle mode power will become more important for servers
• Smart power management can dynamically optimize server operation to
improve efficiency in normal use
• There is plenty of opportunity for innovation on new application specific
architectures targeted for specific workloads
Concluding Remarks