7. Deep Learning Advances Gated by Hardware
• Results improve with
o Larger Models
o Larger Datasets
=> More Computation
8. Data Set and Model Size Scaling
[Hestness et al., arXiv:1712.00409]
• 256X data
• 64X model size
=> Compute 32,000X
9. How Much Compute?
12 HD Cameras
Inferencing**
• 25 Million Weights
• 300 Gops for HD Image
• 9.4 Tops for 30fps
• 12 Cameras, 3 nets = 338 Tops
Training
• 30 Tops x 10^8 (train set) x 10^2 (epochs) ≈ 10^23 Ops
**ResNet-50
[Bill Dally, SysML 2018]
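The arithmetic behind these figures can be checked in a few lines of Python; a quick sketch using only the per-frame, per-sample, dataset, and epoch figures quoted on the slide (the 10^23 total is the slide's own order-of-magnitude rounding):

# Inference (ResNet-50-class network; figures from the slide)
ops_per_hd_frame = 300e9                    # 300 Gops per HD image
per_camera = ops_per_hd_frame * 30          # 30 fps -> ~9 Tops/s (slide quotes 9.4)
total_inference = 9.4e12 * 12 * 3           # 12 cameras x 3 nets = 338.4 Tops/s

# Training (slide's estimate)
ops_per_sample = 30e12                      # 30 Tops per training sample
total_training = ops_per_sample * 1e8 * 1e2 # 10^8 samples x 10^2 epochs = 3e23 ~ 10^23 ops

print(f"{per_camera/1e12:.0f} Tops/camera, {total_inference/1e12:.0f} Tops inference, "
      f"{total_training:.0e} training ops")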
10. Challenge: End of Line for CMOS Scaling
• Device scaling is slowing down
• Power Density stopped scaling in 2005
[Olukotun, Horowitz][IMEC]
11. Dennard Scaling to Dark Silicon
Can we specialize designs for DNNs?
Taylor, "A landscape of the new dark silicon design regime." IEEE Micro 33.5 (2013): 8-19
13. Energy Cost for DNN Ops
Y = AB + C
Instruction             Operands  Data Type  Energy    Energy/Op
1 Op/Instr (*, +)       Memory    fp32       89 nJ     693 pJ
128 Ops/Instr (AX+B)    Memory    fp32       72 nJ     562 pJ
128 Ops/Instr (AX+B)    Cache     fp32       0.87 nJ   6.82 pJ
128 Ops/Instr (AX+B)    Cache     fp16       0.47 nJ   3.68 pJ
*45nm
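The Energy/Op column appears to be the total kernel energy divided by the 128 multiply-adds in the vectorized AX+B computation; a small check of that reading (the 128-op count is inferred from the ratios, not stated explicitly on the slide):

# Energy per op = total kernel energy / number of multiply-adds (assumed 128)
rows = {
    "1 Op/Instr,    memory, fp32": 89e-9,
    "128 Ops/Instr, memory, fp32": 72e-9,
    "128 Ops/Instr, cache,  fp32": 0.87e-9,
    "128 Ops/Instr, cache,  fp16": 0.47e-9,
}
OPS = 128  # assumed vector length
for name, total_joules in rows.items():
    print(f"{name}: {total_joules / OPS * 1e12:.1f} pJ/op")  # ~693, 562, 6.8, 3.7 pJ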
14. Build a processor tuned to the application - specialize for DNNs
• More effective parallelism for AI Operations (not ILP)
– SIMD vs. MIMD
– VLIW vs. Speculative, out-of-order
• Eliminate unneeded accuracy (see the quantization sketch after this list)
– IEEE FP32/FP64 replaced by lower-precision FP
– 32-64 bit integers to 8-16 bit integers
• More effective use of memory bandwidth
– User-controlled memory versus caches
How many TeraOps per Watt?
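As one concrete instance of the "eliminate unneeded accuracy" point, a minimal symmetric int8 weight-quantization sketch (the scale-by-max scheme and the function names are illustrative choices, not anything specified on the slide):

import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: map the largest-magnitude fp32 weight to 127."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(64, 64).astype(np.float32)
q, s = quantize_int8(w)
print("max abs error:", np.abs(w - dequantize(q, s)).max())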
15. 2-D Grid of Processing Elements
Systolic Arrays
• Balance compute and I/O
• Local Communication
• Concurrency
• M1*M2: n x n matrix multiply in 2n cycles (see the simulation sketch below)
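A toy cycle-level simulation makes the local-communication idea concrete: each processing element only talks to its right and lower neighbours while inputs are streamed in skewed from the edges. This sketch uses an output-stationary formulation, one common variant, in which the last PE finishes after about 3n rather than 2n cycles; other scheduling schemes reach the 2n figure:

import numpy as np

def systolic_matmul(A, B):
    """Output-stationary n x n systolic array: PE (i, j) accumulates C[i, j].
    A streams in from the left (row i delayed by i cycles), B from the top
    (column j delayed by j cycles); data only moves to right/lower neighbours."""
    n = A.shape[0]
    C = np.zeros((n, n))
    a_reg = np.zeros((n, n))   # A-stream value currently latched in PE (i, j)
    b_reg = np.zeros((n, n))   # B-stream value currently latched in PE (i, j)
    for t in range(3 * n - 2):                 # last PE finishes at cycle 3(n-1)
        a_new, b_new = np.zeros((n, n)), np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                # A moves right, B moves down; the edges inject the skewed inputs.
                a_new[i, j] = a_reg[i, j - 1] if j > 0 else (A[i, t - i] if 0 <= t - i < n else 0.0)
                b_new[i, j] = b_reg[i - 1, j] if i > 0 else (B[t - j, j] if 0 <= t - j < n else 0.0)
        a_reg, b_reg = a_new, b_new
        C += a_reg * b_reg                     # every PE does one multiply-accumulate per cycle
    return C

A, B = np.random.randn(4, 4), np.random.randn(4, 4)
assert np.allclose(systolic_matmul(A, B), A @ B)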
17. Datatypes – Reduced Precision
16-bit accuracy matches 32-bit!
Stochastic rounding: 2.2 rounds to 2 with probability 0.8 and to 3 with probability 0.2
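A minimal sketch of stochastic rounding; rounding up with probability equal to the fractional part makes the rounding unbiased in expectation (E[round(2.2)] = 0.8*2 + 0.2*3 = 2.2), which is the property that lets low-precision training keep accuracy:

import numpy as np

def stochastic_round(x, rng=None):
    """Round each element up with probability equal to its fractional part."""
    rng = rng or np.random.default_rng()
    floor = np.floor(x)
    return floor + (rng.random(np.shape(x)) < (x - floor))

samples = stochastic_round(np.full(100_000, 2.2))
print(samples.mean())   # ~2.2 on average; plain round-to-nearest would give 2.0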
19. Extreme Datatypes
• Q: What can we do with single-bit or ternary (+a, -b, 0) representations?
• A: Surprisingly, a lot. Just retrain and/or increase the number of activation layers
[Zhu et al., arXiv:1612.01064]
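A minimal sketch of threshold-based ternary weight quantization in the spirit of the cited work (the 0.7x mean-magnitude threshold and the single shared scale below are simplifications for illustration; the cited trained ternary quantization learns separate positive and negative scales and retrains the network):

import numpy as np

def ternarize(w, threshold_ratio=0.7):
    """Weights below a magnitude threshold become 0; the rest become +a or -a,
    with a set to the mean magnitude of the surviving weights (illustrative only)."""
    delta = threshold_ratio * np.abs(w).mean()
    mask = np.abs(w) > delta
    a = np.abs(w[mask]).mean() if mask.any() else 0.0
    return a * np.sign(w) * mask

w = np.random.randn(8, 8)
print(ternarize(w))   # entries are now only {+a, -a, 0}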
23. Memory and inter-chip communication advances
[Graphcore 2018]
• 64 pJ/B: 16 GB, 900 GB/s @ 60 W
• 10 pJ/B: 256 MB, 6 TB/s @ 60 W
• 1 pJ/B: 1000 x 256 KB, 60 TB/s @ 60 W
• Memory power density is ~25% of logic power density
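The bandwidth numbers follow from dividing a fixed power budget by the energy per byte; a quick check (the 60 W budget and pJ/B figures are from the slide; the off-chip/on-chip/local labels are my reading of the hierarchy, not the slide's):

# Bandwidth at a fixed 60 W memory power budget: BW = P / (energy per byte)
POWER_W = 60.0
for label, pj_per_byte in [("off-chip", 64), ("on-chip SRAM", 10), ("local SRAM", 1)]:
    bw = POWER_W / (pj_per_byte * 1e-12)     # bytes per second
    print(f"{label} ({pj_per_byte} pJ/B): {bw / 1e12:.2f} TB/s")
# -> ~0.94, 6, 60 TB/s, matching the 900 GB/s / 6 TB/s / 60 TB/s above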
26. AI Application-System Co-design
[Reagen et al, Minerva, ISCA 2016]
• Co-design across the algorithm, architecture, and circuit levels
• Optimize DNN hardware accelerators across multiple datasets
27. Where Next?
• Rise of non-Von Neumann Architectures to efficiently solve Domain-Specific Problems
– A new Golden Age of Computer Architecture
• Advances in semiconductor technology and computer architecture will continue to drive advances in AI
• 10 Tops/W is the current State of the Art (~14nm)
• Several orders of magnitude gap with the human brain