Design Techniques
for DLA
* DLA stands for Deep Learning Accelerator

(draft)
Dark Silicon
Roofline Model
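The roofline model bounds a kernel's attainable throughput by whichever is lower: the chip's compute peak or memory bandwidth times the kernel's arithmetic intensity. A minimal sketch with illustrative (assumed, not chip-specific) numbers:

```python
def roofline(peak_flops, mem_bw_bytes, intensity_flops_per_byte):
    """Attainable FLOP/s = min(compute roof, bandwidth * arithmetic intensity)."""
    return min(peak_flops, mem_bw_bytes * intensity_flops_per_byte)

peak = 10e12  # 10 TFLOP/s compute peak (assumed)
bw = 100e9    # 100 GB/s memory bandwidth (assumed)

print(roofline(peak, bw, 2))    # low intensity  -> memory bound (200 GFLOP/s)
print(roofline(peak, bw, 500))  # high intensity -> compute bound (10 TFLOP/s)
```

Kernels left of the ridge point (intensity below peak/bandwidth) are limited by memory traffic, which is why so many of the techniques in this deck target data movement rather than arithmetic.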
Layer Behaviors
Convolution
• Computation-intensive
• Kernel sizes range from 1x1xC up to about 11x11xC
• Variants:
• Depthwise separable convolution
• Sparse convolution
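The depthwise-separable variant matters for accelerators because it cuts both weights and MACs sharply. A small parameter-count comparison (illustrative shapes, not from a specific network):

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard KxK convolution."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Depthwise KxK per input channel + 1x1 pointwise channel mixing."""
    return k * k * c_in + c_in * c_out

k, c_in, c_out = 3, 256, 256
std = conv_params(k, c_in, c_out)                 # 589,824 weights
dws = depthwise_separable_params(k, c_in, c_out)  # 67,840 weights
print(std, dws, round(std / dws, 1))              # roughly 8.7x fewer
```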
Fully-Connected
• Holds most of the network's weights
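A quick count shows why fully-connected layers dominate the weight budget: flattening a feature map into a wide dense layer multiplies every input against every output. The shapes below are illustrative (AlexNet-like, assumed):

```python
# One 3x3 conv layer, 256 -> 256 channels
conv_w = 3 * 3 * 256 * 256        # 589,824 weights

# One FC layer from a 6x6x256 feature map to 4096 units
fc_w = 6 * 6 * 256 * 4096         # 37,748,736 weights

print(conv_w, fc_w, fc_w // conv_w)  # the FC layer holds 64x more weights
```

The opposite holds for compute: convolutions reuse each weight across every spatial position, so they dominate MACs while FC layers dominate storage and memory traffic.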
CNN Accelerator
Hardware Accelerator Design for Machine Learning, 2016
Filter Decomposition
* A Reconfigurable Streaming Deep Convolutional Neural Network Accelerator for Internet of Things 

For Larger Convolution Kernels
Hardware Accelerator Design for Machine Learning, 2016
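One way a fixed small-kernel engine can serve larger kernels rests on a linear-algebra fact: stacking two 3x3 convolutions (no nonlinearity in between) is exactly one 5x5 convolution whose kernel is the full convolution of the two small kernels. A pure-Python check of that identity (single channel, "valid" padding; this is a generic sketch, not the specific decomposition scheme of the cited IoT accelerator paper):

```python
def conv2d_valid(img, ker):
    """2-D cross-correlation with 'valid' padding, lists of lists."""
    ih, iw, kh, kw = len(img), len(img[0]), len(ker), len(ker[0])
    return [[sum(img[i + di][j + dj] * ker[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(iw - kw + 1)]
            for i in range(ih - kh + 1)]

def conv_full(a, b):
    """Full convolution of two kernels -> the composed larger kernel."""
    ah, aw, bh, bw = len(a), len(a[0]), len(b), len(b[0])
    out = [[0] * (aw + bw - 1) for _ in range(ah + bh - 1)]
    for i in range(ah):
        for j in range(aw):
            for p in range(bh):
                for q in range(bw):
                    out[i + p][j + q] += a[i][j] * b[p][q]
    return out

img = [[(i * 7 + j * 3) % 5 for j in range(6)] for i in range(6)]
a = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
b = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]

stacked = conv2d_valid(conv2d_valid(img, a), b)   # two 3x3 passes
direct = conv2d_valid(img, conv_full(a, b))       # one composed 5x5 pass
print(stacked == direct)  # True
```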
Model Compression
pruned and retrained
Hardware Accelerator Design for Machine Learning, 2016
Deep Compression: Compressing DNNs with Pruning, Trained Quantization and Huffman Coding, 2015
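The prune-and-retrain loop starts by zeroing the smallest-magnitude weights; the network is then retrained with those weights held at zero. A minimal magnitude-pruning sketch (the retraining step is not shown):

```python
def prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with smallest magnitude."""
    k = int(len(weights) * sparsity)
    threshold = sorted(abs(w) for w in weights)[k - 1] if k else float("-inf")
    return [0.0 if abs(w) <= threshold else w for w in weights]

w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.03]
print(prune(w, 0.5))  # -> [0.9, 0.0, 0.4, 0.0, -0.7, 0.0]
```

The accelerator payoff is that the surviving weights can be stored in a sparse format (plus quantized and entropy-coded, per the cited paper), shrinking both model storage and the memory traffic needed per inference.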
Tensor Core
"SIMD" for the GPU
https://devblogs.nvidia.com/programming-tensor-cores-cuda-9/
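What a tensor core actually executes is a fused matrix multiply-accumulate on a small tile, D = A @ B + C (on Volta, 4x4 tiles with FP16 inputs and FP32 accumulation). A plain-Python reference of that tile operation:

```python
def mma_4x4(A, B, C):
    """Tensor-core style tile op: D = A @ B + C on 4x4 matrices."""
    return [[C[i][j] + sum(A[i][k] * B[k][j] for k in range(4))
             for j in range(4)] for i in range(4)]

I4 = [[1 if i == j else 0 for j in range(4)] for i in range(4)]  # identity
Z4 = [[0] * 4 for _ in range(4)]                                  # zero tile
B = [[i * 4 + j for j in range(4)] for i in range(4)]
print(mma_4x4(I4, B, Z4) == B)  # True: I @ B + 0 == B
```

In CUDA the same tile op is exposed through the warp-level WMMA API, and larger GEMMs are tiled into many such fused multiply-accumulates.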
Systolic Array
GotoBLAS library
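A systolic array computes a matrix product with an N x N grid of MAC cells: operands are streamed in skewed by one cycle per row/column, each cell multiplies the two values passing through it and accumulates locally (output-stationary), and only the array edges touch memory, much as GotoBLAS-style blocked GEMM keeps tiles resident in fast storage. A timing-level sketch of that dataflow (assumed output-stationary variant):

```python
def systolic_matmul(A, B):
    """Simulate an output-stationary N x N systolic array computing C = A @ B."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    # At cycle t, cell (i, j) sees A[i][k] and B[k][j] with k = t - i - j,
    # because A is skewed entering from the left and B from the top.
    for t in range(3 * n - 2):
        for i in range(n):
            for j in range(n):
                k = t - i - j
                if 0 <= k < n:
                    C[i][j] += A[i][k] * B[k][j]
    return C

print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# -> [[19, 22], [43, 50]]
```

Each (i, j, k) triple fires exactly once (at cycle t = i + j + k), so the simulation produces the exact matrix product while mirroring the wavefront timing of the hardware.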
General Tricks
• Burst-fetch contiguous blocks
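Burst fetching pays off because DRAM delivers long sequential runs far more efficiently than scattered single-word reads. A sketch of coalescing word addresses into contiguous bursts (hypothetical 4-byte words; function name is illustrative):

```python
def coalesce(addresses, word_size=4):
    """Merge sorted word addresses into (base, length_bytes) burst descriptors."""
    bursts, start, prev = [], None, None
    for a in sorted(addresses):
        if start is None:
            start = prev = a
        elif a == prev + word_size:
            prev = a                      # extends the current burst
        else:
            bursts.append((start, prev + word_size - start))
            start = prev = a              # begin a new burst
    if start is not None:
        bursts.append((start, prev + word_size - start))
    return bursts

print(coalesce([0, 4, 8, 32, 36, 100]))
# -> [(0, 12), (32, 8), (100, 4)]: 6 word reads become 3 bursts
```

Laying out weights and activations so the accelerator's access pattern is contiguous is what makes such coalescing possible in the first place.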
Analog computing
Hardware Accelerator Design for Machine Learning, 2016
Thermal
Dark Memory and Accelerator-Rich System Optimization in the Dark Silicon Era, 2016
Memory Bandwidth
Dark Memory and Accelerator-Rich System Optimization in the Dark Silicon Era, 2016
• Most of the energy is consumed not in computation but in moving data to and from memory
• Widening fetches from 16-bit to 64-bit changes the energy by only about 1.5x
• Zero-copy buffers
• Convolution memory remapping
• Embedded Binarized Neural Networks
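Binarized networks attack the memory-bandwidth problem directly: weights and activations shrink to single bits, and a dot product of +/-1 vectors reduces to XNOR plus popcount. A sketch using Python ints as bit-vectors (+1 encoded as bit 1, -1 as bit 0; encoding choice is an assumption for illustration):

```python
def binary_dot(x_bits, w_bits, n):
    """Dot product of two n-element +/-1 vectors packed as bits: XNOR + popcount."""
    matches = bin(~(x_bits ^ w_bits) & ((1 << n) - 1)).count("1")
    return 2 * matches - n  # each match contributes +1, each mismatch -1

# Two 4-bit sign patterns agreeing in 2 of 4 positions -> dot product 0
print(binary_dot(0b1101, 0b1011, 4))   # -> 0
print(binary_dot(0b1111, 0b1111, 4))   # -> 4 (identical vectors)
```

One machine word then carries 32 or 64 weights per fetch, which is exactly the kind of reduction in data movement the dark-memory analysis above calls for.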

Deep Learning Accelerator Design Techniques