PR 072 (11th March, 2018)
Taeoh Kim
Storage
• AlexNet (Caffe): 200MB
• VGG-16 (Caffe): 500MB
Energy
• Add Operation: 0.9pJ (32-bit float add)
• Memory Access: 640pJ (32-bit DRAM read)
• Running a 1-billion-connection network at 20fps: 20Hz x 1G x 640pJ = 12.8W for DRAM access alone
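A quick sanity check of that estimate (a minimal sketch; the per-operation energies are the figures quoted above, and the network size and frame rate are the slide's example):

```python
# Back-of-the-envelope check of the slide's energy numbers.
ADD_PJ = 0.9          # pJ per 32-bit float add (quoted above)
DRAM_PJ = 640.0       # pJ per 32-bit DRAM access (quoted above)

connections = 1e9     # 1-billion-connection network
fps = 20              # frames per second

dram_watts = fps * connections * DRAM_PJ * 1e-12   # pJ -> J per second
add_watts = fps * connections * ADD_PJ * 1e-12

print(f"DRAM access power: {dram_watts:.1f} W")    # 12.8 W
print(f"Arithmetic power:  {add_watts:.3f} W")     # 0.018 W -> memory dominates
```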
• Song Han et al., Learning both Weights and Connections for Efficient Neural Networks, NIPS 2015
• Song Han et al., Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding, ICLR 2016 (Best Paper)
• Song Han et al., EIE: Efficient Inference Engine on Compressed Deep Neural Network, ISCA 2016
• Song Han et al., DSD: Dense-Sparse-Dense Training for Deep Neural Networks, ICLR 2017
Pruning → Quantization → Encoding
• Original Network
• Pruning: Reduce # of Weights (9~13x)
• Quantization: Reduce Bits Per Weight (27~31x)
• Encoding: Reduce Total Bits (35~49x)
The JPEG Pipeline (for Analogy)
• 8x8 Image Block → DCT: Compact Data Representation
• Quantization: Reduce Bits Per Representation
• Huffman Coding (Variable-Length / Prefix-Free Code): Reduce Total Bits
All JPEG slides from the course “Image Coding” by Prof. K. H. Sohn, Yonsei University
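A minimal sketch of that JPEG-style pipeline on a single 8x8 block, assuming NumPy/SciPy; the smooth test block and the quantization table are made up for illustration (the real JPEG luminance table is not used):

```python
import numpy as np
from scipy.fftpack import dct

def dct2(block):
    # 2-D DCT: apply the orthonormal 1-D DCT along rows, then columns
    return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

# Smooth, made-up 8x8 "image" block (real JPEG works on Y/Cb/Cr blocks)
x, y = np.meshgrid(np.arange(8), np.arange(8))
block = 128 + 40 * np.cos(np.pi * x / 8) * np.cos(np.pi * y / 8)

coeffs = dct2(block - 128)              # compact representation (DCT)
q_table = 16 + 8 * (x + y)              # made-up table: coarser at high frequencies
quantized = np.round(coeffs / q_table)  # reduce bits per representation

# Most high-frequency coefficients quantize to zero, which is what makes the
# final entropy-coding stage (Huffman) effective.
print("non-zero coefficients:", int(np.count_nonzero(quantized)), "of 64")
```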
Image Compression (JPEG) vs. Deep (Model) Compression
• Input: 8x8 Image Block / Pre-trained Network
• Compact Representation: Discrete Cosine Transform / Iterative Pruning + Retraining
• Quantization: Quantization Table / Quantization + Fine-Tuning
• Encoding: Huffman Coding / Huffman Coding
Pruning: Reduce # of Weights (9~13x)
• Prune small-magnitude connections, then retrain the remaining weights; iterate (see the sketch below)
• Regularization is Important: L1 vs. L2?
• Before Retraining, L1 is Better; After Retraining, L2 is Better
Elements of Statistical Learning by Hastie et al.
Song Han et al., Learning both Weights and Connections for Efficient Neural Networks, NIPS 2015
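A minimal sketch of magnitude pruning, assuming a layer's weights as a NumPy array; the target-sparsity threshold is an illustrative stand-in for the paper's threshold rule, and retraining is only indicated in a comment:

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Zero out the smallest-magnitude weights and return (pruned, mask)."""
    # Threshold chosen so that `sparsity` fraction of weights fall below it
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) > threshold
    return weights * mask, mask

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(256, 512))   # stand-in for one layer's weights

W_pruned, mask = magnitude_prune(W, sparsity=0.9)
print("fraction of connections kept:", mask.mean())   # ~10%

# During retraining, gradients at pruned positions are masked so the removed
# connections stay removed, e.g.:  W -= lr * (grad * mask)
```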
Pruning: Storing the Sparse Weights
• Store Sparse Connections (Indices): Compressed Sparse Row (CSR) format
• 2a + n + 1 numbers, where a = # of non-zero elements and n = # of rows (see the sketch below)
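A self-contained check of that count with scipy.sparse; the small matrix is made up:

```python
import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))
W[np.abs(W) < 1.0] = 0.0           # pretend this is a pruned layer

sparse = csr_matrix(W)
a = sparse.nnz                     # number of non-zero weights
n = W.shape[0]                     # number of rows

stored = sparse.data.size + sparse.indices.size + sparse.indptr.size
# data (a values) + column indices (a) + row pointers (n + 1)
print(stored, "==", 2 * a + n + 1)
```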
Quantization: Reduce Bits Per Weight (27~31x)
• Scalar Quantization: weight sharing via k-means clustering of each layer's weights (see the sketch below)
• Centroid Fine-Tuning: gradients of weights sharing a centroid are accumulated to update that centroid
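A minimal weight-sharing sketch using scikit-learn's KMeans on made-up weights; the paper clusters only the surviving (non-zero) weights per layer and then fine-tunes the centroids, which is omitted here:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
weights = rng.normal(scale=0.1, size=2048)          # stand-in for non-zero weights
n_clusters = 16                                      # 16 centroids -> 4-bit indices

kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
labels = kmeans.fit_predict(weights.reshape(-1, 1))  # per-weight cluster index
codebook = kmeans.cluster_centers_.ravel()           # shared weight values

quantized = codebook[labels]                         # reconstructed weights
print("max quantization error:", np.abs(weights - quantized).max())

# Storage becomes a small float32 codebook plus one low-bit index per weight,
# instead of one float32 per weight.
```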
Quantization Example (4x4 weight matrix, 4 centroids)
• Before Quantization: 16 weights x 32 bits = 512 bits
• After Quantization: 4 centroids x 32 bits + 16 indices x 2 bits = 160 bits
• Compression Rate: 512 / 160 = 3.2x
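The same arithmetic in its general form, as a sketch: n weights of b bits each versus k shared centroids plus a log2(k)-bit index per weight.

```python
import math

def quantization_compression_rate(n, k, b=32):
    original = n * b                        # n weights, b bits each
    compressed = n * math.log2(k) + k * b   # indices + codebook
    return original / compressed

print(quantization_compression_rate(n=16, k=4))   # 3.2, the slide's example
```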
Quantization: Centroid Initialization
• Three schemes are compared: Forgy (random), density-based, and linear initialization
• Linear Initialization is the Best
• Large Weights are Important but rare; linear initialization is the only scheme that keeps centroids near them (see the sketch below)
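A sketch of the three initialization schemes on a made-up bell-shaped weight distribution; the density-based scheme is approximated here with equal-mass quantiles of the empirical CDF:

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(scale=0.05, size=10000)    # bell-shaped, few large weights
k = 16                                          # number of centroids

forgy = rng.choice(weights, size=k, replace=False)              # random samples
density = np.quantile(weights, np.linspace(0, 1, k + 2)[1:-1])  # equal-mass spacing
linear = np.linspace(weights.min(), weights.max(), k)           # equal-value spacing

for name, init in [("forgy", forgy), ("density", density), ("linear", linear)]:
    print(f"{name:8s} largest centroid magnitude: {np.abs(init).max():.3f}")
# Only linear initialization reliably places centroids out at the tails,
# where the rare but important large-magnitude weights live.
```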
Encoding: Reduce Total Bits (35~49x)
• Huffman Coding of the quantized weights and sparse indices: their distributions are highly non-uniform, so frequent symbols get shorter codes (see the sketch below)
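A minimal Huffman-coding sketch over a made-up, skewed distribution of quantized weight indices, to show why skew shrinks the total bit count compared with a fixed-length code:

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Return {symbol: bitstring} for an iterable of symbols."""
    freq = Counter(symbols)
    # Heap items: (frequency, tie-breaker, {symbol: code-so-far})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

# Skewed indices: index 0 (e.g. the near-zero centroid) dominates
indices = [0] * 70 + [1] * 15 + [2] * 10 + [3] * 5
codes = huffman_code(indices)
total = sum(len(codes[s]) for s in indices)
print(codes)                                    # frequent symbols get shorter codes
print(total, "bits vs", 2 * len(indices), "bits fixed-length")
```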
Results
• Effects of Pruning + Quantization
• 8 bits / 5 bits for Conv / FC Layers = No Loss of Accuracy
• AlexNet, VGGNet: FC Compression Rate >> Conv Compression Rate
DSD: Dense-Sparse-Dense Training
Song Han et al., DSD: Dense-Sparse-Dense Training for Deep Neural Networks, ICLR 2017