IMAGE RECOGNITION (Microsoft):
  2012 AlexNet: 8 layers, 1.4 GFLOP, ~16% error
  2015 ResNet: 152 layers, 22.6 GFLOP, ~3.5% error
  → 16X larger model

SPEECH RECOGNITION (Baidu):
  2014 Deep Speech 1: 80 GFLOP, 7,000 hrs of data, ~8% error
  2015 Deep Speech 2: 465 GFLOP, 12,000 hrs of data, ~5% error
  → 10X more training ops
Large models are hard to distribute via over-the-air updates
Error rate vs. training time:
  ResNet18:  10.76% error, 2.5 days
  ResNet50:   7.02% error, 5 days
  ResNet101:  6.21% error, 1 week
  ResNet152:  6.16% error, 1.5 weeks
AlphaGo: 1920 CPUs and 280 GPUs, $3000 electric bill per game.
On mobile: drains battery.
larger model => more memory reference => more energy
Relative energy cost per operation (energy in pJ; relative cost spans 1 to 10000 on a log scale):
  32 bit int ADD:        0.1
  32 bit float ADD:      0.9
  32 bit Register File:  1
  32 bit int MULT:       3.1
  32 bit float MULT:     3.7
  32 bit SRAM Cache:     5
  32 bit DRAM Memory:    640
how to make deep learning more efficient?
Energy and area per operation:
  Operation             Energy (pJ)   Area (μm²)
  8b Add                0.03          36
  16b Add               0.05          67
  32b Add               0.1           137
  16b FP Add            0.4           1360
  32b FP Add            0.9           4184
  8b Mult               0.2           282
  32b Mult              3.1           3495
  16b FP Mult           1.1           1640
  32b FP Mult           3.7           7700
  32b SRAM Read (8KB)   5             N/A
  32b DRAM Read         640           N/A
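The table above makes the memory argument concrete. The sketch below is illustrative arithmetic only: it uses the 32-bit DRAM and SRAM read energies listed, and the model sizes (60M weights vs. a 10x smaller compressed model) are hypothetical round numbers, not measurements.

```python
# Per-access energy in picojoules, taken from the table above (32-bit accesses).
DRAM_READ_PJ = 640.0
SRAM_READ_PJ = 5.0

def fetch_energy_mj(num_weights, per_access_pj):
    """Energy (in millijoules) to read every 32-bit weight once."""
    return num_weights * per_access_pj * 1e-12 * 1e3

# Hypothetical: 60M weights fetched from DRAM vs. 6M weights that fit in SRAM.
dram_mj = fetch_energy_mj(60_000_000, DRAM_READ_PJ)
sram_mj = fetch_energy_mj(6_000_000, SRAM_READ_PJ)

print(dram_mj)  # 38.4 mJ per full weight fetch from DRAM
print(sram_mj)  # 0.03 mJ when the compressed model stays on-chip
```

The three-orders-of-magnitude gap is why "larger model => more memory reference => more energy" dominates on mobile.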
• 1. Pruning
• 2. Weight Sharing
• 3. Quantization
• 4. Low Rank Approximation
• 5. Binary / Ternary Net
• 6. Winograd Transformation
Pruning reduces AlexNet from 60 Million parameters to 6M. Analogy: in the polynomial −0.01x² + x + 1, the small −0.01x² term can be pruned with little effect.
[Figure: accuracy loss (+0.5% to −4.5%) vs. parameters pruned away (40%–100%), pruning without retraining.]
[Figure: accuracy loss vs. parameters pruned away — Pruning vs. Pruning+Retraining.]
[Figure: accuracy loss vs. parameters pruned away — Pruning, Pruning+Retraining, and Iterative Pruning and Retraining.]
Pruning RNN and LSTM
• Original: a basketball player in a white uniform is playing with a ball
  Pruned 90%: a basketball player in a white uniform is playing with a basketball
• Original: a brown dog is running through a grassy field
  Pruned 90%: a brown dog is running through a grassy area
• Original: a soccer player in red is running in the field
  Pruned 95%: a man in a red shirt and black and white black shirt is running through a field
• Original: a man is riding a surfboard on a wave
  Pruned 90%: a man in a wetsuit is riding a wave on a beach
Pruning also happens in the human brain: roughly 50 Trillion synapses at birth, 1000 Trillion at 1 year old, and 500 Trillion in adolescence.
[Figure: weight distribution before pruning, after pruning, and after retraining.]
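The prune → retrain loop can be sketched in plain NumPy. This is an illustrative magnitude-pruning step on one weight matrix, not the lecture's full training pipeline; `magnitude_prune` is a hypothetical helper name.

```python
import numpy as np

def magnitude_prune(weights, fraction):
    """Zero out the smallest-magnitude weights; return (pruned, mask).

    During retraining, gradients are multiplied by `mask` so pruned
    connections stay at zero; iterative pruning repeats prune -> retrain
    with a growing `fraction`.
    """
    flat = np.abs(weights).ravel()
    k = int(fraction * flat.size)
    if k == 0:
        return weights.copy(), np.ones_like(weights, dtype=bool)
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    mask = np.abs(weights) > threshold
    return weights * mask, mask

w = np.array([[0.1, -2.0, 0.05], [1.5, -0.2, 3.0]])
pruned, mask = magnitude_prune(w, 0.5)
print(pruned)  # only the three largest-magnitude weights survive
```

A retraining step would then be `w -= lr * (grad * mask)`, keeping the sparsity pattern fixed.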
• 1. Pruning
• 2. Weight Sharing
• 3. Quantization
• 4. Low Rank Approximation
• 5. Binary / Ternary Net
• 6. Winograd Transformation
Weight Sharing: cluster nearby weight values and share one value per cluster, e.g. 2.09, 2.12, 1.92, 1.87 → 2.0, so each 32-bit weight is replaced by a 4-bit cluster index. Pipeline: Cluster the Weights → Generate Code Book → Quantize the Weights with Code Book → Retrain Code Book.
Trained quantization on a 4×4 example:

weights (32 bit float):            cluster index (2 bit uint):    centroids:
  2.09  -0.98   1.48   0.09          3  0  2  1                   3:  2.00
  0.05  -0.14  -1.08   2.12    →     1  1  0  3                   2:  1.50
 -0.91   1.92   0     -1.03          0  3  1  0                   1:  0.00
  1.87   0      1.53   1.49          3  1  2  2                   0: -1.00

gradient:
 -0.03  -0.01   0.03   0.02
 -0.01   0.01  -0.02   0.12
 -0.01   0.02   0.04   0.01
 -0.07  -0.02   0.01  -0.02

During retraining, the gradients are grouped by cluster and reduced (summed per cluster): 3: 0.04, 2: 0.02, 1: 0.04, 0: -0.03. Subtracting lr times these sums from the centroids yields the fine-tuned centroids: 3: 1.96, 2: 1.48, 1: -0.04, 0: -0.97.
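The 4×4 example above can be reproduced with a tiny 1-D k-means pass. Illustrative sketch only (`kmeans_share` is a hypothetical helper; in practice the clustering runs over all weights of a layer), seeded with the slide's code book:

```python
import numpy as np

def kmeans_share(weights, centroids, iters=10):
    """1-D k-means: assign each weight to its nearest centroid, then
    update each centroid to the mean of its assigned weights."""
    w = weights.ravel()
    c = centroids.astype(float).copy()
    for _ in range(iters):
        idx = np.argmin(np.abs(w[:, None] - c[None, :]), axis=1)  # assign
        for j in range(c.size):                                   # update
            if np.any(idx == j):
                c[j] = w[idx == j].mean()
    return idx.reshape(weights.shape), c

w = np.array([[ 2.09, -0.98,  1.48,  0.09],
              [ 0.05, -0.14, -1.08,  2.12],
              [-0.91,  1.92,  0.00, -1.03],
              [ 1.87,  0.00,  1.53,  1.49]])
# Initial code book as on the slide: 0 -> -1.00, 1 -> 0.00, 2 -> 1.50, 3 -> 2.00
idx, codebook = kmeans_share(w, np.array([-1.0, 0.0, 1.5, 2.0]))
decoded = codebook[idx]  # reconstruct weights from 2-bit indices + code book
print(idx)
```

Only `idx` (2 bits/weight) and the 4-entry code book need to be stored; retraining then fine-tunes the code book as described above.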
[Figure: histogram of weight values (count vs. weight value) before and after weight sharing.]
AlexNet on ImageNet
• Infrequent weights: use more bits to represent
• Frequent weights: use fewer bits to represent
Deep Compression pipeline: Train Connectivity → Prune Connections → Train Weights (Pruning: fewer weights) → Cluster the Weights → Generate Code Book → Quantize the Weights with Code Book → Retrain Code Book (Quantization: fewer bits per weight) → Huffman Encoding (Encode Weights, Encode Index).
Original network (original size) → Pruning: 9x-13x reduction, same accuracy → Quantization: 27x-31x reduction, same accuracy → Huffman Encoding: 35x-49x reduction, same accuracy.
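The final Huffman stage can be sketched with a minimal prefix coder over quantized indices. Illustrative only — Deep Compression Huffman-codes both the cluster indices and the sparse index differences; `huffman_code` is a hypothetical helper and the index stream below is made up to show the skewed-distribution win.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix code: frequent symbols get shorter bit strings."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(freq)): "0"}
    # heap entries: (frequency, tie-breaker, {symbol: code-so-far})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)   # two least-frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

# Quantized indices: index 1 dominates, so it should get the shortest code.
indices = [1, 1, 1, 1, 1, 0, 0, 2, 3]
code = huffman_code(indices)
bits = sum(len(code[s]) for s in indices)
print(bits)  # 15 bits, vs. 18 at a flat 2 bits/index
```

The more skewed the index histogram (which pruning and weight sharing both produce), the bigger the gain over fixed-width codes.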
Summary of Deep Compression

Network     Original Size  Compressed Size  Compression Ratio  Original Accuracy  Compressed Accuracy
LeNet-300   1070KB         27KB             40x                98.36%             98.42%
LeNet-5     1720KB         44KB             39x                99.20%             99.26%
AlexNet     240MB          6.9MB            35x                80.27%             80.30%
VGGNet      550MB          11.3MB           49x                88.68%             89.09%
GoogleNet   28MB           2.8MB            10x                88.90%             88.92%
ResNet-18   44.6MB         4.0MB            11x                89.24%             89.28%
Can we make compact models to begin with?
SqueezeNet Fire module: Input → 1x1 Conv (Squeeze) → {1x1 Conv (Expand), 3x3 Conv (Expand)} → Concat/Eltwise → Output
Network     Approach          Size    Ratio  Top-1 Accuracy  Top-5 Accuracy
AlexNet     -                 240MB   1x     57.2%           80.3%
AlexNet     SVD               48MB    5x     56.0%           79.4%
AlexNet     Deep Compression  6.9MB   35x    57.2%           80.3%
SqueezeNet  -                 4.8MB   50x    57.5%           80.3%
SqueezeNet  Deep Compression  0.47MB  510x   57.5%           80.3%
[Figure: average speedup of Deep Compression on CPU, GPU, and mobile GPU (mGPU).]
• 1. Pruning
• 2. Weight Sharing
• 3. Quantization
• 4. Low Rank Approximation
• 5. Binary / Ternary Net
• 6. Winograd
Transformation
• Train with float
• Quantize the weights and activations:
  • Gather the statistics for weights and activations
  • Choose a proper radix point position
• Fine-tune in float format
• Convert to fixed-point format
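The radix-point step above can be sketched for a single tensor. This is an illustrative linear fixed-point quantizer, not any specific framework's implementation; `to_fixed_point` and its integer-bit heuristic are assumptions for the example.

```python
import numpy as np

def to_fixed_point(x, total_bits=8):
    """Pick a radix-point position from the tensor's statistics, then quantize.

    int_bits (incl. sign) covers the largest observed magnitude;
    the remaining bits become the fraction. Returns (quantized values
    as floats, number of fractional bits).
    """
    max_abs = np.max(np.abs(x))
    int_bits = max(0, int(np.ceil(np.log2(max_abs + 1e-12))) + 1)  # +1 for sign
    frac_bits = total_bits - int_bits
    scale = 2.0 ** frac_bits
    q = np.clip(np.round(x * scale),
                -2 ** (total_bits - 1), 2 ** (total_bits - 1) - 1)
    return q / scale, frac_bits

w = np.array([0.30, -1.24, 0.07, 2.9])
qw, frac_bits = to_fixed_point(w, total_bits=8)
print(qw, frac_bits)  # values snapped to a 1/32 grid, 5 fractional bits
```

Fine-tuning in float and then re-converting (as the slide's recipe says) recovers most of the accuracy lost to this rounding.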
[Figure: accuracy (0 to 1) of GoogleNet and VGG-16 under fp32 and 16-, 8-, and 6-bit fixed-point quantization.]
• 1. Pruning
• 2. Weight Sharing
• 3. Quantization
• 4. Low Rank Approximation
• 5. Binary / Ternary Net
• 6. Winograd Transformation
[Figure: weight distribution (count vs. weight value) with quantization thresholds near ±0.05.]
Trained Ternary Quantization: Full Precision Weight → Normalize to [−1, 1] → Quantize with threshold ±t → Intermediate Ternary Weight {−1, 0, +1} → Scale with trained coefficients Wn, Wp → Final Ternary Weight {−Wn, 0, +Wp}. In back-propagation, gradients (gradient1, gradient2) flow both to the scaling coefficients and to the full-precision weights; at inference time only the ternary weights are used.
[Figure: ternary weight values Wp, Wn (above) and the distribution of negatives/zeros/positives (below) over 150 training epochs, for layers res1.0/conv1, res3.2/conv2, and linear of ResNet-20 on CIFAR-10.]
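A single trained-ternary-quantization forward step can be sketched as follows. Illustrative only: `ternarize` is a hypothetical helper, the threshold t is fixed here, and the gradient routing to Wp/Wn and to the full-precision weights is omitted.

```python
import numpy as np

def ternarize(w_full, wp, wn, t=0.05):
    """Forward pass of ternary quantization with trained scales.

    Normalize to [-1, 1], threshold at +/-t to get {-1, 0, +1},
    then scale positives by wp and negatives by wn.
    """
    w_norm = w_full / np.max(np.abs(w_full))        # normalize
    inter = np.where(w_norm > t, 1.0,
             np.where(w_norm < -t, -1.0, 0.0))      # intermediate ternary
    return np.where(inter > 0, wp,
            np.where(inter < 0, -wn, 0.0))          # final: {-Wn, 0, +Wp}

w = np.array([0.9, -0.02, -0.5, 0.01, -1.8])
print(ternarize(w, wp=1.2, wn=0.8))
```

At inference, only the 2-bit ternary codes plus the two per-layer scales (Wp, Wn) are needed, so multiplications reduce to sign-selects and two scalar scalings.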
• 1. Pruning
• 2. Weight Sharing
• 3. Quantization
• 4. Low Rank Approximation
• 5. Binary / Ternary Net
• 6. Winograd
Transformation
Compute bound: direct convolution. A 3x3 filter slides over the image tensor: each output takes 9xC FMAs (math intensive), and each input feeds 9xK FMAs (good data reuse). Direct convolution: we need 9xCx4 = 36xC FMAs for 4 outputs.
Transform Data to Reduce Math Intensity — Winograd convolution: apply a Data Transform to each 4x4 input tile and a Filter Transform to each filter, do point-wise multiplication, sum over C, then apply the Output Transform. Direct convolution: we need 9xCx4 = 36xC FMAs for 4 outputs. Winograd convolution: we need 16xC FMAs for 4 outputs — 2.25x fewer FMAs.
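The arithmetic saving can be checked in one dimension: Winograd F(2,3) computes two outputs of a 3-tap filter with 4 multiplies instead of the direct method's 6 (the 2-D F(2x2,3x3) case on 4x4 tiles nests this same transform). Illustrative sketch with hypothetical helper names:

```python
def winograd_f23(d, g):
    """F(2,3): two outputs of a 3-tap correlation of d (4 input values)
    with filter g, using 4 element multiplies instead of 6."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * (g0 + g1 + g2) / 2
    m3 = (d2 - d1) * (g0 - g1 + g2) / 2
    m4 = (d1 - d3) * g2
    return [m1 + m2 + m3, m2 - m3 - m4]

def direct_f23(d, g):
    """Reference: sliding 3-tap correlation, 6 multiplies for 2 outputs."""
    return [sum(d[i + j] * g[j] for j in range(3)) for i in range(2)]

d, g = [1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 0.25]
print(winograd_f23(d, g))  # matches direct_f23(d, g)
```

The filter-only terms (g0+g1+g2)/2 etc. are computed once per filter, which is why the per-tile cost drops to the point-wise multiplies.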
[Figure: VGG16, batch size 1 — relative performance of cuDNN 5 (Winograd) vs. cuDNN 3 for conv layers 1.1 through 5.0.]
darshancganji12@gmal.com
More Related Content

What's hot

Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality ReductionSaad Elbeleidy
 
Chapter 20 Deep generative models
Chapter 20 Deep generative modelsChapter 20 Deep generative models
Chapter 20 Deep generative modelsKyeongUkJang
 
Training Neural Networks
Training Neural NetworksTraining Neural Networks
Training Neural NetworksDatabricks
 
Feature selection
Feature selectionFeature selection
Feature selectiondkpawar
 
Intro to Feature Selection
Intro to Feature SelectionIntro to Feature Selection
Intro to Feature Selectionchenhm
 
Optimization in deep learning
Optimization in deep learningOptimization in deep learning
Optimization in deep learningRakshith Sathish
 
Optimization/Gradient Descent
Optimization/Gradient DescentOptimization/Gradient Descent
Optimization/Gradient Descentkandelin
 
Linear regression
Linear regressionLinear regression
Linear regressionMartinHogg9
 
Recurrent Neural Networks
Recurrent Neural NetworksRecurrent Neural Networks
Recurrent Neural NetworksCloudxLab
 
Support Vector Machines for Classification
Support Vector Machines for ClassificationSupport Vector Machines for Classification
Support Vector Machines for ClassificationPrakash Pimpale
 
[Paper Review] MisGAN: Learning from Incomplete Data with Generative Adversar...
[Paper Review] MisGAN: Learning from Incomplete Data with Generative Adversar...[Paper Review] MisGAN: Learning from Incomplete Data with Generative Adversar...
[Paper Review] MisGAN: Learning from Incomplete Data with Generative Adversar...Jihoo Kim
 
Autoencoders in Deep Learning
Autoencoders in Deep LearningAutoencoders in Deep Learning
Autoencoders in Deep Learningmilad abbasi
 
PR-217: EfficientDet: Scalable and Efficient Object Detection
PR-217: EfficientDet: Scalable and Efficient Object DetectionPR-217: EfficientDet: Scalable and Efficient Object Detection
PR-217: EfficientDet: Scalable and Efficient Object DetectionJinwon Lee
 
딥러닝 논문읽기 efficient netv2 논문리뷰
딥러닝 논문읽기 efficient netv2  논문리뷰딥러닝 논문읽기 efficient netv2  논문리뷰
딥러닝 논문읽기 efficient netv2 논문리뷰taeseon ryu
 
Word embeddings, RNN, GRU and LSTM
Word embeddings, RNN, GRU and LSTMWord embeddings, RNN, GRU and LSTM
Word embeddings, RNN, GRU and LSTMDivya Gera
 
Feed forward ,back propagation,gradient descent
Feed forward ,back propagation,gradient descentFeed forward ,back propagation,gradient descent
Feed forward ,back propagation,gradient descentMuhammad Rasel
 
Wrapper feature selection method
Wrapper feature selection methodWrapper feature selection method
Wrapper feature selection methodAmir Razmjou
 
Machine Learning lecture6(regularization)
Machine Learning lecture6(regularization)Machine Learning lecture6(regularization)
Machine Learning lecture6(regularization)cairo university
 

What's hot (20)

Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reduction
 
Chapter 20 Deep generative models
Chapter 20 Deep generative modelsChapter 20 Deep generative models
Chapter 20 Deep generative models
 
Training Neural Networks
Training Neural NetworksTraining Neural Networks
Training Neural Networks
 
Feature selection
Feature selectionFeature selection
Feature selection
 
Intro to Feature Selection
Intro to Feature SelectionIntro to Feature Selection
Intro to Feature Selection
 
Optimization in deep learning
Optimization in deep learningOptimization in deep learning
Optimization in deep learning
 
Optimization/Gradient Descent
Optimization/Gradient DescentOptimization/Gradient Descent
Optimization/Gradient Descent
 
Linear regression
Linear regressionLinear regression
Linear regression
 
Recurrent Neural Networks
Recurrent Neural NetworksRecurrent Neural Networks
Recurrent Neural Networks
 
Support Vector Machines for Classification
Support Vector Machines for ClassificationSupport Vector Machines for Classification
Support Vector Machines for Classification
 
[Paper Review] MisGAN: Learning from Incomplete Data with Generative Adversar...
[Paper Review] MisGAN: Learning from Incomplete Data with Generative Adversar...[Paper Review] MisGAN: Learning from Incomplete Data with Generative Adversar...
[Paper Review] MisGAN: Learning from Incomplete Data with Generative Adversar...
 
Autoencoders in Deep Learning
Autoencoders in Deep LearningAutoencoders in Deep Learning
Autoencoders in Deep Learning
 
Genetic programming
Genetic programmingGenetic programming
Genetic programming
 
PR-217: EfficientDet: Scalable and Efficient Object Detection
PR-217: EfficientDet: Scalable and Efficient Object DetectionPR-217: EfficientDet: Scalable and Efficient Object Detection
PR-217: EfficientDet: Scalable and Efficient Object Detection
 
딥러닝 논문읽기 efficient netv2 논문리뷰
딥러닝 논문읽기 efficient netv2  논문리뷰딥러닝 논문읽기 efficient netv2  논문리뷰
딥러닝 논문읽기 efficient netv2 논문리뷰
 
Word embeddings, RNN, GRU and LSTM
Word embeddings, RNN, GRU and LSTMWord embeddings, RNN, GRU and LSTM
Word embeddings, RNN, GRU and LSTM
 
Feed forward ,back propagation,gradient descent
Feed forward ,back propagation,gradient descentFeed forward ,back propagation,gradient descent
Feed forward ,back propagation,gradient descent
 
Wrapper feature selection method
Wrapper feature selection methodWrapper feature selection method
Wrapper feature selection method
 
Chapter 20 - VAE
Chapter 20 - VAEChapter 20 - VAE
Chapter 20 - VAE
 
Machine Learning lecture6(regularization)
Machine Learning lecture6(regularization)Machine Learning lecture6(regularization)
Machine Learning lecture6(regularization)
 

Similar to Model Compression

jpg image processing nagham salim_as.ppt
jpg image processing nagham salim_as.pptjpg image processing nagham salim_as.ppt
jpg image processing nagham salim_as.pptnaghamallella
 
CAE REPORT
CAE REPORTCAE REPORT
CAE REPORTdayahisa
 
Analysis update for GENEVA meeting 2011
Analysis update for GENEVA meeting 2011Analysis update for GENEVA meeting 2011
Analysis update for GENEVA meeting 2011USC
 
Self learning cloud controllers
Self learning cloud controllersSelf learning cloud controllers
Self learning cloud controllersPooyan Jamshidi
 
Week7 Quiz Help Excel File
Week7 Quiz Help Excel FileWeek7 Quiz Help Excel File
Week7 Quiz Help Excel FileBrent Heard
 
Performance tuning
Performance tuningPerformance tuning
Performance tuningJon Haddad
 
Large-scale Experimentation with Network Abstraction for Network Configuratio...
Large-scale Experimentation with Network Abstraction for Network Configuratio...Large-scale Experimentation with Network Abstraction for Network Configuratio...
Large-scale Experimentation with Network Abstraction for Network Configuratio...ARCFIRE ICT
 
Towards Probabilistic Assessment of Modularity
Towards Probabilistic Assessment of ModularityTowards Probabilistic Assessment of Modularity
Towards Probabilistic Assessment of ModularityKevin Hoffman
 
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and ArchitecturesMetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and ArchitecturesMLAI2
 
Comparing Machine Learning Algorithms in Text Mining
Comparing Machine Learning Algorithms in Text MiningComparing Machine Learning Algorithms in Text Mining
Comparing Machine Learning Algorithms in Text MiningAndrea Gigli
 
Fuzzy Control meets Software Engineering
Fuzzy Control meets Software EngineeringFuzzy Control meets Software Engineering
Fuzzy Control meets Software EngineeringPooyan Jamshidi
 
Deep learning in E-Commerce Applications and Challenges (CNN)
Deep learning in E-Commerce Applications and Challenges (CNN) Deep learning in E-Commerce Applications and Challenges (CNN)
Deep learning in E-Commerce Applications and Challenges (CNN) Houda Bakir
 
Dimensionality reduction: SVD and its applications
Dimensionality reduction: SVD and its applicationsDimensionality reduction: SVD and its applications
Dimensionality reduction: SVD and its applicationsViet-Trung TRAN
 
Roll grinding Six Sigma project
Roll grinding Six Sigma projectRoll grinding Six Sigma project
Roll grinding Six Sigma projectTariq Aziz
 
aserra_phdthesis_ppt
aserra_phdthesis_pptaserra_phdthesis_ppt
aserra_phdthesis_pptaserrapages
 
Fuzzy Self-Learning Controllers for Elasticity Management in Dynamic Cloud Ar...
Fuzzy Self-Learning Controllers for Elasticity Management in Dynamic Cloud Ar...Fuzzy Self-Learning Controllers for Elasticity Management in Dynamic Cloud Ar...
Fuzzy Self-Learning Controllers for Elasticity Management in Dynamic Cloud Ar...Pooyan Jamshidi
 
Efficient Implementation of Self-Organizing Map for Sparse Input Data
Efficient Implementation of Self-Organizing Map for Sparse Input DataEfficient Implementation of Self-Organizing Map for Sparse Input Data
Efficient Implementation of Self-Organizing Map for Sparse Input Dataymelka
 
Video Compression Basics by sahil jain
Video Compression Basics by sahil jainVideo Compression Basics by sahil jain
Video Compression Basics by sahil jainSahil Jain
 

Similar to Model Compression (20)

jpg image processing nagham salim_as.ppt
jpg image processing nagham salim_as.pptjpg image processing nagham salim_as.ppt
jpg image processing nagham salim_as.ppt
 
CAE REPORT
CAE REPORTCAE REPORT
CAE REPORT
 
Analysis update for GENEVA meeting 2011
Analysis update for GENEVA meeting 2011Analysis update for GENEVA meeting 2011
Analysis update for GENEVA meeting 2011
 
Self learning cloud controllers
Self learning cloud controllersSelf learning cloud controllers
Self learning cloud controllers
 
Week7 Quiz Help Excel File
Week7 Quiz Help Excel FileWeek7 Quiz Help Excel File
Week7 Quiz Help Excel File
 
Performance tuning
Performance tuningPerformance tuning
Performance tuning
 
Large-scale Experimentation with Network Abstraction for Network Configuratio...
Large-scale Experimentation with Network Abstraction for Network Configuratio...Large-scale Experimentation with Network Abstraction for Network Configuratio...
Large-scale Experimentation with Network Abstraction for Network Configuratio...
 
Towards Probabilistic Assessment of Modularity
Towards Probabilistic Assessment of ModularityTowards Probabilistic Assessment of Modularity
Towards Probabilistic Assessment of Modularity
 
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and ArchitecturesMetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
 
Comparing Machine Learning Algorithms in Text Mining
Comparing Machine Learning Algorithms in Text MiningComparing Machine Learning Algorithms in Text Mining
Comparing Machine Learning Algorithms in Text Mining
 
Fuzzy Control meets Software Engineering
Fuzzy Control meets Software EngineeringFuzzy Control meets Software Engineering
Fuzzy Control meets Software Engineering
 
Taguchi1
Taguchi1Taguchi1
Taguchi1
 
Steganography Part 2
Steganography Part 2Steganography Part 2
Steganography Part 2
 
Deep learning in E-Commerce Applications and Challenges (CNN)
Deep learning in E-Commerce Applications and Challenges (CNN) Deep learning in E-Commerce Applications and Challenges (CNN)
Deep learning in E-Commerce Applications and Challenges (CNN)
 
Dimensionality reduction: SVD and its applications
Dimensionality reduction: SVD and its applicationsDimensionality reduction: SVD and its applications
Dimensionality reduction: SVD and its applications
 
Roll grinding Six Sigma project
Roll grinding Six Sigma projectRoll grinding Six Sigma project
Roll grinding Six Sigma project
 
aserra_phdthesis_ppt
aserra_phdthesis_pptaserra_phdthesis_ppt
aserra_phdthesis_ppt
 
Fuzzy Self-Learning Controllers for Elasticity Management in Dynamic Cloud Ar...
Fuzzy Self-Learning Controllers for Elasticity Management in Dynamic Cloud Ar...Fuzzy Self-Learning Controllers for Elasticity Management in Dynamic Cloud Ar...
Fuzzy Self-Learning Controllers for Elasticity Management in Dynamic Cloud Ar...
 
Efficient Implementation of Self-Organizing Map for Sparse Input Data
Efficient Implementation of Self-Organizing Map for Sparse Input DataEfficient Implementation of Self-Organizing Map for Sparse Input Data
Efficient Implementation of Self-Organizing Map for Sparse Input Data
 
Video Compression Basics by sahil jain
Video Compression Basics by sahil jainVideo Compression Basics by sahil jain
Video Compression Basics by sahil jain
 

Recently uploaded

Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineeringmalavadedarshan25
 
Introduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxIntroduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxvipinkmenon1
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learningmisbanausheenparvam
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝soniya singh
 
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
 
microprocessor 8085 and its interfacing
microprocessor 8085  and its interfacingmicroprocessor 8085  and its interfacing
microprocessor 8085 and its interfacingjaychoudhary37
 
power system scada applications and uses
power system scada applications and usespower system scada applications and uses
power system scada applications and usesDevarapalliHaritha
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 

Recently uploaded (20)

Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...

Model Compression

  • 1.
  • 2. [Title slide images: CC-BY 2.0 and public domain]
  • 3. IMAGE RECOGNITION: 2012 AlexNet, 8 layers, 1.4 GFLOP, ~16% error; 2015 ResNet (Microsoft), 152 layers, 22.6 GFLOP, ~3.5% error (16x model growth). SPEECH RECOGNITION: 2014 Deep Speech 1 (Baidu), 80 GFLOP, 7,000 hrs of data, ~8% error; 2015 Deep Speech 2, 465 GFLOP, 12,000 hrs of data, ~5% error (10x training ops).
  • 4. Hard to distribute large models via over-the-air updates. [Phone and app images: CC-BY 2.0 and public domain]
  • 5. Error rate vs. training time: ResNet18: 10.76%, 2.5 days; ResNet50: 7.02%, 5 days; ResNet101: 6.21%, 1 week; ResNet152: 6.16%, 1.5 weeks.
  • 6. AlphaGo: 1920 CPUs and 280 GPUs, $3,000 electric bill per game; on mobile: drains battery.
  • 7. larger model => more memory reference => more energy
  • 8. larger model => more memory reference => more energy. Relative energy cost (log scale, 1 to 10000): 32-bit int ADD 0.1 pJ; 32-bit float ADD 0.9 pJ; 32-bit register file 1 pJ; 32-bit int MULT 3.1 pJ; 32-bit float MULT 3.7 pJ; 32-bit SRAM cache 5 pJ; 32-bit DRAM memory 640 pJ.
  • 9. how to make deep learning more efficient? larger model => more memory reference => more energy. (Energy table repeated: 32-bit int ADD 0.1 pJ; 32-bit float ADD 0.9 pJ; 32-bit register file 1 pJ; 32-bit int MULT 3.1 pJ; 32-bit float MULT 3.7 pJ; 32-bit SRAM cache 5 pJ; 32-bit DRAM memory 640 pJ.)
  • 10. Operation: energy (pJ) / area (μm²). 8b Add: 0.03 / 36; 16b Add: 0.05 / 67; 32b Add: 0.1 / 137; 16b FP Add: 0.4 / 1360; 32b FP Add: 0.9 / 4184; 8b Mult: 0.2 / 282; 32b Mult: 3.1 / 3495; 16b FP Mult: 1.1 / 1640; 32b FP Mult: 3.7 / 7700; 32b SRAM Read (8KB): 5 / N/A; 32b DRAM Read: 640 / N/A.
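The energy numbers above imply that memory traffic, not arithmetic, dominates. A back-of-the-envelope sketch (the layer size is hypothetical; the per-operation energies come from the table):

```python
# Energy per operation in picojoules, from the table above.
ENERGY_PJ = {"32b_fp_mult": 3.7, "32b_sram_read": 5.0, "32b_dram_read": 640.0}

def layer_energy_uj(n_macs, mult_pj, weight_read_pj):
    """Rough energy (microjoules) for a layer: one multiply plus one
    weight read per multiply-accumulate. Ignores adds and activations."""
    return n_macs * (mult_pj + weight_read_pj) / 1e6

# Hypothetical layer with one million MACs:
fp32_dram = layer_energy_uj(1_000_000, ENERGY_PJ["32b_fp_mult"],
                            ENERGY_PJ["32b_dram_read"])
fp32_sram = layer_energy_uj(1_000_000, ENERGY_PJ["32b_fp_mult"],
                            ENERGY_PJ["32b_sram_read"])
# Keeping the weights on-chip cuts energy roughly 74x (643.7 uJ vs 8.7 uJ),
# which is why compression that fits the model in SRAM matters so much.
```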
  • 11. • 1. Pruning • 2. Weight Sharing • 3. Quantization • 4. Low Rank Approximation • 5. Binary / Ternary Net • 6. Winograd Transformation
  • 12. • 1. Pruning • 2. Weight Sharing • 3. Quantization • 4. Low Rank Approximation • 5. Binary / Ternary Net • 6. Winograd Transformation
  • 13.
  • 16. [Plot: accuracy loss (0.5% down to -4.5%) vs. parameters pruned away (40% to 100%), pruning without retraining]
  • 18. [Plot: accuracy loss vs. parameters pruned away (40% to 100%) for pruning alone, pruning + retraining, and iterative pruning and retraining]
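The pruning curves above come from magnitude-based pruning. A minimal numpy sketch (the threshold rule is simplified, and the retraining loop, which the slides show is what recovers accuracy, is only indicated in a comment):

```python
import numpy as np

def prune_by_magnitude(weights, fraction):
    """Zero out the smallest-magnitude `fraction` of weights; return the
    pruned weights and the binary mask of survivors."""
    k = int(fraction * weights.size)
    threshold = np.sort(np.abs(weights).ravel())[k - 1] if k > 0 else -np.inf
    mask = np.abs(weights) > threshold
    return weights * mask, mask

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
w_pruned, mask = prune_by_magnitude(w, 0.9)  # prune 90% of the weights
# During retraining, gradients are masked so pruned weights stay zero:
#   w -= learning_rate * (grad * mask)
# Iterative pruning repeats prune -> retrain with a growing `fraction`.
```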
  • 20. • Original: a basketball player in a white uniform is playing with a ball; Pruned 90%: a basketball player in a white uniform is playing with a basketball • Original: a brown dog is running through a grassy field; Pruned 90%: a brown dog is running through a grassy area • Original: a soccer player in red is running in the field; Pruned 95%: a man in a red shirt and black and white black shirt is running through a field • Original: a man is riding a surfboard on a wave; Pruned 90%: a man in a wetsuit is riding a wave on a beach
  • 21. 50 Trillion Synapses (Newborn), 500 Trillion Synapses (1 year old), 1000 Trillion Synapses (Adolescent). [Images in the public domain]
  • 22. [Figure: network connectivity before pruning, after pruning, and after retraining]
  • 23. • 1. Pruning • 2. Weight Sharing • 3. Quantization • 4. Low Rank Approximation • 5. Binary / Ternary Net • 6. Winograd Transformation
  • 25. Weight sharing pipeline: Cluster the Weights → Generate Code Book → Quantize the Weights with Code Book → Retrain Code Book. Example: the 32-bit weights 2.09, 2.12, 1.92, 1.87 all map to one 4-bit codebook entry, 2.0.
  • 26. [Animated figure, slides 26–32: weight sharing in detail. A 4×4 matrix of 32-bit float weights (2.09, -0.98, 1.48, 0.09; 0.05, -0.14, -1.08, 2.12; -0.91, 1.92, 0, -1.03; 1.87, 0, 1.53, 1.49) is clustered into four centroids (3: 2.00, 2: 1.50, 1: 0.00, 0: -1.00) and replaced by 2-bit cluster indices. During retraining, the gradients are grouped by cluster index, reduced (summed) within each group, scaled by the learning rate, and subtracted from the centroids, giving fine-tuned centroids (1.96, 1.48, -0.04, -0.97).]
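The clustering step animated in slides 26–32 can be sketched with a 1-D k-means over the weights (the linear centroid initialization and iteration count are illustrative assumptions; the gradient-grouping fine-tuning of the codebook is omitted):

```python
import numpy as np

def share_weights(weights, n_clusters=4, n_iters=20):
    """1-D k-means over the weights: returns per-weight cluster indices
    (2 bits suffice for 4 clusters) plus a small codebook of centroids."""
    flat = weights.ravel()
    # Initialize centroids linearly over the weight range.
    centroids = np.linspace(flat.min(), flat.max(), n_clusters)
    for _ in range(n_iters):
        # Assign each weight to its nearest centroid, then recenter.
        idx = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
        for c in range(n_clusters):
            if np.any(idx == c):
                centroids[c] = flat[idx == c].mean()
    return idx.reshape(weights.shape), centroids

# The 4x4 example from the slides:
w = np.array([[ 2.09, -0.98,  1.48,  0.09],
              [ 0.05, -0.14, -1.08,  2.12],
              [-0.91,  1.92,  0.00, -1.03],
              [ 1.87,  0.00,  1.53,  1.49]])
idx, codebook = share_weights(w)   # store 2-bit indices + 4 float centroids
w_decoded = codebook[idx]          # decode by codebook lookup
```

On this example the clustering recovers exactly the slide's codebook: -1.00, 0.00, 1.50, 2.00.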
  • 36.
  • 37.
  • 38.
  • 40. Huffman Encoding: Encode Weights, Encode Index. • Infrequent weights: use more bits to represent • Frequent weights: use fewer bits to represent
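The Huffman step can be sketched over a stream of cluster indices (the frequencies below are a toy example; a production encoder would also serialize the code tree alongside the bitstream):

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix code: frequent symbols get short codewords,
    infrequent symbols get long ones."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, unique tiebreaker, {symbol: codeword suffix}).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        # Merge the two least-frequent subtrees, extending their codewords.
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

# Toy stream of 2-bit cluster indices, skewed toward index 0:
stream = [0] * 8 + [1] * 4 + [2] * 2 + [3] * 2
code = huffman_code(stream)
total_bits = sum(len(code[s]) for s in stream)  # 28 bits vs 32 fixed-length
```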
  • 41. Summary of Deep Compression. Start from the original network (original size). Pruning, fewer weights (Train Connectivity → Prune Connections → Train Weights): 9x-13x reduction, same accuracy. Quantization, fewer bits per weight (Cluster the Weights → Generate Code Book → Quantize the Weights with Code Book → Retrain Code Book): 27x-31x reduction, same accuracy. Huffman Encoding (Encode Weights, Encode Index): 35x-49x reduction, same accuracy.
  • 42. Network: Original Size → Compressed Size (Ratio), Original Accuracy → Compressed Accuracy. LeNet-300: 1070KB → 27KB (40x), 98.36% → 98.42%. LeNet-5: 1720KB → 44KB (39x), 99.20% → 99.26%. AlexNet: 240MB → 6.9MB (35x), 80.27% → 80.30%. VGGNet: 550MB → 11.3MB (49x), 88.68% → 89.09%. GoogleNet: 28MB → 2.8MB (10x), 88.90% → 88.92%. ResNet-18: 44.6MB → 4.0MB (11x), 89.24% → 89.28%. Can we make compact models to begin with?
  • 43. [SqueezeNet Fire module: Input → 1x1 Conv Squeeze → {1x1 Conv Expand, 3x3 Conv Expand} → Concat/Eltwise → Output]
  • 44. Network (Approach): Size, Ratio, Top-1 / Top-5 Accuracy. AlexNet: 240MB, 1x, 57.2% / 80.3%. AlexNet (SVD): 48MB, 5x, 56.0% / 79.4%. AlexNet (Deep Compression): 6.9MB, 35x, 57.2% / 80.3%. SqueezeNet: 4.8MB, 50x, 57.5% / 80.3%. SqueezeNet (Deep Compression): 0.47MB, 510x, 57.5% / 80.3%.
  • 48. • 1. Pruning • 2. Weight Sharing • 3. Quantization • 4. Low Rank Approximation • 5. Binary / Ternary Net • 6. Winograd Transformation
  • 49. •Train with float •Quantize the weights and activations: • Gather the statistics for weights and activations • Choose a proper radix point position •Fine-tune in float format •Convert to fixed-point format
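The radix-point step of the recipe can be sketched as follows (the "statistics" here are just the max magnitude, and the rule for splitting integer vs. fraction bits is a simplifying assumption; fine-tuning is omitted):

```python
import numpy as np

def to_fixed_point(x, total_bits=8):
    """Quantize floats to signed fixed-point. The radix point is chosen
    from the data statistics: use just enough integer bits to cover
    max|x|, and spend the remaining bits on the fraction."""
    int_bits = max(0, int(np.floor(np.log2(np.max(np.abs(x)) + 1e-12))) + 1)
    frac_bits = total_bits - 1 - int_bits  # one bit reserved for the sign
    scale = 2.0 ** frac_bits
    qmin, qmax = -(2 ** (total_bits - 1)), 2 ** (total_bits - 1) - 1
    # Round to the nearest representable value and saturate.
    q = np.clip(np.round(x * scale), qmin, qmax)
    return q / scale, frac_bits

x = np.array([0.7, -1.3, 2.9, -0.02])
xq, frac_bits = to_fixed_point(x, total_bits=8)  # Q2.5 format for this data
```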
  • 50.
  • 51.
  • 52.
  • 53. [Bar charts: accuracy (0 to 1.0) of GoogleNet and VGG-16 under fp32, fixed-16, fixed-8, and fixed-6 precision]
  • 54. • 1. Pruning • 2. Weight Sharing • 3. Quantization • 4. Low Rank Approximation • 5. Binary / Ternary Net • 6. Winograd Transformation
  • 55.
  • 56.
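Slides 55–56 (figures in the original deck) cover low-rank approximation. One common instantiation, shown here as a hedged sketch, factorizes a layer's weight matrix with a truncated SVD (the layer shape and noise level are hypothetical):

```python
import numpy as np

def low_rank_factorize(W, rank):
    """Approximate an m x n weight matrix by two thin factors, replacing
    one m x n layer with an m x r layer followed by an r x n layer.
    Multiplies drop from m*n to r*(m+n) when r is small."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # m x r factor (absorbs singular values)
    B = Vt[:rank, :]             # r x n factor
    return A, B

rng = np.random.default_rng(0)
# A hypothetical nearly low-rank layer: rank-8 signal plus small noise.
W = (rng.normal(size=(256, 8)) @ rng.normal(size=(8, 256))
     + 0.01 * rng.normal(size=(256, 256)))
A, B = low_rank_factorize(W, rank=8)
rel_err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
```

Here a 256×256 layer (65,536 weights) becomes two factors totaling 4,096 weights, a 16x parameter reduction, with under 1% reconstruction error on this nearly rank-8 example.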
  • 57. • 1. Pruning • 2. Weight Sharing • 3. Quantization • 4. Low Rank Approximation • 5. Binary / Ternary Net • 6. Winograd Transformation
  • 58. [Histogram: weight value distribution (counts up to 6400) concentrated near zero, quantized to ternary values {-1, 0, 1} with threshold ±0.05]
  • 59. Trained Quantization: Normalize the full-precision weights to [-1, 1] → Quantize with threshold t to intermediate ternary weights {-1, 0, 1} → Scale by trained coefficients Wp (positives) and Wn (negatives) to get final ternary weights {-Wn, 0, Wp}. Feed forward uses the ternary weights; back propagation passes gradient1 to the full-precision weights and gradient2 to the scaling coefficients; inference time uses the trained quantization.
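The forward quantization rule in that diagram can be sketched as follows (here the threshold t and the scaling coefficients wp/wn are fixed hyperparameters for illustration; in the trained scheme wp and wn are learned and t is set per layer):

```python
import numpy as np

def ternarize(w, t=0.05, wp=1.0, wn=1.0):
    """Forward pass of ternary quantization: normalize to [-1, 1],
    threshold at +/-t, scale positives by wp and negatives by wn."""
    w_norm = w / np.max(np.abs(w))
    # Values inside the dead zone [-t, t] become exactly zero.
    return np.where(w_norm > t, wp, np.where(w_norm < -t, -wn, 0.0))

w = np.array([0.8, -0.03, -0.6, 0.01, -1.2, 0.4])
w_ternary = ternarize(w, t=0.05)
```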
  • 60. [Figure: ternary weight values (above) and the percentage of negatives/zeros/positives (below) over training epochs (0 to 150) for different layers (res1.0/conv1, res3.2/conv2, linear) of ResNet-20 on CIFAR-10]
  • 61. • 1. Pruning • 2. Weight Sharing • 3. Quantization • 4. Low Rank Approximation • 5. Binary / Ternary Net • 6. Winograd Transformation
  • 62. Compute bound: direct convolution of a filter with an image tensor. Each output needs 9xC FMAs (math intensive); each input feeds 9xK FMAs (good data reuse). Direct convolution: we need 9xCx4 = 36xC FMAs for 4 outputs.
  • 63. Transform Data to Reduce Math Intensity: Filter Transform and Data Transform → point-wise multiplication → sum over C → output transform of the 4x4 tile. Direct convolution: we need 9xCx4 = 36xC FMAs for 4 outputs. Winograd convolution: we need 16xC FMAs for 4 outputs: 2.25x fewer FMAs.
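The arithmetic saving comes from Winograd's minimal filtering algorithm. A 1-D F(2,3) sketch shows the idea; the 2-D F(2x2, 3x3) case on the slide nests this transform to get 16 multiplies for 4 outputs:

```python
import numpy as np

def winograd_f23(d, g):
    """Winograd F(2,3): two outputs of a 3-tap filter in 4 multiplies,
    where direct computation needs 6. The additions and divisions on g
    are filter transforms that can be precomputed once per filter."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * (g0 + g1 + g2) / 2
    m3 = (d2 - d1) * (g0 - g1 + g2) / 2
    m4 = (d1 - d3) * g2
    return np.array([m1 + m2 + m3, m2 - m3 - m4])

d = np.array([1.0, 2.0, 3.0, 4.0])   # 4 input samples
g = np.array([0.5, -1.0, 0.25])      # 3-tap filter
# Direct (valid, non-flipped) convolution for comparison:
direct = np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                   d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
```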
  • 64. [Bar chart: VGG16, batch size 1, relative performance (0 to 2.0) of cuDNN 3 vs. cuDNN 5 on conv layers 1.1 through 5.0]
  • 65.
  • 66.
  • 67.
  • 68.
  • 69.
  • 70.
  • 73.
  • 74.