They proposed two novel methods.
1. Stripe-Wise Pruning (SWP)
They propose a new pruning paradigm called Stripe-Wise Pruning (SWP).
It achieves a higher pruning ratio than filter-wise, channel-wise, and group-wise pruning methods.
2. Filter Skeleton (FS)
They propose a new method, the 'Filter Skeleton', to efficiently learn the optimal shape of the filters for pruning.
They did not compare against many other baselines, but the methods are clearly novel, which is why I chose this paper for review. Moreover, they report that it is the state-of-the-art (SOTA) among recent pruning methods.
[NeurIPS 2020] Pruning Filter in Filter
Data-driven AI Security HCI (DASH) Lab
Pruning Filter in Filter
Minha Kim
Department of Software
Sungkyunkwan University
NeurIPS 2020
May 6, 2020
Pruning?
[NIPS 2015] Learning both Weights and Connections for Efficient Neural Networks, https://arxiv.org/abs/1506.02626
Pruning removes weights from a neural network model.
Weight Pruning
Weight Pruning (WP) prunes individual weights of each filter.
It removes redundant neurons iteratively.
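As a rough illustration (my own sketch, not from the slides or the cited paper), magnitude-based weight pruning can be expressed as zeroing out every weight whose magnitude falls below a threshold; the tensor shapes and the threshold value here are arbitrary.

```python
import torch

def weight_prune(weight: torch.Tensor, threshold: float) -> torch.Tensor:
    """Zero out individual weights whose magnitude is below the threshold."""
    mask = weight.abs() >= threshold   # boolean keep-mask, one entry per weight
    return weight * mask               # pruned weights become exactly 0

# Example: a layer with N=8 filters, C=4 input channels, 3x3 kernels.
w = torch.randn(8, 4, 3, 3)
w_pruned = weight_prune(w, threshold=0.5)
print(f"sparsity: {(w_pruned == 0).float().mean().item():.2%}")
```

Note the result is an unstructured (irregular) sparsity pattern, which is why WP is hard to exploit on standard hardware.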
Pruning?
Notation: C = number of channels, N = number of output channels, W, H = width / height
Filter/Channel Pruning
Filter/Channel Pruning (FP) prunes at the level of whole filters and channels.
It prunes a much larger region at once than weight pruning.
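A minimal sketch of filter-level pruning for comparison (my own illustration, not from the slides): filters are ranked by their L1 norm and the smallest ones are removed, which shrinks the output-channel dimension N; the keep ratio and shapes are arbitrary.

```python
import torch

def filter_prune(weight: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    """Keep only the filters (output channels) with the largest L1 norms."""
    n_filters = weight.shape[0]
    n_keep = max(1, int(n_filters * keep_ratio))
    scores = weight.abs().sum(dim=(1, 2, 3))              # L1 norm of each filter
    keep_idx = scores.topk(n_keep).indices.sort().values  # indices of the kept filters
    return weight[keep_idx]                               # layer now has fewer output channels

w = torch.randn(8, 4, 3, 3)              # N=8 filters, C=4 channels, 3x3 kernels
w_small = filter_prune(w, keep_ratio=0.5)
print(w_small.shape)                     # torch.Size([4, 4, 3, 3])
```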
Pruning?
Group Pruning
It breaks the independence assumption on the filters.
Even when the position within each filter is the same, the importance of the weights differs.
The network may therefore lose representation ability under a large pruning ratio.
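A rough sketch of the group-wise constraint described above (my own illustration, not from the slides): a single C × K × K mask is shared by every filter, so a pruned position disappears from all N filters at once; the scoring rule and threshold here are assumptions.

```python
import torch

def group_prune(weight: torch.Tensor, threshold: float) -> torch.Tensor:
    """Prune the same (c, i, j) position in every filter (group-wise pruning)."""
    position_scores = weight.abs().sum(dim=0)             # C x K x K: magnitude summed over all N filters
    shared_mask = (position_scores >= threshold).float()  # one mask shared by every filter
    return weight * shared_mask                           # broadcast over the filter dimension N

w = torch.randn(8, 4, 3, 3)
w_gp = group_prune(w, threshold=6.0)                      # threshold chosen only for illustration
print((w_gp == 0).all(dim=0).float().mean().item())       # fraction of positions pruned in every filter
```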
Abstract
Background
Model deployment is sometimes costly due to the large number of parameters in DNNs.
To address this, 'pruning', one of the model compression techniques, is used:
Filter Pruning (FP), Channel Pruning (CP), Weight Pruning (WP), Group Pruning (GP), ...
Problem
However, these methods can lose important information because they prune the weights at the same position.
The authors ask whether the optimal kernel size (shape) of each filter can be learned through pruning.
Abstract
Solution
To combine the strengths of filter pruning and weight pruning, they propose Stripe-Wise Pruning (SWP) with a Filter Skeleton (FS).
To learn the 'filter shape' alongside the filter weights, they propose the 'Filter Skeleton (FS)'.
SWP treats each filter as K × K stripes and prunes stripes instead of the whole filter.
It achieves finer granularity than traditional FP while remaining hardware friendly.
Proposed Method – Filter Skeleton(FS)
Stripe Pruning (their proposed method)
It keeps each filter independent of the others, and thus can lead to a more efficient network structure.
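To make the stripe view concrete, here is a small numerical check (my own illustration, not from the slides) that a 3 × 3 convolution equals the sum of its K × K stripe-wise 1 × 1 convolutions, which is why a pruned stripe can simply be dropped at inference.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)        # C=3 input channels, 8x8 feature map
w = torch.randn(4, 3, 3, 3)        # N=4 filters, K=3

full = F.conv2d(x, w, padding=1)   # ordinary 3x3 convolution

# Stripe-wise view: the same output as the sum of K*K shifted 1x1 convolutions.
x_pad = F.pad(x, (1, 1, 1, 1))
stripe_sum = torch.zeros_like(full)
for i in range(3):
    for j in range(3):
        stripe = w[:, :, i, j].unsqueeze(-1).unsqueeze(-1)   # N x C x 1 x 1 "stripe" kernel
        stripe_sum += F.conv2d(x_pad[:, :, i:i + 8, j:j + 8], stripe)

print(torch.allclose(full, stripe_sum, atol=1e-5))  # True: removing a stripe just drops one term
```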
Proposed Method – Filter Skeleton(FS)
Notation: N = number of filters, C = number of channels, K = kernel size, I = Filter Skeleton, H = feature map height, W = feature map width
Filter Skeleton (FS)
• Learnable matrix that reflects the shape of each filter
• Values of FS are first initialized to 1
• For the l-th convolutional layer's weight W ∈ R^(N×C×K×K), the corresponding FS is I ∈ R^(N×K×K)
Each filter has one FS
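A minimal PyTorch-style sketch of the Filter Skeleton idea as described on this slide: a learnable matrix I of shape N × K × K, initialized to 1, is broadcast-multiplied onto the layer's weights in the forward pass. The layer sizes, the initialization scale, and the class name FSConv2d are my own assumptions, not from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FSConv2d(nn.Module):
    """Conv layer whose weights are modulated by a learnable Filter Skeleton (FS)."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int, padding: int = 1):
        super().__init__()
        self.weight = nn.Parameter(
            torch.randn(out_channels, in_channels, kernel_size, kernel_size) * 0.1
        )
        # Filter Skeleton I: one K x K skeleton per filter, initialized to 1.
        self.fs = nn.Parameter(torch.ones(out_channels, kernel_size, kernel_size))
        self.padding = padding

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Broadcast-multiply FS onto the weights: (N, C, K, K) * (N, 1, K, K).
        w = self.weight * self.fs.unsqueeze(1)
        return F.conv2d(x, w, padding=self.padding)

layer = FSConv2d(in_channels=3, out_channels=16, kernel_size=3)
out = layer(torch.randn(1, 3, 32, 32))
print(out.shape)   # torch.Size([1, 16, 32, 32])
```

Because the product of the weights and FS is part of the forward pass, gradients reach both W and I, which is what the loss and gradient equations on the next slide express.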
Proposed Method – Filter Skeleton(FS)
● Filter Skeleton (FS)
○ Loss function: during training, FS is multiplied onto the filter weights, so the objective is min L(f(x; W ⊙ I), y), where (W ⊙ I)_(n,c,i,j) = W_(n,c,i,j) · I_(n,i,j)   (Eqs. 1-2)
○ Gradients of W (weights) and I (Filter Skeleton) follow from the chain rule through this product:
○ ∂L/∂W_(n,c,i,j) = ∂L/∂(W ⊙ I)_(n,c,i,j) · I_(n,i,j) and ∂L/∂I_(n,i,j) = Σ_c ∂L/∂(W ⊙ I)_(n,c,i,j) · W_(n,c,i,j)   (Eqs. 3-4)
Proposed Method – Filter Skeleton(FS)
(Figure: for a 3×3 filter there are nine mini-plots, one per stripe position; the x-axis runs over all N filters and the y-axis shows the summation of the stripes located at the same position across all filters.)
Proposed Method – Stripe-wise pruning (SWP)
• Stripe-Wise Pruning (SWP)
• An L1 penalty on the Filter Skeleton is added to the training loss: L = loss(f(x; W ⊙ I), y) + α · g(I), where α is the magnitude of regularization and g(I) is the L1-norm penalty on I   (Eq. 5)
• Set a threshold δ: stripes whose corresponding values in FS are < δ → not updated during training → pruned
• The convolution can be rewritten as a sum over the K × K stripes, so pruned stripes are simply dropped at inference   (Eq. 6)
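A hedged sketch of the SWP recipe on this slide: train with the task loss plus α times the L1 norm of the Filter Skeleton, then prune stripes whose FS values fall below δ. The α = 1e-5 and δ = 0.05 values mirror the ablation slide, but the toy regression task, optimizer settings, and layer sizes are my own assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# One conv layer with a Filter Skeleton I (N x K x K), as in the earlier sketch.
N, C, K = 16, 3, 3
weight = nn.Parameter(torch.randn(N, C, K, K) * 0.1)
fs = nn.Parameter(torch.ones(N, K, K))                # Filter Skeleton I
alpha, delta = 1e-5, 0.05                             # values reported on the ablation slide

optimizer = torch.optim.SGD([weight, fs], lr=0.1)
x = torch.randn(8, C, 32, 32)
target = torch.randn(8, N, 32, 32)

for step in range(100):
    optimizer.zero_grad()
    w = weight * fs.unsqueeze(1)                      # modulate weights with FS
    out = F.conv2d(x, w, padding=1)
    loss = F.mse_loss(out, target) + alpha * fs.abs().sum()   # Eq. 5: task loss + alpha * g(I)
    loss.backward()
    optimizer.step()

# Stripe-wise pruning: stripes whose FS value is below delta are removed.
mask = (fs.detach().abs() >= delta).float()           # N x K x K keep-mask
pruned_weight = weight.detach() * (fs.detach() * mask).unsqueeze(1)
print(f"kept {int(mask.sum().item())} of {N * K * K} stripes")
```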
Experiments – Group Pruning vs. Stripe Pruning
Group-wise pruning (GP)
They find that with GP, the filters of layer2.7 are identified as invalid because all of their weights are removed during training.
As a result, training cannot continue.
Stripe-wise pruning (SWP)
SWP keeps each filter independent of the others.
Training can continue, and SWP achieves a higher accuracy than GP.
Experiments – Visualization of the Filters Pruned by SWP (VGG19)
• White denotes that the corresponding stripe in the filter is removed by SWP
• In layers close to the input, most preserved filters keep multiple stripes
• In the middle layers, SWP often keeps only one stripe, i.e., the redundancy is decreased
(Figure: for each layer, the filters are displayed according to the frequency of their stripe patterns in that layer, from highest to lowest frequency.)
Experiments – Ablation Study
• How hyper-parameters affect pruning results
• Changing α (magnitude of regularization), δ (threshold)
• α = 1e-5 and δ = 0.05 give an acceptable trade-off between pruning ratio and test accuracy
Conclusion
• Stripe-Wise Pruning (SWP)
- They propose a new pruning paradigm called Stripe-Wise Pruning (SWP).
- It achieves a higher pruning ratio than filter-wise and group-wise pruning methods.
- It achieves finer granularity than traditional FP while remaining hardware friendly.
• Filter Skeleton (FS)
- They propose a new method, the 'Filter Skeleton', to efficiently learn the optimal shape of the filters for pruning.
- Through extensive experiments and analyses, they demonstrate its effectiveness.
• State-of-the-art pruning ratio
- They show that SWP achieves a state-of-the-art pruning ratio on the CIFAR-10 and ImageNet datasets compared to filter-wise, channel-wise, and group-wise pruning.