
Fractional step discriminant pruning


This is the presentation for the paper "Fractional Step Discriminant Pruning: A Filter Pruning Framework for Deep Convolutional Neural Networks", delivered by N. Gkalelis and V. Mezaris at the 7th IEEE Int. Workshop on Mobile Multimedia Computing (MMC2020) that was held as part of the IEEE Int. Conf. on Multimedia and Expo (ICME), in July 2020.



  1. Fractional step discriminant pruning: a filter pruning framework for deep convolutional neural networks
     N. Gkalelis, V. Mezaris
     CERTH-ITI, Thermi - Thessaloniki, Greece
     IEEE Int. Conf. on Multimedia & Expo Workshops, 7th MMC, London, United Kingdom, July 2020
     retv-project.eu @ReTV_EU @ReTVproject retv-project retv_project
  2. Outline
     • Problem statement
     • Related work
     • Filter importance measure
     • Fractional step pruning strategy
     • Experiments
     • Conclusions
  3. Problem statement
     • Deep convolutional neural networks (DCNNs) are seeing significant commercial deployment due to their breakthrough classification performance in many machine learning tasks:
       • Multimedia understanding
       • Self-driving cars
       • Edge computing
     [Images: application examples; credits: V2Gov and [1]]
     [1] Chen, J., Ran, X.: Deep Learning With Edge Computing: A Review, Proc. of the IEEE, vol. 107, no. 8, Aug. 2019
  4. Problem statement
     • The deployment of DCNNs in resource-limited or real-time applications is still challenging due to their high inference time and storage requirements
     • DCNNs are highly overparameterized, and methods that reduce their capacity may even be beneficial for their performance [2]
     → How can we reduce the size of DCNNs while retaining their generalization performance?
     [2] Arora, S., Ge, R., Neyshabur, B., Zhang, Y.: Stronger generalization bounds for deep nets via a compression approach, ICML, 2018
  5. Related work
     • DCNN compression and acceleration methods can be categorized into: a) pruning, b) low-rank factorization, c) compact convolutional filters, d) knowledge distillation [3, 4]
     • Filter pruning is receiving increasing attention because: a) it achieves high compression rates with small performance degradation, b) it is complementary to the methods of the other three categories
     • A filter pruning framework consists of: a) a filter importance estimation criterion, usually smaller-norm-less-important, b) a pruning strategy, usually an iterative one (training, pruning, retraining, …)
     [3] K. Ota, M.S. Dao, V. Mezaris, F.G.B. De Natale: Deep Learning for Mobile Multimedia: A Survey, ACM Trans. Multimedia Computing Communications & Applications (TOMM), vol. 13, no. 3s, June 2017
     [4] Y. Cheng, D. Wang, P. Zhou, T. Zhang: Model Compression and Acceleration for Deep Neural Networks: The Principles, Progress, and Challenges, IEEE Signal Processing Magazine, vol. 35, no. 1, pp. 126-136, Jan. 2018
  6. Related work
     • In [5], it is shown that pruning filters with a small l2-norm may have a negative impact on the network's performance
     • FPGM is proposed, utilizing a Geometric Median (GM) based measure
     • FPGM selects a fraction of the filters using the l2-norm (usually 10%) and the rest using the GM-based measure
     → An iterative strategy is used (training, pruning, retraining, …), where all filters corresponding to the target pruning rate are pruned at each iteration
     [5] Y. He, P. Liu, Z. Wang, Z. Hu, Y. Yang: Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration, CVPR, 2019
  7. Related work
     • In [6], it is shown that the iterative pruning strategy, where all selected filters are set to zero from the first iteration, may lead to unrecoverable information loss
     • Asymptotic pruning strategy: an iterative strategy, but the number of selected filters at each iteration varies asymptotically toward the target pruning rate
     → The l2-norm measure is used to select the filters at each iteration
     [6] Y. He, X. Dong, G. Kang, Y. Fu, C. Yan, Y. Yang: Asymptotic Soft Filter Pruning for Deep Convolutional Neural Networks, IEEE Trans. on Cybernetics, pp. 1-11, Aug. 2019
  8. Overview of proposed method
     • Motivated by limitations in recent works [5, 6] and related research findings in shallow learning [7, 8, 9], we extend [6] by:
     • Replacing the l2-norm-based criterion with: a) a Class-Separability (CS) based measure exploiting the labelling information of annotated training datasets [7, 8, 9], b) the GM-based measure [5]
     • Applying a fractional step pruning strategy: not only the number of selected filters but also their weights vary asymptotically toward their target value
     [7] N. Gkalelis, V. Mezaris, I. Kompatsiaris, T. Stathaki: Mixture Subclass Discriminant Analysis Link to Restricted Gaussian Model and Other Generalizations, IEEE Trans. Neural Networks and Learning Systems, vol. 24, no. 1, pp. 8-21, Jan. 2013
     [8] R. Lotlikar, R. Kothari: Fractional-step dimensionality reduction, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 6, pp. 623-627, June 2000
     [9] K. Fukunaga: Introduction to statistical pattern recognition (2nd ed.), Academic Press Professional, Inc., San Diego, CA, USA, 1990
  9. Importance measure 1: CS-based
     • Suppose an annotated training dataset of n observations and m classes
     • Let X_k^(i,j) be the feature map of the k-th observation at the j-th filter of the i-th layer
     • The feature maps are vectorized and stacked to form the data matrix X^(i,j) for filter (i,j):
       X^(i,j) = [x_1^(i,j), …, x_n^(i,j)],  x_k^(i,j) = vec(X_k^(i,j))
     • A filter discriminant score is then computed as
       η^(i,j) = tr(S^(i,j)),
       S^(i,j) = Σ_{p=1}^{m-1} Σ_{q=p+1}^{m} (μ_p^(i,j) − μ_q^(i,j)) (μ_p^(i,j) − μ_q^(i,j))^T,
     where S^(i,j) is the between-class scatter matrix for filter (i,j) (it can be computed efficiently; see paper for details) and μ_p^(i,j) is the mean vector of class p (class labels are used to compute the means)
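The CS-based score above can be sketched in a few lines of NumPy. This is an illustrative implementation, not the paper's optimized one; it uses the identity tr((μ_p − μ_q)(μ_p − μ_q)^T) = ||μ_p − μ_q||², so the trace of the between-class scatter reduces to a sum of squared distances between class means:

```python
import numpy as np

def cs_score(feature_maps, labels):
    """Class-separability score of one filter: the trace of the
    between-class scatter matrix of its vectorized feature maps.
    feature_maps: (n, H, W) responses of a single filter for n
    observations; labels: (n,) integer class labels."""
    X = feature_maps.reshape(len(feature_maps), -1)  # vectorize each map
    classes = np.unique(labels)
    # per-class mean vectors mu_p
    means = np.stack([X[labels == c].mean(axis=0) for c in classes])
    # tr(S) = sum over class pairs (p < q) of ||mu_p - mu_q||^2
    score = 0.0
    for p in range(len(classes) - 1):
        for q in range(p + 1, len(classes)):
            d = means[p] - means[q]
            score += float(d @ d)
    return score
```

Filters whose scores are smallest extract little class-discriminant information and would be the pruning candidates.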
 10. Importance measure 1: CS-based
     • tr(S^(i,j)) quantifies the distance among class distributions using the features produced by the corresponding filter [7, 8, 9]
     • A large value indicates that the filter extracts discriminant features for separating the classes
     • In contrast, filters that extract noise or features irrelevant to the classification task attain very small CS values and can be discarded safely
     [Figure: feature distributions of two filters (i,1) and (i,2) with class means μ_1, μ_2; tr(S^(i,1)) is large, while tr(S^(i,2)) is very small: despite a possibly large l2-norm ||v^(i,2)|| > ||v^(i,1)||, filter (i,2) can be safely discarded]
 11. Importance measure 2: GM-based
     • For large pruning rates, the CS-based criterion may eliminate filters that extract features with small but still important discriminant information
     • The GM-based measure identifies the most replaceable filters in a layer [5]:
       η^(i,j) = Σ_{o=1}^{c_i} ||v^(i,j) − v^(i,o)||,
     where v^(i,j) is the vectorized weight of the j-th filter in the i-th layer and c_i is the number of filters in that layer
     • Combined selection strategy: select a fraction of the filters using the CS-based measure and another fraction using the GM-based one
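The GM-based measure is a one-liner in NumPy (a minimal sketch, assuming the layer's filters are already given as vectorized weight rows):

```python
import numpy as np

def gm_score(filters, j):
    """GM-based replaceability score of filter j within a layer:
    the sum of l2 distances from filter j to all filters in the
    layer [5]. A small score means filter j lies close to the
    others (near their geometric median) and is most replaceable.
    filters: (c_i, d) array of vectorized filter weights."""
    return float(np.linalg.norm(filters - filters[j], axis=1).sum())
```

Note the contrast with the CS measure: this criterion uses only the filter weights, not labelled feature maps, which is why the two can be combined.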
 12. Fractional step pruning strategy
     • Let ε and θ be the total number of epochs and the target pruning rate
     • The pruning rate θ_ι and scaling factor ζ_ι at epoch ι are computed as:
       θ_ι = α e^(−βι) + γ,  ζ_ι = 1 − θ_ι / θ
     • The parameters α, β, γ are estimated using 3 known points, similarly to [6]
     • The individual pruning rates for the CS- and GM-based criteria are:
       θ_ι^CS = min(θ_ι, θ_f),  θ_ι^GM = θ_ι − θ_ι^CS
     • θ_f is the final pruning rate associated with the CS measure (e.g. 10%)
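The schedule can be sketched as below. The paper estimates α, β, γ by fitting three known points as in [6]; that fit is not reproduced here, so the constants chosen (rate 0 at epoch 0, asymptote θ, about 99% of θ reached at the final epoch) are illustrative assumptions, not the authors' exact values:

```python
import math

def pruning_rates(epoch, total_epochs, theta, theta_f):
    """Fractional step schedule (illustrative constants, not the
    paper's exact 3-point fit): theta_e = alpha*exp(-beta*epoch) + gamma.
    Returns the epoch's overall pruning rate, the weight scaling
    factor, and the split between the CS- and GM-based criteria."""
    gamma = theta                            # asymptotic target pruning rate
    alpha = -theta                           # so the rate is 0 at epoch 0
    beta = math.log(100.0) / total_epochs    # ~99% of theta at the last epoch
    theta_e = alpha * math.exp(-beta * epoch) + gamma
    zeta_e = 1.0 - theta_e / theta           # scaling factor for pruned weights
    theta_cs = min(theta_e, theta_f)         # fraction selected by the CS measure
    theta_gm = theta_e - theta_cs            # remainder selected by the GM measure
    return theta_e, zeta_e, theta_cs, theta_gm
```

The point of ζ_ι is that selected filters are scaled down rather than zeroed outright, so their weights fade toward zero asymptotically and information loss in early epochs remains recoverable.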
 13. Experiments
     • CIFAR10 [10]: 10 classes, 32 x 32 color images, 50000 training and 10000 testing observations
     • ImageNet32 [11]: ILSVRC-2012 with images resized to 32 x 32; 1000 classes, 1281167 training and 50000 testing observations
     • GSC (ver. 0.01) [12]: 12 classes, speech utterances, 51094 training, 6798 validation and 6835 testing observations
     • Comparison with MIL [13], PFEC [14], CP [15], SFP [16], FPGM [5], ASFP [6]
     [10] Krizhevsky, A.: Learning multiple layers of features from tiny images, Tech. rep., 2009
     [11] P. Chrabaszcz, I. Loshchilov, F. Hutter: A downsampled variant of ImageNet as an alternative to the CIFAR datasets, CoRR, vol. abs/1707.08819, 2017
     [12] P. Warden: Speech commands: A dataset for limited-vocabulary speech recognition, CoRR, vol. abs/1804.03209, 2018
     [13] X. Dong et al.: More is less: A more complicated network with less inference complexity, CVPR, Honolulu, HI, USA, July 2017
     [14] H. Li et al.: Pruning filters for efficient convnets, ICLR, Toulon, France, Apr. 2017
     [15] Y. He, X. Zhang, J. Sun: Channel pruning for accelerating very deep neural networks, ICCV, Venice, Italy, Oct. 2017
     [16] Y. He et al.: Soft filter pruning for accelerating deep convolutional neural networks, IJCAI, Stockholm, Sweden, July 2018
 14. Experiments
     • Experimental setup for CIFAR10 and ImageNet32: same as in FPGM [5] and ASFP [6]
     • Images are normalized to zero mean and unit variance; data augmentation is applied (cropping, mirroring, flipping, etc.)
     • ResNet, CE loss, minibatch SGD, Nesterov momentum 0.9, batch size 128, weight decay 0.0005, ε = 200
     • Initial learning rate 0.01, divided by 5 at epochs 60, 120, 160 for CIFAR10, and by 10 every 10 epochs for ImageNet32
 15. Experiments
     • Experimental setup for GSC as in [17]
     • Log mel-spectrograms (LMSs) are used to represent the speech commands, deriving a 32 x 32 LMS for each recording: 16 kHz sampling rate, STFT with Hamming window of size 1024, hop length 512, 32 mel filterbanks, etc.
     • Augmentation: pitch shifting, mixing with background noise, etc.
     • ResNet, CE loss, minibatch SGD, Nesterov momentum 0.9, batch size 96, weight decay 0.0005, ε = 70; initial learning rate 0.01, divided by 10 at epoch 50
     [17] J. Salamon, J. P. Bello: Deep convolutional neural networks and data augmentation for environmental sound classification, IEEE Signal Process. Lett., vol. 24, no. 3, pp. 279-283, Mar. 2017
 16. Experiments
     • Correct classification rates (CCRs) on CIFAR10 with pruning rates θ = 40%, 50%

     θ = 40%     No pr.   MIL [13]  PFEC [14]  CP [15]  SFP [16]  ASFP [6]  FPGM [5]  FSDP(θ_f=10%)  FSDP(θ_f=40%)
     ResNet20    92.20%   91.43%    ---        ---      90.83%    ---       91.99%    92.02%         92.09%
     ResNet56    93.59%   ---       91.31%     90.90%   92.26%    92.44%    92.89%    93.13%         93.10%
     ResNet110   93.68%   93.44%    92.44%     ---      93.38%    93.20%    93.85%    93.91%         93.93%

     θ = 50%     FPGM [5]  FSDP(θ_f=10%)
     ResNet20    89.73%    90.16%
     ResNet56    91.79%    92.64%
     ResNet110   92.51%    93.72%

     • FSDP outperforms all other methods
     • E.g., > 1% CCR improvement over FPGM (second best method) for ResNet110, θ = 50%
 17. Experiments
     • CCRs on ImageNet32 and GSC with ResNet56 and pruning rates θ = 20%, 50%
     • Evaluation of SFP, FPGM, FSDP (selected based on their performance on CIFAR10)

     θ = 20%      No pruning  SFP [16]  FPGM [5]  FSDP(θ_f=10%)
     ImageNet32   40.79%      29.92%    37.23%    38.30%
     GSC          97.47%      94.57%    95.64%    96.22%

     θ = 50%      FPGM [5]  FSDP(θ_f=10%)
     ImageNet32   32.32%    33.23%
     GSC          92.89%    94.66%

     • FSDP outperforms both SFP and FPGM
     • On the challenging ImageNet32 dataset the performance drop of SFP is quite high; this is attributed to its l2-norm based criterion, under which a fraction of the selected filters still carry significant discriminant information
 18. Experiments
     • Visualization of FSDP (θ_f = 20%) while training a ResNet20 on CIFAR10 with θ = 20%
     • Illustration of CS measure scores for each filter at epochs 10, 40, 200 (figures from left to right)
     • Filters closer to the input seem to attain high discriminant scores (especially in the initial epochs)
     • Surviving filters of the 2nd conv layer in residual blocks (e.g., layers 11, 13, 15, 17) accumulate quite high discriminant power as training proceeds
     • After a certain number of epochs, the surviving filters in the last conv layer attain high discriminant power
 19. Summary and next steps
     • A new filter pruning approach was presented, exploiting a class-separability-based measure to estimate the importance of the filters and a fractional step strategy to prune them asymptotically
     • The proposed approach was evaluated successfully on three popular datasets (CIFAR-10, ImageNet32, GSC) for image and speech classification tasks
     • As future work, we plan to investigate the use of variable pruning rates utilizing the discriminant scores at layer level, similarly to the globally-comparing criteria in [14, 18]
     [18] P. Molchanov et al.: Pruning convolutional neural networks for resource efficient inference, ICLR, Toulon, France, Apr. 2017
 20. Thank you for your attention! Questions?
     Nikolaos Gkalelis, gkalelis@iti.gr
     Vasileios Mezaris, bmezaris@iti.gr
     Code publicly available at: https://github.com/bmezaris/fractional_step_discriminant_pruning_dcnn
     This work was supported by the EU's Horizon 2020 research and innovation programme under grant agreement H2020-780656 ReTV
