
[ICML 2019] Parameter Efficient Training of Deep Convolutional Neural Networks by Dynamic Sparse Reparameterization


Published on June 6, 2019
ICML 2019 Reading Session @LeapMind

Title: Parameter Efficient Training of Deep Convolutional Neural Networks by Dynamic Sparse Reparameterization

Reader: Azuma Kohei, DL Engineer, LeapMind



  1. Parameter Efficient Training of Deep Convolutional Neural Networks by Dynamic Sparse Reparameterization
     Reader: Azuma Kohei, DL Engineer, LeapMind
     ICML 2019 Reading Session
  2. Contents
     1. Paper Info
     2. Background
     3. Problem
     4. Proposed
     5. Experiment
     6. Discussion
     7. Conclusion
  3. Paper Info
     ● "Parameter Efficient Training of Deep Convolutional Neural Networks by Dynamic Sparse Reparameterization"
       ○ ICML 2019
       ○ Authors: Hesham Mostafa, Xin Wang
     ● Why did I choose this paper?
       ○ It is interesting to me to see the nature of deep learning models through pruning
       ○ Pruning without a pre-trained model sounds interesting
  4. Background: Overview
     ● Deep learning models are often over-parameterized
       ○ e.g. VGG16 has 138 million parameters, 512 MB in FP32
       ○ This puts pressure on memory
     ● There are several ways to reduce model size without significantly reducing accuracy (see the size arithmetic sketched below)
       ○ Quantization: reduce the size of each weight
         ■ e.g. 138 million parameters, 512 MB -> 138 million parameters, 128 MB (int8)
       ○ Pruning: reduce the number of parameters
         ■ e.g. 138 million parameters, 512 MB -> 35 million parameters, 128 MB (FP32)
       ○ (Distillation)
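To make the memory arithmetic behind these examples concrete, here is a minimal Python sketch of the size calculation. The parameter counts and byte widths are taken from the slide's examples; the resulting megabyte figures are only approximately equal to the slide's rounded numbers.

```python
# Rough model-size arithmetic behind the slide's examples.
# The numbers are illustrative and only approximately match the slide's rounded figures.

def model_size_mb(num_params: float, bytes_per_weight: float) -> float:
    """Storage needed for the weights alone, in megabytes."""
    return num_params * bytes_per_weight / 1e6

dense_fp32 = model_size_mb(138e6, 4)   # original VGG16-sized model in FP32
dense_int8 = model_size_mb(138e6, 1)   # quantization: same parameter count, 1 byte per weight
pruned_fp32 = model_size_mb(35e6, 4)   # pruning: most connections removed, FP32 weights kept

print(f"dense FP32 : {dense_fp32:6.0f} MB")
print(f"dense int8 : {dense_int8:6.0f} MB")
print(f"pruned FP32: {pruned_fp32:6.0f} MB")
```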
  5. Background: Pruning
     The number of parameters can be reduced by 80-90% without degrading accuracy [1]
     1. Pre-train a large dense model
     2. Prune & re-train to get a sparse model (sketched below)
        a. Remove connections whose weights fall below a threshold
        b. Re-train
     (The slide shows a figure from [1] illustrating step 2.a)
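A minimal NumPy sketch of magnitude pruning (step 2.a), assuming a threshold chosen to hit a target sparsity; this is generic magnitude pruning rather than the exact procedure of [1], and the re-training step 2.b (ordinary training with the mask held fixed) is omitted.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float):
    """Return (pruned_weights, mask) with `sparsity` fraction of the weights set to zero."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)          # number of weights to remove
    threshold = np.partition(flat, k)[k]   # k-th smallest magnitude
    mask = np.abs(weights) >= threshold    # keep only weights at or above the threshold
    return weights * mask, mask

# Example: prune 90% of a random weight matrix.
rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256))
w_pruned, mask = magnitude_prune(w, sparsity=0.9)
print("remaining fraction of weights:", mask.mean())   # roughly 0.1
```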
  6. Problem with Pruning
     ● Pre-training remains memory-inefficient: models for pre-training have many parameters
     ● Can we train compact models directly?
       ○ The effectiveness of pruning indicates that compact network parameter configurations exist
  7. Related Works
     ● Static sparse training
       ○ Static: the locations of non-zero parameters are fixed during training
       ○ Training a static sparse model performs worse than compressing a large dense model [4]
       ○ Static models are sensitive to initialization [5]
     ● Dynamic sparse reparameterization training
       ○ Dynamic: the locations of non-zero parameters change during training
       ○ Parameters are moved according to certain heuristic rules
       ○ e.g. SET [2], DeepR [3]
  8. Proposed
     ● A dynamic sparse reparameterization technique
       ○ Uses an adaptive threshold
       ○ Automatically reallocates parameters across layers
     ● Parameters are reallocated every few hundred iterations with the algorithm described below
  9. Proposed
     ● Pruning is based on an adaptive global threshold
       ○ More scalable than methods relying on layer-specific pruning (SET [2])
     ● Roughly, not exactly, the target number of parameters is pruned
       ○ Computationally cheaper than pruning exactly the smallest weights, because no sort is needed
     ● Zero-initialized parameters are redistributed after pruning
       ○ Rule: layers with larger fractions of non-zero weights receive proportionally more free parameters (a reconstruction of the rule is given below)
       ○ The numbers of pruned and grown free parameters are exactly the same
     Notation: G_l is the number of parameters reallocated to layer l, R_l the number of surviving (not pruned) parameters in layer l, K_l the number of pruned parameters in layer l
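Given these definitions, a plausible reconstruction of the growth rule (my reading of the slide, not a quotation of the paper's equation): the total number of parameters pruned this step is redistributed across layers in proportion to each layer's surviving parameter count.

```latex
K = \sum_{l} K_l,
\qquad
G_l = K \cdot \frac{R_l}{\sum_{l'} R_{l'}}
```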
  10. Proposed
      The slide shows the per-step algorithm: pruning, adjustment of the threshold, and reallocation (a simplified sketch follows below)
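Below is a simplified Python sketch of one such step, covering the three parts named on the slide. The threshold-adjustment factor, the tolerance, and the bookkeeping are my own assumptions, so treat this as an illustration rather than the paper's exact Algorithm 1.

```python
import numpy as np

def reallocation_step(layers, masks, H, target_prune, tolerance=0.1):
    """One dynamic-reallocation step over lists of weight arrays and boolean masks.

    layers: list of np.ndarray weight tensors (modified in place)
    masks:  list of boolean np.ndarray of the same shapes (modified in place)
    H:      current global pruning threshold (returned, possibly adjusted)
    target_prune: desired number of weights to prune per step
    """
    # 1. Prune: zero out active weights whose magnitude falls below the global threshold H.
    K = 0
    for w, m in zip(layers, masks):
        prune_here = m & (np.abs(w) < H)
        m &= ~prune_here
        w *= m
        K += int(prune_here.sum())

    # 2. Adjust the threshold so the (rough) prune count tracks the target.
    if K < (1 - tolerance) * target_prune:
        H *= 2.0        # pruned too few weights -> raise the threshold
    elif K > (1 + tolerance) * target_prune:
        H /= 2.0        # pruned too many weights -> lower the threshold

    # 3. Reallocate: grow K zero-initialized weights, giving layer l a share G_l
    #    proportional to its number of surviving (non-zero) parameters R_l.
    R = np.array([m.sum() for m in masks], dtype=float)
    G = np.floor(K * R / R.sum()).astype(int)
    rng = np.random.default_rng()
    for m, g in zip(masks, G):
        free = np.flatnonzero(~m)                                  # currently inactive positions
        grow = rng.choice(free, size=min(int(g), free.size), replace=False)
        m.flat[grow] = True       # the weight there is already zero, so it starts zero-initialized

    return H
```

In use, the returned H would be carried across calls (one call every few hundred iterations, per the previous slide), so the global threshold adapts over the course of training.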
  11. Experiment 1: Evaluation against Baselines
      Compare accuracy against existing methods
      ● Model: WRN-28-2 (Appendix A), Dataset: CIFAR10
      ● Baselines
        ○ Full dense: the original large, dense model (original WRN-28-2)
        ○ Compressed sparse: sparse model obtained by iteratively pruning Full dense [4]
        ○ Thin dense: dense model with fewer layers
        ○ Static sparse: static sparse model obtained by randomly sparsifying
        ○ SET [2]: existing dynamic reparameterization method
        ○ DeepR [3]: existing dynamic reparameterization method
      ● All sparse models have the same global sparsity (the slide's parameter-count expressions did not survive extraction; see the reconstruction below)
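As a hedged reconstruction of the relation those expressions presumably stated: if the full dense model has N parameters and the shared global sparsity is s, every sparse baseline is trained with the same non-zero parameter budget.

```latex
\#\text{nonzero}(\text{sparse model}) = (1 - s)\,N,
\qquad
s = 1 - \frac{\#\text{nonzero}}{N}
```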
  12. Experiment 1: Evaluation against Baselines
      A. Full dense: pre-trained original model
      B. Compressed sparse: pruned A
      C. Thin dense: A with fewer layers
      D. Static sparse: randomly sparsified static model
      E. Dynamic sparse: this paper's method
      ● E is slightly better than A and SET
      ● As global sparsity increases, C and D degrade significantly
  13. Experiment 2: Search for the Important Element
      Which is important: the sparse network structure or the initialization?
      ● The "lottery ticket" hypothesis [5]
        ○ A large network contains well-trainable subnetworks
        ○ Their initialization is important (see Appendix B)
      ● After training with the dynamic sparse method, retrain the final sparsity pattern
        ○ with random re-initialization
        ○ with the original initialization
  14. Experiment 2: Search for the Important Element
      ● Both random and original initialization failed to reach the accuracy of dynamic sparse training
      ● Initialization had little effect on dynamic sparse reparameterization
      ● The dynamic reparameterization itself is what matters
        ○ The benefit is not attributable solely to the structure, nor to its initialization, nor to a combination of the two
  15. Discussion
      ● Dynamic reparameterization is what matters
        ○ The benefit is not attributable solely to the sparse structure, nor to its initialization, nor to a combination of the two
        ○ The discontinuous jumps in parameter space when parameters are reallocated across layers may have helped training escape sharp minima that generalize badly
      ● It is better to allocate some memory to exploring more sophisticated network structures
      ● Computational efficiency remains difficult
        ○ CPUs and GPUs cannot efficiently handle unstructured sparse models
  16. Conclusion
      ● Dynamic sparse reparameterization for pruning
        ○ Uses an adaptive threshold
        ○ Automatically reallocates parameters across layers
      ● Performance was significantly higher than the baselines
        ○ Much better than the static methods
        ○ Slightly better than Compressed sparse
      ● Dynamic exploration of structure during training is what matters
        ○ Not solely the structure, nor its initialization, nor a combination of the two
  17. References
      [1] Learning both Weights and Connections for Efficient Neural Networks. Han et al., NIPS 2015.
      [2] Scalable Training of Artificial Neural Networks with Adaptive Sparse Connectivity Inspired by Network Science. Mocanu et al., Nature Communications 2018.
      [3] Deep Rewiring: Training Very Sparse Deep Networks. Bellec et al., arXiv 2017.
      [4] To Prune, or Not to Prune: Exploring the Efficacy of Pruning for Model Compression. Zhu et al., arXiv 2017.
      [5] The Lottery Ticket Hypothesis: Finding Small, Trainable Neural Networks. Frankle and Carbin, arXiv 2018.
  18. Appendix A (WRN-28-2 details; the slide content did not survive extraction)
