
Shake-Shake & Shake-Drop Regularization


Ruijie Quan


  1. SHAKE-SHAKE & SHAKE-DROP REGULARIZATION. Ruijie Quan, 2018/07/08.
  2. SHAKE-SHAKE. Shake-Shake regularization helps deep-learning practitioners faced with an overfitting problem. Motivation: extend data-augmentation techniques from the input images to the internal representations. The idea is to replace the standard summation of parallel branches with a stochastic affine combination in a multi-branch network. For a ResNet block with 2 residual branches, x_{i+1} = x_i + F(x_i, W_i^(1)) + F(x_i, W_i^(2)); the proposed modification is x_{i+1} = x_i + α_i · F(x_i, W_i^(1)) + (1 − α_i) · F(x_i, W_i^(2)), with α_i drawn uniformly from [0, 1] during training (a PyTorch sketch follows the slide list).
  3. SHAKE-SHAKE. Adding noise to the gradient during training helps the training and generalization of complicated neural networks. Shake-Shake regularization can be seen as an extension of this concept, where gradient noise is replaced by a form of gradient augmentation.
  4. MOTIVATION. Three rules for the scaling coefficients: Shake: all scaling coefficients are overwritten with new random numbers before the pass. Even: all scaling coefficients are set to 0.5 before the pass. Keep: the backward pass keeps the scaling coefficients used during the forward pass. (See the autograd sketch after the slide list.)
  5. COMPARISONS WITH STATE-OF-THE-ART RESULTS.
  6. CORRELATION BETWEEN RESIDUAL BRANCHES. Question: is the correlation between the 2 residual branches increased or decreased by the regularization? Conclusion: the summation at the end of the residual blocks forces an alignment of the layers on the left and right residual branches, yet the correlation between the output tensors of the 2 residual branches seems to be reduced by the regularization; the regularization forces the branches to learn something different.
  7. REGULARIZATION STRENGTH.
  8. SHAKE-DROP. Drawbacks of Shake-Shake: (1) it can be applied only to multi-branch architectures (e.g., ResNeXt); (2) it is not memory-efficient. Applying a similar disturbance to a single residual branch is not trivial to realize. ShakeDrop disturbs learning more strongly by multiplying the output of a convolutional layer by even a negative factor in the forward training pass, and it stabilizes the learning process by employing ResDrop in a different usage from the usual one. (See the ShakeDrop sketch after the slide list.)
  9. SHAKE-DROP.
  10. SHAKE-DROP (summary). Shake-Shake regularization has drawbacks; applying a similar regularization to 1-branch network architectures yields too strong a perturbation; learning is stabilized by introducing the mechanism of ResDrop.
  11. EXPERIMENTS.
  12. EXPERIMENTS.
  13. Thank you for your attention.
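
As a concrete illustration of slide 2, here is a minimal PyTorch sketch of a 2-branch residual block with Shake-Shake mixing. The class name ShakeShakeBlock and the branch1/branch2 modules are hypothetical stand-ins for the two residual branches (stacks of conv/BN/ReLU whose outputs match the input shape); for brevity this version only shakes the forward coefficient, and the backward treatment is sketched next.

import torch
import torch.nn as nn

class ShakeShakeBlock(nn.Module):
    def __init__(self, branch1: nn.Module, branch2: nn.Module):
        super().__init__()
        self.branch1 = branch1
        self.branch2 = branch2

    def forward(self, x):
        y1, y2 = self.branch1(x), self.branch2(x)
        if self.training:
            # Draw one scaling coefficient per image in the mini-batch.
            alpha = torch.rand(x.size(0), 1, 1, 1, device=x.device)
        else:
            # At test time use the expectation of U(0, 1).
            alpha = 0.5
        # Stochastic affine combination instead of a plain summation.
        return x + alpha * y1 + (1.0 - alpha) * y2

Drawing one coefficient per image corresponds to the "Image" level in the paper; a single scalar shared by the whole mini-batch ("Batch" level) is the other option.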
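The Shake/Even/Keep rules of slide 4 differ only in how the backward coefficient relates to the forward one, which is easiest to express with a custom autograd function. A minimal sketch, assuming per-image coefficients; ShakeShakeFunction is a name introduced here, not from the source:

import torch

class ShakeShakeFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, y1, y2, alpha):
        # Forward pass: mix the branch outputs with alpha
        # ("Shake" when alpha is freshly drawn, "Even" when alpha = 0.5).
        return alpha * y1 + (1.0 - alpha) * y2

    @staticmethod
    def backward(ctx, grad_out):
        # "Shake" backward: overwrite the coefficient with a new random
        # beta. Reusing alpha here would be "Keep"; 0.5 would be "Even".
        beta = torch.rand(grad_out.size(0), 1, 1, 1,
                          device=grad_out.device, dtype=grad_out.dtype)
        return beta * grad_out, (1.0 - beta) * grad_out, None

Usage inside a block's forward: out = x + ShakeShakeFunction.apply(y1, y2, alpha), with alpha = torch.rand(x.size(0), 1, 1, 1, device=x.device), so both passes are shaken independently.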
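For slide 8, a sketch of the ShakeDrop perturbation on a single residual branch, following the rule out = x + (b + α − b·α) · F(x): a per-image Bernoulli gate b (the ResDrop mechanism) leaves the branch untouched when b = 1 and otherwise scales it by α from U(−1, 1), i.e. possibly a negative factor, while the backward pass redraws an independent β from U(0, 1). The names ShakeDropFunction, shake_drop, and p_keep are introduced here for illustration.

import torch

class ShakeDropFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, y, gate, alpha):
        ctx.save_for_backward(gate)
        # Coefficient is 1 when gate == 1 (branch untouched) and
        # alpha when gate == 0 (possibly negative scaling).
        return (gate + alpha - gate * alpha) * y

    @staticmethod
    def backward(ctx, grad_out):
        (gate,) = ctx.saved_tensors
        # Redraw an independent beta from U(0, 1) for the backward pass.
        beta = torch.rand(grad_out.size(0), 1, 1, 1,
                          device=grad_out.device, dtype=grad_out.dtype)
        return (gate + beta - gate * beta) * grad_out, None, None

def shake_drop(x, residual, p_keep, training=True):
    n = x.size(0)
    if training:
        gate = torch.bernoulli(
            torch.full((n, 1, 1, 1), p_keep, device=x.device))
        alpha = torch.empty(n, 1, 1, 1, device=x.device).uniform_(-1.0, 1.0)
        return x + ShakeDropFunction.apply(residual, gate, alpha)
    # Test time: scale by the expected coefficient E[b + alpha - b*alpha],
    # which equals p_keep when alpha is uniform on (-1, 1).
    return x + p_keep * residual

A single p_keep is used here for brevity; following ResDrop/stochastic depth, the paper lets the keep probability depend on the block's depth.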
