
- 1. Deep Residual Learning for Image Recognition (K. He, X. Zhang, S. Ren and J. Sun, Microsoft Research), winner of ILSVRC 2015 and MS COCO 2015. Article overview by Ilya Kuzovkin, Computational Neuroscience Seminar, University of Tartu, 2016.
- 2. THE IDEA
- 3. 1000 classes
- 4. 2012: 8 layers, 15.31% error
- 5. 2013: 9 layers, 2x params, 11.74% error
- 6. 2014: 19 layers, 7.41% error
- 7. 2015: ?
- 8. Is learning better networks as easy as stacking more layers?
- 9. First obstacle: vanishing / exploding gradients
- 10. Largely addressed by normalized initialization and intermediate normalization
- 11. Remaining obstacle: the degradation problem
- 12. The degradation problem: “with the network depth increasing, accuracy gets saturated”
- 13. It is not caused by overfitting.
- 14. Train a plain 4-layer convolutional network; test it and get accuracy X%.
- 15. Construct a deeper network by appending identity layers to the trained one, and test it.
- 16. Same performance.
- 17. Now train an 8-layer plain convolutional network from scratch and test it.
- 18. Worse!
- 19. “Our current solvers on hand are unable to find solutions that are comparably good or better than the constructed solution (or unable to do so in feasible time)”
- 20. “Solvers might have difficulties in approximating identity mappings by multiple nonlinear layers”
- 21. Add explicit identity connections and “solvers may simply drive the weights of the multiple nonlinear layers toward zero”
- 22. H(x) is the true function we want to learn.
- 23. Let’s pretend we want to learn the residual F(x) = H(x) − x instead.
- 24. The original function is then recovered as H(x) = F(x) + x.
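The residual mapping can be sketched as a toy fully connected block in plain NumPy (a sketch, not the paper's code; `residual_block`, `w1` and `w2` are illustrative names):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    # The stacked layers learn the residual F(x) = w2 @ relu(w1 @ x);
    # the identity shortcut adds x back, so the output is relu(F(x) + x).
    return relu(w2 @ relu(w1 @ x) + x)
```

If the optimal mapping is close to the identity, the solver can simply push `w1` and `w2` toward zero: then F(x) = 0 and the block passes non-negative inputs through unchanged, which is exactly the mapping that plain stacked nonlinear layers struggle to approximate.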
- 25. The network can decide how deep it needs to be…
- 26. “The identity connections introduce neither extra parameter nor computation complexity”
- 27. Back to the scoreboard: 2012: 15.31% error · 2013: 11.74% · 2014: 7.41% · 2015: ?
- 28. 2015: 152 layers, 3.57% error
- 29. EXPERIMENTS AND DETAILS
- 30. Architecture: lots of 3x3 convolutional layers. VGG complexity is 19.6 billion FLOPs, while the 34-layer ResNet is 3.6 billion FLOPs.
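The FLOP totals quoted above come from summing per-layer costs; the standard multiply-accumulate count for one convolution (assuming the output keeps the h × w spatial size; the helper name is illustrative) is:

```python
def conv_macs(h, w, c_in, c_out, k):
    # Multiply-accumulate operations for one k x k convolution
    # producing an h x w map with c_out channels from c_in channels.
    return h * w * c_in * c_out * k * k
```

For example, a 3x3 convolution on a 56x56 map with 64 input and 64 output channels costs about 1.16e8 MACs. Halving the spatial size while doubling the channel count, as both VGG and ResNet do between stages, keeps this per-layer cost roughly constant.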
- 31. Training: batch normalization; SGD with batch size 256; up to 600,000 iterations; learning rate 0.1, divided by 10 when the error plateaus; momentum 0.9; no dropout; weight decay 0.0001.
- 32. Data: 1.28 million training images, 50,000 validation images, 100,000 test images.
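The optimizer settings above (momentum 0.9, weight decay 0.0001, learning rate 0.1 divided by 10 on plateaus) amount to the following update rule (a sketch, not the authors' code; all names are illustrative):

```python
def sgd_step(w, grad, velocity, lr, momentum=0.9, weight_decay=1e-4):
    # Momentum SGD with L2 weight decay: the decay term is folded
    # into the gradient, velocity accumulates past updates, and the
    # weight moves by the new velocity.
    g = grad + weight_decay * w
    velocity = momentum * velocity - lr * g
    return w + velocity, velocity

def learning_rate(plateaus_hit, base_lr=0.1):
    # Divide the base rate by 10 each time validation error plateaus.
    return base_lr / 10 ** plateaus_hit
```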
- 33. The 34-layer ResNet has lower training error than the 18-layer one. This indicates that the degradation problem is well addressed and that accuracy gains are obtained from increased depth.
- 34. The 34-layer ResNet reduces the top-1 error by 3.5% relative to its plain counterpart.
- 35. The 18-layer ResNet converges faster than the 18-layer plain net: ResNet eases optimization by providing faster convergence at the early stage.
- 36. GOING DEEPER
- 37. Due to time complexity, the usual building block is replaced by a “bottleneck” block; the 50-, 101- and 152-layer ResNets are built from those blocks.
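To see why the bottleneck design helps, compare parameter counts for a plain block (two 3x3 convolutions on 256-d features) against a 1x1 → 3x3 → 1x1 bottleneck at the paper's widths (the helper name `conv_params` is illustrative; biases are ignored):

```python
def conv_params(c_in, c_out, k):
    # Weight count of one k x k convolution (biases ignored).
    return k * k * c_in * c_out

# Plain block: two 3x3 convolutions operating on 256 channels.
plain = 2 * conv_params(256, 256, 3)

# Bottleneck: 1x1 reduces 256 -> 64, 3x3 runs at the narrow width,
# 1x1 restores 64 -> 256.
bottleneck = (conv_params(256, 64, 1)
              + conv_params(64, 64, 3)
              + conv_params(64, 256, 1))

ratio = plain / bottleneck  # roughly 17x fewer parameters
```

The 3x3 convolution, which dominates the cost, operates on 64 channels instead of 256, which is what makes 50-plus-layer networks affordable.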
- 38. ANALYSIS ON CIFAR-10
- 39. Results: ImageNet Classification 2015, 1st place (3.57% error) · ImageNet Object Detection 2015, 1st place (won 194 of 200 categories) · ImageNet Object Localization 2015, 1st place (9.02% error) · COCO Detection 2015, 1st place (37.3%) · COCO Segmentation 2015, 1st place (28.2%). Slides: http://research.microsoft.com/en-us/um/people/kahe/ilsvrc15/ilsvrc2015_deep_residual_learning_kaiminghe.pdf
