Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our User Agreement and Privacy Policy.

Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our Privacy Policy and User Agreement for details.

Successfully reported this slideshow.

Like this presentation? Why not share!

1,578 views

Published on

video: https://youtu.be/V0dLhyg5_Dw

Papers:

Going Deeper with Convolutions

Rethinking the Inception Architecture for Computer Vision

Inception-v4, Inception-RestNet and the Impact of Residual Connections on Learning

Xception: Deep Learning with Depthwise Separable Convolutions

Published in:
Science

No Downloads

Total views

1,578

On SlideShare

0

From Embeds

0

Number of Embeds

2

Shares

0

Downloads

119

Comments

0

Likes

3

No embeds

No notes for slide

This linear convolution is sufficient for abstraction when the instances of the latent concepts are linearly separable. However, representations that achieve good abstraction are generally highly nonlinear functions of the input data. In conventional CNN, this might be compensated by utilizing an over-complete set of filters [6] to cover all variations of the latent concepts. Namely, individual linear filters can be learned to detect different variations of a same concept. However, having too many filters for a single concept imposes extra burden on the next layer, which needs to consider all combinations of variations from the previous layer [7]. As in CNN, filters from higher layers map to larger regions in the original input. It generates a higher level concept by combining the lower level concepts from the layer below. Therefore, we argue that it would be beneficial to do a better abstraction on each local patch, before combining them into higher level concepts.

- 1. Inception & Xception PR12와 함께 이해하는 Jaejun Yoo Ph.D. Candidate @KAIST PR12 10th Sep, 2017 (GoogLeNet)
- 2. Today’s contents GoogLeNet : Inception models • Going Deeper with Convolution • Rethinking the Inception Architecture for Computer Vision • Inception-v4, Inception-RestNet and the Impact of Residual Connections on Learning
- 3. Today’s contents
- 4. Today’s contents Xception model • Xception: Deep Learning with Depthwise Separable Convolutions
- 5. Motivation Q) What is the best way to improve the performance of deep neural network? A) …
- 6. Motivation Q) What is the best way to improve the performance of deep neural network? A) Bigger size! (Increase the depth and width of the model) Yea DO IT WE ARE GOOGLE
- 7. Problem 1. Bigger model typically means a larger number of parameters → overfitting 2. Increased use of computational resources → e.g. quadratic increase of computation 𝟑 × 𝟑 × 𝑪 → 𝟑 × 𝟑 × 𝑪: 𝑪 𝟐 computations
- 8. Problem 1. Bigger model typically means a larger number of parameters → overfitting 2. Increased use of computational resources → e.g. quadratic increase of computation 𝟑 × 𝟑 × 𝑪 → 𝟑 × 𝟑 × 𝑪: 𝑪 𝟐 computations Solution: Sparsely connected architecture
- 9. Sparsely Connected Architecture 1. Mimicking biological system 2. Theoretical underpinnings → Arora et al. Provable bounds for learning some deep representations ICML 2014 “Given samples from a sparsely connected neural network whose each layer is a denoising autoencoder, can the net (and hence its reverse) be learnt in polynomial time with low sample complexity?”
- 10. Why Arora et al. is important To provably solve optimization problems for general neural networks with two or more layers, the algorithms that would be necessary hit some of the biggest open problems in computer science. So, we don't think there's much hope for machine learning researchers to try to find algorithms that are provably optimal for deep networks. This is because the problem is NP-hard, meaning that provably solving it in polynomial time would also solve thousands of open problems that have been open for decades. Indeed, in 1988 J. Stephen Judd shows the following problem to be NP-hard: https://www.oreilly.com/ideas/the-hard-thing-about-deep-learning Given a general neural network and a set of training examples, does there exist a set of edge weights for the network so that the network produces the correct output for all the training examples? Judd also shows that the problem remains NP-hard even if it only requires a network to produce the correct output for just two-thirds of the training examples, which implies that even approximately training a neural network is intrinsically difficult in the worst case. In 1993, Blum and Rivest make the news worse: even a simple network with just two layers and three nodes is NP-hard to train!
- 11. Sparsely Connected Architecture 1. Mimicking biological system 2. Theoretical underpinnings → Arora et al. Provable bounds for learning some deep representations ICML 2014 “Given samples from a sparsely connected neural network whose each layer is a denoising autoencoder, can the net (and hence its reverse) be learnt in polynomial time with low sample complexity?” YES!
- 12. Layer-wise learning Correlation statistics Videos: • http://techtalks.tv/talks/provable-bounds-for-learning-some-deep-representations/61118/ • https://www.youtube.com/watch?v=c43pqQE176g “If the probability distribution of the data-set is representable by a large, very sparse deep neural network, then the optimal network topology can be constructed layer by layer by analyzing the correlation statistics of the activations of the last layer and clustering neurons with highly correlated outputs.”
- 13. Layer-wise learning Correlation statistics Videos: • http://techtalks.tv/talks/provable-bounds-for-learning-some-deep-representations/61118/ • https://www.youtube.com/watch?v=c43pqQE176g “If the probability distribution of the data-set is representable by a large, very sparse deep neural network, then the optimal network topology can be constructed layer by layer by analyzing the correlation statistics of the activations of the last layer and clustering neurons with highly correlated outputs.” Hebbian principle!
- 14. Problem 1. Sparse matrix computation is very inefficient → dense matrix calculation is extremely efficient 2. Even ConvNet changed back from sparse connection to full connection for better optimize parallel computing
- 15. Problem 1. Sparse matrix computation is very inefficient → dense matrix calculation is extremely efficient 2. Even ConvNet changed back from sparse connection to full connection for better optimize parallel computing Is there any intermediate step?
- 16. Inception architecture Main idea How to find out an optimal local sparse structure in a convolutional network and how can it be approximated and covered by readily available dense components? : All we need is to find the optimal local construction and to repeat it spatially : Arora et al.: layer by layer construction with correlation statistics analysis
- 17. In images, correlations tend to be local
- 18. Cover very local clusters by 1x1 convolutions 1x1 number of filters
- 19. Less spread out correlations 1x1 number of filters
- 20. Cover more spread out clusters by 3x3 convolutions 1x1 3x3 number of filters
- 21. Cover more spread out clusters by 5x5 convolutions 1x1 number of filters 3x3
- 22. Cover more spread out clusters by 5x5 convolutions 1x1 number of filters 3x3 5x5
- 23. A heterogeneous set of convolutions 1x1 number of filters 3x3 5x5
- 24. Schematic view (naive version) 1x1 number of filters 3x3 5x5 1x1 convolutio ns 3x3 convolutio ns 5x5 convolutio ns Filter concaten ation Previous layer
- 25. 1x1 convoluti ons 3x3 convoluti ons 5x5 convoluti ons Filter concate nation Previous layer Naive idea 3x3 max pooli ng
- 26. 1x1 convoluti ons 3x3 convoluti ons 5x5 convoluti ons Filter concate nation Previous layer Naive idea (does not work!) 3x3 max pooli ng
- 27. 1x1 convoluti ons 3x3 convoluti ons 5x5 convoluti ons Filter concate nation Previous layer Inception module 3x3 max pooli ng 1x1 convoluti ons 1x1 convoluti ons 1x1 convoluti ons
- 28. Network in Network (NiN) MLPConvConv MLPConv = 𝟏 × 𝟏 ConvConv = GLM GLM is replaced with a ”micro network” structure
- 29. 𝟏 × 𝟏 Conv 1. Increase the representational power of neural network → In MLPConv sense 2. Dimension reduction → usually the filters are highly correlated
- 30. 𝟓 × 𝟓 Conv → 𝟑 × 𝟑 Conv + 𝟑 × 𝟑 Conv 5x5 : 3x3 = 25 : 9 (25/9 = 2.78 times) 5x5 conv 연산 한번은 당연히 3x3 conv 연산보다 약 2.78 배 비용이 더 들어간다. 만약 크기가 같은 2개의 layer 를 하나의 5x5 로 변환하는 것과 3x3 짜리 2개로 변환하는 것 사이의 비용을 계산해 보자. 5x5xN : (3x3xN) + (3x3xN) = 25 : 9+9 = 25 : 18 (약 28% 의 reduction 효과) https://norman3.github.io/papers/docs/google_inception.html
- 31. 𝟓 × 𝟓 Conv → 𝟑 × 𝟑 Conv + 𝟑 × 𝟑 Conv https://norman3.github.io/papers/docs/google_inception.html
- 32. Xception Observation 1 • Inception module try to explicitly factoring two tasks done by a single convolution kernel: mapping cross-channel correlation and spatial correlation Inception hypothesis • By inception module, These two correlations are sufficiently decoupled.
- 33. Xception Observation • Inception module try to explicitly factoring two tasks done by a single convolution kernel: mapping cross-channel correlation and spatial correlation Inception hypothesis • By inception module, These two correlations are sufficiently decoupled. Would it be reasonable to make a much stronger hypothesis than the Inception hypothesis?
- 34. Xception
- 35. Xception
- 36. Xception
- 37. Xception Commonly called “separable convolution” in deep learning frameworks such as TF and Keras; a spatial convolution performed independently over each channel of an input, followed by a pointwise convolution, i.e. 1 × 1 𝑐𝑜𝑛𝑣 Depthwise separable convolution
- 38. Xception 1. The order of the operations 2. The presence or absence of a non-linearity after the first operaton Xception vs. depthwise separable convolution Regular convolution Inception Depthwise separable convolution
- 39. Xception Regular convolution Inception Depthwise separable convolution Inception modules lie in between! Observation 2 Xception Hypothesis : Make the mapping that entirely decouples the cross-channels correlations and spatial correlations
- 40. Xception Regular convolution Inception Depthwise separable convolution Inception modules lie in between! Observation 2 Xception Hypothesis : Make the mapping that entirely decouples the cross-channels correlations and spatial correlations
- 41. Xception ImageNet JFT
- 42. Reference GoogLeNet : Inception models Blog • https://norman3.github.io/papers/docs/google_inception.html (Kor) • https://blog.acolyer.org/2017/03/21/convolution-neural-nets-part-2/ (Eng) Paper • Going Deeper with Convolution • Rethinking the Inception Architecture for Computer Vision • Inception-v4, Inception-RestNet and the Impact of Residual Connections on Learning Xception models Paper • Xception: Deep Learning with Depthwise Separable Convolutions
- 43. Reducing the grid size efficiently https://norman3.github.io/papers/docs/google_inception.html
- 44. •Pooling을 먼저하면? •이 때는 Representational Bottleneck 이 발생한다. •Pooling으로 인한 정보 손실로 생각하면 될 듯 하다. •예제로 드는 연산은 (d,d,k)(d,d,k) 를 (d/2,d/2,2k)(d/2,d/2,2k) 로 변환하는 Conv 로 확인. (따라서 여기서는 d=35d=35, k=320k=320) •어쨌거나 이렇게 하면 실제 연산 수는, •pooling + stride.1 conv with 2k filter => 2(d/2)2k22(d/2)2k2 연산 수 •strid.1 conv with 2k fileter + pooling => 2d2k22d2k2 연산 수 •즉, 왼쪽은 연산량이 좀 더 작지만 Representational Bottleneck 이 발생. •오른쪽은 정보 손실이 더 적지만 연산량이 2배. https://norman3.github.io/papers/docs/google_inception.html Reducing the grid size efficiently

No public clipboards found for this slide

Be the first to comment