Convolutional Neural Networks

Slides for explaining about CNN / ConvNet to newbies

  1. 1. Convolutional Neural Networks (CNN / ConvNet) 2017.03.31 KAIST iDBLab 윤상훈
  2. 2. Contents: 1. Perceptron / MLP 2. Backpropagation 3. Convolutional Neural Networks 4. ReLU 5. Dropout 6. ILSVRC
  3. 3. Perceptron • A model patterned after the biological neuron (nerve cell) • The unit computes f(Σᵢ wᵢxᵢ + b) and "fires" when the output of the activation function f(z) exceeds a certain value (the threshold) • Conventional activation functions → sigmoid, tanh • Linear combination of the input x → sigmoid, tanh → logistic regression. Source: http://hunkim.github.io/ml/lec8.pdf
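A minimal NumPy sketch of this forward pass; the inputs, weights, bias, and the 0.5 firing threshold below are illustrative values, not from the slides:

```python
# Perceptron forward pass: f(sum_i w_i * x_i + b) with a sigmoid activation.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0])   # input vector x
w = np.array([0.8, 0.3, -0.5])   # weights w_i
b = 0.1                          # bias b

z = np.dot(w, x) + b             # linear combination of the input
y = sigmoid(z)                   # activation f(z)
print(z, y, y > 0.5)             # "fires" if above an assumed 0.5 threshold
```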
  4. 4. Perceptron • Logistic regression → linear decision boundary (hyperplane) • It handles linearly separable problems (AND, OR) well, but cannot solve problems that are not linearly separable (XOR). Sources: http://sebastianraschka.com/Articles/2015_singlelayer_neurons.html http://programmermagazine.github.io/201404/img/perceptronLinearAnalysis.jpg
  5. 5. Multi-layer Perceptron (MLP) • Draw one more line (add one more perceptron) and check whether the point lies between the two lines (add one more layer) • Stacking perceptrons into multiple layers can solve non-linear problems as well • The deeper the network, the more complex the functions it can express; the wider, the more complex • But... nobody knew how to learn the weights (Perceptrons, Marvin Minsky, 1969). Source: https://i.stack.imgur.com/nRZ6z.png
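A small sketch of the "two lines plus one more layer" idea, solving XOR with hand-set weights (the particular weight values are an illustrative choice, not from the slides):

```python
# Two-layer perceptron computing XOR: the hidden layer draws two lines,
# the output layer checks that the point lies between them.
import numpy as np

def step(z):                      # hard-threshold activation
    return (z >= 0).astype(float)

W1 = np.array([[ 1.0,  1.0],      # OR-like unit  (first line)
               [-1.0, -1.0]])     # NAND-like unit (second line)
b1 = np.array([-0.5,  1.5])
W2 = np.array([1.0, 1.0])         # output unit: AND of the two hidden units
b2 = -1.5

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = step(W1 @ np.array(x) + b1)
    y = step(W2 @ h + b2)
    print(x, int(y))              # prints 0, 1, 1, 0  (XOR)
```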
  6. 6. Backpropagation • Nonsense! Here is how to do it • Chain rule → "back"-propagation • Gradient descent (let the ball roll downhill!) • E = ½ Σ_c Σ_j (y_{j,c} − d_{j,c})², Δw = −ε ∂E/∂w • Learning representations by back-propagating errors. Rumelhart, Hinton & Williams (1986) Nature. Sources: https://i.stack.imgur.com/H1KsG.png http://underflow.fr/wp-content/uploads/2014/03/parabola-floor.png
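A toy sketch of the update rule Δw = −ε ∂E/∂w applied via the chain rule to a single sigmoid unit with squared error; the training example, learning rate, and iteration count are assumptions for illustration:

```python
# Gradient descent on one sigmoid unit: E = 0.5 * (y - d)^2.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, d = 1.5, 1.0          # one training example (input, desired output)
w, b = 0.0, 0.0          # parameters to learn
eps = 0.5                # learning rate

for _ in range(100):
    y = sigmoid(w * x + b)
    E = 0.5 * (y - d) ** 2
    # Chain rule: dE/dw = dE/dy * dy/dz * dz/dw
    dE_dy = y - d
    dy_dz = y * (1.0 - y)
    dE_dw = dE_dy * dy_dz * x
    dE_db = dE_dy * dy_dz
    w -= eps * dE_dw     # Delta_w = -eps * dE/dw
    b -= eps * dE_db

print(w, b, E)           # the error shrinks as the "ball rolls downhill"
```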
  7. 7. Convolutional Layer • In pattern recognition, relative position matters more than absolute position • Shift, scale, and distortion invariance → local receptive fields, shared weights • The number of parameters also drops dramatically • Gradient-based learning applied to document recognition. LeCun et al. (1998) Proceedings of the IEEE. Source: https://www.slideshare.net/zukun/p03-neural-networks-cvpr2012-deep-learning-methods-for-vision
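A sketch of a single filter slid over an image, showing the local receptive field and shared weights; the loop-based convolution and the edge-detector kernel values are illustrative, not the slide's code:

```python
# Valid (no-padding) 2D convolution with one shared filter.
import numpy as np

def conv2d(image, kernel):
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i:i + kh, j:j + kw]      # local receptive field
            out[i, j] = np.sum(patch * kernel)     # same (shared) weights everywhere
    return out

image = np.random.rand(6, 6)
kernel = np.array([[1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0]])              # a vertical-edge detector
print(conv2d(image, kernel).shape)                 # (4, 4): only 9 parameters used
```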
  8. 8. Convolutional Layer • Each layer learns several features • Receptive field = filter = weights = feature detector • S_input = h_input × w_input × d_input • S_filter = h_filter × w_filter × d_input • S_output = h_input × w_input × n_filter (assuming padding that preserves the spatial size) • Number of filters = depth (number of channels) of the output • The output is also called a feature map. Source: http://cs231n.github.io/convolutional-networks/#conv
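A quick sanity check of these size formulas, assuming "same" padding so the output keeps the input's spatial size (the 32×32×3 input and 16 filters of size 5×5 are made-up numbers):

```python
# Shape and parameter bookkeeping for one convolutional layer.
h_in, w_in, d_in = 32, 32, 3      # S_input = h_in x w_in x d_in
h_f, w_f = 5, 5                   # filter spatial size
n_filter = 16                     # number of filters = depth of the output

s_filter = h_f * w_f * d_in                 # weights per filter (plus 1 bias)
s_output = (h_in, w_in, n_filter)           # output feature-map shape
n_params = n_filter * (s_filter + 1)        # total learnable parameters

print(s_filter, s_output, n_params)         # 75, (32, 32, 16), 1216
```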
  9. 9. Pooling Layer • Down(sub)sampling • Reduces sensitivity to shifts and distortions by making the exact position of each feature in the feature map less precise • Max pooling, average pooling • Increasingly being replaced by larger strides in the conv layers (ResNet, generative models). Source: http://cs231n.github.io/convolutional-networks/#conv
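A sketch of 2×2 max pooling with stride 2 (the window size and stride are the common choice, assumed here for illustration):

```python
# Max pooling: keep only the strongest response in each window.
import numpy as np

def max_pool(fmap, size=2, stride=2):
    oh = (fmap.shape[0] - size) // stride + 1
    ow = (fmap.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = fmap[i * stride:i * stride + size,
                          j * stride:j * stride + size]
            out[i, j] = window.max()     # average pooling would use window.mean()
    return out

fmap = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool(fmap))                    # 4x4 feature map -> 2x2
```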
  10. 10. Convolutional Neural Network (CNN / ConvNet) • Oriented edges, endpoints, corners → higher-order features • LeNet → AlexNet → ZF Net → GoogLeNet, VGG → ResNet. Sources: http://killianlevacher.github.io/blog/posts/post-2016-03-01/img/layeredRepresentation.jpg https://www.embedded-vision.com/sites/default/files/technical-articles/FPGAsNeuralNetworks/Figure1.jpg
  11. 11. Vanishing Gradient Problem • With sigmoid or tanh as the activation function, the gradients approach 0 as they are propagated back through many layers • Pre-training with unsupervised learning such as RBMs alleviates this to some extent • Understanding the difficulty of training deep feedforward neural networks. Glorot & Bengio (2010) AISTATS. Source: https://nn.readthedocs.io/en/rtd/transfer/ (sigmoid and tanh plots)
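A rough illustration of why the gradient shrinks: the sigmoid's local gradient σ′(z) = σ(z)(1 − σ(z)) is at most 0.25, and the chain rule multiplies one such factor per layer (the 10-layer depth here is an assumption for illustration):

```python
# Even in the best case (z = 0), each sigmoid layer scales the gradient by 0.25.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = 0.0
local_grad = sigmoid(z) * (1 - sigmoid(z))
print(local_grad)                        # 0.25

grad = 1.0
for layer in range(10):                  # 10 sigmoid layers
    grad *= local_grad
print(grad)                              # ~9.5e-07: almost no signal reaches early layers
```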
  12. 12. Rectified Linear Unit (ReLU) • ReLU: max(x, 0) • The gradient is either 1 or 0 → relatively fast training • Adopted by AlexNet, the winner of ILSVRC 2012 • ImageNet Classification with Deep Convolutional Neural Networks. Krizhevsky et al. (2012) NIPS. Source: https://nn.readthedocs.io/en/rtd/transfer/
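A small sketch contrasting ReLU's gradient with the sigmoid case above (the sample inputs are arbitrary):

```python
# ReLU and its gradient: 1 for positive inputs, 0 otherwise.
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def relu_grad(x):
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.5, 3.0])
print(relu(x))        # [0.  0.  0.5 3. ]
print(relu_grad(x))   # [0. 0. 1. 1.]  -> gradients are not squashed on the active side
```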
  13. 13. Rectified Linear Unit (ReLU) • Q: Then why not simply use a linear activation function? • A: A composition of linear functions is still linear, so the network collapses into plain logistic regression (a single layer): h₃ = W₃W₂W₁v = Av • Q: So what does ReLU actually do? • A: Piece-wise linear tiling: the mapping is locally linear • Sparse activation. Sources: http://www.hellot.net/_UPLOAD_FILES/semina/KAIST_김준모.pdf https://www.slideshare.net/milkers/lecture-06-marco-aurelio-ranzato-deep-learning
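A quick numerical check that stacked linear layers collapse into one linear map (the layer sizes and random weights are arbitrary):

```python
# With purely linear layers, h3 = W3 W2 W1 v equals a single linear map A v.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))
W2 = rng.standard_normal((5, 4))
W3 = rng.standard_normal((2, 5))
v  = rng.standard_normal(3)

h3 = W3 @ (W2 @ (W1 @ v))       # three "layers", no non-linearity
A  = W3 @ W2 @ W1               # one equivalent layer
print(np.allclose(h3, A @ v))   # True: depth buys nothing without a non-linearity
```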
  14. 14. Dropout • Regularization: prevents overfitting when the dataset is not large enough • Each hidden unit is switched off with probability 0.5 • Ensemble method: like training many models and averaging them • Improving neural networks by preventing co-adaptation of feature detectors. Hinton et al. (2012) arXiv • Dropout: a simple way to prevent neural networks from overfitting. Srivastava et al. (2014) Journal of Machine Learning Research 15
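A sketch of dropout at training time, using the common "inverted dropout" variant that rescales surviving units by 1/p so nothing changes at test time (this scaling choice is an implementation detail, not stated on the slide):

```python
# Inverted dropout: keep each hidden unit with probability p and rescale.
import numpy as np

rng = np.random.default_rng(0)

def dropout(h, p=0.5, training=True):
    if not training:
        return h                              # test time: use all units
    mask = (rng.random(h.shape) < p) / p      # keep with prob p, rescale by 1/p
    return h * mask

h = np.ones(10)                               # some hidden-layer activations
print(dropout(h))                             # roughly half the units zeroed out
```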
  15. 15. ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) ImageNet • Over 14M labeled images • About 22K categories. Challenge data • 1,000 categories (classes) • 1.2M training images • 50K validation images • 150K test images • Top-1 and Top-5 error rates • Winners: AlexNet (2012), ZF Net (2013), GoogLeNet and VGG (2014), ResNet (2015)
  16. 16. Q&A
  17. 17. References • Prof. 김성훈 (HKUST), YouTube deep learning lectures • 이승은 (SKT), SlideShare • Stanford Univ. CS231n: CNN for Visual Recognition • Prof. 김준모 (KAIST), 2017 Pattern Recognition and Machine Learning Winter School
  18. 18. Thank you
