Convolutional Neural Networks

Convolutional Neural Networks
(CNN / ConvNet)
2017.03.31
KAIST iDBLab
윤상훈 (Sanghoon Yoon)

Contents
1. Perceptron / MLP
2. Backpropagation
3. Convolutional Neural Networks
4. ReLU
5. Dropout
6. ILSVRC

Perceptron
• A model patterned after the nerve cells (neurons) of living organisms (humans)
• The neuron fires when the output of the activation function 𝑓(𝑧) reaches a certain value (the threshold)
• Conventional activation functions → sigmoid, tanh
• Linear combination of input 𝑥 → sigmoid, tanh → logistic regression
Reference - http://hunkim.github.io/ml/lec8.pdf
Figure: a perceptron with inputs $x$ and bias $b$ computing $f\left(\sum_i w_i x_i + b\right)$
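
The unit above can be sketched in a few lines of Python. This is only a minimal illustration: the weights, bias, and the 0.5 threshold are made-up values, not taken from the slides.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, squashes z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def perceptron(x, w, b, f=sigmoid):
    """Compute f(sum_i w_i * x_i + b) for one input vector x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return f(z)

# Hypothetical weights and bias, chosen only for illustration.
w, b = [2.0, -1.0], 0.5
y_sigmoid = perceptron([1.0, 1.0], w, b)             # z = 2 - 1 + 0.5 = 1.5
y_tanh = perceptron([1.0, 1.0], w, b, f=math.tanh)   # same z, tanh activation
fires = y_sigmoid >= 0.5                             # assumed threshold on the output
```

Swapping `f` is all it takes to move between sigmoid and tanh; the linear combination inside stays the same.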

• Logistic regression → Linear decision boundary (hyperplane)
• Solves linearly separable problems (AND, OR) well, but cannot solve problems that are not linearly separable (XOR)
Reference - http://sebastianraschka.com/Articles/2015_singlelayer_neurons.html
http://programmermagazine.github.io/201404/img/perceptronLinearAnalysis.jpg
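
A small experiment makes the AND/OR versus XOR point concrete. The sketch below uses the classic perceptron learning rule with a step activation; the epoch count and learning rate are arbitrary choices for illustration.

```python
def predict(w, b, x):
    """Step-activation perceptron: 1 if w.x + b > 0, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train_perceptron(data, epochs=50, lr=1.0):
    """Classic perceptron learning rule; returns the learned (w, b)."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in data:
            err = target - predict(w, b, x)   # -1, 0, or +1
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def accuracy(data):
    """Train on the four points, then score on the same four points."""
    w, b = train_perceptron(data)
    return sum(predict(w, b, x) == t for x, t in data) / len(data)

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND = list(zip(inputs, [0, 0, 0, 1]))
OR = list(zip(inputs, [0, 1, 1, 1]))
XOR = list(zip(inputs, [0, 1, 1, 0]))

acc_and, acc_or, acc_xor = accuracy(AND), accuracy(OR), accuracy(XOR)
```

AND and OR are fit perfectly. XOR never is, because no single line classifies more than three of its four points correctly.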

MLP
Multi-layer Perceptron (MLP)
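• Draw one more line (add one more perceptron) and check whether the point lies between the two lines (add one more layer)
• Stacking perceptrons in several layers makes non-linear problems solvable too
• The deeper the stack, the more complex the model; the wider the layers, the more complex the model
• But... nobody knew how to learn the weights (Perceptrons, Marvin Minsky and Seymour Papert, 1969)
Reference - https://i.stack.imgur.com/nRZ6z.png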
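
The "two lines plus one more layer" idea can be wired up by hand for XOR. The weights below are picked manually for illustration (they are one of many valid choices), which also shows what was missing at the time: a way to learn them.

```python
def step(z):
    """Threshold activation: 1 if z > 0, else 0."""
    return 1 if z > 0 else 0

def xor_mlp(x1, x2):
    # Hidden layer: two perceptrons, i.e. two lines in the input plane.
    h1 = step(x1 + x2 - 0.5)     # line 1: fires when at least one input is 1 (OR)
    h2 = step(-x1 - x2 + 1.5)    # line 2: fires unless both inputs are 1 (NAND)
    # Output layer: fires only when the point lies between the two lines.
    return step(h1 + h2 - 1.5)   # AND of h1 and h2

outputs = [xor_mlp(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
# outputs == [0, 1, 1, 0]
```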

Backpropagation
• Not so! Here is how to do it
• Chain rule → ‘Back’-propagation
• Gradient descent (let's roll the ball downhill!)
• Learning representations by back-propagating errors. Rumelhart, Hinton, and Williams (1986). Nature
Reference - https://i.stack.imgur.com/H1KsG.png
http://underflow.fr/wp-content/uploads/2014/03/parabola-floor.png
$$E = \frac{1}{2} \sum_c \sum_j \left(y_{j,c} - d_{j,c}\right)^2$$
$$\Delta w = -\varepsilon \frac{\partial E}{\partial w}$$
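
As a rough sketch of the two formulas above, the code below trains a tiny 2-2-1 sigmoid network on XOR. The network size, initial weights, step size `eps`, and iteration count are all assumptions made for illustration. It only shows the error going down; it is not guaranteed to reach a perfect XOR solution from every starting point.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(p, x):
    """2-2-1 network with sigmoid units; returns hidden activations and output."""
    h = [sigmoid(p["W1"][j][0] * x[0] + p["W1"][j][1] * x[1] + p["b1"][j])
         for j in range(2)]
    y = sigmoid(p["W2"][0] * h[0] + p["W2"][1] * h[1] + p["b2"])
    return h, y

def error(p, data):
    """E = 1/2 * sum over cases of (y - d)^2."""
    return 0.5 * sum((forward(p, x)[1] - d) ** 2 for x, d in data)

def gradients(p, data):
    """Backpropagation: apply the chain rule from the output back to the inputs."""
    g = {"W1": [[0.0, 0.0], [0.0, 0.0]], "b1": [0.0, 0.0],
         "W2": [0.0, 0.0], "b2": 0.0}
    for x, d in data:
        h, y = forward(p, x)
        delta_out = (y - d) * y * (1 - y)                 # dE/dz at the output unit
        g["b2"] += delta_out
        for j in range(2):
            g["W2"][j] += delta_out * h[j]
            delta_h = delta_out * p["W2"][j] * h[j] * (1 - h[j])  # dE/dz at hidden j
            g["b1"][j] += delta_h
            for i in range(2):
                g["W1"][j][i] += delta_h * x[i]
    return g

def descend(p, g, eps):
    """Gradient descent update: delta_w = -eps * dE/dw."""
    for j in range(2):
        p["W2"][j] -= eps * g["W2"][j]
        p["b1"][j] -= eps * g["b1"][j]
        for i in range(2):
            p["W1"][j][i] -= eps * g["W1"][j][i]
    p["b2"] -= eps * g["b2"]

random.seed(0)
params = {"W1": [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)],
          "b1": [0.0, 0.0],
          "W2": [random.uniform(-1, 1) for _ in range(2)],
          "b2": 0.0}
xor = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

e_before = error(params, xor)
for _ in range(2000):
    descend(params, gradients(params, xor), eps=0.5)
e_after = error(params, xor)   # lower than e_before
```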

Convolutional Layer
• In pattern recognition, relative position matters more than absolute position
• Shift, scale, distortion invariance → local receptive field, shared weights
• The number of parameters also drops significantly
• Gradient-based learning applied to document recognition. LeCun et al. (1998)
Proceedings of the IEEE
Reference - https://www.slideshare.net/zukun/p03-neural-networks-cvpr2012-deep-learning-methods-for-vision
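
To get a feel for how much shared weights save, the sketch below counts parameters for a fully connected layer and a convolutional layer that produce the same number of outputs. The sizes are borrowed from the first convolutional layer of LeNet-5 (32×32 input, six 5×5 filters, 28×28 feature maps) purely as an example; the helper names are hypothetical.

```python
def dense_params(n_in, n_out):
    """Fully connected layer: one weight per (input, output) pair plus one bias per output."""
    return n_in * n_out + n_out

def conv_params(f_h, f_w, d_in, n_filters):
    """Convolutional layer: each filter has f_h * f_w * d_in shared weights plus one bias."""
    return (f_h * f_w * d_in + 1) * n_filters

n_in = 32 * 32 * 1     # grayscale 32x32 input
n_out = 28 * 28 * 6    # six 28x28 feature maps

dense = dense_params(n_in, n_out)   # 4,821,600 parameters
conv = conv_params(5, 5, 1, 6)      # 156 parameters
```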

• Each layer learns multiple features
• Receptive field = filter = weight = feature detector
• $S_{input} = h_{input} \times w_{input} \times d_{input}$
• $S_{filter} = h_{filter} \times w_{filter} \times d_{input}$
• $S_{output} = h_{input} \times w_{input} \times n_{filter}$ (assuming stride 1 and zero padding that preserves the input height and width)
• Number of filters = depth (channels) of the output
• The output is also called a feature map
Reference - http://cs231n.github.io/convolutional-networks/#conv
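
The shape relations above can be checked with a naive convolution written with plain nested lists. This is a slow, illustrative sketch, not how any library implements it. The output-size formula follows the CS231n notes cited above, and the input and filter sizes are hypothetical.

```python
def conv_output_size(n, f, pad, stride):
    """Spatial output size along one axis: (n - f + 2*pad) // stride + 1."""
    return (n - f + 2 * pad) // stride + 1

def conv2d(image, filters, pad=0, stride=1):
    """Naive convolution (cross-correlation, as is usual in CNNs).

    image:   h x w x d nested lists
    filters: n x fh x fw x d nested lists (filter depth equals input depth)
    returns: h_out x w_out x n nested lists (one output channel per filter)
    """
    h, w, d = len(image), len(image[0]), len(image[0][0])
    fh, fw = len(filters[0]), len(filters[0][0])
    h_out = conv_output_size(h, fh, pad, stride)
    w_out = conv_output_size(w, fw, pad, stride)

    def pixel(r, c, k):
        # Zero padding: anything outside the image counts as 0.
        return image[r][c][k] if 0 <= r < h and 0 <= c < w else 0.0

    out = [[[0.0] * len(filters) for _ in range(w_out)] for _ in range(h_out)]
    for f_idx, filt in enumerate(filters):
        for i in range(h_out):
            for j in range(w_out):
                out[i][j][f_idx] = sum(
                    filt[a][b][k] * pixel(i * stride + a - pad, j * stride + b - pad, k)
                    for a in range(fh) for b in range(fw) for k in range(d))
    return out

# Hypothetical sizes: 5x5x3 input, four 3x3x3 filters of all ones, pad 1, stride 1.
image = [[[1.0] * 3 for _ in range(5)] for _ in range(5)]
filters = [[[[1.0] * 3 for _ in range(3)] for _ in range(3)] for _ in range(4)]
fmap = conv2d(image, filters, pad=1, stride=1)
shape = (len(fmap), len(fmap[0]), len(fmap[0][0]))   # (5, 5, 4)
```

With pad 1 and stride 1 the height and width are preserved, and the output depth equals the number of filters.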

Pooling Layer
• Downsampling (subsampling)
• Lowers the positional precision of each feature in the feature map, which reduces sensitivity to shifts and distortions
• Max pooling, average pooling
• The trend is to drop pooling layers and use a larger stride in the conv layers instead (ResNet, generative models)
Reference - http://cs231n.github.io/convolutional-networks/#conv
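
Max and average pooling differ only in how each window is reduced. A minimal sketch on a single 4×4 feature map, assuming a 2×2 window with stride 2 (the helper names are hypothetical):

```python
def pool2d(fmap, size=2, stride=2, op=max):
    """Pool a 2-D feature map (list of rows) by applying op to each window."""
    h, w = len(fmap), len(fmap[0])
    return [[op([fmap[i + a][j + b] for a in range(size) for b in range(size)])
             for j in range(0, w - size + 1, stride)]
            for i in range(0, h - size + 1, stride)]

def mean(values):
    return sum(values) / len(values)

fmap = [[1, 3, 2, 4],
        [5, 6, 7, 8],
        [3, 2, 1, 0],
        [1, 2, 3, 4]]
max_pooled = pool2d(fmap, op=max)    # [[6, 8], [3, 4]]
avg_pooled = pool2d(fmap, op=mean)   # [[3.75, 5.25], [2.0, 2.0]]
```

Each 2×2 window collapses to one number, so the exact position of a feature inside the window no longer matters.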

CNN / ConvNet
Convolutional Neural Network
• Low-level features (oriented edges, endpoints, corners) → higher-order features
• LeNet → AlexNet → ZF Net → GoogLeNet, VGG → ResNet
Reference - http://killianlevacher.github.io/blog/posts/post-2016-03-01/img/layeredRepresentation.jpg
https://www.embedded-vision.com/sites/default/files/technical-articles/FPGAsNeuralNetworks/Figure1.jpg

Vanishing Gradient Problem
• With sigmoid or tanh as the activation function, the gradient approaches 0: both functions saturate, and backpropagation multiplies these small derivatives layer after layer
• Unsupervised pre-training, e.g. with an RBM, can alleviate this to some extent
• Understanding the difficulty of training deep feedforward neural networks. Glorot and Bengio (2010). AISTATS.
Reference - https://nn.readthedocs.io/en/rtd/transfer/
Figure: sigmoid and tanh curves
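
A quick numerical look at why the gradient vanishes. The sketch ignores the weights and only tracks the activation derivatives that backpropagation multiplies together, which is a simplification; the depths and the input value 5.0 are arbitrary.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)              # peaks at 0.25 when z = 0

def d_tanh(z):
    return 1.0 - math.tanh(z) ** 2    # peaks at 1.0 when z = 0

# Derivatives shrink quickly once a unit saturates (|z| large).
saturated = (d_sigmoid(5.0), d_tanh(5.0))

# Backprop multiplies one such factor per layer. Even in the best case for
# sigmoid (z = 0 everywhere), the product decays geometrically with depth.
best_case = [d_sigmoid(0.0) ** depth for depth in (1, 5, 10)]
```

Ten sigmoid layers already scale the gradient by at most 0.25 to the 10th power, which is below one millionth.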

ReLU
Rectified Linear Unit
• ReLU: max(𝑥, 0)
• The gradient is either 1 or 0 → comparatively fast training
• Adopted by AlexNet, the winner of ILSVRC 2012
• ImageNet Classification with Deep Convolutional Neural Networks. Krizhevsky, Sutskever, and Hinton (2012). NIPS
Reference - https://nn.readthedocs.io/en/rtd/transfer/
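
ReLU and its gradient fit in a few lines. The gradient at exactly 0 is set to 0 here, which is a common convention rather than something the slides specify.

```python
def relu(x):
    return max(x, 0.0)

def d_relu(x):
    """Gradient of ReLU: 1 for x > 0, 0 otherwise (0 at x = 0 by convention)."""
    return 1.0 if x > 0 else 0.0

xs = [-2.0, -0.5, 0.0, 0.5, 2.0]
outputs = [relu(x) for x in xs]    # [0.0, 0.0, 0.0, 0.5, 2.0]
grads = [d_relu(x) for x in xs]    # [0.0, 0.0, 0.0, 1.0, 1.0]

# On an active path the per-layer factor is exactly 1, so it does not shrink with depth.
active_path = d_relu(2.0) ** 10    # 1.0
```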

• Q: Then why not just use a linear activation function?
• A: A composition of linear functions is still linear, so the whole network collapses into plain logistic regression (a single layer)
• Q: Then what does ReLU do?
• A: Piece-wise linear tiling: the mapping is locally linear
$h_3 = w_3 w_2 w_1 v = A v$
Reference - http://www.hellot.net/_UPLOAD_FILES/semina/KAIST_김준모.pdf
https://www.slideshare.net/milkers/lecture-06-marco-aurelio-ranzato-deep-learning
Sparse activation
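
Both answers can be checked numerically. In the sketch below the 2×2 weight matrices and the input `v` are made-up values; biases are left out to keep it short.

```python
def matmul(a, b):
    """Matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def matvec(a, v):
    """Matrix-vector product."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]

def relu_vec(v):
    return [max(x, 0.0) for x in v]

# Hypothetical 2x2 weight matrices, chosen only for illustration.
w1 = [[1.0, -1.0], [0.5, 2.0]]
w2 = [[2.0, 0.0], [-1.0, 1.0]]
w3 = [[1.0, 1.0], [0.0, -1.0]]
v = [1.0, 2.0]

# Linear activations: three layers equal one matrix A = w3 w2 w1.
A = matmul(w3, matmul(w2, w1))
h3_layered = matvec(w3, matvec(w2, matvec(w1, v)))   # [3.5, -5.5]
h3_collapsed = matvec(A, v)                          # [3.5, -5.5]

# ReLU activations: the same weights no longer reduce to A for this input.
h3_relu = matvec(w3, relu_vec(matvec(w2, relu_vec(matvec(w1, v)))))   # [4.5, -4.5]
```

With ReLU, which units are switched off depends on the input, so the network behaves like a different linear map in each region of input space. Here one unit per hidden layer outputs 0, which is also a small instance of sparse activation.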

Dropout
• Regularization: prevents overfitting when the dataset is not large enough
• Turns off each hidden unit with probability 0.5
• Ensemble method: train many models, then average them
• Improving neural networks by preventing co-adaptation of feature detectors. Hinton et al. (2012). arXiv.org.
• Dropout: a simple way to prevent neural networks from overfitting. Srivastava et al. (2014). Journal of Machine Learning Research 15
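
A minimal sketch of the idea on a single vector of hidden activations, assuming the scheme described in the dropout papers cited above: drop units at training time, keep all units but scale by the keep probability at test time. The activation values and the sample count are arbitrary.

```python
import random

def dropout_train(h, p_drop=0.5, rng=random):
    """Training: zero each hidden activation independently with probability p_drop."""
    return [0.0 if rng.random() < p_drop else x for x in h]

def dropout_test(h, p_drop=0.5):
    """Test: keep every unit but scale by the keep probability (the 'mean network')."""
    return [x * (1.0 - p_drop) for x in h]

rng = random.Random(0)
hidden = [1.0, 2.0, 3.0, 4.0]

# Each call samples a different thinned network.
samples = [dropout_train(hidden, rng=rng) for _ in range(10000)]

# Averaging many thinned networks approaches the scaled test-time activations.
averaged = [sum(s[i] for s in samples) / len(samples) for i in range(len(hidden))]
expected = dropout_test(hidden)   # [0.5, 1.0, 1.5, 2.0]
```

This is the sense in which dropout acts like an ensemble: the test-time network stands in for the average of the many thinned networks sampled during training.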

ILSVRC
ImageNet Large-Scale Visual Recognition Challenge
ImageNet
• Over 14M labeled images
• About 22K categories
ILSVRC (subset of ImageNet)
• 1,000 categories(classes)
• 1.2M training images
• 50K validation images
• 150K test images
• Top-1 and Top-5 error rate
• Winners:
– AlexNet (2012)
– ZF Net (2013)
– GoogLeNet, VGG (2014)
– ResNet (2015)
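
The top-1 and top-5 error rates listed above can be computed as follows. The scores are hypothetical and use only 6 classes to keep the example readable; ILSVRC itself has 1,000.

```python
def top_k_error(scores, labels, k):
    """Fraction of examples whose true label is not among the k highest scores."""
    misses = 0
    for row, label in zip(scores, labels):
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        misses += label not in top_k
    return misses / len(labels)

# Hypothetical scores for 3 images over 6 classes.
scores = [
    [0.10, 0.50, 0.20, 0.05, 0.10, 0.05],   # true class 1: ranked 1st
    [0.30, 0.10, 0.25, 0.20, 0.10, 0.05],   # true class 3: ranked 3rd
    [0.30, 0.25, 0.20, 0.15, 0.08, 0.02],   # true class 5: ranked last
]
labels = [1, 3, 5]
top1 = top_k_error(scores, labels, k=1)   # 2 of 3 images missed
top5 = top_k_error(scores, labels, k=5)   # 1 of 3 images missed
```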

Q&A

Reference
• Prof. 김성훈 (HKUST): YouTube deep learning lectures
• 이승은 (SKT): Slideshare
• Stanford Univ. CS231n: CNN for Visual Recognition
• Prof. 김준모 (KAIST): 2017 Pattern Recognition and Machine Learning Winter School

Thank you
